CascadeResult

Returned by CascadeAgent.run(), run_streaming(), and run_batch(). Contains the generated response along with full cost, quality, timing, and routing diagnostics.

Usage

result = await agent.run("Explain quantum computing")

print(result.content)
print(f"Model: {result.model_used}")
print(f"Cost: ${result.total_cost:.6f}")
print(f"Savings: {result.savings_percentage}%")
print(f"Cascaded: {result.cascaded}, Accepted: {result.draft_accepted}")

Core Fields

Field             Type   Description
content           str    Generated response text
model_used        str    Model that produced the response
total_cost        float  Total cost in USD
latency_ms        float  Total latency in milliseconds
complexity        str    Detected complexity level
cascaded          bool   Whether cascade was used
draft_accepted    bool   Whether the draft passed quality validation
routing_strategy  str    Routing strategy used ("direct" or "cascade")
reason            str    Explanation for the routing decision
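
The core routing fields can be combined into a one-line summary. The sketch below is illustrative only: `describe_routing` is a hypothetical helper, the `SimpleNamespace` object is a stand-in for a real CascadeResult with made-up values, and the reading that a rejected draft means a verifier model answered is an assumption.

```python
from types import SimpleNamespace


def describe_routing(result) -> str:
    """Summarize how a response was produced, using only documented fields."""
    if not result.cascaded:
        # No cascade: the query went straight to a single model.
        return f"direct -> {result.model_used}"
    if result.draft_accepted:
        # Cascade used and the draft passed quality validation.
        return f"cascade, draft accepted -> {result.model_used}"
    # Cascade used but the draft was rejected (assumed: a verifier answered).
    return f"cascade, draft rejected -> {result.model_used}"


# Hypothetical stand-in for a CascadeResult; the values are made up.
example = SimpleNamespace(
    model_used="small-model",
    cascaded=True,
    draft_accepted=True,
    routing_strategy="cascade",
    reason="simple query",
)
print(describe_routing(example))
```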

Tool Calling

Field           Type               Description
tool_calls      list[dict] | None  Tool calls made during execution
has_tool_calls  bool               Whether the response includes tool calls
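
Because tool_calls may be None, it helps to normalize it before iterating. This is a minimal sketch: `iter_tool_calls` is a hypothetical helper, and the dict layout in the example (`name`, `arguments`) is an assumption, not the library's documented schema.

```python
import json
from types import SimpleNamespace


def iter_tool_calls(result) -> list[dict]:
    """Return tool calls as a list, treating None as 'no calls'."""
    if not result.has_tool_calls:
        return []
    return list(result.tool_calls or [])


# Hypothetical stand-in; the dict keys below are assumed for illustration.
example = SimpleNamespace(
    has_tool_calls=True,
    tool_calls=[{"name": "get_weather", "arguments": {"city": "Paris"}}],
)
for call in iter_tool_calls(example):
    print(json.dumps(call, sort_keys=True))
```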

Quality Diagnostics

Field                 Type          Description
quality_score         float | None  Quality score (0-1)
quality_threshold     float | None  Threshold used for validation
quality_check_passed  bool | None   Whether the quality check passed
rejection_reason      str | None    Why the draft was rejected
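
All four quality fields are optional, so any report has to handle missing values. The following sketch assumes a hypothetical pair of helpers, `quality_margin` and `quality_summary`, and uses a made-up stand-in object rather than a real CascadeResult.

```python
from types import SimpleNamespace
from typing import Optional


def quality_margin(result) -> Optional[float]:
    """Return score minus threshold, or None if either value is missing."""
    if result.quality_score is None or result.quality_threshold is None:
        return None
    return result.quality_score - result.quality_threshold


def quality_summary(result) -> str:
    """Build a one-line quality report that tolerates None fields."""
    margin = quality_margin(result)
    if margin is None:
        return "quality: not evaluated"
    passed = result.quality_check_passed
    status = "unknown" if passed is None else ("passed" if passed else "failed")
    line = (
        f"quality: {status} (score {result.quality_score:.2f}, "
        f"threshold {result.quality_threshold:.2f}, margin {margin:+.2f})"
    )
    if result.rejection_reason:
        line += f", reason: {result.rejection_reason}"
    return line


# Hypothetical stand-in for a rejected draft; the values are made up.
example = SimpleNamespace(
    quality_score=0.62,
    quality_threshold=0.70,
    quality_check_passed=False,
    rejection_reason="low confidence",
)
print(quality_summary(example))
```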

Response Tracking

Field                Type        Description
draft_response       str | None  Full draft response text
verifier_response    str | None  Full verifier response text
response_length      int | None  Response character length
response_word_count  int | None  Response word count

Timing Breakdown

Field                    Type          Description
complexity_detection_ms  float | None  Time to detect complexity
draft_generation_ms      float | None  Draft model generation time
quality_verification_ms  float | None  Quality validation time
verifier_generation_ms   float | None  Verifier model generation time
cascade_overhead_ms      float | None  Overhead from cascade (wasted if draft rejected)
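
One way to read the timing breakdown is as shares of latency_ms. The sketch below is a hypothetical helper, not part of the library. It skips fields that are None and leaves out cascade_overhead_ms on the assumption that it may overlap with the other stages. It does not assume the stages sum to the total latency.

```python
from types import SimpleNamespace

# Stage timings considered; cascade_overhead_ms is excluded (assumed to overlap).
TIMING_FIELDS = (
    "complexity_detection_ms",
    "draft_generation_ms",
    "quality_verification_ms",
    "verifier_generation_ms",
)


def timing_shares(result) -> dict[str, float]:
    """Map each recorded timing field to its share of latency_ms, in percent."""
    if not result.latency_ms:
        return {}
    shares = {}
    for name in TIMING_FIELDS:
        value = getattr(result, name, None)
        if value is not None:
            shares[name] = 100.0 * value / result.latency_ms
    return shares


# Hypothetical stand-in: draft accepted, so no verifier timing was recorded.
example = SimpleNamespace(
    latency_ms=500.0,
    complexity_detection_ms=10.0,
    draft_generation_ms=400.0,
    quality_verification_ms=50.0,
    verifier_generation_ms=None,
)
for name, pct in timing_shares(example).items():
    print(f"{name}: {pct:.1f}%")
```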

Cost Breakdown

Field               Type          Description
draft_cost          float | None  Cost of the draft call
verifier_cost       float | None  Cost of the verifier call
cost_saved          float | None  Savings vs always using best model
savings_percentage  float | None  Savings as percentage (0-100)
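
When several results are collected, for example from run_batch(), the cost fields can be aggregated. This sketch uses a hypothetical `aggregate_savings` helper and made-up values. It assumes the baseline (always using the best model) equals what was spent plus what was saved. How the library itself computes savings_percentage is not stated here, so the aggregate figure may differ from per-result values.

```python
from types import SimpleNamespace


def aggregate_savings(results) -> tuple[float, float, float]:
    """Return (total spent, total saved, savings %) across several results."""
    spent = sum(r.total_cost for r in results)
    saved = sum(r.cost_saved or 0.0 for r in results)  # None counts as zero
    baseline = spent + saved  # assumed cost of always using the best model
    pct = 100.0 * saved / baseline if baseline > 0 else 0.0
    return spent, saved, pct


# Hypothetical stand-ins for two results; costs are in USD and made up.
batch = [
    SimpleNamespace(total_cost=0.001, cost_saved=0.003),
    SimpleNamespace(total_cost=0.004, cost_saved=None),
]
spent, saved, pct = aggregate_savings(batch)
print(f"Spent ${spent:.4f}, saved ${saved:.4f} ({pct:.1f}%)")
```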

Model Information

Field                Type          Description
draft_model          str | None    Draft model name
draft_latency_ms     float | None  Draft model latency
draft_confidence     float | None  Draft confidence score
verifier_model       str | None    Verifier model name
verifier_latency_ms  float | None  Verifier model latency
verifier_confidence  float | None  Verifier confidence score

Methods

to_dict()

Convert the result to a plain dictionary.
import json

data = result.to_dict()
print(json.dumps(data, indent=2))
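
Since the dictionary serializes with json.dumps, one possible use is appending each result to a JSON Lines log. The sketch below is an illustration under assumptions: `StubResult` is a hypothetical stand-in exposing a few documented fields plus to_dict(), and `append_jsonl` is not part of the library.

```python
import io
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class StubResult:
    """Hypothetical stand-in exposing a few documented fields and to_dict()."""
    content: str
    model_used: str
    total_cost: float
    quality_score: Optional[float] = None

    def to_dict(self) -> dict:
        return asdict(self)


def append_jsonl(result, stream) -> None:
    """Write one result as a single JSON line to an open text stream."""
    stream.write(json.dumps(result.to_dict()) + "\n")


buffer = io.StringIO()  # swap for open("results.jsonl", "a") to log to disk
append_jsonl(StubResult("Hello", "small-model", 0.0001), buffer)
append_jsonl(StubResult("Hi", "big-model", 0.002, quality_score=0.9), buffer)

# Read the log back, one record per line.
records = [json.loads(line) for line in buffer.getvalue().splitlines()]
print(records[0]["model_used"])
```

Keeping one record per line lets the log be appended to and read back incrementally without parsing the whole file.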