CascadeAgent

The primary orchestrator for cascade execution. It routes each query through a model cascade: cheaper models are tried first, and the query falls back to more powerful models when quality validation fails.
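
The cascade idea described above can be sketched in plain Python. This is an illustrative sketch only, not cascadeflow's implementation: `Tier`, `run_cascade`, and `length_score` are hypothetical names, the models are stubbed with lambdas, and the sketch assumes every attempted tier is billed.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Tier:
    """Hypothetical stand-in for one cascade tier (not cascadeflow's ModelConfig)."""
    name: str
    cost: float
    generate: Callable[[str], str]


def run_cascade(query: str, tiers: list, score: Callable[[str, str], float],
                threshold: float = 0.7):
    """Try tiers from cheapest to most expensive.

    Returns (tier name, answer, total cost spent). The first answer whose
    score meets the threshold is accepted; if none does, the answer from
    the most expensive tier is returned as a last resort.
    """
    if not tiers:
        raise ValueError("at least one tier is required")
    ordered = sorted(tiers, key=lambda t: t.cost)
    spent = 0.0
    answer = ""
    for tier in ordered:
        answer = tier.generate(query)
        spent += tier.cost  # assumption: each attempt is billed
        if score(query, answer) >= threshold:
            return tier.name, answer, spent
    return ordered[-1].name, answer, spent


# Stub models: the cheap one gives a terse answer, the expensive one a fuller one.
tiers = [
    Tier("small", 0.000375, lambda q: "short"),
    Tier("large", 0.00625, lambda q: "a much longer and more detailed answer"),
]


def length_score(query: str, answer: str) -> float:
    """Toy quality check: longer answers score higher, capped at 1.0."""
    return min(len(answer) / 20, 1.0)


name, answer, spent = run_cascade("Explain X", tiers, length_score)
print(name, round(spent, 6))
```

With the toy scorer, the terse answer fails the 0.7 threshold, so the query escalates and the total spent is the sum of both tiers. A real quality validator would look at more than answer length.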

Constructor

from cascadeflow import CascadeAgent, ModelConfig

agent = CascadeAgent(
    models=[
        ModelConfig(name="gpt-4o-mini", provider="openai", cost=0.000375),
        ModelConfig(name="gpt-4o", provider="openai", cost=0.00625),
    ],
    quality_config={"threshold": 0.7},
    enable_cascade=True,
    verbose=False,
)

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| models | list[ModelConfig] | required | Model configurations, sorted by cost |
| quality_config | QualityConfig \| dict | None | Quality validation settings |
| enable_cascade | bool | True | Enable speculative cascade |
| verbose | bool | False | Enable verbose logging |
| domain_configs | dict[str, DomainConfig] | None | Per-domain routing configs |
| enable_domain_detection | bool | False | Auto-detect query domain |
| use_semantic_domains | bool | True | Use ML-based domain detection |
| enable_tool_complexity_routing | bool | True | Route tool calls by complexity |
| rule_engine | RuleEngine | None | Custom rule engine for routing |
| tenant_rules | dict[str, Any] | None | Per-tenant routing overrides |
| channel_models | dict[str, list[str]] | None | Channel-to-model mapping |
| channel_failover | dict[str, str] | None | Channel failover map |
| tool_executor | ToolExecutor | None | Tool executor instance |

Methods

run()

Execute a query with cascade logic and full diagnostics.
result = await agent.run(
    "Analyze this dataset",
    max_tokens=500,
    temperature=0.5,
    tools=[...],
    max_steps=10,
)

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| query | str \| list[dict] | required | Query string or message list |
| max_tokens | int | 100 | Maximum tokens to generate |
| temperature | float | 0.7 | Sampling temperature (0-2) |
| complexity_hint | str | None | Override complexity ("simple", "moderate", "complex") |
| force_direct | bool | False | Skip cascade, use best model |
| tools | list[dict] | None | Tool definitions |
| tool_choice | str | None | Tool selection ("auto", "none", tool name) |
| messages | list[dict] | None | Multi-turn conversation history |
| max_steps | int | 5 | Max agent loop iterations |
| user_tier | str | None | User tier for routing |
| workflow | str | None | Workflow profile name |
| domain_hint | str | None | Override detected domain |
| tenant_id | str | None | Tenant identifier |
| channel | str | None | Logical channel for routing |

Returns: CascadeResult

run_streaming()

Execute with streaming output and visual feedback.
result = await agent.run_streaming(
    "Explain quantum mechanics",
    enable_visual=True,
)
Same parameters as run(), plus:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| enable_visual | bool | True | Show visual streaming indicator |

Returns: CascadeResult

stream_events()

Async iterator for real-time streaming events. Use this for custom UI integration.
async for event in agent.stream_events("Explain TypeScript"):
    if event.type == StreamEventType.CHUNK:
        print(event.content, end="")
    elif event.type == StreamEventType.COMPLETE:
        print(f"\nModel: {event.data.get('model')}")
Same parameters as run(). Yields: StreamEvent objects
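
For a custom UI it is often useful to collect the chunks into the full response while keeping the final metadata. The sketch below shows that pattern against a stubbed event source so it runs without any provider. `EventKind`, `Event`, `fake_stream`, and `collect` are hypothetical stand-ins. They only mirror the `type`, `content`, and `data` attributes used in the example above and are not cascadeflow's own classes.

```python
import asyncio
from dataclasses import dataclass, field
from enum import Enum


class EventKind(Enum):
    """Hypothetical stand-in for StreamEventType."""
    CHUNK = "chunk"
    COMPLETE = "complete"


@dataclass
class Event:
    """Hypothetical stand-in for StreamEvent."""
    type: EventKind
    content: str = ""
    data: dict = field(default_factory=dict)


async def fake_stream(text: str):
    """Stub for agent.stream_events(): one CHUNK per word, then COMPLETE."""
    for word in text.split(" "):
        await asyncio.sleep(0)  # yield control, as a real network stream would
        yield Event(EventKind.CHUNK, content=word + " ")
    yield Event(EventKind.COMPLETE, data={"model": "stub-model"})


async def collect(events):
    """Consume an async event iterator; return (full_text, final_metadata)."""
    parts = []
    meta = {}
    async for event in events:
        if event.type is EventKind.CHUNK:
            parts.append(event.content)
        elif event.type is EventKind.COMPLETE:
            meta = event.data
    return "".join(parts).strip(), meta


text, meta = asyncio.run(collect(fake_stream("types catch bugs early")))
print(text, meta)
```

In a real integration, the `parts.append(...)` line is where a UI would render each chunk as it arrives.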

run_batch()

Process multiple queries with batch optimization.
from cascadeflow import BatchConfig

batch_result = await agent.run_batch(
    ["Query 1", "Query 2", "Query 3"],
    batch_config=BatchConfig(concurrency=3),
    max_tokens=200,
)

print(f"Total cost: ${batch_result.total_cost:.4f}")
print(f"Success rate: {batch_result.successful}/{len(batch_result.results)}")

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| queries | list[str] | required | List of query strings |
| batch_config | BatchConfig | None | Batch configuration |
| **run_kwargs | | | Arguments passed to each run() call |

Returns: BatchResult with results, total_cost, total_time_ms, successful, failed, avg_cost, avg_latency_ms
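
To illustrate what a concurrency limit means for a batch, here is a minimal sketch of bounded-concurrency execution with `asyncio.Semaphore`. It is not how `run_batch()` is implemented. `fake_run`, `run_batch_sketch`, and `BatchSummary` are hypothetical, and `BatchSummary` only borrows a few of the `BatchResult` field names listed above.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class BatchSummary:
    """Hypothetical summary; borrows a few BatchResult field names."""
    results: list
    total_cost: float
    successful: int
    failed: int


async def fake_run(query: str):
    """Stub for agent.run(): returns (answer, cost); fails on an empty query."""
    await asyncio.sleep(0.01)
    if not query:
        raise ValueError("empty query")
    return f"answer to {query}", 0.001


async def run_batch_sketch(queries, concurrency: int = 3) -> BatchSummary:
    """Run all queries, with at most `concurrency` in flight at once."""
    sem = asyncio.Semaphore(concurrency)

    async def one(query):
        async with sem:  # blocks while `concurrency` calls are already running
            return await fake_run(query)

    # return_exceptions=True keeps one failure from cancelling the whole batch
    outcomes = await asyncio.gather(*(one(q) for q in queries),
                                    return_exceptions=True)
    ok = [o for o in outcomes if not isinstance(o, Exception)]
    return BatchSummary(
        results=list(outcomes),
        total_cost=sum(cost for _, cost in ok),
        successful=len(ok),
        failed=len(outcomes) - len(ok),
    )


summary = asyncio.run(run_batch_sketch(["Query 1", "", "Query 3"], concurrency=2))
print(f"{summary.successful}/{len(summary.results)} succeeded")
```

Collecting exceptions as results, rather than raising on the first one, is what makes `successful` and `failed` counts possible for a partially failed batch.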

Class Methods

from_env()

Create an agent by auto-detecting available providers from environment variables.
agent = CascadeAgent.from_env(verbose=True)
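
Environment-based detection can be pictured roughly as follows. This is a sketch under assumptions: `PROVIDER_ENV_VARS` and `detect_providers` are hypothetical, and the variable names are the ones the OpenAI and Anthropic SDKs read. Which variables `from_env()` actually checks is not stated here.

```python
import os

# Assumed mapping of provider name to API-key variable. These are the names
# the providers' own SDKs read; cascadeflow's actual list may differ.
PROVIDER_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}


def detect_providers(environ=None):
    """Return providers whose API-key variable is set to a non-empty value."""
    env = os.environ if environ is None else environ
    return [name for name, var in PROVIDER_ENV_VARS.items() if env.get(var)]


# Pass an explicit mapping so the example does not depend on the real environment.
found = detect_providers({"OPENAI_API_KEY": "sk-test", "ANTHROPIC_API_KEY": ""})
print(found)
```

An empty value is treated the same as an unset variable, which avoids selecting a provider whose key was exported as a blank string.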

from_profile()

Create an agent from a preset profile.
agent = CascadeAgent.from_profile("cost_optimized")
# Profiles: "cost_optimized", "balanced", "speed_optimized", "quality_optimized", "development"

Configuration Methods

agent.update_models([...])                    # Replace model list
agent.update_quality_threshold(0.8)           # Update quality threshold
agent.update_domain_config("legal", config)   # Add domain config
agent.enable_domain_routing()                 # Enable domain detection
agent.disable_domain_routing()                # Disable domain detection

Statistics

stats = agent.get_stats()
agent.print_stats()
config = agent.get_config_snapshot()