CascadeAgent
The primary orchestrator for cascade execution. Routes queries through a model cascade: cheaper models first, falling back to more powerful models when quality validation fails.
Constructor
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| models | list[ModelConfig] | required | Model configurations, sorted by cost |
| quality_config | QualityConfig \| dict | None | Quality validation settings |
| enable_cascade | bool | True | Enable speculative cascade |
| verbose | bool | False | Enable verbose logging |
| domain_configs | dict[str, DomainConfig] | None | Per-domain routing configs |
| enable_domain_detection | bool | False | Auto-detect query domain |
| use_semantic_domains | bool | True | Use ML-based domain detection |
| enable_tool_complexity_routing | bool | True | Route tool calls by complexity |
| rule_engine | RuleEngine | None | Custom rule engine for routing |
| tenant_rules | dict[str, Any] | None | Per-tenant routing overrides |
| channel_models | dict[str, list[str]] | None | Channel-to-model mapping |
| channel_failover | dict[str, str] | None | Channel failover map |
| tool_executor | ToolExecutor | None | Tool executor instance |
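
To make the cascade idea concrete, here is a minimal, self-contained sketch of the routing pattern described at the top of this page: try the cheapest model first and escalate only when a quality check rejects the draft. `ToyModel`, `run_cascade`, and `long_enough` are hypothetical stand-ins written for illustration. They are not part of the library, and the real quality validation is configured through `quality_config` rather than a simple length check.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToyModel:
    """Hypothetical stand-in for a configured model (not the library's ModelConfig)."""
    name: str
    cost: float                   # relative cost of one call
    answer: Callable[[str], str]  # fake generation function


def run_cascade(models, query, is_good_enough):
    """Try models from cheapest to most expensive; stop at the first accepted draft."""
    if not models:
        raise ValueError("at least one model is required")
    ordered = sorted(models, key=lambda m: m.cost)
    spent = 0.0
    draft = ""
    for model in ordered:
        draft = model.answer(query)
        spent += model.cost
        if is_good_enough(draft):
            return model.name, draft, spent
    # No draft passed validation: keep the most expensive model's answer.
    return ordered[-1].name, draft, spent


def long_enough(text):
    """Toy quality check (an assumption): accept drafts of at least 20 characters."""
    return len(text) >= 20


small = ToyModel("small", cost=1.0, answer=lambda q: "42")
large = ToyModel("large", cost=10.0,
                 answer=lambda q: "The answer is 42 because 6 * 7 = 42.")

# The small model's draft is too short, so the cascade escalates to the large one.
print(run_cascade([large, small], "What is 6 * 7?", long_enough))
```

In this toy version the cost of a rejected draft is still counted, which is one reason the order of `models` (sorted by cost) matters.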
Methods
run()
Execute a query with cascade logic and full diagnostics.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| query | str \| list[dict] | required | Query string or message list |
| max_tokens | int | 100 | Maximum tokens to generate |
| temperature | float | 0.7 | Sampling temperature (0-2) |
| complexity_hint | str | None | Override complexity ("simple", "moderate", "complex") |
| force_direct | bool | False | Skip cascade, use best model |
| tools | list[dict] | None | Tool definitions |
| tool_choice | str | None | Tool selection ("auto", "none", tool name) |
| messages | list[dict] | None | Multi-turn conversation history |
| max_steps | int | 5 | Max agent loop iterations |
| user_tier | str | None | User tier for routing |
| workflow | str | None | Workflow profile name |
| domain_hint | str | None | Override detected domain |
| tenant_id | str | None | Tenant identifier |
| channel | str | None | Logical channel for routing |
Returns: CascadeResult
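
Several run() parameters have documented value ranges (temperature between 0 and 2, and three accepted complexity_hint strings). The sketch below shows one way a caller might check those values before passing them on. `check_run_kwargs` is a hypothetical helper, not a library function; the requirements that max_tokens be positive and max_steps be at least 1 are assumptions, and the library may perform its own validation.

```python
ALLOWED_COMPLEXITY = {"simple", "moderate", "complex"}


def check_run_kwargs(max_tokens=100, temperature=0.7,
                     complexity_hint=None, max_steps=5):
    """Return keyword arguments for run(), or raise ValueError on a bad value."""
    if max_tokens <= 0:  # assumption: a token budget must be positive
        raise ValueError("max_tokens must be positive")
    if not 0 <= temperature <= 2:  # documented range: 0-2
        raise ValueError("temperature must be between 0 and 2")
    if complexity_hint is not None and complexity_hint not in ALLOWED_COMPLEXITY:
        raise ValueError(f"complexity_hint must be one of {sorted(ALLOWED_COMPLEXITY)}")
    if max_steps < 1:  # assumption: at least one loop iteration
        raise ValueError("max_steps must be at least 1")
    return {
        "max_tokens": max_tokens,
        "temperature": temperature,
        "complexity_hint": complexity_hint,
        "max_steps": max_steps,
    }


kwargs = check_run_kwargs(max_tokens=256, temperature=0.2, complexity_hint="simple")
print(kwargs)
```

The returned dictionary can then be unpacked into a run() call as keyword arguments.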
run_streaming()
Execute with streaming output and visual feedback.
Parameters: same as run(), plus:
| Parameter | Type | Default | Description |
|---|---|---|---|
| enable_visual | bool | True | Show visual streaming indicator |
Returns: CascadeResult
stream_events()
Async iterator for real-time streaming events. Use this for custom UI integration.
Parameters: same as run().
Yields: StreamEvent objects
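
Because stream_events() is an async iterator, it is consumed with `async for`. The sketch below illustrates that consumption pattern with a stand-in generator so it runs without any provider credentials. `ToyEvent`, its `kind` and `text` fields, and `toy_stream_events` are hypothetical; the actual StreamEvent fields are not listed on this page.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class ToyEvent:
    """Hypothetical stand-in for StreamEvent (real fields are not shown here)."""
    kind: str
    text: str = ""


async def toy_stream_events(query):
    """Stand-in async generator that mimics a stream of events for a query."""
    for piece in ("Hello", " ", "world"):
        await asyncio.sleep(0)  # hand control back to the event loop between chunks
        yield ToyEvent("chunk", piece)
    yield ToyEvent("complete")


async def collect_text(events):
    """Consume an async iterator of events and join the text of chunk events."""
    parts = []
    async for event in events:
        if event.kind == "chunk":
            parts.append(event.text)
    return "".join(parts)


text = asyncio.run(collect_text(toy_stream_events("hi")))
print(text)
```

In a custom UI, the body of the `async for` loop is where each event would be rendered as it arrives instead of being collected into a list.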
run_batch()
Process multiple queries with batch optimization.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| queries | list[str] | required | List of query strings |
| batch_config | BatchConfig | None | Batch configuration |
| **run_kwargs | | | Arguments passed to each run() call |
Returns: BatchResult with fields results, total_cost, total_time_ms, successful, failed, avg_cost, avg_latency_ms
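
The aggregate fields on BatchResult can be understood as simple summaries over the per-query results. The sketch below shows how such figures could be derived. `ToyOutcome` and `summarize` are hypothetical, and the averaging rules (dividing by the total number of queries, failures included) are an assumption rather than the library's documented behaviour. total_time_ms is omitted because it depends on wall-clock timing of the whole batch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyOutcome:
    """Hypothetical per-query record (not the library's result type)."""
    cost: float
    latency_ms: float
    error: Optional[str] = None  # None means the query succeeded


def summarize(outcomes):
    """Compute batch-level summary fields from per-query outcomes."""
    n = len(outcomes)
    successful = sum(1 for o in outcomes if o.error is None)
    total_cost = sum(o.cost for o in outcomes)
    total_latency = sum(o.latency_ms for o in outcomes)
    return {
        "successful": successful,
        "failed": n - successful,
        "total_cost": total_cost,
        "avg_cost": total_cost / n if n else 0.0,
        "avg_latency_ms": total_latency / n if n else 0.0,
    }


summary = summarize([
    ToyOutcome(cost=0.25, latency_ms=100.0),
    ToyOutcome(cost=0.5, latency_ms=200.0),
    ToyOutcome(cost=0.0, latency_ms=300.0, error="timeout"),
])
print(summary)
```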