Cascade Engine
The Cascade Engine optimizes model selection through speculative execution with quality validation:
- Speculatively executes small, fast models first (optimistic execution, $0.15-0.30/1M tokens)
- Validates quality of responses using configurable thresholds (completeness, confidence, correctness)
- Dynamically escalates to larger models only when quality validation fails ($1.25-3.00/1M tokens)
- Learns patterns to optimize future cascading decisions and domain-specific routing
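The cascade flow described above can be sketched as follows. This is an illustrative sketch only: the function names, the toy `validate` check, and the stub providers are assumptions, not cascadeflow's actual API.

```python
# Sketch of speculative cascade execution: try a cheap model first,
# escalate to a larger model only when quality validation fails.
# All names here are illustrative, not cascadeflow's actual API.

def validate(response: str, min_length: int = 20) -> bool:
    """Toy quality check standing in for completeness/confidence scoring."""
    return len(response) >= min_length and response.strip().endswith((".", "!", "?"))

def cascade(prompt: str, call_small, call_large) -> tuple[str, str]:
    """Return (response, model_tier); escalate only on failed validation."""
    draft = call_small(prompt)          # optimistic: ~$0.15-0.30 / 1M tokens
    if validate(draft):
        return draft, "small"
    return call_large(prompt), "large"  # escalation: ~$1.25-3.00 / 1M tokens

# Stub providers standing in for real model calls
small = lambda p: "Short."  # too short, fails validation
large = lambda p: "A complete, validated answer to the question."

response, tier = cascade("Explain cascades.", small, large)
print(tier)  # "large": the small model's draft failed validation
```

The key cost property is that the large model is only ever invoked after the small model's draft fails validation, so well-handled prompts never pay the higher per-token rate.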
Harness Engine
The Harness Engine provides agent runtime intelligence: budget enforcement, compliance gating, KPI-weighted routing, energy tracking, and decision traces. Unlike the Cascade Engine, which routes between models, the Harness Engine wraps existing agent execution and makes decisions at every step.
Decision Flow
For each LLM call or tool execution inside an agent loop, the harness:
- Records the model, step number, and cumulative metrics
- Evaluates all configured constraints (budget, compliance, tool calls, latency, energy)
- Scores the call against KPI weights if configured
- Decides an action: `allow`, `switch_model`, `deny_tool`, or `stop`
- Enforces the action in `enforce` mode (logs only in `observe` mode)
- Appends a trace record for auditability
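A rough sketch of one harness step is shown below. The constraint logic, trace shape, and the `decide` helper are assumptions for illustration, not cascadeflow's real internals; a real harness would also check compliance, tool-call limits, latency, and energy alongside budget.

```python
# Rough sketch of a harness step: record metrics, evaluate constraints,
# pick an action, enforce it (or only log it in observe mode), and append
# a trace record. Names are assumptions, not cascadeflow's internals.
from dataclasses import dataclass, field

@dataclass
class StepState:
    step: int = 0
    cost_usd: float = 0.0
    trace: list = field(default_factory=list)

def decide(state: StepState, step_cost: float, budget_usd: float,
           mode: str = "enforce") -> str:
    state.step += 1
    state.cost_usd += step_cost            # record cumulative metrics
    if state.cost_usd > budget_usd:
        action = "stop"                    # budget exhausted
    elif state.cost_usd > 0.8 * budget_usd:
        action = "switch_model"            # nearing budget: prefer cheaper model
    else:
        action = "allow"
    enforced = mode == "enforce"           # observe mode only logs
    state.trace.append({"step": state.step, "action": action, "enforced": enforced})
    return action if enforced else "allow"

state = StepState()
actions = [decide(state, step_cost=0.4, budget_usd=1.0) for _ in range(3)]
print(actions)  # cumulative cost 0.4, 0.8, 1.2 against a $1.00 budget
```

Running in `observe` mode would produce the same trace records but always return `allow`, which matches the log-only behavior described above.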
HarnessConfig
All harness behavior is configured through a single dataclass.
Combined Usage
When both engines are active, the Cascade Engine handles model selection while the Harness Engine enforces constraints.
Provider Abstraction
cascadeflow supports 17+ providers through a unified interface:

| Provider | Type | Package |
|---|---|---|
| OpenAI | API | cascadeflow[openai] |
| Anthropic | API | cascadeflow[anthropic] |
| Groq | API | cascadeflow[groq] |
| Together | API | cascadeflow[together] |
| Hugging Face | API | cascadeflow[huggingface] |
| Ollama | Local | Built-in (HTTP) |
| vLLM | Local | cascadeflow[vllm] |
| Vercel AI SDK | TypeScript | @cascadeflow/vercel-ai |
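The idea of a unified interface across API and local providers can be illustrated with a minimal protocol. The class and method names below are hypothetical stand-ins, not cascadeflow's real abstraction, and the providers are stubbed rather than making network calls.

```python
# Minimal illustration of a unified provider interface: every backend
# implements the same `complete` method, so the engines can route between
# them interchangeably. Names are hypothetical, not cascadeflow's API.
from typing import Protocol

class Provider(Protocol):
    name: str
    def complete(self, prompt: str) -> str: ...

class OllamaProvider:
    """Local provider that would talk to Ollama over HTTP (stubbed here)."""
    name = "ollama"
    def complete(self, prompt: str) -> str:
        return f"[{self.name}] response to: {prompt}"

class OpenAIProvider:
    """API provider that would require the openai extra (stubbed here)."""
    name = "openai"
    def complete(self, prompt: str) -> str:
        return f"[{self.name}] response to: {prompt}"

def route(providers: list[Provider], prompt: str, prefer: str) -> str:
    """Pick a provider by name; the call shape is identical for any backend."""
    provider = next(p for p in providers if p.name == prefer)
    return provider.complete(prompt)

print(route([OllamaProvider(), OpenAIProvider()], "hi", prefer="ollama"))
```

Because routing code only depends on the shared call shape, swapping a local backend for an API backend (or vice versa) requires no changes at the call site.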