The Agent Runtime Intelligence Layer

cascadeflow is infrastructure that runs inside AI agent execution and optimizes each step against business and technical constraints in real time. It is not another model router: it is a decision system inside the agent loop. Every model call, tool call, and sub-agent handoff can be measured, scored, and steered at the point where cost, delay, and failure actually occur.

Get Started

Install, observe, enforce, and ship to production in minutes.

Why cascadeflow

The business case for inside-the-loop agent intelligence.

Install

pip install cascadeflow

npm install @cascadeflow/core

import cascadeflow

cascadeflow.init(mode="observe")
# Every OpenAI and Anthropic SDK call is now tracked — zero code changes.

What Makes This Different

	External Proxy	cascadeflow
Where it runs	HTTP boundary	Inside the agent loop
What it sees	Request/response pairs	Step count, budget, tool history, quality, domain, business context
What it optimizes	Cost	Cost + latency + quality + budget + compliance + energy
What it does	Observes	`allow`, `switch_model`, `deny_tool`, `stop`
Latency overhead	40-60ms per call	<1ms in-process
In a 10-step agent loop	400-600ms added	<10ms added
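The 10-step row follows from multiplying the per-call overhead by the number of sequential steps. A minimal sketch of that arithmetic, using the figures from the table; the helper name `added_latency_ms` is illustrative and not part of cascadeflow:

```python
def added_latency_ms(steps: int, per_call_overhead_ms: float) -> float:
    """Total overhead added to a sequential agent loop."""
    return steps * per_call_overhead_ms


STEPS = 10

# External proxy: 40-60 ms of overhead on every call (figures from the table).
proxy_low = added_latency_ms(STEPS, 40.0)
proxy_high = added_latency_ms(STEPS, 60.0)

# In-process: under 1 ms per call, so 1 ms is used here as an upper bound.
in_process_bound = added_latency_ms(STEPS, 1.0)

print(f"External proxy: {proxy_low:.0f}-{proxy_high:.0f} ms added")
print(f"In-process: under {in_process_bound:.0f} ms added")
```

The overhead grows linearly with step count, so the gap widens for longer agent loops.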

Three Lines to Govern Any Agent

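# 1. Observe: track every LLM call without blocking or changing behavior.
import cascadeflow

cascadeflow.init(mode="observe")

# 2. Enforce: cap a single run at a $0.50 budget.
import cascadeflow

cascadeflow.init(mode="enforce")

async def main():
    # `agent` is your existing agent object; `await` must run inside an async function.
    with cascadeflow.run(budget=0.50) as session:
        result = await agent.run("Analyze this dataset")
        print(session.summary())

# 3. Decorate: attach a budget and a compliance policy to one agent function.
import cascadeflow

cascadeflow.init(mode="enforce")

@cascadeflow.agent(budget=0.20, compliance="gdpr")
async def my_agent(query: str):
    # `llm` is your existing LLM client.
    return await llm.complete(query)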
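To make the budget semantics concrete, here is a toy stand-in for a budgeted run, written without cascadeflow. It is a sketch under stated assumptions, not the library's implementation: `BudgetSession`, `BudgetExceeded`, `record`, and this local `run` are all hypothetical names, and the per-call costs are made-up numbers.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


class BudgetExceeded(RuntimeError):
    """Raised when a call would push cumulative spend past the cap."""


@dataclass
class BudgetSession:
    # Hypothetical session object; cascadeflow's real session may differ.
    budget_usd: float
    spent_usd: float = 0.0
    calls: List[Tuple[str, float]] = field(default_factory=list)

    def record(self, model: str, cost_usd: float) -> None:
        """Account for one call, refusing it if the cap would be exceeded."""
        if self.spent_usd + cost_usd > self.budget_usd:
            raise BudgetExceeded(f"{model} would exceed ${self.budget_usd:.2f}")
        self.spent_usd += cost_usd
        self.calls.append((model, cost_usd))

    def summary(self) -> dict:
        return {
            "calls": len(self.calls),
            "spent_usd": round(self.spent_usd, 6),
            "remaining_usd": round(self.budget_usd - self.spent_usd, 6),
        }


@contextmanager
def run(budget: float) -> Iterator[BudgetSession]:
    """Toy analogue of a budgeted run: one session per `with` block."""
    yield BudgetSession(budget_usd=budget)


with run(budget=0.50) as session:
    session.record("model-a", 0.20)  # illustrative costs, not real prices
    session.record("model-b", 0.25)
    try:
        session.record("model-a", 0.10)  # would bring spend to $0.55
        denied = False
    except BudgetExceeded:
        denied = True

print(session.summary())
```

The third call is refused because it would exceed the cap, while the first two remain recorded in the summary.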

Six Dimensions, One Decision

Every agent step is scored across six dimensions simultaneously:

Dimension	What it controls	Example
Cost	USD per LLM call from pricing table	Budget cap of $0.50 per run
Latency	Wall-clock milliseconds per call	Max 2000ms per call
Quality	Model quality priors for routing	60% weight on quality KPI
Budget	Cumulative spend tracking and caps	Per-user daily limits
Compliance	Model allowlists per regulation	GDPR: only approved models
Energy	Compute-intensity coefficients	Carbon-aware model selection
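One way to picture how six dimensions collapse into a single decision is a function that applies hard constraints first (budget, latency, compliance) and then ranks the remaining models by a weighted score (quality, cost, latency, energy). The sketch below is illustrative only: `ModelProfile`, `StepContext`, `Decision`, `feasible`, `score`, `decide`, the weights, and the model figures are all assumptions, not cascadeflow's API or scoring formula. Only the four action names come from the comparison table above.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ModelProfile:
    name: str
    cost_usd: float     # estimated cost per call
    latency_ms: float   # expected wall-clock latency
    quality: float      # quality prior in [0, 1]
    energy: float       # relative compute-intensity coefficient in [0, 1]


@dataclass(frozen=True)
class StepContext:
    requested: ModelProfile
    spent_usd: float
    budget_usd: float
    max_latency_ms: float
    allowed_models: frozenset   # compliance allowlist
    tool: Optional[str] = None
    denied_tools: frozenset = frozenset()


@dataclass(frozen=True)
class Decision:
    action: str                 # allow | switch_model | deny_tool | stop
    model: Optional[str] = None


def feasible(m: ModelProfile, ctx: StepContext) -> bool:
    """Hard constraints: compliance, remaining budget, latency cap."""
    remaining = ctx.budget_usd - ctx.spent_usd
    return (m.name in ctx.allowed_models
            and m.cost_usd <= remaining
            and m.latency_ms <= ctx.max_latency_ms)


def score(m: ModelProfile, ctx: StepContext, quality_weight: float = 0.6) -> float:
    """Soft ranking: reward quality, penalize cost, latency, and energy."""
    rest = (1.0 - quality_weight) / 3
    return (quality_weight * m.quality
            - rest * (m.cost_usd / ctx.budget_usd)
            - rest * (m.latency_ms / ctx.max_latency_ms)
            - rest * m.energy)


def decide(ctx: StepContext, catalog: Sequence[ModelProfile]) -> Decision:
    if ctx.tool is not None and ctx.tool in ctx.denied_tools:
        return Decision("deny_tool")
    if ctx.spent_usd >= ctx.budget_usd:
        return Decision("stop")
    if feasible(ctx.requested, ctx):
        return Decision("allow", ctx.requested.name)
    candidates = [m for m in catalog if feasible(m, ctx)]
    if candidates:
        best = max(candidates, key=lambda m: score(m, ctx))
        return Decision("switch_model", best.name)
    return Decision("stop")


# Made-up catalog for illustration.
small = ModelProfile("small", 0.01, 400.0, 0.70, 0.2)
large = ModelProfile("large", 0.10, 1800.0, 0.90, 0.8)
unapproved = ModelProfile("unapproved", 0.02, 500.0, 0.95, 0.3)
catalog = [small, large, unapproved]

ctx = StepContext(
    requested=large,
    spent_usd=0.45,
    budget_usd=0.50,
    max_latency_ms=2000.0,
    allowed_models=frozenset({"small", "large"}),
)
decision = decide(ctx, catalog)
print(decision)
```

Here the requested model no longer fits the remaining budget and the highest-quality alternative is not on the allowlist, so the sketch switches to the only model that satisfies every hard constraint.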

Works With Every Major Framework

Framework	Python	TypeScript	Type
LangChain / LangGraph	`cascadeflow[langchain]`	`@cascadeflow/langchain`	Callback handler
OpenAI Agents SDK	`cascadeflow[openai-agents]`	—	ModelProvider
CrewAI	`cascadeflow[crewai]`	—	llm_hooks
Google ADK	`cascadeflow[google-adk]`	—	BasePlugin
n8n	—	`@cascadeflow/n8n-nodes-cascadeflow`	Community node
Vercel AI SDK	—	`@cascadeflow/vercel-ai`	Middleware
Hermes Agent	`cascadeflow`	—	Delegation router

Explore

Agent Harness

Configure budget, compliance, KPI, and energy controls.

Agent Loop

How cascadeflow operates inside multi-step agent execution.

Examples

42+ Python and 33+ TypeScript examples on GitHub.

Integrations

LangChain, OpenAI Agents, CrewAI, Google ADK, n8n, Vercel AI, Hermes Agent.

API Reference

Full Python and TypeScript API documentation.

For Coding Agents

Canonical facts, repo map, and implementation entry points.
