Inside the Agent Loop

Most AI optimization operates at the HTTP boundary — one request in, one response out. cascadeflow operates inside the agent loop, with full visibility into every step of multi-turn execution.

Why This Matters

A typical agent workflow is not one call. It is a loop:
Query → Model Call → Tool Call → Model Call → Tool Call → Model Call → Response
  ↑         ↑            ↑           ↑            ↑           ↑
  └── cascadeflow evaluates every decision boundary ──────────┘
Each arrow is a decision point where cascadeflow can measure, score, and act. External proxies see only the outer boundary. cascadeflow sees all of them.
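The difference can be sketched in plain Python. This toy loop (not cascadeflow internals; all names here are illustrative) emits one record per boundary, which is exactly the visibility an in-process harness gets and an HTTP proxy does not:

```python
def run_loop(boundaries, evaluate):
    """Walk a toy agent loop, recording the action taken at each boundary."""
    trace = []
    for step, kind in enumerate(boundaries, start=1):
        trace.append({"step": step, "kind": kind, "action": evaluate(kind)})
    return trace

boundaries = ["model_call", "tool_call", "model_call", "tool_call", "model_call"]
trace = run_loop(boundaries, evaluate=lambda kind: "allow")

# A proxy at the HTTP boundary would see one request/response pair;
# the in-process loop observes all five decision points.
```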

Tool Call Interception

cascadeflow tracks and optionally gates tool calls as part of the agent loop:
import cascadeflow
from cascadeflow.tools import ToolConfig, ToolExecutor

tools = [
    ToolConfig(
        name="search",
        description="Search the web",
        parameters={"query": {"type": "string"}},
        handler=lambda query: f"Results for: {query}",
    ),
    ToolConfig(
        name="calculator",
        description="Evaluate math expressions",
        parameters={"expression": {"type": "string"}},
        handler=lambda expression: str(eval(expression)),  # demo only: eval is unsafe on untrusted input
    ),
]

cascadeflow.init(mode="enforce")

with cascadeflow.run(budget=1.00, max_tool_calls=5) as session:
    result = await agent.run(
        "Research this topic and calculate the statistics",
        tools=tools,
        tool_executor=ToolExecutor(tools=tools),
        max_steps=10,
    )

    summary = session.summary()
    print(f"Tool calls used: {summary['tool_calls']}/5")
    print(f"Budget used: ${summary['cost_total']:.4f}/$1.00")
When the tool call cap is reached, cascadeflow issues a deny_tool action — the agent continues with what it has instead of making more calls.
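The cap behaves like a simple gate. This is a hedged sketch of the logic described above, not cascadeflow's actual enforcement code:

```python
def gate_tool_call(calls_made: int, max_tool_calls: int) -> str:
    """Action for the next tool call: allow under the cap, deny_tool once it is hit."""
    return "allow" if calls_made < max_tool_calls else "deny_tool"

# With max_tool_calls=5, the first five calls pass and the rest are denied.
actions = [gate_tool_call(n, max_tool_calls=5) for n in range(7)]
```

Because the gate runs inside the loop, the denial reaches the agent as a tool result rather than a failed request, so the agent can finish with the information it already has.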

Budget Tracking Across Steps

The Harness tracks cumulative spend across every step in the loop. This prevents cost surprises in deep agent workflows:
with cascadeflow.run(budget=0.50) as session:
    result = await agent.run("Deep multi-step analysis")

    for record in session.trace():
        print(
            f"Step {record['step']}: "
            f"{record['action']} | "
            f"model={record['model']} | "
            f"spent=${record['cost_total']:.4f} | "
            f"budget={record['budget_state']}"
        )
    # Step 1: allow        | model=gpt-4o-mini | spent=$0.0012 | budget=ok
    # Step 2: allow        | model=gpt-4o-mini | spent=$0.0031 | budget=ok
    # Step 3: switch_model | model=gpt-4o      | spent=$0.0245 | budget=ok
    # Step 4: allow        | model=gpt-4o-mini | spent=$0.0258 | budget=ok
    # ...
    # Step 9: stop         | model=gpt-4o      | spent=$0.5012 | budget=exceeded
The agent ran 9 steps before hitting the budget cap. Without cascadeflow, steps 10-15 would have added unchecked cost.
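Note that `cost_total` in each trace record is cumulative, so the cost of an individual step is the delta between consecutive records. Using the sample figures from the trace above:

```python
# Cumulative cost_total values from the first four steps of the sample trace.
cumulative = [0.0012, 0.0031, 0.0245, 0.0258]

# Per-step cost is the difference between consecutive cumulative values.
per_step = [round(b - a, 4) for a, b in zip([0.0] + cumulative, cumulative)]
# Step 3 stands out: the switch to gpt-4o cost ~$0.0214, more than ten times
# the gpt-4o-mini steps around it.
```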

Sub-Agent Handoffs

When agents delegate to other agents, cascadeflow tracks budget and policy across the entire chain:
researcher = CascadeAgent(models=[
    ModelConfig(name="gpt-4o-mini", provider="openai", cost=0.000375),
    ModelConfig(name="gpt-4o", provider="openai", cost=0.00625),
])

async def research_handler(query: str) -> str:
    """Sub-agent: researches a topic."""
    result = await researcher.run(query)
    return result.content

tools = [
    ToolConfig(
        name="research",
        description="Delegate research to a specialist agent",
        parameters={"query": {"type": "string"}},
        handler=research_handler,
    ),
]

# One budget governs the entire agent tree
with cascadeflow.run(budget=2.00) as session:
    result = await main_agent.run(
        "Analyze and research this topic",
        tools=tools,
    )
    # session.summary() includes costs from main_agent AND researcher
    print(f"Total cost across all agents: ${session.summary()['cost_total']:.4f}")
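Conceptually, a shared budget means total spend is summed over records from every agent in the tree. The record shape below is an illustrative assumption modeled on the trace examples in this doc, not a guaranteed schema:

```python
# Hypothetical combined trace: the researcher's spend counts against the
# same $2.00 budget as the main agent's.
records = [
    {"agent": "main_agent", "cost": 0.0031},
    {"agent": "researcher", "cost": 0.0120},  # sub-agent call via the research tool
    {"agent": "main_agent", "cost": 0.0045},
]

total = sum(r["cost"] for r in records)
within_budget = total <= 2.00
```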

Model Switching Mid-Loop

The Harness can switch models during execution based on context:
# Quality-driven: cheaper model handles simple steps, better model handles hard ones
with cascadeflow.run(
    kpi_weights={"quality": 0.7, "cost": 0.3},
    kpi_targets={"quality": 0.85},
) as session:
    result = await agent.run("Complex multi-step reasoning task")

    # Trace shows model decisions per step
    for record in session.trace():
        if record['action'] == 'switch_model':
            print(f"Step {record['step']}: Switched to {record['model']} ({record['reason']})")
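The target-driven part of this decision can be sketched as follows. The function and estimates here are illustrative assumptions, not cascadeflow's actual scoring policy:

```python
def pick_model(est_quality: dict, target: float) -> str:
    """Escalate to the stronger model only when the cheap one misses the quality target."""
    return "gpt-4o-mini" if est_quality["gpt-4o-mini"] >= target else "gpt-4o"

# Simple step: the cheap model clears the 0.85 target, so no switch.
easy = pick_model({"gpt-4o-mini": 0.90}, target=0.85)

# Hard step: estimated quality falls short, so the Harness escalates.
hard = pick_model({"gpt-4o-mini": 0.78}, target=0.85)
```

The `kpi_weights` then trade off how aggressively quality wins over cost when both models clear the target.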

Latency Advantage in Loops

Every extra hop matters inside a loop. Proxy-based solutions add 40-60ms per call. In a 10-step agent loop, that is 400-600ms of pure overhead — latency that has nothing to do with the actual work. cascadeflow adds <1ms per step because it runs in-process:
| Agent loop depth | Proxy overhead | cascadeflow overhead |
| --- | --- | --- |
| 5 steps | 200-300ms | <5ms |
| 10 steps | 400-600ms | <10ms |
| 25 steps | 1-1.5s | <25ms |
For real-time UX, task throughput, and enterprise SLA performance, this compounding matters.
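The compounding is linear in loop depth: per-call overhead times number of steps. The arithmetic behind the table rows:

```python
def loop_overhead_ms(depth: int, per_call_ms: float) -> float:
    """Total overhead added to an agent loop of the given depth."""
    return depth * per_call_ms

# Proxy at 40-60ms per call vs. in-process at <1ms per step.
proxy_low = loop_overhead_ms(10, 40)    # 400ms
proxy_high = loop_overhead_ms(10, 60)   # 600ms
in_process = loop_overhead_ms(10, 1)    # upper bound: 10ms
```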

Complete Loop Example

import cascadeflow
from cascadeflow import CascadeAgent, ModelConfig
from cascadeflow.tools import ToolConfig, ToolExecutor

agent = CascadeAgent(models=[
    ModelConfig(name="gpt-4o-mini", provider="openai", cost=0.000375),
    ModelConfig(name="gpt-4o", provider="openai", cost=0.00625),
])

tools = [
    ToolConfig(name="search", description="Web search", parameters={"q": {"type": "string"}},
              handler=lambda q: f"Results for {q}"),
    ToolConfig(name="calc", description="Calculator", parameters={"expr": {"type": "string"}},
              handler=lambda expr: str(eval(expr))),  # demo only: eval is unsafe on untrusted input
]

cascadeflow.init(mode="enforce")

with cascadeflow.run(
    budget=1.00,
    max_tool_calls=8,
    compliance="gdpr",
    kpi_weights={"quality": 0.6, "cost": 0.3, "latency": 0.1},
) as session:
    result = await agent.run(
        "Research EU market data and calculate growth rates",
        tools=tools,
        tool_executor=ToolExecutor(tools=tools),
        max_steps=15,
    )

    summary = session.summary()
    print(f"Cost: ${summary['cost_total']:.4f} / $1.00")
    print(f"Steps: {summary['steps']}")
    print(f"Tool calls: {summary['tool_calls']} / 8")
    print(f"Budget remaining: ${summary['budget_remaining']:.4f}")

Next Step

Plan your production rollout. Follow the Rollout Guide →