Inside the Agent Loop

Most AI optimization operates at the HTTP boundary — one request in, one response out. cascadeflow operates inside the agent loop, with full visibility into every step of multi-turn execution.

Why This Matters

A typical agent workflow is not one call. It is a loop:
Query → Model Call → Tool Call → Model Call → Tool Call → Model Call → Response
  ↑         ↑            ↑           ↑            ↑           ↑
  └── cascadeflow evaluates every decision boundary ──────────┘
Each arrow is a decision point where cascadeflow can measure, score, and act. External proxies see only the outer boundary. cascadeflow sees all of them.
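The difference can be sketched in plain Python. This toy loop (not cascadeflow internals; all names here are illustrative) emits one record per boundary, which is exactly the visibility an in-process harness gets and an HTTP proxy does not:

```python
def run_loop(boundaries, evaluate):
    """Walk a toy agent loop, recording the action taken at each boundary."""
    trace = []
    for step, kind in enumerate(boundaries, start=1):
        trace.append({"step": step, "kind": kind, "action": evaluate(kind)})
    return trace

boundaries = ["model_call", "tool_call", "model_call", "tool_call", "model_call"]
trace = run_loop(boundaries, evaluate=lambda kind: "allow")

# A proxy at the HTTP boundary would see one request/response pair;
# the in-process loop observes all five decision points.
```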

Tool Call Interception

cascadeflow tracks and optionally gates tool calls as part of the agent loop:
import cascadeflow
from cascadeflow.tools import ToolConfig, ToolExecutor

tools = [
    ToolConfig(
        name="search",
        description="Search the web",
        parameters={"query": {"type": "string"}},
        handler=lambda query: f"Results for: {query}",
    ),
    ToolConfig(
        name="calculator",
        description="Evaluate math expressions",
        parameters={"expression": {"type": "string"}},
        handler=lambda expression: str(eval(expression)),  # demo only: eval is unsafe on untrusted input
    ),
]

cascadeflow.init(mode="enforce")

with cascadeflow.run(budget=1.00, max_tool_calls=5) as session:
    result = await agent.run(
        "Research this topic and calculate the statistics",
        tools=tools,
        tool_executor=ToolExecutor(tools=tools),
        max_steps=10,
    )

    summary = session.summary()
    print(f"Tool calls used: {summary['tool_calls']}/5")
    print(f"Budget used: ${summary['cost_total']:.4f}/$1.00")
When the tool call cap is reached, cascadeflow issues a deny_tool action — the agent continues with what it has instead of making more calls.
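The cap behaves like a simple gate. This is a hedged sketch of the logic described above, not cascadeflow's actual enforcement code:

```python
def gate_tool_call(calls_made: int, max_tool_calls: int) -> str:
    """Action for the next tool call: allow under the cap, deny_tool once it is hit."""
    return "allow" if calls_made < max_tool_calls else "deny_tool"

# With max_tool_calls=5, the first five calls pass and the rest are denied.
actions = [gate_tool_call(n, max_tool_calls=5) for n in range(7)]
```

Because the gate runs inside the loop, the denial reaches the agent as a tool result rather than a failed request, so the agent can finish with the information it already has.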

Budget Tracking Across Steps

The Harness tracks cumulative spend across every step in the loop. This prevents cost surprises in deep agent workflows:
with cascadeflow.run(budget=0.50) as session:
    result = await agent.run("Deep multi-step analysis")

    for record in session.trace():
        print(
            f"Step {record['step']}: "
            f"{record['action']} | "
            f"model={record['model']} | "
            f"spent=${record['cost_total']:.4f} | "
            f"budget={record['budget_state']}"
        )
    # Step 1: allow        | model=gpt-4o-mini | spent=$0.0012 | budget=ok
    # Step 2: allow        | model=gpt-4o-mini | spent=$0.0031 | budget=ok
    # Step 3: switch_model | model=gpt-4o      | spent=$0.0245 | budget=ok
    # Step 4: allow        | model=gpt-4o-mini | spent=$0.0258 | budget=ok
    # ...
    # Step 9: stop         | model=gpt-4o      | spent=$0.5012 | budget=exceeded
The agent ran 9 steps before hitting the budget cap. Without cascadeflow, steps 10-15 would have added unchecked cost.
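Note that `cost_total` in each trace record is cumulative, so the cost of an individual step is the delta between consecutive records. Using the sample figures from the trace above:

```python
# Cumulative cost_total values from the first four steps of the sample trace.
cumulative = [0.0012, 0.0031, 0.0245, 0.0258]

# Per-step cost is the difference between consecutive cumulative values.
per_step = [round(b - a, 4) for a, b in zip([0.0] + cumulative, cumulative)]
# Step 3 stands out: the switch to gpt-4o cost ~$0.0214, more than ten times
# the gpt-4o-mini steps around it.
```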

Sub-Agent Handoffs

When agents delegate to other agents, cascadeflow tracks budget and policy across the entire chain:
researcher = CascadeAgent(models=[
    ModelConfig(name="gpt-4o-mini", provider="openai", cost=0.000375),
    ModelConfig(name="gpt-4o", provider="openai", cost=0.00625),
])

async def research_handler(query: str) -> str:
    """Sub-agent: researches a topic."""
    result = await researcher.run(query)
    return result.content

tools = [
    ToolConfig(
        name="research",
        description="Delegate research to a specialist agent",
        parameters={"query": {"type": "string"}},
        handler=research_handler,
    ),
]

# One budget governs the entire agent tree
with cascadeflow.run(budget=2.00) as session:
    result = await main_agent.run(
        "Analyze and research this topic",
        tools=tools,
    )
    # session.summary() includes costs from main_agent AND researcher
    print(f"Total cost across all agents: ${session.summary()['cost_total']:.4f}")
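Conceptually, a shared budget means total spend is summed over records from every agent in the tree. The record shape below is an illustrative assumption modeled on the trace examples in this doc, not a guaranteed schema:

```python
# Hypothetical combined trace: the researcher's spend counts against the
# same $2.00 budget as the main agent's.
records = [
    {"agent": "main_agent", "cost": 0.0031},
    {"agent": "researcher", "cost": 0.0120},  # sub-agent call via the research tool
    {"agent": "main_agent", "cost": 0.0045},
]

total = sum(r["cost"] for r in records)
within_budget = total <= 2.00
```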

Model Switching Mid-Loop

The Harness can switch models during execution based on context:
# Quality-driven: cheaper model handles simple steps, better model handles hard ones
with cascadeflow.run(
    kpi_weights={"quality": 0.7, "cost": 0.3},
    kpi_targets={"quality": 0.85},
) as session:
    result = await agent.run("Complex multi-step reasoning task")

    # Trace shows model decisions per step
    for record in session.trace():
        if record['action'] == 'switch_model':
            print(f"Step {record['step']}: Switched to {record['model']} ({record['reason']})")
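The target-driven part of this decision can be sketched as follows. The function and estimates here are illustrative assumptions, not cascadeflow's actual scoring policy:

```python
def pick_model(est_quality: dict, target: float) -> str:
    """Escalate to the stronger model only when the cheap one misses the quality target."""
    return "gpt-4o-mini" if est_quality["gpt-4o-mini"] >= target else "gpt-4o"

# Simple step: the cheap model clears the 0.85 target, so no switch.
easy = pick_model({"gpt-4o-mini": 0.90}, target=0.85)

# Hard step: estimated quality falls short, so the Harness escalates.
hard = pick_model({"gpt-4o-mini": 0.78}, target=0.85)
```

The `kpi_weights` then trade off how aggressively quality wins over cost when both models clear the target.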

Latency Advantage in Loops

Every extra hop matters inside a loop. Proxy-based solutions add 40-60ms per call. In a 10-step agent loop, that is 400-600ms of pure overhead — latency that has nothing to do with the actual work. cascadeflow adds <1ms per step because it runs in-process:
| Agent loop depth | Proxy overhead | cascadeflow overhead |
| --- | --- | --- |
| 5 steps | 200-300ms | <5ms |
| 10 steps | 400-600ms | <10ms |
| 25 steps | 1-1.5s | <25ms |
For real-time UX, task throughput, and enterprise SLA performance, this compounding matters.
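The compounding is linear in loop depth: per-call overhead times number of steps. The arithmetic behind the table rows:

```python
def loop_overhead_ms(depth: int, per_call_ms: float) -> float:
    """Total overhead added to an agent loop of the given depth."""
    return depth * per_call_ms

# Proxy at 40-60ms per call vs. in-process at <1ms per step.
proxy_low = loop_overhead_ms(10, 40)    # 400ms
proxy_high = loop_overhead_ms(10, 60)   # 600ms
in_process = loop_overhead_ms(10, 1)    # upper bound: 10ms
```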

Complete Loop Example

import cascadeflow
from cascadeflow import CascadeAgent, ModelConfig
from cascadeflow.tools import ToolConfig, ToolExecutor

agent = CascadeAgent(models=[
    ModelConfig(name="gpt-4o-mini", provider="openai", cost=0.000375),
    ModelConfig(name="gpt-4o", provider="openai", cost=0.00625),
])

tools = [
    ToolConfig(name="search", description="Web search", parameters={"q": {"type": "string"}},
              handler=lambda q: f"Results for {q}"),
    ToolConfig(name="calc", description="Calculator", parameters={"expr": {"type": "string"}},
              handler=lambda expr: str(eval(expr))),  # demo only: eval is unsafe on untrusted input
]

cascadeflow.init(mode="enforce")

with cascadeflow.run(
    budget=1.00,
    max_tool_calls=8,
    compliance="gdpr",
    kpi_weights={"quality": 0.6, "cost": 0.3, "latency": 0.1},
) as session:
    result = await agent.run(
        "Research EU market data and calculate growth rates",
        tools=tools,
        tool_executor=ToolExecutor(tools=tools),
        max_steps=15,
    )

    summary = session.summary()
    print(f"Cost: ${summary['cost_total']:.4f} / $1.00")
    print(f"Steps: {summary['steps']}")
    print(f"Tool calls: {summary['tool_calls']} / 8")
    print(f"Budget remaining: ${summary['budget_remaining']:.4f}")

Next Step

Plan your production rollout. Follow the Rollout Guide →