
Agent Harness

The Harness is the core of cascadeflow’s runtime intelligence. It wraps agent execution and makes a decision at every step: should this model call proceed, switch to another model, or stop, and should this tool call be allowed?

What the Harness Does

At every LLM call or tool execution inside an agent loop, the Harness:
  1. Checks hard constraints — budget remaining, compliance allowlist, tool call cap, latency limit, energy limit
  2. Scores soft dimensions — quality, cost, latency, energy weighted by KPI priorities
  3. Decides an action: allow, switch_model, deny_tool, or stop
  4. Records a trace — action, reason, model, step, cost, budget state
In observe mode, decisions are recorded but not enforced. In enforce mode, they shape execution in real time.
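To make the observe/enforce distinction concrete, here is a minimal sketch in plain Python. It does not use cascadeflow: `Decision`, `TraceRecord` and `apply_decision` are hypothetical names, the trace fields simply mirror the ones listed in step 4 plus an `applied` flag, and treating "off" as "no tracking at all" is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    # Hypothetical decision object produced by steps 1-3
    action: str  # "allow" | "switch_model" | "deny_tool" | "stop"
    reason: str
    model: str


@dataclass
class TraceRecord:
    # Mirrors step 4: action, reason, model, step, cost, budget state
    action: str
    reason: str
    model: str
    step: int
    cost: float
    budget_remaining: float
    applied: bool  # True only when the decision actually shaped execution


def apply_decision(mode: str, decision: Decision, step: int, cost: float,
                   budget_remaining: float, trace: list) -> str:
    """Record a decision and return the action that actually takes effect."""
    if mode == "off":
        return "allow"  # assumption: "off" neither records nor enforces
    enforced = mode == "enforce"
    trace.append(TraceRecord(decision.action, decision.reason, decision.model,
                             step, cost, budget_remaining, applied=enforced))
    # Observe mode: the decision is recorded, but execution proceeds unchanged
    return decision.action if enforced else "allow"
```

The same decision yields "allow" in observe mode and "stop" in enforce mode; only the `applied` flag in the trace tells the two runs apart.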

HarnessConfig — The Full Control Surface

All Harness behavior is configured through a single dataclass:
```python
from cascadeflow import HarnessConfig

config = HarnessConfig(
    mode="enforce",                     # "off" | "observe" | "enforce"
    verbose=False,                      # Print decisions to stderr

    # Hard constraints
    budget=0.50,                        # Max USD for the run
    max_tool_calls=10,                  # Max tool/function calls
    max_latency_ms=5000.0,              # Max wall-clock ms per call
    max_energy=100.0,                   # Max energy units

    # Soft scoring
    kpi_weights={                       # Relative importance (must sum to ~1.0)
        "quality": 0.6,
        "cost": 0.3,
        "latency": 0.1,
    },
    kpi_targets={"quality": 0.9},       # Target values for KPI dimensions

    # Compliance
    compliance="gdpr",                  # "gdpr" | "hipaa" | "pci" | "strict"
)
```
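The comments in the config encode a few invariants: a fixed set of modes and compliance presets, KPI weights that sum to roughly 1.0, and positive hard limits. The sketch below checks a plain dict of these settings before you build the config. `check_harness_settings` is a hypothetical helper, not part of cascadeflow, and the 0.05 tolerance used for "~1.0" is an assumption.

```python
import math

ALLOWED_MODES = {"off", "observe", "enforce"}
ALLOWED_COMPLIANCE = {"gdpr", "hipaa", "pci", "strict"}
KPI_KEYS = {"quality", "cost", "latency", "energy"}


def check_harness_settings(settings: dict, tol: float = 0.05) -> list:
    """Return a list of problems found in a dict of HarnessConfig-style settings."""
    problems = []
    if settings.get("mode", "off") not in ALLOWED_MODES:
        problems.append(f"unknown mode: {settings['mode']!r}")
    compliance = settings.get("compliance")
    if compliance is not None and compliance not in ALLOWED_COMPLIANCE:
        problems.append(f"unknown compliance preset: {compliance!r}")
    weights = settings.get("kpi_weights", {})
    unknown = set(weights) - KPI_KEYS
    if unknown:
        problems.append(f"unknown KPI keys: {sorted(unknown)}")
    # "must sum to ~1.0": accept small deviations (tolerance is an assumption)
    if weights and not math.isclose(sum(weights.values()), 1.0, abs_tol=tol):
        problems.append(f"kpi_weights sum to {sum(weights.values()):.2f}, expected ~1.0")
    for key in ("budget", "max_tool_calls", "max_latency_ms", "max_energy"):
        value = settings.get(key)
        if value is not None and value <= 0:
            problems.append(f"{key} must be positive")
    return problems
```

Once the list comes back empty, the dict can be passed as `HarnessConfig(**settings)`, assuming the constructor accepts these keyword arguments as in the example above.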

The Three-Tier API

cascadeflow offers three levels of control — use the one that fits your needs:

Tier 1: Global Init (Zero-Change)

```python
import cascadeflow
cascadeflow.init(mode="observe")
# All LLM calls are tracked. Nothing changes.
```
Best for: first rollout, measuring baseline costs, auditing compliance.

Tier 2: Scoped Run (Block-Level Control)

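```python
import cascadeflow

cascadeflow.init(mode="enforce")

async def analyze(agent):
    # `await` must run inside a coroutine; `agent` is your own agent object
    with cascadeflow.run(budget=0.50, compliance="gdpr") as session:
        result = await agent.run("Analyze EU data")
        print(session.summary())
    return result
```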
Best for: per-request budgets, scoped policy, session-level metrics.

Tier 3: Agent Decorator (Per-Agent Policy)

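```python
import cascadeflow

@cascadeflow.agent(
    budget=1.00,
    compliance="hipaa",
    kpi_weights={"quality": 0.8, "cost": 0.2},
)
async def medical_agent(query: str):
    # `llm` is your own LLM client; its calls run under this agent's policy
    return await llm.complete(query)
```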
Best for: multi-agent systems where each agent has different constraints.
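A per-agent policy can be pictured as a decorator that installs its settings only while the wrapped coroutine runs. The sketch below assumes nothing about cascadeflow's real decorator: `agent_policy`, `current_policy` and `triage_agent` are hypothetical names, and instead of calling an LLM the agents simply report the policy they see.

```python
import asyncio
import contextvars
import functools

# Hypothetical: the policy in force for the currently running agent
_active_policy = contextvars.ContextVar("active_policy", default=None)


def agent_policy(**policy):
    """Attach a per-agent policy to an async function (illustrative only)."""
    def decorator(fn):
        @functools.wraps(fn)
        async def wrapper(*args, **kwargs):
            token = _active_policy.set(policy)
            try:
                return await fn(*args, **kwargs)
            finally:
                _active_policy.reset(token)  # never leak into the caller
        wrapper.policy = policy  # expose for inspection
        return wrapper
    return decorator


def current_policy() -> dict:
    return _active_policy.get() or {}


@agent_policy(budget=1.00, compliance="hipaa", kpi_weights={"quality": 0.8, "cost": 0.2})
async def medical_agent(query: str) -> dict:
    return {"query": query, "compliance": current_policy().get("compliance")}


@agent_policy(budget=0.05, kpi_weights={"quality": 0.3, "cost": 0.7})
async def triage_agent(query: str) -> dict:
    return {"query": query, "budget": current_policy().get("budget")}
```

Running both agents concurrently with `asyncio.gather` shows each one seeing only its own policy, because every task gets its own copy of the context.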

Decision Priority

When the Harness evaluates a step, it follows a strict priority order:
| Priority | Check | Action if violated |
|---|---|---|
| 1 | Budget exhausted | stop |
| 2 | Compliance allowlist | switch_model or stop |
| 3 | Tool call cap | deny_tool |
| 4 | Latency limit | switch_model |
| 5 | Energy limit | switch_model |
| 6 | KPI scoring | allow or switch_model |
Hard constraints (budget, compliance) always take priority over soft scoring (KPI weights).
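The priority table reads naturally as a chain of early returns: the first violated check wins. The following is a sketch of that ordering only. `StepState`, `Limits` and `decide` are hypothetical, and modelling the compliance allowlist as a set of model names, the latency check as the current model's observed latency, and KPI scoring as a precomputed boolean are simplifying assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StepState:
    # Hypothetical snapshot of the run when a step is evaluated
    spent: float
    model: str
    tool_calls: int
    is_tool_call: bool
    latency_ms: float  # observed latency of the current model (assumption)
    energy: float


@dataclass
class Limits:
    budget: Optional[float] = None
    allowed_models: Optional[set] = None  # compliance allowlist
    max_tool_calls: Optional[int] = None
    max_latency_ms: Optional[float] = None
    max_energy: Optional[float] = None


def decide(state: StepState, limits: Limits, fallback_model: Optional[str],
           kpi_prefers_switch: bool) -> tuple:
    """Return (action, reason), checking constraints in the documented priority order."""
    # 1. Budget exhausted -> stop
    if limits.budget is not None and state.spent >= limits.budget:
        return "stop", "budget exhausted"
    # 2. Compliance allowlist -> switch to a compliant model, or stop if none exists
    if limits.allowed_models is not None and state.model not in limits.allowed_models:
        if fallback_model in limits.allowed_models:
            return "switch_model", "model not on compliance allowlist"
        return "stop", "no compliant model available"
    # 3. Tool call cap -> deny this tool call
    if (state.is_tool_call and limits.max_tool_calls is not None
            and state.tool_calls >= limits.max_tool_calls):
        return "deny_tool", "tool call cap reached"
    # 4. Latency limit -> switch model
    if limits.max_latency_ms is not None and state.latency_ms > limits.max_latency_ms:
        return "switch_model", "latency limit exceeded"
    # 5. Energy limit -> switch model
    if limits.max_energy is not None and state.energy > limits.max_energy:
        return "switch_model", "energy limit exceeded"
    # 6. Soft KPI scoring runs only after every hard check has passed
    if kpi_prefers_switch:
        return "switch_model", "better KPI score"
    return "allow", "within limits"
```

Because the budget check comes first, a step that is both over budget and non-compliant is stopped rather than switched.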

Six Dimensions at a Glance

| Dimension | Hard cap | Soft scoring | Deep dive |
|---|---|---|---|
| Cost | budget | kpi_weights.cost | Budget Enforcement |
| Quality | - | kpi_weights.quality | KPI Optimization |
| Latency | max_latency_ms | kpi_weights.latency | KPI Optimization |
| Compliance | compliance | - | Compliance Gating |
| Energy | max_energy | kpi_weights.energy | Energy Tracking |
| Tool calls | max_tool_calls | - | Budget Enforcement |
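The soft-scoring column can be read as a weighted sum over dimensions. The sketch below min-max normalizes each dimension across candidate models (quality maximized; cost, latency and energy minimized) and combines them with kpi_weights. The normalization scheme, the candidate numbers, and the function names are assumptions for illustration; cascadeflow's actual scoring, including how kpi_targets is used, may differ.

```python
# Hypothetical per-model estimates; real values would come from your own measurements
CANDIDATES = {
    "small-model": {"quality": 0.70, "cost": 0.001, "latency": 300.0, "energy": 1.0},
    "large-model": {"quality": 0.95, "cost": 0.020, "latency": 1200.0, "energy": 8.0},
}


def normalize(values: dict, higher_is_better: bool) -> dict:
    """Min-max scale one dimension across candidates to [0, 1], where 1 is best."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        return {k: 1.0 for k in values}
    scaled = {k: (v - lo) / (hi - lo) for k, v in values.items()}
    return scaled if higher_is_better else {k: 1.0 - s for k, s in scaled.items()}


def kpi_scores(candidates: dict, weights: dict) -> dict:
    """Weighted sum of normalized dimensions for each candidate model."""
    scores = {name: 0.0 for name in candidates}
    for dim, weight in weights.items():
        column = {name: metrics[dim] for name, metrics in candidates.items()}
        # Quality is maximized; cost, latency and energy are minimized
        for name, s in normalize(column, higher_is_better=(dim == "quality")).items():
            scores[name] += weight * s
    return scores


def best_model(candidates: dict, weights: dict) -> str:
    scores = kpi_scores(candidates, weights)
    return max(scores, key=scores.get)
```

With the weights from the HarnessConfig example (quality 0.6, cost 0.3, latency 0.1), the larger model scores 0.6 against 0.4 and wins; shifting weight toward cost flips the choice. With only two candidates, min-max scaling turns each dimension into a 0/1 vote; with more candidates the scores become graded.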

Observe vs Enforce

| Behavior | Observe | Enforce |
|---|---|---|
| Tracks cost, latency, energy | Yes | Yes |
| Records decision trace | Yes | Yes |
| Blocks on budget exceeded | No | Yes |
| Switches non-compliant models | No | Yes |
| Denies tool calls at cap | No | Yes |
| Stops execution | No | Yes |
| applied field in trace() records | false | true |
Start with observe to validate your policies against real traffic. Switch to enforce when you are confident the rules are correct.
Run this example: examples/enforcement/basic_enforcement.py | API reference: HarnessConfig
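Validating policies in observe mode amounts to asking how often enforce mode would have intervened. The sketch below summarizes such a trace. It assumes each trace record is available as a dict with action, reason and applied keys, which is an illustrative format rather than cascadeflow's documented trace() output; `would_have_enforced` and the 1% stop-rate gate in `ready_for_enforce` are hypothetical.

```python
from collections import Counter


def would_have_enforced(trace: list) -> dict:
    """Summarize observe-mode decisions that enforce mode would have applied."""
    interventions = [r for r in trace if r["action"] != "allow" and not r["applied"]]
    return {
        "steps": len(trace),
        "interventions": len(interventions),
        "by_action": dict(Counter(r["action"] for r in interventions)),
        "by_reason": dict(Counter(r["reason"] for r in interventions)),
    }


def ready_for_enforce(summary: dict, max_stop_rate: float = 0.01) -> bool:
    """Hypothetical rollout gate: switch to enforce only if few steps would be stopped."""
    if summary["steps"] == 0:
        return False
    stops = summary["by_action"].get("stop", 0)
    return stops / summary["steps"] <= max_stop_rate
```

The by_reason breakdown points at the rule to revisit first: for example, many "latency limit exceeded" switches suggest max_latency_ms is set too tight for real traffic.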

Next Step

See how the Harness operates inside multi-step agent loops. Understand the Agent Loop →