The harness scores each model decision against configurable KPI weights. This lets teams encode business priorities into agent behavior without changing agent code.

KPI Dimensions

| Dimension | Score source | Range | What it means |
| --- | --- | --- | --- |
| quality | Model quality priors | 0.0-1.0 | Higher = better output quality |
| cost | Inverse of model cost | 0.0-1.0 | Higher = cheaper model |
| latency | Model latency priors | 0.0-1.0 | Higher = faster response |
| energy | Inverse of energy coefficient | 0.0-1.0 | Higher = lower compute intensity |

Configuration

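import asyncio

import cascadeflow

cascadeflow.init(mode="enforce")

async def main():
    # `agent` stands for your existing agent object; it is not defined in this snippet.
    with cascadeflow.run(
        kpi_weights={"quality": 0.6, "cost": 0.3, "latency": 0.1},
        kpi_targets={"quality": 0.9},
    ) as session:
        result = await agent.run("Analyze this legal document")

# `await` is only valid inside a coroutine, so the call is wrapped in main().
asyncio.run(main())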

Weights

Weights are relative: they don’t need to sum to 1.0 because they are normalized internally. They set the importance of each dimension in the composite score.
# Quality-first (premium workload)
kpi_weights = {"quality": 0.8, "cost": 0.1, "latency": 0.1}

# Cost-first (high-volume batch)
kpi_weights = {"quality": 0.2, "cost": 0.7, "latency": 0.1}

# Balanced
kpi_weights = {"quality": 0.4, "cost": 0.3, "latency": 0.2, "energy": 0.1}
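
To make the normalization concrete, here is a minimal sketch of how relative weights could be scaled to sum to 1.0. The helper `normalize_weights` is hypothetical and written only for illustration; it is not part of the cascadeflow API, and the library's internal normalization may differ in detail.

```python
def normalize_weights(weights: dict[str, float]) -> dict[str, float]:
    """Scale weights so they sum to 1.0 (illustrative helper, not cascadeflow code)."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return {name: value / total for name, value in weights.items()}


# Weights written as an 8:1:1 ratio express the same priorities as 0.8/0.1/0.1.
ratio_style = normalize_weights({"quality": 8, "cost": 1, "latency": 1})
fraction_style = normalize_weights({"quality": 0.8, "cost": 0.1, "latency": 0.1})
```

Under this assumption only the ratios between weights matter, so a team can write them in whatever scale reads most naturally.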

Targets

Targets set minimum acceptable values. If a model’s score for a dimension falls below the target, it is penalized in the composite score.
kpi_targets = {
    "quality": 0.9,   # Require high quality
    "latency": 0.7,   # Require reasonable speed
}
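
The sketch below shows one way to see which targets a candidate model would miss. It only reports the shortfall per dimension; how cascadeflow turns a shortfall into a penalty is not specified here, so no penalty formula is assumed. The helper `missed_targets` is hypothetical, and the two score dictionaries are example inputs.

```python
def missed_targets(scores: dict[str, float], targets: dict[str, float]) -> dict[str, float]:
    """Return the shortfall for each scored dimension that is below its target."""
    return {
        dim: round(target - scores[dim], 4)
        for dim, target in targets.items()
        if dim in scores and scores[dim] < target
    }


kpi_targets = {"quality": 0.9, "latency": 0.7}

# Example per-dimension scores for two candidate models (illustrative inputs)
fast_model = {"quality": 0.75, "latency": 0.93}
strong_model = {"quality": 0.95, "latency": 0.40}

fast_gaps = missed_targets(fast_model, kpi_targets)      # misses the quality target
strong_gaps = missed_targets(strong_model, kpi_targets)  # misses the latency target
```

With these targets, neither example model is penalty-free: one falls short on quality and the other on latency.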

Scoring Formula

The composite score for a model is:
score = quality_prior * w_quality + cost_utility * w_cost + latency_prior * w_latency + energy_utility * w_energy
Where w_* are the normalized weights and utility values are computed from model priors.
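
As a worked example of the formula above, the following sketch computes a composite score from per-dimension values and relative weights. The function `composite_score` is a hypothetical illustration rather than the library's implementation, and the cost and energy values are made-up inputs; target penalties are left out.

```python
def composite_score(utilities: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of per-dimension values using normalized weights (illustrative)."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    # Dimensions that have no weight contribute nothing to the score.
    return sum(utilities[dim] * w / total for dim, w in weights.items())


# Hypothetical per-dimension values for a single model
utilities = {"quality": 0.90, "cost": 0.50, "latency": 0.72, "energy": 0.50}
kpi_weights = {"quality": 0.6, "cost": 0.3, "latency": 0.1}

score = composite_score(utilities, kpi_weights)
# 0.90 * 0.6 + 0.50 * 0.3 + 0.72 * 0.1 = 0.762
```

Because energy has no weight in this configuration, its value does not affect the result.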

Quality Priors

Built-in quality and latency priors for common OpenAI models:
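| Model | Quality | Latency |
| --- | --- | --- |
| o1 | 0.95 | 0.40 |
| gpt-4o | 0.90 | 0.72 |
| gpt-4-turbo | 0.88 | 0.66 |
| gpt-4 | 0.87 | 0.52 |
| gpt-5-mini | 0.86 | 0.84 |
| o1-mini | 0.82 | 0.60 |
| o3-mini | 0.80 | 0.78 |
| gpt-4o-mini | 0.75 | 0.93 |
| gpt-3.5-turbo | 0.65 | 1.00 |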
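
To illustrate how these priors interact with weights, the sketch below ranks the listed models using only the quality and latency columns. Cost and energy values are not listed here, so they are given zero weight; this is a simplification for illustration, and the helper `best_model` is hypothetical rather than part of cascadeflow.

```python
# (quality prior, latency prior) copied from the list above
PRIORS = {
    "o1": (0.95, 0.40),
    "gpt-4o": (0.90, 0.72),
    "gpt-4-turbo": (0.88, 0.66),
    "gpt-4": (0.87, 0.52),
    "gpt-5-mini": (0.86, 0.84),
    "o1-mini": (0.82, 0.60),
    "o3-mini": (0.80, 0.78),
    "gpt-4o-mini": (0.75, 0.93),
    "gpt-3.5-turbo": (0.65, 1.00),
}


def best_model(w_quality: float, w_latency: float) -> str:
    """Pick the model with the highest weighted sum of quality and latency priors."""
    total = w_quality + w_latency
    return max(
        PRIORS,
        key=lambda name: (PRIORS[name][0] * w_quality + PRIORS[name][1] * w_latency) / total,
    )


quality_first = best_model(w_quality=0.8, w_latency=0.2)
latency_first = best_model(w_quality=0.2, w_latency=0.8)
```

Note that the quality-first weighting selects gpt-4o rather than o1: the small latency weight is enough to outweigh o1's 0.05 quality advantage. The latency-first weighting selects gpt-3.5-turbo.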

Per-Agent KPI Weights

Different agents can have different priorities:
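import cascadeflow

# `llm` stands for your own LLM client; it is not defined in this snippet.

# Quality-first agent with the larger budget
@cascadeflow.agent(
    budget=0.50,
    kpi_weights={"quality": 0.8, "cost": 0.2}
)
async def quality_agent(query: str):
    return await llm.complete(query)

# Cost-first agent with the smaller budget
@cascadeflow.agent(
    budget=0.10,
    kpi_weights={"cost": 0.8, "quality": 0.2}
)
async def budget_agent(query: str):
    return await llm.complete(query)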