KPI-Weighted Routing

The harness scores each model decision against configurable KPI weights. This lets teams encode business priorities into agent behavior without changing agent code.

KPI Dimensions

Dimension	Score Source	Range	What it means
`quality`	Model quality priors	0.0-1.0	Higher = better output quality
`cost`	Inverse of model cost	0.0-1.0	Higher = cheaper model
`latency`	Model latency priors	0.0-1.0	Higher = faster response
`energy`	Inverse of energy coefficient	0.0-1.0	Higher = lower compute intensity
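The `cost` and `energy` rows describe an inverse mapping: the more expensive a model, the lower its score. The page does not say how raw cost is scaled into the 0.0-1.0 range, so the following is only a minimal sketch that assumes a min-max inversion across the candidate models. The function name `inverse_utility`, the model names, and the cost figures are all hypothetical placeholders.

```python
def inverse_utility(values: dict[str, float]) -> dict[str, float]:
    """Map raw costs to 0.0-1.0 so that the cheapest entry scores 1.0.

    Assumption: min-max inversion. The real harness may scale differently.
    """
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        # All candidates cost the same, so none is penalized.
        return {name: 1.0 for name in values}
    return {name: (hi - v) / (hi - lo) for name, v in values.items()}


# Placeholder per-request costs, not real prices.
costs = {"model-a": 10.0, "model-b": 2.5, "model-c": 0.5}
utility = inverse_utility(costs)
```

Under this assumption the most expensive candidate scores 0.0 and the cheapest scores 1.0, which matches the "Higher = cheaper model" reading in the table.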

Configuration

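import asyncio
import cascadeflow

cascadeflow.init(mode="enforce")

async def main():
    # `agent` is assumed to be defined elsewhere in your application.
    with cascadeflow.run(
        kpi_weights={"quality": 0.6, "cost": 0.3, "latency": 0.1},
        kpi_targets={"quality": 0.9},
    ) as session:
        # `await` is only valid inside an async function.
        result = await agent.run("Analyze this legal document")

asyncio.run(main())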

Weights

Weights are relative: they do not need to sum to 1.0 because they are normalized internally. Only the ratios between weights matter, so they set how much each dimension counts toward the composite score.

# Quality-first (premium workload)
kpi_weights = {"quality": 0.8, "cost": 0.1, "latency": 0.1}

# Cost-first (high-volume batch)
kpi_weights = {"quality": 0.2, "cost": 0.7, "latency": 0.1}

# Balanced
kpi_weights = {"quality": 0.4, "cost": 0.3, "latency": 0.2, "energy": 0.1}
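Because weights are normalized, two weight dictionaries with the same ratios should behave identically. The sketch below illustrates that idea, assuming normalization means dividing each weight by the sum of all weights; `normalize_weights` is a hypothetical helper, not part of the cascadeflow API.

```python
def normalize_weights(weights: dict[str, float]) -> dict[str, float]:
    """Scale weights so they sum to 1.0 while preserving their ratios."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return {name: value / total for name, value in weights.items()}


# Same ratios, different scales: both normalize to the same weights.
a = normalize_weights({"quality": 6, "cost": 3, "latency": 1})
b = normalize_weights({"quality": 0.6, "cost": 0.3, "latency": 0.1})
```

Under this assumption, `{"quality": 6, "cost": 3, "latency": 1}` and `{"quality": 0.6, "cost": 0.3, "latency": 0.1}` express the same priorities.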

Targets

Targets set the minimum acceptable score for a dimension. If a model’s score for that dimension falls below the target, the model is penalized in the composite score.

kpi_targets = {
    "quality": 0.9,   # Require high quality
    "latency": 0.7,   # Require reasonable speed
}
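The page does not specify the form of the penalty. As a rough illustration only, the sketch below assumes a linear penalty proportional to how far a score falls short of its target; `target_penalty` and `penalty_scale` are hypothetical names. The quality and latency values are the priors listed for `gpt-4o` and `gpt-4o-mini` in the Quality Priors table.

```python
def target_penalty(
    scores: dict[str, float],
    targets: dict[str, float],
    penalty_scale: float = 1.0,
) -> float:
    """Sum the shortfall below each target (assumed linear penalty)."""
    shortfall = sum(
        max(0.0, target - scores.get(dim, 0.0))
        for dim, target in targets.items()
    )
    return penalty_scale * shortfall


targets = {"quality": 0.9, "latency": 0.7}
gpt_4o = {"quality": 0.90, "latency": 0.72}
gpt_4o_mini = {"quality": 0.75, "latency": 0.93}

penalty_4o = target_penalty(gpt_4o, targets)         # meets both targets
penalty_mini = target_penalty(gpt_4o_mini, targets)  # misses quality by 0.15
```

With these targets, `gpt-4o` meets both minimums and incurs no penalty, while `gpt-4o-mini` falls short on quality and would be pushed down the ranking.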

Scoring Formula

The composite score for a model is:

score = quality_prior * w_quality + cost_utility * w_cost + latency_prior * w_latency + energy_utility * w_energy

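Here each w_* is the normalized weight for its dimension, and the utility values (cost_utility, energy_utility) are computed from model priors as the inverse of cost and of the energy coefficient.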
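The formula can be sketched in a few lines to show how different weights flip the preferred model. This is an illustration, not the harness implementation: `composite_score` is a hypothetical helper, dimensions missing from the weights are assumed to count as zero, and the cost and energy utilities below are made-up placeholders (only the quality and latency values come from the Quality Priors table).

```python
DIMENSIONS = ("quality", "cost", "latency", "energy")


def composite_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of per-dimension scores using normalized weights."""
    total = sum(weights.get(d, 0.0) for d in DIMENSIONS)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return sum(
        scores.get(d, 0.0) * weights.get(d, 0.0) / total for d in DIMENSIONS
    )


# quality/latency from the priors table; cost/energy are placeholder values.
gpt_4o = {"quality": 0.90, "cost": 0.50, "latency": 0.72, "energy": 0.50}
gpt_4o_mini = {"quality": 0.75, "cost": 0.95, "latency": 0.93, "energy": 0.90}

quality_first = {"quality": 0.8, "cost": 0.1, "latency": 0.1}
cost_first = {"quality": 0.2, "cost": 0.7, "latency": 0.1}

for label, weights in (("quality-first", quality_first), ("cost-first", cost_first)):
    big = composite_score(gpt_4o, weights)
    small = composite_score(gpt_4o_mini, weights)
    print(f"{label}: gpt-4o={big:.3f} gpt-4o-mini={small:.3f}")
```

With these placeholder utilities, the quality-first weights favor `gpt-4o` and the cost-first weights favor `gpt-4o-mini`, which is the behavior the weights are meant to encode.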

Quality Priors

Built-in quality and latency priors for common OpenAI models:

Model	Quality	Latency
o1	0.95	0.40
gpt-4o	0.90	0.72
gpt-4-turbo	0.88	0.66
gpt-4	0.87	0.52
gpt-5-mini	0.86	0.84
o1-mini	0.82	0.60
o3-mini	0.80	0.78
gpt-4o-mini	0.75	0.93
gpt-3.5-turbo	0.65	1.00
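To get a feel for how these priors interact, the table can be ranked under different quality and latency weights. The sketch below uses only the two columns above and ignores cost, energy, and targets, so it is a simplification of what the harness does; `PRIORS` and `best_model` are hypothetical names.

```python
# (quality prior, latency prior) copied from the table above.
PRIORS = {
    "o1": (0.95, 0.40),
    "gpt-4o": (0.90, 0.72),
    "gpt-4-turbo": (0.88, 0.66),
    "gpt-4": (0.87, 0.52),
    "gpt-5-mini": (0.86, 0.84),
    "o1-mini": (0.82, 0.60),
    "o3-mini": (0.80, 0.78),
    "gpt-4o-mini": (0.75, 0.93),
    "gpt-3.5-turbo": (0.65, 1.00),
}


def best_model(w_quality: float, w_latency: float) -> str:
    """Return the model with the highest quality/latency weighted score."""
    total = w_quality + w_latency
    if total <= 0:
        raise ValueError("at least one weight must be positive")

    def score(item: tuple[str, tuple[float, float]]) -> float:
        quality, latency = item[1]
        return (quality * w_quality + latency * w_latency) / total

    return max(PRIORS.items(), key=score)[0]
```

Weighting quality alone selects `o1` and weighting latency alone selects `gpt-3.5-turbo`; a 0.7/0.3 quality-to-latency split selects `gpt-5-mini`, which is strong on both columns.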

Per-Agent KPI Weights

Different agents can have different priorities:

@cascadeflow.agent(
    budget=0.50,
    kpi_weights={"quality": 0.8, "cost": 0.2}
)
async def quality_agent(query: str):
    return await llm.complete(query)

@cascadeflow.agent(
    budget=0.10,
    kpi_weights={"cost": 0.8, "quality": 0.2}
)
async def budget_agent(query: str):
    return await llm.complete(query)

Example walkthrough: KPI-Weighted Routing | GitHub: examples/cost_tracking.py | examples/semantic_quality_domain_detection.py
