
ModelConfig

Defines a model in the cascade. Models are sorted by cost: cheaper models are tried first as drafters, and more expensive models serve as verifiers.

Definition

from cascadeflow import ModelConfig

model = ModelConfig(
    name="gpt-4o-mini",
    provider="openai",
    cost=0.000375,
    supports_tools=True,
)

Fields

| Field | Type | Default | Description |
|---|---|---|---|
| name | str | required | Model name (e.g., "gpt-4o-mini") |
| provider | str | required | Provider name (e.g., "openai", "anthropic") |
| cost | float | 0.0 | Cost per 1K tokens in USD |
| keywords | list[str] | [] | Keywords for domain routing |
| domains | list[str] | [] | Domain tags for routing |
| supports_tools | bool | False | Whether the model supports tool calling |
| supports_vision | bool | False | Whether the model supports vision input |
| max_tokens | int | 2000 | Maximum number of generated tokens |
| latency_ms | float | 100.0 | Estimated latency in milliseconds |
| temperature | float | 0.7 | Default sampling temperature |
| top_p | float | 1.0 | Top-p (nucleus) sampling value |
| frequency_penalty | float | 0.0 | Frequency penalty |
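
Because cost is expressed per 1K tokens, a rough per-request estimate is the token count divided by 1000, multiplied by the rate. The sketch below illustrates that arithmetic with the rates used in the examples on this page. The helper `estimate_cost_usd` is hypothetical and not part of cascadeflow, and the 1500-token request size is an arbitrary illustration.

```python
def estimate_cost_usd(total_tokens: int, cost_per_1k: float) -> float:
    """Return the estimated USD cost for total_tokens at a per-1K-token rate."""
    return total_tokens / 1000 * cost_per_1k


# Rates taken from the examples on this page (USD per 1K tokens).
DRAFTER_RATE = 0.000375   # gpt-4o-mini
VERIFIER_RATE = 0.00625   # gpt-4o

tokens = 1500  # arbitrary request size, for illustration only
draft_cost = estimate_cost_usd(tokens, DRAFTER_RATE)
verify_cost = estimate_cost_usd(tokens, VERIFIER_RATE)

print(f"drafter:  ${draft_cost:.7f}")
print(f"verifier: ${verify_cost:.7f}")
```

At these rates the verifier costs roughly 17 times as much per token as the drafter, which is why the cheaper model is tried first.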

Providers

| Provider | Value | Models |
|---|---|---|
| OpenAI | "openai" | gpt-4o, gpt-4o-mini, gpt-5, gpt-5-mini |
| Anthropic | "anthropic" | claude-opus-4.5, claude-sonnet-4, claude-haiku-3.5 |
| Groq | "groq" | llama-3.3-70b, mixtral-8x7b |
| Ollama | "ollama" | Any locally served model |
| vLLM | "vllm" | Any self-hosted model |
| OpenRouter | "openrouter" | Any OpenRouter model |
| Together | "together" | Any Together AI model |
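
Provider values are lowercase strings, so a typo such as "OpenAI" is easy to make. The following is a minimal sketch of a pre-flight check against the values in the table above. `KNOWN_PROVIDERS` and `check_provider` are hypothetical helpers, not part of cascadeflow, and the library may accept values that this page does not list.

```python
# Provider values copied from the table above (assumed, not exhaustive).
KNOWN_PROVIDERS = {
    "openai", "anthropic", "groq", "ollama",
    "vllm", "openrouter", "together",
}


def check_provider(value: str) -> str:
    """Return value unchanged if it is a listed provider, else raise ValueError."""
    if value not in KNOWN_PROVIDERS:
        raise ValueError(
            f"Unknown provider {value!r}; expected one of {sorted(KNOWN_PROVIDERS)}"
        )
    return value


print(check_provider("groq"))
```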

Examples

Two-Model Cascade

from cascadeflow import CascadeAgent, ModelConfig

agent = CascadeAgent(models=[
    ModelConfig(name="gpt-4o-mini", provider="openai", cost=0.000375),
    ModelConfig(name="gpt-4o", provider="openai", cost=0.00625),
])

Multi-Provider Cascade

from cascadeflow import CascadeAgent, ModelConfig

# List order need not follow cost; models are sorted by cost.
agent = CascadeAgent(models=[
    ModelConfig(name="llama-3.3-70b", provider="groq", cost=0.00059),
    ModelConfig(name="gpt-4o-mini", provider="openai", cost=0.000375),
    ModelConfig(name="claude-sonnet-4", provider="anthropic", cost=0.009),
])
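
The models in this example are not listed in cost order. Since models are sorted by cost, the order in which they are tried can differ from the order in which they are written. The sketch below illustrates this with a stand-in dataclass so that it runs without cascadeflow installed. `ModelStub` and `cascade_order` are hypothetical names, and the library's actual sorting logic may handle ties or other fields differently.

```python
from dataclasses import dataclass


@dataclass
class ModelStub:
    """Stand-in for ModelConfig, holding only the fields used here."""
    name: str
    provider: str
    cost: float = 0.0


def cascade_order(models):
    """Return models from cheapest to most expensive.

    sorted() is stable, so models with equal cost keep their listed order.
    """
    return sorted(models, key=lambda m: m.cost)


models = [
    ModelStub(name="llama-3.3-70b", provider="groq", cost=0.00059),
    ModelStub(name="gpt-4o-mini", provider="openai", cost=0.000375),
    ModelStub(name="claude-sonnet-4", provider="anthropic", cost=0.009),
]

order = [m.name for m in cascade_order(models)]
print(order)
```

Under this assumption, gpt-4o-mini would be tried first, llama-3.3-70b second, and claude-sonnet-4 last.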

With Domain Routing

from cascadeflow import ModelConfig

legal_model = ModelConfig(
    name="gpt-4o",
    provider="openai",
    cost=0.00625,
    domains=["legal", "compliance"],  # domain tags for routing
    keywords=["contract", "regulation", "statute"],  # keywords for domain routing
)
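
This page does not describe how keywords are matched against a query. As a purely illustrative sketch, one simple approach is a case-insensitive whole-word match. The function `matches_keywords` is hypothetical and is not cascadeflow's routing algorithm, which may use different matching or scoring.

```python
import re


def matches_keywords(query: str, keywords: list[str]) -> bool:
    """Return True if any keyword appears as a whole word in the query.

    Matching is case-insensitive. This is an assumed strategy for
    illustration, not the library's implementation.
    """
    words = set(re.findall(r"[a-z0-9]+", query.lower()))
    return any(keyword.lower() in words for keyword in keywords)


legal_keywords = ["contract", "regulation", "statute"]

print(matches_keywords("Please review this contract for me", legal_keywords))
print(matches_keywords("Write a haiku about autumn", legal_keywords))
```

Whole-word matching avoids false positives from substrings, at the cost of missing inflected forms such as "contracts".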

Local Model

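from cascadeflow import ModelConfig

local = ModelConfig(
    name="llama3:8b",
    provider="ollama",
    cost=0.0,  # Free: no per-token charge for a locally served model
    latency_ms=50.0,  # estimated latency in milliseconds
)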