Observe Mode — Zero-Change Visibility

Observe mode tracks every LLM call without blocking or modifying any behavior. This is the safest way to start: no enforcement, no model switching, just metrics.

Prerequisites

  • cascadeflow installed (Installation)
  • At least one provider API key set (OPENAI_API_KEY, ANTHROPIC_API_KEY, etc.)
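Before initializing, it can help to verify that a key is actually visible to the process. A minimal plain-Python sketch (the `has_provider_key` helper is ours, not part of cascadeflow; the key names are the ones listed above):

```python
import os

# Provider keys listed in the prerequisites above
PROVIDER_KEYS = ["OPENAI_API_KEY", "ANTHROPIC_API_KEY"]

def has_provider_key(env=os.environ):
    """Return True if at least one provider API key is set and non-empty."""
    return any(env.get(key) for key in PROVIDER_KEYS)

# Example: check the current environment before calling cascadeflow.init()
print(has_provider_key())
```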

1. Add one line

Add cascadeflow.init(mode="observe") before any LLM calls in your application:
import cascadeflow

cascadeflow.init(mode="observe")

# Your existing code — unchanged
from openai import OpenAI
client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is cascadeflow?"}],
)
print(response.choices[0].message.content)
Every call is now tracked. Nothing is blocked or changed.

2. See what you spend

Wrap a block with cascadeflow.run() to get aggregate metrics:
import cascadeflow
from openai import OpenAI

cascadeflow.init(mode="observe")
client = OpenAI()

with cascadeflow.run() as session:
    # Run your agent, chain, or direct LLM calls
    response1 = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Summarize this document"}],
    )
    response2 = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Analyze the sentiment"}],
    )

    summary = session.summary()
    print(f"Total cost:    ${summary['cost_total']:.4f}")
    print(f"LLM calls:     {summary['steps']}")
    print(f"Total latency: {summary['latency_total_ms']:.0f}ms")
    print(f"Energy used:   {summary['energy_used']:.1f} units")

3. Read the decision trace

Even in observe mode, cascadeflow records what it would have done:
for record in session.trace():
    print(f"Step {record['step']}: {record['action']} ({record['reason']})")
    print(f"  Model: {record['model']}, Cost so far: ${record['cost_total']:.4f}")
    print(f"  Applied: {record['applied']}")  # Always False in observe mode
This lets you audit compliance violations, budget overruns, and routing decisions before turning on enforcement.
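The trace can also be post-processed offline, for example to see where spend accumulates per model. A minimal sketch (the records below are hand-written stand-ins using the fields shown above; `cost_total` is cumulative, so each step's cost is the difference between consecutive records):

```python
from collections import defaultdict

# Stand-in trace records with the same fields shown above (cost_total is cumulative)
trace = [
    {"step": 1, "model": "gpt-4o-mini", "cost_total": 0.0004, "applied": False},
    {"step": 2, "model": "gpt-4o",      "cost_total": 0.0067, "applied": False},
    {"step": 3, "model": "gpt-4o-mini", "cost_total": 0.0071, "applied": False},
]

def cost_per_model(records):
    """Derive per-step cost from cumulative cost_total and sum it per model."""
    totals = defaultdict(float)
    previous = 0.0
    for record in sorted(records, key=lambda r: r["step"]):
        totals[record["model"]] += record["cost_total"] - previous
        previous = record["cost_total"]
    return dict(totals)

print(cost_per_model(trace))
```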

TypeScript

The TypeScript SDK reports per-run metrics on the result object:
import { CascadeAgent } from '@cascadeflow/core';

const agent = new CascadeAgent({
  models: [
    { name: 'gpt-4o-mini', provider: 'openai', cost: 0.000375 },
    { name: 'gpt-4o', provider: 'openai', cost: 0.00625 },
  ],
});

const result = await agent.run('What is TypeScript?');
console.log(`Model: ${result.modelUsed}`);
console.log(`Cost: $${result.totalCost}`);
console.log(`Saved: ${result.savingsPercentage}%`);

What You Learn in Observe Mode

  • How much each agent run actually costs
  • Which models are called and how often
  • Where latency accumulates across steps
  • Which calls would violate compliance policies
  • Whether budget caps would have triggered
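The last point can be checked offline: replay the per-call costs observed in a session against a candidate cap before turning enforcement on. A sketch (the cap, costs, and helper name are illustrative, not part of cascadeflow):

```python
def first_cap_breach(call_costs, cap):
    """Return the 1-based index of the call where cumulative cost first
    exceeds the cap, or None if the cap is never reached."""
    running = 0.0
    for i, cost in enumerate(call_costs, start=1):
        running += cost
        if running > cap:
            return i
    return None

# Per-call costs observed in an observe-mode session (illustrative numbers)
observed = [0.0004, 0.0063, 0.0004, 0.0125]

print(first_cap_breach(observed, cap=0.01))  # which call would trip a $0.01 cap
print(first_cap_breach(observed, cap=0.05))  # None: this cap is never reached
```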

Next Step

Ready to enforce constraints? Add budget enforcement →