Session Aware

Version: v0.3

Overview

session_aware selects one model from a decision's modelRefs while accounting for agentic multi-turn context. It wraps a base selector such as hybrid, then applies a router-owned stay-vs-switch policy that weighs session continuity, tool loops, idle timeout, handoff cost, switch history, confidence-gated remaining-turn priors, and prefix-cache cost.

Use it when clients send a stable x-session-id header and you want long-running agent sessions to avoid unnecessary model churn. Provider-managed continuation state, such as a Response API previous_response_id, is treated as non-portable state and hard-locks the session to the previous physical model. Router memory stores routing facts only, so model selection can reason about continuity without becoming application memory.
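A minimal sketch of the "routing facts only" idea and the provider-state hard lock follows. It assumes the router keeps a small per-session record keyed by x-session-id. SessionRoutingRecord, session_key, and provider_state_lock are hypothetical names for illustration, not the router's actual types or API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SessionRoutingRecord:
    """Hypothetical per-session record: routing facts only, no message content."""
    session_id: str
    last_model: Optional[str] = None      # physical model that served the last turn
    last_decision: Optional[str] = None   # matched decision name, for drift detection
    last_seen_unix: float = 0.0           # drives the idle-timeout check
    turn_index: int = 0                   # drives remaining-turn prior decay
    switch_times: list = field(default_factory=list)  # recent switch timestamps


def session_key(headers: dict) -> Optional[str]:
    """Return the client-supplied session id, or None for sessionless traffic."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    value = lowered.get("x-session-id", "").strip()
    return value or None


def provider_state_lock(record: SessionRoutingRecord, body: dict) -> Optional[str]:
    """Return the model a request is pinned to by provider-held state, if any.

    previous_response_id points at state stored by one backend, so the
    continuation cannot be served by a different physical model.
    """
    if body.get("previous_response_id") and record.last_model:
        return record.last_model
    return None
```

The record holds no prompts, completions, or tool payloads. That keeps router memory out of the application-memory role while still supporting idle, drift, and switch-history checks.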

Its configuration aligns with config/algorithm/selection/session-aware.yaml.

Scope Boundary

session_aware is an Agentic Context Routing policy for model selection. It chooses one model from the matched decision's modelRefs and emits router evidence for that decision. It does not choose upstream endpoints or override Envoy load balancing inside a selected cluster. Endpoint membership, locality, and load-balancing behavior stay in the Envoy or Kubernetes configuration that serves the selected model.

Key Advantages

  • Preserves KV/prefix-cache locality across long-horizon agent sessions.
  • Hard-locks active tool loops to avoid mid-loop model changes.
  • Lets idle sessions reselect after the cache is likely cold.
  • Uses replay-derived remaining-turn priors to be stricter for task families that usually continue for many turns.
  • Scales switch cost up for expensive/frontier model checkouts.
  • Records session_policy in router replay for audit, experiments, and paper/blog analysis.

What Problem Does It Solve?

Single-turn routers often pick the best model for the latest message only. In long-running agent loops, that behavior can churn models between tool calls, waste prefix-cache locality, and make frontier model checkouts unnecessarily expensive. session_aware makes the router aware of session continuity before it decides whether a switch is worth the cost.

When to Use

  • Clients set a stable x-session-id header or use Response API conversation IDs.
  • The route serves agents that call tools over multiple turns.
  • Candidate models have materially different costs or prefix-cache behavior.
  • You want replayable policy traces for experiments and release validation.

Configuration

```yaml
routing:
  decisions:
    - name: agentic_routing
      rules:
        operator: AND
        conditions:
          - type: conversation
            name: active_tool_use
      modelRefs:
        - model: qwen3-8b
        - model: qwen3-32b
      algorithm:
        type: session_aware
        session_aware:
          base_method: hybrid
          idle_timeout_seconds: 300
          tool_loop_hard_lock: true
          context_portability_hard_lock: true
          decision_drift_reset: true
          prefix_cache_weight: 0.20
          switch_history_weight: 0.04
          remaining_turn_prior_weight: 1.0
          remaining_turn_prior_horizon: 8
          min_remaining_turn_prior_samples: 3
```
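The sketch below loads the session_aware mapping into a typed object and enforces the two constraints stated in the Policy section: remaining_turn_prior_horizon must be positive, and max_cache_cost_multiplier must be at least 1. SessionAwareConfig is a hypothetical name. The defaults are copied from the example above and are not necessarily the router's built-in defaults. The example does not set max_cache_cost_multiplier, so its 1.0 default is an assumption made only for this sketch.

```python
from dataclasses import dataclass, fields


@dataclass
class SessionAwareConfig:
    # Defaults copied from the example config; not necessarily the router's own.
    base_method: str = "hybrid"
    idle_timeout_seconds: float = 300
    tool_loop_hard_lock: bool = True
    context_portability_hard_lock: bool = True
    decision_drift_reset: bool = True
    prefix_cache_weight: float = 0.20
    switch_history_weight: float = 0.04
    remaining_turn_prior_weight: float = 1.0
    remaining_turn_prior_horizon: int = 8
    min_remaining_turn_prior_samples: int = 3
    # Placeholder: not set in the example, so 1.0 (neutral) is an assumption.
    max_cache_cost_multiplier: float = 1.0

    def __post_init__(self) -> None:
        # Constraints stated in the Policy section of this page.
        if self.remaining_turn_prior_horizon <= 0:
            raise ValueError("remaining_turn_prior_horizon must be positive")
        if self.max_cache_cost_multiplier < 1:
            raise ValueError("max_cache_cost_multiplier must be at least 1")

    @classmethod
    def from_dict(cls, raw: dict) -> "SessionAwareConfig":
        """Build from the parsed session_aware mapping, rejecting unknown keys."""
        known = {f.name for f in fields(cls)}
        unknown = sorted(set(raw) - known)
        if unknown:
            raise ValueError(f"unknown session_aware keys: {unknown}")
        return cls(**raw)
```

Rejecting unknown keys catches typos such as a misspelled weight. Otherwise the typo would silently fall back to a default.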

Policy

  • Tool loops stay on the previous model while tool calls/results are still active.
  • Provider-state continuations such as Response API previous_response_id stay on the previous model because that context is not portable across backends.
  • Decision drift resets continuity penalties when a session moves to a different matched decision, so a new task direction can reselect without waiting for idle timeout.
  • Non-idle sessions pay a prefix-cache and handoff penalty before switching.
  • Idle sessions can reselect after idle_timeout_seconds.
  • Expensive/frontier models scale up the prefix-cache penalty according to input checkout cost (prompt_per_1m - cached_input_per_1m), so the policy is stricter about checkout churn when losing reusable prefix state is expensive. max_cache_cost_multiplier must be at least 1, which keeps the multiplier neutral or stricter and never weakens high-cost checkout discipline.
  • Recent switch history increases the cost of another switch, preventing long-horizon agents from bouncing between models.
  • If lookup tables contain a remaining_turn_prior for the matched category or decision, and that prior has at least min_remaining_turn_prior_samples samples, it lifts continuation mass during early turns and decays as the session advances. remaining_turn_prior_horizon must be positive so the decay window is explicit.
  • Router replay stores session_policy, including base scores, adjusted scores, hard-lock reasons, cache warmth, remaining-turn prior source and sample count, handoff penalties, and net switch advantage.
  • Provider-reported cached prompt tokens are recorded as telemetry and costed with cached_input_per_1m; client-facing usage is not rewritten.
  • Backend errors belong to backend or session recovery and should be handled outside this selector's stay-vs-switch policy.
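The policy bullets above describe behavior rather than exact formulas. The sketch below shows one hypothetical way to combine them, and it is not the router's implementation:

- Hard locks are checked first.
- Idle timeout and decision drift disable continuity penalties.
- Each switch candidate pays a prefix-cache penalty scaled by a cost multiplier, plus a switch-history penalty.
- A sufficiently sampled remaining-turn prior scales that penalty up early in the session.

PolicyKnobs, stay_or_switch, prior_continue (a continuation probability in [0, 1]), checkout_cost, normalizing cost against the cheapest candidate, the (1 + prior_lift) scaling, and the 2.0 value for max_cache_cost_multiplier are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PolicyKnobs:
    # Values from the example config; max_cache_cost_multiplier is a placeholder.
    idle_timeout_seconds: float = 300
    tool_loop_hard_lock: bool = True
    context_portability_hard_lock: bool = True
    decision_drift_reset: bool = True
    prefix_cache_weight: float = 0.20
    switch_history_weight: float = 0.04
    remaining_turn_prior_weight: float = 1.0
    remaining_turn_prior_horizon: int = 8
    min_remaining_turn_prior_samples: int = 3
    max_cache_cost_multiplier: float = 2.0


def stay_or_switch(
    base_scores: dict,               # model -> base selector score (higher is better)
    prev_model: Optional[str],       # model that served the previous turn, if any
    knobs: Optional[PolicyKnobs] = None,
    *,
    in_tool_loop: bool = False,      # tool calls/results still active
    provider_state: bool = False,    # e.g. previous_response_id present
    decision_changed: bool = False,  # matched decision differs from last turn's
    idle_seconds: float = 0.0,
    recent_switches: int = 0,
    turn_index: int = 0,
    prior_continue: Optional[float] = None,  # hypothetical prior in [0, 1]
    prior_samples: int = 0,
    checkout_cost: Optional[dict] = None,    # model -> prompt_per_1m - cached_input_per_1m
) -> dict:
    knobs = knobs or PolicyKnobs()
    trace = {"base_scores": dict(base_scores), "hard_lock": None, "cache_warm": False,
             "prior_lift": 0.0, "net_switch_advantage": None}
    # No usable history: behave exactly like the base selector.
    if prev_model not in base_scores:
        trace.update(adjusted_scores=dict(base_scores),
                     selected=max(base_scores, key=base_scores.get))
        return trace
    # Hard locks win before any scoring.
    lock = None
    if in_tool_loop and knobs.tool_loop_hard_lock:
        lock = "tool_loop"
    elif provider_state and knobs.context_portability_hard_lock:
        lock = "provider_state"
    if lock:
        trace.update(hard_lock=lock, adjusted_scores=dict(base_scores), selected=prev_model)
        return trace

    cache_warm = idle_seconds < knobs.idle_timeout_seconds
    drift_reset = decision_changed and knobs.decision_drift_reset
    penalize = cache_warm and not drift_reset
    trace["cache_warm"] = cache_warm

    # Hypothetical multiplier: checkout cost relative to the cheapest candidate,
    # clamped to [1, max_cache_cost_multiplier] so it never weakens the penalty.
    costs = checkout_cost or {}
    cheapest = min(costs.values(), default=0.0)

    def cache_multiplier(model: str) -> float:
        if cheapest <= 0 or model not in costs:
            return 1.0
        return min(max(costs[model] / cheapest, 1.0), knobs.max_cache_cost_multiplier)

    # Confidence-gated prior: ignored below the sample threshold, and it decays
    # linearly to zero once turn_index reaches the horizon.
    prior_lift = 0.0
    if prior_continue is not None and prior_samples >= knobs.min_remaining_turn_prior_samples:
        decay = max(0.0, 1.0 - turn_index / knobs.remaining_turn_prior_horizon)
        prior_lift = knobs.remaining_turn_prior_weight * prior_continue * decay
    trace["prior_lift"] = prior_lift

    adjusted = {}
    for model, score in base_scores.items():
        if model == prev_model or not penalize:
            adjusted[model] = score
            continue
        penalty = (knobs.prefix_cache_weight * cache_multiplier(model)
                   + knobs.switch_history_weight * recent_switches)
        adjusted[model] = score - penalty * (1.0 + prior_lift)

    # Switch only if the best challenger beats staying after penalties.
    challengers = [m for m in adjusted if m != prev_model]
    best = max(challengers, key=adjusted.get, default=None)
    advantage = adjusted[best] - adjusted[prev_model] if best is not None else 0.0
    trace.update(adjusted_scores=adjusted, net_switch_advantage=advantage,
                 selected=best if best is not None and advantage > 0 else prev_model)
    return trace
```

Consider a warm session on qwen3-8b (base 0.62) where qwen3-32b scores 0.70. The 0.20 prefix-cache penalty drops qwen3-32b to about 0.50, so the session stays put. After 600 idle seconds the penalty no longer applies and the higher-scoring model wins. The returned dict loosely mirrors a subset of the session_policy fields recorded in router replay.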