Model Selection Overview
Model selection is an advanced feature of vLLM Semantic Router that automatically chooses the best LLM from multiple candidates based on learned preferences, query similarity, and cost-quality optimization.
The semantic router supports 8 selection algorithms across two categories:
- Core algorithms: Static, Elo, RouterDC, AutoMix, Hybrid
- RL-driven algorithms: Thompson Sampling, GMTRouter, Router-R1
What Problem Does It Solve?
When you have multiple LLM backends (e.g., GPT-4, Claude, Llama, Mistral), you face a challenge: which model should handle each request?
Traditional approaches:
- Static routing: Always use the same model (simple but suboptimal)
- Round-robin: Distribute evenly (ignores model strengths)
- Random: No intelligence (wastes resources)
Model selection solves this by intelligently matching queries to models based on:
- Learned quality preferences (Elo ratings from user feedback)
- Query-model similarity (RouterDC embeddings)
- Cost-quality tradeoffs (AutoMix optimization)
- Combined signals (Hybrid approach)
Available Algorithms
Core Algorithms
| Algorithm | Best For | Key Benefit |
|---|---|---|
| Static | Simple deployments | Predictable, zero overhead |
| Elo | Learning from feedback | Adapts to user preferences |
| RouterDC | Query-model matching | Matches specialties to queries |
| AutoMix | Cost optimization | Balances quality and cost |
| Hybrid | Complex requirements | Combines all methods |
RL-Driven Algorithms
| Algorithm | Best For | Key Benefit |
|---|---|---|
| Thompson Sampling | Exploration/exploitation | Bayesian adaptive learning |
| GMTRouter | Personalization | Per-user preference learning |
| Router-R1 | Complex reasoning | LLM-powered routing decisions |
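To make the exploration/exploitation idea behind Thompson Sampling concrete, here is a minimal sketch of the classical Beta-Bernoulli bandit. The class name `ThompsonSelector`, its methods, and the model names are hypothetical and chosen for illustration; the router's actual implementation may differ.

```python
import random


class ThompsonSelector:
    """Illustrative Beta-Bernoulli Thompson Sampling (not the router's actual code)."""

    def __init__(self, models, seed=None):
        # One [alpha, beta] pair per model, starting from a uniform Beta(1, 1) prior.
        self.stats = {model: [1, 1] for model in models}
        self.rng = random.Random(seed)

    def select(self):
        # Draw one sample from each model's posterior and pick the largest draw.
        draws = {
            model: self.rng.betavariate(alpha, beta)
            for model, (alpha, beta) in self.stats.items()
        }
        return max(draws, key=draws.get)

    def update(self, model, success):
        # Positive feedback increments alpha, negative feedback increments beta.
        self.stats[model][0 if success else 1] += 1


selector = ThompsonSelector(["model-a", "model-b"], seed=0)
for _ in range(100):
    selector.update("model-a", success=True)
    selector.update("model-b", success=False)

picks = [selector.select() for _ in range(100)]
```

With little feedback the posteriors are wide, so every model still gets tried (exploration). As feedback accumulates, the draws concentrate and the better model is chosen almost every time (exploitation).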
Quick Start
Basic Configuration (Per-Decision)
Model selection is configured per-decision, allowing different strategies for different query types:
decisions:
  - name: tech
    description: "Technical queries"
    priority: 10
    rules:
      operator: "OR"
      conditions:
        - type: "domain"
          name: "tech"
    modelRefs:
      - model: "llama3.2:3b"
      - model: "phi4"
      - model: "gemma3:27b"
    algorithm:
      type: "elo"  # Use Elo rating for this decision
      elo:
        k_factor: 32
        category_weighted: true
Algorithm Types
Static (Default)
Uses the first model in modelRefs. No learning, fully deterministic.
algorithm:
  type: "static"
Elo Rating
Learns from user feedback to rank models by quality.
algorithm:
  type: "elo"
  elo:
    k_factor: 32
    storage_path: "/var/lib/vsr/elo.json"
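To show what `k_factor` controls, the sketch below applies the standard Elo update to one pairwise comparison. It assumes the router uses the classical Elo formula, which this page does not confirm. The function names and the starting rating of 1500 are illustrative.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(winner: float, loser: float, k_factor: float = 32.0) -> tuple[float, float]:
    """Return the new (winner, loser) ratings after one comparison."""
    # The rating shift grows with k_factor and with how surprising the result was.
    delta = k_factor * (1.0 - expected_score(winner, loser))
    return winner + delta, loser - delta


# Two models start at the same (assumed) rating; the first one wins.
new_winner, new_loser = update_elo(1500.0, 1500.0)
print(new_winner, new_loser)  # 1516.0 1484.0
```

A larger `k_factor` makes ratings react faster to new feedback but also makes them noisier.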
RouterDC
Matches query embeddings to model descriptions.
algorithm:
  type: "router_dc"
  router_dc:
    temperature: 0.07
    require_descriptions: true
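The role of `temperature` can be illustrated with a temperature-scaled softmax over cosine similarities between a query embedding and per-model embeddings. This is a simplified sketch, not the router's implementation: the two-dimensional vectors and the model names are made up, and real embeddings would come from an encoder.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def route_probabilities(query_vec, model_vecs, temperature=0.07):
    """Softmax over query-model similarities, sharpened by a low temperature."""
    sims = {name: cosine(query_vec, vec) for name, vec in model_vecs.items()}
    top = max(sims.values())  # subtract the max for numerical stability
    exps = {name: math.exp((s - top) / temperature) for name, s in sims.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


# Hypothetical embeddings of two model descriptions and one query.
model_vecs = {"code-model": [1.0, 0.0], "chat-model": [0.0, 1.0]}
probs = route_probabilities([0.9, 0.1], model_vecs)
best = max(probs, key=probs.get)
```

A low temperature such as 0.07 makes the distribution sharply peaked, so the closest model gets nearly all of the probability mass.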
AutoMix
Optimizes the cost-quality tradeoff using a POMDP (partially observable Markov decision process).
algorithm:
  type: "automix"
  automix:
    cost_quality_tradeoff: 0.4
Hybrid
Combines the Elo, RouterDC, AutoMix, and cost signals using configurable weights.
algorithm:
  type: "hybrid"
  hybrid:
    elo_weight: 0.3
    router_dc_weight: 0.3
    automix_weight: 0.2
    cost_weight: 0.2
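One plausible reading of these weights is a weighted sum of per-model signals. The sketch below assumes each signal is already normalized to [0, 1] and that a higher cost signal means a cheaper model. Both are assumptions, as are the candidate names and signal values; the router may normalize and combine signals differently.

```python
# Weights mirror the configuration above.
WEIGHTS = {"elo": 0.3, "router_dc": 0.3, "automix": 0.2, "cost": 0.2}


def hybrid_score(signals, weights=WEIGHTS):
    """Weighted sum of normalized signals for a single candidate model."""
    return sum(weights[name] * signals[name] for name in weights)


# Hypothetical normalized signals for two candidates (higher is better).
candidates = {
    "small-model": {"elo": 0.40, "router_dc": 0.50, "automix": 0.60, "cost": 0.90},
    "large-model": {"elo": 0.90, "router_dc": 0.70, "automix": 0.50, "cost": 0.20},
}

scores = {model: hybrid_score(signals) for model, signals in candidates.items()}
best = max(scores, key=scores.get)
```

Here the larger model wins (about 0.62 versus 0.57) because its quality signals outweigh its cost penalty. Raising `cost_weight` would shift the balance toward the cheaper model.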
How It Works
┌────────────────────────────────────────────────────────────┐
│                         User Query                         │
│                "Explain quantum computing"                 │
└────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌────────────────────────────────────────────────────────────┐
│                     Decision Matching                      │
│             Decision "tech" matches → 3 models             │
└────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌────────────────────────────────────────────────────────────┐
│                    Selection Algorithm                     │
│                                                            │
│                   algorithm.type: "elo"                    │
│                                                            │
│  ┌──────────────────────────────────────────────────────┐  │
│  │  EloSelector.Select()                                │  │
│  │                                                      │  │
│  │  Model Ratings:                                      │  │
│  │    llama3.2:3b → 1468 (0 wins, 2 losses)             │  │
│  │    phi4        → 1501 (3 wins, 2 losses)             │  │
│  │    gemma3:27b  → 1531 (5 wins, 1 loss) ← HIGHEST     │  │
│  └──────────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌────────────────────────────────────────────────────────────┐
│                    Selected: gemma3:27b                    │
│                 (highest Elo rating: 1531)                 │
└────────────────────────────────────────────────────────────┘
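The final step of the flow above, picking the highest-rated candidate, can be sketched in a few lines. The function name `select_highest_rated` and the fallback rating for unseen models are illustrative assumptions and do not reflect the actual `EloSelector.Select()` code.

```python
# Ratings taken from the diagram above.
ratings = {"llama3.2:3b": 1468, "phi4": 1501, "gemma3:27b": 1531}


def select_highest_rated(candidates, ratings, default_rating=1500):
    """Return the candidate with the highest rating.

    Models without a stored rating fall back to default_rating (an assumption).
    """
    return max(candidates, key=lambda model: ratings.get(model, default_rating))


selected = select_highest_rated(["llama3.2:3b", "phi4", "gemma3:27b"], ratings)
print(selected)  # gemma3:27b
```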
Choosing an Algorithm
See Choosing the Right Algorithm for detailed guidance.
Quick Decision Tree:
- Just getting started? → Use static (default)
- Have user feedback? → Use elo
- Have model descriptions? → Use router_dc
- Want cost optimization? → Use automix
- Need everything? → Use hybrid
Related Features
- User Feedback Routing - Collect feedback signals via the /api/v1/feedback endpoint
- Preference Routing - Route based on user preferences in the system
- Domain Routing - Route by topic category using embedding classification
Reference Papers
The selection algorithms are based on these research papers:
Core Algorithms
- Elo: Inspired by preference-based routing concepts; see RouteLLM (Ong et al., ICLR 2025) which trains static routers achieving ~50% cost reduction (2x savings)
- RouterDC: Query-Based Router by Dual Contrastive Learning (NeurIPS 2024) - +2.76% accuracy improvement
- AutoMix: Automatically Mixing Language Models (NeurIPS 2024) - >50% cost reduction
- Hybrid: Cost-Efficient Quality-Aware Query Routing (ICLR 2024) - 40% fewer expensive calls
RL-Driven Algorithms
- Thompson Sampling: Classical multi-armed bandit approach; see A Tutorial on Thompson Sampling (Russo, Van Roy et al.)
- GMTRouter: GMTRouter: Personalized LLM Router over Multi-turn User Interactions (Wang et al.) - 0.9-21.6% accuracy improvement
- Router-R1: Router-R1: Teaching LLMs Multi-Round Routing and Aggregation via RL (Hu et al., NeurIPS 2025)