Model Selection Overview
Model selection is an advanced feature of vLLM Semantic Router that automatically chooses the best LLM from multiple candidates based on learned preferences, query similarity, and cost-quality optimization.
The semantic router supports 9 selection algorithms across two categories:
- Core algorithms: Static, Latency-Aware, Elo, RouterDC, AutoMix, Hybrid
- RL-driven algorithms: Thompson Sampling, GMTRouter, Router-R1
What Problem Does It Solve?
When you have multiple LLM backends (e.g., GPT-4, Claude, Llama, Mistral), you face a challenge: which model should handle each request?
Traditional approaches:
- Static routing: Always use the same model (simple but suboptimal)
- Round-robin: Distribute evenly (ignores model strengths)
- Random: Pick a model arbitrarily (ignores both the query and each model's cost)
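For comparison, the three baseline strategies above can be sketched in a few lines. This is an illustrative sketch only, not code from the router: the model names and the function names (`static_route`, `make_round_robin`, `random_route`) are hypothetical. Note that none of the functions ever inspects the query, which is the limitation model selection is meant to address.

```python
import itertools
import random

# Hypothetical backend names, used only for illustration.
MODELS = ["model-a", "model-b", "model-c"]


def static_route(query: str) -> str:
    """Static routing: always return the same model."""
    return MODELS[0]


def make_round_robin(models):
    """Round-robin: return a router that cycles through the models in order."""
    cycle = itertools.cycle(models)

    def route(query: str) -> str:
        # The query is ignored; only the position in the cycle matters.
        return next(cycle)

    return route


def random_route(query: str, rng=random) -> str:
    """Random routing: pick any model, regardless of the query."""
    return rng.choice(MODELS)
```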
Model selection addresses this by matching each query to a model based on:
- Learned quality preferences (Elo ratings from user feedback)
- Query-model similarity (RouterDC embeddings)
- Cost-quality tradeoffs (AutoMix optimization)
- Combined signals (Hybrid approach)
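To make the first signal concrete, here is a minimal sketch of how Elo ratings can be learned from pairwise user feedback, using the standard Elo update formula. It is an assumption-laden illustration: the K-factor of 32, the starting rating of 1500, and the function names are hypothetical choices, and the router's actual implementation may differ.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of A against B (between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(ratings: dict, winner: str, loser: str, k: float = 32.0) -> dict:
    """Return new ratings after feedback that `winner` answered better than `loser`.

    k is the usual Elo K-factor; 32 is an assumed value, not a router default.
    """
    delta = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
    updated = dict(ratings)
    updated[winner] += delta
    updated[loser] -= delta
    return updated


def select_model(ratings: dict) -> str:
    """Pick the model with the highest current rating."""
    return max(ratings, key=ratings.get)
```

With two hypothetical models both starting at 1500, a single piece of feedback preferring one of them moves the ratings 16 points apart in each direction, and `select_model` then prefers the winner.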
Available Algorithms
Core Algorithms
| Algorithm | Best For | Key Benefit |
|---|---|---|
| Static | Simple deployments | Predictable, zero overhead |
| Latency-Aware | Latency-sensitive routing | Selects by percentiles of TPOT (time per output token) and TTFT (time to first token) |
| Elo | Learning from feedback | Adapts to user preferences |
| RouterDC | Query-model matching | Matches specialties to queries |
| AutoMix | Cost optimization | Balances quality and cost |
| Hybrid | Complex requirements | Combines signals from multiple methods |
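As an illustration of the Latency-Aware row, the sketch below picks the model with the lowest latency at a chosen percentile. It assumes recent TPOT samples (in milliseconds) are already available per model and uses a nearest-rank percentile; the function names, the sample data layout, and the choice of percentile method are hypothetical and not taken from the router.

```python
import math


def percentile(samples, p: float) -> float:
    """Nearest-rank percentile of a non-empty sample list, with p in (0, 100]."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100.0 * len(ordered)))
    return ordered[rank - 1]


def pick_lowest_latency(tpot_ms_by_model: dict, p: float = 95) -> str:
    """Return the model whose p-th percentile TPOT is lowest."""
    return min(
        tpot_ms_by_model,
        key=lambda model: percentile(tpot_ms_by_model[model], p),
    )
```

Choosing a high percentile such as p95 penalizes models with occasional slow responses, while a median (p50) favors models that are fast in the typical case; the two can select different models for the same data.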