Model Selection Overview
Model selection is an advanced feature of vLLM Semantic Router that automatically chooses the best LLM from multiple candidates based on learned preferences, query similarity, and cost-quality optimization.
The semantic router supports 8 selection algorithms across two categories:
- Core algorithms: Static, Elo, RouterDC, AutoMix, Hybrid
- RL-driven algorithms: Thompson Sampling, GMTRouter, Router-R1
What Problem Does It Solve?
When you have multiple LLM backends (e.g., GPT-4, Claude, Llama, Mistral), you face a challenge: which model should handle each request?
Traditional approaches:
- Static routing: Always use the same model (simple but suboptimal)
- Round-robin: Distribute evenly (ignores model strengths)
- Random: No query awareness (can send simple queries to expensive models and waste resources)
Model selection solves this by intelligently matching queries to models based on:
- Learned quality preferences (Elo ratings from user feedback)
- Query-model similarity (RouterDC embeddings)
- Cost-quality tradeoffs (AutoMix optimization)
- Combined signals (Hybrid approach)
- Adaptive exploration, per-user preferences, and LLM-powered routing (RL-driven algorithms)
Available Algorithms
Core Algorithms
| Algorithm | Best For | Key Benefit |
|---|---|---|
| Static | Simple deployments | Predictable, zero overhead |
| Elo | Learning from feedback | Adapts to user preferences |
| RouterDC | Query-model matching | Matches specialties to queries |
| AutoMix | Cost optimization | Balances quality and cost |
| Hybrid | Complex requirements | Combines all methods |
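To make the RouterDC row concrete, here is similarity-based routing under simplifying assumptions. Each model is represented by an embedding vector, the query is embedded into the same space, and the model with the highest cosine similarity wins. In RouterDC both the query encoder and the model embeddings are learned with contrastive training. Here they are hand-written toy vectors, and the function names (`cosine`, `route_by_similarity`) are hypothetical, not the router's API.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def route_by_similarity(
    query_vec: List[float], model_vecs: Dict[str, List[float]]
) -> Tuple[str, Dict[str, float]]:
    """Score every model against the query and return the best one plus all scores."""
    scores = {m: cosine(query_vec, v) for m, v in model_vecs.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Toy 3-d embeddings (hypothetical). In RouterDC these would be learned,
    # and the query vector would come from a trained encoder.
    model_vecs = {
        "llama3.2:3b": [1.0, 0.0, 0.0],
        "phi4": [0.0, 1.0, 0.0],
        "gemma3:27b": [0.0, 0.0, 1.0],
    }
    best, scores = route_by_similarity([0.1, 0.9, 0.2], model_vecs)
    print(best)  # phi4
```

Because the scores measure closeness to each model's embedding rather than overall strength, a query can be routed to a smaller model whose embedding sits near that kind of query.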
RL-Driven Algorithms
| Algorithm | Best For | Key Benefit |
|---|---|---|
| Thompson Sampling | Exploration/exploitation | Bayesian adaptive learning |
| GMTRouter | Personalization | Per-user preference learning |
| Router-R1 | Complex reasoning | LLM-powered routing decisions |
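Thompson Sampling can be illustrated with the textbook Beta-Bernoulli formulation, assuming each request yields binary feedback (a good or bad answer). This is a minimal sketch, not the router's implementation: the `ThompsonSampler` class, the Beta(1, 1) prior, and the success signal are assumptions, and GMTRouter and Router-R1 are not modeled.

```python
import random
from typing import Dict, List, Optional


class ThompsonSampler:
    """Beta-Bernoulli Thompson Sampling over a fixed list of candidate models (hypothetical)."""

    def __init__(self, models: List[str], seed: Optional[int] = None):
        self.rng = random.Random(seed)
        # Beta(1, 1) prior: every model starts with a uniform belief about its success rate.
        self.alpha: Dict[str, float] = {m: 1.0 for m in models}
        self.beta: Dict[str, float] = {m: 1.0 for m in models}

    def select(self) -> str:
        # Draw one plausible success rate per model and route to the highest draw.
        # Uncertain models sometimes win the draw (exploration); models with a
        # strong track record usually do (exploitation).
        draws = {m: self.rng.betavariate(self.alpha[m], self.beta[m]) for m in self.alpha}
        return max(draws, key=draws.get)

    def update(self, model: str, success: bool) -> None:
        # Binary feedback: a success increments alpha, a failure increments beta.
        if success:
            self.alpha[model] += 1.0
        else:
            self.beta[model] += 1.0


if __name__ == "__main__":
    true_rates = {"llama3.2:3b": 0.4, "phi4": 0.8}  # made-up rates, for simulation only
    env = random.Random(1)
    sampler = ThompsonSampler(list(true_rates), seed=0)
    picks = {m: 0 for m in true_rates}
    for _ in range(500):
        model = sampler.select()
        picks[model] += 1
        sampler.update(model, env.random() < true_rates[model])
    print(picks)  # phi4 should end up receiving most of the traffic
```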
Quick Start
Basic Configuration (Per-Decision)
Model selection is configured per-decision, allowing different strategies for different query types:
```yaml
decisions:
  - name: tech
    description: "Technical queries"
    priority: 10
    rules:
      operator: "OR"
      conditions:
        - type: "domain"
          name: "tech"
    modelRefs:
      - model: "llama3.2:3b"
      - model: "phi4"
      - model: "gemma3:27b"
    algorithm:
      type: "elo"  # Use Elo rating for this decision
      elo:
        k_factor: 32
        category_weighted: true
```
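The per-decision structure can be read as a two-stage lookup: first find the decision whose rules match the request's detected signals, then run that decision's own selection algorithm over its modelRefs. The following is a hypothetical model of that flow. The `Decision` class, the `route` function, the "general" decision, the made-up ratings, and the rule that the largest `priority` value wins are all assumptions for illustration, not documented router behavior.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set, Tuple

Condition = Tuple[str, str]  # (type, name), e.g. ("domain", "tech")


@dataclass
class Decision:
    # Field names mirror the YAML keys; the class itself is hypothetical.
    name: str
    priority: int
    operator: str  # "OR" or "AND"
    conditions: List[Condition]
    model_refs: List[str]
    algorithm: str = "static"


def matches(decision: Decision, signals: Set[Condition]) -> bool:
    """True if the request's detected signals satisfy the decision's rules."""
    hits = [cond in signals for cond in decision.conditions]
    return any(hits) if decision.operator == "OR" else all(hits)


def route(
    decisions: List[Decision],
    signals: Set[Condition],
    selectors: Dict[str, Callable[[List[str]], str]],
) -> Optional[Tuple[str, str]]:
    """Return (decision name, chosen model), or None if no decision matches."""
    matched = [d for d in decisions if matches(d, signals)]
    if not matched:
        return None
    # Assumption: the largest priority value wins when several decisions match.
    chosen = max(matched, key=lambda d: d.priority)
    # Each decision carries its own algorithm, so strategies differ per query type.
    return chosen.name, selectors[chosen.algorithm](chosen.model_refs)


if __name__ == "__main__":
    elo_ratings = {"llama3.2:3b": 1480.0, "phi4": 1530.0, "gemma3:27b": 1510.0}  # made up
    selectors = {
        "static": lambda refs: refs[0],  # Static: first entry in modelRefs
        "elo": lambda refs: max(refs, key=lambda m: elo_ratings.get(m, 1500.0)),
    }
    decisions = [
        Decision("tech", 10, "OR", [("domain", "tech")],
                 ["llama3.2:3b", "phi4", "gemma3:27b"], "elo"),
        Decision("general", 1, "OR", [("domain", "other")], ["llama3.2:3b"]),
    ]
    print(route(decisions, {("domain", "tech")}, selectors))  # ('tech', 'phi4')
```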
Algorithm Types
Static (Default)
Uses the first model in modelRefs. No learning, fully deterministic.
```yaml
algorithm:
  type: "static"
```
Elo Rating
Learns from user feedback to rank models by quality.
```yaml
algorithm:
  type: "elo"
  elo:
    k_factor: 32
    storage_path: "/var/lib/vsr/elo.json"
```
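For intuition about `k_factor`, here is the standard Elo update applied to pairwise feedback (one model's answer preferred over another's). This is a minimal sketch assuming a 1500 starting rating, a flat JSON map of model name to rating at `storage_path`, and no category weighting (`category_weighted` is not modeled). The helper names are hypothetical, and the router's actual update and file format may differ.

```python
import json
import os
import tempfile
from typing import Dict, List

DEFAULT_RATING = 1500.0  # assumed starting rating for models with no feedback yet


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A is preferred over model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(ratings: Dict[str, float], winner: str, loser: str, k_factor: float = 32.0) -> None:
    """Apply one pairwise feedback event (winner preferred over loser) in place."""
    r_win = ratings.get(winner, DEFAULT_RATING)
    r_lose = ratings.get(loser, DEFAULT_RATING)
    delta = k_factor * (1.0 - expected_score(r_win, r_lose))
    # Zero-sum: the winner gains exactly what the loser gives up.
    ratings[winner] = r_win + delta
    ratings[loser] = r_lose - delta


def select_model(ratings: Dict[str, float], candidates: List[str]) -> str:
    """Route to the highest-rated candidate; ties go to the earlier modelRefs entry."""
    return max(candidates, key=lambda m: ratings.get(m, DEFAULT_RATING))


def load_ratings(path: str) -> Dict[str, float]:
    """Read ratings from a JSON file, or start empty if it does not exist yet."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)


def save_ratings(path: str, ratings: Dict[str, float]) -> None:
    # Write a temp file in the same directory, then atomically swap it in,
    # so a crash never leaves a half-written ratings file behind.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(ratings, f)
    os.replace(tmp_path, path)


if __name__ == "__main__":
    ratings: Dict[str, float] = {}
    update_elo(ratings, winner="phi4", loser="llama3.2:3b", k_factor=32)
    print(ratings)  # {'phi4': 1516.0, 'llama3.2:3b': 1484.0}
    print(select_model(ratings, ["llama3.2:3b", "phi4", "gemma3:27b"]))  # phi4
```

With two unrated models the expected score is 0.5, so a single win with `k_factor: 32` moves each rating by 16 points. A larger `k_factor` makes ratings react faster to new feedback but also makes them noisier.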