跳到主要内容
版本:v0.1

Model Selection Overview

Model selection is an advanced feature of vLLM Semantic Router that automatically chooses the best LLM from multiple candidates based on learned preferences, query similarity, and cost-quality optimization.

The semantic router supports 8 selection algorithms across two categories:

  • Core algorithms: Static, Elo, RouterDC, AutoMix, Hybrid
  • RL-driven algorithms: Thompson Sampling, GMTRouter, Router-R1

What Problem Does It Solve?

When you have multiple LLM backends (e.g., GPT-4, Claude, Llama, Mistral), you face a challenge: which model should handle each request?

Traditional approaches:

  • Static routing: Always use the same model (simple but suboptimal)
  • Round-robin: Distribute evenly (ignores model strengths)
  • Random: No intelligence (wastes resources)

Model selection solves this by intelligently matching queries to models based on:

  1. Learned quality preferences (Elo ratings from user feedback)
  2. Query-model similarity (RouterDC embeddings)
  3. Cost-quality tradeoffs (AutoMix optimization)
  4. Combined signals (Hybrid approach)

Available Algorithms

Core Algorithms

AlgorithmBest ForKey Benefit
StaticSimple deploymentsPredictable, zero overhead
EloLearning from feedbackAdapts to user preferences
RouterDCQuery-model matchingMatches specialties to queries
AutoMixCost optimizationBalances quality and cost
HybridComplex requirementsCombines all methods

RL-Driven Algorithms

AlgorithmBest ForKey Benefit
Thompson SamplingExploration/exploitationBayesian adaptive learning
GMTRouterPersonalizationPer-user preference learning
Router-R1Complex reasoningLLM-powered routing decisions

Quick Start

Basic Configuration (Per-Decision)

Model selection is configured per-decision, allowing different strategies for different query types:

decisions:
- name: tech
description: "Technical queries"
priority: 10
rules:
operator: "OR"
conditions:
- type: "domain"
name: "tech"
modelRefs:
- model: "llama3.2:3b"
- model: "phi4"
- model: "gemma3:27b"
algorithm:
type: "elo" # Use Elo rating for this decision
elo:
k_factor: 32
category_weighted: true

Algorithm Types

Static (Default)

Uses the first model in modelRefs. No learning, fully deterministic.

algorithm:
type: "static"

Elo Rating

Learns from user feedback to rank models by quality.

algorithm:
type: "elo"
elo:
k_factor: 32
storage_path: "/var/lib/vsr/elo.json"

RouterDC

Matches query embeddings to model descriptions.

algorithm:
type: "router_dc"
router_dc:
temperature: 0.07
require_descriptions: true

AutoMix

Optimizes cost-quality tradeoff using POMDP.

algorithm:
type: "automix"
automix:
cost_quality_tradeoff: 0.4

Hybrid

Combines all methods with configurable weights.

algorithm:
type: "hybrid"
hybrid:
elo_weight: 0.3
router_dc_weight: 0.3
automix_weight: 0.2
cost_weight: 0.2

How It Works

┌─────────────────────────────────────────────────────────────────────┐
│ User Query │
│ "Explain quantum computing" │
└─────────────────────────────────────────────────────────────────────┘


┌─────────────────────────────────────────────────────────────────────┐
│ Decision Matching │
│ Decision "tech" matches → 3 models │
└─────────────────────────────────────────────────────────────────────┘


┌─────────────────────────────────────────────────────────────────────┐
│ Selection Algorithm │
│ │
│ algorithm.type: "elo" │
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ EloSelector.Select() │ │
│ │ │ │
│ │ Model Ratings: │ │
│ │ llama3.2:3b → 1468 (0 wins, 2 losses) │ │
│ │ phi4 → 1501 (3 wins, 2 losses) │ │
│ │ gemma3:27b → 1531 (5 wins, 1 loss) ← HIGHEST │ │
│ └─────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘


┌─────────────────────────────────────────────────────────────────────┐
│ Selected: gemma3:27b │
│ (highest Elo rating: 1531) │
└─────────────────────────────────────────────────────────────────────┘

Choosing an Algorithm

See Choosing the Right Algorithm for detailed guidance.

Quick Decision Tree:

  1. Just getting started? → Use static (default)
  2. Have user feedback? → Use elo
  3. Have model descriptions? → Use router_dc
  4. Want cost optimization? → Use automix
  5. Need everything? → Use hybrid
  • User Feedback Routing - Collect feedback signals via /api/v1/feedback endpoint
  • Preference Routing - Route based on user preferences in the system
  • Domain Routing - Route by topic category using embedding classification

Reference Papers

The selection algorithms are based on these research papers:

Core Algorithms

RL-Driven Algorithms