# Router-R1 Selection
Router-R1 uses an LLM as the router itself, performing multi-round "think" and "route" actions to make intelligent routing decisions. The router can reason about query requirements, model capabilities, and cost trade-offs before making selections.
Reference: *Router-R1: Teaching LLMs Multi-Round Routing and Aggregation via Reinforcement Learning*, Hu et al., NeurIPS 2025. Our implementation is inspired by this paper's think/route action pattern.
## Paper vs Implementation
The original Router-R1 paper introduces multi-round, multi-model routing and aggregation - the router calls multiple models sequentially, integrates their responses into its context, and synthesizes a final answer. This is the paper's core contribution.
Our implementation provides a simplified single-model selection variant that uses the think/route action pattern for deliberative routing. For full multi-model aggregation, see the advanced configuration below.
## Algorithm Flow

### Think/Route Protocol
The Router LLM uses a structured output format with two action types:
| Action | Description |
|---|---|
| `<think>...</think>` | Reasoning step - analyzes the query (can repeat) |
| `<route>model</route>` | Final routing decision |
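As a point of reference, the protocol can be expressed with a small set of Go types. This is a minimal sketch: the `Action` struct and `ActionType` constants match the identifiers used in the core algorithm below, but the exact definitions in the codebase may differ.

```go
// ActionType distinguishes the two actions the router LLM can emit.
type ActionType int

const (
	ActionThink ActionType = iota // <think>...</think>: intermediate reasoning
	ActionRoute                   // <route>model</route>: final routing decision
)

// Action is one parsed step of the router LLM's output.
type Action struct {
	Type      ActionType
	Model     string // target model name; set only for ActionRoute
	Reasoning string // accumulated <think> content, kept for observability
}
```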
### Example Output
```text
<think>
Analyzing query: "Debug this Python code"
- Query type: coding task
- Requires: code understanding, debugging
- Best model: code-llama (specialized for code)
</think>
<route>code-llama</route>
```
## How It Works

### Single-Model Selection (Default)
- Router LLM receives the user query and the model descriptions (a prompt-construction sketch follows this list)
- LLM performs THINK actions to analyze the query's requirements
- LLM performs a ROUTE action to select the best model
- The selected model processes the request
- If the route is invalid, the router retries up to `max_iterations`
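The first step implies a prompt that presents the query alongside the model catalog. The helper below is a hypothetical sketch of how such a prompt could be assembled; the template wording and the `buildRouterPrompt` name are assumptions, not the actual implementation.

```go
import (
	"fmt"
	"sort"
	"strings"
)

// buildRouterPrompt assembles the message sent to the router LLM.
// Model names are sorted so the prompt is deterministic across calls.
func buildRouterPrompt(query string, descriptions map[string]string) string {
	names := make([]string, 0, len(descriptions))
	for name := range descriptions {
		names = append(names, name)
	}
	sort.Strings(names)

	var b strings.Builder
	b.WriteString("You are a routing agent. Reason inside <think> tags, then\n")
	b.WriteString("select exactly one model inside <route> tags.\n\nAvailable models:\n")
	for _, name := range names {
		fmt.Fprintf(&b, "- %s: %s\n", name, descriptions[name])
	}
	fmt.Fprintf(&b, "\nQuery: %s\n", query)
	return b.String()
}
```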
### Multi-Model Aggregation (Advanced)

When `enable_aggregation: true`, Router-R1 can call multiple models:
- Router LLM reasons about which models to consult
- Router calls Model A, receives response, integrates into context
- Router decides whether to call additional models
- Router synthesizes final answer from all responses
This matches the paper's multi-round aggregation approach.
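Under the `enable_aggregation` flag, this loop could look roughly like the sketch below. It reuses the `callRouterLLM` and `parseAction` helpers from the core algorithm and assumes hypothetical `modelDescriptions` and `maxModelsPerQuery` fields, a `callModel` helper, and a `final` sentinel for the synthesis step; none of these details are confirmed by the actual implementation.

```go
// aggregate runs the multi-round loop: the router consults models one at a
// time, folds their answers back into its context, and eventually emits a
// synthesized final answer.
func (s *RouterR1Selector) aggregate(ctx context.Context, query string) (string, error) {
	var transcript []string
	for round := 0; round < s.maxModelsPerQuery; round++ {
		resp, err := s.callRouterLLM(ctx, query, s.modelDescriptions, transcript)
		if err != nil {
			return "", err
		}
		action := s.parseAction(resp)
		if action.Type != ActionRoute {
			transcript = append(transcript, resp) // think-only turn: keep reasoning in context
			continue
		}
		if action.Model == "final" { // hypothetical sentinel: router answers directly
			return action.Reasoning, nil
		}
		answer, err := s.callModel(ctx, action.Model, query) // hypothetical helper
		if err != nil {
			return "", err
		}
		// Fold the consulted model's answer back into the router's context.
		transcript = append(transcript,
			fmt.Sprintf("<response model=%q>%s</response>", action.Model, answer))
	}
	return "", fmt.Errorf("no final answer after %d rounds", s.maxModelsPerQuery)
}
```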
## RL-Based Training
The router is trained using reinforcement learning with rewards for:
- Format correctness: Proper use of think/route tags
- Outcome quality: Quality of the final response
- Cost efficiency: Balance between performance and cost
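As an illustration only, the three reward terms could be combined into a single scalar as below. The weights and scoring are assumptions made for the sketch; the paper defines its own reward formulation.

```go
// routingReward combines the three training signals into one scalar.
// The weights are illustrative, not the paper's values.
func routingReward(formatOK bool, quality, costUSD float64) float64 {
	if !formatOK {
		return -1.0 // malformed think/route output is penalized outright
	}
	const costWeight = 0.1 // quality-vs-spend trade-off (assumed)
	return quality - costWeight*costUSD
}
```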
## Core Algorithm (Go)
```go
// Select implements LLM-as-Router: the router LLM alternates think/route
// actions until it produces a valid routing decision.
func (s *RouterR1Selector) Select(ctx context.Context, selCtx *SelectionContext) (*SelectionResult, error) {
	// No router configured: degrade to static selection if allowed.
	if s.routerEndpoint == "" && s.fallbackToStatic {
		return s.staticSelector.Select(ctx, selCtx)
	}

	history := []string{}
	for i := 0; i < s.maxIterations; i++ {
		response, err := s.callRouterLLM(ctx, selCtx.Query, selCtx.ModelDescriptions, history)
		if err != nil {
			if s.fallbackToStatic {
				return s.staticSelector.Select(ctx, selCtx)
			}
			return nil, err
		}

		action := s.parseAction(response)
		if action.Type == ActionRoute {
			if s.isValidModel(action.Model, selCtx.CandidateModels) {
				return &SelectionResult{
					SelectedModel: action.Model,
					Method:        MethodRouterR1,
					Reason:        action.Reasoning,
				}, nil
			}
		}
		// Think-only turn or invalid model: keep the output in context and retry.
		history = append(history, response)
	}

	// Max iterations reached without a valid route.
	if s.fallbackToStatic {
		return s.staticSelector.Select(ctx, selCtx)
	}
	return nil, fmt.Errorf("router failed after %d iterations", s.maxIterations)
}
```
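The `parseAction` helper referenced above is not shown on this page; a plausible regex-based sketch follows. It assumes well-formed tags and takes the first `<route>` tag as the decision; the real parser may be stricter.

```go
import (
	"regexp"
	"strings"
)

var (
	thinkRe = regexp.MustCompile(`(?s)<think>(.*?)</think>`)
	routeRe = regexp.MustCompile(`(?s)<route>(.*?)</route>`)
)

// parseAction extracts the router LLM's action from its raw output.
func (s *RouterR1Selector) parseAction(response string) Action {
	reasoning := strings.Join(collectThinks(response), "\n")
	if m := routeRe.FindStringSubmatch(response); m != nil {
		return Action{Type: ActionRoute, Model: strings.TrimSpace(m[1]), Reasoning: reasoning}
	}
	return Action{Type: ActionThink, Reasoning: reasoning}
}

// collectThinks gathers the contents of every <think> block.
func collectThinks(response string) []string {
	var out []string
	for _, m := range thinkRe.FindAllStringSubmatch(response, -1) {
		out = append(out, strings.TrimSpace(m[1]))
	}
	return out
}
```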
## Configuration

### Single-Model Selection
```yaml
decision:
  algorithm:
    type: router_r1
    router_r1:
      router_endpoint: http://localhost:8001  # Router LLM server
      max_iterations: 3                       # Max think/route cycles
      temperature: 0.7                        # Router LLM temperature
      use_cot: true                           # Chain-of-thought reasoning
      fallback_to_static: true                # Fallback if router unavailable

models:
  - name: gpt-4
    backend: openai
    description: "Complex reasoning and analysis"
  - name: gpt-3.5-turbo
    backend: openai
    description: "Fast general responses"
  - name: code-llama
    backend: local
    description: "Code generation and debugging"
```
### Multi-Model Aggregation (Advanced)
```yaml
decision:
  algorithm:
    type: router_r1
    router_r1:
      router_endpoint: http://localhost:8001
      enable_aggregation: true            # Enable multi-model calling
      max_models_per_query: 3             # Max models to consult
      aggregation_strategy: "synthesize"  # or "best_of"
```
### Key Parameters

| Parameter | Default | Description |
|---|---|---|
| `router_endpoint` | `null` | URL of the Router-R1 server |
| `max_iterations` | `3` | Maximum think/route iterations |
| `temperature` | `0.7` | Temperature for the router LLM |
| `use_cot` | `true` | Enable chain-of-thought reasoning |
| `fallback_to_static` | `true` | Use static selection if the router is unavailable |
| `enable_aggregation` | `false` | Enable multi-model aggregation |
## Router-R1 Server
Router-R1 requires a separate server running the router LLM:
```bash
cd src/training/rl_model_selection
python router_r1_server.py --port 8001 --model Qwen/Qwen2.5-3B-Instruct
```
The server exposes:

- `POST /route` - Route a query to a model
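For reference, a client call might look like the sketch below. The JSON field names (`query`, `model_descriptions`, `history`, `response`) are assumptions, not the server's documented schema; consult the server code for the actual payload.

```go
import (
	"bytes"
	"context"
	"encoding/json"
	"net/http"
)

// routeRequest/routeResponse mirror a plausible /route payload; field
// names are assumed for illustration.
type routeRequest struct {
	Query             string            `json:"query"`
	ModelDescriptions map[string]string `json:"model_descriptions"`
	History           []string          `json:"history"`
}

type routeResponse struct {
	Response string `json:"response"` // raw think/route text from the router LLM
}

func (s *RouterR1Selector) callRouterLLM(ctx context.Context, query string, descs map[string]string, history []string) (string, error) {
	body, err := json.Marshal(routeRequest{Query: query, ModelDescriptions: descs, History: history})
	if err != nil {
		return "", err
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, s.routerEndpoint+"/route", bytes.NewReader(body))
	if err != nil {
		return "", err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	var out routeResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return "", err
	}
	return out.Response, nil
}
```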
### Without Router Server

If `router_endpoint` is null and `fallback_to_static: true`, Router-R1 falls back to static selection. This allows gradual adoption:

1. Deploy with `fallback_to_static: true`
2. Start the Router-R1 server when ready
3. Configure `router_endpoint`
## When to Use Router-R1

**Good for:**
- Complex routing logic that's hard to encode in rules
- Queries requiring semantic understanding
- Systems with diverse, specialized models
- Multi-model synthesis (with aggregation enabled)
**Consider alternatives when:**
- Latency is critical (LLM routing adds 100-500ms)
- Simple routing rules suffice
- No GPU available for router LLM
## Best Practices
- Use a small router model: 3B-7B is sufficient for routing
- Enable fallback: Graceful degradation if router fails
- Limit iterations: 3 is usually enough
- Provide good model descriptions: Router uses these for decisions
- Monitor router latency: Track `router_r1_decision_latency_seconds`