
Router-R1 Selection

Router-R1 uses an LLM as the router itself, performing multi-round "think" and "route" actions to make intelligent routing decisions. The router can reason about query requirements, model capabilities, and cost trade-offs before making selections.

Reference: Router-R1: Teaching LLMs Multi-Round Routing and Aggregation via Reinforcement Learning by Hu et al., NeurIPS 2025. Our implementation is inspired by this paper's think/route action pattern.

Paper vs Implementation

The original Router-R1 paper introduces multi-round, multi-model routing and aggregation: the router calls multiple models sequentially, integrates their responses into its context, and synthesizes a final answer. This is the paper's core contribution.

Our implementation provides a simplified single-model selection variant that uses the think/route action pattern for deliberative routing. For full multi-model aggregation, see the advanced configuration below.

Algorithm Flow

Think/Route Protocol

The Router LLM uses a structured output format with two action types:

Action                 Description
<think>...</think>     Reasoning step: analyzes the query (can repeat)
<route>model</route>   Final routing decision

Example Output

<think>
Analyzing query: "Debug this Python code"
- Query type: coding task
- Requires: code understanding, debugging
- Best model: code-llama (specialized for code)
</think>
<route>code-llama</route>
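
Parsing this protocol is mechanical. Below is a minimal, illustrative Go sketch of an action parser; the Action and ActionType names mirror the ones used in the core algorithm further down, but the regular expressions, struct fields, and method layout are assumptions rather than the project's actual code (imports: regexp, strings):

type ActionType int

const (
    ActionThink ActionType = iota
    ActionRoute
)

type Action struct {
    Type      ActionType
    Model     string // set when Type == ActionRoute
    Reasoning string // concatenated <think> content
}

var (
    thinkRe = regexp.MustCompile(`(?s)<think>(.*?)</think>`)
    routeRe = regexp.MustCompile(`(?s)<route>(.*?)</route>`)
)

// parseAction extracts all <think> blocks (the reasoning trace) and, if
// present, the <route> tag (the final decision) from one router response.
func (s *RouterR1Selector) parseAction(response string) Action {
    var thoughts []string
    for _, m := range thinkRe.FindAllStringSubmatch(response, -1) {
        thoughts = append(thoughts, strings.TrimSpace(m[1]))
    }
    reasoning := strings.Join(thoughts, "\n")

    if m := routeRe.FindStringSubmatch(response); m != nil {
        return Action{Type: ActionRoute, Model: strings.TrimSpace(m[1]), Reasoning: reasoning}
    }
    return Action{Type: ActionThink, Reasoning: reasoning}
}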

How It Works

Single-Model Selection (Default)

  1. Router LLM receives the user query and model descriptions
  2. LLM performs THINK actions to analyze query requirements
  3. LLM performs ROUTE action to select the best model
  4. Selected model processes the request
  5. If the route is invalid, retry up to max_iterations (see the core algorithm below)

Multi-Model Aggregation (Advanced)

When enable_aggregation: true, Router-R1 can call multiple models:

  1. Router LLM reasons about which models to consult
  2. Router calls Model A, receives response, integrates into context
  3. Router decides whether to call additional models
  4. Router synthesizes final answer from all responses

This matches the paper's multi-round aggregation approach.
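
A hedged sketch of what that loop could look like in Go, reusing the selector fields from the core algorithm below; callModel, synthesize, and maxModelsPerQuery are assumed helpers and fields for illustration, not documented API:

// selectWithAggregation is an illustrative sketch of the multi-round loop:
// the router consults up to maxModelsPerQuery models, feeding each answer
// back into its context before deciding whether to consult another.
func (s *RouterR1Selector) selectWithAggregation(ctx context.Context, selCtx *SelectionContext) (string, error) {
    history := []string{}
    responses := map[string]string{} // model name -> that model's answer

    for i := 0; i < s.maxIterations && len(responses) < s.maxModelsPerQuery; i++ {
        out, err := s.callRouterLLM(ctx, selCtx.Query, selCtx.ModelDescriptions, history)
        if err != nil {
            return "", err
        }
        history = append(history, out)

        action := s.parseAction(out)
        if action.Type != ActionRoute {
            continue // a <think> round; keep the reasoning in context
        }

        // Consult the chosen model and integrate its answer into the router's context.
        answer, err := s.callModel(ctx, action.Model, selCtx.Query) // assumed helper
        if err != nil {
            return "", err
        }
        responses[action.Model] = answer
        history = append(history, "<response>"+answer+"</response>")
    }

    // Final pass: ask the router LLM for one answer synthesized from all responses.
    return s.synthesize(ctx, selCtx.Query, responses) // assumed helper
}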

RL-Based Training

The router is trained using reinforcement learning with rewards for:

  • Format correctness: Proper use of think/route tags
  • Outcome quality: Quality of the final response
  • Cost efficiency: Balance between performance and cost
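
The exact reward shaping lives in the training code, but the three terms can be folded into a single scalar. A rough Go illustration follows; the weights are placeholders, not values from the paper:

// routingReward is an illustrative combination of the three reward terms.
// formatOK: think/route tags parsed correctly; quality: judged answer
// quality in [0,1]; cost: normalized cost of the models consulted in [0,1].
func routingReward(formatOK bool, quality, cost float64) float64 {
    format := 0.0
    if formatOK {
        format = 1.0
    }
    // Placeholder weights: reward well-formed, high-quality, cheap routes.
    return 0.2*format + 0.7*quality - 0.1*cost
}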

Core Algorithm (Go)

// Select using LLM-as-Router
func (s *RouterR1Selector) Select(ctx context.Context, selCtx *SelectionContext) (*SelectionResult, error) {
    // No router configured: go straight to the static fallback.
    if s.routerEndpoint == "" && s.fallbackToStatic {
        return s.staticSelector.Select(ctx, selCtx)
    }

    history := []string{}

    for i := 0; i < s.maxIterations; i++ {
        response, err := s.callRouterLLM(ctx, selCtx.Query, selCtx.ModelDescriptions, history)
        if err != nil {
            if s.fallbackToStatic {
                return s.staticSelector.Select(ctx, selCtx)
            }
            return nil, err
        }

        action := s.parseAction(response)

        // A valid <route> decision ends the loop; anything else
        // (a <think> step or an unknown model) triggers another round.
        if action.Type == ActionRoute {
            if s.isValidModel(action.Model, selCtx.CandidateModels) {
                return &SelectionResult{
                    SelectedModel: action.Model,
                    Method:        MethodRouterR1,
                    Reason:        action.Reasoning,
                }, nil
            }
        }

        history = append(history, response)
    }

    // Max iterations reached
    if s.fallbackToStatic {
        return s.staticSelector.Select(ctx, selCtx)
    }
    return nil, fmt.Errorf("router failed after %d iterations", s.maxIterations)
}
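
Putting it together, a call site might look like the sketch below. NewRouterR1Selector and the Config fields are hypothetical names for illustration; Query, ModelDescriptions, and CandidateModels come from the snippet above, though the exact type of ModelDescriptions is an assumption:

// Hypothetical construction; the real constructor and config type may differ.
selector := NewRouterR1Selector(Config{
    RouterEndpoint:   "http://localhost:8001",
    MaxIterations:    3,
    FallbackToStatic: true,
})

result, err := selector.Select(ctx, &SelectionContext{
    Query:           "Debug this Python code",
    CandidateModels: []string{"gpt-4", "gpt-3.5-turbo", "code-llama"},
    ModelDescriptions: map[string]string{ // assumed shape
        "code-llama": "Code generation and debugging",
    },
})
if err != nil {
    log.Fatal(err)
}
fmt.Printf("routed to %s: %s\n", result.SelectedModel, result.Reason)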

Configuration

Single-Model Selection

decision:
  algorithm:
    type: router_r1
    router_r1:
      router_endpoint: http://localhost:8001  # Router LLM server
      max_iterations: 3                       # Max think/route cycles
      temperature: 0.7                        # Router LLM temperature
      use_cot: true                           # Chain-of-thought reasoning
      fallback_to_static: true                # Fallback if router unavailable

models:
  - name: gpt-4
    backend: openai
    description: "Complex reasoning and analysis"
  - name: gpt-3.5-turbo
    backend: openai
    description: "Fast general responses"
  - name: code-llama
    backend: local
    description: "Code generation and debugging"

Multi-Model Aggregation (Advanced)

decision:
  algorithm:
    type: router_r1
    router_r1:
      router_endpoint: http://localhost:8001
      enable_aggregation: true            # Enable multi-model calling
      max_models_per_query: 3             # Max models to consult
      aggregation_strategy: "synthesize"  # or "best_of"

Key Parameters

Parameter            Default   Description
router_endpoint      null      URL of the Router-R1 server
max_iterations       3         Maximum think/route iterations
temperature          0.7       Temperature for the router LLM
use_cot              true      Enable chain-of-thought reasoning
fallback_to_static   true      Use static selection if the router is unavailable
enable_aggregation   false     Enable multi-model aggregation

Router-R1 Server

Router-R1 requires a separate server running the router LLM:

cd src/training/rl_model_selection
python router_r1_server.py --port 8001 --model Qwen/Qwen2.5-3B-Instruct

The server exposes:

  • POST /route - Route a query to a model
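
The request/response schema for /route is not documented here, so this client sketch assumes a simple JSON contract (query, model descriptions, and history in; the raw think/route text out) and should be adjusted to the server's actual one (imports: bytes, context, encoding/json, net/http):

// callRouterLLM posts the query and model descriptions to the Router-R1
// server's /route endpoint. The JSON field names below are assumptions.
func (s *RouterR1Selector) callRouterLLM(ctx context.Context, query string, descs map[string]string, history []string) (string, error) {
    body, err := json.Marshal(map[string]any{
        "query":   query,
        "models":  descs,
        "history": history,
    })
    if err != nil {
        return "", err
    }

    req, err := http.NewRequestWithContext(ctx, http.MethodPost, s.routerEndpoint+"/route", bytes.NewReader(body))
    if err != nil {
        return "", err
    }
    req.Header.Set("Content-Type", "application/json")

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        return "", err
    }
    defer resp.Body.Close()

    var out struct {
        Response string `json:"response"` // raw think/route text from the router LLM
    }
    if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
        return "", err
    }
    return out.Response, nil
}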

Without Router Server

If router_endpoint is null and fallback_to_static: true, Router-R1 falls back to static selection. This allows gradual adoption:

  1. Deploy with fallback_to_static: true
  2. Start Router-R1 server when ready
  3. Configure router_endpoint
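
For step 1, the config can simply omit the endpoint while keeping the fallback on; a minimal sketch (per the parameter table, router_endpoint defaults to null):

decision:
  algorithm:
    type: router_r1
    router_r1:
      # No router_endpoint yet: every request uses static selection.
      fallback_to_static: true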

When to Use Router-R1

Good for:

  • Complex routing logic that's hard to encode in rules
  • Queries requiring semantic understanding
  • Systems with diverse, specialized models
  • Multi-model synthesis (with aggregation enabled)

Consider alternatives when:

  • Latency is critical (LLM routing adds 100-500ms)
  • Simple routing rules suffice
  • No GPU available for router LLM

Best Practices

  1. Use a small router model: 3B-7B is sufficient for routing
  2. Enable fallback: Graceful degradation if router fails
  3. Limit iterations: 3 is usually enough
  4. Provide good model descriptions: Router uses these for decisions
  5. Monitor router latency: Track router_r1_decision_latency_seconds