Router-R1 Selection
Router-R1 uses an LLM as the router itself. Over multiple rounds, the router alternates "think" and "route" actions: it reasons about the query's requirements, the candidate models' capabilities, and cost trade-offs before selecting a model.
Reference: Router-R1: Teaching LLMs Multi-Round Routing and Aggregation via Reinforcement Learning by Hu et al., NeurIPS 2025. Our implementation is inspired by this paper's think/route action pattern.
Paper vs Implementation
The original Router-R1 paper introduces multi-round, multi-model routing and aggregation: the router calls multiple models sequentially, integrates their responses into its context, and synthesizes a final answer. This is the paper's core contribution.
Our implementation provides a simplified single-model selection variant that uses the think/route action pattern for deliberative routing. For full multi-model aggregation, see the advanced configuration below.
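The single-model think/route pattern described above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the `<think>`/`<route>` tag format, the `select_model` function, its parameters, and the stub router are all assumptions made for the example. The router LLM is modeled as a plain callable that takes the transcript so far and returns its next action.

```python
import re
from typing import Callable, List, Optional

# Hypothetical action format: the router emits <think>...</think> to reason
# and <route>model-name</route> to commit to a model.
ROUTE_RE = re.compile(r"<route>(.*?)</route>", re.DOTALL)


def select_model(
    query: str,
    candidates: List[str],
    router_llm: Callable[[str], str],
    max_rounds: int = 3,
    default: Optional[str] = None,
) -> str:
    """Run think/route rounds until the router names a known candidate."""
    transcript = f"Query: {query}\nCandidates: {', '.join(candidates)}\n"
    for _ in range(max_rounds):
        output = router_llm(transcript)
        transcript += output + "\n"  # keep earlier reasoning in context
        match = ROUTE_RE.search(output)
        if match:
            choice = match.group(1).strip()
            if choice in candidates:
                return choice
            # Feed the error back so the router can correct itself.
            transcript += f"Unknown model '{choice}'; pick one of the candidates.\n"
    # No valid route within the round budget: fall back deterministically.
    return default if default is not None else candidates[0]


def stub_router(prompt: str) -> str:
    """Stand-in for a real LLM call: think once, then route."""
    if "<think>" not in prompt:
        return "<think>Multi-step math; a larger model is worth the cost.</think>"
    return "<route>large-model</route>"


chosen = select_model(
    "Prove that the sum of two even numbers is even.",
    ["small-model", "large-model"],
    stub_router,
)
```

With the stub router, the first round produces only a think action and the second round routes to `large-model`. Capping the rounds and falling back to a default keeps routing latency bounded if the router never emits a valid route action.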