Encoder signals turn raw requests into legible semantic state.
Signal before scale
Encoder-native system intelligence for mixture-of-model serving, built on Shannon signals, entropy folding, and neural-symbolic routing.
13 signal families spanning intent, safety, modality, context, and preference.
12 selectors across symbolic policy, latency heuristics, reinforcement learning, and ML routing.
One architecture across cpu-local, amd-local, and ci-k8s.
Encoder priors. Shannon structure. Entropy folded into action.
A router should feel like a system brain: encoder-guided, entropy-aware, and ruthlessly clear.
Neural-symbolic routing, kept legible.
Encoder priors, Shannon mapping, entropy folding, and model selection stay visible from research prototypes to production paths.
Neural signals meet symbolic rules in auditable routing logic.
Cache, safety, rewrite, and tracing attach as composable behaviors.
Natural language intent compiles into neural-symbolic policy before execution begins.
Selection stays measurable enough for papers, benchmarks, and production tuning.
Docs, papers, and product routes read as one system, not scattered collateral.
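As an illustration of the "neural signals meet symbolic rules" idea above, here is a minimal sketch in which encoder scores are matched against an ordered rule list and the name of the rule that fired is returned for auditing. The signal names, thresholds, model names, and the `Rule` and `route` helpers are all hypothetical and are not taken from the project's actual configuration or API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical signal bag: signal name -> confidence score in [0, 1].
Signals = Dict[str, float]


@dataclass(frozen=True)
class Rule:
    """A symbolic rule: a named predicate over neural signals plus a target model."""
    name: str
    when: Callable[[Signals], bool]
    model: str


def route(signals: Signals, rules: List[Rule], default_model: str) -> Tuple[str, str]:
    """Return (model, rule_name) for the first rule whose predicate holds.

    Returning the rule name alongside the model keeps the decision auditable.
    """
    for rule in rules:
        if rule.when(signals):
            return rule.model, rule.name
    return default_model, "default"


# Illustrative rules only; names and thresholds are assumptions.
RULES = [
    Rule("block-jailbreak", lambda s: s.get("jailbreak", 0.0) >= 0.8, "refusal-handler"),
    Rule("math-to-reasoner", lambda s: s.get("domain.math", 0.0) >= 0.6, "reasoning-model"),
]

model, fired = route({"jailbreak": 0.05, "domain.math": 0.91}, RULES, "general-model")
print(model, fired)
```

Rule order acts as priority here, so a safety rule placed first wins over a domain rule even when both predicates match.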
Routing Blueprint
How the System Works
An interactive walkthrough of signal extraction, decision logic, and model routing behavior.
Shannon Mapping
Structural mapping from communication theory to the routing pipeline.
The user request is the raw source message before encoding.
Encoder-Based Intelligence
Purpose-built encoders read intent, rank relevance, and classify modality before generation begins.
Sequence classification, token labeling, embeddings, and reranking collapse into one system-intelligence layer.
Multi-Modality
Detect and route text, image, and audio inputs to the right modality-capable model.
Bi-Encoder Embeddings
Independently encode queries and candidates into dense vectors for similarity search and semantic caching.
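To make the semantic-caching use of bi-encoder vectors concrete, the sketch below stores (embedding, response) pairs and returns a cached response when a new query's embedding is close enough by cosine similarity. It assumes the embeddings have already been produced by some bi-encoder; the `SemanticCache` class, its linear scan, and the 0.9 threshold are hypothetical choices for illustration, not the project's implementation.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    """Hypothetical in-memory cache keyed by embedding similarity."""

    def __init__(self, threshold: float = 0.9) -> None:
        self.threshold = threshold  # assumed cutoff; tune per workload
        self.entries: List[Tuple[List[float], str]] = []

    def put(self, embedding: Sequence[float], response: str) -> None:
        self.entries.append((list(embedding), response))

    def get(self, embedding: Sequence[float]) -> Optional[str]:
        """Return the response of the most similar entry, or None below the threshold."""
        best_score, best_response = 0.0, None
        for cached, response in self.entries:
            score = cosine(embedding, cached)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None


cache = SemanticCache(threshold=0.9)
cache.put([1.0, 0.0, 0.0], "cached answer")
print(cache.get([0.99, 0.1, 0.0]))  # near-duplicate query vector
print(cache.get([0.0, 1.0, 0.0]))   # unrelated query vector
```

Because queries and candidates are encoded independently, cached embeddings never need to be recomputed when a new query arrives; a real deployment would likely swap the linear scan for an approximate nearest-neighbor index.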
Cross-Encoder Learning
Joint cross-attention scoring of query-candidate pairs for high-precision reranking.
Classification
Domain classification across 14 MMLU categories, plus jailbreak, PII, and fact-check classification, via ModernBERT with LoRA.
Full Attention
Bidirectional attention across tokens and sentences, with full context instead of causal masking.
2DMSE
Adjust embedding layers and dimensions at inference time to trade compute for accuracy on the fly.
MRL
Truncate embedding vectors to any dimension without retraining to balance accuracy and speed per request.
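The truncation step described above can be sketched in a few lines: keep the leading dimensions of the vector and rescale to unit length so cosine and dot-product scores stay comparable. This assumes the embedding comes from a model trained with a Matryoshka-style objective, since prefixes of ordinary embeddings are not generally meaningful; the `truncate_embedding` helper and the re-normalization step are illustrative assumptions rather than the project's API.

```python
import math
from typing import List, Sequence


def truncate_embedding(vec: Sequence[float], dim: int) -> List[float]:
    """Keep the first `dim` components and L2-normalize the result.

    Assumes `vec` was produced by a Matryoshka-trained encoder, so that
    leading components carry the most information.
    """
    if not 1 <= dim <= len(vec):
        raise ValueError(f"dim must be between 1 and {len(vec)}, got {dim}")
    head = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    # Leave an all-zero prefix untouched to avoid dividing by zero.
    return [x / norm for x in head] if norm else head


full = [3.0, 4.0, 12.0]
small = truncate_embedding(full, 2)
print(small)
```

A per-request policy could then pick `dim` from a latency budget, using shorter vectors for fast cache lookups and full-length vectors where precision matters more.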
Meet Our Team
Innovation thrives when great minds come together
Maintainer: Huamin Chen
Distinguished Engineer @Red Hat
Maintainer: Chen Wang
Senior Staff Research Scientist @IBM
Maintainer: Yue Zhu
Staff Research Scientist @IBM
Maintainer: Xunzhuo Liu
Intelligent Routing @vLLM
Committer: Senan Zedan
R&D Manager @Red Hat
Committer: samzong
AI Infrastructure / Cloud-Native PM @DaoCloud
Liav Weiss
Software Engineer @Red Hat
Asaad Balum
Senior Software Engineer @Red Hat
Yehudit
Software Engineer @Red Hat
Noa Limoy
Software Engineer @Red Hat
Committer: JaredforReal
Software Engineer @Z.ai
Srinivas A
Software Engineer @Yokogawa
carlory
Open Source Engineer @DaoCloud
Committer: Yossi Ovadia
Senior Principal Engineer @Red Hat
Committer: Jintao Zhang
Senior Software Engineer @Kong
Committer: yuluo-yx
Individual Contributor
Committer: cryo-zd
Individual Contributor
Committer: OneZero-Y
Individual Contributor
Committer: aeft
Individual Contributor
Committer: Hao Wu
Individual Contributor
Committer: Qiping Pan
Individual Contributor
Acknowledgements
vLLM Semantic Router is made possible by the open-source ecosystem.
Architecture, written to be used.
Install, configure, train, and operate from one dense documentation graph.
Docs index
Research and builders in one loop.
Papers, working groups, and contributors evolve the same system in public.
Community routes




