vLLM · System Level Intelligence

Intelligent Routing for Mixture-of-Modality

Signal-driven decisions · Plugin-chain architecture
Cloud · Data Center · Edge

🎯 Signal-Driven
🔌 Plugin-Chain
🌐 Cloud · DC · Edge
MoM

Routing Blueprint

How the System Works

An interactive walkthrough of signal extraction, decision logic, and model routing behavior.

Shannon Mapping

Structural mapping from communication theory to the routing pipeline.

The user request is the raw source message before encoding.

Built on Encoder Models

Encoder-Based Intelligence

Purpose-built encoder models extract meaning from every request — understanding intent, ranking relevance, and classifying content across modalities in real time.

Input: "Is machine learning related to AI?"
Tokenizer: [CLS] Is machine learning related to AI? [SEP]
Embedding: h₀ = Token Emb + Segment Emb + Position Emb
Encoder Block (×N): 🔗 Multi-Head Attention → Add & Norm → ⚙️ Feed-Forward → Add & Norm
Signals

🎯 Sentence-Level (CLS Token): [CLS] → Linear Head → "computer_science" · TaskType: SEQ_CLS
Signals: Domain · Jailbreak · Fact-check · Feedback · Modality

🏷️ Token-Level (Per Token): each token → BIO label → O O B-LOC I-LOC O · TaskType: TOKEN_CLS
Signals: PII Detection

🌊 Bi-Encoder: mean-pooling(h₁..hₙ) → [0.23, -0.41, 0.87, ...] · TaskType: EMBEDDING
Signals: Semantic Cache · Similarity · Complexity-CL · Jailbreak-CL

🔀 Cross-Encoder: [CLS] query [SEP] candidate [SEP] → score · TaskType: CROSS_LEARNING
Signals: Rerank · Multi-Modal
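
The token-level signal assigns one BIO label per token. A minimal sketch of how such labels could be grouped into entity spans follows; the function name `bio_to_spans` and the example sentence are illustrative assumptions, and only the label sequence O O B-LOC I-LOC O comes from the diagram.

```python
from typing import List, Tuple


def bio_to_spans(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_type, text) spans."""
    spans: List[Tuple[str, str]] = []
    current_type = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Close the currently open span, if there is one.
        nonlocal current_type, current_tokens
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))
        current_type, current_tokens = None, []

    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            flush()
            current_type, current_tokens = label[2:], [token]
        elif label.startswith("I-") and current_type == label[2:]:
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open span.
            flush()
    flush()
    return spans


# Hypothetical sentence; the labels match the diagram's example sequence.
tokens = ["I", "visited", "New", "York", "yesterday"]
labels = ["O", "O", "B-LOC", "I-LOC", "O"]
spans = bio_to_spans(tokens, labels)
print(spans)  # [('LOC', 'New York')]
```

A PII detector built this way would report the spans rather than the raw per-token labels, which is usually what a downstream masking or blocking step needs.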
🎭

Multi-Modality

Detect and route text, image and audio inputs to the right modality-capable model.
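
As a rough illustration of modality-aware routing, the sketch below picks the first model whose capabilities cover every modality in a request. The model names, the capability table, and the `type` field on each content part are hypothetical; the real router detects modality with encoder models rather than reading a declared type.

```python
from typing import Dict, List, Set

# Hypothetical capability table, ordered from least to most capable.
MODEL_CAPABILITIES: Dict[str, Set[str]] = {
    "text-model": {"text"},
    "vision-model": {"text", "image"},
    "omni-model": {"text", "image", "audio"},
}


def required_modalities(parts: List[Dict[str, str]]) -> Set[str]:
    """Collect the set of modalities present in a request's content parts."""
    return {part["type"] for part in parts}


def route(parts: List[Dict[str, str]]) -> str:
    """Return the first model that supports every modality in the request."""
    needed = required_modalities(parts)
    for name, capabilities in MODEL_CAPABILITIES.items():
        if needed <= capabilities:
            return name
    raise ValueError(f"no model supports modalities: {sorted(needed)}")


print(route([{"type": "text"}]))                     # text-model
print(route([{"type": "text"}, {"type": "image"}]))  # vision-model
print(route([{"type": "text"}, {"type": "audio"}]))  # omni-model
```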

🧬

Bi-Encoder Embeddings

Independently encode queries and candidates into dense vectors for similarity search and semantic caching.
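
A minimal sketch of how bi-encoder vectors could back a semantic cache: token vectors are mean-pooled into one embedding, and a lookup returns a cached response when cosine similarity clears a threshold. The `SemanticCache` class, the 0.9 threshold, and the toy two-dimensional vectors are assumptions for illustration, not the project's actual API.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = List[float]


def mean_pool(token_vectors: Sequence[Sequence[float]]) -> Vector:
    """Average per-token vectors into a single sentence embedding."""
    count = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(vec[i] for vec in token_vectors) / count for i in range(dim)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class SemanticCache:
    """Hypothetical cache keyed by embedding similarity instead of exact text."""

    def __init__(self, threshold: float = 0.9) -> None:
        self.threshold = threshold
        self.entries: List[Tuple[Vector, str]] = []

    def put(self, embedding: Vector, response: str) -> None:
        self.entries.append((embedding, response))

    def get(self, embedding: Vector) -> Optional[str]:
        best_score, best_response = 0.0, None
        for cached, response in self.entries:
            score = cosine(embedding, cached)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None


cache = SemanticCache(threshold=0.9)
cache.put(mean_pool([[1.0, 0.0], [0.8, 0.2]]), "cached answer")

near = mean_pool([[0.9, 0.1], [1.0, 0.0]])  # points almost the same way
far = [0.0, 1.0]                            # points elsewhere
print(cache.get(near))  # cached answer
print(cache.get(far))   # None
```

Because queries and cached entries are encoded independently, the cached vectors can be computed once and reused for every lookup.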

Cross-Encoder Learning

Joint cross-attention scoring of query-candidate pairs for high-precision reranking.
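
The reranking flow can be sketched as follows, assuming a scoring function that receives the joined `[CLS] query [SEP] candidate [SEP]` input. The `toy_score` function is a word-overlap stand-in used only to keep the example runnable; a real cross-encoder would produce the score from a trained model.

```python
from typing import Callable, List, Tuple


def build_pair_input(query: str, candidate: str) -> str:
    """Join query and candidate into one sequence, as in the diagram."""
    return f"[CLS] {query} [SEP] {candidate} [SEP]"


def toy_score(pair_input: str) -> float:
    """Stand-in scorer (NOT a real cross-encoder): Jaccard word overlap."""
    segments = [
        part.strip()
        for part in pair_input.replace("[CLS]", "").split("[SEP]")
        if part.strip()
    ]
    query_words = set(segments[0].lower().split())
    candidate_words = set(segments[1].lower().split())
    union = query_words | candidate_words
    return len(query_words & candidate_words) / len(union) if union else 0.0


def rerank(
    query: str, candidates: List[str], score_fn: Callable[[str], float]
) -> List[Tuple[str, float]]:
    """Score every (query, candidate) pair jointly and sort best-first."""
    scored = [(c, score_fn(build_pair_input(query, c))) for c in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)


ranked = rerank(
    "is machine learning related to AI",
    ["how to bake sourdough bread", "machine learning is a branch of AI"],
    toy_score,
)
print(ranked[0][0])  # machine learning is a branch of AI
```

Unlike the bi-encoder path, every candidate requires a fresh forward pass with the query, which is why cross-encoders are typically applied to a short list that a cheaper retrieval step has already narrowed down.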

🤔

Classification

Domain classification across 14 MMLU categories, plus jailbreak, PII and fact-check classification, via ModernBERT with LoRA.
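
Sentence-level classification reduces to a linear head over the [CLS] vector followed by a softmax. The sketch below shows that step in isolation; the weights, the two-dimensional [CLS] vector, and every label except "computer_science" are made up for illustration.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(
    cls_vector: Sequence[float],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
    labels: Sequence[str],
) -> Tuple[str, List[float]]:
    """Apply a linear head to the [CLS] vector and return the top label."""
    logits = [
        sum(w * h for w, h in zip(row, cls_vector)) + b
        for row, b in zip(weights, bias)
    ]
    probs = softmax(logits)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs


# Hypothetical 3-way head over a 2-dimensional [CLS] vector.
LABELS = ["computer_science", "law", "other"]
WEIGHTS = [[2.0, 0.0], [0.0, 2.0], [-1.0, -1.0]]
BIAS = [0.0, 0.0, 0.0]

label, probs = classify([1.0, 0.0], WEIGHTS, BIAS, LABELS)
print(label)  # computer_science
```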

👁️

Full Attention

Bidirectional attention across tokens and sentences: every token sees full context in both directions, with no causal masking.
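
The difference between full and causal attention comes down to the mask applied before the softmax. This small sketch builds both masks and shows their effect on uniform attention scores; the helper names are illustrative.

```python
import math
from typing import List, Sequence


def bidirectional_mask(n: int) -> List[List[int]]:
    """Encoder-style mask: every token may attend to every token."""
    return [[1] * n for _ in range(n)]


def causal_mask(n: int) -> List[List[int]]:
    """Decoder-style mask: token i may attend only to positions <= i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


def masked_softmax(scores: Sequence[float], mask: Sequence[int]) -> List[float]:
    """Softmax over one row of scores; masked positions get weight 0.

    Assumes at least one position in the row is unmasked.
    """
    masked = [s if m else float("-inf") for s, m in zip(scores, mask)]
    peak = max(masked)
    exps = [math.exp(s - peak) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]


scores = [0.0, 0.0, 0.0]  # uniform raw scores for the first token
full = masked_softmax(scores, bidirectional_mask(3)[0])
causal = masked_softmax(scores, causal_mask(3)[0])
print(full)    # the first token spreads attention over all three positions
print(causal)  # [1.0, 0.0, 0.0]: the first token can only see itself
```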

🪆

2DMSE

Adjust embedding layers and dimensions at inference time to trade compute for accuracy on the fly.
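
One way to picture the two knobs is a function that takes the per-layer hidden states, stops at a chosen layer, and keeps only the first few dimensions. This is a sketch under the assumption that the model was trained so that intermediate layers and leading dimensions are usable on their own; the `hidden_states[layer][token]` layout and the `embed_at` name are hypothetical.

```python
import math
from typing import List, Sequence

HiddenStates = Sequence[Sequence[Sequence[float]]]  # [layer][token][dim]


def l2_normalize(vec: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def embed_at(hidden_states: HiddenStates, num_layers: int, dim: int) -> List[float]:
    """Mean-pool the output of layer `num_layers`, keeping the first `dim` dims."""
    if not 1 <= num_layers <= len(hidden_states):
        raise ValueError("num_layers out of range")
    token_vectors = hidden_states[num_layers - 1]
    if not 1 <= dim <= len(token_vectors[0]):
        raise ValueError("dim out of range")
    count = len(token_vectors)
    pooled = [sum(vec[i] for vec in token_vectors) / count for i in range(dim)]
    return l2_normalize(pooled)


# Toy hidden states: 2 layers, 2 tokens, 4 dimensions.
hidden_states = [
    [[1.0, 0.0, 2.0, 0.0], [3.0, 0.0, 0.0, 2.0]],  # after layer 1
    [[0.0, 3.0, 1.0, 1.0], [0.0, 5.0, 1.0, 3.0]],  # after layer 2
]

cheap = embed_at(hidden_states, num_layers=1, dim=2)  # fewer layers, fewer dims
full = embed_at(hidden_states, num_layers=2, dim=4)   # all layers, all dims
print(cheap)      # [1.0, 0.0]
print(len(full))  # 4
```

In a real deployment the saving comes from not running the later layers at all; the sketch indexes precomputed states only to keep the example short.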

📐

MRL

Truncate embedding vectors to any dimension without retraining — balance accuracy and speed per request.
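
Truncation itself is a slice followed by re-normalization. The sketch below compares similarities at full and reduced dimension on made-up vectors; it assumes the embedding model was trained with a Matryoshka-style objective, since truncating an ordinary embedding this way generally loses much more accuracy.

```python
import math
from typing import List, Sequence


def truncate_embedding(vec: Sequence[float], dim: int) -> List[float]:
    """Keep the first `dim` components and rescale to unit length."""
    head = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm else head


def similarity(a: Sequence[float], b: Sequence[float], dim: int) -> float:
    """Cosine similarity between two embeddings truncated to `dim` dims."""
    ta, tb = truncate_embedding(a, dim), truncate_embedding(b, dim)
    return sum(x * y for x, y in zip(ta, tb))


# Made-up 4-dimensional embeddings.
query = [0.6, 0.5, 0.1, -0.2]
doc_a = [0.5, 0.6, -0.1, 0.1]
doc_b = [-0.4, 0.2, 0.7, 0.3]

for dim in (4, 2):
    sim_a = similarity(query, doc_a, dim)
    sim_b = similarity(query, doc_b, dim)
    print(dim, round(sim_a, 3), round(sim_b, 3))
```

In this toy case doc_a stays ahead of doc_b at both sizes, which is the behavior a per-request dimension setting relies on: smaller vectors make search cheaper while the ranking stays mostly intact.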

Meet Our Team

Innovation thrives when great minds come together

Huamin Chen · Maintainer
Distinguished Engineer @Red Hat

Chen Wang · Maintainer
Senior Staff Research Scientist @IBM

Yue Zhu · Maintainer
Staff Research Scientist @IBM

Xunzhuo Liu · Maintainer
Intelligent Routing @vLLM

Senan Zedan · Committer
R&D Manager @Red Hat

samzong · Committer
AI Infrastructure / Cloud-Native PM @DaoCloud

Liav Weiss · Committer
Software Engineer @Red Hat

Asaad Balum · Committer
Senior Software Engineer @Red Hat

Yehudit · Committer
Software Engineer @Red Hat

Noa Limoy · Committer
Software Engineer @Red Hat

JaredforReal · Committer
Software Engineer @Z.ai

Srinivas A · Committer
Software Engineer @Yokogawa

carlory · Committer
Open Source Engineer @DaoCloud

Yossi Ovadia · Committer
Senior Principal Engineer @Red Hat

Jintao Zhang · Committer
Senior Software Engineer @Kong

yuluo-yx · Committer
Individual Contributor

cryo-zd · Committer
Individual Contributor

OneZero-Y · Committer
Individual Contributor

aeft · Committer
Individual Contributor

Hao Wu · Committer
Individual Contributor

Qiping Pan · Committer
Individual Contributor

Acknowledgements

vLLM Semantic Router is made possible by the open-source ecosystem.