Skip to main content
vLLM LogoSystem Level Intelligence

Intelligent Routingfor Mixture-of-Modality

Signal-driven decisions ยท Plugin-chain architecture
Cloud ยท Data Center ยท Edge

๐ŸŽฏSignal-Driven
๐Ÿ”ŒPlugin-Chain
๐ŸŒCloud ยท DC ยท Edge
MoM

Built on Encoder Models

Encoder-Based Intelligence

Purpose-built encoder models extract meaning from every request โ€” understanding intent, ranking relevance, and classifying content across modalities in real time.

Input
"Is machine learning related to AI?"
Tokenizer
[CLS]IsmachinelearningrelatedtoAI?[SEP]
Embedding
Token Emb
Segment Emb
Position Emb
hโ‚€ = ฮฃ
Encoder Block
ร—N
๐Ÿ”—Multi-Head Attention
โž•Add & Norm
โš™๏ธFeed-Forward
โž•Add & Norm
Signals
๐ŸŽฏ
Sentence-Level (CLS Token)[CLS] โ†’ Linear Head โ†’ "computer_science"TaskType: SEQ_CLS
DomainJailbreakFact-checkFeedbackModality
๐Ÿท๏ธ
Token-Level (Per Token)Each token โ†’ BIO Label โ†’ O O B-LOC I-LOC OTaskType: TOKEN_CLS
PII Detection
๐ŸŒŠ
Bi-Encodermean-pooling(hโ‚..hโ‚™) โ†’ [0.23, -0.41, 0.87, ...]TaskType: EMBEDDING
Semantic CacheSimilarityComplexity-CLJailbreak-CL
๐Ÿ”€
Cross-Encoder[CLS] query [SEP] candidate [SEP] โ†’ scoreTaskType: CROSS_LEARNING
RerankMulti-Modal
๐ŸŽญ

Multi-Modality

Detect and route text, image and audio inputs to the right modality-capable model.

๐Ÿงฌ

Bi-Encoder Embeddings

Independently encode queries and candidates into dense vectors for similarity search and semantic caching.

โšก

Cross-Encoder Learning

Joint cross-attention scoring of query-candidate pairs for high-precision reranking.

๐Ÿค”

Classification

Domain, jailbreak, PII and fact-check classification across 14 MMLU categories via ModernBERT with LoRA.

๐Ÿ‘๏ธ

Full Attention

Bidirectional attention across tokens and sentences โ€” full context in both directions, not causal masking.

๐Ÿช†

2DMSE

Adjust embedding layers and dimensions at inference time to trade compute for accuracy on the fly.

๐Ÿ“

MRL

Truncate embedding vectors to any dimension without retraining โ€” balance accuracy and speed per request.

๐Ÿ—๏ธ Architecture

Architecture

๐ŸŽฏ Our Goals

Building the System Level Intelligence for Mixture-of-Models (MoM), bringing Collective Intelligence into LLM systems

vLLM Semantic Router Banner
1
How to capture the missing signals in request, response and context?
2
How to combine the signals to make better decisions?
3
How to collaborate more efficiently between different models?
4
How to secure the real world and LLM system from jailbreaks, pii leaks, hallucinations?
5
How to collect the valuable signals and build a self-learning system?

๐Ÿ“ Where it lives

It lives between the real world and models

Where vLLM Semantic Router Lives

๐Ÿ‘ฅ Meet Our Team

The amazing people behind vLLM Semantic Router

Huamin ChenMaintainer

Huamin Chen

Distinguished Engineer @Red Hat

Chen WangMaintainer

Chen Wang

Senior Staff Research Scientist @IBM

Yue ZhuMaintainer

Yue Zhu

Staff Research Scientist @IBM

Xunzhuo LiuMaintainer

Xunzhuo Liu

Intelligent Routing @vLLM

Senan ZedanCommitter

Senan Zedan

R&D Manager @Red Hat

samzongCommitter

samzong

AI Infrastructure / Cloud-Native PM @DaoCloud

Liav WeissCommitter

Liav Weiss

Software Engineer @Red Hat

Asaad BalumCommitter

Asaad Balum

Senior Software Engineer @Red Hat

YehuditCommitter

Yehudit

Software Engineer @Red Hat

Noa LimoyCommitter

Noa Limoy

Software Engineer @Red Hat

JaredforRealCommitter

JaredforReal

Software Engineer @Z.ai

Srinivas ACommitter

Srinivas A

Software Engineer @Yokogawa

carloryCommitter

carlory

Open Source Engineer @DaoCloud

Yossi OvadiaCommitter

Yossi Ovadia

Senior Principal Engineer @Red Hat

Jintao ZhangCommitter

Jintao Zhang

Senior Software Engineer @Kong

yuluo-yxCommitter

yuluo-yx

Individual Contributor

cryo-zdCommitter

cryo-zd

Individual Contributor

OneZero-YCommitter

OneZero-Y

Individual Contributor

aeftCommitter

aeft

Individual Contributor

Huamin ChenMaintainer

Huamin Chen

Distinguished Engineer @Red Hat

Chen WangMaintainer

Chen Wang

Senior Staff Research Scientist @IBM

Yue ZhuMaintainer

Yue Zhu

Staff Research Scientist @IBM

Xunzhuo LiuMaintainer

Xunzhuo Liu

Intelligent Routing @vLLM

Senan ZedanCommitter

Senan Zedan

R&D Manager @Red Hat

samzongCommitter

samzong

AI Infrastructure / Cloud-Native PM @DaoCloud

Liav WeissCommitter

Liav Weiss

Software Engineer @Red Hat

Asaad BalumCommitter

Asaad Balum

Senior Software Engineer @Red Hat

YehuditCommitter

Yehudit

Software Engineer @Red Hat

Noa LimoyCommitter

Noa Limoy

Software Engineer @Red Hat

JaredforRealCommitter

JaredforReal

Software Engineer @Z.ai

Srinivas ACommitter

Srinivas A

Software Engineer @Yokogawa

carloryCommitter

carlory

Open Source Engineer @DaoCloud

Yossi OvadiaCommitter

Yossi Ovadia

Senior Principal Engineer @Red Hat

Jintao ZhangCommitter

Jintao Zhang

Senior Software Engineer @Kong

yuluo-yxCommitter

yuluo-yx

Individual Contributor

cryo-zdCommitter

cryo-zd

Individual Contributor

OneZero-YCommitter

OneZero-Y

Individual Contributor

aeftCommitter

aeft

Individual Contributor

Huamin ChenMaintainer

Huamin Chen

Distinguished Engineer @Red Hat

Chen WangMaintainer

Chen Wang

Senior Staff Research Scientist @IBM

Yue ZhuMaintainer

Yue Zhu

Staff Research Scientist @IBM

Xunzhuo LiuMaintainer

Xunzhuo Liu

Intelligent Routing @vLLM

Senan ZedanCommitter

Senan Zedan

R&D Manager @Red Hat

samzongCommitter

samzong

AI Infrastructure / Cloud-Native PM @DaoCloud

Liav WeissCommitter

Liav Weiss

Software Engineer @Red Hat

Asaad BalumCommitter

Asaad Balum

Senior Software Engineer @Red Hat

YehuditCommitter

Yehudit

Software Engineer @Red Hat

Noa LimoyCommitter

Noa Limoy

Software Engineer @Red Hat

JaredforRealCommitter

JaredforReal

Software Engineer @Z.ai

Srinivas ACommitter

Srinivas A

Software Engineer @Yokogawa

carloryCommitter

carlory

Open Source Engineer @DaoCloud

Yossi OvadiaCommitter

Yossi Ovadia

Senior Principal Engineer @Red Hat

Jintao ZhangCommitter

Jintao Zhang

Senior Software Engineer @Kong

yuluo-yxCommitter

yuluo-yx

Individual Contributor

cryo-zdCommitter

cryo-zd

Individual Contributor

OneZero-YCommitter

OneZero-Y

Individual Contributor

aeftCommitter

aeft

Individual Contributor

Acknowledgements

vLLM Semantic Router is born in open source and built on open source โค๏ธ