Blog

Journal

Release notes, field reports, and research commentary from the vLLM Semantic Router project.

2 posts tagged "agents"


Giving AgentGateway a Semantic Brain with vLLM Semantic Router

· 10 min read
Aayush Saini
SDE, Data and AI @ Red Hat
Anup Sharma
AI & Distributed System @ Nutanix

vLLM Agent Architecture Workflow: Custom Semantic Routing with AgentGateway and Semantic Router

Agent systems that span multiple models — a local endpoint for coding, a frontier cloud model for deep reasoning, and a fast general-purpose model for everyday tasks — all face the same routing question: how should each request be directed to the right backend?

Many deployments start with a lightweight Python proxy or keyword matcher in front of the gateway. That approach works at small scale, but misroutes multiply as traffic, languages, and task types diversify. This post shows how vLLM Semantic Router, running as an Envoy ExtProc sidecar inside AgentGateway, replaces that pattern with semantic, config-driven routing.
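To make the failure mode concrete, here is a minimal sketch of the kind of keyword matcher described above. It is not taken from the post or from either project; the backend names and keyword lists are hypothetical, chosen only to show how such a proxy misroutes once prompts arrive in other languages or use a keyword in a different sense.

```python
# Hypothetical keyword router; backend names and keywords are illustrative only.
KEYWORD_ROUTES = [
    (("code", "bug", "function"), "local-coding-model"),
    (("prove", "analyze", "reason"), "frontier-reasoning-model"),
]
DEFAULT_BACKEND = "fast-general-model"


def keyword_route(prompt):
    """Return the first backend whose keyword list matches a word in the prompt."""
    words = prompt.lower().split()
    for keywords, backend in KEYWORD_ROUTES:
        if any(keyword in words for keyword in keywords):
            return backend
    return DEFAULT_BACKEND


# A plain English coding request is routed as intended.
english = keyword_route("Fix the bug in this function")

# The same request in Spanish matches no keyword and falls through to the default.
spanish = keyword_route("Corrige el error en esta función")

# A non-coding question that happens to contain "code" lands on the coding backend.
dress_code = keyword_route("Is there a dress code for the gala?")
```

Here `english` resolves to the coding backend, while `spanish` and `dress_code` are both misrouted. Each such case needs another hand-written rule, which is the maintenance burden that routing on meaning rather than surface tokens is intended to remove.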

Semantic Tool Selection: Building Smarter AI Agents with Context-Aware Routing

· 11 min read
Xunzhuo Liu
Intelligent Routing @vLLM
Huamin Chen
Distinguished Engineer @ Red Hat

Anthropic recently published an insightful blog post on code execution with MCP, highlighting a critical challenge in modern AI systems: as agents connect to more tools, loading all tool definitions upfront becomes increasingly inefficient. Their solution—using code execution to load tools on-demand—demonstrates how established software engineering patterns can dramatically improve agent efficiency.

This resonates deeply with our experience building the vLLM Semantic Router. We've observed the same problem from a different angle: when AI agents have access to hundreds or thousands of tools, how do they know which tools are relevant for a given task?

Our solution: semantic tool selection—using semantic similarity to automatically select the most relevant tools for each user query before the request even reaches the LLM.
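The idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the vLLM Semantic Router implementation: `embed` is a toy bag-of-words stand-in for a real embedding model, and the tool catalog, `select_tools`, `top_k`, and `min_score` are hypothetical names introduced for the example.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_tools(query, tools, top_k=2, min_score=0.1):
    """Rank tool descriptions against the query and keep the best matches."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(desc)), name) for name, desc in tools.items()]
    # Drop tools below the similarity threshold, then sort best-first
    # (ties broken by name so the result is deterministic).
    scored = [(score, name) for score, name in scored if score >= min_score]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored[:top_k]]


# Hypothetical tool catalog: name -> natural-language description.
TOOLS = {
    "get_weather": "get the current weather forecast for a city",
    "send_email": "send an email message to a recipient",
    "search_code": "search source code in a repository",
}

selected = select_tools("what is the weather forecast in Paris", TOOLS, top_k=1)
```

With this catalog, `selected` contains only `get_weather`, so only that tool definition would be forwarded to the LLM. A real deployment would replace `embed` with a sentence-embedding model and would typically precompute the tool vectors once rather than re-embedding every description per request.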
