Install with AgentGateway
This guide provides step-by-step instructions for integrating the vLLM Semantic Router with AgentGateway on Kubernetes. AgentGateway acts as the Gateway API data plane for OpenAI-compatible traffic, and vLLM Semantic Router runs as an Envoy ExtProc server that classifies each request and mutates the request body before AgentGateway forwards it to vLLM.
Architecture Overview
The deployment consists of:
- vLLM Semantic Router: Provides prompt classification, model selection, request mutation, and response processing through ExtProc
- AgentGateway: Provides the Kubernetes Gateway API proxy and the AgentgatewayBackend, HTTPRoute, and AgentgatewayPolicy resources
- Demo vLLM-compatible backend: Serves a base model and LoRA adapters through an OpenAI-compatible API
Prerequisites
Before starting, ensure you have the following tools installed:
- kind - Kubernetes in Docker (Optional)
- kubectl - Kubernetes CLI
- Helm - Package manager for Kubernetes
This guide requires AgentGateway v1.3.0-alpha.1 or newer because it uses the ExtProc processingOptions and allowModeOverride fields that were added after v1.2.1.
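Before you begin, a quick preflight check can confirm that the required CLIs are on your PATH. The function below is a minimal sketch with a hypothetical name (require_tools); it only checks that each command exists, not its version, so confirm the AgentGateway chart version separately.

```shell
# Hypothetical preflight helper: report every required CLI that is not on PATH.
# Returns 0 when all tools are found, 1 otherwise.
require_tools() {
  _rt_missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing required tool: $tool" >&2
      _rt_missing=1
    fi
  done
  return "$_rt_missing"
}

# Usage (kind is optional, so it is checked separately):
#   require_tools kubectl helm && echo "kubectl and helm found"
#   require_tools kind || echo "kind not found; skip Step 1 and use an existing cluster"
```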
Step 1: Create Kind Cluster (Optional)
Create a local Kubernetes cluster for testing:
kind create cluster --name semantic-router-agentgateway
# Verify cluster is ready
kubectl wait --for=condition=Ready nodes --all --timeout=300s
Step 2: Install AgentGateway
Install the Kubernetes Gateway API CRDs and the AgentGateway control plane:
export AGENTGATEWAY_VERSION=v1.3.0-alpha.1
kubectl apply --server-side --force-conflicts \
-f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.5.0/standard-install.yaml
helm upgrade -i agentgateway-crds oci://cr.agentgateway.dev/charts/agentgateway-crds \
--create-namespace \
--namespace agentgateway-system \
--version "${AGENTGATEWAY_VERSION}" \
--set controller.image.pullPolicy=Always
helm upgrade -i agentgateway oci://cr.agentgateway.dev/charts/agentgateway \
--namespace agentgateway-system \
--version "${AGENTGATEWAY_VERSION}" \
--set controller.image.pullPolicy=Always \
--set controller.extraEnv.KGW_ENABLE_GATEWAY_API_EXPERIMENTAL_FEATURES=true \
--wait
kubectl get pods -n agentgateway-system
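To confirm that both the Gateway API CRDs and the AgentGateway CRDs were registered, you can filter the output of kubectl get crd -o name. The helper below is a sketch with a hypothetical name (missing_crds). The AgentGateway CRD names used in the usage comment (agentgatewaybackends.agentgateway.dev and agentgatewaypolicies.agentgateway.dev) are assumptions derived from the AgentgatewayBackend and AgentgatewayPolicy kinds and the agentgateway.dev group used later in this guide, so check them against kubectl api-resources on your cluster.

```shell
# Hypothetical helper: read `kubectl get crd -o name` output on stdin and print
# every expected CRD name (given as arguments) that is absent.
# Returns 1 if anything is missing.
missing_crds() {
  _mc_names=$(cat)
  _mc_status=0
  for crd in "$@"; do
    # `-o name` prints lines such as
    # customresourcedefinition.apiextensions.k8s.io/gateways.gateway.networking.k8s.io
    if ! printf '%s\n' "$_mc_names" | grep -qxF "customresourcedefinition.apiextensions.k8s.io/$crd"; then
      echo "$crd"
      _mc_status=1
    fi
  done
  return "$_mc_status"
}

# Usage (the two agentgateway.dev names are assumptions, see above):
#   kubectl get crd -o name | missing_crds \
#     gateways.gateway.networking.k8s.io \
#     httproutes.gateway.networking.k8s.io \
#     agentgatewaybackends.agentgateway.dev \
#     agentgatewaypolicies.agentgateway.dev
```

An empty output and a zero exit status mean every listed CRD is present.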
Step 3: Create an AgentGateway Proxy
Create a Gateway that uses the AgentGateway GatewayClass:
kubectl apply -f- <<'EOF'
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: agentgateway-proxy
  namespace: agentgateway-system
spec:
  gatewayClassName: agentgateway
  listeners:
  - protocol: HTTP
    port: 80
    name: http
    allowedRoutes:
      namespaces:
        from: All
EOF
kubectl wait --for=condition=Available deployment/agentgateway-proxy \
-n agentgateway-system \
--timeout=300s
Step 4: Deploy Demo LLM
Deploy a lightweight OpenAI-compatible simulator that serves the base model (base-model) plus the LoRA adapter names that the Semantic Router demo configuration selects:
kubectl apply -f- <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-llama3-8b-instruct
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vllm-llama3-8b-instruct
  template:
    metadata:
      labels:
        app: vllm-llama3-8b-instruct
    spec:
      containers:
      - name: vllm-sim
        image: ghcr.io/llm-d/llm-d-inference-sim:v0.5.0
        imagePullPolicy: IfNotPresent
        args:
        - --model
        - base-model
        - --port
        - "8000"
        - --max-loras
        - "6"
        - --lora-modules
        - '{"name": "math-expert"}'
        - '{"name": "science-expert"}'
        - '{"name": "social-expert"}'
        - '{"name": "humanities-expert"}'
        - '{"name": "law-expert"}'
        - '{"name": "general-expert"}'
        ports:
        - containerPort: 8000
          name: http
          protocol: TCP
        readinessProbe:
          httpGet:
            path: /health
            port: http
          periodSeconds: 5
          timeoutSeconds: 5
          failureThreshold: 3
---
apiVersion: v1
kind: Service
metadata:
  name: vllm-llama3-8b-instruct
  namespace: default
  labels:
    app: vllm-llama3-8b-instruct
spec:
  type: ClusterIP
  ports:
  - port: 8000
    targetPort: 8000
    protocol: TCP
  selector:
    app: vllm-llama3-8b-instruct
EOF
kubectl wait --for=condition=Available deployment/vllm-llama3-8b-instruct \
-n default \
--timeout=300s
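Before adding the gateway layer, it can help to confirm that the simulator serves the base model and every adapter listed in the Deployment. The sketch below assumes the simulator answers GET /v1/models with an OpenAI-style list ({"data": [{"id": ...}, ...]}). The function names list_model_ids and missing_models are hypothetical, and the grep/sed parsing is a shortcut that avoids a jq dependency rather than a general JSON parser.

```shell
# Hypothetical helper: print each model "id" from an OpenAI-style /v1/models
# response read on stdin, one per line. This is a grep/sed shortcut, not a JSON
# parser: it assumes ids contain no escaped quotes, and any nested object that
# also has an "id" key is printed too.
list_model_ids() {
  grep -o '"id"[[:space:]]*:[[:space:]]*"[^"]*"' |
    sed -E 's/.*"([^"]*)"$/\1/'
}

# Hypothetical helper: read a /v1/models response on stdin and print each name
# given as an argument that the server does not list. Returns 1 if any is missing.
missing_models() {
  _mm_ids=$(list_model_ids)
  _mm_status=0
  for name in "$@"; do
    if ! printf '%s\n' "$_mm_ids" | grep -qxF "$name"; then
      echo "$name"
      _mm_status=1
    fi
  done
  return "$_mm_status"
}

# Usage, with a port-forward to the demo Service in one terminal:
#   kubectl port-forward -n default svc/vllm-llama3-8b-instruct 8000:8000
# and in another terminal:
#   curl -s http://localhost:8000/v1/models | list_model_ids
#   curl -s http://localhost:8000/v1/models | missing_models base-model \
#     math-expert science-expert social-expert humanities-expert law-expert general-expert
```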
Step 5: Deploy vLLM Semantic Router
Install the Semantic Router in the agentgateway-system namespace so the AgentGateway ExtProc policy can reference the semantic-router service directly:
helm install semantic-router oci://ghcr.io/vllm-project/charts/semantic-router \
--version v0.0.0-latest \
--namespace agentgateway-system \
-f https://raw.githubusercontent.com/vllm-project/semantic-router/refs/heads/main/deploy/kubernetes/agentgateway/semantic-router-values/values.yaml
kubectl wait --for=condition=Available deployment/semantic-router \
-n agentgateway-system \
--timeout=600s
The values file configures Semantic Router to send traffic to vllm-llama3-8b-instruct.default.svc.cluster.local:8000 and to select adapter names such as math-expert, science-expert, and general-expert.
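The backend address in the values file follows the standard Kubernetes Service DNS pattern <service>.<namespace>.svc.<cluster-domain>. The helper below is a small sketch with a hypothetical name (svc_address) that builds such an address. It assumes the common default cluster domain cluster.local and lets you override it through a CLUSTER_DOMAIN variable (also hypothetical) if your cluster uses a different domain.

```shell
# Hypothetical helper: build the in-cluster DNS address of a Service.
# CLUSTER_DOMAIN defaults to cluster.local, the usual Kubernetes default.
svc_address() {
  # $1 = Service name, $2 = namespace, $3 = port
  printf '%s.%s.svc.%s:%s\n' "$1" "$2" "${CLUSTER_DOMAIN:-cluster.local}" "$3"
}

# Usage:
#   svc_address vllm-llama3-8b-instruct default 8000
# To check that the name resolves from inside the cluster:
#   kubectl run dns-check -n default --rm -it --restart=Never --image=busybox:1.36 \
#     -- nslookup vllm-llama3-8b-instruct.default.svc.cluster.local
```

If you move the simulator to another namespace, the same address appears in two places in this guide, the Semantic Router values file and the host field of the AgentgatewayBackend in Step 6, so both need updating.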
Step 6: Create AgentGateway Routing Resources
Create an AgentgatewayBackend for the vLLM-compatible backend and route OpenAI-compatible requests to it:
kubectl apply -f- <<'EOF'
apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayBackend
metadata:
  name: semantic-router-vllm
  namespace: agentgateway-system
spec:
  ai:
    provider:
      openai: {}
      host: vllm-llama3-8b-instruct.default.svc.cluster.local
      port: 8000
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: semantic-router-vllm
  namespace: agentgateway-system
spec:
  parentRefs:
  - name: agentgateway-proxy
    namespace: agentgateway-system
  rules:
  - backendRefs:
    - name: semantic-router-vllm
      namespace: agentgateway-system
      group: agentgateway.dev
      kind: AgentgatewayBackend
EOF
The openai.model field is intentionally omitted so AgentGateway uses the model name from the request body after Semantic Router selects the target model or LoRA adapter.
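Because openai.model is left unset, the model value in the request body is what reaches vLLM. The sed helper below imitates the effect of the mutation on a request body so you can see what AgentGateway ends up forwarding. It is only an illustration with a hypothetical name (set_request_model); Semantic Router performs the real rewrite inside its ExtProc server, not with sed.

```shell
# Illustration only (hypothetical helper): replace the value of the first
# "model" field on each line of a JSON request body read on stdin.
# Assumes the new name contains no characters special to sed (/, &, \).
set_request_model() {
  sed -E 's/"model"[[:space:]]*:[[:space:]]*"[^"]*"/"model": "'"$1"'"/'
}

# Usage:
#   printf '%s\n' '{"model": "auto", "max_tokens": 64}' | set_request_model math-expert
# prints:
#   {"model": "math-expert", "max_tokens": 64}
```

This is also why a single HTTPRoute and a single AgentgatewayBackend are enough here: the choice between adapters travels in the model name, and the simulator behind one Service serves all of them.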
Step 7: Attach Semantic Router as ExtProc
Create an AgentgatewayPolicy that sends request and response processing phases to the Semantic Router ExtProc service:
kubectl apply -f- <<'EOF'
apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayPolicy
metadata:
  name: semantic-router-extproc
  namespace: agentgateway-system
spec:
  targetRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: agentgateway-proxy
  traffic:
    extProc:
      backendRef:
        name: semantic-router
        namespace: agentgateway-system
        port: 50051
      processingOptions:
        requestHeaderMode: Send
        requestBodyMode: Buffered
        responseHeaderMode: Send
        responseBodyMode: Buffered
        allowModeOverride: true
EOF
The demo uses buffered request bodies to match the existing Envoy AI Gateway and Istio examples. For large prompts, use requestBodyMode: FullDuplexStreamed together with a Semantic Router configuration that enables streamed body handling. AgentGateway does not support Streamed mode; FullDuplexStreamed is the only streaming option.
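If you later switch the request body to streaming, you can patch the existing policy instead of re-applying it. The helper below is a sketch with a hypothetical name (request_body_mode_patch). It prints a JSON merge patch whose path, spec.traffic.extProc.processingOptions.requestBodyMode, mirrors the policy layout shown in Step 7, and it refuses Streamed because AgentGateway does not support that mode. Other values are passed through unchecked, and Semantic Router still needs its own streamed-body configuration.

```shell
# Hypothetical helper: print a JSON merge patch that sets requestBodyMode on the
# semantic-router-extproc policy. Rejects Streamed, which AgentGateway does not
# support; any other non-empty value is passed through unchecked.
request_body_mode_patch() {
  if [ -z "$1" ]; then
    echo "usage: request_body_mode_patch <mode>" >&2
    return 1
  fi
  if [ "$1" = "Streamed" ]; then
    echo "Streamed is not supported by AgentGateway; use FullDuplexStreamed" >&2
    return 1
  fi
  printf '{"spec":{"traffic":{"extProc":{"processingOptions":{"requestBodyMode":"%s"}}}}}\n' "$1"
}

# Usage (a merge patch is used because strategic merge patches do not apply to CRDs):
#   kubectl patch agentgatewaypolicy semantic-router-extproc \
#     -n agentgateway-system --type merge \
#     -p "$(request_body_mode_patch FullDuplexStreamed)"
```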
Testing the Deployment
Start a port-forward to the AgentGateway proxy:
kubectl port-forward -n agentgateway-system svc/agentgateway-proxy 8080:80
In another terminal, send an OpenAI-compatible request with "model": "auto":
curl -i -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [
      {"role": "user", "content": "What is the derivative of f(x) = x^3?"}
    ],
    "max_tokens": 64,
    "temperature": 0
  }'
Semantic Router should classify the math prompt, select the configured math route, and rewrite the request's model field before AgentGateway forwards the request to the vLLM-compatible backend. Because the command uses -i, curl prints the response headers, so you can inspect the headers Semantic Router adds, such as the selected-model metadata.
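To pull the relevant parts out of the curl -i output, you can filter headers by prefix and read the model field of the JSON body. In the sketch below the function names (headers_with_prefix, response_model) are hypothetical, and the x-vsr- header prefix in the usage comment is an assumption about how Semantic Router names its headers, so check the actual names in your output first. Whether the response model field shows the selected adapter also depends on the simulator echoing the requested model, which is assumed here.

```shell
# Hypothetical helper: print response headers whose name starts with the given
# prefix (case-insensitive) from `curl -i` output on stdin. Stops at the blank
# line that ends the headers and strips the CR from HTTP line endings.
headers_with_prefix() {
  tr -d '\r' | awk -v p="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" '
    /^$/ { exit }
    { if (index(tolower($0), p) == 1) print }'
}

# Hypothetical helper: print the first "model" value found in the output on stdin
# (for a chat completion this is normally the top-level "model" of the body).
response_model() {
  tr -d '\r' | grep -o '"model"[[:space:]]*:[[:space:]]*"[^"]*"' | head -n 1 |
    sed -E 's/.*"([^"]*)"$/\1/'
}

# Usage: save one response, then inspect it.
#   curl -si -X POST http://localhost:8080/v1/chat/completions \
#     -H "Content-Type: application/json" \
#     -d '{"model": "auto", "messages": [{"role": "user", "content": "What is the derivative of f(x) = x^3?"}]}' \
#     > response.txt
#   headers_with_prefix x-vsr- < response.txt
#   response_model < response.txt
```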
Troubleshooting
AgentGateway proxy not ready:
kubectl get gateway agentgateway-proxy -n agentgateway-system
kubectl get deployment agentgateway-proxy -n agentgateway-system
kubectl logs -n agentgateway-system deployment/agentgateway
HTTPRoute or AgentGateway backend not accepted:
kubectl describe httproute semantic-router-vllm -n agentgateway-system
kubectl describe agentgatewaybackend semantic-router-vllm -n agentgateway-system
Semantic Router not responding to ExtProc:
kubectl get pods -n agentgateway-system
kubectl get svc semantic-router -n agentgateway-system
kubectl logs -n agentgateway-system deployment/semantic-router
kubectl describe agentgatewaypolicy semantic-router-extproc -n agentgateway-system
Demo LLM not responding:
kubectl get pods -n default -l app=vllm-llama3-8b-instruct
kubectl logs -n default deployment/vllm-llama3-8b-instruct
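When several pods are involved, a quick filter over kubectl get pods --no-headers output can point you at the one that is not ready. The awk sketch below uses a hypothetical function name (not_ready_pods) and assumes the default column layout NAME, READY, STATUS, RESTARTS, AGE. Pods that finished normally (for example Completed job pods) are also reported, since their status is not Running.

```shell
# Hypothetical helper: from `kubectl get pods --no-headers` output on stdin,
# print "name ready status" for every pod whose READY count is incomplete
# or whose STATUS is not Running.
not_ready_pods() {
  awk '{
    split($2, r, "/")
    if (r[1] != r[2] || $3 != "Running") print $1, $2, $3
  }'
}

# Usage:
#   kubectl get pods -n agentgateway-system --no-headers | not_ready_pods
#   kubectl get pods -n default --no-headers | not_ready_pods
```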
Cleanup
To remove the entire deployment:
kubectl delete agentgatewaypolicy semantic-router-extproc -n agentgateway-system
kubectl delete httproute semantic-router-vllm -n agentgateway-system
kubectl delete agentgatewaybackend semantic-router-vllm -n agentgateway-system
kubectl delete gateway agentgateway-proxy -n agentgateway-system
kubectl delete deployment vllm-llama3-8b-instruct -n default
kubectl delete service vllm-llama3-8b-instruct -n default
helm uninstall semantic-router -n agentgateway-system
helm uninstall agentgateway -n agentgateway-system
helm uninstall agentgateway-crds -n agentgateway-system
# Only if you created the kind cluster in Step 1
kind delete cluster --name semantic-router-agentgateway