Design Doc: Multi-Protocol Adapter Architecture
Author: vLLM Semantic Router Team
Status: To be Implemented
Created: February 2026
Last Updated: February 2026
Overview
This document describes the design and implementation of the multi-protocol adapter architecture for vLLM Semantic Router, which abstracts the API layer to support multiple front-end protocols beyond Envoy ExtProc.
Background
The Semantic Router is tightly coupled to Envoy's External Processor (ExtProc) protocol via gRPC. While this provides powerful integration with Envoy, it creates barriers for users who:
- Want to use the router without deploying Envoy
- Prefer direct HTTP/REST API integration
- Use Nginx or other reverse proxies
- Need simpler deployment architectures for development or testing
Motivation
- Flexibility: Users need direct HTTP API access without requiring Envoy infrastructure
- Testing: Developers need lightweight testing without full Envoy deployment
- Extensibility: Support for Nginx, native gRPC, and custom protocols
- Reusability: Single routing engine shared across all protocols
- Deployment Options: Enable serverless, edge, and simplified deployment scenarios
Goals
Primary Goals
- Protocol Abstraction: Separate routing logic from protocol-specific code
- Multi-Protocol Support: Enable simultaneous operation of multiple protocols
- Backward Compatibility: Preserve existing ExtProc functionality
- Shared State: Single source of truth for cache, replay, and routing decisions
- Easy Extension: Simple pattern for adding new protocol adapters
Non-Goals
- Replace or deprecate Envoy ExtProc support
- Change routing decision algorithms or classification logic
- Modify configuration format beyond adapter section
- Support protocol-specific features that break abstraction
Design Principles
1. Single Routing Pipeline
CRITICAL: All routing logic MUST flow through RouterEngine.Route(). No exceptions.
- Adapters translate protocol → RouteRequest → call RouterEngine.Route()
- RouterEngine.Route() returns RouteResponse → adapters translate → protocol
- Adapters MUST NOT duplicate classification, security, cache, or replay logic
- Adapters MUST NOT directly call classifiers, cache, or replay recorders
2. Thin Adapter Layer
Adapters perform protocol translation only:
- Parse protocol-specific request format
- Convert to RouteRequest
- Call RouterEngine.Route()
- Convert RouteResponse to protocol format
- Return to client
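The five steps above could be sketched as a single HTTP handler. This is a minimal illustration, not the project's implementation: the fields of RouteRequest and RouteResponse, the Router interface, and stubEngine are assumptions introduced so the example runs on its own.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// RouteRequest and RouteResponse are hypothetical, simplified stand-ins for
// the engine's protocol-agnostic types; the real fields are not specified here.
type RouteRequest struct {
	Model string `json:"model"`
	Query string `json:"query"`
}

type RouteResponse struct {
	SelectedModel string `json:"selected_model"`
}

// Router is the only engine surface the adapter depends on.
type Router interface {
	Route(ctx context.Context, req *RouteRequest) (*RouteResponse, error)
}

// stubEngine stands in for RouterEngine so the sketch is self-contained.
type stubEngine struct{}

func (stubEngine) Route(_ context.Context, req *RouteRequest) (*RouteResponse, error) {
	return &RouteResponse{SelectedModel: "routed-" + req.Model}, nil
}

// routeHandler does protocol translation only:
// parse, convert to RouteRequest, call Route, convert the RouteResponse back.
func routeHandler(r Router) http.HandlerFunc {
	return func(w http.ResponseWriter, httpReq *http.Request) {
		var req RouteRequest
		if err := json.NewDecoder(httpReq.Body).Decode(&req); err != nil {
			http.Error(w, "invalid request body", http.StatusBadRequest)
			return
		}
		resp, err := r.Route(httpReq.Context(), &req)
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		_ = json.NewEncoder(w).Encode(resp)
	}
}

// serveOnce drives the handler in memory and returns the status code and body.
func serveOnce(body string) (int, string) {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodPost, "/v1/route", strings.NewReader(body))
	routeHandler(stubEngine{})(rec, req)
	return rec.Code, strings.TrimSpace(rec.Body.String())
}

func main() {
	code, body := serveOnce(`{"model":"auto","query":"hello"}`)
	fmt.Println(code, body)
}
```

Note that the handler contains no classification, cache, or replay code; everything beyond translation sits behind the Router interface.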
3. RouterEngine Owns All Routing
RouterEngine.Route() is the ONLY place where:
- Classification happens
- PII/jailbreak detection runs
- Cache is checked/updated
- Tools are selected
- Replay is recorded
- Backend selection occurs
- Proxying happens (or proxy info is returned)
Design
Architecture Overview
                      Application Layer

                       Adapter Manager
                 - Reads adapter config
                 - Creates protocol adapters
                 - Manages lifecycle
                              |
      +---------------+-------+-------+---------------+
      |               |               |               |
      v               v               v               v
ExtProc Adapter  HTTP Adapter   gRPC Adapter    Nginx Adapter
Parse ExtProc    Parse HTTP     Parse gRPC      Parse NJS
Convert to       Convert to     Convert to      Convert to
RouteRequest     RouteRequest   RouteRequest    RouteRequest
      |               |               |               |
      +---------------+-------+-------+---------------+
                              |  Single Entry Point
                              v
                  RouterEngine.Route()
                  1. Classify request
                  2. Check PII / jailbreak
                  3. Check cache
                  4. Select tools
                  5. Select model/backend
                  6. Record replay
                  7. Proxy to backend (via Backend Layer)
                  8. Update cache
                              |
                              v
                        RouteResponse
                              |
      +---------------+-------+-------+---------------+
      |               |               |               |
      v               v               v               v
ExtProc Adapter  HTTP Adapter   gRPC Adapter    Nginx Adapter
Convert to       Convert to     Convert to      Convert to
gRPC             HTTP           gRPC            NJS
      |               |               |               |
      +---------------+-------+-------+---------------+
                              |
                              v
                  Backend Abstraction Layer
                      |               |
                      v               v
               Envoy Proxy       Direct Proxy
               (ExtProc mode)    (HTTP/gRPC)
               - Dynamic fwd     - HTTP client
               - Headers only    - Full response
                      |               |
                      +-------+-------+
                              |
                              v
                     Inference Backends
                 (vLLM Server, Ollama Server)
Key Insight: Adapters are thin translation layers. All intelligence lives in RouterEngine.
Component Design
1. RouterEngine (Core)
Location: pkg/router/engine/
Responsibilities:
- Protocol-agnostic routing logic
- Request classification and decision evaluation
- Semantic cache operations
- Tool selection and embedding
- Router replay recording
- PII and jailbreak detection
- Model selection
Key Methods:
type RouterEngine struct {
    Config          *config.RouterConfig
    Classifier      *classification.Classifier
    PIIChecker      *pii.PolicyChecker
    Cache           cache.CacheBackend
    ToolsDatabase   *tools.ToolsDatabase
    ModelSelector   *selection.Registry
    ReplayRecorders map[string]*routerreplay.Recorder
}
func (e *RouterEngine) Route(ctx context.Context, req *RouteRequest) (*RouteResponse, error)
func (e *RouterEngine) ClassifyRequest(ctx context.Context, messages []Message) (*ClassificationResult, error)
func (e *RouterEngine) CheckCache(ctx context.Context, model, query, decisionName string) (string, bool, error)
func (e *RouterEngine) UpdateCache(ctx context.Context, model, query, response, decisionName string) error
func (e *RouterEngine) SelectTools(ctx context.Context, query string, topK int) ([]openai.ChatCompletionToolParam, error)
func (e *RouterEngine) RecordReplay(ctx context.Context, decisionName string, record *routerreplay.RoutingRecord) error
Design Decisions:
- Single instance shared across all adapters
- Stateful (maintains cache, replay recorders)
- No protocol-specific logic
- Returns protocol-agnostic data structures
2. Adapter Interface
Location: pkg/adapter/manager.go
type Adapter interface {
    Start() error                    // Start the adapter (blocks)
    Stop() error                     // Graceful shutdown
    GetEngine() *engine.RouterEngine // Access to shared engine
}
Design Decisions:
- Minimal interface for maximum flexibility
- Each adapter owns its lifecycle
- No protocol-specific methods in interface
- Adapters run in separate goroutines
3. Adapter Manager
Location: pkg/adapter/manager.go
Responsibilities:
- Parse adapter configuration
- Instantiate adapters based on config
- Start adapters in separate goroutines
- Coordinate graceful shutdown
Key Methods:
func (m *Manager) CreateAdapters(cfg *config.RouterConfig, eng *engine.RouterEngine, configPath string) error
func (m *Manager) StartAll() error
func (m *Manager) StopAll() error
func (m *Manager) Wait()
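One way the start, stop, and wait responsibilities could fit together is sketched below. It is an illustration under assumptions: the Manager fields, the error channel, Wait returning an error, and fakeAdapter are hypothetical, and GetEngine is left out of the interface to keep the example self-contained.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Adapter mirrors the Start/Stop part of the interface described above.
type Adapter interface {
	Start() error // blocks until the adapter stops
	Stop() error
}

// Manager is a hypothetical, minimal lifecycle coordinator.
type Manager struct {
	adapters []Adapter
	wg       sync.WaitGroup
	errs     chan error
}

func NewManager(adapters ...Adapter) *Manager {
	return &Manager{adapters: adapters, errs: make(chan error, len(adapters))}
}

// StartAll launches every adapter in its own goroutine.
func (m *Manager) StartAll() error {
	for _, a := range m.adapters {
		m.wg.Add(1)
		go func(a Adapter) {
			defer m.wg.Done()
			if err := a.Start(); err != nil {
				m.errs <- err // buffered, so this never blocks
			}
		}(a)
	}
	return nil
}

// StopAll asks every adapter to shut down and joins any Stop errors.
func (m *Manager) StopAll() error {
	var errs []error
	for _, a := range m.adapters {
		errs = append(errs, a.Stop())
	}
	return errors.Join(errs...) // nil when every Stop succeeded
}

// Wait blocks until all adapters have returned, then reports the first
// Start error, if any. It must be called only once.
func (m *Manager) Wait() error {
	m.wg.Wait()
	close(m.errs)
	return <-m.errs // a closed, empty channel yields nil
}

// fakeAdapter blocks in Start until Stop is called.
type fakeAdapter struct {
	done chan struct{}
	once sync.Once
}

func newFakeAdapter() *fakeAdapter { return &fakeAdapter{done: make(chan struct{})} }

func (f *fakeAdapter) Start() error { <-f.done; return nil }

func (f *fakeAdapter) Stop() error {
	f.once.Do(func() { close(f.done) })
	return nil
}

func main() {
	m := NewManager(newFakeAdapter(), newFakeAdapter())
	if err := m.StartAll(); err != nil {
		fmt.Println("start error:", err)
	}
	if err := m.StopAll(); err != nil {
		fmt.Println("stop error:", err)
	}
	fmt.Println("wait result:", m.Wait())
}
```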
4. ExtProc Adapter
Location: pkg/adapter/extproc/
Responsibilities:
- Wrap existing Envoy ExtProc implementation
- Maintain backward compatibility
- Handle gRPC/Envoy protocol specifics
- Support TLS configuration
Key Features:
- Uses existing extproc.OpenAIRouter internally
- Translates Envoy requests to RouterEngine calls
- Preserves all existing ExtProc functionality
- Configurable TLS support
5. HTTP Adapter
Location: pkg/adapter/http/
Responsibilities:
- Provide OpenAI-compatible REST API
- Direct access without Envoy
- Handle HTTP-specific concerns (CORS, headers, etc.)
Endpoints:
- POST /v1/chat/completions - Chat completions
- POST /v1/completions - Text completions (future)
- GET /v1/models - List available models
- POST /v1/classify - Classification endpoint
- POST /v1/route - Routing decision endpoint
- GET /v1/router_replay - List replay records
- GET /v1/router_replay/{id} - Get replay record
- GET /health - Health check
- GET /ready - Readiness check
6. gRPC Adapter
Location: pkg/adapter/grpc/
Responsibilities:
- Provide native gRPC API for routing
- More efficient than ExtProc for direct gRPC clients
- Custom service definition optimized for routing
- Support for streaming and unary RPCs
Key Features:
- Custom .proto service definition
- Optimized for low-latency routing decisions
- Supports both synchronous and asynchronous routing
- Built-in load balancing and connection pooling
- Compatible with gRPC ecosystem (grpc-gateway, etc.)
Service Definition:
service SemanticRouter {
    rpc Route(RouteRequest) returns (RouteResponse);
    rpc Classify(ClassifyRequest) returns (ClassifyResponse);
    rpc StreamRoute(stream RouteRequest) returns (stream RouteResponse);
}
7. Nginx Adapter
Location: pkg/adapter/nginx/
Responsibilities:
- Integration with Nginx via NJS (JavaScript) module
- Lua script support for OpenResty
- Header-based routing similar to ExtProc
- Direct Nginx upstream configuration
Key Features:
- NJS module for request/response processing
- Communicates with RouterEngine via HTTP or gRPC
- Sets upstream selection based on routing decision
- Minimal overhead compared to ExtProc
- Native Nginx performance characteristics
Integration Methods:
- NJS Module: JavaScript-based request processing
- Lua/OpenResty: For OpenResty deployments
- HTTP Subrequest: Calls HTTP adapter internally
- Shared Memory: Direct IPC with RouterEngine process
Configuration Design
Adapter Configuration
adapters:
  - type: "envoy"  # ExtProc adapter
    enabled: true
    port: 50051
    tls:
      enabled: true
      cert_file: "/path/to/cert.pem"
      key_file: "/path/to/key.pem"
  - type: "http"  # HTTP REST API
    enabled: true
    port: 9000
  - type: "grpc"  # Native gRPC API
    enabled: true
    port: 50052
    tls:
      enabled: true
      cert_file: "/path/to/cert.pem"
      key_file: "/path/to/key.pem"
  - type: "nginx"  # Nginx integration
    enabled: true
    port: 9001
    mode: "njs"  # Options: njs, lua, http
    config:
      upstream_variable: "backend_upstream"
      header_prefix: "x-vsr-"
Design Decisions:
- Array allows multiple adapters of same type (future: multiple HTTP on different ports)
- Per-adapter TLS configuration
- Simple enable/disable without removing config
- Port configuration at adapter level
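Because the adapters section is an array with per-adapter ports and TLS, some validation at load time is likely useful. The sketch below shows one possible check; the AdapterConfig and TLSConfig structs and the enabledAdapters helper are hypothetical mirrors of the YAML above, not the project's actual config types.

```go
package main

import "fmt"

// TLSConfig and AdapterConfig are hypothetical Go mirrors of one entry
// in the adapters list.
type TLSConfig struct {
	Enabled  bool
	CertFile string
	KeyFile  string
}

type AdapterConfig struct {
	Type    string
	Enabled bool
	Port    int
	TLS     *TLSConfig
}

// enabledAdapters returns the enabled entries and rejects invalid ports,
// port conflicts, and TLS blocks that are enabled but incomplete.
// Disabled entries are skipped without being validated.
func enabledAdapters(all []AdapterConfig) ([]AdapterConfig, error) {
	seen := make(map[int]string) // port -> adapter type that claimed it
	var out []AdapterConfig
	for _, a := range all {
		if !a.Enabled {
			continue
		}
		if a.Port < 1 || a.Port > 65535 {
			return nil, fmt.Errorf("adapter %q: invalid port %d", a.Type, a.Port)
		}
		if other, dup := seen[a.Port]; dup {
			return nil, fmt.Errorf("adapters %q and %q both use port %d", other, a.Type, a.Port)
		}
		if a.TLS != nil && a.TLS.Enabled && (a.TLS.CertFile == "" || a.TLS.KeyFile == "") {
			return nil, fmt.Errorf("adapter %q: TLS enabled without cert_file and key_file", a.Type)
		}
		seen[a.Port] = a.Type
		out = append(out, a)
	}
	return out, nil
}

func main() {
	cfgs := []AdapterConfig{
		{Type: "envoy", Enabled: true, Port: 50051},
		{Type: "http", Enabled: false, Port: 9000},
		{Type: "grpc", Enabled: true, Port: 50052},
	}
	active, err := enabledAdapters(cfgs)
	fmt.Println(len(active), err)
}
```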
Data Flow
Request Flow (HTTP Adapter Example)
1. Client Request
   ↓
2. HTTP Adapter receives POST /v1/chat/completions
   ↓
3. Parse OpenAI request format
   ↓
4. Call RouterEngine.Route(RouteRequest)
   ↓
5. RouterEngine performs:
   - Classification (which decision matches?)
   - Cache check (semantic similarity)
   - Tool selection (if enabled)
   - Replay recording (if configured)
   ↓
6. RouterEngine returns RouteResponse
   ↓
7. HTTP Adapter proxies to selected backend
   ↓
8. Return response to client
Shared State Flow
HTTP Request A → HTTP Adapter
                      ↓
                 RouterEngine ↔ Cache (hit/miss)
                      ↑
ExtProc Request B → ExtProc Adapter
Both adapters share:
- Same cache entries
- Same replay recorders
- Same classification decisions
- Same model selection state
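The effect of sharing one engine can be illustrated with two adapters that hold the same pointer: a response cached through one is a hit through the other. The engine and adapter types here are hypothetical stand-ins, and an exact-match map replaces the semantic cache purely to keep the sketch short.

```go
package main

import (
	"fmt"
	"sync"
)

// engine is a hypothetical stand-in for RouterEngine with an exact-match cache.
type engine struct {
	mu    sync.Mutex
	cache map[string]string
}

// route returns the cached response if present; otherwise it produces a
// response (standing in for a backend call) and caches it.
func (e *engine) route(query string) (string, bool) {
	e.mu.Lock()
	defer e.mu.Unlock()
	if cached, ok := e.cache[query]; ok {
		return cached, true
	}
	resp := "answer to " + query
	e.cache[query] = resp
	return resp, false
}

// adapter holds only a reference to the shared engine and no state of its own.
type adapter struct {
	name string
	eng  *engine
}

func (a adapter) handle(query string) (string, bool) { return a.eng.route(query) }

func main() {
	shared := &engine{cache: make(map[string]string)}
	httpAdapter := adapter{name: "http", eng: shared}
	extprocAdapter := adapter{name: "extproc", eng: shared}

	_, hitA := httpAdapter.handle("what is 2+2?")
	_, hitB := extprocAdapter.handle("what is 2+2?")
	fmt.Println(httpAdapter.name, "hit:", hitA)
	fmt.Println(extprocAdapter.name, "hit:", hitB)
}
```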
Implementation Details
Initialization Sequence
// main.go
1. Load configuration
2. Initialize embedding models
3. Create RouterEngine (NewRouterEngine)
   - Initialize classifier
   - Create semantic cache
   - Setup tools database
   - Initialize replay recorders per decision
   - Setup model selector
4. Create Adapter Manager
5. Manager creates adapters (CreateAdapters)
   - Each adapter gets reference to RouterEngine
   - Per-adapter configuration (port, TLS)
6. Manager starts all adapters (StartAll)
   - Each adapter in separate goroutine
7. Main blocks on Manager.Wait()
Error Handling
- Adapter Creation Failure: Fatal error, application exits
- Adapter Start Failure: Fatal error, application exits
- Runtime Errors: Logged, adapter continues if possible
- RouterEngine Errors: Returned to adapter for protocol-specific handling
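For the last point, protocol-specific handling in an HTTP adapter might amount to mapping engine errors to status codes. The sentinel errors ErrBlockedByPolicy and ErrNoBackend and the chosen status codes below are assumptions made for illustration; the design does not define the engine's error types.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// Hypothetical sentinel errors an engine might return; the names are illustrative.
var (
	ErrBlockedByPolicy = errors.New("request blocked by policy")
	ErrNoBackend       = errors.New("no backend available")
)

// wrap adds context the way an engine might, while keeping the sentinel
// reachable through errors.Is.
func wrap(err error) error {
	return fmt.Errorf("route: %w", err)
}

// httpStatusFor is the HTTP adapter's translation of engine errors.
// A gRPC adapter would map the same errors to gRPC status codes instead.
func httpStatusFor(err error) int {
	switch {
	case err == nil:
		return http.StatusOK
	case errors.Is(err, ErrBlockedByPolicy):
		return http.StatusForbidden
	case errors.Is(err, ErrNoBackend):
		return http.StatusServiceUnavailable
	default:
		return http.StatusInternalServerError
	}
}

func main() {
	fmt.Println(httpStatusFor(wrap(ErrBlockedByPolicy)))
	fmt.Println(httpStatusFor(errors.New("unexpected failure")))
}
```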
Concurrency Model
- RouterEngine: Thread-safe, multiple adapters can call concurrently
- Cache: Backend handles concurrency (Redis, Milvus, etc.)
- Replay Recorders: Thread-safe map with per-decision locks
- Adapters: Independent goroutines, no shared adapter state
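A thread-safe map with per-decision locks could look roughly like the sketch below: a read-write mutex guards the map itself, and each recorder carries its own mutex so writes for different decisions do not contend. The recorder and recorderSet types are hypothetical and do not reflect the routerreplay package.

```go
package main

import (
	"fmt"
	"sync"
)

// recorder is a hypothetical stand-in for a replay recorder with its own lock.
type recorder struct {
	mu      sync.Mutex
	records []string
}

func (r *recorder) add(rec string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.records = append(r.records, rec)
}

func (r *recorder) count() int {
	r.mu.Lock()
	defer r.mu.Unlock()
	return len(r.records)
}

// recorderSet guards the map with an RWMutex; per-decision writes only
// take the lock of the recorder they touch.
type recorderSet struct {
	mu         sync.RWMutex
	byDecision map[string]*recorder
}

func newRecorderSet() *recorderSet {
	return &recorderSet{byDecision: make(map[string]*recorder)}
}

// get returns the recorder for a decision, creating it on first use.
func (s *recorderSet) get(decision string) *recorder {
	s.mu.RLock()
	r, ok := s.byDecision[decision]
	s.mu.RUnlock()
	if ok {
		return r
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	// Re-check: another goroutine may have created it between the two locks.
	if existing, ok := s.byDecision[decision]; ok {
		return existing
	}
	r = &recorder{}
	s.byDecision[decision] = r
	return r
}

// recordConcurrently simulates n adapter goroutines recording under one decision
// and returns the number of records stored for it afterwards.
func recordConcurrently(s *recorderSet, decision string, n int) int {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			s.get(decision).add(fmt.Sprintf("request-%d", i))
		}(i)
	}
	wg.Wait()
	return s.get(decision).count()
}

func main() {
	s := newRecorderSet()
	fmt.Println(recordConcurrently(s, "math", 50))
}
```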
Trade-offs and Alternatives
Design Decisions
1. Single Shared RouterEngine vs. Per-Adapter Engines
Chosen: Single shared RouterEngine
Rationale:
- Consistent routing decisions across protocols
- Shared cache improves hit rate
- Single source of truth for replay records
- Reduced memory footprint
Trade-off: Potential contention point (mitigated by thread-safe design)
2. Adapter Interface Design
Alternatives Considered:
A. Rich Interface:
type Adapter interface {
    Start() error
    Stop() error
    HandleRequest(req *Request) (*Response, error)
    GetMetrics() *Metrics
    Configure(cfg *Config) error
}
B. Minimal Interface (Chosen):
type Adapter interface {
    Start() error
    Stop() error
    GetEngine() *engine.RouterEngine
}
Rationale: Minimal interface allows maximum protocol flexibility. Different protocols have vastly different request/response models.
3. Configuration Approach
Alternatives:
A. Separate files per adapter
B. Environment variables
C. Single config with adapters section (Chosen)
Rationale: A single config file keeps all configuration in one place, which is easier to manage and version control.
4. Backward Compatibility
Approach: Wrap existing ExtProc implementation rather than rewrite
Rationale:
- No breaking changes
- Gradual migration path
- Proven, tested code remains in use
- Reduced risk
Known Limitations
- No Protocol-Specific Optimization: Abstraction prevents protocol-specific optimizations
- Adapter Isolation: Adapters can't directly communicate (by design)
- Shared State Challenges: Race conditions can occur if RouterEngine is not thread-safe
- Configuration Complexity: More options for users to configure
Testing Strategy
Unit Tests
- RouterEngine methods with mock adapters
- Individual adapter logic
- Configuration parsing
Integration Tests
- Multiple adapters running simultaneously
- Shared state consistency (cache hits across adapters)
- Replay recording from both protocols
E2E Tests
- ExtProc via Envoy on port 8801
- HTTP direct on port 9000
- Verify identical routing decisions
- Verify replay records visible from both
Future Work
Short Term
- Graceful Shutdown
  - Drain in-flight requests
  - Close connections cleanly
  - Flush replay records
- Adapter Metrics
  - Per-adapter request counters
  - Latency histograms
  - Error rates
- Enhanced Nginx Integration
  - OpenResty Lua module
  - Nginx Plus dynamic upstream API
  - Shared memory IPC for zero-copy
Long Term
- gRPC Streaming Enhancements
  - Bi-directional streaming support
  - Server-side streaming for batch requests
  - Client-side streaming for large inputs
- WebSocket Adapter
  - Real-time streaming
  - Bi-directional communication
- Plugin System
  - Dynamic adapter loading
  - Third-party adapters
- Per-Adapter Configuration
  - Rate limiting
  - Authentication
  - Custom middleware
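Migration Guide
From Old ExtProc-Only to Adapter Architecture
Before:
server := extproc.NewServer(configPath, port, secure, certPath)
server.Start()
After:
// cfg is the already-loaded *config.RouterConfig.
eng := engine.NewRouterEngine(configPath) // named eng so the engine package is not shadowed
manager := adapter.NewManager()
if err := manager.CreateAdapters(cfg, eng, configPath); err != nil {
    log.Fatalf("create adapters: %v", err) // creation failure is fatal
}
if err := manager.StartAll(); err != nil {
    log.Fatalf("start adapters: %v", err) // start failure is fatal
}
manager.Wait()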
Configuration:
# Add this to config.yaml
adapters:
  - type: "envoy"
    enabled: true
    port: 50051
Adding a New Adapter
- Create package pkg/adapter/myprotocol/
- Implement the Adapter interface:
type MyAdapter struct {
    engine *engine.RouterEngine
    port   int
}
func NewAdapter(eng *engine.RouterEngine, port int) (*MyAdapter, error) {
    return &MyAdapter{engine: eng, port: port}, nil
}
func (a *MyAdapter) Start() error {
    // Protocol-specific server setup
    // Call a.engine.Route() for routing logic
    return nil
}
func (a *MyAdapter) Stop() error {
    // Graceful shutdown
    return nil
}
func (a *MyAdapter) GetEngine() *engine.RouterEngine {
    return a.engine
}
- Register in manager:
// pkg/adapter/manager.go
case "myprotocol":
    adapter, err = myprotocol.NewAdapter(eng, adapterCfg.Port)
- Add configuration support:
adapters:
  - type: "myprotocol"
    enabled: true
    port: 9001
Appendix
Performance Considerations
- RouterEngine: A single instance reduces memory use, but could become a bottleneck
- Cache: Backend choice critical (Redis/Milvus for production)
- Replay Recording: Async writes recommended for high throughput
- Adapter Overhead: Minimal, mostly network/protocol serialization
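The asynchronous replay writes recommended above could be sketched with a buffered channel and a single writer goroutine, so the request path never waits on storage. The asyncRecorder type, its drop-when-full policy, and the in-memory slice standing in for storage are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// asyncRecorder is a hypothetical sketch: Record enqueues, one goroutine writes.
type asyncRecorder struct {
	queue   chan string
	done    chan struct{}
	mu      sync.Mutex
	written []string
}

func newAsyncRecorder(buffer int) *asyncRecorder {
	r := &asyncRecorder{
		queue: make(chan string, buffer),
		done:  make(chan struct{}),
	}
	go r.loop()
	return r
}

// loop drains the queue until it is closed, then signals completion.
func (r *asyncRecorder) loop() {
	defer close(r.done)
	for rec := range r.queue {
		r.mu.Lock()
		r.written = append(r.written, rec) // stand-in for a slow storage write
		r.mu.Unlock()
	}
}

// Record never blocks the request path; it reports false if the buffer is
// full and the record was dropped.
func (r *asyncRecorder) Record(rec string) bool {
	select {
	case r.queue <- rec:
		return true
	default:
		return false
	}
}

// Close flushes queued records and returns how many were written.
// Record must not be called after Close.
func (r *asyncRecorder) Close() int {
	close(r.queue)
	<-r.done
	r.mu.Lock()
	defer r.mu.Unlock()
	return len(r.written)
}

func main() {
	rec := newAsyncRecorder(8)
	for i := 0; i < 5; i++ {
		rec.Record(fmt.Sprintf("record-%d", i))
	}
	fmt.Println("flushed:", rec.Close())
}
```

Dropping records when the buffer is full trades completeness for latency; a deployment that needs every record would block or spill to disk instead.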
Security Considerations
- TLS Support: Per-adapter TLS configuration
- Authentication: Handled at adapter level (future work: external authz abstraction)
- Authorization: Future work to abstract external authz providers (OPA, custom)
- PII Detection: Shared across all adapters
- Jailbreak Detection: Shared across all adapters
Monitoring and Observability
- Metrics: Per-adapter and RouterEngine metrics
- Tracing: Distributed tracing spans adapters
- Logging: Structured logs with adapter context
- Health Checks: Per-adapter health endpoints