Install Locally
This guide will help you set up and install the Semantic Router on your system. The router runs entirely on CPU and does not require a GPU for inference.
System Requirements
Note: No GPU required - the router runs efficiently on CPU using optimized BERT models.
Software Dependencies
- Go: Version 1.19 or higher
- Rust: Version 1.70 or higher (for Candle bindings)
- HuggingFace CLI: For model downloads (pip install huggingface_hub)
Local Installation
1. Clone the Repository
git clone https://github.com/vllm-project/semantic-router.git
cd semantic-router
2. Install Dependencies
Install Go (if not already installed)
# Check if Go is installed
go version
# If not installed, download from https://golang.org/dl/
# Or use package manager:
# macOS: brew install go
# Ubuntu: sudo apt install golang-go
Install Rust (if not already installed)
# Check if Rust is installed
rustc --version
# If not installed:
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source ~/.cargo/env
Install HuggingFace CLI
pip install huggingface_hub
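To confirm the install, you can check that the package imports and that the huggingface-cli entry point it provides is on your PATH; the model download step later in this guide relies on it.
# Confirm the package is installed and the CLI entry point is available
python -c "import huggingface_hub; print(huggingface_hub.__version__)"
huggingface-cli --help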
3. Build the Project
# Build everything (Rust + Go)
make build
This command will:
- Build the Rust candle-binding library
- Build the Go router binary
- Place the executable in bin/router
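If the build finished without errors, the binary should now exist at the path listed above; a quick listing confirms it (assuming make placed it at bin/router as described).
# Confirm the router binary was produced by the build
ls -lh bin/router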
4. Download Pre-trained Models
# Download all required models (about 1.5GB total)
make download-models
This downloads the CPU-optimized BERT models for:
- Category classification
- PII detection
- Jailbreak detection
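To confirm the download completed, check what landed on disk. The models/ directory name below is an assumption based on the repository layout; adjust the path if your checkout stores models elsewhere.
# The downloaded models should total roughly 1.5GB
du -sh models/
ls models/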
5. Configure Backend Endpoints
Edit config/config.yaml to point to your LLM endpoints:
# Example: Configure your vLLM or Ollama endpoints
vllm_endpoints:
  - name: "your-endpoint"
    address: "your-llm-server.com"   # Replace with your server
    port: 11434                      # Replace with your port
    models:
      - "your-model-name"            # Replace with your model
    weight: 1

model_config:
  "your-model-name":
    param_count: 671000000000        # 671B parameters for DeepSeek-V3.1
    batch_size: 512.0                # vLLM default batch size
    context_size: 65536.0            # DeepSeek-V3.1 context length
    pii_policy:
      allow_by_default: false        # Deny all PII by default
      pii_types_allowed: ["EMAIL_ADDRESS", "PERSON", "GPE", "PHONE_NUMBER"]  # Only allow these specific PII types
    preferred_endpoints: ["your-endpoint"]
The default configuration includes example endpoints that you should update for your setup.
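After editing, it is worth checking that the file still parses before starting the router. The snippet below is a minimal syntax check using PyYAML (install it with pip install pyyaml if needed); it validates YAML structure only, not the router's schema.
# Fail fast on indentation or syntax mistakes in the config
python -c "import yaml; yaml.safe_load(open('config/config.yaml')); print('config parses OK')"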
Running the Router
1. Start the Services
Open two terminals and run:
Terminal 1: Start Envoy Proxy
make run-envoy
Terminal 2: Start Semantic Router
make run-router
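Once both services are up, you can check that something is listening on the Envoy port used by the test request below. The port (8801) is an assumption taken from the example request; adjust it if your Envoy configuration differs.
# Any HTTP status code (even 404) means the listener on 8801 is reachable;
# "HTTP 000" or a timeout means Envoy is not up yet
curl -s -o /dev/null --max-time 5 -w "HTTP %{http_code}\n" http://localhost:8801/v1/chat/completions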
2. Manual Testing
With both services running, you can send requests to the router:
curl -X POST http://localhost:8801/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [
      {"role": "user", "content": "What is the derivative of x^2?"}
    ]
  }'
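If the request succeeds, the JSON response should indicate which model actually served it. Assuming the response follows the standard OpenAI chat-completions schema and you have jq installed, you can pull that field out directly:
# Show which backend model the router selected for this query
curl -s -X POST http://localhost:8801/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "auto", "messages": [{"role": "user", "content": "What is the derivative of x^2?"}]}' \
  | jq -r '.model'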
Next Steps
After successful installation:
- Configuration Guide - Customize your setup and add your own endpoints
- API Documentation - Detailed API reference
Getting Help
- Issues: Report bugs on GitHub Issues
- Documentation: Full documentation at Read the Docs
You now have a working Semantic Router that runs entirely on CPU and intelligently routes requests to specialized models!