Install Locally
This guide will help you set up and install the Semantic Router on your system. The router runs entirely on CPU and does not require a GPU for inference.
System Requirements
Note: No GPU required - the router runs efficiently on CPU using optimized BERT models.
Software Dependencies
- Go: Version 1.19 or higher
- Rust: Version 1.70 or higher (for Candle bindings)
- HuggingFace CLI: For model downloads (pip install huggingface_hub)
Local Installation
1. Clone the Repository
git clone https://github.com/vllm-project/semantic-router.git
cd semantic-router
2. Install Dependencies
Install Go (if not already installed)
# Check if Go is installed
go version
# If not installed, download from https://golang.org/dl/
# Or use package manager:
# macOS: brew install go
# Ubuntu: sudo apt install golang-go
Install Rust (if not already installed)
# Check if Rust is installed
rustc --version
# If not installed:
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source ~/.cargo/env
Install HuggingFace CLI
pip install huggingface_hub
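To confirm the install, you can check that the package imports and that the huggingface-cli entry point it provides is on your PATH; the model download step later in this guide relies on it.
# Confirm the package is installed and the CLI entry point is available
python -c "import huggingface_hub; print(huggingface_hub.__version__)"
huggingface-cli --help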
3. Build the Project
# Build everything (Rust + Go)
make build
This command will:
- Build the Rust candle-binding library
- Build the Go router binary
- Place the executable in bin/router
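If the build finished without errors, the binary should now exist at the path listed above; a quick listing confirms it (assuming make placed it at bin/router as described).
# Confirm the router binary was produced by the build
ls -lh bin/router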
4. Download Pre-trained Models
# Download all required models (about 1.5GB total)
make download-models
This downloads the CPU-optimized BERT models for:
- Category classification
- PII detection
- Jailbreak detection
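To confirm the download completed, check what landed on disk. The models/ directory name below is an assumption based on the repository layout; adjust the path if your checkout stores models elsewhere.
# The downloaded models should total roughly 1.5GB
du -sh models/
ls models/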
5. Configure Backend Endpoints
Edit config/config.yaml to point to your LLM endpoints:
# Example: Configure your vLLM or Ollama endpoints
vllm_endpoints:
  - name: "your-endpoint"
    address: "your-llm-server.com"   # Replace with your server
    port: 11434                      # Replace with your port
    models:
      - "your-model-name"            # Replace with your model
    weight: 1

model_config:
  "your-model-name":
    param_count: 671000000000        # 671B parameters for DeepSeek-V3.1
    batch_size: 512.0                # vLLM default batch size
    context_size: 65536.0            # DeepSeek-V3.1 context length
    pii_policy:
      allow_by_default: false        # Deny all PII by default
      pii_types_allowed: ["EMAIL_ADDRESS", "PERSON", "GPE", "PHONE_NUMBER"]  # Only allow these specific PII types
    preferred_endpoints: ["your-endpoint"]
The default configuration includes example endpoints that you should update for your setup.
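After editing, it is worth checking that the file still parses before starting the router. The snippet below is a minimal syntax check using PyYAML (install it with pip install pyyaml if needed); it validates YAML structure only, not the router's schema.
# Fail fast on indentation or syntax mistakes in the config
python -c "import yaml; yaml.safe_load(open('config/config.yaml')); print('config parses OK')"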
Running the Router
1. Start the Services
Open two terminals and run:
Terminal 1: Start Envoy Proxy
make run-envoy
Terminal 2: Start Semantic Router
make run-router
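Once both services are up, you can check that something is listening on the Envoy port used by the test request below. The port (8801) is an assumption taken from the example request; adjust it if your Envoy configuration differs.
# Any HTTP status code (even 404) means the listener on 8801 is reachable;
# "HTTP 000" or a timeout means Envoy is not up yet
curl -s -o /dev/null --max-time 5 -w "HTTP %{http_code}\n" http://localhost:8801/v1/chat/completions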
2. Manual Testing
With both services running, you can send requests to the router:
curl -X POST http://localhost:8801/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [
      {"role": "user", "content": "What is the derivative of x^2?"}
    ]
  }'
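If the request succeeds, the JSON response should indicate which model actually served it. Assuming the response follows the standard OpenAI chat-completions schema and you have jq installed, you can pull that field out directly:
# Show which backend model the router selected for this query
curl -s -X POST http://localhost:8801/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "auto", "messages": [{"role": "user", "content": "What is the derivative of x^2?"}]}' \
  | jq -r '.model'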
Next Steps
After successful installation:
- Configuration Guide - Customize your setup and add your own endpoints
- API Documentation - Detailed API reference
Getting Help
- Issues: Report bugs on GitHub Issues
- Documentation: Full documentation at Read the Docs
You now have a working Semantic Router that runs entirely on CPU and intelligently routes requests to specialized models!