Version: v0.1

vLLM Endpoints Configuration

This guide provides quick configuration recipes for vLLM backend endpoints and load balancing. Use these patterns to set up single or multi-endpoint deployments with weighted traffic distribution.

Basic Endpoint Definition

Define a single vLLM endpoint:

vllm_endpoints:
  - name: "endpoint1"
    address: "172.28.0.20" # IPv4 address
    port: 8002
    weight: 1

See: config.yaml#vllm_endpoints.

caution

The address field must be a valid IP address (IPv4 or IPv6).

✅ Supported: 127.0.0.1, 192.168.1.1, ::1, 2001:db8::1
❌ Not supported: domain names, protocol prefixes (http://), paths, or ports in the address field

Multiple Endpoints with Load Balancing

Configure multiple endpoints with weighted distribution:

vllm_endpoints:
  - name: "primary"
    address: "10.0.0.10"
    port: 8000
    weight: 3 # Receives 3x traffic

  - name: "secondary"
    address: "10.0.0.11"
    port: 8000
    weight: 1 # Receives 1x traffic

Map Models to Specific Endpoints

Route specific models to preferred endpoints:

vllm_endpoints:
  - name: "gpu_cluster_a"
    address: "10.0.1.10"
    port: 8000
    weight: 1

  - name: "gpu_cluster_b"
    address: "10.0.2.10"
    port: 8000
    weight: 1

model_config:
  "qwen3":
    reasoning_family: "qwen3"
    preferred_endpoints: ["gpu_cluster_a"]

  "llama":
    reasoning_family: "llama"
    preferred_endpoints: ["gpu_cluster_b"]

See: config.yaml#preferred_endpoints AND config.go endpoints.

IPv6 Endpoint Configuration

Use IPv6 addresses for endpoints:

vllm_endpoints:
  - name: "ipv6_endpoint"
    address: "2001:db8::1"
    port: 8000
    weight: 1

Docker Compose Network Endpoints

When using Docker Compose, use container IP or service name resolution:

# In config.yaml
vllm_endpoints:
  - name: "llm-katan"
    address: "172.28.0.20" # Static IP assigned in docker-compose.yml
    port: 8002
    weight: 1

# In docker-compose.yml
services:
  llm-service:
    networks:
      app-network:
        ipv4_address: 172.28.0.20

networks:
  app-network:
    ipam:
      config:
        - subnet: 172.28.0.0/16

See: config.yaml#vllm_endpoints AND docker-compose.yml.

Kubernetes Endpoints

For Kubernetes deployments, use Service ClusterIP or Pod IP:

vllm_endpoints:
  - name: "vllm-svc"
    address: "10.96.100.50" # ClusterIP of vLLM Service
    port: 8000
    weight: 1

High Availability Setup

Configure multiple endpoints for failover:

vllm_endpoints:
  # Primary datacenter
  - name: "dc1-primary"
    address: "10.1.0.10"
    port: 8000
    weight: 2

  - name: "dc1-secondary"
    address: "10.1.0.11"
    port: 8000
    weight: 1

  # Secondary datacenter (lower weight for DR)
  - name: "dc2-primary"
    address: "10.2.0.10"
    port: 8000
    weight: 1

Endpoint Validation Checklist

Before deploying, verify:

Check	Command
IP is reachable	`ping <address>`
Port is open	`nc -zv <address> <port>`
vLLM is responding	`curl http://<address>:<port>/health`
Model is loaded	`curl http://<address>:<port>/v1/models`

Common Mistakes

❌ Using Domain Names

# WRONG - domain names not supported
vllm_endpoints:
  - name: "endpoint1"
    address: "vllm.example.com" # ❌ Won't work

❌ Including Protocol or Port in Address

# WRONG - no protocol prefix or port in address
vllm_endpoints:
  - name: "endpoint1"
    address: "http://10.0.0.10:8000" # ❌ Wrong format

✅ Correct Format

# CORRECT
vllm_endpoints:
  - name: "endpoint1"
    address: "10.0.0.10" # ✅ IP only
    port: 8000 # ✅ Port separate

Basic Endpoint Definition​

Multiple Endpoints with Load Balancing​

Map Models to Specific Endpoints​

IPv6 Endpoint Configuration​

Docker Compose Network Endpoints​

Kubernetes Endpoints​

High Availability Setup​

Endpoint Validation Checklist​

Common Mistakes​

❌ Using Domain Names​

❌ Including Protocol or Port in Address​

✅ Correct Format​