# Redis Storage for Response API
The Redis storage backend provides persistent, distributed storage for Response API conversations. It offers fast key-value access, automatic TTL-based expiration, and support for both standalone and cluster deployments.
## Overview
Redis storage is ideal for:
- Production environments requiring persistence across restarts
- Distributed deployments sharing state across multiple router instances
- Stateful conversations with automatic chain management
- TTL-based expiration for automatic cleanup
- High availability with cluster support
## Configuration

### Where to Configure
Response API storage is configured in `config/config.yaml`:
```yaml
# config/config.yaml
response_api:
  enabled: true
  store_backend: "memory"  # Default: "memory", options: "redis", "milvus"
  ttl_seconds: 86400       # 24 hours (default)
  max_responses: 1000      # Memory store limit
```
To enable Redis storage, change `store_backend` to `"redis"` and add Redis configuration:
```yaml
# config/config.yaml
response_api:
  enabled: true
  store_backend: "redis"  # Switch to Redis
  ttl_seconds: 2592000    # 30 days (recommended for Redis)
  redis:
    address: "localhost:6379"
    password: ""
    db: 0
    key_prefix: "sr:"
```
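Before switching backends, it's worth a quick connectivity check from the router host (assuming `redis-cli` is installed locally):

```bash
# Should print PONG if Redis is reachable at the configured address
redis-cli -h localhost -p 6379 PING
```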
Configuration examples:

- See `config/response-api/redis-simple.yaml` for a basic standalone setup
- See `config/response-api/redis-cluster.yaml` for cluster configuration
- See `config/response-api/redis-production.yaml` for production with TLS
### Standalone Redis Configuration
For simple deployments:
```yaml
# config/config.yaml
response_api:
  redis:
    address: "localhost:6379"
    db: 0
    key_prefix: "sr:"
    pool_size: 10
    min_idle_conns: 2
    max_retries: 3
    dial_timeout: 5
    read_timeout: 3
    write_timeout: 3
```
### Redis Cluster Configuration
For distributed deployments:
```yaml
# config/config.yaml
response_api:
  redis:
    cluster_mode: true
    cluster_addresses:
      - "redis-node-1:6379"
      - "redis-node-2:6379"
      - "redis-node-3:6379"
    db: 0  # Must be 0 for cluster
    key_prefix: "sr:"
    pool_size: 20
```
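Once the nodes are up, you can confirm the cluster is healthy before pointing the router at it (assuming `redis-cli` can reach a node):

```bash
# -c enables cluster redirects; expect cluster_state:ok in the output
redis-cli -c -h redis-node-1 -p 6379 CLUSTER INFO
```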
### External Configuration File
For complex configurations, keep the Redis settings in an external file and reference it with `config_path`:
```yaml
# config/config.yaml
response_api:
  redis:
    config_path: "config/response-api/redis-production.yaml"
```
Example production config:
```yaml
# config/response-api/redis-production.yaml
address: "redis.production.svc.cluster.local:6379"
password: "${REDIS_PASSWORD}"
db: 0
key_prefix: "sr:"

# Connection pooling
pool_size: 20
min_idle_conns: 5
max_retries: 3

# TLS/SSL
tls_enabled: true
tls_cert_path: "/etc/ssl/certs/redis-client.crt"
tls_key_path: "/etc/ssl/certs/redis-client.key"
```
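With TLS enabled, the connection can be smoke-tested from the command line, a sketch reusing the certificate paths above (add `--cacert` if the server certificate is not in the system trust store):

```bash
# Authenticated TLS handshake plus PING; expect PONG
redis-cli --tls \
  --cert /etc/ssl/certs/redis-client.crt \
  --key /etc/ssl/certs/redis-client.key \
  -h redis.production.svc.cluster.local -p 6379 \
  -a "$REDIS_PASSWORD" PING
```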
## Setup and Deployment

### 1. Start Redis
**Note:** This guide covers standalone Redis setup. For production deployments with high availability and automatic failover, see the Redis Cluster Storage Guide (`redis-cluster-storage.md`).
Using Docker:
**Note:** Response API uses standard Redis (image: `redis:7-alpine`), not Redis Stack. Why? Response API only needs basic key-value storage (`GET`, `SET`, `DEL`, `EXPIRE`).
```bash
# Simple Redis (recommended for Response API)
docker run -d \
  --name redis-response-api \
  -p 6379:6379 \
  redis:7-alpine

# With persistence (production)
docker run -d \
  --name redis-response-api \
  -p 6379:6379 \
  -v redis-data:/data \
  redis:7-alpine redis-server --save 60 1 --appendonly yes
```
Verify Redis is running:
```bash
docker exec redis-response-api redis-cli PING
# Expected: PONG
```
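If you started Redis with the persistence flags, you can also confirm they took effect:

```bash
docker exec redis-response-api redis-cli CONFIG GET appendonly
# Expected: 1) "appendonly"  2) "yes"
```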
### 2. Configure Semantic Router
Update `config/config.yaml`:
```yaml
response_api:
  enabled: true
  store_backend: "redis"
  redis:
    address: "localhost:6379"
    db: 0
    key_prefix: "sr:"
```
### 3. Run Semantic Router

```bash
# Build and run
make build-router
make run-router
```
### 4. Run EnvoyProxy

```bash
# Start Envoy proxy
make run-envoy
```
### 5. Verify vLLM Backend Configuration

Check that `vllm_endpoints` in your `config/config.yaml` points to a running vLLM server:
```yaml
vllm_endpoints:
  - name: "endpoint1"
    address: "127.0.0.1"  # Must match your vLLM server address
    port: 8002            # Must match your vLLM server port
    weight: 1

model_config:
  "qwen3":  # Must match the model name served by vLLM
    preferred_endpoints: ["endpoint1"]
```
Verify your vLLM server is running:
```bash
# Example: Check if llm-katan is running (if using docker-compose)
docker ps | grep llm-katan

# Or check vLLM health endpoint
curl http://localhost:8002/health
```
### 6. Test Response API

**Note:** The examples below use `llm-katan` (Qwen3-0.6B) as the LLM backend. Adjust the `model` name to match your vLLM configuration.
Create a response:
```bash
curl -X POST http://localhost:8801/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3",
    "input": "What is 2+2?",
    "instructions": "You are a helpful math assistant.",
    "store": true
  }'
```
Response:
```json
{
  "id": "resp_5f679741e9652a09879dc625",
  "object": "response",
  "created_at": 1767864626,
  "model": "qwen3",
  "status": "completed",
  "output": [
    {
      "type": "message",
      "id": "item_a4bba6509e19a18f1c25f4fa",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "2+2 is 4."
        }
      ],
      "status": "completed"
    }
  ],
  "output_text": "2+2 is 4.",
  "usage": {
    "input_tokens": 21,
    "output_tokens": 512,
    "total_tokens": 533
  },
  "instructions": "You are a helpful math assistant."
}
```
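When scripting, the `id` needed for chaining can be captured from the first call (a small sketch, assuming `jq` is installed):

```bash
# Create a response and keep its id for the follow-up request
resp_id=$(curl -s -X POST http://localhost:8801/v1/responses \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen3", "input": "What is 2+2?", "store": true}' \
  | jq -r '.id')
echo "$resp_id"  # e.g. resp_5f679741e9652a09879dc625
```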
Create a follow-up response:
```bash
curl -X POST http://localhost:8801/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3",
    "input": "What about 3+3?",
    "previous_response_id": "resp_5f679741e9652a09879dc625",
    "store": true
  }'
```
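At this point both responses are persisted in Redis; you can inspect them directly (container name and `sr:` key prefix as configured above):

```bash
# List stored responses, then dump one by id
docker exec redis-response-api redis-cli KEYS 'sr:response:*'
docker exec redis-response-api redis-cli GET sr:response:resp_5f679741e9652a09879dc625
```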
### 7. Retrieve a Stored Response
Get a specific response:
```bash
curl -X GET http://localhost:8801/v1/responses/resp_5f679741e9652a09879dc625
```
Response:
```json
{
  "id": "resp_5f679741e9652a09879dc625",
  "object": "response",
  "status": "completed",
  "model": "qwen3",
  "output_text": "2+2 is 4.",
  "created_at": 1767864626
}
```
### 8. Get Input Items
Get input items for a response:
```bash
curl -X GET http://localhost:8801/v1/responses/resp_5f679741e9652a09879dc625/input_items
```
Response:
```json
{
  "object": "list",
  "data": [
    {
      "type": "message",
      "role": "user",
      "content": [
        {
          "type": "input_text",
          "text": "What is 2+2?"
        }
      ]
    },
    {
      "type": "message",
      "role": "system",
      "content": [
        {
          "type": "input_text",
          "text": "You are a helpful math assistant."
        }
      ]
    }
  ],
  "first_id": "item_xxx",
  "last_id": "item_yyy",
  "has_more": false
}
```
### 9. Delete a Response
Delete a specific response:
```bash
curl -X DELETE http://localhost:8801/v1/responses/resp_5f679741e9652a09879dc625
```
Response:
```json
{
  "id": "resp_5f679741e9652a09879dc625",
  "object": "response.deleted",
  "deleted": true
}
```
Verify deletion:
```bash
# This should return 404 Not Found
curl -X GET http://localhost:8801/v1/responses/resp_5f679741e9652a09879dc625
```
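The underlying key should be gone from Redis as well:

```bash
docker exec redis-response-api redis-cli EXISTS sr:response:resp_5f679741e9652a09879dc625
# Expected: (integer) 0
```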
## Key Schema

Redis stores data with the following key pattern:

```text
sr:response:{response_id} → StoredResponse (JSON)
```
**How conversation chains work:**

Unlike traditional databases with join tables, Response API uses embedded links:

- Each response has a `previous_response_id` field
- Chains are built by following these links backwards
Example:
```text
# Response 3 (latest)
GET sr:response:resp_3
→ {"id": "resp_3", "previous_response_id": "resp_2", ...}

# Response 2 (middle)
GET sr:response:resp_2
→ {"id": "resp_2", "previous_response_id": "resp_1", ...}

# Response 1 (first)
GET sr:response:resp_1
→ {"id": "resp_1", "previous_response_id": "", ...}
```
**How chains are retrieved:**

Chains are built automatically when you provide `previous_response_id` in a request. The system:

- Uses `GetConversationChain(responseID)` internally
- Follows `previous_response_id` links backwards using Redis pipelining
- Returns responses in chronological order (oldest first)
- Passes the full history as context to the LLM
Example flow:
```text
Request with previous_response_id="resp_3"
        ↓
System calls GetConversationChain("resp_3")
        ↓
Redis pipelined GET: resp_3 → resp_2 → resp_1
        ↓
Returns: [resp_1, resp_2, resp_3] (chronological)
```
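The backward walk can be reproduced by hand with `redis-cli`, a minimal sketch assuming `jq` is installed, Redis is reachable on localhost:6379, and keys follow the schema above:

```bash
# Follow previous_response_id links backwards, as GetConversationChain does
id="resp_3"
while [ -n "$id" ]; do
  doc=$(redis-cli GET "sr:response:$id")
  echo "$doc"
  id=$(echo "$doc" | jq -r '.previous_response_id // empty')
done
```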
## Monitoring

### Redis CLI Commands
```bash
# Connect to Redis CLI
docker exec -it redis-response-api redis-cli

# View all Response API keys
KEYS sr:*

# List response keys only
KEYS sr:response:*

# Get a specific response
GET sr:response:resp_abc123

# Check TTL (time to live)
TTL sr:response:resp_abc123

# Count total keys
DBSIZE

# Monitor live commands (useful for debugging)
MONITOR

# Get Redis server info
INFO
```
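Note that `KEYS` scans the whole keyspace and blocks the server, so avoid it on large production datasets; `SCAN` iterates incrementally instead:

```bash
# Non-blocking alternative to KEYS sr:response:*
docker exec redis-response-api redis-cli --scan --pattern 'sr:response:*'
```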
## Performance Optimization

### Connection Pooling
Adjust pool size for high-traffic deployments:
```yaml
response_api:
  redis:
    pool_size: 20       # Default: 10
    min_idle_conns: 5   # Default: 2
```
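To judge whether the pool is sized appropriately under load, you can watch the server-side client count (assuming the Docker setup from step 1):

```bash
# connected_clients should stay below pool_size x router instances
docker exec redis-response-api redis-cli INFO clients
```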
### Pipelining

The Redis store automatically uses pipelining for `GetConversationChain()` to minimize network round-trips.
**Performance improvement:**
- Without pipelining: 10 responses = 10 network calls
- With pipelining: 10 responses = ~2 network calls
### TTL Management
Adjust TTL based on your needs:
```yaml
response_api:
  ttl_seconds: 86400      # 24 hours (default), shorter retention
  # ttl_seconds: 2592000  # 30 days, longer retention (recommended for Redis)
```
## Reference

- **Response API Overview:** See `docs/tutorials/intelligent-route/router-memory.md` for general Response API documentation
- **Redis Cluster Setup:** See `redis-cluster-storage.md` for distributed deployment with Redis Cluster
- **Configuration Examples:** See the `config/response-api/` directory for sample configurations