Valkey Agentic Memory
This guide covers deploying Valkey as the agentic memory backend for the Semantic Router. Valkey provides a lightweight, Redis-compatible alternative to Milvus, storing vectors and running similarity search through the Search module.
Valkey is optional. The default memory backend is Milvus. Use Valkey when you want a single-binary deployment without external dependencies like etcd or MinIO, or when you already run Valkey for caching.
When to Use Valkey vs Milvus
| Concern | Valkey | Milvus |
|---|---|---|
| Deployment complexity | Single binary with Search module | Requires etcd, MinIO/S3, optional Pulsar |
| Horizontal scaling | Cluster mode (manual sharding) | Native distributed architecture |
| Memory model | In-memory with optional persistence | Disk-based with memory-mapped indexes |
| Best for | Small-to-medium workloads, dev/test, existing Redis/Valkey infra | Large-scale production, billions of vectors |
| Vector index | HNSW via FT.CREATE | HNSW, IVF_FLAT, IVF_SQ8, and more |
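Because Valkey keeps vectors in memory, a rough lower bound on RAM can help judge whether a workload is small enough for it. The sketch below assumes vectors are stored as FLOAT32 (4 bytes per dimension) and counts only the raw vectors; HNSW graph links, keys, and stored text add to this. The helper name vector_mib is hypothetical.

```shell
# Rough lower bound on raw vector memory, in MiB (integer division).
# Assumption: FLOAT32 vectors, 4 bytes per dimension. HNSW links, keys,
# and stored text are NOT included, so real usage will be higher.
vector_mib() {
  count=$1
  dim=$2
  echo $(( count * dim * 4 / 1024 / 1024 ))
}

vector_mib 100000 384    # 100k memories at dimension 384 -> prints 146
```

Under the same assumptions, one million memories at dimension 384 need about 1464 MiB for vectors alone.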
Prerequisites
- Valkey 8.0+ with the Search module enabled
- Text support for vector search was added in Search module version 1.2.0
- The valkey/valkey-bundle Docker image includes Search out of the box. Search module 1.2.0 is available in the unstable and 9.1.0-rc1 valkey-bundle versions
- If your Valkey deployment does not include the Search module, you can add it manually
- For Kubernetes: Helm 3.x and kubectl configured
If you run into issues loading or using the Search module, please open an issue so we can help.
Deploy with Docker
Quick Start
docker run -d --name valkey-memory \
-p 6379:6379 \
valkey/valkey-bundle:latest
Verify the Search module is loaded:
docker exec valkey-memory valkey-cli MODULE LIST | grep search
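In scripts or CI, the module check can race the container startup. A minimal sketch of a retry helper that waits until the server answers PING; wait_for_valkey is a hypothetical name, not something shipped with Valkey or the router.

```shell
# wait_for_valkey MAX_ATTEMPTS CMD [ARGS...]
# Retries CMD once per second until it prints PONG or attempts run out.
wait_for_valkey() {
  attempts=$1
  shift
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if [ "$("$@" 2>/dev/null)" = "PONG" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Example (assumes the container name from the quick start):
# wait_for_valkey 30 docker exec valkey-memory valkey-cli PING \
#   && docker exec valkey-memory valkey-cli MODULE LIST | grep search
```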
With Persistence
docker run -d --name valkey-memory \
-p 6379:6379 \
-v valkey-data:/data \
valkey/valkey-bundle:latest \
valkey-server --appendonly yes
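To confirm that the append-only file is actually enabled, the running server can be asked for its setting. The sketch below parses the output of CONFIG GET appendonly; aof_enabled is a hypothetical helper name.

```shell
# aof_enabled: reads `CONFIG GET appendonly` output on stdin.
# When piped, valkey-cli prints the parameter name on line 1
# and its value on line 2.
aof_enabled() {
  [ "$(sed -n '2p')" = "yes" ]
}

# Example (assumes the container name used above):
# docker exec valkey-memory valkey-cli CONFIG GET appendonly | aof_enabled \
#   && echo "AOF persistence is on"
```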
Deploy in Kubernetes
Using a StatefulSet
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: valkey-memory
  namespace: vllm-semantic-router-system
spec:
  serviceName: valkey-memory
  replicas: 1
  selector:
    matchLabels:
      app: valkey-memory
  template:
    metadata:
      labels:
        app: valkey-memory
    spec:
      containers:
        - name: valkey
          image: valkey/valkey-bundle:latest
          ports:
            - containerPort: 6379
          args: ["valkey-server", "--appendonly", "yes"]
          # For production, add --requirepass or mount a Secret:
          # args: ["valkey-server", "--appendonly", "yes", "--requirepass", "$(VALKEY_PASSWORD)"]
          volumeMounts:
            - name: data
              mountPath: /data
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "1Gi"
              cpu: "1000m"
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 5Gi
---
apiVersion: v1
kind: Service
metadata:
  name: valkey-memory
  namespace: vllm-semantic-router-system
spec:
  selector:
    app: valkey-memory
  ports:
    - port: 6379
      targetPort: 6379
  clusterIP: None
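The Service above is headless (clusterIP: None), so its DNS name resolves directly to the pod, and the StatefulSet pod also gets a stable per-pod name. The short host valkey-memory only resolves from inside the same namespace; from elsewhere a fully qualified name is needed. A small sketch that builds both names, assuming the cluster uses the default DNS domain cluster.local and that the StatefulSet and its Service share a name, as they do in this manifest:

```shell
# Build in-cluster DNS names for the manifest above.
# Assumption: the cluster uses the default DNS domain "cluster.local".
service_fqdn() {
  # $1 = Service name, $2 = namespace
  printf '%s.%s.svc.cluster.local\n' "$1" "$2"
}

pod_fqdn() {
  # $1 = shared StatefulSet/Service name, $2 = namespace, $3 = pod ordinal
  printf '%s-%s.%s.%s.svc.cluster.local\n' "$1" "$3" "$1" "$2"
}

service_fqdn valkey-memory vllm-semantic-router-system
pod_fqdn valkey-memory vllm-semantic-router-system 0
```

Either name can be used as the host value in the router configuration when the router runs in a different namespace.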
Configure the Router
Add the Valkey memory backend to your config.yaml:
global:
  stores:
    memory:
      enabled: true
      backend: valkey
      auto_store: true
      valkey:
        host: valkey-memory # Service name or hostname
        port: 6379
        database: 0
        timeout: 10
        collection_prefix: "mem:"
        index_name: mem_idx
        dimension: 384 # Must match your embedding model
        metric_type: COSINE # COSINE, L2, or IP
        index_m: 16
        index_ef_construction: 256
      embedding_model: bert
      default_retrieval_limit: 5
      default_similarity_threshold: 0.70
      hybrid_search: true
      hybrid_mode: rerank
      adaptive_threshold: true
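To build intuition for default_similarity_threshold, the sketch below assumes the store reports a cosine distance and that similarity is computed as 1 - distance. This is an illustration only: the router's actual scoring, particularly with hybrid_search and adaptive_threshold enabled, may differ. Both function names are hypothetical.

```shell
# Assumption: similarity = 1 - cosine_distance.
# The router's real scoring may differ.
distance_to_similarity() {
  awk -v d="$1" 'BEGIN { printf "%.2f\n", 1 - d }'
}

# passes_threshold DISTANCE THRESHOLD
# Succeeds when the derived similarity is at or above the threshold.
passes_threshold() {
  awk -v d="$1" -v t="$2" 'BEGIN { exit ((1 - d) >= t) ? 0 : 1 }'
}

distance_to_similarity 0.25                         # prints 0.75
passes_threshold 0.25 0.70 && echo "kept"           # 0.75 >= 0.70
passes_threshold 0.35 0.70 || echo "filtered out"   # 0.65 <  0.70
```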
Configuration Reference
| Parameter | Default | Description |
|---|---|---|
| host | localhost | Valkey server hostname |
| port | 6379 | Valkey server port |
| database | 0 | Database number (0-15) |
| password | (empty) | Authentication password |
| timeout | 10 | Connection timeout in seconds |
| collection_prefix | mem: | Key prefix for HASH documents |
| index_name | mem_idx | FT.CREATE index name |
| dimension | 384 | Embedding vector dimension |
| metric_type | COSINE | Distance metric: COSINE, L2, or IP |
| index_m | 16 | HNSW M parameter (links per node) |
| index_ef_construction | 256 | HNSW build-time search width |
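To show how these parameters relate to the Search module, the sketch below prints an FT.CREATE command built from the defaults in the table. The vector field name embedding is an assumption made for illustration; the router's real schema is not described in this guide and likely contains more fields. build_ft_create is a hypothetical helper.

```shell
# build_ft_create [INDEX] [PREFIX] [DIM] [METRIC] [M] [EF_CONSTRUCTION]
# Prints an FT.CREATE command using the defaults from the table above.
# Assumption: the vector field is called "embedding"; the router's real
# schema is not documented here and likely has more fields.
build_ft_create() {
  index_name=${1:-mem_idx}
  prefix=${2:-mem:}
  dimension=${3:-384}
  metric_type=${4:-COSINE}
  index_m=${5:-16}
  index_ef_construction=${6:-256}
  # "10" is the count of tokens that follow it (five name/value pairs).
  printf 'FT.CREATE %s ON HASH PREFIX 1 %s SCHEMA embedding VECTOR HNSW 10 TYPE FLOAT32 DIM %s DISTANCE_METRIC %s M %s EF_CONSTRUCTION %s\n' \
    "$index_name" "$prefix" "$dimension" "$metric_type" "$index_m" "$index_ef_construction"
}

build_ft_create

# Example against a scratch instance, with a separate index name and prefix
# so it cannot collide with the router's own index:
# docker exec valkey-memory valkey-cli $(build_ft_create demo_idx demo:)
```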
Optional Redis Hot Cache
You can layer a Redis/Valkey hot cache in front of the Valkey memory store for frequently accessed memories:
redis_cache:
  enabled: true
  address: "valkey-memory:6379"
  ttl_seconds: 900
  db: 1 # Use a different DB to avoid key collisions
  key_prefix: "memory_cache:"
Per-Decision Memory Plugin
Routes can override global memory settings using the memory plugin:
routing:
  decisions:
    - name: personalized_route
      plugins:
        - type: memory
          configuration:
            enabled: true
            retrieval_limit: 10
            similarity_threshold: 0.60
            auto_store: true
See the Memory plugin tutorial for details.