Skip to main content
Documentation

Qdrant

This guide covers deploying Qdrant as a backend for the Semantic Router. Qdrant can serve as the semantic cache, agentic memory store, vector store, and router replay store.

Version: Latest

Qdrant

This guide covers deploying Qdrant as a backend for the Semantic Router. Qdrant can serve as the semantic cache, agentic memory store, vector store, and router replay store.

Prerequisites

  • Docker or a Kubernetes cluster with kubectl configured
  • For Kubernetes: Helm 3.x installed

Deploy with Docker

Quick Start

docker run -d --name qdrant \
-p 6333:6333 \
-p 6334:6334 \
qdrant/qdrant:latest

Verify Qdrant is running:

curl http://localhost:6333/healthz

With Persistence

docker run -d --name qdrant \
-p 6333:6333 \
-p 6334:6334 \
-v qdrant-data:/qdrant/storage \
qdrant/qdrant:latest

With API Key Authentication

docker run -d --name qdrant \
-p 6333:6333 \
-p 6334:6334 \
-v qdrant-data:/qdrant/storage \
-e QDRANT__SERVICE__API_KEY=your-secret-key \
qdrant/qdrant:latest

Deploy in Kubernetes

Using Helm

helm repo add qdrant https://qdrant.github.io/qdrant-helm
helm repo update

helm install qdrant qdrant/qdrant \
--namespace vllm-semantic-router-system --create-namespace \
--set persistence.size=10Gi

Using a StatefulSet

apiVersion: apps/v1
kind: StatefulSet
metadata:
name: qdrant
namespace: vllm-semantic-router-system
spec:
serviceName: qdrant
replicas: 1
selector:
matchLabels:
app: qdrant
template:
metadata:
labels:
app: qdrant
spec:
containers:
- name: qdrant
image: qdrant/qdrant:latest
ports:
- containerPort: 6333
- containerPort: 6334
volumeMounts:
- name: data
mountPath: /qdrant/storage
resources:
requests:
memory: "512Mi"
cpu: "250m"
limits:
memory: "2Gi"
cpu: "1000m"
volumeClaimTemplates:
- metadata:
name: data
spec:
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: 10Gi
---
apiVersion: v1
kind: Service
metadata:
name: qdrant
namespace: vllm-semantic-router-system
spec:
selector:
app: qdrant
ports:
- name: rest
port: 6333
targetPort: 6333
- name: grpc
port: 6334
targetPort: 6334
clusterIP: None

Configure the Router

Semantic Cache

global:
stores:
semantic_cache:
enabled: true
backend_type: qdrant
similarity_threshold: 0.90
ttl_seconds: 7200
embedding_model: bert
qdrant:
host: qdrant # Service name or hostname
port: 6334
api_key: ""
use_tls: false
collection_name: semantic_cache
connect_timeout: 10

Agentic Memory

global:
stores:
memory:
enabled: true
backend: qdrant
qdrant:
host: qdrant
port: 6334
api_key: ""
collection: agentic_memory
dimension: 384 # Must match your embedding model
embedding_model: bert
default_retrieval_limit: 5
default_similarity_threshold: 0.70

Router Replay Store

global:
services:
router_replay:
store_backend: qdrant
qdrant:
host: qdrant
port: 6334
api_key: ""
collection_name: router_replay

Configuration Reference

ParameterDefaultDescription
hostlocalhostQdrant server hostname
port6334Qdrant gRPC port
api_key(empty)API key for authentication
use_tlsfalseEnable TLS for the gRPC connection
collection_namevariesCollection to use (auto-created if absent)
connect_timeout10Connection timeout in seconds