OpenAI RAG Integration
This guide demonstrates how to use OpenAI's File Store and Vector Store APIs for RAG (Retrieval-Augmented Generation) in Semantic Router, following the OpenAI Responses API cookbook.
Overview
The OpenAI RAG backend integrates with OpenAI's File Store and Vector Store APIs to provide a first-class RAG experience. It supports two workflow modes:
- Direct Search Mode (default): Synchronous retrieval via the vector store search API
- Tool-Based Mode: Adds a file_search tool to the request (Responses API workflow)
Architecture
┌─────────────┐
│   Client    │
└──────┬──────┘
       │
       ▼
┌─────────────────────────────────────┐
│           Semantic Router           │
│  ┌───────────────────────────────┐  │
│  │          RAG Plugin           │  │
│  │  ┌─────────────────────────┐  │  │
│  │  │   OpenAI RAG Backend    │  │  │
│  │  └────────────┬────────────┘  │  │
│  └───────────────┼───────────────┘  │
└──────────────────┼──────────────────┘
                   │
                   ▼
┌─────────────────────────────────────┐
│             OpenAI API              │
│  ┌──────────────┐  ┌─────────────┐  │
│  │  File Store  │  │Vector Store │  │
│  │     API      │  │     API     │  │
│  └──────────────┘  └─────────────┘  │
└─────────────────────────────────────┘
Prerequisites
- OpenAI API key with access to File Store and Vector Store APIs
- Files uploaded to OpenAI File Store
- Vector store created and populated with files
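If the file store and vector store are not set up yet, both can be created directly against the upstream OpenAI API. A minimal sketch using curl (file names and IDs are placeholders):

# Upload a file with the "assistants" purpose so it can be indexed
curl https://api.openai.com/v1/files \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -F purpose="assistants" \
  -F file="@document.pdf"

# Create a vector store and attach the uploaded file
curl https://api.openai.com/v1/vector_stores \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"name": "my-vector-store", "file_ids": ["file-abc123"]}'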
Configuration
Basic Configuration
Add the OpenAI RAG backend to your decision configuration:
decisions:
  - name: rag-openai-decision
    signals:
      - type: keyword
        keywords: ["research", "document", "knowledge"]
    plugins:
      rag:
        enabled: true
        backend: "openai"
        backend_config:
          vector_store_id: "vs_abc123"   # Your vector store ID
          api_key: "${OPENAI_API_KEY}"   # Or use environment variable
          max_num_results: 10
          workflow_mode: "direct_search" # or "tool_based"
Advanced Configuration
rag:
  enabled: true
  backend: "openai"
  similarity_threshold: 0.7
  top_k: 10
  max_context_length: 5000
  injection_mode: "tool_role" # or "system_prompt"
  on_failure: "skip"          # or "warn" or "block"
  cache_results: true
  cache_ttl_seconds: 3600
  backend_config:
    vector_store_id: "vs_abc123"
    api_key: "${OPENAI_API_KEY}"
    base_url: "https://api.openai.com/v1" # Optional, defaults to OpenAI
    max_num_results: 10
    file_ids: # Optional: restrict search to specific files
      - "file-123"
      - "file-456"
    filter: # Optional: metadata filter
      category: "research"
      published_date: "2024-01-01"
    workflow_mode: "direct_search" # or "tool_based"
    timeout_seconds: 30
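For orientation, injection_mode controls how retrieved chunks reach the model. The shape below is illustrative only; the exact format is defined by the plugin. With injection_mode: "system_prompt", the injected request might resemble:

{
  "messages": [
    { "role": "system", "content": "Context:\n[1] ...retrieved chunk...\n[2] ...retrieved chunk..." },
    { "role": "user", "content": "What is Deep Research?" }
  ]
}

With "tool_role", the same context is carried in a tool-style message instead of the system prompt (again, illustrative).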
Workflow Modes
1. Direct Search Mode (Default)
Synchronous retrieval using the vector store search API: context is retrieved before the request is sent to the LLM.
Use Case: When you need immediate context injection and want to control the retrieval process.
Example:
backend_config:
  workflow_mode: "direct_search"
  vector_store_id: "vs_abc123"
Flow:
- User sends query
- RAG plugin calls vector store search API
- Retrieved context is injected into request
- Request sent to LLM with context
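Step 2 corresponds to OpenAI's vector store search endpoint. A minimal sketch of the equivalent standalone call (the store ID and query are placeholders):

curl https://api.openai.com/v1/vector_stores/vs_abc123/search \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"query": "What is Deep Research?", "max_num_results": 10}'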
2. Tool-Based Mode (Responses API)
Adds the file_search tool to the request. The LLM calls the tool automatically, and the results appear in the response annotations.
Use Case: When using Responses API and want the LLM to control when to search.
Example:
backend_config:
  workflow_mode: "tool_based"
  vector_store_id: "vs_abc123"
Flow:
- User sends query
- RAG plugin adds file_search tool to request
- Request sent to LLM
- LLM calls file_search tool
- Results appear in response annotations
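For illustration, an abridged Responses API result with file_search annotations might look like the following (field names are simplified and can vary by API version):

{
  "output": [
    { "type": "file_search_call", "status": "completed" },
    {
      "type": "message",
      "content": [
        {
          "type": "output_text",
          "text": "Deep Research is ...",
          "annotations": [
            { "type": "file_citation", "file_id": "file-123", "filename": "document.pdf" }
          ]
        }
      ]
    }
  ]
}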
Usage Examples
Example 1: Basic RAG Query
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-VSR-Selected-Decision: rag-openai-decision" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {
        "role": "user",
        "content": "What is Deep Research?"
      }
    ]
  }'
Example 2: Responses API with file_search Tool
curl -X POST http://localhost:8080/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "input": "What is Deep Research?",
    "tools": [
      {
        "type": "file_search",
        "file_search": {
          "vector_store_ids": ["vs_abc123"],
          "max_num_results": 5
        }
      }
    ]
  }'
Example 3: Python Client
import requests

# Direct search mode
response = requests.post(
    "http://localhost:8080/v1/chat/completions",
    headers={
        "Content-Type": "application/json",
        "X-VSR-Selected-Decision": "rag-openai-decision",
    },
    json={
        "model": "gpt-4o-mini",
        "messages": [
            {"role": "user", "content": "What is Deep Research?"}
        ],
    },
)

result = response.json()
print(result["choices"][0]["message"]["content"])
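The tool-based workflow can be driven the same way; this mirrors Example 2 against the /v1/responses endpoint (reusing the requests import above):

# Tool-based mode via the Responses API
response = requests.post(
    "http://localhost:8080/v1/responses",
    headers={"Content-Type": "application/json"},
    json={
        "model": "gpt-4o-mini",
        "input": "What is Deep Research?",
        "tools": [
            {
                "type": "file_search",
                "file_search": {
                    "vector_store_ids": ["vs_abc123"],
                    "max_num_results": 5,
                },
            }
        ],
    },
)
print(response.json())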
File Store Operations
The OpenAI RAG backend includes a File Store client for managing files:
Upload File

import "github.com/vllm-project/semantic-router/src/semantic-router/pkg/openai"

// Upload with the "assistants" purpose so the file can be attached to a vector store
client := openai.NewFileStoreClient("https://api.openai.com/v1", apiKey)
file, err := client.UploadFile(ctx, fileReader, "document.pdf", "assistants")
Create Vector Store

vectorStoreClient := openai.NewVectorStoreClient("https://api.openai.com/v1", apiKey)
// Create a store and index the listed files in a single call
store, err := vectorStoreClient.CreateVectorStore(ctx, &openai.CreateVectorStoreRequest{
    Name:    "my-vector-store",
    FileIDs: []string{"file-123", "file-456"},
})
Attach File to Vector Store

// Attach an already-uploaded file to an existing vector store
_, err := vectorStoreClient.CreateVectorStoreFile(ctx, "vs_abc123", "file-123")
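Indexing is asynchronous, so a newly attached file is not searchable immediately. A sketch of a polling loop; note that GetVectorStore and the FileCounts fields are assumed names for illustration, not confirmed parts of the pkg/openai surface:

// Hypothetical polling loop: GetVectorStore and FileCounts are assumed names
for {
    store, err := vectorStoreClient.GetVectorStore(ctx, "vs_abc123")
    if err != nil {
        return err
    }
    if store.FileCounts.Completed == store.FileCounts.Total {
        break // all attached files are indexed and searchable
    }
    time.Sleep(2 * time.Second) // requires "time" in the import list
}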
Testing
Unit Tests
Run unit tests for OpenAI RAG:
cd src/semantic-router
go test ./pkg/openai/... -v
go test ./pkg/extproc/req_filter_rag_openai_test.go -v
E2E Tests
Run E2E tests based on the OpenAI cookbook:
# Python-based E2E test
python e2e/testing/08-rag-openai-test.py --base-url http://localhost:8080
# Go-based E2E test (requires Kubernetes cluster)
make e2e-test E2E_TESTS=rag-openai
OpenAI API Validation Test Suite
Validation tests, adapted from openai-python/tests, ensure the OpenAI API implementation (Files, Vector Stores, Search) stays compatible with upstream. They run only when OPENAI_API_KEY is set.
Python E2E (contract validation against real API):
# From repo root; skips all tests if OPENAI_API_KEY is not set
OPENAI_API_KEY=sk-... python e2e/testing/09-openai-api-validation-test.py --verbose
# Optional: override API base URL
OPENAI_BASE_URL=https://api.openai.com/v1 OPENAI_API_KEY=sk-... python e2e/testing/09-openai-api-validation-test.py
Go integration (pkg/openai client against real API):
cd src/semantic-router
# Skips tests if OPENAI_API_KEY is not set
OPENAI_API_KEY=sk-... go test -tags=openai_validation ./pkg/openai -v
Tests cover: Files (list, upload, get, delete), Vector Stores (list, create, get, update, delete), Vector Store Files (list), and Vector Store Search (response schema).
Monitoring and Observability
The OpenAI RAG backend exposes the following metrics:
- rag_retrieval_attempts_total{backend="openai", decision="...", status="success|error"}
- rag_retrieval_latency_seconds{backend="openai", decision="..."}
- rag_similarity_score{backend="openai", decision="..."}
- rag_context_length_chars{backend="openai", decision="..."}
- rag_cache_hits_total{backend="openai"}
- rag_cache_misses_total{backend="openai"}
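As an example, assuming rag_retrieval_latency_seconds is exported as a Prometheus histogram, a PromQL query for p95 retrieval latency over five-minute windows would look like:

histogram_quantile(0.95, rate(rag_retrieval_latency_seconds_bucket{backend="openai"}[5m]))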
Tracing
OpenTelemetry spans are created for:
- semantic_router.rag.retrieval - RAG retrieval operation
- semantic_router.rag.context_injection - Context injection operation
Error Handling
The RAG plugin supports three failure modes:
- skip (default): Continue without context, log warning
- warn: Continue with warning header
- block: Return error response (503)
rag:
  on_failure: "skip" # or "warn" or "block"
Best Practices
- Use Direct Search for Synchronous Workflows: When you need immediate context injection
- Use Tool-Based for the Responses API: When you want the LLM to control when to search
- Cache Results: Enable caching for frequently accessed queries
- Set Appropriate Timeouts: Configure timeout_seconds based on your vector store size
- Filter Results: Use file_ids or filter to narrow the search scope
- Monitor Metrics: Track retrieval latency and similarity scores
Troubleshooting
No Results Found
- Verify the vector store ID is correct
- Check that files are attached to the vector store
- Ensure files have completed processing (check file_counts.completed)
High Latency
- Reduce max_num_results
- Enable result caching
- Use file_ids to limit the search scope
Authentication Errors
- Verify API key is correct
- Check API key has access to File Store and Vector Store APIs
- Ensure base URL is correct (if using custom endpoint)