# Router Memory
Router Memory enables stateful conversations via the OpenAI Response API, supporting conversation chaining with `previous_response_id`.
## Overview
Semantic Router acts as the unified brain for multiple LLM backends that only support the Chat Completions API. It provides:
- Cross-Model Stateful Conversations: Maintain conversation history across different models
- Unified Response API: Single API interface regardless of backend model
- Transparent Translation: Automatic conversion between Response API and Chat Completions
With Router Memory, you can start a conversation with one model and continue it with another: the conversation history is preserved in the router, not in any single backend.
## Request Flow

## Endpoints
| Endpoint | Method | Description |
|---|---|---|
| `/v1/responses` | POST | Create a new response |
| `/v1/responses/{id}` | GET | Retrieve a stored response |
| `/v1/responses/{id}` | DELETE | Delete a stored response |
| `/v1/responses/{id}/input_items` | GET | List input items for a response |
## Configuration

```yaml
response_api:
  enabled: true
  store_backend: "memory"  # Currently only "memory" is supported
  ttl_seconds: 86400       # Default: 24 hours
  max_responses: 1000
```
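To illustrate what `ttl_seconds` and `max_responses` govern, here is a minimal sketch of an in-memory response store with TTL expiry and a size cap. The `ResponseStore` class and its eviction policy are illustrative assumptions, not the router's actual implementation:

```python
import time
from collections import OrderedDict

class ResponseStore:
    """In-memory response store with TTL expiry and a size cap (sketch)."""

    def __init__(self, ttl_seconds=86400, max_responses=1000):
        self.ttl = ttl_seconds
        self.max = max_responses
        self._items = OrderedDict()  # id -> (created_at, response)

    def put(self, resp_id, response):
        # Evict the oldest entry once the cap is reached.
        if len(self._items) >= self.max:
            self._items.popitem(last=False)
        self._items[resp_id] = (time.time(), response)

    def get(self, resp_id):
        entry = self._items.get(resp_id)
        if entry is None:
            return None
        created_at, response = entry
        # Expired entries are dropped on access.
        if time.time() - created_at > self.ttl:
            del self._items[resp_id]
            return None
        return response

    def delete(self, resp_id):
        self._items.pop(resp_id, None)

store = ResponseStore(ttl_seconds=1, max_responses=2)
store.put("resp_a", {"output": "A"})
store.put("resp_b", {"output": "B"})
store.put("resp_c", {"output": "C"})  # cap of 2 reached: resp_a is evicted
```

A real store would also sweep expired entries in the background; dropping them lazily on access keeps the sketch short.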
## Usage

### 1. Create Response
```bash
curl -X POST http://localhost:8801/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-120b",
    "input": "Tell me a joke.",
    "instructions": "Remember my name is Xunzhuo. Then I will ask you!",
    "temperature": 0.7,
    "max_output_tokens": 100
  }'
```
Response:
```json
{
  "id": "resp_7cb437001e1ad5b84b6dd8ef",
  "object": "response",
  "status": "completed",
  "output": [{
    "type": "message",
    "role": "assistant",
    "content": [{"type": "output_text", "text": "Sure thing, Xunzhuo! Why don't scientists trust atoms? Because they make up everything!"}]
  }],
  "usage": {"input_tokens": 94, "output_tokens": 75, "total_tokens": 169}
}
```
### 2. Continue Conversation

Use `previous_response_id` to chain conversations:
```bash
curl -X POST http://localhost:8801/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-120b",
    "input": "What is my name?",
    "previous_response_id": "resp_7cb437001e1ad5b84b6dd8ef",
    "max_output_tokens": 100
  }'
```
Response:
```json
{
  "id": "resp_ec2822df62e390dcb87aa61d",
  "status": "completed",
  "output": [{
    "type": "message",
    "role": "assistant",
    "content": [{"type": "output_text", "text": "Your name is Xunzhuo."}]
  }],
  "previous_response_id": "resp_7cb437001e1ad5b84b6dd8ef"
}
```
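Conceptually, chaining works by expanding the stored history into a full message list before the request reaches the backend. The sketch below assumes a hypothetical `history_store` and `expand_request` helper; names and data shapes are illustrative, not the router's actual code:

```python
# Sketch: expand previous_response_id into a full message history.
# The store maps a response id to the conversation that produced it,
# including the assistant's reply (illustrative data).
history_store = {
    "resp_7cb437001e1ad5b84b6dd8ef": [
        {"role": "system", "content": "Remember my name is Xunzhuo. Then I will ask you!"},
        {"role": "user", "content": "Tell me a joke."},
        {"role": "assistant", "content": "Sure thing, Xunzhuo! ..."},
    ]
}

def expand_request(request):
    """Build the Chat Completions messages array for a chained request."""
    messages = []
    prev_id = request.get("previous_response_id")
    if prev_id:
        # Prepend the entire stored conversation.
        messages.extend(history_store[prev_id])
    # The new input becomes the latest user turn.
    messages.append({"role": "user", "content": request["input"]})
    return messages

msgs = expand_request({
    "input": "What is my name?",
    "previous_response_id": "resp_7cb437001e1ad5b84b6dd8ef",
})
```

Because the expansion happens in the router, the `model` on the chained request can differ from the one that produced the stored history.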
### 3. Get Response

```bash
curl http://localhost:8801/v1/responses/resp_7cb437001e1ad5b84b6dd8ef
```
### 4. List Input Items

```bash
curl http://localhost:8801/v1/responses/resp_7cb437001e1ad5b84b6dd8ef/input_items
```
Response:
```json
{
  "object": "list",
  "data": [{
    "type": "message",
    "role": "system",
    "content": [{"type": "input_text", "text": "Remember my name is Xunzhuo."}]
  }],
  "has_more": false
}
```
### 5. Delete Response

```bash
curl -X DELETE http://localhost:8801/v1/responses/resp_7cb437001e1ad5b84b6dd8ef
```
## API Translation

| Response API | Chat Completions |
|---|---|
| `input` | `messages[].content` (role: `user`) |
| `instructions` | `messages[0]` (role: `system`) |
| `previous_response_id` | Expanded to full `messages` array |
| `max_output_tokens` | `max_tokens` |
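The mapping above can be sketched as a simple payload translation. The `to_chat_completions` helper is a hypothetical illustration under stated assumptions (a plain-string `input`, with `previous_response_id` expansion already resolved), not the router's actual translation code:

```python
def to_chat_completions(req):
    """Translate a Response API payload into a Chat Completions payload.

    Assumes `input` is a plain string and any chained history has
    already been resolved (previous_response_id expansion not shown).
    """
    messages = []
    # `instructions` becomes the leading system message.
    if "instructions" in req:
        messages.append({"role": "system", "content": req["instructions"]})
    # `input` becomes the user message.
    messages.append({"role": "user", "content": req["input"]})

    payload = {"model": req["model"], "messages": messages}
    # `max_output_tokens` maps to `max_tokens`.
    if "max_output_tokens" in req:
        payload["max_tokens"] = req["max_output_tokens"]
    # Sampling parameters such as `temperature` keep their names.
    if "temperature" in req:
        payload["temperature"] = req["temperature"]
    return payload

chat_req = to_chat_completions({
    "model": "openai/gpt-oss-120b",
    "input": "Tell me a joke.",
    "instructions": "Remember my name is Xunzhuo. Then I will ask you!",
    "temperature": 0.7,
    "max_output_tokens": 100,
})
```

The reverse direction (wrapping a Chat Completions reply into the `output` array shown in the examples above) follows the same table in the other direction.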