Router Memory

Router Memory enables stateful conversations via the OpenAI Response API, supporting conversation chaining with previous_response_id.

Overview

Semantic Router acts as the unified brain for multiple LLM backends that only support the Chat Completions API. It provides:

  • Cross-Model Stateful Conversations: Maintain conversation history across different models
  • Unified Response API: Single API interface regardless of backend model
  • Transparent Translation: Automatic conversion between Response API and Chat Completions

With Router Memory, you can start a conversation with one model and continue it with another; the conversation history is preserved in the router, not in any single backend.
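Conceptually, the router resolves a chained request by walking the stored `previous_response_id` chain and replaying it as message history. A minimal in-memory sketch of that idea (the class and method names here are illustrative, not the actual Semantic Router implementation):

```python
# Sketch: store responses by id, then expand a previous_response_id chain
# into a full message history, oldest turn first.
class MemoryStore:
    def __init__(self):
        self._responses = {}  # response_id -> stored record

    def save(self, response_id, input_text, output_text, previous_response_id=None):
        self._responses[response_id] = {
            "input": input_text,
            "output": output_text,
            "previous_response_id": previous_response_id,
        }

    def expand_history(self, response_id):
        """Walk the previous_response_id chain; return messages, oldest first."""
        chain = []
        current = response_id
        while current is not None:
            record = self._responses[current]
            chain.append(record)
            current = record["previous_response_id"]
        messages = []
        for record in reversed(chain):  # oldest turn first
            messages.append({"role": "user", "content": record["input"]})
            messages.append({"role": "assistant", "content": record["output"]})
        return messages

store = MemoryStore()
store.save("resp_1", "Tell me a joke.", "Why don't scientists trust atoms? ...")
store.save("resp_2", "What is my name?", "Your name is Xunzhuo.",
           previous_response_id="resp_1")
history = store.expand_history("resp_2")  # all four turns, oldest first
```

Because the chain lives in the router's store, the second request can be served by any backend model, not just the one that answered the first.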

Request Flow

Endpoints

Endpoint                         Method   Description
/v1/responses                    POST     Create a new response
/v1/responses/{id}               GET      Retrieve a stored response
/v1/responses/{id}               DELETE   Delete a stored response
/v1/responses/{id}/input_items   GET      List input items for a response

Configuration

response_api:
  enabled: true
  store_backend: "memory"  # currently only "memory" is supported
  ttl_seconds: 86400       # 24 hours (default: 30 days)
  max_responses: 1000
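A rough sketch of how `ttl_seconds` and `max_responses` might bound an in-memory store: expired entries are dropped on read, and the oldest entries are evicted once the size cap is exceeded. `BoundedResponseStore` and its methods are hypothetical, not the router's actual types:

```python
import time
from collections import OrderedDict

class BoundedResponseStore:
    def __init__(self, ttl_seconds=86400, max_responses=1000, clock=time.time):
        self.ttl_seconds = ttl_seconds
        self.max_responses = max_responses
        self._clock = clock
        self._items = OrderedDict()  # response_id -> (created_at, response)

    def put(self, response_id, response):
        self._items[response_id] = (self._clock(), response)
        # Evict the oldest entries once the cap is exceeded.
        while len(self._items) > self.max_responses:
            self._items.popitem(last=False)

    def get(self, response_id):
        entry = self._items.get(response_id)
        if entry is None:
            return None
        created_at, response = entry
        if self._clock() - created_at > self.ttl_seconds:
            del self._items[response_id]  # expired past the TTL
            return None
        return response

now = [0]  # fake clock so expiry is deterministic
store = BoundedResponseStore(ttl_seconds=10, max_responses=2, clock=lambda: now[0])
store.put("a", {"id": "a"})
store.put("b", {"id": "b"})
store.put("c", {"id": "c"})   # "a" is evicted by the size cap
now[0] = 11                    # advance past the TTL
expired = store.get("b")       # None: entry expired
```

Dropping expired entries lazily on read keeps the sketch simple; a real store would also sweep in the background.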

Usage

1. Create Response

curl -X POST http://localhost:8801/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-120b",
    "input": "Tell me a joke.",
    "instructions": "Remember my name is Xunzhuo. Then I will ask you!",
    "temperature": 0.7,
    "max_output_tokens": 100
  }'

Response:

{
  "id": "resp_7cb437001e1ad5b84b6dd8ef",
  "object": "response",
  "status": "completed",
  "output": [{
    "type": "message",
    "role": "assistant",
    "content": [{"type": "output_text", "text": "Sure thing, Xunzhuo! Why don't scientists trust atoms? Because they make up everything! 😄"}]
  }],
  "usage": {"input_tokens": 94, "output_tokens": 75, "total_tokens": 169}
}

2. Continue Conversation

Use previous_response_id to chain conversations:

curl -X POST http://localhost:8801/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-120b",
    "input": "What is my name?",
    "previous_response_id": "resp_7cb437001e1ad5b84b6dd8ef",
    "max_output_tokens": 100
  }'
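The chained request differs from the first only by `previous_response_id`, so a small client-side helper can assemble both payloads. `build_response_request` is a hypothetical name introduced here, not part of any SDK:

```python
# Build the JSON body for POST /v1/responses; optional fields are included
# only when supplied, and previous_response_id only when chaining.
def build_response_request(model, input_text, previous_response_id=None,
                           instructions=None, max_output_tokens=None):
    body = {"model": model, "input": input_text}
    if instructions is not None:
        body["instructions"] = instructions
    if previous_response_id is not None:
        body["previous_response_id"] = previous_response_id
    if max_output_tokens is not None:
        body["max_output_tokens"] = max_output_tokens
    return body

first = build_response_request(
    "openai/gpt-oss-120b", "Tell me a joke.",
    instructions="Remember my name is Xunzhuo. Then I will ask you!",
    max_output_tokens=100)
follow_up = build_response_request(
    "openai/gpt-oss-120b", "What is my name?",
    previous_response_id="resp_7cb437001e1ad5b84b6dd8ef",
    max_output_tokens=100)
```

Each returned dict matches the corresponding `-d` body in the curl examples above.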

Response:

{
  "id": "resp_ec2822df62e390dcb87aa61d",
  "status": "completed",
  "output": [{
    "type": "message",
    "role": "assistant",
    "content": [{"type": "output_text", "text": "Your name is Xunzhuo."}]
  }],
  "previous_response_id": "resp_7cb437001e1ad5b84b6dd8ef"
}

3. Get Response

curl http://localhost:8801/v1/responses/resp_7cb437001e1ad5b84b6dd8ef

4. List Input Items

curl http://localhost:8801/v1/responses/resp_7cb437001e1ad5b84b6dd8ef/input_items

Response:

{
  "object": "list",
  "data": [{
    "type": "message",
    "role": "system",
    "content": [{"type": "input_text", "text": "Remember my name is Xunzhuo."}]
  }],
  "has_more": false
}

5. Delete Response

curl -X DELETE http://localhost:8801/v1/responses/resp_7cb437001e1ad5b84b6dd8ef

API Translation

Response API            Chat Completions
input                   messages[].content (role: user)
instructions            messages[0] (role: system)
previous_response_id    Expanded to full messages array
max_output_tokens       max_tokens
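The mapping above can be sketched as a pure function. `to_chat_completions` and `resolve_history` are illustrative stand-ins for the router's internal translation and store lookup, not its actual API:

```python
# Translate a Response API request into a Chat Completions payload,
# following the field mapping in the table above.
def to_chat_completions(request, resolve_history):
    messages = []
    if "instructions" in request:
        # instructions -> leading system message
        messages.append({"role": "system", "content": request["instructions"]})
    if "previous_response_id" in request:
        # previous_response_id -> expanded to the full prior messages array
        messages.extend(resolve_history(request["previous_response_id"]))
    # input -> newest user message
    messages.append({"role": "user", "content": request["input"]})
    payload = {"model": request["model"], "messages": messages}
    if "max_output_tokens" in request:
        payload["max_tokens"] = request["max_output_tokens"]  # field renamed
    return payload

history = {"resp_1": [{"role": "user", "content": "Tell me a joke."},
                      {"role": "assistant", "content": "..."}]}
chat = to_chat_completions(
    {"model": "openai/gpt-oss-120b", "input": "What is my name?",
     "previous_response_id": "resp_1", "max_output_tokens": 100},
    resolve_history=lambda rid: history[rid],
)
```

The backend only ever sees a plain Chat Completions request; the statefulness lives entirely in the history expansion.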

Reference