# Router Memory
Router Memory enables stateful conversations via the OpenAI Response API, supporting conversation chaining with `previous_response_id`.
## Overview
Semantic Router acts as the unified brain for multiple LLM backends that only support the Chat Completions API. It provides:
- Cross-Model Stateful Conversations: Maintain conversation history across different models
- Unified Response API: Single API interface regardless of backend model
- Transparent Translation: Automatic conversion between Response API and Chat Completions
With Router Memory, you can start a conversation with one model and continue it with another: the conversation history is preserved in the router, not in any single backend.
## Request Flow

## Endpoints
| Endpoint | Method | Description |
|---|---|---|
| `/v1/responses` | POST | Create a new response |
| `/v1/responses/{id}` | GET | Retrieve a stored response |
| `/v1/responses/{id}` | DELETE | Delete a stored response |
| `/v1/responses/{id}/input_items` | GET | List input items for a response |
## Configuration

```yaml
response_api:
  enabled: true
  store_backend: "memory"  # Currently only "memory" is supported
  ttl_seconds: 86400       # Default: 24 hours
  max_responses: 1000
```
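To illustrate what `ttl_seconds` and `max_responses` govern, here is a minimal sketch of an in-memory response store with TTL expiry and a size cap. The `ResponseStore` class and its eviction policy are illustrative assumptions, not the router's actual implementation:

```python
import time
from collections import OrderedDict

class ResponseStore:
    """In-memory response store with TTL expiry and a size cap (sketch)."""

    def __init__(self, ttl_seconds=86400, max_responses=1000):
        self.ttl = ttl_seconds
        self.max = max_responses
        self._items = OrderedDict()  # id -> (created_at, response)

    def put(self, resp_id, response):
        # Evict the oldest entry once the cap is reached.
        if len(self._items) >= self.max:
            self._items.popitem(last=False)
        self._items[resp_id] = (time.time(), response)

    def get(self, resp_id):
        entry = self._items.get(resp_id)
        if entry is None:
            return None
        created_at, response = entry
        # Expired entries are dropped on access.
        if time.time() - created_at > self.ttl:
            del self._items[resp_id]
            return None
        return response

    def delete(self, resp_id):
        self._items.pop(resp_id, None)

store = ResponseStore(ttl_seconds=1, max_responses=2)
store.put("resp_a", {"output": "A"})
store.put("resp_b", {"output": "B"})
store.put("resp_c", {"output": "C"})  # cap of 2 reached: resp_a is evicted
```

A real store would also sweep expired entries in the background; dropping them lazily on access keeps the sketch short.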
## Usage

### 1. Create Response
```bash
curl -X POST http://localhost:8801/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-120b",
    "input": "Tell me a joke.",
    "instructions": "Remember my name is Xunzhuo. Then I will ask you!",
    "temperature": 0.7,
    "max_output_tokens": 100
  }'
```
Response:
```json
{
  "id": "resp_7cb437001e1ad5b84b6dd8ef",
  "object": "response",
  "status": "completed",
  "output": [{
    "type": "message",
    "role": "assistant",
    "content": [{"type": "output_text", "text": "Sure thing, Xunzhuo! Why don't scientists trust atoms? Because they make up everything!"}]
  }],
  "usage": {"input_tokens": 94, "output_tokens": 75, "total_tokens": 169}
}
```
### 2. Continue Conversation

Use `previous_response_id` to chain conversations:
```bash
curl -X POST http://localhost:8801/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-120b",
    "input": "What is my name?",
    "previous_response_id": "resp_7cb437001e1ad5b84b6dd8ef",
    "max_output_tokens": 100
  }'
```
Response:
```json
{
  "id": "resp_ec2822df62e390dcb87aa61d",
  "status": "completed",
  "output": [{
    "type": "message",
    "role": "assistant",
    "content": [{"type": "output_text", "text": "Your name is Xunzhuo."}]
  }],
  "previous_response_id": "resp_7cb437001e1ad5b84b6dd8ef"
}
```
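Conceptually, chaining works by expanding the stored history into a full message list before the request reaches the backend. The sketch below assumes a hypothetical `history_store` and `expand_request` helper; names and data shapes are illustrative, not the router's actual code:

```python
# Sketch: expand previous_response_id into a full message history.
# The store maps a response id to the conversation that produced it,
# including the assistant's reply (illustrative data).
history_store = {
    "resp_7cb437001e1ad5b84b6dd8ef": [
        {"role": "system", "content": "Remember my name is Xunzhuo. Then I will ask you!"},
        {"role": "user", "content": "Tell me a joke."},
        {"role": "assistant", "content": "Sure thing, Xunzhuo! ..."},
    ]
}

def expand_request(request):
    """Build the Chat Completions messages array for a chained request."""
    messages = []
    prev_id = request.get("previous_response_id")
    if prev_id:
        # Prepend the entire stored conversation.
        messages.extend(history_store[prev_id])
    # The new input becomes the latest user turn.
    messages.append({"role": "user", "content": request["input"]})
    return messages

msgs = expand_request({
    "input": "What is my name?",
    "previous_response_id": "resp_7cb437001e1ad5b84b6dd8ef",
})
```

Because the expansion happens in the router, the `model` on the chained request can differ from the one that produced the stored history.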
### 3. Get Response

```bash
curl http://localhost:8801/v1/responses/resp_7cb437001e1ad5b84b6dd8ef
```
### 4. List Input Items

```bash
curl http://localhost:8801/v1/responses/resp_7cb437001e1ad5b84b6dd8ef/input_items
```
Response:
```json
{
  "object": "list",
  "data": [{
    "type": "message",
    "role": "system",
    "content": [{"type": "input_text", "text": "Remember my name is Xunzhuo."}]
  }],
  "has_more": false
}
```
### 5. Delete Response

```bash
curl -X DELETE http://localhost:8801/v1/responses/resp_7cb437001e1ad5b84b6dd8ef
```
## API Translation

| Response API | Chat Completions |
|---|---|
| `input` | `messages[].content` (role: `user`) |
| `instructions` | `messages[0]` (role: `system`) |
| `previous_response_id` | Expanded to full `messages` array |
| `max_output_tokens` | `max_tokens` |
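The mapping above can be sketched as a simple payload translation. The `to_chat_completions` helper is a hypothetical illustration under stated assumptions (a plain-string `input`, with `previous_response_id` expansion already resolved), not the router's actual translation code:

```python
def to_chat_completions(req):
    """Translate a Response API payload into a Chat Completions payload.

    Assumes `input` is a plain string and any chained history has
    already been resolved (previous_response_id expansion not shown).
    """
    messages = []
    # `instructions` becomes the leading system message.
    if "instructions" in req:
        messages.append({"role": "system", "content": req["instructions"]})
    # `input` becomes the user message.
    messages.append({"role": "user", "content": req["input"]})

    payload = {"model": req["model"], "messages": messages}
    # `max_output_tokens` maps to `max_tokens`.
    if "max_output_tokens" in req:
        payload["max_tokens"] = req["max_output_tokens"]
    # Sampling parameters such as `temperature` keep their names.
    if "temperature" in req:
        payload["temperature"] = req["temperature"]
    return payload

chat_req = to_chat_completions({
    "model": "openai/gpt-oss-120b",
    "input": "Tell me a joke.",
    "instructions": "Remember my name is Xunzhuo. Then I will ask you!",
    "temperature": 0.7,
    "max_output_tokens": 100,
})
```

The reverse direction (wrapping a Chat Completions reply into the `output` array shown in the examples above) follows the same table in the other direction.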