# OpenAI RAG Integration
This guide demonstrates how to use OpenAI's File Store and Vector Store APIs for RAG (Retrieval-Augmented Generation) in Semantic Router, following the OpenAI Responses API cookbook.
## Overview
The OpenAI RAG backend integrates with OpenAI's File Store and Vector Store APIs to provide a first-class RAG experience. It supports two workflow modes:
- Direct Search Mode (default): Synchronous retrieval using the vector store search API
- Tool-Based Mode: Adds a `file_search` tool to the request (Responses API workflow)
## Architecture
```
┌─────────────┐
│   Client    │
└──────┬──────┘
       │
       ▼
┌─────────────────────────────────────┐
│           Semantic Router           │
│  ┌───────────────────────────────┐  │
│  │          RAG Plugin           │  │
│  │  ┌─────────────────────────┐  │  │
│  │  │   OpenAI RAG Backend    │  │  │
│  │  └──────┬──────────────────┘  │  │
│  └─────────┼─────────────────────┘  │
└────────────┼────────────────────────┘
             │
             ▼
┌─────────────────────────────────────┐
│             OpenAI API              │
│  ┌──────────────┐ ┌──────────────┐  │
│  │  File Store  │ │ Vector Store │  │
│  │     API      │ │     API      │  │
│  └──────────────┘ └──────────────┘  │
└─────────────────────────────────────┘
```
## Prerequisites
- OpenAI API key with access to File Store and Vector Store APIs
- Files uploaded to OpenAI File Store
- Vector store created and populated with files
## Configuration

### Basic Configuration
Add the OpenAI RAG backend to your decision configuration:
```yaml
decisions:
  - name: rag-openai-decision
    signals:
      - type: keyword
        keywords: ["research", "document", "knowledge"]
    plugins:
      rag:
        enabled: true
        backend: "openai"
        backend_config:
          vector_store_id: "vs_abc123"    # Your vector store ID
          api_key: "${OPENAI_API_KEY}"    # Or use environment variable
          max_num_results: 10
          workflow_mode: "direct_search"  # or "tool_based"
```
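Before loading a configuration like the one above, it can help to sanity-check the `backend_config` block. The following is a minimal sketch; the validator function, its defaults, and its error messages are illustrative assumptions, not part of Semantic Router:

```python
def validate_backend_config(cfg):
    """Check an OpenAI RAG backend_config dict and fill in defaults.

    Required keys and defaults mirror the example configuration above;
    this helper itself is illustrative only.
    """
    # vector_store_id and api_key have no sensible defaults.
    for key in ("vector_store_id", "api_key"):
        if not cfg.get(key):
            raise ValueError(f"backend_config.{key} is required")

    checked = dict(cfg)
    checked.setdefault("max_num_results", 10)
    checked.setdefault("workflow_mode", "direct_search")
    if checked["workflow_mode"] not in ("direct_search", "tool_based"):
        raise ValueError("workflow_mode must be 'direct_search' or 'tool_based'")
    return checked
```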
### Advanced Configuration
```yaml
rag:
  enabled: true
  backend: "openai"
  similarity_threshold: 0.7
  top_k: 10
  max_context_length: 5000
  injection_mode: "tool_role"   # or "system_prompt"
  on_failure: "skip"            # or "warn" or "block"
  cache_results: true
  cache_ttl_seconds: 3600
  backend_config:
    vector_store_id: "vs_abc123"
    api_key: "${OPENAI_API_KEY}"
    base_url: "https://api.openai.com/v1"  # Optional, defaults to OpenAI
    max_num_results: 10
    file_ids:                   # Optional: restrict search to specific files
      - "file-123"
      - "file-456"
    filter:                     # Optional: metadata filter
      category: "research"
      published_date: "2024-01-01"
    workflow_mode: "direct_search"  # or "tool_based"
    timeout_seconds: 30
```
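The retrieval-tuning options (`similarity_threshold`, `top_k`, `max_context_length`) can be pictured as a post-filter over search results. A hedged sketch, assuming each result is a dict with `score` and `text` fields; the helper name and result shape are assumptions, not the router's actual internals:

```python
def select_context(results, similarity_threshold=0.7, top_k=10,
                   max_context_length=5000):
    """Filter search results the way the config knobs suggest:
    drop low-similarity hits, keep the best top_k, cap total length."""
    # Keep only results at or above the threshold, highest score first.
    kept = sorted(
        (r for r in results if r["score"] >= similarity_threshold),
        key=lambda r: r["score"],
        reverse=True,
    )[:top_k]

    # Accumulate text until the context-length budget is exhausted.
    context, used = [], 0
    for r in kept:
        if used + len(r["text"]) > max_context_length:
            break
        context.append(r["text"])
        used += len(r["text"])
    return "\n\n".join(context)
```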
## Workflow Modes

### 1. Direct Search Mode (Default)

Synchronous retrieval using the vector store search API: context is retrieved before the request is sent to the LLM.

Use Case: When you need immediate context injection and want to control the retrieval process.
Example:

```yaml
backend_config:
  workflow_mode: "direct_search"
  vector_store_id: "vs_abc123"
```
Flow:
- User sends query
- RAG plugin calls vector store search API
- Retrieved context is injected into request
- Request sent to LLM with context
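The last two steps of this flow (context injection) can be sketched as a small request transform. The message shapes and the `inject_context` name are illustrative assumptions based on the `injection_mode` options shown in the advanced configuration, not the actual plugin code:

```python
def inject_context(request, context, injection_mode="tool_role"):
    """Attach retrieved context to a chat request before it is
    forwarded to the LLM (direct_search mode sketch)."""
    messages = list(request["messages"])  # copy; leave the original intact
    if injection_mode == "system_prompt":
        # Prepend the context as a system message.
        messages.insert(0, {
            "role": "system",
            "content": f"Use the following context:\n{context}",
        })
    else:
        # "tool_role": append the context as a tool-style message.
        messages.append({"role": "tool", "content": context})
    return {**request, "messages": messages}
```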
### 2. Tool-Based Mode (Responses API)

Adds the `file_search` tool to the request. The LLM invokes the tool automatically, and retrieval results appear in the response annotations.
Use Case: When using Responses API and want the LLM to control when to search.
Example:

```yaml
backend_config:
  workflow_mode: "tool_based"
  vector_store_id: "vs_abc123"
```
Flow:
- User sends query
- RAG plugin adds `file_search` tool to request
- Request sent to LLM
- LLM calls `file_search` tool
- Results appear in response annotations
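The second step of this flow can be sketched as a request transform that ensures the tool is present, using the nested `file_search` tool shape shown in the usage examples. The helper name is illustrative, not the plugin's actual code:

```python
def add_file_search_tool(request, vector_store_id, max_num_results=10):
    """Sketch of tool_based mode: ensure a file_search tool is present
    on the outgoing Responses API request."""
    tools = list(request.get("tools", []))  # copy existing tools, if any
    if not any(t.get("type") == "file_search" for t in tools):
        tools.append({
            "type": "file_search",
            "file_search": {
                "vector_store_ids": [vector_store_id],
                "max_num_results": max_num_results,
            },
        })
    return {**request, "tools": tools}
```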
## Usage Examples

### Example 1: Basic RAG Query
```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-VSR-Selected-Decision: rag-openai-decision" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {
        "role": "user",
        "content": "What is Deep Research?"
      }
    ]
  }'
```
### Example 2: Responses API with file_search Tool
```bash
curl -X POST http://localhost:8080/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "input": "What is Deep Research?",
    "tools": [
      {
        "type": "file_search",
        "file_search": {
          "vector_store_ids": ["vs_abc123"],
          "max_num_results": 5
        }
      }
    ]
  }'
```
### Example 3: Python Client
```python
import requests

# Direct search mode
response = requests.post(
    "http://localhost:8080/v1/chat/completions",
    headers={
        "Content-Type": "application/json",
        "X-VSR-Selected-Decision": "rag-openai-decision",
    },
    json={
        "model": "gpt-4o-mini",
        "messages": [
            {"role": "user", "content": "What is Deep Research?"}
        ],
    },
)
result = response.json()
print(result["choices"][0]["message"]["content"])
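In tool-based mode, citations come back inside the response annotations rather than as injected context. A hedged sketch of pulling them out, assuming a Responses API-style output list of message items whose content parts carry annotations (the exact response shape is an assumption and may differ):

```python
def extract_citations(response_output):
    """Collect file IDs from file_citation annotations in a
    Responses API-style output list (shape is an assumption)."""
    citations = []
    for item in response_output:
        for content in item.get("content", []):
            for ann in content.get("annotations", []):
                if ann.get("type") == "file_citation":
                    citations.append(ann.get("file_id"))
    return citations
```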