Design Doc: Multi-Protocol Adapter Architecture
Author: vLLM Semantic Router Team
Status: To be Implemented
Created: February 2026
Last Updated: February 2026
Overview
This document describes the design and implementation of the multi-protocol adapter architecture for vLLM Semantic Router, which abstracts the API layer to support multiple front-end protocols beyond Envoy ExtProc.
Background
The Semantic Router is currently tightly coupled to Envoy's External Processor (ExtProc) protocol via gRPC. While this provides powerful integration with Envoy, it creates barriers for users who:
- Want to use the router without deploying Envoy
- Prefer direct HTTP/REST API integration
- Use Nginx or other reverse proxies
- Need simpler deployment architectures for development or testing
Motivation
- Flexibility: Users need direct HTTP API access without requiring Envoy infrastructure
- Testing: Developers need lightweight testing without full Envoy deployment
- Extensibility: Support for Nginx, native gRPC, and custom protocols
- Reusability: Single routing engine shared across all protocols
- Deployment Options: Enable serverless, edge, and simplified deployment scenarios
Goals
Primary Goals
- Protocol Abstraction: Separate routing logic from protocol-specific code
- Multi-Protocol Support: Enable simultaneous operation of multiple protocols
- Backward Compatibility: Preserve existing ExtProc functionality
- Shared State: Single source of truth for cache, replay, and routing decisions
- Easy Extension: Simple pattern for adding new protocol adapters
Non-Goals
- Replace or deprecate Envoy ExtProc support
- Change routing decision algorithms or classification logic
- Modify configuration format beyond adapter section
- Support protocol-specific features that break abstraction
Design Principles
1. Single Routing Pipeline
CRITICAL: All routing logic MUST flow through RouterEngine.Route(). No exceptions.
- ✅ Adapters translate protocol → RouteRequest → call RouterEngine.Route()
- ✅ RouterEngine.Route() returns RouteResponse → adapters translate → protocol
- ❌ Adapters MUST NOT duplicate classification, security, cache, or replay logic
- ❌ Adapters MUST NOT directly call classifiers, cache, or replay recorders
2. Thin Adapter Layer
Adapters are protocol translation only:
- Parse protocol-specific request format
- Convert to RouteRequest
- Call RouterEngine.Route()
- Convert RouteResponse to protocol format
- Return to client
3. RouterEngine Owns All Routing
RouterEngine.Route() is the ONLY place where:
- Classification happens
- PII/jailbreak detection runs
- Cache is checked/updated
- Tools are selected
- Replay is recorded
- Backend selection occurs
- Proxying happens (or proxy info is returned)
Design
Architecture Overview
┌────────────────────────────────────────────────────────────┐
│ Application Layer │
│ │
│ ┌───────────────────────────────────────────────────┐ │
│ │ Adapter Manager │ │
│ │ - Reads adapter config │ │
│ │ - Creates protocol adapters │ │
│ │ - Manages lifecycle │ │
│ └──────┬────────┬────────┬───────────┬──────────────┘ │
│ │ │ │ │ │
│ ┌──────▼──┐ ┌───▼────┐ ┌─▼──────┐ ┌──▼─────┐ │
│ │ ExtProc │ │ HTTP │ │ gRPC │ │ Nginx │ │
│ │ Adapter │ │Adapter │ │Adapter │ │Adapter │ │
│ │ ┌─────┐ │ │ ┌─────┐│ │ ┌─────┐│ │ ┌─────┐│ │
│ │ │Parse│ │ │ │Parse││ │ │Parse││ │ │Parse││ │
│ │ │ExtP │ │ │ │HTTP ││ │ │gRPC ││ │ │NJS ││ │
│ │ └──┬──┘ │ │ └─┬───┘│ │ └─┬───┘│ │ └──┬──┘│ │
│ │ │Conv│ │ │Con │ │ │Con │ │ │Con│ │
│ │ ▼ │ │ ▼ │ │ ▼ │ │ ▼ │ │
│ │ ┌─────┐ │ │ ┌────┐ │ │ ┌────┐ │ │ ┌─────┐│ │
│ │ │Req │ │ │ │Req │ │ │ │Req │ │ │ │Req ││ │
│ │ └──┬──┘ │ │ └─┬──┘ │ │ └─┬──┘ │ │ └──┬──┘│ │
│ └────┼────┘ └───┼────┘ └───┼────┘ └────┼───┘ │
│ │ │ │ │ │
│ └──────────┴──────────┴──────────┘ │
│ Single Entry Point │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────┐ │
│ │ RouterEngine.Route() │ │
│ │ 1. Classify request │ │
│ │ 2. Check PII / jailbreak │ │
│ │ 3. Check cache │ │
│ │ 4. Select tools │ │
│ │ 5. Select model/backend │ │
│ │ 6. Record replay │ │
│ │ 7. Proxy to backend (via Backend Layer) │ │
│ │ 8. Update cache │ │
│ └──────────────┬───────────────────────────┘ │
│ │ │
│ ▼ │
│ RouteResponse │
│ │ │
│ ┌──────────────┼──────────────┬───────────┐ │
│ │ │ │ │ │
│ ┌─────▼─────┐ ┌──────▼────┐ ┌───────▼───┐ ┌─────▼─────┐ │
│ │ ExtProc │ │ HTTP │ │ gRPC │ │ Nginx │ │
│ │ Adapter │ │ Adapter │ │ Adapter │ │ Adapter │ │
│ │ ┌───────┐ │ │ ┌───────┐ │ │ ┌───────┐ │ │ ┌───────┐ │ │
│ │ │Convert│ │ │ │Convert│ │ │ │Convert│ │ │ │Convert│ │ │
│ │ │to gRPC│ │ │ │to HTTP│ │ │ │gRPC │ │ │ │to NJS │ │ │
│ │ └───────┘ │ │ └───────┘ │ │ └───────┘ │ │ └───────┘ │ │
│ └─────┬─────┘ └─────┬─────┘ └─────┬─────┘ └─────┬─────┘ │
│ │ │ │ │ │
└────────┼─────────────┼─────────────┼─────────────┼─────────┘
│ │ │ │
└─────────────┴─────────────┴─────────────┘
│
▼
┌─────────────────────────────────────────┐
│ Backend Abstraction Layer │
└──────┬──────────────────┬───────────────┘
│ │
┌────────▼────────┐ ┌──────▼──────────┐
│ Envoy Proxy │ │ Direct Proxy │
│ (ExtProc mode) │ │ (HTTP/gRPC) │
│ - Dynamic fwd │ │ - HTTP client │
│ - Headers only │ │ - Full response │
└────────┬────────┘ └──────┬──────────┘
│ │
└──────────┬───────┘
▼
┌────────────────────────────┐
│ Inference Backends │
│ ┌────────┐ ┌────────┐ │
│ │ vLLM │ │Ollama │ │
│ │Server │ │Server │ │
│ └────────┘ └────────┘ │
└────────────────────────────┘
Key Insight: Adapters are thin translation layers. All intelligence lives in RouterEngine.
Component Design
1. RouterEngine (Core)
Location: pkg/router/engine/
Responsibilities:
- Protocol-agnostic routing logic
- Request classification and decision evaluation
- Semantic cache operations
- Tool selection and embedding
- Router replay recording
- PII and jailbreak detection
- Model selection
Key Methods:
// RouterEngine holds the shared, protocol-agnostic routing state.
type RouterEngine struct {
	Config          *config.RouterConfig
	Classifier      *classification.Classifier
	PIIChecker      *pii.PolicyChecker
	Cache           cache.CacheBackend
	ToolsDatabase   *tools.ToolsDatabase
	ModelSelector   *selection.Registry
	ReplayRecorders map[string]*routerreplay.Recorder
}

// Route is the single entry point that every adapter calls.
func (e *RouterEngine) Route(ctx context.Context, req *RouteRequest) (*RouteResponse, error)

// The remaining methods implement the individual pipeline stages.
func (e *RouterEngine) ClassifyRequest(ctx context.Context, messages []Message) (*ClassificationResult, error)
func (e *RouterEngine) CheckCache(ctx context.Context, model, query, decisionName string) (string, bool, error)
func (e *RouterEngine) UpdateCache(ctx context.Context, model, query, response, decisionName string) error
func (e *RouterEngine) SelectTools(ctx context.Context, query string, topK int) ([]openai.ChatCompletionToolParam, error)
func (e *RouterEngine) RecordReplay(ctx context.Context, decisionName string, record *routerreplay.RoutingRecord) error
Design Decisions:
- Single instance shared across all adapters
- Stateful (maintains cache, replay recorders)
- No protocol-specific logic
- Returns protocol-agnostic data structures
2. Adapter Interface
Location: pkg/adapter/manager.go
type Adapter interface {
	Start() error                    // Start the adapter (blocks)
	Stop() error                     // Graceful shutdown
	GetEngine() *engine.RouterEngine // Access to shared engine
}
Design Decisions:
- Minimal interface for maximum flexibility
- Each adapter owns its lifecycle
- No protocol-specific methods in interface
- Adapters run in separate goroutines
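To illustrate how small an implementation of this interface can be, here is a sketch of an HTTP adapter built on the standard library. The HTTPAdapter type, its constructor, the Addr helper, and the /healthz route are hypothetical, and RouterEngine is reduced to an empty placeholder. Start blocks until the server stops, matching the interface comment, and treats http.ErrServerClosed as a clean exit.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"net/http"
	"time"
)

// RouterEngine is an empty placeholder for the shared engine.
type RouterEngine struct{}

// Adapter mirrors the interface above, with the engine type made local.
type Adapter interface {
	Start() error
	Stop() error
	GetEngine() *RouterEngine
}

// HTTPAdapter is a hypothetical adapter; names and routes are illustrative.
type HTTPAdapter struct {
	engine *RouterEngine
	server *http.Server
	ln     net.Listener
}

// NewHTTPAdapter binds the listener up front so bind errors surface early.
func NewHTTPAdapter(eng *RouterEngine, addr string) (*HTTPAdapter, error) {
	ln, err := net.Listen("tcp", addr)
	if err != nil {
		return nil, err
	}
	mux := http.NewServeMux()
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	return &HTTPAdapter{engine: eng, server: &http.Server{Handler: mux}, ln: ln}, nil
}

// Start blocks until the server stops; a graceful stop is not an error.
func (a *HTTPAdapter) Start() error {
	err := a.server.Serve(a.ln)
	if errors.Is(err, http.ErrServerClosed) {
		return nil
	}
	return err
}

// Stop drains in-flight requests, giving up after five seconds.
func (a *HTTPAdapter) Stop() error {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	return a.server.Shutdown(ctx)
}

func (a *HTTPAdapter) GetEngine() *RouterEngine { return a.engine }

// Addr reports the bound address (useful when listening on port 0).
func (a *HTTPAdapter) Addr() string { return a.ln.Addr().String() }

// Compile-time check that HTTPAdapter satisfies Adapter.
var _ Adapter = (*HTTPAdapter)(nil)

func main() {
	a, err := NewHTTPAdapter(&RouterEngine{}, "127.0.0.1:0") // port 0: OS picks a free port
	if err != nil {
		panic(err)
	}

	done := make(chan error, 1)
	go func() { done <- a.Start() }() // Start blocks, so run it in a goroutine

	resp, err := http.Get("http://" + a.Addr() + "/healthz")
	if err != nil {
		panic(err)
	}
	resp.Body.Close()
	fmt.Println("health status:", resp.StatusCode)

	if err := a.Stop(); err != nil {
		panic(err)
	}
	fmt.Println("start returned:", <-done)
}
```

Binding the listener in the constructor rather than in Start is a design choice of this sketch: it lets the caller learn about a port conflict before any goroutine is launched.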
3. Adapter Manager
Location: pkg/adapter/manager.go
Responsibilities:
- Parse adapter configuration
- Instantiate adapters based on config
- Start adapters in separate goroutines
- Coordinate graceful shutdown
Key Methods:
func (m *Manager) CreateAdapters(cfg *config.RouterConfig, eng *engine.RouterEngine, configPath string) error
func (m *Manager) StartAll() error
func (m *Manager) StopAll() error
func (m *Manager) Wait()
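The lifecycle methods above could be built on sync.WaitGroup, as in the following sketch. It is not the planned implementation: Register replaces the config-driven CreateAdapters, StartErrors is a hypothetical helper for surfacing failures from blocked Start calls, the Adapter interface is trimmed to the two lifecycle methods, and fakeAdapter exists only to make the example runnable. errors.Join requires Go 1.20 or later.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Adapter is trimmed to the lifecycle methods the manager needs.
type Adapter interface {
	Start() error
	Stop() error
}

type Manager struct {
	adapters []Adapter
	wg       sync.WaitGroup

	mu        sync.Mutex
	startErrs []error // errors returned by Start, collected asynchronously
}

// Register is a hypothetical stand-in for config-driven CreateAdapters.
func (m *Manager) Register(a Adapter) { m.adapters = append(m.adapters, a) }

// StartAll launches each adapter in its own goroutine because Start blocks.
func (m *Manager) StartAll() error {
	for _, a := range m.adapters {
		m.wg.Add(1)
		go func(a Adapter) {
			defer m.wg.Done()
			if err := a.Start(); err != nil {
				m.mu.Lock()
				m.startErrs = append(m.startErrs, err)
				m.mu.Unlock()
			}
		}(a)
	}
	return nil
}

// StopAll asks every adapter to stop and reports all failures together.
func (m *Manager) StopAll() error {
	var errs []error
	for _, a := range m.adapters {
		if err := a.Stop(); err != nil {
			errs = append(errs, err)
		}
	}
	return errors.Join(errs...) // nil when errs is empty
}

// Wait blocks until every adapter's Start call has returned.
func (m *Manager) Wait() { m.wg.Wait() }

// StartErrors returns a copy of the errors collected from Start calls.
func (m *Manager) StartErrors() []error {
	m.mu.Lock()
	defer m.mu.Unlock()
	return append([]error(nil), m.startErrs...)
}

// fakeAdapter blocks in Start until Stop is called.
type fakeAdapter struct {
	quit chan struct{}
	once sync.Once
}

func newFakeAdapter() *fakeAdapter { return &fakeAdapter{quit: make(chan struct{})} }

func (f *fakeAdapter) Start() error { <-f.quit; return nil }

func (f *fakeAdapter) Stop() error {
	f.once.Do(func() { close(f.quit) }) // safe to call more than once
	return nil
}

func main() {
	m := &Manager{}
	m.Register(newFakeAdapter()) // e.g. ExtProc
	m.Register(newFakeAdapter()) // e.g. HTTP

	_ = m.StartAll()
	if err := m.StopAll(); err != nil {
		fmt.Println("stop errors:", err)
	}
	m.Wait()
	fmt.Println("all adapters stopped; start errors:", len(m.StartErrors()))
}
```

In a real process, StopAll would typically be triggered by a shutdown signal, with Wait called afterwards so the program exits only once every adapter has returned from Start.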