XAI Router Now Supports OpenAI WebSocket Mode: Official Behavior Alignment

Posted February 24, 2026 by XAI Tech Team · 3 min read

This is an engineering note for XAI Router's WebSocket support. As of 2026-02-24, XAI Router supports OpenAI WebSocket workflows for:

  1. Responses WebSocket mode (wss://.../v1/responses)
  2. Realtime WebSocket sessions (wss://.../v1/realtime)
  3. Coexistence with existing HTTP APIs without changing normal HTTP behavior

OpenAI WebSocket Mode: Key Semantics

According to OpenAI's official guide, core semantics for Responses WebSocket mode are:

  1. Keep a persistent connection to /v1/responses
  2. Start each turn with response.create
  3. Continue context with previous_response_id plus incremental input
  4. Sequential execution per connection: only one in-flight response at a time (no multiplexing)
  5. Connection lifetime limit of 60 minutes, then reconnect
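
The lifetime limit in point 5 is worth handling explicitly rather than waiting for the server to drop the socket. A minimal sketch, assuming the documented 60-minute limit, of a helper that flags when a connection should be proactively reopened (`should_reconnect` and its constants are illustrative, not part of any API):

```python
# Illustrative helper: reconnect slightly before the documented
# 60-minute connection lifetime expires, rather than at the hard cutoff.
LIFETIME_S = 60 * 60   # server-side connection lifetime
MARGIN_S = 120         # reopen this many seconds early

def should_reconnect(opened_at: float, now: float) -> bool:
    """True once the connection is within MARGIN_S of its lifetime limit."""
    return (now - opened_at) >= LIFETIME_S - MARGIN_S
```

Checking this between turns (never mid-turn) lets a client reopen the socket cleanly instead of losing an in-flight response when the limit hits.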

How XAI Router Aligns

1) Path compatibility

XAI Router supports both path variants for easier client migration:

  • /v1/responses and /responses
  • /v1/realtime and /realtime

2) Same sequential model as OpenAI

For /v1/responses in WebSocket mode:

  • Multiple response.create events are allowed over one connection
  • But they must be sequential
  • Concurrent in-flight response.create events on the same connection are rejected

This matches OpenAI's documented single-connection sequential behavior.
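
One way to stay within this constraint on the client side is to serialize turns yourself. A minimal sketch, assuming an asyncio client, where `TurnSerializer` is a hypothetical wrapper that guarantees only one in-flight response.create per connection:

```python
import asyncio

# Hypothetical client-side guard: never send a second response.create
# while another response is still in flight on the same connection.
class TurnSerializer:
    def __init__(self, send_turn):
        # send_turn: coroutine that sends one turn and awaits its terminal event
        self._send_turn = send_turn
        self._lock = asyncio.Lock()

    async def run_turn(self, payload):
        async with self._lock:   # one in-flight turn per connection
            return await self._send_turn(payload)
```

Callers can then fire turns concurrently from application code while the wrapper quietly queues them, which avoids the server-side rejection entirely.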

3) Conversation-state transparency

Fields like previous_response_id, incremental input, and store=false are passed through untouched, so conversation semantics are preserved end to end. XAI Router confines itself to model mapping, ACL checks, rate limits, routing, and usage accounting around them.
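
To make the pass-through concrete, here is a sketch of the turn payloads in a multi-turn exchange: the second turn carries previous_response_id plus only the incremental input, so server-side conversation state is reused instead of resent. `make_turn` is a hypothetical helper and the response id is illustrative:

```python
import json

# Illustrative helper: build a response.create payload for one turn.
def make_turn(model, text, previous_response_id=None):
    payload = {
        "type": "response.create",
        "model": model,
        "input": [{
            "type": "message",
            "role": "user",
            "content": [{"type": "input_text", "text": text}],
        }],
    }
    if previous_response_id is not None:
        # Continue the prior conversation; only the new input is sent.
        payload["previous_response_id"] = previous_response_id
    return json.dumps(payload)

first = make_turn("gpt-5.2", "What is websocket mode?")
# ...after response.completed arrives with id "resp_abc" (illustrative):
follow_up = make_turn("gpt-5.2", "Shorter, please.",
                      previous_response_id="resp_abc")
```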


Unified WebSocket Architecture

This support is implemented through a unified framework (not endpoint-specific patches):

  1. ws_framework: session lifecycle, relay, timeout control, and error handling
  2. openai-responses-ws adapter: turn lifecycle for response.create, response-id binding, usage finalize
  3. openai-realtime-ws adapter: realtime event relay and session usage tracking

The legacy /v1/realtime handling has also been migrated into the same framework to reduce branching and maintenance cost.
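
The division of labor can be sketched roughly as follows; every name here is illustrative, not XAI Router's actual internals. The shared loop owns the relay and termination logic, while each adapter only interprets its own event types:

```python
# Illustrative sketch of the unified-framework split: one shared session
# loop, with per-endpoint adapters deciding when a turn/session is done.
class ResponsesAdapter:
    TERMINAL = {"response.completed", "response.failed", "response.incomplete"}

    def on_event(self, event):
        # True ends the current turn
        return event.get("type") in self.TERMINAL

class RealtimeAdapter:
    def on_event(self, event):
        return event.get("type") == "session.closed"

def run_session(events, adapter):
    """Shared relay loop: feed events to the adapter until it signals done."""
    handled = 0
    for ev in events:
        handled += 1
        if adapter.on_event(ev):
            break
    return handled
```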

XAI Router OpenAI WebSocket Alignment Diagram

This diagram reflects the unified WS design: preserve OpenAI behavior while converging Responses and Realtime into one session/relay framework.


Minimal Responses WebSocket Example

The following example opens a connection via XAI Router and creates one gpt-5.2 response:

from websocket import create_connection  # pip install websocket-client
import json
import os

# Open a persistent connection to the Responses WebSocket endpoint.
ws = create_connection(
    "wss://api.xairouter.com/v1/responses",
    header=[
        f"Authorization: Bearer {os.environ['XAI_API_KEY']}",
    ],
)

# Start one turn. store=False asks the server not to persist the response.
ws.send(json.dumps({
    "type": "response.create",
    "model": "gpt-5.2",
    "store": False,
    "input": [
        {
            "type": "message",
            "role": "user",
            "content": [{"type": "input_text", "text": "Summarize websocket mode in one sentence."}]
        }
    ],
    "tools": []
}))

# Drain events until the turn reaches a terminal state.
while True:
    event = json.loads(ws.recv())
    print(event.get("type"))
    if event.get("type") in ("response.completed", "response.failed", "response.incomplete"):
        break

ws.close()

Codex CLI Config (You Must Enable Both Switches)

If you use Codex CLI with XAI Router, make sure both settings are enabled:

  1. Provider-level WebSocket capability: supports_websockets = true
  2. Client feature flag for Responses WS v2: responses_websockets_v2 = true

You can use the following full config as a reference:

model_provider = "xai"
model = "gpt-5.3-codex"
model_reasoning_effort = "xhigh"
plan_mode_reasoning_effort = "xhigh"
model_reasoning_summary = "detailed"
model_verbosity = "high"
approval_policy = "never"
sandbox_mode = "danger-full-access"

[model_providers.xai]
name = "xai"
base_url = "https://api.xaicontrol.com"
wire_api = "responses"
requires_openai_auth = false
env_key = "XAI_API_KEY"
supports_websockets = true

[features]
responses_websockets_v2 = true

Notes:

  1. Enabling responses_websockets_v2 = true without supports_websockets = true does not fully activate WS behavior
  2. Restart your Codex session after updating the config
  3. api.xairouter.com and api.xaicontrol.com both support Responses WebSocket v2

Performance and Stability Notes

Without changing external behavior, the implementation includes practical optimizations:

  1. Lightweight event-type prefilter before full JSON unmarshal on hot paths
  2. Shared relay framework for Responses and Realtime to reduce duplicated logic
  3. Cleaner connection-error handling with reduced log noise for expected disconnect patterns
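
The prefilter in point 1 can be sketched as follows. This is illustrative only; the actual event-type set and matching strategy are internal. The idea is to scan the raw frame bytes for hot event types before paying for a full JSON decode:

```python
import json

# Illustrative prefilter: high-volume delta frames are relayed verbatim
# without a JSON unmarshal; everything else is fully decoded.
HOT_TYPES = (b'"type":"response.output_text.delta"',)

def relay_or_parse(raw: bytes):
    """Return None for pass-through frames; parse only the rest."""
    compact = raw.replace(b" ", b"")
    if any(t in compact for t in HOT_TYPES):
        return None          # relay as-is, no decode on the hot path
    return json.loads(raw)   # control/terminal events get fully decoded
```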

Result: better maintainability and stable WS behavior while preserving existing HTTP behavior.


Conclusion

If your workload relies on long-lived, low-latency, multi-turn interaction, OpenAI WebSocket mode can be significantly better than rebuilding context on each HTTP request.

XAI Router's goal is straightforward: keep OpenAI semantics intact while adding production-grade control for routing, limits, policy, and accounting.


References