Generate Images with GPT-5.5 and GPT Image 2

Posted April 25, 2026 by XAI Tech Team - 9 min read

GPT-5.5 is not only useful for text, code, and complex tool use. It can also act as the orchestration model in a multimodal workflow: gpt-5.5 understands the user's intent, then calls the image_generation tool to generate an image with a GPT Image model.

On XAI Router, this can be done with the OpenAI-style Responses API. The core combination is:

  • Main model: gpt-5.5
  • Tool: image_generation
  • Image model: gpt-image-2
  • Base URL: https://api.xairouter.com
  • API key environment variable: XAI_API_KEY

The model ID is gpt-image-2, not gpt-img-2.

This guide follows the structure of the official OpenAI image generation tool examples and adapts the request URL, authentication environment variable, and model selection for XAI Router. You can think of the migration as a small mapping:

| Layer | OpenAI official setup | XAI Router setup |
| --- | --- | --- |
| API base URL | https://api.openai.com/v1 | https://api.xairouter.com/v1 |
| API key | OPENAI_API_KEY | XAI_API_KEY |
| Main model | gpt-5.5 | gpt-5.5 |
| Image tool | image_generation | image_generation |
| Image model | a GPT Image model, such as gpt-image-2 | gpt-image-2 |

The important OpenAI API concepts are: image_generation is a built-in Responses API tool; the tool call result contains a base64-encoded image; gpt-5.5 supports this tool; and the actual image generation is performed by a GPT Image model such as gpt-image-2. When moving to XAI Router, you usually do not need to rewrite your application logic: change the base URL and the API key environment variable, and the rest of the request stays the same.


XAI Router Tested Capabilities

The results below are based on live tests against https://api.xairouter.com on April 25, 2026. API behavior can evolve, so production systems should still keep timeouts, retries, and failure logs.

| Capability | Test result | Recommendation |
| --- | --- | --- |
| Query gpt-5.5 and gpt-image-2 from /v1/models | Successful; both models are listed | Useful as a startup probe |
| Text call with gpt-5.5 through /v1/responses | Successful; status=completed | Good baseline connectivity test |
| /v1/responses + image_generation + gpt-image-2 + stream:true | Successful; returned response.completed and base64 image data | Recommended path |
| tool_choice: { type: "image_generation" } | Successful; forced the image tool call | Good for fixed "Generate image" buttons |
| partial_images | Successful, but a request for 2 partials may return only 1 | Do not assume a fixed partial count in the UI |
| quality:"high" + output_format:"png" | Successful | Useful for final-quality assets |
| Non-streaming image generation through /v1/responses | Successful in this test; returned a full image | Usable, but streaming is still preferred |

Given the current XAI Router behavior, the most reliable production path is: Responses API + stream:true + image_generation tool + gpt-image-2.
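The startup probe suggested in the table can be sketched in Python. The membership check is a pure function; the HTTP call (shown with the requests library, and assuming the usual `{"data": [{"id": ...}]}` response shape of /v1/models) lives in a separate function you call once at boot.

```python
import os

REQUIRED_MODELS = {"gpt-5.5", "gpt-image-2"}

def missing_models(listed_ids, required=REQUIRED_MODELS):
    """Return the required model IDs absent from the /v1/models listing."""
    return sorted(set(required) - set(listed_ids))

def probe(base_url="https://api.xairouter.com/v1"):
    """Boot-time probe; raises if either model is missing on this route."""
    import requests  # any HTTP client works; requests is assumed here

    resp = requests.get(
        f"{base_url}/models",
        headers={"Authorization": f"Bearer {os.environ['XAI_API_KEY']}"},
        timeout=30,
    )
    resp.raise_for_status()
    listed = [m["id"] for m in resp.json()["data"]]
    missing = missing_models(listed)
    if missing:
        raise RuntimeError(f"models not available on this route: {missing}")
```

Running `probe()` at service startup turns a misconfigured route into a loud failure instead of a silent one at the first user request.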


Minimal Request Body

If you only want to verify the API path, start with a small request body:

{
  "model": "gpt-5.5",
  "input": "Generate an elegant image of a glass AI studio with soft light.",
  "tools": [
    {
      "type": "image_generation",
      "model": "gpt-image-2",
      "size": "1024x1024"
    }
  ],
  "stream": true
}

Here, model: "gpt-5.5" is the main Responses API model. The image_generation tool handles the image generation step, and its model field selects gpt-image-2.

In production, we recommend keeping stream: true. The streamed response gives you progress events and the final image result in one connection, which makes it straightforward to extract base64 and save the image.
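Concretely, "extract base64" means: once the streamed payloads are parsed into dicts, the final image is the result field of the completed image_generation_call item. A minimal sketch, assuming the event shapes listed later in this guide:

```python
def final_image_b64(events):
    """Return the base64 payload of the last completed image_generation_call,
    or None if the stream contained no image result."""
    result = None
    for event in events:
        if event.get("type") == "response.output_item.done":
            item = event.get("item") or {}
            if item.get("type") == "image_generation_call" and item.get("result"):
                result = item["result"]
    return result
```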


Adapt the Official OpenAI Example

The official OpenAI JavaScript example is conceptually like this:

import OpenAI from "openai";

const openai = new OpenAI();

const response = await openai.responses.create({
  model: "gpt-5.5",
  input: "Generate an image of a premium AI workspace",
  tools: [{ type: "image_generation" }],
});

To run it through XAI Router, change two things:

  1. Read the API key from process.env.XAI_API_KEY.
  2. Set baseURL to https://api.xairouter.com/v1.

If you also want to explicitly use gpt-image-2, set it inside the image_generation tool:

import fs from "node:fs";
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.XAI_API_KEY,
  baseURL: "https://api.xairouter.com/v1",
});

const response = await client.responses.create({
  model: "gpt-5.5",
  input: "Generate an elegant image of a glass AI studio with soft light.",
  tools: [
    {
      type: "image_generation",
      model: "gpt-image-2",
      size: "1024x1024",
    },
  ],
});

const imageData = response.output
  .filter((output) => output.type === "image_generation_call")
  .map((output) => output.result);

if (imageData.length > 0) {
  fs.writeFileSync("xai-image.png", Buffer.from(imageData[0], "base64"));
}

This is the closest version to the official documentation flow. It works well for normal synchronous calls. If image generation takes longer, use the streaming version below.


cURL: Generate and Save a PNG

Set your API key first:

export XAI_API_KEY="your XAI API key"

The script below calls gpt-5.5, lets it use the image_generation tool with gpt-image-2, and decodes the final base64 result into xai-generated-image.png.

out="xai-generated-image.png"

prompt='Create an elegant technical cover image: a refined glass AI studio, a luminous prompt console, and a generated image appearing as a softly glowing framed visual. No words, no logos, no watermark.'

body=$(jq -nc --arg prompt "$prompt" '{
  model: "gpt-5.5",
  input: $prompt,
  tools: [
    {
      type: "image_generation",
      model: "gpt-image-2",
      size: "1024x1024"
    }
  ],
  stream: true
}')

sse=$(mktemp)
b64=$(mktemp)
trap 'rm -f "$sse" "$b64"' EXIT

curl -sS -N --max-time 300 "https://api.xairouter.com/v1/responses" \
  -H "Authorization: Bearer $XAI_API_KEY" \
  -H "Content-Type: application/json" \
  --data-binary "$body" > "$sse"

# Keep only SSE "data:" payloads and drop the [DONE] sentinel
awk '/^data: /{
  data=$0
  sub(/^data: /, "", data)
  if (data != "[DONE]") print data
}' "$sse" |
# Pull any base64 result field out of each event payload
while IFS= read -r json; do
  jq -r '(.item.result? // .result? // empty)' 2>/dev/null <<< "$json"
done |
# Keep the longest candidate: the final full image, not a partial
awk 'length($0) > max {max=length($0); best=$0} END {if (max > 0) print best}' > "$b64"

if [ ! -s "$b64" ]; then
  echo "No image result found."
  exit 1
fi

base64 -d "$b64" > "$out"  # on older macOS, use: base64 -D
file "$out"

On success, you should see output like this:

xai-generated-image.png: PNG image data, 1024 x 1024, 8-bit/color RGB, non-interlaced

This script does three things:

  1. Uses jq to build the JSON request body, which avoids shell quoting issues with long prompts.
  2. Uses curl -N to receive the Server-Sent Events stream.
  3. Extracts the base64 result from image_generation_call.result and decodes it into a PNG.

If you want to print progress, also print the event: lines while parsing SSE. Common events include:

response.created
response.in_progress
response.output_item.added
response.image_generation_call.generating
response.output_item.done
response.completed
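The `data:`-line filtering that the cURL script does with awk looks like this as a Python helper. This is a sketch; it assumes one JSON payload per `data:` line, which is how the Responses SSE stream is framed.

```python
import json

def iter_sse_json(lines):
    """Yield parsed JSON payloads from raw SSE lines.

    Only `data:` lines are considered; the `[DONE]` sentinel and
    `event:`/comment lines are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            continue
        yield json.loads(payload)
```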

Node.js Example

If you use the OpenAI SDK in a Node.js project, point baseURL at XAI Router:

import fs from "node:fs";
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.XAI_API_KEY,
  baseURL: "https://api.xairouter.com/v1",
});

const stream = await client.responses.create({
  model: "gpt-5.5",
  input:
    "Create an elegant technical cover image: a refined glass AI studio, a luminous prompt console, and a generated image appearing as a softly glowing framed visual. No words.",
  tools: [
    {
      type: "image_generation",
      model: "gpt-image-2",
      size: "1024x1024",
    },
  ],
  stream: true,
});

let imageBase64 = "";

for await (const event of stream) {
  if (event.type === "response.output_item.done") {
    const item = event.item;
    if (item?.type === "image_generation_call" && item.result) {
      imageBase64 = item.result;
    }
  }
}

if (!imageBase64) {
  throw new Error("No image result returned");
}

fs.writeFileSync("xai-generated-image.png", Buffer.from(imageBase64, "base64"));

The key event is response.output_item.done. When item.type is image_generation_call, item.result is usually the final base64 image content.
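Because item.result is only "usually" the final base64 image, validating it before writing to disk is cheap insurance. A small check, assuming you requested png output as in the examples here:

```python
import base64
import binascii

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"

def decode_png(b64_str):
    """Decode a base64 string and verify the PNG signature.

    Returns the raw image bytes, or raises ValueError if the string
    is not valid base64 or does not decode to a PNG.
    """
    try:
        raw = base64.b64decode(b64_str, validate=True)
    except (binascii.Error, ValueError) as exc:
        raise ValueError(f"not valid base64: {exc}") from exc
    if not raw.startswith(PNG_MAGIC):
        raise ValueError("decoded data is not a PNG")
    return raw
```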


Python Example

The Python version is the same idea: point the client to XAI Router.

import base64
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["XAI_API_KEY"],
    base_url="https://api.xairouter.com/v1",
)

response = client.responses.create(
    model="gpt-5.5",
    input="Generate an elegant image of a glass AI studio with soft light.",
    tools=[
        {
            "type": "image_generation",
            "model": "gpt-image-2",
            "size": "1024x1024",
        }
    ],
)

image_data = [
    output.result
    for output in response.output
    if output.type == "image_generation_call"
]

if image_data:
    with open("xai-generated-image.png", "wb") as f:
        f.write(base64.b64decode(image_data[0]))

For a web service, replace local file writing with an upload to object storage such as S3, R2, OSS, or your own CDN. Store only the image URL, prompt, model, size, and generation status in your database. Avoid writing large base64 payloads directly into business tables.
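A sketch of that split between object storage and the database. The key layout and record fields are illustrative choices, not an API requirement, and the actual upload call (boto3, an OSS SDK, or similar) is omitted.

```python
import hashlib
from datetime import datetime, timezone

def object_key(prompt, model):
    """Deterministic storage key derived from prompt and model (illustrative layout)."""
    digest = hashlib.sha256(f"{model}:{prompt}".encode()).hexdigest()[:16]
    return f"generated/{model}/{digest}.png"

def build_image_record(prompt, model, size, url, status="completed"):
    """The row worth persisting: the URL and metadata, never the base64 payload."""
    return {
        "url": url,
        "prompt": prompt,
        "model": model,
        "size": size,
        "status": status,
        "created_at": datetime.now(timezone.utc).isoformat(),
    }
```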


Force the Image Tool

By default, the main model decides whether to call the tool based on the user's input. Most requests like "generate an image" will trigger image_generation, but if your product button is explicitly "Generate image", you can force the tool call with tool_choice:

{
  "model": "gpt-5.5",
  "input": "Draw an elegant AI product cover image.",
  "tools": [
    {
      "type": "image_generation",
      "model": "gpt-image-2",
      "size": "1024x1024"
    }
  ],
  "tool_choice": {
    "type": "image_generation"
  }
}

This is useful for background jobs, batch generation, and fixed UI actions. In open-ended chat, you can leave it out and let the model decide when an image is needed.


Common Tool Options

Besides model, the image_generation tool can accept output options. Actual support depends on the current model and XAI Router behavior, but you can structure the request in the OpenAI-style shape:

{
  "type": "image_generation",
  "model": "gpt-image-2",
  "size": "1024x1024",
  "quality": "high",
  "output_format": "png"
}

Common options:

| Parameter | Purpose | Recommendation |
| --- | --- | --- |
| size | Output dimensions | Start with 1024x1024 for avatars and covers; use portrait sizes for vertical assets |
| quality | Rendering quality | Use low or medium for previews, high for final assets |
| output_format | File format | Use png for lossless post-processing; consider webp for large web images |
| output_compression | Compression level | Set it for JPEG/WebP workflows |
| background | Background behavior | Avoid transparent background requests with gpt-image-2 for now |
| action | Generate or edit | Use generate for new images; keep auto for multi-turn context |

If you need transparent images, a practical workflow is to generate the subject on a clean solid background and remove it in post-processing. Only enable native transparency after confirming that the current model and route support it.
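That post-processing step can be as simple as mapping near-background pixels to transparent. The sketch below operates on plain RGB tuples so it stays dependency-free; in practice you would apply the same rule to a Pillow image's pixel data. The tolerance value is an assumption to tune per image.

```python
def knock_out_background(pixels, bg=(255, 255, 255), tolerance=12):
    """Map RGB pixels to RGBA, turning pixels within `tolerance` of the
    background colour fully transparent and leaving the rest opaque."""
    out = []
    for r, g, b in pixels:
        near_bg = all(abs(c - t) <= tolerance for c, t in zip((r, g, b), bg))
        out.append((r, g, b, 0 if near_bg else 255))
    return out
```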


Streaming Partial Images

The OpenAI examples show that image generation can stream partial images before the final result. When XAI Router compatibility is available, add partial_images to the tool:

const stream = await client.responses.create({
  model: "gpt-5.5",
  input: "Draw an elegant AI studio with a generated image panel.",
  stream: true,
  tools: [
    {
      type: "image_generation",
      model: "gpt-image-2",
      size: "1024x1024",
      partial_images: 2,
    },
  ],
});

for await (const event of stream) {
  if (event.type === "response.image_generation_call.partial_image") {
    const imageBuffer = Buffer.from(event.partial_image_b64, "base64");
    fs.writeFileSync(`partial-${event.partial_image_index}.png`, imageBuffer);
  }

  if (event.type === "response.output_item.done") {
    const item = event.item;
    if (item?.type === "image_generation_call" && item.result) {
      fs.writeFileSync("final.png", Buffer.from(item.result, "base64"));
    }
  }
}

In a product UI, show the partial image first, then replace it with the final image. This reduces perceived latency and works well for image generation pages, creative tools, and chat-based design assistants.


Why Use Streaming

Image generation usually takes longer than text generation. Although non-streaming Responses image generation returned a complete image in the live test, stream: true is more direct for scripts and backend services:

  1. You can observe progress events such as response.image_generation_call.generating.
  2. You can receive the final image_generation_call in the same connection.
  3. You do not need extra polling, task state management, or timeout recovery for a basic flow.

For a quick test, start with a short prompt and a 1024x1024 image. After the path is stable, add more detailed visual direction, brand constraints, and style requirements.


Prompting Tips

Image prompts do not need to be very long, but they should clearly define four things:

  • Subject: what to generate, such as a technical cover, product image, or avatar.
  • Composition: centered, waist-up, top-down, negative space, banner, or square.
  • Style: photorealistic, semi-realistic, illustration, product render, editorial.
  • Avoid list: no watermark, no text, no distorted hands, no low-quality artifacts.

Example:

Create an elegant technical cover image for an article about GPT-5.5 calling GPT Image 2 through an API router.
Show a refined glass AI studio, a luminous prompt console, and a generated image appearing as a softly glowing framed visual.
Square 1024x1024 composition, premium editorial look, graphite, ivory, soft teal and silver accents.
No words, no logos, no watermark, no clutter.

If you need accurate text inside the final image, be careful. Image models can generate text, but production typography is usually more reliable when handled by the frontend, a design tool, Canvas, or a post-processing script.


Product Patterns

This model-tool combination fits many common product features:

| Scenario | Typical input | Output |
| --- | --- | --- |
| Blog cover generation | Article title, summary, style | Cover image |
| E-commerce assets | Product name, selling points, background preference | Product scene image |
| Character avatars | Persona, profession, clothing, expression | Avatar or character card |
| Ad creative | Campaign theme, brand colors, forbidden elements | Visual draft variants |
| Design assistant | Natural language user request | Image asset that can be saved and reused |

A reliable backend flow usually looks like this:

  1. Receive the user's input and visual constraints.
  2. Use gpt-5.5 to organize or enrich the image prompt.
  3. Call image_generation with gpt-image-2.
  4. Decode the base64 result into an image file.
  5. Upload it to object storage or a CDN.
  6. Return the image URL, model, size, prompt, and generation timestamp.

This is safer than putting generation logic directly in the browser. The API key stays private, timeouts are easier to manage, and failures can be logged and retried.
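The middle steps of that flow can be wired together with injected generate and upload callables, which keeps the flow testable with stubs and keeps the API key on the server. All names here are illustrative.

```python
import base64

def run_generation_job(user_input, generate, upload):
    """Steps 3-6 of the flow above: generate -> decode -> upload -> record.

    `generate(prompt)` must return a base64 image string (empty on failure);
    `upload(image_bytes)` must return a public URL. Both are injected so the
    flow can be exercised with stubs in tests.
    """
    image_b64 = generate(user_input)
    if not image_b64:
        return {"status": "failed", "prompt": user_input}
    image_bytes = base64.b64decode(image_b64)
    url = upload(image_bytes)
    return {
        "status": "completed",
        "prompt": user_input,
        "model": "gpt-image-2",
        "size": "1024x1024",
        "url": url,
    }
```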


FAQ

Can I put gpt-image-2 in the Responses API model field?

No. The Responses API model field should be a text-capable mainline model such as gpt-5.5. gpt-image-2 is an image model. Put it inside the image_generation tool configuration.

What if I need Chinese or English text inside the image?

Separate the text from the image when accuracy matters. Let the image model generate a clean background or main visual, then use frontend layout, Canvas, a design tool, or a post-processing script to place the final text. This gives you better control over typography, brand fonts, and responsive layouts.


Summary

To generate images through XAI Router with GPT-5.5, use the Responses API:

gpt-5.5 -> image_generation tool -> gpt-image-2 -> base64 image result

This pattern is useful when you want one workflow to understand the request, refine the prompt, choose the right tool, and generate the image. In an application, the frontend can submit a natural language request, the backend can let gpt-5.5 orchestrate the tool call, and the returned gpt-image-2 image can be saved to object storage, a CDN, or a local file.
