Generate Images with GPT-5.5 and GPT Image 2

Posted April 25, 2026 by XAI Tech Team - 9 min read

GPT-5.5 is not only useful for text, code, and complex tool use. It can also act as the orchestration model in a multimodal workflow: gpt-5.5 understands the user's intent, then calls the image_generation tool to generate an image with a GPT Image model.

On XAI Router, this can be done with the OpenAI-style Responses API. The core combination is:

  • Main model: gpt-5.5
  • Tool: image_generation
  • Image model: gpt-image-2
  • Base URL: https://api.xairouter.com
  • API key environment variable: XAI_API_KEY

The model ID is gpt-image-2, not gpt-img-2.

This guide follows the structure of the official OpenAI image generation tool examples and adapts the request URL, authentication environment variable, and model selection for XAI Router. You can think of the migration as a small mapping:

| Layer | OpenAI official setup | XAI Router setup |
| --- | --- | --- |
| API base URL | https://api.openai.com/v1 | https://api.xairouter.com/v1 |
| API key | OPENAI_API_KEY | XAI_API_KEY |
| Main model | gpt-5.5 | gpt-5.5 |
| Image tool | image_generation | image_generation |
| Image model | a GPT Image model, such as gpt-image-2 | gpt-image-2 |

The important OpenAI API concepts are: image_generation is a built-in Responses API tool; the tool call result contains a base64-encoded image; gpt-5.5 supports this tool; and the actual image generation is performed by a GPT Image model such as gpt-image-2. When moving to XAI Router, you usually do not need to rewrite your application logic: change the base URL and the API key environment variable, and the rest of the request stays the same.


XAI Router Tested Capabilities

The results below are based on live tests against https://api.xairouter.com on April 25, 2026. API behavior can evolve, so production systems should still keep timeouts, retries, and failure logs.

| Capability | Test result | Recommendation |
| --- | --- | --- |
| Query gpt-5.5 and gpt-image-2 from /v1/models | Successful; both models are listed | Useful as a startup probe |
| Text call with gpt-5.5 through /v1/responses | Successful; status=completed | Good baseline connectivity test |
| /v1/responses + image_generation + gpt-image-2 + stream:true | Successful; returned response.completed and base64 image data | Recommended path |
| tool_choice: { type: "image_generation" } | Successful; forced the image tool call | Good for fixed "Generate image" buttons |
| partial_images | Successful, but a request for 2 partials may return only 1 | Do not assume a fixed partial count in the UI |
| quality:"high" + output_format:"png" | Successful | Useful for final-quality assets |
| Non-streaming image generation through /v1/responses | Successful in this test; returned a full image | Usable, but streaming is still preferred |

Given the current XAI Router behavior, the most reliable production path is: Responses API + stream:true + image_generation tool + gpt-image-2.
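The startup probe suggested in the table can be sketched in Python. The membership check is a pure function; the HTTP call (shown with the requests library, and assuming the usual `{"data": [{"id": ...}]}` response shape of /v1/models) lives in a separate function you call once at boot.

```python
import os

REQUIRED_MODELS = {"gpt-5.5", "gpt-image-2"}

def missing_models(listed_ids, required=REQUIRED_MODELS):
    """Return the required model IDs absent from the /v1/models listing."""
    return sorted(set(required) - set(listed_ids))

def probe(base_url="https://api.xairouter.com/v1"):
    """Boot-time probe; raises if either model is missing on this route."""
    import requests  # any HTTP client works; requests is assumed here

    resp = requests.get(
        f"{base_url}/models",
        headers={"Authorization": f"Bearer {os.environ['XAI_API_KEY']}"},
        timeout=30,
    )
    resp.raise_for_status()
    listed = [m["id"] for m in resp.json()["data"]]
    missing = missing_models(listed)
    if missing:
        raise RuntimeError(f"models not available on this route: {missing}")
```

Running `probe()` at service startup turns a misconfigured route into a loud failure instead of a silent one at the first user request.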


Minimal Request Body

If you only want to verify the API path, start with a small request body:

{
  "model": "gpt-5.5",
  "input": "Generate an elegant image of a glass AI studio with soft light.",
  "tools": [
    {
      "type": "image_generation",
      "model": "gpt-image-2",
      "size": "1024x1024"
    }
  ],
  "stream": true
}

Here, model: "gpt-5.5" is the main Responses API model. The image_generation tool handles the image generation step, and its model field selects gpt-image-2.

In production, we recommend keeping stream: true. The streamed response gives you progress events and the final image result in one connection, which makes it straightforward to extract base64 and save the image.
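Concretely, "extract base64" means: once the streamed payloads are parsed into dicts, the final image is the result field of the completed image_generation_call item. A minimal sketch, assuming the event shapes listed later in this guide:

```python
def final_image_b64(events):
    """Return the base64 payload of the last completed image_generation_call,
    or None if the stream contained no image result."""
    result = None
    for event in events:
        if event.get("type") == "response.output_item.done":
            item = event.get("item") or {}
            if item.get("type") == "image_generation_call" and item.get("result"):
                result = item["result"]
    return result
```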


Adapt the Official OpenAI Example

The official OpenAI JavaScript example is conceptually like this:

import OpenAI from "openai";

const openai = new OpenAI();

const response = await openai.responses.create({
  model: "gpt-5.5",
  input: "Generate an image of a premium AI workspace",
  tools: [{ type: "image_generation" }],
});

To run it through XAI Router, change two things:

  1. Read the API key from process.env.XAI_API_KEY.
  2. Set baseURL to https://api.xairouter.com/v1.

If you also want to explicitly use gpt-image-2, set it inside the image_generation tool:

import fs from "node:fs";
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.XAI_API_KEY,
  baseURL: "https://api.xairouter.com/v1",
});

const response = await client.responses.create({
  model: "gpt-5.5",
  input: "Generate an elegant image of a glass AI studio with soft light.",
  tools: [
    {
      type: "image_generation",
      model: "gpt-image-2",
      size: "1024x1024",
    },
  ],
});

const imageData = response.output
  .filter((output) => output.type === "image_generation_call")
  .map((output) => output.result);

if (imageData.length > 0) {
  fs.writeFileSync("xai-image.png", Buffer.from(imageData[0], "base64"));
}

This is the closest version to the official documentation flow. It works well for normal synchronous calls. If image generation takes longer, use the streaming version below.


cURL: Generate and Save a PNG

Set your API key first:

export XAI_API_KEY="your XAI API key"

The script below calls gpt-5.5, lets it use the image_generation tool with gpt-image-2, and decodes the final base64 result into xai-generated-image.png.

out="xai-generated-image.png"

prompt='Create an elegant technical cover image: a refined glass AI studio, a luminous prompt console, and a generated image appearing as a softly glowing framed visual. No words, no logos, no watermark.'

body=$(jq -nc --arg prompt "$prompt" '{
  model: "gpt-5.5",
  input: $prompt,
  tools: [
    {
      type: "image_generation",
      model: "gpt-image-2",
      size: "1024x1024"
    }
  ],
  stream: true
}')

sse=$(mktemp)
b64=$(mktemp)
trap 'rm -f "$sse" "$b64"' EXIT

curl -sS -N --max-time 300 "https://api.xairouter.com/v1/responses" \
  -H "Authorization: Bearer $XAI_API_KEY" \
  -H "Content-Type: application/json" \
  --data-binary "$body" > "$sse"

# Keep only SSE "data:" payloads and drop the [DONE] sentinel
awk '/^data: /{
  data=$0
  sub(/^data: /, "", data)
  if (data != "[DONE]") print data
}' "$sse" |
# Pull any base64 result field out of each event payload
while IFS= read -r json; do
  jq -r '(.item.result? // .result? // empty)' 2>/dev/null <<< "$json"
done |
# Keep the longest candidate: the final full image, not a partial
awk 'length($0) > max {max=length($0); best=$0} END {if (max > 0) print best}' > "$b64"

if [ ! -s "$b64" ]; then
  echo "No image result found."
  exit 1
fi

base64 -d "$b64" > "$out"  # on older macOS, use: base64 -D
file "$out"

On success, you should see output like this:

xai-generated-image.png: PNG image data, 1024 x 1024, 8-bit/color RGB, non-interlaced

This script does three things:

  1. Uses jq to build the JSON request body, which avoids shell quoting issues with long prompts.
  2. Uses curl -N to receive the Server-Sent Events stream.
  3. Extracts the base64 result from image_generation_call.result and decodes it into a PNG.

If you want to print progress, also print the event: lines while parsing SSE. Common events include:

response.created
response.in_progress
response.output_item.added
response.image_generation_call.generating
response.output_item.done
response.completed
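The `data:`-line filtering that the cURL script does with awk looks like this as a Python helper. This is a sketch; it assumes one JSON payload per `data:` line, which is how the Responses SSE stream is framed.

```python
import json

def iter_sse_json(lines):
    """Yield parsed JSON payloads from raw SSE lines.

    Only `data:` lines are considered; the `[DONE]` sentinel and
    `event:`/comment lines are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            continue
        yield json.loads(payload)
```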

Node.js Example

If you use the OpenAI SDK in a Node.js project, point baseURL at XAI Router:

import fs from "node:fs";
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.XAI_API_KEY,
  baseURL: "https://api.xairouter.com/v1",
});

const stream = await client.responses.create({
  model: "gpt-5.5",
  input:
    "Create an elegant technical cover image: a refined glass AI studio, a luminous prompt console, and a generated image appearing as a softly glowing framed visual. No words.",
  tools: [
    {
      type: "image_generation",
      model: "gpt-image-2",
      size: "1024x1024",
    },
  ],
  stream: true,
});

let imageBase64 = "";

for await (const event of stream) {
  if (event.type === "response.output_item.done") {
    const item = event.item;
    if (item?.type === "image_generation_call" && item.result) {
      imageBase64 = item.result;
    }
  }
}

if (!imageBase64) {
  throw new Error("No image result returned");
}

fs.writeFileSync("xai-generated-image.png", Buffer.from(imageBase64, "base64"));

The key event is response.output_item.done. When item.type is image_generation_call, item.result is usually the final base64 image content.
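Because item.result is only "usually" the final base64 image, validating it before writing to disk is cheap insurance. A small check, assuming you requested png output as in the examples here:

```python
import base64
import binascii

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"

def decode_png(b64_str):
    """Decode a base64 string and verify the PNG signature.

    Returns the raw image bytes, or raises ValueError if the string
    is not valid base64 or does not decode to a PNG.
    """
    try:
        raw = base64.b64decode(b64_str, validate=True)
    except (binascii.Error, ValueError) as exc:
        raise ValueError(f"not valid base64: {exc}") from exc
    if not raw.startswith(PNG_MAGIC):
        raise ValueError("decoded data is not a PNG")
    return raw
```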


Python Example

The Python version is the same idea: point the client to XAI Router.

import base64
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["XAI_API_KEY"],
    base_url="https://api.xairouter.com/v1",
)

response = client.responses.create(
    model="gpt-5.5",
    input="Generate an elegant image of a glass AI studio with soft light.",
    tools=[
        {
            "type": "image_generation",
            "model": "gpt-image-2",
            "size": "1024x1024",
        }
    ],
)

image_data = [
    output.result
    for output in response.output
    if output.type == "image_generation_call"
]

if image_data:
    with open("xai-generated-image.png", "wb") as f:
        f.write(base64.b64decode(image_data[0]))

For a web service, replace local file writing with an upload to object storage such as S3, R2, OSS, or your own CDN. Store only the image URL, prompt, model, size, and generation status in your database. Avoid writing large base64 payloads directly into business tables.
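A sketch of that split between object storage and the database. The key layout and record fields are illustrative choices, not an API requirement, and the actual upload call (boto3, an OSS SDK, or similar) is omitted.

```python
import hashlib
from datetime import datetime, timezone

def object_key(prompt, model):
    """Deterministic storage key derived from prompt and model (illustrative layout)."""
    digest = hashlib.sha256(f"{model}:{prompt}".encode()).hexdigest()[:16]
    return f"generated/{model}/{digest}.png"

def build_image_record(prompt, model, size, url, status="completed"):
    """The row worth persisting: the URL and metadata, never the base64 payload."""
    return {
        "url": url,
        "prompt": prompt,
        "model": model,
        "size": size,
        "status": status,
        "created_at": datetime.now(timezone.utc).isoformat(),
    }
```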


Force the Image Tool

By default, the main model decides whether to call the tool based on the user's input. Most requests like "generate an image" will trigger image_generation, but if your product button is explicitly "Generate image", you can force the tool call with tool_choice:

{
  "model": "gpt-5.5",
  "input": "Draw an elegant AI product cover image.",
  "tools": [
    {
      "type": "image_generation",
      "model": "gpt-image-2",
      "size": "1024x1024"
    }
  ],
  "tool_choice": {
    "type": "image_generation"
  }
}

This is useful for background jobs, batch generation, and fixed UI actions. In open-ended chat, you can leave it out and let the model decide when an image is needed.


Common Tool Options

Besides model, the image_generation tool can accept output options. Actual support depends on the current model and XAI Router behavior, but you can structure the request in the OpenAI-style shape:

{
  "type": "image_generation",
  "model": "gpt-image-2",
  "size": "1024x1024",
  "quality": "high",
  "output_format": "png"
}

Common options:

| Parameter | Purpose | Recommendation |
| --- | --- | --- |
| size | Output dimensions | Start with 1024x1024 for avatars and covers; use portrait sizes for vertical assets |
| quality | Rendering quality | Use low or medium for previews, high for final assets |
| output_format | File format | Use png for lossless post-processing; consider webp for large web images |
| output_compression | Compression level | Set it for JPEG/WebP workflows |
| background | Background behavior | Avoid transparent background requests with gpt-image-2 for now |
| action | Generate or edit | Use generate for new images; keep auto for multi-turn context |

If you need transparent images, a practical workflow is to generate the subject on a clean solid background and remove it in post-processing. Only enable native transparency after confirming that the current model and route support it.
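That post-processing step can be as simple as mapping near-background pixels to transparent. The sketch below operates on plain RGB tuples so it stays dependency-free; in practice you would apply the same rule to a Pillow image's pixel data. The tolerance value is an assumption to tune per image.

```python
def knock_out_background(pixels, bg=(255, 255, 255), tolerance=12):
    """Map RGB pixels to RGBA, turning pixels within `tolerance` of the
    background colour fully transparent and leaving the rest opaque."""
    out = []
    for r, g, b in pixels:
        near_bg = all(abs(c - t) <= tolerance for c, t in zip((r, g, b), bg))
        out.append((r, g, b, 0 if near_bg else 255))
    return out
```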


Streaming Partial Images

The OpenAI examples show that image generation can stream partial images before the final result. When XAI Router compatibility is available, add partial_images to the tool:

const stream = await client.responses.create({
  model: "gpt-5.5",
  input: "Draw an elegant AI studio with a generated image panel.",
  stream: true,
  tools: [
    {
      type: "image_generation",
      model: "gpt-image-2",
      size: "1024x1024",
      partial_images: 2,
    },
  ],
});

for await (const event of stream) {
  if (event.type === "response.image_generation_call.partial_image") {
    const imageBuffer = Buffer.from(event.partial_image_b64, "base64");
    fs.writeFileSync(`partial-${event.partial_image_index}.png`, imageBuffer);
  }

  if (event.type === "response.output_item.done") {
    const item = event.item;
    if (item?.type === "image_generation_call" && item.result) {
      fs.writeFileSync("final.png", Buffer.from(item.result, "base64"));
    }
  }
}

In a product UI, show the partial image first, then replace it with the final image. This reduces perceived latency and works well for image generation pages, creative tools, and chat-based design assistants.


Why Use Streaming

Image generation usually takes longer than text generation. Although non-streaming Responses image generation returned a complete image in the live test, stream: true is more direct for scripts and backend services:

  1. You can observe progress events such as response.image_generation_call.generating.
  2. You can receive the final image_generation_call in the same connection.
  3. You do not need extra polling, task state management, or timeout recovery for a basic flow.

For a quick test, start with a short prompt and a 1024x1024 image. After the path is stable, add more detailed visual direction, brand constraints, and style requirements.


Prompting Tips

Image prompts do not need to be very long, but they should clearly define four things:

  • Subject: what to generate, such as a technical cover, product image, or avatar.
  • Composition: centered, waist-up, top-down, negative space, banner, or square.
  • Style: photorealistic, semi-realistic, illustration, product render, editorial.
  • Avoid list: no watermark, no text, no distorted hands, no low-quality artifacts.

Example:

Create an elegant technical cover image for an article about GPT-5.5 calling GPT Image 2 through an API router.
Show a refined glass AI studio, a luminous prompt console, and a generated image appearing as a softly glowing framed visual.
Square 1024x1024 composition, premium editorial look, graphite, ivory, soft teal and silver accents.
No words, no logos, no watermark, no clutter.

If you need accurate text inside the final image, be careful. Image models can generate text, but production typography is usually more reliable when handled by the frontend, a design tool, Canvas, or a post-processing script.


Product Patterns

This model-tool combination fits many common product features:

| Scenario | Typical input | Output |
| --- | --- | --- |
| Blog cover generation | Article title, summary, style | Cover image |
| E-commerce assets | Product name, selling points, background preference | Product scene image |
| Character avatars | Persona, profession, clothing, expression | Avatar or character card |
| Ad creative | Campaign theme, brand colors, forbidden elements | Visual draft variants |
| Design assistant | Natural language user request | Image asset that can be saved and reused |

A reliable backend flow usually looks like this:

  1. Receive the user's input and visual constraints.
  2. Use gpt-5.5 to organize or enrich the image prompt.
  3. Call image_generation with gpt-image-2.
  4. Decode the base64 result into an image file.
  5. Upload it to object storage or a CDN.
  6. Return the image URL, model, size, prompt, and generation timestamp.

This is safer than putting generation logic directly in the browser. The API key stays private, timeouts are easier to manage, and failures can be logged and retried.
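The middle steps of that flow can be wired together with injected generate and upload callables, which keeps the flow testable with stubs and keeps the API key on the server. All names here are illustrative.

```python
import base64

def run_generation_job(user_input, generate, upload):
    """Steps 3-6 of the flow above: generate -> decode -> upload -> record.

    `generate(prompt)` must return a base64 image string (empty on failure);
    `upload(image_bytes)` must return a public URL. Both are injected so the
    flow can be exercised with stubs in tests.
    """
    image_b64 = generate(user_input)
    if not image_b64:
        return {"status": "failed", "prompt": user_input}
    image_bytes = base64.b64decode(image_b64)
    url = upload(image_bytes)
    return {
        "status": "completed",
        "prompt": user_input,
        "model": "gpt-image-2",
        "size": "1024x1024",
        "url": url,
    }
```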


FAQ

Can I put gpt-image-2 in the Responses API model field?

No. The Responses API model field should be a text-capable mainline model such as gpt-5.5. gpt-image-2 is an image model. Put it inside the image_generation tool configuration.

What if I need Chinese or English text inside the image?

Separate the text from the image when accuracy matters. Let the image model generate a clean background or main visual, then use frontend layout, Canvas, a design tool, or a post-processing script to place the final text. This gives you better control over typography, brand fonts, and responsive layouts.


Summary

To generate images through XAI Router with GPT-5.5, use the Responses API:

gpt-5.5 -> image_generation tool -> gpt-image-2 -> base64 image result

This pattern is useful when you want one workflow to understand the request, refine the prompt, choose the right tool, and generate the image. In an application, the frontend can submit a natural language request, the backend can let gpt-5.5 orchestrate the tool call, and the returned gpt-image-2 image can be saved to object storage, a CDN, or a local file.
