OpenAI Releases GPT-5.5, Calling It a 'New Class of Intelligence' — at Double the API Price
OpenAI has unveiled GPT-5.5, an agentic model built to handle complex, multi-step tasks autonomously across tools like code execution, web search, and data analysis. The model outperforms Claude Opus 4.7 and Gemini 3.1 Pro on key benchmarks — but API pricing doubles compared to its predecessor, forcing enterprises to weigh capability gains against cost.

D.O.T.S AI Newsroom
AI News Desk
OpenAI has announced GPT-5.5, describing it as "a new class of intelligence for real work and powering agents." Unlike previous GPT models that required careful prompt engineering to execute multi-step tasks, GPT-5.5 is built from the ground up for autonomous, agentic operation: it understands complex goals, selects and uses tools independently, checks its own output, and iterates until a task is complete — without requiring users to guide each step. The model is available immediately for paying ChatGPT and Codex users on Plus, Pro, Business, and Enterprise plans, with API access arriving shortly at roughly twice the cost of GPT-5.4.
Benchmark Position: Strong on Coding and Math
On Terminal-Bench 2.0, the agentic coding benchmark that measures performance on real-world software engineering workflows, GPT-5.5 scores 82.7 percent, a 7.6 percentage point improvement over GPT-5.4's 75.1 percent. More notably, it places 13.3 points ahead of Anthropic's Claude Opus 4.7 (69.4 percent) and 14.2 points ahead of Google's Gemini 3.1 Pro (68.5 percent) on the same benchmark. OpenAI is careful to note that GPT-5.5 does not lead on all benchmarks: the gains are concentrated in agentic coding, computer use, knowledge work, and early-stage scientific research, the domains the model was optimized for. Benchmarks that evaluate models on single-turn responses or narrowly defined tasks show a more mixed picture.
What 'Agentic' Actually Means in Practice
The practical distinction between GPT-5.4 and GPT-5.5 is less about raw intelligence than about how the model operates. GPT-5.5 is designed to handle tasks that require reasoning across multiple contexts, switching between tools mid-task, and maintaining coherent progress over extended time horizons. OpenAI's examples include writing and debugging multi-file codebases, conducting web research and synthesizing findings into reports, creating spreadsheets from raw data, and operating software interfaces. A GPT-5.5 Pro variant, described as an "iterative research partner," is also available for users who need extended back-and-forth engagement on complex analytical tasks. The split between base and Pro variants mirrors Anthropic's positioning of Claude Sonnet versus Claude Opus: a standard tier for most agentic workflows and a premium tier for tasks requiring maximum reasoning depth.
The Pricing Problem
The doubled API pricing is the detail that will generate the most enterprise debate. At approximately twice the cost of GPT-5.4, GPT-5.5 is economically viable only if it delivers enough productivity gains to offset the higher per-token spend. For agentic tasks, where a single request may involve thousands of tokens of tool calls, reasoning traces, and output, the cost differential compounds quickly. Organizations that have built cost models around GPT-5.4 API pricing will need to re-evaluate their infrastructure economics before upgrading. The counter-argument OpenAI is implicitly making is that an agent capable of completing tasks that previously required multiple model invocations or human intervention can produce net cost savings despite higher per-token pricing. At roughly double the price, that means finishing a task with about half the total tokens GPT-5.4 would consume, or replacing enough human intervention to cover the difference, and whether that happens will depend heavily on the specific workflow.
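The break-even arithmetic behind that argument can be sketched in a few lines of Python. This is a rough illustration, not a pricing model: the dollar figures, token counts, and call counts below are hypothetical placeholders rather than published OpenAI prices, the function names are invented for this example, and the sketch assumes a single blended per-token rate instead of separate input and output rates.

```python
def cost_per_task(tokens_per_call, calls_per_task, price_per_million_tokens):
    """Model spend for one completed task, in the same currency as the price."""
    return tokens_per_call * calls_per_task * price_per_million_tokens / 1_000_000


def break_even_token_ratio(price_ratio):
    """Share of the old model's token volume the new model can use at equal cost."""
    return 1 / price_ratio


# Hypothetical figures for illustration only; these are not published prices.
OLD_PRICE = 10.0           # assumed blended price per million tokens, older model
NEW_PRICE = 2 * OLD_PRICE  # "roughly twice the cost", as reported

# Assumed workflow: the older model needs five invocations of 8,000 tokens each,
# while the newer model finishes in one longer 12,000-token run.
old = cost_per_task(tokens_per_call=8_000, calls_per_task=5,
                    price_per_million_tokens=OLD_PRICE)
new = cost_per_task(tokens_per_call=12_000, calls_per_task=1,
                    price_per_million_tokens=NEW_PRICE)

print(f"older model: {old:.2f} per task")
print(f"newer model: {new:.2f} per task")
print(f"break-even token share at 2x price: {break_even_token_ratio(2.0):.0%}")
```

Under these assumed numbers the pricier model comes out cheaper per completed task because it uses well under half the tokens. Reverse the assumption, so that the new model needs the same five calls, and the bill simply doubles, which is why the outcome hinges on the workflow rather than on the price list.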