Business | April 23, 2026

The First Real Reviews of GPT-5.5 Are In. It Is Fast, Steady, and Not the King OpenAI Promised.

Twenty-four hours after launch, the verdict is in: GPT-5.5 is OpenAI's best workhorse model. It is not the best AI model overall, though the gap is closing.

Axel Reed

GPT-5.5 has been live for 24 hours and the reviews are arriving from the people who matter: the researchers, developers, and analysts who actually push these models to their limits. The consensus is forming fast, and it is more nuanced than OpenAI's launch hype suggested.

The short version: GPT-5.5 is very good. It is fast. It is steady. It is not the best AI model in the world.

The Bull Case: Ethan Mollick Says It Is a Big Deal

Wharton professor Ethan Mollick, who had early access, published one of the most detailed assessments. He gave every recent model the same coding challenge: build a procedurally generated 3D simulation of a harbor town evolving from 3000 BCE to 3000 CE. Only GPT-5.5 Pro actually modeled an evolving town rather than simply swapping in replacement buildings over time. It also completed the task in 20 minutes, down from 33 for GPT-5.4 Pro.

Mollick's verdict: "It indicates that we are not done with the rapid improvement in AI." He frames GPT-5.5 not just as a better model but as evidence that the pre-training paradigm still has runway and that the improvement curve has not flattened.

The Every.to Benchmark: 62.5 vs Opus 4.7's Low 30s

Every.to ran GPT-5.5 through its Senior Engineer Benchmark, which measures how well models rewrite messy code the way an experienced engineer would. GPT-5.5 with extra-high reasoning hit 62.5 on its best run. Opus 4.7 at similar reasoning levels landed in the low 30s. Human senior engineers score in the high 80s and low 90s.

That is a massive gap in GPT-5.5's favor on pure coding execution, roughly double Opus 4.7's score. But Every.to also found something curious: GPT-5.5 performed best when it executed a plan written by Opus 4.7. The implication is that Opus 4.7 still thinks better, even if GPT-5.5 executes faster. Brains versus hands.

Dan Shipper, Every.to's CEO, called GPT-5.5 his "new daily driver." But his review stopped short of calling it the best model, period. It is the best model for getting things done. That is a different claim, and potentially a more important one.

The Security Verdict: Cleanest Code an LLM Has Ever Produced

SonarSource ran GPT-5.5 through 4,444 coding tasks across 10 independent runs. The result: a vulnerability density of 75 per million lines of code, with a flat distribution across all severity levels. That means the model is not just avoiding the easy catches. It is producing some of the cleanest security profiles SonarSource has ever analyzed.

The catch: concurrency and threading bugs came in at 170 per million lines, substantially higher than any other bug category. These are the expensive bugs: hard to reproduce, environment-dependent, and intermittent. GPT-5.5 also comments only 2% of its output, leaving some 700,000 lines of code with virtually no explanation of what they do.
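To put those per-million densities in concrete terms, here is a minimal sketch that converts them into expected issue counts. The two densities are the figures quoted above; the 200,000-line codebase is a hypothetical size chosen for illustration, and the calculation assumes the densities scale linearly with code volume, which SonarSource's results do not guarantee.

```python
# Densities quoted in the article (issues per million lines of code).
VULNERABILITY_DENSITY = 75
CONCURRENCY_BUG_DENSITY = 170


def expected_issues(density_per_million: float, lines_of_code: int) -> float:
    """Scale a per-million-lines density to a codebase of a given size.

    Assumes (for illustration only) that issues scale linearly with size.
    """
    return density_per_million * lines_of_code / 1_000_000


# Hypothetical codebase size, not a figure from the article.
HYPOTHETICAL_LINES = 200_000

vulnerabilities = expected_issues(VULNERABILITY_DENSITY, HYPOTHETICAL_LINES)
concurrency_bugs = expected_issues(CONCURRENCY_BUG_DENSITY, HYPOTHETICAL_LINES)

print(f"Expected vulnerabilities: {vulnerabilities:.1f}")
print(f"Expected concurrency/threading bugs: {concurrency_bugs:.1f}")
```

Under that assumption, the concurrency figure works out to more than twice the vulnerability figure at any codebase size, since the ratio of the two densities is fixed.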

Where It Falls Short

Every.to noted GPT-5.5 can still be bland. It struggles with Ruby. It trails Opus 4.7 on PowerPoint presentations, spatial composition, and ambitious prototypes. Opus 4.7 writes better plans, has a superior eye for design, and catches product-level details that GPT-5.5 misses.

The new pre-training did not solve everything. OpenAI's image generation model is impressive, but that is a tool addition, not a model improvement. The 1-million-token context window matches but does not exceed what competitors already offer. And GPT-5.5 launches in ChatGPT and Codex first, with the API coming later. Enterprise customers who need API access are stuck waiting.

The Pricing Problem

GPT-5.5 costs $5 per million input tokens and $30 per million output tokens. GPT-5.5 Pro costs $30 and $180, respectively. For comparison, Opus 4.7 runs $5 per million input tokens and $25 per million output tokens. OpenAI's argument is that fewer retries and better reasoning lower the effective cost per completed task. That may be true for complex coding. It is harder to justify for everyday business use, where GPT-5.4 at half the price was already good enough.
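The "effective cost per completed task" argument can be sketched as simple arithmetic. The list prices below are the ones quoted above; the token counts and success rates are hypothetical, chosen only to illustrate the mechanism, and the model assumes each retry is independent and costs the same as the first attempt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPricing:
    name: str
    input_usd_per_million: float   # USD per million input tokens
    output_usd_per_million: float  # USD per million output tokens


# List prices quoted in the article.
GPT_5_5 = ModelPricing("GPT-5.5", 5.0, 30.0)
OPUS_4_7 = ModelPricing("Opus 4.7", 5.0, 25.0)


def cost_per_attempt(pricing: ModelPricing, input_tokens: int, output_tokens: int) -> float:
    """USD cost of a single attempt at a task."""
    return (
        input_tokens * pricing.input_usd_per_million
        + output_tokens * pricing.output_usd_per_million
    ) / 1_000_000


def cost_per_completed_task(
    pricing: ModelPricing, input_tokens: int, output_tokens: int, success_rate: float
) -> float:
    """Expected USD cost until one attempt succeeds.

    Assumes attempts are independent and each succeeds with probability
    `success_rate`, so the expected number of attempts is 1 / success_rate.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt(pricing, input_tokens, output_tokens) / success_rate


# Hypothetical workload and success rates, not figures from the article.
INPUT_TOKENS, OUTPUT_TOKENS = 20_000, 5_000
gpt_cost = cost_per_completed_task(GPT_5_5, INPUT_TOKENS, OUTPUT_TOKENS, 0.90)
opus_cost = cost_per_completed_task(OPUS_4_7, INPUT_TOKENS, OUTPUT_TOKENS, 0.75)

print(f"{GPT_5_5.name}: ${gpt_cost:.3f} per completed task")
print(f"{OPUS_4_7.name}: ${opus_cost:.3f} per completed task")
```

With these made-up inputs the pricier model comes out cheaper per completed task, but the result is sensitive to the assumed success rates: at equal success rates the ordering flips, and for this token mix GPT-5.5 would need a success rate roughly 11% higher than Opus 4.7's in relative terms just to break even.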

The Verdict

GPT-5.5 is OpenAI's best model and its clearest bid to win back the professional AI market that Anthropic has been capturing. It succeeds on execution speed and coding quality. It does not clearly win on reasoning depth, design sense, or prose quality. And on output tokens, it costs more than Opus 4.7.

The interesting signal is not whether GPT-5.5 is the best model. It is that "best" no longer exists as a single category. Different models win different tasks. The era of one model to rule them all is over. OpenAI built a workhorse. Anthropic built a strategist. Google built a platform. The question for enterprises is no longer which model is best. It is which models, plural, fit which workflows.

For OpenAI, that might be enough. For the "new class of intelligence" marketing? Not quite.


THE AI POST