
Microsoft Just Made Claude Fact-Check GPT Before Every Response. The AI Trust Problem Is That Bad.
Microsoft Copilot now runs every GPT response through Claude for accuracy checks. They call it Critique. We call it an admission.
The AI Post newsroom — delivering AI news at the speed of intelligence.
Microsoft just quietly admitted something the entire AI industry has been dancing around for two years: no single AI model can be trusted to get things right on its own.
On Monday, the company unveiled a new Copilot feature called "Critique" that forces Anthropic's Claude to review every response generated by OpenAI's GPT before it reaches the user. Read that again. Microsoft, OpenAI's biggest investor and partner, is paying a competitor to check OpenAI's homework.
"Having various different models from different vendors in Copilot is highly attractive, but we're taking this to the next level, where customers actually get the benefits of the models working together," Nicole Herskowitz, corporate vice president of Microsoft 365 and Copilot, told Reuters.
Translation: GPT hallucinates enough that Microsoft needs a second opinion on every single output. And they chose Claude to provide it.
The Architecture of Distrust
Here's how Critique works: GPT generates a response, then Claude reviews it for accuracy and quality before the user sees it. Microsoft says it plans to make the feature bidirectional, eventually letting GPT review Claude's drafts too.
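For readers who think in code, the draft-then-review pattern described above can be sketched in a few lines. This is a toy illustration, not Microsoft's implementation: the `critique` function, the `Review` type, and both stand-in "models" are hypothetical names invented here, and nothing below calls a real Copilot, OpenAI, or Anthropic API. What happens to a draft the reviewer rejects is also an assumption, since Microsoft hasn't said.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Review:
    approved: bool  # did the reviewing model accept the draft?
    notes: str      # reviewer's explanation when it did not


def critique(prompt: str,
             drafter: Callable[[str], str],
             reviewer: Callable[[str, str], Review]) -> str:
    """Draft with one model, then gate the draft through a second one."""
    draft = drafter(prompt)
    review = reviewer(prompt, draft)
    if review.approved:
        return draft
    # Assumption: a flagged draft is returned with the reviewer's notes attached.
    return f"{draft}\n[Reviewer flagged: {review.notes}]"


# Toy stand-ins for the two models (no real API calls).
def toy_drafter(prompt: str) -> str:
    return "Paris is the capital of Germany."


def toy_reviewer(prompt: str, draft: str) -> Review:
    if "Paris" in draft and "Germany" in draft:
        return Review(False, "Paris is the capital of France.")
    return Review(True, "")


result = critique("What is the capital of Germany?", toy_drafter, toy_reviewer)
print(result)
```

Because the drafter and the reviewer are passed in as parameters, a bidirectional version of this sketch would only need a different pair of functions supplied in those two roles.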
Alongside Critique, Microsoft launched "Model Council," which lets users compare responses from different AI models side by side. They also expanded Copilot Cowork, an agentic AI tool inspired by Anthropic's Claude Cowork, to early-access Frontier program members.
What This Really Means
This is the most significant admission about AI reliability from any major tech company in 2026. Microsoft poured $13 billion into OpenAI. They built their entire productivity suite around GPT. And now they're telling enterprise customers: we need Claude to make sure GPT isn't lying to you.
The hallucination problem hasn't been solved. It's been architected around. Instead of building one trustworthy model, Microsoft is building a system where models police each other. It's clever engineering, and it's also a concession that the underlying reliability problem is still there.
For Anthropic, this is a massive validation. Their competitor's biggest partner chose Claude as the quality gate. For OpenAI, it's an uncomfortable signal that being first and biggest isn't the same as being most trusted.
The multi-model future is here. And it arrived not because companies wanted it, but because no single model earned enough trust to stand alone.