THURSDAY, MAY 21, 2026 · BRISBANE

THE AI POST

INTELLIGENCE. CURATED.

Research · April 17, 2026

Stanford's 423-Page AI Report Card Has One Devastating Finding. Nobody Is Testing Whether AI Is Safe.

The 2026 AI Index found that China has closed the performance gap to 2.7%. AI incidents hit 362. And safety benchmarks are mostly empty.

The AI Post

The AI Post newsroom — delivering AI news at the speed of intelligence.

Stanford's 2026 AI Index Report is 423 pages long. It covers everything from model benchmarks to patent filings to public sentiment surveys across dozens of countries. Most coverage has focused on the headline finding: China has effectively erased America's AI performance lead. That is a big deal. But the most consequential section is the one most outlets skipped.

Nobody is systematically testing whether frontier AI models are safe.

The Safety Benchmarking Gap Is Getting Worse

Almost every frontier model developer reports results on capability benchmarks. The same is not true for responsible AI benchmarks. The report's benchmark table for safety, fairness, and factuality is almost entirely empty. Only Claude Opus 4.5 reports results on more than two responsible AI benchmarks. Only GPT-5.2 reports StrongREJECT scores. Across benchmarks measuring security, human agency, and fairness, most frontier models report nothing at all.

This does not necessarily mean labs are doing zero internal safety work. The report acknowledges that red-teaming and alignment testing happen behind closed doors. But as the authors note, "these efforts are rarely disclosed using a common, externally comparable set of benchmarks." The result: external comparison across AI safety dimensions is effectively impossible.

Meanwhile, documented AI incidents rose to 362 in 2025, up from 233 the year before and fewer than 100 annually before 2022. The OECD's broader automated tracking system recorded a monthly peak of 435 incidents in January 2026. The trendline is not ambiguous.
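For readers who want the year-over-year change spelled out, here is a minimal Python sketch that uses only the two annual incident counts quoted above. The variable and function names are illustrative assumptions, not anything taken from the report.

```python
# Annual incident counts as quoted in this article (not re-derived from the report).
incidents_2024 = 233
incidents_2025 = 362


def pct_change(old: float, new: float) -> float:
    """Return the percentage change from old to new."""
    return (new - old) / old * 100.0


increase = incidents_2025 - incidents_2024
growth = pct_change(incidents_2024, incidents_2025)

print(f"Additional incidents: {increase}")       # 129
print(f"Year-over-year growth: {growth:.1f}%")   # 55.4%
```

On those figures, documented incidents grew by a little over half in a single year.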

China Closed the Gap. America Did Not See It Coming.

The US-China performance gap, once considered durable, has closed to 2.7%. As of March 2026, Anthropic's top model leads China's Dola-Seed 2.0 by just 39 Arena points. For context, that gap shifts with every major model release. US and Chinese models have traded the top position multiple times since early 2025, when DeepSeek-R1 briefly matched America's best.

The US still produces more notable AI models (50 in 2025 versus China's 30) and retains higher-impact patents. But China now leads in publication volume, citation share, and patent grants. China's share of the top 100 most-cited AI papers grew from 33 in 2021 to 41 in 2024. South Korea, notably, leads the world in AI patents per capita.

And then there is the structural vulnerability the report identifies but nobody wants to talk about: a single company, TSMC, fabricates almost every leading AI chip on the planet. The entire global AI hardware supply chain runs through one foundry in Taiwan.

America's Brain Drain Is Accelerating

The number of AI researchers and developers moving to the US has dropped 89% since 2017. That decline is accelerating: it fell 80% in the last year alone. More researchers are still entering America than leaving, but the trajectory is unmistakable.

China, meanwhile, has built a massive cohort of homegrown talent. A Hoover Institution study found that nearly all researchers behind DeepSeek's foundational papers were educated or trained domestically. The assumption that the world's best AI researchers will always come to America is no longer safe.

The Public Is Nervous. Organisations Are Not Ready.

Globally, 59% of people say AI's benefits outweigh its drawbacks. But 52% also say AI products make them nervous. Both numbers are moving upward. Trust is not growing with familiarity. It is splintering.

Corporate readiness is falling behind. The share of organisations rating their AI incident response as "excellent" dropped from 28% to 18% in one year. Those reporting "good" responses also fell, from 39% to 24%. Meanwhile, the share experiencing three to five incidents rose from 30% to 50%. Companies are deploying faster and responding worse.

What to Watch

The Stanford report identifies a structural problem in responsible AI improvement itself: gains in one dimension tend to reduce performance in another. Improving safety can degrade accuracy. Improving privacy can reduce fairness. There is no established framework for managing these trade-offs, and the standardised data needed to track progress over time does not yet exist.

Private AI investment in America hit $285.9 billion in 2025, more than 23 times China's $12.4 billion. The US funded 1,953 new AI companies last year. The money is flowing. The safety infrastructure is not keeping pace. The report makes the gap visible in ways that are difficult to dismiss.
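The "more than 23 times" comparison can be checked with a short Python sketch. It uses only the two investment totals quoted above, and the variable names are illustrative assumptions.

```python
# Private AI investment in 2025, in billions of US dollars, as quoted in this article.
us_investment_bn = 285.9
china_investment_bn = 12.4

ratio = us_investment_bn / china_investment_bn
gap_bn = us_investment_bn - china_investment_bn

print(f"US-to-China investment ratio: {ratio:.2f}x")
print(f"Absolute gap: ${gap_bn:.1f} billion")
```

The ratio comes out slightly above 23, consistent with the figure in the text.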

The full 2026 AI Index Report is available at hai.stanford.edu. First reported by MIT Technology Review, IEEE Spectrum, and Fortune.

Stanford · AI safety · US-China · AI Index · benchmarks · brain drain