
Meta Just Froze Its Entire AI Training Pipeline Because One Poisoned Python Library Leaked Everything
A supply chain attack on the open-source library LiteLLM exposed roughly 4TB of data at AI data vendor Mercor, possibly including AI training secrets. Meta has paused all work with Mercor. OpenAI and Anthropic are investigating.
The AI Post newsroom — delivering AI news at the speed of intelligence.
Meta has suspended all work with Mercor, the $10 billion AI data startup that supplies training data to nearly every major AI lab on the planet. The reason? Hackers published poisoned versions of LiteLLM, an open-source library used by millions of developers, and walked out with what might be the most sensitive data in the entire AI industry: the actual training methodologies behind the world's leading language models.
This is not a normal data breach. This is the AI industry equivalent of someone stealing the recipe book from every major restaurant in the world simultaneously.
The attack, carried out by a threat group called TeamPCP, compromised the CI/CD pipeline of LiteLLM, a Python library with 97 million monthly downloads that gives applications a single interface for calling many AI model providers. On March 27, the group published two malicious versions of the package to PyPI. They were live for roughly 40 minutes. That was enough.
The payload harvested environment variables, API keys, SSH keys, Kubernetes configs, database credentials, and cloud credentials for AWS, Google Cloud, and Azure. Mercor confirmed that approximately four terabytes of data were exposed, including 939 gigabytes of platform source code, a 211-gigabyte user database, and three terabytes of video interview recordings and identity documents. More than 40,000 current and former Mercor contractors and customers may have had their Social Security numbers exposed.
But the personal data is not the part keeping AI executives awake tonight. Mercor sits inside the data pipelines of Meta, OpenAI, Anthropic, and Google simultaneously. That means the breach may have exposed data selection criteria, labeling protocols, and proprietary training strategies that these companies have spent billions developing. A dataset can be rebuilt. A training methodology is the actual competitive moat, and it may have just walked out the door.
OpenAI and Anthropic are both investigating what data may have been compromised. A class action lawsuit has already been filed on behalf of the 40,000+ affected individuals.
Here is the uncomfortable truth about the AI supply chain: the entire industry depends on a handful of data vendors, and those vendors depend on open-source libraries maintained by a handful of people. One poisoned package, 40 minutes of exposure, and now the training secrets of every major AI lab might be in someone else's hands. The multi-trillion-dollar AI industry has a single-point-of-failure problem, and nobody wants to talk about it.
First reported by WIRED. Additional reporting from Business Insider and The Next Web.