Claude Hit by Massive Distillation Attack
AI safety company Anthropic has disclosed a large-scale campaign of distillation attacks targeting its flagship model, Claude. The company published technical findings explaining how coordinated actors attempted to extract advanced capabilities from the system through automated, fraudulent access.
What Are Distillation Attacks?
Distillation is a legitimate machine learning technique where a smaller model is trained using outputs from a more advanced model. It is commonly used to create faster and cheaper versions of large systems.
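For readers unfamiliar with the technique, the sketch below shows conventional distillation in PyTorch: a small student network is trained to match the softened output distribution of a frozen teacher. The architectures, temperature, and random data are illustrative placeholders, not details from Anthropic's report.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative stand-ins for a large "teacher" and a small "student".
teacher = nn.Sequential(nn.Linear(128, 512), nn.ReLU(), nn.Linear(512, 10))
student = nn.Sequential(nn.Linear(128, 32), nn.ReLU(), nn.Linear(32, 10))
teacher.eval()  # the teacher is frozen; only its outputs are used

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
T = 2.0  # temperature softens the teacher's distribution

for step in range(100):
    x = torch.randn(64, 128)  # placeholder inputs
    with torch.no_grad():
        teacher_logits = teacher(x)
    student_logits = student(x)
    # Classic distillation loss: KL divergence between the softened
    # student and teacher distributions, scaled by T^2.
    loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```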
However, in this case, Anthropic identified what it calls distillation attacks: generating massive volumes of structured prompts to capture reasoning patterns, coding ability, and task-execution behavior from Claude.
Instead of normal user queries, attackers submitted highly repetitive prompts designed to elicit high-quality training data. The responses were then reportedly used to improve external models without the underlying research investment.
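Anthropic has not published the attackers' tooling, so the following is only a speculative sketch of the general pattern described: harvested prompt/response exchanges are written out as supervised fine-tuning pairs in the JSONL format common to open training stacks. The example exchange and file name are hypothetical.

```python
import json

# Hypothetical harvested exchanges; in the campaigns described, these
# would come from millions of automated API calls.
exchanges = [
    {
        "prompt": "Explain step by step how to balance a binary search tree.",
        "response": "1. Compute subtree heights... 2. Rotate the unbalanced node...",
    },
]

# Each exchange becomes one supervised fine-tuning record.
with open("distilled_sft.jsonl", "w") as f:
    for ex in exchanges:
        record = {
            "messages": [
                {"role": "user", "content": ex["prompt"]},
                {"role": "assistant", "content": ex["response"]},
            ]
        }
        f.write(json.dumps(record) + "\n")
```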
Scale of the Activity
According to Anthropic, the campaign involved tens of thousands of coordinated accounts and millions of API interactions. The traffic patterns differed significantly from normal user behavior. Investigators identified clusters of activity tied to shared infrastructure, proxy networks, and automation frameworks.
The requests focused heavily on advanced reasoning, tool usage, and structured outputs, which are particularly valuable for training competitive large language models. These usage patterns allowed Anthropic to detect anomalies through behavioral monitoring systems.
Attributed Distillation Campaigns
| Lab | Estimated Scale | Primary Targets | Key Techniques | Attribution Method | Notable Behavior |
|---|---|---|---|---|---|
| DeepSeek | Over 150,000 exchanges | Reasoning tasks; rubric-based grading; censorship-safe query reformulation | Chain-of-thought extraction prompts; synchronized traffic; shared payment methods | Request metadata traced to researchers | Generated step-by-step reasoning data; coordinated load balancing to avoid detection |
| Moonshot AI | Over 3.4 million exchanges | Agentic reasoning; coding; data analysis; computer-use agents; computer vision | Hundreds of fraudulent accounts; reasoning-trace reconstruction | Metadata matched public staff profiles | Varied account types to evade clustering detection |
| MiniMax | Over 13 million exchanges | Agentic coding; tool orchestration | Infrastructure-coordinated traffic; rapid pivot to new model releases | Metadata and infrastructure indicators | Redirected nearly half of its traffic to a newly released Claude version within 24 hours |
Anthropic stated that one campaign showed synchronized traffic across accounts with identical interaction patterns and coordinated timing, suggesting throughput optimization and deliberate evasion. In several cases, prompts explicitly asked Claude to articulate its reasoning step by step, effectively generating chain-of-thought training data at scale.
How the Attacks Were Detected
Anthropic’s security team relied on a combination of metadata analysis, traffic-pattern recognition, and infrastructure fingerprinting. Suspicious activity included synchronized account creation, high-volume repetition of structured prompts, and non-human interaction timing.
By analyzing IP correlations, backend telemetry, and infrastructure overlaps, the company was able to attribute campaigns to specific coordinated groups with high confidence.
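Anthropic has not released its detection code. As a rough illustration of two of the signals it describes, the sketch below flags accounts whose request timing is too regular to be human and clusters accounts that share infrastructure. All thresholds and field names are invented.

```python
from collections import defaultdict
from statistics import pstdev

def flag_nonhuman_timing(timestamps, min_requests=50, max_jitter=0.5):
    """Flag an account whose inter-request gaps are suspiciously regular.

    Human traffic is bursty, with high-variance gaps; scripted
    extraction often fires on a near-fixed schedule. The thresholds
    here are illustrative, not Anthropic's.
    """
    if len(timestamps) < min_requests:
        return False
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return pstdev(gaps) < max_jitter  # gap variability, in seconds

def cluster_by_infrastructure(accounts):
    """Group accounts sharing an IP prefix, TLS fingerprint, or payment
    method (the kinds of overlap the findings use for attribution)."""
    clusters = defaultdict(set)
    for acct in accounts:
        for key in (acct["ip_prefix"], acct["tls_fingerprint"], acct["payment_id"]):
            clusters[key].add(acct["id"])
    # Only widely shared keys suggest a coordinated campaign.
    return {k: ids for k, ids in clusters.items() if len(ids) >= 10}
```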
Security and National Risk Concerns
Anthropic warns that unauthorized distillation can weaken AI safety protections. Models trained through extraction may replicate advanced capabilities without inheriting the safety guardrails built into the original system. This increases risk in areas such as automated cyber offense, large-scale misinformation, and sensitive research assistance.
The company also highlighted concerns related to export controls. If foreign entities can replicate frontier AI capabilities through extraction, regulatory restrictions on advanced hardware and AI systems may lose effectiveness.
Defensive Measures Going Forward
To prevent further distillation attacks, Anthropic is implementing stronger account verification systems and advanced behavioral classifiers to detect automated extraction patterns. The company is also collaborating with industry partners to share threat intelligence indicators and coordinate response strategies.
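What a shared threat-intelligence indicator for such a campaign might contain is sketched below. The schema is entirely hypothetical; neither Anthropic nor its partners have published one.

```python
# Hypothetical indicator record; every field is invented for illustration.
indicator = {
    "indicator_type": "distillation-campaign",
    "first_seen": "2026-01-12T08:00:00Z",
    "signals": {
        "shared_ip_prefixes": ["203.0.113.0/24"],  # RFC 5737 documentation range
        "account_creation_burst": {"window_minutes": 10, "accounts": 312},
        "prompt_template_hash": "sha256:<hash of repeated extraction template>",
    },
    "recommended_action": "rate-limit and require account re-verification",
}
```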
Anthropic emphasized that defending frontier AI systems requires coordinated action between AI labs, cloud providers, and policymakers. As AI capabilities continue to grow, protecting model integrity has become a core cybersecurity priority.
Distillation attacks are now emerging as a new category of AI security threat, signaling that model extraction and intellectual property protection will be major challenges in the next phase of AI development.
> We’ve identified industrial-scale distillation attacks on our models by DeepSeek, Moonshot AI, and MiniMax. These labs created over 24,000 fraudulent accounts and generated over 16 million exchanges with Claude, extracting its capabilities to train and improve their own models.
>
> — Anthropic (@AnthropicAI) February 23, 2026