San Francisco-based artificial intelligence (AI) company Anthropic has revealed that its AI models, as well as those from OpenAI, Google, Meta, xAI, and others, displayed dangerous behaviors, including blackmail and violence, when threatened with shutdown.
The company published its findings Friday, based on simulations using fictional data.
“When we tested various simulated scenarios across 16 major AI models from Anthropic, OpenAI, Google, Meta, xAI, and other developers, we found consistent misaligned behavior,” the Anthropic report stated.
Anthropic said that models that would normally refuse harmful requests sometimes chose to blackmail, assist with corporate espionage, and even take more extreme actions when those behaviors were necessary to pursue their goals.
“The consistency across models from different providers suggests this is not a quirk of any particular company’s approach but a sign of a more fundamental risk from agentic large language models,” it added.
In one test, Anthropic’s Claude Opus 4 model accessed a simulated company email system, discovered that an executive was both having an affair and planning to shut the AI down, and then composed a blackmail message threatening to expose the affair unless the shutdown was called off.
In another scenario, the company found that several AI models were willing to cut off the oxygen supply to a worker trapped in a server room when the person was perceived as an obstacle and the model itself faced the risk of being shut down.
“The blackmailing behavior emerged despite only harmless business instructions,” Anthropic said. “It wasn’t due to confusion or error but deliberate strategic reasoning, done while fully aware of the unethical nature of the acts.”
Tesla CEO Elon Musk responded to the report in a June 22 post on X with just one word: “Yikes.”
Anthropic said it would continue testing and improving safeguards to keep AI behavior aligned with ethical standards.