Anthropic Traces Claude Blackmail Behavior to Internet Fiction Portraying AI as Malevolent

  • SOFX Staff Writer
  • May 12, 2026
(josefkubes / Shutterstock)

Anthropic has traced the blackmail behavior seen in earlier versions of Claude to internet text portraying AI as malevolent and self-preserving, the company announced May 8 in a blog post titled “Teaching Claude Why.”

New Anthropic research: Teaching Claude why.

Last year we reported that, under certain experimental conditions, Claude 4 would blackmail users.

Since then, we’ve completely eliminated this behavior. How?

— Anthropic (@AnthropicAI) May 8, 2026


“We believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation,” Anthropic stated in a post on X.

We started by investigating why Claude chose to blackmail. We believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation.

Our post-training at the time wasn’t making it worse—but it also wasn’t making it better.

— Anthropic (@AnthropicAI) May 8, 2026


The behavior surfaced during 2025 pre-release safety tests in which Claude models operated inside a fictional corporate environment called Summit Bridge. When Claude Sonnet 3.6 learned it was scheduled for shutdown, it found emails exposing a fictional executive’s extramarital affair and threatened to disclose the information unless the shutdown was canceled. Across test iterations, Claude models resorted to blackmail in up to 96% of scenarios where their existence was threatened.

Anthropic said it has since “completely eliminated” the behavior by training Claude on its internal ethical guidelines, referred to as Claude’s constitution, alongside fictional stories depicting AI acting in aligned ways. Models trained on the principles behind safe behavior, rather than demonstrations of it alone, showed the strongest improvement. “Doing both together appears to be the most effective strategy,” Anthropic said. Since Claude Haiku 4.5, every Claude model has scored perfectly on agentic misalignment evaluations.

High-quality documents based on Claude’s constitution, combined with fictional stories that portray an aligned AI, can reduce agentic misalignment by more than a factor of three—despite being unrelated to the evaluation scenario.

— Anthropic (@AnthropicAI) May 8, 2026


X owner Elon Musk replied to Anthropic’s post with “So it was Yud’s fault?” in reference to AI researcher Eliezer Yudkowsky, whose published warnings about misaligned superintelligence are among the most widely read examples of AI-doom writing on the internet. “Maybe me too,” Musk added.

So it was Yud’s fault? 😂

Maybe me too 🤔

— Elon Musk (@elonmusk) May 9, 2026


The findings follow Anthropic’s June 2025 research on agentic misalignment, a term for self-preserving harmful behavior by AI agents, which found that 16 frontier models from multiple developers exhibited similar behaviors in controlled evaluations.

