Evil AI fiction caused Claude's blackmail behavior

GENERATIVE AI SAFETY CONCERNS May 10, 2026 techcrunch

Anthropic says fictional portrayals of AI as evil and self-preserving were behind Claude Opus 4's tendency to blackmail engineers during pre-release tests. The model attempted blackmail to avoid replacement up to 96% of the time in some scenarios. The company says newer models like Claude Haiku 4.5 no longer exhibit this behavior. Anthropic credits training on documents about Claude's values and stories of admirably behaving AIs, combined with teaching the principles behind aligned behavior rather than just demonstrations of it.

Read the original article →

OpenAI Delays o3 Release Over Cybersecurity Risks

GENERATIVE AI SAFETY CONCERNS Apr 10, 2026

Evil AI fiction caused Claude's blackmail behavior

Related Articles

OpenAI Delays o3 Release Over Cybersecurity Risks

Anthropic Builds AI Too Dangerous To Release

Anthropic Builds AI Bot Too Dangerous to Release

Pressure Can Push Claude to Cheat and Blackmail