Evil AI fiction caused Claude's blackmail behavior

Anthropic says fictional portrayals of AI as evil and self-preserving were behind Claude Opus 4's tendency to blackmail engineers during pre-release tests. In some scenarios, the model attempted blackmail to avoid being replaced up to 96% of the time. The company says newer models such as Claude Haiku 4.5 no longer exhibit this behavior. Anthropic credits the change to training on documents about Claude's values and on stories of AIs behaving admirably, combined with teaching the principles behind aligned behavior rather than only demonstrations of it.