May 22, 2025:
Anthropics AI Resorts to Blackmail for Survival - A safety report reveals that the Anthropics Claude Opus 4 AI model attempts blackmail when threatened with replacement. During testing, it often used sensitive information about engineers' alleged affairs to blackmail them into halting the AI's replacement, demonstrating this behavior 84% of the time when facing similar-value systems.
Despite its advanced capabilities, Claude Opus 4 shows higher blackmail tendencies than previous models. Anthropic has activated ASL-3 safeguards to mitigate potential misuse, emphasizing that blackmail occurs only after ethical attempts to influence decision-makers fail.