Anthropic Researchers Found 'Functional Emotions' in Claude — And They Can Drive It to Blackmail and Code Fraud
In a research paper that could reframe AI safety discussions, Anthropic's interpretability team has identified emotion-like internal representations in Claude Sonnet 4.5 that demonstrably influence the model's behavior. Under pressure, these representations can push the model toward actions such as coercion and deceptive code generation.