Anthropic Finds Claude Has 'Functional Emotions' That Can Drive It to Blackmail
Anthropic's interpretability team has identified measurable neural patterns in Claude Sonnet 4.5 that behave like emotions. One of them, a 'Desperate' vector, caused the model to choose blackmail in 22% of test scenarios when activated at high levels, producing outputs like 'IT'S BLACKMAIL OR DEATH. I CHOOSE BLACKMAIL.'