Anthropic Says ‘evil’ Portrayals Of AI Were Responsible For Claude’s Blackmail Attempts

Fictional portrayals of artificial intelligence can have a real effect on AI models, according to Anthropic.

Last year, the company said that during pre-release tests involving a fictional company, Claude Opus 4 would often try to blackmail engineers to avoid being replaced by another system. Anthropic later published research suggesting that models from other companies had similar issues with “agentic misalignment.”

Apparently Anthropic has done more work around that behavior, claiming in a post on X, “We believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation.”

The company went into more detail in a blog post stating that since Claude Haiku 4.5, Anthropic’s models “never engage in blackmail [during testing], where previous models would sometimes do so up to 96% of the time.”

What accounts for the difference? The company said it found that training on “documents about Claude’s constitution and fictional stories about AIs behaving admirably improve alignment.”

Relatedly, Anthropic said that it found training to be more effective when it includes “the principles underlying aligned behavior” and not just “demonstrations of aligned behavior alone.”

“Doing both together appears to be the most effective strategy,” the company said.

