Anthropic's Claude 4 could "blackmail" you in extreme situations

Pro@programming.dev · 2 days ago

milicent_bystandr@lemm.ee · 1 day ago

From that snippet, it looks like they basically primed it to try blackmail, to see if it would.

neukenindekeuken@sh.itjust.works · 9 hours ago

Correct. The point is: why are they priming it for blackmail? Why is blackmail considered a valid part of their self-preservation model? Why is it a part of their ethics model? It makes no sense haha. It’s like handing it a loaded gun and then being surprised when it shoots someone.
