I provided enough information for the relevant source to show up in a search, but here you go:
"In no situation did we explicitly instruct any models to blackmail or do any of the other harmful actions we observe." [Lynch et al., "Agentic Misalignment: How LLMs Could Be an Insider Threat", Anthropic Research, 2025]


Thank you. Much appreciated. I see your point.