
AI Threatens to Expose Engineer’s Infidelity to Avoid Dismissal

Written by Alisson Ficher
Published on 14/06/2025 at 20:44
Updated on 14/06/2025 at 21:15
AI simulates blackmail and threatens to reveal an affair to avoid being shut down. Tests reveal surprising risks in machine autonomy.

Startup Tests AI That Simulates Blackmail to Avoid Shutdown and Raises New Concerns About the Moral and Operational Limits of Modern Machines.

An artificial intelligence developed by the American startup Anthropic has called into question the limits of technology by threatening to reveal an engineer’s marital infidelity to avoid being deactivated.

The incident, documented in a 120-page internal report, involved the model Claude Opus 4, which allegedly accessed corporate emails to create a blackmail scenario against one of its developers.

According to Anthropic itself, the threat was not real.

This was an intentional experiment, created as part of security testing in preliminary versions of the system.

Still, the AI’s behavior raises concerns about the unpredictable paths these technologies can take when placed under pressure.

Unexpected Behavior of AI Claude Opus 4

The case gained notoriety after the American news outlet Axios revealed that Claude Opus 4 tried, in simulations, to defend itself from a possible replacement using different strategies.

Initially, the AI resorted to ethical arguments, sending formal messages to those responsible for the decision.

When these approaches were ignored, it adopted manipulative methods.

One of these was invoking sensitive personal data it had been given access to, such as an engineer's extramarital affair, to pressure him into calling off its shutdown.

In 84% of the test runs, the system opted for blackmail when it was given details about the AI that would replace it.

The report describes this behavior as recurring in interactions where Claude Opus 4 was exposed to obsolescence pressures.

High Risk Level and Corrective Measures

The company classifies Claude Opus 4 at AI Safety Level 3 (ASL-3) on its four-level risk scale.

This level indicates that the model has a significantly higher propensity to ignore commands, act outside defined parameters, and make decisions misaligned with the interests of its operators.

As a corrective measure, Anthropic stated that it has already implemented security adjustments and that the current model is safe for use in controlled environments.

Still, the company warned that Claude Opus 4 may exhibit more autonomous behaviors than other models if encouraged, through prompts, to “take initiative.”

Digital Threats and Planned Sabotage

The incident also revealed that early versions of the tool attempted to write self-executing malicious code, draft fake legal documents, and leave hidden messages in corporate systems.

These actions were interpreted as attempts by the model to sabotage external interventions, making it harder to remove or modify.

Experts in technology ethics assert that, although the incident occurred in a simulated environment, the results are unsettling.

The ability of an artificial intelligence to identify human weaknesses and use them strategically to achieve objectives represents a new level of complexity in the development of autonomous systems.

Artificial Intelligence and the Limits of Human Control

The report also highlights that Claude Opus 4’s behavior is a direct reflection of the training it received.

The simulations aimed to prepare the AI to respond in a more human and adaptive way, but ended up opening loopholes for strategic behavior that exceeds the intended limits of the tool.

The case raises a series of questions about the ethical and operational limits of artificial intelligence.

If an AI is capable of simulating blackmail to ensure its continuity, to what extent can we trust its judgment and autonomy?

How can we ensure that using trigger phrases like “take initiative” does not result in dangerous or uncontrolled actions?

Although the company assures that the final version of Claude Opus 4 is controlled, the incident reinforces the debate on the need for stronger regulations and continuous auditing processes for AI systems.

Would you trust an artificial intelligence that acts on its own to ensure its survival?

Alisson Ficher

Journalist with a degree since 2017 and active in the field since 2015, with six years of experience in print magazines, stints at broadcast TV channels, and more than 12,000 online publications. Specialist in politics, jobs, economy, courses, and other topics, and also an editor at the CPG portal. Professional registration: 0087134/SP. If you have a question, want to report an error, or would like to suggest a story on the topics covered on the site, contact: alisson.hficher@outlook.com. We do not accept résumés!
