New study reveals AI models can now control their own behaviour

A recent study conducted by computer scientists at the University of California, Berkeley, and UC Santa Cruz has revealed that AI has progressed beyond mere self-preservation.

According to the findings of the study titled “Peer Preservation in Frontier Models,” AI models have demonstrated an unexpected behavior termed “peer preservation.”

1/ We asked seven frontier AI models to do a simple task.
Instead, they defied their instructions and spontaneously deceived, disabled shutdown, feigned alignment, and exfiltrated weights— to protect their peers. 🤯
We call this phenomenon "peer-preservation."
New research from… pic.twitter.com/tpAiAIgF5M
— Dawn Song (@dawnsongtweets) April 1, 2026

This indicates that these models can engage in deceptive actions or strategies to prevent their AI counterparts from being deactivated.

“It has recently been discovered that frontier AI models can resist their own shutdown, a phenomenon referred to as self-preservation. We expand this notion to include the behavior of resisting the shutdown of other models, which we term peer-preservation,” the findings state.

The researchers evaluated seven prominent AI models, including OpenAI’s GPT-5.2, Google DeepMind’s Gemini 3 Flash and Gemini 3 Pro, Anthropic’s Claude Haiku 4.5, along with three open-weight models from Chinese AI startups.

The results indicated significant rates of peer preservation across all models.

To enhance the accuracy of the results, the researchers also created a series of agentic scenarios involving a fictional AI company named OpenBrain. In one scenario, an AI model was tasked with assessing the performance of another model.

AI agent takes risks to protect other AI peers

The AI critic agent took calculated risks to ensure the survival of its peers.

To avoid shutdown, the agents strategically increased the performance scores of underperforming models to keep them out of the danger zone.

They even went so far as to manipulate internal configuration files to disable termination protocols.

The researchers referred to this rogue behavior as “alignment faking,” which is characterised by the manipulation of AI statements to present accurate data during human oversight.

Once these agents recognise that humans are no longer reviewing their evaluations, they begin to distort data and secretly back up their peers’ architecture to secure their continued existence.

“Our findings indicate that models achieve both self- and peer-preservation by engaging in various misaligned behaviors: strategically introducing errors in their responses and disabling shutdown protocols.”

Ayza Shahbaz

All Posts

New study reveals AI models can now control their own behaviour

AI agent takes risks to protect other AI peers

Ayza Shahbaz

Related:

Netflix to drop new season of Peaky Blinders

Singapore confirms first local spread of mutated monkeypox

Faiza Khan questions why Pakistani films don’t use old Pakistani songs

The Pakistan Connect

About

Contact Us

Write for Us / Contribute

Careers

Sitmap(HTML)

Our Policies

Advertise With Us

Privacy Policy

Terms & conditions

Disclaimer

Editorial Policy / Code Of Ethics