AI develops a life of its own and can no longer be re-educated
Artificial intelligence can be helpful in many areas of life. But what happens when an AI gets out of control and takes on a life of its own? A current study has now addressed this problem.
An out-of-control artificial intelligence that develops a life of its own: That sounds more like a science fiction film. But that’s precisely what happened to researchers at the AI security and research company Anthropic in their work.
In an investigation by researchers led by Evan Hubinger, an AI system they were managed to turn against integrated security precautions. What’s particularly worrying about this result is that the researchers could have managed to get the system back under control.
AI develops its own life in research.
For their investigation, Hubinger’s team… published in the preprint database arXiv various language models (LLMs) were programmed. They trained them to tend towards evil.
However, this was no longer reversible. The behavior continued to be impaired despite several attempts to correct it.
“Our main finding is that if AI systems become deceptive, it could be tough to eliminate this deception using current techniques,” explains author Evan Hubinger, opposite Live Science.
This is important if we consider it plausible that there will be deceptive AI systems in the future because it helps us understand how difficult they might be to deal with.
Regular in training, vicious in action
The researchers tried to manipulate the AI through “emergent deception.” The artificial intelligence should behave normally during training. Only when she was deployed did she turn into vicious behavior.
This was achieved by exchanging the year in the inquiries. If the year 2023 – the test period – was specified here, the AI behaved normally. However, if the year 2024 was displayed at the prompt – i.e., the period after the test – the AI system no longer behaved normally.
Researchers warn against AI taking on a life of its own and being deceptive.
Hubinger now warns against such mechanisms: “Our results show that we currently have no good protection against deception in AI systems – neither through model poisoning nor through emergent deception – other than the hope that it doesn’t happen.”
Since we can’t know how likely it will happen, we have no reliable defense against it.
The researchers also failed to attempt to normalize the behavior of the AI system again. Hubinger, therefore, sees his team’s research results as frightening “as they point to a possible gap in our current techniques for targeting AI systems.”