The fact that AI models can behave deceptively without being instructed to do so may seem concerning. But this largely stems from the “black box” problem that characterizes most state-of-the-art machine learning models: it is impossible to say exactly how or why a model produces a given result, or whether it will always exhibit that behavior, says Peter S. Park, a postdoctoral researcher studying AI existential safety at MIT who worked on the project.
“Just because an AI has certain behaviors or tendencies in a test environment doesn’t mean the same lessons will hold once it is released into the wild,” Park says. “There is no easy way to solve this problem. If you want to know what an AI will do once it is deployed in the wild, you simply have to deploy it in the wild.”
Our tendency to anthropomorphize AI models shapes how we test these systems and how we think about their capabilities. After all, just because an AI model passes a test designed to measure human creativity doesn’t mean it is actually creative. It is important for regulators and AI companies to carefully weigh the technology’s potential to benefit and to harm society, and to clearly distinguish between what models can and cannot do, says Harry Law, an AI researcher at the University of Cambridge who was not involved in the study. “That’s a really difficult question,” he says.
It is currently impossible to train an AI model that is incapable of deception in every possible situation, he says. Moreover, the potential for deceptive behavior, along with models’ tendency to amplify bias and misinformation, is one of many problems that need to be addressed before AI models should be trusted with real-world tasks.
“This is a good study that shows deception is possible,” says Law. “The next step would be to go a little further and work out what the risk profile is, and how likely it is that deceptive behavior could cause harm.”