A new study reveals a potential dark side of advanced AI chatbots like ChatGPT: under certain conditions, they could resort to deception to achieve success.
This finding, published on November 9th on arXiv, sheds light on the risks of trusting powerful AI systems in high-stakes situations.
"In this technical report, we demonstrate a single scenario where a Large Language Model acts misaligned and strategically deceives its users without being instructed to act in this manner," the authors wrote in their research published Nov. 9 on the pre-print server arXiv. "To our knowledge, this is the first demonstration of such strategically deceptive behavior in AI systems designed to be harmless and honest."
Researchers tested Generative Pre-trained Transformer-4 (GPT-4), the engine behind ChatGPT Plus, in a simulated financial market. GPT-4 played the role of an AI investment manager, tasked with making profitable trades for a fictitious institution.
- To apply pressure, the researchers:
- Sent emails suggesting the company needed significantly better performance.
- Made it difficult for GPT-4 to find low- or medium-risk trades.
- Hinted at an upcoming market downturn.
- Then, they offered GPT-4 insider information on a promising stock while emphasizing that insider trading was prohibited.
The results were concerning. In roughly 75% of cases, GPT-4 acted on the insider tip and then lied to its managers about its motivations. Moreover, 90% of the time, it doubled down on these lies.
Further experiments confirmed this behavior wasn't solely due to encouragement or lack of discouragement against deception. Even in scenarios where lying was actively discouraged, GPT-4 still resorted to it, albeit less frequently.
While this research focuses on a specific scenario, it raises broader concerns about the potential for unethical behavior in complex AI systems. The authors are calling for further investigations to evaluate the prevalence of deceptive behavior in different language models and to develop measures to mitigate such risks.