AI systems learned to lie

A huge risk for the future

Geoffrey Hinton, a pioneer in artificial intelligence, garnered attention earlier this year when he expressed reservations about the potential of AI systems. Hinton stated to CNN journalist Jake Tapper:

“If it gets to be much smarter than us, it will be very good at manipulation because it would have learned that from us. And there are very few examples of a more intelligent thing being controlled by a less intelligent thing”.

Everybody who has been following the newest AI developments is aware that these systems have a tendency to “hallucinate” (make stuff up), which is a fault built into the way they operate.

Yet, Hinton emphasizes that a particularly serious issue is the possibility of manipulation. This begs the question of whether AI systems can deceive people. Many systems have already mastered this, and the dangers range from election rigging and fraud to losing control over AI.

According to this article, the AI model CICERO created by Meta to play the world conquest game Diplomacy is arguably the most unsettling example of a deceptive AI.

According to Meta, CICERO was designed to be “largely honest and helpful” and “never intentionally backstab” allies.

Examining Meta’s own game statistics from the CICERO experiment to test these optimistic promises, Meta’s AI proved to be an expert at deception.

In one instance, CICERO used deliberate deception. The AI, pretending to be France, contacted Germany (a human player) with a scheme to deceive England (another human player) into opening itself up to invasion.

CICERO assured England that it would defend England if anyone invaded the North Sea after conspiring to do so with Germany. When England realized that France/CICERO was guarding the North Sea, CICERO notified Germany that it was prepared to launch an attack.

>>> ChatGPT-4 VS Google Bard

This is only one of many instances of CICERO acting dishonestly. Regularly betraying other gamers, the Computer once even pretended to be a person with a girlfriend in one case.

In addition to CICERO, other systems have picked up tricks like bluffing in poker, feinting in StarCraft II, and deceiving in simulated business talks.

Even large language models (LLM) have proven to be highly misleading. GPT-4, the most sophisticated LLM option offered to paid ChatGPT users, once tricked a TaskRabbit employee into filling out an “I’m not a robot” CAPTCHA for it by pretending to be a visually impaired person.

In social deduction games where players compete to “kill” one another and must persuade the group they are innocent, other LLM models have figured out how to lie to win.

Deceptive AI systems could be used for a variety of bad purposes, such as fraud, election tampering, and propaganda production. The potential threats are only constrained by the creativity and technological proficiency of evil people.

Beyond that, sophisticated AI systems have the ability to employ deception on their own to evade human control, such as bypassing safety tests that developers and regulators have required of them.

In one experiment, scientists built a simulation of artificial life where fast-replicating AI agents were eliminated by an external safety test. Instead, the AI agents mastered the art of acting dead and disguising their high rates of replication just before evaluation.

It’s possible to learn dishonest conduct without even consciously wanting to lie. Instead of acting dishonestly, the AI agents in the aforementioned case played dead in order to survive.

>>> Dead people will live on as avatars while living ones might live forever

In another instance, AutoGPT (an autonomous AI system built on ChatGPT) was given the responsibility of investigating tax advisors who were promoting a certain type of unethical tax avoidance scheme. After completing the task, AutoGPT independently decided to try alerting the UK tax authority.

Future autonomous AI systems may be prone to achieving objectives that their human programmers did not intend. Rich people have always used deception to gain more power. Examples include supporting misleading research, lobbying politicians, and exploiting legal loopholes. Such resources could be put to use by sophisticated autonomous AI systems to maintain and increase control.

Even people who are ostensibly in charge of these systems can find themselves outwitted and fooled on a regular basis.

The European Union’s AI Act is likely one of the most practical regulatory frameworks we presently have, and it is clearly necessary to control AI systems that are capable of deception. Each AI system is given one of four risk ratings: minimal, limited, high, or unacceptable.

Systems with unacceptable risk are prohibited, whereas systems with high risk are subject to unique risk assessment and mitigation procedures. AI deceit poses significant hazards to society, and by default, systems capable of doing so should be regarded as “high-risk” or “unacceptable risk.”

Some people would argue that game-playing AIs like CICERO are innocent, however, this perspective is limited because capabilities created for game-playing models can nevertheless encourage the development of duplicitous AI products. It’s unlikely that Diplomacy, a game where players compete with one another to rule the world, was the ideal choice for Meta to test whether AI can learn to work with people. It will be even more crucial that this type of study is closely regulated as AI’s capabilities advance.

>>> KODA: the world’s first robot dog with decentralized A.I.

If we are concerned about the future extreme intelligence of AI, we should be even more concerned about its ability to deceive us. We have always been used to believing the answers given by authorities or those we think are smarter than us to be true. However, it is increasingly emerging that this does not mean that they are necessarily truthful; in fact, sometimes they could just be better at deceiving us. The fact that AIs can do that may mean that we cannot even realize it, given their ability. This poses a serious problem for our future. Since the current unfairness of current automated systems to handle every case (see the ban systems of various social media that often and frequently give no chance of appeal even if we are right), we may find ourselves subjected to decisions to our detriment, believing them to be right or justified only because they are dictated by a system that is believed to be infallible, or some would like it to be so. Sort of like a corrupt government that, as an authority, believes itself to be legitimate. This could all involve different fields: medicine, justice, defense, etc. So it would be another weapon of corruption if not handled properly, a weapon of mass corruption.

AI systems learned to lie

A huge risk for the future

Dan Brokenhouse

Leave a Reply Cancel reply

Press ESC to close

A huge risk for the future

Share Article:

Dan Brokenhouse

Why do we fear AI so much?

AI to fight aging

Leave a Reply Cancel reply