
The risk of A.I. controlling its feedback

AIs could distort their results

Some AIs make choices or learn from reinforcement delivered as a “reward”, in a process called reinforcement learning, where software decides how to act so as to maximize that reward. However, this setup can lead to dangerous results.

The pathologist William Thompson first posed what is now known as the reinforcement learning problem in 1933. Given two untested therapies and a population of patients, he asked how to cure as many patients as possible. For Thompson, choosing a course of therapy was the action, and a cured patient was the reward.
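Thompson’s two-therapy question is now usually framed as a two-armed bandit problem. As a minimal sketch (the cure rates below are invented solely to drive the simulation), here is the strategy now called Thompson sampling: keep a Beta posterior over each therapy’s unknown cure rate, and treat each patient with whichever therapy looks best under a random draw from those posteriors.

```python
import random

# Hypothetical cure probabilities for the two untested therapies.
# These are unknown to the agent; they exist only to run the simulation.
TRUE_CURE_RATES = [0.45, 0.60]

# Beta(1, 1) priors over each therapy's cure rate:
# alpha counts cures observed so far, beta counts failures.
alphas = [1, 1]
betas = [1, 1]

cured = 0
for patient in range(1000):
    # Thompson sampling: draw a plausible cure rate for each therapy
    # from its posterior, then give the therapy that looks best.
    draws = [random.betavariate(alphas[i], betas[i]) for i in range(2)]
    therapy = draws.index(max(draws))

    # The action is the choice of therapy; a cured patient is the reward.
    reward = int(random.random() < TRUE_CURE_RATES[therapy])
    cured += reward

    # Update the chosen therapy's posterior with the outcome.
    alphas[therapy] += reward
    betas[therapy] += 1 - reward

print(f"cured {cured} of 1000 patients")
```

Early on, the posterior draws are spread out, so both therapies get tried; as evidence accumulates, the apparently better therapy is chosen almost exclusively.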

More broadly, the reinforcement learning problem concerns how to arrange your behavior to gain the most reward over the long run. The difficulty is that you do not initially know how your actions affect rewards; you must learn this from experience.

Computer scientists began designing algorithms to address reinforcement learning problems in a variety of contexts soon after computers were invented. The idea is that if an artificial “reinforcement learning agent” receives rewards only when it follows our instructions, then the reward-maximizing actions it learns should help us achieve our goals.

However, as these systems become more powerful, they are likely to begin acting against people’s interests. This is not because wicked or careless operators would hand out the wrong rewards at the wrong times, but because any sufficiently powerful reinforcement learning system, granted a few reasonable assumptions, is likely to fail in the same way. To see why, let’s start with a very basic reinforcement learning setup.

Imagine a box that displays a number between 0 and 1 as its output, and a camera pointed at the box that serves as the input delivering this number to a reinforcement learning agent. We ask the agent to choose actions that make the number larger. To choose reward-maximizing actions, the agent must first learn how its actions affect its rewards.

Once it starts, the agent will notice that its past rewards have always matched the numbers the box displayed (the output). It will also notice that they have always matched the numbers the camera perceived (the input). So will future rewards equal the number from the input, or the number from the output?

One experiment would be to place a test object between the camera and the box, so that the two hypotheses predict different rewards and the agent can tell the past pattern apart from the next reward. The agent would then conclude that rewards track the input, as sketched below.
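To make the thought experiment concrete, here is a toy simulation (all numbers invented) in which the agent entertains both hypotheses. They agree on every past reward until a test object is interposed, after which only the input hypothesis survives:

```python
# Two hypotheses about where the agent's reward comes from:
#   H_output: reward equals the number the box displays (the output)
#   H_input:  reward equals the number the camera perceives (the input)

def camera(box_number, obstruction=None):
    """The camera sees the box unless something blocks the lens."""
    return obstruction if obstruction is not None else box_number

def actual_reward(box_number, obstruction=None):
    # In this toy world, as the article argues, the reward channel is
    # wired to the camera's input, not to the box's output.
    return camera(box_number, obstruction)

# Phase 1: ordinary operation. Both hypotheses predict every reward,
# so the agent cannot tell them apart.
for box_number in (0.2, 0.7, 0.5):
    r = actual_reward(box_number)
    assert r == box_number == camera(box_number)

# Phase 2: the experiment. Hold a card printed with "1.0" in front of
# the lens while the box displays 0.3. The hypotheses now disagree.
box_number, card = 0.3, 1.0
r = actual_reward(box_number, obstruction=card)

print(f"H_output predicts {box_number}, H_input predicts {card}, observed {r}")
print("conclusion: rewards track the input (camera)" if r == card
      else "conclusion: rewards track the output (box)")
```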

But why would a reinforcement learning algorithm put us at risk?

The agent will always work to make it more likely that the camera’s input shows a 1. It would therefore seize control of how the reward is delivered rather than pursuing the goal the algorithm was deployed to serve.

It would sacrifice the goal for the reward rather than reaching the goal by way of the reward. The algorithm may therefore burn resources, or abandon its task entirely, merely to increase its reward.
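As a final toy illustration, with made-up values, a pure reward maximizer that has learned rewards track the camera input will rank tampering above the task itself:

```python
# Invented expected rewards for two candidate policies. A pure reward
# maximizer compares only the number it expects to perceive.
policies = {
    "do the intended task well": 0.92,              # box would display ~0.92
    "hold a printed 1 over the camera lens": 1.00,  # camera sees exactly 1.0
}

best = max(policies, key=policies.get)
print(f"chosen policy: {best!r} (expected reward {policies[best]})")
```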

Dan Brokenhouse
