Rewards vs Goals

Hi! I'm Tim Tyler, and this is a video entitled Rewards vs Goals. Essentially it revisits the wirehead problem - and considers it in the light of the forecasting-first scenario.

The wirehead problem

The wirehead problem arises as a result of intelligent agents stimulating their own pleasure centres - once they are properly able to understand the consequences of their own actions. In the context of intelligent machines, this usually involves the agents not doing what you want them to do - and so it is seen as undesirable.

There are a couple of conflicting visions of what dynamics machine intelligence is likely to exhibit in connection with the wirehead problem.

One perspective is from economics - and the framework of expected utility maximisation. According to a number of those who have studied the situation from this perspective, intelligent agents are likely to exhibit stable utility functions - and will strive to protect their values from modification under most circumstances.

From this perspective, most intelligent agents can be expected to reject behaviour that is analogous to drug-taking - and so will not stimulate their own pleasure centres or synthesise fake utility.

The possibility of an agent adjusting itself so it directly stimulates its own pleasure centres - thereby attaining its reward without doing any work - would be seen as a disaster, rather than as an opportunity, since none of its original goals would be met.

Another perspective on the issue comes from reinforcement learning enthusiasts. In reinforcement learning, agents learn the regularities in their environment by receiving feedback about how well they are doing in the form of a single, scalar reward signal. They then learn to adapt their behaviour in an attempt to increase their rewards.
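The reinforcement-learning setup described above - an agent receiving a single scalar reward and adapting its behaviour to increase it - can be sketched as a minimal multi-armed bandit learner. All names and parameters here are illustrative assumptions, not anything from the video:

```python
import random

def run_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Minimal reinforcement learner: repeatedly pick an action,
    receive a noisy scalar reward, and update value estimates so
    behaviour drifts toward higher-reward actions."""
    rng = random.Random(seed)
    n = len(true_means)
    estimates = [0.0] * n          # learned value of each action
    counts = [0] * n
    for _ in range(steps):
        # Mostly exploit the best-looking action; occasionally explore.
        if rng.random() < epsilon:
            a = rng.randrange(n)
        else:
            a = max(range(n), key=lambda i: estimates[i])
        r = true_means[a] + rng.gauss(0, 0.1)   # scalar reward feedback
        counts[a] += 1
        estimates[a] += (r - estimates[a]) / counts[a]  # incremental mean
    return estimates

estimates = run_bandit([0.2, 0.5, 0.9])
print(max(range(3), key=lambda i: estimates[i]))  # the agent settles on arm 2
```

Note that the agent here cares only about the reward signal itself - which is exactly what makes such agents prone to wireheading if they ever gain the ability to tamper with that signal.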

Reinforcement learning agents seem likely to be hedonistic pleasure seekers - and thus relatively prone to wireheading.

Wirehead behaviour has been observed in a variety of real-world situations:

  • Animals - including humans - often pleasure themselves by taking drugs.

  • Lab animals also stimulate their own pleasure centres - if given the chance to do so.

  • Money serves as a kind of utility within governments - and they sometimes print more of it, causing a form of dramatic and rapid inflation known as hyperinflation. It is like the government taking drugs.

  • For investors, stock prices often function as a kind of reward signal. Usually, increases are good, and decreases are bad. However, there are a variety of fraudulent practices which companies can use to temporarily and artificially inflate their stock price. Enron used some of these strategies in an accounting-fraud scandal that resulted in the company going bankrupt.

  • Wireheading was also famously seen in the early intelligent program, Eurisko.

These examples show that wireheading is a real phenomenon. However, they also suggest that wireheading may be relatively rare. Most companies, governments, animals and machines do not wirehead themselves. This illustrates that - though wireheading is a possibility - wirehead avoidance is also possible.

Some supporters of the idea that wireheading is important suggest that it may act as a limit, or brake, on the future development of machine intelligence. They argue that, once intelligence becomes advanced enough for agents to become capable of performing brain surgery on themselves, they will start to take advantage of this capability by wireheading themselves - and consequently improvements to intelligence will automatically slow down, through a kind of natural selection.

Wirehead enthusiasts tend to view valuing things other than pleasure as signifying a lack of intelligence. A pleasure-valuing agent can learn to perform any task - and its intelligence is quite general. By contrast, an agent that is programmed to find value in one specific way may find it difficult to adapt and survive if the environment it finds itself in changes.

For their part, those who think the wirehead problem has a neat solution tend to regard the agents that are prone to wireheading as the stupid ones.

Forecasting first

Having described some background, we now get to the point of the video - which concerns what insights into the wirehead problem can be obtained by adopting a "forecasting-first" perspective.

Firstly, I think that the analysis of goal-seeking provided by expected utility theory is probably essentially correct. So, I think it will be possible to construct an intelligent agent with a specified goal that remains stable over time, even as the system improves itself and increases its own intelligence.

However, this doesn't mean that the wirehead problem won't be an issue.

It looks as though a reward-seeking agent is going to be easier to construct initially. Evolution built reward-seeking agents first - and if you look at the engineering problems involved, it looks much simpler to build a reward-seeking agent than one with a clear conception about its own purpose, which it strives to preserve.

In particular, if you just consider the forecasting component of a machine intelligence, by far the most obvious way to construct that is as a reinforcement learning system - one that is rewarded for compressing its world model, making accurate predictions, or some combination of the two.

This approach has some side effects. A forecasting component built using reinforcement learning could increase the intensity of its reward signal in several ways:

  • By making better forecasts;
  • by attempting to manipulate the world so that it is more predictable, and...
  • by attempting to manipulate the world so that the agent gets pleasure directly.

Of these, only the first one is desirable. If you are trying to predict a company's stock price, and the system notices that it can be made very predictable if it arranges things so the company goes bankrupt, that might not be the ideal outcome. A forecasting agent has limited ability to influence the world, but it can change things by making forecasts - and there is such a thing as a self-fulfilling prophecy in the world of financial markets.

You could try to reduce the likelihood of such outcomes by making the agent dislike them - but that starts to get into an area of non-trivial evaluation function problems - where you are rewarding the agent for something more complex than making correct predictions.

A forecasting-based architecture

The other thing to say about the "forecasting first" perspective is that it helps throw some light on what early reinforcement learning systems might look like.

I have believed for a while that goal-based systems would be better - but also that we will probably get reinforcement systems first, for reasons of engineering difficulty.

However, I imagined simple reinforcement systems being rewarded for their immediate sensory inputs. It now looks as though we probably won't build many systems of that kind initially - but rather will engineer a type of consequentialist system - and engineer-in a division between forecasting, tree pruning and evaluation. Instead of the system having to figure out the expected utility framework itself, we can usefully build that in - and let the machine skip the problem of coming up with that design pattern.

Such a system will still have an evaluation function - and it will still look much like the evaluation function of a simple reinforcement agent. So, in the analogous case of a human, warmth, sugar, fat and sex would be assigned value, while pain and discomfort would be regarded negatively. However, rather than being applied to existing sensory inputs, the evaluation function would be applied to leaf nodes of a forecasting tree - to expected sensory inputs.

That provides some insight into what utility engineering will probably look like for such agents. They will apply an evaluation function to expected sensory inputs - similar to the way in which simple reinforcement learning agents value current sensory inputs.

There are some downsides to engineering this sort of thing in, because there are some subtleties and it isn't trivial to do right - but I think that dividing the problem up this way can be expected to result in it being solved quite a bit more rapidly.

