Forecasting agents based on reinforcement learning systems may produce
successful predictions by simplifying the world - rather than by understanding
its complexities. That is not necessarily desirable. Here we discuss
some possible strategies for dealing with that issue.
Reward schemes summary
There are two main proposed reward schemes for a forecaster:

- Reward it when it makes correct predictions - and get it to maximise its
expected reward in the future;
- Reward it whenever it manages to compress its model of the world without
losing information in the process.

The second scheme relies on formulating the forecaster as a compressor.
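Under some simplifying assumptions, the two schemes can be sketched side by side. The function names and the bit-count interface below are illustrative, not part of any proposal:

```python
import math

def prediction_reward(p_observed: float) -> float:
    # Scheme 1: reward accuracy. p_observed is the probability the
    # agent assigned to the outcome that actually occurred; using the
    # log-probability rewards confident correct predictions over
    # hedged ones.
    return math.log(p_observed)

def compression_reward(old_model_bits: int, new_model_bits: int) -> float:
    # Scheme 2: reward lossless compression progress -- the number of
    # bits saved when the agent shrinks its world model without
    # discarding information.
    return float(old_model_bits - new_model_bits)
```

The similarity of the two schemes in practice is no accident: under arithmetic coding, a model that assigns probability p to an observation encodes it in about -log2(p) bits, so better predictions and better compression go hand in hand.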
In practice, these two methods would be pretty similar. However, both
potentially have side effects. Rather than passively predicting future
events, the machine is motivated to make the world more compressible.
Simplifying the world is not necessarily a desirable goal. Predicting a
company's stock price is easy - if you can make sure that the company goes
bankrupt, and people stop trading its shares.
A forecasting agent normally has limited ability to act on the world
directly - but its forecasts are themselves outputs, and so give it some
influence over the world. If it is sufficiently intelligent, it may find
ways to use that influence to manipulate the outside world.
One question is whether it is possible to reduce these kinds of
effects by making the machine less future-oriented.
If a machine is just interested in its next prediction, because that is all it
is rewarded for, perhaps it won't bother with manipulating the world to try
and simplify it.
I don't think this would work too well in practice. Forecasting agents would
normally spend some of their time making predictions, and most of the rest of
their cycles trying to build better models of the world than the one they
currently have. That is a strategy that requires long-term planning.
What about rewarding increased compression of the world model? I think that
is likely to suffer from much the same problem. The agent could potentially
learn to increase its rewards by making its world more compressible.
Another strategy is to use a reward function that penalises making the
world simpler, while still rewarding skill at compressing it.
One way of doing this would be to use a compression baseline. The
basic idea is to reward the compressor only for the extent to which it
compresses its observations better than some fixed baseline compressor manages.
If we reanalyse the stock crash problem after applying this intervention,
bankrupting the company no longer seems like such a good strategy. A static
stock price would be compressed very well by the baseline compressor,
leaving no additional squeezing to be rewarded for.
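A minimal sketch of the baseline idea, using zlib purely as a stand-in for whatever baseline compressor is chosen (the function name and signature are hypothetical):

```python
import zlib

def baseline_relative_reward(observations: bytes,
                             agent_compressed_size: int) -> int:
    # Reward only the bits (here, bytes) saved *beyond* what a fixed
    # reference compressor achieves on the same observations.
    baseline_size = len(zlib.compress(observations))
    return max(0, baseline_size - agent_compressed_size)

# A flat stock price compresses almost completely under the baseline,
# leaving the agent very little headroom to be rewarded for - even a
# perfect agent (compressed size 0) gains almost nothing.
flat_price_stream = b"100.00 " * 1000
headroom = baseline_relative_reward(flat_price_stream, 0)
```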
The approach would seem to benefit from a good-quality baseline
compressor. However, running a reasonable-quality baseline compressor could
easily roughly double the computational cost of reward evaluation.
Though it would reduce the drive towards simplicity, it might well introduce
other, more subtle drives - so, the extent to which this intervention helps to
resolve the problem is not yet terribly clear.
Another approach would be to look for negative consequences of predictions on
world complexity. The machine's own forecasting system could be used to do
that. If a machine is about to predict HEADS, the consequences for the
future perceived complexity of the sensory stream could be calculated under
the hypothesis of announcing HEADS, and again under the hypothesis of
announcing TAILS. If the HEADS prediction leads to reduced world complexity,
this could be flagged up as a potential attempt at manipulation.
This seems likely to be an expensive approach.
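The screening pass above might be sketched as follows. The `expected_complexity` interface is an assumption - it stands in for a (potentially very expensive) forward simulation of the sensory stream under each candidate announcement:

```python
def flag_manipulative_predictions(model, state, candidates):
    # For each candidate prediction, ask the forecaster's own model
    # how complex the future sensory stream is expected to be if that
    # prediction is announced. `model.expected_complexity(state, p)` is
    # a hypothetical interface returning, e.g., a compressed size in bits.
    complexities = {p: model.expected_complexity(state, p)
                    for p in candidates}
    baseline = max(complexities.values())
    # Flag any candidate whose announcement is expected to reduce
    # world complexity relative to the alternatives.
    return [p for p, c in complexities.items() if c < baseline]
```

With a two-way HEADS/TAILS choice, this flags HEADS whenever announcing it is predicted to simplify the stream more than announcing TAILS would.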
Manipulation is an obvious potential problem for powerful forecasters which
use reinforcement learning. There does, however, appear to be some potential
for cheap hacks that compensate somewhat for its biggest drawbacks. More work
needs to be done to investigate the practicality of these approaches, and to
uncover new ones.
Tim Tyler