Manipulative Forecasting


Forecasting agents based on reinforcement learning systems may produce successful predictions by simplifying the world - rather than by understanding its complexities. That is not necessarily desirable. Here we discuss some possible strategies for dealing with that issue.

Reward schemes summary

There are two main proposed reward schemes for a forecaster.

  • Reward it when it makes correct predictions - and get it to maximise its expected reward in the future;

  • Reward it whenever it manages to compress its model of the world without losing information in the process.

The second relies on the formulation of the forecaster as a compressor.
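The two schemes can be sketched as simple reward functions. The Python below is illustrative only - the function names, and the use of saved bytes as the measure of compression progress, are my own assumptions rather than part of any particular proposal:

```python
def prediction_reward(prediction: str, outcome: str) -> float:
    """Scheme 1 (sketch): pay out when a prediction matches the
    observed outcome. A real agent would act so as to maximise the
    *expected* sum of such rewards over time."""
    return 1.0 if prediction == outcome else 0.0


def compression_reward(size_before: int, size_after: int) -> int:
    """Scheme 2 (sketch): pay out for lossless shrinkage of the world
    model, measured here simply as bytes saved. Shrinkage only counts
    if no information was lost - that check is not modelled here."""
    return max(0, size_before - size_after)
```

Note that the second reward is zero when the model grows: only genuine compression progress pays out.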

Side effects

In practice, these two schemes would behave fairly similarly. However, both have a potential side effect: rather than passively predicting future events, the machine is motivated to make the world more compressible.

Simplifying the world is not necessarily a desirable goal. Predicting a company's stock price is easy - if you can make sure that the company goes bankrupt, and people stop trading its shares.

A forecasting agent normally has only limited means of influencing the world - but its forecasts are themselves outputs, and publishing them can change behaviour. If it is sufficiently intelligent, it may find ways to use this channel to manipulate the outside world.


One question that arises is whether it is possible to reduce these kinds of effects by making the machine less future-oriented.

  • Short-term thinking

    If a machine is just interested in its next prediction, because that is all it is rewarded for, perhaps it won't bother with manipulating the world to try to simplify it.

    I don't think this would work too well in practice. Forecasting agents would normally spend some of their time making predictions, and most of the rest of their cycles trying to build better models of the world than the one they currently have. That is a strategy that requires long-term planning.

    What about rewarding increased compression of the world model? I think that is likely to suffer from much the same problem. The agent could potentially learn to increase its rewards by making its world more compressible.

  • Baseline

    Another strategy is to use a reward function that penalises making the world simpler, while still rewarding skill at compressing it.

    One way of doing this would be to use a compression baseline. The basic idea is to reward the compressor for success in compressing its observations better than some baseline compressor manages (for example PKZIP).

    If we reanalyse the stock crash problem after applying this intervention, bankrupting the company no longer seems like such a good strategy. A static stock price would be compressed very well by the baseline compressor, leaving no additional squeezing to be rewarded for.

    The approach would seem to benefit from a good-quality baseline compressor.

    Using a reasonable-quality baseline compressor could easily double the computational cost of evaluating the reward.

    Though it would reduce the drive towards simplicity, it might well introduce other, more subtle drives - so the extent to which this intervention helps to resolve the problem is not yet terribly clear.

  • Consequences

    Another approach would be to look for negative consequences of predictions on world complexity. The machine's own forecasting system could be used to do that. If a machine is about to predict HEADS, the future perceived complexity of the sensory stream could be calculated under the hypothesis that it predicts HEADS, and again under the hypothesis that it predicts TAILS. If the HEADS prediction leads to reduced world complexity, it could be flagged up as a potential attempt at manipulation.

    This seems likely to be an expensive approach.
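The baseline idea can be sketched in a few lines. Here zlib (a DEFLATE implementation, the same family of algorithm PKZIP uses) stands in for the baseline compressor; the function name and interface are illustrative assumptions:

```python
import zlib


def baseline_reward(observations: bytes, agent_compressed_size: int) -> int:
    """Reward only the bytes the agent saves beyond what a stock
    baseline compressor (zlib here) already achieves on the same
    observations. Doing no better than the baseline pays nothing."""
    baseline_size = len(zlib.compress(observations))
    return max(0, baseline_size - agent_compressed_size)


# A flat stock price stream is already trivial for the baseline, so
# there is almost nothing left for the agent to be rewarded for.
flat_price = b"100.00 " * 1000
```

With this scheme, crashing the stock to a constant price buys the agent very little: the baseline compressor already squeezes a constant stream down to almost nothing.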
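The consequence-checking idea can also be sketched. Here the forecaster's world model is represented by a hypothetical `simulate` hook, and compressed length serves as a crude complexity proxy - both are my assumptions, not an established interface:

```python
import zlib


def complexity(stream: bytes) -> int:
    """Crude complexity proxy: the compressed length of the stream."""
    return len(zlib.compress(stream))


def flag_manipulation(simulate, history: bytes, candidates) -> list:
    """`simulate(history, prediction)` is a hypothetical hook into the
    forecaster's own model, returning the sensory stream it expects to
    follow once that prediction has been published. Flag any candidate
    whose expected future is markedly simpler than the most complex
    rival's - a possible sign of an attempt to simplify the world."""
    futures = {p: complexity(simulate(history, p)) for p in candidates}
    most_complex = max(futures.values())
    return [p for p, c in futures.items() if c < 0.5 * most_complex]
```

For the coin example in the text: if predicting HEADS is expected to collapse the sensory stream into something highly regular while TAILS is not, HEADS gets flagged for inspection. The cost concern stands - every reward evaluation now requires one model rollout per candidate prediction.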


Manipulation is an obvious potential problem for powerful forecasters which use reinforcement learning. There does, however, appear to be some potential for cheap hacks that compensate somewhat for the biggest drawbacks. More work needs to be done to investigate the practicality of these approaches, and to uncover new ones.

Tim Tyler | Contact |