Machine Forecasting



Hi! I'm Tim Tyler, and this is a video about machine forecasting.

Machine forecasting refers to the use of artefacts to forecast future events. I've spent a while thinking about the automation of forecasting recently, and in this video I will step back and attempt to give a brief overview of the field and a discussion of its significance.


The human brain spends a lot of its time predicting the future consequences of its possible actions.

Many machine intelligences consist of a prediction component, an evaluation function and tree-pruning heuristics. The prediction component is used to understand the long-term consequences of actions, the evaluation function says how desirable possible future states are - and the tree-pruning heuristics are there to help narrow down the search space - so it can be searched in real time.

My main thesis here will be that machine forecasting is a problem that is likely to be solved before we have full machine intelligence.

Forecasting, prediction, compression

Machine forecasting is equivalent to the problem of predicting discrete sequences - since sensory input channels can be digitised and then serialised. Also, forecasting the far future is equivalent to predicting the probability distribution over the very next symbol in the sensory stream - since if you can do that, you can iterate the process to make a weighted tree of predictions that reaches out into the more distant future.
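To make the iteration concrete, here is a minimal Python sketch - assuming a hypothetical `predict(history)` that returns a probability for each possible next symbol - which rolls one-step prediction forward into a weighted tree of futures:

```python
# Sketch: turning one-step prediction into a multi-step forecast.
# `predict` is a hypothetical model: history -> {symbol: probability}.

def forecast(predict, history, depth, min_prob=1e-3):
    """Return {future_sequence: probability} reaching `depth` symbols ahead."""
    futures = {(): 1.0}
    for _ in range(depth):
        next_futures = {}
        for seq, p in futures.items():
            for sym, q in predict(history + list(seq)).items():
                if p * q >= min_prob:          # prune negligible branches
                    next_futures[seq + (sym,)] = p * q
        futures = next_futures
    return futures

# Toy model: a biased coin that ignores its history.
coin = lambda h: {"H": 0.7, "T": 0.3}
print(forecast(coin, [], 2))  # ('H','H') gets weight 0.49, ('T','T') gets 0.09
```

The `min_prob` cutoff is a stand-in for the tree-pruning heuristics mentioned earlier - without it, the tree of futures grows exponentially with depth.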

Sequence prediction is equivalent to stream compression. The close link between sequence prediction and stream compression is not obvious to everyone - so briefly:

Both problems require the construction of a model of the stream. For a prediction engine, the model is used to predict what symbol will come next - and for a compressor, the same prediction is made, and then the probability assigned to the actual symbol observed is converted into output bits using an arithmetic encoding scheme, or similar. In practice there are sometimes a few minor differences associated with these two application domains, but nothing to write home about - the two problems are essentially one problem wearing two different outfits.
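One way to see the equivalence numerically: under an ideal entropy coder - which arithmetic coding approaches - a symbol the model assigns probability p costs about -log2(p) bits, so a better predictor directly means a shorter compressed stream. A minimal sketch, using hypothetical toy models:

```python
import math

# Sketch of the prediction -> compression link: with an ideal entropy coder,
# a symbol predicted with probability p is encoded in about -log2(p) bits.

def code_length_bits(predict, stream):
    """Total ideal code length of `stream` under model `predict`."""
    total, history = 0.0, []
    for sym in stream:
        p = predict(history)[sym]   # model's probability for the actual symbol
        total += -math.log2(p)      # ideal cost of encoding that symbol
        history.append(sym)
    return total

uniform = lambda h: {"0": 0.5, "1": 0.5}   # knows nothing: 1 bit per symbol
skewed  = lambda h: {"0": 0.9, "1": 0.1}   # matches a 0-heavy stream

stream = list("000000000100000000")        # 18 symbols, mostly zeros
print(code_length_bits(uniform, stream))   # 18.0 bits
print(code_length_bits(skewed, stream))    # fewer bits: better model, better compression
```

The better the model predicts the stream, the fewer bits the coder emits - which is why a compression benchmark doubles as a prediction benchmark.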


That compression and machine intelligence are so closely linked seems to me to be one of the most important theoretical breakthroughs in the field in the last decade. The idea has been most prominently championed by Marcus Hutter. Marcus sponsored a compression prize in 2006 - to help drive forwards research in the area.

The significance of the work relating compression to machine intelligence is high for several reasons. It allows an easy way of measuring progress - an area which has been explored by Shane Legg. Also, it successfully breaks a challenging problem down into sub-components - often an important step on the way to solving the problem. Lastly, but perhaps most significantly, developing good quality stream compression engines looks like an easier problem than machine intelligence - and it is one which immediately suggests possible ways to solve it.

Compression implications

What could you do with a high quality stream compressor? Many of those working in the area appear to argue that compression is equivalent to machine intelligence. When pressed for details, they often say that a powerful sequence prediction engine would allow machines to pass the Turing test - if they were given sufficient relevant training data.

It seems to be true that - if a sufficiently powerful predictor watched enough Turing tests take place, and was then given a partial transcript, then it might well be able to do a convincing job of predicting which reply was most likely to come next.

However, this approach doesn't help much with something like soundly beating a 9-dan go player - since there are no agents with sufficiently-superior intellects available to provide relevant training data.

So, it seems as though predictors can copy and imitate - but are less adept at innovation. They could imitate a human, but perhaps not pass themselves off as a superintelligence. Possibly there are ways of making such a machine think that a superintelligence exists - and getting it to wonder what actions that agent might take - but this gets us into rather contrived territory.

So: machine forecasting isn't quite the same thing as full machine intelligence. Of course, in order to predict the actions of any intelligent agents it observes in its environment, a competent forecaster necessarily has to model them in considerable detail - thereby developing a model of acting intelligently. So, forecasting is close to advanced machine intelligence - but it isn't exactly the same thing.

Advantages of the approach

To my eyes, a forecasting agent seems rather like a machine intelligence with its main evaluation and tree-pruning circuitry stripped out, and a stripped down set of actuators. Since practically any intelligent agent needs to consider the future consequences of its actions, that makes forecasting a subset of almost any machine intelligence project. As with any modular construction, usually individual components are constructed before they are assembled.

Forecasting has sufficiently many real-world applications for its development to be well funded. Predicting stock market prices is perhaps the most obvious application. Google has discovered that it would like to know your search query before you type most of it in. There are many other applications - and they will provide an economic impetus to developing such systems - allowing their successes to catalyse their future development.

Compression is a problem where we have mountains of training data. If a reinforcement learning paradigm is used, the reward is relatively simple to calculate - and can be applied very rapidly. Also, there are no mechanical robots or humans in the loop - which is good, since such things tend to slow the whole system down.

Finally, which evaluation function to use for full machine intelligence projects is a difficult and controversial area - and tree pruning can quickly get messy. However, general-purpose data compression is pretty-much a pure math problem. It is, at the very least, a traditional computer science problem.


I think about the only other likely path to machine intelligence involves automated programming. In automated (or inductive) programming, computer programs write other computer programs from specifications in high-level languages. Automated programming has potentially greater autocatalytic potential - compared to forecasting. It closes up the build-test cycle, cutting human programmers out of the loop, potentially resulting in much-improved efficiency. However, progress in the field looks slow to me; the associated autocatalytic process might not start to snowball terribly early on - and I don't think it has much of a chance of beating an approach that aims directly at a forecasting agent. Possibly the two fields might mutually catalyse each other.

In closing

This then, is the case for machine forecasting coming first. Once we have machine forecasting, machine intelligence will probably follow relatively quickly.

It does seem very likely to me that we will get machine forecasting first. However, I note that only a few individuals seem to be engaged in the area at present. In my humble opinion, the approach seems so promising that many of those interested in machine intelligence should seriously consider either aiming for such an agent - or considering how to deal with the consequences of one being produced.


Machine forecasting implications


Hi! I'm Tim Tyler, and this is a rather technical video about machine forecasting - in other words, it is about automating the task of predicting the future.

In my last video on the topic I discussed the reasons for thinking that a forecasting agent might be an important stepping stone on the path towards creating general-purpose machine intelligence. In this video I will look further into one of the topics I discussed there - and consider what access to a powerful forecasting agent would allow you to achieve, and how close to general machine intelligence it would actually get you.

Firstly, forecasting and stream compression are equivalent problems - as I have explained in more detail elsewhere, and you should bear that in mind for the next minute or so.

Many seem to consider that access to a powerful real-time stream compression system would be equivalent to access to a powerful machine intelligence.

For example, Matt Mahoney writes:

I argue that compressing, or equivalently, modeling natural language text is "AI-hard". Solving the compression problem is equivalent to solving hard NLP problems such as speech recognition, optical character recognition (OCR), and language translation. I argue that ideal text compression, if it were possible, would be equivalent to passing the Turing test for artificial intelligence (AI), proposed in 1950.

Matt goes on to say:
In 2000, Hutter proved that finding the optimal behavior of a rational agent is equivalent to compressing its observations.

Then he goes on to argue that a powerful text compression engine would allow machines to pass the Turing test - if they were given a sufficiently large quantity of relevant training data.

I agree that - if a sufficiently powerful predictor watched enough Turing tests take place, and was then given a partial transcript - then it would be able to do a convincing job of predicting which reply was most likely to come next.

However, I am not really convinced that advanced forecasting is equivalent to advanced machine intelligence.

A powerful forecaster seems to be rather like an intelligent agent with some pieces missing.

One way of representing a cybernetic diagram of an intelligent agent would be as the combination of a forecasting component, an evaluation function, and a tree-pruning algorithm.

If you just have a forecasting component, then you can use that to predict the consequences of your possible actions. However, you would really need an evaluation function and a tree-pruning algorithm to make full use of this ability.
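As a minimal sketch of that wiring - assuming a hypothetical black-box `forecast(state, action)` returning probability-weighted outcomes - the three components can be combined into a simple action-selection loop:

```python
# Sketch: an agent loop combining a forecasting component with an
# evaluation function and simple beam-style pruning. `forecast` is a
# hypothetical black box: (state, action) -> [(next_state, probability)].

def choose_action(forecast, evaluate, state, actions, depth, beam=4):
    def value(s, d):
        if d == 0:
            return evaluate(s)               # the evaluation function
        best = float("-inf")
        for a in actions:
            outcomes = forecast(s, a)        # the forecasting component
            # prune: keep only the `beam` most probable outcomes
            outcomes = sorted(outcomes, key=lambda o: -o[1])[:beam]
            best = max(best, sum(p * value(s2, d - 1) for s2, p in outcomes))
        return best
    return max(actions, key=lambda a: sum(
        p * value(s2, depth - 1)
        for s2, p in sorted(forecast(state, a), key=lambda o: -o[1])[:beam]))

# Toy domain: state is a number, actions nudge it, evaluation prefers 10.
toy = lambda s, a: [(s + a, 0.8), (s, 0.2)]
best = choose_action(toy, lambda s: -abs(10 - s), 0, [-1, 0, 1], depth=3)
print(best)  # 1: moving towards 10 scores best
```

The point of the sketch is that `forecast` alone chooses nothing: the evaluation function and the pruning rule had to be supplied separately.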

Simple tree-pruning algorithms are available as off-the-shelf components - though in general, tree-pruning can be a non-trivial problem. However, the evaluation function is a substantial and non-trivial element, which would not be supplied with the forecasting component.

A powerful forecaster would be able to predict the behaviour of agents in its environment. If it watched some experts playing go, it would model them in considerable detail in order to acquire the ability to predict their moves. Being able to do that would mean that such a forecaster would be a pretty smart machine.

Such a forecaster would need to at least know something about tree pruning and evaluation functions if it was successfully able to model and predict the actions of other agents in its environment. However, modelling the evaluation function of others isn't quite the same thing as having your own evaluation function.

Imagine for a moment that you had access to a black box containing a fairly powerful forecaster - and that you wanted to use it to obtain a program which was able to play the game of go better than the best humans can.

In practice, you can't just graft a tree-pruning algorithm and a win-the-game evaluation function onto a forecasting component. The resulting composite machine would spend its lifetime exploring the resulting search space of moves - and might never get any positive rewards through winning games. Evaluation functions in go are complex. Deciding which evaluation function to use is a non-trivial part of finding the solution to the problem - and having a forecasting component doesn't obviously help terribly much with that.

Perhaps it would be possible to trick the forecaster into thinking a superintelligent go player already existed - and then get it to predict its moves. However, without actually seeing the actions of a superintelligent go player, the existence of such an agent might not be obvious to the forecaster.

It might be challenging to fool the forecaster into believing that such a creature actually existed. The forecaster might instead apply Occam's razor - and decide that such an explanation was a needlessly complex way of interpreting the data - and conclude, accurately, that it is more likely that someone is trying to fool it.

Much the same kind of thing might happen if you tried to get the forecaster to imagine printing out the source code for such an agent.

If I was convinced that an intelligent machine go player existed, I would be hard pressed to write down a reasonable guess at what their source code was.

Even if you did trick a forecaster into counter-factually believing that such an expert go playing agent existed, it might not do a very good job of predicting either its actions or its source code - unless it was itself extremely powerful.

Like humans, an agent might be most adept at creating lesser agents - and have problems creating its own intellectual peer.

So, it seems to me that a forecasting component isn't quite the same thing as advanced machine intelligence. However, it does seem likely that an advanced prediction engine would allow you to build creatures that imitated humans - if given sufficient training time.

With enough human-like machines around, the rate of scientific and technical progress might well increase substantially. That would allow faster research and progress into the field of machine intelligence - facilitating a gradual evolutionary boot-strapping process - that would then lead to full superintelligence.



Sequence Prediction - my other main page on this topic
50'000 Prize for Compressing Human Knowledge
Why Compress?
Forecasting - Wikipedia
Universal Artificial Intelligence - Facebook group
Prediction and Forecasting - Facebook group

Tim Tyler