Hi! I'm Tim Tyler, and this is a video about machine forecasting.
Machine forecasting refers to the use of artefacts to forecast future
events. I've spent a while thinking about automating forecasting
recently, and in this video I will step back and attempt to give a brief
overview of the field and a discussion of its significance.
Forecasting
The human brain spends a lot of its time predicting the future consequences of
its possible actions.
Many machine intelligences consist of a prediction component, an evaluation
function and tree-pruning heuristics. The prediction component is used to
understand the long-term consequences of actions, the evaluation function says
how desirable possible future states are - and the tree-pruning heuristics are
there to help narrow down the search space - so it can be searched in real
time.
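That three-component decomposition can be sketched as a small expectimax-style search loop. This is only an illustrative skeleton, not any particular system's design - `predict`, `evaluate` and `prune` are hypothetical stand-ins for the prediction component, evaluation function and tree-pruning heuristics just described:

```python
def plan(state, predict, evaluate, prune, actions, depth):
    """Pick the action whose predicted consequences score best.

    `predict(state, action)` returns (probability, next_state) pairs,
    `evaluate(state)` scores how desirable a state is, and
    `prune(actions)` narrows the set of actions considered.
    """
    def expected_value(state, depth):
        if depth == 0:
            return evaluate(state)
        best = evaluate(state)
        for action in prune(actions):
            # Weight each predicted outcome by its probability.
            value = sum(p * expected_value(s, depth - 1)
                        for p, s in predict(state, action))
            best = max(best, value)
        return best

    return max(prune(actions),
               key=lambda a: sum(p * expected_value(s, depth - 1)
                                 for p, s in predict(state, a)))
```

The point of the sketch is just that the prediction component does the heavy lifting: the other two pieces steer and trim the search it makes possible.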
My main thesis here will be that machine forecasting is a problem
that is likely to be solved before we have full machine intelligence.
Forecasting, prediction, compression
Machine forecasting is equivalent to the problem of predicting discrete
sequences - since sensory input channels can be digitised and then serialised.
Also, forecasting the far future is equivalent to predicting the probability
density function associated with the very next symbol in the sensory stream -
since if you can do that, you can iterate the process to make a
weighted tree of predictions that reaches out into the more distant future.
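That iteration can be made concrete with a minimal sketch. Here `next_pdf` is a hypothetical one-step predictor that returns a probability for each possible next symbol; repeatedly applying it expands the one-step prediction into a weighted set of possible futures:

```python
def forecast_tree(history, next_pdf, depth):
    """Expand one-step predictions into a weighted tree of futures.

    `next_pdf(history)` is assumed to return a dict mapping each
    possible next symbol to its probability.  The result maps each
    continuation of length `depth` to the probability of that future.
    """
    futures = {(): 1.0}
    for _ in range(depth):
        extended = {}
        for sequence, prob in futures.items():
            for symbol, p in next_pdf(history + list(sequence)).items():
                # The probability of a whole future is the product of
                # the one-step probabilities along its branch.
                extended[sequence + (symbol,)] = prob * p
        futures = extended
    return futures
```

The branch probabilities always sum to one, so the structure really is a probability distribution over futures - a weighted prediction tree.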
Sequence prediction is equivalent to stream compression. The close link
between sequence prediction and stream compression is not obvious to everyone - so
briefly:
Both problems require the construction of a model of the stream. For a
prediction engine, the model is used to predict what symbol will come next -
and for a compressor, the same prediction is made, and then the probabilities
of the actual symbol observed are converted into output bits using an
arithmetic encoding scheme, or similar. In practice there are sometimes a few
minor differences associated with these two application domains, but
nothing to write home about - the two problems are essentially one problem
wearing two different outfits.
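To illustrate the link, here is a toy sketch using a Laplace-smoothed frequency model as the predictor. An arithmetic coder achieves a total output length within a couple of bits of the sum of -log2(p) over the symbols actually observed, so the ideal code length below is, in effect, the compressed size the model's predictions would buy you:

```python
import math
from collections import Counter

def code_length_bits(stream):
    """Ideal compressed size, in bits, for a simple adaptive model.

    Each symbol costs -log2(p) bits, where p is the probability the
    model assigned to it before seeing it.  Better prediction means
    fewer bits - which is the prediction/compression equivalence.
    """
    counts = Counter()
    seen = 0
    alphabet = set(stream)
    total_bits = 0.0
    for symbol in stream:
        # Laplace-smoothed probability from the counts so far.
        p = (counts[symbol] + 1) / (seen + len(alphabet))
        total_bits += -math.log2(p)
        counts[symbol] += 1
        seen += 1
    return total_bits
```

A highly predictable stream costs almost nothing; a less predictable one costs more - the model's predictive skill directly sets the compressed size.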
Significance
That compression and machine intelligence are so closely linked seems to me to
be one of the most important theoretical breakthroughs in the field in the
last decade. The idea has been most prominently championed by Marcus Hutter.
Marcus sponsored a compression prize in 2006 - to help drive forwards research
in the area.
The significance of the work relating compression to machine intelligence is
high for several reasons. It provides an easy way of measuring progress - an
area which has been explored by Shane Legg. Also, it successfully breaks a
challenging problem down into sub-components - often an important step on the
way to solving the problem. Lastly, but perhaps most significantly, developing
good quality stream compression engines looks like an easier problem
than machine intelligence - and it is one which immediately suggests possible
ways to solve it.
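Measuring progress this way is straightforward in practice: score a compressor by how few bits per character it needs on a fixed corpus, in the spirit of Hutter's prize. The sketch below is illustrative - `compression_score` is an invented helper, and zlib is merely a convenient stand-in for a stronger model:

```python
import zlib

def compression_score(compress, corpus):
    """Bits per character achieved by `compress` on a benchmark corpus.

    Lower is better.  Tracking this number over time for successive
    systems gives a simple, objective measure of progress.
    """
    compressed = compress(corpus.encode("utf-8"))
    return 8 * len(compressed) / len(corpus)
```

Any two systems can then be compared on the same corpus with a single number, with no judges or subjective evaluation in the loop.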
Compression implications
What could you do with a high quality stream compressor? Many of those working
in the area appear to argue that compression is equivalent to machine
intelligence. When pressed for details, they often say that a powerful
sequence prediction engine would allow machines to pass the Turing test -
if they were given sufficient relevant training data.
It seems to be true that if a sufficiently powerful predictor watched enough
Turing tests take place, and was then given a partial transcript, it might
well be able to do a convincing job of predicting which reply was most likely
to come next.
However, this approach doesn't help much with something like soundly beating a
9-dan go player - since there are no agents with sufficiently-superior
intellects available to provide relevant training data.
So, it seems as though predictors can copy and imitate - but are less adept at
innovation. They could imitate a human, but perhaps not pass themselves off as
a superintelligence. Possibly there are ways of making such a machine
think that a superintelligence exists - and getting it to wonder what actions
it might take - but this gets us into rather contrived territory.
So: machine forecasting isn't quite the same thing as full machine
intelligence. Of course, in order to predict the actions of any intelligent
agents it observes in its environment, a competent forecaster
necessarily has to model them in considerable detail - thereby developing
a model of acting intelligently. So, forecasting is close to
advanced machine intelligence - but it isn't exactly the same thing.
Advantages of the approach
To my eyes, a forecasting agent seems rather like a machine intelligence with
its main evaluation and tree-pruning circuitry stripped out, and a
stripped-down set of actuators. Since practically any intelligent agent needs
to consider the future consequences of its actions, that makes forecasting a
subset of almost any machine intelligence project. As with any modular
construction, usually individual components are constructed before they are
assembled.
Forecasting has sufficiently many real-world applications for its development
to be well funded. Predicting stock market prices is perhaps the most obvious
application. Google has discovered that it would like to know your search query
before you type most of it in. There are many other applications - and they
will provide an economic impetus to developing such systems - allowing their
successes to catalyse their future development.
Compression is a problem where we have mountains of training data. If
a reinforcement learning paradigm is used, the reward is relatively
simple to calculate - and can be applied very rapidly. Also, there are no
mechanical robots or humans in the loop - which is good, since such things
tend to slow the whole system down.
Finally, which evaluation function to use for full machine intelligence
projects is a difficult and controversial area - and tree pruning can quickly
get messy. However, general-purpose data compression is pretty much a pure
math problem. It is, at the very least, a traditional computer science
problem.
Competitors
I think about the only other likely path to machine intelligence involves
automated programming. In automated (or inductive) programming, computer
programs write other computer programs from specifications in high-level
languages. Automated programming has potentially greater
autocatalytic potential - compared to forecasting. It closes up the build-test
cycle, cutting human programmers out of the loop, potentially resulting in
much-improved efficiency. However, progress in the field looks slow to me; the
associated autocatalytic process might not start to snowball
terribly early on - and I don't think it has much of a chance of beating an
approach that aims directly at a forecasting agent. Possibly the two
fields might mutually catalyse each other.
In closing
This then, is the case for machine forecasting coming first. Once we have
machine forecasting, machine intelligence will probably follow
relatively quickly.
It does seem very likely to me that we will get machine forecasting
first. However, I note that only a few individuals seem to be currently
engaged in the area. In my humble opinion, the approach seems so promising
that many of those interested in machine intelligence should seriously
consider either aiming for such an agent - or considering how to deal
with the consequences of one being produced.
Hi! I'm Tim Tyler, and this is a rather technical video about machine
forecasting - in other words, it is about automating the task of predicting the future.
In my last video on the topic I discussed the reasons for thinking that a
forecasting agent might be an important stepping stone on the path towards
creating general-purpose machine intelligence. In this video I will look
further into one of the topics I discussed there - and consider what access to
a powerful forecasting agent would allow you to achieve, and how close to
general machine intelligence it would actually get you.
Firstly, forecasting and stream compression are equivalent problems - as I
have explained in more detail elsewhere, and you should bear that in mind for
the next minute or so.
Many seem to consider that access to a powerful real-time stream compression
system would be equivalent to access to a powerful machine intelligence.
For example, Matt Mahoney writes:
I argue that compressing, or equivalently, modeling natural language text is
"AI-hard". Solving the compression problem is equivalent to solving hard NLP
problems such as speech recognition, optical character recognition (OCR), and
language translation. I argue that ideal text compression, if it were possible,
would be equivalent to passing the Turing test for artificial intelligence
(AI), proposed in 1950.
Matt goes on to say:
In 2000, Hutter proved that finding the optimal behavior of a rational
agent is equivalent to compressing its observations.
Then he goes on to argue that a powerful text compression engine would allow
machines to pass the Turing test - if they were given a sufficiently
large quantity of relevant training data.
I agree that - if a sufficiently powerful predictor watched enough Turing tests
take place, and was then given a partial transcript - then it would be able to
do a convincing job of predicting which reply was most likely to come next.
However, I am not really convinced that advanced forecasting is equivalent to
advanced machine intelligence.
A powerful forecaster seems to be rather like an intelligent agent
with some pieces missing.
One way of representing a cybernetic diagram of an intelligent agent would be
as the combination of a forecasting component, an evaluation function, and a
tree-pruning algorithm.
If you just have a forecasting component, then you can use that to predict the
consequences of your possible actions. However, you would really need
an evaluation function and a tree-pruning algorithm to make full use of this
ability.
Simple tree-pruning algorithms are available as off-the-shelf
components - though in general, tree-pruning can be a non-trivial problem.
However, the evaluation function is a substantial and non-trivial element,
which would not be supplied with the forecasting component.
A powerful forecaster would be able to predict the behaviour of agents in its
environment. If it watched some experts playing go, it would model them in
considerable detail in order to acquire the ability to predict their moves.
Being able to do that would mean that such a forecaster would be a pretty smart
machine.
Such a forecaster would need to know at least something about tree
pruning and evaluation functions in order to successfully model and
predict the actions of other agents in its environment. However, modelling the
evaluation function of others isn't quite the same thing as having your
own evaluation function.
Imagine for a moment that you had access to a black box containing a fairly
powerful forecaster - and that you wanted to use it to obtain a program which
was able to play the game of go better than the best humans can.
In practice, you can't just graft a tree-pruning algorithm - and the idea of
winning games as an evaluation function - onto a forecasting component. The
resulting composite machine would spend its lifetime exploring the resulting
search space of moves - and might never get any positive rewards through
winning games. Evaluation functions in go are complex. Deciding which
evaluation function to use is a non-trivial part of finding the solution to
the problem - and having a forecasting component doesn't obviously
help terribly much with the problem.
Perhaps it would be possible to trick the forecaster into thinking a
superintelligent go player already existed - and then get it to predict its
moves. However, without actually seeing the actions of a superintelligent go
player, the existence of such an agent might not be obvious to the forecaster.
It might be challenging to fool the forecaster into believing that such a
creature actually existed. The forecaster might instead apply Occam's
razor - and decide that such an explanation was a needlessly complex way of
interpreting the data - and conclude, accurately, that it is more likely that
someone is trying to fool it.
Much the same kind of thing might happen if you tried to get the forecaster to
imagine printing out the source code for such an agent.
If I was convinced that an intelligent machine go player existed, I
would be hard pressed to write down a reasonable guess at what its source
code was.
Even if you did trick a forecaster into counter-factually believing that such
an expert go playing agent existed, it might not do a very good job of
predicting either its actions or its source code - unless it was itself
extremely powerful.
Like humans, an agent might be most adept at creating lesser agents - and have
problems creating its own intellectual peer.
So, it seems to me that a forecasting component isn't quite
the same thing as advanced machine intelligence. However,
it does seem likely that an advanced prediction engine would allow you to
build creatures that imitated humans - if given sufficient training time.
With enough human-like machines around, the rate of scientific and technical
progress might well increase substantially. That would allow faster research
and progress into the field of machine intelligence - facilitating a gradual
evolutionary boot-strapping process - that would then lead to full
superintelligence.