Hi, I'm Tim Tyler, and this is a video about Solomonoff induction.
Solomonoff induction is an abstract, mathematically idealised model of inductive inference.
It describes an entity that takes a finite sequence of symbols as its input - and then assigns probabilities to each of the possible symbols that might come next in the stream.
Algorithmic probability
Solomonoff induction is based on the idea of algorithmic probability.
Algorithmic probability provides an estimate of the probability of a particular finite string of symbols arising - based on how likely a randomised computer program is to produce output prefixed with that string.
Essentially, algorithmic probability works by considering what fraction of the set of all possible computer programs in a specified programming language produce the target sequence as a prefix of their output.
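In the standard formulation (a fact from the literature, not something unique to this talk), for a universal prefix machine U this is written as a sum over programs, each weighted by its length:

```latex
M(x) = \sum_{p \,:\, U(p) = x*} 2^{-|p|}
```

Here U(p) = x* means that program p produces output beginning with the string x, and the weight 2^{-|p|} is exactly the chance of hitting p by feeding U random coin-flips - which is why this sum and the "fraction of random programs" picture agree.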
So, for example, in the space of all possible programs which produce binary output strings, there are usually a lot of programs that produce outputs starting with a million zeros. In English, a million zeros is easy to describe - you can just say "a million zeros". It is usually easy to represent in a computer program as well: you just have a loop that outputs the "zero" symbol.
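Rendered in Python (the language choice is mine - any language would do), that loop might look like this:

```python
# A very short program whose output begins with a million zeros.
# Its brevity is what makes that prefix algorithmically probable.
for _ in range(1_000_000):
    print("0", end="")
```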
Such highly-compressible strings are common output prefixes in the space of all possible computer programs.
The idea of calculating a fraction of an infinite set of programs may seem problematic. However, if you consider the notion of randomly sampling from the space - and taking the limit as the number of samples tends towards infinity - it may start to seem a little more reasonable.
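To make the sampling idea concrete, here is a minimal Python sketch. The two-bit instruction set below is an invented toy reference machine, purely for illustration - a real construction would use a universal Turing machine:

```python
import random

def run(program, max_steps=1_000, max_output=32):
    """Interpret a bit string on a toy two-bit-opcode machine."""
    out, pc, steps = [], 0, 0
    while pc + 1 < len(program) and steps < max_steps and len(out) < max_output:
        op = program[pc:pc + 2]
        if op == "00":
            out.append("0")
        elif op == "01":
            out.append("1")
        elif op == "10":      # jump back to the start - a loop
            pc = -2
        else:                 # "11": halt
            break
        pc += 2
        steps += 1
    return "".join(out)

def estimate_prefix_probability(prefix, program_length=16, samples=50_000):
    """Fraction of random programs whose output starts with `prefix`."""
    hits = 0
    for _ in range(samples):
        program = "".join(random.choice("01") for _ in range(program_length))
        if run(program).startswith(prefix):
            hits += 1
    return hits / samples

# Compressible prefixes come out more probable than incompressible ones.
print(estimate_prefix_probability("0000000000"))
print(estimate_prefix_probability("0110100110"))
```

Running this, the all-zeros prefix scores far higher than the irregular one - a tiny two-instruction loop already covers it, while the irregular string mostly has to be spelled out symbol by symbol.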
Note that algorithmic probability is defined with respect to a programming language. Often this is assumed to be some kind of simple Turing machine. The language used is sometimes described as a reference machine.
Forecasting
If you can calculate the algorithmic probability of arbitrary sequences, then you can use that ability to make probabilistic forecasts.
To calculate the probability of some particular symbol occurring next in a given stream, you would first calculate the algorithmic probability of the stream so far, and then calculate the algorithmic probability of the stream with the candidate symbol appended.
Once you have those figures, you can use the mathematics of conditional probability to calculate the probability of that particular symbol occurring next. In this case, that involves dividing the probability of the longer sequence by the probability of the prefix that has already been observed.
This process can then be iterated to produce longer-range forecasts, if needed.
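The whole prediction step fits in a few lines, given any estimator of algorithmic probability - such as the sampling routine sketched earlier, which stands in here for the real, uncomputable quantity:

```python
def predict_next(alp, stream, alphabet=("0", "1")):
    """One-step forecast: P(s | stream) = ALP(stream + s) / ALP(stream).

    `alp` is any estimator of the algorithmic probability of a string,
    e.g. the sampling sketch above. Assumes the estimator assigns the
    observed stream non-zero probability.
    """
    p_stream = alp(stream)
    return {s: alp(stream + s) / p_stream for s in alphabet}

# Longer-range forecasts: append a predicted symbol and repeat.
```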
Approximations
Unfortunately, algorithmic probability and Solomonoff induction are both uncomputable. Attempts to calculate them directly have difficulties dealing with the infinite number of possible programs - and tend to run into the halting problem. However, there are approximations of Solomonoff induction which are computable - for example, Levin search.
In practice, approximations often limit program length and program execution time - to make the process more tractable. Sometimes, compression programs and optimisation techniques are used to identify small programs that model the data - rather than exhaustively searching for them.
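To give the flavour of such a resource-bounded approximation - a sketch in the spirit of Levin search, not Levin's actual algorithm - you can enumerate every program up to a length cap, run each under a step budget, and weight each by two to the minus its length. The toy `run` interpreter from the earlier sketch is assumed:

```python
from itertools import product

def bounded_prefix_probability(prefix, max_length=14, step_budget=500):
    """Length- and time-limited stand-in for algorithmic probability."""
    total = 0.0
    for length in range(2, max_length + 1, 2):
        weight = 2.0 ** -length   # chance of drawing this program at random
        for bits in product("01", repeat=length):
            program = "".join(bits)
            if run(program, max_steps=step_budget).startswith(prefix):
                total += weight
    # On a prefix-free machine these weights would sum to at most 1
    # (Kraft's inequality); this toy version over-counts, which is
    # acceptable for a sketch.
    return total
```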
Reference machine considerations
I said earlier that algorithmic probability - and thus Solomonoff induction - depends on a choice of programming language, or reference machine. The results can vary dramatically depending on which reference machine is chosen. This is similar to the situation with Kolmogorov complexity - which depends on a reference machine in exactly the same way. Usually a simple reference machine is chosen.
What is the significance of the choice of reference machine?
In the case of forecasters or intelligent agents, if you are dealing with a very small quantity of data, the choice of reference machine can have a significant impact on the results that are obtained. However, if the system observes a lot of data, the prior probabilities implied by the reference machine usually get swamped by the observed data. Then the choice of reference machine is not so critical.
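A tiny numerical illustration of that swamping effect (the numbers here are invented for the example): two forecasters start with wildly different priors over the same two hypotheses, yet converge once enough data arrives.

```python
def posterior(prior, likelihoods):
    """Bayes's theorem: posterior is proportional to prior times likelihood."""
    unnorm = [p * l for p, l in zip(prior, likelihoods)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Two hypotheses about a coin: P(heads) is either 0.5 or 0.9.
def likelihood(p_heads, heads, tails):
    return (p_heads ** heads) * ((1 - p_heads) ** tails)

heads, tails = 45, 5   # the data strongly favour the 0.9 coin
likelihoods = [likelihood(0.5, heads, tails), likelihood(0.9, heads, tails)]

# Two very different priors - standing in for two reference machines.
for prior in ([0.99, 0.01], [0.01, 0.99]):
    print(prior, "->", posterior(prior, likelihoods))
```

Both forecasters end up assigning almost all their probability to the 0.9 coin, despite starting from near-opposite priors.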
Applications
Computable approximations of Solomonoff induction are used to provide prior distributions for use with Bayes's theorem.
Solomonoff induction can be seen as a formalised version of induction using Occam's razor. The idea is at the heart of many machine forecasting and machine intelligence projects.