Beyond Occam's razor

Transcript:

Hi. I'm Tim Tyler, and this is a brief video about Occam's razor.

Occam's razor is a central principle in science. It suggests that, if you have multiple hypotheses that explain a given set of observations, you should prefer the least complex one. There's some debate over exactly what "complexity" means in this context. However, despite this ambiguity, the principle has massive empirical support.

There's another interesting wrinkle, though - which was brought to my attention by a 2012 paper by Alexey Potapov, Andrew Svitenkov and Yurii Vinogradov. If you want to make predictions, you shouldn't just use the simplest explanation. Instead, you should consider an ensemble of possible explanations and weight them according to a negative exponential function of their length.
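
As a rough illustration, here is a minimal Python sketch of this kind of length-weighted ensemble prediction. The hypotheses, their bit-lengths and their predictions are invented placeholders, and the 2^(-length) weighting is the standard choice from algorithmic probability - none of this is taken from the paper itself:

    # Each entry: (description length in bits, predicted next symbol).
    # All values here are invented for illustration.
    hypotheses = [
        (10, "A"),  # the shortest explanation predicts "A"
        (12, "B"),  # two longer explanations predict "B"
        (13, "B"),
    ]

    # Weight each hypothesis by 2^(-length) - a negative exponential
    # of its length - and pool the weight behind each prediction.
    totals = {}
    for length, prediction in hypotheses:
        totals[prediction] = totals.get(prediction, 0.0) + 2.0 ** -length

    # Normalise the pooled weights into probabilities.
    z = sum(totals.values())
    for prediction in sorted(totals):
        print(f"P({prediction}) = {totals[prediction] / z:.2f}")

    # Prints P(A) = 0.73 and P(B) = 0.27: the shortest explanation
    # dominates here, but the longer ones still carry probability mass.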

It might seem as though this would make little difference - since an explanation that is shorter by one bit carries twice the weight under the usual base-2 weighting. However, in principle, longer explanations could outnumber and collectively outweigh the shortest one, and a completely different set of predictions could then be more likely than those offered by the least complex hypothesis.
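
To make the arithmetic concrete (again with invented lengths): under a 2^(-length) weighting, a single 10-bit explanation carries weight 2^(-10), and each 12-bit explanation carries only a quarter of that. But if five distinct 12-bit explanations all agree on a different prediction, together they carry 5 × 2^(-12) = 1.25 × 2^(-10) - more than the shortest explanation on its own.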

What is the significance of the need for an ensemble of hypotheses? On one hand, it will often make little practical difference. On the other, you could argue that it reduces Occam's razor to an irrelevance. It no longer matters terribly much what the shortest explanation says: using a single explanation is not a very good idea in the first place, since you really need to consider an ensemble of hypotheses if you are serious about making predictions.

Some formulations of Occam's razor are hit harder than others. For instance, one formulation says that "entities must not be multiplied beyond necessity". However, it turns out that the most accurate approach to prediction actually involves an enormous number of entities in a massive ensemble - far more entities than are present in the simplest explanation.

Even the term "razor" no longer seems appropriate. The idea of a razor suggests that the more complex hypotheses are discarded. However, it turns out that you ideally should not just "trim off" the more complex hypotheses: you need to keep many of them around - not just in case they turn out to be true, but because they contribute probability mass to current predictions.

No doubt the contents of this video will seem kind-of obvious to some. I'm sure some will say that the need to keep longer explanations around is already well known - and that algorithmic probability has been around since the 1960s. However, I'm not sure the message has fully got through. Instead of trimming off longer hypotheses, we should really be keeping them around and making use of them. So: Occam's razor has turned out to be the disposable kind. It is a useful heuristic, but it is not really the correct way to make predictions from an ensemble of hypotheses.

Enjoy,


Tim Tyler | Contact | http://matchingpennies.com/