The One True Razor

Occam's Razor

Occam's Razor refers to the idea that the shortest explanation is often the best one. The razor is useful in a variety of domains - including science, AI and rational discourse.

There have been various attempts to formalise the idea. One of the first issues that comes up is: the shortest explanation expressed in which language?

Formalising the razor

Formalising the razor is often done using the idea of Kolmogorov complexity. Kolmogorov complexity quantifies the information content of arbitrary objects. The Kolmogorov complexity of an object is the length of the shortest program which produces that object as its output.
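
In symbols - relative to some fixed universal machine U, which is exactly the choice at issue below - the Kolmogorov complexity of an object x is usually written:

    K_U(x) = min { |p| : U(p) = x }

i.e. the length |p| of the shortest program p that makes U output x and halt.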

However, the idea merely transforms the problem of which language to use into the problem of which programming language to use.

Which language?

Various solutions have been proposed. One is to use FORTRAN 77. FORTRAN 77 is a common, small, simple, completely-specified programming language that many people have heard of.

Another is to use some kind of simple Turing machine.

Often such choices are justified by the argument that interpreters can be coded concisely in these languages - so using another language only costs you the size of the interpreter.
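
This is the invariance theorem: for any two universal languages A and B there is a constant c(A,B) - roughly the length of a B-interpreter written in A - such that, for every object x:

    K_A(x) <= K_B(x) + c(A,B)

The constant does not depend on x, which is why it is often treated as negligible.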

This argument is OK - provided the entities you want to compare are large by comparison with the size of the interpreter. However, Occam's Razor is often used to distinguish between scientific theories and logical arguments - which are frequently themselves relatively small.

In such cases, the argument that the choice of language doesn't matter much (because you can switch languages easily by writing an interpreter) doesn't wash - and it becomes important to pick a sensible language in the first place.
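
As a rough, hedged illustration of why the constant matters for small objects, two off-the-shelf compressors can stand in for two description languages - their fixed per-format overhead plays the role of the interpreter constant, dominating for short inputs and washing out for long ones:

    # Stand-ins for two "description languages": zlib and bz2.  Their fixed
    # header/block overhead plays the role of the interpreter constant.
    import os, zlib, bz2

    short = os.urandom(20)        # a small "theory"
    long_ = os.urandom(100_000)   # a large body of data

    for name, data in (("short", short), ("long", long_)):
        a, b = len(zlib.compress(data)), len(bz2.compress(data))
        print(f"{name}: zlib={a} bytes, bz2={b} bytes, ratio={b / a:.2f}")

On a typical run the two "languages" disagree substantially on the short input, but only by a small relative margin on the long one.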

Choosing a reference machine

One possibility is to use a physical machine.

People can argue all day about which is the best programming language to measure complexity with - but physical machines are real, concrete things that execute god's programming language: the physical laws of the universe.

Unfortunately for this plan, we don't know what the physical laws of the universe are. Nor can we convincingly build very tiny machines. So this metric is not well defined today. Our experience of shrinking machines suggests that their capabilities change qualitatively as scale changes - consider quantum computers, for example - so such a metric is unlikely to produce results that are stable as technology progresses. Its usefulness may improve in the future - but that doesn't help us very much now.

Another possibility is to forget about the actual laws of physics, and use an abstract version of them - such as a three-dimensional reversible universal cellular automaton (RUCA).

Firstly, note that an abstract machine produces quite different estimates of the complexity of some things. Essentially, a physical machine has experimental access to the laws of physics - and their associated physical constants - while the abstraction does not. If you ask a physical machine to describe what happens when a gold atom smashes into a lead crystal, it can experimentally execute that scenario and watch the results, while an abstract simulation can't do that. Since many physical constants appear to be complex and incompressible, there can be big differences in the resulting complexities.

Stipulating what the laws of physics are deals with the issue of not knowing them - while introducing a certain level of arbitrariness and unreality. However, the next issue is which cellular automaton to pick.

This is not a trivial issue. In some cellular automata, machines can be constructed to expand when they require more workspace (or memory). In other cellular automata this is impossible: any workspace which a machine can use must be built into its initial configuration - or it remains forever inaccessible. Obviously this can make a big difference to the size of the initial configuration required to produce a specified output. Nor is it obvious which type most closely represents the laws of physics - mass-energy can apparently be created in theory (e.g. by inflation). However, this seems to be pretty difficult in practice.
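
A toy sketch of the difference - not a real cellular automaton, just two conventions for counting the size of an initial configuration:

    # Toy sketch, not a real cellular automaton: the cost of producing a
    # million zero cells under two conventions.
    n = 10**6
    preallocated = "0" * n              # all workspace built into the initial configuration
    generator = "print('0' * 10**6)"    # compact program for a machine that can grow its workspace
    print(len(preallocated), "cells versus", len(generator), "characters")

Under the expandable convention the description is a short program; under the preallocated convention it is a million cells.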

Time and resources

Another issue is that compressors are not judged by the size of their output alone. There are other factors to consider - for example, their compression speed, their space usage, and their energy demands.

If you observe a sequence produced by some physical process, then the idea that the shortest program that produced it is the most likely is really only one prior among many. Maybe the program with the smallest total usage of resources is the most likely one. Or maybe execution time is critical - and the program that produces the results fastest is the most likely one. It depends.
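
A hedged sketch of how the choice of prior changes the verdict - the candidate programs and their (length, runtime) figures below are invented purely for illustration:

    # Hypothetical candidate programs that all reproduce the same observed
    # sequence; (length in bits, runtime in steps) are invented numbers.
    candidates = {"A": (100, 10**9), "B": (150, 10**4), "C": (120, 10**6)}

    shortest = min(candidates, key=lambda k: candidates[k][0])   # length-only prior
    fastest = min(candidates, key=lambda k: candidates[k][1])    # runtime-based prior
    print("shortest:", shortest, "fastest:", fastest)            # the two priors disagree

A length-only prior favours A; a runtime-based prior favours B.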

Since it seems evident that different types of priors are best under different circumstances, giving all weight to the shortest initial program seems a curious and dubious move.

The one true razor

Some bold claims of optimality have been made on the basis of Occam's Razor.

These claims seem to me to be misleading headlines. The claimed optimality rests on the optimality of the underlying razor - which is unproven, and indeed unlikely to be anywhere near optimal.

The truth is that there is no one true razor. Just as Kolmogorov complexity is language-dependent, so Occam's Razor is description-dependent.

There are many different versions of Occam's Razor - and different versions are appropriate under different circumstances.

Also, issues such as energy consumption and processing time are relevant when considering real-world bitstrings - but are ignored by metrics that consider size only.

If you want to know whether bitstring X is more probable than bitstring Y, the answer depends on how those strings were produced - not simply on the properties of the strings themselves.
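
A toy illustration, with both generating processes invented for the purpose:

    # The probability of the very same bitstring depends on the process
    # assumed to have produced it (both processes here are invented).
    x = "1111111110"

    def p_fair_coin(s):                  # each bit independently 0 or 1
        return 0.5 ** len(s)

    def p_biased_coin(s, p_one=0.9):     # a coin heavily biased towards 1
        ones = s.count("1")
        return p_one ** ones * (1 - p_one) ** (len(s) - ones)

    print(p_fair_coin(x), p_biased_coin(x))   # ~0.001 versus ~0.039

The same ten bits are about forty times more probable under the biased process than under the fair one.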

Tim Tyler | Contact | http://matchingpennies.com/