The One True Razor
Occam's Razor
Occam's Razor refers to the idea that the shortest
explanation is often the best one.
The razor is useful in a variety of domains - including science,
AI and rational discourse.
There have been various attempts to formalise the idea. One of the
first issues that comes up is: the shortest explanation in which
language?
Formalising the razor
Formalising the razor is often done using the idea of Kolmogorov
complexity - a measure of the information content of arbitrary
objects. The Kolmogorov complexity of an object is the length of the
shortest program which produces that object as its output.
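In symbols, relative to a fixed universal machine U, the standard
definition is

    K_U(x) = min { |p| : U(p) = x }

- the length of the shortest program p that makes U print x and halt.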
However, this merely transforms the problem of which language into the
problem of which programming language.
Which language?
Various solutions have been proposed. One is to use FORTRAN 77 - a
common, small, simple, completely-specified programming language that
many people have heard of.
Another is to use some kind of simple Turing machine.
Often such choices are justified by the argument that interpreters
can be coded concisely in these languages - so using another
language only costs you the size of the interpreter.
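To give a rough sense of how little machinery an interpreter needs,
here is a minimal sketch of a Turing machine simulator in Python. The
encoding - a dictionary mapping (state, symbol) to (symbol to write,
head move, next state) - is an illustrative choice of mine, not a
standard one:

def run_tm(transitions, tape, state="start", blank="_", max_steps=10000):
    """Simulate a Turing machine described by a transition table.

    transitions: dict mapping (state, symbol) -> (write, move, next_state),
    where move is -1 (left) or +1 (right).  The machine halts when no
    transition applies to the current (state, symbol) pair.
    """
    cells = dict(enumerate(tape))          # sparse tape: position -> symbol
    head = 0
    for _ in range(max_steps):
        key = (state, cells.get(head, blank))
        if key not in transitions:         # no applicable rule: halt
            break
        write, move, state = transitions[key]
        cells[head] = write
        head += move
    return "".join(cells.get(i, blank)
                   for i in range(min(cells), max(cells) + 1))

# Example: a one-state machine that flips every bit, then runs off the end.
flip = {("start", "0"): ("1", +1, "start"),
        ("start", "1"): ("0", +1, "start")}
print(run_tm(flip, "0110"))                # prints "1001"

The whole interpreter is a dozen lines or so - small, as the argument
requires, compared with any sizable object being measured.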
This argument is OK - provided the entities you want to compare are
large by comparison with the size of the interpreter. However, Occam's Razor
is often used to distinguish between scientific theories and logical
arguments - which are frequently themselves relatively small.
In such cases, the argument that the choice of language doesn't matter
much (because you can switch languages easily by writing an interpreter)
doesn't wash - and it becomes important to pick a sensible language
in the first place.
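One crude way to see this is to treat off-the-shelf compressors as
stand-ins for different reference languages and compare them on short
and long inputs. The particular compressors and test strings below are
arbitrary illustrative choices, and compressed sizes are only upper
bounds on the "true" complexity:

import bz2, lzma, zlib

def sizes(data):
    """Compressed size of the same data under three different formats."""
    return {"zlib": len(zlib.compress(data)),
            "bz2": len(bz2.compress(data)),
            "lzma": len(lzma.compress(data))}

short = b"to be or not to be"
long = b"to be or not to be " * 5000

print(sizes(short))  # tiny input: fixed per-format overhead dominates the estimates
print(sizes(long))   # large input: the same overhead is negligible by comparison

For small objects - which is what encoded scientific theories often
are - that fixed overhead plays the role of the interpreter constant
that the invariance argument waves away.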
Choosing a reference machine
One possibility is to use a physical machine.
People can argue all day about which programming language is best for
measuring complexity - but physical machines are real, concrete things
that execute god's programming language: the physical laws of the
universe.
Unfortunately for this plan, we don't know what the physical laws of the
universe are. Nor can we convincingly build very tiny machines. So, this
metric is not very well defined today. Our experience of shrinking machines
suggests that their capabilities change qualitatively as scale changes - e.g.
consider quantum computers - so such a metric is not very likely to produce
results that are stable as technology progresses. Its usefulness may
improve in the future - but that doesn't help us very much now.
Another possibility is to forget about the actual laws of physics, and use an
abstract version of them - such as a three-dimensional reversible
universal cellular automaton (RUCA).
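For concreteness, here is a toy one-dimensional reversible cellular
automaton - a much simpler stand-in for the three-dimensional machine
being discussed. It uses the standard second-order construction: the
next row is a function of the current neighbourhood XORed with the
previous row, which makes the update exactly invertible. The local
rule and starting configuration are arbitrary choices:

def step(prev, curr, rule=lambda l, c, r: l ^ c ^ r):
    """One second-order update on a ring of cells.

    next[i] = rule(left, centre, right of curr) XOR prev[i].
    Because XOR is self-inverse, knowing (curr, next) recovers prev,
    so the automaton can be run backwards: it is reversible.
    """
    n = len(curr)
    nxt = [rule(curr[(i - 1) % n], curr[i], curr[(i + 1) % n]) ^ prev[i]
           for i in range(n)]
    return curr, nxt

start = ([0, 0, 0, 1, 0, 0, 0, 0], [0, 0, 1, 1, 1, 0, 0, 0])

# Run 10 steps forwards ...
prev, curr = start
for _ in range(10):
    prev, curr = step(prev, curr)

# ... then swap the pair to reverse time and run 10 steps back.
prev, curr = curr, prev
for _ in range(10):
    prev, curr = step(prev, curr)

print((curr, prev) == start)    # True - the initial configuration is recovered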
Firstly, note that an abstract machine produces quite different
estimates of the complexity of some things. Essentially, a physical
machine has experimental access to the laws of physics - and their
associated physical constants - while the abstraction does not. If you
ask a physical machine to describe what happens when a gold atom
smashes into a lead crystal, it can experimentally execute that
scenario and watch the results, while an abstract simulation can't do
that. Since many physical constants appear to be complex and
incompressible, there can be big differences in the resulting
complexities.
Defining what the laws of physics are deals with the issue of not
knowing them - while introducing a certain level of arbitrariness and
unreality. However, the next issue is which cellular automaton to
pick.
This is not a trivial issue. In some cellular automata,
machines can be constructed to expand when they require more workspace
(or memory). In other cellular automata this is impossible: any
workspace which a machine can use must be built into its
initial configuration - or it remains forever inaccessible. Obviously
this can make a big difference to the size of the initial
configuration required to produce a specified output. Nor is it
obvious which type most closely represents the laws of physics -
mass-energy can apparently be created in theory (e.g. by
inflation). However, this seems to be pretty difficult in practice.
Time and resources
Another issue is that compressors are not judged by the size of their
output alone. There are other factors to consider - for example, their
compression speed, their usage of space, and their energy demands.
If you observe a sequence produced by some physical process, the idea
that it was most likely produced by the shortest program is really only
one prior among many. Maybe the program with the smallest total usage of
resources is the most likely one. Or maybe execution time is critical - and the
program that produces the results fastest is the most likely one. It depends.
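One well-known way of folding running time into the measure is Levin's
Kt complexity, which charges a program for the logarithm of its running
time as well as for its length:

    Kt(x) = min { |p| + log2(t) : U(p) outputs x within t steps }

Priors built on measures like this favour programs that are both short
and fast, rather than short alone.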
Since it seems evident that different types of priors are best under different
circumstances, giving all weight to the shortest initial program seems a
curious and dubious move.
The one true razor
Some bold claims of optimality have been made on the basis of Occam's
Razor - claims, for example, that inference methods built on the razor
are optimal in some strong sense. These claims seem to me to be
misleading headlines. The optimality rests on the optimality of the
underlying razor - and that is unproven, and indeed seems unlikely.
The truth is that there is no one true razor. Just as Kolmogorov
complexity is language-dependent, so Occam's Razor is
description-dependent.
There are many different versions of Occam's Razor
- and different versions are appropriate under different
circumstances.
Also, issues such as energy consumption and processing time
are relevant when considering real-world bitstrings - but are ignored by
metrics that consider size only.
If you want to know whether bitstring X is more probable than
bitstring Y, the answer depends on how those strings were
produced - not simply on the properties of the strings themselves.
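As a toy illustration of that dependence, here is a sketch comparing
two strings under two hypothetical sources: a "sticky" Markov chain
that usually repeats its last bit, and a "fidgety" one that usually
flips it. The strings and parameter values are arbitrary choices - the
point is only that the ranking reverses:

def prob_markov(bits, p_repeat):
    """Probability of a binary string under a source that repeats its
    previous bit with probability p_repeat (the first bit is 50/50)."""
    prob = 0.5
    for prev, cur in zip(bits, bits[1:]):
        prob *= p_repeat if cur == prev else 1 - p_repeat
    return prob

x = "0000000000"    # very short description: "ten zeros"
y = "0110100110"    # no obvious description shorter than the string itself

print(prob_markov(x, 0.9) > prob_markov(y, 0.9))   # True: sticky source favours x
print(prob_markov(x, 0.2) > prob_markov(y, 0.2))   # False: fidgety source favours y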