Architectural limits to computation
This essay covers some architectural bottlenecks and limitations
in computation and describes how they might be overcome.
Three forms of hardware limitation will be discussed:
- Lack of parallelism;
- Global synchrony;
- Digitization.
Dealing with these in turn:
Lack of parallelism
Most modern computers lack parallelism. They are typically based
on CPUs that essentially execute a series of instructions.
Computer programs are based on human language, which has a
serial nature, due to its representation as sound waves.
This serial characteristic has influenced human thought,
computer programs, and computer architecture.
This limitation is often described in terms of the "von Neumann
bottleneck" - which is normally conceived of as being a
bottleneck between the CPU and its associated memory store.
That's a related issue, but it seems to me that the main
problem is best thought of as being a lack of parallelism.
Of course, modern computer systems are not entirely
serial. GPUs and CPUs both contain some parallel elements.
GPUs need to push a lot of pixels at once, while CPUs use pipelining
and speculative execution to execute their instructions
in parallel. More parallelism arises from multi-core CPUs
and the practice of running multiple machines together
on a network. The sort of parallelism that is missing is
what is known as fine-grained massive parallelism.
Massive parallelism is available via FPGAs. These are frequently
used for rapid prototyping of ASIC silicon - but they can also
be used as more general-purpose parallel computers. However,
FPGAs remain a niche market: most commercial computer systems
do not include an FPGA coprocessor. The technology has not yet
gone mainstream.
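To make the idea of fine-grained parallelism more concrete, here is a minimal sketch - assuming Python with NumPy, and using an elementary cellular automaton purely as a toy example - in which every cell in a large array is updated simultaneously from its immediate neighbours. On fine-grained parallel hardware, each cell could be handled by its own simple processing element.

    import numpy as np

    def automaton_step(cells, rule=110):
        # Update every cell of a 1-D binary cellular automaton at once.
        # Each cell's next state depends only on itself and its two
        # neighbours - the kind of local, fine-grained update that maps
        # naturally onto massively parallel hardware.
        left = np.roll(cells, 1)
        right = np.roll(cells, -1)
        # Encode each (left, centre, right) neighbourhood as a number 0-7.
        neighbourhood = (left << 2) | (cells << 1) | right
        # Look up the next state of every cell in parallel.
        rule_table = (rule >> np.arange(8)) & 1
        return rule_table[neighbourhood]

    cells = np.zeros(64, dtype=np.int64)
    cells[32] = 1                # a single live cell in the middle
    for _ in range(10):
        cells = automaton_step(cells)
    print("".join("#" if c else "." for c in cells))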
The scarcity of parallel hardware is an important limitation
of most existing computers. Eliminating
this limitation is likely to produce substantial performance benefits
across a wide range of tasks.
It's been nine years since I wrote my "On Parallelism" essay. Since then, progress has been rather disappointing. In that essay I called for the construction of roadmaps. This essay is a step in that direction.
Global synchrony
Another computer science headache is synchronous operation.
The need to keep all components in a system operating in
deterministic synchrony imposes severe limits on the clock
speeds that can be attained. Arguably, this is what led clock
speeds in microprocessors to plateau in the mid-2000s - while
simple individual components such as transistors have
continued to shrink in size and increase in switching speed.
The costs of synchronization take a number of forms.
A synchronized collection of components can only switch
as fast as its slowest member. The more components there are,
the slower they collectively switch. A clock signal needs
wires to propagate - and these wires occupy space and
absorb power that could otherwise be used to perform
computation. Finally, some simple implementations of
synchronization require that the clock signal propagate
throughout the system in one clock cycle. The universal
speed limit then limits the clock speed of the system.
More complex implementations don't have this particular
limitation, but have other costs.
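To get a feel for the scale of this last constraint, here is a rough back-of-the-envelope sketch in Python. The 30 mm synchronous domain and the assumption that signals travel at half the speed of light are illustrative guesses, not measurements of any real chip.

    # Rough ceiling on clock speed if the clock signal must cross the
    # whole synchronous domain within a single cycle.
    # (Both figures below are illustrative assumptions.)
    LIGHT_SPEED = 3.0e8               # metres per second
    signal_speed = 0.5 * LIGHT_SPEED  # signals in wires travel slower than light
    domain_size = 0.03                # assumed synchronous domain: 30 mm across

    crossing_time = domain_size / signal_speed
    max_clock_hz = 1.0 / crossing_time
    print(f"Maximum clock frequency: {max_clock_hz / 1e9:.1f} GHz")  # ~5 GHz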
The idea that asynchronous computation might improve performance is an old one. It has led to practical implementations - such as the AMULET microprocessor. However, asynchronous computation has never really taken off. Computer scientists have been very attracted to reliability and determinism. They appear to love having high-quality error correction in hardware and pretending that their systems are perfect.
Digitization
Digitization has been of foundational significance: it is what
made the digital revolution possible. However, as with global synchrony (discussed above), it is a case of performing error correction in hardware. In some respects it is better to perform error correction in software - since then you can choose where to make the trade-off between reliability and performance.
Ray Kurzweil has long advocated using analog computation on performance
grounds:
Yet another approach to accelerate the availability of
human level computation in a personal computer is to use transistors
in their native “analog” mode. Many of the processes in the human
brain are analog, not digital. Although we can emulate analog
processes to any desired degree of accuracy with digital computation,
we lose several orders of magnitude of efficiency in doing so. A
single transistor can multiply two values represented as analog
levels; doing so with digital circuits requires thousands of
transistors. California Institute of Technology's Carver Mead
has been pioneering this concept.
When I first encountered Kurzweil's argument, I was skeptical. It may
take thousands of transistors to perform an 8-bit
digital multiplication, but it is not clear whether the result is
comparable to the corresponding analog computation. For one thing,
the digital version has built-in error correction, while the analog
version does not.
On reflection, it seems to me that at least
something is gained by using analog components
and handling error correction in software - namely, flexibility over which error-correction algorithms are applied. If it is known in advance what kind of error correction is required, a hardware implementation is probably maximally efficient. However, software error correction allows that choice to be revisited after the hardware is built.
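As a toy illustration of what software error correction over analog components might look like, here is a minimal sketch in Python. The noise model and the choice of simple averaging over repeated multiplications are assumptions made purely for illustration; the point is that the number of repeats - and hence the reliability/performance trade-off - is chosen in software.

    import random

    def analog_multiply(a, b, noise=0.05):
        # Model of an analog multiplier: the true product plus random error.
        return a * b * (1.0 + random.uniform(-noise, noise))

    def corrected_multiply(a, b, repeats=64):
        # Software "error correction" by redundancy: repeat the noisy
        # operation and average. More repeats buy accuracy at the cost
        # of speed - a trade-off chosen in software, not fixed in hardware.
        return sum(analog_multiply(a, b) for _ in range(repeats)) / repeats

    print(analog_multiply(3.0, 4.0))     # single noisy result
    print(corrected_multiply(3.0, 4.0))  # much closer to 12.0 on average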
Machine intelligence
This essay will argue that the widespread popularity of connectionist approaches to machine intelligence is likely to result in all three of these limitations being overcome.
Connectionist approaches are well placed to make use of fine-grained massive parallelism - since they are based on large numbers of relatively simple components networked together. They are thus likely to dispense with the von Neumann bottleneck.
Similarly, connectionist approaches don't have much use for global synchrony. Sure, there are brain waves - but these are much slower than the firing frequencies of neurons. Dispensing with global synchrony allows components to operate at their own natural frequencies - instead of being held back by the slowest component in the whole system.
Lastly, connectionist approaches are a natural match for analog
computation. Much of the signalling in the brain is analog:
membrane potentials vary smoothly, and firing thresholds
within neurons are also smoothly variable.
Neuromorphic approaches to computation are thus able to take
advantage of analog computing - doing error detection and
correction in software where needed.
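To make the neuromorphic picture slightly more concrete, here is a minimal sketch in Python of a toy leaky integrate-and-fire unit. The model and all of its parameters are illustrative assumptions; the point is that each unit integrates analog inputs and fires when its own state crosses a threshold, with no global clock involved.

    class Neuron:
        # A toy leaky integrate-and-fire unit, updated event by event
        # rather than on a global clock tick.
        def __init__(self, threshold=1.0, leak=0.9):
            self.potential = 0.0    # analog internal state
            self.threshold = threshold
            self.leak = leak        # fraction of potential kept per event

        def receive(self, weight):
            # Handle one incoming pulse whenever it happens to arrive.
            self.potential = self.potential * self.leak + weight
            if self.potential >= self.threshold:
                self.potential = 0.0
                return True         # fire: pass a pulse downstream
            return False

    n = Neuron()
    # Pulses arrive at arbitrary times; nothing synchronizes them globally.
    for w in [0.3, 0.5, 0.4, 0.2]:
        if n.receive(w):
            print("fired")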
As well as providing demand for high-performance parallel systems, machine intelligence might also be able to help by contributing to the engineering tasks of building the interpreters, compilers and virtual machines that will be required to allow existing software to run on the new architectures.
So far computers have proliferated by being strong where humans are weak. They perform serial deterministic tasks much better than humans do - and so augment human abilities. However, these days computers are becoming better able to compete with humans in areas where humans are already strong. To compete effectively, the bottlenecks and limitations described in this essay will need to be dealt with.
The cloud
The most plausible location for these transformations to occur is inside large data centers. Just as retailing from the cloud introduced a long tail of products, so cloud computing will result in a long tail of rentable computing systems. In the cloud, computing architectures of minority interest will find their market - and the more promising ones will grow and blossom. Various "neuromorphic" computers have already been constructed. If machine intelligence continues to snowball, this area could become a significant one.
More on the digital/analog issue
Parallelism and asynchronous operation seem like near-inevitable forms of progress. However, the issue of whether computation will "go analog" is less obvious. It also seems more likely to be controversial.
One of the applications where using digital hardware makes the most sense is long-term storage. With storage, Kurzweil's "multiplication" argument (cited above) does not apply. If we look at biology, dynamic computations in brains are largely analog, while long-term storage of knowledge in genes is digital. That kind of set-up apparently makes reasonable engineering sense.
It is also possible that whether digital or analog computation is better depends on the substrate. When dealing with voltage and current, analog computing would be more efficient - since voltage and current are essentially analog traits. However, with crystalline computation it is easy to imagine that such systems might operate in a digital domain natively - in which case using analog computation would make little sense.
That still leaves open the possibility that analog substrates will be systematically more powerful at any given technological level - because they are better able to exploit the computational capabilities of physical systems.
Even if the ultimate form of computation turns out to be digital, it is possible that computing will go through analog phases on the way there. Connectionist architectures on transistors may well be one place where partly-analog computing makes sense.
At the moment, digital systems seem to have won in the marketplace. However, we should be tentative in drawing the conclusion that this will always remain so.
Consequences
Overcoming the architectural limitations described in this essay requires minimal technical breakthroughs. It does, however, require demand for machine intelligence systems. A change in computational architecture could easily lead to several orders of magnitude of performance improvement on tasks involving machine intelligence. It is possible to get some hints at how much improvement is possible by looking at the human brain. The brain uses slow, electrochemical signal propagation and computes with chemical components that recharge slowly - and yet manages some remarkable computing feats due to its overall architecture.
Use of parallel, asynchronous hardware would not necessarily result in incompatibility issues with existing programs. Parallel asynchronous systems can still be Turing complete - and so can simulate other computing systems. Asynchronous systems can simulate serial ones using techniques such as message passing. The problem with serial systems is that they can simulate parallel computers only very slowly. A parallel system will probably run serial programs more slowly than a dedicated serial machine - since it is not optimised for doing that. However, the performance issues associated with running serial programs on parallel machines are nowhere near as severe as those associated with running parallel programs on serial machines.
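As a small illustration of the message-passing idea, here is a minimal sketch in Python. The two-stage pipeline and its toy operations are assumptions made for illustration; the workers run asynchronously, and the serial order of the original program is reproduced purely by the order of the messages they exchange.

    import threading, queue

    def worker(stage, inbox, outbox):
        # Apply one stage of a "serial" program. Ordering is enforced
        # purely by message passing - there is no shared clock.
        while True:
            item = inbox.get()
            if item is None:             # shutdown signal
                outbox.put(None)
                return
            outbox.put(stage(item))

    # A toy serial program, split into two pipeline stages.
    q1, q2, q3 = queue.Queue(), queue.Queue(), queue.Queue()
    threading.Thread(target=worker, args=(lambda x: x + 1, q1, q2)).start()
    threading.Thread(target=worker, args=(lambda x: x * 2, q2, q3)).start()

    for x in range(5):
        q1.put(x)
    q1.put(None)

    results = []
    while (item := q3.get()) is not None:
        results.append(item)
    print(results)   # [2, 4, 6, 8, 10] - same as the serial version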
If we look at nature, it uses parallel, analog computing machines almost exclusively. Theoretical considerations appear to back up its approach. The architectural limitations of today's serial digital machines appear to be a hangover from human speech and reasoning systems - and from a lack of machine intelligence. As these factors change, so might our fundamental computing architecture.
Of course, connectionist hardware will initially merely augment today's existing systems. However, it clearly has the potential to go further. If there's a large-scale switch to massively parallel brain-like systems - instead of today's largely serial machines - that would probably represent the largest revolution the computer industry has ever seen. It seems to me that theoretical considerations suggest this radical possibility might actually happen.