Deep prediction
This document presents a high-level modular architecture for machine intelligence systems.
The architecture is based around a 'deep learning' component used to model sensory inputs and
forecast their future values.
Advantages of modularity
Modularity allows problems to be broken down into smaller, encapsulated components with
clearly defined interfaces between them. The smaller modules are then easier to build and test.
This is why computer programmers like to "divide and conquer": many
problems can be broken up into sub-problems which are themselves
easier to solve.
The anatomy of the brain offers some hope for those who would seek to
divide intelligence up. The brain is split into two hemispheres whose
interconnections are sparse compared with the connections within each
hemisphere. Also, the cerebellum acts in some respects as a separate
brain within the brain.
Here we propose a 3-way modular split: modelling, forecasting and evaluation.
Modelling, forecasting and evaluation
One of the most fundamental components of intelligent machines is a
modelling engine. This takes in a sensory stream and builds a model
of it. This model can then be used to make forecasts.
Given a predictive model, acting intelligently typically consists of
considering your possible actions and their consequences, and then taking
the action with the highest expected utility. This generally involves
building and managing a tree of future possibilities.
A tree-pruning algorithm is usually employed to prevent the tree of
possible futures from undergoing a combinatorial explosion. The leaf nodes of
the tree are then fed to an "evaluation" component.
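The split described above can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the model is a function returning (probability, next state) pairs, pruning is a simple probability cut-off (min_prob), and the toy world, where the state is a single integer, is invented. A real system would use a learned model and a more sophisticated pruning scheme.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical interfaces, chosen for this sketch only.
State = int
Action = str
Model = Callable[[State, Action], List[Tuple[float, State]]]
Evaluator = Callable[[State], float]


def action_value(state: State, action: Action, depth: int, model: Model,
                 actions: Sequence[Action], evaluate: Evaluator,
                 min_prob: float) -> float:
    """Expected utility of taking `action` in `state`, looking `depth` steps ahead."""
    # Pruning: discard improbable outcomes, then renormalise what is left.
    outcomes = [(p, s) for p, s in model(state, action) if p >= min_prob]
    total = sum(p for p, _ in outcomes)
    if total == 0.0:
        return float("-inf")  # every outcome was pruned away
    return sum(
        (p / total) * state_value(s, depth - 1, model, actions, evaluate, min_prob)
        for p, s in outcomes
    )


def state_value(state: State, depth: int, model: Model,
                actions: Sequence[Action], evaluate: Evaluator,
                min_prob: float = 0.05) -> float:
    """Value of `state`: leaf nodes go to the evaluation component."""
    if depth == 0:
        return evaluate(state)
    return max(action_value(state, a, depth, model, actions, evaluate, min_prob)
               for a in actions)


def choose_action(state: State, depth: int, model: Model,
                  actions: Sequence[Action], evaluate: Evaluator,
                  min_prob: float = 0.05) -> Action:
    """Pick the action with the highest expected utility (depth must be >= 1)."""
    return max(actions, key=lambda a: action_value(
        state, a, depth, model, actions, evaluate, min_prob))


# Toy world (invented): "inc" usually raises the state, "stay" leaves it alone.
def toy_model(state: State, action: Action) -> List[Tuple[float, State]]:
    if action == "inc":
        return [(0.9, state + 1), (0.1, state - 1)]
    return [(1.0, state)]


def toy_evaluate(state: State) -> float:
    return float(state)  # higher states are better


ACTIONS = ("inc", "stay")
best = choose_action(0, 2, toy_model, ACTIONS, toy_evaluate)
```

Note that the evaluation function is only ever called on leaf states, which is what keeps it separable from the other two components.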
Deep prediction
The term "deep prediction" refers to using a deep learning engine
(i.e. a multi-layer neural network) to implement the modelling engine,
while interfacing this to standard computer science forecasting
approaches and human-readable position evaluation algorithms.
The main motivation for this is to combine the power of modern deep learning
techniques with modern tree-management and tree-pruning algorithms, while
retaining transparency when it comes to the evaluation function.
The deep learning component would be continuously rewarded according
to whether the predictions of its model matched its sensory inputs.
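One way such a reward might be computed is sketched below, assuming the model outputs a probability distribution over a small discrete set of possible percepts. The logarithmic score used here is a standard scoring rule, but its use is an assumption of this sketch rather than something the architecture requires. The percept names and the floor parameter are invented for illustration.

```python
import math
from typing import Dict


def log_score(predicted: Dict[str, float], actual: str,
              floor: float = 1e-12) -> float:
    """Reward for one prediction: log of the probability given to what happened.

    `floor` (an assumption of this sketch) avoids log(0) when the model
    assigned no probability at all to the observed percept.
    """
    p = predicted.get(actual, 0.0)
    return math.log(max(p, floor))


# Two hypothetical forecasts about the next percept.
confident = {"light": 0.9, "dark": 0.1}
hedged = {"light": 0.5, "dark": 0.5}

reward_confident_right = log_score(confident, "light")
reward_hedged = log_score(hedged, "light")
reward_confident_wrong = log_score(confident, "dark")
```

Under this rule a confident correct forecast earns more than a hedged one, and a confident wrong forecast earns the least.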
The overall architecture described here involves dual optimizers.
One optimizer is the deep learning engine used to implement the
modelling component. The other optimizer is the system as a whole.
The modelling component seeks to build an accurate model of its
sensory inputs. The whole system seeks whatever goal is specified
in the position evaluation component.
The neural net has no actuators - except for the prediction it makes.
It is a pure knowledge-seeking agent - an oracle that answers one
question: "what will happen next?" One previously-expressed concern
about knowledge-seeking agents is that they become too interested in
modelling sources of noise. That fate seems to be avoided by this
architecture - since the main goal specified in the evaluation
component gets to decide what direction the sensors point in, and
what filters are used to pre-process the sensory inputs. If noise
has a negative impact on performance, it will be avoided.
Other components
Other modular components have been proposed, but most of them fit fairly
easily into the modelling, forecasting and evaluation framework:
- Move generation - here the generation of actions
would be part of the forecasting component.
- Quiescence detection - here, avoiding position
evaluation under dynamic conditions is seen as being the terminal
part of the tree-pruning process - and as part of the forecasting
component.
- Branch detection - if the modelling component
outputs a probability density function over possible outcomes,
then branch detection becomes relatively simple - since branch
points are places where the current state has multiple probable
successors. Branch detection would be part of the forecasting
component.
- Forgetting - this is another common component. Here we
regard forgetting as being a part of the modelling engine. Neural
networks manage their own forgetting.
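The branch detection item in the list above can be sketched as follows. This is a minimal illustration, assuming the modelling component reports a discrete probability distribution over named successor states rather than a continuous density. The threshold value and the state names are invented.

```python
from typing import Dict, List


def probable_successors(distribution: Dict[str, float],
                        threshold: float = 0.2) -> List[str]:
    """Successor states the model considers reasonably likely."""
    return [state for state, p in distribution.items() if p >= threshold]


def is_branch_point(distribution: Dict[str, float],
                    threshold: float = 0.2) -> bool:
    """A branch point is a state with more than one probable successor."""
    return len(probable_successors(distribution, threshold)) >= 2


# Hypothetical model outputs for two different current states.
fork = {"left": 0.50, "right": 0.45, "stall": 0.05}
corridor = {"forward": 0.95, "stall": 0.05}
```

Here fork would be treated as a branch point and expanded into two subtrees, while corridor would be followed along its single probable successor.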
Implementation issues
Classical tree search algorithms will repeatedly reset the modelling component
to a previous state. Assuming that "backing up" the modelling component is
an expensive operation, one implementation possibility is to make the
modelling component using reversible logic - so it can be run backwards to
recover its previous state. Although reversibility is not high on the agenda for
many neural network researchers, there are standard techniques for constructing
reversible systems out of irreversible ones - for example in the theory of
cellular automata. So making a reversible neural network should not be
too demanding.
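As one illustration of building a reversible step out of irreversible parts, the sketch below uses additive coupling, a construction found in some invertible network designs. It is not the cellular automata technique mentioned above, and nothing here is claimed to be the intended implementation. The functions f and g are arbitrary stand-ins for learned sub-networks and do not need to be invertible themselves.

```python
import math
from typing import List, Tuple

Vector = List[float]


def f(x: Vector) -> Vector:
    """Stand-in for a learned sub-network (squashes its input)."""
    return [math.tanh(2.0 * v + 0.5) for v in x]


def g(x: Vector) -> Vector:
    """Another stand-in: a ReLU, which discards information on its own."""
    return [max(0.0, v) for v in x]


def forward(x1: Vector, x2: Vector) -> Tuple[Vector, Vector]:
    """One reversible step: each half is updated using only the other half."""
    y1 = [a + b for a, b in zip(x1, f(x2))]
    y2 = [a + b for a, b in zip(x2, g(y1))]
    return y1, y2


def inverse(y1: Vector, y2: Vector) -> Tuple[Vector, Vector]:
    """Run the step backwards by subtracting the same terms in reverse order."""
    x2 = [a - b for a, b in zip(y2, g(y1))]
    x1 = [a - b for a, b in zip(y1, f(x2))]
    return x1, x2


state = ([0.3, -1.2], [2.0, 0.7])
recovered = inverse(*forward(*state))
```

The previous state is recovered up to floating-point rounding. A tree search that needs exact restoration over many steps would have to account for that, for example by using integer or fixed-point arithmetic.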
Most tree search algorithms are well placed to take advantage of parallel
execution. The problem here is again that copying the modelling component
is likely to be an expensive operation. A plausible approach here is to
maintain multiple copies, and feed them identical sensory inputs. Error
detection and correction algorithms could be employed to ensure that the
copies don't get out of sync over time due to sources of nondeterminism.
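A minimal sketch of this idea follows, assuming each copy can produce a digest of its internal state. The Replica class is a toy stand-in for a modelling component, and the majority-vote repair in resync is just one possible error-correction scheme. Both are invented for illustration.

```python
import copy
import hashlib
from collections import Counter
from typing import List


class Replica:
    """Toy stand-in for a modelling component; its state is a list of ints."""

    def __init__(self) -> None:
        self.state = [0, 0, 0]

    def observe(self, x: int) -> None:
        # Deterministic update, so identical inputs give identical states.
        self.state = [(s * 31 + x + i) % 1000003
                      for i, s in enumerate(self.state)]

    def digest(self) -> str:
        # Cheap fingerprint of the state, used to compare copies.
        return hashlib.sha256(repr(self.state).encode()).hexdigest()


def resync(replicas: List[Replica]) -> int:
    """Overwrite any replica that disagrees with the majority.

    Returns the number of replicas repaired.
    """
    counts = Counter(r.digest() for r in replicas)
    majority_digest, _ = counts.most_common(1)[0]
    reference = next(r for r in replicas if r.digest() == majority_digest)
    repaired = 0
    for r in replicas:
        if r.digest() != majority_digest:
            r.state = copy.deepcopy(reference.state)
            repaired += 1
    return repaired


# Feed three copies the same input stream.
replicas = [Replica() for _ in range(3)]
for percept in [3, 1, 4, 1, 5]:
    for r in replicas:
        r.observe(percept)
```

Comparing digests is cheap, but the repair step still copies a whole state, so this only pays off if divergence is rare.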
Alternative implementation
Another possibility is to allow humans to evaluate the solutions. Here, the
modelling and forecasting components are the same, but the resulting outcomes
are shown to humans - instead of being fed into an evaluation function.
In this case, machine intelligence is used to increase human forecasting abilities -
without affecting human values. Of course, this option would have
an associated performance cost.
Misc problems
One problem not solved by the "deep prediction" approach involves
translating from sensory inputs to a world model. It would be nice to
feed a world model (rather than sensory inputs) to the evaluation
component. If you feed sensory inputs to the modelling component,
it is natural to have it output a probability distribution over
expected sensory inputs - and then reward close matches between
actual and expected perceptions. The modelling component necessarily
builds a world model in this case - but doesn't directly
expose it.