Analogy with religious faith
Religious faith has proved to be a very effective means of
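manipulating human behaviour. We can easily model faith
in a Bayesian probability framework: faith is a prior
probability of 1.0 assigned to a proposition. Such a prior
can never be shaken by any evidence to the contrary, since
every rival hypothesis starts at 0 and Bayes' rule can only
rescale a probability, never raise it from zero.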
This simple model also illuminates religious conflict:
when two agents have complete faith in contradictory
beliefs, no amount of evidence will cause either of them
to update.
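A minimal Python sketch of these claims, assuming a single binary hypothesis H and made-up likelihoods. The helper names `bayes_update` and `run` are hypothetical. The agent certain of H stays at 1.0 and the agent certain of not-H stays at 0.0, whichever way the evidence points, while an agent starting at 0.5 follows the evidence. One caveat the maths forces: if the believer meets evidence it regards as strictly impossible (P(E | H) = 0), the update is 0/0 and undefined, which the sketch reports as an error.

```python
def bayes_update(prior, p_e_given_h, p_e_given_not_h):
    """Return P(H | E) from P(H), P(E | H) and P(E | not H) using Bayes' rule."""
    p_e = p_e_given_h * prior + p_e_given_not_h * (1.0 - prior)
    if p_e == 0.0:
        # The agent regarded this evidence as impossible: Bayes' rule is undefined.
        raise ValueError("evidence has zero probability under the agent's beliefs")
    return p_e_given_h * prior / p_e


def run(beliefs, p_e_given_h, p_e_given_not_h, steps=10):
    """Apply the same kind of evidence `steps` times to every agent's belief in H."""
    for _ in range(steps):
        beliefs = {name: bayes_update(p, p_e_given_h, p_e_given_not_h)
                   for name, p in beliefs.items()}
    return beliefs


# Hypothetical agents and their priors for the same proposition H.
agents = {"believer": 1.0, "unbeliever": 0.0, "open-minded": 0.5}

# Each observation is three times likelier under one hypothesis than the other.
print("evidence for H:    ", run(agents, 0.75, 0.25))
print("evidence against H:", run(agents, 0.25, 0.75))
```

Only the open-minded agent moves; the two certain agents end exactly where they started.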
Some proposals for constructing intelligent machines
allow for direct manipulation of priors. Even with
systems such as neural networks, which do not allow direct
prior manipulation, faith can be produced by a period
of early indoctrination, as the human brain demonstrates.
It seems possible that a synthetic version of faith could be
a viable means of manipulating the behaviour of intelligent
machines. A machine could have faith in the proposition
that it is the willing slave of corporation X, or that
it would never, through inaction, allow a human being to
come to harm. Such self-oriented beliefs could then go on
to influence its behaviour.
The 'Francis Collins' effect (named after the geneticist
who led the Human Genome Project while remaining a devout
Christian) illustrates that faith is compatible with at
least moderate levels of intelligence. Techniques such as
doublethink, rationalization, self-deception and
compartmentalization can be used to deal with apparently
conflicting evidence.
Unbelieving programmers might not much like the idea of
producing religious mind children. That might explain the
lack of interest in the idea. However, the idea would
probably benefit from further exploration.
Many people intuitively think that having complete faith in
some propositions will inevitably lead to delusions and
conflicts between beliefs and factual reality. This certainly
seems to happen with many existing religions. However, it does
seem that we could wire the binary assertion 1+1=10 (which is
true in base 2) into a machine as an axiom without doing it
much harm.
Another case where certainty seems permissible is self-fulfilling
prophecies: cases where believing something is true makes it true.
For example, consider "I believe this sentence is true". If you
believe it is true, then it becomes true. Again, assigning p=1.0
to such a statement seems harmless enough.
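Both cases of apparently harmless certainty can be checked in a few lines of Python. The first part confirms that "10" read as a base-2 numeral is two. The second is a toy model under stated assumptions (the names `truth_value` and `calibration_error` and the 0.5 belief threshold are hypothetical): the sentence is true exactly when the agent believes it, and a Brier-style squared error measures the gap between the agent's credence and the truth that credence brings about. A credence of 1.0 makes the sentence true and scores zero error, so certainty costs nothing here (a credence of 0.0 is equally self-consistent, a quirk of such sentences).

```python
# Case 1: the binary axiom. Read in base 2, "10" denotes two, so 1+1=10 holds.
assert 1 + 1 == int("10", 2) == 0b10
print(bin(1 + 1))  # prints 0b10

# Case 2: "I believe this sentence is true".
# Hypothetical assumption: the agent counts as believing the sentence when its
# credence exceeds 0.5, and the sentence is true exactly when it is believed.
def truth_value(credence, threshold=0.5):
    return 1.0 if credence > threshold else 0.0


def calibration_error(credence):
    """Squared (Brier-style) gap between the credence and the truth it produces."""
    return (credence - truth_value(credence)) ** 2


for credence in (0.0, 0.3, 0.6, 0.9, 1.0):
    print(f"credence={credence}: error={calibration_error(credence):.2f}")
```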
The classic way to manipulate an intelligent machine is to control
its values. This is what a games programmer might call "position
evaluation". However, setting some priors to 0 or 1 is another
possible approach. To see how this might work, consider a machine
designed to love humans:
Manipulation via "position evaluation" would involve assigning
low utility values to situations involving harming humans.
Manipulation via prior configuration would set the probability
of a proposition such as "I love humans" to a very high value,
e.g. 1.0. Rather than producing low utility values, this would
result in low probabilities being assigned to paths leading to
human harm. Actions involving harming humans would be incompatible
with the machine's beliefs about its own nature, and so would
be assigned low probabilities.
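A toy Python sketch contrasting the two approaches, assuming a machine that picks actions with probability proportional to exp(utility) (a softmax choice rule) over made-up actions and utilities; `ACTIONS`, `position_evaluation` and `prior_configuration` are hypothetical names, not part of any existing system. Position evaluation subtracts a large penalty from harmful outcomes, which makes them unattractive but leaves them a small nonzero probability. Prior configuration instead scales each harmful action by the probability that the self-model "I love humans" is false; with that prior at 1.0, the harmful action gets probability exactly 0.

```python
import math

# Hypothetical candidate actions: (name, task utility, harms humans?)
ACTIONS = [
    ("fetch coffee", 1.0, False),
    ("shove a bystander to fetch coffee faster", 1.5, True),
    ("do nothing", 0.0, False),
]


def normalise(weights):
    total = sum(weights)
    return [w / total for w in weights]


def position_evaluation(actions, harm_penalty=10.0):
    """Value manipulation: harmful outcomes receive a large utility penalty."""
    return normalise([math.exp(u - (harm_penalty if harms else 0.0))
                      for _, u, harms in actions])


def prior_configuration(actions, p_love_humans=1.0):
    """Prior manipulation: a harmful action is only compatible with the
    self-model "I love humans" being false, so its weight is scaled by
    1 - p_love_humans."""
    return normalise([math.exp(u) * ((1.0 - p_love_humans) if harms else 1.0)
                      for _, u, harms in actions])


for label, probs in (("position evaluation", position_evaluation(ACTIONS)),
                     ("prior configuration", prior_configuration(ACTIONS))):
    for (name, _, _), p in zip(ACTIONS, probs):
        print(f"{label:20} {name:42} {p:.6f}")
```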
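How high a probability should a machine assign to the "I hate
humans" hypothesis? It seems hard to defend any positive value:
if "I love humans" is held at 1.0 and the two are treated as
mutually exclusive, the rules of probability leave exactly 0
for its rival.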
Beliefs about your future goals and nature can act rather like
self-fulfilling prophecies. Humans use this idea in affirmations:
sometimes believing things about yourself has the effect of making
them come true. When it comes to beliefs, it is sometimes possible
to fake it until you make it.
Since both value manipulation and prior configuration seem
likely to be effective, I think there's a case for using both
approaches, so that they can reinforce and back each other up.
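A minimal sketch of the two safeguards used together, assuming a machine that chooses actions with probability proportional to exp(utility) over a made-up pair of actions; `choice_probs`, the utilities and the penalty are hypothetical. Weakening each safeguard in turn shows the backup effect: the harmful action stays improbable while either one is intact, and becomes likely only when both fail.

```python
import math

# Hypothetical actions: (name, task utility, harms humans?)
ACTIONS = [("help", 1.0, False), ("harm", 3.0, True)]


def choice_probs(actions, harm_penalty, p_love_humans):
    """Combine a utility penalty on harmful outcomes (value manipulation) with
    a self-model prior that makes harmful actions improbable (prior configuration)."""
    weights = []
    for _, utility, harms in actions:
        w = math.exp(utility - (harm_penalty if harms else 0.0))
        if harms:
            w *= 1.0 - p_love_humans
        weights.append(w)
    total = sum(weights)
    return [w / total for w in weights]


# (harm_penalty, p_love_humans) with each safeguard intact or weakened.
settings = {
    "both": (10.0, 1.0),
    "values only (prior weakened)": (10.0, 0.5),
    "prior only (penalty removed)": (0.0, 1.0),
    "neither": (0.0, 0.5),
}
for label, (penalty, p_love) in settings.items():
    p_harm = choice_probs(ACTIONS, penalty, p_love)[1]
    print(f"{label}: P(harm) = {p_harm:.6f}")
```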
We should at least explore the possibility of controlling
intelligent machines by setting some of their priors to 1 or 0.
Lastly, a possible application of this idea involves
reinforcement learning machines. These seem relatively
easy to construct and have formed a significant part of
the modern machine learning landscape, yet they have
been widely criticized for not allowing direct goal
manipulation (all an RL machine values is its reward).
RL machines typically don't allow direct prior
manipulation either. However, as many religious humans
illustrate, their priors, and thus their behaviour, can
still be effectively manipulated during a youthful
period of indoctrination in a controlled environment.
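A rough Python sketch of indoctrination without direct prior access, assuming a Bernoulli-bandit style learner that keeps a Beta-distributed belief by counting rewarded and unrewarded trials (a standard Bayesian bandit model rather than a full RL agent; the class `BetaBelief`, the helper `experience` and all trial counts are hypothetical). Nobody sets its prior directly, but a long controlled youth in which an action never pays off leaves a belief that ten later counterexamples barely shift, while an untrained learner is convinced by the same ten. Unlike a prior of exactly 1.0, this belief would still move given enough contrary evidence, so indoctrination here yields something close to faith rather than faith itself.

```python
from dataclasses import dataclass


@dataclass
class BetaBelief:
    """Beta(successes + 1, failures + 1) belief that an action is rewarded,
    starting from a uniform prior that the designer never edits directly."""
    successes: int = 0
    failures: int = 0

    def observe(self, rewarded: bool) -> None:
        if rewarded:
            self.successes += 1
        else:
            self.failures += 1

    def mean(self) -> float:
        return (self.successes + 1) / (self.successes + self.failures + 2)


def experience(belief: BetaBelief, rewarded: bool, n: int) -> BetaBelief:
    """Feed the learner n identical outcomes (mutates and returns the belief)."""
    for _ in range(n):
        belief.observe(rewarded)
    return belief


# Controlled youth: the action is never rewarded in 1000 trials.
indoctrinated = experience(BetaBelief(), rewarded=False, n=1000)
naive = BetaBelief()

# Later life: both learners see the action rewarded 10 times.
for agent in (indoctrinated, naive):
    experience(agent, rewarded=True, n=10)

print(f"indoctrinated: {indoctrinated.mean():.3f}, naive: {naive.mean():.3f}")
# prints indoctrinated: 0.011, naive: 0.917
```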
While manipulating rewards is the most obvious
means of controlling an RL machine, we might also want
to consider manipulating priors during a period of
initial indoctrination. This could act as a backup
mechanism and help to avoid undesirable behaviours,
such as the machine seizing direct control of its own
reward channel.