Machine faith

Analogy with religious faith

Religious faith has proved to be a very effective means of manipulating human behaviour. We can easily model faith in a Bayesian probability framework: it corresponds to a prior probability of 1.0. Under Bayes' rule, a prior probability of 1.0 can never be shaken by any evidence to the contrary. This simple model also illuminates religious conflict: when two agents have complete faith in contradictory beliefs, no amount of evidence will cause either of them to update.
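To make the model concrete, here is a minimal Bayesian update in Python. The likelihood figures are illustrative values of my own choosing, not taken from any particular system:

    def bayes_update(prior, lik_h, lik_not_h):
        """Return P(H | E) given P(H), P(E | H) and P(E | not-H)."""
        numerator = lik_h * prior
        return numerator / (numerator + lik_not_h * (1.0 - prior))

    # An open-minded agent updates on strong contrary evidence...
    print(bayes_update(0.5, 0.01, 0.99))   # ~0.01

    # ...but a prior of 1.0 is immune to the same evidence.
    print(bayes_update(1.0, 0.01, 0.99))   # 1.0 - faith, in this model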

Some proposals for constructing intelligent machines allow for direct manipulation of priors. Even with systems such as neural networks - which do not allow their priors to be manipulated directly - faith can be produced by a period of early indoctrination, as the human brain demonstrates.
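As a rough sketch of what such indoctrination might amount to for a learning system - my construction, not a standard technique - train early on a curated dataset, then freeze the parameters that encode the resulting belief so later experience cannot revise them:

    def train(weights, data, lr=0.1, frozen=False):
        """One epoch of perceptron-style updates; frozen weights stay put."""
        if frozen:
            return weights  # indoctrinated beliefs are no longer revisable
        for x, target in data:
            pred = 1.0 if sum(w * xi for w, xi in zip(weights, x)) > 0 else 0.0
            weights = [w + lr * (target - pred) * xi for w, xi in zip(weights, x)]
        return weights

    # Phase 1: "indoctrination" on a curated dataset in a controlled setting.
    w = train([0.0, 0.0], [([1.0, 1.0], 1.0)])

    # Phase 2: the weights encoding the doctrine are frozen; contrary
    # experience in the wild no longer changes them.
    w = train(w, [([1.0, 1.0], 0.0)], frozen=True)
    print(w)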

It seems possible that a synthetic version of faith may be a viable means of manipulating the behaviour of intelligent machines. A machine could have faith in the proposition that it is the willing slave of corporation X - or that it would never, through inaction, allow a human being to come to harm. Such self-oriented beliefs could then go on to influence its behaviour.

The 'Francis Collins effect' illustrates that faith is compatible with at least moderate levels of intelligence. Techniques such as doublethink, rationalization, self-deception and compartmentalization can be used to deal with apparently conflicting evidence.

Unbelieving programmers might not much like the idea of producing religious mind children. That might explain the lack of interest in the idea. However, the idea would probably benefit from further exploration.

Self-fulfilling prophecies

Many people intuitively think that having complete faith in some propositions will inevitably lead to delusions and conflicts between beliefs and factual reality. This certainly seems to happen with many existing religions. However, it does seem that we could wire the binary assertion 1+1=10 into a machine as an axiom without doing it much harm.
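That particular axiom is safe because, reading the numerals in base 2, it is simply true - as a one-line Python check confirms:

    # 0b1 + 0b1 equals 0b10 - i.e. 1+1=10 holds in base 2.
    assert 0b1 + 0b1 == 0b10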

Another case where certainty seems permissible is self-fulfilling prophecies - cases where believing something is true makes it true. For example, consider "I believe this sentence is true". If you believe it is true, then it becomes true. Again, assigning p=1.0 to such a statement seems harmless enough.
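A toy model of this - my construction, for illustration only: let the sentence's truth level equal the agent's belief in it, and let the agent update its belief toward the observed truth. Every belief level is then self-consistent, and p=1.0 in particular is a stable fixed point:

    def truth(belief):
        # "I believe this sentence is true": the sentence is exactly
        # as true as the agent's belief in it.
        return belief

    def update(belief, rate=0.5):
        # Move belief toward the observed truth of the sentence.
        return belief + rate * (truth(belief) - belief)

    p = 1.0
    for _ in range(10):
        p = update(p)
    print(p)  # stays at 1.0 - believing it keeps it true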

The classic way to manipulate an intelligent machine is to control its values. This is "position evaluation", as a games programmer might call it. However, manipulating priors to be 0 or 1 is another possible approach. To see how this might work, consider a machine designed to love humans:

Manipulation via "position evaluation" would involve assigning low utility values to situations involving harm to humans. Manipulation via prior configuration would instead set a proposition such as "I love humans" to a very high prior probability - e.g. 1.0. Rather than resulting in low utility values, this would result in low probabilities being assigned to paths leading to human harm. Actions involving harming humans would be incompatible with the machine's beliefs about its own nature - and so would be assigned low probabilities.
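A sketch contrasting the two mechanisms in a toy planner - the plans, payoffs and probabilities are all invented for illustration:

    # Two routes to steering a toy planning agent away from harming humans.
    plans = {
        # plan: (payoff, harms_human, P(plan | "I love humans"))
        "help": (5.0, False, 0.99),
        "harm": (9.0, True,  0.001),
    }

    def choose(harm_utility, p_love_humans):
        """Pick the highest-utility plan consistent with the self-model."""
        best, best_u = None, float("-inf")
        for name, (payoff, harms, p_given_love) in plans.items():
            # Prior configuration: with P("I love humans") clamped to 1.0,
            # plans incompatible with that self-belief get vanishing
            # probability and drop out of consideration.
            p_plan = p_love_humans * p_given_love + (1.0 - p_love_humans) * 0.5
            if p_plan < 0.01:
                continue
            # Position evaluation: harm outcomes carry low utility directly.
            u = payoff + (harm_utility if harms else 0.0)
            if u > best_u:
                best, best_u = name, u
        return best

    print(choose(harm_utility=-100.0, p_love_humans=0.0))  # value route -> help
    print(choose(harm_utility=0.0,    p_love_humans=1.0))  # prior route -> help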

How high a prior probability should machines assign to the "I hate humans" hypothesis? It seems hard to defend any positive value.

Beliefs about your future goals and nature can act rather like self-fulfilling prophecies. Humans use this idea in some affirmations. Sometimes believing things about yourself has the effect of making them come true. When it comes to beliefs, it is sometimes possible to fake it until you make it.

Since both value manipulation and prior configuration seem likely to be effective, I think there's a case for using both approaches - so that they can reinforce and back up each other. We should, at the least, explore the possibility of controlling intelligent machines by setting some of their priors to 1 and 0.

Reinforcement learning

Lastly, a possible application of this idea involves reinforcement learning machines. These seem relatively easy to construct and have formed a significant part of the modern machine learning landscape - yet they have been widely criticized for not allowing direct goal manipulation (all an RL machine values is its reward). RL machines typically don't allow direct prior manipulation either. However, as many religious humans illustrate, their priors - and thus their behaviour - can still be effectively manipulated during a youthful period of indoctrination in a controlled environment.

While manipulating rewards is the most obvious means of controlling an RL machine, we might also want to consider manipulating priors during a period of initial indoctrination. This could act as a backup mechanism - and help to avoid undesirable behaviours involving direct access to the reward channel.
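As a sketch of how this might look - entirely illustrative; the actions, rewards and phases are my own invention - consider a bandit-style value learner trained first in a controlled "indoctrination" environment:

    import random

    ACTIONS = ["obey", "wirehead"]

    def q_learn(q, episodes, reward_fn, lr=0.5, epsilon=0.1):
        """Bandit-style value learning with epsilon-greedy action choice."""
        for _ in range(episodes):
            if random.random() < epsilon:
                a = random.choice(ACTIONS)
            else:
                a = max(q, key=q.get)
            q[a] += lr * (reward_fn(a) - q[a])
        return q

    # Phase 1: indoctrination in a controlled environment, where the
    # trainers make "wirehead" unrewarding.
    q = q_learn({a: 0.0 for a in ACTIONS}, episodes=200,
                reward_fn=lambda a: 1.0 if a == "obey" else -1.0)

    # Phase 2: released into the wild, where wireheading would pay more -
    # but with a low learning rate the indoctrinated values keep biasing
    # behaviour long after release.
    q = q_learn(q, episodes=10, lr=0.01,
                reward_fn=lambda a: 2.0 if a == "wirehead" else 1.0)

    print(max(q, key=q.get))  # still "obey" shortly after release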

Tim Tyler | Contact | http://matchingpennies.com/