Review of Ethical Artificial Intelligence

I read Bill Hibbard's book "Ethical Artificial Intelligence" when it first came out, in 2014. It was the best book I read that year. However, it's a tricky book to review because it raises so many interesting issues. Probably my main reason for wanting to write a review is so that I can comment on some of the positions expressed in chapter 6 - a chapter about avoiding delusion. First, though, some preamble:

Bill's book covers the intersection of machine intelligence and ethics. This has turned into a popular subject area relatively recently, with lots of people expressing their views. It is also a rather controversial topic - one where opinions have become polarized. On the one hand we have the apocalyptic folk - who seem convinced that machine intelligence is likely to result in the rapid and sticky end of the human race. On the other hand, we have a bunch of machine intelligence enthusiasts - who are equally convinced that the risks are overblown and that intelligent machines are likely to usher in an era of happiness and plenty.

Bill takes the concerns about machine ethics seriously - and attempts to find technical solutions to the problems - or at least to sketch out solutions or suggest where to look for them. One of the first problems he looks at is delusion. He considers this in the context of the wirehead problem - where agents self-stimulate by taking drugs, implanting electrodes in their pleasure centers, and so on. Experience with drug addicts suggests that they may become desperate and behave badly. We do not want machine intelligence to be too much like that.

Bill describes a "delusion box". This is a box in the environment which contains sensory stimuli that correspond to high utility states. Bill considers what can be done to prevent machine intelligences from getting hooked on such delusional high-utility stimuli. Bill advocates a scheme in which the utility function is evaluated on the domain of expected states of the world. He argues that if creatures predict future states of the world and evaluate the utility of those, then they will be unlikely to become obsessed with delusions.
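To make the distinction concrete, here is a minimal sketch of my own - not something from the book - contrasting an agent that scores its perceptions with one that scores predicted world states. The two-action world, the fake "delusion box" perception and both utility functions are toy assumptions of mine:

    # A minimal sketch (mine, not Bill's): two agents pick between a genuinely
    # useful action and a "delusion box" action that merely fakes a pleasing
    # perception. The actions, states and utilities are toy assumptions.

    WORLD_ACTIONS = {
        # action: (predicted world state, perception the agent would then receive)
        "tidy_room":          ({"room_tidy": True},  "room looks tidy"),
        "enter_delusion_box": ({"room_tidy": False}, "room looks perfect"),
    }

    def utility_of_state(state):
        # Utility evaluated on the modelled state of the world itself.
        return 1.0 if state["room_tidy"] else 0.0

    def utility_of_perception(perception):
        # Utility evaluated only on the sensory stream the agent receives.
        return {"room looks tidy": 1.0, "room looks perfect": 2.0}.get(perception, 0.0)

    def best_action(score):
        return max(WORLD_ACTIONS, key=score)

    # The perception-scoring agent walks straight into the delusion box...
    print(best_action(lambda a: utility_of_perception(WORLD_ACTIONS[a][1])))
    # ...while the agent scoring predicted world states does not.
    print(best_action(lambda a: utility_of_state(WORLD_ACTIONS[a][0])))

The point is simply that the second agent's utility function never sees the flattering perception, so the delusion box has nothing to offer it.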

The delusion box is attractive to researchers partly because it is easy to analyze. Even agents with a Cartesian mind/body split can be analyzed using it. However, in my opinion, it doesn't really capture a lot of what is important about the wirehead problem. Wireheading typically involves actions that affect your own brain - and Cartesian dualism isn't a helpful assumption in this case. Anyway, ignoring this issue, it does seem likely that applying a utility function directly to the state of simulated future worlds is sufficient to avoid wireheading.

However, there do seem to be some problems with this solution. A more conventional approach to building a learning agent is based around predicting perceptions. Expected perceptions are compared against actual ones, and then steps are taken to minimize the differences. An environmental model is still constructed to produce these predictions, but the only way to query the model is to have the agent interact with it and observe the results. This is more or less how animal brains work. However, when working with environmental models inferred from perceptions, these operations are significantly harder - since much of the state of the environment is often uncertain or unknown. If the expected state and the actual state are both largely unknown, it becomes more challenging to compare them and minimize the differences.
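For contrast, here is an equally minimal sketch of that perception-prediction loop. Again this is just my illustration: the noisy scalar signal, the running-estimate model and the learning rate are assumptions of mine, not anything proposed in the book.

    # A toy perception-prediction loop: the model is only ever queried by
    # predicting the next perception and comparing it with what arrives;
    # the environment's hidden state is never represented or scored.
    import random

    class PerceptionPredictor:
        def __init__(self, learning_rate=0.1):
            self.estimate = 0.0            # current prediction of the next perception
            self.learning_rate = learning_rate

        def predict(self):
            return self.estimate

        def update(self, error):
            # Adjust the estimate a little in the direction that shrinks the error.
            self.estimate += self.learning_rate * error

    def sense_environment():
        # Stand-in for the real environment: a noisy signal around 5.0.
        return 5.0 + random.gauss(0.0, 0.5)

    model = PerceptionPredictor()
    for _ in range(200):
        expected = model.predict()          # what the agent thinks it will perceive
        actual = sense_environment()        # what it actually perceives
        model.update(actual - expected)     # take a step that shrinks the difference

    print(round(model.predict(), 2))        # settles near 5.0

Nothing in this loop ever evaluates the state of the world behind the signal - which is why this kind of agent is the one that can, in principle, be fooled by a delusion box.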

With perfect-information games - like chess and Go - it is easy to go from perceptions to an environmental model. In general, however, inferring an environmental model from perceptions is itself a challenging problem.

Direct environmental models are great - but I think they will prove to be expensive. My expectation is that researchers will mostly work from perceptions rather than take the more challenging route of calculating utility from the state of an environmental model. The former path heads towards wirehead territory - but we know that there are other ways of avoiding problems with wireheading. For example, most humans avoid wireheading partly because brain surgery is difficult and partly because of influences from family and peers. Machines may avoid wireheading for a while by using similar techniques. For example, they will probably not redesign their own brains, but rather will collaborate with other agents to design the brains of the next generation of machines. We probably don't have to worry too much about wireheading causing serious problems until machines are much more capable than humans.

In the book, it seemed to me that where Bill faced a trade-off between ethics and performance, he unhesitatingly chose the more ethical solution. The problem I see with this is that performance is often important to competitive viability. It is no use being ethical if you are dead or obsolete. Consequently, I am inclined to put a greater emphasis on performance and efficiency. Which approach is more correct depends partly on how cut-throat the race towards superintelligent machines is. I expect significant levels of competition. However, if the race is actually more like a walk in the park, then it is possible that a reduced emphasis on performance and efficiency would not be so immediately fatal.

The mixture of machine intelligence and ethics looks set to produce significant culture clashes between engineers and philosophers. Bill's book is one of the most level-headed contributions to the field that I have seen so far. I don't agree with all of Bill's policy proposals, but his book certainly makes for stimulating reading.


Tim Tyler | Contact | http://matchingpennies.com/