Alignment

“In artificial intelligence (AI) and philosophy, the AI control problem is the issue of how to build a superintelligent agent that will aid its creators, and avoid inadvertently building a superintelligence that will harm its creators. Its study is motivated by the notion that humanity will have to solve the control problem before any superintelligence is created, as a poorly designed superintelligence might rationally decide to seize control over its environment and refuse to permit its creators to modify it after launch” (Wikipedia, retrieved June 2021).

In this realm, the question of how to reward machine learning behaviour, so as to develop a “policy” that makes “intelligent agents” do what we want, has been supplanted by a focus on structuring the environments in which these agents will operate. In his book “The Alignment Problem: Machine Learning and Human Values” (2020), author Brian Christian explains why, using the example of ourselves in nature:

“A programmed heuristic like, ‘Always eat as much sugar and fat as you can’ is optimal as long as there isn’t all that much sugar or fat in your environment and you aren’t especially good at getting it. Once that dynamic changes, a reward function that served you and your ancestors for tens of thousands of years suddenly leads you off the rails” (2020, p. 173).
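
To make Christian’s point concrete, here is a minimal sketch in Python (not from his book): a hypothetical forager whose hard-coded policy is to eat all the sugar on offer scores well when sugar is scarce and badly when it is abundant. The environments, the health score, and the numbers are invented purely for illustration.

# A fixed reward heuristic that is near-optimal in one environment
# can lead "off the rails" once that environment changes.

def greedy_policy(available_sugar: float) -> float:
    """Hard-coded heuristic: consume everything on offer."""
    return available_sugar

def health_score(consumed: float, daily_need: float = 50.0) -> float:
    """Credit intake up to a daily need; penalise anything beyond it."""
    if consumed <= daily_need:
        return consumed
    return daily_need - (consumed - daily_need)  # excess counts against you

def evaluate(environment: list[float]) -> float:
    """Average health score of the greedy policy over a run of days."""
    scores = [health_score(greedy_policy(sugar)) for sugar in environment]
    return sum(scores) / len(scores)

# Ancestral environment: sugar is scarce, so "eat everything" is near optimal.
scarce = [10.0, 30.0, 5.0, 45.0, 20.0]
# Modern environment: sugar is abundant, and the same heuristic backfires.
abundant = [200.0, 150.0, 300.0, 250.0, 180.0]

print("scarce environment score:  ", evaluate(scarce))    # positive
print("abundant environment score:", evaluate(abundant))  # sharply negative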

Clues from evolution and child development are now useful to those who design rewards for robots and artificial intelligence. Beyond specific policies, Christian says “values” must be instilled in these agents using notions of parenting and pedagogy, and in a manner where not only are our actions understandable to our creations, but their actions are transparent to us. He cautions against relinquishing too much control, not to the agents and machines, but to the training models we use for these purposes, citing Hannah Arendt on how easily evil can emerge from an ill-conceived but otherwise innocuous template, as the models themselves “might become true” (2020, p. 326).

Given their complex nature, should we wonder whether our intelligent machines might develop some equivalent of emotion? In an essay titled “In The Chinese Room, Do Computers Think?”, science author George Johnson suggests such anomalous behaviour could take the form of “qualities and quirks that arose through emergence, through the combination of millions of different processes. Emotions like these might seem as foreign to us as ours would to a machine. We might not even have a name for them” (1987, p. 169).

How might such artificial emotions arise, and what might they be like? As science fiction author Philip K. Dick wonders, will our Androids Dream of Electric Sheep? Do such speculations point to how our own emotions and thoughts arise, and to the factors in our bodies and environments that contribute to them?


Christian, B. (2020) The Alignment Problem: Machine Learning and Human Values. New York, United States: Penguin Random House.

Dick, P. K. (1968) Do Androids Dream of Electric Sheep? New York, United States: Penguin Random House – Doubleday.

Johnson, G. (1987) “In The Chinese Room, Do Computers Think?” in Minton, A. J. & Shipka, T. A. (Eds.), Philosophy: Paradox and Discovery, 3rd ed. (1990), pp. 156–170. New York, United States: McGraw-Hill.