Friday, March 27, 2009

AI Moral Issues

I recently had a discussion with Eliezer Yudkowsky. I contacted him mainly because I thought from what I knew of his thinking that he would be interested in foundational logical issues, yet I saw no technical details of this sort in his many online writings (except for a cartoon guide to Löb's Theorem). I quickly learned the reason for this. Eliezer believes that there is great risk associated with AI, as I already knew. What I had not guessed (perhaps I should have) was that he therefore considers technical details too dangerous to release.

My natural disposition is to favor open research. So, I am very reluctant to accept this position. Of course, that is not in itself an argument! If I want to reject the conclusion, I need an actual reason-- but furthermore I should search just as hard for arguments in the other direction, lest I bias my judgement.

(That second part is hard... although, it is made easier by the large body of literature Eliezer himself has produced.)

The key question is the actual probability of Very Bad Things happening. If the chance of disaster thanks to AI research is above, say, .1%, we should be somewhat concerned. If the chances are above 1%, we should be really concerned. If the chances are above 10%, we should put a hold on technological development altogether until we can find a better solution. (AI panic puts the current probability at 26.3%!)

There are two issues here: the probability that AIs will become sufficiently powerful to pose a serious threat, and the probability that they would then proceed to do bad things. These topics are large enough individually to merit a long discussion, but I'll try to jot down my thoughts.
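As a rough sketch of how these two probabilities combine with the thresholds mentioned above: the overall chance of disaster is the chance that AIs become powerful enough, multiplied by the chance that a sufficiently powerful AI then does bad things. The function names and the input numbers below are invented purely for illustration; only the 0.1%, 1%, and 10% cutoffs come from the text.

```python
def disaster_probability(p_powerful: float, p_bad_given_powerful: float) -> float:
    """P(disaster) = P(AI becomes powerful enough) * P(it does bad things | powerful)."""
    return p_powerful * p_bad_given_powerful


def concern_level(p_disaster: float) -> str:
    """Map a probability onto the informal thresholds ("above" = strictly greater)."""
    if p_disaster > 0.10:
        return "put a hold on technological development"
    if p_disaster > 0.01:
        return "really concerned"
    if p_disaster > 0.001:
        return "somewhat concerned"
    return "below the thresholds discussed"


# Hypothetical inputs, NOT estimates made anywhere in this post.
p = disaster_probability(p_powerful=0.5, p_bad_given_powerful=0.1)
print(f"{p:.3f} -> {concern_level(p)}")
```

The point of writing it this way is only that a modest value for either factor can still leave the product above the 1% line, so both questions need a real answer.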

AI Power

From what I've read (not from my recent discussion with him), Eliezer's main argument for the first point is that AIs will, because they are software, have a far greater ability to self-improve than humans have. Thus, if they start out with approximately human-level intelligence, they could end up far smarter than us just by thinking for a while. "Thinking for a while" could mean a few days, or a few years-- the risk is still there, so long as the AIs would eventually reach superhuman intelligence.

Personally, I think the human mind has a fairly high ability to self-improve. Sure, we can't modify our neural substrate directly, but it can equally be said that an AI can't modify its silicon. Our software could in principle be equally malleable; and the possibility, combined with the assumption that self-improvement is highly effective, would suggest that evolution would have hit on such a solution.

However. Eliezer likes to say that evolution is not very smart. This is certainly true. So, I don't really know how to assign a probability to evolution having hit upon a comparatively flexible form of recursive self-improvement. So, I need to take into account more evidence.

  • I can't "examine my own source code". An AI could. (If humans could do this, AI would be an easy problem!)
  • I can learn new ways of thinking. I can even consciously examine them, and decide to accept/reject them.
  • I can't wish habits away.
  • An AI could run detailed simulations of parts of itself to test improved algorithms.
  • I can run mental simulations of myself to some extent (and do so).
Tentative conclusion: An AI could be a better self-improver than I, but it would be a difference in degree, not a difference in kind.

This conclusion is neither here nor there in terms of the question at hand.

Other arguments to the effect that AIs could gain enough power are ready at hand, however. The main one is that if a particular amount of computing power can create human-level intelligence, it seems pretty obvious that larger amounts of computing power will create larger amounts of intelligence. It also seems like people would willingly cross this threshold, even competing to create the largest AI brain. Even if self-improvement turns out to be a total flop, this would create superhuman intelligences. If one such intelligence decided to grab more hardware (for example using a virus to grab computing power from the internet) the amount of computing power available to it would probably become rather large rather fast.

(All of this assumes human-level AI is possible in the first place: a notable assumption, but one I will not examine for the moment.)

The argument from superhuman intelligence to superhuman power is fairly straightforward, though perhaps not 100% certain. The AI could hack into things, accumulate large amounts of money through careful investment, buy a private army of robots... more probably it would come up with a much cleverer plan. How smart, exactly, does it need to get in order for this to be a danger? Estimates range from the level of a smart human (because a smart human can already be dangerous) to the intelligence of all humans combined (because that is what the AI would be up against). For the really bad scenarios to occur, it would seem, the AI needs to be capable of major scientific innovation; innovation on the order of at least a team of scientists. (I must admit that, here and now, a single scientist could potentially release a deadly disease from a lab-- this involves no innovation, since the diseases are already there. But, this is because a select few scientists have special access to these diseases. An AI might get such access, but that doesn't seem especially probable until a time when AIs are all around and being given all sorts of other responsibilities... at which point, if the AIs are actually unfriendly, the disease is only one of many worries.)

One question remains: is there enough computing power around currently to cause concern? This is something that did come up in the conversation. If current machines are not risky, then it could be possible today to hit upon the right AI design, yet not achieve human-level intelligence using it. Personally, I think this would be ideal. Such a scenario, with AIs gradually increasing in intelligence as the hardware increased in capability, would give humans time to experiment with AI technology, and also consider its consequences. (Indeed, some argue that this is already the situation: that the fundamental algorithms are already known, and the hardware just needs to catch up. I don't agree, although I can't deny that a large amount of progress has already been made.)

Eliezer argued that human-level intelligence on modern-day machines was plausible, because evolution is not a good engineer, so human-level intelligence may require far less hardware than the human brain provides. Estimates based on the brain's computing power vary quite widely, because it is not at all clear what in the brain constitutes useful computation and what does not. Low estimates, so far as I am aware, put the brain's computing power near to today's largest supercomputer. High estimates can basically go as far as one likes, claiming that chemistry or even quantum physics needs to be simulated in order to capture what is happening to create intelligence.

Of course, the internet is vastly more powerful than a single machine. But the risk of an AI escaping to the internet does not seem very high until that AI is at least near human level pre-escape. So, what is the probability that current machines could be human-level with the current algorithm?

My faith in evolution's engineering capabilities is somewhat higher than Eliezer's. Specifically, Eliezer is (from what I've read) quite fond of the study of cognitive bias that has become a popular subfield of psychology. While I enjoy Eliezer's writings on rationality, which explicate many of these biases, I am reluctant to call them design flaws. Upon reflection, there are better ways of doing things, and explicating these better ways is an important project. But my best guess is that each cognitive bias we have is there for a reason, essentially because it makes for a good pre-reflection guess. So, rather than design flaws, I see the cognitive biases as clever engineering tricks. (I do not know exactly how far from Eliezer's way of thinking this falls.) This is merely a default assumption; if I studied the cognitive bias literature longer, attempted to come up with explanations for each bias, and failed, I might change my mind. But, for now, I am not comfortable with assuming large amounts of mental inefficiency... although I admit I do have to postulate some. On the other hand, I also have to postulate a fairly high amount of inefficiency to human-made AI, because it is a hard problem.

So, again, this leads neither here nor there.

But, the real question is not whether today's computers are human level; the more critical question is quite a bit more complicated. Essentially:

Will there be a critical advance in software that occurs at a time when the existing hardware is enough to create an AI hazard, or will software advances come before hardware advances, such that humanity has sufficient time to get used to the implications and plan ahead?

Again, a difficult question to answer. Yet it is a really important question!

A historical view will of course show a fairly good match-up between amount of processing power and results. This old paper begins with such an account geared towards computer vision. Yet, there are real advances in algorithms happening, and they will continue to happen. A small but striking example of sudden improvement in algorithms is provided by Graphplan, an algorithm which changed the AI planning landscape in 1997. Of course, today, the algorithms are even better. So, hardware clearly isn't everything.

A proper estimate would involve a serious analysis of the pace of advance in computing-- how probable is it that Moore's law will keep its pace, speed up, slow down, et cetera-- and likewise an analysis of progress in AI algorithmics. But, equally important is the other side of the question: "such that humanity has sufficient time to get used to the implications and plan ahead". How much hope is there of this, assuming that the software is available before the hardware?

I've said that I think this is the ideal outcome-- that people have a while to first get used to near-human-level AI, then human-level, then superhuman level. Part of why I think this is that, in this scenario, there would probably be many superhuman AIs rather than just one. I think this would improve the situation greatly, but the reasons are more a topic for the section on whether an AI would in fact do bad things. In terms of AI power, the situation is not as persuasive. It seems perfectly possible that a society that became used to the presence of AIs would give them various sorts of power without thinking too hard, or perhaps even thinking that AIs in power were safer than humans in power.

The thing is, they might be right-- depending on how well-designed the AIs were. Which brings us to:

AI Ethics

If an AI gained sufficient power, would it destroy humanity?

Of course, that depends on many variables. The real question is:

Given various scenarios for an AI of sufficient power being created, what would that AI do?

The major scenarios under consideration:

  • Many powerful AIs (and many not-so-powerful AIs) are developed as part of an ongoing, incremental process open to essentially all of humankind
  • A single powerful AI is developed suddenly, by a single organization, as a result of a similarly open process
  • A powerful AI is developed by a single organization, but as a result of a closed process designed to minimize the risk of AI methods falling into the wrong hands
Eliezer argues that the second scenario is not as safe as the third. Suppose the added effort to make a friendly powerful AI as opposed to just-any-powerful-AI is 1 year. Then, in an open process, an organization not very concerned with friendliness will be able to create an AI 1 year before an organization concerned with friendliness.

This, of course, depends on the idea that it is easier to create an unfriendly AI than a friendly one. Eliezer has written at length on this. The key concept is that a mind can have any particular goal; that there is no system of ethics that will be universally accepted by any sufficiently intelligent mind, because we can quite literally program a mind that has the exact opposite set of values for any given set. (Just reverse the sign on the utility function.)
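To make the sign-reversal point concrete, here is a toy sketch. The "agent" is nothing more than a function that picks the highest-scoring outcome, and the outcome names and scores are hypothetical values chosen for illustration; nothing here is a claim about how a real AI would be built.

```python
from typing import Callable, Iterable

Outcome = str
Utility = Callable[[Outcome], float]


def best_outcome(utility: Utility, outcomes: Iterable[Outcome]) -> Outcome:
    """A maximally simple 'mind': choose whichever outcome scores highest."""
    return max(outcomes, key=utility)


def negate(utility: Utility) -> Utility:
    """Build the exact opposite set of values by reversing the sign."""
    return lambda outcome: -utility(outcome)


# Hypothetical scores, for illustration only.
scores = {"humans flourish": 10.0, "status quo": 0.0, "humans destroyed": -10.0}
friendly: Utility = scores.__getitem__
unfriendly: Utility = negate(friendly)

print(best_outcome(friendly, scores))    # humans flourish
print(best_outcome(unfriendly, scores))  # humans destroyed
```

The decision procedure is identical in both cases; only the goal differs. That is the sense in which intelligence alone does not pin down a system of ethics.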

The argument I gave to Eliezer for universal ethics is essentially the same as a view once argued by Roko. (Roko has since been convinced that Eliezer is correct.) He calls it the theory of "instrumental values". The idea is that most rational agents will value truth, science, technology, creativity, and several other key items. It is possible to contrive utility functions that will not value these things, but most will. Therefore, a future created by a bad AI will not be devoid of value as Eliezer argues; rather, unless it has a really weird utility function, it will look a lot like the future that a good AI would create.

This is a topic I want to go into a lot of detail on, and if the first part of the post hadn't ended up being so long, I probably would. Instead, I'll blog more about it later...

For now, it is important to observe (as Eliezer did in our conversation) that regardless of far-off future similarities, a good AI and a bad AI have an immediate, critical difference: a bad AI will very probably consider humans a waste of resources, and do away with us.

Similarly, a badly designed but well-intentioned AI will very probably result in a future devoid of purpose for humans. Maximizing happiness will justify forced drugging. Maximizing a more complicated value based on what people actually consider good may easily result in a locking-in of currently enjoyed hobbies, art forms, et cetera. Maximizing instrumental values might easily lead to the destruction of humans.

The terrible, horrible dilemma (it seems to me) is that once a single AI gains power, be it good or bad, it seems that the single utility function that such an AI is programmed with becomes completely locked in. Any flaws in the utility function, no matter how small they may seem, will be locked in forever as well.

There are various ways of getting around this, to some extent. One approach I've been tossing around in my head for a while is that an AI should be uncertain about its own goals, similar to human uncertainty about what is ethical. This is entirely admissible within a Bayesian formalism. What, then, would the AI take as evidence concerning ethics? I visualize it something like this: the AI would have a fair amount of (hand-programmed) knowledge about what sorts of things are probably ethical, and it would search for simple rules that would meet these criteria. A better theory would be one that fit better to the patchwork of preconceptions about ethics. Preconceptions would include things like "what a human considers ethical, is more probably ethical..." "what a human, given large amounts of time to reflect, would consider ethical, is more probably ethical..." as well as simpler statements like "killing an unwilling victim is unethical with high probability", "creating pain is unethical with high probability", and so on. A few different Three-Laws style systems could "fight it out", so to speak.
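A minimal sketch of what this might look like, under heavy assumptions: the preconceptions are hand-written (action, probability-it-is-unethical) pairs, the candidate theories are tiny hand-written rules, and the prior simply halves with each unit of "complexity". All names, rules, and numbers here are hypothetical stand-ins, not a worked-out proposal.

```python
# Hand-programmed preconceptions: (action, probability that the action is unethical).
PRECONCEPTIONS = [
    ("kill unwilling victim", 0.99),
    ("create pain", 0.90),
    ("help a stranger", 0.05),
]

HARMFUL = {"kill unwilling victim", "create pain"}

# Candidate theories: (rule saying whether an action is unethical, complexity).
THEORIES = {
    "nothing is unethical": (lambda action: False, 1),
    "harm is unethical": (lambda action: action in HARMFUL, 2),
    "everything is unethical": (lambda action: True, 1),
}


def posterior(theories, preconceptions):
    """Weight each theory by a simplicity prior times its fit to the preconceptions."""
    weights = {}
    for name, (rule, complexity) in theories.items():
        weight = 2.0 ** -complexity  # simpler rules get more prior weight
        for action, p_unethical in preconceptions:
            # A theory fits a preconception to the degree it agrees with it.
            weight *= p_unethical if rule(action) else 1.0 - p_unethical
        weights[name] = weight
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def prob_unethical(action, post, theories):
    """Posterior-weighted judgment of a (possibly new) action."""
    return sum(post[name] for name, (rule, _) in theories.items() if rule(action))


post = posterior(THEORIES, PRECONCEPTIONS)
print(max(post, key=post.get))
print(round(prob_unethical("create pain", post, THEORIES), 3))
```

In this toy setup the slightly more complex "harm" rule wins despite its lower prior, because it fits the patchwork of preconceptions far better than either simpler rule. The AI never becomes fully certain of it, which is the property I am after.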

Eliezer suggests a different sort of solution: an AI should behave in a highly lawful manner, setting definite rules and consequences, rather than merely doing whatever it takes to do "good" as defined by humanity. He's suggesting this as a solution to a somewhat different problem, but it applies about as well here. An AI that booted up, calculated a good set of laws for utopia, set up physical mechanisms to enforce those laws, and then shut off, would not lock the future into a single utility function. It would of course give it a huge push in a particular direction, but that is quite different. It would be purposefully leaving the future open, because an open future is a plus according to (at least) the majority of humans.

The third option that I can think of is one I've already mentioned: have several powerful AIs rather than one. This still carries a large risk. 20 AIs that decide humans are useless are just as bad as 1 AI that decides humans are useless. However, 20 AIs with well-intentioned-but-wrong utility functions are probably much better than 1, so long as they all have different well-intentioned utility functions.

The AIs would probably have incentive to enforce a balance of power. If one AI becomes obviously more powerful than the others, the others have incentive to gang up on it, because that one pursuing its utility function to the utmost is probably far worse for the others than whatever the group consensus is. That consensus should be something favoring humans, since the individual goals are all random variations of that theme... if we look at all the goals, and ask what they have in common, favoring humanity should be the answer.

Of course, that result isn't entirely certain. First, the average of many mistaken goals is not necessarily a good goal. Second, the average is not necessarily the sort of compromise that would result. Third, once a compromise has been agreed upon, the AIs might (rather than maintaining their standoff) all rewrite their utility functions to reflect the consensus, and effectively merge. (This would be to avoid any future defectors; the utility of stopping other possible defectors might be higher than the utility of keeping your ability to defect by keeping your utility function, thus making it rational to agree to a group rewrite.) In this case, the lock-in that I'm afraid of would happen anyway (although we'd be locked in to a probably-less-terrible utility function). Fourth, the situation might not result in a standoff in the first place. Even with several AIs to begin with, one could gain an upper hand.
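The first caveat can be illustrated with a contrived example. Suppose the intended goal is equally happy with outcomes A, B, and C but considers D worthless, and three AIs each carry a different mistaken version of it. The outcomes and every number below are hypothetical, picked only to show that the effect is possible.

```python
OUTCOMES = ["A", "B", "C", "D"]

# What the designers actually wanted: A, B, and C are fine, D is worthless.
intended = {"A": 1.0, "B": 1.0, "C": 1.0, "D": 0.0}

# Three well-intentioned-but-wrong utility functions. Each one alone picks
# an outcome the intended goal approves of, but each also overrates D.
mistaken = [
    {"A": 1.0, "B": 0.0, "C": 0.0, "D": 0.6},
    {"A": 0.0, "B": 1.0, "C": 0.0, "D": 0.6},
    {"A": 0.0, "B": 0.0, "C": 1.0, "D": 0.6},
]


def argmax(utility):
    """Outcome with the highest utility."""
    return max(OUTCOMES, key=utility.get)


def average(utilities):
    """Pointwise average of several utility functions."""
    return {o: sum(u[o] for u in utilities) / len(utilities) for o in OUTCOMES}


individual_choices = [argmax(u) for u in mistaken]
consensus_choice = argmax(average(mistaken))

print(individual_choices)                        # ['A', 'B', 'C']
print(consensus_choice, intended[consensus_choice])  # D 0.0
```

Each AI acting alone would have chosen something acceptable, yet the averaged goal settles on the one outcome the intended goal rates worst, because the shared error adds up while the individual preferences cancel out.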

Friendly AI (as Eliezer calls it) is a hard question. But, it is not the question at hand. The question at hand is:

Which is better, an open research process or a closed one?

I've touched on a few of the factors, but I haven't come close to a definite answer. A proper answer requires an examination of the friendliness issue; an analysis of the curve of technology's growth (especially as it relates to computing power); an examination of what sort of theoretical advances could create a powerful AI, and in particular how suddenly they could occur; an idea of AI's future place in society (whether sub-human, human-level, or superhuman AI), which requires a socio-economic theory of what we will do with AI; and a theory of AI psychology, mapping the space of possible minds (focusing on which minds are friendly to humans).

I'll try to address each of these issues more closely in the next few posts.


  1. Anonymous 2:55 AM

    This comment has been removed by a blog administrator.

  2. I don't claim to have developed e.g. a unified logic whereby an AI can quine itself without falling prey to Lob's Theorem, which I have but am not releasing;

    I am saying that *if* I developed such a thing, I would not release it openly, and would urge you not to do so either, because the basic dynamic of "Easier to make bad AI than good AI" requires good AI researchers to accumulate at least some advantages that are not openly shared.

  3. See for a concept of updating on evidence about human(e) values ("ethical uncertainty" is the wrong term here because that implies a pre-existing framework).

  4. Eliezer,

    Did the post make it sound as if you claimed that? If so, that was completely unintentional.

    However, I suppose I did make it sound as if you favored an almost completely closed research community. This is somewhat unjustified. So, where do you draw the line? Do you think friendly AI researchers should reveal nothing at all to the wider scientific community, or only non-critical insights, or nearly-but-not-quite-everything?

    Obviously, there is reason to be completely open about insights that help build friendliness into an AI but don't help build the AI in the first place... such as fun theory. The question becomes just how open to be about everything else.

    Thanks for the link.

  5. Leo Szilard had this conversation with Enrico Fermi about the true neutron cross section of purified graphite, by the way. Fermi wanted to publish, Szilard said no. Fermi, who thought that atomic weapons were probably not possible and if possible were 50 years away, and who was a big believer in the international unity and openness of science, nearly exploded. Rabi came in on Szilard's side, and their small conspiracy kept the result secret.

    This is one of the major reasons why the Germans didn't get the atom bomb; they didn't realize that graphite was an effective neutron moderator, and went hunting for deuterium instead, which the Allies denied them.

    Sure, there are plenty of things you can talk about in Friendly AI, even mathy things - I would be willing to write up my theory of Newcomblike problems, if something made that worth the time investment.

    On the other hand, graphite of itself does not explode. We're lucky Szilard didn't let Fermi get away with "We don't even know that a chain reaction is possible! And just knowing the neutron cross section of graphite isn't enough to make an atomic bomb, right?" No secret can be unpublished, once published, so there is a strong argument for being conservative.

    Incidentally, the first comment currently showing by "Arabic" is spam, you should delete that.