The question of AI ethics is not merely a matter of estimating the probability of Really Bad Things; it is a matter of weighing the probability of Really Bad Things against the probability of Really Good Things.
This makes some of the stuff in the previous post less relevant (though not completely irrelevant). The major question becomes: in various scenarios, what is the ratio of good to bad?
When thinking about these issues, I find myself going back and forth between two concepts of the future:
- AIs increase in intelligence slowly. For a long time, AIs remain at human level, or at least at a human-understandable level. There is no miracle of recursive self-improvement; you get out what you put in.
- The first AI smart enough to begin recursive self-improvement quickly goes beyond the human-understandable level of intelligence, and has its way with the future.
I have argued already that the first outcome would be safer than the second; the opportunity for Really Good Things is smaller, but this (I would think) is outweighed by the far reduced probability of Really Bad Things. (In scenario 1, sociopathic AIs would still occur, but they would be of roughly human level and so could be contained; by the time AIs were too powerful to contain, the art of creating ethical AIs would be fairly well perfected. Also, even if an unethical AI were created, there would be a safety net of nearly-as-smart AIs to contain it.)
Of course, there isn't much freedom to choose between the two outcomes. Either recursive self-improvement (aka RSI) is a powerful possibility, or it isn't. To some extent, the outcome depends on whether algorithmic research outpaces hardware development, or vice versa (as discussed last time). The "race between software and hardware" mindset would have it that the best strategy for avoiding Really Bad Things is to make algorithmic progress as quickly as possible, which favors open research, et cetera. But even if algorithms outpace hardware, there could still be an RSI tipping point: suddenly the hardware is good enough to support RSI, and outcome 2 occurs. So this strategy isn't well-justified. What is needed is a solid analysis of RSI.
Matt Mahoney offers an argument that RSI is inherently slow. However, his model is limited by its (purposeful) restriction to strict self-improvement; in particular, by its exclusion of learning. Since explosive RSI-like learning capabilities are essentially as dangerous, this isn't a particularly helpful model. Shane Legg makes similar claims about learning. As it happens, I disagree with those claims:
3 days ago Matt Mahoney referenced a paper by Shane Legg, supposedly
formally proving this point:
I read it, and must say that I disagree with the interpretations
provided for the theorems. Specifically, one conclusion is that
because programs of high Kolmogorov complexity are required if we want
to guarantee the ability to learn sequences of comparably high
Kolmogorov complexity, AI needs to be an experimental science. So,
Shane Legg is assuming that highly complex programs are difficult to
invent. But there is an easy counterexample to this, which also
addresses your above point:
Let T be the amount of computation time the algorithm is allowed between sensory inputs. Sensory inputs can ideally be thought of as coming in at the rate of 1 bit per T cpu cycles (fitting with the framework in the paper, which has data come in 1 bit at a time), although in practice they would probably come in batches. Each time an input arrives:
- Add the new input to a memory of all data that's come in so far.
- Treat the memory as the output of a computer program in some specific language. Run the program backwards, inferring everything that can be inferred about its structure. A zero or one can only be printed by particular basic print statements. It is impossible to know for certain where conditional statements are, where loops are, and so on, but at least the space of possibilities is well defined (since we know which programming language we've chosen). Every time a choice like this occurs, we split the simulation, so that we will quickly be running a very large number of programs backwards.
- Whenever we get a complete program from this process, run it forwards (again, simulating it in parallel with everything else that is going on). Record what it predicts as the NEXT datum, along with the program's length (because we will be treating shorter programs as better models, and trusting what they tell us more strongly than we trust longer programs).
- Because there are so many things going on at once, this will run VERY slowly; however, we simply terminate the process at time T and take the best prediction we have at that point. (If we haven't gotten any yet, let's just say we predict 0.)
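The spirit of the procedure can be sketched in a few lines of Python. The email runs programs backwards from the memory; the sketch below instead enumerates a toy program class forward, which is equivalent in spirit: weight every program consistent with memory by 2^-length and take the weighted vote on the next bit. The (prefix, cycle) program class and all names here are illustrative assumptions, not part of the original proposal, and the length bound `max_len` plays the role of the resource bound T.

```python
from itertools import product

def hypotheses(max_len):
    """Toy 'programs': (prefix, cycle) pairs that emit prefix, then cycle
    forever. This stands in for a real universal language, which the
    described algorithm would enumerate instead."""
    for n in range(1, max_len + 1):
        for split in range(n):  # split point between prefix and (nonempty) cycle
            for bits in product("01", repeat=n):
                yield "".join(bits[:split]), "".join(bits[split:])

def run(prog, length):
    """Run a (prefix, cycle) program forwards for `length` output bits."""
    prefix, cycle = prog
    out = prefix
    while len(out) < length:
        out += cycle
    return out[:length]

def predict_next(memory, max_len=8):
    """Weight each program that reproduces `memory` by 2^-length, as in the
    post: shorter programs are better models, trusted more strongly."""
    votes = {"0": 0.0, "1": 0.0}
    for prog in hypotheses(max_len):
        out = run(prog, len(memory) + 1)
        if out[: len(memory)] == memory:
            votes[out[-1]] += 2.0 ** -(len(prog[0]) + len(prog[1]))
    return "0" if votes["0"] >= votes["1"] else "1"

print(predict_next("010101"))  # prints "0": the short cycle "01" dominates
```

Raising `max_len` (like raising T) lets the predictor consider longer, rarer programs, at exponentially growing cost.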
A more sophisticated version of that algorithm was presented at the AGI
conference in this paper:
The algorithm will be able to learn any program, if given enough time!
NOW, why did Shane Legg's paper say that such a thing was impossible?
Well, in the formalism of the paper, the above "algorithm" cheated: it
isn't an algorithm at all! Fun, huh?
The reason is that I parameterized it in terms of that number T.
So, technically, it is a class of algorithms; we get a specific
algorithm by choosing a T-value. If we choose a very large T-value,
the algorithm could be very "complex" in terms of Kolmogorov
complexity. However, it will not be complex to humans, since it will
just be another instance of the same general algorithm! In fact, it
would just be the same algorithm given more time to do its job.
So, on those grounds, I disagree with Shane Legg: it is still possible
to find algorithms analytically, despite the found algorithms being
"of high complexity". Or, to rephrase: there are simple algorithms of
high Kolmogorov complexity, and those algorithms do the job that can't
be done by algorithms of low Kolmogorov complexity.
(As an aside, the next paragraph in my email uses a mathematically flawed argument: "Once we have the ordering, I can define the game as: create the lowest video sequence not definable in set theory." What I was trying to do was diagonalize set theory, to define a set-theoretically undefinable video sequence. The right way to do this is not to take the "lowest" sequence not definable by set theory. Instead: put an ordering on set-theoretic definitions of sequences. For the Nth frame of our non-definable video sequence, look at the Nth frame of the Nth set-theoretic video sequence; if the screen is all white there, take our sequence to be all black on that frame; otherwise, take it to be all white. :D)
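Written out, the diagonal construction (with S_1, S_2, ... an assumed enumeration of the set-theoretically definable video sequences, and D the sequence being constructed) is:

```latex
D(N) =
\begin{cases}
\text{all black}, & \text{if frame } N \text{ of } S_N \text{ is all white} \\
\text{all white}, & \text{otherwise}
\end{cases}
```

D differs from every S_N at frame N, so it appears nowhere in the enumeration; since every definable sequence is some S_N, D is not definable.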
Still, all I'm claiming is that we can construct an algorithm that will perform better if it is given more time. That provides a sort of counterexample, but not one that is explosive (until the point where an AI gets the ability to improve its own hardware).
[Edit: In the comments, Shane Legg points out that this doesn't really provide a counterexample. We can construct an algorithm that will do better given more time, but that does not mean that for every sequence we might want to learn, there is some amount of processing time that will allow the algorithm to converge correctly. In fact, that's false; there will be sequences on which the algorithm can't converge, for any fixed amount of time we allow it. Shane also corrects me on the conclusions he draws: he does not, in fact, conclude that "AI must be an experimental science".]
Eliezer makes a vivid argument for the possibility of explosively fast learning. Here is the key mathematical bit:
I occasionally run into people who say something like, "There's a theoretical limit on how much you can deduce about the outside world, given a finite amount of sensory data."
Yes. There is. The theoretical limit is that every time you see 1 additional bit, it cannot be expected to eliminate more than half of the remaining hypotheses (half the remaining probability mass, rather). And that a redundant message, cannot convey more information than the compressed version of itself. Nor can a bit convey any information about a quantity, with which it has correlation exactly zero, across the probable worlds you imagine.
But nothing I've depicted this human civilization doing, even begins to approach the theoretical limits set by the formalism of Solomonoff induction. It doesn't approach the picture you could get if you could search through every single computable hypothesis, weighted by their simplicity, and do Bayesian updates on all of them.
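The first limit quoted above can be checked with a quick calculation. Assume deterministic hypotheses, so that observing a bit eliminates exactly the probability mass that predicted the other bit; if mass p predicts the next bit is 1, the expected eliminated mass is p(1-p) + (1-p)p = 2p(1-p), which never exceeds one half:

```python
def expected_eliminated_mass(p):
    """Deterministic hypotheses split their mass between predicting 1
    (mass p) and predicting 0 (mass 1 - p). The observed bit eliminates
    the mass that guessed wrong, so the expectation under the predictive
    mixture is p*(1-p) + (1-p)*p."""
    return p * (1 - p) + (1 - p) * p

# The bound: no single bit is expected to remove more than half of the
# remaining probability mass; the worst case is p = 0.5.
assert max(expected_eliminated_mass(i / 100) for i in range(101)) <= 0.5
print(expected_eliminated_mass(0.5))  # prints 0.5, the theoretical maximum
```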
He does, however, admit reservations about the possibility of this power being manifested in the physical universe:
No one - not even a Bayesian superintelligence - will ever come remotely close to making efficient use of their sensory information...
...is what I would like to say, but I don't trust my ability to set limits on the abilities of Bayesian superintelligences.
(Though I'd bet money on it, if there were some way to judge the bet. Just not at very extreme odds.)
But, of course, an entity (say) halfway between humans and a Bayesian superintelligence is still rather dangerous.