## Monday, April 27, 2009

One of the big rifts between the way things are described in the theoretical foundations of mathematics and the way things are actually done in real mathematics is that real mathematicians are constantly defining new objects to study. Perhaps someone sees a particular combination of properties pop up a few times, so they give that combination a name and start studying it directly. Perhaps someone decides a particular restriction of a given problem will be easier to solve, and so gives that a name and starts studying it. Perhaps someone wants to know how many X need to be combined to make Y, and they call the number the "X number" of Y, and start studying the properties of it as X and Y are varied. And so on.

In fact, even when studying the foundations of mathematics, such names are often snuck in as "unofficial notation" that is thought of as abbreviating the official notation. Unofficial notation makes axioms and other formulae easier to read.

Why do we do this? I can think of two explanations. First, one can explain this in the same way "named entities" are explained in other domains: it is compressive to name common entities. This means it takes up less room in memory, and also as a side benifit it usually makes the world easier to predict. This is an important explanation, but not the focus of this post. The second explanation is that naming things is a means of inference control.

When we name things, what we proceed to do is to determine some basic properties of the new entities. (This seems true in non-mathematical discourse as well, but perhaps not as clear-cut.) We come up with properties of the new entities, and look for ways of calculating those properties. We come up with theorems that hold for the entities. We look for existence and uniqueness proofs. And so on. What I'm arguing is that all of this activity is basically focused on helping later inference.

One particular phenomenon serves as a good illustration of this: the behavior of mathematicians when studying sequences of numbers that emerge from particular mathematical entities. The prime numbers; the number of groups of each finite size; the number of graphs of a given size; the fibonacci sequence. The big thing to try to do with a sequence is to "find the pattern". What could this mean? If the sequence is mathematically well-defined, don't we know the pattern already? A mathematician will find the first few values of a sequence, and then study them looking for relationships. Often what is sought is a recursive definition of the sequence: a function that calculates the next number based on the direct previous number, or several of the previous numbers, or perhaps all of the previous numbers. If we've already got a recursive form of the sequence, we might try to find a "closed form" version. My argument here is that all of this behavior is explained by the desire to make later reasoning as expedient as possible. Finding the recursive form is not merely a sort of curiosity, like wondering if it is possible to keep a spider alive in a paper cup; rather, the recursive form is a faster way of calculating the sequence then the method that follows directly from the definition. Similarly, the closed form wil usually be even faster.

So, what I'm saying essentially is that a reasoning system should (when it isn't busy doing other things) be looking for nice entities related to the task at hand, and nice quick ways of calculating stuff about those entities. What "nice entity" means is based on two (sometimes conflicting) motivations: the entity should be compressive (meaning it should help make descriptions take up less space), and it should be tractable (meaning reasoning about it should be quick).

How should the search for good ways of calculating be carried out? By the same methods as general inference. The system should be able to apply good reasoning methods that it finds back onto the task of looking for good reasoning methods. Of course, for this to work very well, the system needs to be able to have fairly good reasoning methods to begin with.

What form should the problem-solving methods that the system finds take?

Definitely they should be able to take on the form of typical algorithms: closed-form expressions, recursive expressions, and generally any . Definitely these should be associated with the necessery criteria for application to an object (the criteria that guarantee correct results). Probably they would also be associated with provable upper bounds on runtime, so that the method chosen for a particular case (where multiple methods might apply) would be the one with the lowest time-estimate for that case. (Problems might arise for difficult-to-calculate time bounds; perhaps estimators would be required to be linear time.)

Some extra features could probably improve upon this basic setup:

-Allow possibly-non-terminating algorithms. This would make "search for a proof" count amoung the methods (as the last resort), which strikes me as elegent.

-Allow learned expected time bounds

-Allow "soft" problem-solving strategies (of possibly various sorts) to be learned; ie, heuristics

Last time, I wrote about the apparent limitations of logical systems arising purely from a formal axiomatization of the Tarski hierarchy. I said:

So, if this gets us a lot of math, what is missing?

The obvious answer is "a truth predicate for that system". This doesn't lend much insight, though.

I should have thought before I spoke. A more powerful language cannot be constructed by adding a truth predicate. The semantics of the language is that it should be referring to the Tarski hierarchy! A truth predicate over such a language would need to be a truth predicate over the entire Tarski hierarchy. Such a truth predicate would apparently correspond to an ordinal above all ordinals. Not good!

As a side note: I've been looking at Quine's "New Foundations", and in that system, it is possible to talk about the ordering of all ordinals. Surprisingly, this ordinal does not cause problems. So, maybe in New Foundations the above move would be OK. With a naive view of the ordinals, though, it is not.

Keeping the naive view, it seems like I should deny the possibility of enriching the language by adding a truth predicate. Does that mean that I should say that the language is maximally rich? I don't think so. I suspect the situation is more like what happens with Kripke's language that contains it's own truth predicate: the language can be expanded in other, new directions.

[edit- some of the text was garbled as posted. I may have lost a paragraph or so.]

## Saturday, April 25, 2009

(Analysis of the system proposed in this post)

The idea of creating axiom systems describing Tarski's hierarchy in terms of ordinals, and ordinals in terms of Tarski's hierarchy, was discussed previously. It seems that the circularity would help prove the existence of lots of ordinals (and lots of Tarskian levels), hopefully without leading to paradox. So, if this gets us a lot of math, what is missing?

The obvious answer is "a truth predicate for that system". This doesn't lend much insight, though.

My bet is that we don't get anything uncountable by such a method. Where would it come from? We're constructing truth-predicates for countable languages, and (initially) countable sets of countable languages. Sure, the set of all countable ordinals is uncountable. But there is no reason to believe that we get the set of all countable ordinals!

My feeling is that if I studied more about the hyperarithmetical hierarchy, there would be an obvious mapping between some portion of it and the system(s) under consideration.

In some ways, the notion of "uncountable" seems to come from nowhere. All the formal constructions for "real number" and related entities seem to rely at some point on some other uncountable set. It's a bit like the (obviously illicit) method of defining a really big ordinal: "The ordering of all ordinals that can be defined without reference to this ordinal".

Yet I do feel that the notion is meaningful! So, where might it come from?

I have several ideas, none of them entirely convincing.

One idea is that uncountable sets can be motivated by considering the space of possible sequences of input from the external environment, assuming a countably infinite amount of time. I've seen similar ideas elsewhere. One might counter that all structures die eventually, so all sequences of input in real-world situations are finite; this makes the set into a mere countable infinity. On the other hand, one might say that this is merely the case in practice; the idea of an infinitely long-lived structure is still sound, and even physically possible (if not probable). But even so, I don't feel like this is a really complete justification.

Another attempt would be to claim that we need to allow for the possibility of infinitely long statements, despite not being able to actually make and manipulate such statements. (Universal statements and such may abbreviate infinitely many claims, but they are not literally infinitely long.) This idea is motivated by the following consideration: a nice way of getting a theory of set theory from a non-set-theoretic foundational logic is to think of a set as a statement with an unfilled slot into which entity-names can be put to make complete statements. Claiming that a set contains an element is thought of as claiming that the statement is true of that object. At first, this might seem to fully justify naive set theory: a set exists for each statement-with-hole. But, this can only work if the theory contains its own truth predicate, so that we can make arbitrary talk about whether a statement is true when we fill a slot with a given element. The amount of set theory that gets captured by this method depends basically on the extent to which the theory is capapble of self-reference; the naive theory of truth corresponds to naive set theory.

This is interesting by itself, but the point I'm making here is that if we want to have an uncountable number of sets (for example if we believe in the uncountable powerset of the real numbers), then we'll want a logic that acts as if infinite statements exist. What this means is an interesting question; we can't actually use these infinite statements, so what's the difference with the logic?

One difference is that I don't automatically have a way of interpreting talk about turing machines and numbers and other equivalent things anymore. I was justifying this referential power via talk about statements: they provide an immediate example of such entities. If we posit infinite statements that are "out there", somewhere in our set of sentences, we lose that quick way of grounding the idea of "finite". (This could be taken to show that such a method of grounding is not very real in the first place. :P)

Semantically, then, we think of the base-language as an infinitary logic, rather than regular first-order logic. The language formed by adding the first truth predicate is then thought of as already containing talk about uncountable sets. (This changes the axoims that we'd be justified in using to manipulate the truth predicate.)

All in all, I think this direction is mathematically interesting, but I'm not sure that it is really a route to justify uncountables.

A relevant question is: why do mathematicians think that uncountables exist? The proof is given by taking the powerset of some countably infinite set, which is defined as the set of all subsets of the countably infinite set. It's then shown that no function exists that maps the powerset onto the countable set. This can be done even in systems that does not really have any uncountable sets: the set of subsets of a countably infinite set will map onto the original set, but not by a function within the system. So from inside the system, it will look as if there are uncountables.

If this explains what mathematicians are doing, then mathematicians are being fooled into thinking there are real uncountables... but how can I say that? I'm just being fooled into thinking that there is a difference, a "real" and "not real", right?

I think it is plausible that this weirdness would go away, or at least change significantly, in a logic that resolves other foundational problems. We might have a much better story for why the function that would map the uncountable set onto the countable set doesn't exist, so that it becomes implausible to claim that it exists from the outside but not from the inside. (But would that make the uncountables "real"?)

(Originally posted to this logic mailing list I'm trying to get started. All are welcome! Except if you know nothing about logic and aren't willing to learn. Then you're not welcome. Sorry.)

An interpretation is a translation of a statement from one logic into

another. I was looking at interpretations recently because they appear

to be a way to show that semantically stronger logics (specifically,

logics formed by adding a truth predicate) really are stronger in

practice. The stronger logic can interpret the weaker, but the weaker

cannot interpret the stronger. (This is one type of proof-theoretic

strength.)

Normally, one does not allow just *any* interpretation... for example,

we wouldn't want a universal quantifier in the first logic to be

interpreted as an existential one in the new logic. It is typical (so

far as I have seen) to require the logical connectives to remain

unchanged, and only allow minimal changes to the quantifiers (such as

translating them to be restricted quantifiers).

Yet, all we care about is structure in these interpretations. For

example, if we're interpreting talk about numbers into a logic that

talks about sets, we don't really care if the resulting translated

sentences don't have anything to do with sizes of sets-- all we're

supposed to worry about is relationships. For example, our new

translated notion of "addition" should tell is that "5" and "6" makes

"11". (Usually when people do construct interpretations, of course,

they try to find ones that make some intuitive sense.)

So if all we care about is structure, why constrain the way logical

connectives and quantifiers are translated? What happens if we get rid

of these constraints?

Well... we can't get rid of *all* the constraints: we still need to

capture what we mean by "interpretation". Not every function from one

language to the other counts! The way I see it, we want to keep the

following minimal constraint:

C1: If A |- B in L1, then f(A) |- f(B) in L2

Where f(X) is the translation function, "X |- Y" means Y can be proven

from X, L1 is the language being translated, L2 is the language being

translated into, and A and B are statements in L1.

I've also thought of the following:

C2: A |- B in L1 if and only if f(A) |- f(B) in L2

But, that is not much like the standard notion of interpretation.

Essentially it means that the interpretation into L2 adds no extra

knowledge about the entities in L1. But, if we're looking for strong

languages, extra knowledge is a good thing (as long as it is true

knowledge). (One could argue that this justification doesn't apply if

we're only worrying about structure, though, I think. Specifically, of

when we say "structure" we mean not just structures of what's provable

but also structures of what isn't, we should take C2 as the proper

constraint. I'll proceed with C1 since it is the more standard-looking

constraint.)

Anyway. What now happens to the earlier assertion, that semantically

stronger languages are also proof-theoretically stronger, because they

can interpret more logics?

Answer: plain first-order logic can interpret any logic L with a

computable notion of provability.

Proof: Arbitrarily declare that some of the function symbols represent

the operations necessary to build up statements in L (for example, one

function might be chosen for each character in the alphabet of L, so

that composing those functions in a particular sequence would

represent writing down characters in that sequence). For an

L-statement X, call the first-order version of X built just from these

function symbols h(X). Write down some statements describing

L-provability. Writing down the rules of L like this is possible

because first-order logic is Turing-complete. Take the conjunction of

the statements about L and call it R (for "rules"). The interpretation

of an L-statement X will simply be R->h(X), where "->" is material

implication. This will satisfy C1 because wherever L proves A from B,

first-order logic will prove that if the rules of L-provability hold,

then A is L-provable from B. (This proof is still pretty messy, but

nonetheless I'm guessing I'm going into more detail here than needed,

so I'll leave it messy for now.)

What does this mean? To me this means that proof-theoretic strength is

not a very justifiable way of enticing people over to the side of

logics with strong semantics. First-order logic is in some sense

proof-theoretically all-powerful (if we allow arbitrary

interpretations, and if we're dealing with computable provability). If

someone is set in their path of using just first-order logic, I cannot

convince them just by talking about first-order truth; first-order

logic doesn't have a strong enough semantics to talk about first-order

truth, but they can interpret what I am saying by just listening to

the computable inference rules of my more-powerful logic. Every time I

say X, they can interpret me as meaning "The rules I've laid out imply

X". They will then be able to assert that first-order reasoning can

justify all the reasoning I'm doing, without even needing any new

axioms.

asserting that X follows from the rules I've laid out... but if

they're *really* applying the interpretation properly, they'll tell me

that I'm making a meaningless distinction, since they'll think

R->(R->h(X)) is the same as R->h(X). (If they make this reasoning

explicit, though, I have them: I can assert that I'm not saying

R->h(X), I'm saying h(X).)

## Friday, April 17, 2009

Continues this post.

I think now that it is relatively straightforward to establish a correspondence between the Tarski hierarchy of truth and my hierarchy of nonsense.

Basically, the two hierarchies diverge thanks to two different notions of the correct way to add a "truth" predicate to a base language. The Tarski hierarchy adds a metalanguage that only talks about truth in the base language. The nonsense hierarchy instead prefers Kripke's method, in which we construct a metalanguage that contains its own truth predicate. Both approaches can be thought of as constructing a metalanguage on top of a base language, but the Tarski hierarchy keeps doing so, resulting in a hierarchy of truth, whereas the Kripke fixed-point construction cannot be iterated- doing so adds nothing more. To continue upwords in the Kripke construction, we proceed in a different direction, adding nonsense predicates.

When we use the Kripke truth construction, we can clearly interpret the first Tarski iteration: all the truth-sentences that talk only about truth of base-language statements will be there, provided we have enough machinery to interpret the restricted quantifiers. (Details here will depend on the exact construction.) The semantics assigns them the same truth values; none of these sentences will come up undefined. (I'm talking about semantic interpretations, not proof-theoretic ones... again, details need to be worked out.) The second iteration of Tarskian truth will similarly be inside the Kripke construction; since the first iteration gets definite truth values, the second does. So it goes for as long as the Kripke construction can interpret the restricted quantifiers; that is, for as long as the characteristics of a particular level of the Tarski hierarchy are definable given the tools that the Kripke construction has at its disposal. For example, if these tools can only define computable structures, I'd suppose that the Kripke construction would interpret the portions of the Tarski hierarchy corresponding to the computable ordinals. (That's just a guess. More research required!)

In any case, given a particular amount of expressiveness in the base langauge, the Kripke construction will add a definite amount of expressiveness, corresponding to climbing a particular number of Tarski-hierarchy steps. (Probably this is known; I imagine someone has researched the semantic expressiveness of the Kripke least-fixed-point...) So what happens when we add in more nonsense predicates? Well, adding in nonsense predicates basically allows us to climb that same number of levels again; each nonsense predicate plays the role of allowing us to talk fully about the semantic structure of the construction-so-far (the role that the truth predicate plays in the Tarski hierarchy). This can be thought of as adding that amount of structure to the base language. Then, the Kripke truth construction can do its work on that increased amount of structure. So, we jump up the same number of steps on the Tarski hierarchy for every nonsense predicate added.

Eventually, since the amount of structure added by the truth predicate is always fixed, the scene will be dominated by the hierarchical structure added by the nonsense predicates. Still, it seems clear that each level will correspond in a definite way to a level on the Tarski hierarchy. The nonsense hierarchy merely forces one to make larger jumps at a time.

## Friday, April 10, 2009

What are we doing when we list infinities? Particularly, I take ordinal infinities as my example.

The most natural ordinals, which people will tend to stay within if you ask them to invent bigger and bigger ordinals and give them the basic definitions, are the recursive ordinals. In fact, people will tend to stay within what I'd call the "prinitive recursive" ordinals:

infinity, infinity + 1, infinity + 2, .... infinity + infinity (= 2*infinity), 3*infinity, 4*infinity, ... infinity*infinity (= infinity^2), infinity^3... infinity^infinity (=infinity^^2), infinity^(infinity^infinity) (=infinity^^3), infinity^(infinity^(infinity^infinity)) (=infinity^^4) ...... infinity^^infinity ..... infinity^^^infinity ..... infinity^^^^infinity .....

The above uses up-arrow notation.

Enumerating computable ordinals involves finding halting computations, such as the numerical operations used above. Sticking infinity into a computable function basically stands for the infinite series produced by sticking finite numbers in that same location. So just "infinity" stands for:

1, 2, 3, 4, 5, ....

We can define + as the concatenation of two different orderings, which leaves the internals of the orderings untouched but specifies that all elements from the second will be larger than any elements from the first. So, "infinity + 1" stands for:

1, 2, 3, 4, .... A.

Here, "A" is a pseudo-number larger than all the finite numbers. (In other words, X is the first infinite number.)

"infinity + infinity":

1, 2, 3, ... A, A+1, A+2, A+3 ...

Here, there are essentially two infinite series, the one that starts with 0, and the one that starts with A.

"infinity + infinity + infinity":

1, 2, 3, ... A, A+1, A+2, ... B, B+1, B+2 ...

Now there are 3 series; 0, A, and B.

"infinity * infinity":

1, 2, 3, ... A, A+1, A+2 ... B, B+1 ... C, C+1 ... D, D+1 ... ...

Now we have an infinite number of series.

And so on. We can imagine each computable function as a method of specifying an infinite tree. The tree is "computable" because we can generate any finite portion of the tree using the computable function. Approximately speaking, the more tangled and structured and multiferous the branching of the tree, the larger the ordinal. But if we try just generating computations at random, then they may not halt; so we've got to try to put as much tangledness in as we can without putting in so much tangledness that things become ill-defined. This can potentially take any knowledge we've got about the halting problem.

If this were all there was to enumerating infinities, then I would say that something like probabilistic mathematical truth explains what it is we're doing. However, mathematicians (especially logicians) talk about ordinals that are not computable. These include Kleene's O, the first uncomputable ordinal (which makes it the ordering on all computable ordinals), as well as many higher countable ordinals; the first uncountable ordinal, which is of course just the ordering of all countable ordinals; and so on.

To make the initial jump outside of the recursive ordinals, to Kleene's O, we need to make an interesting sort of move: we admit our ignorance. We give up the hope of being able to enumerate every ordinal on the way, and make due with saying "if the function halts, then the ordinal is well-defined". Since we never will have all the information about which functions halt, we'll always be somewhat ignorant of which of these ordinals are well-defined. Yet, we look at their ordering as a mathematical entity, and start talking about it.

This gives the impression that we'll need to give up more and more if we want to climb higher in the ordinals. But how much can we give up before we've given everything up?

I propose that, in general, the process of listing ordinals is a process of deciding which things are well-defined. If we give that up, we've given up too much.

Here, "well-defined" means "having a description on some level of Tarski's hierarchy of truth", or alternatively, "having a description on some level of my hierarchy of nonsense".

Of course, this is circular. Those hierarchies are both defined in terms of ordinals, so defining ordinals in terms of them appears unproductive. However, it is not completely unproductive. Let's take Tarski's hierarchy as the working example. Let 0 represent first-order logic, 1 represent the theory of truth for that (specifically, the theory with enough axioms to be equivalent in strength to Peano Arithmetic), 2 to be the theory of truth-in-1, and so on.

The thing I want to note here is that the ordinal assigned to a level in the hierarchy is far lower than the largest ordinals whose existance can be proven in that theory. Suppose I lay down axioms for Tarski's hierarchy in terms of ordinals, and then lay down axioms for ordinals which require definability in Tarski's hierarchy. It seems that I'll get a large number of ordinals in this manner. If I start out believing that the ordinal 1 is well-defined, then I'll believe all the ordinals proven well-defined by Peano arithmetic are well-defined. That is a rather large number of ordinals. Since I believe in them, I'll believe in all the levels of the Tarski hierarchy corresponding to them... lots of levels! This gives me many more ordinals to believe in, which gives me many more levels to believe in, and so on.

Of course, this stops somewhere (in the same sense that counting up stops somewhere...). It will only imply the existence of so many ordinals, assuming that it is a consistent theory. Furthermore, if it is consistent, then I can name an ordinal that it does not: the ordering of all the ordinals the system talks about. Let's call this the "outside ordinal" for the system. (This is a bit trickier to specify than it looks at first, however. We can't just say "the ordering of all ordinals the system will consider well-defined". The system will have gaps in its knowledge. For example, it will prove a bunch of recursive ordinals to be well-defined, but not all of them; it then jumps to Kleene's O, because it can talk about the set of well-defined recursive ordinals. Even more clearly: the system might be able to prove that the first uncountable ordinal is well-defined, but it will not be able to prove that all ordinals below this are well defined... there are uncountably many of them!

The main precaution that must be taken is to prevent the system from taking "the ordering over all ordinals" to be an ordinal. This is like me saying "Consider the set of all well-defined ordinals. Consider the ordering on these as an ordinal, Q. Take Q + 1..." This is not allowed!

OK. Given that, let's think about what happens when we add probabilistic justification to the system. We can think of the system as (eventually) knowing the truth about the halting problem (for any particular instance). This means that it is (eventually) correct in its judgements about well-definedness for computable ordinals. Thanks to the feedback effect of the system, this will mean that it is (eventually) correct in its judgements concerning a whole lot of ordinals. All of them? No: just as there are ordinal notations for which the halting problem determines well-definedness, there are ordinal notations for which the convergence problem determines well-definedness. (For a definition of the convergence problem, see here.) Still, this is an interesting class of ordinals.

So how could one go even further? Well, perhaps we could consider the "outside ordinal" for the version of the system that knows all the halting truths...

## Wednesday, April 08, 2009

Continues Risk Estimate.

I am generally disappointed by the lack of detailed tracking of the progress of computing power... I do not know where to find projections based on large amounts of up-to-date data. The most often cited projection is Kurzweil's, based on data from up to 2000; for example, here. I found one that may be more recent, but it does not show data points, just a curve. These curves are based on what $1000 gets you. It is also interesting to note the supercomputing curve.

Moore himself believes that the exponential trend will stop in 10 to 15 years, but he does not take into account 3d chips (by his own admission) or other possible ways around the halting spot. I think it is safe to allow a few more years based just on that... but since the halting spot is a point of contention, for now I'll just give the analysis based on the raw exponential curve.

By the curve, computers reach human-level processing power between 2020 and 2030. leave the curve at exponential, rather than going super-exponential as Kurzweil suggests we should, the rate looks to be around 5 orders of magnitude for every 10 to 20 years. (I know that is a big range, but I don't have much data to go on.) So if a computer is 1 person in 2025, it is 10 people in 2040, and 100 people in 2055.

What does it mean for a computer to be as smart as 10 or 100 people? A first approximation would be to say that it would be able to accomplish as much intellectual activity as a group of 10 or 100 people that were completely unified in their aim, and could communicate with eachother continuously. But even this estimate is low, because there is a large amount of redundancy in 10 such people. It is hard to estimate how much, but roughly we could say that only 1 visual cortex is needed, and the other 9 people could be using that processign power for something else; only 1 motor cortex is needed, so the other 9 could be using that processing power for something else; and so on. This might (roughly) double the amount of thinking power, after the first person (who needs their motor and visual cortex intact).

So how safe is this? I'd say as soon as we get roughly human level we've got a significant risk of the AI deciding it would be awesome to "get loose" and use spare computer power stolen from the internet. My estimate could be greatly improved here, but there are about 7 billion people in the world, and more will be here by 2020, so assuming 1/8th have computers on the internet (that is where the estimate is shaky) we're talking a 1-billion-fold increase in computing power as soon as an AI is able to "get loose". That assumes that the 1 billion computers on the net are roughly equivalent in power to the 1 that the AI started on (in line with the idea that we're estimating based on what $1000 of computing power is). By my earlier estimate, an AI as smart as 1 person becomes as smart as 2 billion. But once we go distributed, the advanteges I was talking about go away; the massive redundancy becomes necessary. So, back to 1 billion. (The idea that "smartness" merely doubles when we get rid of the inefficiencies of distributed existence is blatantly silly for the cae of billions of people... oh well.)

So, could the world defend itself against the intellectual effort of 1 billion virtual people? We've got 8 billion on our side (by that point)... plus we start out with a good-sized advantage in terms of control of resources. How much could 1 billion hackers aquire on short notice?

For now, I'll give that a 50% chance if extinction assuming bad intent on the AIs part, and assuming it is able to get on the internet, and assuming it is able to create a virus of some kind (or use some other scheme) to get a fair chunk of the internet's computing power. I'll give 50% probability to those other two as well... making 12.5% probability of extreme badness given evil intent. So the question is, how probable is bad intent in this scenario?

By the way, this estimate puts current computers at around the level of a mouse. Do the best current AIs acheive this? Seems doubtful, I'd say. Mice are pretty smart. They accomplish a fair amount of visual recognition, and furthermore, they are able to put it to good use. (The second part is what we have the least ability to model, I think... goal-oriented systems that can flexibly use highly structured sensory info.) So, by the model so far, AI progress will more probably be sudden then gradual... someone will put together an algorithm capable of taking full advantage of the hardware, and things will change.

I'd say that might happen in anywhere between 5 and 20 years. The outcome if it happens in 5 years are very different from those if it happens in 20. If it happens in 5 years, I'd say good results are fairly certain. 10 years, and there is more concern. 20 years and Very Bad Things have fair chances, maybe as high as my 10% "halt everything" level.

Let's take that arbitrary statement and mathematize it... 5 to 10 years = .1% chance of Really Bad Things, 10 to 15 = 1%, 15 to 20 = 10%.

Giving each option a 1/3 probability, we have around 3.7%.

But, giving my assumptions, the best way to reduce risk appears to be trying to find a good algorithm quickly (favoring open research). At some point between 2015 and 2020, the risk goes beyond 1% (which I arbitrarily label as the point of "real concern") and the strategy should turn towards making sure the first good-enough algorithm is also endowed with a safe goal system.

It should be obvious that this estimate is an extremely rough one. More later?

---[edit]---

One of the most obvious factors not considered here is the chance that the brain is badly-designed enough that a human level AI could be run on current PCs. The probability of this is nonzero, but if the brain were really so inefficient (= 5 orders of magnitude of inefficiency), I would expect that human AI programmers would already be outperforming it. The idea that current AIs are not as smart as mice despite having roughly as much hardware suggests that brains are fairly well-engineered. (The statement "roughly as much hardware" here needs to be made more specific. However, as long as it is inside 5 orders of magnitude, the argument makes some sense.)

The picture I've been using mostly for the idea of knowable truths goes something like this: quantifier-free statements are computable (we can tell if they are true or false); statements that can be reduced to a normal form containing only existential quantifiers are computably verifiable (if they are true, we can know this by eventually finding an example) but only probabalistically falsifiable (if they are false, we'll never know it for sure, but we can suspect it because we've tried for a long time to find an example and failed); statemets similarly containing only universal quantifiers are falsifiable but only probabilistically verifiable; statements whose normal form universally asserts an existential statement are probabilistically falsifiable; statements asserting the existence of something satisfying a universal are probabilistically verifyable. Nothing else appears to be knowable in any significant sense.

Nonetheless, even Robinson arithmetic gives us more definite truths than this scheme does. Statements like "For all x, x=x" can be verified, not just falsified. Should this be looked at as probabilistic knowledge? This view becomes understandable if we think of equality as a black box, which we know nothing about. We just feed it numbers, and accept what we get out. If not the axioms of equality, at least the first-order tautologies seem knowable: perhaps "for all x, x=x" is probabilistic knowledge of the behavior of equality, but "for all x, x=x if x=x" is clear and knowable... right? Well, this too could be questioned by claiming that the truth functions are also to be treated as black boxes.

Why would we take this view? Perhaps we claim that knowledge of the actual structure of equality, and similarly of the actual structure of truth functions, should be represented at the metalanguage level. Then, "for all x, x=x" would be something we could prove in the metalanguage, thanks to our additional information about equality, but not in the language itself. I think there is something intuitively appealing about this, despite the restrictive nature of the claim.

The base logic would contain rules for reducing equality statements to true/false, and similarly rules that reduced a truth-function whose arguments were already true or false. A statement that reduces to "true" can then be concluded. In addition to these rules, there would be a rule that allowed existential quantifications to be concluded from examples; this would be a fairly complicated process, as we can pull out any set of identical pieces and replace them with a variable. Once an existential statement has been concluded, it can reduce to "true" in larger expressions. Universal quantifiers can be defined from existentials as usual, which allows us to conclude their negation by finding counterexamples (but does not allow us to affirm any universals).

How could a metalanguage be constructed along similar lines, but yielding the first-order truths we're used to? Not sure. What's needed most is the ability to see that even though we don't have all the information for a truth-function to be calculated, the additional information will create the same answer no matter what.

## Monday, April 06, 2009

4 probability estimates seem essential to me.

- A projection of the growth of computing power. This is of course extensively discussed, so it is at least somewhat reasonable to use standard predictions.
- Estimation of probable AI intelligence levels given different hardware levels. There is a lot of hidden complication here. First, how do we want to measure intelligence? How should we define intelligence level? The most relevant way of doing this is to estimate real-world problem solving capabilities (in each important area). Second, what we want isn't just a fixed maximum intelligence level, but rather a curve over time; this captures information about intelligence explosions due to fast learning or recursive self-improvement. Third, ideally we don't just want one probable curve for each hardware level but rather a probability distribution over such curves.
- Estimation of probability of unfriendly behavior, given various classes of probable goal functions that AI programmers might come up with. Of course we'll also want to assign a probability to each of these goal-classes. The probability of unfriendly behavior depends mainly on the intelligence level reached (which depends in turn on the hardware level). An AI that is only somewhat smarter than humans will more probably have incentive to be nice to humans for the same reasons that humans have incentive to be nice to eachother.
- Estimation of human ability to to respond to various intelligence levels if an AI of that level turns out to be unfriendly.

## Thursday, April 02, 2009

The question of AI ethics is not merely a matter of estimating the probability of Really Bad Things; it is a matter of weighing the probability of Really Bad Things against the probability of Really Good Things.

This makes some of the stuff in the previous post less relevant (though not completely irrelevant). The major question becomes: in various scenarios, what is the ratio of good to bad?

When thinking about these issues, I find myself going back and forth between two concepts of the future:

- AIs increase in intelligence slowly. For a long time, AIs remain at human level or anyway human-understandable level. There is no miracle of recursive self-improvement; you put in what you get out.
- The first AI smart enough to begin recursive self-improvement quickly goes beyond the human-understandable level of intelligence, and has its way with the future.

I have argued already that the first outcome would be safer than the second; the opportunity for Really Good Things is less, but this (I would think) is outweighed by the far reduced probability of Reall Bad Things. (In scenario 1, sociopathic AIs would occur, but they would be of roughly human level and so could be contained; by the time AIs were too powerful, the art of creating ethical AIs would be fairly well perfected. (Also, even if an unethical AI was created, there would be a safety net of nearly-as-smart AIs to contain it.)

Of course, there isn't freedom to choose between the two outcomes, for the most part. Either recursive self-improvement (aka RSI) is a powerful possibility, or it isn't. To some extent, the outcome is dependent on whether the algorithmic research outpaces hardware developments, or vice versa (as discussed last time). The "race between software and hardware" mindset would have it that the best strategy for avoiding Really Bad Things is to make algorithmic progress as quickly as possible-- which favors open research, et cetera. But even if algorithms outpace hardware, there could still be an RSI tipping-point: suddenly the hardware is good enough to support RSI, and outcome 2 occurs. So this isn't well-justified. What is needed is a solid analysis of RSI.

Matt Mahony offers an argument that RSI is inherently slow. However, this model is limited by the (purposeful) restriction to strict self-improvement; in particular, the exclusion of learning. Since explosive RSI-like learning capabilities are essentially as dangerous, this is isn't a particularly helpful model. Shane Legg makes similar claims for learning. As it happens, I disagree with those claims:

3 days ago Matt Mahoney referenced a paper by Shane Legg, supposedly

formally proving this point:

http://www.vetta.org/documents/IDSIA-12-06-1.pdf

I read it, and must say that I disagree with the interpretations

provided for the theorems. Specifically, one conclusion is that

because programs of high Kolmogorov complexity are required if we want

to guarantee the ability to learn sequences of comparably high

Kolmogorov complexity, AI needs to be an experimental science. So,

Shane Legg is assuming that highly complex programs are difficult to

invent. But there is an easy counterexample to this, which also

addresses your above point:

Given is T, the amount of computation time the algorithm is given

between sensory-inputs. Sensory inputs can ideally be thought of as

coming in at the rate of 1 bit per T cpu cycles (fitting with the

framework in the paper, which has data come in 1 bit at a time),

although in practice it would probably come in batches. Each time

period T:

--add the new input to a memory of all data that's come in so far

--Treat the memory as the output of a computer program in some

specific language. Run the program backwards, inferring everything

that can be inferred about its structure. A zero or one can only be

printed by particular basic print statements. It is impossible to know

for certain where conditional statements are, where loops are, and so

on, but at least the space of possibilities is well defined (since we

know which programming language we've chosen). Every time a choice

like this occurs, we split the simulation, so that we will quickly be

running a very large number of programs backwards.

--Whenever we get a complete program from this process, we need to run

it forwards (again, simulating it in parallel with everything else

that is going on). We record what it predicts as the NEXT data, along

with the program's length (because we will be treating shorter

programs as better models, and trusting what they tell us more

strongly than we trust longer programs).

--Because there are so many things going on at once, this will run

VERY slowly; however, we will simply terminate the process at time T

and take the best prediction we have at that point. (If we hadn't

gotten any yet, let's just say we predict 0.)

A more sophisticated version of that alg was presented at the AGI

conference in this paper:

http://www.agiri.org/docs/ComputationalApproximation.pdf

The algorithm will be able to learn any program, if given enough time!

NOW, why did Shane Legg's paper say that such a thing was impossible?

Well, in the formalism of the paper, the above "algorithm" cheated: it

isn't an algorithm at all! Fun, huh?

The reason is because I parameterized it in terms of that number T.

So, technically, it is a class of algorithms; we get a specific

algorithm by choosing a T-value. If we choose a very large T-value,

the algorithm coulf be very "complex", in terms of Kolmogorov

complexity. However, it will not be complex to humans, since it will

just be another instance of the same general algorithm! In fact, it

would just be the same algorithm given more time to do its job.

So, on that grounds, I disagree with Shane Legg: it is still possible

to find algorithms analytically, despite the found algorithms being

"of high complexity". Or, to rephrase, there are simple algorithms of

high Kolmogorov complexity, and those algorithms do the job that can't

be done by algorithms of low Kolmogoriv complexity.

(As an aside, the next paragraph in my email uses a mathematically flawed argument: "Once we have the ordering, I can define the game as: create the lowest video sequence not definable in set theory." What I was trying to do was diagonalize set theory, to define a set-theoretically undefinable video sequence. The right way to do this is not to take the "lowest" sequence not defineable by set theory. Instead: put an ordering on set-theoretical definitions of sequences. For the Nth frame of our non-definable video sequence, look at the Nth set-theoretic video sequence; if the screen is all white in that sequence, take our sequence to be all black on that frame; otherwise, take it to be all white. :D)

Still, all I'm claiming is that we can construct an algorithm that will perform better if it is given more time. That provides a sort of counterexample, but not one that is explosive (until the point where an AI gets the ability to improve its own hardware).

[Edit: In the comments, Shane Legg points out that this doesn't really provide a counterexample. We can construct an algorithm that will do better given more time, but that does not mean that for every sequence we might want to learn, there is some amount of processing time that will allow the algorithm to converge correctly. In fact, that's false; there will be sequences that can't converge, for any fixed amount of time we allow the algorithm. Shane also corrects me on the conclusions he draws: he does not, in fact, conclude that "ai must be an experimental science".]

Eliezer makes a vivid argument for the possibility of explosively fast learning. Here is the key mathematical bit:

I occasionally run into people who say something like, "There's a theoretical limit on how much you can deduce about the outside world, given a finite amount of sensory data."

Yes. There is. The theoretical limit is that every time you see 1 additional bit, it cannot be expected to eliminate more than half of the remaining hypotheses (half the remaining probability mass, rather). And that a redundant message, cannot convey more information than the compressed version of itself. Nor can a bit convey any information about a quantity, with which it has correlation

exactly zero,across the probable worlds you imagine.But nothing I've depicted this human civilization doing, even

beginsto approach the theoretical limits set by the formalism of Solomonoff induction. It doesn't approach the picture you could get if you could search throughevery single computable hypothesis, weighted by their simplicity, and do Bayesian updates onallof them.

He does, however, admit reservations about the possibility of this power being manifested in the physical universe:

No one - not even a Bayesian superintelligence - will ever come remotely close to making efficient use of their sensory information...

...is what I would like to say, but I don't trust my ability to set limits on the abilities of Bayesian superintelligences.

(Though I'd bet money on it, if there were some way to judge the bet. Just not at very extreme odds.)

But, of course, an entity (say) halfway between humans and a Bayesian Superintelligence is still rather dangerous.