## Friday, August 21, 2009

Undefinability is a harsh reality.

Any solution that one offers is still vulnerable to the attack: full self-reference seems impossible. Regardless of how clever you think you've been, if you propose a solution, I can point to a structure that can't be defined within that system: the system itself. If that's not the case, then the system is inconsistent. (And even then, even if we allow inconsistent systems, I know of no system which could be said to fully describe itself.)

What this all suggests is that human beings (along with a broad range of sentient entities) are fundamentally incapable of articulating our own inner logic. Why? For the same reason that a powerset is larger than a set: if we could fully refer to ourselves, our logic would be "larger than itself" (loosely speaking).

It comes down to a problem of wanting to be able to form sentences that mean anything we might want. If we're fully capable, then we can construct a sentence that is true precisely when it is not true... and the system falls apart.

Kripke's fixed point theory offers a nice fix: with some specific machinery, we're able to call these sentences "undefined". But now we can't refer properly to this idea, "undefined". So we've got a complete theory of truth (one might say), but we're still stuck with regards to undefinability.

So, it looks like we've got to accept it: we can't find a mind (artificial or otherwise) that fulfills the imperative "Know Thyself". The self remains shrouded in mystery. What's a self-respecting AI engineer to do? Am I forced to always design minds with less power of logical reference than my own, because I could not comprehend a design that was truly up to human capability? Are artificial intelligences doomed to be fancy calculators, lacking "understanding" because they will always have a weaker logical structure?

First, no. That doesn't really follow. It's quite possible to use an algorithm that can in principle learn anything: evolution. For example, one could build an artificial mind that held an initially simple program within it, mutated the recently run areas of code when punishment occurred, and strengthened recently run code against mutation when rewarded. Or, a little more sophisticated, one could implement Schmidhuber's Success Story algorithm, which always and only keeps apparently beneficial mutations, is capable of learning what and when to mutate, can learn to delay reward, and has other desirable features. And yet again, we could try William Pearson's design, which sets up an artificial economy of agents which can co-evolve to produce the desired behavior. With these styles of approaches, there is no worry of fundamental limitation: such systems can learn the correct logic if it exists (it just might take quite a while!). The worry, rather, is that these approaches do not take full advantage of the data at hand. There is no guarantee that they will perform better given more processing power and memory, either. In short, they are not a model of rationality.
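
As a concrete (and heavily simplified) sketch of the first scheme, here is a toy version in Python. The environment, the table-of-actions representation of a "program", and the flip-instead-of-random mutation are all invented for illustration:

```python
import random

# A "program" is a lookup table from observations to actions. An entry that
# just acted is mutated after punishment and left alone (i.e. "strengthened")
# after reward. Real mutation would be random; flipping the responsible entry
# is a deterministic stand-in.

ACTIONS = [0, 1]

def run_episode(policy, env_step, steps=100):
    total = 0.0
    obs = 0
    for _ in range(steps):
        action = policy.setdefault(obs, random.choice(ACTIONS))
        obs_next, reward = env_step(obs, action)
        total += reward
        if reward < 0:
            policy[obs] = 1 - policy[obs]   # punishment: mutate recent code
        obs = obs_next
    return total

# Toy environment: action 1 is always rewarded, action 0 always punished.
def env_step(obs, action):
    return (obs + 1) % 10, 1.0 if action == 1 else -1.0

policy = {}
run_episode(policy, env_step)          # learning happens here
late = run_episode(policy, env_step)   # the adapted policy now earns full reward
assert late == 100.0
```

Each observation is visited repeatedly within an episode, so every punished entry gets flipped to the rewarded action and then stays put, illustrating the "strengthen against mutation" dynamic in miniature.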

This could be taken as an indication that studying logic and rationality is not directly relevant to AI, but I would not agree with such an argument. For one thing, it is possible to derive a model of rationality from such approaches. If they work, there is a reason. The techniques each essentially provide some way of evaluating how a particular program of behavior is doing, together with a technique of searching through the possible behaviors. One could consider the space of all possible programs that might have generated the behavior so far, rather than the single program that actually did. One then takes the best program from that space, or perhaps a weighted vote. Obviously there will be some details to fill in (which is to say that such models of rationality don't just follow directly from the evolutionary algorithms employed), but the general approach is clear... such a system would take an infinite amount of processing power to compute, so one would need to use approximations; the more computing power given, the closer the approximation could be. All the data at hand is now being used, because the system now has the ability to go back and re-think details of the past, asking if particular sensory patterns might have been clues warning of a punishment, et cetera.
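
The program-space idea can be sketched concretely. Here the "space of all possible programs" is cut down to a tiny finite space of boolean lookup tables (my invention for illustration), and prediction is a vote among the programs consistent with the history:

```python
from itertools import product

# The program space: all boolean functions on 3-bit inputs -- a toy stand-in
# for "all programs that might have generated the behavior so far".
inputs = list(product([0, 1], repeat=3))
program_space = [dict(zip(inputs, outs))
                 for outs in product([0, 1], repeat=len(inputs))]

# Observed behavior so far: (input, output) pairs.
history = [((0, 0, 1), 1), ((0, 1, 1), 1), ((1, 0, 0), 0)]

# Keep every program consistent with the history, not just the one that ran.
consistent = [p for p in program_space
              if all(p[x] == y for x, y in history)]

def vote(x):
    """Majority vote of all history-consistent programs on input x."""
    ones = sum(p[x] for p in consistent)
    return int(ones * 2 >= len(consistent))

assert vote((0, 0, 1)) == 1   # seen inputs: the vote reproduces the history
assert vote((1, 0, 0)) == 0
```

On unseen inputs this unweighted vote is uninformative because the toy space is unbiased; a real model of rationality would weight programs (e.g. by simplicity), and would need approximation since the true space is infinite, as the text says.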

So why not accept such models of rationality? I have two reasons... first, they are purely reinforcement-learning-based. Agents based on these models can be driven only by pleasure and pain. There is no ability to consider external, unobserved objects; everything consists of patterns of directly observed sensation. Second, even if one is OK with purely reward-based systems, it is not clear that these are optimal. The evaluation criteria for the programs are not totally clear. There needs to be some assumption that punishment and reward are associated with recently taken actions, and recently executed code, but it cannot be too strong... The Success Story approach looks at things in terms of modifying a basic policy, and a modification is held responsible for all reward and punishment after the point at which it is made. The simple mutation-based scheme I described instead would use some decaying recent-use marker to decide responsibility. William Pearson suggests dividing time up into large sections, and splitting up the total goodness of the section as the payment for the programs that were in charge for that time. Each of these will result in a different model of rationality.
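
A sketch of what the decaying recent-use marker might look like (the event format and decay rate are made up for illustration):

```python
# Each entry keeps an "eligibility trace" that decays over time; reward or
# punishment is apportioned in proportion to each entry's trace at that moment.

DECAY = 0.5

def assign_credit(events):
    """events: a time-ordered list of ('use', name) or ('reward', amount)."""
    trace = {}
    credit = {}
    for kind, val in events:
        for k in trace:              # decay all traces one step
            trace[k] *= DECAY
        if kind == 'use':
            trace[val] = trace.get(val, 0.0) + 1.0
        else:                        # reward (negative amount = punishment)
            total = sum(trace.values())
            for k, t in trace.items():
                credit[k] = credit.get(k, 0.0) + val * t / total
    return credit

credit = assign_credit([('use', 'a'), ('use', 'b'), ('reward', 1.0)])
# 'b' was used more recently than 'a', so it receives the larger share.
assert credit['b'] > credit['a']
```

The Success Story rule and Pearson's time-section payments would be two other, inequivalent, implementations of `assign_credit`, which is exactly why they yield different models of rationality.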

So, I desire an approach which contains explicit talk of an outside world, so that one can state goals in such language, and furthermore can apply utility theory to evaluate actions toward those goals in an optimal way. But, that takes me back to the problem: which explicit logic do I use? Am I doomed to only understand logics less powerful than my own internal logic, and hence, to create AI systems limited by such logics?

One way out which I'm currently thinking about is this: a system may be initially self-ignorant, but may learn more about itself over time. This idea came from the thought that if I was shown the correct logic, I could refer to its truth predicate as an external thing, and so appear to have greater logical power than it, without really causing a problem. Furthermore, it seems I could learn about it over time, perhaps eventually gaining more referential power.

In understanding one's own logic, one becomes more powerful, and again does not understand one's own logic. The "correct logic", then, could be imagined to be the (unreachable) result of an infinite amount of self-study. But can we properly refer to such a limit? If so, it seems we've got an interesting situation on our hands, since we'd be able to talk about the truth predicate of a language more referentially powerful than any other... Does the limit in fact exist?

I need to formalize this idea to evaluate it further.

## Thursday, June 18, 2009

Inspired by this blog.

I have mentioned most of these thoughts before, but I am writing them up in one post as a cohesive argument. I will argue that uncomputable models are important. I am not arguing that people think in ways that computers cannot, but rather that people and computers alike can benefit from using models that have "holes" which represent meaningful questions that can't be answered definitively by running a computation.

The world that we live in consists of a bunch of macro-objects. What are macro-objects? Roughly speaking, they are coherent structures of micro-objects, which recur in both time and space.

A macro-object "emerges" from the micro-dynamics. It is a structure which persists, propagating itself into the future, like a wave in water.

A spherical bubble is a macro-object, because slight warps in the sphere get evened out over time. (Until it pops.)

Similarly, a water droplet, a planet, and a star.

An atom is an emergent object (though not macro-level) because the positive and negative charges of the electrons and protons enter into a stable relationship.

A grasshopper is a stable entity because its metabolism maintains a homeostasis which allows it to live for a fair amount of time.

Similarly for all living organisms.

And, so on.

The key idea here is that all of these objects are in a convergent state: a state to which (within limits) the system always returns after wandering away.
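
A toy illustration of this sense of "convergent state": a damped system, perturbed away from its rest point, wanders back, like the bubble evening out (the dynamics are invented for illustration):

```python
# One step of a damped spring pulling x back toward a rest point.
def step(x, v, rest=0.0, k=0.5, damping=0.8):
    v = damping * (v - k * (x - rest))
    return x + v, v

x, v = 5.0, 0.0          # perturb the structure away from its stable state
for _ in range(200):
    x, v = step(x, v)
# After enough steps the state has returned (within limits) to rest.
assert abs(x) < 1e-3
```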

Now, to logical foundations. The halting problem is unsolvable; there is no computable algorithm which can tell you which computations do and do not halt. But, suppose we could run our computer an infinite amount of time to get the answer. A machine that can do this is called a hypercomputer. Then, we could solve the halting problem by waiting to see if the computation in question halted; so, we can use hypercomputations to answer many questions that regular computations cannot answer. However, we've got a new type of problem. Some computations will flip back and forth between answers infinitely often, so that when we run them an infinite amount of time, the output is indeterminate. The result of the hypercomputation is then undefined, or "nonconvergent".
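
The flip-flopping case can be made concrete by looking at a computation's output after more and more steps (a sketch only; a real convergence check would have to inspect all infinitely many steps, which is exactly what no computer can do):

```python
def settles(output_at, horizon=1000, tail=100):
    """Crude finite check: is the output constant over the last `tail` steps?"""
    outs = [output_at(t) for t in range(horizon - tail, horizon)]
    return all(o == outs[0] for o in outs)

converging = lambda t: 1 if t > 3 else 0   # output settles on 1
flipping   = lambda t: t % 2               # alternates forever: no limit

assert settles(converging)
assert not settles(flipping)
```

For `flipping`, no finite horizon helps: the "result after infinite time" simply has no defined value, which is the nonconvergence discussed above.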

Asking whether a hypercomputation converges is analogous to asking whether a normal computation halts. In a specific sense, it is twice as hard: if we solved the halting problem for normal computations, made a little magic box that gives the correct answer for halting questions, and connected that to an ordinary computer, then we would have a machine equivalent to a hypercomputer. Asking whether the programs of the new machine halt is actually equivalent to asking whether hypercomputations converge.

So, halting is uncomputable, but convergence is doubly so!

Yet, I have argued that convergence is all around us. On an everyday basis, we deal with convergent states as if they were solid entities. So, I argue, we are viewing the world through an uncomputable model.

Mathematically, one should expect reasoning about convergence to be quite hard (assuming, as I do, that human reasoning is computable). Everyday reasoning is not "quite hard" in this way. We mitigate the full force of the uncomputability of our models with many coping strategies; we mainly reason under the assumption of convergence (for structures that have converged in the past), rather than attempting to question this assumption. We have to learn when things converge and fail to converge. Yet, even so, using uncomputable models is easier than trying to apply computable models to the problem. Asking whether a structure is generally convergent is a very useful abbreviation, approximately summing up a lot of questions about the state at a given time.

Also, it is important to admit that the mathematically pure concept of convergence is not quite what we are interested in. In practical situations, we are interested in whether something is quickly convergent. This is not uncomputable; however, it can be more expensive to check than reasoning abstractly about convergence. So (and this is probably the weakest point in my argument) I think it is worthwhile to keep reasoning about the uncomputable models.
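
Quick convergence, by contrast, can be checked by brute force (a sketch; the step functions are toy examples):

```python
def quickly_convergent(step, state, budget=50):
    """Does iterating `step` reach a fixed point within `budget` steps?"""
    for _ in range(budget):
        nxt = step(state)
        if nxt == state:
            return True
        state = nxt
    return False

assert quickly_convergent(lambda n: max(n - 1, 0), 10)   # counts down, settles
assert not quickly_convergent(lambda n: n + 1, 0)        # never settles
```

The cost is the full simulation itself, which is the expense referred to above; the abstract question "is this generally convergent?" summarizes many such checks at once.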

Another interesting point is that, much of the time, while we have a fairly good concept of the emergent convergent objects we deal with day to day, we do not have such a good idea of the underlying dynamic. This means that, in practice, we do not ask too many convergence-hard questions. Often, we think we already have those answers, and instead we ask what sort of underlying structure might give rise to them.

PS--

I am being somewhat messy here, because "convergent" in the case of hypercomputation does not mean quite the same thing as "convergent" in the case of emergent objects. For one thing, convergence of entities has to do with correcting for disturbances from the external environment, while convergence for hypercomputations does not. I think this difference does not harm the argument. As I see it, emergent-convergence is a more general problem, having hypercomputation-convergence as a subproblem.

## Sunday, June 14, 2009

Typed lambda calculus is not Turing-complete.

There. I said it.

More specifically, simply typed lambda calculus is not Turing-complete, and neither are any variants that are both strongly normalizing and have decidable type-checking. This is because programs that the type-checker verifies are guaranteed to compute a result. If such a type-checker allowed a Turing-complete set of programs, it would be a solution to the halting problem!
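
To make this concrete, here is a miniature type inferencer for the simply typed lambda calculus (a sketch, with an invented term encoding). The self-application at the heart of non-terminating terms like omega fails the occurs check, which is one face of the guarantee that every well-typed term computes a result:

```python
class Var:
    """A unification variable standing for an unknown type."""
    def __init__(self): self.ref = None

def resolve(t):
    while isinstance(t, Var) and t.ref is not None:
        t = t.ref
    return t

def occurs(v, t):
    t = resolve(t)
    if t is v: return True
    return isinstance(t, tuple) and (occurs(v, t[0]) or occurs(v, t[1]))

def unify(a, b):
    a, b = resolve(a), resolve(b)
    if a is b: return
    if isinstance(a, Var):
        if occurs(a, b): raise TypeError("occurs check: infinite type")
        a.ref = b
    elif isinstance(b, Var):
        unify(b, a)
    else:                               # both arrow types: (arg, result)
        unify(a[0], b[0]); unify(a[1], b[1])

def infer(term, env):
    kind = term[0]
    if kind == 'var': return env[term[1]]
    if kind == 'lam':
        arg = Var()
        return (arg, infer(term[2], {**env, term[1]: arg}))
    if kind == 'app':
        f, x, r = infer(term[1], env), infer(term[2], env), Var()
        unify(f, (x, r))
        return r

identity = ('lam', 'x', ('var', 'x'))                    # \x. x : a -> a
self_app = ('lam', 'x', ('app', ('var', 'x'), ('var', 'x')))  # \x. x x

infer(identity, {})                     # succeeds
try:
    infer(self_app, {})
    typable = True
except TypeError:
    typable = False
assert typable is False                 # \x. x x has no simple type
```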

Really, I should have put two and two together earlier on this one. I suppose this is what comes of picking lambda calculus up by reading diverse things in diverse places rather than learning it from one authoritative source.

What this indicates for me is that, at least in many cases, the point of allowing more and more sophisticated type systems is to get closer to a Turing-complete system. That is why people add things like parametric polymorphism, dependent types, and kinds. When we add these to typed lambda calculus, it doesn't just get "more expressive" in the sense that a high-level programming language is more expressive than machine code; it is literally able to do things that a simpler type system could not.

This doesn't mean that strongly typed programming languages are not Turing-complete. Typically the type-checkers for these will not guarantee that the program contains no infinite loops. So, one must be careful to figure out exactly what one is dealing with.

## Wednesday, June 10, 2009

In this post, I will explain a way to create a theory of sets, given a theory of truth. These two foundational issues are typically treated as separate matters, so asking for a particular relationship to hold between a theory of truth and a set theory adds some interesting constraints to the situation.

(Part of this post is an edited version of an email I sent originally to Randall Holmes.)

Claim: A set is essentially a sentence with a hole in it. Set membership is characterized by the truth/falsehood of the sentences when we fill the holes.

The major justification for this way of understanding sets is the way we use the comprehension principle to talk about sets. The comprehension principle is what allows us to say, in a set theory, that if we can describe the membership requirements for a set then that set exists. For example, I can describe the requirement "it must be a prime number that is over 5 digits long", so the set of prime numbers over five digits long exists. (The comprehension principle with no restrictions leads to naive set theory, however, which is paradoxical.)

This view is not far from the way Frege explained sets, as I understand it. However, he distinguished the set as the extension of the sentence-with-hole; meaning, the things for which it is true.

So, suppose we've got a logic with enough machinery to represent computable functions, and furthermore we've got a description of the language in itself (ie, Godel-numbering-or-something-equivalent). Furthermore, we've got some theory of truth. Then, via the claim, we can already talk about sets, even though they haven't been purposefully introduced. In particular, "x is an element of y" is

"When "x" is used to fill in the partial sentence Y, the result is true"

where "x" is the name of the term x (Godel-numbering-or-equivalent, again, for those who are familiar with such things), and Y is the sentence-with-hole corresponding to the set y.

The truth predicate is needed here in order to assert the result of the substitution. With the naive theory of truth, it is always meaningful to apply the truth predicate to a sentence. So, the naive theory of truth gives us the naive theory of sets, in which set-membership is meaningful for any set we can describe. Of course, this is inconsistent under classical logic.

So, what I'm saying is: if the claim is accepted, then the set theory is pinned down completely by the theory of truth. The naive theory of truth gives us the naive theory of sets. A Tarski-hierarchy theory of truth gives us something vaguely resembling type theory. Kripke's theory of truth gives us a theory in which all sets exist, but not all membership evaluations are meaningful. In particular, Russell's set, "all sets that do not contain themselves", exists. We can meaningfully say that any set of integers is in Russell's set, and that the set of all sets (which exists) is not. The paradoxical situation, in which we ask if Russell's set is a member of itself, is simply meaningless.
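
A toy rendering of this picture (all names invented): sets as predicates, membership as hole-filling, and a depth bound as a crude stand-in for Kripke's least-fixed-point construction, with evaluations that never ground out reported as 'undefined' rather than true or false:

```python
def NOT(v):
    """Three-valued negation: undefined stays undefined."""
    return 'undefined' if v == 'undefined' else not v

def member(x, s, depth=100):
    """Fill the hole of sentence-with-hole s with x and evaluate."""
    if depth == 0:
        return 'undefined'    # evaluation never grounded out
    return s(x, depth - 1)

evens = lambda x, d: isinstance(x, int) and x % 2 == 0
russell = lambda x, d: NOT(member(x, x, d))   # sets not containing themselves

assert member(4, evens) is True
assert member(3, evens) is False
assert member(evens, russell) is True            # evens isn't in itself
assert member(russell, russell) == 'undefined'   # the paradoxical question
```

Asking whether an ordinary set is in Russell's set grounds out immediately; asking whether Russell's set is in itself chases its own tail forever, and comes back 'undefined', matching the text.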

So good so far. But, there is the issue of extensionality to deal with. The axiom of extensionality is taken as a very basic fact of set theory, one that not even nonstandard set theories consider rejecting. Given the above discussion, however, the axiom of extensionality would be false. Two different sentences-with-holes can be logically equivalent, and so have the same extension. For example, "_ is an even prime number" and "_ added to itself equals 4" are the same set, but they are different sentences.

My solution here is to interpret the notion of set-equality as being a notion of logical equivalence between sentences-with-holes, rather than one of syntactic equivalence. In other words, "x=y" for two sets x and y needs to be interpreted as saying that X and Y mutually imply each other given any slot-filler, rather than just as saying X=Y. But this is doable within the language, since all we need to do is quantify over the slot-filler-names.
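
The even-prime example can be run directly, with set equality checked as co-extensiveness over a (finite, toy) domain of slot-fillers rather than as syntactic identity:

```python
# Two different sentences-with-holes...
even_prime   = lambda n: n > 1 and n % 2 == 0 and all(n % d for d in range(2, n))
sums_to_four = lambda n: n + n == 4

domain = range(100)   # the slot-filler names we quantify over (toy domain)

def set_eq(p, q):
    """x = y as mutual implication for every slot-filler, per the text."""
    return all(bool(p(n)) == bool(q(n)) for n in domain)

assert even_prime is not sums_to_four      # ...but the same set:
assert set_eq(even_prime, sums_to_four)    # both pick out exactly {2}
```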

This can be thought of as my way of interpreting Frege's concept of the "extension" of a sentence-with-hole. Rather than being a separate entity, the extension is a "view" of the sentence: the sentence up-to-equivalence.

## Monday, April 27, 2009

One of the big rifts between the way things are described in the theoretical foundations of mathematics and the way things are actually done in real mathematics is that real mathematicians are constantly defining new objects to study. Perhaps someone sees a particular combination of properties pop up a few times, so they give that combination a name and start studying it directly. Perhaps someone decides a particular restriction of a given problem will be easier to solve, and so gives that a name and starts studying it. Perhaps someone wants to know how many X need to be combined to make Y, and they call the number the "X number" of Y, and start studying the properties of it as X and Y are varied. And so on.

In fact, even when studying the foundations of mathematics, such names are often snuck in as "unofficial notation" that is thought of as abbreviating the official notation. Unofficial notation makes axioms and other formulae easier to read.

Why do we do this? I can think of two explanations. First, one can explain this in the same way "named entities" are explained in other domains: it is compressive to name common entities. This means it takes up less room in memory, and also as a side benefit it usually makes the world easier to predict. This is an important explanation, but not the focus of this post. The second explanation is that naming things is a means of inference control.

When we name things, what we proceed to do is to determine some basic properties of the new entities. (This seems true in non-mathematical discourse as well, but perhaps not as clear-cut.) We come up with properties of the new entities, and look for ways of calculating those properties. We come up with theorems that hold for the entities. We look for existence and uniqueness proofs. And so on. What I'm arguing is that all of this activity is basically focused on helping later inference.

One particular phenomenon serves as a good illustration of this: the behavior of mathematicians when studying sequences of numbers that emerge from particular mathematical entities. The prime numbers; the number of groups of each finite size; the number of graphs of a given size; the Fibonacci sequence. The big thing to try to do with a sequence is to "find the pattern". What could this mean? If the sequence is mathematically well-defined, don't we know the pattern already? A mathematician will find the first few values of a sequence, and then study them looking for relationships. Often what is sought is a recursive definition of the sequence: a function that calculates the next number based on the direct previous number, or several of the previous numbers, or perhaps all of the previous numbers. If we've already got a recursive form of the sequence, we might try to find a "closed form" version. My argument here is that all of this behavior is explained by the desire to make later reasoning as expedient as possible. Finding the recursive form is not merely a sort of curiosity, like wondering if it is possible to keep a spider alive in a paper cup; rather, the recursive form is a faster way of calculating the sequence than the method that follows directly from the definition. Similarly, the closed form will usually be even faster.
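
The Fibonacci sequence illustrates the progression: the definition, the recursive form, and the closed form give the same numbers at sharply different costs (a sketch):

```python
from math import sqrt

def fib_naive(n):            # straight from the definition: exponential time
    return n if n < 2 else fib_naive(n - 1) + fib_naive(n - 2)

def fib_recursive(n):        # "next from the previous two": linear time
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def fib_closed(n):           # Binet's closed form: constant time (float-limited)
    phi = (1 + sqrt(5)) / 2
    return round(phi**n / sqrt(5))

assert fib_naive(10) == fib_recursive(10) == fib_closed(10) == 55
```

Each reformulation is exactly the kind of "found pattern" the text describes: the content is unchanged, but later reasoning about the sequence gets cheaper.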

So, what I'm saying essentially is that a reasoning system should (when it isn't busy doing other things) be looking for nice entities related to the task at hand, and nice quick ways of calculating stuff about those entities. What "nice entity" means is based on two (sometimes conflicting) motivations: the entity should be compressive (meaning it should help make descriptions take up less space), and it should be tractable (meaning reasoning about it should be quick).

How should the search for good ways of calculating be carried out? By the same methods as general inference. The system should be able to apply good reasoning methods that it finds back onto the task of looking for good reasoning methods. Of course, for this to work very well, the system needs to be able to have fairly good reasoning methods to begin with.

What form should the problem-solving methods that the system finds take?

Definitely they should be able to take on the form of typical algorithms: closed-form expressions, recursive expressions, and generally any effective procedure. Definitely these should be associated with the necessary criteria for application to an object (the criteria that guarantee correct results). Probably they would also be associated with provable upper bounds on runtime, so that the method chosen for a particular case (where multiple methods might apply) would be the one with the lowest time-estimate for that case. (Problems might arise for difficult-to-calculate time bounds; perhaps estimators would be required to be linear time.)
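
A sketch of the dispatch idea, with invented toy methods: each stored method carries an applicability test and a cheap runtime estimator, and for a given problem we pick the applicable method with the lowest estimate:

```python
methods = [
    # (name, applicable?, estimated cost, solver) -- all toy examples
    ("closed-form", lambda n: True, lambda n: 1, lambda n: n * (n + 1) // 2),
    ("loop",        lambda n: True, lambda n: n, lambda n: sum(range(n + 1))),
]

def solve(n):
    applicable = [m for m in methods if m[1](n)]
    name, _, _, run = min(applicable, key=lambda m: m[2](n))
    return name, run(n)

assert solve(100) == ("closed-form", 5050)
```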

Some extra features could probably improve upon this basic setup:

-Allow possibly-non-terminating algorithms. This would make "search for a proof" count among the methods (as the last resort), which strikes me as elegant.

-Allow learned expected time bounds

-Allow "soft" problem-solving strategies (of possibly various sorts) to be learned; ie, heuristics

Last time, I wrote about the apparent limitations of logical systems arising purely from a formal axiomatization of the Tarski hierarchy. I said:

> So, if this gets us a lot of math, what is missing?
>
> The obvious answer is "a truth predicate for that system". This doesn't lend much insight, though.

I should have thought before I spoke. A more powerful language cannot be constructed by adding a truth predicate. The semantics of the language is that it should be referring to the Tarski hierarchy! A truth predicate over such a language would need to be a truth predicate over the entire Tarski hierarchy. Such a truth predicate would apparently correspond to an ordinal above all ordinals. Not good!

As a side note: I've been looking at Quine's "New Foundations", and in that system, it is possible to talk about the ordering of all ordinals. Surprisingly, this ordinal does not cause problems. So, maybe in New Foundations the above move would be OK. With a naive view of the ordinals, though, it is not.

Keeping the naive view, it seems like I should deny the possibility of enriching the language by adding a truth predicate. Does that mean that I should say that the language is maximally rich? I don't think so. I suspect the situation is more like what happens with Kripke's language that contains its own truth predicate: the language can be expanded in other, new directions.

[edit- some of the text was garbled as posted. I may have lost a paragraph or so.]

## Saturday, April 25, 2009

(Analysis of the system proposed in this post)

The idea of creating axiom systems describing Tarski's hierarchy in terms of ordinals, and ordinals in terms of Tarski's hierarchy, was discussed previously. It seems that the circularity would help prove the existence of lots of ordinals (and lots of Tarskian levels), hopefully without leading to paradox. So, if this gets us a lot of math, what is missing?

The obvious answer is "a truth predicate for that system". This doesn't lend much insight, though.

My bet is that we don't get anything uncountable by such a method. Where would it come from? We're constructing truth-predicates for countable languages, and (initially) countable sets of countable languages. Sure, the set of all countable ordinals is uncountable. But there is no reason to believe that we get the set of all countable ordinals!

My feeling is that if I studied more about the hyperarithmetical hierarchy, there would be an obvious mapping between some portion of it and the system(s) under consideration.

In some ways, the notion of "uncountable" seems to come from nowhere. All the formal constructions for "real number" and related entities seem to rely at some point on some other uncountable set. It's a bit like the (obviously illicit) method of defining a really big ordinal: "The ordering of all ordinals that can be defined without reference to this ordinal".

Yet I do feel that the notion is meaningful! So, where might it come from?

I have several ideas, none of them entirely convincing.

One idea is that uncountable sets can be motivated by considering the space of possible sequences of input from the external environment, assuming a countably infinite amount of time. I've seen similar ideas elsewhere. One might counter that all structures die eventually, so all sequences of input in real-world situations are finite; this makes the set into a mere countable infinity. On the other hand, one might say that this is merely the case in practice; the idea of an infinitely long-lived structure is still sound, and even physically possible (if not probable). But even so, I don't feel like this is a really complete justification.

Another attempt would be to claim that we need to allow for the possibility of infinitely long statements, despite not being able to actually make and manipulate such statements. (Universal statements and such may abbreviate infinitely many claims, but they are not literally infinitely long.) This idea is motivated by the following consideration: a nice way of getting a set theory from a non-set-theoretic foundational logic is to think of a set as a statement with an unfilled slot into which entity-names can be put to make complete statements. Claiming that a set contains an element is thought of as claiming that the statement is true of that object. At first, this might seem to fully justify naive set theory: a set exists for each statement-with-hole. But, this can only work if the theory contains its own truth predicate, so that we can make arbitrary talk about whether a statement is true when we fill a slot with a given element. The amount of set theory that gets captured by this method depends basically on the extent to which the theory is capable of self-reference; the naive theory of truth corresponds to naive set theory.

This is interesting by itself, but the point I'm making here is that if we want to have an uncountable number of sets (for example if we believe in the uncountable powerset of the real numbers), then we'll want a logic that acts as if infinite statements exist. What this means is an interesting question; we can't actually use these infinite statements, so what's the difference with the logic?

One difference is that I don't automatically have a way of interpreting talk about turing machines and numbers and other equivalent things anymore. I was justifying this referential power via talk about statements: they provide an immediate example of such entities. If we posit infinite statements that are "out there", somewhere in our set of sentences, we lose that quick way of grounding the idea of "finite". (This could be taken to show that such a method of grounding is not very real in the first place. :P)

Semantically, then, we think of the base-language as an infinitary logic, rather than regular first-order logic. The language formed by adding the first truth predicate is then thought of as already containing talk about uncountable sets. (This changes the axioms that we'd be justified in using to manipulate the truth predicate.)

All in all, I think this direction is mathematically interesting, but I'm not sure that it is really a route to justify uncountables.

A relevant question is: why do mathematicians think that uncountables exist? The proof is given by taking the powerset of some countably infinite set, which is defined as the set of all subsets of the countably infinite set. It's then shown that no function exists that maps the powerset onto the countable set. This can be done even in systems that do not really have any uncountable sets: the set of subsets of a countably infinite set will map onto the original set, but not by a function within the system. So from inside the system, it will look as if there are uncountables.
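
The diagonal argument behind that proof can be run on a finite toy universe (the attempted enumeration is invented for illustration):

```python
universe = range(8)                      # finite toy stand-in for N

# An attempted "enumeration" of subsets: the n-th subset, as a predicate.
enumeration = [lambda k, n=n: k % (n + 1) == 0 for n in universe]

# Diagonal set: n is a member exactly when n is NOT in the n-th subset.
diagonal = {n for n in universe if not enumeration[n](n)}

# The diagonal differs from every listed subset at its own index, so the
# enumeration misses it -- whatever enumeration was attempted.
for n in universe:
    assert (n in diagonal) != enumeration[n](n)
```

In a countable model of set theory this diagonal construction still goes through internally, which is why the system sees "uncountables" even when, from outside, everything is countable.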

If this explains what mathematicians are doing, then mathematicians are being fooled into thinking there are real uncountables... but how can I say that? I'm just being fooled into thinking that there is a difference, a "real" and "not real", right?

I think it is plausible that this weirdness would go away, or at least change significantly, in a logic that resolves other foundational problems. We might have a much better story for why the function that would map the uncountable set onto the countable set doesn't exist, so that it becomes implausible to claim that it exists from the outside but not from the inside. (But would that make the uncountables "real"?)

(Originally posted to this logic mailing list I'm trying to get started. All are welcome! Except if you know nothing about logic and aren't willing to learn. Then you're not welcome. Sorry.)

An interpretation is a translation of a statement from one logic into

another. I was looking at interpretations recently because they appear

to be a way to show that semantically stronger logics (specifically,

logics formed by adding a truth predicate) really are stronger in

practice. The stronger logic can interpret the weaker, but the weaker

cannot interpret the stronger. (This is one type of proof-theoretic

strength.)

Normally, one does not allow just *any* interpretation... for example,

we wouldn't want a universal quantifier in the first logic to be

interpreted as an existential one in the new logic. It is typical (so

far as I have seen) to require the logical connectives to remain

unchanged, and only allow minimal changes to the quantifiers (such as

translating them to be restricted quantifiers).

Yet, all we care about is structure in these interpretations. For example, if we're interpreting talk about numbers into a logic that talks about sets, we don't really care if the resulting translated sentences don't have anything to do with sizes of sets; all we're supposed to worry about is relationships. For example, our new translated notion of "addition" should tell us that "5" and "6" make "11". (Usually when people do construct interpretations, of course, they try to find ones that make some intuitive sense.)

So if all we care about is structure, why constrain the way logical connectives and quantifiers are translated? What happens if we get rid of these constraints?

Well... we can't get rid of *all* the constraints: we still need to capture what we mean by "interpretation". Not every function from one language to the other counts! The way I see it, we want to keep the following minimal constraint:

C1: If A |- B in L1, then f(A) |- f(B) in L2

where f(X) is the translation function, "X |- Y" means Y can be proven from X, L1 is the language being translated, L2 is the language being translated into, and A and B are statements in L1.
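When the provability relations are finite, C1 (and the stronger C2 below) can be checked mechanically. A toy sketch; the example "logics", the function names, and the encoding of provability as sets of (premise, conclusion) pairs are all invented for illustration:

```python
# Toy check of the constraints.  Provability relations are finite sets of
# (premise, conclusion) pairs; a translation f is a dict from L1-statements
# to L2-statements.  The example "logics" below are made up.

def satisfies_c1(provable_L1, provable_L2, f):
    """C1: if A |- B in L1, then f(A) |- f(B) in L2."""
    return all((f[a], f[b]) in provable_L2 for (a, b) in provable_L1)

def satisfies_c2(statements_L1, provable_L1, provable_L2, f):
    """C2: A |- B in L1 if and only if f(A) |- f(B) in L2."""
    return all(((a, b) in provable_L1) == ((f[a], f[b]) in provable_L2)
               for a in statements_L1 for b in statements_L1)

provable_L1 = {("P&Q", "P"), ("P&Q", "Q")}
f = {"P&Q": "S1", "P": "S2", "Q": "S3"}
provable_L2 = {("S1", "S2"), ("S1", "S3"), ("S2", "S2")}  # one extra L2 fact

assert satisfies_c1(provable_L1, provable_L2, f)                         # C1 holds
assert not satisfies_c2({"P&Q", "P", "Q"}, provable_L1, provable_L2, f)  # C2 fails
```

The example shows the asymmetry: L2 proves an extra fact about the translated statements, which C1 tolerates and C2 forbids.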

I've also thought of the following:

C2: A |- B in L1 if and only if f(A) |- f(B) in L2

But that is not much like the standard notion of interpretation. Essentially it means that the interpretation into L2 adds no extra knowledge about the entities in L1. But if we're looking for strong languages, extra knowledge is a good thing (as long as it is true knowledge). (One could argue that this justification doesn't apply if we're only worrying about structure, though. Specifically, if when we say "structure" we mean not just structures of what's provable but also structures of what isn't, we should take C2 as the proper constraint. I'll proceed with C1 since it is the more standard-looking constraint.)

Anyway. What now happens to the earlier assertion, that semantically stronger languages are also proof-theoretically stronger, because they can interpret more logics?

Answer: plain first-order logic can interpret any logic L with a computable notion of provability.

Proof: Arbitrarily declare that some of the function symbols represent the operations necessary to build up statements in L (for example, one function might be chosen for each character in the alphabet of L, so that composing those functions in a particular sequence would represent writing down characters in that sequence). For an L-statement X, call the first-order version of X built just from these function symbols h(X). Write down some statements describing L-provability. Writing down the rules of L like this is possible because first-order logic is Turing-complete. Take the conjunction of the statements about L and call it R (for "rules"). The interpretation of an L-statement X will simply be R->h(X), where "->" is material implication. This will satisfy C1 because wherever L proves A from B, first-order logic will prove that if the rules of L-provability hold, then A is L-provable from B. (This proof is still pretty messy, but nonetheless I'm guessing I'm going into more detail here than needed, so I'll leave it messy for now.)
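A sketch of the encoding in the proof may help: each character of an L-statement becomes a unary function symbol, so composing them spells the statement out as a first-order term, and the translation wraps that term in the implication R -> ... . All names here (`h`, `f`, `Provable`, `c_A`, `end`) are illustrative choices, not fixed by the proof:

```python
# Each character of an L-statement becomes a unary function symbol, so the
# statement "Av" is represented by the first-order term c_A(c_v(end)).
# The translation of a statement X is then the implication R -> Provable(h(X)).

def h(statement):
    """Build the first-order term naming an L-statement, character by character."""
    term = "end"
    for ch in reversed(statement):
        term = f"c_{ch}({term})"
    return term

def f(statement, rules="R"):
    """Interpretation of an L-statement X as the material implication R -> h(X)."""
    return f"({rules} -> Provable({h(statement)}))"

assert h("Av") == "c_A(c_v(end))"
assert f("A") == "(R -> Provable(c_A(end)))"
```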

What does this mean? To me this means that proof-theoretic strength is not a very justifiable way of enticing people over to the side of logics with strong semantics. First-order logic is in some sense proof-theoretically all-powerful (if we allow arbitrary interpretations, and if we're dealing with computable provability). If someone is set in their path of using just first-order logic, I cannot convince them just by talking about first-order truth; first-order logic doesn't have a strong enough semantics to talk about first-order truth, but they can interpret what I am saying by just listening to the computable inference rules of my more-powerful logic. Every time I say X, they can interpret me as meaning "The rules I've laid out imply X". They will then be able to assert that first-order reasoning can justify all the reasoning I'm doing, without even needing any new axioms.

I might protest that I am asserting X itself, not merely asserting that X follows from the rules I've laid out... but if they're *really* applying the interpretation properly, they'll tell me that I'm making a meaningless distinction, since they'll think R->(R->h(X)) is the same as R->h(X). (If they make this reasoning explicit, though, I have them: I can assert that I'm not saying R->h(X), I'm saying h(X).)

## Friday, April 17, 2009

Continues this post.

I think now that it is relatively straightforward to establish a correspondence between the Tarski hierarchy of truth and my hierarchy of nonsense.

Basically, the two hierarchies diverge thanks to two different notions of the correct way to add a "truth" predicate to a base language. The Tarski hierarchy adds a metalanguage that only talks about truth in the base language. The nonsense hierarchy instead prefers Kripke's method, in which we construct a metalanguage that contains its own truth predicate. Both approaches can be thought of as constructing a metalanguage on top of a base language, but the Tarski hierarchy keeps doing so, resulting in a hierarchy of truth, whereas the Kripke fixed-point construction cannot be iterated: doing so adds nothing more. To continue upwards in the Kripke construction, we proceed in a different direction, adding nonsense predicates.

When we use the Kripke truth construction, we can clearly interpret the first Tarski iteration: all the truth-sentences that talk only about truth of base-language statements will be there, provided we have enough machinery to interpret the restricted quantifiers. (Details here will depend on the exact construction.) The semantics assigns them the same truth values; none of these sentences will come up undefined. (I'm talking about semantic interpretations, not proof-theoretic ones... again, details need to be worked out.) The second iteration of Tarskian truth will similarly be inside the Kripke construction; since the first iteration gets definite truth values, the second does. So it goes for as long as the Kripke construction can interpret the restricted quantifiers; that is, for as long as the characteristics of a particular level of the Tarski hierarchy are definable given the tools that the Kripke construction has at its disposal. For example, if these tools can only define computable structures, I'd suppose that the Kripke construction would interpret the portions of the Tarski hierarchy corresponding to the computable ordinals. (That's just a guess. More research required!)

In any case, given a particular amount of expressiveness in the base language, the Kripke construction will add a definite amount of expressiveness, corresponding to climbing a particular number of Tarski-hierarchy steps. (Probably this is known; I imagine someone has researched the semantic expressiveness of the Kripke least-fixed-point...) So what happens when we add in more nonsense predicates? Well, adding in nonsense predicates basically allows us to climb that same number of levels again; each nonsense predicate plays the role of allowing us to talk fully about the semantic structure of the construction-so-far (the role that the truth predicate plays in the Tarski hierarchy). This can be thought of as adding that amount of structure to the base language. Then, the Kripke truth construction can do its work on that increased amount of structure. So, we jump up the same number of steps on the Tarski hierarchy for every nonsense predicate added.

Eventually, since the amount of structure added by the truth predicate is always fixed, the scene will be dominated by the hierarchical structure added by the nonsense predicates. Still, it seems clear that each level will correspond in a definite way to a level on the Tarski hierarchy. The nonsense hierarchy merely forces one to make larger jumps at a time.

## Friday, April 10, 2009

What are we doing when we list infinities? Particularly, I take ordinal infinities as my example.

The most natural ordinals, which people will tend to stay within if you ask them to invent bigger and bigger ordinals and give them the basic definitions, are the recursive ordinals. In fact, people will tend to stay within what I'd call the "primitive recursive" ordinals:

infinity, infinity + 1, infinity + 2, .... infinity + infinity (= 2*infinity), 3*infinity, 4*infinity, ... infinity*infinity (= infinity^2), infinity^3... infinity^infinity (=infinity^^2), infinity^(infinity^infinity) (=infinity^^3), infinity^(infinity^(infinity^infinity)) (=infinity^^4) ...... infinity^^infinity ..... infinity^^^infinity ..... infinity^^^^infinity .....

The above uses up-arrow notation.
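For finite arguments, the up-arrow operations are directly computable; a minimal sketch (the function name `hyper` is mine):

```python
def hyper(a, n, b):
    """Knuth's up-arrow notation: a (n arrows) b, for positive integers.
    n=1 is exponentiation, n=2 is tetration (a^^b), n=3 is a^^^b, etc."""
    if n == 1:
        return a ** b
    if b == 0:
        return 1
    # a (n arrows) b  =  a (n-1 arrows) (a (n arrows) (b-1))
    return hyper(a, n - 1, hyper(a, n, b - 1))

assert hyper(2, 1, 3) == 8      # 2^3
assert hyper(2, 2, 3) == 16     # 2^^3 = 2^(2^2)
assert hyper(2, 3, 3) == 65536  # 2^^^3 = 2^^(2^^2) = 2^^4
```

Replacing b with "infinity" in each of these operations yields the corresponding limit ordinals in the list above.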

Enumerating computable ordinals involves finding halting computations, such as the numerical operations used above. Sticking infinity into a computable function basically stands for the infinite series produced by sticking finite numbers in that same location. So just "infinity" stands for:

1, 2, 3, 4, 5, ....

We can define + as the concatenation of two different orderings, which leaves the internals of the orderings untouched but specifies that all elements from the second will be larger than any elements from the first. So, "infinity + 1" stands for:

1, 2, 3, 4, .... A.

Here, "A" is a pseudo-number larger than all the finite numbers. (In other words, A is the first infinite number.)

"infinity + infinity":

1, 2, 3, ... A, A+1, A+2, A+3 ...

Here, there are essentially two infinite series, the one that starts with 1, and the one that starts with A.

"infinity + infinity + infinity":

1, 2, 3, ... A, A+1, A+2, ... B, B+1, B+2 ...

Now there are 3 series: the ones starting with 1, A, and B.

"infinity * infinity":

1, 2, 3, ... A, A+1, A+2 ... B, B+1 ... C, C+1 ... D, D+1 ... ...

Now we have an infinite number of series.
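These orderings can be represented concretely: an element of "infinity * infinity" is a pair (block, position), where block 0 is the series 1, 2, 3, ..., block 1 is the series A, A+1, A+2, ..., and so on, and the ordering described above is exactly lexicographic comparison on the pairs. A small sketch (the encoding is mine):

```python
# Elements of the ordering "infinity * infinity" coded as pairs (block, n):
# block 0 is 1, 2, 3, ...; block 1 is A, A+1, A+2, ...; block 2 is B, B+1, ...
# Lexicographic comparison on the pairs is exactly the ordering in the post.

def precedes(x, y):
    """x, y are (block, n) pairs; True iff x comes earlier in the ordering."""
    return x < y  # Python compares tuples lexicographically

assert precedes((0, 1000000), (1, 0))   # every finite number comes before A
assert precedes((1, 2), (2, 0))         # A+2 comes before B
assert not precedes((1, 0), (0, 5))     # A does not come before any finite number
```

"infinity + 1" is the special case with just blocks 0 and 1, where block 1 has the single element A.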

And so on. We can imagine each computable function as a method of specifying an infinite tree. The tree is "computable" because we can generate any finite portion of the tree using the computable function. Approximately speaking, the more tangled and structured and multifarious the branching of the tree, the larger the ordinal. But if we try just generating computations at random, then they may not halt; so we've got to try to put in as much tangledness as we can without putting in so much tangledness that things become ill-defined. This can potentially draw on any knowledge we've got about the halting problem.

If this were all there was to enumerating infinities, then I would say that something like probabilistic mathematical truth explains what it is we're doing. However, mathematicians (especially logicians) talk about ordinals that are not computable. These include the first uncomputable ordinal, the Church-Kleene ordinal (usually discussed via Kleene's O, a system of notations for the computable ordinals), which is just the ordering on all computable ordinals; many higher countable ordinals; the first uncountable ordinal, which is of course just the ordering of all countable ordinals; and so on.

To make the initial jump outside of the recursive ordinals, to Kleene's O, we need to make an interesting sort of move: we admit our ignorance. We give up the hope of being able to enumerate every ordinal on the way, and make do with saying "if the function halts, then the ordinal is well-defined". Since we will never have all the information about which functions halt, we'll always be somewhat ignorant of which of these ordinals are well-defined. Yet, we look at their ordering as a mathematical entity, and start talking about it.

This gives the impression that we'll need to give up more and more if we want to climb higher in the ordinals. But how much can we give up before we've given everything up?

I propose that, in general, the process of listing ordinals is a process of deciding which things are well-defined. If we give that up, we've given up too much.

Here, "well-defined" means "having a description on some level of Tarski's hierarchy of truth", or alternatively, "having a description on some level of my hierarchy of nonsense".

Of course, this is circular. Those hierarchies are both defined in terms of ordinals, so defining ordinals in terms of them appears unproductive. However, it is not completely unproductive. Let's take Tarski's hierarchy as the working example. Let 0 represent first-order logic, 1 represent the theory of truth for that (specifically, the theory with enough axioms to be equivalent in strength to Peano Arithmetic), 2 represent the theory of truth-in-1, and so on.

The thing I want to note here is that the ordinal assigned to a level in the hierarchy is far lower than the largest ordinals whose existence can be proven in that theory. Suppose I lay down axioms for Tarski's hierarchy in terms of ordinals, and then lay down axioms for ordinals which require definability in Tarski's hierarchy. It seems that I'll get a large number of ordinals in this manner. If I start out believing that the ordinal 1 is well-defined, then I'll believe all the ordinals proven well-defined by Peano arithmetic are well-defined. That is a rather large number of ordinals. Since I believe in them, I'll believe in all the levels of the Tarski hierarchy corresponding to them... lots of levels! This gives me many more ordinals to believe in, which gives me many more levels to believe in, and so on.

Of course, this stops somewhere (in the same sense that counting up stops somewhere...). It will only imply the existence of so many ordinals, assuming that it is a consistent theory. Furthermore, if it is consistent, then I can name an ordinal that it does not imply: the ordering of all the ordinals the system talks about. Let's call this the "outside ordinal" for the system. (This is a bit trickier to specify than it looks at first, however. We can't just say "the ordering of all ordinals the system will consider well-defined". The system will have gaps in its knowledge. For example, it will prove a bunch of recursive ordinals to be well-defined, but not all of them; it then jumps to Kleene's O, because it can talk about the set of well-defined recursive ordinals. Even more clearly: the system might be able to prove that the first uncountable ordinal is well-defined, but it will not be able to prove that all ordinals below this are well-defined... there are uncountably many of them!)

The main precaution that must be taken is to prevent the system from taking "the ordering over all ordinals" to be an ordinal. This is like me saying "Consider the set of all well-defined ordinals. Consider the ordering on these as an ordinal, Q. Take Q + 1..." This is not allowed!

OK. Given that, let's think about what happens when we add probabilistic justification to the system. We can think of the system as (eventually) knowing the truth about the halting problem (for any particular instance). This means that it is (eventually) correct in its judgements about well-definedness for computable ordinals. Thanks to the feedback effect of the system, this will mean that it is (eventually) correct in its judgements concerning a whole lot of ordinals. All of them? No: just as there are ordinal notations for which the halting problem determines well-definedness, there are ordinal notations for which the convergence problem determines well-definedness. (For a definition of the convergence problem, see here.) Still, this is an interesting class of ordinals.

So how could one go even further? Well, perhaps we could consider the "outside ordinal" for the version of the system that knows all the halting truths...

## Wednesday, April 08, 2009

Continues Risk Estimate.

I am generally disappointed by the lack of detailed tracking of the progress of computing power... I do not know where to find projections based on large amounts of up-to-date data. The most often cited projection is Kurzweil's, based on data from up to 2000; for example, here. I found one that may be more recent, but it does not show data points, just a curve. These curves are based on what $1000 gets you. It is also interesting to note the supercomputing curve.

Moore himself believes that the exponential trend will stop in 10 to 15 years, but he does not take into account 3d chips (by his own admission) or other possible ways around the halting spot. I think it is safe to allow a few more years based just on that... but since the halting spot is a point of contention, for now I'll just give the analysis based on the raw exponential curve.

By the curve, computers reach human-level processing power between 2020 and 2030. If we leave the curve at exponential, rather than going super-exponential as Kurzweil suggests we should, the rate looks to be around 5 orders of magnitude for every 10 to 20 years. (I know that is a big range, but I don't have much data to go on.) So if a computer is 1 person in 2025, it is 10 people in 2040, and 100 people in 2055.
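As a sanity check, the 2025/2040/2055 figures correspond to one order of magnitude in human-equivalents every 15 years; a minimal sketch of that extrapolation (the function name and parameterization are mine, not from the post):

```python
# Extrapolation implied by the 2025/2040/2055 figures: 1 human-equivalent
# in 2025, growing tenfold every 15 years.

def human_equivalents(year, base_year=2025, factor=10, period=15):
    return factor ** ((year - base_year) / period)

assert abs(human_equivalents(2040) - 10) < 1e-9
assert abs(human_equivalents(2055) - 100) < 1e-9
```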

What does it mean for a computer to be as smart as 10 or 100 people? A first approximation would be to say that it would be able to accomplish as much intellectual activity as a group of 10 or 100 people who were completely unified in their aim, and could communicate with each other continuously. But even this estimate is low, because there is a large amount of redundancy in 10 such people. It is hard to estimate how much, but roughly we could say that only 1 visual cortex is needed, and the other 9 people could be using that processing power for something else; only 1 motor cortex is needed, so the other 9 could be using that processing power for something else; and so on. This might (roughly) double the amount of thinking power, after the first person (who needs their motor and visual cortex intact).

So how safe is this? I'd say as soon as we get roughly human level we've got a significant risk of the AI deciding it would be awesome to "get loose" and use spare computer power stolen from the internet. My estimate could be greatly improved here, but there are about 7 billion people in the world, and more will be here by 2020, so assuming 1/8th have computers on the internet (that is where the estimate is shaky) we're talking a 1-billion-fold increase in computing power as soon as an AI is able to "get loose". That assumes that the 1 billion computers on the net are roughly equivalent in power to the 1 that the AI started on (in line with the idea that we're estimating based on what $1000 of computing power is). By my earlier estimate, an AI as smart as 1 person becomes as smart as 2 billion. But once we go distributed, the advantages I was talking about go away; the massive redundancy becomes necessary. So, back to 1 billion. (The idea that "smartness" merely doubles when we get rid of the inefficiencies of distributed existence is blatantly silly for the case of billions of people... oh well.)

So, could the world defend itself against the intellectual effort of 1 billion virtual people? We've got 8 billion on our side (by that point)... plus we start out with a good-sized advantage in terms of control of resources. How much could 1 billion hackers acquire on short notice?

For now, I'll give that a 50% chance of extinction assuming bad intent on the AI's part, and assuming it is able to get on the internet, and assuming it is able to create a virus of some kind (or use some other scheme) to get a fair chunk of the internet's computing power. I'll give 50% probability to those other two as well... making a 12.5% probability of extreme badness given evil intent. So the question is, how probable is bad intent in this scenario?

By the way, this estimate puts current computers at around the level of a mouse. Do the best current AIs achieve this? Seems doubtful, I'd say. Mice are pretty smart. They accomplish a fair amount of visual recognition, and furthermore, they are able to put it to good use. (The second part is what we have the least ability to model, I think... goal-oriented systems that can flexibly use highly structured sensory info.) So, by the model so far, AI progress will more probably be sudden than gradual... someone will put together an algorithm capable of taking full advantage of the hardware, and things will change.

I'd say that might happen anywhere between 5 and 20 years from now. The outcomes if it happens in 5 years are very different from those if it happens in 20. If it happens in 5 years, I'd say good results are fairly certain. 10 years, and there is more concern. 20 years and Very Bad Things have fair chances, maybe as high as my 10% "halt everything" level.

Let's take that arbitrary statement and mathematize it... 5 to 10 years = .1% chance of Really Bad Things, 10 to 15 = 1%, 15 to 20 = 10%.

Giving each option a 1/3 probability, we have around 3.7%.
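The arithmetic in the last few paragraphs, written out explicitly (the variable names are mine; the numbers are the post's rough guesses):

```python
# The chained estimate: P(extreme badness | evil intent) is the product of
# three independent 50% guesses, and the timing risk is an equal-weight
# mixture of the three scenario probabilities.

p_defense_fails = 0.5   # world fails to contain the AI
p_gets_online   = 0.5   # AI gets onto the internet
p_grabs_compute = 0.5   # AI captures a fair chunk of net computing power
p_bad_given_evil = p_defense_fails * p_gets_online * p_grabs_compute

timing_risk = (0.001 + 0.01 + 0.10) / 3   # the 5-10 / 10-15 / 15-20 year cases

assert p_bad_given_evil == 0.125          # the 12.5% figure
assert abs(timing_risk - 0.037) < 1e-6    # "around 3.7%"
```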

But, given my assumptions, the best way to reduce risk appears to be trying to find a good algorithm quickly (favoring open research). At some point between 2015 and 2020, the risk goes beyond 1% (which I arbitrarily label as the point of "real concern") and the strategy should turn towards making sure the first good-enough algorithm is also endowed with a safe goal system.

It should be obvious that this estimate is an extremely rough one. More later?

---[edit]---

One of the most obvious factors not considered here is the chance that the brain is badly-designed enough that a human level AI could be run on current PCs. The probability of this is nonzero, but if the brain were really so inefficient (= 5 orders of magnitude of inefficiency), I would expect that human AI programmers would already be outperforming it. The idea that current AIs are not as smart as mice despite having roughly as much hardware suggests that brains are fairly well-engineered. (The statement "roughly as much hardware" here needs to be made more specific. However, as long as it is inside 5 orders of magnitude, the argument makes some sense.)

The picture I've been using mostly for the idea of knowable truths goes something like this:

- Quantifier-free statements are computable: we can tell if they are true or false.
- Statements that can be reduced to a normal form containing only existential quantifiers are computably verifiable (if they are true, we can know this by eventually finding an example) but only probabilistically falsifiable (if they are false, we'll never know it for sure, but we can suspect it because we've tried for a long time to find an example and failed).
- Statements similarly containing only universal quantifiers are falsifiable but only probabilistically verifiable.
- Statements whose normal form universally asserts an existential statement are probabilistically falsifiable; statements asserting the existence of something satisfying a universal are probabilistically verifiable.

Nothing else appears to be knowable in any significant sense.
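The verifiable/falsifiable distinction can be made concrete with bounded search, where the bound stands in for "how long we've tried so far". A sketch; `verify_exists` and `falsify_forall` are my names:

```python
# An existential claim is verifiable: a witness settles it for good.
# A universal claim is only falsifiable: a counterexample settles it,
# while exhausting the search bound settles nothing (the claim stays open,
# merely gaining probabilistic support).

def verify_exists(pred, bound):
    """Return a witness n < bound with pred(n), or None (claim still open)."""
    return next((n for n in range(bound) if pred(n)), None)

def falsify_forall(pred, bound):
    """Return a counterexample n < bound with not pred(n), or None (still open)."""
    return next((n for n in range(bound) if not pred(n)), None)

# "There exists an even perfect square above 10" -- verified by witness 16:
assert verify_exists(lambda n: n > 10 and n % 2 == 0 and int(n**0.5)**2 == n, 100) == 16
# "All numbers are below 50" -- falsified by counterexample 50:
assert falsify_forall(lambda n: n < 50, 100) == 50
```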

Nonetheless, even Robinson arithmetic gives us more definite truths than this scheme does. Statements like "For all x, x=x" can be verified, not just falsified. Should this be looked at as probabilistic knowledge? This view becomes understandable if we think of equality as a black box, which we know nothing about. We just feed it numbers, and accept what we get out. If not the axioms of equality, at least the first-order tautologies seem knowable: perhaps "for all x, x=x" is probabilistic knowledge of the behavior of equality, but "for all x, x=x if x=x" is clear and knowable... right? Well, this too could be questioned by claiming that the truth functions are also to be treated as black boxes.

Why would we take this view? Perhaps we claim that knowledge of the actual structure of equality, and similarly of the actual structure of truth functions, should be represented at the metalanguage level. Then, "for all x, x=x" would be something we could prove in the metalanguage, thanks to our additional information about equality, but not in the language itself. I think there is something intuitively appealing about this, despite the restrictive nature of the claim.

The base logic would contain rules for reducing equality statements to true/false, and similarly rules that reduced a truth-function whose arguments were already true or false. A statement that reduces to "true" can then be concluded. In addition to these rules, there would be a rule that allowed existential quantifications to be concluded from examples; this would be a fairly complicated process, as we can pull out any set of identical pieces and replace them with a variable. Once an existential statement has been concluded, it can reduce to "true" in larger expressions. Universal quantifiers can be defined from existentials as usual, which allows us to conclude their negation by finding counterexamples (but does not allow us to affirm any universals).
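A sketch of such a base logic, covering only the reduction rules and the example rule for existentials (universals are deliberately never affirmed, matching the description above). Statements are coded as tuples; all encoding choices here are mine:

```python
# Ground equalities and truth functions reduce to True/False; an existential
# statement may be concluded from a concrete example by substitution.

def reduce_stmt(s):
    """Reduce a statement as far as the base rules allow."""
    if isinstance(s, tuple):
        op = s[0]
        args = [reduce_stmt(a) for a in s[1:]]
        if op == "=":
            return args[0] == args[1]
        if op == "not":
            return not args[0]
        if op == "and":
            return args[0] and args[1]
    return s  # numbers (and anything unreduced) pass through

def conclude_exists(body, example):
    """Conclude (exists x, body) by checking body with x replaced by example."""
    def subst(s):
        if s == "x":
            return example
        if isinstance(s, tuple):
            return tuple(s[:1]) + tuple(subst(a) for a in s[1:])
        return s
    return reduce_stmt(subst(body))

assert reduce_stmt(("and", ("=", 2, 2), ("not", ("=", 1, 2)))) == True
assert conclude_exists(("=", "x", 3), 3) == True   # witness 3 licenses the existential
```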

How could a metalanguage be constructed along similar lines, but yielding the first-order truths we're used to? Not sure. What's needed most is the ability to see that even though we don't have all the information for a truth-function to be calculated, the additional information will create the same answer no matter what.

## Monday, April 06, 2009

4 probability estimates seem essential to me.

- A projection of the growth of computing power. This is of course extensively discussed, so it is at least somewhat reasonable to use standard predictions.
- Estimation of probable AI intelligence levels given different hardware levels. There is a lot of hidden complication here. First, how do we want to measure intelligence? How should we define intelligence level? The most relevant way of doing this is to estimate real-world problem solving capabilities (in each important area). Second, what we want isn't just a fixed maximum intelligence level, but rather a curve over time; this captures information about intelligence explosions due to fast learning or recursive self-improvement. Third, ideally we don't just want one probable curve for each hardware level but rather a probability distribution over such curves.
- Estimation of probability of unfriendly behavior, given various classes of probable goal functions that AI programmers might come up with. Of course we'll also want to assign a probability to each of these goal-classes. The probability of unfriendly behavior depends mainly on the intelligence level reached (which depends in turn on the hardware level). An AI that is only somewhat smarter than humans will more probably have incentive to be nice to humans, for the same reasons that humans have incentive to be nice to each other.
- Estimation of human ability to respond to various intelligence levels if an AI of that level turns out to be unfriendly.

## Thursday, April 02, 2009

The question of AI ethics is not merely a matter of estimating the probability of Really Bad Things; it is a matter of weighing the probability of Really Bad Things against the probability of Really Good Things.

This makes some of the stuff in the previous post less relevant (though not completely irrelevant). The major question becomes: in various scenarios, what is the ratio of good to bad?

When thinking about these issues, I find myself going back and forth between two concepts of the future:

- AIs increase in intelligence slowly. For a long time, AIs remain at human level, or anyway at a human-understandable level. There is no miracle of recursive self-improvement; you get out what you put in.
- The first AI smart enough to begin recursive self-improvement quickly goes beyond the human-understandable level of intelligence, and has its way with the future.

I have argued already that the first outcome would be safer than the second; the opportunity for Really Good Things is less, but this (I would think) is outweighed by the far reduced probability of Really Bad Things. (In scenario 1, sociopathic AIs would occur, but they would be of roughly human level and so could be contained; by the time AIs were too powerful, the art of creating ethical AIs would be fairly well perfected. Also, even if an unethical AI was created, there would be a safety net of nearly-as-smart AIs to contain it.)

Of course, there isn't freedom to choose between the two outcomes, for the most part. Either recursive self-improvement (aka RSI) is a powerful possibility, or it isn't. To some extent, the outcome is dependent on whether the algorithmic research outpaces hardware developments, or vice versa (as discussed last time). The "race between software and hardware" mindset would have it that the best strategy for avoiding Really Bad Things is to make algorithmic progress as quickly as possible-- which favors open research, et cetera. But even if algorithms outpace hardware, there could still be an RSI tipping-point: suddenly the hardware is good enough to support RSI, and outcome 2 occurs. So this isn't well-justified. What is needed is a solid analysis of RSI.

Matt Mahoney offers an argument that RSI is inherently slow. However, this model is limited by the (purposeful) restriction to strict self-improvement; in particular, the exclusion of learning. Since explosive RSI-like learning capabilities are essentially as dangerous, this isn't a particularly helpful model. Shane Legg makes similar claims for learning. As it happens, I disagree with those claims:

3 days ago Matt Mahoney referenced a paper by Shane Legg, supposedly formally proving this point:

http://www.vetta.org/documents/IDSIA-12-06-1.pdf

I read it, and must say that I disagree with the interpretations provided for the theorems. Specifically, one conclusion is that because programs of high Kolmogorov complexity are required if we want to guarantee the ability to learn sequences of comparably high Kolmogorov complexity, AI needs to be an experimental science. So, Shane Legg is assuming that highly complex programs are difficult to invent. But there is an easy counterexample to this, which also addresses your above point:

We are given T, the amount of computation time the algorithm gets between sensory inputs. Sensory inputs can ideally be thought of as coming in at the rate of 1 bit per T cpu cycles (fitting with the framework in the paper, which has data come in 1 bit at a time), although in practice they would probably come in batches. Each time period T:

- Add the new input to a memory of all data that's come in so far.
- Treat the memory as the output of a computer program in some specific language. Run the program backwards, inferring everything that can be inferred about its structure. A zero or one can only be printed by particular basic print statements. It is impossible to know for certain where conditional statements are, where loops are, and so on, but at least the space of possibilities is well defined (since we know which programming language we've chosen). Every time a choice like this occurs, we split the simulation, so that we will quickly be running a very large number of programs backwards.
- Whenever we get a complete program from this process, we need to run it forwards (again, simulating it in parallel with everything else that is going on). We record what it predicts as the NEXT data, along with the program's length (because we will be treating shorter programs as better models, and trusting what they tell us more strongly than we trust longer programs).
- Because there are so many things going on at once, this will run VERY slowly; however, we will simply terminate the process at time T and take the best prediction we have at that point. (If we haven't gotten any yet, let's just say we predict 0.)
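The procedure above, stripped to a toy: instead of running programs in a real language backwards, suppose the "program space" is just repeating bit-patterns. The shortest pattern consistent with the history then plays the role of the shortest program, and its continuation is the prediction. This is a drastic simplification of the email's scheme, and `predict_next` is my name for the sketch:

```python
# Toy stand-in for the prediction scheme: enumerate "programs" (repeating
# bit-patterns) from shortest to longest, keep the first one that explains
# the data so far, and output its next bit.  Default prediction is "0",
# matching the email's fallback when no program has been found in time.

def predict_next(history, max_pattern_len=8):
    for length in range(1, max_pattern_len + 1):   # shorter = preferred model
        pattern = history[:length]
        repeated = pattern * (len(history) // length + 2)
        if repeated.startswith(history):           # pattern explains the data
            return repeated[len(history)]          # its next output is the prediction
    return "0"
```

With history "010101" the shortest consistent pattern is "01", so the prediction is "0".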

A more sophisticated version of that algorithm was presented at the AGI conference in this paper:

http://www.agiri.org/docs/ComputationalApproximation.pdf

The algorithm will be able to learn any program, if given enough time!

NOW, why did Shane Legg's paper say that such a thing was impossible? Well, in the formalism of the paper, the above "algorithm" cheated: it isn't an algorithm at all! Fun, huh?

The reason is that I parameterized it in terms of that number T. So, technically, it is a class of algorithms; we get a specific algorithm by choosing a T-value. If we choose a very large T-value, the algorithm could be very "complex", in terms of Kolmogorov complexity. However, it will not be complex to humans, since it will just be another instance of the same general algorithm! In fact, it would just be the same algorithm given more time to do its job.

So, on those grounds, I disagree with Shane Legg: it is still possible to find algorithms analytically, despite the found algorithms being "of high complexity". Or, to rephrase, there are simple algorithms of high Kolmogorov complexity, and those algorithms do the job that can't be done by algorithms of low Kolmogorov complexity.

(As an aside, the next paragraph in my email uses a mathematically flawed argument: "Once we have the ordering, I can define the game as: create the lowest video sequence not definable in set theory." What I was trying to do was diagonalize set theory, to define a set-theoretically undefinable video sequence. The right way to do this is not to take the "lowest" sequence not definable by set theory. Instead: put an ordering on set-theoretical definitions of sequences. For the Nth frame of our non-definable video sequence, look at the Nth frame of the Nth set-theoretic video sequence; if the screen is all white there, take our sequence to be all black on that frame; otherwise, take it to be all white. :D)
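The corrected construction is just Cantor's diagonal argument, and it is easy to sketch in code. The list `definable` below is a hypothetical stand-in for the ordering of set-theoretically definable sequences (the real enumeration would range over definitions, not Python lambdas):

```python
WHITE, BLACK = "white", "black"

# Stand-ins for the first few set-theoretically definable video
# sequences, in some fixed ordering (hypothetical, for illustration):
definable = [
    lambda n: WHITE,                           # sequence 0: always white
    lambda n: BLACK,                           # sequence 1: always black
    lambda n: WHITE if n % 2 == 0 else BLACK,  # sequence 2: blinking
]

def diagonal(n):
    """Frame n of the diagonal sequence: all black whenever the n-th
    definable sequence shows all white on its own n-th frame, else all white."""
    return BLACK if definable[n](n) == WHITE else WHITE

print([diagonal(n) for n in range(3)])  # ['black', 'white', 'black']
```

By construction, the diagonal sequence differs from the Nth definable sequence at frame N, so it cannot appear anywhere in the enumeration.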

Still, all I'm claiming is that we can construct an algorithm that will perform better if it is given more time. That provides a sort of counterexample, but not one that is explosive (until the point where an AI gets the ability to improve its own hardware).

[Edit: In the comments, Shane Legg points out that this doesn't really provide a counterexample. We can construct an algorithm that will do better given more time, but that does not mean that for every sequence we might want to learn, there is some amount of processing time that will allow the algorithm to converge correctly. In fact, that's false; for any fixed amount of time we allow the algorithm, there will be sequences on which it cannot converge. Shane also corrects me on the conclusions he draws: he does not, in fact, conclude that "AI must be an experimental science".]

Eliezer makes a vivid argument for the possibility of explosively fast learning. Here is the key mathematical bit:

I occasionally run into people who say something like, "There's a theoretical limit on how much you can deduce about the outside world, given a finite amount of sensory data."

Yes. There is. The theoretical limit is that every time you see 1 additional bit, it cannot be expected to eliminate more than half of the remaining hypotheses (half the remaining probability mass, rather). And that a redundant message cannot convey more information than the compressed version of itself. Nor can a bit convey any information about a quantity with which it has correlation exactly zero, across the probable worlds you imagine. But nothing I've depicted this human civilization doing even begins to approach the theoretical limits set by the formalism of Solomonoff induction. It doesn't approach the picture you could get if you could search through every single computable hypothesis, weighted by their simplicity, and do Bayesian updates on all of them.
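The "half the remaining probability mass" limit is easy to see in a toy Bayesian update (the eight hypotheses below are invented for illustration): a single observed bit can only discard the hypotheses that predicted the other value, and when the predictions split evenly, that is exactly half.

```python
# Uniform prior over 8 hypotheses, each predicting the next bit.
hypotheses = {f"h{i}": 1 / 8 for i in range(8)}
predicts_one = {f"h{i}": (i % 2 == 1) for i in range(8)}  # half say "1"

def update(prior, predicts_one, observed_bit):
    """Bayesian update on one observed bit: hypotheses that predicted
    the wrong bit lose all their mass; the rest are renormalized."""
    post = {h: p for h, p in prior.items()
            if predicts_one[h] == (observed_bit == 1)}
    total = sum(post.values())
    return {h: p / total for h, p in post.items()}

post = update(hypotheses, predicts_one, 1)
print(len(post))  # 4: one bit eliminated exactly half of the hypotheses
```

If the predictions split unevenly, the expected mass eliminated (averaged over which bit arrives) is even less than half, which is the content of the quoted limit.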

He does, however, admit reservations about the possibility of this power being manifested in the physical universe:

No one - not even a Bayesian superintelligence - will ever come remotely close to making efficient use of their sensory information...

...is what I would like to say, but I don't trust my ability to set limits on the abilities of Bayesian superintelligences.

(Though I'd bet money on it, if there were some way to judge the bet. Just not at very extreme odds.)

But, of course, an entity (say) halfway between humans and a Bayesian Superintelligence is still rather dangerous.

## Friday, March 27, 2009

I recently had a discussion with Eliezer Yudkowsky. I contacted him mainly because, from what I knew of his thinking, I expected him to be interested in foundational logical issues, yet I saw no technical details of this sort in his many online writings (except for a cartoon guide to Löb's Theorem). I quickly learned the reason for this. Eliezer believes that there is great risk associated with AI, as I already knew. What I had not guessed (perhaps I should have) was that he therefore considers technical details too dangerous to release.

My natural disposition is to favor open research. So, I am very reluctant to accept this position. Of course, that is not in itself an argument! If I want to reject the conclusion, I need an actual reason-- but furthermore I should search just as hard for arguments in the other direction, lest I bias my judgement.

(That second part is hard... although, it is made easier by the large body of literature Eliezer himself has produced.)

The key question is the actual probability of Very Bad Things happening. If the chance of disaster thanks to AI research is above, say, 0.1%, we should be somewhat concerned. If the chances are above 1%, we should be really concerned. If the chances are above 10%, we should put a hold on technological development altogether until we can find a better solution. (AI panic puts the current probability at 26.3%!)

There are two issues here: the probability that AIs will become sufficiently powerful to pose a serious threat, and the probability that they would then proceed to do bad things. These topics are large enough individually to merit a long discussion, but I'll try to jot down my thoughts.

AI Power

From what I've read (not from my recent discussion with him), Eliezer's main argument for the first point is that AIs will, because they are software, have a far greater ability to self-improve than humans have. Thus, if they start out with approximately human-level intelligence, they could end up far smarter than us just by thinking for a while. "Thinking for a while" could mean a few days, or a few years-- the risk is still there, so long as the AIs would eventually reach superhuman intelligence.

Personally, I think the human mind has a fairly high ability to self-improve. Sure, we can't modify our neural substrate directly, but it can equally be said that an AI can't modify its silicon. Our software could in principle be equally malleable; and this possibility, combined with the assumption that self-improvement is highly effective, suggests that evolution would have hit upon such a solution.

However, Eliezer likes to say that evolution is not very smart. This is certainly true. So I don't really know how to assign a probability to evolution having hit upon a comparatively flexible form of recursive self-improvement, and I need to take more evidence into account.

- I can't "examine my own source code". An AI could. (If humans could do this, AI would be an easy problem!)
- I can learn new ways of thinking. I can even consciously examine them, and decide to accept/reject them.
- I can't wish habits away.
- An AI could run detailed simulations of parts of itself to test improved algorithms.
- I can run mental simulations of myself to some extent (and do so).

This evidence is neither here nor there in terms of the question at hand.

Other arguments to the effect that AIs could gain enough power are ready at hand, however. The main one is that if a particular amount of computing power can create human-level intelligence, it seems pretty obvious that larger amounts of computing power will create greater intelligence. It also seems likely that people would willingly cross this threshold, even competing to create the largest AI brain. Even if self-improvement turns out to be a total flop, this would create superhuman intelligences. If one such intelligence decided to grab more hardware (for example, using a virus to grab computing power from the internet), the amount of computing power available to it would probably become rather large rather fast.

(All of this assumes human-level AI is possible in the first place: a notable assumption, but one I will not examine for the moment.)

The argument from superhuman intelligence to superhuman power is fairly straightforward, though perhaps not 100% certain. The AI could hack into things, accumulate large amounts of money through careful investment, buy a private army of robots... more probably it would come up with a much cleverer plan. How smart, exactly, does it need to get in order for this to be a danger? Estimates range from the level of a smart human (because a smart human can already be dangerous) to the intelligence of all humans combined (because that is what the AI would be up against). For the really bad scenarios to occur, it would seem, the AI needs to be capable of major scientific innovation; innovation on the order of at least a team of scientists. (I must admit that, here and now, a single scientist could potentially release a deadly disease from a lab-- this involves no innovation, since the diseases are already there. But, this is because a select few scientists have special access to these diseases. An AI might get such access, but that doesn't seem especially probable until a time when AIs are all around and being given all sorts of other responsibilities... at which point, if the AIs are actually unfriendly, the disease is only one of many worries.)

One question remains: is there enough computing power around currently to cause concern? This is something that did come up in the conversation. If current machines are not risky, then it could be possible today to hit upon the right AI design, yet not achieve human-level intelligence using it. Personally, I think this would be ideal. Such a scenario, with AIs gradually increasing in intelligence as the hardware increased in capability, would give humans time to experiment with AI technology, and also to consider its consequences. (Indeed, some argue that this is already the situation: that the fundamental algorithms are already known, and the hardware just needs to catch up. I don't agree, although I can't deny that a large amount of progress has already been made.)

Eliezer argued that human-level intelligence on modern-day machines was plausible, because evolution is not a good engineer, so human-level intelligence may require far less hardware than the human brain provides. Estimates based on the brain's computing power vary quite widely, because it is not at all clear what in the brain constitutes useful computation and what does not. Low estimates, so far as I am aware, put the brain's computing power near to today's largest supercomputer. High estimates can basically go as far as one likes, claiming that chemistry or even quantum physics needs to be simulated in order to capture what is happening to create intelligence.

Of course, the internet is vastly more powerful than a single machine. But the risk of an AI escaping to the internet does not seem very high until that AI is at least near human level pre-escape. So, what is the probability that current machines could be human-level with the current algorithm?

My faith in evolution's engineering capabilities is somewhat higher than Eliezer's. Specifically, Eliezer is (from what I've read) quite fond of the study of cognitive bias that has become a popular subfield of psychology. While I enjoy Eliezer's writings on rationality, which explicate many of these biases, I am reluctant to call the biases design flaws. Upon reflection, there are better ways of doing things, and explicating these better ways is an important project. But my best guess is that each cognitive bias we have is there for a reason, essentially because it makes for a good pre-reflection guess. So, rather than design flaws, I see the cognitive biases as clever engineering tricks. (I do not know exactly how far from Eliezer's way of thinking this falls.) This is merely a default assumption; if I studied the cognitive bias literature longer, attempted to come up with explanations for each bias, and failed, I might change my mind. But, for now, I am not comfortable with assuming large amounts of mental inefficiency... although I admit I do have to postulate some. On the other hand, I also have to postulate a fairly high amount of inefficiency in human-made AI, because it is a hard problem.

So, again, this leads neither here nor there.

But, the real question is not whether today's computers are human level; the more critical question is quite a bit more complicated. Essentially:

Will there be a critical advance in software that occurs at a time when the existing hardware is enough to create an AI hazard, or will software advances come before hardware advances, such that humanity has sufficient time to get used to the implications and plan ahead?

Again, a difficult question to answer. Yet it is a really important question!

A historical view will of course show a fairly good match-up between amount of processing power and results. This old paper begins with such an account geared towards computer vision. Yet, there are real advances in algorithms happening, and they will continue to happen. A small but striking example of sudden improvement in algorithms is provided by Graphplan, an algorithm which changed the AI planning landscape in 1997. Of course, today, the algorithms are even better. So, hardware clearly isn't everything.

A proper estimate would involve a serious analysis of the pace of advance in computing-- how probable is it that Moore's law will keep its pace, speed up, slow down, et cetera-- and likewise an analysis of progress in AI algorithmics. But, equally important is the other side of the question: "such that humanity has sufficient time to get used to the implications and plan ahead". How much hope is there of this, assuming that the software is available before the hardware?

I've said that I think this is the ideal outcome-- that people have a while to first get used to near-human-level AI, then human-level, then superhuman-level. Part of why I think this is that, in this scenario, there would probably be many superhuman AIs rather than just one. I think this would improve the situation greatly, but the reasons are more a topic for the section on whether an AI would in fact do bad things. In terms of AI power, the situation is not as persuasive. It seems perfectly possible that a society that became used to the presence of AIs would give them various sorts of power without thinking too hard, or perhaps even thinking that AIs in power were safer than humans in power.

The thing is, they might be right-- depending on how well-designed the AIs were. Which brings us to:

AI Ethics

If an AI gained sufficient power, would it destroy humanity?

Of course, that depends on many variables. The real question is:

Given the various scenarios in which an AI of sufficient power is created, what would that AI do?

The major scenarios under consideration:

- Many powerful AIs (and many not-so-powerful AIs) are developed as part of an ongoing, incremental process open to essentially all of humankind

- A single powerful AI is developed suddenly, by a single organization, as a result of a similarly open process

- A powerful AI is developed by a single organization, but as a result of a closed process designed to minimize the risk of AI methods falling into the wrong hands

This, of course, depends on the idea that it is easier to create an unfriendly AI than a friendly one. Eliezer has written at length on this. The key concept is that a mind can have any particular goal; that there is no system of ethics that will be universally accepted by any sufficiently intelligent mind, because we can quite literally program a mind that has the exact opposite set of values for any given set. (Just reverse the sign on the utility function.)
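The sign-reversal point is trivial but worth making concrete: given any utility function at all, a mind maximizing its negation is exactly as coherent an optimizer, with exactly opposite values. The utility function below is a hypothetical stand-in, purely for illustration.

```python
def flipped(utility):
    """The mind that maximizes -U prefers exactly what the U-maximizer abhors."""
    return lambda outcome: -utility(outcome)

# A hypothetical utility function (illustrative only):
def human_flourishing(outcome):
    return outcome["flourishing_humans"]

outcomes = [{"flourishing_humans": 10}, {"flourishing_humans": 0}]

best_for_friendly = max(outcomes, key=human_flourishing)
best_for_reversed = max(outcomes, key=flipped(human_flourishing))
print(best_for_friendly["flourishing_humans"],
      best_for_reversed["flourishing_humans"])  # 10 0
```

Nothing in the maximization machinery itself privileges one sign over the other; the values have to come from somewhere else.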

The argument I gave to Eliezer for universal ethics is essentially the same as a view once argued by Roko. (Roko has since been convinced that Eliezer is correct.) He calls it the theory of "instrumental values". The idea is that most rational agents will value truth, science, technology, creativity, and several other key items. It is possible to contrive utility functions that will not value these things, but most will. Therefore, a future created by a bad AI will not be devoid of value as Eliezer argues; rather, unless it has a really weird utility function, it will look a lot like the future that a good AI would create.

This is a topic I want to go into a lot of detail on, and if the first part of the post hadn't ended up being so long, I probably would. Instead, I'll blog more about it later...

For now, it is important to observe (as Eliezer did in our conversation) that regardless of far-off future similarities, a good AI and a bad AI have an immediate, critical difference: a bad AI will very probably consider humans a waste of resources, and do away with us.

Similarly, a badly designed but well-intentioned AI will very probably result in a future devoid of purpose for humans. Maximizing happiness will justify forced drugging. Maximizing a more complicated value based on what people actually consider good may easily result in a locking-in of currently enjoyed hobbies, art forms, et cetera. Maximizing instrumental values might easily lead to the destruction of humans.

The terrible, horrible dilemma (it seems to me) is that once a single AI gains power, be it good or bad, the single utility function that such an AI is programmed with becomes completely locked in. Any flaws in the utility function, no matter how small they may seem, will be locked in forever as well.

There are various ways of getting around this, to some extent. One approach I've been tossing around in my head for a while is that an AI should be uncertain about its own goals, similar to human uncertainty about what is ethical. This is entirely admissible within a Bayesian formalism. What, then, would the AI take as evidence concerning ethics? I visualize it something like this: the AI would have a fair amount of (hand-programmed) knowledge about what sorts of things are probably ethical, and it would search for simple rules that meet these criteria. A better theory would be one that fit the patchwork of preconceptions about ethics more closely. Preconceptions would include things like "what a human considers ethical is more probably ethical...", "what a human, given large amounts of time to reflect, would consider ethical is more probably ethical...", as well as simpler statements like "killing an unwilling victim is unethical, with high probability", "causing pain is unethical, with high probability", and so on. A few different Three-Laws-style systems could "fight it out", so to speak.
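A crude sketch of what that "fight it out" could look like: candidate ethical theories get scored against a hand-programmed patchwork of preconceptions, with stronger preconceptions counting as stronger evidence. Every name and number below is invented for illustration; this is a stand-in for a real Bayesian treatment, not a proposal.

```python
# Hand-coded "preconceptions": (case, verdict, weight) triples, where
# weight reflects how strongly we believe the verdict (hypothetical values).
preconceptions = [
    ("kill_unwilling_victim", "wrong", 0.99),
    ("cause_pain",            "wrong", 0.90),
    ("keep_promise",          "right", 0.80),
]

# Candidate ethical theories, each assigning a verdict to every case:
theories = {
    "maximize_happiness": {"kill_unwilling_victim": "wrong",
                           "cause_pain": "wrong",
                           "keep_promise": "wrong"},
    "follow_contracts":   {"kill_unwilling_victim": "wrong",
                           "cause_pain": "right",
                           "keep_promise": "right"},
}

def posterior(theories, preconceptions):
    """Score each theory by how well it fits the weighted preconceptions,
    then normalize -- a crude stand-in for a Bayesian update."""
    scores = {}
    for name, verdicts in theories.items():
        p = 1.0
        for case, verdict, weight in preconceptions:
            # agreeing with a strong preconception counts as strong evidence
            p *= weight if verdicts[case] == verdict else (1 - weight)
        scores[name] = p
    total = sum(scores.values())
    return {name: p / total for name, p in scores.items()}

print(posterior(theories, preconceptions))
```

The point of the sketch is only that goal uncertainty is mechanically unproblematic: the AI acts on a posterior over ethical theories rather than a single hard-coded utility function.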

Eliezer suggests a different sort of solution: an AI should behave in a highly lawful manner, setting definite rules and consequences, rather than merely doing whatever it takes to do "good" as defined by humanity. He's suggesting this as a solution to a somewhat different problem, but it applies about as well here. An AI that booted up, calculated a good set of laws for utopia, set up physical mechanisms to enforce those laws, and then shut itself off would not lock the future into a single utility function. It would of course give the future a huge push in a particular direction, but that is quite different. It purposefully leaves the future open, because an open future is a plus according to (at least) the majority of humans.

The third option that I can think of is one I've already mentioned: have several powerful AIs rather than one. This still carries a large risk. 20 AIs that decide humans are useless are just as bad as 1 AI that decides humans are useless. However, 20 AIs with well-intentioned-but-wrong utility functions are probably much better than 1, so long as they all have different well-intentioned utility functions.

The AIs would probably have an incentive to enforce a balance of power. If one AI became obviously more powerful than the others, the others would have an incentive to gang up on it, because that one pursuing its utility function to the utmost is probably far worse for the rest than whatever the group consensus is. That consensus should be something favoring humans, since the individual goals are all random variations on that theme... if we look at all the goals and ask what they have in common, favoring humanity should be the answer.

Of course, that result isn't entirely certain. First, the average of many mistaken goals is not necessarily a good goal. Second, the average is not necessarily the sort of compromise that would result. Third, once a compromise has been agreed upon, the AIs might (rather than maintaining their standoff) all rewrite their utility functions to reflect the consensus, and effectively merge. (This would be to avoid any future defectors; the utility of stopping other possible defectors might be higher than the utility of keeping your ability to defect by keeping your utility function, thus making it rational to agree to a group rewrite.) In this case, the lock-in that I'm afraid of would happen anyway (although we'd be locked in to a probably-less-terrible utility function). Fourth, the situation might not result in a standoff in the first place. Even with several AIs to begin with, one could gain an upper hand.

Friendly AI (as Eliezer calls it) is a hard question. But, it is not the question at hand. The question at hand is:

Which is better, an open research process or a closed one?

I've touched on a few of the factors, but I haven't come close to a definite answer. A proper answer requires an examination of the friendliness issue; an analysis of the curve of technology's growth (especially as it relates to computing power); an examination of what sort of theoretical advances could create a powerful AI, and in particular how suddenly they could occur; an idea of AIs future place in society (both sub-human, human-level, and superhuman AI), which requires a socio-economic theory of what we will do with AI; and a theory of AI psychology, mapping the space of possible minds (focusing on which minds are friendly to humans).

I'll try to address each of these issues more closely in the next few posts.

## Sunday, March 22, 2009

Consider the original stated purpose of my investigation: to examine which statements are probabilistically justifiable. To probabilistically accept one of the enumerators simply because (so far) all of its implications appear to be true would be a violation of the semantics, unless the truth value of an enumerator really was just the conjunction of the enumerated statements. So to satisfy my original purpose, such a thing is needed.

As I said, paradox can be avoided by allowing enumerators to be neither true nor false. In particular, it is natural to suppose a construction similar to Kripke's least-fixed-point theory of truth: an enumerator is "ungrounded" if it implies some chain of enumerators which never bottoms out to actual statements; ungrounded enumerators are neither true nor false.

The problem is that we will now want to refer to the ungrounded-ness of those sentences, since it appears to be a rather important property. For this we need to augment the language. This can be done in multiple ways, but it will ultimately lead to a new reference failure. And filling in that hole will lead to yet another hole to fill. In general I would deal with this by using my theory that almost works. This entails building an infinite hierarchy of truth values which starts:

True, False, Meaningless1, Meaningless2, Meaningless3, ..., Meaninglessω, Meaninglessω+1, ...

I am generally interested in investigating whether this hierarchy is equal in power to the usual Tarski hierarchy, namely,

True1, True2, True3, ..., Trueω, Trueω+1, ...

The difference basically springs out of the use of a loopy truth predicate (a truth predicate that can apply truly to sentences containing that same truth predicate). Asking for a single truth predicate appears to force the existence of infinitely many meaninglessness predicates. Is a loopy truth predicate automatically more powerful? Not obviously so: the loopy truth predicate will have a maximal amount of mathematical structure that it implies (the least fixed point), which does not even include uncountable entities. The Tarski hierarchy will continue into the uncountables, and further. (I should note that that statement is not universally accepted... but if someone suggests that the Tarski hierarchy should stop at some well-defined level, can't I just say, OK, let's make a truth predicate for that level? Why not keep going? And we don't actually need an uncountable number of truth predicates: we need an ordinal notation that can refer to uncountable ordinals, and we just make the truth predicate take an ordinal as an argument.)
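The levelled predicates of the Tarski hierarchy can be sketched as a well-formedness check: True-at-level-k may only be applied to sentences whose truth predicates all have index below k, which is exactly what rules out loopy self-application. The integer levels below are a stand-in (as noted above, the full hierarchy needs ordinal indices), and the representation of sentences is invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class TruePred:
    """The sentence "True_level(inner)": the level-indexed truth predicate
    applied to the sentence `inner` (base sentences are plain strings)."""
    level: int
    inner: object

def max_level(sentence):
    """Highest truth-predicate index occurring in a sentence (0 if none)."""
    if isinstance(sentence, TruePred):
        return max(sentence.level, max_level(sentence.inner))
    return 0

def well_formed(sentence):
    """Tarski's restriction: each True_k applies only to sentences whose
    truth predicates all sit strictly below level k."""
    if isinstance(sentence, TruePred):
        return (max_level(sentence.inner) < sentence.level
                and well_formed(sentence.inner))
    return True

base = "snow is white"
print(well_formed(TruePred(2, TruePred(1, base))))  # True: levels descend
print(well_formed(TruePred(1, TruePred(1, base))))  # False: loopy application
```

A loopy truth predicate drops the `max_level(...) < level` check entirely; the question in the post is whether the structure lost there is exactly recovered by the tower of meaninglessness predicates.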

So the question is: is there an isomorphism between the mathematical structures captured by the Tarski hierarchy of truth and my hierarchy of nonsense?