Artificial Intelligence: foundations of mathematics

Showing posts with label foundations of mathematics. Show all posts

Friday, April 10, 2009

Enumerating Infinity

What are we doing when we list infinities? Particularly, I take ordinal infinities as my example.

The most natural ordinals, which people will tend to stay within if you ask them to invent bigger and bigger ordinals and give them the basic definitions, are the recursive ordinals. In fact, people will tend to stay within what I'd call the "prinitive recursive" ordinals:

infinity, infinity + 1, infinity + 2, .... infinity + infinity (= 2*infinity), 3*infinity, 4*infinity, ... infinity*infinity (= infinity^2), infinity^3... infinity^infinity (=infinity^^2), infinity^(infinity^infinity) (=infinity^^3), infinity^(infinity^(infinity^infinity)) (=infinity^^4) ...... infinity^^infinity ..... infinity^^^infinity ..... infinity^^^^infinity .....

The above uses up-arrow notation.

Enumerating computable ordinals involves finding halting computations, such as the numerical operations used above. Sticking infinity into a computable function basically stands for the infinite series produced by sticking finite numbers in that same location. So just "infinity" stands for:

1, 2, 3, 4, 5, ....

We can define + as the concatenation of two different orderings, which leaves the internals of the orderings untouched but specifies that all elements from the second will be larger than any elements from the first. So, "infinity + 1" stands for:

1, 2, 3, 4, .... A.

Here, "A" is a pseudo-number larger than all the finite numbers. (In other words, X is the first infinite number.)

"infinity + infinity":

1, 2, 3, ... A, A+1, A+2, A+3 ...

Here, there are essentially two infinite series, the one that starts with 0, and the one that starts with A.

"infinity + infinity + infinity":

1, 2, 3, ... A, A+1, A+2, ... B, B+1, B+2 ...

Now there are 3 series; 0, A, and B.

"infinity * infinity":

1, 2, 3, ... A, A+1, A+2 ... B, B+1 ... C, C+1 ... D, D+1 ... ...

Now we have an infinite number of series.

And so on. We can imagine each computable function as a method of specifying an infinite tree. The tree is "computable" because we can generate any finite portion of the tree using the computable function. Approximately speaking, the more tangled and structured and multiferous the branching of the tree, the larger the ordinal. But if we try just generating computations at random, then they may not halt; so we've got to try to put as much tangledness in as we can without putting in so much tangledness that things become ill-defined. This can potentially take any knowledge we've got about the halting problem.

If this were all there was to enumerating infinities, then I would say that something like probabilistic mathematical truth explains what it is we're doing. However, mathematicians (especially logicians) talk about ordinals that are not computable. These include Kleene's O, the first uncomputable ordinal (which makes it the ordering on all computable ordinals), as well as many higher countable ordinals; the first uncountable ordinal, which is of course just the ordering of all countable ordinals; and so on.

To make the initial jump outside of the recursive ordinals, to Kleene's O, we need to make an interesting sort of move: we admit our ignorance. We give up the hope of being able to enumerate every ordinal on the way, and make due with saying "if the function halts, then the ordinal is well-defined". Since we never will have all the information about which functions halt, we'll always be somewhat ignorant of which of these ordinals are well-defined. Yet, we look at their ordering as a mathematical entity, and start talking about it.

This gives the impression that we'll need to give up more and more if we want to climb higher in the ordinals. But how much can we give up before we've given everything up?

I propose that, in general, the process of listing ordinals is a process of deciding which things are well-defined. If we give that up, we've given up too much.

Here, "well-defined" means "having a description on some level of Tarski's hierarchy of truth", or alternatively, "having a description on some level of my hierarchy of nonsense".

Of course, this is circular. Those hierarchies are both defined in terms of ordinals, so defining ordinals in terms of them appears unproductive. However, it is not completely unproductive. Let's take Tarski's hierarchy as the working example. Let 0 represent first-order logic, 1 represent the theory of truth for that (specifically, the theory with enough axioms to be equivalent in strength to Peano Arithmetic), 2 to be the theory of truth-in-1, and so on.

The thing I want to note here is that the ordinal assigned to a level in the hierarchy is far lower than the largest ordinals whose existance can be proven in that theory. Suppose I lay down axioms for Tarski's hierarchy in terms of ordinals, and then lay down axioms for ordinals which require definability in Tarski's hierarchy. It seems that I'll get a large number of ordinals in this manner. If I start out believing that the ordinal 1 is well-defined, then I'll believe all the ordinals proven well-defined by Peano arithmetic are well-defined. That is a rather large number of ordinals. Since I believe in them, I'll believe in all the levels of the Tarski hierarchy corresponding to them... lots of levels! This gives me many more ordinals to believe in, which gives me many more levels to believe in, and so on.

Of course, this stops somewhere (in the same sense that counting up stops somewhere...). It will only imply the existence of so many ordinals, assuming that it is a consistent theory. Furthermore, if it is consistent, then I can name an ordinal that it does not: the ordering of all the ordinals the system talks about. Let's call this the "outside ordinal" for the system. (This is a bit trickier to specify than it looks at first, however. We can't just say "the ordering of all ordinals the system will consider well-defined". The system will have gaps in its knowledge. For example, it will prove a bunch of recursive ordinals to be well-defined, but not all of them; it then jumps to Kleene's O, because it can talk about the set of well-defined recursive ordinals. Even more clearly: the system might be able to prove that the first uncountable ordinal is well-defined, but it will not be able to prove that all ordinals below this are well defined... there are uncountably many of them!

The main precaution that must be taken is to prevent the system from taking "the ordering over all ordinals" to be an ordinal. This is like me saying "Consider the set of all well-defined ordinals. Consider the ordering on these as an ordinal, Q. Take Q + 1..." This is not allowed!

OK. Given that, let's think about what happens when we add probabilistic justification to the system. We can think of the system as (eventually) knowing the truth about the halting problem (for any particular instance). This means that it is (eventually) correct in its judgements about well-definedness for computable ordinals. Thanks to the feedback effect of the system, this will mean that it is (eventually) correct in its judgements concerning a whole lot of ordinals. All of them? No: just as there are ordinal notations for which the halting problem determines well-definedness, there are ordinal notations for which the convergence problem determines well-definedness. (For a definition of the convergence problem, see here.) Still, this is an interesting class of ordinals.

So how could one go even further? Well, perhaps we could consider the "outside ordinal" for the version of the system that knows all the halting truths...

Sunday, March 22, 2009

After having thought about it, I am becoming more convinced that the correct approach to the issues mentioned in these two posts is to accept the line of reasoning that deduces an enumerator's truth from the truth of all its consequences, thus (as mentioned) inviting paradox.

Consider the original stated purpose of my investigation: to examine which statements are probabilistically justifiable. To probabilistically accept one of the enumerators simply because (so far) all of its implications appear to be true would be a violation of the semantics, unless the truth value of an enumerator really was just the conjunction of the enumerated statements. So to satisfy my original purpose, such a thing is needed.

As I said, paradox can be avoided by allowing enumerators to be neither true nor false. In particular, it is natural to suppose a construction similar to Kripke's least-fixed-point theory of truth: an enumerator is "ungrounded" if it implies some chain of enumerators which never bottoms out to actual statements; ungrounded enumerators are neither true nor false.

The problem is that we will now want to refer to the ungrounded-ness of those sentences, since it appears to be a rather important property. For this we need to augment the language. This can be done in multiple ways, but it will ultimately lead to a new reference failure. And filling in that hole will lead to yet another hole to fill. In general I would deal with this by using my theory that almost works. This entails building an infinite hierarchy of truth values which starts:

True, False, Meaningless1, Meaningless2, Meaningless3, MeaninglessInfinity, MeaninglessInfinity+1...

I am generally interested in investigating whether this hierarchy is equal in power to the usual Tarski hierarchy, namely,

True1, True2, True3, ... TrueInfinity, TrueInfinity+1, ...

The difference basically springs out of the use of a loopy truth predicate (a truth predicate that can apply truly to sentences which contain the same truth predicate). Asking for a single truth predicate appears to force the existence of infinitely many meaninglessness predicates. Is a loopy truth predicate automatically more powerful? Not obviously so: the loopy truth predicate will have a maximal amount of mathematical structure that it implies (the least fixed point), which does not even include uncountable entities. The Tarski hierarchy will continue into the uncountables, and further. (I should note that that statement is not universally accepted... but if someone suggests that the Tarski hierarchy should stop at some well-defined level, can't I just say, OK, let's make a truth predicate for that level? Shy not keep going? And we don't actually need an uncountable number of truth predicates: we need an ordinal notation that can refer to uncountable ordinals, and we just make the truth predicate take an ordinal as an agrument.)

So the question is: is there an isomorphism between the mathematical structures captured by the Tarski heirarchy of truth, and my hierarchy of nonsense.

Thursday, March 19, 2009

Maximally Structured Theory

The proof-theoretic concept of a theory's "strength" has little to do with the sorts of considerations in the previous two posts. Proof theory normally worries about how much a system proves, while I've been worried about how much a system can say.

The sort of proof-theoretic strength I've read about (I admit there are at least two, and I don't know much about the other one, known as ordinal strength) is defined in terms of interpretations. One system can be interpreted in another if, roughly, we can "translate" the provable statements in one into provable statements in the other, and the translation preserves the basic logical operators (so and -> and, or -> or...). There is no requirement that the translation of an unprovable statement be likewise unprovable, which means that the logic we're translating into may add more provable statements. This is, of course, the point: of two systems A and B, B is stronger than A if B interprets A but A does not interpret B. This indicates that B proves more statements than A.

Now suppose that system C is also stronger than A. It may be the case that system C is not comparable to system B: neither can interpret the other, so there is no way to say that one is more powerful than another. We can think of this as a disagreement between the two systems; both prove more than A proves, but B might prove some statement where C instead proves the negation of that statement.

Is one system right and the other system wrong? Well, if both systems are supposed to be talking about the same mathematical entity, then yes. (Or they could both be wrong!) But if not, we can instead think of the different systems as describing separate mathematical entities. It is then natural (in my opinion) to assert that the proper foundation of mathematics should interpret both. This would mean that it contained both structures within it.

I am somewhat surprised that this is not the current approach. Instead, people seem to be concerned with whether or not the axiom of choice is actually true or not. I suppose this is my first deviation from the classical 'platonist' view :).

So, suppose we restrict ourselves to finite first-order theories. How much structure do we get?

First, just because we restrict ourselves to finite theories doesn't mean we can't interpret an infinite theory. Peano arithmetic, which I've been talking about for the last two posts, is a perfect example; it has an infinite number of axioms, but they are computably enumerable, so we can create an axiom system that enumerates them by allowing the use of extra function symbols.

Second, we can interpret far more than first-order theories; higher-order theories can be interpreted as first-order theories with restricted quantifiers.

What we can't interpret is a fundamentally infinite theory, such as... well, the truths of arithmetic. No finite theory will interpret the infinite theory containing all those truths. (Notice, now I'm being Platonist, and claiming that there actually is a fact of the matter.)

What will the maximal theory look like, the theory that can interpret any others? Will it be finite?

Let's first construct an infinite theory that interprets all finite theories. We can't interpret inconsistent theories without being inconsistent, so let's throw those out... then, we enumerate all the consistent theories (in order of size, let's say, using some other rule to break ties), but using a new set of function symbols for each theory, so that they don't interfere with eachother. (Each finite theory can only use a finite set of function symbols, so there is no trouble here.) Using different function symbols allows us to simply take the union, forming an infinite theory.

This infinite theory cannot be converted into a finite form, unfortunately, because the enumeration is not computable: we can't simply distinguish between theories that are consistent and theories that are not.

Now... if we take a logic wich lacks the rule of explosion, ie, a paraconsistent logic, we wouldn't need to worry so much about avoiding inconsistent theories. The inconsistent theories would be isolated, and wouldn't "infect" the others. In this case we could computably enumerate all theories (still making sure to take different function names for each), and therefore we could make a finite theory which performed the enumeration. (The finite theory would use only a finite number of function symbols, but it would provide an interpretation for an infinite number of them, via some encoding.)

This provides an interesting argument in favor of paraconsistent logic. But, not necessarily a definitive one. It seems to me that one could claim that the paraconsistent logic is not really interpreting the classical theories, since it is putting them into a logic with a nonclassical semantics. Perhaps one could define what the paraconsistent logic is really doing, and construct a classical theory that does that much. I don't know.

[edit]
postscript--

Don't think that it ends here, with one infinite (unspecifiable) theory to rule over all finite theories. We can talk about other such infinite theories, such as the theory containing all the truths of first-order arithmetc, the theory containing all the truths of second-order arithmetic, et cetera, the truths of set theory... (Actually, I don't think just "the truths of set theory" is well specified. "The truths of set theory under the iterative conception of set" is one way of being more specific.) So, we'll want to talk about whether these infinite theories can interpret eachother, and hence we'll have an increasing chain of "maximally structured" systems with respect to particular reference sets... the maximally structured theory with respect to finite theories is countably infinite; so is the set of truths of first-order arithmetic; the max. struct. theory w.r.t countably infinite theories will be uncountably infinite; and so it goes.

Alternatively, the paraconsistant version of this hierarchy appears to stop right away, with a finite theory that interprets all finite theories. This might be taken as a strength or a weakness...

Wednesday, March 18, 2009

More on Last Time's Subject

The system I suggested last time, at the end, can actually be stated in a much simpler (more standard) way.

Rather than adding in a totally new notation for representing enumerations of sentences in a Turing-complete manner, all we really need is to allow the use of arbitrary function symbols when making assertions. This sounds trivial! However (as the considerations from last time show) it makes a big difference in the actual behavior of the system.

Added function symbols are a standard part of the machinery of first-order logic, so in some ways it seems only natural to allow their use in first-order arithmetic. However, they are not normally considered a part of that language; functions in first-order arithmetic must be constructed rather than defined.

Considering functions part of the language obviously proves no new theorems about arithmetic, because we don't add any axioms that include the function symbols. The only thing that changes is what we can say. So what can we say?

Well, it allows us to say a whole lot. With function symbols, we can go as far as defining a truth predicate for arithmetic! This means that we can build the metalanguage for arithmetic. We can also go further, building the metametalanguage (the language containing the truth predicate for the metalanguage), et cetera. However, don't be fooled; there is an important difference between being able to build it and having it as part of the system. Normally when we talk about a language containing the truth predicate we mean that we've already got the rules to make it work in the axioms, not that we can add them if we like.

Adding function symbols to arithmetic follows the advice of the edit to the previous post, namely, we don't include any claim that the added statements are true just because all the sentences of arithmetic that they imply are true. This is essentially because the logical system is treating the function-symbols as actual things, not just as notational conveniences. The logic thinks there is a truth of the matter about how "F" behaves, even before we add axioms to define its behavior. It has no idea what that truth is, but it assumes that the truth exists. So even if a particular function-based statement implies only true statements about arithmetic, and so looks totally un-risky, the logic still will think that it implies a bunch of risky statements about the function symbols involved.

It would be interesting to develop a semantics for the function-symbols that did not result in this conclusion... however, as mentioned in the previous post, in that direction there is a strong risk of paradox. Some sort of non-classical system of truth values will almost certainly be needed.

Sunday, March 15, 2009

OK.

So.

More logic stuff.

I've spoken in the past about using probabilistic justifications (or, when simplicity is desired, nonmonotonic justufucations) for mathematical belief. This provides at least some account of how a system can meaningfully refer to uncomputable concepts such as the natural numbers. (Calling the natural numbers "uncomputable" is sure to raise a few eyebrows. What I mean is that there are undecidable propositions concerning the natural numbers.) The natural question is: how much can be probabilistically justified?

To classify one statement as computably decidable, or probabilistically decidable, or neither, is quite accurate; any one statement or its negation could be simply added as an axiom, and thus made decidable (or more realistically, something else might be added as an axiom, rendering the statement decidable). Similarly, a statement might be made probabilistically decidable with the addition of an axiom that caused it to be equivalent to some probabilistically decidable statement. (Most trivially, the axiom "A <-> B" where "A" is a probabilistically jsutifiable statement and "B" was previosly not. Such an axiom might seem contrived, but it is true if A and B are both true!)

Still, it is convenient to talk in this way, and can be made meaningful. One way is by fixing the axioms we're dealing with; so, for example, we might be working just with robinson arithmetic or peano arithmetic. (I would prefer robinson arithmetic, since I would like to leave the epistemic status of mathematical induction open, but peano arithmetic assumes it to be true.) But this is not the only possibility. For now, I prefer something like the following:

Basic functions of arithmetic are computable. These vary from axiomatixation to axiomatization. At one extreme, we could provide axioms giving the behavior of all the common functions on natural numbers; at the other extreme, we could have no functions in our language, instead having relations sufficient to specify the structure. Most common is to take only the sucessor function as basic, and define addition, multiplication, et cetera from these. These differences may cause trouble for me (ie, result in non-equivalent definitions), but I'll ignore that for now...
Basic predicates and relations are computably decidable. Again, there is some variation about which relations are taken as basic, and which are taken as defined. Usually equality is taken as basic, but I've heard of greater-than being taken instead.
Statements whose truth is a function of a finite number of computably decidable statements are computably decidable. This lets us form larger decidable statements with the normal boolean functions, and or and not. It also allows bound quantifiers to be used, such as "For all numbers less than 100..."
Statements whose truth is a function of an infinite number of computably decidable statements are probabilistically decidable if any reasonable probabilistic decision process is guaranteed to converge.

A "reasonable probabilistic decision process" is intended to be something like the procedure described in this post. Obviously I need to make that a bit more well-defined, but it should be good enough for now.

It might also be convenient to define "computably justifiable", meaning that if a statement is true, it is computably decidable. (For example, "there exists a place in the decimal expansion of pi at which the number 9 is repeated 24 times" is verifiable if it is true (we can simply calculate the digits of pi until we find the spot), but if it is false, it is not.) "Computably falsifiable" similarly means that a statement can be falsified finitely if it is indeed false. "Probabilistically justifiable" would mean a statement was probabilistically decidable if it was true, and "probabilistically falsifiable" would mean that it was probabilistically decidable if false.

The basics are all set up, so let's get to the main show.

The question is: what statements are probabilistically justifiable?

I think I have mentioned before that the delta-2 statements on the arithmetic hierarchy are definitely probabilistically decidable, since they are limit-computable. For any statements not in delta-2, there is no guarantee that they will converge to the correct value (they might not converge, or worse, they might converge to the wrong value). Still, it is an interesting question whether *some* of these converge, and whether applying probabilistic methods is still better than guessing randomly (if there are more that converge correctly than converge incorrectly). But that is not the question for today...

Previously, I have restricted this question to statements in arithmetic. This seems sensible. However, lately I've been looking at the concept of proof-theoretic strength. Roughly, a logic is considered stronger if it proves the same truths plus more. This brought my attention to the idea of probabilistically justifying statements in higher logics based on their relevance to arithmetic. Could set theory, for example, be justified by its usefulness in describing true statements of arithmetic?

This led to a question: are there any statements that can be made, which purely assert things about arithmetic, but cannot be formulated in the language of first-order arithmetic? Obviously arithmetic can assert any single thing about arithmetic, so the question is whether all combinations of arithmetic statements that can be asserted by some logic can be asserted in arithmetic. ("Combinations of statements" refers to the same thing that "truth functions of statements" referred to in the earlier definition. Now, obviously, arithmetic can specify any finite truth-functional statement; so that challenge comes when we examine infinite truth-functional statements, such as those that are probabilistically justifiable.)

More formally:

Definition: Logic B is "effectively complete" with respect to logic A if and only if for every computably enumerable set S of statements in logic A, there exists a statement in logic B which imples exactly S closed under the deductive consequence relation of B. (We need to close the set, because we can't penalize logic B for not being able to imply x&y without automatically implying y, for example.)

Conjecture: Arithmetic is effectively complete with respect to itself.

Whether this is true or false, there are some interesting consequences. I'll note some of them in both directions, but keep in mind that (since only one direction is true) half of the consequences I mention will be false.

If the conjecture is false, then even if we are suspicious of mathematical entities other than numbers, it seems that we'll have good reason to want a more mathematically expressive language then first-order arithmetic. The idea of "effective completeness" as I defined it is fairly weak; it does not seem at all implausible to say that if we care about a domain, we should want a logic which is effectively complete with respect to it. So, if arithmetic is not, it seems like another is needed. Furthermore, these new statements may have the hope of probabilistic justification.

This does not necessarily justify set theory. It most certainly doesn't justify a Platonist attitude towards sets (ie, believing in sets as actual entities rather than mere symbol-manipulation). Still, it starts the climb towards abstraction, and that's a slippery slope. (ha ha, mixed metaphor.)

If the statement is true, on the other hand, that means first-order arithmetic can state anything that can be effectively stated. (That qualification, "effective", means: arithmetic can state any combination of statements that some computer program could generate in a (possibly nonterminating) sequence.) This would mean that any higher logic, in its capacity to refer to combinations of arithmetic statements, is unnecessary; in some sense it is already hiding within arithmetic, so it doesn't need to be invented seperately. Any higher-logic statement that could be justified probabilistically via its relationship to arithmetic could actually be interpreted as some statement in arithmetic: once we found the sentence's interpretation, we could not distinguish between the two in terms of their meaning.

So which is true?

I spent quite a while pondering it, and went down a few wrong paths trying to find the answer, but I won't waste time describing those. Eventually, I realized that I had an example right in front of me of a sentence that is not ever stated in the language of first-order arithmetic, yet assers an effectively enumerable set of such statements: mathematical induction!

Mathematical induction is always axiomized in peano arithmetic via the aid of an axiom schema. An axiom schema is an axiom with a hole in it, into which is supposed to be placed every possible statement that can be made in the system. Here is the induction schema:

"If X[0] and X[n]->X[n+1] for all n, then X[n] for all n."

In other words, if we can prove that X holds for zero, and that whenever X holds for a number it holds for the next number, then we have proved that X holds for all numbers.

"X" is the hole into which any particular statement that can be made about a number is to be dropped. This technique, making a statement about all statements, isn't actually avaliable to us in first-order arithmetic. Officially, the axiom schema is a stand-in for an infinite number of axioms.

So, you see, if everyone uses an axiom schema, I realized that it's probably because we can't summarize that infinite statement in the language of arithmetic alone. Furthermore, I realized that someone had probably proven that fact by now. So I looked it up, and it's true. My conjecture was false.

This means that higher logics are necessary, in a fairly strong sense. Which logics? Well, adding the truth predicate for arithmetic (ie jumping to the metalanguage for arithmetic) gets us effective completeness for first-order arithmetic statements. If we really were only concerned with arithmetic, we would stop there. But this new language will (I suspect) still not be effectively complete with respect to itself. So we could add a third layer to the cake, using a truth predicate for the second logic to form a meta-meta-language that would be effectively complete for it... probably, we can continue in this manner for the entire Tarski hierarchy of truth predicates. But instead, we could tackle head-on the problem of creating a language that is effectively complete with respect to itself.

Such a think might be impossible, I admit... it might lead to paradox in the same way that a language containing its own truth predicate can.

My idea is that, in addition to some standard stock of standard machinery (conjunction, negation, a domain such as arithmetic to talk about...), we add a turing-complete representation for constructing statement-enumerators. A statement-enumerator would be treated as the combined assertion of all statements it generated. These generators would be capable of generating other generator-sentences (otherwise it would just be another metalanguage).

A generator would be true if all its generated sentences were.

Sure enough, we can construct a generator that generates only its own negation. Assuming classical truth values, paradox ensues. So, we need to allow some sentences to be neither true nor false...

Same old game.

Edit-

Actually, this scheme might fail only because I'm overstepping what is needed for effective completeness. Specifically, the sentence "A generator would be true if all its generated sentences were" seems convenient, but is not necessary.

Friday, December 05, 2008

A General Theory

Well, I have been climbing the ladder of infinities for a while now on this blog. Soon I must get back to how this all applies to AI... but not quite yet.

Here is a theory that almost works.

Start with first-order logic. This can be your favorite non-classical variation if you wish; intuitionistic, paraconsistent, relevant, whatever you want. There is only one requirement: it needs to be strong enough to be Turing-complete (more specifically, logical consequence should be completely enumerable but not co-enumerable, thanks to the good old halting problem). Call this the base language.

Add to this your favorite theory of truth. For maximum effectiveness, it should include the infinite hierarchy of truth that I suggested in this post. This is not a difficult requirement: either a revision theory of truth or a fixed-point theory will do, as well as many less-well-known theories, I'm sure... and anyway, the requirement is not totally necessary, as I'll attempt to make clear. Anyway, call this language the language of truth. The theory of truth that you choose will assign actual truth-values to the statements in this language.

Now we have what we consider a solid theory of truth. But, as I pointed out in the message I copied in the previous post, all such theories appear to have referential gaps: some statements will have a status not nameable within the theory, which we can name and reason about as people standing outside of the theory. The type of referential gap will depend on the type of theory of truth that was used. In the most common case, the gap will be sentences that are not assigned either "true" or "false". The theory of truth will be able to state that a sentences is true, or false, but not that it is neither. Defenders of such theories of truth will attempt to claim that we really can't say that; for example, one way of arguing is saying that such sentences have undefined truth values, but we could later add logical conventions that ascribe true or false to them. However, such arguments are self-defeating: the argument needs to refer to the class of sentences that are in this intermediate state, so it generally must invent a label for them (such as "undefined"). This name is precisely the reference gap of the theory, and cannot be stated inside the theory.

So, the next step is to add to the language whatever labels we need in order to fill the gap(s) that exist in the theory of truth. I call this stage a theory of meaning, because I think it is important to point out that the theory of truth is not incomplete just because of the gaps; it may be a complete and valid theory of truth, it just is not a complete theory of logical/mathematical reference. Call the new gap-filling language the first language of meaning. I will generally pretend that there is only one gap, as in the simple case. Call this gap 1-meaningles. (The idea doesn't seem hard to modify to approaches that create multiple gaps.)

Assigning truth values to this language can generally be accomplished by relying on the original theory of truth that we chose. First, label the 1-meaningless sentences as such. Second, modify the theory of truth to act upon 3 truth values rather than the usual 2. This will involve some decisions, but as you can see I am not too concerned with details in this post. Generally, we'll have to decide things like whether it is 1-meaningless or just false to claim that a 1-meaningless sentences is true. Once we've made these decisions, we simply apply the method.

This, of course, creates another gap; having 3 truth values rather than 2 is not so radical as to change the result there. Call the new gap 2-meaningless, and call the language that includes it the second language of meaning. Assign truth-values to this language in the same way, by expanding the method to include 4 truth values.

By now you get the idea. We define 5-meaningless, 6-meaningless, and so on. And if you read the first post I mentioned (this post), then you'll probably also realize that I want to similarly define infinity-meaningless, inf+1-meaningless, inf+inf-meaningless, and so on. More specifically, I want to define a type of meaninglessness corresponding to every ordinal number. As I hinted at the beginning, this rather large hierarchy should smooth out most differences in referential power of the methods involved; so, a really weak initial theory of truth should still do the trick in the end, gaining maximal referential power after unstateably many iterations.

Now for the pleasant surprise. Once I've done this, I can prove that there is no referential gap left. If I had a final gap, it would correspond to an ordinal number larger than all other ordinal numbers (including itself)! This cannot be, so the theory is safe, almost as if it were protected by a magic charm.

For a few weeks now I've been satisfied with this conclusion. But, early on, I realized that I didn't know what the logic should say about a statement like "This sentence is either false or some type of meaningless". A day or so ago I realized what was going on. Each new language can refer to any combination of the truth-states from the levels under it, but obviously there is no top level (since there is no ordinal larger than all others), so we don't have permission to refer to any combination of values from any level we want; we are only allowed to refer to combinations that have some upper bound. The phrase "some type of meaningless" has no upper bound; it attempts to refer to the entire unbounded list.

There is some legitimate mathematical tradition that could be used to justify this limitation of the logic. One is not supposed to tamper with all ordinals at once. So, I could simply say that it's OK to stop here. Of course, I'm using the same sort of self-defeating argument that is so common in this area... in order to say that we can't refer to the list of all ordinals, I am doing exactly that.

Another way out would be to allow the newly-problematic statements to be both true and false, using some paraconsistent logic. This is similarly justified by the motto "one is not supposed to tamper with all ordinals at once". The taboo surrounding the list of all ordinals arises, of course, from the fact that contradictions follow quite quickly from reasoning about it. So, I could somewhat legitimately argue that it is literally an inconsistent yet well-defined mathematical entity.

However, this does not give me maximal expressive power... I will end up wanting to invent new notation to fill reference-gaps in the paraconsistent theory, and on we go.

So, a more direct approach would be to allow unrestricted statements about ordinals, and then do the same thing we've been doing... assign these statements truth values in the obvious ways, then apply an expanded theory of truth to add a truth predicate to that language, then call anything that isn't assigned a value "super-meaningless", expand the theory of truth to give that a predicate, invent 2-super-meaningless, 3-super-meaningless, inf-, and the whole ordinal hierarchy again. Then what? Well, we'll have the same sort of gap for this second ordinal hierarchy. So by doing the whole thing again, we can create super-super-meaningless, super-super-super-meaningless, and infinite versions with any ordinal number of supers . What next? Well, we'll have the same problem again...

But notice what I'm doing...

All of this is quite seriously illegal if we take the notion of ordinal number seriously, because I'm simply constructing a hierarchy of ordinals larger than all ordinals. An ordinal cannot be larger than all ordinals! And even less can there be a whole hierarchy up there...

This demonstrates the force of the two limitative arguments (either "no, you seriously cannot refer to those things" or "sure, go ahead, but you'll derive contradictions"). Even though there really really seems to be a solid next step that can be taken, it is either a brick wall or a gaping ravine...

So, like I said, it seems to be a theory that almost works.

Tuesday, November 11, 2008

Fixing Naive Lambda Logic

In this old post, I explained why naive lambda logic is paradoxical, and explained how to fix it. The fix that I suggested restricted lambda statements to represent finite computations, which is really what they are for anyway. However, in this recent post, I suggest that many naive, inconsistent logics (such as the naive lambda logic) are actually referentially complete: they can state any mathematically meaningful idea. The problem is that they also can state some meaningless things.

So, it is interesting to consider how one might fix naive lambda logic in a way that keeps the meaningful infinite stuff while removing the nonsense.

To do that, though, a definition of meaningfulness is needed. As I said in the more recent post, a statement's meaning needs to come ultimately from a basic level of meaningful statements (such as statements about sensory data). A meaningful statement should either be on this base level, or its meaning should be derived purely from other meaningful statements.

In the lambda logic, it is natural to think of a term as meaningful if it evaluates to something. A statement, then, is meaningful if its truth relies only on the value of meaningful terms. To make this precise, we've got to specify the infinite build-up process by which we assign meaning. For each operation that forms a new sentence, we've got to define how meaning is carried over. One of these rules, for example, will be:

If statement X is meaningful and statement Y is meaningful, then the statement X and Y is meaningful.

We need a rule like this for every operation: and, or, not, quantifiers, lambda, and the lambda-value relation. That may sound simple, but complications quickly build up. And is meaningful when its arguments are meaningful. But what about or? It seems like X or Y could be meaningfully true if X is true but Y is meaningless. But if we want this to work, then it would also be sensible to allow not Y to be meaningfully true when Y is meaningless. (That behavior of not would make the behavior of or that I mentioned consistent with classical logic.) But that equates meaningless with false, which seems wrong.

Another problem arises with the treatment of quantifiers. Do quantifiers range over all possible terms, or only meaningful ones? It makes a difference!

There are many different places we can run to get standard solutions to these problems: free logic, the revision theory of truth, fixed-point theories, and others.

A third problem, perhaps worse, arises from the concept "not meaningful". For a concept to be meaningful seems straightforward: it should be built up in a meaningful way from other meaningful statements. But trouble presents itself when we discuss non-meaning.

Imagine the base-level facts as ground from which trees are growing. The trees, of course, are the meaningful statements that can be built up. Meaningless statements would be branches hanging in midair, attempting to grow from themselves, or from other meaningless statements. (Meaningful statements can also have self-reference, but have an attachment to the ground somewhere.)

Now, when we consider in the concept "meaningless", we see some weird stuff happening: somehow the paradoxical branches that grow from nothing are able to support meaningful branches, such as the statement "This sentence is false" is meaningless. Or, even stranger, consider "This sentences is meaningless". It appears to be meaningful but false. Or, consider "This sentence is either false or meaningless". If it is true, it is false or meaningless; if it is false, it is true; if it is meaningless, then it is true. It looks like the only way to deal with it is to say that it is meaningless to ask which of the categories it is assigned to: it is meaningless to talk about its meaningfulness or meaninglessness.

To handle cases like these requires a sort of back-and-forth between meaningful and meaningless. We can't just grow meaning from the ground up and then declare the rest meaningless; in declaring things meaningless we allow more meaningful statements, so we've got to go back and add them in. That in turn might change the set of meaningless statements, and so on. If in doing this we are changing our assessments of various statements (going back and forth between "meaningful" and "meaningless"), then we are doing something similar to what the revision theory of truth recommends. On the other hand, I like the idea of marking things "definitely meaningful" and "definitely meaningless". A back-and-forth woulds still be needed, but all decisions would be final.

Anyway. Suppose we resolve all of those issues. Another interesting issue comes up: infinite lambda statements.

An infinite lambda statement could directly represent the mathematical entities that I want the system to be able to reason about. For example, an arbitrary real number would be any function from natural numbers to the integers 0 through 9 (if we want decimal notation), represented by a finite or infinite lambda statement. (Note: the calculation itself could always be fixed to halt in finite time, despite the lambda statement being infinite.) The interesting thing is that if the logic has been satisfactorily rigged up, it will be in some sense as if infinite lambda statements were allowed, even though they aren't.

This suggests that we need to be careful of the semantics, and therefore of how nonmonotonic reasoning is used. Are the quantifiers interpreted as ranging over all actually-representable lambda terms, or are they also ranging over the unrepresentable infinite ones? If the logic is to talk about unrepresentables properly, the quantifiers will have to range over them. But then nonmonotonic reasoning will not converge to the correct answers: it will converge to answers that hold for the finite terms only. This will sometimes be correct for the infinite case, but not always. The matter seems complicated, and I'm not yet sure how to deal with it.

Saturday, November 08, 2008

The Lure of Paraconsistency

A paraconsistent logic is a logic that allows some contradictions. To make this somewhat bearable, paraconsistent logics find various ways of stopping the rule of explosion: "from a contradiction, anything follows". The rule of explosion holds in both classical logic and intuitionistic logic, and makes inconsistent theories very uninteresting: there is only one of them, and in it all things are both true and false. Any theory that turns out to be inconsistent is logically equivalent to this boring theory.

So, contradictions are not as bad in paraconsistent logics-- the explosion is contained, so it's not like everything falls apart. We might still be irked, and ask how this counts as "logic" if it is explicitly contradictory, but at least we can do interesting formal work without worry.

For example, it would be possible to construct paraconsistant versions of naive set theory, my naive lambda calculus, and naive theories of truth. In the context of my most recent posts, it is interesting to consider a naive theory of logical implication: collapse all of the different levels of implication (the systems I named one, two, infinity, A(infinity, infinity)...) into a single implication relation that can talk about any level.

Now, the temptation: these "naive" foundational theories appear to be referentially complete! Frege was able to use naive set theory to construct basic mathematics, and indeed it doesn't seem as if there is any bound to what can be constructed in it. On my end, a naive theory of implication obviously contains all the levels I might construct: the system one is the fragment in which implication is only wrapped around first-order statements, two is the fragment in which implications are additionally allowed to be wrapped around statements in one, infinity is the fragment in which we can use any of one, two, three.... infinity^infinity allows us to use one, two, three,... infinity+1, infinity+2, infinity+3...

The problem, of course, is that in addition to allowing all meaningful statements to enter, we apparently let some meaningless statements through as well. But, at the moment, I know of no other way to let in all the meaningful ones! So paraconsistent logic is looking pretty good at the moment.

Essentially, it seems, a meaningful statement is any statement constructable in a naive theory that happens to have a truth value that relies on other meaningful statements. Those other meaningful statements gain their meaning from yet other meaningful statements, and so on down the line until we reach good solid first-order facts that everything relies on. But, the path down to basic statements can be arbitrarily complicated; so, it is impossible to construct a logic that contains all of the meaningful statements and none of the meaningless ones, because we can't know ahead of time which is which for every single case.

I found an argument that paraconsistent logic isn't the only way to preserve naive set theory (free version), but it apparently is only hinting at the possibility, not providing a concrete proposal. Actually, I've made some relevant speculations myself... in this post, towards the end, I talk about a "logic of definitions". Such a logic would be four-valued: true, false, both, neither. A definition can easily be incomplete (rendering some statements neither true nor false), but it can just as easily be inconsistant (rendering some statements both true and false). This is suited particularly well to the way the comprehension principle works in naive set theory; essentially, we can read the naive comprehension principle as stating that any set that has a definition, exists. The trouble comes from the fact that some such definitions are contradictory!

This seems nice; just alter naive set theory to use a four-valued logic and there you go, you've got your foundational logic that can do anything we might want to do. But I'm not about to claim that... first off, I haven't even begun to define how the four-valued logic would work. Second, that scheme omits the extra grounding that nonmonotonic methods seem somewhat capable of providing; I would want to look into that omission... Third, the non-classical manipulations provided by the four-valued logic may not be sufficiently similar to classical manipulations to justify much of classical mathematics. That would be a big warning sign. So, generally, the idea needs to be worked out in much more detail before it can be judged. (But, it looks like there is some work on four-valued logics... here, here, here... haven't read them yet.)

But, anyway, the naive theories (and the idea of using paraconsistant logic to make them workable) are quite valuable in that they provide a glimpse into the possible, showing that it is not utterly crazy to ask for a logic that can define any infinity a human can define... we just might have to give up consistency.

Tuesday, November 04, 2008

Normative Grounding

At the end of the last post, I mentioned a normative definition of grounding: a concept is grounded in a system if the system captures everything worth capturing about the way we reason about that concept. Perhaps "grounding" isn't the best term for this, but whatever you call it, this is an important criteria. This principle should also cut the other way: if something is not worth capturing, it should not be in the system. At the end of the last post I listed a few ways in which the system fails the first test: things that we do (to our advantage) that the system doesn't capture. But the system also has problems in the other direction: it does things that it has no good reason to do.

The expanded notion of meaning that I've made such a big deal about, which allows concepts that are meaningful in terms of other concepts but not in terms of base-level data, seems like a normative stretch. So what if a concept would be useful if we knew it to be true/false? The fact remains that it is useless, since we cannot know one way or the other!

Yet, we silly humans seem to use such concepts. We are (it seems) even able to prove or disprove a few of them, so that we do know them to be true or false. How and why?

The arithmetical hierarchy is defined in terms of a base-level class of computable predicates. However, thanks to the halting problem, it is impossible to tell which predicates are computable and which aren't. So, we never know for certain which level a statement is on. In terms of the formalism I've described, I suppose this would reflect our ability to disprove some statements of the form "true(...)" or "implies(...,...)"; for example, if we can show that a statement is a logical contradiction, then we know that it is not a tautology. (Do I need to add that to the list of manipulation rules?? Is it somehow justifiable if the system defines "true(X)" as "there exists a proof of X"??) So, some proof-searches are provably nonhalting, without resorting to mathematical induction... meaning some of the higher-level statements that appear undecidable will turn out to be decidable, and some that appear un-limit-decidable will turn out to be limit-decidable after all. Since we can't tell which ones will do this ahead of time, there may be normative justification for keeping them all...

Friday, October 31, 2008

Progress

After pondering the previous post for a bit, I became dissatisfied with the informality of it-- I know what I'm saying, but I'm leaving so much out that someone else could easily interpret what I said in a different way. So, I started over, this time stating things more explicitly. This turned out to be a fruitful path.

One: Start with first-order logic.
Two: Everyone seems to think that first-order logic is well-defined, but of course the fact is it cannot be defined in itself. So, we need to add the ability to define it. This can be done by adding some basic turing-complete formalism for the logic to manipulate, like lambda calculus or arithmetic or (perhaps most conveniently) first-order logic. So, we have some operator that means "implies in One", which takes two statements and is true if one can be deduced from the other with first-order manipulations. (We then use nonmonotonic logic when wewant to determine the truth value of the new operator.)
Three: OK, so we agree that system is well-defined. But, by Tarski's Undefinability Teorem, we know it can't be defined within itself. So, we add another operator, that means "implies in Two". This takes two statements and is true if the first implies the second by the rules of Two.
Infinity: We can obviously keep going in this manner. So, define the logic that has each of the "implies" operators that might be gained in this manner. For each number, it has the operator "imlies in [number]".
Infinity+1: By Tarski's theorem, this system also can't define its own truth. But, we just did, so we want our ideal logic to be able to as well. So, we add the "implies in Infinity" operator. (Maybe you've heard that infinity plus one is still just infinity? Well, that is true of cardinal infinities, but not ordinal infinities, which is what we have here. Fun stuff!)
Infinity+2: Add the "implies" relation for Infinity+1.
Infinity+Infinity: Keep doing that, so that we've now got 2 "implies" relations for each number: the one for just the number, and the one for infinity+number.
Infinity*infinity: Keep going on in this manner so that we have an infinite progression for each number, namely "implies in [number]", "implies in Infinity+[number]", "implies in Infinity+Infinity+[number]", ...
Infinity*infinity*Infinity...
Infinity^Infinity...
Infinity^(Infinity^Infinity)...
A(Infinity,Infinity)... (where A is the ackermann function)

Obviously I've gone on far longer than I need to to illustrate the pattern. But, it's fun! Anyway, the point is that finding higher and higher meaningful logics is "merely" a matter of finding higher and higher infinities.

Saturday, October 25, 2008

Somewhat Broader Definition of Meaning

These are some thoughts arising from a recent discussion I was involved with on the agi list. As usual, I'm looking for a semantics that follows closely from syntax. In this post, I discussed my first jump to a broader definition. Here, I'll discuss some issues involved in such definitions, developing a series of broader and broader definitions along the way.

One of the simplest possible views on logical meaning is that a statement's meaning is totally specified by its verification criteria. This point of view is called logical positivism, and the idea also resonates strongly with constructivism. There are several ways of making it more precise:

-A sentence has meaning if there is a computation that will always indicate "true" or "false" from some finite set of sensory observations, when those observations are available. -A sentence has meaning if there is a computation that gives "true" from some finite set of sensory observations, when the sentence is true and those observations are available. -A sentence has meaning if there is a computation that gives "false" from some finite set of sensory observations, when the sentence is false and those observations are available.

These variations are interesting because they seem small, but make a real difference. The first is the most stringent, obviously. The second requires only verifiability, and the third requires only falsifiability.

One interesting property is that all three are inherently difficult to check, thanks to the halting problem. We can tell that a statement is meaningful once we've determined that it is true or false, but we won't always be able to do that ahead of time. Furthermore, because of this, the term "meaningful" won't always be meaningful :).

So, the first definition allows any first-order statements that always reduce to either "true" or "false" when the proper sensory information is added: a class we cannot sort out ahead of time. The second admits all of these, plus an extra existential quantifier. The third instead allows an extra universal quantifier.

Another interesting point: for the second and third definitions, the negation of a meaningful statement is not necessarily meaningful. This is rather odd, and the problem doesn't arise for the first definition. The negation of a verifiable statement is falsifiable; the negation of a falsifiable statement is verifiable. So, if we accept both types of statements as meaningful, then we regain the ability to negate any meaningful statement.

-A sentence has meaning if there is a computation that, from some finite set of sensory observations, either always gives "true" when the sentence is true or always gives "false" when the sentence is false (when the observations are available).

But, an unwanted limitation still remains: we cannot arbitrarily compose meaningful statements. More specifically, we can't put any meaningfully true/false statement in the place of sensory data. To fix this, we can change the definition from one that relies on sensory information to one that can rely on any meaningful information:

-A sentence has meaning if there is a computation that, from some finite set of meaningful data, either would always give "true" when the sentence is true or always give "false" when the sentence is false (if the data were available).

The definition shifts to hypothetical because there are now some meaningful statements for which the data will never be available, since the truth of a meaningful statement can be unknowable (thanks to Godel's Incompleteness Theorem).

The discussion up to now does not depend on a choice between intuitionist and classical logic: both would work. This is largely because I've been discussing meaning, not truth. I have not actually given any definitions that specify when a statement is true or false. If we run a statement's computation, and it spits out "true" or "false", then obviously the statement is correspondingly true or false. However, what if we are stuck with a nonhalting case? The truth value is undefined. An intuitionist approach would leave the value undefined. (So, any truths that are unknowable according to Godel's incompleteness theorem are actually meaningful-but-undefined.) A classical approach would be more likely to define these facts: if the computation corresponding to a verifiable statement never outputs "true", then it is false; if the computation for a falsifiable statement never outputs "false", then it is true. (So, truths that are unknowable according to Godel's incompleteness theorem are meaningful but take an infinite amount of time to verify/falsify.)

I will generally use the classical version, but the reader should make up their own mind.

Notice: the extra assumptions that I add for the classical version cannot be stated in first-order logic. There are two reasons. First, if for some particular computation I attempt to say "if it doesn't output 'true', then the corresponding statement is false", then I am being circular: "doesn't output true" will be a statement of the sort that is either false (if the computation *does* output), or undefined. Since it is never definitely true, my statement carries no weight. (At this point the intuitionists in the audience should be saying "told you so".) Second, I need to state the assumption *for all sentences*, but so far my definitions have only allowed reference to a finite number of meaningful facts.

So, are the classical assumptions meaningful? I want to justify them in terms of manipulation rules: I want to set things up so that if some logicians came and looked at the workings of the logic, with no prior knowledge, they would ascribe to it the semantics I have in mind.

First. I want manipulation rules for the condition "if the computation doesn't output". For this, I can invoke the nonmonotonic logic that I described previously: we assume that the computation doesn't stop, and revise our belief when we find out that it does. Thus, our belief will eventually converge to the truth if we spend enough time checking. (We could also throw on fancer stuff, like weaker/stronger beliefs, so that our belief that the computation doesn't halt gets stronger as we run the computation for longer and longer.)

Second. It seems fairly simple to extend the definition to allow statements to be about an infinite number of base-facts. In fact, it is quite necessary to do so! So far, I've only been discussing statements that apply to a "finite number of meaningful statements". For sensory statements, that makes the system incapable of generalizing. We want a practical AI logic to be able to learn rules that will apply to any new sensory data; surely such statements are meaningful. But, this is a larger shift then may be at first obvious. In terms of the logic, universal statements were up until now merely procedural, used essentially to indicate that some inference step can be made repeatedly. This step allows them to actually reference an infinite number of things. Is that really "meaningful"? Someone might object, for example, that all beings have finite lifespan, so there is really only a finite amount of sensory data to be talked about. I would respond to that particular objection by saying that our lifespan is finite, but not bound by any particular number ahead of time. So, similarly, the reference to all sensory data isn't bound ahead of time. (In terms of the logic, this would mean that we need to be able to quantify over statements.)

Both of these modifications can be understood in terms of super-recursive algorithms. A super-recursive algorithm is an algorithm that is fairly easily implemented on a computer, yet inconsistent with formal models like Turing machines. Specifically, the first definition makes use of limit-computation, and the second makes use of both that and interactive computation.

A limit computation is a computation that only definitely gives the correct answer after an infinite amount of time. The more time we give it, the more likely the answer is to be correct. This sort of computation is commonly used to render some types of fractals, like the mandelbrot set. For each square in the image, a limit-computation is approximated with some finite cutoff time. The longer we allow, the closer the image is to being correct. Here, it is being used in the first case to characterize the concept of not halting, and in the second it could arise in situations where a statement quantifies over an infinite number of meaningful statements.

Interactive computation is what a computer does when it asks you (or the internet) questions during a process. A commonly-invoked image is the driver of a car. The driver is constantly adjusting to events that unfold in the surrounding traffic. The standard models of computation, based on Turing machines and similar constructions require that all input is present before starting, and output is only available once finished. To apply such models of computation to driving, we would need to break the whole process up into small input/output phases-- which does not seem very useful. Interactive computation is used in the above when a statement references an infinite number of sensory items, so that the evaluation of the truth value must be a continual process that continues to take into account more input from the environment.

So, what does the definition of meaning look like now?

-A sentence has meaning if, from some computably-specified set of meaningful data, there is either a computation that would eventually halt if the sentence is true and not otherwise, or a computation that would eventually halt if the sentence is false and not otherwise (if the data were available).

Notice: not all truths are even limit-computable. It may look like it from the definition, but it's not the case. That is because a sentence may now reference an infinite number of limit-computable sentences. We could attempt to share time between some finite-but-large number of these, but in some cases the computation will be provably non-convergent: we won't eventually get better and better answers as we spend more time. One way of interpreting this is to say that some statements don't have meaning in and of themselves, because they don't really have a working verification/falsification procedure tied to them. Instead, they are only meaningful relative to the larger web of statements.

Further expansions of the definition are imaginable... for example, what if we replace both instances of "computable" in the above with "meaningful"? But, I think that is enough speculation for now.

Sunday, October 05, 2008

Finite Sets Prove Mathematical Induction Correct

In this post, I asked two questions:

1. Where does our knowledge about nonhalting programs come from?

2. What grounds the mathematical concept of "set"?

One thing I didn't mention is that an answer to the second question would probably provide an answer to the first. This is because practically all of our knowledge about nonhalting programs can be proven in axiomatic set theory, and it is (somewhat) reasonable to assume that a "perfect" set theory (one that really did capture everything worth capturing about the human intuition of sets) would capture all of it.

Now, I should reserve myself a bit: explaining where the grounding comes from does not necessarily explain where the knowledge comes from, as illustrated with my logic that (in my opinion) has a grounded notion of "programs" and "computation", but which does not necessarily have any additional knowledge to accompany that grounding. But this is beside my point, which is...

Finite sets prove mathematical induction correct!

I mentioned in that post that most (if not all) knowledge about halting can be provided by the proof method called mathematical induction (no relation to the use of "induction" meaning "learning"). It is not difficult to prove the correctness of mathematical induction from set theory; but, I did not consider this route before, because full set theory is harder to justify then induction. However, I came across a paper by George Boolos called "The Justification of Mathematical Induction" which shows how this can, after all, be a useful approach.

The paper justifies mathematical induction from only an extremely minimal set theory, plus the transitivity of less-than (if a is less then b and b is less than c, a is less then c). The set-theoretic fact needed is approximately this:

Pick any number, n, and formula, F. There exists a set X at level n, containing all and only the those things existing at lower levels and satisfying F.

This is an axiom schema, because we can fill in any formula F to get a specific statement. (First-order logic does not actually have a way of saying "for all formulas".) It establishes levels of existence, one for each natural number. It is "extremely weak" because without more principles, it only requires the existence of each possible finite set! There is no grounding problem, because each set that is forced to exist can be explicitly manipulated. (Note: the principle does not rule out the existence of more complicated sets.) I think this is quite satisfying.

On the other hand, I don't have a good feel for where the extra knowledge "comes from"... I can read through the proof, but it is surprising that merely adding the ability to collect entities together into sets gives us so much power to prove important truths! It is dangerous, too. Dropping the notion of levels gives this schema:

Pick any formula F. There exists a set X, containing all and only the those things satisfying F.

This is the naive set comprehension, which (as is well known) leads to a contradiction known as Russel's paradox. Russel's paradox cannot be derived from the version I gave that is restricted to levels. however, perhaps Russel's paradox can still serve as a lesson that, more often than not, an axiom schema will yield unexpected results.

Remaining Questions:

How can talk of larger infinities be grounded, and how can their mathematical existence be proved? How can the way human mathematicians talk about them be justified? Or, is it all actually nonsense? How much of human mathematical knowledge does an AI need to be given before it can learn the rest? (Remember, all facts about halting are in principle learnable; the reason I'm worried about proving some of them is that humans seem to be able to. Also, humans seem capable of coming to know facts about convergence of processes, which are not in principle learnable or provable, unless we really can prove facts about halting.)

Wednesday, September 24, 2008

Some Comments on the Previous Post

One thing that is very interesting about what I talked about last time-- it provides a surprisingly practical purpose for detailed talk about infinities. The bigger infinities you know how to talk about, the more logical systems you can prove consistent, giving you more information about which Turing machines never halt. But this is very strange! These Turing machines are not being run for so long that they enter into the realm of higher infinities. They either halt in finite time, or keep looping for one infinity (the smallest infinity, sometimes called omega). Why, then, does talk of greater and greater infinities become useful?

The straightforward answer is to recount how that information is used to prove consistency (and therefore non-halting-ness) of a system. First, we assign a mathematical semantics to a system; a meaning for the system-states. Second, we show that the initial meanings are true. Third, we show that if one state is true, then the transition rules guarantee that the next state is true as well; in other words, we show that all manipulations are truth-preserving. This is the step where infinities come in. Although no infinities are involved in the actual states and transitions, the semantics may include infinities, so our reasoning may rely on facts about infinities. This sounds silly, but it creeps in more readily then you might expect. If the logic that we are proving consistent uses no infinity, then we must use a single infinity (omega) when proving it consistent; if it uses a small infinity, we must use a larger one to prove it consistent; and so on.

Utterly strange! Why does this method of proving consistency work? We're somehow making the proof easier by treating facts about finite stuff as if they were facts about infinite, unreachable stuffs.

So, it seems that self-reference (trying to prove oneself consistent) would give grounding for as many infinities as one might desire. OK, but, how do we actually get them? A normal logic is stuck with just the infinities it has, even when it starts to reason about itself.

Artificial Intelligence