Friday, October 31, 2008
Progress
After pondering the previous post for a bit, I became dissatisfied with the informality of it-- I know what I'm saying, but I'm leaving so much out that someone else could easily interpret what I said in a different way. So, I started over, this time stating things more explicitly. This turned out to be a fruitful path.
One: Start with first-order logic.
Two: Everyone seems to think that first-order logic is well-defined, but of course it cannot be defined in itself. So, we need to add the ability to define it. This can be done by adding some basic Turing-complete formalism for the logic to manipulate, like lambda calculus or arithmetic or (perhaps most conveniently) first-order logic itself. So, we have some operator that means "implies in One", which takes two statements and is true if the second can be deduced from the first with first-order manipulations. (We then use nonmonotonic logic when we want to determine the truth value of the new operator.)
Three: OK, so we agree that this system is well-defined. But, by Tarski's Undefinability Theorem, we know it can't be defined within itself. So, we add another operator, that means "implies in Two". This takes two statements and is true if the first implies the second by the rules of Two.
Infinity: We can obviously keep going in this manner. So, define the logic that has every "implies" operator gained along the way: for each number, it has the operator "implies in [number]".
Infinity+1: By Tarski's theorem, this system also can't define its own truth. But, we just did, so we want our ideal logic to be able to as well. So, we add the "implies in Infinity" operator. (Maybe you've heard that infinity plus one is still just infinity? Well, that is true of cardinal infinities, but not ordinal infinities, which is what we have here. Fun stuff!)
Infinity+2: Add the "implies" relation for Infinity+1.
Infinity+Infinity: Keep doing that, so that we've now got 2 "implies" relations for each number: the one for just the number, and the one for infinity+number.
Infinity*Infinity: Keep going in this manner so that we have an infinite progression for each number, namely "implies in [number]", "implies in Infinity+[number]", "implies in Infinity+Infinity+[number]", ...
Infinity*Infinity*Infinity...
Infinity^Infinity...
Infinity^(Infinity^Infinity)...
A(Infinity,Infinity)... (where A is the Ackermann function)
Obviously I've gone on far longer than I need to in order to illustrate the pattern. But, it's fun! Anyway, the point is that finding higher and higher meaningful logics is "merely" a matter of finding higher and higher infinities.
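Just to make the bookkeeping concrete: here is a minimal Python sketch (all names hypothetical, no actual proof search) of how the ordinal-indexed family of "implies" operators might be represented.

```python
# A minimal sketch (all names hypothetical) of the bookkeeping above:
# ordinal notations for the levels, and "implies in [level]" as a
# statement former. No actual proof search is implemented.
from dataclasses import dataclass
from typing import Tuple, Union

@dataclass(frozen=True)
class Omega:
    """Notation for ordinals below Infinity^Infinity: a polynomial in
    Infinity, highest power first; coeffs=(2, 3) means Infinity*2 + 3."""
    coeffs: Tuple[int, ...]

Level = Union[int, Omega]  # finite levels are plain ints

@dataclass(frozen=True)
class Implies:
    """The statement "`premise` implies `conclusion`" at `level`.
    Level 0 is plain first-order deducibility ("implies in One");
    level n+1 talks about deducibility in the level-n system, and
    so on transfinitely."""
    level: Level
    premise: str
    conclusion: str

# "implies in Two": Q is deducible from P & (P -> Q) at level 1.
s1 = Implies(1, "P & (P -> Q)", "Q")
# "implies in Infinity+1":
s2 = Implies(Omega((1, 1)), "A", "B")
print(s1, s2, sep="\n")
```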
Saturday, October 25, 2008
Somewhat Broader Definition of Meaning
These are some thoughts arising from a recent discussion I was involved in on the AGI list. As usual, I'm looking for a semantics that follows closely from syntax. In a previous post, I discussed my first jump to a broader definition. Here, I'll discuss some issues involved in such definitions, developing a series of broader and broader definitions along the way.
One of the simplest possible views on logical meaning is that a statement's meaning is totally specified by its verification criteria. This point of view is called logical positivism, and the idea also resonates strongly with constructivism. There are several ways of making it more precise:
-A sentence has meaning if there is a computation that will always indicate "true" or "false" from some finite set of sensory observations, when those observations are available.
-A sentence has meaning if there is a computation that gives "true" from some finite set of sensory observations, when the sentence is true and those observations are available.
-A sentence has meaning if there is a computation that gives "false" from some finite set of sensory observations, when the sentence is false and those observations are available.
These variations are interesting because they seem small, but make a real difference. The first is the most stringent, obviously. The second requires only verifiability, and the third requires only falsifiability.
One interesting property is that all three are inherently difficult to check, thanks to the halting problem. We can tell that a statement is meaningful once we've determined that it is true or false, but we won't always be able to do that ahead of time. Furthermore, because of this, the term "meaningful" won't always be meaningful :).
So, the first definition allows any first-order statements that always reduce to either "true" or "false" when the proper sensory information is added: a class we cannot sort out ahead of time. The second admits all of these, plus an extra existential quantifier. The third instead allows an extra universal quantifier.
Another interesting point: for the second and third definitions, the negation of a meaningful statement is not necessarily meaningful. This is rather odd, and the problem doesn't arise for the first definition. The negation of a verifiable statement is falsifiable; the negation of a falsifiable statement is verifiable. So, if we accept both types of statements as meaningful, then we regain the ability to negate any meaningful statement:
-A sentence has meaning if there is a computation that, from some finite set of sensory observations, either always gives "true" when the sentence is true or always gives "false" when the sentence is false (when the observations are available).
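To see the negation trick concretely, here is a toy Python sketch (hypothetical names): a "statement" is just a checker mapping a finite observation set to a verdict or to "no verdict yet", and negation simply swaps verdicts, turning a verifier into a falsifier and vice versa.

```python
# Toy sketch (hypothetical names): a "statement" is a checker mapping
# a finite set of observations to True ("true"), False ("false"), or
# None (no verdict yet).
from typing import Callable, FrozenSet, Optional

Checker = Callable[[FrozenSet[str]], Optional[bool]]

def negate(check: Checker) -> Checker:
    """Negation swaps verdicts: a verifier (only ever answers True)
    becomes a falsifier (only ever answers False), and vice versa."""
    def negated(obs: FrozenSet[str]) -> Optional[bool]:
        verdict = check(obs)
        return None if verdict is None else not verdict
    return negated

# A verifiable statement: "a raven is observed at some point".
def raven_seen(obs: FrozenSet[str]) -> Optional[bool]:
    return True if "raven" in obs else None  # verifiable, never falsified

no_raven_ever = negate(raven_seen)  # falsifiable, never verified
assert no_raven_ever(frozenset({"raven"})) is False
assert no_raven_ever(frozenset()) is None
```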
But, an unwanted limitation still remains: we cannot arbitrarily compose meaningful statements. More specifically, we can't put any meaningfully true/false statement in the place of sensory data. To fix this, we can change the definition from one that relies on sensory information to one that can rely on any meaningful information:
-A sentence has meaning if there is a computation that, from some finite set of meaningful data, either would always give "true" when the sentence is true or always give "false" when the sentence is false (if the data were available).
The definition shifts to the hypothetical because there are now some meaningful statements for which the data will never be available, since the truth of a meaningful statement can be unknowable (thanks to Gödel's Incompleteness Theorem).
The discussion up to now does not depend on a choice between intuitionist and classical logic: both would work. This is largely because I've been discussing meaning, not truth. I have not actually given any definitions that specify when a statement is true or false. If we run a statement's computation, and it spits out "true" or "false", then obviously the statement is correspondingly true or false. However, what if we are stuck with a nonhalting case? An intuitionist approach would leave the truth value undefined. (So, any truths that are unknowable according to Gödel's incompleteness theorem are actually meaningful-but-undefined.) A classical approach would be more likely to define these facts: if the computation corresponding to a verifiable statement never outputs "true", then it is false; if the computation for a falsifiable statement never outputs "false", then it is true. (So, truths that are unknowable according to Gödel's incompleteness theorem are meaningful but take an infinite amount of time to verify/falsify.)
I will generally use the classical version, but the reader should make up their own mind.
Notice: the extra assumptions that I add for the classical version cannot be stated in first-order logic. There are two reasons. First, if for some particular computation I attempt to say "if it doesn't output 'true', then the corresponding statement is false", then I am being circular: "doesn't output true" will be a statement of the sort that is either false (if the computation *does* output), or undefined. Since it is never definitely true, my statement carries no weight. (At this point the intuitionists in the audience should be saying "told you so".) Second, I need to state the assumption *for all sentences*, but so far my definitions have only allowed reference to a finite number of meaningful facts.
So, are the classical assumptions meaningful? I want to justify them in terms of manipulation rules: I want to set things up so that if some logicians came and looked at the workings of the logic, with no prior knowledge, they would ascribe to it the semantics I have in mind.
First. I want manipulation rules for the condition "if the computation doesn't output". For this, I can invoke the nonmonotonic logic that I described previously: we assume that the computation doesn't stop, and revise our belief when we find out that it does. Thus, our belief will eventually converge to the truth if we spend enough time checking. (We could also throw on fancier stuff, like weaker/stronger beliefs, so that our belief that the computation doesn't halt gets stronger as we run the computation for longer and longer.)
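A minimal sketch of that rule, under assumed representations (a computation is a generator we can step; all names hypothetical): believe "doesn't halt" by default, and revise the moment a halt is observed.

```python
# Sketch of the nonmonotonic rule (hypothetical names): a computation
# is a generator we can step; believe "doesn't halt" by default, and
# revise the moment it is observed to stop.
from typing import Iterator

def belief_in_nonhalting(program: Iterator[None],
                         budget: int) -> Iterator[bool]:
    """After running `program` for 1, 2, ..., `budget` steps, yield
    the current belief that it never halts."""
    halted = False
    for _ in range(budget):
        if not halted:
            try:
                next(program)        # run one more step
            except StopIteration:    # halting observed: revise belief
                halted = True
        yield not halted

def quick() -> Iterator[None]:
    for _ in range(3):
        yield  # a program that halts after three steps

print(list(belief_in_nonhalting(quick(), 6)))
# -> [True, True, True, False, False, False]
```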
Second. It seems fairly simple to extend the definition to allow statements to be about an infinite number of base-facts. In fact, it is quite necessary to do so! So far, I've only been discussing statements that apply to a "finite number of meaningful statements". For sensory statements, that makes the system incapable of generalizing. We want a practical AI logic to be able to learn rules that will apply to any new sensory data; surely such statements are meaningful. But, this is a larger shift than may at first be obvious. In terms of the logic, universal statements were up until now merely procedural, used essentially to indicate that some inference step can be made repeatedly. This step allows them to actually reference an infinite number of things. Is that really "meaningful"? Someone might object, for example, that all beings have finite lifespans, so there is really only a finite amount of sensory data to be talked about. I would respond to that particular objection by saying that our lifespan is finite, but not bounded by any particular number ahead of time. So, similarly, the reference to all sensory data isn't bounded ahead of time. (In terms of the logic, this would mean that we need to be able to quantify over statements.)
Both of these modifications can be understood in terms of super-recursive algorithms. A super-recursive algorithm is one that is fairly easily implemented on a computer, yet falls outside formal models like the Turing machine. Specifically, the first modification makes use of limit-computation, and the second makes use of both that and interactive computation.
A limit computation is a computation that only definitely gives the correct answer after an infinite amount of time. The more time we give it, the more likely the answer is to be correct. This sort of computation is commonly used to render some types of fractals, like the Mandelbrot set. For each square in the image, a limit-computation is approximated with some finite cutoff time. The longer we allow, the closer the image is to being correct. Here, it is used in the first case to characterize the concept of not halting, and in the second it could arise where a statement quantifies over an infinite number of meaningful statements.
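For instance, here is the usual escape-time approximation in Python: membership in the Mandelbrot set is a limit computation ("the orbit never escapes"), and the finite cutoff is exactly the truncation described above.

```python
# Escape-time rendering as a truncated limit computation: a point c is
# in the Mandelbrot set iff the orbit of 0 under z -> z*z + c *never*
# leaves radius 2, so membership is only settled in the limit.

def escapes_within(c: complex, cutoff: int) -> bool:
    """True if the orbit escapes within `cutoff` steps: a definite
    "outside the set" verdict. False means "no verdict yet"."""
    z = 0j
    for _ in range(cutoff):
        z = z * z + c
        if abs(z) > 2.0:
            return True
    return False

# Raising the cutoff only ever flips "inside" to "outside", never the
# reverse, so the picture converges to the truth as the cutoff grows.
for cutoff in (10, 100, 1000):
    print(cutoff, escapes_within(-0.75 + 0.1j, cutoff))
```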
Interactive computation is what a computer does when it asks you (or the internet) questions during a process. A commonly-invoked image is the driver of a car. The driver is constantly adjusting to events that unfold in the surrounding traffic. The standard models of computation, based on Turing machines and similar constructions, require that all input is present before starting and that output is only available once finished. To apply such models of computation to driving, we would need to break the whole process up into small input/output phases-- which does not seem very useful. Interactive computation is used in the above when a statement references an infinite number of sensory items, so that the evaluation of the truth value must be a continual process that continues to take into account more input from the environment.
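A sketch of what that looks like for a statement over an unbounded observation stream (hypothetical names): the evaluator is an ongoing process, emitting a revisable verdict after each new observation.

```python
# Sketch of interactive evaluation (hypothetical names): the statement
# "all ravens are black" ranges over an unbounded observation stream,
# so its evaluation is an ongoing process emitting revisable verdicts.
from typing import Iterable, Iterator, Tuple

def all_ravens_black(stream: Iterable[Tuple[str, str]]) -> Iterator[bool]:
    """After each (kind, color) observation, yield the tentative
    verdict. One counterexample settles it as False for good; it can
    be falsified, but never finally verified."""
    verdict = True
    for kind, color in stream:
        if kind == "raven" and color != "black":
            verdict = False
        yield verdict

observations = [("raven", "black"), ("swan", "white"), ("raven", "green")]
print(list(all_ravens_black(observations)))  # -> [True, True, False]
```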
So, what does the definition of meaning look like now?
-A sentence has meaning if, from some computably-specified set of meaningful data, there is either a computation that would eventually halt if the sentence is true and not otherwise, or a computation that would eventually halt if the sentence is false and not otherwise (if the data were available).
Notice: not all truths are even limit-computable. It may look like it from the definition, but it's not the case. That is because a sentence may now reference an infinite number of limit-computable sentences. We could attempt to share time between some finite-but-large number of these, but in some cases the computation will be provably non-convergent: we won't eventually get better and better answers as we spend more time. One way of interpreting this is to say that some statements don't have meaning in and of themselves, because they don't really have a working verification/falsification procedure tied to them. Instead, they are only meaningful relative to the larger web of statements.
Further expansions of the definition are imaginable... for example, what if we replace both instances of "computable" in the above with "meaningful"? But, I think that is enough speculation for now.
Sunday, October 05, 2008
Finite Sets Prove Mathematical Induction Correct
In a previous post, I asked two questions:
1. Where does our knowledge about nonhalting programs come from?
2. What grounds the mathematical concept of "set"?
One thing I didn't mention is that an answer to the second question would probably provide an answer to the first. This is because practically all of our knowledge about nonhalting programs can be proven in axiomatic set theory, and it is (somewhat) reasonable to assume that a "perfect" set theory (one that really did capture everything worth capturing about the human intuition of sets) would capture all of it.
Now, I should qualify this a bit: explaining where the grounding comes from does not necessarily explain where the knowledge comes from, as illustrated by my logic that (in my opinion) has a grounded notion of "programs" and "computation", but which does not necessarily have any additional knowledge to accompany that grounding. But this is beside my point, which is...
Finite sets prove mathematical induction correct!
I mentioned in that post that most (if not all) knowledge about halting can be provided by the proof method called mathematical induction (no relation to the use of "induction" meaning "learning"). It is not difficult to prove the correctness of mathematical induction from set theory; but, I did not consider this route before, because full set theory is harder to justify than induction itself. However, I came across a paper by George Boolos called "The Justification of Mathematical Induction" which shows how this can, after all, be a useful approach.
The paper justifies mathematical induction from only an extremely minimal set theory, plus the transitivity of less-than (if a is less than b and b is less than c, then a is less than c). The set-theoretic fact needed is approximately this:
Pick any number, n, and formula, F. There exists a set X at level n, containing all and only those things existing at lower levels and satisfying F.
This is an axiom schema, because we can fill in any formula F to get a specific statement. (First-order logic does not actually have a way of saying "for all formulas".) It establishes levels of existence, one for each natural number. It is "extremely weak" because without more principles, it only requires the existence of each possible finite set! There is no grounding problem, because each set that is forced to exist can be explicitly manipulated. (Note: the principle does not rule out the existence of more complicated sets.) I think this is quite satisfying.
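In rough first-order notation (my own rendering of the idea, not Boolos's exact formulation, with L(x) = n read as "x exists at level n"), each instance of the schema looks like:

```latex
% One axiom per formula F; my rendering, not Boolos's exact text.
\forall n \, \exists X \Big( L(X) = n \;\wedge\;
  \forall y \big( y \in X \leftrightarrow ( L(y) < n \wedge F(y) ) \big) \Big)
```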
On the other hand, I don't have a good feel for where the extra knowledge "comes from"... I can read through the proof, but it is surprising that merely adding the ability to collect entities together into sets gives us so much power to prove important truths! It is dangerous, too. Dropping the notion of levels gives this schema:
Pick any formula F. There exists a set X, containing all and only those things satisfying F.
This is naive set comprehension, which (as is well known) leads to a contradiction known as Russell's paradox. Russell's paradox cannot be derived from the version I gave that is restricted to levels. However, perhaps Russell's paradox can still serve as a lesson that, more often than not, an axiom schema will yield unexpected results.
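For reference, here is the two-line derivation behind the paradox: instantiate the naive schema with the formula F(y) defined as "y is not a member of y", then ask whether the resulting set is a member of itself.

```latex
% Instantiating naive comprehension with F(y) := y \notin y:
\exists R \, \forall y \, \big( y \in R \leftrightarrow y \notin y \big)
% Taking y := R gives the contradiction:
R \in R \;\leftrightarrow\; R \notin R
```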
Remaining Questions:
How can talk of larger infinities be grounded, and how can their mathematical existence be proved? How can the way human mathematicians talk about them be justified? Or, is it all actually nonsense?

How much of human mathematical knowledge does an AI need to be given before it can learn the rest? (Remember, all facts about halting are in principle learnable; the reason I'm worried about proving some of them is that humans seem to be able to. Also, humans seem capable of coming to know facts about the convergence of processes, which are not in principle learnable or provable, unless we really can prove facts about halting.)
Labels: foundations of mathematics, logic, set theory