Back to AI

OK, here we go.

AI is full of many fairly well-developed methods of "propositional" learning: probabilistic, fuzzy, and otherwise fancified versions of stuff we can do with boolean algebra but no quantifiers. In other words, current AI is fairly good at manipulating systems that have a fixed number of variables, but not so good at arbitrary logical structures.

A simple illustration of the two categories:

Propositional Version:

I give the AI data on 3,000 patients, which includes information about which of 20 common symptoms each patient had, and which of 10 diseases those patients were ultimately determined to be suffering from. The AI learns to predict disease from symptoms.

Predicate Version:

I give the AI data on 3,000 patients, with logical descriptions of symptoms and logical descriptions of diseases. Again, the AI is to learn to predict diseases from symptoms.

In the first case, the AI is learning a mapping from 20 variables (the twenty symptoms) to 10 variables (the diseases). In the second case, the AI is learning a mapping from one infinite space (possible logical descriptions of symptoms) to another (possible logical descriptions of diseases).
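To make the contrast concrete, here is a hypothetical sketch of the two data formats (all variable, entity, and relation names are invented for illustration): a propositional instance is an assignment to a fixed set of variables, while a predicate instance is an open-ended set of logical statements.

```python
# Propositional: every patient is an assignment to the same fixed
# variables (20 symptoms, 10 diseases).  Names are invented examples.
propositional_patient = {
    "fever": True, "cough": False,   # ... 18 more symptom variables
    "flu": True, "measles": False,   # ... 8 more disease variables
}

# Predicate: every patient is a set of logical statements; there is
# no fixed bound on how many entities or relations may appear.
predicate_patient = [
    ("has_symptom", "patient1", "fever"),
    ("began_before", "fever", "cough"),   # relation between symptoms
    ("has_disease", "patient1", "flu"),
]
```

The second format can describe structure (such as one symptom preceding another) that simply has no slot in the first.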

There are some borderline cases. For example, if we are learning to map arbitrary real numbers to arbitrary real numbers, the space is infinite, and we *could* treat the real numbers as logical entities rather than merely continuous variables, but this is almost never done; so, it is propositional. (Treating the real numbers as logical entities would mean worrying about interesting exact numbers like 1, 2, 3, pi, e, 1/2, and so on. For propositional methods, 1 is not any more special than 1.002.) It gets fuzzier... suppose we are learning about integers rather than real numbers. We could adapt the same propositional strategies we used in the real-number case, restricting them to integers. It might work well. But with integers there is a greater chance that the exact number matters, and a greater chance that the number should be treated as a logical entity. Perhaps a pattern relies on one of the integers being even. A propositional method will not see this, and will need to learn from the data which integers the pattern applies to and which it doesn't. So, it will only be able to extend the pattern to cases it has seen before. If this sort of pattern is likely, more sophisticated methods become critical. Yet, we're still working with a fixed number of variables. The more sophisticated methods become really critical when a given problem instance can contain a totally new variable, yet one that patterns from old variables might apply to. And then they become really really critical when problem instances can't even be totally separated, because they all fit into one big structure of logical relationships...
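A tiny sketch of the even-integer point (the function names are mine): a propositional learner can only memorize which integers the pattern held for, while a learner that treats integers as logical entities can express the rule itself and so extend it to unseen cases.

```python
def propositional_rule(x, seen_positives):
    # Propositional: the pattern is just the set of integers it was
    # observed to hold for; novel integers cannot be covered.
    return x in seen_positives

def relational_rule(x):
    # Relational: the pattern mentions a property of the integer
    # itself (evenness), so it generalizes to unseen integers.
    return x % 2 == 0

seen = {2, 4, 8}
propositional_rule(6, seen)   # 6 was never observed, so: False
relational_rule(6)            # even(6) holds, so: True
```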

The transition from the first type of AI to the second is now taking place, largely under the banner of "probabilistic relational models". Hooray!

(I've labeled the second type "predicate" to distinguish it from "propositional". This way of naming the two types is not uncommon, but many people use "relational" to describe the second class, instead. Another convenient term is "logical".)

Anyway, I'm going to outline one way of carrying over the progress that's been made with propositional models to the higher level... this is an idea I've been sort of thinking about on the back burner, partly as a way of making an inherently parallel learning method. By no means do I think this is the only way of going forward, and in fact I'll probably be working in a different direction myself for at least a little while... maybe I'll write about that later.

So, how can we use propositional algorithms in the wider domain (whatever we choose to call it)?

I described propositional models as models having no quantifiers. Really, though, a universal quantifier over all the variables is taken for granted in the propositional setting. If a propositional method learns that a particular variable is "2 or 3", it doesn't mean that for some one case it is either two or three; it means for any case it is either two or three. This single quantifier gives us some ability to do relational-style learning.

Suppose we want to learn a probabilistic model for pictures. We could treat every pixel as a variable, and learn correlations between them from the dataset. This approach would require a very large dataset before any interesting patterns could be found. It would need to re-learn every object in every possible location, because it would have no way of generalizing from one location to another. A smarter way to apply propositional methods to the problem would be to treat a small square, say 10 by 10, as a single "instance"; we learn correlations between variables in such squares. With this approach, we might be able to get something interesting from even one image. There is a cost, of course: where before it was impossible to generalize from one location in the picture to another, now it is impossible not to do so. For example, the method would not notice that an egg appears in the top right corner of each picture; it would simply notice that an egg is a common object. This can be helped by adding the location information as an extra variable.
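A minimal sketch of the patch idea (the function and its parameters are my own invention): each randomly chosen 10 by 10 window becomes one propositional training instance, tagged with its location so that location-dependent patterns remain learnable.

```python
import random

def sample_patches(image, size=10, n=100):
    """Treat each size-by-size window of `image` (a list of rows)
    as one propositional training instance."""
    h, w = len(image), len(image[0])
    instances = []
    for _ in range(n):
        r = random.randint(0, h - size)
        c = random.randint(0, w - size)
        patch = [row[c:c + size] for row in image[r:r + size]]
        # (r, c) as extra variables restores the ability to notice
        # patterns tied to a particular location in the picture.
        instances.append((patch, (r, c)))
    return instances
```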

More generally, for any structured data, this approach can be used to learn local structure. For linear data such as text, we can use a fixed number of letters rather than a fixed number of pixels. For tree-structured data (such as computer program code, HTML...), we can use a branching segment. But, we can go even further: there is a totally general mapping that will handle any logical structure. Any data can be represented in the form of predicate statements: a series of entities, each of which can have properties and relationships with other entities. Just as we can select a random 10 by 10 area of a picture and ask what the colors are, we can select 10 entities at random and ask what their properties and relationships are. Let's call this the "universal mapping".
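A hedged sketch of the universal mapping (the data layout is my own choice): pick k entities at random, then report their properties together with whichever relations hold among just those entities, yielding one propositional-style instance.

```python
import random

def universal_sample(entities, properties, relations, k=10):
    """One 'instance' under the universal mapping: k random
    entities, their properties, and the relations that hold
    among just those entities."""
    chosen = random.sample(entities, k)
    chosen_set = set(chosen)
    props = {e: properties.get(e, {}) for e in chosen}
    rels = [(name, a, b) for (name, a, b) in relations
            if a in chosen_set and b in chosen_set]
    return props, rels
```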

The universal mapping allows us to learn logical structure, but it does have some serious limitations. Suppose once again that we're looking at linearly structured data such as text, but in predicate calculus form, and with the universal mapping. We have a bunch of entities ordered by a next/previous relation, and with a "letter" property that distinguishes each entity as 'a', 'b', 'c', ... 'A', 'B', 'C', ... 'space', 'comma', 'period', et cetera. Now suppose that we sample entities at random. If we've got much text, it will be very unlikely that we'll pick letters that occur next to each other. We're learning a correct distribution, technically, but we're not looking at the "interesting" part of it (the line). The algorithm could eventually learn what it was supposed to, but it would take too much time, and besides that it would not be in a directly usable form (since the distribution we want is embedded in a larger, useless distribution). If we used a special-purpose linear mapping that did not sample so randomly, we'd be much better off.
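For contrast, a special-purpose linear mapping (a sketch; the window width is an arbitrary choice of mine) samples only adjacent entities, so every instance lands on the "interesting" part of the distribution:

```python
def linear_instances(text, width=3):
    # Every instance is a window of adjacent letters, so local
    # structure (which letters follow which) is always visible.
    return [text[i:i + width] for i in range(len(text) - width + 1)]

linear_instances("abcd", width=3)   # -> ['abc', 'bcd']
```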

OK. Anyway, that is the "state of the art", so to speak. These approaches are all basically types of Markov model. So, what is my new suggestion?

Well, first, let's look at one more relevant detail of the current propositional algorithms. Many of them are hierarchical in form. This means that rather than finding complex patterns directly in the set of variables, the algorithm creates additional variables, and finds simpler relationships. Simpler relationships that include additional entities amount to the more complex relationships that we could have looked for in the first place; but the process is easier to manage by abstracting it in this way. This abstraction is iterated: second and third levels of hidden variables can be created to manage increasingly complex relationships, which treat the lower levels exactly as the first level treats the visible variables.

Two recent successful examples are Hinton's deep belief nets and Numenta's Hierarchical Temporal Memory.

So in the propositional domain, we can talk about hidden variables. In the predicate domain, we can also talk about hidden variables, but we can add hidden relations and even hidden entities to the list. For propositional methods, hidden variables are just an efficiency issue. In the predicate domain, it is very different: hidden stuff dramatically increases the representational power (and therefore the model search space).

Starting simple with hidden variables: if each entity is allowed to possess hidden variables, then the modeling power has changed from a Markov model to a hidden Markov model.

Adding hidden entities and relations is enough to boost the representational power up to Turing-completeness: any computable pattern can be represented.

So, OK, enough background. Now, the question is, how can we best use the power of existing propositional methods to search a Turing-complete model space? Here is the idea I've been pondering recently...

For the moment, let's say we're working with linearly structured data, for simplicity. One turing-complete formalism is the cellular automaton. These bear some structural similarity to a hierarchical network of variables. The major difference is that all cells are the same, meaning that all variables are interacting with their neighbors in exactly the same manner. My basic idea is to semi-smoothly integrate deep belief network learning with cellular automaton learning. For the linearly structured data, then, the system would be searching for two-dimensional automata. Intuitively, learning a cellular automaton of hidden variables is similar to learning the physical laws that hold sway equally at any point in space. We can see directly how those physical laws work for the visible space, and we generalize, assuming that there are nearby hidden spaces that partially explain the visible space, and farther-off hidden spaces, and so on, but that all follow the same physical laws.
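To make "the same physical law at every point" concrete, here is a minimal sketch of one step of an elementary (one-dimensional, binary) cellular automaton: every cell is updated by the same local rule applied to itself and its neighbors. In the proposal, the local rule would be learned rather than fixed, the cells would hold hidden variables, and the rows would stack into the second dimension. Rule 110 (itself known to be Turing-complete) is used only as a stand-in example.

```python
def ca_step(cells, rule=110):
    """One step of an elementary cellular automaton: every cell is
    updated by the same local rule applied to itself and its two
    neighbours (wrapping around at the edges)."""
    n = len(cells)
    out = []
    for i in range(n):
        left = cells[(i - 1) % n]
        mid = cells[i]
        right = cells[(i + 1) % n]
        # The 3-bit neighbourhood indexes into the 8-bit rule number.
        idx = (left << 2) | (mid << 1) | right
        out.append((rule >> idx) & 1)
    return out
```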

Well, that was nearly 20 paragraphs of introduction for 1 paragraph of an idea... but, that introductory material was interesting in itself, and it will be useful to refer to in future posts. So, more later...

## Sunday, December 21, 2008

## Friday, December 05, 2008

A General Theory

Well, I have been climbing the ladder of infinities for a while now on this blog. Soon I must get back to how this all applies to AI... but not quite yet.

Here is a theory that almost works.

Start with first-order logic. This can be your favorite non-classical variation if you wish; intuitionistic, paraconsistent, relevant, whatever you want. There is only one requirement: it needs to be strong enough to be Turing-complete (more specifically, logical consequence should be completely enumerable but not co-enumerable, thanks to the good old halting problem). Call this the base language.

Add to this your favorite theory of truth. For maximum effectiveness, it should include the infinite hierarchy of truth that I suggested in this post. This is not a difficult requirement: either a revision theory of truth or a fixed-point theory will do, as well as many less-well-known theories, I'm sure... and anyway, the requirement is not totally necessary, as I'll attempt to make clear. Anyway, call this language the language of truth. The theory of truth that you choose will assign actual truth-values to the statements in this language.

Now we have what we consider a solid theory of truth. But, as I pointed out in the message I copied in the previous post, all such theories appear to have referential gaps: some statements will have a status not nameable within the theory, which we can name and reason about as people standing outside of the theory. The type of referential gap will depend on the type of theory of truth that was used. In the most common case, the gap will be sentences that are not assigned either "true" or "false". The theory of truth will be able to state that a sentence is true, or false, but not that it is neither. Defenders of such theories of truth will attempt to claim that we really can't say that; for example, one way of arguing is saying that such sentences have undefined truth values, but we could later add logical conventions that ascribe true or false to them. However, such arguments are self-defeating: the argument needs to refer to the class of sentences that are in this intermediate state, so it generally must invent a label for them (such as "undefined"). This name is precisely the reference gap of the theory, and cannot be stated inside the theory.

So, the next step is to add to the language whatever labels we need in order to fill the gap(s) that exist in the theory of truth. I call this stage a theory of meaning, because I think it is important to point out that the theory of truth is not incomplete just because of the gaps; it may be a complete and valid theory of truth, it just is not a complete theory of logical/mathematical reference. Call the new gap-filling language the first language of meaning. I will generally pretend that there is only one gap, as in the simple case. Call this gap 1-meaningless. (The idea doesn't seem hard to modify for approaches that create multiple gaps.)

Assigning truth values to this language can generally be accomplished by relying on the original theory of truth that we chose. First, label the 1-meaningless sentences as such. Second, modify the theory of truth to act upon 3 truth values rather than the usual 2. This will involve some decisions, but as you can see I am not too concerned with details in this post. Generally, we'll have to decide things like whether it is 1-meaningless or just false to claim that a 1-meaningless sentence is true. Once we've made these decisions, we simply apply the method.
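One hedged sketch of such decisions (my own choice of convention, using strong-Kleene-style tables extended with a third value): here, claiming that a 1-meaningless sentence is true comes out false rather than 1-meaningless, but the opposite convention would be equally workable.

```python
T, F, M = "true", "false", "1-meaningless"

def truth_of(v):
    # "It is true that S": false, not meaningless, when S is
    # 1-meaningless -- one of the two conventions mentioned above.
    return T if v == T else F

def neg(v):
    # Negation leaves meaninglessness alone.
    return {T: F, F: T, M: M}[v]

def conj(a, b):
    # Strong-Kleene-style conjunction extended with M: a false
    # conjunct settles the matter; otherwise M is contagious.
    if a == F or b == F:
        return F
    if a == M or b == M:
        return M
    return T
```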

This, of course, creates another gap; having 3 truth values rather than 2 is not so radical as to change the result there. Call the new gap 2-meaningless, and call the language that includes it the second language of meaning. Assign truth-values to this language in the same way, by expanding the method to include 4 truth values.

By now you get the idea. We define 5-meaningless, 6-meaningless, and so on. And if you read the first post I mentioned (this post), then you'll probably also realize that I want to similarly define infinity-meaningless, inf+1-meaningless, inf+inf-meaningless, and so on. More specifically, I want to define a type of meaninglessness corresponding to every ordinal number. As I hinted at the beginning, this rather large hierarchy should smooth out most differences in referential power of the methods involved; so, a really weak initial theory of truth should still do the trick in the end, gaining maximal referential power after unstateably many iterations.

Now for the pleasant surprise. Once I've done this, I can prove that there is no referential gap left. If I had a final gap, it would correspond to an ordinal number larger than all other ordinal numbers (including itself)! This cannot be, so the theory is safe, almost as if it were protected by a magic charm.

For a few weeks now I've been satisfied with this conclusion. But, early on, I realized that I didn't know what the logic should say about a statement like "This sentence is either false or some type of meaningless". A day or so ago I realized what was going on. Each new language can refer to any combination of the truth-states from the levels under it, but obviously there is no top level (since there is no ordinal larger than all others), so we don't have permission to refer to any combination of values from any level we want; we are only allowed to refer to combinations that have some upper bound. The phrase "some type of meaningless" has no upper bound; it attempts to refer to the entire unbounded list.

There is some legitimate mathematical tradition that could be used to justify this limitation of the logic. One is not supposed to tamper with all ordinals at once. So, I could simply say that it's OK to stop here. Of course, I'm using the same sort of self-defeating argument that is so common in this area... in order to say that we can't refer to the list of all ordinals, I am doing exactly that.

Another way out would be to allow the newly-problematic statements to be both true and false, using some paraconsistent logic. This is similarly justified by the motto "one is not supposed to tamper with all ordinals at once". The taboo surrounding the list of all ordinals arises, of course, from the fact that contradictions follow quite quickly from reasoning about it. So, I could somewhat legitimately argue that it is literally an inconsistent yet well-defined mathematical entity.

However, this does not give me maximal expressive power... I will end up wanting to invent new notation to fill reference-gaps in the paraconsistent theory, and on we go.

So, a more direct approach would be to allow unrestricted statements about ordinals, and then do the same thing we've been doing... assign these statements truth values in the obvious ways, then apply an expanded theory of truth to add a truth predicate to that language, then call anything that isn't assigned a value "super-meaningless", expand the theory of truth to give that a predicate, invent 2-super-meaningless, 3-super-meaningless, inf-super-meaningless, and the whole ordinal hierarchy again. Then what? Well, we'll have the same sort of gap for this second ordinal hierarchy. So by doing the whole thing again, we can create super-super-meaningless, super-super-super-meaningless, and infinite versions with any ordinal number of supers. What next? Well, we'll have the same problem again...

But notice what I'm doing...

All of this is quite seriously illegal if we take the notion of ordinal number seriously, because I'm simply constructing a hierarchy of ordinals larger than all ordinals. An ordinal cannot be larger than all ordinals! And even less can there be a whole hierarchy up there...

This demonstrates the force of the two limitative arguments (either "no, you seriously cannot refer to those things" or "sure, go ahead, but you'll derive contradictions"). Even though there really really seems to be a solid next step that can be taken, it is either a brick wall or a gaping ravine...

So, like I said, it seems to be a theory that almost works.

Labels:
foundations of mathematics,
infinity,
logic,
metamathematics

## Monday, November 24, 2008

I recently joined the everything list. It looks to be some good philosophical fun. The list practice is to submit a "joining post" that reviews intellectual background. I am replicating mine here.

----------------------------------

Hi everyone!

My name is Abram Demski. My interest, when it comes to this list, is:

what is the correct logic, the logic that can refer to (and reason

about) any mathematical structure? The logic that can define

everything definable? If every possible universe exists, then of

course we've got to ask which universes are possible. As someone

mentioned recently, a sensible approach is to take the logically

consistent ones. So, I'm asking: in what logic?

I am also interested in issues of specifying a probability

distribution over these probabilities, and what such a probability

distribution really means. Again there was some recent discussion on

this... I was very tempted to comment, but I wanted to lurk a while to

get the idea of the group before posting my join post.

Following is my view on what the big questions are when it comes to

specifying the correct logic.

The first two big puzzles are presented to us by Godel's

incompleteness theorem and Tarski's undefinability theorem. The way I

see it, Godel's theorem presents a "little" puzzle, which points us in

the direction of the "big" puzzle presented by Tarski's theorem.

http://en.wikipedia.org/wiki/

http://en.wikipedia.org/wiki/

The little puzzle is this: Godel's theorem tells us that any

sufficiently strong logic does not have a complete set of deduction

rules; the axioms will fail to capture all truths about the logical

entities we're trying to talk about. But if these entities cannot be

completely axiomized, then in what sense are they well-defined? How is

logic logical, if it is doomed to be incompletely specified? One way

out here is to say that numbers (which happen to be the logical

entities that Godel showed were doomed to incompleteness, though of

course the incompleteness theorem has since been generalized to other

domains) really are incompletely specified: the axioms are incomplete

in that they fail to prove every sentence about numbers either true or

false, but they are complete in that the ones they miss are in some

sense actually not specified by our notion of number. I don't like

this answer, because it is equivalent to saying that the halting

problem really has no answer in the cases where it is undecidable.

http://en.wikipedia.org/wiki/

Instead, I prefer to say that while decidable facts correspond to

finite computations, undecidable facts simply correspond to infinite

computations; so, there is still a well-defined procedure for deciding

them, it simply takes too long for us to complete. For the case of

number theory, this can be formalized with the arithmetical hierarchy:

http://en.wikipedia.org/wiki/

Essentially, each new quantifier amounts to a potentially infinite

number of cases we need to check. There are similar hierarchies for

broader domains:

http://en.wikipedia.org/wiki/

http://en.wikipedia.org/wiki/

http://en.wikipedia.org/wiki/

This brings us to the "big" puzzle. To specify the logic an refer to

any structure I want, I need to define the largest of these

hierarchies: a hierarchy that includes all truths of mathematics.

Unfortunately, Tarski's undefinability theorem presents a roadblock to

this project: If I can use logic L to define a hierarchy H, then H

will necessarily fail to include all truths of L. To describe the

hierarchy of truths for L, I will always need a more powerful language

L+1. Tarski proved this under some broad assumptions; since Tarski's

theorem completely blocks my project, it appears I need to examine

these assumptions and reject some of them.

I am, of course, not the first to pursue such a goal. There is an

abundant literature on theories of truth. From what I've seen, the

important potential solutions are Kripke's fixed-points, revision

theories, and paraconsistent theories:

http://en.wikipedia.org/wiki/

http://plato.stanford.edu/

http://en.wikipedia.org/wiki/

All of these solutions create reference gaps: they define a language L

that can talk about all of its truths, and therefore could construct

its own hierarchy in one sense, but in addition to simple true and

false more complicated truth-states are admitted that the language

cannot properly refer to. For Kripke's theory, we are unable to talk

about the sentences that are neither-true-nor-false. For revision

theories, we are unable to talk about which sentences have unstable

truth values or multiple stable truth values. In paraconsistent logic,

we are able to refer to sentences that are both-true-and-false, but we

can't state within the language that a statement is *only* true or

*only* false (to my knowledge; paraconsistent theory is not my strong

suit). So using these three theories, if we want a hierarchy that

defines all the truth value *combinations* within L, we're still out

of luck.

As I said, I'm also interested in the notion of probability. I disagree with Solomonoff's universal distribution, because it assumes that the universe is computable. I cannot say whether the universe we actually live in is computable or not; however, I argue that, regardless, an uncomputable universe is at least conceivable, even if it has low credibility. So, a universal probability distribution should include that possibility.

I also want to know exactly what it means to measure a probability. I think use of subjective probabilities is OK; a probability can reflect a state of belief. But, I think the reason that this is an effective way of reasoning is that these subjective probabilities tend to converge to the "true" probabilities as we gain experience. It seems to me that this "true probability" needs to be a frequency. It also seems to me that this would be meaningful even in universes that actually happened to have totally deterministic physics-- so by a "true probability" I don't mean to imply a physically random outcome, though I don't mean to rule it out either (like uncomputable universes, I think it should be admitted as possible).
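The convergence claim can be illustrated with a small simulation; the frequency, sample size, and Laplace-smoothed estimator below are my own illustrative choices.

```python
# Sketch: a subjective probability converging to a long-run frequency.
# The "world" below just emits outcomes with a fixed frequency of 0.3.
import random
random.seed(0)

true_freq = 0.3
successes = 0
n = 10000
for _ in range(n):
    successes += random.random() < true_freq

# Laplace-smoothed posterior mean: a subjective estimate that starts
# at 1/2 with no data and is pulled toward the observed frequency.
estimate = (successes + 1) / (n + 2)
```

Nothing here requires physical randomness: the same convergence story goes through if the stream of outcomes is generated deterministically, so long as the limiting frequency exists.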

Well, I think that is about it. For now.

## Tuesday, November 11, 2008

In this old post, I explained why naive lambda logic is paradoxical, and explained how to fix it. The fix that I suggested restricted lambda statements to represent finite computations, which is really what they are for anyway. However, in this recent post, I suggest that many naive, inconsistent logics (such as the naive lambda logic) are actually referentially complete: they can state any mathematically meaningful idea. The problem is that they also can state some meaningless things.

So, it is interesting to consider how one might fix naive lambda logic in a way that keeps the meaningful infinite stuff while removing the nonsense.

To do that, though, a definition of meaningfulness is needed. As I said in the more recent post, a statement's meaning needs to come ultimately from a basic level of meaningful statements (such as statements about sensory data). A meaningful statement should either be on this base level, or its meaning should be derived purely from other meaningful statements.

In the lambda logic, it is natural to think of a term as meaningful if it evaluates to something. A statement, then, is meaningful if its truth relies only on the value of meaningful terms. To make this precise, we've got to specify the infinite build-up process by which we assign meaning. For each operation that forms a new sentence, we've got to define how meaning is carried over. One of these rules, for example, will be:

If statement X is meaningful and statement Y is meaningful, then the statement X and Y is meaningful.

We need a rule like this for every operation: and, or, not, quantifiers, lambda, and the lambda-value relation. That may sound simple, but complications quickly build up. And is meaningful when its arguments are meaningful. But what about or? It seems like X or Y could be meaningfully true if X is true but Y is meaningless. But if we want this to work, then it would also be sensible to allow not Y to be meaningfully true when Y is meaningless. (That behavior of not would make the behavior of or that I mentioned consistent with classical logic.) But that equates meaningless with false, which seems wrong.

Another problem arises with the treatment of quantifiers. Do quantifiers range over all possible terms, or only meaningful ones? It makes a difference!

There are many different places we can run to get standard solutions to these problems: free logic, the revision theory of truth, fixed-point theories, and others.

A third problem, perhaps worse, arises from the concept "not meaningful". For a concept to be meaningful seems straightforward: it should be built up in a meaningful way from other meaningful statements. But trouble presents itself when we discuss non-meaning.

Imagine the base-level facts as ground from which trees are growing. The trees, of course, are the meaningful statements that can be built up. Meaningless statements would be branches hanging in midair, attempting to grow from themselves, or from other meaningless statements. (Meaningful statements can also have self-reference, but have an attachment to the ground somewhere.)
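The tree picture can be sketched as a monotone "grow from the ground" iteration; the statements and their encoding below are invented purely for illustration.

```python
# A statement is marked meaningful if it is base-level, or if every
# operand of its top connective has already been marked meaningful.
# Self-supporting "branches in midair" are never reached.

statements = {
    "rain": ("base",),
    "cold": ("base",),
    "s1":   ("and", "rain", "cold"),
    "s2":   ("not", "s1"),
    "loop": ("not", "loop"),   # grows only from itself: stays ungrounded
}

meaningful = set()
changed = True
while changed:
    changed = False
    for name, stmt in statements.items():
        if name not in meaningful and (
            stmt[0] == "base" or all(a in meaningful for a in stmt[1:])
        ):
            meaningful.add(name)
            changed = True
```

Note that "loop" never enters the set; but this one-directional pass cannot by itself certify the statement "loop is meaningless" as meaningful, which is exactly the trouble discussed next.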

Now, when we consider the concept "meaningless", we see some weird stuff happening: somehow the paradoxical branches that grow from nothing are able to support meaningful branches, such as the statement '"This sentence is false" is meaningless'. Or, even stranger, consider "This sentence is meaningless". It appears to be meaningful but false. Or, consider "This sentence is either false or meaningless". If it is true, it is false or meaningless; if it is false, it is true; if it is meaningless, then it is true. It looks like the only way to deal with it is to say that it is meaningless to ask which of the categories it is assigned to: it is meaningless to talk about its meaningfulness or meaninglessness.

To handle cases like these requires a sort of back-and-forth between meaningful and meaningless. We can't just grow meaning from the ground up and then declare the rest meaningless; in declaring things meaningless we allow more meaningful statements, so we've got to go back and add them in. That in turn might change the set of meaningless statements, and so on. If in doing this we are changing our assessments of various statements (going back and forth between "meaningful" and "meaningless"), then we are doing something similar to what the revision theory of truth recommends. On the other hand, I like the idea of marking things "definitely meaningful" and "definitely meaningless". A back-and-forth would still be needed, but all decisions would be final.

Anyway. Suppose we resolve all of those issues. Another interesting issue comes up: infinite lambda statements.

An infinite lambda statement could directly represent the mathematical entities that I want the system to be able to reason about. For example, an arbitrary real number would be any function from natural numbers to the integers 0 through 9 (if we want decimal notation), represented by a finite or infinite lambda statement. (Note: the calculation itself could always be fixed to halt in finite time, despite the lambda statement being infinite.) The interesting thing is that if the logic has been satisfactorily rigged up, it will be in some sense as if infinite lambda statements were allowed, even though they aren't.
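For instance, a computable real can be given as exactly such a digit function, where every individual digit query is a finite computation; the rational-number example below is my own illustration.

```python
# A real number as a function from natural numbers to digits 0-9:
# here, the n-th decimal digit of a rational p/q with 0 <= p < q.
# Each call halts, even though the full expansion is infinite.
def decimal_digit(p, q, n):
    for _ in range(n):
        p = (p * 10) % q   # shift past the first n digits
    return (p * 10) // q   # read off digit n
```

So, for example, the expansion of 1/8 begins 1, 2, 5, 0, ...; an *infinite* lambda statement would drop the requirement that the digit function itself have a finite description.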

This suggests that we need to be careful of the semantics, and therefore of how nonmonotonic reasoning is used. Are the quantifiers interpreted as ranging over all actually-representable lambda terms, or are they also ranging over the unrepresentable infinite ones? If the logic is to talk about unrepresentables properly, the quantifiers will have to range over them. But then nonmonotonic reasoning will not converge to the correct answers: it will converge to answers that hold for the finite terms only. This will sometimes be correct for the infinite case, but not always. The matter seems complicated, and I'm not yet sure how to deal with it.

## Monday, November 10, 2008

1. There is a "ground level" composed of statements in first-order logic that can be reasoned about in the usual manner.

2. The axiom of set comprehension is interpreted as allowing arbitrary definitions of sets (rather than asserting the actual existence of sets). A general logic of definitions would allow us to define any sort of object with any sort of definition, and reason about it from there; in this case, though, I only am worried about defining sets.

3. The definition of a set S can be "for all x: x is an element of S if and only if [statement]" for any statement (in the full language, not just the first-order part).

4. If "x is an element of S" can be derived from the definition of S plus any relevant information on x, then it is so. Similarly for "x is not an element of S". "Relevant information" here means the definition of x if x is a defined set, plus all information existing at the first-order level. Notice that the definition of S might imply both "x is an element of S" and "x is not an element of S", or neither of these.

That's it! Unless I've missed something. But, anyway, one thing that is very specifically not included is a way for contradictions (or anything) to leak over from the level of defined objects to the "ground level" of normal logic.

Oh, I suppose this should be added:

5. Statements of the full language can be reasoned about with some sort of intuitionistic paraconsistent logic.

This should allow compound statements to be constructed; for example, if two different statements about sets are true by definition, then "statement1 and statement2" should be treated as a true sentence. I think an OK way of doing this would be to allow the Gentzen-style [Edit: oh, I meant Prawitz-style!] introduction/elimination rules for and, or, forall, and exists, skipping over the rules applying to implication and negation. But, don't take my word for it!

Now, the system is definitely not referentially complete. It lacks the ability to refer to its own possible truth-value combinations. I'd need to add a nonmonotonic operator allowing the logic to conclude that a statement was not true-by-definition and not false-by-definition, conclusions which would of course gain credibility as attempts to prove the statement true/false fail. This gets somewhat confusing, because false-by-definition is distinct from not true-by-definition. Anyway, soon I'd be adding in the full machinery of a logic of "implies". So, the four-valued set theory doesn't seem like a complete foundation. Still, it is interesting!
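The four possible outcomes of point 4 can be made concrete with a small sketch; modeling a definition as a pair of predicates (one licensing membership, one licensing non-membership) is my own encoding, purely for illustration.

```python
# A "definition" is a pair of predicates: one licensing
# "x is an element of S", one licensing "x is not an element of S".
# They may overlap (both) or both stay silent (neither).
def status(defn, x):
    member, non_member = defn[0](x), defn[1](x)
    if member and non_member:
        return "both"
    if member:
        return "true"
    if non_member:
        return "false"
    return "neither"

evens   = (lambda x: x % 2 == 0, lambda x: x % 2 == 1)  # total, consistent
partial = (lambda x: x > 10,     lambda x: x < 0)       # silent on 0..10
clash   = (lambda x: True,       lambda x: x == 7)      # contradictory at 7
```

Crucially, a "both" or "neither" verdict stays at the level of defined objects; nothing here licenses a contradiction leaking down to the ground level.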

## Saturday, November 08, 2008

A paraconsistent logic is a logic that allows some contradictions. To make this somewhat bearable, paraconsistent logics find various ways of stopping the rule of explosion: "from a contradiction, anything follows". The rule of explosion holds in both classical logic and intuitionistic logic, and makes inconsistent theories very uninteresting: there is only one of them, and in it all things are both true and false. Any theory that turns out to be inconsistent is logically equivalent to this boring theory.

So, contradictions are not as bad in paraconsistent logics-- the explosion is contained, so it's not like everything falls apart. We might still be irked, and ask how this counts as "logic" if it is explicitly contradictory, but at least we can do interesting formal work without worry.

For example, it would be possible to construct paraconsistent versions of naive set theory, my naive lambda calculus, and naive theories of truth. In the context of my most recent posts, it is interesting to consider a naive theory of logical implication: collapse all of the different levels of implication (the systems I named one, two, infinity, A(infinity, infinity)...) into a single implication relation that can talk about any level.

Now, the temptation: these "naive" foundational theories appear to be referentially complete! Frege was able to use naive set theory to construct basic mathematics, and indeed it doesn't seem as if there is any bound to what can be constructed in it. On my end, a naive theory of implication obviously contains all the levels I might construct: the system one is the fragment in which implication is only wrapped around first-order statements, two is the fragment in which implications are additionally allowed to be wrapped around statements in one, infinity is the fragment in which we can use any of one, two, three.... infinity^infinity allows us to use one, two, three,... infinity+1, infinity+2, infinity+3...

The problem, of course, is that in addition to allowing all meaningful statements to enter, we apparently let some meaningless statements through as well. But, at the moment, I know of no other way to let in all the meaningful ones! So paraconsistent logic is looking pretty good at the moment.

Essentially, it seems, a meaningful statement is any statement constructable in a naive theory that happens to have a truth value that relies on other meaningful statements. Those other meaningful statements gain their meaning from yet other meaningful statements, and so on down the line until we reach good solid first-order facts that everything relies on. But, the path down to basic statements can be arbitrarily complicated; so, it is impossible to construct a logic that contains all of the meaningful statements and none of the meaningless ones, because we can't know ahead of time which is which for every single case.

I found an argument that paraconsistent logic isn't the only way to preserve naive set theory (free version), but it apparently is only hinting at the possibility, not providing a concrete proposal. Actually, I've made some relevant speculations myself... in this post, towards the end, I talk about a "logic of definitions". Such a logic would be four-valued: true, false, both, neither. A definition can easily be incomplete (rendering some statements neither true nor false), but it can just as easily be inconsistent (rendering some statements both true and false). This is suited particularly well to the way the comprehension principle works in naive set theory; essentially, we can read the naive comprehension principle as stating that any set that has a definition exists. The trouble comes from the fact that some such definitions are contradictory!

This seems nice; just alter naive set theory to use a four-valued logic and there you go, you've got your foundational logic that can do anything we might want to do. But I'm not about to claim that... first off, I haven't even begun to define how the four-valued logic would work. Second, that scheme omits the extra grounding that nonmonotonic methods seem somewhat capable of providing; I would want to look into that omission... Third, the non-classical manipulations provided by the four-valued logic may not be sufficiently similar to classical manipulations to justify much of classical mathematics. That would be a big warning sign. So, generally, the idea needs to be worked out in much more detail before it can be judged. (But, it looks like there is some work on four-valued logics... here, here, here... haven't read them yet.)
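One standard way such a four-valued logic gets worked out is Belnap/Dunn-style, encoding each value by which of "told true" and "told false" hold; whether this matches the logic of definitions I have in mind is exactly the sort of thing that would need checking.

```python
# Belnap/Dunn-style four values, encoded as (told_true, told_false):
T, F = (1, 0), (0, 1)
B, N = (1, 1), (0, 0)   # both / neither

def neg(a):
    # negation swaps the true- and false-assertions
    return (a[1], a[0])

def conj(a, b):
    # conjunction: true needs both true-parts; false needs either false-part
    return (a[0] & b[0], a[1] | b[1])
```

Note the fixed points: negating "both" or "neither" leaves them unchanged, which is one reason explosion fails in such logics.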

But, anyway, the naive theories (and the idea of using paraconsistent logic to make them workable) are quite valuable in that they provide a glimpse into the possible, showing that it is not utterly crazy to ask for a logic that can define any infinity a human can define... we just might have to give up consistency.

## Tuesday, November 04, 2008

Normative Grounding

At the end of the last post, I mentioned a normative definition of grounding: a concept is grounded in a system if the system captures everything worth capturing about the way we reason about that concept. Perhaps "grounding" isn't the best term for this, but whatever you call it, this is an important criterion. This principle should also cut the other way: if something is not worth capturing, it should not be in the system. At the end of the last post I listed a few ways in which the system fails the first test: things that we do (to our advantage) that the system doesn't capture. But the system also has problems in the other direction: it does things that it has no good reason to do.

The expanded notion of meaning that I've made such a big deal about, which allows concepts that are meaningful in terms of other concepts but not in terms of base-level data, seems like a normative stretch. So what if a concept would be useful if we knew it to be true/false? The fact remains that it is useless, since we cannot know one way or the other!

Yet, we silly humans seem to use such concepts. We are (it seems) even able to prove or disprove a few of them, so that we do know them to be true or false. How and why?

The arithmetical hierarchy is defined in terms of a base-level class of computable predicates. However, thanks to the halting problem, it is impossible to tell which predicates are computable and which aren't. So, we never know for certain which level a statement is on. In terms of the formalism I've described, I suppose this would reflect our ability to disprove some statements of the form "true(...)" or "implies(...,...)"; for example, if we can show that a statement is a logical contradiction, then we know that it is not a tautology. (Do I need to add that to the list of manipulation rules?? Is it somehow justifiable if the system defines "true(X)" as "there exists a proof of X"??) So, some proof-searches are provably nonhalting, without resorting to mathematical induction... meaning some of the higher-level statements that appear undecidable will turn out to be decidable, and some that appear un-limit-decidable will turn out to be limit-decidable after all. Since we can't tell which ones will do this ahead of time, there may be normative justification for keeping them all...

Labels:
foundations of mathematics,
grounding,
hyperlogic,
logic

## Sunday, November 02, 2008

In the previous post, I described a way of constructing arbitrarily powerful logics. This description does not amount to an algorithm-- it invokes something human mathematicians do (listing larger infinities), but doesn't provide a way to implement that on a computer. Worse, it is known that by the standard definition of algorithm, such an algorithm does not exist; I can only hope that it does exist as a limit-computable algorithm. (Note: I just found out that the term "super-recursive algorithm" that I used last time can refer to algorithms decidedly not implementable on normal computer hardware... so I guess there is not a single term that refers to exactly those super-recursive algorithms that are implementable on standard hardware. "Limit-computable" is closer...) But, I think this is likely, since I think humans use such an algorithm.

Anyway. What I did not describe was a way of "grounding" each new level. I have previously discussed at length an idea for grounding "two", based on non-monotonic logic and an understanding of meaning within the network of beliefs rather than only in terms of the outside world. But, what about grounding the higher operators? With no manipulation rules, they are quite pointless!

Actually, the operators are not exactly the problem. If we wrap two statements in an "implies" statement, the truth/falsehood of the result relies on no more than the validity of the deduction from the first statement to the second. In other words, in some sense we aren't adding anything; we're just making explicit something we already do. But then something strange happens: we're suddenly able to wrap quantifiers around the statement. This becomes problematic. At that stage, we've got to create the manipulation rules that the next stage will wrap up in an explicit operator.

Actually, the same problem applies for the system two. I haven't specifically talked about how one statement in two might be derived from another.

But, really, the use of the "implies" operator is not essential: I could just as well have used a "true" operator that took one statement. "A implies B" is true in the metalogic when "A -> B" is provably true in the base logic, with "->" representing truth-functional implication.

Reducing "implies" to "true" in this way changes the appearance of the whole system greatly-- it makes obvious the direct connection to Tarski's Undefinability Theorem and the hierarchy of truth predicates that it was originally taken to entail.

So, the manipulation rules for "true" are sufficient to account for the manipulation of "implies". First, we can adopt the famous T-schema:

something <=> true(something)

This does not lead to paradox because there is indeed a hierarchy established, rather than a single "true".

Second, boolean combinations and sentences with quantifiers are subject to the typical deduction rules.

Third, quantifiers are additionally subject to fallible rules: existential quantifiers are fallibly refuted by (infallibly) refuting many instances, and universal quantifiers are fallibly affirmed by (infallibly) affirming many instances. (One may wish to allow fallible evidence as well, but there is no normative justification for doing so: the resulting beliefs will never converge to the truth!)

Fourth, "true(something)" is to be interpreted as "exists(affirmation of something in a lower logical system)", so that it is fallibly refutable like other existential statements. ("Affirmation" means proof if the lower system is one, limit-proof if the system is two, and so on. (So, in general, a proof of finite or infinite length.) Different logical primitives could be used in order to let this definition of "true" actually occur within the system; in particular, we could introduce more and more powerful quantifiers rather than more and more powerful truth-operators.)
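The fallible quantifier rules from the third point can be sketched as bounded searches; the bound and the example predicates below are arbitrary illustrations, and the verdicts are defeasible by construction.

```python
# Fallibly refute "exists n: P(n)" by failing to find a witness below a
# bound; fallibly affirm "forall n: P(n)" by finding no counterexample.
# Either verdict may be revised if the bound is raised.
def fallibly_refute_exists(pred, bound):
    return not any(pred(n) for n in range(bound))

def fallibly_affirm_forall(pred, bound):
    return all(pred(n) for n in range(bound))
```

These verdicts converge to the truth in the limit of larger bounds only because the underlying instance-checks are infallible, which is the normative point made above about not admitting fallible evidence at the instance level.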

So, do these definitions ground the system's mathematical knowledge (assuming we've also supplied the system with a method of enumerating infinities)? Well, it's hard to say, isn't it? "Grounding" is a philosophical issue. We need a concrete definition. Try this: a system's knowledge of X is grounded iff it reasons about X in the way that a human who knew about X would reason. More precisely, it captures everything worth capturing about the human way of reasoning. The answer to the question still isn't obvious; I do not have at my disposal a list of all the things worth capturing about human mathematical reasoning. But, I do have a partial list of things that I think are worth capturing...

-first-order reasoning

-limit-computable fallible reasoning

-arithmetical meaning

-set-theoretic meaning

-mathematical induction

-transfinite induction

The theory above does not totally encompass all of limit-computation (it uses sigma-1 and pi-1 but not delta-2!). It does not entirely explain set-theoretical intuition (it includes "grounding" for uncountable entities, since it (hypothetically) enumerates any infinity a mathematician might; but, it doesn't support direct talk about sets). It doesn't support mathematical induction or transfinite mathematical induction.

So, still work to be done!

## Friday, October 31, 2008

Progress

After pondering the previous post for a bit, I became dissatisfied with the informality of it-- I know what I'm saying, but I'm leaving so much out that someone else could easily interpret what I said in a different way. So, I started over, this time stating things more explicitly. This turned out to be a fruitful path.

One: Start with first-order logic.

Two: Everyone seems to think that first-order logic is well-defined, but of course the fact is it cannot be defined in itself. So, we need to add the ability to define it. This can be done by adding some basic Turing-complete formalism for the logic to manipulate, like lambda calculus or arithmetic or (perhaps most conveniently) first-order logic. So, we have some operator that means "implies in One", which takes two statements and is true if the second can be deduced from the first with first-order manipulations. (We then use nonmonotonic logic when we want to determine the truth value of the new operator.)

Three: OK, so we agree that this system is well-defined. But, by Tarski's Undefinability Theorem, we know it can't be defined within itself. So, we add another operator, that means "implies in Two". This takes two statements and is true if the first implies the second by the rules of Two.

Infinity: We can obviously keep going in this manner. So, define the logic that has each of the "implies" operators that might be gained in this manner. For each number, it has the operator "implies in [number]".

Infinity+1: By Tarski's theorem, this system also can't define its own truth. But, we just did, so we want our ideal logic to be able to as well. So, we add the "implies in Infinity" operator. (Maybe you've heard that infinity plus one is still just infinity? Well, that is true of cardinal infinities, but not ordinal infinities, which is what we have here. Fun stuff!)

Infinity+2: Add the "implies" relation for Infinity+1.

Infinity+Infinity: Keep doing that, so that we've now got 2 "implies" relations for each number: the one for just the number, and the one for infinity+number.

Infinity*infinity: Keep going on in this manner so that we have an infinite progression for each number, namely "implies in [number]", "implies in Infinity+[number]", "implies in Infinity+Infinity+[number]", ...

Infinity*infinity*Infinity...

Infinity^Infinity...

Infinity^(Infinity^Infinity)...

A(Infinity,Infinity)... (where A is the Ackermann function)

Obviously I've gone on far longer than I need to to illustrate the pattern. But, it's fun! Anyway, the point is that finding higher and higher meaningful logics is "merely" a matter of finding higher and higher infinities.
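To make the tower slightly more tangible, here is a toy level-checker for the finite part of the hierarchy; the formula encoding is my own invention, and the ordinal levels are left out entirely.

```python
# Level of a formula: atoms live in One; "implies in k" may only wrap
# formulas of level at most k, and the result lives one level higher.
def level(formula):
    if formula[0] == "atom":
        return 1
    _, k, a, b = formula          # ("implies", k, antecedent, consequent)
    if level(a) > k or level(b) > k:
        raise ValueError("implies-operator at level %d is too weak" % k)
    return k + 1

p, q = ("atom", "p"), ("atom", "q")
pq = ("implies", 1, p, q)         # a statement of Two
```

Tarski's theorem shows up here as the fact that no single finite k suffices: wrapping "implies in k" around a level-(k+1) statement is rejected, which is what forces the climb through the ordinals.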

Labels:
foundations of mathematics,
hyperlogic,
infinity,
logic,
meaning

## Saturday, October 25, 2008

These are some thoughts arising from a recent discussion I was involved with on the agi list. As usual, I'm looking for a semantics that follows closely from syntax. In this post, I discussed my first jump to a broader definition. Here, I'll discuss some issues involved in such definitions, developing a series of broader and broader definitions along the way.

One of the simplest possible views on logical meaning is that a statement's meaning is totally specified by its verification criteria. This point of view is called logical positivism, and the idea also resonates strongly with constructivism. There are several ways of making it more precise:

-A sentence has meaning if there is a computation that will always indicate "true" or "false" from some finite set of sensory observations, when those observations are available.

-A sentence has meaning if there is a computation that gives "true" from some finite set of sensory observations, when the sentence is true and those observations are available.

-A sentence has meaning if there is a computation that gives "false" from some finite set of sensory observations, when the sentence is false and those observations are available.

These variations are interesting because they seem small, but make a real difference. The first is the most stringent, obviously. The second requires only verifiability, and the third requires only falsifiability.

One interesting property is that all three are inherently difficult to check, thanks to the halting problem. We can tell that a statement is meaningful once we've determined that it is true or false, but we won't always be able to do that ahead of time. Furthermore, because of this, the term "meaningful" won't always be meaningful :).

So, the first definition allows any first-order statements that always reduce to either "true" or "false" when the proper sensory information is added: a class we cannot sort out ahead of time. The second admits all of these, plus an extra existential quantifier. The third instead allows an extra universal quantifier.

Another interesting point: for the second and third definitions, the negation of a meaningful statement is not necessarily meaningful. This is rather odd, and the problem doesn't arise for the first definition. The negation of a verifiable statement is falsifiable; the negation of a falsifiable statement is verifiable. So, if we accept both types of statements as meaningful, then we regain the ability to negate any meaningful statement.
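A concrete pair of examples (my own, with a bounded scan standing in for an unbounded search): over a stream of numeric observations, "some reading exceeds 10" fits the second definition, while its negation "every reading is at most 10" fits only the third.

```python
import itertools

# Verifiable: halts with True on a witness; otherwise no verdict (None).
def verify_some_exceeds(stream, limit):
    for x in itertools.islice(stream, limit):
        if x > 10:
            return True
    return None   # silence, not falsity

# Falsifiable: halts with False on a counterexample; otherwise no verdict.
def falsify_all_at_most(stream, limit):
    for x in itertools.islice(stream, limit):
        if x > 10:
            return False
    return None
```

The two procedures scan for the very same event; the duality is just in how the verdict is read, which is why negation swaps the two classes.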

-A sentence has meaning if there is a computation that, from some finite set of sensory observations, either always gives "true" when the sentence is true or always gives "false" when the sentence is false (when the observations are available).

But, an unwanted limitation still remains: we cannot arbitrarily compose meaningful statements. More specifically, we can't put any meaningfully true/false statement in the place of sensory data. To fix this, we can change the definition from one that relies on sensory information to one that can rely on any meaningful information:

-A sentence has meaning if there is a computation that, from some finite set of meaningful data, either would always give "true" when the sentence is true or always give "false" when the sentence is false (if the data were available).

The definition shifts to hypothetical because there are now some meaningful statements for which the data will never be available, since the truth of a meaningful statement can be unknowable (thanks to Gödel's incompleteness theorem).

The discussion up to now does not depend on a choice between intuitionist and classical logic: both would work. This is largely because I've been discussing meaning, not truth. I have not actually given any definitions that specify when a statement is true or false. If we run a statement's computation, and it spits out "true" or "false", then obviously the statement is correspondingly true or false. However, what if we are stuck with a nonhalting case? An intuitionist approach would leave the truth value undefined. (So, any truths that are unknowable according to Gödel's incompleteness theorem are actually meaningful-but-undefined.) A classical approach would be more likely to define these facts: if the computation corresponding to a verifiable statement never outputs "true", then it is false; if the computation for a falsifiable statement never outputs "false", then it is true. (So, truths that are unknowable according to Gödel's incompleteness theorem are meaningful but take an infinite amount of time to verify/falsify.)

I will generally use the classical version, but the reader should make up their own mind.

Notice: the extra assumptions that I add for the classical version cannot be stated in first-order logic. There are two reasons. First, if for some particular computation I attempt to say "if it doesn't output 'true', then the corresponding statement is false", then I am being circular: "doesn't output true" will be a statement of the sort that is either false (if the computation *does* output), or undefined. Since it is never definitely true, my statement carries no weight. (At this point the intuitionists in the audience should be saying "told you so".) Second, I need to state the assumption *for all sentences*, but so far my definitions have only allowed reference to a finite number of meaningful facts.

So, are the classical assumptions meaningful? I want to justify them in terms of manipulation rules: I want to set things up so that if some logicians came and looked at the workings of the logic, with no prior knowledge, they would ascribe to it the semantics I have in mind.

First. I want manipulation rules for the condition "if the computation doesn't output". For this, I can invoke the nonmonotonic logic that I described previously: we assume that the computation doesn't stop, and revise our belief when we find out that it does. Thus, our belief will eventually converge to the truth if we spend enough time checking. (We could also throw on fancier stuff, like weaker/stronger beliefs, so that our belief that the computation doesn't halt gets stronger as we run the computation for longer and longer.)
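The default-and-revise scheme can be sketched in a few lines of Python. The budgeted runner and the example computation (one that happens to halt on its 1000th step) are hypothetical illustrations of the idea, not part of any real system:

```python
def believes_nonhalting(step, budget):
    """Default assumption: the computation never halts.  Run it for
    `budget` steps and revise the assumption only if it actually halts."""
    for _ in range(budget):
        if step():
            return False          # it halted: belief revised
    return True                   # defeasible belief: "does not halt"

# A hypothetical computation that halts on its 1000th step:
counter = {"t": 0}
def step():
    counter["t"] += 1
    return counter["t"] >= 1000

print(believes_nonhalting(step, 100))    # True: short run, belief survives
counter["t"] = 0
print(believes_nonhalting(step, 5000))   # False: longer run reveals halting
```

Raising the budget over time is what makes the belief converge to the truth: for a genuinely nonhalting computation the answer stays True at every budget, and for a halting one it eventually flips to False.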

Second. It seems fairly simple to extend the definition to allow statements to be about an infinite number of base-facts. In fact, it is quite necessary to do so! So far, I've only been discussing statements that apply to a "finite number of meaningful statements". For sensory statements, that makes the system incapable of generalizing. We want a practical AI logic to be able to learn rules that will apply to any new sensory data; surely such statements are meaningful. But, this is a larger shift than may at first be obvious. In terms of the logic, universal statements were up until now merely procedural, used essentially to indicate that some inference step can be made repeatedly. This step allows them to actually reference an infinite number of things. Is that really "meaningful"? Someone might object, for example, that all beings have finite lifespan, so there is really only a finite amount of sensory data to be talked about. I would respond to that particular objection by saying that our lifespan is finite, but not bound by any particular number ahead of time. So, similarly, the reference to all sensory data isn't bound ahead of time. (In terms of the logic, this would mean that we need to be able to quantify over statements.)

Both of these modifications can be understood in terms of super-recursive algorithms. A super-recursive algorithm is one that can readily be approximated on an ordinary computer, yet falls outside standard formal models such as the Turing machine. Specifically, the first modification makes use of limit-computation, and the second makes use of both that and interactive computation.

A limit computation is a computation that only definitely gives the correct answer after an infinite amount of time. The more time we give it, the more likely the answer is to be correct. This sort of computation is commonly used to render some types of fractals, like the Mandelbrot set. For each square in the image, a limit-computation is approximated with some finite cutoff time. The longer we allow, the closer the image is to being correct. Here, it is being used in the first case to characterize the concept of not halting, and in the second it could arise in situations where a statement quantifies over an infinite number of meaningful statements.
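The Mandelbrot case makes the asymmetry concrete. A point is in the set exactly when the iteration z → z² + c never escapes; "outside" verdicts are certain, while "inside" verdicts are only provisional at any finite cutoff. A minimal sketch:

```python
def in_mandelbrot(c, cutoff):
    """Finite approximation of the limit computation: iterate z -> z^2 + c
    for `cutoff` steps.  Escaping |z| > 2 is a certain "outside"; surviving
    the cutoff is only a tentative "inside", correct in the limit."""
    z = 0j
    for _ in range(cutoff):
        z = z * z + c
        if abs(z) > 2:
            return False  # definitely outside: this direction is certain
    return True  # tentatively inside; a longer run might still revise this

print(in_mandelbrot(0j, 100))      # True: 0 never escapes
print(in_mandelbrot(1 + 0j, 100))  # False: 1 escapes within a few steps
```

This is exactly the structure of the nonhalting condition: "the computation never outputs" is an inside-the-set style fact, only settled in the limit.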

Interactive computation is what a computer does when it asks you (or the internet) questions during a process. A commonly-invoked image is the driver of a car. The driver is constantly adjusting to events that unfold in the surrounding traffic. The standard models of computation, based on Turing machines and similar constructions, require that all input be present before starting, and that output only be available once finished. To apply such models of computation to driving, we would need to break the whole process up into small input/output phases-- which does not seem very useful. Interactive computation is used in the above when a statement references an infinite number of sensory items, so that the evaluation of the truth value must be a continual process that continues to take into account more input from the environment.
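This never-finished, always-updating kind of evaluation can be sketched with a Python generator. The monitored sentence ("every observation is even") is a made-up example; the point is only that the verdict is maintained continually rather than delivered once at the end:

```python
def monitor_all_even():
    """Interactive evaluation of "every observation is even": the process
    never terminates, but yields its current verdict after each new input."""
    verdict = True
    while True:
        observation = yield verdict
        if observation % 2 != 0:
            verdict = False  # falsified; the verdict never recovers

m = monitor_all_even()
next(m)            # start the coroutine; initial verdict is True
print(m.send(2))   # True
print(m.send(4))   # True
print(m.send(3))   # False: the universal claim is now falsified forever
```

In Turing-machine terms we would have to chop this into separate input/output runs; the generator keeps its state across inputs, which is the whole point of the interactive model.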

So, what does the definition of meaning look like now?

-A sentence has meaning if, from some computably-specified set of meaningful data, there is either a computation that would eventually halt if the sentence is true and not otherwise, or a computation that would eventually halt if the sentence is false and not otherwise (if the data were available).

Notice: not all truths are even limit-computable. It may look like it from the definition, but it's not the case. That is because a sentence may now reference an infinite number of limit-computable sentences. We could attempt to share time between some finite-but-large number of these, but in some cases the computation will be provably non-convergent: we won't eventually get better and better answers as we spend more time. One way of interpreting this is to say that some statements don't have meaning in and of themselves, because they don't really have a working verification/falsification procedure tied to them. Instead, they are only meaningful relative to the larger web of statements.
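The "share time between some finite-but-large number of these" idea is usually called dovetailing. Here is a sketch of just the schedule (ignoring what each computation actually does): round n gives one step to each of computations 0 through n-1, so every individual computation receives unboundedly many steps in the limit, even though no finite prefix of the schedule touches the whole infinite family:

```python
import itertools

def dovetail_schedule(total_steps):
    """Triangular time-sharing over an unbounded family of computations.
    Returns how many steps each computation received within the budget."""
    steps_given = {}
    used = 0
    for n in itertools.count(1):          # round n serves computations 0..n-1
        for i in range(n):
            if used == total_steps:
                return steps_given
            steps_given[i] = steps_given.get(i, 0) + 1
            used += 1

print(dovetail_schedule(10))  # {0: 4, 1: 3, 2: 2, 3: 1}
```

The non-convergent cases in the paragraph above are precisely those where no amount of this time-sharing yields answers that stabilize.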

Further expansions of the definition are imaginable... for example, what if we replace both instances of "computable" in the above with "meaningful"? But, I think that is enough speculation for now.

## Sunday, October 05, 2008

Finite Sets Prove Mathematical Induction Correct

In this post, I asked two questions:

1. Where does our knowledge about nonhalting programs come from?

2. What grounds the mathematical concept of "set"?

One thing I didn't mention is that an answer to the second question would probably provide an answer to the first. This is because practically all of our knowledge about nonhalting programs can be proven in axiomatic set theory, and it is (somewhat) reasonable to assume that a "perfect" set theory (one that really did capture everything worth capturing about the human intuition of sets) would capture all of it.

Now, I should reserve myself a bit: explaining where the grounding comes from does not necessarily explain where the knowledge comes from, as illustrated with my logic that (in my opinion) has a grounded notion of "programs" and "computation", but which does not necessarily have any additional knowledge to accompany that grounding. But this is beside my point, which is...

Finite sets prove mathematical induction correct!

I mentioned in that post that most (if not all) knowledge about halting can be provided by the proof method called mathematical induction (no relation to the use of "induction" meaning "learning"). It is not difficult to prove the correctness of mathematical induction from set theory; but, I did not consider this route before, because full set theory is harder to justify than induction. However, I came across a paper by George Boolos called "The Justification of Mathematical Induction" which shows how this can, after all, be a useful approach.

The paper justifies mathematical induction from only an extremely minimal set theory, plus the transitivity of less-than (if a is less than b and b is less than c, a is less than c). The set-theoretic fact needed is approximately this:

Pick any number, n, and formula, F. There exists a set X at level n, containing all and only those things existing at lower levels and satisfying F.

This is an axiom schema, because we can fill in any formula F to get a specific statement. (First-order logic does not actually have a way of saying "for all formulas".) It establishes levels of existence, one for each natural number. It is "extremely weak" because without more principles, it only requires the existence of each possible finite set! There is no grounding problem, because each set that is forced to exist can be explicitly manipulated. (Note: the principle does not rule out the existence of more complicated sets.) I think this is quite satisfying.
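The "each set that is forced to exist can be explicitly manipulated" claim can be made concrete. Modeling hereditarily finite sets as Python frozensets, each level's universe is literally constructible from the one below (here I take the full powerset at each step, the strongest instance of the schema with F always true; the modeling choice is mine):

```python
from itertools import chain, combinations

def next_level(universe):
    """One level up: every set whose members already exist at the current
    level (here simply the full powerset of the current universe)."""
    subs = chain.from_iterable(
        combinations(list(universe), k) for k in range(len(universe) + 1))
    return {frozenset(s) for s in subs}

V = frozenset()            # level 0: nothing exists yet
sizes = []
for n in range(1, 5):      # levels 1..4
    V = next_level(V)
    sizes.append(len(V))
print(sizes)  # [1, 2, 4, 16]: every finite set appears at some finite level
```

Nothing infinite is ever required: each level is a finite, explicitly listable object, which is exactly why the grounding problem does not arise.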

On the other hand, I don't have a good feel for where the extra knowledge "comes from"... I can read through the proof, but it is surprising that merely adding the ability to collect entities together into sets gives us so much power to prove important truths! It is dangerous, too. Dropping the notion of levels gives this schema:

Pick any formula F. There exists a set X, containing all and only those things satisfying F.

This is naive set comprehension, which (as is well known) leads to a contradiction known as Russell's paradox. Russell's paradox cannot be derived from the version I gave that is restricted to levels. However, perhaps Russell's paradox can still serve as a lesson that, more often than not, an axiom schema will yield unexpected results.
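One way to see why the level restriction blocks the paradox is to run the Russell comprehension over a level-bounded universe (again modeling finite sets as frozensets; the particular two-set universe is just an example):

```python
# Over any level-bounded universe of hereditarily finite sets, no set is a
# member of itself, so the "Russell set" R = {s : s not in s} is just the
# whole universe.  R itself lives one level higher, so "R in R" comes out
# plainly false instead of contradictory.
empty = frozenset()
universe = {empty, frozenset([empty])}        # some sets existing at level 2

R = {s for s in universe if s not in s}       # comprehension, bounded to a level
print(R == universe)        # True: every level-bounded set qualifies
print(frozenset(R) in R)    # False: R sits at the next level, outside its scope
```

The unrestricted schema, by contrast, would demand that R range over *all* sets, including itself, and that is where the contradiction comes from.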

Remaining Questions:

How can talk of larger infinities be grounded, and how can their mathematical existence be proved? How can the way human mathematicians talk about them be justified? Or, is it all actually nonsense? How much of human mathematical knowledge does an AI need to be given before it can learn the rest? (Remember, all facts about halting are in principle learnable; the reason I'm worried about proving some of them is that humans seem to be able to. Also, humans seem capable of coming to know facts about convergence of processes, which are not in principle learnable or provable, unless we really can prove facts about halting.)

Labels:
foundations of mathematics,
logic,
set theory

## Wednesday, September 24, 2008

Some Comments on the Previous Post

One thing that is very interesting about what I talked about last time-- it provides a surprisingly practical purpose for detailed talk about infinities. The bigger infinities you know how to talk about, the more logical systems you can prove consistent, giving you more information about which Turing machines never halt. But this is very strange! These Turing machines are not being run for so long that they enter into the realm of higher infinities. They either halt in finite time, or keep looping for one infinity (the smallest infinity, sometimes called omega). Why, then, does talk of greater and greater infinities become useful?

The straightforward answer is to recount how that information is used to prove consistency (and therefore non-halting-ness) of a system. First, we assign a mathematical semantics to a system; a meaning for the system-states. Second, we show that the initial meanings are true. Third, we show that if one state is true, then the transition rules guarantee that the next state is true as well; in other words, we show that all manipulations are truth-preserving. This is the step where infinities come in. Although no infinities are involved in the actual states and transitions, the semantics may include infinities, so our reasoning may rely on facts about infinities. This sounds silly, but it creeps in more readily than you might expect. If the logic that we are proving consistent uses no infinity, then we must use a single infinity (omega) when proving it consistent; if it uses a small infinity, we must use a larger one to prove it consistent; and so on.
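The three-step argument can be miniaturized in code. This toy system (states are integers, the only manipulation rule is x → x + 2, and reaching an odd state counts as "inconsistency") is my own example, and it is so simple that its semantics needs no infinities at all; the point is only to show the shape of the argument:

```python
initial = 0

def semantics(state):
    """Step 1: assign a meaning to states ("the state is even")."""
    return state % 2 == 0

def transition(state):
    """The system's only manipulation rule."""
    return state + 2

# Step 2: the meaning of the initial state is true.
assert semantics(initial)

# Step 3: transitions are truth-preserving, so no finite run ever reaches
# an "inconsistent" (odd) state -- hence a program that halts only upon
# finding an inconsistency runs forever.
state = initial
for _ in range(10_000):
    state = transition(state)
    assert semantics(state)
print("truth preserved for 10,000 steps")
```

For a real logic, step 3 is where the semantics of quantifiers drags infinite structures into the reasoning, even though the states and transitions themselves remain finite.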

Utterly strange! Why does this method of proving consistency work? We're somehow making the proof easier by treating facts about finite stuff as if they were facts about infinite, unreachable stuff.

So, it seems that self-reference (trying to prove oneself consistent) would give grounding for as many infinities as one might desire. OK, but, how do we actually get them? A normal logic is stuck with just the infinities it has, even when it starts to reason about itself.

Labels:
foundations of mathematics,
logic,
self-reference

## Monday, September 22, 2008

1. Where does our knowledge about nonhalting programs come from?

An arithmetical logic is doomed to incompleteness because no logic can capture all the facts about nonhalting programs. However, in practice, we humans use logics that capture some of these facts. Which ones, and why? Well, mostly, we justify what we know about nonhalting programs via a special proof method called mathematical induction. The correctness of arguments by mathematical induction is fairly obvious (once the method of argument has been properly explained to a person, anyway); but why? What justification can be provided? I've read a justification based on infinitary logic, but while interesting, the easiest way to formalize the argument in first-order logic would just rely on mathematical induction again. So how do we humans know mathematical induction is right?

I've argued in the past that the best way to deal with the halting problem is to (1) assume a program doesn't halt, and (2) run it for a really long time, revising the assumption if it turns out that it does halt. This general idea could also account for us humans having a certain amount of inborn knowledge about the halting problem: evolution weeds out those with wrong inborn assumptions (which lead to wrong results), but favors those with the right assumptions (which lead to correct reasoning). But, is this really where we get mathematical induction? Is it just something that happens to work, and got embedded in our genes? That doesn't seem right! It seems totally implausible!

A second source of our knowledge about the halting problem is our seeming knowledge about the consistency of formal systems. Gödel's second incompleteness theorem shows that any (sufficiently powerful) formal system cannot prove its own consistency. But, as humans, we reason that if the system is correct it must be consistent, since the truth is not inconsistent. The formal systems agree with us on the first point (they "believe" their own formulas, so implicitly support their own correctness), but cannot reason from that to their own consistency.

This is equivalent to knowledge about halting, because what we are saying is that a program that proves theorems until it arrives at an inconsistency will never halt. So, utilizing our human insight, we can add that fact to a logical system to increase its knowledge about halting! Call the first system S1, and the second S2. Since we believed that S1 was consistent, and S2 differs only in that it contains more true information, we will believe S2 to be consistent. If we add *that* fact, we can form a system S3. We know S3 is consistent, so we can record that knowledge to get S4. And so on. In this way, a human appears to have an infinite supply of true information concerning the halting problem, even over and above our knowledge coming from mathematical induction (which we can suppose was encoded in S1 already).
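The tower of systems can be sketched purely formally. Here each theory is just a list of axiom labels, and "Con(Si)" is an uninterpreted string token standing for the consistency assertion; no actual proof checking is modeled:

```python
def theory_tower(base_axioms, n):
    """Build S1..Sn: each theory extends the previous one with a token
    asserting the previous theory's consistency."""
    theories = [list(base_axioms)]            # S1
    for i in range(1, n):
        theories.append(theories[-1] + [f"Con(S{i})"])
    return theories

tower = theory_tower(["PA"], 4)
print(tower[3])  # ['PA', 'Con(S1)', 'Con(S2)', 'Con(S3)']
```

Each step is trivial to generate mechanically; what a formal system cannot do is *believe* the new axiom on its own authority, which is exactly the gap the human reasoner seems to cross.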

So, we have two mysterious sources of information about the halting problem. Also, it seems that these two sources can be linked, because the truth is we can do somewhat better than simply saying "the axioms of arithmetic are obviously true, and therefore consistent". We can instead look to the proof by Gentzen. This proof relies on a method of argument that is stronger than normal mathematical induction, called transfinite induction. Basically, this extends normal induction to fairly small infinities (I think the exact class would be computable ordinals?). It seems that this principle extends to proving the consistency of more powerful logics as well: we can prove their consistency by using some variant of transfinite induction, so long as we pick infinities larger than the system itself can refer to. (If anyone has a better reference about this, I'd love to hear about it.)

So, this is how it looks: we seem to have an uncanny ability to list larger and larger infinities, while any formal system is limited to some ceiling. Furthermore, it seems we somehow justify applying mathematical induction to any such infinity, yielding more truths about halting than any particular formal system can muster.

This situation is very strange. First, the process of listing larger infinities seems well defined, yet we can't define it (because when we do, we can automatically come up with a larger infinity by saying "the first one not reachable by that method"). Second, mathematical induction seems justifiable; it seems different from the random (truly unknowable) facts about halting that "cause" Gödelian incompleteness. But if it is different, then where is the difference?

2. A concept of 'procedure' could be grounded in our own thought process, but what grounds the mathematical notion of 'set', especially 'uncountable set'?

I have mentioned this before, of course. It continues to puzzle me. Uncountable sets are hugely infinite, so it seems really really obvious that we cannot have any actual examples to ground our reference to them.

The elementary facts about uncountable sets are proven by defining them in terms of powersets. The powerset P of a set S is defined as the set of all subsets of S. The powerset of the set of integers is uncountable. However, there is a problem: when we try to give an actual definition, "all subsets" is hard to properly define. For finite sets, it is easy; but for infinite sets, it tends to collapse to "all referenceable subsets". Mathematicians know that there are far more subsets that cannot be referenced than subsets that can be referenced, so this is no good.
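For finite sets, the "easy" case really is an explicit enumeration, which is exactly why it raises no grounding worry. A minimal sketch:

```python
from itertools import chain, combinations

def powerset(s):
    """All subsets of a finite set, as an explicit list.  Unproblematic
    precisely because the set is finite: "all subsets" is an enumeration."""
    items = list(s)
    return [set(c) for c in
            chain.from_iterable(combinations(items, k)
                                for k in range(len(items) + 1))]

P = powerset({1, 2, 3})
print(len(P))  # 8 == 2**3
```

For the integers, no such enumeration exists, and any attempt to describe the subsets one by one only ever reaches the referenceable ones; that is the collapse described above.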

## Wednesday, September 03, 2008

I have thought of an amusing and elegant variation of my grounded arithmetical logic (a variation on standard lambda logic, which I discussed here, here, and almost everywhere in between).

The logic supposedly gets its grounding from the possession of a black box that executes programs represented as lambda terms-- the idea being that they literally are programs, so cannot fail to be grounded (despite the fact that no logical description of the concept of computation is complete, which makes it sound impossible to represent logically).

But, this idea is odd; it sounds as if the logic is getting grounding from an external black box that it is hooked to. What is needed is an embodiment of procedural-ness. But, there is a natural source of this from within the logic: deduction is a procedural task, which is itself Turing-complete; so, it can serve as a representation for arbitrary computations, just as well as lambda-reduction can.

So, the idea is to ground the logic in its own deduction system. This is not a radically new idea; provability logic is a well-established field of study. But, in any case, the idea is useful here if we don't want the logic's groundedness to rely on a lambda-evaluator that is outside of the logic itself.

This is satisfying for another reason, as well. I talked previously, somewhat, about the "reason" that my version of lambda logic is more semantically capable than plain first-order logic: it is able to represent concepts with a closed-world assumption in addition to the standard open-world assumption. What this means is, the logic can define a predicate by stating some rules that hold for the entity, and then adding "that's all". Statements that don't follow from the rules are automatically false. This concept offers an explanation for where unprovably true statements come from: the "that's all" operator has a meaning that cannot possibly be covered by an associated proof method. (Or, at least not one that terminates in finite time.) So it is the origin of much mathematical mystery.

Unfortunately, it cannot be the source of all mysteries; it allows us to talk about arithmetical truths, but not set-theoretical ones.

Anyway. The second nice feature of a logic grounded in its own deduction system is that it allows the "that's all" concept to be put directly to use. We can define a predicate by saying it is true if and only if it is provably true via a given set of statements.

This construction would have some interesting behavior. The first thing to realize is the all-encompassing reach of the closed-world assumption. The construction should not be used, for example, when defining some property of bicycles: if we discover a new bicycle, we will be unable to describe it with that property. It is better suited for defining properties of sets that can be completely pre-specified, like the natural numbers. We will not happen upon any new numbers, so we are safe. To apply predicates to bicycles while still using the closed-world construction, we would need to do something fancy such as define a class of all possible bicycle-types, define a closed-world predicate on that, and then apply the predicate indirectly to actual bicycles by classifying them as instances of the abstract bicycle-types.
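The "provably true via a given set of statements, and that's all" construction is essentially a forward-chaining closure. Here is a small Python sketch of my own (the rule format and the bounded chaining are simplifications, not the actual logic): the natural numbers defined closed-world style, where anything not derivable is false rather than unknown.

```python
def closed_world_predicate(rules, seeds, rounds):
    """A "that's all" definition: the predicate holds of exactly the things
    derivable from the seed facts by the rules; everything else is false by
    fiat.  `rules` are unary functions generating new instances; `rounds`
    bounds the forward chaining (a real logic would chain indefinitely)."""
    derived = set(seeds)
    for _ in range(rounds):
        derived |= {r(x) for r in rules for x in derived}
    return derived

# 0 is a number; if n is a number, so is n + 1; and that's all.
numbers = closed_world_predicate([lambda n: n + 1], {0}, rounds=5)
print(sorted(numbers))   # [0, 1, 2, 3, 4, 5]
print(7 in numbers)      # False: not derivable, hence false -- not unknown
```

The bicycle problem above shows up here too: if the seed set can grow after the definition is closed, the "and that's all" clause wrongly excludes the newcomers.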

The strength of the system could be increased even further by adding truth-predicates via one of the methods in this post, for example... but set theory would still not be grounded in the system.

## Wednesday, August 27, 2008

So, as I noted last time, there are many possible ways to fit the predicate "True" into a language. Which of the many ways is correct? What does that even mean? What makes one better than another? Or, are all of them OK? Is it merely a matter of taste, since all of them capture the normal everyday uses of the concept of truth? In other words, is "truth" merely a concept that might be used in slightly different ways by different people, meaning there is no right answer?

I also noted last time that these logics do not seem to automatically solve the other problems I'm worried about; in particular, they do fairly little in the way of providing a foundation of mathematics.

The goal is to create a logic that can talk about its own semantics; a logic that can reference any mathematical ideas that were needed in describing the logic in the first place. Tarski started the whole line of research in this way; he asked, if a logic could define its own notion of truth, then what? (And he showed that, given some additional assumptions, such logics turn out to be contradictory.) But the theories I referenced last time, and those like them, have diverged somewhat from this line of inquiry... instead of providing a logic capable of defining its own truth predicate based on the semantic notions a human would employ in explaining the logic to another, these theories simply put the truth predicate in the language to begin with, and attempt to state rules that make it behave as much like our concept of truth as possible without allowing Tarski's contradiction.

That's totally different!

So, a more direct approach would be to try to create a logic that can define itself; a logic that can develop its notion of truth from more basic concepts, rather than declaring it as given. I suspect that such an attempt would not result in the same multitude of possible solutions.

The Logic of Definitions

On a related train of thought-- I have come up with another idea for a logic. The idea was based on the observation that many of the proposed solutions to defining truth were actually proposals for a theory of definitions in general; truth was then treated as a defined term, using the following definition, or some variant:

" "X" is true" means "X".

The revision theory treats this as a rule of belief revision: if we believe "X", we should revise our belief to also accept " "X" is true". The supervaluation theories claim that the above definition is incomplete (aka "vague"), and formulate a theory about how to work with incomplete definitions. (But, see this blog for a note on the ambiguity of the term "supervaluation".)

So, I have my own ideas about how to create a logic of definitions.

--The base logic should be normal classical logic. It could simply be first-order logic if that is all you need in order to define your terms; it could be arithmetic if that is needed; or it could be axiomatic set theory, if you need it to define what you want to define. In other words, the basic theory will depend on what you want to talk about with your definitions, but the logic that theory is set in will be classical.

--The logic of the defined concepts, however, will be violently non-classical. A defined predicate may be neither true nor false in some cases, because its definition simply fails to say anything one way or the other. It could be both true and false in other cases, when the definition implies both.

This is, of course, only a rough outline of the logic... it differs from most treatments of definitions (such as revision theory and supervaluation theory) by allowing not just statements that are assigned no value, but also statements that get both truth values.
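To see how a defined predicate can land in all four statuses, here is a toy example (my own, not taken from any of the theories mentioned): a predicate "nice" over the integers, defined by two independent clauses that can agree, conflict, or both stay silent.

```python
# Toy illustration of a definition yielding true, false, both, or neither.
# Hypothetical definition of "nice" over integers, by two clauses:
#   nice(x)  holds if x is even
#   ~nice(x) holds if x is divisible by 3

def nice_status(x):
    pro = (x % 2 == 0)   # the definition implies nice(x)
    con = (x % 3 == 0)   # the definition implies ~nice(x)
    if pro and con:
        return "both"     # the clauses conflict
    if pro:
        return "true"
    if con:
        return "false"
    return "neither"      # the definition is silent

print(nice_status(4))   # true
print(nice_status(9))   # false
print(nice_status(6))   # both
print(nice_status(5))   # neither
```

A revision or supervaluation treatment would allow the "neither" case but not the "both" case; permitting both is what sets the proposal above apart.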

## Friday, August 22, 2008

My goal mentioned in the previous post was creating a logic that can reason about its own semantics (somehow sidestepping Tarski's proof of the impossibility of this). Well. This is a very ambitious goal. Luckily, I am not the first to desire such a thing, and there is a vast body of work on the subject. I find not one solution, but three:

Fixed Point Theories

Supervaluation Theory

Revision Theory

There are other theories than these, but these are the ones I saw mentioned the most.

It is comforting to see that "grounding" is often mentioned in connection with these theories, although the word is not being used in quite the same way I am currently using it. The word comes up particularly often in association with fixed-point theories.

Some of the differences between the three theories are superficial. For example, fixed-point theories are often stated in terms of three truth values: true, false, and neither-true-nor-false. Revision theorists reject this idea, and instead simply fail to assign a truth value to some sentences. This makes no difference for the math, however.

So, in a sense, the three theories are very similar: each describes some method of assigning truth values to some, but not all, sentences. Each of the methods sounds overly complicated at first, compared to typical paradox-free mathematical settings, in which truth-assignment is very straightforward. But, once one realizes that the complexity is needed to avoid assigning truth values to meaningless statements such as paradoxes, each of the methods seems fairly intuitive.
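The fixed-point method, at least, is easy to sketch. Below is a toy construction in the style of Kripke's least fixed point (my own miniature example, with an invented three-sentence language): start with T's extension and anti-extension empty, and repeatedly re-evaluate until nothing changes. Grounded sentences settle down; the liar never does.

```python
# A toy Kripke-style fixed-point construction over three sentences:
#   A: an ordinary truth ("0 = 0"), grounded without appeal to T
#   B: "T(A)" -- becomes true once A enters T's extension
#   L: "not T(L)" -- the liar; never grounded

def value(sentence, ext, anti):
    """Strong-Kleene value: True, False, or None (undefined)."""
    if sentence == "A":
        return True
    if sentence == "B":        # B says T(A)
        if "A" in ext:
            return True
        if "A" in anti:
            return False
        return None
    if sentence == "L":        # L says not T(L)
        if "L" in ext:
            return False
        if "L" in anti:
            return True
        return None

ext, anti = set(), set()
while True:
    new_ext = {s for s in "ABL" if value(s, ext, anti) is True}
    new_anti = {s for s in "ABL" if value(s, ext, anti) is False}
    if (new_ext, new_anti) == (ext, anti):
        break                  # reached a fixed point
    ext, anti = new_ext, new_anti

print(ext, anti)   # A and B end up true; L stays ungrounded
```

At the fixed point, A and B are in the extension and L is in neither set -- it receives no truth value, which is the fixed-point theories' verdict on the liar.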

Unfortunately, each method yields different results. So which one is right?

I've got to do a lot more research, to find out more about how they differ, other alternative theories, and so on. I suspect the fact that I am looking for concrete results rather than merely a philosophically satisfying logic will help guide me somewhat, but I am not sure.

Also, it appears that most of these logics are developed as modifications of arithmetic, rather than set theory. But, I am looking for a logic that can serve as a foundation for mathematics. It appears that a "direct attack" on the self-reference problem does not automatically get me what I want in terms of the general reference problem. So, I still need a good theory on how mathematical facts about uncountable sets can be grounded...

## Thursday, August 14, 2008

My approach thus far to the issue of finding the correct logic has been incremental. For each logic, find something I cannot represent but want to, and attempt to extend the logic to include that. So, since I've settled on a form of lambda logic as a logic for arithmetical concepts, the next step would be to look for an extension into the hyperarithmetical hierarchy or the analytic hierarchy, or possibly to set theory. These form a progression:

arithmetical hierarchy < hyperarithmetical hierarchy < analytical hierarchy < axiomatic set theory

So, a jump directly to set theory would be nice, but the hyperarithmetic hierarchy would be the easiest next step.

This process can continue indefinitely, thanks to Tarski's Undefinability Theorem: no sufficiently expressive logic is able to define its own semantics, so if I want to find a way to extend some logic X, the standard way would be to look for a logic Y that can define the semantics of X. However, my vague hope is that if I work in these incremental extensions for long enough, I will eventually understand the undefinability principle well enough to see a way around it, thus making one final extension to the logic that encompasses all possible extensions.

This is a silly idea.

So, what about a direct attack? For it to work, I need to find a way around the Undefinability theorem. The undefinability theorem makes some specific assumptions; so, I know before starting that the logic I come up with needs to break these assumptions.

I'll outline the theorem's proof.

First, Tarski asks us to assume a minimal fact about the concept of truth: for all statements X, "X" is true if and only if X.

T(X) <=> X

This is called "Tarski's T-Schema". The idea is that if a formalization does not satisfy this, it cannot be a concept of truth. It is possible that more than the T-schema is required for a proper notion of truth, but if so, any additional requirements aren't needed for Tarski's proof. (So, the T-schema is one of the critical assumptions to scrutinize. Is it really a necessary property of truth?)

Next, assume that some logic can talk about its own concept of truth. The idea is to prove that it can then say "This sentence is false", causing a paradox within that logic.

To prove this, we need one more assumption: the logic must contain arithmetic. This is a rather large assumption. However, it is one I am obviously making. So let's continue.

The reason arithmetic is required for the proof is that the Diagonal Lemma holds for any logic extending arithmetic. (So, if the Diagonal Lemma holds for some other logic, we don't need to require that the logic extends arithmetic.)

The Diagonal Lemma states that if we can represent some predicate P, then we can create a sentence that says "I am P". In short, the Diagonal Lemma shows that any language extending arithmetic contains self-referential sentences.
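The trick behind the Diagonal Lemma is the same one behind quines: apply a template to a quoted copy of itself. Here is a rough sketch (the marker X and the predicate name P are my own placeholders, not part of the formal lemma):

```python
# Diagonalization as a string operation: fill the template's marker X
# with a quotation of the template itself.

def diag(template):
    """Replace every marker X in the template with a quotation of it."""
    return template.replace("X", repr(template))

template = "the result of applying diag to X has property P"
sentence = diag(template)
print(sentence)
```

The resulting sentence describes "the result of applying diag to <quoted template>" -- but that result is the sentence itself, so the sentence effectively says "I have property P". Substituting "not true" for P gives the liar sentence used in the proof.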

Now: if the language can negate predicates, then it can express the predicate "not true" (since we've already assumed it can express "true"). Therefore, by the Diagonal Lemma, it can express "I am not true". Call this sentence Q.

By Q's definition, we have Q<=>~T(Q). (In English, this just says that Q and "Q is not true" say the same thing.) By the T-schema, we have Q<=>T(Q) (meaning Q and "Q is true" say the same thing). Therefore, we have T(Q)<=>~T(Q) (meaning "Q is true" and "Q is not true" say the same thing). Now we seem to have serious problems. In a classical logic, this proves a contradiction, finishing Tarski's proof that no such logic can exist.
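The contradiction can be checked mechanically: treating Q's truth value and T(Q)'s value as two booleans, no classical assignment satisfies both biconditionals at once. A brute-force sketch:

```python
# Brute-force check that Q <=> ~T(Q) and Q <=> T(Q) are jointly
# unsatisfiable in classical two-valued logic.
# q stands for Q's truth value, t for the value of T(Q).

from itertools import product

solutions = [
    (q, t)
    for q, t in product([True, False], repeat=2)
    if q == (not t)    # Q <=> ~T(Q), from the Diagonal Lemma
    and q == t         # Q <=> T(Q), from the T-schema
]
print(solutions)   # [] -- no classical assignment works
```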

Each step of this proof leans on an assumption, and blocking any one of those assumptions blocks the contradiction. So, it seems that there are plenty of ways to get around the undefinability theorem. However, it is not obvious which one is the right way.

Here is a list of some assumptions that could be violated:

1. The T-schema

2. The Diagonal Lemma

3. If we can express a predicate P, we can express its negation ~P

4. Every sentence is either true or false

5. No sentence is both true and false

This is not a new problem, so there are many proposed solutions. Of course, my version of the problem is somewhat different, since I want a solution that not only allows "truth" to be defined, but also any other coherent concept that a human might wish to define.

Some initial thoughts.

The T-schema seems solid, although I can't rule out the possibility of counterintuitive exceptions.

The Diagonal Lemma is an assumption I cannot break, since what I am proposing is a logic that can refer to anything a human can refer to. It is conceivable that the logic humans use does not admit the Diagonal Lemma, but if that is the case, then my goal is impossible because it implies that humans cannot define the logic that they use. If my goal is achievable, then it is achievable with a self-referential logic.

Assumption 3 seems solid, but again maybe there are some strange counterintuitive exceptions.

Assumption 4 seems false in the presence of sentence Q; it is far from obvious that sentence Q is actually true or false.

Q could also be seen as a counterexample to assumption 5, if it is seen as both true and false rather than neither. Personally I prefer neither.
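The "neither" option can be checked the same brute-force way as the classical contradiction, by adding a third value (my own sketch, using strong-Kleene connectives, where undefinedness spreads through the biconditional). An assignment is admissible if neither biconditional comes out outright false:

```python
# Three-valued brute-force check: True, False, and None (neither).
# An assignment is admissible when neither constraint evaluates to False.

from itertools import product

def k_not(a):
    return None if a is None else (not a)

def k_iff(a, b):
    if a is None or b is None:
        return None        # undefinedness spreads through <=>
    return a == b

admissible = [
    (q, t)
    for q, t in product([True, False, None], repeat=2)
    if k_iff(q, k_not(t)) is not False   # Q <=> ~T(Q)
    and k_iff(q, t) is not False         # Q <=> T(Q)
]
print(admissible)
```

Every admissible assignment leaves something undefined; in particular, making both Q and T(Q) "neither" survives both constraints, while no fully classical assignment does.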

Source:

http://plato.stanford.edu/entries/self-reference/#Matt-sema
