## Monday, April 27, 2009

Thoughts on Guiding Inference

One of the big rifts between the way things are described in the theoretical foundations of mathematics and the way things are actually done in real mathematics is that real mathematicians are constantly defining new objects to study. Perhaps someone sees a particular combination of properties pop up a few times, so they give that combination a name and start studying it directly. Perhaps someone decides a particular restriction of a given problem will be easier to solve, and so gives that a name and starts studying it. Perhaps someone wants to know how many X need to be combined to make Y, and they call the number the "X number" of Y, and start studying the properties of it as X and Y are varied. And so on.

In fact, even when studying the foundations of mathematics, such names are often snuck in as "unofficial notation" that is thought of as abbreviating the official notation. Unofficial notation makes axioms and other formulae easier to read.

Why do we do this? I can think of two explanations. First, one can explain this in the same way "named entities" are explained in other domains: it is compressive to name common entities. This means it takes up less room in memory, and also as a side benifit it usually makes the world easier to predict. This is an important explanation, but not the focus of this post. The second explanation is that naming things is a means of inference control.

When we name things, what we proceed to do is to determine some basic properties of the new entities. (This seems true in non-mathematical discourse as well, but perhaps not as clear-cut.) We come up with properties of the new entities, and look for ways of calculating those properties. We come up with theorems that hold for the entities. We look for existence and uniqueness proofs. And so on. What I'm arguing is that all of this activity is basically focused on helping later inference.

One particular phenomenon serves as a good illustration of this: the behavior of mathematicians when studying sequences of numbers that emerge from particular mathematical entities. The prime numbers; the number of groups of each finite size; the number of graphs of a given size; the fibonacci sequence. The big thing to try to do with a sequence is to "find the pattern". What could this mean? If the sequence is mathematically well-defined, don't we know the pattern already? A mathematician will find the first few values of a sequence, and then study them looking for relationships. Often what is sought is a recursive definition of the sequence: a function that calculates the next number based on the direct previous number, or several of the previous numbers, or perhaps all of the previous numbers. If we've already got a recursive form of the sequence, we might try to find a "closed form" version. My argument here is that all of this behavior is explained by the desire to make later reasoning as expedient as possible. Finding the recursive form is not merely a sort of curiosity, like wondering if it is possible to keep a spider alive in a paper cup; rather, the recursive form is a faster way of calculating the sequence then the method that follows directly from the definition. Similarly, the closed form wil usually be even faster.

So, what I'm saying essentially is that a reasoning system should (when it isn't busy doing other things) be looking for nice entities related to the task at hand, and nice quick ways of calculating stuff about those entities. What "nice entity" means is based on two (sometimes conflicting) motivations: the entity should be compressive (meaning it should help make descriptions take up less space), and it should be tractable (meaning reasoning about it should be quick).

How should the search for good ways of calculating be carried out? By the same methods as general inference. The system should be able to apply good reasoning methods that it finds back onto the task of looking for good reasoning methods. Of course, for this to work very well, the system needs to be able to have fairly good reasoning methods to begin with.

What form should the problem-solving methods that the system finds take?

Definitely they should be able to take on the form of typical algorithms: closed-form expressions, recursive expressions, and generally any . Definitely these should be associated with the necessery criteria for application to an object (the criteria that guarantee correct results). Probably they would also be associated with provable upper bounds on runtime, so that the method chosen for a particular case (where multiple methods might apply) would be the one with the lowest time-estimate for that case. (Problems might arise for difficult-to-calculate time bounds; perhaps estimators would be required to be linear time.)

Some extra features could probably improve upon this basic setup:

-Allow possibly-non-terminating algorithms. This would make "search for a proof" count amoung the methods (as the last resort), which strikes me as elegent.

-Allow learned expected time bounds

-Allow "soft" problem-solving strategies (of possibly various sorts) to be learned; ie, heuristics