
Sunday, June 14, 2009

What They Don't Tell You About Type Checking

Typed lambda calculus is not Turing-complete.

There. I said it.

More specifically, simply typed lambda calculus is not Turing-complete, and neither are any variants that are both strongly normalizing and have decidable type-checking. This is because programs that the type-checker verifies are guaranteed to compute a result. If such a type-checker allowed a Turing-complete set of programs, it would be a solution to the halting problem!
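To see this concretely, here is a Haskell sketch (Haskell's type checker behaves like the simply typed case on this example): self-application fails to typecheck, and self-application is exactly what the usual fixed-point combinators are built from.

selfApp x = x x
-- GHC rejects this with an "occurs check" error: x would need to have
-- both type t and type t -> s at once. The same failure blocks the Y
-- combinator, the standard route to unbounded recursion:
-- y f = (\x -> f (x x)) (\x -> f (x x))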

Really, I should have put two and two together earlier on this one. I suppose this is what comes of picking lambda calculus up by reading diverse things in diverse places rather than learning it from one authoritative source.

What this indicates for me is that, at least in many cases, the point of allowing more and more sophisticated type systems is to get closer to a Turing-complete system. That is why people add things like parametric polymorphism, dependent types, and kinds. When we add these to typed lambda calculus, it doesn't just get "more expressive" in the sense that a high-level programming language is more expressive than machine code; it is literally able to do things that a simpler type system could not.

This doesn't mean that strongly typed programming languages are not Turing-complete. Typically the type-checkers for these will not guarantee that the program contains no infinite loops. So, one must be careful to figure out exactly what one is dealing with.
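In Haskell, for instance, the checker happily accepts nonterminating programs; here is a minimal sketch. The recursive newtype is the standard trick for smuggling self-application past the type checker, which is one way to see why full Haskell is Turing-complete while simply typed lambda calculus is not.

loop :: a
loop = loop   -- typechecks, but never produces a value

newtype Rec a = Rec (Rec a -> a)

selfApply :: Rec a -> a
selfApply w@(Rec f) = f w

omega :: a
omega = selfApply (Rec selfApply)   -- typechecks, reduces forever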

Monday, February 09, 2009

Compression

As promised in the previous post, I'll describe what I'm currently working on.

Really, this is a continuation of this post. The ideas I outlined there have the same basic shape, but have evolved a good deal.

The basic idea: Any Turing-complete representation can be "run backwards" to generate the search space of all (and only) the programs that generate the desired piece of data. This approach promises to be much more effective than a genetic programming solution (for the special case to which it can be applied). A possible immediate application of this is data compression, which is formally equivalent to looking for short programs that output a desired piece of data (or, it is if we include the decompression program when measuring the size of the file, as is often required in compression contests to prevent cheating).

Realizing that compression was the application area I should be focusing on helped simplify my design a great deal. There may be a theoretical connection between compression and prediction, but converting between the two is not computationally simple. Converting compression to prediction can be done in two ways (so far as I can see):

-generate futures and then compress them to see how probable they are (shorter = more probable)
-generate (short) compressed representations of the future and decompress them to see what they actually mean

The first method can use the "running backwards" trick, trimming the search space. The second cannot, so it may generate meaningless futures (compressed representations that do not actually decompress to anything). However, the second has the advantage of starting with short (probable) representations; the first will go through many improbable possibilities. I can think of many little tricks one could try to improve and/or combine the two methods, but my point is, it is a hard problem. I was attempting to tackle this problem for most of last semester. Eventually, I may try again. But for now, compression is a good testbed for the basic algorithm I'm working on.

What about the other direction: can we convert from prediction to compression in an effective manner? It turns out that the answer is yes. There is an awesome algorithm called arithmetic coding that does just that. This is the basis of the PAQ compression algorithm, the current leader in the Hutter Prize and Calgary Challenge (both linked to earlier).
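To make that direction concrete, here is a minimal arithmetic-coding sketch in Haskell. It uses exact rationals and a fixed symbol model for readability; real coders like PAQ use adaptive models and finite-precision bit streams, so treat this purely as an illustration of the interval-narrowing idea (the model and names here are mine, not PAQ's).

import Data.Ratio ((%))

type Model = [(Char, Rational)]   -- symbols with probabilities summing to 1

-- each symbol owns a slice of [0,1), sized by its probability
intervals :: Model -> [(Char, Rational, Rational)]
intervals m = zip3 cs (scanl (+) 0 ps) (scanl1 (+) ps)
  where (cs, ps) = unzip m

-- encode: repeatedly narrow the interval, then emit its midpoint
encode :: Model -> String -> Rational
encode m s = (lo + hi) / 2
  where
    (lo, hi) = foldl step (0, 1) s
    step (a, b) c = (a + (b - a) * cl, a + (b - a) * ch)
      where (_, cl, ch) = head [t | t@(x, _, _) <- intervals m, x == c]

-- decode: see which slice the number falls in, emit that symbol, rescale
decode :: Model -> Int -> Rational -> String
decode _ 0 _ = []
decode m n r = c : decode m (n - 1) ((r - cl) / (ch - cl))
  where (c, cl, ch) = head [t | t@(_, a, b) <- intervals m, a <= r, r < b]

model :: Model
model = [('a', 1 % 2), ('b', 1 % 4), ('c', 1 % 4)]   -- hypothetical model

-- decode model 6 (encode model "abacab") == "abacab"

The better the model predicts the data, the wider the final interval stays and the fewer digits the midpoint needs, which is exactly the prediction-compression connection at work.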

However, this strategy inherently takes the same amount of time for decompression as it takes for compression. PAQ and similar methods need to form predictions as they decompress in the same way that they form predictions as they compress. This is what takes the majority of the time. More "direct" methods, like the one I am working on, take far less time for decompression: compression is a process of searching for the shortest representation of a file, while decompression simply involves interpreting that short representation.

The Hutter Prize places a time limit on decompression, but not on compression; so in that way, my method has the upper hand. In theory I could let my algorithm search for months, and the resulting file might decompress in minutes. (A rough estimate would be that the decompression time is the logarithm of the compression time.)

By the way, these compression contests are far from practical in nature. Real applications of compression prefer compression methods that work fairly fast for both compression and decompression. The Hutter Prize is designed to increase interest in the theoretical connection between prediction and compression, as it relates to artificial intelligence. Thus, it has a very loose time restriction, allowing for "smart" compression methods such as PAQ that take a long hard look at the data.

But, enough about contests. Time for some more details concerning the algorithm itself.

The major design decision is the choice of Turing-complete representation. The algorithm depends critically on this. Representations vary greatly in their properties; specifically, in how well a heuristic can guide the search. The major heuristic to consider is simply size: if a move decreases size immediately, it is somewhat more probable that it decreases size in the long run, too.

The "classic" choice would be to just go with Turing machines. Heuristics on pure turing machines, however, happen to be terrible. There are a very small number of options for backwards-moves, and usually none of them immediately changethe size of the current representation. We are running blind!

For most of last semester, I was working on a version that would use lambda calculus as the representation. Lambda calculus is much better; in particular, it allows us to immediately take advantage of any repetitions that we recognize. Noticing repetitions is the most direct way to apply the shortness heuristic; it is the method used by sequitur, one of the algorithms inspiring my current direction.

Lambda calculus can compress repetitions using "abstraction". However, an abstraction needs to actually be put somewhere when it is made. There is no convenient heuristic for where to put it (at least, none that I've thought of). For example, if I see a string "abcabcabc", I could either abstract at the global level, which would allow me to compress any other occurrence of "abc" into the same abstraction, or at the local level, which might allow me to compress other local groupings of three repetitions. I could also place the abstraction anywhere in between, if I thought some of the context was important. "Anywhere in between" is a rather large search space. This and other issues led me away from lambda calculus as the representation of choice.
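Before moving on, here is the placement choice made concrete in a small Haskell sketch (the names are mine, purely illustrative):

-- global placement: name the repeated content itself, reusable anywhere
abc :: String
abc = "abc"

globalVersion :: String
globalVersion = abc ++ abc ++ abc

-- local placement: name the "three repetitions" shape instead
threeTimes :: String -> String
threeTimes x = x ++ x ++ x

localVersion :: String
localVersion = threeTimes "abc"

-- both equal "abcabcabc"; which abstraction pays off depends on what
-- else appears in the data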

Generative grammars are another option. These appeal to me, partly because they are the formalism chosen by sequitur. Like lambda calculus, they offer a way of abbreviating repetitions. With grammars, the choice is what to rewrite repetitions to, rather than where to place an abstraction. Sequitur can be seen as the special case of always rewriting repetitions to a single symbol, which represents the sequence that had repeated. More sophisticated types of patterns can be represented by allowing more complicated rewrites. Heuristically, simpler rewrites can be preferred.
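As a concrete instance of that special case, the string "abcabcabc" might become the grammar S -> AAA, A -> abc. A small Haskell sketch of rule expansion (the encoding and names are mine, just for illustration):

-- a grammar maps nonterminals to sequences of symbols
type Rules = [(Char, String)]

rules :: Rules
rules = [('S', "AAA"), ('A', "abc")]

-- nonterminals rewrite via their rule; other symbols stand for themselves
expand :: Rules -> Char -> String
expand rs c = maybe [c] (concatMap (expand rs)) (lookup c rs)

-- expand rules 'S' == "abcabcabc"

With exactly one rule per nonterminal, expansion like this is deterministic; the trouble described next starts when a grammar offers more than one way to rewrite.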

Unfortunately, there is a serious problem with this representation. Unlike lambda expressions, grammars are not necessarily confluent, which means that when we use the grammar to generate data we may get a different file than intended. It is possible to specify a convention telling us how to apply the grammar to get a file (for example, "always expand the first applicable rule going left-right, deciding ties by using the shorter of the rules"). However, a restriction on forward-execution must be mirrored by a corresponding restriction on backwards-execution, which means the search during the compression phase is more constrained. In many cases it is good news when we can constrain a search more; but in this case, it hurts the ability of the heuristic to guide us. (It seems there is a fairly direct connection here: restricting the branching factor leads to a worse heuristic unless we're very careful.)

I also spent some time creating a set of restrictions that ensure confluence, but although these systems salvaged much of the situation, they were not as good as the option I finally settled on: combinators.

Combinators differ from lambda expressions in that functions are given names and have external definitions, whereas in lambda expressions the function definitions are in-place via abstraction. Minimally, one can use just two combinators, S and K, for a Turing-complete formalism. (If you want to know the definitions of S and K, see the sketch below.) However, for my purposes, it makes sense to invent new combinators as needed.
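A Haskell sketch of the standard definitions:

-- S distributes an argument to two functions; K discards its second argument
s :: (a -> b -> c) -> (a -> b) -> a -> c
s x y z = x z (y z)

k :: a -> b -> a
k x _ = x

-- the identity combinator is already derivable: s k k z = k z (k z) = z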

The idea behind a heuristic for combinators is to not only look for exact repetition, but also repetition of pieces with holes. Not only can we notice multiple instances of "abc", but also "a_c". Repetitions with holes suggest combinators that take hole-fillers as arguments.
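In Haskell terms, noticing "a_c" twice suggests inventing a combinator for the shared context (a hypothetical example; the name is mine):

-- invented combinator for the repeated pattern "a_c"
aSlotC :: String -> String
aSlotC x = "a" ++ x ++ "c"

-- "abcaxc" then becomes two applications with different hole-fillers
rebuilt :: String
rebuilt = aSlotC "b" ++ aSlotC "x"   -- == "abcaxc"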

Turing-completeness is assured by the fact that the system could invent S and K, although it probably won't. (For one thing, S and K are not the only set of combinators that gives expressive completeness. For another, they are too "abstract" to be suggested directly by data... the heuristic will not give them a high score. Then again, they are short, so their score should not be too low...)

That, then, is the basic idea. There are many directions for expansion, which might help the algorithm (and might not).

-More complicated functions
--functions that take sequences as arguments, not just single terms (this is technically possible without expanding the formalism, via a trivial combinator A (x,y) -> xy; however, some work could be done to get the heuristic to look for this sort of thing, essentially by keeping an eye out for repeated patterns with variable-sized holes rather than single-slot holes)
--probabilistic functions (this may or may not provide a compression advantage, since we've got to keep track of the extra info to make things deterministic anyway, but even if it creates no advantage it might make things more human-readable)
--functions with more complicated definitions, such as if-then statements

-more sophisticated heuristic & more sophisticated search
--best-first vs semi-random
--learning
---memoize "reduced forms" of commonly-encountered structures?
---learn which heuristic info to tabulate?
---learn sophisticated search strategies?

Tuesday, November 11, 2008

Fixing Naive Lambda Logic

In this old post, I explained why naive lambda logic is paradoxical, and explained how to fix it. The fix that I suggested restricted lambda statements to represent finite computations, which is really what they are for anyway. However, in this recent post, I suggest that many naive, inconsistent logics (such as the naive lambda logic) are actually referentially complete: they can state any mathematically meaningful idea. The problem is that they also can state some meaningless things.

So, it is interesting to consider how one might fix naive lambda logic in a way that keeps the meaningful infinite stuff while removing the nonsense.

To do that, though, a definition of meaningfulness is needed. As I said in the more recent post, a statement's meaning needs to come ultimately from a basic level of meaningful statements (such as statements about sensory data). A meaningful statement should either be on this base level, or its meaning should be derived purely from other meaningful statements.

In the lambda logic, it is natural to think of a term as meaningful if it evaluates to something. A statement, then, is meaningful if its truth relies only on the value of meaningful terms. To make this precise, we've got to specify the infinite build-up process by which we assign meaning. For each operation that forms a new sentence, we've got to define how meaning is carried over. One of these rules, for example, will be:

If statement X is meaningful and statement Y is meaningful, then the statement X and Y is meaningful.

We need a rule like this for every operation: and, or, not, quantifiers, lambda, and the lambda-value relation. That may sound simple, but complications quickly build up. And is meaningful when its arguments are meaningful. But what about or? It seems like X or Y could be meaningfully true if X is true but Y is meaningless. But if we want this to work, then it would also be sensible to allow not Y to be meaningfully true when Y is meaningless. (That behavior of not would make the behavior of or that I mentioned consistent with classical logic.) But that equates meaningless with false, which seems wrong.
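One standard option here is Kleene's strong three-valued tables, where a compound statement gets a truth value whenever the meaningful arguments already settle it. A Haskell sketch (one possibility among several, not the final word):

data TV = T | F | M deriving (Eq, Show)   -- true, false, meaningless

notTV :: TV -> TV
notTV T = F
notTV F = T
notTV M = M

andTV :: TV -> TV -> TV
andTV F _ = F
andTV _ F = F
andTV T T = T
andTV _ _ = M

orTV :: TV -> TV -> TV
orTV T _ = T
orTV _ T = T
orTV F F = F
orTV _ _ = M

-- orTV T M == T: "X or Y" comes out meaningfully true when X is true and
-- Y is meaningless, yet notTV M == M, so meaningless is not equated with
-- false. The price is a departure from classical logic elsewhere (for
-- example, excluded middle fails: orTV M (notTV M) == M).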

Another problem arises with the treatment of quantifiers. Do quantifiers range over all possible terms, or only meaningful ones? It makes a difference!

There are many places we can turn to for standard solutions to these problems: free logic, the revision theory of truth, fixed-point theories, and others.

A third problem, perhaps worse, arises from the concept "not meaningful". For a concept to be meaningful seems straightforward: it should be built up in a meaningful way from other meaningful statements. But trouble presents itself when we discuss non-meaning.

Imagine the base-level facts as ground from which trees are growing. The trees, of course, are the meaningful statements that can be built up. Meaningless statements would be branches hanging in midair, attempting to grow from themselves, or from other meaningless statements. (Meaningful statements can also have self-reference, but have an attachment to the ground somewhere.)

Now, when we bring in the concept "meaningless", we see some weird stuff happening: somehow the paradoxical branches that grow from nothing are able to support meaningful branches, such as the statement that "This sentence is false" is meaningless. Or, even stranger, consider "This sentence is meaningless". It appears to be meaningful but false. Or, consider "This sentence is either false or meaningless". If it is true, it is false or meaningless; if it is false, it is true; if it is meaningless, then it is true. It looks like the only way to deal with it is to say that it is meaningless to ask which of the categories it is assigned to: it is meaningless to talk about its meaningfulness or meaninglessness.

To handle cases like these requires a sort of back-and-forth between meaningful and meaningless. We can't just grow meaning from the ground up and then declare the rest meaningless; in declaring things meaningless we allow more meaningful statements, so we've got to go back and add them in. That in turn might change the set of meaningless statements, and so on. If in doing this we are changing our assessments of various statements (going back and forth between "meaningful" and "meaningless"), then we are doing something similar to what the revision theory of truth recommends. On the other hand, I like the idea of marking things "definitely meaningful" and "definitely meaningless". A back-and-forth would still be needed, but all decisions would be final.

Anyway. Suppose we resolve all of those issues. Another interesting issue comes up: infinite lambda statements.

An infinite lambda statement could directly represent the mathematical entities that I want the system to be able to reason about. For example, an arbitrary real number would be any function from natural numbers to the integers 0 through 9 (if we want decimal notation), represented by a finite or infinite lambda statement. (Note: the calculation itself could always be fixed to halt in finite time, despite the lambda statement being infinite.) The interesting thing is that if the logic has been satisfactorily rigged up, it will be in some sense as if infinite lambda statements were allowed, even though they aren't.
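For contrast, here is the finite case in a Haskell sketch: a computable real like 1/7, given as a total function from digit positions to decimal digits, where computing any one digit halts even though the expansion is infinite.

-- decimal digit of 1/7 at position n (n = 1 is the first digit after the point)
oneSeventhDigit :: Integer -> Integer
oneSeventhDigit n = (10 ^ n `div` 7) `mod` 10

-- map oneSeventhDigit [1..6] == [1,4,2,8,5,7]

An arbitrary real number would be a function of this shape, with no requirement that any finite program computes it.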

This suggests that we need to be careful of the semantics, and therefore of how nonmonotonic reasoning is used. Are the quantifiers interpreted as ranging over all actually-representable lambda terms, or are they also ranging over the unrepresentable infinite ones? If the logic is to talk about unrepresentables properly, the quantifiers will have to range over them. But then nonmonotonic reasoning will not converge to the correct answers: it will converge to answers that hold for the finite terms only. This will sometimes be correct for the infinite case, but not always. The matter seems complicated, and I'm not yet sure how to deal with it.

Tuesday, August 12, 2008

Paradox

I discovered a paradox in the version of lambda logic that I am creating. It is not difficult to fix. However, I'll explain the paradox before I explain how I fixed it.

Here it is symbolically (or rather, semisymbolically, because I'm not using any math rendering):

L = lambda x . (x x)
H = lambda x . exists y . output((x x), y)
I = lambda x . (H x (L L) 1)

With these definitions, the expression (H I) is paradoxical. I'll explain why.

The first important thing is that I am treating an existential statement as if it were a function that returned true or false. In lambda logic (the official version), this is allowed:

True = lambda x y . x
False = lambda x y . y

So, True is a function that takes two arguments and returns the first, while False takes two and returns the second. This should be thought of in terms of if/then/else statements. We can make triples (a b c), and if a evaluates to True, b is returned (the "then" clause); but if a evaluates to False, c is returned (the "else" clause).
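Sketched in Haskell (lowercased to avoid colliding with Haskell's built-in Booleans), the if/then/else reading is just function application:

-- the post's True and False: select the first or second argument
true, false :: a -> a -> a
true x _ = x
false _ y = y

-- the triple (a b c) is plain application:
-- true "then" "else" == "then"
-- false "then" "else" == "else"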

I define L for convenience. L takes an input, x, and returns x applied to itself. The compound statement (L L) is an infinite loop in lambda calculus: it takes L and applies it to itself, resulting in (L L) again. This isn't paradoxical in itself, because the logic does not assume that a result exists for every computation.

The second term, H, takes a term and detects whether it has a result when applied to itself. That is the meaning of "exists y . output((x x), y)": there exists an output of (x x). (I could have used the L construct, writing (L x) instead of (x x), but that wouldn't be any shorter, would it?) So, the way to read (H x) is "x outputs something when handed itself as input".

I is what I call the switch operation. If F is a function that returns some result (when applied to itself), then (I F) doesn't return anything. If F doesn't, (I F) does. How does it work? Well, I is written in the if-then-else format. The "if" part is the "H x", meaning "x returns something when applied to itself". The "then" part is (L L), a statement that never returns anything. The "else" is just 1, a number. Any concrete thing could be put in this spot; it just gives the function something definite to return.

Now, the paradox comes when we consider (H I). This expression asks the question, "Does the switch operation return anything when applied to itself?" Either it does or it doesn't. If it does, then (because it is the switch operator), it doesn't. If it doesn't, then it does. Contradiction. Paradox.

The problem is that I am allowing lambda-terms to be formed with non-computable elements inside them (such as the existential statement inside H). To avoid the paradox, but preserve the ability to represent any statement on the arithmetical hierarchy, I should restrict the formation of lambda terms. A set of computable basic functions should be defined. The list could include numerical equality, addition, subtraction, multiplication, exponent, et cetera. (Any operation that takes nonnegative integers and returns nonnegative integers.) However, it is sufficient to include only the function F(x), where F(x)=x+1. (Everything else can be developed from this, using lambda calculus.) Also-- it should be understood that the terms that can be created (from these basic computable terms plus lambda) are all the entities that exist within the logic. So, if the logic says things like "exists y . output((x x), y)", it should be understood that the "y" that exists is one of these terms.
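The "everything else can be developed from this" claim can be sketched with Church numerals: a number n is the function that iterates its argument n times, and successor plus lambda is then enough for addition, multiplication, and so on. A Haskell sketch:

type Church a = (a -> a) -> a -> a

zero :: Church a
zero _ x = x

suc :: Church a -> Church a      -- the one basic function, F(x) = x + 1
suc n f x = f (n f x)

-- addition and multiplication fall out of lambda alone
add, mul :: Church a -> Church a -> Church a
add m n f x = m f (n f x)
mul m n f = m (n f)

toInt :: Church Int -> Int
toInt n = n (+ 1) 0

-- toInt (add (suc zero) (mul (suc (suc zero)) (suc (suc zero)))) == 5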

Now, why doesn't the official version of lambda logic need to avoid throwing existential statements inside lambda expressions? The reason is the strong predicate I use, "output(x, y)", which means "y is the result of evaluating x". Lambda logic proper only uses a function that represents the performance of a single computational step. If I try to define H in that setting, I get a statement that always evaluates to True, because there is always a result of performing a single computational step.

Wednesday, August 06, 2008

More on Lambda Logic

I've read more about Lambda Logic. The Lambda Logic that I found is actually a bit different from the one that I was considering on my own. First, it is not referentially strong-- the logic has a complete inference method, which means it can only talk about provable things. So, it can't really reference lambda-computations in the way I want, because there are unprovable facts about computations.

Another interesting point-- Lambda Logic is inconsistent with the Axiom of Choice!

Tuesday, August 05, 2008

Latest Finds


My logic exists! Its official name is "lambda logic". Makes sense.

Also, the Opencog Prime documentation was recently released. Worth a look! The knowledge representation used there is also of the form I've been considering-- there are both explicitly represented facts, and facts that are represented implicitly by programs.

Monday, July 21, 2008

New Grounding Results

In this post I detailed a different (and hopefully better) grounding requirement for an AI logic. I've now come up with a logic that satisfies the requirement, based on the speculations in that post.

This logic is nicely simple. All I do is add lambda calculus to first-order logic. The lambda calculus is used as a new way of notating terms, acting the way function symbols normally act. The difference is that lambda calculus is able to uniquely define computations, while normal function symbols cannot.

The reason first-order logic cannot uniquely define computations is that it cannot rule out nonstandard interpretations. Let's say we try to define some method of computation (such as lambda calculus, Turing machines, etc.). Determining the next step in the computation is always very simple; it wouldn't be a very useful definition of computation otherwise. And since a computation is nothing but a series of these simple steps, it might seem like all we need to do is define what happens in a single step. Once we do that, the logic knows what to do next at each stage, and can carry any particular computation from start to finish without error.

The problem here is that we are only specifying what the computation does do, not what it doesn't. In other words, the first-order logic will carry out the computation correctly, but it will not "know" that the steps it is taking are the only steps the computation takes, or that the result it gets is the only result of that computation. This may sound inane, but first-order logic makes no assumptions. It only knows what we tell it.

The problem is, with just first-order logic, there is no way to say what we need to here. If there was an "and that's all" operator, we could list the positive facts about computations, and say "that's all". The logic would then know that any computational step not derivable from those basic steps never happens. But there is no such operator. I can tell it various things about what the computation does not do, but I will never finish my list.

Another way of putting this is that we can positively characterize computation, but we cannot negatively characterize it. Why not? Well, first-order logic is complete; this means that the rules of deduction can find all implications of statements we make. If we were able to state all negative facts, it would be able to find all consequences of them. But this would go against fundamental facts of computation. In particular, we know there is no way of deducing which computations eventually halt. If computations could be totally negatively characterized in a complete logic like first-order logic, the deduction rules would be capable of telling us this. So, this must be impossible.

In a sense, the reason we can't characterize computations negatively is because first-order logic has an open-world assumption. This means that if the deduction rules do not prove a statement true or false, it could be either. This is as opposed to a closed-world assumption, which would mean that if a statement is not proven true, it must be false.

So how does inserting lambda calculus into the mess help?

Lambda calculus is sufficiently closed to solve the problem. While a first-order theory can be added to without changing the meaning of what's already been stated, a program (stated in lambda calculus) cannot be altered freely. If we modify it, we simply have a different program on our hands. This means we can completely characterize many things that first-order logic alone cannot.

Fleshing out some more details of the logic:

-The deduction system should be a mix of a regular first-order deduction system, and a set of rules for the normal manipulation of the lambda calculus. This is as opposed to axioms describing how to manipulate the lambda calculus. Axioms would be in first-order logic and would therefore do us no good in terms of my grounding requirement, because they would not negatively characterize. The rules negatively characterize because they cannot be added to. (This is a petty distinction in a way; in practice, it wouldn't matter much, since nobody would add more facts to a running system concerning the lambda-calculus manipulation. But they could, and the system would accept these facts, whereas they can't add new rules without changing the system.)

-There is a question about how much to tell the first-order logic about its embedded lambda calculus. We could tell it nothing. But it seems practical for the system to have explicit first-order knowledge about the lambda-terms, so that it could deduce some facts about their manipulation (perhaps deducing facts about the results of certain calculations more quickly than could be explicitly checked by running all of those calculations). However, I have already said that we could keep telling the system more and more facts about computation, and yet never finish. So where do we stop? I have two ideas here. First, it seems reasonable to tell the system only the positive facts about the lambda calculus, since the negative ones are the ones that cause trouble. A few basic negative facts could be added at whim, if desired. This is very flexible. Second, we do not need to limit ourselves to first-order logic. We have lambda calculus at our disposal, and we can use it! In addition to placing the rules of manipulation outside of the system, as deduction rules, we could encode them within the system, using lambda calculus. This would allow the system to "know the deduction rules" despite it being impossible to completely characterize them in first-order logic.