The last post discussed why we can't build an ultimate solution to AI: an ultimate model of how to model the world. I've got to admit that I don't particularly like that fact. I want every element of my AI to follow logically from its basic purpose; when I describe it, I want it to sound like the obvious solution, the right way of doing things. So, although we can't say which model for modelmaking is best, let's discuss some basic elements that any model-model should have.
Obviously, the more types of models our modelmaking model can generate, the better. That's why AIXI and OOPS use Turing machines: in principle, they can compute anything that is computable. So a good model-model should be universal, in the sense that it can come up with any particular model.
Second, we need some way to rate how good each model is. Since we can come up with any model, there will be a large number of candidate models for any given set of data. There are several ways to rate them. For genetic algorithms and neural nets, a common concern is how closely a model matches the data: we give each model a score for the percentage of problems it gets right. As I explained in the previous post, AIXI and OOPS both use the criterion of a 100% match; no errors are tolerated. But this criterion alone did not narrow them down to a single model. Similarly, with genetic algorithms and neural nets, we run into trouble if we use only percent correct. The model with the highest score may be overfitted: it may have memorized the answers for our particular test but have no idea what to do on any other. The equivalent for AIXI and OOPS would be a program that generates the desired output by quoting it in its own code: if the data is "1001011010010101", then the program is "say 1001011010010101". No structure has actually been found in the data, or perhaps only an extremely small amount of structure ("say 100101101001 and then stick 01 on the end twice").

In some sense, these models are too complicated. AIXI and OOPS differ in how they define this notion of "complicated", of course. Neural nets and genetic algorithms often use a different solution: instead of giving the program the entire dataset to train on, we give it only part of the data. Once it's trained, we test it on the data we held back; if it has just memorized the training set, it won't do very well on the new data.
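To make the held-out-data idea concrete, here is a minimal Python sketch. The task (labeling 8-bit numbers with the parity of their bits), the 192/64 split, and the function names `memorizer`, `parity_rule`, and `accuracy` are all hypothetical choices made for illustration. Note that the "parity rule" model is simply handed the right rule rather than learning it, so the sketch only shows the evaluation step, not a real learning algorithm.

```python
import random

def memorizer(train):
    """A 'model' that just stores every training pair, like "say 1001011010010101"."""
    table = dict(train)
    return lambda x: table.get(x, 0)  # unseen inputs get a default guess of 0

def parity_rule(train):
    """A model with real structure: predict the parity of the input's bits.
    (Hypothetical shortcut: it ignores the training data; a real learner would have to find this rule.)"""
    return lambda x: bin(x).count("1") % 2

def accuracy(model, data):
    """Fraction of (input, label) pairs the model gets right."""
    return sum(model(x) == y for x, y in data) / len(data)

# Hypothetical task: label each 8-bit number with the parity of its bits.
random.seed(0)
data = [(x, bin(x).count("1") % 2) for x in range(256)]
random.shuffle(data)
train, test = data[:192], data[192:]  # hold back the last 64 examples

for name, learn in [("memorizer", memorizer), ("parity rule", parity_rule)]:
    model = learn(train)
    print(f"{name}: train={accuracy(model, train):.2f} test={accuracy(model, test):.2f}")
```

Both models score 100% on the training data; only the held-back data exposes the memorizer, which can do no better than its default guess on numbers it has never seen.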
I call these two measures, the percent correct and the measure of memorization, "utility" and "elegance". "Utility" measures how well the model fits the data, and "elegance" measures how likely the model is to work well on other data in addition to the original training set. The term "elegance" makes more sense when applied to the AIXI length prior and the OOPS speed prior than to the method of withholding some data until the end, and it's true that I'm biased toward methods like the length and speed priors: a measure of elegance based on what the model looks like, how 'nice' a model it is, rather than on withholding some data.
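Here is a minimal sketch of scoring models by both measures, using the bit-string example from above. It assumes a tiny model language invented purely for illustration (nothing like the Turing machines AIXI and OOPS actually use): a program is a triple meaning "say a prefix, then stick a suffix on the end some number of times", written as `prefix+suffix*times`. Utility is the fraction of matching bits, and elegance is just the negative length of the program's source text, a crude stand-in for a length prior. All function names are hypothetical.

```python
# Hypothetical toy model language: (prefix, suffix, times) means "say <prefix>,
# then stick <suffix> on the end <times> times". Source text is
# "prefix+suffix*times"; a plain "say X" program (times == 0) is written as X.

def run(program):
    prefix, suffix, times = program
    return prefix + suffix * times

def source(program):
    prefix, suffix, times = program
    return prefix if times == 0 else f"{prefix}+{suffix}*{times}"

def utility(program, data):
    """Fraction of positions where the output matches the data; length mismatches count as errors."""
    out = run(program)
    matches = sum(a == b for a, b in zip(out, data))
    return matches / max(len(out), len(data))

def elegance(program):
    """Shorter source text scores higher: a crude stand-in for a length prior."""
    return -len(source(program))

def exact_programs(data):
    """Every toy program that reproduces `data` exactly (utility 1.0)."""
    programs = [(data, "", 0)]  # the literal "say data" program always works
    for k in range(len(data)):
        prefix, rest = data[:k], data[k:]
        for size in range(1, len(rest) + 1):
            times, leftover = divmod(len(rest), size)
            if leftover == 0 and rest[:size] * times == rest:
                programs.append((prefix, rest[:size], times))
    return programs

def most_elegant(data):
    """Among the perfect-fit programs, pick the one with the shortest source."""
    return max(exact_programs(data), key=elegance)

data = "1001011010010101"
for program in [(data, "", 0), most_elegant(data), ("", "01", 8)]:
    print(f"{source(program):20} utility={utility(program, data):.3f} elegance={elegance(program)}")
print(source(most_elegant("01" * 32)))
```

For "1001011010010101", the most elegant perfect fit is `1001011010+01*3`, only one character shorter than quoting the data outright, which matches the intuition that this string has very little structure. The program `+01*8` is far shorter but gets only 62.5% of the bits right, so utility and elegance pull in different directions. A genuinely repetitive string like `"01" * 32` compresses to `+01*32`.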