Wednesday, September 27, 2006

Much Later, Much Better

AI Formalization

1. A Thing is a symbol containing a list of relations. Relations can be affirmed or denied in this list; those that are not either affirmed or denied are considered unknown. The list of relations is thus in the form of pairs and denials of such pairs; the first element of these pairs is the relation type, and the second is the value of the relation. It may also be allowable to list only the relation type, leaving the value unknown. The value of a relation is generally also a Thing, the symbol for which is given in the ordered pair. The relation points to the value, so that the whole group of Things might be thought of as a network of dots and arrows of different types.
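As a rough illustration of point 1, here is a minimal Python sketch of a Thing that keeps separate sets of affirmed and denied (relation, value) pairs and treats everything else as unknown. The class layout and the method names (affirm, deny, status) are assumptions made for this example, not part of the formalization.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)  # eq=False keeps identity-based hashing, so Things can sit inside sets
class Thing:
    """A symbol plus the relations it is known to have or known to lack."""
    name: str
    affirmed: set = field(default_factory=set)  # {(relation_type, value), ...}
    denied: set = field(default_factory=set)    # pairs explicitly denied

    def affirm(self, relation, value):
        self.affirmed.add((relation, value))

    def deny(self, relation, value):
        self.denied.add((relation, value))

    def status(self, relation, value):
        """Return True (affirmed), False (denied), or None (unknown)."""
        pair = (relation, value)
        if pair in self.affirmed:
            return True
        if pair in self.denied:
            return False
        return None

# Hypothetical example data: relation types and values are themselves Things.
colour, red, blue, green = Thing("colour"), Thing("red"), Thing("blue"), Thing("green")
pixel = Thing("pixel")
pixel.affirm(colour, red)
pixel.deny(colour, blue)
```

Here pixel.status(colour, green) comes back as None rather than False: nothing has been said about green, so it stays unknown.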

2. Relation-types are represented as Things. Thus, different relation-types may be related. For example, one relation may be the opposite of another, meaning it points in the other direction. "Opposite" would be a relation between relations.

3. The following count as variables of a Thing:
a. The relations possessed by that Thing
b. The variables of those relations
c. The values of those relations
d. The variables of those values
The possible values that these variables may take on can be called "states" (to differentiate from "values", which are defined more specifically as the Things pointed to in a relation). This recursive definition allows us to construct arbitrarily long variables that may be in different states. All the variables of any Things that a Thing is connected to are also variables of that Thing.

4. A Space of Things is defined as a Thing having one particular relation connecting it to all members of itself. (This can be any relation we choose.) Often, these members will have a special class of relations called "spatial relations" interconnecting them. If not, they are an unordered space. If they do have spatial relations, however, they may be of any form: the space could be 1-dimensional, 2-dimensional, or any number of dimensions; it could form a tree with any branching-number or no particular branching-number; it could be a tangled hierarchy; it could be a strange loop; it may be anything. The spatial relations can be stored as another special relation of the space, since relation-types are Things, and can be pointed to. (You can call this second special relation spatial-relation-ship.) The values of the spatial properties of a Thing can be called "neighbors" of that Thing; they are other Things within the space.

5. We define the properties of the Things in a space as those variables we are concerned with modeling in that space. The properties which are relations or relation values of a particular Thing can be called the "personal properties" of that Thing, while the properties that are inherited from neighbors might be called "contextual properties". Like the variables of a Thing, the properties of a Thing are defined recursively: any property of a neighbor can also be called a property of the Thing itself. The relations which point to the personal properties of a Thing can be called "Property-relations". The property-relations of a space can be stored in a third special relation of a space. (You could call it spatial-property-ship.)

6. We define a "data space" as an open collection of Things (meaning a space in which we know certain Things dwell, but in which other Things may also dwell, that we don't know about at the moment, but may know about later, as more data comes in) having defined spatial relations and property relations.

7. We define a class of Things as a Thing that organizes other Things into a set using an "instance" relation. A closed class is one determined completely by what members it has; no members can be added or taken away. (Also known as an extensional class.) An open class is defined instead by the properties of its members, meaning it can be added to when new Things are found that satisfy those properties. (Open classes are also known as intensional classes.) In order to define the properties of an open class, we refer to a template object. A template object is a Thing (not in the data) possessing all desired properties for class-membership. We can define a property positively or negatively, by affirming it or denying it for the template-object; if something has all the positive properties and none of the negative ones, it may be added to the list of instances. (Because we list specifically that something doesn't have a particular relation-value pair, and leave the matter undecided if it is not listed explicitly, we must both see that all positive requirements are met and see that all negative requirements are specifically negated. We cannot assume that a Thing meets all requirements just because it doesn't specifically list a negative value as something it has; if the property is unknown, it is unknown.)
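The membership test for an open class can be sketched as follows. This is only an illustration: Things are reduced to plain dictionaries holding an "affirmed" and a "denied" set of (relation, value) pairs, and the names matches, template, a, b and c are invented for the example.

```python
def matches(candidate, template):
    """True only if every pair the template affirms is affirmed by the candidate
    and every pair the template denies is explicitly denied by the candidate."""
    return (template["affirmed"] <= candidate["affirmed"]
            and template["denied"] <= candidate["denied"])

# Hypothetical template: red Things that are known not to be square.
template = {"affirmed": {("colour", "red")}, "denied": {("shape", "square")}}

a = {"affirmed": {("colour", "red")}, "denied": {("shape", "square")}}
b = {"affirmed": {("colour", "red")}, "denied": set()}  # shape is unknown
c = {"affirmed": {("colour", "red"), ("shape", "square")}, "denied": set()}

candidates = [("a", a), ("b", b), ("c", c)]
instances = [name for name, thing in candidates if matches(thing, template)]
```

Only a is admitted. b fails even though it never claims to be square, because an unknown property does not count as a denied one.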

8. We act as if all closed classes for a space already exist, meaning we can make open classes of closed classes with particular properties and search for such closed classes. For example, we can define the class "all pairs of Things which point to each other with relation X". This is actually an open class of all closed classes that contain two things, each of which points to the other with whatever relation is specified as relation X. In this use, it is convenient to call closed classes "aggregates"; we are searching over all possible aggregates for the particular aggregates which meet our demands. To this end, we need to define templates that contain more than one template-object; this allows us to talk in the abstract about different Things being related to one another. Two Things that point to one another would need to be represented as a template with two objects and two relations; without the ability to form such multi-object templates, it would be impossible to represent this. A class of one object can be thought of as a special case of the multi-object class. Each instance of a multi-object class is represented by a Thing that points to the Things in the data that fit the template-objects. These Things can be called the "parts" of the class instance. Each part is connected to the class-instance by a special relation with the same name as that given to the Thing standing in that place in the class template; thus, we give a unique name to every part-relation. A class formed by multiple parts may be called an aggregate class.
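The "pairs of Things which point to each other" example can be sketched as a brute-force search over all two-member aggregates. The data, the relation name "friend", and the part-relation names "first" and "second" are all made up for illustration; a real system would presumably search less naively.

```python
from itertools import combinations

# Hypothetical data: each Thing's name maps to its affirmed (relation, value) pairs.
data = {
    "alice": {("friend", "bob")},
    "bob": {("friend", "alice"), ("friend", "carol")},
    "carol": set(),
}

def mutual_pairs(data, relation):
    """Find every two-member aggregate whose members point at each other."""
    found = []
    for first, second in combinations(sorted(data), 2):
        if (relation, second) in data[first] and (relation, first) in data[second]:
            # Each class-instance names its parts after the template-objects.
            found.append({"first": first, "second": second})
    return found

pairs = mutual_pairs(data, "friend")
```

Bob points to Carol, but Carol does not point back, so only the alice/bob aggregate is an instance of the class.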

9. A possibility-space on some space is defined as a space of mutually exclusive classes-- classes that do not overlap by admitting the same members as each other. A well-defined possibility space is one that is formed by always specifying things about the same variables or combinations of variables. For example, if we have 2 property-variables in a space that we want to form possibility-spaces for, we can form a well-defined possibility-space by forming classes that only select a value for the first variable, or only for the second, or that always select a value for both variables. I could form possibility-spaces that are not well-defined by throwing in, say, "value A for var1", "value D for var2", and "value B for var1, value C for var2". These do not overlap, but the space is not well-defined because we do not know in general how to add more classes to the space in a systematic way. To notate which variables a well-defined space uses, we add them to its list of property-variables.

10. In a well-defined possibility-space, we call negated classes (classes known not to be in the possibility-space) "impossible", and the meaning of the negation is to signify that this value or combination of values cannot occur in the data. We say that affirmed classes (classes stated to be in the possibility-space) "exist", and the meaning of the affirmation is to say that an actual example of the class has been found in the data. The classes not talked about are "possible but unknown", meaning that as we get more data we do not know if an example will turn up or not.

11. A single-state variable is one that we only want to take on one value at a time; for example, a pixel is only one color at a time. (An example of a multi-state variable might be "child"; one person may have several children, but if we wanted to represent that here we wouldn't use a "children" relation to link to a list of children, rather we would use a "child" relation to link to each child. Thus the "child" relation for a person with several children would take on multiple values.) We can restrict a particular relation to one value by adding to the possibility space a negated class that has two of the relation. Since that class will match to any Thing with more than one of the relation, it has the meaning "there is no class with more than one of this relation".

12. A possibility-space is closed if we have a limited number of possible states for the variables involved; in other words, we have negated all but a finite set of states. We can make a possibility-space closed for a single-state variable by stating the non-existence of a class that, as its property-list, negates every value that we consider possible for that variable.

13. A well-defined possibility-space that restricts each of its variables to one value in some finite range of values may be called a regular possibility-space. The rest of the discussion will be (perhaps unfortunately) restricted to these.

14. A model of a data-space is defined as a set of regular possibility-spaces (or "grids") over all of the property-variables for that data-space, with dependence-information for each multivariable possibility-space. A default model for a data-space is one which includes an individual well-defined possibility-space for every single variable, with no multivariable possibility-spaces, and has no affirmed or negated members of any of the possibility-spaces (they are all empty, meaning everything is possible but unknown). More complex models are built up by affirming and negating possibilities, and also by creating multivariable possibility-spaces. When we create multivariable grids, we can bring in contextual properties of a Thing rather than just the personal properties; a Thing's personal properties can be limited by its surroundings. When a multivariable possibility-space is created, we need to add dependence-information; this means we need to decide which variables are chosen first, and which are chosen based on those first choices. If there are any limitations on the variables, then choosing a state for one variable will tend to narrow the options for the other. This is important in the things to come. Either each variable for a grid is chosen in a particular order, so that we have a first, second, et cetera, or we can group certain variables together and choose them simultaneously, meaning that we choose from all possible combinations of the two instead of choosing them one at a time. The default for dependency is choosing all variables at once in this way. Contextual properties are not included in this order; they are treated as pre-existing, already chosen. (Note: it would work fine to allow contextual properties to be dependent on personal properties, but it would be more work to calculate, and we should be able to represent all possible relationships without the added representational power. In other words, all situations in which an outside variable depends on a personal variable can be broken down into the more standard type of relationship by looking at it from the other Thing's point of view.)

15. The probability of a space given a model of that space is calculated as the product of the probabilities of each Thing in that space, which is given by the model; we calculate from the model by looking at the probability of each variable given the actual selections of the variables they are dependent on, including contextual variables (if any dependence on context has been noted). If variables are supposed to be chosen simultaneously, then we record the probability of their particular joint value (out of all available values). This is relatively simple. But in actuality, there are two possible ways to perform this calculation: we can define "all possible states" as all states that are not negated in the possibility space, or as all states that are affirmed. The second will often be much more strict than the first (and therefore give higher probabilities). The first I call the broad interpretation of a possibility-space, the second the narrow interpretation. The two methods are ultimately impossible to reconcile, because they represent different ways of thinking, and different assumptions about the world: the first represents the assumption that anything is possible unless proven impossible, and the second represents the assumption that everything not proven possible must be impossible. These two ways of thinking will be developed as semi-symmetrical opposites in the following development; no attempt will be made in the current document to suggest which type of reasoning is better in what case, and the only attempt at unification I can currently put forward is the hope of reducing the unknown by affirming/negating more of the possibilities, so that the two interpretations of a model approach each other.

16. An existence-model is a model made with no negations, only affirmations, in its possibility-spaces; it is interpreted narrowly, by seeing only affirmed classes as possible. I also use the term to refer to just the affirmative part of mixed models, particularly when applying the narrow interpretation to them. An existence-model reasons about what we know exists, using the assumption that anything else does not. It has the flaw of ruling out too many possibilities.

17. An impossibility-model is a model constructed only by negating certain classes, interpreted broadly (by assuming that only negated classes are impossible). I also use the term to refer to just the negative part of mixed models, particularly when applying the broad interpretation to that model. An impossibility-model reasons about what we know to be impossible, using the assumption that everything we can't prove impossible is a possibility. It has the flaw of assuming too many possibilities.

18. The utility of a model is defined as the probability of the data given that model. The more likely a model makes the data, the more useful that model is. If a model makes the data 100% probable, then it has the highest possible utility: 1. If it makes the data impossible, then it has the lowest possible utility: 0.
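To make the two interpretations concrete, here is a small sketch that computes utility for a single-variable possibility-space. It assumes each observation is chosen uniformly from whatever states count as available, which is my reading of "out of all available values"; the function names and the colour data are hypothetical.

```python
from fractions import Fraction

def available_states(all_states, affirmed, negated, interpretation):
    """States a variable may take under the chosen reading of the space."""
    if interpretation == "broad":   # possible unless negated
        return [s for s in all_states if s not in negated]
    if interpretation == "narrow":  # possible only if affirmed
        return [s for s in all_states if s in affirmed]
    raise ValueError(f"unknown interpretation: {interpretation}")

def utility(observations, all_states, affirmed, negated, interpretation):
    """Probability of the observations, assuming a uniform choice among available states."""
    available = available_states(all_states, affirmed, negated, interpretation)
    probability = Fraction(1)
    for observed in observations:
        if observed not in available:
            return Fraction(0)  # the model rules out something that happened
        probability *= Fraction(1, len(available))
    return probability

# Hypothetical mixed model over five colours.
colours = ["red", "green", "blue", "black", "white"]
affirmed = {"red", "green"}
negated = {"black"}
observations = ["red", "green", "red"]

broad = utility(observations, colours, affirmed, negated, "broad")    # (1/4)**3
narrow = utility(observations, colours, affirmed, negated, "narrow")  # (1/2)**3
```

The narrow reading leaves only two colours available and so scores the same data higher (1/8 against 1/64), but it would give a probability of 0 to the first blue pixel that turned up.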

19. Utility is not a complete measure of how good a model is, because it is easy to find the model that gives the data 100% probability: we simply define the current data to be the only data possible. (This can be either an existence model with one large aggregate class, or an impossibility model that rules out everything else.) This model is obviously very bad. So we need a second measure: elegance. The simpler a model, the more elegant it is. The 100% model is inevitably exactly as complicated as the data itself; this is horribly bad elegance. Elegance is actually quite easy to define: it is the probability of the possibility-space itself, under the default model. If there are 10 states total in the possibility-space, each new negation or assertion we add gets 1/10 probability, because it's calculated as if its traits were chosen randomly. (If there are multiple variables, it comes out to the same probability: 5 states times 4 states = 20 possible states, and 1/5 probability for one variable's choice times 1/4 probability for the other = 1/20 total probability for each class negated or asserted). Thus the total elegance of a possibility-space equals one over the number of states, raised to the power of the number of classes mentioned in the possibility-space, and the elegance of a model is the product of the elegance of each possibility-space involved. (A possibility-space with no negations or assertions is totally elegant.)
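The elegance arithmetic can be checked with a few lines of Python. This sketch takes each class mentioned to contribute a factor of one over the total number of states, as in the 1/10 and 1/20 examples; the function names are invented for the illustration.

```python
from fractions import Fraction
from math import prod

def space_elegance(states_per_variable, classes_mentioned):
    """Elegance of one possibility-space: (1 / total states) ** classes mentioned."""
    total_states = prod(states_per_variable)
    return Fraction(1, total_states) ** classes_mentioned

def model_elegance(spaces):
    """Elegance of a model: the product over its possibility-spaces."""
    return prod((space_elegance(v, k) for v, k in spaces), start=Fraction(1))

single = space_elegance([10], 3)     # three classes mentioned over 10 states
joint = space_elegance([5, 4], 1)    # one class mentioned over 5 * 4 = 20 states
untouched = space_elegance([10], 0)  # nothing affirmed or negated
model = model_elegance([([10], 3), ([5, 4], 1)])
```

An untouched space has elegance 1, and every additional affirmation or negation can only lower the total.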

20. Since a possibility-space is treated just as a normal space is in many ways, including how the probability is calculated, we can make models of it in the same way as well. If a space is patternistic, we can increase its probability by making a model of it that has a high utility-- a model that increases the probability of the space. Similarly, we can increase the elegance rating of a space by recognizing patterns in it. Models we make of models increase the probability (elegance) in exactly the same way models increase the probability of the data. In other words, in this system, models ARE data-- they can be reasoned about in the same way. A model of a model is called a metamodel.

21. The probability that any particular model is true is the elegance of that model multiplied by its utility. (If we've made a second model modeling the first, we multiply its elegance into this measure. If we've made a third, it goes in as well. Et cetera.) This is a fundamentally unjustifiable step, in the sense that we cannot ever really hope to know if any particular model is actually true or false-- even assigning it a probability requires a big assumption about how the world works, and about the typicality of the data you're getting. But this step is necessary for the generation of new knowledge. The remainder of this document discusses how to use this fundamental assumption to search for new and better models. The essential goal is to maximize the probability of the model.

22. For impossibility-models, the default is a blank possibility-space. The goal is to increase the probability of the data by adding limitations to that space, while avoiding the inelegance of eliminating too many possibilities too specifically. Eliminating possibilities reduces the options for the data, making the current data more likely; but if we ever eliminate a possibility that actually occurs in the data, the data drops to 0% probability and we must abandon that model. This can happen when new data comes in that contradicts our model of the old data. The two basic courses of action are to check each possibility in a particular space as a candidate for elimination, and to search for other possibility-spaces that may be able to contain better models. Better spaces are arrived at by finding a relationship between two personal properties or between a personal property and a contextual property, in which some value of one property limits the possible values the other can take on. We can then represent this limitation by way of a possibility-space. (Note: variable dependence also should be a part of a model. The default is everything being independent. Other possibilities simply have the effect of changing the probabilities of the various classes involved in certain situations. If the distributions match better, they should be adopted.)
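The first course of action, checking each possibility as a candidate for elimination, might look something like the greedy loop below. This is a sketch under several assumptions of mine: a single variable, the broad interpretation with a uniform choice among the states left, a score of elegance times utility, and a negation kept only when it raises that score. The function names and the colour data are hypothetical.

```python
from fractions import Fraction

def score(all_states, negated, observations):
    """Elegance times utility for an impossibility-model over one variable."""
    allowed = [s for s in all_states if s not in negated]
    if not allowed or any(o not in allowed for o in observations):
        return Fraction(0)  # the model contradicts the data (or allows nothing)
    elegance = Fraction(1, len(all_states)) ** len(negated)
    utility = Fraction(1, len(allowed)) ** len(observations)
    return elegance * utility

def greedy_eliminate(all_states, observations):
    """Negate states one at a time for as long as doing so raises the score."""
    negated = set()
    improved = True
    while improved:
        improved = False
        best = score(all_states, negated, observations)
        for state in all_states:
            if state in negated:
                continue
            trial = score(all_states, negated | {state}, observations)
            if trial > best:
                negated.add(state)
                best = trial
                improved = True
    return negated

colours = ["red", "green", "blue", "black", "white"]
few = greedy_eliminate(colours, ["red", "green", "red"])
many = greedy_eliminate(colours, ["red", "green"] * 5)
```

With only three observations no negation pays for its cost in elegance, so few is empty; with ten observations of red and green, many contains blue, black and white. An observed state is never negated, since that would send the score to 0.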

23. Existence models are somewhat less intuitive. The default is still an empty space, but this time that means no assertions, which means nothing is yet considered possible. So if we have any data at all, then the default model has a 0% probability because it rules out things that happen in the data, giving the model absolutely no utility. To rectify this, every existence model must at the bare minimum accept all situations that actually occur as possibilities. A second strange property of existence models is that, whereas impossibility-models tend to try to get higher utility by compromising some elegance, existence-models tend to do the opposite: compromise some utility for greater gains in the area of elegance. New models are made by making generalizations about which classes exist; in other words, by making a model of the model (a "metamodel", as I said earlier). This increases the elegance by coming up with a more abstract classification of what exists and what doesn't, but often compromises some utility by considering more situations possible. The only other type of progress is finding better grids for which the relationship between the variables is more definite, just as with impossibility-grids. (Again, variable-dependence can be included.)

And that's it. Once models have been created, they can be used for deduction and guidance of goal-oriented behavior. (This isn't trivial, since I want the system to guide its own thoughts, but a more naive non-self-guided version does follow quite simply.) The theory still has a few holes in it; some things are described above in a fairly sketchy manner, and other things are described in a more limited domain than that of all possible problems (the most obvious restriction being the restriction to regular possibility spaces). But this is the general idea. Better descriptions of what's here, along with improvements to the overall system, will of course be forthcoming.