Thursday, March 29, 2007

Well, far from coming up with complexifications over the intervening time, I've come up with two things: a fairly convincing argument about why I don't need any more improvements, and yet another (slight) simplification. The simplification is pretty short, so I'll give it first.

Basically, "2a" from the previous post was unnecessary. It will happen anyway, as a result of carrying out the other operations. It might help things along to include it separately, but then again it might make things needlessly complicated.

Here's how it emerges on its own:

Remember, 2a is: view the context of an object as a property of the object. This means that in addition to concrete properties, such as color, shape, or whatever it may be, an object also has a "normal context" that it appears in, and this is just as important a property. When finding patterns, the system must look for predictors of redness, curviness, or whatever it may be, but should also watch out for predictors of these context-variables. This allows the system to notice if several objects that have appeared before in a particular context suddenly start appearing together in some new context, too; it can then predict that any object that might have appeared in that old context is prone to appear in the new one. (This is perhaps a clearer explanation than I gave in the previous post.)

The reason this isn't necessary is that we're already recording contexts of objects and finding patterns in them (#2). What we're really doing when we decide that objects appearing in the new context have a common old context is deciding that the new context and the old context appear to share properties; in particular, that they appear to share at least some items. Our probabilistic model of the behavior of contexts in general will then start drawing conclusions based on this similarity, which is what we want.
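To make the "contexts share items" idea concrete, here is a minimal sketch. It assumes a context can be represented simply as the set of items seen in it, and it uses Jaccard overlap as the similarity measure; both choices, and the names `shared_fraction` and `expected_newcomers`, are mine for illustration and not part of the system described above.

```python
def shared_fraction(context_a, context_b):
    """Jaccard overlap: shared items divided by all items seen in either context."""
    a, b = set(context_a), set(context_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def expected_newcomers(old_context, new_context):
    """Items known from the old context that have not yet shown up in the new one."""
    return sorted(set(old_context) - set(new_context))


# Hypothetical example data: items seen in an old context and in a new one.
old_context = {"stapler", "mug", "monitor", "keyboard"}
new_context = {"stapler", "mug"}

similarity = shared_fraction(old_context, new_context)
newcomers = expected_newcomers(old_context, new_context)
```

If the overlap is high enough, the items in `newcomers` are the ones the system would start expecting in the new context. How high is "enough" is left open here.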

That makes the reduced model as follows:

-Record probabilities of objects (objects being aggregates of the basic sense-pieces). These records are called "models", and are used to predict unknown data, and possibly to remove static from known data.
-Interpret data by recognizing learned objects, and treat these interpretations as data (finding patterns in the objects in the same way one finds patterns in raw data).
-Look at the models as data as well, attempting to predict them and remove errors in the usual way. (Even if it isn't considered important to remove static from raw data, it's unquestionably wise to attempt to remove errors from models; models definitely have potential errors, because data is almost never entirely unbiased.)
-Record the probabilities of various contexts for each object.
-Treat these context-models as data in the same way.

This can probably be simplified even further in various ways. For example, the second point might be simulated by looking for larger and larger objects with the first point, and then looking for patterns in these large objects to simplify them, and compensating in various other ways. But this would probably not be an improvement in the clarity of the system, and other possible simplifications that occur to me have the same problem; they make the abilities of the system less directly related to the immediate behavior of the system. But, of course, if a simplification is sufficiently simplifying, it's worth it.

Hey, never mind, 2a is totally necessary: a new object might be found that is constructed entirely of context-properties. For example, we might define a class of pictures that consists of a drawing of a fruit with a drawing of some object from an office sitting next to it. Neither "fruit" nor "object from an office" is defined by visual characteristics. This may not seem like a particularly critical class of pictures for a person to be able to learn, but the fact is that a person could learn it. It is possible to learn objects composed of items that are defined only by their normal context.

Now that that reduction is invalidated, let's see if any others are possible. Can 2a replace the basic search for patterns in context-spaces, then? No. 2a relies on this search. Without it, context-spaces would be a disarray of unrelated stuff; there could barely be a 2a.

One idea comes to mind. Recording the probabilities of different aggregates and recording the probabilities of different contexts for singles or aggregates is essentially the same thing. If I had all context-probabilities for, say, a yellow pixel, I could calculate the probability of all aggregates containing yellow pixels. Likewise, if I started with all those aggregate probabilities, I could calculate the probability of each possible yellow-pixel context. This suggests, of course, dropping one or the other process. What makes me hesitant is that, although it is clear that the two are equivalent, it is not so clear that the same sorts of patterns can be found in both. The basic benefits of the two processes might be similar, but the benefits resulting from modeling each may be different. Recording contexts, for example, provides the material that is used in the step 2a we've been discussing. Can recording objects do the same?
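The round trip between aggregate probabilities and context probabilities can be sketched on a toy case. The assumptions here are mine: data is a one-dimensional string of symbols, an "aggregate" is just an adjacent pair, and a "context" of an item is its right-hand neighbor. Note that going from contexts back to aggregates also needs the item's own probability of occurring on the left of a pair (`mass` below), which I assume is recorded alongside the context probabilities.

```python
from collections import Counter
from fractions import Fraction


def pair_probs(seq):
    """Probability of each adjacent pair, a toy stand-in for aggregate probabilities."""
    pairs = list(zip(seq, seq[1:]))
    total = len(pairs)
    return {pair: Fraction(n, total) for pair, n in Counter(pairs).items()}


def right_contexts(pairs, item):
    """From aggregate probabilities alone, derive P(right neighbor | item on the left)."""
    mass = sum(p for (left, _), p in pairs.items() if left == item)
    ctx = {right: p / mass for (left, right), p in pairs.items() if left == item}
    return mass, ctx


def pairs_from_contexts(item, mass, ctx):
    """Rebuild the probabilities of all pairs starting with item from its contexts."""
    return {(item, right): mass * p for right, p in ctx.items()}


seq = "yybyybyr"  # y = yellow, b = blue, r = red (hypothetical pixel row)
pairs = pair_probs(seq)
mass, ctx = right_contexts(pairs, "y")
rebuilt = pairs_from_contexts("y", mass, ctx)
```

Here `rebuilt` matches the entries of `pairs` that start with `"y"`, so in this toy setting neither record contains information the other lacks. Whether the same patterns are equally easy to find in both is a separate question, which is the one raised above.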

The added variables of an object would not be context, then, but rather object-involvement: which larger objects the smaller object could fit into. (As with context, this added variable doesn't represent what the object is currently involved in, but what it can be, or has been, involved in. It's a class property, not an instance property.)

This is obviously perfectly equivalent to using contexts as variables. However, there is one more consideration: what about the patterns found in contexts? Are they the same patterns that can be found in objects?

Some of them are. For example, it's possible to find smaller, repeated sub-objects within larger objects and across larger objects. A good example: children actually learn word meanings before syllable meanings in some cases, only later realizing that particular syllables seem to have consistent meanings across words. But there is another type of pattern that doesn't seem to be covered: consistencies in probabilities of various contexts. When two different patterns occur repeatedly in similar situations, they should be deemed "the same" (so-called variable patterns or quotient-objects). This can be taken care of, however, simply by realizing that (1) I've already declared that the larger patterns an object occurs in are class variables, and (2) I've been working under the assumption that the system looks for patterns between classes.
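One way to picture "deemed the same because they occur in similar situations" is to compare the context distributions of two classes. The sketch below is only an illustration under my own assumptions: a context is the (left neighbor, right neighbor) pair around an occurrence, and similarity is the probability mass the two distributions share. The function names and the example sentence are hypothetical.

```python
from collections import Counter


def context_distribution(seq, item):
    """Relative frequency of each (left, right) neighbor pair around occurrences of item."""
    ctx = Counter()
    for i, x in enumerate(seq):
        if x == item and 0 < i < len(seq) - 1:
            ctx[(seq[i - 1], seq[i + 1])] += 1
    total = sum(ctx.values())
    return {c: n / total for c, n in ctx.items()} if total else {}


def shared_mass(p, q):
    """Probability mass common to both distributions: 1.0 if identical, 0.0 if disjoint."""
    return sum(min(p.get(c, 0.0), q.get(c, 0.0)) for c in set(p) | set(q))


words = "the cat sat . the dog sat . the cat ran . the dog ran .".split()
cat = context_distribution(words, "cat")
dog = context_distribution(words, "dog")
sat = context_distribution(words, "sat")

cat_vs_dog = shared_mass(cat, dog)
cat_vs_sat = shared_mass(cat, sat)
```

In this toy data "cat" and "dog" always appear in the same surroundings, so they would be candidates for a single variable pattern, while "cat" and "sat" share no contexts at all.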

Everything seems to add up.

Reduced system:

(1) We've got data made of basic sensory percepts linked up in some way. Each basic percept is represented by a class.
(2) We record the probabilities of various aggregations of percepts in some way. Each possible aggregate (or at least each aggregate that the system decides is worth noting) is a class.
(3) Interpretations of the data are created using the aggregates, replacing lower percepts with aggregate labels (corresponding to the aggregate classes). These interpretations can then be treated as data, and all properties of the classes are available as variables of the labeled aggregates.
(4) If a class occurs as part of a larger aggregate that is recorded, then that fact is recorded within the class; in other words, the likely contexts of a class's instances are a variable of the class.
(5) The entire range of classes is also treated as data, which the system tries to find patterns in.
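The five points above can be sketched as a small data structure. This is a rough illustration under heavy assumptions of my own, not a definitive design: percepts are symbols in a one-dimensional sequence, the only aggregates considered are adjacent pairs, interpretation is a greedy left-to-right pass, and the names `ClassRecord`, `ReducedSystem` and `min_count` are hypothetical.

```python
from collections import Counter


class ClassRecord:
    """One class: a basic percept or an aggregate of percepts."""

    def __init__(self, name):
        self.name = name
        self.count = 0              # how often the class has been observed
        self.contexts = Counter()   # (4) larger aggregates it has been part of


class ReducedSystem:
    def __init__(self):
        self.classes = {}

    def _get(self, name):
        if name not in self.classes:
            self.classes[name] = ClassRecord(name)
        return self.classes[name]

    def observe(self, seq):
        # (1) every basic percept is represented by a class
        for percept in seq:
            self._get(percept).count += 1
        # (2) record aggregates; here only adjacent pairs, as a toy choice
        for a, b in zip(seq, seq[1:]):
            aggregate = (a, b)
            self._get(aggregate).count += 1
            # (4) membership in the aggregate is recorded within each part's class
            for part in {a, b}:
                self._get(part).contexts[aggregate] += 1

    def interpret(self, seq, min_count=2):
        # (3) replace percepts with aggregate labels where a known pair matches
        out, i = [], 0
        while i < len(seq):
            pair = tuple(seq[i:i + 2])
            known = self.classes.get(pair)
            if len(pair) == 2 and known is not None and known.count >= min_count:
                out.append(pair)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        return out

    def class_table(self):
        # (5) the classes themselves, laid out as rows of data to search for patterns
        return [(r.name, r.count, dict(r.contexts)) for r in self.classes.values()]


system = ReducedSystem()
data = list("abcabcab")
system.observe(data)
interpretation = system.interpret(data)
table = system.class_table()
```

The pattern search itself (over interpretations and over `class_table()`) is deliberately left out; the sketch only shows where each kind of record would live.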

Well, the discussion of the reduction turned out longer than I'd planned, so the other part will have to wait until later.