Saturday, May 27, 2006


Always what matters is how higher concepts are tacked onto data.

Always, there must be "mappings". Time is always mapped. Space is generally mapped. To detect all patterns, all things must be mappable; a lack of mapping merely indicates a lack of common pattern. A mapping is an ability to look for commonality. In a visual grid, we would map each gridpoint to every other, so that we could search for a picture in one place or another, and recognize it as a repeat. On the other hand, we would not map them randomly; we would map the points according to the grid that they fall into, so that, say, (3,4)-(8,12) (a line) would map onto (4,4)-(9,12) or (3,0)-(8,8) or (7,14)-(12,22). We would not want to map it onto, say, a particular scattering of random points; not without evidence. Perhaps a pattern exists between that set of random points and that line; but it is not a property of the default space.
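The grid mapping described above can be sketched in a few lines. This is only an illustration, not a proposed implementation: a pattern is taken to be a set of grid points, and the "mapping" relates it to every translated copy of itself.

```python
# A minimal sketch of the default grid mapping: a pattern is a set of
# points, and mapping relates it to each of its translated copies.
def translate(pattern, dx, dy):
    """Map each point of a pattern to its counterpart under a shift."""
    return {(x + dx, y + dy) for (x, y) in pattern}

line = {(3, 4), (8, 12)}  # endpoints of the line (3,4)-(8,12)

# The mapped equivalents named in the text:
assert translate(line, 1, 0) == {(4, 4), (9, 12)}
assert translate(line, 0, -4) == {(3, 0), (8, 8)}
assert translate(line, 4, 10) == {(7, 14), (12, 22)}
```

A scattering of random points would require a mapping of its own, constructed from evidence; it is not reachable by any translation, which is the sense in which it lies outside the default space.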

The point is that visual "pixels" (or any kind of basic sensory grain) usually do not stand on their own, to be compared en masse against every other grain, but come with some relations attached, generally forming some sort of space. A machine could eventually learn some of these mappings by such brute-force comparison; if the activity of a pixel can to some extent predict that of nearby pixels (and it should), then a set of correlations can be derived; the more similar in activity, the closer two pixels ought to be. But what then? The machine must be able to extend this "closeness" into the type of mapping that I mentioned. When a net of close pixels behaves in a particular way (either behaving similarly to each other, or following some other pattern), then the same activity in other pixel groupings must be seen as the same.
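The brute-force comparison step can be made concrete. Here is a sketch, under the assumption that each "pixel" is just a list of activity values over a run of moments, and that plain Pearson correlation stands in for whatever similarity measure the machine would actually use:

```python
import math

def correlation(a, b):
    """Pearson correlation of two equally long activity traces."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = math.sqrt(sum((x - ma) ** 2 for x in a))
    vb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (va * vb)

# Three hypothetical "pixels" observed over six moments;
# the first two tend to move together, the third does not.
p1 = [0, 1, 1, 0, 1, 0]
p2 = [0, 1, 1, 0, 1, 1]
p3 = [1, 0, 0, 1, 0, 1]

# Higher correlation is read as "closer" in the recovered space.
assert correlation(p1, p2) > correlation(p1, p3)
```

From a full table of such pairwise correlations, the machine would still have to recover the grid itself, which is the harder step the paragraph points at.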

But I digress. The intelligence should have this grid already set out for it, and I only want it to be able to form it for itself to make sure that it has "complete intelligence". My original statement:

Always what matters is how higher concepts are tacked onto data.

The simplest way to tack on higher concepts is to notice direct repetition: exact likeness in two different places (two places that are "mapped", that is; related either in time or in space). The minimum requirement to find all static patterns is the ability to group any two patterns together if they look alike, whether they occur in data or as a grouped pattern. Two could then turn to three, three to four, et cetera, until all (or most) occurrences of the pattern had been found. It may be simpler, of course, to perform the full search at once, marking all instances of a pattern.
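The "full search at once" version is easy to sketch for one-dimensional data. The helper below is only illustrative; it marks every position where a candidate pattern repeats exactly:

```python
def occurrences(data, pattern):
    """Mark every position where the pattern repeats exactly."""
    k = len(pattern)
    return [i for i in range(len(data) - k + 1) if data[i:i + k] == pattern]

data = list("abcabxabc")
assert occurrences(data, list("abc")) == [0, 6]
assert occurrences(data, list("ab")) == [0, 3, 6]
```

Note that the smaller pattern "ab" is found everywhere the larger "abc" is, plus once more, which is exactly the core-versus-larger-pattern situation discussed below.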

When searching for such static patterns, there may be (there are likely to be) options and compromises.

One problem: a larger pattern will likely exist that incorporates a smaller pattern, but has fewer instances. The solution here is to split the larger pattern up into various additions to the smaller pattern. The smaller pattern acts as the "core" of the larger pattern (or at least as a piece of it). Another problem: two patterns will often overlap, so that parts of the data could belong to either. In a machine mind, one could set things up so that they did not interfere with each other in doing so. But in a human mind, they will: the mind flips back and forth between the two interpretations of the data. The solution here, I think, is that both interpretations should be "kept in mind", allowing the intellect to choose between them based on their merits.

But sometimes a pattern actually requires something like overlap; for example, in finding simple patterns with numbers, one must recognize that one number is greater than (or less than) the last. Together they form a pair with a particular slope. But an elongated run with slope forms many such pairs, and each number must belong to two pairs: one with the number before it, and one with the number after it. One possible solution would be to allow for detection of patterns of overlap. Another might be to use a different sort of object to detect slopes and similar patterns. This object would be a "relation". The relation in question would be "higher-than" and "lower-than", possibly getting more specific (one-higher-than, two-lower-than...). Data can be transformed into sequences and then groups of such relations. Sequences of such relations should be treated just as raw data, meaning that they can be patterned according to relations as well.
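The transformation into relation-sequences can be sketched directly. This toy version ignores the equal-values case and records only higher-than/lower-than with a specific amount:

```python
def relations(values):
    """Rewrite a numeric run as its overlapping higher/lower relations."""
    return [("higher-than", b - a) if b > a else ("lower-than", a - b)
            for a, b in zip(values, values[1:])]

run = [2, 4, 6, 8]
# Every number belongs to two pairs, yet the relation-sequence is simple:
assert relations(run) == [("higher-than", 2)] * 3

# A sequence of relations is itself data, so it can be patterned again;
# here, one relation repeated three times is the signature of a slope.
assert len(set(relations(run))) == 1
```

The overlap problem dissolves because each pair is represented once in the new sequence, rather than each number having to sit inside two competing patterns.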

Relations can be learned, based on patterns that occur often in data, but they should also be default items when it comes to such things as numbers and colors. With textual data, each letter can be treated individually; and the same is true for visual data that uses a small number of possible colors, such as 16 or less. But with data that delivers quantities instead of qualities, default relations must exist. If these are numerical (and most will be), then these relations consist mainly of less-than/more-than. Specifics may be given (2-less-than, 18-more-than), but not necessarily; with colors, probably not. Relations may be only addition-based, or may be multiplication-based, or may even be power-based. In addition to numerical default relations, one could use any other imaginable scheme; for example, a variable might be three-dimensional, but not based in a grid; rather based on an alternative 3D tiling. Variable-states could be related in an arbitrary network; a hierarchy; a number-line that branches and loops; whatever.
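To make the "arbitrary network" idea concrete, here is a hypothetical non-numeric relation scheme: four color states related in a loop rather than along a number line. Everything here (the states, the relation names) is invented for illustration:

```python
# A looped relation scheme: states connected in a cycle, not a line.
hue_next = {"red": "yellow", "yellow": "green",
            "green": "blue", "blue": "red"}

def relate(a, b):
    """Name the relation between two states under this looped scheme."""
    if hue_next[a] == b:
        return "next-hue"
    if hue_next[b] == a:
        return "previous-hue"
    return None  # not directly related in this scheme

assert relate("red", "yellow") == "next-hue"
assert relate("blue", "green") == "previous-hue"
assert relate("red", "green") is None
```

The same shape of code would serve for a hierarchy or a branching number-line; only the table of connections changes.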

Problems to be addressed:
(1) In a way, I'm still only talking about static patterns. Variable patterns are also necessary.
(2) I didn't actually provide any way for such relations to be learned; I only said that it would be possible.

These two problems should be related: "variable patterns" create "variables", and "relations" connect "variable states".

It all starts when two variables of the same type predict each other.

This creates a mapping of one value onto another; a "function".

A function does not necessarily mean an order; it only does so if each value is uniquely mapped onto another. Even if this is true, the order may not encompass all values; the function may only partially order the values, or may make multiple, separate orders (so that groups of values are related to each other, but not all values can be related). Additionally, there may be multiple such mappings, due to multiple different learned sets of relations. One could imagine some confusing schemes indeed.
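The "multiple, separate orders" case can be shown concretely. A sketch, assuming the learned function is a simple successor table with no loops: follow the table from each starting value, and each chain that emerges is one of the separate orders.

```python
def chains(mapping):
    """Follow a value-to-value mapping to recover the orders it induces.
    Assumes the mapping contains no loops."""
    starts = set(mapping) - set(mapping.values())  # values nothing maps to
    out = []
    for s in starts:
        chain = [s]
        while chain[-1] in mapping:
            chain.append(mapping[chain[-1]])
        out.append(chain)
    return out

# One mapping, two separate partial orders, as described in the text:
succ = {1: 2, 2: 3, 10: 20, 20: 30}
assert sorted(chains(succ)) == [[1, 2, 3], [10, 20, 30]]
```

The values 3 and 30 are never compared: each group is internally ordered, but the mapping says nothing about how the groups relate.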

And, yes, "mapping" is the proper word. Relations are what order space and time, just as we are using them now to order variables.

And now a digression. I still have not written exactly how variable patterns are learned, which is part of my original point (see first sentence). But I will now clarify the nature of mappings (or attempt it) with examples.

Chiefly, I must clarify that mapping is somewhat different from ordering. If something is mapped, but not ordered, over time, then we have many "moments", but do not know in what order they occurred. Each moment is separate, but moments may be compared. We may still look for repeated patterns in moments; moments may resemble each other either in whole or in part. Moments may be classified by their similarities. If such classifications reduce the number of options for other variables, a predictive pattern has been found.

If time is ordered, the difference is simply that a particular moment may predict other nearby (or, in principle, even farther-away) moments. To look for such patterns, we "collapse" everything into an abstract idea of "the current moment" and "nearby moments". All moments fall under each category, but there is order within the broad groups; each "present" has an associated "past" and "future". We then collapse the present category by various classification schemes. When we do so, the past and future collapse likewise. If, for a particular scheme, the future or past becomes more ordered upon collapsing, we have found a pattern.

If we only want to look for instantaneous temporal influences, in which one moment affects the next, we can collapse everything into two categories: past and future. Each moment counts as both a "past" moment with an associated immediate "future" (the next moment), and as a "future" moment with an associated immediate "past" moment. Using more moments than this in the mapping simply looks for longer-term influences; for example, three units can detect influences that skip one moment before showing. For a computer scientist, this is an Nth-order Markov model, but with more complicated states than a single multivalued random variable.
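The two-category collapse amounts to counting transitions, as in a first-order Markov model. A minimal sketch, assuming each moment has already been classified down to a single label:

```python
from collections import defaultdict

def transition_counts(moments):
    """Treat each moment as a 'past' paired with its immediate 'future',
    and tally how often each future follows each past."""
    counts = defaultdict(lambda: defaultdict(int))
    for past, future in zip(moments, moments[1:]):
        counts[past][future] += 1
    return counts

seq = ["a", "b", "a", "b", "a", "c"]
t = transition_counts(seq)
assert t["a"] == {"b": 2, "c": 1}   # after "a", the future is constrained
assert t["b"] == {"a": 2}           # after "b", it is fully determined
```

A scheme has found a pattern exactly when a row of this table is more concentrated than the overall distribution of moments; widening the window from pairs to triples gives the longer-term influences mentioned above.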

For "completeness", we would want an AI to be able to expand the field of mapping arbitrarily far. This means to me that any moment points to the next moment and the last moment, and that any such ordered and mapped variable (be it space, time, or otherwise) is able to associate in this manner. Alternatively, one could simulate this pointing by re-accessing the larger data structure each time. This simply means that one must remember the location of everything one is thinking about. But the choice here only affects computation speed and memory efficiency, so I'll dwell on it no longer.

So: we can have mapping without ordering. Can we have ordering without mapping?

I was wrong to say "And, yes, 'mapping' is the proper word. Relations are what order space and time, just as we are using them now to order variables." Here we have ordering without mapping. Maybe.
