Well, I keep coming up with progressively simpler ways to describe the whole system, which is good. But I've also figured out that the system cannot learn arbitrary models in the Turing-complete sense, which is bad. So I'll be revising the whole thing to be more complicated soon (if I think of a solution!). But, for now, here's the shortest description yet, which simplifies matters by using only objects rather than objects and logical rules, the two being equivalent anyway.
The data is represented as a graph with labelled directed edges. For those who don't know graph theory, that's a network of points connected by arrows that come in different types. The different types of arrows (a.k.a. the different labelings of the edges) represent different logical relations between the nodes, which represent single things (of a general sort).
These relations are divided into spatial relations and property relations. Spatial relations point to other data-objects (the archetypal spatial relation being "next-to"), while property relations point to nodes representing values of properties. If the data is a digital photo, for example, the spatial relations would be "above", "below", "right-of", and "left-of". The property relations would be "red", "green", and "blue", and would point to non-data nodes representing every number from 0 to 255. (These number nodes would have their own spatial relations: one-bigger-than and one-smaller-than.)
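To make the photo example concrete, here is a minimal Python sketch of such a graph, stored as a set of (source, label, target) triples. The function name `build_image_graph`, the node naming (`("px", row, col)` and `("num", n)`), and the reading of an edge `(a, "below", b)` as "a is below b" are all assumptions made for illustration, not part of the system as described.

```python
def build_image_graph(pixels):
    """Return a set of (source, label, target) edges for an RGB image.

    `pixels` is a list of rows; each row is a list of (r, g, b) tuples.
    Pixel (data) nodes are ("px", row, col); value nodes are ("num", n).
    Assumption: the edge (a, "below", b) is read as "a is below b".
    """
    edges = set()
    height, width = len(pixels), len(pixels[0])
    for r in range(height):
        for c in range(width):
            node = ("px", r, c)
            # Spatial relations point to other data nodes.
            if r > 0:
                upper = ("px", r - 1, c)
                edges.add((node, "below", upper))
                edges.add((upper, "above", node))
            if c > 0:
                left = ("px", r, c - 1)
                edges.add((node, "right-of", left))
                edges.add((left, "left-of", node))
            # Property relations point to value nodes.
            for label, value in zip(("red", "green", "blue"), pixels[r][c]):
                edges.add((node, label, ("num", value)))
    # The number nodes 0..255 have their own spatial relations.
    for n in range(255):
        edges.add((("num", n + 1), "one-bigger-than", ("num", n)))
        edges.add((("num", n), "one-smaller-than", ("num", n + 1)))
    return edges


image = [[(255, 0, 0), (0, 255, 0)],
         [(0, 0, 255), (10, 10, 10)]]
graph = build_image_graph(image)
```

Storing the graph as a flat set of triples keeps the sketch short; an adjacency map keyed by node and label would be the more natural choice once subgraphs have to be searched for.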
The first operation on this space is to make models of it. These models are constructed by learning objects in the space. "Learning objects" means recording probabilities for subgraphs (sub-networks). These probabilities can be recorded in various ways. The roughest way would be to use some threshold, above which an object is considered "salient"; a model is then just a list of salient objects. (I've developed some math for deciding this threshold, but I won't go into it.) We can keep more information by actually recording probabilities for objects. We can keep even more by recording probability density functions (which give a probability that each possible probability value is the real probability value). The system works about the same regardless of choice (the richer representations just take longer to process and give more accurate results).
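The three levels of detail can be sketched in a few lines of Python. This is only an illustration under assumptions: objects are stood in for by hashable labels, the threshold is an arbitrary parameter (the math for choosing it is not given here), and the Beta distribution is a swapped-in, standard way to represent a density over an unknown probability. All function names are hypothetical.

```python
from collections import Counter


def object_frequencies(observations):
    """Estimate a probability for each observed object from raw counts."""
    counts = Counter(observations)
    total = sum(counts.values())
    return {obj: n / total for obj, n in counts.items()}


def salient_objects(frequencies, threshold):
    """Roughest model: keep only objects whose probability beats a threshold."""
    return {obj for obj, p in frequencies.items() if p > threshold}


def beta_posterior(successes, trials, prior_a=1.0, prior_b=1.0):
    """Richest model: Beta(a, b) parameters describing how plausible each
    candidate probability value is. Assumes a uniform Beta(1, 1) prior."""
    return prior_a + successes, prior_b + (trials - successes)


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


observations = ["ab", "ab", "ab", "cd"]   # hypothetical object sightings
freqs = object_frequencies(observations)
model = salient_objects(freqs, threshold=0.5)
a, b = beta_posterior(successes=3, trials=4)
```

With these four sightings, only "ab" passes a threshold of 0.5, and its Beta posterior has a mean of 4/6, slightly below the raw frequency of 3/4 because of the prior.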
Once we construct objects on a data space, we can create a new space in which these objects are the data, and objects on this space can be learned. In other words: once we've used the basic data as pieces to put together larger objects, we can use these larger objects as pieces to put together even larger objects, et cetera, iterating on itself continually. So the "first operation" loops back on its own output.
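A toy version of this loop, sketched in Python: the data is a one-dimensional sequence whose only spatial relation is "next-to", an object is an adjacent pair, and a pair counts as salient when its relative frequency exceeds a threshold. This is a deliberately simplified stand-in (close in spirit to byte-pair merging), not the actual learning procedure, and the names `learn_level` and `learn_hierarchy` are made up for the example.

```python
from collections import Counter


def learn_level(sequence, threshold):
    """Learn salient adjacent pairs, then rewrite the sequence using them."""
    pairs = Counter(zip(sequence, sequence[1:]))
    total = sum(pairs.values())
    salient = {p for p, n in pairs.items() if total and n / total > threshold}
    rewritten, i = [], 0
    while i < len(sequence):
        pair = tuple(sequence[i:i + 2])
        if len(pair) == 2 and pair in salient:
            rewritten.append(pair)      # the learned object becomes one datum
            i += 2
        else:
            rewritten.append(sequence[i])
            i += 1
    return rewritten, salient


def learn_hierarchy(sequence, threshold, max_levels=5):
    """Feed each level's output back in as the next level's data."""
    levels = []
    for _ in range(max_levels):
        sequence, salient = learn_level(sequence, threshold)
        if not salient:
            break
        levels.append(salient)
    return sequence, levels


top, levels = learn_hierarchy(list("abababab"), threshold=0.5)
```

On this input the loop learns ("a", "b") first, then pairs of that object, then pairs of those, and stops once nothing is left to combine.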
Models are also considered new data-spaces, the probabilities being the new properties to be predicted. This means that if the system has learned models for multiple comparable data-spaces, it can compare the different resulting models to try to find similarities. Based on these similarities, it can do such things as revise possible inaccuracies in the models (based on discrepancies with the general model patterns) and predict likely models for new spaces. Thus, there is a second way in which the modeling process wraps back on its own output.
The second operation we can do is to record information about the context in which each of our learned objects appears. Like recording the probabilities of the objects, this can be done in different ways. The most convenient, given that learned objects are already being assembled into higher objects (the first loop described above), is to record which higher objects our object finds itself appearing in, and with what frequency. These records of context serve as new data-spaces in which to find patterns. The patterns found there can be used to revise the likely contexts of objects, as well as predict likely contexts for objects where little evidence has been gathered (the likely contexts of objects that occur rarely). The second loop (treating models as data) applies here too, meaning we compare the different models for the different context-spaces to find patterns.
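Recording context in this way amounts to keeping, for each object, a tally of the higher objects it shows up in. A minimal Python sketch, assuming higher objects are represented as tuples of their parts (the names `record_contexts` and `context_distribution` are hypothetical):

```python
from collections import Counter, defaultdict


def record_contexts(higher_object_occurrences):
    """For each part, count how often it appears inside each higher object.

    `higher_object_occurrences` is a list of observed higher objects,
    each given as a tuple of its parts (an assumption of this sketch).
    """
    contexts = defaultdict(Counter)
    for higher in higher_object_occurrences:
        for part in set(higher):        # count each part once per occurrence
            contexts[part][higher] += 1
    return contexts


def context_distribution(contexts, part):
    """Turn a part's context counts into relative frequencies."""
    counts = contexts.get(part, Counter())
    total = sum(counts.values())
    return {higher: n / total for higher, n in counts.items()}


occurrences = [("a", "b"), ("a", "b"), ("a", "c"), ("b", "c")]
contexts = record_contexts(occurrences)
dist_a = context_distribution(contexts, "a")
```

Each part's distribution is itself a small table of probabilities, so it can be handed back to the modeling process as a new data-space in the way described above.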
It may be helpful to view the context of an object as a property of the object, which is accessible to the modeling process as part of the data. This means that the first operation can build objects using context as a property in addition to the regular properties. Rather than predicting the direct properties given a particular situation, the system predicts an object with particular contextual properties, which in turn allows a prediction for the actual properties of the object. This is useful if there is some pattern between the contexts in which a pattern occurred before and the contexts it's occurring in now, other than the obvious "it should still be the same" pattern; in other words, if some transformation has occurred, mapping the contexts to a new distribution.
And that's it. Well, not quite. I skipped over some little details.