Hmmm. Markov models and hidden markov models are essentially both just probabilistic generalizations of regular grammmars. The argument I've mentioned twice now (but not given) actually shows that my system is at least as powerful as context-free grammars, which I was just reading about.
(A regular grammar is quite minimal, only a step better than rote memorization, then a context-free grammar is less restricted, and powerful enough to represent most formal languages, but the least-restricted grammar is the class of turing-machines. A regular grammar allows repetition, rote memorization of sequences, and picking from alternatives. A context-free grammar additionally allows recursively defined concepts-- for example, the class "(), (()), ((())), (((())))..." can be recursively defined as either () or (Q), where the Q stands for any member of the class. The ability to recursively define in this manner still leaves some constraints. I won't go into that here, mainly because I'm no expert. Look up regular grammar and context-free grammar on wikipedia.)
Here's the argument. Suppose that a recursively defineable entity occurs in some context, which we will call context A. Trying to learn what happens in the context, the system will first memorize the base-cases as well as some small constructs, without realizing the larger pattern. Then, the system will recognize the base-cases within the small construct's structure. As it keeps learning, it will memorize some larger constructs, and notice the smaller constructs within the larger structure. The smaller structure's locations will be exactly analogous to the locations of the base-cases within the small cases. With enough experience, the system will relize this, and represent the fact with an aggregate reflecting the structure of the construct, having some blanks filled in not with specific sequences or with alternatives, but with the simple requirement "anything that can occur in context A can occur here, with the same probabilities." Because all properties of a class become properties of instances, the system can notice this repetition and take advantage.
Not only does this allow recursive concepts, but it does so without using recursively defined classes. This is good for simplicity of calculation. If the final aggregate class I mentioned filled in its slots with the requirement "another instance of this class" (as recursive definitions usually do) then it would be difficult to assign a probability to the aggregate. But since it instead refers to itself indirectly with "anything from context A", there's no problem; the definitions of each class (and therefore the calculations for their probabilities) is non-recursive, even though it all adds up to a recursive concept.
So (if you've read the last few posts) it seems that my system can learn regular grammars, markov models, hidden markov models, and context-free grammars. (Formal proof of each thing is admittantly wanting, especially for context-free grammars, but I'm fairly sure.) But can it learn unrestricted grammars?