Thursday, April 26, 2007

Hmmm. Markov models and hidden Markov models are essentially both just probabilistic generalizations of regular grammars. The argument I've mentioned twice now (but not given) actually shows that my system is at least as powerful as context-free grammars, which I was just reading about.

(A regular grammar is quite minimal, only a step better than rote memorization; a context-free grammar is less restricted, and powerful enough to represent most formal languages; the least-restricted grammars are equivalent to Turing machines. A regular grammar allows repetition, rote memorization of sequences, and picking from alternatives. A context-free grammar additionally allows recursively defined concepts: for example, the class "(), (()), ((())), (((())))..." can be recursively defined as either () or (Q), where the Q stands for any member of the class. The ability to recursively define in this manner still leaves some constraints. I won't go into that here, mainly because I'm no expert. Look up regular grammar and context-free grammar on Wikipedia.)
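For concreteness, here's a minimal sketch in Python of a recognizer for that recursively defined class. (The function name and the test strings are just my own illustration.)

    # A string belongs to the class if it is "()" or "(" + Q + ")",
    # where Q is itself a member of the class.
    def in_class(s):
        if s == "()":
            return True
        if len(s) >= 4 and s[0] == "(" and s[-1] == ")":
            return in_class(s[1:-1])
        return False

    assert in_class("()")
    assert in_class("((()))")
    assert not in_class("()()")   # concatenation isn't part of this particular class
    assert not in_class("(")

No regular grammar can capture this class, since recognizing it requires matching arbitrarily deep nesting.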

Here's the argument. Suppose that a recursively definable entity occurs in some context, which we will call context A. Trying to learn what happens in that context, the system will first memorize the base-cases as well as some small constructs, without realizing the larger pattern. Then, the system will recognize the base-cases within the small constructs' structure. As it keeps learning, it will memorize some larger constructs, and notice the smaller constructs within the larger structure. The smaller constructs' locations will be exactly analogous to the locations of the base-cases within the small constructs. With enough experience, the system will realize this, and represent the fact with an aggregate reflecting the structure of the construct, having some blanks filled in not with specific sequences or with alternatives, but with the simple requirement "anything that can occur in context A can occur here, with the same probabilities." Because all properties of a class become properties of instances, the system can notice this repetition and take advantage of it.
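To make that concrete, here is a rough sketch (in Python, with type names I made up for illustration; this is not the actual representation) of what such an aggregate might look like:

    from dataclasses import dataclass

    @dataclass
    class Literal:
        token: str        # a specific, memorized symbol

    @dataclass
    class ContextSlot:
        context: str      # "anything that can occur in this context can occur here"

    # The parenthesis example: what shows up in context A is either the
    # memorized base case "()" or "(" followed by a context-A slot followed by ")".
    base_case = [Literal("("), Literal(")")]
    larger_construct = [Literal("("), ContextSlot("A"), Literal(")")]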

Not only does this allow recursive concepts, but it does so without using recursively defined classes. This is good for simplicity of calculation. If the final aggregate class I mentioned filled in its slots with the requirement "another instance of this class" (as recursive definitions usually do), then it would be difficult to assign a probability to the aggregate. But since it instead refers to itself indirectly with "anything from context A", there's no problem; the definition of each class (and therefore the calculation of its probability) is non-recursive, even though it all adds up to a recursive concept.
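Here's a tiny made-up calculation to show what I mean (the probabilities are invented, and the function is only an illustration). Suppose context A is found to hold the base case "()" with probability 0.6 and the aggregate "( A )" with probability 0.4. The probability of any particular string is then found by walking its structure and consulting those numbers; no definition ever has to call itself.

    p_base = 0.6       # assumed chance that a context-A slot holds "()"
    p_wrapped = 0.4    # assumed chance that it holds "(" + A + ")"

    def probability(depth):
        """Probability of the construct nested `depth` times, e.g. depth=3 -> ((()))."""
        p = p_base                 # the innermost slot is the base case
        for _ in range(depth - 1):
            p *= p_wrapped         # each wrapping layer uses the aggregate
        return p

    print(probability(1))   # 0.6
    print(probability(3))   # 0.4 * 0.4 * 0.6 = 0.096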

So (if you've read the last few posts) it seems that my system can learn regular grammars, Markov models, hidden Markov models, and context-free grammars. (A formal proof of each is admittedly wanting, especially for context-free grammars, but I'm fairly sure.) But can it learn unrestricted grammars?

Wednesday, April 25, 2007

Addendum to last post:

Here's one difference. The system will not stop at trying to find hidden models. Once it's satisfied in that arena (or perhaps once it's become dissatisfied), it will go on to look for doubly-hidden models, triply-hidden models, et cetera. Interesting.

OK. Forget the old argument I mentioned (but never posted). I now have a new, more exact/formal/rigorous/not-wrong argument which shows the exact relationship of my method to other, known/loved methods of the pattern recognition field: Markov models and hidden Markov models.

(In a way, this is silly. Every time I get ahold of some real information about the field, I realize how much I've merely reinvented in primitive, approximate form. The current realization is from a book Dr. Albee lent me.)

A Markov model is a table of transition probabilities between states. This is a somewhat simplistic method of patternfinding, because it is making entries in a static table. Essentially, it's a probabilistic way of recording rote repetition. In fact, if we record the probability of each sequence of some set size, we can easily use these numbers to calculate the table entries in the Markov model. These sequences are what I called "aggregates" before, so finding aggregates is precisely equivalent to finding Markov models, except that certain things are easier to calculate if you've been collecting aggregates, while other things are easier if you've been calculating Markov models.
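Here's a quick sketch of that equivalence (the data and variable names are made up): record counts of every length-2 sequence, then turn those counts into the transition table of a first-order Markov model.

    from collections import Counter, defaultdict

    data = "abaabbabab"                            # made-up observation sequence

    pair_counts = Counter(zip(data, data[1:]))     # counts of each length-2 aggregate

    transition = defaultdict(dict)
    for (prev, curr), count in pair_counts.items():
        total = sum(c for (p, _), c in pair_counts.items() if p == prev)
        transition[prev][curr] = count / total     # estimate of P(curr | prev)

    print(dict(transition))
    # something like {'a': {'b': 0.8, 'a': 0.2}, 'b': {'a': 0.75, 'b': 0.25}}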

But, somewhat more exciting:

Whereas in a Markov model, one assumes that the next state of the system depends completely on the previous states, in a hidden Markov model, one instead assumes that the current state depends solely on the state of a "hidden" variable. The hidden variable follows the Markov assumption: its next state is determined by its previous states in some probabilistic manner. But the observable variables are assumed to be dependent on this hidden variable, rather than on their own past states. It's like watching a Markov system through a frosted window; we can only determine the state of the hidden variable by observing what we can see. We must learn both the Markov model that governs the state of the hidden variable, and the probabilities of different visible-variable states given each hidden-variable state. We can't actually learn one without already having some guess about the other, so learning hidden Markov models is pretty difficult.
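Here's a toy version of the two pieces, just to pin down the structure (the state names and numbers are invented). The hidden variable has its own transition table, the observations have emission probabilities given the hidden state, and the probability of an observed sequence is found by summing over all hidden paths (the standard forward recursion).

    # The hidden variable's own Markov model: P(next hidden state | current).
    hidden_transition = {
        "H1": {"H1": 0.7, "H2": 0.3},
        "H2": {"H1": 0.4, "H2": 0.6},
    }

    # What leaks through the frosted window: P(observation | hidden state).
    emission = {
        "H1": {"x": 0.9, "y": 0.1},
        "H2": {"x": 0.2, "y": 0.8},
    }

    initial = {"H1": 0.5, "H2": 0.5}

    def sequence_probability(observations):
        # Forward recursion: track P(observations so far, hidden state = s).
        forward = {s: initial[s] * emission[s][observations[0]] for s in initial}
        for obs in observations[1:]:
            forward = {
                s: sum(forward[prev] * hidden_transition[prev][s] for prev in forward)
                   * emission[s][obs]
                for s in initial
            }
        return sum(forward.values())

    print(sequence_probability(["x", "x", "y"]))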

What's cool is that "variable patterns" are equivalent to hidden Markov models. A variable pattern is characterized by an unknown (but guessed-at) distribution of observable state probabilities. The system must attempt to learn transitions between variable patterns at the same time as trying to learn the variable patterns' definitions. (This is reflected by the fact that aggregates can be formed completely of variable patterns.) My system tries to learn hidden Markov models automatically.
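Here's a crude sketch of that simultaneous learning, heavily simplified (the data, the two variable patterns, and the starting guesses are all invented; the real system doesn't look like this). Start from guessed definitions, label each observation with its most likely variable pattern, then re-estimate both the definitions and the transitions from those labels, and repeat.

    from collections import Counter

    observations = list("xxyxxyyxxy")        # made-up observable sequence
    patterns = ["P1", "P2"]

    # Guessed definitions: P(observation | variable pattern).
    definitions = {"P1": {"x": 0.6, "y": 0.4}, "P2": {"x": 0.4, "y": 0.6}}
    transitions = {}

    for _ in range(10):
        # Label each observation with its most likely pattern, using only
        # the current guessed definitions.
        labels = [max(patterns, key=lambda p: definitions[p][o]) for o in observations]

        # Re-estimate each pattern's definition from its labeled members.
        for p in patterns:
            members = [o for o, lab in zip(observations, labels) if lab == p]
            if members:
                counts = Counter(members)
                definitions[p] = {o: counts[o] / len(members) for o in ("x", "y")}

        # Re-estimate transitions between patterns from the label sequence.
        pair_counts = Counter(zip(labels, labels[1:]))
        transitions = {
            p: {q: pair_counts[(p, q)] / max(1, sum(pair_counts[(p, r)] for r in patterns))
                for q in patterns}
            for p in patterns
        }

    print(definitions)
    print(transitions)

The point is just the shape of the loop: neither the definitions nor the transitions can be improved without some current guess at the other.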

My system, therefore, has different elements that are equivalent to two related patternfinding methods. It serves as an interesting go-between; it learns regular Markov models and then attempts to learn hidden Markov models on top of 'em. I should research further into the issue to see if there are similar things out there. However, this doesn't completely characterize the system... at least I think it doesn't. In relating the system to Markov and hidden Markov models, I didn't mention all of the five points from the conclusion in the last post. I'll have to think about it.