Wednesday, September 05, 2007

Some thoughts on the hidden-variable puzzle.

In principle, the problem is already solved-- we already know the search space, and could (again, in principle) simply look through all possible hidden variable configurations. All we need is a reasonable prior. (This is not entirely non-trivial, but we'll save it for later.) This search space, however, is huge. It seems likely that there is some redundancy here; and even if there aren't, there should be some general principles that tend to yield correct results-- hidden variables that are typically useful. This would allow the continuation of the paradigm I've followed thus far: search for the simplest patterns quickly, but search slowly for the general case.

The fundamental question here is when to check for a hidden variable. An interesting way to rephrase this: when do we accept something as a general pattern, and when do we instead think something requires an explanation? For example, if all squirrels had a striped fur pattern, most people would accept that as part of what a squirrel is. (Evolutionary biologists would be the minority asking for an explanation.) However, if a particular box made a strange tinkling noise when shook, we will probably wonder why.

I think some of what's going on is that we accept patterns when we have little hope of discovering the hidden mechanisms, but question them when we have some idea of the structures involved and so can hope to reach a solution. In the case of the box, we know that there is something hidden inside it, and that what's inside determines the noise. This is not the phenomenon I'm interested in. In this case, we already know that there is a hidden variable, and are merely learning about it. How do we come up with the hidden variables in the first place?

One strategy might be to assume that each basic variable is "masking" a hidden variable. This means that, essentially, we copy over the visible network to make a hidden network that we assume at first is fairly similar in structure. We then make reasonable modifications on the hidden network to improve its predictive power. The first type of modification would be to reduce static. Aberrations in the visible variables that have no causal effect (they do not seem to modify their surroundings) do not need to be a part of the shadow network. This may reduce the amount of past-dependance needed; where before the state a few steps ago might be necessary to predict the current state because static blotted out what was inbetween, now it is evident that the state depends on hidden steps that were masked by noise. Space-dependence may reduce for similar reasons. This is good, but so far everything I've mentioned can be done without a shadow net, by just recognizing the uncertainty in the initial variables. A more interesting effect would be the possibility of a different sort of variable pattern: a single state in a hidden variable might take the role of multiple states in the visible variables. (To this end one might purposefully experiment with shadow networks containing variables with fewer states than the actual network.) But the most promising results should come from changes to the structure of the hidden network. Several closely related variables might be dependent on a single hidden variable. Visible variables might not depend directly on their shadow-pair, but instead be related to the shadows of nearby relatives.

Not only do the hidden variables represent variable patterns, but they do so in a way that is explicit, rather than implicit. This satisfies the explicit-or requirement. How can the explicit self reference be constructed?

Here I am, searching for turing-completeness again. The truth is, there are multiple meanings of turing-complete. More precisely, something can be turing-complete concerning different domains; it can be complete from different angles. For example, the knowledge representation could be turing complete but the learning algorithm might not be. Turing-complete could mean a visible or hidden turing machine. Most importantly for the current situation, a hidden turing machine could manifest its pattern in different ways. The form of turing completeness that I considered before (calling the calculations "scratchwork") is turing completeness of the function from the past to the present. The present may be determined by arbitrary calculations involving the past. This may or may not be equivalent to the more standard idea of a turing-complete learner (from the works of Solomonoff and Hutter). The assumption there is instead that the world is the output of an unknown turing machine. The scratchwork idea makes more sense to me, because that sort of knowledge is more generalizable; the same rule is applying itself in many places (throughout time). (Actually, since the mechanism works just as well for space, it isn't specifically limited to time.) In the more standard definition, the only repetition is the recursive calculation of the end result; this is not necessarily observable in any dimension of the data, existing instead only "in theory" (reminiscent of the way scientists sometimes claim that quarks or strings or extra dimensions are "mathematical abstractions" rather than physical fact). Learning this sort of turing-complete model sounds harder (although both are technically incalculable thanks to the halting problem), and so I would think that their sense of turing-complete is "stronger" than mine (more general).

Stronger doesn't necessarily mean better, however. Another interesting sense in which an AI can be turing-complete is that it can learn that the world behaves as any turing machine if taught properly. This sort of AI is more or less useful depending on what "taught properly" means; if an extremely narrow definition is chosen, any programmable computer satisfies the definition. Slightly more interesting is the definition "teaching = explaining how the world works in some formal notation". This does not include all programmable computers because computers accept programs, not descriptions of the world. A system satisfying this more strict definition needs a planning/deduction system, to turn the knowledge into behavior. If teaching properly involves only exposure to a learnable world, then it's back to Solomonoff. But somewhere between Solomonoff and planning systems, there might be some useful possibilities. I've gone too long in this, so I'll just say that John Andreae's PurrPuss learns turing-complete models if taught with something called "soft teaching", which is similar to a parent directing a child's actions (telling the child what to do, but without telling it why). This might be another direction in which I can do a more exhaustive search of the easier learning domain, beyond merely "turing complete": quickly learn weaker senses of turing-complete, but still try to learn Solomonoff-style with a very slow search.