Saturday, November 17, 2007

An Interpretation of Probability


You know that something is seriously wrong when, in a science, there is as much serious talk about correctly interpreting a theory as there is work extending the theory or proposing new theories. I personally think it is ridiculous that so much effort is put into "interpreting" quantum mechanics. There's the many-worlds interpretation, the ideas about what counts as an "observer", questions of whether or not a particle actually has a definite speed and location if the two can't, even in principle, be observed simultaneously (what can't be observed doesn't exist), arguments about what it means for an event to be probabilistic, and so on. I'm frustrated both because these don't seem to be real questions (or where they are real, we should experimentally test them rather than argue), and because I want to claim obvious answers myself.

But enough of criticizing a science I know little about. Another field, which I know somewhat more about, is experiencing a similar problem: probability theory. The argument between Bayesianism and Frequentism, two separate schools of thought that have their own very different mathematics for statistics and machine learning, is essentially an argument about the meaning of probability.

Frequentists interpret probability based on the idea of random variables and physical random processes. Probability is defined as a statistical frequency: the probability of an event is the long-run ratio at which that event occurs if we sample for long enough. For example, if we flip a coin enough times, we can determine to any desired degree of accuracy the actual frequency of heads for that coin. This seems intuitive, right? This notion turns probability into a solid, objective quantity. In fact, frequentist probabilities are often called "objective probabilities".
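
To make the long-run picture concrete, here's a tiny simulation (a sketch of my own, assuming an idealized fair coin): the observed ratio of heads settles toward a fixed value as the flips pile up.

    import random

    def observed_frequency(n_flips):
        """Fraction of heads in n_flips simulated tosses of a fair coin."""
        heads = sum(random.random() < 0.5 for _ in range(n_flips))
        return heads / n_flips

    for n in (10, 100, 10000, 1000000):
        print(n, observed_frequency(n))

The more flips, the closer the printed ratio hugs 0.5; that stable long-run ratio is what the frequentist calls the probability.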

Bayesians, however, disagree. For a Bayesian, probability can also represent a degree of belief. This distinction is particularly important when an event only occurs once, or when we're talking about the probability of a statement, which can only be actually true or actually false. Frequentist probability cannot be used here, because there is no frequency to measure. For example, suppose we're back before the time of Magellan. You ask me my opinion on the shape of the earth. I, being well-versed in the popular philosophy of the time, suppose that the earth is round; but I'm not sure. So I give it a 95% chance. From a frequentist view, this is nonsense. The earth is either flat or round; it isn't a physical random process. The frequency of a flat earth is either 1 or 0.

At this point, you're probably siding with the frequentists. The earth does not have a 5% chance of being flat-- sticking an exact number on it sounds silly, even if I want to say that there's a possibility. But let's consider an example you might find more sympathetic. Suppose you have a friend who is purchasing a vehicle for the first time. You know that your friend (for some reason) refuses to own any car that is not either pink or purple. However, you don't know of any bias between the two colors, so before you see the car, you can do no better than assign a 50% probability to each color. Your friend shows you the car, which is pink, and explains that the decision was made based on alphabetical order; pink comes before purple.

Now notice-- you had no way of knowing which of the two colors your friend would choose. It seems very reasonable to assign equal probabilities to each. However, such a probability does not seem to be a frequency-- if we rewound history to "try again", the friend would always reason the same way, and the new car would always be pink. Probability can only be interpreted as degree of belief here; and that's exactly what Bayesians want to allow. In contrast to the frequentist objective probabilities, Bayesian probabilities are called "subjective probabilities". (It is fairly common practice to admit both kinds of probability, so that one might say "The coin has an objective probability of landing on heads [since it could land on either heads or tails], but a subjective probability of being counterfeit [it either is or it isn't]".)
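
That parenthetical example is worth working out. Here's a sketch (all the numbers are invented) of how a subjective probability gets updated: start with a degree of belief that the coin is counterfeit, and revise it with Bayes' rule after each flip, assuming a counterfeit coin lands heads 90% of the time and a fair one 50%.

    def update(p_fake, flip, p_heads_fake=0.9, p_heads_fair=0.5):
        """One step of Bayes' rule: revise P(counterfeit) given one flip."""
        like_fake = p_heads_fake if flip == "H" else 1 - p_heads_fake
        like_fair = p_heads_fair if flip == "H" else 1 - p_heads_fair
        return like_fake * p_fake / (like_fake * p_fake + like_fair * (1 - p_fake))

    belief = 0.05                 # prior degree of belief in "counterfeit"
    for flip in "HHHHHH":         # watch six heads in a row
        belief = update(belief, flip)
        print(round(belief, 3))

Six heads in a row push the belief from 5% to about 64%, and at no point is any frequency of "this coin being counterfeit" measured.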

The battle between Bayes and frequency has been a long one, but that's not exactly what I'm going to talk about. I'd like to talk about my own grapplings with the interpretation of probability. While I am (at the moment) firmly Bayesian in terms of the mathematics, I actually agree with both interpretations of probability; I think that all probabilities represent a belief about a frequency. But I also want to be able to say a few other things about probability, which I'm not sure are entirely consistent with that view.

Here's what I want to say about the meaning of probability:

(1) All probabilities are a belief about a frequency.

-Even the most obviously frequentist example of probability, coin flipping, can be thought of in this way. We never know the coin's exact ratio of heads to tails; but we use the 50/50 estimate regularly, and it causes no problems.

-Many probabilities that seem to be fundamentally Bayesian can be explained with the concept of possible universes. The frequency of a flat earth in our universe is 0. But the frequency of a flat earth across all possible universes may be higher. Even if it's not, it's entirely reasonable for someone without the benefit of modern science to estimate that something like 5% of all alternative universes have a flat earth, because we can never know probabilities for certain, only estimate them. The estimate can be improved later. (Because we can only ever visit one universe, our estimate of the distribution of possible universes will always be quite crude; so probabilities based on such estimates will have a "subjective" feel to them. But I claim that they are not really a different kind of probability.)

-When counting the frequencies, we only consider universes that match ours sufficiently; in the flat earth example, we consider alternative universes that match everything we *know* about earth, and estimate the frequency of something we *don't* know: whether the earth is flat or round. Similarly, in the example of the pink car, we consider universes that match things we know, but (being only human) are unable to use facts we don't know (such as the fact that our friend loves using alphabetical order to resolve difficult decisions). (This is called a "conditional probability"; if you think it doesn't sound very well-defined, I assure you that it is a precise mathematical notion with a rigorous foundation.) This explains another reason that Bayesian probabilities seem "subjective": different people are often considering different evidence (pinning down different facts) when giving probabilities. The little simulation below makes the possible-universes picture concrete.
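
Here's a sketch (every distribution and habit in it is invented for illustration): sample hypothetical universes containing a version of the friend, give each one a definite outcome, and read probabilities off as frequencies among the universes consistent with what we know.

    import random

    def sample_universe():
        """One hypothetical friend; each universe fixes every relevant fact."""
        return {
            "prefers": random.choice(["pink", "purple", "no preference"]),
            "tiebreak": random.choice(["alphabetical", "coin flip"]),
            "coin": random.choice(["pink", "purple"]),  # used only for coin-flip tiebreaks
        }

    def car_color(u):
        """The car color is completely determined within each universe."""
        if u["prefers"] != "no preference":
            return u["prefers"]
        return "pink" if u["tiebreak"] == "alphabetical" else u["coin"]

    universes = [sample_universe() for _ in range(100000)]

    # Knowing nothing extra, the frequency of pink across universes:
    print(sum(car_color(u) == "pink" for u in universes) / len(universes))

    # Conditioning on a fact we might learn (the alphabetical-order habit)
    # shifts the frequency -- this is the conditional probability:
    known = [u for u in universes if u["tiebreak"] == "alphabetical"]
    print(sum(car_color(u) == "pink" for u in known) / len(known))

The two printed numbers differ (roughly 7/12 versus 2/3 under these made-up distributions): same machinery, different evidence held fixed.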

(2) All probabilities are really just statements about uncertainty.

-I'm claiming here that the concept of a "physical random process" is philosophically unnecessary. When an event like a coin flip seems random, it is actually quite deterministic; we just don't know all the relevant factors involved. Even if we do know all the relevant factors, we aren't necessarily able to calculate their influence on the final result (at least not quickly).

-Whenever we assign something a probability, then, it's simply because we don't know all the relevant facts (or haven't calculated their influence). Quantum mechanics, for example, gives us probabilistic laws for the behavior of particles; I'm claiming that the probabilistic nature of these laws shows that they aren't the fundamental laws, and that our predictions of quantum events could in principle be improved with more information (or perhaps with a better method of calculating).
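
To illustrate that last point, here's a toy (a sketch of my own, not a claim about actual physics): the doubling map below is fully deterministic, but if you don't know the initial condition precisely, its output is as unpredictable as a coin.

    def deterministic_flips(x0, n):
        """Iterate x -> 2x mod 1; call it H when x >= 0.5, T otherwise."""
        x, flips = x0, []
        for _ in range(n):
            flips.append("H" if x >= 0.5 else "T")
            x = (2.0 * x) % 1.0
        return flips

    flips = deterministic_flips(x0=0.123456789, n=50)
    print("".join(flips))
    print("frequency of H:", flips.count("H") / len(flips))

The H/T frequencies come out roughly even, yet nothing random ever happens. Nudge x0 in the ninth decimal place and the two sequences agree for a while, then diverge completely; the "randomness" is nothing but our ignorance of x0.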


To be perfectly honest, I'm less certain about the second claim. I think both claims are what's intuitively correct; but the two seem to contradict each other.

If probabilities are statements of uncertainty, can they also be frequency estimates?

It seems possible in at least some situations. For example, when flipping a coin, we can create a frequency estimate for both heads and tails, but still claim that if we knew more, we would be able to calculate ahead of time which it would be. In this case, the frequency is based on common known variables from toss to toss (mainly the coin involved), whereas the uncertainty is caused by the unknown variables (the physics of the toss). But this doesn't necessarily solve everything.

The first conflict that I see: the idea that there are other possible worlds, which my argument for (1) needs, seems to be ruled out by (2). If anything can in principle be narrowed down to one possibility by knowing the relevant facts, then there can be no actual alternatives! Alternatives are illusions, which can be eliminated by the diligent scientist.

Never mind the conflict with (1)! Is this view even self-consistent? (2) asserts that any question can be answered by looking at the relevant information. But this implicitly assumes that every event has a cause. The result is an infinite regress of causes. (While this isn't a direct inconsistency, it is reason to worry.)

So, I'm willing to back down on (2), instead stating two possibilities:

(A) The universe has some set of first causes. Therefore, there exist real alternatives; the universe actually could have been different. The probability of these different universes is unknown and unknowable, because we can only observe one universe, but it isn't an unintelligible concept. (1) holds, but (2) does not hold if the event we're examining happens to be a first cause.

(B) The universe has no uncaused events. There is an infinite chain of causes leading to each event. (2) holds, but (1) does not always hold: the universe is deterministic, so probability cannot always be interpreted as a belief about frequency. Specifically, (1) doesn't work when we would resort to alternative universes because we don't have multiple instances of the given event in our universe.

I'm a bit torn between the two, but I prefer (A). I don't have anything against the infinite past implicit in (B), except that the entire timeline has an uncaused feel to it. Why this timeline rather than another? If (B) is correct, then there is some reason. OK. Sure. But (B) also states that this reason has a cause, and its cause has a cause, and so on. Another infinite causal chain, over and above time. But what caused that infinite chain? And so on. The idea of infinite causes is a bit unsettling.

So is the idea of an uncaused event, to be sure. But I like being able to say that there are real alternatives, and to say that, it seems I've got to admit uncaused events.

Notice that these problems melt away if we restrict our talk to manageable islands of reality rather than entire universes. Both (1) and (2) hold just fine if we don't run into an uncaused event or an event so singular that none like it have ever occurred before or will occur again.

Can we ever know what's really true? Can we experimentally differentiate between (A) and (B)? Probably not. Then why does it matter?

Maybe it doesn't.

Wednesday, November 14, 2007

In case anybody out there is wondering what I'm up to.

I've started an AI club at my university, which is great. I mean, we actually didn't have one! Shouldn't every self-respecting computer science department have some crazy AI people stashed in a corner trying to change the world?

Well, we're not there yet-- our short-term goal is to start a small AI competition in a game called Diplomacy. Turns out there's a pre-existing framework for it:

www.daide.org.uk

Also, I've been looking at something that goes by various names, including "competent optimization". I'd call it "intelligent search". The idea, which grows out of genetic algorithms, is to think while searching: based on what's been seen so far, learn the characteristics of good solutions, and use that knowledge to guide the search. (A small sketch of the idea follows the links below.)

http://www.cs.umsl.edu/~pelikan/boa.html

http://metacog.org/doc.html
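
For flavor, here's a minimal sketch of the learn-while-searching loop (my own simplified version; it's closer to PBIL, with independent bit probabilities, than to BOA, which learns a full Bayesian network over the bits): keep a probability for each bit, sample a population, and shift the probabilities toward what the best samples have in common.

    import random

    def onemax(bits):
        """Toy objective: count the 1s (stands in for any fitness function)."""
        return sum(bits)

    def eda_search(n_bits=20, pop_size=50, n_best=10, generations=30, lr=0.2):
        probs = [0.5] * n_bits                  # initial model: every bit is a coin flip
        for _ in range(generations):
            pop = [[int(random.random() < p) for p in probs] for _ in range(pop_size)]
            best = sorted(pop, key=onemax, reverse=True)[:n_best]
            for i in range(n_bits):             # learn the characteristics of good solutions
                freq = sum(ind[i] for ind in best) / n_best
                probs[i] = (1 - lr) * probs[i] + lr * freq
        return max(pop, key=onemax)

    print(onemax(eda_search()))   # should be at or near 20 as the model sharpens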

The idea of intelligent search is something I've thought about before, so I was both pleased and disappointed to see it already being researched. This means I can't say I invented it! To do meaningful research in the field, I've got to up the quality of my ideas :).

Of course, really, I haven't done any "meaningful research" at all yet. So it goes. Part of my problem is that I don't focus on one thing well. Also, I seem to like getting started far more than finishing. Coming up with ideas is more exciting than implementing them.

To implement competent search, or at least to implement the Bayesian Optimization Algorithm (which I guess is the current best), I'll need a Bayes Net framework. There are many to choose from, but here's the one I picked:

http://www.openbayes.org/

The Bayes Net Toolbox for Matlab is probably the best (speed issues aside), but Matlab costs money (although the toolbox itself is free and open source).

http://www.cs.ubc.ca/~murphyk/Software/BNT/bnt.html
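
In case it's not obvious what such a framework buys you, here's the core computation done by hand for a two-node net, Rain -> WetGrass, with made-up numbers (no framework, just the definitions): build the joint distribution from the conditional tables, then condition on the evidence.

    P_RAIN = 0.2
    P_WET_GIVEN = {True: 0.9, False: 0.1}   # P(wet | rain), P(wet | no rain)

    def joint(rain, wet):
        """P(rain, wet) via the chain rule for this two-node net."""
        p_rain = P_RAIN if rain else 1 - P_RAIN
        p_wet = P_WET_GIVEN[rain] if wet else 1 - P_WET_GIVEN[rain]
        return p_rain * p_wet

    # Infer P(rain | grass is wet) by summing out the joint:
    p_wet = joint(True, True) + joint(False, True)
    print(joint(True, True) / p_wet)        # 0.18 / 0.26, about 0.69

A real framework does exactly this, but with smarter algorithms so the summation doesn't blow up as the net grows.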

So I've been having a lot of thoughts about possible improvements to these intelligent search methods. Mainly, I'm trying to figure out what the theoretically perfect way of doing it would be-- that is, assuming that the only thing that takes computational resources is the actual testing of a point, so that we can do any complicated analysis we like to decide which point to pick next.
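
Here's one way to make that setting concrete (a toy of my own construction, with an invented four-point domain and hypothesis set, and only one-step lookahead; the truly perfect version would plan whole sequences of tests): keep a posterior over candidate objective functions, and before each costly test, spend as much free "thinking" as we like scoring every untested point by how much it's expected to improve the best value found.

    import itertools

    DOMAIN = range(4)
    # Every function from the 4-point domain to {0, 1, 2}, with a uniform prior.
    HYPOTHESES = [dict(zip(DOMAIN, vals))
                  for vals in itertools.product([0, 1, 2], repeat=4)]

    def posterior(observed):
        """Hypotheses consistent with the (point -> value) observations so far."""
        return [h for h in HYPOTHESES
                if all(h[x] == y for x, y in observed.items())]

    def expected_best_after(x, observed):
        """Expected best-known value after also testing point x."""
        consistent = posterior(observed)
        best = max(observed.values(), default=float("-inf"))
        return sum(max(best, h[x]) for h in consistent) / len(consistent)

    def ideal_search(true_objective, budget):
        observed = {}
        for _ in range(budget):
            untested = [x for x in DOMAIN if x not in observed]
            if not untested:
                break
            # "Thinking" is free: exhaustively score every candidate point.
            x = max(untested, key=lambda p: expected_best_after(p, observed))
            observed[x] = true_objective[x]     # the only costly operation
        return observed

    print(ideal_search({0: 1, 1: 0, 2: 2, 3: 1}, budget=3))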