Well, I think I've come up with something of an answer. I want a prior to be interpreted as a frequency estimate on possible worlds. This sounds funny, because we can't possibly estimate such a frequency: we only live in one world. But this is actually just fine: we shouldn't be estimating it, because it's our prior. It's what we use to estimate.
Anything more we learn, we learn using our prior. So we can't improve upon our prior. If you've got a bad prior, tough luck.
A prior is an estimate of the frequency of alternative worlds. The perfect prior would contain all knowledge we ever needed; it would give our actual world-of-birth a probability of 1, and all other worlds, 0. But no two people are born to the same world, so evolution couldn't find this prior. (By the way, we could also view evolution as using a prior: this prior is given to it by the very nature of chemistry and physics, and is not very good, but far better than it might have been.) So a slightly weaker and more useful notion of the perfect prior would be one that would do a fair job if all humans had it. Forcing all humans to have the same prior (which is close to being true) causes the perfect prior to have far more interesting structure (i.e. some learning occurs), although it still would have freakish foresight for things common to all humans (it would know the one true physics, for example).
Since what I'm interested in is learning, I want some way of ruling out this freakish foresight: I want to talk about a universal prior, one that will learn well no matter what the true physics turns out to be (and so on). I'm rejecting the Solomonoff prior because I think computability is too strict a requirement, but I also know that some restrictions are needed (otherwise there is no structure for the prior to take advantage of). What kind of a universal prior is this? And once I've figured that out, is it really of any use?
Friday, February 15, 2008
Saturday, February 09, 2008
More on the Interpretation of Probability
In my previous discussion, I failed to distinguish between logical necessity and physical necessity. This distinction is critical for my analysis of requirement (2): a probability is a statement of uncertainty that could always be turned into certainty given more information. In my first post on the interpretation of probability, I talked about alternatives (A) and (B):
(A): The universe has some set of uncaused events.
(B): The universe does not have uncaused events.
(A) implies that there are actual alternatives: the universe could have happened differently, if these uncaused events happened differently.
(B) implies that there are no such alternatives. However, this comes at the cost of implying an infinite chain of causes. Worse, as I reasoned before, we should still desire a cause even of this infinite chain; an infinite past history is not enough, because we also need an explanation of why that particular infinite past is the one we have. Furthermore, we need a cause for this cause, and so on.
Now comes the new stuff. When I stated this, I had in mind that the explanation for the infinite past was physics. This isn't quite sufficient: physics needs to be supplemented by a single completely-specified time-slice. Then a deterministic physics can specify the infinite past and future for us. So allowing this, we further ask: why does the universe contain this particular physics? To answer this, we create a meta-physics that specifies for us an explanation of why physics is the way it is. (My impression is that there is work in theoretical physics corresponding to this desire.) Again, in addition to a meta-physics, we need something like a physics-slice sufficient to specify the rest of physics (that is, a minimal number of variables that determine the rest via the meta-physical laws).
(I should note that my concept of "meta-physics" is not the typical concept of "metaphysics": metaphysics is a philosophical pursuit, but meta-physics is essentially very abstract physics, which worries about things like how we might predict the existence of electrons from base principles.)
The chain continues to meta-meta-physics, meta-meta-meta-physics, et cetera ad infinitum.
All of these things are akin to physical necessity. Each fact is determined by some law of a corresponding physics (except for the determining slices). However, as the sets of rules get more abstract, it seems as if we will hit the ceiling of mathematical necessity. This may have happened already: the physical theory of A. Garrett Lisi (which from what I know looks like what I called a meta-physics) describes our (meta-) physics as an algebra, so that the obvious alternatives are different algebras. (As I understand it: our physics obeys the algebra E8, but E8 does not fully determine our physics; in addition, we need information about symmetry-breaking. So E8 is the meta-physics and symmetry breaking information is the minimal set of variables needed to determine everything else about the physics. But don't take my word for fact.) The algebraic alternatives are governed by mathematical laws. The mathematical laws are governed by logic. So logic may be as little as 3 metas away! (E8 is meta-physics, so math is meta-meta, so logic is meta-meta-meta-physics.) This seems to stop the questionable infinite regress: it seems at least plausible for logic to be uncaused and unexplained.
But even this doesn't tie us down to one possible world, namely because of all those minimal slices that we specify along the way. These do the real work of specifying everything; logic's being "at the top" is only a convenient trick. Presumably we could forever play the game of navigating plausible infinite regresses of explanation and plausible places for the regress to stop, but it seems that we could never find a real end to it. (This is particularly chronic if we start to question which logic is at the top, i.e. classical vs intuitionistic vs many other possibilities, and why that logic is at the top, i.e. do we need a meta-logical theory?)
Therefore we are forced into (A)! There must be fundamentally unexplained things, and therefore actual alternatives.
By the way, I don't think that exempts humanity from the investigation of the hierarchy I've described ultimately topped by logic: this hierarchy seems very important despite its ultimate futility (we cannot in principle explain everything, but we must still explain as much as possible).
Concluding (A) does not force me to give up (1) or (2). In particular, I had previously assumed that it went against (2), because I took "more information" to mean causal (or explanatory) information. This is unnecessary; all I need to say is that all meaningful statements are either true or false. Thus the "more information" may merely be the fact itself. As an example, suppose that some physical events really are random: physical law dictates probabilities but not the definite outcome. What I'm saying is that there still is a definite fact of the matter, although not determined by physical law; if we knew everything there is to know, we would know the outcome.
There are still some clarifications needed. I have dealt with issues arising from my first post, but there are also difficulties with what I said in my second post. Basically, in that post I trapped myself in a fundamental confusion that will always arise for those who try to do away with the concept of a Bayesian prior. This is hinted at in the infinite regress I get into when I try to take into account uncertainty about relevant information. At some point, if the probability estimate given is to be coherent, the person must invoke a prior belief concerning each probability. So to revise the conclusion I reached there:
A probability (used by a person) is (1) a belief concerning a frequency, (2) a statement of uncertainty given limited information which can always be turned into certainty given more information, and (3) based on some prior belief (updated by the limited information available).
The image here is that our limited information narrows down the space of possible worlds somewhat, and that we hold beliefs about the frequencies of events in the remaining possible worlds. So, for example, if we assert a 50% probability that our friend will buy a pink car, we mean that (given our prior) our information narrows us down to a space of possible worlds such that in about half our friend will get a pink car.
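To make this picture concrete, here is a minimal Python sketch of the pink-car example. Everything in it is an assumption made for illustration: the four toy worlds, the weights in `PRIOR`, and the helper names `probability`, `buys_pink` and `likes_pink` are all made up, and a real prior would range over vastly more worlds. The point is only to show the three parts of the definition working together: the prior assigns a frequency to each possible world, limited information throws out some worlds, and the probability we assert is the frequency of the event among the worlds that remain.

```python
from fractions import Fraction

# Hypothetical toy prior: each possible world is a pair of facts,
# and the weight is our estimate of how frequent that world is.
PRIOR = {
    ("likes_pink", "buys_pink_car"): Fraction(1, 10),
    ("likes_pink", "buys_other_car"): Fraction(1, 10),
    ("dislikes_pink", "buys_pink_car"): Fraction(1, 20),
    ("dislikes_pink", "buys_other_car"): Fraction(3, 4),
}


def probability(event, evidence=lambda world: True, prior=PRIOR):
    """Frequency of `event` among the worlds that `evidence` leaves open."""
    remaining = {w: p for w, p in prior.items() if evidence(w)}
    total = sum(remaining.values())
    if total == 0:
        raise ValueError("evidence rules out every world in the prior")
    return sum(p for w, p in remaining.items() if event(w)) / total


def buys_pink(world):
    return world[1] == "buys_pink_car"


def likes_pink(world):
    return world[0] == "likes_pink"


# No information yet: the estimate comes straight from the prior.
before = probability(buys_pink)

# Limited information: we learn that our friend likes pink.
after = probability(buys_pink, evidence=likes_pink)

# Complete information: the fact itself is known, so uncertainty vanishes.
certain = probability(
    buys_pink,
    evidence=lambda w: w == ("likes_pink", "buys_pink_car"),
)

print(before, after, certain)  # 3/20 1/2 1
```

With these made-up weights, learning that the friend likes pink narrows us to worlds in which half contain a pink car, which is the 50% of the example. Conditioning on the whole world gives probability 1, which is requirement (2) in miniature.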
I want to interpret the prior probability in a way consistent with the way I interpret the beliefs formed using the prior plus evidence. Otherwise, it doesn't seem like a full interpretation. I think the best way of doing this, in line with the idea that a probability is a frequency estimate, is to say that the prior is the person's estimate of the frequency of all possible worlds.
I admit this sounds strange. Accepting (A) forces me to say that there are actual alternatives, meaning possible worlds. It seems somewhat reasonable to attach probabilities to these (by attaching probabilities to the uncaused facts). But to go a step further and call these frequencies? Does this make sense?
I suppose I'm forced to leave this question open for now. I think the answer is yes, but my only reason for thinking so is the way it simplifies the whole scheme.