<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-28417647</id><updated>2012-01-19T06:59:42.288-08:00</updated><category term='propositional models'/><category term='ethics'/><category term='computation'/><category term='math'/><category term='hypercomputation'/><category term='tarski hierarchy'/><category term='reality'/><category term='ai'/><category term='uncountable'/><category term='logic'/><category term='deduction'/><category term='interpretations'/><category term='lambda calculus'/><category term='prior'/><category term='relational models'/><category term='meaning'/><category term='probabilistic relational models'/><category term='convergence'/><category term='metamathematics'/><category term='grounding'/><category term='progic'/><category term='paraconsistent'/><category term='types'/><category term='compression'/><category term='foundations of mathematics'/><category term='set theory'/><category term='truth'/><category term='combinator'/><category term='emergence'/><category term='proof theory'/><category term='AM'/><category term='self-reference'/><category term='markov logic'/><category term='guiding inference'/><category term='singularity'/><category term='AGI'/><category term='hyperlogic hypercomputation'/><category term='infinity'/><category term='formal grammar'/><category term='probability'/><category term='prediction'/><category term='ordinals'/><category term='hyperlogic'/><category term='constructivism'/><category term='predicate models'/><category term='morality'/><title type='text'>Artificial Intelligence</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' 
type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>86</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-28417647.post-1745532863083557971</id><published>2010-01-18T15:16:00.001-08:00</published><updated>2010-07-30T21:39:41.838-07:00</updated><title type='text'></title><content type='html'>&lt;span style="font-size: 180%;"&gt;Moving&lt;span style="font-size: 100%;"&gt;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: small;"&gt;I've been considering starting over with this blog thing for a while, so I thought I'd put up a post warning the few people who follow this blog. My reason for the move is that I feel much of what I have said on this blog is, well, foolish given what I've learned. At the same time, my current beliefs are not that far off what I have said, making it hard to correct what I've said without long explanations. 
In addition to this, there is a great deal that I have not written, which I should have; and what I have written, I have not organized in any fashion.&lt;br /&gt;&lt;br /&gt;So, I feel that it is best to start from the top and explain my beliefs, goals, intuitions, and so on.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: small;"&gt;I'll be starting on another Google blog, &lt;a href="http://dragonlogic-ai.blogspot.com/"&gt;The Logic of Thought&lt;/a&gt;. I realise the name may sound a bit pretentious-- I do not claim to have the answer. Still, that seems like a fair label for the question I am asking.&lt;/span&gt;&lt;br /&gt;&lt;span style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-size: small;"&gt;Edit-- I changed the name to "In Search of a Logic of Thought."&lt;/span&gt;&lt;br /&gt;&lt;span style="font-size: small;"&gt;Edit again-- Now it's "In Search of Logic." &lt;/span&gt;&lt;span style="font-size: small;"&gt;OK, last name change, I promise.&lt;/span&gt;&lt;span style="font-size: small;"&gt; :)&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-1745532863083557971?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/1745532863083557971/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2010/01/moving-ive-been-considering-starting.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/1745532863083557971'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/1745532863083557971'/><link rel='alternate' type='text/html'
href='http://dragonlogic-ai.blogspot.com/2010/01/moving-ive-been-considering-starting.html' title=''/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-8566731849348090809</id><published>2009-08-21T10:50:00.000-07:00</published><updated>2009-08-23T12:22:00.072-07:00</updated><title type='text'></title><content type='html'>&lt;span style="font-size:180%;"&gt;Climbing the Mountain&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Undefinability is a harsh reality.&lt;br /&gt;&lt;br /&gt;Any solution that one offers is still vulnerable to the attack: full self-reference seems impossible. Regardless of how clever you think you've been, &lt;span style="font-style: italic;"&gt;if&lt;/span&gt; you propose a solution, I can point to a structure that can't be defined within that system: the system itself. If that's not the case, then the system is inconsistent. (And even &lt;span style="font-style: italic;"&gt;then&lt;/span&gt;, even if we &lt;span style="font-style: italic;"&gt;allow&lt;/span&gt; inconsistent systems, I know of no system which could be said to fully describe itself.)&lt;br /&gt;&lt;br /&gt;What this all suggests is that human beings (along with a broad range of sentient entities) are &lt;span style="font-style: italic;"&gt;fundamentally&lt;/span&gt; incapable of articulating our own inner logic. Why? For the same reason that a powerset is larger than a set: if we could fully refer to ourselves, our logic would be "larger than itself" (loosely speaking).&lt;br /&gt;&lt;br /&gt;It comes down to a problem of wanting to be able to form sentences that mean anything we might want. 
If we're fully capable, then we can construct a sentence that is true precisely when it is not true... and the system falls apart.&lt;br /&gt;&lt;br /&gt;Kripke's fixed-point theory offers a nice fix: with some specific machinery, we're able to call these sentences "undefined". But now we can't refer properly to this idea, "undefined". So we've got a complete theory of truth (one might say), but we're still stuck with regard to undefinability.&lt;br /&gt;&lt;br /&gt;So, it looks like we've got to accept it: we can't find a mind (artificial or otherwise) that fulfills the imperative "Know Thyself". The self remains shrouded in mystery. What's a self-respecting AI engineer to do? Am I forced to always design minds with less power of logical reference than my own, because I could not comprehend a design that was truly up to human capability? Are artificial intelligences doomed to be fancy calculators, lacking "understanding" because they will always have a weaker logical structure?&lt;br /&gt;&lt;br /&gt;First, no. That doesn't really follow. It's quite possible to use an algorithm that can &lt;span style="font-style: italic;"&gt;in principle&lt;/span&gt; learn anything: evolution. For example, one could build an artificial mind that held an initially simple program within it, mutated the recently run areas of code when punishment occurred, and strengthened recently run code against mutation when rewarded. Or, a little more sophisticated, one could implement Schmidhuber's Success-Story Algorithm, which always and only keeps apparently beneficial mutations, is capable of learning what and when to mutate, can learn to delay reward, and has other desirable features. And yet again, we could try William Pearson's design, which sets up an artificial economy of agents which can co-evolve to produce the desired behavior.
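&lt;br /&gt;&lt;br /&gt;To make the first of these schemes concrete, here is a minimal Python sketch. It is only a toy under stated assumptions: the three-instruction set &lt;code&gt;OPS&lt;/code&gt;, the integer "shield" counters, and the names &lt;code&gt;learn&lt;/code&gt; and &lt;code&gt;shield&lt;/code&gt; are all hypothetical, and "recently run" is simplified to a list of the instruction indices that just executed.&lt;br /&gt;&lt;pre&gt;
```python
import random

# Hypothetical instruction set; any finite set of operations would do.
OPS = ("inc", "dec", "nop")


def learn(program, shield, recent, rewarded, rng):
    """Update a program in place after one reward or punishment signal.

    program  -- list of instruction names (the agent's current code)
    shield   -- list of non-negative ints, one per instruction; each unit
                absorbs one punishment before the instruction can mutate
    recent   -- indices of the instructions that ran recently
    rewarded -- True for a reward signal, False for a punishment signal
    rng      -- a random.Random instance used to pick replacement code
    """
    for i in recent:
        if rewarded:
            # Reward: strengthen recently run code against mutation.
            shield[i] += 1
        elif shield[i] == 0:
            # Punishment: mutate recently run code that has no protection.
            program[i] = rng.choice(OPS)
        else:
            # Punishment: protected code only loses one unit of protection.
            shield[i] -= 1
    return program, shield


if __name__ == "__main__":
    rng = random.Random(0)
    program = ["inc", "inc", "nop"]
    shield = [0, 0, 0]
    learn(program, shield, [0, 1], True, rng)   # instructions 0 and 1 rewarded
    learn(program, shield, [0, 2], False, rng)  # instructions 0 and 2 punished
    print(program, shield)
```
&lt;/pre&gt;In this toy, rewarded code accumulates protection, so it takes repeated punishment before it changes, while code that has never been rewarded mutates at the first punishment.&lt;br /&gt;&lt;br /&gt;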
With approaches of this style, there is no worry of fundamental limitation: such systems can learn the correct logic if it exists (it just might take quite a while!). The worry, rather, is that these approaches do not take full advantage of the data at hand. There is no guarantee that they will perform better given more processing power and memory, either. In short, they are not a model of rationality.&lt;br /&gt;&lt;br /&gt;This could be taken as an indication that studying logic and rationality is not directly relevant to AI, but I would not agree with such an argument. For one thing, it &lt;span style="font-style: italic;"&gt;is&lt;/span&gt; possible to derive a model of rationality from such approaches. If they work, there is a reason. The techniques each essentially provide some way of evaluating how a particular program of behavior is doing, together with a technique of searching through the possible behaviors. One could consider the space of all possible programs that might have generated the behavior so far, rather than the single program that actually did. One then takes the best program from that space, or perhaps a weighted vote. Obviously there will be some details to fill in (which is to say that such models of rationality don't just follow &lt;span style="font-style: italic;"&gt;directly&lt;/span&gt; from the evolutionary algorithms employed), but the general approach is clear. Such a system would take an infinite amount of processing power to compute, so one would need to use approximations; the more computing power given, the closer the approximation could be. All the data at hand is now being used, because the system now has the ability to go back and re-think details of the past, asking if particular sensory patterns might have been clues warning of a punishment, et cetera.&lt;br /&gt;&lt;br /&gt;So why not accept such models of rationality? I have two reasons... first, they are purely reinforcement-learning-based.
Agents based on these models can be driven only by pleasure and pain. There is no ability to consider external, unobserved objects; everything consists of patterns of directly observed sensation. Second, even if one is OK with purely reward-based systems, it is not clear that these are optimal. The evaluation criteria for the programs are not totally clear. There needs to be some assumption that punishment and reward are associated with recently taken actions, and recently executed code, but it cannot be too strong... The Success-Story approach looks at things in terms of modifying a basic policy, and a modification is held responsible for all reward and punishment after the point at which it is made. The simple mutation-based scheme I described instead would use some decaying recent-use marker to decide responsibility. William Pearson suggests dividing time up into large sections, and splitting up the total goodness of the section as the payment for the programs that were in charge for that time. Each of these will result in different models of rationality.&lt;br /&gt;&lt;br /&gt;So, I desire an approach which contains explicit talk of an outside world, so that one can state goals in such language, and furthermore can apply utility theory to evaluate actions toward those goals in an optimal way. But, that takes me back to the problem: which explicit logic do I use? Am I doomed to only understand logics less powerful than my own internal logic, and hence, to create AI systems limited by such logics?&lt;br /&gt;&lt;br /&gt;One way out which I'm currently thinking about is this: a system may be initially self-ignorant, but may learn more about itself over time.
This idea came from the thought that if I was shown the correct logic, I could refer to its truth predicate as an &lt;span style="font-style: italic;"&gt;external&lt;/span&gt; thing, and so &lt;span style="font-style: italic;"&gt;appear&lt;/span&gt; to have greater logical power than it, without really causing a problem. Furthermore, it seems I could learn about it over time, perhaps eventually gaining more referential power.&lt;br /&gt;&lt;br /&gt;In understanding one's own logic, one becomes more powerful, and again does not understand one's own logic. The "correct logic", then, could be imagined to be the (unreachable) result of an infinite amount of self-study. But can we properly refer to such a limit? If so, it seems we've got an interesting situation on our hands, since we'd be able to talk about the truth predicate of a language more referentially powerful than any other... Does the limit in fact exist?&lt;br /&gt;&lt;br /&gt;I need to formalize this idea to evaluate it further.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-8566731849348090809?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/8566731849348090809/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2009/08/climbing-mountain-undefinability-is.html#comment-form' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/8566731849348090809'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/8566731849348090809'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2009/08/climbing-mountain-undefinability-is.html' title=''/><author><name>Abram 
Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-8897788547249844180</id><published>2009-06-18T12:26:00.000-07:00</published><updated>2009-06-18T14:32:55.111-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='reality'/><category scheme='http://www.blogger.com/atom/ns#' term='hypercomputation'/><category scheme='http://www.blogger.com/atom/ns#' term='convergence'/><category scheme='http://www.blogger.com/atom/ns#' term='emergence'/><title type='text'></title><content type='html'>&lt;span style="font-size:180%;"&gt;The Importance of Uncomputable Models&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Inspired by &lt;a href="http://supermodelling.net/?p=130"&gt;this blog&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;I have mentioned most of these thoughts before, but I am writing them up in one post as a cohesive argument. I will argue that uncomputable models are important. I am not arguing that people think in ways that computers cannot, but rather that people and computers alike can benefit from using models that have "holes" which represent meaningful questions that can't be answered definitively by running a computation.&lt;br /&gt;&lt;br /&gt;The world that we live in consists of a bunch of macro-objects. What are macro-objects? Roughly speaking, they are coherent structures of micro-objects, which recur in both time and space.&lt;br /&gt;&lt;br /&gt;A macro-object "emerges" from the micro-dynamics. It is a structure which persists, propagating itself into the future, like a wave in water.&lt;br /&gt;&lt;br /&gt;A spherical bubble is a macro-object, because slight warps in the sphere get evened out over time.
(Until it pops.)&lt;br /&gt;&lt;br /&gt;Similarly, a water droplet, a planet, and a star.&lt;br /&gt;&lt;br /&gt;An atom is an emergent object (though not macro-level) because the negative and positive charges of the electrons and protons enter into a stable relationship.&lt;br /&gt;&lt;br /&gt;A grasshopper is a stable entity because its metabolism maintains a homeostasis which allows it to live for a fair amount of time.&lt;br /&gt;&lt;br /&gt;Similarly for all living organisms.&lt;br /&gt;&lt;br /&gt;And, so on.&lt;br /&gt;&lt;br /&gt;The key idea here is that all of these objects are in a &lt;span style="font-style: italic;"&gt;convergent&lt;/span&gt; state: a state that (within limits) each one always returns to after wandering away from it.&lt;br /&gt;&lt;br /&gt;Now, to logical foundations. The halting problem is unsolvable; there is no algorithm which can tell you which computations do and do not halt. But, suppose we could run our computer an infinite amount of time to get the answer. A machine that can do this is called a hypercomputer. Then, we could solve the halting problem by waiting to see if the computation in question halted; so, we can use hypercomputations to answer many questions that regular computations cannot answer. However, we've got a new type of problem. Some computations will flip back and forth between answers infinitely often, so that when we run them an infinite amount of time, the output is indeterminate. The result of the hypercomputation is then undefined, or "nonconvergent".&lt;br /&gt;&lt;br /&gt;Asking whether a hypercomputation converges is analogous to asking whether a normal computation halts. In a specific sense, it is twice as hard: if we solved the halting problem for normal computations, and made a little magic box that can give the correct answer for halting questions, and connected it to an ordinary computer, then we would have a machine equivalent to a hypercomputer.
Asking whether the programs of the new machine halt is actually equivalent to asking if hypercomputations converge.&lt;br /&gt;&lt;br /&gt;So, halting is uncomputable, but convergence is doubly so!&lt;br /&gt;&lt;br /&gt;Yet, I have argued that convergence is all around us. On an everyday basis, we deal with convergent states as if they were solid entities. So, I argue, we are viewing the world through an uncomputable model.&lt;br /&gt;&lt;br /&gt;Mathematically, one should expect reasoning about convergence to be quite hard (assuming, as I do, that human reasoning is computable). Everyday reasoning is not "quite hard" in this way. We mitigate the full force of the uncomputability of our models with many coping strategies; we mainly reason under the &lt;span style="font-style: italic;"&gt;assumption&lt;/span&gt; of convergence (for structures that have converged in the past), rather than attempting to question this assumption. We have to &lt;span style="font-style: italic;"&gt;learn&lt;/span&gt; when things converge and fail to converge. Yet, even so, using uncomputable models is easier than trying to apply computable models to the problem. Asking whether a structure is generally convergent is a &lt;span style="font-style: italic;"&gt;very&lt;/span&gt; useful abbreviation, approximately summing up a lot of questions about the state at a given time.&lt;br /&gt;&lt;br /&gt;Also, it is important to admit that the mathematically pure concept of convergence is not quite what we are interested in. In practical situations, we are interested in whether something is &lt;span style="font-style: italic;"&gt;quickly&lt;/span&gt; convergent. This is not uncomputable; however, it can be more expensive to check than reasoning abstractly about convergence.
So (and this is probably the weakest point in my argument) I think it is worthwhile to keep reasoning about the uncomputable models.&lt;br /&gt;&lt;br /&gt;Another interesting point is that, much of the time, while we have a fairly good concept of the emergent convergent objects we deal with day to day, we do not have such a good idea of the underlying dynamic. This means that, in practice, we do not ask &lt;span style="font-style: italic;"&gt;too&lt;/span&gt; many convergence-hard questions. Often, we think we already have those answers, and instead we ask what sort of underlying structure might give rise to them.&lt;br /&gt;&lt;br /&gt;PS--&lt;br /&gt;&lt;br /&gt;I am being somewhat messy here, because "convergent" in the case of hypercomputation does not mean quite the same thing as "convergent" in the case of emergent objects. For one thing, convergence of entities has to do with correcting for disturbances from the external environment, while convergence for hypercomputations does not. I think this difference does not harm the argument. 
As I see it, emergent-convergence is a more general problem, having hypercomputation-convergence as a subproblem.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-8897788547249844180?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/8897788547249844180/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2009/06/importance-of-uncomputable-models-i.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/8897788547249844180'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/8897788547249844180'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2009/06/importance-of-uncomputable-models-i.html' title=''/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-2141033664021344123</id><published>2009-06-14T14:03:00.000-07:00</published><updated>2009-06-15T15:01:04.223-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='types'/><category scheme='http://www.blogger.com/atom/ns#' term='lambda calculus'/><title type='text'></title><content type='html'>&lt;span style="font-size:180%;"&gt;What They Don't Tell You About Type Checking&lt;/span&gt;&lt;br /&gt;&lt;a href="http://en.wikipedia.org/wiki/Typed_lambda_calculus"&gt;&lt;br /&gt;Typed lambda calculus&lt;/a&gt; is not Turing-complete.&lt;br 
/&gt;&lt;br /&gt;There. I said it.&lt;br /&gt;&lt;br /&gt;More specifically, &lt;a href="http://en.wikipedia.org/wiki/Simply_typed_lambda_calculus"&gt;simply typed lambda calculus&lt;/a&gt; is not Turing-complete, and neither are any variants that are both &lt;a href="http://en.wikipedia.org/wiki/Normalization_property_%28lambda-calculus%29"&gt;strongly normalizing&lt;/a&gt; and have decidable type-checking. This is because programs that the type-checker verifies are guaranteed to compute a result. If such a type-checker allowed a Turing-complete set of programs, it would be a solution to the &lt;a href="http://en.wikipedia.org/wiki/Halting_problem"&gt;halting problem&lt;/a&gt;!&lt;br /&gt;&lt;br /&gt;Really, I should have put two and two together earlier on this one. I suppose this is what comes of picking lambda calculus up by reading diverse things in diverse places rather than learning it from one authoritative source.&lt;br /&gt;&lt;br /&gt;What this indicates for me is that, at least in many cases, the point of allowing more and more sophisticated type systems is to get closer to a Turing-complete system. &lt;span style="font-style: italic;"&gt;That&lt;/span&gt; is why people add things like &lt;a href="http://en.wikipedia.org/wiki/Polymorphism_%28computer_science%29#Parametric_polymorphism"&gt;parametric polymorphism&lt;/a&gt;, &lt;a href="http://en.wikipedia.org/wiki/Dependent_type_theory"&gt;dependent types&lt;/a&gt;, and &lt;a href="http://en.wikipedia.org/wiki/Kind_%28type_theory%29"&gt;kinds&lt;/a&gt;. When we add these to typed lambda calculus, it doesn't just get "more expressive" in the sense that a high-level programming language is more expressive than machine code; it is literally able to do things that a simpler type system could not.&lt;br /&gt;&lt;br /&gt;This doesn't mean that strongly typed programming languages are not Turing-complete. 
Typically the type-checkers for these will &lt;span style="font-style: italic;"&gt;not&lt;/span&gt; guarantee that the program contains no infinite loops. So, one must be careful to figure out exactly what one is dealing with.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-2141033664021344123?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/2141033664021344123/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2009/06/what-they-dont-tell-you-about-type.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/2141033664021344123'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/2141033664021344123'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2009/06/what-they-dont-tell-you-about-type.html' title=''/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-8574746919850729563</id><published>2009-06-10T12:56:00.000-07:00</published><updated>2009-06-10T14:57:11.055-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='metamathematics'/><category scheme='http://www.blogger.com/atom/ns#' term='set theory'/><category scheme='http://www.blogger.com/atom/ns#' term='truth'/><category scheme='http://www.blogger.com/atom/ns#' term='logic'/><title type='text'></title><content 
type='html'>&lt;span style="font-size:180%;"&gt;Sets and Truth&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;In this post, I will explain a way to create a theory of sets, given a theory of truth. These two foundational issues are typically treated as separate matters, so asking for a particular relationship to hold between a theory of truth and a set theory adds some interesting constraints to the situation.&lt;br /&gt;&lt;br /&gt;(Part of this post is an edited version of an email I sent originally to Randall Holmes.)&lt;br /&gt;&lt;br /&gt;Claim: A set is essentially a sentence with a hole in it. Set membership is characterized by the truth/falsehood of the sentences when we fill the holes.&lt;br /&gt;&lt;br /&gt;The major justification for this way of understanding sets is the way we use the comprehension principle to talk about sets. The comprehension principle is what allows us to say, in a set theory, that if we can describe the membership requirements for a set then that set exists. For example, I can describe the requirement "it must be a prime number that is over five digits long", so the set of prime numbers over five digits long exists. (The comprehension principle with no restrictions leads to naive set theory, however, which is paradoxical.)&lt;br /&gt;&lt;br /&gt;This view is not far from the way Frege explained sets, as I understand it. However, he distinguished the set as the &lt;span style="font-style: italic;"&gt;extension&lt;/span&gt; of the sentence-with-hole; meaning, the things for which it is true.&lt;br /&gt;&lt;br /&gt;So, suppose we've got a logic with enough machinery to represent computable functions, and furthermore we've got a description of the language in itself (i.e., Gödel-numbering-or-something-equivalent). Furthermore, we've got some theory of truth. Then, via the claim, we can already talk about sets, even though they haven't been purposefully introduced. In particular, "x is an element of y" is&lt;br /&gt;
&lt;div id=":12n" class="ii gt"&gt; interpreted as:&lt;br /&gt;&lt;br /&gt;"When "x" is used to fill in the partial sentence Y, the result is true"&lt;br /&gt;&lt;br /&gt;where "x" is the &lt;span style="font-style: italic;"&gt;name&lt;/span&gt; of the term x (Gödel-numbering-or-equivalent, again, for those who are familiar with such things), and Y is the sentence-with-hole corresponding to the set y.&lt;br /&gt;&lt;br /&gt;The truth predicate is needed here in order to assert the result of the substitution. With the naive theory of truth, it is &lt;span style="font-style: italic;"&gt;always&lt;/span&gt; meaningful to apply the truth predicate to a sentence. So, the naive theory of truth gives us the naive theory of sets, in which set-membership is meaningful for any set we can describe. Of course, this is inconsistent under classical logic.&lt;br /&gt;&lt;br /&gt;So, what I'm saying is: if the claim is accepted, then the set theory is pinned down completely by the theory of truth. The naive theory of truth gives us the naive theory of sets. A Tarski-hierarchy theory of truth gives us something vaguely resembling type theory. Kripke's theory of truth gives us a theory in which all sets exist, but not all membership evaluations are meaningful. In particular, Russell's set "all sets that do not contain themselves" exists. We can meaningfully say that any set of integers is in Russell's set, and that the set of all sets (which exists) is not. The paradoxical situation, in which we ask if Russell's set is a member of itself, is simply meaningless.&lt;br /&gt;&lt;br /&gt;So far, so good. But, there is the issue of extensionality to deal with. The axiom of extensionality is taken as a very basic fact of set theory, one that not even nonstandard set theories consider rejecting. Given the above discussion, however, the axiom of extensionality would be false.
Two different sentences-with-holes can be logically equivalent, and so have the same extension. For example, "_ is an even prime number" and "_ added to itself equals 4" are the same set, but they are different sentences.&lt;br /&gt;&lt;br /&gt;My solution here is to interpret the notion of set-equality as being a notion of logical equivalence between sentences-with-holes, rather than one of syntactic equivalence. In other words, "x=y" for two sets x and y needs to be interpreted as saying that X and Y mutually imply each other given any slot-filler, rather than just as saying X=Y. But this is doable within the language, since all we need to do is quantify over the slot-filler-names.&lt;br /&gt;&lt;br /&gt;This can be thought of as my way of interpreting Frege's concept of the "extension" of a sentence-with-hole. Rather than being a separate entity, the extension is a "view" of the sentence: the sentence up-to-equivalence.&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-8574746919850729563?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/8574746919850729563/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2009/06/sets-and-truth-in-this-post-i-will.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/8574746919850729563'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/8574746919850729563'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2009/06/sets-and-truth-in-this-post-i-will.html' title=''/><author><name>Abram
Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-8028974928035880919</id><published>2009-04-27T13:13:00.000-07:00</published><updated>2009-04-27T13:14:22.671-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='deduction'/><category scheme='http://www.blogger.com/atom/ns#' term='guiding inference'/><category scheme='http://www.blogger.com/atom/ns#' term='AM'/><title type='text'></title><content type='html'>&lt;span style="font-size:180%;"&gt;Thoughts on Guiding Inference&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;One of the big rifts between the way things are described in the theoretical foundations of mathematics and the way things are actually done in real mathematics is that real mathematicians are constantly defining new objects to study. Perhaps someone sees a particular combination of properties pop up a few times, so they give that combination a name and start studying it directly. Perhaps someone decides a particular restriction of a given problem will be easier to solve, and so gives that a name and starts studying it. Perhaps someone wants to know how many X need to be combined to make Y, and they call the number the "X number" of Y, and start studying the properties of it as X and Y are varied. And so on.&lt;br /&gt;&lt;br /&gt;In fact, even when studying the foundations of mathematics, such names are often snuck in as "unofficial notation" that is thought of as abbreviating the official notation. Unofficial notation makes axioms and other formulae easier to read.&lt;br /&gt;&lt;br /&gt;Why do we do this? I can think of two explanations. 
First, one can explain this in the same way "named entities" are explained in other domains: it is compressive to name common entities. This means it takes up less room in memory, and also as a side benefit it usually makes the world easier to predict. This is an important explanation, but not the focus of this post. The second explanation is that naming things is a means of inference control.&lt;br /&gt;&lt;br /&gt;When we name things, what we proceed to do is to determine some basic properties of the new entities. (This seems true in non-mathematical discourse as well, but perhaps not as clear-cut.) We come up with properties of the new entities, and look for ways of calculating those properties. We come up with theorems that hold for the entities. We look for existence and uniqueness proofs. And so on. What I'm arguing is that all of this activity is basically focused on helping later inference.&lt;br /&gt;&lt;br /&gt;One particular phenomenon serves as a good illustration of this: the behavior of mathematicians when studying sequences of numbers that emerge from particular mathematical entities. The prime numbers; the number of groups of each finite size; the number of graphs of a given size; the Fibonacci sequence. The big thing to try to do with a sequence is to "find the pattern". What could this mean? If the sequence is mathematically well-defined, don't we know the pattern already? A mathematician will find the first few values of a sequence, and then study them looking for relationships. Often what is sought is a recursive definition of the sequence: a function that calculates the next number based on the directly preceding number, or several of the previous numbers, or perhaps all of the previous numbers. If we've already got a recursive form of the sequence, we might try to find a "closed form" version. My argument here is that all of this behavior is explained by the desire to make later reasoning as expedient as possible. 
Finding the recursive form is not merely a sort of curiosity, like wondering if it is possible to keep a spider alive in a paper cup; rather, the recursive form is a faster way of calculating the sequence than the method that follows directly from the definition. Similarly, the closed form will usually be even faster.&lt;br /&gt;&lt;br /&gt;So, what I'm saying essentially is that a reasoning system should (when it isn't busy doing other things) be looking for nice entities related to the task at hand, and nice quick ways of calculating stuff about those entities. What "nice entity" means is based on two (sometimes conflicting) motivations: the entity should be compressive (meaning it should help make descriptions take up less space), and it should be tractable (meaning reasoning about it should be quick).&lt;br /&gt;&lt;br /&gt;How should the search for good ways of calculating be carried out? By the same methods as general inference. The system should be able to apply good reasoning methods that it finds back onto the task of looking for good reasoning methods. Of course, for this to work very well, the system needs to be able to have fairly good reasoning methods to begin with.&lt;br /&gt;&lt;br /&gt;What form should the problem-solving methods that the system finds take?&lt;br /&gt;&lt;br /&gt;Definitely they should be able to take on the form of typical algorithms: closed-form expressions, recursive expressions, and generally any . Definitely these should be associated with the necessary criteria for application to an object (the criteria that guarantee correct results). Probably they would also be associated with provable upper bounds on runtime, so that the method chosen for a particular case (where multiple methods might apply) would be the one with the lowest time-estimate for that case. 
(Problems might arise for difficult-to-calculate time bounds; perhaps estimators would be required to be linear time.)&lt;br /&gt;&lt;br /&gt;Some extra features could probably improve upon this basic setup:&lt;br /&gt;&lt;br /&gt;-Allow possibly-non-terminating algorithms. This would make "search for a proof" count among the methods (as the last resort), which strikes me as elegant.&lt;br /&gt;&lt;br /&gt;-Allow learned expected time bounds&lt;br /&gt;&lt;br /&gt;-Allow "soft" problem-solving strategies (of possibly various sorts) to be learned; i.e., heuristics&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-8028974928035880919?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/8028974928035880919/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2009/04/thoughts-on-guiding-inference-one-of.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/8028974928035880919'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/8028974928035880919'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2009/04/thoughts-on-guiding-inference-one-of.html' title=''/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' 
src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-45569616868776409</id><published>2009-04-27T12:31:00.000-07:00</published><updated>2009-05-18T20:09:46.934-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='tarski hierarchy'/><category scheme='http://www.blogger.com/atom/ns#' term='infinity'/><category scheme='http://www.blogger.com/atom/ns#' term='truth'/><category scheme='http://www.blogger.com/atom/ns#' term='ordinals'/><category scheme='http://www.blogger.com/atom/ns#' term='logic'/><title type='text'></title><content type='html'>&lt;span style="font-size:180%;"&gt;Correction&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Last time, I wrote about the apparent limitations of logical systems arising purely from a formal axiomatization of the Tarski hierarchy. I said:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;So, if this gets us a lot of math, what is missing?&lt;br /&gt;&lt;br /&gt;The obvious answer is "a truth predicate for that system". This doesn't lend much insight, though.&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;I should have thought before I spoke. A more powerful language cannot be constructed by adding a truth predicate. The semantics of the language is that it should be referring to the Tarski hierarchy! A truth predicate over such a language would need to be a truth predicate over the entire Tarski hierarchy. Such a truth predicate would apparently correspond to an ordinal above all ordinals. Not good!&lt;br /&gt;&lt;br /&gt;As a side note: I've been looking at Quine's "New Foundations", and in that system, it is possible to talk about the ordering of all ordinals. Surprisingly, this ordinal does not cause problems. So, maybe in New Foundations the above move would be OK. 
With a naive view of the ordinals, though, it is not.&lt;br /&gt;&lt;br /&gt;Keeping the naive view, it seems like I should deny the possibility of enriching the language by adding a truth predicate. Does that mean that I should say that the language is maximally rich? I don't think so. I suspect the situation is more like what happens with Kripke's language that contains its own truth predicate: the language can be expanded in other, new directions.&lt;br /&gt;&lt;br /&gt;[edit- some of the text was garbled as posted. I may have lost a paragraph or so.]&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-45569616868776409?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/45569616868776409/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2009/04/last-time-i-wrote-about-apparent.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/45569616868776409'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/45569616868776409'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2009/04/last-time-i-wrote-about-apparent.html' title=''/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' 
src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-135003507849581207</id><published>2009-04-25T13:28:00.000-07:00</published><updated>2009-04-25T16:08:42.125-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='uncountable'/><category scheme='http://www.blogger.com/atom/ns#' term='logic'/><title type='text'></title><content type='html'>&lt;span style="font-size:180%;"&gt;Limitations&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;(Analysis of the system proposed in &lt;a href="http://dragonlogic-ai.blogspot.com/2009/04/enumerating-infinity-what-are-we-doing.html"&gt;this post&lt;/a&gt;)&lt;br /&gt;&lt;br /&gt;The idea of creating axiom systems describing Tarski's hierarchy in terms of ordinals, and ordinals in terms of Tarski's hierarchy, was discussed previously. It seems that the circularity would help prove the existence of lots of ordinals (and lots of Tarskian levels), hopefully without leading to paradox. So, if this gets us a lot of math, what is missing?&lt;br /&gt;&lt;br /&gt;The obvious answer is "a truth predicate for that system". This doesn't lend much insight, though.&lt;br /&gt;&lt;br /&gt;My bet is that we don't get anything uncountable by such a method. Where would it come from? We're constructing truth-predicates for countable languages, and (initially) countable sets of countable languages. Sure, the set of all countable ordinals is uncountable. But there is no reason to believe that we get the set of all countable ordinals!&lt;br /&gt;&lt;br /&gt;My feeling is that if I studied more about the hyperarithmetical hierarchy, there would be an obvious mapping between some portion of it and the system(s) under consideration.&lt;br /&gt;&lt;br /&gt;In some ways, the notion of "uncountable" seems to come from nowhere. 
All the formal constructions for "real number" and related entities seem to rely at some point on some other uncountable set. It's a bit like the (obviously illicit) method of defining a really big ordinal: "The ordering of all ordinals that can be defined without reference to this ordinal".&lt;br /&gt;&lt;br /&gt;Yet I do feel that the notion is meaningful! So, where might it come from?&lt;br /&gt;&lt;br /&gt;I have several ideas, none of them entirely convincing.&lt;br /&gt;&lt;br /&gt;One idea is that uncountable sets can be motivated by considering the space of possible sequences of input from the external environment, assuming a countably infinite amount of time. I've seen similar ideas elsewhere. One might counter that all structures die eventually, so all sequences of input in real-world situations are finite; this makes the set into a mere countable infinity. On the other hand, one might say that this is merely the case in practice; the idea of an infinitely long-lived structure is still sound, and even physically possible (if not probable). But even so, I don't feel like this is a really complete justification.&lt;br /&gt;&lt;br /&gt;Another attempt would be to claim that we need to allow for the possibility of infinitely long statements, despite not being able to actually make and manipulate such statements. (Universal statements and such may abbreviate infinitely many claims, but they are not literally infinitely long.) This idea is motivated by the following consideration: a nice way of getting a theory of set theory from a non-set-theoretic foundational logic is to think of a set as a statement with an unfilled slot into which entity-names can be put to make complete statements. Claiming that a set contains an element is thought of as claiming that the statement is true of that object. At first, this might seem to fully justify naive set theory: a set exists for each statement-with-hole. 
But, this can only work if the theory contains its own truth predicate, so that we can make arbitrary talk about whether a statement is true when we fill a slot with a given element. The amount of set theory that gets captured by this method depends basically on the extent to which the theory is capable of self-reference; the naive theory of truth corresponds to naive set theory.&lt;br /&gt;&lt;br /&gt;This is interesting by itself, but the point I'm making here is that if we want to have an uncountable number of sets (for example if we believe in the uncountable powerset of the real numbers), then we'll want a logic that acts as if infinite statements exist. What this means is an interesting question; we can't actually use these infinite statements, so what's the difference with the logic?&lt;br /&gt;&lt;br /&gt;One difference is that I don't automatically have a way of interpreting talk about Turing machines and numbers and other equivalent things anymore. I was justifying this referential power via talk about statements: they provide an immediate example of such entities. If we posit infinite statements that are "out there", somewhere in our set of sentences, we lose that quick way of grounding the idea of "finite". (This could be taken to show that such a method of grounding is not very real in the first place. :P)&lt;br /&gt;&lt;br /&gt;Semantically, then, we think of the base-language as an infinitary logic, rather than regular first-order logic. The language formed by adding the first truth predicate is then thought of as already containing talk about uncountable sets. (This changes the axioms that we'd be justified in using to manipulate the truth predicate.)&lt;br /&gt;&lt;br /&gt;All in all, I think this direction is mathematically interesting, but I'm not sure that it is really a route to justify uncountables.&lt;br /&gt;&lt;br /&gt;A relevant question is: why do mathematicians think that uncountables exist? 
The proof is given by taking the powerset of some countably infinite set, which is defined as the set of all subsets of the countably infinite set. It's then shown that no function exists that maps the countable set onto its powerset. This can be done even in systems that do not really have any uncountable sets: the original countably infinite set will map onto its set of subsets, but not by a function within the system. So from inside the system, it will look as if there are uncountables.&lt;br /&gt;&lt;br /&gt;If this explains what mathematicians are doing, then mathematicians are being fooled into thinking there are real uncountables... but how can I say that? I'm just being fooled into thinking that there is a difference, a "real" and "not real", right?&lt;br /&gt;&lt;br /&gt;I think it is plausible that this weirdness would go away, or at least change significantly, in a logic that resolves other foundational problems. We might have a much better story for why the function that would map the countable set onto the uncountable set doesn't exist, so that it becomes implausible to claim that it exists from the outside but not from the inside. 
(But would that make the uncountables "real"?)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-135003507849581207?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/135003507849581207/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2009/04/limitations-analysis-of-system-proposed.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/135003507849581207'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/135003507849581207'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2009/04/limitations-analysis-of-system-proposed.html' title=''/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-2547533717717677094</id><published>2009-04-25T13:21:00.000-07:00</published><updated>2009-04-25T13:24:50.901-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='interpretations'/><category scheme='http://www.blogger.com/atom/ns#' term='logic'/><title type='text'></title><content type='html'>&lt;span style="font-size:180%;"&gt;Some musings on interpretations&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;(Originally posted to &lt;a href="http://groups.google.com/group/one-logic"&gt;this logic mailing list I'm trying to get started&lt;/a&gt;. All are welcome! 
Except if you know nothing about logic and aren't willing to learn. Then you're not welcome. Sorry.)&lt;br /&gt;&lt;p&gt;An interpretation is a translation of a statement from one logic into&lt;br /&gt;another. I was looking at interpretations recently because they appear&lt;br /&gt;to be a way to show that semantically stronger logics (specifically,&lt;br /&gt;logics formed by adding a truth predicate) really are stronger in&lt;br /&gt;practice. The stronger logic can interpret the weaker, but the weaker&lt;br /&gt;cannot interpret the stronger. (This is one type of proof-theoretic&lt;br /&gt;strength.)&lt;br /&gt;&lt;/p&gt;&lt;p&gt;Normally, one does not allow just *any* interpretation... for example,&lt;br /&gt;we wouldn't want a universal quantifier in the first logic to be&lt;br /&gt;interpreted as an existential one in the new logic. It is typical (so&lt;br /&gt;far as I have seen) to require the logical connectives to remain&lt;br /&gt;unchanged, and only allow minimal changes to the quantifiers (such as&lt;br /&gt;translating them to be restricted quantifiers).&lt;br /&gt;&lt;/p&gt;&lt;p&gt;Yet, all we care about is structure in these interpretations. For&lt;br /&gt;example, if we're interpreting talk about numbers into a logic that&lt;br /&gt;talks about sets, we don't really care if the resulting translated&lt;br /&gt;sentences don't have anything to do with sizes of sets-- all we're&lt;br /&gt;supposed to worry about is relationships. For example, our new&lt;br /&gt;translated notion of "addition" should tell us that "5" and "6" makes&lt;br /&gt;"11". (Usually when people do construct interpretations, of course,&lt;br /&gt;they try to find ones that make some intuitive sense.)&lt;br /&gt;&lt;/p&gt;&lt;p&gt;So if all we care about is structure, why constrain the way logical&lt;br /&gt;connectives and quantifiers are translated? What happens if we get rid&lt;br /&gt;of these constraints?&lt;br /&gt;&lt;/p&gt;&lt;p&gt;Well... 
we can't get rid of *all* the constraints: we still need to&lt;br /&gt;capture what we mean by "interpretation". Not every function from one&lt;br /&gt;language to the other counts! The way I see it, we want to keep the&lt;br /&gt;following minimal constraint:&lt;br /&gt;&lt;/p&gt;&lt;p&gt;C1: If A |- B in L1, then f(A) |- f(B) in L2&lt;br /&gt;&lt;/p&gt;&lt;p&gt;Where f(X) is the translation function, "X |- Y" means Y can be proven&lt;br /&gt;from X, L1 is the language being translated, L2 is the language being&lt;br /&gt;translated into, and A and B are statements in L1.&lt;br /&gt;&lt;/p&gt;&lt;p&gt;I've also thought of the following:&lt;br /&gt;&lt;/p&gt;&lt;p&gt;C2: A |- B in L1 if and only if f(A) |- f(B) in L2&lt;br /&gt;&lt;/p&gt;&lt;p&gt;But, that is not much like the standard notion of interpretation.&lt;br /&gt;Essentially it means that the interpretation into L2 adds no extra&lt;br /&gt;knowledge about the entities in L1. But, if we're looking for strong&lt;br /&gt;languages, extra knowledge is a good thing (as long as it is true&lt;br /&gt;knowledge). (One could argue that this justification doesn't apply if&lt;br /&gt;we're only worrying about structure, though, I think. Specifically, if&lt;br /&gt;by "structure" we mean not just structures of what's provable&lt;br /&gt;but also structures of what isn't, we should take C2 as the proper&lt;br /&gt;constraint. I'll proceed with C1 since it is the more standard-looking&lt;br /&gt;constraint.)&lt;br /&gt;&lt;/p&gt;&lt;p&gt;Anyway. 
What now happens to the earlier assertion, that semantically&lt;br /&gt;stronger languages are also proof-theoretically stronger, because they&lt;br /&gt;can interpret more logics?&lt;br /&gt;&lt;/p&gt;&lt;p&gt;Answer: plain first-order logic can interpret any logic L with a&lt;br /&gt;computable notion of provability.&lt;br /&gt;&lt;/p&gt;&lt;p&gt;Proof: Arbitrarily declare that some of the function symbols represent&lt;br /&gt;the operations necessary to build up statements in L (for example, one&lt;br /&gt;function might be chosen for each character in the alphabet of L, so&lt;br /&gt;that composing those functions in a particular sequence would&lt;br /&gt;represent writing down characters in that sequence). For an&lt;br /&gt;L-statement X, call the first-order version of X built just from these&lt;br /&gt;function symbols h(X). Write down some statements describing&lt;br /&gt;L-provability. Writing down the rules of L like this is possible&lt;br /&gt;because first-order logic is Turing-complete. Take the conjunction of&lt;br /&gt;the statements about L and call it R (for "rules"). The interpretation&lt;br /&gt;of an L-statement X will simply be R-&gt;h(X), where "-&gt;" is material&lt;br /&gt;implication. This will satisfy C1 because wherever L proves A from B,&lt;br /&gt;first-order logic will prove that if the rules of L-provability hold,&lt;br /&gt;then A is L-provable from B. (This proof is still pretty messy, but&lt;br /&gt;nonetheless I'm guessing I'm going into more detail here than needed,&lt;br /&gt;so I'll leave it messy for now.)&lt;br /&gt;&lt;/p&gt;&lt;p&gt;What does this mean? To me this means that proof-theoretic strength is&lt;br /&gt;not a very justifiable way of enticing people over to the side of&lt;br /&gt;logics with strong semantics. First-order logic is in some sense&lt;br /&gt;proof-theoretically all-powerful (if we allow arbitrary&lt;br /&gt;interpretations, and if we're dealing with computable provability). 
If&lt;br /&gt;someone is set in their path of using just first-order logic, I cannot&lt;br /&gt;convince them just by talking about first-order truth; first-order&lt;br /&gt;logic doesn't have a strong enough semantics to talk about first-order&lt;br /&gt;truth, but they can interpret what I am saying by just listening to&lt;br /&gt;the computable inference rules of my more-powerful logic. Every time I&lt;br /&gt;say X, they can interpret me as meaning "The rules I've laid out imply&lt;br /&gt;X". They will then be able to assert that first-order reasoning can&lt;br /&gt;justify all the reasoning I'm doing, without even needing any new&lt;br /&gt;axioms.&lt;br /&gt;&lt;/p&gt;I'll then try to argue that I'm actually asserting X, not just&lt;br /&gt;asserting that X follows from the rules I've laid out... but if&lt;br /&gt;they're *really* applying the interpretation properly, they'll tell me&lt;br /&gt;that I'm making a meaningless distinction, since they'll think&lt;br /&gt;R-&gt;(R-&gt;h(X)) is the same as R-&gt;h(X). 
(If they make this reasoning&lt;br /&gt;explicit, though, I have them: I can assert that I'm not saying&lt;br /&gt;R-&gt;h(X), I'm saying h(X).)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-2547533717717677094?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/2547533717717677094/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2009/04/some-musings-on-interpretations.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/2547533717717677094'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/2547533717717677094'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2009/04/some-musings-on-interpretations.html' title=''/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-8508038023578943909</id><published>2009-04-17T09:21:00.000-07:00</published><updated>2009-04-17T09:21:00.769-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='metamathematics'/><category scheme='http://www.blogger.com/atom/ns#' term='truth'/><category scheme='http://www.blogger.com/atom/ns#' term='logic'/><title type='text'></title><content type='html'>&lt;span style="font-size:180%;"&gt;Truth and Nonsense&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Continues &lt;a 
href="http://dragonlogic-ai.blogspot.com/2009/03/after-having-thought-about-it-i-am.html"&gt;this&lt;/a&gt; post.&lt;br /&gt;&lt;br /&gt;I think now that it is relatively straightforward to establish a correspondence between the Tarski hierarchy of truth and my hierarchy of nonsense.&lt;br /&gt;&lt;br /&gt;Basically, the two hierarchies diverge thanks to two different notions of the correct way to add a "truth" predicate to a base language. The Tarski hierarchy adds a metalanguage that only talks about truth in the base language. The nonsense hierarchy instead prefers Kripke's method, in which we construct a metalanguage that contains its own truth predicate. Both approaches can be thought of as constructing a metalanguage on top of a base language, but the Tarski hierarchy &lt;span style="font-style: italic;"&gt;keeps doing so&lt;/span&gt;, resulting in a hierarchy of truth, whereas the Kripke fixed-point construction cannot be iterated; doing so adds nothing more. To continue upwards in the Kripke construction, we proceed in a different direction, adding &lt;span style="font-style: italic;"&gt;nonsense&lt;/span&gt; predicates.&lt;br /&gt;&lt;br /&gt;When we use the Kripke truth construction, we can clearly interpret the first Tarski iteration: all the truth-sentences that talk only about truth of base-language statements will be there, provided we have enough machinery to interpret the restricted quantifiers. (Details here will depend on the exact construction.) The semantics assigns them the same truth values; none of these sentences will come up undefined. (I'm talking about semantic interpretations, not proof-theoretic ones... again, details need to be worked out.) The second iteration of Tarskian truth will similarly be inside the Kripke construction; since the first iteration gets definite truth values, the second does. 
So it goes for as long as the Kripke construction can interpret the restricted quantifiers; that is, for as long as the characteristics of a particular level of the Tarski hierarchy are definable given the tools that the Kripke construction has at its disposal. For example, if these tools can only define computable structures, I'd suppose that the Kripke construction would interpret the portions of the Tarski hierarchy corresponding to the computable ordinals. (That's just a guess. More research required!)&lt;br /&gt;&lt;br /&gt;In any case, given a particular amount of expressiveness in the base language, the Kripke construction will add a definite amount of expressiveness, corresponding to climbing a particular number of Tarski-hierarchy steps. (Probably this is known; I imagine someone has researched the semantic expressiveness of the Kripke least-fixed-point...) So what happens when we add in more nonsense predicates? Well, adding in nonsense predicates basically allows us to climb that same number of levels again; each nonsense predicate plays the role of allowing us to talk fully about the semantic structure of the construction-so-far (the role that the &lt;span style="font-style: italic;"&gt;truth&lt;/span&gt; predicate plays in the Tarski hierarchy). This can be thought of as adding that amount of structure to the base language. Then, the Kripke truth construction can do its work on that increased amount of structure. So, we jump up the same number of steps on the Tarski hierarchy for every nonsense predicate added.&lt;br /&gt;&lt;br /&gt;Eventually, since the amount of structure added by the truth predicate is always fixed, the scene will be dominated by the hierarchical structure added by the nonsense predicates. Still, it seems clear that each level will correspond in a definite way to a level on the Tarski hierarchy. 
The nonsense hierarchy merely forces one to make larger jumps at a time.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-8508038023578943909?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/8508038023578943909/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2009/04/truth-and-nonsense-continues-this-post.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/8508038023578943909'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/8508038023578943909'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2009/04/truth-and-nonsense-continues-this-post.html' title=''/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-5104564335420941914</id><published>2009-04-10T11:31:00.000-07:00</published><updated>2009-04-24T14:38:18.320-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='foundations of mathematics'/><category scheme='http://www.blogger.com/atom/ns#' term='logic'/><title type='text'></title><content type='html'>&lt;span style="font-size:180%;"&gt;Enumerating Infinity&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;What are we doing when we list infinities? 
Particularly, I take &lt;a href="http://www.google.com/search?q=ordinal+site:http://en.wikipedia.org/wiki+-inurl:%22User:%22+-inurl:Talk:+-inurl:%22User_talk:%22+-inurl:%22Template:%22+-inurl:%22Template_talk:%22&amp;amp;btnI=I%27m+Feeling+Lucky"&gt;ordinal infinities&lt;/a&gt; as my example.&lt;br /&gt;&lt;br /&gt;The most natural ordinals, which people will tend to stay within if you ask them to invent bigger and bigger ordinals and give them the basic definitions, are the &lt;a href="http://en.wikipedia.org/wiki/Recursive_ordinal"&gt;recursive ordinals&lt;/a&gt;. In fact, people will tend to stay within what I'd call the "primitive recursive" ordinals:&lt;br /&gt;&lt;br /&gt;infinity, infinity + 1, infinity + 2, .... infinity + infinity (= 2*infinity), 3*infinity, 4*infinity, ... infinity*infinity (= infinity^2), infinity^3... infinity^infinity (=infinity^^2), infinity^(infinity^infinity) (=infinity^^3), infinity^(infinity^(infinity^infinity)) (=infinity^^4) ...... infinity^^infinity ..... infinity^^^infinity ..... infinity^^^^infinity .....&lt;br /&gt;&lt;br /&gt;The above uses &lt;a href="http://en.wikipedia.org/wiki/Knuth%27s_up-arrow_notation"&gt;up-arrow notation&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Enumerating computable ordinals involves finding halting computations, such as the numerical operations used above. Sticking infinity into a computable function basically stands for the infinite &lt;span style="font-style: italic;"&gt;series&lt;/span&gt; produced by sticking &lt;span style="font-style: italic;"&gt;finite&lt;/span&gt; numbers in that same location. So just "infinity" stands for:&lt;br /&gt;&lt;br /&gt;1, 2, 3, 4, 5, ....&lt;br /&gt;&lt;br /&gt;We can define + as the concatenation of two different orderings, which leaves the internals of the orderings untouched but specifies that all elements from the second will be larger than any elements from the first. So, "infinity + 1" stands for:&lt;br /&gt;&lt;br /&gt;1, 2, 3, 4, .... 
A.&lt;br /&gt;&lt;br /&gt;Here, "A" is a pseudo-number larger than all the finite numbers. (In other words, A is the first infinite number.)&lt;br /&gt;&lt;br /&gt;"infinity + infinity":&lt;br /&gt;&lt;br /&gt;1, 2, 3, ... A, A+1, A+2, A+3 ...&lt;br /&gt;&lt;br /&gt;Here, there are essentially two infinite series, the one that starts with 1, and the one that starts with A.&lt;br /&gt;&lt;br /&gt;"infinity + infinity + infinity":&lt;br /&gt;&lt;br /&gt;1, 2, 3, ... A, A+1, A+2, ... B, B+1, B+2 ...&lt;br /&gt;&lt;br /&gt;Now there are 3 series, starting with 1, A, and B.&lt;br /&gt;&lt;br /&gt;"infinity * infinity":&lt;br /&gt;&lt;br /&gt;1, 2, 3, ... A, A+1, A+2 ... B, B+1 ... C, C+1 ... D, D+1 ... ...&lt;br /&gt;&lt;br /&gt;Now we have an infinite number of series.&lt;br /&gt;&lt;br /&gt;And so on. We can imagine each computable function as a method of specifying an infinite tree. The tree is "computable" because we can generate any finite portion of the tree using the computable function. Approximately speaking, the more tangled and structured and multifarious the branching of the tree, the larger the ordinal. But if we try just generating computations at random, then they may not halt; so we've got to try to put as much tangledness in as we can without putting in &lt;span style="font-style: italic;"&gt;so&lt;/span&gt; much tangledness that things become ill-defined. This can potentially take any knowledge we've got about the &lt;a href="http://en.wikipedia.org/wiki/Halting_problem"&gt;halting problem&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;If this were all there was to enumerating infinities, then I would say that something like probabilistic &lt;a href="http://dragonlogic-ai.blogspot.com/2008/06/basic-hyperlogic-my-search-for-proper.html"&gt;mathematical truth&lt;/a&gt; explains what it is we're doing. However, mathematicians (especially logicians) talk about ordinals that are not computable. 
These include &lt;a href="http://en.wikipedia.org/wiki/Kleene%27s_O"&gt;Kleene's O&lt;/a&gt;, the first uncomputable ordinal (which makes it the ordering on all computable ordinals), as well as many &lt;a href="http://en.wikipedia.org/wiki/Large_countable_ordinal#Beyond_recursive_ordinals"&gt;higher countable ordinals&lt;/a&gt;; the first uncountable ordinal, which is of course just the ordering of all countable ordinals; and so on.&lt;br /&gt;&lt;br /&gt;To make the initial jump outside of the recursive ordinals, to Kleene's O, we need to make an interesting sort of move: we admit our ignorance. We give up the hope of being able to enumerate every ordinal on the way, and make do with saying "if the function halts, then the ordinal is well-defined". Since we never will have all the information about which functions halt, we'll always be somewhat ignorant of which of these ordinals are well-defined. Yet, we look at their ordering as a mathematical entity, and start talking about it.&lt;br /&gt;&lt;br /&gt;This gives the impression that we'll need to give up more and more if we want to climb higher in the ordinals. But how much can we give up before we've given everything up?&lt;br /&gt;&lt;br /&gt;I propose that, in general, the process of listing ordinals is a process of deciding which things are &lt;span style="font-style: italic;"&gt;well-defined&lt;/span&gt;. If we give &lt;span style="font-style: italic;"&gt;that&lt;/span&gt; up, we've given up too much.&lt;br /&gt;&lt;br /&gt;Here, "well-defined" means "having a description on some level of Tarski's hierarchy of truth", or alternatively, "having a description on some level of my hierarchy of nonsense".&lt;br /&gt;&lt;br /&gt;Of course, this is circular. Those hierarchies are both defined in terms of ordinals, so defining ordinals in terms of them appears unproductive. However, it is not &lt;span style="font-style: italic;"&gt;completely&lt;/span&gt; unproductive. 
Let's take Tarski's hierarchy as the working example. Let 0 represent first-order logic, 1 represent the theory of truth for that (specifically, the theory with enough axioms to be equivalent in strength to Peano Arithmetic), 2 represent the theory of truth-in-1, and so on.&lt;br /&gt;&lt;br /&gt;The thing I want to note here is that the ordinal assigned to a level in the hierarchy is far lower than the largest ordinals whose existence can be proven in that theory. Suppose I lay down axioms for Tarski's hierarchy in terms of ordinals, and then lay down axioms for ordinals which require definability in Tarski's hierarchy. It seems that I'll get a large number of ordinals in this manner. If I start out believing that the ordinal 1 is well-defined, then I'll believe all the ordinals proven well-defined by Peano arithmetic are well-defined. That is a rather large number of ordinals. Since I believe in them, I'll believe in all the levels of the Tarski hierarchy corresponding to them... lots of levels! This gives me many more ordinals to believe in, which gives me many more levels to believe in, and so on.&lt;br /&gt;&lt;br /&gt;Of course, this stops somewhere (in the same sense that counting up stops somewhere...). It will only imply the existence of so many ordinals, assuming that it is a consistent theory. Furthermore, if it is consistent, then I can name an ordinal that it does not: the ordering of all the ordinals the system talks about. Let's call this the "outside ordinal" for the system. (This is a bit trickier to specify than it looks at first, however. We can't just say "the ordering of all ordinals the system will consider well-defined". The system will have gaps in its knowledge. For example, it will prove a bunch of recursive ordinals to be well-defined, but not all of them; it then jumps to Kleene's O, because it can &lt;span style="font-style: italic;"&gt;talk about&lt;/span&gt; the set of well-defined recursive ordinals. 
Even more clearly: the system might be able to prove that the first uncountable ordinal is well-defined, but it will not be able to prove that all ordinals below this are well-defined... there are uncountably many of them!)&lt;br /&gt;&lt;br /&gt;The main precaution that must be taken is to prevent the system from taking "the ordering over all ordinals" to be an ordinal. This is like me saying "Consider the set of all well-defined ordinals. Consider the ordering on these as an ordinal, Q. Take Q + 1..." This is not allowed!&lt;br /&gt;&lt;br /&gt;OK. Given that, let's think about what happens when we add probabilistic justification to the system. We can think of the system as (eventually) knowing the truth about the halting problem (for any particular instance). This means that it is (eventually) correct in its judgements about well-definedness for computable ordinals. Thanks to the feedback effect of the system, this will mean that it is (eventually) correct in its judgements concerning a whole lot of ordinals. All of them? No: just as there are ordinal notations for which the halting problem determines well-definedness, there are ordinal notations for which the convergence problem determines well-definedness. (For a definition of the convergence problem, see &lt;a href="http://www.idsia.ch/%7Ejuergen/kolmogorov.html"&gt;here&lt;/a&gt;.) Still, this is an interesting class of ordinals.&lt;br /&gt;&lt;br /&gt;So how could one go even further? 
Well, perhaps we could consider the "outside ordinal" for the version of the system that knows all the halting truths...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-5104564335420941914?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/5104564335420941914/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2009/04/enumerating-infinity-what-are-we-doing.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/5104564335420941914'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/5104564335420941914'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2009/04/enumerating-infinity-what-are-we-doing.html' title=''/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-8880298578986466285</id><published>2009-04-08T12:36:00.000-07:00</published><updated>2009-04-11T11:39:09.497-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='ethics'/><title type='text'></title><content type='html'>&lt;span style="font-size:180%;"&gt;Some Numbers&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Continues &lt;a href="http://dragonlogic-ai.blogspot.com/2009/04/risk-estimate-4-probability-estimates.html"&gt;Risk Estimate&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;I am generally disappointed by the lack of detailed tracking of the 
progress of computing power... I do not know where to find projections based on large amounts of up-to-date data. The most often cited projection is Kurzweil's, based on data from up to 2000; for example, &lt;a href="http://images.google.com/imgres?imgurl=http://brent.kearneys.ca/wp-content/uploads/2009/02/ss08_exponential_growth_large.jpg&amp;amp;imgrefurl=http://brent.kearneys.ca/2009/02/14/the-singularity-summit-2008/&amp;amp;usg=__c3EBL4Ep1vpTVtvYcTWgmKes6Pk=&amp;amp;h=870&amp;amp;w=1020&amp;amp;sz=141&amp;amp;hl=en&amp;amp;start=14&amp;amp;um=1&amp;amp;tbnid=DGCdHNouYbYr5M:&amp;amp;tbnh=128&amp;amp;tbnw=150&amp;amp;prev=/images%3Fq%3Dkurzweil%2Bexponential%26hl%3Den%26client%3Dfirefox-a%26rls%3Dcom.ubuntu:en-US:unofficial%26sa%3DN%26um%3D1"&gt;here&lt;/a&gt;. I found one that may be &lt;a href="http://images.google.com/imgres?imgurl=http://havemacwillblog.com/wordpress/wp-content/gallery/article-diagrams/compint.gif&amp;amp;imgrefurl=http://havemacwillblog.com/2009/02/02/what-will-happen-when-computers-exceed-our-inteligence/&amp;amp;usg=__rbLvACKNmakUZHS-1ExWG_60gWg=&amp;amp;h=380&amp;amp;w=530&amp;amp;sz=23&amp;amp;hl=en&amp;amp;start=21&amp;amp;um=1&amp;amp;tbnid=_i-kBd56wtNHgM:&amp;amp;tbnh=95&amp;amp;tbnw=132&amp;amp;prev=/images%3Fq%3Dkurzweil%2Bexponential%26ndsp%3D18%26hl%3Den%26client%3Dfirefox-a%26rls%3Dcom.ubuntu:en-US:unofficial%26sa%3DN%26start%3D18%26um%3D1"&gt;more recent&lt;/a&gt;, but it does not show data points, just a curve. These curves are based on what $1000 gets you. 
It is also interesting to note the &lt;a href="http://images.google.com/imgres?imgurl=http://www.vetta.org/VettaPics/SuperComputerPower.jpg&amp;amp;imgrefurl=http://www.vetta.org/2006/09/moore-power/&amp;amp;usg=__yTEEebWVASQ4un8DUuGrcR6XCV4=&amp;amp;h=429&amp;amp;w=589&amp;amp;sz=41&amp;amp;hl=en&amp;amp;start=56&amp;amp;um=1&amp;amp;tbnid=kunKimWbdQhJ8M:&amp;amp;tbnh=98&amp;amp;tbnw=135&amp;amp;prev=/images%3Fq%3Dkurzweil%2Bexponential%26ndsp%3D18%26hl%3Den%26client%3Dfirefox-a%26rls%3Dcom.ubuntu:en-US:unofficial%26sa%3DN%26start%3D54%26um%3D1"&gt;supercomputing curve&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Moore himself believes that the exponential trend will stop in &lt;a href="http://news.cnet.com/8301-10784_3-9780752-7.html"&gt;10 to 15 years&lt;/a&gt;, but he does not take into account 3d chips (by his own admission) or other possible ways around the halting spot. I think it is safe to allow a few more years based just on that... but since the halting spot is a point of contention, for now I'll just give the analysis based on the raw exponential curve.&lt;br /&gt;&lt;br /&gt;By the curve, computers reach human-level processing power between 2020 and 2030. If we leave the curve at exponential, rather than going super-exponential as Kurzweil suggests we should, the rate looks to be around 5 orders of magnitude for every 10 to 20 years. (I know that is a big range, but I don't have much data to go on.) So if a computer is 1 person in 2025, it is 10 people in 2040, and 100 people in 2055.&lt;br /&gt;&lt;br /&gt;What does it mean for a computer to be as smart as 10 or 100 people? A first approximation would be to say that it would be able to accomplish as much intellectual activity as a group of 10 or 100 people that were completely unified in their aim, and could communicate with each other continuously. But even this estimate is low, because there is a large amount of redundancy in 10 such people. 
It is hard to estimate how much, but roughly we could say that only 1 visual cortex is needed, and the other 9 people could be using that processing power for something else; only 1 motor cortex is needed, so the other 9 could be using that processing power for something else; and so on. This might (roughly) double the amount of thinking power, after the first person (who needs their motor and visual cortex intact).&lt;br /&gt;&lt;br /&gt;So how safe is this? I'd say as soon as we get roughly human level we've got a significant risk of the AI deciding it would be awesome to "get loose" and use spare computer power stolen from the internet. My estimate could be greatly improved here, but there are about 7 billion people in the world, and more will be here by 2020, so assuming 1/8th have computers on the internet (that is where the estimate is shaky) we're talking a 1-billion-fold increase in computing power as soon as an AI is able to "get loose". That assumes that the 1 billion computers on the net are roughly equivalent in power to the 1 that the AI started on (in line with the idea that we're estimating based on what $1000 of computing power is). By my earlier estimate, an AI as smart as 1 person becomes as smart as 2 billion. But once we go distributed, the advantages I was talking about go away; the massive redundancy becomes necessary. So, back to 1 billion. (The idea that "smartness" merely doubles when we get rid of the inefficiencies of distributed existence is blatantly silly for the case of billions of people... oh well.)&lt;br /&gt;&lt;br /&gt;So, could the world defend itself against the intellectual effort of 1 billion virtual people? We've got 8 billion on our side (by that point)... plus we start out with a good-sized advantage in terms of control of resources. 
How much could 1 billion hackers acquire on short notice?&lt;br /&gt;&lt;br /&gt;For now, I'll give that a 50% chance of extinction assuming bad intent on the AI's part, and assuming it is able to get on the internet, and assuming it is able to create a virus of some kind (or use some other scheme) to get a fair chunk of the internet's computing power. I'll give 50% probability to those other two as well... making 12.5% probability of extreme badness given evil intent. So the question is, how probable is bad intent in this scenario?&lt;br /&gt;&lt;br /&gt;By the way, this estimate puts current computers at around the level of a mouse. Do the best current AIs achieve this? Seems doubtful, I'd say. Mice are pretty smart. They accomplish a fair amount of visual recognition, and furthermore, they are able to put it to good use. (The second part is what we have the least ability to model, I think... goal-oriented systems that can flexibly use highly structured sensory info.) So, by the model so far, AI progress will more probably be sudden than gradual... someone will put together an algorithm capable of taking full advantage of the hardware, and things will change.&lt;br /&gt;&lt;br /&gt;I'd say that might happen in anywhere between 5 and 20 years. The outcomes if it happens in 5 years are very different from those if it happens in 20. If it happens in 5 years, I'd say good results are fairly certain. 10 years, and there is more concern. 20 years and Very Bad Things have fair chances, maybe as high as my 10% "halt everything" level.&lt;br /&gt;&lt;br /&gt;Let's take that arbitrary statement and mathematize it... 5 to 10 years = .1% chance of Really Bad Things, 10 to 15 = 1%, 15 to 20 = 10%.&lt;br /&gt;&lt;br /&gt;Giving each option a 1/3 probability, we have around 3.7%.&lt;br /&gt;&lt;br /&gt;But, given my assumptions, the best way to reduce risk appears to be trying to find a good algorithm quickly (favoring open research). 
At some point between 2015 and 2020, the risk goes beyond 1% (which I arbitrarily label as the point of "real concern") and the strategy should turn towards making sure the first good-enough algorithm is also endowed with a safe goal system.&lt;br /&gt;&lt;br /&gt;It should be obvious that this estimate is an &lt;span style="font-style: italic;"&gt;extremely&lt;/span&gt; rough one. More later?&lt;br /&gt;&lt;br /&gt;---[edit]---&lt;br /&gt;&lt;br /&gt;One of the most obvious factors not considered here is the chance that the brain is badly-designed enough that a human level AI could be run on current PCs. The probability of this is nonzero, but if the brain were really so inefficient (= 5 orders of magnitude of inefficiency), I would expect that human AI programmers would already be outperforming it. The idea that current AIs are not as smart as mice despite having roughly as much hardware suggests that brains are fairly well-engineered. (The statement "roughly as much hardware" here needs to be made more specific. 
However, as long as it is inside 5 orders of magnitude, the argument makes some sense.)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-8880298578986466285?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/8880298578986466285/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2009/04/some-numbers-continues-risk-estimate.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/8880298578986466285'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/8880298578986466285'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2009/04/some-numbers-continues-risk-estimate.html' title=''/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-930172177372383720</id><published>2009-04-08T09:51:00.000-07:00</published><updated>2009-04-08T09:51:00.817-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='constructivism'/><category scheme='http://www.blogger.com/atom/ns#' term='logic'/><title type='text'></title><content type='html'>&lt;span style="font-size:180%;"&gt;Provable Truths&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The picture I've been using mostly for the idea of knowable truths goes something like this: quantifier-free statements are computable (we can tell if they are true or false); statements 
that can be reduced to a normal form containing only existential quantifiers are computably verifiable (if they are true, we can know this by eventually finding an example) but only probabilistically falsifiable (if they are false, we'll never know it for sure, but we can suspect it because we've tried for a long time to find an example and failed); statements similarly containing only universal quantifiers are falsifiable but only probabilistically verifiable; statements whose normal form universally asserts an existential statement are probabilistically falsifiable; statements asserting the existence of something satisfying a universal are probabilistically verifiable. Nothing else appears to be knowable in any significant sense.&lt;br /&gt;&lt;br /&gt;Nonetheless, even Robinson arithmetic gives us more definite truths than this scheme does. Statements like "For all x, x=x" can be verified, not just falsified. Should this be looked at as probabilistic knowledge? This view becomes understandable if we think of equality as a black box, which we know nothing about. We just feed it numbers, and accept what we get out. If not the axioms of equality, at least the first-order tautologies seem knowable: perhaps "for all x, x=x" is probabilistic knowledge of the behavior of equality, but "for all x, x=x if x=x" is clear and knowable... right? Well, this too could be questioned by claiming that the truth functions are also to be treated as black boxes.&lt;br /&gt;&lt;br /&gt;Why would we take this view? Perhaps we claim that knowledge of the actual structure of equality, and similarly of the actual structure of truth functions, should be represented at the metalanguage level. Then, "for all x, x=x" would be something we could prove in the metalanguage, thanks to our additional information about equality, but not in the language itself. 
I think there is something intuitively appealing about this, despite the restrictive nature of the claim.&lt;br /&gt;&lt;br /&gt;The base logic would contain rules for reducing equality statements to true/false, and similarly rules that reduce a truth-function whose arguments are already true or false. A statement that reduces to "true" can then be concluded. In addition to these rules, there would be a rule that allowed existential quantifications to be concluded from examples; this would be a fairly complicated process, as we can pull out any set of identical pieces and replace them with a variable. Once an existential statement has been concluded, it can reduce to "true" in larger expressions. Universal quantifiers can be defined from existentials as usual, which allows us to conclude their negation by finding counterexamples (but does not allow us to &lt;span style="font-style: italic;"&gt;affirm&lt;/span&gt; any universals).&lt;br /&gt;&lt;br /&gt;How could a metalanguage be constructed along similar lines, but yielding the first-order truths we're used to? Not sure. 
What's needed most is the ability to see that even though we don't have all the information for a truth-function to be calculated, the additional information will create the same answer no matter what.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-930172177372383720?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/930172177372383720/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2009/04/provable-truths-picture-ive-been-using.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/930172177372383720'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/930172177372383720'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2009/04/provable-truths-picture-ive-been-using.html' title=''/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-6560982616593049213</id><published>2009-04-06T09:47:00.000-07:00</published><updated>2009-04-06T13:38:49.883-07:00</updated><title type='text'></title><content type='html'>&lt;span style="font-size:180%;"&gt;Risk Estimate&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;4 probability estimates seem essential to me.&lt;br /&gt;&lt;br /&gt;&lt;ol&gt;&lt;li&gt;A projection of the growth of computing power. 
This is of course extensively discussed, so it is at least somewhat reasonable to use standard predictions.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Estimation of probable AI intelligence levels given different hardware levels. There is a lot of hidden complication here. First, how do we want to measure intelligence? How should we define intelligence level? The most relevant way of doing this is to estimate real-world problem solving capabilities (in each important area). Second, what we want isn't just a fixed maximum intelligence level, but rather a curve over time; this captures information about intelligence explosions due to fast learning or recursive self-improvement. Third, ideally we don't just want one probable curve for each hardware level but rather a probability distribution over such curves.&lt;/li&gt;&lt;li&gt;Estimation of probability of unfriendly behavior, given various classes of probable goal functions that AI programmers might come up with. Of course we'll also want to assign a probability to each of these goal-classes. The probability of unfriendly behavior depends mainly on the intelligence level reached (which depends in turn on the hardware level). 
An AI that is only somewhat smarter than humans will more probably have incentive to be nice to humans for the same reasons that humans have incentive to be nice to each other.&lt;/li&gt;&lt;li&gt;Estimation of human ability to respond to various intelligence levels if an AI of that level turns out to be unfriendly.&lt;/li&gt;&lt;/ol&gt;Chaining all these estimates together would result in a fair estimate of AI risk.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-6560982616593049213?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/6560982616593049213/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2009/04/risk-estimate-4-probability-estimates.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/6560982616593049213'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/6560982616593049213'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2009/04/risk-estimate-4-probability-estimates.html' title=''/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-907155098972636742</id><published>2009-04-02T09:30:00.000-07:00</published><updated>2009-04-06T06:51:55.987-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='ethics'/><title type='text'></title><content type='html'>&lt;span 
style="font-size:180%;"&gt;2 Futures&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The question of AI ethics is not merely a matter of estimating the probability of Really Bad Things; it is a matter of weighing the probability of Really Bad Things against the probability of Really Good Things.&lt;br /&gt;&lt;br /&gt;This makes some of the stuff in the previous post less relevant (though not completely irrelevant). The major question becomes: in various scenarios, what is the ratio of good to bad?&lt;br /&gt;&lt;br /&gt;When thinking about these issues, I find myself going back and forth between two concepts of the future:&lt;br /&gt;&lt;br /&gt;&lt;ol&gt;&lt;li&gt;AIs increase in intelligence slowly. For a long time, AIs remain at human level or anyway human-understandable level. There is no miracle of recursive self-improvement; you get out what you put in.&lt;/li&gt;&lt;li&gt;The first AI smart enough to begin recursive self-improvement quickly goes beyond the human-understandable level of intelligence, and has its way with the future.&lt;/li&gt;&lt;/ol&gt;I think of the first as "the normal-world outcome" and the second as "the science-fiction outcome". A good argument could be made for reversing the labels. The first future is much more like what is normally written about in scifi; AIs typically need to be at the human-understandable level in order for humans to write about them, and furthermore AI is usually a side-part of the story (so admitting the possibility of explosive recursive improvement would get in the way of the plot). Thus, scifi has reason to unrealistically limit AI success.&lt;br /&gt;&lt;br /&gt;I have argued already that the first outcome would be safer than the second; the opportunity for Really Good Things is less, but this (I would think) is outweighed by the far reduced probability of Really Bad Things. 
(In scenario 1, sociopathic AIs would occur, but they would be of roughly human level and so could be contained; by the time AIs were too powerful, the art of creating ethical AIs would be fairly well perfected. Also, even if an unethical AI were created, there would be a safety net of nearly-as-smart AIs to contain it.)&lt;br /&gt;&lt;br /&gt;Of course, there isn't freedom to choose between the two outcomes, for the most part. Either recursive self-improvement (aka RSI) is a powerful possibility, or it isn't. To some extent, the outcome is dependent on whether the algorithmic research outpaces hardware developments, or vice versa (as discussed last time). The "race between software and hardware" mindset would have it that the best strategy for avoiding Really Bad Things is to make algorithmic progress as quickly as possible-- which favors open research, et cetera. But even if algorithms outpace hardware, there could still be an RSI tipping-point: suddenly the hardware is good enough to support RSI, and outcome 2 occurs. So this isn't well-justified. What is needed is a solid analysis of RSI.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.blogger.com/www.mattmahoney.net"&gt;Matt Mahoney&lt;/a&gt; offers an argument that &lt;a href="http://www.mattmahoney.net/rsi.pdf"&gt;RSI is inherently slow&lt;/a&gt;. However, this model is limited by the (purposeful) restriction to strict &lt;span style="font-style: italic;"&gt;self&lt;/span&gt;-improvement; in particular, the exclusion of learning. Since explosive RSI-like learning capabilities are essentially as dangerous, this isn't a &lt;span style="font-style: italic;"&gt;particularly&lt;/span&gt; helpful model. Shane Legg makes &lt;a href="http://www.vetta.org/documents/IDSIA-12-06-1.pdf"&gt;similar claims&lt;/a&gt; for learning. 
As it happens, I &lt;a href="http://www.mail-archive.com/agi@v2.listbox.com/msg13113.html"&gt;disagree&lt;/a&gt; with those claims:&lt;br /&gt;&lt;blockquote&gt;&lt;br /&gt;&lt;pre&gt;3 days ago Matt Mahoney referenced a paper by Shane Legg, supposedly&lt;br /&gt;formally proving this point:&lt;br /&gt;&lt;br /&gt;&lt;a rel="nofollow" href="http://www.vetta.org/documents/IDSIA-12-06-1.pdf"&gt;http://www.vetta.org/documents/IDSIA-12-06-1.pdf&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;I read it, and must say that I disagree with the interpretations&lt;br /&gt;provided for the theorems. Specifically, one conclusion is that&lt;br /&gt;because programs of high Kolmogorov complexity are required if we want&lt;br /&gt;to guarantee the ability to learn sequences of comparably high&lt;br /&gt;Kolmogorov complexity, AI needs to be an experimental science. So,&lt;br /&gt;Shane Legg is assuming that highly complex programs are difficult to&lt;br /&gt;invent. But there is an easy counterexample to this, which also&lt;br /&gt;addresses your above point:&lt;br /&gt;&lt;br /&gt;Given is T, the amount of computation time the algorithm is given&lt;br /&gt;between sensory-inputs. Sensory inputs can ideally be thought of as&lt;br /&gt;coming in at the rate of 1 bit per T cpu cycles (fitting with the&lt;br /&gt;framework in the paper, which has data come in 1 bit at a time),&lt;br /&gt;although in practice it would probably come in batches. Each time&lt;br /&gt;period T:&lt;br /&gt;--add the new input to a memory of all data that's come in so far&lt;br /&gt;--Treat the memory as the output of a computer program in some&lt;br /&gt;specific language. Run the program backwards, inferring everything&lt;br /&gt;that can be inferred about its structure. A zero or one can only be&lt;br /&gt;printed by particular basic print statements. 
It is impossible to know&lt;br /&gt;for certain where conditional statements are, where loops are, and so&lt;br /&gt;on, but at least the space of possibilities is well defined (since we&lt;br /&gt;know which programming language we've chosen). Every time a choice&lt;br /&gt;like this occurs, we split the simulation, so that we will quickly be&lt;br /&gt;running a very large number of programs backwards.&lt;br /&gt;--Whenever we get a complete program from this process, we need to run&lt;br /&gt;it forwards (again, simulating it in parallel with everything else&lt;br /&gt;that is going on). We record what it predicts as the NEXT data, along&lt;br /&gt;with the program's length (because we will be treating shorter&lt;br /&gt;programs as better models, and trusting what they tell us more&lt;br /&gt;strongly than we trust longer programs).&lt;br /&gt;--Because there are so many things going on at once, this will run&lt;br /&gt;VERY slowly; however, we will simply terminate the process at time T&lt;br /&gt;and take the best prediction we have at that point. (If we hadn't&lt;br /&gt;gotten any yet, let's just say we predict 0.)&lt;br /&gt;&lt;br /&gt;A more sophisticated version of that alg was presented at the AGI&lt;br /&gt;conference in this paper:&lt;br /&gt;&lt;br /&gt;&lt;a rel="nofollow" href="http://www.agiri.org/docs/ComputationalApproximation.pdf"&gt;http://www.agiri.org/docs/ComputationalApproximation.pdf&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;The algorithm will be able to learn any program, if given enough time!&lt;br /&gt;&lt;br /&gt;NOW, why did Shane Legg's paper say that such a thing was impossible?&lt;br /&gt;Well, in the formalism of the paper, the above "algorithm" cheated: it&lt;br /&gt;isn't an algorithm at all! Fun, huh?&lt;br /&gt;&lt;br /&gt;The reason is because I parameterized it in terms of that number T.&lt;br /&gt;So, technically, it is a class of algorithms; we get a specific&lt;br /&gt;algorithm by choosing a T-value. 
If we choose a very large T-value,&lt;br /&gt;the algorithm could be very "complex", in terms of Kolmogorov&lt;br /&gt;complexity. However, it will not be complex to humans, since it will&lt;br /&gt;just be another instance of the same general algorithm! In fact, it&lt;br /&gt;would just be the same algorithm given more time to do its job.&lt;br /&gt;&lt;br /&gt;So, on those grounds, I disagree with Shane Legg: it is still possible&lt;br /&gt;to find algorithms analytically, despite the found algorithms being&lt;br /&gt;"of high complexity". Or, to rephrase, there are simple algorithms of&lt;br /&gt;high Kolmogorov complexity, and those algorithms do the job that can't&lt;br /&gt;be done by algorithms of low Kolmogorov complexity.&lt;br /&gt;&lt;/pre&gt;&lt;/blockquote&gt;&lt;pre&gt;&lt;br /&gt;&lt;/pre&gt;(As an aside, the next paragraph in my email uses a mathematically flawed argument: "&lt;span style="font-style: italic;"&gt;Once we have the&lt;/span&gt;&lt;span style="font-style: italic;font-family:monospace;" &gt; &lt;/span&gt;&lt;span style="font-style: italic;"&gt;ordering, I can define the game as: create the lowest video sequence&lt;/span&gt;&lt;span style="font-style: italic;font-family:monospace;" &gt; &lt;/span&gt;&lt;span style="font-style: italic;"&gt;not definable in set theory.&lt;/span&gt;" What I was trying to do was diagonalize set theory, to define a set-theoretically undefinable video sequence. The right way to do this is not to take the "lowest" sequence not definable by set theory. Instead: put an ordering on set-theoretical definitions of sequences. For the Nth frame of our non-definable video sequence, look at the Nth set-theoretic video sequence; if the screen is all white in that sequence, take our sequence to be all black on that frame; otherwise, take it to be all white. :D)&lt;br /&gt;&lt;br /&gt;Still, all I'm claiming is that we can construct an algorithm that will perform better if it is given more time. 
That provides a sort of counterexample, but not one that is explosive (until the point where an AI gets the ability to improve its own hardware).&lt;br /&gt;&lt;br /&gt;[Edit: In the comments, Shane Legg points out that this doesn't really provide a counterexample. We can construct an algorithm that will do better given more time, but that does not mean that for every sequence we might want to learn, there is some amount of processing time that will allow the algorithm to converge correctly. In fact, that's false; there will be sequences that can't converge, for any fixed amount of time we allow the algorithm. Shane also corrects me on the conclusions he draws: he does not, in fact, conclude that "ai must be an experimental science".]&lt;br /&gt;&lt;br /&gt;Eliezer makes a &lt;a href="http://www.overcomingbias.com/2008/05/faster-than-ein.html"&gt;vivid argument&lt;/a&gt; for the possibility of explosively fast learning. Here is the key mathematical bit:&lt;br /&gt;&lt;br /&gt;&lt;p&gt;&lt;/p&gt;&lt;blockquote&gt;&lt;p&gt;I occasionally run into people who say something like, "There's a theoretical limit on how much you can deduce about the outside world, given a finite amount of sensory data."&lt;/p&gt;      &lt;p&gt;Yes.  There is.  The theoretical limit is that every time you see 1 additional bit, it cannot be expected to eliminate more than half of the remaining hypotheses (half the remaining probability mass, rather).  And that a redundant message, cannot convey more information than the compressed version of itself.  Nor can a bit convey any information about a quantity, with which it has correlation &lt;em&gt;exactly zero,&lt;/em&gt; across the probable worlds you imagine.&lt;/p&gt;  &lt;p&gt;But nothing I've depicted this human civilization doing, even &lt;em&gt;begins&lt;/em&gt; to approach the theoretical limits set by the formalism of Solomonoff induction.  
It doesn't approach the picture you could get if you could search through &lt;em&gt;every single computable hypothesis&lt;/em&gt;, weighted by their simplicity, and do Bayesian updates on &lt;em&gt;all&lt;/em&gt; of them.&lt;/p&gt;&lt;/blockquote&gt;&lt;p&gt;&lt;/p&gt;&lt;br /&gt;He does, however, admit reservations about the possibility of this power being manifested in the physical universe:&lt;br /&gt;&lt;br /&gt;&lt;p&gt;&lt;/p&gt;&lt;blockquote&gt;&lt;p&gt;No one - not even a Bayesian superintelligence - will ever come remotely close to making efficient use of their sensory information...&lt;/p&gt;  &lt;p&gt;...is what I would like to say, but I don't trust my ability to set limits on the abilities of Bayesian superintelligences.&lt;/p&gt;  &lt;p&gt;(Though I'd bet money on it, if there were some way to judge the bet.  Just not at very extreme odds.)&lt;/p&gt;&lt;/blockquote&gt;&lt;p&gt;&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;But, of course, an entity (say) halfway between humans and a Bayesian Superintelligence is still rather dangerous.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-907155098972636742?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/907155098972636742/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2009/04/2-futures-question-of-ai-ethics-is-not.html#comment-form' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/907155098972636742'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/907155098972636742'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2009/04/2-futures-question-of-ai-ethics-is-not.html' title=''/><author><name>Abram 
Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-1131454920500345146</id><published>2009-03-27T15:48:00.000-07:00</published><updated>2009-03-29T15:22:44.731-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='ai'/><category scheme='http://www.blogger.com/atom/ns#' term='ethics'/><category scheme='http://www.blogger.com/atom/ns#' term='morality'/><category scheme='http://www.blogger.com/atom/ns#' term='AGI'/><category scheme='http://www.blogger.com/atom/ns#' term='singularity'/><title type='text'></title><content type='html'>&lt;span style="font-size:180%;"&gt;AI Moral Issues&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I recently had a discussion with &lt;a href="http://yudkowsky.net/"&gt;Eliezer Yudkowsky&lt;/a&gt;. I contacted him mainly because I thought from what I knew of his thinking that he would be interested in foundational logical issues, yet I saw no technical details of this sort in his many online writings (except for a &lt;a href="http://yudkowsky.net/rational/lobs-theorem"&gt;cartoon guide to Löb's Theorem&lt;/a&gt;). I quickly learned the reason for this. Eliezer believes that there is &lt;a href="http://yudkowsky.net/singularity/ai-risk"&gt;great risk&lt;/a&gt; associated with AI, as I already knew. What I had not guessed (perhaps I should have) was that he therefore considers technical details too dangerous to release.&lt;br /&gt;&lt;br /&gt;My natural disposition is to favor open research. So, I am very reluctant to accept this position. Of course, that is not in itself an argument! 
If I want to reject the conclusion, I need an actual reason-- but furthermore I should search just as hard for arguments in the other direction, lest I bias my judgement.&lt;br /&gt;&lt;br /&gt;(That second part is hard... although, it is made easier by the large body of literature Eliezer himself has produced.)&lt;br /&gt;&lt;br /&gt;The key question is the actual probability of Very Bad Things happening. If the chance of disaster thanks to AI research is above, say, .1%, we should be somewhat concerned. If the chances are above 1%, we should be really concerned. If the chances are above 10%, we should put a hold on technological development altogether until we can find a better solution. (&lt;a href="http://aipanic.com/"&gt;AI panic&lt;/a&gt; puts the current probability at 26.3%!)&lt;br /&gt;&lt;br /&gt;There are two issues here: the probability that AIs will become sufficiently powerful to pose a serious threat, and the probability that they would then proceed to do bad things. These topics are large enough &lt;span style="font-style: italic;"&gt;individually&lt;/span&gt; to merit a long discussion, but I'll try to jot down my thoughts.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;AI Power&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;From what I've read (not from my recent discussion with him), Eliezer's main argument for the first point is that AIs will, because they are software, have a far greater ability to self-improve than humans have. Thus, if they start out with approximately human-level intelligence, they could end up far smarter than us just by thinking for a while. "Thinking for a while" could mean a few days, or a few years-- the risk is still there, so long as the AIs would eventually reach superhuman intelligence.&lt;br /&gt;&lt;br /&gt;Personally, I think the human mind has a fairly high ability to self-improve. Sure, we can't modify our neural substrate directly, but it can equally be said that an AI can't modify its silicon. 
Our software could in principle be equally malleable; and the possibility, combined with the assumption that self-improvement is highly effective, would &lt;span style="font-style: italic;"&gt;suggest&lt;/span&gt; that evolution would have hit on such a solution.&lt;br /&gt;&lt;br /&gt;However, Eliezer likes to say that evolution is not very smart. This is certainly true. So, I don't really know how to assign a probability to evolution having hit upon a comparatively flexible form of recursive self-improvement. So, I need to take into account more evidence.&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;I can't "examine my own source code". An AI could. (If humans could do this, AI would be an easy problem!)&lt;br /&gt;&lt;/li&gt;&lt;li&gt;I can learn new ways of thinking. I can even consciously examine them, and decide to accept/reject them.&lt;/li&gt;&lt;li&gt;I can't wish habits away.&lt;/li&gt;&lt;li&gt;An AI could run detailed simulations of parts of itself to test improved algorithms.&lt;/li&gt;&lt;li&gt;I can run mental simulations of myself to some extent (and do so).&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;Tentative conclusion: An AI could be a better self-improver than I, but it would be a difference in degree, not a difference in kind.&lt;br /&gt;&lt;br /&gt;This conclusion is neither here nor there in terms of the question at hand.&lt;br /&gt;&lt;br /&gt;Other arguments to the effect that AIs could gain enough power are ready at hand, however. The main one is that if a particular amount of computing power can create human-level intelligence, it seems pretty obvious that larger amounts of computing power will create larger amounts of intelligence. It also seems like people would willingly cross this threshold, even competing to create the largest AI brain. Even if self-improvement turns out to be a total flop, this would create superhuman intelligences. 
If one such intelligence decided to grab more hardware (for example using a virus to grab computing power from the internet) the amount of computing power available to it would probably become rather large rather fast.&lt;br /&gt;&lt;br /&gt;(All of this assumes human-level AI is possible in the first place: a notable assumption, but one I will not examine for the moment.)&lt;br /&gt;&lt;br /&gt;The argument from superhuman intelligence to superhuman power is fairly straightforward, though perhaps not 100% certain. The AI could hack into things, accumulate large amounts of money through careful investment, buy a private army of robots... more probably it would come up with a much cleverer plan. How smart, exactly, does it need to get in order for this to be a danger? Estimates range from the level of a smart human (because a smart human can already be dangerous) to the intelligence of all humans combined (because that is what the AI would be up against). For the really bad scenarios to occur, it would seem, the AI needs to be capable of major scientific innovation; innovation on the order of at least a team of scientists. (I must admit that, here and now, a single scientist could potentially release a deadly disease from a lab-- this involves no innovation, since the diseases are already there. But, this is because a select few scientists have special access to these diseases. An AI &lt;span style="font-style: italic;"&gt;might&lt;/span&gt; get such access, but that doesn't seem especially probable until a time when AIs are all around and being given all sorts of other responsibilities... at which point, if the AIs are actually unfriendly, the disease is only one of many worries.)&lt;br /&gt;&lt;br /&gt;One question remains: is there enough computing power around &lt;span style="font-style: italic;"&gt;currently&lt;/span&gt; to cause concern? This is something that &lt;span style="font-style: italic;"&gt;did&lt;/span&gt; come up in the conversation. 
If current machines are not risky, then it could be possible today to hit upon the right AI design, yet not achieve human-level intelligence using it. Personally, I think this would be ideal. Such a scenario, with AIs gradually increasing in intelligence as the hardware increased in capability, would give humans time to experiment with AI technology, and also consider its consequences. (Indeed, some argue that this is already the situation: that the fundamental algorithms are already known, and the hardware just needs to catch up. I don't agree, although I can't deny that a large amount of progress has already been made.)&lt;br /&gt;&lt;br /&gt;Eliezer argued that human-level intelligence on modern-day machines was plausible, because evolution is not a good engineer, so human-level intelligence may require far less hardware than the human brain provides. Estimates based on the brain's computing power vary quite widely, because it is not at all clear what in the brain constitutes useful computation and what does not. Low estimates, so far as I am aware, put the brain's computing power near to today's largest supercomputer. High estimates can basically go as far as one likes, claiming that chemistry or even quantum physics needs to be simulated in order to capture what is happening to create intelligence.&lt;br /&gt;&lt;br /&gt;Of course, the internet is vastly more powerful than a single machine. But the risk of an AI escaping to the internet does not seem very high until that AI is at least near human level &lt;span style="font-style: italic;"&gt;pre&lt;/span&gt;-escape. So, what is the probability that current machines could be human-level with the current algorithm?&lt;br /&gt;&lt;br /&gt;My faith in evolution's engineering capabilities is somewhat higher than Eliezer's. 
Specifically, Eliezer is (from what I've read) quite fond of the study of &lt;a href="http://en.wikipedia.org/wiki/List_of_cognitive_biases"&gt;cognitive bias&lt;/a&gt; that has become a popular subfield of psychology. While I enjoy Eliezer's writings on rationality, which explicate many of these biases, I am reluctant to call them design flaws. &lt;span style="font-style: italic;"&gt;Upon reflection&lt;/span&gt;, there are better ways of doing things, and explicating these better ways is an important project. But my best guess is that each cognitive bias we have is there for a reason, essentially because it makes for a good pre-reflection guess. So, rather than design flaws, I see the cognitive biases as clever engineering tricks. (I do not know &lt;span style="font-style: italic;"&gt;exactly&lt;/span&gt; how far from Eliezer's way of thinking this falls.) This is merely a default assumption; if I studied the cognitive bias literature longer, attempted to come up with explanations for each bias, and failed, I might change my mind. But, for now, I am not comfortable with assuming large amounts of mental inefficiency... although I admit I &lt;span style="font-style: italic;"&gt;do&lt;/span&gt; have to postulate &lt;span style="font-style: italic;"&gt;some&lt;/span&gt;. On the other hand, I also have to postulate a fairly high amount of inefficiency to human-made AI, because it is a hard problem.&lt;br /&gt;&lt;br /&gt;So, again, this leads neither here nor there.&lt;br /&gt;&lt;br /&gt;But, the &lt;span style="font-style: italic;"&gt;real&lt;/span&gt; question is not whether &lt;span style="font-style: italic;"&gt;today&lt;/span&gt;'s computers are human level; the more critical question is quite a bit more complicated. 
Essentially:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Will there be a critical advance in software that occurs at a time when the existing hardware is enough to create an AI hazard, or will software advances come before hardware advances, such that humanity has sufficient time to get used to the implications and plan ahead?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Again, a difficult question to answer. Yet it is a really important question!&lt;br /&gt;&lt;br /&gt;A historical view will of course show a fairly good match-up between amount of processing power and results. &lt;a href="http://www.transhumanist.com/volume1/moravec.htm"&gt;This old paper&lt;/a&gt; begins with such an account geared towards computer vision. Yet, there are real advances in algorithms happening, and they will continue to happen. A small but striking example of sudden improvement in algorithms is provided by &lt;a href="http://www.cs.cmu.edu/%7Eavrim/graphplan.html"&gt;Graphplan&lt;/a&gt;, an algorithm which changed the AI planning landscape in 1997. Of course, today, the algorithms are even better. So, hardware clearly isn't everything.&lt;br /&gt;&lt;br /&gt;A proper estimate would involve a serious analysis of the pace of advance in computing-- how probable is it that Moore's law will keep its pace, speed up, slow down, et cetera-- and likewise an analysis of progress in AI algorithmics. But, equally important is the other side of the question; "&lt;span style="font-style: italic;"&gt;such that humanity has sufficient time to get used to the implications and plan ahead&lt;/span&gt;". How much hope is there of this, assuming that the software is available before the hardware?&lt;br /&gt;&lt;br /&gt;I've said that I think this is the ideal outcome-- that people have a while to first get used to near-human-level AI, then human-level, then superhuman level. Part of why I think this is that, in this scenario, there would probably be many superhuman AIs rather than just one. 
I think this would improve the situation greatly, but the reasons are more a topic for the section on whether an AI would in fact do bad things. In terms of AI power, the situation is not as persuasive. It seems perfectly possible that a society that became used to the presence of AIs would give them various sorts of power without thinking too hard, or perhaps even thinking that AIs in power were safer than humans in power.&lt;br /&gt;&lt;br /&gt;The thing is, they &lt;span style="font-style: italic;"&gt;might&lt;/span&gt; be right-- depending on how well-designed the AIs were. Which brings us to:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;AI Ethics&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;If an AI gained sufficient power, would it destroy humanity?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Of course, that depends on many variables. The real question is:&lt;br /&gt;&lt;br /&gt;Given various scenarios for an AI of sufficient power being created, what would that AI do?&lt;br /&gt;&lt;br /&gt;The major scenarios under consideration:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Many powerful AIs (and many not-so-powerful AIs) are developed as part of an ongoing, incremental process open to essentially all of humankind&lt;br /&gt;  &lt;/li&gt;&lt;li&gt;A single powerful AI is developed suddenly, by a single organization, as a result of a similarly open process&lt;br /&gt;  &lt;/li&gt;&lt;li&gt;A powerful AI is developed by a single organization, but as a result of a closed process designed to minimize the risk of AI methods falling into the wrong hands&lt;/li&gt;&lt;/ul&gt; Eliezer argues that the second scenario is not as safe as the third. Suppose the added effort to make a friendly powerful AI as opposed to just-any-powerful-AI is 1 year. 
Then, in an open process, an organization not very concerned with friendliness will be able to create an AI 1 year before an organization concerned with friendliness.&lt;br /&gt;&lt;br /&gt;This, of course, depends on the idea that it is easier to create an unfriendly AI than a friendly one. Eliezer has written at length on this. The key concept is that a mind can have any particular goal; that there is no system of ethics that will be universally accepted by any sufficiently intelligent mind, because we can quite literally program a mind that has the exact opposite set of values for any given set. (Just reverse the sign on the utility function.)&lt;br /&gt;&lt;br /&gt;The argument I gave to Eliezer for universal ethics is essentially the same as a view once argued by &lt;a href="http://transhumangoodness.blogspot.com/2007/10/road-to-universal-ethics-universal.html"&gt;Roko&lt;/a&gt;. (Roko has since been convinced that Eliezer is correct.) He calls it the theory of "instrumental values". The idea is that most rational agents will value truth, science, technology, creativity, and several other key items. It is &lt;span style="font-style: italic;"&gt;possible&lt;/span&gt; to contrive utility functions that will not value these things, but most will. Therefore, a future created by a bad AI will not be devoid of value &lt;a href="http://www.overcomingbias.com/2009/01/fragile-value.html"&gt;as Eliezer argues&lt;/a&gt;; rather, unless it has a really weird utility function, it will look a lot like the future that a good AI would create.&lt;br /&gt;&lt;br /&gt;This is a topic I want to go into a lot of detail on, and if the first part of the post hadn't ended up being so long, I probably would. 
Instead, I'll blog more about it later...&lt;br /&gt;&lt;br /&gt;For now, it is important to observe (as Eliezer did in our conversation) that regardless of far-off future similarities, a good AI and a bad AI have an immediate, critical difference: a bad AI will very probably consider humans a waste of resources, and do away with us.&lt;br /&gt;&lt;br /&gt;Similarly, a badly designed but well-intentioned AI will very probably result in a future devoid of purpose for humans. Maximizing happiness will justify forced drugging. Maximizing a more complicated value based on what people actually consider good may easily result in a locking-in of currently enjoyed hobbies, art forms, et cetera. Maximizing instrumental values might easily lead to the destruction of humans.&lt;br /&gt;&lt;br /&gt;The terrible, horrible dilemma (it seems to me) is that once a single AI gains power, be it good or bad, it seems that the single utility function that such an AI is programmed with becomes completely locked in. Any flaws in the utility function, no matter how small they may seem, will be locked in forever as well.&lt;br /&gt;&lt;br /&gt;There are various ways of getting around this, to some extent. One approach I've been tossing around in my head for a while is that an AI should be uncertain about its own goals, similar to human uncertainty about what is ethical. This is entirely admissible within a Bayesian formalism. What, then, would the AI take as evidence concerning ethics? I visualize it something like this: the AI would have a fair amount of (hand-programmed) knowledge about what sorts of things are &lt;span style="font-style: italic;"&gt;probably&lt;/span&gt; ethical, and it would search for simple rules that would meet these criteria. A better theory would be one that fit better to the patchwork of preconceptions about ethics. Preconceptions would include things like "what a human considers ethical, is more probably ethical..." 
"what a human, given large amounts of time to reflect, would consider ethical, is more probably ethical..." as well as simpler statements like "killing an unwilling victim is unethical with high probability", creating pain, and so on. A few different Three-Laws style systems could "fight it out", so to speak.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.overcomingbias.com/2009/01/free-to-optimize.html"&gt;Eliezer suggests&lt;/a&gt; a different sort of solution: an AI should behave in a highly lawful manner, setting definite rules and consequences, rather than merely doing whatever it takes to do "good" as defined by humanity. He's suggesting this as a solution to a somewhat different problem, but it applies about as well here. An AI that booted up, calculated a good set of laws for utopia, set up physical mechanisms to enforce those laws, and then shut off, will &lt;span style="font-style: italic;"&gt;not&lt;/span&gt; lock the future into a single utility function. It will of course give it a huge push in a particular direction, but that is quite different. It is purposefully leaving the future open, because an open future is a plus according to (at least) the majority of humans.&lt;br /&gt;&lt;br /&gt;The third option that I can think of is one I've already mentioned: have several powerful AIs rather than one. This still carries a large risk. 20 AIs that decide humans are useless are just as bad as 1 AI that decides humans are useless. However, 20 AIs with well-intentioned-but-wrong utility functions are probably much better than 1, so long as they all have &lt;span style="font-style: italic;"&gt;different&lt;/span&gt; well-intentioned utility functions.&lt;br /&gt;&lt;br /&gt;The AIs would probably have incentive to enforce a balance of power. 
If one AI becomes obviously more powerful than the others, the others have incentive to gang up on it, because that one pursuing its utility function to the utmost is probably far worse for the others than whatever the group consensus is. That consensus should be something favoring humans, since the individual goals are all random variations of that theme... if we look at all the goals, and ask what they have in common, favoring humanity should be the answer.&lt;br /&gt;&lt;br /&gt;Of course, that result isn't entirely certain. First, the average of many mistaken goals is not necessarily a good goal. Second, the average is not necessarily the sort of compromise that would result. Third, once a compromise has been agreed upon, the AIs might (rather than maintaining their standoff) all rewrite their utility functions to reflect the consensus, and effectively merge. (This would be to avoid any future defectors; the utility of stopping other possible defectors might be higher than the utility of keeping your ability to defect by keeping your utility function, thus making it rational to agree to a group rewrite.) In this case, the lock-in that I'm afraid of would happen anyway (although we'd be locked in to a probably-less-terrible utility function). Fourth, the situation might not result in a standoff in the first place. Even with several AIs to begin with, one could gain an upper hand.&lt;br /&gt;&lt;br /&gt;Friendly AI (as Eliezer calls it) is a hard question. But, it is not the question at hand. The question at hand is:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Which is better, an open research process or a closed one?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I've touched on a few of the factors, but I haven't come close to a definite answer. 
A proper answer requires an examination of the friendliness issue; an analysis of the curve of technology's growth (especially as it relates to computing power); an examination of what sort of theoretical advances could create a powerful AI, and in particular how suddenly they could occur; an idea of AI's future place in society (sub-human, human-level, and superhuman AI), which requires a socio-economic theory of what we will do with AI; and a theory of AI psychology, mapping the space of possible minds (focusing on which minds are friendly to humans).&lt;br /&gt;&lt;br /&gt;I'll try to address each of these issues more closely in the next few posts.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-1131454920500345146?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/1131454920500345146/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2009/03/ai-moral-issues-i-recently-had.html#comment-form' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/1131454920500345146'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/1131454920500345146'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2009/03/ai-moral-issues-i-recently-had.html' title=''/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' 
src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-6229877251163439844</id><published>2009-03-22T10:48:00.000-07:00</published><updated>2009-03-22T11:55:43.403-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='foundations of mathematics'/><category scheme='http://www.blogger.com/atom/ns#' term='logic'/><title type='text'></title><content type='html'>After having thought about it, I am becoming more convinced that the correct approach to the issues mentioned in these &lt;a href="http://dragonlogic-ai.blogspot.com/2009/03/more-on-last-times-subject-system-i.html"&gt;two&lt;/a&gt; &lt;a href="http://dragonlogic-ai.blogspot.com/2009/03/ok.html"&gt;posts&lt;/a&gt; is to accept the line of reasoning that deduces an enumerator's truth from the truth of all its consequences, thus (as mentioned) inviting paradox.&lt;br /&gt;&lt;br /&gt;Consider the original stated purpose of my investigation: to examine which statements are probabilistically justifiable. To probabilistically accept one of the enumerators simply because (so far) all of its implications appear to be true would be a violation of the semantics, unless the truth value of an enumerator&lt;span style="font-style: italic;"&gt; really was&lt;/span&gt; just the conjunction of the enumerated statements. So to satisfy my original purpose, such a thing is needed.&lt;br /&gt;&lt;br /&gt;As I said, paradox can be avoided by allowing enumerators to be neither true nor false. 
In particular, it is natural to suppose a construction similar to Kripke's least-fixed-point theory of truth: an enumerator is "ungrounded" if it implies some chain of enumerators which never bottoms out to actual statements; ungrounded enumerators are neither true nor false.&lt;br /&gt;&lt;br /&gt;The problem is that we will now want to refer to the ungrounded-ness of those sentences, since it appears to be a rather important property. For this we need to augment the language. This can be done in multiple ways, but it will ultimately lead to a new reference failure. And filling in &lt;span style="font-style: italic;"&gt;that&lt;/span&gt; hole will lead to yet another hole to fill. In general I would deal with this by using my &lt;a href="http://dragonlogic-ai.blogspot.com/2008/12/general-theory-well-i-have-been.html"&gt;theory that almost works&lt;/a&gt;. This entails building an infinite hierarchy of truth values which starts:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;True, False, Meaningless1, Meaningless2, Meaningless3, MeaninglessInfinity, MeaninglessInfinity+1...&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I am generally interested in investigating whether this hierarchy is equal in power to the usual Tarski hierarchy, namely,&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;True1, True2, True3, ... TrueInfinity, TrueInfinity+1, ...&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The difference basically springs out of the use of a loopy truth predicate (a truth predicate that can apply truly to sentences which contain the &lt;span style="font-style: italic;"&gt;same&lt;/span&gt; truth predicate). Asking for a single truth predicate appears to force the existence of infinitely many meaninglessness predicates. Is a loopy truth predicate automatically more powerful? 
Not obviously so: the loopy truth predicate will have a maximal amount of mathematical structure that it implies (the least fixed point), which does not even include uncountable entities. The Tarski hierarchy will continue into the uncountables, and further. (I should note that that statement is not universally accepted... but if someone suggests that the Tarski hierarchy should stop at some well-defined level, can't I just say, OK, let's make a truth predicate for that level? Why not keep going? And we don't actually need an uncountable number of truth predicates: we need an ordinal notation that can refer to uncountable ordinals, and we just make the truth predicate take an ordinal as an argument.)&lt;br /&gt;&lt;br /&gt;So the question is: is there an isomorphism between the mathematical structures captured by the Tarski hierarchy of truth, and my hierarchy of nonsense?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-6229877251163439844?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/6229877251163439844/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2009/03/after-having-thought-about-it-i-am.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/6229877251163439844'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/6229877251163439844'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2009/03/after-having-thought-about-it-i-am.html' title=''/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' 
width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-1721453408042520471</id><published>2009-03-19T10:35:00.000-07:00</published><updated>2009-03-22T10:48:10.024-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='proof theory'/><category scheme='http://www.blogger.com/atom/ns#' term='foundations of mathematics'/><category scheme='http://www.blogger.com/atom/ns#' term='paraconsistent'/><category scheme='http://www.blogger.com/atom/ns#' term='logic'/><title type='text'></title><content type='html'>&lt;span style="font-weight: bold;font-size:130%;" &gt;Maximally Structured Theory&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The proof-theoretic concept of a theory's "strength" has little to do with the sorts of considerations in the previous two posts. Proof theory normally worries about how much a system &lt;span style="font-style: italic;"&gt;proves&lt;/span&gt;, while I've been worried about how much a system can &lt;span style="font-style: italic;"&gt;say&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;The sort of proof-theoretic strength I've read about (I admit there are at least two, and I don't know much about the other one, known as &lt;a href="http://en.wikipedia.org/wiki/Proof-theoretic_ordinal"&gt;ordinal strength&lt;/a&gt;) is defined in terms of interpretations. One system can be interpreted in another if, roughly, we can "translate" the provable statements in one into provable statements in the other, and the translation preserves the basic logical operators (so and -&gt; and, or -&gt; or...). There is no requirement that the translation of an unprovable statement be likewise unprovable, which means that the logic we're translating into may add more provable statements. This is, of course, the point: of two systems A and B, B is stronger than A if B interprets A but A does not interpret B. 
This indicates that B proves more statements than A.&lt;br /&gt;&lt;br /&gt;Now suppose that system C is also stronger than A. It may be the case that system C is not comparable to system B: neither can interpret the other, so there is no way to say that one is more powerful than the other. We can think of this as a disagreement between the two systems; both prove more than A proves, but B might prove some statement where C instead proves the negation of that statement.&lt;br /&gt;&lt;br /&gt;Is one system right and the other system wrong? Well, if both systems are supposed to be talking about the same mathematical entity, then yes. (Or they could both be wrong!) But if not, we can instead think of the different systems as describing separate mathematical entities. It is then natural (in my opinion) to assert that the &lt;span style="font-style: italic;"&gt;proper&lt;/span&gt; foundation of mathematics should interpret &lt;span style="font-style: italic;"&gt;both&lt;/span&gt;. This would mean that it contained both structures within it.&lt;br /&gt;&lt;br /&gt;I am somewhat surprised that this is not the current approach. Instead, people seem to be concerned with whether the axiom of choice is actually true. I suppose this is my first deviation from the classical 'platonist' view :).&lt;br /&gt;&lt;br /&gt;So, suppose we restrict ourselves to finite first-order theories. How much structure do we get?&lt;br /&gt;&lt;br /&gt;First, just because we restrict ourselves to finite theories doesn't mean we can't interpret an infinite theory. 
Peano arithmetic, which I've been talking about for the last two posts, is a perfect example; it has an infinite number of axioms, but they are computably enumerable, so we can create an axiom system that enumerates them by allowing the use of extra function symbols.&lt;br /&gt;&lt;br /&gt;Second, we can interpret far more than first-order theories; higher-order theories can be interpreted as first-order theories with restricted quantifiers.&lt;br /&gt;&lt;br /&gt;What we &lt;span style="font-style: italic;"&gt;can't&lt;/span&gt; interpret is a &lt;span style="font-style: italic;"&gt;fundamentally&lt;/span&gt; infinite theory, such as... well, the truths of arithmetic. No finite theory will interpret the infinite theory containing all those truths. (Notice, now I'm being Platonist, and claiming that there actually is a fact of the matter.)&lt;br /&gt;&lt;br /&gt;What will the maximal theory look like, the theory that can interpret any others? Will it be finite?&lt;br /&gt;&lt;br /&gt;Let's first construct an infinite theory that interprets all finite theories. We can't interpret inconsistent theories without being inconsistent, so let's throw those out... then, we enumerate all the consistent theories (in order of size, let's say, using some other rule to break ties), but using a &lt;span style="font-style: italic;"&gt;new&lt;/span&gt; set of function symbols for each theory, so that they don't interfere with each other. (Each finite theory can only use a finite set of function symbols, so there is no trouble here.) Using different function symbols allows us to simply take the union, forming an infinite theory.&lt;br /&gt;&lt;br /&gt;This infinite theory cannot be converted into a finite form, unfortunately, because the enumeration is not computable: we can't simply distinguish between theories that are consistent and theories that are not.&lt;br /&gt;&lt;br /&gt;Now... 
if we take a logic which lacks the rule of explosion, ie, a paraconsistent logic, we wouldn't need to worry so much about avoiding inconsistent theories. The inconsistent theories would be isolated, and wouldn't "infect" the others. In this case we could computably enumerate all theories (still making sure to take different function names for each), and therefore we could make a &lt;span style="font-style: italic;"&gt;finite theory &lt;/span&gt;which performed the enumeration. (The finite theory would use only a finite number of function symbols, but it would provide an interpretation for an infinite number of them, via some encoding.)&lt;br /&gt;&lt;br /&gt;This provides an interesting argument in favor of paraconsistent logic. But, not necessarily a definitive one. It seems to me that one could claim that the paraconsistent logic is not really &lt;span style="font-style: italic;"&gt;interpreting&lt;/span&gt; the classical theories, since it is putting them into a logic with a nonclassical semantics. Perhaps one could define what the paraconsistent logic is really doing, and construct a classical theory that does that much. I don't know.&lt;br /&gt;&lt;br /&gt;[edit]&lt;br /&gt;postscript--&lt;br /&gt;&lt;br /&gt;Don't think that it ends here, with one infinite (unspecifiable) theory to rule over all finite theories. We can talk about other such infinite theories, such as the theory containing all the truths of first-order arithmetic, the theory containing all the truths of second-order arithmetic, et cetera, the truths of set theory... (Actually, I don't think just "the truths of set theory" is well specified. "The truths of set theory under the iterative conception of set" is one way of being more specific.) So, we'll want to talk about whether these infinite theories can interpret each other, and hence we'll have an increasing chain of "maximally structured" systems with respect to particular reference sets... 
the maximally structured theory with respect to finite theories is countably infinite; so is the set of truths of first-order arithmetic; the max. struct. theory w.r.t countably infinite theories will be uncountably infinite; and so it goes.&lt;br /&gt;&lt;br /&gt;Alternatively, the paraconsistent version of this hierarchy appears to stop right away, with a finite theory that interprets all finite theories. This might be taken as a strength or a weakness...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-1721453408042520471?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/1721453408042520471/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2009/03/maximally-structured-theory-proof.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/1721453408042520471'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/1721453408042520471'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2009/03/maximally-structured-theory-proof.html' title=''/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-2955150253066938204</id><published>2009-03-18T18:23:00.000-07:00</published><updated>2009-03-18T19:36:23.000-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='foundations of mathematics'/><category 
scheme='http://www.blogger.com/atom/ns#' term='logic'/><title type='text'></title><content type='html'>&lt;span style="font-weight: bold;font-size:130%;" &gt;More on Last Time's Subject&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The system I suggested last time, at the end, can actually be stated in a much simpler (more standard) way.&lt;br /&gt;&lt;br /&gt;Rather than adding in a totally new notation for representing enumerations of sentences in a Turing-complete manner, all we &lt;span style="font-style: italic;"&gt;really&lt;/span&gt; need is to allow the use of arbitrary function symbols when making assertions. This sounds trivial! However (as the considerations from last time show) it makes a big difference in the actual behavior of the system.&lt;br /&gt;&lt;br /&gt;Added function symbols are a standard part of the machinery of first-order logic, so in some ways it seems only natural to allow their use in first-order arithmetic. However, they are not normally considered a part of that language; functions in first-order arithmetic must be &lt;span style="font-style: italic;"&gt;constructed&lt;/span&gt; rather than &lt;span style="font-style: italic;"&gt;defined&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;Considering functions part of the language obviously proves no new theorems about arithmetic, because we don't add any axioms that include the function symbols. The only thing that changes is what we can say. So what &lt;span style="font-style: italic;"&gt;can&lt;/span&gt; we say?&lt;br /&gt;&lt;br /&gt;Well, it allows us to say a whole lot. With function symbols, we can go as far as defining a truth predicate for arithmetic! This means that we can build the metalanguage for arithmetic. We can also go further, building the metametalanguage (the language containing the truth predicate for the metalanguage), et cetera. 
However, don't be fooled; there is an important difference between &lt;span style="font-style: italic;"&gt;being able to build it&lt;/span&gt; and &lt;span style="font-style: italic;"&gt;having it as part of the system&lt;/span&gt;. Normally when we talk about a language containing the truth predicate we mean that we've already got the rules to make it work in the axioms, not that we can add them if we like.&lt;br /&gt;&lt;br /&gt;Adding function symbols to arithmetic follows the advice of the edit to the previous post, namely, we don't include any claim that the added statements are true just because all the sentences of arithmetic that they imply are true. This is essentially because the logical system is treating the function-symbols as &lt;span style="font-style: italic;"&gt;actual things&lt;/span&gt;, not just as notational conveniences. The logic thinks there is a truth of the matter about how "F" behaves, even before we add axioms to define its behavior. It has no idea what that truth &lt;span style="font-style: italic;"&gt;is&lt;/span&gt;, but it assumes that the truth exists. So even if a particular function-based statement implies only true statements about arithmetic, and so looks totally un-risky, the logic still will think that it implies a bunch of risky statements about the function symbols involved.&lt;br /&gt;&lt;br /&gt;It would be interesting to develop a semantics for the function-symbols that did not result in this conclusion... however, as mentioned in the previous post, in that direction there is a strong risk of paradox. 
Some sort of non-classical system of truth values will almost certainly be needed.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-2955150253066938204?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/2955150253066938204/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2009/03/more-on-last-times-subject-system-i.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/2955150253066938204'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/2955150253066938204'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2009/03/more-on-last-times-subject-system-i.html' title=''/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-8923074792857109237</id><published>2009-03-15T13:44:00.000-07:00</published><updated>2009-03-18T09:44:24.287-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='hyperlogic'/><category scheme='http://www.blogger.com/atom/ns#' term='foundations of mathematics'/><category scheme='http://www.blogger.com/atom/ns#' term='logic'/><title type='text'></title><content type='html'>OK.&lt;br /&gt;&lt;br /&gt;So.&lt;br /&gt;&lt;br /&gt;More logic stuff.&lt;br /&gt;&lt;br /&gt;I've spoken in the past about using probabilistic justifications (or, when simplicity is desired, 
nonmonotonic justifications) for mathematical belief. This provides at least some account of how a system can meaningfully refer to uncomputable concepts such as the natural numbers. (Calling the natural numbers "uncomputable" is sure to raise a few eyebrows. What I mean is that there are undecidable propositions concerning the natural numbers.) The natural question is: how much can be probabilistically justified?&lt;br /&gt;&lt;br /&gt;To classify one statement as computably decidable, or probabilistically decidable, or neither, is not quite accurate; any one statement or its negation could be simply added as an axiom, and thus made decidable (or more realistically, something else might be added as an axiom, rendering the statement decidable). Similarly, a statement might be made probabilistically decidable with the addition of an axiom that caused it to be equivalent to some probabilistically decidable statement. (Most trivially, the axiom "A &lt;-&gt; B" where "A" is a probabilistically justifiable statement and "B" was previously not. Such an axiom might seem contrived, but it is true if A and B are both true!)&lt;br /&gt;&lt;br /&gt;Still, it is convenient to talk in this way, and can be made meaningful. One way is by fixing the axioms we're dealing with; so, for example, we might be working just with Robinson arithmetic or Peano arithmetic. (I would prefer Robinson arithmetic, since I would like to leave the epistemic status of mathematical induction open, but Peano arithmetic assumes it to be true.) But this is not the only possibility. For now, I prefer something like the following:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;Basic functions of arithmetic are computable.&lt;/span&gt; These vary from axiomatization to axiomatization. 
At one extreme, we could provide axioms giving the behavior of all the common functions on natural numbers; at the other extreme, we could have no functions in our language, instead having relations sufficient to specify the structure. Most common is to take only the successor function as basic, and define addition, multiplication, et cetera from it. These differences &lt;span style="font-style: italic;"&gt;may&lt;/span&gt; cause trouble for me (ie, result in non-equivalent definitions), but I'll ignore that for now...&lt;/li&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;Basic predicates and relations are computably decidable.&lt;/span&gt; Again, there is some variation about which relations are taken as basic, and which are taken as defined. Usually equality is taken as basic, but I've heard of greater-than being taken instead.&lt;/li&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;Statements whose truth is a function of a finite number of computably decidable statements are computably decidable&lt;/span&gt;. This lets us form larger decidable statements with the normal boolean functions, &lt;span style="font-weight: bold;"&gt;and&lt;/span&gt; &lt;span style="font-weight: bold;"&gt;or&lt;/span&gt; and &lt;span style="font-weight: bold;"&gt;not&lt;/span&gt;. It also allows bounded quantifiers to be used, such as "For all numbers less than 100..."&lt;/li&gt;&lt;li style="font-style: italic;"&gt;Statements whose truth is a function of an infinite number of computably decidable statements are probabilistically decidable if any reasonable probabilistic decision process is guaranteed to converge.&lt;/li&gt;&lt;/ul&gt;A "reasonable probabilistic decision process" is intended to be something like the procedure described in &lt;a href="http://dragonlogic-ai.blogspot.com/2008/06/basic-hyperlogic-my-search-for-proper.html"&gt;this post&lt;/a&gt;. 
Obviously I need to make that a bit more well-defined, but it should be good enough for now.&lt;br /&gt;&lt;br /&gt;It might also be convenient to define "computably justifiable", meaning that &lt;span style="font-style: italic;"&gt;if&lt;/span&gt; a statement is true, it is computably decidable. (For example, "there exists a place in the decimal expansion of pi at which the number 9 is repeated 24 times" is verifiable if it is true (we can simply calculate the digits of pi until we find the spot), but if it is false, it is not.) "Computably falsifiable" similarly means that a statement can be falsified finitely if it is indeed false. "Probabilistically justifiable" would mean a statement was probabilistically decidable if it was true, and "probabilistically falsifiable" would mean that it was probabilistically decidable if false.&lt;br /&gt;&lt;br /&gt;The basics are all set up, so let's get to the main show.&lt;br /&gt;&lt;br /&gt;The question is: what statements are probabilistically justifiable?&lt;br /&gt;&lt;br /&gt;I think I have mentioned before that the delta-2 statements on the arithmetic hierarchy are definitely probabilistically decidable, since they are limit-computable. For any statements not in delta-2, there is no guarantee that they will converge to the correct value (they might not converge, or worse, they might converge to the wrong value). Still, it is an interesting question whether *some* of these converge, and whether applying probabilistic methods is still better than guessing randomly (if there are more that converge correctly than converge incorrectly). But that is not the question for today...&lt;br /&gt;&lt;br /&gt;Previously, I have restricted this question to statements in arithmetic. This seems sensible. However, lately I've been looking at the concept of proof-theoretic strength. Roughly, a logic is considered stronger if it proves the same truths plus more. 
This brought my attention to the idea of probabilistically justifying statements in higher logics based on their relevance to arithmetic. Could set theory, for example, be justified by its usefulness in describing true statements of arithmetic?&lt;br /&gt;&lt;br /&gt;This led to a question: &lt;span style="font-style: italic;"&gt;are&lt;/span&gt; there any statements that can be made, which purely assert things about arithmetic, but cannot be formulated in the language of first-order arithmetic? Obviously arithmetic can assert any &lt;span style="font-style: italic;"&gt;single&lt;/span&gt; thing about arithmetic, so the question is whether all &lt;span style="font-style: italic;"&gt;combinations&lt;/span&gt; of arithmetic statements that can be asserted by some logic can be asserted in arithmetic. ("Combinations of statements" refers to the same thing that "truth functions of statements" referred to in the earlier definition. Now, obviously, arithmetic can specify any finite truth-functional statement; so the challenge comes when we examine infinite truth-functional statements, such as those that are probabilistically justifiable.)&lt;br /&gt;&lt;br /&gt;More formally:&lt;br /&gt;&lt;br /&gt;Definition: &lt;span style="font-style: italic;"&gt;Logic B is "&lt;span style="font-weight: bold;"&gt;effectively complete&lt;/span&gt;" with respect to logic A if and only if for every computably enumerable set S of statements in logic A, there exists a statement in logic B which implies exactly S closed under the deductive consequence relation of B.&lt;/span&gt; (We need to close the set, because we can't penalize logic B for not being able to imply x&amp;amp;y without automatically implying y, for example.)&lt;br /&gt;&lt;br /&gt;Conjecture: &lt;span style="font-style: italic;"&gt;Arithmetic is effectively complete with respect to itself.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Whether this is true or false, there are some interesting consequences. 
I'll note some of them in both directions, but keep in mind that (since only one direction is true) half of the consequences I mention will be false.&lt;br /&gt;&lt;br /&gt;If the conjecture is false, then even if we are suspicious of mathematical entities other than numbers, it seems that we'll have good reason to want a more mathematically expressive language than first-order arithmetic. The idea of "effective completeness" as I defined it is fairly weak; it does not seem at all implausible to say that if we care about a domain, we should want a logic which is effectively complete with respect to it. So, if arithmetic is &lt;span style="font-style: italic;"&gt;not&lt;/span&gt;, it seems like another is needed. &lt;span style="font-style: italic;"&gt;Furthermore&lt;/span&gt;, these new statements may have the hope of probabilistic justification.&lt;br /&gt;&lt;br /&gt;This does not necessarily justify set theory. It most certainly doesn't justify a Platonist attitude towards sets (ie, believing in sets as actual entities rather than mere symbol-manipulation). Still, it starts the climb towards abstraction, and that's a slippery slope. (ha ha, mixed metaphor.)&lt;br /&gt;&lt;br /&gt;If the statement is &lt;span style="font-style: italic;"&gt;true&lt;/span&gt;, on the other hand, that means first-order arithmetic &lt;span style="font-style: italic;"&gt;can&lt;/span&gt; state anything that can be effectively stated. (That qualification, "effective", means: arithmetic can state any combination of statements that some computer program could generate in a (possibly nonterminating) sequence.) This would mean that any higher logic, in its capacity to refer to combinations of arithmetic statements, is unnecessary; in some sense it is already hiding within arithmetic, so it doesn't need to be invented separately. 
Any higher-logic statement that could be justified probabilistically via its relationship to arithmetic could actually be interpreted &lt;span style="font-style: italic;"&gt;as&lt;/span&gt; some statement in arithmetic: once we found the sentence's interpretation, we could not distinguish between the two in terms of their meaning.&lt;br /&gt;&lt;br /&gt;So which is true?&lt;br /&gt;&lt;br /&gt;I spent quite a while pondering it, and went down a few wrong paths trying to find the answer, but I won't waste time describing those. Eventually, I realized that I had an example right in front of me of a sentence that is &lt;span style="font-style: italic;"&gt;not&lt;/span&gt; ever stated in the language of first-order arithmetic, yet asserts an effectively enumerable set of such statements: mathematical induction!&lt;br /&gt;&lt;br /&gt;Mathematical induction is always axiomatized in Peano arithmetic with the aid of an &lt;a href="http://en.wikipedia.org/wiki/Axiom_schema"&gt;axiom schema&lt;/a&gt;. An axiom schema is an axiom with a hole in it, into which is supposed to be placed every possible statement that can be made in the system. Here is the induction schema:&lt;br /&gt;&lt;br /&gt;"If X[0] and X[n]-&gt;X[n+1] for all n, then X[n] for all n."&lt;br /&gt;&lt;br /&gt;In other words, if we can prove that X holds for zero, and that whenever X holds for a number it holds for the next number, then we have proved that X holds for all numbers.&lt;br /&gt;&lt;br /&gt;"X" is the hole into which any particular statement that can be made about a number is to be dropped. This technique, making a statement about all statements, isn't actually available to us in first-order arithmetic. 
Officially, the axiom schema is a stand-in for an infinite number of axioms.&lt;br /&gt;&lt;br /&gt;So, you see, if everyone uses an axiom schema, I realized that it's probably because we &lt;span style="font-style: italic;"&gt;can't summarize&lt;/span&gt; that infinite statement in the language of arithmetic alone. Furthermore, I realized that someone had probably proven that fact by now. So I looked it up, and it's true. My conjecture was false.&lt;br /&gt;&lt;br /&gt;This means that higher logics are necessary, in a fairly strong sense. Which logics? Well, adding the truth predicate for arithmetic (i.e., jumping to the metalanguage for arithmetic) gets us effective completeness for first-order arithmetic statements. If we really were only concerned with arithmetic, we would stop there. But this new language will (I suspect) still not be effectively complete with respect to itself. So we could add a &lt;span style="font-style: italic;"&gt;third&lt;/span&gt; layer to the cake, using a truth predicate for the &lt;span style="font-style: italic;"&gt;second&lt;/span&gt; logic to form a meta-meta-language that would be effectively complete for &lt;span style="font-style: italic;"&gt;it&lt;/span&gt;... probably, we can continue in this manner for the entire Tarski hierarchy of truth predicates. But instead, we could tackle head-on the problem of creating a language that is effectively complete with respect to itself.&lt;br /&gt;&lt;br /&gt;Such a thing might be impossible, I admit... it might lead to paradox in the same way that a language containing its own truth predicate can.&lt;br /&gt;&lt;br /&gt;My idea is that, in addition to some stock of standard machinery (conjunction, negation, a domain such as arithmetic to talk about...), we add a Turing-complete representation for constructing statement-enumerators. A statement-enumerator would be treated as the combined assertion of all statements it generated. 
These generators would be capable of generating other generator-sentences (otherwise it would just be another metalanguage).&lt;br /&gt;&lt;br /&gt;A generator would be true if all its generated sentences were.&lt;br /&gt;&lt;br /&gt;Sure enough, we can construct a generator that generates only its own negation. Assuming classical truth values, paradox ensues. So, we need to allow some sentences to be neither true nor false...&lt;br /&gt;&lt;br /&gt;Same old game.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Edit-&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Actually, this scheme might fail only because I'm overstepping what is needed for effective completeness. Specifically, the sentence "&lt;span style="font-style: italic;"&gt;A generator would be true if all its generated sentences were&lt;/span&gt;" seems convenient, but is not necessary.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-8923074792857109237?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/8923074792857109237/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2009/03/ok.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/8923074792857109237'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/8923074792857109237'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2009/03/ok.html' title=''/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' 
src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-1347979676186528466</id><published>2009-02-09T14:03:00.000-08:00</published><updated>2009-05-19T14:19:52.859-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='formal grammar'/><category scheme='http://www.blogger.com/atom/ns#' term='lambda calculus'/><category scheme='http://www.blogger.com/atom/ns#' term='prediction'/><category scheme='http://www.blogger.com/atom/ns#' term='combinator'/><category scheme='http://www.blogger.com/atom/ns#' term='compression'/><title type='text'></title><content type='html'>&lt;span style="font-size:180%;"&gt;Compression&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;As promised in the previous post, I'll describe what I'm currently working on.&lt;br /&gt;&lt;br /&gt;Really, this is a continuation of &lt;a href="http://dragonlogic-ai.blogspot.com/2008/06/something-cool-i-ran-into-some.html"&gt;this post&lt;/a&gt;. The ideas I outlined there have the same basic shape, but have evolved a good deal.&lt;br /&gt;&lt;br /&gt;The basic idea: Any Turing-complete representation can be "run backwards" to generate the search space of all (and only) the programs that generate the desired piece of data. This approach promises to be much more effective than a genetic programming solution (for the special case to which it can be applied). 
A possible immediate application of this is data compression, which is formally equivalent to looking for short programs that output a desired piece of data (or, it is if we include the decompression program when measuring the size of the file, as is often required in &lt;a href="http://mailcom.com/challenge/"&gt;compression&lt;/a&gt; &lt;a href="http://prize.hutter1.net/"&gt;contests&lt;/a&gt; to prevent cheating).&lt;br /&gt;&lt;br /&gt;Realizing that compression was the application area I should be focusing on helped simplify my design a great deal. There may be a theoretical connection between compression and prediction, but converting between the two is not computationally simple. Converting compression to prediction can be done in two ways (so far as I can see):&lt;br /&gt;&lt;br /&gt;-generate futures and then compress them to see how probable they are (shorter = more probable)&lt;br /&gt;-generate (short) compressed representations of the future and decompress them to see what they actually mean&lt;br /&gt;&lt;br /&gt;The first method can use the "running backwards" trick, trimming the search space. The second cannot, so it may generate meaningless futures (compressed representations that do not actually decompress to anything). However, the second has the advantage of starting with short (probable) representations; the first will go through many improbable possibilities. I can think of many little tricks one could try to improve and/or combine the two methods, but my point is, it is a hard problem. I was attempting to tackle this problem for most of last semester. Eventually, I may try again. But for now, compression is a good testbed for the basic algorithm I'm working on.&lt;br /&gt;&lt;br /&gt;What about the other direction: can we convert from prediction to compression in an effective manner? It turns out that the answer is yes. 
There is an awesome algorithm called &lt;a href="http://en.wikipedia.org/wiki/Arithmetic_coding"&gt;arithmetic coding&lt;/a&gt; that does just that. This is the basis of the &lt;a href="http://en.wikipedia.org/wiki/PAQ"&gt;PAQ&lt;/a&gt; compression algorithm, the current leader in the Hutter Prize and Calgary Challenge (both linked to earlier).&lt;br /&gt;&lt;br /&gt;However, this strategy inherently takes the same amount of time for decompression as it takes for compression. PAQ and similar methods need to form predictions as they decompress in the same way that they form predictions as they compress. This is what takes the majority of the time. More "direct" methods, like the one I am working on, take far less time for decompression: compression is a process of searching for the shortest representation of a file, while decompression simply involves interpreting that short representation.&lt;br /&gt;&lt;br /&gt;The Hutter Prize places a time limit on decompression, but not on compression; so in that way, my method has the upper hand. In theory I could let my algorithm search for months, and the resulting file might decompress in minutes. (A rough estimate would be that the decompress time is the logarithm of the compress time.)&lt;br /&gt;&lt;br /&gt;By the way, these compression contests are far from practical in nature. Real applications of compression prefer compression methods that work fairly fast for both compression and decompression. The Hutter Prize is designed to increase interest in the theoretical connection between prediction and compression, as it relates to artificial intelligence. Thus, it has a very loose time restriction, allowing for "smart" compression methods such as PAQ that take a long hard look at the data.&lt;br /&gt;&lt;br /&gt;But, enough about contests. Time for some more details concerning the algorithm itself.&lt;br /&gt;&lt;br /&gt;The major design decision is the choice of Turing-complete representation. 
The algorithm actually depends critically on this. Representations vary greatly in their properties; specifically, how well a heuristic can guide the search. The major heuristic to consider is simply size: if a move decreases size immediately, it is somewhat more probable that it decreases size in the long run, too.&lt;br /&gt;&lt;br /&gt;The "classic" choice would be to just go with Turing machines. Heuristics on pure Turing machines, however, happen to be &lt;span style="font-style: italic;"&gt;terrible&lt;/span&gt;. There are a very small number of options for backwards-moves, and usually none of them immediately change the size of the current representation. We are running blind!&lt;br /&gt;&lt;br /&gt;For most of last semester, I was working on a version that would use lambda calculus as the representation. Lambda calculus is much better; in particular, it allows us to immediately take advantage of any repetitions that we recognize. Noticing repetitions is the most direct way to apply the shortness heuristic; it is the method used by &lt;a href="http://sequitur.info/"&gt;sequitur&lt;/a&gt;, one of the algorithms inspiring my current direction.&lt;br /&gt;&lt;br /&gt;Lambda calculus can compress repetitions using "abstraction". However, an abstraction needs to actually be put somewhere when it is made. There is no convenient heuristic for where to put it (at least, not that I've thought of). For example, if I see a string "abcabcabc", I could either abstract at the global level, which would allow me to compress any other occurrence of "abc" into the same abstraction, or at the local level, which might allow me to compress other local groupings of three repetitions. Also, I could place the abstraction anywhere in between, if I thought some of the context was important. "Anywhere in between" is a rather large search space. 
This and other issues led me away from lambda calculus as the representation of choice.&lt;br /&gt;&lt;a href="http://en.wikipedia.org/wiki/Formal_grammar"&gt;&lt;br /&gt;Generative grammars&lt;/a&gt; are another option. These appeal to me, partly because they are the formalism chosen by sequitur. Like lambda calculus, they offer a way of abbreviating repetitions. With grammars, the choice is what to rewrite them to, rather than where to place their abstraction. Sequitur can be seen as the special case of always rewriting repetitions to a single symbol, which represents the sequence that had repeated. More sophisticated types of patterns can be represented by allowing more complicated rewrites. Heuristically, simpler rewrites can be preferred.&lt;br /&gt;&lt;br /&gt;Unfortunately, there is a serious problem with this representation. Unlike lambda expressions, grammars are not necessarily &lt;span style="font-style: italic;"&gt;confluent&lt;/span&gt;, which means that when we use the grammar to generate data we may get a different file than intended. It is possible to specify a convention telling us how to apply the grammar to get a file (for example, "always expand the first applicable rule going left-right, deciding ties by using the shorter of the rules"). However, a restriction on forward-execution must be mirrored by a corresponding restriction on backwards-execution, which means the search during the compression phase is more constrained. In many cases it is good news when we can constrain a search more; but in this case, it hurts the ability of the heuristic to guide us. 
(It seems there is a fairly direct connection here: restricting the branching factor leads to a worse heuristic unless we're very careful.)&lt;br /&gt;&lt;br /&gt;I also spent some time creating a set of restrictions that ensure confluence, but although these systems salvaged much of the situation, they were not as good as the option I finally settled on: combinators.&lt;br /&gt;&lt;br /&gt;Combinators differ from lambda expressions in that functions are given names and have external definitions, whereas in lambda expressions the function definitions are in-place via abstraction. Minimally, one can use just two combinators, S and K, for a Turing-complete formalism. (If you want to know the definitions of S and K, &lt;a href="http://en.wikipedia.org/wiki/SKI_combinator_calculus"&gt;look them up&lt;/a&gt;.) However, for my purposes, it makes sense to invent new combinators as needed.&lt;br /&gt;&lt;br /&gt;The idea behind a heuristic for combinators is to not only look for exact repetition, but also repetition of pieces with holes. Not only can we notice multiple instances of "abc", but also "a_c". Repetitions with holes suggest combinators that take hole-fillers as arguments.&lt;br /&gt;&lt;br /&gt;Turing-completeness is assured by the fact that the system could invent S and K, although it probably won't. (For one thing, S and K are not the only set of operators that allow expressive completeness. For another, they are too "abstract" to be suggested directly by data... the heuristic will not give them a high score. Although, they are short, so their score should not be &lt;span style="font-style: italic;"&gt;too&lt;/span&gt; low...)&lt;br /&gt;&lt;br /&gt;That, then, is the basic idea. 
There are many directions for expansion, which might help the algorithm (and might not).&lt;br /&gt;&lt;br /&gt;-More complicated functions&lt;br /&gt;--functions that take sequences as arguments, not just single terms (this is technically possible without expanding the formalism, via a trivial combinator A (x,y) -&gt; xy; however, some work could be done to get the heuristic to look for this sort of thing, essentially by keeping an eye out for repeated patterns with variable-sized holes rather than single-slot holes)&lt;br /&gt;--probabilistic functions (this may or may not provide a compression advantage, since we've got to keep track of the extra info to make things deterministic anyway, but even if it creates no advantage it might make things more human-readable)&lt;br /&gt;--functions with more complicated definitions, such as if-then statements&lt;br /&gt;&lt;br /&gt;-more sophisticated heuristic &amp;amp; more sophisticated search&lt;br /&gt;--best-first vs semi-random&lt;br /&gt;--learning&lt;br /&gt;---memoize "reduced forms" of commonly-encountered structures?&lt;br /&gt;---learn which heuristic info to tabulate?&lt;br /&gt;---learn sophisticated search strategies?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-1347979676186528466?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/1347979676186528466/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2009/02/compression-as-promised-in-previous.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/1347979676186528466'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/1347979676186528466'/><link rel='alternate' 
type='text/html' href='http://dragonlogic-ai.blogspot.com/2009/02/compression-as-promised-in-previous.html' title=''/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-9163938823980050162</id><published>2008-12-21T21:25:00.000-08:00</published><updated>2008-12-22T23:35:11.388-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='predicate models'/><category scheme='http://www.blogger.com/atom/ns#' term='probability'/><category scheme='http://www.blogger.com/atom/ns#' term='relational models'/><category scheme='http://www.blogger.com/atom/ns#' term='propositional models'/><category scheme='http://www.blogger.com/atom/ns#' term='probabilistic relational models'/><title type='text'></title><content type='html'>&lt;span style="font-size:180%;"&gt;Back to AI&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;OK, here we go.&lt;br /&gt;&lt;br /&gt;AI is full of many fairly well-developed methods of "propositional" learning: probabilistic, fuzzy, and otherwise fancified versions of stuff we can do with boolean algebra but no quantifiers. In other words, current AI is fairly good at manipulating systems that have a fixed number of variables, but not so good at arbitrary logical structures.&lt;br /&gt;&lt;br /&gt;A simple illustration of the two categories:&lt;br /&gt;&lt;br /&gt;Propositional Version:&lt;br /&gt;I give the AI data on 3,000 patients, which includes information about which patients had each of a list of 20 common symptoms, and which of 10 diseases those patients were ultimately determined to be suffering from. 
The AI learns to predict disease from symptoms.&lt;br /&gt;&lt;br /&gt;Predicate Version:&lt;br /&gt;I give the AI data on 3,000 patients, with logical descriptions of symptoms and logical descriptions of diseases. Again, the AI is to learn to predict diseases from symptoms.&lt;br /&gt;&lt;br /&gt;In the first case, the AI is learning a mapping from 20 variables (the twenty symptoms) to 10 variables (the diseases). In the second case, the AI is learning a mapping from one infinite space (possible logical descriptions of symptoms) to another (possible logical descriptions of diseases).&lt;br /&gt;&lt;br /&gt;There are some borderline cases. For example, if we are learning to map arbitrary real numbers to arbitrary real numbers, the space is infinite, and we *could* treat the real numbers as logical entities rather than merely continuous variables, but this is almost never done; so, it is propositional. (Treating the real numbers as logical entities would mean worrying about interesting exact numbers like 1, 2, 3, pi, e, 1/2, and so on. For propositional methods, 1 is not any more special than 1.002.) It gets fuzzier... suppose we are learning about integers rather than real numbers. We could adapt the same propositional strategies we used in the real-number case, restricting them to integers. It might work well. But with integers there is a greater chance that the exact number matters, and a greater chance that the number should be treated as a logical entity. Perhaps a pattern relies on one of the integers being even. A propositional method will not see this, and will need to learn from the data which integers the pattern applies to and which it doesn't. So, it will only be able to extend the pattern to cases it has seen before. If this sort of pattern is likely, more sophisticated methods become critical. Yet, we're still working with a fixed number of variables. 
The more sophisticated methods become &lt;span style="font-style: italic;"&gt;really&lt;/span&gt; critical when a given problem instance can contain a totally new variable, yet one that patterns from old variables might apply to. And then they become &lt;span style="font-style: italic;"&gt;really really&lt;/span&gt; critical when problem instances can't even be totally separated, because they all fit into one big structure of logical relationships...&lt;br /&gt;&lt;br /&gt;The transition from the first type of AI to the second is now taking place, largely under the banner of "probabilistic relational models". Hooray!&lt;br /&gt;&lt;br /&gt;(I've labeled the second type "predicate" to distinguish it from "propositional". This way of naming the two types is not uncommon, but many people use "relational" to describe the second class, instead. Another convenient term is "logical".)&lt;br /&gt;&lt;br /&gt;Anyway, I'm going to outline one way of carrying over the progress that's been made with propositional models to the higher level... this is an idea I've been sort of thinking about on the back burner, partly as a way of making an inherently parallel learning method. By no means do I think this is the only way of going forward, and in fact I'll probably be working in a different direction myself for at least a little while... maybe I'll write about that later.&lt;br /&gt;&lt;br /&gt;So, how can we use propositional algorithms in the wider domain (whatever we choose to call it)?&lt;br /&gt;&lt;br /&gt;I described propositional models as models having no quantifiers. Really, though, a universal quantifier over all the variables is taken for granted in the propositional setting. If a propositional method learns that a particular variable is "2 or 3", it doesn't mean that for some one case it is either two or three; it means for &lt;span style="font-style: italic;"&gt;any&lt;/span&gt; case it is either two or three.  
This single quantifier gives us some ability to do relational-style learning.&lt;br /&gt;&lt;br /&gt;Suppose we want to learn a probabilistic model for pictures. We could treat every pixel as a variable, and learn correlations between them from the dataset. This approach would require a very large dataset before any interesting patterns could be found. It would need to re-learn every object in every possible location, because it would have no way of generalizing from one location to another. A smarter way to apply propositional methods to the problem would be to treat a small square, say 10 by 10, as a single "instance"; we learn correlations between variables in such squares. With this approach, we might be able to get something interesting from even one image. There is a cost, of course: where before it was impossible to generalize from one location in the picture to another, now it is impossible to &lt;span style="font-style: italic;"&gt;not&lt;/span&gt; do so. For example, the method would not notice that an egg appears in the top right corner of each picture; it will simply notice that an egg is a common object. This can be helped, by adding the location information as an extra variable.&lt;br /&gt;&lt;br /&gt;More generally, for any structured data, this approach can be used to learn local structure. For linear data such as text, we can use a fixed number of letters rather than a fixed number of pixels. For tree-structured data (such as computer program code, HTML...), we can use a branching segment. But, we can go even further: there is a totally general mapping that will handle &lt;span style="font-style: italic;"&gt;any&lt;/span&gt; logical structure. Any data can be represented in the form of predicate statements: a series of entities, each of which can have properties and relationships with other entities. 
Just as we can select a random 10 by 10 area of a picture and ask what the colors are, we can select 10 entities at random and ask what their properties and relationships are. Let's call this the "universal mapping".&lt;br /&gt;&lt;br /&gt;The universal mapping allows us to learn logical structure, but it does have some serious limitations. Suppose once again that we're looking at linearly structured data such as text, but in predicate calculus form, and with the universal mapping. We have a bunch of entities ordered by a next/previous relation, and with a "letter" property that distinguishes each entity as 'a', 'b', 'c', ... 'A', 'B', 'C', ... 'space', 'comma', 'period', et cetera. Now suppose that we sample entities at random. If we've got much text, it will be very unlikely that we'll pick letters that occur next to each other. We're learning a correct distribution, technically, but we're not looking at the "interesting" part of it (the line). The algorithm could eventually learn what it was supposed to, but it would take too much time, and besides that it would not be in a directly usable form (since the distribution we want is embedded in a larger, useless distribution). If we used a special-purpose linear mapping that did not sample so randomly, we'd be much better off.&lt;br /&gt;&lt;br /&gt;OK. Anyway, that is the "state of the art" so to speak. These are all basically a type of Markov model. So, what is my new suggestion?&lt;br /&gt;&lt;br /&gt;Well, first, let's look at one more relevant detail of the current propositional algorithms. Many of them are hierarchical in form. This means that rather than finding complex patterns directly in the set of variables, the algorithm creates additional variables, and finds simpler relationships. Simpler relationships that include additional entities amount to the more complex relationships that we could have looked for in the first place; but the process is easier to manage by abstracting it in this way. 
This abstraction is iterated: second and third levels of hidden variables can be created to manage increasingly complex relationships, which treat the lower levels exactly as the first level treats the visible variables.&lt;br /&gt;&lt;br /&gt;Two recent successful examples are &lt;a href="http://www.scholarpedia.org/article/Deep_belief_networks"&gt;Hinton's deep belief nets&lt;/a&gt; and &lt;a href="http://www.numenta.com/"&gt;Numenta's Hierarchical Temporal Memory&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;So in the propositional domain, we can talk about hidden variables. In the predicate domain, we can also talk about hidden variables, but we can add hidden relations and even hidden entities to the list. For propositional methods, hidden variables are just an efficiency issue. In the predicate domain, it is very different: hidden stuff dramatically increases the representational power (and therefore the model search space).&lt;br /&gt;&lt;br /&gt;Starting simple with hidden variables: if each entity is allowed to possess hidden variables, then the modeling power has changed from &lt;a href="http://en.wikipedia.org/wiki/Markov_chain"&gt;Markov-model&lt;/a&gt; to &lt;a href="http://en.wikipedia.org/wiki/Hidden_Markov_model"&gt;hidden-Markov-model&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Adding hidden entities and relations is enough to boost the representational power up to &lt;a href="http://en.wikipedia.org/wiki/Turing_complete"&gt;Turing-completeness&lt;/a&gt;: any computable pattern can be represented.&lt;br /&gt;&lt;br /&gt;So, OK, enough background. Now, the question is, how can we best use the power of existing propositional methods to search a Turing-complete model space? Here is the idea I've been pondering recently...&lt;br /&gt;&lt;br /&gt;For the moment, let's say we're working with linearly structured data, for simplicity. One Turing-complete formalism is the &lt;a href="http://en.wikipedia.org/wiki/Cellular_automata"&gt;cellular automaton&lt;/a&gt;. 
These bear some structural similarity to a hierarchical network of variables. The major difference is that all cells are the same, meaning that all variables are interacting with their neighbors in exactly the same manner. My basic idea is to semi-smoothly integrate deep belief network learning with cellular automaton learning. For the linearly structured data, then, the system would be searching for two-dimensional automata. Intuitively, learning a cellular automaton of hidden variables is similar to learning the physical laws that hold sway equally at any point in space. We can see directly how those physical laws work for the visible space, and we generalize, assuming that there are nearby hidden spaces that partially explain the visible space, and farther-off hidden spaces, and so on, but that all follow the same physical laws.&lt;br /&gt;&lt;br /&gt;Well, that was nearly 20 paragraphs of introduction for 1 paragraph of an idea... but, that introductory material was interesting in itself, and it will be useful to refer to in future posts. 
So, more later...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-9163938823980050162?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/9163938823980050162/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/12/back-to-ai-ok-here-we-go.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/9163938823980050162'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/9163938823980050162'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/12/back-to-ai-ok-here-we-go.html' title=''/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-8975688144626158278</id><published>2008-12-05T19:51:00.000-08:00</published><updated>2008-12-05T21:55:27.306-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='metamathematics'/><category scheme='http://www.blogger.com/atom/ns#' term='infinity'/><category scheme='http://www.blogger.com/atom/ns#' term='foundations of mathematics'/><category scheme='http://www.blogger.com/atom/ns#' term='logic'/><title type='text'></title><content type='html'>&lt;span style="font-size:180%;"&gt;A General Theory&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Well, I have been climbing the ladder of infinities for a while now on this blog. 
Soon I must get back to how this all applies to AI... but not quite yet.&lt;br /&gt;&lt;br /&gt;Here is a theory that almost works.&lt;br /&gt;&lt;br /&gt;Start with first-order logic. This can be your favorite non-classical variation if you wish; intuitionistic, paraconsistent, relevant, whatever you want. There is only one requirement: it needs to be strong enough to be Turing-complete (more specifically, logical consequence should be completely enumerable but not co-enumerable, thanks to the good old halting problem). Call this the &lt;span style="font-weight: bold;"&gt;base language&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;Add to this your favorite theory of truth. For maximum effectiveness, it should include the infinite hierarchy of truth that I suggested in &lt;a href="http://dragonlogic-ai.blogspot.com/2008/10/progress-after-pondering-previous-post.html"&gt;this post&lt;/a&gt;. This is not a difficult requirement: either a revision theory of truth or a fixed-point theory will do, as well as many less-well-known theories, I'm sure... and anyway, the requirement is not totally necessary, as I'll attempt to make clear. Anyway, call this language the &lt;span style="font-weight: bold;"&gt;language of truth&lt;/span&gt;. The theory of truth that you choose will assign actual truth-values to the statements in this language.&lt;br /&gt;&lt;br /&gt;Now we have what we consider a solid theory of truth. But, as I pointed out in the message I copied in the &lt;a href="http://dragonlogic-ai.blogspot.com/2008/11/interesting-mailing-list-i-recently.html"&gt;previous post&lt;/a&gt;, all such theories appear to have referential gaps: some statements will have a status not nameable within the theory, which we can name and reason about as people standing outside of the theory. The type of referential gap will depend on the type of theory of truth that was used. In the most common case, the gap will be sentences that are not assigned either "true" or "false". 
The theory of truth will be able to state that a sentence is true, or false, but not that it is neither. Defenders of such theories of truth will attempt to claim that we really can't say that; for example, one line of argument says that such sentences have undefined truth values, but that we could later add logical conventions that ascribe true or false to them. However, such arguments are self-defeating: the argument needs to refer to the class of sentences that are in this intermediate state, so it generally must invent a label for them (such as "undefined"). This name is precisely the reference gap of the theory, and cannot be stated inside the theory.&lt;br /&gt;&lt;br /&gt;So, the next step is to add to the language whatever labels we need in order to fill the gap(s) that exist in the theory of truth. I call this stage a &lt;span style="font-style: italic;"&gt;theory of meaning&lt;/span&gt;, because I think it is important to point out that the theory of truth is not incomplete just because of the gaps; it may be a complete and valid theory of truth, but it is not a complete theory of logical/mathematical reference. Call the new gap-filling language the &lt;span style="font-weight: bold;"&gt;first language of meaning&lt;/span&gt;. I will generally pretend that there is only one gap, as in the simple case. Call this gap &lt;span style="font-weight: bold;"&gt;1-meaningless&lt;/span&gt;. (The idea doesn't seem hard to modify to approaches that create multiple gaps.)&lt;br /&gt;&lt;br /&gt;Assigning truth values to this language can generally be accomplished by relying on the original theory of truth that we chose. First, label the 1-meaningless sentences as such. Second, modify the theory of truth to act upon 3 truth values rather than the usual 2. This will involve some decisions, but as you can see I am not too concerned with details in this post. 
Generally, we'll have to decide things like whether it is 1-meaningless or just false to claim that a 1-meaningless sentence is true. Once we've made these decisions, we simply apply the method.&lt;br /&gt;&lt;br /&gt;This, of course, creates another gap; having 3 truth values rather than 2 is not so radical as to change the result there. Call the new gap &lt;span style="font-weight: bold;"&gt;2-meaningless&lt;/span&gt;, and call the language that includes it the &lt;span style="font-weight: bold;"&gt;second language of meaning&lt;/span&gt;. Assign truth-values to this language in the same way, by expanding the method to include 4 truth values.&lt;br /&gt;&lt;br /&gt;By now you get the idea. We define 3-meaningless, 4-meaningless, and so on. And if you read the first post I mentioned (&lt;a href="http://dragonlogic-ai.blogspot.com/2008/10/progress-after-pondering-previous-post.html"&gt;this post&lt;/a&gt;), then you'll probably also realize that I want to similarly define infinity-meaningless, inf+1-meaningless, inf+inf-meaningless, and so on. More specifically, I want to define a type of meaninglessness corresponding to &lt;span style="font-style: italic;"&gt;every ordinal number&lt;/span&gt;. As I hinted at the beginning, this rather large hierarchy should smooth out most differences in referential power of the methods involved; so, a really weak initial theory of truth should still do the trick in the end, gaining maximal referential power after unstateably many iterations.&lt;br /&gt;&lt;br /&gt;Now for the pleasant surprise. Once I've done this, I can prove that there is &lt;span style="font-style: italic;"&gt;no referential gap left&lt;/span&gt;. If I had a final gap, it would correspond to an ordinal number larger than all other ordinal numbers (including itself)! This cannot be, so the theory is safe, almost as if it were protected by a magic charm.&lt;br /&gt;&lt;br /&gt;For a few weeks now I've been satisfied with this conclusion. 
But, early on, I realized that I didn't know what the logic should say about a statement like "This sentence is either false or some type of meaningless". A day or so ago I realized what was going on. Each new language can refer to any combination of the truth-states from the levels under it, but obviously there is no top level (since there is no ordinal larger than all others), so we don't have permission to refer to any combination of values from any level we want; we are only allowed to refer to combinations that have some upper bound. The phrase "some type of meaningless" has no upper bound; it attempts to refer to the entire unbounded list.&lt;br /&gt;&lt;br /&gt;There is some legitimate mathematical tradition that could be used to justify this limitation of the logic. One is not supposed to tamper with all ordinals at once. So, I could simply say that it's OK to stop here. Of course, I'm using the same sort of self-defeating argument that is so common in this area... in order to say that we can't refer to the list of all ordinals, I am doing exactly that.&lt;br /&gt;&lt;br /&gt;Another way out would be to allow the newly-problematic statements to be both true and false, using some paraconsistent logic. This is similarly justified by the motto "one is not supposed to tamper with all ordinals at once". The taboo surrounding the list of all ordinals arises, of course, from the fact that contradictions follow quite quickly from reasoning about it. So, I could somewhat legitimately argue that it is literally an inconsistent yet well-defined mathematical entity.&lt;br /&gt;&lt;br /&gt;However, this does not give me maximal expressive power... I will end up wanting to invent new notation to fill reference-gaps in the paraconsistent theory, and on we go.&lt;br /&gt;&lt;br /&gt;So, a more direct approach would be to allow unrestricted statements about ordinals, and then do the same thing we've been doing... 
assign these statements truth values in the obvious ways, then apply an expanded theory of truth to add a truth predicate to that language, then call anything that isn't assigned a value "super-meaningless", expand the theory of truth to give that a predicate, invent 2-super-meaningless, 3-super-meaningless, inf-, and the whole ordinal hierarchy again. Then what? Well, we'll have the same sort of gap for this second ordinal hierarchy. So by doing the whole thing &lt;span style="font-style: italic;"&gt;again&lt;/span&gt;, we can create super-super-meaningless, super-super-super-meaningless, and infinite versions with any ordinal number of supers. What next? Well, we'll have the same problem again...&lt;br /&gt;&lt;br /&gt;But notice what I'm doing...&lt;br /&gt;&lt;br /&gt;All of this is quite seriously illegal if we take the notion of ordinal number seriously, because I'm simply constructing a hierarchy of ordinals &lt;span style="font-style: italic;"&gt;larger than all ordinals&lt;/span&gt;. An ordinal cannot be larger than all ordinals! And even less can there be a whole hierarchy up there...&lt;br /&gt;&lt;br /&gt;This demonstrates the force of the two limitative arguments (either "no, you seriously cannot refer to those things" or "sure, go ahead, but you'll derive contradictions"). 
Even though there really really seems to be a solid next step that can be taken, it is either a brick wall or a gaping ravine...&lt;br /&gt;&lt;br /&gt;So, like I said, it seems to be a theory that almost works.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-8975688144626158278?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/8975688144626158278/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/12/general-theory-well-i-have-been.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/8975688144626158278'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/8975688144626158278'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/12/general-theory-well-i-have-been.html' title=''/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-1420413472329956017</id><published>2008-11-24T10:55:00.000-08:00</published><updated>2008-11-24T11:00:46.705-08:00</updated><title type='text'></title><content type='html'>&lt;span style="font-size:180%;"&gt;An Interesting Mailing List&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I recently joined the &lt;a href="http://www.weidai.com/everything.html"&gt;everything list&lt;/a&gt;. It looks to be some good philosophical fun. 
The list practice is to submit a "joining post" that reviews intellectual background. I am replicating mine here.&lt;br /&gt;&lt;br /&gt;----------------------------------&lt;br /&gt;&lt;br /&gt;Hi everyone!&lt;br /&gt;&lt;br /&gt;My name is Abram Demski. My interest, when it comes to this list, is:&lt;br /&gt;what is the correct logic, the logic that can refer to (and reason&lt;br /&gt;about) any mathematical structure? The logic that can define&lt;br /&gt;everything definable? If every possible universe exists, then of&lt;br /&gt;course we've got to ask which universes are possible. As someone&lt;br /&gt;mentioned recently, a sensible approach is to take the logically&lt;br /&gt;consistent ones. So, I'm asking: in what logic?&lt;br /&gt;&lt;br /&gt;I am also interested in issues of specifying a probability&lt;br /&gt;distribution over these possibilities, and what such a probability&lt;br /&gt;distribution really means. Again there was some recent discussion on&lt;br /&gt;this... I was very tempted to comment, but I wanted to lurk a while to&lt;br /&gt;get the idea of the group before posting my join post.&lt;br /&gt;&lt;br /&gt;Following is my view on what the big questions are when it comes to&lt;br /&gt;specifying the correct logic.&lt;br /&gt;&lt;br /&gt;The first two big puzzles are presented to us by Godel's&lt;br /&gt;incompleteness theorem and Tarski's undefinability theorem. 
The way I&lt;br /&gt;see it, Godel's theorem presents a "little" puzzle, which points us in&lt;br /&gt;the direction of the "big" puzzle presented by Tarski's theorem.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://en.wikipedia.org/wiki/Godel%27s_theorem" target="_blank"&gt;http://en.wikipedia.org/wiki/&lt;wbr&gt;Godel%27s_theorem&lt;/a&gt;&lt;br /&gt;&lt;a href="http://en.wikipedia.org/wiki/Tarski%27s_undefinability_theorem" target="_blank"&gt;http://en.wikipedia.org/wiki/&lt;wbr&gt;Tarski%27s_undefinability_&lt;wbr&gt;theorem&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;The little puzzle is this: Godel's theorem tells us that any&lt;br /&gt;sufficiently strong logic does not have a complete set of deduction&lt;br /&gt;rules; the axioms will fail to capture all truths about the logical&lt;br /&gt;entities we're trying to talk about. But if these entities cannot be&lt;br /&gt;completely axiomatized, then in what sense are they well-defined? How is&lt;br /&gt;logic logical, if it is doomed to be incompletely specified? One way&lt;br /&gt;out here is to say that numbers (which happen to be the logical&lt;br /&gt;entities that Godel showed were doomed to incompleteness, though of&lt;br /&gt;course the incompleteness theorem has since been generalized to other&lt;br /&gt;domains) really are incompletely specified: the axioms are incomplete&lt;br /&gt;in that they fail to prove every sentence about numbers either true or&lt;br /&gt;false, but they are complete in that the ones they miss are in some&lt;br /&gt;sense actually not specified by our notion of number. 
I don't like&lt;br /&gt;this answer, because it is equivalent to saying that the halting&lt;br /&gt;problem really has no answer in the cases where it is undecidable.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://en.wikipedia.org/wiki/Halting_problem" target="_blank"&gt;http://en.wikipedia.org/wiki/&lt;wbr&gt;Halting_problem&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Instead, I prefer to say that while decidable facts correspond to&lt;br /&gt;finite computations, undecidable facts simply correspond to infinite&lt;br /&gt;computations; so, there is still a well-defined procedure for deciding&lt;br /&gt;them, it simply takes too long for us to complete. For the case of&lt;br /&gt;number theory, this can be formalized with the arithmetical hierarchy:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://en.wikipedia.org/wiki/Arithmetical_hierarchy" target="_blank"&gt;http://en.wikipedia.org/wiki/&lt;wbr&gt;Arithmetical_hierarchy&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Essentially, each new quantifier amounts to a potentially infinite&lt;br /&gt;number of cases we need to check. There are similar hierarchies for&lt;br /&gt;broader domains:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://en.wikipedia.org/wiki/Hyperarithmetical_hierarchy" target="_blank"&gt;http://en.wikipedia.org/wiki/&lt;wbr&gt;Hyperarithmetical_hierarchy&lt;/a&gt;&lt;br /&gt;&lt;a href="http://en.wikipedia.org/wiki/Analytical_hierarchy" target="_blank"&gt;http://en.wikipedia.org/wiki/&lt;wbr&gt;Analytical_hierarchy&lt;/a&gt;&lt;br /&gt;&lt;a href="http://en.wikipedia.org/wiki/Projective_hierarchy" target="_blank"&gt;http://en.wikipedia.org/wiki/&lt;wbr&gt;Projective_hierarchy&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;This brings us to the "big" puzzle. 
To specify the logic and refer to&lt;br /&gt;any structure I want, I need to define the largest of these&lt;br /&gt;hierarchies: a hierarchy that includes all truths of mathematics.&lt;br /&gt;Unfortunately, Tarski's undefinability theorem presents a roadblock to&lt;br /&gt;this project: If I can use logic L to define a hierarchy H, then H&lt;br /&gt;will necessarily fail to include all truths of L. To describe the&lt;br /&gt;hierarchy of truths for L, I will always need a more powerful language&lt;br /&gt;L+1. Tarski proved this under some broad assumptions; since Tarski's&lt;br /&gt;theorem completely blocks my project, it appears I need to examine&lt;br /&gt;these assumptions and reject some of them.&lt;br /&gt;&lt;br /&gt;I am, of course, not the first to pursue such a goal. There is an&lt;br /&gt;abundant literature on theories of truth. From what I've seen, the&lt;br /&gt;important potential solutions are Kripke's fixed-points, revision&lt;br /&gt;theories, and paraconsistent theories:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://en.wikipedia.org/wiki/Saul_Kripke#Truth" target="_blank"&gt;http://en.wikipedia.org/wiki/&lt;wbr&gt;Saul_Kripke#Truth&lt;/a&gt;&lt;br /&gt;&lt;a href="http://plato.stanford.edu/entries/truth-revision/" target="_blank"&gt;http://plato.stanford.edu/&lt;wbr&gt;entries/truth-revision/&lt;/a&gt;&lt;br /&gt;&lt;a href="http://en.wikipedia.org/wiki/Paraconsistent_logic" target="_blank"&gt;http://en.wikipedia.org/wiki/&lt;wbr&gt;Paraconsistent_logic&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;All of these solutions create reference gaps: they define a language L&lt;br /&gt;that can talk about all of its truths, and therefore could construct&lt;br /&gt;its own hierarchy in one sense, but in addition to simple true and&lt;br /&gt;false more complicated truth-states are admitted that the language&lt;br /&gt;cannot properly refer to. For Kripke's theory, we are unable to talk&lt;br /&gt;about the sentences that are neither-true-nor-false. 
For revision&lt;br /&gt;theories, we are unable to talk about which sentences have unstable&lt;br /&gt;truth values or multiple stable truth values. In paraconsistent logic,&lt;br /&gt;we are able to refer to sentences that are both-true-and-false, but we&lt;br /&gt;can't state within the language that a statement is *only* true or&lt;br /&gt;*only* false (to my knowledge; paraconsistent theory is not my strong&lt;br /&gt;suit). So using these three theories, if we want a hierarchy that&lt;br /&gt;defines all the truth value *combinations* within L, we're still out&lt;br /&gt;of luck.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;As I said, I'm also interested in the notion of probability. I&lt;br /&gt;disagree with Solomonoff's universal distribution&lt;br /&gt;(&lt;a href="http://en.wikipedia.org/wiki/Ray_Solomonoff" target="_blank"&gt;http://en.wikipedia.org/wiki/&lt;wbr&gt;Ray_Solomonoff&lt;/a&gt;), because it assumes that&lt;br /&gt;the universe is computable. I cannot say whether the universe we&lt;br /&gt;actually live in is computable or not; however, I argue that,&lt;br /&gt;regardless, an uncomputable universe is at least conceivable, even if&lt;br /&gt;it has a low credibility. So, a universal probability distribution&lt;br /&gt;should include that possibility.&lt;br /&gt;&lt;br /&gt;I also want to know exactly what it means to measure a probability. I&lt;br /&gt;think use of subjective probabilities is OK; a probability can reflect&lt;br /&gt;a state of belief. But, I think the reason that this is an effective&lt;br /&gt;way of reasoning is because these subjective probabilities tend to&lt;br /&gt;converge to the "true" probabilities as we gain experience. It seems&lt;br /&gt;to me that this "true probability" needs to be a frequency. 
It also&lt;br /&gt;seems to me that this would be meaningful even in universes that&lt;br /&gt;actually happened to have totally deterministic physics-- so by a&lt;br /&gt;"true probability" I don't mean to imply a physically random outcome,&lt;br /&gt;though I don't mean to rule it out either (like uncomputable&lt;br /&gt;universes, I think it should be admitted as possible).&lt;br /&gt;&lt;br /&gt;Well, I think that is about it. For now.&lt;br /&gt;&lt;span style="color:#888888;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-1420413472329956017?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/1420413472329956017/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/11/interesting-mailing-list-i-recently.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/1420413472329956017'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/1420413472329956017'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/11/interesting-mailing-list-i-recently.html' title=''/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-3457241044728622014</id><published>2008-11-11T06:56:00.000-08:00</published><updated>2008-11-15T09:54:30.859-08:00</updated><category 
scheme='http://www.blogger.com/atom/ns#' term='hyperlogic'/><category scheme='http://www.blogger.com/atom/ns#' term='lambda calculus'/><category scheme='http://www.blogger.com/atom/ns#' term='foundations of mathematics'/><title type='text'></title><content type='html'>&lt;span style="font-size:180%;"&gt;Fixing Naive Lambda Logic&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;In &lt;a href="http://dragonlogic-ai.blogspot.com/2008/08/paradox-i-discovered-paradox-in-version.html"&gt;this old post&lt;/a&gt;, I explained why naive lambda logic is paradoxical, and explained how to fix it. The fix that I suggested restricted lambda statements to represent finite computations, which is really what they are for anyway. However, in &lt;a href="http://dragonlogic-ai.blogspot.com/2008/11/lure-of-paraconsistency-paraconsistent.html"&gt;this recent post&lt;/a&gt;, I suggest that many naive, inconsistent logics (such as the naive lambda logic) are actually referentially complete: they can state any mathematically meaningful idea. The problem is that they also can state some meaningless things.&lt;br /&gt;&lt;br /&gt;So, it is interesting to consider how one might fix naive lambda logic in a way that keeps the meaningful infinite stuff while removing the nonsense.&lt;br /&gt;&lt;br /&gt;To do that, though, a definition of meaningfulness is needed. As I said in the more recent post, a statement's meaning needs to come ultimately from a basic level of meaningful statements (such as statements about sensory data). A meaningful statement should either be on this base level, or its meaning should be derived purely from other meaningful statements.&lt;br /&gt;&lt;br /&gt;In the lambda logic, it is natural to think of a term as meaningful if it evaluates to something. A statement, then, is meaningful if its truth relies only on the value of meaningful terms. To make this precise, we've got to specify the infinite build-up process by which we assign meaning. 
For each operation that forms a new sentence, we've got to define how meaning is carried over. One of these rules, for example, will be:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;If statement &lt;span style="font-weight: bold;"&gt;X&lt;/span&gt; is meaningful and statement &lt;span style="font-weight: bold;"&gt;Y&lt;/span&gt; is meaningful, then the statement &lt;span style="font-weight: bold;"&gt;X and Y&lt;/span&gt; is meaningful.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;We need a rule like this for every operation: &lt;span style="font-weight: bold;"&gt;and&lt;/span&gt;, &lt;span style="font-weight: bold;"&gt;or&lt;/span&gt;, &lt;span style="font-weight: bold;"&gt;not&lt;/span&gt;, quantifiers, lambda, and the lambda-value relation. That may sound simple, but complications quickly build up. &lt;span style="font-weight: bold;"&gt;And&lt;/span&gt; is meaningful when its arguments are meaningful. But what about &lt;span style="font-weight: bold;"&gt;or&lt;/span&gt;? It seems like &lt;span style="font-weight: bold;"&gt;X or Y&lt;/span&gt; could be meaningfully true if &lt;span style="font-weight: bold;"&gt;X&lt;/span&gt; is true but &lt;span style="font-weight: bold;"&gt;Y&lt;/span&gt; is meaningless. But if we want this to work, then it would also be sensible to allow &lt;span style="font-weight: bold;"&gt;not Y&lt;/span&gt; to be meaningfully true when &lt;span style="font-weight: bold;"&gt;Y&lt;/span&gt; is meaningless. (That behavior of &lt;span style="font-weight: bold;"&gt;not&lt;/span&gt; would make the behavior of &lt;span style="font-weight: bold;"&gt;or&lt;/span&gt; that I mentioned consistent with classical logic.) But that equates &lt;span style="font-style: italic;"&gt;meaningless&lt;/span&gt; with &lt;span style="font-style: italic;"&gt;false&lt;/span&gt;, which seems wrong.&lt;br /&gt;&lt;br /&gt;Another problem arises with the treatment of quantifiers. Do quantifiers range over all possible terms, or only meaningful ones? 
It makes a difference!&lt;br /&gt;&lt;br /&gt;There are many different places we can turn to for standard solutions to these problems: &lt;a href="http://en.wikipedia.org/wiki/Free_logic"&gt;free logic&lt;/a&gt;, &lt;a href="http://www.science.uva.nl/%7Eseop/entries/truth-revision/"&gt;the revision theory of truth&lt;/a&gt;, fixed-point theories, and others.&lt;br /&gt;&lt;br /&gt;A third problem, perhaps worse, arises from the concept "not meaningful". For a concept to be meaningful seems straightforward: it should be built up in a meaningful way from other meaningful statements. But trouble presents itself when we discuss non-meaning.&lt;br /&gt;&lt;br /&gt;Imagine the base-level facts as ground from which trees are growing. The trees, of course, are the meaningful statements that can be built up. Meaningless statements would be branches hanging in midair, attempting to grow from themselves, or from other meaningless statements. (Meaningful statements can also have self-reference, but have an attachment to the ground somewhere.)&lt;br /&gt;&lt;br /&gt;Now, when we consider the concept "meaningless", we see some weird stuff happening: somehow the paradoxical branches that grow from nothing are able to support meaningful branches, such as the statement &lt;span style="font-style: italic;"&gt;"This sentence is false" is meaningless&lt;/span&gt;. Or, even stranger, consider "This sentence is meaningless". It appears to be meaningful but false. Or, consider "This sentence is either false or meaningless". If it is true, it is false or meaningless; if it is false, it is true; if it is meaningless, then it is true. It looks like the only way to deal with it is to say that it is meaningless to ask which of the categories it is assigned to: it is meaningless to talk about its meaningfulness or meaninglessness.&lt;br /&gt;&lt;br /&gt;Handling cases like these requires a sort of back-and-forth between meaningful and meaningless. 
We can't just grow meaning from the ground up and then declare the rest meaningless; in declaring things meaningless we allow more meaningful statements, so we've got to go back and add them in. That in turn might change the set of meaningless statements, and so on. If in doing this we are changing our assessments of various statements (going back and forth between "meaningful" and "meaningless"), then we are doing something similar to what the revision theory of truth recommends. On the other hand, I like the idea of marking things "definitely meaningful" and "definitely meaningless". A back-and-forth would still be needed, but all decisions would be final.&lt;br /&gt;&lt;br /&gt;Anyway. Suppose we resolve all of those issues. Another interesting issue comes up: infinite lambda statements.&lt;br /&gt;&lt;br /&gt;An infinite lambda statement could directly represent the mathematical entities that I want the system to be able to reason about. For example, an arbitrary real number would be any function from natural numbers to the integers 0 through 9 (if we want decimal notation), represented by a finite or infinite lambda statement. (Note: the calculation itself could always be fixed to halt in finite time, despite the lambda statement being infinite.) The interesting thing is that if the logic has been satisfactorily rigged up, it will be in some sense &lt;span style="font-style: italic;"&gt;as if&lt;/span&gt; infinite lambda statements were allowed, even though they aren't.&lt;br /&gt;&lt;br /&gt;This suggests that we need to be careful of the semantics, and therefore of how nonmonotonic reasoning is used. Are the quantifiers interpreted as ranging over all actually-representable lambda terms, or are they also ranging over the unrepresentable infinite ones? If the logic is to talk about unrepresentables properly, the quantifiers will have to range over them. 
But then nonmonotonic reasoning will not converge to the correct answers: it will converge to answers that hold for the finite terms only. This will sometimes be correct for the infinite case, but not always. The matter seems complicated, and I'm not yet sure how to deal with it.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-3457241044728622014?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/3457241044728622014/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/11/fixing-naive-lambda-logic-in-this-old.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/3457241044728622014'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/3457241044728622014'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/11/fixing-naive-lambda-logic-in-this-old.html' title=''/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-4693550830185894046</id><published>2008-11-10T14:44:00.000-08:00</published><updated>2008-11-15T10:28:50.354-08:00</updated><title type='text'></title><content type='html'>&lt;span style="font-size:180%;"&gt;How to Reason with Four-valued Definitional Set Theory&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;1. 
There is a "ground level" composed of statements in first-order logic that can be reasoned about in the usual manner.&lt;br /&gt;2. The &lt;a href="http://en.wikipedia.org/wiki/Axiom_of_separation"&gt;axiom of set comprehension&lt;/a&gt; is interpreted as allowing arbitrary &lt;span style="font-style: italic;"&gt;definitions&lt;/span&gt; of sets (rather than asserting the actual &lt;span style="font-style: italic;"&gt;existence&lt;/span&gt; of sets). A general logic of definitions would allow us to define any sort of object with any sort of definition, and reason about it from there; in this case, though, I am only concerned with defining sets.&lt;br /&gt;3. The definition of a set S can be "for all x: x is an element of S if and only if [statement]" for any statement (in the full language, not just the first-order part).&lt;br /&gt;4. If "x is an element of S" can be derived from the definition of S plus any relevant information on x, then it is so. Similarly for "x is not an element of S". "Relevant information" here means the definition of x if x is a defined set, plus all information existing at the first-order level. Notice that the definition of S might imply both "x is an element of S" and "x is not an element of S", or neither of these.&lt;br /&gt;&lt;br /&gt;That's it! Unless I've missed something. But, anyway, one thing that is very specifically &lt;span style="font-style: italic;"&gt;not&lt;/span&gt; included is a way for contradictions (or &lt;span style="font-style: italic;"&gt;anything&lt;/span&gt;) to leak over from the level of defined objects to the "ground level" of normal logic.&lt;br /&gt;&lt;br /&gt;Oh, I suppose this should be added:&lt;br /&gt;&lt;br /&gt;5. 
Statements of the full language can be reasoned about with some sort of intuitionistic paraconsistent logic.&lt;br /&gt;&lt;br /&gt;This should allow compound statements to be constructed; for example, if two different statements about sets are true by definition, then "statement1 and statement2" should be treated as a true sentence. I think an OK way of doing this would be to allow the Gentzen-style [Edit: oh, I meant Prawitz-style!] introduction/elimination rules for &lt;span style="font-style: italic;"&gt;and&lt;/span&gt;, &lt;span style="font-style: italic;"&gt;or&lt;/span&gt;, &lt;span style="font-style: italic;"&gt;forall&lt;/span&gt;, and &lt;span style="font-style: italic;"&gt;exists&lt;/span&gt;, skipping over the rules applying to implication and negation. But, don't take my word for it!&lt;br /&gt;&lt;br /&gt;Now, the system is definitely not referentially complete. It lacks the ability to refer to its own possible truth-value combinations. I'd need to add a nonmonotonic operator allowing the logic to conclude that a statement was &lt;span style="font-style: italic;"&gt;not&lt;/span&gt; true-by-definition and &lt;span style="font-style: italic;"&gt;not&lt;/span&gt; false-by-definition; such conclusions of course gain credibility as attempts to prove the statement true/false fail. This gets somewhat confusing, because false-by-definition is distinct from not true-by-definition. Anyway, soon I'd be adding in the full machinery of a logic of "implies". So, the four-valued set theory doesn't seem like a complete foundation. 
Still, it is interesting!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-4693550830185894046?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/4693550830185894046/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/11/how-to-reason-with-four-valued.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/4693550830185894046'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/4693550830185894046'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/11/how-to-reason-with-four-valued.html' title=''/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-3689952487020779107</id><published>2008-11-08T18:18:00.000-08:00</published><updated>2008-11-08T22:10:52.689-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='foundations of mathematics'/><category scheme='http://www.blogger.com/atom/ns#' term='logic'/><title type='text'></title><content type='html'>&lt;span style="font-size:180%;"&gt;The Lure of Paraconsistency&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;A &lt;a href="http://plato.stanford.edu/entries/logic-paraconsistent/"&gt;paraconsistent&lt;/a&gt; logic is a logic that allows some contradictions. 
To make this somewhat bearable, paraconsistent logics find various ways of stopping the &lt;span style="font-style: italic;"&gt;rule of explosion&lt;/span&gt;: "from a contradiction, anything follows". The rule of explosion holds in both classical logic and intuitionistic logic, and makes inconsistent theories very uninteresting: there is only one of them, and in it all things are both true and false. Any theory that turns out to be inconsistent is logically equivalent to this boring theory.&lt;br /&gt;&lt;br /&gt;So, contradictions are not as bad in paraconsistent logics-- the explosion is contained, so it's not like everything falls apart. We might still be irked, and ask how this counts as "logic" if it is explicitly contradictory, but at least we can do interesting formal work without worry.&lt;br /&gt;&lt;br /&gt;For example, it would be possible to construct paraconsistent versions of &lt;a href="http://en.wikipedia.org/wiki/Naive_set_theory"&gt;naive set theory&lt;/a&gt;, &lt;a href="http://dragonlogic-ai.blogspot.com/2008/08/paradox-i-discovered-paradox-in-version.html"&gt;my naive lambda calculus&lt;/a&gt;, and naive theories of truth. In the context of my most recent posts, it is interesting to consider a naive theory of logical implication: collapse all of the different levels of implication (the &lt;a href="http://dragonlogic-ai.blogspot.com/2008/10/progress-after-pondering-previous-post.html"&gt;systems&lt;/a&gt; I named &lt;span style="font-weight: bold;"&gt;one&lt;/span&gt;, &lt;span style="font-weight: bold;"&gt;two&lt;/span&gt;, &lt;span style="font-weight: bold;"&gt;infinity&lt;/span&gt;, &lt;span style="font-weight: bold;"&gt;A(infinity, infinity)&lt;/span&gt;...) into a single implication relation that can talk about any level.&lt;br /&gt;&lt;br /&gt;Now, the temptation: these "naive" foundational theories appear to be referentially complete! 
&lt;a href="http://en.wikipedia.org/wiki/Frege"&gt;Frege&lt;/a&gt; was able to use naive set theory to construct basic mathematics, and indeed it doesn't seem as if there is any bound to what can be constructed in it. On my end, a naive theory of implication obviously contains all the levels I might construct: the system &lt;span style="font-weight: bold;"&gt;one&lt;/span&gt; is the fragment in which implication is only wrapped around first-order statements, &lt;span style="font-weight: bold;"&gt;two&lt;/span&gt; is the fragment in which implications are additionally allowed to be wrapped around statements in &lt;span style="font-weight: bold;"&gt;one&lt;/span&gt;, &lt;span style="font-weight: bold;"&gt;infinity&lt;/span&gt; is the fragment in which we can use any of &lt;span style="font-weight: bold;"&gt;one&lt;/span&gt;, &lt;span style="font-weight: bold;"&gt;two&lt;/span&gt;, &lt;span style="font-weight: bold;"&gt;three&lt;/span&gt;.... &lt;span style="font-weight: bold;"&gt;infinity^infinity&lt;/span&gt; allows us to use &lt;span style="font-weight: bold;"&gt;one&lt;/span&gt;, &lt;span style="font-weight: bold;"&gt;two&lt;/span&gt;, &lt;span style="font-weight: bold;"&gt;three&lt;/span&gt;,... &lt;span style="font-weight: bold;"&gt;infinity+1&lt;/span&gt;, &lt;span style="font-weight: bold;"&gt;infinity+2&lt;/span&gt;, &lt;span style="font-weight: bold;"&gt;infinity+3&lt;/span&gt;...&lt;br /&gt;&lt;br /&gt;The &lt;span style="font-style: italic;"&gt;problem&lt;/span&gt;, of course, is that in addition to allowing all meaningful statements to enter, we apparently let some meaningless statements through as well. But, at the moment, I know of no other way to let in all the meaningful ones! So paraconsistent logic is looking pretty good at the moment.&lt;br /&gt;&lt;br /&gt;Essentially, it seems, a meaningful statement is any statement constructable in a naive theory that happens to have a truth value that relies on other meaningful statements. 
Those other meaningful statements gain their meaning from yet other meaningful statements, and so on down the line until we reach good solid first-order facts that everything relies on. But, the path down to basic statements can be arbitrarily complicated; so, it is impossible to construct a logic that contains all of the meaningful statements and none of the meaningless ones, because we can't know ahead of time which is which for every single case.&lt;br /&gt;&lt;br /&gt;I found an argument that paraconsistent logic &lt;a href="http://www.jstor.org/pss/2659783"&gt;isn't the only way to preserve naive set theory &lt;/a&gt;(&lt;a href="http://findarticles.com/p/articles/mi_m2346/is_n428_v107/ai_21248796/pg_1?tag=artBody;col1"&gt;free version&lt;/a&gt;), but it apparently is only hinting at the possibility, not providing a concrete proposal. Actually, I've made some relevant speculations myself... in &lt;a href="http://dragonlogic-ai.blogspot.com/2008/08/more-direct-attack-so-as-i-noted-last.html"&gt;this&lt;/a&gt; post, towards the end, I talk about a "logic of definitions". Such a logic would be four-valued: true, false, both, neither. A definition can easily be incomplete (rendering some statements neither true nor false), but it can just as easily be inconsistent (rendering some statements both true and false). This is suited particularly well to the way the comprehension principle works in naive set theory; essentially, we can read the naive comprehension principle as stating that any set that has a definition, exists. The trouble comes from the fact that some such definitions are contradictory!&lt;br /&gt;&lt;br /&gt;This seems nice; just alter naive set theory to use a four-valued logic and there you go, you've got your foundational logic that can do anything we might want to do. But I'm not about to claim that... first off, I haven't even begun to define how the four-valued logic would work. 
Second, that scheme omits the extra grounding that nonmonotonic methods seem somewhat capable of providing; I would want to look into that omission... Third, the non-classical manipulations provided by the four-valued logic may not be sufficiently similar to classical manipulations to justify much of classical mathematics. That would be a big warning sign. So, generally, the idea needs to be worked out in much more detail before it can be judged. (But, it looks like there is some work on four-valued logics... &lt;a href="http://citeseer.ist.psu.edu/596052.html"&gt;here&lt;/a&gt;, &lt;a href="http://portal.acm.org/citation.cfm?id=1164557"&gt;here&lt;/a&gt;, &lt;a href="http://www.springerlink.com/index/U814RP53874K1063.pdf"&gt;here&lt;/a&gt;... haven't read them yet.)&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;But, &lt;span style="font-style: italic;"&gt;any&lt;/span&gt;way, the naive theories (and the idea of using paraconsistent logic to make them workable) are quite valuable in that they provide a glimpse into the possible, showing that it is not utterly crazy to ask for a logic that can define any infinity a human can define... 
we just might have to give up consistency.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-3689952487020779107?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/3689952487020779107/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/11/lure-of-paraconsistency-paraconsistent.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/3689952487020779107'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/3689952487020779107'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/11/lure-of-paraconsistency-paraconsistent.html' title=''/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-6561173719706617798</id><published>2008-11-04T13:33:00.001-08:00</published><updated>2008-11-04T16:47:15.340-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='hyperlogic'/><category scheme='http://www.blogger.com/atom/ns#' term='foundations of mathematics'/><category scheme='http://www.blogger.com/atom/ns#' term='grounding'/><category scheme='http://www.blogger.com/atom/ns#' term='logic'/><title type='text'></title><content type='html'>&lt;span style="font-size:180%;"&gt;Normative Grounding&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;At the end of the last post, I mentioned a normative definition of 
grounding:&lt;span style="font-style: italic;"&gt; a concept is grounded in a system if the system captures everything worth capturing about the way we reason about that concept&lt;/span&gt;. Perhaps "grounding" isn't the best term for this, but whatever you call it, this is an important criterion. This principle should also cut the other way: if something is &lt;span style="font-style: italic;"&gt;not&lt;/span&gt; worth capturing, it should not be in the system. At the end of the last post I listed a few ways in which the system fails the first test: things that we do (to our advantage) that the system doesn't capture. But the system also has problems in the other direction: it does things that it has no good reason to do.&lt;br /&gt;&lt;br /&gt;The expanded notion of meaning that I've made such a big deal about, which allows concepts that are meaningful in terms of other concepts but not in terms of base-level data, seems like a normative stretch. So what if a concept would be useful &lt;span style="font-style: italic;"&gt;if&lt;/span&gt; we knew it to be true/false? The fact remains that it is useless, since we &lt;span style="font-style: italic;"&gt;cannot&lt;/span&gt; know one way or the other!&lt;br /&gt;&lt;br /&gt;Yet, we silly humans seem to use such concepts. We are (it seems) even able to prove or disprove a few of them, so that we &lt;span style="font-style: italic;"&gt;do&lt;/span&gt; know them to be true or false. How and why?&lt;br /&gt;&lt;br /&gt;The arithmetical hierarchy is defined in terms of a base-level class of computable predicates. However, thanks to the halting problem, it is impossible to tell which predicates are computable and which aren't. So, we never know for certain which level a statement is on. 
In terms of the formalism I've described, I suppose this would reflect our ability to disprove some statements of the form "true(...)" or "implies(...,...)"; for example, if we can show that a statement is a logical contradiction, then we know that it is not a tautology. (Do I need to add that to the list of manipulation rules?? Is it somehow justifiable if the system defines "true(X)" as "there exists a proof of X"??) So, some proof-searches are provably nonhalting, without resorting to mathematical induction... meaning some of the higher-level statements that appear undecidable will turn out to be decidable, and some that appear un-limit-decidable will turn out to be limit-decidable after all. Since we can't tell which ones will do this ahead of time, there may be normative justification for keeping them all...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-6561173719706617798?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/6561173719706617798/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/11/normative-grounding-at-end-of-last-post.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/6561173719706617798'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/6561173719706617798'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/11/normative-grounding-at-end-of-last-post.html' title=''/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' 
src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-8439395481956573276</id><published>2008-11-02T14:27:00.000-08:00</published><updated>2008-11-08T19:16:09.887-08:00</updated><title type='text'></title><content type='html'>&lt;span style="font-size:180%;"&gt;Back to the Ground&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;In the previous post, I described a way of constructing arbitrarily powerful logics. This description does not amount to an algorithm-- it invokes something human mathematicians do (listing larger infinities), but doesn't provide a way to implement that on a computer. Worse, it is known that by the standard definition of algorithm, such an algorithm does not exist; I can only hope that it does exist as a limit-computable algorithm. (Note: I just found out that the term "super-recursive algorithm" that I used last time can refer to algorithms decidedly not implementable on normal computer hardware... so I guess there is not a single term that refers to exactly those super-recursive algorithms that &lt;span style="font-style: italic;"&gt;are&lt;/span&gt; implementable on standard hardware. "Limit-computable" is closer...) But, I think this is likely, since I think humans use such an algorithm.&lt;br /&gt;&lt;br /&gt;Anyway. What I did &lt;span style="font-style: italic;"&gt;not&lt;/span&gt; describe was a way of "grounding" each new level. I have previously discussed at length an idea for grounding "&lt;span style="font-weight: bold;"&gt;two&lt;/span&gt;", based on non-monotonic logic and an understanding of meaning &lt;span style="font-style: italic;"&gt;within the network of beliefs&lt;/span&gt; rather than only in terms of the outside world. But, what about grounding the higher operators? With no manipulation rules, they are quite pointless!&lt;br /&gt;&lt;br /&gt;Actually, the operators are not exactly the problem. 
If we wrap two statements in an "implies" statement, the truth/falsehood of the result relies on no more than the validity of the deduction from the first statement to the second. In other words, in some sense we aren't adding anything, we're just making explicit something we already do. But then something strange happens: we're suddenly able to wrap quantifiers around the statement. This becomes problematic. At that stage, we've got to &lt;span style="font-style: italic;"&gt;create&lt;/span&gt; the manipulation rules that the next stage will wrap up in an explicit operator.&lt;br /&gt;&lt;br /&gt;Actually, the same problem applies to the system &lt;span style="font-weight: bold;"&gt;two&lt;/span&gt;. I haven't specifically talked about how one statement in &lt;span style="font-weight: bold;"&gt;two&lt;/span&gt; might be derived from another.&lt;br /&gt;&lt;br /&gt;But, really, the use of the "implies" operator is not essential: I could just as well have used a "true" operator that took one statement. "A implies B" is true in the metalogic when "A -&gt; B" is &lt;span style="font-style: italic;"&gt;provably&lt;/span&gt; true in the base logic, with "-&gt;" representing truth-functional implication.&lt;br /&gt;&lt;br /&gt;Reducing "implies" to "true" in this way changes the appearance of the whole system greatly-- it makes obvious the direct connection to Tarski's Undefinability Theorem and the hierarchy of truth predicates that it was originally taken to entail.&lt;br /&gt;&lt;br /&gt;So, the manipulation rules for "true" are sufficient to account for the manipulation of "implies". 
First, we can adopt the famous T-schema:&lt;br /&gt;&lt;br /&gt;something &lt;=&gt; true(something)&lt;br /&gt;&lt;br /&gt;This does not lead to paradox because there is indeed a hierarchy established, rather than a single "true".&lt;br /&gt;&lt;br /&gt;Second, boolean combinations and sentences with quantifiers are subject to the typical deduction rules.&lt;br /&gt;&lt;br /&gt;Third, quantifiers are additionally subject to fallible rules: existential quantifiers are fallibly refuted by (infallibly) refuting many instances, and universal quantifiers are fallibly affirmed by (infallibly) affirming many instances. (One may wish to allow fallible evidence as well, but there is no normative justification for doing so: the resulting beliefs will never converge to the truth!)&lt;br /&gt;&lt;br /&gt;Fourth, "true(something)" is to be interpreted as "exists(affirmation of something in a lower logical system)", so that it is fallibly refutable like other existential statements. ("Affirmation" means proof if the lower system is &lt;span style="font-weight: bold;"&gt;one&lt;/span&gt;, limit-proof if the system is &lt;span style="font-weight: bold;"&gt;two&lt;/span&gt;, and so on. (So, in general, a proof of finite or infinite length.) Different logical primitives could be used in order to let this definition of "true" actually occur within the system; in particular, we could introduce more and more powerful quantifiers rather than more and more powerful truth-operators.)&lt;br /&gt;&lt;br /&gt;So, do these definitions ground the system's mathematical knowledge (assuming we've also supplied the system with a method of enumerating infinities)? Well, it's hard to say, isn't it? "Grounding" is a philosophical issue. We need a concrete definition. Try this: a system's knowledge of X is grounded iff it reasons about X in the way that a human who knew about X would reason. More precisely, it captures everything worth capturing about the human way of reasoning. 
The answer to the question still isn't obvious; I do not have at my disposal a list of all the things worth capturing about human mathematical reasoning. But, I do have a partial list of things that I &lt;span style="font-style: italic;"&gt;think&lt;/span&gt; are worth capturing...&lt;br /&gt;&lt;br /&gt;-first-order reasoning&lt;br /&gt;-limit-computable fallible reasoning&lt;br /&gt;-arithmetical meaning&lt;br /&gt;-set-theoretic meaning&lt;br /&gt;-mathematical induction&lt;br /&gt;-transfinite induction&lt;br /&gt;&lt;br /&gt;The theory above does not totally encompass &lt;span style="font-style: italic;"&gt;all&lt;/span&gt; of limit-computation (it uses sigma-1 and pi-1 but not delta-2!). It does not entirely explain set-theoretical intuition (it includes "grounding" for uncountable entities, since it (hypothetically) enumerates any infinity a mathematician might; but, it doesn't support direct talk about sets). It doesn't support mathematical induction or transfinite mathematical induction.&lt;br /&gt;&lt;br /&gt;So, still work to be done!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-8439395481956573276?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/8439395481956573276/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/11/back-to-ground-in-previous-post-i.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/8439395481956573276'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/8439395481956573276'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/11/back-to-ground-in-previous-post-i.html' title=''/><author><name>Abram 
Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-29542953511694623</id><published>2008-10-31T08:27:00.001-07:00</published><updated>2008-10-31T09:46:47.307-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='hyperlogic'/><category scheme='http://www.blogger.com/atom/ns#' term='infinity'/><category scheme='http://www.blogger.com/atom/ns#' term='foundations of mathematics'/><category scheme='http://www.blogger.com/atom/ns#' term='meaning'/><category scheme='http://www.blogger.com/atom/ns#' term='logic'/><title type='text'></title><content type='html'>&lt;span style="font-size:180%;"&gt;Progress&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;After pondering the previous post for a bit, I became dissatisfied with the informality of it-- &lt;span style="font-style: italic;"&gt;I&lt;/span&gt; know what I'm saying, but I'm leaving so much out that someone else could easily interpret what I said in a different way. So, I started over, this time stating things more explicitly. This turned out to be a fruitful path.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;One&lt;/span&gt;: Start with first-order logic.&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Two&lt;/span&gt;: Everyone seems to think that first-order logic is well-defined, but of course the fact is that it cannot be defined in itself. So, we need to add the ability to define it. This can be done by adding some basic Turing-complete formalism for the logic to manipulate, like lambda calculus or arithmetic or (perhaps most conveniently) first-order logic. 
So, we have some operator that means "implies in &lt;span style="font-weight: bold;"&gt;One&lt;/span&gt;", which takes two statements and is true if the second can be deduced from the first with first-order manipulations. (We then use nonmonotonic logic when we want to determine the truth value of the new operator.)&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Three&lt;/span&gt;: OK, so we agree that this system is well-defined. But, by Tarski's Undefinability Theorem, we know it can't be defined within itself. So, we add &lt;span style="font-style: italic;"&gt;another &lt;/span&gt;operator that means "implies in &lt;span style="font-weight: bold;"&gt;Two&lt;/span&gt;". This takes two statements and is true if the first implies the second by the rules of &lt;span style="font-weight: bold;"&gt;Two&lt;/span&gt;.&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Infinity&lt;/span&gt;: We can obviously keep going in this manner. So, define the logic that has each of the "implies" operators that might be gained in this manner. For each number, it has the operator "implies in [number]".&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Infinity+1&lt;/span&gt;: By Tarski's theorem, this system also can't define its own truth. But, we just did, so we want our ideal logic to be able to as well. So, we add the "implies in &lt;span style="font-weight: bold;"&gt;Infinity&lt;/span&gt;" operator. (Maybe you've heard that infinity plus one is still just infinity? Well, that is true of &lt;a href="http://en.wikipedia.org/wiki/Cardinal_number"&gt;cardinal infinities&lt;/a&gt;, but not ordinal infinities, which is what we have here. 
Fun stuff!)&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Infinity+2&lt;/span&gt;: Add the "implies" relation for &lt;span style="font-weight: bold;"&gt;Infinity+1&lt;/span&gt;.&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Infinity+Infinity&lt;/span&gt;: Keep doing that, so that we've now got two "implies" relations for each number: the one for just the number, &lt;span style="font-style: italic;"&gt;and&lt;/span&gt; the one for Infinity+[number].&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Infinity*Infinity&lt;/span&gt;: Keep going on in this manner so that we have an infinite progression for each number, namely "implies in [number]", "implies in Infinity+[number]", "implies in Infinity+Infinity+[number]", ...&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Infinity*Infinity*Infinity&lt;/span&gt;...&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Infinity^Infinity&lt;/span&gt;...&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Infinity^(Infinity^Infinity)&lt;/span&gt;...&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;A(Infinity,Infinity)&lt;/span&gt;... (where A is the &lt;a href="http://en.wikipedia.org/wiki/Ackermann_function"&gt;Ackermann function&lt;/a&gt;)&lt;br /&gt;&lt;br /&gt;Obviously I've gone on far longer than I need to in order to illustrate the pattern. But, it's fun! 
Anyway, the point is that finding higher and higher meaningful logics is "merely" a matter of finding higher and higher infinities.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-29542953511694623?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/29542953511694623/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/10/progress-after-pondering-previous-post.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/29542953511694623'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/29542953511694623'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/10/progress-after-pondering-previous-post.html' title=''/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-5331797343704785320</id><published>2008-10-25T20:31:00.000-07:00</published><updated>2008-10-27T18:45:38.127-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='hyperlogic'/><category scheme='http://www.blogger.com/atom/ns#' term='foundations of mathematics'/><category scheme='http://www.blogger.com/atom/ns#' term='hypercomputation'/><category scheme='http://www.blogger.com/atom/ns#' term='grounding'/><title type='text'></title><content type='html'>&lt;span style="font-size:180%;"&gt;Somewhat Broader Definition of 
Meaning&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;These are some thoughts arising from a recent &lt;a href="http://www.mail-archive.com/agi@v2.listbox.com/msg14465.html"&gt;discussion&lt;/a&gt; I was involved with on the &lt;a href="http://www.mail-archive.com/agi@v2.listbox.com/"&gt;AGI list&lt;/a&gt;. As usual, I'm looking for a semantics that follows closely from syntax. In &lt;a href="http://dragonlogic-ai.blogspot.com/2008/07/new-grounding-criteria-in-this-post-i.html"&gt;this&lt;/a&gt; post, I discussed my first jump to a broader definition. Here, I'll discuss some issues involved in such definitions, developing a series of broader and broader definitions along the way.&lt;br /&gt;&lt;br /&gt;One of the simplest possible views on logical meaning is that a statement's meaning is totally specified by its verification criteria. This point of view is called logical positivism, and the idea also resonates strongly with constructivism. There are several ways of making it more precise:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;-A sentence has meaning if there is a computation that will always indicate "true" or "false" from some finite set of sensory observations, when those observations are available.&lt;/span&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;-A sentence has meaning if there is a computation that gives "true" from some finite set of sensory observations, when the sentence is true and those observations are available.&lt;/span&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;-A sentence has meaning if there is a computation that gives "false" from some finite set of sensory observations, when the sentence is false and those observations are available.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;These variations are interesting because they seem small, but make a real difference. The first is the most stringent, obviously. 
The second requires only verifiability, and the third requires only falsifiability.&lt;br /&gt;&lt;br /&gt;One interesting property is that all three are inherently difficult to check, thanks to the halting problem. We can tell that a statement is meaningful once we've determined that it is true or false, but we won't always be able to do that ahead of time. Furthermore, because of this, the term "meaningful" won't always be meaningful :).&lt;br /&gt;&lt;br /&gt;So, the first definition allows any first-order statements that always reduce to either "true" or "false" when the proper sensory information is added: a class we cannot sort out ahead of time. The second admits all of these, plus an extra existential quantifier. The third instead allows an extra universal quantifier.&lt;br /&gt;&lt;br /&gt;Another interesting point: for the second and third definitions, the negation of a meaningful statement is not necessarily meaningful. This is rather odd, and the problem doesn't arise for the first definition. The negation of a verifiable statement is falsifiable; the negation of a falsifiable statement is verifiable. So, if we accept &lt;span style="font-style: italic;"&gt;both&lt;/span&gt; types of statements as meaningful, then we regain the ability to negate any meaningful statement.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;-A sentence has meaning if there is a computation that, from some finite set of sensory observations, either always gives "true" when the sentence is true or always gives "false" when the sentence is false (when the observations are available).&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;But, an unwanted limitation still remains: we cannot arbitrarily compose meaningful statements. More specifically, we can't put any meaningfully true/false statement in the place of sensory data. 
To fix this, we can change the definition from one that relies on sensory information to one that can rely on any meaningful information:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;-A sentence has meaning if there is a computation that, from some finite set of meaningful data, either would always give "true" when the sentence is true or always give "false" when the sentence is false (if the data were available).&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The definition becomes hypothetical because there are now some meaningful statements for which the data will never be available, since the truth of a meaningful statement can be unknowable (thanks to Gödel's incompleteness theorem).&lt;br /&gt;&lt;br /&gt;The discussion up to now does not depend on a choice between intuitionist and classical logic: both would work. This is largely because I've been discussing &lt;span style="font-style: italic;"&gt;meaning&lt;/span&gt;, not &lt;span style="font-style: italic;"&gt;truth&lt;/span&gt;. I have not actually given any definitions that specify when a statement is true or false. If we run a statement's computation, and it spits out "true" or "false", then obviously the statement is correspondingly true or false. However, what if we are stuck with a nonhalting case? The truth value is undefined. An intuitionist approach would &lt;span style="font-style: italic;"&gt;leave&lt;/span&gt; the value undefined. (So, any truths that are unknowable according to Gödel's incompleteness theorem are actually meaningful-but-undefined.) A classical approach would be more likely to define these facts: if the computation corresponding to a verifiable statement never outputs "true", then it is false; if the computation for a falsifiable statement never outputs "false", then it is true. 
(So, truths that are unknowable according to Gödel's incompleteness theorem are meaningful but take an infinite amount of time to verify/falsify.)&lt;br /&gt;&lt;br /&gt;I will generally use the classical version, but the reader should make up their own mind.&lt;br /&gt;&lt;br /&gt;Notice: the extra assumptions that I add for the classical version cannot be stated in first-order logic. There are two reasons. First, if for some particular computation I attempt to say "if it doesn't output 'true', then the corresponding statement is false", then I am being circular: "doesn't output 'true'" will be a statement of the sort that is either false (if the computation *does* output 'true'), or undefined. Since it is never definitely true, my statement carries no weight. (At this point the intuitionists in the audience should be saying "told you so".) Second, I need to state the assumption *for all sentences*, but so far my definitions have only allowed reference to a finite number of meaningful facts.&lt;br /&gt;&lt;br /&gt;So, are the classical assumptions meaningful? I want to justify them in terms of manipulation rules: I want to set things up so that if some logicians came and looked at the workings of the logic, with no prior knowledge, they would ascribe to it the semantics I have in mind.&lt;br /&gt;&lt;br /&gt;First. I want manipulation rules for the condition "if the computation doesn't output". For this, I can invoke the nonmonotonic logic that I described &lt;a href="http://dragonlogic-ai.blogspot.com/2008/06/basic-hyperlogic-my-search-for-proper.html"&gt;previously&lt;/a&gt;: we &lt;span style="font-style: italic;"&gt;assume&lt;/span&gt; that the computation &lt;span style="font-style: italic;"&gt;doesn't&lt;/span&gt; stop, and revise our belief when we find out that it does. Thus, our belief will eventually converge to the truth if we spend enough time checking. 
(We could also throw on fancier stuff, like weaker/stronger beliefs, so that our belief that the computation doesn't halt gets stronger as we run the computation for longer and longer.)&lt;br /&gt;&lt;br /&gt;Second. It seems fairly simple to extend the definition to allow statements to be about an infinite number of base-facts. In fact, it is quite necessary to do so! So far, I've only been discussing statements that apply to a "finite number of meaningful statements". For sensory statements, that makes the system incapable of generalizing. We want a practical AI logic to be able to learn rules that will apply to any new sensory data; surely such statements are meaningful. But, this is a larger shift than may at first be obvious. In terms of the logic, universal statements were up until now merely procedural, used essentially to indicate that some inference step can be made repeatedly. This step allows them to actually reference an infinite number of things. Is that really "meaningful"? Someone might object, for example, that all beings have a finite lifespan, so there is really only a finite amount of sensory data to be talked about. I would respond to that particular objection by saying that our lifespan is finite, but not bound by any particular number ahead of time. So, similarly, the reference to all sensory data isn't bound ahead of time. (In terms of the logic, this would mean that we need to be able to quantify over statements.)&lt;br /&gt;&lt;br /&gt;Both of these modifications can be understood in terms of &lt;a href="http://en.wikipedia.org/wiki/Super-recursive_algorithm"&gt;super-recursive algorithms&lt;/a&gt;. A super-recursive algorithm is an algorithm that is fairly easily implemented on a computer, yet inconsistent with formal models like Turing machines. 
Specifically, the first definition makes use of &lt;a href="http://en.wikipedia.org/wiki/Computation_in_the_limit"&gt;limit-computation&lt;/a&gt;, and the second makes use of both that and &lt;a href="http://en.wikipedia.org/wiki/Interactive_computation"&gt;interactive computation&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;A limit computation is a computation that only &lt;span style="font-style: italic;"&gt;definitely&lt;/span&gt; gives the correct answer after an infinite amount of time. The more time we give it, the more likely the answer is to be correct. This sort of computation is commonly used to render some types of fractals, like the &lt;a href="http://en.wikipedia.org/wiki/Mandelbrot_set"&gt;Mandelbrot set&lt;/a&gt;. For each square in the image, a limit-computation is approximated with some finite cutoff time. The longer we allow, the closer the image is to being correct. Here, it is being used in the first case to characterize the concept of not halting, and in the second it could arise in situations where a statement quantifies over an infinite number of meaningful statements.&lt;br /&gt;&lt;br /&gt;Interactive computation is what a computer does when it asks you (or the internet) questions during a process. A commonly-invoked image is the driver of a car. The driver is constantly adjusting to events that unfold in the surrounding traffic. The standard models of computation, based on Turing machines and similar constructions, require that all input is present before starting, and output is only available once finished. To apply such models of computation to driving, we would need to break the whole process up into small input/output phases-- which does not seem very useful. 
Interactive computation is used in the above when a statement references an infinite number of sensory items, so that the evaluation of the truth value must be a continual process that continues to take into account more input from the environment.&lt;br /&gt;&lt;br /&gt;So, what does the definition of meaning look like now?&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;-A sentence has meaning if, from some computably-specified set of meaningful data, there is either a computation that would eventually halt if the sentence is true and not otherwise, or a computation that would eventually halt if the sentence is false and not otherwise (if the data were available).&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Notice: not all truths are even limit-computable. It may look like it from the definition, but it's not the case. That is because a sentence may now reference an infinite number of limit-computable sentences. We could attempt to share time between some finite-but-large number of these, but in some cases the computation will be provably non-convergent: we won't eventually get better and better answers as we spend more time. One way of interpreting this is to say that some statements don't have meaning &lt;span style="font-style: italic;"&gt;in and of themselves&lt;/span&gt;, because they don't really have a working verification/falsification procedure tied to them. Instead, they are only meaningful relative to the larger web of statements.&lt;br /&gt;&lt;br /&gt;Further expansions of the definition are imaginable... for example, what if we replace both instances of "computable" in the above with "meaningful"? 
But, I think that is enough speculation for now.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-5331797343704785320?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/5331797343704785320/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/10/somewhat-broader-definition-of-meaning.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/5331797343704785320'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/5331797343704785320'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/10/somewhat-broader-definition-of-meaning.html' title=''/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-2990702836691077349</id><published>2008-10-05T12:13:00.000-07:00</published><updated>2008-10-05T14:18:26.402-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='set theory'/><category scheme='http://www.blogger.com/atom/ns#' term='foundations of mathematics'/><category scheme='http://www.blogger.com/atom/ns#' term='logic'/><title type='text'></title><content type='html'>&lt;span style="font-size:180%;"&gt;Finite Sets Prove Mathematical Induction Correct&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;In &lt;a href="http://dragonlogic-ai.blogspot.com/2008/09/some-questions-1.html"&gt;this post&lt;/a&gt;, I 
asked two questions:&lt;br /&gt;&lt;br /&gt;1. Where does our knowledge about nonhalting programs come from?&lt;br /&gt;&lt;br /&gt;2. What grounds the mathematical concept of "set"?&lt;br /&gt;&lt;br /&gt;One thing I didn't mention is that an answer to the second question would probably provide an answer to the first. This is because practically all of our knowledge about nonhalting programs can be proven in axiomatic set theory, and it is (somewhat) reasonable to assume that a "perfect" set theory (one that really did capture everything worth capturing about the human intuition of sets) would capture all of it.&lt;br /&gt;&lt;br /&gt;Now, I should qualify that a bit: explaining where the grounding comes from does not necessarily explain where the knowledge comes from, as illustrated with my logic that (in my opinion) has a grounded notion of "programs" and "computation", but which does not necessarily have any additional knowledge to accompany that grounding. But this is beside my point, which is...&lt;br /&gt;&lt;br /&gt;Finite sets prove mathematical induction correct!&lt;br /&gt;&lt;br /&gt;I mentioned in that post that most (if not all) knowledge about halting can be provided by the proof method called mathematical induction (no relation to the use of "induction" meaning "learning"). It is not difficult to prove the correctness of mathematical induction from set theory; but, I did not consider this route before, because full set theory is harder to justify than induction. 
However, I came across a paper by George Boolos called "The Justification of Mathematical Induction" which shows how this can, after all, be a useful approach.&lt;br /&gt;&lt;br /&gt;The paper justifies mathematical induction from only an extremely minimal set theory, plus the transitivity of less-than (if &lt;span style="font-weight: bold; font-style: italic;"&gt;a&lt;/span&gt; is less than &lt;span style="font-weight: bold; font-style: italic;"&gt;b&lt;/span&gt; and &lt;span style="font-weight: bold; font-style: italic;"&gt;b&lt;/span&gt; is less than &lt;span style="font-weight: bold; font-style: italic;"&gt;c&lt;/span&gt;, &lt;span style="font-weight: bold; font-style: italic;"&gt;a&lt;/span&gt; is less than &lt;span style="font-weight: bold; font-style: italic;"&gt;c&lt;/span&gt;). The set-theoretic fact needed is approximately this:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Pick any number, &lt;/span&gt;&lt;span style="font-weight: bold; font-style: italic;"&gt;n&lt;/span&gt;&lt;span style="font-style: italic;"&gt;, and formula, &lt;/span&gt;&lt;span style="font-weight: bold; font-style: italic;"&gt;F&lt;/span&gt;&lt;span style="font-style: italic;"&gt;. There exists a set &lt;/span&gt;&lt;span style="font-weight: bold; font-style: italic;"&gt;X&lt;/span&gt;&lt;span style="font-style: italic;"&gt; at level n, containing all and only those things existing at lower levels and satisfying &lt;/span&gt;&lt;span style="font-weight: bold; font-style: italic;"&gt;F&lt;/span&gt;&lt;span style="font-style: italic;"&gt;.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;This is an axiom schema, because we can fill in any formula &lt;span style="font-weight: bold;"&gt;F&lt;/span&gt; to get a specific statement. (First-order logic does not actually have a way of saying "for all formulas".) It establishes levels of existence, one for each natural number. 
It is "extremely weak" because without more principles, it only requires the existence of each possible finite set! There is no grounding problem, because each set that is forced to exist can be explicitly manipulated. (Note: the principle does not rule out the existence of more complicated sets.) I think this is quite satisfying.&lt;br /&gt;&lt;br /&gt;On the other hand, I don't have a good feel for where the extra knowledge "comes from"... I can read through the proof, but it is surprising that merely adding the ability to collect entities together into sets gives us so much power to prove important truths! It is dangerous, too. Dropping the notion of levels gives this schema:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Pick any&lt;/span&gt;&lt;span style="font-style: italic;"&gt; formula &lt;/span&gt;&lt;span style="font-weight: bold; font-style: italic;"&gt;F&lt;/span&gt;&lt;span style="font-style: italic;"&gt;. There exists a set &lt;/span&gt;&lt;span style="font-weight: bold; font-style: italic;"&gt;X&lt;/span&gt;&lt;span style="font-style: italic;"&gt;, containing all and only those things satisfying &lt;/span&gt;&lt;span style="font-weight: bold; font-style: italic;"&gt;F&lt;/span&gt;&lt;span style="font-style: italic;"&gt;.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;This is the naive set comprehension, which (as is well known) leads to a contradiction known as Russell's paradox. Russell's paradox cannot be derived from the version I gave that is restricted to levels. However, perhaps Russell's paradox can still serve as a lesson that, more often than not, an axiom schema will yield unexpected results.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold; font-style: italic;"&gt;Remaining Questions:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;How can talk of larger infinities be grounded, and how can their mathematical existence be proved? How can the way human mathematicians talk about them be justified? Or, is it all actually nonsense? 
How much of human mathematical knowledge does an AI need to be given before it can learn the rest? (Remember, all facts about halting are in principle learnable; the reason I'm worried about &lt;span style="font-style: italic;"&gt;proving&lt;/span&gt; some of them is that humans seem to be able to. Also, humans seem capable of coming to know facts about convergence of processes, which are &lt;span style="font-style: italic;"&gt;not&lt;/span&gt; in principle learnable &lt;span style="font-style: italic;"&gt;or&lt;/span&gt; provable, unless we really &lt;span style="font-style: italic;"&gt;can&lt;/span&gt; prove facts about halting.)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-2990702836691077349?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/2990702836691077349/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/10/finite-sets-prove-mathematical.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/2990702836691077349'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/2990702836691077349'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/10/finite-sets-prove-mathematical.html' title=''/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' 
src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-2131470932663919114</id><published>2008-09-24T10:39:00.000-07:00</published><updated>2008-10-05T12:13:07.507-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='self-reference'/><category scheme='http://www.blogger.com/atom/ns#' term='foundations of mathematics'/><category scheme='http://www.blogger.com/atom/ns#' term='logic'/><title type='text'></title><content type='html'>&lt;span style="font-size:180%;"&gt;Some Comments on the Previous Post&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;One thing that is very interesting about what I talked about last time-- it provides a surprisingly practical purpose for detailed talk about infinities. The bigger infinities you know how to talk about, the more logical systems you can prove consistent, giving you more information about which Turing machines never halt. But this is very strange! These Turing machines are not being run for so long that they enter into the realm of higher infinities. They either halt in &lt;span style="font-style: italic;"&gt;finite&lt;/span&gt; time, or keep looping for &lt;span style="font-style: italic;"&gt;one&lt;/span&gt; infinity (the smallest infinity, sometimes called omega). Why, then, does talk of greater and greater infinities become useful?&lt;br /&gt;&lt;br /&gt;The straightforward answer is to recount how that information is used to prove consistency (and therefore non-halting-ness) of a system. First, we assign a mathematical semantics to a system; a meaning for the system-states. Second, we show that the initial meanings are true. Third, we show that if one state is true, then the transition rules guarantee that the next state is true as well; in other words, we show that all manipulations are truth-preserving. This is the step where infinities come in. 
Although no infinities are involved in the actual states and transitions, the &lt;span style="font-style: italic;"&gt;semantics&lt;/span&gt; may include infinities, so our reasoning may rely on facts about infinities. This sounds silly, but it creeps in more readily than you might expect. If the logic that we are proving consistent uses no infinity, then we must use a single infinity (omega) when proving it consistent; if it uses a small infinity, we must use a larger one to prove it consistent; and so on.&lt;br /&gt;&lt;br /&gt;Utterly strange! Why does this method of proving consistency &lt;span style="font-style: italic;"&gt;work&lt;/span&gt;? We're somehow making the proof easier by treating facts about finite stuff as if they were facts about infinite, unreachable stuff.&lt;br /&gt;&lt;br /&gt;So, it seems that self-reference (trying to prove oneself consistent) would give grounding for as many infinities as one might desire. OK, but, how do we actually &lt;span style="font-style: italic;"&gt;get&lt;/span&gt; them? 
A normal logic is stuck with just the infinities it has, even when it starts to reason about itself.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-2131470932663919114?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/2131470932663919114/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/09/some-comments-on-previous-post-one.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/2131470932663919114'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/2131470932663919114'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/09/some-comments-on-previous-post-one.html' title=''/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-4093462788204983232</id><published>2008-09-22T10:48:00.000-07:00</published><updated>2008-09-22T12:07:16.389-07:00</updated><title type='text'></title><content type='html'>&lt;span style="font-size:180%;"&gt;Some Questions&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;1. Where does our knowledge about nonhalting programs come from?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;An arithmetical logic is doomed to incompleteness because no logic can capture all the facts about nonhalting programs. 
However, in practice, we humans use logics that capture &lt;span style="font-style: italic;"&gt;some&lt;/span&gt; of these facts. Which ones, and why? Well, mostly, we justify what we know about nonhalting programs via a special proof method called &lt;span style="font-style: italic;"&gt;mathematical induction&lt;/span&gt;. The correctness of arguments by mathematical induction is fairly obvious (once the method of argument has been properly explained to a person, anyway); but why? What justification can be provided? I've read a justification based on infinitary logic, but while interesting, the easiest way to formalize the argument in first-order logic would just rely on mathematical induction again. So how do we humans know mathematical induction is right?&lt;br /&gt;&lt;br /&gt;I've argued in the past that the best way to deal with the halting problem is to (1) assume a program doesn't halt, and (2) run it for a really long time, revising the assumption if it turns out that it does halt. This general idea could also account for us humans having a certain amount of inborn knowledge about the halting problem: evolution weeds out those with wrong inborn assumptions (which lead to wrong results), but favors those with the right assumptions (which lead to correct reasoning). But, is this really where we get mathematical induction? Is it just something that happens to work, and got embedded in our genes? That doesn't seem right! It seems totally implausible!&lt;br /&gt;&lt;br /&gt;A second source of our knowledge about the halting problem is our seeming knowledge about the consistency of formal systems. Goedel's 2nd incompleteness theorem shows that any (sufficiently powerful) formal system cannot prove its own consistency. But, as humans, we reason that if the system is correct it must be consistent, since the truth is not inconsistent. 
The formal systems agree with us on the first point (they "believe" their own formulas, so implicitly support their own correctness), but cannot reason from that to their own consistency.&lt;br /&gt;&lt;br /&gt;This is equivalent to knowledge about halting, because what we are saying is that a program that proves theorems until it arrives at an inconsistency will never halt. So, utilizing our human insight, we can add that fact to a logical system to increase its knowledge about halting! Call the first system S1, and the second S2. Since we believed that S1 was consistent, and S2 differs only in that it contains more true information, we will believe S2 to be consistent. If we add *that* fact, we can form a system S3. We know S3 is consistent, so we can record that knowledge to get S4. And so on. In this way, a human appears to have an infinite supply of true information concerning the halting problem, even over and above our knowledge coming from mathematical induction (which we can suppose was encoded in S1 already).&lt;br /&gt;&lt;br /&gt;So, we have two mysterious sources of information about the halting problem. Also, it seems that these two sources can be linked, because the truth is we can do somewhat better than simply saying "the axioms of arithmetic are obviously true, and therefore consistent". We can instead look to the &lt;a href="http://en.wikipedia.org/wiki/Gentzen%27s_consistency_proof"&gt;proof by Gentzen&lt;/a&gt;. This proof relies on a method of argument that is stronger than normal mathematical induction, called &lt;a href="http://en.wikipedia.org/wiki/Transfinite_recursion#Transfinite_recursion"&gt;transfinite induction&lt;/a&gt;. Basically, this extends normal induction to fairly small infinities (I think the exact class would be &lt;a href="http://en.wikipedia.org/wiki/Large_countable_ordinal"&gt;computable ordinals&lt;/a&gt;?). 
&lt;a href="http://www.rbjones.com/rbjpub/logic/inter014.htm"&gt;It seems&lt;/a&gt; that this principle extends to proving the consistency of more powerful logics as well: we can prove their consistency by using some variant of infinite induction, so long as we pick infinities larger than the system itself can refer to. (If anyone has a better reference about this, I'd love to hear about it.)&lt;br /&gt;&lt;br /&gt;So, this is how it looks: we &lt;span style="font-style: italic;"&gt;seem&lt;/span&gt; to have an uncanny ability to list larger and larger infinities, while any formal system is limited to some ceiling. Furthermore, it &lt;span style="font-style: italic;"&gt;seems&lt;/span&gt; we somehow justify applying mathematical induction to any such infinity, yielding more truths about halting than any particular formal system can muster.&lt;br /&gt;&lt;br /&gt;This situation is very strange. First, the process of listing larger infinities &lt;span style="font-style: italic;"&gt;seems&lt;/span&gt; well defined, yet we can't define it (because when we do, we can automatically come up with a larger infinity by saying "the first one not reachable by that method"). Second, mathematical induction &lt;span style="font-style: italic;"&gt;seems&lt;/span&gt; justifiable; it &lt;span style="font-style: italic;"&gt;seems&lt;/span&gt; different than the random (truly unknowable) facts about halting that "cause" Goedelian incompleteness. But if it is different, then where is the difference?&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;2. A concept of 'procedure' could be grounded in our own thought process, but what grounds the mathematical notion of 'set', especially 'uncountable set'?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I have mentioned this before, of course. It continues to puzzle me. 
Uncountable sets are hugely infinite, so it seems really really obvious that we cannot have any actual examples to ground our reference to them.&lt;br /&gt;&lt;br /&gt;The elementary facts about uncountable sets are proven by defining them in terms of &lt;span style="font-style: italic;"&gt;powersets&lt;/span&gt;. The powerset &lt;span style="font-weight: bold;"&gt;P&lt;/span&gt; of a set &lt;span style="font-weight: bold;"&gt;S&lt;/span&gt; is defined as the set of all subsets of &lt;span style="font-weight: bold;"&gt;S&lt;/span&gt;. The powerset of the set of integers is uncountable. However, there is a problem: when we try to give an actual definition, "all subsets" is hard to properly define. For finite sets, it is easy; but for infinite sets, it tends to collapse to "all referenceable subsets". Mathematicians know that there are &lt;span style="font-style: italic;"&gt;far&lt;/span&gt; more subsets that cannot be referenced than subsets that can be referenced, so this is no good.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-4093462788204983232?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/4093462788204983232/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/09/some-questions-1.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/4093462788204983232'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/4093462788204983232'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/09/some-questions-1.html' title=''/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-5161140726164217933</id><published>2008-09-03T11:42:00.000-07:00</published><updated>2008-09-03T12:31:22.866-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='logic'/><title type='text'></title><content type='html'>&lt;span class="Apple-style-span"  style="font-size:x-large;"&gt;Logic Standing Firmly on its Own Two Shoulders&lt;/span&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I have thought of an amusing and elegant variation of my grounded arithmetical logic (a variation on standard lambda logic, which I discussed &lt;a href="http://dragonlogic-ai.blogspot.com/2008/07/new-grounding-results-in-this-post-i.html"&gt;here&lt;/a&gt;, &lt;a href="http://dragonlogic-ai.blogspot.com/2008/08/paradox-i-discovered-paradox-in-version.html"&gt;here&lt;/a&gt;, and almost everywhere in between).&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The logic supposedly gets its grounding from the possession of a black box that executes programs represented as lambda terms-- the idea being that they &lt;span class="Apple-style-span" style="font-style: italic;"&gt;literally are&lt;/span&gt; programs, so cannot fail to be grounded (despite the fact that no logical description of the concept of computation is complete, which makes it sound impossible to represent logically).&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;But, this idea is odd; it sounds as if the logic is getting grounding from an external black box that it is hooked to. What is needed is an embodiment of procedural-ness. 
But, there is a natural source of this from within the logic: deduction is a procedural task, which is itself Turing-complete; so, it can serve as a representation for arbitrary computations, just as well as lambda-reduction can.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;So, the idea is to ground the logic in its own deduction system. This is not a radically new idea; &lt;a href="http://en.wikipedia.org/wiki/Provability_logic"&gt;provability logic&lt;/a&gt; is a well-established field of study. But, in any case, the idea is useful here if we don't want the logic's groundedness to rely on a lambda-evaluator that is outside of the logic itself.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;This is satisfying for another reason, as well. I talked previously, somewhat, about the "reason" that my version of lambda logic is more semantically capable than plain first-order logic: it is able to represent concepts with a closed-world assumption in addition to the standard open-world assumption. What this means is, the logic can define a predicate by stating some rules that hold for the entity, and then adding "that's all". Statements that don't follow from the rules are automatically false. This concept offers an explanation for where unprovably true statements come from: the "that's all" operator has a meaning that cannot possibly be covered by an associated proof method. (Or, at least not one that terminates in finite time.) 
So it is the origin of much mathematical mystery.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Unfortunately, it cannot be the source of &lt;span class="Apple-style-span" style="font-style: italic;"&gt;all&lt;/span&gt; mysteries; it allows us to talk about &lt;span class="Apple-style-span" style="font-style: italic;"&gt;arithmetical&lt;/span&gt; truths, but &lt;span class="Apple-style-span" style="font-style: italic;"&gt;not&lt;/span&gt; set-theoretical ones.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Anyway. The second nice feature of a logic grounded in its own deduction system is that it allows the "that's all" concept to be put directly to use. We can define a predicate by saying it is true if and only if it is provably true via a given set of statements.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;This construction would have some interesting behavior. The first thing to realize is the all-encompassing reach of the closed-world assumption. The construction should not be used, for example, when defining some property of bicycles: if we discover a new bicycle, we will be unable to describe it with that property. It is better suited for defining properties of sets that can be completely pre-specified, like the natural numbers. We will not happen upon any new numbers, so we are safe. To apply predicates to bicycles while still using the closed-world construction, we would need to do something fancy such as define a class of all possible bicycle-types, define a closed-world predicate on that, and then apply the predicate indirectly to actual bicycles by classifying them as instances of the abstract bicycle-types.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The strength of the system could be increased even further by adding truth-predicates via one of the methods in &lt;a href="http://dragonlogic-ai.blogspot.com/2008/08/others-inventions-my-goal-mentioned-in.html"&gt;this&lt;/a&gt; post, for example... 
but set theory would still not be grounded in the system.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-5161140726164217933?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/5161140726164217933/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/09/logic-standing-firmly-on-its-own-two.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/5161140726164217933'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/5161140726164217933'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/09/logic-standing-firmly-on-its-own-two.html' title=''/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-7372484838755814450</id><published>2008-08-27T09:53:00.000-07:00</published><updated>2008-08-27T10:50:03.344-07:00</updated><title type='text'></title><content type='html'>&lt;span style="font-size:180%;"&gt;A &lt;span style="font-style: italic;"&gt;More&lt;/span&gt; Direct Attack&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;So, as I noted last time, there are many possible ways to fit the predicate "True" into a language. Which of the many ways is correct? What does that even mean? What makes one better than another? Or, are all of them OK? 
Is it merely a matter of taste, since all of them capture the normal everyday uses of the concept of truth? In other words, is "truth" merely a concept that might be used in slightly different ways by different people, meaning there is no right answer?&lt;br /&gt;&lt;br /&gt;I also noted last time that these logics do not seem to automatically solve the other problems I'm worried about; in particular, they do fairly little in the way of providing a foundation of mathematics.&lt;br /&gt;&lt;br /&gt;The goal is to create a logic that can talk about its own semantics; a logic that can reference any mathematical ideas that were needed in describing the logic in the first place. Tarski started the whole line of research in this way; he asked, &lt;span style="font-style: italic;"&gt;if&lt;/span&gt; a logic could define its own notion of truth, &lt;span style="font-style: italic;"&gt;then&lt;/span&gt; what? (And he showed that, given some additional assumptions, such logics turn out to be contradictory.) But the theories I referenced last time, and those like them, have diverged somewhat from this line of inquiry... instead of providing a logic capable of defining its own truth predicate based on the semantic notions a human would employ in explaining the logic to another, these theories simply put the truth predicate in the language to begin with, and attempt to state rules that make it behave as much like our concept of truth as possible without allowing Tarski's contradiction.&lt;br /&gt;&lt;br /&gt;That's totally different!&lt;br /&gt;&lt;br /&gt;So, a &lt;span style="font-style: italic;"&gt;more&lt;/span&gt; direct approach would be to try to create a logic that can define itself; a logic that can develop its notion of truth from more basic concepts, rather than declaring it as given. 
I suspect that such an attempt would &lt;span style="font-style: italic;"&gt;not &lt;/span&gt;result in the same multitude of possible solutions.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;The Logic of Definitions&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;On a related train of thought-- I have come up with another idea for a logic. The idea was based on the observation that many of the proposed solutions to defining truth were actually proposals for a theory of definitions in general; truth was then treated as a defined term, using the following definition, or some variant:&lt;br /&gt;&lt;br /&gt;" "X" is true" means "X".&lt;br /&gt;&lt;br /&gt;The revision theory treats this as a rule of belief revision: if we believe "X", we should revise our belief to also accept " "X" is true". The supervaluation theories claim that the above definition is incomplete (aka "vague"), and formulate a theory about how to work with incomplete definitions. (But, see &lt;a href="http://theoriesnthings.blogspot.com/2008/02/supervaluation-word.html"&gt;this&lt;/a&gt; blog for a note on the ambiguity of the term "supervaluation".)&lt;br /&gt;&lt;br /&gt;So, I have my own ideas about how to create a logic of definitions.&lt;br /&gt;&lt;br /&gt;--The base logic should be normal classical logic. It could simply be first-order logic if that is all you need in order to define your terms; it could be arithmetic if that is needed; or it could be axiomatic set theory, if you need it to define what you want to define. In other words, the basic theory will depend on what you want to talk about with your definitions, but the logic that theory is set in will be classical.&lt;br /&gt;--The logic of the defined concepts, however, will be violently non-classical. A defined predicate may be neither true nor false in some cases, because its definition simply fails to say anything one way or the other. 
It could be &lt;span style="font-style: italic;"&gt;both&lt;/span&gt; true &lt;span style="font-style: italic;"&gt;and&lt;/span&gt; false in other cases, when the definition implies both.&lt;br /&gt;&lt;br /&gt;This is, of course, only a rough outline of the logic... it differs from most treatments of definitions (such as revision theory and supervaluation theory) by allowing not just statements that are assigned no value, but also statements that get both truth values.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-7372484838755814450?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/7372484838755814450/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/08/more-direct-attack-so-as-i-noted-last.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/7372484838755814450'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/7372484838755814450'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/08/more-direct-attack-so-as-i-noted-last.html' title=''/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-8694955259865376563</id><published>2008-08-22T12:59:00.000-07:00</published><updated>2008-08-22T13:45:14.877-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='truth'/><category 
scheme='http://www.blogger.com/atom/ns#' term='math'/><category scheme='http://www.blogger.com/atom/ns#' term='logic'/><title type='text'></title><content type='html'>&lt;span style="font-size:180%;"&gt;Other's Inventions&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;My goal mentioned in the previous post was creating a logic that can reason about its own semantics (somehow sidestepping Tarski's proof of the impossibility of this). Well. This is a very ambitious goal. Luckily, I am not the first to desire such a thing, and there is a vast body of work on the subject. I find not one solution, but three:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://asweb.artsci.uc.edu/philosophy/gauker/KripkeTruth.pdf"&gt;Fixed Point Theories&lt;/a&gt;&lt;br /&gt;&lt;a href="http://theoriesnthings.blogspot.com/2008/02/supervaluation-word.html"&gt;Supervaluation Theory&lt;/a&gt;&lt;br /&gt;&lt;a href="http://plato.stanford.edu/entries/truth-revision/"&gt;Revision Theory&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;There are other theories than these, but these are the ones I saw mentioned the most.&lt;br /&gt;&lt;br /&gt;It is comforting to see that "grounding" is often mentioned in connection with these theories, although the word is not being used in quite the same way I am currently using it. The word comes up particularly often in association with fixed-point theories.&lt;br /&gt;&lt;br /&gt;Some of the differences between the three theories are superficial. For example, fixed-point theories are often stated in terms of three truth values: true, false, and neither-true-nor-false. Revision theorists reject this idea, and instead simply fail to assign a truth value to some sentences. This makes no difference for the math, however.&lt;br /&gt;&lt;br /&gt;So, in a sense, the three theories are very similar: each describes some method of assigning truth values to some, but not all, sentences. 
Each of the methods sounds overly complicated at first, compared to typical paradox-free mathematical settings, in which truth-assignment is very straightforward. But, once one realizes that the complexity is needed to avoid assigning truth values to meaningless statements such as paradoxes, each of the methods seems fairly intuitive.&lt;br /&gt;&lt;br /&gt;Unfortunately, each method yields different results. So which one is right?&lt;br /&gt;&lt;br /&gt;I've got to do a lot more research, to find out more about how they differ, other alternative theories, and so on. I suspect the fact that I am looking for concrete results rather than merely a philosophically satisfying logic will help guide me somewhat, but I am not sure.&lt;br /&gt;&lt;br /&gt;Also, it appears that most of these logics are developed as modifications of arithmetic, rather than set theory. But, I am looking for a logic that can serve as a foundation for mathematics. It appears that a "direct attack" on the &lt;span style="font-style: italic;"&gt;self&lt;/span&gt;-reference problem does not automatically get me what I want in terms of the &lt;span style="font-style: italic;"&gt;general&lt;/span&gt; reference problem. 
So, I still need a good theory on how mathematical facts about uncountable sets can be grounded...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-8694955259865376563?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/8694955259865376563/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/08/others-inventions-my-goal-mentioned-in.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/8694955259865376563'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/8694955259865376563'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/08/others-inventions-my-goal-mentioned-in.html' title=''/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-348066551872316304</id><published>2008-08-14T10:33:00.000-07:00</published><updated>2008-08-14T13:14:21.102-07:00</updated><title type='text'></title><content type='html'>&lt;span style="font-size:180%;"&gt;Direct Attack&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;My approach thus far to the issue of finding the correct logic has been incremental. For each logic, find something I cannot represent but want to, and attempt to extend the logic to include that. 
So, since I've settled on a form of lambda logic as a logic for arithmetical concepts, the next step would be to look for an extension into the hyperarithmetical hierarchy or the analytic hierarchy, or possibly to set theory. These form a progression:&lt;br /&gt;&lt;br /&gt;arithmetic hierarchy &lt; hyperarithmetic hierarchy &lt; analytical hierarchy &lt; axiomatic set theory&lt;br /&gt;&lt;br /&gt;So, a jump directly to set theory would be nice, but the hyperarithmetic hierarchy would be the easiest next step.&lt;br /&gt;&lt;br /&gt;This process can continue indefinitely, thanks to Tarski's Undefinability Theorem: no logic is able to define its own semantics, so if I want to find a way to extend some logic X, the standard way would be to look for a logic Y that can define the semantics of X. However, my vague hope is that if I work in these incremental extensions for long enough, I will eventually understand the undefinability principle well enough to see a way around it, thus making one final extension to the logic that encompasses all possible extensions.&lt;br /&gt;&lt;br /&gt;This is a silly idea.&lt;br /&gt;&lt;br /&gt;So, what about a direct attack? For it to work, I need to find a way around the Undefinability theorem. The undefinability theorem makes some specific assumptions; so, I know before starting that the logic I come up with needs to break these assumptions.&lt;br /&gt;&lt;br /&gt;I'll outline the theorem's proof.&lt;br /&gt;&lt;br /&gt;First, Tarski asks us to assume a minimal fact about the concept of truth: for all statements X, "X is true" holds if and only if X.&lt;br /&gt;&lt;br /&gt;T(x) &lt;=&gt; x&lt;br /&gt;&lt;br /&gt;This is called "Tarski's T-Schema". The idea is that if a formalization does not satisfy this, it cannot be a concept of truth. It is possible that more than the T-schema is required for a proper notion of truth, but any further requirements aren't needed for Tarski's proof. 
(So, the T-schema is one of the critical assumptions to scrutinize. Is it really a necessary property of truth?)&lt;br /&gt;&lt;br /&gt;Next, assume that some logic &lt;span style="font-style: italic;"&gt;can&lt;/span&gt; talk about its own concept of truth. The idea is to prove that it can then say "This sentence is false", causing a paradox within that logic.&lt;br /&gt;&lt;br /&gt;To prove this, we need one more assumption: the logic must contain arithmetic. This is a rather large assumption. However, it is one I am obviously making. So let's continue.&lt;br /&gt;&lt;br /&gt;The reason arithmetic is required for the proof is that the Diagonal Lemma holds for any logic extending arithmetic. (So, if the Diagonal Lemma holds for some other logic, we don't need to require that the logic extends arithmetic.)&lt;br /&gt;&lt;br /&gt;The Diagonal Lemma states that if we can represent some predicate P, then we can create a sentence that says "I am P". In short, the Diagonal Lemma shows that any language extending arithmetic contains self-referential sentences.&lt;br /&gt;&lt;br /&gt;Now: if the language can negate predicates, then it can express the predicate "not true" (since we've already assumed it can express "true"). Therefore, by the Diagonal Lemma, it can express "I am not true". Call this sentence Q.&lt;br /&gt;&lt;br /&gt;By Q's definition, we have Q&lt;=&gt;~T(Q). (In English, this just says that Q and "Q is not true" say the same thing.) By the T-schema, we have Q&lt;=&gt;T(Q) (meaning Q and "Q is true" say the same thing). Therefore, we have T(Q)&lt;=&gt;~T(Q) (meaning "Q is true" and "Q is not true" say the same thing). Now we seem to have serious problems. In a classical logic, this proves a contradiction, finishing Tarski's proof that no such logic can exist.&lt;br /&gt;&lt;br /&gt;So, it seems that there are plenty of ways to get around the undefinability theorem. 
However, it is not obvious which one is the &lt;span style="font-style: italic;"&gt;right&lt;/span&gt; way.&lt;br /&gt;&lt;br /&gt;Here is a list of some assumptions that could be violated:&lt;br /&gt;&lt;br /&gt;1. The T-schema&lt;br /&gt;2. The Diagonal Lemma&lt;br /&gt;3. If we can express a predicate P, we can express its negation ~P&lt;br /&gt;4. Every sentence is either true or false&lt;br /&gt;5. No sentence is both true and false&lt;br /&gt;&lt;br /&gt;This is not a new problem, so there are many proposed solutions. Of course, my version is somewhat unique, since I want a solution that not only allows "truth" to be defined, but also any other coherent concept that a human might wish to define.&lt;br /&gt;&lt;br /&gt;Some initial thoughts.&lt;br /&gt;&lt;br /&gt;The T-schema seems solid, although I can't rule out the possibility of counterintuitive exceptions.&lt;br /&gt;&lt;br /&gt;The Diagonal Lemma is an assumption I cannot break, since what I am proposing is a logic that can refer to anything a human can refer to. It is &lt;span style="font-style: italic;"&gt;conceivable&lt;/span&gt; that the logic humans use does not admit the Diagonal Lemma, but if that is the case, then my goal is impossible because it implies that humans cannot define the logic that they use. If my goal is achievable, then it is achievable with a self-referential logic.&lt;br /&gt;&lt;br /&gt;Assumption 3 seems solid, but again maybe there are some strange counterintuitive exceptions.&lt;br /&gt;&lt;br /&gt;Assumption 4 seems false in the presence of sentence Q; it is far from obvious that sentence Q is actually true or false.&lt;br /&gt;&lt;br /&gt;Q could also be seen as a counterexample to assumption 5, if it is seen as &lt;span style="font-style: italic;"&gt;both&lt;/span&gt; true and false rather than neither. 
Personally I prefer neither.&lt;br /&gt;&lt;br /&gt;Source:&lt;br /&gt;http://plato.stanford.edu/entries/self-reference/#Matt-sema&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-348066551872316304?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/348066551872316304/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/08/direct-attack-my-approach-thus-far-to.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/348066551872316304'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/348066551872316304'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/08/direct-attack-my-approach-thus-far-to.html' title=''/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-8974519994493823065</id><published>2008-08-12T10:26:00.001-07:00</published><updated>2008-08-12T11:20:04.814-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='lambda calculus'/><category scheme='http://www.blogger.com/atom/ns#' term='logic'/><title type='text'></title><content type='html'>&lt;span style="font-size:180%;"&gt;Paradox&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I discovered a paradox in the version of lambda logic that I am creating. It is not difficult to fix. 
However, I'll explain the paradox before I explain how I fixed it.&lt;br /&gt;&lt;br /&gt;Here it is symbolically (or rather, semisymbolically, because I'm not using any math rendering):&lt;br /&gt;&lt;br /&gt;L = lambda x . (x x)&lt;br /&gt;H = lambda x . exists y . output((x x), y)&lt;br /&gt;I = lambda x . (H x (L L) 1)&lt;br /&gt;&lt;br /&gt;With these definitions, the expression (H I) is paradoxical. I'll explain why.&lt;br /&gt;&lt;br /&gt;The first important thing is that I am treating an existential statement as if it were a function that returned true or false. In lambda logic (the official version), this is allowed:&lt;br /&gt;&lt;br /&gt;True = lambda x y . x&lt;br /&gt;False = lambda x y . y&lt;br /&gt;&lt;br /&gt;So, True is a function that takes 2 arguments and returns the first, while False takes two and returns the second. This should be thought of in terms of if/then/else statements. We can make triples (a b c), and if &lt;span style="font-style: italic; font-weight: bold;"&gt;a&lt;/span&gt; evaluates to True, &lt;span style="font-style: italic; font-weight: bold;"&gt;b&lt;/span&gt; is returned (the "then" clause); but if &lt;span style="font-style: italic;"&gt;a&lt;/span&gt; turns out false, then &lt;span style="font-style: italic; font-weight: bold;"&gt;c&lt;/span&gt; gets returned.&lt;br /&gt;&lt;br /&gt;I define &lt;span style="font-weight: bold;"&gt;L&lt;/span&gt; for convenience. &lt;span style="font-weight: bold;"&gt;L&lt;/span&gt; takes an input, x, and returns x applied to itself. The compound statement (&lt;span style="font-weight: bold;"&gt;L L&lt;/span&gt;) is an infinite loop in lambda calculus: it takes &lt;span style="font-weight: bold;"&gt;L&lt;/span&gt; and applies it to itself, resulting in (&lt;span style="font-weight: bold;"&gt;L L&lt;/span&gt;) again. 
This isn't paradoxical in itself, because the logic does not assume that a result exists for every computation.&lt;br /&gt;&lt;br /&gt;The second term, &lt;span style="font-weight: bold;"&gt;H&lt;/span&gt;, takes a term and detects whether it has a result when applied to itself. That is the meaning of "exists y . output((x x), y)": there exists an output of (x x). (I could have used the &lt;span style="font-weight: bold;"&gt;L&lt;/span&gt; construct, writing (&lt;span style="font-weight: bold;"&gt;L&lt;/span&gt; x) instead of (x x), but that wouldn't be any shorter, would it?) So, the way to read (&lt;span style="font-weight: bold;"&gt;H&lt;/span&gt; x) is "x outputs something when handed itself as input".&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;I&lt;/span&gt; is what I call the switch operation. If &lt;span style="font-weight: bold;"&gt;F&lt;/span&gt; is a function that returns some result (when applied to itself), then (&lt;span style="font-weight: bold;"&gt;I F&lt;/span&gt;) doesn't return anything. If F doesn't, (&lt;span style="font-weight: bold;"&gt;I F&lt;/span&gt;) does. How does it work? Well, &lt;span style="font-weight: bold;"&gt;I&lt;/span&gt; is written in the if-then-else format. The "if" part is the "H x", meaning "x returns something when applied to itself". The "then" part is (&lt;span style="font-weight: bold;"&gt;L L&lt;/span&gt;), a statement that never returns anything. The "else" is just 1, a number. Any concrete thing could be put in this spot; it just gives the function something definite to return.&lt;br /&gt;&lt;br /&gt;Now, the paradox comes when we consider (&lt;span style="font-weight: bold;"&gt;H I&lt;/span&gt;). This expression asks the question, "Does the switch operation return anything when applied to itself?" Either it does or it doesn't. If it does, then (because it is the switch operator), it doesn't. If it doesn't, then it does. Contradiction. 
Paradox.&lt;br /&gt;&lt;br /&gt;The problem is that I am allowing lambda-terms to be formed with non-computable elements inside them (such as the existential statement inside H). To avoid the paradox, but preserve the ability to represent any statement on the arithmetical hierarchy, I should restrict the formation of lambda terms. A set of computable basic functions should be defined. The list could include numerical equality, addition, subtraction, multiplication, exponentiation, et cetera. (Any computable operation that takes nonnegative integers and returns nonnegative integers.) However, it is sufficient to include only the function F(x), where F(x)=x+1. (Everything else can be developed from this, using lambda calculus.) Also-- it should be understood that the terms that can be created (from these basic computable terms plus lambda) are &lt;span style="font-style: italic;"&gt;all the entities that exist within the logic&lt;/span&gt;. So, if the logic says things like "exists y . output((x x), y)", it should be understood that the "y" that exists is one of these terms.&lt;br /&gt;&lt;br /&gt;Now, why doesn't the official version of lambda logic need to avoid throwing existential statements inside lambda expressions? The reason is the strong predicate I use, "output(x, y)", which means "y is the result of evaluating x". Lambda logic proper only uses a function that represents the performance of a single computational step. 
If I try to define &lt;span style="font-weight: bold;"&gt;H&lt;/span&gt;, I get a statement that always evaluates to True, because there is &lt;span style="font-style: italic;"&gt;always&lt;/span&gt; a result of performing a single computational step.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-8974519994493823065?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/8974519994493823065/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/08/paradox-i-discovered-paradox-in-version.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/8974519994493823065'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/8974519994493823065'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/08/paradox-i-discovered-paradox-in-version.html' title=''/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-4783673755082429990</id><published>2008-08-06T08:07:00.001-07:00</published><updated>2008-08-06T08:15:39.947-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='lambda calculus'/><category scheme='http://www.blogger.com/atom/ns#' term='logic'/><title type='text'></title><content type='html'>&lt;span style="font-size:180%;"&gt;More on Lambda Logic&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I've read 
more about Lambda Logic. The Lambda Logic that I found is actually a bit different than the one that I was considering on my own. First, it is not referentially strong-- the logic has a complete inference method, which means it can only talk about provable things. So, it can't really reference lambda-computations in the way I want, because there are unprovable facts about computations.&lt;br /&gt;&lt;br /&gt;Another interesting point-- Lambda Logic is inconsistent with the &lt;a href="http://en.wikipedia.org/wiki/Axiom_of_choice"&gt;Axiom of Choice&lt;/a&gt;!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-4783673755082429990?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/4783673755082429990/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/08/more-on-lambda-logic-ive-read-more.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/4783673755082429990'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/4783673755082429990'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/08/more-on-lambda-logic-ive-read-more.html' title=''/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' 
src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-7670423030725893977</id><published>2008-08-05T14:08:00.000-07:00</published><updated>2008-08-05T14:21:50.860-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='lambda calculus'/><category scheme='http://www.blogger.com/atom/ns#' term='logic'/><title type='text'></title><content type='html'>&lt;span style="font-size:180%;"&gt;Latest Finds&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;My logic exists!&lt;br /&gt;Its official name is "&lt;a href="http://www.michaelbeeson.com/research/papers/LambdaLogicOriginal.pdf"&gt;lambda logic&lt;/a&gt;". Makes sense.&lt;br /&gt;&lt;br /&gt;Also, the &lt;a href="http://www.opencog.org/wiki/OpenCogPrime:WikiBook"&gt;Opencog Prime documentation&lt;/a&gt; was recently released. Worth a look! The knowledge representation used there is &lt;span style="font-style: italic;"&gt;also&lt;/span&gt; of the form I've been considering-- there are both explicitly represented facts, and facts that are represented implicitly by programs.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-7670423030725893977?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/7670423030725893977/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/08/my-logic-exists-its-official-name-is.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/7670423030725893977'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/7670423030725893977'/><link rel='alternate' 
type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/08/my-logic-exists-its-official-name-is.html' title=''/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-7224522405144618367</id><published>2008-07-29T11:55:00.000-07:00</published><updated>2008-07-29T12:47:25.422-07:00</updated><title type='text'></title><content type='html'>&lt;span style="font-size:180%;"&gt;What's Up?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://dragonlogic-ai.blogspot.com/2008/07/new-grounding-results-in-this-post-i.html"&gt;Last time&lt;/a&gt;, I gave a fairly simple logic that satisfies me in its ability to refer to mathematical concepts that fall in the realm mathematicians call the Arithmetical Hierarchy. Fun. But perhaps you think something is fishy-- it isn't too hard to design a deduction system that uses this logic I've described, but it doesn't seem like it would be able to do anything that a normal logic engine can't do. What's up?&lt;br /&gt;&lt;br /&gt;Perhaps you're thinking that I'm being a bit philosophical, that I'm not worrying about something that actually matters to people trying to implement an AI system. The issue sounds pretty abstract. "Is the logic grounded? Does it &lt;span style="font-style: italic;"&gt;Refer&lt;/span&gt;?"&lt;br /&gt;&lt;br /&gt;Well, anyway, here's why it actually matters.&lt;br /&gt;&lt;br /&gt;The logic doesn't do much by itself. But it can be conveniently augmented by a nonmonotonic reasoning system like the one I talked about &lt;a href="http://dragonlogic-ai.blogspot.com/2008/06/basic-hyperlogic-my-search-for-proper.html"&gt;previously&lt;/a&gt;. 
What's nice is that the nonmonotonic theory is automatically specified by the structure of the first-order-logic-plus-lambda-calculus theory, which takes care of some issues that were troubling me. (If I wanted to make that nonmonotonic logic look like a normal nonmonotonic logic, then I would need to add some strange control operators specifying a nesting of nonmonotonicness corresponding to the arithmetical levels. Maybe that is OK. But it gets worse if you ask what happens when someone comes along and arranges those nesting operators in a way that creates recursion.)&lt;br /&gt;&lt;br /&gt;A computable predicate is a sentence containing unbound terms, halting lambda terms, and predicates that can be determined immediately (such as equality for numbers that are given in a standardized notation). Boolean combinations of such predicates are also computable predicates. When we add quantifiers to these statements, we can easily determine where they fall on the arithmetical hierarchy (although we cannot guarantee that this is the lowest order, since there could be a simpler way of writing the same thing, and it could fall on a lower order). This allows us to use the procedure previously defined to nonmonotonically reason about all such statements.&lt;br /&gt;&lt;br /&gt;Another way that this union of lambda calculus and first-order logic could matter would be in inductive systems. If a system learns models based on the combined logic, it might be able to usefully differentiate between models that would be equivalent otherwise. As I discussed in the last post, the difference is that the semantics of lambda calculus assumes a closed world in which no new rules or propositions will be added, while the first-order world is fundamentally open. A theory that uses lambdas is therefore strictly more specific than the pseudo-equivalent first-order theory that is able to make all the same positive deductions; the first-order theory fails to rule out some things that the theory with lambda can. 
How can we take advantage of this?&lt;br /&gt;&lt;br /&gt;Well, by probability theory, a more specific theory should make the &lt;span style="font-style: italic;"&gt;actual&lt;/span&gt; data more likely, by ruling out a greater number of alternative possibilities. To get this to work for us, we've got to actually enforce the open-world-ness of pure first-order logic when we assign probabilities. This is not typically something people worry too much about. It is easy to see this as a bit philosophical; I might be fixing a problem that people do not have, because they go ahead and make closed-world assumptions where they need to in practice. But I am describing a formal way of looking at these closed-world assumptions, and that seems useful.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-7224522405144618367?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/7224522405144618367/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/07/whats-up-last-time-i-gave-fairly-simple.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/7224522405144618367'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/7224522405144618367'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/07/whats-up-last-time-i-gave-fairly-simple.html' title=''/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' 
src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-6820442378797647617</id><published>2008-07-21T09:55:00.000-07:00</published><updated>2008-07-21T14:03:57.005-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='hyperlogic'/><category scheme='http://www.blogger.com/atom/ns#' term='lambda calculus'/><category scheme='http://www.blogger.com/atom/ns#' term='computation'/><category scheme='http://www.blogger.com/atom/ns#' term='logic'/><title type='text'></title><content type='html'>&lt;span style="font-size:180%;"&gt;New Grounding Results&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;In &lt;a href="http://dragonlogic-ai.blogspot.com/2008/07/new-grounding-criteria-in-this-post-i.html"&gt;this post&lt;/a&gt; I detailed a different (and hopefully better) grounding requirement for an AI logic. I've now come up with a logic that satisfies the requirement, based on the speculations in that post.&lt;br /&gt;&lt;br /&gt;This logic is nicely simple. All I do is add &lt;a href="http://en.wikipedia.org/wiki/Lambda_calculus"&gt;lambda calculus&lt;/a&gt; to &lt;a href="http://en.wikipedia.org/wiki/First_order_logic"&gt;first-order logic&lt;/a&gt;. The lambda calculus is used as a new way of notating &lt;a href="http://en.wikipedia.org/wiki/First_order_logic#Formation_rules"&gt;terms&lt;/a&gt;, acting like function-symbols act normally. The difference is that lambda-calculus is able to uniquely define computations, while normal function symbols cannot.&lt;br /&gt;&lt;br /&gt;The reason first-order logic cannot uniquely define computations is that it cannot rule out &lt;a href="http://en.wikipedia.org/wiki/Non-standard_model"&gt;nonstandard interpretations&lt;/a&gt;. Let's say we try to define some method of computation (such as lambda calculus, Turing machines, etc.). 
Determining the next step in the computation is always very simple; it wouldn't be a very useful definition of computation otherwise. And since a computation is nothing but a series of these simple steps, it might seem like all we need to do is define what happens in a single step. Once we do that, the logic knows what to do next at each stage, and can carry any particular computation from start to finish without error.&lt;br /&gt;&lt;br /&gt;The problem here is that we are only specifying what the computation &lt;span style="font-style: italic;"&gt;does&lt;/span&gt; do, not what it &lt;span style="font-style: italic;"&gt;doesn't&lt;/span&gt;. In other words, the first-order logic will carry out the computation correctly, but it will not "know" that the steps it is taking are the &lt;span style="font-style: italic;"&gt;only&lt;/span&gt; steps the computation takes, or that the result it gets is the  &lt;span style="font-style: italic;"&gt;only&lt;/span&gt; result of that computation. This may sound inane, but first-order logic makes no assumptions. It only knows what we tell it.&lt;br /&gt;&lt;br /&gt;The problem is, with just first-order logic, there &lt;span style="font-style: italic;"&gt;is&lt;/span&gt; no way to say what we need to here. If there was an "and that's all" operator, we could list the positive facts about computations, and say "that's all". The logic would then know that any computational step not derivable from those basic steps never happens. But there is no such operator.  I can tell it various things about what the computation does not do, but I will never finish my list.&lt;br /&gt;&lt;br /&gt;Another way of putting this is that we can positively characterize computation, but we cannot negatively characterize it. Why not? Well, first-order logic is &lt;span style="font-style: italic;"&gt;complete&lt;/span&gt;; this means that the rules of deduction can find all implications of statements we make. 
If we were able to state all negative facts, it would be able to find all consequences of them. But this would go against fundamental facts of computation. In particular, we know there is no way of deducing  &lt;a href="http://en.wikipedia.org/wiki/Halting_problem"&gt;which computations eventually halt&lt;/a&gt;. If computations could be totally negatively characterized in a complete logic like first-order logic, the deduction rules would be capable of telling us this. So, this must be impossible.&lt;br /&gt;&lt;br /&gt;In a sense, the reason we can't characterize computations negatively is because first-order logic has an &lt;a href="http://en.wikipedia.org/wiki/Open_world_assumption"&gt;open-world assumption&lt;/a&gt;. This means that if the deduction rules do not prove a statement true or false, it could be either. This is as opposed to a &lt;a href="http://en.wikipedia.org/wiki/Closed_World_Assumption"&gt;closed-world assumption&lt;/a&gt;, which would mean that if a statement is not proven true, it must be false.&lt;br /&gt;&lt;br /&gt;So how does inserting lambda calculus into the mess help?&lt;br /&gt;&lt;br /&gt;Lambda calculus is sufficiently closed to solve the problem. While a first-order theory can be added to without changing the meaning of what's already been stated, a program (stated in lambda calculus) cannot be altered freely. If we modify it, we simply have a different program on our hands. This means we can completely characterize many things that first-order logic alone cannot.&lt;br /&gt;&lt;br /&gt;Fleshing out some more details of the logic:&lt;br /&gt;&lt;br /&gt;-The deduction system should be a mix of a regular first-order deduction system, and a set of rules for the normal manipulation of the lambda calculus. This is as opposed to &lt;span style="font-style: italic;"&gt;axioms&lt;/span&gt; describing how to manipulate the lambda calculus. 
Axioms would be in first-order logic and would therefore do us no good in terms of my grounding requirement, because they would not negatively characterize. The rules negatively characterize because they cannot be added to. (This is a petty distinction in a way; in practice, it wouldn't matter much, since nobody would add more facts to a running system concerning the lambda-calculus manipulation. But they &lt;span style="font-style: italic;"&gt;could&lt;/span&gt;, and the system would accept these facts, whereas they &lt;span style="font-style: italic;"&gt;can't &lt;/span&gt;add new rules without changing the system.)&lt;br /&gt;&lt;br /&gt;-There is a question about how much to tell the first-order logic about its embedded lambda calculus. We could tell it nothing. But it seems practical for the system to have explicit first-order knowledge about the lambda-terms, so that it could deduce some facts about their manipulation (perhaps deducing facts about the results of certain calculations more quickly than could be explicitly checked by running all of those calculations). However, I have already said that we could keep telling the system more and more facts about computation, and yet never finish. So where do we stop? I have two ideas here. First, it seems reasonable to tell the system only the positive facts about the lambda calculus, since the negative ones are the ones that cause trouble. A few basic negative facts could be added at whim, if desired. This is very flexible. Second, we do not need to limit ourselves to first-order logic. We have lambda calculus at our disposal, and we can use it! In addition to placing the rules of manipulation outside of the system, as deduction rules, we could encode them within the system, using lambda calculus. 
This would allow the system to "know the deduction rules" despite it being impossible to completely characterize them in first-order logic.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-6820442378797647617?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/6820442378797647617/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/07/new-grounding-results-in-this-post-i.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/6820442378797647617'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/6820442378797647617'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/07/new-grounding-results-in-this-post-i.html' title=''/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-7000496778467783570</id><published>2008-07-14T14:18:00.000-07:00</published><updated>2008-07-14T15:11:17.146-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='hyperlogic hypercomputation'/><title type='text'></title><content type='html'>&lt;span style="font-size:180%;"&gt;New Grounding Criteria&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;In &lt;a href="http://dragonlogic-ai.blogspot.com/2008/06/basic-hyperlogic-my-search-for-proper.html"&gt;this post&lt;/a&gt; I talk about the need for a logic's semantics to be in 
some sense determined by its deduction rules. If this isn't the case, then the logic is not suitable for artificial intelligence: the system's behavior will not reflect the intended meaning of the symbols. In other words, the logic's meaning needs to be "grounded" by its use.&lt;br /&gt;&lt;br /&gt;Unfortunately, although I presented some ideas (and added more in this post), I didn't come up with a real definition of the necessary relationship between syntax and semantics.&lt;br /&gt;&lt;br /&gt;The general idea that I had was to ground a statement by specifying a sensible procedure for determining the truth of the statement. Due to Gödel's incompleteness theorem, it is impossible for such a procedure to be always correct, so I set about attempting to define approximately correct methods.&lt;br /&gt;&lt;br /&gt;But, now, I am questioning that approach. I've decided on a different (still somewhat ambiguous) grounding requirement. Rather than requiring that the system know what to do to &lt;span style="font-style: italic;"&gt;get&lt;/span&gt; a statement, I could require that it know what to &lt;span style="font-style: italic;"&gt;do&lt;/span&gt; with it once it &lt;span style="font-style: italic;"&gt;has&lt;/span&gt; it. In other words, a statement that is not meaningful on its own (in terms of manipulating the world) may be meaningful in terms of other statements (it helps manipulate meaningful statements).&lt;br /&gt;&lt;br /&gt;Again, this covers the &lt;a href="http://en.wikipedia.org/wiki/Arithmetical_hierarchy"&gt;arithmetical hierarchy&lt;/a&gt; (see &lt;a href="http://dragonlogic-ai.blogspot.com/2008/06/more-hyperlogic-in-previous-hyperlogic.html"&gt;previous&lt;/a&gt;). We start with the computable predicates, which are well-grounded. If I have a grounded predicate, I can make a new grounded predicate by adding a universal &lt;a href="http://en.wikipedia.org/wiki/Quantifier"&gt;quantifier&lt;/a&gt; over one of the variables. 
If I knew some instance of the new predicate to be true, I would be able to conclude that an infinite number of instances of the old were true (namely, all instances obtainable by putting some number in the spot currently occupied by the universally quantified variable). Existential quantifications have a less direct grounding: if we knew an existential statement to be true, we could conclude the falsehood of a universal statement concerning the opposite predicate (meaning "there exists X for which P(X)" is grounded in "it is not the case that for all X, not P(X)").&lt;br /&gt;&lt;br /&gt;Because we know what &lt;span style="font-style: italic;"&gt;would&lt;/span&gt; be true &lt;span style="font-style: italic;"&gt;if&lt;/span&gt; some statement were true, we can attempt to use the scientific method to test the truth of statements. This is essentially what I was doing with my previous attempts. However, as I mentioned then, such methods will not even eventually converge to the right answer (for the tough cases); they will keep flipping back and forth, and even the frequencies of such flipping are meaningless (otherwise we could use them to decide). Nonetheless, human nature makes us want to try, I think...&lt;br /&gt;&lt;br /&gt;Obviously I have a bit of work to do. For example... while this provides &lt;span style="font-style: italic;"&gt;some&lt;/span&gt; grounding for arithmetical statements, does it provide enough to really fix their desired meaning? (This is particularly unclear for the existential statements.) Also, what else can be characterized like this? 
Is this method only useful for the arithmetical truths?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-7000496778467783570?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/7000496778467783570/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/07/new-grounding-criteria-in-this-post-i.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/7000496778467783570'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/7000496778467783570'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/07/new-grounding-criteria-in-this-post-i.html' title=''/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-939391840922380697</id><published>2008-06-27T09:28:00.000-07:00</published><updated>2008-06-27T10:25:29.575-07:00</updated><title type='text'></title><content type='html'>&lt;span style="font-size:180%;"&gt;Something Cool&lt;span style="font-size:100%;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;I ran into some interesting parallels...&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.agiri.org/docs/ComputationalApproximation.pdf"&gt;This paper&lt;/a&gt; gives a method for finding programs that could have generated a given piece of data, by simulating a universal Turing machine running backwards from the 
data to the possible programs that generated it. Since a Turing machine can destroy information while running forward (for example, if it writes 0 to a part of the tape then we no longer know if there was already a 0 there or if there was a 1), running it backwards involves making arbitrary decisions (such as what to put on a square that is reverse-written-to). These different decisions will lead us to different possible programs that might have created the data. We should look for &lt;span style="font-style: italic;"&gt;elegant&lt;/span&gt; programs, because these explain the &lt;span style="font-style: italic;"&gt;pattern&lt;/span&gt; behind the data. But of course we do not know which paths lead to the most elegant programs, so we need to search.&lt;br /&gt;&lt;br /&gt;The search space is rather large, but it is a &lt;span style="font-style: italic;"&gt;huge&lt;/span&gt; improvement over trying to search the &lt;span style="font-style: italic;"&gt;entire&lt;/span&gt; space of programs. Contrast this with &lt;a href="http://www.idsia.ch/%7Ejuergen/oops.html"&gt;Jurgen Schmidhuber's&lt;/a&gt; earlier approach, which throws out a program as soon as it produces a wrong output. The new approach avoids the incorrect programs altogether, so that all we need to worry about in our search is program search.&lt;br /&gt;&lt;br /&gt;Backtracking is also easy. All we need to do is run the simulation forwards. Since the Turing machine is deterministic, we don't need to make any decisions. We can keep going forward and backward as many times as we like and still remain in the space of programs that produce the desired output.&lt;br /&gt;&lt;br /&gt;My interest in all of this is adapting it to learning &lt;a href="http://en.wikipedia.org/wiki/Formal_grammar"&gt;grammars&lt;/a&gt;. Since grammars are Turing-complete representations, this is not difficult. 
What is interesting, though, is that when translated, there is a strong analogy between this new method and &lt;a href="http://www.aclclp.org.tw/rocling/2006/s1p4.pdf"&gt;old methods&lt;/a&gt; for grammar learning. The old methods are restricted to a special case called &lt;a href="http://en.wikipedia.org/wiki/Context-free_grammar"&gt;context-free grammars&lt;/a&gt;, so the new method is basically a generalization. To use a grammar, we apply its rules successively until we get the final product. The old methods for finding context-free grammars are interesting because they essentially apply rules in reverse to construct a grammar from the end result. The analogy between this and running a Turing machine backwards is quite strong. Those methods for learning context-free grammars are also quite analogous to the crazy ideas in my earlier posts on this blog.&lt;br /&gt;&lt;br /&gt;I am interested in examining how these methods can be adapted in various ways, including taking some ideas from &lt;a href="http://arxiv.org/abs/cs/9709102"&gt;SEQUITUR&lt;/a&gt; to attempt an incremental version of the algorithm, and ideas from &lt;a href="http://archives.cs.iastate.edu/documents/disk0/00/00/05/72/index.html"&gt;this paper&lt;/a&gt; to better guide the search (at least for context-free cases). 
That is, if I find the time...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-939391840922380697?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/939391840922380697/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/06/something-cool-i-ran-into-some.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/939391840922380697'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/939391840922380697'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/06/something-cool-i-ran-into-some.html' title=''/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-8502620451661750911</id><published>2008-06-23T09:13:00.000-07:00</published><updated>2008-06-23T10:40:12.238-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='hyperlogic'/><category scheme='http://www.blogger.com/atom/ns#' term='computation'/><category scheme='http://www.blogger.com/atom/ns#' term='hypercomputation'/><category scheme='http://www.blogger.com/atom/ns#' term='logic'/><title type='text'></title><content type='html'>&lt;span style="font-size:180%;"&gt;More Hyperlogic&lt;/span&gt;&lt;br /&gt;&lt;span style="font-size:100%;"&gt;&lt;br /&gt;In the &lt;a 
href="http://dragonlogic-ai.blogspot.com/2008/06/basic-hyperlogic-my-search-for-proper.html"&gt;previous hyperlogic post&lt;/a&gt; I loosely described the &lt;a href="http://en.wikipedia.org/wiki/Arithmetical_hierarchy"&gt;arithmetical hierarchy&lt;/a&gt;, and a "hyperlogic" to deal with it. Although I didn't describe this in detail, the definition of this hyperlogic was inspired by a particular set of &lt;a href="http://en.wikipedia.org/wiki/Arithmetical_hierarchy"&gt;hypercomputers&lt;/a&gt; that can calculate functions in the arithmetical hierarchy.&lt;br /&gt;&lt;br /&gt;The particular hypercomputers that I was thinking of are infinite-time Turing machines, but restricted to change each output square only once during each infinite subcomputation. An infinite subcomputation has the length of the first infinite ordinal, whereas the computation as a whole may be longer (and thus have many infinite subcomputations); the number of infinite subcomputations is determined by the level on the hierarchy. (Additionally I need to stipulate that the output squares can start out either all black or all white depending on whether we are calculating an existential (assumed false) or a universal (assumed true).)&lt;br /&gt;&lt;br /&gt;So, to extend the hyperlogic further, it seems as if I need a more powerful hypercomputer to use as a basis.&lt;br /&gt;&lt;br /&gt;After the arithmetical hierarchy comes the &lt;a href="http://en.wikipedia.org/wiki/Analytical_hierarchy"&gt;analytical hierarchy&lt;/a&gt;. The shift is from 1st-order arithmetic to 2nd-order arithmetic. (It is important to distinguish this from &lt;a href="http://en.wikipedia.org/wiki/First-order_logic"&gt;1st-order logic&lt;/a&gt; and &lt;a href="http://en.wikipedia.org/wiki/Second-order_logic"&gt;2nd-order logic&lt;/a&gt;. 1st-order logic is complete and doesn't need hypercomputers to decide its quantifiers, while 1st-order arithmetic is incomplete and does need hypercomputation.) 
In the arithmetical hierarchy, we are only &lt;a href="http://en.wikipedia.org/wiki/Quantifier"&gt;quantifying&lt;/a&gt; over integers. In the analytical hierarchy, we quantify over sets of integers. The space being quantified over is infinitely larger, in a way that makes the previous methods fail.&lt;br /&gt;&lt;br /&gt;The class of infinite-time Turing machines I was considering uses only &lt;a href="http://en.wikipedia.org/wiki/Countable_set"&gt;countable&lt;/a&gt; infinities of time. The analytical hierarchy requires quantification over an &lt;a href="http://en.wikipedia.org/wiki/Uncountable"&gt;uncountable&lt;/a&gt; set, meaning it needs uncountable infinities of time.&lt;br /&gt;&lt;br /&gt;An uncountable infinity of Turing-machine steps may seem strange (or worse, incoherent), but if we ignore the details it works. It doesn't even matter what order everything happens in; since all I need is existential and universal quantification, I can just say that the machine stops and returns true (/false) if an example (/counterexample) is found, and otherwise keeps going until it has checked everything (at which point it returns the opposite truth value).&lt;br /&gt;&lt;br /&gt;I do need to specify further how "checking an individual case" works. But this is not too difficult. Quantifying over sets of numbers gives us in each case some predicate that returns either true or false for each number. If there is more than one such quantifier, we get more than one predicate. Ignoring the inner workings of the predicates, we just use them as given in what is otherwise a normal case of arithmetical hypercomputation.&lt;br /&gt;&lt;br /&gt;Now, can we convert this to a finite process that in some sense captures the concept?&lt;br /&gt;&lt;br /&gt;My tentative proposal is: look at a generic case of the problem, converting all set quantifiers to undefined predicates. Run the decision procedure for this arithmetical statement. 
Every time we need a particular value of one of the predicates, split the computation; run one version with the value of the predicate being "true" for the number it was given, and a second version with the value being "false". The result (after all versions of the arithmetical decision process terminate) will be a table with the number of dimensions determined by the number of set quantifiers, and the number of entries determined by how many versions of each quantifier ended up being needed. In any case we can attribute an approximate truth value to the statement by deciding each quantifier in order based on the examples examined.&lt;br /&gt;&lt;br /&gt;Now, I am less sure about this method than I was about the arithmetical method. It seemed that improvements to the arithmetical method could never cause a serious increase in power, so that the method was within the realm of "the best we can do". However, for this method it is not clear that the machine is doing the best that can be done. 
In particular, I am not sure that adding direct 2nd-order theorem proving would not seriously increase the power.&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-8502620451661750911?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/8502620451661750911/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/06/more-hyperlogic-in-previous-hyperlogic.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/8502620451661750911'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/8502620451661750911'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/06/more-hyperlogic-in-previous-hyperlogic.html' title=''/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-5646990314541495985</id><published>2008-06-19T14:56:00.000-07:00</published><updated>2008-06-21T17:01:18.964-07:00</updated><title type='text'></title><content type='html'>&lt;span style="font-size:180%;"&gt;Some Links&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;http://research.microsoft.com/~minka/statlearn/glossary/glossary.html&lt;br /&gt;&lt;br /&gt;A long list of AI-related terminology with definitions, and lots of links to related websites.&lt;br /&gt;&lt;br /&gt;http://tldp.org/HOWTO/AI-Alife-HOWTO.html#toc1&lt;br /&gt;&lt;br 
/&gt;A really long list of AI for Linux.&lt;br /&gt;&lt;br /&gt;http://www.math.hawaii.edu/~dale/godel/godel.html&lt;br /&gt;&lt;br /&gt;A nice description of Godel's theorem and other results in terms of the diagonal argument.&lt;br /&gt;&lt;br /&gt;http://www.cs.berkeley.edu/~bhaskara/alisp/&lt;br /&gt;&lt;br /&gt;This is Alisp. Alisp is, in a sense, a very basic AI system (since it employs simple learning methods, with little potential for recognizing complex structure). However, it is very interesting from an interface point of view. It allows the maximum possible use of the simple AI it contains. Therefore, I think the idea of using Alisp as a starting point to build up a more complicated AI is interesting (since I am into the idea of recursively building up complex capabilities from simple ones).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-5646990314541495985?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/5646990314541495985/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/06/some-links-httpresearch.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/5646990314541495985'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/5646990314541495985'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/06/some-links-httpresearch.html' title=''/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' 
src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-4291689620590959058</id><published>2008-06-18T06:45:00.000-07:00</published><updated>2008-06-21T17:01:40.463-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='logic'/><title type='text'></title><content type='html'>&lt;span style="font-size:180%;"&gt;The Logic of Proof: Unifying Classical and Intuitionistic Logic&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;In a previous post, I mentioned &lt;a href="http://web.cs.gc.cuny.edu/%7Esartemov/"&gt;Sergei Artemov&lt;/a&gt;. I've now read a couple of his papers there. His work is based on extending one particular modal logic, developed by Godel. Godel's logic introduces a "provability" operator. (More specifically, it takes the meaning of the modal "necessary" to be that a statement is provable.) Part of the purpose of this logic is to pin down the meaning of intuitionistic logic. To translate an intuitionistic statement into a classical statement, we add the provability operator in front of the entire formula and in front of every subformula. Take [] to be the provability operator. "P and Q" stated in intuitionistic logic translates to []([]P and []Q), which means "it is provable that both P and Q are provable." "P or Q" would translate to []([]P or []Q), "it is provable that either P is provable or Q is." And so on for larger statements. The strange altered rules of intuitionistic logic arise naturally from this. (It is also possible to add slightly fewer [] operators, interestingly.)&lt;br /&gt;&lt;br /&gt;Artemov's main contribution is to provide an &lt;span style="font-style: italic;"&gt;explicit&lt;/span&gt; provability logic: instead of just containing an operator "x is provable", we can say "y is a proof of x". This can be represented by [y]x (we just fill in the box that was left empty in Godel's original version). 
Godel was interested in finding such a logic, but had not provided a complete set of axioms for it. The explicit provability has some nice formal properties that implicit provability lacks, and those properties help to prove some tighter relationships between various logics.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-4291689620590959058?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/4291689620590959058/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/06/logic-of-proof-unifying-classical-and.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/4291689620590959058'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/4291689620590959058'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/06/logic-of-proof-unifying-classical-and.html' title=''/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-7316080169999427279</id><published>2008-06-16T15:37:00.000-07:00</published><updated>2008-06-16T15:37:00.520-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='computation'/><category scheme='http://www.blogger.com/atom/ns#' term='hypercomputation'/><category scheme='http://www.blogger.com/atom/ns#' term='logic'/><title type='text'></title><content type='html'>&lt;span 
style="font-size:180%;"&gt;&lt;span style="font-size:100%;"&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 24.0px Georgia"&gt;A Basic Hyperlogic&lt;/p&gt; &lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 16.0px Georgia; min-height: 19.0px"&gt;&lt;br /&gt;&lt;/p&gt; &lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 16.0px Georgia"&gt;My search for the proper logic is guided by the requirement that the desired meaning of the logic must be reflected totally in the rules of inference. This is a form of something called &lt;a href="http://www.eucognition.org/wiki/index.php?title=Symbol_Grounding_in_Cognitive_Systems"&gt;&lt;span style="text-decoration: underline ; color: #4d2382"&gt;grounding&lt;/span&gt;&lt;/a&gt;. The idea behind grounding is, roughly, that in order to have a meaning, a symbol needs to have a proper connection to what it actually refers to. This is sometimes used in the AI community to argue that robotics is absolutely required for AI, because an AI shunting symbols around on its own inside a computer cannot really "know" anything (its symbols have no meaning). This idea is also known as embodied intelligence, or situated cognition. The form of grounding that I'm talking about is somewhat different: I want a way to ground logical and mathematical concepts. Mathematical grounding &lt;i&gt;must&lt;/i&gt; exist, because humans are able to refer to mathematical entities, and understand mathematical meaning. My aim is to figure out how.&lt;/p&gt; &lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 16.0px Georgia; min-height: 19.0px"&gt;&lt;br /&gt;&lt;/p&gt; &lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 16.0px Georgia"&gt;I've discussed the inadequacy of current mathematical logics in the past. In that post, I also made some wild speculations about creating a logic based on "limit computers" and even more powerful machines. 
Basically, this post is a revision of those speculations based on some solid facts.&lt;/p&gt; &lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 16.0px Georgia; min-height: 19.0px"&gt;&lt;br /&gt;&lt;/p&gt; &lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 16.0px Georgia"&gt;Limit-computers, as well as more powerful imaginary machines, are called "hypercomputers". Similarly, I'll call the logics that match them "hyperlogics".&lt;/p&gt; &lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 16.0px Georgia; min-height: 19.0px"&gt;&lt;br /&gt;&lt;/p&gt; &lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 16.0px Georgia"&gt;To measure the power of a hyperlogic, I'll use something called the arithmetical hierarchy. Very approximately: the first level of the hierarchy contains all computable facts. (So for example, it contains the fact that 3 + 4 = 7.) The second level contains facts either of the form "for all x, P(x)" or "there exists x such that P(x)", for any computable predicate P. Notice that we &lt;i&gt;may&lt;/i&gt; need to check an infinite number of cases to decide the truth of these facts, but we may not. If the fact is a "for all" fact, then we may run into a case where P isn't true; we have a counterexample, and are done. But if we never find a counterexample, we would never finish checking the infinite number of possible values for x to conclude that P was true everywhere. Similarly for "exists": we can stop if we find a true case of P, but we would need an infinite amount of time to conclude that none exists.&lt;/p&gt; &lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 16.0px Georgia; min-height: 19.0px"&gt;&lt;br /&gt;&lt;/p&gt; &lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 16.0px Georgia"&gt;The third level of the hierarchy is the same, except that it takes P to be second-level, rather than first-level. This means we need to check a possibly infinite number of second-level statements to decide a third-level statement. 
(But remember, we may never finish even the &lt;i&gt;first&lt;/i&gt; of these, since checking a second-level statement can take forever.) Similarly, each level contains those statements that can be verified by looking at a (possibly) infinite number of statements on the previous level. (Nice fact: each higher level can be decided by an infinite-time Turing machine given one more infinity of time.)&lt;/p&gt; &lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 16.0px Georgia; min-height: 19.0px"&gt;&lt;br /&gt;&lt;/p&gt; &lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 16.0px Georgia"&gt;Mathematical truths beyond the second level cannot be logically determined, because no logic can check them. Yet humans can talk about them. So, the goal is to specify a logic that tells us how to reason about facts on these levels. The logic cannot possibly tell us whether these facts are true or false for sure, beyond facts of the 2nd level. I'm just looking for a logic that does the best it can. If we do the best we can to try to ascertain the truth of a statement, that should ground us, right? Intuitively, the idea is that some external observer looking into our thoughts (or the inner workings of our AI) should say "Oh, this symbol here must represent mathematical entity X, because it is manipulated the way entity X should be manipulated." Ascribing that meaning to the symbol is the best way of explaining the manipulation rules attached to it.&lt;/p&gt; &lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 16.0px Georgia; min-height: 19.0px"&gt;&lt;br /&gt;&lt;/p&gt; &lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 16.0px Georgia"&gt;So: for 1st-level statements, use the method of computation. For 2nd-level statements, keep checking 1st-level cases until a deciding case is found, or until we're satisfied that there is none. (This reflects the scientific method: if a hypothesis escapes our best attempts to disprove it, we conclude that it is true.) 
For a third-level statement, test the 2nd-level cases until a decisive case is found, or until we're satisfied there is none. (Notice: the meaning of "decisive case" is weaker here, because our conclusions concerning 2nd-level statements are not entirely perfect.) And so on. This gives us an approximate decision of the truth of any statement on any level of the arithmetical hierarchy.&lt;/p&gt; &lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 16.0px Georgia; min-height: 19.0px"&gt;&lt;br /&gt;&lt;/p&gt; &lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 16.0px Georgia"&gt;It may seem like I've left out some important details. To fully specify the decision method, I should specify how time is to be distributed between cases when checking. For example, on the third level, I could check many 2nd-level cases shallowly, or I could check a few deeply. Also, for higher-level statements, since the so-called decisive cases are not entirely decisive, it might be worthwhile to seek out more than one. If several were found, how should we represent the increase in certainty? These are important questions, but not &lt;i&gt;too&lt;/i&gt; important&lt;i&gt;.&lt;/i&gt; The statements in question are fundamentally undecidable, so there &lt;i&gt;can't be&lt;/i&gt; a really right answer. Any particular technique will fail in some cases. The only thing that can &lt;i&gt;definitely&lt;/i&gt; be said is: the more time we spend checking, the better. So I feel I am somewhat justified in leaving out these details, at least for now.&lt;/p&gt; &lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 16.0px Georgia; min-height: 19.0px"&gt;&lt;br /&gt;&lt;/p&gt; &lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 16.0px Georgia"&gt;This hyperlogic is nice, but there is a reason this post is titled "a &lt;i&gt;basic&lt;/i&gt; hyperlogic". The arithmetical hierarchy does&lt;i&gt; not&lt;/i&gt; cover &lt;i&gt;all&lt;/i&gt; mathematical truths. So the quest isn't over. 
But I think this is a good start. I've got a concrete example with a good mathematical foundation, showing that it &lt;i&gt;is&lt;/i&gt; possible to create systems more powerful than 1st-order logic for AI.&lt;/p&gt;&lt;/span&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-7316080169999427279?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/7316080169999427279/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/06/basic-hyperlogic-my-search-for-proper.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/7316080169999427279'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/7316080169999427279'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/06/basic-hyperlogic-my-search-for-proper.html' title=''/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-84386914465727930</id><published>2008-06-11T08:40:00.000-07:00</published><updated>2008-06-11T08:46:25.749-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='prior'/><category scheme='http://www.blogger.com/atom/ns#' term='probability'/><title type='text'></title><content type='html'>&lt;p style="margin-bottom: 0in;"&gt;&lt;span style="font-size:180%;"&gt;Bayesian Convergence: What it Will and Won't 
Do&lt;/span&gt;&lt;/p&gt;&lt;p style="margin-bottom: 0in;"&gt;In my continuing quest for the proper prior, I happened upon a nice result: under very general conditions, the effect that the prior has on the current belief will vanish as evidence accumulates. Nice! This means that a Bayesian learner will be good regardless of the choice of prior-- the learned beliefs will fit the evidence.&lt;/p&gt;&lt;p style="margin-bottom: 0in;"&gt;&lt;br /&gt;&lt;/p&gt;  &lt;p style="margin-bottom: 0in;"&gt;Does this mean the search for the correct prior is needless?&lt;br /&gt;&lt;/p&gt;&lt;p style="margin-bottom: 0in;"&gt;&lt;br /&gt;&lt;/p&gt;  &lt;p style="margin-bottom: 0in;"&gt;To answer that question, I should first give the convergence result in a bit more detail. (To get it in full detail, see &lt;a href="http://socrates.berkeley.edu/%7Efitelson/few/few_04/hawthorne.pdf"&gt;this paper&lt;/a&gt;.)&lt;/p&gt;&lt;p style="margin-bottom: 0in;"&gt;&lt;br /&gt;&lt;/p&gt;  &lt;p style="margin-bottom: 0in;"&gt;The key assumption that is made is that no model of the world has zero probability. Bayesian learning will never increase the probability of such a model, so this makes sense. The convergence result is most easily understood in the language of likelihood ratios. In this version of bayesian learning (which gives the same end results as other versions, just by different intermediate steps) we start out with the "prior odds," rather than the prior probability, of a model. The prior odds of a model is just the probability for that model divided by the probability against. For each new bit of evidence that comes in, we multiply the current odds by the "likelihood ratio". 
The likelihood ratio is the probability of the evidence &lt;i&gt;given&lt;/i&gt; the model, divided by the probability of that evidence given &lt;i&gt;its negation.&lt;/i&gt; (The probability of the evidence given the negation is actually the sum of its probability given each of the &lt;i&gt;other&lt;/i&gt; possible models.) Now, as we observe more and more evidence, we multiply again and again to update the odds for each model we're considering. Yet the &lt;i&gt;prior&lt;/i&gt; odds remain a constant at the beginning of that long line of multiplication. The odds of a model, then, can become as large as they like &lt;i&gt;regardless of prior&lt;/i&gt;, and likewise can become as small as they might. The evidence is what matters, not the prior.&lt;/p&gt;&lt;p style="margin-bottom: 0in;"&gt;&lt;br /&gt;&lt;/p&gt;  &lt;p style="margin-bottom: 0in;"&gt;In any case, I am not about to drop my concern about which prior a rational entity should choose. The &lt;i&gt;main&lt;/i&gt; reason for this is that the convergence result leaves open the question of the class of models to be considered, which is my primary concern. Even if this were settled, however, the convergence theorem would not convince me that ensuring nonzero probability for each model is sufficient. The reason has to do with predictions.&lt;/p&gt;&lt;p style="margin-bottom: 0in;"&gt;&lt;br /&gt;&lt;/p&gt;  &lt;p style="margin-bottom: 0in;"&gt;To make the case as extreme as possible, I'm going to ignore probabilistic models, and only consider deterministic ones. This is actually no restriction at all; any prior over probabilistic models could be seen as a fancy way of specifying a prior over totally deterministic ones. A probabilistic model gives a probability to each possible dataset, and a prior over many probabilistic models can be seen as just a weighted sum of these, giving us a new (possibly more complicated) distribution over the possible datasets. This can be used as a prior. 
In fact, since it gives the same overall probability for each dataset, it is for practical purposes the same prior; yet the models it considers are deterministic.&lt;/p&gt;&lt;p style="margin-bottom: 0in;"&gt;&lt;br /&gt;&lt;/p&gt;  &lt;p style="margin-bottom: 0in;"&gt;Now that we're working with completely deterministic models, the data will either fit or not fit with each model. When it doesn't fit, we throw that model out. The convergence theorem still holds, because the set of models we're considering will keep shrinking as we throw more out; whenever this happens, the probability that belonged to the discredited model will be redistributed among the still-valid ones. Thus the probability of the correct model will continue to increase (since it's never thrown out).&lt;/p&gt;&lt;p style="margin-bottom: 0in;"&gt;&lt;br /&gt;&lt;/p&gt;  &lt;p style="margin-bottom: 0in;"&gt;However, this is not much comfort! The &lt;i&gt;relative&lt;/i&gt; probabilities of the models still in consideration will not be based on the evidence &lt;i&gt;at all&lt;/i&gt;; it will still be based purely on the prior. (The probability from a model that gets thrown out is redistributed, but not evenly.) This means that when we make predictions, the prior is (in a loose sense) the &lt;i&gt;only&lt;/i&gt; thing that determines our prediction.&lt;/p&gt;&lt;p style="margin-bottom: 0in;"&gt;&lt;br /&gt;&lt;/p&gt;  &lt;p style="margin-bottom: 0in;"&gt;In fact, if the prior assigns nonzero probability to every possible dataset, then the set of models not yet ruled out will contain all possible futures. The only thing that can narrow this down to make a useful prediction is the prior, which may or may not do so in a way dependent on the evidence so far.&lt;/p&gt;&lt;p style="margin-bottom: 0in;"&gt;&lt;br /&gt;&lt;/p&gt;  &lt;p style="margin-bottom: 0in;"&gt;Perhaps someone objects: "But then, can't we just require that a prior's predictions &lt;i&gt;do&lt;/i&gt; depend on the evidence? 
Isn't it an obviously silly mistake to construct a prior that violates this?" Unfortunately, simply ruling out these cases doesn't tell us what prior &lt;i&gt;to&lt;/i&gt; use. What &lt;i&gt;kind&lt;/i&gt; of dependence do we want? I want a prior that can in theory "notice any sort of regularity"; but this includes noticing that the data is just completely random (predictably unpredictable).&lt;/p&gt;&lt;p style="margin-bottom: 0in;"&gt;&lt;br /&gt;&lt;/p&gt;  &lt;p style="margin-bottom: 0in;"&gt;In a way, allowing probabilistic models is a very strange move. It's very similar to allowing models that are infinitely large; in a way, a probabilistic model includes information about an infinite number of coin flips, which are used in a well-specified (deterministic) way to decide on predictions. Of course, when we specify a probabilistic model, we don't specify this infinite table of heads and tails; in fact, that's where probability theory gets its power. This is reminiscent of the idea of a "random sequence" being a more fundamental notion than "probability", as discussed in the previous post... 
but that's enough speculation for today.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-84386914465727930?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/84386914465727930/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/06/bayesian-convergence-what-it-will-and.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/84386914465727930'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/84386914465727930'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/06/bayesian-convergence-what-it-will-and.html' title=''/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-7036022792804642484</id><published>2008-06-10T17:29:00.000-07:00</published><updated>2008-06-10T16:48:30.369-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='probability'/><title type='text'></title><content type='html'>&lt;span style="font-size:180%;"&gt;Interpretation of Probability, Yet Again&lt;/span&gt;&lt;div&gt; &lt;/div&gt;&lt;div&gt;&lt;span style="font-size:100%;"&gt;&lt;br /&gt;In my previous post, I mentioned that my "&lt;/span&gt;&lt;span style="font-size:100%;"&gt;personalized "bayesian frequentist" interpretation of probability was struck down by the news that frequentism conflicts 
with the standard axioms of probability." The issue needs more consideration, so this post will discuss it exclusively.&lt;br /&gt;&lt;br /&gt;To that end, I've found &lt;a href="http://staff.science.uva.nl/%7Emichiell/docs/Blackwell.pdf"&gt;a very interesting article on the subject&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;The information I had found previously went something like this. The simplistic frequentist position is to say that the probability of an event is the fraction of times it occurs in some set. This is problematic mainly because if we flip a coin five times and get the fractions heads=1/5 and tails=4/5, we can attribute these fractions to chance. We don't automatically believe that the "real" probability of heads is 1/5.&lt;br /&gt;&lt;br /&gt;Revision 1 of the frequentist view changes the definition to "limiting frequency": the probability is the fraction we would get if we kept at it. The limiting frequency does not always exist. For example, consider a sequence containing As and Bs, starting with ABBAAAABBBBBBBB... Each time, the number of same letters in a row doubles. The ratio of A to B will wave back and forth forever, never settling. So by definition, probabilities only apply to sequences for which the limiting frequency exists.&lt;br /&gt;&lt;br /&gt;This is better, but it still isn't quite right. There are two problems. First, a coin could land on heads every even flip and tails every odd flip. The limiting fraction for both sides would be 1/2, but this is obviously not a random sequence. So the requirement that there is a limiting frequency is not enough to guarantee that the sequence is probabilistic. Second, it is possible to re-order an infinite sequence to make the limiting frequencies different. With the same alternating heads/tails sequence, we could reorder as follows: group heads together in pairs by moving them backwards, but keep tails isolated. This makes the limiting frequency of heads 2/3. 
It seems odd that the probability would change when we're just counting in a different order.&lt;br /&gt;&lt;br /&gt;To fix this, von Mises defined something called a "collective". Before reading the paper, I knew that a collective had the additional property that any subsequence chosen without knowledge of where the heads were and where the tails were would have the same limit frequencies. I had also read that the resulting theory was inconsistent with the standard axioms of probability. I wondered: if this definition is inconsistent with the standard axiomatization, what sort of alternative probability theory does it yield?&lt;br /&gt;&lt;br /&gt;What the paper immediately revealed was that the collective-based definition of probability was a competitor to the now-standard axiomatization, Kolmogorov's axiomatization. It is not surprising, then, that the two are inconsistent with each other. Where the two differ, von Mises preferred the collective-based account. It is not hard to see why; &lt;span style="font-style: italic;"&gt;his&lt;/span&gt; account is grounded in a mathematical concept of a random sequence, while Kolmogorov's axioms simply tell us how probabilities must be calculated, without any explanation of what a probability means.&lt;br /&gt;&lt;br /&gt;Mainly, the notion of a collective is weaker; it does not allow us to prove as much. For example, it is quite possible that a coin being flipped approaches the correct frequency "from above": with only a finite number of exceptions, the ratio of heads at any finite time can be greater than 1/2, although it gets to 1/2 as we keep going. Perhaps a stronger notion of random sequence is needed, then. But I do think the von Mises approach of defining random sequences &lt;span style="font-style: italic;"&gt;first&lt;/span&gt;, and &lt;span style="font-style: italic;"&gt;then&lt;/span&gt; probabilities, seems like a better foundation. 
I wonder: is there any definition of randomness from which the Kolmogorov axioms automatically follow?&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-7036022792804642484?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/7036022792804642484/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/06/interpretation-of-probability-yet-again.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/7036022792804642484'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/7036022792804642484'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/06/interpretation-of-probability-yet-again.html' title=''/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-4405501314486422414</id><published>2008-06-07T13:31:00.000-07:00</published><updated>2008-06-10T17:22:52.340-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='probability'/><category scheme='http://www.blogger.com/atom/ns#' term='logic'/><title type='text'></title><content type='html'>&lt;span style="font-size:100%;"&gt;&lt;span style="font-size:180%;"&gt;Modal Probability Stuff&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;My personalized "bayesian frequentist" interpretation of probability was struck down by 
the news that frequentism conflicts with the standard axioms of probability. In the wake of this disaster, I am forced to develop some new ideas about how to interpret probabilities.&lt;br /&gt;&lt;br /&gt;I've been thinking of using modal logic to support probability theory. I looked around the internet somewhat, and it seems like there is some action in the opposite direction (using probability theory as a foundation for modal logic), but not so much in the direction I want to go. (I assume of course that my direction isn't totally unique... it seems natural enough, so someone's probably tried it. But anyway it is in the minority.)&lt;br /&gt;&lt;br /&gt;Modal logic is essentially the logic of possibility and necessity. Because these words are ambiguous, there are many possible modal logics. They share a common setup: we have a number of worlds, some of which can "see" each other. Different facts may be true and false in each world. In a given world, a fact is "necessary" if it is true in all worlds that world can see. It is "possible" if it is true in at least one of those worlds. Modal logic lets us assign definite meaning to larger stacks such as "necessarily necessary" (it is true in all worlds that the worlds we can see are able to see), "possibly necessary" (in &lt;i&gt;some&lt;/i&gt; world we can see, it is true in all worlds &lt;i&gt;that &lt;/i&gt;world can see), and "possibly possibly possible" (in some world we can see, in some world it can see, in some world &lt;i&gt;it&lt;/i&gt; can see, the fact holds).&lt;br /&gt;&lt;br /&gt;Perhaps the idea of "worlds we can see" seems ambiguous. My choice of wording is essentially arbitrary; a common choice is "accessibility". As with necessity and possibility, the exact meaning depends on the modal logic we're using. For example, we might want to talk about immediate future necessity and possibility. The worlds we can see are the possible next moments. 
"Possibly possible" refers to possibility two moments ahead, "possibly possibly possible" is three moments ahead, and so on. Another modal logic corresponds to possibility and necessity in the entire future. We can "see" any possible world we can eventually reach. We might also say that we can see the present moment. (We could choose not to do this, but it is convenient.) In this modal logic, "possibly possible" amounts to the same thing as "possible"; "necessarily necessary" amounts to "necessary". However, "necessarily possible" does not collapse (since it means that a fact remains possible no matter which path we go down in the future), and neither does "possibly necessary" (which means we can take some path to make a fact necessary). Which strings of modal operators collapse in a given modal logic is one of the basic questions to ask.&lt;br /&gt;&lt;br /&gt;Other modal logics might represent possibility given current knowledge (so we can only access worlds not ruled out by what we already know), moral necessity and possibility ("must" vs "may"), and so on. Each has a different "seeability" relationship between worlds, which dictates a different set of logical rules governing necessity and possibility.&lt;br /&gt;&lt;br /&gt;Just to give an example of using probability theory as a foundation for modal logic, we might equate "possible" with "has probability greater than 0," and "necessary" with "has probability 1". I doubt this simple approach is very interesting, but I know very little about this. I'm more interested in the opposite possibility.&lt;br /&gt;&lt;br /&gt;The idea would be something like this: we attach a weight to each world, and at any particular world, we talk about the probability of an event in the worlds we can see. 
The probability of a fact is the sum of the weights of each seeable world in which it is true, divided by the sum of the weights of all seeable worlds.&lt;br /&gt;&lt;br /&gt;The idea here is that probability does not make sense without a context of possible worlds. Probability within a single world doesn't make sense; each thing is either true or false (or perhaps undefined or meaningless). Furthermore, there is no privileged scope of possibilities; we might at one moment talk about probabilities assuming the context of physical possibility (in which case a coin flip is nearly deterministic, we just don't have enough information to calculate the result), or epistemic possibility (in which case it is just fine to assign a probability to the coin flip without sufficient knowledge of the physical conditions).&lt;br /&gt;&lt;br /&gt;One advantage of this approach is that we get an automatic meaning for the troublesome notion of a "probability of a probability". We need this concept, for example, if we want to use Bayesian learning to learn the bias of a possibly-biased coin. We assign some probability to each possible ratio of heads to tails, and then start flipping the coin and updating our belief based on the outcomes. This particular example is a bit complicated to treat with the framework I've outlined (which is perhaps a point against): we need to mix modal logics, taking one probability to be based in worlds in which the coin has different properties, and the other to only range over changes in the surrounding conditions of the coin flip. The probability-of-a-probability, then, is the weighted sum over seeable worlds in which, in &lt;i&gt;their&lt;/i&gt; seeable worlds, the weight of an event divided by the total weight equals a particular ratio. 
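&lt;br /&gt;&lt;br /&gt;To make the two-level sum concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the two hypothetical coins, their weights, and the helper names. A list of (weight, world) pairs stands in for the set of seeable worlds.&lt;br /&gt;&lt;pre&gt;
```python
from fractions import Fraction

def probability(worlds, holds):
    """worlds: list of (weight, world) pairs; returns the share of weight where holds(world) is true."""
    total = sum(weight for weight, _ in worlds)
    favourable = sum(weight for weight, world in worlds if holds(world))
    return Fraction(favourable) / Fraction(total)

# Inner "seeable" relation (hypothetical): from a world where the coin has
# fixed properties, we see worlds that differ only in how the flip starts.
fair_coin = [(1, "heads"), (1, "tails")]
biased_coin = [(3, "heads"), (1, "tails")]

# Outer "seeable" relation (hypothetical): from the current world we see
# worlds in which the coin has different properties, weighted 2 to 1.
coin_worlds = [(2, fair_coin), (1, biased_coin)]

def is_heads(outcome):
    return outcome == "heads"

def heads_chance_is_half(coin):
    # An inner probability used as a fact about the outer world.
    return probability(coin, is_heads) == Fraction(1, 2)

p_heads_given_fair = probability(fair_coin, is_heads)
p_of_p = probability(coin_worlds, heads_chance_is_half)
print(p_heads_given_fair, p_of_p)
```
&lt;/pre&gt;With these made-up weights, the fair-coin world gives heads half of its seeable weight, and the worlds in which that holds carry two thirds of the outer weight, so the sketch prints 1/2 and 2/3.&lt;br /&gt;&lt;br /&gt;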
In the case of the coin, the two instances of "seeable" have different definitions: we jump from the current world to worlds in which the coin has different properties, but from them we jump to worlds in which the flip has different starting conditions. (Note: we know neither the coin's actual properties nor the actual exact starting conditions, so for us it is not much of a "jump". But &lt;i&gt;sometimes&lt;/i&gt; we reason with possible worlds which are quite distinctly not our own.)&lt;br /&gt;&lt;br /&gt;Perhaps this is not especially intuitive. One problem I can think of is the lack of justification for picking particular ranges of worlds to reason in. But it is an interesting idea.&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-4405501314486422414?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/4405501314486422414/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/06/modal-probability-stuff-my-personalized.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/4405501314486422414'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/4405501314486422414'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/06/modal-probability-stuff-my-personalized.html' title=''/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' 
src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-4160981547446076950</id><published>2008-06-06T09:14:00.000-07:00</published><updated>2008-06-16T12:41:07.531-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='probability'/><category scheme='http://www.blogger.com/atom/ns#' term='logic'/><title type='text'></title><content type='html'>In partial answer to the questions posed in the last post, I found &lt;a href="http://brian.weatherson.org/conprob.pdf"&gt;an interesting paper&lt;/a&gt; that shows how, given any logic (within certain broad boundaries), it is possible to construct a corresponding probability theory. In particular, this paper uses the technique to construct an intuitionistic probability theory.&lt;br /&gt;&lt;br /&gt;The paper also argues against abandoning what it calls the "principle of addition". The principle is simple: given non-overlapping states, A and B, the probability of A or B should be the probability of A plus the probability of B. (Formally: if P(A and B) = 0, then P(A or B) = P(A) + P(B). The paper gives a slightly different version.) This may not sound very worrisome, but it actually causes severe problems for the frequency interpretation of probabilities. (I was very surprised to learn this. The idea that probabilities are frequencies is logically incompatible with the standard rules of probability theory! I have not yet found a detailed explanation of the incompatibility; it is often mentioned, but rarely explained...) So although a large class of alternative probability theories are covered by the definition (as many as there are alternative logics), the definition is very conservative in some ways.&lt;br /&gt;&lt;br /&gt;I do not know much about what happens when we drop the principle of addition (also commonly called "finite additivity"). 
But in some sense, fuzzy logics belong to this field of possible notions of "probability". This idea (together with the fact that frequencies are actually incompatible with normal probability) makes me &lt;span style="font-style: italic;"&gt;slightly&lt;/span&gt; less antagonistic towards the fuzzy logic approach... but not much. I was previously of the mindset "Probabilities are the mathematically correct way to go, and anything else is just silly." Now I am of the mindset "Probabilities have a fair mathematical motivation, but there is room for improvement." So another theory &lt;span style="font-style: italic;"&gt;may&lt;/span&gt; be workable, but I'm asking for a very strong foundation before I'll stop clinging to standard probability (or if I finally settle on an alternative logic, some conservative generalization of probability such as the one above).&lt;br /&gt;&lt;br /&gt;In other news, I've found some very interesting material concerning intuitionistic logic. &lt;a href="http://web.cs.gc.cuny.edu/%7Esartemov/"&gt;Sergei N. Artemov&lt;/a&gt; has done some amazing-looking work. In particular, on &lt;a href="http://web.cs.gc.cuny.edu/%7Esartemov/accom.html"&gt;this page&lt;/a&gt;, he explicitly claims to have a logic that "circumvents the Incompleteness Theorem". I don't yet know exactly what he means, but the approach is described in &lt;a href="http://www.cs.gc.cuny.edu/%7Esartemov/publications/CADE99.ps"&gt;this paper&lt;/a&gt;, which I will read ASAP. (Actually, I'm reading &lt;a href="http://www.cs.gc.cuny.edu/%7Esartemov/publications/CFIS99-08.ps"&gt;this one&lt;/a&gt; first, because it is also very interesting...)&lt;br /&gt;&lt;br /&gt;I have also been running into some amazing math blogs lately. &lt;a href="http://www.scienceblogs.com/goodmath/"&gt;Good Math&lt;/a&gt; has a nice introduction to lambda calculus and intuitionistic logic (as well as plenty of other introductions to important areas). 
(Note: the blog moved from Blogspot to ScienceBlogs, but the old posts I'm referring to are the &lt;a href="http://www.scienceblogs.com/goodmath/"&gt;ones on Blogspot&lt;/a&gt;.) Also, &lt;a href="http://math.andrej.com/"&gt;Mathematics and Computation&lt;/a&gt; looks like a good read. There is a very interesting-looking post on &lt;a href="http://math.andrej.com/2007/09/28/seemingly-impossible-functional-programs/"&gt;seemingly impossible functional programs&lt;/a&gt;, which gives an algorithm for exhaustively searching through the space of infinite sequences of 1s and 0s. Unfortunately, I did not understand it very well on my first read, because the algorithms are written in Haskell, which I am unfamiliar with... I found this blog through &lt;a href="http://hunch.net/"&gt;Machine Learning (Theory)&lt;/a&gt;, another good blog.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-4160981547446076950?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/4160981547446076950/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/06/in-partial-answer-to-questions-posed-in.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/4160981547446076950'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/4160981547446076950'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/06/in-partial-answer-to-questions-posed-in.html' title=''/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' 
src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-2780042398110920448</id><published>2008-05-29T11:36:00.000-07:00</published><updated>2008-06-16T12:41:47.286-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='probability'/><category scheme='http://www.blogger.com/atom/ns#' term='logic'/><title type='text'></title><content type='html'>My next post was *going* to be an essay showing how the halting problem, Godel's theorem, Russell's paradox, and several other things can all be demonstrated using one method, called the diagonal method. Perhaps this post is still coming, but I have not felt inspired to write the whole thing out yet. For the moment:&lt;br /&gt;&lt;br /&gt;Some Important/Crazy Questions&lt;br /&gt;&lt;br /&gt;1. OK, so there's classical logic. In classical logic, everything is either true or false: not neither, not both. Classical logic has some oddities, which have led people to propose alternative logics. Two such logics are intuitionistic logic and paraconsistent logic. In some sense, these are opposite solutions to the same problem. (See &lt;a href="http://en.wikipedia.org/wiki/Paraconsistent_logic#Relation_to_other_logics"&gt;here&lt;/a&gt;.) I like to think of it this way: both logics admit that there are true cases, false cases, and border cases.&lt;br /&gt;
They differ in how they deal with the border cases. Intuitionistic logic calls them "neither true nor false". Paraconsistent logic calls them "both true and false". This choice results in very different treatments. This notion of a "border case" has surprisingly been formalized using topology, and that is where I got the metaphor. (See &lt;a href="http://en.wikipedia.org/wiki/Heyting_algebra"&gt;here&lt;/a&gt;.) 
But if there is such a nice relationship between the two, shouldn't they be usefully equivalent, or perhaps part of some larger system? A movie I recently watched made a random reference to the idea of a four-valued logic, one with "true", "false", "both", and "neither". How would this work? (Googling the idea seems to bring up random unrelated stuff.)&lt;br /&gt;&lt;br /&gt;2. Another alternative logic I have heard of is called "quantum logic". This is literally a logical system to work with the oddness of quantum mechanics. Strange! So my second question is: how does this logic work? How does it relate to paraconsistent and intuitionistic logics?&lt;br /&gt;&lt;br /&gt;3. For applications in artificial intelligence, it is very useful to attempt a seamless integration of probability and logic, rather than simply using logic to reason about probabilities. (This is of course the goal of the Alchemy system I recently mentioned, as well as many other AI systems.) The natural question is how these alternative logics generalize to alternative theories of probability. For intuitionistic logic and paraconsistent logic, possible probabilistic versions readily present themselves: intuitionistic probability theory would let probabilities sum to less than one, and paraconsistent probability theory would let them sum to more than one. I was amazed to hear that quantum logic has a simple probabilistic version: allow probabilities to have both real-number and imaginary-number components. (I heard this on the AGIRI mailing list. See &lt;a href="http://groups.google.com/group/opencog/browse_thread/thread/17cb53a969bbad86/3f97e7f6ff6ceca9?lnk=gst&amp;amp;q=re%3A+intensional#3f97e7f6ff6ceca9"&gt;here&lt;/a&gt;, second post in the thread.) How should all these things be interpreted?&lt;br /&gt;&lt;br /&gt;Googling all of this stuff comes up with many results, so perhaps I will have a post that answers some of these questions soon. 
Meanwhile, I am still convinced that some form of nonmonotonic logic that can reason about incomputable models of the world is the important direction to go.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-2780042398110920448?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/2780042398110920448/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/05/my-next-post-was-going-to-be-essay.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/2780042398110920448'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/2780042398110920448'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/05/my-next-post-was-going-to-be-essay.html' title=''/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-5334838363746677658</id><published>2008-05-13T12:01:00.000-07:00</published><updated>2008-05-13T12:16:12.695-07:00</updated><title type='text'></title><content type='html'>An interesting collection of videos--&lt;br /&gt;&lt;br /&gt;http://videolectures.net/Top/&lt;br /&gt;&lt;br /&gt;A great number of these are related to AI, although only a few are actually in the "ai" section. 
For example, there is a large computer vision section, and a larger one about data mining.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-5334838363746677658?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/5334838363746677658/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/05/interesting-collection-of-videos.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/5334838363746677658'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/5334838363746677658'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/05/interesting-collection-of-videos.html' title=''/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-948176441217462693</id><published>2008-04-16T10:32:00.000-07:00</published><updated>2008-04-16T10:51:35.948-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='progic'/><category scheme='http://www.blogger.com/atom/ns#' term='markov logic'/><category scheme='http://www.blogger.com/atom/ns#' term='AGI'/><title type='text'></title><content type='html'>More AGI Goodies&lt;br /&gt;&lt;br /&gt;&lt;a href="http://people.csail.mit.edu/kersting/plmr/"&gt;http://people.csail.mit.edu/kersting/plmr/&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;The above website has a list of AI systems, 
all created with the goal of integrating logic and probability theory. This goal is more interesting than it sounds at first. There are trivial integrations of logic and probability theory (such as using logic to reason about probability) and there are non-trivial integrations. Nontrivial integrations are of great interest, because they allow algorithms from both arenas to be applied, they allow easy "softening" of existing hard-logic knowledge bases that prove to be too inflexible (amazingly, &lt;a href="http://www.cyc.com"&gt;Cyc&lt;/a&gt; has started taking that strategy), and they open up new possibilities for learning and inference. All of the above systems are openly available (which is amazing!).&lt;br /&gt;&lt;br /&gt;Of particular interest is &lt;a href="http://alchemy.cs.washington.edu/"&gt;Alchemy&lt;/a&gt;. Alchemy has a deceptively simple-sounding and intuitive scheme: take a set of logical statements and attach a weight to each, representing how much the system should endorse each claim. The weights don't just range between 0 and 1 like probabilities; they can be as large as you like (infinity would mean absolutely true). Together, all of the propositions and their weights are transformed by the system into a standard type of probabilistic model (a Markov random field), which can be reasoned about using well-established algorithms. 
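&lt;br /&gt;&lt;br /&gt;As I understand the general scheme, each possible world gets a score equal to the exponential of the total weight of the statements it satisfies, and the scores are normalized into probabilities. Here is a toy sketch of that idea in Python. It is not Alchemy's actual interface; the two statements, their weights, and all the names are invented for the example.&lt;br /&gt;&lt;pre&gt;
```python
import itertools
import math

# Hypothetical weighted statements over two yes/no facts.
ATOMS = ["smokes", "cancer"]
STATEMENTS = [
    (1.5, lambda w: (not w["smokes"]) or w["cancer"]),  # "smoking implies cancer"
    (0.5, lambda w: not w["smokes"]),                   # "people tend not to smoke"
]

def score(world):
    """Exponential of the summed weights of the statements true in `world`."""
    return math.exp(sum(weight for weight, holds in STATEMENTS if holds(world)))

# Enumerate every truth assignment; only feasible for tiny examples.
WORLDS = [dict(zip(ATOMS, values))
          for values in itertools.product([False, True], repeat=len(ATOMS))]
TOTAL = sum(score(w) for w in WORLDS)

def probability(query):
    """Probability that `query` holds: normalized score of the matching worlds."""
    return sum(score(w) for w in WORLDS if query(w)) / TOTAL

print(probability(lambda w: w["smokes"] and not w["cancer"]))
print(probability(lambda w: w["cancer"]))
```
&lt;/pre&gt;A larger weight makes worlds that violate a statement less likely without ruling them out, which is the "softening" mentioned above. Once the model is in this form, the usual inference algorithms for such models apply.&lt;br /&gt;&lt;br /&gt;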
However, the group also has invented some amazing-sounding algorithms of their own...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-948176441217462693?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/948176441217462693/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/04/more-agi-goodies-httppeople.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/948176441217462693'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/948176441217462693'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/04/more-agi-goodies-httppeople.html' title=''/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-1083729396296607024</id><published>2008-04-11T10:49:00.000-07:00</published><updated>2008-04-11T11:01:01.226-07:00</updated><title type='text'></title><content type='html'>This is old news now, but the &lt;a href="http://www.agi-08.org/"&gt;First Annual AGI Conference&lt;/a&gt; occurred recently. This is fairly exciting in and of itself, but more exciting is that &lt;a href="http://www.agi-08.org/papers/#full"&gt;all the papers&lt;/a&gt; are available free online. 
This provides a very interesting snapshot of what the emerging "AGI Community" is and will be about.&lt;br /&gt;&lt;br /&gt;For those who don't know, AGI is a term created to distinguish general AI (aiming at human-level intelligence in a broad range of tasks) from narrow AI (which aims for high performance on some single, specialized task). Typical narrow-AI applications include chess playing programs, stock-market forecasters, face recognition software, and generally anything else AI is used for these days. A typical example of AGI (standing for artificial &lt;span style="font-style: italic;"&gt;general&lt;/span&gt; intelligence) might be a project aiming to make a "baby machine" (an AI that is good at nothing, but could be trained to do anything).&lt;br /&gt;&lt;a href="http://www.agiri.org/wiki/index.php?title=Artificial_General_Intelligence"&gt;&lt;br /&gt;http://www.agiri.org/wiki/index.php?title=Artificial_General_Intelligence&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-1083729396296607024?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/1083729396296607024/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/04/this-is-old-news-now-but-first-annual.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/1083729396296607024'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/1083729396296607024'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/04/this-is-old-news-now-but-first-annual.html' title=''/><author><name>Abram 
Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-1311080162356019396</id><published>2008-03-28T11:41:00.000-07:00</published><updated>2008-03-28T13:55:15.238-07:00</updated><title type='text'></title><content type='html'>Godel incompleteness is not a sufficient measure of completeness.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;To begin this argument, I need to talk about Skolem's Paradox, an intriguing result concerning the foundation of mathematics.&lt;br /&gt;&lt;br /&gt;http://en.wikipedia.org/wiki/Skolem's_paradox&lt;br /&gt;&lt;br /&gt;Skolem's Paradox shows that the logical constructions of set theory will always be insufficient, in a way slightly worse than Godel's Incompleteness. The problem is that any logical characterization of set theory fails to characterize sets completely. The same rule applies to many entities. For example, any attempt at a logical description of the notion "Turing Machine" will similarly fail. Also "number", "connectedness", "higher-order predicate", and "logic" itself, to name a few. All of these cannot be described logically.&lt;br /&gt;&lt;br /&gt;The basis for this claim lies in the fact that these entities cannot be described in first-order logic, which (if you haven't heard of it) is a restricted logic that doesn't fall prey to Godel's theorem (because it is not sufficiently powerful to represent its own rules, and so doesn't contain self-reference). 1st-order logic is often taught as if it were Logic, period. In fact, some have argued that this is the truth: after all, what is logic if there isn't a complete set of deduction rules? 
However, 1st-order logic is too weak to describe the majority of mathematical entities.&lt;br /&gt;&lt;br /&gt;Nonetheless, mathematicians try. What Skolem did was essentially call them on the error, using a nice theorem he and Lowenheim had developed. In fact, the same error is committed *whenever* a logic attempts to be anything more than 1st-order: all you're really doing is trying to make 1st-order logic talk about higher-order entities. (In a sense, this is *why* Godel's Theorem is true; that is, it's *why* the deduction rules are incomplete: they can't ever really do anything more than 1st-order logic.)&lt;br /&gt;&lt;br /&gt;But this leaves a big gaping hole. How DO we reason about mathematical objects? If logical constructions fail, what are we doing? Math SEEMS logical. The easy way out is to act like humans have a magical ability to talk about these things, while any formal construction is doomed to fail. Luckily, this is not the only way out.&lt;br /&gt;&lt;br /&gt;But the other way out is *much* more complicated! Sorry.&lt;br /&gt;&lt;br /&gt;The other way begins with the notion of "limit computation". This is an idea in theoretical computer science, which (sort of) allows a computer to calculate things that it cannot normally calculate. The prime example of something a computer cannot calculate is the halting problem.&lt;br /&gt;&lt;br /&gt;http://en.wikipedia.org/wiki/Halting_problem&lt;br /&gt;&lt;br /&gt;The halting problem is very much like Godel's theorem. The two seem to me to be the same theorem developed in different settings. 
Godel's theorem says that there will be truths about formal logic that formal logic cannot deduce, while the halting problem shows that there will be facts about computer programs that computer programs will be unable to calculate.&lt;br /&gt;&lt;br /&gt;Perhaps it seems odd that we can have a mathematically well-defined value but be unable to compute it, just as it seems odd that we have well-defined mathematical entities with no proper logical characterization. The intriguing thing, though, is that the halting problem is "limit-computable".&lt;br /&gt;&lt;br /&gt;http://en.wikipedia.org/wiki/Hypercomputation&lt;br /&gt;http://en.wikipedia.org/wiki/Zeno_machine&lt;br /&gt;&lt;br /&gt;A limit-computer is a computer that is allowed to revise its output. The program never halts; instead, the output "converges" to the correct answer.&lt;br /&gt;&lt;br /&gt;This convergence occurs in a finite amount of time, but the problem is that we don't know for sure *when* it has occurred; so to get guaranteed results, we've got to wait forever. But giving up the sureness of finite-time computation allows us to construct programs that do what others cannot.&lt;br /&gt;&lt;br /&gt;So, translate this back to logic. We can increase the number of entities we can reason about by allowing facts to be revised: we may make a wrong inference, so long as any wrong inferences will be corrected along the way. This is called a trial-and-error predicate.&lt;br /&gt;&lt;br /&gt;http://www.jstor.org/view/00224812/di985142/98p0302e/0&lt;br /&gt;&lt;br /&gt;But there's more! :)&lt;br /&gt;&lt;br /&gt;Just as no halting program can solve the halting problem, no converging program can solve the "convergence problem". 
Just as we can ask if a normal program halts, we can ask if a limit-program converges to an answer or keeps revising its answer forever.&lt;br /&gt;&lt;br /&gt;But just as we can solve the halting problem by resorting to limit-computers, we can solve the convergence problem by resorting to an augmented limit-computer that has access to another computer (meaning it can run as many limit-computations as it likes). Equivalently, we can give the computer as many infinities as it needs, rather than just one, to converge (which again amounts to it being able to run as many limit computations as it likes). In fact, we can give a computer larger and larger infinities of computation time, resulting in the ability to compute more and more things.&lt;br /&gt;&lt;br /&gt;The question arises: if we give the computer "as large an infinity as it likes", can we compute any mathematically well-defined value? I do not know the answer. But it *sounds* reasonable...&lt;br /&gt;&lt;br /&gt;If we're willing to grant this wild assumption, then we can again transfer this development to the logical domain. Essentially, we allow trial-and-error predicates to be defined in terms of other trial-and-error predicates. This gives up the guarantee that all fallible inferences will be eventually corrected (unless by "eventually" we mean "after arbitrarily large infinities of time have passed").&lt;br /&gt;&lt;br /&gt;Why is all this in a blog about AI?&lt;br /&gt;&lt;br /&gt;Well, if I'm right, then the "magical" quality that humans possess and formal systems do not is the ability to make fallible inferences. Any AI based in infallible logic would be unable to understand mathematics, but an AI that included a good fallible reasoning system would be able to. Perhaps this comes automatically with any good learning algorithm, but perhaps not; perhaps only learning systems with very specific properties are sufficient. This needs further research! 
One avenue is "nonmonotonic logic", which is very similar to the logic I'm proposing.&lt;br /&gt;&lt;br /&gt;http://en.wikipedia.org/wiki/Nonmonotonic_logic&lt;br /&gt;&lt;br /&gt;However, standard nonmonotonic logic doesn't have quite as much machinery as I want... I think it is equal to normal limit-computation, rather than the forms of computation involving larger infinities.&lt;br /&gt;&lt;br /&gt;But that's enough speculation for today.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-1311080162356019396?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/1311080162356019396/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/03/godel-incompleteness-is-not-sufficient.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/1311080162356019396'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/1311080162356019396'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/03/godel-incompleteness-is-not-sufficient.html' title=''/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-1094818817290426029</id><published>2008-03-07T18:47:00.000-08:00</published><updated>2008-03-07T18:51:49.258-08:00</updated><title type='text'></title><content type='html'>I recently read an article called "Complex Systems, 
Artificial Intelligence and Theoretical Psychology" by Richard Loosemore. The argument it makes goes something like this:&lt;br /&gt;&lt;br /&gt;1. A "complex system" (referring to complex systems science) is one that displays behavior that cannot be predicted analytically using the system's defining rules. (This is called, in the paper, a global-local disconnect: the local rules that play the role of the "physical laws" of the system are disconnected analytically from the global behavior displayed by the system.)&lt;br /&gt;&lt;br /&gt;2. The mind seems to be a complex system, and intelligence seems to be a global phenomenon that is somewhat disconnected from the local processes that create it. (Richard Loosemore argues that no real proof of this can be given, even in principle; such proofs are blocked by the global-local disconnect. However, he thinks it is the case, partly because no analytical solution for the mind's behavior has been found so far.)&lt;br /&gt;&lt;br /&gt;3. The mind therefore has a global-local disconnect. Richard Loosemore argues that, therefore, artificial intelligence cannot be achieved by a logical approach that attempts to derive the local rules from the list of desired global properties. Instead, he proposes an experimental approach to artificial intelligence: researchers should produce and test a large number of systems based on intuitions about what will produce results, rather than devoting years to the development of single systems based on mathematical proofs of what will produce results.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;I agree with some points and disagree with others, so I'll try to go through the argument approximately in order.&lt;br /&gt;&lt;br /&gt;First, I do take the idea of a complex system seriously. Perhaps the idea that the global behavior of some systems cannot be mathematically predicted seems a bit surprising. It IS surprising. 
But it seems to be true.&lt;br /&gt;&lt;br /&gt;My acceptance of this idea is due in part to an eye-opening book I recently read, called "Meta math: the quest for omega", by Gregory Chaitin.&lt;br /&gt;&lt;br /&gt;Chaitin's motivation is to determine why Godel's Theorem, proving the incompleteness of mathematics, is true. He found Godel's proof convincing, but not very revealing: it only gives a single counterexample, a single meaningful theorem that is true but mathematically unprovable. But this theorem is a very strange-sounding one, one that nobody would ever really want. Perhaps it was the only place where math failed, or perhaps math would only fail in similarly contrived cases. Chaitin wanted some indication of how bad the situation was. So he found an infinite class of very practical-sounding, useful theorems, all but a handful of which are unreachable by any formal logic! Terrible!&lt;br /&gt;&lt;br /&gt;Perhaps I'll go through the proof in another post.&lt;br /&gt;&lt;br /&gt;Chaitin shows us, then, that there really are global properties that are analytically unreachable. In fact, in addition to his infinite class of unreachable-but-useful theorems, he talks about a global property that any given programming language will have, but which is analytically unreachable: the probability that a randomly generated program will ever produce any output. This probability has some very interesting properties, but I won't go into that.&lt;br /&gt;&lt;br /&gt;I think the term Chaitin used for math's failure is somewhat more evocative than "global-local disconnect". He uses the term "irreducible mathematical fact", a fact of mathematics that is "true for no reason", at least no logical reason. Notice that he still refers to them as mathematical facts, because they are true statements about mathematical entities. As with other mathematical facts, it still seems as if their truth would be unchanged in any possible world. 
Yet, they are "irreducible": logically disconnected from the body of provable facts of mathematics.&lt;br /&gt;&lt;br /&gt;So, math sometimes cannot tell us what we want to know, even when we are asking seemingly reasonable questions about mathematically defined entities. Does this mean anything about artificial intelligence?&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Another term related to global-local disconnect, this one mentioned in the paper by Loosemore, is "computational irreducibility". This term was introduced by Stephen Wolfram. The idea is that physics is able to predict the orbits of the planets and the stress on a beam in an architectural design because these physical systems are computationally reducible; we can come up with fairly simple equations that abbreviate (with high accuracy) a huge number of physical interactions. If they were not reducible, we would be forced to simulate each atom to get from the initial state to the final result. This is the situation that occurs in complex systems. Unlike the "mathematically irreducible" facts just discussed, there IS a way to get the answer: run the system. But there are no shortcuts. This form of global-local disconnect is considerably easier to deal with, but it's still bad enough.&lt;br /&gt;&lt;br /&gt;It's this second kind of irreducibility, computational irreducibility, that I see as more relevant to AI. Take the example of a planning AI. To find the best plan, it must search through a great number of possibilities. Smarter AIs will try to reduce the computation by ruling out some possibilities, but significant gains can be made only if we're willing to take a chance and rule out possibilities we're not sure are bad. 
The computation, in other words, is irreducible-- we'll have a sort of global-local disconnect simply because if we could predict the result from the basic rules, we wouldn't need the AI to find it for us.&lt;br /&gt;&lt;br /&gt;So it seems Loosemore was right: intelligence does seem to involve complexity, and unavoidably so. But this sort of irreducibility clearly doesn't support the conclusion that Loosemore draws! The global-local disconnect seems caused by taking a logical approach to AI, rather than somehow negating it.&lt;br /&gt;&lt;br /&gt;But there is a way to salvage Loosemore's position, at least to an extent. I mentioned briefly the idea of shortcutting an irreducible computation by compromising, allowing the system to produce less-than-perfect results. In the planning example, this meant the search didn't check some plans that may have been good. But for more complicated problems, the situation can be worse; as we tackle harder problems, the methods must become increasingly approximate.&lt;br /&gt;&lt;br /&gt;When Loosemore says AI, he means AGI: artificial general intelligence, the branch of AI devoted to making intelligent machines that can deal with every aspect of the human world, rather than merely working on specialized problems like planning. In other words, he's talking about a really complicated problem, one where approximation will be involved in every step.&lt;br /&gt;&lt;br /&gt;Whereas methods of calculating the answer to a problem often seem to follow logically from the problem description, approximations usually do not. Approximation is hard. 
It's in this arena that I'm willing to grant that, maybe, the "logical approach" fails, or at least becomes subservient to the experimental approach Loosemore argues for.&lt;br /&gt;&lt;br /&gt;So, I think there is a sort of split: the "logical" approach applies to the broad problem descriptions (issues like defining the prior for a universal agent), and to narrow AI applications, but the "messy" approach must be used in practice on difficult problems, especially AGI.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-1094818817290426029?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/1094818817290426029/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/03/i-recently-read-article-called-complex.html#comment-form' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/1094818817290426029'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/1094818817290426029'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/03/i-recently-read-article-called-complex.html' title=''/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-863016898723588082</id><published>2008-02-15T09:18:00.000-08:00</published><updated>2008-03-28T16:16:08.913-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' 
term='probability'/><title type='text'></title><content type='html'>Well, I think I've come up with something of an answer. I want a prior to be interpreted as a frequency estimate on possible worlds. This sounds funny, because we can't possibly estimate such a frequency: we only live in one world. But this is actually just fine: we &lt;span style="font-style: italic;"&gt;shouldn't&lt;/span&gt; be estimating it, because it's our prior. It's what we &lt;span style="font-style: italic;"&gt;use &lt;span style="font-weight: bold;"&gt;to&lt;/span&gt; estimate&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;Anything more we learn, we learn using our prior. So we can't improve upon our prior. If you've got a bad prior, tough luck.&lt;br /&gt;&lt;br /&gt;A prior is an estimate of the frequency of alternative worlds. The perfect prior would contain all knowledge we ever needed; it would give our actual world-of-birth a probability of 1, and all other worlds, 0. But no two people are born to the same world, so evolution couldn't find this prior. (By the way, we could also view evolution as using a prior-- this prior is given to it by the very nature of chemistry and physics, and is not very good, but far better than it might have been.) So a slightly weaker and more useful notion of the perfect prior would be one that would do a fair job if all humans had it. Forcing all humans to have the same prior (which is close to being true) causes the perfect prior to have far more interesting structure (i.e. some learning occurs), although it still would have freakish foresight for things common to all humans (it would know the one true physics, for example).&lt;br /&gt;&lt;br /&gt;Since what I'm interested in is learning, I want some way of ruling out this freakish foresight: I want to talk about a &lt;span style="font-style: italic;"&gt;universal&lt;/span&gt; prior, one that will learn well no matter what the true physics turns out to be (and so on). 
I'm rejecting the Solomonoff prior because I think computability is too strict a requirement, but I also know that some restrictions are needed (otherwise there is no structure for the prior to take advantage of). What kind of a universal prior is this? And once I've figured that out, is it really of any use?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-863016898723588082?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/863016898723588082/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/02/well-i-think-ive-come-up-with-something.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/863016898723588082'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/863016898723588082'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/02/well-i-think-ive-come-up-with-something.html' title=''/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-164561675725572617</id><published>2008-02-09T13:29:00.001-08:00</published><updated>2008-02-09T13:29:51.358-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='probability'/><title type='text'></title><content type='html'>More on the Interpretation of Probability&lt;br /&gt;&lt;br /&gt;In my previous discussion, I have failed to distinguish 
between logical necessity and physical necessity. This distinction is critical for my analysis of requirement 2: a probability is a statement of uncertainty that could always be turned into certainty given more information. In my first post on the interpretation of probability, I talked about alternatives (A) and (B):&lt;br /&gt;&lt;br /&gt;(A): The universe has some set of uncaused events.&lt;br /&gt;&lt;br /&gt;(B): The universe does not have uncaused events.&lt;br /&gt;&lt;br /&gt;(A) implies that there are actual alternatives: the universe could have happened differently, if these uncaused events happened differently.&lt;br /&gt;&lt;br /&gt;(B) implies that there are no such alternatives. However, this comes at the cost of implying an infinite chain of causes. Worse, as I reasoned before, we should still desire a cause even of this infinite chain; an infinite past history is not enough, because we also need an explanation of why that particular infinite past is the one we have. Furthermore, we need a cause for this cause, and so on.&lt;br /&gt;&lt;br /&gt;Now comes the new stuff. When I stated this, I had in mind that the explanation for the infinite past was physics. This isn't quite sufficient: physics needs to be supplemented by a single completely-specified time-slice. Then a deterministic physics can specify the infinite past and future for us. So allowing this, we further ask: why does the universe contain this particular physics? To answer this, we create a meta-physics that specifies for us an explanation of why physics is the way it is. (My impression is that there is work in theoretical physics corresponding to this desire.) 
Again, in addition to a meta-physics, we need something like a physics-slice sufficient to specify the rest of physics (that is, a minimal number of variables that determine the rest via the meta-physical laws).&lt;br /&gt;&lt;br /&gt;(I should note that my concept of "meta-physics" is not the typical concept of "metaphysics": metaphysics is a philosophical pursuit, but meta-physics is essentially very abstract physics, which worries about things like how we might predict the existence of electrons from base principles.)&lt;br /&gt;&lt;br /&gt;The chain continues to meta-meta-physics, meta-meta-meta-physics, et cetera ad infinitum.&lt;br /&gt;&lt;br /&gt;All of these things are akin to physical necessity. Each fact is determined by some law of a corresponding physics (except for the determining slices). However, as the sets of rules get more abstract, it seems as if we will hit the ceiling of mathematical necessity. This may have happened already: the physical theory of A. Garrett Lisi (which from what I know looks like what I called a meta-physics) describes our  (meta-) physics as an algebra, so that the obvious alternatives are different algebras. (As I understand it: our physics obeys the algebra E8, but E8 does not fully determine our physics; in addition, we need information about symmetry-breaking. So E8 is the meta-physics and symmetry breaking information is the minimal set of variables needed to determine everything else about the physics. But don't take my word for fact.) The algebraic alternatives are governed by mathematical laws. The mathematical laws are governed by logic. So logic may be as little as 3 metas away! (E8 is meta-physics, so math is meta-meta, so logic is meta-meta-meta-physics.) 
This seems to stop the questionable infinite regress: it seems at least plausible for logic to be uncaused and unexplained.&lt;br /&gt;&lt;br /&gt;But even this doesn't tie us down to one possible world, namely because of all those minimal slices that we specify along the way. These do the real work of specifying everything; logic's being "at the top" is only a convenient trick. Presumably we could forever play the game of navigating plausible infinite regresses of explanation and plausible places for the regress to stop, but it seems that we could never find a real end to it. (This is particularly chronic if we start to question which logic is at the top, i.e. classical vs intuitionistic vs many other possibilities, and why that logic is at the top, i.e. do we need a meta-logical theory?)&lt;br /&gt;&lt;br /&gt;Therefore we are forced into (A)! There must be fundamentally unexplained things, and therefore actual alternatives.&lt;br /&gt;&lt;br /&gt;By the way, I don't think that exempts humanity from the investigation of the hierarchy I've described ultimately topped by logic: this hierarchy seems very important despite its ultimate futility (we cannot in principle explain everything, but we must still explain as much as possible).&lt;br /&gt;&lt;br /&gt;Concluding (A) does not force me to give up (1) or (2). In particular, I had previously assumed that it went against (2), because I took "more information" to mean causal (or explanatory) information. This is unnecessary; all I need to say is that all meaningful statements are either true or false. Thus the "more information" may merely be the fact itself. As an example, suppose that some physical events really are random: physical law dictates probabilities but not the definite outcome. 
What I'm saying is that there still is a definite fact of the matter, although not determined by physical law; if we knew everything there is to know, we would know the outcome.&lt;br /&gt;&lt;br /&gt;There are still some clarifications needed. I have dealt with issues arising from my first post, but there are also difficulties with what I said in my second post. Basically, in that post I trapped myself into a fundamental confusion that will always arise for those who try to do away with the concept of a Bayesian prior. This is hinted at in the infinite regress I get into when I try to take into account uncertainty about relevant information. At some point, if the probability estimate given is to be coherent, the person must invoke a prior belief concerning each probability. So to revise the conclusion I reached there:&lt;br /&gt;&lt;br /&gt;A probability (used by a person) is (1) a belief concerning a frequency, (2) a statement of uncertainty given limited information which can always be turned into certainty given more information, and (3) based on some prior belief (updated by the limited information available).&lt;br /&gt;&lt;br /&gt;The image here is that our limited information narrows down the space of possible worlds somewhat, and that we hold beliefs about the frequencies of events in the remaining possible worlds. So, for example, if we assert a 50% probability that our friend will buy a pink car, we mean that (given our prior) our information narrows us down to a space of possible worlds such that in about half our friend will get a pink car.&lt;br /&gt;&lt;br /&gt;I want to interpret the prior probability in a way consistent with the way I interpret the beliefs formed using the prior plus evidence. Otherwise, it doesn't seem like a full interpretation. 
I think the best way of doing this, in line with the idea that a probability is a frequency estimate, is to say that the prior is the person's estimate of the frequency of all possible worlds.&lt;br /&gt;&lt;br /&gt;I admit this sounds strange. Accepting (A) forces me to say that there are actual alternatives, meaning possible worlds. It seems somewhat reasonable to attach probabilities to these (by attaching probabilities to the uncaused facts). But to go a step further and call these frequencies? Does this make sense?&lt;br /&gt;&lt;br /&gt;I suppose I'm forced to leave this question open for now. I think the answer is yes, but my only reason for thinking so is the way it simplifies the whole scheme.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-164561675725572617?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/164561675725572617/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/02/more-on-interpretation-of-probability.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/164561675725572617'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/164561675725572617'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/02/more-on-interpretation-of-probability.html' title=''/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' 
src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-2194605149790926673</id><published>2008-01-19T16:30:00.000-08:00</published><updated>2008-01-19T16:31:52.487-08:00</updated><title type='text'></title><content type='html'>I've (almost) decided what I mean by completeness. The basic statement is this:&lt;br /&gt;&lt;br /&gt;"If a system is Godel-incomplete, it's as complete as it can be."&lt;br /&gt;&lt;br /&gt;So if a system is rich enough to run into Godel's Incompleteness Theorem, it's good enough (Apologies for the lack of two dots above the 'o' in the name, by the way.).&lt;br /&gt;&lt;br /&gt;This is probably not exactly true. Godel's Theorem can apply to different systems with somewhat varying levels of expressiveness, so long as whole numbers are representable. But it's at least a fair goal.&lt;br /&gt;&lt;br /&gt;Here are some more precise thoughts that followed from the idea.&lt;br /&gt;&lt;br /&gt;Because I'm focusing on learning/induction, statements are essentially probabilistic rather than definitely true or false. Bayesian networks, Markov networks, factor graphs, and other such things (which generally fall under the heading "graphical models") constitute a well-researched, popular, and practical field of machine learning. These probabilistic models can be thought of as the boolean algebra of probability theory (aka the propositional logic of probability). For those without a logic background, this means (very approximately) that we can make statements relating the truth of one idea to the truth of another, and can deduce more relationships that follow from these, but cannot reason about the internal structure of ideas.&lt;br /&gt;&lt;br /&gt;The next step up is called 1st-order predicate calculus. Predicate calculus allows us to make statements about entities, rather than merely making statements. 
Statements about entities have some internal structure, because we can refer to different entities. We can also refer to all entities at once (stating universal properties).&lt;br /&gt;&lt;br /&gt;If propositional calculus corresponds to the graphical models, what is the probabilistic counterpart to predicate calculus? The answer is Probabilistic Relational Models. (They can talk about the relationships of entities to each other!) These seem very much like a good thing, something I'd like to get my hands on and mess with.&lt;br /&gt;&lt;br /&gt;But there's a bit of a catch. The "1st order" part of "1st order predicate calculus" means that the things we can talk about are divided into two categories: basic entities, and predicates. We can make statements about entities using predicates, but we are not allowed to make statements about predicates themselves; we can only use them.&lt;br /&gt;&lt;br /&gt;This setup, along with the terminology "1st order", "2nd order", "3rd..." "4th..." and so on, is part of type theory. Type theory is a way of avoiding paradox. The 1st-order predicates talk about the base entities. There are 2nd-order predicates that can be used to talk about 1st-order predicates. The 3rd order talks about the 2nd. And so on. Logic was originally supposed to have all of the orders, but nowadays it's mostly restricted to a low number; 1st-order logic is highly typical, 2nd-order is occasionally used when 1st isn't enough, and on occasion 3rd is needed.&lt;br /&gt;&lt;br /&gt;1st order logic by itself is not strong enough to be subject to Godel's theorem. The theorem was originally written concerning logic with all the orders (but also applies to other systems). Thus, extending probabilistic relational models to higher-order logic is an obvious step to fulfill my requirement for expressiveness. However, we can do even better.&lt;br /&gt;&lt;br /&gt;Type theory has an obvious oddness to it. 
If we apply it to natural language, we get the following ideas:&lt;br /&gt;&lt;br /&gt;-We have nouns.&lt;br /&gt;-Nouns have properties, such as "white" and "heavy", and also relationships to other nouns, such as "under" and "attacking".&lt;br /&gt;-There are special words that talk about properties and relationships; for example, a property can be a physical property or various other types, it can be a color/weight/etc; similarly for relationships.&lt;br /&gt;-So on for higher types.&lt;br /&gt;-However, we must separate certain things that intuitively would be the same. For example, an entity can be "important". This is a 1st-order property. But if a property is "important", this is a 2nd-order property. These two cannot be merged. We must also have a 3rd-order importance for 2nd-order properties, a 4th-order importance, and so on. The same thing happens to many other intuitive concepts: they must be split into an infinite number of concepts, one for each level.&lt;br /&gt;&lt;br /&gt;Another problem that makes me reject type theory is that it cannot be described in its own terms. Any description of type theory must make simultaneous reference to all of the levels, which is not allowed.&lt;br /&gt;&lt;br /&gt;Luckily, there is an alternative. Type theory can be seen as a very restricted form of set theory. 1st-order properties can be seen as sets of entities; 2nd-order properties as sets of 1st-order properties; and so on. Relations are a bit harder to translate, but can be seen as sets of ordered lists. So it's simple to drop the restrictions imposed by type theory, giving a more intuitive higher-order logic.&lt;br /&gt;&lt;br /&gt;So can probabilistic models be made in this more general domain? I'm tempted to answer "sure". I don't know a whole lot about probabilistic relational models, and I don't know a whole lot about set theory, but (perhaps in my ignorance) I see no inherent contradiction between the two. 
It seems simple enough to extend the one into the other. In fact, I'd say it captures many of the ideas I've talked about in the past. (Shall we say, captures those ideas worth capturing?)&lt;br /&gt;&lt;br /&gt;One problem remains: what prior? (My impression is that probabilistic relational models sidestep this issue by talking about working algorithms more than theoretical ideals... perhaps one reason why they are a matter of great practical interest! But I tend to think the ideal is important as well.)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-2194605149790926673?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/2194605149790926673/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/01/ive-almost-decided-what-i-mean-by.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/2194605149790926673'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/2194605149790926673'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2008/01/ive-almost-decided-what-i-mean-by.html' title=''/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-1752136385315137510</id><published>2007-12-13T12:22:00.000-08:00</published><updated>2008-03-28T16:16:08.913-07:00</updated><category 
scheme='http://www.blogger.com/atom/ns#' term='probability'/><title type='text'></title><content type='html'>Let's take another look at the two statements I wanted to make about probability.&lt;br /&gt;&lt;br /&gt;1. A probability asserted by a human is really a believed frequency.&lt;br /&gt;&lt;br /&gt;2. A probability is a statement of uncertainty that could always be turned into certainty given more information.&lt;br /&gt;&lt;br /&gt;If the second statement is true, then any probabilistic model is inherently incomplete. This means that there are no single-event probabilities; any "believed frequency" I assert is wrong unless it's 0 or 1, because a single event can only happen or not, and it's meaningless to give a probability.&lt;br /&gt;&lt;br /&gt;What is meaningful, however, is giving a probability based on the limited information at hand. In doing so, we give our belief about the frequency that would result from looking at all situations that have the same givens; situations that would look the same from our perspective, but might turn out differently. I'd argue that this is basically what people intend when they give such probabilities.&lt;br /&gt;&lt;br /&gt;However, this also has some snags: I can't seriously assert that people believe such parallel situations always literally exist. People could give probability estimates in unique situations that never occurred before and may never occur again.&lt;br /&gt;&lt;br /&gt;To get past this, I reformulate my statement:&lt;br /&gt;&lt;br /&gt;People give probability estimates (1) based on the limited information at hand, (2) only using relevant information, and (3) ignoring potentially relevant information if it doesn't match any previous experience and so doesn't help predict.&lt;br /&gt;&lt;br /&gt;The first addition, that only information thought to be relevant is used, helps somewhat by reducing the previously crippling number of unique situations. 
Now, situations can be considered the same if they vary only in ways irrelevant to the event being predicted. The other addition, however, that potentially relevant information be ignored if it turns the situation into a unique situation, is the real clincher. It guarantees that the probability estimate is meaningful.&lt;br /&gt;&lt;br /&gt;But there are still problems.&lt;br /&gt;&lt;br /&gt;Clause 3 above may fix everything, but it's pretty problematic from the point of view of machine learning. A prediction made by ignoring some evidence should be given lower certainty. The math there is straightforward; we have some probability of the variable being relevant, and we have no idea how it affects things if it is. We therefore weight the two possibilities, adding our prediction of what happens if the variable isn't relevant to an even wash if it is. (This is an inexact statement, but whatever.) So the result is weaker for each ignored item.&lt;br /&gt;&lt;br /&gt;The probabilities-of-relevance here are necessary, but to fit them into the interpretation, they must be given the same sort of interpretation; in other words, they've got to be estimates based on the limited amount of relevant information and so on. The "so on" includes a potential infinite regress because we need to again weaken our statements based on any potentially relevant but new variables, and again this involves a probability of relevance, which again must be estimated in the same way, and so on. However, I'm not sure this is a problem. The reason I say this is that I see it as a series of progressively better estimates; in practice, we can cut it off at some point if we need to, and just use the coarsest possible estimates of the next-down level of probabilities. 
This could be reflected in a further modification:&lt;br /&gt;&lt;br /&gt;People give probability estimates based on as much of the relevant information at hand as they can quickly decide the consequences of.&lt;br /&gt;&lt;br /&gt;In other words, we don't compute instantaneously, so we may not use all the relevant information at our disposal, or may use some only partially (using it in the most important estimates, but making less important estimates more quickly by ignoring more information).&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;This basically seems to reconcile the statements I began with, (1) and (2) at the top of the page. However, I'm still not completely sure about (2). I still may want to assert that there are actual alternatives in the world.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-1752136385315137510?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/1752136385315137510/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2007/12/lets-take-another-look-at-two.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/1752136385315137510'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/1752136385315137510'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2007/12/lets-take-another-look-at-two.html' title=''/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' 
src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-8709576185193131857</id><published>2007-12-08T07:54:00.000-08:00</published><updated>2007-12-20T12:57:09.255-08:00</updated><title type='text'></title><content type='html'>Why is there no way to give these posts proper titles?&lt;br /&gt;&lt;br /&gt;Anyway, I've realized that in the past months, I drifted further and further into bayesianism with my thoughts on AI. That meant not only assuming that the bayesian math was the ideal that an implementation should approximate, but also putting more and more work onto Bayes Nets at the base level of the AI (and therefore spending most of my time figuring out how to extend bayes nets to get them to do what I wanted, rather than thinking about the entire system *around* the bayes nets).&lt;br /&gt;&lt;br /&gt;So, realizing this, I put the base-level statistic on the back burner for the moment and went back to theorizing about the system as a whole. (Coming up with a great base-level statistic is still a good idea, but not of ultimate importance, to some extent.)&lt;br /&gt;&lt;br /&gt;Here's where I was before I got distracted:&lt;br /&gt;&lt;br /&gt;0. (base level) The system uses some set of base-level patterns; these can be almost anything, or a combination of several things, but we must require that there aren't any "cheaters": patterns self-report how well they fit particular data, so some pattern could just claim to be perfect. In other words, the base-level statistic needs to be somewhat smart to start out. We apply the base-level statistic to the data, recording where each possible pattern succeeds and fails. (For example, if we're using neural nets, what we're doing is going through every possible neural net and recording how well it fits each spot in the data. 
In practice, we can't do that, so we focus on neural nets that we get via standard training algorithms on local data subsets.)&lt;br /&gt;&lt;br /&gt;1. (first wraparound) The base-level statistic is applied back onto itself: it is used to predict when particular patterns work well and when they work badly. It then is applied back on itself again, to see where these second-order patterns work well and badly. And so on. This forms a hierarchy of patterns, which is good for two things: (1) it can increase the expressive ability of the base-level statistic; for example, if the base-level only looks for correlations between two variables, the hierarchy represents larger correlation sets; (2) it can increase the compressive ability of the models; for example, if the base-level simply records common sequences that can get arbitrarily long, there's no increase in expressive ability, but there is a great potential increase in compression of models; essentially, we're adding the ability to represent repetition in these sequences. A "hierarchical pattern", as opposed to a "base-level pattern", is a pattern that can be expressed using this method.&lt;br /&gt;&lt;br /&gt;2. (second wraparound) The system also records a global average success for each pattern. (To be more bayesian, we *could* instead keep a density function representing current belief about each pattern's probability.) These records are examined for patterns, particularly patterns in the hierarchies formed by the 1st wraparound. What are the common features of successful hierarchical patterns? For example, do several hierarchical patterns look similar to each other except for isolated differences? Can we make a fill-in-the-blank type form that summarizes them? What typically goes in the blanks, with what probabilities? I typically call this new sort of pattern a "metapattern".&lt;br /&gt;&lt;br /&gt;3. 
(third wraparound) When the system learns a fill-in-the-blank style metapattern, then the possible blank-fillers should be regarded as now being "the same" in some respect. The third wraparound specifies that whenever this happens, a new property should be invented for this purpose and attached to the elements in the data that satisfy it. This new property can then be used to form new base-level patterns. This allows those things to be grouped together for future purposes, so that the system can learn new patterns that reuse the same type of variable elements without having to go through the whole process of learning a metapattern again. Think of madlibs that reuse blank types such as "noun" and "verb" over and over again. These new types of pattern are what I call "variable patterns".&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;This system has a few problems, but the one I'm most concerned about is the difference in style of the 3rd wraparound. To me, it doesn't seem to fit. It's not really a "wraparound", it's just a statement of availability of a particular fact to the base level. In fact, the entire series of wraparounds is a statement of where we &lt;span style="font-style: italic;"&gt;can&lt;/span&gt; look for patterns. Instead, why not go from the opposite direction: let the system look everywhere, except where we explicitly state that it &lt;span style="font-style: italic;"&gt;can't&lt;/span&gt;?&lt;br /&gt;&lt;br /&gt;"Anywhere" means: when we record how well a pattern fits on a particular location, we add that as part of the data. The global record of the strength of different patterns counts as part of the data. 
The definition of each pattern counts as part of the data.&lt;br /&gt;&lt;br /&gt;From this, the third wraparound emerges by itself: so long as we can draw a connection between a pattern-instance in the data and the pattern's abstract record of success, metapatterns will be seen as directly related to the base-level data, and so participation in a metapattern can be seen as a base-level property.&lt;br /&gt;&lt;br /&gt;The first problem that occurs is that the system could find many useless correlations here, for example the obvious correlation that will exist between the definition of a pattern and what sort of data it has a high score at. So we add our first restriction:&lt;br /&gt;&lt;br /&gt;1. Anything that we can already logically deduce, we are not interested in learning probabilistically.&lt;br /&gt;&lt;br /&gt;This restriction cannot be enforced completely, because we'd have to completely determine everything that we can or can't logically deduce; but that's fine. Instead, the system should ignore relationships that are logically &lt;span style="font-style: italic;"&gt;obvious&lt;/span&gt;: things that it can quickly deduce. (This will be dependent on many things, but the sensitivity is a good thing, not a bad thing. The fact that we must draw an arbitrary line for how long a thing takes to deduce is also forgivable in my opinion.)&lt;br /&gt;&lt;br /&gt;So far, I haven't seen the need for any additional restrictions.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;The problem now is that I'm not sure of the generality of this method. What class of patterns is it capable of learning? Since I abandoned the Turing machine as my most general class of patterns, I don't even know how to define the class of patterns I'm concerned with. I know "statements in predicate calculus" basically covers it, but there are several concerns:&lt;br /&gt;&lt;br /&gt;1. Is 1st-order predicate calculus good enough? 
Or do I need higher-order predicate calculus, in which case I'd (personally) just go to using axiomatic set theory?&lt;br /&gt;&lt;br /&gt;2. What atomic statements do I need? Do I need equality? Probably. Do I need arbitrary predicate operators that can be defined partially (by making statements concerning their behavior), or can all needed predicates be composed of existing properties in the data (embedded in quantified boolean-logic statements)?&lt;br /&gt;&lt;br /&gt;Of course, I'm not considering literally using predicate logic; what I mean is not "do I need X in predicate logic?" but "do I need something in my system that would translate to X if I translated everything to predicate logic?".&lt;br /&gt;&lt;br /&gt;Before I can even address those concerns, however, another problem jumps out: how can I represent quantifiers to begin with? First, the quantifying I'm doing is probabilistic where predicate logic is not. That seems fine, though. The real problem is that the system sort of implicitly quantifies. Patterns state what variables they quantify over, but are silent about what variables they might be holding still. 
The system then takes the set of variables being quantified over and sort of roughly quantifies over them all at once; but in predicate logic, the order in which we quantify is important, so the two approaches are at odds.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-8709576185193131857?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/8709576185193131857/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2007/12/why-is-there-no-way-to-give-these-posts.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/8709576185193131857'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/8709576185193131857'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2007/12/why-is-there-no-way-to-give-these-posts.html' title=''/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-2715014452138537053</id><published>2007-11-17T10:59:00.001-08:00</published><updated>2008-03-28T16:16:08.914-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='probability'/><title type='text'></title><content type='html'>An Interpretation of Probability&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;You know that something is seriously wrong when, in a science, there is as much serious talk about correctly interpreting a theory as 
there is work attempting to extend the theory or propose new theories. I personally think it is ridiculous that  so much effort is put into "interpreting" quantum mechanics. There's the many-worlds interpretation, the ideas about what counts as an "observer", questions of whether or not a particle actually has a definite speed and location if they can't even in principle be observed simultaneously (what can't be observed doesn't exist), arguments about what it means for an event to be probabilistic, and so on. I'm frustrated both because these don't seem to be real questions (or where they are real, we should experimentally test them rather than argue), and because I want to claim obvious answers myself.&lt;br /&gt;&lt;br /&gt;But enough of criticizing a science I know little about. Another field, which I know somewhat more about, is experiencing a similar problem: probability theory. The argument between Bayesianism and Frequentism, two separate schools of thought that have their own very different mathematics for statistics and machine learning, is essentially an argument about the meaning of probability.&lt;br /&gt;&lt;br /&gt;Frequentists interpret probability based on the idea of random variables and physical random processes. Probability is defined as a statistical frequency; a probability for an event is the ratio at which that event will occur if we sample for a sufficiently long amount of time. For example, if we flip a coin enough times, we can determine to any desired degree of accuracy the actual ratio of heads to tails for that coin. This seems intuitive, right? This notion turns probability into a solid, objective quantity. In fact, frequentist probabilities are often called "objective probabilities".&lt;br /&gt;&lt;br /&gt;Bayesians, however, disagree. For a bayesian, probability can also represent a degree of belief. 
This distinction is particularly important when an event only occurs once, or when we're talking about the probability of a statement, which can only be actually true or actually false. Frequentist probability cannot be used here, because there is no frequency to measure. For example, suppose we're back before the time of Magellan. You ask me my opinion on the shape of the Earth. I, being well-versed in the popular philosophy of the time, suppose that the earth is round; but I'm not sure. So I give it a 95% chance. From a frequentist view, this is nonsense. The earth is either flat or round; it isn't a physical random process. The frequency of a flat earth is either 1 or 0.&lt;br /&gt;&lt;br /&gt;At this point, you're probably siding with the frequentists. The earth does not have a 5% chance of being flat-- sticking an exact number on it sounds silly, even if I want to say that there's a possibility. But let's consider an example you might sympathize with a little better. Suppose you have a friend who is purchasing a vehicle for the first time. You know that your friend (for some reason) refuses to own any car that is not either pink or purple. However, you don't know of any bias between the two colors, so before you see the car, you can do no better than assign a 50% probability to each color. Upon showing you the car, which is pink, your friend explains that the decision was made based on alphabetical order; pink comes before purple.&lt;br /&gt;&lt;br /&gt;Now notice-- you had no way of knowing which of the two colors your friend would choose. It seems very reasonable to assign equal probabilities to each. However, such a probability does not seem to be a frequency-- if we rewound history to "try again", the friend would always reason the same way, and the new car would always be pink. Probability can only be interpreted as degree of belief here; and that's exactly what bayesians want to allow. 
In contrast to the frequentist objective probabilities, bayesian probabilities are called "subjective probabilities". (It is fairly common practice to admit both kinds of probability, so that one might say "The coin has an objective probability of landing on heads [since it could land on either heads or tails], but a subjective probability of being counterfeit [it either is or it isn't]".)&lt;br /&gt;&lt;br /&gt;The battle between bayes and frequency has been a long one, but that's not exactly what I'm going to talk about. I'd like to talk about my own grapplings with the interpretation of probability. While I am (at the moment) firmly bayesian in terms of the mathematics, I actually agree with both interpretations of probability; I think that all probabilities represent a belief about a frequency. But I also want to be able to say a few other things about probability, which I'm not sure are entirely consistent with that view.&lt;br /&gt;&lt;br /&gt;Here's what I want to say about the meaning of probability:&lt;br /&gt;&lt;br /&gt;(1) All probabilities are a belief about a frequency.&lt;br /&gt;&lt;br /&gt;-Even the most obviously frequentist example of probability, coin flipping, can be thought of in this way. We never know the exact distribution of heads to tails; but we use the 50/50 estimate regularly, and it causes no problems.&lt;br /&gt;&lt;br /&gt;-Many probabilities that seem to be fundamentally bayesian can be explained with the concept of possible universes. The frequency of a flat earth in our universe is 0. But the frequency of a flat earth in all possible universes may be higher. Even if it's not, because we can never know probabilities for certain, but only estimate them, it's entirely reasonable for someone who does not have the benefit of modern science to estimate something like 5% of all alternative universes have a flat earth. The estimate can be improved later. 
(Because we can only ever visit one universe, our estimate of the distribution of possible universes will always be quite crude; so probabilities based on such estimates will have a "subjective" feel to them. But I claim that they are not really a different kind of probability.)&lt;br /&gt;&lt;br /&gt;-When counting the frequencies, we only consider universes that match ours sufficiently; in the flat earth example, we consider alternative universes that match everything we *know* about earth, and estimate the frequency of something we *don't* know: whether the earth is flat or round. Similarly, in the example of the pink car, we consider universes that match things we know, but (being only human) are unable to use facts we don't know (such as the fact that our friend loves using alphabetical order to resolve difficult decisions). (This is called a "conditional probability"; if you think it doesn't sound very well-defined, I assure you that it is a very mathematical notion, which has been rigorously founded in logic.) This explains another reason that bayesian probabilities seem "subjective": different people are often considering different evidence (pinning down different facts) when giving probabilities.&lt;br /&gt;&lt;br /&gt;(2) All probabilities are really just statements about uncertainty.&lt;br /&gt;&lt;br /&gt;-I'm claiming here that the concept of a "physical random process" is philosophically unnecessary. When an event like a coin flip seems random, it is actually quite deterministic; we just don't know all the relevant factors involved. Even if we do know all the relevant factors, we aren't necessarily able to calculate their influence on the final result (at least not quickly).&lt;br /&gt;&lt;br /&gt;-Whenever we assign something a probability, then, it's simply because we don't know all the relevant facts (or haven't calculated their influence). 
Quantum mechanics, for example, gives us probabilistic laws for the behavior of particles; I'm claiming that the probabilistic nature of these laws shows that they aren't the fundamental laws, and that our predictions of quantum events could in principle be improved with more information (or perhaps with a better method of calculating).&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;To be perfectly honest, I'm less certain about the second claim. I think both claims seem to be what's intuitively correct; but the two seem to contradict each other.&lt;br /&gt;&lt;br /&gt;If probabilities are statements of uncertainty, can they also be frequency estimates?&lt;br /&gt;&lt;br /&gt;It seems possible in at least some situations. For example, when flipping a coin, we can create a frequency estimate for both heads and tails, but still claim that if we knew more, we would be able to calculate ahead of time which it would be. In this case, the frequency is based on common known variables from toss to toss (mainly the coin involved), whereas the uncertainty is caused by the unknown variables (the physics of the toss). But this doesn't necessarily solve everything.&lt;br /&gt;&lt;br /&gt;The first conflict that I see is that the idea that there are other possible worlds, necessary to my argument for (1), seems to be ruled out by (2). If anything can in principle be narrowed down to one possibility by knowing the relevant facts, then there can be no actual alternatives! Alternatives are illusions, which can be eliminated by the diligent scientist.&lt;br /&gt;&lt;br /&gt;Never mind the conflict with (1)! Is this view even self-consistent? (2) asserts that any question can be answered by looking at the relevant information. But this implicitly assumes that every event has a cause. The result is an infinite regress of causes. 
(While this isn't a direct inconsistency, it is reason to worry.)&lt;br /&gt;&lt;br /&gt;So, I'm willing to back down on (2), instead stating two possibilities:&lt;br /&gt;&lt;br /&gt;(A) The universe has some set of first causes. Therefore, there exist real alternatives; the universe actually could have been different. The probability of these different universes is unknown and unknowable, because we can only observe one universe, but it isn't an unintelligible concept. (1) holds but (2) does not hold if the event we're examining happens to be a first cause.&lt;br /&gt;&lt;br /&gt;(B) The universe has no uncaused events. There is an infinite chain of causes leading to each event. (2) holds, but (1) does not always hold: the universe is deterministic, so probability cannot always be interpreted as a belief about frequency. Specifically, (1) doesn't work when we would resort to alternative universes because we don't have multiple instances of the given event in our universe.&lt;br /&gt;&lt;br /&gt;I'm a bit torn between the two, but I prefer (A). I don't have anything against the infinite past implicit in (B), except that the entire timeline has an uncaused feel to it. Why this timeline rather than another? If (B) is correct, then there is some reason. OK. Sure. But (B) also states that this reason has a cause, and its cause has a cause, and so on. Another infinite causal chain, over and above time. But what caused that infinite chain? And so on. The idea of infinite causes is a bit unsettling.&lt;br /&gt;&lt;br /&gt;So is the idea of an uncaused event, to be sure. But I like being able to say that there are real alternatives, and to say that, it seems I've got to admit uncaused events.&lt;br /&gt;&lt;br /&gt;Notice that these problems melt away if we restrict our talk to manageable islands of reality rather than entire universes. 
Both (1) and (2) hold just fine if we don't run into an uncaused event or an event so singular that none like it have ever occurred before or will occur again.&lt;br /&gt;&lt;br /&gt;Can we ever know what's really true? Can we experimentally differentiate between (A) and (B)? Probably not. Then why does it matter?&lt;br /&gt;&lt;br /&gt;Maybe it doesn't.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-2715014452138537053?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/2715014452138537053/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2007/11/interpretation-of-probability-you-know.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/2715014452138537053'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/2715014452138537053'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2007/11/interpretation-of-probability-you-know.html' title=''/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-4503721881183721848</id><published>2007-11-14T06:47:00.000-08:00</published><updated>2007-11-14T09:01:02.994-08:00</updated><title type='text'></title><content type='html'>In case anybody out there is wondering what I'm up to.&lt;br /&gt;&lt;br /&gt;I've started an AI club at my university, which is great. 
I mean, we actually didn't have one! Shouldn't every self-respecting computer science department have some crazy AI people stashed in a corner trying to change the world?&lt;br /&gt;&lt;br /&gt;Well, we're not there yet-- our short-term goal is to start a small AI competition in a game called Diplomacy. Turns out there's a pre-existing framework for it:&lt;br /&gt;&lt;br /&gt;www.daide.org.uk&lt;br /&gt;&lt;br /&gt;Also, I've been looking at something that goes by various names, including "competent optimization". I'd call it "intelligent search". Based on genetic algorithms, the idea is to think while searching; more specifically, based on what's been seen so far, attempt to learn the characteristics of good solutions, thus guiding the search.&lt;br /&gt;&lt;br /&gt;http://www.cs.umsl.edu/~pelikan/boa.html&lt;br /&gt;&lt;br /&gt;http://metacog.org/doc.html&lt;br /&gt;&lt;br /&gt;The idea of intelligent search is something I've thought about before, so I was both pleased and disappointed to see it already being researched. This means I can't say I invented it! To do meaningful research in the field, I've got to up the quality of my ideas :).&lt;br /&gt;&lt;br /&gt;Of course, really, I haven't done any "meaningful research" at all yet. So it goes. Part of my problem is that I don't focus on one thing well. Also, I seem to like getting started far more than finishing. Coming up with ideas is more exciting than implementing them.&lt;br /&gt;&lt;br /&gt;To implement competent search, or at least to implement the Bayesian Optimization Algorithm (which I guess is the current best), I'll need a Bayes Net framework. 
There are many to choose from, but here's the one I picked:&lt;br /&gt;&lt;br /&gt;http://www.openbayes.org/&lt;br /&gt;&lt;br /&gt;Probably the Bayes framework in Matlab is the best (except for speed issues), but Matlab costs money (although this Bayes framework for it is free and open source).&lt;br /&gt;&lt;br /&gt;http://www.cs.ubc.ca/~murphyk/Software/BNT/bnt.html&lt;br /&gt;&lt;br /&gt;So I've been having a lot of thoughts about possible improvements to these intelligent search methods. Mainly, I'm trying to figure out what the theoretically perfect way of doing it would be-- that is, assuming that the only thing that takes computational resources is the actual testing of a point, and so that we can do any complicated analysis we like to decide what point to pick next.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-4503721881183721848?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/4503721881183721848/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2007/11/in-case-anybody-out-there-is-wondering.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/4503721881183721848'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/4503721881183721848'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2007/11/in-case-anybody-out-there-is-wondering.html' title=''/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' 
src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-1111791010346313103</id><published>2007-10-09T13:50:00.000-07:00</published><updated>2007-10-09T14:36:59.139-07:00</updated><title type='text'></title><content type='html'>Why Turing Machines Aren't the Most General Type of Model&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;I realized recently that, while it seems reasonable to use "any computer program" as the most general type of model for the world (particularly for computer-based artificial intelligence), it certainly isn't the most general model type. Computer programs represent computable models. A more general class of models is definable models.&lt;br /&gt;&lt;br /&gt;An example of something that's definable but not computable: all numbers that can be printed out by a program that's less than 100 kb in size. The reason this can't be computed? You might think at first that you could in theory generate all programs less than 100 kb in size, run them, and record the outputs. But you can't! This is because of the "halting problem": there is no general way of knowing how long a program will take to run, or whether it will ever stop. Many of the programs you generate will contain infinite loops, and so will never produce any output (or will continue to produce output forever-- let's say we want to discard the output in such a case, because we don't know what to do with infinitely long numbers). So, basically, while you might expect the process to merely take a long time, it will actually take forever: you will never finish running the programs, so you'll never get the completed list of numbers. 
And if you stop early, there's always the chance that you're missing a number or two (since there is no general way to distinguish a program that's on an infinite loop from one that is just taking its time).&lt;br /&gt;&lt;br /&gt;So, why is this important? Why do we want an AI to be able to make models of the world that are definable but not computable? Because physicists and mathematicians do it all the time, so it is obviously a basic human faculty, necessary to the understanding of the universe we live in (unless you think physicists and mathematicians have been somehow warped by their disciplines into creating these actually meaningless models).&lt;br /&gt;&lt;br /&gt;One point of irony: because of the halting problem, most AI theories that restrict the AI's domain of models to the computable are themselves uncomputable (they must be approximated, in practice). This means that although a human could understand the general guiding principle behind the design, any AI based on such a theory would be incapable of comprehending the concepts behind its construction!&lt;br /&gt;&lt;br /&gt;Does this mean that computers are fundamentally unable to think at a human level? Actually, no. An AI that could make any model representable by formal logic, rather than any model representable by Turing machine, would be able to reason about any definable model. It would be able to understand both the principles behind a Turing-machine-based AI, and those behind itself. (Serious anti-AI folks would invoke Gödel's Theorem here, though, and claim that &lt;span style="font-style: italic;"&gt;any&lt;/span&gt; formal system is incapable of fully comprehending itself. 
This is true, but also appears to apply to humans, so isn't as concerning.)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-1111791010346313103?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/1111791010346313103/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2007/10/why-turing-machines-arent-most-general.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/1111791010346313103'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/1111791010346313103'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2007/10/why-turing-machines-arent-most-general.html' title=''/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-979569516731674680</id><published>2007-09-05T11:37:00.000-07:00</published><updated>2007-09-05T12:18:11.637-07:00</updated><title type='text'></title><content type='html'>Some thoughts on the hidden-variable puzzle.&lt;br /&gt;&lt;br /&gt;In principle, the problem is already solved-- we already know the search space, and could (again, in principle) simply look through all possible hidden variable configurations. All we need is a reasonable prior. (This is not entirely trivial, but we'll save it for later.) This search space, however, is huge. 
It seems likely that there is some redundancy here; and even if there isn't, there should be some general principles that tend to yield correct results-- hidden variables that are typically useful. This would allow the continuation of the paradigm I've followed thus far: search for the simplest patterns quickly, but search slowly for the general case.&lt;br /&gt;&lt;br /&gt;The fundamental question here is when to check for a hidden variable. An interesting way to rephrase this: when do we accept something as a general pattern, and when do we instead think something requires an explanation? For example, if all squirrels had a striped fur pattern, most people would accept that as part of what a squirrel is. (Evolutionary biologists would be the minority asking for an explanation.) However, if a particular box made a strange tinkling noise when shaken, we would probably wonder why.&lt;br /&gt;&lt;br /&gt;I think some of what's going on is that we accept patterns when we have little hope of discovering the hidden mechanisms, but question them when we have some idea of the structures involved and so can hope to reach a solution. In the case of the box, we know that there is something hidden inside it, and that what's inside determines the noise. This is not the phenomenon I'm interested in. In this case, we already know that there is a hidden variable, and are merely learning about it. How do we come up with the hidden variables in the first place?&lt;br /&gt;&lt;br /&gt;One strategy might be to assume that each basic variable is "masking" a hidden variable. This means that, essentially, we copy over the visible network to make a hidden network that we assume at first is fairly similar in structure. We then make reasonable modifications on the hidden network to improve its predictive power. The first type of modification would be to reduce static. 
Aberrations in the visible variables that have no causal effect (they do not seem to modify their surroundings) do not need to be a part of the shadow network. This may reduce the amount of past-dependence needed; where before the state a few steps ago might be necessary to predict the current state because static blotted out what was in between, now it is evident that the state depends on hidden steps that were masked by noise. Space-dependence may reduce for similar reasons. This is good, but so far everything I've mentioned can be done without a shadow net, by just recognizing the uncertainty in the initial variables. A more interesting effect would be the possibility of a different sort of variable pattern: a single state in a hidden variable might take the role of multiple states in the visible variables. (To this end one might purposefully experiment with shadow networks containing variables with fewer states than the actual network.) But the most promising results should come from changes to the structure of the hidden network. Several closely related variables might be dependent on a single hidden variable. Visible variables might not depend directly on their shadow-pair, but instead be related to the shadows of nearby relatives.&lt;br /&gt;&lt;br /&gt;Not only do the hidden variables represent variable patterns, but they do so in a way that is explicit, rather than implicit. This satisfies the explicit-or requirement. How can the explicit self reference be constructed?&lt;br /&gt;&lt;br /&gt;Here I am, searching for turing-completeness again. The truth is, there are multiple meanings of turing-complete. More precisely, something can be turing-complete concerning different domains; it can be complete from different angles. For example, the knowledge representation could be turing-complete but the learning algorithm might not be. Turing-complete could mean a visible or hidden turing machine. 
Most importantly for the current situation, a hidden turing machine could manifest its pattern in different ways. The form of turing completeness that I considered before (calling the calculations "scratchwork") is turing completeness of the function from the past to the present. The present may be determined by arbitrary calculations involving the past. This may or may not be equivalent to the more standard idea of a turing-complete learner (from the works of Solomonoff and Hutter). The assumption there is instead that the world is the output of an unknown turing machine. The scratchwork idea makes more sense to me, because that sort of knowledge is more generalizable; the same rule is applying itself in many places (throughout time). (Actually, since the mechanism works just as well for space, it isn't specifically limited to time.) In the more standard definition, the only repetition is the recursive calculation of the end result; this is not necessarily observable in any dimension of the data, existing instead only "in theory" (reminiscent of the way scientists sometimes claim that quarks or strings or extra dimensions are "mathematical abstractions" rather than physical fact). Learning this sort of turing-complete model sounds harder (although both are technically incalculable thanks to the halting problem), and so I would think that their sense of turing-complete is "stronger" than mine (more general).&lt;br /&gt;&lt;br /&gt;Stronger doesn't necessarily mean better, however. Another interesting sense in which an AI can be turing-complete is that it can learn that the world behaves as any turing machine if taught properly. This sort of AI is more or less useful depending on what "taught properly" means; if an extremely narrow definition is chosen, any programmable computer satisfies the definition. Slightly more interesting is the definition "teaching = explaining how the world works in some formal notation". 
This does not include all programmable computers because computers accept programs, not descriptions of the world. A system satisfying this stricter definition needs a planning/deduction system, to turn the knowledge into behavior. If teaching properly involves only exposure to a learnable world, then it's back to Solomonoff. But somewhere between Solomonoff and planning systems, there might be some useful possibilities. I've gone on too long about this, so I'll just say that John Andreae's PurrPuss learns turing-complete models if taught with something called "soft teaching", which is similar to a parent directing a child's actions (telling the child what to do, but without telling it why). This might be another direction in which I can do a more exhaustive search of the easier learning domain, beyond merely "turing complete": quickly learn weaker senses of turing-complete, but still try to learn Solomonoff-style with a very slow search.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-979569516731674680?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/979569516731674680/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2007/09/some-thoughts-on-hidden-variable-puzzle.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/979569516731674680'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/979569516731674680'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2007/09/some-thoughts-on-hidden-variable-puzzle.html' title=''/><author><name>Abram 
Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-7617286525034787027</id><published>2007-08-17T07:19:00.000-07:00</published><updated>2007-08-17T07:25:17.801-07:00</updated><title type='text'></title><content type='html'>Over the summer, I've been working on a more rigorous mathematical underpinning for the system. In particular, I wanted a way to measure the significance of aggregate objects in a totally correct way, rather than the many ad-hoc methods I'd been coming up with previously. I've come up with something fairly satisfying, but the details involve taking integrals over multiple variables so I won't try to describe them here.&lt;br /&gt;&lt;br /&gt;Anyway, that explains the neglect of this blog.&lt;br /&gt;&lt;br /&gt;But now, I've come up with a much more satisfying way of extending the theory to learn turing-complete patterns.&lt;br /&gt;&lt;br /&gt;The system can learn the behavior of the "internals" of a turing machine, because a turing machine on the inside is a simply-implemented thing. The power of computation comes from applying simple rules over and over to get possibly complicated results. As I've said before, the problem comes when the internals of such a complex process are hidden from view, and the system must learn the resulting behavior without being able to see the causes. In this case, my system by itself may not be strong enough to learn the general pattern-- it could only memorize special cases.&lt;br /&gt;&lt;br /&gt;The new solution to this is based on the idea of a "hidden variable". 
Hidden variables come into play in what's called a "hidden Markov model", sometimes in Bayes net learning, and in other places (I imagine). A hidden variable is one that is assumed to exist, but whose value cannot be directly observed through the senses, only inferred from them.&lt;br /&gt;&lt;br /&gt;Hidden variables are obviously what's needed to make my system representationally powerful enough to define any pattern. The question is, when should a hidden variable be introduced to explain an observation? The system should be able to learn not just individual hidden variables, but infinite spaces of them; this corresponds to the theoretically infinite tape of a turing machine, and also to the very human tendency to reason about space as an essentially infinite thing. When thinking about the world, we don't assume only areas we've been to exist: we freely imagine that there is some area at any distance, in any direction. This corresponds to an infinitely extendable system of hidden variables.&lt;br /&gt;&lt;br /&gt;Actually, the "infinitely extendable" part isn't what's hard (although it leads to some obvious hard computational issues). Once I figure out what conditions lead the system to consider the existence of a hidden variable, infinite systems of hidden variables can be inferred from the already-existing system whenever a pattern behind the finite number of hidden variables implies such a system. So the important question is when the system should create hidden variables.&lt;br /&gt;&lt;br /&gt;Ideally, the system might first create hidden variables to represent phenomena such as an object still existing if a robot moves its camera away and then back, then create hidden variables to represent things staying the same when it moves around the room and back, and to a different room and then back... and when the world is what is moving, then hidden variables might represent what's literally hidden behind other objects. 
That last one is interesting, because it *could* be done with correlations between non-hidden structures, but it obviously *shouldn't* be. A pixel being in a particular state before the leading edge of an object covers it correlates with it returning to that state when the trailing edge of the object uncovers it; however, it seems far better to correlate it with a hidden variable representing the color of a particular object that can only sometimes be seen. This is particularly important if hidden objects are interacting; if they are represented as hidden objects, the system has a chance of figuring out what the result of the interaction will be.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-7617286525034787027?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/7617286525034787027/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2007/08/over-summer-ive-been-working-on-more.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/7617286525034787027'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/7617286525034787027'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2007/08/over-summer-ive-been-working-on-more.html' title=''/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' 
src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-8210475348085904837</id><published>2007-06-12T15:35:00.000-07:00</published><updated>2007-06-12T16:30:23.218-07:00</updated><title type='text'></title><content type='html'>I've figured out that, although the system can represent arbitrary models, it can't learn all of them. The reason has to do with being unable to restrict a recursive definition. Because of the way recursive definitions are learned, the system is unable to entertain imaginary alternative structures and test for their occurrence in the data (as it could for non-nested patterns). The basic reason for this is that nested patterns emerge from the system, being represented only implicitly through non-nested patterns. They aren't represented explicitly, so they can't be fiddled around with. A nested pattern is defined by looping back onto its defining context, not by looping back on an actual definition (as discussed before); if there is no context to hold it, it can't be defined. Why this is bad: since the system is unable to come up with imaginary nested patterns, it can't (on its own) imagine arbitrary calculations "off to the side" (I called them "algebraic manipulations" before) that are a key feature of turing-complete learning. (Or, at least, I have come to that conclusion. It's possible that I'm wrong, and the system as stated can learn turing-complete worlds after all.)&lt;br /&gt;&lt;br /&gt;I've figured out a way to make it do so, although it's not particularly pleasing. Two modifications must be made. First, aggregate definitions must be able to contain alternatives. This is an explicit version of what happens when multiple items occur in a particular context. 
Second, definitions must be able to be explicitly recursive, containing the object being defined; this is an explicit version of what happens when an object is its own context.&lt;br /&gt;&lt;br /&gt;Interestingly, we might require that these two things can only occur at the same time-- a recursive call only occurs in an alternative, and an alternative only can occur if it contains a recursive call. The first of these two will almost always be true anyway (otherwise all instances of the object being defined will be infinitely long; this could only be true of concepts such as "string of text", and the system doesn't really need to come up with the concept of an infinite string of text, although it seems human). The second restriction does not bar turing-complete learning (as far as I know), so it doesn't change the representational power of the system; but it still would probably hurt the system's power.&lt;br /&gt;&lt;br /&gt;There are a few modifications on this theme. First, if we're allowing alternatives, then why not allow denials, too? This seems like it would make a good deal of difference for some purposes and not much for others. It would make the system more difficult to compute-- rather than identifying an aggregate when we see all of its parts, we would now need to worry about seeing a lack of certain things. This is more troublesome than it sounds, because in many cases the system only knows what is, not what isn't. (Imaginary cases, of course; this is how I visualize the system, but that doesn't mean much about what it will be like in practice.) A denial of an abstract entity requires us not just to fail to find an instance of it currently instantiated in the location in question, but to find no instance after instantiating all possible entities in that location. Since the system is now supposed to be turing-complete, this runs into the halting problem. 
Ideally the system should be able to guess after looking for a good amount of time: "It doesn't seem as if there is an instance there, although I can't be sure."&lt;br /&gt;&lt;br /&gt;Another variation: Most aggregates with explicit alternatives or self-reference should come from a recognized implicit alternative or self-reference. Can I change this "most" to "all"? I think the following version works: "All aggregate classes with explicit alternatives or self-reference either reflect aggregate classes with implicit alternatives or self-reference, or are formed by reducing the number of alternatives in such a class." The problem with the implicit alternatives is that they are too all-encompassing, embracing anything that occurs in the defining context. Making such alternatives explicit and then reducing the number of alternatives fixes this. Is that good enough? More thought is needed.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/28417647-8210475348085904837?l=dragonlogic-ai.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dragonlogic-ai.blogspot.com/feeds/8210475348085904837/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://dragonlogic-ai.blogspot.com/2007/06/ive-figured-out-that-although-system.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/8210475348085904837'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/28417647/posts/default/8210475348085904837'/><link rel='alternate' type='text/html' href='http://dragonlogic-ai.blogspot.com/2007/06/ive-figured-out-that-although-system.html' title=''/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-28417647.post-3559158279575658421</id><published>2007-05-02T11:23:00.000-07:00</published><updated>2007-06-12T15:35:43.667-07:00</updated><title type='text'></title><content type='html'>Having read more, the argument I gave in the last post obviously applies to what's called "context-sensitive grammars" even better than it applies to context-free ones. This is nice, because context-sensitive is more powerful than context-free. Also, I'm not sure there are any other attempts to learn context-sensitive grammars. But the question still remains: can the method learn unrestricted grammars?&lt;br /&gt;&lt;br /&gt;In a way, it seems silly to try to learn unrestricted grammars. What does it mean? The simplest characterization is: sort through the space of all possible programs in some manner, looking for the ones that output the data so far (but which don't stop there). Use the further output of these programs to predict the future. (Either choose one that's somehow best, or somehow assign different weights to each, or treat them all equally.)&lt;br /&gt;&lt;br /&gt;This sort of patternfinding is on a level we wouldn't even expect of a human. The output of an arbitrary computer program can be pretty hideous-- completely tangled, with every value dependent on every other. Yet, at the same time, scientists can go further than the class of "unrestricted" patterns, into the incalculable. The class of unrestricted patterns is actually defined as any calculable pattern. However, not all equations in physics are calculable. We suspect that the equations hold in all cases simply because they seem to hold in the cases we can calculate. 
So perhaps the "unrestricted" class of patterns isn't exactly the goal I should be after-- a general learning algorithm shouldn't be expected to learn all patterns of the unrestricted class, but it should be able to learn some that don't fall within it; patterns that are calculable only in some cases. (Here's an image of what it means for a pattern to be "incalculable": there is some sort of exact definition for it, but this definition does not imply any sort of algorithm for calculating the pattern; the only way to work out the pattern is with brute-force theorem-proving to get algorithms for any special cases that you need to know, and no such theorem-proving can ever cover all cases. Neat, huh?)&lt;br /&gt;&lt;br /&gt;Actually, I suppose it's possible (in principle) to construct computer programs that will compute uncomputable functions. They'll simply run forever if you ask for an incomputable case. This suggests, then, that a learner of computable unrestricted patterns may very well learn some partially uncomputable patterns by accident, because it can't go through and check to see if a function is computable for every case. Neat.&lt;br /&gt;&lt;br /&gt;But back to the more important issue: what would learning unrestricted patterns look like? Searching through the space of all programs that produce the data doesn't provide a very good image, because it's not very dissectable. For example, it doesn't allow us to ask questions like "if this past history had occurred instead, what future would you predict using the same patterns?" I think a slight improvement, without loss of generality, would be to search for possible programs that make fairly good predictions of the next moment when fed the last. This can be thought of as a "computed markov model"-- rather than using a table of probabilities, use an algorithm. 
Obviously, stopping at 1st-order computed markov models would be a bad idea (because in real life all sensory input from one second ago will not tell you everything you need to know about the next moment by a long shot)-- the higher the order, the better. In fact, it might be desirable to search for programs that make fair predictions when fed arbitrarily-sized chunks of the past leading up to various sample moments.&lt;br /&gt;&lt;br /&gt;A penalty for complexity, however, is also to be desired. Otherwise the program can just memorize all of the answers.&lt;br /&gt;&lt;br /&gt;Anyway, this is an easier question to ask about the system I've been discussing: given enough data, can the system learn arbitrary algorithms for calculating the next moment from some portion of the past?&lt;br /&gt;&lt;br /&gt;It is interesting to note that, simply because the system can learn regular grammars, it can actually learn to predict the next step in a turing machine, possible next steps in a proof, et cetera. That is, it can predict arbitrary algorithms if it is allowed to see all of the "work" in between input and output. However, obviously the world hides its work from the observer. So can the system learn to calculate "hidden work"? This would mean something like treating the past as an algebraic statement, and performing manipulations on that statement "off to the side" until some reduction to a particular form is achieved. Then, that form must be recognized as a sign to stop, and translated (via regular means) into the answer.&lt;br /&gt;&lt;br /&gt;Since the system could in theory learn any such "algebraic manipulations" as a new type of relation (not found in the dataset), it could perform the off-to-the-side calculations. The question is: could it learn to do so? 
Also, could it recognize the halt-state properly? If the answer to these two questions is yes, then it could learn the arbitrary transforms.&lt;br /&gt;&lt;br /&gt;To recognize the halting state, and utilize the result of the performed calculation, I think it would only need context-free means. All that's necessary is to recognize the arbitrarily long calculation as a single object with an answer at the end, and store the learned fact that this answer corresponds in some simple way to the future. To recognize all the scratchwork as a single entity requires only a simple recursively defined entity: scratchwork = valid-step -&gt; scratchwork. So can particular sorts of scratchwork be learned?&lt;br /&gt;&lt;br /&gt;For the current system to learn some sort of scratchwork without me realizing it, it would need to support some sort of "classification cascade": a few separate classifications can be combined to form a new classification, which can combine with others to form yet more classifications, resulting in an arbitrarily long series of additional classifications resembling a proof. Each new fact may aid in deriving yet more facts, and the cascade can go on for indeterminate amounts of time.&lt;br /&gt;&lt;br /&gt;For this to be the case, there needs to be an infinite number of possible classifications. (Otherwise, a cascade couldn't go on for very long.) This is workable, because I allow for classifications of classifications-- and such a meta-classification can potentially be a catch-all for an infinite number of classifications. However, in its present state, the system can't fully utilize all such classifications; the meta-classifier can be used to identify new classifiers as instances of the infinite class, but still each such new classifier must be created individually; they cannot be created on the fly. An interesting problem. 
To solve it, we need "anonymous classifiers" (classifiers that we don't bother to give names) defined on-the-spot for entities; all possible 1st-level classifiers that would say "Yes" for the given object are searched through, but without actually bothering to instantiate them, and we test the meta-classifiers on all of them. If one satisfies a meta-classifier, then we know that the object falls into a particular infinite class. (This could be recorded in various ways, I suppo
