Climbing the Mountain
Undefinability is a harsh reality.
Any solution that one offers is still vulnerable to the attack: full self-reference seems impossible. Regardless of how clever you think you've been, if you propose a solution, I can point to a structure that can't be defined within that system: the system itself. If that's not the case, then the system is inconsistent. (And even then, even if we allow inconsistent systems, I know of no system which could be said to fully describe itself.)
What this all suggests is that human beings (along with a broad range of sentient entities) are fundamentally incapable of articulating our own inner logic. Why? For the same reason that a powerset is larger than a set: if we could fully refer to ourselves, our logic would be "larger than itself" (loosely speaking).
It comes down to a problem of wanting to be able to form sentences that mean anything we might want. If we're fully capable, then we can construct a sentence that is true precisely when it is not true... and the system falls apart.
Kripke's fixed point theory offers a nice fix: with some specific machinery, we're able to call these sentences "undefined". But now we can't refer properly to this idea, "undefined". So we've got a complete theory of truth (one might say), but we're still stuck with regards to undefinability.
So, it looks like we've got to accept it: we can't find a mind (artificial or otherwise) that fulfills the imperative "Know Thyself". The self remains shrouded in mystery. What's a self-respecting AI engineer to do? Am I forced to always design minds with less power of logical reference than my own, because I could not comprehend a design that was truly up to human capability? Are artificial intelligences doomed to be fancy calculators, lacking "understanding" because they will always have a weaker logical structure?
First, no. That doesn't really follow. It's quite possible to use an algorithm that can in principle learn anything: evolution. For example, one could build an artificial mind that held an initially simple program within it, mutated the recently run areas of code when punishment occured, and strengthened recently run code against mutation when rewarded. Or, a little more sophisticated, one could implement Schmidhuber's Sucess Story algorithm, which always and only keeps apparently beneficial mutations, is capable of learning what and when to mutate, can learn to delay reward, and has other desireable features. And yet again, we could try William Pearson's design, which sets up an artificial economy of agents which can co-evolve to produce the desired behavior. With these styles of approaches, there is not a worry of fundamental limitation: such systems can learn the correct logic if it exists (it just might take quite a while!). The worry, rather, is that these aprroaches do not take full advantage of the data at hand. There is no guarantee that they will perform better given more processing power and memory, either. In short, they are not a model of rationality.
This could be taken as an indication that studying logic and rationality is not directly relevant to AI, but I would not agree with such an argument. For one thing, it is possible to derive a model of rationality from such approaches. If they work, there is a reason. The techniques each essentially provide some way of evaluating how a particular program of behavior is doing, together with a technique of searching through the possible behaviors. One could consider the space of all possible programs that might have generated the behavior so far, rather than the single program that actually did. One then takes the best program from that space, or perhaps a weighted vote. Obviously there will be some details to fill in (which is to say that such models of rationality don't just follow directly from the evolutionary algorithms employed), but the general approach is clear... such a system would take an infinite amount of processing power to compute, so one would need to use approximations; the more computing power given, the closer the approximation could be. All the data at hand is now being used, because the system now has the ability to go back and re-think details of the past, asking if particular sensory patterns might have been clues warning of a punishment, et cetera.
So why not accept such models of rationality? I have two reasons... first, they are purely reinforcement-learning-based. Agents based on these models can be driven only by pleasure and pain. There is no ability to consider external, unobserved objects; everything consists of patterns of directly observed sensation. Second, even if one is OK with purely reward-based systems, it is not clear that these are optimal. The evaluation criteria for the programs is not totally clear. There needs to be some assumption that punishment and reward are associated with recently taken actions, and recently executed code, but it cannot be too strong... The sucess story approach looks at things in terms of modifying a basic policy, and a modification is held responsible for all reward and punishment after the point at which it is made. The simple mutation-based scheme I described instead would use some decaying recent-use marker to decide responsibility. William Pearson suggests dividing time up into large sections, and splitting up the total goodness of the section as the payment for the programs that were in charge for that time. Each of these will result in different models of rationality.
So, I desire an approach which contains explicit talk of an outside world, so that one can state goals in such language, and furthermore can apply utility theory to evaluate actions toward those goals in an optimal way. But, that takes me back to the problem: which explicit logic do I use? Am I doomed to only understand logics less powerful than my own internal logic, and hence, to create AI systems limited by such logics?
One way out which I'm currently thinking about is this: a system may be initially self-ignorant, but may learn more about itself over time. This idea came from the thought that if I was shown the correct logic, I could refer to its truth predicate as an external thing, and so appear to have greater logical power than it, without really causing a problem. Furthermore, it seems I could learn about it over time, perhaps eventually gaining more referential power.
In understanding one's own logic, one becomes more powerful, and again does not understand one's own logic. The "correct logic", then, could be imagined to be the (unreachable) result of an infinite amount of self-study. But can we properly refer to such a limit? If so, it seems we've got an interesting situation on our hands, since we'd be able to talk about the truth predicate of a language more referentially powerful than any other... Does the limit in fact exist?
I need to formalize this idea to evaluate it further.