Saturday, May 27, 2006

Notes

Always what matters is how higher concepts are tacked onto data.

Always, there must be "mappings". Time is always mapped. Space is generally mapped. To detect all patterns, all things must be mappable; a lack of mapping merely indicates a lack of common pattern. A mapping is an ability to look for commonality. In a visual grid, we would map each gridpoint to every other, so that we could search for a picture in one place or another, and recognize it as a repeat. On the other hand, we would not map them randomly; we would map the points according to the grid that they fall into, so that, say, (3,4)-(8,12) (a line) would map onto (4,4)-(9,12) or (3,0)-(8,8) or (7,14)-(12,22). We would not want to map it onto, say, a particular scattering of random points; not without evidence. Perhaps a pattern exists between that set of random points and that line; but it is not a property of the default space.
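
A minimal sketch of this sort of grid mapping, in Python (representing a pattern as a set of points, and mapping as translation, is just an illustrative choice):

    # Two point sets count as the same pattern if one is a pure
    # translation of the other within the grid.
    def normalize(points):
        """Shift a set of (x, y) points so its minimum corner sits at (0, 0)."""
        min_x = min(x for x, _ in points)
        min_y = min(y for _, y in points)
        return frozenset((x - min_x, y - min_y) for x, y in points)

    def same_pattern(a, b):
        return normalize(a) == normalize(b)

    line = {(3, 4), (8, 12)}             # endpoints of the line (3,4)-(8,12)
    shifted = {(4, 4), (9, 12)}          # the same line, one unit to the right
    print(same_pattern(line, shifted))   # True: a repeat in another place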

The point is that visual "pixels" (or any kind of basic sensory grain) usually do not stand on their own, to be compared en masse to every other grain, but come with some relations attached, generally forming some sort of space. A machine could eventually learn some of these mappings by such brute-force comparison; if the activity of a pixel can to some extent predict that of nearby pixels (and it should), then a set of correlations can be derived; the more similar in activity, the closer two pixels ought to be. But what then? The machine must be able to extend this "closeness" into the type of mapping that I mentioned. When a net of close pixels behaves in a particular way (either the pixels behaving similarly to each other, or following some other pattern), the same activity in other pixel groupings must be recognized as the same pattern.
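
A rough sketch of that brute-force comparison, assuming the data arrives as a list of frames, each a dict from pixel id to activity level:

    from itertools import combinations
    from statistics import mean, pstdev

    def correlation(xs, ys):
        """Pearson correlation of two equal-length activity series."""
        mx, my = mean(xs), mean(ys)
        sx, sy = pstdev(xs), pstdev(ys)
        if sx == 0 or sy == 0:
            return 0.0
        return mean((x - mx) * (y - my) for x, y in zip(xs, ys)) / (sx * sy)

    def closeness(frames):
        """Score every pixel pair; high scores suggest the pixels are near."""
        series = {p: [f[p] for f in frames] for p in frames[0]}
        return {(a, b): correlation(series[a], series[b])
                for a, b in combinations(series, 2)}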

But I digress. The intelligence should have this grid already set out for it, and I only want it to be able to form it for itself to make sure that it has "complete intelligence". My original statement:

Always what matters is how higher concepts are tacked onto data.

The simplest way to tack on higher concepts is to notice direct repetition: exact likeness in two different places (two places that are "mapped", that is; related either in time or in space). The minimum requirement to find all static patterns is the ability to group any two patterns together if they look alike, whether they occur in data or as a grouped pattern. Two could then turn to three, three to four, et cetera, until all (or most) occurrences of the pattern had been found. It may be simpler, of course, to perform the full search at once, marking all instances of a pattern.
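
A minimal sketch of that full search over sequential data (with the chunk length fixed, for simplicity):

    from collections import defaultdict

    def repeated_chunks(data, n):
        """Map every length-n chunk to its positions, keeping only repeats."""
        positions = defaultdict(list)
        for i in range(len(data) - n + 1):
            positions[data[i:i + n]].append(i)
        return {chunk: locs for chunk, locs in positions.items() if len(locs) > 1}

    print(repeated_chunks("abcabcab", 3))
    # {'abc': [0, 3], 'bca': [1, 4], 'cab': [2, 5]}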

When searching for such static patterns, there may be (there are likely to be) options and compromises.

One problem: a larger pattern will likely exist that incorporates a smaller pattern, but has fewer instances. The solution here is to split the larger pattern up into various additions on the smaller pattern. The smaller pattern acts as the "core" of the larger pattern (or at least as a piece of the larger pattern). Another problem: two patterns will often overlap, so that parts of the data could belong to either. In a machine mind, one could set things up so that the two interpretations did not interfere with each other. But in a human mind, they will. The mind flips back and forth between the two interpretations of the data. The solution here, I think, is that both interpretations should be "kept in mind", allowing the intellect to choose between them based on their merits.
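
As a toy illustration of keeping both interpretations, the following sketch enumerates every way of carving a string into known patterns, rather than committing to one:

    def segmentations(data, patterns):
        """Return every way of covering `data` with the given patterns."""
        if not data:
            return [[]]
        results = []
        for p in patterns:
            if data.startswith(p):
                for rest in segmentations(data[len(p):], patterns):
                    results.append([p] + rest)
        return results

    print(segmentations("abab", ["ab", "aba", "b"]))
    # [['ab', 'ab'], ['aba', 'b']]: two rival readings of the same data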

But sometimes a pattern actually requires something like overlap; for example, in finding simple patterns in numbers, one must recognize that one number is greater than (or less than) the last. Together they form a pair with a particular slope. But an elongated run with slope forms many such pairs, and each number must belong to two pairs: a pair with the number before it, and a pair with the number after it. One possible solution would be to allow for detection of patterns of overlap. Another might be to use a different sort of object to detect slopes and similar patterns. This object would be a "relation". The relations in question would be "higher-than" and "lower-than", possibly getting more specific (one-higher-than, two-lower-than...). Data can be transformed into sequences, and then groups, of such relations. Sequences of such relations should be treated just as raw data is, meaning that they can be patterned according to relations as well.
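
A minimal sketch of this transformation, which can be applied to its own output just as to raw data:

    def as_relations(seq):
        """Re-describe a number sequence as the steps between neighbours."""
        return [b - a for a, b in zip(seq, seq[1:])]

    print(as_relations([3, 5, 7, 9]))                 # [2, 2, 2]: a constant slope
    print(as_relations(as_relations([1, 4, 9, 16])))  # [2, 2]: squares flatten out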

Relations can be learned, based on patterns that occur often in data, but they should also be default items when it comes to such things as numbers and colors. With textual data, each letter can be treated individually; and the same is true for visual data that uses a small number of possible colors, such as 16 or fewer. But with data that delivers quantities instead of qualities, default relations must exist. If these are numerical (and most will be), then these relations consist mainly of less-than/more-than. Specifics may be given (2-less-than, 18-more-than), but not necessarily; with colors, probably not. Relations may be addition-based, multiplication-based, or even power-based. In addition to numerical default relations, one could use any other imaginable scheme; for example, a variable might be three-dimensional, but based on an alternative 3D tiling rather than a grid. Variable-states could be related in an arbitrary network; a hierarchy; a number-line that branches and loops; whatever.

Problems to be addressed:
(1) In a way, I'm still only talking about static patterns. Variable patterns are also necessary.
(2) I didn't actually provide any way for such relations to be learned; I only said that it would be possible.

These two problems should be related; "variable patterns" create "variables", and "relations" connect "variable states".

It all starts when two variables of the same type predict each other.

This creates a mapping of one value onto another; a "function".

A function does not necessarily mean an order; it only does so if each value is uniquely mapped onto another. Even if this is true, the order may not encompass all values; the function may only partially order the values, or may make multiple, separate orders (so that groups of values are related to each other, but not all values can be related). Additionally, there may be multiple such mappings, due to multiple different learned sets of relations. One could imagine some confusing schemes indeed.
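
A minimal sketch of the uniqueness test, assuming the learned mapping arrives as a list of value pairs:

    def orders_values(pairs):
        """True if each value maps to at most one other and no chain loops."""
        mapping = {}
        for a, b in pairs:
            if mapping.get(a, b) != b:
                return False              # some value maps two different ways
            mapping[a] = b
        for start in mapping:             # follow each chain, watching for loops
            seen, v = {start}, start
            while v in mapping:
                v = mapping[v]
                if v in seen:
                    return False
                seen.add(v)
        return True

    print(orders_values([(1, 2), (2, 3), (5, 6)]))  # True: two separate partial orders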

And, yes, "mapping" is the proper word. Relations are what orders space and time, just as we are using them now to order variables.

And now a digression. I still have not written exactly how variable patterns are learned, which is part of my original point (see first sentence). But I will now clarify the nature of mappings (or attempt it) with examples.

Chiefly, I must clarify that mapping is somewhat different from ordering. If something is mapped, but not ordered, over time, then we have many "moments", but do not know in what order they occurred. Each moment is separate, but moments may be compared. We may still look for repeated patterns in moments; moments may resemble each other either in whole or in part. Moments may be classified by their similarities. If such classifications reduce the number of options for other variables, a predictive pattern has been found.
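
A minimal sketch of that test, assuming each moment is represented as a dict of variable values:

    def reduces_options(moments, classify, var):
        """True if knowing the class of a moment shrinks the options for `var`."""
        all_values = {m[var] for m in moments}
        by_class = {}
        for m in moments:
            by_class.setdefault(classify(m), set()).add(m[var])
        return any(len(vals) < len(all_values) for vals in by_class.values())

    moments = [{"sky": "cloudy", "ground": "wet"},
               {"sky": "clear", "ground": "dry"},
               {"sky": "cloudy", "ground": "wet"}]
    print(reduces_options(moments, lambda m: m["sky"], "ground"))  # True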

If time is ordered, the difference is simply that a particular moment may predict other nearby (or even farther away, in principle) moments. To look for such patterns, we "collapse" everything into an abstract idea of "the current moment" and "nearby moments". All moments fall under each category, but there is order within the broad groups; each "present" has an associated "past" and "future". We then collapse the present category by various classification schemes. When we do so, the past and future collapse likewise. If, for a particular scheme, the future or past becomes more ordered upon collapsing, we have found a pattern.

If we only want to look for instantaneous temporal influences, in which one moment affects the next, we can collapse everything into two categories: past and future. Each moment counts as both a "past" moment with an associated immediate "future" (the next moment), and as a "future" moment with an associated immediate "past" moment. Using more moments than this in the mapping simply looks for longer-term influences; for example, three units can detect influences that skip one moment before showing. For a computer scientist, this is an Nth-order Markov model, but with more complicated states than a single multivalued random variable.
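
In code, the two-category collapse amounts to a table of transition counts; a first-order sketch:

    from collections import Counter, defaultdict

    def transition_counts(moments):
        """Count, for each moment taken as 'past', which moment follows it."""
        counts = defaultdict(Counter)
        for past, future in zip(moments, moments[1:]):
            counts[past][future] += 1
        return counts

    weather = ["sun", "sun", "rain", "sun", "rain", "rain"]
    print(transition_counts(weather)["sun"])  # Counter({'rain': 2, 'sun': 1})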

For "completeness", we would want an AI to be able to expand the field of mapping arbitrarily far. This means to me that any moment points to the next moment and the last moment, and that any such ordered and mapped variable (be it space, time, or otherwise) is able to associate in this manner. Alternatively, one could simulate this pointing by re-acessing the larger datastructure every time. This simply means that one must remember the location of everything one is thinking about. But the choice here only effects computation speed and memory efficiency, so I'll dwell on it no longer.

So: we can have mapping without ordering. Can we have ordering without mapping?

I was wrong, above, to say that "mapping" is the proper word, and that relations are what order space and time. Here we have ordering without mapping. Maybe.

Sunday, May 21, 2006

Classification of AI

This classification system is somewhat based on cybernetics, somewhat based on terminology of the AI field, and somewhat based on my own metaphysics. The post is essentially in order of importance.

The idea is this: although the details of an AI differ, all artificial intelligence must accomplish one, several, or all of a few basic tasks. We can classify artificial intelligences by what requirements they fulfill; this tells us how "complete" they are.

I will detail the main elements by listing and then subdividing. Each level could presumably be subdivided further.

The Two Elements:

1. Knowledge-seeking
2. Goal-seeking

A simple organism starts out with only direct reaction: it metabolizes food, and performs other acts of homeostasis. Eventually, simple knowledge with direct response is incorporated; various things start to count as pain and pleasure, and a simple awareness develops. Bacteria, for example, will often know to go towards food. The knowledge and the goal are not learned by the individual, but learned by evolution and kept in "genetic memory".

Eventually, more complexity arises. We can characterize a mind as consisting of an intertwined knowledge-base and goal-base.

Knowledge is applied to goal-oriented behavior. Without knowledge, a goal is useless; without a goal, knowledge is useless. Both need the other. These are the two complementary elements of mind.


The Four Elements:

1. Induction
2. Deduction

3. Emotion
4. Planning

Both knowledge and goals must be first created and then expanded.

Knowledge creation is called induction. It is pattern finding; it deals with identifying known objects and learning new ones.

Knowledge expansion is called deduction. It is logic; it deals with extrapolating further from patterns found in data.

Goal creation is called emotion. We create goals based on different sorts of pain and pleasure.

Goal expansion is called planning. It essentially harnesses logic to control the environment.


Induction is my chief aim. Deduction closely resembles the cold, hard logic that computers already do so well. A computer is able to use a model that we input. But to make a new model? That's something that still needs work. Induction is what interfaces directly with the outside world in the form of senses. It deals with data. An A.I. that dealt only with induction would take data in and spit out an analysis. It would perform mathematical regressions, statistical analysis, or the like. Or perhaps it would be more advanced, finding more general types of patterns as well. But, essentially, it would only provide the user with a model of the data. Neural nets are also under this category, generally, although they might be used for pain/pleasure learning.

Deduction, however, is still something of an issue. There is not one version of "logic", despite the impression people may get. There are many versions of formal logic. Some have hidden contradictions, and there is no way to know which until the contradiction is found. On the other hand, a computer can do most acts of deduction rather well. A program that only performs deduction is rather common. A calculator is one. However, much more advanced types of deduction can also be carried out. Besides advanced mathematics, we can input models of the world and get various results. These models may be formulated in various logical languages. Bayesian networks are popular now. Formal logic is arguably more general.

Goal creation is not strictly necessary for an A.I.; usually, the goal for the A.I. will be set. Even biological organisms do not really need goal creation; the simple, single goal of maximizing pleasure would do well enough. However, it is expedient to create new goals. We "like" or "dislike" things originally because they give us pleasure in some way, but we eventually disconnect the liking/disliking from the original reason. This gives us ethics, in a way. We decide that certain things are "good" and "bad", rather than simply to our advantage or against it. Our ultimate ideas of good and bad replace our sense of pain and pleasure. An AI based on emotion would probably be based on behaviorism; it would act according to the rewards and punishments received in the past. It might also go one level higher, adding fight-or-flight response and other emotions.
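
A behaviorist agent of that sort can be sketched very simply (all names here are illustrative, not any particular system):

    from collections import defaultdict

    class RewardLearner:
        """Prefer whichever action has paid off best in the past."""
        def __init__(self, actions):
            self.totals = defaultdict(float)   # summed reward per action
            self.counts = defaultdict(int)
            self.actions = actions

        def choose(self):
            return max(self.actions,
                       key=lambda a: self.totals[a] / self.counts[a]
                       if self.counts[a] else 0.0)

        def learn(self, action, reward):
            self.totals[action] += reward
            self.counts[action] += 1

    agent = RewardLearner(["approach", "flee"])
    agent.learn("approach", 1.0)   # food: pleasure
    agent.learn("flee", -0.5)      # wasted effort: pain
    print(agent.choose())          # 'approach'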

Goal expansion carries out the whims of the emotions. It is heavily based in deduction, but may use more particular search strategies or logical languages. It is difficult to make an AI with only goal-elements, but this kind of goal-directed deduction can be the focus of an AI, and has been the focus of many.


These could possibly be split up further; splitting up "induction" further, in fact, is the subject of the first post to this blog. Also, one might wish to add "data" and "actions" as elements, because the types of data and the types of action vary from AI to AI. One AI may be in a fully self-animating robot, while another may only have text output. One may read medical data, while another may read books. We get a chart something like this:

data -> { -> (induction -> deduction) -> (emotion -> planning) -> } -> actions


But there is one more element that we humans have: honesty. We have a natural tendency to say what we think. Talking is not generally a goal-oriented behavior; we do not plan everything that we say according to what it will get us, but first think of the truth. We CAN say things because we decide doing so will accomplish something. But our automatic action is not to do this, but to say what's on our mind.

We view language not like just any data, but like data that talks about the world. Most inputs are merely part of the world. But language is a part of the world that represents another part. For an AI to truly use language, it must have this view.

It could possibly develop naturally, without hard-wiring; an intelligence could learn that what people say can often be used to predict the world. Similarly, an AI could learn to tell the truth by default as a part of goal-creation; if it has to communicate its needs regularly, truth-telling could develop as a separate goal. However, it could also be programmed specifically, in addition to or in place of other goal-elements. A calculator is a perfect example of a truth-speaker. It is an excellent conversationalist. It always knows just what to say; it always has a response (even if it's ERROR: DIVISION BY ZERO). Any deducer could have a similar conversational model of deducing and reporting facts from things that the other person says.

"It's sunny out today."

"There must not be many clouds."

"No. It rained last night. I guess that used them up."

"If the rainwater evaporates, there could be cloud formation."

"I saw three porcupines dead on the roadside."

"At that rate, the local population will be depleted by 2006."

Such an AI could have no goals whatever. All it needs to do is reason, and decide what facts are most relevant. On the other hand, one might want both goals and a truth-saying default. This would be possible as well, though much more difficult.
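
A toy sketch of such a goalless truth-speaker, with invented rules standing in for a real deducer:

    # Forward-chain over simple if-then rules and report whatever
    # new facts follow from the other person's statement.
    rules = {
        "it is sunny": "there are few clouds",
        "it rained last night": "the ground is wet",
        "the ground is wet": "the worms will surface",
    }

    def respond(statement):
        replies, fact = [], statement
        while fact in rules:          # chain deductions as far as they go
            fact = rules[fact]
            replies.append(fact)
        return replies or ["(nothing follows)"]

    print(respond("it rained last night"))
    # ['the ground is wet', 'the worms will surface']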


Oh, and one more: copycat learning. This also may be an important element in an intelligence. Whereas truth-saying is mainly deduction, copycat learning relies mainly on induction. Many chatbot AIs use the copycat strategy to mimic human speech.


Most importantly, we have interplay. These separate elements are not really so separate. Each element is constantly active. Each depends on the others. The ultimate behavior of the AI emerges as a synthesis of many elemental activities.

Friday, May 19, 2006

AI formalization

This is the current formulation of my AI. Several things must be realized:

First, this is only the pattern-finding portion of my AI. The point here is to be able to potentially recognize any pattern in data (though in practice any implementation is limited). A "full mind" would also need deduction based on these patterns to be intelligent, and would need goals to act on in order to use this intelligence.

1. Data is made up of atoms in datastructures.

2. An atom is a sensory object that may have one or more variables.

3. These variables can take a certain number of states (be it a finite number, or a workably infinite number in which a new state can be created at whim, based on a relation to existing states).

4. A particular type of atom is defined by a particular set of variable states.

5. Datastructures are relations between many such instances of atoms. Atoms may be strung in a line, placed on a grid, or put in any order generally, depending on the nature of the data. It does not matter if these relations are temporal, spatial, or more abstract: the structure is simply adapted as necessary. There may also be a total lack of structure; atoms or larger structures can be an unordered set, rather than a spatially or temporally ordered one.

6. The variables of a particular set of data include the variables of the elements (the "atomic variables", so to speak), and any patterns that the data fits into. (Since not all patterns that may describe the data are known in advance, this second type of variable must be added as it is spotted.)

7. A static pattern type is defined by a set of variables in data; an occurrence of such a type is the realization of those values in the data. (Most commonly, these variables are atomic, so that the static pattern is a string of atoms that can be found in the data in many places; a word that can be found repeatedly in text, for example. New static patterns can be found by searching the datastructure for repeated strings of values, be they in the atoms themselves or in more abstract regions.)

8. A variable pattern type is a set of such static patterns; any one of the possible value combinations will do to identify the pattern as belonging to the variable type. (A good example of a useful variable pattern is a set of static patterns that behave in nearly the same way, appearing in the same contexts; they are put together as a single variable pattern, allowing these several static patterns to be recognized as functionally the same thing. The variable pattern can be thought of as an uncertain value; a set of possibilities. Predictions can then be made as to what set a thing will be drawn from, when exact predictions could not be made, et cetera. Even if exact predictions can be made, though, the variable pattern can still be used; the domain and range of a function, for example, can also be thought of as variable patterns.) Any set of patterns may be used as a variable pattern; this means that a constant pattern can always be looked at in terms of where else it has appeared.

9. The set of instances of a pattern is an unordered set. (The only relations between instances are any that existed in the data, and equality of the defining variables of the pattern type.) This set, being a datastructure, is searched for patterns as well.

10. Any values not set by the pattern definition are variable patterns, containing the set of all values that have occurred in that slot as their set of values. (This includes both concrete variables and abstract variables in the data.)

11. Sub-types are constructed by setting more variables. Once this is done, more variable types may be created by listing all the values that still occur in the yet-open variable slots, despite the field being narrowed down. (The variable pattern consisting of all subtypes of a certain static pattern is equivalent to that static pattern.)

12. A function is constructed as a list of possible fixed values for one variable in a type, and the related values for another variable of the type. If fixing one variable decreases the options in another, then a correlation has been found; one variable is a function of another, carrying great deductive advantage. (This method of patternfinding in uncertainty, defined in 11 and 12 for finding functions in types, could actually be used for any comparable data situation; see the sketch after this list.)

13. Patterns in functions may be found, as well. The method here is the same as in data, but focusing on describing functions in terms of simpler functions. (Simple functions are remembered as static patterns that can appear in data of certain types.) A pattern may be looked for between a particular input and its output, or, alternatively, between the outputs themselves.

14. The variables of an item include all places where that item has appeared. This includes all functions in which it has appeared. This allows expressions to be created which stand for certain values, but as functions of other values; 144, for example, could be remembered as the square of twelve, rather than as itself. (Expression-creation is an important way to represent a function in terms of simpler functions; if a pattern cannot be found for the unmodified items, perhaps a pattern can be found in their expressions (for example, each might be the square of the next-higher prime number than the last; in such a case, no simple direct relationship between the squares exists, but the pattern is much more obvious if they are all expressed as squares of some number, rather than as just bare numbers). This is essentially the same activity as finding some pattern in the variables of a list of items.)

15. Such expressions, if used as general rules, may describe an infinity. Because a pattern can be described by an expression rather than memorized, and a description is a pattern which can hold true for an infinite number of cases, a pattern may (in effect) be infinite, and still held in the simulated mind, if its (infinite) behavior can be described (finitely). This means that spaces need not be defined only as discrete data structures (as they are in #5); continuous spaces may be imagined, which contain an infinite number of points in a finite space (as is described in geometry). Such spaces are defined by the simple idea that a new point may be between any two points. This is a finite definition which holds true for an infinite number of cases. The same method may be used to define infinity ("bigger than any number" holds true for an infinite number of cases), infinitesimal ("smaller than any positive number" holds), repeating decimals ("three in all decimal places," for example), and other structures. This method is also related to "mathematical induction" and "recursive definition". (In truth, I am not entirely sure how a computer could do this.)

16. In addition to looking for patterns in raw data and in subtypes and in functions, patterns between types may be found, as well. First off, since the definition of a static type is essentially arbitrary (that is, simply memorized), we may look for patterns in it in terms of smaller elements, to find its pattern. With language, this is equivalent to splitting a word into clearly defined syllables in one's mind only after learning the word. The syllables are there implicitly, but not recognized. Also, patterns behind variables other than internal structure should be looked at; to continue the same example, this would be like trying to figure out why the syllables create a word of that meaning, based on their meaning in similar words.

17. Patterns behind patterns, as such, may lead to another very important abstraction: numbers. If several patterns exist that could be described by numbers (that is, types for doubles and triples of a few different patterns), then these patterns will be related in the fact that larger patterns of the same type contain each other. If this relation could be seen as a pattern (perhaps that's a stretch), then the fact of it could be first established separately for a few cases (one apple/two apple/three apple -> any number apple, plus one lemon/two/three -> any number lemon), then these separate types of numbers could be consolidated into one invariant type: the number (assuming that this pattern could be seen as the same in the multiple cases). In truth, I suppose I don't think humans invent numbers in this way (as babies or what have you), but that numbers are inborn; but even so, they are still constructed from more basic ideas.
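
As promised in item 12, a minimal sketch of the function-finding step, assuming instances arrive as dicts of variable values:

    def correlation_between(instances, x, y):
        """Map each value of x to the set of y-values seen with it.
        A shrunken set means correlation; all singletons mean y is a
        function of x."""
        by_x = {}
        for inst in instances:
            by_x.setdefault(inst[x], set()).add(inst[y])
        return by_x

    instances = [{"shape": "dot", "sound": "beep"},
                 {"shape": "dash", "sound": "boop"},
                 {"shape": "dot", "sound": "beep"}]
    print(correlation_between(instances, "shape", "sound"))
    # {'dot': {'beep'}, 'dash': {'boop'}}: every set is a singleton,
    # so sound is a function of shape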