Quantitative Measure of Intelligence
Othman Ahmad, A.M.Alvarez & Chan Choong Wah
Nanyang Technological University, Singapore 2263
Standard textbooks on artificial intelligence [1, 2] do not describe a quantitative measure of intelligence. The widely quoted test for artificial intelligence, the Turing test, is only a subjective one. It does not allocate intelligence units to an object. The definition of intelligence outlined below gives a quantitative measure of the thinking process alone, as distinct from the information processing part, and is based on well-established information theory. The amount of intelligence measured can be used to develop more intelligent microprocessors and algorithms, instead of relying only on brute-force parallelism, which requires a lot of hardware.
Based on common observation, an intelligent object is capable of autonomous operation, and its operations are executed in an unpredictable manner. An object that just does repetitive tasks is considered less intelligent.
The amount of information is proportional to the degree of unpredictability. An event that is sure to happen has zero information.
The information I(x_i) of an event X = x_i with probability P(x_i) is
$$I(x_i) = -\log P(x_i) \qquad (1)$$
Equation (1) is known as the logarithmic measure of information as proposed by C. E. Shannon.
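As a minimal sketch of the logarithmic measure, the following Python function computes the information of a single event. The base-2 logarithm (so the result is in bits) and the function name `self_information` are assumptions made for illustration; the text does not fix the base of the logarithm.

```python
import math


def self_information(p: float) -> float:
    """Information, in bits, of an event that occurs with probability p.

    Assumption: base-2 logarithm. log2(1/p) is used instead of -log2(p)
    so that a certain event yields 0.0 rather than -0.0.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in the interval (0, 1]")
    return math.log2(1.0 / p)


certain_event = self_information(1.0)   # an event that is sure to happen
coin_flip = self_information(0.5)       # one of two equally likely outcomes
rare_event = self_information(0.125)    # one of eight equally likely outcomes
print(certain_event, coin_flip, rare_event)
```

The certain event carries no information, which is the property the argument about truth tables and unconditional branches relies on.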
This implies that a truth table (or any highly parallel network structure such as a neural network), which gives a definite output for a fixed input pattern, has zero information value.
In order to have autonomous operation, the machine must have memory to store instructions. The number of instructions in such a machine is its program size, Z.
An autonomous machine which executes instructions in a predictable manner (instructions such as unconditional branches) has zero intelligence because the information content of all possible events (execution of instructions) is zero.
A unit instruction at time t, a_t, is an indivisible event in time, and can be stored in one unit of memory (the unit of memory must be large enough to store any possible instruction size). The number of unit instructions which can be stored is Z.
The intelligence content, H, of a sequence of instructions from time t=0 to T is the total information content of that occurrence of instructions, I(a_t):
$$H = \sum_{t=0}^{T} I(a_t) = -\log \prod_{t=0}^{T} P(a_t) \qquad (2)$$
Equation (2) represents the joint probability of events as the product of the probability of each event.
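The relationship between the summed information and the joint probability can be checked numerically. The sketch below assumes base-2 logarithms and a purely hypothetical five-instruction trace in which each instruction's probability is independent of the others; the function name `intelligence_content` is illustrative only.

```python
import math


def intelligence_content(probabilities):
    """Sum of per-instruction information (bits) over an executed sequence."""
    return sum(math.log2(1.0 / p) for p in probabilities)


# Hypothetical trace: three fully predictable instructions (probability 1)
# interleaved with two random two-way branches (probability 0.5 each).
trace = [1.0, 0.5, 1.0, 0.5, 1.0]

H = intelligence_content(trace)

# The same quantity obtained from the joint probability of the whole sequence,
# taken as the product of the individual probabilities.
joint_probability = math.prod(trace)
H_from_joint = math.log2(1.0 / joint_probability)

print(H, H_from_joint)
```

Only the two random branches contribute; the predictable instructions add nothing to H regardless of how much work they perform.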
Definitions 1-4 imply that each instruction has equal importance. Therefore an instruction that executes a complex pattern matching in one cycle has the same weight as an instruction that does nothing (NOP). This result comes from equation (1). A program (a sequence of instructions) may have a large intelligence content but zero information processing ability.
Please note that we are only measuring the intelligence content of a machine, not its throughput. A stupid machine can be very productive indeed.
Intelligence can be thought of as a resource, similar to energy and time, which should be used in the right circumstances. There must be a relationship between time, space and intelligence whereby increasing intelligence increases the processing time (the number of sequential instructions) but decreases the space requirement (less hardware). For the same information processing rate, increasing parallelism in the program (by utilising more hardware) reduces the number of sequential instructions, which implies a reduction in intelligence. Of course, we can design for circumstances in which all these relationships are optimised, obtaining the maximum intelligence while reducing time by maximising the number of random conditional branches.
An analogy is a student preparing for an examination. He has two options open to him. One is to memorise as much as possible so that he can simply regurgitate facts with little intelligence. The other is to concentrate on deductive reasoning, remembering only key concepts from which he can recover information or even invent alternative solutions. The second method can be called the more intelligent method, whereas the first is more of a reflex action. The throughput is the number of correct answers, which may be the same in both cases.
A stupid machine may look intelligent if it is controlled by pseudo-random sequence generators, which are common in spread spectrum systems. However, a pseudo-random sequence is actually predictable. The degree of randomness (or its inverse, the predictability) depends on the period of the sequence. The pattern of the sequence can only be detected if we are able to sample over twice the period of one complete cycle. The measurable information content therefore depends on the sampling period available to the observer.
If we have full access to the algorithm of the code generation, we would discover that the intelligence of the pseudo-random pattern generator is virtually zero.
If the observer has no access, he must try to obtain as many samples as he can. The information content that he measures is his perceived intelligence of the pseudo-random generator, which must be wrong if the observer has not broken the code generation algorithm. Definitions 1-4 do not fail to quantify the intelligence of the pseudo-random generator; it is just that the sampling process is not sufficient.
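The gap between perceived and actual intelligence can be illustrated with a small generator. The sketch below uses a hypothetical 4-bit linear feedback shift register (LFSR) whose feedback is the XOR of its two lowest bits, a choice that happens to give the maximal period of 15; the register size, the function names and the base-2 logarithm are all assumptions made for illustration, not details from the text.

```python
import math
from collections import Counter


def lfsr4_step(state: int) -> int:
    """Advance a 4-bit Fibonacci LFSR by one step (feedback = bit0 XOR bit1)."""
    feedback = (state ^ (state >> 1)) & 1
    return (state >> 1) | (feedback << 3)


def lfsr4_bits(seed: int, n: int) -> list:
    """Return n output bits (the lowest state bit before each step)."""
    state, out = seed, []
    for _ in range(n):
        out.append(state & 1)
        state = lfsr4_step(state)
    return out


def period(seed: int) -> int:
    """Number of steps until the register returns to its seed state."""
    state, steps = lfsr4_step(seed), 1
    while state != seed:
        state = lfsr4_step(state)
        steps += 1
    return steps


def empirical_entropy(symbols) -> float:
    """Entropy (bits per symbol) estimated from observed symbol frequencies."""
    counts = Counter(symbols)
    total = len(symbols)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


bits = lfsr4_bits(seed=1, n=30)          # two full periods of output
perceived = empirical_entropy(bits)      # what a frequency-counting observer sees
cycle_length = period(1)
repeats = bits[:cycle_length] == bits[cycle_length:]

# With the algorithm known, each next state has probability 1,
# so the information per step is log2(1/1) = 0.
actual = math.log2(1.0 / 1.0)
print(perceived, cycle_length, repeats, actual)
```

An observer who only counts output frequencies estimates close to one bit per symbol, while an observer who samples two full periods, or who knows the feedback rule, can see that the sequence is fully determined.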
The development of this measure of intelligence is due to efforts in identifying the critical instructions for microprocessors. Let us apply these definitions to a typical general purpose program running on a typical microprocessor.
The only instructions which may have intelligence are the conditional branches. Some conditional branches such as those used in loops are very predictable. Although we have defined intelligence, the actual amount of information is very hard to determine. We have to resort to statistical sampling techniques.
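One possible form of such sampling is to record which target each conditional branch takes and estimate its entropy from the observed frequencies. The sketch below is only an illustration under that assumption: the trace format, the addresses and the function name `branch_entropies` are hypothetical, and logarithms are taken to base 2.

```python
import math
from collections import Counter, defaultdict


def branch_entropies(trace):
    """Estimate the entropy (bits) of each branch from sampled outcomes.

    trace: iterable of (branch_address, target_address) pairs.
    """
    targets = defaultdict(Counter)
    for branch, target in trace:
        targets[branch][target] += 1

    result = {}
    for branch, counts in targets.items():
        n = sum(counts.values())
        result[branch] = sum((c / n) * math.log2(n / c) for c in counts.values())
    return result


# Hypothetical samples: a loop-closing branch at 0x10 that jumps back 99 times
# and exits once, and a data-dependent branch at 0x20 split evenly two ways.
loop_samples = [(0x10, 0x08)] * 99 + [(0x10, 0x14)]
data_samples = [(0x20, 0x24)] * 50 + [(0x20, 0x30)] * 50

entropies = branch_entropies(loop_samples + data_samples)
print(entropies[0x10], entropies[0x20])
```

The loop branch contributes far less than one bit per execution, which matches the remark that loop branches are very predictable.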
Let us assume that all conditional branches, and only conditional branches, are truly random. There are bP of them, where P is the number of instructions which have been executed and b is the fraction of executed instructions that are conditional branches, and each branch may choose among B ≤ Z equally likely addresses. For a DLX [3] microprocessor, B=2. For man, B is very large. B determines the capacity for intelligence. For the same DLX microprocessor, b is 0.20 for the Free Software Foundation's GNU C Compiler and 0.05 for Spice. Under these assumptions each random branch contributes \log B, so the total intelligence is H = bP \log B and
$$E = \frac{H}{P} = b \log B \qquad (3)$$
E in equation (3) is the average intelligence (entropy) of each instruction, measured at each instruction execution. For simple problems (those requiring a smaller B), the rate of intelligence of a machine is higher than that of man.
We can conclude that a C compiler program uses more intelligence per step (b) than Spice, which is mainly a numerical program, and that total intelligence depends only on b, B and P.
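The comparison can be worked through numerically. The sketch below assumes that each of the bP random branches contributes log2(B) bits, so that the average per instruction is E = b log2(B) and the total is H = P E; this reading, the base-2 logarithm, the function names and the instruction count P are assumptions made for illustration. Only the values b = 0.20, b = 0.05 and B = 2 come from the text.

```python
import math


def average_intelligence(b: float, B: int) -> float:
    """Average intelligence per executed instruction, in bits.

    b: fraction of executed instructions that are random conditional branches.
    B: number of equally likely target addresses per branch.
    """
    return b * math.log2(B)


def total_intelligence(b: float, B: int, P: int) -> float:
    """Total intelligence, in bits, over P executed instructions."""
    return P * average_intelligence(b, B)


B_DLX = 2                 # two-way conditional branches on DLX
P = 1_000_000             # hypothetical number of executed instructions

E_gcc = average_intelligence(0.20, B_DLX)
E_spice = average_intelligence(0.05, B_DLX)
H_gcc = total_intelligence(0.20, B_DLX, P)
H_spice = total_intelligence(0.05, B_DLX, P)

print(E_gcc, E_spice, H_gcc / H_spice)
```

Under these assumptions the compiler uses four times as much intelligence per instruction as Spice, and raising B (more targets per branch) raises both figures logarithmically.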
If a microprocessor is to be designed for highly intelligent programs, such as expert systems, it must be optimised for minimal pipeline flushes on conditional jumps, and each conditional branch should be able to choose among many target instructions. We now have a concrete guideline for designing microprocessors.
This measure should reinforce our intuition about intelligence versus reflex action. Pattern recognition is just a reflex action after a period of training. Initially, a lot of intelligence is required to incorporate knowledge into our memory. After the initial training period, we require less intelligence. Parallel brute-force hardware is not the ultimate solution. It still needs intelligent pre- and post-processors, which are more likely to be sequential.
Although a human being has many organs that exploit parallelism, our consciousness is sequential. It seems as though there is a sequential master computer (a von Neumann machine) which controls other distributed processors of various degrees of intelligence. This argues the case for SIMD supercomputers which may have slave MIMD machines.
A method of quantifying intelligence is proposed based on information theory and this theory is used to support some design decisions.
1 M. W. SHIELDS 'An Introduction to Automata Theory' Blackwell Scientific Publications (1987)
2 G. F. LUGER, W. A. STUBBLEFIELD 'Artificial Intelligence and the Design of Expert Systems' The Benjamin/Cummings Publishing Company, Inc. (1989)
3 J. L. HENNESSY, D. A. PATTERSON 'Computer Architecture A Quantitative Approach' Morgan Kaufmann Publishers, Inc. (1990)
Prof O Hirota
Technical Programme Chairman
Machida Tokyo 194
School of EEE,
Nanyang Technological University,
11th March 1992
Enclosed are 4 copies of extended abstracts titled "Quantitative Measure of Intelligence" for your consideration for publication.
We have tried our best to ensure that the proposal is original by searching through some textbooks and the COMPENDEX PLUS CD-ROM archive from 1988 to 1991.
The closest work was a paper by Goodman et al. at the IEEE 1988 Int. Symp. on Inf. Theory, where they measure the rule information content, which is not a general intelligence measure. Although knowledge is stored intelligence, more intelligence is needed to extract it and make it useful.
I have no access to the paper, only to its abstract, but it would not be surprising if their equations and definitions are very similar to what we have presented.
The originality of our paper is in separating intelligence from other information such as knowledge but it was developed to help us in designing microprocessors. We would be happy to quote any work that is relevant to our objective.