What Is Artificial Intelligence by Jack Copeland
Jack Copeland
Artificial Intelligence (AI) is usually defined as the science of making computers do things that require intelligence
when done by humans. AI has had some success in limited, or simplified, domains. However, the five decades since
the inception of AI have brought only very slow progress, and early optimism concerning the attainment of human-
level intelligence has given way to an appreciation of the profound difficulty of the problem.
What is Intelligence?
Mainstream thinking in psychology regards human intelligence not as a single ability or cognitive process
but rather as an array of separate components. Research in AI has focussed chiefly on the following
components of intelligence: learning, reasoning, problem-solving, perception, and language-
understanding.
Learning
Learning is distinguished into a number of different forms. The simplest is learning by trial-and-error. For
example, a simple program for solving mate-in-one chess problems might try out moves at random until
one is found that achieves mate. The program remembers the successful move and next time the
computer is given the same problem it is able to produce the answer immediately. The simple
memorising of individual items--solutions to problems, words of vocabulary, etc.--is known as rote
learning.
Rote learning is relatively easy to implement on a computer. More challenging is the problem of
implementing what is called generalisation. Learning that involves generalisation leaves the learner able
to perform better in situations not previously encountered. A program that learns past tenses of regular
English verbs by rote will not be able to produce the past tense of e.g. "jump" until presented at least
once with "jumped", whereas a program that is able to generalise from examples can learn the "add-ed"
rule, and so form the past tense of "jump" in the absence of any previous encounter with this verb.
Sophisticated modern techniques enable programs to generalise complex rules from data.
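The contrast between rote learning and generalisation can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the class names and the suffix-counting method are hypothetical, chosen for brevity, and are not drawn from any program discussed in this article.

```python
# Illustrative sketch: a rote learner versus a learner that generalises
# the "add -ed" rule from examples. All names here are hypothetical.

class RoteLearner:
    def __init__(self):
        self.memory = {}  # verb -> past tense, exactly as observed

    def observe(self, verb, past):
        self.memory[verb] = past

    def past_tense(self, verb):
        # Returns None for any verb that has never been presented.
        return self.memory.get(verb)


class SuffixGeneraliser(RoteLearner):
    def past_tense(self, verb):
        if verb in self.memory:
            return self.memory[verb]
        # Generalise: apply the suffix most often added in the examples seen.
        counts = {}
        for seen_verb, seen_past in self.memory.items():
            if seen_past.startswith(seen_verb):
                suffix = seen_past[len(seen_verb):]
                counts[suffix] = counts.get(suffix, 0) + 1
        if not counts:
            return None
        return verb + max(counts, key=counts.get)


examples = [("walk", "walked"), ("talk", "talked"), ("look", "looked")]
rote, general = RoteLearner(), SuffixGeneraliser()
for verb, past in examples:
    rote.observe(verb, past)
    general.observe(verb, past)

print(rote.past_tense("jump"))     # None
print(general.past_tense("jump"))  # jumped
```

The rote learner has nothing to say about "jump", whereas the generaliser produces "jumped" without ever having encountered the verb.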
Reasoning
To reason is to draw inferences appropriate to the situation in hand. Inferences are classified as either
deductive or inductive. An example of the former is "Fred is either in the museum or the café; he isn't in
the café; so he's in the museum", and of the latter "Previous accidents just like this one have been
caused by instrument failure; so probably this one was caused by instrument failure". The difference
between the two is that in the deductive case, the truth of the premisses guarantees the truth of the
conclusion, whereas in the inductive case, the truth of the premiss lends support to the conclusion that
the accident was caused by instrument failure, but nevertheless further investigation might reveal that,
despite the truth of the premiss, the conclusion is in fact false.
There has been considerable success in programming computers to draw inferences, especially
deductive inferences. However, a program cannot be said to reason simply in virtue of being able to
draw inferences. Reasoning involves drawing inferences that are relevant to the task or situation in
hand. One of the hardest problems confronting AI is that of giving computers the ability to distinguish
the relevant from the irrelevant.
Problem-solving
Problems have the general form: given such-and-such data, find x. A huge variety of types of problem is
addressed in AI. Some examples are: finding winning moves in board games; identifying people from
their photographs; and planning series of movements that enable a robot to carry out a given task.
Perception
In perception the environment is scanned by means of various sense-organs, real or artificial, and
processes internal to the perceiver analyse the scene into objects and their features and relationships.
Analysis is complicated by the fact that one and the same object may present many different
appearances on different occasions, depending on the angle from which it is viewed, whether or not
parts of it are projecting shadows, and so forth.
At present, artificial perception is sufficiently well advanced to enable a self-controlled car-like device to
drive at moderate speeds on the open road, and a mobile robot to roam through a suite of busy offices
searching for and clearing away empty soda cans. One of the earliest systems to integrate perception
and action was FREDDY, a stationary robot with a moving TV 'eye' and a pincer 'hand' (constructed at
Edinburgh University during the period 1966-1973 under the direction of Donald Michie). FREDDY was
able to recognise a variety of objects and could be instructed to assemble simple artefacts, such as a toy
car, from a random heap of components.
Language-understanding
A language is a system of signs having meaning by convention. Traffic signs, for example, form a mini-
language, it being a matter of convention that, for example, the hazard-ahead sign means hazard ahead.
This meaning-by-convention that is distinctive of language is very different from what is called natural
meaning, exemplified in statements like 'Those clouds mean rain' and 'The fall in pressure means the
valve is malfunctioning'.
An important characteristic of full-fledged human languages, such as English, which distinguishes them
from, e.g. bird calls and systems of traffic signs, is their productivity. A productive language is one that is
rich enough to enable an unlimited number of different sentences to be formulated within it.
It is relatively easy to write computer programs that are able, in severely restricted contexts, to respond
in English, seemingly fluently, to questions and statements; examples are the Parry and SHRDLU programs
described in the section Early AI Programs. However, neither Parry nor SHRDLU actually understands
language. An appropriately programmed computer can use language without understanding it, in
principle even to the point where the computer's linguistic behaviour is indistinguishable from that of a
native human speaker of the language (see the section Is Strong AI Possible?). What, then, is involved in
genuine understanding, if a computer that uses language indistinguishably from a native human speaker
does not necessarily understand? There is no universally agreed answer to this difficult question.
According to one theory, whether or not one understands depends not only upon one's behaviour but
also upon one's history: in order to be said to understand one must have learned the language and have
been trained to take one's place in the linguistic community by means of interaction with other
language-users.
Strong AI, Applied AI, and CS
Research in AI divides into three categories: "strong" AI, applied AI, and cognitive simulation or CS.
Strong AI aims to build machines whose overall intellectual ability is indistinguishable from that of a
human being. Joseph Weizenbaum, of the MIT AI Laboratory, has described the ultimate goal of strong
AI as being "nothing less than to build a machine on the model of man, a robot that is to have its
childhood, to learn language as a child does, to gain its knowledge of the world by sensing the world
through its own organs, and ultimately to contemplate the whole domain of human thought". The term
"strong AI", now in wide use, was introduced for this category of AI research in 1980 by the philosopher
John Searle, of the University of California at Berkeley. Some believe that work in strong AI will
eventually lead to computers whose intelligence greatly exceeds that of human beings. Edward Fredkin,
also of the MIT AI Laboratory, has suggested that such machines "might keep us as pets". Strong AI has caught the
attention of the media, but by no means all AI researchers view strong AI as worth pursuing. Excessive
optimism in the 1950s and 1960s concerning strong AI has given way to an appreciation of the extreme
difficulty of the problem, which is possibly the hardest that science has ever undertaken. To date,
progress has been meagre. Some critics doubt whether research in the next few decades will produce
even a system with the overall intellectual ability of an ant.
Applied AI, also known as advanced information-processing, aims to produce commercially viable
"smart" systems--for example, a security system that is able to recognise the faces of people
who are permitted to enter a particular building. Applied AI has already enjoyed considerable success.
Various applied systems are described in this article.
In cognitive simulation, computers are used to test theories about how the human mind works--for
example, theories about how we recognise faces and other objects, or about how we solve abstract
problems (such as the "missionaries and cannibals" problem described later). The theory that is to be
tested is expressed in the form of a computer program and the program's performance at the task--e.g.
face recognition--is compared to that of a human being. Computer simulations of networks of neurons
have contributed both to psychology and to neurophysiology (some of this work is described in the
section Connectionism). The program Parry, described below, was written in order to test a particular
theory concerning the nature of paranoia. Researchers in cognitive psychology typically view CS as a
powerful tool.
Alan Turing and the Origins of AI
The earliest substantial work in the field was done by the British logician and computer pioneer Alan
Mathison Turing.
In 1935, at Cambridge University, Turing conceived the modern computer. He described an abstract
computing machine consisting of a limitless memory and a scanner that moves back and forth through
the memory, symbol by symbol, reading what it finds and writing further symbols. The actions of the
scanner are dictated by a program of instructions that is also stored in the memory in the form of
symbols. This is Turing's "stored-program concept", and implicit in it is the possibility of the machine
operating on, and so modifying or improving, its own program. Turing's computing machine of 1935 is
now known simply as the universal Turing machine. All modern computers are in essence universal
Turing machines.
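The scanner-and-memory picture can be made concrete with a small simulator. The sketch below is a simplification under stated assumptions: the instruction table is handed to the simulator as ordinary data rather than being encoded on the tape itself, so it only loosely echoes the stored-program idea, and the function name, state names and example table are all hypothetical.

```python
def run_turing_machine(table, tape, state="start", blank="_", max_steps=1000):
    """Simulate a one-tape machine.

    table maps (state, symbol) -> (symbol_to_write, "L" or "R", next_state).
    """
    cells = dict(enumerate(tape))  # sparse memory, unbounded in both directions
    head = 0
    for _ in range(max_steps):
        if state == "halt":
            break
        symbol = cells.get(head, blank)
        write, move, state = table[(state, symbol)]
        cells[head] = write
        head += 1 if move == "R" else -1
    lo, hi = min(cells), max(cells)
    return "".join(cells.get(i, blank) for i in range(lo, hi + 1)).strip(blank)


# Hypothetical example program: scan right, flipping each bit, until a blank.
FLIP_BITS = {
    ("start", "0"): ("1", "R", "start"),
    ("start", "1"): ("0", "R", "start"),
    ("start", "_"): ("_", "R", "halt"),
}

print(run_turing_machine(FLIP_BITS, "1011"))  # 0100
```

Because the table is just data, a different table yields a different machine without any change to the simulator.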
During the Second World War Turing was a leading cryptanalyst at the Government Code and Cypher
School, Bletchley Park (where the Allies were able to decode a large proportion of the Wehrmacht's
radio communications). Turing could not turn to the project of building a stored-program electronic
computing machine until the cessation of hostilities in Europe in 1945. Nevertheless, during the wartime
years he gave considerable thought to the issue of machine intelligence. Colleagues at Bletchley Park
recall numerous off-duty discussions with him on the topic, and at one point Turing circulated a
typewritten report (now lost) setting out some of his ideas. One of these colleagues, Donald Michie
(who later founded the Department of Machine Intelligence and Perception at the University of
Edinburgh), remembers Turing talking often about the possibility of computing machines (1) learning
from experience and (2) solving problems by means of searching through the space of possible
solutions, guided by rule-of-thumb principles. The modern term for the latter idea is "heuristic search", a
heuristic being any rule-of-thumb principle that cuts down the amount of searching required in order to
find the solution to a problem. Programming using heuristics is a major part of modern AI, as is the area
now known as machine learning.
At Bletchley Park Turing illustrated his ideas on machine intelligence by reference to chess. (Ever since,
chess and other board games have been regarded as an important test-bed for ideas in AI, since these
are a useful source of challenging and clearly defined problems against which proposed methods for
problem-solving can be tested.) In principle, a chess-playing computer could play by searching
exhaustively through all the available moves, but in practice this is impossible, since it would involve
examining an astronomically large number of moves. Heuristics are necessary to guide and to narrow
the search. Michie recalls Turing experimenting with two heuristics that later became common in AI,
minimax and best-first. The minimax heuristic (described by the mathematician John von Neumann in
1928) involves assuming that one's opponent will move in such a way as to maximise their gains; one
then makes one's own move in such a way as to minimise the losses caused by the opponent's expected
move. The best-first heuristic involves ranking the moves available to one by means of a rule-of-thumb
scoring system and examining the consequences of the highest-scoring move first.
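Both heuristics can be sketched briefly in Python. The game tree and scores below are invented for illustration, and the code uses the common mirror-image formulation in which leaf scores are gains for the player about to move, who maximises while the opponent is assumed to minimise; this is equivalent to the description above but is not a reconstruction of Turing's own routines.

```python
def minimax(node, maximising):
    """Value of a game tree whose leaves are scores for the player at the root."""
    if not isinstance(node, list):
        return node  # a leaf: the score of the resulting position
    values = [minimax(child, not maximising) for child in node]
    return max(values) if maximising else min(values)


def choose_move(tree):
    # After each of our moves the opponent replies, so each child of the
    # root is evaluated as a minimising node.
    scores = [minimax(child, maximising=False) for child in tree]
    best = max(range(len(scores)), key=lambda i: scores[i])
    return best, scores


def best_first(moves, rule_of_thumb):
    # Rank the available moves so that the highest-scoring is examined first.
    return sorted(moves, key=rule_of_thumb, reverse=True)


# Three moves available to us, each with three possible replies (invented scores).
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
best, scores = choose_move(tree)
print(best, scores)  # 0 [3, 2, 2]

print(best_first(["a", "b", "c"], {"a": 1, "b": 5, "c": 3}.get))  # ['b', 'c', 'a']
```

The third move offers the tempting score of 14, but minimax rejects it because the opponent's best reply would leave only 2.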
In London in 1947 Turing gave what was, so far as is known, the earliest public lecture to mention
computer intelligence, saying "What we want is a machine that can learn from experience", adding that
the "possibility of letting the machine alter its own instructions provides the mechanism for this". In
1948 he wrote (but did not publish) a report entitled "Intelligent Machinery". This was the first
manifesto of AI and in it Turing brilliantly introduced many of the concepts that were later to become
central, in some cases after reinvention by others. One of these was the concept of "training" a network
of artificial neurons to perform specific tasks.
In 1950 Turing introduced the test for computer intelligence that is now known simply as the Turing test.
This involves three participants: the computer, a human interrogator, and a human "foil". The
interrogator attempts to determine, by asking questions of the other two participants, which is the
computer. All communication is via keyboard and screen. The interrogator may ask questions as
penetrating and wide-ranging as he or she likes, and the computer is permitted to do everything
possible to force a wrong identification. (So the computer might answer "No" in response to "Are you a
computer?" and might follow a request to multiply one large number by another with a long pause and
an incorrect answer.) The foil must help the interrogator to make a correct identification. A number of
different people play the roles of interrogator and foil, and if sufficiently many interrogators are unable
to distinguish the computer from the human being then (according to proponents of the test) it is to be
concluded that the computer is an intelligent, thinking entity. In 1991, the New York businessman Hugh
Loebner started the annual Loebner Prize competition, offering a $100,000 prize for the first computer
program to pass the Turing test (with $2,000 awarded each year for the best effort). However, no AI
program has so far come close to passing an undiluted Turing test.
In 1951 Turing gave a lecture on machine intelligence on British radio and in 1953 he published a classic
early article on chess programming. Both during and after the war Turing experimented with machine
routines for playing chess. (One was called the Turochamp.) In the absence of a computer to run his
heuristic chess program, Turing simulated the operation of the program by hand, using paper and pencil.
Play was poor! The first true AI programs had to await the arrival of stored-program electronic digital
computers.
Early AI Programs
The first working AI programs were written in the UK by Christopher Strachey, Dietrich Prinz, and
Anthony Oettinger. Strachey was at the time a teacher at Harrow School and an amateur programmer;
he later became Director of the Programming Research Group at Oxford University. Prinz worked for the
engineering firm of Ferranti Ltd, which built the Ferranti Mark I computer in collaboration with
Manchester University. Oettinger worked at the Mathematical Laboratory at Cambridge University,
home of the EDSAC computer.
Strachey chose the board game of checkers (or draughts) as the domain for his experiment in machine
intelligence. Strachey initially coded his checkers program in May 1951 for the pilot model of Turing's
Automatic Computing Engine at the National Physical Laboratory. This version of the program did not
run successfully; Strachey's efforts were defeated first by coding errors and subsequently by a
hardware change that rendered his program obsolete. In addition, Strachey was dissatisfied with the
method employed in the program for evaluating board positions. He wrote an improved version for the
Ferranti Mark I at Manchester (with Turing's encouragement and utilising the latter's recently
completed Programmers' Handbook for the Ferranti computer). By the summer of 1952 this program
could, Strachey reported, "play a complete game of Draughts at a reasonable speed".
Prinz's chess program, also written for the Ferranti Mark I, first ran in November 1951. It was for solving
simple problems of the mate-in-two variety. The program would examine every possible move until a
solution was found. On average several thousand moves had to be examined in the course of solving a
problem, and the program was considerably slower than a human player.
Turing started to program his Turochamp chess-player on the Ferranti Mark I but never completed the
task. Unlike Prinz's program, the Turochamp could play a complete game and operated not by
exhaustive search but under the guidance of rule-of-thumb principles devised by Turing.
Machine learning
Oettinger was considerably influenced by Turing's views on machine learning. His "Shopper" was the
earliest program to incorporate learning (details of the program were published in 1952). The program
ran on the EDSAC. Shopper's simulated world was a mall of eight shops. When sent out to purchase an
item Shopper would if necessary search for it, visiting shops at random until the item was found. While
searching, Shopper would memorise a few of the items stocked in each shop visited (just as a human
shopper would). Next time Shopper was sent out for the same item, or for some other item that it had
already located, it would go to the right shop straight away. As previously mentioned, this simple form
of learning is called "rote learning" and is to be contrasted with learning involving "generalisation",
which is exhibited by the program described next. Learning involving generalisation leaves the learner
able to perform better in situations not previously encountered. (Strachey also investigated aspects of
machine learning, taking the game of NIM as his focus, and in 1951 he reported a simple rote-learning
scheme in a letter to Turing.)
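A rote learner of this kind is easy to sketch. The Python below is a loose illustration based only on the description above, not a reconstruction of Oettinger's program: the class, the shop contents, and the choice to memorise two items per shop are all assumptions, and for simplicity the random search visits each shop at most once.

```python
import random


class Shopper:
    def __init__(self, shops, seed=0):
        self.shops = shops   # shop name -> set of items stocked there
        self.known = {}      # rote memory: item -> shop where it was seen
        self.rng = random.Random(seed)

    def buy(self, item):
        """Return the list of shops visited while looking for the item."""
        if item in self.known:
            return [self.known[item]]  # go to the right shop straight away
        order = list(self.shops)
        self.rng.shuffle(order)        # search the shops in random order
        visited = []
        for shop in order:
            visited.append(shop)
            # Memorise a few of the items stocked in each shop visited.
            for seen in sorted(self.shops[shop])[:2]:
                self.known.setdefault(seen, shop)
            if item in self.shops[shop]:
                self.known[item] = shop
                break
        return visited


# A hypothetical mall of eight shops, three items each.
mall = {f"shop{i}": {f"item{i}a", f"item{i}b", f"item{i}c"} for i in range(8)}
shopper = Shopper(mall)
first_trip = shopper.buy("item5c")
second_trip = shopper.buy("item5c")
print(len(first_trip), second_trip)
```

The first trip may take up to eight visits; the second always takes exactly one, because the location has been memorised.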
The first AI program to run in the U.S. was also a checkers program, written in 1952 by Arthur Samuel of
IBM for the IBM 701. Samuel took over the essentials of Strachey's program (which Strachey had
publicised at a computing conference in Canada in 1952) and over a period of years considerably
extended it. In 1955 he added features that enabled the program to learn from experience, and
therefore improve its play. Samuel included mechanisms for both rote learning and generalisation. The
program soon learned enough to outplay its creator. Successive enhancements that Samuel made to the
learning apparatus eventually led to the program winning a game in 1962 against a former Connecticut
checkers champion (who immediately turned the tables and beat the program in six games straight).
To speed up learning, Samuel would set up two copies of the program, Alpha and Beta, on the same
computer, and leave them to play game after game with each other. The program used heuristics to
rank moves and board positions ("looking ahead" as many as ten turns of play). The learning procedure
consisted in the computer making small numerical changes to Alpha's ranking procedure, leaving Beta's
unchanged, and then comparing Alpha's and Beta's performance over a few games. If Alpha played
worse than Beta, these changes to the ranking procedure were discarded, but if Alpha played better
than Beta then Beta's ranking procedure was replaced with Alpha's. As in biological evolution, the fitter
survived, and over many such cycles of mutation and selection the program's skill would increase.
(However, the quality of learning displayed by even a simple living being far surpasses that of Samuel's
and Oettinger's programs.)
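The cycle of small changes, comparison and replacement can be sketched as follows. This is a schematic illustration only: real checkers play is replaced by a hypothetical stand-in, play_match, in which whichever ranking procedure lies closer to a hidden "ideal" set of weights is deemed to have played better. The weights, step size and function names are all assumptions.

```python
import random


def error(weights, ideal):
    return sum((w - t) ** 2 for w, t in zip(weights, ideal))


def play_match(alpha, beta, ideal):
    """Stand-in for a few games: True if Alpha 'played better' than Beta."""
    return error(alpha, ideal) < error(beta, ideal)


def self_play_learning(beta, ideal, cycles=200, step=0.1, seed=0):
    rng = random.Random(seed)
    for _ in range(cycles):
        # Make small numerical changes to Alpha's ranking procedure only.
        alpha = [w + rng.uniform(-step, step) for w in beta]
        if play_match(alpha, beta, ideal):
            beta = alpha  # Beta's procedure is replaced with Alpha's
        # Otherwise the changes are simply discarded.
    return beta


ideal = [1.0, -2.0, 0.5]    # hidden from the learner; used only to score matches
initial = [0.0, 0.0, 0.0]
learned = self_play_learning(initial, ideal)
print(error(initial, ideal), error(learned, ideal))
```

Since a change is kept only when it wins, the ranking procedure never gets worse from one cycle to the next.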
Evolutionary computing
The work by Samuel just described was among the earliest in a field now called evolutionary computing
and is an example of the use of a genetic algorithm or GA. The term "genetic algorithm" was introduced
in about 1975 by John Holland and his research group at the University of Michigan, Ann Arbor.
Holland's work is principally responsible for the current intense interest in GAs. GAs employ methods
analogous to the processes of natural evolution in order to produce successive generations of software
entities that are increasingly fit for their intended purpose. The concept in fact goes back to Turing's
manifesto of 1948, where he employed the term "genetical search". The use of GAs is burgeoning in AI
and elsewhere. In one recent application, a GA-based system and a witness to a crime cooperate to
generate on-screen faces that become closer and closer to the recollected face of the criminal.
The ability to reason logically is an important aspect of intelligence and has always been a major focus of
AI research. In his 1948 manifesto, Turing emphasised that once a computer can prove logical theorems
it will be able to search intelligently for solutions to problems. (An example of a simple logical theorem is
"given that either X is true or Y is true, and given that X is in fact false, it follows that Y is true".) Prinz
used the Ferranti Mark I, the first commercially available computer, to solve logical problems, and in
1949 and 1951 Ferranti built two small experimental special-purpose computers for theorem-proving
and other logical work.
An important landmark in this area was a theorem-proving program written in 1955-1956 by Allen
Newell and J. Clifford Shaw of the RAND Corporation at Santa Monica and Herbert Simon of the
Carnegie Institute of Technology (now Carnegie-Mellon University). The program was designed to prove
theorems from the famous logical work Principia Mathematica by Alfred North Whitehead and Bertrand
Russell. In the case of one theorem, the proof devised by the program was more elegant than the proof
given by Whitehead and Russell.
The Logic Theorist, as the program became known, was the major exhibit at a conference organised in
1956 at Dartmouth College, New Hampshire, by John McCarthy, who subsequently became one of the
most influential figures in AI. The title of the conference was "The Dartmouth Summer Research Project
on Artificial Intelligence". This was the first use of the term "Artificial Intelligence". Turing's original term
"machine intelligence" has also persisted, especially in Britain.
Newell, Simon and Shaw went on to construct the General Problem Solver, or GPS. The first version of
GPS ran in 1957 and work continued on the project for about a decade. GPS could solve an impressive
variety of puzzles, for example the "missionaries and cannibals" problem: How are a party of three
missionaries and three cannibals to cross a river in a small boat that will take no more than two at a
time, without the missionaries on either bank becoming outnumbered by cannibals? GPS would search
for a solution in a trial-and-error fashion, under the guidance of heuristics supplied by the programmers.
One criticism of GPS, and other programs that lack learning, is that the program's "intelligence" is
entirely second-hand, coming from the programmer (mainly via the heuristics, in the case of GPS).
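The puzzle itself is small enough to solve by exhaustive search. The sketch below uses plain breadth-first search, which is not the heuristic-guided method employed by GPS; it is offered only to show what searching the space of possible solutions looks like. The state encoding (missionaries and cannibals on the starting bank, plus a flag for the boat's position) is an assumption of this example.

```python
from collections import deque

# Possible boat loads: (missionaries, cannibals), at most two passengers.
LOADS = [(1, 0), (2, 0), (0, 1), (0, 2), (1, 1)]


def safe(m, c):
    """True if neither bank has missionaries outnumbered by cannibals."""
    if not (0 <= m <= 3 and 0 <= c <= 3):
        return False
    far_m, far_c = 3 - m, 3 - c
    return (m == 0 or m >= c) and (far_m == 0 or far_m >= far_c)


def solve(start=(3, 3, 1), goal=(0, 0, 0)):
    parents = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            path = []
            while state is not None:
                path.append(state)
                state = parents[state]
            return path[::-1]
        m, c, boat = state
        sign = -1 if boat == 1 else 1  # boat leaves or returns to the start bank
        for dm, dc in LOADS:
            nxt = (m + sign * dm, c + sign * dc, 1 - boat)
            if safe(nxt[0], nxt[1]) and nxt not in parents:
                parents[nxt] = state
                queue.append(nxt)
    return None


path = solve()
print(len(path) - 1, "crossings")
for state in path:
    print(state)
```

Breadth-first search examines states in order of distance from the start, so the first solution it finds uses the fewest crossings.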
Two of the best-known early programs are Eliza and Parry. Details of both were first published in 1966.
These programs gave an eerie semblance of conversing intelligently. Parry, written by Stanford
University psychiatrist Kenneth Colby, simulated a human paranoiac. Parry's responses are capitalised in
the following extract from a "conversation" between Parry and a psychiatric interviewer.
Eliza, written by Joseph Weizenbaum at MIT, simulated a human therapist. In the following extract, Eliza
"speaks" second.
Neither Parry nor Eliza can reasonably be described as intelligent. Parry's contributions to the
conversation are "canned"--constructed in advance by the programmer and stored away in the
computer's memory. As the philosopher Ned Block says, systems like Parry are no more intelligent than
is a juke box. Eliza, too, relies on canned sentences and simple programming tricks (such as editing and
returning the remark that the human participant has just made).
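The trick of editing and returning the participant's remark can be illustrated in a few lines. The word table and the canned frame below are invented for this sketch and are not taken from Weizenbaum's program.

```python
# Hypothetical table for swapping first- and second-person words.
REFLECTIONS = {
    "i": "you", "me": "you", "my": "your", "am": "are",
    "you": "I", "your": "my",
}


def reflect(remark):
    words = remark.lower().rstrip(".!?").split()
    return " ".join(REFLECTIONS.get(word, word) for word in words)


def respond(remark):
    # A single canned frame wrapped around the edited remark.
    return f"Why do you say {reflect(remark)}?"


print(respond("I am unhappy with my job."))
# Why do you say you are unhappy with your job?
```

Nothing in the code represents what a job or unhappiness is; the reply is produced purely by substituting words.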
AI Programming Languages
In the course of their work on the Logic Theorist and GPS, Newell, Simon and Shaw developed their
Information Processing Language, or IPL, a computer language tailored for AI programming. At the heart
of IPL was a highly flexible data-structure they called a "list". A list is simply an ordered sequence of
items of data. Some or all of the items in a list may themselves be lists. This leads to richly branching
structures.
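The idea of lists containing lists can be shown with Python's built-in list type, used here as a stand-in; IPL's own notation was quite different. The helper functions and the example data are hypothetical.

```python
def count_atoms(items):
    """Count the non-list items anywhere inside a nested list."""
    return sum(count_atoms(x) if isinstance(x, list) else 1 for x in items)


def depth(items):
    """Depth of nesting: a flat list has depth 1."""
    return 1 + max((depth(x) for x in items if isinstance(x, list)), default=0)


# An ordered sequence whose second and third items are themselves lists.
tree = ["a", ["b", "c"], [["d"], "e"]]
print(count_atoms(tree), depth(tree))  # 5 3
```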
In 1960 John McCarthy combined elements of IPL with elements of the lambda calculus--a powerful
logical apparatus dating from 1936--to produce the language that he called LISP (from LISt Processor). In
the U.S., LISP remains the principal language for AI work. (The lambda calculus itself was invented by
Princeton logician Alonzo Church, while investigating the abstract Entscheidungsproblem, or decision
problem, for predicate logic--the same problem that Turing was attacking when he invented the
universal Turing machine.)
The logic programming language PROLOG (from PROgrammation en LOGique) was conceived by Alain
Colmerauer at the University of Marseilles, where the language was first implemented in 1973. PROLOG
was further developed by logician Robert Kowalski, a member of the AI group at Edinburgh University.
This language makes use of a powerful theorem-proving technique known as "resolution", invented in
1963 at the Atomic Energy Commission's Argonne National Laboratory in Illinois by the British logician
Alan Robinson. PROLOG can determine whether or not a given statement follows logically from other
given statements. For example, given the statements "All logicians are rational" and "Robinson is a
logician", a PROLOG program responds in the affirmative to the query "Robinson is rational?". PROLOG
is widely used for AI work, especially in Europe and Japan.
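The logician example can be mimicked in Python. Note the substitution: the sketch below uses simple forward chaining over one-place predicates, not the resolution technique that PROLOG actually employs, and the data layout and function name are assumptions made for illustration.

```python
# "Robinson is a logician."
facts = {("logician", "robinson")}
# "All logicians are rational", stored as (premise, conclusion).
rules = [("logician", "rational")]


def follows(query, facts, rules):
    """True if the query can be derived from the facts using the rules."""
    known = set(facts)
    changed = True
    while changed:  # keep applying rules until nothing new is derived
        changed = False
        for premise, conclusion in rules:
            for predicate, individual in list(known):
                derived = (conclusion, individual)
                if predicate == premise and derived not in known:
                    known.add(derived)
                    changed = True
    return query in known


print(follows(("rational", "robinson"), facts, rules))  # True
print(follows(("rational", "church"), facts, rules))    # False
```

The second query fails because nothing in the given statements says that Church is a logician.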
Researchers at the Institute for New Generation Computer Technology in Tokyo have used PROLOG as
the basis for sophisticated logic programming languages. These languages are in use on non-numerical
parallel computers developed at the Institute. (The languages and the computers are known as "Fifth
Generation" software and hardware.)
Other recent work includes the development of languages for reasoning about time-dependent data
such as "the account was paid yesterday". These languages are based on tense logic, a type of logic that
permits statements to be located in the flow of time. (Tense logic was invented in 1953 by the
philosopher Arthur Prior at the University of Canterbury, New Zealand.)
Micro-World AI
The real world is full of distracting and obscuring detail: generally science progresses by focussing on
artificially simple models of reality (in physics, frictionless planes and perfectly rigid bodies, for
example). In 1970 Marvin Minsky and Seymour Papert, of the MIT AI Laboratory, proposed that AI
research should likewise focus on developing programs capable of intelligent behaviour in artificially
simple situations known as micro-worlds. Much research has focussed on the so-called blocks world,
which consists of coloured blocks of various shapes and sizes arrayed on a flat surface.
An early success of the micro-world approach was SHRDLU, written by Terry Winograd of MIT (details of
the program were published in 1972). SHRDLU controlled a robot arm that operated above a flat surface
strewn with play blocks (both the arm and the blocks were virtual). SHRDLU would respond to
commands typed in natural English, such as "Will you please stack up both of the red blocks and either a
green cube or a pyramid". The program would plan out a sequence of actions and in the virtual world
the robot arm would arrange the blocks appropriately. SHRDLU could correctly answer questions about
its world of blocks, for example "Can a pyramid be supported by a pyramid?" (SHRDLU attempts to stack
up two pyramids and fails) and "Is there anything which is bigger than every pyramid but is not as wide
as the thing that supports it?" (to which SHRDLU answered "Yes, the blue block"). SHRDLU could also
answer questions about its own actions:
Had you touched any pyramid before you put one on the green block?
YES, THE GREEN ONE.
When did you pick it up?
WHILE I WAS STACKING UP A RED CUBE, A LARGE GREEN CUBE, AND A LARGE RED BLOCK.
Why?
TO GET RID OF IT.
Why did you do that?
TO CLEAN OFF THE RED CUBE.
Why did you clear off that cube?
TO PUT IT ON A LARGE GREEN CUBE.
Why did you do that?
TO STACK UP A RED CUBE, A LARGE GREEN CUBE, AND A LARGE RED BLOCK.
Why did you do that?
BECAUSE YOU ASKED ME TO.
Although SHRDLU was initially hailed as a major breakthrough, Winograd soon announced that the
program was in fact a dead end. The techniques pioneered in the program proved unsuitable for
application in wider, more interesting worlds. Moreover, the appearance that SHRDLU gives of
understanding the blocks micro-world, and English statements concerning it, is in fact an illusion. SHRDLU
has no idea what a red block is.
Another product of the micro-world approach was Shakey, a mobile robot developed at the Stanford
Research Institute by Bertram Raphael, Nils Nilsson and their group, during the period 1968-1972.
(Shakey can now be viewed at the Boston Computing Museum.) The robot occupied a specially built
micro-world consisting of walls, doorways, and a few simply-shaped wooden blocks. Each wall had a
carefully painted baseboard to enable the robot to "see" where the wall met the floor (a simplification
of reality that is typical of the micro-world approach). Shakey had about a dozen basic abilities, such as
TURN, PUSH and CLIMB-RAMP. These could be combined in various ways by the robot's planning
programs. Shakey's primary sensor was a black-and-white television camera. Other sensors included a
"bump bar", and odometry that enabled the robot to calculate its position by "dead reckoning". A
demonstration video showed Shakey obeying an instruction to move a certain block from one room to
another by locating a ramp, pushing the ramp to the platform on which the block happened to be
located, trundling up the ramp, toppling the block onto the floor, descending the ramp, and
manoeuvring the block to the required room, this sequence of actions having been devised entirely by
the robot's planning program without human intervention. Critics emphasise the highly simplified
nature of Shakey's environment and point out that, despite these simplifications, Shakey operated
excruciatingly slowly--the sequence of actions in the demonstration video in fact took days to complete.
The reasons for Shakey's inability to operate on the same time-scale as a human being are examined
later in this article.
FREDDY, a stationary robot with a TV "eye" mounted on a steerable platform, and a pincer "hand", was
constructed at Edinburgh University under the direction of Donald Michie. FREDDY was able to
recognise a small repertoire of objects, including a hammer, a cup and a ball, with about 95% accuracy;
recognising a single object would take several minutes of computing time. The robot could be "taught"
to assemble simple objects, such as a toy car, from a kit of parts. Envisaged applications included
production-line assembly work and automatic parcel handling. FREDDY was conceived in 1966 but work
was interrupted in 1973, owing to a change in the British Government's funding policy in the wake of a
disparaging report on AI (and especially robotics) by the Cambridge mathematician Sir James Lighthill.
Work on FREDDY resumed in 1982 with U.S. funding.
Roger Schank and his group at Yale applied a form of the micro-world approach to language processing.
Their program SAM (1975) could answer questions about simple stories concerning stereotypical
situations, such as dining in a restaurant and travelling on the subway. The program could infer
information that was implicit in the story. For example, when asked "What did John order?", SAM
replies "John ordered lasagne", even though the story states only that John went to a restaurant and ate
lasagne. FRUMP, another program by Schank's group (1977), produced summaries in three languages of
wire-service news reports. Impressive though SAM and FRUMP are, it is important to bear in mind that
these programs are disembodied and have no real idea what lasagne and eating are. As critics point out,
understanding a story requires more than an ability to produce strings of symbols in response to other
strings of symbols.
The greatest success of the micro-world approach is a type of program known as an expert system.
Expert Systems
An expert system is a computer program dedicated to solving problems and giving advice within a
specialised area of knowledge. A good system can match the performance of a human specialist. The
field of expert systems is the most advanced part of AI, and expert systems are in wide commercial use.
Expert systems are examples of micro-world programs: their "worlds"--for example, a model of a ship's
hold and the containers that are to be stowed in it--are self-contained and relatively uncomplicated.
Uses of expert systems include medical diagnosis, chemical analysis, credit authorisation, financial
management, corporate planning, document routing in financial institutions, oil and mineral
prospecting, genetic engineering, automobile design and manufacture, camera lens design, computer
installation design, airline scheduling, cargo placement, and the provision of an automatic customer help
service for home computer owners.
The basic components of an expert system are a "knowledge base" or KB and an "inference engine". The
information in the KB is obtained by interviewing people who are expert in the area in question. The
interviewer, or "knowledge engineer", organises the information elicited from the experts into a
collection of rules, typically of "if-then" structure. Rules of this type are called "production rules". The
inference engine enables the expert system to draw deductions from the rules in the KB. For example, if
the KB contains production rules "if x then y" and "if y then z", the inference engine is able to deduce "if
x then z". The expert system might then query its user "is x true in the situation that we are
considering?" (e.g. "does the patient have a rash?") and if the answer is affirmative, the system will
proceed to infer z.
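The chaining of production rules described above can be sketched in a few lines of Python. This is only
an illustrative sketch: the rule format (a set of premises paired with a conclusion) and the names RULES
and forward_chain are assumptions made for the example, not the design of any particular expert system.

```python
# Hypothetical rule format: (set of premises, conclusion).
RULES = [
    ({"x"}, "y"),  # if x then y
    ({"y"}, "z"),  # if y then z
]


def forward_chain(facts, rules):
    """Fire every rule whose premises are all known, until nothing new is derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


# The user has answered "yes" to the query "is x true?", so x is a known fact.
print(sorted(forward_chain({"x"}, RULES)))
```

Starting from the single fact x, the loop first adds y and then z, mirroring the deduction "if x then z".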
In 1965 the AI researcher Edward Feigenbaum and the geneticist Joshua Lederberg, both of Stanford
University, began work on Heuristic Dendral, the high-performance program that was the model for
much of the ensuing work in the area of expert systems (the name subsequently became DENDRAL). The
program's task was chemical analysis. The substance to be analysed might, for example, be a
complicated compound of carbon, hydrogen and nitrogen. Starting from spectrographic data obtained
from the substance, DENDRAL would hypothesise the substance's molecular structure. DENDRAL's
performance rivalled that of human chemists expert at this task, and the program was used in industry
and in universities.
Work on MYCIN, an expert system for treating blood infections, began at Stanford in 1972. MYCIN would
attempt to identify the organism responsible for an infection from information concerning the patient's
symptoms and test results. The program would request further information if necessary, asking
questions such as "has the patient recently suffered burns?". Sometimes MYCIN would suggest
additional laboratory tests. When the program had arrived at a diagnosis it would recommend a course
of medication. If requested, MYCIN would explain the reasoning leading to the diagnosis and
recommendation.
Examples of production rules from MYCIN's knowledge base are (1) If the site of the culture is blood,
and the stain of the organism is gramneg, and the morphology of the organism is rod, and the patient
has been seriously burned, then there is evidence (.4) that the identity of the organism is pseudomonas.
(The decimal number is a certainty factor, indicating the extent to which the evidence supports the
conclusion.) (2) If the identity of the organism is pseudomonas then therapy should be selected from
among the following drugs: Colistin (.98) Polymyxin (.96) Gentamicin (.96) Carbenicillin (.65)
Sulfisoxazole (.64). (The decimal numbers represent the statistical probability of the drug arresting the
growth of pseudomonas.) The program would make a final choice of drug from this list after quizzing the
user concerning contra-indications such as allergies. Using around 500 such rules MYCIN achieved a high
level of performance. The program operated at the same level of competence as human specialists in
blood infections, and rather better than general practitioners.
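The two quoted rules can be written down as data plus a simple test, as in the following sketch. The
numbers are those quoted above; the function names and the selection step (pick the highest-scoring drug
that is not contra-indicated) are assumptions for illustration, since the text does not describe how
MYCIN actually combined certainty factors or made its final choice.

```python
# Scores quoted in rule (2) above.
PSEUDOMONAS_DRUGS = {
    "Colistin": 0.98,
    "Polymyxin": 0.96,
    "Gentamicin": 0.96,
    "Carbenicillin": 0.65,
    "Sulfisoxazole": 0.64,
}


def rule_1(site, stain, morphology, seriously_burned):
    """Rule (1): return (conclusion, certainty factor), or None if the rule does not apply."""
    if site == "blood" and stain == "gramneg" and morphology == "rod" and seriously_burned:
        return ("pseudomonas", 0.4)
    return None


def choose_drug(candidates, contraindicated):
    """Hypothetical selection step: the best-scoring drug the patient is able to take."""
    allowed = {drug: score for drug, score in candidates.items() if drug not in contraindicated}
    if not allowed:
        return None
    return max(allowed, key=allowed.get)


print(rule_1("blood", "gramneg", "rod", True))
print(choose_drug(PSEUDOMONAS_DRUGS, contraindicated={"Colistin"}))
```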
Janice Aikins' medical expert system Centaur (1983) was designed to determine the presence and
severity of lung disease in a patient by interpreting measurements from pulmonary function tests. The
following is actual output from the expert system concerning a patient at Pacific Medical Center in San
Francisco.
The findings about the diagnosis of obstructive airways disease are as follows: Elevated lung volumes
indicate overinflation. The RV/TLC ratio is increased, suggesting a severe degree of air trapping. Low
mid-expiratory flow is consistent with severe airway obstruction. Obstruction is indicated by curvature
of the flow-volume loop which is of a severe degree. Conclusions: Smoking probably exacerbates the
severity of the patient's airway obstruction. Discontinuation of smoking should help relieve the
symptoms. Good response to bronchodilators is consistent with an asthmatic condition, and their
continued use is indicated. Pulmonary function diagnosis: Severe obstructive airways disease, asthmatic
type. Consultation finished.
An important feature of expert systems is that they are able to work cooperatively with their human
users, enabling a degree of human-computer symbiosis. AI researcher Douglas Lenat says of his expert
system Eurisko, which became a champion player in the star-wars game Traveller, that the "final
crediting of the win should be about 60/40% Lenat/Eurisko, though the significant point here is that
neither Lenat nor Eurisko could have won alone". Eurisko and Lenat cooperatively designed a fleet of
warships which exploited the rules of the Traveller game in unconventional ways, and which was
markedly superior to the fleets designed by human participants in the game.
Fuzzy logic
Some expert systems use fuzzy logic. In standard, non-fuzzy, logic there are only two "truth values", true
and false. This is a somewhat unnatural restriction, since we normally think of statements as being
nearly true, partly false, truer than certain other statements, and so on. According to standard logic,
however, there are no such in-between values--no "degrees of truth"--and any statement is either
completely true or completely false. In 1920 and 1930 the Polish philosopher Jan Lukasiewicz introduced
a form of logic that employs not just two values but many. Lotfi Zadeh, of the University of California at
Berkeley, subsequently proposed that the many values of Lukasiewicz's logic be regarded as degrees of
truth, and he coined the expression "fuzzy logic" for the result. (Zadeh published the first of many
papers on the subject in 1965.) Fuzzy logic is particularly useful when it is necessary to deal with vague
expressions, such as "bald", "heavy", "high", "low", "hot", "cold" and so on. Vague expressions are
difficult to deal with in standard logic because statements involving them--"Fred is bald", say--may be
neither completely true nor completely false. Non-baldness shades gradually into baldness, with no
sharp dividing line at which the statement "Fred is bald" could change from being completely false to
completely true. Often the rules that knowledge engineers elicit from human experts contain vague
expressions, so it is useful if an expert system's inference engine employs fuzzy logic. An example of
such a rule is: "If the pressure is high but not too high, then reduce the fuel flow a little". (Fuzzy logic is
used elsewhere in AI, for example in robotics and in neuron-like computing. There are literally
thousands of commercial applications of fuzzy logic, many developed in Japan, ranging from an
automatic subway train controller to control systems for washing machines and cameras.)
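A rough sketch of how a rule such as "if the pressure is high but not too high" might be given a degree
of truth follows. Everything numerical here is invented for illustration: the ramp-shaped membership
functions and their endpoints are assumptions, and the use of min for "and" and 1 - a for "not" is one
common convention in fuzzy logic rather than something stated in the text.

```python
def high(pressure, low=50.0, full=100.0):
    """Degree (0 to 1) to which a pressure counts as "high"; the ramp endpoints are made up."""
    return min(1.0, max(0.0, (pressure - low) / (full - low)))


def too_high(pressure):
    """Degree to which a pressure counts as "too high" (hypothetical ramp from 90 to 120)."""
    return high(pressure, low=90.0, full=120.0)


def fuzzy_and(a, b):
    return min(a, b)


def fuzzy_not(a):
    return 1.0 - a


def rule_strength(pressure):
    """Degree of truth of "the pressure is high but not too high"."""
    return fuzzy_and(high(pressure), fuzzy_not(too_high(pressure)))


for p in (40, 75, 95, 130):
    print(p, round(rule_strength(p), 2))
```

A controller could then scale the instruction "reduce the fuel flow a little" by the rule's strength,
so that the rule has no effect at low or excessive pressures and its fullest effect in between.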
Expert systems have no "common sense". They have no understanding of what they are for, nor of what
the limits of their applicability are, nor of how their recommendations fit into a larger context. If MYCIN
were told that a patient who has received a gunshot wound is bleeding to death, the program would
attempt to diagnose a bacterial cause for the patient's symptoms. Expert systems can make absurd
errors, such as prescribing an obviously incorrect dosage of a drug for a patient whose weight and age
are accidentally swapped by the clerk. One project aimed at improving the technology further is
described in the next section.
The knowledge base of an expert system is small and therefore manageable--a few thousand rules at
most. Programmers are able to employ simple methods of searching and updating the KB which would
not work if the KB were large. Furthermore, micro-world programming involves extensive use of what
are called "domain-specific tricks"--dodges and shortcuts that work only because of the circumscribed
nature of the program's "world". More general simplifications are also possible. One example concerns
the representation of time. Some expert systems get by without acknowledging time at all. In their
micro-worlds everything happens in an eternal present. If reference to time is unavoidable, the micro-
world programmer includes only such aspects of temporal structure as are essential to the task--for
example, that if a is before b and b is before c then a is before c. This rule enables the expert system to
merge suitable pairs of before-statements and so extract their implication (e.g. that the patient's rash
occurred before the application of penicillin). The system may have no other information at all
concerning the relationship "before"--not even that it orders events in time rather than space.
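The merging of before-statements can be sketched as follows. The function name close_before and the
intermediate event "fever" are hypothetical, introduced only to give the rule two statements to merge.

```python
def close_before(pairs):
    """Add (a, c) whenever (a, b) and (b, c) are present, until nothing new appears."""
    closed = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closed):
            for c, d in list(closed):
                if b == c and (a, d) not in closed:
                    closed.add((a, d))
                    changed = True
    return closed


# Two before-statements; "fever" is an invented intermediate event.
facts = {("rash", "fever"), ("fever", "penicillin")}
print(("rash", "penicillin") in close_before(facts))
```

Note that the code knows nothing about time: the same function would serve equally well for "to the
left of" or any other relation for which the merging rule holds.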
The problem of how to design a computer program that performs at human levels of competence in the
full complexity of the real world remains open.
The CYC Project
CYC (the name comes from "encyclopaedia") is the largest experiment yet in symbolic AI. The project
began at the Microelectronics and Computer Technology Corporation in Texas in 1984 under the
direction of Douglas Lenat, with an initial budget of U.S.$50 million, and is now Cycorp Inc. The goal is to
build a KB containing a significant percentage of the common sense knowledge of a human being. Lenat
hopes that the CYC project will culminate in a KB that can serve as the foundation for future generations
of expert systems. His expectation is that when expert systems are equipped with common sense they
will achieve an even higher level of performance and be less prone to errors of the sort just mentioned.
By "common sense", AI researchers mean that large corpus of worldly knowledge that human beings
use to get along in daily life. A moment's reflection reveals that even the simplest activities and
transactions presuppose a mass of trivial-seeming knowledge: to get to a place one should (on the
whole) move in its direction; one can pass by an object by moving first towards it and then away from it;
one can pull with a string, but not push; pushing something usually affects its position; an object resting
on a pushed object usually but not always moves with the pushed object; water flows downhill; city
dwellers do not usually go outside undressed; causes generally precede their effects; time constantly
passes and future events become past events ... and so on and so on. A computer that is to get along
intelligently in the real world must somehow be given access to millions of such facts. Winograd, the
creator of SHRDLU, has remarked "It has long been recognised that it is much easier to write a program
to carry out abstruse formal operations than to capture the common sense of a dog".
The CYC project involves "hand-coding" many millions of assertions. By the end of the first six years,
over one million assertions had been entered manually into the KB. Lenat estimates that it will require
some 2 person-centuries of work to increase this figure to the 100 million assertions that he believes are
necessary before CYC can begin learning usefully from written material for itself. At any one time as
many as 30 people may be logged into CYC, all simultaneously entering data. These knowledge-enterers
(or "cyclists") go through newspaper and magazine articles, encyclopaedia entries, advertisements, and
so forth, asking themselves what the writer assumed the reader would already know: living things get
diseases, the products of a commercial process are more valuable than the inputs, and so on. Lenat
describes CYC as "the complement of an encyclopaedia": the primary goal of the project is to encode the
knowledge that any person or machine must have before they can begin to understand an
encyclopaedia. He has predicted that in the early years of the new millennium, CYC will become "a
system with human-level breadth and depth of knowledge".
CYC uses its common-sense knowledge to draw inferences that would defeat simpler systems. For
example, CYC can infer "Garcia is wet" from the statement "Garcia is finishing a marathon run",
employing its knowledge that running a marathon entails high exertion, that people sweat at high levels
of exertion, and that when something sweats it is wet.
Among the outstanding fundamental problems with CYC are (1) issues in search and problem-solving,
for example how to automatically search the KB for information that is relevant to a given problem
(these issues are aspects of the frame problem, described in the section Nouvelle AI) and (2) issues in
knowledge representation, for example how basic concepts such as those of substance and causation
are to be analyzed and represented within the KB. Lenat emphasises the importance of large-scale
knowledge-entry and is devoting only some 20 percent of the project's effort to development of
mechanisms for searching, updating, reasoning, learning, and analogizing. Critics argue that this strategy
puts the cart before the horse.
Top-Down AI vs Bottom-Up AI
Turing's manifesto of 1948 distinguished two different approaches to AI, which may be termed "top
down" and "bottom up". The work described so far in this article belongs to the top-down approach. In
top-down AI, cognition is treated as a high-level phenomenon that is independent of the low-level
details of the implementing mechanism--a brain in the case of a human being, and one or another
design of electronic digital computer in the artificial case. Researchers in bottom-up AI, or
connectionism, take an opposite approach and simulate networks of artificial neurons that are similar to
the neurons in the human brain. They then investigate what aspects of cognition can be recreated in
these artificial networks.
The difference between the two approaches may be illustrated by considering the task of building a
system to discriminate between W, say, and other letters. A bottom-up approach could involve
presenting letters one by one to a neural network that is configured somewhat like a retina, and
reinforcing neurons that happen to respond more vigorously to the presence of W than to the presence
of any other letter. A top-down approach could involve writing a computer program that checks inputs
of letters against a description of W that is couched in terms of the angles and relative lengths of
intersecting line segments. Simply put, the currency of the bottom-up approach is neural activity and of
the top-down approach descriptions of relevant features of the task.
The descriptions employed in the top-down approach are stored in the computer's memory as
structures of symbols (e.g. lists). In the case of a chess or checkers program, for example, the
descriptions involved are of board positions, moves, and so forth. The reliance of top-down AI on
symbolically encoded descriptions has earned it the name "symbolic AI". In the 1970s Newell and Simon--
vigorous advocates of symbolic AI--summed up the approach in what they called the Physical Symbol
System Hypothesis, which says that the processing of structures of symbols by a digital computer is
sufficient to produce artificial intelligence, and that, moreover, the processing of structures of symbols
by the human brain is the basis of human intelligence. While it remains an open question whether the
Physical Symbol System Hypothesis is true or false, recent successes in bottom-up AI have resulted in
symbolic AI being to some extent eclipsed by the neural approach, and the Physical Symbol System
Hypothesis has fallen out of fashion.
Connectionism
Connectionism, or neuron-like computing, developed out of attempts to understand how the brain
works at the neural level, and in particular how we learn and remember.
A natural neural network. The Golgi method of staining brain tissue renders the neurons and their
interconnecting fibres visible in silhouette.
In one famous connectionist experiment (conducted at the University of California at San Diego and
published in 1986), David Rumelhart and James McClelland trained a network of 920 artificial neurons to
form the past tenses of English verbs. The network consisted of two layers of 460 neurons, with each of
the 460 neurons in the input layer connected to each of the 460 neurons in the output layer.
Root forms of verbs--such as "come", "look", and "sleep"--were presented (in an encoded form) to one
layer of neurons, the input layer. A supervisory computer program observed the difference between the
actual response at the layer of output neurons and the desired response--"came", say--and then
mechanically adjusted the connections throughout the network in such a way as to give the network a
slight push in the direction of the correct response (this procedure is explained in more detail in what
follows).
About 400 different verbs were presented one by one to the network and the connections were
adjusted after each presentation. This whole procedure was repeated about 200 times using the same
verbs. By this stage the network had learned its task satisfactorily and would correctly form the past
tense of unfamiliar verbs as well as of the original verbs. For example, when presented for the first time
with "guard" the network responded "guarded", with "weep" "wept", with "cling" "clung", and with
"drip" "dripped" (notice the double "p"). This is a striking example of learning involving generalisation.
(Sometimes, though, the peculiarities of English were too much for the network and it formed
"squawked" from "squat", "shipped" from "shape", and "membled" from "mail".)
The simple neural network shown below illustrates the central ideas of connectionism.
A pattern-classifier
Four of the network's five neurons are for input and the fifth--to which each of the others is connected--
is for output. Each of the neurons is either firing (1) or not firing (0). This network can learn to which of
two groups, A and B, various simple patterns belong. An external agent is able to "clamp" the four input
neurons into a desired pattern, for example 1100 (i.e. the two neurons to the left are firing and the
other two are quiescent). Each such pattern has been pre-assigned to one of two groups, A and B. When
a pattern is presented as input, the trained network will correctly classify it as belonging to group A or
group B, producing 1 as output if the pattern belongs to A, and 0 if it belongs to B (i.e. the output neuron
fires in the former case, does not fire in the latter).
Each connection leading to N, the output neuron, has a "weight". What is called the "total weighted
input" into N is calculated by adding up the weights of all the connections leading to N from neurons
that are firing. For example, suppose that only two of the input neurons, X and Y, are firing. Since the
weight of the connection from X to N is 1.5 and the weight of the connection from Y to N is 2, it follows
that the total weighted input to N is 3.5.
N has a "firing threshold" of 4. That is to say, if N's total weighted input exceeds or equals N's threshold,
then N fires; and if the total weighted input is less than the threshold, then N does not fire. So, for
example, N does not fire if the only input neurons to fire are X and Y, but N does fire if X, Y and Z all fire.
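The arithmetic of the output neuron can be checked with a short sketch. The weights for X and Y and the
threshold are the figures given in the text; the weight for Z is an assumed value (any weight of at least
0.5 would agree with the statement that N fires when X, Y and Z all fire), and the fourth input neuron is
left out because the text gives no details of it.

```python
WEIGHTS = {"X": 1.5, "Y": 2.0, "Z": 1.0}  # X and Y from the text; the value for Z is assumed
THRESHOLD = 4.0


def total_weighted_input(firing):
    """Sum the weights of the connections leading to N from the neurons that are firing."""
    return sum(WEIGHTS[name] for name in firing)


def fires(firing):
    """N fires if its total weighted input equals or exceeds its threshold."""
    return total_weighted_input(firing) >= THRESHOLD


print(total_weighted_input(["X", "Y"]), fires(["X", "Y"]))
print(total_weighted_input(["X", "Y", "Z"]), fires(["X", "Y", "Z"]))
```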
Training the network involves two steps. First, the external agent inputs a pattern and observes the
behaviour of N. Second, the agent adjusts the connection-weights in accordance with the rules:
(1) If the actual output is 0 and the desired output is 1, increase by a small fixed amount the weight of
each connection leading to N from neurons that are firing (thus making it more likely that N will fire next
time the network is given the same pattern).
(2) If the actual output is 1 and the desired output is 0, decrease by that same small amount the weight
of each connection leading to the output neuron from neurons that are firing (thus making it less likely
that the output neuron will fire the next time the network is given that pattern as input).
The external agent--actually a computer program--goes through this two-step procedure with each of
the patterns in the sample that the network is being trained to classify. The agent then repeats the
whole process a considerable number of times. During these many repetitions, a pattern of connection
weights is forged that enables the network to respond correctly to each of the patterns.
The striking thing is that the learning process is entirely mechanistic and requires no human intervention
or adjustment. The connection weights are increased or decreased mechanically by a constant amount
and the procedure remains the same no matter what task the network is learning.
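The two-step training procedure can be sketched as a short program. The sample patterns, their group
labels, the step size of 0.5 and the starting weights of zero are all invented for illustration; only the
two adjustment rules and the threshold of 4 come from the description above.

```python
def output(weights, pattern, threshold):
    """Return 1 if the total weighted input from firing neurons reaches the threshold."""
    total = sum(w for w, bit in zip(weights, pattern) if bit == 1)
    return 1 if total >= threshold else 0


def train(samples, weights, threshold=4.0, step=0.5, max_epochs=1000):
    """Apply rules (1) and (2) to every sample, repeating until no mistakes remain."""
    weights = list(weights)
    for _ in range(max_epochs):
        mistakes = 0
        for pattern, desired in samples:
            if output(weights, pattern, threshold) == desired:
                continue
            mistakes += 1
            # Rule (1) raises, and rule (2) lowers, the weights of the firing inputs.
            delta = step if desired == 1 else -step
            for i, bit in enumerate(pattern):
                if bit == 1:
                    weights[i] += delta
        if mistakes == 0:
            break
    return weights


# Hypothetical training sample: 1 means group A, 0 means group B.
SAMPLES = [
    ((1, 1, 0, 0), 1),
    ((1, 0, 1, 0), 1),
    ((0, 1, 1, 0), 0),
    ((0, 0, 1, 1), 0),
]

trained = train(SAMPLES, [0.0, 0.0, 0.0, 0.0])
print(trained)
print([output(trained, pattern, 4.0) for pattern, _ in SAMPLES])
```

On this small sample the weights settle after a few passes. A procedure of this kind is not guaranteed
to settle for every assignment of patterns to groups, which is why the sketch caps the number of passes.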
Another name for connectionism is "parallel distributed processing" or PDP. This terminology
emphasises two important features of neuron-like computing. (1) A large number of relatively simple
processors--the neurons--operate in parallel. (2) Neural networks store information in a distributed or
holistic fashion, with each individual connection participating in the storage of many different items of
information. The know-how that enables the past-tense network to form "wept" from "weep", for
example, is not stored in one specific location in the network but is spread through the entire pattern of
connection weights that was forged during training. The human brain also appears to store information
in a distributed fashion, and connectionist research is contributing to attempts to understand how the
brain does so.
Examples of work with neural networks include the following.
(1) The recognising of faces and other objects from visual data. A neural network designed by John
Hummel and Irving Biederman at the University of Minnesota can identify about ten objects from simple
line drawings. The network is able to recognise the objects--which include a mug and a frying pan--even
when they are drawn from various different angles. Networks investigated by Tomaso Poggio of MIT are
able to recognise (a) bent-wire shapes drawn from different angles, (b) faces photographed from
different angles and showing different expressions, and (c) objects from cartoon drawings with grey-scale
shading indicating depth and orientation. (An early commercially available neuron-like face recognition
system was WISARD, designed at the beginning of the 1980s by Igor Aleksander of Imperial College
London. WISARD was used for security applications.)
(2) Language processing. Neural networks are able to convert handwriting and typewritten material to
standardised text. The U.S. Internal Revenue Service has commissioned a neuron-like system that will
automatically read tax returns and correspondence. Neural networks also convert speech to printed text
and printed text to speech.
(3) Neural networks are being used increasingly for loan risk assessment, real estate valuation,
bankruptcy prediction, share price prediction, and other business applications.
(4) Medical applications include detecting lung nodules and heart arrhythmia, and predicting patients'
reactions to drugs.
History of connectionism
In 1933 the psychologist Edward Thorndike suggested that human learning consists in the strengthening
of some (then unknown) property of neurons, and in 1949 psychologist Donald Hebb suggested that it is
specifically a strengthening of the connections between neurons in the brain that accounts for learning.
In 1943, the neurophysiologist Warren McCulloch of the University of Illinois and the mathematician
Walter Pitts of the University of Chicago published an influential theory according to which each neuron
in the brain is a simple digital processor and the brain as a whole is a form of computing machine. As
McCulloch put it subsequently, "What we thought we were doing (and I think we succeeded fairly well)
was treating the brain as a Turing machine".
McCulloch and Pitts gave little discussion of learning and apparently did not envisage fabricating
networks of artificial neuron-like elements. This step was first taken, in concept, in 1947-48, when
Turing theorized that a network of initially randomly connected artificial neurons--a Turing Net--could
be "trained" (his word) to perform a given task by means of a process that renders certain neural
pathways effective and others ineffective. Turing foresaw the procedure--now in common use by
connectionists--of simulating the neurons and their interconnections within an ordinary digital computer
(just as engineers create virtual models of aircraft wings and skyscrapers).
However, Turing's own research on neural networks was carried out shortly before the first stored-
program electronic computers became available. It was not until 1954 (the year of Turing's death) that
Belmont Farley and Wesley Clark, working at MIT, succeeded in running the first computer simulations
of small neural networks. Farley and Clark were able to train networks containing at most 128 neurons
to recognise simple patterns (using essentially the training procedure described above). In addition, they
discovered that the random destruction of up to 10% of the neurons in a trained network does not
affect the network's performance at its task--a feature that is reminiscent of the brain's ability to
tolerate limited damage inflicted by surgery, an accident, or disease.
During the 1950s neuron-like computing was studied on both sides of the Atlantic. Important work was
done in England by W.K. Taylor at University College, London, J.T. Allanson at Birmingham University,
R.L. Beurle and A.M. Uttley at the Radar Research Establishment, Malvern; and in the U.S. by Frank
Rosenblatt, at the Cornell Aeronautical Laboratory.
In 1957 Rosenblatt began investigating artificial neural networks that he called "perceptrons". Although
perceptrons differed only in matters of detail from types of neural network investigated previously by
Farley and Clark in the U.S. and by Taylor, Uttley, Beurle and Allanson in Britain, Rosenblatt made major
contributions to the field, through his experimental investigations of the properties of perceptrons
(using computer simulations), and through his detailed mathematical analyses. Rosenblatt was a
charismatic communicator and soon in the U.S. there were many research groups studying perceptrons.
Rosenblatt and his followers called their approach connectionist to emphasise the importance in
learning of the creation and modification of connections between neurons, and modern researchers in
neuron-like computing have adopted this term.
Rosenblatt distinguished between simple perceptrons with two layers of neurons--the networks
described earlier for forming past tenses and classifying patterns both fall into this category--and multi-
layer perceptrons with three or more layers.
A three-layer perceptron. Between the input layer (bottom) and the output layer (top) lies a so-called
'hidden layer' of neurons.
One of Rosenblatt's important contributions was to generalise the type of training procedure that Farley
and Clark had used, which applied only to two-layer networks, so that the procedure can be applied to
multi-layer networks. Rosenblatt used the phrase "back-propagating error correction" to describe his
method. The method, and the term "back-propagation", are now in everyday use in neuron-like
computing (with improvements and extensions due to Bernard Widrow and M.E. Hoff, Paul Werbos,
David Rumelhart, Geoffrey Hinton, Ronald Williams, and others).
During the 1950s and 1960s, the top-down and bottom-up approaches to AI both flourished, until in
1969 Marvin Minsky and Seymour Papert of MIT, who were both committed to symbolic AI, published a
critique of Rosenblatt's work. They proved mathematically that there are a variety of tasks that simple
two-layer perceptrons cannot accomplish. Some examples they gave are:
(1) No two-layer perceptron can correctly indicate at its output neuron (or neurons) whether there are
an even or an odd number of neurons firing in its input layer.
(2) No two-layer perceptron can produce at its output layer the exclusive disjunction of two binary
inputs X and Y (the so-called "XOR problem").
The exclusive disjunction of two binary inputs X and Y is defined by this table:
X  Y  X XOR Y
0  0  0
0  1  1
1  0  1
1  1  0
It is important to realise that the mathematical results obtained by Minsky and Papert about two-layer
perceptrons, while interesting and technically sophisticated, showed nothing about the abilities of
perceptrons in general, since multi-layer perceptrons are able to carry out tasks that no two-layer
perceptron can accomplish. Indeed, the "XOR problem" illustrates this fact: a simple three-layer
perceptron can form the exclusive disjunction of X and Y (as Minsky and Papert knew). Nevertheless,
Minsky and Papert conjectured--without any real evidence--that the multi-layer approach is "sterile"
(their word). Somehow their analysis of the limitations of two-layer perceptrons convinced the AI
community--and the bodies that fund it--of the fruitlessness of pursuing work with neural networks, and
the majority of researchers turned away from the approach (although a small number remained loyal).
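That a three-layer network of threshold neurons can form the exclusive disjunction is easy to verify
with a sketch. The particular weights and thresholds below are one hand-chosen assignment, picked for
this illustration and not taken from Minsky and Papert or from Rosenblatt: one hidden neuron computes
"X or Y", the other computes "X and Y", and the output neuron fires when the first fires and the second
does not.

```python
def unit(inputs, weights, threshold):
    """A threshold neuron: fires (1) if its weighted input reaches the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0


def xor_net(x, y):
    """Three layers: two inputs, two hidden neurons, one output neuron."""
    h_or = unit((x, y), (1, 1), 1)    # fires if at least one input fires
    h_and = unit((x, y), (1, 1), 2)   # fires only if both inputs fire
    return unit((h_or, h_and), (1, -1), 1)  # "or" but not "and"


for x in (0, 1):
    for y in (0, 1):
        print(x, y, xor_net(x, y))
```

The negative weight on the connection from the second hidden neuron is what no single layer of
connections can supply on its own: it lets the network veto the case in which both inputs fire.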
This hiatus in research into neuron-like computing persisted for well over a decade before a renaissance
occurred. Causes of the renaissance included (1) a widespread perception that symbolic AI was
stagnating, (2) the possibility of simulating larger and more complex neural networks, owing to the
improvements that had occurred in the speed and memory of digital computers, and (3) results
published in the early and mid 1980s by McClelland, Rumelhart and their research group (for example,
the past-tenses experiment) which were widely viewed as a powerful demonstration of the potential of
neural networks. There followed an explosion of interest in neuron-like computing, and symbolic AI
moved into the back seat.
Nouvelle AI
The approach to AI now known as "nouvelle AI" was pioneered at the MIT AI Laboratory by the
Australian Rodney Brooks, during the latter half of the 1980s. Nouvelle AI distances itself from
traditional characterisations of AI, which emphasise human-level performance. One aim of nouvelle AI is
the relatively modest one of producing systems that display approximately the same level of intelligence
as insects.
Practitioners of nouvelle AI reject micro-world AI, emphasising that true intelligence involves the ability
to function in a real-world environment. A central idea of nouvelle AI is that the basic building blocks of
intelligence are very simple behaviours, such as avoiding an object, and moving forward. More complex
behaviours "emerge" from the interaction of these simple behaviours. For example, a micro-robot
whose simple behaviours are (1) collision-avoidance and (2) motion toward a moving object will appear
to chase the moving object while hanging back from it a little.
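The chasing example can be made concrete with a toy one-dimensional simulation. Everything in it is an assumption made for illustration (the function names, the speeds, the safe distance, and the rule that avoidance overrides approach); it is not a description of any robot built by Brooks. Neither behaviour mentions "chasing", yet the gap between robot and object settles into a small band.

```python
def approach(robot, target, speed=1.0):
    """Behaviour 1: move toward the moving object."""
    if target > robot:
        return speed
    if target < robot:
        return -speed
    return 0.0


def avoid(robot, target, safe_distance=2.0, speed=1.0):
    """Behaviour 2: back away when too close; otherwise express no preference."""
    gap = target - robot
    if abs(gap) < safe_distance:
        return -speed if gap >= 0 else speed
    return None


def simulate(steps=30, target_speed=0.5):
    """Return the robot-to-object gap after each time step."""
    robot, target = 0.0, 10.0
    gaps = []
    for _ in range(steps):
        target += target_speed            # the object keeps moving away
        command = avoid(robot, target)    # avoidance takes priority...
        if command is None:
            command = approach(robot, target)  # ...otherwise approach
        robot += command
        gaps.append(target - robot)
    return gaps


gaps = simulate()
print(gaps[0], gaps[-5:])
```

With these settings the robot closes the initial gap and then oscillates just behind the object, which an observer would naturally describe as chasing while hanging back.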
Brooks focussed in his initial work on building robots that behave somewhat like simplified insects (and
in doing so he deliberately turned away from traditional characterisations of AI such as the one given at
the beginning of this article). Examples of his insect-like mobile robots are Allen (after Allen Newell) and
Herbert (after Herbert Simon). Allen has a ring of twelve ultrasonic sonars as its primary sensors and
three independent behaviour-producing modules. The lowest-level module makes the robot avoid both
stationary and moving objects. With only this module activated, Allen sits in the middle of a room until
approached and then scurries away, avoiding obstacles as it goes. The second module makes the robot
wander about at random when not avoiding objects, and the third pushes the robot to look for distant
places with its sensors and to move towards them. (The second and third modules are in tension--just as
our overall behaviour may sometimes be the product of conflicting drives, such as the drive to seek
safety and the drive to avoid boredom.)
Herbert has thirty infrared sensors for avoiding local obstacles, a laser system that collects three-
dimensional depth data over a distance of about twelve feet in front of the robot, and a hand equipped
with a number of simple sensors. Herbert's real-world environment consists of the busy offices and
work-spaces of the AI lab. The robot searches on desks and tables in the lab for empty soda cans, which
it picks up and carries away. Herbert's seemingly coordinated and goal-directed behaviour emerges from
the interactions of about fifteen simple behaviours. Each simple behaviour is produced by a separate
module, and each of these modules functions without reference to the others. (Unfortunately, Herbert's
mean time from power-on to hardware failure is no more than fifteen minutes, owing principally to the
effects of vibration.)
Other robots produced by Brooks and his group include Genghis, a six-legged robot that walks over
rough terrain and will obediently follow a human, and Squirt, which hides in dark corners until a noise
beckons it out, when it will begin to follow the source of the noise, moving with what appears to be
circumspection from dark spot to dark spot. Other experiments involve tiny "gnat" robots. Speaking of
potential applications, Brooks describes possible colonies of gnat robots designed to inhabit the surface
of TV and computer screens and keep them clean.
Brooks admits that even his more complicated artificial insects come nowhere near the complexity of
real insects. One question that must be faced by those working in situated AI is whether insect-level
behaviour is a reasonable initial goal. John von Neumann, the computer pioneer and founder, along with
Turing, of the research area now known as "artificial life", thought otherwise. In a letter to the
cyberneticist Norbert Wiener in 1946, von Neumann argued that automata theorists who select the
human nervous system as their model are unrealistically picking "the most complicated object under the
sun", and that there is little advantage in selecting instead the ant, since any nervous system at all
exhibits "exceptional complexity". Von Neumann believed that "the decisive break" is "more likely to
come in another theater" and recommended attention to "organisms of the virus or bacteriophage
type" which, he pointed out, are "self-reproductive and ... are able to orient themselves in an
unorganised milieu, to move towards food, to appropriate it and to use it". This starting point would, as
he put it, provide "a degree of complexity which is not necessarily beyond human endurance".
The products of nouvelle AI are quite different from those of symbolic AI, for example Shakey and
FREDDY. These contained an internal model (or "representation") of their micro-worlds, consisting of
symbolic descriptions. This structure of symbols had to be updated continuously as the robot moved or
the world changed. The robots' planning programs would juggle with this huge structure of symbols
until descriptions were derived of actions that would transform the current situation into the desired
situation. All this computation required a large amount of processing time. This is why Shakey
performed its tasks with extreme slowness, even though careful design of the robot's environment
minimised the complexity of the internal model. In contrast, Brooks' robots contain no internal model of
the world. Herbert, for example, continuously discards the information that is received from its sensors,
sensory information persisting in the robot's memory for no more than two seconds.
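The contrast with a continuously maintained world model can be sketched as a buffer that simply forgets. The class below is a hypothetical illustration (its name, interface, and the use of explicit timestamps are assumptions, and it is not Herbert's actual software): readings older than two seconds are dropped, so there is never a large structure to keep up to date.

```python
from collections import deque


class ShortTermSensorBuffer:
    """Keeps only sensor readings from the last `max_age` seconds."""

    def __init__(self, max_age=2.0):
        self.max_age = max_age
        self._readings = deque()  # (timestamp, value) pairs, oldest first

    def add(self, timestamp, value):
        """Store a new reading and discard any that have become too old."""
        self._readings.append((timestamp, value))
        self._expire(timestamp)

    def _expire(self, now):
        while self._readings and now - self._readings[0][0] > self.max_age:
            self._readings.popleft()

    def current(self, now):
        """Return the values still remembered at time `now`."""
        self._expire(now)
        return [value for _, value in self._readings]


buf = ShortTermSensorBuffer()
buf.add(0.0, "wall ahead")
buf.add(1.0, "can on desk")
buf.add(2.5, "clear path")
print(buf.current(2.5))
```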
AI researchers call the problem of updating, searching, and otherwise manipulating, a large structure of
symbols in realistic amounts of time the frame problem. The frame problem is endemic to symbolic AI.
Some critics of symbolic AI believe that the frame problem is largely insolvable and so maintain that the
symbolic approach will not "scale up" to yield genuinely intelligent systems. It is possible that CYC, for
example, will succumb to the frame problem long before the system achieves human levels of
knowledge.
Nouvelle AI sidesteps the frame problem. Nouvelle systems do not contain a complicated symbolic
model of their environment. Information is left "out in the world" until such time as the system needs it.
A nouvelle system refers continuously to its sensors rather than to an internal model of the world: it
"reads off" the external world whatever information it needs, at precisely the time it needs it. As Brooks
puts it, the world is its own best model--always exactly up to date and complete in every detail.
Situated AI
Traditional AI has by and large attempted to build disembodied intelligences whose only way of
interacting with the world has been via keyboard and screen or printer. Nouvelle AI attempts to build
embodied intelligences situated in the real world. Brooks quotes approvingly from the brief sketches
that Turing gave in 1948 and 1950 of the "situated" approach. Turing wrote of equipping a machine
"with the best sense organs that money can buy" and teaching it "to understand and speak English" by a
process that would "follow the normal teaching of a child". Turing contrasted this with the approach to
AI that focuses on abstract activities, such as the playing of chess. He advocated that both approaches
be pursued, but until now relatively little attention has been paid to the situated approach.
The situated approach is anticipated in the writings of the philosopher Bert Dreyfus, of the University of
California at Berkeley. Dreyfus is probably the best-known critic of symbolic AI. He has been arguing
against the Physical Symbol System Hypothesis since the early 1960s, urging the inadequacy of the view
that everything relevant to intelligent behaviour can be captured by means of structures (e.g. lists) of
symbolic descriptions. At the same time he has advocated an alternative view of intelligence, which
stresses the need for an intelligent agent to be situated in the world, and he has emphasised the role of
the body in intelligent behaviour and the importance of such basic activities as moving about in the
world and dealing with obstacles. Once reviled by admirers of AI, Dreyfus is now regarded as a prophet
of the situated approach.
Cog
Brooks' own recent work has taken the opposite direction to that proposed by von Neumann in the
quotations given earlier. Brooks is pursuing AI's traditional goal of human-level intelligence, and with
Lynn Andrea Stein, he has built a humanoid robot known as Cog. Cog has four microphone-type sound
sensors and is provided with saccading foveated vision by cameras mounted on its "head". Cog's
(legless) torso is capable of leaning and twisting. Strain gauges on the spine give Cog information about
posture. Heat and current sensors on the robot's motors provide feedback concerning exertion. The arm
and manipulating hand are equipped with strain gauges and heat and current sensors. Electrically-
conducting rubber membranes on the hand and arm provide tactile information.
Brooks believes that Cog will learn to correlate noises with visual events and to extract human voices
from background noise; and that in the long run Cog will, through its interactions with its environment
and with human beings, learn for itself some of the wealth of common sense knowledge that Lenat and
his team are patiently hand-coding into CYC.
Critics of nouvelle AI emphasise that so far the approach has failed to produce a system exhibiting
anything like the complexity of behaviour found in real insects. Suggestions by some advocates of
nouvelle AI that it is only a short step to systems which are conscious and which possess language seem
entirely premature.
Chess
Some of AI's most conspicuous successes have been in chess, its oldest area of research.
In 1945 Turing predicted that computers would one day play "very good chess", an opinion echoed in
1949 by Claude Shannon of Bell Telephone Laboratories, another early theoretician of computer chess.
By 1958 Simon and Newell were predicting that within ten years the world chess champion would be a
computer, unless barred by the rules. Just under 40 years later, on May 11 1997, in midtown
Manhattan, IBM's Deep Blue beat the reigning world champion, Garry Kasparov, in a six-game match.
Critics question the worth of research into computer chess. MIT linguist Noam Chomsky has said that a
computer program's beating a grandmaster at chess is about as interesting as a bulldozer's "winning" an
Olympic weight-lifting competition. Deep Blue is indeed a bulldozer of sorts--its 256 parallel processors
enable it to examine 200 million possible moves per second and to look ahead as many as fourteen
turns of play.
The huge improvement in machine chess since Turing's day owes much more to advances in hardware
engineering than to advances in AI. Massive increases in CPU speed and memory have meant that each
generation of chess machine has been able to examine increasingly more possible moves. Turing's
expectation was that chess-programming would contribute to the study of how human beings think. In
fact, little or nothing about human thought processes has been learned from the series of projects that
culminated in Deep Blue.
Is Strong AI Possible?
The ongoing success of applied Artificial Intelligence and of cognitive simulation seems assured.
However, strong AI, which aims to duplicate human intellectual abilities, remains controversial. The
reputation of this area of research has been damaged over the years by exaggerated claims of success
that have appeared both in the popular media and in the professional journals. At the present time,
even an embodied system displaying the overall intelligence of a cockroach is proving elusive, let alone a
system rivalling a human being.
The difficulty of "scaling up" AI's so far relatively modest achievements cannot be overstated. Five
decades of research in symbolic AI have failed to produce any firm evidence that a symbol-system can
manifest human levels of general intelligence. Critics of nouvelle AI regard as mystical the view that
high-level behaviours involving language-understanding, planning, and reasoning will somehow
"emerge" from the interaction of basic behaviours like obstacle avoidance, gaze control and object
manipulation. Connectionists have been unable to construct working models of the nervous systems of
even the simplest living things. Caenorhabditis elegans, a much-studied worm, has approximately 300
neurons, whose pattern of interconnections is perfectly known. Yet connectionist models have failed to
mimic the worm's simple nervous system. The "neurons" of connectionist theory are gross
oversimplifications of the real thing.
However, this lack of substantial progress may simply be testimony to the difficulty of strong AI, not to
its impossibility.
Let me turn to the very idea of strong artificial intelligence. Can a computer possibly be intelligent, think
and understand? Noam Chomsky suggests that debating this question is pointless, for it is a question of
decision, not fact: decision as to whether to adopt a certain extension of common usage. There is,
Chomsky claims, no factual question as to whether any such decision is right or wrong--just as there is
no question as to whether our decision to say that aeroplanes fly is right, or our decision not to say that
ships swim is wrong. However, Chomsky is oversimplifying matters. Of course we could, if we wished,
simply decide to describe bulldozers, for instance, as things that fly. But obviously it would be misleading
to do so, since bulldozers are not appropriately similar to the other things that we describe as flying. The
important questions are: could it ever be appropriate to say that computers are intelligent, think, and
understand, and if so, what conditions must a computer satisfy in order to be so described?
Some authors offer the Turing test as a definition of intelligence: a computer is intelligent if and only if
the test fails to distinguish it from a human being. However, Turing himself in fact pointed out that his
test cannot provide a definition of intelligence. It is possible, he said, that a computer which ought to be
described as intelligent might nevertheless fail the test because it is not capable of successfully imitating
a human being. For example, why should an intelligent robot designed to oversee mining on the moon
necessarily be able to pass itself off in conversation as a human being? If an intelligent entity can fail the
test, then the test cannot function as a definition of intelligence.
It is even questionable whether a computer's passing the test would show that the computer is
intelligent. In 1956 Claude Shannon and John McCarthy raised the objection to the test that it is possible
in principle to design a program containing a complete set of "canned" responses to all the questions
that an interrogator could possibly ask during the fixed time-span of the test. Like Parry, this machine
would produce answers to the interviewer's questions by looking up appropriate responses in a giant
table. This objection--which has in recent years been revived by Ned Block, Stephen White, and myself--
seems to show that in principle a system with no intelligence at all could pass the Turing test.
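The structure of such a machine can be sketched in a few lines. The table entries, the fallback reply, and the function name below are invented for illustration; a table of the kind the objection envisages would have to contain an entry for every possible sequence of questions, which this toy plainly does not. The point is only that the reply is retrieved, not worked out.

```python
# Toy table: each key is the whole sequence of questions asked so far,
# each value is the canned reply to the most recent question.
CANNED_RESPONSES = {
    ("Hello",): "Hello. How are you?",
    ("Hello", "What is 2 + 2?"): "Four, I think.",
}


def reply(questions_so_far, table=CANNED_RESPONSES):
    """Look up a reply; no reasoning of any kind takes place."""
    # The fallback is an assumption: a complete table would never need one.
    return table.get(tuple(questions_so_far), "I'd rather not say.")


print(reply(["Hello"]))
print(reply(["Hello", "What is 2 + 2?"]))
```

Keying the table on the whole conversation so far, rather than on the latest question alone, lets the same question receive different answers at different points in the interview.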
In fact AI has no real definition of intelligence to offer, not even in the sub-human case. Rats are
intelligent, but what exactly must a research team achieve in order for it to be the case that the team
has created an artefact as intelligent as a rat?
In the absence of a reasonably precise criterion for when an artificial system counts as intelligent, there
is no way of telling whether a research program that aims at producing intelligent artefacts has
succeeded or failed. One result of AI's failure to produce a satisfactory criterion of when a system counts
as intelligent is that whenever AI achieves one of its goals--for example, a program that can summarise
newspaper articles, or beat the world chess champion--critics are able to say "That's not intelligence!"
(even critics who have previously maintained that no computer could possibly do the thing in question).
Marvin Minsky's response to the problem of defining intelligence is to maintain that "intelligence" is
simply our name for whichever problem-solving mental processes we do not yet understand. He likens
intelligence to the concept "unexplored regions of Africa": it disappears as soon as we discover it. Earlier
Turing made a similar point, saying "One might be tempted to define thinking as consisting of those
mental processes that we don't understand". However, the important problem remains of giving a clear
criterion of what would count as success in strong artificial intelligence research.
The Chinese Room Objection
One influential objection to strong AI, the Chinese room objection, originates with the philosopher John
Searle. Searle claims to be able to prove that no computer program--not even a computer program from
the far-distant future--could possibly think or understand.
Searle's alleged proof is based on the fact that every operation that a computer is able to carry out can
equally well be performed by a human being working with paper and pencil. As Turing put the point, the
very function of an electronic computer is to carry out any process that could be carried out by a human
being working with paper and pencil in a "disciplined but unintelligent manner". For example, one of a
computer's basic operations is to compare the binary numbers in two storage locations and to write 1 in
some further storage location if the numbers are the same. A human can perfectly well do this, using
pieces of paper as the storage locations. To believe that strong AI is possible is to believe that
intelligence can "emerge" from long chains of basic operations each of which is as simple as this one.
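The comparison operation just described can be written out as a sketch. The dictionary standing in for the storage locations, the location names, and the function name are all assumptions made for illustration; the text does not say what happens when the numbers differ, so in that case this sketch leaves the destination untouched.

```python
def compare_and_write(store, first, second, destination):
    """Write 1 at `destination` if the two locations hold the same number."""
    if store[first] == store[second]:
        store[destination] = 1
    # Otherwise do nothing (an assumption: the text specifies only the equal case).


# Storage locations, which a human could equally keep on pieces of paper.
store = {"A": "1011", "B": "1011", "C": "0110", "R1": 0, "R2": 0}

compare_and_write(store, "A", "B", "R1")  # same numbers
compare_and_write(store, "A", "C", "R2")  # different numbers
print(store["R1"], store["R2"])
```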
Given a list of the instructions making up a computer program, a human being can in principle obey each
instruction using paper and pencil. This is known as "handworking" a program. Searle's Chinese room
objection is as follows. Imagine that, at some stage in the future, AI researchers in, say, China announce
a program that really does think and understand, or so they claim. Imagine further that in a Turing test
(conducted in Chinese) the program cannot be distinguished from human beings. Searle maintains that,
no matter how good the performance of the program, and no matter what algorithms and data-
structures are employed in the program, it cannot in fact think and understand. This can be proved, he
says, by considering an imaginary human being, who speaks no Chinese, handworking the program in a
closed room. (Searle extends the argument to connectionist AI by considering not a room containing a
single person but a gymnasium containing a large group of people, each one of whom simulates a single
artificial neuron.) The interrogator's questions, expressed in the form of Chinese ideograms, enter the
room through an input slot. The human in the room--Clerk, let's say--follows the instructions in the
program and carries out exactly the same series of computations that an electronic computer running
the program would carry out. These computations eventually produce strings of binary symbols that the
program instructs Clerk to correlate, via a table, with patterns of squiggles and squoggles (actually
Chinese ideograms). Clerk finally pushes copies of the ideograms through an output slot. As far as the
waiting interrogator is concerned, the ideograms form an intelligent response to the question that was
posed. But as far as Clerk is concerned, the output is just squiggles and squoggles--hard won, but
completely meaningless. Clerk does not even know that the inputs and outputs are linguistic
expressions. Yet Clerk has done everything that a computer running the program would do. It surely
follows, says Searle, that since Clerk does not understand the input and the output after working
through the program, then nor does an electronic computer.
Few accept Searle's objection, but there is little agreement as to exactly what is wrong with it. My own
response to Searle, known as the Logical Reply to the Chinese room objection, is this. The fact that Clerk
says "No" when asked whether he understands the Chinese input and output by no means shows that
the wider system of which Clerk is a part does not understand Chinese. The wider system consists of
Clerk, the program, quantities of data (such as the table correlating binary code with ideograms), the
input and output slots, the paper memory store, and so forth. Clerk is just a cog in a wider machine.
Searle's claim is that the statement "The system as a whole does not understand" follows logically from
the statement "Clerk does not understand". The logical reply holds that this claim is fallacious, for just
the same reason that it would be fallacious to claim that the statement "The organisation of which Clerk
is a part has no taxable assets in Japan" follows logically from the statement "Clerk has no taxable assets
in Japan". If the logical reply is correct then Searle's objection to strong AI proves nothing.
For More Information About AI
Read my book:
Jack Copeland
Oxford UK and Cambridge, Mass.: Basil Blackwell, September 1993, reprinted 1994, 1995, 1997 (twice),
1998, 1999 (xii, 320). Translated into Hebrew (1995), Spanish (1996). Second edition forthcoming in
2001.