Chapter 1
INTRODUCTION
We call ourselves Homo sapiens—man the wise—because our intelligence is so important to us.
For thousands of years, we have tried to understand how we think; that is, how a mere handful of
matter can perceive, understand, predict, and manipulate a world far larger and more complicated
than itself. The field of artificial intelligence, or AI, goes further still: it attempts not just to
understand but also to build intelligent entities.
AI is one of the newest fields in science and engineering. Work started in earnest soon after
World War II, and the name itself was coined in 1956. Along with molecular biology, AI is
regularly cited as the “field I would most like to be in” by scientists in other disciplines. A
student in physics might reasonably feel that all the good ideas have already been taken by
Galileo, Newton, Einstein, and the rest. AI, on the other hand, still has openings for several
full-time Einsteins and Edisons.
AI currently encompasses a huge variety of subfields, ranging from the general (learning and
perception) to the specific, such as playing chess, proving mathematical theorems, writing
poetry, driving a car on a crowded street, and diagnosing diseases. AI is relevant to any
intellectual task; it is truly a universal field.
We have claimed that AI is exciting, but we have not said what it is. In Figure 1.1 we see eight
definitions of AI, laid out along two dimensions. The definitions on top are concerned with
thought processes and reasoning, whereas the ones on the bottom address behavior. The
definitions on the left measure success in terms of fidelity to human performance, whereas the
ones on the right measure against an ideal performance measure, called rationality. A system is
rational if it does the “right thing,” given what it knows.
Historically, all four approaches to AI have been followed, each by different people with
different methods. A human-centered approach must be in part an empirical science, involving
observations and hypotheses about human behavior. A rationalist approach involves a
combination of mathematics and engineering. The various groups have both disparaged and
helped each other. Let us look at the four approaches in more detail.
Thinking Humanly
“The exciting new effort to make computers think ... machines with minds, in the full and literal sense.” (Haugeland, 1985)
“[The automation of] activities that we associate with human thinking, activities such as decision-making, problem solving, learning ...” (Bellman, 1978)

Thinking Rationally
“The study of mental faculties through the use of computational models.” (Charniak and McDermott, 1985)
“The study of the computations that make it possible to perceive, reason, and act.” (Winston, 1992)

Acting Humanly
“The art of creating machines that perform functions that require intelligence when performed by people.” (Kurzweil, 1990)
“The study of how to make computers do things at which, at the moment, people are better.” (Rich and Knight, 1991)

Acting Rationally
“Computational Intelligence is the study of the design of intelligent agents.” (Poole et al., 1998)
“AI ... is concerned with intelligent behavior in artifacts.” (Nilsson, 1998)

Figure 1.1 Some definitions of artificial intelligence, organized into four categories.
The Turing Test, proposed by Alan Turing (1950), was designed to provide a satisfactory
operational definition of intelligence. A computer passes the test if a human interrogator, after
posing some written questions, cannot tell whether the written responses come from a person or
from a computer. For now, we note that programming a computer to pass a rigorously applied
test provides plenty to work on. The computer would need to possess the following capabilities: natural language processing to communicate successfully; knowledge representation to store what it knows or hears; automated reasoning to use the stored information to answer questions and to draw new conclusions; and machine learning to adapt to new circumstances and to detect and extrapolate patterns.
Turing’s test deliberately avoided direct physical interaction between the interrogator and the
computer, because physical simulation of a person is unnecessary for intelligence. However, the
so-called total Turing Test includes a video signal so that the interrogator can test the subject’s
perceptual abilities, as well as the opportunity for the interrogator to pass physical objects
“through the hatch.” To pass the total Turing Test, the computer will also need computer vision to perceive objects and robotics to manipulate objects and move about.
These six disciplines compose most of AI, and Turing deserves credit for designing a test that
remains relevant 60 years later. Yet AI researchers have devoted little effort to passing the
Turing Test, believing that it is more important to study the underlying principles of intelligence
than to duplicate an exemplar. The quest for “artificial flight” succeeded when the Wright
brothers and others stopped imitating birds and started using wind tunnels and learning about
aerodynamics. Aeronautical engineering texts do not define the goal of their field as making
“machines that fly so exactly like pigeons that they can fool even other pigeons.”
If we are going to say that a given program thinks like a human, we must have some way of
determining how humans think. We need to get inside the actual workings of human minds.
There are three ways to do this: through introspection—trying to catch our own thoughts as they
go by; through psychological experiments—observing a person in action; and through brain
imaging—observing the brain in action. Once we have a sufficiently precise theory of the mind,
it becomes possible to express the theory as a computer program. If the program’s input–output
behavior matches corresponding human behavior, that is evidence that some of the program’s
mechanisms could also be operating in humans. For example, Allen Newell and Herbert Simon,
who developed GPS, the “General Problem Solver” (Newell and Simon, 1961), were not content
merely to have their program solve problems correctly. They were more concerned with
comparing the trace of its reasoning steps to traces of human subjects solving the same problems.
The interdisciplinary field of cognitive science brings together computer models from AI and
experimental techniques from psychology to construct precise and testable theories of the human
mind.
Cognitive science is a fascinating field in itself, worthy of several textbooks and at least one
encyclopedia (Wilson and Keil, 1999). We will occasionally comment on similarities or
differences between AI techniques and human cognition. Real cognitive science, however, is
necessarily based on experimental investigation of actual humans or animals. We will leave that
for other books, as we assume the reader has only a computer for experimentation.
In the early days of AI there was often confusion between the approaches: an author would argue
that an algorithm performs well on a task and that it is therefore a good model of human
performance, or vice versa. Modern authors separate the two kinds of claims; this distinction has
allowed both AI and cognitive science to develop more rapidly. The two fields continue to
fertilize each other, most notably in computer vision, which incorporates neurophysiological
evidence into computational models.
The Greek philosopher Aristotle was one of the first to attempt to codify “right thinking,” that is,
irrefutable reasoning processes. His syllogisms provided patterns for argument structures
that always yielded correct conclusions when given correct premises—for
example, “Socrates is a man; all men are mortal; therefore, Socrates is mortal.” These laws of
thought were supposed to govern the operation of the mind; their study initiated the field called
logic.
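As a purely illustrative sketch of how such a pattern can be mechanized, the Socrates syllogism can be written as one forward-chaining rule applied to a small set of facts; the (category, individual) encoding below is an invented toy representation, not a standard logic library.

# Sketch: the Socrates syllogism as forward chaining over "all X are Y" rules.
# The (category, individual) pair encoding is an invented toy representation.
facts = {("man", "Socrates")}          # "Socrates is a man"
rules = [("man", "mortal")]            # "all men are mortal"

def forward_chain(facts, rules):
    """Apply rules of the form 'all X are Y' until no new facts can be added."""
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            for category, individual in list(facts):
                if category == premise and (conclusion, individual) not in facts:
                    facts.add((conclusion, individual))
                    changed = True
    return facts

print(forward_chain(facts, rules))     # now contains ("mortal", "Socrates")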
Logicians in the 19th century developed a precise notation for statements about all kinds of
objects in the world and the relations among them. (Contrast this with ordinary arithmetic
notation, which provides only for statements about numbers.) By 1965, programs existed that
could, in principle, solve any solvable problem described in logical notation. (If no
solution exists, the program might loop forever.) The so-called logicist tradition within artificial
intelligence hopes to build on such programs to create intelligent systems.
There are two main obstacles to this approach. First, it is not easy to take informal knowledge
and state it in the formal terms required by logical notation, particularly when the knowledge is
less than 100% certain. Second, there is a big difference between solving a problem “in
principle” and solving it in practice. Even problems with just a few hundred facts can exhaust the
computational resources of any computer unless it has some guidance as to which reasoning
steps to try first. Although both of these obstacles apply to any attempt to build computational
reasoning systems, they appeared first in the logicist tradition.
An agent is just something that acts (agent comes from the Latin agere, to do). Of course, all
computer programs do something, but computer agents are expected to do more: operate
autonomously, perceive their environment, persist over a prolonged time period, adapt to change,
and create and pursue goals. A rational agent is one that acts so as to achieve the best outcome
or, when there is uncertainty, the best expected outcome.
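To make “best expected outcome” concrete, here is a minimal sketch of choosing an action by maximum expected utility; the actions, outcome probabilities, and utility numbers below are invented for illustration.

# Sketch: a rational agent picks the action with the highest expected utility.
# All numbers (outcome probabilities and utilities) are made up for this example.
actions = {
    "take_umbrella":  [(0.7, 8), (0.3, 8)],    # (probability, utility): fine either way
    "leave_umbrella": [(0.7, 10), (0.3, 0)],   # great if it stays dry, bad if it rains
}

def expected_utility(outcomes):
    return sum(p * u for p, u in outcomes)

best = max(actions, key=lambda a: expected_utility(actions[a]))
print(best, expected_utility(actions[best]))   # take_umbrella 8.0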
In the “laws of thought” approach to AI, the emphasis was on correct inferences. Making correct
inferences is sometimes part of being a rational agent, because one way to act rationally is to
reason logically to the conclusion that a given action will achieve one’s goals and then to act on
that conclusion. On the other hand, correct inference is not all of rationality; in some situations,
there is no provably correct thing to do, but something must still be done. There are also ways of
acting rationally that cannot be said to involve inference. For example, recoiling from a hot stove
is a reflex action that is usually more successful than a slower action taken after careful
deliberation.
All the skills needed for the Turing Test also allow an agent to act rationally. Knowledge
representation and reasoning enable agents to reach good decisions. We need to be able to
generate comprehensible sentences in natural language to get by in a complex society. We need
learning not only for erudition, but also because it improves our ability to generate effective
behavior.
1.2 THE FOUNDATIONS OF ARTIFICIAL INTELLIGENCE
An intelligent system may draw on contributions from one or more foundational disciplines, including philosophy, mathematics, economics, neuroscience, psychology, computer engineering, control theory and cybernetics, and linguistics.
The first work that is now generally recognized as AI was done by Warren McCulloch and
Walter Pitts (1943). They drew on three sources: knowledge of the basic physiology and function
of neurons in the brain; a formal analysis of propositional logic due to Russell and Whitehead;
and Turing’s theory of computation. They proposed a model of artificial neurons in which each
neuron is characterized as being “on” or “off,” with a switch to “on” occurring in response to
stimulation by a sufficient number of neighboring neurons. The state of a neuron was conceived
of as “factually equivalent to a proposition which proposed its adequate stimulus.” They showed,
for example, that any computable function could be computed by some network of connected
neurons, and that all the logical connectives (and, or, not, etc.) could be implemented by simple
net structures. McCulloch and Pitts also suggested that suitably defined networks could learn.
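A minimal sketch of this idea follows: a unit switches “on” when the weighted sum of its inputs reaches a threshold, and with suitably chosen weights and thresholds such units realize the basic connectives. The particular weights and thresholds below are hand-picked for illustration, not taken from the 1943 paper.

# Sketch of McCulloch-Pitts style threshold units. Weights and thresholds are
# hand-picked illustrative choices.
def unit(inputs, weights, threshold):
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= threshold else 0

def AND(a, b): return unit([a, b], [1, 1], threshold=2)
def OR(a, b):  return unit([a, b], [1, 1], threshold=1)
def NOT(a):    return unit([a], [-1], threshold=0)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", AND(a, b), "OR:", OR(a, b), "NOT a:", NOT(a))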
Donald Hebb (1949) demonstrated a simple updating rule for modifying the connection strengths
between neurons. His rule, now called Hebbian learning, remains an influential model to this
day.
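Hebb’s rule is often paraphrased as “neurons that fire together wire together”; a minimal sketch of the update, with an arbitrarily chosen learning rate and invented activity values, is shown below.

# Sketch of Hebbian learning: a connection is strengthened in proportion to how
# often the two neurons it joins are active together. Learning rate is arbitrary.
def hebbian_update(weight, pre, post, learning_rate=0.1):
    return weight + learning_rate * pre * post

w = 0.0
for pre, post in [(1, 1), (1, 1), (0, 1), (1, 0)]:    # invented activity pairs
    w = hebbian_update(w, pre, post)
print(w)    # 0.2: strengthened only on the two trials where both neurons fired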
Two undergraduate students at Harvard, Marvin Minsky and Dean Edmonds, built the first
neural network computer in 1950. The SNARC, as it was called, used 3000 vacuum tubes and a
surplus automatic pilot mechanism from a B-24 bomber to simulate a network of 40 neurons.
Later, at Princeton, Minsky studied universal computation in neural networks. His Ph.D.
committee was skeptical about whether this kind of work should be considered mathematics, but
von Neumann reportedly said, “If it isn’t now, it will be someday.” Minsky was later to prove
influential theorems showing the limitations of neural network research.
There were a number of early examples of work that can be characterized as AI, but Alan
Turing’s vision was perhaps the most influential. He gave lectures on the topic as early as 1947
at the London Mathematical Society and articulated a persuasive agenda in his 1950 article
“Computing Machinery and Intelligence.” Therein, he introduced the Turing Test, machine
learning, genetic algorithms, and reinforcement learning. He proposed the Child Programme
idea, explaining “Instead of trying to produce a programme to simulate the adult mind, why not
rather try to produce one which simulated the child’s?”
Princeton was home to another influential figure in AI, John McCarthy. After receiving his PhD
there in 1951 and working for two years as an instructor, McCarthy moved to Stanford and then
to Dartmouth College, which was to become the official birthplace of the field. McCarthy
convinced Minsky, Claude Shannon, and Nathaniel Rochester to help him bring together U.S.
researchers interested in automata theory, neural nets, and the study of intelligence. They
organized a two-month workshop at Dartmouth in the summer of 1956. The proposal states:
We propose that a 2 month, 10 man study of artificial intelligence be carried out during the
summer of 1956 at Dartmouth College in Hanover, New Hampshire. The study is to proceed on
the basis of the conjecture that every aspect of learning or any other feature of intelligence can in
principle be so precisely described that a machine can be made to simulate it. An attempt will be
made to find how to make machines use language, form abstractions and concepts, solve kinds of
problems now reserved for humans, and improve themselves. We think that a significant advance
can be made in one or more of these problems if a carefully selected group of scientists work on
it together for a summer.
There were 10 attendees in all, including Trenchard More from Princeton, Arthur Samuel from
IBM, and Ray Solomonoff and Oliver Selfridge from MIT.
Looking at the proposal for the Dartmouth workshop (McCarthy et al., 1955), we can see why it
was necessary for AI to become a separate field. Why couldn’t all the work done in AI have
taken place under the name of control theory or operations research or decision theory, which,
after all, have objectives similar to those of AI? Or why isn’t AI a branch of mathematics? The
first answer is that AI from the start embraced the idea of duplicating human faculties such as
creativity, self-improvement, and language use. None of the other fields were addressing these
issues. The second answer is methodology. AI is the only one of these fields that is clearly a
branch of computer science (although operations research does share an emphasis on computer
simulations), and AI is the only field to attempt to build machines that will function
autonomously in complex, changing environments.
The first successful commercial expert system, R1, began operation at the Digital Equipment
Corporation (McDermott, 1982). The program helped configure orders for new computer
systems; by 1986, it was saving the company an estimated $40 million a year. By 1988, DEC’s
AI group had 40 expert systems deployed, with more on the way. DuPont had 100 in use and 500
in development, saving an estimated $10 million a year. Nearly every major U.S. corporation had
its own AI group and was either using or investigating expert systems.
In 1981, the Japanese announced the “Fifth Generation” project, a 10-year plan to build
intelligent computers running Prolog. In response, the United States formed the Microelectronics
and Computer Technology Corporation (MCC) as a research consortium designed to assure
national competitiveness. In both cases, AI was part of a broad effort, including chip design and
human-interface research. In Britain, the Alvey report reinstated the funding that was cut by the
Lighthill report. In all three countries, however, the projects never met their ambitious goals.
Overall, the AI industry boomed from a few million dollars in 1980 to billions of dollars in 1988,
including hundreds of companies building expert systems, vision systems, robots, and software
and hardware specialized for these purposes. Soon after that came a period called the “AI
Winter,” in which many companies fell by the wayside as they failed to deliver on extravagant
promises.
In the mid-1980s at least four different groups reinvented the back-propagation learning
algorithm first found in 1969 by Bryson and Ho. The algorithm was applied to many learning
problems in computer science and psychology, and the widespread dissemination of the results in
the collection Parallel Distributed Processing (Rumelhart and McClelland, 1986) caused great
excitement.
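For readers who have not seen it, here is a small sketch of back-propagation on a tiny 2-2-1 sigmoid network learning XOR. The architecture, learning rate, epoch count, and random initialization are illustrative choices, and a run that stalls in a poor local optimum may need a different random seed.

# Sketch of back-propagation for a 2-2-1 sigmoid network learning XOR.
# Network size, learning rate, epochs, and initialization are illustrative.
import math, random
random.seed(0)

def sigmoid(z): return 1.0 / (1.0 + math.exp(-z))

# each hidden unit has weights [w1, w2, bias]; the output unit likewise
w_hidden = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
w_out = [random.uniform(-1, 1) for _ in range(3)]
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
lr = 0.5

for _ in range(20000):
    for x, target in data:
        # forward pass
        h = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in w_hidden]
        y = sigmoid(w_out[0] * h[0] + w_out[1] * h[1] + w_out[2])
        # backward pass: push the error derivative back toward the inputs
        d_y = (y - target) * y * (1 - y)
        d_h = [d_y * w_out[j] * h[j] * (1 - h[j]) for j in range(2)]
        for j in range(2):
            w_out[j] -= lr * d_y * h[j]
            w_hidden[j][0] -= lr * d_h[j] * x[0]
            w_hidden[j][1] -= lr * d_h[j] * x[1]
            w_hidden[j][2] -= lr * d_h[j]
        w_out[2] -= lr * d_y

for x, target in data:
    h = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in w_hidden]
    y = sigmoid(w_out[0] * h[0] + w_out[1] * h[1] + w_out[2])
    print(x, target, round(y, 2))   # outputs should approach the XOR targets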
These so-called connectionist models of intelligent systems were seen by some as direct
competitors both to the symbolic models promoted by Newell and Simon and to the logicist
approach of McCarthy and others (Smolensky, 1988). It might seem obvious that at some level
humans manipulate symbols—in fact, Terrence Deacon’s book The Symbolic Species (1997)
suggests that this is the defining characteristic of humans—but the most ardent connectionists
questioned whether symbol manipulation had any real explanatory role in detailed models of
cognition. This question remains unanswered, but the current view is that connectionist and
symbolic approaches are complementary, not competing. As occurred with the separation of AI
and cognitive science, modern neural network research has bifurcated into two fields, one
concerned with creating effective network architectures and algorithms and understanding their
mathematical properties, the other concerned with careful modeling of the empirical properties
of actual neurons and ensembles of neurons.
Perhaps encouraged by the progress in solving the subproblems of AI, researchers have also
started to look at the “whole agent” problem again. The work of Allen Newell, John Laird, and
Paul Rosenbloom on SOAR (Newell, 1990; Laird et al., 1987) is the best-known example of a
complete agent architecture. One of the most important environments for intelligent agents is the
Internet. AI systems have become so common in Web-based applications that the “-bot” suffix
has entered everyday language. Moreover, AI technologies underlie many Internet tools, such as
search engines, recommender systems, and Web site aggregators.
One consequence of trying to build complete agents is the realization that the previously isolated
subfields of AI might need to be reorganized somewhat when their results are to be tied together.
In particular, it is now widely appreciated that sensory systems (vision, sonar, speech
recognition, etc.) cannot deliver perfectly reliable information about the environment. Hence,
reasoning and planning systems must be able to handle uncertainty. A second major consequence
of the agent perspective is that AI has been drawn into much closer contact with other fields,
such as control theory and economics, that also deal with agents. Recent progress in the control of robotic cars has derived from a mixture of approaches, ranging from better sensors and control-theoretic integration of sensing, localization, and mapping to a degree of high-level planning.
Despite these successes, some influential founders of AI, including John McCarthy (2007),
Marvin Minsky (2007), Nils Nilsson (1995, 2005) and Patrick Winston (Beal and Winston,
2009), have expressed discontent with the progress of AI. They think that AI should put less
emphasis on creating ever-improved versions of applications that are good at a specific task,
such as driving a car, playing chess, or recognizing speech. Instead, they believe AI should
return to its roots of striving for, in Simon’s words, “machines that think, that learn and that
create.” They call the effort human-level AI or HLAI; their first symposium was in 2004
(Minsky et al., 2004). The effort will require very large knowledge bases; Hendler et al. (1995)
discuss where these knowledge bases might come from.
A related idea is the subfield of Artificial General Intelligence or AGI (Goertzel and
Pennachin, 2007), which held its first conference and organized the Journal of Artificial General
Intelligence in 2008. AGI looks for a universal algorithm for learning and acting in any
environment, and has its roots in the work of Ray Solomonoff (1964), one of the attendees of the
original 1956 Dartmouth conference. Guaranteeing that what we create is really Friendly AI is
also a concern (Yudkowsky, 2008; Omohundro, 2008).
Throughout the 60-year history of computer science, the emphasis has been on the algorithm as
the main subject of study. But some recent work in AI suggests that for many problems, it makes
more sense to worry about the data and be less picky about what algorithm to apply. This is true
because of the increasing availability of very large data sources: for example, trillions of words
of English and billions of images from the Web (Kilgarriff and Grefenstette, 2006); or billions of
base pairs of genomic sequences (Collins et al., 2003).
One influential paper in this line was Yarowsky’s (1995) work on word-sense disambiguation:
given the use of the word “plant” in a sentence, does that refer to flora or factory? Previous
approaches to the problem had relied on human-labeled examples combined with machine
learning algorithms. Yarowsky showed that the task can be done, with accuracy above 96%, with
no labeled examples at all. Instead, given a very large corpus of unannotated text and just the
dictionary definitions of the two senses—“works, industrial plant” and “flora, plant life”—one
can label examples in the corpus, and from there bootstrap to learn new patterns that help label
new examples. Banko and Brill (2001) show that techniques like this perform even better as the
amount of available text goes from a million words to a billion and that the increase in
performance from using more data exceeds any difference in algorithm choice; a mediocre
algorithm with 100 million words of unlabeled training data outperforms the best known
algorithm with 1 million words.
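The flavor of the bootstrapping step can be sketched as follows; the tiny corpus, seed words, and stop-word list are all invented, and the real algorithm learns statistical decision lists from a very large corpus rather than using exact word overlap.

# Toy sketch of Yarowsky-style bootstrapping for the two senses of "plant".
# Corpus, seed words, and stop words are invented; the real method learns
# decision lists from a very large corpus instead of exact word matching.
corpus = [
    "the industrial plant produces steel near the factory",
    "the plant hired more factory workers",
    "plant life thrives where there is water and sunlight",
    "this plant needs water and sunlight to grow",
]
seeds = {"industrial": {"works", "industrial"}, "flora": {"flora", "life"}}
stop_words = {"the", "a", "this", "and", "to", "is", "plant", "more"}

labels = {}
changed = True
while changed:
    changed = False
    for i, sentence in enumerate(corpus):
        if i in labels:
            continue
        words = set(sentence.split())
        for sense, seed_set in seeds.items():
            if words & seed_set:
                labels[i] = sense                       # label this occurrence
                seeds[sense] |= words - stop_words      # bootstrap: grow the seed set
                changed = True
                break

print(labels)   # sentences 0-1 labeled "industrial", 2-3 labeled "flora"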
As another example, Hays and Efros (2007) discuss the problem of filling in holes in a
photograph. Suppose you use Photoshop to mask out an ex-friend from a group photo, but now
you need to fill in the masked area with something that matches the background. Hays and Efros
defined an algorithm that searches through a collection of photos to find something that will
match. They found the performance of their algorithm was poor when they used a collection of
only ten thousand photos, but crossed a threshold into excellent performance when they grew the
collection to two million photos.
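A toy sketch of the underlying search (over made-up miniature “images” rather than real photos) looks like this: compare each candidate to the unmasked pixels around the hole and copy from the best match. The image grids and the random collection are invented stand-ins for a real photo database.

# Toy sketch of data-driven hole filling: find the collection image whose
# unmasked pixels best match the damaged photo, then copy pixels into the hole.
# The 3x3 "images" and the random collection are invented stand-ins.
import random
random.seed(1)

target = [10, 12, 11, 0, 0, 13, 12, 11, 10]   # flattened 3x3 image; hole pixels are 0
mask   = [0, 0, 0, 1, 1, 0, 0, 0, 0]          # 1 marks pixels inside the hole

def context_distance(candidate):
    """Sum of squared differences over the pixels outside the hole."""
    return sum((c - t) ** 2 for c, t, m in zip(candidate, target, mask) if not m)

collection = [[random.randint(0, 255) for _ in range(9)] for _ in range(100000)]
best = min(collection, key=context_distance)

filled = [b if m else t for t, b, m in zip(target, best, mask)]
print(filled)   # the hole pixels are now taken from the best-matching image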
Work like this suggests that the “knowledge bottleneck” in AI—the problem of how to express
all the knowledge that a system needs—may be solved in many applications by learning methods
rather than hand-coded knowledge engineering, provided the learning algorithms have enough
data to go on (Halevy et al., 2009). Reporters have noticed the surge of new applications and have
written that “AI Winter” may be yielding to a new Spring (Havenstein, 2005). As Kurzweil
(2005) writes, “today, many thousands of AI applications are deeply embedded in the
infrastructure of every industry.”
For some researchers and developers, the goal is to build systems that can act (or just think)
intelligently in the same way that people do. Others simply don’t care if the systems they build
have humanlike functionality, just so long as those systems do the right thing. Alongside these
two schools of thought are others somewhere in between, using human reasoning as a model to
help inform how we can get computers to do similar things.
Strong AI
The work aimed at genuinely simulating human reasoning tends to be called strong AI in that
any result can be used not only to build systems that think but also to explain how humans think.
Genuine models of strong AI, or systems that are actual simulations of human cognition,
have yet to be built.
Weak AI
The work in the second school of thought, aimed at just getting systems to work, is usually called
weak AI in that while we might be able to build systems that can behave like humans, the results
tell us nothing about how humans think. One of the prime examples of this was IBM’s Deep
Blue, a system that was a master chess player but certainly did not play in the same way that
humans do and told us very little about cognition in general.
Everything in between
Balanced between strong and weak AI are those systems that are informed by human reasoning
but not slaves to it. This tends to be where most of the more powerful work in AI is happening
today. This work uses human reasoning as a guide, but is not driven by the goal to perfectly
model it. Now if we could only think of a catchy name for this school of thought! I don’t know,
maybe Practical AI? A good example is advanced natural language generation (NLG). Advanced
NLG platforms transform data into language. Where basic NLG platforms simply turn data into
text, advanced platforms turn this data into language indistinguishable from the way a human
would write. By analyzing the context of what is being said and deciding what are the most
interesting and important things to say, these platforms communicate to us through intelligent
narratives.
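As a very rough sketch of that data-to-narrative step (not any particular vendor’s platform), a system might score facts for newsworthiness and render the winner as a sentence; the sales data and the salience rule below are invented.

# Sketch of data-to-text generation: choose the most noteworthy fact by a simple
# salience rule, then render it as a sentence. Data and rule are invented.
sales = {
    "North": {"revenue_m": 1.20, "change": 0.32},
    "South": {"revenue_m": 0.95, "change": -0.04},
    "West":  {"revenue_m": 1.10, "change": 0.02},
}

def most_interesting(data):
    # salience rule: the region with the largest absolute change
    return max(data, key=lambda region: abs(data[region]["change"]))

region = most_interesting(sales)
change = sales[region]["change"]
direction = "rose" if change > 0 else "fell"
print(f"{region} region revenue {direction} {abs(change):.0%} to "
      f"${sales[region]['revenue_m']}M, the largest movement of any region.")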
The important takeaway is that in order for a system to be AI, it doesn’t have to be smart
in the same way that people are. It just needs to be smart.
Some AI systems are designed around specific tasks (often called narrow AI) and some are
designed around the ability to reason in general (referred to as broad AI or general AI). As with
strong and weak AI, the most visible work tends to focus on specific problems and falls into the
category of narrow AI.
The major exceptions to this are found in emerging work such as Google’s deep learning (aimed
at a general model of automatically learning categories from examples) and IBM’s Watson
(designed to draw conclusions from masses of textual evidence). But in both of these cases, the
commercial impact of these systems has yet to be completely played out.
The power of narrow AI systems is that they are focused on specific tasks. The weakness is that
these systems tend to be very good at what they do and absolutely useless for things that they
don’t do.
Different systems use different techniques and are aimed at different kinds of inference. So
there’s a difference between systems that recommend things to you based on your past behavior,
systems that learn to recognize images from examples, and systems that make decisions based on
the synthesis of evidence.
Many systems fall under the definition of narrow AI even though some people don’t think of
them as AI at all. When Amazon recommends a book for you, you don’t realize that an AI
system is behind the recommendation. A system collects information about you and your buying
behavior, figures out who you are and how you are similar to other people, and uses that
information to suggest products based on what similar people like. You don’t need to understand
how the system works. Amazon’s ability to look at what you like and figure out what else you
might like is pretty darn smart.
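The core of such a recommender can be sketched in a few lines of user-based collaborative filtering; the shoppers, purchase histories, and similarity measure here are invented stand-ins for Amazon’s actual, far more sophisticated system.

# Sketch of "people similar to you also bought" recommendations.
# Shoppers, purchase histories, and the Jaccard similarity choice are illustrative.
purchases = {
    "alice": {"AI textbook", "robotics kit", "python book"},
    "bob":   {"AI textbook", "python book", "statistics book"},
    "carol": {"cookbook", "gardening guide"},
}

def jaccard(a, b):
    return len(a & b) / len(a | b) if (a | b) else 0.0

def recommend(user):
    others = [u for u in purchases if u != user]
    # find the shopper most similar to this user ...
    neighbor = max(others, key=lambda u: jaccard(purchases[user], purchases[u]))
    # ... and suggest what they bought that this user has not
    return purchases[neighbor] - purchases[user]

print(recommend("alice"))   # {'statistics book'}: bob's history is most similar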
What is an AI Technique?
An AI technique is a method of organizing and using knowledge efficiently so that it is perceivable by the people who provide it, easily modifiable to correct errors, and useful in many situations even though it may be incomplete or inaccurate.
What can AI do today? A concise answer is difficult because there are so many activities in so
many subfields. Here we sample a few applications:
Robotic vehicles:
A driverless robotic car named STANLEY sped through the rough terrain of the Mojave Desert
at 22 mph, finishing the 132-mile course first to win the 2005 DARPA Grand Challenge.
STANLEY is a Volkswagen Touareg outfitted with cameras, radar, and laser rangefinders to
sense the environment and onboard software to command the steering, braking, and acceleration
(Thrun, 2006). The following year CMU’s BOSS won the Urban Challenge, safely driving in
traffic through the streets of a closed Air Force base, obeying traffic rules and avoiding
pedestrians and other vehicles.
Speech recognition:
A traveler calling United Airlines to book a flight can have the entire conversation guided by an
automated speech recognition and dialog management system.
Autonomous planning and scheduling:
A hundred million miles from Earth, NASA’s Remote Agent program became the first on-board
autonomous planning program to control the scheduling of operations for a spacecraft (Jonsson et
al., 2000). Remote Agent generated plans from high-level goals specified from the ground
and monitored the execution of those plans—detecting, diagnosing, and recovering from
problems as they occurred. The successor program MAPGEN (Al-Chang et al., 2004) plans the daily
operations for NASA’s Mars Exploration Rovers, and MEXAR2 (Cesta et al., 2007) did mission
planning, covering both logistics and science planning, for the European Space Agency’s Mars Express
mission in 2008.
Game playing:
IBM’s DEEP BLUE became the first computer program to defeat the world champion in a chess
match when it bested Garry Kasparov by a score of 3.5 to 2.5 in an exhibition match (Goodman
and Keene, 1997). Kasparov said that he felt a “new kind of intelligence” across the board from
him. Newsweek magazine described the match as “The brain’s last stand.” The value of IBM’s
stock increased by $18 billion. Human champions studied Kasparov’s loss and were able to draw
a few matches in subsequent years, but the most recent human-computer matches have been won
convincingly by the computer.
Spam fighting:
Each day, learning algorithms classify over a billion messages as spam, saving the recipient from
having to waste time deleting what, for many users, could comprise 80% or 90% of all messages,
if not classified away by algorithms. Because the spammers are continually updating their
tactics, it is difficult for a static programmed approach to keep up, and learning algorithms work
best (Sahami et al., 1998; Goodman and Heckerman, 2004).
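One classic learning approach is a Naive Bayes classifier over the words in a message; the sketch below is trained on a handful of invented messages, whereas deployed filters learn from vast message streams and many more features.

# Sketch of a Naive Bayes spam filter trained on a few invented messages.
# Real filters use far more data and features; this only shows the mechanics.
import math
from collections import Counter

train = [("win money now", "spam"), ("cheap money win", "spam"),
         ("meeting agenda today", "ham"), ("project meeting notes", "ham")]

word_counts = {"spam": Counter(), "ham": Counter()}
label_counts = Counter()
for text, label in train:
    label_counts[label] += 1
    word_counts[label].update(text.split())

vocabulary = set(word_counts["spam"]) | set(word_counts["ham"])

def classify(text):
    scores = {}
    for label in word_counts:
        # log prior plus log likelihoods with add-one smoothing
        score = math.log(label_counts[label] / sum(label_counts.values()))
        total = sum(word_counts[label].values())
        for word in text.split():
            score += math.log((word_counts[label][word] + 1) / (total + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)

print(classify("win cheap money"))      # spam
print(classify("meeting notes today"))  # ham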
Logistics planning:
During the Persian Gulf crisis of 1991, U.S. forces deployed a Dynamic Analysis and
Replanning Tool, DART (Cross and Walker, 1994), to do automated logistics planning and
scheduling for transportation. This involved up to 50,000 vehicles, cargo, and people at a time,
and had to account for starting points, destinations, routes, and conflict resolution among all
parameters. The AI planning techniques generated in hours a plan that would have taken weeks
with older methods. The Defense Advanced Research Project Agency (DARPA) stated that this
single application more than paid back DARPA’s 30-year investment in AI.
Robotics:
The iRobot Corporation has sold over two million Roomba robotic vacuum cleaners for home
use. The company also deploys the more rugged PackBot to Iraq and Afghanistan, where it is
used to handle hazardous materials, clear explosives, and identify the location of snipers.
Machine Translation:
A computer program automatically translates from Arabic to English, using a statistical model
built from examples of Arabic-to-English translations and from examples of English text totaling
two trillion words (Brants et al., 2007). None of the computer scientists on the team spoke
Arabic, but they understood statistics and machine learning algorithms.
Handwriting Recognition:
Handwriting recognition software reads text written on paper with a pen or on a screen with a
stylus. It can recognize the shapes of the letters and convert them into editable text.