
Artificial Intelligence

● Module I: Introduction to Intelligent Systems and Intelligent Agents
1. Introduction to AI
a. What is AI? List down all components of AI. [Dec 2018, Dec
2015] [4mrks]
i. AI is the study of how the human brain thinks, learns, decides and
works when it tries to solve problems; the output of this study is
intelligent software systems. The aim of AI is to improve computer
functions which are related to human knowledge, for example
reasoning, learning, and problem-solving.
ii. Acting humanly: The Turing Test Approach
1. Definition: The art of creating machines that perform
functions that require intelligence when performed by
people.
2. According to this test, a computer interacts with a human
interrogator by answering questions in written form. The
computer passes the test if the human interrogator cannot
identify whether the written responses come from a person
or a computer. The Turing Test remains relevant even after
60 years of research.
3. For this test, the computer would need to possess the
following capabilities:
a. Natural Language Processing: This unit enables the
computer to understand a natural language such as
English and communicate successfully.
b. Knowledge Representation: This unit is used to
store knowledge gathered by the system
through input devices.
c. Automated Reasoning: This unit analyzes the
knowledge stored in the system and draws new
inferences to answer questions.
d. Machine Learning: This unit learns new
knowledge by taking current input from the
environment and adapts to new circumstances,
thereby enhancing the knowledge base of the
system.
iii. Thinking Humanly: The Cognitive Modelling Approach.
1. Definition: The exciting new effort to make computers
think: machines with minds, in the full and literal sense.
2. Cognitive science is an interdisciplinary field which
combines computer models from Artificial Intelligence
with techniques from psychology in order to construct
precise and testable theories of the working of the human
mind.
3. In order to make machines think like humans, we need
to first understand how humans think.
4. Research has shown that there are three ways in which
the human thinking pattern can be captured.
a. Introspection: Through which humans can catch
their own thoughts as they go by.
b. Psychological experiments: can be carried out
by observing a person in action.
c. Brain imaging can be done by observing the
brain in action.
5. Once the human thinking pattern is captured, it can be
implemented in a computer system as a program; if the
program's input and output match those of a human,
then it can be claimed that the system can operate like
humans.
iv. Thinking Rationally: The “Laws of Thought” Approach
1. Definition: The study of mental faculties through the
use of computational models.
2. The laws of thought are supposed to govern the
operation of the mind, and their study initiated the field
called logic. Logic provides precise notations to express
facts of the real world.
3. It is concerned with reasoning and "right thinking", that
is, irrefutable reasoning processes. Computer programs
based on these logic notations were developed to
create intelligent systems.
4. There are two problems with this approach:
a. This approach is not suitable to use when 100%
knowledge is not available for any problem.
b. A vast number of computations is required even to
implement a simple human reasoning process; in
practice most problems were not solvable, because
even problems with just a few hundred facts can
exhaust the computational resources of any computer.
v. Acting Rationally: The Rational Agent Approach
1. Definition: Computational Intelligence is the study of
the design of intelligent agents.
2. Rational Agent: An agent perceives its environment
through sensors over a prolonged time period, adapts to
change, and creates and pursues goals, taking actions
through actuators to achieve those goals. A rational
agent is one that does the right thing and acts rationally
so as to achieve the best outcome even when there is
uncertainty in knowledge.
3. The rational agent approach has two advantages over
the other approaches:
a. Compared with the other approaches this is the
more general approach, since rationality can be
achieved by selecting the correct inference from
the several available.
b. Rationality has specific standards: it is
mathematically well defined and completely
general, and can be used to develop agent
designs that achieve it. Human behavior, on the
other hand, is very subjective and cannot be
proved mathematically.
2. AI Problems and AI techniques
a. Explain any AI problem with suitable example. [Dec 2018]
[4mrks]
i. The General Problem Solver (GPS) was developed for common
sense reasoning problems. GPS used a small amount of
knowledge for performing simple tasks.
ii. When the knowledge base was scaled up for the problems
which needed perception through vision and/or speech,
natural language understanding, problem solving for medical
diagnosis, chemical analysis, etc. it was difficult because it
involved analog signals which are generally very noisy
compared to digital signals.
iii. Following are some of the famous search problems in
artificial intelligence.
iv. 8-Puzzle: In the 8-puzzle, eight tiles need to be arranged in
the way shown in the goal state. The constraint is that only
the blank tile can be moved, to the immediate up, down, right
or left position, and the goal state should be attained in a
minimum number of moves.
v. Formulate the state space search problems for 8 puzzle
problem.
vi. From the above examples it should be clear that AI problems
are those in which the specified conditions aim not only at
generating the solution but also at improving the
performance of the system, because that is where the
intelligence of the system is challenged. The term
intelligence includes many cognitive skills, like the ability to
solve problems, learn, and interpret and understand language.
vii. Artificial intelligence addresses all of these, but most progress
has been made in the area of problem solving concepts and
methods. These techniques enable us to build programs that not
only find solutions but are also able to reason about
problems.
3. Solving problems by searching
a. Explain State space search [May 2017] [5mrks]
i. A state space search has two types:
1. Uninformed search​: These are search problems
where we do not have any knowledge about the
domain of the problem, so we have to search without
any additional knowledge.
2. Informed Search: We have knowledge about the
problem domain, and we have (heuristic) functions to
guide the search.
ii. A state space is a set of descriptions or states.
iii. Each search problem consists of:
1. One or more initial states.
2. A set of legal actions. Actions are represented by
operators or moves applied to each state. For
example, the operators in a state space representation
of the 8-puzzle problem are left, right, up and down.
3. One or more goal states.
iv. The number of operators is problem dependent and specific
to a particular state space representation. The more
operators, the larger the branching factor of the state space.
Thus, the number of operators should be kept to a minimum,
e.g. in the 8-puzzle the operations are defined in terms of
moving the space instead of the tiles.
v. A search algorithm is applied to a state space representation
to find a solution path. Each search algorithm applies a
particular search strategy.
vi. Need to specify:
1. The ​initial state
2. The ​operators​: actions that take you from one state
to another
3. A ​goal test​: determines if a state is a goal state
a. if only one goal state, see if the current state
matches it
b. the test may also be more complex:
i. n-queens: do we have n queens on the
board without any two queens on the
same row, column, or diagonal?
4. The ​costs​ associated with applying a given operator
a. Allow us to differentiate between solutions
b. Example: allow us to prefer 8-puzzle solutions
that involve fewer steps
c. Can be 0 if all solutions are equally preferable
vii. Basic idea:
1. If the initial state is a goal state, return it.
2. If not, apply the operators to generate all states that
are one step from the initial state (its successors).
3. Different search strategies consider the states in
different orders, and they may use different data
structures to store the states that have yet to be
considered.
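The basic idea above can be written directly as code. Below is a minimal, illustrative Python sketch (not from the original notes): the order in which the fringe data structure hands back states is exactly what distinguishes one search strategy from another, with a stack giving depth-first and a queue giving breadth-first behaviour. The goal_test and successors functions are assumed problem-specific inputs.

from collections import deque

def generic_search(initial_state, goal_test, successors, use_stack=False):
    # Fringe of paths yet to be considered; its ordering defines the strategy.
    fringe = deque([[initial_state]])
    while fringe:
        # LIFO (stack) -> depth-first; FIFO (queue) -> breadth-first.
        path = fringe.pop() if use_stack else fringe.popleft()
        state = path[-1]
        if goal_test(state):
            return path  # solution path from the initial state to a goal
        for successor in successors(state):
            if successor not in path:  # avoid trivial loops along this path
                fringe.append(path + [successor])
    return None  # no solution found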
4. Problem Formulation
Problem formulation involves deciding what actions and states to consider, given
the goal. For example, if the agent were to consider actions at the level
of "move the left foot by one inch" or "turn the steering wheel by 1 degree left",
there would be too many steps for the agent to leave the parking lot, let alone
reach Bucharest. In general, we need to abstract the state details from the
representation.

A problem can be defined formally by 5 components:

5. The ​initial state of the agent. In this case, the initial state can be
described as ​In: Arad
6. The possible ​actions available to the agent, corresponding to each of the
state the agent resides in. For example, ACTIONS(​In: Arad​) = {​Go: Sibiu,​
Go: Timisoara​, ​Go: Zerind} ​
7. The ​transition model describing what each action does. Let us represent
it by RESULT(s, a) where s is the state the action is currently in and a is
the action performed by the agent. In this example, RESULT(​In: Arad, Go:
Zerind​) = ​In: Zerind.
8. The ​goal test​, determining whether the current state is a goal state.
Here, the goal state is {​In: Bucharest​}
9. The path cost function, which determines the cost of each path, and which
is reflected in the performance measure. For the agent trying to reach
Bucharest, time is essential, so we can set the cost function to be the
distance between the places. (Here, we are ignoring the other factors that
influence the traveling time). By convention, we define the cost function
as ​c(s, a, s’)​, where s is the current state and a is the action performed by
the agent to reach state s’.
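These five components can be captured in a short, illustrative Python sketch (not from the original notes). Only a small assumed fragment of the Romania map is included here, and the distances shown are for illustration.

# Five components of the Arad -> Bucharest problem (illustrative fragment).
ROADS = {
    # transition model and step costs: ROADS[s][s2] = c(s, a, s2)
    'Arad': {'Sibiu': 140, 'Timisoara': 118, 'Zerind': 75},
    'Sibiu': {'Arad': 140, 'Fagaras': 99, 'Rimnicu Vilcea': 80},
    # ... remaining cities omitted for brevity
}

initial_state = 'Arad'                  # 1. initial state: In: Arad

def actions(state):                     # 2. ACTIONS(In: Arad)
    return ['Go: ' + city for city in ROADS.get(state, {})]

def result(state, action):              # 3. RESULT(In: Arad, Go: Zerind) = In: Zerind
    return action.split(': ')[1]

def goal_test(state):                   # 4. goal test: In: Bucharest
    return state == 'Bucharest'

def step_cost(s, a, s2):                # 5. path cost: distance between places
    return ROADS[s][s2]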
10. State Space Representation
a. Given a full 4 Gallon jug and an empty 3 Gallon jug, the goal
is to fill the 4 Gallon jug with exactly 2 Gallons of water. Give
state space representation. [May 2019] [10mrks]
i. The set of all possible states for a given problem is called the
state space of the problem.
ii. Searching is needed to find a solution when the steps are not
known beforehand and have to be discovered.
iii. Following factors are needed:
1. Initial state description of the problem
2. Set of legal operators
3. Final or goal state
iv. Here, let ​x denote the 4-gallon jug and ​y denote the 3-gallon
jug.
v.
S.no.  Initial state  Condition             Final state       Description
1      (x, y)         if x < 4              (4, y)            Fill the 4-gallon jug completely
2      (x, y)         if y < 3              (x, 3)            Fill the 3-gallon jug completely
3      (x, y)         if x > 0              (x - d, y)        Pour some water out of the 4-gallon jug
4      (x, y)         if y > 0              (x, y - d)        Pour some water out of the 3-gallon jug
5      (x, y)         if x > 0              (0, y)            Empty the 4-gallon jug
6      (x, y)         if y > 0              (x, 0)            Empty the 3-gallon jug
7      (x, y)         if x + y >= 4, y > 0  (4, y - (4 - x))  Pour water from the 3-gallon jug into the 4-gallon jug until it is full
8      (x, y)         if x + y >= 3, x > 0  (x - (3 - y), 3)  Pour water from the 4-gallon jug into the 3-gallon jug until it is full
9      (x, y)         if x + y <= 4, y > 0  (x + y, 0)        Pour all the water from the 3-gallon jug into the 4-gallon jug
10     (x, y)         if x + y <= 3, x > 0  (0, x + y)        Pour all the water from the 4-gallon jug into the 3-gallon jug
vi. The production rules listed above contain all the actions that could
be performed by the agent in transferring the contents of the
jugs. But to solve the water jug problem in a minimum
number of moves, the following rules should be applied in the
given sequence:
vii.
S.no.  4-gallon jug  3-gallon jug  Rule
1      4             0             Initial state
2      1             3             Rule 8 (pour from the 4-gallon jug to fill the 3-gallon jug)
3      1             0             Rule 6 (empty the 3-gallon jug)
4      0             1             Rule 10 (pour all water from the 4-gallon jug into the 3-gallon jug)
5      4             1             Rule 1 (fill the 4-gallon jug)
6      2             3             Rule 8 (pour from the 4-gallon jug to fill the 3-gallon jug)
viii. At step 6 we reach the state (2, 3), in which the 4-gallon jug
contains exactly 2 gallons. This is our goal state, so the problem is solved.
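The production rules above translate directly into a short breadth-first solver. The following illustrative Python sketch (not part of the original notes) represents a state as the pair (x, y) and searches for the goal x == 2; running it reproduces the move sequence in the table above.

from collections import deque

def successors(state):
    x, y = state
    succ = [(4, y), (x, 3), (0, y), (x, 0)]                  # fill or empty either jug
    pour = min(x, 3 - y); succ.append((x - pour, y + pour))  # pour 4-gal into 3-gal
    pour = min(y, 4 - x); succ.append((x + pour, y - pour))  # pour 3-gal into 4-gal
    return succ

def solve(start=(4, 0)):
    frontier, seen = deque([[start]]), {start}
    while frontier:
        path = frontier.popleft()
        if path[-1][0] == 2:             # goal: 2 gallons in the 4-gallon jug
            return path
        for s in successors(path[-1]):
            if s not in seen:
                seen.add(s)
                frontier.append(path + [s])

print(solve())  # [(4, 0), (1, 3), (1, 0), (0, 1), (4, 1), (2, 3)]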

b. Given a 7 Gallon jug and a 2 Gallon jug, the goal is to
measure 3 and 4 liters. Give state space representation. [Dec
2016] [10mrks]
c. Given a 7 Gallon jug and a 5 Gallon jug, the goal is to
measure 1 liter. Give state space representation. [Dec 2016]
[10mrks]

d. Formulate the state space search problems for 8 puzzle
problem.
i. The 8-Puzzle involves moving the tiles on the board into a
particular configuration. The blank square on the board
represents a space. The player can move a tile into the
space, freeing that position for another tile to be moved into,
and so on.
ii.
Sr.no.  Initial state                      Condition                      Final state                        Move
1       ({0,x1,x2},{x3,x4,x5},{x6,x7,x8})  0 in first row, first column   ({x1,0,x2},{x3,x4,x5},{x6,x7,x8})  Right
2       ({0,x1,x2},{x3,x4,x5},{x6,x7,x8})  0 in first row, first column   ({x3,x1,x2},{0,x4,x5},{x6,x7,x8})  Down
3       ({x1,0,x2},{x3,x4,x5},{x6,x7,x8})  0 in first row, second column  ({0,x1,x2},{x3,x4,x5},{x6,x7,x8})  Left
4       ({x1,0,x2},{x3,x4,x5},{x6,x7,x8})  0 in first row, second column  ({x1,x2,0},{x3,x4,x5},{x6,x7,x8})  Right
5       ({x1,0,x2},{x3,x4,x5},{x6,x7,x8})  0 in first row, second column  ({x1,x4,x2},{x3,0,x5},{x6,x7,x8})  Down
...     ...                                ...                            ...                                ...


vii. Each of the following could serve as a heuristic function for
the 8-Puzzle problem:
1. Number of tiles out of place - count the number of tiles
out of place in each state compared to the goal.
2. Sum of the distances by which the tiles are out of place.
3. Tile reversals - multiply the number of tile reversals by
2.
e. Formulate the state space search problems for vacuum
cleaner world. [Dec 2016] [5mrks]


11. Structure of Intelligent agents


a. An agent is anything that can be viewed as perceiving its
environment through sensors and acting upon the environment
through actuators.
b. Human agent: eyes, ears, and other organs for sensors;
hands, legs, mouth, and other body parts for actuators.
12. Types of Agents
a. List down all types of agents. [May 2018] [5mrks]
i. Simple Reflex agent:
1. The Simple reflex agents are the simplest agents.
These agents take decisions on the basis of the current
percepts and ignore the rest of the percept history.
2. These agents only succeed in the fully observable
environment.
3. The Simple reflex agent does not consider any part of
percepts history during their decision and action
process.
4. The Simple reflex agent works on Condition-action
rule, which means it maps the current state to action.
Such as a Room Cleaner agent, it works only if there is
dirt in the room.
5. Problems for the simple reflex agent design approach:
a. They have very limited intelligence
b. They do not have knowledge of non-perceptual
parts of the current state
c. Their set of condition-action rules is mostly too big to generate and to store.
d. Not adaptive to changes in the environment.
ii. Model-based reflex agent [Dec 2016] [5mrks]
1. The Model-based agent can work in a partially
observable environment, and track the situation.
2. A model-based agent has two important factors:
a. Model: It is knowledge about "how things
happen in the world," so it is called a
Model-based agent.
b. Internal State: It is a representation of the
current state based on percept history.
3. These agents have the model, "which is knowledge of
the world" and based on the model they perform
actions.
4. Updating the agent state requires information about:
a. How the world evolves.
b. How the agent's action affects the world.

iii. Goal-based agents ​[Dec 2018, Dec 2017, May 2017]
[5mrks]
1. Knowledge of the current state of the environment is not
always sufficient for an agent to decide what to do.
2. The agent needs to know its goal which describes
desirable situations.
3. Goal-based agents expand the capabilities of the
model-based agent by having the "goal" information.
4. They choose an action, so that they can achieve the
goal.
5. These agents may have to consider a long sequence of
possible actions before deciding whether the goal is
achieved or not. Such consideration of different
scenarios is called searching and planning, which
makes an agent proactive.

iv. Utility-based agents ​[Dec 2018, Dec 2017, May 2016]
[5mrks]
1. These agents are similar to the goal-based agent but
provide an extra component of utility measurement
which makes them different by providing a measure of
success at a given state.
2. A utility-based agent acts based not only on goals but also
on the best way to achieve the goal.
3. The Utility-based agent is useful when there are
multiple possible alternatives, and an agent has to
choose in order to perform the best action.
4. The utility function maps each state to a real number
to check how efficiently each action achieves the goals.
v. Learning Agents ​[May 2019, May 2016] [5mrks] [May
2018, Dec 2017, May 2017] [4mrks]
1. A learning agent in AI is an agent which can learn from
its past experiences, i.e. it has learning capabilities.
2. It starts to act with basic knowledge and is then able to
act and adapt automatically through learning.
3. A learning agent has mainly four conceptual
components, which are:
a. Learning element: It is responsible for making
improvements by learning from environment
b. Critic: Learning element takes feedback from
critic which describes how well the agent is
doing with respect to a fixed performance
standard.
c. Performance element: It is responsible for
selecting external action
d. Problem generator: This component is
responsible for suggesting actions that will lead
to new and informative experiences.
4. Hence, learning agents are able to learn, analyze
performance, and look for new ways to improve the
performance.
13. Agent Environments PEAS representation for an Agent.
a. Short note on properties of agent task environment [May
2019] [5mrks]
i. Fully observable Vs Partially Observable
1. If an agent's sensors give it access to the complete
state of the environment at each point in time, then
we say that the task environment is fully observable. A
task environment is effectively fully observable if the
sensors detect all aspects that are relevant to the
choice of action; relevance depends on the
performance measure. Fully observable environments
are convenient because the agent need not maintain
any internal state to keep track of the world. An
environment might be partially observable because of
noisy and inaccurate sensors, or because parts of the
state are simply missing from the sensor data.
2. Eg: ​Image recognition operates in fully observable
domains. Partially observable environments such as
the ones encountered in self-driving vehicle
scenarios deal with partial information in order to solve
AI problems.
ii. Deterministic Vs stochastic
1. If the next state of the environment is completely
determined by the current state and the action
executed by the agent then we say the environment is
deterministic(​For example, chess​), otherwise it is
stochastic​(For example, driving)​.
2. In other words, deterministic environments ignore
uncertainty. Most real world AI environments are not
deterministic. Instead, they can be classified as
stochastic.
iii. Episodic Vs Sequential
1. In an episodic task environment, the agent's experience
is divided into atomic episodes; each episode consists
of the agent perceiving and then performing a single
action. Crucially, the next episode does not depend on
the actions taken in previous episodes. In an episodic
environment, the choice of action in each episode
depends only on the episode itself. Many classification
tasks are episodic.
2. Eg: An agent that has to ​spot defective parts on an
assembly line bases each decision on the current part,
regardless of previous decisions. Moreover, the current
decision doesn’t affect whether the next part is
defective. ​Chess and taxi driving are sequential, in
both cases, short term actions can have long term
consequences.
3. Episodic environments are much simpler than
sequential environments because the agent does not
need to think ahead.
iv. Static Vs Dynamic
1. If the environment can change while an agent is
deliberating, then we say the environment is dynamic
for that agent; otherwise, it is static. Static
environments are easy to deal with because the agent
need not keep looking at the world while it is deciding
on an action, nor need it worry about the passage of
time. Dynamic environments are continuously asking
the agent what it wants to do; if it has not decided
yet, that counts as deciding to do nothing. If
the environment itself does not change with the
passage of time but the agent's performance score
does, then we say the environment is semi-dynamic.
2. Eg: Taxi driving is clearly dynamic: the other cars
and the taxi itself keep moving while the driving
algorithm dithers about what to do next. Chess, when
played with a clock, is semi-dynamic. Crossword
puzzles are static.
v. Single Agent Vs Multi Agent
1. Either an agent acts in the environment alone, or it
engages in certain relationships with other agents,
distinguishing them from other objects of the
environment (by identifying that its own performance
depends on the other agents' performance). A multi-agent
environment can be competitive, cooperative, or
partially both.
2. Eg: An agent solving a crossword puzzle by itself is
clearly in a single-agent environment, whereas an agent
playing chess is in a two-agent environment. Chess is a
competitive multi-agent environment. In a taxi driving
environment, avoiding collisions maximizes the
performance measure of all agents, so it is a partially
cooperative multi-agent environment.
vi. Discrete Vs Continuous
1. If there are a limited number of distinct, clearly
defined, states of the environment, the environment is
discrete (For example, ​chess​); otherwise it is
continuous (For example, ​driving​).
b. What is PEAS descriptor? Give PEAS Descriptor for Taxi
Driver. [May 2019, May 2016] [4mrks]
i. PEAS stands for Performance, Environment, Actuators,
and Sensors​.
ii. Based on these properties of an agent, they can be grouped
together or can be differentiated from each other. Each agent
has these following properties defined for it.
iii. Performance: ​The output which we get from the agent. All
the necessary results that an agent gives after processing
comes under its performance.
iv. Environment: ​All the surrounding things and conditions of
an agent fall in this section. It basically consists of all the
things under which the agents work.
v. Actuators: ​The devices, hardware or software through which
the agent performs any actions or processes any information
to produce a result are the actuators of the agent.
vi. Sensors: ​The devices through which the agent observes and
perceives its environment are the sensors of the agent.
vii. Eg: PEAS for Taxi Driver
1. Performance Measure:
a. Safety: Automated system should be able to
drive the car safely without dashing anywhere.
b. Optimum speed: Automated system should be
able to maintain the optimal speed depending
upon the surroundings.
c. Comfortable journey: Automated system should
be able to give a comfortable journey to the end
user.
2. Environment:
a. Roads: Automated car driver should be able to
drive on any kind of road ranging from city
roads to the highway.
b. Traffic conditions: You will find different sorts of
traffic conditions for different types of roads.
3. Actuators:
a. Steering wheel: used to direct car in desired
directions.
b. Accelerator, gear: To increase or decrease the
speed of the car.
4. Sensors:
a. To take i/p from environment in car driving
example cameras, sonar system etc.
c. Explain PEAS descriptor of Wumpus world problem. [Dec
2018, May 2018, Dec 2017, May 2017, May 2016] [5mrks]
i. Performance Measure
1. +1000 points for leaving the cave with Gold
2. -1000 points for being eaten by Wumpus or falling into
pit (death)
3. -1 point per step
4. -10 point for using the arrow
5. Game ends if agent dies or leaves the cave
ii. Environment (Internet version)
1. A 4*4 grid of rooms.
2. The agent is initially in room square [1, 1], facing right.
3. Location of Wumpus and gold are chosen randomly
except the first square [1,1].
4. Each square of the cave (except the first square) can
be a pit with probability 0.2.
iii. Actuators
1. Left turn
2. Right turn
3. Move Forward
4. Grab
5. Release
6. Shoot
iv. Sensors
1. Stench - Agent will perceive stench in the room
adjacent to Wumpus (not diagonally).
2. Breeze - Agent will perceive breeze in the room
directly adjacent to the pit.
3. Glitter - Agent will perceive glitter in the room where
Gold is present.
4. Bump - Agent will perceive bump if they walk into a
wall.
5. Scream - When the Wumpus is shot, it emits a horrible
scream which can be perceived anywhere in the cave.
v. Properties:
1. Partially observable: The Wumpus world is partially
observable because the agent can only perceive the
close environment such as an adjacent room.
2. Deterministic: It is deterministic, as the result and
outcome of the world are already known.
3. Sequential:​ The order is important, so it is sequential.
4. Static:​ It is static as Wumpus and Pits are not moving.
5. Discrete:​ The environment is discrete.
6. One agent: The environment is a single agent as we
have one agent only and Wumpus is not considered as
an agent.
d. Give PEAS Descriptor for part picking robot and medical
diagnosis system. [May 2019] [4mrks]
i. Part Picking
1. Performance measure: Percentage of parts in correct
bins
2. Environment: Conveyor belt with parts, bins
3. Actuators: Jointed arm and hand
4. Sensors: Camera, joint angle sensors
ii. Medical diagnosis system
1. Performance measure: Healthy patient, minimize
costs, lawsuits
2. Environment: Patient, hospital, staff
3. Actuators: Screen display (questions, tests, diagnosis,
treatments, referrals)
4. Sensors: Keyboard (entry of symptoms, findings,
patient's answers)
e. Explain WUMPUS problem and percept sequence [May 2018]
[5mrks]
i. The Wumpus world is a cave which has 4 x 4 rooms connected
by passageways, so there are a total of 16 rooms which
are connected with each other. We have a knowledge-based
agent who will go forward in this world. The cave has a room
with a beast called the Wumpus, who eats anyone who
enters the room. The Wumpus can be shot by the agent, but
the agent has a single arrow. In the Wumpus world there
are some pit rooms which are bottomless, and if the agent falls
in a pit, then he will be stuck there forever. The exciting thing
about this cave is that in one room there is a possibility of
finding a heap of gold. So the agent's goal is to find the gold
and climb out of the cave without falling into a pit or being
eaten by the Wumpus. The agent will get a reward if he comes
out with the gold, and he will get a penalty if he is eaten by the
Wumpus or falls into a pit.
ii. "GOLD"​-Percept: (cumulative)
There is at least one piece of ​gold​ in the current field...
iii. "STENCH"​-Percept: (cumulative)
There is a ​wumpus near the current field (up, down, left or
right).
iv. "BREEZE"​-Percept: (cumulative)
There is a ​pit​ near the current field (up, down, left or right).
v. "BUMP"​-Percept: (cumulative)
The agent ran into a wall with it's last move.
vi. "SCREAM"​-Percept: (cumulative)
Someone - if not the agent himself - killed the Wumpus with
a javelin.

vii. "EMPTY"​-Percept: (non-cumulative)


This Percept is generated in the absence of all the above
Percepts. In this case the agent may move randomly without
risking it's life.
Module II: Search Techniques

1. Uninformed Search: DFS


a. Apply DFS algorithm on a given tree. Write sequence of
node. [May 2018] [10mrks]
i. The depth first search (DFS) algorithm starts with the initial node
of the graph G and goes deeper and deeper until it
finds the goal node or a node which has no children. The
algorithm then backtracks from the dead end towards the
most recent node that is yet to be completely explored.
ii. The data structure used in DFS is a stack. The
process is otherwise similar to the BFS algorithm. In DFS, the
edges that lead to an unvisited node are called discovery
edges, while the edges that lead to an already visited node are
called block edges.
iii. Algorithm:
1. Step 1: SET STATUS = 1 (ready state) for each node
in G
2. Step 2: Push the starting node A on the stack and set
its STATUS = 2 (waiting state)
3. Step 3:​ Repeat Steps 4 and 5 until STACK is empty
4. Step 4: Pop the top node N. Process it and set its
STATUS = 3 (processed state)
5. Step 5: Push on the stack all the neighbours of N that
are in the ready state (whose STATUS = 1) and set
their STATUS = 2 (waiting state) [END OF LOOP]
6. Step 6:​ EXIT
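The stack-based procedure above can be sketched in a few lines of Python. This is an illustrative sketch (not from the notes); the graph is assumed to be given as an adjacency-list dict.

def dfs(graph, start, goal):
    # Stack of paths; LIFO order makes the search depth-first.
    stack = [[start]]
    visited = set()
    while stack:
        path = stack.pop()
        node = path[-1]
        if node == goal:
            return path
        if node not in visited:
            visited.add(node)                     # processed state
            for neighbour in graph.get(node, []):
                stack.append(path + [neighbour])  # waiting state
    return None

g = {'A': ['B', 'C'], 'B': ['D', 'E'], 'C': ['F'], 'D': [], 'E': [], 'F': []}
print(dfs(g, 'A', 'F'))  # ['A', 'C', 'F']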
iv. Performance evaluation:
1. Completeness: No, it is complete only if the search
tree is finite.
2. Optimality: No, DFS is not optimal, meaning the
number of steps in reaching the solution, or the cost
spent in reaching it, is high.
3. Time complexity: O(b^m) - terrible if m is much larger than
d, but if solutions are dense, DFS may be faster than BFS.
4. Space complexity: O(b·m) - equivalent to how large the
fringe can get.
2. Uninformed Search: BFS
a. Apply BFS algorithm on a given tree. Write sequence of
node. [May 2018] [10mrks]
i. It starts from the root node, explores the neighboring nodes
first and moves towards the next level neighbors. It
generates one tree at a time until the solution is found. It can
be implemented using FIFO queue data structure. This
method provides the shortest path to the solution.
ii. If branching factor (average number of child nodes for a
given node) = b and depth = d, then the number of nodes at
level d = b^d.
iii. The total number of nodes created in the worst case is
b + b^2 + b^3 + ... + b^d.
iv. Algorithm​:
1. Step 1: SET STATUS = 1 (ready state) for each node
in G
2. Step 2: Enqueue the starting node A and set its
STATUS = 2 (waiting state)
3. Step 3:​ Repeat Steps 4 and 5 until QUEUE is empty
4. Step 4: Dequeue a node N. Process it and set its
STATUS = 3 (processed state).
5. Step 5: Enqueue all the neighbours of N that are in
the ready state (whose STATUS = 1) and set their
STATUS = 2 (waiting state) [END OF LOOP]
6. Step 6:​ EXIT
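Swapping the stack for a FIFO queue is the only structural change from DFS. An illustrative Python sketch (not from the notes), using the same adjacency-list convention:

from collections import deque

def bfs(graph, start, goal):
    # Queue of paths; FIFO order makes the search breadth-first,
    # so the first path that reaches the goal is a shallowest one.
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in graph.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None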
v. Performance evaluation:
1. Completeness: ​Yes​, given search tree, BFS will come
up with a solution if it exists.
2. Optimality: ​Yes​, BFS is optimal as long as the costs of
all edges are equal.
3. Time complexity: O(b^(d+1)) - equivalent to the number of
nodes traversed in BFS until the shallowest solution.
4. Space complexity: O(b^(d+1)) - equivalent to how large the
fringe can get.
3. Uninformed Search: Uniform cost search
a. Apply uniform cost search on the given graph [Dec 2016]
[5mrks]

4. Uninformed Search: Depth Limited Search
a. Depth limited search [Dec 2018] [5mrks]
i. In order to avoid the infinite loop condition arising in DFS, in
depth limited search technique, depth-first search is carried
out with a predetermined depth limit.
ii. As in the case of DFS, in DLS we can use the same fringe,
implemented as a stack.
iii. Additionally the level of each node needs to be calculated to
check whether it is within the specified depth limit.
iv. Depth-limited search can terminate with two conditions:
1. If the solution is found.
2. If there is no solution within given depth limit.
v. Process: If depth is fixed to 2, DLS carries out depth first
search till second level in the search tree.

vii. Algorithm:
1. Determine the start node and the search depth.
2. Check if the current node is the goal node.
a. If not: Do nothing
b. If yes: return.
3. Check if the current node is within the specified search
depth
a. If not: Do nothing
b. If yes: Expand the node and save all of its
successors in a stack.
4. Call DLS recursively for all nodes of the stack and go
back to step 2.
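The steps above can be sketched recursively in Python (illustrative, not from the notes; adjacency-list dict assumed as before):

def dls(graph, node, goal, limit):
    # Depth-first search that refuses to expand nodes beyond the depth limit.
    if node == goal:
        return [node]        # termination condition 1: solution found
    if limit == 0:
        return None          # termination condition 2: no solution within the limit
    for neighbour in graph.get(node, []):
        result = dls(graph, neighbour, goal, limit - 1)
        if result is not None:
            return [node] + result
    return None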
viii. Performance evaluation:
1. Completeness: It is complete only if the shallowest goal
is within the depth limit (l >= d); otherwise it is incomplete.
2. Optimality: Non-optimal, as the depth chosen can be
greater than d.
3. Time complexity: Same as DFS, O(b^l), where l is the
specified depth limit.
4. Space complexity: O(b·l), where l is the specified depth
limit.
5. Uninformed Search: Iterative Deepening
a. Give the comparative analysis of DFS, BFS, Depth Limited
Search, Iterative Deepening, Bidirectional Search with
respect to TIme, Space, Optimality, Completeness [May
2019, Dec 2017, Dec 2016, May 2016] [5mrks] [May 2018,
May 2017] [10mrks]
i.
Definition:
- DFS: The algorithm starts at the root node and explores as far as possible along each branch before backtracking.
- BFS: It starts at the tree root and explores all of the neighbor nodes at the present depth prior to moving on to the nodes at the next depth level.
- Depth Limited: Depth-first search carried out with a predetermined depth limit.
- Iterative Deepening: IDDFS calls DFS for different depths starting from an initial value; in every call, DFS is restricted from going beyond the given depth. So basically we do DFS in a BFS fashion.
- Bidirectional: Starts searching from root to goal and from goal to root simultaneously; when the 2 trees intersect, the solution is obtained.

Time:
- DFS: b^m - terrible if m is much larger than d, but if solutions are dense, it may be faster than BFS.
- BFS: b^(d+1) - equivalent to the number of nodes traversed in BFS until the shallowest solution.
- Depth Limited: b^l, where l is the limit set in DLS.
- Iterative Deepening: b^d - in an iterative deepening search, the nodes on the bottom level are expanded once, those on the next-to-bottom level are expanded twice, and so on.
- Bidirectional: b^(d/2) - depth d/2 for each tree, since we have 2 trees running simultaneously.

Space:
- DFS: b·m - equivalent to how large the fringe can get.
- BFS: b^(d+1) - equivalent to how large the fringe can get.
- Depth Limited: b·l, where l is the limit set in DLS.
- Iterative Deepening: b·d, where d is the height till the goal node.
- Bidirectional: b^(d/2) - since we have 2 trees running simultaneously.

Optimality:
- DFS: No - the number of steps in reaching the solution, or the cost spent in reaching it, is high.
- BFS: Yes - as long as the costs of all edges are equal.
- Depth Limited: No - as the depth chosen can be greater than d.
- Iterative Deepening: Yes, if the step cost = 1; can be modified to explore a uniform-cost tree.
- Bidirectional: Yes, if BFS is used for the search and paths have uniform cost.

Completeness:
- DFS: No - it is complete only if the search tree is finite.
- BFS: Yes - given a search tree, BFS will come up with a solution if it exists.
- Depth Limited: Yes, if l >= d, i.e. if the shallowest goal is within the depth limit.
- Iterative Deepening: Yes - given a search tree, IDDFS will come up with a solution if it exists.
- Bidirectional: Yes, if BFS is used in both searches.
ii. d = depth of the shallowest goal node
iii. m = maximum depth of the search tree
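Iterative deepening simply wraps depth-limited search in a loop over increasing limits. An illustrative Python sketch (not from the notes), reusing the dls function sketched earlier:

def iddfs(graph, start, goal, max_limit=50):
    # Run DLS with limit 0, 1, 2, ... so the shallowest solution is
    # found first (as in BFS) while keeping DFS-like memory usage.
    for limit in range(max_limit + 1):
        result = dls(graph, start, goal, limit)
        if result is not None:
            return result
    return None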

6. Informed Search: Heuristic functions
a. What is heuristic function? Explain 8 puzzle problem. [Dec
2018, May 2017, Dec 2016] [5mrks] [May 2018] [4mkrs]
i. Heuristic is a function which is used in Informed Search, and
it finds the most promising path. It takes the current state of
the agent as its input and produces the estimation of how
close agent is from the goal.
ii. The heuristic method, however, might not always give the
best solution, but it is guaranteed to find a good solution in
reasonable time. A heuristic function estimates how close a
state is to the goal. It is represented by h(n), and it
estimates the cost of an optimal path between the pair of
states. The value of the heuristic function is always positive.
iii. A heuristic is admissible if it never overestimates the true
cost. Admissibility is an important property and is required
for some search algorithms like A*.
iv. The heuristic function is a way to inform the search about the
direction to a goal. It provides an informed way to guess
which neighbor of a node will lead to a goal.
v. Admissible Heuristics for the 8-puzzle:

vi. [Figure: a sample start state of the 8-puzzle in which all eight tiles are out of position, together with the goal state.]
1. h1 : Number of misplaced tiles​.
a. In the above figure, all the tiles are out of
position, hence for this state, h1 = 8.
b. h1 is an admissible heuristic, since it is clear
that every tile that is out of position must be
moved at least once.
2. h2 : Sum of Euclidean distances of the tiles from
their goal positions
a. In the given figure, all the tiles are out of
position, hence for this state,
b. h2 = sqrt(5) + 1 + sqrt(2) + sqrt(2) + 2 +
sqrt(5) + sqrt(5) + 2 = 14.53.
c. h2 is an admissible heuristic, since in every
move one tile can only move one step closer to its
goal, and the Euclidean distance is never
greater than the number of steps required to
move a tile to its goal position.
3. h3 : Sum of Manhattan distances of the tiles from
their goal positions
a. In the given figure, all the tiles are out of
position, hence for this state,
b. h3 = 3 + 1 + 2 + 2 + 2 + 3 + 3 + 2 = 18.
c. h3 is an admissible heuristic, since in every
move, one tile can only move closer to its goal
by one step.
4. h4 : Number of tiles out of row + Number of tiles
out of column
a. In the given figure, all the tiles are out of
position, hence for this state,
b. h4 = 5 (out of row) + 8 (out of column) = 13.
c. h4 is an admissible heuristic, since every tile
that is out of column or out of row must be
moved at least once and every tile that is both
out of column and out of row must be moved at
least twice.
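Two of the heuristics above (h1 and h3) are easy to express in code. An illustrative Python sketch (not from the notes); a state is a tuple of 9 entries in row-major order with 0 standing for the blank, and the goal layout shown is an assumption:

GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)   # assumed goal configuration

def h1(state):
    # Number of misplaced tiles (the blank is not counted).
    return sum(1 for s, g in zip(state, GOAL) if s != 0 and s != g)

def h3(state):
    # Sum of Manhattan distances of the tiles from their goal positions.
    total = 0
    for i, tile in enumerate(state):
        if tile == 0:
            continue
        j = GOAL.index(tile)          # tile's goal position
        total += abs(i // 3 - j // 3) + abs(i % 3 - j % 3)
    return total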
7. Informed Search: Hill Climbing
a. Explain Hill Climbing and its drawbacks in detail. [May 2019,
May 2018, Dec 2017] [5mrks] [May 2017] [10mrks]
i. Hill climbing algorithm is a ​local search algorithm which
continuously moves in the direction of increasing
elevation/value to find the peak of the mountain or best
solution to the problem. It terminates when it reaches a ​peak
value where no neighbor has a higher value.
ii. Hill climbing algorithm is a technique which is used for
optimizing the ​mathematical problems​. One of the widely
discussed examples of Hill climbing algorithm is
Traveling-salesman Problem in which we need to minimize
the distance traveled by the salesman.
iii. It is also called ​greedy local search as it only looks to its
good immediate neighbor state and not beyond that.
iv. A node of hill climbing algorithm has two components which
are ​state and value.
v. Hill Climbing is mostly used when a good heuristic is
available.
vi. In this algorithm, we ​don't need to maintain and handle the
search tree or graph​ as it only keeps a single current state.
vii. Features:
1. Generate and Test variant​: Produces feedback which
helps to decide which direction to move in the search
space.
2. Greedy approach:​ Optimizes the cost.
3. No backtracking:​ Does not backtrack
viii. State Space:
1. On the Y-axis we have taken the function, which can be an
objective function or a cost function, and the state space on
the X-axis. If the function on the Y-axis is cost, then the
goal of the search is to find the global (or a local)
minimum. If the function on the Y-axis is an objective
function, then the goal of the search is to find the
global (or a local) maximum.

x. Regions:
1. Local Maximum​: Local maximum is a state which is
better than its neighboring states, but there is also
another state which is higher than it.
2. Global Maximum: Global maximum is the best possible
state of state space landscape. It has the highest value
of objective function.
3. Current state: It is a state in a landscape diagram
where an agent is currently present.
4. Flat local maximum: It is a flat space in the landscape
where all the neighboring states of current states have
the same value.
xi. Algorithm:
1. Step 1: Evaluate the initial state, if it is goal state
then return success and Stop.
2. Step 2: Loop Until a solution is found or there is no
new operator left to apply.
3. Step 3: Select and apply an operator to the current
state.
4. Step 4:​ Check new state:
a. If it is goal state, then return success and quit.
b. Else if it is better than the current state then
assign new state as a current state.
c. Else if not better than the current state, then
return to step 2.
5. Step 5:​ Exit.
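The loop above fits in a few lines of Python. An illustrative sketch (not from the notes); neighbours and value are assumed problem-specific functions:

def hill_climbing(start, neighbours, value):
    current = start
    while True:
        candidates = neighbours(current)
        if not candidates:
            return current
        best = max(candidates, key=value)   # greedy: best immediate neighbour only
        if value(best) <= value(current):
            return current                  # peak reached (possibly a local maximum)
        current = best                      # assign the better state as current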
xii. Limitations:
1. Local Maximum:
b. A local maximum is a peak state in the
landscape which is better than each of its
neighboring states, but there is another state
also present which is higher than the local
maximum.
c. Solution: Backtracking can be a solution to the
local maximum problem in the state space
landscape. Maintain a list of promising paths so
that the algorithm can backtrack in the search
space and explore other paths as well.
2. Plateau:

b. A plateau is a flat area of the search space in
which all the neighboring states of the current
state contain the same value; because of this, the
algorithm does not find any best direction to
move. A hill-climbing search might be lost in the
plateau area.
c. Solution: The solution for the plateau is to take
big steps or very little steps while searching.
Randomly select a state which is far away from
the current state, so it is possible that the
algorithm will find a non-plateau region.
3. Ridges:

b. A ridge is a special form of local maximum. It
is an area which is higher than its surrounding
areas but which itself has a slope, and it cannot
be reached in a single move.
c. Solution: With the use of bidirectional search,
or by moving in different directions, we can
improve this problem.
8. Informed Search: Simulated Annealing
a. Explain ​Hill climbing and Simulated Annealing with suitable
example. [Dec 2018, Dec 2017, May 2016] [10mrks]
i. SA is a global optimization technique.
ii. SA distinguishes between different local optima.
iii. SA is a memoryless algorithm: the algorithm does not use
any information gathered during the search.
iv. SA is motivated by an analogy with annealing in solids.
v. Simulated Annealing – an iterative improvement algorithm.
vi. Background: Annealing:
1. Simulated annealing is so named because of its
analogy to the process of physical annealing with
solids.
2. A crystalline solid is heated and then allowed to cool
very slowly until it achieves its most regular possible
crystal lattice configuration (i.e., its minimum lattice
energy state), and thus is free of crystal defects.
3. If the cooling schedule is sufficiently slow, the final
configuration results in a solid with superior
structural integrity.
4. Simulated annealing establishes the connection
between this type of thermodynamic behaviour and
the search for global minima for a discrete optimization
problem.
vii. Acceptance of a search step:
1. Assume the performance change in the search
direction is f(B) - f(A).
2. Always accept a descending step.
3. Accept an ascending step only if it passes a random
test, i.e. P(move A→B) = e^(-(f(B) - f(A)) / T).
viii. Stopping Criterion:
1. A given minimum value of the temperature has been
reached.
2. A certain number of iterations (or temperatures) has
passed without acceptance of a new solution.
3. A specified number of total iterations has been
executed
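The acceptance rule and stopping criterion above can be combined into a short illustrative Python sketch (not from the notes); here f is minimized, and the neighbour function, initial temperature and cooling rate are assumed parameters:

import math
import random

def simulated_annealing(start, f, neighbour, t0=1.0, cooling=0.995, t_min=1e-3):
    current, t = start, t0
    while t > t_min:                     # stop at a given minimum temperature
        candidate = neighbour(current)
        delta = f(candidate) - f(current)
        # Always accept a descending step; accept an ascending step
        # only with probability e^(-delta / T).
        if delta < 0 or random.random() < math.exp(-delta / t):
            current = candidate
        t *= cooling                     # cool slowly, as in physical annealing
    return current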
x. Simulated Annealing algorithms are usually better than
greedy algorithms, when it comes to problems that have
numerous locally optimum solutions.
xi. Simulated Annealing is not the best solution to circuit
partitioning or placement. Network flow approach to solving
these problems functions much faster.
xii. Simulated Annealing guarantees convergence upon running a
sufficiently large number of iterations.
9. Informed Search: Best First Search
a. The greedy best-first search algorithm always selects the path which
appears best at that moment. It is a combination of depth-first
search and breadth-first search, and it uses a heuristic
function to guide the search. Best-first search allows us to take the
advantages of both algorithms: with the help of best-first search, at
each step, we can choose the most promising node. In the best first
search algorithm, we expand the node which is closest to the goal
node, and the closeness is estimated by a heuristic function, i.e.
b. f(n) = h(n), where h(n) = estimated cost from node n to the goal.
c. The greedy best first algorithm is implemented by the priority
queue.
d. Advantages:
i. Best first search can switch between BFS and DFS by gaining
the advantages of both the algorithms.
ii. This algorithm is more efficient than BFS and DFS algorithms.
e. Disadvantages:
i. It can behave as an unguided depth-first search in the worst
case scenario.
ii. It can get stuck in a loop as DFS.
iii. This algorithm is not optimal.
f. Time Complexity: The worst case time complexity of greedy best-first
search is O(b^m).
g. Space Complexity: The worst case space complexity of greedy
best-first search is O(b^m), where m is the maximum depth of the
search space.
h. Complete: Greedy best-first search is also incomplete, even if the
given state space is finite.
i. Optimal:​ Greedy best first search algorithm is not optimal.
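A priority-queue sketch of greedy best-first search in Python (illustrative, not from the notes; h is the heuristic and graph an adjacency-list dict):

import heapq

def greedy_best_first(graph, start, goal, h):
    # Priority queue ordered by h(n) alone: always expand the node
    # that appears closest to the goal.
    frontier = [(h(start), [start])]
    visited = set()
    while frontier:
        _, path = heapq.heappop(frontier)
        node = path[-1]
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        for neighbour in graph.get(node, []):
            heapq.heappush(frontier, (h(neighbour), path + [neighbour]))
    return None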
10. Informed Search: A*
a. Explain A* Algorithm with an example. [May 2019, Dec
2018, May 2018, May 2017] [5mrks]
i. A* Tree Search, or simply A* Search, combines the
strengths of uniform-cost search and greedy search. In this
search, the heuristic is the summation of the cost in UCS,
denoted by g(x), and the cost in greedy search, denoted by
h(x); the combined cost is denoted by f(x) = g(x) + h(x).
ii. Heuristic: ​The following points should be noted wrt
heuristics in A* search.
1. Here, h(x) is called the ​forward cost​, and is an
estimate of the distance of the current node from the
goal node.
2. And, g(x) is called the ​backward cost​, and is the
cumulative cost of a node from the root node.
3. A* search is optimal only when for all nodes, the
forward cost for a node h(x) underestimates the actual
cost h*(x) to reach the goal. This property of A*
heuristic is called ​admissibility​.
iii. Strategy: ​Choose the node with lowest f(x) value.
iv. Complete
1. Yes, unless there are infinitely many nodes with f <=
f(G)
v. Time
1. Exponential in [relative error of h x length of soln]
2. The better the heuristic, the better the time
3. Best case h is perfect, O(d)
4. Worst case: h = 0, O(b^d), same as BFS
vi. Space
1. Keeps all nodes in memory and save in case of
repetition
2. This is O(b^d) or worse
3. A* usually runs out of space before it runs out of time
vii. Optimal
1. Yes; A* cannot expand nodes with the next higher f value until all nodes with the current f value are finished
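A compact Python sketch of A* (illustrative, not from the notes); graph maps each node to a dict {neighbour: step_cost}, and h is assumed to be an admissible heuristic:

import heapq

def a_star(graph, start, goal, h):
    # Priority queue ordered by f(n) = g(n) + h(n).
    frontier = [(h(start), 0, [start])]       # entries are (f, g, path)
    best_g = {start: 0}
    while frontier:
        f, g, path = heapq.heappop(frontier)
        node = path[-1]
        if node == goal:
            return path, g                    # solution path and its cost
        for neighbour, cost in graph.get(node, {}).items():
            g2 = g + cost                     # backward (cumulative) cost so far
            if g2 < best_g.get(neighbour, float('inf')):
                best_g[neighbour] = g2
                heapq.heappush(frontier,
                               (g2 + h(neighbour), g2, path + [neighbour]))
    return None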
b. Differentiate between Informed & Uninformed search
techniques. [Dec 2018, May 2016] [5mrks]
i.
Basis for comparison   Informed Search                           Uninformed Search
Basic                  Uses knowledge to find the steps          No use of knowledge.
                       to the solution.
Called as              Heuristic search.                         Blind search.
Efficiency             Highly efficient: consumes less           Efficiency is moderate.
                       time and cost.
Cost                   Low.                                      Comparatively high.
Performance            Finds the solution more quickly.          Slower than informed search.
Problems               Most problems can be solved by            Many problems cannot be solved by
                       informed search.                          uninformed search.
Size                   Informed search can handle large          Uninformed search cannot handle
                       search problems.                          large search problems.
Algorithms             Heuristic depth-first and                 Depth-first search, breadth-first
                       breadth-first search, and A* search.      search and lowest cost first search.
11. Constraint Satisfaction Programming: Cryptarithmetic
a. Solve following cryptarithmetic problem: SEND + MORE =
MONEY [May 2019] [4mrks]
i. To begin, start in the 5th column. Since 9999 + 9999 <
20000, we must have M = 1.
ii. Then go to the 4th column. Since 999 + 999 < 2000, we
either have 1 + S + 1 = O + 10, or S + 1 = O + 10, meaning
S = O + 8 or S = O + 9, and O = 0 or 1. Since S is a single
digit, and M = 1, we must have O = 0.
iii. In the 3rd column, since E cannot equal N, we cannot have E
+ 0 = N. Thus we must have 1 + E + 0 = N. Since N cannot
be 0, we must also have E less than 9. So there cannot be
carryover in this column, and the 2nd column must have
carryover.
iv. Returning to the 4th column (which has no carryover from
the 3rd), we must have S + 1 = 10, which means S = 9.
v. Now we know 1 + E = N, and there must be a carryover from
the 2nd column. So we have two cases: N + R = E + 10, or
N + R + 1 = E + 10. We can substitute 1 + E = N in both
cases to get (1 + E) + R = E + 10 –> R = 9 (but 9 is already
taken), or we have 1 + E + R + 1 = E + 10 –> R = 8. So we
must have R = 8.
vi. Now in the units column D + E = Y, and it must have
carryover. Since Y cannot be a 0 or 1, we need D + E ≥ 12.
Since 9 and 8 are taken for S and R, we can have 5 + 7 = 12
or 6 + 7 = 13. So either D = 7 or E = 7.
vii. If E = 7, then E + 1 = N so N = 8–which is not possible since
R = 8. So we must have D = 7, meaning E is either 5 or 6.
viii. If E = 6, then N = 7 which is not possible as D = 7. So we
must have E = 5 and N = 6. This means D + E = 7 + 5 = 12,
and thus Y = 2.
ix. So we have solved for all the letters!
x. SEND + MORE = 9567 + 1085 = 10652.
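The deduction above can be cross-checked by brute force. An illustrative Python sketch (not from the notes) that tries every assignment of distinct digits to the eight letters:

from itertools import permutations

def solve_send_more_money():
    letters = 'SENDMORY'
    for digits in permutations(range(10), len(letters)):
        a = dict(zip(letters, digits))
        if a['S'] == 0 or a['M'] == 0:
            continue                      # leading digits cannot be zero
        send = int(''.join(str(a[c]) for c in 'SEND'))
        more = int(''.join(str(a[c]) for c in 'MORE'))
        money = int(''.join(str(a[c]) for c in 'MONEY'))
        if send + more == money:
            return send, more, money

print(solve_send_more_money())  # (9567, 1085, 10652)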
b. Solve the given crypt arithmetic puzzle. [Dec 2018, Dec
2016] [4mrks]
i. A = 4, B = 7, S = 8, E = 3, L = 5, G = 1, and M = 9
ii. Explanation​:
iii. Here we are provided with the following information

BASE

+BALL

------------

GAMES

iv. There are seven distinct letters, each standing for one of the
ten digits [0-9]: A, B, S, E, L, G, M.
v. As we are just adding two numbers, the possible carry-overs are
either 1 or 0. But when we add B + B it produces a carry
greater than zero, as the two 4-digit numbers add up to form a
five-digit number. Hence G can't be zero. So, G = 1.
vi. Since the two B's added together give a value greater than or
equal to 10, B must be greater than or equal to 5.
Consider B = 9 first: if B = 9 then A = 8, which is not possible,
as then B + B + 1 does not equal A. If B = 8 then A = 6, which
is not possible for the same reason.
vii. Take B = 7; then A = 4, which is possible. So M = A + A + x,
where x is the carry-over from S + L + y, and y is the
carry-over from E + L, knowing that x and y can take a
maximum value of 1 and a minimum value of 0.
viii. We can say M should be either 8 or 9. 8 when x=0. 9 when
x=1.
ix. Assume M = 8, that is x = 0. Then S + L + y = E and
E + L = 10*y + S. Substituting, S + L + y + L = 10*y + S, so
2*L = 9*y, where y can be either 0 or 1.
x. If y = 0 then L = 0, and then S and E are not distinct, which
is not allowed by the problem.
xi. If y = 1 then L = 4.5, which is again not possible, as L must
be a single digit from 0 to 9.
xii. So, M=8 is not possible, ​M=9
xiii. Now S + L = 10*1 + E = 10 + E, and E + L = 10*y + S, where
y can take maximum and minimum values of 1 and 0 respectively.
xiv. If y = 0, then E + L = S, i.e. S - L = E, and S + L = 10 + E.
xv. Subtracting the first equation from the second we get
2*S = 2*E + 10, or S = E + 5.
xvi. Adding both equations we get 2*L = 10, or L = 5.
xvii. Check: E + 5 = S, and S + 5 + 0 = E + 10*1, i.e. S + 5 = E + 10,
so S = E + 5, which is consistent.
xviii. Now we have to choose digits from 0-9 other than 1, 4, 7, 9
and 5. So we can take values of E and S from {0, 2, 3, 6, 8}
such that S = E + 5. If E = 0, S = 5 is already taken; if
E = 2, S = 7 is taken; if E = 6, S = 11 is not a digit; if
E = 8, S = 13 is not a digit. So the only possible pair
left is 3 and 8, which satisfies our constraints.
xix. Hence, ​S=8 and E=3
c. Solve the given crypt arithmetic puzzle. [May 2018] [4mrks]
i. logic+logic=prolog (7 variables)
ii. 90452+90452=180904
d. Solve the given crypt arithmetic puzzle. [Dec 2017] [4mrks]
i. TWO + TWO = FOUR
ii. 734+734=1468
12. Constraint Satisfaction Programming: Map Coloring
13. Constraint Satisfaction Programming: N-Queens.
a. Formulate 8-Queen problem [May 2018] [4mrks]
i. The eight queens puzzle is the problem of placing eight chess
queens on an 8 x 8 chessboard so that no two queens attack
each other.
ii. Thus, a solution requires that no two queens share the same
row, column, or diagonal.
iii. The eight queens puzzle is an example of the more general
n-queens problem of placing n queens on an n x n
chessboard, where solutions exist for all natural numbers n
with the exception of 1, 2 and 3.
iv. The solution possibilities have been discovered only up to 23 queens.
v. States: Any arrangement of 0 to 8 queens on the board
vi. Initial state: 0 queens on the board
vii. Successor function: add a queen in any square
viii. Goal test: 8 queens on the board, none attacked
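The formulation above maps directly onto a small backtracking sketch in Python (illustrative, not from the notes); a partial state is a tuple of column positions, one per placed row:

def place_queens(n=8, cols=()):
    # cols[r] is the column of the queen already placed in row r.
    row = len(cols)
    if row == n:
        return cols                  # goal test: n queens placed, none attacked
    for col in range(n):             # successor function: add a queen in the next row
        if all(col != c and abs(col - c) != row - r
               for r, c in enumerate(cols)):   # column and diagonal checks
            solution = place_queens(n, cols + (col,))
            if solution:
                return solution
    return None

print(place_queens())  # (0, 4, 7, 5, 2, 6, 1, 3)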

14. Adversarial Search: Game Playing
a. Write a short note on game playing [May 2017] [5mrks]
i. Game Playing is an important domain of artificial intelligence.
Games don’t require much knowledge; the only knowledge
we need to provide is the rules, legal moves and the
conditions of winning or losing the game.
ii. Defined by
1. Initial state
2. Successor function
3. Goal test
4. Path cost / utility / payoff function
iii. Types of games:
1. Toy problems: 8-Queens, Vacuum world, Missionaries and Cannibals
2. Real world: Travelling salesman, Robot navigation,
assembly
iv. Types of game types:
1. Deterministic or Probabilistic
2. Perfect or Approximate information
v. Eg:
                 Exact information   Approximate information
Deterministic    Chess, Checkers     Battleship, card games
Probabilistic    Ludo, Monopoly      Scrabble, Poker
15. Adversarial Search: Min-Max Search
a. Explain Min-Max & Alpha-Beta pruning for adversarial search
with an example [May 2017] [10mrks]
i. Adversarial Search:
1. Adversarial search is a search, where we examine the
problem which arises when we try to plan ahead of the
world and other agents are planning against us.
2. There might be some situations where more than one
agent is searching for the solution in the same search
space, and this situation usually occurs in game
playing.
3. The environment with more than one agent is termed
as ​multi-agent environment​, in which each agent is
an opponent of other agent and playing against each
other. Each agent needs to consider the action of other
agents and effect of that action on their performance.
4. So, ​Searches in which two or more players with
conflicting goals are trying to explore the same
search space for the solution, are called
adversarial searches, often known as Games​.
ii. Min-Max:
1. Mini-max algorithm is a recursive or backtracking
algorithm which is used in decision-making and game
theory. It provides an optimal move for the player
assuming that opponent is also playing optimally.
2. Mini-Max algorithm uses recursion to search through
the game-tree.
3. The Min-Max algorithm is mostly used for game playing in
AI, such as chess, checkers, tic-tac-toe, Go, and
various two-player games. This algorithm computes
the minimax decision for the current state.
4. In this algorithm two players play the game, one is
called MAX and other is called MIN.
5. Each player plays so that the opponent gets the
minimum benefit while they themselves get the maximum
benefit.
6. Both Players of the game are opponent of each other,
where MAX will select the maximized value and MIN
will select the minimized value.
7. The minimax algorithm performs a depth-first search
algorithm for the exploration of the complete game
tree.
8. The minimax algorithm proceeds all the way down to
the terminal nodes of the tree, then backtracks up the
tree as the recursion unwinds.
iii. Algorithm (the pseudocode from the notes, cleaned up into runnable Python; the node methods are an assumed interface):

def minimax(node, depth, maximizing_player):
    if depth == 0 or node.is_terminal():
        return node.static_evaluation()
    if maximizing_player:                          # for the Maximizer player
        max_eva = float('-inf')
        for child in node.children():
            eva = minimax(child, depth - 1, False)
            max_eva = max(max_eva, eva)            # gives maximum of the values
        return max_eva
    else:                                          # for the Minimizer player
        min_eva = float('inf')
        for child in node.children():
            eva = minimax(child, depth - 1, True)
            min_eva = min(min_eva, eva)            # gives minimum of the values
        return min_eva
16. Adversarial Search: Alpha Beta Pruning
a. Consider the given tree. Apply alpha-beta pruning on the
following example: [May 2019, May 2016] [10mrks]

ii. Alpha-Beta pruning is not actually a new algorithm, rather an
optimization technique for minimax algorithm.
iii. It reduces the computation time by a huge factor.
iv. This allows us to search much faster and even go into deeper
levels in the game tree.
v. It cuts off branches in the game tree which need not be
searched because there already exists a better move
available.
vi. It is called Alpha-Beta pruning because it passes 2 extra
parameters in the minimax function, namely alpha and beta.
vii. Let’s define the parameters alpha and beta.
1. Alpha is the best value that the ​maximizer currently
can guarantee at that level or above.
2. Beta is the best value that the ​minimizer currently
can guarantee at that level or above.
viii. Algorithm (the pseudocode from the notes, cleaned up into runnable Python; note the recursive call is on each child node, and the node methods are an assumed interface):

def minimax_ab(node, depth, is_maximizing_player, alpha, beta):
    if node.is_leaf():
        return node.value()
    if is_maximizing_player:
        best_val = float('-inf')
        for child in node.children():
            value = minimax_ab(child, depth + 1, False, alpha, beta)
            best_val = max(best_val, value)
            alpha = max(alpha, best_val)
            if beta <= alpha:
                break                # beta cut-off: prune the remaining children
        return best_val
    else:
        best_val = float('inf')
        for child in node.children():
            value = minimax_ab(child, depth + 1, True, alpha, beta)
            best_val = min(best_val, value)
            beta = min(beta, best_val)
            if beta <= alpha:
                break                # alpha cut-off: prune the remaining children
        return best_val
b. Apply alpha beta pruning & min-max search on given game
tree and find which is the next move [May 2018] [10mrks]

c. Apply alpha beta pruning & min-max search on the given game
tree. Square is a max node; circle is a min node [Dec 2017]
[10mrks]
i. [Game-tree figure and worked solution omitted]
d. [Dec 2016] [10mrks]
i. [Game-tree figure omitted]
Module III: Knowledge and Reasoning

1. A Knowledge Based Agent
a. Knowledge bases agent [Dec 2018] [5mrks]
i. An intelligent agent needs knowledge about the real world for
taking decisions and reasoning to act efficiently.
ii. Knowledge-based agents are those agents who have the
capability of ​maintaining an internal state of knowledge,
reason over that knowledge, update their knowledge
after observations and take actions. These agents can
represent the world with some formal representation
and act intelligently​.
iii. Knowledge-based agents are composed of two main parts:
1. Knowledge-base and
2. Inference system​.
iv. A knowledge-based agent must be able to do the following:
1. Represent states, actions, etc.
2. Incorporate new percepts.
3. Update the internal representation of the world.
4. Deduce hidden properties of the world from the
internal representation.
5. Deduce appropriate actions.
v. Architecture:
1. [Figure: generalized architecture of a knowledge-based agent omitted]
2. The diagram above represents a generalized
architecture for a knowledge-based agent. The
knowledge-based agent (KBA) takes input from the
environment by perceiving it. The input is passed to
the inference engine, which communicates with the KB
to decide on an action as per the knowledge stored in
the KB. The learning element of the KBA regularly
updates the KB by learning new knowledge.
vi. Knowledge base: The knowledge base is a central component
of a knowledge-based agent; it is also known as the KB. It is
a collection of sentences expressed in a knowledge
representation language.
vii. Inference system:
1. Forward chaining
2. Backward chaining
viii. Operations Performed by KBA:
1. TELL: This operation tells the knowledge base what it
perceives from the environment.
2. ASK: This operation asks the knowledge base what
action it should perform.
3. Perform:​ It performs the selected action.
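ix. A runnable toy sketch of this TELL-ASK loop (the set-based KB and the sentence encoding here are my own illustration, not a specific library API):

    class KB:
        def __init__(self):
            self.sentences = set()
        def tell(self, sentence):
            self.sentences.add(sentence)
        def ask(self, query):
            # A real inference system would reason over the sentences here;
            # this stub just returns a canned action.
            return "NoOp"

    def kb_agent(kb, percept, t):
        kb.tell(("percept", percept, t))   # TELL: what the agent perceives at time t
        action = kb.ask(("action?", t))    # ASK: what should be done now
        kb.tell(("did", action, t))        # TELL: record the chosen action
        return action                      # Perform: execute the selected action

    print(kb_agent(KB(), "stench", 0))     # NoOp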
2. Overview of Propositional Logic
a. Explain limitations of propositional logic with suitable
example. [Dec 2018, Dec 2017, Dec 2016] [4mrks]
i. Propositional logic has several limitations. First, it
applies only to statements that are strictly true or false.
Second, there are limits to what can be translated into
purely propositional logic: some valid arguments cannot be
expressed in it at all. For example:
1. Cannot represent individual entities. Eg: Merra is short
[FOL: short(Merra)]
2. Cannot represent generalization, specialization or
patterns. Eg: Triangles have 3 sides [FOL:
noofsides(Triangle, 3)]
3. Complex Statement:
a. Premise 1) All dogs like running.
b. Premise 2) Sam is a dog.
c. Conclusion) Sam likes running.
ii. The argument is valid, but it takes a more expressive
logical system to translate this argument form: predicate
logic, with its universal and existential quantifiers. In
propositional logic the premises and conclusion would each
be assigned a distinct sentence letter, and the argument
would come out invalid by truth table or derivation, since
A and B do not necessarily imply C.
iii. Necessity and possibility are also not captured in
propositional logic (PL). Necessarily 2 + 2 = 4, one might
say, and if something is necessary then it is surely
possible; modal notions like these cannot be expressed in
PL alone. The broader point is that PL formalizes only one
system of reasoning, and that system is not the only way to
reason. Meaning is separated from syntax in logic, so a
logical system needs adequate operators to deal with
different arguments and problems of meaning. PL lacks such
operators and cannot adequately handle several issues. For
example, in PL:
iv. A conditional statement with a false antecedent is
automatically true, regardless of the truth of the
consequent, simply because of how the rules of the system
are formulated. Also, a contradiction implies anything in
PL, so only a multivalued logic could accommodate such
cases. Saying someone is bald when they just took a haircut
could seem both true and false, which would require
statements that are both T and F, or neither, as well as
one or the other. So, when real-world meaning is attached
to the logical symbols and operators, there is a clear
limit to what PL offers.
b. Explain modus ponen with suitable example [Dec 2017, May
2016] [4mrks]
i. The Modus Ponens rule is one of the most important rules of
inference. It states that if P and P → Q are true, then we
can infer that Q will be true. It can be represented as:
ii. P, P → Q ⊢ Q
iii. Example:
1. Statement-1: "If I am sleepy then I go to bed" ==>
P→ Q
2. Statement-2: "I am sleepy" ==> P
3. Conclusion: "I go to bed." ==> Q.
4. Hence, we can say that, if P→ Q is true and P is true
then Q will be true.
iv. Proof by Truth table:
1.
P | Q | P ⇒ Q
0 | 0 |   1
0 | 1 |   1
1 | 0 |   0
1 | 1 |   1
2. In the only row where P = 1 and P ⇒ Q = 1 (the last
row), Q = 1 as well, so the rule is valid.
c. Explain modus tollens with suitable example
i. The Modus Tollens rule states that if P → Q is true and ¬Q
is true, then ¬P will also be true. It can be represented as:
ii. P → Q, ¬Q ⊢ ¬P
iii. Example:
1. Statement-1: "If I am sleepy then I go to bed" ==>
P→ Q
2. Statement-2: "I do not go to the bed."==> ~Q
3. Statement-3: Which infers that "​I am not sleepy​" =>
~P
iv. Proof by Truth table:
1.
P | Q | ¬P | ¬Q | P ⇒ Q
0 | 0 |  1 |  1 |   1
0 | 1 |  1 |  0 |   1
1 | 0 |  0 |  1 |   0
1 | 1 |  0 |  0 |   1
2. In the only row where P ⇒ Q = 1 and ¬Q = 1 (the first
row), ¬P = 1 as well, so the rule is valid.
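v. Both rules can also be verified mechanically by enumerating the truth table, e.g. in Python:

    from itertools import product

    implies = lambda a, b: (not a) or b     # truth function of P => Q
    for P, Q in product([False, True], repeat=2):
        if P and implies(P, Q):
            assert Q          # modus ponens: from P and P => Q, infer Q
        if implies(P, Q) and not Q:
            assert not P      # modus tollens: from P => Q and not Q, infer not P
    print("both rules hold in all four rows")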
d. [Dec 2016, May 2016] [10mrks]
i. [Question figure omitted]
3. First Order Predicate Logic
a. Explain predicate logic [May 2017] [5mrks]
i. First-order logic is another way of knowledge representation
in artificial intelligence. It is an extension to propositional
logic.
ii. FOL is sufficiently expressive to represent the natural
language statements in a concise way.
iii. First-order logic is also known as Predicate logic or
First-order predicate logic. First-order logic is a powerful
language that represents information about objects in a
natural way and can also express the relationships between
those objects.
iv. First-order logic (like natural language) does not only assume
that the world contains facts like propositional logic but also
assumes the following things in the world:
1. Objects: A, B, people, numbers, colors, wars,
theories, squares, pits, wumpus, ......
2. Relations: It can be a unary relation such as: red,
round, is adjacent; or an n-ary relation such as: the
sister of, brother of, has color, comes between
3. Function: Father of, best friend, third inning of, end
of, ......
v. As a natural language, first-order logic also has two main
parts:
1. Syntax
2. Semantics
vi. Elements:
1.
Constant: 1, 2, A, John, Mumbai, cat, ...
Variables: x, y, z, a, b, ...
Predicates: Brother, Father, >, ...
Function: sqrt, LeftLegOf, ...
Connectives: ∧, ∨, ¬, ⇒, ⇔
Equality: ==
Quantifier: ∀, ∃
vii. Atomic sentences:
1. Atomic sentences are the most basic sentences of
first-order logic. These sentences are formed from a
predicate symbol followed by a parenthesis with a
sequence of terms.
2. Eg: Ravi and Ajay are brothers: => Brothers(Ravi,
Ajay).
viii. Complex Sentences:
1. Complex sentences are made by combining atomic
sentences using connectives.
ix. First-order logic statements can be divided into two
parts:
1. Subject​: Subject is the main part of the statement.
2. Predicate​: A predicate can be defined as a relation,
which binds two atoms together in a statement.
x. Quantifiers in First-order logic:
1. A quantifier is a language element which generates
quantification, and quantification specifies the quantity
of specimens in the universe of discourse.
2. These are the symbols that permit to determine or
identify the range and scope of the variable in the
logical expression. There are two types of quantifier:
a. Universal Quantifier ∀ (for all, everyone,
everything)
b. Existential Quantifier ∃ (for some, at least
one)
b. What do you mean by Quantifier and its types? Explain with
an example [Dec 2016] [5mrks]
i. Universal quantification
1. ∀〈variables〉〈sentence〉
∀x P is true in a model ​m iff P is true with ​x being
each​ possible object in the model
2. Example: “Everyone at Berkeley is smart.”
∀x At(x, Berkeley) ⇒ Smart(x)
ii. Existential quantification
1. ∃〈variables〉〈sentence〉
∃x P is true in a model ​m iff P is true with ​x being
some​ possible object in the model
2. Example: “Someone at Stanford is smart.”
∃x At(x, Stanford) ∧ Smart(x)
c. Difference between propositional and predicate logic. [May
2019, May 2016] [4mrks]
i.
Sr | Propositional Logic | Predicate (First Order) Logic
1 | Uses propositions, in which a complete sentence is denoted by a symbol | Uses predicates, which involve constants, variables, functions and relations
2 | Cannot represent individual entities. Eg: "Merra is short" | Can represent individual entities. Eg: short(Merra)
3 | Cannot represent generalization, specialization or patterns. Eg: "Triangles have 3 sides" | Can represent generalization, specialization or patterns. Eg: noofsides(Triangle, 3)
4 | Follows truth tables | Follows interpretations
5 | Inference algorithms: DPLL, GSAT | Inference algorithms: forward and backward chaining
6 | Associated with finite models | Associated with infinite models
d. Represent the following sentences in FOPL [May 2019] [10mrks]
i. Everyone who loves all animals is loved by someone
1. ∀x [∀y Animal(y) → Loves(x,y)] → [∃y Loves(y,x)]
ii. Ravi likes all kind of food
1. ∀x : food(x) → likes (Ravi, x)
iii. Every gardener likes the sun.
1. ∀x : gardener(x) → likes(x,Sun)
iv. Everybody loves somebody.
1. ∀x∃y L(x,y)
v. Apples are food
1. food(apple)
e. Represent the following sentences in FOPL [May 2017]
[10mrks]
i. Every gardener likes the sun.
1. ∀x : gardener(x) → likes(x,Sun)
ii. You can fool some of the people all the time
1. ∃x∀t (person(x) ∧ time(t) => can-fool(x,t))
iii. All purple mushrooms are poisonous
1. ∀x (mushroom(x) ∧ purple(x) => poisonous(x))
iv. Every student who takes french passes it
1. ∀x∀y:(𝖲𝗍𝗎𝖽𝖾𝗇𝗍(x)∧𝖥𝗋𝖾𝗇𝖼𝗁(y)∧𝖳𝖺𝗄𝖾(x,y))⟹𝖯𝖺𝗌𝗌(x,y)
v. No person buys an expensive policy
1. ∀x,y Person(x) ∧ Policy(y) ∧ Expensive(y) ⇒
¬Buys(x,y)
f. CNF [May 2019] [5mrks]
i. It is called Conjunctive Normal Form or Clause Normal
Form.
ii. Clause Normal Form (CNF) is a sub-language of first-order
logic. A clause is an expression of the form L1 | ... | Lm
where each Li is a literal.
iii. It is a conjunction of one or more clauses.
iv. A clause is a disjunction of literals.
v. Used in automated theorem proving and circuit verification.
vi. A CNF formula is a conjunction of disjunctions of literals
(clauses).
vii. Convert FOL to CNF:
1. STEP 1: Eliminate '=>' and '<=>'
   A => B    rewrites to    ¬A ∨ B
   A <=> B   rewrites to    (A => B) ∧ (B => A)
2. STEP 2: Move ¬ inwards
   ¬∀x P     rewrites to    ∃x ¬P
   ¬∃x P     rewrites to    ∀x ¬P
   ¬(A ∨ B)  rewrites to    ¬A ∧ ¬B
   ¬(A ∧ B)  rewrites to    ¬A ∨ ¬B
   ¬¬A       rewrites to    A
3. STEP 3: Rename variables
   All variables must be renamed apart.
4. STEP 4: Replace existential quantifiers by Skolem
constants (or by Skolem functions of the enclosing
universally quantified variables)
   ∃x Rich(x)   rewrites to    Rich(G1)
5. STEP 5: Drop universal quantifiers
   ∀x: food(x) => likes(Ram, x)   rewrites to   ¬food(x) ∨ likes(Ram, x)
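viii. Worked example (the sentence is chosen here purely for illustration): convert ∀x [Person(x) => ∃y Loves(x, y)] to CNF.
1. Eliminate '=>': ∀x [¬Person(x) ∨ ∃y Loves(x, y)]
2. ¬ is already innermost, and the variables are already apart.
3. Skolemize: y lies inside the scope of ∀x, so it is replaced by a Skolem function of x: ∀x [¬Person(x) ∨ Loves(x, F(x))]
4. Drop ∀: ¬Person(x) ∨ Loves(x, F(x)), which is a single clause.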
4. Inference in First Order Predicate Logic: Forward and
Backward Chaining
a. Short notes on Forward chaining & Backward chaining [May
2019, Dec 2017, May 2017] [5mrks]
i. Forward chaining
1. It is a data driven method of deriving a particular goal
from a given knowledge base and set of inference
rules.
2. Inference rules are applied by matching facts to the
antecedents of consequence relations in the knowledge
base.
3. The application of inference rules results in new
knowledge (from the consequences of the relations
matched), which is then added to the knowledge base
4. It is a strategy of an expert system to answer the
question, “What can happen next?”
5. [Figure: forward-chaining data flow omitted]
6. When applying forward chaining, the first step is to
take the facts in the fact database and see if any
combination of these matches all the antecedents of
one of the rules in the rule database.
7. When all the antecedents of a rule are matched by
facts in the database, then this rule is triggered.
8. Usually, when a rule is triggered, it is then fired, which
means its conclusion is added to the facts database. If
the conclusion of the rule that has fired is an action or
a recommendation, then the system may cause that
action to take place or the recommendation to be
made.
9. Example:
10."As per the law, it is a crime for an American to
sell weapons to hostile nations. Country A, an
enemy of America, has some missiles, and all the
missiles were sold to it by Robert, who is an
American citizen."
11. Prove that ​"Robert is criminal."
12.To solve the above problem, first, we will convert all
the above facts into first-order definite clauses, and
then we will use a forward-chaining algorithm to reach
the goal.
13. Facts Conversion into FOL:
a. It is a crime for an American to sell weapons to
hostile nations. (Let's say p, q, and r are
variables)
​American (p) ∧ weapon(q) ∧ sells (p, q, r)
∧ hostile(r) → Criminal(p) ...(1)
b. Country A has some missiles. ​∃p Owns(A, p)
∧ Missile(p)​. It can be written in two definite
clauses by using Existential Instantiation,
introducing new Constant T1.
​Owns(A, T1) ......(2)
​Missile(T1) .......(3)
c. All of the missiles were sold to country A by
Robert.
​∀p Missiles(p) ∧ Owns (A, p) → Sells
(Robert, p, A) ......(4)
d. Missiles are weapons.
​Missile(p) → Weapons (p) .......(5)
e. Enemy of America is known as hostile.
​Enemy(p, America) →Hostile(p)
........(6)
f. Country A is an enemy of America.
​Enemy (A, America) .........(7)
g. Robert is American
​American(Robert). ..........(8)
14. Forward chaining proof:
a. Step-1: In the first step we will start with the
known facts and will choose the sentences which
do not have implications, such as:
American(Robert), Enemy(A, America),
Owns(A, T1), and Missile(T1). All these facts
will be represented as below.
b. [Figure: facts at the first iteration omitted]
c. Step-2: At the second step, we will add the facts
which can be inferred from the available facts and
whose premises are satisfied.
i. Rule-(1)'s premises are not yet satisfied, so
it does not fire in the first iteration.
ii. Rule-(2) and (3) are already added.
iii. Rule-(4) satisfied with the substitution
{p/T1}, ​so Sells (Robert, T1, A) is
added, which infers from the conjunction
of Rule (2) and (3).
iv. Rule-(6) is satisfied with the
substitution (p/A), so Hostile(A) is added,
which infers from Rule-(7).
d. [Figure: facts after the second iteration omitted]
e. Step-3: At step-3, we can check that Rule-(1) is
satisfied with the substitution {p/Robert,
q/T1, r/A}, so we can add Criminal(Robert),
which infers from all the available facts. And
hence we reach our goal statement.
f. [Figure: complete forward-chaining graph omitted]
g. Hence it is proved that Robert is a criminal
using the forward-chaining approach.
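h. The same fixed-point idea can be sketched in a few lines of Python (a propositionalized toy version of this example; the rule format and names are my own):

    def forward_chain(rules, facts, goal):
        facts = set(facts)
        changed = True
        while changed:
            changed = False
            for premises, conclusion in rules:
                # Fire a rule once all of its premises are known facts.
                if conclusion not in facts and premises <= facts:
                    facts.add(conclusion)
                    changed = True
        return goal in facts

    rules = [
        ({"American(Robert)", "Weapon(T1)", "Sells(Robert,T1,A)", "Hostile(A)"},
         "Criminal(Robert)"),
        ({"Missile(T1)", "Owns(A,T1)"}, "Sells(Robert,T1,A)"),
        ({"Missile(T1)"}, "Weapon(T1)"),
        ({"Enemy(A,America)"}, "Hostile(A)"),
    ]
    facts = {"American(Robert)", "Missile(T1)", "Owns(A,T1)", "Enemy(A,America)"}
    print(forward_chain(rules, facts, "Criminal(Robert)"))   # True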

ii. Backward chaining [Dec 2018] [5mrks]


1. Backward chaining is a goal driven method of deriving
a particular goal from a given knowledge base and set
of inference rules.
2. Inference rules are applied by matching the goal of the
search to the consequences of the relations stored in
the knowledge base.
3. It is a strategy of an expert system to answer the
question, “Why did this happen?”
4. [Figure: backward-chaining data flow omitted]
5. Backward chaining starts with the goal state, which is
the set of conditions the agent wishes to achieve in
carrying out its plan. It now examines this state and
sees what actions could lead to it.
6. For example, if the goal state involves a block being on
a table, then one possible action would be to place that
block on the table. This action might not be possible
from the start state, and so further actions need to be
added before this action in order to reach it from the
start state. In this way, a plan can be formulated
starting from the goal and working back toward the
start state.
7. Example:
8. Backward-Chaining proof:
a. Step-1: At the first step, we will take the goal
fact. From the goal fact, we will infer other
facts, and at last we will prove those facts true.
So our goal is "Robert is criminal," and the
following is its predicate.
b. [Figure: goal predicate Criminal(Robert) omitted]
c. Step-2: ​At the second step, we will infer other
facts form goal fact which satisfies the rules. So
as we can see in Rule-1, the goal predicate
Criminal (Robert) is present with substitution
{Robert/P}. So we will add all the conjunctive
facts below the first level and will replace p with
Robert. ​Here we can see American (Robert)
is a fact, so it is proved here.
d. [Figure omitted]
e. Step-3: At step-3, we extract the further
subgoal Missile(q) from Weapon(q), as it
satisfies Rule-(5). Weapon(q) is also true with
the substitution of the constant T1 for q.
f. [Figure omitted]
g. Step-4: ​At step-4, we can infer facts Missile(T1)
and Owns(A, T1) form Sells(Robert, T1, r) which
satisfies the ​Rule- 4​, with the substitution of A
in place of r. So these two statements are
proved here.
h. [Figure omitted]
i. Step-5: At step-5, we can infer the fact
Enemy(A, America) from Hostile(A), which
satisfies Rule-(6). And hence all the statements
are proved true using backward chaining.
j. [Figure: complete backward-chaining proof omitted]
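k. A goal-directed sketch of the same idea in Python (same toy rule format as in the forward-chaining sketch above):

    def backward_chain(rules, facts, goal, seen=None):
        seen = set() if seen is None else seen
        if goal in facts:                 # the goal is a known fact
            return True
        if goal in seen:                  # guard against circular rules
            return False
        seen.add(goal)
        # Try every rule that concludes the goal; prove all of its premises first.
        return any(conclusion == goal and
                   all(backward_chain(rules, facts, p, seen) for p in premises)
                   for premises, conclusion in rules)

    # Reusing rules and facts from the forward-chaining sketch:
    # backward_chain(rules, facts, "Criminal(Robert)")  ->  True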
5. Inference in First Order Predicate Logic: Resolution
a. Illustrate the Resolution Proof for the given statement. [May
2019, May 2016] [10mrks]
1. Resolution is a theorem-proving technique that
proceeds by building refutation proofs, i.e., proofs
by contradiction. It was invented by the mathematician
John Alan Robinson in 1965.
2. Resolution is used when various statements are
given and we need to prove a conclusion from those
statements. Unification is a key concept in proofs by
resolution. Resolution is a single inference rule which
can efficiently operate on the conjunctive normal form
or clausal form.
3. Clause: A disjunction of literals is called a clause;
a clause containing a single literal is known as a unit
clause.
4. Conjunctive Normal Form: A sentence represented as
a conjunction of clauses is said to be conjunctive
normal form or CNF.
ii. The Resolution Inference rule:
1. The resolution rule for first-order logic is simply a lifted
version of the propositional rule. Resolution can
resolve two clauses if they contain complementary
literals, which are assumed to be standardized apart so
that they share no variables.
2. Given clauses l1 ∨ ... ∨ lk and m1 ∨ ... ∨ mn, where li
and mj are complementary literals (li unifies with ¬mj
under substitution θ), the rule infers the resolvent
SUBST(θ, l1 ∨ ... ∨ li−1 ∨ li+1 ∨ ... ∨ lk ∨ m1 ∨ ... ∨ mj−1 ∨ mj+1 ∨ ... ∨ mn).
For example, resolving (¬P ∨ Q) with (P ∨ R) yields (Q ∨ R).
3. This rule is also called the binary resolution rule
because it resolves exactly two literals.
iii. “The law says that it is a crime for an American to sell
weapons to hostile nations. The country Nono, an enemy of
America, has some missiles, and all of its missiles were sold
to it by Colonel West, who is American.”
iv. STEP 1: Represent the above sentences in First order
predicate logic (FOPL)
1. It is a crime for an American to sell weapons to hostile
nations.
a. American(x) ∧ Weapons(y) ∧ Hostile(z) ∧
Sells(x,y,z) => Criminal(x)
2. The country Nono …. Has some missiles
a. ∃x Owns(Nono, x) ∧ Missiles(x)
3. The country Nono, an enemy of America
a. Enemy(Nono,America)
4. …. All of its missiles were sold to it by Colonel West
a. Missiles(x) ∧ Owns(Nono,x) => Sells(West, x,
Nono)
5. West, who is American
a. American(West)
6. An enemy of America counts as hostile
a. Enemy(x,America) => Hostile(x)
7. Missiles are Weapons
a. Missiles(x) => Weapons(x)
v. STEP 2: Convert them to clause form
1. Applying Rule #5 on Statement 1
a. ¬[American(x) ∧ Weapons(y) ∧ Hostile(z) ∧
Sells(x,y,z)] ∨ Criminal(x)
b. ¬American(x) ∨ ¬Weapons(y) ∨ ¬Hostile(z) ∨
¬Sells(x,y,z) ∨ Criminal(x)
2. Applying Rule #4 on Statement 2
a. Owns(Nono, M1) ∧ Missiles(M1)
3. No Change
a. Enemy(Nono,America)
4. Applying Rule #5
a. ¬[Missiles(x) ∧ Owns(Nono,x)] ∨ Sells(West, x,
Nono)
b. ¬Missiles(x) ∨ ¬Owns(Nono,x) ∨ Sells(West, x,
Nono)
5. No Change
a. American(West)
6. Applying Rule #5
a. ¬Enemy(x,America) ∨ Hostile(x)
7. Applying Rule #5
a. ¬Missiles(x) ∨ Weapons(x)
vi. Prove that Colonel West is a criminal using the resolution
technique
vii. [Figure: resolution refutation graph deriving the empty clause omitted]
b. Consider the following sentences, Prove “Bob is Mortal”
using modus ponen & Resolution [Dec 2018, Dec 2017]
[10mrks]
i. Proof by Resolution:
1. STEP 1: Represent the above sentences in First
order predicate logic (FOPL)
a. Mammals drink milk
mammal(Bob) => drinks(Bob, Milk)
b. Man is Mortal
man(Bob) => mortal(Bob)
c. Man is Mammal
man(Bob) => mammal(Bob)
d. Bob is Man
man(Bob)
2. STEP 2: Convert them to clause form
a. mammal(Bob) => drinks(Bob, Milk)
¬ mammal(Bob) ∨ drinks(Bob, Milk)
b. man(Bob) => mortal(Bob)
¬ man(Bob) ∨ mortal(Bob)
c. man(Bob) => mammal(Bob)
¬ man(Bob) ∨ mammal(Bob)
d. man(Bob)
man(Bob)
3. STEP 3: Apply Resolution
a. Query: mortal(Bob)
b. To disprove: ¬ mortal(Bob)
c. Therefore, KB ∧ ¬Query: KB ∧ ¬mortal(Bob)
d. [Figure: resolution tree deriving the empty clause omitted]
ii. Proof by Modus Ponens:
1. STEP 1: Represent the above sentences in First
order predicate logic (FOPL)
a. Mammals drink milk
mammal(Bob) => drinks(Bob, Milk)
b. Man is Mortal
man(Bob) => mortal(Bob)
c. Man is Mammal
man(Bob) => mammal(Bob)
d. Bob is Man
man(Bob)
2. STEP 2: Apply modus ponens [if P and P => Q, then Q]:
a. Applying modus ponens on (b) and (d) we get:
mortal(Bob)
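iii. The refutation itself can be mechanized; here is a tiny propositional resolution sketch in Python (the clause encoding with "~" for negation is my own):

    def negate(lit):
        return lit[1:] if lit.startswith("~") else "~" + lit

    def resolvents(c1, c2):
        # All clauses obtained by cancelling one complementary pair of literals.
        return [(c1 - {l}) | (c2 - {negate(l)}) for l in c1 if negate(l) in c2]

    def prove(kb, query):
        # Refutation: add the negated query and search for the empty clause.
        clauses = {frozenset(c) for c in kb} | {frozenset({negate(query)})}
        while True:
            new = {frozenset(r) for a in clauses for b in clauses
                   for r in resolvents(a, b)}
            if frozenset() in new:
                return True               # empty clause derived: contradiction
            if new <= clauses:
                return False              # nothing new: query not provable
            clauses |= new

    kb = [{"~man(Bob)", "mortal(Bob)"},   # man(Bob) => mortal(Bob)
          {"~man(Bob)", "mammal(Bob)"},   # man(Bob) => mammal(Bob)
          {"man(Bob)"}]                   # Bob is a man
    print(prove(kb, "mortal(Bob)"))       # True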

c. Use resolution to answer the question “What course would
Steve like?” [May 2018] [10mrks]
i. STEP 1: Represent the above sentences in First order
predicate logic (FOPL)
1. Steve only likes easy courses
∀x easy(x) => likes(Steve, x)
2. Science courses are hard
∀x Science(x) => ¬easy(x)
3. All the courses in the basket weaving department are
easy.
∀x BasketWeaving(x) => easy(x)
4. BIB301 is a basket weaving course
BasketWeaving(BIB301)
ii. STEP 2: Convert them to clause form
1. ∀x easy(x) => likes(Steve, x)
¬easy(x) ∨ likes(Steve, x)
2. ∀x Science(x) => ¬easy(x)
¬Science(x) ∨ ¬easy(x)
3. ∀x BasketWeaving(x) => easy(x)
¬BasketWeaving(x) ∨ easy(x)
4. BasketWeaving(BIB301)
iii. STEP 3: Apply Resolution
1. Query: likes(Steve, x)
2. To disprove: ¬likes(Steve, x)
3. Therefore, KB ∧ ¬Query: KB ∧ ¬likes(Steve, x)
iv. [Figure: resolution graph omitted]
v. The substitution x/BIB301 is produced by the unification
algorithm: the only wff of the form likes(Steve, x) that
follows from the premises is likes(Steve, BIB301).
vi. Thus, resolution gives us a way to find answer bindings
(in this case x = BIB301)
d. Consider the following sentences. Prove “Angle B is equal to
angle C” using modus ponens & resolution [Dec 2016]
[10mrks]
i. Proof by Resolution:
1. STEP 1: Represent the above sentences in First
order predicate logic (FOPL)
a. If triangle is equilateral then it is isosceles
∀x triangle(x) ∧ equilateral(x) =>
isosceles(x)
b. If triangle is isosceles then two sides AB, BC are
equal
∀x triangle(x) ∧ isosceles(x) =>
equal(AB,BC)
c. If AB and BC are equal then angle B and C are
equal
equal(AB,BC) => equal(B,C)
d. ABC is an equilateral triangle
equilateral(ABC)
triangle(ABC)
2. STEP 2: Convert them to clause form
a. ∀x triangle(x) ∧ equilateral(x) => isosceles(x)
¬ triangle(x) ∨ ¬equilateral(x) ∨
isosceles(x)
b. ∀x triangle(x) ∧ isosceles(x) => equal(AB,BC)
¬ triangle(x) ∨ ¬isosceles(x) ∨
equal(AB,BC)
c. equal(AB,BC) => equal(B,C)
¬ equal(AB,BC) ∨ equal(B,C)
d. equilateral(ABC)
equilateral(ABC)
e. triangle(ABC)
triangle(ABC)
3. STEP 3: Apply Resolution
a. Query: equal(B,C)
b. To disprove: ¬ equal(B,C)
c. Therefore, KB ∧ ¬ Query: KB ∧ ¬ equal(B,C)
d. [Figure: resolution tree omitted]
e. The substitution x/ABC is produced by the
unification algorithm; resolving the clauses
step by step derives the empty clause,
establishing equal(B,C).
ii. Proof by Modus ponens:
1. STEP 1: Represent the above sentences in First
order predicate logic (FOPL)
a. If triangle is equilateral then it is isosceles
∀x triangle(x) ∧ equilateral(x) =>
isosceles(x)
b. If triangle is isosceles then two sides AB, BC are
equal
∀x triangle(x) ∧ isosceles(x) =>
equal(AB,BC)
c. If AB and BC are equal then angle B and C are
equal
equal(AB,BC) => equal(B,C)
d. ABC is an equilateral triangle
equilateral(ABC)
e. ABC is an equilateral triangle
triangle(ABC)

2. STEP 2: Apply modus ponens [if P and P => Q, then Q]:
a. From (d) and (e) we get triangle(ABC) ∧
equilateral(ABC) - (f)
b. Applying modus ponens on (f) and (a) we get:
isosceles(ABC) - (g)
c. From (g) and (e) we get triangle(ABC) ∧
isosceles(ABC) - (h)
d. Applying modus ponens on (h) and (b) we get:
equal(AB,BC) - (i)
e. Applying modus ponens on (i) and (c) we get:
equal(B,C) - (j)
Module IV: Planning

1. Introduction to Planning
a. Explain Planning in AI [Dec 2018] [3mrks]
i. Planning is the task of coming up with a sequence of actions
that will achieve the goal.
ii. Planning is the task of finding a procedural course of action
for a declaratively described system to reach its goals while
optimizing overall performance measures.
iii. Classical planning environments:
1. Fully observable
2. Deterministic
3. Finite
4. Static (change only happens when the agent acts)
5. Discrete (in time, action, objects)
iv. A plan is ​complete​ iff every precondition is achieved
v. A precondition is ​achieved iff it is the effect of an earlier step
and no possibly intervening step undoes it
b. Explain planning algorithm using flat tire problem [Dec
2017] [10mrks]
i. Consider the problem of changing a flat tire. More precisely,
the goal is to have a good spare tire properly mounted onto
the car’s axle, where the initial state has a flat tire on the
axle and a good spare tire in the trunk. To keep it simple, our
version of the problem is a very abstract one, with no sticky
lug nuts or other complications. There are just four actions:
1. Removing the spare from the trunk
2. Removing the flat tire from the axle
3. Putting the spare on the axle
4. Leaving the car unattended overnight. We assume that
the car is in a particularly bad neighborhood, so that
the effect of leaving it overnight is that the tires
disappear.
ii. The ADL description of the problem is shown below. Notice
that it is purely propositional. It goes beyond STRIPS in that
it uses a negated precondition, ¬At(Flat, Axle), for the
PutOn(Spare, Axle) action. This could be avoided by using
Clear(Axle) instead.
iii. Algorithm(Using ADL):
1. Init​(At(Flat, Axle) ∧ At(Spare, Trunk ))
2. Goal​ (At(Spare, Axle))
3. Action(Remove(Spare, Trunk ),
a. PRECOND : At (Spare, Trunk )
b. EFFECT : ¬ At(Spare, Trunk ) ∧ At(Spare,
Ground ))
4. Action(Remove(Flat, Axle),
a. PRECOND : At (Flat, Axle)
b. EFFECT : ¬ At(Flat , Axle) ∧ At(Flat, Ground ))
5. Action(PutOn(Spare, Axle),
a. PRECOND : At(Spare, Ground ) ∧ ¬ At(Flat,
Axle)
b. EFFECT : ¬ At(Spare, Ground ) ∧ At(Spare,
Axle))
6. Action(LeaveOvernight)
a. PRECOND :
b. EFFECT : ¬ At(Spare, Ground ) ∧ ¬ At(Spare,
Axle) ∧ ¬ At(Spare, Trunk ) ∧ ¬ At(Flat,
Ground ) ∧ ¬ At(Flat , Axle))
iv. Algorithm(Using STRIPS):
1. Initial state​:
a. Flat tire => Axle
b. Spare tire => Trunk
2. Goal state:
a. Spare tire => Axle
b. Flat tire => Trunk
3. Actions:
a. Remove Spare
b. Remove Flat
c. Put Spare => Axle
d. Put Flat => Trunk
4. Action(Remove(Spare, Trunk)):
a. Precond: At(Spare,Trunk)
b. Effect: ¬At(Spare,Trunk)
5. Action(Remove(Flat, Axle)):
a. Precond: At(Flat, Axle)
b. Effect: ¬At(Flat, Axle)
6. Action(Put(Flat, Trunk)):
a. Precond: ¬At(Flat, Axle)
b. Effect: At(Flat, Trunk)
7. Action(Put(Spare, Axle)):
a. Precond: ¬At(Spare, Trunk)
b. Effect: At(Spare, Axle)
2. Planning with State Space Search
a. The ​initial state ​of the search is the initial state from the planning
problem. In general, each state will be a set of positive ground
literals; literals not appearing are false.
b. • The ​actions that are applicable to a state are all those whose
preconditions are satisfied. The successor state resulting from an
action is generated by adding the positive effect literals and
deleting the negative effect literals. (In the first-order case, we
must apply the unifier from the preconditions to the effect literals.)
Note that a single successor function works for all planning
problems—a consequence of using an explicit action representation.
c. • The ​goal test checks whether the state satisfies the goal of the
planning problem.
d. • The step cost of each action is typically 1. Although it would be
easy to allow different costs for different actions, this is seldom
done by STRIPS planners.
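e. These four ingredients fit in a short breadth-first search; the sketch below (my own encoding, using the flat-tire actions from the previous section) treats a state as a set of positive literals:

    from collections import deque

    # Each action: (positive preconds, negative preconds, add list, delete list).
    ACTIONS = {
        "Remove(Spare,Trunk)": ({"At(Spare,Trunk)"}, set(),
                                {"At(Spare,Ground)"}, {"At(Spare,Trunk)"}),
        "Remove(Flat,Axle)":   ({"At(Flat,Axle)"}, set(),
                                {"At(Flat,Ground)"}, {"At(Flat,Axle)"}),
        "PutOn(Spare,Axle)":   ({"At(Spare,Ground)"}, {"At(Flat,Axle)"},
                                {"At(Spare,Axle)"}, {"At(Spare,Ground)"}),
    }

    def plan(init, goal):
        frontier, seen = deque([(frozenset(init), [])]), set()
        while frontier:
            state, path = frontier.popleft()
            if goal <= state:                 # goal test: goal literals all hold
                return path
            if state in seen:
                continue
            seen.add(state)
            for name, (pre_pos, pre_neg, add, delete) in ACTIONS.items():
                if pre_pos <= state and not (pre_neg & state):   # applicable?
                    frontier.append((frozenset((state - delete) | add),
                                     path + [name]))

    print(plan({"At(Flat,Axle)", "At(Spare,Trunk)"}, {"At(Spare,Axle)"}))
    # One valid plan: ['Remove(Spare,Trunk)', 'Remove(Flat,Axle)', 'PutOn(Spare,Axle)']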
3. Partial Order planning
a. Explain a partial order planning with example [May 2019,
May 2018, May 2017, May 2016] [5mrks] [Dec 2018]
[3mrks]
i. The idea of a ​partial-order planner is to have a partial
ordering between actions and only commit to an ordering
between actions when forced. This is sometimes also called a
non-linear planner​, which is a misnomer because such
planners often produce a linear plan.
ii. A partial ordering is a less-than relation that is transitive and
asymmetric. A ​partial-order plan is a set of actions
together with a partial ordering, representing a "before"
relation on actions, such that any total ordering of the
actions, consistent with the partial ordering, will solve the
goal from the initial state. Write act0 < act1 if action act0 is
before action act1 in the partial order. This means that action
act0 must occur before action act1.
iii. Eg: flat tire, putting on shoes
iv. [Figure: partial-order plan for the shoes-and-socks example omitted]
v. It combines two action sequences:
1. The first branch covers the left sock and the left
shoe; wearing the left sock is a precondition for
wearing the left shoe.
2. The second branch covers the right sock and the right
shoe; wearing the right sock is a precondition for
wearing the right shoe.
vi. Once these actions are taken, we achieve our goal and reach
the finish state.
b. Explain STRIPS representation of planning problem [May
2018, May 2016] [5mrks]
i. The STRIPS representation is an action-centric representation
which, for each action, specifies when the action can occur
and the effects of the action. STRIPS, which stands for
“STanford Research Institute Problem Solver,” was the
planner used in Shakey, one of the first robots built using AI
technology.
ii. The STRIPS representation for an action consists of
1. The ​precondition​, a set of assignments of values to
features that must hold for the action to occur
2. The ​effect​, a set of assignments of values to primitive
features that specifies the features that change, and
the values they change to, as the result of the action.
iii. The precondition of an action is a proposition – the
conjunction of the elements of the set – that must be true
before the action is able to be carried out. In terms of
constraints, the robot is constrained so it can only choose an
action for which the precondition holds.
iv. Example:
1. The action of Rob to pick up coffee has the following
STRIPS representation:
2. Precondition: ​{cs,¬rhc}
3. Effect: ​{rhc}
4. That is, in order to be able to pick up coffee, the robot
must be at the coffee shop and not already have coffee.
After the action, rhc holds (i.e., RHC = true). All other
feature values are unaffected by this action.
v. Features:
1. It allows only positive literals in states. E.g., a
valid sentence in STRIPS is Intelligent ∧ Beautiful.
2. It makes use of the closed-world assumption (i.e.,
unmentioned literals are false).
3. Only ground literals may appear in goals. For example:
Intelligent ∧ Beautiful.
4. Goals are conjunctions, e.g., (Intelligent ∧ Beautiful).
5. Effects are conjunctions.
6. Does not support equality.
7. Does not have support for types.
4. Hierarchical Planning
a. Explain real time example of hierarchical planning [Dec
2018] [3mrks]
i. Hierarchical decomposition is used to reduce complexity
1. At each level of the hierarchy a computational task is
reduced to a small number of activities at the next
lower level.
2. The computational cost of arranging these activities
are low
ii. Hierarchical task network (HtN) planning use a refinement of
actions through decomposition
1. Eg: building a house
2. Refined until only primitive actions remains
iii. There are two types of HTN planning: pure and hybrid.
iv. General descriptions are stored in a plan library.
1. Each method has the form Decompose(a, d), where a is an
action and d is a plan.
v. [Figure: HTN decomposition example omitted]
5. Conditional Planning
a. Explain conditional planning with example [Dec 2018]
[3mrks]
i. Conditional planning has to work regardless of the outcome
of an action.
ii. It can take place in a fully observable environment, where
the current state of the agent is known. The outcomes of
actions, however, cannot be determined, so the environment
is said to be nondeterministic.
iii. Here we can check what is happening in the environment at
predetermined points of the plan to deal with ambiguous
actions.
iv. It needs to take some actions at every state and must be
able to handle every outcome for the action it takes. A state
node is represented with a square and chance node is
represented with a circle.
v. For a state node we have the option of choosing some
actions. For a chance node agent has to handle every
outcome.
vi. Conditional Planning can also take place in the Partially
Observable Environments where, we cannot keep a track on
every state.
vii. In the vacuum-cleaner example, if the dirt is at Right and
the agent knows about Right but not about Left, then dirt
might be left behind when the agent leaves a clean square.
The initial state is then a state set, also called a belief
state.
viii. Sensors play an important role in conditional planning for
partially observable environments. Automatic sensing can be
useful; with automatic sensing an agent gets all the
available percepts at every step.
ix. [Figure omitted]
Module V: Uncertain Knowledge and Reasoning

1. Uncertainty
a. What is uncertainty? [Dec 2018] [5mrks] [May 2018]
[5mrks]
i. Agents may need to handle uncertainty, whether due to
partial observability, nondeterminism, or a combination of
the two. An agent may never know for certain what state it
is in or where it will end up after a sequence of actions.
1. Uncertain data
a. missing data, unreliable, ambiguous, imprecise
representation, inconsistent, subjective, derived
from defaults, noisy…
2. Uncertain knowledge
a. Multiple causes lead to multiple effects
b. Incomplete knowledge of causality in the
domain
c. Probabilistic/stochastic effects
3. Uncertain knowledge representation
a. restricted model of the real system
b. limited expressiveness of the representation
mechanism
4. Inference process
a. Derived result is formally correct, but wrong in
the real world
b. New conclusions are not well-founded (eg,
inductive reasoning)
c. Incomplete, default reasoning methods
ii. Let’s consider an example of uncertain reasoning: diagnosing
a dental patient’s toothache.
iii. Diagnosis—whether for medicine, automobile repair, or
whatever—almost always involves uncertainty. Let us try to
write rules for dental diagnosis using propositional logic, so
that we can see how the logical approach breaks down.
iv. Consider the following simple rule:
v. Toothache ⇒Cavity.
vi. The problem is that this rule is wrong. Not all patients with
toothaches have cavities; some of them have gum disease,
an abscess, or one of several other problems:
vii. Toothache ⇒Cavity ∨GumProblem∨Abscess.
viii. Unfortunately, in order to make the rule true, we would have
to add an almost unlimited list of possible problems. We
could try turning the rule into a causal rule:
ix. Cavity ⇒Toothache.
x. But this rule is not right either; not all cavities cause pain.
The only way to fix the rule is to make it logically exhaustive:
to augment the left-hand side with all the qualifications
required for a cavity to cause a toothache.
2. Representing Knowledge in an Uncertain Domain
a. Probability theory:
i. Using predicate logic to represent facts in intelligent
systems is a standard method of representation. But the
problem with this classical logic as a knowledge
representation system is that it assumes all values can be
defined as either completely true or completely false.
ii. A given sentence is labeled with a real number in the range
of 0 to 1, 0 meaning the sentence is completely false, and 1
meaning it is completely true. A label of 0.5 means there is
an equal chance that the sentence is true or false.
iii. Challenges:
1. Unknown probability values
2. Inconsistent probability assignments
3. Computational expense
b. Fuzzy logic:
i. Fuzzy logic is a very convenient and precise way of
expressing imprecise knowledge.
ii. In boolean system truth value, 1 represents absolute truth
value and 0 represents absolute false value. But in the fuzzy
system, there is no logic for absolute truth and absolute false
value. There is an intermediate value present which is
partially true and partially false.
iii. Challenges:
1. Degree of membership is often context dependent.
2. General-purpose fuzzy rules are hard to get.
c. Truth Maintenance System (TMS):
i. It is a non-monotonic reasoning system. It stores the latest
truth value of any predicate. The system is developed with
the idea that truthfulness of a predicate can change with
time, as new knowledge is added or existing knowledge is
updated. It keeps a record showing which items of knowledge
are currently believed or disbelieved.
3. Conditional Probability
a. Write a note on conditional probability and its role in AI.
[May 2019] [4mrks] [Dec 2018, Dec 2017, Dec 2016]
[5mrks]
i. In probability theory, conditional probability is a measure of
the probability of an event given that (by assumption,
presumption, assertion or evidence) another event has
occurred.
ii. If the event of interest is A and the event B is known or
assumed to have occurred, "the conditional probability of A
given B", or "the probability of A under the condition B", is
usually written as P(A|B), or sometimes P​b​(A) where | is
pronounced “given”.
iii. Using conditional probabilities, we can have conditional
information. It is mainly used for representing and reasoning
with probabilistic information.
iv. E.g. P(It will rain tomorrow | It is raining today)
v. It represents the conditional knowledge that it might rain
tomorrow as it is raining today.
vi. P(A|B)+P(NOT (A)|B)=1.
vii. For example, the probability that any given person has a
cough on any given day may be only 5%. But if we know or
assume that the person has a cold, then they are much more
likely to be coughing. The conditional probability of coughing
given that you have a cold might be a much higher 75%.
viii. The concept of conditional probability is one of the most
fundamental and one of the most important concepts in
probability theory. But conditional probabilities can be quite
slippery and require careful interpretation. For example,
there need not be a causal or temporal relationship between
A and B.
ix. P(A|B) may or may not be equal to P(A) (the unconditional
probability of A). If P(A|B) = P(A) (or its equivalent P(B|A) =
P(B)), then events A and B are said to be independent: in
such a case, having knowledge about either event does not
change our knowledge about the other event.
x. Also, in general, P(A|B) (the conditional probability of A given
B) is not equal to P(B|A). For example, if you have cancer
you might have a 90% chance of testing positive for cancer.
In this case what is being measured is that if event B
("having cancer") has occurred, the probability of A (test is
positive) given that B (having cancer) occurred is 90%: that
is, P(A|B) = 90%.
xi. Alternatively, if you test positive for cancer you may have
only a 15% chance of actually having cancer because most
people do not have cancer and the false positive rate for the
test may be high. In this case what is being measured is the
probability of the event B (having cancer) given that the
event A (test is positive) has occurred: P(B|A) = 15%
4. Joint Probability
a. [Figure: joint probability definition omitted]
5. Bayes’ theorem
a. Short note on Bayes theorem [May 2019, Dec 2017] [5mrks]
[Dec 2019, May 2016] [4mrks]
i. Bayes Theorem describes the probability of an event based
on prior knowledge of conditions that might be related to the
event.
ii. In Artificial Intelligence, Bayes Theorem determines the
probability of an event with uncertain knowledge.
iii. It is a way to calculate the value of P(X|Y) with the
knowledge of P(Y|X).
iv. Derivation:
We know that P(X|Y) = P(X∩Y) / P(Y) ........(1)
Similarly, P(Y|X) = P(Y∩X) / P(X) ........(2)
Equating (1) and (2), we get:
P(X|Y) = P(Y|X) · P(X) / P(Y) ........(a)
The above equation (a) is called Bayes' rule or Bayes'
theorem. This equation is the basis of most modern AI
systems for probabilistic inference.

P(A|B) ​is known as ​posterior​, which we need to calculate,


and it will be read as Probability of hypothesis A when we
have occurred an evidence B.
P(B|A) ​is called the ​likelihood​, in which we consider that
hypothesis is true, then we calculate the probability of
evidence.
P(A) is called the ​prior probability​, probability of
hypothesis before considering the evidence
P(B) is called ​marginal probability​, pure probability of
evidence.
v. You might be interested in finding out a patient’s probability
of having liver disease if they are an alcoholic. “Being an
alcoholic” is the test (kind of like a litmus test) for liver
disease.
1. A could mean the event “Patient has liver disease.”
Past data tells you that 10% of patients entering your
clinic have liver disease. P(A) = 0.10.
2. B could mean the litmus test that “Patient is an
alcoholic.” Five percent of the clinic’s patients are
alcoholics. P(B) = 0.05.
3. You might also know that among those patients
diagnosed with liver disease, 7% are alcoholics. This is
your B|A: the probability that a patient is an alcoholic,
given that they have liver disease, is 7%.
4. Bayes’ theorem tells you:
P(A|B) = (0.07 * 0.1)/0.05 = 0.14
5. In other words, if the patient is an alcoholic, their
chances of having liver disease is 0.14 (14%).
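vi. The arithmetic is easy to sanity-check, e.g. in Python:

    # Numeric check of the liver-disease example above.
    p_A, p_B, p_B_given_A = 0.10, 0.05, 0.07
    print(round(p_B_given_A * p_A / p_B, 2))   # 0.14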
6. Belief Networks
a. Explain Bayesian Network with an example [Dec 2018]
[5mrks] [May 2018] [5mrks]
i. A Bayesian Belief Network, or simply “Bayesian Network,”
provides a simple way of applying Bayes Theorem to complex
problems.
ii. The networks are not exactly Bayesian by definition, although
given that both the probability distributions for the random
variables (nodes) and the relationships between the random
variables (edges) are specified subjectively, the model can be
thought to capture the “belief” about a complex domain.
iii. Represent the full joint distribution over the variables more
compactly with a smaller number of parameters.
iv. Take advantage of conditional and marginal independence
among random variables
v. A and B are independent
1. P ( A , B ) = P ( A ) P ( B )
vi. A and B are conditionally independent given C
1. P ( A , B | C ) = P ( A | C ) P ( B | C )
2. P ( A | C , B ) = P ( A | C )
vii. Joint Probability Distribution:
1. [Figure: chain-rule factorization omitted]
2. Eg: P(A,J,M) = P(A∧J∧M) = P(A) * P(J/A) * P(M/A)
viii. Example: Assume your house has an alarm system against
burglary. You live in a seismically active area and the alarm
system can get occasionally set off by an earthquake. You
have two neighbors, Mary and John, who do not know each
other. If they hear the alarm they call you, but this is not
guaranteed. We want to represent the probability distribution
of events: Burglary, Earthquake, Alarm, Mary calls and John
calls
1. Nodes & values: Identify the nodes, which are random
variables
a. Burglary, Earthquake, Alarm, Mary calls and
John calls
2. Structure/Link: To represent the network, the structure
must be created using links, in which two nodes are
connected by a link if they are connected by a
relationship.
a. [Figure: network structure omitted]
3. Local conditional distribution: Relate variables and
their parents:
a. [Figure omitted]
4. Conditional probabilities: Once the relations are
established, we need to quantify the nodes with the
conditional probabilities.
a. [Figure: conditional probability tables omitted]
5. Query:
a. Probability that John & Mary both call, alarm
rings but there is no burglary and no earthquake
i. P(J ∧ M ∧ A ∧ ¬B ∧ ¬E)
ii. P(J/A) * P(M/A) * P(A/¬B,¬E) * P(¬B) *
P(¬E)
iii. 0.9 * 0.7 * 0.001 * 0.999 * 0.998
iv. 0.000628
b. Probability that john calls
i. P(J)
ii. P(J/A) * P(A) + P(J/¬A) * P(¬A)
iii. 0.9 * P(A) + 0.05 * P(¬A)
iv. Where P(A) =
1. P(A/B,E) * P(B∧E) +
2. P(A/¬B,E) * P(¬B∧E) +
3. P(A/B,¬E) * P(B∧¬E) +
4. P(A/¬B,¬E) * P(¬B∧¬E)
5. ≈ 0.0025
v. Similarly find P(¬A) ≈ 0.9975
vi. ≈ 0.052
c. Probability of burglary [May 2017, Dec
2016, May 2016] [10mrks]
i. Inference:
ii. [Figure: enumeration structure for P(B|J,M) omitted]
iii. We have to find:
1. P(B|J,M) = P(B,J,M) / P(J,M)
2. P(B|J,M) = α P(B,J,M)
3. P(B|J,M) = α Σe Σa P(B,e,a,J,M)
4. P(B|J,M) = α Σe Σa P(B) * P(e) *
P(a|B,e) * P(J|a) * P(M|a)
5. Since we only know J & M we
substitute and add the results:
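iv. Carrying out that summation with the usual textbook CPT numbers (assumed here, since the CPT figure is omitted above) gives P(B | J, M) ≈ 0.284; a runnable sketch:

    from itertools import product

    P_B, P_E = 0.001, 0.002                     # priors (assumed CPT values)
    P_A = {(True, True): 0.95, (True, False): 0.94,
           (False, True): 0.29, (False, False): 0.001}   # P(A | B, E)
    P_J = {True: 0.90, False: 0.05}             # P(J | A)
    P_M = {True: 0.70, False: 0.01}             # P(M | A)

    def joint(b, e, a, j, m):
        # Full joint via the network factorization P(B)P(E)P(A|B,E)P(J|A)P(M|A).
        p = (P_B if b else 1 - P_B) * (P_E if e else 1 - P_E)
        p *= P_A[(b, e)] if a else 1 - P_A[(b, e)]
        p *= P_J[a] if j else 1 - P_J[a]
        return p * (P_M[a] if m else 1 - P_M[a])

    num = sum(joint(True, e, a, True, True)
              for e, a in product([True, False], repeat=2))
    den = num + sum(joint(False, e, a, True, True)
                    for e, a in product([True, False], repeat=2))
    print(round(num / den, 3))   # 0.284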

7. Simple Inference in Belief Networks.


Module VI: Natural Language Processing

1. Language Models
a. The goal of a language model is to compute the probability of a
sequence of tokens (e.g. a sentence or a sequence of words);
language models are useful in many different Natural Language
Processing applications.
b. A Language Model (LM) can be called the grammar of a language,
as it gives the probability of the word that will follow.
c. In probabilistic language modelling, the probability of a
sentence as a sequence of words is calculated:
P(W) = P(w1, w2, w3, ..., wn)
d. It can also be used to find the probability of the next word in the
sentence:
P(w5 | w1, w2, w3, w4)
e. A model that computes either of these is called a language model.
f. There are various language models in available practice. Following
are a few of them:
i. Methods using Markov assumption:
1. A stochastic process is said to have the Markov
property if the conditional probability distribution
of future states of the process depends only upon the
present state, not on the sequence of events that
preceded it.
2. In other words, the probability of the next word can be
estimated given only the previous k words.
3. Following is the general equation for the Markov
assumption:
P(wi | w1, w2, ..., wi−1) ≈ P(wi | wi−k, ..., wi−1)
ii. N-gram Models:
1. From the Markov assumption, we can formally define
N-gram models with k = n − 1:
P(wi | w1, w2, ..., wi−1) ≈ P(wi | wi−(n−1), ..., wi−1)
2. The simplest versions of this are the Unigram Model
(n = 1) and the Bigram Model (n = 2).
iii. Unigram Model (n = 1):
1. P(w1, w2, ..., wn) ≈ ∏i P(wi)
iv. Bigram Model (n = 2):
1. P(wi | w1, w2, ..., wi−1) ≈ P(wi | wi−1) (a count-based
sketch follows below)
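v. A count-based bigram model can be sketched in a few lines of Python (the toy corpus is invented for illustration):

    from collections import Counter

    corpus = "the cat sat on the mat . the dog sat on the rug .".split()
    bigrams = Counter(zip(corpus, corpus[1:]))   # counts of adjacent word pairs
    unigrams = Counter(corpus)

    def p_bigram(prev, word):
        # Maximum-likelihood estimate of P(word | prev).
        return bigrams[(prev, word)] / unigrams[prev]

    print(p_bigram("the", "cat"))   # 0.25: "the" occurs 4 times, "the cat" once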
g. Language Modelling is one of the most important parts of modern
Natural Language Processing.
h. There are many sorts of applications of language modelling, like:
Spell Correction, Speech Recognition, Machine Translation, Question
Answering, Summarization, Sentiment Analysis,etc.
i. All these tasks require the use of a language model.
j. Language model is supposed to represent the text to a form
understandable from the machine point of view.
k. Moreover, language modelling must also consider the ordering of
tokens, as every language is based on some grammar, where order
has a lot of influence on the meaning of a text.

2. Natural Language for Communication: Syntactic Analysis


a. Syntactic analysis or parsing or syntax analysis is the third phase of
NLP.
b. In syntactic analysis the sentences are parsed as nouns, verbs,
adjectives, and other parts of sentences.
c. In this phase the grammar of the sentence is analyzed in order to
get the relationships among different words in the sentence.
d. Syntax analysis checks the text for meaningfulness compared to the
rules of formal grammar.
e. For example, a sentence like “mango eats me” would be rejected by
the syntactic analyzer.

3. Natural Language for Communication: Augmented Grammars


and Semantic Interpretation
a. Grammar is a formal description of the structure of any language.
b. In Semantic Interpretation, the actual meaning of the sentence is
extracted from the structure and the words used.
c. It checks whether the sentence is meaningful.
d. It maps the object with their syntactic structure to decide the
correctness of the sentence.
e. For eg, “bitter sugar” will be termed as a wrong sentence by a
semantic analyser.
f. Augmented Grammars:
i. An augmented grammar is any grammar whose productions
are augmented with conditions, expressed using features.
ii. Features may be associated with any non-terminal symbol in
a derivation.
iii. A feature associated with a non-terminal symbol is shown
following that non-terminal separated from it by a “.”, eg.
A.COUNT is the COUNT feature associated with the
non-terminal A.
iv. When the value is included with the feature, the feature and
its value are bracketed and the dot omitted, as in A[COUNT
1].
v. If no ambiguity arises, the feature name may be omitted, as
in A[1].
vi. The values of the features may be assigned by the
application of a production (indicated by the assignment
operator “:=” for ‘is set equal to’), or they may be compared
using a comparison operator such as “=”.
vii. Example: an augmented context-free grammar for
generating the strictly indexed language L3 = {aⁿbⁿcⁿ | n > 0}.
viii. Here are the productions of an augmented context-free
grammar G1 that generates the language L3 = {aⁿbⁿcⁿ | n > 0}.
ix. Non-terminal symbols that occur more than once in
production are distinguished by indices.
x. The non-terminal symbol to the left of the rewrite arrow “→”
gets index 0 (e.g. A0), and recurring non-terminals to the right
of the arrow get indices 1, 2, ... from left to right, e.g. A1.
COUNT is an integer-valued feature carried by each of the
non-terminal symbols in G1.
xi. Productions of G1 :
1. S → ABC, where A.COUNT := S.COUNT, B.COUNT :=
S.COUNT, C.COUNT := S.COUNT
2. A​0​ → aA​1​ , where A​1​.COUNT := A​0​.COUNT -1
3. B​0​ → bB​1​ , where B​1​.COUNT := B​0​.COUNT -1
4. C​0​ → cC​1​ , where C​1​.COUNT := C​0​.COUNT -1
5. A → 𝜖 , where A.COUNT = 0
6. B → 𝜖 , where B.COUNT = 0
7. C → 𝜖 , where C.COUNT = 0
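xii. For instance, to derive aabbcc, start with S.COUNT := 2: S → A[2]B[2]C[2]; then A[2] → aA[1] → aaA[0] → aa (production 5 applies once COUNT = 0), and similarly B[2] derives bb and C[2] derives cc, giving aabbcc.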

4. Natural Language for Communication: Machine Translation


a. Machine Translation is the classic test of language understanding. It
consists of both language analysis and language generation.
b. Many machine translation systems have huge commercial use. A few
examples: Google Translate; eBay uses MT to enable cross-border
trade; Facebook uses MT to translate posts and comments.
c. In a traditional machine translation system, a parallel corpus is
used: a collection of texts, each of which is translated into one
or more languages other than the original. For example, given a
source language such as French and a target language such as
English, multiple statistical models need to be built, including a
probabilistic formulation using the Bayesian rule, a translation
model p(f|e) trained on the parallel corpus, and a language model
p(e) trained on an English-only corpus. Such an approach can
potentially skip hundreds of important details.
d. Neural Machine Translation (NMT):
i. The above process is modelled through one big artificial
neural network, known as a Recurrent Neural Network (RNN)
which is a stateful neural network.
ii. Standard NMT is an end-to-end neural network where the
source sentence is encoded by a RNN called encoder and the
target words are predicted using another RNN known as
decoder.
iii. The RNN encoder reads a source sentence one symbol at a
time and then summarizes the entire source sentence in its
last hidden state.
iv. Back propagation is used to learn the summary and the
translated version is returned.
v. Features of NMT:
1. End-to-end training: All parameters in NMT are
simultaneously optimised to minimise loss function on
the network’s output.
2. Distributed representations: NMT has a better
exploitation of word and phrase similarities. Hence it
forms a robust translator.
3. Better exploration of context: NMT can use a much
bigger context for both source and partial target text
in order to translate more accurately.
4. More fluent text generation: Deep learning text
generation is of much higher quality than the parallel
corpus way.
vi. However, the main problem with RNNs is the vanishing or
exploding gradient problem where, depending on the
activation functions used, information rapidly gets lost over
time.
vii. As a consequence, RNNs will experience difficulty in
memorising previous words very far away in the sequence
and are only able to make predictions based on the most
recent words.
e. Long Short-Term Memory (LSTM):
i. LSTM works as a solution to the vanishing gradient problem
by introducing gates and an explicitly defined memory cell.
ii. Each neuron has a memory cell and three gates; input,
output and forget.
iii. The function of these gates is to safeguard information by
stopping or allowing the flow of it:
1. The input gate determines how much information from
the previous layer gets stored in the cell.
2. The output gate takes the job on the other end and
determines how much the next layer gets to know
about the state of this cell.
3. The forget gate seems like an odd inclusion at first but
sometimes it's good to forget: if it’s learning a book
and a new chapter begins, it may be necessary for the
networks to forget some characters from the previous
chapter.
iv. LSTMs are able to learn complex sequences, such as writing
like Shakespeare or composing primitive music.
v. Since each of the gates has a weight to a cell in the previous
neuron, they typically require more resources to run. It is the
default model for most sequence labelling tasks, which have
lots and lots of data.
f. Gated Recurrent Units (GRU):
i. They are a slight variation of LSTMs and are extensions of
Neural Machine Translation.
ii. They have one less gate and are wired differently.
iii. GRU has update and reset gates instead of an input, output
and forget gates.
iv. This update gate determines how much information has to be
kept from the last state and how much information to forget
from the previous layer.
v. The reset gate functions much like the forget gate of LSTM,
but it’s located slightly differently.
vi. They don’t have an output gate. In most cases, they function
very similarly to LSTMs, with the biggest difference being
that GRUs are slightly faster and easier to run and slightly
less expressive.
g. Further notable improvements in neural machine translation
systems are:
i. Sequence to sequence, learning with Neural Networks proved
the effectiveness of LSTM for Neural Machine Translation. The
method uses a multilayered LSTM to map the input sequence
to a vector of a fixed dimensionality and then another deep
LSTM to decode to target sequence from the vector.
ii. NMT by Jointly Learning to Align and Translate introduced the
attention mechanism in NLP.
iii. Convolutional over Recurrent Encoder for NMT augments the
standard RNN encoder in NMT with additional convolutional
layers in order to capture wider context in encoder output.
iv. Google built its own NMT system, called Google’s Neural
Machine Translation, which addresses many issues in
accuracy and ease of deployment. The model consists of
deep LSTM with 9 encoder and 8 decoder layers using
residual connection as well as attention connections from the
decoder network to the encoder.
v. Instead of using RNNs, Facebook AI researchers use
Convolutional Neural Networks for sequence-to-sequence
learning tasks in NMT.
5. Overview of Cognitive Computing: Foundation of Cognitive
Computing
a. Cognitive Computing is a new type of computing with the goal of
more accurate models of how the human brain/mind senses,
reasons and responds to stimulus.
b. Generally, the term cognitive computing is used to refer to new
hardware and/or software that mimic the functioning of the human
brain thereby improving human decision-making.
c. Cognitive Computing applications link data analysis and adaptive
page displays (i.e., adaptive user interfaces) to adjust content
for a particular type of audience.
d. Following are some features of cognitive systems:
i. Interactive: They may interact easily with users so that those
users can define their needs comfortably. They may also
interact with other processors, devices and cloud services as
well as with people.
ii. Adaptive: They may be engineered to feed on dynamic data
in real time. They may learn as information changes and as
goals and requirements evolve. They may resolve ambiguity
and tolerate unpredictability.
iii. Contextual: They may understand, identify and extract
contextual elements such as meaning, syntax, time, location,
appropriate domain, regulations, user’s profile, process, task
and goal. They may draw on multiple sources of information
including both structured and unstructured digital information
as well as sensory inputs like visual, gestures, auditory or
sensor-provided.
iv. Iterative and stateful: They may aid in defining a problem by
asking questions or finding additional sources of input if a
problem statement is ambiguous or incomplete. They may
“remember” previous interactions in a process and return
information that is suitable for the specific application at that
point in time.

6. Overview of Cognitive Computing: List of Design Principles


for Cognitive Systems
a. Standardise:
i. Many errors are caused by inconsistencies in how things
work, whether how information is displayed or how controls
are activated.
ii. To prevent mistakes, a general rule is to ensure that similar
devices work the same way.
iii. Agreeing upon a standard helps prevent errors.
b. Use Stereotypes:
i. A stereotype is a commonly held expectation of what people
think is supposed to happen when they recognise a signal or
activate a control.
ii. Good design should take advantage of these perceptions and
expectations.
iii. A stereotype is similar to standardisation; the difference is that a stereotype is a convention that is generally followed, whereas standardisation is an explicit agreement on the operation or design of components.
c. Match controls to equipment layout:
i. Controls should be arranged to correspond to the physical layout of the equipment they operate; the classic example is stove knobs laid out in the same pattern as the burners they control.
d. Simplify presentation of information:
i. Sometimes too much information is provided, or it is presented in a complex fashion.
ii. In general good designs provide simplified displays although
it can depend on the situation.
e. Present information in appropriate detail:
i. The design of signs, instructions manuals and control panels
can benefit from evaluation.
ii. It is the job of the writer or designer to do the hard work of thinking, organising and expressing.
iii. The user or reader should then find it easy to understand the overall working and operation of the system.
iv. Good design leads to a user-friendly system.
f. Present clear images:
i. Another common problem is displaying an image so poorly that the user cannot distinguish or interpret the message.
ii. There are three issues in presenting clear images:
1. Visibility:
a. First, the message must be visible.
b. Its size and location should be appropriate for the distance from which it is to be observed, and there should be no obstructions.
c. Signs and labels should contrast with their background.
2. Distinguishability:
a. The message should also be distinguishable
from other surrounding signals and information.
b. Multiple signals, such as warning lights or alarms, should not be so similar that they can be confused.
c. For example, a fire alarm should have a pattern or pitch distinct from other types of alarms.
d. There should be adequate space to separate
messages from one another.
3. Interpretability:
a. Finally, the message should be interpretable.
b. Avoid characters that look alike, and break long strings into shorter chunks.
c. Match the message to the training of the user.
g. Use redundancies:
i. Sometimes, one message is insufficient.
ii. Because mistakes are easy to make and humans have many limitations, it is important to provide the same information in more than one way.
h. Use patterns:
i. The human eye grasps patterns well.
ii. Information presented as a pattern can often be understood much more quickly and accurately than otherwise.
iii. In control panels for complex equipment, grouping related controls can ease use.
iv. Similarly, placing controls in patterns or in a context can help indicate to the user what to do.
i. Provide variable stimuli:
i. Humans detect a novel stimulus more readily than a constant one, because our senses fatigue with continuous exposure.
ii. Written warnings are often ignored because they quickly become part of the background and are simply not read.
iii. Thus, it is important to avoid excessive use of a single way of presenting information.
j. Provide instantaneous feedback:
i. An additional principle that helps prevent errors is to provide feedback to the user on the course of action taken.
ii. Furthermore, the sooner the feedback is given, the easier it is to determine whether an error has been made.
7. Overview of Cognitive Computing: Natural Language Processing in Support of a Cognitive System (first three chapters from Textbook 3)
a. Natural Language Processing, a core capability of cognitive computing systems, is often defined as helping computers process and understand human language.
b. The deeper technical processes behind NLP include machine learning techniques, computational linguistics, and statistics computed across training corpora (see the toy example at the end of this section).
c. The advancement in AI technologies and the growing interest in
interacting with computers using human speech as it is spoken are
driving an increasing demand for natural language processing
applications.
d. Automation of information-intensive tasks is one of the most
popular and effective use cases for NLP.
e. Benefits range from operational and production efficiency to more
effective analysis of data to gain competitive advantage and new
insights.
f. The ultimate goal of NLP is for computers to achieve human-like comprehension of text and language.
g. Once this is achieved, computer systems will be able to understand, draw inferences from, summarise, translate and generate accurate and natural human text and language.
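To make point (b) concrete, here is a toy sketch of the kind of frequency statistics gathered across a training corpus; the two-sentence corpus is invented, and real systems compute such counts over millions of sentences.

from collections import Counter

corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
]

# Tokenise naively on whitespace and count word and word-pair frequencies.
tokens = [w for sentence in corpus for w in sentence.lower().split()]
unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))  # crosses sentence boundaries, for simplicity

print(unigrams.most_common(3))  # [('the', 4), ('sat', 2), ('on', 2)]
print(bigrams.most_common(1))   # [(('sat', 'on'), 2)]

Counts like these underpin statistical language models and translation tables; cognitive systems layer machine-learned representations on top of such corpus statistics.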
Miscellaneous
1. Explain Expert System Shell in short [May 2019, May 2017, Dec
2016] [4 marks]
2. Draw and explain the expert system architecture [May 2019, Dec 2016] [5mrks] [May 2018] [4mrks] [Dec 2017] [10mrks]
3. Write a Prolog program for Factorial, Fibonacci [May 2019, May 2017, Dec 2016] [5mrks]
4. Decision tree [May 2019, May 2018] [5mrks]
5. Inductive learning and rote learning [May 2019] [5mrks]
6. What is Prolog? What do you mean by structure in Prolog? Write a Prolog program [Dec 2018] [10mrks] [May 2018] [10mrks]
7. Expert System Shell architecture [Dec 2018] [5mrks]
8. Explain Rote learning & Inductive learning with suitable examples [Dec 2017, May 2017] [10mrks]
9. Explain SMA* Algorithm [Dec 2017] [5mrks]
10. Supervised & Unsupervised learning [Dec 2017, May 2017]
[5mrks]
11. What is satisfiability? Explain with an example [Dec 2016]
[5mrks]
12. What is an ontology? Explain its types [Dec 2016] [5mrks]