BTech Advanced AI Unit02


Advanced AI

Dr. Vaishnaw G. Kale
Associate Professor
Department of Computer Science Engineering & Applications
D.Y. Patil International University, Pune
About the Course

Name of the subject: Advanced AI
Course Code: AIM703
Programme: B.Tech
Year of study: Final Year
Semester: VII
Specialization: Artificial Intelligence Track
Academic Year: 2023-24
Syllabus
Module-II: Adversarial Search & Games

Game Theory, Game Playing Techniques like Minimax and Alpha Beta Pruning, Monte Carlo
Tree Search, Stochastic Games, Partially Observable Games, Limitations of Game Search
Algorithms, Constraint Satisfaction Problems (CSP)
Books
Sr.No. Text Books (Author)

1. “Artificial Intelligence: A Modern Approach” by Stuart Russell and Peter Norvig

2. “A First Course in Artificial Intelligence” by Deepak Khemani

3. “Artificial Intelligence” by Elaine Rich, Kevin Knight and Nair

4. “Deep Learning” by Ian Goodfellow (The MIT Press)

Sr.No. Reference Books (Author)

1. “Artificial Intelligence: A New Synthesis” by Nils J. Nilsson

2. “Artificial Intelligence” by Patrick Henry Winston

3. “Computational Intelligence: An Introduction” by Andries P. Engelbrecht

4. “Artificial Intelligence: Concepts and Applications” by Dr. Lavika Goel


Advanced Artificial Intelligence

Module 02: Adversarial Search and Games


Game Theory

● Game Theory is applicable to several areas of artificial intelligence. Mathematical game theory is used to model how various players interact strategically in a setting with predetermined rules and consequences.
● Different areas of artificial intelligence can benefit from the application of game theory:
1. Multi-agent AI systems.
2. Imitation and Reinforcement Learning.
3. Adversary training in Generative Adversarial Networks (GANs)
Game Theory

● In addition, game theory can describe many machine learning models and many situations in daily life.
● For instance, a classification technique like SVM (Support Vector Machines) can be framed as a two-player game in which one player challenges the other to find the best hyper-plane by giving it the most difficult points to classify.
● The outcome of the game then condenses into a trade-off between the two players’ strategic prowess (e.g., how well the first player challenged the second with difficult data points, and how well the second player identified the best decision boundary).
Game Theory

There are five primary categories of games in game theory:


1. Cooperative vs. non-cooperative games: Cooperative games allow players to form alliances to increase their chances of winning (e.g., negotiations). In non-cooperative games, players cannot form alliances (e.g., wars).
2. Symmetric vs. asymmetric games: In a symmetric game, all players have the same objectives, and the winner is determined only by the strategies each player uses to attain them (e.g., chess). In asymmetric games, players have opposing or discordant objectives.
3. Games with perfect information vs. games with imperfect information: In perfect-information games, all participants can observe each other’s moves (e.g., chess). In imperfect-information games, the actions of other players are concealed (e.g., card games).
Game Theory

There are five primary categories of games in game theory:


4. Simultaneous vs. sequential games: In simultaneous games, multiple players can act at once. In sequential games, each player is informed of the prior moves of every other player (e.g., board games).

5. Zero-sum vs. non-zero-sum games: In zero-sum games, whatever one player gains, the other players lose. In non-zero-sum games, multiple players may profit from one another’s gains.
Game Theory
Optimal Decision in Games

Let us start with games with two players, whom we’ll refer to as MAX and MIN for obvious reasons. MAX moves first, and then the players take turns until the game is finished.

At the conclusion of the game, the victorious player receives points, while the loser receives penalties.

A game can be formalized as a type of search problem that has the following elements:
● S0: The initial state of the game, which describes how it is set up at the start.
● Player (s): Defines which player in a state has the move.
● Actions (s): Returns a state’s set of legal moves.
● Result (s, a): A transition model that defines a move’s outcome.
● Terminal-Test (s): A terminal test that returns true if the game is over and false otherwise. Terminal states are those in which the game has come to an end.
Game Theory
Optimal Decision in Games

● Utility (s, p): A utility function (also known as a payoff function or objective function) determines the final numeric value for a game that concludes in terminal state s for player p. In chess, the result is a win, a loss, or a draw, with values +1, 0, or 1/2. Backgammon’s payoffs range from 0 to +192, and some games have an even wider range of possible outcomes. A zero-sum game is defined (somewhat confusingly) as one in which the total payoff to all players is the same for every instance of the game. Chess is zero-sum because each game has a total payoff of 0 + 1, 1 + 0, or 1/2 + 1/2. “Constant-sum” would have been a preferable name, but zero-sum is the usual term and makes sense if you imagine each player is charged an entry fee of 1/2.
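These elements map naturally onto a small program interface. Below is a minimal Python sketch, assuming a hypothetical Game base class; all names are illustrative rather than taken from any particular library:

```python
class Game:
    """Abstract two-player game, mirroring the elements S0, Player(s),
    Actions(s), Result(s, a), Terminal-Test(s), and Utility(s, p)."""

    def initial_state(self):            # S0: how the game is set up
        raise NotImplementedError

    def player(self, state):            # Player(s): who moves in this state
        raise NotImplementedError

    def actions(self, state):           # Actions(s): the legal moves
        raise NotImplementedError

    def result(self, state, action):    # Result(s, a): transition model
        raise NotImplementedError

    def is_terminal(self, state):       # Terminal-Test(s)
        raise NotImplementedError

    def utility(self, state, player):   # Utility(s, p): terminal payoff
        raise NotImplementedError
```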
Game Playing
1.Mini Max

● The mini-max algorithm is a recursive (backtracking) algorithm used in decision-making and game theory. It provides an optimal move for the player, assuming that the opponent also plays optimally.
● The mini-max algorithm uses recursion to search through the game tree.
● The min-max algorithm is mostly used for game playing in AI, such as chess, checkers, tic-tac-toe, Go, and various other two-player games. The algorithm computes the minimax decision for the current state.
● In this algorithm, two players play the game; one is called MAX and the other is called MIN.
● The two players compete, each trying to secure the maximum benefit for themselves while leaving the opponent with the minimum.
Game Playing
1.Mini Max

● Both players are opponents of each other: MAX selects the maximized value and MIN selects the minimized value.
● The mini-max algorithm performs a depth-first search to explore the complete game tree.
● The mini-max algorithm proceeds all the way down to the terminal nodes of the tree, then backtracks up the tree as the recursion unwinds.
Game Playing
1.Mini Max Working

● The working of the mini-max algorithm can be easily described using an example. Below we take an example game tree representing a two-player game.
● In this example, there are two players: one is called Maximizer and the other is called Minimizer.
● Maximizer tries to get the maximum possible score, and Minimizer tries to get the minimum possible score.
● The algorithm applies DFS, so in this game tree we have to go all the way down through the tree to reach the terminal nodes.
● At the terminal nodes, the terminal values are given, so we compare those values and backtrack up the tree until we reach the initial state.
Game Playing
1.Mini Max Working

Step 1: In the first step, the algorithm generates the entire game tree and applies the utility function to obtain the utility values for the terminal states. In the tree diagram below, let A be the initial state of the tree. Suppose the maximizer takes the first turn, with a worst-case initial value of -∞, and the minimizer takes the next turn, with a worst-case initial value of +∞.
Game Playing
1.Mini Max Working

Step 2: Now, we first find the utility values for the Maximizer. Its initial value is -∞, so we compare each terminal value with the Maximizer's initial value and keep the higher one, i.e., the maximum among them all.
● For node D: max(-1, -∞) = -1, then max(-1, 4) = 4
● For node E: max(2, -∞) = 2, then max(2, 6) = 6
● For node F: max(-3, -∞) = -3, then max(-3, -5) = -3
● For node G: max(0, -∞) = 0, then max(0, 7) = 7
Game Playing
1.Mini Max Working

Step 3: In the next step, it is the minimizer's turn, so it compares all node values with +∞ and finds the values of the third-layer nodes.
● For node B: min(4, 6) = 4
● For node C: min(-3, 7) = -3
Game Playing
1.Mini Max Working

Step 4: Now it is the Maximizer's turn, and it again chooses the maximum of all node values to determine the value of the root node. In this game tree there are only 4 layers, so we reach the root node immediately, but in real games there will be many more layers.
● For node A: max(4, -3) = 4

That is the complete workflow of the minimax algorithm for a two-player game.
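As a quick illustration, here is a minimal Python sketch of this exact computation; the example tree is hard-coded as nested lists, with the leaf values used above:

```python
def minimax(node, is_max):
    # Leaf: return the terminal utility value directly.
    if isinstance(node, (int, float)):
        return node
    # Recurse with DFS; MAX and MIN alternate at each level.
    values = [minimax(child, not is_max) for child in node]
    return max(values) if is_max else min(values)

# The example tree: A -> B, C; B -> D, E; C -> F, G; leaves as in the slides.
tree = [[[-1, 4], [2, 6]], [[-3, -5], [0, 7]]]
print(minimax(tree, is_max=True))  # D=4, E=6, B=4; F=-3, G=7, C=-3; A=4
```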
Game Playing

1.Mini Max Properties

● Complete: The mini-max algorithm is complete. It will definitely find a solution (if one exists) in a finite search tree.
● Optimal: The min-max algorithm is optimal if both opponents play optimally.
● Time complexity: As it performs DFS over the game tree, the time complexity of the min-max algorithm is O(b^m), where b is the branching factor of the game tree and m is the maximum depth of the tree.
● Space complexity: The space complexity of the mini-max algorithm is O(bm), similar to DFS.
Game Playing

1.Mini Max Limitations

● The main drawback of the minimax algorithm is that it gets really slow for complex games such as chess, Go, etc.
● These games have a huge branching factor, and the player has many choices to consider.
● This limitation of the minimax algorithm can be mitigated by alpha-beta pruning.
Game Playing
2.Alpha Beta Pruning

● Alpha-beta pruning is a modified version of the minimax algorithm. It is an optimization technique for the minimax algorithm.
● As we saw with the minimax search algorithm, the number of game states it has to examine is exponential in the depth of the tree.
● While we cannot eliminate the exponent, we can effectively cut it in half.
● There is a technique by which we can compute the correct minimax decision without checking every node of the game tree; this technique is called pruning.
● It involves two threshold parameters, alpha and beta, for future expansion, so it is called alpha-beta pruning. It is also called the alpha-beta algorithm.
● Alpha-beta pruning can be applied at any depth of a tree, and it sometimes prunes not only leaves but entire sub-trees.
Game Playing
2.Alpha Beta Pruning

● The two parameters can be defined as:

a. Alpha: The best (highest-value) choice we have found so far at any point along the path of the Maximizer. The initial value of alpha is -∞.
b. Beta: The best (lowest-value) choice we have found so far at any point along the path of the Minimizer. The initial value of beta is +∞.
● Alpha-beta pruning returns the same move as the standard minimax algorithm, but it removes all the nodes that do not really affect the final decision and only slow the algorithm down. Pruning these nodes thus makes the algorithm fast.
Condition for Alpha Beta Pruning:
● The main condition required for alpha-beta pruning is: α >= β
Game Playing
2.Alpha Beta Pruning

Key points for Alpha Beta Pruning


● The Max player will only update the value of alpha.
● The Min player will only update the value of beta.
● While backtracking up the tree, the node values (not the alpha and beta values) are passed to the parent nodes.
● We only pass the alpha and beta values down to the child nodes.
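Putting these key points together, here is a minimal Python sketch of alpha-beta pruning. It uses the walkthrough tree from the next slides; the leaves in the pruned branches are given arbitrary placeholder values, since they are never visited:

```python
def alphabeta(node, is_max, alpha=float("-inf"), beta=float("inf")):
    # Leaf: return the terminal utility value directly.
    if isinstance(node, (int, float)):
        return node
    best = float("-inf") if is_max else float("inf")
    for child in node:
        value = alphabeta(child, not is_max, alpha, beta)
        if is_max:
            best = max(best, value)
            alpha = max(alpha, best)   # only MAX updates alpha
        else:
            best = min(best, value)
            beta = min(beta, best)     # only MIN updates beta
        if alpha >= beta:              # pruning condition
            break
    return best

# A -> B, C; B -> D, E; C -> F, G (E's right child and all of G get pruned).
tree = [[[2, 3], [5, 9]], [[0, 1], [7, 5]]]
print(alphabeta(tree, is_max=True))    # prints 3, as in the walkthrough
```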
Game Playing

2.Working of Alpha Beta Pruning

Let's take an example two-player search tree to understand the working of alpha-beta pruning.

Step 1: In the first step, the Max player starts the first move from node A, where α = -∞ and β = +∞. These values of alpha and beta are passed down to node B, where again α = -∞ and β = +∞, and node B passes the same values to its child D.
Game Playing
2.Working of Alpha Beta Pruning

Step 2: At node D, the value of α is calculated, as it is Max's turn. The value of α is compared first with 2 and then with 3, so max(2, 3) = 3 becomes the value of α at node D, and the node value will also be 3.

Step 3: The algorithm now backtracks to node B, where the value of β changes, as this is Min's turn. β = +∞ is compared with the available successor node value, i.e., min(∞, 3) = 3; hence at node B we now have α = -∞ and β = 3.
● In the next step, the algorithm traverses the next successor of node B, which is node E, and the values α = -∞ and β = 3 are passed along.
Game Playing
2.Working of Alpha Beta Pruning

Step 4: At node E, Max takes its turn, and the value of alpha changes. The current value of alpha is compared with 5, so max(-∞, 5) = 5; hence at node E, α = 5 and β = 3. Since α >= β, the right successor of E is pruned and the algorithm does not traverse it; the value at node E is 5.
Game Playing
2.Working of Alpha Beta Pruning

Step 5: In the next step, the algorithm again backtracks up the tree, from node B to node A. At node A, the value of alpha changes to the maximum available value, 3, as max(-∞, 3) = 3, with β = +∞. These two values are now passed to the right successor of A, which is node C. At node C, α = 3 and β = +∞, and the same values are passed on to node F.

Step 6: At node F, the value of α is compared with the left child, 0: max(3, 0) = 3. It is then compared with the right child, 1: max(3, 1) = 3, so α remains 3, but the node value of F becomes 1 (the maximum of its children, 0 and 1).
Game Playing

2.Working of Alpha Beta Pruning

Step 7: Node F returns the node value 1 to node C. At C, α = 3 and β = +∞; here the value of beta changes: it is compared with 1, so min(∞, 1) = 1. Now at C, α = 3 and β = 1, which again satisfies the condition α >= β, so the next child of C, which is G, is pruned, and the algorithm does not compute the entire subtree rooted at G.
Game Playing

2.Working of Alpha Beta Pruning

Step 8: C now returns the value 1 to A. The best value for A is max(3, 1) = 3. The final game tree shows which nodes were computed and which were never computed (pruned). Hence, the optimal value for the maximizer is 3 in this example.
Game Playing
2.Move Ordering in Alpha Beta Pruning

The effectiveness of alpha-beta pruning is highly dependent on the order in which nodes are examined. Move ordering is therefore an important aspect of alpha-beta pruning.
It can be of two types:
● Worst ordering: In some cases, the alpha-beta pruning algorithm does not prune any leaves of the tree and works exactly like the minimax algorithm. It then also consumes more time because of the alpha-beta bookkeeping; such an ordering is called worst ordering. In this case, the best move occurs on the right side of the tree. The time complexity for such an ordering is O(b^m).
● Ideal ordering: Ideal ordering for alpha-beta pruning occurs when a lot of pruning happens in the tree and the best moves occur on the left side of the tree. We apply DFS, so it searches the left side of the tree first and can go twice as deep as the minimax algorithm in the same amount of time. The complexity under ideal ordering is O(b^(m/2)).
Game Playing
2.Rules to find good Ordering in Alpha Beta Pruning

Following are some rules to find a good ordering in alpha-beta pruning (a short sketch follows this list):

● Try the best move from the shallowest node first.
● Order the nodes in the tree so that the best nodes are checked first.
● Use domain knowledge when finding the best move. For example, in chess, try this order: captures first, then threats, then forward moves, then backward moves.
● We can bookkeep the states, as states may repeat.
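As a small illustration of these rules, one common approach is to sort the legal moves by a heuristic score before the alpha-beta recursion. The sketch below assumes the hypothetical Game interface from earlier; score_move is an assumed domain heuristic, not a standard function:

```python
def ordered_actions(game, state, score_move):
    """Sort legal moves so likely-best moves are searched first,
    which lets alpha-beta prune more aggressively."""
    # score_move(state, action) is an assumed domain heuristic, e.g. one
    # that ranks captures above threats and threats above quiet moves.
    return sorted(game.actions(state),
                  key=lambda a: score_move(state, a), reverse=True)
```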
Game Playing
Monte Carlo Search

● Games like Tic-Tac-Toe, Rubik's Cube, Sudoku, Chess, Go, and many others share a common property: the number of possible actions that can be played grows exponentially.
● These possible moves increase exponentially as the game goes forward.
● Ideally, if you could predict every possible move and its result that may occur in the future, you could increase your chance of winning.
● But since the moves increase exponentially, the computational power required to calculate them also goes through the roof.
● Monte Carlo Tree Search (MCTS) is a method usually used in games to predict the path (moves) that the policy should take to reach the final winning solution.
● Essentially, MCTS uses Monte Carlo simulation to accumulate value estimates that guide the search toward highly rewarding trajectories in the search tree.
Game Playing
Monte Carlo Search

MCTS consists of repeated iterations of four steps: selection, expansion, simulation, and backpropagation.
1) Selection
● In this step, the MCTS algorithm traverses the current tree from the root node using a specific strategy.
● The strategy uses an evaluation function to optimally select the nodes with the highest estimated value.
● MCTS uses the Upper Confidence Bound (UCB) formula applied to trees as the selection strategy to traverse the tree.
● It balances the exploration-exploitation trade-off. During tree traversal, a node is selected based on parameters that return the maximum value.
Game Playing
Monte Carlo Search

The UCB value S_i of a child node i is computed as:

S_i = x_i + C * sqrt(ln(t) / n_i)

where:
S_i = UCB value of node i
x_i = empirical mean (average reward) of node i
C = a constant controlling exploration
t = total number of simulations
n_i = number of visits of node i
● When traversing the tree during the selection step, the child node that returns the greatest value from the above equation is the one that gets selected.
● During traversal, once a child node is found that is also a leaf node, MCTS jumps into the expansion step.
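A minimal Python sketch of this selection step follows. The node attributes (visits, reward, children) are illustrative assumptions, not a standard API:

```python
import math

def ucb_score(child, total_simulations, c=math.sqrt(2)):
    """UCB value S_i = x_i + C * sqrt(ln(t) / n_i) for one child node."""
    if child.visits == 0:
        return float("inf")              # always try unvisited children first
    mean = child.reward / child.visits   # empirical mean x_i
    explore = c * math.sqrt(math.log(total_simulations) / child.visits)
    return mean + explore

def select_child(node, total_simulations):
    # Pick the child with the greatest UCB value, as described above.
    return max(node.children,
               key=lambda ch: ucb_score(ch, total_simulations))
```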
Game Playing
Monte Carlo Search

2) Expansion: In this step, a new child node is added to the tree, attached to the node that was optimally reached during the selection step.
3) Simulation: In this step, a simulation (playout) is performed by choosing moves or strategies until a result or a predefined state is reached.
4) Backpropagation: After determining the value of the newly added node, the rest of the tree must be updated. The backpropagation step propagates from the new node back up to the root node.
During this process, the number of simulations stored in each node is incremented.
Also, if the new node's simulation results in a win, the number of wins is incremented as well.
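The four steps can be tied together in one iteration roughly as follows. This is only a sketch: it assumes the Game interface and select_child from the earlier sketches, plus a hypothetical Node class with state, parent, children, visits, reward, and an expand method:

```python
import random

def mcts_iteration(game, root, total_simulations):
    # 1) Selection: descend from the root via UCB until reaching a leaf.
    node = root
    while node.children:
        node = select_child(node, total_simulations)
    # 2) Expansion: attach children for the legal actions at the leaf.
    if not game.is_terminal(node.state):
        node.expand(game.actions(node.state))
        node = random.choice(node.children)
    # 3) Simulation: play random moves from here to a terminal state.
    state = node.state
    while not game.is_terminal(state):
        state = game.result(state, random.choice(game.actions(state)))
    # Reward is credited from the root player's point of view (a
    # simplification; real implementations alternate perspectives).
    reward = game.utility(state, game.player(root.state))
    # 4) Backpropagation: update counts and rewards up to the root.
    while node is not None:
        node.visits += 1
        node.reward += reward
        node = node.parent
```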
Game Playing
Monte Carlo Search

● These types of algorithms are particularly useful in turn-based games where there is no element of chance in the game mechanics, such as Tic-Tac-Toe, Connect 4, Checkers, Chess, Go, etc.
● It has recently been used by artificial intelligence programs like AlphaGo to play against the world's top Go players.
● But its application is not limited to games.
● It can be used in any situation described by state-action pairs, where simulations can be used to forecast outcomes.
Game Playing
Advantages of Monte Carlo Search

1. MCTS is a simple algorithm to implement.


2. Monte Carlo Tree Search is a heuristic algorithm. MCTS can operate effectively without any domain knowledge apart from the rules and end conditions, and can discover its own moves and learn from them by playing random playouts.
3. MCTS can be saved at any intermediate state, and that state can be reused in future use cases whenever required.
4. MCTS supports asymmetric expansion of the search tree based on the circumstances in which it operates.
Game Playing
Disadvantages of Monte Carlo Search

1. As the tree grows rapidly after a few iterations, it requires a huge amount of memory.
2. There is a bit of a reliability issue with Monte Carlo Tree Search. In certain scenarios, a single branch or path might lead to a loss against the opponent when implemented for turn-based games. This is mainly due to the vast number of move combinations; each node might not be visited enough times to understand its result or outcome in the long run.
3. The MCTS algorithm needs a huge number of iterations to decide the most efficient path effectively, so there is also a bit of a speed issue.
Game Playing
Stochastic Games

● Many unforeseeable external events can place us in unforeseen circumstances in real life.
● Many games, such as those involving dice, include a random element to reflect this unpredictability.
● These are known as stochastic games. Backgammon is a classic game that mixes skill and luck.
● The legal moves are determined by rolling dice at the start of each player's turn. White, for example, has rolled a 6–5 and has four alternative moves in the backgammon scenario.
● This is a standard backgammon position. The object of the game is to get all of one's pieces off the board as quickly as possible. White moves clockwise toward 25, while Black moves counterclockwise toward 0.
● A piece can move to any position unless multiple opponent pieces are there; if there is a single opponent piece, it is captured and must start over.
Game Playing
Stochastic Games

● White has rolled a 6–5 and must pick between four valid moves: (5–10, 5–11), (5–11, 19–24), (5–10, 10–16), and (5–11, 11–16), where the notation (5–11, 11–16) denotes moving one piece from position 5 to 11 and then another from 11 to 16.
● White knows his or her own legal moves, but has no idea how Black will roll, and thus no idea what Black's legal moves will be.
● That means White cannot build a normal game tree as in chess or tic-tac-toe.
● In backgammon, a game tree must include chance nodes in addition to MAX and MIN nodes.
● The figure below depicts chance nodes as circles. The possible dice rolls are indicated by the branches leading from each chance node; each branch is labeled with the roll and its probability.
Game Playing
Stochastic Games

● There are 36 ways to roll two dice, each equally likely, yet there are only 21 distinct rolls because a 6–5 is the same as a 5–6. Each of the six doubles (1–1 through 6–6) has a probability of 1/36, so P(1–1) = 1/36. Each of the other 15 rolls has a probability of 1/18.

● The next step is to learn how to make good decisions. Obviously, we want to pick the move that leads to the best position. However, positions do not have definite minimum and maximum values. Instead, we can only compute a position's expected value: the average over all possible outcomes of the chance nodes.
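For illustration, the expected value at a chance node is just a probability-weighted average, as in this small sketch (the position values are placeholders, not real evaluations):

```python
def chance_value(outcomes):
    """Expected value of a chance node, given (probability, value) pairs."""
    return sum(p * v for p, v in outcomes)

# Backgammon-style roll distribution: 6 doubles at 1/36 each and 15 other
# rolls at 1/18 each. The position values are placeholders for illustration.
rolls = [(1/36, 0.5)] * 6 + [(1/18, 0.2)] * 15
print(sum(p for p, _ in rolls))  # probabilities sum to 1 (up to rounding)
print(chance_value(rolls))       # expected value over all possible rolls
```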
Game Playing
Partially Observable Games

● Partial observability means that an agent does not know the full state of the world, or that agents act simultaneously.
● Partial observability in the multi-agent case is more complicated than the fully observable multi-agent case or the partially observable single-agent case.
● It is one of the best settings for illustrating important issues that arise even with just two agents, each with a few choices.
● A partially observable system is one in which the entire state of the system is not fully visible to an external sensor.
● In a partially observable system, the observer may use a memory system to add information to its understanding of the system.
Game Playing
Partially Observable Games

● An example of a partially observable system is a card game in which some of the cards are discarded into a pile face down.
● In this case, the observer can only view their own cards and potentially those of the dealer.
● They cannot view the face-down (used) cards, nor the cards that will be dealt at some stage in the future.
● A memory system can be used to remember the previously dealt cards that are now in the discard pile.
● This adds to the total knowledge the observer can use to make decisions.
Game Playing
Partially Observable Games

● In contrast, a fully observable system is chess: apart from the 'who moves next' state, the full state of the system is observable at any point in time. Partially observable is a term used in a variety of mathematical settings, including artificial intelligence and partially observable Markov decision processes (POMDPs).
● Chess has often been described as war in miniature, but it lacks at least one major characteristic of real wars, namely, partial observability.
● In the "fog of war," the existence and disposition of enemy units is often unknown until revealed by direct contact. As a result, warfare includes the use of scouts and spies to gather information, and the use of concealment and bluff to confuse the enemy.
● Partially observable games share these characteristics and are thus qualitatively different from fully observable games.
Constraint Satisfaction Problem(CSP)
● In a CSP, we have a set of variables with known domains and a set of constraints that impose restrictions on the values those variables can take.
● Our task is to assign a value to each variable so that we fulfill all the constraints. So, to formally define a CSP, we specify:
a. the set of variables X = {X1, ..., Xn},
b. the set of domains D = {D1, ..., Dn}, one per variable, and
c. the set of constraints C that the assigned values must satisfy.
Constraint Satisfaction Problem(CSP)
CSP as Search Problem

● The domains and variables together determine the set of all possible assignments (solutions), which can be complete or partial.
● Finding the one(s) that satisfy all the constraints is a search problem, just like finding the shortest path between two nodes in a graph.
● The nodes of the CSP search graph represent assignments, and an edge from node u to node v corresponds to a change in a single variable.
● The start node is the empty assignment, with all variables unassigned.
● Each time we cross an edge, we change the value of exactly one variable.
● The search stops when we find a complete assignment that satisfies all the constraints.
● The constraint-satisfaction check corresponds to the goal test in general search.
Constraint Satisfaction Problem(CSP)

Constraint Graph

● We can visualize a CSP and the structure of its solutions as a constraint graph.
● If all the constraints are binary, the nodes of the graph represent the CSP variables, and the edges denote the constraints acting upon them.
● However, if the constraints involve more than two variables (in which case we call them global), we create constraint hypergraphs.
● They contain two types of nodes: regular nodes that represent variables, and hypernodes that represent constraints.
Constraint Satisfaction Problem(CSP)
Backtracking Solver

● The idea is to start from an empty solution and set the variables one by one until we have assigned values to all of them.
● When setting a variable, we consider only the values consistent with those of the previously set variables.
● If we realize that we cannot set a variable without violating a constraint, we backtrack to one of the variables we set before.
● Then we set it to the next value in its domain and continue with the other unassigned variables.
● Although the order in which we assign the variables their values doesn't affect the correctness of the solution, it does influence the algorithm's efficiency.
● The same goes for the order in which we try the values in a variable's domain.
Constraint Satisfaction Problem(CSP)
Backtracking Solver

● Moreover, it turns out that whenever we set a variable, we can infer which values of the other variables are inconsistent with the current assignment and discard them without traversing the whole sub-tree.
● The way we implement variable selection, value ordering, and inference is crucial for performance.
● In traditional search, we improved the algorithms by formulating problem-specific heuristics that guided the search efficiently.
● However, it turns out that there are domain-independent techniques we can use to speed up the backtracking solver for CSPs.
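To make the procedure concrete, here is a minimal backtracking solver sketch with a small map-coloring CSP. The representation (variables, domains, and binary constraints as predicates) is one simple choice among many:

```python
def consistent(var, value, assignment, constraints):
    """Check value against all constraints touching already-set variables."""
    for a, b, pred in constraints:
        if a == var and b in assignment and not pred(value, assignment[b]):
            return False
        if b == var and a in assignment and not pred(assignment[a], value):
            return False
    return True

def backtrack(variables, domains, constraints, assignment=None):
    assignment = assignment or {}
    if len(assignment) == len(variables):
        return assignment                        # every variable is set
    var = next(v for v in variables if v not in assignment)
    for value in domains[var]:
        # Consider only values consistent with previously set variables.
        if consistent(var, value, assignment, constraints):
            assignment[var] = value
            result = backtrack(variables, domains, constraints, assignment)
            if result is not None:
                return result
            del assignment[var]                  # undo and try the next value
    return None                                  # triggers backtracking above

# A small map-coloring CSP: adjacent regions must get different colors.
variables = ["WA", "NT", "SA", "Q"]
domains = {v: ["red", "green", "blue"] for v in variables}
neq = lambda x, y: x != y
constraints = [("WA", "NT", neq), ("WA", "SA", neq),
               ("NT", "SA", neq), ("NT", "Q", neq), ("SA", "Q", neq)]
print(backtrack(variables, domains, constraints))
```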
Thank you!
