BTech Advanced AI Unit02
Dr.Vaishnaw G.Kale
Associate Professor
Department of Computer Science
Engineering & Applications
D.Y.Patil International University, Pune
About the Course
Game Theory, Game Playing Techniques like Minimax and Alpha Beta Pruning, Monte Carlo
Tree Search, Stochastic Games, Partially Observable Games, Limitations of Game Search
Algorithms, Constraint Satisfaction Problems (CSP)
● In addition, machine learning models and many events in daily life can be described using
game theory.
● For instance, a classification technique like SVM (Support Vector Machines) can be framed as a
two-person game in which one player challenges the other to find the best hyper-plane by handing
over the most demanding points to classify.
● The outcome of the game then condenses into a trade-off between the two players’ strategic
prowess (e.g., how well the first player challenged the second with difficult data points, and how
well the second player identified the best decision boundary).
Game Theory
5. Games that are Zero-Sum versus Non-Zero-Sum: In zero-sum games, whatever one player gains,
the other players lose. In non-zero-sum games, by contrast, several players may profit from one
another’s gains.
Optimal Decision in Games
Let us start with games with two players, whom we’ll refer to as MAX and MIN for obvious reasons. MAX
is the first to move, and then they take turns until the game is finished.
At the conclusion of the game, the victorious player receives points, while the loser receives penalties.
A game can be formalized as a type of search problem that has the following elements:
● S0: The initial state of the game, which describes how it is set up at the start.
● Player(s): Defines which player has the move in a state.
● Actions(s): Returns the set of legal moves in a state.
● Result(s, a): The transition model, which defines the outcome of a move.
● Terminal-Test(s): A terminal test, which returns true if the game is over and false otherwise.
States in which the game has come to a conclusion are called terminal states.
● Utility(s, p): A utility function (also known as a payoff function or objective function)
determines the final numeric value for a game that concludes in terminal state s for player p.
The result in chess is a win, a loss, or a draw, with values of +1, 0, or 1/2. Backgammon’s
payoffs range from 0 to +192, and some games have an even greater range of possible outcomes. A
zero-sum game is defined (confusingly) as one in which the total payoff to all players is the
same for every instance of the game. Chess is zero-sum because each game has a total payoff of 0 +
1, 1 + 0, or 1/2 + 1/2. “Constant-sum” would have been a preferable name, but zero-sum is
the traditional term and makes sense if each player is charged an entry fee of 1/2.
Game Playing
1.Mini Max
● Both players of the game are opponents of each other: MAX selects the maximized
value and MIN selects the minimized value.
● The mini-max algorithm performs a depth-first search to explore the
complete game tree.
● The mini-max algorithm proceeds all the way down to the terminal nodes of the tree, then
backtracks up the tree as the recursion unwinds.
1.Mini Max Working
● The working of the mini-max algorithm is easiest to describe with an example. Below we
take an example game tree representing a two-player game.
● In this example, there are two players: one is called the Maximizer and the other the Minimizer.
● The Maximizer tries to get the maximum possible score, and the Minimizer tries to get the
minimum possible score.
● The algorithm applies DFS, so in this game tree we go all the way down through the leaves
to reach the terminal nodes.
● The terminal values are given at the terminal nodes, so we compare those values and
backtrack up the tree until the initial state is reached.
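The working described above can be sketched in Python. This is a minimal illustration, not the slides' own tree: the game tree is a hand-built nested list with hypothetical integer utilities at the leaves.

```python
# Minimal minimax sketch: leaves are integers (terminal utilities),
# internal nodes are lists of children. DFS down, compare values, backtrack up.
def minimax(node, maximizing):
    if isinstance(node, int):          # terminal node: return its utility
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# Hypothetical depth-2 game tree: MAX moves first, then MIN.
tree = [[3, 5], [2, 9]]
print(minimax(tree, True))  # → 3: MAX picks the better of MIN's two choices
```

At the root, MIN would reduce the left branch to min(3, 5) = 3 and the right branch to min(2, 9) = 2, so MAX chooses 3.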
1.Mini Max Properties
● Complete- The mini-max algorithm is complete. It will definitely find a solution (if one exists)
in a finite search tree.
● Optimal- The mini-max algorithm is optimal if both opponents play optimally.
● Time complexity- Since it performs DFS over the game tree, the time complexity of mini-max
is O(b^m), where b is the branching factor of the game tree and m is the maximum depth
of the tree.
● Space complexity- The space complexity of mini-max is the same as for DFS, which is
O(bm).
● The main drawback of the minimax algorithm is that it gets really slow for complex games such
as chess, Go, etc.
● These games have a huge branching factor, so the player has many choices to consider.
● This limitation of the minimax algorithm can be improved upon with alpha-beta pruning.
2.Alpha Beta Pruning
● Alpha-beta pruning is a modified version of the minimax algorithm: an optimization technique for
the minimax algorithm.
● As we saw with the minimax search algorithm, the number of game states it has to examine is
exponential in the depth of the tree.
● We cannot eliminate the exponent, but we can cut it in half.
● There is a technique by which we can compute the correct minimax decision without checking
every node of the game tree; this technique is called pruning.
● It involves two threshold parameters, alpha and beta, for future expansion, so it is called
alpha-beta pruning. It is also called the alpha-beta algorithm.
● Alpha-beta pruning can be applied at any depth of the tree, and sometimes it prunes not only
leaves but entire sub-trees.
a. Alpha: The best (highest-value) choice found so far at any point along the path of the Maximizer.
The initial value of alpha is -∞.
b. Beta: The best (lowest-value) choice found so far at any point along the path of the Minimizer.
The initial value of beta is +∞.
● Alpha-beta pruning returns the same move as the standard minimax algorithm does, but it
removes all the nodes that do not really affect the final decision and only make the algorithm
slow. By pruning these nodes, it makes the algorithm fast.
Condition for Alpha Beta Pruning:
● The main condition required for alpha-beta pruning is: α >= β
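The condition α >= β can be sketched in Python as follows. This is a minimal sketch over a hand-built nested-list tree with hypothetical integer leaves, not the slides' own example.

```python
import math

# Alpha-beta sketch: leaves are integers, internal nodes are lists of children.
# Whenever alpha >= beta, the remaining children cannot change the decision.
def alphabeta(node, alpha, beta, maximizing):
    if isinstance(node, int):
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:       # pruning condition from the slides
                break               # skip the remaining successors
        return value
    else:
        value = math.inf
        for child in node:
            value = min(value, alphabeta(child, alpha, beta, True))
            beta = min(beta, value)
            if alpha >= beta:
                break
        return value

tree = [[3, 5], [2, 9]]
print(alphabeta(tree, -math.inf, math.inf, True))  # → 3, same move as plain minimax
```

In this tree, after the left branch yields 3, the right branch is cut off as soon as the 2 is seen (α = 3 >= β = 2), so the leaf 9 is never examined.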
Step 1: In the first step, the Max player makes the first
move from node A, where α = -∞ and β = +∞. These values
of alpha and beta are passed down to node B, where again
α = -∞ and β = +∞, and node B passes the same values to
its child D.
2.Working of Alpha Beta Pruning
Step 4: At node E, Max takes its turn, and the value of
alpha changes. The current value of alpha is compared
with 5, so max(-∞, 5) = 5; hence at node E, α = 5
and β = 3. Since α >= β, the right successor of E is
pruned and the algorithm does not traverse it, and the value at
node E becomes 5.
The effectiveness of alpha-beta pruning is highly dependent on the order in which each node is examined.
Move order is an important aspect of alpha-beta pruning.
It can be of two types:
● Worst ordering: In some cases, the alpha-beta pruning algorithm does not prune any leaves of the
tree and works exactly like the minimax algorithm. It then also consumes more time because of the
alpha-beta bookkeeping; such a move ordering is called worst ordering. In this case, the best move
occurs on the right side of the tree. The time complexity for such an ordering is O(b^m).
● Ideal ordering: The ideal ordering for alpha-beta pruning occurs when a lot of pruning happens in
the tree, and the best moves occur on the left side of the tree. Since we apply DFS, the algorithm
searches the left of the tree first and can go twice as deep as minimax in the same amount of time.
The complexity with ideal ordering is O(b^(m/2)).
3.Monte Carlo Tree Search
● Games like Tic-Tac-Toe, Rubik’s Cube, Sudoku, Chess, Go and many others share a common
property: the number of possible actions that can be played grows exponentially.
● These possible moves increase exponentially as the game goes forward.
● Ideally, if you could predict every possible move and its result that may occur in the future,
you could increase your chance of winning.
● But since the moves increase exponentially, the computational power required to calculate the
moves also goes through the roof.
● Monte Carlo Tree Search (MCTS) is a method usually used in games to predict the path (moves) that
the policy should take to reach the final winning solution.
● Essentially, MCTS uses Monte Carlo simulation to accumulate value estimates that guide the search
towards highly rewarding trajectories in the search tree.
Monte Carlo Search
MCTS consists of repeated iterations of 4 steps: selection, expansion, simulation and backpropagation.
1) Selection
● In this process, the MCTS algorithm traverses the current tree from the root node using a specific
strategy.
● The strategy uses an evaluation function to optimally select nodes with the highest estimated
value.
● MCTS uses the Upper Confidence Bound (UCB) formula applied to trees as the strategy in the
selection process to traverse the tree.
● It balances the exploration-exploitation trade-off. During tree traversal, a node is selected based
on some parameters that return the maximum value.
The UCB formula applied to trees (UCT) selects the child that maximizes:

Si = xi + C * sqrt(ln(t) / ni)

where:
Si = value of a node i
xi = empirical mean (average reward) of a node i
C = a constant (the exploration parameter)
t = total number of simulations
ni = number of simulations for node i
● When traversing the tree during the selection process, the child node
that returns the greatest value from the above equation is the one
that gets selected.
● During traversal, once a child node is found that is also a leaf
node, MCTS moves on to the expansion step.
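As a minimal sketch, the selection score can be computed per child from its win/visit statistics. The `(wins, visits)` pairs and the choice C = √2 are hypothetical assumptions, not values from the slides.

```python
import math

# UCB1 score for one child: empirical mean plus an exploration bonus.
def ucb1(wins, visits, total_simulations, c=math.sqrt(2)):
    if visits == 0:
        return math.inf                 # always try unvisited children first
    return wins / visits + c * math.sqrt(math.log(total_simulations) / visits)

# Children as hypothetical (wins, visits) pairs; t = 14 total simulations.
children = [(6, 10), (3, 4), (0, 0)]
best = max(children, key=lambda ch: ucb1(ch[0], ch[1], total_simulations=14))
print(best)  # → (0, 0): the unvisited child scores +inf and is explored first
```

Note how the bonus term favors rarely visited children, which is exactly the exploration-exploitation balance described above.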
2) Expansion: In this process, a new child node is added to the tree, attached to the node that was
optimally reached during the selection process.
3) Simulation: In this process, a simulation (playout) is performed by choosing moves or strategies
until a result or a predefined state is achieved.
4) Backpropagation: After the value of the newly added node has been determined, the remaining tree
must be updated. So the backpropagation process is performed, propagating from the new
node back to the root node.
During this process, the simulation count stored in each node is incremented.
Also, if the new node’s simulation results in a win, the win count is incremented as well.
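The four steps can be sketched together in a toy loop. This is a sketch of the control flow only: a one-level tree whose "moves" pay off with hypothetical probabilities, not a full game-playing MCTS.

```python
import math
import random

# Toy MCTS loop: each child of the root is one "move"; the simulation step
# draws a random win/loss. Payoff probabilities are hypothetical.
random.seed(0)
ARM_WIN_PROB = [0.3, 0.8]

class Node:
    def __init__(self, arm=None):
        self.arm, self.wins, self.visits, self.children = arm, 0, 0, []

def ucb1(child, parent_visits, c=math.sqrt(2)):
    if child.visits == 0:
        return math.inf                 # unvisited children come first
    return child.wins / child.visits + c * math.sqrt(math.log(parent_visits) / child.visits)

root = Node()
for _ in range(1000):
    # 1) Selection + 2) Expansion: children are added once, then chosen by UCB1.
    if not root.children:
        root.children = [Node(arm=a) for a in range(len(ARM_WIN_PROB))]
    leaf = max(root.children, key=lambda ch: ucb1(ch, max(root.visits, 1)))
    # 3) Simulation: play the chosen move out to a win (1) or a loss (0).
    reward = 1 if random.random() < ARM_WIN_PROB[leaf.arm] else 0
    # 4) Backpropagation: update visit and win counts back up to the root.
    for node in (leaf, root):
        node.visits += 1
        node.wins += reward

best = max(root.children, key=lambda ch: ch.visits)
print(best.arm)  # the move with the higher win rate accumulates the most visits
```

After enough iterations, the visit counts concentrate on the stronger move, which is how MCTS turns accumulated simulation statistics into a decision.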
● These types of algorithms are particularly useful in turn-based games where there is no element
of chance in the game mechanics, such as Tic-Tac-Toe, Connect 4, Checkers, Chess, Go, etc.
● MCTS has recently been used by artificial intelligence programs like AlphaGo to play against the
world’s top Go players.
● But its application is not limited to games.
● It can be used in any situation that is described by state-action pairs, where simulations can be
used to forecast outcomes.
Disadvantages of Monte Carlo Search
1. As the tree grows rapidly after a few iterations, the algorithm requires a huge amount of memory.
2. There is a bit of a reliability issue with Monte Carlo Tree Search. In certain scenarios, there
might be a single branch or path that leads to a loss against the opponent when the method is used
in turn-based games. This is mainly due to the vast number of combinations: each node
might not be visited enough times to understand its result or outcome in the
long run.
3. The MCTS algorithm needs a huge number of iterations to decide the most efficient path
effectively. So there is a bit of a speed issue as well.
Stochastic Games
● In real life, many unforeseeable external events can place us in unforeseen circumstances.
● Many games, such as those involving dice, include a random element to reflect this unpredictability.
● These are known as stochastic games. Backgammon is a classic game that mixes skill and luck.
● The legal moves are determined by rolling dice at the start of each player’s turn. White, for
example, has rolled a 6–5 and has four alternative moves in the backgammon scenario.
● This is a standard backgammon position. The object of the game is to move all of one’s pieces off
the board as quickly as possible. White moves in a clockwise direction toward 25, while Black
moves in a counterclockwise direction toward 0.
● A piece can advance to any position unless several opponent pieces are there; if there is exactly
one opponent piece, it is captured and must start over.
● White has rolled a 6–5 and must pick between four valid moves: (5–10,5–11), (5–11,19–24), (5–
10,10–16), and (5–11,11–16), where the notation (5–11,11–16) denotes moving one piece from
position 5 to 11 and then another from 11 to 16.
● White knows his or her own legal moves, but has no idea how Black will roll, and thus
no idea what Black’s legal moves will be.
● That means White cannot build a normal game tree as in chess or tic-tac-toe.
● In backgammon, in addition to MAX and MIN nodes, the game tree must include chance nodes.
● The figure below depicts chance nodes as circles. The possible dice rolls are indicated by the
branches leading from each chance node; each branch is labeled with the roll and its probability.
● The next step is to learn how to make good decisions. Obviously, we want to choose the move that
will put us in the best position. Positions, however, do not have definite minimum and maximum
values. Instead, we can only compute a position’s expected value: the average over all possible
outcomes of the chance nodes, weighted by their probabilities.
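The expected-value computation at chance nodes can be sketched as an expectiminimax function. The tree and its probabilities below are hypothetical examples, not a backgammon position.

```python
# Expectiminimax sketch: chance nodes average their children's values,
# weighted by probability. Nodes are hand-built tagged tuples.
def expectiminimax(node):
    kind = node[0]
    if kind == "leaf":
        return node[1]
    if kind == "max":
        return max(expectiminimax(ch) for ch in node[1])
    if kind == "min":
        return min(expectiminimax(ch) for ch in node[1])
    if kind == "chance":                      # node[1] is [(prob, child), ...]
        return sum(p * expectiminimax(ch) for p, ch in node[1])

# MAX chooses between a sure 3 and a fair coin flip between 0 and 10.
tree = ("max", [("leaf", 3),
                ("chance", [(0.5, ("leaf", 0)), (0.5, ("leaf", 10))])])
print(expectiminimax(tree))  # → 5.0: the gamble's expected value beats the sure 3
```

The chance node evaluates to 0.5 · 0 + 0.5 · 10 = 5.0, so under expected value the uncertain branch is preferred over the certain 3.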
Partially Observable Games
● Partial observability means that an agent does not know the state of the world, or that the agents
act simultaneously.
● Partial observability in the multiagent case is more complicated than the fully observable
multiagent case or the partially observable single-agent case.
● Important issues arise even in the simple case of two agents, each with only a few choices.
● A partially observable system is one in which the entire state of the system is not fully visible
to an external sensor.
● In a partially observable system, the observer may utilize a memory system in order to add
information to its understanding of the system.
● An example of a partially observable system is a card game in which some of the cards
are discarded into a pile face down.
● In this case, the observer is only able to view their own cards and potentially those of the
dealer.
● They are not able to view the face-down (used) cards, nor the cards that will be dealt at some
stage in the future.
● A memory system can be used to remember the previously dealt cards that are now on the discard
pile.
● This adds to the total sum of knowledge that the observer can use to make decisions.
● In contrast, chess is a fully observable system: apart from the 'who moves next' state, the full
state of the system is observable at any point in time. "Partially observable" is a term used in a
variety of mathematical settings, including artificial intelligence and partially observable
Markov decision processes.
● Chess has often been described as war in miniature, but it lacks at least one major characteristic
of real wars, namely partial observability.
● In the “fog of war,” the existence and disposition of enemy units is often unknown until revealed
by direct contact. As a result, warfare includes the use of scouts and spies to gather information,
and the use of concealment and bluff to confuse the enemy.
● Partially observable games share these characteristics and are thus qualitatively different from
fully observable games.
Constraint Satisfaction Problem (CSP)
● In a CSP, we have a set of variables with known domains, and a set of constraints that impose
restrictions on the values those variables can take.
● Our task is to assign a value to each variable so that we fulfill all the constraints. So, to
formally define a CSP, we specify:
● X: the set of variables,
● D: the domains of values the variables can take, and
● C: the set of constraints over those variables.
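As a concrete sketch, a tiny (hypothetical) map-coloring CSP can be written down with exactly these components: variables, domains, and constraints.

```python
# A CSP spelled out concretely: 3-coloring three hypothetical map regions.
variables = ["WA", "NT", "SA"]
domains = {v: ["red", "green", "blue"] for v in variables}
# Binary constraints: neighbouring regions must take different colours.
constraints = [("WA", "NT"), ("WA", "SA"), ("NT", "SA")]

def satisfies(assignment):
    # A (possibly partial) assignment is fine as long as no constraint
    # between two already-assigned variables is violated.
    return all(assignment[a] != assignment[b] for a, b in constraints
               if a in assignment and b in assignment)

print(satisfies({"WA": "red", "NT": "green", "SA": "blue"}))  # → True
print(satisfies({"WA": "red", "NT": "red"}))                  # → False
```

Note that `satisfies` accepts partial assignments too, which matches the idea of partial solutions used below.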
CSP as Search Problem
● The domains and variables together determine the set of all possible assignments (solutions),
which can be complete or partial.
● Finding the one(s) that satisfy all the constraints is a search problem, just like finding the
shortest path between two nodes in a graph.
● The nodes of the CSP search graph represent assignments, and an edge from node u to node v
corresponds to a change in a single variable.
● The start node is the empty solution, with all variables unassigned.
● Each time we cross an edge, we change the value of exactly one variable.
● The search stops when we find a complete assignment that satisfies all the constraints.
● The constraint-satisfaction check corresponds to the goal test in general search.
Constraint Graph
● We can visualize the CSP and the structure of its solutions as a constraint graph.
● If all the constraints are binary, the nodes in the graph represent the CSP variables, and the
edges denote the constraints acting upon them.
● However, if the constraints involve more than two variables (in which case, we call them
global), we create constraint hyper-graphs.
● They contain two types of nodes: the regular nodes that represent variables and hyper-nodes
that represent constraints.
Backtracking Solver
● The idea is to start from an empty solution and set the variables one by one until all are
assigned.
● When setting a variable, we consider only the values consistent with those of the previously set
variables.
● If we realize that we can’t set a variable without violating a constraint, we backtrack to one of
those we set before.
● Then we set it to the next value in its domain and continue with the other unassigned variables.
● Although the order in which we assign the variables their values doesn’t affect the correctness of
the solution, it does influence the algorithm’s efficiency.
● The same goes for the order in which we try the values in a variable’s domain.
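The loop described above can be sketched for a tiny (hypothetical) map-coloring CSP: assign variables one by one, check consistency against already-set variables, and undo on failure.

```python
# Backtracking sketch for a hypothetical 3-coloring CSP.
variables = ["WA", "NT", "SA"]
domains = {v: ["red", "green", "blue"] for v in variables}
constraints = [("WA", "NT"), ("WA", "SA"), ("NT", "SA")]  # neighbours differ

def consistent(var, value, assignment):
    # Check only constraints that involve var and an already-set variable.
    return all(assignment.get(other) != value
               for a, b in constraints
               for other in ((b,) if a == var else (a,) if b == var else ()))

def backtrack(assignment):
    if len(assignment) == len(variables):      # complete assignment found
        return assignment
    var = next(v for v in variables if v not in assignment)
    for value in domains[var]:
        if consistent(var, value, assignment):
            assignment[var] = value
            result = backtrack(assignment)
            if result is not None:
                return result
            del assignment[var]                # undo and try the next value
    return None                                # triggers backtracking upstream

solution = backtrack({})
print(solution)  # → {'WA': 'red', 'NT': 'green', 'SA': 'blue'}
```

The fixed variable and value orders here are the simplest choice; as the slides note, smarter orderings (and inference) change efficiency, not correctness.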
● Moreover, it turns out that whenever we set a variable, we can infer which values of the other
variables are inconsistent with the current assignment and discard them without traversing the
whole sub-tree.
● The way we implement variable selection, value ordering, and inference is crucial for
performance.
● In traditional search, we improved the algorithms by formulating problem-specific heuristics
that guided the search efficiently.
● However, it turns out that there are domain-independent techniques we can use to speed up the
backtracking solver for CSPs.
Thank you!