HBG 2021 08
HBG 2021 08
HBG 2021 08
ISSN 1651-2197
LU-CS/HBG-EX: 2021-08
Hbg-2021-08
Bachelor thesis:
Henrik Brange
© Copyright Henrik Brange
Printed in Sweden
Media-Tryck
Biblioteksdirektionen
Lunds universitet
Lund 2021
Abstract
In making a chess engine using alpha-beta search there are many ways to reduce
its execution time and thereby improve overall performance. One way is by using
move ordering heuristics, which are heuristics that reduce the search space by
attempting to make a rough estimation about which moves are most promising
before starting a search of a board position. Another example is using a
transposition table, which can reduce redundant computations. A third example is
multithreading, which is an attempt to utilize the several cores found in most
modern computer to improve the amount of information found per unit of time.
The purpose of this thesis is to investigate four different commonly used algorithms
and heuristics to see how much they can reduce the execution time of a chess
engine. These algorithms are the move-ordering heuristic known as MVV-LVA, a
transposition table, iterative deepening and Lazy SMP parallel search.
A Java chess engine named KLAS was implemented and used to measure the
impact of these four algorithms and heuristics. The largest impact came from the
MVV-LVA move ordering heuristic which decreased average execution time by
68.5%. Second most significant was the transposition table, which led to an
average decrease of 39.8% execution time. Lazy SMP and iterative deepening
resulted in average execution time reduction of 33.1% and 28.7% respectively.
All-node – A node where no moves cause a cut-off nor increase alpha or beta.
Bitboard – A binary number where each bit represents the location of a piece.
Branching factor – The factor at which a search tree branches, or in other words,
the number of possible moves on a given board position.
Depth – The distance, in nodes, from the root node to the current node.
Principle Variation – The specific line of moves that have led to the result
returned by an alpha-beta search.
PV-node – A node where at least one move increased alpha or beta and no move
caused a cut-off.
Search – A chess engine considering different moves and the positions they lead
to.
• RQ1.4 How much execution time can Lazy SMP save on average?
This thesis was carried out on behalf of the department of Computer Science at
LTH.
1
2
2 Background
In 1948, Alan Turing wrote the chess engine Turochamp for a computer that had
not yet been invented. Turing tried to implement Turochamp on one of the
computers available to him at the time, but unfortunately it failed as the computer
did not have the computing power necessary. Turing used algorithms in
Turochamp that are still used in chess engines today, such as: a minimax search
method, variable search depth and an evaluation method which considers the
mobility of chess pieces, king safety and the ability to castle. Turochamp was
capable of playing chess on par with an inexperienced amateur [2].
Claude Shannon was also among the first to investigate the possibility of using
computers for playing chess. In 1949, Shannon estimated the complexity of chess
and determined that the total number of possible games of chess is at least 10120.
This number has since came to be known as the Shannon number [3].
The time required for a computer to fully analyse a game of chess can be
estimated by dividing Shannon’s number by the number of floating-point operations
the computer can perform per second and multiplying by the number of such
operations needed to analyse a single board position. To get a very optimistic
approximation we can pretend that analysing a board position takes one floating-
point operation.
The fastest supercomputer in the world as of July 20211 has a computing power of
about 400 petaFLOPS, that is, 4 ∙ 1017 floating-point operations per second. If we
use this number with the estimation method above, we get 10120 / (4 ∙ 1017)
seconds or 8 ∙ 1085 billion years to fully analyse a chess game from the initial
position. This shows that it is clearly beyond the capabilities of conventional
computers to optimally play chess.
To figure out what moves are good and which ones aren’t, chess engines perform
something called a “search”. Searching is when a chess engine considers what
moves are available on the board as it is currently and then plays them on its
internal board representation to then repeat the process. At some point, the engine
must stop considering possibilities and instead evaluate the boards it has found so
far as the possibilities in chess rise exponentially with each move.
1
The supercomputer Fugaku located at the RIKEN Center for Computational Sciencce in Japan.
3
The ever-present lack of computing power gives rise to what is called the horizon
effect in the fields of AI and game theory. When a chess engine searches on a
board position, or in other words, considers different possible outcomes of the
available moves in that position, the horizon effect must be considered.
It arises, in a turn-based game, when the computer searches down to a limited
depth of possibilities and makes a move based on the information it has acquired
from this search. There is always the possibility that, if the computer had searched
even deeper, it would have found that this move is a bad choice. However, lacking
this search depth, it proceeds with the move believing it to be a good choice. The
horizon effect can never be fully eliminated in chess because the search depth and
information of the engine is always limited until nearing the end of the game and so
one must find algorithms and heuristics that can evaluate different chess positions
and make good move even when it is uncertain what will happen later during the
game.
The board representation contains all the relevant information about a chess
board. It includes the pieces that are still on the board, their location as well as
whether either of the two players has castled or has lost the ability to castle on the
kingside and/or the queenside.
4
The evaluation method is static, it does not consider what moves can be played nor
the possibilities that may occur after any moves have been played as that is the
responsibility of the search method. Traditionally evaluation was hand-crafted,
perhaps by skilled chess players, but today many of the more powerful chess
engines apply neural networks to perform the evaluation.
For a chess engine to be able to accurately determine how good a move is, it
needs to know what may occur after the move has been played. Therefore, a
search method is used for exploring possible future moves. The search method
requires an internal board representation and evaluation method to do just that.
Figure 2.1.1. A search tree. In the analysis of engines playing two-player turn-based games, search trees are
frequently used to represent the logical flow of the engine’s calculations. In these trees the edges represent
moves and the nodes are the resulting positions. The numbers inside of the nodes are the evaluations of the
position by the engine.
Many optimizations for improving chess engine performance involve the concept of
“pruning”. Pruning is when a chess engine rejects moves that it deems unlikely to
be played, to avoid unnecessarily dedicating time towards analysing them.
Through smart heuristics, an engine can realize that certain moves are surely
worse than others and thus not worth considering further. The more pruning that
occurs, the faster an engine can reach a certain search depth; however, this also
sometimes entails the risk of a good move being pruned despite warranting
consideration.
5
One of the most common archetypical search methods for turn-based, two-player
games is the minimax algorithm. A minimax algorithm is a recursive algorithm that
seeks to alternately maximize and minimize the score of the game. This is based
on the fair assumption that both players are trying to play their best moves, which
for one player means maximizing the evaluation and for the other, minimizing it.
The result it seeks is either the lowest evaluation the maximizing player is assured
of or the highest evaluation the minimizing player is assured of, given a certain
position. In this thesis, an enhanced form of a minimax algorithm is implemented
and examined, specifically using an optimization called alpha-beta pruning.
Optimizations used to improve the speed and playing ability of chess engines can
be divided into groups based on how they achieve this goal:
6
3 The KLAS Chess Engine
The repository for the KLAS chess engine can be found on Github at:
https://github.com/henrikbr21/examensarbete21
This chapter explains the inner workings of the chess engine KLAS as well as any
technical information that is necessary to understand this paper in its entirety. The
KLAS chess engine contains the following critical components, each explained in
its own section:
• Check & checkmate methods, used in both the move generator and
evaluation method.
In addition to the vital components listed above, KLAS employs four different
optimizations for improving execution time:
7
3.1 Search Trees
In combinational game theory, game trees are graphs used to represent all the
possible outcomes, and their intermediates, in games such as chess. As the
complexity of chess is much too large for a full game tree to be made for it, search
trees are commonly used in chess programming. These search trees are subsets
of the game tree containing the possibilities that an engine considers during a
search. These possibilities are called nodes, in chess specifically, they represent
different positions on the board. Rather than being a well-defined data structure,
they are graphs used to represent the logical flow of an engine.
At the start of the function is the end condition, where if the depth left to be
searched is equal to ‘0’ the method returns the value given by the evaluation
method for the node. If the end condition is not fulfilled, the method continues by
finding the child nodes, i.e., the moves available, and then playing each of them on
the internal board representation and then calling itself to repeat the process.
When all the children of a node have been searched, the greatest of their
evaluation is returned if the parent node is a board where it is the white player’s
turn to move, otherwise if it’s the player with black pieces whose turn it is to move,
the smallest value of the child nodes is returned.
8
Figure 3.1.1. An example of a search tree for a minimax search algorithm. The bold, red line represents the
principal variation, or in other words, the moves the algorithm expects will be played. The edges represent
moves, and the nodes are the resulting board positions. The numbers inside of the nodes are the evaluations
of the position by the engine.
Figure 3.1.1 shows an example of a search tree representing the flow of a search
method in a game of chess. In this, tree the light nodes correspond to board
positions where it is the maximizing player’s turn to make a move and the dark
nodes correspond to positions where it is the minimizing player’s turn to make a
move. The numbers in the leaf nodes are the values given by the evaluation
method. After a leaf node has been evaluated the evaluation is returned to the
parent node which in turn returns either the maximum or the minimum value of its
child nodes. Whether or not the smallest or greatest value is returned depends on
which player’s turn it is to move. If it is the maximizing player’s, i.e., the player with
the white pieces, turn to move then the greatest value is returned. Conversely, if it
is the minimizing player’s turn to move, the smallest value is returned. The result of
a minimax algorithm called at the root node is either the highest or lowest
(depending on which of the two recursive methods are called), evaluation the
player is guaranteed to achieve given expected play. This expected outcome in a
given position is also called the principal variation, which is the sequence of moves
that an engine expects to be played given a certain board position.
9
3.1.2 Alpha-Beta Pruning
Figure 3.1.2. A search tree of an alpha-beta algorithm demonstrating nodes that have been pruned. Pruned
nodes are crossed out with a red line.
10
In an alpha-beta algorithm, all nodes are evaluated, and the alpha and beta values
are updated, up until the point when a node’s evaluation falls outside the bounds of
alpha and beta. Because such a node cannot change the outcome of the search, it
may be pruned without affecting the result. An example of pseudocode
demonstrating the maximizing method of the alpha-beta pruning algorithm is shown
below:
The method above is the maximizing half of the alpha-beta search routine and it
calls the minimizing method for each of its children, which in turn calls the
maximizing method and so on. This makes these two methods corecursive. The
minimizing method is shown below:
11
Alpha-beta pruning gives a large average reduction in execution time in chess,
without the risk of overlooking any relevant possibilities. The worst-case scenario of
alpha-beta search is when the worst move is always examined first, meaning no
pruning can be made. In this scenario, the alpha-beta search performs the same as
the minimax algorithm. Assuming a constant number of moves B available in a
game of chess, and with a search depth of N, the total number of leaf nodes is B N
for both the minimax and alpha-beta algorithm. However, in a best-case scenario
the number of leaf nodes that needs to be examined is b⌈n/2⌉ + b⌊n/2⌋ - 1 [4]. To
maximize the benefit of alpha-beta pruning one needs accurate move ordering
(See chapter 3.7).
Many chess engines, including KLAS, build a principal variable as they search for
the purpose of debugging and to extract the best moves according to the engine.
The principal variation contains the line of moves the engine expects to be played.
In the search trees used to represent chess and similar games, there are three
commonly recognized node types as defined by Donald Knuth, an American
computer scientist and mathematician. In his 1975 monography, the Art of
Computer Programming, Knuth analysed a number of algorithms, among them the
minimax algorithm with alpha-beta pruning. The node types he defined for alpha-
beta search trees are as follows [5]:
• Type 3-nodes or “All-nodes” are nodes where the evaluation does not
cause an update in the alpha or beta values nor does it cause any alpha
or beta cut-off.
When KLAS searches a node, it determines its type and stores that along with its
score in the transposition table for later use (see chapter 3.8). The node type
determines what the score means. For PV-nodes the score is exact and for cut-
nodes and all-nodes the score is either an upper or lower bound of the actual
score. This is because of the alpha-beta pruning which doesn’t need to determine
the exact score of nodes that are not PV-nodes as all that is relevant is whether a
score is lower or greater than the bounds alpha and beta or not.
12
3.2 Board Representation
Figure 3.2.1. The mapping in KLAS of squares to indexes in the bitboards as seen from the perspective of the
player with the white pieces.
In KLAS, the least significant bit i.e., the right-most bit of a binary number, LSB, is
mapped to the top right square. The most significant bit, which is also the left-most
bit, MSB, is mapped to the bottom left square. As an example, the following binary
number corresponds to the position of the white pawns are the start of a chess
game.
”000000001111111000000000000000000000000000000000000000000”
Bitstring representing the white pawns at the start of a chess game.
13
The binary number above is shown more pedagogically in figure 3.2.2.
8 0 0 0 0 0 0 0 0
7 0 0 0 0 0 0 0 0
6 0 0 0 0 0 0 0 0
5 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0
2 1 1 1 1 1 1 1 1
1 0 0 0 0 0 0 0 0
a b c d e f g h
Figure 3.2.2. The bitboard for the white pawns are the start of chess game.
Lastly, the board representation contains a few other attributes related to castling
and en passant moves. The use of these is explained in chapter 3.5.
Chess engines use move generators to find all available moves on a given
chessboard. In KLAS, the move generator first generates moves that may not be
legal as they may leave the moving player’s king in check. After the move
generator has produced the list of moves, each move is individually checked for
validity. In case a move in the list causes the player’s own king to be in check, the
move is illegal and thus removed from the list.
Vital to the move generator’s function are four general use bitboards: the empty-
bitboard, the occupied-bitboard, the friends-bitboard and the enemies-bitboards.
The empty-bitboard contains 1s for each index at which the square is not occupied
on the board and the occupied-bitboard is the inverse the empty-bitboard. The
friends-bitboard is a subset of the occupied-bitboard containing 1s at the indexes
where the square is occupied, and the occupying piece is friendly to the player for
which the move generator is currently operating. In turn, the enemies-bitboard
contains 1s at the indexes where the square is occupied, and the occupying piece
belongs to the opponent.
14
3.3.1 Pawns
To generate moves for the pawns, KLAS uses bitshift to create a new bitboard
containing the squares that the pawns may move to. Shifting a bit eight steps to the
right is equivalent to moving a piece on step upwards on the board. For example,
the bitboard for the white pawns at the start of a game is shown in figure 3.3.1.
8 0 0 0 0 0 0 0 0
7 0 0 0 0 0 0 0 0
6 0 0 0 0 0 0 0 0
5 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0
2 1 1 1 1 1 1 1 1
1 0 0 0 0 0 0 0 0
a b c d e f g h
Figure 3.3.1. Bitboard for the white pawns at the start of a game.
The bitboard is then bit shifted eight steps to the right to create a new bitboard that
contains the possible one square pawn pushes.
8 0 0 0 0 0 0 0 0
7 0 0 0 0 0 0 0 0
6 0 0 0 0 0 0 0 0
5 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0
3 1 1 1 1 1 1 1 1
2 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0
a b c d e f g h
Figure 3.3.2. The bitboard in figure 3.3.1 bit shifted eight steps to the left.
15
This bitboard is again bit shifted eight steps to the right to create the bitboard
containing the available two step pawn pushes.
8 0 0 0 0 0 0 0 0
7 0 0 0 0 0 0 0 0
6 0 0 0 0 0 0 0 0
5 0 0 0 0 0 0 0 0
4 1 1 1 1 1 1 1 1
3 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0
a b c d e f g h
Figure 3.3.3. The resulting bitboard after the bitboard containing the white pawns has been bit shifted a total of
16 steps to the right.
To prevent the move generator from generating moves where a pawn is pushed to
a square that is occupied, these bitboards are intersected with the empty bitboard
using bitwise AND.
8 0 0 0 0 0 0 0 0
7 0 0 0 0 0 0 0 0
6 1 1 1 1 1 1 1 1
5 1 1 1 1 1 1 1 1
4 1 1 1 1 1 1 1 1
3 1 1 1 1 1 1 1 1
2 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0
a b c d e f g h
Figure 3.3.4. The empty-bitboard at the start of a game.
To generate the pawn attacks, nine and seven step bit-shifting is used. After a nine
step rightwards bit shift of the bitboards for the white pawns, the resulting bitboard
will contain their rightward attacks. A pawn cannot attack a square that does not
contain an enemy, so to remove such moves the bitboard is also ANDed with the
enemies bitboard.
16
3.3.2 Knights and Kings
In KLAS, knight and king moves are generated using pre-calculated movement
bitboards which contain all the possible moves for a piece given its position,
assuming it is not blocked by a friendly piece. At startup, KLAS initializes a
movement bitboard for each possible square a knight can inhabit for a total of 64
bitboards.
8 0 0 0 0 0 0 0 0
7 0 0 0 1 0 1 0 0
6 0 0 1 0 0 0 1 0
5 0 0 0 0 0 0 0 0
4 0 0 1 0 0 0 1 0
3 0 0 0 1 0 1 0 0
2 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0
a b c d e f g h
When the move generator finds a knight or king on a square, it retrieves the
corresponding movement bitboard and XORs it with the friendly-bitboard to avoid
the capture of one’s own pieces.
In chess, the bishop, rook and the queen are often referred to as “sliding pieces”.
This refers to their movements as they can slide along rays as far as the
chessboard allows unless they are stopped by another piece.
The rook moves along four different rays: one in an upwards direction, one
downwards and two left- and rightward directions. The moves for each of these
rays are generated in the same way, although separately. First, pre-calculated
movement bitboards are used to find the movement ray corresponding to the
position of the rook for which moves are to be generated. The movement bitboard
for the upward ray of a rook on square e3 is shown in figure 3.3.6.
8 0 0 0 0 1 0 0 0
7 0 0 0 0 1 0 0 0
6 0 0 0 0 1 0 0 0
5 0 0 0 0 1 0 0 0
4 0 0 0 0 1 0 0 0
3 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0
a b c d e f g h
Figure 3.3.6. The bitboard containing the upward ray of a rook located on e3.
17
Unlike with the knights and kings, a simple intersection of the movement bitboards
and the occupied/friendly-bitboards is not enough to calculate the pseudo-legal
moves of this rook. If the square e7 is blocked by an enemy pawn, the rook cannot
attack the square behind it, e8. To remove moves that are blocked in this way, the
movement bitboard for the ray is bitwise XORed with a ray in the same direction
starting at the piece behind the first occupied piece along the original ray. For
example, if e7 was occupied, a ray starting at e8 in the same direction is used and
removed from the original movement ray.
We have thus far only shown how the move generator calculates bitboards
representing the available moves. However, in order to be used, the actual moves
must be extracted from the bitboards. The move generator iterates over each
bitboard containing the moves for each piece and then adds a move to the move
list whenever it encounters a square with a value of 1. As one bitboard is
generated for each individual piece, it is known which piece moves according to
which bitboard. However, in the case of pawns, all the one step pawn pushes are
stored in a single bitboard, and all the two step pawn pushes are stored in another.
This is possible because pawn pushes are of course always vertical – for example
if there is a possible pawn push to e4, it must be the pawn in the e column that can
perform it.
Only the bitboards containing pawn attacks need to be separated when the moves
are to be extracted as it is possible for two different pawns to attack the same
square.
The findMoveList function first obtains all pseudo-legal moves from the move
generator, then checks if each move is legal by playing them on its own internal
board. After a move has been played, a check function is called to determine
whether the move resulted in the player being checked. If the move leaves the
player in check, the move is rejected and if the player is not in check, the move is
added to the list of fully legal moves.
18
In chess, checkmate has occurred when a player is in check and will remain in
check regardless of what move they attempt to make. In KLAS checkmate is found
by considering all the pseudo-legal moves that are available to a player in check.
Should none of them be legal, as they all result in the player still being checked,
the player is checkmated.
As castling moves may only be played once per player, the move generator must
know if a castling move has already occurred. Therefore, the Board class has
these additional attributes:
These attributes keep track of whether castling is allowed on either side. Similarly,
information about whether en passant moves are possible is also stored in the
Board class.
The makeMove function is responsible for updating the castling and en passant
attributes when moves are made on the board.
19
3.6 Evaluation Methods
If a chess engine had infinite computing power, the evaluation would only need to
evaluate checkmate as the search method could explore the whole chess game
tree, where leaf nodes are either checkmate or stalemate. However, it is usually
not possible to search all the way to leaf nodes, except when very few pieces
remain. Better playing ability can then be achieved for the chess engine by using a
heuristic evaluation of various qualities of a board position – giving an estimation of
how likely it is to win for both players in that position.
The evaluation of the material on a board position in KLAS is show in the following
pseudocode:
function evalPosition(board) is
for each piece on board
if piece == WHITE_PAWN
points += 100
else if piece == WHITE_KNIGHT
points += 320
else if piece == WHITE_BISHOP
points += 330
else if piece == WHITE_ROOK
points += 500
else if piece == WHITE_QUEEN
points += 900
20
Each piece on the board is given a value based on its type. If the piece belongs to
the white player, the value is added to the total score, and conversely subtracted if
it belongs to the black player. This means that positions, where the white player’s
material is identical to the black player’s, are given the material score of 0. The
piece values used in KLAS are widely accepted in the chess community as
appropriate approximate values.
Piece-Square Tables
Move ordering is used to improve the performance of the alpha-beta search routine
by attempting to prioritize the most favorable moves first. Alpha-beta search
routines perform pruning when a move has been shown to result in a worse
evaluation than a previously examined move. In the best case, the best available
move is examined first causing other moves to be quickly pruned as they are
shown to be worse. The faster the engine finds a good move, the more cut-offs
may be performed to reduce the search space.
Move ordering optimizations involve a rough estimation of which moves appear the
most promising before the engine begins to examine them. The moves, which lead
to different board positions, or “nodes”, are then sorted based on this estimation, in
the hope that a good move is found early. The best-case scenario occurs when the
first child of each node is determined to be the PV-node out of all the child nodes.
It should be noted that move ordering optimizations have no effect on the actual
result of the search. A chess engine will decide on the same move, whether it uses
any move ordering because it is an optimization that only serves to reduce search
space through enabling more alpha and beta cut-offs.
21
In figure 3.7.1 a search tree for an alpha-beta method is shown. Inside each node,
the move that leads to the node is shown. The move “c2-c4” is the move from the
root node which gives rise to the leftmost child node. Without move ordering, this
move is explored first for arbitrary reasons that are completely unrelated to how
promising the move may or may not appear.
Figure 3.7.1. An alpha-beta search tree before any move ordering has occurred.
In figure 3.7.2 the same search tree is shown after move ordering has been
performed at the root node. The sorting has resulted in the move “e2-e4” being the
highest priority move of all the root nodes’ children, should this move turn out to be
the most favorable move at the root, the search space and thus execution time will
be reduced significantly.
Figure 3.7.2. The search tree from figure 3.7.1 after the move ordering has been performed at the root node.
22
After the move ordering routine has been applied at the root node and the most
promising child node is visited, it is applied once again on the new node “e2-e4”.
This time the move ordering routine finds that the “e7-e5”-move appears more
promising than the “c7-c6”-move, resulting in a new order as shown in figure 3.7.3.
Figure 3.7.3. The search tree from figure 3.7.2 after the move ordering routine has been performed at the “e2-
e4”-node.
The move ordering optimization is based on the concept that chess moves may be
divided into categories, some of which are statistically more likely to result in
favourable positions. However, if the heuristic behind the move ordering is
inaccurate, it will result in a lower performance than if there is no move ordering
whatsoever. In a position where the actual best moves are given low priority, move
ordering will fail to decrease the search space and increase execution time.
23
3.7.1 MVV-LVA
function prioritize(moves)
for(move in moves)
movingPiece = move.movingPiece
takenPiece = move.takenPiece
switch(movingPiece)
case “PAWN”
move.priority += -1
case “KNIGHT”
move.priority += -3
case “BISHOP”
move.priority += -3
case “ROOK”
move.priority += -5
case “QUEEN”
move.priority += -9
case “KING”
move.priority += -10
switch(takenPiece)
case “PAWN”
move.priority += 10
case “KNIGHT”
move.priority += 30
case “BISHOP”
move.priority += 30
case “ROOK”
move.priority += 50
case “QUEEN”
move.priority += 90
Each move’s priority is decreased depending on the value of the piece that moves
from one square to another and then increased depending on which piece, if any,
is taken. The bonus for taking a piece is greater than the penalty of moving a piece
and so taking moves are given a greater priority than others.
24
3.7.2 PV-Ordering
Another common move ordering optimization is to prioritize the moves that either
were a part of the principal variation of the previous search or those that, for a
node, caused an alpha or beta cut-off. These moves have already been
determined to be the either the best moves for the current node or at least good
enough to cause a cut-off. KLAS prioritizes these moves the highest, even higher
than the most promising move according to MVV-LVA. KLAS stores these moves
in the transposition table, and they are retrieved for the current node at the start of
both alpha-beta methods. See chapter 3.8.1.
Transposition tables are hash tables that store hashes of board positions along
with relevant information about them so that it may save time re-evaluating them. In
KLAS the transposition table is implemented using a HashMap where the keys are
the hash values of boards, and the values are objects of type TPTEntry. The
TPTEntry data structure contains the following fields:
25
3.8.1 Usage of the Transposition Table
During a search, KLAS computes a hash value for each node it examines and tries
to find a match in the transposition table. Should a match be found, the
corresponding TPTEntry is retrieved, and its depth is compared to the remaining
depth to be searched. As the score of a node becomes more accurate the deeper it
is searched, KLAS rejects any matches where the depth field is lesser than the
depth remaining to be searched.
The retrieval of entries in the transposition table occurs at the start of both two
alpha-beta methods. The usage of the entries depends on its depth attribute as
well as the node type attribute. An entry is only used if its depth attribute exceeds
or is equal to the depth left to be searched at that point in time. See the following
pseudocode:
The attribute nodeType allows KLAS to know whether the score attribute is an
exact score or an upper or lower bound of the actual score. Depending on which
type it is, KLAS uses the score differently. Should the node be a PV-node, the
score is known to be exact and is thus returned immediately. However, if the node
is either a cut-node or an all-node, the score is only usable if it exceeds the current
alpha and beta bounds as we then know that this node couldn’t possibly affect
these values.
26
The exact same logic is valid for the minimizing half of the alpha-beta search
routine except it is, in a sense, reversed. In the minimizing method, child nodes
with a score smaller than alpha cause alpha cut-offs and the lower bound, alpha, is
returned. A node where this happens is a cut-node. If none of the node’s children
causes an alpha cut-off and none of them succeeds in decreasing beta, the node is
an all-node with a score that is an upper bound. If no child nodes cause an alpha
cut-off and there is at least one child that decreases beta, the node is known to be
a PV-node with an exact score.
During the alpha-beta search methods, KLAS creates entries and sets their
nodeType attribute to their corresponding node type so that their score attribute is
not misinterpreted when the entries are used.
The Zobrist hash function was invented by Alexander Zobrist as an efficient way to
calculate hash values for chess positions using bitwise operations and a number of
random numbers. This type of hashing function is common in chess engines and is
used in KLAS [6].
To generate the hash of a board, the engine iterates over the squares and when
examines what state the square is in. For each square, it takes the random integer
corresponding to that square and performs an XOR operation on it and the hash in
its current state. This results in a 64-bit hash value with a very low probability of
collisions.
27
Pseudocode for the hash algorithm is shown below:
function hash(board) is
for each piece on board
if(pieceType == WHITE_PAWN)
hash ^= randomNumbers[WHITE_PAWN][piece.position]
else if(pieceType == WHITE_KNIGHT)
hash ^= randomNumbers[WHITE_KNIGHT][piece.position]
else if(pieceType == WHITE_BISHOP)
hash ^= randomNumbers[WHITE_BISHOP][piece.position]
else if(pieceType == WHITE_ROOK)
hash ^= randomNumbers[WHITE_ROOK][piece.position]
else if(pieceType == WHITE_QUEEN)
hash ^= randomNumbers[WHITE_QUEEN][piece.position]
else if(pieceType == WHITE_KING)
hash ^= randomNumbers[WHITE_KING][piece.position]
else if(pieceType == BLACK_PAWN)
hash ^= randomNumbers[BLACK_PAWN][piece.position]
else if(pieceType == BLACK_KNIGHT)
hash ^= randomNumbers[BLACK_KNIGHT][piece.position]
else if(pieceType == BLACK_BISHOP)
hash ^= randomNumbers[BLACK_BISHOP][piece.position]
else if(pieceType == BLACK_ROOK)
hash ^= randomNumbers[BLACK_ROOK][piece.position]
else if(pieceType == BLACK_QUEEN)
hash ^= randomNumbers[BLACK_QUEEN][piece.position]
else if(pieceType == BLACK_KING)
hash ^= randomNumbers[BLACK_KING][piece.position]
The reason why Zobrist hashing is used in many chess engines is because of how
it may be efficiently updated during a search. Knowing the previous hash value and
having access to the previous board position the hash for the new board may
efficiently be generated using only the changes incurred by the new move to be
performed. [6]
28
3.10 Lazy SMP Parallel Search
Lazy SMP can speed up searching by different threads sharing search results in a
transposition table. A number of helper threads are started, along with the main
thread and begin to search the board position. The transposition table allows each
thread to store data regarding the different board positions it has searched thus far
and then the other threads use this data instead of researching the position. Lazy
SMP has proven to be an effective optimization and is capable of halving execution
time in some cases [9].
If all the threads are searching the same node at the same time no speedup is
achieved and so to decrease the probability that this occurs, different move
ordering for each thread may be used. In KLAS the helper threads use randomized
move ordering at the root node to decrease the probability of two of them
searching the same node at the same time.
29
3.11 Other Optimizations
To minimize the execution time penalty incurred by the JVM’s garbage collection,
we made efforts to avoid initializing new objects when possible and instead reuse
old ones. One class often instantiated is the Move class which is an abstraction of
a chess move. To avoid instantiating new moves, a class called
MoveArrayListManager (MALM) was implemented. The MALM is responsible for
providing instances of type MoveArrayList (MAL), which are lists of moves.
The transposition table also recycles its entries in a similar manner to the
MoveArrayList. Whenever an old entry is to be overwritten, its attributes are
updated to the values of the new entry.
30
4 Assessment
To answer the research questions, we needed to measure the performance of
KLAS and analyse it from several perspectives. This chapter presents the
methodology of the measurements, the results as well as the analysis of the
results, each in its own section.
The optimizations that are discussed in this chapter are: iterative deepening, move
ordering according to MVV-LVA and PV-nodes, Lazy SMP as well as the usage of
the transposition table. To be able to examine each optimization independently of
one or several of the other optimizations a number of different variants of KLAS are
defined:
• MT – Complete, multithreaded KLAS which employs all the above
optimizations.
• ST – Complete KLAS except Lazy SMP-multithreading is disabled.
• MO – Single-threaded KLAS without MVV-LVA move ordering.
• TT – Single-threaded KLAS without using transposition table for
alpha/beta pruning.
• ID – Single-threaded KLAS with no iterative deepening.
The table below shows what optimizations are active in each variant.
31
4.2 Board Positions
The performance of the different KLAS versions was measured using sets of ten
connected lines, that is, ten board positions where each board is identical to the
previous one except for one move having been made. The first board in each of
these sets of ten will be referred to as an “original position”. Each original position
was chosen from different online resources like chess lessons and professional
matches. We used 60 sets for a total of 600 board positions.
The 600 board positions can be divided into two different groups, each based on
how the boards from the original board were generated. These groups are:
32
4.3 Metrics
A number of different metrics were recorded during the 600 searches. Each metric
recorded is listed and explained below.
33
4.3.1 Good Move Index (GMI)
5E+10
Execution TIme [ns]
4E+10
3E+10
2E+10
1E+10
0
0 2 4 6 8 10 12 14 16 18
-1E+10
Good Move Index
Figure 4.3.1. Scatterplot of each search’s execution time and its average Good Move Index (GMI) of
all depths combined. The trendline demonstrates the correlation between the two.
Although there are also other factors at play, a lower Good Move Index is a
predictor of lower execution time.
34
4.4 Results
This chapter presents the results from the measurements of the five different
variants of KLAS, each in their own section.
Variant Execution
time [ms]
MO 4066
ST 1280
The average Good Move Index of both the ST and MO variants are presented in
the figure 4.4.1.
Variant GMI (depth GMI (depth GMI (depth GMI (depth GMI (depth GMI (depth
0) 1) 2) 3) 4) 5)
MO 5.84 1.60 5.43 3.37 8.97 7.31
ST 4.89 0.98 2.00 1.35 1.48 0.32
Figure 4.4.1. The average Good Move Index at each depth of the MO and ST variants of KLAS.
35
4.4.2 Transposition Table
The transposition table in all variants of KLAS, where activated, had a maximum
size of 1.000.000 entries. In the TT variant of KLAS, the transposition table was not
fully deactivated, but the entries were only used for PV-ordering. No entries found
during a search in the TT variant were used to avoid researching a node.
The average execution time of the TT and ST variants of KLAS are presented in
the table below.
Variant Execution
time [ms]
TT 2124
ST 1280
The ST variant, which used the transposition table for alpha/beta cut-offs
performed the average search with a 39.8% reduction in execution time.
The average Good Move Index at each search depth of both the TT and ST
variants is presented in figure 4.4.2.
In terms of move average GMI, the TT and ST variants are relatively similar at
most depths. At depth 5 specifically the relative difference is significant, with the ID
version achieving an almost twice as low GMI.
36
4.4.3 Iterative Deepening
In KLAS, the search depth is incremented by two every iteration. First, KLAS
searches to a depth of two, then four and then six. As the ID variant of KLAS uses
no iterative deepening memory usage was only measured at the end of the search
at depth 6. The table below demonstrates the execution time of the ID variant and
ST variant.
Variant Execution
time [ms]
ID 1796
ST 1280
As can be seen in the table above, iterative deepening with PV-ordering caused a
decrease in execution time of 28.7% on average.
As can be seen in figure 4.4.3, the number of transposition table hits at lower
depths increases drastically with iterative deepening. However, at greater depths
the number of hits decreased with iterative deepening.
The average Good Move Index at each depth was recorded for the ID variant of
KLAS as well. This data is presented in figure 4.4.4.
37
The ST variant of KLAS, with its iterative deepening, achieved a much lower and
thus better GMI at every depth of search. Most significant is the decrease at lesser
depths while the difference is much lesser at greater depths.
The average execution time and memory usage at each iteration of iterative
deepening from the MT and ST variants of KLAS is presented below in the table
below.
The MT variant of KLAS performed the same 600 searches with an average
execution time of 850 milliseconds while the single-threaded ST variant of KLAS
performed the same searches with an average execution time of 1280
milliseconds. In other words, Lazy SMP decreased average execution time by
approximately 33.4%.
The MT variant used significantly more memory at the first two iterations of iterative
deepening: an increase of 155.6 or 20.5% and 168.5 MBs or 21.7% respectively. A
much smaller difference in memory usage was recorded after the last iteration to
depth 6, specifically 9.5 MBs or 1.2%.
The average number of transposition table hits at each depth per search is
presented in figure 4.4.5.
38
In the MT variant of KLAS the average GMI was recorded at each depth of search
for only the main search thread. The GMI of the helper threads was not recorded.
The GMI of the MT and ST variants are presented in figure 4.4.6.
For the MT variant the average GMI varied from its lowest value of 0.29 at depth 5
to its highest value of 4.59 at depth 0. Meanwhile, for the ST variant, the GMI
varied between 0.32 to 4.89. The MT variant achieved a better GMI at depths 0, 5
and 5 while the ST variant performed better, in terms of GMI, at depths 1, 2, 3.
The correlation between the average GMI and the execution time shown in chapter
4.4.1 was observed in the MT variant as well. Figure 4.4.7 shows a scatterplot of
the GMI as an average of all depths and the execution time.
6E+09
Execution Time [ns]
5E+09
4E+09
3E+09
2E+09
1E+09
0
0 2 4 6 8 10 12
Good Move Index
Figure 4.4.7. Scatterplot of each search’s execution time and its average Good Move Index (GMI) of
all depths combined. The trendline demonstrates the correlation between the two.
The correlation between the GMI and execution time is significantly weaker in the
MT variant of KLAS than the MO variant (see figure 4.3.1, chapter 4.3.1).
39
4.4.5 Garbage Collection
The Java Virtual Machine’s garbage collection was recorded both in terms of the
number of calls to the garbage collector as well as the time used to collect the
garbage at the end of each search. The average for all five variants of KLAS is
presented in the table below.
As can be seen in figure 4.4.8, the execution time penalty incurred by garbage
collection was quite similar for all the variants of KLAS except for one, the MT
variant. On average, garbage collection accounted for around 7 to 8% of the
execution time of the four lowest variants while it accounted for more than 24% of
the execution time of the MT variant.
40
4.5 Results Analysis
Although MVV-LVA is very effective at deeper search depth, it does not improve
move ordering that much at lower depths (see Figure 4.4.1 chapter 4.4.1). By using
iterative deepening and PV-ordering in conjunction with MVV-LVA, move ordering
is improved at both greater and lower depths.
41
4.5.2 Transposition Table
The usage of a transposition table was measured to answer research question 1.2:
how much execution time can transposition tables save on average? The answer
found through our testing is 39.8% when the entries are used for alpha/beta cut-
offs.
The transposition table in KLAS is also used for PV-ordering as well as achieving
cut-offs from previously found transpositions. Together with PV-ordering, the
performance boost that the transposition table results in is significantly higher than
39.8%.
The TT variant of KLAS managed to achieve a lower average GMI on search depth
0, 3, 4 and 5. This could be explained by more nodes being searched again due to
fewer cut-offs having happened.
The transposition table is also the primary source of memory needed. The increase
in memory usage of the certain variants was in all cases significantly lower than the
memory the transposition table used.
42
4.5.4 Lazy SMP
Lazy SMP was tested to answer research question 1.4: how much execution time
can Lazy SMP multithreading save on average? Through our measurements we
found Lazy SMP to reduce average execution time by 33.1%.
The reduction in execution time due to Lazy SMP comes from several helper
threads helping the main thread in its search by filling the transposition table with
relevant data for the main thread to use. In our testing, the helper threads
succeeded in increasing the number of hits in the transposition table for the main
thread at every depth of search, as can be seen in Figure 4.4.5.
Lazy SMP improved the move ordering by decreasing the GMI at every depth of
search as well. Why this is we do not know but we suspect it is somehow a side
effect of more alpha-beta pruning.
Memory usage was significantly increased at two of the three iterations of iterative
deepening when using Lazy SMP. Why memory usage was not increased after the
third iteration as well we do not know.
Lazy SMP brings with it a much higher impact of the Java virtual machine’s
garbage collection. We conclude that the usage of Lazy SMP warrants greater
effort to be taken in reducing this impact.
The four optimizations examined in this thesis were chosen for different reasons.
Firstly, it was our hope that they would be effective for the goal of reducing
execution time. Secondly, they were also chosen as they are all different from each
other in how they function. It would have been less interesting to investigate four
optimizations that all operate in a similar manner. The expectation was also that
they would interfere less with each other. For example, if four different move
ordering heuristics were implemented, the effectiveness of each individual heuristic
optimization would probably be diminished due to the other three.
43
4.6 Threats to validity
To ensure that measurements of execution time are accurate, and generalizable,
outside factors need to be eliminated. Any other processes running on the test
system when measurements are made can interfere with the results and so
execution time is measured in CPU cycles instead of wall-time using the Java class
ThreadMXBean [9]. Other possible outside factors include the optimization
performed by the Java Virtual Machine as well as interference from its garbage
collector.
The board positions used for all measurements consist of original boards and
boards generated from the original boards. The original boards were taken from
various online resources and mostly constitute example positions from chess
lessons as well as positions from various professional matches. It is possible that
the board positions used are not representative of chess as a whole, as most of
them are from the middle game. The reason why we chose middle game positions
is because this is where the greatest complexity and greatest branching factor is
found in chess.
The Java Virtual Machine dynamically optimizes Java programs during runtime,
this is done for example when a particular method has been run many times – it is
optimized to run faster on subsequent calls. However, these dynamic optimizations
introduce variance in runtime performance. To reduce these variances in runtime,
we let KLAS search method run 60 times without any usage of a transposition table
at the start of each measurement to make the JVM optimize as much as possible
before actually running a measurement. This should, along with the sample size
used for our results, reduce the JVM’s optimization as a significant source of error.
The different heuristics and algorithms examined in this thesis are interconnected,
meaning they affect each other’s function. For example, the different threads used
in the MT variant of KLAS all use MVV-LVA move ordering, which means that the
results of how effective Lazy SMP is might be dependent on the MVV-LVA
heuristic. It’s possible that Lazy SMP would be more effective or less effective if
MVV-LVA move ordering had not been used.
44
5 Conclusion
Four different algorithmic and heuristic optimizations were examined in this thesis:
MVV-LVA move ordering, transposition tables, iterative deepening and Lazy SMP
multithreading. All four optimizations resulted in significant performance boosts to
the KLAS chess engine with MVV-LVA move ordering having the single biggest
impact. MVV-LVA resulted in an average reduction in execution time of 68.5% and
the transposition table, when used to avoid researching nodes, resulted in a
reduction of 39.8% of average execution time. Lazy SMP parallel search and
iterative deepening resulted in the average execution time being reduced by 33.1%
and 28.7% respectively.
4000
3500
3000
2500
2000
1500
1000
500
0
MT ST ID TT MO
KLAS variant
Figure 5.1. The average execution time of each variant of KLAS, demonstrating the impact of each
of the four algorithms/heuristics.
45
We conclude that all four optimizations are worth implementing in a chess engine,
particularly MVV-LVA move ordering as it has few downsides. The transposition
table, while it is a very powerful optimization, has the downside of using a lot of
memory. As long as there are no strict memory constraints then it is a very
worthwhile optimization.
Lazy SMP is effective when implemented in Java and run on an Intel CPU with four
cores. This effect is however dependent on the system having more than one core.
It would be interesting to see how well Lazy SMP scales with an even higher
number of available cores.
A significant increase in the amount of time the Java Virtual Machine spent
collecting garbage was recorded in the MT variant of KLAS. We conclude that it is
warranted to take effort to reduce the amount of garbage generated if
implementing Lazy SMP like in KLAS.
46
6 References
1. J.A.N Lee. Konrad Zuse. IEEE Computer society.
URL: https://history.computer.org/pioneers/zuse.html
2. Alan Turing, Jack Copeland (ed.) (2004). The Essential Turing, 572. Oxford:
Oxford University Press.
3. Claude Shannon (1950). Programming a Computer for Playing Chess
(PDF). Philosophical Magazine. 41 (314). Retrieved September 6, 2021.
4. Daniel Edwards, Timothy Hart (1961). The Alpha-Beta Heuristic. AIM-030.
Massachusetts: Massachusetts Institute of Technology. URL:
http://dspace.mit.edu/handle/1721.1/6098.
5. Knuth, Donald E., and Ronald W. Moore. An Analysis of Alpha-Beta
Pruning. Artificial Intelligence, vol. 6, no. 4, 1975, pp. 293–326. DOI:
10.1016/0004-3702(75)90019-3
6. Albert Zobrist (1970). A New Hashing Method with Application for Game
Playing. Madison: Computer Sciences Department, University of Wisconsin.
URL: https://research.cs.wisc.edu/techreports/1970/TR88.pdf. Retrieved
September 20, 2021.
7. T.A Marsland (1991). Computer Chess and Search, pp 12. Computer
Science Department. Edmonton: University of Alberta. URL:
https://webdocs.cs.ualberta.ca/~tony/RecentPapers/Marsland-
CCandSearch-1991.pdf
8. Daylen Yang (2016). Stockfish 7. URL:
https://stockfishchess.org/blog/2016/stockfish-7/. Retrieved October 6, 2021.
9. Østensen, Emil (2016). A Complete Chess Engine Parallelized Using Lazy
SMP. Department of informatics, University of Oslo.
10. Oracle, Java interface ThreadMXBean documentation.
URL:
https://docs.oracle.com/en/java/javase/11/docs/api/java.management/java/la
ng/management/ThreadMXBean.html
47
Henrik Brange
[email protected]
Background
One of the pioneers of computer science, Claude Shannon, published in 1949 his
paper Programming a Computer for Playing Chess. In which he described how a
computer could play chess using a minimax algorithm. The minimax algorithm is
based around the idea that both player are trying to maximize and minimize a
value alternately. This is the case in chess, if we imagine that a board positions
that favors the player with the white pieces are given positive values while
positions that are favorable for the player with the black pieces are given
negative values.
Search
Search refers to when a chess engine considers what moves are available on a
board position, and then plays them on an internal board representation to see
what the result is. A chess engine may repeat this process several times to a so
called “depth”, which is the number of moves from the original board position
the engine considers. The result of a search is a list of moves the chess engine
expects will be played as it considers them to be the best moves available for
both players. Different optimizations may be used to speed up this process.
An example of a board position.
The Optimizations
The purpose of this thesis was to implement a chess engine called KLAS
and optimize it to play faster using four different optimizations. These
optimizations are:
Methodology Results
In order to investigate the potential of these four optimizations a Java chess All four techniques resulted in significant performance boosts
engine named KLAS was written and five different versions or “variants” of it were to the KLAS chess engine with MVV-LVA move ordering
defined: having the single biggest impact. The MVV-LVA heuristic
decreased average execution time by 68.5%.
MT – Complete, multithreaded KLAS which employs all four
techniques. The transposition table, is a very powerful technique as well
ST – Complete KLAS except Lazy SMP-multithreading is disabled. decreasing average execution time by 39.8% on average
ID – Single-threaded KLAS with no iterative deepening. when used to avoid researching board positions.
TT – Single-threaded KLAS without using a transposition table for
Lazy SMP decreased average execution time by 33.1% on
alpha/beta pruning.
average on the four core test system. However it also
MO – Single-threaded KLAS without MVV-LVA move ordering significantly increased both the memory usage as well as the
time the Java virtual machine spent on garbage collection.
These five different versions were used to search 600 different board positions of
chess and the execution time, as well as other relevant information, was Iterative deepening reduced average execution time by
measured. After the measurements were performed, the results were analysed 28.7%.
to find out how these techniques function in practice and how well they
perform.
AVERAGE EXECUTION TIME OF EACH KLAS VARIANT
Conclusion
4066
KLAS’ execution time with different upsides and downsides. The move ordering
heuristic MVV-LVA provides the biggest reduction in execution time and has few
downsides. The transposition table is another very powerful optimization but it has
2124
the downside of requiring a lot of memory. Lazy SMP is effective when running on
1796
the Intel i5 6600K test system, which has four cores, but is of course dependent on
the CPU having more than one core.
1280
850
MT ST ID TT MO
KLAS VARIANT