COMPARING CHESS OPENINGS PART 3: QUEEN'S PAWN OPENINGS
JAMAL MUNSHI
ABSTRACT: A dual engine experimental design for comparing chess openings was described in a previous paper (Munshi,
Comparing Chess Openings, 2014). It is used in this paper to study ten chess openings that are initiated with the queen's pawn
move 1. d4. One of the openings is identified as the mainline and the other nine as variations from the mainline. Five of the
variations are found to be benign innovations and the other four are deemed to be failed innovations. The findings are mostly
consistent with expert opinion. The primary purpose of this paper, however, is not these specific findings but rather the further
development and verification of an objective and quantitative methodology for the evaluation of chess openings in general1.
1. INTRODUCTION
This paper is the third of a series in a study undertaken to develop a generally applicable methodology for
the objective evaluation of chess openings. The proposed methodology uses controlled experiments with
chess engines to compare chess openings. The first paper in this series (Munshi, A Method for Comparing
Chess Openings, 2014) presented a single engine experimental design (SED) to compare ten openings
that are initiated with the King's pawn move 1. e4. It demonstrated that the proposed methodology is able
to discriminate between known strong openings and known weak openings. The advantage of the SED is
that it removes the difference in playing strength from the experiment and isolates the effect of the
opening; but the disadvantage is that the same engine playing both sides of the board may introduce an
engine bias in the data by not playing a sufficiently diverse set of opening variations.
Subsequently, a dual engine design (DED) was proposed to address the issue of engine bias (Munshi,
Comparing Chess Openings, 2014). The second paper showed that there may have been a propensity for
engine bias in the SED and that the engine bias problem is mitigated by the DED which forces the
engines to play a greater number of variations. This paper describes a further test of the DED using a new
set of openings.
The motivation for this study is that conventional methods of evaluating chess openings are inadequate.
Grandmaster opinions are subjective and inconsistent, while the win-loss-draw statistics in opening book
databases are field data that were not taken under controlled conditions and are therefore confounded by
intervening variables that have a greater effect on game outcomes than the opening (Munshi, A Method
for Comparing Chess Openings, 2014). As a result, there are conflicting opinions on the merit of the
different lines in the opening book and these opinions have engendered ongoing debates that have no
satisfactory conclusion. It is proposed that an objective method for evaluating openings will settle these
issues and help to refine the opening book.
1 Date: May 2014.
Key words and phrases: chess openings, chess engines, refutation, methodology, Monte Carlo simulation, numerical methods, probability vector, Euclidean distance, robust statistics, bootstrap.
Author affiliation: Professor Emeritus, Sonoma State University, Rohnert Park, CA 94928, [email protected]
2. THEORY
Chess games may be thought of as a stochastic trinomial process driven by an unknown and unobservable
underlying probability vector given by
π = [pw, pb, pd]        (Equation 1)
where π is a vector with two degrees of freedom, pw is the probability that white will win, pb is the
probability that black will win, and pd = 1-pw-pb is the probability that the game will end in draw.
The components of the probability vector π are determined by (1) white's first move advantage or FMA,
(2) the general rate of imperfection in the moves or IMP, (3) the difference in playing strength between
the player making white moves and the player making black moves, or DIFF, and (4) the opening
employed (Munshi, A Method for Comparing Chess Openings, 2014). The value of FMA is not known
but we know that it is a universal constant and we suspect that its effect is relatively small. Experiments
designed to measure the effect of the opening must therefore control the values of IMP and DIFF so that
the opening effect can be observed. Our hypothesis is that the choice of opening line played can change π
and that therefore chess engine experiments under these controlled conditions may be used to detect the
effect of openings on the probability vector π.
In comparing two openings, opening-1 and opening-2, our research question and hypotheses are set up as
follows:
1. Research question: Is π1 = π2?
2. Null hypothesis: Ho: π1 = π2
3. Alternate hypothesis: Ha: π1 ≠ π2
A testable implication of this hypothesis is that if the true (and unknown) population mean results are
plotted in Cartesian coordinates with x = the number of wins by white and y = the number of wins by black, and
the Euclidean distance between opening-1 and opening-2 is computed and designated as δ, then we may
write the testable hypotheses as:
Ho: δ = 0
Ha: δ ≠ 0
If we fail to reject Ho in this test, we immediately reach the conclusion that the evidence does not show
that the probability vector π is changed by using opening-2 instead of opening-1. If we reject Ho,
however, we know that the probability vector changed but we still don't know the direction of the change.
Further tests are necessary to determine whether the change favors white, whether it favors black, or
whether the change is in a neutral direction and favors neither black nor white.
3. METHODOLOGY
3.1 Baseline and test openings. Well recognized and established opening book databases (Meyer-Kahlen, 2000) (Jones & Powell, 2014) are used to select the first three moves (first six half moves) from
ten openings that begin with the queen's pawn move 1. d4. The opening sequence in this category most
used by grandmasters is the Queen's Indian Defense (Meyer-Kahlen, 2000)2 and it is identified as the
baseline opening; the other nine opening sequences selected are described as test openings, or
innovations. The proposed methodology for comparing chess openings is then used to compare each of
the test openings with the baseline opening. The ten opening sequences selected for this study are shown
in Table 1.
The rarity data shown in the table refer to the frequency of the baseline relative to the test opening
according to the opening database used for this purpose3 (Meyer-Kahlen, Opening Database, 2000). The
nine test openings shown were selected to include a large rarity range and they are listed in the table from
the most frequently played to the least. The first four test openings listed may be considered to be
commonly used. The next two are not very common, and the last three are rarely played. The test
openings selected are expected to represent a wide spectrum of possibilities in the Queen's pawn game.
ID      Name                      Fixed 3-move sequence   Rarity   Innovator
E12QID  Queen's Indian Defense    d4Nf6 c4e6 Nf3b6        ---      ---
E43NID  Nimzo Indian Defense      d4Nf6 c4e6 Nc3Bb4       1.0      White
D37QGD  Queen's Gambit Declined   d4Nf6 c4e6 Nf3d5        1.0      Black
E61KID  King's Indian Defense     d4Nf6 c4g6 Nc3Bg7       1.7      Black
E11BID  Bogo Indian Defense       d4Nf6 c4e6 Nf3Bb4+      2.6      Black
E01CAT  Catalan Opening           d4Nf6 c4e6 g3d5         7.1      White
A81DUD  Dutch Defense             d4f5 g3Nf6 Bg2g6        8.1      Black
A52BUG  Budapest Gambit           d4Nf6 c4e5 dxe5Ng4      23.3     Black
A45TVA  Trompovsky Attack         d4Nf6 Bg5Ne4 Bf4c5      25.2     White
A83DSG  Staunton Gambit           d4f5 e4fxe4 Nc3Nf6      108.2    White

Table 1 Baseline and test openings
3.2 Dual engine experimental design. The dual engine design (DED) described in a previous paper
(Munshi, Comparing Chess Openings, 2014) is used to compare each test opening with the baseline using
chess engine experiments. Each experiment consists of 300 games played between two chess engines. The
engines selected are Houdini3Pro and Houdini4Pro (Houdart, 2013), generally regarded as the leaders in
this kind of chess software (Wikipedia, 2014). All engine parameters are set to their default values. In
each experiment, each of the two engines plays 150 games as white and 150 games as black. Every game
of each experiment begins with the six half moves of the opening being evaluated. These move sequences are
shown in Table 1. Engine calculations begin with the fourth move by white. The engine moves may cause
the opening to transpose into a different opening designation than the one by which it is identified in this
paper; these transpositions are noted in the Appendix.
2 The identification of the "mainline" varies among databases. The selection of the baseline is therefore somewhat arbitrary since any of the first three openings listed could have been used as the mainline.
3 The rarity values differ among opening databases. They should be taken only in a very approximate sense.
The Deep Shredder chess GUI4 software (Meyer-Kahlen, Deep Shredder, 2007) is used to set up the
engine matches. The search depth is fixed and set to 21 half moves for both engines, a level at which the
engines are expected to play at the grandmaster level or better (Ferreira, 2013). The very high level of
play is evident in the relatively low percentage of decisive games and a low estimated value of IMP. For
example, in the baseline case, 12% of the games were decisive with the remaining 88% ending in draw.
The estimated value of IMP, the rate of imperfection in the moves, is 2% as measured by the number of
wins by black5. Also, a comparison of the playing strength of the two engines6 under the controlled
experimental conditions of this study shows no evidence of a significant difference in playing strength
(DIFF). These statistics are indicative of a very high level of play in which the effect of the opening is
unlikely to be overcome by move imperfections (IMP) or by the difference in playing strength (DIFF).
The relevant data recorded for each experiment are shown below. The opening variability is a count of the
number of unique moves made by the engines from the fourth to the tenth move. It serves as a measure of
the number of variations computed by the engines during the opening phase of the game (Munshi,
Comparing Chess Openings, 2014).
1. White: the number of games won by white
2. Black: the number of games won by black
3. OV: the opening variability
4. Transpositions: whether the engine moves changed the opening designation
3.3 Comparing test openings against the baseline. We assume that game outcomes in the baseline
opening are driven by an underlying, unknown, and unobservable probability vector π; if the opening
innovation7 to be evaluated changes the vector π, we will be able to observe the effect of this change in the
data. We then use the data to classify each test opening into one of three categories:
Category A: Successful innovation. The probability vector has been changed in favor of the innovator.
Category C: Benign innovation. The probability vector is either unchanged or it was changed in a neutral direction.
Category F: Failed innovation. The probability vector has been changed in favor of the opponent.
The test is carried out in stages. First we test to see if the Euclidean distance between the baseline opening
result and the test opening result in the population from which our sample was taken is greater than zero.
The hypotheses for this test are:
Ho: δ = 0
Ha: δ ≠ 0

4 Graphical User Interface.
5 See (Munshi, A Method for Comparing Chess Openings, 2014) for a detailed explanation.
6 The comparison is shown in the Appendix.
7 The terms "test opening" and "opening innovation" are used interchangeably. It is assumed that any opening sequence that differs from the mainline is an innovation.
We set the probability value for our level of disbelief at α=0.001 as suggested by Valen Johnson who has
studied the relationship between the α level and the irreproducibility of results and found that the higher
values of α such as α=0.05 or α=0.01 normally used can lead to spurious findings (Johnson, 2013). If the
probability of observing a sample distance as large or larger8 than the one being tested (i.e. the p-value) is
greater than α we fail to reject Ho and conclude that it is possible that the observed distance is a result of
sampling variation in a sample of 300 games taken from an unobservable population in which δ=0. In
these cases we can immediately classify the test opening into Category C as a benign innovation because
we have no evidence that the probability vector has been changed by the test opening.
However, if the p-value is less than α, we know that δ≠0 and conclude that the test opening has changed
the probability vector but we are unable to classify the test opening until we determine the direction of the
change. If the direction is well within the first or third quadrant, it is possible that the change is in a
neutral direction and therefore we can classify the opening as Category C, benign innovation. This finding
implies that the effect of the opening was only a change in the probability of decisive games, with pw and
pb changing proportionately and neither color gaining an advantage due to the opening innovation.
If the direction is in the second or fourth quadrant, then the relative values of pw and pb have changed and
one color has gained an advantage over the other. In this case we can classify the opening as either
Category A or Category F according to whether the change favors the innovator or the opponent. The
possibilities are shown in Table 2.
Innovator   Quadrant 1   Quadrant 2   Quadrant 3   Quadrant 4
White       C            F            C            A
Black       C            A            C            F

Table 2 Classification according to innovator and direction (δ≠0)
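The rule in Table 2 can be expressed compactly in code. The following is an illustrative sketch (Python, with hypothetical function and argument names; not part of the paper's toolchain):

```python
def classify_rejected(quadrant, innovator):
    """Classify a test opening for which Ho: delta=0 has been rejected.

    Quadrants 1 and 3 mean pw and pb moved together (more or fewer
    decisive games), so the change is neutral -> Category C.
    Quadrant 2 favors black and quadrant 4 favors white; the innovation
    succeeds (A) only if the favored color is the innovator, else it
    fails (F).
    """
    if quadrant in (1, 3):
        return "C"
    favored = "black" if quadrant == 2 else "white"
    return "A" if favored == innovator else "F"

# Example: a quadrant-4 change (favors white) by a black innovator fails
category = classify_rejected(4, "black")
```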
3.4 Monte Carlo simulation. As in the previous papers we use a Monte Carlo numerical technique
to create a simulated sampling distribution from which we derive a measure of variance that we use in our
hypothesis test for distance (Munshi, A Method for Comparing Chess Openings, 2014) (Wikipedia,
2014). The sample data are used to estimate π=[pw,pb,pd] and these estimates are used to generate one
thousand simulated replications of the experiment. For each opening, we compute the squared Euclidean
distance of each simulated game from the mean9. Thus we have one thousand squared distances for each
opening. When comparing two openings we have two thousand squared distances from their respective
means. These squared distances are used to estimate what may be termed the "within treatment" variance
of distance10. This variance serves as a measure of how different the sample results can be when taking
samples of 300 games from the same population with a fixed value of π. This measure of variance can
then be used to compute the probability of observing distances "between treatments11" greater than or
equal to the observed distance if Ho is true and δ=0. This probability serves as the basis of our hypothesis
test.
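The simulation procedure may be sketched as follows (an illustrative reimplementation, not the paper's actual code; the function name and seed are assumptions): estimate π from the sample, replicate the 300-game experiment by trinomial sampling, and use the simulated squared distances from the mean to estimate the variance of distance.

```python
import math
import random

def simulated_sd_of_distance(white_wins, black_wins, n_games=300,
                             n_reps=1000, seed=42):
    """Monte Carlo estimate of the standard deviation of the Euclidean
    distance of a replicated experiment from the observed mean result."""
    pw = white_wins / n_games          # estimated P(white wins)
    pb = black_wins / n_games          # estimated P(black wins)
    rng = random.Random(seed)
    sq_dists = []
    for _ in range(n_reps):
        w = b = 0
        for _ in range(n_games):       # one simulated 300-game experiment
            u = rng.random()
            if u < pw:
                w += 1
            elif u < pw + pb:
                b += 1                 # all remaining games are draws
        # squared distance from the mean, i.e. from the observed sample
        sq_dists.append((w - white_wins) ** 2 + (b - black_wins) ** 2)
    return math.sqrt(sum(sq_dists) / n_reps)

# Baseline Queen's Indian result: 30 wins by white, 6 by black in 300 games
sd = simulated_sd_of_distance(30, 6)
```

Pooling the simulated squared distances of the two openings being compared (two thousand in all) gives the "within treatment" variance used in the hypothesis test; the value obtained from this sketch is of the same order as the stdev estimates reported later in Table 5.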
8 Since the distance is computed as a square root it can be either positive or negative and therefore this is a two-tailed test. The reference to the magnitude of the distance as being "large or larger" refers to its absolute value.
9 By definition, the mean is represented by the sample data that were used to estimate π.
10 In Fisher's terminology each opening is a treatment.
11 Between a test opening and the baseline opening.
4. DATA ANALYSIS
The raw data from ten experiments of 300 games each are shown in Table 3. The essential data are the
number of games won by white (White), the number of games won by black (Black), and the opening
variability (OV). The OV data serve as a measure of the number of different variations played by the
engines in the "opening phase12" of the game after the first three moves specified and fixed for each
experiment are exhausted. These variations often cause the ECO designation13 to change. All such
transpositions are listed in the Appendix along with references to high profile grandmaster games14 for
each ECO designation played by the engines. All three thousand games played are available in PGN
format in the online data archive for this paper (Munshi, PGN Files, 2014).
ID      Name                      OV     White   Black   Decisive   Draw   Pct Draw
E12QID  Queen's Indian Defense    999    30      6       36         264    88%
E43NID  Nimzo Indian Defense      692    30      7       37         263    88%
D37QGD  Queen's Gambit Declined   821    31      3       34         266    89%
E61KID  King's Indian Defense     560    68      3       71         229    76%
E11BID  Bogo Indian Defense       940    29      10      39         261    87%
E01CAT  Catalan Opening           1002   26      5       31         269    90%
A81DUD  Dutch Defense             914    48      3       51         249    83%
A52BUG  Budapest Gambit           403    63      1       64         236    79%
A45TVA  Trompovsky Attack         603    7       7       14         286    95%
A83DSG  Staunton Gambit           419    3       20      23         277    92%

Table 3 Observed sample data
4.1 Hypothesis test for distance. We can now use the sample data15 to compute the Euclidean
distance of each test opening from the baseline opening. The distance may be visualized in Cartesian
coordinates where the x-axis represents the number of white wins and the y-axis represents the number of
black wins in any given simple random sample of 300 games. Each point in this x-y space represents a
sample in our study. In the population of all possible games each point in this space represents a unique
chess game and its probability vector. The data in Table 3 are shown in this format in Figure 1.
12 Arbitrarily assumed to constitute the first ten moves of the game.
13 Encyclopedia of Chess Openings.
14 Tournaments from which these games are taken include the London Classic, World Championship Candidates, Moscow Open, Tata Steel, Tal Memorial, Chigorin Memorial, and the Geneva Chess Masters.
15 Each experiment of 300 games is considered to be a simple random sample of 300 games taken from a population of an infinite number of games in which all games are driven by the same unobservable probability vector.
COMPARING CHESS OPENINGS PART 3, JAMAL MUNSHI, 2014
7
[Figure 1 Sample data in Cartesian coordinates. Scatter plot of the ten openings with the number of games won by white on the x-axis (0 to 80) and the number of games won by black on the y-axis (0 to 25).]
The Queen's Indian Defense is our baseline opening to which the other openings will be compared. So
what we are interested in is the distance and direction of each of the test openings from the Queen's
Indian. These distances and their directions are visualized more easily if we move the origin of the plot to
the Queen's Indian and set it as the (0,0) point. This visualization of the distance vectors in our study is
shown in Figure 2. An example distance computation is shown in Table 4. All the observed distances are
tabulated in Table 5 along with the estimated standard deviation and the hypothesis tests for distance.
                           White   Black
Dutch Defense              48      3
Queens Indian              30      6
Difference                 18      -3
Squared difference         324     9
Squared distance = sum of squared differences = 333
Euclidean distance = square root of squared distance = 18.25

Table 4 Example distance computation
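The computation in Table 4 is a direct application of the Euclidean distance formula; a minimal sketch (the function name is illustrative):

```python
import math

def euclidean_distance(result_a, result_b):
    """Distance between two opening results given as (white_wins, black_wins)."""
    dw = result_a[0] - result_b[0]
    db = result_a[1] - result_b[1]
    return math.sqrt(dw * dw + db * db)

# Dutch Defense (48, 3) against the Queen's Indian baseline (30, 6),
# as in Table 4: sqrt(324 + 9) = sqrt(333), approximately 18.25
d = euclidean_distance((48, 3), (30, 6))
```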
[Figure 2 Visualization of the distance and direction of the test openings from the baseline. The data of Figure 1 re-plotted with the Queen's Indian Defense at the origin (0,0).]
ID      Name                      distance   stdev16   t-value   p-value   Result
E43NID  Nimzo Indian Defense      1.00       5.85      0.171     8.6E-01   ---
D37QGD  Queen's Gambit Declined   3.16       5.68      0.557     5.8E-01   ---
E61KID  King's Indian Defense     38.12      6.64      5.737     1.1E-08   Reject Ho
E11BID  Bogo Indian Defense       4.12       5.88      0.701     4.8E-01   ---
E01CAT  Catalan Opening           4.12       5.61      0.734     4.6E-01   ---
A81DUD  Dutch Defense             18.25      6.17      2.956     3.2E-03   ---
A52BUG  Budapest Gambit           33.38      6.52      5.120     3.3E-07   Reject Ho
A45TVA  Trompovsky Attack         23.02      4.89      4.712     2.6E-06   Reject Ho
A83DSG  Staunton Gambit           30.41      5.34      5.692     1.4E-08   Reject Ho

Table 5 Hypothesis test for distance17
The hypothesis tests in Table 5 show that the observed distances of five of the test openings from the
baseline opening are small enough to have been the result of sampling variation. In these cases we do not
reject the null hypothesis Ho that δ=0 and conclude that the evidence does not show that the opening
innovation has changed the probability vector. These test openings are therefore classified as Category C,
benign innovation. The data are consistent with the hypothesis that the probability vector that generates
game outcomes in these test openings is not different from that which generates game outcomes in the
baseline Queen's Indian Defense, that is, π(test opening) = π(baseline opening).
16 The term stdev refers to the standard deviation of distance and its value is estimated by using a Monte Carlo simulation procedure. The computational details are available in the online data archive for this paper (Munshi, Numerical analysis, 2014).
17 The comparison of each test opening with the baseline opening is shown graphically in the Appendix.
In the remaining four test openings, marked in Table 5 as "Reject Ho", we find that the observed sample
distance is too large to be explained by sampling variation alone. In these cases we reject the Ho
hypothesis and conclude that δ≠0 and that therefore the test opening innovation has changed the
probability vector so that π(test opening) ≠π(baseline opening). To classify these openings we must
examine the direction of the change to determine whether the change in π favors the innovator, or whether
it favors the opponent, or whether the change is in a neutral direction and does not favor either party. The
direction information for these four test openings is shown in Table 6.
ID      Name                    distance   angle   quadrant   favors   innovator   Category
E61KID  King's Indian Defense   23.36      356     4          white    black       F
A52BUG  Budapest Gambit         23.50      352     4          white    black       F
A45TVA  Trompovsky Attack       25.15      178     2          black    white       F
A83DSG  Staunton Gambit         24.66      161     2          black    white       F

Table 6 Classification of distant test openings according to direction
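The angle and quadrant in Table 6 follow directly from the coordinate differences. A sketch of the computation (assuming, as in the figures, that angles are measured counterclockwise from the positive x-axis):

```python
import math

def direction(test, baseline):
    """Angle (degrees, 0-360) and quadrant of a test opening's result
    relative to the baseline, with x = white wins and y = black wins."""
    dx = test[0] - baseline[0]         # change in white wins
    dy = test[1] - baseline[1]         # change in black wins
    angle = math.degrees(math.atan2(dy, dx)) % 360.0
    quadrant = int(angle // 90) + 1
    return angle, quadrant

# King's Indian (68, 3) relative to the Queen's Indian baseline (30, 6):
# dx = 38, dy = -3, so the direction lies in quadrant 4 (favors white)
angle, quadrant = direction((68, 3), (30, 6))
```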
What we see in Table 6 is that none of the test openings that changed the π vector gained from the change
and also that none of these changes are in a neutral direction. All of these innovations are detrimental to
the innovator and therefore all of them are classified as Category F, failed innovation. The information in
Table 6 is presented visually in Figure 2 where one can see clearly that the Category F innovations by
white decreased white's chance of winning or increased black's chance of winning or that they did both.
Likewise Figure 2 also shows that the Category F innovations by black decreased black's chance of
winning or increased white's chance of winning or that they did both.
We now summarize our findings in Table 7. The table shows our final classification of all the test
openings in view of the data we collected from our controlled engine experiments and their analysis as
presented above. As noted in the table, none of the openings tested was a successful innovation and none
of the changes in the probability vector occurred in a neutral direction.
ID      Name                      Category   Reason for the classification
E43NID  Nimzo Indian Defense      C          Observed distance explained by sampling variation
D37QGD  Queen's Gambit Declined   C          Observed distance explained by sampling variation
E61KID  King's Indian Defense     F          Probability vector changed in favor of the opponent
E11BID  Bogo Indian Defense       C          Observed distance explained by sampling variation
E01CAT  Catalan Opening           C          Observed distance explained by sampling variation
A81DUD  Dutch Defense             C          Observed distance explained by sampling variation
A52BUG  Budapest Gambit           F          Probability vector changed in favor of the opponent
A45TVA  Trompovsky Attack         F          Probability vector changed in favor of the opponent
A83DSG  Staunton Gambit           F          Probability vector changed in favor of the opponent
NONE    ----                      A          Probability vector changed in favor of the innovator
NONE    ----                      C          Probability vector changed in a neutral direction

Table 7 Summary of findings
5. CONCLUSIONS
Engine experiments carried out under controlled conditions show that the Queen's Indian Defense, chosen
as the baseline opening in this study, may be considered to be neutral and perfect in this category of
openings because none of the nine innovations tested offered any advantage to the innovator18. Of the
nine test openings, five were found to be benign innovations as they had no measurable effect on the
underlying and unobservable probability vector that determines chess game outcomes under the baseline
conditions. The observed distances of these openings from the baseline may be explained in terms of
sampling variation19.
Two members of this group, the Dutch Defense, an innovation by black, and the Catalan Opening, an
innovation by white, are noteworthy in that our findings are inconsistent with their rarity of play and
supportive of their positive evaluation by analysts who have studied these openings (Bologan, 2012)
(Harding, 2010) (Kelley, 2005) (Kelley, Catalan, 2008). The other three members of this group, the
Nimzo Indian Defense, Queen's Gambit Declined, and the Bogo Indian Defense are universally
considered to be strong openings (Dearin, 2005) (Sielecki, 2014) comparable with the Queen's Indian
Defense and our findings along with their popularity in the opening book support this view.
In four of the test openings, the evidence indicates that the opening innovation changed the probability
vector. This means that the probability vector that generates game outcomes in these openings is not the
same as that which generates game outcomes in the baseline Queen's Indian Defense. In all four cases the
change in the probability vector goes against the innovator and so they are classified as failed innovations.
The most significant member of this group is the King's Indian Defense, an opening that is at once a
popular line in the opening book database (Jones & Powell, 2014) and also viewed in a positive light by
many analysts (Kelley, Kings Indian, 2008) (Gserper, 2010) (Golubov, 2006). Our experiment shows that
it is a failed innovation by black. Some analysts agree (Semkov, 2009) (Hansen, 2009). The other three
failed innovations are less controversial because our findings are consistent with general opinion and the
opening book. The Staunton Gambit, the Trompovsky Attack, and the Budapest Gambit are played very
rarely at high level games according to the opening books (Jones & Powell, 2014) and analysts generally
tend to project a negative opinion on these innovations (Dzindzichashvili, 2009) (Schiller, 1993) (Prie,
2009).
The motivation for this study is not so much to pass judgment on specific opening lines but rather to
develop and refine an objective methodology for the evaluation of chess openings in general. We
recognize that at a sufficiently high move imperfection rate (IMP) chess games would be decided mostly
by move errors and in those games the disadvantage of the failed opening innovations noted in this study
may not become apparent20. Yet the relative merit of opening lines is of great interest to the chess
community and its evaluation has a practical application for designers of opening books.
18 A comparison of the Queen's Indian Defense with the Sicilian mainline in the Appendix further supports its selection as a neutral and perfect baseline against which other queen's pawn openings may be compared.
19 The term "sampling variation" refers to the difference among samples taken from the same population.
20 The move imperfection rate in the baseline opening as measured by the percentage win by black is 13% in grandmaster games (Jones & Powell, 2014) compared with 2% in our engine experiments.
6. APPENDIX
6.1 Comparison of the playing strength of the engines. One of the test opening experiments had a
result very similar to that of the baseline opening. To compare engine strengths at the same sample size
used for comparing openings (n=300), we combine these two experiments into a large sample of 600
games as shown below.
Opening                     Games played   White won   Black won
E12 Queens Indian Defense   300            30          6
E43 Nimzo Indian Defense    300            30          7
Combined E12 + E43          600            60          13
Of the 600 games, each engine played 300 games as white and 300 as black. We now count for each
engine the number of games won as white and the number won as black. These data allow us to set up
the comparison of the engines as follows.
Engine        Games played as white   Won as White   Games played as black   Won as Black
Houdini3Pro   300                     33             300                     4
Houdini4Pro   300                     27             300                     9
These results, plotted in Cartesian coordinates, show a Euclidean distance between them of 7.8102. As in
the evaluation of openings, we create a simulated sampling distribution and estimate the variation of the
distance that we can expect from one sample to the next. We can now set up a hypothesis test for distance
as follows:
Ho: δ = 0
Ha: δ ≠ 0
The data are tabulated below and shown graphically in Figure 3.
Observed distance: 7.8102
Standard deviation of the sampling distribution of distances: 5.8151
Value of the t-statistic for Ho: δ=0: 1.3431
Probability of observing a t-value this large or larger if δ=0: 0.1794
Probability value serving as our threshold of disbelief: 0.001
The test shows that the distance observed is one we would expect to observe as sampling error even when the
true value of the distance is zero. Therefore we fail to reject the Ho statement that δ=0 and conclude that
the evidence does not show that there is a difference in playing strength between the two engines under
the experimental conditions used in this study.
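The p-value above can be reproduced to good accuracy with a normal approximation to the t statistic (a sketch under that assumption; at roughly one thousand simulated replications the t and normal distributions are nearly identical, and the function name is illustrative):

```python
from statistics import NormalDist

def two_tailed_p(distance, sd):
    """Two-tailed p-value for Ho: delta = 0, given the observed distance
    and the Monte Carlo estimate of its standard deviation."""
    t = distance / sd
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))

# Engine-strength comparison: distance 7.8102, sd 5.8151 gives t of about
# 1.343 and a p-value of about 0.179, well above the 0.001 threshold
p = two_tailed_p(7.8102, 5.8151)
```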
[Figure 3 Comparison of engine playing strength. Houdini4Pro -vs- Houdini3Pro: games won as white (x-axis, 0 to 60) against games won as black (y-axis, 0 to 25).]
6.2 Comparison of two baseline openings. A baseline opening used in a previous study (Munshi,
Comparing Chess Openings, 2014) and the one used in this study are compared. The sample data from the
DED experiments are as follows21:
Opening                     Games played   White won   Black won
E12 Queens Indian Defense   300            30          6
B53 Sicilian Defense        300            19          3
The Euclidean distance between these results in Cartesian coordinates is 11.402 and the Queen's Indian
Defense projects at an angle of 14 degrees from the Sicilian Defense. The standard deviation of distance
is estimated using Monte Carlo simulation to be 5.3208. Using the t-distribution we find that the
probability of observing a sample distance of 11.402 or greater under these conditions is 0.0322, much
larger than our threshold of α=0.001. We fail to reject Ho that the true distance δ=0 and that the two
probability vectors are the same. In any event, even if the observed distance were larger and we had
rejected Ho, we would have to consider that the angle lies in the first quadrant and that therefore the large
distance would only indicate a difference in the probability of decisive games and not necessarily a
relative advantage to either white or black. The comparison is shown graphically in Figure 4.
21 The three-move sequences used in the test are: B53 Sicilian Defense = e4c5 Nf3d6 d4cxd4 and E12 Queen's Indian Defense = d4Nf6 c4e6 Nf3b6.
[Figure 4 E12 Queen's Indian Defense compared with B53 Sicilian Defense. Number of games won by White (x-axis) against number of games won by Black (y-axis).]22
6.3 Transpositions. The ECO codes used to identify the baseline and test openings apply to the first
three moves that were fixed for each experiment. Engine calculations began with the fourth move, and the
engine moves often caused transpositions to different ECO designations. All such transpositions are noted
in Table 8 along with references to recent grandmaster games.
ID       Transpositions                                  Grandmaster games
E12QID   E12 Queen's Indian/Petrosian Variation          (Carlsen-Karjakin, 2012)
         E14 Queen's Indian/Classical Variation          (Kramnik-Pelletier, 2013)
         E16 Queen's Indian/Classical Variation          (Gelfand-Gashimov, 2012)
E43NID   E43 Nimzo Indian/Nimzowitsch Variation          (Radjabov-Leitao, 2001)
         E43 Nimzo Indian/Nimzowitsch Variation          (Gelfand-Grischuk, 2012)
         E47 Nimzo Indian/Mainline                       (Topalov-Kramnik, 2014)
D37QGD   D37 Queen's Gambit Declined                     (Carlsen-Radjabov, 2014)
E61KID   E70 Kings Indian Defense                        (Aronian-Radjabov, 2013)
         E71 Kings Indian Defense                        (Aronian-Carlsen, 2013)
         E90 Kings Indian Defense                        (Nakamura-Morozevich, 2013)
         E91 Kings Indian Defense                        (Caruana-Short, 2013)
E11BID   E11 Bogo Indian Defense                         (Radjabov-Karjakin, 2012)
         E15 Queen's Indian/Classical Variation          (Bratteteig-Pleninger, 2012)
         E16 Queen's Indian/Classical Variation          (Kramnik-Leko, 2012)
E01CAT   E01 Catalan Opening                             (Andreikin-Mamedyarov, 2014)
         E04 Catalan Opening                             (Caruana-Karjakin, 2012)
         E06 Catalan Opening
         E11 Bogo Indian Defense                         (Cramling-Kosteniuk, 2000)
         E16 Queen's Indian/Classical Variation          (Gupta-Carr, 2013)
A81DUD   A81 Dutch Defense                               (Aronian-Nakamura, 2012)
         A87 Dutch/Leningrad Variation                   (Phillips-Moloney, 2012)
         A88 Dutch/Leningrad Variation
         A89 Dutch/Leningrad Variation
A52BUG   A52 Budapest Gambit                             (Gelfand-Rapport, 2014)
A45TVA   A45 Trompovsky Attack                           (Rapport-Aronian, 2014)
Table 8 Transpositions
22 In all such graphs the baseline is shown in blue with square markers and the test opening is shown in red with diamond-shaped markers.
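The bookkeeping behind Table 8 can be sketched as a simple grouping of each game's post-engine ECO classification under the fixed three-move opening it started from. The records below are hypothetical stand-ins, not the study's data:

```python
from collections import defaultdict

def tabulate_transpositions(games):
    """Group the ECO code each game ended up in under the fixed
    opening it started from, with a count per final ECO code."""
    table = defaultdict(lambda: defaultdict(int))
    for opening_id, final_eco in games:
        table[opening_id][final_eco] += 1
    # Convert nested defaultdicts to plain dicts for display
    return {oid: dict(ecos) for oid, ecos in table.items()}

# Hypothetical engine-game records: (fixed opening, ECO after engine play)
games = [
    ("E12QID", "E12"), ("E12QID", "E14"), ("E12QID", "E16"),
    ("E12QID", "E14"), ("E61KID", "E90"), ("E61KID", "E91"),
]
summary = tabulate_transpositions(games)
print(summary)
```

A real pipeline would obtain `final_eco` by matching each game's position after the engine's moves against an ECO classification database; that lookup is omitted here.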
6.4 Graphical depiction of Monte Carlo simulation results
[Nine scatter plots, one per comparison: E12 QID -vs- E43 NID, D37 QGD, E61 KID, E11 BID, E01 CAT, A81 DUD, A52 BUG, A45 TVA, and A83 DSG. Each panel plots the number of games won by White (horizontal axis) against the number of games won by Black (vertical axis).]
7. REFERENCES
Andreikin-Mamedyarov. (2014). World Chess Championship Candidates. Retrieved 2014, from
chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1751553
Aronian-Carlsen. (2013). Tata Steel. Retrieved 2014, from chessgames.com:
http://www.chessgames.com/perl/chessgame?gid=1704703
Aronian-Nakamura. (2012). Tal Memorial. Retrieved 2014, from chessgames.com:
http://www.chessgames.com/perl/chessgame?gid=1669174
Aronian-Radjabov. (2013). World Championship Candidates. Retrieved 2014, from chessgames.com:
http://www.chessgames.com/perl/chessgame?gid=1714072
Bologan, V. (2012). The Powerful Catalan. New in Chess.
Bratteteig-Pleninger. (2012). London Classic. Retrieved 2014, from chessgames.com:
http://www.chessgames.com/perl/chessgame?gid=1700479
Carlsen-Karjakin. (2012). Tata Steel. Retrieved 2014, from chessgames.com:
http://www.chessgames.com/perl/chessgame?gid=1654399
Carlsen-Radjabov. (2014). Gashimov Memorial. Retrieved 2014, from chessgames.com:
http://www.chessgames.com/perl/chessgame?gid=1753385
Caruana-Karjakin. (2012). Tata Steel. Retrieved 2014, from chessgames.com:
http://www.chessgames.com/perl/chessgame?gid=1744284
Caruana-Short. (2013). London Chess Classic. Retrieved 2014, from chessgames.com:
http://www.chessgames.com/perl/chessgame?gid=1741071
Cramling-Kosteniuk. (2000). WCC. Retrieved 2014, from chessgames.com:
http://www.chessgames.com/perl/chessgame?gid=1258901
Dearing, E. (2005). Play the Nimzo-Indian. Everyman Chess.
Dzindzichashvili, R. (2009). Budapest Gambit. Retrieved 2014, from YouTube:
http://www.youtube.com/watch?v=-ShFKpTkL9Q
Ferreira, D. (2013). The impact of search depth on chess playing strength. Retrieved 2014, from Instituto
Superior Tecnico: http://web.ist.utl.pt/diogo.ferreira/papers/ferreira13impact.pdf
Gelfand-Gashimov. (2012). Tata Steel. Retrieved 2014, from chessgames.com:
http://www.chessgames.com/perl/chessgame?gid=1654442
Gelfand-Grischuk. (2012). World Rapid Championship. Retrieved 2014, from chessgames.com:
http://www.chessgames.com/perl/chessgame?gid=1671296
Gelfand-Rapport. (2014). Tata Steel. Retrieved 2014, from chessgames.com:
http://www.chessgames.com/perl/chessgame?gid=1744267
Golubev, M. (2006). Understanding the King's Indian. Gambit Publications.
Gserper, G. (2010). King's Indian Defense. Retrieved 2014, from chess.com:
http://www.chess.com/article/view/openings-for-tactical-players-kings-indian-defense
Gupta-Carr. (2013). London Classic. Retrieved 2014, from chessgames.com:
http://www.chessgames.com/perl/chessgame?gid=1740547
Hansen, C. (2009). Checkpoint. Retrieved 2014, from Chesscafe.com:
http://www.chesscafe.com/text/hansen125.pdf
Harding, T. (2010). Play the Dutch. Retrieved 2014, from chesscafe.com:
http://www.chesscafe.com/text/kibitz175.pdf
Houdart, R. (2012). Houdini. Retrieved November 2013, from cruxis.com:
http://www.cruxis.com/chess/houdini.htm
Houdart, R. (2013). Houdini Chess. Retrieved 2014, from cruxis.com:
http://www.cruxis.com/chess/houdini.htm
Johnson, V. E. (2013, November). Revised Standards for Statistical Evidence. Retrieved December 2013,
from Proceedings of the National Academy of Sciences:
http://www.pnas.org/content/110/48/19313.full
Jones, R., & Powell, D. (2014). Game Database. Retrieved February 2014, from chesstempo.com:
http://chesstempo.com/game-database.html
Kelley, D. (2008). Catalan. Retrieved 2014, from chessopenings.com: http://chessopenings.com/catalan/
Kelley, D. (2005). Dutch. Retrieved 2014, from chessopenings.com: http://chessopenings.com/dutch/
Kelley, D. (2008). Kings Indian. Retrieved 2014, from chessopenings.com:
http://chessopenings.com/kings+indian/
Kramnik-Leko. (2012). Dortmund. Retrieved 2014, from chessgames.com:
http://www.chessgames.com/perl/chessgame?gid=1672566
Kramnik-Pelletier. (2013). Geneva Chess Masters. Retrieved 2014, from chessgames.com:
http://www.chessgames.com/perl/chessgame?gid=1722084
Meyer-Kahlen, S. (2007). Deep Shredder. Retrieved 2014, from Shredderchess.com:
http://www.shredderchess.com/chess-software/deep-shredder12.html
Meyer-Kahlen, S. (2000). Opening Database. Retrieved January 2014, from Shredder Chess:
http://www.shredderchess.com/online-chess/online-databases/opening-database.html
Munshi, J. (2014). A Method for Comparing Chess Openings. Retrieved 2014, from SSRN:
http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2415203
Munshi, J. (2014). Comparing Chess Openings. Retrieved 2014, from SSRN:
http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2427542
Munshi, J. (2014). Numerical analysis. Retrieved 2014, from Dropbox:
https://www.dropbox.com/sh/c7ze8c64ukpf525/AADAuuDx_mxXU6BVH-6GoF2ka
Munshi, J. (2014). PGN Files. Retrieved 2014, from Dropbox:
https://www.dropbox.com/sh/nj14ur3cucew5xo/AABaz_LViWNpRTSaBKlUWmj5a
Nakamura-Morozevich. (2013). FIDE Grand Prix Zug. Retrieved 2014, from chessgames.com:
http://www.chessgames.com/perl/chessgame?gid=1716006
Phillips-Moloney. (2012). London Classic. Retrieved 2014, from chessgames.com:
http://www.chessgames.com/perl/chessgame?gid=1700347
Prie, E. (2009). d-pawn specials. Retrieved 2014, from chesspublishing.com:
http://www.chesspublishing.com/content/8/jul09.htm
Radjabov-Karjakin. (2012). Tata Steel. Retrieved 2014, from chessgames.com:
http://www.chessgames.com/perl/chessgame?gid=1654273
Radjabov-Leitao. (2001). E43 Nimzo Indian. Retrieved 2014, from chessgames.com:
http://www.chessgames.com/perl/chessgame?gid=1242308
Rapport-Aronian. (2014). Tata Steel. Retrieved 2014, from chessgames.com:
http://www.chessgames.com/perl/chessgame?gid=1744275
Schiller, E. (1993). How to play against the Staunton Gambit. Chess Digest.
Semkov, S. (2009). Kill KID Vol. 1. New in Chess.
Sielecki, C. (2014). Nimzo and Bogo Indian. Everyman Chess.
Topalov-Kramnik. (2014). World Chess Championships Candidates. Retrieved 2014, from
chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1751385
Wikipedia. (2014). Houdini Chess. Retrieved 2014, from Wikipedia:
http://en.wikipedia.org/wiki/Houdini_(chess)
Wikipedia. (2014). Monte Carlo Simulation. Retrieved 2014, from Wikipedia:
http://en.wikipedia.org/wiki/Monte_Carlo_method