COMPARING CHESS OPENINGS PART 3: QUEEN'S PAWN OPENINGS

A dual engine experimental design for comparing chess openings was described in a previous paper (Munshi, Comparing Chess Openings, 2014). It is used in this paper to study ten chess openings that are initiated with the queen's pawn move 1. d4. One of the openings is identified as the mainline and the other nine as variations from the mainline. Five of the variations are found to be benign innovations and the other four are deemed to be failed innovations. The findings are mostly consistent with expert opinion. The primary purpose of this paper, however, is not these specific findings but rather the further development and verification of an objective and quantitative methodology for the evaluation of chess openings in general1.

1. INTRODUCTION

This paper is the third in a series of studies undertaken to develop a generally applicable methodology for the objective evaluation of chess openings. The proposed methodology uses controlled experiments with chess engines to compare chess openings. The first paper in this series (Munshi, A Method for Comparing Chess Openings, 2014) presented a single engine experimental design (SED) to compare ten openings that are initiated with the King's pawn move 1. e4. It demonstrated that the proposed methodology is able to discriminate between known strong openings and known weak openings. The advantage of the SED is that it removes the difference in playing strength from the experiment and isolates the effect of the opening; the disadvantage is that the same engine playing both sides of the board may introduce an engine bias in the data by not playing a sufficiently diverse set of opening variations. Subsequently, a dual engine design (DED) was proposed to address the issue of engine bias (Munshi, Comparing Chess Openings, 2014).
The second paper showed that there may have been a propensity for engine bias in the SED and that the engine bias problem is mitigated by the DED, which forces the engines to play a greater number of variations. This paper describes a further test of the DED using a new set of openings.

The motivation for this study is that conventional methods of evaluating chess openings are inadequate. Grandmaster opinions are subjective and inconsistent, while the win-loss-draw statistics in opening book databases are field data that were not taken under controlled conditions and are therefore confounded by intervening variables that have a greater effect on game outcomes than the opening (Munshi, A Method for Comparing Chess Openings, 2014). As a result, there are conflicting opinions on the merit of the different lines in the opening book, and these opinions have engendered ongoing debates with no satisfactory conclusion. It is proposed that an objective method for evaluating openings will settle these issues and help to refine the opening book.

1 Date: May 2014. Key words and phrases: chess openings, chess engines, refutation, methodology, Monte Carlo simulation, numerical methods, probability vector, Euclidean distance, robust statistics, bootstrap. Author affiliation: Professor Emeritus, Sonoma State University, Rohnert Park, CA, 94928, [email protected]

2. THEORY

Chess games may be thought of as a stochastic trinomial process driven by an unknown and unobservable underlying probability vector given by

π = [pw, pb, pd]    (Equation 1)

where π is a vector with two degrees of freedom, pw is the probability that white will win, pb is the probability that black will win, and pd = 1 - pw - pb is the probability that the game will end in a draw.
The components of the probability vector π are determined by (1) white's first move advantage, or FMA, (2) the general rate of imperfection in the moves, or IMP, (3) the difference in playing strength between the player making white moves and the player making black moves, or DIFF, and (4) the opening employed (Munshi, A Method for Comparing Chess Openings, 2014). The value of FMA is not known, but we know that it is a universal constant and we suspect that its effect is relatively small. Experiments designed to measure the effect of the opening must therefore control the values of IMP and DIFF so that the opening effect can be observed. Our hypothesis is that the choice of opening line played can change π and that therefore chess engine experiments under these controlled conditions may be used to detect the effect of openings on the probability vector π. In comparing two openings, opening-1 and opening-2, our research question and hypotheses are set up as follows:

1. Research question: Is π1 = π2?
2. Null hypothesis Ho: π1 = π2
3. Alternate hypothesis Ha: π1 ≠ π2

A testable implication of this hypothesis is that if the true (and unknown) population mean results are plotted in Cartesian coordinates with x = number of wins by white and y = number of wins by black, and the Euclidean distance between opening-1 and opening-2 is computed and designated as δ, then we may write the testable hypotheses as:

Ho: δ = 0
Ha: δ ≠ 0

If we fail to reject Ho in this test, we immediately reach the conclusion that the evidence does not show that the probability vector π is changed by using opening-2 instead of opening-1. If we reject Ho, however, we know that the probability vector changed but we still don't know the direction of the change. Further tests are necessary to determine whether the change favors white, whether it favors black, or whether the change is in a neutral direction and favors neither black nor white.
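As an illustration of this trinomial model, game outcomes can be simulated from a hypothetical probability vector. The sketch below is illustrative only; the values of π used are assumptions (the true vector is unobservable), chosen to resemble the draw-heavy results reported later in the paper.

```python
import random

def simulate_games(pi, n_games=300, seed=0):
    """Draw n_games outcomes from a trinomial probability vector
    pi = [pw, pb, pd] and return (white_wins, black_wins, draws)."""
    rng = random.Random(seed)
    outcomes = rng.choices(["white", "black", "draw"], weights=pi, k=n_games)
    return (outcomes.count("white"),
            outcomes.count("black"),
            outcomes.count("draw"))

# Hypothetical, draw-heavy vector of the kind seen at high search depths.
w, b, d = simulate_games([0.10, 0.02, 0.88])
print(w, b, d)  # three counts summing to 300
```

Each 300-game experiment in this study can be viewed as one such draw from the (unknown) vector π.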
3. METHODOLOGY

3.1 Baseline and test openings. Well recognized and established opening book databases (Meyer-Kahlen, 2000) (Jones & Powell, 2014) are used to select the first three moves (first six half moves) of ten openings that begin with the queen's pawn move 1. d4. The opening sequence in this category most used by grandmasters is the Queen's Indian Defense (Meyer-Kahlen, 2000)2, and it is identified as the baseline opening; the other nine opening sequences selected are described as test openings, or innovations. The proposed methodology for comparing chess openings is then used to compare each of the test openings with the baseline opening. The ten opening sequences selected for this study are shown in Table 1. The rarity data shown in the table refer to the frequency of the baseline relative to the test opening according to the opening database used for this purpose3 (Meyer-Kahlen, Opening Database, 2000). The nine test openings were selected to cover a large rarity range and are listed in the table from the most frequently played to the least. The first four test openings listed may be considered commonly used, the next two are not very common, and the last three are rarely played. The test openings selected are expected to represent a wide spectrum of possibilities in the queen's pawn game.
Table 1: Baseline and test openings

ID      Name                     Fixed 3-move sequence  Rarity  Innovator
E12QID  Queen's Indian Defense   d4Nf6 c4e6 Nf3b6       —       (baseline)
E43NID  Nimzo Indian Defense     d4Nf6 c4e6 Nc3Bb4      1.0     White
D37QGD  Queen's Gambit Declined  d4Nf6 c4e6 Nf3d5       1.0     Black
E61KID  King's Indian Defense    d4Nf6 c4g6 Nc3Bg7      1.7     Black
E11BID  Bogo Indian Defense      d4Nf6 c4e6 Nf3Bb4+     2.6     Black
E01CAT  Catalan Opening          d4Nf6 c4e6 g3d5        7.1     White
A81DUD  Dutch Defense            d4f5 g3Nf6 Bg2g6       8.1     Black
A52BUG  Budapest Gambit          d4Nf6 c4e5 dxe5Ng4     23.3    Black
A45TVA  Trompovsky Attack        d4Nf6 Bg5Ne4 Bf4c5     25.2    White
A83DSG  Staunton Gambit          d4f5 e4fxe4 Nc3Nf6     108.2   White

3.2 Dual engine experimental design. The dual engine design (DED) described in a previous paper (Munshi, Comparing Chess Openings, 2014) is used to compare each test opening with the baseline using chess engine experiments. Each experiment consists of 300 games played between two chess engines. The engines selected are Houdini3Pro and Houdini4Pro (Houdart, 2013), generally regarded as the leaders in this kind of chess software (Wikipedia, 2014). All engine parameters are set to their default values. In each experiment, each of the two engines plays 150 games as white and 150 games as black. Every game of each experiment begins with the six half moves being evaluated. These move sequences are shown in Table 1. Engine calculations begin with the fourth move by white. The engine moves may cause transpositions of the opening into different opening designations than the one by which it is identified in this paper; these transpositions are noted in the Appendix.

2 The identification of the "mainline" varies among databases. The selection of the baseline is therefore somewhat arbitrary since any of the first three openings listed could have been used as the mainline.
3 The rarity values differ among opening databases. They should be taken only in a very approximate sense.
The Deep Shredder chess GUI4 software (Meyer-Kahlen, Deep Shredder, 2007) is used to set up the engine matches. The search depth is fixed at 21 half moves for both engines, a level at which the engines are expected to play at the grandmaster level or better (Ferreira, 2013). The very high level of play is evident in the relatively low percentage of decisive games and a low estimated value of IMP. For example, in the baseline case, 12% of the games were decisive with the remaining 88% ending in draws. The estimated value of IMP, the rate of imperfection in the moves, is 2% as measured by the number of wins by black5. Also, a comparison of the playing strength of the two engines6 under the controlled experimental conditions of this study shows no evidence of a significant difference in playing strength (DIFF). These statistics are indicative of a very high level of play in which the effect of the opening is unlikely to be overcome by move imperfections (IMP) or by the difference in playing strength (DIFF).

The relevant data recorded for each experiment are shown below. The opening variability is a count of the number of unique moves made by the engines from the fourth to the tenth move. It serves as a measure of the number of variations computed by the engines during the opening phase of the game (Munshi, Comparing Chess Openings, 2014).

1. White: the number of games won by white
2. Black: the number of games won by black
3. OV: the opening variability
4. Transpositions: whether the engine moves changed the opening designation

3.3 Comparing test openings against the baseline. We assume that game outcomes in the baseline opening are driven by an underlying, unknown, and unobservable probability vector π; if the opening innovation7 to be evaluated changes the vector π, we will be able to observe the effect of this change in the data.
We then use the data to classify each test opening into one of three categories:

Category A: Successful innovation. The probability vector has been changed in favor of the innovator.
Category C: Benign innovation. The probability vector is either unchanged or it was changed in a neutral direction.
Category F: Failed innovation. The probability vector has been changed in favor of the opponent.

The test is carried out in stages. First we test whether the Euclidean distance between the baseline opening result and the test opening result in the population from which our sample was taken is greater than zero. The hypotheses for this test are:

Ho: δ = 0
Ha: δ ≠ 0

We set the probability value for our level of disbelief at α=0.001 as suggested by Valen Johnson, who has studied the relationship between the α level and the irreproducibility of results and found that the higher values of α normally used, such as α=0.05 or α=0.01, can lead to spurious findings (Johnson, 2013). If the probability of observing a sample distance as large or larger8 than the one being tested (i.e., the p-value) is greater than α, we fail to reject Ho and conclude that it is possible that the observed distance is a result of sampling variation in a sample of 300 games taken from an unobservable population in which δ=0. In these cases we can immediately classify the test opening into Category C as a benign innovation because we have no evidence that the probability vector has been changed by the test opening.

4 Graphical User Interface
5 See (Munshi, A Method for Comparing Chess Openings, 2014) for a detailed explanation.
6 The comparison is shown in the Appendix.
7 The terms "test opening" and "opening innovation" are used interchangeably. It is assumed that any opening sequence that differs from the mainline is an innovation.
However, if the p-value is less than α, we know that δ≠0 and conclude that the test opening has changed the probability vector, but we are unable to classify the test opening until we determine the direction of the change. If the direction is well within the first or third quadrant, it is possible that the change is in a neutral direction and therefore we can classify the opening as Category C, benign innovation. This finding implies that the effect of the opening was only a change in the probability of decisive games, with pw and pb changing proportionately and neither color gaining an advantage due to the opening innovation. If the direction is in the second or fourth quadrant, then the relative values of pw and pb have changed and one color has gained an advantage over the other. In this case we can classify the opening as either Category A or Category F according to whether the change favors the innovator or the opponent. The possibilities are shown in Table 2.

Table 2: Classification according to innovator and direction (δ≠0)

Innovator  Quadrant 1  Quadrant 2  Quadrant 3  Quadrant 4
White      C           F           C           A
Black      C           A           C           F

3.4 Monte Carlo simulation. As in the previous papers, we use a Monte Carlo numerical technique to create a simulated sampling distribution from which we derive a measure of variance for use in our hypothesis test for distance (Munshi, A Method for Comparing Chess Openings, 2014) (Wikipedia, 2014). The sample data are used to estimate π=[pw,pb,pd], and these estimates are used to generate one thousand simulated replications of the experiment. For each opening, we compute the squared Euclidean distance of each simulated game from the mean9. Thus we have one thousand squared distances for each opening. When comparing two openings we have two thousand squared distances from their respective means. These squared distances are used to estimate what may be termed the "within treatment" variance of distance10.
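The simulation procedure described above might be sketched as follows. This is an illustrative reconstruction, not the author's code, and the exact way the paper pools the two openings' squared distances may differ from the pooling assumed here.

```python
import math
import random

def replicate(pw, pb, n=300, rng=random):
    """One simulated n-game replication; returns (white_wins, black_wins)."""
    w = b = 0
    for _ in range(n):
        u = rng.random()
        if u < pw:
            w += 1
        elif u < pw + pb:
            b += 1
    return w, b

def squared_distances(sample, n=300, reps=1000, rng=random):
    """Squared Euclidean distances of simulated replications from the mean,
    where the mean is represented by the observed sample itself."""
    w0, b0 = sample
    pw, pb = w0 / n, b0 / n
    return [(w - w0) ** 2 + (b - b0) ** 2
            for w, b in (replicate(pw, pb, n, rng) for _ in range(reps))]

rng = random.Random(42)
baseline, test = (30, 6), (48, 3)          # observed wins (white, black)
sq = squared_distances(baseline, rng=rng) + squared_distances(test, rng=rng)
stdev = math.sqrt(sum(sq) / len(sq))       # "within treatment" stdev of distance
t = math.dist(baseline, test) / stdev      # test statistic for Ho: delta = 0
```

Run with the Dutch Defense data shown, this sketch yields values close to the stdev of 6.17 and t-value of 2.956 reported in Table 5.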
The variance so obtained serves as a measure of how different the sample results can be when taking samples of 300 games from the same population with a fixed value of π. This measure of variance can then be used to compute the probability of observing distances "between treatments11" greater than or equal to the observed distance if Ho is true and δ=0. This probability serves as the basis of our hypothesis test.

8 Since the distance is computed as a square root it can be either positive or negative and therefore this is a two-tailed test. The reference to the magnitude of the distance as being "large or larger" refers to its absolute value.
9 By definition, the mean is represented by the sample data that were used to estimate π.
10 In Fisher's terminology each opening is a treatment.
11 Between a test opening and the baseline opening.

4. DATA ANALYSIS

The raw data from ten experiments of 300 games each are shown in Table 3. The essential data are the number of games won by white (White), the number of games won by black (Black), and the opening variability (OV). The OV data serve as a measure of the number of different variations played by the engines in the "opening phase12" of the game after the first three moves specified and fixed for each experiment are exhausted. These variations often cause the ECO designation13 to change. All such transpositions are listed in the Appendix along with references to high profile grandmaster games14 for each ECO designation played by the engines. All three thousand games played are available in PGN format in the online data archive for this paper (Munshi, PGN Files, 2014).
Table 3: Observed sample data

ID      Name                     OV    White  Black  Decisive  Draw  Pct Draw
E12QID  Queen's Indian Defense   999   30     6      36        264   88%
E43NID  Nimzo Indian Defense     692   30     7      37        263   88%
D37QGD  Queen's Gambit Declined  821   31     3      34        266   89%
E61KID  King's Indian Defense    560   68     3      71        229   76%
E11BID  Bogo Indian Defense      940   29     10     39        261   87%
E01CAT  Catalan Opening          1002  26     5      31        269   90%
A81DUD  Dutch Defense            914   48     3      51        249   83%
A52BUG  Budapest Gambit          403   63     1      64        236   79%
A45TVA  Trompovsky Attack        603   7      7      14        286   95%
A83DSG  Staunton Gambit          419   3      20     23        277   92%

4.1 Hypothesis test for distance. We can now use the sample data15 to compute the Euclidean distance of each test opening from the baseline opening. The distance may be visualized in Cartesian coordinates where the x-axis represents the number of white wins and the y-axis represents the number of black wins in any given simple random sample of 300 games. Each point in this x-y space represents a sample in our study. In the population of all possible games, each point in this space represents a unique chess game and its probability vector. The data in Table 3 are shown in this format in Figure 1.

12 Arbitrarily assumed to constitute the first ten moves of the game.
13 Encyclopedia of Chess Openings.
14 Tournaments from which these games are taken include the London Classic, World Championship Candidates, Moscow Open, Tata Steel, Tal Memorial, Chigorin Memorial, and the Geneva Chess Masters.
15 Each experiment of 300 games is considered to be a simple random sample of 300 games taken from a population of an infinite number of games in which all games are driven by the same unobservable probability vector.
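The derived columns of Table 3 follow directly from the win counts. The script below is an illustrative consistency check, not part of the paper's toolchain:

```python
# Win counts (white, black) per 300-game experiment, from Table 3.
data = {
    "E12QID": (30, 6),  "E43NID": (30, 7),  "D37QGD": (31, 3),
    "E61KID": (68, 3),  "E11BID": (29, 10), "E01CAT": (26, 5),
    "A81DUD": (48, 3),  "A52BUG": (63, 1),  "A45TVA": (7, 7),
    "A83DSG": (3, 20),
}
for opening, (w, b) in data.items():
    decisive = w + b            # decisive games = white wins + black wins
    draws = 300 - decisive      # the rest of the 300 games end in draws
    print(opening, decisive, draws, format(draws / 300, ".0%"))
```

The output reproduces the Decisive, Draw, and Pct Draw columns of Table 3 (e.g. 36, 264, 88% for the baseline).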
[Figure 1: Sample data in Cartesian coordinates. Number of games won by white (x-axis) plotted against number of games won by black (y-axis) for all ten openings.]

The Queen's Indian Defense is our baseline opening to which the other openings will be compared, so what we are interested in is the distance and direction of each of the test openings from the Queen's Indian. These distances and their directions are visualized more easily if we move the axis of the plot to the Queen's Indian and set it as the (0,0) point. This visualization of the distance vectors in our study is shown in Figure 2. An example distance computation is shown in Table 4. All the observed distances are tabulated in Table 5 along with the estimated standard deviation and the hypothesis tests for distance.

Table 4: Example distance computation

                    White  Black
Dutch Defense       48     3
Queen's Indian      30     6
Difference          18     -3
Squared difference  324    9
Squared distance (sum of squared differences):         333
Euclidean distance (square root of squared distance):  18.25

[Figure 2: Visualization of the distance and direction of the test openings from the baseline, with the Queen's Indian Defense at the origin.]

Table 5: Hypothesis test for distance

ID      Name                     distance  stdev16  t-value  p-value  Result
E43NID  Nimzo Indian Defense     1.00      5.85     0.171    8.6E-01  ---
D37QGD  Queen's Gambit Declined  3.16      5.68     0.557    5.8E-01  ---
E61KID  King's Indian Defense    38.12     6.64     5.737    1.1E-08  Reject Ho
E11BID  Bogo Indian Defense      4.12      5.88     0.701    4.8E-01  ---
E01CAT  Catalan Opening          4.12      5.61     0.734    4.6E-01  ---
A81DUD  Dutch Defense            18.25     6.17     2.956    3.2E-03  ---
A52BUG  Budapest Gambit          33.38     6.52     5.120    3.3E-07  Reject Ho
A45TVA  Trompovsky Attack        23.02     4.89     4.712    2.6E-06  Reject Ho
A83DSG  Staunton Gambit          30.41     5.34     5.692    1.4E-08  Reject Ho

The hypothesis tests in Table 5 show that the observed distances of five of the test openings from the baseline opening are small enough to have been the result of sampling variation. In these cases we do not reject the null hypothesis Ho that δ=0 and conclude that the evidence does not show that the opening innovation has changed the probability vector. These test openings are therefore classified as Category C, benign innovation. The data are consistent with the hypothesis that the probability vector that generates game outcomes in these test openings is not different from that which generates game outcomes in the baseline Queen's Indian Defense, that is, π(test opening) = π(baseline opening).

In the remaining four test openings, marked in Table 5 as "Reject Ho", we find that the observed sample distance is too large to be explained by sampling variation alone. In these cases we reject the Ho hypothesis and conclude that δ≠0 and that therefore the test opening innovation has changed the probability vector so that π(test opening) ≠ π(baseline opening). To classify these openings we must examine the direction of the change to determine whether the change in π favors the innovator, whether it favors the opponent, or whether the change is in a neutral direction and does not favor either party. The direction information for these four test openings is shown in Table 6.

16 The term stdev refers to the standard deviation of distance; its value is estimated by using a Monte Carlo simulation procedure. The computational details are available in the online data archive for this paper (Munshi, Numerical analysis, 2014).
17 The comparison of each test opening with the baseline opening is shown graphically in the Appendix.
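The Table 4 computation can be reproduced in a few lines (an illustrative check, using the win counts from Table 3):

```python
import math

dutch = (48, 3)          # (white wins, black wins) for the Dutch Defense
queens_indian = (30, 6)  # baseline

# Sum of squared differences over the two coordinates, then the square root.
squared = sum((a - b) ** 2 for a, b in zip(dutch, queens_indian))
distance = math.sqrt(squared)
print(squared, round(distance, 2))  # 333 18.25
```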
Table 6: Classification of distant test openings according to direction

ID      Name                   distance  angle  quadrant  favors  innovator  Category
E61KID  King's Indian Defense  23.36     356    4         white   black      F
A52BUG  Budapest Gambit        23.50     352    4         white   black      F
A45TVA  Trompovsky Attack      25.15     178    2         black   white      F
A83DSG  Staunton Gambit        24.66     161    2         black   white      F

What we see in Table 6 is that none of the test openings that changed the π vector gained from the change, and also that none of these changes are in a neutral direction. All of these innovations are detrimental to the innovator and therefore all of them are classified as Category F, failed innovation. The information in Table 6 is presented visually in Figure 2, where one can see clearly that the Category F innovations by white decreased white's chance of winning, increased black's chance of winning, or did both. Likewise, Figure 2 also shows that the Category F innovations by black decreased black's chance of winning, increased white's chance of winning, or did both.

We now summarize our findings in Table 7. The table shows our final classification of all the test openings in view of the data we collected from our controlled engine experiments and their analysis as presented above. As noted in the table, none of the openings tested was a successful innovation and none of the changes in the probability vector occurred in a neutral direction.
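The direction test of Table 2 can be sketched as follows. This is an illustrative reconstruction; the angle convention (degrees counterclockwise from the positive white-wins axis) is an assumption, chosen to be consistent with the quadrants reported in Table 6.

```python
import math

def classify(test, baseline, innovator):
    """Classify a test opening whose distance test rejected Ho (delta != 0),
    by the quadrant of the change vector and the identity of the innovator,
    following Table 2. Returns (quadrant, category)."""
    dx = test[0] - baseline[0]   # change in white wins
    dy = test[1] - baseline[1]   # change in black wins
    angle = math.degrees(math.atan2(dy, dx)) % 360
    quadrant = int(angle // 90) + 1
    table2 = {"White": {1: "C", 2: "F", 3: "C", 4: "A"},
              "Black": {1: "C", 2: "A", 3: "C", 4: "F"}}
    return quadrant, table2[innovator][quadrant]

# King's Indian Defense: black's innovation, observed (68, 3) vs baseline (30, 6).
q, cat = classify((68, 3), (30, 6), "Black")
print(q, cat)  # 4 F
```

The change vector for the King's Indian Defense points into the fourth quadrant (more white wins, fewer black wins), which favors white; since black is the innovator, the opening is classified F, in agreement with Table 6.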
Table 7: Summary of findings

ID      Name                     Category  Reason for the classification
E43NID  Nimzo Indian Defense     C         Observed distance explained by sampling variation
D37QGD  Queen's Gambit Declined  C         Observed distance explained by sampling variation
E61KID  King's Indian Defense    F         Probability vector changed in favor of the opponent
E11BID  Bogo Indian Defense      C         Observed distance explained by sampling variation
E01CAT  Catalan Opening          C         Observed distance explained by sampling variation
A81DUD  Dutch Defense            C         Observed distance explained by sampling variation
A52BUG  Budapest Gambit          F         Probability vector changed in favor of the opponent
A45TVA  Trompovsky Attack        F         Probability vector changed in favor of the opponent
A83DSG  Staunton Gambit          F         Probability vector changed in favor of the opponent
NONE    ----                     A         Probability vector changed in favor of the innovator
NONE    ----                     C         Probability vector changed in a neutral direction

5. CONCLUSIONS

Engine experiments carried out under controlled conditions show that the Queen's Indian Defense, chosen as the baseline opening in this study, may be considered neutral and perfect in this category of openings because none of the nine innovations tested offered any advantage to the innovator18. Of the nine test openings, five were found to be benign innovations as they had no measurable effect on the underlying and unobservable probability vector that determines chess game outcomes under the baseline conditions. The observed distances of these openings from the baseline may be explained in terms of sampling variation19. Two members of this group, the Dutch Defense, an innovation by black, and the Catalan Opening, an innovation by white, are noteworthy in that our findings are inconsistent with their rarity of play and supportive of their positive evaluation by analysts who have studied these openings (Bologan, 2012) (Harding, 2010) (Kelley, 2005) (Kelley, Catalan, 2008).
The other three members of this group, the Nimzo Indian Defense, the Queen's Gambit Declined, and the Bogo Indian Defense, are universally considered to be strong openings (Dearin, 2005) (Sielecki, 2014) comparable with the Queen's Indian Defense, and our findings, along with their popularity in the opening book, support this view.

In four of the test openings, the evidence indicates that the opening innovation changed the probability vector. This means that the probability vector that generates game outcomes in these openings is not the same as that which generates game outcomes in the baseline Queen's Indian Defense. In all four cases the change in the probability vector goes against the innovator, and so they are classified as failed innovations. The most significant member of this group is the King's Indian Defense, an opening that is at once a popular line in the opening book database (Jones & Powell, 2014) and also viewed in a positive light by many analysts (Kelley, Kings Indian, 2008) (Gserper, 2010) (Golubov, 2006). Our experiment shows that it is a failed innovation by black. Some analysts agree (Semkov, 2009) (Hansen, 2009). The other three failed innovations are less controversial because our findings are consistent with general opinion and the opening book. The Staunton Gambit, the Trompovsky Attack, and the Budapest Gambit are played very rarely in high-level games according to the opening books (Jones & Powell, 2014), and analysts generally hold a negative opinion of these innovations (Dzindzichashvili, 2009) (Schiller, 1993) (Prie, 2009).

The motivation for this study is not so much to pass judgment on specific opening lines but rather to develop and refine an objective methodology for the evaluation of chess openings in general.
We recognize that at a sufficiently high move imperfection rate (IMP), chess games would be decided mostly by move errors, and in those games the effect of the failed opening innovations noted in this study may not be directly relevant. Yet the relative merit of opening lines is of great interest to the chess community, and its evaluation has a practical application for designers of opening books.

18 A comparison of the Queen's Indian Defense with the Sicilian mainline in the Appendix further supports its selection as a neutral and perfect baseline against which other queen's pawn openings may be compared.
19 The term "sampling variation" refers to the difference among samples taken from the same population.

6. APPENDIX

6.1 Comparison of the playing strength of the engines. One of the test opening experiments had a result very similar to that of the baseline opening. To compare engine strengths at the same sample size used for comparing openings (n=300), we combine these two experiments into a large sample of 600 games as shown below.

Opening                     Games played  White won  Black won
E12 Queens Indian Defense   300           30         6
E43 Nimzo Indian Defense    300           30         7
Combined E12 + E43          600           60         13

Of the 600 games, each engine played 300 games as white and 300 as black. We now count for each engine the number of games won as white and the number won as black. These data allow us to set up the comparison of the engines as follows.

Engine       Games played as white  Won as white  Games played as black  Won as black
Houdini3Pro  300                    33            300                    4
Houdini4Pro  300                    27            300                    9

These results, plotted in Cartesian coordinates, show a Euclidean distance between them of 7.8102. As in the evaluation of openings, we create a simulated sampling distribution and estimate the variation of the distance that we can expect from one sample to the next.
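The observed distance and t-statistic for the engine comparison follow directly from these counts (an illustrative check; the standard deviation used here is the paper's reported Monte Carlo estimate, not recomputed):

```python
import math

houdini3 = (33, 4)   # (won as white, won as black) over 300 games each color
houdini4 = (27, 9)

distance = math.dist(houdini3, houdini4)  # Euclidean distance between engines
stdev = 5.8151                            # Monte Carlo estimate from the paper
t = distance / stdev
print(round(distance, 4), round(t, 4))    # 7.8102 1.3431
```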
We can now set up a hypothesis test for distance as follows:

Ho: δ = 0
Ha: δ ≠ 0

The data are tabulated below and shown graphically in Figure 3.

Observed distance                                                 7.8102
Standard deviation of the sampling distribution of distances      5.8151
Value of the t-statistic for Ho: δ=0                              1.3431
Probability of observing a t-value this large or larger if δ=0    0.1794
Probability value that serves as our threshold of disbelief       0.001

The test shows that the observed distance is one we would expect to observe as sampling error even when the true value of the distance is zero. Therefore we fail to reject the Ho statement that δ=0 and conclude that the evidence does not show a difference in playing strength between the two engines under the experimental conditions used in this study.

[Figure 3: Houdini4Pro -vs- Houdini3Pro. Games won as white (x-axis) plotted against games won as black (y-axis) for each engine.]

6.2 Comparison of two baseline openings. A baseline opening used in a previous study (Munshi, Comparing Chess Openings, 2014) and the one used in this study are compared. The sample data from the DED experiments are as follows20:

Opening                     Games played  White won  Black won
E12 Queens Indian Defense   300           30         6
B53 Sicilian Defense        300           19         3

The Euclidean distance between these results in Cartesian coordinates is 11.402, and the Queen's Indian Defense projects at an angle of 14 degrees from the Sicilian Defense. The standard deviation of distance is estimated using Monte Carlo simulation to be 5.3208. Using the t-distribution we find that the probability of observing a sample distance of 11.402 or greater under these conditions is 0.0322, much larger than our threshold of α=0.001. We fail to reject Ho that the true distance δ=0, i.e., that the two probability vectors are the same.
In any event, even if the observed distance were larger and we had rejected Ho, we would have to consider that the angle lies in the first quadrant and that therefore the large distance would only indicate a difference in the probability of decisive games and not necessarily a relative advantage to either white or black. The comparison is shown graphically in Figure 4.

20 The three-move sequences used in the test are: B53 Sicilian Defense = e4c5 Nf3d6 d4cxd4 and E12 Queen's Indian Defense = d4Nf6 c4e6 Nf3b6.

[Figure 4: E12 Queen's Indian Defense compared with B53 Sicilian Defense21. Number of games won by white (x-axis) against number of games won by black (y-axis).]

6.3 Transpositions. The ECO codes used to identify the baseline and test openings apply to the first three moves that were fixed for each experiment. Engine calculations began with the fourth move, and the engine moves often caused transpositions to different ECO designations. All such transpositions are noted in Table 8 along with references to recent grandmaster games.
ID       Transpositions                            Grandmaster games
E12QID   E12 Queen's Indian/Petrosian Variation    (Carlsen-Karjakin, 2012)
         E14 Queen's Indian/Classical Variation    (Kramnik-Pelletier, 2013)
         E16 Queen's Indian/Classical Variation    (Gelfand-Gashimov, 2012)
E43NID   E43 Nimzo Indian/Nimzowitsch Variation    (Radjabov-Leitao, 2001)
         E43 Nimzo Indian/Nimzowitsch Variation    (Gelfand-Grischuk, 2012)
         E47 Nimzo Indian/Mainline                 (Topalov-Kramnik, 2014)
D37QGD   D37 Queen's Gambit Declined               (Carlsen-Radjabov, 2014)
E61KID   E70 King's Indian Defense                 (Aronian-Radjabov, 2013)
         E71 King's Indian Defense                 (Aronian-Carlsen, 2013)
         E90 King's Indian Defense                 (Nakamura-Morozevich, 2013)
         E91 King's Indian Defense                 (Caruana-Short, 2013)
E11BID   E11 Bogo Indian Defense                   (Radjabov-Karjakin, 2012)
         E15 Queen's Indian/Classical Variation    (Bratteteig-Pleninger, 2012)
         E16 Queen's Indian/Classical Variation    (Kramnik-Leko, 2012)
E01CAT   E01 Catalan Opening                       (Andreikin-Mamedyarov, 2014)
         E04 Catalan Opening                       (Caruana-Karjakin, 2012)
         E06 Catalan Opening
         E11 Bogo Indian Defense                   (Cramling-Kosteniuk, 2000)
         E16 Queen's Indian/Classical Variation    (Gupta-Carr, 2013)
A81DUD   A81 Dutch Defense                         (Aronian-Nakamura, 2012)
         A87 Dutch/Leningrad Variation             (Phillips-Moloney, 2012)
         A88 Dutch/Leningrad Variation
         A89 Dutch/Leningrad Variation
A52BUG   A52 Budapest Gambit                       (Gelfand-Rapport, 2014)
A45TVA   A45 Trompovsky Attack                     (Rapport-Aronian, 2014)

Table 8 Transpositions

21 In all such graphs the baseline is shown in blue with square markers and the test opening is shown in red with diamond shaped markers.

6.4 Visualization of the hypothesis tests in Tables 5 and 6.
[Figures: each test opening compared with the baseline, games won by white plotted against games won by black: E12 QID vs E43 NID; E12 QID vs D37 QGD; E12 QID vs E61 KID; E12 QID vs E11 BID; E12 QID vs E01 CAT; E12 QID vs A81 DUD; E12 QID vs A52 BUG; E12 QID vs A45 TVA; E12 QID vs A83 DSG.]

7. REFERENCES

Andreikin-Mamedyarov. (2014). World Chess Championship Candidates. Retrieved 2014, from chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1751553
Aronian-Carlsen. (2013). Tata Steel. Retrieved 2014, from chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1704703
Aronian-Nakamura. (2012). Tal Memorial. Retrieved 2014, from chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1669174
Aronian-Radjabov. (2013). World Championship Candidates. Retrieved 2014, from chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1714072
Bologan, V. (2012). The Powerful Catalan. New in Chess.
Bratteteig-Pleninger. (2012). London Classic. Retrieved 2014, from chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1700479
Carlsen-Karjakin. (2012). Tata Steel.
Retrieved 2014, from chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1654399
Carlsen-Radjabov. (2014). Gashimov Memorial. Retrieved 2014, from chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1753385
Caruana-Karjakin. (2012). Tata Steel. Retrieved 2014, from chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1744284
Caruana-Short. (2013). London Chess Classic. Retrieved 2014, from chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1741071
Cramling-Kosteniuk. (2000). WCC. Retrieved 2014, from chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1258901
Dearin, E. (2005). Play the Nimzo Indian. Everyman Chess.
Dzindzichashvili, R. (2009). Budapest gambit. Retrieved 2014, from Youtube: http://www.youtube.com/watch?v=-ShFKpTkL9Q
Ferreira, D. (2013). The impact of search depth on chess playing strength. Retrieved 2014, from Instituto Superior Tecnico: http://web.ist.utl.pt/diogo.ferreira/papers/ferreira13impact.pdf
Gelfand-Gashimov. (2012). Tata Steel. Retrieved 2014, from chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1654442
Gelfand-Grischuk. (2012). World Rapid Championship. Retrieved 2014, from chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1671296
Gelfand-Rapport. (2014). Tata Steel. Retrieved 2014, from chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1744267
Golubov, M. (2006). Understanding the King's Gambit. Gambit Publications.
Gserper, G. (2010). King's Indian Defense. Retrieved 2014, from chess.com: http://www.chess.com/article/view/openings-for-tactical-players-kings-indian-defense
Gupta-Carr. (2013). London Classic. Retrieved 2014, from chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1740547
Hansen, C. (2009). Checkpoint. Retrieved 2014, from Chesscafe.com: http://www.chesscafe.com/text/hansen125.pdf
Harding, T. (2010). Play the Dutch.
Retrieved 2014, from chesscafe.com: http://www.chesscafe.com/text/kibitz175.pdf
Houdart, R. (2012). Houdini. Retrieved November 2013, from cruxis.com: http://www.cruxis.com/chess/houdini.htm
Houdart, R. (2013). Houdini Chess. Retrieved 2014, from cruxis.com: http://www.cruxis.com/chess/houdini.htm
Johnson, V. E. (2013, November). Revised Standards for Statistical Evidence. Retrieved December 2013, from Proceedings of the National Academy of Sciences: http://www.pnas.org/content/110/48/19313.full
Jones, R., & Powell, D. (2014). Game Database. Retrieved February 2014, from chesstempo.com: http://chesstempo.com/game-database.html
Kelley, D. (2008). Catalan. Retrieved 2014, from chessopenings.com: http://chessopenings.com/catalan/
Kelley, D. (2005). Dutch. Retrieved 2014, from chessopenings.com: http://chessopenings.com/dutch/
Kelley, D. (2008). Kings Indian. Retrieved 2014, from chessopenings.com: http://chessopenings.com/kings+indian/
Kramnik-Leko. (2012). Dortmund. Retrieved 2014, from chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1672566
Kramnik-Pelletier. (2013). Geneva Chess Masters. Retrieved 2014, from chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1722084
Meyer-Kahlen, S. (2007). Deep Shredder. Retrieved 2014, from Shredderchess.com: http://www.shredderchess.com/chess-software/deep-shredder12.html
Meyer-Kahlen, S. (2000). Opening Database. Retrieved January 2014, from Shredder Chess: http://www.shredderchess.com/online-chess/online-databases/opening-database.html
Munshi, J. (2014). A Method for Comparing Chess Openings. Retrieved 2014, from SSRN: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2415203
Munshi, J. (2014). Comparing Chess Openings. Retrieved 2014, from SSRN: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2427542
Munshi, J. (2014). Numerical analysis. Retrieved 2014, from Dropbox: https://www.dropbox.com/sh/c7ze8c64ukpf525/AADAuuDx_mxXU6BVH-6GoF2ka
Munshi, J.
(2014). PGN Files. Retrieved 2014, from Dropbox: https://www.dropbox.com/sh/nj14ur3cucew5xo/AABaz_LViWNpRTSaBKlUWmj5a
Nakamura-Morozevich. (2013). FIDE Grand Prix Zug. Retrieved 2014, from chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1716006
Phillips-Moloney. (2012). London Classic. Retrieved 2014, from chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1700347
Prie, E. (2009). d-pawn specials. Retrieved 2014, from chesspublishing.com: http://www.chesspublishing.com/content/8/jul09.htm
Radjabov-Karjakin. (2012). Tata Steel. Retrieved 2014, from chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1654273
Radjabov-Leitao. (2001). E43 Nimzo Indian. Retrieved 2014, from chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1242308
Rapport-Aronian. (2014). Tata Steel. Retrieved 2014, from chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1744275
Schiller, E. (1993). How to play against the Staunton Gambit. Chess Digest.
Semkov, S. (2009). Kill KID Vol. 1. New in Chess.
Sielecki, C. (2014). Nimzo and Bogo Indian. Everyman Chess.
Topalov-Kramnik. (2014). World Chess Championships Candidates. Retrieved 2014, from chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1751385
Wikipedia. (2014). Houdini Chess. Retrieved 2014, from Wikipedia: http://en.wikipedia.org/wiki/Houdini_(chess)
Wikipedia. (2014). Monte Carlo Simulation. Retrieved 2014, from Wikipedia: http://en.wikipedia.org/wiki/Monte_Carlo_method
The second paper showed that there may have been a propensity for engine bias in the SED and that the engine bias problem is mitigated by the DED, which forces the engines to play a greater number of variations. This paper describes a further test of the DED using a new set of openings. The motivation for this study is that conventional methods of evaluating chess openings are inadequate. Grandmaster opinions are subjective and inconsistent, while the win-loss-draw statistics in opening book databases are field data that were not taken under controlled conditions and are therefore confounded by intervening variables that have a greater effect on game outcomes than the opening (Munshi, A Method for Comparing Chess Openings, 2014). As a result, there are conflicting opinions on the merit of the different lines in the opening book and these opinions have engendered ongoing debates that have no satisfactory conclusion. It is proposed that an objective method for evaluating openings will settle these issues and help to refine the opening book.

1 Date: May 2014. Key words and phrases: chess openings, chess engines, refutation, methodology, Monte Carlo simulation, numerical methods, probability vector, Euclidean distance, robust statistics, bootstrap. Author affiliation: Professor Emeritus, Sonoma State University, Rohnert Park, CA, 94928, [email protected]

2. THEORY

Chess games may be thought of as a stochastic trinomial process driven by an unknown and unobservable underlying probability vector given by

π = [pw, pb, pd]     (Equation 1)

where π is a vector with two degrees of freedom, pw is the probability that white will win, pb is the probability that black will win, and pd = 1-pw-pb is the probability that the game will end in a draw.
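The trinomial process of Equation 1 is straightforward to simulate. The sketch below is illustrative only (the function name and the values of pw and pb are my own, chosen to resemble the baseline sample estimates reported later in the paper); it draws one 300-game sample from a fixed π and tallies the win counts that locate the sample in the (white wins, black wins) plane:

```python
import random

def simulate_sample(pw, pb, n_games=300, seed=1):
    """Draw n_games outcomes from pi = [pw, pb, 1 - pw - pb] and
    return the (white wins, black wins) counts for the sample."""
    rng = random.Random(seed)
    white = black = 0
    for _ in range(n_games):
        u = rng.random()
        if u < pw:
            white += 1          # white win
        elif u < pw + pb:
            black += 1          # black win
        # otherwise the game is a draw
    return white, black

# Hypothetical probability vector: pw=0.10, pb=0.02, pd=0.88
w, b = simulate_sample(pw=0.10, pb=0.02)
print(w, b)
```

Repeating such draws with the same π shows how much the observed (white, black) point can move under sampling variation alone, which is the basis of the hypothesis tests that follow.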
The components of the probability vector π are determined by (1) white's first move advantage or FMA, (2) the general rate of imperfection in the moves or IMP, (3) the difference in playing strength between the player making white moves and the player making black moves, or DIFF, and (4) the opening employed (Munshi, A Method for Comparing Chess Openings, 2014). The value of FMA is not known but we know that it is a universal constant and we suspect that its effect is relatively small. Experiments designed to measure the effect of the opening must therefore control the values of IMP and DIFF so that the opening effect can be observed. Our hypothesis is that the choice of opening line played can change π and that therefore chess engine experiments under these controlled conditions may be used to detect the effect of openings on the probability vector π. In comparing two openings, opening-1 and opening-2, our research question and hypotheses are set up as follows:

1. Research question: Is π1=π2?
2. Null hypothesis Ho: π1=π2
3. Alternate hypothesis Ha: π1≠π2

A testable implication of this hypothesis is that if the true (and unknown) population mean results are plotted in Cartesian coordinates with x=number of wins by white and y=number of wins by black, and the Euclidean distance between opening-1 and opening-2 is computed and designated as δ, then we may write the testable hypotheses as:

Ho: δ=0
Ha: δ≠0

If we fail to reject Ho in this test, we immediately reach the conclusion that the evidence does not show that the probability vector π is changed by using opening-2 instead of opening-1. If we reject Ho, however, we know that the probability vector changed but we still don't know the direction of the change. Further tests are necessary to determine whether the change favors white, whether it favors black, or whether the change is in a neutral direction and favors neither black nor white.

3. METHODOLOGY

3.1 Baseline and test openings. Well recognized and established opening book databases (Meyer-Kahlen, 2000) (Jones & Powell, 2014) are used to select the first three moves (first six half moves) from ten openings that begin with the queen's pawn move 1. d4. The opening sequence in this category most used by grandmasters is the Queen's Indian Defense (Meyer-Kahlen, 2000)2 and it is identified as the baseline opening; the other nine opening sequences selected are described as test openings, or innovations. The proposed methodology for comparing chess openings is then used to compare each of the test openings with the baseline opening. The ten opening sequences selected for this study are shown in Table 1. The rarity data shown in the table refer to the frequency of the baseline relative to the test opening according to the opening database used for this purpose3 (Meyer-Kahlen, Opening Database, 2000). The nine test openings shown were selected to include a large rarity range and they are listed in the table from the most frequently played to the least. The first four test openings listed may be considered to be commonly used. The next two are not very common, and the last three are rarely played. The test openings selected are expected to represent a wide spectrum of possibilities in the Queen's pawn game.
ID       Name                      Fixed 3-move sequence   Rarity   Innovator
E12QID   Queen's Indian Defense    d4Nf6 c4e6 Nf3b6        -        -
E43NID   Nimzo Indian Defense      d4Nf6 c4e6 Nc3Bb4       1.0      White
D37QGD   Queen's Gambit Declined   d4Nf6 c4e6 Nf3d5        1.0      Black
E61KID   King's Indian Defense     d4Nf6 c4g6 Nc3Bg7       1.7      Black
E11BID   Bogo Indian Defense       d4Nf6 c4e6 Nf3Bb4+      2.6      Black
E01CAT   Catalan Opening           d4Nf6 c4e6 g3d5         7.1      White
A81DUD   Dutch Defense             d4f5 g3Nf6 Bg2g6        8.1      Black
A52BUG   Budapest Gambit           d4Nf6 c4e5 dxe5Ng4      23.3     Black
A45TVA   Trompovsky Attack         d4Nf6 Bg5Ne4 Bf4c5      25.2     White
A83DSG   Staunton Gambit           d4f5 e4fxe4 Nc3Nf6      108.2    White

Table 1 Baseline and test openings

3.2 Dual engine experimental design. The dual engine design (DED) described in a previous paper (Munshi, Comparing Chess Openings, 2014) is used to compare each test opening with the baseline using chess engine experiments. Each experiment consists of 300 games played between two chess engines. The engines selected are Houdini3Pro and Houdini4Pro (Houdart, 2013), generally regarded as the leaders in this kind of chess software (Wikipedia, 2014). All engine parameters are set to their default values. In each experiment, each of the two engines plays 150 games as white and 150 games as black. Every game of each experiment begins with the six half moves being evaluated. These move sequences are shown in Table 1. Engine calculations begin with the fourth move by white. The engine moves may cause transpositions of the opening into different opening designations than that by which it is identified in this paper and these transpositions are noted in the Appendix.

2 The identification of the "mainline" varies among databases. The selection of the baseline is therefore somewhat arbitrary since any of the first three openings listed could have been used as the mainline.
3 The rarity values differ among opening databases. They should be taken only in a very approximate sense.
The Deep Shredder chess GUI4 software (Meyer-Kahlen, Deep Shredder, 2007) is used to set up the engine matches. The search depth is fixed and set to 21 half moves for both engines, a level at which the engines are expected to play at the grandmaster level or better (Ferreira, 2013). The very high level of play is evident in the relatively low percentage of decisive games and a low estimated value of IMP. For example, in the baseline case, 12% of the games were decisive with the remaining 88% ending in draw. The estimated value of IMP, the rate of imperfection in the moves, is 2% as measured by the number of wins by black5. Also, a comparison of the playing strength of the two engines6 under the controlled experimental conditions of this study shows no evidence of a significant difference in playing strength (DIFF). These statistics are indicative of a very high level of play in which the effect of the opening is unlikely to be overcome by move imperfections (IMP) or by the difference in playing strength (DIFF). The relevant data recorded for each experiment are shown below. The opening variability is a count of the number of unique moves made by the engines from the fourth to the tenth move. It serves as a measure of the number of variations computed by the engines during the opening phase of the game (Munshi, Comparing Chess Openings, 2014).

1. White: the number of games won by white
2. Black: the number of games won by black
3. OV: the opening variability
4. Transpositions: whether the engine moves changed the opening designation

3.3 Comparing test openings against the baseline. We assume that game outcomes in the baseline opening are driven by an underlying, unknown, and unobservable probability vector π and if the opening innovation7 to be evaluated changes the vector π we will be able to observe the effect of this change in the data.
We then use the data to classify each test opening into one of three categories:

Category A: Successful innovation. The probability vector has been changed in favor of the innovator.
Category C: Benign innovation. The probability vector is either unchanged or it was changed in a neutral direction.
Category F: Failed innovation. The probability vector has been changed in favor of the opponent.

The test is carried out in stages. First we test to see if the Euclidean distance between the baseline opening result and the test opening result in the population from which our sample was taken is greater than zero. The hypotheses for this test are:

Ho: δ=0
Ha: δ≠0

We set the probability value for our level of disbelief at α=0.001 as suggested by Valen Johnson, who has studied the relationship between the α level and the irreproducibility of results and found that the higher values of α such as α=0.05 or α=0.01 normally used can lead to spurious findings (Johnson, 2013). If the probability of observing a sample distance as large or larger8 than the one being tested (i.e. the p-value) is greater than α we fail to reject Ho and conclude that it is possible that the observed distance is a result of sampling variation in a sample of 300 games taken from an unobservable population in which δ=0. In these cases we can immediately classify the test opening into Category C as a benign innovation because we have no evidence that the probability vector has been changed by the test opening.

4 Graphical User Interface
5 See (Munshi, A Method for Comparing Chess Openings, 2014) for a detailed explanation.
6 The comparison is shown in the Appendix.
7 The terms "test opening" and "opening innovation" are used interchangeably. It is assumed that any opening sequence that differs from the mainline is an innovation.
However, if the p-value is less than α, we conclude that δ≠0 and that the test opening has changed the probability vector, but we are unable to classify the test opening until we determine the direction of the change. If the direction is well within the first or third quadrant, it is possible that the change is in a neutral direction and therefore we can classify the opening as Category C, benign innovation. This finding implies that the effect of the opening was only a change in the probability of decisive games, with pw and pb changing proportionately and neither color gaining an advantage due to the opening innovation. If the direction is in the second or fourth quadrant, then the relative values of pw and pb have changed and one color has gained an advantage over the other. In this case we can classify the opening as either Category A or Category F according to whether the change favors the innovator or the opponent. The possibilities are shown in Table 2.

Innovator   Quadrant 1   Quadrant 2   Quadrant 3   Quadrant 4
White       C            F            C            A
Black       C            A            C            F

Table 2 Classification according to innovator and direction (δ≠0)

3.4 Monte Carlo simulation. As in the previous papers we use a Monte Carlo numerical technique to create a simulated sampling distribution from which we derive a measure of variance that we use in our hypothesis test for distance (Munshi, A Method for Comparing Chess Openings, 2014) (Wikipedia, 2014). The sample data are used to estimate π=[pw,pb,pd] and these estimates are used to generate one thousand simulated replications of the experiment. For each opening, we compute the squared Euclidean distance of each simulated replication from the mean9. Thus we have one thousand squared distances for each opening. When comparing two openings we have two thousand squared distances from their respective means. These squared distances are used to estimate what may be termed the "within treatment" variance of distance10.
This variance serves as a measure of how different the sample results can be when taking samples of 300 games from the same population with a fixed value of π. This measure of variance can then be used to compute the probability of observing distances "between treatments11" greater than or equal to the observed distance if Ho is true and δ=0. This probability serves as the basis of our hypothesis test.

8 Since the distance is computed as a square root it can be either positive or negative and therefore this is a two tailed test. The reference to the magnitude of the distance as being "large or larger" refers to its absolute value.
9 By definition, the mean is represented by the sample data that were used to estimate π.
10 In Fisher's terminology each opening is a treatment.
11 Between a test opening and the baseline opening.

4. DATA ANALYSIS

The raw data from ten experiments of 300 games each are shown in Table 3. The essential data are the number of games won by white (White), the number of games won by black (Black), and the opening variability (OV). The OV data serve as a measure of the number of different variations played by the engines in the "opening phase12" of the game after the first three moves specified and fixed for each experiment are exhausted. These variations often cause the ECO designation13 to change. All such transpositions are listed in the Appendix along with references to high profile grandmaster games14 for each ECO designation played by the engines. All three thousand games played are available in PGN format in the online data archive for this paper (Munshi, PGN Files, 2014).
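The Monte Carlo procedure of section 3.4 can be sketched as follows. This is one minimal reading of the method, not the author's code, and the function names are my own: for one opening it replicates the 300-game experiment from the estimated probability vector, averages the squared distances of the replications from the mean, and pools the two openings' within-treatment variances to obtain the standard deviation used in the distance test:

```python
import math
import random

def within_sd(white, black, n_games=300, reps=1000, seed=7):
    """Monte Carlo estimate of the within-treatment standard deviation of
    distance for one opening whose sample gave (white, black) wins."""
    rng = random.Random(seed)
    pw, pb = white / n_games, black / n_games
    sq_dists = []
    for _ in range(reps):
        w = b = 0
        for _ in range(n_games):
            u = rng.random()
            if u < pw:
                w += 1
            elif u < pw + pb:
                b += 1
        # squared Euclidean distance of this replication from the mean,
        # where the mean is the sample point used to estimate pi
        sq_dists.append((w - white) ** 2 + (b - black) ** 2)
    return math.sqrt(sum(sq_dists) / reps)

def pooled_sd(sd1, sd2):
    """Pool the two within-treatment variances (the 2000 squared distances)."""
    return math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)

sd_baseline = within_sd(30, 6)   # baseline E12 QID sample from Table 3
print(round(sd_baseline, 2))
```

For the baseline sample this yields a value in the vicinity of the stdev figures reported in Table 5, which is consistent with the binomial variances implied by the estimated π.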
ID       Name                      OV     White   Black   Decisive   Draw   Pct Draw
E12QID   Queen's Indian Defense    999    30      6       36         264    88%
E43NID   Nimzo Indian Defense      692    30      7       37         263    88%
D37QGD   Queen's Gambit Declined   821    31      3       34         266    89%
E61KID   King's Indian Defense     560    68      3       71         229    76%
E11BID   Bogo Indian Defense       940    29      10      39         261    87%
E01CAT   Catalan Opening           1002   26      5       31         269    90%
A81DUD   Dutch Defense             914    48      3       51         249    83%
A52BUG   Budapest Gambit           403    63      1       64         236    79%
A45TVA   Trompovsky Attack         603    7       7       14         286    95%
A83DSG   Staunton Gambit           419    3       20      23         277    92%

Table 3 Observed sample data

4.1 Hypothesis test for distance. We can now use the sample data15 to compute the Euclidean distance of each test opening from the baseline opening. The distance may be visualized in Cartesian coordinates where the x-axis represents the number of white wins and the y-axis represents the number of black wins in any given simple random sample of 300 games. Each point in this x-y space represents a sample in our study. In the population of all possible games each point in this space represents a unique chess game and its probability vector. The data in Table 3 are shown in this format in Figure 1.

12 Arbitrarily assumed to constitute the first ten moves of the game.
13 Encyclopedia of Chess Openings.
14 Tournaments from which these games are taken include the London Classic, World Championship Candidates, Moscow Open, Tata Steel, Tal Memorial, Chigorin Memorial, and the Geneva Chess Masters.
15 Each experiment of 300 games is considered to be a simple random sample of 300 games taken from a population of an infinite number of games in which all games are driven by the same unobservable probability vector.
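The derived columns of Table 3 follow directly from the win counts. The sketch below (illustrative only) recomputes Decisive, Draw, and Pct Draw from the White and Black columns:

```python
# (ID, white wins, black wins) from Table 3; each experiment has 300 games
results = [
    ("E12QID", 30, 6), ("E43NID", 30, 7), ("D37QGD", 31, 3),
    ("E61KID", 68, 3), ("E11BID", 29, 10), ("E01CAT", 26, 5),
    ("A81DUD", 48, 3), ("A52BUG", 63, 1), ("A45TVA", 7, 7),
    ("A83DSG", 3, 20),
]

table = {}
for oid, w, b in results:
    decisive = w + b                      # decisive games
    draws = 300 - decisive                # drawn games
    pct_draw = round(100 * draws / 300)   # percent of games drawn
    table[oid] = (decisive, draws, pct_draw)

print(table["E12QID"])   # → (36, 264, 88)
```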
[Figure 1: Sample data in Cartesian coordinates. Number of games won by white plotted against number of games won by black for the ten openings in Table 3.]

The Queen's Indian Defense is our baseline opening to which the other openings will be compared. So what we are interested in is the distance and direction of each of the test openings from the Queen's Indian. These distances and their directions are visualized more easily if we move the axis of the plot to the Queen's Indian and set it as the (0,0) point. This visualization of the distance vectors in our study is shown in Figure 2. An example distance computation is shown in Table 4. All the observed distances are tabulated in Table 5 along with the estimated standard deviation and the hypothesis tests for distance.

                                                       White   Black
Dutch Defense                                          48      3
Queen's Indian                                         30      6
Difference                                             18      -3
Squared difference                                     324     9
Squared distance = sum of squared differences          333
Euclidean distance = square root of squared distance   18.25

Table 4 Example distance computation

[Figure 2: Visualization of the distance and direction of the test openings from the baseline.]

ID       Name                      distance   stdev16   t-value   p-value   Result
E43NID   Nimzo Indian Defense      1.00       5.85      0.171     8.6E-01   -
D37QGD   Queen's Gambit Declined   3.16       5.68      0.557     5.8E-01   -
E61KID   King's Indian Defense     38.12      6.64      5.737     1.1E-08   Reject Ho
E11BID   Bogo Indian Defense       4.12       5.88      0.701     4.8E-01   -
E01CAT   Catalan Opening           4.12       5.61      0.734     4.6E-01   -
A81DUD   Dutch Defense             18.25      6.17      2.956     3.2E-03   -
A52BUG   Budapest Gambit           33.38      6.52      5.120     3.3E-07   Reject Ho
A45TVA   Trompovsky Attack         23.02      4.89      4.712     2.6E-06   Reject Ho
A83DSG   Staunton Gambit           30.41      5.34      5.692     1.4E-08   Reject Ho

Table 5 Hypothesis test for distance17

The hypothesis tests in Table 5 show that the observed distances of five of the test openings from the baseline opening are small enough to have been the result of sampling variation. In these cases we do not reject the null hypothesis Ho that δ=0 and conclude that the evidence does not show that the opening innovation has changed the probability vector. These test openings are therefore classified as Category C, benign innovation. The data are consistent with the hypothesis that the probability vector that generates game outcomes in these test openings is not different from that which generates game outcomes in the baseline Queen's Indian Defense, that is, π(test opening) = π(baseline opening).

In the remaining four test openings, marked in Table 5 as "Reject Ho", we find that the observed sample distance is too large to be explained by sampling variation alone. In these cases we reject the Ho hypothesis and conclude that δ≠0 and that therefore the test opening innovation has changed the probability vector so that π(test opening) ≠ π(baseline opening). To classify these openings we must examine the direction of the change to determine whether the change in π favors the innovator, or whether it favors the opponent, or whether the change is in a neutral direction and does not favor either party. The direction information for these four test openings is shown in Table 6.

16 The term stdev refers to the standard deviation of distance and its value is estimated by using a Monte Carlo simulation procedure. The computational details are available in the online data archive for this paper (Munshi, Numerical analysis, 2014).
17 The comparison of each test opening with the baseline opening is shown graphically in the Appendix.
ID       Name                    distance   angle   quadrant   favors   innovator   Category
E61KID   King's Indian Defense   23.36      356     4          white    black       F
A52BUG   Budapest Gambit         23.50      352     4          white    black       F
A45TVA   Trompovsky Attack       25.15      178     2          black    white       F
A83DSG   Staunton Gambit         24.66      161     2          black    white       F

Table 6 Classification of distant test openings according to direction

What we see in Table 6 is that none of the test openings that changed the π vector gained from the change, and also that none of these changes is in a neutral direction. All of these innovations are detrimental to the innovator and therefore all of them are classified as Category F, failed innovation. The information in Table 6 is presented visually in Figure 2, where one can see clearly that the Category F innovations by white decreased white's chance of winning or increased black's chance of winning, or did both. Likewise, Figure 2 also shows that the Category F innovations by black decreased black's chance of winning or increased white's chance of winning, or did both. We now summarize our findings in Table 7. The table shows our final classification of all the test openings in view of the data we collected from our controlled engine experiments and their analysis as presented above. As noted in the table, none of the openings tested was a successful innovation and none of the changes in the probability vector occurred in a neutral direction.
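The quadrant rule of Table 2 can be expressed as a small function. This is an illustrative sketch under my own naming, applied to the displacements implied by the win counts in Table 3, and it assumes that the distance test has already rejected δ=0:

```python
import math

def classify(d_white, d_black, innovator):
    """Classify a test opening from its displacement (d_white, d_black)
    relative to the baseline and the innovating side, per Table 2."""
    angle = math.degrees(math.atan2(d_black, d_white)) % 360
    quadrant = int(angle // 90) + 1
    if quadrant in (1, 3):
        return "C"    # neutral direction: benign innovation
    favors = "black" if quadrant == 2 else "white"
    return "A" if favors == innovator else "F"

# E61 KID vs the baseline (68-30 white wins, 3-6 black wins), black innovated
print(classify(38, -3, "black"))    # → F
# A83 DSG vs the baseline (3-30, 20-6), white innovated
print(classify(-27, 14, "white"))   # → F
```

Both examples land in quadrants that favor the opponent of the innovator, reproducing the Category F classifications of Table 6.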
ID      Name                     Category  Reason for the classification
E43NID  Nimzo Indian Defense     C         Observed distance explained by sampling variation
D37QGD  Queen's Gambit Declined  C         Observed distance explained by sampling variation
E61KID  King's Indian Defense    F         Probability vector changed in favor of the opponent
E11BID  Bogo Indian Defense      C         Observed distance explained by sampling variation
E01CAT  Catalan Opening          C         Observed distance explained by sampling variation
A81DUD  Dutch Defense            C         Observed distance explained by sampling variation
A52BUG  Budapest Gambit          F         Probability vector changed in favor of the opponent
A45TVA  Trompovsky Attack        F         Probability vector changed in favor of the opponent
A83DSG  Staunton Gambit          F         Probability vector changed in favor of the opponent
NONE    ----                     A         Probability vector changed in favor of the innovator
NONE    ----                     C         Probability vector changed in a neutral direction

Table 7 Summary of findings

5. CONCLUSIONS

Engine experiments carried out under controlled conditions show that the Queen's Indian Defense, chosen as the baseline opening in this study, may be considered neutral and perfect in this category of openings because none of the nine innovations tested offered any advantage to the innovator18. Of the nine test openings, five were found to be benign innovations: they had no measurable effect on the underlying and unobservable probability vector that determines chess game outcomes under the baseline conditions, and the observed distances of these openings from the baseline may be explained in terms of sampling variation19. Two members of this group, the Dutch Defense, an innovation by black, and the Catalan Opening, an innovation by white, are noteworthy in that our findings are inconsistent with their rarity of play and supportive of their positive evaluation by analysts who have studied these openings (Bologan, 2012) (Harding, 2010) (Kelley, 2005) (Kelley, Catalan, 2008).
The other three members of this group, the Nimzo Indian Defense, the Queen's Gambit Declined, and the Bogo Indian Defense, are universally considered to be strong openings (Dearin, 2005) (Sielecki, 2014), comparable with the Queen's Indian Defense, and our findings, along with their popularity in the opening book, support this view.

In four of the test openings, the evidence indicates that the opening innovation changed the probability vector. This means that the probability vector that generates game outcomes in these openings is not the same as that which generates game outcomes in the baseline Queen's Indian Defense. In all four cases the change in the probability vector goes against the innovator, and so they are classified as failed innovations. The most significant member of this group is the King's Indian Defense, an opening that is at once a popular line in the opening book database (Jones & Powell, 2014) and viewed in a positive light by many analysts (Kelley, Kings Indian, 2008) (Gserper, 2010) (Golubov, 2006). Our experiment shows that it is a failed innovation by black; some analysts agree (Semkov, 2009) (Hansen, 2009). The other three failed innovations are less controversial because our findings are consistent with general opinion and the opening book. The Staunton Gambit, the Trompovsky Attack, and the Budapest Gambit are played very rarely in high-level games according to the opening books (Jones & Powell, 2014), and analysts generally hold a negative opinion of these innovations (Dzindzichashvili, 2009) (Schiller, 1993) (Prie, 2009).

The motivation for this study is not so much to pass judgment on specific opening lines but rather to develop and refine an objective methodology for the evaluation of chess openings in general.
We recognize that at a sufficiently high move imperfection rate (IMP) chess games would be decided mostly by move errors, and in those games the disadvantage of the failed opening innovations noted in this study may not become apparent20. Yet the relative merit of opening lines is of great interest to the chess community, and its evaluation has a practical application for designers of opening books.

18 A comparison of the Queen's Indian Defense with the Sicilian mainline in the Appendix further supports its selection as a neutral and perfect baseline against which other queen's pawn openings may be compared.
19 The term "sampling variation" refers to the difference among samples taken from the same population.
20 The move imperfection rate in the baseline opening, as measured by the percentage of games won by black, is 13% in grandmaster games (Jones & Powell, 2014) compared with 2% in our engine experiments.

6. APPENDIX

6.1 Comparison of the playing strength of the engines.

One of the test opening experiments had a result very similar to that of the baseline opening. To compare engine strengths at the same sample size used for comparing openings (n=300), we combine these two experiments into a large sample of 600 games as shown below.

Opening                     Games played  White won  Black won
E12 Queen's Indian Defense  300           30         6
E43 Nimzo Indian Defense    300           30         7
Combined E12 + E43          600           60         13

Of the 600 games, each engine played 300 games as white and 300 as black. We now count for each engine the number of games won as white and the number won as black. These data allow us to set up the comparison of the engines as follows.

Engine       Games played as white  Won as white  Games played as black  Won as black
Houdini3Pro  300                    33            300                    4
Houdini4Pro  300                    27            300                    9

These results, plotted in Cartesian coordinates, show a Euclidean distance between them of 7.8102.
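The Euclidean distance quoted above is straightforward to verify from the two engine result points:

```python
import math

houdini3 = (33, 4)  # (games won as white, games won as black)
houdini4 = (27, 9)

distance = math.hypot(houdini3[0] - houdini4[0], houdini3[1] - houdini4[1])
print(round(distance, 4))  # prints 7.8102, i.e. sqrt(6**2 + 5**2) = sqrt(61)
```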
As in the evaluation of openings, we create a simulated sampling distribution and estimate the variation of the distance that we can expect from one sample to the next. We can now set up a hypothesis test for distance as follows: Ho: δ=0, Ha: δ≠0. The data are tabulated below and shown graphically in Figure 3.

Observed distance                                                7.8102
Standard deviation of the sampling distribution of distances     5.8151
Value of the t-statistic for Ho: δ=0                             1.3431
Probability of observing a t-value this large or larger if δ=0   0.1794
Probability value that serves as our threshold of disbelief      0.001

The test shows that the observed distance is one we could expect to observe as sampling error even when the true value of the distance is zero. Therefore we fail to reject the Ho statement that δ=0 and conclude that the evidence does not show a difference in playing strength between the two engines under the experimental conditions used in this study.

[Figure 3: Comparison of engine playing strength, Houdini4Pro -vs- Houdini3Pro (games won as white vs. games won as black).]

6.2 Comparison of two baseline openings.

A baseline opening used in a previous study (Munshi, Comparing Chess Openings, 2014) and the one used in this study are compared. The sample data from the DED experiments are as follows21:

Opening                     Games played  White won  Black won
E12 Queen's Indian Defense  300           30         6
B53 Sicilian Defense        300           19         3

The Euclidean distance between these results in Cartesian coordinates is 11.402, and the Queen's Indian Defense projects at an angle of 14 degrees from the Sicilian Defense. The standard deviation of distance is estimated using Monte Carlo simulation to be 5.3208. Using the t-distribution we find that the probability of observing a sample distance of 11.402 or greater under these conditions is 0.0322, much larger than our threshold of α=0.001.
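As a quick arithmetic check of the two hypothesis tests in this appendix, the t-statistic in each case is simply the observed distance divided by the simulated standard deviation of distance:

```python
# Engine comparison (Appendix 6.1): distance 7.8102, simulated sd 5.8151
t_engines = 7.8102 / 5.8151    # ~1.3431, matching the tabulated t-value

# Baseline comparison (Appendix 6.2): distance 11.402, simulated sd 5.3208
t_baselines = 11.402 / 5.3208  # ~2.1429, for which p = 0.0322 > alpha = 0.001
print(round(t_engines, 4), round(t_baselines, 4))
```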
We therefore fail to reject Ho that the true distance δ=0; the evidence does not show that the two probability vectors differ. In any event, even if the observed distance were larger and we had rejected Ho, we would have to consider that the angle lies in the first quadrant, and that therefore the large distance would only indicate a difference in the probability of decisive games and not necessarily a relative advantage to either white or black. The comparison is shown graphically in Figure 4.

21 The three-move sequences used in the test are: B53 Sicilian Defense = 1. e4 c5 2. Nf3 d6 3. d4 cxd4, and E12 Queen's Indian Defense = 1. d4 Nf6 2. c4 e6 3. Nf3 b6.

[Figure 4: E12 Queen's Indian Defense compared with B53 Sicilian Defense (number of games won by white vs. number of games won by black)22.]

6.3 Transpositions.

The ECO codes used to identify the baseline and test openings apply to the first three moves that were fixed for each experiment. Engine calculations began at the fourth move, and the engine moves often caused transpositions to different ECO designations. All such transpositions are noted in Table 8 along with references to recent grandmaster games.
ID      Transpositions
E12QID  E12 Queen's Indian/Petrosian Variation; E14 Queen's Indian/Classical Variation; E16 Queen's Indian/Classical Variation
E43NID  E43 Nimzo Indian/Nimzowitsch Variation; E47 Nimzo Indian/Mainline
D37QGD  D37 Queen's Gambit Declined
E61KID  E70, E71, E90, E91 Kings Indian Defense
E11BID  E11 Bogo Indian Defense; E15 Queen's Indian/Classical Variation; E16 Queen's Indian/Classical Variation
E01CAT  E01, E04, E06 Catalan Opening; E11 Bogo Indian Defense; E16 Queen's Indian/Classical Variation
A81DUD  A81 Dutch Defense; A87, A88, A89 Dutch/Leningrad Variation
A52BUG  A52 Budapest Gambit
A45TVA  A45 Trompovsky Attack

Grandmaster games: (Carlsen-Karjakin, 2012) (Kramnik-Pelletier, 2013) (Gelfand-Gashimov, 2012) (Radjabov-Leitao, 2001) (Gelfand-Grischuk, 2012) (Topalov-Kramnik, 2014) (Carlsen-Radjabov, 2014) (Aronian-Radjabov, 2013) (Aronian-Carlsen, 2013) (Nakamura-Morozevich, 2013) (Caruana-Short, 2013) (Radjabov-Karjakin, 2012) (Bratteteig-Pleninger, 2012) (Kramnik-Leko, 2012) (Andreikin-Mamedyarov, 2014) (Caruana-Karjakin, 2012) (Cramling-Kosteniuk, 2000) (Gupta-Carr, 2013) (Aronian-Namakura, 2012) (Phillips-Moloney, 2012) (Gelfand-Rapport, 2014) (Rapport-Aronian, 2014)

Table 8 Transpositions

22 In all such graphs the baseline is shown in blue with square markers and the test opening is shown in red with diamond-shaped markers.
6.4 Graphical depiction of Monte Carlo simulation results

[Figures: simulated sampling distributions of game outcomes (number of games won by white vs. number of games won by black) for the baseline E12 QID compared in turn with each test opening: E43 NID, D37 QGD, E61 KID, E11 BID, E01 CAT, A81 DUD, A52 BUG, A45 TVA, and A83 DSG.]

7. REFERENCES

Andreikin-Mamedyarov. (2014). World Chess Championship Candidates. Retrieved 2014, from chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1751553
Aronian-Carlsen. (2013). Tata Steel. Retrieved 2014, from chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1704703
Aronian-Namakura. (2012). Tal Memorial. Retrieved 2014, from chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1669174
Aronian-Radjabov. (2013). World Championship Candidates. Retrieved 2014, from chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1714072
Bologan, V. (2012). The Powerful Catalan. New in Chess.
Bratteteig-Pleninger. (2012). London Classic. Retrieved 2014, from chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1700479
Carlsen-Karjakin. (2012). Tata Steel. Retrieved 2014, from chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1654399
Carlsen-Radjabov. (2014). Gashimov Memorial. Retrieved 2014, from chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1753385
Caruana-Karjakin. (2012). Tata Steel. Retrieved 2014, from chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1744284
Caruana-Short. (2013). London Chess Classic. Retrieved 2014, from chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1741071
Cramling-Kosteniuk. (2000). WCC. Retrieved 2014, from chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1258901
Dearin, E. (2005). Play the Nimzo Indian. Everyman Chess.
Dzindzichashvili, R. (2009). Budapest gambit. Retrieved 2014, from Youtube: http://www.youtube.com/watch?v=-ShFKpTkL9Q
Ferreira, D. (2013). The impact of search depth on chess playing strength. Retrieved 2014, from Instituto Superior Tecnico: http://web.ist.utl.pt/diogo.ferreira/papers/ferreira13impact.pdf
Gelfand-Gashimov. (2012). Tata Steel. Retrieved 2014, from chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1654442
Gelfand-Grischuk. (2012). World Rapid Championship. Retrieved 2014, from chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1671296
Gelfand-Rapport. (2014). Tata Steel. Retrieved 2014, from chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1744267
Golubov, M. (2006). Understanding the King's Gambit. Gambit Publications.
Gserper, G. (2010). King's Indian Defense. Retrieved 2014, from chess.com: http://www.chess.com/article/view/openings-for-tactical-players-kings-indian-defense
Gupta-Carr. (2013). London Classic. Retrieved 2014, from chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1740547
Hansen, C. (2009). Checkpoint. Retrieved 2014, from Chesscafe.com: http://www.chesscafe.com/text/hansen125.pdf
Harding, T. (2010). Play the Dutch. Retrieved 2014, from chesscafe.com: http://www.chesscafe.com/text/kibitz175.pdf
Houdart, R. (2012). Houdini. Retrieved November 2013, from cruxis.com: http://www.cruxis.com/chess/houdini.htm
Houdart, R. (2013). Houdini Chess. Retrieved 2014, from cruxis.com: http://www.cruxis.com/chess/houdini.htm
Johnson, V. E. (2013, November). Revised Standards for Statistical Evidence. Retrieved December 2013, from Proceedings of the National Academy of Sciences: http://www.pnas.org/content/110/48/19313.full
Jones, R., & Powell, D. (2014). Game Database. Retrieved February 2014, from chesstempo.com: http://chesstempo.com/game-database.html
Kelley, D. (2008). Catalan. Retrieved 2014, from chessopenings.com: http://chessopenings.com/catalan/
Kelley, D. (2005). Dutch. Retrieved 2014, from chessopenings.com: http://chessopenings.com/dutch/
Kelley, D. (2008). Kings Indian. Retrieved 2014, from chessopenings.com: http://chessopenings.com/kings+indian/
Kramnik-Leko. (2012). Dortmund. Retrieved 2014, from chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1672566
Kramnik-Pelletier. (2013). Geneva Chess Masters. Retrieved 2014, from chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1722084
Meyer-Kahlen, S. (2007). Deep Shredder. Retrieved 2014, from Shredderchess.com: http://www.shredderchess.com/chess-software/deep-shredder12.html
Meyer-Kahlen, S. (2000). Opening Database. Retrieved January 2014, from Shredder Chess: http://www.shredderchess.com/online-chess/online-databases/opening-database.html
Munshi, J. (2014). A Method for Comparing Chess Openings. Retrieved 2014, from SSRN: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2415203
Munshi, J. (2014). Comparing Chess Openings. Retrieved 2014, from SSRN: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2427542
Munshi, J. (2014). Numerican analysis. Retrieved 2014, from Dropbox: https://www.dropbox.com/sh/c7ze8c64ukpf525/AADAuuDx_mxXU6BVH-6GoF2ka
Munshi, J. (2014). PGN Files. Retrieved 2014, from Dropbox: https://www.dropbox.com/sh/nj14ur3cucew5xo/AABaz_LViWNpRTSaBKlUWmj5a
Nakamura-Morozevich. (2013). FIDE Grand Prix Zug. Retrieved 2014, from chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1716006
Phillips-Moloney. (2012). London Classic. Retrieved 2014, from chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1700347
Prie, E. (2009). d-pawn specials. Retrieved 2014, from chesspublishing.com: http://www.chesspublishing.com/content/8/jul09.htm
Radjabov-Karjakin. (2012). Tata Steel. Retrieved 2014, from chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1654273
Radjabov-Leitao. (2001). E43 Nimzo Indian. Retrieved 2014, from chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1242308
Rapport-Aronian. (2014). Tata Steel. Retrieved 2014, from chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1744275
Schiller, E. (1993). How to play against the Staunton Gambit. Chess Digest.
Semkov, S. (2009). Kill KID Vol. 1. New in Chess.
Sielecki, C. (2014). Nimzo and Bogo Indian. Everyman Chess.
Topalov-Kramnik. (2014). World Chess Championships Candidates. Retrieved 2014, from chessgames.com: http://www.chessgames.com/perl/chessgame?gid=1751385
Wikipedia. (2014). Houdini Chess. Retrieved 2014, from Wikipedia: http://en.wikipedia.org/wiki/Houdini_(chess)
Wikipedia. (2014). Monte Carlo Simulation. Retrieved 2014, from Wikipedia: http://en.wikipedia.org/wiki/Monte_Carlo_method