
A METHOD FOR COMPARING CHESS OPENINGS

JAMAL MUNSHI

ABSTRACT. A quantitative method is described for comparing chess openings. Test openings and baseline openings are run through chess engines under controlled conditions and compared to evaluate the effectiveness of the test openings. The results are intuitively appealing and in some cases they agree with expert opinion. The specific contribution of this work is the development of an objective measure that may be used for the evaluation and refutation of chess openings, a process that had previously been left to thought experiments and subjective conjecture, and thereby to a wide variety of opinion and a great deal of debate.

Note: An error in the Caro-Kann opening was corrected on March 15, 2014.
Date: February 2014.
Key words and phrases: chess openings, chess engines, refutation, methodology, Monte Carlo simulation, numerical analysis, probability vector, Euclidean distance, robust statistics, bootstrap.
Author affiliation: Professor Emeritus, Sonoma State University, Rohnert Park, CA, 94928, [email protected]

1. INTRODUCTION

The first few moves in chess are referred to collectively as the opening. The importance of the opening is widely recognized, and it rests on the assumption that the opening affects the outcome of the game. The rationale for this assumption is that in the opening both white and black attempt to deploy their pieces and simultaneously to occupy and control the four empty squares in the middle of the board, and the party that gets ahead in achieving these goals enjoys a higher probability of winning the game, ceteris paribus (Horowitz, 1964).

Chess openings have been an intense area of research for more than four hundred years in the European era of the game, and many hundreds of standardized openings are now recognized, organized, and listed according to a system known as the Encyclopedia of Chess Openings or ECO (Wikibooks, 2013). These opening moves evolved over the centuries from the analyses and games of the grandmasters of chess, and all serious chess players today begin their games with one of these proven opening sequences.

These openings differ from one another with respect to how well they facilitate their objective of center control and piece deployment. A direct comparison of openings in this regard is made difficult by the complexity of their intended function. Openings with a weakness in one aspect of the game may have a compensating strength in another, and those that are strong in the beginning may offer opportunities for a counterattack later in the game. Also, openings that don't control the center with pawns and knights may do so remotely with bishops, and an apparent opportunity for the opposition may serve as a trap (Horowitz, 1964). These complexities have made it difficult to compare openings directly or to refute weak openings, because no universal objective measure exists to make the necessary comparisons (Chessvibes, 2009).

To judge the relative merit of an opening, chess players depend on two sources - grandmaster analyses and game statistics. Both of these resources are unreliable. The grandmasters of the game are able to look ahead ten moves or more to assess all the possible ways the game can evolve from a given position. They use these faculties to make their picks for strong openings and to declare their refutations for weak ones.
In fact, we owe the existence of rote "book" openings to these kinds of analyses, which have been published since the 17th century (Horowitz, 1964). Yet there is no general agreement among the grandmasters on this issue, even to the point that an opening that has been refuted by one grandmaster may actually be played by another. For example, the King's Gambit, famously refuted by former world champion Bobby Fischer (Fischer, 1961), continues to be played even by grandmasters (chessgames.com, 2014). Thus grandmaster analyses by themselves, though necessary, are not sufficient to assess the relative merits of chess openings.

The second resource for comparing openings is large databases containing the first ten or so moves from each of millions of games organized into an "Opening Book" format (Meyer-Kahlen, 2007). These databases allow the user to view the popularity as well as the win, loss, and draw statistics for each opening line. Although universally used, these data are not as useful as they might appear, because they are field data that have not been taken under controlled conditions. They are confounded by effects larger than the effect of the opening that they profess to measure. Two important uncontrolled variables in these data are (1) the level of play and (2) the difference in playing strength between the two players. These variables are likely to have a greater effect on game outcome than the choice of opening. The lower the level of play, the higher the move imperfection rate, and the greater the effect of imperfect moves on game outcome; the effect of the opening on game outcome may not be detectable under these conditions. Likewise, the greater the difference in playing strength between the two opponents, the greater will be its effect on game outcome, with the relatively weaker effect of the opening not observable under these conditions. The variance in game outcomes created by these stronger explanatory variables makes it difficult to measure the effect of the opening, and opening effectiveness data taken under these conditions may be unable to discriminate between good and bad openings. This work proposes a method for generating chess game outcomes in controlled experiments designed to detect the effect of the opening on game outcome.

2. THEORY

A chess game is a trinomial stochastic process. The three possible outcomes of a game may be enumerated as white wins, black wins, and draw, and so we may model the chess game as a discrete stochastic process driven by an unknown and unobservable probability vector (Wikipedia, 2013) with two degrees of freedom, expressed as π[pw, pb, pd], where π is the probability vector that generates game outcomes, pw = the probability that white will win, pb = the probability that black will win, and pd = 1 - pw - pb = the probability that the game will end in a draw. As pw and pb cannot be assumed to be independent, it is necessary to treat them as a vector and to analyze them in two-dimensional space. The Cartesian space xy defined as x = pw and y = pb contains all possible chess games; each (x, y) point in this space represents a unique chess game.

For ease of evaluating any given opening, we assume that there exist neutral and perfect chess openings in which each side plays a move from a set of best possible moves, so that neither side gains or cedes an advantage due to the opening.
Before the game starts, white enjoys a first move advantage that can be translated into a certain probability pw = FMA that white will win (Wikipedia, 2013). Thus in the simplest case, where the two players are equal and perfect chess players who can correctly evaluate all possible future states of the board for every possible move, and who have played a perfect and neutral opening, the probability that the game will be decisive is FMA and the probability vector may be written as

π[pw, pb, pd] = [FMA, 0, 1 - FMA].

That is to say, perfect chess games end in a draw except for a certain probability, likely to be small, that white will win because it makes the first move. Black has no chance of winning a perfect game.

Now suppose that the two players are equal but not perfect. Random imperfect moves will create a certain probability of decisive games that applies equally to both sides. The probability of a decisive game is the sum of two independent probabilities, IMP = the probability of imperfection and FMA = the first move advantage. Under these conditions the win probabilities are

π[pw, pb, pd] = [FMA + IMP/2, IMP/2, 1 - FMA - IMP].

Thus black wins only by virtue of imperfections, while white wins not only by imperfection but also because it holds the first move advantage. For example, if the first move advantage is FMA = 4% and the imperfection rate is IMP = 10%, then under the neutral opening condition for two players of equal playing strength, pw = 0.04 + 0.10/2 = 9% and pb = 0.10/2 = 5%. In this case game outcomes are generated in a stochastic process driven by the probability vector π[pw, pb, pd] = [0.09, 0.05, 0.86]: in any given game the probability that white will win is 9%, the probability that black will win is 5%, the probability that the game will be decisive is 14%, and the probability that the game will end in a draw is 86%. Now suppose that at a higher level of play the imperfection rate is reduced to 2%. In that case pw = 5%, pb = 1%, and pd = 94%, and therefore π[pw, pb, pd] = [0.05, 0.01, 0.94]. These relationships are consistent with the observation that at higher levels of play more games end in draws and, of the decisive games, white wins more than black. The FMA is undetectable at high levels of IMP.

If the opening is not neutral and not perfect, then the probability vector π[pw, pb, pd] will be changed by the opening. This idea forms the basic principle of this work. If the imperfect opening is an innovation by black, its imperfection may increase pw or decrease pb, or it may do both. Likewise, an imperfect innovation by white may decrease pw or increase pb, or it may do both. The net result is that the opening imperfection will change the probability vector. To measure this effect we use controlled experiments and collect game outcome data under conditions in which the other variables are fixed at levels at which the relatively weaker effect of the opening may be detected.
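To make the arithmetic concrete, the short sketch below computes the outcome vector from given values of FMA and IMP. It is only an illustration of the two formulas above; the function name is ours and Python is used purely for exposition.

```python
def outcome_vector(fma, imp):
    """Probability vector pi[pw, pb, pd] for two equal players using a
    neutral opening: white wins by the first move advantage plus half of
    the imperfections, black wins only by imperfections (Section 2)."""
    pw = fma + imp / 2
    pb = imp / 2
    pd = 1 - fma - imp
    return [pw, pb, pd]

print(outcome_vector(0.04, 0.10))  # ~[0.09, 0.05, 0.86], the worked example
print(outcome_vector(0.04, 0.02))  # ~[0.05, 0.01, 0.94], the higher level of play
```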
3. THE PROPOSED METHODOLOGY

The proposed method of comparing openings makes use of a class of software referred to as "chess engines" to set up controlled experiments in which the effect of the opening may be detected. A chess engine is a computer program that selects a chess move to make in any given position of the pieces on the chessboard (Wikipedia, 2014). It constructs a tree of all possible move sequences (pruned to speed up the evaluation process) to a given search depth, calculates the positional value for each side at the end of each node of the tree, and then selects the move that yields the greatest positional advantage. Typically millions of positions may be evaluated to make a single move. Although the engine's method of making moves is dramatically different from the way humans play chess, it generates the same kind of game, and at sufficient depth the engines can outplay humans even at the highest levels (Wikipedia, 2014).

To be able to detect the effect of the opening, it is necessary that the two players be of the same playing strength, so that the effect of strength differential on game outcome is removed. In our engine experiments this condition is achieved by setting up an engine match in which both sides of the board are played by copies of the same engine utilizing identical parameters and set to evaluate board positions at the same search depth. The additional requirement is that the move imperfection rate should be low enough to render the effect of the opening detectable and measurable. This condition is realized by running the engines at a sufficient search depth of their look-ahead move tree. The move imperfection rate IMP cannot be eliminated, because at any finite search depth the positional values used to select a move may be inaccurate, but it is possible to hold IMP to a rate that is low enough for the effect of the opening to be measured.

For our experiments we selected the Houdini3Prox64 chess engine, generally considered to be the industry leader (Wikipedia, 2013). The numeral "3" designates the version of the software, "64" designates that the software uses a 64-bit architecture, and the word "Pro" indicates that more than six processors may be used. Version 4 of this software was released in November of 2013; it was not used in these experiments because it is relatively new and has not been tested in the field for a sufficient period of time. A preliminary exploratory study was used to select a search depth of 22 plies and a sample size of 300 games per experiment. (One complete "move" in chess consists of two half moves, one from each side of the board; chess engines measure their search depth in half moves, usually referred to as "plies".) A search depth of 22 plies, equivalent to looking ahead eleven moves, meets the criterion that the engine should look ahead at least ten moves; it is generally recognized that grandmasters use board pattern recognition in a way that is equivalent to a look-ahead of up to ten moves (Simon, 1996). A further condition for a relatively high level of play is that, when using a standardized set of neutral openings, the imperfection rate, measured as black's win rate, should be low and the win rate of white should exceed that of black. Houdini3Prox64 playing at a search depth of 22 plies meets these conditions.

3.1 Baseline and test openings. It is necessary to establish a baseline to serve as a control for chess game outcome statistics against which the performance of a test opening may be evaluated. The ideal baseline opening is one which is similar to the opening to be tested, with a common stem in the move sequence, so that the specific innovation of the test opening can be identified and evaluated.
Ideally, the baseline opening should be neutral and "perfect" in the sense that each side responds with a move from the set of best possible moves and in the course of the opening neither side cedes an advantage to the other due to the opening moves. To ensure that the outcome statistics are comparable, the number of half moves specified in the baseline is set equal to the number of half moves specified in the openings to be tested, so that the point in the game where the engine begins to evaluate positional value is the same for the two openings being compared. A length of six half moves was selected and fixed for all openings used in the study.

Baseline and test openings for this study are selected from a popular online opening database (Jones & Powell, 2014). This database allows the user to set the minimum Elo rating (Wikipedia, 2014) of the players when searching the database of chess games. For the purpose of this study we have restricted our search to a very high level of play by setting the minimum Elo rating to 2600. For the first three moves of chess games at this level we find that the most used opening is C68 Ruy Lopez (also known as the Spanish Game) and the second most used opening line is B53 Sicilian Defense. We therefore select these lines as our baseline openings against which the openings to be tested will be compared.

Ten test openings are randomly selected for this study from a large set of openings in the same database that (1) share a common stem of initial moves with the baseline to which they will be compared and (2) are rarely played at the high level of play chosen. The selected baseline and test openings are shown in Table 1. The test openings are separated into two groups according to the baseline with which the test opening will be compared. Each test opening is identified by an "innovation" which marks the point of departure of the test opening from the stem it shares with the baseline. The relative rarity of the innovation is computed as the ratio of the frequency of the corresponding baseline move to the frequency of the innovation at the selected level of play. For example, the rarity of C50 Giuoco Piano (also known as the Italian Game) shown as 9.0 means that at a level of play above 2600 Elo the baseline move 3. Bb5 is played nine times more frequently than the innovation 3. Bc4. The word innovation carries no meaning other than to convey that the move is a departure from the baseline.

Table 1. Baseline and test openings selected for this study (updated in March 2014). The ECO designations apply to the first three specified moves; transpositions are noted in the text.

Expt#  Group  Opening sequence                Description            Innovation  Rarity
1      1      1.e4 e5 2.Nf3 Nc6 3.Bb5 a6      C68 Ruy Lopez          Baseline    3…a6 / 3…a6 = 1.0
2      1      1.e4 e5 2.Nf3 Nc6 3.Bb5 Nd4     C61 Bird Defense       3…Nd4       3…a6 / 3…Nd4 = 254.5
3      1      1.e4 e5 2.Nf3 Nc6 3.Bc4 Bc5     C50 Giuoco Piano       3. Bc4      3. Bb5 / 3. Bc4 = 9.0
4      1      1.e4 e5 2.Nf3 Nc6 3.d4 exd4     C44 Scotch Game        3. d4       3. Bb5 / 3. d4 = 10.9
5      1      1.e4 e5 2.Nf3 d6 3.d4 exd4      C41 Philidor           2…d6        2…Nc6 / 2…d6 = 100.2
6      1      1.e4 e5 2.f4 exf4 3.Nf3 g5      C37 Kings Gambit       2. f4       2. Nf3 / 2. f4 = 43.7
7      2      1.e4 c5 2.Nf3 d6 3.d4 cxd4      B53 Sicilian Defense   Baseline    2. Nf3 / 2. Nf3 = 1.0
8      2      1.e4 c5 2.d4 cxd4 3.c3 dxc3     B21 Smith Morra        2. d4       2. Nf3 / 2. d4 = 1459.2
9      2      1.e4 c5 2.c3 d5 3.exd5 Qxd5     B22 Sicilian Alapin    2. c3       2. Nf3 / 2. c3 = 35.8
10     2      1.e4 c6 2.d4 d5 3.e5 Bf5        B12 Caro Kann          1…c6        1…c5 / 1…c6 = 4.0
11     2      1.e4 d5 2.exd5 Qxd5 3.Nc3 Qd6   B01 Scandinavian       1…d5        1…c5 / 1…d5 = 39.6
12     2      1.e4 d6 2.d4 Nf6 3.Nc3 g6       B07 Pirc               1…d6        1…c5 / 1…d6 = 18.5
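The rarity ratio in Table 1 is a simple quotient of move frequencies. A minimal sketch, with hypothetical move counts chosen only to reproduce the Giuoco Piano figure above:

```python
def rarity(baseline_move_count, innovation_move_count):
    """How many times more often the baseline move is played than the
    innovation in database games above the chosen Elo cutoff (Table 1)."""
    return baseline_move_count / innovation_move_count

# If 3. Bb5 appeared 4500 times and 3. Bc4 appeared 500 times in games
# above 2600 Elo (hypothetical counts), the C50 rarity would be 9.0.
print(rarity(4500, 500))  # 9.0
```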
It is assumed that the innovator seeks a greater advantage than that offered by the neutral baseline opening sequence from which it has parted. There are three possible outcomes of an innovation. If the innovation is a success, we would expect it to change π[pw, pb, pd] in favor of the innovator. If the innovation is a failure, it backfires; in that case we would expect it to change π[pw, pb, pd] to the detriment of the innovator. The third possibility is that the innovation fails to create the advantage sought by the innovator but does no harm, in which case we would not expect to see any change in π[pw, pb, pd]. We set up controlled experiments and hypothesis tests to select one of these effects as the likely outcome of the innovation.

Houdini 3x64 and Houdini3Prox64, products of cruxis.com (Houdart, 2012), are used for all engine experiments. The two engines are considered equivalent and differ only with respect to speed: the non-Pro version is limited to 6 processors (or cores) while the Pro version has no such limit. The engine parameters are set to the default values as shipped, except for the number of threads, which is set to the number of cores available for the Pro version and to 6 for the non-Pro version. All engine experiments in this study are carried out with contempt = 1.

The Deep Shredder graphical user interface (GUI) from Shredder Chess (Meyer-Kahlen, 2007) was chosen for this study. Once the Houdini3 engine is installed in the GUI, we use the "Level of Play", "Opening Book", and "Engine Match" facilities of the GUI to set up each experiment. The Level of Play is set to a fixed search depth of 22 plies for each move the engines make. The Opening Book function is used to specify the opening to be used in the games by both sides. Finally, the Engine Match function is used to set up a match with either Houdini3Prox64 or Houdini3x64 playing both the white and black moves. The number of games to be played is set to the selected sample size of 300 games. At the end of each match, the number of wins by white and the number of wins by black are recorded.
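The experiments in this study were set up through the Deep Shredder GUI as described above, but the same design can be scripted. The sketch below is a minimal illustration using the open-source python-chess library and a hypothetical engine path; it is not the author's setup, only one way of running a fixed-depth, engine-against-itself match from a forced six-half-move opening.

```python
import chess
import chess.engine

ENGINE_PATH = "./houdini3"  # hypothetical path to any UCI engine binary

def play_match(opening_san, n_games=300, depth=22):
    """Play n_games of engine-vs-itself chess from a forced opening at a
    fixed search depth and tally white wins, black wins, and draws."""
    tally = {"1-0": 0, "0-1": 0, "1/2-1/2": 0}
    engine = chess.engine.SimpleEngine.popen_uci(ENGINE_PATH)
    for _ in range(n_games):
        board = chess.Board()
        for san in opening_san:  # force the specified opening moves
            board.push_san(san)
        while not board.is_game_over():
            # Limit(depth=...) plays the role of the GUI's fixed "Level of
            # Play" setting; a multithreaded engine is not deterministic,
            # so the games of a match can differ even at a fixed depth.
            result = engine.play(board, chess.engine.Limit(depth=depth))
            board.push(result.move)
        tally[board.result()] += 1
    engine.quit()
    return tally

# The C68 Ruy Lopez baseline, 1.e4 e5 2.Nf3 Nc6 3.Bb5 a6:
# print(play_match(["e4", "e5", "Nf3", "Nc6", "Bb5", "a6"]))
```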
3.2 Method for comparing match results. We assume that each game of an experiment is akin to tossing an unfair, loaded, three-sided "coin" in which the loadings are unknown. The loaded "coin" can come up as "white wins", "black wins", or "draw". The process is driven by an unknown probability vector π[pw, pb, pd]. In the controlled experiments the probability vector is assumed to be completely specified by three parameters, namely white's first move advantage, the general rate of imperfection in the engine's position evaluation function, and the effect of the opening. The vector π[pw, pb, pd] is not known to us and it is not directly observable; it is necessary to infer its value from the data. The effectiveness of openings is then evaluated by comparing the inferred values of their respective probability vectors. Each experiment of n = 300 games is likened to tossing 300 of these three-sided unfair loaded coins. We then count w = the number of white wins and b = the number of black wins and compute

P[pw, pb, pd] = [w/300, b/300, 1 - w/300 - b/300]

where P[pw, pb, pd] is an unbiased estimate of the unobservable probability vector π[pw, pb, pd]. If we toss the coins again we are likely to get different values of w and b, and therefore it is necessary to know how different these values could be.

The uncertainty of the estimate is assessed using a simulation procedure to generate one thousand simulated repetitions of the 300-game experiment. The use of simulations to estimate variance is described by Grinstead and Snell (Snell, 1997). Microsoft Excel is used as the simulation tool. The procedure for generating simulated experiments is as follows. First we select a column of 300 random numbers from a uniform distribution with values from zero to one, using the RAND() function of Excel. Every occurrence of a random number that is less than or equal to w/300 is marked as a win by white, every number greater than w/300 + b/300 is marked as a draw, and the rest are marked as wins by black. This procedure is repeated one thousand times to generate one thousand simulated repetitions of the experiment. The variation in the P[pw, pb, pd] vector among these simulated repetitions serves as our measure of the uncertainty in our estimated value of π[pw, pb, pd]. The computational details and simulation results for both Group 1 openings (Munshi, Group 1 Simulations, 2014) and Group 2 openings (Munshi, Group 2 Simulations, 2014) are available in the data archive for this paper.
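A minimal sketch of this simulation procedure, using numpy in place of Excel's RAND() (the seed and the function name are ours):

```python
import numpy as np

rng = np.random.default_rng(2014)  # arbitrary seed, for reproducibility

def simulate_matches(w, b, n=300, reps=1000):
    """Generate `reps` simulated repetitions of an n-game match driven by
    the estimated vector P = [w/n, b/n, 1 - w/n - b/n]. Returns the white
    and black win counts of each repetition (one scatter point each)."""
    pw, pb = w / n, b / n
    u = rng.random((reps, n))                        # uniform draws in [0, 1)
    white = (u <= pw).sum(axis=1)                    # u <= pw: white win
    black = ((u > pw) & (u <= pw + pb)).sum(axis=1)  # pw < u <= pw+pb: black win
    return white, black                              # remaining games are draws

# Example: the C68 Ruy Lopez experiment of Table 2 (16 white wins, 3 black wins)
ruy_w, ruy_b = simulate_matches(16, 3)
```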
To compare two openings, we plot the simulation results for both openings in the same Cartesian xy space, with x = number of white wins and y = number of black wins, as shown in Figure 1. (In all such graphs presented in this paper, the baseline control opening is represented by diamond markers and the test opening by square markers.) Each marker in the plot represents a unique simulated match of 300 games. In this particular example we note that the two openings appear to form distinct clusters of games, and the visual comparison appears to indicate that white wins more games and black wins fewer games in the baseline opening than in the test opening. The essential research question is whether the unobservable probability vectors that generated the two sets of games are different, or whether both sets of simulated results could have been generated by the same underlying probability vector π[pw, pb, pd]. If they are different, the difference is described as an effect of the opening, since all other variables are controlled and fixed for both sets of games.

[Figure 1. Graphical display of simulated match outcomes, baseline vs test opening; x-axis: number of games won by white, y-axis: number of games won by black.]

3.3 Hypotheses. The research question may be stated in hypothesis format as follows:

Ho: π[pw, pb, pd] (test opening) = π[pw, pb, pd] (baseline opening)
Ha: π[pw, pb, pd] (test opening) ≠ π[pw, pb, pd] (baseline opening)

In essence, if the same underlying probability vector could have generated both of the observed experimental results, then we fail to reject Ho. Otherwise we reject Ho, conclude that there is an opening effect, and use the graph to determine whether the innovation is a success or a failure and, if it is a failure, whether the failure is benign or injurious to the innovator. For example, if Ho is rejected in the case shown in Figure 1 and the innovator is black, then the innovation is successful; but if the innovator is white then the innovation is a failure and injurious to the innovator. If we fail to reject Ho, we conclude that the test opening is a benign innovation.

The hypothesis is tested by computing the observed Euclidean distance between the centroids of the two treatment clusters and the standard deviation of the Euclidean distances of the simulated matches from their respective centroids. (In all references to "distance" we mean the absolute value of the Euclidean distance.) We assume that if the two π[pw, pb, pd] vectors are the same then the distance between them is zero, and conversely that if the distance between them is greater than zero they must be different. We can then state our testable hypotheses as follows:

Ho: distance between centroids = 0
Ha: distance between centroids ≠ 0

We test the hypothesis using the t-distribution, holding our comparison error rate to α = 0.001 and our experiment-wide error rate to α = 0.005. These values of α have been proposed by Valen Johnson, who makes a strong case that they enhance the reproducibility of results (Johnson, 2013). The Bonferroni adjustment for multiple comparisons (Abdi, 2007) implies that up to five comparisons may be made against each baseline opening experiment within an overall experiment-wide error rate of α = 0.005.

4. DATA ANALYSIS AND RESULTS

Table 2 shows the data from the twelve engine experiments. The essential data in the table are White = the number of games won by white, Black = the number of games won by black, Draw = the number of games that ended in a draw, and Total = the number of games played. The moves played in all 3,600 games have been made available in PGN format in the data archive for this paper (Munshi, OpeningPaperData, 2014).

Table 2. Observed sample statistics, raw data (updated in March 2014).

Expt#  Group  Description            White  Black  Draw  Total
1      1      C68 Ruy Lopez          16     3      281   300
2      1      C61 Birds Defense      56     2      242   300
3      1      C50 Giuoco Piano       3      7      290   300
4      1      C44 Scotch Game        12     7      281   300
5      1      C41 Philidor           32     0      268   300
6      1      C37 Kings Gambit       4      29     267   300
7      2      B53 Sicilian Defense   21     2      277   300
8      2      B21 Smith Morra        7      13     280   300
9      2      B22 Sicilian Alapin    17     3      280   300
10     2      B12 Caro Kann          37     4      259   300
11     2      B01 Scandinavian       59     0      241   300
12     2      B07 Pirc               60     2      238   300

4.1 The baseline openings and the level of play. Consider the data for the two baseline openings, shown in Table 2 as Experiment #1 and Experiment #7. Note that more than 90% of the games in these experiments end in a draw, that white wins more games than black, and that black does not win more than 1% of the games. If we combine the two experiments into a 600-game sample, we can estimate that the move imperfection rate is 1.67% and that white's first move advantage is 4.5%. The probability that the data are tainted by bad engine moves is therefore assumed to be low and inconsequential to the findings. These relationships are indicative of a high level of play and serve to validate the use of these openings as neutral and "perfect" baselines against which the test openings may be compared (Wikipedia, 2013).

The Monte Carlo simulations of the baseline experiments are depicted graphically in Figure 2. Each simulated experiment was generated with the estimated probability vectors P[pw, pb, pd] (Ruy Lopez) = [0.0533, 0.01, 0.9367] and P[pw, pb, pd] (Sicilian) = [0.07, 0.0067, 0.9233] and a sample size of 300 games. The graph in Figure 2 shows that there are many overlapping results that could have been generated by either estimated probability vector and that the two openings do not form two distinct clusters of games.
So we suspect that the two observed experimental results could have been generated by the same underlying and unobservable probability vector π[pw, pb, pd]. In fact, the distance between the openings is quite small and the t-test shows a p-value of 0.254, much larger than our α = 0.001. So we fail to reject Ho in this case and conclude that π[pw, pb, pd] (B53) could be equal to π[pw, pb, pd] (C68), because the evidence does not show that the two probability vectors are different. The two baseline openings are thus not only efficient but equivalent. The simulation data and computational details of the comparison are available in the online data archive for this paper (Munshi, C68 C53 Comparison, 2014).

The game data show that the engines have generally played the mainline moves one finds in grandmaster games (Jones & Powell, 2014). In the C68 Ruy Lopez experiment all 300 games maintained the C68 designation through the entire game. In the B53 Sicilian Defense experiment most of the games pertain to the B53 designation, but B90 Sicilian Najdorf, B92 Sicilian Najdorf, B73 Sicilian Dragon, and B76 Sicilian Dragon lines also occur. In the rest of this study all references to the ECO designation B53 Sicilian Defense should be interpreted in this context. The actual moves played are shown in PGN format for both baseline openings, C68 Ruy Lopez (Munshi, Experiment 01, 2014) and B53 Sicilian Defense (Munshi, Experiment 07, 2014), in the data archive for this paper.

[Figure 2. Monte Carlo simulation of the baseline openings, C68 Ruy Lopez vs B53 Sicilian Defense; x-axis: number of games won by white, y-axis: number of games won by black.]

4.2 Hypothesis tests. Table 3 is a summary of the hypothesis tests. Ten hypothesis tests are made and the null hypothesis is rejected in five of them. We are now in a position to interpret the results for each test opening shown in Table 3 by examining the plot of the simulation data in light of the hypothesis test results. Our objective is to use this information to classify each test opening innovation into one of the three categories listed below.

Category A: The innovation has succeeded by changing the probability vector in favor of the innovator.
Category C: The innovation is benign. The data do not indicate that it has changed the probability vector.
Category F: The innovation has failed. It has changed the probability vector to the detriment of the innovator.

Table 3. Hypothesis tests (updated in March 2014).

Expt#  Group  Test opening         Compared with          Distance  Stdev  t-value  p-value  α      Decision
2      1      C61 Birds Defence    C68 Ruy Lopez          40.012    5.861  6.827    1.1E-11  0.001  Reject Ho
3      1      C50 Giuoco Piano     C68 Ruy Lopez          11.705    3.904  2.998    0.00275  0.001  Fail to reject
4      1      C44 Scotch Game      C68 Ruy Lopez          5.657     4.266  1.326    0.18497  0.001  Fail to reject
5      1      C41 Philidor         C68 Ruy Lopez          16.279    4.949  3.289    0.00102  0.001  Fail to reject
6      1      C37 Kings Gambit     C68 Ruy Lopez          28.636    5.016  5.709    1.3E-08  0.001  Reject Ho
8      2      B21 Smith Morra      B53 Sicilian Defense   17.804    4.508  3.949    8.1E-05  0.001  Reject Ho
9      2      B22 Sicilian Alapin  B53 Sicilian Defense   4.123     4.522  0.912    0.362    0.001  Fail to reject
10     2      B12 Caro Kann        B53 Sicilian Defense   16.125    5.367  3.005    0.0027   0.001  Fail to reject
11     2      B01 Scandinavian     B53 Sicilian Defense   38.053    5.839  6.517    9.1E-11  0.001  Reject Ho
12     2      B07 Pirc             B53 Sicilian Defense   39.000    5.848  6.669    3.3E-11  0.001  Reject Ho
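Each row of Table 3 can, under the stated assumptions, be computed from two sets of simulated matches. The sketch below takes the per-repetition win counts produced by simulate_matches above, computes the centroid distance, a pooled standard deviation of the distances of the simulated matches from their own centroids (the exact pooling used in the study is our assumption), the t-value as their ratio, and a two-tailed p-value with df = 1000 + 1000 - 2; the t-value and p-value columns of Table 3 are consistent with this computation.

```python
import numpy as np
from scipy import stats

def compare_openings(w1, b1, w2, b2, alpha=0.001):
    """Distance test of Section 3.3 between two clusters of simulated
    matches, given as arrays of white (w) and black (b) win counts."""
    c1 = np.array([w1.mean(), b1.mean()])   # centroid of the first cluster
    c2 = np.array([w2.mean(), b2.mean()])   # centroid of the second cluster
    dist = np.linalg.norm(c1 - c2)          # Euclidean distance between centroids
    d1 = np.hypot(w1 - c1[0], b1 - c1[1])   # distances from own centroid
    d2 = np.hypot(w2 - c2[0], b2 - c2[1])
    s = np.sqrt((d1.var(ddof=1) + d2.var(ddof=1)) / 2)  # pooled s.d. (assumed)
    t = dist / s                            # t-value, as in Table 3
    p = 2 * stats.t.sf(t, df=len(d1) + len(d2) - 2)     # two-tailed p-value
    decision = "Reject Ho" if p < alpha else "Fail to reject"
    return dist, s, t, p, decision
```

Feeding in the two simulated baseline clusters, for example, should yield a p-value far above α = 0.001, consistent with the fail-to-reject conclusion reported for the C68 vs B53 comparison in Section 4.1.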
5. DISCUSSION OF RESULTS

5.1 Experiment #2: C61 Bird Defense: Category F

[Figure 3. C61 Bird Defense: simulated matches, C68 Ruy Lopez vs C61 Birds Defense; x-axis: number of games won by white, y-axis: number of games won by black.]

Hypothesis test: In the hypothesis test in Table 3 we find that the observed distance between C61 and C68 in our sample is 40.012. Since Ho is rejected we can conclude that π[pw, pb, pd] (C61) ≠ π[pw, pb, pd] (C68). If the π[pw, pb, pd] vectors of the two openings were the same, the probability that we would observe a distance of 40.012 or more between them in our sample would be very close to zero and much less than our threshold of disbelief.

Graph interpretation: We now examine Figure 3 for clues to how these probability vectors might differ, and we find that the probability vector that drives game outcomes in the Bird Defense must contain a higher probability of a win by white. Since the innovator is black, the innovation has failed. We therefore classify the opening as Category F. The computational details for the comparison of these two openings are available in the data archive for this paper (Munshi, C68 C61 Comparison, 2014).

General opinion: The finding is consistent with the rarity of the Bird Defense at the highest levels of play (Jones & Powell, 2014). However, it is interesting to note that there has been no call for a refutation of the Bird Defense except for an obscure note by chess teacher Edward Scimia that after 3…Nd4, "white will emerge with a small advantage due to having a better pawn structure" (Scimia, 2013).

Transpositions: There were no transpositions in this opening. The opening designation remained ECO C61 Bird Defense in all 300 games. The PGN file is available in the data archive of this paper (Munshi, Experiment02, 2014).

5.2 Experiment #3: C50 Giuoco Piano: Category C

[Figure 4. C50 Giuoco Piano: simulated matches, C68 Ruy Lopez vs C50 Giuoco Piano; x-axis: number of games won by white, y-axis: number of games won by black.]

Hypothesis test: In Table 3 we find no evidence that the Giuoco Piano opening is driven by a different probability vector than the one that generates games in the Ruy Lopez. If the π[pw, pb, pd] vectors of the two openings were the same, the probability that we would observe a Euclidean distance of 11.705 or more between them in our sample is 0.00275, greater than our threshold of α = 0.001. We therefore conclude that it is possible that chess games in the two openings are driven by the same underlying probability vector.

Graph interpretation: In Figure 4 we find that, even though the two simulation clusters appear to be different, there is an area of overlap between the two openings. This graph is consistent with the notion that the two sets of match outcomes could have been produced by the same underlying probability vector. Since we failed to reject Ho, we classify the innovation as Category C. It is a benign innovation. The computational details for this comparison are available in the data archive for this paper (Munshi, C68 C50 Comparison, 2014).

General opinion: The finding is consistent with the general opinion of chess players. The Giuoco Piano is generally accepted as an equal to the Ruy Lopez (Kebu Chess, 2008) and there has been no call for it to be refuted. Yet the Giuoco Piano is rarely played by grandmasters (Jones & Powell, 2014).
In my own collection of recent grandmaster games (which, unlike the online databases, excludes blitz, rapid, and simul games), the Giuoco Piano is not played at all (Munshi, GM 2012 2013 Book, 2014).

Transpositions: Although the first three moves were set according to the ECO designation C50 Giuoco Piano, many of the games transposed into ECO code C54. The PGN file is available in the data archive of this paper (Munshi, Experiment03, 2014).

5.3 Experiment #4: C44 Scotch Game: Category C

[Figure 5. C44 Scotch Game: simulated matches, C68 Ruy Lopez vs C44 Scotch Game; x-axis: number of games won by white, y-axis: number of games won by black.]

Hypothesis test: In Table 3 we find that the distance between C44 and C68 is 5.657, and the probability that we would observe a distance this large or larger, if the π[pw, pb, pd] vectors that generate game outcomes in these openings were the same, is 0.18497, greater than our threshold of disbelief. We are therefore unable to reject the possibility that the game outcomes in the two openings are driven by the same underlying π[pw, pb, pd] vector.

Graph interpretation: In Figure 5 we find that the simulated experiments do not form distinct clusters; there is a significant area of overlap. This graphic supports the notion that match outcomes in the two openings could have been generated by the same underlying probability vector. We therefore classify the innovation as Category C. It is a benign innovation. The computational details for the comparison of these two openings are available in the data archive for this paper (Munshi, C68 C44 Comparison, 2014).

General opinion: The finding is consistent with the general opinion of chess players. The Scotch Game is generally accepted as a strong opening by grandmasters (Gserper, 2009) and there has been no call for it to be refuted. However, it is rarely seen in grandmaster tournaments. In opening books we find that at high levels of play 3. Bb5 is preferred 11:1 over 3. d4 in this line (Jones & Powell, 2014).

Transpositions: The first three moves in the experiment were set according to ECO code C44, but many of the games transposed into C45. The games may be viewed in PGN format in the online data archive of this paper (Munshi, Experiment04, 2014).

5.4 Experiment #5: C41 Philidor: Category C

[Figure 6. C41 Philidor Defense: simulated matches, C68 Ruy Lopez vs C41 Philidor Defense; x-axis: number of games won by white, y-axis: number of games won by black.]

Hypothesis test: Although the distance between C41 Philidor and C68 Ruy Lopez is 16.279, it is still not large enough to be unusual even under the assumption that the same π[pw, pb, pd] vector is responsible for generating game outcomes in both openings, because the p-value of 0.00102 is not less than our threshold of disbelief set to α = 0.001. Thus we must allow for the possibility that π[pw, pb, pd] (C41) = π[pw, pb, pd] (C68).

Graph interpretation: In Figure 6 it seems that the two game clusters are different, but we find that there is an area of overlap that contains games from both openings. The graph is consistent with the inconclusive nature of our hypothesis test. Since the results of these two experiments could have been generated by the same π[pw, pb, pd] vector, we classify the C41 Philidor as Category C. It is a benign innovation.
The computational details of the comparison are included in the data archive for this paper (Munshi, C68 C41 Comparison, 2014).

General opinion: The finding is consistent with the general opinion of chess players. The Philidor Defense is generally accepted as a strong opening (Bauer, 2006) and there has been no call for it to be refuted. However, in opening books we find that 2…Nc6 is preferred to the 2…d6 innovation by a ratio of 100:1 at high levels of play (Jones & Powell, 2014). In my collection of recent high profile grandmaster tournaments the Philidor is notable in its absence (Munshi, GM 2012 2013 Book, 2014).

Transpositions: The first three moves in the experiment were set according to ECO code C41 and this ECO designation remained unchanged throughout all 300 games. The games may be viewed in PGN format in the online data archive of this paper (Munshi, Experiment 05, 2014).

5.5 Experiment #6: C37 King's Gambit: Category F

[Figure 7. C37 King's Gambit: simulated matches, C68 Ruy Lopez vs C37 Kings Gambit; x-axis: number of games won by white, y-axis: number of games won by black.]

Hypothesis test: In the hypothesis test in Table 3 we find that the observed distance between C37 and C68 in our sample is 28.636. Since Ho is rejected we can conclude that π[pw, pb, pd] (C37) ≠ π[pw, pb, pd] (C68). If the π[pw, pb, pd] vectors of the two openings were the same, the probability that we would observe a distance of 28.636 or more between them in our sample would be very close to zero and much less than our threshold of disbelief set to α = 0.001.

Graph interpretation: Figure 7 shows two distinct clusters of simulated match results that do not overlap. The graph visually supports our conclusion that the game outcomes in the two openings are driven by different π[pw, pb, pd] vectors. In particular, the 2. f4 innovation by white appears to have increased pb and/or decreased pw, both of which work to the detriment of the innovator. We therefore classify the opening in Category F. It is a failed innovation. The computational details of the comparison between C37 King's Gambit and C68 Ruy Lopez are available in the data archive for this paper (Munshi, C68 C37 Comparison, 2014).

General opinion: Former world champion Bobby Fischer once published a paper calling for the refutation of the King's Gambit (Fischer, 1961), and more recently chess engine programmer Vasik Rajlich carried out an extensive study on chess engines to support Fischer's call (Rajlich: Busting the King's Gambit, 2012). In opening books we find that 2. Nf3 is preferred to 2. f4 in this line by a ratio of 43:1 at high levels of play (Jones & Powell, 2014). Yet the King's Gambit enjoys a degree of popularity and has many ardent supporters (Kebu Chess, 2008).

Transpositions: The first three moves of the games were set according to ECO code C37, but many of the games in the sample transposed to C39. A complete record of all 300 games is available in PGN format in the data archive for this paper (Munshi, Experiment 06, 2014).

5.6 Experiment #8: B21 Smith Morra: Category F

[Figure 8. B21 Smith Morra Gambit: simulated matches, B53 Sicilian Defense vs B21 Smith Morra; x-axis: number of games won by white, y-axis: number of games won by black.]

Hypothesis test: In the hypothesis test in Table 3 we find that the observed distance between B21 and B53 in our sample is 17.804.
Since Ho is rejected we can conclude that π[pw, pb, pd] (B21) ≠ π[pw, pb, pd] (B53). If the π[pw, pb, pd] vectors of the two openings were the same, the probability that we would observe a distance of 17.804 or more between them in our sample would be 0.000081, much less than our threshold of disbelief set to α = 0.001.

Graph interpretation: Figure 8 shows two distinct clusters of simulation results that do not overlap. It visually supports our conclusion that the game outcomes in the two openings are driven by different π[pw, pb, pd] vectors. In particular, the 2. d4 innovation by white appears to have increased pb and/or decreased pw, both of which work to the detriment of the innovator. We therefore classify the opening in Category F. It is a failed innovation. The computational details of the comparison between B21 and B53 are available in the data archive for this paper (Munshi, B53 B21 Comparison, 2014).

General opinion: The opening continues to be controversial, with no general consensus as to its merit. There are high profile commentators on both sides of the argument: for example, Marc Esserman (Esserman, 2012) is a supporter, while Timothy Taylor claims to have found its weakness (Taylor, 1993). In online opening databases we find that the Smith Morra is not played by grandmasters. At the highest levels of play, 2. Nf3 is preferred to 2. d4 in this line by a ratio of 1459:1 (Jones & Powell, 2014).

Transpositions: There were no transpositions in this experiment. The ECO designation remained B21 throughout every game. The PGN file is available in the data archive for this paper (Munshi, Experiment 08, 2014).

5.7 Experiment #9: B22 Sicilian Alapin: Category C

[Figure 9. B22 Sicilian Alapin: simulated matches, B53 Sicilian Defense vs B22 Sicilian Alapin; x-axis: number of games won by white, y-axis: number of games won by black.]

Hypothesis test: In Table 3 we find that the distance between B53 and B22 is 4.123, and the probability that we would observe a difference this large or larger is 0.362, much larger than our threshold of disbelief set at 0.001. Since we are unable to reject Ho, we must allow for the possibility that the same underlying π[pw, pb, pd] vector generates game outcomes for both of these openings.

Graph interpretation: In Figure 9 we see that the simulated matches overlie each other and form a single cluster. The graph supports the null hypothesis that a single underlying π[pw, pb, pd] vector could generate the observed results from both of these openings. We therefore classify the B22 Alapin in Category C. The computational details of the comparison are available in the data archive for this paper (Munshi, B53 B22 Comparison, 2014).

General opinion: There does not exist any negative opinion on the B22 Alapin variation, and there are many who feel that 2. c3 is a strong move by white in the very popular Sicilian opening (Eddleman, 2010). Yet it is a rare occurrence in grandmaster games. In online opening databases one finds that 2. Nf3 is preferred to 2. c3 by a ratio of 35:1 (Jones & Powell, 2014). Our finding seems to be in agreement with general opinion but at odds with the rarity of the 2. c3 move in this line.

Transpositions: There were no transpositions. The ECO designation remained B22 through all 300 games. The games may be viewed in PGN format online (Munshi, Experiment 09, 2014).
5.8 Experiment #10: B12 Caro-Kann: Category C (the Caro-Kann data were updated in March 2014 to correct a data entry error)

[Figure 10. B12 Caro-Kann (updated in March 2014): simulated matches, B53 Sicilian Defense vs B12 Caro Kann; x-axis: number of games won by white, y-axis: number of games won by black.]

Hypothesis test: In the hypothesis test in Table 3 we find that the observed distance between B12 and B53 in our sample is 16.125. If the π[pw, pb, pd] vectors of the two openings were the same, the probability that we would observe a distance of 16.125 or more between these two openings in our sample would be 0.0027. Since this probability is not less than our threshold of disbelief set to α = 0.001, we fail to reject Ho and must allow for the possibility that π[pw, pb, pd] (B12) = π[pw, pb, pd] (B53); the same probability vector could have generated both the B12 and the B53 game outcomes.

Graph interpretation: Figure 10 appears to show that the π[pw, pb, pd] (B12) vector contains higher win probabilities for both white and black, but the degree of overlap between the two clusters supports our hypothesis test conclusion that there is no evidence here that game outcomes in the B12 are driven by a different probability vector than in the B53. Computational details of the comparison between B12 Caro-Kann and B53 Sicilian Defense are included in the data archive for this paper (Munshi, B12 B53 Comparison, 2014).

General opinion: There have been some calls for the refutation of this opening (KenilworthKibitzer, 2010), but most commentators and analysts consider the Caro-Kann to be a strong opening comparable to the B53 Sicilian (Kebu Chess, 2008). However, in online opening databases we find that at the grandmaster level 1…c5 is preferred to 1…c6 by a ratio of only 4:1 (Jones & Powell, 2014).

Transpositions: There were no transpositions. The first three moves were specified according to the B12 Caro-Kann Advance variation and the opening designation stayed in that ECO code throughout the games. The line used represents the mainline in the opening book set to Elo ratings of 2600 or higher. The moves made in these games are included in the data archive for this paper (Munshi, Experiment 10, 2014).

5.9 Experiment #11: B01 Scandinavian: Category F

[Figure 11. B01 Scandinavian Defense: simulated matches, B53 Sicilian Defense vs B01 Scandinavian; x-axis: number of games won by white, y-axis: number of games won by black.]

Hypothesis test: In Table 3 we find that the distance between B01 Scandinavian and B53 Sicilian is 38.053, and the probability that we would observe this distance or larger, if game outcomes in the two openings were driven by the same probability vector, is very close to zero and much less than our threshold value of 0.001. We therefore conclude that π[pw, pb, pd] (B01) ≠ π[pw, pb, pd] (B53).

Graph interpretation: Figure 11 clearly shows that the two clusters of game outcomes are distinct and separated with no overlap, in agreement with our hypothesis test. The graph appears to show that the B01 innovation of 1…d5 has increased pw or decreased pb or both. As these changes work against the innovator, we classify this opening as Category F. Computational details of the comparison of B01 with B53 are included in the data archive for this paper (Munshi, B01 B53 Comparison, 2014).

General opinion: There is plenty of support for this opening even at the grandmaster level of chess (Gserper, Scandinavian center counter defense, 2010).
There has been no call for refutation. In opening books we find that 1…c5 is preferred to 1…d5 by a ratio of 39:1 (Jones & Powell, 2014). Our finding is at odds with subjective opinion but consistent with the rarity of this opening at a high level of play.

Transpositions: There were no transpositions. The ECO code remained B01 throughout every game played. The games may be viewed in the data archive for this paper (Munshi, Experiment 11, 2014).

5.10 Experiment #12: B07 Pirc Defense: Category F

[Figure 12. B07 Pirc Defense: simulated matches, B53 Sicilian Defense vs B07 Pirc Defense; x-axis: number of games won by white, y-axis: number of games won by black.]

Hypothesis test: Table 3 shows that the distance between B07 and B53 is 39.000, and the probability of observing a distance this large or larger if π[pw, pb, pd] (B07) = π[pw, pb, pd] (B53) is close to zero and less than our threshold of 0.001. We therefore reject Ho and conclude that the data provide strong evidence that π[pw, pb, pd] (B07) ≠ π[pw, pb, pd] (B53), and that chess games using these two openings are therefore driven by different probability vectors.

Graph interpretation: The simulated matches in Figure 12 form distinct clusters separated from each other without any overlapping match outcomes. This graphic serves to visually confirm our conclusion in the hypothesis test. It is also evident in the graph that the difference between the two probability vectors may include a higher value of pw in B07 than in B53. This difference works against black, who is the innovator in this case. We therefore classify the 1…d6 innovation as Category F. The computational details of the comparison are included in the data archive for this paper (Munshi, B53 B07 Comparison, 2014).

General opinion (this paragraph was added in March 2014): The results conform with the relative rarity of this opening. At Elo ratings greater than 2600, the opening database shows that 1…c5 is preferred to 1…d6 by a ratio of over 18:1 (Jones & Powell, 2014). However, no negative opinion has been expressed on this opening; many experts and analysts have promoted it as one that opens up opportunities for black (Kebu Chess, 2012).

Transpositions: The first three moves of the games were set according to B07, but many of the games transposed into B08 and A43. The complete game record, move by move, is included in the data archive for this paper (Munshi, Experiment 12, 2014).

6. CONCLUSIONS AND IMPLICATIONS (edited in March 2014)

Starting with a trinomial stochastic model for chess game outcomes, we designed engine experiments and a Monte Carlo simulation procedure to detect the effect of the opening on the probability vector that generates game outcomes. The results show that the method is able to discriminate between known strong openings and known weak openings. Of the ten opening innovations tested, five were found to be failures because the innovation did not strengthen but in fact weakened the position of the innovator. The other five were found to be benign innovations, as they had no measurable effect on the probability vector that generates game outcomes. None of the innovations tested succeeded in improving the innovator's chances of winning, leaving the baseline control openings as the strongest openings examined in this study. The value of this work lies not so much in these findings as in offering an objective methodology for comparing chess openings.
The method may be used as an analysis tool to compare complex variations in opening lines. It may also be used to test controversial openings for possible refutation, and in those cases the method serves as an objective refutation tool.

An important implication of this work is that, because chess is trinomial and not binomial, neither opening strategies nor chess players may be compared using only a single scalar measure. The search for a single scalar index, such as match score difference or Elo differential, to measure relative playing strength ignores the three-dimensional nature of the probability vector that generates chess game outcomes. For example, in this paper we looked at both distance and direction. Although no opening in this study exhibited this behavior, it is possible that an innovation that is rejected by the hypothesis test because of a large distance could still receive a grade of C if pw and pb are changed in the same proportion. In such a case the innovation would serve to change only the probability of decisive games, without a relative advantage to either side. A single scalar measure of strength differential will never be found because the idea is mathematically flawed.

A weakness in the methodology is the possibility that the results may to some extent be an artifact of the engine, since the same engine with the same parameters was used to play both sides of the board for every game in every experiment. Further research is under way to test this hypothesis. A weakness in the findings that requires further investigation is that there are some anomalies between the rarity data shown in Table 1 and the p-values in Table 3. Although there are some spectacular agreements, the overall lack of correlation between these values requires an explanation. For example, the Sicilian Alapin, an opening that appears to be as strong as the Sicilian mainline, is relatively unpopular, while openings graded as "F" in this study, such as the Scandinavian and the King's Gambit, are played by grandmasters and even promoted by analysts (Kebu Chess, 2012). Also, as in all experimental studies, we may have gained precision possibly at the expense of realism. The objective is the development of an objective measure for comparing openings that may serve to bring many endless and subjective debates to a satisfactory conclusion and help to refine the opening book.

7. REFERENCES

Abdi, H. (2007). Bonferroni Sidak. Retrieved February 2014, from utdallas.edu: http://www.utdallas.edu/~herve/Abdi-Bonferroni2007-pretty.pdf

Bauer, C. (2006). The Philidor Files. New York: Everyman Chess.

chessgames.com. (2014, February). King's Gambit Accepted. Retrieved February 2014, from chessgames.com: http://www.chessgames.com/perl/chess.pl?yearcomp=ge&year=2000&playercomp=either&pid=&player=&pid2=&player2=&movescomp=exactly&moves=&opening=&eco=C37&result=

Chessvibes. (2009, May). What baby names can tell you about chess openings. Retrieved February 2014, from chessvibes.com: http://www.chessvibes.com/columns/what-baby-names-can-tell-you-about-chess-openings

Eddleman, J. (2010, June). Destroying the Sicilian defense with the Alapin Variation. Retrieved February 2014, from voices.yahoo.com: http://voices.yahoo.com/destroying-sicilian-defense-chess-opening-with-6165180.html

Esserman, M. (2012). Mayhem in the Morra. Qualitychess.co.uk.

Fischer, R. (1961). A bust to the King's Gambit.
Retrieved February 2014, from academicchess.org: http://www.academicchess.org/images/pdf/chessgames/fischerbust.pdf

Gserper, G. (2009, November). Scotch Game. Retrieved February 2014, from chess.com: http://www.chess.com/article/view/openings-for-tactical-players-scotch-game

Gserper, G. (2010, January). Scandinavian center counter defense. Retrieved February 2014, from chess.com: http://www.chess.com/article/view/openings-for-tactical-players-scandinavian-center-counter--defense

Horowitz, I. A. (1964). Chess Openings: Theory and Practice. New York: Fireside Books.

Houdart, R. (2012). Houdini. Retrieved November 2013, from cruxis.com: http://www.cruxis.com/chess/houdini.htm

Johnson, V. E. (2013, November). Revised standards for statistical evidence. Proceedings of the National Academy of Sciences. Retrieved December 2013: http://www.pnas.org/content/110/48/19313.full

Jones, R., & Powell, D. (2014). Game Database. Retrieved February 2014, from chesstempo.com: http://chesstempo.com/game-database.html

Kebu Chess. (2008). Caro-Kann. Retrieved February 2014, from chessopenings.com: http://chessopenings.com/caro-kann/

Kebu Chess. (2008). Italian Game vs Ruy Lopez. Retrieved February 2014, from chessopenings.com: http://chessopenings.com/italian+game/?videoId=RuyVsItalian

Kebu Chess. (2008). King's Gambit. Retrieved February 2014, from chessopenings.com: http://chessopenings.com/kings+gambit/

KenilworthKibitzer. (2010, November). A bust to the Caro-Kann. Retrieved February 2014, from blogspot.com: http://kenilworthkibitzer.blogspot.com/2010/11/bust-to-caro-kann.html

Meyer-Kahlen, S. (2007). Deep Shredder. Retrieved January 2014, from Shredder Chess: http://www.shredderchess.com/chess-software/deep-shredder12.html

Munshi, J. (2014, February). B01 B53 Comparison. Retrieved February 2014, from Dropbox: https://www.dropbox.com/s/slvy0g7zycr0egl/B53B01Comparison.pdf

Munshi, J. (2014, March). B12 B53 Comparison. Retrieved March 2014, from Dropbox: https://www.dropbox.com/s/u06ix9jj28i9tte/B12B53%20Comparison.pdf

Munshi, J. (2014, February). B53 B07 Comparison. Retrieved February 2014, from Dropbox: https://www.dropbox.com/s/79rapv4z7ifzz25/B53B07Comparison.pdf

Munshi, J. (2014, February). B53 B15 Comparison. Retrieved February 2014, from Dropbox: https://www.dropbox.com/s/kh5aixxhvcehvvv/B53B15Comparison.pdf

Munshi, J. (2014, February). B53 B21 Comparison. Retrieved February 2014, from Dropbox: https://www.dropbox.com/s/mslcakx1o8fehal/B53B21Comparison.pdf

Munshi, J. (2014, February). B53 B22 Comparison. Retrieved February 2014, from Dropbox: https://www.dropbox.com/s/eyq1tyoldwlmspm/B53B22Comparison.pdf

Munshi, J. (2014, February). C68 C37 Comparison. Retrieved February 2014, from Dropbox: https://www.dropbox.com/s/rogj8pv3jnsgnwm/C68C37Comparison.pdf

Munshi, J. (2014, February). C68 C41 Comparison. Retrieved February 2014, from Dropbox.

Munshi, J. (2014, February). C68 C44 Comparison. Retrieved February 2014, from Dropbox: https://www.dropbox.com/s/ra8s3al5n0urkn5/C68C44Comparison.pdf

Munshi, J. (2014, February). C68 C50 Comparison.
Munshi, J. (2014, February). C68 C53 Comparison. Retrieved February 2014, from Dropbox: https://www.dropbox.com/s/evbp599nglefbiz/C68C53Comparison.pdf

Munshi, J. (2014, February). C68 C61 Comparison. Retrieved February 2014, from Dropbox: https://www.dropbox.com/s/yrswnnqz3vogmzo/C68C61Comparison.pdf

Munshi, J. (2014, February). Experiment 01. Retrieved February 2014, from Dropbox: https://www.dropbox.com/s/9y1efluh0gqaf8w/Experiment01.pdf

Munshi, J. (2014, February). Experiment 02. Retrieved February 2014, from Dropbox: https://www.dropbox.com/s/jbgrv7y5i10j2gf/Experiment02.pdf

Munshi, J. (2014, February). Experiment 03. Retrieved February 2014, from Dropbox: https://www.dropbox.com/s/fr0d35fu1cym4jx/Experiment03.pdf

Munshi, J. (2014, February). Experiment 04. Retrieved February 2014, from Dropbox: https://www.dropbox.com/s/wsfwqf3d7pbxefx/Experiment04.pdf

Munshi, J. (2014, February). Experiment 05. Retrieved February 2014, from Dropbox: https://www.dropbox.com/s/t07h5o8i6a40wmd/Experiment05.pdf

Munshi, J. (2014, February). Experiment 06. Retrieved February 2014, from Dropbox: https://www.dropbox.com/s/sybbg6q1g9l94u2/Experiment06.pdf

Munshi, J. (2014, February). Experiment 07. Retrieved February 2014, from Dropbox: https://www.dropbox.com/s/v1waagahg675o9o/Experiment07.pdf

Munshi, J. (2014, February). Experiment 08. Retrieved February 2014, from Dropbox: https://www.dropbox.com/s/p5ln7078b8cln7p/Experiment08.pdf

Munshi, J. (2014, February). Experiment 09. Retrieved February 2014, from Dropbox: https://www.dropbox.com/s/63y34wb670ke4bl/Experiment09.pdf

Munshi, J. (2014, February). Experiment 10. Retrieved February 2014, from Dropbox: https://www.dropbox.com/s/o09xgykrzfmucki/Experiment10.pdf

Munshi, J. (2014, February). Experiment 11. Retrieved February 2014, from Dropbox: https://www.dropbox.com/s/yugjl8lcj113kop/Experiment11.pdf

Munshi, J. (2014, February). Experiment 12. Retrieved February 2014, from Dropbox: https://www.dropbox.com/s/5b78mmmen2zshyl/Experiment12.pdf

Munshi, J. (2014, February). GM 2012 2013 Book. Retrieved February 2014, from Dropbox: https://www.dropbox.com/s/rw5tu7e8dc8cgqh/GM20122013Book.bkt

Munshi, J. (2014, February). Group 1 Simulations. Retrieved February 2014, from Dropbox: https://www.dropbox.com/s/6k23wu3yfzvyoun/Group1Simulations.pdf

Munshi, J. (2014, February). Group 2 Simulations. Retrieved February 2014, from Dropbox: https://www.dropbox.com/s/4imkhk9mu0j02ty/Group2Simulations.pdf

Munshi, J. (2014, March). Group 2 Simulations. Retrieved March 2014, from Dropbox: https://www.dropbox.com/s/21fewohnr60aufn/Group2SimulationData.pdf

Munshi, J. (2012). London classic openings mainline. Retrieved November 2013, from chess.com: http://www.chess.com/blog/Jamalov/london-classic-openings-main-line

Munshi, J. (2014, February). OpeningPaperData. Retrieved February 2014, from Dropbox: https://www.dropbox.com/sh/n9k3tqnwrlb92m2/S6vtPPU0RP
Munshi, J. (2013). Openings in the Carlsen games. Retrieved November 2013, from chess.com: http://www.chess.com/blog/Jamalov/openings-in-carlsen-games

Rajlich: Busting the King's Gambit. (2012, February). Retrieved February 2014, from chessbase.com: http://en.chessbase.com/post/rajlich-busting-the-king-s-gambit-this-time-for-sure

Scimia, E. (2013, December). Bird Defense. Retrieved January 2014, from About.com: http://chess.about.com/od/openings/ss/RuyLopez_4.htm

Simon, F. G. (1996). Recognition and search in simultaneous games. Retrieved January 2014, from Brunel University: http://bura.brunel.ac.uk/bitstream/2438/1338/1/recognition%20processes%20and%20lookahead%20search.pdf

Snell, C. M. (1997). Introduction to Probability. Retrieved January 2014, from Dartmouth.edu: https://www.dartmouth.edu/~chance/teaching_aids/books_articles/probability_book/amsbook.mac.pdf

Taylor, T. (1993). How to defeat the Smith Morra Gambit. Coraopolis, PA: Chess Enterprises.

Wikibooks. (2013). Chess Opening Theory. Retrieved November 2013, from Wikibooks: http://en.wikibooks.org/wiki/Chess_Opening_Theory

Wikipedia. (2014). Analysis of Variance. Retrieved January 2014, from Wikipedia: http://en.wikipedia.org/wiki/Analysis_of_variance

Wikipedia. (2014, February). Chess engine. Retrieved February 2014, from Wikipedia: http://en.wikipedia.org/wiki/Chess_engine

Wikipedia. (2014, January). Computer chess. Retrieved February 2014, from Wikipedia: http://en.wikipedia.org/wiki/Computer_chess

Wikipedia. (2014). Elo rating system. Retrieved February 2014, from Wikipedia: http://en.wikipedia.org/wiki/Elo_rating_system

Wikipedia. (2013). First-move advantage in chess. Retrieved November 2013, from Wikipedia: http://en.wikipedia.org/wiki/First-move_advantage_in_chess

Wikipedia. (2013). Houdini (chess). Retrieved November 2013, from Wikipedia: http://en.wikipedia.org/wiki/Houdini_(chess)

Wikipedia. (2013, March). Probability vector. Retrieved December 2013, from Wikipedia: http://en.wikipedia.org/wiki/Probability_vector