
Simulation leagues: Analysis of competition formats

2014


David Budden(1,2), Peter Wang(3), Oliver Obst(3), Mikhail Prokopenko(3)

(1) National ICT Australia (NICTA), Victoria Research Lab
(2) The University of Melbourne, Parkville, VIC 3010, Australia
(3) Statistical Learning, CSIRO Computational Informatics, Epping, NSW 1710, Australia

arXiv:1403.4023v1 [cs.MA] 17 Mar 2014

Abstract. The selection of an appropriate competition structure is critical for both the success and credibility of any competition, both real and simulated. In this paper, the automated parallelism offered by the RoboCup 2D simulation league is leveraged to conduct a 28,000 game round-robin between the top 8 teams from RoboCup 2012 and 2013. A proposed new competition structure is found to reduce variation from the resultant statistically significant team performance rankings by 75% and 67%, when compared to the actual competition results from RoboCup 2012 and 2013 respectively.

1 Introduction

1.1 The RoboCup humanoid challenge

RoboCup (the “World Cup” of robot soccer) was first proposed in 1997 as a standard problem for the evaluation of theories, algorithms and architectures in areas including artificial intelligence (AI), robotics and computer vision [1]. This proposal followed the observation that traditional AI problems were increasingly unable to meet these requirements and that a new challenge was necessary to initiate the development of next-generation technologies.

The overarching RoboCup goal of developing a team of humanoid robots capable of defeating the FIFA World Cup champion team, coined the “Millennium Challenge”, has proven a major factor in driving research in AI and related areas for over a decade, with a search for the term “RoboCup” in a major literature database yielding over 23,000 results.

Since 1997, researchers and competitors have decomposed this ambitious pursuit into two complementary categories [2]:

– Physical robot league: Using physical robots to play soccer games.
This category now contains many different leagues for both wheeled robots (small-sized [3] and mid-sized leagues [4]) and humanoids (standard platform league [5] and humanoid league [6]), with each focusing on different aspects of physical robot design [7], motor control and bipedal locomotion [8], real-time localisation [9] and computer vision [10].
– Software agent league: Using software or synthetic agents to play soccer games on an official soccer server over a network. This category contains both 2D [11,12] and 3D [13] simulation leagues.

The annual RoboCup competition, which attracted 2,500 participants and 40,000 spectators from 40 countries in 2013 [14], now includes a number of non-soccer competitions. The oldest and largest of these, RoboCup rescue, is also separated into physical and simulation leagues [15,16].

1.2 Significance of simulation leagues

The RoboCup simulation leagues traditionally involve the largest number of international participating teams, reaching 40 in 2013 [17]. The ability to simulate soccer matches without physical robots abstracts away low-level issues such as image processing and motor breakages, allowing teams to focus on the development of complex team behaviours and strategies for a larger number of autonomous agents. The “Millennium Challenge” requires robots to exhibit both physical and strategic prowess, necessitating this decomposition of the larger problem into more manageable, complementary tasks for concurrent development. The remainder of this section expands upon some specific contributions of the RoboCup simulation leagues toward this goal.

Financial inclusiveness of competing nations

The physical robots required by non-simulation leagues remain particularly expensive. As an example, the Robotis DARwIn-OP humanoid robot is currently advertised for ₩12,000,000 [7] (approximately US$10,000).
By removing these costs and those associated with robot repairs and transportation, the simulation leagues allow institutes with access to less funding to actively contribute to and participate in the RoboCup initiative.

To quantify this claim, Fig. 1 presents the PEoE (public expenditure on education as a percentage of GDP [18]) and GDP/cap (gross domestic product at purchasing power parity per capita [19]) for the home country of each participating [footnote 1] RoboCup 2013 team, averaged over each of the six largest RoboCup leagues. The countries participating in the standard platform league, which requires teams to field five Aldebaran Nao humanoid robots (with a retail value comparable to the DARwIn-OP), have the highest average PEoE and GDP/cap of any league considered. The kid-sized humanoid and rescue leagues, each of which requires the purchase or construction of physical robots, also involve countries with a high average PEoE and GDP/cap. Each of the three major simulation leagues (2D, 3D and rescue) exhibits significantly lower values, suggesting that the inclusion of simulation leagues supports financial inclusiveness within the competition.

[Footnote 1] In this instance, a team is considered “participating” if they have a team description paper published in the RoboCup symposium proceedings, detailing their contribution toward the RoboCup initiative.

[Figure 1: two bar charts by league (SPL, KSL, Rescue, 2DSim, 3DSim, RescSim). Left: public expenditure on education (% GDP). Right: GDP (PPP) per capita.]

Fig. 1. PEoE (public expenditure on education as a percentage of GDP [18]) and GDP/cap (gross domestic product at purchasing power parity per capita [19]) for the home country of each participating RoboCup 2013 team, averaged over each of the six largest RoboCup leagues.
Each of the three major simulation leagues (2D, 3D and rescue) exhibits significantly lower values than those requiring the purchase or development of physical robots.

Statistically significant analyses by automated competition parallelism

The automation of multiple parallel games makes RoboCup simulation leagues ideal platforms for analysing complex team behaviours. Most team games and sports (both real and virtual) are characterised by rich, dynamic interactions that influence the contest outcome. As described by Vilar et al., “quantitative analysis is increasingly being used in team sports to better understand performance in these stylized, delineated, complex social systems” [20].

Early examples of such quantitative analysis include sabermetrics, which attempts to “search for objective knowledge about baseball” by considering statistics of in-game activity [21]. A recent study by Fewell et al. involved the analysis of basketball games as networks, with properties including degree centrality, clustering, entropy and flow centrality calculated from measurements of ball position throughout the game [22]. This idea was extended by Vilar et al., who considered the local dynamics of collective team behaviour to quantify how teams occupy sub-areas of the field as a function of ball position [20].

Recently, Cliff et al. presented several information-theoretic methods of quantifying dynamic interactions in football games, using the RoboCup 2D simulation league as an experimental platform [23]. In particular, Cliff et al. calculated pairwise information transfer between each pair of agents, averaged over hundreds of 2D simulation league games, producing different diagrams of information “flow” during the games and enabling detailed tactical analysis.

The ability to automate thousands of simulation league games allows for the analysis of competition structures to determine which best approximate the true performance rankings of competing teams.
The selection of an appropriate competition structure (or format) is critical for both the success and credibility of any competition. Unfortunately, this choice is not straightforward: the format must minimise randomness relative to the true performance ranking of teams while keeping the number of games to a minimum, both to satisfy time constraints and to retain the interest of participants and spectators alike. Furthermore, maintaining competition interest introduces a number of constraints on competition structure: as an example, multiple games between the same two opponents (the obvious method of achieving a statistically significant ranking) should be avoided.

The remainder of this paper quantifies the appropriateness of different tournament structures (a major consideration in many human sports) by determining the statistically significant performance rankings of 2012 and 2013 RoboCup 2D simulation teams. A new competition structure is then proposed and verified by leveraging the automated parallelism facilitated by the 2D simulation league platform. In addition to demonstrating the utility of simulation leagues for statistical analysis of team sport outcomes given some system perturbation, it is anticipated that the adoption of the proposed structure would improve the success and credibility of the RoboCup simulation leagues in future years.

2 Previous competition structures

The following two competition structures were adopted by the RoboCup 2D simulation league in 2012 and 2013:

– In 2012, a total of 20 games were played to determine the final rank of the top 8 teams. Specifically, the top 4 teams played 6 games each (3 quarterfinal round-robin, 2 semifinal and 1 final/third place playoff), and the bottom 4 teams played 4 games each.
– In 2013, a double-elimination system was adopted, where a team ceases to be eligible to place first upon having lost 2 games [24,25].
A total of 16 games were played to determine the final rank of the top 8 teams. Specifically, 14 games were played in the double-elimination format (i.e. 2n − 2, n = 8) in addition to 2 classification games.

Previously, it has been unclear whether this change in competition structure improves the fairness and reproducibility of the final team rankings. In general, lack of reproducibility is due to non-transitivity of team performance (a well-known phenomenon that occurs frequently in actual human team sports). This may be addressed by a round-robin competition (where all 28 possible pairs of teams play against one another), yet it is also unclear whether this increase in the number of games is guaranteed to improve ranking stability.

3 Methods of ranking team performance

Before evaluating different competition structures, it is necessary to establish a fair (i.e. statistically significant) ranking of the top 8 RoboCup 2D simulation league teams for 2012 and 2013. This was accomplished by conducting an 8-team round-robin for both years, where all 28 pairs of teams play approximately 1000 games against one another. In addition, two different schemes were considered for point calculation:

– Continuous scheme: Teams are ranked by the sum of average points obtained against each opponent across all 1000 games.
– Discrete scheme: Firstly, the average score between each pair of teams (across all 1000 games) is rounded to the nearest integer (e.g. “1.9 : 1.2” is rounded to “2 : 1”). Next, points are allocated for each pairing based on these rounded results: 3 for a win, 1 for a draw and 0 for a loss. Teams are then ranked by the sum of these points received against each opponent.

The final rankings generated for 2012 and 2013 RoboCup 2D simulation league teams under these two schemes are presented in Sec. 5.1.
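The two point-calculation schemes above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the inputs are one team's per-opponent averages, and rounding halves upward is an assumption (the text only says "nearest integer"; Python's built-in `round` would round halves to even instead). The sample data is Marlik's 2012 row as it appears in Table 1 and Table 2.

```python
from typing import Dict, Tuple


def round_half_up(x: float) -> int:
    """Round a non-negative average to the nearest integer, halves up (assumed)."""
    return int(x + 0.5)


def discrete_points(avg_scores: Dict[str, Tuple[float, float]]) -> int:
    """Discrete scheme: round each average score, then award 3/1/0 per opponent."""
    total = 0
    for goals_for, goals_against in avg_scores.values():
        f, a = round_half_up(goals_for), round_half_up(goals_against)
        total += 3 if f > a else 1 if f == a else 0
    return total


def continuous_points(avg_points: Dict[str, float]) -> float:
    """Continuous scheme: sum the average points earned against each opponent."""
    return sum(avg_points.values())


# Marlik's 2012 round-robin averages, copied from Table 1 (points) and Table 2 (scores).
marlik_avg_points = {
    "Helios": 0.309, "Wright": 0.129, "Gliders": 1.147, "GDUT": 2.121,
    "AUT": 2.804, "Yushan": 0.874, "RobOTTO": 2.615,
}
marlik_avg_scores = {
    "Helios": (0.1, 1.4), "Wright": (0.3, 3.2), "Gliders": (0.46, 0.56),
    "GDUT": (1.4, 0.4), "AUT": (2.3, 0.1), "Yushan": (0.7, 1.2),
    "RobOTTO": (2.1, 0.2),
}

print(round_half_up(1.9), round_half_up(1.2))          # 2 1, as in the "1.9 : 1.2" example
print(discrete_points(marlik_avg_scores))              # 10
print(round(continuous_points(marlik_avg_points), 3))  # 9.999
```

Both totals agree with the Points columns reported for Marlik, which suggests the sketch matches the described schemes at least for rows without an exact half-goal average.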
Finally, in order to formally capture the overall difference between two rankings $r^a$ and $r^b$, the $L_1$ distance is utilised:

$$d_1(r^a, r^b) = \lVert r^a - r^b \rVert_1 = \sum_{i=1}^{n} \lvert r_i^a - r_i^b \rvert, \qquad (1)$$

where $i$ is the index of the $i$-th team in each ranking, $1 \le i \le 8$ (i.e. $n = 8$). The differences between rankings for different competition structures are presented in Sec. 5.2.

4 Proposed competition structure

Sec. 3 describes two schemes under which statistically significant rankings of RoboCup 2D simulation league teams can be achieved. However, it remains unclear whether the previously adopted competition structures are able to replicate these rankings with minimal noise for considerably fewer games, or whether a new format may achieve improved results in this regard. One possible format involves the following two steps:

– Firstly, a preliminary round-robin is conducted where 1 game is played for all 28 pairs of teams.
– Following the rankings obtained in the previous step, 4 classification games are played: the final between the top 2 teams and playoffs between third and fourth, fifth and sixth, and seventh and eighth places. It is possible to use the best-of-three format for each of these classification games.

The 32 games required by this competition structure could still fit readily in a 1-2 day time frame, particularly with 2 games running simultaneously as per RoboCup 2013.

5 Results

5.1 Statistically significant rankings versus previous competition structures

Following the iterated round-robin and two point calculation schemes described in Sec. 3, statistically significant rankings were generated for the top 8 RoboCup 2D simulation league teams for 2012 and 2013. These results are presented below.

RoboCup 2012 results

The final round-robin results of the top 8 teams for RoboCup 2012 are presented in Table 1 and Table 2, for the continuous and discrete scoring schemes described in Sec. 3 respectively.
Results are ordered according to actual performance at RoboCup 2012, $r^a$. Table 2 presents the continuous (non-rounded) scores averaged across the approximately 1000 games for each pair in the round-robin, in addition to the points allocated according to the discretisation scheme (3 for a win, 1 for a draw and 0 for a loss). The tie-breaker is the rounded goal difference (not shown), which was used only to separate first place (WrightEagle, +39) from second (Helios, +26). The final ranking corresponds exactly with that generated under the continuous scheme, as presented in Table 1.

Despite the agreement between continuous and discrete scoring schemes, it is obvious that this ranking (generated from the results of approximately 28,000 games) disagrees significantly with the actual RoboCup 2012 results. This can be quantified using the distance metric defined in (1):

$$d_1(r^a, r^c)_{2012} = |1-2| + |2-1| + |3-5| + |4-4| + |5-6| + |6-8| + |7-3| + |8-7| = 12,$$

where $r^a$ represents the actual RoboCup 2012 rankings and $r^c$ represents the ranking generated by the round-robin under the continuous scoring scheme. This large difference suggests that the 2012 competition format did not succeed in capturing the true team performance ranking.

RoboCup 2013 results

The final round-robin results of the top 8 teams for RoboCup 2013 are presented in Table 3 and Table 4, for the continuous and discrete scoring schemes described in Sec. 3 respectively. Results are ordered according to actual performance at RoboCup 2013, $r^a$, and presented in the same format as Table 1 and Table 2 for RoboCup 2012. Unlike RoboCup 2012, there is a slight disagreement between the rankings generated using continuous and discrete scoring schemes, with a swap between third and fourth teams.
Again using the distance metric defined in (1), the difference between these rankings and the actual RoboCup 2013 results can be quantified:

$$d_1(r^a, r^c)_{2013} = |1-1| + |2-2| + |3-4| + |4-8| + |5-6| + |6-3| + |7-7| + |8-5| = 12,$$
$$d_1(r^a, r^d)_{2013} = |1-1| + |2-2| + |3-3| + |4-8| + |5-6| + |6-4| + |7-7| + |8-5| = 10,$$

where $r^a$ represents the actual RoboCup 2013 rankings, while $r^c$ and $r^d$ represent the rankings generated by the round-robin under the continuous and discrete scoring schemes respectively. It is evident that the 2013 double-elimination format yielded as much overall divergence as the 2012 single-elimination format, but with slightly fewer individual discrepancies. It is also clear that, given very small points differences between adjacent teams, it may be necessary to play classification games even after a statistically significant round-robin. It is therefore proposed that the format described in Sec. 4 should improve reliability of the competition outcomes.

r^a  Team     Helios  Wright  Marlik  Gliders  GDUT   AUT    Yushan  RobOTTO  Points  Goal diff  Rank, r^c
1    Helios   -       1.397   2.442   2.517    2.948  2.970  2.880   2.998    18.152  +26.0      2
2    Wright   1.406   -       2.792   2.835    2.900  2.998  2.970   2.998    18.899  +38.7      1
3    Marlik   0.309   0.129   -       1.147    2.121  2.804  0.874   2.615     9.999   +0.3      5
4    Gliders  0.261   0.102   1.396   -        1.809  2.957  0.903   2.863    10.291   +3.4      4
5    GDUT     0.029   0.074   0.633   0.960    -      2.955  0.552   2.597     7.800   -6.0      6
6    AUT      0.007   0.001   0.107   0.026    0.024  -      0.003   0.209     0.377  -39.3      8
7    Yushan   0.084   0.021   1.822   1.875    2.316  2.994  -       2.993    12.105   +6.5      3
8    RobOTTO  0.001   0.001   0.233   0.087    0.228  2.418  0.005   -         2.973  -29.6      7

Table 1. Round-robin results (average points scored) for the top 8 teams from RoboCup 2012, ordered according to their final competition rank, $r^a$. The final points for each team were determined by summing the average points scored against each opponent over approximately 1000 games, resulting in the round-robin with continuous point allocation scheme ranking, $r^c$.

Team     Helios   Wright    Marlik     Gliders    GDUT     AUT       Yushan   RobOTTO   Goals      Points  r^d
Helios   -        2.3:2.3   1.4:0.1    1.6:0.1    4.4:0.2  7.7:0.0   4.5:0.7  7.6:0.1   29.5:3.5   19      2
Wright   2.3:2.3  -         3.2:0.3    3.3:0.2    5.8:1.2  12.1:0.1  7.2:1.0  10.1:0.2  44.0:5.3   19      1
Marlik   0.1:1.4  0.3:3.2   -          0.46:0.56  1.4:0.4  2.3:0.1   0.7:1.2  2.1:0.2   7.4:7.1    10      5
Gliders  0.1:1.6  0.2:3.3   0.56:0.46  -          1.9:1.2  4.3:0.1   1.4:2.2  4.6:0.8   13.1:9.7   12      4
GDUT     0.2:4.4  1.2:5.8   0.4:1.4    1.2:1.9    -        4.0:0.2   2.0:3.9  3.4:0.8   12.4:18.4   6      6
AUT      0.0:7.7  0.1:12.1  0.1:2.3    0.1:4.3    0.2:4.0  -         0.1:7.1  0.7:3.1   1.3:40.6    0      8
Yushan   0.7:4.5  1.0:7.2   1.2:0.7    2.2:1.4    3.9:2.0  7.1:0.1   -        6.7:0.4   22.8:16.3  13      3
RobOTTO  0.1:7.6  0.2:10.1  0.2:2.1    0.8:4.6    0.8:3.4  3.1:0.7   0.4:6.7  -         5.6:35.2    3      7

Table 2. Round-robin results (average goals scored and discretised points allocated) for the top 8 teams from RoboCup 2012, ordered according to their final competition rank, $r^a$. Discretised points are determined by calculating the average number of goals scored over approximately 1000 games rounded to the nearest integer, then awarding 3 points for a win, 1 point for a draw and 0 points for a loss. The resultant round-robin with discrete point allocation scheme ranking, $r^d$, is equivalent to that generated under the continuous scheme.
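The distances quoted in Sec. 5.1 follow directly from the metric in (1) and can be re-derived with a short Python sketch. The function name `l1_distance` is an illustrative choice rather than anything from the paper; the rank vectors are copied from the $r^c$ and $r^d$ values reported for each team, listed in order of actual competition rank $r^a$.

```python
from typing import Sequence


def l1_distance(ra: Sequence[int], rb: Sequence[int]) -> int:
    """L1 distance between two rankings of the same teams, as in equation (1)."""
    if len(ra) != len(rb):
        raise ValueError("rankings must cover the same number of teams")
    return sum(abs(a - b) for a, b in zip(ra, rb))


# Teams are listed in order of actual competition rank, so r^a is simply 1..8.
actual = [1, 2, 3, 4, 5, 6, 7, 8]

# Round-robin ranks of the same teams, in the same order.
rc_2012 = [2, 1, 5, 4, 6, 8, 3, 7]  # continuous scheme, 2012
rc_2013 = [1, 2, 4, 8, 6, 3, 7, 5]  # continuous scheme, 2013
rd_2013 = [1, 2, 3, 8, 6, 4, 7, 5]  # discrete scheme, 2013

print(l1_distance(actual, rc_2012))  # 12
print(l1_distance(actual, rc_2013))  # 12
print(l1_distance(actual, rd_2013))  # 10
```

Because the metric sums absolute rank displacements, a single team that is far out of place (such as Yushan in 2012, seventh in competition but third in the round-robin) contributes as much as several adjacent swaps.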
r^a  Team     Wright  Helios  Yushan  Axiom  Gliders  Oxsy   AUT    Cyrus  Points  Goal diff  Rank, r^c
1    Wright   -       1.877   2.470   2.880  2.397    2.901  2.991  2.792  18.308  +22.5      1
2    Helios   0.883   -       2.841   2.940  2.194    2.343  2.969  2.767  16.937  +14.9      2
3    Yushan   0.406   0.093   -       2.506  1.892    1.557  2.059  0.921   9.434   -1.3      4
4    Axiom    0.072   0.042   0.367   -      0.590    0.395  1.224  1.023   3.713  -14.5      8
5    Gliders  0.437   0.490   0.884   2.249  -        1.612  1.871  0.828   8.371   -2.0      6
6    Oxsy     0.065   0.385   1.159   2.437  1.105    -      2.225  2.167   9.543   -2.2      3
7    AUT      0.006   0.017   0.718   1.491  0.878    0.575  -      0.731   4.416  -14.0      7
8    Cyrus    0.137   0.136   1.791   1.740  1.926    0.632  2.046  -       8.408   -3.4      5

Table 3. Round-robin results (average points scored) for the top 8 teams from RoboCup 2013, ordered according to their final competition rank, $r^a$. The final points for each team were determined by summing the average points scored against each opponent over approximately 1000 games, resulting in the round-robin with continuous point allocation scheme ranking, $r^c$.

Team     Wright   Helios   Yushan   Axiom    Gliders  Oxsy     AUT      Cyrus    Goals      Points  r^d
Wright   -        1.9:1.2  2.8:0.9  4.9:0.3  2.5:0.7  5.4:0.8  6.4:0.3  3.4:0.6  27.3:4.8   21      1
Helios   1.2:1.9  -        2.8:0.2  4.1:0.2  1.2:0.2  2.2:0.4  4.1:0.1  2.5:0.2  18.1:3.2   18      2
Yushan   0.9:2.8  0.2:2.8  -        2.7:0.8  1.8:1.1  1.4:1.2  1.7:0.8  0.9:1.4  9.6:10.9   11      3
Axiom    0.3:4.9  0.2:4.1  0.8:2.7  -        1.0:2.3  0.7:2.7  0.9:1.1  1.4:2.0  5.3:19.8    1      8
Gliders  0.7:2.5  0.2:1.2  1.1:1.8  2.3:1.0  -        1.3:1.0  1.8:1.1  0.9:1.7  8.3:10.3    7      6
Oxsy     0.8:5.4  0.4:2.2  1.2:1.4  2.7:0.7  1.0:1.3  -        2.3:0.8  2.2:1.0  10.6:12.8  11      4
AUT      0.3:6.4  0.1:4.1  0.8:1.7  1.1:0.9  1.1:1.8  0.8:2.3  -        0.8:1.8  5.0:19.0    1      7
Cyrus    0.6:3.4  0.2:2.5  1.4:0.9  2.0:1.4  1.7:0.9  1.0:2.2  1.8:0.8  -        8.7:12.1   10      5

Table 4. Round-robin results (average goals scored and discretised points allocated) for the top 8 teams from RoboCup 2013, ordered according to their final competition rank, $r^a$.
Discretised points are determined by calculating the average number of goals scored over approximately 1000 games rounded to the nearest integer, then awarding 3 points for a win, 1 point for a draw and 0 points for a loss. The tie-breaker is the total of rounded goal differences (not shown). The resultant round-robin with discrete point allocation scheme ranking, $r^d$, is slightly different to that generated under the continuous scheme.

5.2 Evaluation of proposed competition structures

In order to evaluate the proposed competition structure described in Sec. 4, the actual game results from RoboCup 2012 and 2013 were used where possible. As these previous formats do not necessarily require all pairs of teams to play against one another, some of these results are not available: in these cases, the average scores from Table 2 and Table 4 were utilised for RoboCup 2012 and 2013 respectively. Using these results, it is possible to infer final rankings for RoboCup 2012 and 2013 under the proposed competition structure. These results are presented below.

RoboCup 2012 results

The combined actual and average results of the top 8 teams from RoboCup 2012 are presented in Table 5, in addition to the inferred final ranking, $r^p$, for RoboCup 2012 under the competition structure proposed in Sec. 4. Using the distance metric defined in (1), the difference between $r^p$ and the ranking generated from the 28,000 game round-robin, $r^c$, can be quantified:

$$d_1(r^p, r^c)_{2012} = |1-2| + |2-1| + |4-5| + |5-4| + |6-6| + |8-8| + |3-3| + |7-7| = 4.$$

This is a considerably smaller difference than the 12 produced under the actual RoboCup 2012 format, suggesting that the proposed format better captures the true team performance ranking. Furthermore, this result is achieved using a majority of actual game results (i.e. 18 from 28 pairs, with only 10 using the averages from Table 2).
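The two-step format proposed in Sec. 4 (a single round-robin followed by four classification games between adjacent ranks) can be sketched as below. This is an illustrative Python outline, not the authors' tooling: the function names are hypothetical, the points are the 2012 round-robin totals reported in Table 5, and ties are broken by input order, which is an assumption since the text does not specify a tie-breaker for this step.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# 2012 round-robin points under the proposed format (Points column of Table 5).
points: Dict[str, int] = {
    "Helios": 21, "Wright": 18, "Marlik": 13, "Gliders": 9,
    "GDUT": 9, "AUT": 3, "Yushan": 10, "RobOTTO": 0,
}

# Step 1: one game for every pair of teams.
round_robin_games = list(combinations(points, 2))


def preliminary_order(pts: Dict[str, int]) -> List[str]:
    """Order teams by points, highest first.

    sorted() is stable, so tied teams keep their input order (an assumed tie-break).
    """
    return sorted(pts, key=lambda team: -pts[team])


def classification_pairs(order: List[str]) -> List[Tuple[str, str]]:
    """Step 2: pair 1st v 2nd, 3rd v 4th, 5th v 6th and 7th v 8th."""
    return [(order[i], order[i + 1]) for i in range(0, len(order), 2)]


order = preliminary_order(points)
playoffs = classification_pairs(order)

print(len(round_robin_games))                  # 28
print(order)
print(playoffs)
print(len(round_robin_games) + len(playoffs))  # 32
```

With these inputs the classification games are Helios v Wright, Marlik v Yushan, Gliders v GDUT and AUT v RobOTTO, and the total of 32 games matches the count given in Sec. 4. A best-of-three playoff would add games beyond that total.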
Team     Helios   Wright    Marlik   Gliders  GDUT     AUT       Yushan   RobOTTO   Points  Rank  r^p
Helios   -        4:1       4:0      1:0      4.4:0.2  1:0       4.5:0.7  2:0       21      1     1
Wright   1:4      -         2:1      2:0      5:1      12.1:0.1  6:1      10.1:0.2  18      2     2
Marlik   0:4      1:2       -        1:0      1:0      2.3:0.1   1:1      2.1:0.2   13      3     4
Gliders  0:1      0:2       0:1      -        1.9:1.2  2:0       1.4:2.2  3:0        9      5     5
GDUT     0.2:4.4  1:5       0:1      1.2:1.9  -        1:0       3:2      3.4:0.8    9      6     6
AUT      0:1      0.1:12.1  0.1:2.3  0:2      0:1      -         0.1:7.1  1:0        3      7     8
Yushan   0.7:4.5  1:6       1:1      2.2:1.4  2:3      7.1:0.1   -        3:1       10      4     3
RobOTTO  0:2      0.2:10.1  0.2:2.1  0:3      0.8:3.4  0:1       1:3      -          0      8     7

Table 5. Combined actual and average results for the top 8 teams from RoboCup 2012, ordered according to their final competition rank. Each goal difference represents the actual (integer) game results from RoboCup 2012 where possible. As this previous format does not necessarily require all pairs of teams to play against one another, some of these results are not available: in these cases, the average (continuous-valued) scores from Table 2 were utilised. Using these results, it is possible to infer the final ranking, $r^p$, for RoboCup 2012 under the competition structure proposed in Sec. 4.

RoboCup 2013 results

The combined actual and average results of the top 8 teams from RoboCup 2013 are presented in Table 6, in addition to the inferred final ranking, $r^p$, for RoboCup 2013 under the competition structure proposed in Sec. 4. Again using the distance metric defined in (1), the difference between $r^p$ and the ranking generated from the 28,000 game round-robin, $r^c$ or $r^d$, can be quantified:

$$d_1(r^p, r^c)_{2013} = |1-1| + |2-2| + |3-4| + |8-8| + |4-6| + |5-3| + |7-7| + |6-5| = 6,$$
$$d_1(r^p, r^d)_{2013} = |1-1| + |2-2| + |3-3| + |8-8| + |4-6| + |5-4| + |7-7| + |6-5| = 4.$$

Similarly to the results for 2012, these are considerably smaller differences than the 12 (or 10) produced under the actual RoboCup 2013 format, providing further evidence that the proposed format better captures the true team performance ranking.
Again, this result is achieved using a majority of actual game results (i.e. 15 from 28 pairs, with only 13 using the averages from Table 4).

Team     Points  Rank  r^p
Wright   21      1     1
Helios   18      2     2
Yushan   12      3     3
Axiom     4      7     8
Gliders  10      4     4
Oxsy      9      5     5
AUT       3      8     7
Cyrus     6      6     6

[Table 6 also lists a score for each pair of teams; those cells are not recoverable here, so only the summary columns are shown.]

Table 6. Combined actual and average results for the top 8 teams from RoboCup 2013, ordered according to their final competition rank. Each goal difference represents the actual (integer) game results from RoboCup 2013 where possible. As this previous format does not necessarily require all pairs of teams to play against one another, some of these results are not available: in these cases, the average (continuous-valued) scores from Table 4 were utilised. Using these results, it is possible to infer the final ranking, $r^p$, for RoboCup 2013 under the competition structure proposed in Sec. 4.

6 Conclusions

The selection of an appropriate competition structure (format) is critical for both the success and credibility of any competition. This is particularly true in the RoboCup 2D simulation league, which provides an ideal computational platform for examining different formats by facilitating automated parallel execution of a statistically significant number of games. A 28,000 game round-robin competition was conducted between the top 8 2D simulation league teams from both RoboCup 2012 and 2013.
The difference between the resultant rankings was calculated relative to the actual results of RoboCup 2012 and 2013 (12 and 12 respectively) and compared to those that would have resulted under a proposed new structure (4 and 6 respectively). This suggests a significant reduction in randomness relative to true team performance rankings while only requiring the number of games to be increased to 32; a number that would still fit readily in a 1-2 day time frame, particularly utilising the round-robin parallelism enabled by the stable 2D simulation platform.

The RoboCup “Millennium Challenge” requires robots to exhibit both physical and strategic prowess, necessitating the decomposition of the larger problem into both physical robot and simulation leagues. Although often overlooked, the simulation leagues contribute significantly to this goal, both through improving financial inclusiveness of competing nations and providing a stable platform for statistically significant analysis of team behaviour and competition structure. In addition to highlighting the latter of these contributions, it is anticipated that the introduction of the proposed new format will improve the reliability of final competition rankings and consequently the success and credibility of the RoboCup simulation leagues in future years.

Acknowledgement

NICTA is funded by the Australian Government as represented by the Department of Broadband, Communications and the Digital Economy and the Australian Research Council through the ICT Centre of Excellence program.

References

1. Kitano, H., Asada, M., Kuniyoshi, Y., Noda, I., Osawa, E.: RoboCup: The robot world cup initiative. In: Proceedings of the first international conference on autonomous agents, ACM (1997) 340–347
2. Kitano, H., Asada, M.: The robocup humanoid challenge as the millennium challenge for advanced robotics. Advanced Robotics 13(8) (1998) 723–736
3. RoboCup Technical Committee: Laws of the RoboCup Small Size League 2013. http://robocupssl.cpe.ku.ac.th/_media/rules:ssl-rules-2013-2.pdf (2013)
4. RoboCup Technical Committee: Middle Size Robot League rules and regulations for 2013. http://wiki.robocup.org/images/9/98/Msl_rules_2013.pdf (2013)
5. RoboCup Technical Committee: RoboCup Standard Platform League (NAO) rule book. http://www.tzi.de/spl/pub/Website/Downloads/Rules2013.pdf (2013)
6. RoboCup Technical Committee: RoboCup Soccer Humanoid League rules and setup. http://www.tzi.de/humanoid/pub/Website/Downloads/HumanoidLeagueRules2013-05-28.pdf (2013)
7. Ha, I., Tamura, Y., Asama, H., Han, J., Hong, D.W.: Development of open humanoid platform DARwIn-OP. In: SICE Annual Conference (SICE), 2011 Proceedings of, IEEE (2011) 2178–2181
8. Budden, D., Walker, J., Flannery, M., Mendes, A.: Probabilistic gradient ascent with applications to bipedal robot locomotion. In: Australasian Conference on Robotics and Automation (ACRA). (2013)
9. Budden, D., Prokopenko, M.: Improved particle filtering for pseudo-uniform belief distributions in robot localisation. In: RoboCup 2013: Robot Soccer World Cup XVII, Springer (2013)
10. Budden, D., Mendes, A.: Unsupervised recognition of salient colour for real-time image processing. In: RoboCup 2013: Robot Soccer World Cup XVII, Springer (2013)
11. Chen, M., Foroughi, E., Heintz, F., Huang, Z., Kapetanakis, S., Kostiadis, K., Kummeneje, J., Noda, I., Obst, O., Riley, P., Steffens, T., Wang, Y., Yin, X.: RoboCup Soccer Server. http://wwfc.cs.virginia.edu/documentation/manual.pdf
12. Prokopenko, M., Obst, O., Wang, P., Budden, D., Cliff, O.: Gliders2013: Tactical analysis with information dynamics. In: RoboCup 2013 symposium and competitions: Team description papers, Eindhoven, The Netherlands, June 2013. (2013)
13. RoboCup Technical Committee: RoboCup Soccer Simulation League 3D competition rules and setup. http://homepages.herts.ac.uk/~sv08aav/RCSoccerSim3DRules2013.1.pdf (2013)
14. Butler, K.: RoboCup 2013: Humanoid robots play soccer for world title. http://www.upi.com/Science_News/Blog/2013/07/01/RoboCup-2013-Humanoid-robots-play (2013)
15. Kitano, H., Tambe, M., Stone, P., Veloso, M., Coradeschi, S., Osawa, E., Matsubara, H., Noda, I., Asada, M.: The RoboCup synthetic agent challenge 97. In: RoboCup-97: Robot Soccer World Cup I, Springer (1998) 62–73
16. Kitano, H., Tadokoro, S.: Robocup rescue: A grand challenge for multiagent and intelligent systems. AI Magazine 22(1) (2001) 39
17. Bai, A., Chen, X., MacAlpine, P., Urieli, D., Barrett, S., Stone, P.: WrightEagle and UT Austin Villa: RoboCup 2011 Simulation League Champions. In: RoboCup 2011: Robot Soccer World Cup XV. Springer (2012) 1–12
18. World Bank: World Development Indicators 2012. http://data.worldbank.org/data-catalog/world-development-indicators (2012)
19. World Bank: GDP per capita, PPP (current international $). http://data.worldbank.org/indicator/NY.GDP.PCAP.PP.CD (2012)
20. Vilar, L., Araújo, D., Davids, K., Bar-Yam, Y.: Science of winning soccer: Emergent pattern-forming dynamics in association football. Journal of Systems Science and Complexity 26(1) (2013) 73–84
21. Grabiner, D.: The sabermetrics manifesto. http://seanlahman.com/baseball-archive/sabermetrics/sab (2004)
22. Fewell, J., Armbruster, D., Ingraham, J., Petersen, A., Waters, J.: Basketball teams as strategic networks. PloS one 7(11) (2012) e47445
23. Cliff, O., Lizier, J., Wang, R., Wang, P., Obst, O., Prokopenko, M.: Towards quantifying interaction networks in a football match. In: RoboCup 2013: Robot Soccer World Cup XVII, Springer (2013)
24. David, H.A.: The method of paired comparisons. DTIC Document 12 (1963)
25. Edwards, C.T.: Double-elimination tournaments: Counting and calculating. The American Statistician 50(1) (1996) 27–33