Minor Report Content

1. INTRODUCTION
1.1 INTRODUCTION TO CRICKET:
Cricket is a sport that originated in England in the 16th century and later spread to
her colonies. The first international game however did not feature England but
was played between Canada and the United States in 1844 at the grounds of the
St. George’s Cricket Club in New York. In time, in both of these countries,
cricket took a back seat to other, faster sports like ice-hockey, basketball, and
baseball.
International cricket is played today by a number of British Commonwealth
countries, the main ones being Australia, Bangladesh, England, India, New
Zealand, Pakistan, South Africa, Sri Lanka, West Indies and Zimbabwe. These
teams are members of the International Cricket Council (ICC).
Figure 1.1: A cricket field showing the location of the pitch and some possible
fielding positions for players.
Cricket is played on an oval-shaped playing field and, apart from baseball, is the
only major international sport that does not define an exact size for the playing
field. The main action takes place on a rectangular 22 yard area called the pitch
in the middle of the large playing field. A diagram showing a cricket field is
given in Figure 1.1 and the pitch is magnified in Figure 1.2.
Cricket is a game played between two teams of 11 players each, where the two
teams alternate scoring (batting) and defending (fielding). A player (bowler) from
the fielding team delivers a ball to a player (batsman) from the batting team, who
should strike it with a bat in order to score while the rest of the fielding team
(fielders) defend the scoring.
Figure 1.2: The layout of the pitch
Furthermore, though it is a team sport, the bowler and batsman in particular, and
fielders to some extent, act on their own, each carrying out certain solitary actions
independently. A similar sport with respect to individual duties is baseball. The
simplicity of these actions (relative to sports such as hockey, soccer and basketball)
facilitates statistical modelling. In the process of batting, batsmen can get dismissed
(get out) due to a variety of lapses on their part. When all the batsmen from the
batting team have been dismissed, or the batting side has faced its allotted number
of overs (each over normally consists of six balls), that team’s turn (called an innings)
is concluded. Their score (number of runs) is recorded. A photo of some game action
is provided in Figure 1.3. The teams then change places and the fielding team now
gets to wield the bat and try to overtake the score of the team that batted first. At the
end of one such set of innings (in the shorter versions of cricket) and two such sets of
innings (in the longer version of cricket) the winner is selected on the basis of the
most runs scored. This is a very simplified explanation of a very complex game, and
there are many variables and constraints that come into play. For more details on
cricket, see http://www.icc-cricket.com/cricket-rules-and-regulations/.
When international cricket matured, the standard format was a match that could last up to
five whole days. This format is called a test match. But even after five days of play
the match could end in a draw which means that there is no winner. This was fine in
a more leisurely age when both players and spectators had more time, when playing
the game was more important than winning, and when most cricketers were amateur
players. But as lifestyles became faster, spectators became ever more reluctant or
unable to spend five days watching one match (sometimes with no result).
Meanwhile, other faster sports became crowd pullers and earned much more in the
way of ticket sales and TV rights. As a result, cricketers became the poor relatives in
the sports world. In the 1960s a shorter version of cricket, called ‘one-day cricket’, was
developed, with each batting side given 65 overs, and later 50 overs, in which to
score runs. When this format was used in international matches, they became known
as one-day Internationals, or ODIs.
Figure 1.3: The pitch in use, with bowler, batsman, fielders and Umpire (referee).
This version of cricket was much more exciting to watch, as the batsmen had to
wield the bat aggressively. Compared to the five-day long test matches, the advent of
the 50-over format was a dramatic improvement in terms of spectator entertainment.
However, even a 50-over match lasted about 8 hours and could not compete with the
two-to-three-hour match times and attention spans of the fans of ice-hockey,
football, baseball and basketball. As competition increased for the sports fans’ dollar,
and TV advertising income was linked directly to the number of viewers, it was
inevitable that a shorter format for cricket would emerge. With declining ticket sales
and dwindling sponsorships, the England and Wales Cricket Board (ECB) discussed
the options for a shorter and more entertaining game limited to twenty overs per side
and the first official game in this format was played on June 13, 2003. This version
became known as Twenty20. Since then, Twenty20 cricket has exploded in
popularity with the label “Twenty20” being shortened to T20. In 2008 the Indian
Premier League (IPL) was inaugurated using the T20 format.
1.2 CRICKET DATA ANALYSIS:
Cricket Data Analysis is a field that combines the sport of cricket and the use of data
analysis techniques to understand and improve performance. Cricket is a highly
competitive sport that has millions of fans worldwide, and as a result, there is a vast
amount of data generated from every match. This data provides valuable information
about the performance of teams and players that can be used to make informed
decisions. The use of data analysis in cricket has been growing in recent years. From
player performance analysis to team strategy and tactical decision-making, the
insights generated from data analysis can greatly improve the performance of teams
and players. This project aims to analyze the performance of cricket teams and
players using statistical methods and provide insights into their strengths and
weaknesses. The data used for this analysis will be collected from various sources
and will include information such as runs scored and other relevant performance
metrics. The results of this analysis will be presented in a meaningful and concise
manner to provide valuable insights for coaches, managers and analysts. Overall, this
project will demonstrate the importance of data analysis in the sport of cricket and
how it can be used to gain a competitive advantage.
Cricket data analysis in modern sports holds immense significance as it provides a
systematic and data-driven approach to understanding and improving player
performance, team strategies, and overall game dynamics. Here are several key
reasons why cricket data analysis is crucial in the contemporary sports landscape:
1. Performance Optimization: Analysing player statistics allows teams to
identify strengths and weaknesses, enabling targeted training programs to
enhance individual player performance. Coaches can use data to tailor
strategies that capitalize on a player's strengths while mitigating weaknesses.
2. Strategic Decision-Making: Teams can use historical match data to
formulate effective game strategies. Analysis of opposition players and teams
helps in devising specific tactics, such as field placements, bowling
variations, and batting approaches, based on past performances.
3. Injury Prevention: Monitoring players' workload and performance metrics
can aid in injury prevention. By analyzing fatigue levels, bowling workloads,
and injury histories, teams can manage player fitness more effectively and
make informed decisions about player rotations and rest periods.
4. Scouting and Recruitment: Data analysis is invaluable in scouting new
talent. Teams can identify promising players by assessing their performance
metrics in domestic leagues and youth tournaments. This aids in making
informed decisions during player recruitment.
5. Fan Engagement: Cricket data analysis contributes to a richer fan
experience. Advanced statistics and visualizations provide fans with deeper
insights into the game, fostering a more informed and engaged fan base.
6. Media and Broadcasting: Media outlets and broadcasters use data analysis
to enhance their coverage by providing in-depth insights, statistics, and
visualizations during live broadcasts. This adds a layer of analysis and
commentary that enriches the viewing experience for fans.
7. Team Management and Planning: Coaches and team management can use
data to plan match strategies, make informed selections, and assess the
overall team dynamics. This helps in optimizing team composition and
fostering a cohesive and effective playing unit.
8. Performance Benchmarking: Comparing players and teams across different
formats and conditions allows for benchmarking performance. This aids in
understanding the adaptability and consistency of players and teams in
various situations.
1.3 Aim and Objectives
Cricket, beyond its athletic prowess, is a game deeply intertwined with statistics.
Every ball bowled, every run scored, and every wicket taken contributes to an
intricate tapestry of data that holds the key to understanding player performance,
team dynamics, and strategic nuances. In this project, we embark on an end-to-end
data analytics journey, leveraging web scraping, Python, Pandas, and Power BI to
unravel the mysteries hidden within the vast expanse of cricket statistics.
A. Data Collection (Web Scraping):

The primary objective of this project is to collect a rich repository of cricket
statistics from reputable sources, setting the stage for in-depth analysis. The
challenges that loom on the horizon include navigating the intricacies of
dynamic content loading, pagination, and ensuring the integrity of the data
during the web scraping process.
B. Data Cleaning and Preprocessing:

With a focus on data accuracy, the project aims to meticulously clean and
preprocess the collected dataset. Addressing missing values, standardizing
data types, and tackling outliers are among the challenges that demand
careful consideration.
C. Exploratory Data Analysis (EDA):

This phase is dedicated to peeling back the layers of the dataset, revealing its
inherent patterns, trends, and correlations. The utilization of visualization
tools such as Matplotlib and Seaborn is paramount in facilitating a deeper
understanding of key performance indicators.
D. Statistical Analysis:
Calculation of essential performance metrics, including batting averages,
strike rates, and team win-loss ratios, forms the bedrock of statistical
analysis. This stage incorporates the implementation of statistical tests to
validate assumptions and unearth significant differences within the dataset.
E. Feature Engineering:
Adding a layer of complexity to the analysis, the project aims to create new
features that offer additional insights into player form, team dynamics, and
recent performance. The challenge lies in defining features that not only
enhance the analysis but align seamlessly with the overarching project
objectives.
F. Dashboard Creation (Power BI):

The visualization of data is a crucial aspect of this project, and Power BI
emerges as the tool of choice. The objective here is to design an interactive
dashboard that not only presents the insights gleaned from the data but also
provides users with a dynamic and user-friendly platform for exploration.
G. User Interaction and Filtering:

To ensure a seamless user experience, the project endeavors to implement
filters and slicers within Power BI, empowering users to dynamically explore
the intricacies of the cricket data. The challenge lies in achieving this
interactivity without compromising performance.
H. Insights and Recommendations:

The ultimate goal of the project is to derive actionable insights from the
analyzed cricket data. This involves translating statistical findings into
meaningful recommendations for teams, players, or strategic decisions.
Effective communication of these insights is a challenge that demands a
balance between technical accuracy and accessibility.
I. Documentation:
A comprehensive documentation strategy is imperative for ensuring the
reproducibility and clarity of the entire analytics process. This includes
documenting each step from data collection to analysis, with a keen focus on
creating materials accessible to both technical and non-technical audiences.
J. Presentation:
The culmination of the project involves the preparation of a presentation that
encapsulates the key aspects for diverse stakeholders. The challenge here is
to communicate technical findings in a clear and engaging manner, bridging
the gap between the intricacies of data analytics and the understanding of a
broader audience.
K. Portfolio Enhancement:
Beyond the analytical journey itself, the project seeks to contribute to the
individual's professional growth. Uploading the project code, documentation,
and presentation to a GitHub repository serves not only as a testament to the
acquired skills but also as a valuable addition to the individual's data science
portfolio.
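The cleaning steps named in objective B (handling missing values and standardizing data types) can be sketched with Pandas as follows. This is only an illustration under stated assumptions: the column names (`name`, `runs`, `balls`, `strike_rate`), the `clean_batting` helper and the sample records are hypothetical and do not reflect the project's actual schema or data. Outlier handling is left out for brevity.

```python
import pandas as pd

# Hypothetical raw batting records, shaped like scraped scorecard rows.
# "-" stands for a player who did not bat; one row is duplicated on purpose.
raw = pd.DataFrame(
    {
        "name": [" Player A ", "Player B", "Player B", "Player C"],
        "runs": ["82", "4", "4", "-"],
        "balls": ["53", "8", "8", "-"],
        "strike_rate": ["154.71", "50.00", "50.00", "-"],
    }
)


def clean_batting(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy: trimmed names, numeric columns, no duplicates."""
    out = df.copy()
    out["name"] = out["name"].str.strip()
    # Convert text to numbers; unparseable entries such as "-" become NaN.
    for col in ["runs", "balls", "strike_rate"]:
        out[col] = pd.to_numeric(out[col], errors="coerce")
    # Drop exact duplicate rows, then rows without any batting record.
    out = out.drop_duplicates().dropna(subset=["runs", "balls"])
    # Runs and balls are whole numbers once the missing rows are gone.
    out = out.astype({"runs": "int64", "balls": "int64"})
    return out.reset_index(drop=True)


cleaned = clean_batting(raw)
print(cleaned)
```

Dropping rows with missing values is one possible policy; depending on the analysis, keeping them and flagging "did not bat" explicitly may be preferable.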
1.4 Modules
This project aims to analyze cricket data using web scraping, Python, Pandas, and
Power BI. We will scrape match, player batting, and bowling data from ESPN
Cricinfo, clean and transform it in Python and Pandas, and finally, create interactive
dashboards in Power BI for insightful data visualization.
1. Data Scraping:
a. Target website:
ESPN Cricinfo is the chosen website for data scraping due to its
comprehensive coverage of cricket matches and player statistics.
b. Web scraping tools:
Bright Data’s data collector is used for parsing HTML content and
extracting relevant data in the form of JSON files.
c. Data scraped:
Match data (match details: location, teams, winner, date, scorecard,
etc.)
Player batting data (player name, runs scored, boundaries, strike rate,
etc.)
Player bowling data (player name, wickets taken, economy rate, etc.)
2. Data cleaning & transformation:
a. Import libraries:
Pandas library is used for data cleaning, manipulation and analysis.
b. Data cleaning:
In data cleaning, HTML tags and unwanted characters are removed.
Missing values and inconsistencies are handled. Data formats are
standardized.
c. Data transformation:
Creating new features or derived variables (e.g., average score,
bowling average).
Combining data from different sources (match and player data).
Aggregating data for specific analysis (e.g., player performance over a
specified period).
3. Data analysis and insights:
a. DAX: calculate essential performance indicators for players and teams,
such as:
Batting average, strike rate, highest score for batters.
Bowling average, economy rate, most wickets for bowlers.
Win/loss ratio, average score for teams.
b. Descriptive statistics:
Analyse data using descriptive statistics to understand data distribution.
c. Data visualization:
A snowflake schema is used to model the data, and charts and graphs are
created to visualize trends and patterns in player performance and team strategies.
d. Hypothesis testing:
Perform statistical tests to identify significant differences between players
or teams based on specific criteria.
4. Power BI dashboards:
a. Data import:
Import the cleaned and transformed data into Power BI.
b. Relationship building:
Define relationships between different data tables for accurate analysis.
c. Visualization creation:
Creating interactive dashboards with various visuals such as:
Bar charts and line charts to compare player performance.
Scatter plots to identify trends and outliers.
Maps to analyse team performance across different venues.
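The performance indicators listed under module 3(a) follow standard cricket definitions. The project computes them in DAX; the sketch below restates the same formulas in plain Python purely for illustration. The `BattingTotals` and `BowlingTotals` classes, the `safe_div` helper and the sample figures are hypothetical and are not taken from the project's data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BattingTotals:
    """Aggregated totals for one batter (hypothetical structure)."""
    runs: int
    balls_faced: int
    dismissals: int


@dataclass
class BowlingTotals:
    """Aggregated totals for one bowler (hypothetical structure)."""
    runs_conceded: int
    balls_bowled: int
    wickets: int


def safe_div(numerator: float, denominator: float) -> Optional[float]:
    """Divide, returning None when the denominator is zero (metric undefined)."""
    return numerator / denominator if denominator else None


def batting_average(b: BattingTotals) -> Optional[float]:
    return safe_div(b.runs, b.dismissals)  # runs per dismissal


def batting_strike_rate(b: BattingTotals) -> Optional[float]:
    return safe_div(b.runs * 100, b.balls_faced)  # runs per 100 balls


def bowling_average(w: BowlingTotals) -> Optional[float]:
    return safe_div(w.runs_conceded, w.wickets)  # runs conceded per wicket


def economy_rate(w: BowlingTotals) -> Optional[float]:
    return safe_div(w.runs_conceded * 6, w.balls_bowled)  # runs per six-ball over


def bowling_strike_rate(w: BowlingTotals) -> Optional[float]:
    return safe_div(w.balls_bowled, w.wickets)  # balls bowled per wicket


# Made-up totals, chosen only so the arithmetic is easy to follow.
batter = BattingTotals(runs=300, balls_faced=200, dismissals=6)
bowler = BowlingTotals(runs_conceded=168, balls_bowled=144, wickets=8)
print(batting_average(batter), batting_strike_rate(batter))
print(bowling_average(bowler), economy_rate(bowler), bowling_strike_rate(bowler))
```

Returning `None` for an undefined metric (for example, a batter who has never been dismissed) avoids a division-by-zero error; a dashboard measure would need an equivalent guard.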
2. REQUIREMENTS ANALYSIS WITH SRS:
2.1 WHAT IS SOFTWARE REQUIREMENT SPECIFICATIONS (SRS)
DOCUMENT?
A software requirements specification (SRS) is a description of a software
system to be developed. It lays out functional and non-functional
requirements and may include a set of use cases that describe the user
interaction that the software must provide. The software requirements
specification document lists the necessary and sufficient requirements
for the project development. To derive the requirements, we need to
have a clear and thorough understanding of the product to be developed or
being developed. This is achieved and refined through detailed and continuous
communication with the project team and customer until the completion of the
software. The Software Requirements Specification (SRS) is a
communication tool between stakeholders and software designers.
The specific goals of the SRS are:
• Facilitating reviews
• Describing the scope of work
• Providing a reference to software designers (i.e. navigation aids, document
structure)
• Providing a framework for testing primary and secondary use
cases
• Including features that map to the customer requirements
• Providing a platform for the ongoing refinement (via incomplete specs
or questions)
2.2 DESIGN CONSIDERATIONS

2.2.1 Software Requirements:
Software is the set of programs that run on a computer. It is to a
computer what petrol is to a vehicle: without it, the hardware cannot
do useful work. It also provides the interface between a human and a
computer. Several software components are needed for the development
of this project.
A. Operating System: Any modern operating system like
Windows, macOS, or Linux would suffice.
B. Python: Version 3.6 or higher is recommended.
C. Python Libraries:
requests: For making web scraping requests.
BeautifulSoup: For parsing HTML content.
pandas: For data manipulation and analysis.
powerbi-python (optional): For connecting to and interacting with
Power BI from Python.
D. Web Browser: Chrome or Firefox, optionally driven by a browser
automation tool like Selenium or Puppeteer.
E. Power BI Desktop: Free version available for download.
2.2.2 Hardware Requirements:
In hardware requirements, we require all those components that will
provide us with the platform for the project's development. The
minimum hardware requirements are as follows:
• Processor: Multi-core processor recommended for smooth
performance.
• RAM: Minimum 8GB recommended, 16GB preferred for better
performance.
• Storage: Enough storage space to store scraped data and project
files.
• Internet Connection: Stable internet connection required for
downloading libraries, scraping data, and accessing Power BI
services.
2.2.3 Technologies used:
This project utilizes the following technologies to build an insightful
cricket data analysis platform:
Web Scraping:
• Libraries: BeautifulSoup, Selenium, Scrapy
• Purpose: Extract cricket data from websites like ESPN
Cricinfo, Cricket Archive, etc.
Python:
• Libraries: Pandas, NumPy, Matplotlib, Seaborn
• Purpose:
o Data Cleaning and Transformation: Cleaning scraped
data, handling missing values, and transforming data into a
suitable format for analysis.
o Exploratory Data Analysis (EDA): Analyzing data
trends, identifying patterns, and generating insights.
o Building Statistical Models: Building models to predict
match outcomes, player performance, etc.
Pandas:
• Purpose:
o Data Manipulation: Efficiently handle and organize data
in data frames.
o Data Analysis: Perform calculations, aggregations, and
statistical analysis on the data.
o Data Visualization: Create various charts and graphs to
visualize data insights.
Power BI:
• Purpose:
o Building Interactive Dashboards: Create visually appealing
and interactive dashboards to showcase analysis results in
a user-friendly way.
o Sharing Insights: Share dashboards and reports with
stakeholders for deeper understanding and decision-
making.
Additional technologies:
• Jupyter Notebook: Interactive environment for developing and
analysing data in Python.
• Git: Version control system for managing project code and data.
2.3 REQUIREMENT SCOPING:

We have to define parameters in order to select 11 players.
Requirements:
A. The team should be able to score at least 180 runs on average.
B. The team should be able to defend 150 runs on average.
Parameters for selecting players:
A. OPENERS (2 PLAYERS):
The openers are the two players who bat first in an innings. This is a
crucial role, as they are responsible for setting the tone for the innings
and providing a solid foundation for the rest of the batting team.
PARAMETERS           DESCRIPTION                            CRITERIA
Batting average      Average runs scored in an innings.     >30
Strike rate          No. of runs scored per 100 balls.      >140
Innings batted       Total innings batted.                  >3
Boundary percentage  % of runs scored in boundaries.        >50
Batting position     Order in which the batter played.      <4
Table 2.1: Parameters for selecting openers for a match.
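The criteria in Table 2.1 translate directly into a boolean filter. The following sketch assumes a Pandas table of aggregated batting statistics; the column names and all player figures are hypothetical and serve only to show how the thresholds combine.

```python
import pandas as pd

# Hypothetical aggregated batting statistics; names and values are made up.
batting = pd.DataFrame(
    {
        "player": ["Player A", "Player B", "Player C", "Player D", "Player E"],
        "batting_avg": [41.2, 28.0, 35.5, 33.0, 36.0],
        "strike_rate": [148.3, 155.0, 132.0, 142.1, 145.0],
        "innings_batted": [5, 6, 4, 3, 4],
        "boundary_pct": [58.0, 61.0, 52.0, 55.0, 51.5],
        "batting_position": [1, 2, 3, 2, 2],
    }
)

# Every threshold from Table 2.1 must hold, so the masks are combined with "&".
is_opener = (
    (batting["batting_avg"] > 30)
    & (batting["strike_rate"] > 140)
    & (batting["innings_batted"] > 3)
    & (batting["boundary_pct"] > 50)
    & (batting["batting_position"] < 4)
)
openers = batting[is_opener]
print(openers["player"].tolist())
```

In this made-up sample, Player B fails the batting average threshold, Player C the strike rate threshold and Player D the innings threshold, which leaves two candidates for the two opener slots.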
B. ANCHORS/MIDDLE ORDER (3 PLAYERS):

These players bat after the openers and are responsible for building on
the foundation laid by the openers and accelerating the scoring rate.
PARAMETERS           DESCRIPTION                                         CRITERIA
Average balls faced  Average balls faced by the batter in innings.       >20
Batting position     Order in which batter played                        >2
Table 2.2: Parameters for selecting anchors for a match.
These five players (openers and anchors) are expected to contribute 100-120 runs on average.
C. FINISHER/LOWER ORDER ANCHOR (1 PLAYER):

The finisher is responsible for scoring quickly in the final overs of
an innings. This player needs to be aggressive and able to hit boundaries in
order to maximize the team's score.
PARAMETERS           DESCRIPTION                                         CRITERIA
Average balls faced  Average balls faced by the batter in the innings.   >12
Innings bowled       Total innings bowled by the bowler.                 >1
Table 2.3: Parameters for selecting finishers for a match.
D. ALL ROUNDERS (2 PLAYERS):

These players are capable of both batting and bowling well. They can
provide balance to the team and are often key players in limited-overs
cricket formats.
PARAMETERS           DESCRIPTION                                          CRITERIA
Innings bowled       Total innings bowled by the bowler.                  >2
Bowling economy      Average runs allowed per over.                       >7
Bowling strike rate  Average number of balls required to take a wicket.   <20
Table 2.4: Parameters for selecting allrounders for a match.
E. SPECIALIST FAST BOWLERS (3 PLAYERS):

Specialist fast bowlers are a vital component of any cricket team. Their
primary role is to take wickets and restrict the opposition's scoring rate.
They rely on their pace, accuracy, and variations to achieve this goal.
PARAMETERS           DESCRIPTION                                          CRITERIA
Innings bowled       Total innings bowled.                                >4
Bowling economy      Average runs allowed per over.                       <7
Bowling strike rate  Average number of balls required to take a wicket.   <16
Bowling style        Bowling style of the player                          =%FAST%
Bowling average      Number of runs allowed per wicket.                   <20
Dot ball %           Percentage of the dot balls allowed.                 >40
Table 2.5: Parameters for selecting specialist fast bowlers for a match.
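Table 2.5 mixes numeric thresholds with a text pattern. The sketch below reads `=%FAST%` as a wildcard match, meaning that the bowling style contains the word "fast"; that reading, the case-insensitive comparison, the column names and the sample figures are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical aggregated bowling statistics; names and values are made up.
bowling = pd.DataFrame(
    {
        "player": ["Player P", "Player Q", "Player R", "Player S"],
        "bowling_style": [
            "Right arm Fast",
            "Legbreak",
            "Left arm Fast medium",
            "Right arm Fast medium",
        ],
        "innings_bowled": [5, 6, 5, 4],
        "economy": [6.2, 6.0, 6.8, 6.5],
        "bowling_strike_rate": [14.0, 15.0, 15.5, 13.0],
        "bowling_avg": [15.1, 16.0, 18.9, 14.0],
        "dot_ball_pct": [45.0, 42.0, 41.0, 48.0],
    }
)

# Assumed meaning of "=%FAST%": the style text contains "fast" in any case.
is_fast = bowling["bowling_style"].str.contains("fast", case=False, na=False)

meets_thresholds = (
    (bowling["innings_bowled"] > 4)
    & (bowling["economy"] < 7)
    & (bowling["bowling_strike_rate"] < 16)
    & (bowling["bowling_avg"] < 20)
    & (bowling["dot_ball_pct"] > 40)
)

fast_bowlers = bowling[is_fast & meets_thresholds]
print(fast_bowlers["player"].tolist())
```

Keeping the style match separate from the numeric mask makes it easy to reuse the same thresholds for other bowling styles if the selection rules change.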
3. SYSTEM DESIGN
3.1 REQUIREMENT ANALYSIS DIAGRAMS:
3.1.1 CLIENT-SERVER ARCHITECTURE:
Fig 3.1. Client-Server Architecture
3.1.2 ENTITY-RELATIONSHIP DIAGRAM:
Fig 3.2. Entity-Relationship Diagram
3.1.3 USE CASE DIAGRAM:
Fig 3.3. Use Case Diagram
3.1.4 DATA FLOW DIAGRAMS:
LEVEL 0:
Fig 3.4. Level 0 DFD
Explanation:
1. Cricket Data Analytics Project:
➔ Central process representing the overall cricket data analytics project.
2. External Entities:
➔ Web Data Source: Represents the cricket website as an external source of
data.
➔ Python Script: Handles data extraction from the web source.
➔ Power BI: Utilizes the processed data for visualization.
3. Data Stores:
➔ Raw Cricket Data: Stores the web-scraped data.
➔ Processed Data: Stores data manipulated using Python and Pandas.
➔ Power BI Dataset: Stores data prepared for visualization in Power BI.
4. Data Flows:
➔ Arrows indicate the flow of data between processes, entities, and data
stores.
• From Web Data Source to Python Script: Represents data
extraction.
• From Python Script to Raw Cricket Data: Indicates storing scraped
data.
• From Raw Cricket Data to Pandas: Represents the flow of data for
manipulation.
• From Pandas to Processed Data: Indicates storing cleaned/processed
data.
• From Processed Data to Power BI: Represents the flow of data for
visualization.
• From Power BI to end-users or stakeholders: Represents the final
output.
This Level 0 DFD provides a high-level overview of the major components and their
interactions in this cricket data analytics project. Because of the project's complexity,
more detailed DFDs are needed at lower levels to further
elaborate on subprocesses and data flows within each component.
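The data flows of the Level 0 DFD can be traced in a few lines of code. The sketch below uses only the Python standard library so that it runs anywhere; in the project itself the processing step is done with Pandas and the scraping with an external collector. The function names, file names and sample records are hypothetical.

```python
import csv
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def scrape_stub() -> List[Dict[str, str]]:
    """Stand-in for the web scraping step; returns made-up records."""
    return [
        {"player": "Player A", "runs": "45", "balls": "30"},
        {"player": "Player B", "runs": "12", "balls": "15"},
    ]


def store_raw(records: List[Dict[str, str]], raw_path: Path) -> None:
    """Flow: Python Script -> Raw Cricket Data (a JSON file)."""
    raw_path.write_text(json.dumps(records), encoding="utf-8")


def process(raw_path: Path) -> List[Dict[str, object]]:
    """Flow: Raw Cricket Data -> processing (type conversion, derived column)."""
    records = json.loads(raw_path.read_text(encoding="utf-8"))
    rows = []
    for rec in records:
        runs, balls = int(rec["runs"]), int(rec["balls"])
        rows.append(
            {
                "player": rec["player"],
                "runs": runs,
                "balls": balls,
                "strike_rate": runs * 100 / balls,
            }
        )
    return rows


def store_processed(rows: List[Dict[str, object]], csv_path: Path) -> None:
    """Flow: processing -> Processed Data (a CSV file Power BI can import)."""
    with csv_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


def run_pipeline(workdir: Path) -> Path:
    """Run every flow in order and return the path of the processed file."""
    raw_path = workdir / "raw_cricket_data.json"
    csv_path = workdir / "processed_data.csv"
    store_raw(scrape_stub(), raw_path)
    store_processed(process(raw_path), csv_path)
    return csv_path


with tempfile.TemporaryDirectory() as tmp:
    output = run_pipeline(Path(tmp))
    print(output.read_text(encoding="utf-8"))
```

Each function corresponds to one arrow in the diagram, which keeps the data stores (the JSON and CSV files) explicit and makes each step testable on its own.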
LEVEL 1:
Fig 3.5. Level 1 DFD

Explanation:
1. EXTERNAL ENTITIES:
➔ Cricket Data Source (ESPN Cricinfo): The external source of
cricket data where the web scraping process extracts information
about matches, players, and other relevant data.
➔ User (Data Analyst): The person interacting with the system,
providing input, and receiving the final analyzed data for decision-
making.
2. PROCESSES:
➔ Web Scraping and Data Extraction:
▪ Description: This subprocess is responsible for extracting raw
cricket data from the ESPN Cricinfo website.
▪ Detailed Flow: Arrows show the detailed extraction process,
indicating the flow of data from the web source to the Local
File System (Raw Cricket Data).
➔ Data Processing with Python and Pandas:
▪ Description: This subprocess involves loading, cleaning, and
processing the raw cricket data using Python and Pandas.
▪ Detailed Flow: Arrows show the transformation steps,
including the flow of data from the Local File System (Raw
Cricket Data) to Processed Data (Fields and Columns after
Data Manipulation).
➔ Power BI Visualization:
▪ Description: This subprocess is responsible for creating
interactive dashboards and reports using Power BI.
▪ Detailed Flow: Arrows show the flow of data from
Processed Data to Power BI Dataset and, eventually, to
the Data Analyst (User).
3. DATA STORES:
➔ Local File System (CSV Files - Raw Cricket Data):
▪ Description: This data store holds the web-scraped data in
a detailed structure or format.
➔ Processed Data (Fields and Columns after Data Manipulation):
▪ Description: This data store stores the data after it has
been manipulated, cleaned, and processed using Python
and Pandas.
➔ Power BI Dataset:
▪ Description: This data store holds the analyzed data ready
for interactive visualization in Power BI.
4. ANNOTATIONS:
➔ Descriptions:
▪ Descriptions provide clarity on the purpose of each
external entity, process, and data store.
➔ Detailed Data Flows:
▪ The arrows show the direction of data flow, indicating the
steps in web scraping, data processing, and visualization.
➔ External Annotations:
▪ Annotations outside the diagram provide additional
information about the technologies used in each
subprocess, such as "Web Scraping using Bright Data" or
"Data Manipulation with Python and Pandas."
4. TEST PLAN
4.1 TEST CASE 1:
Fig 4.1: List of 11 players selected on the basis of different parameter values
4.2 TEST CASE 2:
Fig 4.2: List of 11 players selected on the basis of different parameter values
4.3 TEST CASE 3:
Fig 4.3: 11 players selected using different values of the parameters
5. BODY OF THESIS
5.1 LITERATURE REVIEW:
Cricket, a captivating sport with millions of fans globally, generates a wealth of
data, presenting a fertile ground for the application of data analytics. This
comprehensive literature review delves deep into the current landscape of cricket
analytics, exploring the diverse methodologies employed, scrutinizing the
challenges faced, and charting a path towards promising future directions.
1. The Flourishing Landscape of Cricket Analytics:

Driven by advancements in data collection, storage, and analysis tools,
cricket analytics has witnessed remarkable growth in recent years. Major
cricket boards, teams, and analysts are increasingly leveraging data to delve
into player performance, team dynamics, and match outcomes, leading to a
plethora of applications, including:
1.1 Player Performance Analysis:
Identifying key performance indicators (KPIs): Analyzing batting
averages, bowling strike rates, fielding efficiency, and other relevant
metrics paints a comprehensive picture of individual player performance,
providing valuable feedback for improvement.
Understanding batting and bowling styles: Analyzing factors like strike
rotation, shot selection, and bowling variations helps understand
individual strengths and weaknesses, enabling personalized training
programs and skill development.
Predicting future performance: Utilizing statistical models and machine
learning techniques, analysts can predict individual performance with
increasing accuracy, allowing teams to make informed decisions on
player selection and strategy.
1.2. Team Strategy Optimization:
Analyzing batting and bowling strategies: Examining historical data and
player strengths helps optimize batting order, fielding positions, and
bowling strategies, maximizing team potential for winning matches.
Identifying player matchups: Analyzing data to understand individual
player strengths and weaknesses against specific opponents assists in
formulating optimal strategies and predicting potential outcomes.
Simulating match scenarios: Utilizing data-driven simulations enables
teams to test different strategies and predict potential outcomes under
various conditions, fostering informed decision-making and risk
management.
1.3. Predictive Modeling:
Predicting match winners: By analyzing historical data, team
compositions, and current playing conditions, predictive models can
forecast match winners with increasing accuracy, aiding in strategic
planning and resource allocation.
Predicting individual performances: Utilizing data-driven models,
analysts can predict individual player performance metrics like runs
scored or wickets taken, providing valuable insights for coaches and team
management.
Predicting impactful moments: Identifying statistically significant factors
that lead to pivotal moments in a match, such as sixes or dismissals,
empowers teams to capitalize on these moments and strategize
accordingly.
1.4. Talent Identification:
Identifying promising young players: Analyzing youth academy data and
player performance in lower leagues helps identify promising young
talent with the potential to excel at the highest level, facilitating early
recruitment and development.
Assessing fitness and injury risk: Utilizing data to analyze player fitness
levels and injury history assists in managing player workloads, preventing
injuries, and maximizing player availability.
Evaluating player development programs: Data analysis facilitates the
evaluation of player development programs, measuring their effectiveness
and identifying areas for improvement, optimizing talent development
initiatives.
2. Unveiling Insights: A Tapestry of Methodologies:
Cricket analytics utilizes a diverse range of methodologies, each offering its
unique strengths and insights:
2.1. Traditional Statistical Analysis:
Descriptive statistics: Calculating averages, medians, and standard
deviations provides a basic understanding of player and team
performance, enabling initial analysis and identifying trends.
Correlation analysis: Identifying relationships between different variables,
such as a batsman's strike rate and the size of the boundary, unveils
underlying patterns and informs strategic decision-making.
Regression analysis: Predicting continuous variables like a batsman's
score or a bowler's wickets based on other relevant factors allows for
deeper analysis of player performance and assists in talent identification.
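The three techniques above can be illustrated on a small sample. The sketch below uses Python's standard library only; the figures are made up, and the variable pair (balls faced against runs scored) is an assumption chosen for simplicity rather than the strike rate and boundary size example mentioned above. The helpers `pearson_r` and `fit_line` are hypothetical names.

```python
import math
import statistics

# Hypothetical per-innings figures for one batter (illustrative only).
balls_faced = [10, 20, 30, 40, 50]
runs_scored = [12, 30, 41, 58, 69]

# Descriptive statistics: centre and spread of the runs scored.
mean_runs = statistics.mean(runs_scored)
median_runs = statistics.median(runs_scored)
stdev_runs = statistics.stdev(runs_scored)  # sample standard deviation


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


r = pearson_r(balls_faced, runs_scored)
slope, intercept = fit_line(balls_faced, runs_scored)
print(mean_runs, median_runs, round(stdev_runs, 2))
print(round(r, 3), round(slope, 2), round(intercept, 2))
```

The fitted slope can be read as runs added per extra ball faced in this sample. With only five made-up points it demonstrates the mechanics and supports no conclusion about any real player.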
2.2. Machine Learning Techniques:
Decision trees: These models classify data based on a series of rules,
aiding in predicting match outcomes, player roles, and impactful moments
within a match.
Random forests: Composed of multiple decision trees, these models offer
improved accuracy and robustness compared to single decision trees,
enhancing the reliability of predictions.
Neural networks: These complex models learn from large amounts of data
and can identify complex patterns and relationships, providing deeper
insights into player performance and match dynamics.
2.3. Advanced Techniques:
Natural Language Processing (NLP): Analyzing commentary, news
articles, and social media using NLP techniques helps understand public
sentiment, identify trends, and gauge the impact of specific events.
Computer vision: Analyzing video footage to track player movements and
ball trajectories, and to predict outcomes from real-time data, supports
more detailed performance analysis and tactical decision-making.
Big data integration: Utilizing diverse data sources, including weather
data, player bios, and social media interactions, provides a more
comprehensive view of the sport and its complexities, uncovering hidden
patterns and insights.
3. Navigating Challenges: Towards Responsible and Ethical Data
While cricket analytics offers immense potential, it also faces challenges that
require careful consideration:
3.1. Data Availability and Quality:
Accessing comprehensive and accurate data, particularly for historical
matches and lower-tier leagues, remains a challenge, hindering the scope
and depth of analysis.
Inconsistency in data collection and recording methods, especially for
older datasets, can impact the accuracy of results and limit the
generalizability of findings.
3.2. Model Generalizability and Explainability:
Models trained on specific data sets may not generalize well to different
playing conditions, teams, or players, leading to inaccurate predictions
and limited applicability.
Lack of transparency in complex AI models can hinder trust and raise
ethical concerns, requiring the development of explainable AI techniques
for greater understanding and accountability.
3.3. Ethical Considerations and Data Privacy:
Bias in data sets and potential misuse of information require careful
consideration.
Ensuring data privacy and protecting personal information of players and
other stakeholders is paramount.
3.4. Transparency and Collaboration:
Sharing data and collaborating between different stakeholders, including
cricket boards, teams, and researchers, can accelerate progress and lead to
more comprehensive and reliable insights.
Open access to data and research findings can promote
transparency, encourage innovation, and benefit the sport as a whole.
4. Unlocking New Horizons: The Future of Cricket Analytics:

The future of cricket analytics appears bright, with exciting possibilities
driven by advancements in technology and a focus on deeper understanding:
4.1. Big Data Integration and Advanced Analytics:
Utilizing large datasets from diverse sources, including social
media, weather data, and player bios, will provide a more comprehensive
view of the sport and its complexities, uncovering hidden patterns and
insights.
Advanced analytics techniques, such as natural language processing
(NLP) and computer vision, will offer deeper insights into player
performance, team dynamics, and match outcomes.
4.2. Explainable AI and Responsible Data Governance:
Developing explainable AI models that can provide clear explanations for
their predictions will enhance transparency and trust in their
results, fostering informed decision-making.
Implementing responsible data governance practices will ensure ethical
data utilization, safeguard privacy, and promote fairness and
transparency.
4.3. Democratization of Cricket Analytics:
Making data and analytics tools more accessible and user-friendly will
empower coaches, players, and fans to gain valuable insights and
personalize their approach to the sport.
Open-source data platforms and collaboration initiatives will promote
innovation and accelerate the development of new applications and
analytical tools.
5. Conclusion: Towards a Data-Driven Future of Cricket

Cricket analytics has emerged as a powerful tool for enhancing player
performance, optimizing team strategies, and predicting match outcomes. By
addressing the identified challenges and embracing new opportunities, the
future of cricket analytics promises:
5.1 Deeper understanding of the sport:
Uncovering hidden patterns, identifying key factors influencing
performance, and refining our understanding of the sport's
complexities.
5.2 Enhanced player development:
Tailoring training programs based on individual strengths and
weaknesses, optimizing player performance, and fostering
personalized growth.
5.3 Improved team performance:
Developing data-driven strategies, predicting match outcomes, and
maximizing team success through informed decision-making.
5.4 Engaged fans and stakeholders:
Providing deeper insights into the game, offering engaging data
visualizations, and promoting a data-driven culture within the sport.
As cricket analytics continues to evolve, its impact will extend beyond the field,
influencing coaching methods, broadcasting strategies, and the overall fan
experience. In conclusion, embracing a data-driven approach promises to
revolutionize the sport, fostering innovation, enhancing performance, and
enriching the experience for players, fans, and stakeholders alike.
5.2 METHODOLOGY:
5.2.1 DATA COLLECTION USING WEB SCRAPING:
1. Identifying Relevant URLs:

➔ Pinpoint the specific pages on ESPN Cricinfo containing the desired
data. This could involve navigating through menus, searching for
specific matches or players, or using advanced search filters.
➔ Analyze the website structure and identify the URL patterns for
accessing different data sections (e.g., player profiles, match
scorecards, tournament statistics).
➔ Consider potential limitations of web scraping, such as restrictions on
access frequency or data availability.
2. Choosing a Web Scraping Tool:

➔ Explore various Python libraries and frameworks for web
scraping, such as Beautiful Soup, Scrapy, or Selenium.
➔ Evaluate the features and functionalities of each tool based on the
complexity of the website structure, dynamic content presence, and
desired scraping depth.
➔ Choose a library that offers suitable data parsing capabilities and
allows for efficient extraction of relevant information.
3. Developing the Scraping Script:

➔ Write Python code to automate the data extraction process.
➔ Utilize the chosen library's functionalities to:
o Download the HTML content of the target URL.
o Parse the HTML structure using appropriate methods
(e.g., BeautifulSoup).
o Identify relevant data elements based on specific
tags, attributes, or class names.
o Extract the desired data points, including text, numbers, and
other attributes.
o Handle dynamic content loading or JavaScript elements by
using techniques like Selenium or headless browsers.
o Implement error handling mechanisms to address potential
issues during scraping.
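The steps above can be sketched as follows. This is a minimal example under stated assumptions: the HTML snippet, the `batting` class name, and the three-column layout are hypothetical and do not reflect the actual ESPN Cricinfo page structure, and the helper names `fetch_html` and `parse_batting_rows` are introduced here for illustration only.

```python
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup


def fetch_html(url, timeout=10):
    """Download a page; return None instead of raising on network errors."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except OSError as exc:  # covers URLError, HTTPError and socket timeouts
        print(f"Failed to fetch {url}: {exc}")
        return None


def parse_batting_rows(html):
    """Extract (batsman, runs, balls) records from a hypothetical table."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="batting")  # assumed class name
    rows = []
    if table is None:
        return rows
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if len(cells) != 3:
            continue  # skip header rows and malformed rows
        name, runs, balls = cells
        try:
            rows.append({"batsman": name, "runs": int(runs), "balls": int(balls)})
        except ValueError:
            continue  # skip rows whose numbers cannot be parsed
    return rows


# Inline sample so the sketch runs without network access.
SAMPLE_HTML = """
<table class="batting">
  <tr><th>Batsman</th><th>R</th><th>B</th></tr>
  <tr><td>Player A</td><td>45</td><td>30</td></tr>
  <tr><td>Player B</td><td>-</td><td>0</td></tr>
  <tr><td>Player C</td><td>12</td><td>15</td></tr>
</table>
"""
records = parse_batting_rows(SAMPLE_HTML)
print(records)
```

In a real run, the string returned by `fetch_html` would be passed to `parse_batting_rows`. Pages that build their tables with JavaScript would need Selenium or a headless browser instead of a plain download.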
4. Storing the Scraped Data:

➔ Save the extracted data in a structured format suitable for further
analysis.
➔ Options include CSV (comma-separated values), JSON
(JavaScript Object Notation), or custom formats depending on the
data structure and analysis requirements.
➔ Ensure proper data labeling and organization for efficient data
handling in subsequent steps.
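As an illustration of the storage options above, the following sketch serializes a small list of records to CSV and JSON using only the Python standard library. The records and field names are hypothetical, and the text is written to in-memory strings so the example has no side effects.

```python
import csv
import io
import json

# Hypothetical scraped records (illustrative values only).
records = [
    {"batsman": "Player A", "runs": 45, "balls": 30},
    {"batsman": "Player C", "runs": 12, "balls": 15},
]


def to_csv_text(rows, fieldnames):
    """Serialize a list of dicts to CSV text with a labeled header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv_text(records, ["batsman", "runs", "balls"])
json_text = json.dumps(records, indent=2)

# Reading the JSON back confirms that nothing was lost in the round trip.
restored = json.loads(json_text)

print(csv_text)
print(json_text)
```

To persist the data, the same text can be written to disk, for example with `open("batting.csv", "w", newline="")`; the file name is an assumption.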
5. Validating and Cleaning the Data:

➔ Check for inconsistencies, missing values, and formatting errors in
the scraped data.
➔ Utilize Python libraries like pandas to perform data cleaning tasks:
o Detect and handle missing values using techniques like
imputation or deletion.
o Clean and standardize data formatting to ensure consistency in
data types and units.
o Identify and remove irrelevant data points or outliers.
o Document any data cleaning procedures and transformations
for future reference.
5.2.2 DATA TRANSFORMATION IN PYTHON PANDAS:

1. Importing the Data:
➔ Load the scraped data into a pandas DataFrame for structured
manipulation and analysis.
➔ Explore the data structure and familiarize yourself with the
available columns and data types.
2. Detecting and Handling Missing Values:

➔ Identify missing values in each column using pandas functions
like isnull().
➔ Analyze the pattern and distribution of missing values to
determine the best approach for handling them.
➔ Choose appropriate methods like:
o Imputing missing values with statistical methods like mean
or median.
o Deleting rows or columns with excessive missing values.
o Implementing custom logic based on domain knowledge
and data characteristics.
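A minimal pandas sketch of these options is shown below. The DataFrame contents, the column names, and the 40% threshold for dropping a column are assumptions made for illustration, not values taken from the project data.

```python
import numpy as np
import pandas as pd

# Hypothetical scraped data with gaps (illustrative values only).
df = pd.DataFrame(
    {
        "player": ["A", "B", "C", "D"],
        "runs": [45.0, np.nan, 12.0, 30.0],
        "strike_rate": [150.0, 120.0, np.nan, np.nan],
        "team": ["X", "Y", None, "X"],
    }
)

# Identify missing values: count and share per column.
missing_per_column = df.isnull().sum()
missing_share = df.isnull().mean()

# Option 1: impute a numeric column with its median.
df_imputed = df.copy()
df_imputed["runs"] = df_imputed["runs"].fillna(df_imputed["runs"].median())

# Option 2: delete rows that lack a value in a required column.
df_rows_dropped = df.dropna(subset=["team"])

# Option 3: delete columns where more than 40% of values are missing
# (the threshold is an arbitrary choice for this example).
df_cols_dropped = df.loc[:, missing_share <= 0.4]

print(missing_per_column)
```

Which option is appropriate depends on why the values are missing; for instance, a blank score for a player who did not bat should not be imputed as if it were an unknown score.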
3. Cleaning and Standardizing Data:

➔ Remove unnecessary characters, whitespace, and special symbols
from text data.
➔ Standardize data formats for consistent analysis across different
features.
➔ Convert data types to appropriate formats like integers, floats, or
datetimes.
➔ Ensure data units are consistent throughout the DataFrame.
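The cleaning steps above might look like the following in pandas. The raw values are invented for illustration; the asterisk is the usual scorecard marker for a not-out innings, and treating "-" as a missing score is an assumption of this sketch.

```python
import pandas as pd

# Hypothetical raw scraped values (illustrative only).
raw = pd.DataFrame(
    {
        "player": ["  Player A ", "Player B*", "player c"],
        "runs": ["45", "102*", "-"],
        "match_date": ["2023-01-15", "2023-02-20", "2023-03-05"],
    }
)

clean = raw.copy()

# Remove special symbols and surrounding whitespace, then standardize case.
clean["player"] = (
    clean["player"].str.replace("*", "", regex=False).str.strip().str.title()
)

# Convert text to numbers; unparseable entries such as "-" become NaN.
clean["runs"] = pd.to_numeric(
    clean["runs"].str.replace("*", "", regex=False), errors="coerce"
)

# Convert text to datetimes using an explicit format.
clean["match_date"] = pd.to_datetime(clean["match_date"], format="%Y-%m-%d")

print(clean.dtypes)
```

If the not-out marker matters for later analysis, it is better captured in a separate boolean column before the asterisk is stripped.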
4. Merging and Joining Datasets:

➔ Combine data from multiple scraped sources or tables if necessary.
➔ Utilize pandas functions like merge and concat to perform inner
joins, outer joins, or vertical/horizontal concatenation.
➔ Ensure proper data alignment and relationship identification when
merging datasets.
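The join types mentioned above can be illustrated with two small, hypothetical tables. The `player_id` key and all values are assumptions for the example; the `indicator=True` option is used to show which side each row of an outer join came from.

```python
import pandas as pd

# Hypothetical tables sharing a player_id key (illustrative values only).
batting = pd.DataFrame({"player_id": [1, 2, 3], "runs": [45, 12, 30]})
players = pd.DataFrame({"player_id": [1, 2, 4], "name": ["A", "B", "D"]})

# Inner join: keep only players present in both tables.
inner = batting.merge(players, on="player_id", how="inner")

# Outer join: keep every player; the _merge column flags unmatched rows.
outer = batting.merge(players, on="player_id", how="outer", indicator=True)

# Vertical concatenation: stack rows from two matches with the same columns.
match_1 = pd.DataFrame({"player_id": [1], "runs": [45]})
match_2 = pd.DataFrame({"player_id": [1], "runs": [20]})
stacked = pd.concat([match_1, match_2], ignore_index=True)

print(inner)
print(outer)
print(stacked)
```

Inspecting the `_merge` column is one simple way to check alignment before trusting a merged dataset.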
5. Feature Engineering:
➔ Create new features from existing data to enhance analysis
capabilities.
➔ This may involve:
o Calculating derived features like batting average or bowling
strike rate.
o Grouping players based on specific criteria like playing style
or role.
o Creating time-based features for analyzing trends and
seasonality.
o Applying data transformation techniques like normalization or
scaling.
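A short sketch of derived features follows. It assumes the standard definitions: batting average as runs scored divided by times dismissed, and bowling strike rate as balls bowled per wicket taken. The innings table and its column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical innings-level data (illustrative values only).
innings = pd.DataFrame(
    {
        "player": ["A", "A", "A", "B", "B"],
        "runs": [50, 30, 40, 10, 20],
        "dismissed": [True, False, True, True, True],
        "balls_bowled": [0, 0, 0, 24, 24],
        "wickets": [0, 0, 0, 1, 2],
    }
)

# Aggregate per player before deriving ratios.
totals = innings.groupby("player").agg(
    runs=("runs", "sum"),
    dismissals=("dismissed", "sum"),
    balls_bowled=("balls_bowled", "sum"),
    wickets=("wickets", "sum"),
)

# Zero denominators are replaced with NaN so the ratio is undefined
# rather than infinite.
totals["batting_average"] = totals["runs"] / totals["dismissals"].replace(0, np.nan)
totals["bowling_strike_rate"] = (
    totals["balls_bowled"] / totals["wickets"].replace(0, np.nan)
)

# Min-max scaling of the batting average to the range [0, 1].
avg = totals["batting_average"]
totals["batting_average_scaled"] = (avg - avg.min()) / (avg.max() - avg.min())

print(totals)
```

Ratios are computed from the aggregated totals rather than averaged per innings, which matches how career averages are normally quoted.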
6. Validating and Verifying Data:
➔ Perform checks to ensure data integrity and accuracy after
cleaning and transformation.
➔ Use data analysis tools and visualization techniques to identify any
remaining inconsistencies or anomalies in the data.
➔ Verify the logic and functionality of custom features and
transformations implemented.
5.2.3 DATA TRANSFORMATION IN POWER QUERY:

1. Importing the Data:
➔ Load the cleaned and transformed data from Python into Power BI
Desktop through Power Query.
➔ Utilize Power BI's data import features to connect to the data
source or upload the data.
2. Transforming Data Types:
➔ Ensure data types in Power Query are compatible with Power BI
visualizations and calculations.
➔ Use Power Query's data transformation features to convert data
types for specific columns.
➔ For example, convert text data to date/time format for analyzing
time-based trends.
3. Creating Calculated Columns:

➔ Utilize Power Query's formula language (M) to create new
calculated columns based on existing data.
➔ This allows for:
o Performing complex calculations on multiple columns.
o Applying conditional logic to create new categories or
groups.
o Deriving additional features relevant to the analysis.
o Enhancing data preparation for creating insightful
visualizations in Power BI.
4. Pivot and Reshape Data:

➔ Transform the data layout to a suitable format for desired
visualizations and analysis.
➔ Utilize Power Query's pivot and unpivot features to reshape the data
based on specific columns and criteria.
➔ This enables efficient data aggregation and analysis for different
dimensions and perspectives.
5. Grouping Data:
➔ Group data based on specific criteria to perform aggregate
calculations and analysis.
➔ Power Query's Group By function allows grouping by one or
more columns and calculating various aggregate functions like
sum, average, or count.
➔ This helps identify trends and patterns within different groups of
players, teams, or matches.
6. Cleaning and Filtering Data:

➔ Perform additional cleaning and filtering steps to refine data for
specific analysis needs.
➔ Power Query provides various filter functions to remove unwanted
data points based on specific conditions.
➔ This further enhances data quality and ensures focused analysis on
relevant information.
5.2.4 DATA MODELING AND BUILDING PARAMETERS USING DAX:

1. Defining Relationships:
➔ Establish relationships between tables loaded into Power BI to
enable data analysis across multiple datasets.
➔ Use Power BI's relationship manager to identify and define
relationships based on common key columns.
➔ Ensure accurate data representation and avoid ambiguous joins that
could affect analysis results.
2. Creating Measures:
➔ Utilize DAX language to define calculated measures for specific
analysis needs.
➔ This allows for:
o Calculating advanced metrics like strike rate, bowling
economy, or win percentage.
o Creating custom performance indicators based on specific
criteria.
o Analyzing relationships and correlations between different
variables.
o Enhancing the depth and insights derived from the data.
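The arithmetic behind the metrics named above is sketched here in Python rather than DAX, purely to make the formulas explicit. The function names are hypothetical, and the definitions assumed are the conventional ones: batting strike rate as runs per 100 balls faced, bowling economy as runs conceded per six-ball over, and win percentage as wins over matches played.

```python
def batting_strike_rate(runs, balls_faced):
    """Runs scored per 100 balls faced; None when no balls were faced."""
    return 100.0 * runs / balls_faced if balls_faced else None


def bowling_economy(runs_conceded, balls_bowled):
    """Runs conceded per six-ball over; None when no balls were bowled."""
    overs = balls_bowled / 6
    return runs_conceded / overs if overs else None


def win_percentage(wins, matches_played):
    """Share of matches won, as a percentage; None when no matches played."""
    return 100.0 * wins / matches_played if matches_played else None


print(batting_strike_rate(45, 30))
print(bowling_economy(32, 24))
print(win_percentage(7, 10))
```

An equivalent DAX measure would divide two aggregated columns and guard against an empty denominator in the same way. Conventions for win percentage vary, for example in how abandoned matches are counted, so the definition used should be documented with the model.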
3. Building Parameters:
➔ Define parameters in DAX to allow users to dynamically filter and
customize visualizations based on their preferences.
➔ This enables interactive exploration of the data and facilitates user-
driven insights.
➔ Parameters can be based on specific filters, date ranges, or other
criteria.
4. Testing and Refining Model:
➔ Test the data model for accuracy and performance, and ensure that the
desired calculations and measures function correctly.
➔ Utilize Power BI's testing tools and data validation capabilities to
identify any errors or inconsistencies.
➔ Refine the model based on testing results and optimize performance
for efficient analysis.
5. Documenting the Model:
➔ Create documentation for the data model, including:
o Descriptions of tables, columns, and relationships.
o Definitions of calculated measures and parameters.
o Explanation of cleaning and transformation procedures.
o Assumptions and limitations of the data and analysis.
➔ This documentation ensures clarity, transparency, and
reproducibility of the analysis for future reference and
collaboration.
5.2.5 BUILDING A COMPREHENSIVE DASHBOARD:

➔ Utilize Power BI's visual authoring features to create insightful and
engaging dashboards for data exploration and communication.
➔ Choose appropriate visualizations like charts, graphs, maps, and tables
to represent different aspects of the data effectively.
➔ Design the dashboard layout for optimal information flow and user
interaction.
➔ Incorporate filters, slicers, and parameters to allow users to explore the
data and discover insights.
➔ Add annotations, titles, and descriptions to provide context and clarity
for the visualizations.
➔ Share the dashboard with stakeholders for effective communication and
informed decision-making.
5.2.6 CONCLUSION:
This detailed methodology provides a comprehensive framework for
collecting, cleaning, transforming, and analyzing cricket data. By
adhering to these steps, you can build robust data models, create
insightful visualizations, and derive valuable insights into player
performance, team dynamics, and match outcomes.
5.3 TECHNOLOGY USED:

5.3.1 PYTHON
Fig 5.1: Python Logo
Python is a versatile and powerful programming language widely used in data
analysis and visualization. Its simplicity, readability, and extensive libraries
make it a popular choice for beginners and seasoned professionals alike. This
section outlines the key features of Python that make it a valuable tool for
data analysis and visualization projects.
What Makes Python Ideal for Data Analysis:
➔ Simplicity and Readability: Python's syntax is clear and concise, making
it easier to learn and write compared to other languages like C++ or
Java. This allows analysts to focus on the data itself rather than struggling
with complex syntax.
➔ Extensive Libraries: Python boasts a rich ecosystem of libraries
specifically designed for data analysis and manipulation. These
libraries, such as Pandas, NumPy, and Scikit-learn, offer powerful
functions and algorithms for various tasks, including data
cleaning, exploration, transformation, and modeling.
➔ Flexibility and Scalability: Python is a versatile language that can handle
various data sizes and formats. From small datasets to massive Big Data
projects, Python can adapt and scale efficiently, making it suitable for
diverse analytical needs.
➔ Interactive Environment: Python allows for interactive data exploration
and visualization through tools like Jupyter Notebook and IPython. This
interactive nature enables analysts to experiment with code and visualize
results in real-time, facilitating a more dynamic and iterative analysis
process.
➔ Open-source and Community Support: Python is an open-source language
with a large and supportive community. This means that free
resources, tutorials, and libraries are readily available online, making it
easier to find help and solve problems.
Benefits of using Python for Data Analysis:

➔ Increased Efficiency: Python's simplicity and libraries automate many
repetitive tasks, allowing analysts to focus on more complex problems.
➔ Improved Accuracy: Python libraries are well-tested and
maintained, reducing the risk of errors compared to manual data
manipulation.
➔ Enhanced Insights: Powerful visualizations and interactive analysis tools
enable deeper exploration of data and discovery of hidden patterns.
➔ Wider Collaboration: Python's popularity and open-source nature
facilitate collaboration with other analysts and researchers.
Examples of Data Analysis Applications:

➔ Finance: Analyzing stock market trends, building financial models, and
forecasting economic indicators.
➔ Healthcare: Identifying risk factors for diseases, analyzing patient
data, and developing predictive models.
➔ Marketing: Understanding customer behavior, segmenting target
audiences, and optimizing marketing campaigns.
➔ Social Sciences: Analyzing survey data, studying social interactions, and
exploring trends in social media.
➔ Science and Engineering: Analyzing experimental data, modeling
scientific phenomena, and designing simulations.
5.3.2 PANDAS
Fig 5.2: Pandas
Pandas is a software library built on top of the NumPy library. It provides
high-level data structures like DataFrames and Series, which are specifically
designed for efficient storage, manipulation, and analysis of large datasets.
Key Features of Pandas:

➔ Data Structures: Pandas offers two primary data structures:
o DataFrames: Two-dimensional, size-mutable, tabular data
structures with labeled rows and columns. Ideal for storing and
analyzing relational data.
o Series: One-dimensional labeled arrays used to
represent sequences of data. Useful for storing and analyzing time
series data or categorical data.
➔ Data Manipulation: Pandas provides a rich set of functions and methods
for manipulating data, including:
o Data cleaning and preprocessing
o Indexing, selection, and filtering
o Aggregation and group-by operations
o Data merging and joining
o Missing value handling
o Data transformation and feature engineering
➔ Data Analysis: Pandas integrates seamlessly with NumPy and other
scientific libraries to perform various data analysis tasks, including:
o Descriptive statistics and data exploration
o Time series analysis
o Statistical hypothesis testing
o Data visualization
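The following sketch shows the two data structures and a few of the operations listed above. All names and values are hypothetical and serve only to illustrate the API.

```python
import pandas as pd

# A Series: one-dimensional data with an index of labels.
scores = pd.Series([45, 12, 30], index=["A", "B", "C"], name="runs")

# A DataFrame: a table with labeled columns.
df = pd.DataFrame(
    {
        "player": ["A", "B", "C", "D"],
        "team": ["X", "X", "Y", "Y"],
        "runs": [45, 12, 30, 8],
    }
)

# Selection and filtering with a boolean mask.
top_scorers = df[df["runs"] >= 30]

# Aggregation with group-by: total and mean runs per team.
team_summary = df.groupby("team")["runs"].agg(["sum", "mean"])

# Descriptive statistics for a numeric column.
runs_summary = df["runs"].describe()

print(scores["B"])
print(top_scorers)
print(team_summary)
```

Label-based access such as `scores["B"]` is what distinguishes a Series from a plain NumPy array.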
Benefits of using Pandas:

▪ Efficiency: Pandas' built-in functions and methods automate many
repetitive data manipulation tasks, making data analysis faster and
more efficient.
▪ Flexibility: Pandas' data structures are flexible and
adaptable, allowing you to work with various data formats and sizes.
▪ Readability: Pandas code is generally concise and easy to
understand, making it easier to collaborate with other analysts.
▪ Visualization Integration: Pandas integrates seamlessly with
libraries like Matplotlib and Seaborn for creating informative and
visually appealing data visualizations.
▪ Large Community and Resources: Pandas boasts a large and active
community, providing extensive documentation, tutorials, and
resources for learning and troubleshooting.
Examples of Pandas Applications:
• Financial Analysis: Analyzing stock market data, calculating financial
ratios, and building trading strategies.
• Healthcare: Analyzing patient data, identifying risk factors for
diseases, and developing treatment plans.
• Marketing: Analyzing customer behavior, segmenting target
audiences, and optimizing marketing campaigns.
• Social Sciences: Analyzing survey data, studying social
interactions, and exploring trends in social media.
• Science and Engineering: Analyzing experimental data, modeling
scientific phenomena, and designing simulations.
5.3.3 POWER QUERY
Fig 5.3: Power Query

Power Query is a data transformation and preparation engine available in
Microsoft Excel, Power BI, and other Microsoft products. It allows users to
import data from various sources, clean and transform it, and then load it
into Excel or the Power BI data model for further analysis and visualization.
Power Query utilizes a visual query editor, making data manipulation
intuitive and accessible even for users with limited coding experience.
Key Features of Power Query:

➔ Data Import: Power Query connects to a wide range of data
sources, including Excel files, text files, databases, and web APIs.
➔ Data Transformation: It offers a variety of functions and operators
for cleaning, transforming, and shaping data. These include
filtering, sorting, merging, splitting, and adding custom logic.
➔ Visual Query Editor: The visual query editor provides a user-
friendly interface for building complex data transformation steps
without writing code.
➔ Data Modeling: Power Query prepares tables for the data model and can
combine datasets through merge and append operations; relationships
between the loaded tables are then defined in the Power BI or Excel
data model. This facilitates multi-dimensional analysis and simplifies
complex data exploration.
➔ M Language: For advanced users, Power Query offers the M
language, which provides a powerful scripting environment for
creating custom functions and automating complex tasks.
➔ Integration with Excel: The transformed data seamlessly integrates
with Excel spreadsheets, enabling users to leverage Excel's familiar
functionalities for further analysis and visualization.
Benefits of using Power Query:

➔ Increased Efficiency: Power Query automates repetitive data
manipulation tasks, saving time and effort compared to manual data
cleaning and transformation.
➔ Improved Accuracy: The visual query editor reduces the risk of
errors compared to manual coding, leading to more reliable data
analysis.
➔ Enhanced Data Quality: Power Query's cleaning and transformation
tools improve data consistency and usability, leading to more accurate
and actionable insights.
➔ Simplified Data Analysis: The visual query editor makes complex
data manipulation tasks accessible to users of all skill
levels, promoting greater data literacy and analysis.
➔ Flexible and Scalable: Power Query can handle various data sizes
and formats, making it suitable for projects of all scales.
Examples of Power Query Applications:

➔ Finance: Transforming financial data for analysis, building financial
models, and creating custom reports.
➔ Marketing: Cleaning and combining customer data from different
sources, segmenting target audiences, and analyzing marketing
campaign performance.
➔ Operations: Analyzing operational data to identify
inefficiencies, improve processes, and optimize resource allocation.
➔ Human Resources: Transforming employee data for
analysis, calculating HR metrics, and creating custom dashboards.
➔ Sales: Analyzing sales data to identify trends, forecast sales, and
improve customer service.
5.3.4 DAX
Data Analysis Expressions (DAX) is a powerful formula language used
within Microsoft Power BI, Power Pivot in Excel, and SQL Server Analysis
Services. It enables users to create calculated columns, measures, and
tables, unlocking deeper insights from their data. This section outlines the
key features and benefits of DAX, highlighting its capabilities and its impact
on data analysis projects.
DAX is a formula language built on top of the Tabular Model in Power BI. It
allows users to perform calculations, aggregations, and logical operations on
data within the model. Unlike traditional Excel formulas, DAX is optimized
for working with large datasets and complex data models.
Key Features of DAX:

➔ Calculated Columns: DAX formulas can be used to create new
columns within your data model based on existing data and
calculations. This allows you to manipulate and transform data to
meet your specific analysis needs.
➔ Measures: DAX measures are dynamic calculations that can be used
in visualizations and reports to summarize and analyze data. Measures
update automatically when data changes, ensuring your analysis is
always up-to-date.
➔ Time Intelligence Functions: DAX offers a powerful set of time
intelligence functions that enable you to analyze and visualize
trends, patterns, and seasonality in your data over time.
➔ Relationships and Filters: DAX formulas can leverage relationships
between tables and apply filters to specific data subsets, enabling you
to focus your analysis on relevant information.
➔ Logical and Conditional Operations: DAX supports various logical
and conditional operators, allowing you to build complex formulas
that perform different calculations based on specific conditions within
your data.
➔ Integration with Other Tools: DAX integrates seamlessly with other
Power BI features and tools, such as Power Query and Power
Pivot, enabling you to create a complete data analysis workflow.
Benefits of using DAX:

➔ Advanced Data Analysis: DAX unlocks deeper insights from your
data by enabling complex calculations and aggregations that are not
possible with traditional formulas.
➔ Dynamic and Flexible Measures: DAX measures update
automatically, ensuring your analysis reflects the latest data changes
and eliminating the need for manual recalculations.
➔ Data Exploration and Visualization: DAX empowers you to create
insightful and interactive visualizations that effectively communicate
data trends and patterns.
➔ Increased Efficiency: DAX automates repetitive tasks and
calculations, reducing manual effort and saving time.
➔ Improved Decision Making: By unlocking deeper insights from your
data, DAX empowers users to make data-driven decisions based on
reliable and accurate information.
Examples of DAX Applications:

➔ Finance: Calculating financial ratios, forecasting sales, and analyzing
profitability trends.
➔ Marketing: Analyzing campaign performance, identifying customer
segments, and measuring marketing ROI.
➔ Operations: Monitoring key performance indicators, identifying
bottlenecks, and optimizing business processes.
➔ Human Resources: Analyzing employee data, calculating HR
metrics, and predicting workforce trends.
➔ Sales: Analyzing sales performance, forecasting future sales, and
identifying customer churn risks.
5.3.5 POWER BI
Fig 5.4: Power BI
Power BI is a cloud-based business intelligence (BI) platform developed by
Microsoft. It empowers users to connect to disparate data sources, visualize
their data through interactive dashboards and reports, and share their insights
with others. In this report, we delve into the key features and capabilities of
Power BI, highlighting its benefits and potential applications for data-driven
decision making.
Key Features of Power BI:

➔ Data Connectivity: Power BI connects to a wide range of data
sources, including relational databases, cloud services, Excel
spreadsheets, and even social media platforms. This allows users to
integrate all their relevant data into a single platform for holistic
analysis.
➔ Data Visualization: Power BI offers a rich collection of built-in
visualizations, including charts, graphs, maps, and custom visuals. These
visualizations enable users to explore their data in interactive and
visually appealing ways, facilitating deeper understanding and faster
decision making.
➔ Data Analysis: Power BI provides powerful data analysis
tools, including calculated columns, measures, and DAX
functions. These tools allow users to perform complex
calculations, aggregations, and logical operations on their data to
uncover hidden patterns and trends.
➔ Interactive Dashboards: Power BI facilitates the creation of interactive
dashboards that present key insights at a glance. Users can filter
data, drill down into details, and customize dashboards to meet their
specific needs.
➔ Collaborative Sharing: Power BI allows users to easily share their
dashboards and reports with others within their organization. This
enables collaboration and promotes informed decision making across
teams.
➔ Security and Governance: Power BI offers robust security features that
enable organizations to control access to data and ensure compliance
with regulatory requirements.
➔ Scalability and Flexibility: Power BI is built on a scalable cloud
architecture that can handle large datasets and complex analytical
workloads. It also offers flexible deployment options, including on-
premises, cloud, and hybrid deployments.
Benefits of using Power BI:

➔ Improved decision making: By providing clear and actionable
insights, Power BI empowers users to make data-driven decisions that
improve business outcomes.
➔ Increased efficiency: Power BI automates many tasks associated with
data analysis, saving users time and effort.
➔ Enhanced collaboration: Power BI facilitates collaboration by
allowing users to share insights and work together on data-driven
projects.
➔ Better communication: Power BI's interactive visualizations
effectively communicate complex data to stakeholders at all levels.
➔ Reduced costs: Power BI offers a cost-effective solution for data
analysis and visualization, eliminating the need for expensive BI tools
and consultants.
Applications of Power BI:

➔ Finance: Analyzing financial performance, identifying trends, and
forecasting financial results.
➔ Marketing: Measuring campaign effectiveness, understanding customer
behavior, and targeting marketing efforts.
➔ Operations: Optimizing processes, identifying bottlenecks, and
improving efficiency.
➔ Human Resources: Tracking employee performance, identifying talent
gaps, and developing training programs.
➔ Sales: Analyzing sales performance, identifying opportunities, and
forecasting future sales.
5.3.6 BEAUTIFULSOUP4
Fig 5.5: BeautifulSoup4
Beautiful Soup is an open-source Python library designed to parse and extract
data from HTML and XML documents. It provides a user-friendly interface
and a variety of functionalities for navigating and manipulating document
structures, making it a popular choice for web scraping projects.
Key Features of Beautiful Soup:
➔ Parsing: Beautiful Soup parses HTML and XML documents into tree-
like structures, allowing you to navigate and access elements based on
their tags, attributes, and relationships.
➔ Selection: You can easily select specific elements within the document
using various methods based on tags, classes, IDs, attributes, and text
content.
➔ Extraction: Beautiful Soup facilitates the extraction of specific data
points from the parsed document, such as text content, attribute
values, and table data.
➔ Navigation: You can efficiently navigate the document structure by
traversing through parent-child relationships, siblings, and descendants
of elements.
➔ Modification: Beautiful Soup allows you to modify the parsed
document structure by adding, removing, or editing elements and
attributes.
➔ Customization: You can customize the parsing process and extraction
behavior by defining filters, regular expressions, and custom functions.
➔ Integrations: Beautiful Soup integrates seamlessly with other Python
libraries, such as Requests and Selenium, for more complex web
scraping tasks.
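Several of the features above (parsing, selection, extraction, navigation, and modification) are shown in the short sketch below. The HTML fragment, class names, and `data-stat` attribute are hypothetical and were written for this example only.

```python
from bs4 import BeautifulSoup

# Hypothetical profile markup (not taken from any real website).
html = """
<div id="profile">
  <h1 class="name">Player A</h1>
  <ul class="stats">
    <li data-stat="matches">10</li>
    <li data-stat="runs">450</li>
  </ul>
</div>
"""

# Parsing: build a navigable tree using Python's built-in HTML parser.
soup = BeautifulSoup(html, "html.parser")

# Selection and extraction: by tag and class, and by CSS selector.
name = soup.find("h1", class_="name").get_text(strip=True)
stats = {li["data-stat"]: int(li.get_text()) for li in soup.select("ul.stats li")}

# Navigation: move to the parent and to the next sibling of an element.
first_item = soup.find("li")
parent_classes = first_item.parent["class"]
next_item = first_item.find_next_sibling("li")

# Modification: change an attribute in the parsed tree.
soup.find("h1")["class"] = ["name", "highlight"]

print(name, stats)
print(parent_classes, next_item["data-stat"])
```

The `class` attribute is returned as a list because HTML allows several class names on one element.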
Benefits of using Beautiful Soup:

➔ Ease of Use: Beautiful Soup provides a simple and intuitive
interface, making it accessible for users of all skill levels.
➔ Versatility: It supports parsing both HTML and XML
documents, offering flexibility for various web scraping tasks.
➔ Efficiency: Beautiful Soup streamlines the web scraping
process, automating repetitive tasks and saving time.
➔ Scalability: It can parse large and complex pages, with speed depending
on the underlying parser (for example html.parser or lxml), making it
suitable for sizable scraping jobs.
➔ Open-source: Beautiful Soup is an open-source library, freely
available for use and modification.
5.3.7 NumPy
Fig 5.6: NumPy
NumPy, short for Numerical Python, is a fundamental open-source library for
scientific computing in Python. It provides powerful tools for working with
multidimensional arrays and matrices, facilitating efficient computations and
data analysis tasks. This section outlines the key features and capabilities of
NumPy, highlighting its benefits and its applications in scientific and
engineering domains.
NumPy provides an efficient implementation of multidimensional
(n-dimensional) arrays and matrices. Operations on these arrays are
vectorized and support broadcasting, and the library includes routines for
linear algebra.
Key Features of NumPy:
➔ Multidimensional Arrays: NumPy provides the ndarray data
type, which represents multidimensional arrays with efficient storage
and manipulation capabilities.
➔ Mathematical Operations: NumPy offers a large collection of built-in
functions for element-wise arithmetic (addition, subtraction,
multiplication, division) as well as trigonometric, exponential, and
statistical operations.
➔ Vectorization: NumPy allows for vectorized operations, where a
single operation is applied to all elements in an array
simultaneously, leading to significant performance gains compared to
traditional loops.
➔ Broadcasting: NumPy automatically broadcasts arrays of different
shapes to facilitate operations between them, eliminating the need for
explicit looping and reshaping.
➔ Linear Algebra: NumPy includes a comprehensive set of functions
for performing linear algebra operations like matrix
multiplication, inversion, and eigenvalue decomposition.
➔ Random Number Generation: NumPy provides functions for
generating random numbers from various distributions like
uniform, normal, and binomial, facilitating simulations and statistical
analysis.
➔ Integration with Other Libraries: NumPy seamlessly integrates with
other scientific libraries like Scikit-learn, matplotlib, and
Pandas, enabling collaborative data analysis and visualization.
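The vectorization, broadcasting, and linear algebra features listed above can be sketched briefly. The runs and balls values are made-up numbers chosen for illustration, not figures from the project's dataset, and the 2x2 system is an arbitrary example.

```python
import numpy as np

# Hypothetical innings data: runs scored and balls faced by four batters.
runs = np.array([45, 12, 80, 30])
balls = np.array([30, 15, 40, 25])

# Vectorization: one expression computes every strike rate without a loop.
strike_rate = runs / balls * 100

# Broadcasting: the scalar mean is stretched across the whole array.
deviation = strike_rate - strike_rate.mean()

# Linear algebra: solve the system 2x + y = 3, x + 3y = 5.
a = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
solution = np.linalg.solve(a, b)
```

The same strike-rate expression works unchanged whether the arrays hold four values or four million, which is the practical benefit of avoiding explicit loops.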
Benefits of using NumPy:

➔ Efficiency: NumPy's optimized data structures and functions
significantly improve the performance of scientific computations
compared to native Python data types.
➔ Conciseness: NumPy's vectorized operations and broadcasting
capabilities enable concise and expressive code, reducing the need for
explicit loops and complex logic.
➔ Numerical Stability: NumPy provides fixed-width numeric types, such as
64-bit floating point, and relies on well-tested numerical routines,
which helps keep calculations accurate and reproducible.
➔ Interoperability: NumPy integrates seamlessly with other scientific
libraries, creating a comprehensive ecosystem for data analysis and
scientific computing.
➔ Ease of Use: NumPy's intuitive API and comprehensive
documentation make it relatively easy to learn and use even for
beginners in Python programming.
5.3.8 Matplotlib
Fig 5.7: Matplotlib
Matplotlib is a fundamental and widely used Python library for creating
publication-quality data visualizations. Its versatility, ease of use, and
extensive functionalities make it a vital tool for researchers, analysts, and
anyone who wants to effectively communicate their data insights. This report
delves into the key features, applications, and benefits of Matplotlib,
highlighting its role in various scientific and data-driven fields.
Matplotlib is an open-source Python library that empowers users to create a
wide range of data visualizations, from simple line graphs and bar charts to
complex heatmaps and scatter plots. It provides a comprehensive API with
various functions and options for customizing plots, annotations, and styles.
Key Features of Matplotlib:

➔ Extensive Plot Types: Matplotlib offers a vast collection of built-in
plot types, including line plots, bar charts, scatter
plots, histograms, boxplots, heatmaps, and contour plots, enabling the
visualization of diverse data forms.
➔ Customization Options: Users can fine-tune virtually every aspect of
their visualizations, including plot
styles, colors, labels, ticks, legends, grid lines, and annotations.
➔ Subplots and Figure Management: Matplotlib allows for creating
multiple plots within a single figure and customizing their layout and
spacing for efficient data presentation.
➔ Interactivity: Matplotlib provides tools for creating interactive
visualizations with functionalities like mouse-over
events, zooming, and panning, enhancing user engagement and
exploration.
➔ Integration with Other Libraries: Matplotlib seamlessly integrates
with other scientific libraries like NumPy, Pandas, and SciPy, enabling
a comprehensive data analysis and visualization workflow.
➔ Cross-platform Compatibility: Matplotlib visualizations are
platform-independent and can be exported to various
formats, including PNG, JPEG, PDF, and SVG, for further sharing and
integration.
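As a minimal sketch of the plotting, customization, and export features listed above, the following draws a scatter plot of batting average against strike rate and writes it to an in-memory PNG. The player names and figures are placeholders, not values from the project's dataset, and the non-interactive Agg backend is chosen only so that the example runs without a display.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt

# Hypothetical batting figures (not taken from the project's dataset).
players = ["Player A", "Player B", "Player C"]
batting_avg = [42.0, 31.5, 27.0]
strike_rate = [138.0, 155.0, 121.0]

fig, ax = plt.subplots(figsize=(5, 4))
ax.scatter(batting_avg, strike_rate)

# Label each point with the player's name, offset slightly from the marker.
for name, x, y in zip(players, batting_avg, strike_rate):
    ax.annotate(name, (x, y), textcoords="offset points", xytext=(5, 5))

ax.set_xlabel("Batting average")
ax.set_ylabel("Strike rate")
ax.set_title("Batting average vs strike rate")
ax.grid(True)

# Export to PNG in memory; passing a file name instead would write to disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
png_bytes = buffer.getvalue()
plt.close(fig)
```

Changing the format argument to "pdf" or "svg" produces the other export formats mentioned above.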
Benefits of using Matplotlib:
➔ Versatility: Matplotlib's extensive plot types and customization
options make it suitable for visualizing a wide range of data and
conveying diverse information.
➔ Ease of Use: The library offers a user-friendly API and comprehensive
documentation, allowing beginners and experienced users alike to
create effective visualizations.
➔ Large Community and Resource Base: Matplotlib boasts a vast and
active community, providing extensive documentation, tutorials, and
examples to support users at all skill levels.
➔ Open-source and Free: Being open-source and freely
available, Matplotlib eliminates licensing costs and enables
accessibility for a broad range of users.
➔ Publication-quality Output: Matplotlib generates high-resolution and
publication-ready visualizations, making it ideal for scientific
papers, presentations, and reports.
5.4 DATABASE MANAGEMENT:

Cricket, a sport with a rich history and passionate following, generates a
wealth of data that can be analyzed to gain valuable insights. This data
can be used to improve player performance, predict outcomes, and
enhance fan engagement. However, effectively utilizing this data requires
a robust and well-managed database system.
Importance of Database Management in Cricket Data Analysis
A properly managed database plays a crucial role in ensuring the success
of any cricket data analysis project. It provides a structured and organized
way to store, retrieve, and analyze large amounts of data. By efficiently
managing the database, analysts can:
1. Improve Data Quality: A well-designed database can enforce data
integrity constraints, ensuring data accuracy and consistency. This is
essential for drawing reliable conclusions from the analysis.
2. Enhance Data Accessibility: A centralized database makes it easy
for authorized users to access and analyze data from different sources.
This facilitates collaboration and knowledge sharing among team
members.
3. Streamline Data Analysis: Effective database management reduces
the time and effort required to process and analyze data. This allows
analysts to focus on extracting insights and generating valuable
reports.
4. Enable Scalability: As the volume and variety of cricket data
continue to grow, a scalable database is crucial. This allows the
system to accommodate new data sources and analysis requirements
without compromising performance.
Database Design Considerations for Cricket Data Analysis

Several factors need to be considered when designing a database for
cricket data analysis:
1. Data Model: The data model defines the structure of the database,
including the entities, relationships, and attributes. Choosing the right
data model, such as relational or NoSQL, depends on the specific
needs of the project and the type of data being analyzed.
2. Data Standardization: Data standardization ensures consistency
across different data sources, facilitating efficient analysis. This
involves defining common formats and units for data elements.
3. Data Security: Protecting sensitive data is crucial. This requires
implementing appropriate security measures, such as access control
and encryption, to prevent unauthorized access and data breaches.
4. Data Backup and Recovery: Having a robust data backup and
recovery plan is essential to ensure data availability in case of system
failures or disasters.
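The data model and integrity points above can be sketched with Python's built-in sqlite3 module. The table and column names (players, batting_summary, match_id and so on) are assumptions chosen for illustration and do not describe the project's actual storage; the sketch only shows how a relational schema can reject bad rows through PRIMARY KEY, FOREIGN KEY, and CHECK constraints.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Hypothetical schema: one row per player, one row per batter per match.
conn.executescript("""
CREATE TABLE players (
    name         TEXT PRIMARY KEY,
    team         TEXT NOT NULL,
    playing_role TEXT
);
CREATE TABLE batting_summary (
    match_id     TEXT    NOT NULL,
    batsman_name TEXT    NOT NULL REFERENCES players(name),
    runs         INTEGER NOT NULL CHECK (runs >= 0),
    balls        INTEGER NOT NULL CHECK (balls >= 0),
    PRIMARY KEY (match_id, batsman_name)
);
""")

conn.execute("INSERT INTO players VALUES (?, ?, ?)", ("Player A", "Team X", "Opener"))
conn.execute("INSERT INTO batting_summary VALUES (?, ?, ?, ?)", ("M1", "Player A", 45, 30))
conn.commit()


def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO batting_summary VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


negative_runs_ok = try_insert(("M2", "Player A", -5, 10))   # violates CHECK
unknown_player_ok = try_insert(("M2", "Nobody", 10, 10))    # violates FOREIGN KEY

row_count = conn.execute("SELECT COUNT(*) FROM batting_summary").fetchone()[0]
strike_rate = conn.execute(
    "SELECT ROUND(runs * 100.0 / balls, 1) FROM batting_summary WHERE match_id = 'M1'"
).fetchone()[0]
```

Both invalid rows are refused by the database itself, so downstream analysis never sees them.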
Popular Database Technologies for Cricket Data Analysis
Several database technologies are commonly used for cricket data
analysis, each with its own strengths and weaknesses:
1. Relational Databases: These databases are well-established and
offer strong data consistency and integrity. Popular options include
MySQL, PostgreSQL, and Oracle Database.
2. NoSQL Databases: These databases are designed for handling large
amounts of unstructured data and can be more efficient in scaling up.
Examples include MongoDB, Cassandra, and HBase.
3. Cloud Databases: Cloud-based databases offer scalability,
flexibility, and cost-effectiveness. Amazon Redshift, Microsoft Azure
SQL Database, and Google Cloud SQL are popular choices.
Best Practices for Database Management in Cricket Data Analysis Projects
Here are some best practices for effectively managing databases in cricket
data analysis projects:
1. Define Clear Data Governance: Establish clear guidelines for data
ownership, access, and usage to ensure data quality and compliance.
2. Implement Data Quality Checks: Regularly perform data cleansing
and validation processes to remove errors and inconsistencies.
3. Optimize Database Performance: Regularly monitor database
performance and implement appropriate tuning strategies to ensure
efficient data retrieval and analysis.
4. Automate Data Pipelines: Automate data ingestion, processing, and
analysis tasks to improve efficiency and reduce manual intervention.
5. Utilize Data Visualization Tools: Leverage data visualization tools
to gain a deeper understanding of trends and patterns in the data.
6. Continuously Monitor and Review: Regularly review the database
structure and processes to ensure they are still aligned with the
project's evolving needs.
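The data quality checks in point 2 can be sketched as a small validation pass over scraped batting rows. The field names (match, batsmanName, runs, balls) follow the scraper output in Appendix B; the function name validate_batting_rows, the rejection reasons, and the sample rows are assumptions made for illustration.

```python
def validate_batting_rows(rows):
    """Split rows into clean records and a list of (row, reason) rejects."""
    clean, rejected, seen = [], [], set()
    for row in rows:
        key = (row.get("match"), row.get("batsmanName"))
        if not all(key):
            rejected.append((row, "missing match or batsman name"))
            continue
        if key in seen:
            rejected.append((row, "duplicate record"))
            continue
        try:
            runs, balls = int(row["runs"]), int(row["balls"])
        except (KeyError, ValueError, TypeError):
            rejected.append((row, "non-numeric runs or balls"))
            continue
        if runs < 0 or balls < 0:
            rejected.append((row, "negative runs or balls"))
            continue
        seen.add(key)
        # Store the numeric fields as integers so later arithmetic is safe.
        clean.append({**row, "runs": runs, "balls": balls})
    return clean, rejected


# Hypothetical sample rows covering each kind of problem.
sample = [
    {"match": "Team A Vs Team B", "batsmanName": "Player A", "runs": "45", "balls": "30"},
    {"match": "Team A Vs Team B", "batsmanName": "Player A", "runs": "45", "balls": "30"},
    {"match": "Team A Vs Team B", "batsmanName": "Player B", "runs": "-", "balls": "0"},
    {"match": "", "batsmanName": "Player C", "runs": "10", "balls": "8"},
]
clean, rejected = validate_batting_rows(sample)
reasons = [reason for _, reason in rejected]
```

Keeping the rejected rows together with a reason, rather than silently dropping them, makes it easier to trace errors back to the data source.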
5.5 LIMITATIONS AND CHALLENGES:

Despite the vast amount of data available and the growing sophistication
of analytical tools, cricket data analysis still faces several limitations and
challenges.
5.5.1 LIMITATIONS:
These limitations can hinder the accuracy and effectiveness of analysis,
ultimately impacting decision-making and strategic planning.
1. Data Availability and Quality
One of the biggest challenges in cricket data analysis is the availability
and quality of data. This is particularly true for historical data, where
scoring systems and data recording methods have changed significantly
over time. Furthermore, access to granular data, such as detailed ball-by-
ball information, can be limited, especially for lower levels of the sport.
Data Availability Issues:
➔ Historical data: Scoring systems and data recording methods have
evolved significantly over time, making it difficult to compare data
across different eras.
➔ Limited access to granular data: Ball-by-ball data is often
unavailable for lower levels of cricket, hindering analysis of
individual performances and tactical nuances.
➔ Incomplete or inaccurate data: Data entry errors and
inconsistencies can significantly impact the reliability of analysis.
➔ Proprietary data: Data ownership rights can restrict access to
valuable information, limiting the scope of research and analysis.
Data Quality Issues:
➔ Inconsistent scoring: Variations in scoring across different leagues
and tournaments can make it difficult to compare performances
accurately.
➔ Missing data: Incomplete datasets can lead to biased results and
inaccurate conclusions.
➔ Unstructured data: Large amounts of unstructured data, such as
video footage and commentary, require additional processing and
analysis expertise.
2. Data Bias and Interpretation

Another significant challenge in cricket data analysis is data bias and
interpretation. Biases can arise from various factors, such as:
➔ Selection bias: The data may only represent a specific subset of
players or teams, leading to skewed results.
➔ Measurement bias: The way data is collected and recorded can
introduce bias, such as favouring certain types of players or
performances.
➔ Confirmation bias: Analysts may unconsciously interpret data in a
way that supports their existing beliefs.
3. Data Interpretation Challenges:

➔ Identifying causal relationships: Establishing cause-and-effect
relationships from observational data can be difficult due to the
complex and multifaceted nature of cricket.
➔ Overfitting models: Overly complex models can lead to
overfitting, reducing the generalizability of the results to unseen
data.
➔ Ignoring context: Failing to consider the context of the game can
lead to misinterpretations and inaccurate conclusions.
4. Ethical Considerations:
As cricket data analysis becomes increasingly sophisticated, ethical
considerations become more important. These include:
➔ Data privacy: Protecting the privacy of players and other
individuals whose data is collected and analysed is essential.
➔ Algorithmic bias: Ensuring that algorithms used for analysis are
fair and unbiased is crucial to prevent discrimination and
unintended consequences.
➔ Transparency and accountability: Analysts should be
transparent about their methods and data sources and be held
accountable for the accuracy and fairness of their results.
5.5.2 OTHER LIMITATIONS AND CHALLENGES

➔ Limited computational resources: Analysing large datasets
requires significant computational power, which can be a barrier
for researchers and analysts with limited resources.
➔ Lack of domain expertise: Analysing cricket data effectively
requires in-depth knowledge of the sport and its nuances. Analysts
without a strong cricket background may face challenges
interpreting the data and drawing accurate conclusions.
➔ Rapidly evolving game: The rules and strategies of cricket can
change quickly, making it difficult to develop models that remain
relevant over time.
5.5.3 ADDRESSING THE LIMITATIONS AND CHALLENGES
Several steps can be taken to address the limitations and challenges of
cricket data analysis:
➔ Improving data collection and recording: Standardizing data
collection methods and ensuring accurate and consistent data
recording across different leagues and levels is crucial.
➔ Developing data sharing platforms: Creating platforms for sharing
data openly and ethically can facilitate collaboration and accelerate
research.
➔ Investing in advanced analytical techniques: Utilizing machine
learning and other advanced analytical techniques can help extract
more insights from complex datasets.
➔ Promoting ethical practices: Establishing clear guidelines for data
privacy, algorithmic fairness, and transparency is essential to ensure
responsible use of data in cricket analysis.
➔ Building a community of data-driven cricket analysts: Fostering
collaboration and knowledge exchange among data scientists,
statisticians, and cricket experts can lead to more effective and
impactful analysis.
5.6 SCOPE & FUTURE ENHANCEMENTS:

Future scope and enhancements for cricket data analysis using web
scraping, Python, Pandas, and Power BI:
1. Data acquisition:
➔ Expand the data sources: Currently, web scraping may be limited
to a few websites. Consider incorporating data from more sources,
such as official cricket boards, sports news platforms, and fantasy
cricket websites.
➔ Live data streaming: Integrate live data streaming APIs to
capture real-time match details and analyze player performance in
real-time.
➔ Social media analysis: Capture and analyze social media data
related to cricket matches to understand fan sentiment and trends.
2. Data analysis and visualization:
➔ Advanced statistical analysis: Implement advanced statistical
techniques, such as predictive modeling, to forecast match
outcomes and player performance and to analyze player strengths
and weaknesses.
➔ Visualization enhancements: Utilize Power BI's advanced
visualization capabilities to create interactive
dashboards, heatmaps, and network graphs for deeper insights.
➔ Sentiment analysis: Analyze textual data from news
articles, social media, and fan forums to understand public opinion
about teams, players, and specific matches.
3. Machine learning and AI:
➔ Develop predictive models: Utilize machine learning algorithms
to predict match outcomes and player performances and to identify
potential upsets or unexpected wins.
➔ Player recommendation systems: Develop AI-powered systems
to recommend players for specific roles or team compositions
based on their past performances and playing styles.
➔ Automated data analysis: Implement machine learning
algorithms to automate data analysis tasks, saving time and
resources.
4. Additional enhancements:
➔ Interactive storytelling: Use Power BI's storytelling features to
create interactive reports that engage the audience and
communicate insights effectively.
➔ Mobile accessibility: Develop mobile-friendly dashboards and
reports for easy access and analysis on the go.
➔ Natural language interaction: Integrate natural language
processing to enable users to interact with the data and generate
insights using voice commands or natural language queries.
5. Ethical considerations:
➔ Data privacy: Ensure data is collected and used
ethically, respecting user privacy and complying with relevant data
regulations.
➔ Bias and fairness: Be aware of potential biases in data sources and
algorithms and strive to develop fair and unbiased models.
➔ Transparency and explainability: Explain the data analysis
methods and results clearly to ensure transparency and build trust
with users.
6. RESULT
This dynamic dashboard helps users visualise the performance and growth
patterns of different players.
Users can not only visualise the data but also build their own teams, either
from the performance analysis or according to their own preference. This
lets users explore how different cricket players perform.
The assessment provides personalised insights by analysing the dashboard
data and generating a percentage probability for potential players according
to their roles. The visual representation helps users understand the overall
standing of different players.
The dashboard acts as a central hub, providing users with an intuitive
interface to access the players of different teams and their performance. It
allows users to visually track the performance of various players and to
assemble a new team of their own based on those performances.
The dashboard also groups cricketers into several roles: Power Hitters,
Anchors, Finishers, Fast Bowlers and All-Rounders. A graph also shows the
batting average and strike rate.
There are places for openers, power hitters and so on, together with a graph
that shows consistency and detailed information, and a scatter plot that
shows how each player's batting average fares against their strike rate.
For example, in this plot Jos Buttler has the highest average and is also a
good striker, as he strikes at 140 plus, which is one of the selection
parameters. So we first select Jos Buttler, because he is consistent and
played decently in all the matches. Alex Hales is selected as a possible
partner. We also need a left-hand combination with a better strike rate, so
we choose Rilee Rossouw, who strikes the ball very well. Together they give
about 40 runs on average at a strike rate of 150 plus and can bat for an
average of 4 overs, which is all we need. In this way the dashboard can show
the combined performance of two players simultaneously.
Next we select three anchors. Virat Kohli is first, as he scores a lot of
runs, as seen from the scatter plot. The second is Suryakumar Yadav, because
his average is good and he strikes at 180. Based on the statistics, the fifth
player is Glenn Phillips, because his strike rate is very high.
At number six we choose Marcus Stoinis because of his batting average, and at
number seven Sikandar Raza, because he has a high strike rate and batting
average. At number eight we choose Shadab Khan, because he is consistent and
both his bowling average and his batting are good.
For the next place we choose Sam Curran, as his bowling average is very good,
and Anrich Nortje and Shaheen Shah Afridi are also selected because they are
excellent fast bowlers. The final team on the basis of this analysis is as
follows:
Total players: 11
1. Jos Buttler (wicket-keeper)
2. Rilee Rossouw (batsman)
3. Virat Kohli (batsman)
4. Suryakumar Yadav (batsman)
5. Glenn Phillips (batsman)
6. Marcus Stoinis (all-rounder)
7. Sikandar Raza (all-rounder)
8. Shadab Khan (all-rounder)
9. Sam Curran (bowler)
10. Anrich Nortje (bowler)
11. Shaheen Shah Afridi (bowler)
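The selection reasoning above, which filters players by batting average and strike rate, can be sketched in a few lines. This is an illustrative sketch and not the dashboard's actual logic: the Batter class, the shortlist function, the minimum average of 30, and the sample figures are all assumptions; only the strike-rate threshold of 140 comes from the discussion above.

```python
from dataclasses import dataclass


@dataclass
class Batter:
    name: str
    batting_avg: float
    strike_rate: float


def shortlist(batters, min_avg, min_strike_rate):
    """Keep batters meeting both thresholds, best average first."""
    picked = [
        b for b in batters
        if b.batting_avg >= min_avg and b.strike_rate >= min_strike_rate
    ]
    return sorted(picked, key=lambda b: (b.batting_avg, b.strike_rate), reverse=True)


# Hypothetical figures, not taken from the project's dataset.
batters = [
    Batter("Player A", 42.0, 145.0),
    Batter("Player B", 35.0, 132.0),
    Batter("Player C", 31.0, 160.0),
    Batter("Player D", 22.0, 170.0),
]

openers = shortlist(batters, min_avg=30, min_strike_rate=140)
opener_names = [b.name for b in openers]
```

With these sample values, Player B is dropped for a low strike rate and Player D for a low average, which mirrors the trade-off between consistency and scoring speed described above.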
We can also search for any player and see the detailed result and analysis,
which includes their batting style, playing role, bowling style, strike rate
and so on.
The dashboard analyses the overall performance of the players. Users can not
only visualise the data but also build their own teams, either from the
performance analysis or according to their own preference. This lets users
explore how different cricket players perform.
7. SUMMARY & CONCLUSIONS:
7.1 Summary
This dashboard is designed to analyze the performance of players in cricket. Its
core features combine several technologies to provide a comprehensive analysis
experience.
Cricket Data Analysis is a field that combines the sport of cricket and the use of
data analysis techniques to understand and improve performance. Cricket is a
highly competitive sport that has millions of fans worldwide, and as a result,
there is a vast amount of data generated from every match. This data provides
valuable information about the performance of teams and players that can be
used to make informed decisions.
The use of data analysis in cricket has been growing in recent years. From player
performance analysis to team strategy and tactical decision-making, the insights
generated from data analysis can greatly improve the performance of teams and
players. This project aims to analyze the performance of cricket teams and
players using statistical methods and provide insights into their strengths and
weaknesses. The data used for this analysis is collected from various
sources and includes information such as runs scored and other relevant
performance metrics.
Cricket data analysis in modern sports holds immense significance as it provides
a systematic and data-driven approach to understanding and improving player
performance, team strategies, and overall game dynamics.
This project will demonstrate the importance of data analysis in the sport of
cricket and how it can be used to gain a competitive advantage.
Cricket, beyond its athletic prowess, is a game deeply intertwined with statistics.
Every ball bowled, every run scored, and every wicket taken contributes to an
intricate tapestry of data that holds the key to understanding player performance,
team dynamics, and strategic nuances. In this project, we embark on an end-to-
end data analytics journey, leveraging web scraping, Python, Pandas, and Power
BI to unravel the mysteries hidden within the vast expanse of cricket statistics.
The dashboard acts as a central hub, providing users with an intuitive interface to
access the players of different teams and their performance. It allows users to
visually track the performance of various players and to assemble a new team of
their own based on those performances.
This dynamic dashboard helps users visualize the performance and growth
patterns of different players.
Users can not only visualize the data but also build their own teams, either
from the performance analysis or according to their own preference. This lets
users explore how different cricket players perform.
7.2 Conclusion:
In conclusion, we have built a cricket dashboard using technologies such as Power
BI, Git, Jupyter Notebook, pandas and web scraping.
The dashboard presents a complete analysis of players who fill different roles
in cricket and analyses their overall performance.
Users can not only visualize the data but also build their own teams, either
from the performance analysis or according to their own preference. This lets
users explore how different cricket players perform.
Future scope and enhancements for cricket data analysis using web scraping,
Python, Pandas, and Power BI:
➔ Live data streaming: Integrate live data streaming APIs to
capture real-time match details and analyse player performance in
real-time.
➔ Social media analysis: Capture and analyse social media data
related to cricket matches to understand fan sentiment and trends.
➔ Visualization enhancements: Utilize Power BI's advanced
visualization capabilities to create interactive
dashboards, heatmaps, and network graphs for deeper insights.
➔ Develop predictive models: Utilize machine learning algorithms
to predict match outcomes and player performances and to identify
potential upsets or unexpected wins.
➔ Player recommendation systems: Develop AI-powered systems
to recommend players for specific roles or team compositions
based on their past performances and playing styles.
➔ Interactive storytelling: Use Power BI's storytelling features to
create interactive reports that engage the audience and
communicate insights effectively.
As we move forward, continuous user feedback, data analysis and potential
expansions to the dashboard will be essential for refining and enhancing the
platform’s effectiveness.
REFERENCES:
Websites:
1. https://learn.microsoft.com/en-us/dax/
2. https://learn.microsoft.com/en-us/training/paths/dax-power-bi/
3. https://pandas.pydata.org/docs/
4. https://www.python.org/
5. https://researchgate.net/
6. https://www.tigeranalytics.com/blog/magic-off-pitch-role-data-analytics-cricket/
Books:
1. Building Dashboards and Data Stories with Power BI by Alberto Ferrari
2. Data Cleaning with Python: Practical Techniques for Converting Messy Data
into Useful Insights by Jason Grout (2 editions)
3. Extracting Cricket Data from Unstructured Text using Web Scraping and
Natural Language Processing by Shaik Abdul Raheem et al. (2022)
4. Microsoft Power BI for Beginners: A Hands-on Guide to Data Visualization
and Business Intelligence by Rob Collie
5. Power BI and Excel: The Ultimate Guide to Data Visualization and Business
Intelligence by Reza Rad
6. Python for Data Analysis, 2nd Edition by Wes McKinney
7. Web Scraping with Python: Collecting Data from the Modern Web by Ryan
Mitchell
Research Papers:
1. Cricket Score Data Analysis by Mohammed Wahaj Arif Baji, Mohammad
Minhaj Arif Baji, and MD Suhail (2023)
2. Data analysis of cricket scores: ICC Men’s T20 World Cup 2022/2023
3. The application of data analytics in cricket by Drury, J., & Collins, K. (2019)
4. The Impact of Data Analytics on Cricket Performance: A Case Study by
S. Singh et al. (2017)
5. Visualization of Cricket Data using Power BI for Performance Analysis by
K. Patel et al. (2019)
Other Resources:
1. Cricket Analytics by Game by Harsha Perea (Simon Fraser University, Fall
2015)
2. Cricket Data Analysis using Python and R: A Hands-on Tutorial by M. Khan
et al. (2016)
3. Drury, J., & Collins, K. (2019). The application of data analytics in cricket.
Sports
APPENDIX A
Report View
Model View
Dataset View
Match Summary:
Batting Summary:
Bowling Summary:
Player Information:
APPENDIX B
1. WEB SCRAPING CODE:

BATTING SUMMARY:
/* -------------- STAGE 1 ------------ */
//------- 1.a Interaction Code ------ //

navigate('https://stats.espncricinfo.com/ci/engine/records/team/match_results.html?id=14450;type=tournament');
let links = parse().matchSummaryLinks;

for(let i of links) {
next_stage({url: i})
}
//------- 1.b Parser Code ------------//

let links = []
const allRows = $('table.engineTable > tbody > tr.data1');
allRows.each((index, element) => {
const tds = $(element).find('td');
const rowURL = "https://www.espncricinfo.com" +$(tds[6]).find('a').attr('href');
links.push(rowURL);
})
return {
'matchSummaryLinks': links
};
/* -------------- STAGE 2 ------------ */
//------- 2.a Interaction Code ------ //
navigate(input.url);
collect(parse());
//------- 2.b Parser Code ------------//

// The siblings of the "Match Details" block hold the two innings headers.
var match = $('div').filter(function(){
  return $(this)
    .find('span > span > span').text() === String("Match Details")
}).siblings()
var team1 = $(match.eq(0)).find('span > span > span').text().replace(" Innings", "")
var team2 = $(match.eq(1)).find('span > span > span').text().replace(" Innings", "")
var matchInfo = team1 + ' Vs ' + team2

// One scorecard table per innings; batting rows have at least 8 cells.
var tables = $('div > table.ci-scorecard-table');
var firstInningRows = $(tables.eq(0)).find('tbody > tr').filter(function(index, element){
  return $(this).find("td").length >= 8
})
var secondInningsRows = $(tables.eq(1)).find('tbody > tr').filter(function(index, element){
  return $(this).find("td").length >= 8
});

var battingSummary = []
// First innings: team1 is batting.
firstInningRows.each((index, element) => {
  var tds = $(element).find('td');
  battingSummary.push({
    "match": matchInfo,
    "teamInnings": team1,
    "battingPos": index+1,
    "batsmanName": $(tds.eq(0)).find('a > span > span').text().replace(' ', ''),
    "dismissal": $(tds.eq(1)).find('span > span').text(),
    "runs": $(tds.eq(2)).find('strong').text(),
    "balls": $(tds.eq(3)).text(),
    "4s": $(tds.eq(5)).text(),
    "6s": $(tds.eq(6)).text(),
    "SR": $(tds.eq(7)).text()
  });
});
// Second innings: team2 is batting.
secondInningsRows.each((index, element) => {
  var tds = $(element).find('td');
  battingSummary.push({
    "match": matchInfo,
    "teamInnings": team2,
    "battingPos": index+1,
    "batsmanName": $(tds.eq(0)).find('a > span > span').text().replace(' ', ''),
    "dismissal": $(tds.eq(1)).find('span > span').text(),
    "runs": $(tds.eq(2)).find('strong').text(),
    "balls": $(tds.eq(3)).text(),
    "4s": $(tds.eq(5)).text(),
    "6s": $(tds.eq(6)).text(),
    "SR": $(tds.eq(7)).text()
  });
});
return {"battingSummary": battingSummary}
BOWLING SUMMARY:
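/* -------------- STAGE 1 ------------ */
//------- 1.a Interaction Code ------ //
navigate('https://stats.espncricinfo.com/ci/engine/records/team/match_results.html?id=14450;type=tournament');
let links = parse().playersLinks;
for(let i of links) {
  next_stage({url: i})
}
//------- 1.b Parser Code ------------//
let links = []
const allRows = $('table.engineTable > tbody > tr.data1');
allRows.each((index, element) => {
  const tds = $(element).find('td');
  const rowURL = "https://www.espncricinfo.com" + $(tds[6]).find('a').attr('href');
  links.push(rowURL);
})
return {
  'playersLinks': links
};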
/* -------------- STAGE 2 ------------ */

collect(parse());
//---------- 2.b Parser Code ---------//

return $(this)
}).siblings()
matchInfo = team1 + ' Vs ' + team2
var tables = $('div > table.ds-table');

})

});
var bowlingSummary = []
bowlingSummary.push({
"match": matchInfo,
"bowlingTeam": team2,
"bowlerName": $(tds.eq(0)).find('a > span').text().replace(' ', ''),
"overs": $(tds.eq(1)).text(),
"maiden": $(tds.eq(2)).text(),
"runs": $(tds.eq(3)).text(),
"wickets": $(tds.eq(4)).text(),
"economy": $(tds.eq(5)).text(),
"0s": $(tds.eq(6)).text(),
"4s": $(tds.eq(7)).text(),
"6s": $(tds.eq(8)).text(),
"wides": $(tds.eq(9)).text(),
"noBalls": $(tds.eq(10)).text()
});
});

bowlingSummary.push({
"match": matchInfo,
"bowlingTeam": team1,
"bowlerName": $(tds.eq(0)).find('a > span').text().replace(' ', ''),
"overs": $(tds.eq(1)).text(),
"maiden": $(tds.eq(2)).text(),
"runs": $(tds.eq(3)).text(),
"wickets": $(tds.eq(4)).text(),
"economy": $(tds.eq(5)).text(),
"0s": $(tds.eq(6)).text(),
"4s": $(tds.eq(7)).text(),
"6s": $(tds.eq(8)).text(),
"wides": $(tds.eq(9)).text(),
"noBalls": $(tds.eq(10)).text()
});
});
return {"bowlingSummary": bowlingSummary}
MATCH RESULTS:
/* -------------- STAGE 1 ------------ */
rnament');
collect(parse());
//------- 1.b Parser Code ------------//

//Step1: create an array to store all the records
let matchSummary = []
//Step2: Selecting all rows we need from target table

//Step3: Looping through each rows and get the data from the cells(td)
const tds = $(element).find('td'); //find the td
matchSummary.push({
'team1': $(tds[0]).text(),
'team2': $(tds[1]).text(),
'winner': $(tds[2]).text(),
'margin': $(tds[3]).text(),
'ground': $(tds[4]).text(),
'matchDate': $(tds[5]).text(),
'scorecard': $(tds[6]).text()
})
})
// step4: Finally returning the data

return {
"matchSummary": matchSummary
};
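If the row selector also matches a row that lacks the seven expected td cells (for example, a header row built from th elements), the loop above would still push a record, with an empty string in every field. A minimal sketch of a guard follows, assuming the cell texts are available as a plain array. The names rowToMatch and MATCH_FIELDS and the sample rows are hypothetical and used only for illustration.

```javascript
// Field order follows the listing above (td 0..6).
const MATCH_FIELDS = [
  'team1', 'team2', 'winner', 'margin', 'ground', 'matchDate', 'scorecard'
];

// Hypothetical helper: returns null for rows that do not carry all fields.
function rowToMatch(cells) {
  if (cells.length < MATCH_FIELDS.length) {
    return null;
  }
  const record = {};
  MATCH_FIELDS.forEach(function (field, i) {
    record[field] = cells[i].trim();
  });
  return record;
}

// Made-up rows, for illustration only.
const sampleRows = [
  [], // e.g. a header row: no td cells
  ['Team X', 'Team Y', 'Team X', '5 wickets', 'Ground Z', 'Jan 1, 2000', 'Match # 1']
];
const sampleMatches = sampleRows
  .map(rowToMatch)
  .filter(function (record) { return record !== null; });
console.log(sampleMatches.length);
```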
PLAYER INFORMATION:
/* -------------- STAGE 1 ------------ */
rnament');
let links = parse().matchSummaryLinks;

}
//------- 1.b Parser Code ------------//
let links = []
links.push(rowURL);
})
return {
'matchSummaryLinks': links
};
/* ------------ STAGE 2 -------------- */

let playersData = parse().playersData;

for(let obj of playersData) {
  const name = obj['name']
  const team = obj['team']
  const url = obj['link']
  next_stage({name: name, team: team, url: url})
}
//---------- 2.b Parser Code ---------//

//to store all the players in a list
var playersLinks = []

return $(this)
}).siblings()
//for batting players

var tables = $('div > table.ci-scorecard-table');
})

});

playersLinks.push({
"name": $(tds.eq(0)).find('a > span > span').text().replace(' ', ''),
"team": team1,
"link": "https://www.espncricinfo.com" + $(tds.eq(0)).find('a').attr('href')
});
});

playersLinks.push({
"name": $(tds.eq(0)).find('a > span > span').text().replace(' ', ''),
"team": team2,
"link": "https://www.espncricinfo.com" + $(tds.eq(0)).find('a').attr('href')
});
});
//for bowling players
var tables = $('div > table.ds-table');

})

});

playersLinks.push({
"name": $(tds.eq(0)).find('a > span').text().replace(' ', ''),
"team": team2.replace(" Innings", ""),
"link": "https://www.espncricinfo.com" + $(tds.eq(0)).find('a').attr('href')
});
});

playersLinks.push({
"name": $(tds.eq(0)).find('a > span').text().replace(' ', ''),
"team": team1.replace(" Innings", ""),
"link": "https://www.espncricinfo.com" + $(tds.eq(0)).find('a').attr('href')
});
});
return {"playersData": playersLinks}
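A player who both bats and bowls in a match appears in a batting table and a bowling table of the same scorecard, so playersLinks can hold the same player twice, and the Stage 2 loop would then call next_stage for that player twice. A minimal sketch of de-duplicating on team and name before returning follows. The function uniquePlayers and the sample entries are hypothetical and not part of the original code.

```javascript
// Hypothetical helper: keeps the first entry seen for each team/name pair.
function uniquePlayers(players) {
  const seen = new Set();
  return players.filter(function (player) {
    const key = player.team + '|' + player.name;
    if (seen.has(key)) {
      return false;
    }
    seen.add(key);
    return true;
  });
}

// Made-up entries, for illustration only.
const samplePlayers = [
  { name: 'Player A', team: 'Team X' },
  { name: 'Player B', team: 'Team Y' },
  { name: 'Player A', team: 'Team X' }
];
const dedupedPlayers = uniquePlayers(samplePlayers);
console.log(dedupedPlayers.length);
```

This only removes repeats within one scorecard page. Players who appear in several matches would still need to be de-duplicated later, for instance during preprocessing.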
/* ------------- STAGE 3 ------------ */
const final_data = parse()
collect(
{
"name": input.name,
"team": input.team,
"battingStyle": final_data.battingStyle,
"bowlingStyle": final_data.bowlingStyle,
"playingRole": final_data.playingRole,
"description": final_data.content,
});
//---------- 3.b Parser Code ---------//
// Each profile field sits in a grid cell whose first <p> holds the label.
const battingStyle = $('div.ds-grid > div').filter(function(index){
  return $(this).find('p').first().text() === 'Batting Style'
})
const bowlingStyle = $('div.ds-grid > div').filter(function(index){
  return $(this).find('p').first().text() === 'Bowling Style'
})
const playingRole = $('div.ds-grid > div').filter(function(index){
  return $(this).find('p').first().text() === 'Playing Role'
})
return {
  "battingStyle": battingStyle.find('span').text(),
  "bowlingStyle": bowlingStyle.find('span').text(),
  "playingRole": playingRole.find('span').text(),
  "content": $('div.ci-player-bio-content').find('p').first().text()
}
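The three filter calls above repeat the same label lookup. As a minimal sketch, assuming the profile grid has already been reduced to a plain array of label/value pairs, one helper could serve all three fields. The names fieldByLabel and extractProfile and the sample cells are hypothetical. The sketch returns an empty string when a label is missing, which mirrors what .text() yields on an empty selection.

```javascript
// Hypothetical helper: value of the cell carrying the given label, or ''.
function fieldByLabel(cells, label) {
  const cell = cells.find(function (c) { return c.label === label; });
  return cell ? cell.value : '';
}

function extractProfile(cells) {
  return {
    battingStyle: fieldByLabel(cells, 'Batting Style'),
    bowlingStyle: fieldByLabel(cells, 'Bowling Style'),
    playingRole: fieldByLabel(cells, 'Playing Role')
  };
}

// Made-up profile with no bowling style listed, for illustration only.
const sampleProfile = extractProfile([
  { label: 'Batting Style', value: 'Right hand Bat' },
  { label: 'Playing Role', value: 'Batter' }
]);
console.log(sampleProfile);
```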
2. DATA PREPROCESSING CODE

PROCESS MATCH RESULT:
PROCESS BATTING SUMMARY:
PROCESS BOWLING SUMMARY:
PROCESS PLAYER INFORMATION:
