Data Science Project 1
Data Science Project 1
Data Science Project 1
Management
Project Title: Performance Analysis of Formula 1 Teams. (Project 1)
Group 12
1
1 Introduction
For Novel Data Set Collection, data on Formula 1 teams and drivers from
different sources has been pooled together to create a novel database.
• The performance data for all the teams and drivers has been scrapped from
ergast.com API which tracks all F1 data on drivers and constructors
performance. Historical data on race results, lap time, pit stop time, driver
information, constructor information and fastest lap time are considered. The
time frame of the data is 2003 to 2023 as that is the most relevant data
available.
• The financial data which will be used in this report has been downloaded
from yahoo finance. The time frame of the data is taken from 2021 to 2023 for
relevance to future predictive analysis.
• Data which will be used for textual analysis has been scrapped by
Developing a Python-based web scraping tool using Selenium WebDriver for
automated web navigation and BeautifulSoup for HTML parsing, aimed at
collecting news articles related to Formula 1 teams: Ferrari, Alpine, Aston
Martin, and Mercedes including personnel changes (racers, technical staff,
CEOs, team principals), new sponsorships and partnerships, car model
launches, and terminations of sponsorships or partnerships.
2
3 Database creation and querying
• For our analysis, two databases have been created to store all the data. The
first one is called f1 database which contains all the race performance data and
financial data and the second one is called f1 news which contains data from
various news sources.
• Queries have been executed to extract data for each table to perform
exploratory data analysis, data cleaning, and model building.
• Further, summary statistics were generated using queries to gain deeper
insights into our novel data set.
• Table 1 displays the numbers of race wins for every team from 2003 to 2023
and their average career qualifying position meaning what position they start
the race.
• Table 2 displays the drivers who won the championship from 2003 to 2023 by
scoring most number of point and the team of the driver.
3
Table 2: Formula 1 Drivers Championship
2003-2023.
The required steps to clean, check and organize the data are as follows:
4
checked the range for values and identified any outliers.
• To simplify the analysis, the data has been organized by taking average race
pace for each driver for each race for every season. Normalization has been
done to make the data consistent for format and output result.
• The financial data has been organized as monthly stock price data and
aligned with the timeline of performance analysis.
• The textual data from news sources has been cleaned and organised through
stop word removal, stemming, lemmatisation, and tokenisation.
5 Codes
def create_driver_table(conn):
cursor = conn.cursor()
cursor.execute('''CREATE TABLE IF NOT EXISTS drivers
(id TEXT PRIMARY KEY,
first_name TEXT,
last_name TEXT,
nationality TEXT)''')
def create_constructor_table(conn):
cursor = conn.cursor()
cursor.execute('''CREATE TABLE IF NOT EXISTS constructors
(id TEXT PRIMARY KEY,
name TEXT)''')
def create_track_table(conn):
cursor = conn.cursor()
cursor.execute('''CREATE TABLE IF NOT EXISTS tracks
(id INTEGER PRIMARY KEY,
locality TEXT,
country TEXT,
name TEXT)''')
def create_results_table(conn):
cursor = conn.cursor()
cursor.execute('''CREATE TABLE IF NOT EXISTS results
(id INTEGER PRIMARY KEY AUTOINCREMENT,
driver_id TEXT,
position INTEGER,
grid INTEGER,
number INTEGER,
constructor_id TEXT,
race_track_id INTEGER,
points INTEGER,
race_year DATE,
FOREIGN KEY (driver_id) REFERENCES drivers(id),
FOREIGN KEY (constructor_id) REFERENCES constructors(id),
5
FOREIGN KEY (race_track_id) REFERENCES tracks(id))''')
if driver is None:
cursor.execute("INSERT INTO drivers (first_name, last_name, id) VALUES (?, ?, ?)",
(driver_first_name, driver_last_name, driver_id))
conn.commit()
id = cursor.lastrowid
print(f"Driver {driver_first_name} {driver_last_name} inserted into the database.")
else:
id = driver[0]
print(f"Exists already")
return id
if exists is None:
cursor.execute("INSERT INTO constructors (id, name) VALUES (?, ?)",
(constructor_id, constructor_name))
conn.commit()
id = cursor.lastrowid
print(f"Constructor '{constructor_name}' with id '{constructor_id}' inserted successfully.")
else:
print(f"Exists already")
id = exists[0]
return id
if track_exists is None:
cursor.execute('INSERT INTO tracks (locality, country, name) VALUES (?, ?, ?)',
(locality, country, name))
conn.commit()
id = cursor.lastrowid
print(f"Track '{name}' inserted successfully.")
else:
id = track_exists[0]
print(f"Track '{name}' already exists in the database.")
return id
6
conn.commit()
print("Result inserted successfully.")
offset += limit
def print_all_results_group_by_year(conn):
7
cursor = conn.cursor()
query = '''
SELECT r.race_year, d.first_name, d.last_name, r.position, r.grid, r.number, c.name as
constructor_name, t.name as track_name
FROM results r
JOIN drivers d ON r.driver_id = d.id
JOIN constructors c ON r.constructor_id = c.id
JOIN tracks t ON r.race_track_id = t.id
ORDER BY r.race_year, r.id
'''
cursor.execute(query)
results = cursor.fetchall()
current_year = None
for result in results:
race_year, first_name, last_name, position, grid, number, constructor_name,
track_name = result
if race_year != current_year:
print(f"\nYear: {race_year}")
current_year = race_year
ns = {'mrd': 'http://ergast.com/mrd/1.5'}
conn = sqlite3.connect('f1_database.db')
create_driver_table(conn)
create_constructor_table(conn)
create_track_table(conn)
create_results_table(conn)
print_all_results_group_by_year(conn)
conn.close()
import requests
import sqlite3
import xml.etree.ElementTree as ET
def create_driver_table(conn):
cursor = conn.cursor()
cursor.execute('''CREATE TABLE IF NOT EXISTS drivers
(id TEXT PRIMARY KEY,
first_name TEXT,
last_name TEXT,
nationality TEXT)''')
def create_laps_table(conn):
cursor = conn.cursor()
cursor.execute('''CREATE TABLE IF NOT EXISTS laps
(id INTEGER PRIMARY KEY,
driver TEXT,
8
position INTEGER,
time TEXT,
track_id INTEGER,
lap_number INTEGER,
year DATE,
FOREIGN KEY (track_id) REFERENCES tracks(id))''')
def create_constructor_table(conn):
cursor = conn.cursor()
cursor.execute('''CREATE TABLE IF NOT EXISTS constructors
(id TEXT PRIMARY KEY,
name TEXT)''')
def create_track_table(conn):
cursor = conn.cursor()
cursor.execute('''CREATE TABLE IF NOT EXISTS tracks
(id INTEGER PRIMARY KEY,
locality TEXT,
country TEXT,
name TEXT)''')
def create_results_table(conn):
cursor = conn.cursor()
cursor.execute('''CREATE TABLE IF NOT EXISTS results
(id INTEGER PRIMARY KEY AUTOINCREMENT,
driver_id TEXT,
position INTEGER,
grid INTEGER,
number INTEGER,
constructor_id TEXT,
race_track_id INTEGER,
points INTEGER,
race_year DATE,
FOREIGN KEY (driver_id) REFERENCES drivers(id),
FOREIGN KEY (constructor_id) REFERENCES constructors(id),
FOREIGN KEY (race_track_id) REFERENCES tracks(id))''')
if driver is None:
cursor.execute("INSERT INTO drivers (first_name, last_name, id) VALUES (?, ?, ?)",
(driver_first_name, driver_last_name, driver_id))
conn.commit()
id = cursor.lastrowid
print(f"Driver {driver_first_name} {driver_last_name} inserted into the database.")
else:
id = driver[0]
print(f"Exists already")
return id
if exists is None:
cursor.execute("INSERT INTO constructors (id, name) VALUES (?, ?)",
9
(constructor_id, constructor_name))
conn.commit()
id = cursor.lastrowid
print(f"Constructor '{constructor_name}' with id '{constructor_id}' inserted successfully.")
else:
print(f"Exists already")
id = exists[0]
return id
if track_exists is None:
cursor.execute('INSERT INTO tracks (locality, country, name) VALUES (?, ?, ?)',
(locality, country, name))
conn.commit()
id = cursor.lastrowid
print(f"Track '{name}' inserted successfully.")
else:
id = track_exists[0]
print(f"Track '{name}' already exists in the database.")
return id
10
while offset < total:
paginated_url = f"{base_url}?limit={limit}&offset={offset}"
response = requests.get(paginated_url)
xml_data = response.content
root = ET.fromstring(xml_data)
for race in root.findall(".//mrd:Race", ns):
circuit = race.find("mrd:Circuit", ns)
location = circuit.find("mrd:Location", ns)
year = race.get("season")
offset += limit
def print_all_results_group_by_year(conn):
cursor = conn.cursor()
query = '''
SELECT r.race_year, d.first_name, d.last_name, r.position, r.grid, r.number, c.name as
constructor_name, t.name as track_name
FROM results r
JOIN drivers d ON r.driver_id = d.id
JOIN constructors c ON r.constructor_id = c.id
JOIN tracks t ON r.race_track_id = t.id
ORDER BY r.race_year, r.id
'''
cursor.execute(query)
results = cursor.fetchall()
current_year = None
for result in results:
race_year, first_name, last_name, position, grid, number, constructor_name,
track_name = result
if race_year != current_year:
print(f"\nYear: {race_year}")
current_year = race_year
def populate_race_db(conn):
base_url = f'http://ergast.com/api/f1/{year}/results'
# Initial fetch to determine pagination
response = requests.get(base_url)
xml_data = response.content
11
root = ET.fromstring(xml_data)
print(f"Fetching data from year - {year}")
# Pagination details
total = int(root.attrib['total'])
limit = int(root.attrib['limit'])
offset = 0
constructor_name, constructor_id)
offset += limit
def print_laps_with_track(conn):
cursor = conn.cursor()
query = '''
SELECT l.id, l.driver, l.position, l.time, l.year, t.locality, t.country, t.name
FROM laps l
JOIN tracks t ON l.track_id = t.id
ORDER BY l.id
'''
cursor.execute(query)
12
results = cursor.fetchall()
ns = {'mrd': 'http://ergast.com/mrd/1.5'}
conn = sqlite3.connect('f1_database.db')
# create_driver_table(conn)
# create_constructor_table(conn)
create_track_table(conn)
create_laps_table(conn)
# # create_results_table(conn)
# print_all_results_group_by_year(conn)
print_laps_with_track(conn)
conn.close()
#Inserting financial data into the database.
import pandas as pd
import sqlite3
#Reading the CSV file
df = pd.read_csv('RACE.csv')
#Clean up
df.columns.str.strip()
connection = sqlite3.connect('f1_database.db')
df.to_sql('Ferrari_stock', connection, if_exists='replace')
connection.close()
13
options.add_argument("user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36
(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3")
service = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=service, options=options)
# List of URLs
urls = ['https://www.carandbike.com/news/f1-ferrari-to-develop-sf21-till-june-2021-2414332',
'https://www.businesswire.com/news/home/20210617005933/en/Ferrari-Selects-AWS-as-its-Official-
Cloud-Provider-to-Power-Innovation-on-the-Road-and-Track'
,'https://www.planetf1.com/news/controversial-mission-winnow-dropped-ferrari',
'https://www.carandbike.com/news/ferrari-discussing-new-f1-deal-with-philip-morris-despite-
mission-winnow-eu-ban-2471177','https://www.carandbike.com/news/ferrari-discussing-new-
f1-deal-with-philip-morris-despite-mission-winnow-eu-ban-2471177','https://sportsmintmedia.com/
formula-1-ferrari-signs-cloud-partnership-deal-with-amazon-web-services/',
'https://www.fia.com/news/f1-verstappen-quickest-red-bull-ring-ahead-ferraris-leclerc-and-sainz',
'https://www.the-race.com/formula-1/ferrari-to-use-generational-new-simulator-for-22-f1-car/',
'https://www.pmw-magazine.com/news/team-news/ferrari-completes-install-of-new-dil-simulator-for-
f1-team.html','https://www.pmw-magazine.com/news/team-news/ferrari-completes-install-of-new-
dil-simulator-for-f1-team.html',
'https://www.racefans.net/2021/08/09/ferrari-power-unit-upgrade-significant-step-f1-2021/',
'https://us.motorsport.com/f1/news/how-ferraris-new-gearbox-casing-helped-boost-its-f1-aero/6653646/',
'https://www.formula1.com/en/latest/article.ferrari-to-debut-new-engine-in-russia-forcing-
leclerc-to-start-from-back-of.NsUPIl5I66ZIol5eNMGKE.html',
'https://www.autosport.com/f1/news/sainz-calls-on-ferrari-to-analyse-recent-f1-pit-errors/6727337/',
'https://www.santander.com/en/press-room/press-releases/2021/12/santander-agrees-a-multi-year-
partnership-with-scuderia-ferrari',
'https://www.the-race.com/formula-1/ferrari-drops-mission-winnow-name-still-in-philip-morris-talks/'
if not skip_paragraph:
c.execute("INSERT INTO articles (url, paragraph) VALUES (?, ?)", (url, paragraph.text))
conn.commit()
14
# Cleanup
conn.close()
driver.quit()
# Note: This code has been used by making adjustments to url and file names mutiple times
15