Data Science Project 1


FIN42110: Data Science for Trading and Risk Management

Project Title: Performance Analysis of Formula 1 Teams (Project 1)

Group 12

Harsh Desai - 23205088


Jay Milind Kelkar - 23202493
Runqi Xue - 23206038

1 Introduction

Formula 1 is the pinnacle of motorsport, displaying a fusion of cutting-edge
technology, strategic prowess, and exceptional skill. Formula 1 teams operate
at the forefront of innovation, constantly pushing the boundaries to gain a
competitive edge. This data science project dives into the multifaceted
landscape of Formula 1 by undertaking a comprehensive analysis of team
performance on and off the track.

By combining on-track performance metrics, financial insights, and sentiment
analysis, this project aims to unveil hidden patterns and correlations. The
synthesis of these diverse datasets may yield valuable insights into the
holistic nature of Formula 1 team dynamics. We anticipate uncovering the
strategies that contribute to success, understanding the delicate balance
between financial investment and racing achievement, and gauging the impact
of media sentiment on team morale.

2 Novel Data Set Collection

To build the novel data set, data on Formula 1 teams and drivers from
different sources has been pooled together into a single database.

• The performance data for all teams and drivers has been scraped from the
ergast.com API, which tracks F1 driver and constructor performance.
Historical data on race results, lap times, pit stop times, driver
information, constructor information, and fastest lap times are considered.
The time frame of the data is 2003 to 2023, the most relevant span of data
available.
• The financial data used in this report has been downloaded from Yahoo
Finance. Its time frame is 2021 to 2023, chosen for relevance to future
predictive analysis.
• The data used for textual analysis has been scraped with a Python-based web
scraping tool built on Selenium WebDriver for automated web navigation and
BeautifulSoup for HTML parsing. It collects news articles on four Formula 1
teams (Ferrari, Alpine, Aston Martin, and Mercedes) covering personnel
changes (racers, technical staff, CEOs, team principals), new sponsorships
and partnerships, car model launches, and terminations of sponsorships or
partnerships.

3 Database creation and querying

• For our analysis, two databases have been created to store all the data:
the first, f1 database, contains all the race performance and financial data;
the second, f1 news, contains data from various news sources.
• Queries have been executed to extract data for each table to perform
exploratory data analysis, data cleaning, and model building.
• Further, summary statistics were generated using queries to gain deeper
insights into our novel data set.
• Table 1 displays the number of race wins for every team from 2003 to 2023,
together with their average qualifying position, i.e. the position from which
they start the race.
• Table 2 displays the driver who won the championship in each year from 2003
to 2023 by scoring the most points, along with that driver's team.
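The kind of query behind Table 1 can be sketched against a toy in-memory database. The schema mirrors the results table created in Section 5, while the sample rows and the exact output columns here are illustrative assumptions, not the project's data.

```python
# Sketch of the Table 1 aggregation: wins and average grid position per
# constructor. The sample rows are invented for illustration only.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE results
                (driver_id TEXT, constructor_id TEXT,
                 position INTEGER, grid INTEGER, race_year INTEGER)""")
rows = [
    ("HAM", "mercedes", 1, 1, 2019),
    ("HAM", "mercedes", 1, 2, 2019),
    ("VET", "ferrari",  1, 3, 2019),
    ("VET", "ferrari",  2, 1, 2019),
]
conn.executemany("INSERT INTO results VALUES (?, ?, ?, ?, ?)", rows)

query = """
    SELECT constructor_id,
           SUM(position = 1)   AS wins,            -- SQLite sums booleans as 0/1
           ROUND(AVG(grid), 3) AS avg_qualifying
    FROM results
    GROUP BY constructor_id
    ORDER BY wins DESC
"""
table = list(conn.execute(query))
for constructor, wins, avg_grid in table:
    print(constructor, wins, avg_grid)
conn.close()
```

Running it on the sample rows yields two constructors, ordered by wins, each with a win count and a rounded average starting position.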

Table 1: Grand Prix Wins 2003-2023.

Constructor     Number of Wins   Average Qualifying Position
Ferrari               83               6.002
McLaren               47               8.851
Mercedes             116               4.704
Red Bull              92               6.589
Williams               6              11.976
Renault               20              10.017
Brawn                  7               5.242
Lotus                  2              10.777
Toro Rosso             1              13.659
Alpha Tauri            1              10.487
BMW Sauber             1               8.928
Racing Point           1              11.253
Alpine                 1              10.264
Jordan                 1              16.352
Honda                  1              12.361

Table 2: Formula 1 Drivers Championship
2003-2023.

Year Driver Constructor


2003 Michael Schumacher Ferrari
2004 Michael Schumacher Ferrari
2005 Fernando Alonso Renault
2006 Fernando Alonso Renault
2007 Kimi Raikkonen Ferrari
2008 Lewis Hamilton McLaren
2009 Jenson Button Brawn
2010 Sebastian Vettel Red Bull
2011 Sebastian Vettel Red Bull
2012 Sebastian Vettel Red Bull
2013 Sebastian Vettel Red Bull
2014 Lewis Hamilton Mercedes
2015 Lewis Hamilton Mercedes
2016 Nico Rosberg Mercedes
2017 Lewis Hamilton Mercedes
2018 Lewis Hamilton Mercedes
2019 Lewis Hamilton Mercedes
2020 Lewis Hamilton Mercedes
2021 Max Verstappen Red Bull
2022 Max Verstappen Red Bull
2023 Max Verstappen Red Bull

4 Data Cleaning, Checking and Organisation

The required steps to clean, check and organize the data are as follows:

• For track performance analysis we considered parameters such as qualifying
grid position, race finish position, lap time, points scored, fastest lap
time, and driver and constructor information. To understand financial
relevance and position, we considered the stock prices of the publicly traded
owners/partners of Formula 1 teams, and for textual analysis we used news
articles on teams and drivers.
• Raw performance data has been cleaned by checking for missing or abnormal
values and filtering out irrelevant records. To do so, we checked the range
of each variable and identified any outliers.
• To simplify the analysis, the data has been organised around the average
race pace of each driver for each race in every season. Normalisation has
been applied to make the data consistent in format and output.
• The financial data has been organized as monthly stock price data and
aligned with the timeline of performance analysis.
• The textual data from news sources has been cleaned and organised through
stop word removal, stemming, lemmatisation, and tokenisation.
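The cleaning steps above can be sketched as a minimal pipeline. The tiny stop-word list and the naive suffix stemmer below are simplified stand-ins for the NLTK-style stemming and lemmatisation actually used; they exist only to make the sequence of operations concrete.

```python
# Minimal sketch of the text-cleaning pipeline: tokenisation, stop-word
# removal, then a crude suffix stemmer. Both the stop-word set and the
# stemmer are deliberate simplifications for illustration.
import re

STOP_WORDS = {"the", "a", "an", "of", "to", "and", "with", "for", "its"}

def tokenise(text: str) -> list[str]:
    """Lower-case the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())

def stem(token: str) -> str:
    """Strip a common suffix if the remaining stem is long enough."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def preprocess(text: str) -> list[str]:
    return [stem(t) for t in tokenise(text) if t not in STOP_WORDS]

print(preprocess("Ferrari announced a new sponsorship with AWS"))
```

The same three stages (tokenise, filter, normalise) apply regardless of whether the normalisation step is this toy stemmer or a proper lemmatiser.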

5 Code

# Extracting race results data from the Ergast API
import requests
import sqlite3
import xml.etree.ElementTree as ET

def create_driver_table(conn):
    cursor = conn.cursor()
    cursor.execute('''CREATE TABLE IF NOT EXISTS drivers
                      (id TEXT PRIMARY KEY,
                       first_name TEXT,
                       last_name TEXT,
                       nationality TEXT)''')

def create_constructor_table(conn):
    cursor = conn.cursor()
    cursor.execute('''CREATE TABLE IF NOT EXISTS constructors
                      (id TEXT PRIMARY KEY,
                       name TEXT)''')

def create_track_table(conn):
    cursor = conn.cursor()
    cursor.execute('''CREATE TABLE IF NOT EXISTS tracks
                      (id INTEGER PRIMARY KEY,
                       locality TEXT,
                       country TEXT,
                       name TEXT)''')

def create_results_table(conn):
    cursor = conn.cursor()
    cursor.execute('''CREATE TABLE IF NOT EXISTS results
                      (id INTEGER PRIMARY KEY AUTOINCREMENT,
                       driver_id TEXT,
                       position INTEGER,
                       grid INTEGER,
                       number INTEGER,
                       constructor_id TEXT,
                       race_track_id INTEGER,
                       points INTEGER,
                       race_year DATE,
                       FOREIGN KEY (driver_id) REFERENCES drivers(id),
                       FOREIGN KEY (constructor_id) REFERENCES constructors(id),
                       FOREIGN KEY (race_track_id) REFERENCES tracks(id))''')

def insert_driver_if_not_exists(conn, driver_first_name, driver_last_name, driver_id):
    cursor = conn.cursor()
    cursor.execute("SELECT * FROM drivers WHERE id = ?", (driver_id,))
    driver = cursor.fetchone()

    if driver is None:
        cursor.execute("INSERT INTO drivers (first_name, last_name, id) VALUES (?, ?, ?)",
                       (driver_first_name, driver_last_name, driver_id))
        conn.commit()
        # id is TEXT, so return the key itself (lastrowid would give the integer rowid)
        id = driver_id
        print(f"Driver {driver_first_name} {driver_last_name} inserted into the database.")
    else:
        id = driver[0]
        print("Driver exists already.")

    return id

def insert_constructor_if_not_exists(conn, constructor_name, constructor_id):
    cursor = conn.cursor()
    cursor.execute("SELECT * FROM constructors WHERE id = ?", (constructor_id,))
    exists = cursor.fetchone()

    if exists is None:
        cursor.execute("INSERT INTO constructors (id, name) VALUES (?, ?)",
                       (constructor_id, constructor_name))
        conn.commit()
        # id is TEXT, so return the key itself (lastrowid would give the integer rowid)
        id = constructor_id
        print(f"Constructor '{constructor_name}' with id '{constructor_id}' inserted successfully.")
    else:
        print("Constructor exists already.")
        id = exists[0]

    return id

def insert_track_if_not_exists(conn, locality, country, name):
    cursor = conn.cursor()
    cursor.execute('SELECT * FROM tracks WHERE name = ?', (name,))
    track_exists = cursor.fetchone()

    if track_exists is None:
        cursor.execute('INSERT INTO tracks (locality, country, name) VALUES (?, ?, ?)',
                       (locality, country, name))
        conn.commit()
        id = cursor.lastrowid
        print(f"Track '{name}' inserted successfully.")
    else:
        id = track_exists[0]
        print(f"Track '{name}' already exists in the database.")

    return id

def insert_result(conn, driver_id, position, grid, number, constructor_id, race_track_id,
                  points, race_year):
    cursor = conn.cursor()
    insert_query = '''INSERT INTO results (driver_id, position, grid, number, constructor_id,
                                           race_track_id, points, race_year)
                      VALUES (?, ?, ?, ?, ?, ?, ?, ?)'''
    cursor.execute(insert_query, (driver_id, position, grid, number, constructor_id,
                                  race_track_id, points, race_year))
    conn.commit()
    print("Result inserted successfully.")

def populate_race_db(year, ns, conn):
    base_url = f'http://ergast.com/api/f1/{year}/results'
    # Initial fetch to determine pagination
    response = requests.get(base_url)
    xml_data = response.content
    root = ET.fromstring(xml_data)
    print(f"Fetching data from year - {year}")
    # Pagination details
    total = int(root.attrib['total'])
    limit = int(root.attrib['limit'])
    offset = 0

    # Fetching all results page by page
    while offset < total:
        paginated_url = f"{base_url}?limit={limit}&offset={offset}"
        response = requests.get(paginated_url)
        xml_data = response.content
        root = ET.fromstring(xml_data)
        for race in root.findall(".//mrd:Race", ns):
            circuit = race.find("mrd:Circuit", ns)
            location = circuit.find("mrd:Location", ns)
            year = race.get("season")

            track_name = circuit.find("mrd:CircuitName", ns).text
            track_locality = location.find("mrd:Locality", ns).text
            track_country = location.find("mrd:Country", ns).text

            track_id = insert_track_if_not_exists(conn, track_locality, track_country, track_name)

            for result in race.findall(".//mrd:Result", ns):
                # Extract driver information
                driver = result.find(".//mrd:Driver", ns)
                driver_code = driver.get('code')
                given_name = driver.find("mrd:GivenName", ns).text
                family_name = driver.find("mrd:FamilyName", ns).text
                driver_id = insert_driver_if_not_exists(conn, given_name, family_name, driver_code)

                # Extract constructor information
                constructor = result.find(".//mrd:Constructor", ns)
                constructor_name = constructor.find("mrd:Name", ns).text
                constructor_id = constructor.get("constructorId")
                constructor_id = insert_constructor_if_not_exists(conn, constructor_name,
                                                                  constructor_id)

                # Extract result information
                position = result.get("position")
                points = result.get("points")
                number = result.get("number")
                grid = result.find("mrd:Grid", ns).text

                insert_result(conn, driver_id, position, grid, number, constructor_id,
                              race_track_id=track_id, points=points, race_year=year)

        offset += limit

def print_all_results_group_by_year(conn):
    cursor = conn.cursor()
    query = '''
        SELECT r.race_year, d.first_name, d.last_name, r.position, r.grid, r.number,
               c.name AS constructor_name, t.name AS track_name
        FROM results r
        JOIN drivers d ON r.driver_id = d.id
        JOIN constructors c ON r.constructor_id = c.id
        JOIN tracks t ON r.race_track_id = t.id
        ORDER BY r.race_year, r.id
    '''
    cursor.execute(query)
    results = cursor.fetchall()

    current_year = None
    for result in results:
        (race_year, first_name, last_name, position, grid, number, constructor_name,
         track_name) = result

        if race_year != current_year:
            print(f"\nYear: {race_year}")
            current_year = race_year

        print(f"Driver: {first_name} {last_name}, Position: {position}, Grid: {grid}, "
              f"Number: {number}, Constructor: {constructor_name}, Track: {track_name}")

ns = {'mrd': 'http://ergast.com/mrd/1.5'}

conn = sqlite3.connect('f1_database.db')

create_driver_table(conn)
create_constructor_table(conn)
create_track_table(conn)
create_results_table(conn)

# Loop through years and fetch results (2003 to 2023 inclusive)
for year in range(2003, 2024):
    populate_race_db(year, ns, conn)

print_all_results_group_by_year(conn)

conn.close()

# Extracting lap time data from the Ergast API.
# The driver, constructor, track and results table helpers are identical to
# those in the results script above and are reused here.
import requests
import sqlite3
import xml.etree.ElementTree as ET

def create_laps_table(conn):
    cursor = conn.cursor()
    cursor.execute('''CREATE TABLE IF NOT EXISTS laps
                      (id INTEGER PRIMARY KEY,
                       driver TEXT,
                       position INTEGER,
                       time TEXT,
                       track_id INTEGER,
                       lap_number INTEGER,
                       year DATE,
                       FOREIGN KEY (track_id) REFERENCES tracks(id))''')

def insert_lap(conn, driver, position, time, track_id, lap_number, year):
    cursor = conn.cursor()
    insert_query = '''INSERT INTO laps (driver, position, time, track_id, lap_number, year)
                      VALUES (?, ?, ?, ?, ?, ?)'''
    cursor.execute(insert_query, (driver, position, time, track_id, lap_number, year))
    conn.commit()
    print("Lap inserted successfully.")

def get_laps(year, ns, conn):
    for round in range(1, 22):  # rounds 1 to 21 of the season
        base_url = f'http://ergast.com/api/f1/{year}/{round}/laps'
        # Initial fetch to determine pagination
        response = requests.get(base_url)
        xml_data = response.content
        root = ET.fromstring(xml_data)
        print(f"Fetching data from year - {year}")
        # Pagination details
        total = int(root.attrib['total'])
        limit = int(root.attrib['limit'])
        offset = 0

        # Fetching all laps page by page
        while offset < total:
            paginated_url = f"{base_url}?limit={limit}&offset={offset}"
            response = requests.get(paginated_url)
            xml_data = response.content
            root = ET.fromstring(xml_data)
            for race in root.findall(".//mrd:Race", ns):
                circuit = race.find("mrd:Circuit", ns)
                location = circuit.find("mrd:Location", ns)
                year = race.get("season")

                track_name = circuit.find("mrd:CircuitName", ns).text
                track_locality = location.find("mrd:Locality", ns).text
                track_country = location.find("mrd:Country", ns).text

                track_id = insert_track_if_not_exists(conn, track_locality, track_country,
                                                      track_name)

                lap_list = race.find("mrd:LapsList", ns)
                laps = lap_list.findall("mrd:Lap", ns)
                for lap in laps:
                    timings = lap.findall("mrd:Timing", ns)
                    lap_number = lap.get("number")
                    for timing in timings:
                        driver = timing.get("driverId")
                        position = timing.get("position")
                        time = timing.get("time")
                        insert_lap(conn, driver, position, time, track_id, lap_number, year)

            offset += limit

def print_laps_with_track(conn):
    cursor = conn.cursor()
    query = '''
        SELECT l.id, l.driver, l.position, l.time, l.year, t.locality, t.country, t.name
        FROM laps l
        JOIN tracks t ON l.track_id = t.id
        ORDER BY l.id
    '''
    cursor.execute(query)
    results = cursor.fetchall()

    for row in results:
        lap_id, driver, position, lap_time, year, locality, country, track_name = row
        print(f"Lap ID: {lap_id}, Driver: {driver}, Position: {position}, Time: {lap_time}, "
              f"Year: {year}, Track: {track_name}, Locality: {locality}, Country: {country}")

ns = {'mrd': 'http://ergast.com/mrd/1.5'}

conn = sqlite3.connect('f1_database.db')

create_track_table(conn)
create_laps_table(conn)

# Loop through years and fetch lap data
for year in range(2021, 2023):
    get_laps(year, ns, conn)

print_laps_with_track(conn)

conn.close()
# Inserting financial data into the database
import pandas as pd
import sqlite3

# Reading the CSV file of stock prices
df = pd.read_csv('RACE.csv')

# Clean up column names (the result must be assigned back)
df.columns = df.columns.str.strip()

connection = sqlite3.connect('f1_database.db')
df.to_sql('Ferrari_stock', connection, if_exists='replace')

connection.close()

# Web scraping news articles
import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from bs4 import BeautifulSoup
import sqlite3
from random import randint

# Connect to SQLite database
conn = sqlite3.connect('f1_ferrarinews2021.db')
c = conn.cursor()

# Create articles table
c.execute('''CREATE TABLE IF NOT EXISTS articles
             (id INTEGER PRIMARY KEY AUTOINCREMENT, url TEXT, paragraph TEXT)''')

# Setup WebDriver with a User-Agent
options = webdriver.ChromeOptions()
options.add_argument("user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                     "(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3")

service = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=service, options=options)

# Blacklist keywords/phrases for unwanted paragraphs
blacklist_words = ["cookie disclaimer", "related content", "you may also like", "Subscriber",
                   "cookies", "browser", "aggregated", "anonymous", "advertising", "Internet",
                   "devices", "identifiers", "tracking", "articles", "geolocation", "Apps",
                   "Newsletters", "fraudulent", "reviews"]

# List of URLs
urls = [
    'https://www.carandbike.com/news/f1-ferrari-to-develop-sf21-till-june-2021-2414332',
    'https://www.businesswire.com/news/home/20210617005933/en/Ferrari-Selects-AWS-as-its-Official-Cloud-Provider-to-Power-Innovation-on-the-Road-and-Track',
    'https://www.planetf1.com/news/controversial-mission-winnow-dropped-ferrari',
    'https://www.carandbike.com/news/ferrari-discussing-new-f1-deal-with-philip-morris-despite-mission-winnow-eu-ban-2471177',
    'https://www.carandbike.com/news/ferrari-discussing-new-f1-deal-with-philip-morris-despite-mission-winnow-eu-ban-2471177',
    'https://sportsmintmedia.com/formula-1-ferrari-signs-cloud-partnership-deal-with-amazon-web-services/',
    'https://www.fia.com/news/f1-verstappen-quickest-red-bull-ring-ahead-ferraris-leclerc-and-sainz',
    'https://www.the-race.com/formula-1/ferrari-to-use-generational-new-simulator-for-22-f1-car/',
    'https://www.pmw-magazine.com/news/team-news/ferrari-completes-install-of-new-dil-simulator-for-f1-team.html',
    'https://www.pmw-magazine.com/news/team-news/ferrari-completes-install-of-new-dil-simulator-for-f1-team.html',
    'https://www.racefans.net/2021/08/09/ferrari-power-unit-upgrade-significant-step-f1-2021/',
    'https://us.motorsport.com/f1/news/how-ferraris-new-gearbox-casing-helped-boost-its-f1-aero/6653646/',
    'https://www.formula1.com/en/latest/article.ferrari-to-debut-new-engine-in-russia-forcing-leclerc-to-start-from-back-of.NsUPIl5I66ZIol5eNMGKE.html',
    'https://www.autosport.com/f1/news/sainz-calls-on-ferrari-to-analyse-recent-f1-pit-errors/6727337/',
    'https://www.santander.com/en/press-room/press-releases/2021/12/santander-agrees-a-multi-year-partnership-with-scuderia-ferrari',
    'https://www.the-race.com/formula-1/ferrari-drops-mission-winnow-name-still-in-philip-morris-talks/',
]

# Scrape and store data
for url in urls:
    driver.get(url)
    time.sleep(randint(2, 10))  # Random delay between 2 and 10 seconds

    soup = BeautifulSoup(driver.page_source, 'html.parser')
    article_text = soup.find_all('p')

    for paragraph in article_text:
        skip_paragraph = False
        for word in blacklist_words:
            if word.lower() in paragraph.text.lower():
                skip_paragraph = True
                break  # Exit inner loop if any blacklist word is found

        if not skip_paragraph:
            c.execute("INSERT INTO articles (url, paragraph) VALUES (?, ?)",
                      (url, paragraph.text))

    conn.commit()

# Cleanup
conn.close()
driver.quit()

# Note: this code was reused multiple times with the URLs and file names adjusted.
