Organized
05/17/2024
Outline
• Executive Summary
• Introduction
• Methodology
• Results
• Conclusion
• Appendix
Executive Summary
• Summary of methodologies
- Data Collection through API
- Data Collection with Web Scraping
- Data Wrangling
- Exploratory Data Analysis with SQL
- Exploratory Data Analysis with Data Visualization
- Interactive Visual Analytics with Folium
- Machine Learning Prediction
• Summary of all results
- Exploratory Data Analysis result
- Interactive analytics in screenshots
- Predictive Analytics result from Machine Learning Lab
Introduction
SpaceX is a revolutionary company that has disrupted the space industry by offering
rocket launches, specifically Falcon 9, for as low as 62 million dollars, while other
providers charge upward of 165 million dollars each. Most of this saving comes from
SpaceX's idea of reusing the first stage: the rocket is landed after launch so that it can
fly again on the next mission. Repeating this process will bring the price down even
further. As a data scientist at a startup competing with SpaceX, the goal of this project
is to create a machine learning pipeline that predicts the landing outcome of the first
stage. This prediction is crucial in identifying the right price to bid against SpaceX for
a rocket launch.
The problems included:
• Identifying all factors that influence the landing outcome.
• Understanding the relationship between the variables and how each affects the outcome.
• Finding the best conditions for increasing the probability of a successful landing.
Section 1
Methodology
Executive Summary
• Data collection methodology:
• Data was collected using the SpaceX REST API and web scraping from Wikipedia.
For the REST API, we started with a GET request. We then decoded the response
content as JSON and turned it into a pandas dataframe using json_normalize().
Finally, we cleaned the data, checked for missing values and filled them in where needed.
For web scraping, we used BeautifulSoup to extract the launch records as an
HTML table, parsed the table and converted it to a pandas dataframe for further
analysis.
Data Collection – SpaceX API
From: https://github.com/farishelmi17/SpaceX/blob/main/notebook:Data_Collection_yJPxhv2oU.ipynb
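The REST API steps described in the methodology (GET request, JSON decoding, json_normalize(), filling missing values) can be sketched roughly as follows. This is a minimal sketch, not the notebook's code: the endpoint URL, the helper names, the sample records and the choice to fill gaps with the column mean are all assumptions made for illustration.

```python
import pandas as pd
import requests

# Assumed endpoint: the public SpaceX v4 REST API (the slides do not name the URL).
SPACEX_URL = "https://api.spacexdata.com/v4/launches/past"


def fetch_launches(url: str = SPACEX_URL) -> list:
    """Send a GET request and decode the response body as JSON."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.json()


def launches_to_frame(records: list) -> pd.DataFrame:
    """Flatten nested JSON records into one row per launch."""
    return pd.json_normalize(records)


def fill_missing_with_mean(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return a copy where missing values in `column` are replaced by the column mean."""
    out = df.copy()
    out[column] = out[column].fillna(out[column].mean())
    return out


# Hypothetical records standing in for an API response (not real launch data).
sample = [
    {"flight_number": 1, "payload": {"mass_kg": 500.0}},
    {"flight_number": 2, "payload": {"mass_kg": None}},
    {"flight_number": 3, "payload": {"mass_kg": 700.0}},
]
frame = fill_missing_with_mean(launches_to_frame(sample), "payload.mass_kg")
print(frame)
```

With live data, `launches_to_frame(fetch_launches())` would replace the hypothetical `sample` list.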
Data Collection - Scraping
Create a BeautifulSoup object from the HTML response.
From: https://github.com/farishelmi17/SpaceX/blob/main/notebook:Data_Collection_with_Web_Scraping_nI89VIRCE.ipynb
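The scraping step can be illustrated with a small sketch that parses an HTML table into a list of records. The HTML string, its column names and the helper `parse_first_table` are hypothetical placeholders; the real Wikipedia table is larger and messier, so the notebook's parsing is likely more involved.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for the downloaded page (placeholder values only).
html = """
<table class="wikitable">
  <tr><th>Flight No.</th><th>Launch site</th><th>Payload</th></tr>
  <tr><td>1</td><td>Site A</td><td>Payload A</td></tr>
  <tr><td>2</td><td>Site B</td><td>Payload B</td></tr>
</table>
"""


def parse_first_table(html_text):
    """Return the first HTML table as a list of {header: cell text} dicts."""
    soup = BeautifulSoup(html_text, "html.parser")
    table = soup.find("table")
    rows = table.find_all("tr")
    # The first row holds the column headers.
    headers = [cell.get_text(strip=True) for cell in rows[0].find_all("th")]
    records = []
    for row in rows[1:]:
        values = [cell.get_text(strip=True) for cell in row.find_all("td")]
        # Skip rows whose cell count does not match the header row.
        if len(values) == len(headers):
            records.append(dict(zip(headers, values)))
    return records


records = parse_first_table(html)
print(records)
```

Passing `records` to `pandas.DataFrame(records)` would give the dataframe used for further analysis.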
Data Wrangling
EDA with Data Visualization
We began with scatter plots to explore the relationships
between pairs of attributes:
• Payload and Flight Number.
• Flight Number and Launch Site.
• Payload and Launch Site.
• Flight Number and Orbit Type.
• Payload and Orbit Type.
EDA with SQL
Using SQL, we ran a number of queries to better understand the dataset, for example:
- Displaying the names of the launch sites.
- Displaying 5 records where the launch site begins with the string ‘CCA’.
- Displaying the total payload mass carried by boosters launched by NASA (CRS).
- Displaying the average payload mass carried by booster version F9 v1.1.
- Listing the date when the first successful landing on a ground pad was achieved.
- Listing the names of the boosters that landed successfully on a drone ship and had a
payload mass greater than 4000 but less than 6000.
- Listing the total number of successful and failed mission outcomes.
- Listing the names of the booster versions that carried the maximum payload mass.
- Listing the failed landing outcomes on a drone ship, with their booster versions and
launch site names, for the year 2015.
- Ranking the count of each landing outcome (such as success or failure) between
2010-06-04 and 2017-03-20, in descending order.
https://github.com/farishelmi17/SpaceX/blob/main/notebook:Exploratory_Data_Analysis_with_SQL__eqznon1EA.ipynb
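The first three queries in the list can be sketched against an in-memory SQLite table. The table name `launches`, its columns and the payload figures are assumptions for illustration, not the project's actual schema or data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE launches (launch_site TEXT, customer TEXT, payload_mass_kg REAL)"
)
# Illustrative rows only; the masses are made up.
conn.executemany(
    "INSERT INTO launches VALUES (?, ?, ?)",
    [
        ("CCAFS LC-40", "NASA (CRS)", 500.0),
        ("CCAFS LC-40", "Customer A", 1000.0),
        ("KSC LC-39A", "NASA (CRS)", 2000.0),
        ("VAFB SLC-4E", "Customer B", 3000.0),
    ],
)

# Unique launch site names.
sites = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT launch_site FROM launches ORDER BY launch_site"
    )
]

# Up to 5 records whose launch site begins with 'CCA'.
cca_rows = conn.execute(
    "SELECT * FROM launches WHERE launch_site LIKE 'CCA%' LIMIT 5"
).fetchall()

# Total payload mass for one customer.
nasa_total = conn.execute(
    "SELECT SUM(payload_mass_kg) FROM launches WHERE customer = 'NASA (CRS)'"
).fetchone()[0]

print(sites, len(cca_rows), nasa_total)
```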
Build an Interactive Map with Folium
To visualize the launch data on an interactive map, we took the latitude and longitude
coordinates of each launch site and added a circle marker around each site, labeled
with the name of the launch site.
We then used the haversine formula to calculate the distance from the launch sites to
various landmarks, to answer the following questions:
• How close are the launch sites to railways, highways and coastlines?
• How close are the launch sites to nearby cities?
From: https://github.com/farishelmi17/SpaceX/blob/main/notebook:Interactive_Visual_Analytics_with_Folium_M8uUhCmHY.ipynb
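The distance calculation can be sketched with a plain implementation of the haversine formula. The function name and the mean Earth radius of 6371 km are choices made for this sketch; the notebook may use a slightly different radius.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# One degree of longitude along the equator is about 111.2 km.
print(round(haversine_km(0.0, 0.0, 0.0, 1.0), 1))
```

Calling `haversine_km` with a launch site's coordinates and a landmark's coordinates gives the distance to draw on the map.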
Build a Dashboard with Plotly Dash
• We built an interactive dashboard with Plotly Dash, which allows users to explore
the data as they need.
• We plotted pie charts showing the total launches for each site.
• We then plotted scatter graphs showing the relationship between Outcome and Payload
Mass (Kg) for the different booster versions.
Predictive Analysis (Classification)
Building the Model
• Load the dataset into NumPy and Pandas
• Transform the data and then split into training and test datasets
• Decide which type of ML to use
• Set the parameters and algorithms to GridSearchCV and fit it to the dataset.
Evaluating the Model
• Check the accuracy for each model
• Get tuned hyperparameters for each type of algorithm.
• Plot the confusion matrix.
Improving the Model
• Use Feature Engineering and Algorithm Tuning
Find the Best Model
• The model with the best accuracy score will be the best performing model.
From: https://github.com/farishelmi17/SpaceX/blob/main/spacex_dash_app.py
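The build, tune and evaluate steps can be sketched with scikit-learn as below. This is a hedged sketch on synthetic data from make_classification, not the project's dataset. The parameter grid is a hypothetical example. The scaler is also placed inside a Pipeline so that it is fitted on the training folds only, which differs slightly from the "transform, then split" order listed above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the launch features and landing labels.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

pipeline = Pipeline(
    [
        ("scale", StandardScaler()),
        ("tree", DecisionTreeClassifier(random_state=0)),
    ]
)

# Hypothetical grid; the notebook's actual parameters may differ.
param_grid = {
    "tree__criterion": ["gini", "entropy"],
    "tree__max_depth": [2, 4, 6],
}

search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)
print("Tuned hyperparameters:", search.best_params_)
print("Cross-validated accuracy:", search.best_score_)
print("Test accuracy:", test_accuracy)
```

Repeating the same search for each candidate algorithm and comparing the accuracy scores gives the "find the best model" step.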
Results
Section 2
Flight Number vs. Launch Site
Payload vs. Launch Site
Success Rate vs. Orbit Type
Flight Number vs. Orbit Type
Payload vs. Orbit Type
Heavier payloads have a positive impact on landing success for LEO, ISS and PO orbits.
However, they have a negative impact for MEO and VLEO orbits.
For GTO orbits there seems to be no relationship between the attributes.
Launch Success Yearly Trend
The figure shows a clearly increasing trend in success rate from 2013 until 2020.
If this trend continues in the following years, the success rate will keep rising
steadily toward 100%.
All Launch Site Names
We used the keyword DISTINCT to show only unique launch sites
from the SpaceX data.
Launch Site Names Begin with 'CCA'
Total Payload Mass
Average Payload Mass by F9 v1.1
First Successful Ground Landing Date
We used the MIN() function to find the result.
We observed that the first successful landing on a ground pad took place on
22 December 2015.
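A minimal sketch of the MIN() approach, assuming dates are stored as ISO-formatted text so that the smallest string is also the earliest date. The table layout and the rows other than the 2015-12-22 date are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE launches (launch_date TEXT, landing_outcome TEXT)")
# Illustrative rows; only the 2015-12-22 date comes from the slide.
conn.executemany(
    "INSERT INTO launches VALUES (?, ?)",
    [
        ("2015-01-10", "Failure (drone ship)"),
        ("2016-07-18", "Success (ground pad)"),
        ("2015-12-22", "Success (ground pad)"),
    ],
)

first_ground_pad = conn.execute(
    "SELECT MIN(launch_date) FROM launches "
    "WHERE landing_outcome = 'Success (ground pad)'"
).fetchone()[0]
print(first_ground_pad)
```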
Successful Drone Ship Landing with Payload between 4000 and 6000
We used the WHERE clause to filter for boosters that landed successfully on a
drone ship, and applied AND conditions to keep only landings with a
payload mass greater than 4000 but less than 6000.
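The combined filter can be sketched as follows. The booster names, outcomes and masses are hypothetical placeholders, and the bounds are strict, so a payload of exactly 4000 is excluded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE launches "
    "(booster_version TEXT, landing_outcome TEXT, payload_mass_kg REAL)"
)
# Hypothetical rows chosen to exercise each condition.
conn.executemany(
    "INSERT INTO launches VALUES (?, ?, ?)",
    [
        ("Booster A", "Success (drone ship)", 4500.0),
        ("Booster B", "Success (drone ship)", 6500.0),  # too heavy
        ("Booster C", "Failure (drone ship)", 5000.0),  # landing failed
        ("Booster D", "Success (ground pad)", 5500.0),  # wrong landing type
        ("Booster E", "Success (drone ship)", 4000.0),  # on the excluded bound
    ],
)

boosters = [
    row[0]
    for row in conn.execute(
        "SELECT booster_version FROM launches "
        "WHERE landing_outcome = 'Success (drone ship)' "
        "AND payload_mass_kg > 4000 AND payload_mass_kg < 6000"
    )
]
print(boosters)
```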
Total Number of Successful and Failure Mission Outcomes
We used the ‘%’ wildcard in the WHERE clause to match MissionOutcome values that indicate a success or a failure.
Boosters Carried Maximum Payload
We determined the boosters that have carried the maximum payload using a
subquery in the WHERE clause and the MAX() function.
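A sketch of the subquery pattern: the inner query finds the maximum payload mass, and the outer WHERE clause keeps every booster that matches it. Table, booster names and masses are placeholders for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE launches (booster_version TEXT, payload_mass_kg REAL)")
# Placeholder rows; two boosters share the maximum on purpose.
conn.executemany(
    "INSERT INTO launches VALUES (?, ?)",
    [
        ("Booster A", 9000.0),
        ("Booster B", 15600.0),
        ("Booster C", 15600.0),
    ],
)

max_payload_boosters = [
    row[0]
    for row in conn.execute(
        "SELECT booster_version FROM launches "
        "WHERE payload_mass_kg = (SELECT MAX(payload_mass_kg) FROM launches) "
        "ORDER BY booster_version"
    )
]
print(max_payload_boosters)
```

Using a subquery rather than ORDER BY with LIMIT 1 returns all boosters tied for the maximum, not just one of them.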
2015 Launch Records
Rank Landing Outcomes Between 2010-06-04 and 2017-03-20
Section 3
Location of all the Launch Sites
We can see that all the SpaceX launch sites are located inside the United States.
Markers showing launch sites with color labels
Launch Sites Distance to Landmarks
Section 4
The success percentage for each site
The highest launch-success ratio: KSC LC-39A
Payload vs Launch Outcome Scatter Plot
We can see that the success rate for low-weight payloads is higher than for heavy
payloads.
Section 5
Classification Accuracy
Comparing the classification accuracy of the models, we identified the decision tree
as the best algorithm, with the highest classification accuracy.
Confusion Matrix
The confusion matrix for the decision tree classifier shows that the classifier can
distinguish between the different classes. The major problem is the false positives,
i.e., unsuccessful landings marked as successful landings by the classifier.
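To make the false-positive point concrete, the cells of a binary confusion matrix can be counted by hand as in the sketch below. The label lists are hypothetical and are not the project's predictions; 1 stands for a successful landing and 0 for an unsuccessful one.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = landed)."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1 and actual == 1:
            counts["tp"] += 1
        elif predicted == 1 and actual == 0:
            counts["fp"] += 1  # unsuccessful landing predicted as successful
        elif predicted == 0 and actual == 1:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


# Hypothetical labels for illustration only.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0]
counts = confusion_counts(y_true, y_pred)
print(counts)
```

In this made-up example every real success is caught, but two of the three failures are predicted as successes, which is the kind of error the slide describes.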
Conclusions