Machine Learning For Beginners
April 3 – May 7 2023 | FREE 5-Day Machine Learning Immersion in 150+ Cities in Nigeria
and 50 in Other Countries
DAY 1: FREE AI Class in Every City 2023
Code:
https://colab.research.google.com/drive/1ghjrlQoscUbqNVPckdbOjp77IcJiaCIq?usp=sharing
Introduction to Data Scientists Network and AI Invasion
JOIN OUR COMMUNITY:
https://www.datasciencenigeria.org/ai-communities/
What is Artificial Intelligence?
Reinforcement learning (RL): Reinforcement learning is an autonomous, self-teaching
system that essentially learns by trial and error. It performs actions with the aim of maximizing
rewards; in other words, it learns by doing in order to achieve the best outcomes. Reinforcement
can be positive or negative. It is negative when a particular behavior is strengthened because a
negative condition is stopped or avoided. Positive reinforcement occurs when an event that
follows a particular behavior increases the strength and the frequency of that behavior.
In other words, it has a positive effect on behavior.
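The trial-and-error idea can be illustrated with a very small experiment. The sketch below uses one simple strategy (often called epsilon-greedy); it is not the only approach, and the reward chances, number of steps, and variable names are all made up for illustration.

```python
import random

random.seed(0)  # fixed seed so repeated runs behave the same way

# Two possible actions with made-up chances of giving a reward of 1.
reward_probability = [0.2, 0.8]
estimates = [0.0, 0.0]   # the agent's current guess of each action's value
counts = [0, 0]          # how many times each action has been tried
epsilon = 0.1            # fraction of the time the agent explores at random

for step in range(1000):
    if random.random() < epsilon:
        action = random.randrange(2)              # explore: try any action
    else:
        action = estimates.index(max(estimates))  # exploit: best guess so far
    reward = 1 if random.random() < reward_probability[action] else 0
    counts[action] += 1
    # Move the estimate towards the observed reward (running average).
    estimates[action] += (reward - estimates[action]) / counts[action]

print(counts, [round(value, 2) for value in estimates])
```

After enough trials, the agent's estimates should move towards the true reward chances, so it tends to choose the more rewarding action more often.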
● AI in Robotics
The field of robotics involves the designing and creation of automated machines or robots
in such a way that they possess the ability to perform tasks on their own. Nowadays,
robots are becoming more and more advanced and efficient in accomplishing tasks. This
is due to the Artificial Intelligence tools and techniques that are specially designed for the
field of robotics. Advanced robots consist of sensors, high-definition cameras, voice
recognition devices, etc.
These robots are capable of learning from their past mistakes and experience and can
adjust the algorithms according to the environment. Artificial Intelligence is an extremely
useful tool for robotic applications. When AI is combined with advanced devices, it can
help in optimizations. It is helpful in enhancing the complex manufacturing process in
industries such as aerospace.
Sample: https://www.youtube.com/watch?v=XuGJqajHAHo
● AI in Defence
Defence is one of the most crucial sectors where Artificial Intelligence is contributing to
nation-building. Security systems can be vulnerable to attacks by hackers, who may try to
steal and sell private information, which can prove to be detrimental to any country.
This is where the involvement of Artificial Intelligence proves to be of great use. The
analysis of large amounts of data becomes easy with the help of Artificial Intelligence.
Tools powered by AI can help find suspicious activity over the system and keep track of
the security of an organisation/nation database. Any alterations in the database by an
unknown source are immediately tracked down for action.
Sample: https://www.analyticsinsight.net/role-of-artificial-intelligence-in-defence/
● AI in Transport
Artificial Intelligence has completely changed how people travel in the transport industry.
As the competition in the travel industry is high, there is a need to analyze all the factors
that influence the travel business. These factors are price, seasons, festivals, the number
of travellers, etc.
With the help of predictive analytics, the software can analyze data related to these factors
that impact the cost of transport. Tools powered by AI can help perform predictive
analytics efficiently on the data, like predicting the best prices in specific routes. Another
application of Artificial Intelligence in transportation is route optimization. Businesses like
Uber and Bolt can use Artificial Intelligence in their app to show optimized paths thereby
moving consumers faster from point A to B.
Sample: https://www.youtube.com/watch?v=s_Ze2o8ixNM
● AI in Healthcare
When it comes to healthcare, AI never lags behind. Most healthcare organizations rely on
AI-based software for their day-to-day tasks. These tasks vary from patient diagnosis to
hospital data management. Since large amounts of data are generated by the healthcare
industry per day, there emerges a need for AI-based advanced processors that can
extract, manipulate, analyze, and draw some meaningful insights from this data. AI and
ML technologies are doing a fabulous job in the healthcare industry. The AI-based
algorithms fed into the systems are capable enough to spot patterns much more efficiently
than humans. AI-based devices also help measure real-time data such as blood pressure,
heartbeat, body temperature, and many more.
Sample: https://www.youtube.com/watch?v=ii-FfE-7C-k
● AI in Marketing
Today, the marketing industry is revolutionized by the applications of Artificial Intelligence.
Various industries such as e-commerce, e-learning, advertising, media, and
entertainment use Artificial Intelligence to boost profitability. Suppose you are searching
for a product on Amazon. Along with the product, it will also show you the best sellers,
similar products, varieties of the same product, and the ‘Recommended for you’ list of
products.
AI-based algorithms learn the interests of customers and give recommendations based on
the searches they make. Netflix is an example of a service that uses such recommendations.
Sample: https://www.youtube.com/watch?v=FYMjXD3G__Y
● AI in Automotive Industry
The invention of self-driving cars has completely changed the world of automobiles. There
are various companies developing self-driving cars such as Tesla, Google, Bosch,
Nissan, Audi, Volvo, and many more. The self-driving cars are built using a combination
of various technologies, and one of the most widely used is Artificial Intelligence.
A self-driving car uses sensors, cameras, voice detectors, and many other devices. It
analyzes the surroundings by collecting data. The AI-enabled advanced systems used in
the self-driving car will find an optimized path to the destination.
With the help of AI, we can address problems such as traffic accidents and improve
responses to natural disasters.
Sample: https://www.youtube.com/watch?v=VGGHCH0T_SQ
● Email filtering
This is a classic use of machine learning. Email inboxes also have a spam folder, where
your email provider automatically filters unwanted spam emails.
● Product recommendations
Amazon and other online retailers often list “recommended products” for each consumer
individually. These recommendations are based on past purchases, browsing history, and
any other behavioral information they have about consumers. This is a great way for
online retailers to provide extra value or upsells to their customers using machine
learning.
● Personalized marketing
Marketing is becoming more personal as technologies like machine learning gain more
ground in the enterprise. Now that much of marketing is online, marketers can use
characteristic and behavioral data to segment the market.
● Process automation
There are many processes in the enterprise that are much more efficient when done using
machine learning. These include analyses such as risk assessments, demand
forecasting, customer churn prediction, and others. Machine learning for process
automation alleviates the timeliness issue for enterprises. Machine learning can even help
with customer loyalty analyses like sentiment analysis.
● Fraud detection
Banks use machine learning for fraud detection to keep their consumers safe, but this can
also be valuable to companies that handle credit card transactions. Fraud detection can
save money on disputes and chargebacks, and machine learning models can be trained
to flag transactions that appear fraudulent based on certain characteristics.
• Generative / Generalized AI
Generative AI is a form of artificial intelligence that can generate new content or output
similar to human-created content. This is achieved by analyzing large amounts of data
using machine learning algorithms, which then create new content based on what they
have learned. Examples include text-to-audio, text-to-image, and text-to-video tools, as
well as text generators such as ChatGPT.
In contrast, Generalized AI is a type of artificial intelligence that can perform a diverse range of
tasks and can adapt to new situations that it has not been specifically programmed for. This
type of AI is also called "human-level AI" because it would have the ability to reason, learn, and
understand language like a human. Chatbots such as ChatGPT show some of this flexibility,
but they are generally classed as generative AI rather than human-level AI.
Introduction To Platforms
Whether you're a student, a data scientist or an AI researcher, Colab can make your
work easier. Visit the link below to get started:
https://colab.research.google.com/
● String: Characters or text are known as strings in Python. A string must be
enclosed in single or double quotes. Otherwise, Python will raise an error.
● Float: Float (floating point) numbers are numbers with a decimal point.
● Tuple: A tuple in Python is similar to a list. The difference between the two is that
we cannot change the elements of a tuple once it is assigned whereas we can
change the elements of a list.
● Sets: A set in Python is similar to a list. The difference between the two is that
a set does not allow duplicate elements, whereas duplicate elements are
allowed in a list.
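The data types described above can be tried out in a Colab cell. The snippet below is a minimal sketch; the variable names and values are invented for illustration and are not taken from the course notebook.

```python
# Illustrative values only; the names below are not from the course notebook.
name = "Ada"      # string: text inside single or double quotes
height = 1.68     # float: a number with a decimal point

colours = ("red", "green", "blue")   # tuple: cannot be changed after creation
try:
    colours[0] = "yellow"            # attempting to modify a tuple fails
except TypeError as error:
    tuple_error = str(error)

scores_list = [90, 85, 90, 70]       # list: duplicates are kept
scores_set = set(scores_list)        # set: duplicates are dropped

print(type(name).__name__, type(height).__name__)
print(tuple_error)
print(len(scores_list), len(scores_set))
```

Note that the tuple keeps its original values after the failed assignment, while the set ends up with fewer elements than the list it was built from.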
● The if-else statement: This statement is used when both the true and false parts
of a given condition are specified to be executed. When the condition is true, the
statement inside the if block is executed; if the condition is false, the statement
inside the else block is executed.
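As a small illustration of the if-else statement, the sketch below picks one of two messages depending on a condition. The temperature value and the messages are hypothetical.

```python
# Hypothetical example: the temperature value and messages are invented.
temperature = 31

if temperature > 30:
    # Runs only when the condition is true
    message = "It is hot today"
else:
    # Runs only when the condition is false
    message = "It is not hot today"

print(message)
```

Changing `temperature` to a value of 30 or below makes the else block run instead.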
Functions In Python
A function is a block of code that only executes when called. You can pass data, known
as parameters, into a function and the function can return data as a result.
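A minimal sketch of defining and calling a function is shown below. The function name `rectangle_area` and its parameters are made up for this example.

```python
def rectangle_area(length, width):
    """Return the area of a rectangle."""
    return length * width

# Nothing inside the function runs until it is called.
area = rectangle_area(4, 5)
print(area)
```

Here `length` and `width` are the parameters, the values 4 and 5 are the data passed in, and the result is handed back with `return`.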
Activity
● List other industries where AI can be utilised
● Write a Python function called “two_sum” that takes two integer arguments and
returns their sum
● Write an if-else statement that prints “This is a prime number” if the number is
prime, or prints “This is an Ordinary number” if the number is not prime
AI Invasion Project
The AI Invasion project is the last project you must complete before graduating from this
program. The goal of this project is to reinforce what you've learned, put some of the
abilities you've gained into practice, and have a project to show potential employers in
your portfolio.
Data Collection
Data collection is one of the most important stages of machine learning workflows.
During data collection, you are defining the potential usefulness and accuracy of your
project with the quality of the data you collect.
To collect data, you need to identify your sources and aggregate data from those
sources into a single dataset. This could mean streaming data from the Internet,
downloading open-source data sets, or constructing a data lake from assorted files,
logs, or media.
Data Pre-Processing
Once your data is collected, you need to pre-process it. Pre-processing involves
cleaning, verifying, and formatting data into a usable dataset. If you are collecting data
from a single source, this may be a relatively straightforward process. However, if you
are aggregating several sources, you need to make sure that data formats match, that
data is equally reliable, and that any potential duplicates are removed.
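The following is a minimal sketch of these pre-processing steps using pandas. The two small tables, their column names, and their values are invented; real sources will need their own cleaning rules.

```python
import pandas as pd

# Two made-up sources with different column names and text formats.
source_a = pd.DataFrame({"name": ["Ada", "Bola"], "age": [23, 31]})
source_b = pd.DataFrame({"Name": [" bola ", "Chidi"], "Age": ["31", "27"]})

# Make the formats match: same column names and same data types.
source_b = source_b.rename(columns={"Name": "name", "Age": "age"})
source_b["age"] = source_b["age"].astype(int)

# Aggregate both sources into a single dataset and tidy the text column.
combined = pd.concat([source_a, source_b], ignore_index=True)
combined["name"] = combined["name"].str.strip().str.title()

# Remove rows that are exact duplicates after cleaning.
clean = combined.drop_duplicates().reset_index(drop=True)
print(clean)
```

The duplicate row only becomes detectable after the formats are aligned, which is why formatting comes before duplicate removal in this sketch.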
Data Segmentation
This phase involves breaking the processed data into three datasets (training, validation,
and test):
● Training set: used to initially train the algorithm and teach it how to process
information. This set defines model classifications through parameters.
● Validation set: used to estimate the accuracy of the model. This dataset is used
to fine-tune model parameters.
● Test set: used to assess the accuracy and performance of the models. This set
is meant to expose any issues in the model.
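One simple way to produce the three datasets is to shuffle the records and slice them. The sketch below uses only the Python standard library; the 70/15/15 proportions and the stand-in records are assumptions, not values prescribed by the course.

```python
import random

# Stand-in for a processed dataset: 100 numbered records.
records = list(range(100))

random.seed(42)          # fixed seed so the shuffle is repeatable
random.shuffle(records)  # shuffle before splitting to avoid ordered bias

# Assumed proportions: 70% training, 15% validation, 15% test.
train_end = len(records) * 70 // 100
val_end = train_end + len(records) * 15 // 100

train_set = records[:train_end]
validation_set = records[train_end:val_end]
test_set = records[val_end:]

print(len(train_set), len(validation_set), len(test_set))
```

Each record ends up in exactly one of the three sets, so the test set stays unseen during training and tuning.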
Once you have datasets, you are ready to train your model. This involves feeding your
training set to your algorithm so that it can learn appropriate parameters and features
used in classification.
Once training is complete, you can then refine the model using your validation dataset.
This may involve modifying or discarding variables and includes a process of tweaking
model-specific settings (hyperparameters) until an acceptable accuracy level is
reached.
Finally, after an acceptable set of hyperparameters is found and your model accuracy is
optimized you can test your model. Testing uses your test dataset and is meant to verify
that your models are using accurate features. Based on the feedback you receive you
may return to training the model to improve accuracy, adjust output settings, or deploy
the model as needed.
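The train, refine, and test cycle described above can be sketched with a tiny nearest-neighbour classifier written in plain Python. The data points, the candidate values of the hyperparameter `k`, and the helper names are all hypothetical; this is an illustration of the workflow rather than a recommended model.

```python
def knn_predict(train, x, k):
    """Predict a 0/1 label for x by majority vote of the k nearest training points."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0

def accuracy(train, data, k):
    """Fraction of points in data whose predicted label matches the true label."""
    correct = sum(knn_predict(train, x, k) == label for x, label in data)
    return correct / len(data)

# Made-up (feature, label) pairs; larger features mostly have label 1.
train_set = [(1, 0), (2, 0), (3, 0), (4, 1), (5, 0), (6, 1), (7, 1), (8, 1)]
validation_set = [(2.5, 0), (4.5, 0), (5.5, 1), (6.5, 1)]
test_set = [(1.5, 0), (3.5, 0), (6.2, 1), (7.5, 1)]

# Refine: try several values of the hyperparameter k on the validation set.
candidate_ks = [1, 3, 5]
best_k = max(candidate_ks, key=lambda k: accuracy(train_set, validation_set, k))

# Test: the test set is used once, at the end, to report final performance.
test_accuracy = accuracy(train_set, test_set, best_k)
print(best_k, test_accuracy)
```

The key design point is that the test set plays no part in choosing `k`, so the final score is a fairer estimate of how the model behaves on new data.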
Deployment of machine learning models, or simply, putting models into production,
means making your models available to other systems within the organization or the
web so that they can receive data and return their predictions.
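As a rough sketch of what "receive data and return predictions" can look like, the function below accepts a JSON request and returns a JSON response. The stand-in model, its coefficients, and the request format are assumptions; in practice such a function would usually sit behind a web framework or other serving tool.

```python
import json

# Stand-in for a trained model: the coefficients below are invented.
MODEL = {"slope": 2.0, "intercept": 1.0}

def handle_request(request_body):
    """Receive JSON input from another system and return JSON predictions."""
    payload = json.loads(request_body)
    predictions = [MODEL["slope"] * x + MODEL["intercept"] for x in payload["inputs"]]
    return json.dumps({"predictions": predictions})

# Example call, as another system might make it.
response = handle_request('{"inputs": [1, 2, 3]}')
print(response)
```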
Below is a list of some of the publicly available datasets you can work with as a data
scientist:
● AFRIFASHION40000: AFRIFASHION40000 is an openly available dataset of
African fashion images generated using Generative Adversarial Networks
(GANs) and created by Data Science Nigeria. Link:
https://bit.ly/DSN_AFRIFASHION40000
● Data.gov: It consists of a variety of datasets from US Government agencies.
Domains include Education, Climate, Food, Chronic disease and so much more.
Link: https://www.data.gov/
● UCI Machine Learning Repository: This site consists of datasets hosted by the
University of California, Irvine. It has a collection of more than 400 datasets aimed
at the Machine Learning community. Link:
http://archive.ics.uci.edu/ml/index.php
● Google Public Datasets: Google hosts many datasets on Google Public
Datasets, which is part of its Cloud Platform. You can browse through the
dataset collection using BigQuery. The first 1 terabyte of queries you make is
free. Link: https://cloud.google.com/bigquery/public-data/
● Datasets on GitHub: GitHub hosts many useful datasets. This repository offers a
variety of datasets such as climate data, time series data, plane crash data, etc.
Feel free to dig in. Link: https://github.com/awesomedata/awesome-public-datasets
● For more datasets, you can check the link below:
https://medium.com/analytics-vidhya/top-100-open-source-datasets-for-data-science-cd5a8d67cc3d
Activity
- Using Pandas, download and load a dataset from one of the open-source
dataset collections listed above
- Perform exploratory data analysis (EDA) on the Titanic dataset. You can load the dataset using
pd.read_csv("https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv")
DAY 3: FREE AI Class in Every City 2023
Supervised Learning
Supervised Learning can be subdivided into Regression and Classification. Both belong to a
class of Machine Learning algorithms called “Supervised Learning Algorithms”. Remember that
Machine Learning is all about predicting future occurrences of an event by making reference to
past occurrences of the event in question. For this particular class of Machine Learning
(Supervised Learning), we have labelled data, that is, known examples with which the algorithm
is trained so that it can make predictions on unseen data.
Classification: A classification model (classifier) is the type of model in which the output
variable (i.e. the label) is discrete. For example:
● Predict if the patient has cancer or not. [Label: Cancer and Not Cancer (2)]
● Predict if an employee will leave or stay. [Label: Leave and Stay (2)]
● Predict if an email is spam or not. [Label: Spam and Not Spam (2)]
Regression: A Regression model is the type of model in which the output variable is continuous.
For example:
Similarities:
- Both belong to the class of Machine Learning Algorithms called “Supervised
Learning”.
- Both model a problem by learning a mapping function/relationship from the
input(X) to the output(Y), using known examples.
Differences:
Regression is a predictive statistical process where the model attempts to find the important
relationship between dependent and independent variables. The goal of a regression algorithm
is to predict a continuous number such as sales, income, and test scores.
First, we will have a quick introduction to building models in Python, and what better way to start
than one of the very basic models, linear regression? Linear regression will be the first algorithm
used and you will also learn more complex regression models.
The Mathematical Theory Of Linear Regression
Mathematically, the linear regression model can be defined by a dependent variable ’Y’, also
called the regressand, an independent variable (or set of independent variables) ’X’, also
called the regressor(s), and a sample size ’n’.
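In its standard form with a single regressor, the model is written as Y = b0 + b1 * X + error for each of the n observations, where b0 is the intercept and b1 is the slope. The sketch below estimates b0 and b1 with the usual least squares formulas in plain Python; the data values are made up and roughly follow Y = 2X + 1.

```python
# Made-up data that roughly follows y = 2x + 1.
x = [1, 2, 3, 4, 5]
y = [3.1, 4.9, 7.2, 9.0, 10.8]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n

# Ordinary least squares estimates for a single regressor:
# slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)
slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)
# The fitted line passes through the point (mean_x, mean_y).
intercept = mean_y - slope * mean_x

print(round(slope, 2), round(intercept, 2))
```

Libraries such as scikit-learn perform the same kind of fit for many regressors at once; this hand-written version is only meant to show what is being estimated.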
- Decision Tree
- Support Vector Machine
- XGBoost
Activity
- Build a machine learning algorithm
R Square/Adjusted R Square
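R Square is calculated as one minus the ratio of the sum of squared prediction errors to
the total sum of squares, where the total sum of squares replaces the prediction with the
mean of the actual values. R Square is typically between 0 and 1, and a bigger value
indicates a better fit between the predicted and actual values.
R Square is a good measure of how well the model fits the dependent variable. However,
it does not take overfitting into consideration: in a linear regression, adding more
independent variables never lowers R Square on the training data. Adjusted R Square
corrects for this by penalising the number of independent variables in the model.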
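Both measures can be computed by hand, as in the sketch below. The function names and the sample values are invented for illustration; the formulas are the standard ones, R Square = 1 - SS_res / SS_tot and Adjusted R Square = 1 - (1 - R Square) * (n - 1) / (n - p - 1), where n is the number of data points and p the number of independent variables.

```python
def r_squared(actual, predicted):
    """R^2 = 1 - (sum of squared errors) / (total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

def adjusted_r_squared(r2, n_samples, n_features):
    """Penalise R^2 for the number of independent variables used."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)

# Made-up actual values and model predictions.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 7.0, 9.5]

r2 = r_squared(actual, predicted)
adj_r2 = adjusted_r_squared(r2, n_samples=len(actual), n_features=1)
print(round(r2, 4), round(adj_r2, 4))
```

The adjusted value is never higher than the plain R Square, and the gap widens as more independent variables are added without a matching improvement in fit.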
Mean Square Error(MSE)/Root Mean Square Error(RMSE)
MSE is calculated by taking the prediction error (the real output minus the predicted
output) for each data point, squaring it, summing the squares, and dividing by the number
of data points. It gives you an absolute number for how much your predicted results
deviate from the actual values. A single MSE value is hard to interpret on its own, but it
gives you a real number to compare against the results of other models and helps you
select the best regression model.
Root Mean Square Error (RMSE) is the square root of MSE. It is used more commonly
than MSE for two reasons. First, the MSE value can sometimes be too big to compare
easily. Second, because MSE is calculated from squared errors, taking the square root
brings it back to the same scale as the prediction error, which makes it easier to interpret.
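A minimal sketch of both calculations in plain Python follows. The function name and the sample values are made up for illustration.

```python
import math

def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sum(squared_errors) / len(squared_errors)

# Made-up actual values and model predictions.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 9.0, 8.0]

mse = mean_squared_error(actual, predicted)
rmse = math.sqrt(mse)   # back on the same scale as the original values
print(mse, round(rmse, 3))
```

Because the errors are squared before averaging, one large error (here the prediction of 9.0 for an actual value of 7.0) contributes more to the result than several small ones.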
Exporting Predictions Of Machine Learning Models
At last, you're ready to submit some predictions for scoring. You can write your
predictions to a CSV file using the .to_csv() method on a pandas DataFrame.
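A minimal sketch of writing predictions to a file is shown below. The column names, IDs, prediction values, and the file name `submission.csv` are hypothetical; a real competition specifies its own submission format.

```python
import pandas as pd

# Hypothetical IDs and predictions; a real competition defines its own columns.
submission = pd.DataFrame(
    {"ID": ["id_1", "id_2", "id_3"], "prediction": [0.2, 0.9, 0.4]}
)

# index=False leaves the DataFrame's row numbers out of the file.
submission.to_csv("submission.csv", index=False)

# Read the file back to check what was written.
check = pd.read_csv("submission.csv")
print(check)
```

Without `index=False`, pandas writes an extra unnamed column of row numbers, which may not match the expected submission layout.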
Activity
- Build an XGBoost model using the tips.csv dataset
4. Click on the sign-up button.
5. An email will be sent to your mailbox for account confirmation.
6. Next, re-visit the site: https://zindi.africa/
7. Click on sign-in.
How To Join A Competition On Zindi
- Prize competition: you win prize money if you are among the top 3 winners
of a particular competition.
- You win points: Points increase your ranking among other data scientists
on the platform.
- You gain knowledge: Knowledge competitions are where you can learn
and increase your skillset.
4. Navigate to the hackathon of choice.
5. The competition page provides some information to help you understand the
problem you are going to solve, by reading the problem statement, how you can
participate and how to submit your solution on the platform.
6. On the competition page, the following tabs can be found:
1. Info: The info tab contains the problem statements of the competition
and a list of organizations that have either provided the dataset or funded
the competition. On the left side, you can see a list of vertical tabs that
provide more information about the competition.
(a) Description
(b) Rules
- Each competition has its own rules. Breaking the rules can lead to
disqualification, so make sure to carefully read and understand all of them.
(c) Prizes
- Now, this is the best part: the prize section provides details about
the prize money that will be awarded to the first, second and third
place winners of the competition. But remember that not all competitions
provide prize money for their winners; in other competitions, you can
earn Zindi points or gain knowledge.
(d) Evaluation
- Each competition has its own evaluation metric that will be used to
evaluate your results and rank you on the leaderboard. It also
shows how you should prepare your submission file before
uploading your file on the platform.
(e) Timeline
- This section provides information about the start date of the
competition and the end date and time of the competition. If you
submit your solution after the deadline you will receive a score but it
will not reflect on the leaderboard. Make sure to submit before the
deadline if you want to be considered for a prize.
2. Data
- The data tab contains a description of the dataset you are going to
use for this competition. On the right side of the page, you can see
a list of links to download the dataset and other important files. You
will download:
3. Discussion
- We're not Liverpool FC fans but we like their slogan: “You will never
walk alone”. That is the purpose of the discussion page: you don’t
need to walk alone throughout the competition. If you face any
challenge or uncertainty during the competition or you want to ask a
question to understand more about the dataset provided, you can
post on the discussion page and other data scientists enrolled in
the competition can help you to solve the problem.
- The discussion board is very active and full of knowledgeable and
helpful African data scientists willing to assist you.
4. Leaderboard
- After you have uploaded your submission file, you will appear on
the leaderboard. The leaderboard shows your position among all
enrolled data scientists in the competition. Your position will depend
on your performance after your solution has been evaluated. For
this competition, you can submit ten times a day.
5. Team
- You don’t want to do the competition on your own? That’s OK! You
can create a team with fellow data scientists enrolled in the
competition and work together. The maximum number for a team is
4 members. Remember that sharing code between individuals is
not allowed, so if you want to share code with someone else, they
must be on the same team as you.
6. Submission
- The submission page is where you will upload your submission file,
by clicking the orange button at the top right side of the page. After
you have uploaded your solution, it will be evaluated according to
the evaluation metric specified in the competition. Then you will see
the score that will define your position on the leaderboard.
Deep learning is a subset of Machine Learning using the approach of neural networks.
This branch works with algorithms built with the aim of mimicking the structure and
function of the human brain.
- Output Layer: where the expected output is obtained.
A node (or neuron) is a single unit in the network that receives information from other
neurons. It is the basic building block of a neural network and is also referred to as ‘the
learning unit’.
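A single neuron can be sketched in a few lines of Python. In this illustration the neuron computes a weighted sum of its inputs plus a bias and passes the result through a sigmoid function; the choice of sigmoid, the function name, and all the numbers are assumptions made for the example.

```python
import math

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through a sigmoid."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-weighted_sum))

# Made-up inputs, weights and bias for illustration.
output = neuron(inputs=[0.5, 0.2], weights=[0.4, -0.6], bias=0.1)
print(round(output, 4))
```

During training, a network adjusts the weights and bias of each neuron so that the final outputs move closer to the expected ones.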
The image below shows how a deep learning model works at a glance
Key differences between Deep Learning and Machine Learning: