15.-Meeting-21-Machine-Learning (Supplementary Material)
MEETING XXI
© IBM 2020
Learning Objectives
© IBM 2020
In this course, you'll learn how Machine Learning is used in many key fields and industries.
For example, in the health care industry, data scientists use Machine Learning to predict whether a human cell that is believed to be at risk of developing cancer is benign or malignant.
© IBM 2020
Machine Learning Role
© IBM 2020
Machine Learning Role
You’ll learn how bankers use machine learning to make decisions on whether to approve loan applications.
© IBM 2020
Machine Learning Role
And you will learn how to use machine learning to do bank customer segmentation, a task that is usually not easy to perform manually for huge volumes of varied data.
© IBM 2020
Machine Learning Role
In this course, you'll see how machine learning helps websites such as YouTube, Amazon, or Netflix develop recommendations for their customers about various products or services, such as which movies they might be interested in watching or which books to buy.
© IBM 2020
Machine Learning
There is so much that you can do with Machine Learning!
Here, you'll learn how to use popular Python libraries to build your model.
For example, given an automobile dataset, we use the scikit-learn (sklearn) library to estimate the Co2 emission of cars using their engine size or number of cylinders.
We can even predict what the Co2 emissions will be for a car that hasn't even been produced yet!
And we’ll see how the telecommunications
industry can predict customer churn.
You can run and practice the code of all these
samples using the built-in lab environment in
this course.
You don’t have to install anything to your
computer or do anything on the cloud.
© IBM 2020
Introduction
© IBM 2020
Introduction
One of the interesting questions we can ask at this point is: "Is this a benign or malignant cell?"
In contrast with a benign tumor, a malignant tumor is a tumor that may invade its surrounding tissue or spread
around the body, and diagnosing it early might be the key to a patient’s survival.
© IBM 2020
Introduction
One could easily presume that only a doctor with years of experience could diagnose that tumor and say if the
patient is developing cancer or not. Right?
Well, imagine that you’ve obtained a dataset containing characteristics of thousands of human cell samples
extracted from patients who were believed to be at risk of developing cancer.
© IBM 2020
Introduction
Analysis of the original data showed that many of the characteristics differed significantly between benign and malignant samples.
© IBM 2020
Introduction
You can use the values of these cell characteristics in samples from other patients to give an early indication
of whether a new sample might be benign or malignant.
© IBM 2020
Machine Learning
You should clean your data, select a proper algorithm for building a prediction model, and train your model to understand patterns of benign or malignant cells within the data.
© IBM 2020
Machine Learning
© IBM 2020
Machine Learning
Then, traditionally, we had to write down some rules or methods in order to get computers to be intelligent and detect the animals.
© IBM 2020
Machine Learning
© IBM 2020
Machine Learning in Real Life
© IBM 2020
Machine Learning in Real Life
They use Machine Learning to produce suggestions that you might enjoy!
This is similar to how your friends might recommend a television show to you, based on their knowledge of the types of shows you like to watch.
How do you think banks make a decision when approving a loan application?
They use machine learning to predict the probability of default for each applicant, and then approve or refuse the loan application based on that probability.
Telecommunication companies use their customers' demographic data to segment them, or predict if they will unsubscribe from their company the next month.
© IBM 2020
Machine Learning in Real Life
There are many other applications of machine learning that we see every day in our daily life, such as chatbots, logging into our phones, or even computer games using face recognition.
Each of these uses different machine learning techniques and algorithms.
© IBM 2020
Major Machine Learning Techniques
1. Regression / Estimation
2. Classification
3. Clustering
4. Associations
5. Anomaly Detection
6. Sequence Mining
7. Dimension Reduction
8. Recommendation Systems
© IBM 2020
Major Machine Learning Techniques
© IBM 2020
Major Machine Learning Techniques
Clustering groups similar cases; for example, it can find similar patients, or it can be used for customer segmentation in the banking field.
© IBM 2020
Major Machine Learning Techniques
Sequence mining is used for predicting the next event; for instance, the click-stream in websites.
© IBM 2020
Machine Learning vs AI vs
Deep Learning
By this point, this question may have crossed your mind: "What is the difference between these buzzwords that we keep hearing these days, such as Artificial Intelligence (or AI), Machine Learning, and Deep Learning?"
Well, let me explain the differences between them.
In brief, AI tries to make computers intelligent in order to mimic the cognitive functions
of humans.
So, Artificial Intelligence is a general field with a broad scope including: Computer
Vision, Language Processing, Creativity, and Summarization.
© IBM 2020
Machine Learning vs AI vs
Deep Learning
Machine Learning is the branch of AI that covers the statistical part of artificial
intelligence.
It teaches the computer to solve problems by looking at hundreds or thousands of
examples, learning from them, and then using that experience to solve the same
problem in new situations.
And Deep Learning is a very special field of Machine Learning where computers can
actually learn and make intelligent decisions on their own.
Deep learning involves a deeper level of automation in comparison with most machine
learning algorithms.
© IBM 2020
Python is a popular and powerful general-
purpose programming language that recently
emerged as the preferred language among
data scientists.
© IBM 2020
Python for Machine Learning
We try to introduce the Python packages in this course and use them in the labs to give you better hands-on experience.
1. The first package is NumPy, which is a math library for working with n-dimensional arrays in Python. It enables you to do computation efficiently and effectively. It is better than regular Python because of its amazing capabilities. For example, for working with arrays, dictionaries, functions, datatypes, and images, you need to know NumPy.
2. SciPy is a collection of numerical algorithms and domain-specific toolboxes, including signal processing, optimization, statistics and much more. SciPy is a good library for scientific and high-performance computation.
3. Matplotlib is a very popular plotting package that provides 2D plotting as well as 3D plotting. Basic knowledge about these three packages, which are built on top of Python, is a good asset for data scientists who want to work with real-world problems.
© IBM 2020
4. The Pandas library is a very high-level Python library that provides high-performance, easy-to-use data structures. It has many functions for data importing, manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series.
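To make the roles of these packages concrete, here is a minimal, hedged sketch (the engine-size and emission numbers below are made up for illustration):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# NumPy: efficient n-dimensional arrays and vectorized computation
engine_size = np.array([1.6, 2.0, 2.4, 3.0, 3.5])   # hypothetical engine sizes (litres)
co2 = np.array([182, 205, 218, 250, 268])           # hypothetical Co2 emissions (g/km)

# Pandas: high-level, labeled data structures built on top of NumPy
df = pd.DataFrame({"ENGINESIZE": engine_size, "CO2EMISSIONS": co2})
print(df.describe())  # quick summary statistics of the dataset

# Matplotlib: 2D plotting
plt.scatter(df["ENGINESIZE"], df["CO2EMISSIONS"])
plt.xlabel("Engine size")
plt.ylabel("Co2 emission")
plt.show()
```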
© IBM 2020
Scikit-learn
Scikit-learn is a free machine learning library for the Python programming language.
It has most of the classification, regression and clustering algorithms, and it's designed to work with the Python numerical and scientific libraries, NumPy and SciPy.
Also, it includes very good documentation.
On top of that, implementing machine learning models with scikit-learn is really easy, with just a few lines of Python code.
Most of the tasks that need to be done in a machine learning pipeline are already implemented in scikit-learn, including pre-processing of data, feature selection, feature extraction, train/test splitting, defining the algorithms, fitting models, tuning parameters, prediction, evaluation, and exporting the model.
© IBM 2020
Scikit-learn
Let me show you an example of what scikit-learn looks like when you use this library.
You don't have to understand the code for now; just see how easily you can build a model with a few lines of code.
Basically, machine learning algorithms benefit from standardization of the dataset.
If there are some outliers, or fields with different scales in your dataset, you have to fix them.
© IBM 2020
How to work with Scikit-learn
The preprocessing package of scikit-learn provides several common utility functions and transformer classes to change raw feature vectors into a form of vector suitable for modeling.
You have to split your dataset into train and test sets to train your model, and then test the model's accuracy separately.
Scikit-learn can split arrays or matrices into random train and test subsets for you, in one line of code.
Then, you can set up your algorithm.
© IBM 2020
How to work with Scikit-learn
For example, you can build a classifier using a support vector classification algorithm.
We call our estimator instance clf, and initialize its parameters. Now, you can train your model with the train set.
By passing our training set to the fit method, the clf model learns to classify unknown cases.
Then, we can use our test set to run predictions.
And, the result tells us what the class of each unknown value is.
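The code shown on the original slides is not reproduced in this handout, so here is a minimal, hedged sketch of the workflow just described — standardization, train/test split, an SVC classifier, fit, and predict — using a made-up feature matrix X and label vector y:

```python
import numpy as np
from sklearn import preprocessing, svm
from sklearn.model_selection import train_test_split

# Hypothetical data: six samples, two features, binary labels
X = np.array([[1.0, 2.1], [1.2, 1.9], [3.5, 4.0], [3.7, 4.2], [1.1, 2.0], [3.6, 4.1]])
y = np.array([0, 0, 1, 1, 0, 1])

# Standardize the features (zero mean, unit variance)
X = preprocessing.StandardScaler().fit(X).transform(X)

# Split into mutually exclusive train and test sets in one line
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=4)

# Set up the algorithm: a support vector classifier, with the estimator instance called clf
clf = svm.SVC(gamma=0.001, C=100.0)

# Train the model with the train set
clf.fit(X_train, y_train)

# Use the test set to run predictions; the result is the predicted class of each unknown case
yhat = clf.predict(X_test)
print(yhat)
```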
© IBM 2020
How to work with Scikit-learn
© IBM 2020
Machine Learning
You may find all or some of these machine learning terms confusing, but don't worry, we will talk about all of these topics later in the course.
And of course, it needs much more coding if you use pure Python programming to implement all of these tasks.
© IBM 2020
Supervised vs Unsupervised
An easy way to begin grasping the concept of supervised learning is by looking directly at the words that make it up.
Supervise means to observe and direct the execution of a task, project, or activity.
Obviously, we aren't going to be supervising a person…
© IBM 2020
Supervised Machine Learning
© IBM 2020
Supervised Machine Learning
So, how do we supervise a machine learning model? We do this by "teaching" the model.
But … this leads to the next question, which is, "How exactly do we teach a model?"
© IBM 2020
Supervised Machine Learning – Labeled Data
It’s important to note that the data is labeled. And what does a labeled dataset look like?
Well, it can look something like this. This example is taken from the cancer dataset.
As you can see, we have some historical data for patients, and we already know the class of each row.
© IBM 2020
Supervised Machine Learning – Labeled Data
© IBM 2020
Supervised Machine Learning – Labeled Data
Looking directly at the values of the data, you can see that there are two kinds.
The first is numerical. When dealing with machine learning, the most commonly used data is numeric.
The second is categorical… that is, it’s non-numeric, because it contains characters rather than numbers.
In this case, it’s categorical because this dataset is made for Classification.
© IBM 2020
There are two types of Supervised Learning techniques.
They are: classification and regression.
© IBM 2020
Classification
© IBM 2020
Regression
Regression is the process of predicting a continuous value as opposed to predicting a categorical value in
Classification.
Look at this dataset.
© IBM 2020
Regression
It is related to Co2 emissions of different cars.
© IBM 2020
Unsupervised Algorithm
This means the unsupervised algorithm trains on the dataset and draws conclusions on UNLABELED data.
Generally speaking, unsupervised learning has more difficult algorithms than supervised learning, since we
know little to no information about the data, or the outcomes that are to be expected.
Dimension reduction, Density estimation, Market basket analysis and Clustering are the most widely used
unsupervised machine learning techniques.
© IBM 2020
Unsupervised Machine Learning Techniques
1. Dimensionality Reduction and/or feature selection play a large role in this by reducing redundant features to
make the classification easier.
2. Market basket analysis is a modelling technique based upon the theory that if you buy a certain group of
items, you’re more likely to buy another group of items.
3. Density estimation is a very simple concept that is mostly used to explore the data to find some structure
within it.
4. And finally, clustering. Clustering is considered to be one of the most popular unsupervised machine learning
techniques used for grouping data points or objects that are somehow similar.
© IBM 2020
Cluster Analysis
Cluster analysis has many applications in different domains, whether it be a bank's desire to segment its customers based on certain characteristics, or helping an individual to organize and group his/her favourite types of music!
Generally speaking, though, clustering is used mostly for: discovering structure, summarization, and anomaly detection.
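As a small, hedged illustration of the clustering idea (not code from the course labs), here is a k-means customer-segmentation sketch on made-up data:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customer data: [age, annual income in thousands of dollars]
customers = np.array([
    [25, 40], [27, 42], [30, 45],    # younger, lower-income customers
    [45, 90], [48, 95], [50, 100],   # older, higher-income customers
])

# Group the customers into two clusters of similar cases
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(customers)
print(kmeans.labels_)           # segment assigned to each customer
print(kmeans.cluster_centers_)  # the center of each segment
```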
© IBM 2020
Supervised vs Unsupervised Learning
So, to recap, the biggest difference between Supervised and Unsupervised Learning is that supervised learning deals with labeled data while unsupervised learning deals with unlabeled data.
© IBM 2020
Regression
© IBM 2020
Regression
We can use regression methods to predict a continuous value, such as CO2 Emission, using some other
variables.
Indeed, regression is the process of predicting a continuous value.
In regression there are two types of variables: a dependent variable and one or more independent variables.
© IBM 2020
Regression
The dependent variable can be seen as the "state", "target" or "final goal" we study and try to predict, and the
independent variables, also known as explanatory variables, can be seen as the "causes" of those "states".
The independent variables are shown conventionally by x; and the dependent variable is notated by y.
A regression model relates y, or the dependent variable, to a function of x, i.e., the independent variables.
The key point in the regression is that our dependent value should be continuous, and cannot be a discrete value.
© IBM 2020
Regression
However, the independent variable or variables can be measured on either a categorical or continuous
measurement scale.
So, what we want to do here is to use the historical data of some cars, using one or more of their features, and
from that data, make a model.
We use regression to build such a regression/estimation model.
Then the model is used to predict the expected Co2 emission for a new or unknown car.
© IBM 2020
Regression Models
© IBM 2020
Regression Application
© IBM 2020
Regression Application
It can also be used in the field of psychology, for example,
to determine individual satisfaction based on
demographic and psychological factors.
© IBM 2020
Regression Algorithms
© IBM 2020
Simple Linear
Regression
You don’t need to know any linear algebra to understand topics in linear regression.
This high-level introduction will give you enough background information on linear regression to be able to use it
effectively on your own problems.
So, let’s get started. Let’s take a look at this dataset. It’s related to the Co2 emission of different cars.
© IBM 2020
Linear Regression
It includes Engine size, Cylinders, Fuel Consumption and Co2 emissions for various car models.
The question is: Given this dataset, can we predict the Co2 emission of a car, using another field, such as Engine
size?
Quite simply, yes! We can use linear regression to predict a continuous value such as Co2 Emission, by using
other variables.
Linear regression is the approximation of a linear model used to describe the relationship between two or more
variables.
In simple linear regression, there are two variables: a dependent variable and an independent variable.
The key point in linear regression is that our dependent value should be continuous and cannot be a discrete value.
© IBM 2020
Linear Regression
However, the independent variable(s) can be measured on either a categorical or continuous measurement
scale.
© IBM 2020
Linear Regression Models
There are two types of linear regression models.
They are: simple regression and multiple regression.
Simple linear regression is when one independent variable is used to estimate a dependent variable.
For example, predicting Co2 emission using the EngineSize variable.
When more than one independent variable is present, the process is called multiple linear regression.
For example, predicting Co2 emission using EngineSize and Cylinders of cars.
Our focus in this section is on simple linear regression.
Now, let’s see how linear regression works.
© IBM 2020
Linear Regression
© IBM 2020
Linear Regression
Also, it indicates that these variables are linearly related. With linear regression you can fit a line through the
data.
For instance, as the EngineSize increases, so do the emissions.
With linear regression, you can model the relationship of these variables.
A good model can be used to predict what the approximate emission of each car is.
© IBM 2020
Linear Regression
How do we use this line for prediction now? Let us assume, for a moment, that the line is a good fit of data. We
can use it to predict the emission of an unknown car. For example, for a sample car, with engine size 2.4, you
can find the emission is 214.
© IBM 2020
Linear Regression – Fitting Line
Now, let’s talk about what this fitting line actually is. We’re going to predict the target value, y.
In our case, using the independent variable, "Engine Size," represented by x1.
© IBM 2020
Linear Regression – Fitting Line
The fit line is shown traditionally as a polynomial. In a simple regression problem (a single x), the form of the model would be ŷ = θ0 + θ1x1.
In this equation, ŷ is the dependent variable or the predicted value, and x1 is the independent variable; θ0 and θ1 are the parameters of the line that we must adjust. θ1 is known as the "slope" or "gradient" of the fitting line and θ0 is known as the "intercept." θ0 and θ1 are also called the coefficients of the linear equation.
You can interpret this equation as ŷ being a function of x1, or ŷ being dependent on x1.
Now the questions are: "How would you draw a line through the points?" And, "How do
you determine which line ‘fits best’?”
Linear regression estimates the coefficients of the line.
This means we must calculate θ0 and θ1 to find the best line to ‘fit’ the data.
This line would best estimate the emission of the unknown data points.
© IBM 2020
Linear Regression – Fitting Line
Let’s see how we can find this line, or to be more precise, how we can adjust the parameters to make the line
the best fit for the data.
For a moment, let’s assume we’ve already found the best fit line for our data.
Now, let’s go through all the points and check how well they align with this line.
Best fit, here, means that if we have, for instance, a car with engine size x1=5.4, and actual Co2=250, its Co2
should be predicted very close to the actual value, which is y=250, based on historical data.
© IBM 2020
Linear Regression – Fitting Line
But, if we use the fit line, or, better said, use our polynomial with known parameters to predict the Co2 emission, it will return ŷ = 340.
Now, if you compare the actual value of the emission of the car with what we predicted using our model, you will find out that we have a 90-unit error.
© IBM 2020
Linear Regression – Fitting Line
This means our prediction line is not accurate. This error is also called the residual error.
So, we can say the error is the distance from the data point to the fitted regression line.
The mean of all residual errors shows how poorly the line fits with the whole dataset.
Mathematically, it can be shown by the mean squared error (MSE) equation.
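The MSE equation itself appears only as an image on the original slide; its standard form (reconstructed from the definition, not copied from the slide) is:

```latex
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2
```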
© IBM 2020
Linear Regression – Fitting Line
Our objective is to find a line where the mean of all these errors is minimized.
In other words, the mean error of the prediction using the fit line should be minimized.
Let’s re-word it more technically.
The objective of linear regression is to minimize this MSE equation, and to minimize it, we should find the best
parameters, θ0 and θ1.
© IBM 2020
Linear Regression – Fitting Line
Actually, we have two options here: we can use a mathematical approach, or we can use an optimization approach.
Notice that, for the mathematical approach, all of the data must be available to traverse in order to calculate the parameters. It can be shown that the intercept and slope can be calculated using these equations. We can start off by estimating the value for θ1.
As mentioned before, θ0 and θ1, in the simple linear regression, are the coefficients of the fit line.
We can use a simple equation to estimate these coefficients. That is, given that it’s a simple linear regression,
with only 2 parameters, and knowing that θ0 and θ1 are the intercept and slope of the line, we can estimate
them directly from our data.
It requires that we calculate the mean of the independent and dependent or target columns, from the dataset.
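The equations referred to here are shown only as an image on the original slide; the standard closed-form estimates for simple linear regression (a well-known result, not copied from the slide) are:

```latex
\theta_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2},
\qquad
\theta_0 = \bar{y} - \theta_1\,\bar{x}
```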
© IBM 2020
Linear Regression – Fitting Line
This is how you can find the slope of a line based on the data.
x̄ is the average value for the engine size in our dataset.
Please consider that we have 9 rows here, row 0 to 8.
First, we calculate the average of x1 and average of y. Then we plug it into the slope equation, to find θ1.
© IBM 2020
Linear Regression – Fitting Line
The xi and yi in the equation refer to the fact that we need to repeat these calculations across all values in our
dataset and i refers to the i’th value of x or y.
© IBM 2020
Linear Regression – Fitting Line
As a side note, you really don’t need to remember the formula for calculating these parameters, as most of
the libraries used for machine learning in Python, R, and Scala can easily find these parameters for you.
But it's always good to understand how it works. Now, we can write down the polynomial of the line.
© IBM 2020
Linear Regression – Fitting Line
So, we know how to find the best fit for our data, and its equation.
Now the question is: "How can we use it to predict the emission of a new car based on its engine size?"
After we found the parameters of the linear equation, making predictions is as simple as solving the equation for a
specific set of inputs.
Imagine we are predicting Co2 Emission (y) from EngineSize (x) for the automobile in record number 9. Our linear regression model representation for this problem would be: ŷ = θ0 + θ1x1.
© IBM 2020
Or if we map it to our dataset, it would be Co2Emission = θ0 + θ1 EngineSize.
As we saw, we can find θ0, θ1 using the equations that we just talked about.
Once found, we can plug in the equation of the linear model.
For example, let’s use θ0=125 and θ1=39.
So, we can rewrite the linear model as Co2Emission = 125 + 39 × EngineSize.
© IBM 2020
Now, let’s plug in the 9th row of our dataset and calculate the Co2 Emission for a car with an EngineSize of 2.4. So
Co2Emission = 125 + 39 × 2.4.
Therefore, we can predict that the Co2 Emission for this specific car would be 218.6.
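A minimal scikit-learn sketch of the same idea is shown below. The training numbers are hypothetical, so the fitted coefficients will not reproduce θ0 = 125 and θ1 = 39 exactly:

```python
import numpy as np
from sklearn import linear_model

# Hypothetical training data: engine size (litres) vs Co2 emission (g/km)
engine_size = np.array([[1.6], [2.0], [2.4], [3.0], [3.5], [4.0]])
co2 = np.array([186, 203, 221, 242, 261, 280])

# Fit a simple linear regression: Co2Emission = theta0 + theta1 * EngineSize
regr = linear_model.LinearRegression()
regr.fit(engine_size, co2)
print("Intercept (theta0):", regr.intercept_)
print("Slope (theta1):", regr.coef_[0])

# Predict the emission of a car with an engine size of 2.4
print("Predicted Co2:", regr.predict([[2.4]])[0])
```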
© IBM 2020
Linear Regression
Let's talk a bit about why Linear Regression is so useful.
Quite simply, it is the most basic regression to use and understand. In fact, one reason why Linear Regression is so useful is that it's fast!
It also doesn't require tuning of parameters. So, something like tuning the K parameter in K-Nearest Neighbors, or the learning rate in Neural Networks, isn't something to worry about.
Linear Regression is also easy to understand and highly interpretable.
© IBM 2020
Multiple Linear Regression
As you know there are two types of linear regression models: simple regression and multiple regression.
Simple linear regression is when one independent variable is used to estimate a dependent variable. For
example, predicting Co2 emission using the variable of EngineSize.
In reality, there are multiple variables that predict the Co2 emission. When multiple independent variables
are present, the process is called "multiple linear regression." For example, predicting Co2 emission using
EngineSize and the number of Cylinders in the car’s engine.
The good thing is that multiple linear regression is the extension of the simple linear regression model.
Before we dive into a sample dataset and see how multiple linear regression works, I want to tell you what kind
of problems it can solve; when we should use it; and, specifically, what kind of questions we can answer using it.
© IBM 2020
Multiple Linear Regression
Basically, there are two applications for multiple linear regression.
First, it can be used when we would like to identify the strength of the effect that the independent variables have on a dependent variable.
For example, do revision time, test anxiety, lecture attendance, and gender have any effect on the exam performance of students?
Second, it can be used to predict the impact of changes. That is, to understand how the dependent variable changes when we change the independent variables.
For example, if we were reviewing a person's health data, a multiple linear regression can tell you how much that person's blood pressure goes up (or down) for every unit increase (or decrease) in the patient's body mass index (BMI), holding other factors constant.
© IBM 2020
Multiple Linear Regression
As is the case with simple linear regression, multiple linear regression is a method of predicting a continuous
variable.
It uses multiple variables, called independent variables, or predictors, that best predict the value of the target
variable, which is also called the dependent variable.
In multiple linear regression, the target value, y, is a linear combination of independent variables, x.
For example, you can predict how much Co2 a car might emit due to independent variables, such as the car’s
Engine Size, Number of Cylinders and Fuel Consumption.
© IBM 2020
Multiple Linear Regression
Multiple linear regression is very useful because you can examine which variables are significant predictors of
the outcome variable.
Also, you can find out how each feature impacts the outcome variable.
And again, as is the case in simple linear regression, if you manage to build such a regression model, you can
use it to predict the emission amount of an unknown case, such as record number 9.
© IBM 2020
Multiple Linear Regression
Generally, the model is of the form: ŷ = θ0 + θ1x1 + θ2x2 + … + θnxn.
Mathematically, we can show it in vector form as well.
This means it can be shown as a dot product of two vectors: the parameters vector and the feature set vector.
Generally, we can show the equation for a multi-dimensional space as θ^T x, where θ is an n-by-one vector of unknown parameters in a multi-dimensional space, and x is the vector of the feature set; θ is a vector of coefficients, and it is supposed to be multiplied by x.
© IBM 2020
Multiple Linear Regression
Conventionally, it is shown as θ transpose. θ is also called the parameter, or weight, vector of the regression equation; both these terms can be used interchangeably. And x is the feature set, which represents a car. For example, x1 for engine size, or x2 for cylinders, and so on.
The first element of the feature set is set to 1, because it turns θ0 into the intercept or bias parameter when the feature vector is multiplied by the parameter vector.
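Putting the last two slides together in standard notation (reconstructed, not copied from the slides), with x0 fixed to 1 so that θ0 acts as the intercept:

```latex
\hat{y} \;=\; \theta^{T}x \;=\; \theta_0 x_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n,
\qquad x_0 = 1
```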
© IBM 2020
Multiple Linear Regression
Please notice that θ^T x in a one-dimensional space is the equation of a line. It is what we use in simple linear regression. In higher dimensions, when we have more than one input (or x), the line is called a plane or a hyper-plane.
And this is what we use for multiple linear regression.
So, the whole idea is to find the best fit hyper-plane for our data.
To this end, and as is the case in linear regression, we should estimate the values
for θ vector that best predict the value of the target field in each row.
To achieve this goal, we have to minimize the error of the prediction.
Now, the question is, "How do we find the optimized parameters?"
© IBM 2020
Optimize Parameters
To find the optimized parameters for our model, we should first understand what the optimized parameters are.
Then we will find a way to optimize the parameters.
In short, optimized parameters are the ones which lead to a model with the fewest errors.
Let's assume, for a moment, that we have already found the parameter vector of our model. It means we already know the values of the θ vector.
Now, we can use the model, and the feature set of the first row of our dataset, to predict the Co2 emission for the first car, correct?
© IBM 2020
Optimize Parameters
If we plug the feature set values into the model equation, we find ŷ.
Let’s say, for example, it returns 140 as the predicted value for this specific row.
What is the actual value? y=196.
How different is the predicted value from the actual value of 196? Well, we can calculate it quite simply, as
196-140, which of course = 56.
© IBM 2020
Optimize Parameters
This is the error of our model, only for one row, or one car, in our case.
As is the case in linear regression, we can say the error here is the distance from the data point to the fitted
regression model.
The mean of all residual errors shows how badly the model represents the dataset. It is called the mean squared error, or MSE.
Mathematically, MSE can be shown by an equation. While this is not the only way to express the error of a multiple linear regression model, it is one of the most popular ways to do so.
The best model for our dataset is the one with minimum error for all prediction values.
© IBM 2020
Optimize Parameters
So, the objective of multiple linear regression is to minimize the MSE equation.
To minimize it, we should find the best parameters θ, but how?
Okay, "How do we find the parameters or coefficients for multiple linear regression?"
However, the most common methods are the ordinary least squares and optimization approaches.
Ordinary least squares tries to estimate the values of the coefficients by minimizing the "Mean Square Error."
This approach uses the data as a matrix and uses linear algebra operations to estimate the optimal values for the theta.
© IBM 2020
Optimize Parameters
The problem with this technique is the time complexity of calculating matrix operations, as it can take a very
long time to finish.
When the number of rows in your dataset is less than 10,000, you can think of this technique as an option; however, for greater values, you should try other, faster approaches.
© IBM 2020
Optimize Parameters
The second option is to use an optimization algorithm to find the best parameters.
That is, you can use a process of optimizing the values of the coefficients by iteratively minimizing the
error of the model on your training data.
For example, you can use Gradient Descent, which starts optimization with random values for each
coefficient.
Then, it calculates the errors and tries to minimize them by wisely changing the coefficients over multiple iterations.
Gradient descent is a proper approach if you have a large dataset.
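A hedged, minimal sketch of gradient descent for linear regression follows (this is not the course's lab code; the data, learning rate, and iteration count are made up):

```python
import numpy as np

# Hypothetical data: X includes a leading column of 1s so theta[0] acts as the intercept
X = np.array([[1.0, 1.6], [1.0, 2.0], [1.0, 2.4], [1.0, 3.0], [1.0, 3.5]])
y = np.array([186.0, 203.0, 221.0, 242.0, 261.0])

theta = np.zeros(X.shape[1])  # start with initial values for each coefficient
learning_rate = 0.01

for _ in range(20000):
    y_hat = X @ theta                        # predictions with the current coefficients
    error = y_hat - y                        # residuals
    gradient = (2.0 / len(y)) * X.T @ error  # gradient of the MSE with respect to theta
    theta -= learning_rate * gradient        # move the coefficients against the gradient

print("theta0 (intercept):", theta[0])
print("theta1 (slope):", theta[1])
```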
© IBM 2020
Optimize Parameters
Please understand, however, that there are other approaches to estimate the parameters of the multiple linear
regression that you can explore on your own.
After you find the best parameters for your model, you can go to the prediction phase.
After we found the parameters of the linear equation, making predictions is as simple as solving the equation
for a specific set of inputs. Imagine we are predicting Co2 emission (or y) from other variables for the
automobile in record number 9.
© IBM 2020
Optimize Parameters
Our linear regression model representation for this problem would be: ŷ = θ^T x. Once we find the parameters, we can plug them into the equation of the linear model. For example, let's use θ0 = 125, θ1 = 6.2, θ2 = 14, and so on.
If we map it to our dataset, we can rewrite the linear model as "Co2Emission=125 plus 6.2 multiplied by
EngineSize plus 14 multiplied by Cylinder," and so on.
As you can see, multiple linear regression estimates the relative importance of predictors.
© IBM 2020
For example, it shows that Cylinders has a higher impact on Co2 emission amounts in comparison with EngineSize.
Now, let’s plug in the 9th row of our dataset and calculate the Co2 emission for a car with the EngineSize of
2.4.
So Co2Emission=125 + 6.2 × 2.4 + 14 × 4 … and so on.
We can predict the Co2 emission for this specific car would be 214.1.
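A minimal multiple linear regression sketch in scikit-learn (hypothetical data; the fitted coefficients will not match the θ values quoted above):

```python
import numpy as np
from sklearn import linear_model

# Hypothetical training data: [EngineSize, Cylinders, FuelConsumption]
X = np.array([
    [2.0, 4, 8.5], [2.4, 4, 9.6], [3.0, 6, 11.1],
    [3.5, 6, 12.0], [4.0, 8, 13.4], [5.0, 8, 15.0],
])
y = np.array([196, 221, 255, 271, 296, 330])  # Co2 emissions (g/km)

regr = linear_model.LinearRegression()
regr.fit(X, y)

# The coefficients indicate the relative importance of each predictor
print("Intercept:", regr.intercept_)
print("Coefficients:", regr.coef_)

# Predict the emission of a new car: engine size 2.4, 4 cylinders, fuel consumption 9.6
print("Prediction:", regr.predict([[2.4, 4, 9.6]])[0])
```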
© IBM 2020
Multiple Linear
Regression Concerns
Now let me address some concerns that you might already be having regarding multiple
linear regression.
As you saw, you can use multiple independent variables to predict a target value in multiple
linear regression.
It sometimes results in a better model compared to using a simple linear regression, which
uses only one independent variable to predict the dependent variable.
Now, the question is, "How many independent variables should we use for the prediction?”
Should we use all the fields in our dataset? Does adding independent variables to a multiple
linear regression model always increase the accuracy of the model?
© IBM 2020
Multiple Linear Regression Concerns
Basically, adding too many independent variables without any theoretical justification may result in an over-
fit model.
An over-fit model is a real problem because it is too complicated for your data set and not general enough to
be used for prediction.
So, it is recommended to avoid using many variables for prediction.
There are different ways to avoid overfitting a model in regression.
© IBM 2020
Multiple Linear Regression Concerns
As a last point, remember that “multiple linear regression” is a specific type of linear regression.
So, there needs to be a linear relationship between the dependent variable and each of your independent
variables.
There are a number of ways to check for linear relationship.
For example, you can use scatterplots, and then visually check for linearity.
If the relationship displayed in your scatterplot is not linear, then, you need to use non-linear regression.
© IBM 2020
Model Evaluation
The goal of regression is to build a model to
accurately predict an unknown case.
© IBM 2020
Model Accuracy
Let's look at the first approach.
When considering evaluation models, we clearly want to choose the one that will give us the most accurate results.
So, the question is, how can we calculate the accuracy of our model?
In other words, how much can we trust this model for the prediction of an unknown sample, using a given dataset and having built a model such as linear regression?
© IBM 2020
Model Accuracy
© IBM 2020
Model Accuracy
Now, we select a small portion of the dataset, such as row numbers 6 to 9, but without the labels.
This set is called a test set: it has the labels, but the labels are not used for prediction; they are used only as ground truth.
© IBM 2020
Model Accuracy
© IBM 2020
Model Accuracy
The error of the model is calculated as the average difference between the predicted and actual values for all the
rows.
We can write this error as an equation.
So, the first evaluation approach we just talked about is the simplest one: train and test on the SAME dataset.
© IBM 2020
Model Accuracy
Essentially, the name of this approach says it all: you train the model on the entire dataset, then you test it using a portion of the same dataset.
© IBM 2020
Model Accuracy
© IBM 2020
Out of Sample Accuracy
Out-of-sample accuracy is the percentage of correct predictions that the model makes on data that the model has NOT been trained on.
Doing a "train and test" on the same dataset will most likely have low out-of-sample accuracy due to the likelihood of being over-fit.
It's important that our models have high out-of-sample accuracy, because the purpose of our model is, of course, to make correct predictions on unknown data.
So, how can we improve out-of-sample accuracy?
© IBM 2020
Out of Sample Accuracy
One way is to use another evaluation approach called "Train/Test Split."
© IBM 2020
Out of Sample Accuracy
In this approach, we select a portion of our dataset for training, for example, rows 0 to 5. And the rest is used
for testing, for example, rows 6 to 9.
© IBM 2020
Out of Sample Accuracy
© IBM 2020
Out of Sample Accuracy
Train/Test Split involves splitting the dataset into training and testing sets, respectively, which are mutually
exclusive, after which, you train with the training set and test with the testing set.
This will provide a more accurate evaluation of out-of-sample accuracy because the testing dataset is NOT part of the dataset that has been used to train the model.
It is more realistic for real world problems.
This means that we know the outcome of each data point in this dataset, making it great to test with!
And since this data has not been used to train the model, the model has no knowledge of the outcome of these
data points.
So, in essence, it’s truly out-of-sample testing.
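A hedged sketch of this evaluation approach with scikit-learn (the data is hypothetical):

```python
import numpy as np
from sklearn import linear_model
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Hypothetical engine sizes and Co2 emissions
X = np.array([[1.6], [2.0], [2.4], [3.0], [3.5], [4.0], [4.7], [5.0], [5.7], [6.2]])
y = np.array([186, 203, 221, 242, 261, 280, 301, 310, 332, 347])

# Mutually exclusive train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

regr = linear_model.LinearRegression().fit(X_train, y_train)

# Evaluate on data the model has never seen: an estimate of out-of-sample accuracy
print("R-squared on the test set:", r2_score(y_test, regr.predict(X_test)))
```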
© IBM 2020
Out of Sample Accuracy
The issue with train/test split is that it’s highly dependent on the datasets on which the data was trained and
tested.
The variation of this causes train/test split to have a better out-of-sample prediction than training and testing
on the same dataset, but it still has some problems due to this dependency.
© IBM 2020
K-Fold Cross-validation
Another evaluation model, called "K-Fold Cross-validation," resolves most of these issues.
How do you fix a high variation that results from a dependency?
Well, you average it.
Let me explain the basic concept of “k-fold cross-validation” to see how we can solve this problem.
The entire dataset is represented by the points in the image at the top left. If we have k=4 folds, then we
split up this dataset as shown here.
© IBM 2020
K-Fold Cross-validation
In the first fold, for example, we use the first 25 percent of the dataset for testing, and the rest for training.
© IBM 2020
K-Fold Cross-validation
© IBM 2020
K-Fold Cross-validation
K-fold cross-validation, in its simplest form, performs multiple train/test splits
using the same dataset where each split is different.
Then, the result is averaged to produce a more consistent out-of-sample
accuracy.
We wanted to show you an evaluation model that addressed some of the
issues we’ve described in the previous approaches.
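A minimal k-fold cross-validation sketch with scikit-learn (hypothetical data, k = 4):

```python
import numpy as np
from sklearn import linear_model
from sklearn.model_selection import cross_val_score

X = np.array([[1.6], [2.0], [2.4], [3.0], [3.5], [4.0], [4.7], [5.0]])
y = np.array([186, 203, 221, 242, 261, 280, 301, 310])

regr = linear_model.LinearRegression()

# Four folds: each fold is used once for testing while the rest is used for training
scores = cross_val_score(regr, X, y, cv=4)

print("Per-fold R-squared scores:", scores)
print("Averaged out-of-sample accuracy:", scores.mean())
```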
© IBM 2020
EVALUATION METRICS IN REGRESSION
© IBM 2020
Defining Error
But, before we get into defining these, we need to define what an error actually is.
In the context of regression, the error of the model is the difference between the data points and the trend line
generated by the algorithm.
Since there are multiple data points, an error can be determined in multiple ways.
© IBM 2020
Defining Error
Mean absolute error is the mean of the absolute value of the errors.
This is the easiest of the metrics to understand, since it’s just the average error.
© IBM 2020
Defining Error
Root Mean Squared Error (RMSE) is the square root of the mean squared error.
This is one of the most popular of the evaluation metrics because Root Mean Squared Error is interpretable in the same units as the
response vector (or ‘y’ units) making it easy to relate its information.
Relative Absolute Error (RAE) takes the total absolute error and normalizes it by dividing by the total absolute error of the simple predictor (one that always predicts ȳ, the mean value of y).
© IBM 2020
Defining Error
Relative Squared Error (RSE) is very similar to “Relative absolute error “, but is widely adopted by the data
science community, as it is used for calculating R-squared.
R-squared is not error, per se, but is a popular metric for the accuracy of your model.
It represents how close the data values are to the fitted regression line.
The higher the R-squared, the better the model fits your data.
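For reference, the standard formulas behind these metrics (reconstructed from their definitions, not taken from the slides):

```latex
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\lvert y_i - \hat{y}_i\rvert,\qquad
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2,\qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}}
```

```latex
\mathrm{RAE} = \frac{\sum_{i=1}^{n}\lvert y_i - \hat{y}_i\rvert}{\sum_{i=1}^{n}\lvert y_i - \bar{y}\rvert},\qquad
\mathrm{RSE} = \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2},\qquad
R^2 = 1 - \mathrm{RSE}
```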
© IBM 2020
Non Linear Regression
These data points correspond to China's Gross Domestic Product (or GDP) from 1960 to 2014.
The first column is the year, and the second is China's corresponding annual gross domestic income in US dollars for that year.
© IBM 2020
Non Linear Regression
This is what the data points look like. Now, we have a couple of interesting questions.
First, “Can GDP be predicted based on time?”
And second, “Can we use a simple linear regression to model it?”
© IBM 2020
Non Linear Regression
Indeed, if the data shows a curvy trend, then linear regression will not produce very accurate results
when compared to a non-linear regression -- simply because, as the name implies, linear regression
presumes that the data is linear.
The scatterplot shows that there seems to be a strong relationship between GDP and time, but the
relationship is not linear.
As you can see, the growth starts off slowly, then from 2005 onward, the growth is very significant. And
finally, it decelerates slightly in the 2010s.
It kind of looks like either a logistic or an exponential function. So, it requires a special estimation method of the non-linear regression procedure.
© IBM 2020
Non Linear Regression
For example, if we assume that the model for these data points is an exponential function, such as ŷ = θ0 + θ1·(θ2)^x, our job is to estimate the parameters of the model, i.e. the θs, and use the fitted model to predict GDP for unknown or future cases.
In fact, many different regressions exist that can be used to fit whatever the dataset looks like.
You can see quadratic and cubic regression lines here, and it can go on and on to infinite degrees.
© IBM 2020
Polynomial Regression
In essence, we can call all of these "polynomial regression," where the relationship between the independent
variable x and the dependent variable y is modelled as an nth degree polynomial in x.
With many types of regression to choose from, there’s a good chance that one will fit your dataset well.
Remember, it’s important to pick a regression that fits the data the best.
© IBM 2020
Polynomial Regression
So, what is polynomial regression? Polynomial regression fits a curved line to your data.
A simple example of a polynomial with degree 3 is ŷ = θ0 + θ1x + θ2x^2 + θ3x^3, where the θs are parameters to be estimated so that the model fits the underlying data well.
© IBM 2020
Non Linear and Polynomial Regression
Though the relationship between x and y is non-linear here, and polynomial regression can fit them, a
polynomial regression model can still be expressed as linear regression.
I know it's a bit confusing, but let’s look at an example.
Given the third-degree polynomial equation, by defining x1 = x and x2 = x^2 (x to the power of 2) and so on, the model is converted to a linear regression with new variables: ŷ = θ0 + θ1x1 + θ2x2 + θ3x3.
This model is linear in the parameters to be estimated, right?
Therefore, this polynomial regression is considered to be a special case of traditional multiple linear
regression.
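A hedged sketch of exactly this trick in scikit-learn — expand x into polynomial features, then fit an ordinary linear regression on the new variables (the data is made up):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical data with a non-linear (roughly cubic) trend
x = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([2.1, 9.8, 28.5, 66.0, 127.3, 218.0])

# Create the new variables x1 = x, x2 = x^2, x3 = x^3 (plus the bias column)
poly = PolynomialFeatures(degree=3)
x_poly = poly.fit_transform(x)

# The problem is now a linear regression in the transformed variables
model = LinearRegression().fit(x_poly, y)
print("Coefficients (theta):", model.coef_)
print("Prediction at x = 4.5:", model.predict(poly.transform([[4.5]]))[0])
```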
© IBM 2020
Non Linear Regression
© IBM 2020
Non Linear Regression
When it comes to non-linear equations, they can take the shape of an exponential, a logarithm, a logistic function, or many other types.
As you can see, in all of these equations, the change of ŷ depends on changes in the parameters θ, not necessarily on x only.
© IBM 2020
Non Linear Regression
© IBM 2020
Non Linear Regression
The second important question is, "How should I model my data if it looks non-linear on a scatter plot?"
Well, to address this, you have to use either polynomial regression, a non-linear regression model, or "transform" your data.
© IBM 2020
Lab
© IBM 2020