15.-Meeting-21-Machine-Learning (Supplementary Material)
MEETING XXI
© IBM 2020
Learning Objectives
© IBM 2020
In this course, you'll learn how Machine Learning is used in many key fields and industries.
For example, in the health care industry, data scientists use Machine Learning to predict whether a human cell that is believed to be at risk of developing cancer is benign or malignant.
© IBM 2020
Machine Learning Role
© IBM 2020
Machine Learning Role
You’ll learn how bankers use machine learning to make decisions on whether to approve loan applications.
© IBM 2020
Machine Learning Role
And you will learn how to use machine learning to do bank customer segmentation, a task that is usually not easy to perform manually for huge volumes of varied data.
© IBM 2020
Machine Learning Role
In this course, you'll see how machine learning helps websites such as YouTube, Amazon, or Netflix develop recommendations for their customers about various products or services, such as which movies they might be interested in watching or which books to buy.
© IBM 2020
Machine Learning
There is so much that you can do with Machine Learning!
Here, you'll learn how to use popular Python libraries to build your model.
For example, given an automobile dataset, we use the scikit-learn (sklearn) library to estimate the Co2 emission of cars using their engine size or number of cylinders.
We can even predict what the Co2 emissions will be for a car that hasn't even been produced yet!
And we’ll see how the telecommunications
industry can predict customer churn.
You can run and practice the code of all these
samples using the built-in lab environment in
this course.
You don’t have to install anything to your
computer or do anything on the cloud.
© IBM 2020
Introduction
© IBM 2020
Introduction
One of the interesting questions we can ask at this point is: "Is this a benign or malignant cell?"
In contrast with a benign tumor, a malignant tumor is a tumor that may invade its surrounding tissue or spread
around the body, and diagnosing it early might be the key to a patient’s survival.
© IBM 2020
Introduction
One could easily presume that only a doctor with years of experience could diagnose that tumor and say if the
patient is developing cancer or not. Right?
Well, imagine that you’ve obtained a dataset containing characteristics of thousands of human cell samples
extracted from patients who were believed to be at risk of developing cancer.
© IBM 2020
Introduction
Analysis of the original data showed that many of the characteristics differed significantly between benign and malignant samples.
© IBM 2020
Introduction
You can use the values of these cell characteristics in samples from other patients to give an early indication
of whether a new sample might be benign or malignant.
© IBM 2020
Machine Learning
You should clean your data, select a proper algorithm for building a prediction model, and train your model to understand patterns of benign or malignant cells within the data.
© IBM 2020
Machine Learning
© IBM 2020
Machine Learning
Then, traditionally, we had to write down some rules or methods in order to get computers to be intelligent and detect the animals.
© IBM 2020
Machine Learning
© IBM 2020
Machine Learning in Real Life
© IBM 2020
Machine Learning in Real Life
They use Machine Learning to produce suggestions that you might enjoy!
This is similar to how your friends might recommend a television show to you, based on their knowledge of the types of shows you like to watch.
How do you think banks make a decision when approving a loan application?
They use machine learning to predict the probability of default for each applicant, and then approve or refuse the loan application based on that probability.
Telecommunication companies use their customers' demographic data to segment them, or predict if they will unsubscribe from their company the next month.
© IBM 2020
Machine Learning in Real Life
There are many other applications of machine learning that we see every day in our daily life, such as chatbots, logging into our phones, or even computer games using face recognition.
Each of these uses different machine learning techniques and algorithms.
© IBM 2020
Major Machine Learning Techniques
1. Regression / Estimation
2. Classification
3. Clustering
4. Associations
5. Anomaly Detection
6. Sequence Mining
7. Dimension Reduction
8. Recommendation Systems
© IBM 2020
Major Machine Learning Techniques
© IBM 2020
Major Machine Learning Techniques
Clustering groups similar cases; for example, it can find similar patients, or it can be used for customer segmentation in the banking field.
© IBM 2020
Major Machine Learning Techniques
Sequence mining is used for predicting the next event; for instance, the click-stream in websites.
© IBM 2020
Machine Learning vs AI vs
Deep Learning
By this point, this question may have crossed your mind: "What is the difference between these buzzwords that we keep hearing these days, such as Artificial Intelligence (or AI), Machine Learning, and Deep Learning?"
Well, let me explain the differences between them.
In brief, AI tries to make computers intelligent in order to mimic the cognitive functions
of humans.
So, Artificial Intelligence is a general field with a broad scope including: Computer
Vision, Language Processing, Creativity, and Summarization.
© IBM 2020
Machine Learning vs AI vs
Deep Learning
Machine Learning is the branch of AI that covers the statistical part of artificial
intelligence.
It teaches the computer to solve problems by looking at hundreds or thousands of
examples, learning from them, and then using that experience to solve the same
problem in new situations.
And Deep Learning is a very special field of Machine Learning where computers can
actually learn and make intelligent decisions on their own.
Deep learning involves a deeper level of automation in comparison with most machine
learning algorithms.
© IBM 2020
Python is a popular and powerful general-
purpose programming language that recently
emerged as the preferred language among
data scientists.
© IBM 2020
Python for Machine Learning
We try to introduce the Python packages in this course and use them in the labs to give you better hands-on experience.
1. The first package is NumPy, which is a math library for working with n-dimensional arrays in Python. It enables you to do computation efficiently and effectively. It is better than regular Python because of its amazing capabilities. For example, for working with arrays, dictionaries, functions, datatypes, and images, you need to know NumPy.
2. SciPy is a collection of numerical algorithms and domain-specific toolboxes, including signal processing, optimization, statistics and much more. SciPy is a good library for scientific and high-performance computation.
3. Matplotlib is a very popular plotting package that provides 2D plotting as well as 3D plotting. Basic knowledge about these three packages, which are built on top of Python, is a good asset for data scientists who want to work with real-world problems.
© IBM 2020
4. The Pandas library is a very high-level Python library that provides high-performance, easy-to-use data structures. It has many functions for data importing, manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series.
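To make the roles of these packages concrete, here is a minimal, hedged sketch (the engine-size and emission numbers below are made up for illustration):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# NumPy: efficient n-dimensional arrays and vectorized computation
engine_size = np.array([1.6, 2.0, 2.4, 3.0, 3.5])   # hypothetical engine sizes (litres)
co2 = np.array([182, 205, 218, 250, 268])           # hypothetical Co2 emissions (g/km)

# Pandas: high-level, labeled data structures built on top of NumPy
df = pd.DataFrame({"ENGINESIZE": engine_size, "CO2EMISSIONS": co2})
print(df.describe())  # quick summary statistics of the dataset

# Matplotlib: 2D plotting
plt.scatter(df["ENGINESIZE"], df["CO2EMISSIONS"])
plt.xlabel("Engine size")
plt.ylabel("Co2 emission")
plt.show()
```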
© IBM 2020
Scikit-learn
Scikit-learn is a free machine learning library for the Python programming language.
It has most of the classification, regression and clustering algorithms, and it's designed to work with the Python numerical and scientific libraries, NumPy and SciPy.
Also, it includes very good documentation.
On top of that, implementing machine learning models with scikit-learn is really easy, with just a few lines of Python code.
Most of the tasks that need to be done in a machine learning pipeline are already implemented in scikit-learn, including pre-processing of data, feature selection, feature extraction, train/test splitting, defining the algorithms, fitting models, tuning parameters, prediction, evaluation, and exporting the model.
© IBM 2020
Scikit-learn
Let me show you an example of what scikit-learn looks like when you use this library.
You don't have to understand the code for now; just see how easily you can build a model with a few lines of code.
Basically, machine learning algorithms benefit from standardization of the dataset.
If there are some outliers, or fields with different scales in your dataset, you have to fix them.
© IBM 2020
How to work with Scikit-learn
The preprocessing package of scikit-learn provides several common utility functions and transformer classes to change raw feature vectors into a form of vector suitable for modeling.
You have to split your dataset into train and test sets to train your model, and then test the model's accuracy separately.
Scikit-learn can split arrays or matrices into random train and test subsets for you, in one line of code.
Then, you can set up your algorithm.
© IBM 2020
How to work with Scikit-learn
For example, you can build a classifier using a support vector classification algorithm.
We call our estimator instance clf, and initialize its parameters. Now, you can train your model with the train set.
By passing our training set to the fit method, the clf model learns to classify unknown cases.
Then, we can use our test set to run predictions.
And, the result tells us what the class of each unknown value is.
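The code shown on the original slides is not reproduced in this handout, so here is a minimal, hedged sketch of the workflow just described — standardization, train/test split, an SVC classifier, fit, and predict — using a made-up feature matrix X and label vector y:

```python
import numpy as np
from sklearn import preprocessing, svm
from sklearn.model_selection import train_test_split

# Hypothetical data: six samples, two features, binary labels
X = np.array([[1.0, 2.1], [1.2, 1.9], [3.5, 4.0], [3.7, 4.2], [1.1, 2.0], [3.6, 4.1]])
y = np.array([0, 0, 1, 1, 0, 1])

# Standardize the features (zero mean, unit variance)
X = preprocessing.StandardScaler().fit(X).transform(X)

# Split into mutually exclusive train and test sets in one line
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=4)

# Set up the algorithm: a support vector classifier, with the estimator instance called clf
clf = svm.SVC(gamma=0.001, C=100.0)

# Train the model with the train set
clf.fit(X_train, y_train)

# Use the test set to run predictions; the result is the predicted class of each unknown case
yhat = clf.predict(X_test)
print(yhat)
```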
© IBM 2020
How to work with Scikit-learn
© IBM 2020
Machine Learning
You may find all or some of these machine learning terms confusing, but don't worry, we will talk about all of these topics later in the course.
And of course, it needs much more coding if you use pure Python programming to implement all of these tasks.
© IBM 2020
Supervised vs Unsupervised
An easy way to begin grasping the concept of supervised learning is by looking directly at the words that make it up.
Supervise means to observe and direct the execution of a task, project, or activity.
Obviously, we aren't going to be supervising a person…
© IBM 2020
Supervised Machine Learning
© IBM 2020
Supervised Machine Learning
So, how do we supervise a machine learning model? We do this by "teaching" the model.
But … this leads to the next question, which is, "How exactly do we teach a model?"
© IBM 2020
Supervised Machine Learning – Labeled Data
It’s important to note that the data is labeled. And what does a labeled dataset look like?
Well, it can look something like this. This example is taken from the cancer dataset.
As you can see, we have some historical data for patients, and we already know the class of each row.
© IBM 2020
Supervised Machine Learning – Labeled Data
© IBM 2020
Supervised Machine Learning – Labeled Data
Looking directly at the values of the data, you can see that there are two kinds.
The first is numerical. When dealing with machine learning, the most commonly used data is numeric.
The second is categorical… that is, it’s non-numeric, because it contains characters rather than numbers.
In this case, it’s categorical because this dataset is made for Classification.
© IBM 2020
There are two types of Supervised Learning techniques.
They are: classification and regression.
© IBM 2020
Classification
© IBM 2020
Regression
Regression is the process of predicting a continuous value as opposed to predicting a categorical value in
Classification.
Look at this dataset.
© IBM 2020
Regression
It is related to Co2 emissions of different cars.
© IBM 2020
Unsupervised Algorithm
This means the unsupervised algorithm trains on the dataset and draws conclusions on UNLABELED data.
Generally speaking, unsupervised learning has more difficult algorithms than supervised learning, since we
know little to no information about the data, or the outcomes that are to be expected.
Dimension reduction, Density estimation, Market basket analysis and Clustering are the most widely used
unsupervised machine learning techniques.
© IBM 2020
Unsupervised Machine Learning Techniques
1. Dimensionality Reduction and/or feature selection play a large role in this by reducing redundant features to
make the classification easier.
2. Market basket analysis is a modelling technique based upon the theory that if you buy a certain group of
items, you’re more likely to buy another group of items.
3. Density estimation is a very simple concept that is mostly used to explore the data to find some structure
within it.
4. And finally, clustering. Clustering is considered to be one of the most popular unsupervised machine learning
techniques used for grouping data points or objects that are somehow similar.
© IBM 2020
Cluster Analysis
Cluster analysis has many applications in different domains, whether it be a bank's desire to segment its customers based on certain characteristics, or helping an individual to organize and group his/her favourite types of music!
Generally speaking, though, clustering is used mostly for: discovering structure, summarization, and anomaly detection.
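As a small, hedged illustration of the clustering idea (not code from the course labs), here is a k-means customer-segmentation sketch on made-up data:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customer data: [age, annual income in thousands of dollars]
customers = np.array([
    [25, 40], [27, 42], [30, 45],    # younger, lower-income customers
    [45, 90], [48, 95], [50, 100],   # older, higher-income customers
])

# Group the customers into two clusters of similar cases
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(customers)
print(kmeans.labels_)           # segment assigned to each customer
print(kmeans.cluster_centers_)  # the center of each segment
```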
© IBM 2020
Supervised vs Unsupervised Learning
So, to recap, the biggest difference between Supervised and Unsupervised Learning is that supervised learning deals with labeled data while unsupervised learning deals with unlabeled data.
© IBM 2020
Regression
© IBM 2020
Regression
We can use regression methods to predict a continuous value, such as CO2 Emission, using some other
variables.
Indeed, regression is the process of predicting a continuous value.
In regression there are two types of variables: a dependent variable and one or more independent variables.
© IBM 2020
Regression
The dependent variable can be seen as the "state", "target" or "final goal" we study and try to predict, and the
independent variables, also known as explanatory variables, can be seen as the "causes" of those "states".
The independent variables are shown conventionally by x; and the dependent variable is notated by y.
A regression model relates y, or the dependent variable, to a function of x, i.e., the independent variables.
The key point in the regression is that our dependent value should be continuous, and cannot be a discrete value.
© IBM 2020
Regression
However, the independent variable or variables can be measured on either a categorical or continuous
measurement scale.
So, what we want to do here is to use the historical data of some cars, using one or more of their features, and
from that data, make a model.
We use regression to build such a regression/estimation model.
Then the model is used to predict the expected Co2 emission for a new or unknown car.
© IBM 2020
Regression Models
© IBM 2020
Regression Application
© IBM 2020
Regression Application
It can also be used in the field of psychology, for example,
to determine individual satisfaction based on
demographic and psychological factors.
© IBM 2020
Regression Algorithms
© IBM 2020
Simple Linear
Regression
You don’t need to know any linear algebra to understand topics in linear regression.
This high-level introduction will give you enough background information on linear regression to be able to use it
effectively on your own problems.
So, let’s get started. Let’s take a look at this dataset. It’s related to the Co2 emission of different cars.
© IBM 2020
Linear Regression
It includes Engine size, Cylinders, Fuel Consumption and Co2 emissions for various car models.
The question is: Given this dataset, can we predict the Co2 emission of a car, using another field, such as Engine
size?
Quite simply, yes! We can use linear regression to predict a continuous value such as Co2 Emission, by using
other variables.
Linear regression is the approximation of a linear model used to describe the relationship between two or more
variables.
In simple linear regression, there are two variables: a dependent variable and an independent variable.
The key point in linear regression is that our dependent value should be continuous and cannot be a discrete value.
© IBM 2020
Linear Regression
However, the independent variable(s) can be measured on either a categorical or continuous measurement
scale.
© IBM 2020
Linear Regression Models
There are two types of linear regression models.
They are: simple regression and multiple regression.
Simple linear regression is when one independent variable is used to estimate a dependent variable.
For example, predicting Co2 emission using the EngineSize variable.
When more than one independent variable is present, the process is called multiple linear regression.
For example, predicting Co2 emission using EngineSize and Cylinders of cars.
Our focus in this section is on simple linear regression.
Now, let’s see how linear regression works.
© IBM 2020
Linear Regression
© IBM 2020
Linear Regression
Also, it indicates that these variables are linearly related. With linear regression you can fit a line through the
data.
For instance, as the EngineSize increases, so do the emissions.
With linear regression, you can model the relationship of these variables.
A good model can be used to predict what the approximate emission of each car is.
© IBM 2020
Linear Regression
How do we use this line for prediction now? Let us assume, for a moment, that the line is a good fit of data. We
can use it to predict the emission of an unknown car. For example, for a sample car, with engine size 2.4, you
can find the emission is 214.
© IBM 2020
Linear Regression – Fitting Line
Now, let’s talk about what this fitting line actually is. We’re going to predict the target value, y.
In our case, using the independent variable, "Engine Size," represented by x1.
© IBM 2020
Linear Regression – Fitting Line
The fit line is shown traditionally as a polynomial. In a simple regression problem (a single x), the form of the model would be ŷ = θ0 + θ1x1.
In this equation, ŷ is the dependent variable or the predicted value, and x1 is the independent variable; θ0 and θ1 are the parameters of the line that we must adjust. θ1 is known as the "slope" or "gradient" of the fitting line and θ0 is known as the "intercept." θ0 and θ1 are also called the coefficients of the linear equation.
You can interpret this equation as ŷ being a function of x1, or ŷ being dependent on x1.
Now the questions are: "How would you draw a line through the points?" And, "How do
you determine which line ‘fits best’?”
Linear regression estimates the coefficients of the line.
This means we must calculate θ0 and θ1 to find the best line to ‘fit’ the data.
This line would best estimate the emission of the unknown data points.
© IBM 2020
Linear Regression – Fitting Line
Let’s see how we can find this line, or to be more precise, how we can adjust the parameters to make the line
the best fit for the data.
For a moment, let’s assume we’ve already found the best fit line for our data.
Now, let’s go through all the points and check how well they align with this line.
Best fit, here, means that if we have, for instance, a car with engine size x1=5.4, and actual Co2=250, its Co2
should be predicted very close to the actual value, which is y=250, based on historical data.
© IBM 2020
Linear Regression – Fitting Line
But, if we use the fit line, or, better said, use our polynomial with known parameters to predict the Co2 emission, it will return ŷ = 340.
Now, if you compare the actual value of the emission of the car with what we predicted using our model, you will find out that we have a 90-unit error.
© IBM 2020
Linear Regression – Fitting Line
This means our prediction line is not accurate. This error is also called the residual error.
So, we can say the error is the distance from the data point to the fitted regression line.
The mean of all residual errors shows how poorly the line fits with the whole dataset.
Mathematically, it can be shown by the mean squared error (MSE) equation.
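The MSE equation itself appears only as an image on the original slide; its standard form (reconstructed from the definition, not copied from the slide) is:

```latex
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2
```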
© IBM 2020
Linear Regression – Fitting Line
Our objective is to find a line where the mean of all these errors is minimized.
In other words, the mean error of the prediction using the fit line should be minimized.
Let’s re-word it more technically.
The objective of linear regression is to minimize this MSE equation, and to minimize it, we should find the best
parameters, θ0 and θ1.
© IBM 2020
Linear Regression – Fitting Line
Actually, we have two options here: we can use a mathematical approach, or we can use an optimization approach.
Notice that, for the mathematical approach, all of the data must be available to traverse in order to calculate the parameters. It can be shown that the intercept and slope can be calculated using these equations. We can start off by estimating the value for θ1.
As mentioned before, θ0 and θ1, in the simple linear regression, are the coefficients of the fit line.
We can use a simple equation to estimate these coefficients. That is, given that it’s a simple linear regression,
with only 2 parameters, and knowing that θ0 and θ1 are the intercept and slope of the line, we can estimate
them directly from our data.
It requires that we calculate the mean of the independent and dependent or target columns, from the dataset.
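The equations referred to here are shown only as an image on the original slide; the standard closed-form estimates for simple linear regression (a well-known result, not copied from the slide) are:

```latex
\theta_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2},
\qquad
\theta_0 = \bar{y} - \theta_1\,\bar{x}
```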
© IBM 2020
Linear Regression – Fitting Line
This is how you can find the slope of a line based on the data.
x̄ is the average value for the engine size in our dataset.
Please consider that we have 9 rows here, row 0 to 8.
First, we calculate the average of x1 and average of y. Then we plug it into the slope equation, to find θ1.
© IBM 2020
Linear Regression – Fitting Line
The xi and yi in the equation refer to the fact that we need to repeat these calculations across all values in our
dataset and i refers to the i’th value of x or y.
© IBM 2020
Linear Regression – Fitting Line
As a side note, you really don’t need to remember the formula for calculating these parameters, as most of
the libraries used for machine learning in Python, R, and Scala can easily find these parameters for you.
But it's always good to understand how it works. Now, we can write down the polynomial of the line.
© IBM 2020
Linear Regression – Fitting Line
So, we know how to find the best fit for our data, and its equation.
Now the question is: "How can we use it to predict the emission of a new car based on its engine size?"
After we found the parameters of the linear equation, making predictions is as simple as solving the equation for a
specific set of inputs.
Imagine we are predicting Co2 Emission (y) from EngineSize (x) for the automobile in record number 9. Our linear regression model representation for this problem would be: ŷ = θ0 + θ1x1.
© IBM 2020
Or if we map it to our dataset, it would be Co2Emission = θ0 + θ1 EngineSize.
As we saw, we can find θ0, θ1 using the equations that we just talked about.
Once found, we can plug in the equation of the linear model.
For example, let’s use θ0=125 and θ1=39.
So, we can rewrite the linear model as Co2Emission = 125 + 39 × EngineSize.
© IBM 2020
Now, let’s plug in the 9th row of our dataset and calculate the Co2 Emission for a car with an EngineSize of 2.4. So
Co2Emission = 125 + 39 × 2.4.
Therefore, we can predict that the Co2 Emission for this specific car would be 218.6.
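A minimal scikit-learn sketch of the same idea is shown below. The training numbers are hypothetical, so the fitted coefficients will not reproduce θ0 = 125 and θ1 = 39 exactly:

```python
import numpy as np
from sklearn import linear_model

# Hypothetical training data: engine size (litres) vs Co2 emission (g/km)
engine_size = np.array([[1.6], [2.0], [2.4], [3.0], [3.5], [4.0]])
co2 = np.array([186, 203, 221, 242, 261, 280])

# Fit a simple linear regression: Co2Emission = theta0 + theta1 * EngineSize
regr = linear_model.LinearRegression()
regr.fit(engine_size, co2)
print("Intercept (theta0):", regr.intercept_)
print("Slope (theta1):", regr.coef_[0])

# Predict the emission of a car with an engine size of 2.4
print("Predicted Co2:", regr.predict([[2.4]])[0])
```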
© IBM 2020
Linear Regression
Let's talk a bit about why Linear Regression is so useful.
Quite simply, it is the most basic regression to use and understand. In fact, one reason why Linear Regression is so useful is that it's fast!
It also doesn't require tuning of parameters. So, something like tuning the K parameter in K-Nearest Neighbors, or the learning rate in Neural Networks, isn't something to worry about.
Linear Regression is also easy to understand and highly interpretable.
© IBM 2020
Multiple Linear Regression
As you know there are two types of linear regression models: simple regression and multiple regression.
Simple linear regression is when one independent variable is used to estimate a dependent variable. For
example, predicting Co2 emission using the variable of EngineSize.
In reality, there are multiple variables that predict the Co2 emission. When multiple independent variables
are present, the process is called "multiple linear regression." For example, predicting Co2 emission using
EngineSize and the number of Cylinders in the car’s engine.
The good thing is that multiple linear regression is the extension of the simple linear regression model.
Before we dive into a sample dataset and see how multiple linear regression works, I want to tell you what kind
of problems it can solve; when we should use it; and, specifically, what kind of questions we can answer using it.
© IBM 2020
Multiple Linear Regression
Basically, there are two applications for multiple linear regression.
First, it can be used when we would like to identify the strength of the effect that the independent variables have on a dependent variable.
For example, do revision time, test anxiety, lecture attendance, and gender have any effect on the exam performance of students?
Second, it can be used to predict the impact of changes. That is, to understand how the dependent variable changes when we change the independent variables.
For example, if we were reviewing a person's health data, a multiple linear regression can tell you how much that person's blood pressure goes up (or down) for every unit increase (or decrease) in the patient's body mass index (BMI), holding other factors constant.
© IBM 2020
Multiple Linear Regression
As is the case with simple linear regression, multiple linear regression is a method of predicting a continuous
variable.
It uses multiple variables, called independent variables, or predictors, that best predict the value of the target
variable, which is also called the dependent variable.
In multiple linear regression, the target value, y, is a linear combination of independent variables, x.
For example, you can predict how much Co2 a car might emit due to independent variables, such as the car’s
Engine Size, Number of Cylinders and Fuel Consumption.
© IBM 2020
Multiple Linear Regression
Multiple linear regression is very useful because you can examine which variables are significant predictors of
the outcome variable.
Also, you can find out how each feature impacts the outcome variable.
And again, as is the case in simple linear regression, if you manage to build such a regression model, you can
use it to predict the emission amount of an unknown case, such as record number 9.
© IBM 2020
Multiple Linear Regression
Generally, the model is of the form: ŷ = θ0 + θ1x1 + θ2x2 + … + θnxn.
Mathematically, we can show it in vector form as well.
This means it can be shown as a dot product of two vectors: the parameters vector and the feature set vector.
Generally, we can show the equation for a multi-dimensional space as θ^T x, where θ is an n-by-one vector of unknown parameters in a multi-dimensional space, and x is the vector of the feature set; θ is a vector of coefficients, and it is supposed to be multiplied by x.
© IBM 2020
Multiple Linear Regression
Conventionally, it is shown as θ transpose. θ is also called the parameter, or weight, vector of the regression equation; both these terms can be used interchangeably. And x is the feature set, which represents a car. For example, x1 for engine size, or x2 for cylinders, and so on.
The first element of the feature set is set to 1, because it turns θ0 into the intercept or bias parameter when the feature vector is multiplied by the parameter vector.
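Putting the last two slides together in standard notation (reconstructed, not copied from the slides), with x0 fixed to 1 so that θ0 acts as the intercept:

```latex
\hat{y} \;=\; \theta^{T}x \;=\; \theta_0 x_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n,
\qquad x_0 = 1
```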
© IBM 2020
Multiple Linear Regression
Please notice that θ^T x in a one-dimensional space is the equation of a line. It is what we use in simple linear regression. In higher dimensions, when we have more than one input (or x), the line is called a plane or a hyper-plane.
And this is what we use for multiple linear regression.
So, the whole idea is to find the best fit hyper-plane for our data.
To this end, and as is the case in linear regression, we should estimate the values
for θ vector that best predict the value of the target field in each row.
To achieve this goal, we have to minimize the error of the prediction.
Now, the question is, "How do we find the optimized parameters?"
© IBM 2020
Optimize Parameters
To find the optimized parameters for our model, we should first understand what the optimized parameters are.
Then we will find a way to optimize the parameters.
In short, optimized parameters are the ones which lead to a model with the fewest errors.
Let's assume, for a moment, that we have already found the parameter vector of our model. It means we already know the values of the θ vector.
Now, we can use the model, and the feature set of the first row of our dataset, to predict the Co2 emission for the first car, correct?
© IBM 2020
Optimize Parameters
If we plug the feature set values into the model equation, we find ŷ.
Let’s say, for example, it returns 140 as the predicted value for this specific row.
What is the actual value? y=196.
How different is the predicted value from the actual value of 196? Well, we can calculate it quite simply, as
196-140, which of course = 56.
© IBM 2020
Optimize Parameters
This is the error of our model, only for one row, or one car, in our case.
As is the case in linear regression, we can say the error here is the distance from the data point to the fitted
regression model.
The mean of all residual errors shows how badly the model represents the dataset. It is called the mean squared error, or MSE.
Mathematically, MSE can be shown by an equation. While this is not the only way to express the error of a multiple linear regression model, it is one of the most popular ways to do so.
The best model for our dataset is the one with minimum error for all prediction values.
© IBM 2020
Optimize Parameters
So, the objective of multiple linear regression is to minimize the MSE equation.
To minimize it, we should find the best parameters θ, but how?
Okay, "How do we find the parameters or coefficients for multiple linear regression?"
However, the most common methods are the ordinary least squares and optimization approaches.
Ordinary least squares tries to estimate the values of the coefficients by minimizing the "Mean Square Error."
This approach uses the data as a matrix and uses linear algebra operations to estimate the optimal values for the theta.
© IBM 2020
Optimize Parameters
The problem with this technique is the time complexity of calculating matrix operations, as it can take a very
long time to finish.
When the number of rows in your dataset is less than 10,000, you can think of this technique as an option; however, for greater values, you should try other, faster approaches.
© IBM 2020
Optimize Parameters
The second option is to use an optimization algorithm to find the best parameters.
That is, you can use a process of optimizing the values of the coefficients by iteratively minimizing the
error of the model on your training data.
For example, you can use Gradient Descent, which starts optimization with random values for each
coefficient.
Then, it calculates the errors and tries to minimize them by wisely changing the coefficients over multiple iterations.
Gradient descent is a proper approach if you have a large dataset.
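A hedged, minimal sketch of gradient descent for linear regression follows (this is not the course's lab code; the data, learning rate, and iteration count are made up):

```python
import numpy as np

# Hypothetical data: X includes a leading column of 1s so theta[0] acts as the intercept
X = np.array([[1.0, 1.6], [1.0, 2.0], [1.0, 2.4], [1.0, 3.0], [1.0, 3.5]])
y = np.array([186.0, 203.0, 221.0, 242.0, 261.0])

theta = np.zeros(X.shape[1])  # start with initial values for each coefficient
learning_rate = 0.01

for _ in range(20000):
    y_hat = X @ theta                        # predictions with the current coefficients
    error = y_hat - y                        # residuals
    gradient = (2.0 / len(y)) * X.T @ error  # gradient of the MSE with respect to theta
    theta -= learning_rate * gradient        # move the coefficients against the gradient

print("theta0 (intercept):", theta[0])
print("theta1 (slope):", theta[1])
```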
© IBM 2020
Optimize Parameters
Please understand, however, that there are other approaches to estimate the parameters of the multiple linear
regression that you can explore on your own.
After you find the best parameters for your model, you can go to the prediction phase.
After we found the parameters of the linear equation, making predictions is as simple as solving the equation
for a specific set of inputs. Imagine we are predicting Co2 emission (or y) from other variables for the
automobile in record number 9.
© IBM 2020
Optimize Parameters
Our linear regression model representation for this problem would be: ŷ = θ^T x. Once we find the parameters, we can plug them into the equation of the linear model. For example, let's use θ0 = 125, θ1 = 6.2, θ2 = 14, and so on.
If we map it to our dataset, we can rewrite the linear model as "Co2Emission=125 plus 6.2 multiplied by
EngineSize plus 14 multiplied by Cylinder," and so on.
As you can see, multiple linear regression estimates the relative importance of predictors.
© IBM 2020
For example, it shows that Cylinders has a higher impact on Co2 emission amounts in comparison with EngineSize.
Now, let’s plug in the 9th row of our dataset and calculate the Co2 emission for a car with the EngineSize of
2.4.
So Co2Emission=125 + 6.2 × 2.4 + 14 × 4 … and so on.
We can predict the Co2 emission for this specific car would be 214.1.
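A minimal multiple linear regression sketch in scikit-learn (hypothetical data; the fitted coefficients will not match the θ values quoted above):

```python
import numpy as np
from sklearn import linear_model

# Hypothetical training data: [EngineSize, Cylinders, FuelConsumption]
X = np.array([
    [2.0, 4, 8.5], [2.4, 4, 9.6], [3.0, 6, 11.1],
    [3.5, 6, 12.0], [4.0, 8, 13.4], [5.0, 8, 15.0],
])
y = np.array([196, 221, 255, 271, 296, 330])  # Co2 emissions (g/km)

regr = linear_model.LinearRegression()
regr.fit(X, y)

# The coefficients indicate the relative importance of each predictor
print("Intercept:", regr.intercept_)
print("Coefficients:", regr.coef_)

# Predict the emission of a new car: engine size 2.4, 4 cylinders, fuel consumption 9.6
print("Prediction:", regr.predict([[2.4, 4, 9.6]])[0])
```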
© IBM 2020
Multiple Linear
Regression Concerns
Now let me address some concerns that you might already be having regarding multiple
linear regression.
As you saw, you can use multiple independent variables to predict a target value in multiple
linear regression.
It sometimes results in a better model compared to using a simple linear regression, which
uses only one independent variable to predict the dependent variable.
Now, the question is, "How many independent variables should we use for the prediction?”
Should we use all the fields in our dataset? Does adding independent variables to a multiple
linear regression model always increase the accuracy of the model?
© IBM 2020
Multiple Linear Regression Concerns
Basically, adding too many independent variables without any theoretical justification may result in an over-
fit model.
An over-fit model is a real problem because it is too complicated for your data set and not general enough to
be used for prediction.
So, it is recommended to avoid using many variables for prediction.
There are different ways to avoid overfitting a model in regression.
© IBM 2020
Multiple Linear Regression Concerns
As a last point, remember that “multiple linear regression” is a specific type of linear regression.
So, there needs to be a linear relationship between the dependent variable and each of your independent
variables.
There are a number of ways to check for linear relationship.
For example, you can use scatterplots, and then visually check for linearity.
If the relationship displayed in your scatterplot is not linear, then, you need to use non-linear regression.
© IBM 2020
Model Evaluation
The goal of regression is to build a model to
accurately predict an unknown case.
© IBM 2020
Model Accuracy
Let's look at the first approach.
When considering evaluation models, we clearly want to choose the one that will give us the most accurate results.
So, the question is, how can we calculate the accuracy of our model?
In other words, how much can we trust this model for the prediction of an unknown sample, using a given dataset and having built a model such as linear regression?
© IBM 2020
Model Accuracy
© IBM 2020
Model Accuracy
Now, we select a small portion of the dataset, such as row numbers 6 to 9, but without the labels.
This set is called a test set: it has the labels, but the labels are not used for prediction; they are used only as ground truth.
© IBM 2020
Model Accuracy
© IBM 2020
Model Accuracy
The error of the model is calculated as the average difference between the predicted and actual values for all the
rows.
We can write this error as an equation.
So, the first evaluation approach we just talked about is the simplest one: train and test on the SAME dataset.
© IBM 2020
Model Accuracy
Essentially, the name of this approach says it all: you train the model on the entire dataset, then you test it using a portion of the same dataset.
© IBM 2020
Model Accuracy
© IBM 2020
Out of Sample Accuracy
Out-of-sample accuracy is the percentage of correct predictions that the model makes on data that the model has NOT been trained on.
Doing a "train and test" on the same dataset will most likely have low out-of-sample accuracy due to the likelihood of being over-fit.
It's important that our models have high out-of-sample accuracy, because the purpose of our model is, of course, to make correct predictions on unknown data.
So, how can we improve out-of-sample accuracy?
© IBM 2020
Out of Sample Accuracy
One way is to use another evaluation approach called "Train/Test Split."
© IBM 2020
Out of Sample Accuracy
In this approach, we select a portion of our dataset for training, for example, rows 0 to 5. And the rest is used
for testing, for example, rows 6 to 9.
© IBM 2020
Out of Sample Accuracy
© IBM 2020
Out of Sample Accuracy
Train/Test Split involves splitting the dataset into training and testing sets, respectively, which are mutually
exclusive, after which, you train with the training set and test with the testing set.
This will provide a more accurate evaluation of out-of-sample accuracy because the testing dataset is NOT part of the dataset that has been used to train the model.
It is more realistic for real world problems.
This means that we know the outcome of each data point in this dataset, making it great to test with!
And since this data has not been used to train the model, the model has no knowledge of the outcome of these
data points.
So, in essence, it’s truly out-of-sample testing.
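A hedged sketch of this evaluation approach with scikit-learn (the data is hypothetical):

```python
import numpy as np
from sklearn import linear_model
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Hypothetical engine sizes and Co2 emissions
X = np.array([[1.6], [2.0], [2.4], [3.0], [3.5], [4.0], [4.7], [5.0], [5.7], [6.2]])
y = np.array([186, 203, 221, 242, 261, 280, 301, 310, 332, 347])

# Mutually exclusive train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

regr = linear_model.LinearRegression().fit(X_train, y_train)

# Evaluate on data the model has never seen: an estimate of out-of-sample accuracy
print("R-squared on the test set:", r2_score(y_test, regr.predict(X_test)))
```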
© IBM 2020
Out of Sample Accuracy
The issue with train/test split is that it’s highly dependent on the datasets on which the data was trained and
tested.
The variation of this causes train/test split to have a better out-of-sample prediction than training and testing
on the same dataset, but it still has some problems due to this dependency.
© IBM 2020
K-Fold Cross-validation
Another evaluation model, called "K-Fold Cross-validation," resolves most of these issues.
How do you fix a high variation that results from a dependency?
Well, you average it.
Let me explain the basic concept of “k-fold cross-validation” to see how we can solve this problem.
The entire dataset is represented by the points in the image at the top left. If we have k=4 folds, then we
split up this dataset as shown here.
© IBM 2020
K-Fold Cross-validation
In the first fold, for example, we use the first 25 percent of the dataset for testing, and the rest for training.
© IBM 2020
K-Fold Cross-validation
© IBM 2020
K-Fold Cross-validation
K-fold cross-validation, in its simplest form, performs multiple train/test splits
using the same dataset where each split is different.
Then, the result is averaged to produce a more consistent out-of-sample
accuracy.
We wanted to show you an evaluation model that addressed some of the
issues we’ve described in the previous approaches.
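A minimal k-fold cross-validation sketch with scikit-learn (hypothetical data, k = 4):

```python
import numpy as np
from sklearn import linear_model
from sklearn.model_selection import cross_val_score

X = np.array([[1.6], [2.0], [2.4], [3.0], [3.5], [4.0], [4.7], [5.0]])
y = np.array([186, 203, 221, 242, 261, 280, 301, 310])

regr = linear_model.LinearRegression()

# Four folds: each fold is used once for testing while the rest is used for training
scores = cross_val_score(regr, X, y, cv=4)

print("Per-fold R-squared scores:", scores)
print("Averaged out-of-sample accuracy:", scores.mean())
```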
© IBM 2020
EVALUATION METRICS IN REGRESSION
© IBM 2020
Defining Error
But, before we get into defining these, we need to define what an error actually is.
In the context of regression, the error of the model is the difference between the data points and the trend line
generated by the algorithm.
Since there are multiple data points, an error can be determined in multiple ways.
© IBM 2020
Defining Error
Mean absolute error is the mean of the absolute value of the errors.
This is the easiest of the metrics to understand, since it’s just the average error.
© IBM 2020
Defining Error
Root Mean Squared Error (RMSE) is the square root of the mean squared error.
This is one of the most popular of the evaluation metrics because Root Mean Squared Error is interpretable in the same units as the
response vector (or ‘y’ units) making it easy to relate its information.
Relative Absolute Error (RAE) takes the total absolute error and normalizes it by dividing by the total absolute error of the simple predictor (one that always predicts ȳ, the mean value of y).
© IBM 2020
Defining Error
Relative Squared Error (RSE) is very similar to “Relative absolute error “, but is widely adopted by the data
science community, as it is used for calculating R-squared.
R-squared is not error, per se, but is a popular metric for the accuracy of your model.
It represents how close the data values are to the fitted regression line.
The higher the R-squared, the better the model fits your data.
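For reference, the standard formulas behind these metrics (reconstructed from their definitions, not taken from the slides):

```latex
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\lvert y_i - \hat{y}_i\rvert,\qquad
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2,\qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}}
```

```latex
\mathrm{RAE} = \frac{\sum_{i=1}^{n}\lvert y_i - \hat{y}_i\rvert}{\sum_{i=1}^{n}\lvert y_i - \bar{y}\rvert},\qquad
\mathrm{RSE} = \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2},\qquad
R^2 = 1 - \mathrm{RSE}
```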
© IBM 2020
Non Linear Regression
These data points correspond to China's Gross Domestic Product (or GDP) from 1960 to 2014.
The first column is the year, and the second is China's corresponding annual gross domestic income in US dollars for that year.
© IBM 2020
Non Linear Regression
This is what the data points look like. Now, we have a couple of interesting questions.
First, “Can GDP be predicted based on time?”
And second, “Can we use a simple linear regression to model it?”
© IBM 2020
Non Linear Regression
Indeed, if the data shows a curvy trend, then linear regression will not produce very accurate results
when compared to a non-linear regression -- simply because, as the name implies, linear regression
presumes that the data is linear.
The scatterplot shows that there seems to be a strong relationship between GDP and time, but the
relationship is not linear.
As you can see, the growth starts off slowly, then from 2005 onward, the growth is very significant. And
finally, it decelerates slightly in the 2010s.
It kind of looks like either a logistic or an exponential function. So, it requires a special estimation method of the non-linear regression procedure.
© IBM 2020
Non Linear Regression
For example, if we assume that the model for these data points is an exponential function, such as ŷ = θ0 + θ1·(θ2)^x, our job is to estimate the parameters of the model, i.e. the θs, and use the fitted model to predict GDP for unknown or future cases.
In fact, many different regressions exist that can be used to fit whatever the dataset looks like.
You can see quadratic and cubic regression lines here, and it can go on and on to infinite degrees.
© IBM 2020
Polynomial Regression
In essence, we can call all of these "polynomial regression," where the relationship between the independent
variable x and the dependent variable y is modelled as an nth degree polynomial in x.
With many types of regression to choose from, there’s a good chance that one will fit your dataset well.
Remember, it’s important to pick a regression that fits the data the best.
© IBM 2020
Polynomial Regression
So, what is polynomial regression? Polynomial regression fits a curved line to your data.
A simple example of a polynomial with degree 3 is ŷ = θ0 + θ1x + θ2x^2 + θ3x^3, where the θs are parameters to be estimated so that the model fits the underlying data well.
© IBM 2020
Non Linear and Polynomial Regression
Though the relationship between x and y is non-linear here, and polynomial regression can fit them, a
polynomial regression model can still be expressed as linear regression.
I know it's a bit confusing, but let’s look at an example.
Given the third-degree polynomial equation, by defining x1 = x and x2 = x^2 (x to the power of 2) and so on, the model is converted to a linear regression with new variables: ŷ = θ0 + θ1x1 + θ2x2 + θ3x3.
This model is linear in the parameters to be estimated, right?
Therefore, this polynomial regression is considered to be a special case of traditional multiple linear
regression.
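A hedged sketch of exactly this trick in scikit-learn — expand x into polynomial features, then fit an ordinary linear regression on the new variables (the data is made up):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical data with a non-linear (roughly cubic) trend
x = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([2.1, 9.8, 28.5, 66.0, 127.3, 218.0])

# Create the new variables x1 = x, x2 = x^2, x3 = x^3 (plus the bias column)
poly = PolynomialFeatures(degree=3)
x_poly = poly.fit_transform(x)

# The problem is now a linear regression in the transformed variables
model = LinearRegression().fit(x_poly, y)
print("Coefficients (theta):", model.coef_)
print("Prediction at x = 4.5:", model.predict(poly.transform([[4.5]]))[0])
```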
© IBM 2020
Non Linear Regression
© IBM 2020
Non Linear Regression
When it comes to non-linear equations, they can take the shape of an exponential, a logarithm, a logistic function, or many other types.
As you can see, in all of these equations, the change of ŷ depends on changes in the parameters θ, not necessarily on x only.
© IBM 2020
Non Linear Regression
© IBM 2020
Non Linear Regression
The second important question is, "How should I model my data if it looks non-linear on a scatter plot?"
Well, to address this, you have to use either polynomial regression, a non-linear regression model, or "transform" your data.
© IBM 2020
Lab
© IBM 2020