Machine Learning Concepts


MACHINE LEARNING CONCEPTS

TOPICS TO BE COVERED
What is Machine Learning?
Basic types of Machine Learning
Machine Learning Models
Regression vs. Classification
Supervised approaches:
Linear Regression
Naïve Bayes Classifier
Unsupervised approach:
Clustering approach: K-Means
Overview of the reinforcement approach
Splitting the dataset:
Training
Testing
Validation
Overfitting and Underfitting
MACHINE LEARNING

• Artificial Intelligence is a scientific field concerned with the development of algorithms that allow computers to learn without being explicitly programmed
• Machine Learning is a branch of Artificial Intelligence, which focuses on methods that learn from data and make predictions on unseen data

[Diagram: training phase, labeled data feeds a machine learning algorithm to produce a learned model; prediction phase, data feeds the learned model to produce a prediction]
BASIC TYPES OF MACHINE LEARNING
• Supervised: learning with labeled data
– Example: email classification, image classification
– Example: regression for predicting real-valued outputs
• Unsupervised: discover patterns in unlabeled data
– Example: cluster similar data points
• Reinforcement learning: learn to act based on feedback/reward
– Example: learn to play Go

[Diagram: classification (separating class A from class B), regression (fitting a line to points), and clustering (grouping unlabeled points)]
Supervised Learning

Regression aims to predict the value of a dependent variable based on the values of independent variables. The goal is to minimize the difference between the predicted and actual values.

Classification aims to assign an input to one of several predefined classes or categories based on its features. The goal is to correctly classify instances into their respective categories.
MACHINE LEARNING METHODS IN EACH TYPE

• Supervised learning categories and techniques


– Numerical classifier functions
• Linear classifier, perceptron, logistic regression, support vector machines (SVM), neural networks (a perceptron sketch follows this list)
– Parametric (probabilistic) functions
• Naïve Bayes, Gaussian discriminant analysis (GDA), hidden Markov models (HMM), probabilistic graphical models
– Non-parametric (instance-based) functions
• k-nearest neighbors, kernel regression, kernel density estimation, local
regression
– Symbolic functions
• Decision trees, classification and regression trees (CART)
– Aggregation (ensemble) learning
• Bagging, boosting (Adaboost), random forest
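As a concrete instance of one technique from the list above, here is a minimal Python sketch of the perceptron update rule; the toy data and learning rate are illustrative, not from the slides.

```python
import numpy as np

def perceptron_train(X, y, epochs=20, lr=1.0):
    """Minimal perceptron: labels y must be in {-1, +1}; X is (n_samples, n_features)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (np.dot(w, xi) + b) <= 0:  # point is misclassified (or on the boundary)
                w += lr * yi * xi              # nudge the decision boundary toward it
                b += lr * yi
    return w, b

# Toy linearly separable data (illustrative)
X = np.array([[2.0, 2.0], [1.0, 3.0], [4.0, 5.0], [5.0, 4.0]])
y = np.array([-1, -1, 1, 1])
w, b = perceptron_train(X, y)
print(np.sign(X @ w + b))  # should reproduce y
```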
MACHINE LEARNING METHODS IN EACH TYPE

• Unsupervised learning categories and techniques


– Clustering
• k-means clustering
• Mean-shift clustering
• Spectral clustering
– Density estimation
• Gaussian mixture model (GMM)
• Graphical models
– Dimensionality reduction
• Principal component analysis (PCA)
• Factor analysis
MACHINE LEARNING METHODS IN EACH TYPE

Reinforcement learning categories and techniques


Q-Learning: learns optimal action choices by updating Q-values iteratively (see the sketch after this list).
Deep Q-Networks (DQN): use deep neural networks to handle high-dimensional state spaces in Q-learning.
Policy Gradient Methods: directly learn policies through gradient ascent on the expected return.
Deep Deterministic Policy Gradient (DDPG): extends DQN to continuous action spaces with deterministic policies.
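To make the iterative Q-value update concrete, below is a minimal tabular Q-learning sketch on a tiny chain environment invented for illustration; the step function and all constants are assumptions, not from the slides.

```python
import numpy as np

# Hypothetical chain environment: states 0..4, actions 0 (left) / 1 (right);
# reaching state 4 yields reward +1 and ends the episode.
N_STATES, N_ACTIONS = 5, 2

def step(state, action):
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward, next_state == N_STATES - 1

Q = np.zeros((N_STATES, N_ACTIONS))
alpha, gamma, epsilon = 0.1, 0.9, 0.1  # learning rate, discount factor, exploration rate

for episode in range(500):
    s, done = 0, False
    while not done:
        # epsilon-greedy action selection
        a = np.random.randint(N_ACTIONS) if np.random.rand() < epsilon else int(np.argmax(Q[s]))
        s2, r, done = step(s, a)
        # Iterative Q-value update: move Q(s, a) toward r + gamma * max_a' Q(s', a')
        Q[s, a] += alpha * (r + gamma * np.max(Q[s2]) * (not done) - Q[s, a])
        s = s2

print(np.argmax(Q, axis=1))  # greedy policy: non-terminal states should prefer action 1 (right)
```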
CLASSIFIER: LINEAR VS. NON-LINEAR

For some tasks, the input data are linearly separable, and linear classifiers can be suitably applied.

For other tasks, linear classifiers may have difficulty producing adequate decision boundaries.
LINEAR VS NON-LINEAR TECHNIQUES
Linear classification techniques
Linear classifier
Perceptron
Logistic regression
Linear SVM
Naïve Bayes
Non-linear classification techniques
k-nearest neighbors
Non-linear SVM
Neural networks
Decision trees
Random forest
REGRESSION MODEL

LEAST SQUARES REGRESSION CLASSIFIER


SUPERVISED LEARNING APPROACH:
LINEAR REGRESSION
Regression is a statistical method used to examine the relationship between
one dependent variable and one or more independent variables. It aims to
understand how the value of the dependent variable changes when one or
more independent variables are varied.

The basic idea behind regression is to find the best-fitting line (or
curve) that describes the relationship between the variables.
RECALL: COVARIANCE

$$\mathrm{cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{X})(y_i - \bar{Y})}{n - 1}$$
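As a quick numerical check of the formula, here is a short Python sketch (NumPy assumed); the x and y values are illustrative.

```python
import numpy as np

x = np.array([2.0, 3.0, 5.0, 7.0, 9.0])   # illustrative sample
y = np.array([4.0, 5.0, 7.0, 10.0, 15.0])

# Sample covariance with the n - 1 denominator, exactly as in the formula above
cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)
print(cov_xy)  # agrees with np.cov(x, y)[0, 1]
```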
INTERPRETING COVARIANCE

cov(X,Y) > 0: X and Y are positively correlated

cov(X,Y) < 0: X and Y are negatively correlated

cov(X,Y) = 0: X and Y are uncorrelated (zero covariance does not by itself imply independence)


CORRELATION COEFFICIENT

Pearson's correlation coefficient is standardized covariance (unitless):

$$r = \frac{\mathrm{cov}(x, y)}{\sqrt{\mathrm{var}(x)\,\mathrm{var}(y)}}$$
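Continuing the covariance sketch above, a few lines compute Pearson's r from the covariance and variances and check it against NumPy's built-in np.corrcoef:

```python
import numpy as np

x = np.array([2.0, 3.0, 5.0, 7.0, 9.0])
y = np.array([4.0, 5.0, 7.0, 10.0, 15.0])

# r = cov(x, y) / sqrt(var(x) * var(y)), all with the sample (n - 1) convention
cov_xy = np.cov(x, y)[0, 1]
r = cov_xy / np.sqrt(x.var(ddof=1) * y.var(ddof=1))
print(r, np.corrcoef(x, y)[0, 1])  # both approx 0.98: a strong positive linear relationship
```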
CORRELATION

Measures the relative strength of the linear relationship between two variables
Unit-less
Ranges between -1 and 1
The closer to -1, the stronger the negative linear relationship
The closer to +1, the stronger the positive linear relationship
The closer to 0, the weaker any linear relationship
SCATTER PLOTS OF DATA WITH VARIOUS
CORRELATION COEFFICIENTS
[Scatter plots of Y against X: perfect negative (r = -1), moderate negative (r = -0.6), no linear relationship (r = 0), perfect positive (r = +1), weak positive (r = +0.3), and a nonlinear pattern with r = 0]
LINEAR CORRELATION

[Plots contrasting linear relationships with curvilinear relationships between X and Y]
ASSUMPTIONS
Linear regression assumes that…
1. The relationship between X and Y is linear
2. Y is distributed normally at each value of X
3. The variance of Y at every value of X is the same (homogeneity
of variances)
4. The observations are independent
PREDICTION
If you know something about X, this knowledge
helps you predict something about Y.

Regression equation: the expected value of y at a given level of x is

$$E(y_i \mid x_i) = \alpha + \beta x_i$$
EXAMPLE: LEAST SQUARES REGRESSION
MODEL
NUMERICAL EXAMPLE
Example: Sam recorded how many hours of sunshine vs. how many ice creams were sold at the shop from Monday to Friday:

Hours of Sunshine (x) | Ice Creams Sold (y)
2 | 4
3 | 5
5 | 7
7 | 10
9 | 15
12 | ?

First: let us find the best m (slope) and b (y-intercept) for the line y = mx + b.
Sum x, y, x² and xy over the five observed days (Σx = 26, Σy = 41, Σx² = 168, Σxy = 263), then:

$$m = \frac{n\,\Sigma xy - \Sigma x\,\Sigma y}{n\,\Sigma x^2 - (\Sigma x)^2} = \frac{5(263) - 26(41)}{5(168) - 26^2} \approx 1.518, \qquad b = \frac{\Sigma y - m\,\Sigma x}{n} \approx 0.305$$
As a prediction model, for hour 12:
y = 1.518 × 12 + 0.305 ≈ 18.52 (approx.)
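The calculation above can be reproduced with a few lines of Python (NumPy assumed), using the closed-form least-squares slope and intercept:

```python
import numpy as np

x = np.array([2.0, 3.0, 5.0, 7.0, 9.0])    # hours of sunshine
y = np.array([4.0, 5.0, 7.0, 10.0, 15.0])  # ice creams sold

n = len(x)
# Closed-form least squares: m = (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2), b = (Sy - m*Sx) / n
m = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x ** 2) - np.sum(x) ** 2)
b = (np.sum(y) - m * np.sum(x)) / n
print(m, b)        # approx 1.518 and 0.305, matching the slide
print(m * 12 + b)  # prediction for 12 hours of sunshine: approx 18.52
```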

AS CLASSIFICATION MODEL
Least squares works by making the total of the squared errors as small as possible (that is why it is called "least squares").

So we find the best fit by varying the slope and intercept of the line.
SUPERVISED MODEL
NAÏVE BAYES CLASSIFIER
Numerical Example
Training data (categorical attributes: Refund, Marital Status; continuous attribute: Taxable Income; class: Evade):

Tid | Refund | Marital Status | Taxable Income | Evade
1 | Yes | Single | 125K | No
2 | No | Married | 100K | No
3 | No | Single | 70K | No
4 | Yes | Married | 120K | No
5 | No | Divorced | 95K | Yes
6 | No | Married | 60K | No
7 | Yes | Divorced | 220K | No
8 | No | Single | 85K | Yes
9 | No | Married | 75K | No
10 | No | Single | 90K | Yes

Global/total probability: P(No) = 7/10, P(Yes) = 3/10
Example conditionals: P(Status=Married | No) = 4/7, P(Refund=Yes | Yes) = 0
HOW TO ESTIMATE PROBABILITIES FROM DATA?

• For a continuous attribute, assume a normal distribution, one for each (Ai, cj) pair:

$$P(A_i \mid c_j) = \frac{1}{\sqrt{2\pi\sigma_{ij}^2}} \exp\!\left(-\frac{(A_i - \mu_{ij})^2}{2\sigma_{ij}^2}\right)$$

• For (Income, Class=No), from the table above:
– sample mean = 110
– sample variance = 2975

$$P(\text{Income} = 120 \mid \text{No}) = \frac{1}{\sqrt{2\pi}\,(54.54)}\, e^{-\frac{(120 - 110)^2}{2(2975)}} = 0.0072$$
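As a quick check of this density calculation, here is a minimal Python sketch using only the standard library:

```python
import math

def gaussian_likelihood(value, mean, variance):
    """Normal density, used for continuous attributes in Naive Bayes."""
    return math.exp(-(value - mean) ** 2 / (2 * variance)) / math.sqrt(2 * math.pi * variance)

# Income = 120 under Class = No: sample mean 110, sample variance 2975 (from the table)
print(gaussian_likelihood(120, 110, 2975))  # approx 0.0072
```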
Example of Naïve Bayes Classifier
Given a test record:

X = (Refund = No, Married, Income = 120K)

Naïve Bayes conditionals estimated from the table:
P(Refund=Yes | No) = 3/7, P(Refund=No | No) = 4/7
P(Refund=Yes | Yes) = 0, P(Refund=No | Yes) = 1
P(Marital Status=Single | No) = 2/7, P(Divorced | No) = 1/7, P(Married | No) = 4/7
P(Marital Status=Single | Yes) = 2/7, P(Divorced | Yes) = 1/7, P(Married | Yes) = 0
For taxable income: if class=No, sample mean = 110 and sample variance = 2975; if class=Yes, sample mean = 90 and sample variance = 25

P(X | Class=No) = P(Refund=No | No) × P(Married | No) × P(Income=120K | No)
= 4/7 × 4/7 × 0.0072 = 0.0024

P(X | Class=Yes) = P(Refund=No | Yes) × P(Married | Yes) × P(Income=120K | Yes)
= 1 × 0 × 1.2×10⁻⁹ = 0

Since P(X | No)·P(No) > P(X | Yes)·P(Yes), it follows that P(No | X) > P(Yes | X) => Class = No
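The full classification can be reproduced with a minimal Python sketch; the priors, conditionals, and income parameters are copied from the worked example, and the helper mirrors the normal-density formula above.

```python
import math

def gaussian_likelihood(value, mean, variance):
    return math.exp(-(value - mean) ** 2 / (2 * variance)) / math.sqrt(2 * math.pi * variance)

# Priors and conditionals copied from the worked example above
priors = {"No": 7 / 10, "Yes": 3 / 10}
p_refund_no = {"No": 4 / 7, "Yes": 1.0}               # P(Refund=No | class)
p_married = {"No": 4 / 7, "Yes": 0.0}                 # P(Married | class)
income_params = {"No": (110, 2975), "Yes": (90, 25)}  # (mean, variance) of income

scores = {}
for c in ("No", "Yes"):
    mean, var = income_params[c]
    likelihood = p_refund_no[c] * p_married[c] * gaussian_likelihood(120, mean, var)
    scores[c] = likelihood * priors[c]                # proportional to P(c | X)

print(scores, "->", max(scores, key=scores.get))      # Class = No, as on the slide
```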
UNSUPERVISED MODEL:
K-MEANS CLUSTERING
CLUSTERING / SEGMENTATION
WHAT IS CLUSTER ANALYSIS?
Finding groups of objects such that the objects in a group will
be similar (or related) to one another and different from (or
unrelated to) the objects in other groups

Intra-cluster distances are minimized; inter-cluster distances are maximized.
NOTION OF A CLUSTER CAN BE AMBIGUOUS

[Diagram: the same set of points can plausibly be read as two, four, or six clusters; how many clusters?]
[Image example: clusters on color; K-means clustering using intensity alone and color alone]


NUMERICAL EXAMPLE
Let's consider the following data points:
P1(1,3) , P2(2,2) , P3(5,8) , P4(8,5) , P5(3,9) , P6(10,7) , P7(3,3) , P8(9,4) , P9(3,7).

Take K = 3 and assume the initial cluster centers are P7(3,3), P9(3,7), P8(9,4), labeled C1, C2, C3.

Steps (a code sketch follows this list):
1. Find the distance between each data point and the centers C1, C2, and C3.
2. Assign the data point to the nearest cluster using the minimum distance value.
3. Repeat this for all data points.
4. Update the centers by taking the average of the data points present in each cluster.
5. Repeat for N iterations or until convergence.
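Here is a minimal Python sketch (NumPy assumed) implementing the steps above on the nine points, with P7, P9, and P8 as the initial centers:

```python
import numpy as np

points = np.array([[1, 3], [2, 2], [5, 8], [8, 5], [3, 9],
                   [10, 7], [3, 3], [9, 4], [3, 7]], dtype=float)  # P1..P9
centers = points[[6, 8, 7]].copy()  # initial centers C1=P7, C2=P9, C3=P8

for _ in range(100):  # N iterations, with an early exit at convergence
    # Steps 1-3: assign every point to its nearest center (Euclidean distance)
    distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    labels = distances.argmin(axis=1)
    # Step 4: move each center to the mean of the points assigned to it
    new_centers = np.array([points[labels == k].mean(axis=0) for k in range(3)])
    if np.allclose(new_centers, centers):  # Step 5: stop once the centers stop moving
        break
    centers = new_centers

print(labels)   # cluster index of P1..P9
print(centers)  # final cluster centers
```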
UPDATE THE CENTER
Updated centers: repeat this process until convergence.
REINFORCEMENT LEARNING MODEL
REINFORCEMENT LEARNING

[Diagram: the reinforcement learning loop; the agent takes an action in the environment, and the environment returns a new state and a reward to the agent]
SETUP FOR REINFORCEMENT LEARNING
MARKOV DECISION PROCESS (ENVIRONMENT):
– Probability of moving to each state
– Reward for making that move

POLICY (AGENT'S BEHAVIOR):
– Value of being in that state
SIMPLE EXAMPLE OF AGENT IN AN
ENVIRONMENT

[Diagram: a 3x3 grid world with cells (0,0) through (2,2); one cell carries a reward of 100 and the rest 0, so the agent's score is 100 after reaching it]
POLICIES
Policy and policy evaluation:

[Diagram: a policy over the grid; move to <0,1>, then <1,1>, <1,0>, and <2,0>. Evaluating the policy assigns values such as 12.5, 50, and 100 to states, showing the policy could be better]
SPLITTING THE DATASET
Training Dataset:
The training dataset is the portion of the data used to train the model. It consists of
input-output pairs where the input is the data used to make predictions, and the
output is the corresponding target or label.
Testing Dataset:
The testing dataset is a separate portion of the data that is held out from the training
process. It is used to evaluate the performance of the trained model.
Validation Dataset:
The validation dataset is another independent portion of the data, used during model development to tune hyperparameters and compare candidate models before the final test.

[Diagram: the complete dataset is split into a training set and a validation set (training/development phase) and a testing set (testing phase)]
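Below is a minimal Python sketch of such a split; the 70/15/15 ratios and the split_dataset name are illustrative choices, not fixed by the slides.

```python
import numpy as np

def split_dataset(X, y, train=0.7, val=0.15, seed=0):
    """Shuffle and split into train / validation / test portions (illustrative ratios)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_train, n_val = int(train * len(X)), int(val * len(X))
    tr, va, te = idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
    return (X[tr], y[tr]), (X[va], y[va]), (X[te], y[te])

X = np.arange(20).reshape(10, 2)  # toy inputs
y = np.arange(10)                 # toy labels
(train_X, train_y), (val_X, val_y), (test_X, test_y) = split_dataset(X, y)
print(len(train_X), len(val_X), len(test_X))  # 7 1 2
```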
OVER FITTING AND UNDER FITTING
Overfitting:
It occurs when a model learns to fit the training data too closely, capturing
noise or random fluctuations in the data rather than the underlying pattern.
As a result, an overfitted model performs well on the training data but fails
to generalize to new, unseen data.

Underfitting:
It happens when a model is too simple to capture the underlying structure
of the data.
It fails to capture the patterns in the training data and also performs poorly
on new data.
BIAS AND VARIANCE VS. OVERFITTING AND
UNDERFITTING
Bias:
It is the error caused because the model cannot represent the concept.
Bias measures how much the average prediction of the model differs from
the true value it is trying to predict.
Informally, the difference between the training observations and the best-fit
line is the training error, which is used here as a proxy for bias.

Variance:
It is the error caused because the learning algorithm overreacts to small
changes (noise) in the training data.
Informally, the difference between the testing observations and the best-fit
line is the testing error, which is used here as a proxy for variance.

Total expected loss = Bias² + Variance (+ irreducible noise)


BIAS AND VARIANCE VS. OVERFITTING AND UNDERFITTING
Considering bias and variance together, there are four relationships:
High bias and high variance (worst case)
Low bias and low variance (best case)
Low bias and high variance (overfitting)
High bias and low variance (underfitting)
AVOIDING OVERFITTING AND UNDERFITTING

To avoid overfitting:
Cross-validation
Training with more data
Removing features
Early stopping the training
Regularization
Ensembling

To avoid underfitting:
Increasing the training time of the model
Increasing the number of features
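As one concrete remedy from the overfitting list, here is a minimal sketch of early stopping: gradient descent on synthetic linear data stops when the validation error stops improving. The data, learning rate, and patience threshold are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, 60)
y = 1.5 * X + 0.3 + rng.normal(0, 1.0, 60)  # noisy linear data (synthetic)
X_tr, y_tr, X_va, y_va = X[:40], y[:40], X[40:], y[40:]

m = b = 0.0
lr = 0.005
best = (np.inf, m, b)        # best validation error seen so far, with its parameters
patience, bad_epochs = 20, 0

for epoch in range(2000):
    # One gradient-descent step on the training mean squared error
    err = m * X_tr + b - y_tr
    m -= lr * 2 * np.mean(err * X_tr)
    b -= lr * 2 * np.mean(err)
    # Early stopping: track validation error, stop when it stops improving
    val_err = np.mean((m * X_va + b - y_va) ** 2)
    if val_err < best[0]:
        best, bad_epochs = (val_err, m, b), 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break

val_err, m, b = best
print(m, b, val_err)  # parameters from the epoch with the lowest validation error
```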
END
