Machine Learning Concepts


MACHINE LEARNING CONCEPTS

TOPICS TO BE COVERED
What is Machine Learning?
Basic types of Machine Learning
Machine Learning Models
Regression vs. Classification
Supervised approaches:
Linear Regression
Naïve Bayes Classifier
Unsupervised approach:
Clustering approach: K-Means
Overview of the reinforcement approach
Splitting the dataset:
Training
Testing
Validation
Overfitting and Underfitting
MACHINE LEARNING

• Artificial Intelligence is a scientific field concerned with the development of algorithms that allow computers to learn without being explicitly programmed
• Machine Learning is a branch of Artificial Intelligence, which focuses on methods that learn from data and make predictions on unseen data

[Diagram: training phase, labeled data feeds a machine learning algorithm to produce a learned model; prediction phase, data feeds the learned model to produce a prediction]
BASIC TYPES OF MACHINE LEARNING
• Supervised: learning with labeled data
– Example: email classification, image classification
– Example: regression for predicting real-valued outputs
• Unsupervised: discover patterns in unlabeled data
– Example: cluster similar data points
• Reinforcement learning: learn to act based on feedback/reward
– Example: learn to play Go

[Diagram: classification (separating class A from class B), regression (fitting a line to points), and clustering (grouping unlabeled points)]
Supervised Learning

Regression aims to predict the value of a dependent variable based on the values of independent variables. The goal is to minimize the difference between the predicted and actual values.

Classification aims to assign an input to one of several predefined classes or categories based on its features. The goal is to correctly classify instances into their respective categories.
MACHINE LEARNING METHODS IN EACH TYPE

• Supervised learning categories and techniques


– Numerical classifier functions
• Linear classifier, perceptron, logistic regression, support vector machines (SVM), neural networks (a perceptron sketch follows this list)
– Parametric (probabilistic) functions
• Naïve Bayes, Gaussian discriminant analysis (GDA), hidden Markov models (HMM), probabilistic graphical models
– Non-parametric (instance-based) functions
• k-nearest neighbors, kernel regression, kernel density estimation, local
regression
– Symbolic functions
• Decision trees, classification and regression trees (CART)
– Aggregation (ensemble) learning
• Bagging, boosting (Adaboost), random forest
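As a concrete instance of one technique from the list above, here is a minimal Python sketch of the perceptron update rule; the toy data and learning rate are illustrative, not from the slides.

```python
import numpy as np

def perceptron_train(X, y, epochs=20, lr=1.0):
    """Minimal perceptron: labels y must be in {-1, +1}; X is (n_samples, n_features)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (np.dot(w, xi) + b) <= 0:  # point is misclassified (or on the boundary)
                w += lr * yi * xi              # nudge the decision boundary toward it
                b += lr * yi
    return w, b

# Toy linearly separable data (illustrative)
X = np.array([[2.0, 2.0], [1.0, 3.0], [4.0, 5.0], [5.0, 4.0]])
y = np.array([-1, -1, 1, 1])
w, b = perceptron_train(X, y)
print(np.sign(X @ w + b))  # should reproduce y
```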
MACHINE LEARNING METHODS IN EACH TYPE

• Unsupervised learning categories and techniques


– Clustering
• k-means clustering
• Mean-shift clustering
• Spectral clustering
– Density estimation
• Gaussian mixture model (GMM)
• Graphical models
– Dimensionality reduction
• Principal component analysis (PCA)
• Factor analysis
MACHINE LEARNING METHODS IN EACH TYPE

Reinforcement learning categories and techniques


Q-Learning: learns optimal action choices by updating Q-values iteratively (see the sketch after this list).
Deep Q-Networks (DQN): use deep neural networks to handle high-dimensional state spaces in Q-learning.
Policy Gradient Methods: directly learn policies through gradient ascent on the expected return.
Deep Deterministic Policy Gradient (DDPG): extends DQN to continuous action spaces with deterministic policies.
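To make the iterative Q-value update concrete, below is a minimal tabular Q-learning sketch on a tiny chain environment invented for illustration; the step function and all constants are assumptions, not from the slides.

```python
import numpy as np

# Hypothetical chain environment: states 0..4, actions 0 (left) / 1 (right);
# reaching state 4 yields reward +1 and ends the episode.
N_STATES, N_ACTIONS = 5, 2

def step(state, action):
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward, next_state == N_STATES - 1

Q = np.zeros((N_STATES, N_ACTIONS))
alpha, gamma, epsilon = 0.1, 0.9, 0.1  # learning rate, discount factor, exploration rate

for episode in range(500):
    s, done = 0, False
    while not done:
        # epsilon-greedy action selection
        a = np.random.randint(N_ACTIONS) if np.random.rand() < epsilon else int(np.argmax(Q[s]))
        s2, r, done = step(s, a)
        # Iterative Q-value update: move Q(s, a) toward r + gamma * max_a' Q(s', a')
        Q[s, a] += alpha * (r + gamma * np.max(Q[s2]) * (not done) - Q[s, a])
        s = s2

print(np.argmax(Q, axis=1))  # greedy policy: non-terminal states should prefer action 1 (right)
```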
CLASSIFIER: LINEAR VS. NON-LINEAR

For some tasks, the input data are linearly separable, and linear classifiers can be suitably applied.

For other tasks, linear classifiers may have difficulty producing adequate decision boundaries.
LINEAR VS NON-LINEAR TECHNIQUES
Linear classification techniques
Linear classifier
Perceptron
Logistic regression
Linear SVM
Naïve Bayes
Non-linear classification techniques
k-nearest neighbors
Non-linear SVM
Neural networks
Decision trees
Random forest
REGRESSION MODEL

LEAST SQUARES REGRESSION CLASSIFIER


SUPERVISED LEARNING APPROACH:
LINEAR REGRESSION
Regression is a statistical method used to examine the relationship between
one dependent variable and one or more independent variables. It aims to
understand how the value of the dependent variable changes when one or
more independent variables are varied.

The basic idea behind regression is to find the best-fitting line (or
curve) that describes the relationship between the variables.
RECALL: COVARIANCE

$$\mathrm{cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{X})(y_i - \bar{Y})}{n - 1}$$
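As a quick numerical check of the formula, here is a short Python sketch (NumPy assumed); the x and y values are illustrative.

```python
import numpy as np

x = np.array([2.0, 3.0, 5.0, 7.0, 9.0])   # illustrative sample
y = np.array([4.0, 5.0, 7.0, 10.0, 15.0])

# Sample covariance with the n - 1 denominator, exactly as in the formula above
cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)
print(cov_xy)  # agrees with np.cov(x, y)[0, 1]
```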
INTERPRETING COVARIANCE

cov(X,Y) > 0: X and Y are positively correlated

cov(X,Y) < 0: X and Y are negatively correlated

cov(X,Y) = 0: X and Y are uncorrelated (zero covariance does not by itself imply independence)


CORRELATION COEFFICIENT

Pearson's correlation coefficient is standardized covariance (unitless):

$$r = \frac{\mathrm{cov}(x, y)}{\sqrt{\mathrm{var}(x)\,\mathrm{var}(y)}}$$
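Continuing the covariance sketch above, a few lines compute Pearson's r from the covariance and variances and check it against NumPy's built-in np.corrcoef:

```python
import numpy as np

x = np.array([2.0, 3.0, 5.0, 7.0, 9.0])
y = np.array([4.0, 5.0, 7.0, 10.0, 15.0])

# r = cov(x, y) / sqrt(var(x) * var(y)), all with the sample (n - 1) convention
cov_xy = np.cov(x, y)[0, 1]
r = cov_xy / np.sqrt(x.var(ddof=1) * y.var(ddof=1))
print(r, np.corrcoef(x, y)[0, 1])  # both approx 0.98: a strong positive linear relationship
```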
CORRELATION

Measures the relative strength of the linear relationship between two variables
Unit-less
Ranges between -1 and 1
The closer to -1, the stronger the negative linear relationship
The closer to +1, the stronger the positive linear relationship
The closer to 0, the weaker any linear relationship
SCATTER PLOTS OF DATA WITH VARIOUS
CORRELATION COEFFICIENTS
[Scatter plots of Y against X: perfect negative (r = -1), moderate negative (r = -0.6), no linear relationship (r = 0), perfect positive (r = +1), weak positive (r = +0.3), and a nonlinear pattern with r = 0]
LINEAR CORRELATION

[Plots contrasting linear relationships with curvilinear relationships between X and Y]
ASSUMPTIONS
Linear regression assumes that…
1. The relationship between X and Y is linear
2. Y is distributed normally at each value of X
3. The variance of Y at every value of X is the same (homogeneity
of variances)
4. The observations are independent
PREDICTION
If you know something about X, this knowledge
helps you predict something about Y.

Regression equation: the expected value of y at a given level of x is

$$E(y_i \mid x_i) = \alpha + \beta x_i$$
EXAMPLE: LEAST SQUARES REGRESSION
MODEL
NUMERICAL EXAMPLE
Example: Sam recorded how many hours of sunshine vs. how many ice creams were sold at the shop from Monday to Friday:

Hours of Sunshine (x) | Ice Creams Sold (y)
2 | 4
3 | 5
5 | 7
7 | 10
9 | 15
12 | ?

First: let us find the best m (slope) and b (y-intercept) for the line y = mx + b.
Sum x, y, x² and xy over the five observed days (Σx = 26, Σy = 41, Σx² = 168, Σxy = 263), then:

$$m = \frac{n\,\Sigma xy - \Sigma x\,\Sigma y}{n\,\Sigma x^2 - (\Sigma x)^2} = \frac{5(263) - 26(41)}{5(168) - 26^2} \approx 1.518, \qquad b = \frac{\Sigma y - m\,\Sigma x}{n} \approx 0.305$$
As a prediction model, for hour 12:
y = 1.518 × 12 + 0.305 ≈ 18.52 (approx.)
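The calculation above can be reproduced with a few lines of Python (NumPy assumed), using the closed-form least-squares slope and intercept:

```python
import numpy as np

x = np.array([2.0, 3.0, 5.0, 7.0, 9.0])    # hours of sunshine
y = np.array([4.0, 5.0, 7.0, 10.0, 15.0])  # ice creams sold

n = len(x)
# Closed-form least squares: m = (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2), b = (Sy - m*Sx) / n
m = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x ** 2) - np.sum(x) ** 2)
b = (np.sum(y) - m * np.sum(x)) / n
print(m, b)        # approx 1.518 and 0.305, matching the slide
print(m * 12 + b)  # prediction for 12 hours of sunshine: approx 18.52
```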

AS CLASSIFICATION MODEL
Least squares works by making the total of the squared errors as small as possible (that is why it is called "least squares").

So we find the best fit by varying the slope and intercept of the line.
SUPERVISED MODEL
NAÏVE BAYES CLASSIFIER
Numerical Example
Training data (categorical attributes: Refund, Marital Status; continuous attribute: Taxable Income; class: Evade):

Tid | Refund | Marital Status | Taxable Income | Evade
1 | Yes | Single | 125K | No
2 | No | Married | 100K | No
3 | No | Single | 70K | No
4 | Yes | Married | 120K | No
5 | No | Divorced | 95K | Yes
6 | No | Married | 60K | No
7 | Yes | Divorced | 220K | No
8 | No | Single | 85K | Yes
9 | No | Married | 75K | No
10 | No | Single | 90K | Yes

Global/total probability: P(No) = 7/10, P(Yes) = 3/10
Example conditionals: P(Status=Married | No) = 4/7, P(Refund=Yes | Yes) = 0
HOW TO ESTIMATE PROBABILITIES FROM DATA?

• For a continuous attribute, assume a normal distribution, one for each (Ai, cj) pair:

$$P(A_i \mid c_j) = \frac{1}{\sqrt{2\pi\sigma_{ij}^2}} \exp\!\left(-\frac{(A_i - \mu_{ij})^2}{2\sigma_{ij}^2}\right)$$

• For (Income, Class=No), from the table above:
– sample mean = 110
– sample variance = 2975

$$P(\text{Income} = 120 \mid \text{No}) = \frac{1}{\sqrt{2\pi}\,(54.54)}\, e^{-\frac{(120 - 110)^2}{2(2975)}} = 0.0072$$
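As a quick check of this density calculation, here is a minimal Python sketch using only the standard library:

```python
import math

def gaussian_likelihood(value, mean, variance):
    """Normal density, used for continuous attributes in Naive Bayes."""
    return math.exp(-(value - mean) ** 2 / (2 * variance)) / math.sqrt(2 * math.pi * variance)

# Income = 120 under Class = No: sample mean 110, sample variance 2975 (from the table)
print(gaussian_likelihood(120, 110, 2975))  # approx 0.0072
```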
Example of Naïve Bayes Classifier
Given a test record:

X = (Refund = No, Married, Income = 120K)

Naïve Bayes conditionals estimated from the table:
P(Refund=Yes | No) = 3/7, P(Refund=No | No) = 4/7
P(Refund=Yes | Yes) = 0, P(Refund=No | Yes) = 1
P(Marital Status=Single | No) = 2/7, P(Divorced | No) = 1/7, P(Married | No) = 4/7
P(Marital Status=Single | Yes) = 2/7, P(Divorced | Yes) = 1/7, P(Married | Yes) = 0
For taxable income: if class=No, sample mean = 110 and sample variance = 2975; if class=Yes, sample mean = 90 and sample variance = 25

P(X | Class=No) = P(Refund=No | No) × P(Married | No) × P(Income=120K | No)
= 4/7 × 4/7 × 0.0072 = 0.0024

P(X | Class=Yes) = P(Refund=No | Yes) × P(Married | Yes) × P(Income=120K | Yes)
= 1 × 0 × 1.2×10⁻⁹ = 0

Since P(X | No)·P(No) > P(X | Yes)·P(Yes), it follows that P(No | X) > P(Yes | X) => Class = No
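The full classification can be reproduced with a minimal Python sketch; the priors, conditionals, and income parameters are copied from the worked example, and the helper mirrors the normal-density formula above.

```python
import math

def gaussian_likelihood(value, mean, variance):
    return math.exp(-(value - mean) ** 2 / (2 * variance)) / math.sqrt(2 * math.pi * variance)

# Priors and conditionals copied from the worked example above
priors = {"No": 7 / 10, "Yes": 3 / 10}
p_refund_no = {"No": 4 / 7, "Yes": 1.0}               # P(Refund=No | class)
p_married = {"No": 4 / 7, "Yes": 0.0}                 # P(Married | class)
income_params = {"No": (110, 2975), "Yes": (90, 25)}  # (mean, variance) of income

scores = {}
for c in ("No", "Yes"):
    mean, var = income_params[c]
    likelihood = p_refund_no[c] * p_married[c] * gaussian_likelihood(120, mean, var)
    scores[c] = likelihood * priors[c]                # proportional to P(c | X)

print(scores, "->", max(scores, key=scores.get))      # Class = No, as on the slide
```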
UNSUPERVISED MODEL:
K-MEANS CLUSTERING
CLUSTERING / SEGMENTATION
WHAT IS CLUSTER ANALYSIS?
Finding groups of objects such that the objects in a group will
be similar (or related) to one another and different from (or
unrelated to) the objects in other groups

Intra-cluster distances are minimized; inter-cluster distances are maximized.
NOTION OF A CLUSTER CAN BE AMBIGUOUS

[Diagram: the same set of points can plausibly be read as two, four, or six clusters; how many clusters?]
[Image example: clusters on color; K-means clustering using intensity alone and color alone]


NUMERICAL EXAMPLE
Let's consider the following data points:
P1(1,3) , P2(2,2) , P3(5,8) , P4(8,5) , P5(3,9) , P6(10,7) , P7(3,3) , P8(9,4) , P9(3,7).

Take K = 3 and assume the initial cluster centers are P7(3,3), P9(3,7), P8(9,4), labeled C1, C2, C3.

Steps (a code sketch follows this list):
1. Find the distance between each data point and the centers C1, C2, and C3.
2. Assign the data point to the nearest cluster using the minimum distance value.
3. Repeat this for all data points.
4. Update the centers by taking the average of the data points present in each cluster.
5. Repeat for N iterations or until convergence.
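Here is a minimal Python sketch (NumPy assumed) implementing the steps above on the nine points, with P7, P9, and P8 as the initial centers:

```python
import numpy as np

points = np.array([[1, 3], [2, 2], [5, 8], [8, 5], [3, 9],
                   [10, 7], [3, 3], [9, 4], [3, 7]], dtype=float)  # P1..P9
centers = points[[6, 8, 7]].copy()  # initial centers C1=P7, C2=P9, C3=P8

for _ in range(100):  # N iterations, with an early exit at convergence
    # Steps 1-3: assign every point to its nearest center (Euclidean distance)
    distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    labels = distances.argmin(axis=1)
    # Step 4: move each center to the mean of the points assigned to it
    new_centers = np.array([points[labels == k].mean(axis=0) for k in range(3)])
    if np.allclose(new_centers, centers):  # Step 5: stop once the centers stop moving
        break
    centers = new_centers

print(labels)   # cluster index of P1..P9
print(centers)  # final cluster centers
```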
UPDATE THE CENTER
Updated centers: repeat this process until convergence.
REINFORCEMENT LEARNING MODEL
REINFORCEMENT LEARNING

[Diagram: the reinforcement learning loop; the agent takes an action in the environment, and the environment returns a new state and a reward to the agent]
SETUP FOR REINFORCEMENT LEARNING
MARKOV DECISION PROCESS (ENVIRONMENT):
– Probability of moving to each state
– Reward for making that move

POLICY (AGENT'S BEHAVIOR):
– Value of being in that state
SIMPLE EXAMPLE OF AGENT IN AN
ENVIRONMENT

[Diagram: a 3x3 grid world with cells (0,0) through (2,2); one cell carries a reward of 100 and the rest 0, so the agent's score is 100 after reaching it]
POLICIES
Policy and policy evaluation:

[Diagram: a policy over the grid; move to <0,1>, then <1,1>, <1,0>, and <2,0>. Evaluating the policy assigns values such as 12.5, 50, and 100 to states, showing the policy could be better]
SPLITTING THE DATASET
Training Dataset:
The training dataset is the portion of the data used to train the model. It consists of
input-output pairs where the input is the data used to make predictions, and the
output is the corresponding target or label.
Testing Dataset:
The testing dataset is a separate portion of the data that is held out from the training
process. It is used to evaluate the performance of the trained model.
Validation Dataset:
The validation dataset is another independent portion of the data, used during model development to tune hyperparameters and compare candidate models before the final test.

[Diagram: the complete dataset is split into a training set and a validation set (training/development phase) and a testing set (testing phase)]
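Below is a minimal Python sketch of such a split; the 70/15/15 ratios and the split_dataset name are illustrative choices, not fixed by the slides.

```python
import numpy as np

def split_dataset(X, y, train=0.7, val=0.15, seed=0):
    """Shuffle and split into train / validation / test portions (illustrative ratios)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_train, n_val = int(train * len(X)), int(val * len(X))
    tr, va, te = idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
    return (X[tr], y[tr]), (X[va], y[va]), (X[te], y[te])

X = np.arange(20).reshape(10, 2)  # toy inputs
y = np.arange(10)                 # toy labels
(train_X, train_y), (val_X, val_y), (test_X, test_y) = split_dataset(X, y)
print(len(train_X), len(val_X), len(test_X))  # 7 1 2
```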
OVER FITTING AND UNDER FITTING
Overfitting:
It occurs when a model learns to fit the training data too closely, capturing
noise or random fluctuations in the data rather than the underlying pattern.
As a result, an overfitted model performs well on the training data but fails
to generalize to new, unseen data.

Underfitting:
It happens when a model is too simple to capture the underlying structure
of the data.
It fails to capture the patterns in the training data and also performs poorly
on new data.
BIAS AND VARIANCE VS. OVERFITTING AND
UNDERFITTING
Bias:
It is the error caused because the model cannot represent the concept.
Bias measures how much the average prediction of the model differs from
the true value it is trying to predict.
Informally, the difference between the training observations and the best-fit
line is the training error, which is used here as a proxy for bias.

Variance:
It is the error caused because the learning algorithm overreacts to small
changes (noise) in the training data.
Informally, the difference between the testing observations and the best-fit
line is the testing error, which is used here as a proxy for variance.

Total expected loss = Bias² + Variance (+ irreducible noise)


BIAS AND VARIANCE VS. OVERFITTING AND UNDERFITTING
Considering bias and variance together, there are four relationships:
High bias and high variance (worst case)
Low bias and low variance (best case)
Low bias and high variance (overfitting)
High bias and low variance (underfitting)
AVOIDING OVERFITTING AND UNDERFITTING

To avoid overfitting:
Cross-validation
Training with more data
Removing features
Early stopping the training
Regularization
Ensembling

To avoid underfitting:
Increasing the training time of the model
Increasing the number of features
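As one concrete remedy from the overfitting list, here is a minimal sketch of early stopping: gradient descent on synthetic linear data stops when the validation error stops improving. The data, learning rate, and patience threshold are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, 60)
y = 1.5 * X + 0.3 + rng.normal(0, 1.0, 60)  # noisy linear data (synthetic)
X_tr, y_tr, X_va, y_va = X[:40], y[:40], X[40:], y[40:]

m = b = 0.0
lr = 0.005
best = (np.inf, m, b)        # best validation error seen so far, with its parameters
patience, bad_epochs = 20, 0

for epoch in range(2000):
    # One gradient-descent step on the training mean squared error
    err = m * X_tr + b - y_tr
    m -= lr * 2 * np.mean(err * X_tr)
    b -= lr * 2 * np.mean(err)
    # Early stopping: track validation error, stop when it stops improving
    val_err = np.mean((m * X_va + b - y_va) ** 2)
    if val_err < best[0]:
        best, bad_epochs = (val_err, m, b), 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break

val_err, m, b = best
print(m, b, val_err)  # parameters from the epoch with the lowest validation error
```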
END
