Module 1

Introduction to Machine Learning
21BCA6D02
VI Sem BCA - DA
Course Objectives & Outcomes
COB1 Introduce the basic concepts of Machine Learning techniques in problem solving
COB2 Familiarize Machine Learning model building and evaluation
COB3 Introduce Neural Network and Deep Learning
CO1 Demonstrate the basic understanding of Machine Learning

CO2 Apply linear regression to build machine learning models to solve real life problems
CO3 Analyse and apply classification algorithms in predictive modelling.
CO4 Evaluate and select suitable clustering technique for problem solving.
CO5 Formulate solutions for real life problems using neural network and deep learning.
Syllabus
Module 1 (12 Hours) Learning Problems:

• Designing Learning systems, Perspectives and Issues in Machine Learning, What is statistical learning? –
Why and how we estimate f, Trade-off between Prediction Accuracy and Model Interpretation, Supervised vs
Unsupervised Learning, Regression vs Classification problems, Assessing Model Accuracy: Measuring the
quality of fit, The Bias-Variance Trade-off.
Module 2 (12 Hours) Linear Regression:

• Simple Linear Regression: Estimating the Coefficients, Assessing the Accuracy of the Coefficient Estimates,
Assessing the Accuracy of the Model. Multiple Linear Regression: Estimating the Regression Coefficients,
Some Important Questions. Other Considerations in the Regression Model: Qualitative Predictors,
Extensions of the Linear Model, Potential Problems.
Module 3 (12 Hours) Classification:

• Basic Concepts, Decision Tree Induction, Bayes Classification Methods, Rule Based Classification, Model
Based Evaluation and Selection. Techniques to improve classification accuracy, Bayesian Belief networks
and Support Vector Machines
Module 4 (12 Hours) Cluster Analysis:
• Basic Concepts and Methods. Cluster Analysis overview. Partitioning Methods: K-means; Hierarchical
Methods: Agglomerative versus Divisive Hierarchical Clustering, Distance Measures in Algorithmic
Methods, BIRCH, Density Based: DBSCAN
• Evaluation of Clustering: Assessing Clustering Tendency, Determining the Number of Clusters, Measuring
Clustering Quality
Module 5 (12 Hours) Introduction to Neural Networks and Deep Learning:
• Neural network representations, Perceptron, Multilayer Networks and Back Propagation Algorithm,
Introduction to Deep Learning, Architectures: RNN, CNN
Introduction
• Understanding of how to make computers learn would open up many new uses
and new levels of competence and customization
• State of the art achievements:

Recognize spoken words
Predict recovery rates of pneumonia patients
Detect fraudulent use of credit card
Drive autonomous vehicles on public highways
Play games like checkers
• ML is a multidisciplinary field (AI, Probability and Statistics, Information Theory, Philosophy, Psychology, Neurobiology, …)
• A computer program is said to learn from experience E with respect to some class
of tasks T and performance measure P, if its performance at tasks in T, as
measured by P, improves with experience E.
A checkers learning problem:

• Task T: playing checkers
• Performance measure P: percent of games won against opponents
• Training experience E: playing practice games against itself
A handwriting recognition learning problem:

• Task T: recognizing and classifying handwritten words within images
• Performance measure P: percent of words correctly classified
• Training experience E: a database of handwritten words with given classifications
Designing a Learning System
I. Choosing the Training Experience
II. Choosing the Target Function
III. Choosing a Representation for the Target Function
IV. Choosing a Function Approximation Algorithm
V. The Final Design

I. Choosing the Training Experience
1. Choose the type of training experience
• Has a significant impact on success or failure of the learner

• Key attribute is whether it provides direct or indirect feedback
• Ex: Direct: individual checkers board states and the correct move for each
• Indirect: move sequences and final outcomes of various games played
• Credit assignment: determining the degree to which each move in the sequence deserves credit or blame for the final outcome
• A game can be lost even when the early moves are optimal, if they are followed by poor moves later
2. Learner Control on sequence of training examples

• Learner might rely on teacher to select board states and to provide correct move
• Learner may have complete control over board states
3. Distribution of examples
• Learning is most reliable when the training examples follow a distribution similar to that of future test examples
• If the training experience consists only of games played against itself, it might not be fully representative of the distribution of future test cases
• Mastery of one distribution of examples will not necessarily lead to strong performance over another distribution
II. Choosing the Target Function
• What type of knowledge will be learnt

• Choose the best move among legal moves
• The legal moves define a large search space that is known a priori
• A function that chooses the best move from any given board state
• ChooseMove: B → M
• where B is the set of legal board states and M is the set of legal moves
• ChooseMove is difficult to learn when only indirect training experience is available
• V: B → R
• where V is the target function and R is the set of real numbers (V assigns a score to any given board state)
• V assigns higher scores to better board states
• V can be used to choose the best successor board state, and thereby the best legal move
• 1. if b is a final board state that is won, then V(b) = 100
• 2. if b is a final board state that is lost, then V(b) = -100
• 3. if b is a final board state that is drawn, then V(b) = 0
• 4. if b is not a final state in the game, then V(b) = V(b’), where b' is the best final board state that
can be achieved starting from b
• An operational description of V is needed in order to evaluate states and select moves
• It is difficult to learn V perfectly, but it may be learned approximately

• V is the ideal target function
• V' is the function that is actually learned by the program (an approximation to V)
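The recursive definition of V given above can be sketched on a toy game tree. This is only an illustration: the tree, the state names, and the assumption that both players choose optimally (the learner maximizes, the opponent minimizes) are hypothetical and are not taken from the slides.

```python
# Hypothetical toy game tree: each non-final state maps to its successor states.
SUCCESSORS = {
    "start": ["a", "b"],          # learner to move
    "a": ["a_win", "a_draw"],     # opponent to move
    "b": ["b_loss", "b_win"],     # opponent to move
}

# Rules 1-3: values of final board states (won = 100, lost = -100, drawn = 0).
FINAL_VALUES = {"a_win": 100, "a_draw": 0, "b_loss": -100, "b_win": 100}


def V(b, learner_to_move=True):
    """Ideal target function on the toy tree, assuming optimal play by both sides."""
    if b in FINAL_VALUES:
        return FINAL_VALUES[b]
    # Rule 4: the value of a non-final state is the value of the best final
    # state reachable from it; players alternate turns in this sketch.
    values = [V(s, not learner_to_move) for s in SUCCESSORS[b]]
    return max(values) if learner_to_move else min(values)


value_of_start = V("start")
```

Even on this tiny tree the function has to search to the end of the game, which is why such a definition is hard to use directly on real checkers boards.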
III. Choosing a Representation for the Target Function
• Large table with distinct entry specifying value for each distinct board state
• Collection of rules that match against the features of the board state
• Quadratic polynomial function of predefined board features
• Artificial Neural Network
• A more expressive representation allows a closer approximation to the ideal target function
• But the more expressive the representation, the more training data is required
• For a simple representation, V' is calculated as a linear combination of board features
• Weights determine the importance of the board features
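A minimal sketch of such a linear representation follows. The number of features, their values, and the weights are hypothetical placeholders; the slides do not specify which board features are used.

```python
def v_hat(features, weights):
    """Learned evaluation V'(b) = w0 + w1*x1 + ... + wn*xn.

    weights[0] is the constant term w0; weights[1:] pair up with the features.
    """
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))


# Hypothetical example: three board features x1..x3 and four weights w0..w3.
weights = [0.5, 1.0, -1.0, 2.0]
board_features = [3, 1, 0]
score = v_hat(board_features, weights)  # 0.5 + 3.0 - 1.0 + 0.0
```

A larger weight (in absolute value) means the corresponding feature has more influence on the score of a board state.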

IV. Choosing a Function Approximation Algorithm
Each training example is an ordered pair of the form (b, Vtrain(b))
Ex: Training example in which Black has won the game
Estimating Training Values:

It is difficult to assign training values to the intermediate board states
A lost game does not necessarily indicate that every board state along the game path is bad
There may be early board states that should be rated very highly
Assign the training value of any intermediate board state b to be V'(Successor(b)), that is, Vtrain(b) ← V'(Successor(b))
Under certain conditions, estimating training values from successor states can be shown to converge toward perfect estimates of Vtrain
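The training-value rule above can be combined with a weight update as in the sketch below. The update used here is the LMS (least mean squares) rule, which is one common choice; the slides do not name a specific update rule, and the feature vectors, initial weights, and learning rate are hypothetical.

```python
def v_hat(features, weights):
    """Current learned evaluation V'(b) as a linear combination of features."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))


def lms_update(weights, features, v_train, learning_rate=0.1):
    """One LMS step: move each weight in proportion to the prediction error."""
    error = v_train - v_hat(features, weights)
    updated = [weights[0] + learning_rate * error]  # constant term uses x0 = 1
    updated += [w + learning_rate * error * x
                for w, x in zip(weights[1:], features)]
    return updated


# Hypothetical intermediate board state b and its successor.
old_weights = [0.0, 1.0, 0.0]
b_features = [1, 2]
successor_features = [3, 1]

# Vtrain(b) <- V'(Successor(b)), using the current weights.
v_train = v_hat(successor_features, old_weights)

new_weights = lms_update(old_weights, b_features, v_train)
```

After the step, the prediction for b is closer to its training value than before, which is the intended effect of tuning the weights.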
V. The Final Design
Performance System
Critic
Generalizer
Experiment Generator
Perspectives and Issues in ML
Perspectives:
Learning involves searching a very large space of possible hypotheses
The weights are tuned iteratively whenever the predicted value differs from the training value
Different algorithms use different hypothesis representations
The aim is to find a hypothesis that is consistent with the training data and that also generalizes correctly to unseen examples
Issues:
What algorithms exist for learning general target functions from specific training examples?
Which algorithms perform best for which types of problems?
How much training data is sufficient?
When and how can prior knowledge held by the learner guide the process of generalizing from examples?
How can the learner automatically alter its representation to improve its ability to represent and learn the target function?
Statistical Learning
Statistical Learning is a set of tools for understanding data
2 Classes: Supervised and Unsupervised Learning
Supervised learning: predicting or estimating an output based on one or more inputs, guided by training examples in which both the inputs and the outputs are observed
Unsupervised learning: finding relationships or patterns within the given data without a supervising output
Suppose that we observe a response Y and p different predictors X = (X₁, X₂, …, Xp).
Y = f(X) + ε
Here f is the function describing the relationship between X and Y, and ε is a random error term, independent of X and with mean zero.
Why Estimate f?
f is not known exactly, so statistical methods are used to obtain an estimate f'
Y' = f'(X)
Reducible Error
Error arising from the mismatch between f' and f; it can be reduced by using a better estimate of f
Irreducible Error
Arises from the fact that X doesn't completely determine Y.
There are variables outside of X that still have some small effect on Y
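The split into reducible and irreducible error can be written out as follows. This assumes, for the moment, that f' and X are held fixed, so that the only randomness comes from ε (with mean zero), and that Y' = f'(X) as defined earlier.

```latex
\begin{aligned}
E\,(Y - Y')^2 &= E\,[\,f(X) + \varepsilon - f'(X)\,]^2 \\
              &= \underbrace{[\,f(X) - f'(X)\,]^2}_{\text{reducible}}
               + \underbrace{\operatorname{Var}(\varepsilon)}_{\text{irreducible}}
\end{aligned}
```

The cross term vanishes because ε has mean zero, so even a perfect estimate f' = f leaves an expected squared error of Var(ε).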
How Do We Estimate f?
Using training data, apply a statistical learning method to estimate the unknown function f
Parametric Methods
• 1. Make an assumption about functional form of f, such as “f is linear in X”
• 2. Apply a procedure that uses the training data to fit (train) the model
• In case of linear model, this procedure estimates parameters β0, β1, ..., βp
• Most common approach to fit linear model is (ordinary) least squares
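The two parametric steps above can be sketched with NumPy as follows. The simulated data set and the true coefficient values (2, 3, -1) are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: the true relationship is linear with
# beta0 = 2, beta1 = 3, beta2 = -1, plus a small random error.
n = 200
X = rng.normal(size=(n, 2))
y = 2.0 + 3.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=n)

# Step 1: assume f is linear in X; a column of ones carries the intercept beta0.
design = np.column_stack([np.ones(n), X])

# Step 2: estimate beta0, beta1, beta2 by ordinary least squares.
beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)

print(beta_hat)
```

Because the assumed linear form matches how the data were generated, the estimates land close to the true coefficients; if f were far from linear, the same procedure would still run but the fit would be poor.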
Non-Parametric Methods
Do not make assumptions about the form of f.
Have the potential to fit a wider range of possible shapes for f.
The problem of estimating f is not reduced to a set number of parameters.
More observations are needed compared to a parametric approach to estimate f accurately
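As one illustration of a non-parametric method, the sketch below uses k-nearest-neighbours regression, which predicts by averaging the responses of the closest training points instead of assuming a functional form for f. The choice of this method and the tiny data set are assumptions for illustration; the slides do not name a specific non-parametric technique.

```python
import numpy as np


def knn_regress(x_train, y_train, x0, k=3):
    """Predict the response at x0 as the mean response of the k nearest training points."""
    distances = np.abs(x_train - x0)
    nearest = np.argsort(distances)[:k]
    return float(np.mean(y_train[nearest]))


# Hypothetical one-dimensional training data with a non-linear relationship.
x_train = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_train = x_train ** 2

pred_k3 = knn_regress(x_train, y_train, x0=2.1, k=3)  # averages y at x = 2, 3, 1
pred_k1 = knn_regress(x_train, y_train, x0=2.1, k=1)  # copies y at x = 2
```

With so few observations the prediction is coarse, which reflects the point above that non-parametric methods need more data to estimate f accurately.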
Trade-off between Prediction Accuracy and Model Interpretability
Prediction Accuracy
Accuracy
For imbalanced datasets → Precision, Recall
For a unified measure → F-Score
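The measures listed above can be computed from the counts of true and false positives and negatives. The sketch below uses the standard definitions with a small hypothetical set of labels in which the positive class (1) is rare.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall and F-score for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-score: harmonic mean of precision and recall.
    f_score = (2 * precision * recall / (precision + recall)
               if (precision + recall) else 0.0)
    return accuracy, precision, recall, f_score


# Hypothetical imbalanced labels: only 2 of 10 observations are positive.
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1, 0, 0]

accuracy, precision, recall, f_score = classification_metrics(y_true, y_pred)
```

Here accuracy looks high because most observations are negative, while precision and recall show that only half of the positive cases are handled correctly.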
Restrictive models are much more interpretable than flexible ones
Flexible approaches can be so complicated that it is hard to understand how predictors affect the response.
If inference is the goal, simple and inflexible methods are easier to interpret
Flexible models are more prone to overfitting.

Supervised Versus Unsupervised Learning
Supervised learning methods are those in which a model that captures the relationship
between predictors and response measurements is fitted.
• The goal is to accurately predict the response variable for future observations, or
• to better understand the relationship between the predictors and the response.
Unsupervised learning takes place when we have a set of observations and a vector of
measurements xi, but no response yi.
• Examine the relationship between the variables or between the observations
• A popular method of unsupervised learning is cluster analysis, in which observations are grouped into distinct groups based on their vector of measurements xi.
Clustering works best when the groups are significantly distinct from each other.
In practice there is often overlap between observations in different groups
When groups overlap, clustering will inevitably place some observations in the wrong groups
Visualization of clusters breaks down as the dimensionality of data increases.
There are some scenarios where only a subset of the observations has response measurements.
This is a semi-supervised learning problem

Regression vs Classification
Variables can be categorized as either quantitative or qualitative
Both qualitative and quantitative predictors can be used to predict both types of response variables
Choosing an appropriate statistical learning method is based on the type of the response variable
Regression | Classification
The output variable must be continuous in nature or a real value. | The output variable has to be a discrete value.
It attempts to find the best-fit line, which predicts the output more accurately. | It tries to find the decision boundary, which divides the dataset into different classes.
Regression algorithms solve regression problems such as house price predictions and weather predictions. | Classification algorithms solve classification problems like identifying spam e-mails, spotting cancer cells, and speech recognition.
Regression algorithms can be further divided into Linear and Non-linear Regression. | Classification algorithms can be further divided into Binary Classifiers and Multi-class Classifiers.
Assessing Model Accuracy
• Every data set is different and there is no one statistical learning method that works best for all data sets
Measuring the Quality of Fit

We need to be able to quantify how well a model's predictions match the observed data
In regression, mean squared error (MSE) is the most commonly used measure
A small MSE indicates that the predicted responses are very close to the true ones.
The training MSE is computed on the data used to fit the model; the test MSE is computed on previously unseen test observations, and the learning method that produces the smallest test MSE is preferred.
There is no guarantee that a model with the lowest training MSE also has the lowest test MSE.
Many methods estimate their coefficients by minimizing the training MSE, and can still end up with a large test MSE.
• A model with a small training MSE and large test MSE is overfitting the data, picking up patterns on the
training data that don’t exist in the test data.
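The gap between training and test MSE can be explored with a small simulation. The sketch below computes MSE as the average of (yi − f'(xi))² and fits polynomials of increasing degree; the true function, noise level, sample sizes, and degrees are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def f(x):
    """Hypothetical true function used to simulate the data."""
    return np.sin(2 * np.pi * x)


def mse(y, y_pred):
    """Mean squared error between observed and predicted responses."""
    return float(np.mean((y - y_pred) ** 2))


# Simulated training and test sets drawn from the same process.
x_train = np.linspace(0, 1, 30)
y_train = f(x_train) + rng.normal(0, 0.3, x_train.size)
x_test = np.linspace(0.01, 0.99, 200)
y_test = f(x_test) + rng.normal(0, 0.3, x_test.size)

results = {}
for degree in (1, 3, 8):
    coeffs = np.polyfit(x_train, y_train, degree)  # least-squares polynomial fit
    results[degree] = (
        mse(y_train, np.polyval(coeffs, x_train)),  # training MSE
        mse(y_test, np.polyval(coeffs, x_test)),    # test MSE
    )

for degree, (train_mse, test_mse) in results.items():
    print(f"degree {degree}: train MSE = {train_mse:.3f}, test MSE = {test_mse:.3f}")
```

The training MSE can only go down as the degree increases, because each polynomial contains the lower-degree ones as special cases. The test MSE has no such guarantee, which is why it, and not the training MSE, is used to compare methods.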
Bias-Variance Trade-Off
• The expected test MSE at a point x0 can be broken down into the sum of three quantities:
• 1. the variance of f'(x0)
• 2. the squared bias of f'(x0)
• 3. the variance of the error terms ε
• To minimize expected test MSE, we need to choose a statistical learning method that achieves both low
variance and low bias.
• Variance refers to how much f' would change if it were repeatedly estimated with different training data sets.
• Generally, the more flexible a model is, the higher its variance.
• Bias is the error introduced from approximating a complicated problem by a much simpler model.
• Fitting a linear regression to data that is not linear will always lead to high bias, no matter how many
observations are in the training set.
• More flexible methods lead to higher variance and lower bias
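The trade-off can be illustrated by refitting two models on many simulated training sets and looking at their predictions at a single point x0. The sketch below compares a straight line with a degree-5 polynomial; the true function, the noise level, the point x0, and the number of repetitions are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(1)


def f(x):
    """Hypothetical non-linear true function."""
    return np.sin(2 * np.pi * x)


x = np.linspace(0, 1, 30)   # fixed training inputs
x0 = 0.25                   # point at which predictions are compared
sigma = 0.3                 # standard deviation of the error term
n_repeats = 500             # number of simulated training sets


def bias_variance_at_x0(degree):
    """Estimate squared bias and variance of the fitted value at x0."""
    preds = np.empty(n_repeats)
    for i in range(n_repeats):
        y = f(x) + rng.normal(0, sigma, x.size)  # fresh training responses
        preds[i] = np.polyval(np.polyfit(x, y, degree), x0)
    bias_sq = (preds.mean() - f(x0)) ** 2
    variance = preds.var()
    return float(bias_sq), float(variance)


bias_sq_linear, var_linear = bias_variance_at_x0(1)
bias_sq_flexible, var_flexible = bias_variance_at_x0(5)

print(f"linear:   bias^2 = {bias_sq_linear:.4f}, variance = {var_linear:.4f}")
print(f"degree 5: bias^2 = {bias_sq_flexible:.4f}, variance = {var_flexible:.4f}")
```

In this setup the straight line has the larger squared bias, because it cannot follow the curve, while the degree-5 fit has the larger variance, because it reacts more strongly to the noise in each training set.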
