Module 1


Introduction to Machine Learning
21BCA6D02
VI Sem BCA - DA
Course Objectives & Outcomes

COB1 Introduce the basic concepts of Machine Learning techniques in problem solving
COB2 Familiarize Machine Learning model building and evaluation
COB3 Introduce Neural Network and Deep Learning

CO1 Demonstrate the basic understanding of Machine Learning


CO2 Apply linear regression to build machine learning models to solve real life problems
CO3 Analyse and apply classification algorithms in predictive modelling.
CO4 Evaluate and select suitable clustering technique for problem solving.
CO5 Formulate solutions for real life problems using neural network and deep learning.
Syllabus

Module 1 (12 Hours) Learning Problems:


• Designing Learning systems, Perspectives and Issues in Machine Learning, What is statistical learning? –
Why and how we estimate f, Trade-off between Prediction Accuracy and Model Interpretation, Supervised vs
Unsupervised Learning, Regression vs Classification problems, Assessing Model Accuracy: Measuring the
quality of fit, The Bias-Variance Trade-off.

Module 2 (12 Hours) Linear Regression:


• Simple Linear Regression: Estimating the Coefficients, Assessing the Accuracy of the Coefficient Estimates,
Assessing the Accuracy of the Model. Multiple Linear Regression: Estimating the Regression Coefficients,
Some Important Questions. Other Considerations in the Regression Model: Qualitative Predictors,
Extensions of the Linear Model, Potential Problems.

Module 3 (12 Hours) Classification:


• Basic Concepts, Decision Tree Induction, Bayes Classification Methods, Rule Based Classification, Model
Based Evaluation and Selection. Techniques to improve classification accuracy, Bayesian Belief networks
and Support Vector Machines
Module 4 (12 Hours) Cluster Analysis:
• Basic Concepts and Methods. Cluster Analysis overview. Partitioning Methods: K-means; Hierarchical
Methods: Agglomerative versus Divisive Hierarchical Clustering, Distance Measures in Algorithmic
Methods, BIRCH, Density Based: DBSCAN
• Evaluation of Clustering: Assessing Clustering Tendency, Determining the Number of Clusters, Measuring
Clustering Quality
Module 5 (12 Hours) Introduction to Neural Networks and Deep Learning:
• Neural network representations, Perceptrons, Multilayer Networks and Back Propagation Algorithm,
Introduction to Deep Learning, Architectures: RNN, CNN
Introduction
• Understanding how to make computers learn would open up many new uses of computers
and new levels of competence and customization

• State of the art achievements:


Recognize spoken words
Predict recovery rates of pneumonia patients
Detect fraudulent use of credit card
Drive autonomous vehicles on public highways
Play games like checkers

• ML is a multidisciplinary field (AI, Probability and Statistics, Information Theory,
Philosophy, Psychology, Neurobiology, …)
Introduction
• A computer program is said to learn from experience E with respect to some class
of tasks T and performance measure P, if its performance at tasks in T, as
measured by P, improves with experience E.

A checkers learning problem:


• Task T: playing checkers
• Performance measure P: percent of games won against opponents
• Training experience E: playing practice games against itself

A handwriting recognition learning problem:


• Task T: recognizing and classifying handwritten words within images
• Performance measure P: percent of words correctly classified
• Training experience E: a database of handwritten words with given classifications
Designing a Learning System

I. Choosing the Training Experience

II. Choosing the Target Function

III. Choosing a Representation for the Target Function

IV. Choosing a Function Approximation Algorithm

V. The Final Design


Designing a Learning System
I. Choosing the Training Experience

1. Choose the type of training experience

• Has a significant impact on success or failure of the learner


• Key attribute is whether it provides direct or indirect feedback
• Ex: Direct: individual checkers board states and correct move for each
• Indirect: Move sequences and final outcomes of various games played
• Credit Assignment: Degree to which each move in the sequence deserves credit or blame for the
final outcome
• A game can be lost even when the early moves are optimal, if they are followed by poor moves later
Designing a Learning System
I. Choosing the Training Experience

2. Learner Control on sequence of training examples


• Learner might rely on teacher to select board states and to provide correct move
• Learner may have complete control over board states

3. Distribution of examples
• Learning is most reliable when training examples follow a distribution similar to that of future test examples
• If the training experience consists only of games played against itself, it might not be
fully representative of the distribution of future test cases
• Mastery of one distribution of examples will not necessarily lead to strong performance over another
distribution
Designing a Learning System
II. Choosing the Target Function

• What type of knowledge will be learnt


• Choose the best move among legal moves
• The legal moves define a large search space that is known a priori
• One option is a function that chooses the best move from any given board state
• ChooseMove: B → M
• where B is the set of legal board states and M is the set of legal moves
• ChooseMove is difficult to learn when only indirect training experience is available
• An alternative target function is V: B → R
• where V is the target function and R is the set of real numbers (V assigns a numerical score to any given board state)
• V assigns higher scores to better board states
Designing a Learning System
II. Choosing the Target Function

V is used to choose the best successor board state and thereby the best legal move
• 1. if b is a final board state that is won, then V(b) = 100
• 2. if b is a final board state that is lost, then V(b) = -100
• 3. if b is a final board state that is drawn, then V(b) = 0
• 4. if b is not a final state in the game, then V(b) = V(b’), where b' is the best final board state that
can be achieved starting from b
• This definition is not efficiently computable, so the goal is an operational description of V that can be used to evaluate states and select moves

It is difficult to learn V perfectly, but it can be learned approximately


• V is the ideal target function
• V̂ is the function that is actually learned by the program
Designing a Learning System
III. Choosing a Representation for the Target Function

• Large table with distinct entry specifying value for each distinct board state
• Collection of rules that match against the features of the board state
• Quadratic polynomial function of predefined board features
• Artificial Neural Network
• An expressive representation allows a close approximation to the ideal target function
• But the more expressive the representation, the more training data is required
• For simplicity, V̂ is calculated as a linear combination of board features x1, …, xn: V̂(b) = w0 + w1·x1 + … + wn·xn

• The weights w1, …, wn determine the relative importance of the board features
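A minimal Python sketch of a linear evaluation function of the form w0 + w1·x1 + … + wn·xn. The slides do not say which board features are used, so the two features and the weights below are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def v_hat(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear evaluation: w0 + w1*x1 + ... + wn*xn.

    `weights` has one more entry than `features`; weights[0] is the bias w0.
    """
    if len(weights) != len(features) + 1:
        raise ValueError("expected len(weights) == len(features) + 1")
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))


# Hypothetical example: x1 = own pieces on the board, x2 = opponent pieces.
example_weights = [0.0, 2.0, -2.0]
example_features = [5, 3]
score = v_hat(example_weights, example_features)  # 0 + 2*5 - 2*3
```

A larger positive weight makes the corresponding feature count more toward a favourable score; a negative weight penalises it.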


Designing a Learning System
IV. Choosing a Function Approximation Algorithm
Each training example is an ordered pair of the form (b, Vtrain(b))
Ex: Training example in which Black has won the game

Estimating Training Values:


It is difficult to assign training values to the intermediate board states
Lost game does not necessarily indicate that every board state along the game path is bad
There may be early board states which should be rated very high
Assign the training value of any intermediate board state b to be V̂(Successor(b)): Vtrain(b) ← V̂(Successor(b))

Under certain conditions, training values estimated from successor states can be proven to converge toward perfect estimates of Vtrain
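Once training values are available, the weights of a linear evaluation function have to be adjusted to fit them. The slides do not name the fitting algorithm; one well-known choice is the least mean squares (LMS) rule, which nudges each weight in proportion to the current error. The sketch below assumes a linear evaluation function, and the features, target value, and learning rate are made up for illustration.

```python
from typing import List, Sequence


def predict(weights: Sequence[float], features: Sequence[float]) -> float:
    """Current estimate: w0 + w1*x1 + ... + wn*xn."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))


def lms_update(
    weights: Sequence[float],
    features: Sequence[float],
    v_train: float,
    learning_rate: float = 0.1,
) -> List[float]:
    """One LMS step: w_i <- w_i + learning_rate * (v_train - estimate) * x_i."""
    error = v_train - predict(weights, features)
    # The bias weight w0 behaves as if it had a constant feature x0 = 1.
    updated = [weights[0] + learning_rate * error]
    updated += [w + learning_rate * error * x for w, x in zip(weights[1:], features)]
    return updated


# Hypothetical single training example (features of a board state, training value).
weights = [0.0, 0.0, 0.0]
features = [1.0, 2.0]
target = 10.0

error_before = abs(target - predict(weights, features))
for _ in range(50):
    weights = lms_update(weights, features, target, learning_rate=0.05)
error_after = abs(target - predict(weights, features))
```

With a sufficiently small learning rate, each step shrinks the error on the example it is applied to; weights are left unchanged when the estimate already matches the training value.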
Designing a Learning System
V. The Final Design

Performance System
Critic
Generalizer
Experiment Generator
Perspectives and Issues in ML
Perspectives:

Learning involves searching a very large space of possible hypotheses

The weights are tuned iteratively whenever the predicted value differs from the training value

The hypothesis representation differs depending on the learning algorithm

The underlying assumption is that a hypothesis consistent with the training data will also generalize well to unseen examples
Perspectives and Issues in ML
Issues:

What algorithms exist for learning general target functions from specific training examples?

Which algorithms perform best for which types of problems?

How much training data is sufficient?

When and how can prior knowledge held by the learner guide the process of generalizing from examples?

How can the learner automatically alter its representation to improve its ability to represent and learn the target function?
Statistical Learning
Statistical Learning is a set of tools for understanding data
Two classes: supervised and unsupervised learning

Supervised learning: predicting or estimating an output based on one or more inputs, guided by
examples in which both the inputs and the output are observed

Unsupervised learning: finding relationships or patterns within the given data without a supervising output

Suppose that we observe a response Y and p different predictors X = (X₁, X₂, …, Xp).

Y = f(X) + ε

Here f is the function describing the relationship between X and Y, and ε is the random error term.
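To make the model Y = f(X) + ε concrete, the sketch below simulates data from it. In practice f is unknown; the linear f, the noise level, and the range of X used here are assumptions made purely for illustration.

```python
import random
from typing import List, Tuple


def f(x: float) -> float:
    """Hypothetical 'true' relationship; in real problems f is unknown."""
    return 2.0 + 3.0 * x


def simulate(n: int, noise_sd: float = 1.0, seed: int = 0) -> List[Tuple[float, float]]:
    """Draw n pairs (x, y) with y = f(x) + eps, where eps has mean zero."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(0.0, 10.0)
        eps = rng.gauss(0.0, noise_sd)  # random error term
        data.append((x, f(x) + eps))
    return data


sample = simulate(1000)
# The error term averages out to roughly zero over many observations.
mean_error = sum(y - f(x) for x, y in sample) / len(sample)
```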
Why Estimate f?
f is not known exactly, so statistical methods are used to compute an estimate f̂
Ŷ = f̂(X)

Reducible Error
Error arising from mismatch of f’ and f

Irreducible Error
Arises from the fact that X doesn’t completely determine Y.
There are variables outside of X that still have some small effect on Y
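The two kinds of error can be written as a single decomposition. Writing \hat{f} for the estimate of f and \hat{Y} = \hat{f}(X) for the prediction, and assuming that \hat{f} and X are held fixed and that ε has mean zero, the expected squared prediction error splits as follows:

```latex
\begin{aligned}
E\bigl(Y - \hat{Y}\bigr)^2
  &= E\bigl[f(X) + \varepsilon - \hat{f}(X)\bigr]^2 \\
  &= \underbrace{\bigl[f(X) - \hat{f}(X)\bigr]^2}_{\text{reducible}}
   + \underbrace{\operatorname{Var}(\varepsilon)}_{\text{irreducible}}
\end{aligned}
```

A better estimate of f shrinks the first term, while the second term is a lower bound that no choice of \hat{f} can remove.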
How Do We Estimate f?
Using training data, apply a statistical learning method to estimate the unknown function f
Parametric Methods
• 1. Make an assumption about functional form of f, such as “f is linear in X”
• 2. Perform a procedure that uses the training data to fit (train) the model
• In the case of a linear model, this procedure estimates the parameters β0, β1, ..., βp
• The most common approach to fitting a linear model is (ordinary) least squares
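A minimal sketch of the parametric approach for the simplest case, one predictor and a linear form Y ≈ β0 + β1·X, fitted by ordinary least squares using the closed-form solution. The function name and the toy data are illustrative assumptions.

```python
from typing import Sequence, Tuple


def fit_simple_ols(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Return (beta0, beta1) minimising the sum of squared residuals."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta1 = sxy / sxx               # slope
    beta0 = y_bar - beta1 * x_bar   # intercept
    return beta0, beta1


# Toy data lying exactly on the line y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
beta0, beta1 = fit_simple_ols(xs, ys)
```

The whole problem of estimating f has been reduced to estimating two numbers, which is what makes the method parametric.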

Non-Parametric Methods
Do not make assumptions about the form of f.
Have the potential to fit a wider range of possible shapes for f.
The problem of estimating f is not reduced to a set number of parameters.
More observations are needed compared to a parametric approach to estimate f accurately
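As one example of a non-parametric method, the sketch below uses k-nearest-neighbours regression, which predicts by averaging the responses of the k closest training points and assumes no functional form for f. The slides do not name a specific non-parametric method, so this choice and the toy data are assumptions.

```python
from typing import List, Tuple


def knn_predict(train: List[Tuple[float, float]], x0: float, k: int = 3) -> float:
    """Average the responses of the k training points nearest to x0."""
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training points")
    nearest = sorted(train, key=lambda point: abs(point[0] - x0))[:k]
    return sum(y for _, y in nearest) / k


# Toy training data from a curved relationship (y = x squared).
train = [(0.0, 0.0), (1.0, 1.0), (2.0, 4.0), (3.0, 9.0), (4.0, 16.0)]
prediction = knn_predict(train, 2.1, k=3)  # averages the points at x = 1, 2, 3
```

No parameters are estimated; the training data themselves define the fit, which is why more observations are needed for an accurate estimate.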
Trade-off between Prediction Accuracy and Model Interpretability

Prediction Accuracy
Accuracy

For imbalanced datasets: Precision, Recall

For a unified measure: F-Score
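A short sketch of why accuracy alone can mislead on imbalanced data. It computes precision, recall, and the F1 score (the F-score that weights precision and recall equally) from true and predicted labels; the labels below are made up for illustration.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[float, float, float]:
    """Precision, recall, and F1 for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Imbalanced toy labels: only 2 of 10 observations are positive.
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

Here accuracy is 0.8 even though only half of the positive cases are found and only half of the positive predictions are correct.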

Restrictive models are much more interpretable than flexible ones

Flexible approaches can be so complicated that it is hard to understand how predictors
affect the response.

If inference is the goal, simple and inflexible methods are easier to interpret

Flexible models are more prone to overfitting.


Supervised Versus Unsupervised Learning

Supervised learning methods are those in which a model that captures the relationship
between predictors and response measurements is fitted.

• The goal is to accurately predict the response variables for future observations

• To understand the relationship between the predictors and response.

Unsupervised learning takes place when, for each observation i, we have a vector of
measurements xi but no response yi.

• Examine the relationship between the variables or between the observations

• A popular method of unsupervised learning is cluster analysis, in which observations are
grouped into distinct groups based on their vector of measurements xi.
Supervised Versus Unsupervised Learning

Clustering works best when the groups are significantly distinct from each other.

There is often overlap between observations in different groups

Clustering will inevitably place a number of observations in the wrong groups

Visualization of clusters breaks down as the dimensionality of data increases.

There are some scenarios where only a subset of the observations has response measurements.

This is a semi-supervised learning problem


Regression vs Classification

Variables can be categorized as either quantitative or qualitative

Both qualitative and quantitative predictors can be used to predict both types of response variables

Choosing an appropriate statistical learning method is based on the type of the response variable

Regression:
• The output variable must be continuous (a real value).
• It attempts to find the best-fit line, which predicts the output more accurately.
• Regression algorithms solve regression problems such as house price predictions and weather predictions.
• Regression algorithms can be further divided into Linear and Non-linear Regression.

Classification:
• The output variable has to be a discrete value.
• Classification tries to find the decision boundary, which divides the dataset into different classes.
• Classification algorithms solve classification problems like identifying spam e-mails, spotting cancer cells, and speech recognition.
• Classification algorithms can be further divided into Binary Classifiers and Multi-class Classifiers.
Assessing Model Accuracy
• Every data set is different and there is no one statistical learning method that works best for all data sets

Measuring the Quality of Fit


We need to quantify how well a model’s predictions match the observed data
In regression, mean squared error (MSE) is the most commonly used measure
A small MSE indicates that the predicted responses are very close to the true ones.

The test MSE is evaluated on previously unseen test observations, and the learning method that produces the smallest test MSE
is chosen.
There is no guarantee that the model with the lowest training MSE also has the lowest test MSE.
Many methods work by minimizing the training MSE, and can end up with a large test MSE.
Assessing Model Accuracy
• A model with a small training MSE and a large test MSE is overfitting the data, picking up patterns in the
training data that don’t exist in the test data.
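The gap between training and test MSE can be seen in a small simulation. MSE is the average of the squared differences between observed and predicted responses. The sketch below uses k-nearest-neighbours regression as a convenient flexible method: with k = 1 it memorises the training set. The data-generating function, noise level, and sample sizes are assumptions chosen for illustration.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error: average of (observed - predicted) squared."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def knn_predict(train: List[Point], x0: float, k: int) -> float:
    """Average the responses of the k training points nearest to x0."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x0))[:k]
    return sum(y for _, y in nearest) / k


def make_data(n: int, seed: int) -> List[Point]:
    """Hypothetical data: y = 2 + 3x + noise, with x uniform on [0, 10]."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    return [(x, 2.0 + 3.0 * x + rng.gauss(0.0, 2.0)) for x in xs]


train = make_data(50, seed=1)
test = make_data(50, seed=2)


def evaluate(k: int) -> Tuple[float, float]:
    """Return (training MSE, test MSE) for a k-nearest-neighbours fit."""
    train_mse = mse([y for _, y in train], [knn_predict(train, x, k) for x, _ in train])
    test_mse = mse([y for _, y in test], [knn_predict(train, x, k) for x, _ in test])
    return train_mse, test_mse


train_mse_1, test_mse_1 = evaluate(1)  # most flexible: memorises the training set
train_mse_9, test_mse_9 = evaluate(9)  # smoother, less flexible fit
```

With k = 1 every training point is its own nearest neighbour, so the training MSE is exactly zero, yet the test MSE is not. Comparing the two pairs of values shows why training MSE is a poor guide for choosing a method.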
Bias-Variance Trade-Off
• The expected test MSE can be broken down into the sum of three quantities:
• 1. the variance of f̂(x0)
• 2. the squared bias of f̂(x0)
• 3. the variance of the error terms ε
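Written out, with \hat{f} denoting the estimated function and (x_0, y_0) a test observation, the decomposition of the expected test MSE at x_0 is:

```latex
E\bigl(y_0 - \hat{f}(x_0)\bigr)^2
  = \operatorname{Var}\bigl(\hat{f}(x_0)\bigr)
  + \bigl[\operatorname{Bias}\bigl(\hat{f}(x_0)\bigr)\bigr]^2
  + \operatorname{Var}(\varepsilon)
```

All three terms are non-negative, so the expected test MSE can never fall below Var(ε), the irreducible error.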

• To minimize expected test MSE, we need to choose a statistical learning method that achieves both low
variance and low bias.
• Variance refers to how much f̂ would change if it were repeatedly estimated with different training data sets.
• Generally, the more flexible a model is, the higher its variance.
• Bias is the error introduced from approximating a complicated problem by a much simpler model.
• Fitting a linear regression to data that is not linear will always lead to high bias, no matter how many
observations are in the training set.
• More flexible methods lead to higher variance and lower bias
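The claim that a linear fit to non-linear data stays biased regardless of sample size can be checked with a small experiment. The sketch below fits a straight line by least squares to noise-free data from y = x² on [-1, 1], so any remaining error comes from the model being too simple rather than from noise. The choice of y = x² and the grid sizes are assumptions for illustration.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = b0 + b1*x; returns (b0, b1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def linear_fit_mse(n: int) -> float:
    """MSE of the best straight line on n evenly spaced points of y = x^2."""
    xs = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
    ys = [x * x for x in xs]  # noise-free, clearly non-linear
    b0, b1 = fit_line(xs, ys)
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / n


mse_small = linear_fit_mse(11)
mse_large = linear_fit_mse(1001)
```

Going from 11 to 1001 observations does not drive the error toward zero: the straight line simply cannot represent the curve, which is what high bias means.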
