Machine Learning UNIT-3
BAYESIAN AND COMPUTATIONAL LEARNING
o 1952: Arthur Samuel, a pioneer of machine learning, created a program that
helped an IBM computer play checkers. It performed better the more it played.
o 1959: The term "Machine Learning" was first coined by Arthur Samuel.
The first "AI" winter:
o The period from 1974 to 1980 was a tough time for AI and ML researchers, and
this period came to be called the AI winter.
Machine learning from theory to reality
o 1959: The first neural network was applied to a real-world problem: removing
echoes over phone lines using an adaptive filter.
o 1985: Terry Sejnowski and Charles Rosenberg invented NETtalk, a neural network
that taught itself to correctly pronounce 20,000 words in one week.
o 1997: IBM's Deep Blue computer won a chess match against the chess expert
Garry Kasparov, becoming the first computer to beat a human chess expert.
Machine learning in the 21st century
o 2006: In 2006, computer scientist Geoffrey Hinton gave neural-network research
the new name "deep learning," and nowadays it has become one of the most
trending technologies.
o 2012: In 2012, Google created a deep neural network that learned to recognize
images of humans and cats in YouTube videos.
o 2016: AlphaGo beat the world's second-ranked Go player, Lee Sedol. In 2017 it
beat the top-ranked player, Ke Jie.
o 2017: In 2017, Alphabet's Jigsaw team built an intelligent system that learned to
detect online trolling by reading millions of comments from different websites.
The goal of supervised learning is to map input data to output data.
1. Image Recognition:
Image recognition is one of the most common applications of machine learning.
It is used to identify objects, persons, places, etc., in digital images.
A popular use case of image recognition and face detection is the automatic friend
tagging suggestion: whenever we upload a photo with our Facebook friends, we
automatically get a tagging suggestion with names, and the technology behind this
is machine learning's face detection and recognition algorithm.
2. Speech Recognition:
While using Google, we get the option "Search by voice"; this comes under speech
recognition, and it is a popular application of machine learning.
3. Traffic Prediction:
If we want to visit a new place, we take the help of Google Maps, which shows us
the correct path with the shortest route and predicts the traffic conditions.
It predicts traffic conditions, such as whether traffic is clear, slow-moving, or
heavily congested, in two ways:
o Real-time location of the vehicle from the Google Maps app and sensors
o Average time taken on past days at the same time of day
4. Product Recommendations:
Machine learning is widely used by various e-commerce and entertainment
companies such as Amazon, Netflix, etc., for product recommendations to users.
5. Email Spam and Malware Filtering:
We always receive important mail in our inbox marked with the important symbol
and spam emails in our spam box, and the technology behind this is machine learning.
Some machine learning algorithms such as Multi-Layer Perceptron, Decision tree,
and Naïve Bayes classifier are used for email spam filtering and malware detection.
6. Virtual Personal Assistant:
We have various virtual personal assistants such as Google
Assistant, Alexa, Cortana, and Siri. As the name suggests, they help us find
information using voice instructions.
These assistants can help us in various ways just through our voice instructions,
such as playing music, calling someone, opening an email, scheduling an appointment, etc.
7. Online Fraud Detection:
Machine learning is making our online transactions safe and secure by detecting
fraudulent transactions.
Whenever we perform an online transaction, there are various ways a fraudulent
transaction can take place, such as fake accounts, fake IDs, and stealing money in
the middle of a transaction.
8. Automatic Language Translation:
Nowadays, if we visit a new place and are not aware of the language, it is not a
problem at all, as machine learning helps us here as well by converting the text
into languages we know.
Difference between Supervised and Unsupervised Learning:

Supervised Learning | Unsupervised Learning
This model takes direct feedback to check whether it is predicting the correct output. | This model does not take any feedback.
A supervised learning model predicts the output. | An unsupervised learning model finds the hidden patterns in data.
This model produces an accurate result. | This model may give a less accurate result compared to supervised learning.
Bayes Theorem
Bayes' theorem is one of the most popular machine learning concepts; it helps to
calculate the probability of one event occurring, under uncertain knowledge, when
another event has already occurred.
Bayes' theorem can be derived using the product rule and the conditional probability of
event X with known event Y:
o According to the product rule, we can express the probability of event X with
known event Y as follows:
P(X ∩ Y) = P(X|Y) P(Y)   {equation 1}
o Further, the probability of event Y with known event X:
P(X ∩ Y) = P(Y|X) P(X)   {equation 2}
Since the right-hand sides of equations 1 and 2 are both equal to P(X ∩ Y), equating
them and dividing both sides by P(Y) gives:
P(X|Y) = P(Y|X) P(X) / P(Y)
The above equation is called Bayes' Rule or Bayes' Theorem.
o P(X|Y) is called the posterior, which we need to calculate. It is defined as the
updated probability after considering the evidence.
o P(Y|X) is called the likelihood. It is the probability of the evidence given that the
hypothesis is true.
o P(X) is called the prior probability, i.e., the probability of the hypothesis before
considering the evidence.
o P(Y) is called the marginal probability. It is defined as the probability of the
evidence under any consideration.
Hence, Bayes' Theorem can be written as: posterior = likelihood × prior / evidence
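As a quick illustration, here is a minimal Python sketch of this formula (the probabilities are made-up numbers for illustration, not taken from any dataset):

```python
# Minimal sketch of Bayes' theorem: posterior = likelihood * prior / evidence.
# All probabilities below are illustrative assumptions.

p_x = 0.3              # prior P(X): probability of the hypothesis before evidence
p_y_given_x = 0.8      # likelihood P(Y|X): probability of the evidence if X is true
p_y_given_not_x = 0.1  # probability of the evidence if X is false

# Marginal P(Y) via the law of total probability
p_y = p_y_given_x * p_x + p_y_given_not_x * (1 - p_x)

# Bayes' rule: P(X|Y) = P(Y|X) * P(X) / P(Y)
p_x_given_y = p_y_given_x * p_x / p_y
print(f"P(X|Y) = {p_x_given_y:.3f}")  # 0.24 / 0.31 ≈ 0.774
```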
Maximum Likelihood
Maximum Likelihood Estimation (MLE) is a probabilistic approach to determining
values for the parameters of a model. MLE is a widely used technique in machine
learning, time series analysis, panel data, and discrete data.
The likelihood function measures the extent to which the data provide support for
different values of the parameter. It indicates how likely it is that a particular
population would produce the observed sample.
Working of Maximum Likelihood Estimation
The main objective of MLE is to maximize the likelihood function.
For example, given salary data labeled 0 or 1, MLE will calculate the probability for
each data point and then use those probabilities to compute the likelihood of
classifying the data points as either 0 or 1.
It will repeat this process until the best-fitting line is found. This process is known
as maximization of the likelihood.
MLE is the basis of a lot of supervised learning models, one of which is logistic
regression.
Logistic regression uses the maximum likelihood technique to classify data. Let's
see how logistic regression uses MLE.
Specific MLE procedures have the advantage that they can exploit the properties of
the estimation problem to deliver better efficiency and numerical stability.
These methods can often calculate explicit confidence intervals.
The "solver" parameter of logistic regression is used to select different solving
strategies for classification, giving a better MLE formulation.
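A minimal sketch with scikit-learn (assuming scikit-learn is installed; the one-feature salary data below is a made-up toy example) showing how the solver parameter selects the optimization strategy for the maximum likelihood fit:

```python
# Sketch: LogisticRegression fits its coefficients by maximum likelihood;
# the "solver" parameter chooses the optimization strategy.
# The toy salary data is illustrative only.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[25], [28], [31], [34], [45], [48], [52], [55]])  # salary in $1000s (hypothetical)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])                          # class label 0 or 1

model = LogisticRegression(solver="lbfgs")  # alternatives include "liblinear", "saga", ...
model.fit(X, y)

print(model.predict([[40]]))        # predicted class for a new point
print(model.predict_proba([[40]]))  # probabilities from the fitted likelihood
```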
Naïve Bayes Classifier Algorithm
o The Naïve Bayes algorithm is a supervised learning algorithm based on Bayes'
theorem and used for solving classification problems.
o It is mainly used in text classification with high-dimensional training
datasets.
o The Naïve Bayes classifier is one of the simplest and most effective classification
algorithms; it helps in building fast machine learning models that can make
quick predictions.
o It is a probabilistic classifier, which means it predicts on the basis of the
probability of an object.
o Some popular applications of the Naïve Bayes algorithm are spam filtering,
sentiment analysis, and classifying articles.
The Naïve Bayes algorithm comprises two words, Naïve and Bayes, which can be
described as:
o Naïve: It is called naïve because it assumes that the occurrence of a certain feature
is independent of the occurrence of other features. For example, if a fruit is identified
on the basis of color, shape, and taste, then a red, spherical, and sweet fruit is
recognized as an apple. Hence each feature individually contributes to identifying it
as an apple, without depending on the others.
o Bayes: It is called Bayes because it depends on the principle of Bayes' Theorem.
Bayes' Theorem:
o Bayes' theorem is also known as Bayes' Rule or Bayes' law, which is used to
determine the probability of a hypothesis with prior knowledge. It depends on the
conditional probability.
o The formula for Bayes' theorem is given as:
P(A|B) = P(B|A) P(A) / P(B)
Where,
P(A|B) is Posterior probability: Probability of hypothesis A on the observed event B.
P(B|A) is Likelihood probability: Probability of the evidence given that the hypothesis
is true.
P(A) is Prior Probability: Probability of hypothesis before observing the evidence.
P(B) is Marginal Probability: Probability of Evidence.
The working of the Naïve Bayes classifier can be understood with an example.
Suppose we have a dataset of weather conditions and a corresponding target
variable "Play". The first rows of the dataset look like this:

Index | Outlook | Play
0 | Rainy | Yes
1 | Sunny | Yes
2 | Overcast | Yes
3 | Overcast | Yes
4 | Sunny | No
5 | Rainy | Yes
Frequency table for the Weather Conditions:

Weather | Yes | No
Overcast | 5 | 0
Rainy | 2 | 2
Sunny | 3 | 2
Total | 10 | 5
Likelihood table for the Weather Conditions:

Weather | No | Yes | P(Weather)
Rainy | 2 | 2 | 4/14 = 0.29
Sunny | 2 | 3 | 5/14 = 0.35
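Using the counts above, the posterior for playing on a Sunny day can be worked out directly with Bayes' theorem. A short Python sketch (the counts are read off the tables above, which divide by the full 14-row dataset):

```python
# Worked Naïve Bayes computation for a new day with Outlook = Sunny,
# using the counts from the frequency/likelihood tables above.
p_sunny_given_yes = 3 / 10  # P(Sunny|Yes): 3 of the 10 "Yes" days are Sunny
p_yes = 10 / 14             # prior P(Yes)
p_sunny = 5 / 14            # marginal P(Sunny)

# Bayes' theorem: P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny)
p_yes_given_sunny = p_sunny_given_yes * p_yes / p_sunny
print(f"P(Yes|Sunny) = {p_yes_given_sunny:.2f}")  # = 0.60, so predict "Yes" on a Sunny day
```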
o Bernoulli: The Bernoulli classifier works similarly to the Multinomial classifier, but
the predictor variables are independent Boolean variables, such as whether a
particular word is present in a document or not. This model is also well known for
document classification tasks.
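As a minimal sketch of the Bernoulli variant (assuming scikit-learn; the word-presence matrix and vocabulary are made-up toy data):

```python
# BernoulliNB sketch: each feature is a Boolean marking whether a given
# word is present (1) or absent (0) in a document. Toy data only.
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Columns: presence of the hypothetical words ["free", "offer", "meeting"]
X = np.array([[1, 1, 0],
              [1, 0, 0],
              [0, 0, 1],
              [0, 1, 1]])
y = np.array(["spam", "spam", "ham", "ham"])

clf = BernoulliNB()
clf.fit(X, y)
print(clf.predict([[1, 1, 1]]))  # classify a new document by its word presences
```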
Instance-based learning:
Machine learning systems categorized as instance-based learning are systems that
learn the training examples by heart and then generalize to new instances based on
some similarity measure.
It is called instance-based because it builds its hypotheses from the training instances.
It is also known as memory-based learning or lazy learning.
The time complexity of this approach depends upon the size of the training data.
Each time a new query is encountered, the previously stored data is examined, and
a target function value is assigned to the new instance.
The worst-case time complexity is O(n), where n is the number of training
instances.
Some of the instance-based learning algorithms are:
1. K Nearest Neighbor (KNN)
2. Learning Vector Quantization (LVQ)
3. Locally Weighted Learning (LWL)
4. Case-Based Reasoning
K-Nearest Neighbor (KNN) Algorithm for Machine Learning
K-Nearest Neighbour is one of the simplest machine learning algorithms, based on
the supervised learning technique.
The K-NN algorithm assumes similarity between the new case/data and the
available cases, and puts the new case into the category that is most similar to the
available categories.
K-NN algorithm stores all the available data and classifies a new data point based on
the similarity.
This means that when new data appears, it can be easily classified into a well-suited
category by using the K-NN algorithm.
K-NN algorithm can be used for Regression as well as for Classification but mostly
it is used for the Classification problems.
K-NN is a non-parametric algorithm, which means it does not make any
assumption on underlying data.
It is also called a lazy learner algorithm.
The KNN algorithm, at the training phase, just stores the dataset; when it gets new
data, it classifies that data into the category most similar to the new data.
Suppose there are two categories, Category A and Category B, and we have a new data
point x1. Which of these categories will this data point lie in?
To solve this type of problem, we need a K-NN algorithm. With the help of K-NN, we can
easily identify the category or class of a particular data point. Consider the below diagram:
The K-NN working can be explained on the basis of the below algorithm:
Step-1: Select the number K of the neighbors
Step-2: Calculate the Euclidean distance between the new data point and the
training points
Step-3: Take the K nearest neighbors as per the calculated Euclidean distance.
Step-4: Among these k neighbors, count the number of the data points in each
category.
Step-5: Assign the new data point to the category for which the number of
neighbors is maximum.
Step-6: Our model is ready.
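These steps can be sketched from scratch in a few lines of Python (the tiny 2-D dataset is made up for illustration):

```python
# From-scratch K-NN classification following the steps above. Toy data only.
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=5):
    # Step 2: Euclidean distance from the new point to every training point
    dists = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # Step 3: indices of the K nearest neighbors
    nearest = np.argsort(dists)[:k]
    # Steps 4-5: majority vote among the K neighbors' categories
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]

# Hypothetical 2-D points in two categories
X_train = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 7], [8, 6]])
y_train = np.array(["A", "A", "A", "B", "B", "B"])
print(knn_predict(X_train, y_train, np.array([3, 4]), k=3))  # -> "A"
```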
Suppose we have a new data point and we need to put it in the required category. Consider
the below image:
Firstly, we will choose the number of neighbors: we choose k = 5.
Next, we will calculate the Euclidean distance between the data points. The
Euclidean distance is the distance between two points, which we have already
studied in geometry. It can be calculated as:
d = √((x₂ − x₁)² + (y₂ − y₁)²)
By calculating the Euclidean distance, we get the nearest neighbors: three nearest
neighbors in Category A and two nearest neighbors in Category B. As the majority
of the five nearest neighbors belong to Category A, the new data point is assigned
to Category A.
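The same walkthrough can be reproduced with scikit-learn's KNeighborsClassifier (assuming scikit-learn; the 2-D points are made up):

```python
# The k=5 walkthrough above, using scikit-learn. Toy 2-D points only.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[1, 1], [2, 2], [2, 3], [3, 2], [6, 6], [7, 7], [7, 5], [8, 6]])
y = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])

knn = KNeighborsClassifier(n_neighbors=5)  # Step 1: choose K = 5
knn.fit(X, y)
print(knn.predict([[3, 3]]))  # majority vote of the 5 nearest neighbors -> "A"
```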