Naive Bayesian Classifier: National Institute of Technology Sikkim
Submitted By
Ankit Saurav
Roll No. B170013CS
B.Tech, 3rd year, 6th Semester
Concerned Faculty
Assistant Professor
Department of Computer Science and Engineering
National Institute of Technology Sikkim
Introduction
Naive Bayes is a simple technique for constructing classifiers: models that
assign class labels to problem instances, represented as vectors of feature
values, where the class labels are drawn from some finite set. There is not a
single algorithm for training such classifiers, but a family of algorithms based
on a common principle: all naive Bayes classifiers assume that the value of a
particular feature is independent of the value of any other feature, given the
class variable. For example, a fruit may be considered to be an apple if it is
red, round, and about 10 cm in diameter. A naive Bayes classifier considers
each of these features to contribute independently to the probability that this
fruit is an apple, regardless of any possible correlations between the color,
roundness, and diameter features.
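As a concrete illustration of the independence assumption, the following minimal sketch (in Python) scores the fruit example with a uniform prior and per-feature likelihoods that are invented purely for illustration; each feature contributes a separate factor, with no modelling of correlations between features.

```python
import math

# Illustrative sketch of the naive independence assumption.
# All probabilities below are assumed, not estimated from data.
priors = {"apple": 0.5, "orange": 0.5}            # P(C_k)
likelihoods = {                                    # P(feature | C_k)
    "apple":  {"red": 0.8, "round": 0.9, "about_10cm": 0.7},
    "orange": {"red": 0.1, "round": 0.9, "about_10cm": 0.6},
}
observed = ["red", "round", "about_10cm"]

# Each feature contributes an independent factor to the score;
# the class with the largest unnormalized posterior wins.
scores = {c: priors[c] * math.prod(likelihoods[c][f] for f in observed)
          for c in priors}
print(max(scores, key=scores.get))   # -> apple
```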
For some types of probability models, naive Bayes classifiers can be trained
very efficiently in a supervised learning setting. In many practical applications,
parameter estimation for naive Bayes models uses the method of maximum
likelihood; in other words, one can work with the naive Bayes model without
accepting Bayesian probability or using any Bayesian methods.
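For discrete features, maximum-likelihood estimation amounts to counting relative frequencies: the class prior is the fraction of training examples in each class, and each conditional probability is the fraction of that class's examples with the given feature value. The sketch below uses a tiny made-up training set to show the idea; in practice one would usually also apply smoothing to avoid zero counts.

```python
from collections import Counter, defaultdict

# Tiny made-up training set of (color, label) pairs, for illustration only.
data = [("red", "apple"), ("red", "apple"), ("green", "apple"),
        ("orange", "orange"), ("orange", "orange"), ("red", "orange")]

class_counts = Counter(label for _, label in data)
feature_counts = defaultdict(Counter)
for color, label in data:
    feature_counts[label][color] += 1

# Maximum-likelihood estimates are plain relative frequencies.
priors = {c: n / len(data) for c, n in class_counts.items()}
conditionals = {c: {v: n / class_counts[c] for v, n in counts.items()}
                for c, counts in feature_counts.items()}

print(priors)                          # {'apple': 0.5, 'orange': 0.5}
print(conditionals["apple"]["red"])    # 2/3 ≈ 0.667
```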
Despite their naive design and apparently oversimplified assumptions, naive
Bayes classifiers have worked quite well in many complex real-world
situations. In 2004, an analysis of the Bayesian classification problem showed
that there are sound theoretical reasons for the apparently implausible
efficacy of naive Bayes classifiers.[9] Still, a comprehensive comparison with
other classification algorithms in 2006 showed that Bayes classification is
outperformed by other approaches, such as boosted trees or random forests.
Probabilistic model
Under the naïve conditional-independence assumption, the joint probability
model factorizes into the independent feature model, that is, the naïve Bayes
probability model. The naïve Bayes classifier combines this model with a
decision rule. One common rule is to pick the hypothesis that is most probable;
this is known as the maximum a posteriori or MAP decision rule. The
corresponding classifier, a Bayes classifier, is the function that assigns a
class label \hat{y} = C_k for some k according to

\hat{y} = \underset{k \in \{1, \dots, K\}}{\operatorname{argmax}} \; p(C_k) \prod_{i=1}^{n} p(x_i \mid C_k)

The class-conditional likelihood p(\mathbf{x} \mid C_k) depends on the chosen
event model. Under the multinomial event model, where the feature vector
\mathbf{x} = (x_1, \dots, x_n) counts how many times each term occurs, it takes
the form
p(\mathbf{x} \mid C_k) = \frac{\left(\sum_i x_i\right)!}{\prod_i x_i!} \prod_i p_{ki}^{x_i}
where p_{ki} is the probability that class C_k generates term i and x_i is the
number of times term i occurs. This multinomial event model is typically used
for document classification with word counts. A related alternative, the
Bernoulli event model, uses binary indicators of whether each term occurs at
all; it is especially popular for classifying short texts and has the benefit
of explicitly modelling the absence of terms. Note that a naive Bayes
classifier with a Bernoulli event model is not the same as a multinomial NB
classifier with frequency counts truncated to one.
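The sketch below applies the multinomial event model together with the MAP decision rule to a toy term-count vector. The three-word vocabulary, the priors, and the per-class term probabilities p_ki are all assumptions made for illustration; the multinomial coefficient is dropped because it is identical for every class, and log-probabilities are used to avoid numerical underflow.

```python
import math

# Assumed per-class term probabilities p_ki over a toy vocabulary
# ("ball", "election", "vote"); the numbers are illustrative only.
p = {
    "sports":   [0.70, 0.20, 0.10],
    "politics": [0.05, 0.45, 0.50],
}
priors = {"sports": 0.5, "politics": 0.5}

x = [3, 0, 1]   # term counts for a short document

def log_posterior(c):
    # MAP score: log P(C_k) + sum_i x_i * log p_ki.
    # The multinomial coefficient (sum_i x_i)! / prod_i x_i! is the same
    # for every class, so it can be omitted from the comparison.
    return math.log(priors[c]) + sum(xi * math.log(pki)
                                     for xi, pki in zip(x, p[c]))

print(max(priors, key=log_posterior))   # -> sports
```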
Given a way to train a naïve Bayes classifier from labeled data, it is possible
to construct a semi-supervised training algorithm that can learn from a
combination of labeled and unlabeled data by running the supervised learning
algorithm in a loop: first train a classifier on the labeled examples alone,
then, until convergence, repeatedly predict class probabilities for the
unlabeled examples and re-train the model on those predicted probabilities
rather than on hard labels.
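A compact sketch of this loop is shown below, using scikit-learn's MultinomialNB as the base learner. The choice to fold the unlabeled examples back in as class-weighted copies (one per class, weighted by the predicted probability), and the fixed iteration count standing in for a convergence test, are simplifying assumptions of this sketch rather than the only way to implement the idea.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

def semi_supervised_nb(X_lab, y_lab, X_unlab, n_iter=10):
    """Train on labeled data, then repeatedly (1) predict class
    probabilities for the unlabeled data and (2) retrain on the labeled
    data plus probability-weighted copies of the unlabeled data."""
    classes = np.unique(y_lab)                 # same order as predict_proba columns
    model = MultinomialNB().fit(X_lab, y_lab)  # start from labeled data only
    for _ in range(n_iter):                    # fixed iterations stand in for convergence
        proba = model.predict_proba(X_unlab)   # P(C_k | x) for unlabeled examples
        # Labeled examples keep weight 1; each unlabeled example appears
        # once per class, weighted by its predicted class probability.
        X_all = np.vstack([X_lab] + [X_unlab] * len(classes))
        y_all = np.concatenate(
            [np.asarray(y_lab)] + [np.full(len(X_unlab), c) for c in classes])
        w_all = np.concatenate(
            [np.ones(len(X_lab))] + [proba[:, i] for i in range(len(classes))])
        model = MultinomialNB().fit(X_all, y_all, sample_weight=w_all)
    return model
```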
Conclusion
Despite the fact that the far-reaching independence assumptions are often
inaccurate, the naive Bayes classifier has several properties that make it
surprisingly useful in practice. In particular, the decoupling of the class
conditional feature distributions means that each distribution can be
independently estimated as a one-dimensional distribution. This helps
alleviate problems stemming from the curse of dimensionality, such as the
need for data sets that scale exponentially with the number of features. While
naive Bayes often fails to produce a good estimate for the correct class
probabilities,[19] this may not be a requirement for many applications. For
example, the naive Bayes classifier will make the correct classification under
the MAP decision rule so long as the correct class is estimated to be more
probable than any other class; this holds even if the probability estimates are
slightly, or even grossly, inaccurate. In this manner, the overall classifier can be robust
enough to ignore serious deficiencies in its underlying naive probability
model.
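As a small illustration of this point, the comparison below contrasts a hypothetical "true" posterior with a grossly miscalibrated estimate; the numbers are invented, and the only thing that matters for the MAP decision is that the ranking of the classes is preserved.

```python
# Hypothetical true posterior versus a badly miscalibrated estimate.
true_posterior      = {"apple": 0.60, "orange": 0.40}
estimated_posterior = {"apple": 0.99, "orange": 0.01}   # grossly overconfident

# The MAP decision depends only on which class scores highest, so both
# sets of numbers lead to the same predicted label.
assert (max(true_posterior, key=true_posterior.get)
        == max(estimated_posterior, key=estimated_posterior.get))
print(max(estimated_posterior, key=estimated_posterior.get))   # -> apple
```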