Assignment 3 Part 1 and 4
Naïve Bayes is a probabilistic machine learning algorithm based on Bayes’ theorem. It is commonly
used for classification, particularly in NLP and image recognition.
Discrete Naïve Bayes assumes that features follow a categorical or discrete structure. For example,
the count of a specific word in a text document is discrete in nature.
Continuous Naïve Bayes assumes continuous or real-valued features. It models each feature using a
continuous probability distribution, such as a Gaussian. Sensor readings are an example of
continuous features.
Classification using Naïve Bayes has a training phase and a prediction phase.
Training Phase:
Discrete Case:
Estimate conditional probabilities for each feature given each class (P(feature|class)).
Continuous Case:
Compute mean and variance for each feature within each class.
Prediction Phase:
Given new data (a set of feature values), compute the likelihood of those features under each class.
Use Bayes' theorem to calculate the posterior probability of each class.
Assign the class with the highest posterior probability as the prediction (a sketch of this step
follows below).
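For illustration, here is a minimal sketch of the prediction step for the continuous (Gaussian) case, assuming the class priors, per-class means, and per-class variances have already been estimated; the function and variable names are only illustrative, not part of the assignment:

```python
import numpy as np

def predict(x, priors, means, variances):
    """Return the class with the highest posterior for feature vector x.

    priors:    dict mapping class label -> P(class)
    means:     dict mapping class label -> per-feature mean vector
    variances: dict mapping class label -> per-feature variance vector
    Log-probabilities are used to avoid numerical underflow.
    """
    best_class, best_log_post = None, -np.inf
    for c in priors:
        # Sum of log Gaussian likelihoods over the features, plus the log prior
        log_likelihood = -0.5 * np.sum(
            np.log(2 * np.pi * variances[c]) + (x - means[c]) ** 2 / variances[c]
        )
        log_post = np.log(priors[c]) + log_likelihood
        if log_post > best_log_post:
            best_class, best_log_post = c, log_post
    return best_class
```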
Discrete Case:
For a particular feature value given a class, count the occurrences of that value
within that class and divide by the total count for that class.
Continuous Case:
For a Gaussian distribution, estimate the mean and variance of each feature within
each class, using the sample mean and variance of the feature values observed in that
class (see the estimation sketch after these two cases).
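A minimal sketch of both estimation steps, assuming the training data is held in NumPy arrays; the helper names (estimate_discrete, estimate_gaussian) are illustrative, not prescribed by the assignment:

```python
import numpy as np

def estimate_discrete(X, y, n_values):
    """Estimate P(feature value | class) by counting.

    X: integer feature matrix of shape (n_samples, n_features)
    y: class labels of shape (n_samples,)
    n_values: number of distinct values each feature can take
    """
    cond_prob = {}
    for c in np.unique(y):
        Xc = X[y == c]
        # For every feature, count each value within the class and normalize
        cond_prob[c] = np.array([
            np.bincount(Xc[:, j], minlength=n_values) / len(Xc)
            for j in range(X.shape[1])
        ])
    return cond_prob

def estimate_gaussian(X, y):
    """Estimate the per-class sample mean and variance of each feature."""
    classes = np.unique(y)
    means = {c: X[y == c].mean(axis=0) for c in classes}
    variances = {c: X[y == c].var(axis=0) for c in classes}
    return means, variances
```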
Why Smoothing?
To handle feature-class combinations that never appear in the training data; without
smoothing, their estimated probability is zero, which forces the entire posterior for
that class to zero.
Smoothing Techniques:
Discrete Case:
Laplace (add-one) smoothing adds a small pseudo-count to every feature-class
combination so that no conditional probability is exactly zero.
Continuous Case:
Add a small constant to the estimated variances so that no feature has a zero or
near-zero variance (a sketch of both adjustments follows below).
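A minimal sketch of both adjustments, again with illustrative names and default constants (alpha, epsilon) that are assumptions rather than assignment requirements:

```python
import numpy as np

def estimate_discrete_smoothed(X, y, n_values, alpha=1.0):
    """Laplace-smoothed estimate of P(feature value | class).

    alpha pseudo-counts are added to every (value, class) pair so that
    combinations unseen in training still get a small nonzero probability.
    """
    cond_prob = {}
    for c in np.unique(y):
        Xc = X[y == c]
        counts = np.array([
            np.bincount(Xc[:, j], minlength=n_values)
            for j in range(X.shape[1])
        ])
        cond_prob[c] = (counts + alpha) / (len(Xc) + alpha * n_values)
    return cond_prob

def smooth_variances(variances, epsilon=1e-9):
    """Add a small constant to every variance to avoid degenerate Gaussians."""
    return {c: v + epsilon for c, v in variances.items()}
```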
Implementing Naïve Bayes involves computing these probabilities and applying the appropriate
techniques based on the nature of the features (discrete or continuous) and handling potential issues
like zero probabilities through smoothing.
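For reference, scikit-learn provides both variants as MultinomialNB and GaussianNB; the toy snippet below only illustrates the calling convention and is not the assignment's own implementation:

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB, GaussianNB

# Tiny toy data, only to show the calling convention
X_counts = np.array([[2, 0, 1], [0, 3, 0], [1, 1, 0], [0, 2, 1]])  # word counts
X_real = np.random.default_rng(0).normal(size=(4, 3))              # sensor-like values
y = np.array([0, 1, 0, 1])

# Discrete (count) features: alpha controls Laplace smoothing
discrete_model = MultinomialNB(alpha=1.0).fit(X_counts, y)

# Continuous features: var_smoothing adds a small value to the variances
continuous_model = GaussianNB(var_smoothing=1e-9).fit(X_real, y)

print(discrete_model.predict(X_counts), continuous_model.predict(X_real))
```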
Part 4:
Results Analysis:
Accuracy:
Challenges Faced:
Handling High-Dimensional Data: Naïve Bayes assumes feature independence, which often does not
hold in high-dimensional data like images, where neighboring pixels are strongly correlated.
Distribution Assumptions: Gaussian assumptions for continuous features might not perfectly
match the true distribution of pixel values.
Class Imbalance: Some digits might be underrepresented, affecting model performance.
Possible Improvements:
Feature Engineering: Extract more informative features from images that might better
represent the digits.
Model Selection: Explore other classification models that handle high-dimensional data better
than Naïve Bayes, such as ensemble methods or deep learning models.
Normalization or Transformation: Preprocess the image data so that it conforms better to the
assumptions made by Naïve Bayes or other models (see the preprocessing sketch after this list).
Address Class Imbalance: Strategies such as oversampling the minority classes or undersampling
the majority classes could help balance the dataset.
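As a sketch of the normalization idea above, the snippet below scales pixel values to [0, 1] (with binarization shown as a commented alternative) before fitting a Gaussian Naïve Bayes model; scikit-learn's small load_digits set is used here only as a stand-in for MNIST:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# load_digits is a small stand-in for MNIST (8x8 grayscale digits)
X, y = load_digits(return_X_y=True)
X = X / X.max()                     # scale pixel values to [0, 1]
# Alternative: binarize pixels, which can suit Naive Bayes assumptions better
# X = (X > 0.5).astype(float)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = GaussianNB().fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
```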
Further Investigation:
Analyze misclassified instances or classes to understand why certain digits are harder to
predict (see the confusion-matrix sketch below).
Experiment with different hyperparameters or preprocessing techniques to enhance model
performance.
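One way to carry out the misclassification analysis is with a confusion matrix; this sketch reuses the same stand-in dataset and is an assumption about the workflow, not the assignment's actual analysis:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X / X.max(), y, random_state=0)
pred = GaussianNB().fit(X_train, y_train).predict(X_test)

cm = confusion_matrix(y_test, pred)
errors = cm - np.diag(np.diag(cm))  # zero the diagonal so only mistakes remain
# Report the five most frequent (true digit, predicted digit) confusions
for idx in np.argsort(errors, axis=None)[::-1][:5]:
    true_d, pred_d = np.unravel_index(idx, errors.shape)
    print(f"true {true_d} predicted as {pred_d}: {errors[true_d, pred_d]} times")
```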
Improving the Naïve Bayes classifier's performance might involve a combination of data
preprocessing, model selection, and hyperparameter tuning to better suit the complexities of the
MNIST dataset.