Wine Quality Classification Using Weka


Darshan Pathak 123M1H041

Supervised By:
Prof. Bali Khurana
01
Classification
What is Classification?
Classification in Machine Learning aims to determine which category an
observation belongs to by understanding the relationship between the dependent and
independent variables.

For example, a classification algorithm can learn to predict whether a given
email is spam or not spam.
Working of Classification Algorithms
Classification algorithms sort data into predefined categories based on patterns they
learn from labeled examples. They use features (data attributes) to create a model
during training, and then apply this model to predict the classes of new, unseen
data.
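The train-then-predict cycle described above can be sketched with a nearest-centroid classifier. This is a minimal illustration only, not the algorithm used later in the case study; the two features (alcohol, volatile acidity), the labels "low" and "high", and all sample values are invented for the example.

```python
def train(samples):
    """Build a model from labeled examples: one mean feature vector per class."""
    sums, counts = {}, {}
    for features, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict(model, features):
    """Assign the class whose centroid is closest to the new observation."""
    def squared_distance(centroid):
        return sum((c - f) ** 2 for c, f in zip(centroid, features))
    return min(model, key=lambda label: squared_distance(model[label]))


# Hypothetical labeled examples: ([alcohol, volatile acidity], quality label).
training = [
    ([9.5, 0.70], "low"),
    ([9.8, 0.65], "low"),
    ([12.0, 0.30], "high"),
    ([12.4, 0.35], "high"),
]

model = train(training)                 # training phase: learn from labeled data
prediction = predict(model, [11.9, 0.32])  # prediction phase: new, unseen data
print(prediction)
```

The model here is just a dictionary of class centroids; a decision tree stores learned split rules instead, but the two phases are the same.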
Types of Classification

• Binary Classification
• Multi-Class Classification
• Multi-Label Classification
• Imbalanced Classification


02
Case Study
Let’s learn with the help of the Wine Quality case study, using Weka software.
What is WEKA Software?
• Waikato Environment for Knowledge Analysis
• Collection of machine learning algorithms and data processing tools
implemented in Java
• Used for the process of experimental data mining
• Preparation of input data
• Statistical evaluation of learning schemes
• Visualization of input data and the result
1. Install Weka
2. Load Data (Wine Quality Dataset)
3. Visualize
4. Select J48 Classifier
5. Evaluate Result
Summary

Classifier: J48
• Decision tree classifier
• Recursively splits the data based on attribute values
• Selects features to create decision nodes and branches
• Emphasizes information gain
• Effective for categorical data

Classification results
Correctly Classified Instances      982     61.4 %
Incorrectly Classified Instances    617     38.6 %
Kappa statistic                     0.3952
Mean absolute error                 0.1359
Root mean squared error             0.3332
Relative absolute error             63.3475 %
Root relative squared error         101.8207 %
Total Number of Instances           1599
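The information gain mentioned for J48 can be illustrated with a short calculation. This is a minimal sketch on invented toy labels, not Weka's code: J48 implements the C4.5 algorithm, which normalizes this quantity into a gain ratio, and that step is left out here. The function names are chosen for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(labels, groups):
    """Entropy before a split minus the size-weighted entropy after it."""
    total = len(labels)
    remainder = sum(len(g) / total * entropy(g) for g in groups)
    return entropy(labels) - remainder


# Toy data: four wines, two candidate splits of the same labels.
labels = ["good", "good", "bad", "bad"]
perfect_split = [["good", "good"], ["bad", "bad"]]   # each branch is pure
useless_split = [["good", "bad"], ["good", "bad"]]   # branches as mixed as before

gain_perfect = information_gain(labels, perfect_split)
gain_useless = information_gain(labels, useless_split)
print(gain_perfect, gain_useless)
```

A tree learner of this kind evaluates candidate splits at each node and prefers the attribute with the higher score, which is why the split that produces pure branches would be chosen here.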
Evaluation Metrics
• TP Rate: Proportion of actual positive instances correctly identified.
• FP Rate: Proportion of actual negative instances incorrectly classified as positive.
• Precision: The accuracy of positive predictions, representing the ratio of true positives to the total predicted positives.
• Recall: The model's ability to correctly identify all actual positive instances.
• F-Measure: Combines precision and recall into a single metric, balancing both aspects of classification performance.
• MCC: Matthews Correlation Coefficient; considers true and false positives and negatives to assess classification quality.
• ROC Area: Measures the model's ability to distinguish between classes.
• PRC Area: Quantifies the model's precision-recall trade-off.
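The count-based metrics above can be written out for a single class treated as "positive". This is a minimal sketch with invented counts; returning 0.0 when a denominator is zero is a choice made for the example. ROC Area and PRC Area are omitted because they require ranked prediction scores, not just counts.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Compute count-based metrics from true/false positives and negatives."""
    def ratio(numerator, denominator):
        # Assumption for this sketch: report 0.0 when the ratio is undefined.
        return numerator / denominator if denominator else 0.0

    tp_rate = ratio(tp, tp + fn)      # identical to recall
    fp_rate = ratio(fp, fp + tn)
    precision = ratio(tp, tp + fp)
    f_measure = ratio(2 * precision * tp_rate, precision + tp_rate)
    mcc = ratio(
        tp * tn - fp * fn,
        math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    )
    return {
        "tp_rate": tp_rate,
        "fp_rate": fp_rate,
        "precision": precision,
        "recall": tp_rate,
        "f_measure": f_measure,
        "mcc": mcc,
    }


# Hypothetical counts, not taken from the wine results.
example = binary_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in example.items():
    print(f"{name:10s} {value:.3f}")
```

For a multi-class problem, each class gets its own set of counts by treating that class as positive and all others as negative.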
Detailed Accuracy By Class
TP Rate  FP Rate  Precision  Recall  F-Measure  MCC     ROC Area  PRC Area  Class
0.1      0.006    0.091      0.1     0.095      0.089   0.541     0.026     1
0.113    0.032    0.107      0.113   0.11       0.079   0.526     0.047     2
0.711    0.237    0.689      0.711   0.7        0.472   0.76      0.652     3
0.614    0.259    0.612      0.614   0.613      0.355   0.706     0.579     4
0.497    0.056    0.559      0.497   0.527      0.465   0.789     0.418     5
0        0.008    0          0       0          -0.009  0.611     0.022     6
0.614    0.213    0.611      0.614   0.612      0.403   0.732     0.563     Weighted Avg.
Confusion Matrix
       a     b     c     d     e     f
a      1     3     2     3     1     0    a=1
b      4     6    24    16     3     0    b=2
c      2    24   484   154    17     0    c=3
d      3    19   169   392    50     5    d=4
e      1     4    23    65    99     7    e=5
f      0     0     0    11     7     0    f=6

A confusion matrix is a compact table summarizing the performance of a classification model, detailing true
positive, true negative, false positive, and false negative predictions for each class, aiding in model evaluation
and error analysis.
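The summary figures can be recomputed directly from the confusion matrix, reading rows as actual classes and columns as predicted classes (the layout Weka prints). This is a minimal sketch; the helper names are chosen for the example and are not part of Weka.

```python
# Confusion matrix copied from the slide: rows = actual, columns = predicted.
MATRIX = [
    [1, 3, 2, 3, 1, 0],
    [4, 6, 24, 16, 3, 0],
    [2, 24, 484, 154, 17, 0],
    [3, 19, 169, 392, 50, 5],
    [1, 4, 23, 65, 99, 7],
    [0, 0, 0, 11, 7, 0],
]


def totals(matrix):
    """Row sums (actual counts), column sums (predicted counts), grand total."""
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(col) for col in zip(*matrix)]
    return row_sums, col_sums, sum(row_sums)


def accuracy(matrix):
    """Fraction of instances on the diagonal (correctly classified)."""
    _, _, n = totals(matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / n


def kappa(matrix):
    """Cohen's kappa: agreement beyond what chance alone would produce."""
    row_sums, col_sums, n = totals(matrix)
    observed = accuracy(matrix)
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / (n * n)
    return (observed - expected) / (1 - expected)


def one_vs_rest(matrix, k):
    """Counts (tp, fp, fn, tn) when class index k is treated as positive."""
    row_sums, col_sums, n = totals(matrix)
    tp = matrix[k][k]
    fn = row_sums[k] - tp
    fp = col_sums[k] - tp
    tn = n - tp - fn - fp
    return tp, fp, fn, tn


print(f"accuracy = {accuracy(MATRIX):.3f}")
print(f"kappa    = {kappa(MATRIX):.4f}")
print("class 3 (tp, fp, fn, tn) =", one_vs_rest(MATRIX, 2))
```

The diagonal sums to 982 of 1599 instances, which reproduces the reported 61.4% accuracy, and the kappa computation gives 0.3952. For class 3 the counts are (484, 218, 197, 700), so its TP rate is 484/681 ≈ 0.711 and its precision is 484/702 ≈ 0.689, matching the Detailed Accuracy By Class table.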
03
Conclusion
This presentation on wine quality classification using the J48
algorithm in Weka highlights the significance of classification
in machine learning, specifically exploring the Weka software
and its application to the Wine Quality dataset. The detailed
accuracy metrics and confusion matrix provide valuable
insights into model performance, aiding informed decision-
making with classification in data mining.
Thank You
