An Approach Based Iris Flower Species Recognition Using Machine Learning Classifiers
This is an open access article distributed under the Creative Commons Attribution License
Copyright ©2017. e-ISSN: 2520-789X
https://doi.org/10.25007/ajnu.v11n4a1636
ABSTRACT
Classification is used to predict the group membership of data instances, and machine learning techniques are being
introduced to make the classification problem simpler. The main goal of this paper is to develop a model that
categorizes Iris flowers using an Artificial Neural Network (ANN) and a Support Vector Machine (SVM). The Iris flower
data set is an example of a multivariate data set. It was first presented by Ronald Fisher, a British statistician and
biologist. The main problem in Iris flower recognition is multi-parameter taxonomic analysis: the difficulty lies in
differentiating between the Iris species (Setosa, Versicolor, and Virginica) based on the dimensions of the sepal and
petal. The Iris data set is classified by looking for patterns in the sepal and petal sizes of the flower and then
determining how each pattern predicts the class of the flower. Experimental results show that both ANN and SVM
classify Iris flowers successfully, obtaining accuracies of 98.66% and 97.79%, respectively.
Keywords: Machine Learning, Deep Learning, AI, Iris Flower.
Academic Journal of Nawroz University (AJNU), Vol.11, No.4, 2022
This research describes an approach for identifying Iris flower species. It operates in two stages: training and testing. During training, a Machine Learning (ML) method is loaded with the training dataset, and labels are given to the data. The predictive model then indicates the species to which an Iris flower belongs, so the predicted Iris species is identified. This research focuses on machine learning-based IRIS flower classification. The problem statement is to identify IRIS flower species using measurements of their floral attributes. The IRIS data set is classified by finding patterns in the petal and sepal sizes of the flowers and determining how each pattern predicts the class of the flower. In this study, we use data to train the machine learning model, and when previously unseen data is encountered, the algorithm predicts the species based on what it has learned from the training data.

2. Literature Review
Categorizing distinct database objects into one or more groups or categories is a data mining task known as classification. The objective of the classification step is to assign each instance to the relevant target class. This section provides an overview of the most recent and practical classification methods that researchers have developed in the last two years across many ML domains. Additionally, it only employs k-Nearest Neighbors, random forests, and decision trees as classifiers.

Zainab Iqbal [7] uses the Gaussian Naive Bayes technique to categorize the species of the iris flower. The iris dataset is analyzed using a scatter matrix and a scatter plot. Both the algorithm and Python are used in the paper to categorize the species of iris flowers. The technique is effective for supervised classification, achieving a 95% accuracy rate. Mijwil and Abttan [8] applied a genetic algorithm to the C4.5 decision tree as a way to lessen the impact of overfitting. The datasets used were IRIS, Car Assessment, Bottle, and WINE, all of which may be found in the UCI ML library. The issue with the C4.5 classifier is that it overfits because of its large number of nodes and branches, and this overfitting may undermine the classification system. The experimental results demonstrated that, with an accuracy of roughly 92%, the genetic algorithm was effective in reducing the effects of overfitting on the four datasets and increasing the Confidence Factor (CF) of the C4.5 decision tree. Rong-Guo Huang [9] focuses on flower detection using Difference Image Entropy (DIE), a feature extraction-based method. The experimental findings show an average recognition rate of 95%. The DIE-based approach uses pre-processing and DIE computation to provide a recognition result from an original image of the flower.

Patrick [10] concentrated on statistical analysis of the dataset using the iris flower example. Two alternative approaches are examined in the study. The dataset is plotted to identify the various classification patterns, and statistical data are then retrieved using a Java program the authors developed. Poojitha [11] employed neural networks, from the branch of computer science called machine learning, to examine iris flower data sets. The iris dataset was loaded and divided into three groups using the k-means technique. The main use of a neural network is large-scale information aggregation. It is also employed in data mining, vector quantization, function approximation, image segmentation, and feature extraction. Without any supervision, the findings are divided into three distinct iris species. Lakhdoura and Elayachi [12] used WEKA 3.9 to compare the performance of two classifiers, J48 (C4.5) and RF, on the IRIS features. The IRIS plant dataset, one of the most popular datasets for classification problems, is available from the University of California, Irvine (UCI) ML library. The researchers also compared the outcomes of both classifiers on numerous performance evaluation metrics. According to the results, the J48 classifier performs better than the Random Forest (RF) classifier at predicting the IRIS variety on a range of measures, including classification precision, mean absolute error, and model build time. The accuracy of the J48 classifier is 95.83%, while that of the Random Forest is 95.55%.

Numerous studies have used different methods to identify the species of the iris flower, and every study employs a different method. The problem is the categorization and identification of iris flower species based on their characteristics. After examining the characteristics of the iris flowers, we classify the iris data set by identifying patterns, and we then determine how the patterns are processed to produce the class of the iris flower. With this classification and these patterns, future predictions for unknown data can be made with greater accuracy. The iris flower dataset is fed into the machine learning model for the iris flower species approach.

3. Developed Method
Classification is a data mining technique that divides data instances into a small number of classes. Numerous models built on ML identification techniques have been developed, each seeking to surpass the others. They all employ statistical techniques, including, but not limited to, decision trees (DT), SVM, and ANN. These techniques examine the available data in various ways to make predictions [13]. Figure 1 displays the conceptual framework developed in this work. The technique for classifying iris flower species has two stages: training and testing. During the training stage, the machine learning models are fed data sets. The predictive model then indicates which category an iris flower belongs to. In this study, we use ML techniques to classify the different kinds of iris flowers. By analyzing the sepal and petal sizes of the flowers, we must find patterns in the iris data set in order to categorize it. The classification of the iris flower is then formed by examining the pattern closely to make the prediction. The machine learning model is trained by giving it data sets, and when previously unseen data is encountered, it predicts the species of the flower based on what it has learned from the training data. Our work is to identify iris species based on floral characteristics.
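The two stages described above (training, then testing on unseen flowers) can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the configuration used in this work: the 75%/25% split ratio, the random seed, the RBF kernel, the single hidden layer of 16 units, and the feature standardization are all choices made here for the example, since the hyperparameters are not listed in the text.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Load the 150-sample Iris data: 4 measurements per flower, 3 species labels.
X, y = load_iris(return_X_y=True)

# Hold out 25% of the samples for the testing stage (assumed ratio and seed).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Assumed model settings, chosen only for illustration.
models = {
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "ANN": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
    ),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)          # training stage: learn from labeled data
    predictions = model.predict(X_test)  # testing stage: predict unseen flowers
    scores[name] = accuracy_score(y_test, predictions)
    print(f"{name}: test accuracy = {scores[name]:.3f}")
```

The exact accuracies depend on the split, the seed, and the library version, so they should not be expected to match the figures reported in this paper.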
4.1 Dataset
The dataset for this study was obtained from the UCI Machine Learning Repository. The Iris flower data set, often known as Fisher's Iris data set, is a multivariate data set. It includes 50 samples from each of the three Iris species (Iris setosa, Iris versicolor, and Iris virginica). Each sample was measured (in centimeters) for four characteristics: sepal length, sepal width, petal length, and petal width. The Iris species are depicted in Figure 4.

The Iris flower data collection therefore contains 150 observations, each belonging to one of the three target categories and described by four features (sepal length, sepal width, petal length, and petal width). In this stage, we analyze the dataset mathematically to evaluate the performance of the method. Examples from the IRIS dataset are shown in Table 1.
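The record layout described above (four measurements in centimeters plus a species label) can be sketched as a small data structure. The `IrisSample` class, the column names, and the `parse_samples` helper are hypothetical names introduced for this example, and the six embedded rows are illustrative only, not the full 150-sample data set.

```python
import csv
import io
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class IrisSample:
    """One flower: four measurements in centimeters and its species label."""
    sepal_length: float
    sepal_width: float
    petal_length: float
    petal_width: float
    species: str


# Illustrative rows only (two per species), in an assumed CSV layout.
RAW = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.4,3.2,4.5,1.5,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
"""


def parse_samples(text: str) -> List[IrisSample]:
    """Parse CSV text into typed samples, converting measurements to float."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        IrisSample(
            sepal_length=float(row["sepal_length"]),
            sepal_width=float(row["sepal_width"]),
            petal_length=float(row["petal_length"]),
            petal_width=float(row["petal_width"]),
            species=row["species"],
        )
        for row in reader
    ]


samples = parse_samples(RAW)
counts = Counter(sample.species for sample in samples)
for species, count in sorted(counts.items()):
    print(f"{species}: {count} samples")
```

Applied to the complete data set, the same count would give 50 samples for each of the three species, which is a quick check that the file was read correctly.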
The preceding rule can be parsed as follows. If an iris flower satisfies the rule, it is assumed to be an Iris Versicolor. The rule is a conjunction of three clauses, each of which is a disjunction of literals. A literal is a binary decision, for example whether the sepal length is greater than 6.3. According to the aforementioned rule, a clause is satisfied when at least one of its literals is satisfied, and the rule is satisfied when every clause is satisfied. In this work, we are interested in learning logical formulas that represent classification rules.

Equations (2–4) were used to calculate accuracy, recall, and precision, as shown below.

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (2)$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \qquad (3)$$

$$\mathrm{Pre} = \frac{TP}{TP + FP} \times 100\% \qquad (4)$$

The terms used in the above equations come from the confusion matrix and are defined as follows. True Positive (TP) is an outcome in which the model correctly predicts the positive class; False Positive (FP) is an outcome in which the model incorrectly predicts the positive class; False Negative (FN) is an outcome in which the model incorrectly predicts the negative class; and True Negative (TN) is an outcome in which the model correctly predicts the negative class.

Table 2: Evaluation Results on IRIS Classification Based on Data Splitting with 75% Training and 25% Testing
Classifier    Accuracy (%)    Recall (%)    Precision (%)
ANN           98.66           96.32         98
SVM           96.79           93.31         93.88

Two classifiers, ANN and SVM, were assessed with this method on the IRIS dataset. The findings show that the classifiers give different results on different datasets owing to their functional differences. Table 2 reports the accuracy, precision, and recall of the method. The ANN algorithm outperforms SVM on all three measures. This study's final finding is that ANN outperforms SVM in terms of accuracy.

Table 3: Evaluation Results on IRIS Classification Based on Data Splitting with 80% Training and 20% Testing
Classifier    Accuracy (%)    Recall (%)    Precision (%)
ANN           96.72           95.09         96.72
SVM           95.03           93.01         92.31

The ANN and SVM classifiers employed in this study have also been documented in the related works, which highlight the tasks the researchers addressed with each approach. In this section, the findings of this study are compared with those of related works. A study [20] employed J48 and RF on the IRIS dataset and achieved 95.83% accuracy for J48 and 95.55% accuracy for RF. The study in [21] presents a method for iris classification using different machine learning techniques, including LDA, PCA, RF, and LR. The same dataset was used in the evaluation; LDA achieved the highest accuracy among them at 100%, whereas the other techniques achieved lower accuracy than this work, obtaining 86%, 94%, and 96%, respectively. Moreover, a study in [22] used only DT for iris classification and obtained 98%. That method achieved higher accuracy than the SVM in this study, whereas the ANN in this study achieved higher accuracy than that method.

5. Conclusion
In this research, we trained models on our data using strong algorithms. The best results can only be obtained by data processing, and as the findings above show, the results are satisfactory. The models evaluated above, the best of which reaches an accuracy of 98.66%, make it possible to predict the species of the iris flower. We can draw the conclusion that it will be feasible to identify the species of any flower in the future with the right information on its characteristics.

References
[1] Bhutada, S., K. Tejaswi and S. Vineela. Flower Recognition Using Machine Learning. International Journal Of Researches In Biosciences, Agriculture And Technology, 4(2), 67-73, 2021.
[2] Khalid, L. F., Abdulazeez, A. M., Zeebaree, D. Q., Ahmed, F. Y., & Zebari, D. A. (2021, July). Customer churn prediction in telecommunications industry based on data mining. In 2021 IEEE Symposium on Industrial Electronics & Applications (ISIEA) (pp. 1-6). IEEE.
[3] Haji, S. H., Abdulazeez, A. M., Zeebaree, D. Q., Ahmed, F. Y., & Zebari, D. A. (2021, July). The Impact of Different Data Mining Classification Techniques in Different Datasets. In 2021 IEEE Symposium on Industrial Electronics & Applications (ISIEA) (pp. 1-6). IEEE.
[4] P. Galdi and R. Tagliaferri, “Data mining: accuracy and error measures for classification and prediction,” Encycl. Bioinforma. Comput. Biol., pp. 431–6, 2018.
[5] Chicho, B. T., Abdulazeez, A. M., Zeebaree, D. Q., & Zebari, D. A. (2021). Machine learning classifiers-based classification for IRIS recognition. Qubahan Academic Journal, 1(2), 106-118.
[6] Rao, T. S., Hema, M., Priya, K. S., Krishna, K. V., & Ali, M. S. (2021). Iris Flower Classification Using Machine Learning. Network, 9(6).
[7] Shilpi Jain, V Poojitha, “By Using Neural Network Clustering tool in MATLAB Collecting the IRIS Flower”, Proc. IEEE, vol. 109, 2020.
[8] M. M. Mijwil and R. A. Abttan, “Utilizing the Genetic Algorithm to Pruning the C4.5 Decision Tree Algorithm,” Asian J. Appl. Sci. ISSN 2321–0893, vol. 9, no. 1, 2021.
[9] Roung-Guo Huang, Sang-Hyeon Jin, Jung-Hyun Kim and Kwang-Seck Hong, “Flower Image Recognition Using Difference Image Entropy”. DOI: 10.1145/1821748.1821868