net/publication/370189362
3 authors, including:
Jayashree Deka
Assam Don Bosco University
7 PUBLICATIONS 27 CITATIONS
All content following this page was uploaded by Jayashree Deka on 27 April 2023.
ORIGINAL CONTRIBUTION
J. Inst. Eng. India Ser. B
species in order to gain maximum growth and better yield.

Literature Search

Research on fish classification and recognition tasks started back in the year 1993 [11]. Fish recognition was first investigated on dead demersal and pelagic fishes, where color and shape features were used for recognition [11]. They reported 100% sorting for demersal and 98% for pelagic fish, respectively.

Cadieux et al. 2000 developed a system for monitoring fish using a Silhouette Sensor. Fish silhouettes were used for extracting the shape features using Fourier Descriptor and other shape parameters, i.e., area, perimeter, length, height, compactness, convex hull area, concavity rate, image area, and the height of the image [12]. A multi-classifier approach was used for classification by incorporating a Bayes maximum likelihood classifier, a Learning Vector Quantization classifier, and a One-Class-One-Network (OCON) neural network classifier. The combined classifier resulted in an accuracy of 77.88%.

Fish classification from videos in a constrained environment was explored based on the texture pattern and shape of the Striped Trumpeter and the Western Butterfish from 320 images [13]. The SVM classifier with a linear kernel outperformed the polynomial kernel with an accuracy of 90%.

Spampinato et al. performed a classification of 10 fish species using a dataset of 360 images. They extracted texture features using Gabor filtering, and boundary features by using Fourier descriptors and curvature analysis [14]. The system showed an accuracy of 92%. Alsmadi et al. 2011 developed a backpropagation classifier-based fish image recognition. They extracted fish color texture features from 20 different fish families using GLCM and achieved an overall testing accuracy of 84% on 110 testing fish images [15].

Images from Birch Aquarium were studied using a Haar detector and classifier for classifying Scythe Butterflyfish images [16]. Hu et al. utilized color, texture, and shape features of six freshwater fish images that were collected from ponds in China [17]. Color features were extracted from skin images by transforming RGB images into HSV color space, whereas statistical texture features and wavelet texture features were obtained from a total of 540 fish images [17]. The accuracy achieved was 97.96%, with some misclassification due to similar texture features among the species.

In a separate experiment, features were extracted by SIFT and SURF detectors for Tilapia fish identification [18]. It was observed that the performance with SIFT features (69.57%) was lower than that with SURF features (94.4%). Another fish recognition model was introduced for 30 fish species from Thailand based on shape and texture features. The system achieved a precision rate of 99% with the ANN classifier [19]. Rodrigues et al. used SIFT feature extraction and Principal Component Analysis (PCA) for the classification of 6 fish species from a dataset of 162 images and achieved an accuracy of 92% [20].

Fish recognition from live fish images of the South Taiwan sea was performed using 66 different attributes based on color, texture, and shape features [21]. As the recognition task depended on the detection of the fish in the video, the system showed a poor recognition rate. Different feature descriptors for different parts of fish images were employed in [22]. SIFT and color histogram methods extracted partial features from images and a partial hierarchical classifier predicted the images with a score of 92.1% on the FISH4KNOWLEDGE dataset [22]. Mattew et al. 2017 used landmark selection and geometrical features for the identification of four Indian marine fish using ANN and they achieved an accuracy of 83.33% [29].

The popularity of Convolutional Neural Networks (CNNs) has been proved in fish detection and species classification tasks too. A deep CNN based on the Alex-Net model was used for fish species classification on the LifeClef-15 dataset and it gained an F-score of 73.5% on underwater videos [30]. Video clips of 16 species collected from the Western Australia sea were studied using a pre-trained deep CNN where cross-layer pooling was used as the feature extractor and classification was performed using an SVM classifier, thereby achieving an accuracy of 94.3% [31].

Different fish images were collected from the internet by the authors in [32]. They used a deep residual 50-layer network for residual feature extraction and an SVM classifier on those images. They examined the effect of the frozen layer on the overall system performance. The work reported an accuracy of 97.19%. Salman et al. 2019 used CNN-SVM for best accuracy on the same dataset as in [31], where the maximum accuracy they claimed was 90% [33]. Khalifa et al. [34] proposed a 4-layer CNN for image classification on the QUT and LifeClef-15 datasets and achieved a testing accuracy of 85.59%.

Islam et al. performed classification experiments on eight (8) different types of small indigenous fish in Bangladesh [35]. They developed an HLBP (hybrid census transform) feature descriptor for feature extraction and used an SVM classifier to classify the fish images [35]. They achieved an accuracy of 94.97% with a linear SVM kernel. A 32-layer CNN network was employed on 6 freshwater fish species from the Fish-Pak [24] dataset [25]. They investigated the effect of different learning rates on the system performance and reported the best classification accuracy of 87.78% at a learning rate of 0.001 and epoch = 350 [25]. Images of another six small indigenous fish species (SIFS) from Bangladesh were classified using color, texture, and geometrical features [26]. A total
of 14 features were extracted, which were further reduced by using PCA and then fed to an SVM classifier, resulting in a classification accuracy of 94.2% [26]. For the first time, an experiment for finding fish species, order, and family of 68 Pantanal fish species was executed using Inception layers [28]. These layers extracted the features from the fish images and then three branches were implemented with the help of fully connected layers to predict the class, family, and species of the fish images. The technique showed maximum accuracy for order classification.

In a separate experiment conducted on the Lifeclef19 and QUT datasets, an Alex-Net with a reduced number of convolutional layers and an extra soft attention layer added in between the last fully connected layer and the output layer was used. This gave an accuracy of 93.35% on the testing dataset [27]. A deep YOLO network was used on the LifeClef-15 and UWA datasets for fish detection and classification [36]. The classification accuracy observed was 91.64% and 79.8% on the two datasets, respectively. Deep DenseNet121 and SPP-Net were used for the classification of six different kinds of freshwater fish images downloaded from the internet in [37]. They used an ‘sgd’ optimizer for training the models and a true classification rate (recall) of 99.2% was achieved by SPP-Net [37]. Later, a pre-trained VGG-16 model was utilized on images of four major carps and was able to achieve an accuracy of 100% with fivefold validation [38]. A modified Alex-Net was studied using the QUT and LifeClef datasets [39]. They adopted a drop-out layer before the SoftMax layer that helped in achieving a higher accuracy than the traditional Alex-Net model and reported 90.48% accuracy on the validation dataset. To eliminate the problem of class imbalance, the focal loss function together with the SeResNet-152 model was used for training and achieved an accuracy of 98.8% [40]. It has been observed that, with the class balanced focal loss function, the performance increased for the species with fewer images too.

Dey et al. 2021 used a FishNet app based on a 5-layer CNN for the recognition of eight varieties of indigenous freshwater fish species from Bangladesh [41]. An ‘adam’ optimizer was chosen for simulation and three different drop-out rates were applied to the last three fully-connected layers of the CNN. This method attained a validation accuracy of 99% with a 3.25% validation loss [41]. Abinaya et al. 2021 used the Fish_Pak dataset and segmented fish heads, scales, and bodies from those images [42]. They used a Naïve Bayesian fusion layer in each training segment to increase the classification accuracy to 98.64% with the Fish-Pak dataset, while with the BYU dataset the accuracy achieved was 98.94% [42]. Jose et al. [43] used deep octonion networks as feature extractors and a soft-max classifier for tuna fish classification. This method reported an accuracy of 98.01% on the self-collected tuna fish dataset and 98.17% accuracy on the Fish-Pak dataset.

It is very clear that CNN-based classification models provide better classification accuracy compared to traditional machine learning models, even with the typical constraints of translation, rotation, and overlapping [44]. Also, fish classification using deep learning requires a lot of training data, which poses a challenge for researchers working alone [23].

This detailed background search on fish classification reveals that while marine fish classification is extensively studied by most researchers, only a few have investigated freshwater fish [17, 24–26, 35, 41, 42]. It has been observed that a majority of the studies on automated freshwater fish classification are restricted only to the carp families. These studies are missing out on a large variety of small indigenous freshwater fish that play an important role in the daily diet and livelihood of the common people of India. No research has been done to date that aims to classify mixed-cultured species. We also could not locate any work that specifically provides the identification/classification of any of the Indian freshwater fish species through the use of computer vision or image processing. Since SIFS are tiny compared to carp species, it is difficult to identify them just by looking at the captured images. From Fig. 1 it is seen that, visually, Systomus Sarana (q) looks similar to Catla or Rohu. So, we collected images of twenty (20) varieties of indigenous fish species, including SIFS and carps available in our locality, for designing the automatic fish classification model. The proposed classification model shows excellent performance even though small indigenous fish species are tiny compared to six of the carp species.

Materials and Methods

Overview of the Proposed System

This work proposes the classification of freshwater fish using a deep Convolutional Neural Network. Figure 1 presents the layout of the proposed classification model.

Data Acquisition and Data preparation

Twenty different varieties of freshwater fish species are considered for the fish classification purpose. For this purpose, fifty-seven (57) fish samples of each of these 20 varieties are captured from different local ponds in Assam, India using fishing nets. The images of these species are acquired in natural lighting conditions using a Samsung Galaxy M30s mobile phone that has a triple camera. We use the wide view mode of the main rear camera with a specification of 48 megapixels, an f/2.0 aperture lens, and a focal length of 26 mm. The sensor size is 1/2.0" and the pixel size is 0.8 µm. The third author of the paper, who is actively involved with the identification and study of fish
fauna across Assam, has supported the labeling of the collected images. Table 1 shows the fish image distribution per species used in the experiment. We use the Fish-Pak [24] dataset for simulating the system in its initial phases. Images of the major carps are downloaded from the Fish-Pak dataset as these are popular carps native to South Asian countries. A total of 587 images of these species are accumulated for the desired fish classification task and labeled with the help of the third author.

Table 1 Data collection per species
Species | Fish_Pak | Ours | Total
Labeo gonius | – | 20 | 20
Labeo bata | – | 28 | 28
Channa punctata | – | 13 | 13
Amblypharyngodon_mola | – | 15 | 15
Mystus Cavasius | – | 16 | 16
Guduchia Chapra | – | 18 | 18
N. Notopterus | – | 14 | 14
R. daniconus | – | 18 | 18
Trichogaster fasciata | – | 20 | 20
Systomus Sarana | – | 16 | 16
Puntius Ticto | – | 9 | 9
Heteropneustes_fossils | – | 20 | 20
Anabas_testudineus | – | 23 | 23
Puntius Sophore | – | 17 | 17
Cirrinhus Mrigal | 70 | 22 | 92
Labeo rohita (Rou) | 73 | 16 | 89
Cyprinus Carpio | 50 | 10 | 60
Catla | 20 | 12 | 32
C. Idella (grass-carp) | 11 | 4 | 15
Silver carp | 47 | 5 | 52
Total | 271 | 316 | 587

As these images are not enough for training the classifier, the size of the training set is increased, without acquiring new images, with the help of various image augmentation techniques. The process of image augmentation generates duplicate images from the original images without losing the key features of the original images (Fig. 2).

Image Pre‑Processing

Since the images we have collected are of different sizes and in different formats (jpeg and png), we first convert all images into ‘png’ format in the ‘Paint’ app. As the number of images for some species is small, image augmentation techniques are applied to increase the size of the dataset. As the dataset is very small, we have considered an offline image augmentation technique, which involves storing the images on the computer disk after each augmentation operation.

Let I(x, y) be the image. The following augmentation techniques are applied to each of the original images.

Cropping

Cropping is the basic image reframing operation where a desired portion of the image is retrieved. Let (x1, y1) be the origin of the image and (xe, ye) be the end point of the image. The new image co-ordinate (xp, yp) after the cropping operation will be

$$I(x_p, y_p) = I(x_e - x_1,\; y_e - y_1) \tag{1}$$
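The cropping operation of Eq. (1) can be illustrated with a minimal Python sketch (the paper gives no code, so this is only one possible reading). It assumes the image is a NumPy array indexed as [row, column], that (x1, y1) is the top-left corner and (xe, ye) the exclusive bottom-right corner of the crop window, so that the cropped image has xe − x1 columns and ye − y1 rows. The function name `crop` and the toy array are hypothetical.

```python
import numpy as np


def crop(image, x1, y1, xe, ye):
    """Return the sub-image covering columns x1..xe-1 and rows y1..ye-1.

    Assumption: (x1, y1) is the top-left corner and (xe, ye) the exclusive
    bottom-right corner of the crop window.
    """
    height, width = image.shape[:2]
    if not (0 <= x1 < xe <= width and 0 <= y1 < ye <= height):
        raise ValueError("crop window must lie inside the image")
    # Rows are indexed by y, columns by x.
    return image[y1:ye, x1:xe].copy()


# Toy 6-row, 8-column "image" whose pixel value encodes its position.
image = np.arange(6 * 8).reshape(6, 8)
patch = crop(image, x1=2, y1=1, xe=7, ye=5)
print(patch.shape)  # (4, 5): ye - y1 rows, xe - x1 columns
```

In an offline augmentation pipeline such as the one described, each cropped array would then be written back to disk as a new image file.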
Flipping

All the original images in the dataset were flipped upside down. Then the images were again flipped horizontally, as this is more common than vertical flipping. Flipping is one of the easiest augmentations to implement and has proven useful on datasets such as CIFAR-10 and ImageNet. An upside-down image has the new co-ordinate I(x, height − y − 1), whereas a horizontally flipped image has the co-ordinate I(width − x − 1, y) in the new image.

Rotation

The image rotation routine involves an input image, a rotation angle θ, and a point about which rotation is done. If a point (x1, y1) in the original image is rotated around (x0, y0) by an angle of θ°, then the new image co-ordinate (x2, y2) can be presented by the following equations.

$$x_2 = (x_1 - x_0)\cos\theta + (y_1 - y_0)\sin\theta + x_0 \tag{2}$$

$$y_2 = -\sin\theta\,(x_1 - x_0) + \cos\theta\,(y_1 - y_0) + y_0 \tag{3}$$

If x0 = y0 = 0 and θ = 90° clockwise, then (x2, y2) becomes

$$x_2 = y_1, \qquad y_2 = -x_1 \tag{4}$$

All the original images and upside-down images are rotated 90° clockwise, 90° anticlockwise and 180° clockwise. The image count after augmentation goes up to 2781.

Convolutional Neural Network

A convolutional neural network (CNN) is a deep structured algorithm commonly applied to visualize and extract the hidden texture features of image datasets. Over the last decade, CNN features and deep networks have become the primary choice for handling computer vision tasks. Different CNN architecture models utilize different layers, i.e., convolution layers, batch-normalization layers, max-pooling layers, fully connected layers, and a soft-max layer. The major advantages of the deep CNN architecture over the manual supervised method are its self-learning and self-organized characteristics [38].

The first layer in the CNN architecture is the image input layer, which takes 2-D and 3-D images as input. The size of the image must be initialized into this layer [45]. Let I = m × n be an input image where m and n represent the rows and columns respectively. Equation (5) represents the image input layer by a function:

where p is a deep learning architecture model which can be a series or parallel network. The layer Inp performs some pre-processing if required, before passing the image to the next layer, called the convolution layer. The convolution layer uses filters that perform convolution operations and scan the input I with respect to its dimensions. The input image becomes less noticeable at the output of a convolution layer. However, particulars such as the edges, orientation, and patterns become more prominent, and these are the key features from which a machine actually learns. The output obtained from the convolution operation on the input I is called the feature map, fm

$$f_m = \mathrm{Conv}(I, k, d, r_l) \tag{6}$$

where k is the filter window, d is the stride, and rl is the RELU activation function. Next, the batch normalization layer evaluates the mean and variance for each of these feature maps. The batch normalization process makes it possible to apply much higher learning rates without caring about initialization.

The pooling layer performs the down-sampling operation on the feature maps obtained from the preceding layer and produces a new set of feature maps with compressed resolution. The main objective of a pooling layer is to keep only the crucial information by discarding unnecessary details. The two main pooling operations are average pooling and max pooling. In average pooling, down-sampling is performed by partitioning the input into small rectangular pooling regions and then computing the average value of each region.

In the max-pooling operation, the spatial size of the previous feature map is reduced. It helps to bring down the noise by eliminating noisy activations and hence is superior to average pooling. The pooling operator carries forward the maximum value within a group of T activations. The n-th max-pooled band consists of K related filters

$$p_{k,m} = \max_{t = 1, \dots, T}\left(h_{j,(m-1)U+t}\right) \tag{7}$$

where U ∈ {1, 2, 3, …, T} is the pooling shift, which permits overlap between pooling regions if U < T. This results in shrinking the dimensionality of the K convolutional bands to N pooled bands, where

$$N = \frac{K - T}{U} + 1$$

and the final layer becomes

$$p = [p_1, p_2, \dots, p_N] \in \mathbb{R}^{N \cdot J}.$$

At the end of every CNN, there lie one or more fully-connected (FC) layers. The output of the pooling layer is flattened and then fed into the fully-connected layer. The neurons in the FC layer perform a linear transformation on the input and, later, a nonlinear transformation takes place as follows

$$Y = f\left(\sum_{i=1}^{n_H} W_{jk} X + B\right) \tag{8}$$
where f is the activation function, X is the input, W is the weight matrix, j is the number of neurons in the previous layer, k is the neuron in the current layer, and B is the bias weight. It is important to have the same number of output nodes in the final FC layer as the number of classes.

Finally, the soft-max function computes the probability of each of the classes output by the final FC layer. A soft-max function is represented by

$$\mathrm{softmax}(y_i) = \frac{\exp(y_i)}{\sum_{j} \exp(y_j)} \tag{9}$$

From the background search on fish classification and recognition, it is observed that CNNs have shown outstanding results in fish classification compared to the traditional method of manual feature-based image classification. Some of the species in our dataset had fewer images, which resulted in class imbalance. So, to overcome this problem we have employed the transfer learning [46] method, where a pretrained CNN model (ResNet-50 and Alex-Net) is chosen to solve the task.

AlexNet

Alex-Net is considered one of the most popular CNNs used for pattern recognition and classification/identification applications. It incorporates 60 million parameters and 650,000 neurons [47]. The architecture consists of 5 convolution layers, max-pooling layers, and 3 consecutive fully connected layers. The input layer takes images of size 227 × 227 × 3 and then filters the images by using a total of 96 kernels of size 11 × 11 with a stride of four. The output from this layer is passed through the second convolutional layer of 256 feature maps, where the size of each kernel is 5 × 5 with a stride of one. The output from this layer passes through a pooling and normalization layer resulting in a 13 × 13 × 256 output. The third and fourth convolution layers use a kernel of 3 × 3 with a stride of one. Then the output of the last convolutional layer is flattened through a fully connected layer of 9216 feature maps, which is again connected to a fully connected layer with 4096 units. The last layer is the output that uses the SoftMax layer with six units (classes) for the Fish_Pak dataset and 20 units (classes) for our dataset.

ResNet‑50

RELU function. Batch normalization layers standardize the activations of a given input volume before passing it into the succeeding layer. It calculates the mean and standard deviation for each convolutional filter response across each mini-batch at each iteration to normalize the present layer activation. Resnet-50 has over 23 million trainable parameters. The original residual unit is modified by a bottleneck design as shown in Fig. 3. For each residual function F, a stack of 3 layers was used instead of the original 2 layers.

Experimental Settings and Evaluation Criteria

The experimental simulation is performed in the MATLAB2018a environment on a system with an Intel Core i5 7300HQ processor of 2.50 GHz (turbo up to 3.50 GHz) and an NVIDIA GeForce GTX 1050 Ti with a compute capability of 6.1.

We use fine-tuned Resnet-50 and Alex-net from MATLAB’s Deep-learning Toolbox. Due to variations in the image sizes, we use the ‘augmentedImageDatastore’ function from MATLAB to resize all the manually augmented images.

Our simulation-based experiments are performed first on the Fish_Pak dataset. A total of 1,191 augmented images from this dataset are segregated randomly into 70% training (834) and 30% validation (357) images. As our dataset is small, we use a fine-tuning approach where the top 3 layers of the Resnet-50 and Alex-Net models are removed and then the model is re-arranged by adding three layers with six output neurons for the output layer. The weight and
bias learning rate of the fully connected layer are varied for best performance of the classifier while the bottom layer weights are frozen. Freezing the bottom layer weights helps to increase network speed during training. For training the classifier an “sgdm” optimizer with a “momentum” of 0.9 is used. Training data is divided into “mini-batches” of size 20 and we choose a maximum epoch = 30. The learning capability of both the CNNs, i.e., Alex-Net and Resnet-50, is evaluated in terms of classification accuracy, loss function (training and validation loss) and confusion matrix (CM). CM is an ideal performance measure to assess the accuracy of a multi-class balanced classification problem like fish classification. It is a two-dimensional matrix with the predicted and target classes along the rows and columns, respectively. It provides insight into class-specific and also overall classification accuracies. It gives the overall classification accuracy evaluated based on the number of images that have been classified correctly.

As both the CNN models show satisfactory performance with the Fish-Pak dataset, we next apply our own dataset consisting of a total of 2781 augmented images. Training is carried out on 1946 images (70% of 2781) and 834 images are chosen for the validation of the proposed CNN classifier. A mini-batch size of 32 is chosen for both training and testing purposes. We use a stochastic gradient descent with momentum optimizer (‘sgdm’) with a momentum of 0.9 to train our dataset for the two architectures under consideration.

The learning rate (Lr) in every CNN architecture has a prominent impact on the overall accuracy of the classifier. A low learning rate can slow down the execution of the entire model, thus increasing the time complexity. On the other hand, with too large a learning rate, there is a possibility that the model may get stuck at suboptimal results. So, we have experimented with different values of the learning rate, i.e., 0.01, 0.001, and 0.0001. Tables 2 and 3 represent the performance analysis of the two network models at different learning rates with variations in the ‘WeightLearnRate’ factor and ‘BiasLearnRate’ factor of the fully connected layers.

The performance of the classifier is determined by four evaluation parameters which are important for multi-class classification problems: overall classification accuracy, mean precision, recall, and mean F-score. The recall is an important parameter in a multi-class classification task as it refers to the number of positive classes that are labeled correctly. Precision gives information about the actual positive cases out of the total positive predictions. All these parameters are calculated as per Eqs. (10)–(12).
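As an illustration of how accuracy, precision, recall, and F-score can be derived from a confusion matrix, the following minimal Python sketch may help. It is not the authors' MATLAB code: Eqs. (10)–(12) are not reproduced in this text, so the standard per-class definitions with macro averaging are assumed. It also assumes that rows index the predicted class and columns the true class, which is the layout described for the CM figures. The function name `classification_metrics` and the 3-class matrix are hypothetical.

```python
import numpy as np


def classification_metrics(cm):
    """Compute overall accuracy and macro-averaged precision, recall, F-score.

    Assumption: cm[i, j] is the number of images predicted as class i
    whose true class is j.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)                # correctly classified images per class
    predicted = cm.sum(axis=1)      # row sums: all images predicted as class i
    actual = cm.sum(axis=0)         # column sums: all images truly of class j
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f_score = np.divide(2 * precision * recall, denom,
                        out=np.zeros_like(tp), where=denom > 0)
    return {
        "accuracy": tp.sum() / cm.sum(),
        "mean_precision": precision.mean(),
        "mean_recall": recall.mean(),
        "mean_f_score": f_score.mean(),
    }


# Invented 3-class example, not data from the paper.
cm = [[50, 2, 0],
      [0, 45, 5],
      [0, 3, 45]]
metrics = classification_metrics(cm)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Macro averaging weights every class equally, which is one reasonable choice when some species have far fewer images than others; a different averaging scheme would change the mean precision and F-score.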
The CMs shown in Figs. 4, 5 and 6 show how the fish species are assigned to their corresponding classes by Resnet-50 on the Fish-Pak dataset. The rows in the CM correspond to the output class (predicted class) and the columns correspond to the true class (target class). The diagonal values represent the number of correctly classified species and the percentage value represented in the diagonal cells is the recall value, i.e., out of all predicted classes how many are actually correctly classified (positive). All confusion matrix plots are obtained by using the MATLAB function “plotconfusion”. Figure 4 represents the CM of the Resnet-50 architecture that gives the best classification accuracy of 100% at a learning rate of 0.01 and weight = bias = 10 on the Fish-Pak dataset. Here, all the fish test images are predicted

Fig. 4 CM of Resnet-50 on Fish-Pak dataset at Lr = .01, Weight = 10 = Bias
Fig. 5 CM of Resnet-50 on Fish Pak dataset at Lr = .01, Weight = 20 = Bias
Fig. 7 CM of Alex-Net on Fish Pak dataset at Lr = .0001 and Weight = 10 = Bias
Fig. 10 Accuracy and Loss curve of ResNet-50 at Lr = 0.01 and weight = 10 for Fish-Pak dataset
Fig. 11 Accuracy and Loss curve of Alex-Net at Lr = 0.0001 and Weight = 10 for Fish-Pak dataset
Fig. 18 Accuracy and loss curve (performance) of Resnet-50 at Lr = .001 and weight = 20 = bias factor on our dataset

Weight and Bias learning factor = 10. During the experiment, we observed that the performance measures at Lr = 0.01 for Alex-Net are below 40% for all weight and bias factors at the same training options that we feed to Resnet-50. Therefore, we totally discard the results associated with this learning rate. From Table 3, with our own dataset, the best classification performance measures (100%) by ResNet-50 are obtained at
Lr = 0.001 when we choose the Weight and Bias factor as 20. It also experiences the least validation and training loss with these training options. The best performance of Alex-Net is only 97.84% validation accuracy at Lr = 0.0001 with weight and bias learning factor = 30. Table 4 gives the total misclassification numbers by both the models on the two datasets. So, from the above empirical analysis, we observe that the best classification parameters among the two classifiers are offered by ResNet-50 for both datasets at a learning rate of 0.001.

Fig. 19 Accuracy and loss curve (performance) of Alex-Net at Lr = .0001 and weight = 30 = bias factor on our dataset

Table 4 Mis-classification analysis with Resnet-50 and Alex-Net
Parameters | Weight and bias learning rate | Mis-classification on Fish_Pak dataset (validation image = 357) | Mis-classification on own dataset (validation image = 834)
Resnet-50 | 10 | 0 | 2
Resnet-50 | 20 | 2 | 0
Resnet-50 | 30 | 1 | 2
Alexnet | 10 | 3 | 20
Alexnet | 20 | 7 | 24
Alexnet | 30 | 6 | 18

For both the datasets, it has been observed that the training loss and validation loss in the case of ResNet-50 first decrease when we move from the learning rate Lr = 0.01 to Lr = 0.001, and increase again for the learning rate Lr = 0.0001. In the case of Alex-Net, a decrease in learning rate increases the training and validation loss. Also, an increase in the Weight and Bias learning factors increases the training and validation loss of the classifiers. From Tables 2 and 3, we can draw an inference that with a gradual increase in the Weight and Bias learning factor, the loss function increases for both the datasets.

The performance of the proposed classification method is compared with the most recent works that dealt with freshwater fish species for classification and a detailed analysis is shown in Table 5. Those works use similar kinds of fish species that are common in South Asian countries.

We have considered multiple parameters for showing a detailed comparison of the proposed work with previous works. The highest classification accuracy on the Fish-Pak dataset to date is 98.8% [40] followed by 98.64% [42], whereas with our fine-tuned Resnet-50 network, we are able to reach 100% classification accuracy, precision, and recall on the Fish-Pak dataset. We also achieve a considerable improvement in terms of validation (2.38%) and training
loss (0.03%). Our analysis shows that this is the best result on the Fish_Pak dataset to date.

Table 5
Reference | Method | Accuracy | Precision | Recall | Validation loss | Training loss | Dataset | No. of classes
Islam et al. [35] | HLBP feature with SVM | 94.78 | Not mentioned | Not mentioned | Not mentioned | Not mentioned | BDIndigenousFish2019 | 8
Rauf et al. [25] | 32-layer VGG-Net (Resnet-50) | 98.5% (93) | 94.83 (89.88) | 95.67 (90.17) | Not mentioned | Not mentioned | Fish_Pak | 6
Sharmin et al. [26] | Handcrafted features + SVM-classifier | 94.2% | 93% | 94.59 | NA | NA | self-collected SIFS | 6
Wang et al. [37] | SPP-densenet | 97% | 97.62% | 99.2% | 0.1 (from graph) | Not mentioned | Google Images | 6
Banan et al. [38] | VGG-16 | 100% | Not mentioned | Not mentioned | 0.0014 | .0154 | Own images of major carps | 4
Iqbal et al. [39] | AlexNet | 90.48% | Not mentioned | Not mentioned | Not mentioned | Not mentioned | LifeClef’15 | 6
Xu et al. [40] | Combined SE-Resnet-152 | 98.8% | Only per species is mentioned | Only per species is mentioned | NA | NA | Fish_Pak | 6
Dey et al. [41] | 5-layer CNN | 99 | 99.01 | 99.01 | .0325 | .0167 | BDIndigenousFish2019 | 8
Abinaya et al. [42] | AlexNet with fuse Naive Bayesian layer | 98.64 | 99.80 | 98.99 | Not mentioned | Not mentioned | Fish_Pak | 6
Jose et al. [43] | Octonion Network + softmax classifier | 98.17% (98.01%) | 98.11% (98.05%) | 98.13% (98.07%) | NA | NA | Fish_Pak (Self-collected Tuna fish data) | 6 (3)
Proposed Work (Ours) | Fine-tuned Resnet-50 | 100 | 100 | 100 | .0238 | 0.0003 | Fish_Pak | 6
Proposed Work (Ours) | Fine-tuned Resnet-50 | 100 | 100 | 100 | .0206 | .0004 | Own dataset | 20

Although a 100% classification accuracy is achieved by using VGG-16 [38], their target species are only 4 major carps. Another important thing we want to highlight in this paper is the learning curve where the validation loss is less than the training loss, which is considered an underfit problem. This is a case of an unrepresentative validation dataset [49–51], which means the validation dataset is easier for the model to predict than the training data. Another possibility could be that the validation dataset considered in the simulation experiment is scarce but widely represented by the training dataset, so the model behaves pretty well with those few validation data. Generally, good training means the validation loss should be slightly greater than the training loss. Also, it is advised to keep training the classifier if validation loss < training loss. With our own dataset of 20 varieties of indigenous fish images, we achieve 100% validation accuracy, precision and recall. We accomplish even lower training and validation loss on our dataset than on the Fish-Pak dataset in our simulation results. So, we can claim that our work is the first to report successful automatic classification of 20 varieties of freshwater fish from India, with the best classification scores and lower loss than any of the state-of-the-art methods.

Conclusions

The research article investigates different learning rates and weight and bias factors for achieving maximum classifier accuracy by ResNet-50 and Alex-Net on the fish classification problem. From the performance analysis of these two deep-learning models,
we have observed that a 100% classification accuracy is achievable with the ResNet-50 architecture at a learning rate of 0.001 on both datasets under investigation. The proposed ResNet-50 model not only improves the target classification accuracy but also reduces network under-fitting. The analysis also reveals that the misclassification rate of ResNet-50 is lower than that of AlexNet. The paper reports the best classification accuracy on the Fish-Pak dataset to date. It also reports, for the first time, a maximum classification accuracy of 100% for a 20-class classification model. The proposed method will help in accurate classification of carps and small indigenous fish species reared in one environment. With the help of this technique, fish farmers would be able to manage early segregation of fish species according to their food preference and growth rate for better yield. Also, accurate classification of fish species can create awareness among fish-loving locals of North-East India and help limit over-exploitation of some threatened fish species. In future, we would like to incorporate a live-monitoring-based classification system to develop a robust and realistic classification model that helps local and poor fish farmers manage their fisheries properly, as their livelihood depends on them.

Acknowledgements The authors are grateful to Mr. Balaram Mahalder (Technical Specialist, WorldFishCenter), Mr. Hamid Badar Osmany (Assistant Biologist, Marine Fisheries Department, Karachi, Pakistan), Mostafa A. R. Hossain (Professor, Aquatic Biodiversity & Climate Change, Department of Fish. Biology & Genetics, Bangladesh Agricultural University), Dr. Shamim Rahman (Asstt. Professor in Zoology, Devicharan Barua Girls’ College, Jorhat, India), Dr. Dibakar Bhakta (Scientist (SS), Riverine and Estuaries Fisheries Division, ICAR-Central Inland Fisheries Research Institute, Barrackpore, India), Dr. Mosaddequr Rahman (Kadai · Graduate School of Agriculture Forestry and Fisheries, Kagoshima University) and Bhenila Bailung (Research Scholar, Dibrugarh University) for allowing us to use their fish images from FishBase for initial simulation of the deep-learning networks. Due to the COVID-19 pandemic and lockdown, our own fish image collection was hampered initially, but these individuals helped us by permitting us to use their collections of images, which helped in the initial planning of the classification model.

Author contribution statement JD: Conceptualization, Methodology, Software, Investigation, Writing - original draft, Writing - review & editing. Dr. SL: Supervision, Conceptualization. Dr. BB: Data collection and validation.

Funding The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.

Declarations

Conflict of interest The authors have no relevant financial or non-financial interests to disclose.

References

1. FAO, Aquaculture development trends in Asia (2000). http://www.fao.org/3/ab980e/ab980e03.htm#TopOfPage
2. T.M. Berra, An Atlas of Distribution of the Fresh Water Fish Families of the World (University of Nebraska Press, Lincoln, 1981)
3. D. Kumar, Fish culture in undrainable ponds. A manual for extension, FAO Fisheries Technical Paper No. 325. (Rome, FAO, 1992). 239 p
4. M. Karim, H. Ullah, S. Castine et al., Carp–mola productivity and fish consumption in small-scale homestead aquaculture in Bangladesh. Aquac. Int. 25, 867–879 (2017). https://doi.org/10.1007/s10499-016-0078-x
5. B.K. Bhattacharjya, M. Choudhury, V.V. Sugunan, Ichthyofaunistic resources of Assam with a note on their sustainable utilization, in Participatory approach for Fish Biodiversity Conservation in North East India, ed. by P.C. Mahanta, L.K. Tyagi (Workshop Proc. NBFGR, Lucknow, 2003), pp. 1–14
6. S. Dewan, M.A. Wahab, M.C.M. Beveridge, M.H. Rahman, B.K. Sarker, Food selection, electivity and dietary overlap among planktivorous Chinese and Indian major carp fry and fingerlings grown in extensively managed, rain-fed ponds in Bangladesh. Aquac. Res. 22, 277–294 (1991)
7. M.M. Rahman, M.C.J. Verdegem, M.A. Wahab, M.Y. Hossain, Q. Jo, Effects of day and night on swimming, grazing and social behaviours of rohu Labeo rohita (Hamilton) and common carp Cyprinus carpio (L.) in simulated ponds. Aquac. Res. 39, 1383–1392 (2008)
8. M.A. Wahab, M.M. Rahman, A. Milstein, The effect of common carp Cyprinus carpio (L.) and mrigal Cirrhinus mrigala (Hamilton) as bottom feeders in major Indian carp polycultures. Aquac. Res. 33, 547–557 (2002)
9. D.M. Alam, M. Hasan, Md. Wahab, M. Khaleque, M. Alam, Md. Samad, Carp polyculture in ponds with three small indigenous fish species - Amblypharyngodon mola, Chela cachius and Puntius sophore. Progress. Agric. 13, 117–126 (2018)
10. A.S.M. Kibria, M.M. Haque, Potentials of integrated multi-trophic aquaculture (IMTA) in freshwater ponds in Bangladesh. Aquac. Rep. 11, 8–16 (2018). https://doi.org/10.1016/j.aqrep.2018.05.004
11. N.J.C. Strachan, Recognition of fish species by colour and shape. Image Vis. Comput. 11, 2–10 (1993)
12. S. Cadieux, F. Michaud, F. Lalonde, Intelligent system for automated fish sorting and counting, Proceedings, in 2000 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2000) (Cat. No.00CH37113), vol. 2 (Takamatsu, Japan, 2000), pp. 1279–1284
13. A. Rova, G. Mori, L. M. Dill, One fish, two fish, butterfish, trumpeter: recognizing fish in underwater video, in APR Conference on Machine Vision Applications (2007), pp. 404–407
14. C. Spampinato, D. Giordano, R. Di Salvo, J. Chen-Burger, R. Fisher, G. Nadarajan, Automatic fish classification for underwater species behavior understanding. Anal. Retr. Tracked Events Motion Imagery Streams (2010). https://doi.org/10.1145/1877868.1877881
15. M.K. Alsmadi, K.B. Omar, S.A. Noah et al., Fish classification based on robust features extraction from color signature using back-propagation classifier. J. Comput. Sci. 7, 52 (2011)
16. B. Benson, J. Cho, D. Goshorn, R. Kastner, Field programmable gate array (FPGA) based fish detection using Haar classifiers. Am. Acad. Underwater Sci. (2009)
17. J. Hu, D. Li, Q. Duan, Y. Han, G. Chen, X. Si, Fish species classification by color, texture and multi-class support vector machine using computer vision. Comput. Electron. Agric. 88, 133–140 (2012)
18. M. M. M. Fouad, H. M. Zawbaa, N. El-Bendary, A. E. Hassanien, Automatic Nile Tilapia fish classification approach using machine learning techniques, in 13th International Conference on Hybrid Intelligent Systems (HIS 2013) (Gammarth, 2013), pp. 173–178. https://doi.org/10.1109/HIS.2013.6920477
19. C. Pornpanomchai, B. Lurstwut, P. Leerasakultham, W. Kitiyanan, Shape- and texture-based fish image recognition system. Kasetsart J. Nat. Sci. 47, 624–634 (2013)
20. M. Rodrigues, M. Freitas, F. Pádua, R. Gomes, E. Carrano, Evaluating cluster detection algorithms and feature extraction techniques in automatic classification of fish species. Pattern Anal. Appl. (2014). https://doi.org/10.1007/s10044-013-0362-6
21. P.X. Huang, B.J. Boom, R.B. Fisher, Hierarchical classification with reject option for live fish recognition. Mach. Vis. Appl. 26, 89–102 (2015)
22. M.-C. Chuang, J.-N. Hwang, K. Williams, A feature learning and object recognition framework for underwater fish images. IEEE Trans. Image Process. (2016). https://doi.org/10.1109/tip.2016.2535342
23. D. Li, Qi. Wang, X. Li, M. Niu, He. Wang, C. Liu, Recent advances of machine vision technology in fish classification. ICES J. Mar. Sci. 79(2), 263–284 (2022). https://doi.org/10.1093/icesjms/fsab264
24. S.Z.H. Shah, H.T. Rauf, I.U. Lali, S.A.C. Bukhari, M.S. Khalid, M. Farooq, M. Fatima, Fish-Pak: fish species dataset from Pakistan for visual features based classification. Mendeley Data (2019). https://doi.org/10.17632/n3ydw29sbz.3
25. H.T. Rauf, M.I. Lali, S. Zahoor, S.Z. Shah, A. Rehman, S.A.C. Bukhari, Visual features based automated identification of fish species using deep convolutional neural networks. Comput. Electron. Agric. (2019). https://doi.org/10.1016/j.compag.2019
26. I. Sharmin, N.F. Islam, I. Jahan et al., Machine vision based local fish recognition. SN Appl. Sci. 1, 1529 (2019). https://doi.org/10.1007/s42452-019-1568-z
27. Z. Ju, Y. Xue, Fish species recognition using an improved AlexNet model. Optik 223, 165499 (2020). https://doi.org/10.1016/j.ijleo.2020.165499
28. A.A. dos Santos, W.N. Gonçalves, Improving Pantanal fish species recognition through taxonomic ranks in convolutional neural networks. Ecol. Inform. 53, 100977 (2019). https://doi.org/10.1016/j.ecoinf.2019.100977
29. P. Mathew, S. Elizabeth, Fish identification based on geometric robust feature extraction from anchor/landmark points, in National Conference on Image Processing and Machine Vision (NCIPMV) 2017 (At University of Kerala, Trivandrum, 2017)
30. J. Jäger, E. Rodner, J. Denzler, V. Wolff, K. Fricke-Neuderth, Seaclef 2016: Object proposal classification for fish detection in underwater videos, in CLEF (Working Notes) (2016), pp. 481–489
31. S.A. Siddiqui, A. Salman, M.I. Malik, F. Shafait, A. Mian, M.R. Shortis, E.S. Harvey, Automatic fish species classification in underwater videos: exploiting pretrained deep neural network models to compensate for limited labelled data. ICES J. Mar. Sci. 75, 374–389 (2017)
32. Y. Ma, P. Zhang, Y. Tang, Research on fish image classification based on transfer learning and convolutional neural network model 2018, in 14th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD) (2018), pp. 850–855. https://doi.org/10.1109/FSKD.2018.8686892
33. A. Salman, S. Maqbool, A.H. Khan, A. Jalal, F. Shafait, Real-time fish detection in complex backgrounds using probabilistic background modelling. Ecol. Inform. 51, 44–51 (2019)
34. N.E. Khalifa, M. Taha, A.E. Hassanien, Aquarium family fish species identification system using deep neural networks, in Proceedings of the International Conference on Advanced Intelligent Systems and Informatics 2018, (2019), pp. 347–356. https://doi.org/10.1007/978-3-319-99010-1_32
35. M. A. Islam, M. R. Howlader, U. Habiba, R. H. Faisal, M. M. Rahman, Indigenous fish classification of Bangladesh using hybrid features with SVM classifier. 2019, in International Conference on Computer, Communication, Chemical, Materials and Electronic Engineering (IC4ME2) (2019). https://doi.org/10.1109/ic4me247184.2019.9036679
36. A. Jalal, A. Mian, M. Shortis, F. Shafait, Fish detection and species classification in underwater environments using deep learning with temporal information. Ecol. Inform. 57, 101088 (2020). https://doi.org/10.1016/j.ecoinf.2020.101088
37. H. Wang, Y. Shi, Y. Yue, H. Zhao, Study on freshwater fish image recognition integrating SPP and DenseNet network. 2020, in IEEE International Conference on Mechatronics and Automation (ICMA) (2020). https://doi.org/10.1109/icma49215.2020.923
38. A. Banan, A. Nasiri, A. Taheri-Garavand, Deep learning-based appearance features extraction for automated carp species identification. Aquac. Eng. 89, 102053 (2020)
39. M.A. Iqbal, Z. Wang, Z.A. Ali et al., Automatic fish species classification using deep convolutional neural networks. Wireless Pers. Commun. 116, 1043–1053 (2021). https://doi.org/10.1007/s11277-019-06634-1
40. X. Xu, W. Li, Q. Duan, Transfer learning and SE-ResNet152 networks-based for small-scale unbalanced fish species identification. Comput. Electron. Agric. 180, 105878 (2021). https://doi.org/10.1016/j.compag.2020.105878
41. K. Dey, M.M. Hassan, M.M. Rana, M.H. Hena, Bangladeshi indigenous fish classification using convolutional neural networks. Int. Conf. Inf. Technol. (ICIT) 2021, 899–904 (2021). https://doi.org/10.1109/ICIT52682.2021.9491681
42. N.S. Abinaya, D. Susan, R. Sidharthan, Naive Bayesian fusion based deep learning networks for multisegmented classification of fishes in aquaculture industries. Ecol. Inform. 61, 101248 (2021). https://doi.org/10.1016/j.ecoinf.2021.101248
43. J. Jose, C.S. Dr. Kumar, S. Sureshkumar, Region-based split Octonion networks with channel attention module for tuna classification. Int. J. Pattern Recognit. Artif. Intell. (2022). https://doi.org/10.1142/S0218001422500306
44. M.K. Alsmadi, I. Almarashdeh, A survey on fish classification techniques. J. King Saud Univ. Comput. Inf. Sci. 34(5), 1625–1638 (2022). https://doi.org/10.1016/j.jksuci.2020.07.005
45. M. Riesenhuber, T. Poggio, Hierarchical models of object recognition in cortex. Nat. Neurosci. 2, 1019–1025 (1999). https://doi.org/10.1038/14819
46. S.J. Pan, Q. Yang, A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 22(10), 1345–1359 (2010)
47. A. Krizhevsky, I. Sutskever, G. Hinton, ImageNet classification with deep convolutional neural networks. Neural Inf. Process. Syst. (2012). https://doi.org/10.1145/3065386
48. K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (Las Vegas, NV, USA, 27–30 June 2016), pp. 770–778
49. T. Mitchell, Machine Learning (McGraw-Hill Science/Engineering/Math, Berlin, 1997)
50. K. Horak, Introduction to Learning Curves. http://vision.uamt.feec.vutbr.cz/STU/lectures/MachineLearning_LearningCurves. Accessed 08 February 2022
51. https://www.baeldung.com/cs/learning-curve-ml. Accessed 08 Feb 2022

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.