
Feature extraction and classification with transfer learning for breast cancer classification

M. Mohana Dhas1*[0000-0002-9296-5162] and N. Suresh Singh2


1 Department of Computer Science, Annai Velankanni College, Tholayavattam, India
2 Department of Computer Applications, Malankara Catholic College, Mariagri, Kaliyakkavilai, India
*[email protected]

Abstract. Breast cancer is a significant health issue worldwide, affecting a vast number of females. Treatment becomes far more complex once tumors spread to distant areas and infiltrate surrounding tissues, underscoring the urgency of early identification and categorization to curb progression. Despite significant endeavours in applying deep learning to breast cancer classification, existing methodologies still need refinement. Addressing this gap, this study introduces an innovative model employing advanced pre-trained Convolutional Neural Networks (CNNs) to improve the precision of breast cancer detection and categorization. The methodology begins with robust data pre-processing and augmentation to address noise and mitigate overfitting. Feature extraction is performed using fine-tuned MobileNet V2, Thin MobileNet, and Reduced MobileNet V2 architectures, refined through transfer learning. The Osprey Optimization Algorithm (OOA) is then integrated to optimize the CNN, yielding significant improvements in classification accuracy. The outcomes of the suggested approach are compared with CNNs optimized by other algorithms and with advanced state-of-the-art methods. The proposed approach achieves 94% accuracy in feature extraction using fine-tuned MobileNet V2 and 98% classification accuracy using OOA-CNN.

Keywords: MobileNet V2, Breast Cancer, Osprey Optimization Algorithm, Transfer Learning.

1 Introduction
Breast cancer accounts for nearly 15% of deaths related to cancer among women,
underlining the critical requirement for advanced research to facilitate early detection,
accurate diagnosis, and effective treatment to enhance survival rates. Mammography
and biopsy stand out as the two primary diagnostic techniques employed for breast
cancer identification. Mammography involves the use of specialized breast imaging
by a radiologist to discern early signs of breast cancer in women. The widespread
adoption of mammography has contributed to a reduction in mortality rates associated
with breast cancer. Another precise diagnostic method is a biopsy, wherein a
pathologist examines a sample of tissue from the infected area of the breast under a
microscope to identify and characterize tumors. Biopsy enables the pathologist to
differentiate between malignant and benign lesion types, with benign anomalies often
stemming from irregularities in epithelial cells that generally do not lead to breast
cancer. Malignant or cancerous cells, on the other hand, exhibit abnormal growth and
division patterns.
Analyzing microscopic images manually becomes exceptionally challenging due to
the uneven presence of malignant and benign cells. Researchers have therefore
proposed various cell classification methods for breast cancer detection in images. In
recent years, Artificial Intelligence (AI) has significantly advanced breast cancer
detection and recognition, aiming to categorize patients into "malignant" or "benign"
groups. Research efforts have explored diverse ML algorithms for breast cancer
classification. For instance, a study [1] utilized the weighted Naïve Bayesian
algorithm, achieving an accuracy of 98.54%. Another work [2] compared six ML
algorithms, using the Wisconsin Diagnostic Breast Cancer dataset, demonstrating
efficient classification of malignant and benign cases.
While traditional ML methods offer effective classification, challenges arise in
accurately detecting tumor subtypes from histopathological images using automated
approaches. A study [3] applied Deep Learning (DL) with Inception and ResNet
architectures to distinguish tiny malignant images and proposed a highly accurate
automatic framework for cancer diagnosis and subtype classification. However, DL
algorithms demand extensive training datasets, posing a challenge due to the limited
accessibility of breast images.
To address this, transfer learning (TL) has been applied to develop DL models for
initial breast cancer diagnosis. TL involves leveraging a model pre-trained on
ImageNet and adapting it for the prediction, segmentation, or classification of Breast
Cancer Imaging (BCI) [4]. Additionally, a DL approach for categorizing breast
ultrasound images based on transfer learning was presented in [5]. Although
conventional transfer learning methods using an ImageNet pre-trained AlexNet have
demonstrated improved performance, challenges persist, such as low test precision
and undesirable false negatives in the presence of previously unseen examples [6],
limiting their applicability in clinical settings.
This research focuses on optimizing breast cancer classification through advanced
Convolutional Neural Networks (CNNs) and innovative techniques. The initial data
pre-processing addresses noise in breast tissue images, while the proposed data
augmentation method enhances the diversity of the training dataset. Three pre-trained
CNN architectures, including MobileNetV2 and Thin MobileNet, are employed for
feature extraction, complemented by transfer learning for improved model refinement.
The optimization involves the integration of the Osprey Optimization Algorithm
(OOA) to enhance CNN classification performance and reduce computational costs.
The key contributions of this proposed methodology include:
- Introduces a novel data augmentation technique, diversifying the training dataset to improve CNN model accuracy.
- Utilizes fine-tuned MobileNetV2, Thin MobileNet, and Reduced MobileNetV2 for effective feature extraction from breast cancer images.
- Applies transfer learning to refine CNN models, leveraging previously learned characteristics to enhance recognition accuracy.
- Introduces the Osprey Optimization Algorithm for efficient CNN optimization, inspired by osprey hunting behavior, ensuring improved classification abilities and reduced computational costs.
The subsequent sections of this article provide further elaboration. Section 2 delves
into the literature review, offering a comprehensive analysis of existing research.
Section 3 elucidates the suggested approach in detail, outlining the steps and
strategies employed. Outcomes are thoroughly examined in Section 4. Finally,
Section 5 concludes and summarizes the findings and implications of the study.

2 Related work
Researchers have looked into the possibility of utilizing image analysis methods to
classify breast cancer in recent years. Several research works have used image-
processing methods with ML and DL algorithms to distinguish between benign and
malignant conditions. The evaluation of the literature highlights the importance of
conducting more research in this field to develop trustworthy and easy-to-use
diagnostic tools that can help healthcare providers diagnose breast cancer promptly
and accurately. Ragab et al. [7] employed a computer-aided diagnosis (CAD) system
for categorizing tumor types in mammography images of the breast. Features are
extracted using a DCNN and classification is performed by an SVM. Additionally,
CLAHE is used to improve contrast and suppress noise in mammogram images, and
the region of interest (ROI) is extracted from the mammography using two methods:
the first manually estimates the ROI with circular contours; the second uses threshold
and region-based approaches. Because it is challenging to gather a substantial amount
of training data, DCNN performance for mammography image recognition remains
uncertain. Furthermore, the DCNN's capacity to obtain suitable feature
representations is also unknown, because patterns differ between mammography and
natural images. Therefore, Suzuki
et al. [8] introduced transfer learning in CAD systems to overcome the training
problem of DCNN, training on both natural images and mammography images: the
network is first pre-trained on many natural photographs and then fine-tuned with a
small number of mammography images. Deniz et al. [9] employed transfer learning
(TL) in the classification of histopathological BCI. Feature extraction is performed
with AlexNet and VGG16, and the feature vectors from both models are concatenated
into a high-dimensional feature representation. An SVM classifier with homogeneous
mapping then categorizes the image classes according to the extracted features.
Finally, they fine-tune AlexNet by removing its final three layers and inserting
updated layers to adapt it to the breast cancer detection problem.
Farhadi et al. [10] emphasize the potential of structured data for advancing early
detection of malignant breast cancer through advanced ML techniques, handling the
class imbalance prominent in breast cancer datasets using deep TL on structured data
rather than large image databases.
Zhu et al. [11] introduce a hybrid model merging Convolutional Neural Networks
and Logistic Regression-Principal Component Analysis (LR-PCA) for improved
diagnosis accuracy. Feature extraction is enhanced with pseudo-coloured images that
enrich the input data for the DL models. Furthermore, LR-PCA mitigates the
multicollinearity among high-level deep features derived from CNNs, and images are
pre-processed with CLAHE and pixel-wise intensity alteration. The diagnostic results
were enhanced by Samee et al. [12] through the implementation of a multi-view
screening image-processing architecture. Tumor patches were segmented by a
texture-oriented method called first-class local entropy, and the extracted features
were used to derive malignancy markers such as radius and area. Alruwaili and
Gouda [13] employed transfer learning to distinguish between types of breast cancer,
implementing a variety of augmentation approaches to avoid overfitting and enhance
the stability of outcomes. The researchers
skillfully combined a modified ResNet50 with a blend of Nasnet and MobileNet
architectures. This strategy allowed them to train network weights effectively on large
datasets while also fine-tuning pre-existing network weights on smaller, more
specialized datasets. Khamparia et al. [14] focus on a hybrid TL technique combining
MVGG and ImageNet to detect breast cancer from mammograms; they also discuss
the advantages of 3D mammography over standard 2D mammograms in breast cancer
screening, though the dataset used is small and lacks detail. Aly et al. [15] pre-process
mammograms from DICOM to image format and use YOLO architectures to detect
and classify masses as malignant or benign. Feature extraction is performed with
ResNet and Inception, and YOLO-V3 is configured by applying k-means clustering
to the dataset. Evaluated on a dataset of 322 FFDMs, the method attained an accuracy
of 96.6% for mass detection and 94.7% for mass classification. Zhang et al. [16]
combine two sophisticated neural networks, a CNN and a graph convolutional
network (GCN), and enhance the identification of malignant lesions in breast
mammograms by combining rank-based stochastic pooling, dropout, and batch
normalization with a two-layer GCN. Khafaga [17] diagnosed breast cancer from
thermal images using four pre-trained CNN architectures; features are then selected
with an optimization algorithm to train the neural network for classification. Yet this
optimization method depends on several factors, which may result in early
convergence and decreased efficiency. Such limitations in existing approaches
motivate the proposed method, which aims to overcome these challenges and improve
the accuracy and efficiency of breast cancer classification methodologies.

3 Proposed methodology
This research is centered on optimizing the performance of CNN for breast cancer
classification. The approach involves a comprehensive data pre-processing and
augmentation step, where noise is eliminated, and the CNN's effectiveness is
enhanced through the incorporation of a data augmentation technique. This technique
involves random transformations like rotation, flipping, scaling, and blurring to
diversify the training dataset, thereby improving the classifier's generalizability and
mitigating overfitting. The feature extraction process utilizes three pre-trained CNN
architectures: fine-tuned MobileNetV2, thin MobileNet, and reduced MobileNetV2


(RMNv2). These architectures are employed to extract relevant features from breast
cancer images, enhancing CNN's ability to classify and detect cancer accurately.
Transfer learning is implemented to further refine the models by leveraging
previously learned characteristics, contributing to improved recognition accuracy.
The breast cancer classification is performed using an optimized CNN, integrating
the Osprey Optimization Algorithm (OOA) to enhance the CNN's classification
abilities and reduce computational costs. The OOA, inspired by Osprey hunting
behavior, introduces exploration and exploitation phases to efficiently search for
optimal solutions. The exploration phase simulates the osprey detecting and pursuing
fish, while the exploitation phase mirrors the Osprey carrying the caught fish to a
secure area for consumption. The research also includes a hyperparameter tuning step,
where dynamic testing and training processes are employed to select the best-fit
hyperparameters for the pre-trained CNNs. Recursive k-fold cross-validation ensures
accurate predictions, and the selected hyperparameters define the architecture and
training parameters of the CNN. Fig. 1 illustrates the Architecture of the proposed
CNN-based breast cancer classification model.

Fig. 1. Architecture of the Proposed Model

3.1 Data pre-processing and augmentation


The initial step of pre-processing is crucial in breast image analysis to eliminate
various types of noise. The effectiveness of Convolutional Neural Networks (CNN) is
highly dependent on large datasets for optimal accuracy. Conversely, when dealing
with small datasets, CNN performance tends to degrade due to overfitting, where the
network excels in training data but struggles with test data. The proposed
framework tackles this issue by integrating a data augmentation technique,
which expands the dataset and helps alleviate overfitting concerns.
The data augmentation method creates additional images by applying random
transformations like rotation, flipping, scaling, blurring, and others to existing ones.
This technique serves to increase the diversity and size of the training data,
enhancing the classifier's generalizability and resilience against overfitting.
Considering the rotational invariance of microscopic images, the proposed approach
employs three specific mathematical operations for data augmentation: left-to-right
flipping, up-and-down flipping, and a 90-degree rotation. This strategy ensures that
the microscopic images can be analyzed from various angles, maintaining consistency
in breast cancer diagnosis without introducing variations. The proposed data
augmentation is illustrated in Algorithm 1.
Algorithm 1: Data Augmentation
For each target object from i = 1 to target object:
1. Read the input image.
2. Apply left-to-right flipping.
3. Apply up-and-down flipping.
4. Rotate the image by 90 degrees.
5. Save the image generated from step 2.
6. Save the image generated from step 3.
7. Save the image generated from step 4.
End loop
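As a concrete illustration, the following Python sketch implements the three transformations of Algorithm 1. It is illustrative only (the paper's experiments were run in MATLAB), and the function name and file-naming scheme are our own assumptions.

import numpy as np
from PIL import Image

def augment_image(path, out_prefix):
    # Step 1: read the input image as an array
    img = np.asarray(Image.open(path))
    variants = {
        "lr": np.fliplr(img),    # Step 2: left-to-right flipping
        "ud": np.flipud(img),    # Step 3: up-and-down flipping
        "rot90": np.rot90(img),  # Step 4: 90-degree rotation
    }
    # Steps 5-7: save the three generated images
    for tag, arr in variants.items():
        Image.fromarray(arr).save(f"{out_prefix}_{tag}.png")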

3.2 Feature Extraction using Pre-trained CNN

The CNN model comprises an input layer, convolutional layers, batch normalization
layers, pooling layers, ReLU activation layers, softmax layers, and a single output
layer. These layers work together to process the input data from the initial input layer
through to the final output layer, jointly giving the network its ability to recognize
patterns and make predictions. The input layer is characterized by x × y × z, where x
and y are the input image's width and height and z is the number of channels. The
first convolutional layer takes these three dimensions as its input. Within the
convolutional layer, feature maps are computed, which can be visualized and are
subsequently used by the activation layer. This process extracts relevant features from
the input image for further processing in the neural network.

3.2.1 Fine-tuned MobileNet V2


In computer vision, MobileNet V2 [18] is a convenient lightweight model. It
maintains comparable overall precision while using less memory and fewer
operations. It introduces the inverted residual layer with a linear bottleneck: the
low-dimensional compressed input is expanded into high-dimensional data and
filtered using lightweight depthwise convolutional filters. The model operates well in
many kinds of applications. It requires only a tiny quantity of cache memory, which
boosts the system while decreasing the primary-memory requirement in many
embedded hardware designs; thus, less primary memory is needed for classification
tasks. MobileNet V2 employs inverted residuals, linear bottlenecks, depthwise
convolution, and careful handling of the information flow. Two layers take the place
of the standard convolution layer. The first, known as the depthwise convolution,
employs a single convolution filter per input channel; this lightweight filtering sets
the stage for subsequent feature extraction and model learning. The second, the
pointwise convolution layer, creates fresh features by combining the input channels
through linear combinations. The information produced by the depthwise
convolutional layer exists as a manifold within the low-dimensional subspace of the
residual bottleneck, and this representation can be captured by reducing the
dimensions of the layers and the operational space.
ReLU, the nonlinear per-coordinate transformation used in deep convolutional
networks, interacts with this manifold in a way that defies intuition: if the manifold of
interest still has non-zero volume after the ReLU transformation, the operation
corresponds to a linear transformation, which motivates the linear bottleneck. On this
basis, the more memory-efficient inverted residuals are constructed. For the present
task, the final layers of MobileNet V2 are replaced with new layers matched to the
target dataset, and the refined model is trained via a transfer learning strategy with a
batch size of eight, a learning rate of 0.00001, and 100 epochs. Ultimately, the
improved model uses a global average pooling (GAP) layer to extract deep features
for further processing; this layer outputs a vector of size N × 1280.
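A minimal Keras sketch of this fine-tuning setup is given below, assuming a two-class (benign/malignant) head. The paper's implementation is in MATLAB, so this Python version is illustrative; the Adam optimizer and loss are our assumptions, while the batch size, learning rate, epoch count, and N × 1280 GAP feature size come from the text.

import tensorflow as tf

def build_finetuned_mobilenetv2(num_classes=2, input_shape=(224, 224, 3)):
    # ImageNet-pretrained backbone; the original top layers are replaced
    base = tf.keras.applications.MobileNetV2(
        include_top=False, weights="imagenet", input_shape=input_shape)
    # Global average pooling produces the N x 1280 deep-feature vector
    gap = tf.keras.layers.GlobalAveragePooling2D(name="gap")(base.output)
    out = tf.keras.layers.Dense(num_classes, activation="softmax")(gap)
    model = tf.keras.Model(base.input, out)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model

# model.fit(train_images, train_labels, batch_size=8, epochs=100)
# Deep features can then be read from the GAP layer:
# feats = tf.keras.Model(model.input, model.get_layer("gap").output)  # (N, 1280)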

3.2.2 Thin MobileNet


In the Thin MobileNet [20], separable convolution is used in place of separate
depthwise and pointwise convolutions, drop activation replaces ReLU, random
erasing serves as a data augmentation approach, and several unnecessary layers are
eliminated, in contrast to the fine-tuned MobileNet V2. The pointwise and depthwise
layers are merged into a single layer using separable convolution, eliminating the
need for separate layer definitions; the depthwise initializer, regularizer, and
constraints are specified inside the same function as their pointwise counterparts.
This keeps the fundamental operation of the depthwise separable convolution intact
while cutting the network down to 14 layers, without hurting accuracy. The ReLU
activation layer is then replaced with a drop-activation layer to make the model more
accurate, robust, and compatible. During training, the non-linear function is randomly
switched on and off, introducing randomness into the activation function (AF): the
AF's nonlinearity is dropped with probability (1 − P) and retained with probability P.
ReLU is the non-linear activation function considered here, and the drop-activation
layer changes how ReLU is applied to the network, improving accuracy and reducing
overfitting. Furthermore, some unnecessary layers with the same output shape of
(2, 2, 512) are removed from the network; these account for 41% of all parameters, so
their removal greatly decreases the model size without compromising accuracy. The
final modified MobileNet architecture is displayed in Fig. 2.
Fig. 2. Architecture of MobileNet
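To make the drop-activation mechanism concrete, the sketch below implements it as a custom Keras layer. The keep probability P = 0.95 is our own illustrative choice (the paper does not state a value): during training the ReLU nonlinearity is kept element-wise with probability P, and at inference the deterministic expected mixture P·ReLU(x) + (1 − P)·x is used.

import tensorflow as tf

class DropActivation(tf.keras.layers.Layer):
    """Drops the ReLU nonlinearity with probability 1 - keep_prob during training."""
    def __init__(self, keep_prob=0.95, **kwargs):
        super().__init__(**kwargs)
        self.keep_prob = keep_prob

    def call(self, x, training=None):
        if training:
            # Keep ReLU with probability P; pass x through unchanged otherwise
            mask = tf.cast(tf.random.uniform(tf.shape(x)) < self.keep_prob, x.dtype)
            return mask * tf.nn.relu(x) + (1.0 - mask) * x
        # Deterministic expected activation at inference time
        return self.keep_prob * tf.nn.relu(x) + (1.0 - self.keep_prob) * x

# The merged depthwise + pointwise step can use Keras's built-in separable layer:
# x = tf.keras.layers.SeparableConv2D(128, 3, padding="same")(x)
# x = DropActivation()(x)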

3.2.3 Reduced MobileNet V2


Reduced MobileNet V2 (RMNv2) is a convolutional neural network architecture that
decreases the model size and computational complexity of the original MobileNet
V2. To make MobileNet V2 more compatible [21], some downsampling layers of the
original network are removed by replacing stride 2 with stride 1. Heterogeneous
kernel-based convolutions (HetConv), which decrease the number of parameters
while increasing the receptive field, then take the place of the bottleneck. The
architecture of Reduced MobileNet V2 is displayed in Fig. 3. The network reduces
the number of parameters while maintaining accuracy by using a heterogeneous
kernel of size 3 × 3. The mix of kernel types is controlled by the variable p: a fraction
1/p of the kernels have size k × k, while the remaining (1 − 1/p) have a kernel size of
1 × 1. The computing costs for the k × k kernels and the 1 × 1 kernels at a layer L are
therefore expressed in Equations (1) and (2):

cost1 = (Ai × Ai × Ci × Co × k × k) ÷ p    (1)

cost2 = (Ao × Ao × Co) × (Ci − Ci ÷ p)    (2)

where Ai is the spatial height and width of the input feature map and Ci is the number
of input channels; Ao is the spatial height and width of the output feature map and Co
is the number of output channels. The output feature map is generated by applying
Co filters of size k × k, and the total cost is given by Equation (3):

cost = cost1 + cost2    (3)

The overall reduction relative to a standard convolution is then expressed in
Equation (4):

R_HetConv = (1 ÷ p) + ((1 − (1 ÷ p)) ÷ k²)    (4)

Furthermore, an activation function introduces the crucial property of nonlinearity
into a neural network. The Mish activation function, built on the softplus function
ln(1 + eˣ), is employed in the Reduced MobileNet V2 to enhance gradient flow and
non-linearity, as expressed in Equation (5):

f(x) = x × tanh(ln(1 + eˣ))    (5)

Fig. 3. Structure of reduced MobileNet V2
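The following small Python functions evaluate Equations (1)-(5) directly; the function names are ours and the default values k = 3 and p = 4 are illustrative assumptions, not values taken from the paper.

import math

def hetconv_cost(Ai, Ao, Ci, Co, k=3, p=4):
    cost1 = (Ai * Ai * Ci * Co * k * k) / p   # Eq. (1): k x k kernels, fraction 1/p
    cost2 = (Ao * Ao * Co) * (Ci - Ci / p)    # Eq. (2): remaining 1 x 1 kernels
    return cost1 + cost2                      # Eq. (3): total cost

def hetconv_reduction(k=3, p=4):
    # Eq. (4): cost relative to a standard k x k convolution
    return 1 / p + (1 - 1 / p) / (k * k)

def mish(x):
    # Eq. (5): x * tanh(softplus(x)), with softplus(x) = ln(1 + e^x)
    return x * math.tanh(math.log1p(math.exp(x)))

# Example: with k = 3 and p = 4, hetconv_reduction() is about 0.33,
# i.e. roughly a threefold saving over a standard 3 x 3 convolution.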

3.2.4 Transfer learning

A CNN trained from scratch usually requires enormous volumes of data, but
assembling a large data collection for the problem at hand can be challenging in
certain situations. While the goal is to have matching training and testing data,
achieving this is often impractical. Consequently, the notion of transfer learning has
been introduced. Transfer learning is among the best-known machine learning
methods; it uses prior knowledge gained from solving one problem to solve other
related problems. Using a model trained on a particular task as the basis for a model
on another task is what defines transfer learning. It is helpful when there is little data
for the second task or when the tasks are comparable to each other. By reusing the
characteristics learned on the first task, the framework can learn faster and more
efficiently on the following task. This can also mitigate overfitting, as the system will
already have acquired generic features that are likely to be beneficial in subsequent
operations. The pre-training process and the transfer technique make it possible to
import neural network parameters learned from large real-world imaging datasets,
which is feasible in part because natural image datasets and the target images share
similar low-level structure within their respective categories. A CNN architecture can
thus be trained on the large and complex ImageNet dataset to obtain well-trained
parameters, which is essential for initializing the subsequent classification. As a
result, the pre-training technique enables the MobileNet variants to classify breast
cancer from the data.

3.3 Breast Cancer Classification using Optimized CNN with the Osprey Optimization Algorithm (OOA)
Recently, CNNs have been among the most effective neural networks for image
categorization and processing. Like other feed-forward neural network (FFN)
models, a CNN permits a signal to travel in only one direction within the network,
never returning to the input node. CNN is one of the best ML methods for medical
image analysis since it preserves spatial correlations even after filtering the input
image, and these relationships are highly valued in the field of medical analytics. In a
CNN, the image analysis operation known as convolution combines two inputs. The
first is the pixel values of the features extracted by the three pre-trained CNNs; the
second is a numerical array known as a kernel (or filter). The outcome of the two
operands is their dot product. The stride length then determines how the kernel
moves across the image, and the calculation is repeated until the whole image is
covered, producing feature maps, sometimes called activation maps. The resulting
feature maps are then used as inputs to the subsequent layer. Sparse connections,
weight sharing, and invariant representation make convolution computationally
efficient. A CNN employs sparse connections, transmitting only a subset of the
outputs of each layer to the next one, unlike fully connected networks, which link
every neuron in one layer to every neuron in the next. As the portion of the image
covered by the kernel at each step shrinks, the algorithm gradually learns the
significant attributes with far fewer weights, improving performance. In addition to
lowering computing costs and helping to avoid the vanishing-gradient issue, the
ReLU layer speeds up training. Average pooling is placed after the ReLU layer;
maximum and average pooling are the commonly used methods because they
decrease the image dimensions and the number of parameters. The fully connected
layer connects all neurons in the preceding layer to all neurons in the subsequent
layer and uses the output of the previous layers (pooling, ReLU, or convolutional) to
calculate the probabilities of the different classes. The CNN learns to recognize
cancer cells from images by training on previously labelled images with standard
methods such as stochastic gradient descent and backpropagation.
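A minimal Keras sketch of such a classification head is shown below, operating on stacked feature maps from the pre-trained extractors. The input shape (7, 7, 1280), layer sizes, and SGD settings are illustrative assumptions rather than the paper's exact MATLAB configuration.

import tensorflow as tf

def build_classifier_cnn(input_shape=(7, 7, 1280), num_classes=2):
    return tf.keras.Sequential([
        tf.keras.Input(shape=input_shape),
        tf.keras.layers.Conv2D(64, 3, padding="same"),  # kernel slides over the input by the stride
        tf.keras.layers.ReLU(),                         # speeds training, avoids vanishing gradients
        tf.keras.layers.AveragePooling2D(),             # reduces spatial dimensions and parameters
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(num_classes, activation="softmax"),  # class probabilities
    ])

# Standard supervised training with SGD and backpropagation on labelled images:
# model = build_classifier_cnn()
# model.compile(optimizer="sgd", loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_features, train_labels, batch_size=8, epochs=100)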

3.3.1 Osprey Optimization Algorithm (OOA)


The OOA is a novel bio-inspired metaheuristic technique that mimics the hunting
behaviour of ospreys, fish-eating birds [22]. OOA has been shown to perform well on
various benchmark functions and real-world problems, such as engineering design,
economic load dispatch, image segmentation, and classification. Compared with
other popular optimization approaches such as GA, PSO, DE, WOA, and SCA, OOA
demonstrates superior convergence speed and accuracy, making it a promising
algorithm for solving complex and nonlinear optimization problems. Based on a
mathematical model of osprey behaviour during hunting, the OOA technique is
separated into two phases, exploration and exploitation, each expressed
mathematically below.
At the start of OOA, the locations of the ospreys are randomly initialized in the
search space using Equation (6):

x_ij = lb_j + r_ij · (ub_j − lb_j)    (6)

where X is the population matrix of osprey locations, x_ij is the jth dimension of the
ith osprey's position, r_ij is a random number in the interval [0, 1], and ub_j and lb_j
are the upper and lower bounds of the jth problem variable.
The objective function values are the primary criterion used to assess the quality of
the candidate solutions: the best candidate solution corresponds to the best value
obtained for the objective function, and the worst candidate solution to the worst
value. In each iteration, the ospreys' positions in the search space are updated,
necessitating a corresponding update of the best candidate solution.

Exploration phase: Detecting the position of a fish and pursuing it (exploration) is the
first phase of updating the population in the OOA. In this phase, each osprey
randomly detects the position of another population member with a better objective
function value and regards it as a fish; the osprey then approaches the fish, dives
under the water, and captures it. The osprey's position undergoes substantial changes
in this process, which boosts OOA's exploration capacity for locating the primary
optimum region and avoiding local optima. The mathematical model of this phase is
given by Equations (7)-(9):

FP_i = { X_k | k ∈ {1, 2, …, N} ∧ F_k < F_i } ∪ { X_best }    (7)

x_ij^P1 = x_ij + r_ij · (SF_ij − I_ij · x_ij)    (8)

X_i = X_i^P1 if F_i^P1 < F_i; otherwise X_i    (9)

where FP_i is the set of fish positions for the ith osprey, SF_i is the fish selected by
the ith osprey, x_i^P1 is the new position of the ith osprey, x_ij^P1 is its jth
dimension, F_i^P1 is its objective function value, r_ij is a random number in the
range [0, 1], and I_ij is a random number from the set {1, 2}.

Exploitation phase: The exploitation phase, represented in Equation (10), models the
osprey's behaviour of carrying the caught fish to a secure area for consumption, and
it updates the OOA population in this period. It produces minor adjustments in the
search space, boosting OOA's capacity for local search and accelerating convergence
toward improved solutions. The osprey's previous position is then replaced according
to Equation (11) whenever the move improves the objective function:

x_ij^P2 = x_ij + (lb_j + r · (ub_j − lb_j)) / t    (10)

X_i = X_i^P2 if F_i^P2 < F_i; otherwise X_i    (11)

where x_i^P2 is the new position of the ith osprey, x_ij^P2 is its jth dimension,
F_i^P2 is its objective function value, r is a random number in the range [0, 1], t is
the algorithm's iteration counter, and T is the total number of iterations.

3.4 Hyperparameter Tuning


Through a dynamic testing and training process, the best parameters for the
pre-trained CNNs are selected. The dataset is split into training and test subsets, and
the best-fit hyperparameters are obtained using a recursive k-fold cross-validation
procedure, which ensures the most accurate predictions. The chosen hyperparameters
of the pre-trained networks are shown in Table 1; these factors determine the CNN
adopted for feature classification. Table 2 lists the training hyperparameters of the
network, which are trained and optimized with the suggested optimization approach.
The outcome of the optimization process is the combination of criteria defining the
CNN architecture, together with the values of the training hyperparameters, that
attains the highest classification accuracy.
Table 1. Pre-trained network's hyperparameters

Hyperparameter | Range
No. of filters | [16, 32, 64, 96]
Size of kernel | [3, 4, 5]
Fully connected layer | [60, 100, 125]
Activation layer | [ReLU, drop, Mish]

Table 2. Trained network's hyperparameters

Hyperparameter | Possible values
Learning rate | [0.03, 0.003, 0.01, 0.001]
Batch size | [0.01, 0.001, 0.03, 0.003]
Dropout | [0.2 to 0.6]
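As an illustration of this selection procedure, the sketch below grid-searches a subset of the candidate values from Tables 1 and 2 under k-fold cross-validation. Here `build_and_eval` is a hypothetical callback that trains a candidate CNN on one fold split and returns its validation accuracy; it is not part of the paper's code.

from itertools import product
import numpy as np
from sklearn.model_selection import StratifiedKFold

GRID = {                                   # candidate values drawn from Tables 1 and 2
    "n_filters": [16, 32, 64, 96],
    "kernel_size": [3, 4, 5],
    "learning_rate": [0.03, 0.003, 0.01, 0.001],
}

def kfold_score(build_and_eval, X, y, params, k=10, seed=0):
    # Mean validation accuracy of one hyperparameter setting under k-fold CV
    skf = StratifiedKFold(n_splits=k, shuffle=True, random_state=seed)
    scores = [build_and_eval(X[tr], y[tr], X[va], y[va], **params)
              for tr, va in skf.split(X, y)]
    return float(np.mean(scores))

def select_hyperparameters(build_and_eval, X, y):
    keys = list(GRID)
    settings = (dict(zip(keys, vals)) for vals in product(*GRID.values()))
    return max(settings, key=lambda p: kfold_score(build_and_eval, X, y, p))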

4 Results and Discussion


The experimental evaluation of the proposed approach involves utilizing the BreakHis
dataset, which is partitioned into ten folds. During each iteration, the model is trained
on nine folds, and the tenth fold is used as the validation set. This ten-fold cross-
validation process is repeated multiple times to ensure comprehensive assessment. To
assess the performance of the proposed method, various metrics are computed,
including sensitivity, F1-score, AUC, precision, false positive rate, accuracy, and
computational time. These metrics quantify the ability of the model to accurately predict BCI
class labels. The implementation and evaluation of the proposed model are conducted
in MATLAB 2020 on a computer equipped with 16 GB RAM, an Intel i7-8700
processor, and a 4GB graphics card. This technical setup ensures rigorous testing and
analysis of the model's performance under different conditions.
4.1 Dataset

Fig. 4. Sample images of the dataset (benign and malignant classes)


The suggested approach was assessed using the BreakHis dataset
(https://www.kaggle.com/datasets/ambarish/breakhis). BreakHis comprises 7,909
microscopic images of breast tumor tissue from 82 individuals, captured at a variety
of magnification levels (40X, 100X, 200X, and 400X), of which 5,429 samples are
malignant and 2,480 benign. Fig. 4 shows some of the dataset's sample images.

4.2 Performance metrics


Standard performance metrics such as precision, accuracy, and recall can be used to
quantify the efficacy and efficiency of a classification algorithm. The model's
accuracy is the ratio of correct predictions to the total number of predictions made.
The mathematical expressions for the performance metrics are given in
Equations (12)-(16):
Accuracy = (TP + TN) / (TP + TN + FP + FN)    (12)

Precision = TP / (TP + FP)    (13)

Recall = TP / (TP + FN)    (14)

Specificity = TN / (TN + FP)    (15)

Sensitivity = TP / (TP + FN)    (16)

4.3 Evaluation of Feature Selection


Table 3 presents the evaluation of the features derived by transfer learning for the
classification of malignant and benign samples during testing and training. From the
analysis, all three models obtained accuracy scores higher than 0.90, suggesting that
they are generally capable of producing accurate predictions. Accuracy can, however,
be misleading when one class is far more common than the other, i.e., when the
dataset is unbalanced. Sensitivity and specificity are therefore very important for
binary classification problems such as the categorization of breast cancer: sensitivity
quantifies how many true positives the model accurately detects, whereas specificity
quantifies how many true negatives are accurately detected. The models' sensitivity
values in this instance varied from 0.915 to 0.991, suggesting that they are capable of
identifying true positive cases with similar levels of performance. The models'
specificity values, which ranged from 0.852 to 0.941, showed that they could
accurately detect true negative cases with varying success. It is significant to
remember that the choice of decision threshold can affect sensitivity and specificity,
and various thresholds may yield varying performance levels. Since the precision and
NPV measures reveal the frequency of FP and FN, they are also pertinent assessment
metrics for binary classification problems: precision is the percentage of positively
classified cases that are truly positive, while the NPV is the percentage of negatively
classified cases that are truly negative. The precision in this instance ranged from
0.893 to 0.984, showing that the models' rates of FP predictions are comparatively
low, and with FPR values ranging from 0.021 to 0.072 the models raise relatively few
false alarms. Lastly, the F-score is a metric that combines recall and precision into a
single number and offers a useful summary of how well the model performed overall
in classifying negative and positive cases. The F1-score values in this instance varied
from 0.914 to 0.979, showing that the models' recall and precision are comparable.
This performance gives a comprehensive description of how successfully the models
under investigation classify breast cancer.

Table 3. Performance of the derived features

Model | Data Splitting (Training-Testing) | Class Type | Accuracy | Sensitivity | Specificity | Precision | F1-Score | AUC | FPR | Time (s)
Fine-tuned MobileNet V2 | 70%-30% | B | 0.974 | 0.991 | 0.941 | 0.984 | 0.979 | 0.98 | 0.021 | 64.5
Fine-tuned MobileNet V2 | 70%-30% | M | 0.954 | 0.981 | 0.924 | 0.951 | 0.947 | 0.97 | 0.045 | 59.7
RMV2 | 70%-30% | B | 0.946 | 0.961 | 0.908 | 0.937 | 0.943 | 0.95 | 0.052 | 64.6
RMV2 | 70%-30% | M | 0.931 | 0.928 | 0.893 | 0.928 | 0.916 | 0.93 | 0.058 | 85.4
Thin MobileNet | 70%-30% | B | 0.925 | 0.945 | 0.884 | 0.916 | 0.925 | 0.92 | 0.067 | 98.9
Thin MobileNet | 70%-30% | M | 0.908 | 0.915 | 0.852 | 0.893 | 0.914 | 0.90 | 0.072 | 128.7

4.4 Evaluation of Classification


The outcomes of classification using the recommended OOA-CNN are shown in
Table 4, along with comparisons of the standard CNN and CNNs enhanced with
other optimization techniques. The OOA-CNN model outperformed the other six
models under evaluation with 98.7% accuracy, indicating that it was the most
accurate at classifying breast cancer; the accuracy scores of the other models ranged
from 0.872 to 0.964. Additional details regarding the dataset and the particular task
would be required to put these results fully into context. These outcomes indicate
that the suggested OOA-CNN model is a promising method for classifying breast
cancer.

Table 4. Performance of the models

Model | Accuracy | Sensitivity | Specificity | Precision | F1-Score | AUC | FPR | Time (s)
CNN | 0.872 | 0.882 | 0.874 | 0.881 | 0.886 | 0.89 | 0.126 | 91.56
WOA-CNN | 0.898 | 0.894 | 0.888 | 0.899 | 0.907 | 0.91 | 0.094 | 77.23
PSO-CNN | 0.905 | 0.909 | 0.912 | 0.896 | 0.912 | 0.91 | 0.087 | 65.96
GWO-CNN | 0.921 | 0.937 | 0.926 | 0.934 | 0.924 | 0.92 | 0.064 | 48.59
BER-CNN | 0.948 | 0.941 | 0.935 | 0.952 | 0.955 | 0.95 | 0.050 | 34.86
ABER-CNN | 0.964 | 0.979 | 0.959 | 0.970 | 0.973 | 0.97 | 0.023 | 27.97
OOA-CNN | 0.987 | 0.982 | 0.988 | 0.984 | 0.991 | 0.99 | 0.012 | 18.53

Fig. 5 displays the confusion matrix illustrating the outcomes of the OOA-CNN
method in classifying breast cancer. The matrix provides a strong visual
representation of the model's accuracy, affirming its effectiveness in the medical field.
The accurate classification of breast cancer underscores the robust performance of the
suggested approach in contributing to the field of diagnostic methodologies.

Fig. 5. Performance of the confusion matrix

Fig. 6 shows the accuracy curves during training and validation. The graph reveals
that the proposed model attains a high accuracy rate in both phases because the
method extracts more information from the existing data using transfer learning: it
creates new features from the existing ones, which help the model learn to classify
breast cancer better, and overfitting is avoided by the cross-validation used to divide
the dataset. The graph in Fig. 7 then illustrates the trajectory of the loss function for
the suggested approach throughout the training and validation phases. Both losses
decrease sharply, indicating that the model is learning and stabilizing, because the
hyperparameters of the classification model are optimized using the advanced
optimization algorithm to achieve better results with less loss.
Fig. 6. Performance of the Accuracy

Fig. 7. Performance of the Loss function

Table 5. Evaluation of state-of-the-art methods

Reference | Model | Accuracy (%) | Year
[18] | Ensemble of DNNs | 95.29 | 2020
[23] | ResNet50 | 96.97 | 2020
[24] | CNN-FNN | 91.00 | 2020
[25] | GAN | 87.60 | 2021
[26] | myResNet-34 | 94.03 | 2021
[27] | AE+Siamese Network | 96.70 | 2022
[28] | Ensemble transfer learning | 98.90 | 2023
[29] | GARL-Net | 99.49 | 2023
Proposed | OOA-CNN | 99.51 | 2023

A comprehensive comparison of various state-of-the-art methods for breast cancer
classification is tabulated in Table 5, emphasizing the proposed OOA-CNN model.
The accuracy percentages and publication years reveal that the OOA-CNN excels
with an impressive accuracy of 99.51%. This outperformance is evident when
compared to earlier models like Ensemble of DNNs, ResNet50, CNN-FNN, GAN,
myResNet-34, and AE+Siamese Network. The consistently high accuracy positions
the OOA-CNN as a robust and advanced solution for breast cancer classification,
showcasing its potential for enhanced diagnostic accuracy. The results demonstrate
that the OOA-CNN model outperforms existing benchmarks, establishing a new
standard in the field.

5 Conclusion
This article addresses the imperative need for enhanced breast cancer classification
methods by introducing a novel approach leveraging advanced pre-trained
Convolutional Neural Network (CNN) architectures and innovative techniques.
Through rigorous data pre-processing, augmentation, and feature extraction using
fine-tuned MobileNetV2, Thin MobileNet, and Reduced MobileNetV2, the proposed
methodology demonstrates improved accuracy in breast cancer identification.
Transfer learning further refines CNN models, and the integration of the Osprey
Optimization Algorithm (OOA) significantly enhances classification performance
while minimizing computational costs. The experimental evaluation, conducted on the
BreakHis dataset, employs ten-fold cross-validation and various performance metrics,
showcasing the proficiency of the proposed method. Finally, the proposed method is
evaluated against the most advanced methods in the field, highlighting its competitive
edge. From the evaluation, it has been found that the suggested framework improves
classification efficiency without requiring new training, yielding outstanding
outcomes. In future work, more advanced neural network architectures will be
explored for classification.

Acknowledgements. All authors named on the title page have made substantial
contributions to the research, have reviewed the manuscript, confirm the integrity and
valid interpretation of the data, and consent to its submission.

References
1. Karabatak, M.: A new classifier for breast cancer detection based on Naïve
Bayesian. Measurement 72, 32-36 (2015).
2. Rane, N., Sunny, J., Kanade, R., Devi, S.: Breast cancer classification and
prediction using machine learning. International Journal of Engineering
Research and Technology 9(2), 576-580 (2020).
3. Motlagh, M. H., Jannesari, M., Aboulkheyr, H., Khosravi, P., Elemento, O.,
Totonchi, M., Hajirasouliha, I.: Breast cancer histopathological image
classification: A deep learning approach. BioRxiv 242818 (2018).
4. Zhuang, F., Qi, Z., Duan, K., Xi, D., Zhu, Y., Zhu, H., He, Q.: A comprehensive
survey on transfer learning. Proceedings of the IEEE 109(1), 43-76 (2020).
5. Byra, M., Galperin, M., Ojeda Fournier, H., Olson, L., O'Boyle, M., Comstock,
C., Andre, M.: Breast mass classification in sonography with transfer learning
using a deep convolutional neural network and color conversion. Medical
physics 46(2), 746-755 (2019).
6. Khan, S., Islam, N., Jan, Z., Din, I. U., Rodrigues, J. J. C.: A novel deep learning
based framework for the detection and classification of breast cancer using
transfer learning. Pattern Recognition Letters 125, 1-6 (2019).
7. Ragab, D. A., Sharkas, M., Marshall, S., Ren, J.: Breast cancer detection using
deep convolutional neural networks and support vector machines. PeerJ 7,
e6201 (2019).
8. Suzuki, S., Zhang, X., Homma, N., Ichiji, K., Sugita, N., Kawasumi, Y.,
Yoshizawa, M.: Mass detection using deep convolutional neural network for
mammographic computer-aided diagnosis. 55th Annual conference of the
society of instrument and control engineers of Japan (SICE), Tsukuba, Japan,
pp. 1382-1386 (2016).
9. Deniz, E., Şengür, A., Kadiroğlu, Z., Guo, Y., Bajaj, V., Budak, Ü.: Transfer
learning based histopathologic image classification for breast cancer
detection. Health information science and systems 6, 1-7 (2018).
10. Farhadi, A., Chen, D., McCoy, R., Scott, C., Miller, J. A., Vachon, C. M.,
Ngufor, C.: Breast cancer classification using deep transfer learning on
structured healthcare data. IEEE International Conference on Data Science and
Advanced Analytics (DSAA), Washington, DC, USA, pp. 277-286 (2019).
11. Zhu, W., Braun, B., Chiang, L. H., Romagnoli, J. A.: Investigation of transfer
learning for image classification and impact on training sample size.
Chemometrics and Intelligent Laboratory Systems 211, 104269 (2021).
12. Samee, N. A., Alhussan, A. A., Ghoneim, V. F., Atteia, G., Alkanhel, R., Al-
Antari, M. A., Kadah, Y. M.: A Hybrid Deep Transfer Learning of CNN-Based
LR-PCA for Breast Lesion Diagnosis via Medical Breast
Mammograms. Sensors 22(13), 4938 (2022).
13. Alruwaili, M., Gouda, W.: Automated breast cancer detection models based on
transfer learning. Sensors 22(3), 876 (2022).
14. Khamparia, A., Bharati, S., Podder, P., Gupta, D., Khanna, A., Phung, T. K.,
Thanh, D. N.: Diagnosis of breast cancer based on modern mammography using
hybrid transfer learning. Multidimensional systems and signal processing 32,
747-765 (2021).
15. Aly, G. H., Marey, M., El-Sayed, S. A., Tolba, M. F.: YOLO based breast
masses detection and classification in full-field digital mammograms. Computer
methods and programs in biomedicine 200, 105823 (2021).
16. Zhang, Y. D., Satapathy, S. C., Guttery, D. S., Górriz, J. M., Wang, S. H.:
Improved breast cancer classification through combining graph convolutional
network and convolutional neural network. Information Processing &
Management 58(2), 102439 (2021).
17. Khafaga, D.: Meta-heuristics for feature selection and classification in diagnostic
breast cancer. Computers, Materials and Continua 73(1), 749-765 (2022).
18. Hameed, Z., Zahia, S., Garcia-Zapirain, B., Javier Aguirre, J., Maria Vanegas,
A.: Histopathology image classification using an ensemble of deep learning
models. Sensors 20(16), 4373 (2020).
19. Zahoor, S., Shoaib, U., Lali, I. U.: Breast cancer mammograms classification
using deep neural network and entropy-controlled whale optimization
algorithm. Diagnostics 12(2), 557 (2022).
20. Sinha, D., El-Sharkawy, M.: Thin mobilenet: An enhanced mobilenet
architecture. 10th annual ubiquitous computing, electronics & mobile
communication conference (UEMCON), New York, NY, USA, pp. 0280-0285
(2019).
21. Ayi, M., El-Sharkawy, M.: Rmnv2: Reduced mobilenet v2 for cifar10. 10th
Annual Computing and Communication Workshop and Conference
(CCWC), Las Vegas, NV, USA, pp. 0287-0292 (2020).
22. Dehghani, M., Trojovský, P.: Osprey optimization algorithm: A new bio-
inspired metaheuristic algorithm for solving engineering optimization problems.
Frontiers in Mechanical Engineering 8, 1126450 (2023).
23. Yari, Y., Nguyen, T. V., Nguyen, H. T.: Deep learning applied for histological
diagnosis of breast cancer. IEEE Access 8, 162432–162448 (2020).
24. Zaychenko, Y., Hamidov, G.: Medical images of breast tumors diagnostics with
application of hybrid CNN–FNN network. Syst. Res. Inf. Technol. 4, 37-47
(2018).
25. Wang, D., Chen, Z., Zhao, H.: Prototype transfer generative adversarial network
for unsupervised breast cancer histology image classification. Biomed. Signal
Process. Control, 68, 102713 (2021).
26. Hu, C., Sun, X., Yuan, Z., Wu, Y.: Classification of breast cancer
histopathological image with deep residual learning. Int. J. Imaging Syst.
Technol 31(3), 1583–1594 (2021).
27. Liu, M., He, Y., Wu, M., Zeng, C.: Breast histopathological image classification
method based on autoencoder and Siamese framework. Information 13(3), 107
(2022).
28. Zheng, Y., Li, C., Zhou, X., Chen, H., Xu, H., Li, Y., Grzegorzek, M.:
Application of transfer learning and ensemble learning in image-level
classification for breast histopathology. Intelligent Medicine 3(02), 115-128
(2023).
29. Patel, V., Chaurasia, V., Mahadeva, R., Patole, S. P.: GARL-Net: Graph Based
Adaptive Regularized Learning Deep Network for Breast Cancer
Classification. IEEE Access 11, 9095-9112 (2023).
