Breast Cancer Image Pre-Processing With Convolutional Neural Network For Detection and Classification


Aulia Arif Iskandar, S.T., M.T., Department of Biomedical Engineering, Swiss German University, Tangerang, Indonesia ([email protected])
Michael Jeremy, Department of Biomedical Engineering, Swiss German University, Jakarta, Indonesia ([email protected])
Muhammad Fathony, Ph.D, Department of Biomedical Engineering, Swiss German University, Tangerang, Indonesia ([email protected])

Abstract—Breast cancer is one of the most common types of cancer. This research was conducted with the purpose of developing a Computer-Aided Diagnosis system to detect breast cancers from mammogram images. The mammogram images were obtained from the INbreast dataset and Husada Hospital in Jakarta. The program was developed using a pre-processing pipeline, consisting of median filtering, Otsu thresholding, truncation normalization, and Contrast Limited Adaptive Histogram Equalization (CLAHE), to manipulate the images, and a Convolutional Neural Network (CNN) to classify the images as either mass or normal, or as either benign or malignant. The pre-processing pipeline provided enhanced images used to train and test the CNN. The best model achieved an accuracy, precision, and sensitivity of 94.1%, 100%, and 85.7% in classifying the mammogram images as benign or malignant, and of 88.3%, 92.6%, and 83.3% in classifying the mammogram images as mass or normal. In conclusion, the algorithm was able to classify mammogram images and provided results as high as those of other related research.

Keywords—Breast Cancer, Classification, Convolutional Neural Network, Image Processing, Mammography

I. INTRODUCTION

Among the common fatal diseases, cancer is one of the leading causes of millions of deaths each year. The human body is composed of trillions of cells that group together and perform millions of functions, making up a complete individual. Cancer arises from an accumulation of mutations that transforms normal cells into cancerous cells. Cells normally either repair such mutations or self-destruct before becoming cancerous. However, mutations can grow unchecked and accumulate, causing a cell to become cancerous and invade nearby cells and tissues. From nearby cells and tissues, the cancer can even metastasize to distant organs.

According to Cancer Statistics 2020, breast cancer accounts for 30% of cancer cases, with 276,480 new cases and 42,000 estimated deaths in 2020 [1]. In Indonesia alone, breast cancer occurs at an incidence rate of 42 cases per 100,000 women, and the mortality rate is found to be higher than the global average [2]. The primary reason for this higher-than-average mortality rate is the lack of early detection. Early detection of breast cancer is crucial, as breast cancer survival depends largely on effective and affordable treatments, where proper follow-ups and timely treatments can be conducted [3].

One of the methods aimed at early detection of breast cancer is widespread screening using mammography. Mammography screening results in a large quantity of mammogram images. Owing to the barrage of data obtained from screenings, it is difficult to maintain a highly accurate diagnosis when each mammogram image from widespread screening must be evaluated manually [2]. Misdiagnosis and misinterpretation by radiologists can also occur, caused by the fatiguing task of evaluating hundreds of images. In research conducted by [4], out of 943 samples collected, 15 breast cancers were detected; however, 7 (46.6%) of the 15 breast cancer cases were missed during assessment, and 3 (43%) of those were missed due to misinterpretation by the radiologists.

Numerous attempts have aimed to develop a Computer-Aided Diagnosis (CAD) system to help detect breast cancer and improve the accuracy of diagnosis. CAD is designed to help radiologists analyze images and to serve as a second opinion or supporting tool [5]. This study aims to develop an image processing method to pre-process mammogram images for breast cancer detection and classification using a CNN. This study used Otsu thresholding, median filtering, CLAHE, and truncation normalization. Data augmentation was also used to enlarge the dataset. Afterwards, a CNN is used to provide a positive or negative diagnosis from the mammogram images. This study also had the privilege of using mammogram images from a local hospital, Husada Hospital, located in Jakarta.

II. RESEARCH METHODS

A. Study Design

Fig 1. Study Design Block Diagram

Fig. 1 represents the block diagram of the algorithm developed in this study. The input image represents the mammogram image to be analyzed and predicted. To produce the prediction for the input image, the whole process includes two major steps: the "Image Pre-processing" step and the "CNN Classification" step. The pre-processing step was designed to manipulate and enhance the input image to further increase the performance of the CNN used in the classification step. The "CNN Classification" step then takes the pre-processed mammogram image and produces the predicted classification output for the corresponding image.

There are two different outputs of the classification step depending on which dataset is used: mass or normal, and benign or malignant.
A mass image is one where the breast contains a mass, while a normal image is one where the breast does not contain any mass. A malignant image is one where the breast has a malignant mass, while a benign image is one where the breast either does not contain a mass or has a mass that is not malignant.

Fig 2 represents a more detailed block diagram of the methods. The image pre-processing includes processing the image with median filtering, Otsu thresholding, truncation normalization, and CLAHE. Afterwards, the images were manipulated by flipping, padding, and resizing to meet the CNN input requirements. Data augmentation was used to enlarge and balance the data.

Fig 2. Research Methodology Block Diagram

The median filter works to denoise the image. Through median filtering, the edges of the breast were better preserved before any further pre-processing, and the amount of noise present within the image was reduced. The kernel size for the median filter was set to 3x3.

Using Otsu thresholding, the breast region, or foreground, was separated from the black background of the mammogram image. The main purpose of using Otsu thresholding is to separate the foreground from the background, obtain the masked image, and then use the generated bounding box to crop the image so that only the foreground region remains.

Afterwards, the contrast of the image is enhanced through truncation normalization and CLAHE. Truncation normalization was used because a standard normalization would induce an adverse effect on the image. Two CLAHEs with two different clip limits were used to further differentiate any mass from the surrounding breast tissue. The processed image was synthesized from the image obtained from truncation normalization, the CLAHE output with clip limit 1.0, and the CLAHE output with clip limit 2.0. These three images were merged to form a 3-channel image, which was afterwards converted into a grayscale image.

Data augmentation was used to enlarge the number of mammogram images in the datasets. This step is essential due to the lack of mammogram images available for this study. The data augmentation uses the Augmentor library for Python, and each mammogram image was flipped either right or left, or rotated either 90° or 270°.
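As a minimal sketch, the augmentation step described above might look as follows with the Augmentor library; the source folder, probabilities, and sample count are illustrative, not values reported in this study.

```python
import Augmentor

# Build an augmentation pipeline over one class folder (path is hypothetical).
p = Augmentor.Pipeline("dataset/train/Mass")
p.flip_left_right(probability=0.5)   # flip either right or left
p.rotate90(probability=0.25)         # rotate 90 degrees
p.rotate270(probability=0.25)        # rotate 270 degrees
p.sample(500)                        # illustrative count to enlarge and balance the class
```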
The CNN used a transfer learning method with the DenseNet201 architecture [6] and a customized classification layer. The training was set to 30 epochs with a learning rate of 1e-4, and the minimum learning rate was set to 1e-7. Fig. 3 represents how the CNN produces its predictions based on the processed mammogram images.

Fig 3. CNN Classification
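A sketch of this transfer-learning setup in Keras, assuming ImageNet weights, a 224x224 input, a global-average-pooling head with a sigmoid output for the two-class problem, and a ReduceLROnPlateau callback to realize the 1e-7 minimum learning rate; the head layout and callback choice are assumptions, since the paper only reports the architecture, epochs, and learning rates.

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import DenseNet201

# DenseNet201 backbone [6] with a customized classification layer.
base = DenseNet201(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(1, activation="sigmoid"),  # mass/normal or benign/malignant
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="binary_crossentropy", metrics=["accuracy"])

# Allow the learning rate to decay down to the reported minimum of 1e-7.
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", min_lr=1e-7)
# model.fit(train_ds, validation_data=val_ds, epochs=30, callbacks=[reduce_lr])
```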
This study was conducted using an Acer Aspire 5 with an Intel® Core(TM) i3-1005G1 CPU @ 1.20 GHz (10th Gen), 12.0 GB RAM, and a Windows 64-bit OS on an x64-based processor. The code was written in Jupyter Notebook 6.4.5 for the image pre-processing, using Python 3.9.7, and on Kaggle for the CNN training and testing, using Python 3.7.12 with a GPU accelerator. The OpenCV version used was OpenCV Python 4.5.5.62.

B. Data

Before the CNN could be used to classify mammogram images, sufficient images were needed to train it. The training results were saved in the form of weights, which were then used for the classification process. In this study, the mammogram images were obtained from the INbreast dataset [7], and mammogram images were also obtained from Husada Hospital in Jakarta. The INbreast dataset consists of 115 cases. Each case has either four images (left and right, CC and MLO views) or two images (left or right, CC or MLO view).

Out of the 115 cases, 90 cases were from women with both breasts assessed, which results in four images per case, and 25 cases were from mastectomy patients, which results in two images per case. Therefore, the INbreast dataset has 410 images in total. Several types of lesions are included within the images from the INbreast dataset: masses, calcifications, asymmetries, and distortions.

In this study, only breasts that do not have any findings and breasts that have been assessed to have masses were selected. Out of the 410 images, the INbreast dataset has in total 108 images with a mass finding within the breast. Breasts without any findings account for 68 images. Therefore, the total number of images used from the INbreast dataset was 176.

This study was also provided with mammogram images from Husada Hospital, a hospital located in Jakarta. The mammogram images from Husada Hospital were obtained from 15 patients, dated from September 2021 until February 2022. In total, 64 images were obtained from Husada Hospital; however, 2 images were not used since they were defective. Therefore, the total number of mammogram images used from Husada Hospital was 62.
Fig 4. Datasets

The folder structure of the datasets developed in this research can be seen in Fig 4. The bolded text represents a folder, whereas non-bolded text represents the files within the folder. The files are all mammogram images obtained from the INbreast dataset and Husada Hospital in Jakarta. Two datasets were developed in this study; each dataset consisted of a parent folder, which contained train and validation folders. The train and validation folders each contained two separate folders representing the classes for the classification process. The Mass Normal Dataset contains 2 classes: Normal (mammogram images without mass findings) and Mass (mammogram images with mass findings). The Benign Malignant Dataset also contains 2 classes: Benign (BI-RADS 1–3) and Malignant (BI-RADS 4–6). The classification output of the CNN depends on which dataset was used.
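For illustration, the layout described above can be sketched as the following directory tree (the folder names are abbreviations of the structure in Fig 4; the class folders hold the mammogram image files):

```
Mass Normal Dataset/
├── train/
│   ├── Mass/
│   └── Normal/
└── validation/
    ├── Mass/
    └── Normal/
Benign Malignant Dataset/
├── train/
│   ├── Benign/
│   └── Malignant/
└── validation/
    ├── Benign/
    └── Malignant/
```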
III. RESULTS AND DISCUSSION

In this section, the results from the image pre-processing and the CNN training and testing are discussed. The analysis methods used to assess the results of image pre-processing were visual description and image histograms. The results of CNN training are assessed using learning curves, and the results of CNN testing are assessed using confusion matrices. The accuracy obtained by the CNN was used as the main parameter of comparison and analysis.

A. Image Pre-processing Results

The first step of the image pre-processing pipeline was the median filter, with a kernel size of 3x3. Fig. 5 shows the mammogram image before (a) and after (b) the median filter. Median filtering was used to denoise the image by blurring it, while also preserving the edges of the breast in the mammogram to avoid losing them in further pre-processing.

Fig 5. Output of Median Filtering: (a) Original Image; (b) Median Filtered
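In OpenCV terms, this step amounts to a single call; the sketch below assumes a grayscale mammogram and uses the 3x3 kernel reported above (the file path is hypothetical).

```python
import cv2

# Load the mammogram as a single-channel image and apply a 3x3 median filter.
image = cv2.imread("mammogram.png", cv2.IMREAD_GRAYSCALE)
filtered = cv2.medianBlur(image, 3)  # ksize=3 -> 3x3 window
```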
As seen in the filtered image, the majority of the image is still composed of the background, which offers no valuable information. The background region is composed of millions of pixels with intensity value 0. To reduce this unused information, Otsu thresholding was used to reduce the amount of background and obtain the breast region from the mammogram image. The Otsu thresholding method separates the foreground from the background through a binarization process.

The separation of foreground and background resulted in a masked image, which was used to obtain the bounding box of the breast region. Using the obtained bounding box, the image was then cropped. In other words, the cropping process using this Otsu thresholding method is adaptive to each mammogram image. Fig 6 shows an example of a mammogram image after cropping: originally, the size of the image was 3328 x 4084 pixels; after cropping, it was reduced to 1493 x 2600 pixels.

Fig 6. Output of Otsu Thresholding
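A minimal sketch of this adaptive cropping, assuming the median-filtered image from the previous step; Otsu's threshold is computed by OpenCV, and the bounding box is taken over all foreground pixels of the resulting mask.

```python
import cv2

# Binarize with Otsu's method: breast (foreground) vs. black background.
_, mask = cv2.threshold(filtered, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Bounding box of the foreground mask; the crop adapts to each mammogram.
x, y, w, h = cv2.boundingRect(mask)
cropped = filtered[y:y + h, x:x + w]
```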
Enhancing the contrast of the image was done through truncation normalization and CLAHE. The truncation normalization method consists of two processes: a truncation process and a normalization process. Truncation normalization allows for contrast enhancement with minimal noise amplification, which is possible due to the truncation process prior to normalizing the image. The truncation process follows the condition in (1), where P is the intensity value of a pixel, Pmax is the cutoff for the largest 1% of the intensity values, and Pmin is the cutoff for the smallest 5% of the image intensities.

$$P = \begin{cases} P_{\min}, & \text{if } P \le P_{\min} \\ P, & \text{if } P_{\min} < P < P_{\max} \\ P_{\max}, & \text{if } P \ge P_{\max} \end{cases} \quad (1)$$

After the truncation process, each pixel in the image was normalized using (2).

$$P = \frac{P - P_{\min}}{P_{\max} - P_{\min}} \quad (2)$$
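A sketch of truncation normalization as defined in (1) and (2); reading "the largest 1%" and "the smallest 5%" of intensities as the 99th and 5th percentiles is our interpretation.

```python
import numpy as np

def truncation_normalize(img: np.ndarray) -> np.ndarray:
    p_min = np.percentile(img, 5)    # cutoff below which lie the smallest 5% of values
    p_max = np.percentile(img, 99)   # cutoff above which lie the largest 1% of values
    truncated = np.clip(img, p_min, p_max)        # equation (1)
    return (truncated - p_min) / (p_max - p_min)  # equation (2), values in [0, 1]
```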
The normalized image was then further processed using CLAHE. Two CLAHEs were used, with the clip limits set to 1.0 and 2.0. The truncation-normalized image was then merged with the two CLAHE-processed images to form a 3-channel, or colored, image. The colored image was then converted back into grayscale.
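A sketch of this synthesis step under stated assumptions: the normalized image is rescaled to 8-bit before CLAHE (OpenCV's CLAHE expects 8- or 16-bit input), and the tile grid size and channel order are illustrative, as the paper reports only the two clip limits.

```python
import cv2
import numpy as np

# `norm` is the [0, 1] truncation-normalized image from the previous step.
norm8 = (norm * 255).astype(np.uint8)

clahe1 = cv2.createCLAHE(clipLimit=1.0, tileGridSize=(8, 8)).apply(norm8)
clahe2 = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8)).apply(norm8)

merged = cv2.merge([norm8, clahe1, clahe2])      # synthesized 3-channel image
gray = cv2.cvtColor(merged, cv2.COLOR_BGR2GRAY)  # converted back into grayscale
```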
The comparison between the original image and the synthesized image can be seen in Fig 7, along with the histogram of each image. Originally, the information from the image is confined to a narrow region of the histogram. After pre-processing, the information has stretched across the X-axis of the histogram, which indicates that the information from the image has been enhanced. Visually, the mass within the breast tissue is clearly more visible after pre-processing, as it is further differentiated from the surrounding breast tissue.

Fig 7. Final Output of Image Pre-processing
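The histogram comparison itself can be reproduced with a few lines; the sketch below plots the intensity histograms of the original and pre-processed images (variable names refer to the sketches above), where enhancement should appear as intensities spread across the X-axis rather than confined to a narrow band.

```python
import cv2
from matplotlib import pyplot as plt

for name, img in [("original", image), ("pre-processed", gray)]:
    hist = cv2.calcHist([img], [0], None, [256], [0, 256])
    plt.plot(hist, label=name)
plt.xlabel("Intensity")
plt.ylabel("Pixel count")
plt.legend()
plt.show()
```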

B. CNN Training

Several experiments were done using the Mass Normal Dataset and the Benign Malignant Dataset. The CNN training was set to 30 epochs and was conducted on an online platform, Kaggle. The training was accelerated using a GPU accelerator provided by the Kaggle website.

Fig 8. Learning Curves of CNN Training on the Mass Normal Dataset

Fig 8 shows the overall training performance of the CNN on 1,157 mammogram images from the Mass Normal Dataset. The X axis in both graphs represents the epoch, and the Y axis represents the accuracy and the loss. As the epochs progress, the accuracy increases while the loss decreases. This behavior is the same for both the training curve (the blue line) and the evaluation curve (the orange line): the more the CNN learns and trains, the higher the accuracy it obtains, while the loss keeps decreasing. The graphs also show no case of overfitting in the performance of the program.

Fig 9. Learning Curves of CNN Training on the Benign Malignant Dataset

On the other hand, Fig 9 shows the overall training performance of the CNN on 1,253 mammogram images from the Benign Malignant Dataset. The evaluation curves show that the behaviour of accuracy and loss was similar to that observed with the Mass Normal Dataset: the accuracy increases and the loss decreases as the epochs progress.

However, the validation loss curve lies considerably beneath the training loss curve. This indicates that the images in the validation folder of the "Benign Malignant Dataset" were possibly easier for the program to predict than the images in the validation folder of the "Mass Normal Dataset".

This occurrence is commonly associated with poor sampling procedures, where duplicate samples exist in the training and evaluation datasets. It can also happen due to low variety among the mammogram images in the evaluation dataset. Another possibility is information leakage, where features of the mammogram image samples in the training dataset have direct links or ties to features of the mammogram image samples in the evaluation dataset.

C. CNN Testing

The results of CNN testing can be seen in Table 1 and Table 2 below. The Mass Normal Dataset contained a total of 60 images for testing, whereas the Benign Malignant Dataset contained a total of 34 images for testing. The testing was carried out using the weights obtained from the CNN training. Table 1 shows the results of CNN testing using the Mass Normal Dataset, and Table 2 shows the results of CNN testing using the Benign Malignant Dataset. The main parameter for analysis and comparison was accuracy: the higher the accuracy, the better the performance of the program was deemed to be. Precision and sensitivity were used for further analysis.
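For concreteness, the reported metrics follow directly from the confusion-matrix counts; the sketch below reproduces the experimental run 2 figures of Table 1.

```python
# Confusion-matrix counts from experimental run 2 on the Mass Normal Dataset.
tp, fn, tn, fp = 27, 3, 26, 4

accuracy    = (tp + tn) / (tp + tn + fp + fn)  # 53/60 ~ 0.8833
sensitivity = tp / (tp + fn)                   # 27/30 = 0.9000
specificity = tn / (tn + fp)                   # 26/30 ~ 0.8667
precision   = tp / (tp + fp)                   # 27/31 ~ 0.8710
```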
TABLE I. CNN TESTING RESULT ON THE MASS NORMAL DATASET

Model: DenseNet201 CNN

Result           Run 1    Run 2    Run 3    Average Performance (%)
TP               26       27       25
FN               4        3        5
TN               25       26       28
FP               5        4        2
Accuracy (%)     85.00    88.33    88.33    87.22
Specificity (%)  83.33    86.67    93.33    87.78
Sensitivity (%)  86.67    90.00    83.33    86.67
Precision (%)    83.87    87.10    92.59    87.85
Dataset Used     Mass Normal Dataset

Note: Positive (Mass), Negative (Normal)
In Table 1, it can be seen that the highest accuracy achieved in CNN testing was 88.33%, reached in both experimental run 2 and experimental run 3, and the average accuracy performance over the 3 experimental runs was 87.22%.

However, in the case of detecting cancers, it is crucial that the program does not miss any mass images, since in real-world application it is more important that the program does not miss any mass cases from patients. Therefore, mass images falsely predicted as normal were deemed more critical than normal images falsely predicted as mass. Thus, experimental run 2 was the best model in accuracy as well as sensitivity, correctly classifying mass images with a TP value of 27 and the lowest FN value of all experimental runs, just 3, which is reflected in its higher sensitivity.
TABLE II. CNN TESTING RESULT ON THE BENIGN MALIGNANT DATASET

Model: DenseNet201 CNN

Result           Run 1    Run 2    Run 3    Average Performance (%)
TP               12       12       10
FN               2        2        4
TN               20       18       20
FP               0        2        0
Accuracy (%)     94.12    88.24    88.24    90.20
Specificity (%)  100.00   90.00    100.00   96.67
Sensitivity (%)  85.71    85.71    71.43    80.95
Precision (%)    100.00   85.71    100.00   95.24
Dataset Used     Benign Malignant Dataset

Note: Positive (Malignant), Negative (Benign)
In Table 2, the best performance on the Benign Malignant Dataset was obtained during the first experimental run, with an accuracy score of 94.12%. It can also be seen that the program performed better in detecting benign cases than malignant cases, evident in the highest FP value being only 2 and in the higher precision relative to sensitivity. The overall performance of the CNN has given satisfactory results, with an average accuracy of 90.2%. Improvements can still be made to further decrease the FN value and thereby increase the accuracy and sensitivity scores.

D. Discussion

After conducting several experiments with the developed algorithm for pre-processing and CNN, it can be said that the program can be used for mammogram image classification for breast cancer detection. Although improvements can be made to the program, this research has produced satisfactory results and, to a certain extent, has provided improved results. This research provides a solid stepping stone for future studies to continue developing the program and to further increase its capability and potential.

The CNN trainings have provided the expected results. However, it must be noted that, during training with the Benign Malignant Dataset, the evaluation curves of both accuracy and loss lie beneath the training curves. This may be caused by poor sampling, where duplicate or similar samples exist in both the training and evaluation datasets; by poor variety in the evaluation dataset; or by information leak, where features from the training dataset have direct ties to features in the evaluation dataset.

Based on the experiment results for the CNN, using accuracy as the primary parameter, the program performed better using the "Benign Malignant Dataset" than the "Mass Normal Dataset". The highest accuracy achieved using the "Benign Malignant Dataset" was 94.1%, whereas the highest accuracy achieved using the "Mass Normal Dataset" was 88.3%.

This may be caused by the difficulty of identifying masses due to the variation of tissue density in mammogram images, where masses in breasts with high tissue density are more difficult to distinguish than those in lower-density or fatty breasts, because both fibroglandular tissue and mass lesions are represented as white areas within a mammogram image. Therefore, the higher the density of the fibroglandular tissue, the higher the chance of the program misclassifying the corresponding mammogram image, because the white areas caused by the high tissue density may mask an underlying mass.

Several limitations were encountered during this research. They include the hardware used in this study: training and testing of the CNN was not possible on the local computer because of the high computational power and burden involved, so an online server was needed to conduct the CNN experiments. This will cause some problems for future usage of the program, because some hospitals' ethical codes do not allow any patient data or images to be uploaded to an online server. The limitations also include a lack of mammogram data: the size of the dataset used in this study was substantially small, whereas CNNs work better on large datasets.

Table 3 compares the results achieved in this research with results achieved by other research of similar study design, with accuracy as the main parameter of comparison. All related studies in Table 3 use CNNs and mammograms as the primary data. It can be seen that the program has produced comparable or improved results relative to other related research, although some other research obtained better results than this current research.

Other research can obtain high accuracy, which can be credited to advanced CNN or deep learning techniques, such as combining different CNN architectures, and also to advanced or higher-level hardware with high computational power. It is also important to note that "Current Research Model B" currently has no direct comparison, since the classification classes in this model differ from those of other similar research conducted recently.

TABLE III. COMPARISON OF THE ACHIEVED RESULTS WITH OTHER SIMILAR RESEARCH

Classes                     Research                      Accuracy  Sensitivity  Precision
2 Class: Benign, Malignant  [8]                           0.961     0.962        -
                            Current Research Model A (a)  0.941     0.857        1.00
                            [9]                           0.930     0.948        0.917
2 Class: Normal, Mass       Current Research Model B (b)  0.883     0.900        0.871
2 Class: Benign, Malignant  [10]                          0.865     0.851        0.845
                            [11]                          0.823     0.913        0.856
                            [12]                          0.75      0.5          0.714

a. Model A: Using "Benign Malignant Dataset"
b. Model B: Using "Mass Normal Dataset"
Consultations were made with an expert radiologist from Husada Hospital to gain insight into the program from a medical perspective. In Fig 10, the left image shows the image before pre-processing, while the right image shows the image after pre-processing. According to the expert radiologist's notes, the mammogram image had an enhancement in quality after pre-processing: it shows the irregularity of the outer shape of the mass, which indicates signs of cancer, more clearly than the mammogram before pre-processing. In Fig 10 specifically, it is important to note that although the mass within the breast was small in size, it was categorized as a malignant type of mass, which also indicates cancer. In both images, the mass lesion within the breast was greatly enhanced, which also benefits doctors and radiologists.

Fig 10. Original Image vs Pre-processed Image; Left Image(s) (original); Right Image(s) (pre-processed)

IV. CONCLUSION

In conclusion, the best model achieved during this study obtained an accuracy, precision, and sensitivity of 94.1%, 100%, and 85.71% in classifying benign or malignant. The best model for classifying mass or normal obtained an accuracy, precision, and sensitivity of 88.3%, 92.6%, and 83.3%. The pre-processing methods followed by the CNN have given improved and satisfactory results, with the program achieving an accuracy as high as or higher than that of other related research in predicting breast cancers from mammogram images. The pre-processing method has been proven able to produce enhanced mammogram images, both for the program to detect mass lesions and cancers from the mammogram images, and for doctors and radiologists.

REFERENCES

[1] R. L. Siegel, K. D. Miller, and A. Jemal, "Cancer Statistics, 2020," Am. Cancer Soc. ACS Journals, 2020.
[2] A. V. Icanervilia et al., "A Qualitative Study: Early Detection of Breast Cancer in Indonesia (After Universal Health Coverage Implementation)," 2021.
[3] O. Ginsburg et al., "Breast cancer early detection: A phased approach to implementation," Cancer, vol. 126, pp. 2379–2393, 2020.
[4] K. B. Waheed et al., "Breast cancers missed during screening in a tertiary-care hospital mammography facility," Ann. Saudi Med., vol. 39, no. 4, pp. 236–243, 2019.
[5] K. Doi, "Current status and future potential of computer-aided diagnosis in medical imaging," Br. J. Radiol., vol. 78, no. suppl_1, pp. s3–s19, 2005.
[6] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, "Densely connected convolutional networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4700–4708.
[7] I. C. Moreira, I. Amaral, I. Domingues, A. Cardoso, M. J. Cardoso, and J. S. Cardoso, "INbreast: Toward a Full-field Digital Mammographic Database," Acad. Radiol., vol. 19, no. 2, pp. 236–248, Feb. 2012, doi: 10.1016/j.acra.2011.09.014.
[8] Y.-D. Zhang, S. C. Satapathy, D. S. Guttery, J. M. Górriz, and S.-H. Wang, "Improved breast cancer classification through combining graph convolutional network and convolutional neural network," Inf. Process. Manag., vol. 58, no. 2, p. 102439, 2021.
[9] E. M. F. El Houby and N. I. R. Yassin, "Malignant and nonmalignant classification of breast lesions in mammograms using convolutional neural networks," Biomed. Signal Process. Control, vol. 70, p. 102954, 2021.
[10] Z. Wang et al., "Breast cancer detection using extreme learning machine based on feature fusion with CNN deep features," IEEE Access, vol. 7, pp. 105146–105158, 2019.
[11] H.-C. Lu, E.-W. Loh, and S.-C. Huang, "The Classification of Mammogram Using Convolutional Neural Network with Specific Image Preprocessing for Breast Cancer Detection," in 2019 2nd International Conference on Artificial Intelligence and Big Data (ICAIBD), 2019, pp. 9–12.
[12] H. Zhou, Y. Zaninovich, and C. Gregory, "Mammogram classification using convolutional neural networks," in International Conference on Technology Trends, 2017, vol. 2.
