Implementation of Intelligent Model For Pneumonia Detection
Željko KNOK, Klaudio PAP, Marko HRNČIĆ
https://doi.org/10.31803/tg-20191023102807
Abstract: The advancement of technology in the fields of artificial intelligence and neural networks allows us to improve the speed and efficiency of diagnosing various types of problems. In recent years, the rise of convolutional neural networks has been particularly noticeable, showing promising results in problems related to image processing and computer vision. Given that humans have a limited ability to detect patterns in individual images, accurate diagnosis can be a problem even for medical professionals. In order to minimize the number of errors and unintended consequences, computer programs based on neural networks and deep learning principles are increasingly used as assistive tools in medicine. The aim of this study was to develop a model of an intelligent system that receives an x-ray image of the lungs as an input parameter and, based on the processed image, returns the possibility of pneumonia as an output. This functionality was implemented through a transfer learning methodology based on already defined convolutional neural network architectures.
The artificial neural network, inspired by the workings of the human brain, contains a number of connected processors, neurons, which have the same role as biological neurons within the brain. They are connected by links through which signals can pass from one neuron to another, thus transmitting important information. Each neuron receives a certain number of input signals, denoted Xi. Each connection has its own numerical value, namely the weight Wi, which is the basis for long-term memory in artificial neural networks. The Xi and Wi values are summed and passed through the transfer or activation function, and the result is sent as output Y to another neuron.

Figure 2 Structure of artificial neuron

The activation functions most commonly used for these problems are jump functions and sigmoidal functions.

2.4 General Learning Rule

Each artificial neural network is based on a general learning rule that involves collecting relevant data, which is then divided into training and validation data.

After the data collection is completed, it is necessary to determine the architecture of the neural network, which involves determining the number of layers in the network and the number of neurons in each layer, and then selecting the type of connection between the neurons together with the learning rule, which is the basis for defining the criteria that determine the architecture of the neural network. The next step is learning, which is the foundation of artificial neural networks. Learning involves initializing the weights, training the model on the training dataset, and checking the error; the weights are corrected after each iteration, where a pass through all the training samples is called an epoch.

Learning lasts until the desired number of epochs is reached or the desired error is met. In the initial stages of learning, the neural network adapts to the general trends present in the dataset, so the error on both the training set and the validation set falls over time.
If the learning process lasts too long, the neural network can begin to adapt to the specific data and noise of the learning data, thereby losing its generalization property. The error on the training set continues to drop, while on the validation set it starts to grow. The moment the validation set error starts to grow, the learning process must be interrupted so that the model does not become overfitted.
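This interruption of learning can be automated. Below is a minimal sketch in Keras (the library used in the practical part of this paper); the toy data, model and parameter values are illustrative assumptions, not the authors' setup:

    # Early stopping: interrupt training once the validation error stops
    # improving, so the model does not become overfitted.
    import numpy as np
    from tensorflow.keras import layers, models
    from tensorflow.keras.callbacks import EarlyStopping

    x = np.random.rand(100, 10)                  # toy stand-in samples
    y = np.random.randint(0, 2, size=100)        # toy binary labels

    model = models.Sequential([
        layers.Dense(16, activation="relu", input_shape=(10,)),
        layers.Dense(2, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

    stopper = EarlyStopping(monitor="val_loss",  # watch the validation error
                            patience=3,          # tolerate 3 stagnant epochs
                            restore_best_weights=True)
    model.fit(x, y, validation_split=0.1, epochs=50, callbacks=[stopper])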
With the completion of the learning process, it is
necessary to test the operation of the network using
previously obtained validation data. The difference between
the learning and the testing phase is that the neural network
no longer learns in the testing phase and the weight values
are fixed. The evaluation of the network is obtained by
calculating the error and comparing it with the desired error
value. If the error is greater than allowed, additional training
data may need to be collected or the number of epochs
increased for better results, since in this case the network is
unsuitable for use.
… indicates the square filter size. The output width after the convolution can be defined as:

Wy = (Wx − F) / S + 1,    (1)

where Wx is the old dimension, F is the filter width, and S is the stride, i.e. the jump between two value selections.

The fully connected layer is in most cases used in the final layers of the network. The reason for using fully connected layers is to reduce the dimensions of the image by passing it through the neural network, since complete connectivity is defined as the square number of connections between layers.
For example, for image data with dimensions 200×200×3, the input layer would have 120,000 input values. If we fully connected this to a hidden layer consisting of 1000 neurons, we would have 120 million weight values to learn, which requires great computing power. This is why fully connected layers are used only in the later stages of the neural network.
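The arithmetic in this example, together with the output width from Eq. (1), can be verified directly (a worked sketch using the values from the text):

    # Fully connecting a 200x200x3 input to a 1000-neuron hidden layer.
    inputs = 200 * 200 * 3          # 120,000 input values
    weights = inputs * 1000         # 120,000,000 weights to learn
    print(inputs, weights)          # -> 120000 120000000

    # Output width after a convolution, per Eq. (1): Wy = (Wx - F) / S + 1,
    # e.g. a 256-wide input with a 3-wide filter and stride 1.
    Wx, F, S = 256, 3, 1
    Wy = (Wx - F) // S + 1
    print(Wy)                       # -> 254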
The pooling layer contains a filter that reduces the dimensions of an image. In a convolutional neural network, a pooling layer is most commonly used after several convolution layers in order to reduce the resolution of the maps generated by the convolution layers. The pooling layer filter differs from the convolution layer filter in that it does not contain weight values; the filter is only used to select values within its default dimensions.

Of the several types of pooling, the most commonly used are average pooling and max pooling. Average pooling replaces the clustered values with their arithmetic mean, while max pooling simply selects the maximum value. The benefit of max pooling is that it keeps the stronger and more prominent pixels of the image, which are more important for the end result, while the irrelevant pixels are eliminated.

During the learning process of the neural network, the goal is to find the location of the global minimum of the error, which means that the model is at the best possible level at a given moment and the learning process can stop. In this process, so-called local minimums can fool the network into thinking it is within the global minimum. Local minimums can be avoided using various methods.

Known methods for avoiding local minimums:
Random Transformations – Random transformations serve to augment an existing training dataset, which is accomplished through operations such as translation, rotation, and scaling. This increases the amount of data without the need to collect additional samples. An increased amount of learning data makes the network less likely to get stuck in a local minimum. Random transformations can be performed during each iteration of the learning process or by preprocessing the data before training begins.
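One way such random transformations could be implemented with the Keras library is sketched below; the specific transformation ranges are illustrative assumptions:

    # Augment the training set on the fly with random translation,
    # rotation and scaling during each learning iteration.
    import numpy as np
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    augmenter = ImageDataGenerator(
        rotation_range=15,          # random rotation (degrees)
        width_shift_range=0.1,      # random horizontal translation
        height_shift_range=0.1,     # random vertical translation
        zoom_range=0.1)             # random scaling

    images = np.random.rand(8, 256, 256, 3)      # toy stand-in images
    labels = np.random.randint(0, 2, size=8)     # toy labels
    batches = augmenter.flow(images, labels, batch_size=4)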
2.9 Transfer Learning

Conventional deep learning algorithms are traditionally based on isolated tasks, with a single neural network serving a particular type of classification. Transfer learning is a newer methodology that seeks to change this and circumvent the paradigm of isolated learning by developing knowledge transfer methods that can use models learned on one type of classification for multiple different tasks.

In this way, a model initially created for one type of problem can later be used as a starting point for solving a new type of classification, and thus give better results than initializing a new neural network from scratch.

An analogy can be made with learning to drive in real life. In this analogy, learning how to drive a motorcycle can be greatly assisted by the existing knowledge of driving a car. Transfer learning works in a similar way.

Transfer learning is nowadays a very popular approach, and the practical part of this paper is based on it. Choosing pre-trained models and already defined neural network architectures can be of great use in solving complex problems such as detecting pneumonia in x-ray images.

After the architecture of an existing neural network with defined layers has been loaded, it is necessary to delete the last, output layer of that network and replace it with a new output layer related to the problem for which the network will be used.

The next step is to conduct training of the complete, already defined architecture, with all layers of the network, on the learning dataset. With this way of learning, the neural network is able to apply the classification principles learned in previous tasks to a new type of problem, so the results are better without the need to define layers and create a new neural network from scratch.

Transfer learning is useful in cases where a large learning dataset is not available. If, for example, only about 1000 images are available for the learning process, combining them with networks that have already been trained on millions of samples provides many learned principles for sorting and classification, which improves model efficiency and reduces training time.

VGG16 is a convolutional neural network architecture proposed by K. Simonyan and A. Zisserman of Oxford University. The model achieves 92.7% accuracy on the ImageNet dataset, which consists of over 14 million images divided into 1000 classes.

The VGG16 architecture consists of 13 convolution layers, in which the number of filters increases from 64 to 512; 5 max-pooling layers; and 3 fully connected layers that use the dropout technique described earlier to help avoid local minimums.

Big progress in comparison with previous neural network architectures was made by reducing the convolution filters to a size of 3×3. The filter of the pooling layers is 2×2 in size with a stride of 2, while all the hidden layers of the network use the rectified linear activation function described in the previous chapters. All convolutional layers use a stride and padding of value 1, and the total number of parameters of the specified network architecture is 138 million. [5]

3 MODEL IMPLEMENTATION

This chapter describes the implementation of the neural network model. It presents the process of collecting image data for learning, the division of the set into learning and validation sets, the visual analysis and comparison of the learning and validation data, the comparison of the number of positive and negative images, and the preprocessing of the collected data before entering the learning process.

It also presents the procedure of retrieving the previously described VGG16 network architecture and changing its output layer in accordance with the collected data, the learning curves recorded with Tensorboard technology, and the implementation of the training process. After learning was completed, the best-performing model was saved, and an evaluation of that model was performed to graphically present its prediction accuracy on validation data not used in the learning process.

3.1 Data Collection

The first step in implementing a model for detecting pneumonia in x-ray images is to collect the imaging data that will make up the sets for training and validating the model. A dataset available for download from the Kaggle site was used. Kaggle offers its users a large number of different datasets that can be used for various research purposes.

The selected dataset consists of a total of 5863 lung x-ray images divided into two categories: x-rays with a positive diagnosis of pneumonia, and images of normal, healthy lungs with no indication of disease. The set contains 1583 images of healthy lungs and 4273 images positive for pneumonia. The image data format is .jpeg, while the image dimensions differ and vary from image to image. In later steps, it is necessary to change the image dimensions so that they are supported as input for the VGG16 network architecture. [6]

3.2 Data Preprocessing

After the data collection for the training process has been successfully completed, and before learning, the data should be processed in such a way that they are suitable for entering the network.

The image file paths were first loaded using a function from the Scikit-learn library that retrieves the paths of all files in a given directory, treating the subdirectories of the main directory as the categories of the individual files. Subsequently, the categories, or labels, are defined from the obtained dataset, depending on whether the data lies in the subdirectory of images with pneumonia or of images without disease. The image categories were saved using functionality from the Numpy library.

Once the labels have been defined for all retrieved image paths, the total image dataset should be divided into a learning set and a validation set. In this case, 90% of the total dataset was set aside for learning purposes, with the remaining 10% for validation.
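This preprocessing might be sketched as follows; the directory name is a placeholder, and sklearn.datasets.load_files is one plausible choice for the path-loading function mentioned above, since it treats each subdirectory of the main directory as a category:

    # Sketch of path loading, labelling and the 90/10 split (illustrative;
    # directory names and helper choices are assumptions, not the authors' code).
    import numpy as np
    from sklearn.datasets import load_files
    from sklearn.model_selection import train_test_split

    # Subdirectories of "chest_xray" (e.g. NORMAL, PNEUMONIA) become categories.
    data = load_files("chest_xray", load_content=False)
    paths = np.array(data["filenames"])
    labels = np.array(data["target"])

    # Save the image categories using Numpy functionality.
    np.save("labels.npy", labels)

    # 90% of the dataset for learning, the remaining 10% for validation.
    train_paths, val_paths, train_labels, val_labels = train_test_split(
        paths, labels, test_size=0.1)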
After splitting the paths and labels into training and validation data, functions were implemented to load the saved paths, convert them into image data, and resize the images to 256×256, which is required for later input into the initial layer of the neural network.

3.3 Data Visualization

This chapter elaborates on the visualization of the preprocessed data, which gives a better idea of the collected dataset, the ratios between the learning and validation data, and the like. The graphics and diagrams showing the ratios between the data were implemented using the Matplotlib library.

The visualization feature that was implemented offers a graphical representation of the data. This function visually presents the data ratios, namely the ratio between the learning and validation sets and the number of images positively diagnosed with pneumonia in relation to the x-ray images of healthy lungs.

A visual representation of the ratio of the amount of learning data to validation data is given below:
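As a sketch of how such ratio plots can be produced with Matplotlib (counts taken from Section 3.1 and the 90/10 split; the exact styling of the paper's figure is not known):

    # Illustrative ratio plots: learning vs. validation size and
    # pneumonia vs. normal counts (numbers taken from Section 3.1).
    import matplotlib.pyplot as plt

    normal, pneumonia = 1583, 4273
    total = normal + pneumonia

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 3))
    ax1.bar(["learning (90%)", "validation (10%)"], [0.9 * total, 0.1 * total])
    ax1.set_title("Learning vs. validation set")
    ax1.set_ylabel("Number of images")
    ax2.bar(["pneumonia", "normal"], [pneumonia, normal])
    ax2.set_title("Positive vs. healthy images")
    plt.tight_layout()
    plt.show()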
3.4 Model Architecture

Defining the convolutional neural network architecture is the most important aspect, on which the overall success of the project depends. This part of the paper was done using the transfer learning methodology described earlier, which relies on already defined architectures that are available for use, in order to achieve better model performance. The VGG16 architecture was used in this paper. In this way, the neural network can take advantage of all previously learned principles, such as recognizing edges, angles, and discoloration, thus shortening learning time and improving model efficiency.

The VGG16 architecture was loaded using a function from the Keras library. An additional average pooling layer was added to the defined VGG16 architecture. The last layer of the neural network, which defines the number of possible output classes, was then changed; in our case there are 2 classes, namely pneumonia and the normal state of the lungs. A softmax activation function was added to the last layer, which scales the output values into the range from 0 to 1 so that the prediction can easily be read as a percentage.

The initial VGG16 architecture and the changes to the last layer are integrated into a single architecture, after which the complete architecture is stored in a variable that will execute the learning process. The error function, the type of optimizer with its learning rate, and the values to be monitored during the iterations of the learning process are then defined; in our case these are the prediction accuracy and the error value, which is included by default. The use of Tensorboard enables a later overview of the learning flow and of the changes in the defined parameters.

After completing all of the above procedures, the model architecture has been successfully implemented and is ready for the model training process, which may take some time depending on the computer configuration and the amount of learning data.
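A minimal sketch of this architecture assembly in Keras follows; the choice of GlobalAveragePooling2D, the Adam optimizer, its learning rate, and the log directory are assumptions made for illustration, not the authors' exact settings:

    # Load VGG16, replace the output layer with a 2-class softmax layer,
    # and define optimizer, error function, monitored values and Tensorboard.
    from tensorflow.keras.applications import VGG16
    from tensorflow.keras.layers import GlobalAveragePooling2D, Dense
    from tensorflow.keras.models import Model
    from tensorflow.keras.optimizers import Adam
    from tensorflow.keras.callbacks import TensorBoard

    # VGG16 without its original 1000-class output layer, for 256x256 inputs.
    base = VGG16(weights="imagenet", include_top=False,
                 input_shape=(256, 256, 3))

    x = GlobalAveragePooling2D()(base.output)          # added pooling layer
    outputs = Dense(2, activation="softmax")(x)        # pneumonia / normal

    model = Model(inputs=base.input, outputs=outputs)
    model.compile(optimizer=Adam(learning_rate=1e-4),  # optimizer + learning rate
                  loss="categorical_crossentropy",     # error function
                  metrics=["accuracy"])                # monitored value

    # Tensorboard callback for a later overview of the learning flow.
    tensorboard = TensorBoard(log_dir="logs")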
The matrix shows a diagonal shape in darker colours, which is a good sign, since the strength of the diagonal of a confusion matrix indicates the quality of the learned model. The total accuracy of the model over all images combined is given below the matrix, and it confirms the 94% accuracy already indicated.

Acknowledgements

… regional development fund and implemented within the Operational Programme Competitiveness and Cohesion 2014 – 2020, based on the call "Investing in Organizational Reform and Infrastructure in the Research, Development and Innovation Sector".

5 REFERENCES