
Deep Learning Pipeline for Image Classification on Mobile Phones

2022, Artificial Intelligence and Applications


Deep learning pipeline for image classification on mobile phones

Muhammad Muneeb, Samuel F. Feng, and Andreas Henschel
Department of Mathematics and Department of Electrical Engineering and Computer Science, Khalifa University of Science and Technology, Abu Dhabi, UAE

Abstract. This article proposes and documents a machine-learning framework and tutorial for classifying images using mobile phones. Compared to computers, the performance of a deep learning model degrades when deployed on a mobile phone, and a systematic approach is required to find a model that performs optimally on both. By following the proposed pipeline, which consists of various computational tools, simple procedural recipes, and technical considerations, one can bring the power of deep learning medical image classification to mobile devices, potentially unlocking new domains of applications. The pipeline is demonstrated on four different publicly available datasets: COVID X-rays, COVID CT scans, leaves, and colorectal cancer. We used two application development frameworks, TensorFlow Lite (real-time testing) and Flutter (digital image testing), to test the proposed pipeline. We found that transferring deep learning models to a mobile phone is limited by hardware and that classification accuracy drops. To address this issue, we propose this pipeline to find an optimized model for mobile phones. Finally, we discuss additional applications and computational concerns related to deploying deep-learning models on phones, including real-time analysis and image preprocessing. We believe the associated documentation and code can help physicians and medical experts develop medical image classification applications for distribution.

Keywords: Image classification, machine learning, medical image classification, mobile phone application, cancer

1 Introduction

Disease diagnosis plays a crucial role in healthcare, and for many conditions, medical image data can aid in diagnosing diseases [1,2,3,4,5,6,7,8]. Recent research has made these diagnostic tools more efficient, primarily through deep-learning methods [9]. However, these methods typically require relatively powerful and expensive computational hardware (e.g. modern GPUs), which may not be available in remote or poor areas of the world that lack modern infrastructure. This study proposes a pipeline for classifying images using mobile phones, with a remote computer used for model training. Even though a central server is required for training, there are free options [10,11] that are sufficient for training the model. Due to the widespread availability of mobile phones and applications, tasks such as image classification can now be completed at the point of care. The deployment of medical image classification models on mobile phones can lead to several technical issues. For example, a model's performance when trained or tested on a computer degrades significantly when the same model is deployed on a mobile phone. Neural network models easily become too large for phones' memory, and images captured in real time can degrade classification performance. Furthermore, it is unclear how to build a proper workflow connecting training data, often from different sources with varying image sizes and quality, with a model deployed on a smartphone to aid diagnosis. We highlight these issues (and others) and provide feasible solutions to tackle them.
The result is an amalgamation of best practices taken from the modern deep learning and data science landscape, including implementations for parameter reduction (Section 2.3) and data augmentation (Section 2.3), resulting in a cost-effective diagnosis pipeline that can be used for medical image classification, aiding experts in the diagnosis of diseases. The following section explains the differences and similarities between similar projects and the pipeline proposed in this study.

In this study [12], researchers proposed a mobile-based deep learning application for image classification. However, they used Unity and MATLAB for implementation, whereas we used the Android Studio TensorFlow application development template and Python for model training. This study [13] investigated the usability of mobile phones for medical image classification. In this project [14,15], researchers designed an application for skin diseases, with attached hardware for accurate detection. In this paper [16], an artificial intelligence diagnostic system on mobile Android terminals for cholelithiasis disease is proposed. This article [17] discusses the various challenges that arise when deploying a machine learning model on mobile phones. This work [18] employs edge computing on a mobile device with an integrated web server to diagnose and forecast metastasis in histopathology pictures. Among the existing studies [19,14,20,13,21,22,15,12,23,24], researchers have developed machine-learning pipelines and shed light on medical and general image classification in mobile phone applications. However, they did not provide the source code and applications needed to replicate their results. Many research papers explain image classification on mobile phones, but the following are the reasons for reproducing existing work.

– The existing papers do not shed light on real-time testing and the concerns that may arise when deploying a machine-learning model on mobile phones. For example, some Android mobile phones require models with weights in specific data types (Float32 and Unsigned Int8), and if the model is trained with different data types, the application crashes.
– The existing papers do not include source code and documentation, which are essential for reproducing the results and developing an application for some other dataset.
– The existing papers developed different pipelines for different kinds of images, such as plants, tissues, and X-ray/CT-scan classification. In contrast, we propose a general pipeline that works for any classification problem by including or excluding substeps.
– We compared the model's performance on the computer, on the mobile phone when pictures were loaded from the gallery, and on the mobile phone in real time.

We believe there must be a generalized application that can capture images in real time and from mobile galleries and classify them into various categories; for that purpose, we used existing templates. Using such a template assists in classification problems involving birds, flowers, plants, objects, tissues, and lung cancer. We also analyzed the performance of the same model on real-time and digital images, which showed that the performance of the model was highly degraded in real-time analysis.
Finally, we present a method for improving the model's performance on a mobile phone. Section 2 provides the technical context and describes the entire seven-step pipeline. Section 3 demonstrates the implementation of the pipeline on COVID-19, plant, and cancer classification as use cases and discusses the model performance and technical considerations. Sections 4 and 5 contain a discussion and conclusion, including a link to all the data and code needed to reproduce the results.

2 A pipeline for image classification on mobile phones

There are already several contexts in which deep learning or other classification models are deployed on mobile phones, ranging from small-scale applications such as emoji selection from text [25] to larger-scale recommendation systems [26,27] and face detection from camera images [28,29]. Implementing image classification presents many challenges, such as ensuring that the image dimensions are the same for the training and testing data. Furthermore, the image background/environment in which the model is trained should be the same as in the field; otherwise, the model might learn spurious details in the image background, leading to incorrect results. One must also ensure that the model is implemented efficiently enough to fit inside mobile memory, often forcing reductions in model size that can sacrifice accuracy. Finally, if one wants to leverage publicly available medical image data in a mobile context, the details of implementing best practices are unclear. Our answer to all these considerations is a machine-learning pipeline, illustrated in Figure 1 and described in this section. A pipeline is only as good as its data, and to start, one must obtain data suitable for model training and identify the end-user's mobile devices. As is standard in supervised learning, the training data must be labeled; for medical images, this is typically performed by a domain expert [30]. Many such medical image datasets are available in the public domain [31,32], and one should verify labels with the help of a local physician.

Fig. 1: This diagram shows the overall pipeline for implementing a deep learning image classification model for mobile phone devices.

2.1 Step 1: Choose different image sizes and generate sub-datasets

The first step is preprocessing, which consists of image rescaling, normalization, and resizing [33], and organizing the data appropriately for later analysis. The user selects a small number of different image sizes for testing. Extra testing in this first step helps avoid excess model parameters, which might crash the mobile application at a later stage. In practical applications, we recommend choosing 3-5 different sizes ranging from 30 x 30 to 500 x 500, and these choices may depend on the expected aspect ratios of the training data. During model validation in Step 3 (Section 2.3), one of these image sizes will be selected automatically depending on the model performance. Figure 2 shows the sub-datasets with different image sizes generated from the original dataset.

Fig. 2: Choose different image sizes and generate sub-datasets.

2.2 Step 2: Data splitting for validation

It is essential to use stratified k-fold validation for each image size to avoid over- and under-fitting during training [34]. This ensures that the folds are chosen such that the mean response value is approximately equal across all folds, ultimately decreasing model bias. Our recommendation is to start with k=5, which repeatedly trains models using 80 percent of the original data and uses the other 20 percent to evaluate model performance. As is typical with cross-validation, one systematically trains the model on 4 of the 5 folds, uses the held-out fold to assess model performance, and averages the results to estimate overall model performance.
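To make Step 2 concrete, the following is a minimal Python sketch of such a stratified split using scikit-learn; the helper and variable names are our own illustration and are not taken from the paper's released code.

# Minimal sketch of Step 2 (stratified 5-fold splitting), assuming image paths and
# integer labels have already been collected into parallel lists.
import numpy as np
from sklearn.model_selection import StratifiedKFold

def make_folds(image_paths, labels, k=5, seed=0):
    """Return (train_idx, test_idx) pairs with the class balance preserved in every fold."""
    skf = StratifiedKFold(n_splits=k, shuffle=True, random_state=seed)
    return list(skf.split(np.array(image_paths), np.array(labels)))

# Example usage (hypothetical lists built while reading one sub-dataset folder):
# folds = make_folds(paths, labels)
# for fold_id, (train_idx, test_idx) in enumerate(folds):
#     ...  # copy the selected files into fold_<fold_id>/train and fold_<fold_id>/test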
2.3 Step 3: Model Architecture and Training

Image analysis using deep learning methods is a rapidly growing field with many algorithms competing over a wide variety of applications (e.g. LSTM and RCNN) [35]. For medical image classification, convolutional neural networks (CNNs) are the most popular [36]. Convolution is the process of multiplying pixel values by weights and summing them. The first layer of a CNN frequently detects essential characteristics such as horizontal, vertical, and diagonal edges. The first layer's output is then sent to the second layer, which extracts more complex features such as corners and edge combinations. Subsequent layers recognize higher-level characteristics such as objects and faces [37]. Based on the activation map of the last convolution layer, the terminal layer outputs a series of confidence scores (numbers ranging from 0 to 1) that indicate how probable it is that the image belongs to a specific class. Models are ultimately fit in Keras using model.fit. Before training the model, however, we must address a few key considerations. First, choose the number of parameters or neurons in each layer and the number of layers. If there are too many layers, the model may overfit; if there are too few, it may not learn applicable features. Second, reduce the number of parameters (one could imagine a systematic dropout algorithm that trades off model size and accuracy) to enable execution within a mobile phone's memory [38], while also minimizing any associated performance penalties. This is another step that is typically adjusted "by hand," and as a starting point, we recommend following the steps taken in our use cases in Section 3. Other key considerations before model fitting are included in this subsection and are clearly implemented in the shared code linked in Section 3.
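As an illustration of such a small CNN, the following Keras sketch mirrors the Model 1 layout listed in Table 9 (one convolution layer, one pooling layer, and two fully connected layers); the image size, number of classes, and layer widths are placeholders rather than prescribed values.

# Illustrative Keras definition of a small CNN (cf. Table 9); image_size, num_classes,
# filters, and neurons are placeholders, not fixed by the pipeline.
import tensorflow as tf

def build_model(image_size=50, num_classes=2, filters=30, neurons=50):
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(image_size, image_size, 3)),
        tf.keras.layers.Conv2D(filters, (3, 3), activation="relu"),  # edge-level features
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Flatten(),                                   # "Reshape" before dense layers
        tf.keras.layers.Dense(neurons, activation="relu"),
        tf.keras.layers.Dense(num_classes, activation="softmax"),    # per-class confidence scores
    ])
    model.compile(optimizer="sgd", loss="categorical_crossentropy", metrics=["accuracy"])
    return model

# model = build_model()
# model.fit(train_generator, validation_data=validation_generator, epochs=50)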
Data augmentation. Data augmentation is another key step for maintaining model performance in real-world settings. It artificially simulates noise and other transformations on the training images when fitting the model [39,40]. In unpredictably noisy applications like ours, potentially involving multiple mobile phone cameras and hospital settings, such steps are essential. Parameters for the data augmentation process are controlled via an image train generator and must also be specified. More specifically, when dealing with medical images captured on mobile phones in real time, it is essential to consider all options and possibilities with data augmentation, eventually settling on a few different combinations of settings. The key data augmentation parameters in our context are listed below, and a sketch of the corresponding Keras generators is given after Table 1.

– rescale: Rescaling (normalizing the pixel values in the image) is the default parameter used in image preprocessing for training the model. Original pictures are in RGB format, with pixel values ranging from 0-255. Such numbers would be too large for the model to cope with and can lead to exploding gradients during the backpropagation phase when training the model. So we multiply the data by 1/255 to rescale the pixel values to the range 0-1.
– rotation range, horizontal flip, and vertical flip: These parameters randomly rotate and flip the training data and should be included because mobile photos can be taken in various orientations. For medical images, a horizontal or vertical flip is fine. However, when an image is rotated, we lose a considerable amount of information, depending on the rotation range. One vital point to note is that rotated images generated by the train generator are padded with appended pixels, which can degrade classification accuracy, whereas images captured by rotating the phone camera still contain all the information.
– brightness range: The brightness range in the train generator increases or decreases the image's brightness to produce multiple images. When the application is deployed in real time, the brightness of an image captured from a mobile phone may differ from that of the images on which the model was trained. So, this parameter is essential in the data augmentation phase to train the model on images of varying brightness.
– zoom range, shear range, width shift range, and height shift range: The zoom range randomly zooms inside pictures, the shear range applies shearing transformations to the image, and the width and height shift ranges shift the image horizontally and vertically [41]. In a typical image classification task, these parameters can play an essential role in making the model robust. For medical images, however, adding these factors may hurt the model's performance, as evidenced by our findings. In medical images, the difference between a positive and a negative case is subtle: in a dog/cat classification problem we can distinguish the classes quickly, but in cases/controls there may be just a faint white pattern in the image. Images transformed with these parameters can therefore turn a positive case into a negative one and vice versa.

Table 1 contains our recommendations for 4 train generators and their respective parameter settings in Keras. Figure 3 shows the resulting directory structure after following the above steps.

Table 1: Recommendations for data augmentation settings in Keras image data processing. Each generator includes the listed parameters.
– Generator 1: rescale = 1./255
– Generator 2: rescale = 1./255, rotation range = 40, brightness range = [0.2,1.0], horizontal flip = True, vertical flip = True, fill mode = 'nearest'
– Generator 3: the Generator 2 settings plus featurewise std normalization = True and featurewise center = True
– Generator 4: the Generator 3 settings plus zoom range = 0.2, shear range = 0.2, width shift range = 0.2, and height shift range = 0.2

Fig. 3: This diagram shows a directory structure for training the model. Generate images with multiple sizes and make a folder for each size. There can be various train generators for each image size, which do not require a separate folder. For each image size, make one folder for each fold. To use data augmentation or a train generator, make three folders (train, test, validation) containing sub-folders representing each category.
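A hedged sketch of how generators like those in Table 1 can be written with the Keras ImageDataGenerator follows; the directory names and target size are assumptions based on the Figure 3 layout, and the featurewise options additionally require fitting the generator on sample data.

# Sketch of two of the Table 1 generators using the Keras ImageDataGenerator.
# Directory names and target size are assumptions, not part of the paper's code.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

gen1 = ImageDataGenerator(rescale=1.0 / 255)   # Generator 1: rescaling only

gen4 = ImageDataGenerator(                      # Generator 4: all augmentation options
    rescale=1.0 / 255,
    rotation_range=40,
    brightness_range=[0.2, 1.0],
    horizontal_flip=True,
    vertical_flip=True,
    fill_mode="nearest",
    featurewise_std_normalization=True,         # these two require gen4.fit(sample_images)
    featurewise_center=True,
    zoom_range=0.2,
    shear_range=0.2,
    width_shift_range=0.2,
    height_shift_range=0.2,
)

# train_flow = gen1.flow_from_directory("200/fold_0/train", target_size=(200, 200),
#                                       class_mode="categorical", batch_size=10)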
Parameter reduction. Once one identifies the best model by stratified k-fold validation, we must reduce its parameters to avoid crashing the mobile application. Hyper-parameter tuning does not affect the model's size, but the number of layers and the number of neurons in each layer do. Suppose the best model from the previous step has two layers, with N filters in the first (convolution) layer and M neurons in the second (fully connected) layer. Increase the filters from 1 to N for the first layer and the neurons from 1 to M for the second layer, and calculate the model accuracy and the number of parameters for each combination. The number of filters or neurons, the size of the filter, dropout, and strong pooling can all be used to reduce the model size. This step is time-consuming, but once a model with fewer parameters is achieved, it can also be deployed on other devices with low computational power, such as a Raspberry Pi. This smaller model contains a compact form of the knowledge learned by the model with more parameters.

2.4 Step 4: Convert model environment to TensorFlow Lite

At this step, we have two options: the first is to convert the model to TensorFlow Lite, and the second is to convert the model to TensorFlow Lite and quantize it (a quantized model executes some or all of the operations on tensors with integers rather than floating-point values) [42].

2.5 Step 5: Specify appropriate metadata

Metadata gives information about the model in addition to its fitted weights and architecture. It includes the model name, the input size (image size), and the output size (number of categories), and must be specified. Table 2 gives a starting point for metadata settings. After this step, we have two files: the TensorFlow Lite model and a label text file. One should also verify that the order of categories in the label text file matches the model's prediction order.

Table 2: Model's metadata parameters and dummy values.
name = model's name, version = v1, image width = 50, image height = 50, image min = 0, image max = 1, mean = [0], std = [255], num classes = 2, author = X
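A minimal sketch of the Step 4 conversion, with and without post-training quantization, is given below; the output file names are our own, and the Step 5 metadata and label file are then attached with the TensorFlow Lite metadata tools described in [45].

# Minimal sketch of Step 4: export a trained Keras model to TensorFlow Lite,
# optionally applying post-training quantization. File names are assumptions.
import tensorflow as tf

def export_tflite(keras_model, path, quantize=False):
    converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)
    if quantize:
        converter.optimizations = [tf.lite.Optimize.DEFAULT]   # post-training quantization
    tflite_bytes = converter.convert()
    with open(path, "wb") as f:
        f.write(tflite_bytes)

# export_tflite(model, "model.tflite")                           # TensorFlow Lite template
# export_tflite(model, "quantizedmodel.tflite", quantize=True)   # Flutter template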
2.6 Step 6: Specify appropriate application development platform

Different application development platforms can be used, such as Flutter (quantized TensorFlow Lite model) or Android Studio (TensorFlow Lite model). An already developed application template can be used to build the deep learning application at this stage. Explore several options and decide on the flow of the application. For example, images can be classified in real time using a camera feed or from captured images. This step is related to application development and will not affect the final result.

2.7 Execution and final considerations

At this point, we have a model that can be deployed on potentially multiple mobile phones for medical image classification. Select 3 to 5 images from all categories and test the TensorFlow Lite (quantized/unquantized) model performance on the computer; that accuracy is the baseline accuracy. For mobile phones, we recommend testing the final model in two ways. The first is to build the application and load those images from the mobile gallery for evaluation (Flutter-based template) using a quantized model. The second is real-time processing (TensorFlow template) using an unquantized model. Just as preprocessing is required to train the model on a computer, there is also a preprocessing engine on the mobile phone, so severe issues can arise when images are tested from the camera feed. The first is the underlying hardware: the same image and the same model can yield different results on a computer and on a mobile phone, and there is no way to fully mitigate this problem. In addition, the sizes of the images can vary (see Figures 4a and 4b), the distance at which the phone should be placed to classify the image can vary, and the location of the object in the captured frame can vary. If the size of the image varies, the phone's distance can be adjusted so that the frame contains the image. We address each issue separately (see Figure 4). Lastly, the background of the images can vary (see Figures 4c and 4d). If the model's accuracy when pictures are loaded from the gallery is low, the model will not perform well when pictures are captured and classified in real time. So, it is recommended to test the model's performance in the Flutter application before shifting to real-time classification.

Fig. 4: Panels 4a/4b: inconsistent images of Hornbeam. Panels 4c/4d: differences in background across categories.

Figure 5 shows how the TensorFlow Lite application template perceives an image. If classification accuracy is not sufficiently high, return to Step 3 (Section 2.3) and repeat the process. To enhance performance, modify the picture size, the machine learning model (number of neurons and layers), and the hyper-parameters. This final step of testing on additional images is essential for robust performance, and even though the model will run without it, we do not recommend skipping it. Figure 6 shows the ways in which the proposed pipeline can be used depending on the type of dataset.

Fig. 5: This diagram shows how a real-time picture is perceived by the mobile phone. The width of the frame in the TensorFlow Lite template is 480 pixels, and the height is 640 pixels. The captured picture is cropped to 480 by 480, as shown by the black box in the middle. The two yellow lines are equal and show that a particular region is ignored.

3 Use Cases

Medical images have the potential to contribute as a less expensive and more rapid option that can deliver results in minutes instead of days [43]. The added value is not a replacement for clinical diagnosis but a way to rapidly augment physician information when facing an uncertain and rapidly evolving virus, thereby improving patient care and outcomes. The system specifications for the following results are: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz, 16 GB RAM, with an NVIDIA GeForce RTX 2060 GPU, running Microsoft Windows 10. The development specifications are CUDA compilation tools release 10.0, V10.0.130, deep learning framework Keras 2.4.3, Python 3.6.8, and TensorFlow 2.3.1. The mobile phone is a HUAWEI Y7 Prime 2019, Android version 8.1.0, EMUI version 8.2.0, model number DUB-LX1.
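For the datasets below, the Step 1 resizing that generates the sub-datasets can be sketched as follows; the folder layout follows Figures 2 and 3, while the paths and the choice of Pillow for resizing are our own assumptions.

# Hedged sketch of Step 1 (Section 2.1): create one sub-dataset per image size.
# src_dir contains one sub-folder per category; paths and sizes are assumptions.
import os
from PIL import Image

def build_subdatasets(src_dir, dst_root, sizes=(50, 100, 200, 300)):
    for size in sizes:
        for category in os.listdir(src_dir):
            out_dir = os.path.join(dst_root, str(size), category)
            os.makedirs(out_dir, exist_ok=True)
            for name in os.listdir(os.path.join(src_dir, category)):
                img = Image.open(os.path.join(src_dir, category, name)).convert("RGB")
                img.resize((size, size)).save(os.path.join(out_dir, name))

# build_subdatasets("dataset1_raw", "dataset1_subdatasets")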
3.1 Dataset 1: Chest X-ray images

Dataset 1 [44] consisted of images of size (1024, 1024, 3), which we reduced (Step 1, Section 2.1) to (50, 50, 3), (100, 100, 3), (200, 200, 3), and (300, 300, 3). Data splitting for stratified 5-fold validation (Step 2, Section 2.2) resulted in training data (70%), validation data (10%), and test data (20%). The model architecture and training (Step 3, Section 2.3) used a CNN with the architecture and hyper-parameters shown in Tables 9 and 10. This dataset already contained augmented positive/negative X-ray scan images, so we skipped data augmentation (Section 2.3). Cross-validation, with normal and augmented images distributed randomly across the train, validation, and test sets, produced the classification results in Table 3, with an image size of 200 yielding the highest accuracy. Parameter reduction (Section 2.3) yielded a model with 4 filters in layer 1 (convolution layer) and 8 neurons in layer 2 (fully connected layer), reducing the model size from 2,374 KB to 24 KB. Taken together with the results from parameter reduction, the model parameters for the metadata were finalized and are shown in Table 4.

Fig. 6: This figure shows the four possibilities for using the proposed pipeline depending on the type of images. The grayed boxes skip a particular step for a particular image type. For custom datasets, all steps are compulsory. If an existing model is used, there is no need for parameter reduction because the model's size is already optimal. When digital images are loaded from the gallery, the image generator will not increase accuracy. For digital and medical images like CT scans, the image size is mostly fixed. If the model's performance when tested on a mobile phone is low, tune the model architecture and perform hyper-parameter optimization by varying the batch size, the number of epochs, and the number of layers in the model.

Table 3: Dataset 1 [44] cross-validation accuracy for different image sizes.
Size 50: 95.88 (+- 1.80); Size 100: 95.77 (+- 1.65); Size 200: 96.54 (+- 1.87); Size 300: 59.80 (+- 19.41)

This reduced model was converted to TensorFlow Lite (Section 2.4), and the associated code snippets and scripts can be found in this paper's code repository. These steps involved running the TensorFlow Lite converter, generating an appropriate label file with the classes, and adding the Table 4 metadata to the TensorFlow Lite model. This resulted in a model with well-defined input size, range of input values, and output (see Table 4). Then, the TensorFlow Lite Android wrapper code generator was used to create platform-specific wrapper code, which efficiently deploys and executes the model on the mobile phone [45]. Next, we elected to use the Flutter image classification template (Section 2.6), allowing users to capture an image with their camera or pass one from their phone's image gallery to the model, which then gives a positive or negative prediction.

Table 4: Model's metadata parameters for the best model of dataset 1.
name = Dataset 1 Model, version = v1, image width = 200, image height = 200, image min = 0, image max = 1, mean = [0], std = [255], num classes = 2, author = X
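The parameter-reduction sweep that produced this 4-filter/8-neuron model can be organized as a simple grid search; the sketch below is ours, with the build and evaluation helpers assumed rather than taken from the paper's code.

# Hedged sketch of the parameter-reduction sweep (Section 2.3): retrain the best
# architecture over a grid of filter/neuron counts, then keep the smallest model
# whose accuracy stays near the best. build_model and evaluate are assumed helpers.
def parameter_reduction(max_filters, max_neurons, build_model, evaluate, tolerance=0.01):
    results = []
    for n_filters in range(1, max_filters + 1):
        for n_neurons in range(1, max_neurons + 1):
            model = build_model(filters=n_filters, neurons=n_neurons)
            accuracy = evaluate(model)                    # validation accuracy after training
            results.append((n_filters, n_neurons, accuracy, model.count_params()))
    best_accuracy = max(r[2] for r in results)
    near_best = [r for r in results if r[2] >= best_accuracy - tolerance]
    return min(near_best, key=lambda r: r[3])             # smallest near-best model

# n_filters, n_neurons, acc, n_params = parameter_reduction(32, 128, build_model, evaluate)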
3.2 Dataset 2: Chest CT images

Dataset 2 [46] contained CT-scan images of size (1024, 1024, 3), which we reduced (Step 1, Section 2.1) to (30, 30, 3), (50, 50, 3), (100, 100, 3), (200, 200, 3), and (300, 300, 3). The model architecture and training (Step 3, Section 2.3) used two CNN models, shown in Table 11 (Model 1) and Table 13 (Model 2), with data augmentation and cross-validation. Table 5 shows the classification results for dataset 2.

Table 5: Dataset 2 cross-validation accuracy. We used 5-fold validation, 5 different sizes of input images, and 4 generators. If the accuracy is too low, try multiple models.
Dataset 2 / Model 1 (columns: Shape=30, 50, 100, 200, 300):
Gen1: 63.11 (+- 2.98), 63.93 (+- 4.33), 65.82 (+- 4.63), 59.56 (+- 3.58), 60.16 (+- 9.06)
Gen2: 59.69 (+- 6.21), 57.82 (+- 2.84), 62.85 (+- 4.13), 56.23 (+- 7.67), 52.37 (+- 6.17)
Gen3: 61.05 (+- 6.48), 59.60 (+- 2.80), 57.65 (+- 2.70), 54.22 (+- 5.46), 56.04 (+- 6.30)
Gen4: 59.20 (+- 5.11), 57.87 (+- 6.51), 53.41 (+- 6.60), 53.56 (+- 3.58), 55.08 (+- 2.91)
Dataset 2 / Model 2 (columns: Shape=30, 50, 100, 200, 300):
Gen1: 50.95 (+- 6.31), 63.98 (+- 1.88), 59.18 (+- 3.80), 62.94 (+- 5.49), 60.74 (+- 6.91)
Gen2: 54.89 (+- 5.13), 60.41 (+- 3.96), 55.30 (+- 4.00), 53.93 (+- 4.03), 50.25 (+- 3.97)
Gen3: 60.68 (+- 3.05), 58.87 (+- 4.76), 56.45 (+- 3.53), 56.30 (+- 3.41), 51.92 (+- 0.81)
Gen4: 55.77 (+- 2.27), 56.13 (+- 5.77), 55.41 (+- 10.60), 50.39 (+- 4.00), 53.33 (+- 5.81)

The best accuracy for dataset 2 was obtained with Model 1, an input image size of 100, and train generator 1. Parameter reduction (Section 2.3) reduced the size of the model from 2,374 KB to 323 KB. Figure 9 (see supplementary material, Section 6) shows the heatmap of accuracies for each combination of filters in the first layer and neurons in the second layer. The model was then converted to TensorFlow Lite (Step 4, Section 2.4), and appropriate metadata was added to the TFLite version of the model (Step 5, Section 2.5). Since the best accuracy for dataset 2 was obtained with Model 1 and image size 100, only the metadata fields listed in Table 6 change.

Table 6: Model's metadata parameters for the best model of dataset 2.
name = Dataset 2 Model, image width = 100, image height = 100

After this step, we have a label file and a model with metadata. For the application (Section 2.6), we used the Flutter image classification template. Examples of the final application at work can be seen in Figures 7a and 7b. In the execution and final considerations (Step 7, Section 2.7), we considered the first 5 images from both categories (CT COVID and CT non-COVID) for the final testing on the mobile phone; those images were not part of model training/testing. We tested the model's performance on the mobile phone in real time, and the final accuracy was 0.60, which means the model cannot be deployed, but the performance increased to 0.80 when images were loaded from the gallery (Flutter app).

3.3 Dataset 3: Leaves

The dataset [47] consists of five leaf types: Ash (24), Beech (28), Hornbeam (30), Mountain oak (20), and Sycamore maple (20). The dataset is reduced (Step 1, Section 2.1) to (50, 50, 3), (100, 100, 3), (200, 200, 3), and (300, 300, 3). The model architecture and training (Step 3, Section 2.3) used two CNN models, shown in Table 11 (Model 1) and Table 12 (Model 2), with data augmentation and cross-validation. Table 7 shows the classification results for dataset 3.

Table 7: Dataset 3 cross-validation accuracy. We used 5-fold validation, 4 different sizes of input images, and 4 generators.
Size 50 - Model 1: Gen1 87.13 (+- 19.63), Gen2 95.03 (+- 4.87), Gen3 96.73 (+- 1.64), Gen4 92.67 (+- 2.94)
Size 100 - Model 2: Gen1 86.57 (+- 15.74), Gen2 91.28 (+- 7.73), Gen3 94.09 (+- 7.36), Gen4 91.28 (+- 8.18)
Size 200 - Model 2: Gen1 98.04 (+- 2.39), Gen2 99.04 (+- 1.90), Gen3 96.04 (+- 3.73), Gen4 94.14 (+- 5.61)
Size 300 - Model 2: Gen1 97.047 (+- 2.41), Gen2 96.04 (+- 1.97), Gen3 99.047 (+- 1.90), Gen4 91.33 (+- 12.92)

Parameter reduction (Section 2.3) is skipped.
The model was converted to TensorFlow Lite (Step 4, Section 2.4), producing two files: model.tflite and quantizedmodel.tflite. Appropriate metadata was added to the TFLite version of the model (Step 5, Section 2.5). After this step, we have a label file and a model with metadata. model.tflite is deployed on the TensorFlow Lite template and quantizedmodel.tflite is deployed on the Flutter template (Section 2.6). In the execution and final considerations (Step 7, Section 2.7), we considered the first 4 images from all categories for the final testing on the mobile phone; those images were not part of model training/testing. We tested the model's performance on the mobile phone with an image size of 50 and generator 3, but the final accuracy was 0.20, which means the model cannot be deployed. At this stage, repeat the process with variations in the model architecture (Step 3, Section 2.3) and test the model's performance again. For shape 224, generator 1, and Model 2 (see Table 12), the performance increased to 0.75 when images were loaded from the gallery (Flutter app). For shape 224, generator 3, and Model 2 (see Table 12), the performance was 0.75 in real time (TensorFlow app).

3.4 Dataset 4: Colorectal adenocarcinoma

This is a set of 7,180 image patches (9 different categories) from N=50 patients with colorectal adenocarcinoma [48]. The dataset is challenging to train and test on, to deploy on the phone, and to test in real time. The dataset is reduced (Step 1, Section 2.1) to (50, 50, 3), (100, 100, 3), (200, 200, 3), and (300, 300, 3). The model architecture and training (Step 3, Section 2.3) used one CNN model, shown in Table 12 (Model 1), with data augmentation and cross-validation. Table 8 shows the classification results for dataset 4.

Table 8: Cancer classification results (columns: Gen1, Gen2, Gen3, Gen4).
Size 50: 73.19 (+- 6.27), 69.19 (+- 10.24), 68.40 (+- 4.96), 70.0 (+- 6.32)
Size 100: 71.20 (+- 4.11), 75.59 (+- 3.20), 70.00 (+- 4.89), 67.20 (+- 7.44)
Size 200: 18.00 (+- 9.54), 22.79 (+- 3.91), 17.20 (+- 4.11), 20.79 (+- 3.24)
Size 300: 16.80 (+- 8.81), 22.39 (+- 2.65), 20.39 (+- 3.44), 22.40 (+- 6.49)

Parameter reduction (Section 2.3) is skipped. The model was converted to TensorFlow Lite (Step 4, Section 2.4), producing two files: model.tflite and quantizedmodel.tflite. Appropriate metadata was added to the TFLite version of the model (Step 5, Section 2.5). After this step, we have a label file and a model with metadata. model.tflite is deployed on the TensorFlow Lite template and quantizedmodel.tflite is deployed on the Flutter template (Section 2.6). In the execution and final considerations (Step 7, Section 2.7), we considered the first 4 images from all categories for the final testing on the mobile phone; those images were not part of model training/testing. We tested the model's performance on the mobile phone in real time, and the final accuracy was 0.20, which means the model cannot be deployed. We increased the image size to 200, and the performance increased to 0.56 when images were loaded from the gallery (Flutter app). One point to note is that the test accuracy was 0.99 on the computer.

Fig. 7: Screenshots of the application for COVID detection using CT scans, deployed on an Android phone. Panels 7a/7b: COVID-negative/positive chest CT scans, respectively.
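The computer-side baseline check from Step 7 (running the exported TFLite model on a handful of held-out images before moving to the phone) can be performed with the TensorFlow Lite interpreter; the following is a minimal sketch, with the model and image file names assumed.

# Minimal sketch of the Step 7 baseline check: run an exported TFLite model on a
# held-out image on the computer. Model and image file names are assumptions.
import numpy as np
import tensorflow as tf
from PIL import Image

def predict_tflite(model_path, image_path):
    interpreter = tf.lite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]
    _, height, width, _ = inp["shape"]
    img = Image.open(image_path).convert("RGB").resize((width, height))
    x = (np.asarray(img, dtype=np.float32) / 255.0)[None, ...]   # same rescaling as training
    interpreter.set_tensor(inp["index"], x)
    interpreter.invoke()
    return interpreter.get_tensor(out["index"])[0]               # per-class confidence scores

# print(predict_tflite("model.tflite", "CT_COVID/sample_0.png"))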
The following paragraph elaborates on the time required to execute the pipeline. For reading the images (Step 1, Section 2.1), writing a script may take an average of 20-30 minutes, depending on how the dataset is stored. Below we calculated the time to train the model for images of various sizes for dataset 2, containing about 744 images. The time to train the machine learning model (Step 3, Section 2.3) was 10, 17, 50, 180, and 600 minutes (about 14 hours in total) for images with dimensions 30, 50, 100, 200, and 300, respectively. The parameter reduction step (Section 2.3) is time-consuming, and for one model having two layers, it took about 1 to 2 days. Converting the model to TensorFlow Lite and adding metadata (Steps 4 and 5, Sections 2.4 and 2.5) takes about 5 minutes. Modifying the image classification template (Step 6, Section 2.6) for a specific dataset takes about 30 minutes. Building and deploying the application takes about 6 hours for one phone. So the total time for running the pipeline for one dataset is about 3 days.

4 Discussion

This section contains the limitations and future directions of the proposed pipeline. There are a few limitations associated with the proposed approach. For example, we considered only Android phones (from a specific vendor) running a particular version of the Android operating system. The final classification performance would most likely be similar on other phones running the same Android operating system, but for an iPhone or a Raspberry Pi, the results may vary. Such applications can also be developed for the iPhone using a different application development framework, which is one of the future directions for the proposed framework.

5 Conclusion

This study proposed a pipeline for deploying a deep learning model for medical image classification on mobile phones. The scope of the solution is not limited to COVID-19; we can also use it for breast cancer or any other medical dataset, making the mobile phone a diagnostic tool for medical image classification. Complex models and other high-end application development skills can also enable image segmentation. It is essential to highlight the usability and applications of the proposed pipeline. Imagine traveling in a deep forest that has insects and plants that can cause rashes. If someone suffers a skin allergy from a plant, they cannot call a doctor or find any medical assistance. Nevertheless, a mobile phone application could indicate which medicine is appropriate for a particular injury. Such applications can also empower doctors in clinical settings where they may require knowledge from other sources to better diagnose some diseases. They will also be accessible to the public, as about 4.3 billion people use mobile phones.

6 Supplementary information

The documentation associated with the manuscript is available at the following link: https://github.com/MuhammadMuneeb007/A-pipeline-for-image-classification-using-deep-learning-on-mobile-phones
The code segments associated with the documentation are available at the following link: https://muhammadmuneeb007.github.io/A-pipeline-for-image-classification-using-deep-learning-on-mobile-phones/Find%20a%20dataset.html#directory-form
The files, associated applications, and directories are available at the following link: https://1drv.ms/u/s!AlFVll05llt7gwheV2i4SN3rba13?e=My0QKq
This section contains the material referenced in Section 3.
Table 9: Model 1 architecture for dataset 1.
Layer 1 - Conv2D (kernel size = (3,3)), 30 filters
Layer 2 - MaxPool2D (pool size = (2,2))
Reshape
Layer 3 - FullyConnected (50 neurons), ReLU
Layer 4 - FullyConnected (2 neurons), Softmax

Table 10: Hyper-parameters for datasets 1 and 2.
Batch size: 10; Epochs: 50; Validation size: 10%; Optimizer: SGD; Loss: categorical/binary crossentropy; Metrics: Accuracy

Table 11: Model 1 architecture for dataset 2.
Layer 1 - Conv2D (kernel size = (3,3)), 32 filters
Layer 2 - MaxPool2D (pool size = (2,2))
Reshape
Layer 3 - FullyConnected (128 neurons), ReLU
Layer 4 - FullyConnected (2 neurons), Softmax

Table 12: Model 2 architecture for datasets 3 and 4. We increased the number of convolutional layers to extract more information because the Table 13 model did not work.
Layer 1 - Conv2D (kernel size = (3,3)), 32 filters
Layer 2 - MaxPool2D (pool size = (2,2))
Layer 3 - Conv2D (kernel size = (3,3)), 32 filters
Layer 4 - MaxPool2D (pool size = (2,2))
Layer 5 - Conv2D (kernel size = (3,3)), 32 filters
Layer 6 - MaxPool2D (pool size = (2,2))
Layer 7 - Conv2D (kernel size = (3,3)), 64 filters
Layer 8 - MaxPool2D (pool size = (2,2))
Reshape
Layer 9 - FullyConnected (128 neurons), ReLU
Layer 10 - FullyConnected (50 neurons), ReLU
Layer 11 - FullyConnected (20 neurons), ReLU
Layer 12 - FullyConnected (number of categories), Softmax

Table 13: Model 2 architecture for dataset 2.
Layer 1 - Conv2D (kernel size = (3,3)), 32 filters
Layer 2 - MaxPool2D (pool size = (2,2))
Layer 3 - Conv2D (kernel size = (3,3)), 64 filters
Layer 4 - MaxPool2D (pool size = (2,2))
Reshape
Layer 5 - FullyConnected (256 neurons), ReLU
Layer 6 - FullyConnected (128 neurons), ReLU, Softmax

Fig. 8: Heatmap of accuracies. The Y-axis and X-axis represent the number of neurons in the first layer and the second layer.

Fig. 9: Heatmap of accuracies. The Y-axis and X-axis represent the number of neurons in the first layer and the second layer.

Acknowledgments

This publication is based upon work supported by the Khalifa University of Science and Technology under Award No. CIRA-2019-050 to SFF.

References

1. S. Mitra and B. U. Shankar, "Medical image analysis for cancer management in natural computing framework," Information Sciences, vol. 306, pp. 111-131, Jun. 2015. https://doi.org/10.1016/j.ins.2015.02.015
2. P. D. Velusamy and P. Karandharaj, "Medical image processing schemes for cancer detection: A survey," in 2014 International Conference on Green Computing Communication and Electrical Engineering (ICGCCEE). IEEE, Mar. 2014. https://doi.org/10.1109/icgccee.2014.6922267
3. S. Bauer, R. Wiest, L.-P. Nolte, and M. Reyes, "A survey of MRI-based medical image analysis for brain tumor studies," Physics in Medicine and Biology, vol. 58, no. 13, pp. R97-R129, Jun. 2013. https://doi.org/10.1088/0031-9155/58/13/r97
4. W. Lin, T. Tong, Q. Gao, D. Guo, X. Du, Y. Yang, G. Guo, M. Xiao, M. Du, and X. Q., "Convolutional neural networks-based MRI image analysis for the Alzheimer's disease prediction from mild cognitive impairment," Frontiers in Neuroscience, vol. 12, Nov. 2018. https://doi.org/10.3389/fnins.2018.00777
5. J. Islam and Y. Zhang, "Brain MRI analysis for Alzheimer's disease diagnosis using an ensemble system of deep convolutional neural networks," Brain Informatics, vol. 5, no. 2, May 2018. https://doi.org/10.1186/s40708-018-0080-3
6. S. Bhattacharya, P. K. R. Maddikunta, Q.-V. Pham, T. R. Gadekallu, S. R. K. S, C. L. Chowdhary, M. Alazab, and M. J. Piran, "Deep learning and medical image processing for coronavirus (COVID-19) pandemic: A survey," Sustainable Cities and Society, vol. 65, p. 102589, Feb. 2021. https://doi.org/10.1016/j.scs.2020.102589
7. S. Liang, H. Liu, Y. Gu, X. Guo, H. Li, L. Li, Z. Wu, M. Liu, and L. Tao, "Fast automated detection of COVID-19 from medical images using convolutional neural networks," Communications Biology, vol. 4, no. 1, Jan. 2021. https://doi.org/10.1038/s42003-020-01535-7
8. T. A. Soomro, L. Zheng, A. J. Afifi, A. Ali, M. Yin, and J. Gao, "Artificial intelligence (AI) for medical imaging to combat coronavirus disease (COVID-19): a detailed review with direction for future research," Artificial Intelligence Review, Apr. 2021. https://doi.org/10.1007/s10462-021-09985-z
9. G. Litjens, T. Kooi, B. E. Bejnordi, A. A. A. Setio, F. Ciompi, M. Ghafoorian, J. A. van der Laak, B. van Ginneken, and C. I. Sánchez, "A survey on deep learning in medical image analysis," Medical Image Analysis, vol. 42, pp. 60-88, Dec. 2017. https://doi.org/10.1016/j.media.2017.07.005
10. "Welcome to Colaboratory," https://colab.research.google.com/notebooks/intro.ipynb (accessed 08/16/2021).
11. "Microsoft Azure Notebooks," https://notebooks.azure.com/ (accessed 08/16/2021).
12. T. V. Dittimi and C. Y. Suen, "Mobile phone based ensemble classification of deep learned feature for medical image analysis," IETE Technical Review, vol. 37, no. 2, pp. 157-168, Feb. 2019. https://doi.org/10.1080/02564602.2019.1576550
13. B. Hunt, A. J. Ruiz, and B. W. Pogue, "Smartphone-based imaging systems for medical applications: a critical review," Journal of Biomedical Optics, vol. 26, no. 04, Apr. 2021. https://doi.org/10.1117/1.jbo.26.4.040902
14. N.-M. Cheung, V. Pomponiu, D. Toan, and H. Nejati, "Mobile image analysis for medical applications," SPIE Newsroom, Jul. 2015. https://doi.org/10.1117/2.1201506.005997
15. A. Karargyris, O. Karargyris, and A. Pantelopoulos, "DERMA/care: An advanced image-processing mobile application for monitoring skin cancer," in 2012 IEEE 24th International Conference on Tools with Artificial Intelligence. IEEE, Nov. 2012. https://doi.org/10.1109/ictai.2012.180
16. S. Pang, S. Wang, A. Rodríguez-Patón, P. Li, and X. Wang, "An artificial intelligent diagnostic system on mobile android terminals for cholelithiasis by lightweight convolutional neural network," PLOS ONE, vol. 14, no. 9, p. e0221720, Sep. 2019. https://doi.org/10.1371/journal.pone.0221720
17. M. Curiel and L. Flórez-Valencia, "Challenges in processing medical images in mobile devices," in Trends and Advancements of Image Processing and Its Applications. Springer International Publishing, Nov. 2021, pp. 31-51. https://doi.org/10.1007/978-3-030-75945-2_2
18. A. Johny and K. N. Madhusoodanan, "Edge computing using embedded webserver with mobile device for diagnosis and prediction of metastasis in histopathological images," International Journal of Computational Intelligence Systems, vol. 14, no. 1, Nov. 2021. https://doi.org/10.1007/s44196-021-00040-x
19. C. Morikawa, M. Kobayashi, M. Satoh, Y. Kuroda, T. Inomata, H. Matsuo, T. Miura, and M. Hilaga, "Image and video processing on mobile devices: a survey," The Visual Computer, vol. 37, no. 12, pp. 2931-2949, Jun. 2021. https://doi.org/10.1007/s00371-021-02200-8
20. R. Rajendran and J. Rajendiran, "Image analysis using smartphones for medical applications: A survey," pp. 275-290, Jul. 2019. https://doi.org/10.1002/9781119439004.ch12
21. J. K. Carroll, A. Moorhead, R. Bond, W. G. LeBlanc, R. J. Petrella, and K. Fiscella, "Who uses mobile phone health apps and does use matter? A secondary data analytics approach," Journal of Medical Internet Research, vol. 19, no. 4, p. e125, Apr. 2017. https://doi.org/10.2196/jmir.5604
22. Z. F. Khan and S. R. Alotaibi, "Applications of artificial intelligence and big data analytics in m-health: A healthcare system perspective," Journal of Healthcare Engineering, vol. 2020, pp. 1-15, Sep. 2020. https://doi.org/10.1155/2020/8894694
23. M. Straczkiewicz, P. James, and J.-P. Onnela, "A systematic review of smartphone-based human activity recognition methods for health research," npj Digital Medicine, vol. 4, no. 1, Oct. 2021. https://doi.org/10.1038/s41746-021-00514-4
24. K. Nwe, M. E. Larsen, N. Nelissen, and D. C.-W. Wong, "Medical mobile app classification using the National Institute for Health and Care Excellence evidence standards framework for digital health technologies: Interrater reliability study," Journal of Medical Internet Research, vol. 22, no. 6, p. e17457, Jun. 2020. https://doi.org/10.2196/17457
25. Q. Bai, Q. Dan, Z. Mu, and M. Yang, "A systematic review of emoji: Current research and future perspectives," Frontiers in Psychology, vol. 10, Oct. 2019. https://doi.org/10.3389/fpsyg.2019.02221
26. R. C. Jisha, J. M. Amrita, A. R. Vijay, and G. S. Indhu, "Mobile app recommendation system using machine learning classification," in 2020 Fourth International Conference on Computing Methodologies and Communication (ICCMC), 2020, pp. 940-943.
27. K. Zhu, Z. Liu, L. Zhang, and X. Gu, "A mobile application recommendation framework by exploiting personal preference with constraints," Mobile Information Systems, vol. 2017, pp. 1-9, 2017. https://doi.org/10.1155/2017/4542326
28. K. Choi, K.-A. Toh, and H. Byun, "Realtime training on mobile devices for face recognition applications," Pattern Recognition, vol. 44, no. 2, pp. 386-400, Feb. 2011. https://doi.org/10.1016/j.patcog.2010.08.009
29. B. Ríos-Sánchez, D. C. da Silva, N. Martín-Yuste, and C. Sánchez-Ávila, "Deep learning for face recognition on mobile devices," IET Biometrics, vol. 9, no. 3, pp. 109-117, Feb. 2020. https://doi.org/10.1049/iet-bmt.2019.0093
30. D. M. Hedderich and S. B. Eickhoff, "Machine learning for psychiatry: getting doctors at the black box?" Molecular Psychiatry, vol. 26, no. 1, pp. 23-25, Nov. 2020. https://doi.org/10.1038/s41380-020-00931-z
31. "aylward.org - Open-access medical image repositories," https://www.aylward.org/notes/open-accessmedical-image-repositories (accessed 08/18/2021).
32. "SMIR - SICAS Medical Image Repository," https://www.smir.ch/ (accessed 08/18/2021).
33. S. S. Nath, G. Mishra, J. Kar, S. Chakraborty, and N. Dey, "A survey of image classification methods and techniques," in 2014 International Conference on Control, Instrumentation, Communication and Computational Technologies (ICCICCT). IEEE, Jul. 2014. https://doi.org/10.1109/iccicct.2014.6993023
34. B. G. Marcot and A. M. Hanea, "What is an optimal value of k in k-fold cross-validation in discrete Bayesian network analysis?" Computational Statistics, vol. 36, no. 3, pp. 2009-2031, Jun. 2020. https://doi.org/10.1007/s00180-020-00999-9
35. Y. Aslam and S. N, "A review of deep learning approaches for image analysis," in 2019 International Conference on Smart Systems and Inventive Technology (ICSSIT), 2019, pp. 709-714.
36. S. S. Yadav and S. M. Jadhav, "Deep convolutional neural network based medical image classification for disease diagnosis," Journal of Big Data, vol. 6, no. 1, Dec. 2019. https://doi.org/10.1186/s40537-019-0276-2
37. F. Chollet, Deep Learning with Python, 1st ed. USA: Manning Publications Co., 2017.
38. X. Xia, E. Shihab, Y. Kamei, D. Lo, and X. Wang, "Predicting crashing releases of mobile applications," in Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement. ACM, Sep. 2016. https://doi.org/10.1145/2961111.2962606
39. C. Shorten and T. M. Khoshgoftaar, "A survey on image data augmentation for deep learning," Journal of Big Data, vol. 6, no. 1, Jul. 2019. https://doi.org/10.1186/s40537-019-0197-0
40. L. Perez and J. Wang, "The effectiveness of data augmentation in image classification using deep learning," arXiv preprint arXiv:1712.04621, 2017.
41. "Building powerful image classification models using very little data," https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html (accessed 08/22/2021).
42. Y. Deng, "Deep learning on mobile devices: a review," in Mobile Multimedia/Image Processing, Security, and Applications 2019, S. S. Agaian, S. P. DelMarco, and V. K. Asari, Eds. SPIE, May 2019. https://doi.org/10.1117/12.2518469
43. "Plain radiograph/X-ray - InsideRadiology," https://www.insideradiology.com.au/plain-radiograph-x-ray/ (accessed 06/02/2021).
44. A. M. Alqudah, "Augmented COVID-19 X-ray images dataset," 2020. https://data.mendeley.com/datasets/2fxz4px6d8/3
45. "Adding metadata to TensorFlow Lite models," https://www.tensorflow.org/lite/convert/metadata (accessed 09/06/2021).
46. J. Zhao, Y. Zhang, X. He, and P. Xie, "COVID-CT-Dataset: a CT scan dataset about COVID-19," arXiv preprint arXiv:2003.13865, 2020.
47. "Leaves," https://ftp.cngb.org/pub/gigadb/pub/10.5524/100001_101000/100251/Leaf_outlines.zip (accessed 03/12/2022).
48. "100,000 histological images of human colorectal cancer and healthy tissue - Zenodo," https://zenodo.org/record/1214456#.YhxvROhBzIU (accessed 03/09/2022).

Authors
Muhammad Muneeb obtained his M.Sc. in Computer Science from Khalifa University, Abu Dhabi, UAE. He is currently working as a research associate at the same institute under the supervision of Dr. Samuel F. Feng. He likes to work on interdisciplinary problems and has interests in algorithms, automation, genetics, medical image analysis, and optimization.

Samuel F. Feng obtained his PhD in Applied and Computational Mathematics from Princeton University (2012), his MA from Princeton University (2009), and his BA from Rice University (2007). From 2012 to 2014 he was a Ruth L. Kirschstein NRSA Postdoctoral Fellow at the Princeton Neuroscience Institute, working on stochastic models of decision making in psychology and neuroscience. In 2014 he joined the Department of Mathematics at Khalifa University, Abu Dhabi, UAE, where he is currently an assistant professor.

Andreas Henschel earned his M.Sc. and Ph.D. in Computer Science from the Technical University of Dresden (Germany), in 2002 and 2008, respectively. He joined Masdar Institute as a postdoctoral researcher in 2009. In 2011, he became an Assistant Professor at Masdar Institute, UAE. He spent one year at the Massachusetts Institute of Technology (MIT), USA, as a Visiting Scholar.

© 2022 By AIRCC Publishing Corporation. This article is published under the Creative Commons Attribution (CC BY) license.