Fuzzy Description of Skin Lesions

Nikolaos Laskaris(a), Lucia Ballerini(a), Robert B. Fisher(a), Ben Aldridge(b), Jonathan Rees(b)

(a) School of Informatics, University of Edinburgh, 10 Crichton Street, Edinburgh, UK
(b) Department of Dermatology, University of Edinburgh, Lauriston Place, Edinburgh, UK

Further author information: send correspondence to Lucia Ballerini, E-mail: [email protected], Telephone: +44 (0)131 651 5664.

ABSTRACT

We propose a system for describing skin lesion images based on a human perception model. Pigmented skin lesions, including melanoma and other types of skin cancer as well as non-malignant lesions, are used. Work on the classification of skin lesions already exists, but it concentrates mainly on melanoma. The novelty of our work is that our system assigns semantic labels to skin lesion images in a manner similar to humans. The work consists of two parts: first we capture the way users perceive each lesion, then we train a machine learning system that simulates how people describe images. For the first part, we chose 5 attributes: colour (light to dark), colour uniformity (uniform to non-uniform), symmetry (symmetric to non-symmetric), border (regular to irregular), and texture (smooth to rough). Using a web-based form, we asked people to pick a value of each attribute for each lesion. In the second part, we extracted 93 features from each lesion and trained a machine learning algorithm using these features as input and the values of the human attributes as output. Results are quite promising, especially for the colour-related attributes, where our system classifies over 80% of the lesions into the same semantic classes as humans.

Keywords: skin lesion images, semantic labels, human perception, feature extraction, supervised learning

1. INTRODUCTION

Dermatologists usually perform the diagnosis of skin lesions based on their personal experience. The criteria used to reach a diagnosis are often derived from qualitative properties they observe in the images, but they are not able to describe the lesions in a consistent way using quantitative measurements. The goal of this work is to develop a system that can describe skin lesions in a manner similar to dermatologists. This is an attempt to create a model that simulates human perception, using image analysis and machine learning techniques. The system takes as input features extracted from skin lesion images and produces as output a description of some shape, colour and texture properties (i.e. regular, irregular, pale, dark, uniform, rough, etc.). In this way, the presented system assigns descriptive labels to skin lesions in a manner similar to the visual perception of dermatologists.

Methods that interpret the visual information carried in a digital image, describing it in the way humans do by "extracting" high-level semantics such as "people playing football" or "tiger hunting its prey", have attracted great interest in recent years. Humans tend to use high-level features (concepts), such as keywords and text descriptors, to interpret images and measure their similarity. On the other hand, the features automatically extracted using computer vision techniques are mostly low-level features (colour, texture, shape, spatial layout, etc.).
In general, there is no direct link between the high-level concepts and the low-level features [1]. More specifically, the discrepancy between the limited descriptive power of low-level image features and the richness of user semantics is referred to as the "semantic gap" [2]. A recent survey [1] shows that the state-of-the-art techniques for reducing the "semantic gap" fall mainly into five categories: (1) using object ontology to define high-level concepts; (2) using machine learning methods to associate low-level features with query concepts; (3) using relevance feedback to learn the user's intention; (4) generating semantic templates to support high-level image retrieval; (5) fusing the evidence from HTML text and the visual content of images for WWW image retrieval. Major recent publications are included in that survey, covering different aspects of research in this area, including low-level image feature extraction, similarity measurement, and deriving high-level semantic features. Some authors [3, 4] report that in recent years content-based image retrieval (CBIR) research has shifted its focus towards bridging the gap between low-level features and high-level semantics. This work can be seen as an attempt to reduce the "semantic gap" by using supervised learning methods to associate low-level features with high-level semantic concepts.

Most of the work in dermatology has focused on skin cancer detection. Different techniques for segmentation, feature extraction and classification have been reported by several authors. Concerning segmentation, Celebi et al. [5] presented a systematic overview of recent border detection methods: clustering followed by active contours is the most popular. Numerous features have been extracted from skin images, including shape, colour, texture and border properties [6, 7, 8]. Classification methods range from discriminant analysis to neural networks and support vector machines [9, 10, 11]. These methods are mainly developed for images acquired by epiluminescence microscopy (ELM or dermoscopy) and they focus on melanoma, which is actually a rather rare, but quite dangerous, condition, whereas other skin cancers are much more common. However, no previous work has addressed automatically annotating skin lesion images with a symbolic label, and thus the novelty of this work consists in both inspecting the way people understand skin lesion images and attempting to model human perception. To our knowledge, there is only one proposal for a CBIR system for skin lesion images that incorporates human perception. The system proposed by Celebi et al. [12] incorporates human perception to guide the search for an optimum similarity function. They designed an experiment to measure the perceived similarity of each image with every other image in the database. However, they focus only on shape similarity. The novelty of our work is that our system gives skin lesion images a semantic label in a manner similar to humans. Our system classifies the 31 lesions into the same colour description as the most commonly selected label with 19% error.

The structure of the paper is as follows. Section 2 describes the two parts of our work, i.e. the experiment to gather user descriptions of lesions and the machine learning system implementation. Results are presented in Section 3, together with their evaluation. Conclusions and possible future directions follow.
2. METHOD

This work consists of two parts: the first step is to capture how users perceive each image; the second step is to train a machine learning system that simulates how people describe images. The images used in this work were acquired using a Canon EOS 350D SLR camera, with a resolution of about 0.03 mm. A subset of 31 pigmented lesions with different characteristics was chosen by dermatologists for this part of the study. Some images are shown in Figure 1.

Figure 1. Examples of skin lesion images used in this work: (a) lentigo, (b) seborrheic keratosis, (c) naevus.

2.1 Gathering User Perception

A pilot study showed that people tend to describe a lesion with concepts they are familiar with, e.g. "it looks like a pizza". Moreover, there was huge variability in the descriptions of the same image using natural language. For this reason, a predefined vocabulary was chosen. For each concept, the user can assign an attribute, e.g. for colour: "light", "dark" and so on. These concepts are:

1. Colour of lesion [light to dark]
2. Uniformity of colour [uniform to heterogeneous]
3. Asymmetry of lesion [symmetric to asymmetric]
4. Border of lesion [regular to irregular]
5. Texture of lesion [smooth to rough]

In order to collect the data, a web-based questionnaire was created. The questionnaire was developed in PHP and JavaScript, and a PostgreSQL database server was set up to store the user selections. The main page of the questionnaire, where the data collection for the image set is done, is shown in Figure 2. Users are presented with a slideshow and asked to input how they perceive each image, using five sliders with values ranging from 0 to 10, one for each of the five concepts.

Figure 2. Screenshot of the image description form, where volunteers describe the images through the use of five bars.

The web-based questionnaire also consists of two other parts:

• an introductory page, where the user identification and the IP address check are done. Accepted users are required to provide a nickname (to respect their anonymity) and their medical qualification (whether or not they are medical doctors);

• a final page, which performs an integrity check of the whole data set produced by the user and stores it in the database.

Currently, 37 volunteers have submitted their intuitive view of the selected 31 lesions (6 medical doctors and 31 with no medical knowledge). Generally, we noted high intra-class variation in the non-doctor group of volunteers, and slightly lower variation in the group of doctors. In Table 1 we report the overall mean and standard deviation of the values gathered by our form for the images shown in Figure 1.

Table 1. Examples of values (mean ± stdev) of doctor and non-doctor perception for the lesions shown in Figure 1.

                       doctor                                        non-doctor
                       (a)           (b)           (c)               (a)           (b)           (c)
  lesion colour        0.59 ± 0.53   8.08 ± 1.12   5.31 ± 0.48       0.62 ± 0.87   8.51 ± 1.18   5.25 ± 1.90
  colour uniformity    1.45 ± 1.63   6.81 ± 2.27   3.87 ± 2.34       2.14 ± 2.49   5.51 ± 2.05   4.51 ± 1.94
  lesion asymmetry     4.13 ± 3.15   6.16 ± 1.68   0.75 ± 0.63       7.09 ± 2.85   7.22 ± 1.97   2.34 ± 1.79
  lesion border        5.78 ± 3.20   3.72 ± 3.04   1.75 ± 2.09       6.86 ± 2.82   5.85 ± 3.06   2.72 ± 1.72
  lesion texture       0.78 ± 0.56   7.69 ± 1.23   3.41 ± 2.05       1.03 ± 1.51   8.42 ± 1.14   4.01 ± 2.52
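As an illustration only, the aggregation behind Table 1 (per-lesion, per-attribute mean and standard deviation of the slider values, computed separately for doctors and non-doctors) can be sketched in a few lines of Python. The record layout and field names below are hypothetical stand-ins, not the actual PHP/PostgreSQL schema used by the questionnaire.

```python
# Hypothetical sketch: aggregating questionnaire responses into the
# per-lesion mean +/- stdev statistics reported in Table 1.
# Field names and data layout are illustrative, not the actual schema.
from collections import defaultdict
from statistics import mean, stdev

ATTRIBUTES = ["colour", "uniformity", "asymmetry", "border", "texture"]

# One record per (volunteer, lesion) pair; slider values lie in [0, 10].
responses = [
    {"lesion": "a", "is_doctor": True,  "colour": 0.6, "uniformity": 1.5,
     "asymmetry": 4.0, "border": 5.8, "texture": 0.8},
    {"lesion": "a", "is_doctor": False, "colour": 0.7, "uniformity": 2.1,
     "asymmetry": 7.0, "border": 6.9, "texture": 1.0},
    # ... one record per volunteer and lesion
]

def summarise(records):
    """Return {(lesion, attribute): (mean, stdev)} over the given records."""
    values = defaultdict(list)
    for rec in records:
        for attr in ATTRIBUTES:
            values[(rec["lesion"], attr)].append(rec[attr])
    return {key: (mean(v), stdev(v) if len(v) > 1 else 0.0)
            for key, v in values.items()}

doctor_stats = summarise([r for r in responses if r["is_doctor"]])
non_doctor_stats = summarise([r for r in responses if not r["is_doctor"]])
print(doctor_stats[("a", "colour")])
```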
The initial plan was to have two groups of people, doctors and non-doctors, and to conduct the same experiment on both groups separately. Unfortunately, the number of medical doctors (just 6) was too small to provide reliable results. Moreover, after analysing the data of both groups, we can say that the difference between the two groups in evaluating each lesion is very small. It was therefore decided to drop the distinction between doctors and non-doctors, merging the two groups into a single one. Thus, each image has 37 different evaluations, equal to the number of volunteers. The standard deviation of these evaluations, averaged over all 31 images, is presented in Table 2. We observe that in most cases the deviation is more than 2, which, for our domain of [0, 10], is very high.

Table 2. Average standard deviation between users' inputs over all 31 images.

  lesion colour        1.419
  colour uniformity    2.022
  lesion asymmetry     2.221
  lesion border        2.341
  lesion texture       2.144

2.2 Machine Learning System

In this section we describe the system, based on machine learning algorithms, that imitates the human understanding of lesion images. Figure 3 shows an overview of the system. The system needs to be trained using low-level image features as input and human descriptions as output.

Figure 3. Overview of the machine learning system.

2.2.1 Feature Extraction

A simple segmentation algorithm based on the Otsu method [13] was used to distinguish between the background (safe skin) and the foreground (lesion); a minimal sketch of this step is given after the feature-group lists below. It gave good results in most cases. In the 10% of cases where the algorithm fails, a manual segmentation is done. Ninety-three features were extracted from our images. The extracted features include:

• area, perimeter and compactness of the lesion (3 features)
• central moments of the border of the lesion and of the whole lesion (2 features)
• mean and standard deviation of colour values in the RGB (red, green, blue) and HSI (hue, saturation, intensity) colour spaces, calculated for the entire lesion and for 8 subregions (18 features)
• mean and standard deviation of the contrast on the border of the lesion (2 features)
• textures based on the Fourier transform (4 features)
• textures based on co-occurrence matrices (36 features)
• textures based on Gabor filters (28 features)

We created 5 different classifiers, each one using a subset of the 93 extracted numerical features as input and one of the human-perceived attributes as output. For example, classifier 1 computed the fuzzy description [light, medium, dark] for the lesion colour. The features extracted for each lesion are summarised in Table 3. A detailed analysis of the method used to extract each of them is given in another report [14]. The features presented in Table 3 are grouped into the following categories during the system learning process, to train each classifier system:

• Features for "Lesion Colour": {6-17}
• Features for "Colour Uniformity": {6-17, 20-25}
• Features for "Lesion Asymmetry": {1-5, 20-25}
• Features for "Lesion Border": {3, 4, 5, 18, 19}
• Features for "Lesion Texture": {26-93}

From the last group, the texture features, we also considered 3 subsets:

• Fourier features: {26-29}
• Co-occurrence matrix features: {30-65}
• Gabor features: {66-93}

which we used individually to train 3 different classifier systems.
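As a minimal sketch of the segmentation step described above, the snippet below applies an Otsu threshold and extracts a few of the shape features from Table 3 (features 1-3). It is our own illustration, not the authors' exact pipeline: the library choice (scikit-image), the clean-up steps, and the assumption that the lesion is darker than the surrounding skin are all ours, and the compactness formula used is one common definition.

```python
# Minimal sketch of Otsu-based lesion segmentation plus basic shape features.
# Assumptions: lesion darker than skin; scikit-image as the image library.
import numpy as np
from skimage import io, color, filters, measure, morphology

def segment_lesion(image_path):
    rgb = io.imread(image_path)
    grey = color.rgb2gray(rgb)                 # work on intensity only
    threshold = filters.threshold_otsu(grey)   # global Otsu threshold
    mask = grey < threshold                    # lesion assumed darker than skin
    mask = morphology.remove_small_objects(mask, min_size=500)
    mask = morphology.remove_small_holes(mask, area_threshold=500)
    labels = measure.label(mask)
    if labels.max() == 0:
        return mask                            # segmentation failed; fall back to manual
    # keep only the largest connected component as the lesion region
    largest = max(measure.regionprops(labels), key=lambda r: r.area)
    return labels == largest.label

def shape_features(mask):
    """Area, perimeter and one common compactness index (P^2 / 4*pi*A)."""
    props = measure.regionprops(mask.astype(int))[0]
    area, perimeter = props.area, props.perimeter
    compactness = (perimeter ** 2) / (4 * np.pi * area)
    return {"area": area, "perimeter": perimeter, "compactness": compactness}
```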
Table 3. The 93 features extracted from the lesions, used as input to the systems.

  #       Feature                                                 Additional info
  1       Perimeter of the lesion
  2       Area of the lesion
  3       Compactness Index
  4       Normalised Central Moment of the border
  5       Normalised Central Moment of the whole lesion
  6-8     Mean {red, green, blue} of lesion
  9-11    Std {red, green, blue} of lesion
  12-14   Mean {hue, saturation, intensity} of lesion
  15-17   Std {hue, saturation, intensity} of lesion
  18      Mean contrast on the border                             over 8 points
  19      Std contrast on the border                              over 8 points
  20-22   Std of {red, green, blue}                               over 8 regions of the lesion
  23-25   Std of {hue, saturation, intensity}                     over 8 regions of the lesion
  26      Entropy of Fourier transform                            over the lesion's bounding box
  27      Inertia of Fourier transform                            over the lesion's bounding box
  28      Energy of Fourier transform                             over the lesion's bounding box
  29      Weighted Distance of Fourier transform                  over the lesion's bounding box
  30-35   Contrast of Co-occurrence matrix                        for 6 offsets {1,2,3,4,5,6}, averaged over 4 directions {0, π/4, π/2, 3π/4}
  36-41   Dissimilarity of Co-occurrence matrix                   (as above)
  42-47   Homogeneity of Co-occurrence matrix                     (as above)
  48-53   Energy of Co-occurrence matrix                          (as above)
  54-59   Entropy of Co-occurrence matrix                         (as above)
  60-65   Correlation of Co-occurrence matrix                     (as above)
  66-69   1st Gray scale Hu invariant of Gabor filtered image     for 4 rotations {0, π/4, π/2, 3π/4} of the Gabor kernel
  70-73   2nd Gray scale Hu invariant of Gabor filtered image     (as above)
  74-77   3rd Gray scale Hu invariant of Gabor filtered image     (as above)
  78-81   4th Gray scale Hu invariant of Gabor filtered image     (as above)
  82-85   5th Gray scale Hu invariant of Gabor filtered image     (as above)
  86-89   6th Gray scale Hu invariant of Gabor filtered image     (as above)
  90-93   7th Gray scale Hu invariant of Gabor filtered image     (as above)

2.2.2 User Data Interpretation

The output of the system has been created from the data collected from the questionnaire as follows. For each image attribute, we: (a) create a histogram, discretising the answers into 3 bins; (b) create a probabilistic model of whether a specific evaluation is in one class (bin) or another. For example, assume we have a set of 37 "lesion colour" evaluations for image (c) in Figure 1: {7.51, 7.29, 4.41, 1.70, 4.67, 7.26, 2.44, 6.89, 7.52, ..., 4.81}. We create 3 bins: [0, 3.333), [3.333, 6.667), [6.667, 10]. The resulting histogram is [7, 18, 12]. The corresponding probabilistic model is: image (c) has a probability of 7/37 = 0.19 of belonging to light, a probability of 18/37 = 0.49 of belonging to medium, and a probability of 12/37 = 0.32 of belonging to dark. Figure 4 sketches our system.

Figure 4. Overview of our proposal to use a histogram instead of a single value for each attribute.

To evaluate the correctness of each system we propose an evaluation algorithm based on the comparison of the two histograms, not in terms of error (average of the absolute difference on each bin) but in terms of bin ranking. For example:

1. Let the "Lesion Colour" histogram have 3 bins: {Bin 1, Bin 2, Bin 3}.
2. Assume that a specific image has actual probabilities {0.1, 0.3, 0.6} for the three bins.
3. We order the bins according to their probabilities, from the most probable to the least probable: {Bin 3, Bin 2, Bin 1}.
4. We order the bin values estimated by the machine learning algorithm for this image in the same way and compare the two ordered lists.
5. If the two lists contain the bins in the same order, then the classification is correct.

So, an output of {0, 0.1, 0.9}, which has an average error of 0.2 on each bin, is considered correct, while an output of {0.25, 0.4, 0.35}, which has a lower average error (0.167), is considered wrong. This scheme might have some flaws: when the histogram distribution is near-uniform, with all bins having almost the same probability (e.g. {0.31, 0.33, 0.36}), the ranking becomes unstable and our method is problematic. This happens rarely, however, because the histogram distribution is usually approximately Gaussian.
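The binning, the probabilistic model, and the rank-based comparison described above can be made concrete with a short sketch. This is our own minimal illustration following the text; the bin edges match the 3-class discretisation, and the function names are ours.

```python
# Minimal sketch of the user-data interpretation and the rank-based evaluation
# described above. Bin edges follow the text; function names are illustrative.
import numpy as np

BIN_EDGES = [0.0, 10.0 / 3, 20.0 / 3, 10.0]   # [0, 3.333), [3.333, 6.667), [6.667, 10]

def to_probabilities(evaluations):
    """Turn the ~37 slider values for one image/attribute into a 3-bin model."""
    counts, _ = np.histogram(evaluations, bins=BIN_EDGES)
    return counts / counts.sum()               # e.g. [7, 18, 12] -> [0.19, 0.49, 0.32]

def ranking_correct(actual, predicted):
    """Correct when both histograms rank the bins in the same order
    (most probable first), regardless of the absolute per-bin error."""
    return np.array_equal(np.argsort(actual)[::-1], np.argsort(predicted)[::-1])

actual = np.array([0.1, 0.3, 0.6])
print(ranking_correct(actual, np.array([0.0, 0.1, 0.9])))    # True  (average error 0.2)
print(ranking_correct(actual, np.array([0.25, 0.4, 0.35])))  # False (average error ~0.17)
```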
3. RESULTS

In this section, almost all the machine learning algorithms in the Weka software [15] have been used to train each system with the corresponding extracted features, and the best results achieved in each case are presented. Each figure reports the highest percentage of correct classification achieved for each system, and the feature set used to obtain that result. The feature numbering refers to Table 3. A good description of each algorithm included in the Weka toolbox can be found in the book by Witten and Frank [16]. In each bar diagram, the first three classifiers (as they appear from the top) are the same, so the reader can compare the effectiveness of the same classification method on different attributes; in some cases we report additional classifiers which perform slightly better. All the classification settings are the defaults for each classifier, as set by the authors of Weka, except where explicitly mentioned. Leave-one-out cross-validation is used.

As can be seen in Figure 5, the best results (80.64%) are obtained by system 1, trained for the colour attribute. This system correctly classifies 25 of the 31 lesions into the most commonly selected labels. The classifier used in this case was a Multilayer Perceptron. The other 4 systems gave results varying between 52% and 68%, depending on the classifier used and the attribute considered.

Figure 5. Best classification results (%) for the first four attributes using 3 discrete classes: (a) Feature Set: {6-17}; (b) Feature Set: {6-17, 20-25}; (c) Feature Set: {1-5, 20-25}; (d) Feature Set: {3, 4, 5, 18, 19}. For each diagram, Feature Set refers to Table 3.

Figure 6 reports the results obtained for the Texture attribute using all the texture features and the 3 subsets.

Figure 6. Best classification results (%) for the "Lesion's Texture" attribute using 3 discrete classes: (a) Feature Set: {26-93}; (b) Feature Set: {26-29} (Fourier features); (c) Feature Set: {30-65} (Co-occurrence matrix features); (d) Feature Set: {66-93} (Gabor features). For each diagram, Feature Set refers to Table 3.

The classification results for most systems seem to be mediocre, except for the colour attribute. In fact, the classification accuracy is strongly affected by the variance of the user input, being inversely related to it (the higher the variability, the lower the achievable accuracy). This is reflected in our results. Observing Figure 5, it can be seen that the "Lesion's Colour" is in general classified better than the "Colour Uniformity". Similarly, the "Colour Uniformity" and the "Lesion's Asymmetry" are better classified than the "Lesion's Border". Table 2 contains the average standard deviation of the users' input for each attribute. We see that the attribute with the lowest deviation enables the classification algorithms to reach higher levels of accuracy, while as the deviation increases the algorithms are restricted to lower accuracy.
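Although the experiments were run in Weka, the evaluation protocol (leave-one-out cross-validation of one classifier on a chosen feature subset) can be sketched as follows. The scikit-learn MLP and its parameters are illustrative stand-ins for Weka's Multilayer Perceptron, not the exact configuration used; X, y and the column indices are placeholders.

```python
# Hypothetical sketch of the evaluation protocol: leave-one-out cross-validation
# of a classifier (an MLP stand-in for Weka's Multilayer Perceptron) on a
# chosen feature subset. X (31 x 93 feature matrix) and y (most common human
# label per lesion) are placeholders.
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

def loo_accuracy(X, y, feature_columns):
    """Fraction of lesions whose predicted class (e.g. light/medium/dark)
    matches the most common human label, under leave-one-out CV."""
    X_sub = X[:, feature_columns]
    correct = 0
    for train_idx, test_idx in LeaveOneOut().split(X_sub):
        model = make_pipeline(StandardScaler(),
                              MLPClassifier(max_iter=2000, random_state=0))
        model.fit(X_sub[train_idx], y[train_idx])
        correct += int(model.predict(X_sub[test_idx])[0] == y[test_idx][0])
    return correct / len(y)

# e.g. colour attribute with features {6-17} (0-based columns 5..16):
# accuracy = loo_accuracy(X, y_colour, list(range(5, 17)))
```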
4. CONCLUSIONS

We have proposed a model of human perception of skin lesion images that classifies lesions into the same semantic classes as humans. In this system we focus on a few attributes (colour, shape, texture); generalisation to other semantic properties is possible. Future work would be to automatically generate fuzzy rather than discrete descriptions. We noticed that the labels assigned to each image have a very high deviation, meaning that different people describe the same image in different ways. This high variation in the user descriptions of lesions made the resulting classification accuracy quite low in some cases. One future project will be to develop a tool that shows several images to the user and allows them to assign each image to the most visually similar one, instead of assigning a conceptual label.

ACKNOWLEDGMENTS

We thank the Wellcome Trust for funding this project.

REFERENCES

[1] Liu, Y., Zhang, D., Lu, G., and Ma, W.-Y., "A survey of content-based image retrieval with high-level semantics," Pattern Recognition 40, 262–282 (2007).
[2] Smeulders, A. W. M., Worring, M., Santini, S., Gupta, A., and Jain, R., "Content-based image retrieval at the end of the early years," IEEE Transactions on Pattern Analysis and Machine Intelligence 22(12), 1349–1380 (2000).
[3] Carneiro, G., Chan, A. B., Moreno, P. J., and Vasconcelos, N., "Supervised learning of semantic classes for image annotation and retrieval," IEEE Transactions on Pattern Analysis and Machine Intelligence 29(3), 394–410 (2007).
[4] Vasconcelos, N., "From pixels to semantic spaces: Advances in content-based image retrieval," IEEE Computer 40, 20–26 (2007).
[5] Celebi, M. E., Iyatomi, H., Schaefer, G., and Stoecker, W. V., "Lesion border detection in dermoscopy images," Computerized Medical Imaging and Graphics 33(2), 148–153 (2009).
[6] Wollina, U., Burroni, M., Torricelli, R., Gilardi, S., Dell'Eva, G., Helm, C., and Bardey, W., "Digital dermoscopy in clinical practise: a three-centre analysis," Skin Research and Technology 13, 133–142 (2007).
[7] Seidenari, S., Pellacani, G., and Pepe, P., "Digital videomicroscopy improves diagnostic accuracy for melanoma," Journal of the American Academy of Dermatology 39(2), 175–181 (1998).
[8] Lee, T. K. and Claridge, E., "Predictive power of irregular border shapes for malignant melanomas," Skin Research and Technology 11(1), 1–8 (2005).
[9] Schmid-Saugeons, P., Guillod, J., and Thiran, J.-P., "Towards a computer-aided diagnosis system for pigmented skin lesions," Computerized Medical Imaging and Graphics 27, 65–78 (2003).
[10] Maglogiannis, I., Pavlopoulos, S., and Koutsouris, D., "An integrated computer supported acquisition, handling, and characterization system for pigmented skin lesions in dermatological images," IEEE Transactions on Information Technology in Biomedicine 9(1), 86–98 (2005).
[11] Celebi, M. E., Kingravi, H. A., Uddin, B., Iyatomi, H., Aslandogan, Y. A., Stoecker, W. V., and Moss, R. H., "A methodological approach to the classification of dermoscopy images," Computerized Medical Imaging and Graphics 31(6), 362–373 (2007).
[12] Celebi, M. E. and Aslandogan, Y. A., "Content-based image retrieval incorporating models of human perception," in [International Conference on Information Technology: Coding and Computing (ITCC 2004)], 2, 241–245, IEEE Computer Society, Los Alamitos, CA, USA (2004).
[13] Otsu, N., "A threshold selection method from gray-level histograms," IEEE Transactions on Systems, Man and Cybernetics 9, 62–66 (1979).
[14] Laskaris, N., Fuzzy Description of Skin Lesion Images, Master's thesis, School of Informatics, University of Edinburgh, UK (2009).
[15] Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., and Witten, I. H., "The WEKA data mining software: An update," SIGKDD Explorations 11(1), 10–18 (2009). Available at: http://www.cs.waikato.ac.nz/ml/weka/.
[16] Witten, I. H. and Frank, E., [Data Mining: Practical Machine Learning Tools and Techniques], 2nd ed., Morgan Kaufmann (2005).