
Content-based image visualization

2000 IEEE Conference on Information Visualization. An International Conference on Computer Visualization and Graphics

The proliferation of content-based image retrieval techniques has highlighted the need to understand the relationship between image clustering based on low-level image features and image clustering made by human users. In conventional image retrieval systems, images are typically characterized by a range of features such as color, texture, and shape. However, little is known about the extent to which these low-level features can be effectively combined with information visualization techniques so that users may explore images in a digital library according to visual similarities. In this article, we compare and analyze a number of Pathfinder networks of images generated from such features. Salient structures of images are visualized according to features extracted from color, texture, and shape orientation. Implications for visualizing and constructing hypermedia systems are discussed.

Chaomei Chen¹, George Gagaudakis², Paul Rosin²

¹Department of Information Systems and Computing, Brunel University, Uxbridge, UB8 3PH, UK
²Department of Computer Science, Cardiff University, Newport Road, Cardiff, Wales, CF24 3XF, UK
E-mail: [email protected], {george.gagaudakis, paul.rosin}@cs.cf.ac.uk

0-7695-0743-3/00 $10.00 © 2000 IEEE

1. Introduction

Content-based image retrieval (CBIR) has been a highly active field of research [1, 2]. A number of widely known image retrieval systems have been developed over the last few years, notably IBM's QBIC [3], Photobook [4], ImageRover [5], and WebSeek [6]. In these systems, images are typically characterised by attributes known as features, ranging from simple, low-level ones such as color and texture, to more complex, relatively higher-level ones such as shape and other semantically rich query classes. Ultimately, feature-extraction techniques, combined with other techniques, are expected to narrow the gap between relatively primitive features extracted from images and high-level, semantically rich perceptions by humans, so that users will be able to find the right images more easily and intuitively.

The advances of information visualization and data mining techniques now allow users to explore an information space organized through a variety of metaphors, such as an information landscape or an information galaxy [7, 8]. Many of these visualizations are based on interrelationships derived from textual information, typically using classic information retrieval models such as the vector space model [9], Latent Semantic Indexing (LSI) [10], or other variants. There has been a steadily increasing interest in a variety of layout and visualization techniques that tend to place similar objects near each other and dissimilar objects far apart in the visualization space.

The work described in this article extends our earlier work in structuring and analyzing the design of various information visualization displays. We have gathered computer-generated images of a variety of information visualizations [11]. In particular, we have visualized image networks based on similarity measures produced by IBM's QBIC system [12], including color, layout, and texture. Researchers and practitioners in information visualization often need to find an optimal way to arrange various visualization images so that design patterns and trends become apparent. Ideally, images of similar layouts, spatial properties, or overall shapes should be closely grouped together. Users should be able to explore and compare images within such structures.

Generalised Similarity Analysis (GSA) is a generic framework developed for structuring and visualizing information spaces [13, 14]. Applications of GSA include visualization of university websites, online conference proceedings, and journals in digital libraries according to a variety of similarity measures, such as term frequencies, hypertext reference links, author co-citation profiles, and browsing trails of users. A key element in GSA is the use of the Pathfinder network scaling technique to extract the most salient links and eliminate redundant or counter-intuitive links [15]. Pathfinder has some desirable features over techniques such as multidimensional scaling (MDS); for example, Pathfinder networks present a more accurate local structure.

In this article, our aim is to explore a synergy between Pathfinder network scaling and CBIR techniques to enable users to explore a collection of images according to their content similarities. The rest of this article is organised as follows. First, the feature-extraction techniques to be used are introduced in more detail. Second, a brief history of using Pathfinder networks in information visualization is provided to form a wider context. Then, search results are included to illustrate the effects of four feature-extraction techniques. Subsequently, the derived Pathfinder networks are examined and discussed. Finally, implications of the synergy for visualizing and constructing hypermedia systems are discussed.

2. Content-Based Retrieval

The key issue in CBIR is how to match two images according to computationally extracted features. Typically, the content of an image can be characterised by a variety of visual properties known as features. It is common to compare images by colour, texture, and shape, although these entail different levels of computational complexity. Colour histograms are much easier to compute than shape-oriented features.

Most content-based image retrieval techniques fall into two categories: manual and computational [2]. In manual approaches, a human expert may identify and annotate the essence of an image for storage and retrieval. For example, radiologists often work on medical images marked and filed manually with a high degree of accuracy and reliability.

Figure 1. Manually clustered 279 computer-generated images.

Computational approaches, on the other hand, typically rely on feature-extraction and pattern-recognition algorithms to match two images. Feature-extraction algorithms commonly match images according to the following attributes, also known as query classes:

• color
• texture
• shape
• spatial constraints

A robust CBIR technique should support a combination of these query classes. Ideally, users should be able to use high-level and semantically rich image query classes, such as human facial expressions, in their image retrieval. However, the reliability of today's feature-extraction techniques has yet to reach such a level. This is partially why simpler, relatively low-level feature-extraction techniques are still being widely used and continuously developed. The four feature-extraction algorithms used in our study are explained as follows.

2.1 Colour

Swain & Ballard [16] matched images based solely on their colour. The distribution of colour was represented by colour histograms, which formed the images' feature vectors. The similarity between a pair of images was then calculated using a similarity measure between their histograms called the normalised histogram intersection. This approach became very popular due to the following advantages:

• Robustness
• Effectiveness
• Implementation simplicity
• Computational simplicity
• Low storage requirements

Departing from the original proposal in favour of a more compact colour representation, we used the 11 colour labels obtained by the anthropological study of Berlin and Kay on colour terms in 100 different languages [17].

2.2 Texture

A common extension to colour-based feature extraction is to add textural information. There are many texture analysis methods available, and these can be applied either to perform segmentation of the image, or to extract texture properties from segmented regions or the whole image. In a similar vein to colour-based feature extraction, we modified the standard cooccurrence method in order to produce texture histograms with an additional degree of rotation invariance. The modified method, called the circular cooccurrence matrix, is described in [18].

In general, texture-based feature extraction tends to provide more spatial information than color histograms. In order to find out more about the content of an image, one may consider features associated with shapes. For example, the presence of edges, edge orientation, and edge distance may lead to a more accurate match of images.

2.3 Shape

Shape extraction remains a challenge to feature-oriented approaches. Several methods have been developed for detecting shapes indirectly. Whereas it tends to be extremely difficult to perform semantically meaningful segmentation, many reasonably reliable algorithms for low-level feature extraction have been developed. These will be used to provide the spatial information that is lacking in colour histograms. Rather than attempt to measure shape directly, we calculate some simpler properties that are indirectly related to shape and avoid the requirement for good segmentation, providing a more practical solution.

Edge Orientation. Previous work in this area can be found in Jain and Vailaya [19]. They combined edge orientation histograms with colour histograms. These edge orientation histograms encode some aspects of shape information. As a result, image retrieval can be more responsive to the shape content of the images.
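Both the colour histograms of Section 2.1 and the edge orientation histograms above are compared bin by bin. As a minimal sketch, assuming the normalised histogram intersection of [16] is the sum of bin-wise minima divided by the total count of the model histogram, the comparison might look as follows in Python. The function name and the 11-bin example counts are hypothetical and are not taken from the study.

```python
from typing import Sequence


def normalised_histogram_intersection(query: Sequence[float],
                                      model: Sequence[float]) -> float:
    """Sum of bin-wise minima, divided by the total count of the model histogram.

    Returns a value in [0, 1] when the query total does not exceed the model total;
    1.0 means the model histogram is fully covered by the query histogram.
    """
    if len(query) != len(model):
        raise ValueError("histograms must have the same number of bins")
    model_total = sum(model)
    if model_total == 0:
        return 0.0  # assumption: an empty model histogram matches nothing
    overlap = sum(min(q, m) for q, m in zip(query, model))
    return overlap / model_total


# Hypothetical pixel counts over 11 colour labels (illustrative values only).
image_a = [40, 10, 0, 0, 5, 0, 20, 0, 0, 15, 10]
image_b = [35, 0, 5, 0, 10, 0, 25, 0, 0, 15, 10]
similarity = normalised_histogram_intersection(image_a, image_b)
```

Because the measure divides by the model total only, it is symmetric only when both histograms have the same total count, for example after normalising each image to the same number of pixels.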
Standard edge detection (e.g. Canny's algorithm [20]) is sufficient for shape-oriented feature extraction. In addition, minor errors in the edge map have little effect on the edge orientation histograms. Unlike colour histograms, the orientation histograms are not rotationally invariant. Therefore the histogram matching process has to iteratively shift the histogram to find the best match. A more important consideration is that the edge maps were thresholded by some unspecified means. For robustness an adaptive thresholding scheme should be used [21]. However, an alternative is to include all the edges and weight their contribution to the histogram by their magnitudes so as to reduce the contribution from spurious edges. This is the approach we take in the reported experiments.

Multi-resolution Salience Distance Transform. Another approach to including shape information is based on the distance transform (DT). The DT is a method for taking a binary image of feature and non-feature pixels and calculating at every pixel in the image the distance to the closest feature. Although this is a potentially expensive operation, efficient algorithms have been developed that require only two passes through the image [22]. To improve the stability of the distance transform, Rosin and West [23] developed an algorithm called the salience distance transform (SDT). In the SDT, the distances are weighted by the salience of the edge, rather than propagating out Euclidean (or quasi-Euclidean) distances from edges. Various forms of salience have been demonstrated, incorporating features such as edge magnitude, curve length, and local curvature. The effect of including salience was to downplay the effect of spurious edges by soft assignment while avoiding the sensitivity problems of thresholding.

Segmentation by Thresholding. Partitioning-based approaches as in [24] have been used to improve the performance of CBIR systems. To avoid both the selection of rigid regions and true segmentation, we used binary thresholding as a tool for partitioning. The partitioning injects spatial information into the analysis so that standard (i.e. non-spatial) feature-based methods can then be applied within each region. However, small changes in the threshold value may cause relatively large changes in the resulting binary images. In order to overcome this potential drawback, we applied a soft threshold as introduced in [18] to generate similarity measures for the work reported in this article.

3. Pathfinder Networks

Pathfinder network scaling is a structural modelling technique originally developed for the analysis of proximity data in psychology [15]. We have adapted this modelling technique to simplify and visualise the strongest interrelationships in proximity data. The resultant networks are called Pathfinder networks (PFNETs). The key to Pathfinder is the so-called triangle inequality condition, which can be used to eliminate redundant or counter-intuitive links. Pathfinder network scaling refers particularly to this pruning process.

The topology of a PFNET is determined by two parameters, r and q, and the resultant Pathfinder network is denoted as PFNET(r, q). The weight of a path is defined based on the Minkowski metric with the r-parameter. The q-parameter specifies that the triangle inequality must be maintained against all the alternative paths with up to q links connecting nodes n_1 and n_k:

$$ w_{n_1 n_k} \le \left( \sum_{i=1}^{k-1} w_{n_i n_{i+1}}^{\,r} \right)^{1/r} \qquad \forall\, k = 2, 3, \ldots, q $$

The least number of links can be achieved by imposing the triangle inequality condition throughout the entire network (q = N − 1). In such networks, each path is a minimum-cost path.

Pathfinder network scaling is a central component of the GSA framework. GSA provides a flexible platform for us to experiment with a variety of structures, such as the vector-space model, LSI, and author co-citation networks [25].

3.1 Image Database

In this article, we use a collection of 279 information visualization images.
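The pruning rule of Section 3 can be made concrete for the limiting case r = ∞ and q = N − 1, where the Minkowski weight of a path reduces to its largest link weight. The following Python sketch is only an illustration under stated assumptions: the input is a symmetric matrix of dissimilarities (smaller means more similar), ties are kept, and the function name is hypothetical. It is not the implementation used in GSA.

```python
from typing import List, Set, Tuple


def pathfinder_links_rinf(dist: List[List[float]]) -> Set[Tuple[int, int]]:
    """Links kept by PFNET(r = infinity, q = N - 1) for a symmetric dissimilarity matrix.

    With r = infinity the weight of a path is the maximum link weight along it.
    A link (i, j) survives only if no alternative path has a smaller weight.
    """
    n = len(dist)
    # minimax[i][j] = smallest achievable "largest link weight" over all paths i -> j
    minimax = [row[:] for row in dist]
    for k in range(n):          # Floyd-Warshall over the (min, max) semiring
        for i in range(n):
            for j in range(n):
                via_k = max(minimax[i][k], minimax[k][j])
                if via_k < minimax[i][j]:
                    minimax[i][j] = via_k
    # Keep a direct link when it is no heavier than the best alternative path.
    return {(i, j)
            for i in range(n)
            for j in range(i + 1, n)
            if dist[i][j] <= minimax[i][j]}


# Three images: the direct link 0-2 (weight 3) is heavier than the path 0-1-2,
# whose weight under r = infinity is max(1, 2) = 2, so the link 0-2 is pruned.
example = [
    [0.0, 1.0, 3.0],
    [1.0, 0.0, 2.0],
    [3.0, 2.0, 0.0],
]
print(sorted(pathfinder_links_rinf(example)))  # prints [(0, 1), (1, 2)]
```

The triple loop costs O(N³) time, which is manageable for a collection of a few hundred images. Similarity scores would first have to be converted to dissimilarities, a step this sketch leaves out.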
A considerable number of these images are computer-generated graphics included in [11]. We apply the Pathfinder network scaling technique to image similarity data computed from color labels, texture, shape orientation, and a combination of feature classes. All the Pathfinder networks described in this article are minimum-cost networks, i.e. PFNETs (r = ∞, q = N − 1). These Pathfinder networks are rendered as virtual reality models in VRML (Virtual Reality Modeling Language) for examination and evaluation.

4. Pathfinder networks of images

Five Pathfinder networks of images were generated based on similarity data derived from color labels, color with spatial injection through soft thresholding, texture, shape orientation, and the combined similarity scheme. In this article, we expanded the QBIC-derived similarity measures reported in [12] to include relatively higher-level features such as shape orientation. We expected that images with similar structures and appearances would be grouped together in Pathfinder networks.

Figure 2 shows a screenshot of an image visualization based on a combination of color labels, texture histogram, and shape orientation. The layout reveals 7 apparent clusters. Images within each cluster appear to be homogeneous, except in the largest cluster, in which the color patterns of images appear to be mixed.

Figure 2. A Pathfinder network of the same 279 images generated from a combination of color labels, texture, and shape histograms.

Figure 3 includes 6 sub-figures corresponding to 6 different clustering schemes, namely, manual, combined, color labels, color labels with spatial injection through soft thresholding, texture, and shape orientation. The combined scheme generated the best result, whereas the shape orientation scheme did not reveal any clear sub-structures. Pathfinder network scaling on the shape orientation scheme alone was not as effective as with the combined scheme. Color labels with spatial injection appeared to generate a slightly better clustering pattern than the pure color label solution, in terms of the number of clusters and the homogeneity of clusters.

Figure 3. Pathfinder networks of the same 279 images by automatically extracted features.

The Pathfinder network corresponding to the texture-based feature-extraction scheme consists of three huge clusters. A possible explanation is that most of these images are generated by computer; therefore, they may share texture patterns to a considerable extent.

In order to understand more about the nature of the clustering patterns in these Pathfinder networks, we compared the network structures corresponding to the 5 grouping schemes used. The results are summarized in Table 1. Given that all the networks consist of the same set of images, the focus of the comparison was on the number of links in common between a pair of network structures. The assumption is that if two networks have more than their share of links in common, then this commonality indicates that the two structures together reveal some valuable information. Conversely, if two networks have only as many links in common as would be expected by chance, then it is unlikely that these networks contain any valuable information.

Apart from the manual scheme, the pure color label scheme generated the largest number of links: 338. The shape orientation scheme generated the fewest: 227. It is particularly interesting to note that the scheme of color labels with spatial injection through soft thresholding has the highest overlap rate with the manual scheme, in terms of the information measure (16.074). This measuring scheme should be further investigated in future studies.
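The comparison of two networks by their common links can be sketched as follows. The article does not spell out the formulas behind the measures in Table 1, so everything here is an assumption made for illustration: similarity is taken as shared links over distinct links, the chance model is a hypergeometric draw of links from all possible node pairs, and information is the negative base-2 logarithm of the tail probability. All names are hypothetical.

```python
from math import comb, log2
from typing import FrozenSet, Iterable, Set, Tuple


def as_links(pairs: Iterable[Tuple[int, int]]) -> Set[FrozenSet[int]]:
    """Treat links as undirected, so (a, b) and (b, a) are the same link."""
    return {frozenset(p) for p in pairs}


def compare_networks(a: Iterable[Tuple[int, int]],
                     b: Iterable[Tuple[int, int]],
                     n_nodes: int) -> Tuple[int, float, float, float]:
    """Return (common links, similarity, chance probability, information)."""
    links_a, links_b = as_links(a), as_links(b)
    common = len(links_a & links_b)
    distinct = len(links_a | links_b)
    similarity = common / distinct if distinct else 1.0

    # Assumed chance model: network b draws len(links_b) links at random from
    # all possible pairs; probability of sharing at least `common` with a.
    possible = comb(n_nodes, 2)
    size_a, size_b = len(links_a), len(links_b)
    favourable = sum(comb(size_a, c) * comb(possible - size_a, size_b - c)
                     for c in range(common, min(size_a, size_b) + 1))
    probability = favourable / comb(possible, size_b)
    information = -log2(probability) if probability > 0 else float("inf")
    return common, similarity, probability, information


# Two small hypothetical networks on four nodes sharing one link.
net_a = [(0, 1), (1, 2)]
net_b = [(1, 0), (2, 3)]
common, similarity, probability, information = compare_networks(net_a, net_b, 4)
```

Under this reading, a pair of networks that shares far more links than chance predicts yields a small probability and therefore a large information value.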
5. Discussion and Conclusion

In the long run, visualizing image clusters based on feature-extraction mechanisms remains a challenging field of research. Unlike text-based information visualization, visualizing interrelationships among images has a unique advantage. Because humans can easily recognise visual patterns, it should be easier for users to detect discrepancies in a network of images than in a network of abstract concepts in text.

We have seen the results of applying the Pathfinder network scaling technique to various feature-extraction-based image matching schemes. Incorporating shape-oriented feature-extraction algorithms appears to have improved the quality of image matching when combined with other schemes. We also found that adding spatial injection to the color label scheme yielded the highest overlap rate in terms of network similarity.

Table 1. The similarity of network structures. (The table body is not recoverable in this copy. For pairs among the manual, color, color layout, texture, and shape networks of the 279 images, it reported common links, similarity point, probability, and information.)

Compared to computational feature-extraction algorithms, human users may employ a much wider range of criteria to judge, compensate, or differentiate the similarity between two images.
The integration of Pathfinder networks and some of the most commonly used feature-extraction schemes as presented in this article is only the first step towards the development of a comprehensive framework for visualizing and exploring hypermedia networks. Information visualization and feature-extraction techniques have great potential to benefit from each other. Clustering images has a wide range of potential applications, for example, data mining in remote sensing images and image retrieval from film and video archives.

Most images in our sample are more likely to be different than similar. Such discreteness may obscure some otherwise obvious patterns in image groupings. We are now considering applying this methodology to a sample of images with more continuous scenes, for example, video segments, so that we will be able to keep track of the impact of various feature-extraction techniques more closely.

Future work should address an optimal integration of feature-extraction techniques and other image indexing methods, especially meta-data approaches. The integration of CBIR techniques and existing techniques in GSA so far provides additional tools for designers to organise images based on a variety of features for retrieval and browsing. Image indexing techniques described in this article have the potential to use generic visualization techniques to generate overviews of content-based image networks. Visualizations based on such content-based image indexing mechanisms may lead to more insights into emerging trends in information visualization.

6. Acknowledgements

This study was in part supported by the British research council EPSRC (GR/L61088 and GR/L94628).

7. References

[1] M. De Marsico, L. Cinque, and S. Levialdi, "Indexing pictorial documents by their content: A survey of current techniques," Image and Vision Computing, vol. 15, pp. 119-141, 1997.

[2] V. Gudivada and V. Raghavan, "Content-based image retrieval systems," IEEE Computer, vol. 28, pp. 18-22, 1995.

[3] M. Flickner, H. Sawhney, W. Niblack, J. Ashley, Q. Huang, B. Dom, M. Gorkani, J. Hafner, D. Lee, D. Petkovic, D. Steele, and P. Yanker, "Query by image and video content: The QBIC system," IEEE Computer, vol. 28, pp. 23-32, 1995.

[4] A. Pentland, R. W. Picard, and S. Sclaroff, "Photobook: Tools for content-based manipulation of image databases," Proceedings of SPIE Conference on Storage and Retrieval of Image and Video Databases II, San Jose, CA, 1994.

[5] S. Sclaroff, L. Taycher, and M. LaCascia, "ImageRover: A content-based image browser for the World Wide Web," Proceedings of IEEE Content-Based Access of Image and Video Libraries, 1997.

[6] J. R. Smith and S.-F. Chang, "Searching for images and video on the World Wide Web," Multimedia Systems, vol. 3, pp. 3-14, 1995.

[7] J. A. Wise Jr., J. J. Thomas, K. Pennock, D. Lantrip, M. Pottier, A. Schur, and V. Crow, "Visualizing the non-visual: Spatial analysis and interaction with information from text documents," Proceedings of IEEE Symposium on Information Visualization '95, Atlanta, Georgia, USA, 1995.

[8] H. Small, "Update on science mapping: Creating large document spaces," Scientometrics, vol. 38, pp. 275-293, 1997.

[9] G. Salton, J. Allan, and C. Buckley, "Automatic structuring and retrieval of large text files," Communications of the ACM, vol. 37, pp. 97-108, 1994.

[10] S. Deerwester, S. T. Dumais, T. K. Landauer, G. W. Furnas, and R. A. Harshman, "Indexing by Latent Semantic Analysis," Journal of the American Society for Information Science, vol. 41, pp. 391-407, 1990.

[11] C. Chen, Information Visualisation and Virtual Environments. London: Springer-Verlag London, 1999.

[12] C. Chen, G. Gagaudakis, and P. Rosin, "Similarity-based image browsing," Proceedings of the 16th IFIP World Computer Congress, International Conference on Intelligent Information Processing, Beijing, China, 2000.

[13] C. Chen, "Generalised Similarity Analysis and Pathfinder Network Scaling," Interacting with Computers, vol. 10, pp. 107-128, 1998.

[14] C. Chen and L. Carr, "Trailblazing the literature of hypertext: Author co-citation analysis (1989-1998)," Proceedings of the 10th ACM Conference on Hypertext (Hypertext '99), Darmstadt, Germany, 1999.

[15] R. W. Schvaneveldt, F. T. Durso, and D. W. Dearholt, "Network structures in proximity data," in The Psychology of Learning and Motivation, 24, G. Bower, Ed.: Academic Press, 1989, pp. 249-284.

[16] M. J. Swain and D. H. Ballard, "Color indexing," International Journal of Computer Vision, vol. 7, pp. 11-32, 1991.

[17] B. Berlin and P. Kay, Basic Colour Terms: Their Universality and Evolution: University of California Press, 1969.

[18] G. Gagaudakis and P. Rosin, "Incorporating shape into histograms for CBIR," Pattern Recognition, to appear.

[19] A. K. Jain and A. Vailaya, "Image retrieval using color and shape," Pattern Recognition, vol. 29, pp. 1233-1244, 1996.

[20] J. Canny, "A computational approach to edge detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 8, pp. 679-698, 1986.

[21] P. L. Rosin, "Edges: Saliency measures and automatic thresholding," Machine Vision and Applications, vol. 9, pp. 139-159, 1997.

[22] G. Borgefors, "Distance transformations in digital images," Computer Vision, Graphics, and Image Processing, vol. 34, pp. 344-371, 1986.

[23] P. L. Rosin and G. A. W. West, "Salience distance transform," Graphical Models and Image Processing, vol. 57, pp. 483-521, 1995.

[24] M. Stricker and A. Dimai, "Spectral covariance and fuzzy regions for image indexing," Machine Vision and Applications, vol. 10, pp. 66-73, 1997.

[25] C. Chen, "Visualizing semantic spaces and author co-citation networks in digital libraries," Information Processing and Management, vol. 35, pp. 401-420, 1999.