A Neural Network Approach For Learning Image Similarity in Adaptive Cbir

Download as pdf or txt
Download as pdf or txt
You are on page 1of 6

A NEURAL NETWORK APPROACH FOR LEARNING IMAGE SIMILARITY IN ADAPTIVE CBIR

P. Muneesawang
School. of Elect. and Info. Engin. University of Sydney Sydney, Australia

L. G u a n
Dept. of Elect. and Comp. Engin Ryerson Polytechnic University Toronto, Canada

Abstract In this paper the adoption of neural network techniques is studied for the purpose of image retrieval. More specifically, we propose a n adaptive retrieval system which incorporates learning capability into the image retrieval module where the network weights represent the adaptivity. This system can learn users notions of similarity between images through the continual relevance feedback from the users. Accordingly it makes the proper adjustment to improve performance. This retrieval system has demonstrated its effectiveness in performance. It is conflrmed by simulations conducted for applications such as texture retrieval and retrieval of the DCT compressed images.

INTRODUCTION
In this paper the adoption of a machine-learning approach is proposed for image matching in content-based image retrieval (CBIR). This learning approach has one main advantage over the traditional retrieval approaches: it allows the retrieval system to solve the problem of fuzzy understanding of users goals that are not realized by a one-shot retrieval procedure. Specifically, we for this task in view of propose the adoption of neural network techniques [1][2] their learning capability and their ability to simulate universal mapping. These techniques allow the users to directly modify the query characteristics by specifying their desired image attributes in the form of training examples. Within this framework, the retrieval system can adjust its strategy to progressively model the notion of image similarity through continual relevance feedback from the user. In fact it uses individual user choices to decide relevance. We have adopted a Self-organizing Tree Map (SOTM) [l]and a Learning Vector Quantization (LVQ) [3] to generate prototypes in the form of local data clustering. This is used for approximating the distribution of the relevant s a ples which are associated with the training data. The resulting prototypes are then incorporated into a single-pass radial basis function network (RBFN) for ranking of an image database. In the experiments, the proposed interactive system has been applied to a compressed domain image retrieval as well as to the retrieval system that use image features extrated from uncompressed images. The comparison with other methods is provided.

0-7803-7025-2/01/$10.00 02001 IEEE.

257

Authorized licensed use limited to: NARESUAN UNIVERSITY. Downloaded on January 5, 2010 at 04:46 from IEEE Xplore. Restrictions apply.

GENERATION OF PROTOTYPES USING SOTM/LVQ ALGORITHMS


Traditional query refinement strategy [4] is implemented as an iterative search strategy which uses the information provided by the user to update the initial submited query. However, this produces a single query which only allows for global similarity evaluation, but does not adapt well to the local context defined by i;he current query. Here, we implement the query refinement strategy based on local clustering which aims at optimizing the current search. We propose the adoption of an efficient data clustering technique, the SOTM [l]algorithm to achieve this purpose. A set of relevant images is treated as training data, to create the weight vectors called prototypes which are found locally in the input space. This follows the distribution of input data associated with the relevant images. We then adopt LVQ t o modify the prototypes by negative samples (i.e., non-relevant images) to improve the clustering performance. ~ [ l Unsupervised ). Growing of a New Cluster. The SOTM [l]provides construction of unsupervised suitable feature clustering. It is more effective than the K-mean and the self organizing feature map (SOM) algorithms particulary when the input space has a high dimensionality (i.e., sparse data). Thus, SOTM is chosen in our work to locate cluster centers in the high dimensional space of image features. To carry out the prototypes using SOTM/LVQ algorithms, we are given a set of training samples corresponding to retrieved images from a previous search operation. We formally denote this training data with two sets of vectors: positive sample set (relevant images) X+ = { X I , .. ., X N } c R P ,and negative sample set (non-relevant images) X- = { x i , .. . ,xh} c RP. We then assign each input vector in X+ into an SOTM algorithm to create the weight vectors, whereas the vectors in X- are used in the LVQ algorithm for further adjustment of the weight vectors. SOTM algorithm is summarized as follows [l]: (i). Initialization. Choose the root nodes { ~ i in a random manner. input vectors {xi}gl
} i =from ~

the available set of

(ii). Similarity matching. Randomly select a new data point x , and find the best-matching (winning) neuron at time step t by using the minimumdistance Euclidean criterion:
wj* = argmin Ilx(t) - wjll,
3

j = 1 , 2 , . . . ,1

(1)

(iii). Updating. If ( ( x ( t) wj. (1 5 H ( t ) , where H ( t ) is the hierarchy function used to control the levels of the tree, then assign x ( t )to the j-th cluster, and adjust the synaptic weight vector according t o the reinforced learning rule: (2) Wj(t 1) = W j ( t ) a(t)[x(t) - Wj(t)], where a ( t )is the learning rate, which decreases monotonically with time, 0 < a(t)< 1. Else form a new subnode starting with x .

258

Authorized licensed use limited to: NARESUAN UNIVERSITY. Downloaded on January 5, 2010 at 04:46 from IEEE Xplore. Restrictions apply.

(iv). Continuation. Continue with step (ii) until no noticeable changes in the feature map are observed. The SOTM algorithm obtains a new set of cluster centers VO= { V I , .. .,V k , . . . ,V K } , where the number of centers I< is controlled by the function H ( t ) . In the experiment, H ( t ) was initialized by the norm of the training vectors in X + , and was reduced linearly. (2). Cluster Modification. At this stage, the negative samples in Xare used to tune the decision boundaries of the initial set of prototypes VO. We employ the antireinforced learning rule in the LVQ algorithm [3] to perform this operation. The algorithm starts with randomly choosing input vector xA(t) from the training set { x A } g Z l . Then, classify xA(t) to node v i if llxA(t) v i 1 1 < llx&,(t) - Vk11,Vk # i, and apply the antireinforced learning rule to the corresponding cluster centers as follows:
Vi(t

+ 1) = V i ( t ) - rl(t)[x:,(t)- V i ( t ) l ,

(3)

where ~ ( t is ) the learning constant which decreases monotonically with the number of iterations t , 0 5 q ( t ) 5 1. After several passes through the training data, the node vectors typically converge, and the modification process is complete. Note that we desire to move the prototypes slightly.

RANKING IMAGES USING RADIAL BASIS FUNCTION NETWORK (RBFN)


Having the prototypes to describe the local clusters of relevant images, the next step is to use them in a new search for retrieval. To obtain a new set of retrieved images, a process of evaluating similarity between each prototype and the input vectors corresponding to images in the database is performed . We proposed a single-pass RBF network to perform this operation using its nonlinear discrimination capability. Unlike traditional RBF network models that require iterative learning, the proposed network requires only a single pass of processing to allow rapid evaluation of image similarity in an interactive session. To carry out function approximation using RBFN, we are given a set of prototypes & = { V I ,..., v k ,...,v K } C RP from the SOTM/VQ algorithms. We then assign each input vector Vk as the center of the corresponding Gaussian kernel in the network. In other words, the RBFN uses pD Gaussian distributions to describe the shapes of data clusters indicating the relevant class. For an arbitrary input vector x , the output of the k-th RBF unit is given by

where

Uk

is a smoothing parameter defined as,

259

Authorized licensed use limited to: NARESUAN UNIVERSITY. Downloaded on January 5, 2010 at 04:46 from IEEE Xplore. Restrictions apply.

and. P = 0.5 being an overlapping factor. The estimated function output F ( x ) for x is then given as:
K

F(x) =
k=l

G k (x, v k ok)

(6)

Intuitively, the RBFN performs similarity evaluation by linearly combining the output G k . If the current input vector x (associated with an image in the database) is close to one of the RBF centers V k (i.e., the k-th prototype) in a Euclidean sense, the corresponding RBF unit G k given by Eq (4) will increase, and the estimated output F ( x ) will also increase indicating the greater similarity of the corresponding image. On the other hand, those input vectors that are far away from each of V k , k = 1 , . . .,K do not appreciably contribute to the summation due to the exponentially decaying weighting function.

EXPERIMENTS AND APPLICATIONS Texture Retrieval on the Brodatz Database


'This method was compared with the traditional relevance feedback method (RI'M) [4], using the Brodatz texture database [5]. The database containes 1856 images which were obtained from 116 different texture classes, where each class contains 16 images. Each texture image in this database is described by a 48dimensional feature vector that characterizes the coefficients after applying the Gabor wavelet (GW) transform to the image [6]. [n the simulation studies, a total of 116 images, one from each class, was selected as the query images. For each query, the top 16 images were retrieved to evaluate retrieval performance. The performance was measured in terms of the retlieval rate [6] defined as the average percentage number of images belonging to the same class as the query in the top 16 matches. 'The proposed neural network (NN) method and the relevance feedback method (RFM) have been tried. The retrieval performance is summarized in Table 1, showing very satisfactory convergence speeds and retrieval accuracies (where relevance feedback was provided upto 3 iterations). It can be seen that all learning methods demonstrated significant improvement across the tasks. The retrieval rate was improved from 73.7% (normalized Euclidean measure) to 87.6% with the application of the proposed NN method, i.e., more than 14 correct images were presented out of 16. This result shows that the NN method performs substantially better than RFM which provides a retreival performance of 78.5%. C n Table 1 the performance of the NN method was based on the SOTM/VQ algorithm with the muximum number of prototypes equal to 8 (this is controlled by the hierarchy function H ( t ) ) . After varying the number of the prototypes from 2 to 15, we observed that the retrieval performances were gradually changed in a range of 86.3% to 87.6%.

260

Authorized licensed use limited to: NARESUAN UNIVERSITY. Downloaded on January 5, 2010 at 04:46 from IEEE Xplore. Restrictions apply.

Method
NN

0 Iter.
73.71 67.10

Iter.

2 Iter.
86.53 77.76

Iter.

RFM

83.72 75.74

87.55 78.46

Table 1: Average retrieval rate (%) of 116 query images on the Brodatz database.

Application to Compressed Domain Image Retrieval


In this experiment we applied the proposed interactive method to a compressed domain image retrieval system. Specifically, the matching process is directly performed on the DCT domain to avoid the costly operation of decompression. The image database is a single uncategorized database that consists of nearly 4700 JPEG photographs [7] covering a broad range of categories. The image feature used here is based on an energy histogram of a DCT coefficient originally proposed in [8]. An energy histogram of DCT coefficients is obtained by counting the number of times a particular coefficient value appears in the 8 x 8 block. For this experiment, the energy histogram features are based on the four lower frequency DCT coefficients in the upper left corner of the DCT block. Separate energy histograms are constructed for the DC and AC coefficients of each of the color channels, and 30 bins are used for each histogram. A total of 10 query images (each was different to the other) were manually selected to ensure that the similar images by human vision were properly identified in the database. Since the number of relevant images for a particular query could not be easily identified from the database, the performance was measured by counting the number of relevant images that hit the desired target in the top 15 matches. Table 2 details the retrieval results. It can be seen that the number of relevant images increased considerably by using the learning approach. Typical retrieval sessions are shown in Figure 1, where Figures l(a) and l(b) show the 15 top-matched images before and after the application of the learning technique respectively. The improvement given by the proposed method is apparent.

Figure 1: left: (a) the retrieval results before learning similarity in answer to the query 024-172.JPG; right: (b) the retrieval resuls after learningsimilarity.

26 1

Authorized licensed use limited to: NARESUAN UNIVERSITY. Downloaded on January 5, 2010 at 04:46 from IEEE Xplore. Restrictions apply.

Query

0 Iter.

1 Iter. 14 10 5 5 11

2 Iter.
14
10

Query
046-160.JPEG 065-200.JPEG 058-143.JPEG 076-126.JPEG 040-123.JPEG

6 7 12

0 Item 4 3 8
4

1 Iter. 6 7 11 6 10

2 Iter. 6 7 10
9

Table 2: Retrieval results on the JPEG database (4678 images) using the DCT energy histogram feature. The table shows the number of relevant image in the top 15 matches.

CONCLUSION
We have proposed the adoption of neural network techniques for characterizing the behavior of human users in an interactive session where relevance feedback is applied. With the SOTM/VQ algorithms, a more effective local analysis of relevance is provided than with another better-known model: query modification approach. The network structure also includes a single-pass RBFN to perform a more accurate non-linear evaluation of image similarity for ranking purposes. Based on a simulation performance comparison, this proposed learning-based approach appears to be highly effective for retrieval application on compressed and uncompressed-domain image retrievals.

References
H.Kong and L.Guan, Self-organizing tree map for eliminating impulse noise with random intensity distributions, J . of Electronic Imaging, vo1.7/1, pp.36-44, 1998.

T. Sigitani, Y. Liguni, H. Maeda, Image interpolation for progressive transmission by using radial basis function networks, I E E E Trans. on Neural Networks, ~01.10/2,pp. 381-390, 1999.
S. Haykin, Neural Networks: a Comprehensive Foundation, Prentice Hall, NJ, 1999.

Y. Rui, T.S. Huang, and S. Mehrotra, Content-based image retrieval with relevance feedback in MARS, Proc. I E E E Int. Conf. on Image Processing, pp. 815-818, 1997.
http://vivaldi.ece.ucsb.edu/users/wei/codes/thml, 1999.

B. S. Manjunath and W. Y. Ma, Texture features for browsing and retrieval of image data, I E E E Trans. of Pattern Analysis and Machine Intelligence, vol. 18/8, pp. 837-842, 1996.
Media Graphics International, http: //www,media-graphics.net . Photo Gallery

5,000

Vol.1

CDROM,

J.A. Lay and L. Guan, Image retrieval based on energy histogram of the low frequency DCT coefficients, Proc. of the I C A S S P , pp.3009-3012, 1999.

262

Authorized licensed use limited to: NARESUAN UNIVERSITY. Downloaded on January 5, 2010 at 04:46 from IEEE Xplore. Restrictions apply.

You might also like