
Pattern Recognition Letters 27 (2006) 180–186
www.elsevier.com/locate/patrec

Detector of image orientation based on Borda Count

Alessandra Lumini, Loris Nanni *

DEIS, IEIIT—CNR, Università di Bologna, Viale Risorgimento 2, 40136 Bologna, Italy

Received 17 November 2004; received in revised form 1 August 2005; available online 10 October 2005. Communicated by G. Borgefors.

Abstract

Accurately and automatically detecting image orientation is a task of great importance in intelligent image processing. In this paper, we present an automatic image orientation detection algorithm based on low-level features: color moments, Harris corners, phase symmetry and the edge direction histogram. Support vector machines, statistical classifiers and Parzen window classifiers are used in our approach, and the Borda Count is used as the combination rule for these classifiers. Extensive experiments have been conducted on a database of more than 6000 real photos to validate our approach. Discussions and future directions for this work are addressed at the end of the paper.

© 2005 Elsevier B.V. All rights reserved.

Keywords: Image orientation detection; Combination rule; Support vector machines; Low-level features; Borda Count

1. Introduction

With advances in multimedia technologies and the advent of the Internet, more and more users are likely to create digital photo albums. Moreover, progress in digital imaging and storage technologies has made the processing and management of digital photos, captured either from photo scanners or digital cameras, essential functions of personal computers and intelligent home appliances. To input a photo into a digital album, the digitized or scanned image must be displayed in its correct orientation. However, automatic detection of image orientation is a very difficult task.
Humans identify the correct orientation of an image through contextual information or object recognition, which is difficult to achieve with present computer vision technology. However, some external information can be considered to improve the performance of an orientation detector when the acquisition source is known: a photo acquired by a digital camera is usually taken in the normal way (i.e., 0° rotation), sometimes rotated by 90° or 270°, but very seldom rotated by 180°; a set of images acquired by digitizing a film very rarely has orientations differing by more than 90° (i.e., horizontal images are all straight or all upside down), so the orientation of most of the images belonging to the same film can be used to correct classification errors.

* Corresponding author. Fax: +39 0547 338890. E-mail address: [email protected] (L. Nanni). doi:10.1016/j.patrec.2005.08.023

Since image orientation detection is a relatively new topic, the literature about it is quite sparse. In (Evano and McNeill, 1998) a simple and rapid algorithm for medical chest image orientation detection has been developed. The most related work investigating this problem was recently presented in (Poz and Tommaselli, 1998; Vailaya et al., 1999; Vailaya and Jain, 2000; Wang and Zhang, 2004). In (Wang and Zhang, 2004) the authors extract edge-based structural features and color moment features: these two sources of information are incorporated into a recognition system to provide complementary information for robust image orientation detection. Support vector machine (SVM) based classifiers are utilized.

Fig. 1. The four possible orientations of acquisition for a digital image: (a) 0°, (b) 180°, (c) 90°, (d) 270°.

In (Zhang
et al., 2002) the authors propose an automated method based on the boosting algorithm to estimate image orientation.

The combination of multiple classifiers has been shown to be suitable for improving the recognition performance in many difficult classification problems (Kittler and Roli, 2000). Recently a number of classifier combination methods, called ensemble methods, have been proposed in the field of machine learning. Given a single classifier, called the base classifier, a set of classifiers can be automatically generated by changing the training set (Breiman, 1996), the input features (Ho, 1998), the input data by injecting randomness, or the parameters and architecture of the classifiers. A summary of such methods is given in (Dietterich, 2000).

In this work, we assume that the input image is restricted to only four possible rotations that are multiples of 90° (Fig. 1). Therefore, we represent the orientation detection problem as a four-class classification problem (0°, 90°, 180°, 270°). We propose two new sets of features never used in the literature for this task and we show experimentally that they are useful in solving this problem. Moreover, we test several classifiers and show that a combination of different classifiers trained in different feature spaces achieves higher performance than a stand-alone classifier.

2. System overview

In this section a brief description is given of the feature extraction methodologies, feature transformations, classifiers and ensemble methods combined and tested in this work.

2.1. Feature extraction

Feature extraction is a process that extracts a set of features from the original image representation through some functional mapping. In this task it is important to extract "local" features sensitive to rotation, in order to distinguish among the four orientations: for example, the global histogram of an image is not a good feature because it is invariant to rotation. To overcome this problem an image is first divided into blocks, and then the features are extracted from each block. Several block decompositions have been proposed in the literature, depending on the set of features to be extracted; in this work we adopt a regular subdivision into N × N non-overlapping blocks (we empirically select N = 10) and the features are extracted from these local regions.

2.1.1. Color moments (COL)

It is shown in (Jain and Vailaya, 1996) that color moments of an image in the LUV color space are very simple yet very effective for color-based image analysis. We use the first order moments (mean color) and the second order moments (color variance) as our COL features to capture image chrominance information, so that for each block 6 COL features (3 mean and 3 variance values of the L, U, V components) are extracted. Finally, within each block the COL vector is normalized such that the sum of the squares of its components is one. The dimension of the COL feature vector is 6 × 100 = 600.

2.1.2. Edge direction histogram (EDH)

The edge-based structural features are employed in this work to capture the luminance information carried by the edge map of an image. Specifically, we utilize the edge direction histogram (EDH) to characterize structural and texture information of each block, similarly to (Jain and Vailaya, 1996). The Canny edge detector (Canny, 1986) is used to extract the edges in an image. In our experiments, we use a total of 37 bins to represent the edge direction histogram. The first 36 bins represent the count of edge points of each block with edge directions quantized at 10° intervals, and the last bin represents the count of pixels that do not contribute to an edge (which is the difference between the dimension of a block and the sum of the first 36 bins). The dimension of the EDH feature vector is 37 × 100 = 3700.

2.1.3. Harris corner histogram (HCH)

A corner is a point that can be extracted consistently over different views, such that there is enough information in its neighborhood for corresponding points to be automatically matched. The corner features are employed in this work to capture information about the presence of details in blocks by counting the number of corner points in each block (only one feature per block is needed). For corner detection we use the Harris corner detector (Harris and Stephens, 1988). The dimension of the HCH feature vector is 100.

2.1.4. Phase symmetry (PHS)

Phase symmetry is an illumination and contrast invariant measure of symmetry in an image, developed from its representation in the frequency domain. In particular, phase congruency (Kovesi, 1999) can be used as an illumination and contrast invariant measure of feature significance. This allows edges, lines and other features to be detected reliably, and fixed thresholds can be applied over wide classes of images. Points of local symmetry and asymmetry in images can be detected from the special arrangements of phase that arise at these points, and the level of symmetry/asymmetry can be characterized by invariant measures. In this work we calculate the phase symmetry image and count the number of symmetry pixels in each block (one feature per block). The dimension of the PHS feature vector is 100.

The above COL, EDH, HCH and PHS vectors are normalized within each block. In order to accommodate the scale differences over different images, all the features extracted are also normalized over the training samples to the same scale: a linear normalization procedure is performed, so that features lie in the range [0, 1]. Please note that the last two sets of features have never been used for orientation detection.
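As a concrete illustration of the block-wise extraction described in Section 2.1, the following minimal sketch computes the COL features (per-block mean and variance of the three color channels, each block vector normalized to unit norm). It is a sketch under the assumption that the image has already been converted to the Luv color space and is supplied as a NumPy array; the function name and the 10 × 10 grid follow the description above.

```python
import numpy as np

def color_moment_features(luv_img, n=10):
    """Block-wise COL features: per-block mean and variance of the three
    Luv channels, normalized within each block so that the sum of the
    squares of the 6 components is one. `luv_img` is assumed to already
    be in Luv space, with shape (H, W, 3)."""
    h, w, _ = luv_img.shape
    bh, bw = h // n, w // n          # block height and width
    feats = []
    for i in range(n):
        for j in range(n):
            block = luv_img[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw]
            v = np.concatenate([block.mean(axis=(0, 1)),   # 3 mean values
                                block.var(axis=(0, 1))])   # 3 variance values
            norm = np.linalg.norm(v)
            feats.append(v / norm if norm > 0 else v)
    return np.concatenate(feats)     # 6 features x 100 blocks = 600 values

# Example on a synthetic 200 x 200 "Luv" image:
features = color_moment_features(np.random.rand(200, 200, 3))
```

The same block loop carries over to the HCH and PHS features, which simply replace the 6 moments with a single per-block count.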
2.2. Feature transformation

Feature transformation is a process through which a new set of features is created from an existing one. We adopt the Karhunen–Loève transform (KL) to reduce the feature set to a lower dimensionality, obtaining a new reduced set of decorrelated features. This step is performed because the original dimension of the feature space is too large to make training of the classifiers feasible. Given F-dimensional data points x, the goal of KL (Duda et al., 2000) is to reduce the dimensionality of the observed vectors. This is achieved by finding k principal axes, called principal components, which are given by the eigenvectors associated with the k largest eigenvalues of the covariance matrix of the training set. In this paper the extracted features, whose original dimension depends on the feature set used, are reduced by KL to 100-dimensional vectors.

2.3. Classifiers

A classifier is a component that uses the feature vector provided by the feature extraction or transformation step to assign a pattern to a class. In this work, we test the following classifiers:

• Linear discriminant classifier (LDC) (Duda et al., 2000);
• Quadratic discriminant classifier (QDC) (Duda et al., 2000);
• Parzen window classifier (PWC) (Duda et al., 2000);
• Polynomial support vector machine (P-SVM) (Duda et al., 2000);
• Radial basis function support vector machine (R-SVM) (Duda et al., 2000).

2.4. Multiclassifier systems (MCS)

Multiclassifier systems are special cases where different approaches are combined to solve the same problem. They combine the outputs of various classifiers trained on different datasets by a decision rule (Kittler and Roli, 2000). Several decision rules can be used to determine the final class from an ensemble of classifiers; the most used are: Vote rule, Max rule, Min rule, Mean rule and Borda Count (Ho et al., 1994).
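As a minimal sketch of the Borda Count rule, the snippet below combines the class rankings produced by several classifiers; the interface is hypothetical (each classifier is assumed to emit one score per class, higher meaning more confident), but the point scheme is the standard one: M points for a first-place vote down to 1 point for a last-place vote.

```python
import numpy as np

def borda_count(score_matrix):
    """Combine classifiers by Borda Count. `score_matrix` has shape
    (n_classifiers, n_classes): each row holds one classifier's class
    scores. Each class earns M points per first-place vote, M-1 per
    second place, ..., 1 per last place; the highest total wins."""
    n_clf, m = score_matrix.shape
    points = np.zeros(m)
    for scores in score_matrix:
        order = np.argsort(scores)           # ascending: worst ... best
        for rank, cls in enumerate(order):   # worst gets 1 point, best gets M
            points[cls] += rank + 1
    return int(np.argmax(points)), points

# Three toy classifiers scoring the four orientation classes (0, 90, 180, 270):
scores = np.array([[0.7, 0.1, 0.15, 0.05],
                   [0.4, 0.3, 0.2,  0.1 ],
                   [0.2, 0.5, 0.25, 0.05]])
winner, pts = borda_count(scores)   # winner == 0, pts == [10, 9, 8, 3]
```

Note that only the per-classifier rankings matter, so any monotone rescaling of an individual classifier's scores leaves the result unchanged.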
In this work we make a comparison among these decision rules; moreover, we test a supervised method for combining different classifiers: dynamic classifier selection (DCS) (Woods et al., 1997). The best classification accuracy for the orientation detection problem was achieved, in our experiments, using Borda Count (see Table 2 in Section 3).

Borda Count is defined as a mapping from a set of individual rankings to a combined ranking leading to the most relevant decision. Each class gets 1 point for each last-place vote received, 2 points for each next-to-last place vote, and so on, all the way up to M points for each first-place vote (where M is the number of candidates/alternatives). The candidate with the largest point total wins the election. The Borda Count method is based on the assumption of additive independence among the contributing classifiers; it ignores redundant classifiers, which reinforce the errors made by others. Advantages of the Borda Count method are that it is easy to implement and does not require any training; a disadvantage is that it treats all classifiers equally and does not take into account individual classifier capabilities. In Section 3 we denote by BORDA a multiclassifier obtained by combining all the feature sets and all the classifiers (for a total of 4 "feature sets" × 5 "classifiers" = 20 "approaches"), with the final class determined by Borda Count.

2.5. Clustering

Since the dataset is made of images very different from each other, and thus can be very arduous to classify with a single classifier, we propose to cluster the images into similar groups before the classification step. This can help in specializing the classifiers on particular sub-groups of images. This idea is supported by other methods in the literature that divide the images into two classes (indoor and outdoor) in a supervised manner before deciding the orientation (Zhang et al., 2002).
However, these approaches suffer for two reasons: first, such a pre-classification is a very difficult task (the precision is about 90%); second, no appreciable advantage has been obtained by detecting the orientation separately on each class. Due to the difficulty of dividing images into indoor and outdoor, in this work we perform a non-supervised clustering in order to divide the dataset into non-semantic clusters. The results, which are encouraging, are reported in Section 3.2.

It is well known that clustering algorithms are useful for discovering complex structures in the data. We propose to partition the data into NCL clusters, using Expectation–Maximization (Duda et al., 2000), to group together similar patterns. Different classifiers are trained using the patterns that belong to each cluster. This clustering step is performed separately on each set of features. For example, while in the standard approach an LDC classifier is trained on a fixed feature set (say COL), now NCL LDC classifiers are trained on the different clusters of the COL feature set and a new pattern is classified according to its membership to a cluster (if a pattern belongs to a given cluster, we decide the orientation of that pattern according to the classifier trained on its cluster). Then we fuse the results as in the BORDA method, combining all the feature spaces and all the classifiers (for a total of 20); this method is denoted as CLUSTER in Section 3.

2.6. Rejection rule

A rejection rule is always useful in the orientation detection problem, since many images are difficult to orient correctly even for a human operator.
In the literature several rejection schemes have been adopted for this problem: simpler approaches are based on rejecting those images whose maximum a posteriori probability is less than a threshold (Vailaya et al., 1999) or those for which the classifier has a low confidence (Wang and Zhang, 2004). In (Zhang et al., 2002) the authors propose to reject more indoor than outdoor images at the same level of confidence, based on the observation that the accuracy of an orientation detector on indoor images is much lower than on outdoor images. In this work we adopt a simple rejection rule based on the evaluation of the confidence value given by the classifier; we noted an improvement in performance if the rejection is evaluated only for patterns classified as 90° and 270°. This behavior can be explained by the observation that these classes are much less discriminable (as reported in Table 4). In our BORDA multiclassifier, we use the confidence obtained by the "mean rule".

2.7. Correction rule

We implement a very simple heuristic rule that takes into account the acquisition information of an image to correct the classification response. After evaluating all the photos belonging to the same roll of film (considered as a single session of work), we count the number of photos labelled as 0° and 180° and select as the correct orientation for all those photos the one having the larger number of images, thus changing the labelling of the images assigned to the wrong class. In the experimental section the application of this correction rule is denoted by appending ROLL to the name of the considered method (BORDA-ROLL or CLUSTER-ROLL).

3. Experiments

We carried out experiments to evaluate both the features extracted and the classifiers used. In order to test our correction rule we use a dataset of images acquired by analogue cameras and digitized by film scanning. The images have been manually labeled with their correct orientation.
The dataset is composed of about 6000 images from 350 rolls of film, distributed as follows: 39.8% with correct orientation (0°), 21% with a 90° rotation, 26.4% with a 180° rotation, 12.8% with a 270° rotation. This dataset is substantially more difficult than others tested in the literature for two reasons: first, because the distribution of images among the four classes is strongly unbalanced towards classes that are difficult to detect (due to the film-scanning origin of our photos), with respect to other distributions reported in the literature (Segur, 2000); second, because the dataset is composed of 80% indoor photos, which are hard to classify. It is especially difficult to detect the orientation of indoor images (e.g., it is very hard to detect the orientation of a face) because we lack discriminative features for indoor images, while for outdoor images there is a lot of useful information that can be mapped to low-level features, such as sky, grass, buildings and water. Even though we are aware of the difficulty of orientation detection for indoor images, we do not propose an ad hoc approach, due to the intrinsic complexity of discriminating between indoor and outdoor (see Section 2.5).

The classification results are averaged over 10 tests, each time randomly resampling the test set (each test set contains 1000 images). The training set was composed of images taken from rolls of film not used in the test set, so the test set is poorly correlated with the training data. For each image of the training set, we employ four feature vectors corresponding to the four orientations (only one has to be extracted; the other three can be simply calculated).

We performed experiments to verify that all the proposed sets of features are useful for improving the performance: in Table 1 the classification results are reported for all the classifiers listed in Section 2.3.
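The shortcut of computing the feature vectors of the four orientations from a single extraction can be illustrated for block-indexed features such as HCH, where rotating the image simply permutes the 10 × 10 block grid. The sketch below rests on that assumption (and on a fixed rotation-direction convention); direction-sensitive features such as EDH would additionally need their angle bins shifted.

```python
import numpy as np

def rotated_block_features(feat, n=10):
    """Given one block-indexed feature vector (one value per block of an
    n x n grid, e.g. the HCH counts), derive the vectors corresponding to
    the 90-, 180- and 270-degree rotated images by permuting the block
    grid, with no need to re-run the detector on rotated images."""
    grid = feat.reshape(n, n)
    return {0:   feat,
            90:  np.rot90(grid, k=-1).ravel(),   # one clockwise quarter turn
            180: np.rot90(grid, k=2).ravel(),
            270: np.rot90(grid, k=1).ravel()}    # one counter-clockwise turn

# Example: the 180-degree vector is the original block sequence reversed.
variants = rotated_block_features(np.arange(100, dtype=float))
```

Each rotated vector contains exactly the same 100 block values, only reordered, which is why the paper's three extra training vectors cost essentially nothing.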
The experiments in Table 1 show that the best single classifier for this problem is the quadratic discriminant classifier (QDC) and that the accuracy of this classifier increases from 0.52 using only COL, which is the best single set of features, to 0.562 using a combination of all the features. Moreover, for almost all classifiers, the performance using a single set of features is outperformed by that obtained using a sequential combination of all the features (ALL).

Table 1
Average accuracy for the orientation detection problem using different feature spaces and classifiers

Accuracy   COL     EDH     HCH     PHS     ALL
LDC        0.435   0.455   0.506   0.493   0.479
QDC        0.52    0.496   0.488   0.501   0.562
PWC        0.487   0.495   0.502   0.489   0.516
RSVM       0.49    0.509   0.521   0.519   0.557
PSVM       0.455   0.46    0.445   0.448   0.529

The second experiment was aimed at evaluating the performance of a multiclassifier system constructed by combining all the feature spaces and all the classifiers studied in Table 1, varying the combination rule; in addition, the result of the DCS supervised method for combining classifiers is reported.

Table 2
Average accuracy for the orientation detection problem using several combination rules for a multiclassifier and the DCS supervised method

Combining rule   Accuracy
Min rule         0.542
Max rule         0.512
Mean rule        0.608
Vote rule        0.598
Borda Count      0.62
DCS              0.541

The results (Table 2) demonstrate that the Borda Count outperforms the other rules: in the following we denote by BORDA the fusion based on the Borda Count, for comparison with other state-of-the-art approaches. The third experiment was aimed at comparing the performance of the proposed BORDA method with the approaches in (Vailaya and Jain, 1999; Zhang et al., 2002).
A high discrepancy in accuracy is present among the methods proposed in the literature, which is most likely due to the fact that the datasets are different. High performance has been reported on constrained image sets, such as Corel, while lower performance has been achieved on typical consumer photos (Segur, 2000), where the number of indoor pictures is much higher, a high percentage of photos contain people taken at typical subject distance (not only portraits as in Corel), and the images contain a much higher level of background clutter. Without access to the specific datasets it is very difficult to make a comparison with other state-of-the-art methods; therefore we make a comparison at different levels of rejection with our reimplementation of the two methods, which consist of a polynomial SVM trained using COL or COL + EDH features (denoted as SVMCM and SVMCM + EDH, respectively).

From the results shown in Fig. 2, we can see that with the same training data the BORDA fusion method performs better than SVMCM and SVMCM + EDH. A further improvement is obtained by coupling our method with the correction rule described in Section 2.7 (BORDA-ROLL).

Fig. 2. Comparison (accuracy) among several state-of-the-art methods (SVMCM, SVMCM + EDH, BORDA, BORDA-ROLL) at different levels of rejection (0%, 10%, 20%, 50%).

In (Vailaya and Jain, 1999; Zhang et al., 2002) the authors propose the following solutions to increase the performance of the standard SVM classifier:

• an AdaBoost method (Vailaya and Jain, 1999) where the feature space is obtained by extending the original feature set (COL + EDH), combining any two features with the addition operation and thus obtaining a very large feature set;
• a two-layer SVM (Vailaya and Jain, 1999) (with trainable combiner);
• a rejection scheme to reject more indoor than outdoor images at the same level of confidence score (Zhang et al., 2002).
However, these solutions grant only a slight improvement in performance with respect to a standard SVM classifier, while with our fusion method the performance is considerably better than that of a stand-alone classifier. Moreover, the results obtained without rejection prove that the BORDA recognition rate (0.62) is higher than that of any single classifier (see the last column of Table 1). Finally, our correction rule has proven to be well suited to this problem, since it improves the performance of the base method.

We also analyzed the computation time of our system. In Table 3 the average time spent on the main processing steps of our method is reported. All the simulations have been carried out on a PC with a Pentium IV 2400 MHz (Matlab code, not optimized).

Table 3
Average time spent for the main processing steps

Step                       Time (s)
COL (feature extraction)   0.002
EDH (feature extraction)   0.1
HCH (feature extraction)   0.15
PHS (feature extraction)   0.25
LDC (classification)       0.00047
QDC (classification)       0.00047
PWC (classification)       0.0002
RSVM (classification)      0.00001
PSVM (classification)      0.00001
BORDA                      0.50316
SVMCM + EDH                0.10021

Recently a semantic method has been proposed in the literature that approaches the image orientation detection problem via confidence-based integration of low-level and semantic cues within a Bayesian framework. The authors show that by integrating low-level features (color moments + edge histograms) with semantic cues (face, grass, sky and wall detection) the performance increases by 10% on a consumer photo dataset (Luo and Boutell, 2005). The semantic feature extraction and orientation prediction need about 1 s per photo using optimized code (as reported by the authors), more than the time spent on the low-level features. In this work we do not deal with semantic concepts; however, our new low-level features can be integrated with the semantic cues to further improve the performance.

3.1. Error analysis

In Table 4 the confusion matrix of the BORDA approach is reported: it reveals that the most difficult classes are the "vertical" ones (90° and 270°), which have a total accuracy of 0.485 vs. 0.69 for the "horizontal" classes. Moreover, most of the errors (57%) are generated by confusion between a class and its opposite in the direction sense (0°–180°, 90°–270°).

Table 4
Confusion matrix obtained using BORDA on a test set of 1000 images (rows: true orientation; columns: assigned orientation)

        0°    90°   180°  270°
0°      265   19    101   13
90°     24    115   27    44
180°    45    18    192   9
270°    18    29    32    49

We manually classified 200 images as indoor/outdoor in order to evaluate the performance on these two classes of images: the results reported in Table 5 confirm the difficulty of managing indoor images, and show a good improvement in the performance of the BORDA method on indoor images even without any ad hoc rule.

Table 5
Average accuracy on indoor and outdoor images and total variance of several methods

Method             Accuracy (indoor)   Accuracy (outdoor)   Variance
SVMCM              0.37                0.82                 31
SVMCM + EDH        0.45                0.85                 35
BORDA              0.55                0.90                 26.22
CLUSTER (NCL = 2)  0.56                0.90                 18

In Fig. 3 some images taken from our dataset are shown: (a) images which have been erroneously classified (whose orientation has not been correctly detected) by SVMCM + EDH and correctly classified by BORDA, (b) images erroneously classified by both methods, and (c) images rejected by our method (rejection 20%). The faces in this paper have been blurred for privacy.

Fig. 3. Some images from our dataset which appear to be difficult to classify.

3.2. Clustering evaluation

In Table 6 the accuracy of the CLUSTER and CLUSTER-ROLL approaches is reported: the performance is quite similar to that of BORDA/BORDA-ROLL, with the advantage of a lower variance of the accuracy (reported in Table 5). Another interesting result is that the clusters obtained by EM are balanced with respect to the four orientation classes and to the indoor/outdoor classes. This means that the feature spaces generated by the considered low-level features are not particularly discriminant for these two classification problems.

Table 6
Average accuracy of the CLUSTER method by varying the number of clusters (NCL) at different levels of rejection. The method coincides with BORDA if NCL = 1

                         Rejection
Method                   0%      10%     20%     50%
NCL = 1  BORDA           0.62    0.64    0.67    0.74
         BORDA-ROLL      0.738   0.78    0.82    0.88
NCL = 2  CLUSTER         0.63    0.64    0.66    0.75
         CLUSTER-ROLL    0.75    0.79    0.82    0.92
NCL = 3  CLUSTER         0.61    0.64    0.66    0.73
         CLUSTER-ROLL    0.71    0.76    0.78    0.86

4. Conclusions

We have proposed an automatic approach for content-based image orientation detection. Extensive experiments on a database of more than 6000 images were conducted to evaluate the system. The experimental results, obtained on a dataset of "difficult" images very similar to real applications, show that our approach outperforms RSVM and other "stand-alone" classifiers. However, general image orientation detection is still a challenging problem. It is especially difficult to detect the orientation of indoor images because we lack discriminative features for them. Due to the difficulty of considering the indoor and outdoor images separately, we have proposed a non-supervised clustering in order to specialize the detection on these non-semantic clusters. The first results in this sense are encouraging and as future work we plan to develop this idea.

References

Breiman, L., 1996. Bagging predictors. Machine Learning 24 (2), 123–140.
Canny, J.F., 1986. A computational approach to edge detection. IEEE Trans. Pattern Anal. Machine Intell. 8 (6), 679–698.
Dietterich, T.G., 2000. Ensemble methods in machine learning. In: Kittler, J., Roli, F. (Eds.), First Internat. Workshop on Multiple Classifier Systems. Springer, Cagliari, Italy, pp. 1–15.
Duda, R.O., Hart, P.E., Stork, D.G., 2000. Pattern Classification, second ed. Wiley.
Evano, M.G., McNeill, K.M., 1998. Computer recognition of chest image orientation. In: Proc. Eleventh IEEE Symp. on Computer-Based Medical Systems, pp. 275–279.
Harris, C., Stephens, M., 1988. A combined corner and edge detector. In: Fourth Alvey Vision Conf., pp. 147–151.
Ho, T.K., 1998. The random subspace method for constructing decision forests. IEEE Trans. Pattern Anal. Machine Intell. 20 (8), 832–844.
Ho, T.H., Hull, J.J., Srihari, S.N., 1994. Decision combination in multiple classifier systems. IEEE Trans. Pattern Anal. Machine Intell. 16 (1), 66–75.
Jain, A.K., Vailaya, A., 1996. Image retrieval using color and shape. Pattern Recognition 29, 1233–1244.
Kittler, J., Roli, F. (Eds.), 2000. First Internat. Workshop on Multiple Classifier Systems. Springer, Cagliari, Italy.
Kovesi, P., 1999. Image features from phase congruency. Videre: A J. Comput. Vision Research 1 (3).
Luo, J., Boutell, M., 2005. Automatic image orientation detection via confidence-based integration of low-level and semantic cues. IEEE Trans. PAMI 27 (4), 715–726.
Poz, A.P.D., Tommaselli, A.M.G., 1998. Automatic absolute orientation of scanned aerial photographs. In: Proc. Internat. Symp. on Computer Graphics, Image Processing, and Vision, pp. 295–302.
Segur, R., 2000. Using photographic space to improve the evaluation of consumer cameras. In: Proc. IS&T Image Processing, Image Quality, Image Capture and Systems (PICS) Conf., pp. 221–224.
Vailaya, A., Jain, A.K., 1999. Incremental learning for Bayesian classification of images. In: IEEE Internat. Conf. on Image Processing, pp. 585–589.
Vailaya, A., Jain, A.K., 2000. Rejection option for VQ-based Bayesian classification. In: Proc. Fifteenth Internat. Conf. on Pattern Recognition, pp. 48–51.
Vailaya, A., Zhang, H., Jain, A.K., 1999. Automatic image orientation detection. In: Proc. Sixth IEEE Internat. Conf. on Image Processing, vol. 2, pp. 600–604.
Wang, Y.M., Zhang, H., 2004. Detecting image orientation based on low-level visual content. Computer Vision and Image Understanding 93, 328–346.
Woods, K., Kegelmeyer, W.P., Bowyer, K.W., 1997. Combination of multiple classifiers using local accuracy estimates. IEEE Trans. PAMI 19 (4), 405–410.
Zhang, L., Li, M., Zhang, H.-J., 2002. Boosting image orientation detection with indoor vs. outdoor classification. In: Sixth IEEE Workshop on Applications of Computer Vision.