Pattern Recognition Letters 27 (2006) 180–186
www.elsevier.com/locate/patrec
Detector of image orientation based on Borda Count
Alessandra Lumini, Loris Nanni
DEIS, IEIIT—CNR, Università di Bologna, Viale Risorgimento 2, 40136 Bologna, Italy
Received 17 November 2004; received in revised form 1 August 2005
Available online 10 October 2005
Communicated by G. Borgefors
Abstract
Accurately and automatically detecting image orientation is a task of great importance in intelligent image processing. In this paper, we present an automatic image orientation detection algorithm based on low-level features: color moments, Harris corners, phase symmetry, and the edge direction histogram. Support vector machines, statistical classifiers, and Parzen window classifiers are used in our approach, and Borda Count is used as the combination rule for these classifiers. A large number of experiments has been conducted, on a database of more than 6000 images of real photos, to validate our approach. Discussions and future directions for this work are also addressed at the end of the paper.
© 2005 Elsevier B.V. All rights reserved.
Keywords: Image orientation detection; Combination rule; Support vector machines; Low-level features; Borda Count
1. Introduction
With advances in multimedia technologies and the advent of the Internet, more and more users are likely to create digital photo albums. Moreover, progress in digital imaging and storage technologies has made the processing and management of digital photos, captured either from photo scanners or digital cameras, essential functions of personal computers and intelligent home appliances. To input a photo into a digital album, the digitized or scanned image is required to be displayed in its correct orientation. However, automatic detection of image orientation is a very difficult task. Humans identify the correct orientation of an image through contextual information or object recognition, which is difficult to achieve with present computer vision technology.
However, there is some external information that can be considered to improve the performance of a detector
of image orientation when the acquisition source is known: a photo acquired by a digital camera is often taken in the normal way (i.e., 0° rotation), sometimes rotated by 90° or 270°, but very seldom rotated by 180°; a set of images acquired by digitizing a film very unlikely has orientations differing by more than 90° (i.e., horizontal images are all straight or all upside down), therefore the orientation of most of the images belonging to the same film can be used to correct classification errors.
Since image orientation detection is a relatively new topic, the literature about it is quite sparse. In (Evano and McNeill, 1998) a simple and rapid algorithm for medical chest image orientation detection was developed. The work most closely related to this problem was recently presented in (Poz and Tommaselli, 1998; Vailaya et al., 1999; Vailaya and Jain, 2000; Wang and Zhang, 2004). In (Wang and Zhang, 2004) the authors extract edge-based structural features and color moment features: these two sources of information are incorporated into a recognition system to provide complementary information for robust image orientation detection. Support vector machine (SVM) based classifiers are utilized.
Fig. 1. The four possible orientations of acquisition for a digital image: (a) 0°, (b) 180°, (c) 90°, (d) 270°.
In (Zhang et al., 2002) the authors propose an automated method based on the boosting algorithm to estimate image orientation.
The combination of multiple classifiers was shown to be
suitable for improving the recognition performance in
many difficult classification problems (Kittler and Roli,
2000). Recently a number of classifier combination methods, called ensemble methods, have been proposed in the
field of machine learning. Given a single classifier, called
the base classifier, a set of classifiers can be automatically
generated by changing the training set (Breiman, 1996),
the input features (Ho, 1998), the input data by injecting
randomness, or the parameters and architecture of the classifiers. A summary of such methods is given in (Dietterich,
2000).
In this work, we assume that the input image is restricted to only four possible rotations that are multiples of 90° (Fig. 1). Therefore, we represent the orientation detection problem as a four-class classification problem (0°, 90°, 180°, 270°). We propose two new sets of features never used in the literature for this task and we prove experimentally that they are useful in solving this problem. Moreover, we test several classifiers and we show that a combination of different classifiers trained in different feature spaces obtains a higher performance than a stand-alone classifier.
2. System overview
In this section a brief description of the feature extraction methodologies, feature transformations, classifiers
and ensemble methods combined and tested in this work
is given.
2.1. Feature extraction

Feature extraction is a process that extracts a set of features from the original image representation through some functional mapping. In this task it is important to extract "local" features sensitive to rotation, in order to distinguish among the four orientations: for example, the global histogram of an image is not a good feature because it is invariant to rotations. To overcome this problem an image is first divided into blocks, and then the features are extracted from each block. Several block decompositions have been proposed in the literature, depending on the set of features to be extracted; in this work we adopt a regular subdivision into N × N non-overlapping blocks (we empirically select N = 10) and the features are extracted from these local regions.
2.1.1. Color moments (COL)
It is shown in (Jain and Vailaya, 1996) that color moments of an image in the LUV color space are very simple yet very effective for color-based image analysis. We use the first-order moments (mean color) and the second-order moments (color variance) as our COL features to capture image chrominance information, so that for each block 6 COL features (3 mean and 3 variance values of the L, U, V components) are extracted. Finally, within each block the COL vector is normalized such that the sum of the squares of its components is one. The dimension of the COL feature vector is 6 × 100 = 600.
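As a concrete illustration, a minimal sketch of the COL extraction follows, assuming NumPy and scikit-image for the RGB-to-LUV conversion; the paper's own implementation was in (unoptimized) Matlab, so the function name col_features and the numerical details are ours.

```python
# Sketch of COL feature extraction (Section 2.1.1): per-block LUV mean and
# variance, normalized so the sum of squared components is one per block.
import numpy as np
from skimage.color import rgb2luv

N = 10  # the paper empirically selects a 10 x 10 block grid

def col_features(rgb_image):
    """Return the 6 * N * N dimensional COL vector of an RGB image."""
    luv = rgb2luv(rgb_image)
    h, w = luv.shape[:2]
    feats = []
    for i in range(N):
        for j in range(N):
            block = luv[i * h // N:(i + 1) * h // N, j * w // N:(j + 1) * w // N]
            v = np.concatenate([block.mean(axis=(0, 1)), block.var(axis=(0, 1))])
            v /= np.linalg.norm(v) + 1e-12  # per-block: sum of squares = 1
            feats.append(v)
    return np.concatenate(feats)  # dimension 6 * 100 = 600 for N = 10
```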
2.1.2. Edge direction histogram (EDH)
The edge-based structural features are employed in this work to capture the luminance information carried by the edge map of an image. Specifically, we utilize the edge direction histogram (EDH) to characterize the structural and texture information of each block, similarly to (Jain and Vailaya, 1996). The Canny edge detector (Canny, 1986) is used to extract the edges in an image. In our experiments, we use a total of 37 bins to represent the edge direction histogram. The first 36 bins represent the count of edge points of each block with edge directions quantized at 10° intervals, and the last bin represents the count of pixels that do not contribute to an edge (which is the difference between the dimension of a block and the sum of the first 36 bins). The dimension of the EDH feature vector is 37 × 100 = 3700.
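A sketch of the EDH computation is given below; the use of scikit-image's Canny implementation and Sobel gradients for the edge directions is our assumption, since the paper does not detail the gradient operator.

```python
# Sketch of EDH feature extraction (Section 2.1.2): per-block histogram of
# edge directions (36 bins of 10 degrees) plus one bin for non-edge pixels.
import numpy as np
from scipy import ndimage
from skimage.feature import canny

N = 10

def edh_features(gray_image):
    edges = canny(gray_image)                      # boolean edge map
    gx = ndimage.sobel(gray_image, axis=1)
    gy = ndimage.sobel(gray_image, axis=0)
    angle = (np.degrees(np.arctan2(gy, gx)) + 360.0) % 360.0
    h, w = gray_image.shape
    feats = []
    for i in range(N):
        for j in range(N):
            sl = np.s_[i * h // N:(i + 1) * h // N, j * w // N:(j + 1) * w // N]
            block_edges, block_angle = edges[sl], angle[sl]
            hist, _ = np.histogram(block_angle[block_edges], bins=36, range=(0, 360))
            non_edge = block_edges.size - block_edges.sum()  # pixels not on an edge
            feats.append(np.append(hist, non_edge))
    return np.concatenate(feats).astype(float)  # 37 * 100 = 3700 for N = 10
```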
2.1.3. Harris corner histogram (HCH)
A corner is a point that can be extracted consistently over different views, such that there is enough information in its neighborhood for corresponding points to be automatically matched. The corner features are employed in this work to capture information about the presence of details in blocks, by counting the number of corner points in each block (only one feature per block is needed). For corner detection we use the Harris corner detector (Harris and Stephens, 1988). The dimension of the HCH feature vector is 100.
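A minimal sketch of the HCH extraction follows, using scikit-image's Harris detector as a stand-in for the detector of Harris and Stephens (1988); the peak-detection parameters are illustrative.

```python
# Sketch of HCH feature extraction (Section 2.1.3): count Harris corner
# points falling inside each of the N x N blocks.
import numpy as np
from skimage.feature import corner_harris, corner_peaks

N = 10

def hch_features(gray_image):
    corners = corner_peaks(corner_harris(gray_image), min_distance=3)  # (row, col) pairs
    h, w = gray_image.shape
    counts = np.zeros((N, N))
    for r, c in corners:
        counts[min(r * N // h, N - 1), min(c * N // w, N - 1)] += 1
    return counts.ravel()  # one count per block: 100 features
```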
182
A. Lumini, L. Nanni / Pattern Recognition Letters 27 (2006) 180–186
2.1.4. Phase symmetry (PHS)
Phase symmetry is an illumination and contrast invariant measure of symmetry in an image, developed from its representation in the frequency domain. In particular, phase congruency (Kovesi, 1999) can be used as an illumination and contrast invariant measure of feature significance. This allows edges, lines and other features to be detected reliably, and fixed thresholds can be applied over wide classes of images. Points of local symmetry and asymmetry in images can be detected from the special arrangements of phase that arise at these points, and the level of symmetry/asymmetry can be characterized by invariant measures. In this work we compute the phase symmetry image and count the number of symmetry pixels in each block (one feature per block). The dimension of the PHS feature vector is 100.
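The following sketch assumes the phase symmetry map has already been computed following Kovesi (1999) (e.g., with a third-party phase-congruency implementation) and only shows the per-block counting; the binarization threshold is our assumption.

```python
# Sketch of PHS feature extraction (Section 2.1.4): count "symmetry pixels"
# per block in a precomputed phase symmetry map with values in [0, 1].
import numpy as np

N = 10

def phs_features(symmetry_map, threshold=0.5):
    sym = symmetry_map >= threshold  # binary mask of symmetry pixels (threshold assumed)
    h, w = sym.shape
    feats = [sym[i * h // N:(i + 1) * h // N, j * w // N:(j + 1) * w // N].sum()
             for i in range(N) for j in range(N)]
    return np.asarray(feats, dtype=float)  # one count per block: 100 features
```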
The above COL, EDH, HCH and PHS vectors are normalized within each block. In order to accommodate the scale differences over different images during feature extraction, all the extracted features are also normalized over the training samples to the same scale. A linear normalization procedure is performed, so that the features lie in the range [0, 1]. Please note that the last two sets of features (HCH and PHS) have never been used for orientation detection.
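A minimal sketch of this linear normalization, with the ranges fitted on the training samples only; the function names are ours.

```python
# Sketch of the linear normalization to [0, 1]: per-dimension ranges are
# learned from the training set and then applied to all data.
import numpy as np

def fit_linear_norm(train_features):            # train_features: (n_samples, n_dims)
    lo = train_features.min(axis=0)
    hi = train_features.max(axis=0)
    return lo, np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero

def apply_linear_norm(features, lo, span):
    return np.clip((features - lo) / span, 0.0, 1.0)
```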
2.2. Feature transformation
Feature transformation is a process through which a new set of features is created from an existing one. We adopt the Karhunen–Loève transform (KL) to reduce the feature set to a lower dimensionality, obtaining a new reduced set of decorrelated features. This step is performed since the original dimension of the feature space is too large to make the training of classifiers feasible.

Given F-dimensional data points x, the goal of KL (Duda et al., 2000) is to reduce the dimensionality of the observed vector. This is obtained by finding the k principal axes, denoted as principal components, which are given by the eigenvectors associated with the k largest eigenvalues of the covariance matrix of the training set.

In this paper the extracted features, whose original dimension depends on the feature set used, are reduced by KL to 100-dimensional vectors.
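As a sketch, the reduction can be performed with scikit-learn's PCA, our stand-in for the eigen-decomposition of the covariance matrix described above; the synthetic arrays merely stand in for the extracted features.

```python
# Sketch of the KL (PCA) reduction to 100 dimensions, fitted on the
# training set only and then applied to the test data.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
train_features = rng.random((500, 600))   # e.g., COL vectors of the training set
test_features = rng.random((100, 600))

kl = PCA(n_components=100)                # keep the 100 largest-eigenvalue axes
train_reduced = kl.fit_transform(train_features)  # fit on the training set only
test_reduced = kl.transform(test_features)        # project test data on the same axes
```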
2.3. Classifiers
A classifier is a component that uses the feature vector
provided by the feature extraction or transformation to assign a pattern to a class. In this work, we test the following
classifiers:
• Linear discriminant classifier (LDC) (Duda et al., 2000);
• Quadratic discriminant classifier (QDC) (Duda et al.,
2000);
• Parzen windows classifier (PWC) (Duda et al., 2000);
• Polynomial-support vector machine (P-SVM) (Duda
et al., 2000);
• Radial basis function-support vector machine (R-SVM)
(Duda et al., 2000).
2.4. Multiclassifier systems (MCS)
Multiclassifier systems are special cases where different approaches are combined to resolve the same problem. They combine the outputs of various classifiers, trained using different datasets, by a decision rule (Kittler and Roli, 2000). Several decision rules can be used to determine the final class from an ensemble of classifiers; the most used are: Vote rule, Max rule, Min rule, Mean rule, and Borda Count (Ho et al., 1994). In this work we make a comparison among these decision rules; moreover, we test a supervised method for combining different classifiers: dynamic classifier selection (DCS) (Woods et al., 1997). The best classification accuracy for the orientation detection problem was achieved, in our experiments, using Borda Count (see Table 2 in Section 3).
Borda Count is defined as a mapping from a set of individual rankings to a combined ranking leading to the most relevant decision. Each class gets 1 point for each last-place vote received, 2 points for each next-to-last-place vote, and so on, all the way up to M points for each first-place vote (where M is the number of candidates/alternatives). The candidate with the largest point total wins the election. The idea behind the Borda Count method is based on the assumption of additive independence among the contributing classifiers; it ignores redundant classifiers, which reinforce errors made by others. Advantages of the Borda Count method are that it is easy to implement and does not require any training; a disadvantage of this technique is that it treats all classifiers equally and does not take into account the capabilities of the individual classifiers.
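To make the scoring concrete, here is a minimal sketch of the Borda Count rule applied to per-class confidences; the double-argsort ranking is an implementation choice of ours, not taken from the paper.

```python
# Sketch of the Borda Count combination rule: each classifier ranks the
# classes by confidence; a first-place vote is worth M points, a last-place
# vote 1 point, and the class with the largest total wins.
import numpy as np

def borda_count(confidences):
    """confidences: (n_classifiers, n_classes) array of per-class scores."""
    confidences = np.asarray(confidences)
    # rank per classifier: lowest confidence -> 1 point, highest -> M points
    points = confidences.argsort(axis=1).argsort(axis=1) + 1
    return int(points.sum(axis=0).argmax())

# Example with 3 classifiers voting over the classes (0, 90, 180, 270) degrees:
votes = [[0.6, 0.2, 0.1, 0.1],
         [0.3, 0.4, 0.2, 0.1],
         [0.5, 0.1, 0.3, 0.1]]
print(borda_count(votes))  # -> 0, i.e. the 0-degree class
```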
In Section 3 we denote by BORDA a multiclassifier obtained by combining all the feature sets and all the classifiers (for a total of 4 "feature sets" × 5 "classifiers" = 20 "approaches"); we determine the final class using Borda Count.
2.5. Clustering
Since the dataset is made of images very different from each other, so that classifying it with a single classifier can be a very arduous task, we propose to cluster the images into similar groups before the classification step. This can help in specializing the classifiers on particular sub-groups of images. This consideration is supported by other methods in the literature that try to divide the images into two classes (indoor and outdoor) in a supervised manner before deciding the orientation (Zhang et al., 2002). However these approaches fail for two reasons: first, such a pre-classification is a very difficult task (the precision is about 90%); second, no sensible advantage has been obtained in detecting the orientation separately on each class. Due to the difficulty of dividing images into indoor and outdoor, in this work we perform a non-supervised clustering, in order to divide
the dataset into non-semantic clusters. The results, which are encouraging, are reported in Section 3.2.
It is well known that clustering algorithms are useful for discovering complex structures in the data. We propose to partition the data into NCL clusters, using Expectation-Maximization (Duda et al., 2000), to group similar patterns together. Different classifiers are trained using the patterns that belong to each cluster. This clustering step is performed separately on each set of features. For example, while in the standard approach an LDC classifier is trained on a fixed feature set (say COL), now NCL LDC classifiers are trained on the different clusters of the COL feature set and a new pattern is classified according to its membership to a cluster (if a pattern belongs to a given cluster, we decide the orientation of that pattern according to the classifier trained on its cluster). Then we fuse the results as in the BORDA method, combining all the feature spaces and all the classifiers (for a total of 20); this method is denoted as CLUSTER in Section 3.
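A sketch of the CLUSTER variant for one feature space follows, using scikit-learn's Gaussian mixture as the EM clustering and LDA as the per-cluster classifier; the data arrays are synthetic placeholders.

```python
# Sketch of the CLUSTER step: EM clustering of one feature space, then one
# classifier per cluster; a new pattern is routed to its cluster's expert.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

NCL = 2
rng = np.random.default_rng(0)
X_train = rng.random((400, 100))              # e.g., KL-reduced COL features
y_train = rng.integers(0, 4, 400)             # orientation labels 0..3

em = GaussianMixture(n_components=NCL, random_state=0).fit(X_train)
cluster_id = em.predict(X_train)
experts = [LinearDiscriminantAnalysis().fit(X_train[cluster_id == c],
                                            y_train[cluster_id == c])
           for c in range(NCL)]

def classify(x):
    c = em.predict(x.reshape(1, -1))[0]       # route the pattern to its cluster
    return experts[c].predict(x.reshape(1, -1))[0]
```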
2.6. Rejection rule
A rejection rule is always useful in the orientation detection problem, since many images are difficult to orient correctly even for a human operator. In the literature several rejection schemes have been adopted for this problem: simpler approaches are based on the rejection of those images whose maximum a posteriori probabilities are less than a threshold (Vailaya et al., 1999) or those for which the classifier has a low confidence (Wang and Zhang, 2004). In (Zhang et al., 2002) the authors propose to reject more indoor than outdoor images at the same level of confidence, based on the observation that the accuracy of an orientation detector on indoor images is much lower than on outdoor images. In this work we adopt a simple rejection rule based on the evaluation of the confidence value given by the classifier; we noted an improvement of performance if the rejection is evaluated only for patterns classified as 90° and 270°. This behavior can be explained by the consideration that these classes are much less discriminable (as reported in Table 4). In our BORDA multiclassifier, we use the confidence obtained by the "mean rule".
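A minimal sketch of this rejection rule; the class indices and the threshold value are our assumptions (in practice the threshold would be tuned to the desired rejection rate).

```python
# Sketch of the rejection rule: reject only patterns predicted as 90 or 270
# degrees whose mean-rule confidence falls below a threshold.
import numpy as np

VERTICAL = {1, 3}  # class indices assumed for the 90- and 270-degree classes

def predict_with_rejection(mean_rule_confidences, threshold=0.4):
    """mean_rule_confidences: (n_classes,) average of the classifiers' scores."""
    conf = np.asarray(mean_rule_confidences)
    label = int(conf.argmax())
    if label in VERTICAL and conf[label] < threshold:
        return None  # rejected: leave the orientation undecided
    return label
```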
2.7. Correction rule
We implement a very simple heuristic rule that takes into account the acquisition information of an image to correct the classification response. After evaluating all the photos belonging to the same roll of film (considered as a single session of work), we count the number of photos labelled as 0° and 180° and we select as the correct orientation for all those photos the one having the larger number of images, thus changing the labelling of the images assigned to the wrong class. In the experimental section the application of this correction rule is denoted by the name ROLL after the considered method (BORDA-ROLL or CLUSTER-ROLL).
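The ROLL correction can be sketched as follows; the function name and the tie-breaking choice (keeping 0° on a tie) are ours.

```python
# Sketch of the ROLL correction rule: within one roll of film, whichever of
# the 0- and 180-degree labels is more frequent wins, and the minority
# horizontal labels are flipped accordingly.
from collections import Counter

def roll_correction(labels):
    """labels: per-photo predictions for one roll, values in {0, 90, 180, 270}."""
    counts = Counter(l for l in labels if l in (0, 180))
    if not counts:
        return labels
    majority = 0 if counts[0] >= counts[180] else 180
    return [majority if l in (0, 180) else l for l in labels]

print(roll_correction([0, 180, 0, 0, 90, 180]))  # -> [0, 0, 0, 0, 90, 0]
```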
3. Experiments
We carried out some experiments to evaluate both the features extracted and the classifiers used. In order to test our correction rule we use a dataset of images acquired by analogue cameras and digitized by film scanning. The images have been manually labeled with their correct orientation. The dataset is composed of about 6000 images from 350 rolls of film, distributed as follows: 39.8% with correct orientation (0°), 21% with a 90° rotation, 26.4% with a 180° rotation, 12.8% with a 270° rotation. This dataset is substantially more difficult than others tested in the literature for two reasons: first, because the distribution of images among the four classes is strongly unbalanced towards classes that are difficult to detect (due to the film-scanning origin of our photos), with respect to other distributions reported in the literature (Segur, 2000); second, because the dataset is composed of 80% indoor photos, which are hard to classify. It is especially difficult to detect the orientation of indoor images (e.g., it is very hard to detect the orientation of a face) because we lack discriminative features for indoor images, while for outdoor images there is a lot of useful information that can be mapped to low-level features, such as sky, grass, buildings and water. Even though we are aware of the difficulty of orientation detection for indoor images, we do not propose an ad hoc approach, due to the intrinsic complexity of discriminating between indoor and outdoor (see Section 2.5).
The classification results are averaged over 10 tests, each time randomly resampling the test set (each test set contains 1000 images). The training set was composed of images taken from rolls of film not used in the test set, thus the test set is poorly correlated with the training data. For each image of the training set, we employ four feature vectors corresponding to the four orientations (only one has to be extracted, the other three can be simply calculated).
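A sketch of how the four training vectors can be obtained from a single image; for clarity we simply re-extract features from the rotated image, although (as the text notes) the rotated feature vectors can also be computed directly, e.g. by permuting blocks and, for EDH, shifting direction bins.

```python
# Sketch of the training-set augmentation: derive the feature vectors of all
# four orientations from one image by rotating it in 90-degree steps.
import numpy as np

def four_orientation_features(rgb_image, extract):
    """extract: any block-based feature extractor, e.g. col_features above."""
    return {rot: extract(np.rot90(rgb_image, k=rot // 90))
            for rot in (0, 90, 180, 270)}
```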
We performed experiments to verify that all the proposed sets of features are useful to improve the performance: in Table 1 the classification results are reported for all the classifiers listed in Section 2.3. The experiments in Table 1 show that the best single classifier for this problem is the quadratic discriminant classifier (QDC) and that the accuracy of this classifier increases from 0.52 using only COL, which is the best single set of features, to 0.562 using a combination of all the features. Moreover, for almost all classifiers, the performance using a single set of features is
Table 1
Average accuracy for the orientation detection problem using different feature spaces and classifiers

Classifier   COL     EDH     HCH     PHS     ALL
LDC          0.435   0.455   0.506   0.493   0.479
QDC          0.52    0.496   0.488   0.501   0.562
PWC          0.487   0.495   0.502   0.489   0.516
RSVM         0.49    0.509   0.521   0.519   0.557
PSVM         0.455   0.46    0.445   0.448   0.529
Table 2
Average accuracy for the orientation detection problem using several combination rules for a multiclassifier and the DCS supervised method

Combining rule   Accuracy
Min rule         0.542
Max rule         0.512
Mean rule        0.608
Vote rule        0.598
Borda Count      0.62
DCS              0.541
outperformed by that obtained using a sequential combination of all the features (ALL).

The second experiment was aimed at evaluating the performance of a multiclassifier system constructed by combining all the feature spaces and all the classifiers studied in Table 1, varying the combination rule; in addition, the result of the DCS supervised method for combining classifiers is reported. The results (Table 2) demonstrate that the Borda Count outperforms the other rules: in the following we denote by BORDA the fusion based on the Borda Count, for comparison with other state-of-the-art approaches.
The third experiment we carried out was aimed at comparing the performance of the proposed BORDA method with the approaches in (Vailaya and Jain, 1999; Zhang et al., 2002).

A high discrepancy in accuracy is present among the methods proposed in the literature, which is most likely due to the fact that the datasets were different. High performance has been reported on constrained image sets, such as Corel, while lower performance has been achieved on typical consumer photos (Segur, 2000), where the number of indoor pictures is much higher, a high percentage of photos contain people taken at the typical subject distance (not only portraits as in Corel), and a much higher level of background clutter is present. Without access to the specific datasets, it is very difficult to make a comparison with other state-of-the-art methods, therefore we make a comparison, at different levels of rejection, with our reimplementation of the two methods, which consist of a polynomial SVM trained using COL or COL + EDH features (denoted as SVMCM and SVMCM + EDH, respectively).
From the results shown in Fig. 2, we can see that with the same training data, the BORDA fusion method performs better than SVMCM and SVMCM + EDH. A further improvement is obtained by coupling our method with the correction rule described in Section 2.7 (BORDA-ROLL).

Fig. 2. Comparison (accuracy) among several state-of-the-art methods at different levels of rejection.
In (Vailaya and Jain, 1999; Zhang et al., 2002) the authors propose the following solutions to increase the performance of the standard SVM classifier:

• An AdaBoost method (Vailaya and Jain, 1999) where the feature space is obtained by extending the original feature set (COL + EDH), combining any two features with an addition operation and thus obtaining a very large feature set;
• A two-layer SVM (Vailaya and Jain, 1999) (with a trainable combiner);
• A rejection scheme that rejects more indoor than outdoor images at the same level of confidence score (Zhang et al., 2002).
However these solutions grant only a slight improvement of performance with respect to a standard SVM classifier, whereas with our fusion method the performance is considerably better than that of a stand-alone classifier. Moreover, the results obtained without rejection prove that the BORDA recognition rate (0.62) is higher than that of any single classifier (see the last column of Table 1). Finally, our correction rule has proven to be well suited to this problem, since it improves the performance of the base method.
We also analyzed the computation time of our system. In Table 3 the average time spent on the main processing steps of our method is reported. All the simulations have been carried out on a PC with a Pentium IV 2400 MHz CPU (Matlab code, not optimized).
Recently a semantic method has been proposed in the literature that approaches the image orientation detection problem via confidence-based integration of low-level and semantic cues within a Bayesian framework: the authors show that integrating low-level features (color moments + edge direction histograms) with semantic cues (face, grass, sky and wall detection) increases the performance by about 10% on a consumer photos dataset (Luo and Boutell, 2005). The semantic feature extraction and orientation prediction need about 1 s per photo using optimized code (as reported by the authors), more than the time spent for the low-level features. In this work we do not deal with semantic concepts; however, our new low-level features can be integrated with the semantic cues to further improve the performance.
Table 3
Average time spent for the main processing steps

Step                        Time (s)
COL (feature extraction)    0.002
EDH (feature extraction)    0.1
HCH (feature extraction)    0.15
PHS (feature extraction)    0.25
LDC (classification)        0.00047
QDC (classification)        0.00047
PWC (classification)        0.0002
RSVM (classification)       0.00001
PSVM (classification)       0.00001
BORDA                       0.50316
SVMCM + EDH                 0.10021
Table 4
Confusion matrix obtained using BORDA on a test set of 1000 images (rows: true orientation, columns: assigned orientation)

        0°    90°   180°  270°
0°      265   19    101   13
90°     24    115   27    44
180°    45    18    192   9
270°    18    29    32    49
3.1. Error analysis
Table 5
Average accuracy on indoor and outdoor images, and total variance, of several methods

Method              Accuracy (indoor)   Accuracy (outdoor)   Variance
SVMCM               0.37                0.82                 31
SVMCM + EDH         0.45                0.85                 35
BORDA               0.55                0.90                 26.22
CLUSTER (NCL = 2)   0.56                0.90                 18
In Table 4 the confusion matrix of the BORDA approach is reported: it reveals that the most difficult classes are the "vertical" ones (90° and 270°), which have a total accuracy of 0.485 vs. 0.69 for the "horizontal" classes. Moreover, most of the errors (57%) are generated by confusion between a class and its opposite (0°–180°, 90°–270°).
We manually classified 200 images as indoor/outdoor in
order to evaluate the performance on these two classes of
images: the results reported in Table 5 confirm the difficulty
in managing indoor images, and show a good improvement
of the performance of the BORDA method for indoor
images even without any ad hoc rule.
In Fig. 3 some images taken from our dataset are shown: (a) images which have been erroneously classified (i.e., whose orientation has not been correctly detected) by SVMCM + EDH and correctly classified by BORDA, (b) images erroneously classified by both methods, and (c) images rejected by our method (rejection 20%). The faces in this paper have been blurred for privacy.

Fig. 3. Some images from our dataset which appear to be difficult to classify.
3.2. Clustering evaluation

In Table 6 the accuracy of the CLUSTER and CLUSTER-ROLL approaches is reported: the performance is quite similar to that of BORDA/BORDA-ROLL, with the advantage of a lower variance of the accuracy (reported in Table 5).

Table 6
Average accuracy of the CLUSTER method by varying the number of clusters (NCL) at different levels of rejection. The method coincides with BORDA if NCL = 1

Method                  Rejection
                        0%      10%     20%     50%
NCL = 1
  BORDA                 0.62    0.64    0.67    0.74
  BORDA-ROLL            0.738   0.78    0.82    0.88
NCL = 2
  CLUSTER               0.63    0.64    0.66    0.75
  CLUSTER-ROLL          0.75    0.79    0.82    0.92
NCL = 3
  CLUSTER               0.61    0.64    0.66    0.73
  CLUSTER-ROLL          0.71    0.76    0.78    0.86

Another interesting result is that the clusters obtained by EM are balanced with respect to the four orientation classes and to the indoor/outdoor classes. This means that the feature spaces generated by the considered low-level features are not particularly discriminant for these two classification problems.

4. Conclusions

We have proposed an automatic approach for content-based image orientation detection. Extensive experiments on a database of more than 6000 images were conducted to evaluate the system. The experimental results, obtained on a dataset of "difficult" images which is very similar to real applications, show that our approach outperforms R-SVM and other "stand-alone" classifiers. However, general image orientation detection is still a challenging problem. It is especially difficult to detect the orientation of indoor images because we lack discriminative features for them. Due to the difficulty of considering indoor and outdoor images separately, we have proposed a non-supervised clustering, in order to specialize the detection on these non-semantic clusters. The first results in this sense are encouraging and as future work we plan to develop this idea.

References
Breiman, L., 1996. Bagging predictors. Machine Learning 24 (2), 123–140.
Canny, J.F., 1986. A computational approach to edge detection. IEEE
Trans. Pattern Anal. Machine Intell. 8 (6), 679–698.
Dietterich, T.G., 2000. Ensemble methods in machine learning. In: Kittler,
J., Roli, F. (Eds.), 2000. First Internat. Workshop on Multiple
Classifier Systems. Springer, Cagliari, Italy. pp. 1–15.
Duda, R.O., Hart, P.E., Stork, D.G., 2000. Pattern Classification, second
ed. Wiley.
Evano, M.G., McNeill, K.M., 1998. Computer recognition of chest image
orientation. In: Proc. Eleventh IEEE Symp. on Computer-Based
Medical Systems. pp. 275–279.
Harris, C., Stephens, M., 1988. A combined corner and edge detector. In:
Fourth Alvey Vision Conf., pp. 147–151.
Ho, T.K., 1998. The random subspace method for constructing decision
forests. IEEE Trans. Pattern Anal. Machine Intell. 20 (8), 832–
844.
Ho, T.K., Hull, J.J., Srihari, S.N., 1994. Decision combination in multiple classifier systems. IEEE Trans. Pattern Anal. Machine Intell. 16 (1), 66–75.
Jain, A.K., Vailaya, A., 1996. Image retrieval using color and shape.
Pattern Recognition (29), 1233–1244.
Kittler, J., Roli, F. (Eds.), 2000. First Internat. Workshop on Multiple
Classifier Systems. Springer, Cagliari, Italy.
Kovesi, P., 1999. Image features from phase congruency. Videre: A J.
Comput. Vision Research. 1(3), Summer.
Luo, J., Boutell, M., 2005. Automatic image orientation detection via
confidence-based integration of low-level and semantic cues. IEEE
Trans. PAMI 27 (4), 715–726.
Poz, A.P.D., Tommaselli, A.M.G., 1998. Automatic absolute orientation
of scanned aerial photographs. In: Proc. Internat. Symp. on Computer
Graphics, Image Processing, and Vision. pp. 295–302.
Segur, R., 2000. Using photographic space to improve the evaluation of consumer cameras. In: Proc. IS&T Image Processing, Image Quality, Image Capture and Systems (PICS) Conf., pp. 221–224.
Vailaya, A., Jain, A.K., 1999. Incremental learning for Bayesian classification of images. In: Proc. IEEE Internat. Conf. on Image Processing, pp. 585–589.
Vailaya, A., Jain, A.K., 2000. Rejection option for VQ-based Bayesian
classification. In: Proc. Fifteenth Internat. Conf. on Pattern Recognition. pp. 48–51.
Vailaya, A., Zhang, H., Jain, A.K., 1999. Automatic image orientation
detection. In: Proc. Sixth IEEE Internat. Conf. on Image Processing,
vol. 2. pp. 600–604.
Wang, Y.M., Zhang, H., 2004. Detecting image orientation based on low-level visual content. Computer Vision and Image Understanding 93, 328–346.
Woods, K., Kegelmeyer, W.P., Bowyer, K.W., 1997. Combination of
multiple classifiers using local accuracy estimates. IEEE Trans. PAMI
19 (4), 405–410.
Zhang, L., Li, M., Zhang, H.-J., 2002. Boosting image orientation
detection with indoor vs. outdoor classification. In: Sixth IEEE
Workshop on Applications of Computer Vision.