Colour Invariants for Machine Face Recognition
Ognjen Arandjelović
Trinity College
University of Cambridge
Cambridge, CB2 1TQ
Roberto Cipolla
Department of Engineering
University of Cambridge
Cambridge, CB2 1PZ
[email protected]
[email protected]
Abstract
Illumination invariance remains the most researched, yet
the most challenging aspect of automatic face recognition.
In this paper we investigate the discriminative power of
colour-based invariants in the presence of large illumination changes between training and test data, when appearance changes due to cast shadows and non-Lambertian effects are significant. Specifically, there are three main contributions: (i) we employ a more sophisticated photometric
model of the camera and show how its parameters can be
estimated, (ii) we derive several novel colour-based face invariants, and (iii) on a large database of video sequences
we examine and evaluate the largest number of colour-based representations in the literature. Our results suggest
that colour invariants do have a substantial discriminative
power which may increase the robustness and accuracy of
recognition from low resolution images.
Figure 1. The extent of facial appearance changes with illumination is difficult to fully appreciate as the human visual system is
highly adapted to such variations. Above we visualize appearance
as a 3D surface with height proportional to pixel intensity to illustrate the challenge posed to automatic recognition methods.
1. Introduction
In this paper we are interested in colour-based face representations for machine face recognition. Owing to its invariance to illumination [11], skin colour has been used extensively in detection and tracking applications, e.g. hand tracking [10], face detection [6] and face segmentation [2].
In contrast, colour has received little attention in the face recognition community, in spite of neurophysiological evidence that it is an important cue in recognition from low-resolution images [15].
The most complete comparison of colour-based representations for face recognition was published by Torres et
al. [12] in which the discriminative power of different
colour spaces (RGB, YUV and HSV) was evaluated. There
are several limitations of the reported evaluation, which we
address in this paper. Firstly, data with little illumination
variation was used, as witnessed by the high recognition
rate (85%) attained even using unprocessed luminance only.
In contrast, our data contains extreme illumination changes
with prominent non-Lambertian effects. As we will demonstrate, these can have a dramatic effect on the recognition
performance using different colour representations. Second, the implicitly employed photometric camera model is
the simple linear model (see Sec. 2.1) which is in Sec. 3
shown to be less accurate than the more complex model proposed here. In this paper we also show how the parameters
of the complex model can be recovered from face motion
sequences, and describe and evaluate several illumination
invariants based on it.
[Figure 2 panels: (a) RGB decomposition (colour image, red, green and blue channels, and greyscale); (b) HSV decomposition (hue, saturation and value channels).]
Figure 2. (a) None of the channels in the RGB decomposition of
an image are illumination invariant; (b) the Hue band of the HSV
space shows the highest degree of invariance.
2. Colour-Based Invariants
In this section we describe a detailed photometric model
of a camera. Then, by first considering its simpler, special
cases and working towards the most general form, we derive
a number of colour-based illumination invariants. These are
evaluated in Sec. 3.
2.1. Photometric camera model
Following several successful methods from the literature
[9, 1], we too start from a weak photometric assumption that
the measured intensity of a pixel is a linear function of the
albedo a(x, y) of the corresponding surface point:
I(x, y) = a(x, y) · f (Θ)
(1)
where f is a function of illumination, shape and other parameters not modelled explicitly (Θ ≡ Θ(x, y)). We extend this approach for colour images by treating each of the
colour channels IC separately and describing surface albedo
as dependent on the wavelength of incident light:
IC (x, y) = aC (x, y) · f (Θ)
(2)
where channel C is either red (R), green (G) or blue (B), see
Fig. 2.
In this paper we further augment this model to account
for nonlinearities in the camera response. In particular, we
include the (i) camera gamma parameter γ, (ii) linear gain
G and (iii) the clipping, saturation function:
I_C(x, y) = min [ (a_C(x, y) · f(Θ))^γ_C · G, 1.0 ],    (3)

see Fig. 3. Note that gamma is also light-wavelength dependent, introducing three further unknowns: γ_R, γ_G and γ_B.
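For concreteness, here is a one-line numpy sketch of the forward model (3); it is our own illustration, under the assumption that pixel values and albedos are normalized to [0, 1], and the function name is ours:

```python
import numpy as np

def camera_response(albedo, f_theta, gamma, gain):
    """Simulate one colour channel under the model of Eq. (3): a per-channel
    gamma and gain applied to the ideal irradiance albedo * f_theta, followed
    by clipping at the saturation level of 1.0."""
    return np.minimum((albedo * f_theta) ** gamma * gain, 1.0)
```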
[Figure 3 plot: pixel value vs. incident light energy; the nonlinear response is shown for γ = 1.3, G = 1.4.]
Figure 3. The previously used linear photometric model of camera
response significantly deviates from the more realistic, but also
more complex model employed in this paper.
[Figure 4 panels: red, green, blue and combined chromaticity images.]
Figure 4. Under the assumption of a unity gamma, chromaticity
images are illumination invariant. While far less sensitive to illumination than the corresponding RGB channels in Fig. 2, there is
still notable room for improvement. It is important to notice that
the three chromaticity images exhibit different degrees of invariance, motivating our wavelength dependent gamma in (3).
2.2. Unity gamma model and chromaticity
The simplest special case of the photometric camera
model (3) that is of interest in this paper is obtained when
γR = γG = γB = 1.0. In this case, the chromaticity images HC (C ∈ {R, G, B}) are invariant both to illumination changes (i.e. to Θ in (3)) and camera parameters:
H_C(x, y) = I_C(x, y) / Σ_{i∈{R,G,B}} I_i(x, y)    (4)
          = a_C(x, y) / Σ_{i∈{R,G,B}} a_i(x, y).    (5)
Examples are shown in Fig. 4.
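As a concrete illustration (a minimal numpy sketch of our own, not code from the paper), the chromaticity images of (4) can be computed as follows; the function name, the handling of near-zero pixels and the eps threshold are our assumptions (cf. Sec. 2.4):

```python
import numpy as np

def chromaticity(image, eps=1e-6):
    """Per-pixel chromaticity H_C = I_C / (I_R + I_G + I_B), cf. Eq. (4).

    `image` is assumed to be an H x W x 3 float RGB array with values in [0, 1];
    pixels whose channel sum falls below `eps` (deep shadow, see Sec. 2.4) are
    left undefined (NaN) rather than divided by zero.
    """
    total = image.sum(axis=2, keepdims=True)      # I_R + I_G + I_B at each pixel
    out = np.full_like(image, np.nan)
    valid = total[..., 0] > eps
    out[valid] = image[valid] / total[valid]
    return out                                    # H_R, H_G, H_B stacked along the last axis
```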
2.3. Variable gamma model
An examination of chromaticity images shows both that
they are generally not entirely invariant to illumination and
also that red, green and blue chromaticities show different
degrees of sensitivity. Referring back to our image formation model in (3), we can see that this means that the value of gamma is not unity and, furthermore, that it is wavelength dependent, i.e. γ_R ≠ γ_G ≠ γ_B.
2.3.1 Estimating gammas
The key observation that we use to estimate the colour channels’ gamma values (up to their ratio) is that faces have vertical symmetry. Consider an image I of a frontal face. We
know that pixels (x1 , y) and (x2 , y) (where x2 = w+1−x1 )
correspond to surface points with the same shape and reflectance properties, see Fig. 5 (a). Then for non-saturated
pixels it holds:
∀C.  I_C(x_1, y) = (a_C(x_1, y) · f(Θ_1))^γ_C · G    (6)
     I_C(x_2, y) = (a_C(x_1, y) · f(Θ_2))^γ_C · G,   (7)

and eliminating a_C(x_1, y):

log [ I_C(x_1, y) / I_C(x_2, y) ] = γ_C log [ f(Θ_1) / f(Θ_2) ].    (8)

By applying (8) to two colour channels we can estimate the ratio of the two corresponding gammas, e.g. γ_R and γ_B:

γ̂_R ≡ γ_R / γ_B = log [ I_R(x_1, y) / I_R(x_2, y) ] / log [ I_B(x_1, y) / I_B(x_2, y) ].    (9)
To increase the accuracy of the estimate in the presence of
image noise and spatial discretization, we find the gamma
value that achieves the best agreement across the entire image:
γ̂_R ≡ γ_R / γ_B    (10)
    = arg min_γ Σ_{x_1} Σ_y | γ · log [ I_B(x_1, y) / I_B(x_2, y) ] − log [ I_R(x_1, y) / I_R(x_2, y) ] |².    (11)
Note that recovering gammas up to their ratio is the best
that one can do without making further assumptions (such
as imposing a prior on the wavelength dependent albedos)
since a face with channel albedo a_C(x, y) imaged by a camera with the corresponding gamma γ_C is indistinguishable from a face with the albedo a_C(x, y)^γ_C imaged by a camera with a unity gamma.
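The following numpy sketch (our illustration only, not the authors' code) estimates the gamma ratio of (10)-(11) from a single roughly frontal face by pairing every pixel with its horizontal mirror and grid-searching the ratio that minimizes the total disagreement, in the spirit of the voting illustrated in Fig. 5 (b). The RGB channel ordering, the saturation threshold and the search grid are our assumptions.

```python
import numpy as np

def estimate_gamma_ratio(face, channel=0, ref_channel=2, eps=1e-3):
    """Estimate gamma_channel / gamma_ref from the vertical symmetry of a
    (roughly) frontal face image, cf. Eqs. (9)-(11).

    `face` is an H x W x 3 float RGB image with values in [0, 1]; pixels that
    are saturated or too dark in either member of a symmetric pair are ignored.
    """
    mirrored = face[:, ::-1, :]                  # pixel (x, y) paired with (w + 1 - x, y)
    a, a_m = face[..., channel], mirrored[..., channel]
    b, b_m = face[..., ref_channel], mirrored[..., ref_channel]

    # keep only informative, non-saturated pixel pairs (Sec. 2.4)
    stacked = np.stack([a, a_m, b, b_m])
    ok = np.all(stacked > eps, axis=0) & np.all(stacked < 1 - eps, axis=0)

    r_c = np.log(a[ok] / a_m[ok])                # log I_C(x1, y) / I_C(x2, y), Eq. (8)
    r_ref = np.log(b[ok] / b_m[ok])

    # grid search for the ratio minimizing the symmetry error of Eq. (11)
    candidates = np.linspace(0.2, 5.0, 500)
    errors = [np.sum((g * r_ref - r_c) ** 2) for g in candidates]
    return candidates[int(np.argmin(errors))]
```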
2.3.2 Unity gain camera
Let us first consider the special case of unity gain, i.e. G = 1
in (3). Defining semi-gamma normalized channels ÎC as
Î_C(x, y) = [ I_C(x, y) ]^(γ_B / γ_C),    (12)

it can be shown that:

Φ_B(x, y) ≡ [ log Î_R(x, y) − log Î_B(x, y) ] / [ log Î_G(x, y) − log Î_B(x, y) ]    (13)
          = [ log a_R(x, y) − log a_B(x, y) ] / [ log a_G(x, y) − log a_B(x, y) ].    (14)

Figure 5. (a) We use face symmetry to estimate the wavelength-dependent camera gamma; each pixel (x, y) is paired with its mirror image (w + 1 − x, y). (b) The estimate is made by polling "votes" from all pixels and finding the gamma value γ_C that minimizes the total disagreement (symmetry error) across the image.
The quantity ΦB is thus entirely independent of illumination or camera parameters (under the assumption of unity
gain). Rather, it is a function of person-specific albedos aC
and is a colour-based invariant under the given photometric
model. Similarly, so are ΦR (x, y) and ΦG (x, y):
Φ_R(x, y) ≡ [ log Î_G(x, y) − log Î_R(x, y) ] / [ log Î_B(x, y) − log Î_R(x, y) ]    (15)

Φ_G(x, y) ≡ [ log Î_R(x, y) − log Î_G(x, y) ] / [ log Î_B(x, y) − log Î_G(x, y) ]    (16)
We shall refer to ΦR (x, y), ΦG (x, y) and ΦB (x, y) as, respectively, the red, green and blue-centred log-∆-ratios.
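A minimal numpy sketch (ours, not the authors' code) of the log-∆-ratio computation of (12)-(16), assuming the gamma ratios γ_B/γ_R and γ_B/γ_G have already been estimated as in Sec. 2.3.1 and that uninformative pixels are masked out separately:

```python
import numpy as np

def log_delta_ratios(image, gamma_B_over_R, gamma_B_over_G, eps=1e-3):
    """Red-, green- and blue-centred log-Delta-ratios, Eqs. (13)-(16)."""
    img = np.clip(image, eps, 1.0)
    # semi-gamma normalized log channels: log I_hat_C = (gamma_B / gamma_C) log I_C, Eq. (12)
    lR = gamma_B_over_R * np.log(img[..., 0])
    lG = gamma_B_over_G * np.log(img[..., 1])
    lB = np.log(img[..., 2])                      # gamma_B / gamma_B = 1

    # pixels where a denominator vanishes are undefined and should be masked downstream
    with np.errstate(divide="ignore", invalid="ignore"):
        phi_B = (lR - lB) / (lG - lB)             # Eq. (13)
        phi_R = (lG - lR) / (lB - lR)             # Eq. (15)
        phi_G = (lR - lG) / (lB - lG)             # Eq. (16)
    return phi_R, phi_G, phi_B
```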
2.3.3 Variable gain model
In Sec. 2.2 we considered a special case of the camera photometric model with unity gamma. We were able to derive
two independent colour invariants by looking at each pixel
individually. In the previous section we allowed gammas to
vary. The increased number of unknown parameters meant
that we could no longer find an invariant at each pixel – indeed, we used face symmetry as a further constraint. Now
we consider the most general case of the model (3) in which
both the camera gain and the wavelength dependent gammas are variable.
Much like before, we face the problem of having a higher
number of unknowns than independent equations. In terms
of our model, the ambiguity is posed by not being able to
differentiate between a face with albedo aC (x, y) imaged
by a camera with gain G and gamma γC , and a face with
albedo a_C(x, y) · G^(1/γ_C) imaged by a camera with unity gain
and gamma γC .
Consider the (say) blue-centred log-∆-ratio introduced
in (14), under the variable gamma/gain model:
Φ_B(x, y; G) = [ log Î_R(x, y) − log Î_B(x, y) ] / [ log Î_G(x, y) − log Î_B(x, y) ]    (17)
             = [ γ_B log (a_R(x, y) / a_B(x, y)) + (γ_B/γ_R − 1) log G ] / [ γ_B log (a_G(x, y) / a_B(x, y)) + (γ_B/γ_G − 1) log G ].    (18)
Clearly ΦB is now a function of the camera gain and thus
no longer an invariant. However, if G was somehow known,
the same invariant of (14) could be computed easily:
[ log a_R(x, y) − log a_B(x, y) ] / [ log a_G(x, y) − log a_B(x, y) ]    (19)
    = [ log Î_R(x, y) − log Î_B(x, y) − (γ_B/γ_R − 1) log G ] / [ log Î_G(x, y) − log Î_B(x, y) − (γ_B/γ_G − 1) log G ].    (20)
We use this by computing, and adjusting for, the relative
camera gain between data sets when they are compared, and
call this the adaptive log-∆-ratio.
Consider two frontal faces from different sequences. If
ΦB ′ is the blue-centred log-∆-ratio of the reference, the
relative camera gain is determined by minimizing:
Ĝ = arg min_G Σ_x Σ_y | [ log (Î_R(x, y) / Î_B(x, y)) − (γ_B/γ_R − 1) log G ] / [ log (Î_G(x, y) / Î_B(x, y)) − (γ_B/γ_G − 1) log G ] − Φ′_B(x, y) |²,    (21)
where non-primed variables correspond to the nonreference image.
It can be seen that the estimate of the relative gain is accurate when the identity of the person in the compared data
sets is the same. The value of Ĝ is not meaningful when the
corresponding identities are different. This is however not a
concern, as by the very nature of the invariant, in this case
no camera gain will produce a good match.
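A possible implementation of the gain adjustment is sketched below. The logarithmic search grid, the nan-tolerant summation and the function name are our own choices; the paper only specifies the minimization (21).

```python
import numpy as np

def estimate_relative_gain(log_hat, gamma_ratios, phi_ref, candidates=None):
    """Estimate the relative camera gain G of Eq. (21) by a simple grid search.

    `log_hat` is a tuple (log I_hat_R, log I_hat_G, log I_hat_B) of semi-gamma
    normalized log channels of the non-reference image, `gamma_ratios` is
    (gamma_B/gamma_R, gamma_B/gamma_G), and `phi_ref` is the blue-centred
    log-Delta-ratio of the reference image, all restricted to the same
    informative pixels.
    """
    lR, lG, lB = log_hat
    gBR, gBG = gamma_ratios
    if candidates is None:
        candidates = np.logspace(-1, 1, 201)      # candidate gains, 0.1 .. 10
    best_g, best_err = None, np.inf
    for g in candidates:
        num = lR - lB - (gBR - 1.0) * np.log(g)
        den = lG - lB - (gBG - 1.0) * np.log(g)
        err = np.nansum((num / den - phi_ref) ** 2)   # objective of Eq. (21)
        if err < best_err:
            best_g, best_err = g, err
    return best_g
```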
2.4. Saturation, specular reflections and shadows
The final aspect of our camera model that we need to address is that of colour-wise “uninformative” pixels. We classify these into three groups: saturated, specular and shadowed.
Saturation is perhaps the easiest to understand as being
uninformative: loss of information occurs as the energy
of incident light is outside of the photo-sensor sensitivity
range. In our photometric camera model the effects of saturation are represented by the clipping min function in (3).
In contrast, within the context of this paper, intensely specular image regions are problematic not due to the limitations of practical imaging equipment, but rather due to inherent physical reasons. This is because, unlike isotropic, diffuse reflection, specular reflection by definition does not depend on surface albedo [7] and is effectively determined by the colour of incident light [13].

Figure 6. Pixels detected as saturated (shown in red) are ignored.
Finally, deeply shadowed pixels lack colour information because insufficient light was reflected to lend itself to
wavelength/colour analysis. In the case of chromaticity, for
example, this problem manifests itself as division by zero in (4).
Our approach. For simplicity, uninformative regions in
this paper are excluded from consideration when appearance models are built (see Sec. 3.2). As a consequence,
they do not contribute to the similarity score between sequences. We formally classify a pixel as uninformative if
its luminance is either less than 3%, or more than 97% of
the maximal luminance that can be represented, see Fig. 6.
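For illustration, the mask of informative pixels can be computed as below (our own sketch; the Rec. 601 luminance weights are our assumption, as the paper does not specify how luminance is measured):

```python
import numpy as np

def informative_mask(image, low=0.03, high=0.97):
    """Mask of pixels retained for building appearance models (Sec. 2.4).

    `image` is an H x W x 3 float RGB image with values in [0, 1]. A pixel is
    marked uninformative when its luminance falls below 3% or above 97% of the
    representable range.
    """
    luminance = image @ np.array([0.299, 0.587, 0.114])
    return (luminance > low) & (luminance < high)
```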
Discussion. Before we proceed to the next section, we
wish to add a brief clarification regarding “uninformative”
pixels. Our claim is not that these are entirely lacking in discriminative information. As a simple example, if only a single colour channel is saturated, the remaining two channels
can still be used to derive a colour constraint. Calling such
image areas "less informative" would have probably been
more appropriate, but we decided against it for the sake of
avoiding awkward language constructs.
Also, we emphasize that we do not mean to suggest that
these pixels are uninformative in general, but merely in the
context of colour-based invariants. Indeed, spatial distribution of shadowed and specular pixels contains strong shape
cues [4], amongst others.
3. Empirical Evaluation
The central premise of this paper is that in the treatment
of colour for the purpose of face recognition, nonlinear
effects in the photometric camera response are significant
and need to be carefully modelled. In this section we first
present empirical evidence for this assertion and then quantify the contribution of each model parameter by evaluating the appropriate proposed colour invariant.

3.1. Data

We conducted the evaluation on a large database of face motion video sequences kindly provided to us by the University of Cambridge and described in detail in [1]. The 700 sequences in this database, each containing 100 frames, were acquired in a virtually unconstrained setting, thus making the recognition task representative of the challenges of most practical applications. Specifically, the extent of illumination variation across the 7 different settings used for each of the 100 people is illustrated in Fig. 7.

Figure 7. The Cambridge Face Database contains extreme illumination conditions which also vary greatly between sequences. They are illustrated on a single frontal face for the purpose of isolating illumination effects only.

Faces, which vary in scale from (roughly) 40 to 80 pixels, were extracted from 320 × 240 pixel frames using the Viola-Jones detector [14]. They were then rescaled to the uniform scale of 60 × 60 pixels and cropped to the innermost 40 × 40 pixel subimage, as shown in Fig. 8.

[Figure 8: original frame, 320 × 240 pixels; detected and resized face, 60 × 60 pixels; cropped subimage, 40 × 40 pixels.]

Figure 8. Following detection, we automatically crop faces so as to eliminate any image regions which may interfere with the study of colour.

3.2. Implementation details

Our aim in this evaluation was not necessarily to engineer the best performing system, but rather to obtain an assessment of the relative performance of different representations and invariants. We chose canonical correlations (CC) between linear subspaces [5, 8] as a simple and well understood method for matching sets of fixed dimensionality vectors.

3.2.1 Set matching

Our basic algorithm for pairwise matching of face sets consists of two stages. Model estimation consists of fitting a linear subspace to each image set corresponding to a single input sequence. Two such sets are then compared and the first canonical correlation between the corresponding subspaces is used as the similarity measure. We now explain these steps in more detail.

Model estimation. Let d_i be a raster-ordered representation of the i-th detected face in a video sequence. The basis vectors of the corresponding linear subspace can be computed as the eigenvectors corresponding to the largest eigenvalues of the cross-correlation matrix C = D D^T / N, where:

D = [ d_1 | d_2 | . . . | d_N ].    (22)

Model estimation with void elements. In Sec. 2.4 we explained why some image regions cannot be used to extract colour invariants. This means that the corresponding elements of d_i are undefined and PCA cannot be readily performed. We thus modify the basic model estimation algorithm to take this feature of our data into account.

Let m_i be the mask corresponding to d_i, such that m_i(j) = 0 iff d_i(j) is undefined. We then perform PCA on a modified cross-correlation matrix

Ĉ = ( D D^T ) ÷ ( M M^T ),    (23)

where ÷ denotes element-wise division and

M = [ m_1 | m_2 | . . . | m_N ].    (24)

CC matching. The first canonical correlation between two subspaces spanned by bases B_1 and B_2 can be computed as the largest singular value of the matrix B_1^T B_2 [3]. It is equal to the cosine of the smallest angle between vectors of the two spaces.
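As a minimal illustration of this pipeline (our own sketch, not the authors' implementation), the following numpy fragment fits a subspace to a face set with void elements using (22)-(24) and scores a pair of sets by their first canonical correlation; the number of retained basis vectors and the guard against empty co-occurrence counts are our assumptions.

```python
import numpy as np

def fit_subspace(faces, masks, n_basis=5):
    """Fit a linear subspace to a face set with void elements, Eqs. (22)-(24).

    `faces` is an N x d matrix of raster-ordered face vectors (undefined entries
    set to 0) and `masks` the corresponding N x d binary validity matrix.
    Returns a d x n_basis orthonormal basis.
    """
    D = faces.T                                    # d x N, as in Eq. (22)
    M = masks.T.astype(float)
    C_hat = (D @ D.T) / np.maximum(M @ M.T, 1.0)   # element-wise normalization, Eq. (23)
    eigvals, eigvecs = np.linalg.eigh(C_hat)       # eigenvalues in ascending order
    return eigvecs[:, -n_basis:]                   # eigenvectors of the largest eigenvalues

def first_canonical_correlation(B1, B2):
    """Set similarity: largest singular value of B1^T B2 [3, 5]."""
    return np.linalg.svd(B1.T @ B2, compute_uv=False)[0]
```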
3.2.2 Detection and synthesis of frontal faces

The extraction of colour invariants proposed in Sec. 2.3 relies on the availability of an image of a frontal face to recover a set of camera parameters. At the very least, this means that we need a reliable way of automatically selecting the frontal-most face from the pool of all detections in a sequence, or more likely, an algorithm for synthesizing a frontal face from non-frontal detections. We summarize our approach:

1. map all face images I onto a quasi-illumination invariant domain I → S,

2. quantify the degree of "frontality" of the maps of all detected faces,

3. select the frontal-most (highest "frontality") face I_f:

   (a) from the neighbourhood of S_f, estimate the 2D plane tangential to the appearance manifold,

   (b) perform extrapolation from S_f along the tangent plane to the point S′_f nearest to the vertical symmetry hyperplane,

   (c) inverse map S′_f to I′_f,

4. result: I′_f is a synthetic image of the frontal face.
Measuring “frontality”. To find the face in a data set
which is closest to frontal, we need a way of quantifying the
degree of face “frontality”. Our approach consists of computing a distance transformed edge map of each face image,
which is a quasi-illumination invariant representation, and
then measuring the cosine of the angle between its rasterized left and (mirrored) right halves. Fig. 9 (a) illustrates
the basic principle, while Fig. 9 (b) shows typical responses
to differently oriented faces.
Finding the inverse map. After localizing the frontalmost face, we use the distribution of its neighbours to extrapolate in the direction tangential to the appearance manifold, see Fig. 9 (c). Since the face detector normalizes for
face scale and location, the two dominant modes of appearance change in a single data set correspond to varying pitch
and yaw, thus resulting in a 2D manifold. We perform extrapolation in the quasi-illumination invariant space of distance transformed edge maps, maximizing vertical symmetry. The result in the appearance domain is then obtained by
linearly combining the corresponding appearance images.
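As an illustration of the "frontality" measure described above (a sketch of our own using off-the-shelf Canny edges and a Euclidean distance transform; the edge detector settings are our assumptions):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt
from skimage.feature import canny

def frontality(face_grey):
    """Score the "frontality" of a greyscale face image (Sec. 3.2.2).

    A distance-transformed Canny edge map is computed as a quasi-illumination
    invariant representation, and the cosine of the angle between its rasterized
    left half and mirrored right half is returned.
    """
    edges = canny(face_grey, sigma=2.0)
    dist = distance_transform_edt(~edges)          # distance to the nearest edge pixel
    w = dist.shape[1] // 2
    left = dist[:, :w].ravel()
    right = dist[:, -w:][:, ::-1].ravel()          # mirrored right half
    return float(left @ right / (np.linalg.norm(left) * np.linalg.norm(right) + 1e-12))
```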
Table 1. A summary of experimental results.

Representation                                Recognition rate (%)
Colour space transformations:
  Greyscale                                   64.6
  Colour channels, all (RGB)                  64.9
  Colour channel, red                         55.5
  Colour channel, green                       66.5
  Colour channel, blue                        67.9
  HSV                                         66.1
  Saturation                                  56.2
  Value                                       65.8
  Hue                                         35.1
Colour-based invariant signatures:
  Chromaticity, red                           48.5
  Chromaticity, green                         56.8
  Chromaticity, blue                          51.3
  Chromaticity, all                           58.2
  Log-∆-ratio, red-centred                    39.2
  Log-∆-ratio, green-centred                  57.8
  Log-∆-ratio, blue-centred                   57.8
  Log-∆-ratio, all                            64.5
  Adaptive log-∆-ratio, red-centred           40.3
  Adaptive log-∆-ratio, green-centred         64.9
  Adaptive log-∆-ratio, blue-centred          63.7
  Adaptive log-∆-ratio, all                   65.1

3.3. Results and discussion

We summarized our experimental results in Tab. 1. Firstly, note the grouping of different representations in the table into two categories: colour space transformations and
colour-based invariant signatures. The representations of
the former group, while functions of colour, are also inherently dependent on the manner in which a face is illuminated. On the other hand, the representations of the latter
group are all invariants, each under a specific photometric
model.
The results obtained using raw images are useful as a
benchmark for quantifying the severity of illumination variation in the database. Specifically, in comparison to Torres
et al. [12], our data set is far more challenging with approximately 25% lower recognition rate obtained using unprocessed greyscale. This difference is even more significant
when it is taken into account that we performed recognition
from video sequences, thus using more data and effectively
normalizing for pose, as well as that our matching algorithm
is more sophisticated in comparison to the simple PCA in
[12].

Figure 9. (a,b) We quantify the degree of face "frontality" by measuring the vertical symmetry of distance transformed Canny edges. (c) After finding the frontal-most face in a data set we use its neighbourhood to synthetically improve the result by performing 2D linear extrapolation; the most frontal face in the data and its synthetic reconstruction are marked. For clarity, this is illustrated in the 2D principal component space (blue points represent face images in a hypothetical data set; green points represent samples from the remainder of the actual appearance manifold, which can be used to verify the accuracy of synthetic frontal face reconstruction).
Much like Torres et al., we too found no statistically significant improvement when using the RGB colour space,
either a single channel at a time, or all together. However,
as the remainder of our results will show, we argue that it
would not be correct to conclude from this (as Torres et al.
do) that colour information has nothing to add to the discriminative power of luminance.
It is interesting to note that in contrast to Torres et al.
who found the three colour components equally informative, in our experiments Red was notably worse than Green
and Blue. The same was found in the case of the chromaticity
components. For this reason we examined the three channels in more detail, see Fig. 10. Red was found to be the
greatest in magnitude, which is not surprising given the
red-dominant colour of skin. Interestingly, the correlation
between Green and Blue was consistently quite high, and
quite low (but very variable as suggested by the variance)
between Red and either Green or Blue.
Superficially, the recognition results achieved using individual HSV components may seem somewhat surprising:
the performance of the (near) invariant Hue (see Sec. 2.2) is
rather disappointing, with the heavily illumination-affected
Value correctly matching twice as many individuals. The
performance of Hue is indicative of the inherent discriminative power of pure colour. In effect, it is this performance
that we are set on scrutinizing and improving upon in this
paper. It is also insightful to consider why Value performed
so relatively well, in the light of the widely accepted claim
that illumination is one of the foremost challenges to face
recognition. Briefly put, the reason is that it is the large
changes in illumination that present difficulties; shadows
and highlights can in fact help discern between individuals,
effectively by placing constraints on the head shape, otherwise lost in the process of projection onto the image plane.
The recognition rate attained using individual chromaticity components significantly exceeded that of Hue, and in combination nearly matched greyscale performance ("Chromaticity, all"). This supports our main premise and the proposed photometric model: by analyzing the dependence of the measured RGB values at each pixel on camera gain, we were able to derive a representation that is not affected by gain changes, see Sec. 2.2.
Our introduction of wavelength-dependent gammas in Sec. 2.3 provides a further substantial improvement, with adaptive log-∆-ratios performing better, as expected, than simple log-∆-ratios (see Sec. 2.3.2 and 2.3.3). The attained rate slightly exceeds that of the greyscale representation (as well as RGB), which is quite remarkable given that the log-∆-ratios are pure colour invariants and thus complementary
to greyscale. These results suggest that colour is in fact
much more promising for face recognition than previously
acknowledged.
Interestingly, despite the very different nature of the log-∆-ratio based representations and the chromaticity or RGB components, the representation corresponding to the Red channel was found to be consistently worse in all cases than
those corresponding to Green or Blue. We found no satisfying explanation for this and suggest that more research is
needed.
Figure 10. Pairwise angles between the RGB colour components (from the mean luminance): (a) cos α plotted across a portion of our face data set; (b) the mean statistics for the entire database; (c) their magnitudes (relative to luminance).

(b) Angle statistics:

Angle         Mean     Variance
Red-Green     0.1063   0.3000²
Red-Blue      0.2272   0.3026²
Green-Blue    0.9077   0.0398²

(c) Magnitude (relative to luminance):

Channel   Mean     Variance
Red       1.1141   0.0404²
Green     0.8766   0.0209²
Blue      1.0149   0.0233²

4. Conclusion

This paper analyzed the importance of colour in machine recognition of faces. It was argued and experimentally demonstrated that the previously largely ignored nonlinear effects in the photometric response of the camera are in fact substantial and should be modelled. Thus, a number of novel colour invariants were developed for several models of different complexity. Their recognition performance on a large database with extreme illumination variability suggests that the use of colour may significantly improve greyscale-based matching algorithms.

We believe that the reported results open a number of promising areas for further work. The most immediate research direction we intend to pursue, motivated by the success of similar methods in matching greyscale appearance, is that of developing algorithms which better exploit the manifold structure of colour-based invariant representations.

References

[1] O. Arandjelović and R. Cipolla. Face recognition from video using the generic shape-illumination manifold. In Proc. European Conference on Computer Vision (ECCV), 4:27–40, 2006.

[2] O. Arandjelović, G. Shakhnarovich, J. Fisher, R. Cipolla, and T. Darrell. Face recognition with image sets using manifold density divergence. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1:581–588, 2005.

[3] Å. Björck and G. H. Golub. Numerical methods for computing angles between linear subspaces. Mathematics of Computation, 27(123):579–594, 1973.

[4] A. Blake and G. Brelstaff. Geometry from specularities. In Proc. IEEE International Conference on Computer Vision (ICCV), pages 394–403, 1988.

[5] H. Hotelling. Relations between two sets of variates. Biometrika, 28:321–372, 1936.

[6] R.-L. Hsu, M. Abdel-Mottaleb, and A. K. Jain. Face detection in color images. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 24(5):696–706, 2002.

[7] S. K. Nayar, K. Ikeuchi, and T. Kanade. Surface reflection: physical and geometrical perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 13(7):611–634, 1991.

[8] E. Oja. Subspace Methods of Pattern Recognition. Research Studies Press and J. Wiley, 1983.

[9] T. Riklin-Raviv and A. Shashua. The quotient image: Class based re-rendering and recognition with varying illuminations. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 23(2):129–139, 2001.

[10] B. Stenger, A. Thayananthan, P. H. S. Torr, and R. Cipolla. Filtering using a tree-based estimator. In Proc. IEEE International Conference on Computer Vision (ICCV), 2:1063–1070, 2003.

[11] M. Störring, H. J. Andersen, and E. Granum. Skin colour detection under changing lighting conditions. In Symposium on Intelligent Robotics Systems, pages 187–195, 1999.

[12] L. Torres, J. Y. Reutter, and L. Lorente. The importance of color information in face recognition. In Proc. IEEE International Conference on Image Processing (ICIP), 1999.

[13] S. Umeyama and G. Godin. Separation of diffuse and specular components of surface reflection by use of polarization and statistical analysis of images. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 26(5):639–647, 2004.

[14] P. Viola and M. Jones. Robust real-time face detection. International Journal of Computer Vision (IJCV), 57(2):137–154, 2004.

[15] A. Yip and P. Sinha. Role of color in face recognition. Perception, 31(5):995–1003, 2002.