Face Reg LBP
Introduction
The original LBP operator, introduced by Ojala et al. [9], is a powerful means of
texture description. The operator labels the pixels of an image by thresholding
the 3x3-neighbourhood of each pixel with the center value and considering the
result as a binary number. Then the histogram of the labels can be used as a
texture descriptor. See Figure 1 for an illustration of the basic LBP operator.
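The thresholding step is simple enough to sketch directly. The following is illustrative Python only, not the authors' implementation; the function name and the MSB-first, clockwise-from-top-left bit order are our assumptions (conventions vary between implementations):

```python
import numpy as np

def lbp_3x3(image):
    """Basic LBP sketch: threshold each pixel's 3x3 neighbourhood at the
    center value and read the eight comparison bits as a binary number.
    Bit order (an assumed convention): MSB = top-left neighbour, clockwise."""
    h, w = image.shape
    out = np.zeros((h - 2, w - 2), dtype=np.uint8)
    # the 8 neighbour offsets, clockwise starting from the top-left corner
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            center = image[i, j]
            code = 0
            for bit, (di, dj) in enumerate(offsets):
                if image[i + di, j + dj] >= center:
                    code |= 1 << (7 - bit)
            out[i - 1, j - 1] = code
    return out
```

With invented pixel values such as [[6, 7, 4], [7, 5, 6], [7, 4, 4]] around a center of 5, the comparison bits read 11010011, i.e. label 211, matching the example of Figure 1.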
Later the operator was extended to use neighbourhoods of different sizes [8].
Using circular neighbourhoods and bilinearly interpolating the pixel values allows
any radius and number of pixels in the neighbourhood. For neighbourhoods we
will use the notation (P, R) which means P sampling points on a circle of radius
of R. See Figure 2 for an example of the circular (8,2) neighbourhood.
[Fig. 1. The basic LBP operator: the 3x3 neighbourhood of a pixel is thresholded at the center value and the results are read as a binary number; in the illustrated example, binary 11010011, decimal 211.]

Fig. 2. The circular (8,2) neighbourhood. The pixel values are bilinearly interpolated
whenever the sampling point is not in the center of a pixel.

Another extension to the original operator uses so-called uniform patterns
[8]. A Local Binary Pattern is called uniform if it contains at most two bitwise
transitions from 0 to 1 or vice versa when the binary string is considered circular.
For example, 00000000, 00011110 and 10000011 are uniform patterns. Ojala et al.
noticed that in their experiments with texture images, uniform patterns account
for a bit less than 90 % of all patterns when using the (8,1) neighbourhood and
for around 70 % in the (16,2) neighbourhood.
We use the following notation for the LBP operator: LBP^{u2}_{P,R}. The subscript
represents using the operator in a (P, R) neighbourhood. The superscript u2 stands
for using only uniform patterns and labelling all remaining patterns with a single
label.
A histogram of the labeled image f_l(x, y) can be defined as

  H_i = \sum_{x,y} I\{f_l(x, y) = i\},   i = 0, \ldots, n - 1,     (1)

in which n is the number of different labels produced by the LBP operator and

  I\{A\} = 1 if A is true, 0 if A is false.
This histogram contains information about the distribution of the local micro-patterns, such as edges, spots and flat areas, over the whole image. For efficient face
representation, one should also retain spatial information. For this purpose, the
image is divided into regions R_0, R_1, \ldots, R_{m-1} (see Figure 5 (a)) and the spatially
enhanced histogram is defined as

  H_{i,j} = \sum_{x,y} I\{f_l(x, y) = i\} I\{(x, y) \in R_j\},   i = 0, \ldots, n-1,  j = 0, \ldots, m-1.   (2)
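Equation (2) amounts to one label histogram per region, concatenated. A minimal sketch (our own helper, assuming the k x k grid division used later in the paper):

```python
import numpy as np

def spatially_enhanced_histogram(labels, n_labels, k):
    """Eq. (2) sketch: split the LBP label image into a k x k grid of
    regions R_j and concatenate one n_labels-bin histogram per region."""
    rows = np.array_split(np.arange(labels.shape[0]), k)
    cols = np.array_split(np.arange(labels.shape[1]), k)
    hists = []
    for r in rows:
        for c in cols:
            region = labels[np.ix_(r, c)]
            hists.append(np.bincount(region.ravel(), minlength=n_labels))
    return np.concatenate(hists)  # length k * k * n_labels
```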
From the pattern classification point of view, a usual problem in face recognition is having a plethora of classes and only a few, possibly only one, training
samples per class. For this reason, sophisticated classifiers are of limited use,
and a nearest-neighbour classifier is used instead. Several possible dissimilarity measures
have been proposed for histograms:
Histogram intersection:

  D(S, M) = \sum_i \min(S_i, M_i)     (3)

Log-likelihood statistic:

  L(S, M) = -\sum_i S_i \log M_i     (4)

Chi square statistic (χ²):

  χ²(S, M) = \sum_i \frac{(S_i - M_i)^2}{S_i + M_i}     (5)

When the image is divided into regions, some regions can be expected to carry more useful information than others, which suggests weighting them:

  χ²_w(S, M) = \sum_{i,j} w_j \frac{(S_{i,j} - M_{i,j})^2}{S_{i,j} + M_{i,j}},     (6)

in which w_j is the weight for region j.
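The measures above can be sketched as follows (hypothetical helper names; histograms as NumPy arrays, with the convention that 0/0 bins contribute zero to the χ² sums):

```python
import numpy as np

def histogram_intersection(S, M):
    """Eq. (3); a similarity, not a distance: larger means more alike."""
    return np.sum(np.minimum(S, M))

def chi_square(S, M):
    """Eq. (5): chi-square distance between histograms; 0/0 bins give 0."""
    S, M = np.asarray(S, float), np.asarray(M, float)
    num, denom = (S - M) ** 2, S + M
    return np.sum(np.divide(num, denom, out=np.zeros_like(num), where=denom > 0))

def weighted_chi_square(S, M, w):
    """Eq. (6): S, M shaped (m regions, n bins); w holds one weight per region."""
    S, M = np.asarray(S, float), np.asarray(M, float)
    num, denom = (S - M) ** 2, S + M
    per_bin = np.divide(num, denom, out=np.zeros_like(num), where=denom > 0)
    return np.sum(np.asarray(w)[:, None] * per_bin)
```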
Experimental Design
The CSU Face Identification Evaluation System [11] was utilised to test the
performance of the proposed algorithm. The system follows the procedure of
the FERET test for semi-automatic face recognition algorithms [12] with slight
modifications. The system uses the full-frontal face images from the FERET
database and works as follows (see Figure 3):
1. The system preprocesses the images. The images are registered using eye
coordinates and cropped with an elliptical mask to exclude non-face area
from the image. After this, the grey histogram over the non-masked area is
equalised.
2. If needed, the algorithm is trained using a subset of the images.
[Fig. 3. Flow of the evaluation system: image files and eye coordinates are fed to preprocessing; the preprocessed image files go to algorithm training (on a training subset) and to the experimental algorithm / NN classifier, whose distance matrix yields a rank curve.]
3. The preprocessed images are fed into the experimental algorithm which outputs a distance matrix containing the distance between each pair of images.
4. Using the distance matrix and different settings for gallery and probe image
sets, the system calculates rank curves. These can be calculated for prespecified gallery and probe image sets, or by choosing random
permutations of one large set as probe and gallery sets and calculating the
average performance. The advantage of the former method is that it is easy
to measure the performance of the algorithm under certain challenges (e.g.
different lighting conditions), whereas the latter is statistically more reliable.
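A rank (cumulative match) curve can be computed from the distance matrix roughly as follows. This is our own sketch of step 4, not the CSU implementation, and it assumes every probe identity appears in the gallery:

```python
import numpy as np

def rank_curve(distances, probe_ids, gallery_ids, max_rank=50):
    """Cumulative match score: fraction of probes whose correct gallery
    identity appears among the r nearest gallery images, r = 1..max_rank.
    `distances` is (n_probes, n_gallery), as output by the algorithm."""
    order = np.argsort(distances, axis=1)          # gallery indices, nearest first
    ranked_ids = np.asarray(gallery_ids)[order]    # identities in rank order
    hits = ranked_ids == np.asarray(probe_ids)[:, None]
    # 0-based rank of the correct match; assumes each probe id is in the gallery
    first_hit = hits.argmax(axis=1)
    return np.array([(first_hit < r).mean() for r in range(1, max_rank + 1)])
```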
The CSU system uses the same gallery and probe image sets that were used
in the original FERET test. Each set contains at most one image per person.
These sets are:
- fa set, used as a gallery set, contains frontal images of 1196 people.
- fb set (1195 images). The subjects were asked for an alternative facial expression than in the fa photograph.
- fc set (194 images). The photos were taken under different lighting conditions.
- dup I set (722 images). The photos were taken later in time.
- dup II set (234 images). This is a subset of the dup I set containing those
images that were taken at least a year after the corresponding gallery image.
In this paper, we use two statistics produced by the permutation tool: the
mean recognition rate with a 95 % confidence interval and the probability of one
algorithm outperforming another [13]. The image list used by the tool contains
4 images of each of the 160 subjects. One image of every subject is selected for
the gallery set and another image for the probe set on each permutation. The
number of permutations is 10000.
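The two permutation statistics can be sketched as follows (a simplification; the exact nonparametric procedure of [13] differs in details such as tie handling, and the helper names are ours):

```python
import numpy as np

def mean_and_interval(rates, alpha=0.05):
    """Mean recognition rate over permutations with an empirical
    95 % interval taken from the order statistics."""
    r = np.sort(np.asarray(rates, dtype=float))
    lower = r[int(alpha / 2 * len(r))]
    upper = r[int((1 - alpha / 2) * len(r)) - 1]
    return r.mean(), lower, upper

def prob_outperforms(rates_a, rates_b):
    """Fraction of permutations on which algorithm A beats algorithm B."""
    a, b = np.asarray(rates_a), np.asarray(rates_b)
    return float(np.mean(a > b))
```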
The CSU system comes with implementations of the PCA, LDA, Bayesian
intra/extrapersonal (BIC) and Elastic Bunch Graph Matching (EBGM) face
recognition algorithms. We include the results obtained with PCA, BIC and
EBGM here for comparison.
There are some parameters that can be chosen to optimise the performance
of the proposed algorithm. The first one is choosing the LBP operator. Choosing
an operator that produces a large number of different labels makes the histogram
long, and thus calculating the distance matrix is slow. Using a small number of
labels makes the feature vector shorter, but also means losing more information.
A small radius of the operator makes the information encoded in the histogram
more local. The number of labels for a neighbourhood of 8 pixels is 256 for
standard LBP and 59 for LBP^{u2}. For the 16-neighbourhood the numbers are
65536 and 243, respectively. The usage of uniform patterns is motivated by the
fact that most patterns in facial images are uniform: we found that in the
preprocessed FERET images, 79.3 % of all the patterns produced by the LBP_{16,2}
operator are uniform.
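The label counts 59 and 243 follow directly from the uniformity definition: for P sampling points there are P(P-1)+2 uniform patterns, plus one shared label for all non-uniform patterns. A small brute-force check (our own helper names):

```python
def is_uniform(pattern, P):
    """A pattern is uniform if its circular P-bit string has at most
    two 0/1 transitions (e.g. 00000000, 00011110, 10000011)."""
    bits = [(pattern >> i) & 1 for i in range(P)]
    transitions = sum(bits[i] != bits[(i + 1) % P] for i in range(P))
    return transitions <= 2

def n_labels_u2(P):
    """Number of LBP^{u2} labels: one per uniform pattern plus a single
    shared label for all remaining (non-uniform) patterns."""
    uniform = sum(is_uniform(p, P) for p in range(2 ** P))
    return uniform + 1
```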
Another parameter is the division of the images into regions R_0, \ldots, R_{m-1}.
The length of the feature vector becomes B = m B_r, in which m is the number
of regions and B_r is the LBP histogram length. A large number of small regions
produces long feature vectors, causing high memory consumption and slow classification, whereas using large regions causes more spatial information to be lost.
We chose to divide the image with a grid into k x k equally sized rectangular
regions (windows). See Figure 5 (a) for an example of a preprocessed facial image
divided into 49 windows.
Results
(Two decision rules can be used with the BIC classifier: Maximum A Posteriori (MAP)
or Maximum Likelihood (ML). We include here the results obtained with MAP.)
Table 1. The performance of the histogram intersection, log-likelihood and χ² dissimilarity measures using different window sizes and LBP operators.

Operator          Window size  P(HI > LL)  P(χ² > HI)  P(χ² > LL)
LBP^{u2}_{8,1}    18x21        1.000       0.714       1.000
LBP^{u2}_{8,1}    21x25        1.000       0.609       1.000
LBP^{u2}_{8,1}    26x30        0.309       0.806       0.587
LBP^{u2}_{16,2}   18x21        1.000       0.850       1.000
LBP^{u2}_{16,2}   21x25        1.000       0.874       1.000
LBP^{u2}_{16,2}   26x30        1.000       0.918       1.000
LBP^{u2}_{16,2}   32x37        1.000       0.933       1.000
LBP^{u2}_{16,2}   43x50        0.085       0.897       0.418
parameters. Changes in the parameters may cause big differences in the length
of the feature vector, but the overall performance is not necessarily affected
significantly. For example, changing from LBP^{u2}_{16,2} in 18x21-sized windows to
LBP^{u2}_{8,2} in 21x25-sized windows drops the histogram length from 11907 to 2124,
while the mean recognition rate reduces from 76.9 % to 73.8 %.
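Both histogram lengths follow from B = m B_r, counting the whole windows that tile the 130x150 image (a check with our own helper; window sizes are taken as width x height):

```python
def feature_vector_length(image_w, image_h, win_w, win_h, n_labels):
    """B = m * B_r: number of whole windows tiling the image times the
    per-window LBP histogram length."""
    m = (image_w // win_w) * (image_h // win_h)
    return m * n_labels
```

For example, 49 windows of LBP^{u2}_{16,2} (243 labels) give 11907, and 36 windows of LBP^{u2}_{8,2} (59 labels) give 2124.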
The mean recognition rates for the LBP^{u2}_{16,2}, LBP^{u2}_{8,2} and LBP^{u2}_{8,1} operators as a function of the window size are plotted in Figure 4. The original 130x150 pixel image
was divided into k x k windows, k = 4, 5, \ldots, 11, 13, 16, resulting in window sizes
from 32x37 to 8x9. The five smallest windows were not tested using the LBP^{u2}_{16,2}
operator because of the high dimension of the feature vector that would have
been produced. As expected, a larger window size induces a decreased recognition rate because of the loss of spatial information. The LBP^{u2}_{8,2} operator in
18x21 pixel windows was selected since it is a good trade-off between recognition
performance and feature vector length.
Fig. 4. The mean recognition rate for three LBP operators (LBP^{u2}_{8,2}, LBP^{u2}_{16,2} and LBP^{u2}_{8,1}) as a function of the window size (from 8x9 to 32x37).
To find the weights w_j for the weighted χ² statistic (Equation 6), the following procedure was adopted: a training set was classified using only one of the
18x21 windows at a time. The recognition rates of corresponding windows on
the left and right half of the face were averaged. Then the windows whose rate
lay below the 0.2 percentile of the rates got weight 0, and windows whose rate
lay above the 0.8 and 0.9 percentiles got weights 2.0 and 4.0, respectively. The
other windows got weight 1.0.
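The percentile rule can be sketched as follows (our own helper; the left/right averaging step is omitted here, and the quantile interpolation details are an assumption):

```python
import numpy as np

def window_weights(rates):
    """Percentile-based weights as described above: weight 0 below the
    0.2 quantile of the per-window recognition rates, 2.0 above the 0.8
    quantile, 4.0 above the 0.9 quantile, and 1.0 otherwise.
    `rates` is a (k, k) array of per-window recognition rates."""
    r = np.asarray(rates, dtype=float)
    q20, q80, q90 = np.quantile(r, [0.2, 0.8, 0.9])
    w = np.ones_like(r)
    w[r < q20] = 0.0
    w[r > q80] = 2.0
    w[r > q90] = 4.0  # overwrites the 2.0 for the very best windows
    return w
```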
The CSU system comes with two training sets, the standard FERET training
set and the CSU training set. As shown in Table 2, these sets are basically subsets
of the fa, fb and dup I sets. Since illumination changes pose a major challenge
to most face recognition algorithms and none of the images in the fc set were
included in the standard training sets, we defined a third training set, called the
subfc training set, which contains half of the fc set (subjects 1013–1109).
Table 2. Number of images in common between different training and testing sets.

Training set     fa   fb
FERET standard  270  270
CSU standard    396    0
subfc            97    0
The permutation tool was used to compare the weights computed from the
different training sets. The weights obtained using the FERET standard set gave
an average recognition rate of 0.80, the CSU standard set 0.78 and the subfc set
0.81. The pairwise comparison showed that the weights obtained with the subfc
set are likely to be better than the others (P(subfc > FERET) = 0.66 and
P(subfc > CSU) = 0.88).
The weights computed using the subfc set are illustrated in Figure 5 (b).
The weights were selected without utilising an actual optimisation procedure,
and thus they are probably not optimal. Despite that, in comparison with the
nonweighted method, we got an improvement both in the processing time (see
Table 3) and the recognition rate (P(weighted > nonweighted) = 0.976).
The image set which was used to determine the weights overlaps with the fc
set. To avoid biased results, we preserved the other half of the fc set (subjects
Fig. 5. (a) An example of a facial image divided into 7x7 windows. (b) The weights
set for the weighted χ² dissimilarity measure. Black squares indicate weight 0.0, dark grey
1.0, light grey 2.0 and white 4.0.
Table 3. Processing times of weighted and nonweighted LBP on a 1800 MHz AMD
Athlon running Linux. Note that processing FERET images (last column) includes
heavy disk operations, most notably writing the distance matrix of about 400 MB to
disk.

Type of LBP   Feature ext. (ms / image)   Distance calc. (µs / pair)   Processing FERET images (s)
Weighted                3.49                        46.6                         1046
Nonweighted             4.14                        58.6                         1285
Table 4. Recognition rates for the FERET probe sets, and the mean recognition rate of the permutation test with its 95 % confidence interval (lower/upper).

Method             fb    fc   dup I  dup II  lower  mean  upper
LBP weighted      0.97  0.79  0.66   0.64    0.76   0.81  0.85
LBP nonweighted   0.93  0.51  0.61   0.50    0.71   0.76  0.81
PCA MahCosine     0.85  0.65  0.44   0.22    0.66   0.72  0.78
Bayesian MAP      0.82  0.37  0.52   0.32    0.67   0.72  0.78
EBGM CSU optimal  0.90  0.42  0.46   0.24    0.61   0.66  0.71
Fig. 6. (a), (b), (c) Rank curves (cumulative score up to rank 50) for the fb, fc and dup I probe sets (from top to down), for LBP weighted, LBP nonweighted, Bayesian MAP, PCA MahCosine and EBGM CSU optimal.
[Rank curve (cumulative score up to rank 50) for the dup II probe set, same algorithms as in Fig. 6.]
The experimental results clearly show that the LBP-based method outperforms the other approaches on all probe sets (fb, fc, dup I and dup II). For instance,
our method achieved a recognition rate of 97 % in the case of recognising faces
under different facial expressions (fb set), while the best performance among
the other tested methods did not exceed 90 %. Under different lighting conditions (fc
set), the LBP-based approach also achieved the best performance, with a
recognition rate of 79 % against 65 %, 37 % and 42 % for PCA, BIC and EBGM,
respectively. The relatively poor results on the fc set confirm that illumination
change is still a challenge to face recognition. Additionally, recognising duplicate
faces (photos taken later in time) is another challenge, although
our proposed method performed better than the others.
To assess the performance of the LBP-based method on different datasets, we
also considered the ORL face database [14]. The experiments not only confirmed the
validity of our approach, but also demonstrated its relative robustness against
changes in alignment.
Analyzing the different parameters for extracting the face representation, we
noticed a relative insensitivity to the choice of the LBP operator and region
size. This is an interesting result, since the other considered approaches are more
sensitive to their free parameters. Moreover, only simple calculations are
needed for the LBP description, while some other methods use exhaustive training
to find their optimal parameters.
In deriving the face representation, we divided the face image into several
regions. We used only rectangular regions, each of the same size, but other divisions are also possible: regions of different sizes and shapes could be used.
To improve our system, we analyzed the importance of each region. This is motivated by psychophysical findings which indicate that some facial features
(such as the eyes) play a more important role in face recognition than others
(such as the nose). Thus we calculated and assigned weights from 0 to 4 to the
regions (see Figure 5 (b)). Although this simple approach was adopted to
compute the weights, improvements were still obtained. We are currently investigating approaches for dividing the image into regions and finding more optimal
weights for them.
Although we clearly showed the simplicity of LBP-based face representation
extraction and its robustness with respect to facial expression, aging, illumination and alignment, some improvements are still possible. For instance, one
drawback of our approach lies in the length of the feature vector used for
face representation. Indeed, a feature vector length of 2301 slows down the
recognition, especially for very large face databases. A possible direction
is to apply dimensionality reduction to the face feature vectors. Given
the good results we have obtained, we also expect that the methodology presented
here is applicable to several other object recognition tasks.
References

1. Phillips, P.J., Grother, P., Micheals, R.J., Blackburn, D.M., Tabassi, E., Bone, J.M.: Face recognition vendor test 2002 results. Technical report (2003)
2. Zhao, W., Chellappa, R., Rosenfeld, A., Phillips, P.J.: Face recognition: a literature survey. Technical Report CAR-TR-948, Center for Automation Research, University of Maryland (2002)
3. Phillips, P.J., Wechsler, H., Huang, J., Rauss, P.: The FERET database and evaluation procedure for face recognition algorithms. Image and Vision Computing 16 (1998) 295–306
4. Turk, M., Pentland, A.: Eigenfaces for recognition. Journal of Cognitive Neuroscience 3 (1991) 71–86
5. Etemad, K., Chellappa, R.: Discriminant analysis for recognition of human face images. Journal of the Optical Society of America 14 (1997) 1724–1733
6. Wiskott, L., Fellous, J.M., Kuiger, N., von der Malsburg, C.: Face recognition by elastic bunch graph matching. IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (1997) 775–779
7. Moghaddam, B., Nastar, C., Pentland, A.: A Bayesian similarity measure for direct image matching. In: 13th International Conference on Pattern Recognition. (1996) II: 350–358
8. Ojala, T., Pietikäinen, M., Mäenpää, T.: Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence 24 (2002) 971–987
9. Ojala, T., Pietikäinen, M., Harwood, D.: A comparative study of texture measures with classification based on feature distributions. Pattern Recognition 29 (1996) 51–59
10. Gong, S., McKenna, S.J., Psarrou, A.: Dynamic Vision, From Images to Face Recognition. Imperial College Press, London (2000)
11. Bolme, D.S., Beveridge, J.R., Teixeira, M., Draper, B.A.: The CSU face identification evaluation system: Its purpose, features and structure. In: Third International Conference on Computer Vision Systems. (2003) 304–311
12. Phillips, P.J., Moon, H., Rizvi, S.A., Rauss, P.J.: The FERET evaluation methodology for face recognition algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (2000) 1090–1104
13. Beveridge, J.R., She, K., Draper, B.A., Givens, G.H.: A nonparametric statistical comparison of principal component and linear discriminant subspaces for face recognition. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition. (2001) I: 535–542
14. Samaria, F.S., Harter, A.C.: Parameterisation of a stochastic model for human face identification. In: IEEE Workshop on Applications of Computer Vision. (1994) 138–142