
Face Identification Using Kinect Technology

Ana-Andreea Nagâţ, Cătălin-Daniel Căleanu


Faculty of Electronics and Telecommunications
POLITEHNICA University of Timişoara
Timişoara, România
[email protected]

Abstract: Face recognition is one of the most widely studied problems in computer science due to advantages such as universality, robustness, permanence and accessibility [1]. It is becoming increasingly important in many applications, including urban surveillance, home security, and healthcare [2]. Our approach aims at human identification using soft biometrics (face, skeleton) extracted from 2D and 3D video sources. The proposed solution is based on low-cost hardware [3].
I. INTRODUCTION
For many human identity recognition applications, facial imagery represents a key criterion. Face recognition is still a vividly researched area because current state-of-the-art person identification systems perform well only in structured environments, where the user presents a frontal view with a neutral expression under consistent lighting; performance degrades sharply with variations in facial expression, position, head pose, or illumination [4].
Most current state-of-the-art facial and body recognition systems are based on 2D images or videos, which offer good performance only for data captured under controlled conditions [5]. As a result, there is currently a shift towards 3D data, which is expected to yield better recognition performance [6]. However, 3D requires more expensive data acquisition systems and more sophisticated processing algorithms. 3D face recognition is nevertheless a promising technology because it is expected to provide higher recognition rates than the two-dimensional approach, e.g. by overcoming the limitations due to viewpoint, shape or lighting variations [7], [8]. In addition, distance and curvature measurements could carry key discriminative information, offering certain advantages over traditional intensity-based techniques.
However, we believe there may be useful information in the 2D image that is not in the 3D shape, such as skin color, freckles, and similar features. The appropriate question may therefore not be 3D versus 2D, but how best to combine 3D and 2D. In our paper, using combined skeletal tracking and depth information, we obtain the 2D image of the face region. All these features are provided by a low-cost 3D acquisition system, the Kinect sensor [3]. This information is further processed using standard image processing (PCA feature extraction) and machine learning (distance-based classification) techniques.
The remainder of this paper is organized as follows. Section II briefly describes the Kinect technology: its working principle, main features and programming aspects. Section III details our approach to face-based human identification with respect to image acquisition, pre-processing, feature extraction and classification. Section IV presents experimental results. Section V proposes further possible improvements along with some concluding remarks.
II. THE KINECT TECHNOLOGY
Today, compact and relatively inexpensive systems for capturing 3D images with high accuracy are available. Stereo vision, a method of rendering objects with added depth information, is arguably the most used approach. Despite numerous attempts [9] - [11], it still has some major disadvantages, e.g. the correspondence problem.
Currently, Time-of-Flight (ToF) methods are intensively studied and implemented in various hardware solutions [12] - [14]. Here, the distance is computed for each point of the image from the propagation time of the light beam between the camera and the scene's objects. Typical examples of such 3D video cameras are Mesa Imaging's SwissRanger or the PMD[vision] CamCube 3.0 [15]. Still, the hardware price is too high for consumer appliances.
A third principle, called structured light, projects a narrow band of light onto a 3D shape while a camera observes the scene [16] - [18]. Because of the varying distance between objects and the light source, the appearance of the light band is deformed, which makes it possible to compute depth information for the scene. This principle is used in some 3D sensors, e.g. the Microsoft Kinect. This range sensor was employed in our work mainly because it is inexpensive and widely available.
A. The Hardware Platform
Fig. 1 depicts the main external components of the Kinect sensor. As outputs it has an infrared structured-light laser projector, a LED indicator, and a motorized tilt base; as inputs, four microphones, two cameras (RGB and IR), and one accelerometer [19], [20].
B. The Software Platform
The Kinect Software Development Kit (SDK) represents the software part of the Kinect for Windows package. Microsoft Visual Studio 2010 and its various supported programming languages, e.g. C#, Visual Basic or C++, may be used for developing SDK-based applications. The newly released Kinect for Windows SDK version 1.6 (October 2012) offers improved skeletal information, high-quality speech recognition, and support for up to four Kinect devices connected to a single computer [21], [22].
The Kinect for Windows SDK provides the tools and APIs, both native and managed, that one needs to develop Kinect-enabled applications for Microsoft Windows [23].
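
As an illustration, a minimal SDK v1.x initialization sketch is given below; the specific stream formats are our assumptions, since the paper does not list them.

```csharp
// Minimal Kinect SDK v1.x initialization sketch (stream formats assumed).
using System.Linq;
using Microsoft.Kinect;

class KinectSetup
{
    public static KinectSensor StartSensor()
    {
        // Pick the first connected sensor.
        KinectSensor sensor = KinectSensor.KinectSensors
            .FirstOrDefault(s => s.Status == KinectStatus.Connected);

        sensor.ColorStream.Enable(ColorImageFormat.RgbResolution640x480Fps30);
        sensor.DepthStream.Enable(DepthImageFormat.Resolution640x480Fps30);
        sensor.SkeletonStream.Enable();   // up to six detected, two fully tracked

        sensor.AllFramesReady += (s, e) => { /* process color/depth/skeleton frames */ };
        sensor.Start();
        return sensor;
    }
}
```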



Figure 1. The Kinect sensor architecture.
III. HUMAN IDENTIFICATION PROCEDURE
In order to perform person identification using soft
biometrics we developed an application which is able to:

- Capture, record and store color/depth data
- Play the recorded data
- Perform 3D skeletal tracking
- Generate 2D facial images from the scene
- Perform human identification
- Implement speech recognition

Persons standing in front of the sensor are detected using the skeletal tracking function, which can detect up to six persons and fully track at most two of them. The data from skeletal tracking are provided to the application as a set of joint points such as head, elbow, spine, hand, foot, knee etc.
The head point of the skeleton is used to detect the person's face. Once the algorithm has located the head joint, the depth data are used to map it into the image: from the depth data of the skeleton point we retrieve the X and Y coordinates of the head. The color data are then copied according to these X and Y coordinates, determined earlier from the skeleton and depth data of the Kinect sensor, so that only the face of the person is copied into binary array files.
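
A hedged sketch of this face-region extraction is shown below, using the Kinect SDK v1.4 mapping API (SDK 1.6 moves this call onto a CoordinateMapper object); the crop-size scaling constant is an illustrative assumption, not a value from the paper.

```csharp
// Sketch of the face-region extraction (Kinect SDK v1.4 mapping API; the
// crop-size scaling constant is a hypothetical value for illustration).
using System.Drawing;
using Microsoft.Kinect;

class FaceRegionExtractor
{
    public static Rectangle? GetFaceRegion(KinectSensor sensor, Skeleton skeleton)
    {
        Joint head = skeleton.Joints[JointType.Head];
        if (head.TrackingState != JointTrackingState.Tracked)
            return null;

        // Map the 3D head joint into 2D color-image coordinates.
        ColorImagePoint p = sensor.MapSkeletonPointToColor(
            head.Position, ColorImageFormat.RgbResolution640x480Fps30);

        // A closer head (smaller Z, in meters) covers more pixels, so scale the crop.
        int size = (int)(120 / head.Position.Z);   // hypothetical scaling constant
        return new Rectangle(p.X - size / 2, p.Y - size / 2, size, size);
    }
}
```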
Then, as preprocessing steps, we first normalize the current face image taken in real time by the Kinect sensor to 100x100 pixels and convert it to 8-bit gray scale. We also apply histogram equalization to the gray-scale images. This increases the global contrast of the images, especially when the usable data of the image are represented by close contrast values, and allows areas of lower local contrast to gain a higher contrast. Histogram equalization accomplishes this by effectively spreading out the most frequent intensity values [24].
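
A minimal sketch of this preprocessing chain with Emgu CV 2.x is given below; method names may differ across Emgu CV versions.

```csharp
// Preprocessing sketch with Emgu CV 2.x (API names may vary by version).
using Emgu.CV;
using Emgu.CV.Structure;

class FacePreprocessor
{
    public static Image<Gray, byte> Preprocess(Image<Bgr, byte> face)
    {
        // Convert to 8-bit grayscale and normalize to 100x100 pixels.
        Image<Gray, byte> gray = face
            .Convert<Gray, byte>()
            .Resize(100, 100, Emgu.CV.CvEnum.INTER.CV_INTER_LINEAR);

        // Spread out the most frequent intensity values to raise global contrast.
        gray._EqualizeHist();
        return gray;
    }
}
```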
The recognition algorithm uses a class implemented in Emgu CV [25], called EigenObjectRecognizer. With its help, the faces existing in the database are compared with the ones received in real time from the Kinect sensor.
OpenCV (Open Source Computer Vision Library) contains a set of useful functions for the fields of computer vision and machine learning; it was developed by Intel and is now supported by Willow Garage. The library is cross-platform, targets real-time digital image processing, and is free for use under certain license conditions [26], [27].
Emgu CV allows calling OpenCV routines from .NET-compatible languages, e.g. C# [28]; it is thus a cross-platform .NET wrapper around the OpenCV set of functions.
The EigenObjectRecognizer class from Emgu CV, used for the recognition algorithm, creates an object recognizer from specific training data and parameters, following a procedure similar to [29] and [30]. A new face image is transformed into its eigenface components by a simple operation:
$$\omega_n = u_n^T (\Gamma - \Psi) \qquad (1)$$

for $n = 1, \dots, M$, where $M$ is the number of most significant eigenvectors and $\Psi$ is the average face of the set, defined by

$$\Psi = \frac{1}{M} \sum_{n=1}^{M} \Gamma_n ,$$

and the $u_n$ are the eigenfaces:

$$u_n = \sum_{k=1}^{M} v_{nk} \Phi_k , \quad n = 1, \dots, M \qquad (2)$$

where $\Phi_k = \Gamma_k - \Psi$ are the mean-subtracted training images collected in the matrix $A = [\Phi_1 \; \Phi_2 \; \dots \; \Phi_M]$ and $v_{nk}$ are the components of the eigenvectors of $A^T A$. The weights form a vector $\Omega^T = [\omega_1, \omega_2, \dots, \omega_M]$ that describes the contribution of each eigenface in representing the input face image. Further, the method determines which face class provides the best description of an input face image by finding the face class $k$ that minimizes the Euclidean distance:

$$\epsilon_k^2 = \left\| \Omega - \Omega_k \right\|^2 \qquad (3)$$

where $\Omega_k$ is the vector describing the $k$th face class.
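
As an illustration only (not the paper's implementation), equations (1)-(3) translate into a few lines of C#; all names in this sketch are hypothetical.

```csharp
// Illustrative translation of equations (1)-(3): project a face onto the
// eigenfaces and pick the class with the smallest Euclidean distance.
class EigenfaceClassifier
{
    public static int Classify(double[][] eigenfaces, double[] meanFace,
                               double[][] classWeights, double[] face)
    {
        int M = eigenfaces.Length;
        double[] w = new double[M];
        for (int n = 0; n < M; n++)                   // eq. (1): w_n = u_n^T (G - Psi)
            for (int i = 0; i < face.Length; i++)
                w[n] += eigenfaces[n][i] * (face[i] - meanFace[i]);

        int best = -1;
        double bestDist = double.MaxValue;
        for (int k = 0; k < classWeights.Length; k++) // eq. (3): min_k ||W - W_k||^2
        {
            double d = 0;
            for (int n = 0; n < M; n++)
            {
                double diff = w[n] - classWeights[k][n];
                d += diff * diff;
            }
            if (d < bestDist) { bestDist = d; best = k; }
        }
        return best;
    }
}
```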
The recognizer will always return the most similar object. Its parameters are the images used for training, each of the same size, the labels corresponding to the images, and the criteria for recognizer training. The algorithm returns the name of the person from the database having the most similarity to the one in front of the sensor. If the person is not recognized (not enough similarity is found), the algorithm returns an empty string, meaning the person is unknown. In order to increase the recognition rate, multiple faces (from different angles) of the same person can be added to the database.
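
A hedged usage sketch of this class in Emgu CV 2.x follows; the threshold and criteria values are illustrative assumptions, and in some Emgu CV versions Recognize returns a RecognitionResult object rather than a string.

```csharp
// Hedged sketch of the recognition step (Emgu CV 2.x; threshold and criteria
// values are illustrative assumptions, not the paper's settings).
using Emgu.CV;
using Emgu.CV.Structure;

class FaceIdentifier
{
    public static string Identify(Image<Gray, byte>[] trainingFaces,
                                  string[] labels,
                                  Image<Gray, byte> currentFace)
    {
        // Termination criteria for the PCA computation.
        MCvTermCriteria criteria = new MCvTermCriteria(16, 0.001);

        // With a positive eigen-distance threshold, Recognize returns the empty
        // string when no database face is similar enough (unknown person).
        EigenObjectRecognizer recognizer = new EigenObjectRecognizer(
            trainingFaces, labels, 2500, ref criteria);

        return recognizer.Recognize(currentFace);
    }
}
```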
IV. EXPERIMENTAL EVALUATION
The human identity recognition application was implemented using Microsoft Visual Studio 2010 and the Microsoft Kinect SDK v1.4. In the following we refer only to the face-based identification part of our system.
The first component of the application displays the Kinect data as follows: color data (top-left corner), depth data (top-right corner), skeletal tracking data (bottom-left corner) and the sensor's elevation angle (bottom-right corner), as depicted in Fig. 2.
When the person is unknown, a label reading "Unknown person" appears above the person's head. The person remains unknown until he or she is added into the database and the recognition algorithm is started (Fig. 3).
In order to add an unknown person to the database, the person's name must be filled in the corresponding text box and the "Add face" button pressed. The person then exists in the database and can be recognized by the application (Fig. 4). The name of the recognized person appears above his or her head and also on the right side of the application window.
It is important that the person to be recognized stands at the optimum distance from the sensor, that is, between 1 m and 3 m; the optimum distance is about 1.5 m.
The application shows real-time capabilities when running on a mid-level laptop, an HP Compaq 8510w with an Intel Core 2 Duo T7500 processor, 2 GB DDR2 RAM and a 256 MB NVIDIA Quadro FX 570M GPU, under Microsoft Windows 7 32-bit.
A video sequence and the final application can be downloaded from http://www.ea.etc.upt.ro/Kinect.html
V. CONCLUSION AND FUTURE RESEARCH DIRECTIONS
This paper has presented a real-time system performing face-based human identification using the Kinect technology. A solution was proposed for obtaining the face images by combining the 3D information provided by the skeletal tracker with the 2D information provided by the RGB camera. An appearance-based method (eigenfaces, a PCA approach) in conjunction with a distance-based classifier was then chosen to implement the feature extraction and classification stages. To perform these operations, a .NET wrapper was employed to enable calling OpenCV image processing library functions.
Future work should address the possibility of using the newly introduced Microsoft Face Tracking SDK, with which the head pose and the facial expression can be computed in real time (Fig. 5).
We would also like to improve the classification stage by following recent trends in computational intelligence: the use of biologically inspired architectures, e.g. reservoir computing [31], liquid state machines (LSM) [32] and echo state networks (ESN) [33]. For example, [34] and [35] present face recognition and facial expression applications that use ESN and LSM architectures respectively, achieving high recognition rates and robustness to noise. These approaches use either 2D or 3D information for recognition.
Lastly, we would like to include other biometric traits
(skeleton, voice, hair color, eye color and skin color, as
well as the existence of beard, moustache and glasses) for
further increasing the human identification accuracy [37].





Figure 2. Kinect data acquisition.




Figure 3. Detected person not present in the facial database.




Figure 4. Successfully recognized person. In order to increase the
recognition rate, multiple faces of the same person could be added to the
database.




Figure 5. SDK Face Tracking offering a Candide-3 3D face model
[36].

REFERENCES
[1] J. Yang, L. Nanni (Eds.), State of the Art in Biometrics, InTech, July 2011.
[2] W. Lin, Video Surveillance, InTech, 2011.
[3] Kinect for Windows, http://www.microsoft.com/en-us/kinectforwindows
[4] W. Zhao, R. Chellappa, Face Processing: Advanced Modelling and Methods, Academic Press, 2005.
[5] Z. Lei, S. Liao, M. Pietikäinen, S. Z. Li, "Face Recognition by Exploring Information Jointly in Space, Scale and Orientation," IEEE Transactions on Image Processing, vol. 20, no. 1, pp. 247-256, Jan. 2011.
[6] L. A. Schwarz, A. Mkhitaryan, D. Mateus, N. Navab, "Estimating Human 3D Pose from Time-of-Flight Images Based on Geodesic Distances and Optical Flow," IEEE Conference on Automatic Face and Gesture Recognition (FG), Santa Barbara, USA, March 2011.
[7] G. Medioni and R. Waupotitsch, "Face recognition and modelling in 3D," IEEE International Workshop on Analysis and Modelling of Faces and Gestures (AMFG 2003), October 2003, pp. 232-233.
[8] C. Hesher, A. Srivastava, and G. Erlebacher, "A novel technique for face recognition using range images," Seventh International Symposium on Signal Processing and Its Applications, 2003.
[9] R. Labayrade, D. Aubert, J.-P. Tarel, "Real time obstacle detection in stereovision on non flat road geometry through 'v-disparity' representation," Intelligent Vehicle Symposium, vol. 2, 17-21 June 2002, pp. 646-651.
[10] T. Lyes, K. Hawick, "Implementing stereo vision of GPU-accelerated scientific simulations using commodity hardware," in Proc. International Conference on Computer Graphics and Virtual Reality (CGVR'11), no. CGV4047, Las Vegas, USA, 2011, pp. 76-82.
[11] J. Wang, P. Huang, C. Chen, W. Gu, J. Chu, "Stereovision aided navigation of an Autonomous Surface Vehicle," 3rd International Conference on Advanced Computer Control (ICACC), 18-20 Jan. 2011, pp. 130-133.
[12] E. N. K. Kollorz, J. Penne, J. Hornegger, A. Barke, "Gesture recognition with a Time-Of-Flight camera," International Journal of Intelligent Systems Technologies and Applications, vol. 5, no. 3/4, pp. 334-343, doi 10.1504/IJISTA.2008.021296, 2008.
[13] M. Van den Bergh, L. Van Gool, "Combining RGB and ToF cameras for real-time 3D hand gesture interaction," IEEE Workshop on Applications of Computer Vision (WACV), 2011.
[14] P. Breuer, C. Eckes, S. Müller, "Hand Gesture Recognition with a novel IR Time-of-Flight Range Camera - A pilot study," Proceedings of Mirage 2007, Computer Vision / Computer Graphics Collaboration Techniques and Applications, pp. 247-260.
[15] PMD[vision] CamCube 3.0, http://www.pmdtec.com/products-services/pmdvisionr-cameras/pmdvisionr-camcube-30
[16] J. Salvi, S. Fernandez, T. Pribanic, X. Llado, "A state of the art in structured light patterns for surface profilometry," Pattern Recognition, vol. 43, no. 8, pp. 2666-2680, August 2010.
[17] D. Scharstein, R. Szeliski, "High-Accuracy Stereo Depth Maps Using Structured Light," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2003, pp. 195-202.
[18] M. Gupta, A. Agrawal, A. Veeraraghavan, S. G. Narasimhan, "Structured light 3D scanning in the presence of global illumination," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 20-25 June 2011, pp. 713-720.
[19] J. Kramer, N. Burrus, F. Echtler, D. Herrera, M. Parker, Hacking the Kinect, Apress, 2012.
[20] S. Kean, Y. C. Hall, P. Perry, Meet the Kinect, Apress, 2011.
[21] J. Webb, J. Ashley, Beginning Kinect Programming with the Microsoft Kinect SDK, Apress, 2012.
[22] D. Catuhe, Programming with the Kinect for Windows Software Development Kit, Microsoft Press, 2012.
[23] Kinect SDK, http://www.microsoft.com/en-us/kinectforwindows/develop/developer-downloads.aspx
[24] T. Acharya and A. Ray, Image Processing: Principles and Applications, New York: Wiley, 2005.
[25] Emgu CV, http://www.emgu.com/wiki/index.php/Main_Page
[26] G. Bradski, A. Kaehler, Learning OpenCV, O'Reilly, 2008.
[27] G. Bradski, Using the Kinect Depth Camera with OpenCV, http://opencv.willowgarage.com/wiki/Kinect
[28] D. Solis, Illustrated C# 2012, Apress, 2012.
[29] M. Turk and A. Pentland, "Eigenfaces for recognition," Journal of Cognitive Neuroscience, vol. 3, no. 1, 1991.
[30] M. Turk and A. Pentland, "Face recognition using eigenfaces," Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 1991, pp. 586-591.
[31] K. Vandoorne, J. Dambre, D. Verstraeten, B. Schrauwen, P. Bienstman, "Parallel Reservoir Computing Using Optical Amplifiers," IEEE Transactions on Neural Networks, vol. 22, no. 9, pp. 1469-1481, 2011.
[32] H. Jaeger, W. Maass, J. Principe, "Introduction to the special issue on echo state networks and liquid state machines," Neural Networks, vol. 20, no. 3, pp. 287-289, 2007.
[33] S. P. Chatzis, Y. Demiris, "Echo State Gaussian Process," IEEE Transactions on Neural Networks, vol. 22, no. 9, pp. 1435-1445, Sept. 2011.
[34] A. Woodward, T. Ikegami, "A Reservoir Computing approach to Image Classification using Coupled Echo State and Back-Propagation Neural Networks," International Conference on Image and Vision Computing New Zealand (IVCNZ 2011), 2011.
[35] B. J. Grzyb, E. Chinellato, G. M. Wojcik, A. Kaminski, "Facial Expression Recognition based on Liquid State Machines built of Alternative Neuron Models," Proceedings of the International Joint Conference on Neural Networks, Atlanta, Georgia, USA, June 14-19, 2009.
[36] J. Ahlberg, CANDIDE - a parameterized face, http://www.icg.isy.liu.se/candide/
[37] A. Dantcheva, J.-L. Dugelay, P. Elia, "Person Recognition using a bag of facial soft biometrics (BOFSB)," IEEE International Workshop on Multimedia Signal Processing (MMSP 2010), October 4-6, 2010.
