Control of Facial Expressions of the Humanoid Robot Head ROMAN
Karsten Berns
Robotics Research Lab
Department of Computer Science
University of Kaiserslautern Germany
Email: [email protected]
Jochen Hirth
Robotics Research Lab
Department of Computer Science
University of Kaiserslautern Germany
Email: j [email protected]
Abstract - For humanoid robots which are able to assist humans in their daily life, the capability for adequate interaction with human operators is a key feature. If one considers that more than 60% of human communication is conducted non-verbally (by using facial expressions and gestures), an important research topic is how interfaces for this non-verbal communication can be developed. To achieve this goal, several robotic heads have been designed. However, it remains unclear how exactly such a head should look and what skills it should have to be able to interact properly with humans. This paper describes an approach that aims at answering some of these design questions. A behavior-based control to realize facial expressions, which is a basic ability needed for interaction with humans, is presented. Furthermore, the results of a survey in which the generated facial expressions had to be identified are reported. Additionally, the mechatronic design of the head and the accompanying neck joint is described.
Index Terms - humanoid robot head, facial expressions, mechanical design, behavior-based control
I. INTRODUCTION
Worldwide, several research projects focus on the development of humanoid robots. Especially for the head design there is an ongoing discussion whether it should look like a human head or whether a more technically optimized head construction [1], [2] should be developed. The advantage of a technical head is that there are no restrictions on design parameters like head size or shape, which reduces the effort for the mechanical construction. On the other hand, if realistic facial expressions are to be used to support communication between a robot and a person, human likeness could increase the performance of the system. The aim of the humanoid head project of the University of Kaiserslautern is to develop a complex robot head that is able to simulate the facial expressions of humans while perceiving its environment with a sensor system (stereo-camera system, artificial nose, several microphones, ...) similar to the senses of a human. At the same time, the robot head should look like a human head in order to examine whether its performance in non-verbal communication is higher compared to technical heads. In the following, a behavior-based control architecture for realizing facial expressions is introduced. Starting from the state of research, new goals for the realization of facial expressions are defined. Second, the mechatronic system of ROMAN, including the neck and eye constructions, is described. Then the behavior-based software architecture of ROMAN is presented. Based on this control approach, an experimental evaluation is carried out in which the facial expressions of ROMAN have to be identified. The results of this evaluation are presented at the end of the paper.
Fig. 1. The humanoid robot head ROMAN (ROMAN = RObot huMan
interAction machiNe) of the University of Kaiserslautern.
II. STATE OF RESEARCH
Several projects worldwide focus on the implementation of facial expressions on different types of heads (e.g. see [3]). Kismet is arguably one of the most advanced projects in the area of human-robot interaction [4], [5], [2]. Kismet is able to show different types of emotions by using facial expressions. The main disadvantage of the selected approach is the way a facial expression is activated. Each facial-expression behavior is related to a so-called releaser, which calculates the activation of the corresponding facial-expression behavior. The behavior with the highest activation determines the resulting facial expression of the robot, i.e. a winner-takes-all concept is used. In order to show more than a certain number of predefined facial expressions, it is necessary to fuse the single facial expressions. With this fusion it is also possible to show graduated facial expressions and to obtain smooth transitions between the different expressions. A fusion of the determined control parameters is also necessary with regard to the use of speech and the corresponding movements of the mouth; otherwise the robot would change its facial expression when it starts to talk, or new specific behaviors would have to be implemented for the combination of speech and facial expressions.
Another project concerning facial expressions is WE-4RII of the Takanishi Lab of Waseda University [1], [6]. For the realization of facial expressions, a 3-dimensional emotion map is used. In this map, areas are defined for 6 emotions. If the input value is located in the area of a certain emotion, the corresponding facial expression is presented by the robot. This activation scheme is the main disadvantage of the approach, because it only allows a predefined number of facial expressions to be shown. A better approach should be able to show more than a predefined number of facial expressions. This could be realized with a mixture of the different facial expressions and with a graduated activation of the different expressions.
The Alpha project of the University of Freiburg [7], [8] realizes an approach that uses a fusion of the different facial expressions. A disadvantage of this approach, however, is that the head of Alpha has only a few degrees of freedom for showing facial expressions.
All these considerations lead to the following questions:
- What should the architecture of such a behavior-based control for facial expressions look like?
- How should the activation of such a behavior work?
- How should the fusion of the facial expressions be calculated?
- How can influence between different facial expressions be prevented?
III. MECHATRONICS OF ROMAN
Fig. 2. Mechanical construction of the humanoid robot head
Mechanics - The mechanics of the head consists of a basic unit (cranial bone) including the lower jaw, the neck, and the motor unit for the artificial eyes. Apart from the eye construction, which is currently being built in our mechanical workshop, all mechatronic components are installed in the head. In the basic unit, 8 metal plates, which can be moved via wires, are glued onto the silicone skin at positions where Ekman's action units are located. The plate areas as well as their fixing positions on the skin and the directions of their movement were optimized in a simulation system according to the basic emotions which should be expressed. As actuators, 10 servo motors are used to pull and push the wires. Additionally, one servo motor is used to raise and lower the lower jaw.
For the neck construction, a concept was selected where the motions are realized by a kinematic chain with 4 DOF. For the design of the neck, basic characteristics of the geometry, kinematics and dynamics of a human neck were considered. From the analysis of the human neck, a ball joint could be selected for the construction of the 3 DOF for the basic motion. Unfortunately, it is very hard to design an appropriate driving system for such a solution. Therefore it was decided to approximate the kinematic functions of the human neck by a serial chain similar to a Cardan joint. The first degree of freedom is the rotation about the vertical axis; the range of this rotation for the artificial neck was set to ±60°. In addition, a 4th joint is used for nodding the head (range of motion ±40°).
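For illustration, these joint ranges can be captured in a small configuration structure. The following is only a sketch under the assumption of the two ranges given above (±60° pan, ±40° nod); the names are hypothetical and not taken from the actual controller software.

```cpp
#include <algorithm>

// Sketch only: joint ranges for the two neck DOF whose limits are given in
// the text (rotation about the vertical axis and the nodding joint).
struct JointLimit { double min_deg, max_deg; };

constexpr JointLimit kPanLimit = {-60.0, 60.0};  // 1st DOF, +/-60 degrees
constexpr JointLimit kNodLimit = {-40.0, 40.0};  // 4th DOF, +/-40 degrees

// Clamp a commanded joint angle to its mechanical range.
inline double clampAngle(double angle_deg, const JointLimit& limit) {
    return std::clamp(angle_deg, limit.min_deg, limit.max_deg);
}
```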
Sensor system - Besides the encoders fixed on the DC motors for the neck and the eye movements, an inertial system will be integrated in the head which measures the angular velocity and the acceleration about all 3 axes. This gives an estimation of the pose of the head. Two microphones (fixed in the ears) and a loudspeaker are also included in the head. The main sensor system for the interaction with humans is the stereo-vision system, which consists of two Dragonfly cameras. Several experiments have been performed with it, such as the detection of a human head (see [9]). At the moment, the integration of the stereo camera system into the eye design of the head is under development.
Control architecture - The control of the servo and DC motors as well as the determination of the pose from the inertial system is done with a DSP (Motorola 56F803) connected to a CPLD (Altera EPM7128). In total, 5 of these computing units are installed in the head: one for the inertial system, one for the stepping motors of the eyes, two for the 4 DC motors of the neck and one for the 11 servo motors which move the skin. These computing units are connected via CAN bus to an embedded PC. The two microphones and the loudspeaker are connected to the sound card of the embedded PC. The cameras, which are included in the eye construction, use the FireWire (IEEE 1394) input channel of the embedded PC (see figure 3).
The calculation of the movements for the different facial expressions is done on a Linux PC. The behavior-based control is implemented with the help of the Modular Controller Architecture (MCA). MCA is a modular, network-transparent and realtime-capable C/C++ framework for controlling robots (see [10] and [11] for details). MCA is conceptually based on modules and edges between them. Modules may be organized in module groups, which allows the implementation of hierarchical structures.
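As a rough illustration of the module-and-edge idea, a module can be thought of as a unit with input and output values that a group wires together and executes. The sketch below is invented for this paper's explanation and does not reproduce the real MCA class interfaces.

```cpp
#include <memory>
#include <vector>

// Illustrative only: a stripped-down module/group structure in the spirit of
// "modules connected by edges"; names and interfaces are assumptions and do
// not correspond to the actual MCA classes.
class Module {
public:
    virtual ~Module() = default;
    virtual void control() = 0;            // executed once per control cycle
    std::vector<double> inputs, outputs;   // values transported along edges
};

// A group runs its child modules and can itself be nested inside a larger
// group, which gives the hierarchical structure mentioned above.
class ModuleGroup : public Module {
public:
    void add(std::unique_ptr<Module> m) { children_.push_back(std::move(m)); }
    void control() override {
        for (auto& m : children_) m->control();
    }
private:
    std::vector<std::unique_ptr<Module>> children_;
};
```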
Fig. 3. Computer architecture of ROMAN.
IV. CONCEPT FOR THE BEHAVIOR BASED CONTROL OF
FACIAL EXPRESSIONS
In the following, our control approach is presented. To reach the above-mentioned goals, a behavior-based approach was selected. The design of a single behavior node is shown in figure 4. The basic concept of our behavior nodes is presented in [12].
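To make figure 4 more concrete, the following sketch models the signals of a single behavior node (activity a, target rating r, activation, inhibition i and transfer function F). It is a simplified, illustrative reading of the node described in [12], not its original implementation; the gating rule inside transfer() is an assumption.

```cpp
#include <vector>

// Simplified behavior node following the signal names of figure 4: the
// activation and the inhibition gate the transfer function F(e, activation,
// inhibition), which produces the output u; activity a and target rating r
// are reported as meta-information.
struct BehaviorNode {
    double activation = 0.0;  // set externally (e.g. from the emotion map)
    double inhibition = 0.0;  // in [0, 1]; 1 suppresses the node completely
    double activity   = 0.0;  // a: how strongly the node currently acts
    double rating     = 0.0;  // r: how content the node is with the situation

    // F(e, activation, inhibition): compute the output from the input vector.
    std::vector<double> transfer(const std::vector<double>& e) {
        const double gain = activation * (1.0 - inhibition);
        std::vector<double> u;
        u.reserve(e.size());
        for (double ek : e) u.push_back(gain * ek);
        activity = gain;  // simplest possible choice for this sketch
        return u;
    }
};
```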
There are 6 basic facial expressions (anger, disgust, fear, happiness, sadness and surprise). Every expression is related to one behavior node. The control of these 6 behaviors is done similarly to the Kismet project [2] by a 3-dimensional input vector (A, V, S) (A = arousal, V = valence, S = stance). This vector is represented by a point in the emotion map, which is a cube. In this cube, every basic facial expression is also represented by a point. The activation $\iota_i$ of facial expression $i$ is calculated according to equation 1, where $diag$ stands for the diagonal of the cube, which is the maximum possible distance between two points, $P_i$ is the point that represents facial expression $i$, and $I$ is the input vector.

\iota_i = \frac{diag - |P_i - I|}{diag} \qquad (1)
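A direct transcription of equation 1 could look as follows. Only the distance-based formula itself is taken from the text; the cube coordinates and the reference point in the usage comment are made-up placeholders, since the paper does not list the exact point coordinates.

```cpp
#include <cmath>

// A point in the (Arousal, Valence, Stance) emotion cube.
struct AVS { double a, v, s; };

// Euclidean distance between two points in the cube.
inline double distance(const AVS& p, const AVS& q) {
    return std::sqrt((p.a - q.a) * (p.a - q.a) +
                     (p.v - q.v) * (p.v - q.v) +
                     (p.s - q.s) * (p.s - q.s));
}

// Equation 1: activation of facial expression i from the input vector I.
// 'diag' is the diagonal of the cube, i.e. the largest possible distance.
inline double activation(const AVS& P_i, const AVS& I, double diag) {
    return (diag - distance(P_i, I)) / diag;
}

// Example: a cube spanning [-1, 1]^3 has diagonal 2 * sqrt(3); the reference
// point used here for "happiness" is a placeholder.
// double iota = activation({0.5, 0.8, 0.3}, input, 2.0 * std::sqrt(3.0));
```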
Fig. 4. A single behavior node: a = the activity, r = the target rating, $\iota$ = the activation, e = the input, i = the inhibition, F(e, $\iota$, i) = the transfer function and u = the output calculated by the transfer function.
Fig. 5. Concept of the behavior-based control. A facial expression behavior consists of two steps: the first step considers the time-dependent character of facial expressions, the second step realizes the movements of the face.
Figure 5 shows the concept of the behavior-based architecture. Every facial expression consists of two steps. The first step receives the activation calculated from the input vector (A, V, S). It determines a function, act(t, $\iota$), for the activation depending on time (equation 2 and figure 6). In equation 2, $g$ stands for the gradient, $h$ for the time the maximum value is held, $n$ for the negative gradient and $p$ for the percentage of the maximum value that is held until the input falls below a certain threshold. The resulting functions for the 6 basic facial expressions are shown in figure 7.

act(t, \iota) =
\begin{cases}
\iota \, g \, t & \text{if } t \le \frac{1}{g} \text{ and } \iota > 0.1,\\
\iota & \text{if } \frac{1}{g} < t \le \frac{1+h}{g} \text{ and } \iota > 0.1,\\
\iota + n \, t & \text{if } \frac{1+h}{g} < t < \frac{1+h}{g} + \frac{p-1}{n} \text{ and } \iota > 0.1,\\
\iota \, p & \text{if } t \ge \frac{1+h}{g} + \frac{p-1}{n} \text{ and } \iota > 0.1,\\
\max(\iota \, p + n \, t,\, 0) & \text{if } \frac{1+h}{g} + \frac{p-1}{n} < t < -\frac{p}{n} \text{ and } \iota \le 0.1,\\
0 & \text{else.}
\end{cases} \qquad (2)
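Read literally, the reconstructed equation 2 can be transcribed as the following function. The per-expression parameter values for g, h, n and p are not given numerically in the text, so they are left as arguments.

```cpp
#include <algorithm>

// Time profile of an activated facial expression (equation 2).
// iota: activation from the emotion map, g: rising gradient, h: hold time
// factor, n: negative gradient (n < 0), p: fraction of the maximum that is
// kept while the input stays above the threshold of 0.1.
double act(double t, double iota, double g, double h, double n, double p) {
    const double rise_end  = 1.0 / g;                    // end of the rising phase
    const double hold_end  = (1.0 + h) / g;              // end of the plateau
    const double decay_end = hold_end + (p - 1.0) / n;   // end of the decay phase

    if (iota > 0.1) {
        if (t <= rise_end) return iota * g * t;          // ramp up
        if (t <= hold_end) return iota;                  // hold the maximum
        if (t < decay_end) return iota + n * t;          // linear decay phase
        return iota * p;                                 // sustained level
    }
    // Input has fallen below the threshold: fade out completely.
    if (t > decay_end && t < -p / n) return std::max(iota * p + n * t, 0.0);
    return 0.0;
}
```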
Fig. 6. The general function that represents the time-dependent behavior of a facial expression. The parameters that change depending on the facial expression are the gradient g, the time h that the maximum is held, the percentage p and the negative gradient n.
This activation is fed into the second behavior step, which realizes the facial expressions. For this purpose, several action units, similar to [13], are defined, as can be seen in table I. The difference between the action units in [13] and the new ones is that the action units of ROMAN do not only move in one direction; the muscles of ROMAN are able to pull and push. Because of this, the action units used here have two parameters: AU(x, y), where x ∈ {−1, 1} stands for the direction (−1 = push, 1 = pull) and y stands for the strength of the movement. The action units used by the behaviors are shown in table II. The strength $s_i$ of the action unit $AU_i(x_i, y_i)$ caused by facial expression $j$ is calculated as shown in equation 4. The parameter $a_j$ in this equation stands for the activation of facial expression $j$ (see equation 3). The output $u_j$ of facial expression $j$ is:

u_j = (s_1, s_2, s_9, s_{12}, s_{15}, s_{20}, s_{24}, s_{26}).
a_i = act_i(t, \iota_i), \quad i \in [1, \ldots, 6] \qquad (3)

s_i = AU_i(x_i, y_i) \cdot a_j = x_i \cdot y_i \cdot a_j \qquad (4)
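Equations 3 and 4 amount to scaling each action unit of an expression by the expression's current activation. A compact sketch, with container and function names invented for illustration, is:

```cpp
#include <vector>

// One action unit command: direction x in {-1, +1} (push/pull), strength y.
struct ActionUnit {
    int    x;   // -1 = push, +1 = pull
    double y;   // strength of the movement
};

// Equation 4: strength contributed by facial expression j with activation a_j.
inline double strength(const ActionUnit& au, double a_j) {
    return au.x * au.y * a_j;
}

// Output vector u_j of one expression: the strengths of its action units
// (AU1, AU2, AU9, AU12, AU15, AU20, AU24, AU26 in the order of table I).
std::vector<double> expressionOutput(const std::vector<ActionUnit>& aus, double a_j) {
    std::vector<double> u;
    u.reserve(aus.size());
    for (const ActionUnit& au : aus) u.push_back(strength(au, a_j));
    return u;
}
```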
Finally the action unit alignments of every basic facial ex-
pression behavior are fused. This fusion is done by equation 5,
Fig. 7. The activation functions for the facial expressions of the robot head
ROMAN, A = anger, B = disgust, C = fear, D = happiness, E = sadness and
F = surprise.
TABLE I
TABLE OF THE ROBOT HEAD ROMAN'S ACTION UNITS.
Action Unit Number Description
1 raise and lower inner eyebrow
2 raise and lower outer eyebrow
9 nose wrinkle
12 raise mouth corner
15 lower mouth corner
20 stretch lips
24 press lips
26 lower chin
where $u_i$ stands for the action unit alignment of facial expression $i$, $a_i$ for the activation of facial expression $i$ and $u$ for the resulting action unit alignment.

u = \sum_{k=0}^{5} \frac{a_k}{\sum_{j=0}^{5} a_j} \, u_k \qquad (5)
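Equation 5 is an activation-weighted average of the six output vectors. A sketch of this normalization, assuming all output vectors share the layout of table I, is:

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Equation 5: fuse the action-unit alignments u_k of the 6 expressions,
// weighted by their activations a_k and normalized by the total activation.
std::vector<double> fuse(const std::array<std::vector<double>, 6>& u,
                         const std::array<double, 6>& a) {
    double total = 0.0;
    for (double ak : a) total += ak;

    std::vector<double> result(u[0].size(), 0.0);
    if (total <= 0.0) return result;          // no expression active: neutral face

    for (int k = 0; k < 6; ++k)
        for (std::size_t j = 0; j < result.size(); ++j)
            result[j] += (a[k] / total) * u[k][j];
    return result;
}
```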
To prevent influence between different facial expressions, the inhibition option of the behaviors is used. If a facial expression is activated at more than 80%, contrary facial expressions are inhibited (e.g. if happiness is activated at more than 80%, the activation of sadness is set to 0%), and if one facial expression is activated at more than 95%, all other facial expressions are inhibited.
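This inhibition rule can be expressed as a small pre-processing step on the activation values before the fusion. The sketch below only encodes the happiness/sadness pair as "contrary" expressions, since that is the only example given in the text; a complete table of contrary pairs would be needed in practice.

```cpp
#include <array>

// Indices of the basic expressions (same order as figure 7).
enum Expression { ANGER, DISGUST, FEAR, HAPPINESS, SADNESS, SURPRISE };

// Apply the inhibition rules to the activation values a[0..5]:
//  - above 0.8, contrary expressions are suppressed (only the
//    happiness/sadness pair from the text is used here as an example),
//  - above 0.95, every other expression is suppressed.
void applyInhibition(std::array<double, 6>& a) {
    if (a[HAPPINESS] > 0.8) a[SADNESS]   = 0.0;
    if (a[SADNESS]   > 0.8) a[HAPPINESS] = 0.0;

    for (int k = 0; k < 6; ++k) {
        if (a[k] > 0.95) {
            for (int j = 0; j < 6; ++j)
                if (j != k) a[j] = 0.0;
            break;  // keep only the first expression found above the threshold
        }
    }
}
```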
TABLE II
THE ACTION UNIT COMBINATIONS TO SHOW THE 6 BASIC FACIAL
EXPRESSIONS.
Facial Expression Action Unit Combination
Anger AU1(1, y) + AU2(1, y) + AU20(1)
Disgust AU1(1, y) + AU2(1, y) + AU9(1, y) +
AU20(1, y) + AU24(1, y)
Fear AU1(1, y) + AU2(1, y) + AU26(1, y)
Happiness AU12(1, y) + AU26(1, y)
Sadness AU1(1, y) + AU2(1, y) + AU15(1, y)
Surprise AU1(1, y) + AU2(1, y) + AU26(1)
The resulting facial expressions of our robot head ROMAN¹ are shown in figure 8.
Fig. 8. The facial expressions generated with the robot head ROMAN: A =
anger, B = disgust, C = fear, D = happiness, E = sadness and F = surprise.
V. EXPERIMENTAL EVALUATION
The experimental set-up was as follows: we presented 9 pictures and 9 videos with facial expressions of ROMAN to 32 persons (13 women and 19 men) between 21 and 61 years of age. Every person had to rate the correlation between the presented expression and the 6 basic facial expressions on a scale from 1 to 5 (1 meaning a weak and 5 a strong correlation).
The results of the experiment should help to gain more information about the recognition and the presentation of facial expressions. Furthermore, with the help of the results, the facial expressions of ROMAN should be rectified.
¹ The silicone skin of ROMAN was produced and designed by Clostermann Design, Ettlingen.
The program used for the analysis of the evaluation was SPSS² (Statistical Package for the Social Sciences). The results of the evaluation are shown in table III and table IV. The left column contains the shown facial expression. The right column contains the average values of the detected correlation between the 6 basic facial expressions and the current picture or video.
TABLE III
THE RESULTS OF THE EXPERIMENTAL EVALUATION FOR THE PICTURES.
Presented emotion Detected strength
Anger Anger:4.5, Disgust:1.8, Fear:1.5,
Happiness:1.0, Sadness:1.4, Surprise:1.2
Disgust Anger:1.7, Disgust:2.6, Fear:1.0,
Happiness:2.6, Sadness:1.2, Surprise:3.7
Fear Anger:1.4, Disgust:1.8, Fear:3.6,
Happiness:1.5, Sadness:1.8, Surprise:3.8
Happiness Anger:1.1, Disgust:1.0, Fear:1.2,
Happiness:4.3, Sadness:1.0, Surprise:2.3
Sadness Anger:2.2, Disgust:2.3, Fear:2.8,
Happiness:1.0, Sadness:3.9, Surprise:1.3
Surprise Anger:1.3, Disgust:1.3, Fear:2.7,
Happiness:1.4, Sadness:1.6, Surprise:4.2
50% Fear Anger:1.5, Disgust:1.5, Fear:3.0,
Happiness:1.6, Sadness:1.8, Surprise:2.8
Anger, Fear and Disgust Anger:3.0, Disgust:2.4, Fear:2.0,
Happiness:1.0, Sadness:2.5, Surprise:1.4
50% Sadness Anger:2.2, Disgust:2.4, Fear:2.8,
Happiness:1.0, Sadness:3.7, Surprise:1.3
TABLE IV
THE RESULTS OF THE EXPERIMENTAL EVALUATION FOR THE VIDEOS.
Presented emotion Detected strength
Anger Anger:3.7, Disgust:2.1, Fear:2.7,
Happiness:1.4, Sadness:1.7, Surprise:1.7
Disgust Anger:3.0, Disgust:2.6, Fear:1.9,
Happiness:1.0, Sadness:2.3, Surprise:1.5
Fear Anger:1.9, Disgust:1.5, Fear:3.3,
Happiness:2.5, Sadness:1.5, Surprise:4.2
Happiness Anger:1.1, Disgust:1.0, Fear:1.1,
Happiness:4.3, Sadness:1.1, Surprise:2.5
Sadness Anger:1.6, Disgust:1.5, Fear:1.9,
Happiness:1.1, Sadness:3.6, Surprise:1.5
Surprise Anger:1.6, Disgust:1.6, Fear:3.5,
Happiness:1.3, Sadness:1.3, Surprise:4.6
50% Fear Anger:2.0, Disgust:1.8, Fear:3.2,
Happiness:1.4, Sadness:1.8, Surprise:3.5
Anger, Fear and Disgust Anger:1.9, Disgust:1.4, Fear:2.4,
Happiness:1.2, Sadness:3.4, Surprise:1.5
50% Sadness Anger:1.6, Disgust:1.4, Fear:2.0,
Happiness:1.1, Sadness:3.5, Surprise:1.5
The results of the evaluation show that the correct recognition of the facial expressions anger, happiness, sadness and surprise is significant (significance < 5%), but the facial expressions fear and disgust are not identified. Furthermore, the results show that in most cases there are no significant differences between the evaluation of the pictures and the videos (significance < 5%). Since human-machine interaction is continuous, the result of the video experiment is the more important one. The analysis of the understated emotions (fear at 50% activation and sadness at 50% activation) shows that the subjects recognize that the facial expression is not as strong as in the case of 100% activation. The evaluation of the anger, fear and disgust mixture shows that in this case no single facial expression is identified, but the selected facial expressions are named in most cases (similar to the interpretation of comparable human facial expressions). Compared to Ekman's experiments on the recognition of human facial expressions, there was no significant difference in the evaluation of the basic emotions anger, happiness, sadness and surprise. To improve the perception of the facial expressions fear and disgust, it is planned to optimize the action units at the wings of the nose. In addition, new action units of the neck and of the eyes should be used to strengthen these facial expressions [14].
² http://www.spss.com/de/
VI. CONCLUSION
In this paper, a humanoid head construction is introduced which will be used to interact with humans. One focus of the present research is how the facial expressions of a human being can be realized on the robot head ROMAN. From our point of view, the complexity of this problem is reduced if the robot head is human-like. Using a behavior-based control architecture for the robot head ROMAN, facial expressions were realized. Experiments with several persons show that the generated facial expressions are in general classified correctly.
The next steps taken in the course of this work will include the addition of facial expression analysis to the image processing subsystem and, in parallel to this, the completion of the mechatronic design as well as enhancements of the implementation of the behavior-based control concept for interaction with humans.
ACKNOWLEDGMENT
Special thanks to Prof. Dr. Stephan Dutke of the Department of Psychology of the University of Kaiserslautern, who helped us with the experimental evaluation.
REFERENCES
[1] A. Takanishi, H. Miwa, and H. Takanobu, Development of human-
like head robots for modeling human mind and emotional human-robot
interaction, IARP International workshop on Humanoid and human
Friendly Robotics, pp. 104109, December 2002.
[2] C. L. Breazeal, Emotion and sociable humanoid robots, International
Journal of Human-Computer Studies, vol. 59, no. 12, pp. 119155,
2003.
[3] N. Esau, B. Kleinjohann, L. Kleinjohann, and D. Stichling, Mexi -
machine with emotionally extended intelligence: A software architecture
for behavior based handling of emotions and drives, in Proceedings
of the 3rd International Conference on Hybrid and Intelligent Systems
(HIS03), Melbourne, Australia, December 14-17 2003, pp. 961970.
[4] C. Breazeal, Sociable machines: Expressive social exchange between
humans and robots, Ph.D. dissertation, Massachusetts Institute Of
Technology, May 2000.
[5] Kismet, http://www.ai.mit.edu/projects/humanoid-robotics-group/
kismet/kismet.html, 2001.
[6] Emotion expression humanoid robot, http://www.takanishi.mech.
waseda.ac.jp/research/eyes/we-4rII/index.htm, 2005.
[7] M. Bennewitz, F. Faber, D. Joho, M. Schreiber, and S. Behnke, Towards
a humanoid museum guide robot that interacts with multiple persons,
in Proceedings of the IEEE-RAS/RSJ International Conference on Hu-
manoid Robots (Humanoids), Tsukuba, Japan, December 5-7 2005, pp.
418423.
[8] , Enabling a humanoid robot to interact with multiple persons,
in Proceedings of the International Conference on Dextrous Autonomous
Robots and Humanoids (DARH), Yverdon-les-Bains - Switzerland, May
19-22 2005.
[9] K. Berns and T. Braun, Design concept of a human-like robot head,
in Proceedings of the IEEE-RAS/RSJ International Conference on Hu-
manoid Robots (Humanoids), Tsukuba, Japan, December 5-7 2005, pp.
3237.
[10] K.-U. Scholl, V. Kepplin, J. Albiez, and R. Dillmann, Developing
robot prototypes with an expandable modular controller architecture, in
Proceedings of the International Conference on Intelligent Autonomous
Systems, Venedig, June 2000, pp. 6774.
[11] K. Scholl, V. Kepplin, J. Albiez, and R. Dillmann, Developing robot
prototypes with an expandable modular controller architecture, in
Proceedings of the International Conference on Intelligent Autonomous
Systems, Venedig, June 2000, pp. 67 74.
[12] J. Albiez, T. Luksch, K. Berns, and R. Dillmann, An activation-based
behavior control architecture for walking machines, The International
Journal on Robotics Research, Sage Publications, vol. 22, pp. 203211,
2003.
[13] P. Ekman and W. Friesen, Facial Action Coding System. Consulting Psychologists Press, Inc., 1978.
[14] P. Ekman, W. Friesen, and J. Hager, Facial Action Coding System. A
Human Face, 2002.