Affective Computing: A Fuzzy Approach
Madhusudan
Department of Computer Science
Himachal Pradesh University
Shimla
[email protected]

Dr. Aman Kumar Sharma
Department of Computer Science
Himachal Pradesh University
Shimla
[email protected]
Abstract: Recent developments in the field of Human-Computer Interaction have shifted the focus of design towards a user-centered rather than a computer-centered approach. This has led to the design of interfaces that are highly effective, intelligent and adaptive, and can adjust themselves to the user's behavioral changes. Such designs are called intelligent and affective interfaces. The proposed framework is designed for affective computing machines, which can sense and adapt to the moods and emotions of the user, using fuzzy and image processing techniques.
Keywords: HCI, MMHCI, affective interfaces, intelligent interfaces, emotions, features, fuzzy, ubiquitous, feature sets, global processing, local processing, UbiComp.
I. INTRODUCTION
Affective computing is a field of Human-Computer Interaction (HCI) that focuses primarily on the research and design of user-centered machines which provide better user experiences and a greater level of satisfaction while users interact with computing machines. The term "Affective Computing" was coined by Rosalind W. Picard and reflects the research developments in emotion-aware interface design for computing machines [1]. The ability of machines to recognize, interpret, adapt and respond to the emotions of the user is called affective computing, and the interfaces which implement such emotion-sensing techniques are called affective interfaces. Since the end users of computers and computer-like machines are mostly humans, interfaces need to be developed which take human factors into account and provide a greater level of satisfaction and usability. HCI has played an important role in the development of user-centred machines that account for user satisfaction, a factor which was earlier usually neglected. Initially, the creation of such computing machines was thought to be impossible and impractical, but with rapid developments in computing speed, architectures and software, the vision of affect-oriented intelligent interfaces and ubiquitous computing can be realized [2]. The omnipresence of machines that remain unnoticed by the users, supported by networking to fulfil the information needs of users at all times, is called "Ubiquitous Computing", a term coined by Mark Weiser. Earlier, the modes of interaction were limited to devices like
keyboard, mouse and joystick, and they forced the user to move towards the interface for interaction. One of the main aims of ubiquitous computing is the design of machines which are capable of moving towards the user for interaction, rather than requiring the user to move to the machines [1]. The idea was to support the main actor in the computing environment, i.e. the user. This led to the development of interfaces which are intelligent and can accept input from the user through multiple channels (called modes), e.g. audio, gestures, text, gaze and facial expressions. Intelligent machines tend to provide greater functionality and usability under different circumstances, such as for persons with disabilities. The information from the different modes is fused at some point, depending upon certain criteria, much as information from the different senses is fused in humans [3]. Once machines that support multimodal interfaces were designed, the concept of affective computing came to light, in which the machines can sense and interpret the emotions of the user and adapt themselves so that the user has a comparatively better experience. The development of affective or emotion-sensing interfaces requires an understanding of human emotions: how they are generated, how they can be transformed, and which features can help to judge them [4]. The main motivation behind these developments is to make human-computer interaction resemble human-human interaction, designing machines which can communicate with users the way humans interact with each other, thereby making the communication between humans and machines more natural. Emotion sensing is a confluence of different fields such as Digital Image Processing (DIP), biology, physiology, neural networks, audio processing and psychology. When a particular instance of a user is captured by a camera or sensor, it needs to be digitally processed, e.g. image enhancement for noise reduction and better interpretation, image compression for size reduction, segmentation for capturing the objects of interest, classification and recognition for categorizing the extracted features, and finally processing of the classified features [5][6]. This field of computer science has grown manyfold in the last few decades, and as a result there has been a lot of development in intelligent and emotion-sensing HCIs. Emotion sensing, or emotion recognition, can be defined as the process of acquiring, analyzing and interpreting the emotions or moods of the user. The output of this process is feedback which makes the interface adapt to the emotion of the user [7]. According to the principles of interface design, interfaces should be simple, easy to use, user-friendly and, most importantly, should support the user's point of view rather than the designer's point of view. With the advancement of HCI, more attention has been paid to the design of interfaces, and as a result intelligent and affective interfaces, which employ a user-centered approach in designing software or hardware interfaces, have emerged.
Intelligent HCIs are defined with respect to the mode of information gathering. Interaction with a machine may take place at three different levels, namely the physical, cognitive and affective levels [8]. The physical level includes devices like the keyboard and mouse, whereas the cognitive level describes the way the user understands the system. The affective level concerns how the user experiences the interaction with the system. The input devices used for communication and interaction fall into three classes, namely vision-based devices, audio-based devices like speech analyzers, and touch-based haptic devices like touchscreens [9]. The various modes and channels that can be used for communication with a machine determine the intelligent behavior of the machine. If there is more than one way of interaction, such systems are called multimodal human-computer interaction (MMHCI) systems [10]. The term intelligent is used for an interface if it is able to capture information from the user with minimal physical input. Such devices require some kind of intelligence in perceiving the response of the user [12]. Affective HCIs are those which are able to interpret human emotions and adapt themselves to a particular emotion. This aspect of HCI deals entirely with user experience and with mechanisms to improve the level of satisfaction in the interaction, whereas intelligent HCIs deal with the ways of gathering information [13]. Recent developments have turned the research paradigm towards the creation of emotion-sensing systems that can provide pleasant experiences to users with the help of digital images [11] [14]. We propose an affective computing framework that applies fuzzy logic to digital images, using different segmentation and pattern recognition algorithms to extract emotions. The paper is organized as follows: Section II investigates recent developments in affective computing, Section III discusses the proposed framework, followed by Section IV in which the underlying complexities and architecture are discussed. The paper is concluded in Section V along with the future scope.
II. RECENT DEVELOPMENTS AND CHALLENGES
Affective computing tries to give machines human-like capabilities in order to create a better interaction environment, making them capable of sensing, interpreting, and generating affect features [15]. Research on affective computing can be traced back to the 19th century, when psychology and physiology were applied to the study of emotions and emotion-like features. In the field of emotion sensing and processing, many developments have taken place over the last three decades. These research developments span various fields and factors such as emotional speech processing, facial expressions, body gestures and movements, gaze detection, and MMHCIs. Emotional speech differs with respect to its acoustic features, and several variables have been used to establish a relationship between these acoustic features and the delivered emotion [16]. These acoustic features were used in pattern recognition to identify the features that reflect emotions and hence could be used for designing a speech recognizer able to judge the emotion of the user [17] [18]. In speech synthesis, emotion control parameters were used to obtain higher performance, and emotional keywords helped in generating emotional text-to-speech (TTS) systems [19] [20]. The final emotion state was determined on the basis of the emotion outputs from the textual content module. The use of cognitive sciences and perception models also yielded higher accuracy and consistency [21]. The most researched and investigated field in emotion processing is facial expressions. Since facial expressions, or the digital images that capture them, carry the most information, the study of facial expressions and their physiological changes with changes in the behaviour and mood of the user can serve as one medium for extracting emotions [22]. Most of the work related to facial expressions has been carried out with the help of digital image processing and pattern recognition algorithms [23] [24] [25] [26]. Mesh models of human faces were also used to study the physiological changes and their correspondence to emotions [27]. As advances were made in artificial intelligence and machine learning, supervised algorithms were used to implement automatic detection of changes in facial features and of the respective emotion behind these changes [28]. The quality of facial expression processing has improved remarkably, which can be credited to recent developments in computing power, performance, speed and storage. As a result, larger software systems could be designed for facial expression processing, implementing the Hidden Markov Model (HMM), the Point Distribution Model (PDM), the Geometric Tracking Method (GTM), EGM, and Gabor representations [29] [30] [31] [32]. Some approaches combining both audio and video were used for better accuracy and quality [33]. Large body movements and their relationship to emotions have always been an area of study, and there has been a considerable amount of research in this field, as in facial expression and speech processing, although most of the research in this context has focused on hand movements only. Various designs and frameworks have been created for judging emotions and their coordination with hand and body gestures [34] [35] [36].
To provide a human-like communication and interaction environment, various modes of information were fused together for a better user experience and better decision making in machines [37] [40] [38] [39]. Multimodal systems provided more realistic and satisfying user experiences and were implemented with Bayesian networks and machine learning for affect sensing [43] [42] [41]. Digital image processing has played a significant role in affective computing and in the design of intelligent MMHCIs. Algorithms for image segmentation, recognition, enhancement, classification and compression are widely used in processing the images acquired during interaction, and analysis of these digital images containing facial features and hand movements helps in sensing the emotions of the user [2] [44]. Various projects related to affective computing have been conducted in recent years, such as the Oz project [45], the Affective Computing Framework for Learning and Decision Making [46], HUMAINE [47], and BlueEyes [48]. There have been many challenges in affective computing, and a lot of development will be required to make the dream of human-like robots come true. Some of the challenges include affective information acquisition, affect identification, affect understanding, differences in emotions across customs and countries, the lack of standardization of emotion states and of the effect of different factors on emotions, affect generation, and the creation of affective databases, which are quite different and complex as compared to traditional databases [49].
III. PROPOSED FRAMEWORK
The general framework for intelligent and affective machines comprises four steps, namely image acquisition, followed by segmentation for extracting regions of interest (ROI), feature extraction, and finally feature analysis. In the proposed framework, fuzzy logic and its techniques are used for better classification of features and of their effect on the overall sensed emotion. Initially, some features and their values are stored in the affective database for operation with the input features that are collected during interaction with the computing machine. The algorithm starts with image acquisition through a sensor or camera which collects the facial features and expressions of the user along with hand movements. The proposed framework can be implemented in two ways: first, with calibration, and second, with self-learning but no calibration. In the case of calibration, some images and features of the user, together with their values, are pre-stored in the affective database so that arithmetic operations can be applied on them with the acquired features. This helps the learning module in the initial stages for fast processing and learning. In the second scenario, no images or features are kept in the database and all operations are performed by comparing the acquired features with the best features collected to date. The second case is fully implemented with machine learning, while there is little use of machine learning in the first case. Once the features or the facial expressions are acquired, they are processed digitally for better perception and analysis. This includes image operations like histogram equalization, contrast and brightness adjustment, smoothing, sharpening, or taking the negative of the image, as demanded by the application in which the module is used. In the next step, the enhanced image is segmented with the help of region growing and merging algorithms coupled with thresholding, to generate the regions of interest, i.e. the features of the facial expression. A set of segmentation techniques is used to take the segments out of the image in an efficient way. When the features or segments of the image are available, the fuzzy techniques are applied to them, and memberships of the features are counted on an individual basis. A single feature may be tagged with different emotion values, but all the tags and membership values are kept, since they are used again while calculating the overall emotion value in the final step.
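As an illustration only (the data structure below is an assumption, not part of the original framework specification), the multi-tagging of features could be kept as a simple mapping from each feature to all of its candidate emotion tags and membership values; the feature names and numbers are hypothetical.

```python
# Hypothetical sketch: each extracted feature keeps every candidate emotion tag
# together with its fuzzy membership value, so nothing is discarded before the
# global fuzzy step.
feature_tags = {
    "lips": {"happy": 0.75, "neutral": 0.20},
    "eyes": {"happy": 0.60, "sad": 0.60, "angry": 0.55},   # ambiguous feature
    "cheeks": {"happy": 0.40},
}

def candidate_emotions(feature_name):
    """Return all (emotion, membership) pairs recorded for a feature."""
    return sorted(feature_tags.get(feature_name, {}).items(),
                  key=lambda pair: pair[1], reverse=True)

if __name__ == "__main__":
    print(candidate_emotions("eyes"))   # all tags are kept for later resolution
```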
Bayes' theorem is also applied, mostly in cases of ambiguity; it works on the basis of conditional probability to sense the emotion values more accurately and precisely when compared with some set of features. The probability value, or likelihood, is calculated for each possible matching tag, and the one with the maximum value is selected. These membership values are passed to the next module, which classifies the emotions under different emotion categories. The classifier uses machine learning and the self-learning module to operate on the membership values and feature tags and to grow the affective database, since over time the framework is supposed to grow monotonically, making the classification process more accurate and precise.
Fuzzy membership calculations: In this framework, we use the concept of fuzzy logic, in which not only true or false values are assigned as membership, but fractional membership values can be assigned to objects to describe their degree of resemblance to a certain class. In the third step of the framework, fuzzy membership values are used to assign a value, or degree of membership, to the individual features on the basis of their resemblance to a certain set of features. This is called local fuzzy processing. For example, if we consider the feature 'lips' in a happy mood, then we can assign a range of values between 0 and 1 to the lips based on their deviation from the mean position. Mean positions and values are calculated and extracted from the user's facial expressions under a normal emotional condition.
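A minimal sketch of this local fuzzy processing step, assuming that a feature's deviation from its calibrated mean position has already been measured and that a maximum deviation is available for normalisation; the function name and the numbers are illustrative.

```python
def local_membership(deviation, max_deviation):
    """Map a feature's deviation from its calibrated mean position to a
    fuzzy membership value in [0, 1]; larger deviation -> stronger membership."""
    if max_deviation <= 0:
        return 0.0
    return min(max(deviation / max_deviation, 0.0), 1.0)

# Example: lip-corner displacement of 12 px against a calibrated maximum of 16 px
print(local_membership(12, 16))   # 0.75 -> roughly the "laughing" level
```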
1. Image acquisition (via a sensor or camera with high resolution)
2. Image enhancement (application of image processing techniques for better analysis)
3. Image segmentation (segmentation algorithms applied for dividing the image into regions of interest)
4. Fuzzy application on segments at local scale (application of the proposed fuzzy technique on the segments obtained in the last step)
5. Classification of regions of interest (machine learning techniques for self-categorization of features for future access and analysis)
6. Fuzzy implementation at global scale (application of the proposed fuzzy logic on the image as a whole for sensing the overall emotion value)
Fig. 1. The proposed framework using fuzzy techniques
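The six stages of Fig. 1 could be organised as the following Python skeleton. The OpenCV calls used for acquisition, enhancement and thresholding are standard, but the segmentation shortcut and the fuzzy and classification steps are placeholders standing in for the techniques described in this section, not the authors' implementation.

```python
import cv2

def acquire_image(camera_index=0):
    """Step 1: image acquisition from a camera or sensor."""
    capture = cv2.VideoCapture(camera_index)
    ok, frame = capture.read()
    capture.release()
    return frame if ok else None

def enhance(image):
    """Step 2: enhancement, e.g., histogram equalization on the grayscale image."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    return cv2.equalizeHist(gray)

def segment(image):
    """Step 3: Otsu thresholding as a stand-in for region growing and merging."""
    _, mask = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return contours          # regions of interest (facial features)

def local_fuzzy(regions):
    """Step 4: assign per-feature membership values (placeholder)."""
    return {f"region_{i}": {"happy": 0.5} for i, _ in enumerate(regions)}

def classify(memberships):
    """Step 5: categorise features and update the affective database (placeholder)."""
    return memberships

def global_fuzzy(memberships):
    """Step 6: combine memberships per emotion class and pick the maximum."""
    totals = {}
    for tags in memberships.values():
        for emotion, value in tags.items():
            totals[emotion] = totals.get(emotion, 0.0) + value
    return max(totals, key=totals.get) if totals else None

if __name__ == "__main__":
    frame = acquire_image()
    if frame is not None:
        emotion = global_fuzzy(classify(local_fuzzy(segment(enhance(frame)))))
        print("sensed emotion:", emotion)
```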
Let us consider the shape of the lips when a person is in a happy mood: a smiling look, smiling, laughing, and exhaustive laughing. Assume that in a full exhaustive laugh, in which the lip shape has the maximum deviation from the mean position, we assign a membership value of 1. Similarly, for the laughing position we assign a value of 0.75, for a smile a value of 0.5, and for a smiling look a value of 0.25. Other features like the eyes can similarly be assigned membership values for the happy mood. For example, in the case of an exhaustive laugh the eyes tend to be smaller as compared to the normal condition, and hence the value 1 is assigned as membership. In the same way, all the features or ROIs are assigned a membership for each mood. Here machine learning plays an important role by replacing the features in the best affective state over time, as new features with higher values are acquired. The features stored in the database over time can be analysed to generalize the framework for a class of people with some similarities. The other way to assign membership values to the features is to compare the acquired features with pre-stored images, arithmetically subtract them, and calculate the deviation or difference of the feature from the different feature sets. The deviation values are assigned as membership values in this case, and a small deviation implies greater similarity to that class. Also, since the same features can have the same membership values, they can belong to more than one class of emotions at the same time. For example, small eyes can appear in laughing, weeping, hard thinking, an angry mood, etc. We therefore create different feature sets for each class, like happy, sad, angry, and frustrated. All these sets in turn contain the values of all the features and their possible combinations with other features. In natural language processing a word may carry more than one part-of-speech tag; similarly, among facial features a single feature may carry different class tags and hence lead to ambiguity. This ambiguity is resolved by applying Bayes' theorem to calculate the probability of that feature conditioned on some class. The class which results in the maximum probability is assigned as the tag to that feature. It is possible that in some cases the ambiguity of all the features might not be resolved, since it may not be possible to resolve ambiguity using an individual feature alone. This is because the overall emotion value that appears in facial expressions is the sum of all the features in certain proportions, so there is interdependence among the features, which needs to be incorporated while determining the probability value of the other features. For example, while determining the membership or class of the eyes when they are diminished, we need to compare them with different classes like laughing, weeping, and thinking, so it is not possible to assign an unambiguous tag to the eyes in such a case; but we can use another feature, like the shape of the lips, to find the correct class for the eyes.
Let F[i][j] be the feature set for some class, where i represents the class of the emotion, like happy or sad, and j represents a particular feature. The value of F[i][j] gives the membership value of the jth feature to the ith class of emotion. The index i may take values like happy, sad, angry, and frustrated in an enumerated manner, e.g. 1, 2, 3, 4, and similarly j may take values like eyes, lips, cheeks, ears and forehead, which can also be enumerated. Hence the feature sets can be represented in a two-dimensional array and easily accessed. Let i=1 represent Emotion=happy and j=1 represent Feature=lips; then we have the following possible values:
F[1][1] = 0 (no smile / not happy), 0.25 (little smile / slightly happy), 0.50 (healthy smile / happy), 0.75 (laughing / quite happy), or 1.0 (exhaustive laugh / extremely happy).
In a similar way, we calculate the feature sets for all the emotion values and store them in the affective database. These values are calculated using the law of averages and may differ between sets of people, so the database can be supplied with fresh values upon calibration. These values are used for comparison with newly acquired sets of images.
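Under the enumeration just described, the feature sets can be held in a plain two-dimensional array; the sketch below simply restates the F[1][1] levels from the text, with all other values left as placeholders to be filled by calibration or self-learning.

```python
import numpy as np

EMOTIONS = ["happy", "sad", "angry", "frustrated"]          # index i
FEATURES = ["lips", "eyes", "cheeks", "ears", "forehead"]   # index j

# F[i][j]: membership of feature j for emotion class i (placeholder values,
# to be refreshed by calibration or self-learning).
F = np.zeros((len(EMOTIONS), len(FEATURES)))
F[0][0] = 0.75        # e.g., lips currently at the "laughing / quite happy" level

def membership(emotion, feature):
    """Look up F[i][j] by emotion and feature name."""
    return F[EMOTIONS.index(emotion)][FEATURES.index(feature)]

print(membership("happy", "lips"))   # 0.75
```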
Now, if some feature sets coincide, e.g. F[1][1] = F[2][1] = F[3][1], where i = 1, 2, 3 represent the emotions happy, sad and angry and j = 1 represents the feature 'eyes', then Bayes' theorem is used to calculate the likelihood of the feature so that the correct tag can be assigned to it. Let the membership of 'eyes' be 0.5 for i = 1, 2, 3; then for each class E_i we calculate the posterior P(E_i | f) = P(f | E_i) P(E_i) / P(f), which determines the likelihood that the feature f belongs to class E_i, depending upon the membership value of the feature and the maximum possible favourable outcomes in the past. When we condition the probability of the ambiguous feature on the other extracted features and feature sets, Bayes' theorem gives the actual tag of the feature. For example, if there is ambiguity in tagging the lips, we take the membership of the lips in the extracted feature and compare it first with the other extracted features and then with the classes of emotions. In the beginning, some predefined probability values can be used; they are refined through iteration via machine learning to obtain higher accuracy and probability. If the ambiguity is still not resolved, it is left to be resolved at the global fuzzy processing step.
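A minimal sketch of this Bayesian disambiguation, assuming the affective database can supply a prior for each emotion class and a likelihood of the ambiguous feature's observed membership under that class; all numerical values are hypothetical.

```python
def bayes_posterior(likelihoods, priors):
    """Return P(class | feature) for each class from P(feature | class) and P(class)."""
    evidence = sum(likelihoods[c] * priors[c] for c in likelihoods)
    if evidence == 0:
        return {c: 0.0 for c in likelihoods}
    return {c: likelihoods[c] * priors[c] / evidence for c in likelihoods}

# 'eyes' has membership 0.5 in happy, sad and angry, so its tag is ambiguous.
# Conditioning on another extracted feature (e.g., lips clearly laughing) shifts
# the likelihoods recorded in the affective database (hypothetical values).
likelihoods = {"happy": 0.70, "sad": 0.20, "angry": 0.10}
priors      = {"happy": 0.40, "sad": 0.30, "angry": 0.30}

posterior = bayes_posterior(likelihoods, priors)
best_tag = max(posterior, key=posterior.get)
print(posterior, "->", best_tag)     # the tag with maximum posterior is assigned
```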
In the final step of the algorithm, fuzzy techniques are again applied to the segments taken together as a whole image; this is called global fuzzy processing. Its purpose is either to resolve the ambiguities left from the third step or to refine the emotion value at the larger scale. In this step, we calculate the sum of the membership values belonging to the different features for each emotion class. All the possible feature sets were generated in the third step, and in global fuzzy processing these smaller feature sets are combined. The feature set which results in the maximum likelihood or membership value is selected as the final emotion value.
For example, we calculate
S[i] = F[i][1] + F[i][2] + F[i][3] + F[i][4] + F[i][5] + ...
for each emotion class i = 1, 2, 3, 4, ..., summing over all features j.
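In the array notation used above, the global fuzzy step reduces to a row-wise sum over the feature sets followed by taking the maximum; the sketch below only restates that formula in code with hypothetical membership values.

```python
import numpy as np

EMOTIONS = ["happy", "sad", "angry", "frustrated"]

# Rows = emotion classes i, columns = features j (hypothetical memberships).
F = np.array([
    [0.75, 0.60, 0.40, 0.30, 0.20],   # happy
    [0.10, 0.60, 0.20, 0.10, 0.30],   # sad
    [0.05, 0.55, 0.10, 0.05, 0.40],   # angry
    [0.05, 0.30, 0.10, 0.05, 0.35],   # frustrated
])

totals = F.sum(axis=1)                     # S[i] = sum over j of F[i][j]
final_emotion = EMOTIONS[int(np.argmax(totals))]
print(dict(zip(EMOTIONS, totals)), "->", final_emotion)
```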
The fuzzy technique differs at the two levels of this algorithm only in its scale of application: at the minor scale, with individual values, it is called local processing, while on the final set of features as a whole it is called global processing. The final feature sets are represented as f_i, where i indexes the final feature, i.e. the emotion value as a whole rather than its constituents. Bayesian probability is also used in this step, but only when there are ambiguities in the final emotion value.
Self-Learning Module: A machine learning framework is also proposed in this algorithm. It is supported by a pattern analysis module which helps decide the pattern of emotions and their reflection in particular feature sets. Physiological studies of the contours of the human face are used to locate the pattern of the features and their resemblance to some emotion value. The machine learning module also drives the monotonic growth of the affective database, so that over time more test cases are available to operate with, which can result in better accuracy.
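One way to read the self-learning rule, under the assumption that the "best" feature is the one with the highest membership value seen so far for a given emotion/feature pair, is a keep-the-maximum update of the affective database; the storage layout below is an assumption.

```python
# Hypothetical affective database: (emotion, feature) -> best membership so far.
affective_db = {("happy", "lips"): 0.75, ("happy", "eyes"): 0.60}

def update_database(emotion, feature, new_value):
    """Monotonic growth: keep the stronger of the stored and the new value."""
    key = (emotion, feature)
    affective_db[key] = max(affective_db.get(key, 0.0), new_value)

update_database("happy", "lips", 0.90)   # a stronger exemplar replaces the old one
update_database("happy", "lips", 0.40)   # weaker exemplars are ignored
print(affective_db[("happy", "lips")])   # 0.9
```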
IV. ADVANTAGES, ARCHITECTURE AND UNDERLYING
COMPLEXITIES
In the proposed model, n sensors capture the behavioural features of the user. These multiple sensors collect data which is processed in real time, and all the acquired data is fed to the feature classification algorithm. Multiple processing units are used in the system to make the processing faster and to make it feasible to design a system that can deliver real-time decisions. The results of the processed data from the individual processing units are fed to the fuzzy inference system.
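The per-sensor processing units could be sketched with standard process-based parallelism, as below; the feature-extraction function is a stand-in for whichever algorithm each unit actually runs, and the sensor names are hypothetical.

```python
from concurrent.futures import ProcessPoolExecutor

def extract_features(sensor_frame):
    """Stand-in for the per-sensor feature-extraction algorithm."""
    return {"source": sensor_frame["name"], "membership": 0.5}

def process_sensors(frames):
    """Run one processing unit per sensor frame and collect results for the FIS."""
    with ProcessPoolExecutor() as pool:
        return list(pool.map(extract_features, frames))

if __name__ == "__main__":
    frames = [{"name": "face_camera"}, {"name": "hand_camera"}]
    print(process_sensors(frames))   # results are then fed to the fuzzy inference system
```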
A fuzzy system is used because the multiple inputs are crisp in nature but a particular value may belong to multiple features; for example, a person's facial expression may describe happiness while his hand gestures show anger. This means a value may belong to multiple sets of behaviour. To handle such data, we define IF-THEN rules and associated membership functions; for simplicity, Gaussian membership functions are preferred for each of them. On the basis of the rule base, the resultant output is obtained, and this result goes through the process of defuzzification, after which the system produces the desired result. Figure 2 depicts a detailed description of the complete system. The sensors generally refer to video cameras or other affective sensors such as wearable devices. The output of each camera is fed to an individual processing unit, which processes it according to the body component it focuses on. The results from the feature extraction algorithms are provided to the fuzzy inference system, which uses the rule base and provides the results accordingly. A massively parallelized and distributed architecture is required for real-time results: the processors should be fast, the storage must be fast and large, the quality of the sensors needs to be high, and the classification algorithms should be robust.
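A compact sketch of such a fuzzy inference step with Gaussian membership functions, two toy IF-THEN rules and centroid defuzzification; the universe of discourse, rule base and parameters are illustrative and not those of the proposed framework.

```python
import numpy as np

def gaussian_mf(x, mean, sigma):
    """Gaussian membership function."""
    return np.exp(-0.5 * ((x - mean) / sigma) ** 2)

def infer(face_happiness, hand_anger):
    """Two toy IF-THEN rules combined by max, defuzzified by the centroid method."""
    universe = np.linspace(0.0, 1.0, 101)          # output: overall positive affect

    # Rule 1: IF face is happy THEN affect is positive (output centred near 0.8).
    w1 = gaussian_mf(face_happiness, mean=1.0, sigma=0.3)
    out1 = np.minimum(w1, gaussian_mf(universe, mean=0.8, sigma=0.15))

    # Rule 2: IF hands look angry THEN affect is negative (output centred near 0.2).
    w2 = gaussian_mf(hand_anger, mean=1.0, sigma=0.3)
    out2 = np.minimum(w2, gaussian_mf(universe, mean=0.2, sigma=0.15))

    aggregated = np.maximum(out1, out2)
    return float(np.sum(universe * aggregated) / np.sum(aggregated))  # centroid

print(infer(face_happiness=0.9, hand_anger=0.6))   # crisp affect value in [0, 1]
```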
The use of the fuzzy technique results in a large number of feature sets and hence increases the amount of data produced, which requires a lot of storage space. A compression algorithm such as run-length coding can be used to reduce the storage requirement (a minimal sketch follows this paragraph). On the other hand, this large configuration of feature sets allows better analysis and more comparison cases, and thus results in better accuracy and less ambiguity. Secondly, a study and standardization of emotion values and their respective features is required for calibration and for the scalability or generalization of the framework. Psychology is also involved in this framework, since facial expressions are one medium of expressing emotion, which in turn is a function of the psychological states of the human mind. Very fine-grained observations and large data sets may result in slow processing and raise the requirement for fast processing machines supported by large caches. Affective database creation and maintenance is also a complex and time-consuming task. There is also no commonly accepted theory of emotions, which presents a further issue for scalability.
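The run-length coding mentioned above can be sketched in a few lines; a real encoding of the feature sets would of course need to handle the database layout, but the idea of collapsing repeated membership values is the same.

```python
def run_length_encode(values):
    """Encode a sequence as [value, run length] pairs."""
    encoded = []
    for v in values:
        if encoded and encoded[-1][0] == v:
            encoded[-1][1] += 1
        else:
            encoded.append([v, 1])
    return encoded

# Many stored feature sets contain long runs of identical values, e.g. zeros.
print(run_length_encode([0, 0, 0, 0.25, 0.25, 0.5, 0, 0, 0]))
# [[0, 3], [0.25, 2], [0.5, 1], [0, 3]]
```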
Fig. 2. Architecture of the proposed framework using a Fuzzy Inference System
Real-time processing of multimodal information from different sensors needs a high-performance computing environment. Since information from different channels such as facial features, hand movements, body movements, gaze, skin conductance and speech is fused to judge the affective state of the user, processing this varied and voluminous data requires distributed and parallel processing. The input signals received by the sensors are fed into the processing machine to analyse the features via feature extraction and pattern recognition algorithms; machines with limited computing and storage capacity are simply not sufficient to achieve this. Keeping in mind the user's expectations of satisfaction, information needs to be acquired and processed at very high rates with great accuracy and precision. These processing requirements can only be met in massively parallelized and tightly coupled distributed environments. Finally, the variations in emotions with respect to cultures, customs, nationalities, and creeds present a challenge to the generalization of the framework.
V. CONCLUSION AND FUTURE SCOPE
The advancement of MMHCI through the creation of intelligent and affective interfaces has helped in achieving the dream of ubiquitous computing, though only on a small scale. The proposed framework can sense the physiological changes in the facial features and hand movements of the user and can adapt the affective interface in accordance with the sensed emotion value. It provides better results in MMHCIs which are user-centered and driven by the emotional behavior of the user. Since a modular approach is used for the implementation, this framework can be used with different sets of features and emotion values and hence achieves increased scalability. Such a framework can be implemented in UbiComp machines for affect sensing and adaptive interfaces. Furthermore, there is scope for improving the algorithms for a better and more efficient implementation, and other feature sets and emotions can be included and new ranges of values tested with different feature sets.
REFERENCES:
[1]. Rosalind W. Picard, “Affective computing’’, MIT Press, 1997.
[2]. Alan Dix, Janet Finlay, Gregory D. Abowd, Russell Beale,
“Human Computer Interaction”, Third Edition, Pearson.
[3]. Fakhreddine Karray et al., “Human-Computer Interaction: Overview on State of the Art”, International Journal on Smart Sensing and Intelligent Systems, Vol. 1, No. 1, March 2008.
[4]. Liam J. Bannon, “From Human Factors to Human Actors: The
role of Psychology and Human-Computer Interaction Studies in
Systems Design”, Book Chapter in Grenbaum, J. & Kyng, M. (Eds.),
1991.
[5]. Maja Pantic, “Towards an Affect-Sensitive Multimodal Human-Computer Interaction”, Proceedings of the IEEE, Vol. 91, No. 9, September 2003.
[6]. Nicu Sebe, “Multimodal interfaces: Challenges and perspectives”, Journal of Ambient Intelligence and Smart Environments, vol. 1, pp. 19–26, 2009.
[7]. Alejandro Jaimes et al., “Multimodal human-computer interaction: A survey”, Computer Vision and Image Understanding, pp. 116-134, Elsevier, 2007.
[8]. Zoran Duric et al., “Integrating Perceptual and Cognitive
Modeling for Adaptive and Intelligent Human-Computer Interaction”,
Invited paper in Proceedings of the IEEE, vol. 90 no. 07, July 2002.
[9]. Rafael C. Gonzalez, Richard E. Woods, “Digital Image
Processing”, Third Edition, Pearson Prentice Hall, 2013.
[10]. S. Jayaraman, S. Esakkirajan, T. Veerakumar, “Digital Image
Processing”, McGraw Hill Education Private Limited, Eleventh
Reprint 2013.
[11]. Minakshi Kumar, “Digital Image Processing”, Indian
Institute of Remote Sensing, Dehra Dun 2010.
[12]. Rafael C. Gonzalez, Richard E. Woods, “Digital Image
Processing”, Third Edition, Pearson Prentice Hall, 2013.
[13]. Terry Winograd, “Shifting viewpoints: Artificial intelligence and
human–computer interaction”, Elsevier, 1 November 2006.
[14]. Rafael C. Gonzalez, Richard E. Woods, “Digital Image
Processing”, Third Edition, Pearson Prentice Hall, 2013.
[15]. Jianhua Tao and Tieniu Tan, “Affective Computing: A Review”,
ACII 2005, LNCS 3784, pp. 981-995, 2005.
[16]. K. R. Scherer, “Vocal affect expression: A review and a model
for future research”, Psychological Bulletin, vol. 99, pp. 143–165,
1986.
[17]. F. Dellaert, T. Polzin, and A. Waibel, “Recognizing Emotion in
Speech”, in Proc. of ICSLP 1996, Philadelphia, PA, pp. 1970-1973,
1996.
[18]. A. Petrushin, ‘‘Emotion Recognition in Speech Signal:
Experimental Study, Development and Application.’’ ICSLP2000,
Beijing.
[19]. Mozziconacci J. Sylvie and Hermes Dik, ‘‘Expression of
emotion and attitude through temporal speech variations’’,
ICSLP2000, Beijing, 2000.
[20]. Chuang Ze-Jing and Wu Chung-Hsien, ‘‘Emotion Recognition
from Textual Input using an Emotional Semantic Network,’’ In
Proceedings of International Conference on Spoken Language
Processing, ICSLP 2002, Denver, 2002.
[21]. F. Yu et al., ‘‘Emotion Detection from Speech to Enrich
Multimedia Content’’, in the second IEEE Pacific-Rim Conference on
Multimedia, October 24-26, 2001, Beijing, China.
[22]. “Human Computer Interaction”, Alan Dix, Janet Finlay, Gregory
D. Abowd, Russell Beale, Third Edition, Pearson.
[23]. Massaro W. Dominic et al., “Picture My Voice: Audio to Visual
Speech Synthesis using Artificial Neural Networks”, Proceedings of
AVSP’99, pp.133-138. Santa Cruz, CA, August, 1999.
[24]. E. Cosatto et al., “Audio-visual unit selection for the synthesis of
photo-realistic talking-heads”, IEEE International Conference on
Multimedia and Expo, ICME 2000.
[25]. N.L. Etcoff and J.J. Magee, Categorical perception of facial
expressions, Cognition, 44, 227- 240, 1992.
[26]. C. Bregler et al., ‘‘Video Rewrite: Driving Visual Speech with
Audio’’, ACM SIGGRAPH, 1997.
[27]. A. Murat Tekalp, ‘‘Face and 2-D mesh animation in MPEG-4’’,
Signal Processing: Image Communication 15 (2000) 387-421.
[28]. H. Kobayashi and F. Hara, ‘‘Recognition of Six Basic Facial
Expressions and Their Strength by Neural Network,’’ in Proc.
International Workshop Robot and Human Comm., pp. 381- 386,
1992.
[29]. E. Yamamoto, S. Nakamura, and K. Shikano, ‘‘Lip movement
synthesis from speech based on Hidden Markov Models’’, Speech
Communication, 26, (1998).
[30]. Michael J. Lyons et al., ‘‘Coding Facial Expressions with Gabor
Wavelets’’, In Proceedings of Third IEEE International Conference
on Automatic Face and Gesture Recognition, April 14-16 1998, Nara
Japan, IEEE Computer Society, pp. 200-205.
[31]. http://www.cs.bham.ac.uk/%7Eaxs/cogaff.html
[32]. Ashish Verma, L. Venkata Subramaniam, Nitendra Rajput,
Chalapathy Neti, Tanveer A. Faruquie, ‘‘Animating Expressive Faces
Across Languages’’. IEEE Trans on Multimedia, Vol. 6, No. 6, Dec,
2004.
[33]. A. Hunt, A. Black, ‘‘Unit selection in a concatenative speech
synthesis system using a large speech database’’, ICASSP, vol. 1, pp.
373-376, 1996.
[34]. J. K. Aggarwal, Q. Cai, ‘‘Human Motion Analysis: A Review’’,
Computer Vision and Image Understanding, Vol. 73, No. 3, 1999.
[35]. Vladimir Pavlovic et al., ‘‘Visual Interpretation of Hand Gestures for
Human-Computer Interaction: A Review’’, IEEE Transactions on
Pattern Analysis and Machine Intelligence, 1997.
[36]. D. M. Gavrila, ‘‘The Visual Analysis of Human Movement: A
Survey’’, Computer Vision and Image Understanding, Vol. 73, No.1.
January, 1999.
[37]. H. Schlossberg, Three dimensions of emotion, Psychological
review, 61, 81-88, 1954.
[38]. Mozziconacci J. Sylvie and Hermes Dik, ‘‘Expression of
emotion and attitude through temporal speech variations’’,
ICSLP2000, Beijing, 2000.
[39]. Rosalind W. Picard, “Affective computing: From Laughter to
IEEE’’, IEEE Transactions on Affective Computing, vol. 1, no. 1, Jan.-Jun. 2010.
[40]. Rosalind W. Picard, “Affective computing’’, MIT Press, 1997.
[41]. R. Brunelli and D. Falavigna, ‘‘Person Identification Using
Multiple Cues’’, IEEE Transactions on Pattern Analysis and Machine
Intelligence, Vol. 12, No. 10, pp. 955-966, Oct 1995.
[42]. A. Camurri, G. De Poli, M. Leman, G. Volpe, “A Multi-layered
Conceptual Framework for Expressive Gesture Applications”, in
Proc. Intl MOSART Workshop, Barcelona, Nov. 2001.
[43]. R. Cowie et al., “Emotion recognition in human-computer
interaction”, IEEE Signal Processing Magazine, 18(1):32-80, 2001.
[44]. P. Romero et al., “A Novel Real Time Facial Expression
Recognition system based on Candide-3 Reconstruction Model”, in
proc. of the XIV Workshop on Physical Agents, September, 2013.
[45]. http://www.cs.cmu.edu/afs/cs.cmu.edu/project/oz/web/oz.html.
[46]. http://affect.media.mit.edu/
[47]. http://emotion-research.net/.
[48]. http://www.almaden.ibm.com/cs/BlueEyes/index.html.
[49]. Rosalind Picard, ‘‘Affective Computing: Challenges’’,
International Journal of Human-Computer Studies, Elsevier,
2003.