Final Report
1.1 Introduction
A facial expression is the visible manifestation of the affective state, cognitive activity, intention, personality and psychopathology of a person, and it plays a communicative role in interpersonal relations. Facial expressions have been studied for a long time, with notable progress in recent decades. Even so, recognizing facial expressions with high accuracy remains difficult because of their complexity and variety.
Human beings generally convey intentions and emotions through nonverbal means such as gestures, facial expressions and involuntary body language. An automatic recognition system can therefore serve as a significant nonverbal channel for people to communicate with each other. The key question is how reliably the system detects and extracts the facial expression from an image. Interest in such systems is growing because they could be widely used in fields such as lie detection, medical assessment and human-computer interfaces. The Facial Action Coding System (FACS), proposed by Ekman in 1978 and refined in 2002, is a very popular facial expression analysis tool.
On a day-to-day basis, humans commonly recognize emotions from characteristic features displayed as part of a facial expression. For instance, happiness is undeniably associated with a smile, an upward movement of the corners of the lips. Other emotions are similarly characterized by deformations typical of the particular expression. Research into automatic recognition of facial expressions addresses the problems of representing and categorizing the static or dynamic characteristics of these deformations of the face.
The system classifies facial expressions of the same person into the basic emotions, namely anger, disgust, fear, happiness, sadness and surprise. Its main purpose is efficient interaction between human beings and machines using eye gaze, facial expressions, cognitive modelling and so on. Detection and classification of facial expressions thus serve as a natural way for humans and machines to interact. Expression intensity varies from person to person and with age, gender, and the size and shape of the face; further, even the expressions of the same person do not remain constant over time.
However, the inherent variability of facial images caused by factors such as variations in illumination, pose, alignment and occlusion makes expression recognition a challenging task. Several surveys on facial feature representations for face recognition and expression analysis address these challenges and possible solutions in detail.
1.2 Aim:
The research aims to evaluate the potential for emotion recognition technology to
improve the quality of human-computer interaction.
1.3 Objectives:
The motivation behind choosing this topic lies in the huge investments that large corporations make in feedback and surveys, yet they often fail to get an equitable return on those investments.
Also, in today's networked world, the need to maintain the security of information and physical property is becoming both increasingly important and increasingly difficult. In countries like India, the crime rate is rising day by day, and there are no automatic systems that can track a person's activity. If we can track people's facial expressions automatically, criminals could be identified more easily, since facial expressions change during different activities. We therefore decided to build a Facial Expression Recognition System.
We became interested in this project after reading a few papers in this area, which describe how accurate and reliable facial expression recognition systems are designed and built. As a result, we are highly motivated to develop a system that recognizes facial expressions and tracks a person's activity.
Human emotions and intentions are expressed through facial expressions, and deriving an efficient and effective feature representation is the fundamental component of a facial expression system. Emotion detection through facial gestures is a technology that aims to improve product and service performance by monitoring customers' reactions to products or service staff. Facial expressions convey non-verbal cues, which play an important role in interpersonal relations. Automatic recognition of facial expressions can be an important component of natural human-machine interfaces; it may also be used in behavioural science and in clinical practice. An automatic Facial Expression Recognition system needs to solve the following problems: detection and location of faces in a cluttered scene, facial feature extraction, and facial expression classification.
Chapter 2: Review of Literature
Research paper 1
GJCST-F, Volume 17, Issue 1, Version 1.0, Year 2017. Online ISSN: 0975-4172; Print ISSN: 0975-4350. Publisher: Global Journals Inc. (USA). Type: Double Blind Peer Reviewed International Research Journal (Graphics & Vision).
I. Introduction
The most common exposition of the idea of emotion is "a natural instinctive state of mind deriving from one's circumstances, mood, or relationships with others". This definition, however, misses the driving force behind all motivation, which may be positive, negative or neutral, and that is very important information for understanding emotion in an intelligent agent. Detecting emotions and distinguishing among them is very complicated. A decade or two ago, emotions started to become an important concern in the modern technology world, raising hopes of a new dawn for intelligent apparatus. Imagine a world where machines can sense what humans need or want: with the right kind of computation, such a machine could predict further consequences, and mankind could avoid serious circumstances, and much more. Humans are far stronger and more intelligent thanks to emotion, yet less efficient than machines. What if machines gained this special human feature? It would be the strongest addition to technology ever. To make that dream come true, the first step is to train a system to spot and recognize emotions; this is the start of an intelligent system. Intelligent systems are already becoming more efficient by predicting and classifying decisions in various aspects of practical life.
Authors | Approach | Success
A. Yao, D. Cai, P. Hu, S. Wang, L. Shan and Y. Chen (2016) | HoloNet: towards robust emotion recognition in the wild | Achieved a mean recognition rate of 57.84%.
Yelin Kim and Emily Mower Provost (2016) | Data-driven framework to explore patterns (timings and durations) of emotion evidence, specific to individual emotion classes | Achieved 65.60% UW accuracy, 1.90% higher than the baseline.

Table 2.1: Emotion recognition approaches and their successes
III. Future Scope
We are working towards a machine with emotions: a machine or system that can think like humans, feel warmth of heart, judge events, prioritize between choices, and carry many more emotional qualities. To make this dream a reality, we first need the machine or system to understand human emotions, imitate them and master them; we have only just started to do that. Some real examples already exist, and features and services such as Microsoft Cognitive Services are gaining popularity, but a lot of work is still required in terms of efficiency, accuracy and usability. Emotion recognition is therefore an area that will demand great attention in the future.
IV. Conclusions
In this paper, we discussed the work done on emotion recognition and the superior and novel approaches and methods used to achieve it, and proposed a glimpse of a probable solution and method for recognizing emotion. The work so far substantiates that emotion recognition using users' EEG signals together with audiovisual signals has the highest recognition rate and performance.
Research paper 2
Abstract— In this paper, we introduce methods for recognizing seven emotions as well as positive and negative emotion from facial images, and describe the development of apps based on these methods. Previous research used deep-learning technology to generate models of emotion-based facial expressions in order to recognize emotions. Existing apps express six emotions, but none presents seven emotions together with positive and negative scores as graphs and percentages. Thus, we recognize seven emotions, namely Angry, Disgust, Fear, Happy, Sad, Surprise and Neutral, and also classify the calculated emotion-recognition scores into positive, negative and neutral emotions. We then implemented an app that provides the user with the seven emotion scores and the positive and negative emotions.
Keywords— Emotion recognition, Positive and negative, Scores, Facial images, Deep learning.
V. CONCLUSION
In this paper, we proposed an emotion-recognition method using facial images and implemented an app that provides seven-emotion and positive/negative emotion-recognition results to users. When we applied these recognition methods in the app, the performance rate was 50.7% for the seven emotions and 72.3% for positive/negative. In the future, we will improve the recognition rate by adding more emotional databases and modifying parts of the deep-learning algorithm. In addition, our research will be extended to recognize the user's intention as well as the user's current emotion.
Research paper 3
I. INTRODUCTION
General expressions may include word choice, tone of voice, and body language such as posture and physiological responses. Among the observable emotional reactions, facial expressions usually change the distances between facial feature points when an emotion is excited. From the significant facial feature points, such as the positions of the eyebrows, eyes and lips, we can determine various facial expressions. Ekman and Friesen identified six basic human emotions, fear, surprise, sadness, anger, disgust and happiness, and their associated facial expressions.
Physiological signals can greatly help in assessing and quantifying stress, tension, anger and other emotions that influence health. In general, physiological reactions are involuntary responses of the nervous system, so three physiological signals were adopted in this paper. A novel approach is proposed to recognize four emotions (fear, love, joy and surprise) from facial expressions and physiological signals.
The remaining parts of this paper are organized as follows. Section II describes the mood-induction experiment setup for collecting facial and physiological features. Section III provides experimental results showing the effectiveness of the proposed system. Finally, conclusions are drawn in Section IV.
II. FACIAL EXPRESSION AND PHYSIOLOGICAL SIGNALS ACQUISITION
A series of specifically designed mood-induction events is performed to acquire facial expressions and physiological signals. The schematic diagram of the acquisition procedure is shown in Fig. 2.2.
Fig 2.2: The schematic diagram of the mood induction and signal acquisition environment.
III. RESULTS
A series of specifically designed emotion-induction experiments was performed to collect facial expression images and their corresponding physiological signals for four kinds of emotion (fear, love, joy and surprise) from six subjects. The physiological signal for each emotion lasts ten seconds, and the sampling rate of both the physiological signals and the camera is 20 Hz. Samples of the facial expression images are shown in Fig. 2.3; panels (a-d) show the facial images for "fear", "love", "joy" and "surprise", respectively. Samples of the physiological signals are also shown in Fig. 2.3.
Research paper 4
Introduction: Facial expressions are considered the most indicative, strong and natural way of knowing a person's psychological state during communication. Fifty-five percent of communication is carried by facial expressions. In a Facial Expression Recognition system, the image is processed to extract information that can help in recognizing the six universal expressions, namely neutral, happy, sad, angry, disgust and surprise. This processing is done in several phases, including image acquisition, feature extraction and finally expression classification using different techniques.
Fig 2.5: Different Upper Face Action Units & their Combinations
Fig 2.6: Different Lower Face Action Units & their Combinations
C) Parameterization:
1) Geometric-based parameterization:
This is one of the oldest techniques, in which tracking and processing are performed on selected points of the facial image. The spatial locations and shapes of these facial points serve as feature vectors for classifying the expressions (see the sketch after this section).
2) Appearance-based parameterization:
Instead of tracking the positions of spatial points, this approach uses movement parameters, which vary with time and with the colours of the related regions of the face. Features such as Haar wavelet coefficients and Gabor wavelets are used, along with feature extraction and selection techniques such as PCA and LDA.
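The short sketch below illustrates the geometric variant, assuming facial point coordinates are already available from some tracker; the five sample points, their values and the function name are purely illustrative.

```python
# Minimal sketch of geometric parameterization: pairwise distances between
# tracked facial points form the feature vector for expression classification.
import numpy as np

def geometric_features(landmarks):
    """landmarks: array of shape (n_points, 2) holding (x, y) positions."""
    diffs = landmarks[:, None, :] - landmarks[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))      # all pairwise distances
    upper = np.triu_indices(len(landmarks), k=1)    # drop zeros and duplicates
    return dists[upper]

# Hypothetical landmarks: two eye corners, nose tip, two mouth corners.
points = np.array([[30.0, 40.0], [70.0, 40.0], [50.0, 60.0],
                   [35.0, 80.0], [65.0, 80.0]])
print(geometric_features(points))  # 10 pairwise distances
```

Such distance vectors change as an expression deforms the face, which is what the classifier exploits.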
Challenges:
The first problem is the unavailability of databases containing spontaneous facial expressions, and creating such a database is one of the major challenges: if subjects know in advance that they are being recorded, their expressions do not remain natural.
The next problem is capturing these spontaneous expressions under different medical and lighting conditions; for such cases, the hidden-camera approach does not work well. Finding labelled data for training and testing is also a challenge in this field. Unlabelled data is easily available in huge amounts, but labelling data is a lengthy and complicated process that consumes a lot of time and carries a high chance of error.
Other factors also affect expressions: subjects belonging to different cultures (e.g. Asian or European) and different age groups show different expressions, and head angles and rotations are also a big concern.
Future scope:
A future extension of facial expression analysis could be the analysis of micro-expressions; at present, only a few training techniques work for micro-expressions. To avoid the problems with labelling data, semi-supervised learning techniques could be used, because they allow labelled data to be used together with unlabelled data.
Chapter 3: Proposed Solution
The Facial Expression Recognition system includes three major stages: face image preprocessing, feature extraction and classification.
First, an image is taken from the test database and the face is detected in it. Once the face is detected, important features such as the eyes, eyebrows and lips are extracted from the facial image. After extracting these features, the expression is classified by comparing the image with the images in the training dataset using a suitable algorithm, as sketched below.
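The following minimal outline shows how these stages compose; the four stage functions are placeholders for the concrete steps detailed in the sections that follow, not a fixed API.

```python
# High-level sketch of the FER pipeline. The stage functions are
# illustrative placeholders; concrete versions appear in later sections.
import cv2

def recognize_expression(image_path, detect_face, preprocess,
                         extract_features, classify):
    img = cv2.imread(image_path)        # read the test image
    face = detect_face(img)             # locate and crop the face region
    face = preprocess(face)             # normalize illumination and scale
    features = extract_features(face)   # derive the feature representation
    return classify(features)           # map the features to an expression
```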
I. Preprocessing
Preprocessing can be used to improve the performance of the FER system and is carried out before the feature extraction process. Image preprocessing includes several operations, such as image clarification and scaling, contrast adjustment, and additional enhancement processes that improve the expression frames. Cropping and scaling are performed on the face image, taking the nose as the midpoint so that the other important facial components are included. Normalization is a preprocessing method designed to reduce illumination and other variations of the face images, for example with a median filter, and to obtain an improved face image; it is also used for extracting eye positions, which makes the FER system more robust to individual differences and provides more clarity in the input images. Localization is a preprocessing method that uses the Viola-Jones algorithm to detect faces in the input image: the size and location of the face are detected using the AdaBoost learning algorithm and Haar-like features. Face alignment is a further preprocessing method and can be performed using the SIFT (Scale Invariant Feature Transform) flow algorithm: first a reference image is computed for each facial expression, and then all images are aligned to their related reference images. The histogram equalization method is used to overcome illumination variations; it is mainly used to enhance the contrast of face images and to improve the distinction between intensities.
Many preprocessing methods are used in FER, but ROI segmentation is particularly suitable because it accurately detects the facial organs that are mainly used for expression recognition. Histogram equalization is another important preprocessing technique for FER because it improves image contrast; see the sketch below.
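A minimal OpenCV sketch of these preprocessing steps is shown below, assuming the face region has already been cropped to a grayscale image; the kernel size and 48x48 output resolution are illustrative choices, not values taken from the report.

```python
# Sketch of the preprocessing stage: median filtering against illumination
# noise, histogram equalization for contrast, and scaling to a fixed size.
import cv2

def preprocess_face(gray_face, size=(48, 48)):
    face = cv2.medianBlur(gray_face, 3)   # median filter for noise reduction
    face = cv2.equalizeHist(face)         # histogram equalization for contrast
    return cv2.resize(face, size)         # scale to the classifier input size
```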
II. Feature extraction
Feature extraction is the next stage of the FER system: finding and describing the salient features of interest within an image for further processing. In image processing and computer vision, feature extraction is a significant stage, since it marks the move from a graphic to an implicit data representation; this representation can then be used as input to the classification. Feature extraction methods can be categorized into five types: texture feature-based, edge-based, global and local feature-based, geometric feature-based and patch-based methods.
III. Classification
Classification is the final stage of the FER system, in which the classifier categorizes the expression as smile, sad, surprise, anger, fear, disgust or neutral. A CNN embodies two important concepts, namely shared weights and sparse connectivity. In FER, CNN classifiers can be used as multiple classifiers for different face regions: if a CNN is framed for the entire face image, separate CNNs are then framed for the mouth area, the eye area, and likewise for each other area. A Deep Neural Network (DNN) contains several hidden layers, so more difficult functions can be trained efficiently compared with other neural networks, and CNN-based classifiers give better accuracy than other neural-network-based classifiers. The various FER techniques and their algorithms are analyzed with respect to the three important requirements: preprocessing, feature extraction and classification.
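The following is a minimal Keras sketch of such a CNN; the layer sizes and the 48x48 grayscale input are common assumptions for FER datasets, not the exact architecture analyzed above.

```python
# Minimal CNN sketch for seven-class expression recognition, showing the
# shared-weight convolution and pooling structure described above.
from tensorflow.keras import layers, models

def build_fer_cnn(input_shape=(48, 48, 1), num_classes=7):
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, 3, activation='relu'),
        layers.MaxPooling2D(2),
        layers.Conv2D(64, 3, activation='relu'),
        layers.MaxPooling2D(2),
        layers.Flatten(),
        layers.Dense(128, activation='relu'),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation='softmax'),
    ])
    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model
```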
Fig 3.2: Sample images from JAFFE Database
Fig 3.3: Sample images from CK Database
Flowchart of Facial Expression Recognition
The facial expression recognition system is trained using a supervised learning approach in which it takes images of different facial expressions. The system includes a training and a testing phase, each consisting of image acquisition, face detection, image preprocessing, feature extraction and classification. Face detection and feature extraction are carried out on the face images, which are then classified into six classes belonging to the six basic expressions, as outlined below:
1. Image Acquisition
The design starts with initializing the CNN model by taking an input image (static or dynamic). Images used for facial expression recognition are static images or image sequences, and face images can be captured using a camera, as sketched below.
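A minimal sketch of acquiring a single frame from the default camera with OpenCV follows; the output file name is an illustrative assumption.

```python
# Sketch of image acquisition: grab one frame from the default camera.
import cv2

cap = cv2.VideoCapture(0)    # open the default camera
ok, frame = cap.read()       # capture a single static image
cap.release()
if ok:
    cv2.imwrite('capture.jpg', frame)   # hypothetical output path
```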
2. Face detection
Face detection locates the facial region in the image. It is carried out on the training dataset using a Haar classifier, the Viola-Jones face detector, implemented through OpenCV. Haar-like features encode the difference in average intensity between different parts of the image and consist of connected black and white rectangles, where the value of a feature is the difference between the sums of pixel values in the black and white regions. A minimal detection sketch follows.
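The sketch below shows Viola-Jones detection through OpenCV's bundled Haar cascade; the detectMultiScale parameters are typical defaults, and the input file name carries over from the acquisition sketch.

```python
# Sketch of Viola-Jones face detection using OpenCV's Haar cascade.
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
img = cv2.imread('capture.jpg')                  # hypothetical input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    face_roi = gray[y:y + h, x:x + w]            # cropped face region
```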
3. Image Pre-processing
Image pre-processing includes the removal of noise and normalization against variations in pixel position or brightness. It comprises:
a) Color Normalization
b) Histogram Normalization
4. Feature Extraction
Selection of the feature vector is the most important part of a pattern classification problem. The pre-processed face image is used to extract the important features. The inherent problems in image classification include scale, pose, translation and variations in illumination level.
Features are extracted through the max-pooling method by creating a model saved with the .h5 extension and compiling it with a loss function and an optimizer. A Haar cascade in XML format is also imported here for face detection, as sketched below.
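The sketch below shows those steps under the assumption that `model` is the compiled CNN from Section III; the .h5 file name is illustrative.

```python
# Sketch of persisting the model with a .h5 extension, reloading it, and
# importing the Haar cascade XML used for face detection.
import cv2
from tensorflow.keras.models import load_model

model.save('fer_model.h5')          # 'model': the compiled CNN sketched above
model = load_model('fer_model.h5')  # reload architecture, weights, optimizer
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
```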
OpenCV: The Open Source Computer Vision Library provides a common infrastructure for computer vision applications and contains more than 2500 optimized algorithms, which are used here for face detection and for training and detecting objects.
Keras: Keras is an open-source neural network library in Python, used for preprocessing, modelling, evaluation and optimization. It offers a high-level API; the low-level graphs and computations are handled by the backend engine, in this case TensorFlow, which performs the convolutions and low-level tensor computation. Keras is designed for building a model with a loss function and an optimizer and for training it with the fit function. The Python libraries below are used for preprocessing, modelling, optimization and testing.
Python Libraries
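The report's original import list did not survive extraction; the set below is an assumed, typical one for the preprocessing, modelling, optimization and testing described in this chapter.

```python
import numpy as np                                    # array handling
import cv2                                            # face detection, preprocessing
from tensorflow.keras import layers, models           # CNN definition
from tensorflow.keras.models import load_model        # reloading .h5 models
from tensorflow.keras.preprocessing.image import ImageDataGenerator  # data feeding
```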
5. Classification
The dimensionality of the data obtained from feature extraction is very high, so it is reduced during classification. Features should take different values for objects belonging to different classes, which is what makes classification possible. Here emotions are classified as happy, sad, angry, surprise, neutral, disgust and fear, with 34,488 images in the training dataset and 1,250 in the testing dataset. Each emotion is expressed through different facial features, such as the eyebrows, an opened mouth, raised cheeks, wrinkles around the nose, wide-open eyelids and many others.
The large dataset is trained for better accuracy; the result is the object class for an input image. Based on those features, the network applies convolution and max-pooling layers, as in the training sketch below.
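A minimal training sketch follows, under the assumed directory-per-class layout data/train/<emotion>/...; the batch size and epoch count are illustrative, and `model` is the compiled CNN sketched earlier.

```python
# Sketch of training the CNN on a directory-per-class image tree.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_gen = ImageDataGenerator(rescale=1.0 / 255).flow_from_directory(
    'data/train', target_size=(48, 48), color_mode='grayscale',
    class_mode='categorical', batch_size=64)
model.fit(train_gen, epochs=30)   # 'model': the compiled CNN from above
```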
6. System Evaluation
Finally, the trained system is evaluated on the testing dataset, as sketched below.
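A minimal evaluation sketch is given below, assuming the model file and test-set layout carried over from the earlier sketches.

```python
# Sketch of evaluating the saved model on the held-out testing images.
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing.image import ImageDataGenerator

model = load_model('fer_model.h5')               # hypothetical saved model
test_gen = ImageDataGenerator(rescale=1.0 / 255).flow_from_directory(
    'data/test', target_size=(48, 48), color_mode='grayscale',
    class_mode='categorical', batch_size=64, shuffle=False)
loss, acc = model.evaluate(test_gen)
print(f'Test accuracy: {acc:.3f}')
```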
ACKNOWLEDGEMENT
We would like to express our special thanks and gratitude to our mentor, Professor Srindi Gindi, who guided us and gave us the opportunity to work on this wonderful project on the topic "Emotion Recognition with Consideration of Facial Expression", which also helped us in doing research and learning about many new things. We would also like to thank our HOD, Er. Zainab Mirza, for providing the opportunity to implement our project. We are very thankful to them. Finally, we would like to thank our parents and friends, who helped us a lot in finalizing this project within the limited time frame.