Papers by Louis-Philippe Morency
Abstract With more than 10,000 new videos posted online every day on social websites such as YouTube and Facebook, the internet is becoming an almost infinite source of information. One crucial challenge for the coming decade is to be able to harvest relevant information from this constant flow of multimodal data.
Abstract Face-to-face communication is a highly dynamic process where participants mutually exchange and interpret linguistic and gestural signals. Even when only one person speaks at a time, other participants exchange information continuously amongst themselves and with the speaker through gesture, gaze, posture and facial expressions.
Abstract There are a multitude of annotated behavior corpora (manual and automatic annotations) available as research expands in multimodal analysis of human behavior. Despite the rich representations within these datasets, search strategies are limited with respect to the advanced representations and complex structures describing human interaction sequences. The relationships amongst human interactions are structural in nature.
Dialogue act labels are used to represent a higher level intention of utterances during human conversation (Stolcke et al., 2000). Automatic dialogue act recognition is still an active research topic. The conventional approach is to train one generic classifier using a large corpus of annotated utterances (Stolcke et al., 2000). One aspect that makes it so challenging is that people can express the same intentions using a very different set of spoken words.
In this work we investigate the use of multimodal semi-supervised learning to train a classifier which detects user agreement during a dialog with a robotic agent. Separate 'views' of the user's agreement are given by head nods and keywords in the user's speech. We develop a co-training algorithm for the gesture and speech classifiers to adapt each classifier to a particular user and increase recognition performance. Multimodal co-training allows user-adaptive models without labeled training data for that user.
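The co-training loop described above can be sketched in a few lines. This is a generic illustration with a toy one-dimensional nearest-centroid classifier, not the authors' implementation; the two views' features, the confidence measure, and the threshold are all hypothetical:

```python
from statistics import mean

class CentroidClassifier:
    """Toy 1-D nearest-centroid classifier, used only for illustration."""
    def fit(self, xs, ys):
        self.c0 = mean(x for x, y in zip(xs, ys) if y == 0)
        self.c1 = mean(x for x, y in zip(xs, ys) if y == 1)

    def predict_conf(self, x):
        d0, d1 = abs(x - self.c0), abs(x - self.c1)
        label = 0 if d0 < d1 else 1
        conf = abs(d0 - d1) / (d0 + d1 + 1e-9)  # 0 = ambiguous, 1 = certain
        return label, conf

def cotrain(labeled, unlabeled, rounds=3, thresh=0.5):
    """Co-training sketch: the gesture-view and speech-view classifiers
    label confident unlabeled samples for each other."""
    xg, xs, y = (list(v) for v in labeled)   # gesture feats, speech feats, labels
    pool = list(unlabeled)                    # unlabeled (gesture, speech) pairs
    g_clf, s_clf = CentroidClassifier(), CentroidClassifier()
    for _ in range(rounds):
        g_clf.fit(xg, y)
        s_clf.fit(xs, y)
        rest = []
        for g, s in pool:
            lg, cg = g_clf.predict_conf(g)
            ls, cs = s_clf.predict_conf(s)
            if max(cg, cs) >= thresh:        # one view is confident: teach both
                xg.append(g); xs.append(s)
                y.append(lg if cg >= cs else ls)
            else:
                rest.append((g, s))
        pool = rest
    return g_clf, s_clf
```

Each round moves confidently pseudo-labeled samples from the unlabeled pool into the training set, which is how a model can adapt to a new user without hand-labeled data for that user.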
In typical communication situations, it is desirable to avoid any type of simultaneous talking due to lack of coordination between communicators, as it is not easy to maintain sufficient mutual clarity over conversation at the same time. Researchers have long commented on the lack of coordination in the turn-taking of conversation partners [2]. In our previous study [5] we mainly investigated people's self-disclosure in the interview interaction with real human videos and virtual agents.
Abstract—Many real-world face and gesture datasets are by nature imbalanced across classes. Conventional statistical learning models (e.g., SVM, HMM, CRF), however, are sensitive to imbalanced datasets. In this paper we show how an imbalanced dataset affects the performance of a standard learning algorithm, and propose a distribution-sensitive prior to deal with the imbalanced data problem.
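One common, generic remedy in this spirit (not the paper's specific distribution-sensitive prior) is to reweight training examples by inverse class frequency, so that rare classes contribute as much to the training objective as common ones:

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Per-example weights that upweight rare classes so every class
    contributes equally to a weighted training objective.
    Generic imbalance remedy, not the paper's proposed prior."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    # weight n / (k * count) makes each class's total weight equal to n / k
    return [n / (k * counts[y]) for y in labels]
```

The weights sum to the number of examples, so a weighted loss keeps the same overall scale while the minority class is no longer drowned out.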
Hesitations and pause fillers (e.g., “um”, “uh”) occur frequently in everyday conversations or monologues. They can be observed for a wide range of reasons, including lexical access, structuring of utterances, and requesting feedback from the listener [1]. In this study we investigate the usefulness of pause fillers as a feature for the prediction of backchannels using conditional random fields (CRF) [2] within a large corpus of interactions.
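As a rough illustration of how pause fillers might enter such a model as per-token features (the filler lexicon and windowing below are assumptions, and the CRF itself is omitted):

```python
# Hypothetical filler lexicon; the study's actual inventory may differ.
FILLERS = {"um", "uh", "er", "hmm"}

def filler_features(tokens, window=1):
    """Binary per-token feature: 1 if a pause filler occurs within
    `window` tokens of position i. Sequences of such features would
    feed a sequence labeler like a CRF."""
    is_filler = [t.lower().strip(".,") in FILLERS for t in tokens]
    feats = []
    for i in range(len(tokens)):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        feats.append(int(any(is_filler[lo:hi])))
    return feats
```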
Humans often share personal information with others in order to create social connections. Sharing personal information is especially important in counseling interactions [2]. Research studying the relationship between intimate self-disclosure and human behavior critically informs the development of virtual agents that create rapport with human interaction partners. One significant example of this application is using virtual agents as counselors in psychotherapeutic situations.
Abstract In this work we study the effectiveness of speaker adaptation for dialogue act recognition in multiparty meetings. First, we analyze idiosyncrasy in dialogue verbal acts by qualitatively studying the differences and conflicts among speakers and by quantitatively comparing speaker-specific models. Based on these observations, we propose a new approach for dialogue act recognition based on reweighted domain adaptation which effectively balances the influence of speaker-specific and other speakers' data.
Abstract Be it in our workplace or with our family or friends, negotiation comprises a fundamental fabric of our everyday life, and it is apparent that a system that can automatically predict negotiation outcomes will have substantial implications. In this paper, we focus on finding nonverbal behaviors that are predictive of immediate outcomes (acceptances or rejections of proposals) in a dyadic negotiation.
Abstract In this study, we investigate low-level predictors from audio and writing modalities for the separation and identification of socially dominant leaders and experts within a study group. We use a multimodal dataset of situated computer-assisted group learning tasks: groups of three high-school students solve a number of mathematical problems in two separate sessions.
During face-to-face conversation, people use visual feedback to communicate relevant information and to synchronize rhythm between participants. A good example of nonverbal feedback is head nodding and its use for visual grounding, turn-taking and answering yes/no questions. When recognizing visual feedback, people use more than their visual perception. Knowledge about the current topic and expectations from previous utterances help guide their visual perception in recognizing nonverbal cues.
Modern virtual agents require knowledge about their environment, the interaction itself, and their interlocutors' behavior in order to be able to show appropriate nonverbal behavior as well as to adapt dialog policies accordingly. Recent achievements in the area of automatic behavior recognition and understanding can provide information about the interactants' multimodal nonverbal behavior and subsequently their affective states.
Abstract Human emotion is an important part of human-human communication, since the emotional state of an individual often affects the way that he/she reacts to others. In this paper, we present a method based on concatenated Hidden Markov Model (co-HMM) to infer the dimensional and continuous emotion labels from audio-visual cues. Our method is based on the assumption that continuous emotion levels can be modeled by a set of discrete values.
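The stated assumption, that continuous emotion levels can be modeled by a set of discrete values, amounts to quantizing each continuous label into a fixed number of bins that a discrete-state model can handle, then mapping predictions back to bin centers. A minimal sketch (the bin count and the [-1, 1] value range are assumptions, not the paper's settings):

```python
def quantize(values, levels=5, lo=-1.0, hi=1.0):
    """Map continuous emotion values in [lo, hi] to discrete bin indices
    0..levels-1, suitable as states/labels for a discrete model."""
    step = (hi - lo) / levels
    return [min(levels - 1, max(0, int((v - lo) / step))) for v in values]

def dequantize(indices, levels=5, lo=-1.0, hi=1.0):
    """Map bin indices back to the continuous bin-center values."""
    step = (hi - lo) / levels
    return [lo + (i + 0.5) * step for i in indices]
```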
Abstract—A wide number of problems in face and gesture analysis involve the labeling of temporal sequences. In this paper, we introduce a discriminative model for such sequence labeling tasks. This model involves two layers of latent dynamics, each with their separate roles. The first layer, the neural network or gating layer, aims to extract non-linear relationships between input data and output labels.
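A gating layer of this kind is, in essence, a sigmoid hidden layer that maps each input frame to non-linear features before the second, dynamic layer models temporal structure. A dependency-free sketch (the weights, shapes, and activation are hypothetical, not the paper's parameterization):

```python
import math

def gating_layer(x, W, b):
    """Sigmoid hidden layer: for each hidden unit with weight row `row`
    and bias `bi`, output sigmoid(row . x + bi). The resulting non-linear
    features would feed the temporal (dynamics) layer of the model."""
    return [
        1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(row, x)) + bi)))
        for row, bi in zip(W, b)
    ]
```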
Abstract One of the main challenges in facial expression recognition is illumination invariance. Our long-term goal is to develop a system for automatic facial expression recognition that is robust to light variations. In this paper, we introduce a novel 3D Relightable Facial Expression (ICT-3DRFE) database that enables experimentation in the fields of both computer graphics and computer vision.
Abstract We present 3D Constrained Local Model (CLM-Z) for robust facial feature tracking under varying pose. Our approach integrates both depth and intensity information in a common framework. We show the benefit of our CLM-Z method in both accuracy and convergence rates over the regular CLM formulation through experiments on publicly available datasets. Additionally, we demonstrate a way to combine a rigid head pose tracker with CLM-Z that benefits rigid head tracking.
During face-to-face communication, people continuously exchange para-linguistic information such as their emotional state through facial expressions, posture shifts, gaze patterns and prosody. These affective signals are subtle and complex. In this paper, we propose to explicitly model the interaction between the high level perceptual features using Latent-Dynamic Conditional Random Fields.
This work is part of a research effort to understand and characterize the morphological and dynamic features of polite and amused smiles. We analyzed a dataset consisting of young adults (n = 61), interested in learning about banking services, who met with a professional banker face-to-face in a conference room while both participants' faces were unobtrusively recorded.