Papers by Philippe Pasquier
Journal of New Music Research, May 22, 2019
Autonomously generating artificial soundscapes for video games, virtual reality, and sound art presents several non-trivial challenges. We outline a system called Audio Metaphor that is built upon the notion that sound design for soundscape compositions is emotionally informed. We first define the problem space of generating soundscape compositions with reference to the sound design and soundscape literature. Next, we survey state-of-the-art soundscape generation systems and establish the characteristics and challenges for evaluating these types of systems. We then describe the Audio Metaphor system, which models the soundscape generation problem using a method of soundscape emotion recognition and segmentation based on perceptual classes, together with an autonomous mixing engine utilising optimisation and prediction algorithms to generate a soundscape composition. We evaluate the soundscape compositions generated by Audio Metaphor by comparing them with those created by a human expert and with those generated randomly. Our analysis of the evaluation study reveals that the proposed soundscape generation model is human-competitive with respect to semantic and emotion-based indicators.
Zenodo (CERN European Organization for Nuclear Research), Jun 7, 2022
Soundscape composition and design is the creative practice of processing and combining sound recordings to evoke auditory associations and memories within a listener. We present a new set of classification and segmentation algorithms as part of Audio Metaphor (AUME), a generative system for creating novel soundscape compositions. Audio Metaphor processes natural language queries from a user to retrieve semantically linked sound recordings from a database containing 395,541 audio files. Building on previous work, we implemented a new audio feature extractor and conducted experiments to test the accuracy of the updated system. We then classified audio files based on general soundscape composition categories, improved emotion prediction, and refined our segmentation algorithm. The model maintains good accuracy in segment classification, and we significantly improved the valence and arousal prediction models, as indicated by the r-squared values (72.2% and 92.0%) and mean squared errors (0.09 and 0.03) for valence and arousal, respectively. An empirical analysis, among other improvements, finds that the new system provides better segmentation results.
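As a side note, the r-squared and mean squared error figures reported above are standard regression metrics; the following minimal sketch shows how such numbers are typically computed with scikit-learn. The arrays here are placeholder values, and the actual AUME feature extractor and prediction models are not described in this abstract.

```python
# Sketch: computing r-squared and MSE for valence/arousal regression outputs.
# The ground-truth and predicted values below are illustrative placeholders.
from sklearn.metrics import mean_squared_error, r2_score

valence_true = [0.2, 0.5, 0.8, 0.4]
valence_pred = [0.25, 0.45, 0.75, 0.5]
arousal_true = [0.6, 0.3, 0.9, 0.1]
arousal_pred = [0.58, 0.35, 0.85, 0.15]

for name, y_true, y_pred in [("valence", valence_true, valence_pred),
                             ("arousal", arousal_true, arousal_pred)]:
    print(f"{name}: R2={r2_score(y_true, y_pred):.3f}, "
          f"MSE={mean_squared_error(y_true, y_pred):.3f}")
```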
arXiv (Cornell University), Feb 4, 2021
The ability to explain decisions to end-users is a necessity for deploying AI as critical decision support. Yet making AI explainable to non-technical end-users is a relatively overlooked and challenging problem. To bridge the gap, we first identify twelve end-user-friendly explanatory forms that do not require technical knowledge to comprehend, including feature-, example-, and rule-based explanations. We then instantiate the explanatory forms as prototyping cards in four AI-assisted critical decision-making tasks and conduct a user study to co-design low-fidelity prototypes with 32 layperson participants. The results confirm the relevance of using explanatory forms as building blocks of explanations and identify their properties: pros, cons, applicable explanation goals, and design implications. The explanatory forms, their properties, and prototyping supports (including a suggested prototyping process, design templates and exemplars, and associated algorithms to actualize explanatory forms) constitute the End-User-Centered explainable AI framework EUCA, which is available at http://weinajin.github.io/end-user-xai. It serves as a practical prototyping toolkit for HCI/AI practitioners and researchers to understand user requirements and build end-user-centered explainable AI.
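To make two of the explanatory forms named above concrete, the sketch below shows a generic feature-based explanation (a model's most influential features) and an example-based explanation (retrieving the most similar known case). This is an illustrative stand-in, not EUCA's own algorithms; the dataset and model choices are assumptions.

```python
# Sketch: feature-based and example-based explanations on a toy dataset.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import NearestNeighbors

data = load_breast_cancer()
X, y = data.data, data.target
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Feature-based explanation: which features matter most to the model overall.
top = np.argsort(model.feature_importances_)[::-1][:3]
print("Most influential features:", [data.feature_names[i] for i in top])

# Example-based explanation: show a similar case the model has seen before.
query = X[0:1]
_, idx = NearestNeighbors(n_neighbors=1).fit(X).kneighbors(query)
print("Most similar known case has label:", data.target_names[y[idx[0][0]]])
```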
Zenodo (CERN European Organization for Nuclear Research), Jun 7, 2022
The development of computer-assisted composition (CAC) systems is a research activity that dates back at least to IRCAM's work on OpenMusic [1]. CAC is a field concerned with developing systems capable of partially or completely automating the process of music composition. There are several compositional tasks a system can address (e.g. rhythm generation, harmonization, melody generation), and these tasks can be realized with machine learning (ML) algorithms, with or without conditioning on prior musical sequences. Many ML-based CAC systems have emerged from both academia and industry over the years [2, 3]. For the majority of them, the user continuously generates music by tweaking a set of parameters that influences the model's generation. Building on top of Apollo, an interactive web environment that makes corpus-based music algorithms available for training and generation via a convenient graphical interface [4], Calliope is specialized for advanced MIDI manipulation in the browser and for generative controllability of the Multi-Track Music Machine (MMM) model [5] for batch generation of partial or complete multi-track compositions. The aim is to enable composers to effectively co-create with a generative system. Calliope is built with Node.js, the Web stack (HTML, CSS, JavaScript), and MongoDB, and is made interoperable with the pretrained MMM model via the Python runtime. MMM offers both global-level deep learning parameters (e.g. temperature) and track-level music-based constraint parameters: note density, polyphony range, and note duration range. Bar selection can be used to refine the request for generation. It is also possible to delete or add MIDI tracks in an existing MIDI file in order to generate on a subset of the tracks or to generate a new track for a given composition. The composer uses these varied controls to steer the generative behaviour of the model and guide the composition process. Batch generation of musical outputs is implemented via MMM's Python interface, which offers batch support natively. This means the composer can rapidly explore alternatives, including generating from a previously generated output, for a given set of control parameters. We have tested batch requests of 5 and up to 1000 generated music excerpts at a time.
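The sketch below illustrates the shape of the batch generation request described above (selected bars, temperature, and track-level constraints). The class and field names are hypothetical stand-ins; the actual MMM Python interface and Calliope's request format are not specified in this abstract.

```python
# Sketch: a hypothetical request object for constrained, batched generation.
from dataclasses import dataclass, field

@dataclass
class TrackConstraints:
    note_density: int             # coarse bucket, e.g. 0 (sparse) to 9 (dense)
    polyphony_range: tuple        # (min, max) simultaneous notes
    note_duration_range: tuple    # (min, max) duration in beats

@dataclass
class GenerationRequest:
    midi_file: str                # existing multi-track MIDI to generate from
    selected_bars: list           # bar indices to (re)generate
    temperature: float            # global sampling parameter
    tracks: dict = field(default_factory=dict)   # track index -> TrackConstraints
    batch_size: int = 5           # number of alternative excerpts to return

request = GenerationRequest(
    midi_file="sketch.mid",       # hypothetical file
    selected_bars=[4, 5, 6, 7],
    temperature=0.9,
    tracks={2: TrackConstraints(note_density=6,
                                polyphony_range=(1, 4),
                                note_duration_range=(0.25, 2.0))},
    batch_size=20,
)
# A Calliope-like front end would serialize such a request, hand it to the
# MMM runtime, and receive `batch_size` candidate MIDI excerpts in return.
print(request)
```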
One of the most important decisions made when training neural networks is how to represent the data. Despite the large number of possible representations, the piano roll dominates recent literature on modelling music [2, 3]. Furthermore, previous work suggests that modelling simpler conditional probability distributions, such as p(pitch|rhythm), may be an advantageous approach to the complex task of modelling music [5]. However, many alternative representations are more difficult to implement than the piano roll. Motivated by these factors, we propose an accessible framework for symbolic musical data storage and dataset construction which supports a variety of representations.
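For readers unfamiliar with the trade-off, here is a minimal sketch (not the paper's framework) contrasting an event-based note representation with a piano roll rendered from the same data; a storage format that keeps the event list can derive either view.

```python
# Sketch: event-based notes vs. a piano-roll view derived from them.
import numpy as np

# Event-based storage: one record per note (pitch, onset step, duration steps).
notes = [(60, 0, 4), (64, 4, 2), (67, 6, 2), (72, 8, 8)]

def to_piano_roll(notes, n_steps=16, n_pitches=128):
    """Render note events into a binary piano roll of shape (n_pitches, n_steps)."""
    roll = np.zeros((n_pitches, n_steps), dtype=np.uint8)
    for pitch, onset, duration in notes:
        roll[pitch, onset:onset + duration] = 1
    return roll

roll = to_piano_roll(notes)
print(roll.shape, int(roll.sum()))  # (128, 16) and the number of active cells
```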
Video editors face the challenge of montage editing when dealing with massive amounts of video shots. The major problem is selecting the features they want to use for building repetition patterns in montage editing: testing various features for repetitions and watching videos one by one is time-consuming. A visualization tool for video features could assist montage editing, yet such a tool is not currently available. We present the design of ViVid, an interactive system for visualizing video features for particular target videos. ViVid is a generic tool for computer-assisted montage and for the design of generative video arts, which can take advantage of video feature information for rendering the piece. The system computes and visualizes colour, motion, and texture information. Instead of visualizing the original feature data frame by frame, we re-arranged the data and used both statistics of the video feature data and fr...
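As a small illustration of the kind of per-frame colour statistic such a tool could visualize, the sketch below computes the mean colour of each frame with OpenCV. The exact features and statistics used by ViVid are not detailed in this abstract, and the input file name is hypothetical.

```python
# Sketch: mean B, G, R per frame of a video, a simple colour-information signal.
import cv2
import numpy as np

def mean_color_per_frame(path):
    """Return an array of shape (n_frames, 3) with mean B, G, R per frame."""
    cap = cv2.VideoCapture(path)
    means = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        means.append(frame.reshape(-1, 3).mean(axis=0))
    cap.release()
    return np.array(means)

if __name__ == "__main__":
    stats = mean_color_per_frame("example_shot.mp4")  # hypothetical file
    print(stats.shape)
```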
We propose CAEMSI, a cross-domain analytic evaluation methodology for Style Imitation (SI) systems, based on a set of statistical significance tests that allow hypotheses comparing two corpora to be tested. Typically, SI systems are evaluated using human participants; however, this type of approach has several weaknesses. For humans to provide reliable assessments of an SI system, they must possess a sufficient degree of domain knowledge, which can place significant limitations on the pool of participants. Furthermore, both human bias against computer-generated artifacts and the variability of participants' assessments call the reliability of the results into question. Most importantly, the use of human participants places limitations on the number of generated artifacts and SI systems which can feasibly be evaluated. Directly motivated by these shortcomings, CAEMSI provides a robust and scalable approach to the evaluation problem. Normalized Compression Distance, a domain-independent ...
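Normalized Compression Distance (NCD), mentioned above, is a standard domain-independent similarity measure; the sketch below is the usual zlib-based formulation, not CAEMSI's own implementation, shown for readers unfamiliar with it.

```python
# Sketch: Normalized Compression Distance with zlib.
# NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
# where C(.) is the compressed length of a byte string.
import zlib

def ncd(x: bytes, y: bytes) -> float:
    cx = len(zlib.compress(x))
    cy = len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

# Artifacts sharing regularities compress well together, giving a lower NCD.
a = b"ABABABABABABABABABAB"
b = b"ABABABABABABABABABAA"
c = b"XQZRTNMPLOKIJUHYGTFV"
print(ncd(a, b))  # relatively low
print(ncd(a, c))  # relatively high
```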
2018 IEEE Games, Entertainment, Media Conference (GEM), 2018
Cybersickness, which is also called Virtual Reality (VR) sickness, poses a significant challenge to the VR user experience. Previous work demonstrated the viability of predicting cybersickness for 360° VR videos. Is it possible to automatically predict the level of cybersickness for interactive VR games? In this paper, we present a machine learning approach to automatically predict the level of cybersickness for VR games. First, we proposed a novel ranking-rating (RR) score to measure the ground-truth annotations for cybersickness. We then verified the RR scores by comparing them with Simulator Sickness Questionnaire (SSQ) scores. Next, we extracted features from heterogeneous data sources, including the VR visual input, head movement, and individual characteristics. Finally, we built three machine learning models and evaluated their performance: a Convolutional Neural Network (CNN) trained from scratch, a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) trained from scratch, and Support Vector Regression (SVR). The results indicated that the best performance in predicting cybersickness was obtained by the LSTM-RNN, providing a viable solution for automatic cybersickness prediction in interactive VR games.
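A hedged sketch of the simplest of the three model families named above is given here: Support Vector Regression mapping summary motion features to a sickness score. The features and targets are synthetic placeholders; the paper's actual feature extraction and RR-score annotations are not reproduced.

```python
# Sketch: an SVR baseline for regressing a sickness score from session features.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))                          # e.g. per-session head-motion statistics
y = X[:, 0] * 0.7 + rng.normal(scale=0.2, size=200)    # synthetic sickness score

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0))
model.fit(X_tr, y_tr)
print("test R^2:", round(model.score(X_te, y_te), 3))
```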
Proceedings of the 13th International Conference on Advances in Computer Entertainment Technology, 2016
A music video (MV) is a videotaped performance of a recorded popular song, usually accompanied by dancing and visual images. In this paper, we outline the design of a generative music video system that automatically generates an audio-video mashup for a given target audio track. The system first segments the target song based on beat detection. Next, video segments are selected according to audio similarity analysis and a colour-based heuristic selection method. These video segments are then truncated to match the length of the audio segments and concatenated into the final music video. An evaluation of our system shows that users are receptive to this novel presentation of music videos and are interested in future developments.
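The beat-detection-based segmentation step can be sketched as follows, assuming librosa is available; this is illustrative only, and the paper's exact segmentation procedure and the file name used are not given in the abstract.

```python
# Sketch: beat-aligned segmentation of a target song with librosa.
import librosa

def beat_segments(audio_path):
    """Return (start, end) times in seconds for consecutive beat-aligned segments."""
    y, sr = librosa.load(audio_path)
    tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
    beat_times = librosa.frames_to_time(beat_frames, sr=sr)
    return list(zip(beat_times[:-1], beat_times[1:]))

if __name__ == "__main__":
    segments = beat_segments("target_song.mp3")  # hypothetical file
    print(f"{len(segments)} segments; first: {segments[0] if segments else None}")
```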
ArXiv, 2021
Fig. 1. End-user-friendly explanatory forms in the EUCA framework. The explanatory forms are shown in the grids on the right, each accompanied by a prototyping card example from the four tasks used in the user study. The forms are a familiar language to both AI designers and end-users, thus overcoming the technical communication barrier between the two. The 12 explanatory forms are grouped into four categories to explain the AI's prediction on a new data point (the red dot in the leftmost 2D feature-space plot) or the model's overall behavior: explaining using features, examples, rules, and supplementary information. These categories correspond to different aspects of showing the AI's learned representations at the feature, instance, and decision-boundary level, as indicated in the plot.
ICCC, 2016
Choreography is an embodied and complex creative process that often relies on 'co-imagining' as a strategy for generating new movement ideas. Technology has historically been used as a tool to augment creative opportunities in the choreographic process, with multiple choreographic support tools designed to function as a 'blank slate' for choreography. However, few of these tools support creative authoring with interactive or generative components. Cochoreo is a sub-module of the movement sketching tool idanceForms (idF) that generates body positions as keyframes to catalyze creative movement. Cochoreo catalyzes movement sketching by using parameters from Laban Movement Analysis, an existing movement framework, to generate unique keyframes that serve as seed material for the choreographic process. idF is a creativity support tool that engages with choreographers' creative movement process by design. This paper presents the design of Cochoreo and evaluations from our pilot study with university dance students.
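Purely as an illustration of generative keyframe creation, the sketch below samples joint angles for a simplified figure, biased by a single LMA-inspired parameter. The joint set, value ranges, and the "weight_effort" parameter are hypothetical and are not taken from Cochoreo's actual implementation.

```python
# Sketch: sampling a body-position keyframe under a hypothetical LMA-inspired bias.
import random

JOINTS = ["left_elbow", "right_elbow", "left_knee", "right_knee", "spine"]

def generate_keyframe(weight_effort, seed=None):
    """Return a mapping of joint name -> angle in degrees.

    weight_effort in [0, 1]: higher values bias joints toward more flexed
    ('heavier') positions, loosely echoing an LMA Effort quality.
    """
    rng = random.Random(seed)
    keyframe = {}
    for joint in JOINTS:
        base = rng.uniform(0, 150)                            # unconstrained random pose
        keyframe[joint] = base * (1.0 - 0.5 * weight_effort)  # flex more when "heavy"
    return keyframe

print(generate_keyframe(weight_effort=0.8, seed=42))
```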
We present MoComp, an interactive visualization tool that allows users to identify and understand differences in motion between two takes of motion capture data. In MoComp, body part position and motion are visualized with a focus on the angles of the joints making up each body part. This makes the tool useful for between-take and even between-subject comparison of particular movements, since the angle data is independent of the size of the captured subject.
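The size-independence claim follows from how a joint angle is computed; the sketch below (assumed, not MoComp's code) derives the angle at a joint from three 3D marker positions and shows that it is unchanged when the skeleton is uniformly scaled.

```python
# Sketch: joint angle from three 3D positions, invariant to subject size.
import numpy as np

def joint_angle(parent, joint, child):
    """Angle in degrees at `joint` formed by the segments joint->parent and joint->child."""
    u = np.asarray(parent, float) - np.asarray(joint, float)
    v = np.asarray(child, float) - np.asarray(joint, float)
    cos_theta = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

shoulder, elbow, wrist = [0, 0, 0], [0.3, 0, 0], [0.3, -0.25, 0]
print(joint_angle(shoulder, elbow, wrist))                                 # ~90 degrees
print(joint_angle(*(np.array(p) * 2 for p in (shoulder, elbow, wrist))))  # same angle, larger subject
```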
BCS Learning & Development, Jul 1, 2013
Respire is a virtual environment presented on a head-mounted display with generative sound, built upon our previous work Pulse Breath Water. The system follows changes in the user's breathing patterns and, in response, generates changes in the audio and the virtual environment. The piece is built upon mindfulness-based design principles, with a focus on the breath as the primary object of the user's attention, and employs various approaches to augmenting breathing in the virtual environment.
Proceedings of the 2017 CHI Conference Extended Abstracts on Human Factors in Computing Systems, 2017
Never Alone (2016) is a generative large-scale urban screen video-sound installation that presents the idea of generative choreographies amongst multiple video agents, or "digital performers". This generative installation questions how we navigate urban spaces, and the ubiquity and disruptive nature of encounters within the city's landscapes. The video agents explore precarious movement paths along the façade, inhabiting landscapes that are both architectural and emotional.
Lecture Notes in Computer Science, 2017
We introduce Pulse Breath Water, an immersive virtual environment (VE) with affect estimation in sound. We employ embodied interaction between a user and the system by mapping the user's breathing frequencies to the system's behaviour. In this study we investigate how two different mappings (metaphoric and "reverse") of embodied interaction design might enhance the affective properties of the presented system. We build on previous work in embodied cognition, embodied interaction, and affect estimation in sound by examining the impact of affective audiovisuals and the two kinds of interaction mapping on the user's engagement, affective states, and overall experience. The insights gained through questionnaires and semi-structured interviews are discussed in the context of participants' lived experience and the limitations of the system to be addressed in future work.
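The two mapping directions named above can be illustrated with a minimal sketch: a "metaphoric" mapping where faster breathing drives a more agitated parameter, and a "reverse" mapping that inverts it. The target parameter and the value ranges here are hypothetical, not the paper's implementation.

```python
# Sketch: metaphoric vs. reverse mapping of breathing rate to an audio parameter.
def map_breath_to_intensity(breaths_per_minute, mode="metaphoric",
                            low=6.0, high=30.0):
    """Map breathing rate to a normalized audio 'intensity' in [0, 1]."""
    t = (breaths_per_minute - low) / (high - low)
    t = max(0.0, min(1.0, t))
    return t if mode == "metaphoric" else 1.0 - t

print(map_breath_to_intensity(10))                   # calm breath -> low intensity
print(map_breath_to_intensity(10, mode="reverse"))   # same breath -> high intensity
```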
Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, 2020
Attending to breath is a self-awareness practice that exists within many contemplative and reflective traditions and is recognized for its benefits to well-being. Our current technological landscape embraces a large body of systems that utilize breath data in order to foster self-awareness. This paper seeks to deepen our understanding of the design space of systems that perceptually extend breath awareness. Our contribution is twofold: (1) our analysis reveals how the underlying theoretical frameworks shape system design and evaluation, and (2) it shows how system design features support the perceptual extension of breath awareness. We review and critically analyze 31 breath-based interactive systems, identifying 4 theoretical frameworks and 3 design strategies for interactive systems that perceptually extend breath awareness. We reflect upon this design space from both a theoretical and a system design perspective, and propose future design directions for developing systems that "listen to" breath and perceptually extend it.
Proceedings of Artificial Intelligence and Interactive Digital Entertainment (AIIDE'13) workshops, Boston, MA, October 14-15th, 2013.