
Automatic Prediction of Cybersickness for Virtual Reality Games

2018, 2018 IEEE Games, Entertainment, Media Conference (GEM)

Cybersickness, which is also called Virtual Reality (VR) sickness, poses a significant challenge to the VR user experience. Previous work demonstrated the viability of predicting cybersickness for VR 360° videos. Is it possible to automatically predict the level of cybersickness for interactive VR games? In this paper, we present a machine learning approach to automatically predict the level of cybersickness for VR games. First, we proposed a novel ranking–rating (RR) score to measure the ground-truth annotations for cybersickness. We then verified the RR scores by comparing them with the Simulator Sickness Questionnaire (SSQ) scores. Next, we extracted features from heterogeneous data sources including the VR visual input, the head movement, and the individual characteristics. Finally, we built three machine learning models and evaluated their performances: a Convolutional Neural Network (CNN) trained from scratch, a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) trained from scratch, and Support Vector Regression (SVR). The results indicated that the LSTM-RNN achieved the best performance in predicting cybersickness, providing a viable solution for automatic cybersickness prediction for interactive VR games.

Weina Jin, Jianyu Fan, Diane Gromala, Philippe Pasquier
School of Interactive Arts and Technology, Simon Fraser University, Surrey, Canada
{weinaj, jianyuf, gromala, pasquier}@sfu.ca

Keywords: Machine Learning; Virtual Reality; Cybersickness.

I. INTRODUCTION

With the release of consumer Virtual Reality (VR) products, VR has gained popularity and is now widely used in many fields, such as games, film, medical training, psychological therapy, education, museums, and sports.
VR is defined as a computer-generated interactive virtual world in which the user is effectively immersed and has dynamic control of the viewpoint [1]. However, the VR experience is often accompanied by VR sickness, which poses a great challenge and a safety issue. VR sickness, also called cybersickness, refers to the motion-sickness-like symptoms that occur during exposure to a virtual environment [2]. The symptoms of cybersickness include nausea, retching, vomiting, increased salivation, cold sweating, drowsiness, pallor, and dizziness. With a reported incidence of 61–80% [3], cybersickness is a major barrier to the wider use of VR. In particular, it limits the effective use of VR for training, rehabilitation, and therapeutic purposes.

The pathological cause of cybersickness is unknown. The most common theory in cybersickness research is the sensory mismatch theory, which states that the sickness is caused by a conflict between different sensory input channels, such as visual, auditory, and vestibular [2].

Because cybersickness cannot be completely eliminated at the current stage, automatically predicting the level of cybersickness can help to control it systematically. For example, it would enable VR games to set individualized breakpoints based on the player’s cumulative cybersickness level, or allow players to review a cybersickness score before purchasing a new VR game. Machine learning provides techniques to build predictive models from real-world gameplay data, and previous works evaluated the viability of using machine learning to automatically predict cybersickness for 360° videos [4,5]. To the best of our knowledge, no work has used a machine learning approach to predict cybersickness for interactive VR games. The difficulties of predicting cybersickness in real-world VR gameplay include rich interactions, dynamic changes of viewpoint, and the subjective experience of cybersickness.

978-1-5386-6304-2/18/$31.00 ©2018 IEEE
To approach this problem, we first designed a ranking–rating (RR) score to measure the degree of cybersickness, aiming to reduce the annotators’ cognitive load and to achieve consistency within and between annotators. We then collected VR gameplay data and trained machine learning models on each individual piece of gameplay data, rather than on the averaged data for each game, so that our models could capture the interactive and individualized nature of VR gameplay.

Our contributions to the VR community are the following:
• Designed a ranking–rating score to annotate the level of cybersickness and compared it with the Simulator Sickness Questionnaire (SSQ), which is the most widely used questionnaire in cybersickness studies.
• Utilized heterogeneous data sources including the VR visual input, the head movement, and the individual characteristics.
• Built machine learning models to predict the level of cybersickness based on the heterogeneous data and the ranking–rating annotations.

The paper is structured as follows: we first review the previous work on predicting cybersickness (Section II). Next, we describe our methods of dataset construction (Section III), data preprocessing (Section IV), and feature extraction (Section V). We built three machine learning models and compared their performances (Section VI). Finally, we present our conclusion and future work (Section VII).

TABLE I. THE FACTORS OF CYBERSICKNESS
Measured Factors
  Hardware: • Head movement
  Software Content: • Motion in a scene • Scene texture • Color in a scene
  Individual: • Video game experience • VR game experience • Susceptibility to motion sickness
Controlled Factors
  • Field of view • Resolution • Sitting vs. standing • Duration of VR exposure
Unused Factors
  Hardware: • None
  Software Content: • Independent visual backgrounds (elements of the visual field that remain stable relative to the user [6]) • The degree of control (to what extent users have control over their movement within a virtual environment [7]) • Scene content • Change of color (hue, saturation, and brightness) over a scene
  Individual: • Postural instability (the ability of an individual to maintain balance and postural control, usually measured by body sway [8]) • History of headaches/migraine • Age • Gender

II. RELATED WORK

A. Factors of Cybersickness
Previous cybersickness research usually attributes the factors of sickness to three categories [2,6]: 1) hardware: the VR device and its configuration; 2) software: the VR content; and 3) individual: the user who interacts with the VR environment. Here, we list the most relevant factors in Table I, as adapted from Rebenitsch’s thorough literature review and prioritization (Table XI: Confirmed Factors, Table XII: Probable factors, and Table XIII: Possible factors [6]).

B. Traditional Approaches to Predict Cybersickness
As Table I shows, there are many cybersickness-related factors, whose configurations are quite flexible. Moreover, the interactions among factors are complex. Therefore, previous studies focused on controlled experiments in laboratory settings to determine the causality of one or more factors that account for cybersickness.

Kennedy et al. identified 16 symptoms as statistical indicators that showed significant changes from pre-exposure to post-exposure in 1,119 trial data [9]. They then devised an instrument called the Simulator Sickness Questionnaire (SSQ) to quantify and predict the level of simulator sickness. It is worth pointing out that cybersickness, simulator sickness, and motion sickness share similar symptoms but are caused by exposure to different situations. While motion sickness usually involves actual motions, simulator sickness and/or cybersickness are subsets of motion sickness and are experienced while the users remain stationary [2].

The SSQ is the most widely used questionnaire in cybersickness studies [10]. It asks about the degree of 16 symptoms on a four-level severity scale of “0-none, 1-slight, 2-moderate, and 3-severe”. The 16 symptoms are attributed to three subscales: nausea, oculomotor, and disorientation. The SSQ score is calculated as 3.74 times the sum of the three subscales. According to [9], after testing nine simulators from 3,691 samples, the mean ± SD of SSQ is 9.8 ± 15.0, the median is 3.7, and the SSQ score ranges from 0.0 to 108.6.

Since cybersickness is a subjective experience, to predict individual differences, Golding devised the Motion Sickness Susceptibility Questionnaire short version (MSSQ) [11]. It is an 18-item self-report questionnaire that asks participants about their motion sickness experience (scored with numbers 0–3) when using different types of transportation in childhood and adulthood. The MSSQ shows good reliability and validity for predicting individual susceptibility to motion sickness. To calculate the MSSQ score, for each subscale of childhood and adulthood, subscore = (total sickness score) × 9 / (9 – the number of types not experienced). The MSSQ score is the sum of the childhood and adulthood subscores. It ranges from 0 to 54. According to [11], the mean ± SD of the MSSQ score was 12.9 ± 9.9, with a positively skewed distribution, in a normative sample of 257 university students.

Rebenitsch developed several single-factor linear models that incorporate multiple factors to predict cybersickness [6]. Based on a set of controlled experiments involving 24 participants, the model on individual factors explains 37% of the adjusted variance of cybersickness. It included factors of MSSQ, headache, and video gameplay.
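The MSSQ scoring rule described above can be illustrated with a short sketch. It assumes nine transportation types per section, with each item either an integer from 0 to 3 or marked as not experienced; the function names, the use of None for "not experienced", and the zero returned when no type was experienced are illustrative assumptions, not part of the questionnaire's published scoring code.

```python
def mssq_subscore(item_scores):
    """Subscore for one section (childhood or adulthood).

    `item_scores` holds one entry per transportation type (9 in total):
    an integer 0-3 for the sickness rating, or None if the participant
    never experienced that type of transportation.
    """
    if len(item_scores) != 9:
        raise ValueError("expected 9 transportation types")
    experienced = [s for s in item_scores if s is not None]
    if not experienced:
        return 0.0  # assumption: no exposure at all is scored as zero
    total = sum(experienced)
    not_experienced = 9 - len(experienced)
    # subscore = (total sickness score) x 9 / (9 - number of types not experienced)
    return total * 9 / (9 - not_experienced)


def mssq_score(child_scores, adult_scores):
    """MSSQ score: sum of the childhood and adulthood subscores."""
    return mssq_subscore(child_scores) + mssq_subscore(adult_scores)


# Hypothetical answers: two types never experienced in childhood.
child = [0, 1, 2, None, 3, 0, 0, 1, None]
adult = [1] * 9
print(mssq_score(child, adult))  # 7 * 9 / 7 + 9 * 9 / 9 = 18.0
```

With every item rated 3, each subscore is 27, which reproduces the stated maximum of 54.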
Based on previous studies with reported configurations, the researcher also built a linear model on hardware and software factors, which explained 55% of the adjusted variance. The model included factors of duration, tracking, controller type, seated or not, the realism of a virtual environment, the field of view, and movement in VR.

C. Machine Learning Approach of Predicting Cybersickness for 360° Videos
While the traditional controlled experimental approaches achieved high internal validity [12] in establishing causal relationships between specific factors and cybersickness, they sacrificed external validity, that is, generalization to the real world, and may not capture the complex interactions among factors. Machine learning provides an alternative approach to this problem. It learns directly from real-world observational data and saves the experimental labor of controlling variables one at a time to determine their correlations. After a thorough literature search, we list the recent work related to cybersickness and machine learning.

Padmanaban et al. built a machine learning model to predict the sickness level when watching a 360° stereoscopic video [4]. They collected a set of 109 one-minute videos annotated with SSQ from 96 participants. They then trained a bagged decision tree on hand-crafted features (quantifying speed, direction, and depth) from the video content. Their model generally outperformed a naive estimate but was ultimately limited by the size of the dataset. During the data collection, the participant’s head motion was constrained using a headrest to ensure that all users saw the same scene, and the participant was not allowed to look around. This limits how well their model generalizes to a fully interactive VR environment in which the viewpoint changes frequently.

Kim et al. adopted an unsupervised learning approach to detect irregular motions in 360° videos [5].
The deep convolutional autoencoder extracts features from five consecutive video frames in the encoder part and reconstructs the video sequences in the decoder part. Since the reconstruction errors were high for irregular motions, this network was used as an objective measure of exceptional motions, and this measure correlated highly with the subjective SSQ measure. However, because the training and test videos used in that study were consecutive driving videos of cities and roads, the model may not generalize well to complex VR gameplay scenarios that involve dynamic changes of viewpoints and scenes. Moreover, their model only took the single factor of motion velocity into account to measure cybersickness.

Despite the previous progress in predicting cybersickness, no previous work has used machine learning models to predict the level of cybersickness for interactive VR games. Building upon the previous work, we present a machine learning approach to predict cybersickness for VR games.

III. DATASET CONSTRUCTION

A. Study Design

1) Factors Included in the Study
We designed a study and recruited participants to collect data for building the predictive model. In our experiment, we tried to include as many factors of cybersickness as possible based on previous literature reviews. Among them, some were included as measured factors, while others were included as controlled factors that were fixed throughout the study. The rows of “Measured Factors” and “Controlled Factors” in Table I summarize the factors included in this study. We describe the methods of collecting these factors below.

a) Hardware
We included head movement as a hardware factor of cybersickness. Head movement was recorded as the changes of position and rotation of the head-mounted display (HMD), with a sample rate of 500 Hz.
To represent the head movement, the captured data were later processed to extract the following factors: linear speed and acceleration in 3-D space, and the first and second differences of the quaternion for spatial rotation.

b) Software Content
We recorded the eye screen video that was displayed in the HMD. The visual output from the HMD was projected to the desktop and was recorded with a both-eye view at 60 frames per second (fps). The captured videos were then processed to extract factors of motion, texture, and color.

c) Individual
We designed a demographic questionnaire to record individual factors. The video game experience and VR game experience were collected as binary data (0 for rarely, 1 for often). The susceptibility to motion sickness was measured by MSSQ.

TABLE II. FACTORS CONTROLLED IN THIS STUDY
Hardware: VR device = HTC Vive; Field of view = 110°; Resolution = 1080 × 1200; Refresh rate = 90 Hz
Position = Sitting; Duration = 3 minutes

Some factors, although related to cybersickness, were controlled throughout the study. Although these controlled factors were not included in our present models, we reported their configurations in Table II, so that the dataset can be extended by collecting more data with different configurations in future work.

Due to the nature of our pilot study, other possible factors were not collected in our study, or not included in our present models, as shown in the row of “Unused Factors” in Table I. We plan to take them into consideration in future work.

2) VR Games Used in the Study
We chose five “off-the-shelf” VR games for the study based on the distribution of each factor. The games are: 1) The Night Cafe: A VR Tribute to Vincent Van Gogh¹; 2) NoLimits 2 Roller Coaster Simulation²; 3) Endless Labyrinth³; 4) InCell VR⁴; and 5) Audioshield⁵. The game configurations are listed in Table III.
3) Ranking-Rating Measure of Cybersickness
For quantifying subjective assessments, ratings are the most frequently applied instrument. However, because contextual situations differ, the meaning of ratings may change over time or across individuals [13]. Compared with ratings, ranking-based approaches show consistency within participants and over time [14, 15]. However, a ranking system by itself cannot easily assess the distance between values for one participant, and it is difficult to compare rankings between participants. Table IV lists the pros and cons of the two measures [13,15,16].

¹ https://store.steampowered.com/app/482390
² https://store.steampowered.com/app/301320
³ https://store.steampowered.com/app/495830
⁴ https://store.steampowered.com/app/396030
⁵ https://store.steampowered.com/app/412740

TABLE III. THE FIVE VR GAMES PLAYED IN THIS STUDY
Audioshield
  Game level played in the study: Difficulty: Normal; Music: Chopin Nocturne Op.9, No.2; Skin: Private amphitheater
  Control of movement: No full-body movement in space
  Use of controller: Two controllers
The Night Cafe
  Game level played in the study: The first 3 minutes of the game
  Control of movement: Teleport
  Use of controller: One controller to teleport
InCell VR
  Game level played in the study: Level 1
  Control of movement: The movement along the axis is forced by the game. The player could move the head to control the orientation.
  Use of controller: None
Endless Labyrinth
  Game level played in the study: The first 3 minutes of the game
  Control of movement: Use a controller to glide in the labyrinth.
  Use of controller: One controller
NoLimits 2 Roller Coaster Simulation
  Game level played in the study: “Dive Park” theme in the default library
  Control of movement: The movement is forced by the game
  Use of controller: None
[Game screenshots]

TABLE IV. PROS AND CONS OF RATING AND RANKING
Rating
  Pros: • Widely used • Easy to design
  Cons: • Imposes heavy cognitive load on the participants
Ranking
  Pros: • A higher inter-annotator reliability • The relative ranking task is simple for participants
  Cons: • It is relatively difficult to design a ranking-based measure • The analysis method of ranking is less commonly known

The two measures may not be mutually exclusive [17]. Based on the advantages of both measures, we designed an instrument, named the “ranking-rating” (RR) measure, that combines ranking and rating to assess cybersickness across games and individuals. To compare the RR measure with previous measures, after each gameplay we also asked the participants to answer the SSQ, a rating-based measure that is widely used in cybersickness studies.

Here, we describe the procedure of using the RR measure for cybersickness. The following rating and ranking processes are based on an 11-point scale (0–10, 0 for no sickness and 10 for the highest level of sickness). For each participant, the RR measure consists of two steps.

First step: After playing the first game, participants were asked to rate their cybersickness based on their understanding of the scale. Starting from the second game, after each gameplay, participants were asked to evaluate their cybersickness based on relative comparisons (ranking) among the games they had played so far. By ranking, the participants faced less cognitive load than when solely rating the next game. Ranking also avoids inconsistencies potentially caused by a changing understanding of the scale. These scores were used as references in the second step.

Second step: After playing all the games, participants were asked to adjust all the scores they had marked in the first step. They could modify the order of the ranking and adjust the distances between the scores. The use of fractions or decimals was encouraged; all participants who used decimals reported their scores at a resolution of 0.5. The scores obtained in the second step were the final RR scores of cybersickness.
The idea behind this second step is that, after playing all the games, participants have a more general understanding of the various levels of cybersickness. This procedure also helps to minimize the bias introduced by the initial rating of the first game.

B. Study Procedure
The data collection was conducted in an experimental setting. The study was approved by Simon Fraser University’s Research Ethics Board. The inclusion criterion was being above 18 years old. The exclusion criteria were migraines or other severe diseases that might affect the study. Participants were recruited using convenience sampling. Written informed consent was obtained from participants before enrolment. A total of 25 participants were enrolled in this study. One participant’s data were discarded because of withdrawal and incomplete data. The data of 24 participants (Female = 10, Male = 14; Age 28.0 ± 7.5 years) were included in the final data processing.

During the study, the participants were asked to play five “off-the-shelf” VR games as stated above. The games were purchased from the Steam gaming platform⁶. Each gameplay lasted for three minutes. The gameplay sequence was randomly assigned by drawing lots. Participants remained seated on a swivel chair to play the VR games. They could move their body in the virtual environment by moving the swivel chair in the 2.5 × 2.5 meter physical space.

We used the HTC Vive as the VR gaming device. The Vive headset has a refresh rate of 90 Hz and a 110° field of view. The display resolution for each eye screen is 1080 × 1200. The games were run on an Alienware Desktop (Intel Core i7-5820K CPU @ 3.30 GHz, 12 Logical Processors; 16 GB RAM; Dual NVIDIA GTX 980) with the Windows 10 Enterprise operating system.

⁶ http://store.steampowered.com/
After each gameplay, the participants were asked to evaluate their levels of cybersickness using the RR and SSQ measures, as stated in Section III.A.3. Then, the participants had a washout period of at least five minutes between gameplays, lasting until they no longer felt any sickness from the previous gameplay. After the whole gameplay session, the participants had a semi-structured interview to talk about their feelings of sickness during each gameplay. Then, they answered the demographic questionnaire before completing the study. The study session lasted 1–1.5 hours in total. The participants were thanked with $10 CAD in cash for their time and effort in participating in this study.

IV. DATA PREPROCESSING AND AUGMENTATION

A. Sickness Score Calculation
The SSQ score was calculated according to the method described in Section II.B. The RR score was collected directly in the study and required no further calculation, as stated in Section III.A.3. It has a range of [0.0, 10.0]. The Pearson correlation coefficient r between the SSQ and RR scores is 0.838, which indicates a strong positive linear correlation between the two. Therefore, we used the RR score as the ground-truth label for the construction of the following predictive models. Figure 1 visualizes the distribution of the RR and SSQ scores and their correlation.

Figure 1. The distribution and correlation between RR and SSQ score.

It is worth noting that instead of using the averaged RR score of each game as the ground-truth label, we used the raw individual RR scores as the ground truth, and regarded each gameplay as one data point. This is because the information of each gameplay (eye video recordings, head movement, individual characteristics, cybersickness experience) differs from that of the others. We ended up with 120 raw data points (5 games per participant × 24 participants).

B. Data Preprocessing
For the eye screen video data, first, we truncated the original recordings to obtain the first 125 seconds of the eye screen videos. Second, we rescaled the video recordings. After rescaling, each video recording in the corpus has a frame size of 37 × 20 (width × height). Third, considering the low-frequency nature of the visual stimuli and the head movement, as well as the computational resources, we downsampled our video recordings to 2 fps.

For the head movement data, we truncated the raw data to the first 125 seconds, the same period as the video data. We then downsampled the original 500 Hz data to 2 Hz to align with the video data. The time steps of the video and head movement data were precisely aligned. The raw data consisted of two vectors: the position in 3-D space (x, y, z), and the quaternion spatial rotation (qw, qx, qy, qz). A quaternion uses four numbers to encode an axis–angle rotation: the vector part (qx, qy, qz) encodes the rotation axis, scaled by sin(θ/2), and the scalar part qw = cos(θ/2) encodes the rotation angle θ.

C. Data Augmentation
Because each of the original video recordings is 125 seconds long, each converted video recording has 250 frames in total. We then adopted a windowing method to perform data augmentation and artificially enlarge the training set. We experimented with a couple of settings and chose a window size of 60 frames and a step size of 10 frames. For each video recording, we selected 60 consecutive frames as one augmented recording and moved the window ahead by one step, repeating until we reached the end of the recording. This yields (250 − 60) / 10 + 1 = 20 windows per recording, so after the data augmentation we ended up with 2,400 video recordings (20 windows × 120 recordings). One augmented video recording is 30 seconds long and contains 60 frames. The annotations of each augmented video recording are the same as the annotations of the original video recording.

The head movement data were augmented in the same way as the video recordings. We ended up with a set of 2,400 augmented data points, each with the shape of 14 × 60. The details of the head movement features are described in Section V.B.

V. FEATURE EXTRACTION

A. Individual Features
For the video game experience and VR game experience, we collected binary data with the demographic questionnaire. These data were treated as categorical features for the machine learning model. The details of these two features are described in Section III.A.1.c. In addition, we calculated the MSSQ score (numerical feature) according to the method described in Section II.B.

B. Head Movement Features
We computed hand-crafted head movement features based on the original position vectors (x, y, z) and quaternion vectors (qw, qx, qy, qz). For the position vector, we computed the speed feature (3 dimensions) as the first difference of two consecutive position vectors, and the acceleration feature (3 dimensions) as the first difference of two consecutive speed vectors. For the quaternion vector, we computed the first difference (4 dimensions) and the second difference (4 dimensions) of two consecutive quaternion vectors. The size of the head movement features is 14 × 60, where 14 is the total number of dimensions, and 60 is the total number of time steps (2 Hz for 30 seconds).

C. Video Features (Eye Screen Video)
We extracted low-level video features including color, motion, and texture. The ViVid [18] software package was used for the video feature extraction.

1) Color: Hue, Saturation, and Brightness
Regarding color features, we selected hue, saturation, and brightness, which are related to human perception of colors. For each color feature, the original range of the value is from 0 to 255. Here, we divided the range into 32 bins. Instead of computing these features frame by frame, we computed these color features over the entire 30-second video recording (all frames together). We ended up with 96 dimensions of color features for one video recording.

Hue is defined as “the degree to which a stimulus can be described as similar to or different from stimuli that are described as: red, orange, yellow, green, blue, violet” [19].

Saturation represents the level of purity of the color. A higher saturation value indicates less mixing of colors. As more colors are mixed, the picture turns gray; the purity therefore decreases, and so does the saturation.

The brightness feature represents the degree to which a source appears to be radiating or reflecting light, which is the perception elicited by the luminance of a visual target [20]. Similar to saturation, we separated the brightness value into 32 bins.

2) Motion: Motion Intensity
For motion features, we investigated the motion intensity, which is the level of motion between frames within the video. While the color features were computed over the entire video recording and represented by their distributions, the motion intensity feature was extracted from each pair of consecutive frames. Since there are 60 frames in each augmented video recording, we have 60 time-steps of motion features in total.

3) Texture: Contrast, Smoothness, Entropy
We chose to compute contrast, smoothness, and entropy features to provide the texture information of the video recordings. These are also frame-level features. Contrast is determined by the difference of color between the objects within an image. Smoothness represents the homogeneity of the gray level distribution of a frame. It is approximately inversely correlated with contrast. Entropy represents the level of randomness. Specifically, it is a measure of the amount of information that must be coded by a compression algorithm. If the image has higher entropy, it contains more information. Each feature has 60 time-steps.

D. Two Feature Sets
We used two sets of features for different machine learning models separately. Feature Set 1 (Table V) contains the time sequence features (frame by frame) and was fed into the Convolutional Neural Network (CNN) and Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) models. We ended up with an 18 × 60 feature matrix, where 18 is the total number of dimensions, and 60 is the number of time steps. Each feature was normalized by subtracting its average and dividing by its standard deviation.

TABLE V. FEATURE SET 1, PREPARED FOR CNN & LSTM-RNN
Features            Feature Types   Dimensions   Time Steps
Head Movement       Temporal        14           60
Motion Intensity    Temporal        1            60
Contrast            Temporal        1            60
Smoothness          Temporal        1            60
Entropy             Temporal        1            60

TABLE VI. FEATURE SET 2, PREPARED FOR SVR
Features                Feature Types   Dimensions
Video Game Experience   Categorical     1
VR Experience           Categorical     1
MSSQ                    Numerical       1
Head Movement           6 Statistics    84
Hue                     Distribution    32
Saturation              Distribution    32
Brightness              Distribution    32
Motion Intensity        6 Statistics    6
Texture                 6 Statistics    18

Feature Set 2 (Table VI) was prepared for the Support Vector Regression (SVR) model. These features are categorical features (video game experience and VR game experience), a numerical feature (MSSQ), color features (distributions of hue, saturation, and brightness), or the statistics of time sequence features. For the statistics of each time sequence feature, we computed 6 features: mean, standard deviation, skewness, kurtosis, maximum, and minimum. We ended up with a 207-dimensional feature vector (3 + 84 + 96 + 6 + 18), as listed in Table VI. Each feature was normalized by subtracting its average and dividing by its standard deviation.

VI. MACHINE LEARNING MODELS AND PERFORMANCES
In this section, we describe the construction of three machine learning models: the CNN, LSTM-RNN, and SVR.

A. CNN
We built and trained a CNN model from scratch. We applied grid search to find the best hyperparameters such as the number of kernels in each layer, kernel size, learning rate, and decay.
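As a rough illustration of the grid search mentioned above, the sketch below tries every combination of candidate hyperparameter values and keeps the one with the lowest validation error. The candidate values and the toy_validation_error function are hypothetical placeholders: the paper does not report its search ranges, and in practice the evaluation function would train the model and return its validation error.

```python
from itertools import product


def grid_search(grid, evaluate):
    """Try every combination in `grid` and return (best_params, best_score).

    `grid` maps a hyperparameter name to a list of candidate values.
    `evaluate` takes a dict of hyperparameters and returns a validation
    error, where lower is better.
    """
    names = list(grid)
    best_params, best_score = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical candidate values; the paper does not list its search ranges.
cnn_grid = {
    "n_kernels": [8, 16, 32],
    "kernel_size": [3, 5],
    "learning_rate": [1e-2, 1e-3, 1e-4],
    "decay": [1e-5, 1e-6],
}


def toy_validation_error(params):
    # Placeholder for "train the model with `params`, return validation MSE".
    return (abs(params["n_kernels"] - 16)
            + abs(params["kernel_size"] - 3)
            + abs(params["learning_rate"] - 1e-3)
            + abs(params["decay"] - 1e-6))


best, err = grid_search(cnn_grid, toy_validation_error)
print(best, err)
```

The cost grows with the product of the list lengths (36 combinations here), which is one reason to keep each candidate list short.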
Our CNN is composed of two convolutional layers and one fully connected layer. The first convolutional layer filters the 18 × 60 × 1 input features with 16 kernels of size 3 × 3 × 1 and a stride of 1. The second convolutional layer, connected to the first one, uses 16 kernels of size 3 × 3 × 16. We used max pooling (2 × 2) on the outputs of both convolutional layers. There is a dropout layer between the second convolutional layer and the fully connected layer, with a dropout rate of 0.15. The fully connected layer, connected to the second convolutional layer, is composed of 256 neurons. The output layer is composed of 1 neuron. We used the ReLU nonlinear activation function for all convolutional layers and the fully connected layer. For the output layer, we used the linear activation to predict the level of cybersickness. All the weights were initialized with the Xavier uniform initializer. The CNN was trained using the RMSProp optimizer with a batch size of 32 examples, a learning rate of 1 × 10⁻³, and a decay of 1 × 10⁻⁶. We trained for 60 epochs before testing the model.

B. LSTM-RNN
We built and trained an LSTM-RNN from scratch. To find the best hyperparameters, such as the number of neurons in each layer, learning rate, and decay, we used a grid search method. Our LSTM-RNN is composed of two stacked LSTM units. The input size is 18 × 60, where 18 is the number of features extracted from the head movement and eye screen video data, and 60 is the number of time steps (see Table V). There are 64 neurons in each LSTM unit. The output layer is composed of 1 neuron. We used the tanh nonlinear activation function for the LSTM units, and the linear activation function for the output layer to predict the level of cybersickness. As with the CNN, all the weights were initialized with the Xavier uniform initializer. We trained the LSTM-RNN using the RMSProp optimizer with a batch size of 32 examples, a learning rate of 1 × 10⁻², and a decay of 1 × 10⁻⁶. We trained for 60 epochs before testing the model.
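To give a rough sense of the size of the LSTM-RNN described above, the following sketch counts its trainable parameters. It assumes the standard LSTM formulation (four gates, each with input weights, recurrent weights, and a bias, and no peephole connections) and assumes that the single output neuron reads the 64-dimensional final state of the second LSTM unit; neither detail is spelled out in the text, and the function names are illustrative.

```python
def lstm_param_count(input_dim, units):
    """Trainable parameters of one standard LSTM layer with biases.

    Each of the four gates (input, forget, cell, output) has an input
    weight matrix, a recurrent weight matrix, and a bias vector.
    """
    return 4 * (units * (input_dim + units) + units)


def dense_param_count(input_dim, units):
    """Weights plus biases of a fully connected layer."""
    return input_dim * units + units


n_features = 18  # feature dimensions per time step (Table V)
n_units = 64     # neurons in each LSTM unit

layer1 = lstm_param_count(n_features, n_units)  # first LSTM unit
layer2 = lstm_param_count(n_units, n_units)     # second, stacked LSTM unit
output = dense_param_count(n_units, 1)          # single linear output neuron
total = layer1 + layer2 + output
print(layer1, layer2, output, total)  # 21248 33024 65 54337
```

Under these assumptions the count does not depend on the 60 time steps, because an LSTM reuses the same weights at every step.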
We implemented the CNN and LSTM-RNN with Keras 2.0.

C. SVR

The SVR model maps the input data into a higher-dimensional feature space using a nonlinear mapping and builds a linear model in this feature space to make predictions. We fed the SVR with Feature Set 2 (see Table VI), selected the Radial Basis Function (RBF) kernel, and used grid search to find the best values of the hyperparameters C and gamma. We implemented the SVR in scikit-learn 0.19.0.

D. Performances

To train and evaluate the three models, we applied repeated random sub-sampling validation. The augmented dataset, comprising a total of 2,400 30-second samples, was shuffled 10 times. Each time, 90% of the data were randomly selected for training the model, and the remaining 10% were used for testing. We used R² and MSE to evaluate prediction performance. Table VII presents the results for the CNN, LSTM-RNN, and SVR. Among the three models, the LSTM-RNN achieved the best performance on both R² (0.868) and MSE (0.009).

TABLE VII. MODEL PREDICTION PERFORMANCES

Model       R² (a)    MSE (b)
CNN         0.462     0.036
LSTM-RNN    0.868     0.009
SVR         0.793     0.014

a. R²: coefficient of determination
b. MSE: mean squared error

E. Discussion

Our results showed that the LSTM-RNN model outperformed the CNN and SVR models in predicting cybersickness. One possible explanation is that the LSTM-RNN can retain information and find patterns across time to make predictions, which suits the problem of predicting cybersickness from the time-series events of VR gameplay. Another possible explanation is that the LSTM-RNN may be capable of modeling the interaction and "mismatch" between head motion and visual motion stimuli. For example, it may capture a rapid visual change with no head movement, which corresponds to the sensory mismatch theory that explains the cause of cybersickness.
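For reference, the evaluation protocol of Section VI-D (repeated random sub-sampling scored with R² and MSE) can be sketched in plain Python as follows. This is an illustrative reading of the protocol, not the authors' code: the function names, the fixed random seed, and the choice to reshuffle the full index list on every repeat are assumptions.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def repeated_subsampling(n_samples, n_repeats=10, test_fraction=0.1, seed=0):
    """Yield (train_indices, test_indices) for each random shuffle."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    n_test = int(round(n_samples * test_fraction))
    for _ in range(n_repeats):
        rng.shuffle(indices)
        # Slices are copies, so later shuffles do not alter earlier splits.
        yield indices[n_test:], indices[:n_test]


# Split sizes for the 2,400-sample dataset described in the paper.
for train_idx, test_idx in repeated_subsampling(2400):
    print(len(train_idx), len(test_idx))
```

In an actual run, each split would be used to fit a model on the training indices and to compute r2 and mse on the held-out indices. Because the test sets of different repeats can overlap, this differs from k-fold cross-validation, in which every sample is tested exactly once.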
In addition, the LSTM-RNN model could make predictions without the individual features; it used only the eye screen videos and head movement data as input. This would save players from manually entering their individual characteristics to obtain a cybersickness prediction. The SVR model also performed well, though not as well as the LSTM-RNN, probably because it lacks the capacity to represent interactions among features. As for the CNN, its weaker results are reasonable because it is better suited to capturing spatial rather than temporal information.

Our system for predicting cybersickness can be used in the following scenarios:

1) Quantify cybersickness in a retrospective manner: the system can automatically quantify the cumulative cybersickness over a period of exposure to VR games. This would allow VR systems to set individualized breakpoints that keep cybersickness below a certain threshold and thus improve the VR gaming experience.

2) Predict cybersickness in a prospective manner: the system enables VR game developers and platforms to compute the cybersickness score automatically, thereby encouraging the creation of a cybersickness rating system. Such a system would allow players to easily check the level of cybersickness of new games. However, our model requires eye screen video recordings and head movement data as input, and this information cannot be acquired before gameplay. One way to solve this problem is to invite a number of trial players to play the new game first and record their eye screen videos and head movement data. The data collected from the trial players are then fed to our model to obtain a set of predicted RR scores, and an overall RR score for the new game is computed by averaging the predicted RR scores across trial players.

VII. CONCLUSIONS & FUTURE WORKS

In this paper, we presented a machine learning approach to automatically predict the level of cybersickness for interactive VR games. We reviewed the literature, selected the possible factors for cybersickness, and designed an experiment to construct the dataset. Unlike previous approaches, which mainly took visual stimuli from a static viewpoint in VR as input, our study included dynamic visual stimuli, head movement, and individual features. We designed the RR measure with the aim of achieving an easier and more consistent assessment of cybersickness. We built three machine learning models and compared their performance in predicting cybersickness. The results indicated that the LSTM-RNN is a more viable model than the CNN and SVR for the problem of predicting cybersickness.

The present dataset can be expanded in future work by collecting more data that vary the configurations of the measured and controlled factors shown in Table I. As for the features extracted from eye screen video recordings, at the current stage we extracted only low-level features such as color, motion, and texture. Since cybersickness is a complex reaction involving both physiological and psychological processes, in future work we also intend to include higher-level information, such as semantic or emotional features, to improve our model's performance.

ACKNOWLEDGMENT

We would like to acknowledge the Social Sciences and Humanities Research Council of Canada and the Natural Sciences and Engineering Research Council of Canada.