
Reducing the Noise of Reality

2019, Psychological Inquiry

Psychological Inquiry: An International Journal for the Advancement of Psychological Theory
ISSN: 1047-840X (Print) 1532-7965 (Online). Journal homepage: https://www.tandfonline.com/loi/hpli20

Abele Michela (a), Marieke M. J. W. van Rooij (a), Floris Klumpers (a, b), Jacobien M. van Peer (a), Karin Roelofs (a, b), and Isabela Granic (a)

(a) Behavioral Science Institute, Radboud University Nijmegen, Nijmegen, The Netherlands; (b) Donders Institute for Brain, Cognition and Behaviour, Radboud University Nijmegen, Nijmegen, The Netherlands

To cite this article: Abele Michela, Marieke M. J. W. van Rooij, Floris Klumpers, Jacobien M. van Peer, Karin Roelofs & Isabela Granic (2019) Reducing the Noise of Reality, Psychological Inquiry, 30:4, 203-210, DOI: 10.1080/1047840X.2019.1693872

To link to this article: https://doi.org/10.1080/1047840X.2019.1693872

© 2019 The Author(s). Published with license by Taylor & Francis Group, LLC. Published online: 04 Jan 2020. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited, and is not altered, transformed, or built upon in any way.

CONTACT Abele Michela [email protected] Radboud University Nijmegen, Montessorilaan 3, 6525 HR Nijmegen, The Netherlands.

PSYCHOLOGICAL INQUIRY 2019, VOL. 30, NO. 4, 203–210 https://doi.org/10.1080/1047840X.2019.1693872

COMMENTARIES

A Commentary On: Causal Inference in Generalizable Environments

Miller and colleagues (this issue) propose a novel and inspiring theoretical framework that merges systematic and representative experimental design. As such, the Systematic Representative Design (SRD) framework has the potential to move psychological science forward in a significant way.

Essential to the authors’ proposal is the default control group (DCG), an experimental condition that leverages new technologies such as Virtual Reality (VR) to create a close approximation of “real life” while retaining the affordances of a tightly controlled experience. With this innovation, Miller and colleagues attempt to reconcile two needs that conflict in so many experimental designs: generalizability (to everyday life) and experimental control (to claim causality). Although the SRD framework could result in a real shift in the way psychological research is conducted, it is far from an “off-the-shelf” solution. In particular, the use of new technologies such as VR brings with it a number of complications, both practical and theoretical, that are not fully addressed in the target article. With this commentary, we aim to contribute to the discussion on improving experimental design, and thus empirical psychological research in general, by drawing on our experience in designing digital, game-based, and neurofeedback-based interventions for mental health and behavioral change. Furthermore, we suggest that SRD and game design share important process similarities, as well as common practical pitfalls. We outline these links and discuss their implications in order to further strengthen the SRD framework, especially for applied research purposes.

The Potential of Systematic Representative Design

SRD is a promising framework that attempts to address, and provide solutions for, both the causality and the generalizability requirements of experimental design. It allows researchers to maintain a high level of experimental control in environments that are usually impossible to standardize without introducing considerable noise into the measurements. Specifically, the strength and main novelty of the SRD approach is that it offers experimental control through a carefully constructed default control group (DCG) that serves as a highly controllable virtual substitute for reality.

The first benefit of a highly controllable, yet generalizable, environment in SRD is the reduction of noise and bias in the data collected in the testing environment. Specifically, the authors of the target article suggest that VR, unlike the typically austere psychology laboratory, allows for the design of an experimental environment that mimics the natural, complex context in which the behaviors of interest appear. In doing so, the VR experimental design limits some of the experimental biases (e.g., due to differing motivations, semantic understanding, or cultural or social expectations) that otherwise emerge from studying participants acting out of the ordinary in traditional lab contexts (e.g., Ceci, Kahan, & Braman, 2010). Moreover, in studies focusing on social interaction, the opportunity to use Virtual Agents (VA) in the VR setup helps researchers overcome the low internal validity and increased noise often introduced when human confederates are used (Kuhlen & Brennan, 2013).

In addition to improving the reliability of the collected data relative to naturalistic settings by increasing the consistency of the environment, SRD also aims to focus more precisely on the motivational core of participants’ behaviors, keeping that core as consistent as possible across the full sample. A large problem in “controlled” laboratory assessments comes from the range of unwanted individual differences that participants bring with them to a lab task (e.g., motivations, interpretations of the instructions, past experience with similar tasks, expectations, and so on). The SRD framework insists on taking these individual differences seriously and on reducing the noise they introduce into experiments by emphasizing the meaning of the action performed. Instead of relying on participants simply following task instructions, SRD experiments use contextualization (what is happening?) and interactive narrative (why is it happening and what should I do?) to prime participants to act consistently according to the role they are given. Thus, from the participants’ perspective, they behave more in accordance with their natural internal motivation than with the external motivation of complying with the experimenter: they have a reason to act as they are asked to. Indeed, “providing a role” has been shown to change behavior in simple setups like the ultimatum game, where being primed to imagine impersonating a banker significantly changed how participants behaved (Lightner, Barclay, Hagen, & Hagen, 2017). In addition, the narrative context makes action matter to the participant. Stimuli become affectively salient and motivationally important as a result of these contextual enhancements (see Parsons, 2015, for a review). In many research contexts, the increased motivation and affective salience of stimuli should not only yield a better approximation of reality but also improve the reliability with which these effects are captured.
Drawbacks and Challenges

Despite SRD being a sound framework from a theoretical perspective, there are important pitfalls that we anticipate researchers will face when they attempt to apply it to their own experimental work. We experienced many of these pitfalls in our own research, in which we strove to design a suitable VR environment for studying decision making under stress in police officers. We review a number of these pitfalls from both an implementation and an ethical perspective, and provide specific examples from our own research that illustrate the concrete challenges. Solutions that we have found to address these challenges are described in the Proposed Solutions section.

Implementation Concerns

One important aim of the SRD approach emphasized by the authors is generalizability to everyday life. This concept resembles the more widespread concept of ecological validity, and we use the two terms interchangeably. Ecological validity can be defined as a combination of verisimilitude and veridicality (Franzen & Wilhelm, 1996). Verisimilitude is the level of believability, or the extent to which an experimental task approximates the features of everyday life. Veridicality, in contrast, is the degree to which a participant’s performance in an experimental setup accurately predicts what that person would do in reality.

Enhancing Verisimilitude

Enhancing the verisimilitude of an experimental setup can be deceptively appealing, but it may introduce several problems. In an SRD context, enhancing verisimilitude requires isolating a target behavior, identifying the “most frequent setting” in which that “behavior of interest” appears as well as the “relevant script components,” and so forth. In other words, an SRD experiment working toward maximal verisimilitude may seem to aim at approximating a simple snapshot of reality, faithfully reproduced in VR and/or by using VA.
However, designing this type of VR environment raises several feasibility problems that need to be considered even before an SRD approach is taken. First, among the most common drawbacks of VR are nausea and motion sickness: up to 80% of participants are affected by these physical symptoms in VR experiments (Stanney, Hale, Nahmens, & Kennedy, 2003). Most concerning for VR design using an SRD approach, nausea and motion sickness are most often a problem when the aim of the VR design is to elicit “natural behaviors.” More specifically, only a relatively small part of the population cannot physically tolerate VR that simply reacts to the natural movements of the player (e.g., an object grows bigger in order to appear closer when the user bends forward toward it), but that proportion becomes much larger when unpredictable or non-user-initiated artificial movement is experienced (e.g., a car hitting a wall unexpectedly; see Stanney & Hash, 1998). For example, car simulation games in VR (like Project Cars, https://www.projectcarsgame.com/vr/) minimize this issue by providing wide tracks, a predictable trajectory, a reduced sensation of braking, and forward gaze fixation points. Considering the success of such games, VR could seem like a perfect environment for a researcher investigating, for example, reckless driving. The virtual environment allows for highly controlled measurement and even lets very “risky” situations be safely re-created. Yet the nausea-reducing measures present in VR driving games cannot be used in most setups aiming to enhance verisimilitude, as exemplified by driving in a city: it involves narrow streets, frequent braking, and the need to constantly monitor the surroundings, which diverts gaze from forward fixation.
Thus, with current technology, enhanced verisimilitude increases unwanted nausea effects, which, paradoxically, reduce the generalizability of the task, because participants experiencing such unpleasant symptoms are unlikely to behave as they would in everyday life. A second drawback of the pursuit of verisimilitude is the uncanny valley effect (Mori, 1970): the feeling of eeriness or revulsion experienced while interacting with robotic or virtual avatars that mimic human behavior almost, but not exactly, perfectly. In the pursuit of ecological validity, researchers tempted to faithfully reproduce a snapshot of reality are at higher risk of encountering this effect than researchers who limit themselves to simpler, less realistic stimuli. Research on the uncanny valley effect is inconsistent regarding its prevalence and explanation (Cheetham, 2017; Lay, Brace, Pike, & Pollick, 2016; Wang & Rochat, 2017). However, the effect may be related to a discrepancy between the expectations raised by an anthropomorphic “entity” and its observed behavior (Złotowski et al., 2015). Therefore, given the current state of most available technologies, the uncanny valley effect can pose a severe threat to the ecological validity of a task, as participants who are disturbed by the perceived unrealism of the experimental setup are less likely to behave naturally. A last, more mundane drawback concerns the limited availability of adaptable and sophisticated VA and of complex interactive environments, whether in VR or not. Their development requires large financial investments and may require niche programming expertise. These high costs and long development times are often overlooked in grant proposals and research designs, while being of paramount importance for the success of any project that aims to use VA to enhance ecological validity.
Aside from these implementation concerns, enhancing verisimilitude by approximating a snapshot of reality also raises analytic concerns: given the complexity and richness of VR environments, how valid are our conventional analytic methods? Most of the tests used in traditional laboratory experiments rely on specific assumptions that are easily violated in VR setups that attempt to simulate reality as closely as possible. Human behavior is complex, occurring at different levels of analysis (perception, attention, interpretation, action), and changing over time (i.e., moment to moment). The methodological challenge of capturing this complexity and dynamic nature is illustrated by Brehmer’s (1992) early research, which attempted to re-create a generalizable, ecologically valid decision-making environment for firefighters. In his setup, a series of interdependent decisions had to be made in real time in an environment that changed both spontaneously and as a consequence of earlier decisions. Because all the decisions made by a player were interdependent, the standard analytic strategy of aggregating all decisions across time and contexts was no longer feasible: the assumption of independence was violated. This is why Brehmer had to shift his focus from momentary decisions to the general tactics and strategies the firefighters used to achieve their goals. The same could happen in SRD experiments: when broadly reproducing real-life situations in an SRD setup, actions and decisions will be embedded hierarchically, will influence one another, and will be contingent on prior actions and decisions. As a result, individual actions or decisions are not the correct unit of measurement, and aggregating those measurements (decisions) violates core General Linear Model assumptions, notably the independence of observations.
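To make the cost of ignoring non-independence concrete, here is a minimal sketch in Python. It assumes, as a deliberate simplification, that successive decisions by one participant follow a stationary first-order autoregressive (AR(1)) process with a hypothetical lag-1 correlation `rho`. Under that assumption, a standard large-sample approximation inflates the variance of the mean by (1 + rho) / (1 − rho) relative to the naive sigma² / n, so n correlated decisions carry about as much information as n (1 − rho) / (1 + rho) independent ones. The function names and the figure of 150 decisions are illustrative only; decisions that are also nested and contingent, as in an SRD setup, would be harder to model than this.

```python
def ar1_variance_inflation(rho: float) -> float:
    """Approximate factor by which the variance of the mean of n AR(1)
    observations exceeds the naive sigma**2 / n (large-n approximation)."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return (1.0 + rho) / (1.0 - rho)


def effective_sample_size(n: int, rho: float) -> float:
    """Number of independent observations carrying roughly the same
    information about the mean as n serially correlated ones.
    A negative rho (alternating decisions) yields a value larger than n."""
    return n / ar1_variance_inflation(rho)


if __name__ == "__main__":
    # Hypothetical: 150 sequential decisions by one participant, each
    # correlated with the previous one at an assumed lag-1 correlation rho.
    for rho in (0.0, 0.3, 0.5, 0.8):
        print(f"rho={rho:.1f}  effective n ~ {effective_sample_size(150, rho):.1f}")
```

For example, with rho = 0.5, 150 sequential decisions carry roughly the information of 50 independent ones, so an analysis that treats them as 150 independent observations would understate the standard error of the mean by a factor of about √3.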
More sophisticated hierarchical analysis approaches could be envisioned, but without severe constraints on the participant’s range of actions those analyses would quickly become unmanageable. In other words, an SRD experiment that consists of a VR-based snapshot of reality is more like an interactive “video” of reality than a series of independent snapshots, so single actions or decision moments cannot be analyzed in isolation. Taken together, the lack of independence between measurement points prevents fundamental mechanistic questions from being addressed adequately in such a high-fidelity reproduction of reality.

Enhancing Veridicality

Enhancing the veridicality of an experimental setup raises another set of concerns about how the data can be analyzed. Since veridicality is defined as the degree to which the experimental setup predicts participants’ everyday behavior, it seems sensible to enhance veridicality by incorporating “real-life” elements into a validated experimental setup. Indeed, this is the only principled way to enhance the predictive power of a laboratory setup for everyday behavior. Yet, unfortunately, every element of “reality” added introduces a proportional level of complexity. No level of technological sophistication will solve this problem; in creating a close-to-real-world environment, one gets the corresponding real-world complexity for free. In a situation comparable to “real life,” behaviors can have wildly different explanations, or be due to the interaction of a large number of subsystems. Therefore, by making an experimental setup richer to enhance veridicality, the researcher may not be able to exclude alternative explanations of a measured effect, especially when trying to make inferences about the underlying mechanisms. Effects that are established in a controlled environment could also disappear altogether in this more complex environment.
A concrete example from our own work may help clarify this point. In an early iteration of an experiment on decision making under stress in police officers, we attempted to create an ecologically valid version of the laboratory shooter task standardized by Gladwin, Hashemi, van Ast, and Roelofs (2016). This task requires (police) participants to make a series of shoot/don’t-shoot decisions. The main effect we wanted to reproduce from that task was a difference in shooting reaction times between high- and low-threat conditions, as found by Hashemi et al. (2019). Reaction times are measured as the latency between a target stimulus appearing on a screen and the participant’s recorded response. In the laboratory task, the target stimuli (an opponent holding a gun or a phone) appear instantaneously, so the time until the participant responds can be measured with millisecond precision. In an ecologically valid VR setup, however, even target stimuli with a seemingly sharp onset, such as someone opening a door or taking an item out of their pocket, do not unfold quickly enough to define a precise onset for reaction time measurement. In addition, these actions themselves last much longer than the between-condition reaction time differences we wished to measure. Moreover, since VR allows participants to look in any direction, target stimuli could be missed altogether. Hence, the effects of threat on response times observed in the laboratory task could disappear in the VR environment.

Trial duration can also become a challenge when enhancing veridicality by adding “reality” to an established experimental setup. This concern can again be illustrated by our attempts to create an ecologically valid version of the shooter task mentioned above. In this task, participants were asked to perform 150 trials of shoot/don’t-shoot decision making.
The task can be completed in about an hour and contains enough trials to reliably measure the within-subject effect of threat (high versus low). In this setup, it is virtually impossible to include contextual elements that would make the decision similar to what police officers encounter in “real life.” To get a sense of what a more ecologically valid example could look like, and what it would imply for task duration, consider the task used by Johnson et al. (2014). In this task, lethal use-of-force decisions were inspired by real situations from police practice, and trials lasted an average of two minutes. Transforming the shooter task by Gladwin and colleagues according to Johnson’s example, while keeping the same number of repetitions needed to reliably detect the effect, would make it last almost five hours without breaks (150 trials at roughly two minutes each, or about 300 minutes). Thus, for feasibility reasons, the effects reported in Gladwin and colleagues’ experiment might not have been discovered, and probably could not be replicated, in a setup characterized by higher veridicality. This raises the question of how many repetitions an SRD experiment that relies on contextualization to elicit affective reactions can allow, and therefore how large an effect must be to be reliably measured with very few trials.

Ethical Concerns

Closely related to the implementation concerns outlined above are concerns about what is ethically feasible in experimental setups. The effort to make an environment ecologically valid and able to elicit genuine behaviors of interest may raise a range of ethical concerns that may not apply to standardized, controlled studies.
Specifically, as outlined by Pan and Hamilton (2018), immersive environments and interaction with VA carry several ethical risks: enhanced personal disclosure can lead to privacy issues (Lucas, Gratch, King, & Morency, 2014; Rizzo et al., 2015), and immersive environments can lead to changes in real-life behavior through embodiment (Tajadura-Jimenez, Banakou, Bianchi-Berthouze, & Slater, 2017) and even to false memories in children (Segovia & Bailenson, 2009). As these studies suggest, VR experiments can be considered emotionally charged environments, not only because users are immersed in a story with vivid graphics, but also because their whole body (including gaze, body posture, physiological arousal, and so on) is directly affected by this enclosed, immersive simulation. As a result, emotional experiences and their associated cognitive impacts may have long-lasting effects that linger far longer than the VR experience itself and generalize beyond the context of the experiment. Consequently, these studies point to the risk of accidentally creating traumatic experiences in VR experiments that simply aim to assess behavior. Clearly, the point of using VR in a study is to increase “immersion,” which in turn should elicit authentic emotional responses. If the responses are indeed authentic emotional experiences, they have the potential to affect participants’ well-being, as suggested by the use of VR in mood induction protocols (Baños et al., 2006) and by the potential of those induction procedures to be used in clinical contexts (Herrero, García-Palacios, Castilla, Molinari, & Botella, 2014). Moreover, the use of VR as a therapeutic intervention tool is a strong indication that it requires extra attention to potential side effects.
Indeed, VR is a promising tool in the treatment of several disorders, such as Post-Traumatic Stress Disorder (PTSD; Rizzo & Shilling, 2017), complicated grief (Botella, Osma, Palacios, Guillen, & Baños, 2008), eating disorders (de Carvalho, Dias, Duchesne, Nardi, & Appolinario, 2017), and sexual disorders (Fromberger, Jordan, & Müller, 2018). Most of the leading scientists in those fields advise taking extra care in considering the ethical issues specifically linked to VR. A comprehensive example of ethical guidelines can be found in Madary and Metzinger (2016), who review the potential implications of VR use extensively. In our own work, we faced similar issues around ethically designing an emotionally evocative VR task with high ecological validity. In the previously described VR project, aimed at training decision making under stress in police officers, situations arose in which police participants could accidentally shoot innocent bystanders. These situations seemed relatively benign and game-like to the VR simulation designers. However, our stakeholder, the Dutch Police Academy, quickly vetoed this training scenario out of concern for triggering traumatic memories and the potential for desensitizing officers or reinforcing shooting behaviors that went against their training policies. Thus, it is important to keep these ethical considerations in mind from the start when designing VR tasks from an SRD perspective, to avoid unsuitable designs and costly redesigns.

Proposed Solutions

To overcome the ethical and implementation challenges described above in the development of SRD experiments, we propose two main directions. The first is to extend the range of techniques proposed in the target article for creating experimental manipulations to include powerful tools from neuroscience.
We agree that techniques proposed by the authors, such as fMRI neurofeedback, are a good start, but we see substantial additional advantages, in terms of both feasibility and the opportunity to make causal claims, in EEG neurofeedback applications, as well as in brain stimulation techniques and other psychophysiological methods. Indeed, these techniques make better use of VR’s capabilities by preserving participants’ freedom of head movement. The second direction we propose is applying game-based approaches to SRD paradigms. We argue that it may be important to reconsider the weight given to general verisimilitude and to think about which specific elements of everyday life are necessary to claim generalizability. We suggest that less realistic, game-based elements can often prove more effective in homing in on the core causal units necessary for generalizability claims.

Extending the Use of SRD to Other Methods

One of the core promises of SRD is to allow causal inferences by isolating mechanisms, testing the experimental condition against a default control group (DCG). Creating the experimental condition by modifying the DCG (changing parts of the VR environment or of the VA) can be relatively straightforward for some applications, such as changing the gender of an interacting virtual avatar. However, as mentioned above, for more fundamental questions (e.g., investigating the neural underpinnings of specific decision-making processes), the difficulty can grow very quickly. This increased difficulty is mainly due to the need to exclude alternative explanations for observed effects in order to make causal claims, which is what has historically driven experiments away from ecological validity (see Burgess et al., 2006, for a narrative review). One interesting way to avoid, or at least address, these difficulties could come from the fields of neurofeedback, biofeedback, and brain stimulation.
Indeed, these techniques can be used to manipulate physiological parameters, and can therefore support causal inferences, by creating an experimental condition without modifying the VR environment or VA behaviors. Miller and colleagues alluded to neurofeedback as a field of application that would benefit from the SRD framework. We agree, but argue that the benefit could go both ways, as neurofeedback has been used not only for interventions but also as a tool for performing experimental manipulations in fundamental research (see Sitaram et al., 2017, for a review). Moreover, contrary to the authors’ suggestion that fMRI neurofeedback is ideal for incorporation into SRD designs, we suggest that the more VR-friendly EEG neurofeedback offers a wide range of opportunities (as exemplified by the work of Vourvopoulos et al., 2019). EEG neurofeedback has already proven its potential for investigating fundamental neuroscientific questions, such as the trainability of brain plasticity (Ros, Munneke, Ruge, Gruzelier, & Rothwell, 2010), the connection between EEG biomarkers and psychopathological symptoms (Ros, Baars, Lanius, & Vuilleumier, 2014), the normalization of scale-free dynamics in EEG (Ros et al., 2016), and the rehabilitation of stroke patients (Ros et al., 2017). Moreover, the spatial resolution of EEG neurofeedback has recently been extended by using machine learning to extract deeper, non-cortical signals, such as those from the amygdala (as elegantly shown by Keynan et al., 2019). Specific brain activity could therefore be modulated either endogenously with neurofeedback or externally with brain stimulation techniques in controlled SRD setups. These possibilities would relieve the VR environment design of the burden of providing the experimental condition.
In other words, in such an experiment the VR and VA components would be constant across the control and experimental groups, and the experimental manipulation would consist of brain training or stimulation only. Experimenters could therefore perform targeted mechanistic manipulations of brain activity in participants interacting with a generalizable environment. If the VR environment were built so that it reliably elicited the behavior of interest, this design would allow direct claims about the involvement of specific brain activity patterns in everyday functioning. One example could be studying, in ecologically valid VR contexts, the effect of disrupting (or training) posterior alpha oscillations, which have been linked to visual attention (Rihs, Michel, & Thut, 2007). This could allow researchers to link alpha-gamma phase-amplitude coupling (Pascucci, Hervais-Adelman, & Plomp, 2018) to behaviors generalizable to “real life,” which could in turn prove very useful for better understanding the neural underpinnings of everyday attentional processes. Finally, the range of techniques used to create the experimental condition in SRD could be further expanded to biofeedback protocols targeting non-cerebral psychophysiological markers. Biofeedback could be used to investigate well-established psychophysiological effects in environments that offer a higher degree of generalizability to everyday life than traditional experimental designs. Examples include studying (through training) the role of easily measurable markers such as heart-rate variability in stress management (Yu, Funk, Hu, Wang, & Feijs, 2018) or anticipatory bradycardia in decision making (Roelofs, 2017).
In these two examples, studying the contribution of well-established psychophysiological markers in generalizable contexts would pave the way for a wide range of affordable interventions aimed at changing, for example, the behavior of patients suffering from anxiety disorders.

Careful Integration of Reality: Make It a Game!

We have argued that there are a host of potential pitfalls in designing SRD studies that aspire to maximize verisimilitude and veridicality with the goal of increasing the generalizability of research results and making stronger causal claims. An alternative, and more promising, approach to creating an SRD experiment could come from the game design world. Building the DCG requires isolating the core causal units needed for generalizability. As long as these core units are maintained in the design, we may reduce the realism of our VR simulations and still elicit the behavior of interest. We suggest that elements common in commercial games can offer such a solution by making the environment not realistic, but believable (Schubert, 2013). Whereas realism is achieved by faithfully reproducing reality, believability is achieved by engaging the player through game mechanisms such as challenge or emotional narratives. When Miller et al. suggest that an SRD experiment could look like a “serious game,” they risk steering the design toward the pursuit of realism instead of believability. Serious games are developed for educational interventions (e.g., skill training), and although they usually attempt to include a “fun” component, it rarely compares with the entertainment value of commercial games (Baranowski et al., 2016).
We propose that the conventional gamification techniques incorporated in serious games will not be enough to achieve generalizability, largely because serious games are usually simplistic simulations that pay little attention to the emotional, engagement, and motivational underpinnings of players’ experience (Schoneveld, Lichtwarck-Aschoff, & Granic, 2019). This is why we think that serious games do not offer a solution for many applications of the SRD approach. However, we do advocate for game-based approaches that address participants’ core motivations and engagement in a believable way, fundamentally embedded in an emotionally relevant context. To illustrate the difference between a realistic “serious” game that feels artificial and a believable one that is experienced as relevant and motivationally engaging, consider an analogy: an SRD “serious game” is like an empty plate on which we attempt to grow human cells, while a believable game is a petri dish containing all the essential nutrients required for those cells to grow. The cells in the petri dish survive, whereas the cells on the empty plate quickly die. Similarly, in many SRD experiments the behavior of interest could not “survive” in the DCG if the context does not provide the right motivational, affective, and engaging conditions. A successful SRD experiment relies on a tradeoff between approximating real-life conditions (by using VR and VA in a believable way) and accommodating scientific imperatives. We argue that the motivational, affective, and engaging elements that elicit ecologically valid behaviors in participants can best be provided by turning the experiment into an engaging and entertaining (not “serious”) game. After all, what makes a good game greatly overlaps with the needs of SRD: an immersive experience with an interactive narrative that elicits genuine emotions in the participant.
Importantly, game-based interventions can provide solutions for some of the potential pitfalls of SRD studies we mentioned earlier: the narrative scaffolding directs participants' attention to what is at the core of the experimental goals and narrows the spectrum of actions observable in the experimental setup. Moreover, a high-quality narrative can provide a believable reason for many repetitions of a particular behavior, which would otherwise be a key pitfall of SRD studies. Using real games as a reference can also mitigate ethical concerns by reducing the risk that unwanted aspects of the simulation are learned and transferred. In turn, a well-designed game setup can greatly reduce technical difficulties in developing the SRD task, such as the uncanny valley effect, thus increasing its implementation potential and its generalizability to everyday life. A final benefit of applying SRD in the form of real games, based on our own experience in the police project described previously, is that it forces researchers to adopt a more iterative stance in their design process, which is crucial for a successful experimental setup using VR. Working closely with game designers often means that researchers need to (at least partially) adopt design thinking principles in their development process (Scholten & Granic, 2019). Creating a suitable VR environment, with an immersive narrative and emotional experience, often requires more than handing a list of requirements to game designers. Instead, designers should be included in the experimental design process very early on and help the researchers develop the task through an iterative procedure that will invariably challenge the scientists' original assumptions.
Indeed, because the complexity of an SRD experimental setup (using VR, VA, and potentially other technologies in combination) is orders of magnitude higher than that of traditional experiments, the design process may often resemble those used by commercial game companies. One such process, the Rapid Iterative Testing and Evaluation (RITE) method, has been advocated for more than a decade (Wixon, 2003) and can prove useful for testing assumptions and design choices in small consecutive steps, instead of designing and testing the full complex setup at once. It requires the early involvement of stakeholders to test and refine each prototype until both the scientific and the design goals are met. This is a time-consuming endeavor, but as Miller and colleagues themselves imply, isolating the behavior of interest and constructing a DCG are arduous tasks that require a very flexible mindset.

Conclusions

The article by Miller and colleagues (this issue) suggests that the needed shift in the way we study psychology can be provided by Systematic Representative Design (SRD). We agree, and we hope that our commentary will help researchers undertaking this endeavor to be aware of the potential challenges of designing an SRD-informed study. More than a cautionary note, our aim was to raise issues about the conceptual conversion costs involved. Simply adding "reality" to a controlled experiment with Virtual Reality and Virtual Agents in order to increase ecological validity may rarely pay off, owing to feasibility issues, ethical complications, and increased complexity in analysis methods. Our first suggestion is to widen the range of techniques used to create the experimental condition to include EEG neurofeedback, biofeedback, and brain stimulation. Our second suggestion is to draw inspiration from the game design world when designing an SRD experiment, for both the design process and the final result, and to aim at creating believable rather than realistic environments.
If these kinds of cross-disciplinary approaches are taken, we are convinced that SRD will lead to genuinely novel psychological research protocols that address many of the generalizability problems the field has suffered from for decades.

Funding

This work was supported by the National Institute of General Medical Sciences, the National Institute of Mental Health, and the National Institute on Drug Abuse.

References

Baños, R. M., Liaño, V., Botella, C., Alcañiz, M., Guerrero, B., & Rey, B. (2006). Changing induced moods via virtual reality. In W. A. IJsselsteijn, Y. A. W. de Kort, C. Midden, B. Eggen, & E. van den Hoven (Eds.), Persuasive technology. PERSUASIVE 2006. Lecture Notes in Computer Science (Vol. 3962, pp. 7–15). Berlin, Heidelberg: Springer.

Baranowski, T., Blumberg, F., Buday, R., DeSmet, A., Fiellin, L. E., Green, C. S., … Young, K. (2016). Games for health for children: Current status and needed research. Games for Health Journal, 5(1), 1–12. doi:10.1089/g4h.2015.0026

Botella, C., Osma, J., Palacios, A. G., Guillén, V., & Baños, R. (2008). Treatment of complicated grief using virtual reality: A case report. Death Studies, 32(7), 674–692. doi:10.1080/07481180802231319

Brehmer, B. (1992). Dynamic decision making: Human control of complex systems. Acta Psychologica, 81(3), 211–241. doi:10.1016/0001-6918(92)90019-A

Burgess, P., Alderman, N., Forbes, C., Costello, A., Coates, L., Dawson, D., … Channon, S. (2006). The case for the development and use of "ecologically valid" measures of executive function in experimental and clinical neuropsychology. Journal of the International Neuropsychological Society, 12(2), 194–209. doi:10.1017/S1355617706060310

Ceci, S. J., Kahan, D. M., & Braman, D. (2010). The WEIRD are even weirder than you think: Diversifying contexts is as important as diversifying samples. Behavioral and Brain Sciences, 33(2–3), 87–88. doi:10.1017/S0140525X0999152X

Cheetham, M. (2017). Editorial: The uncanny valley hypothesis and beyond. Frontiers in Psychology, 8, 1–3. doi:10.3389/fpsyg.2017.01738

de Carvalho, M. R., Dias, T. R. de S., Duchesne, M., Nardi, A. E., & Appolinario, J. C. (2017). Virtual reality as a promising strategy in the assessment and treatment of bulimia nervosa and binge eating disorder: A systematic review. Behavioral Sciences, 7(3). doi:10.3390/bs7030043

Franzen, M. D., & Wilhelm, K. L. (1996). Conceptual foundations of ecological validity in neuropsychological assessment. In R. J. Sbordone & C. J. Long (Eds.), Ecological validity of neuropsychological testing (pp. 91–112). Boca Raton, FL: St Lucie Press.

Fromberger, P., Jordan, K., & Müller, J. L. (2018). Virtual reality applications for diagnosis, risk assessment and therapy of child abusers. Behavioral Sciences and the Law, 36(2), 235–244. doi:10.1002/bsl.2332

Gladwin, T. E., Hashemi, M. M., van Ast, V., & Roelofs, K. (2016). Ready and waiting: Freezing as active action preparation under threat. Neuroscience Letters, 619, 182–188. doi:10.1016/j.neulet.2016.03.027

Hashemi, M. M., Gladwin, T. E., de Valk, N. M., Zhang, W., Kaldewaij, R., van Ast, V., … Roelofs, K. (2019). Neural dynamics of shooting decisions and the switch from freeze to fight. Scientific Reports, 9(1), 1–10. doi:10.1038/s41598-019-40917-8

Herrero, R., García-Palacios, A., Castilla, D., Molinari, G., & Botella, C. (2014). Virtual reality for the induction of positive emotions in the treatment of fibromyalgia: A pilot study over acceptability, satisfaction, and the effect of virtual reality on mood. Cyberpsychology, Behavior, and Social Networking, 17(6), 379–384. doi:10.1089/cyber.2014.0052

Johnson, R. R., Stone, B. T., Miranda, C. M., Vila, B., James, L., James, S. M., … Berka, C. (2014). Identifying psychophysiological indices of expert vs. novice performance in deadly force judgment and decision making. Frontiers in Human Neuroscience, 8, 512. doi:10.3389/fnhum.2014.00512

Keynan, J. N., Cohen, A., Jackont, G., Green, N., Goldway, N., Davidov, A., … Hendler, T. (2019). Electrical fingerprint of the amygdala guides neurofeedback training for stress resilience. Nature Human Behaviour, 3(1), 63–73. doi:10.1038/s41562-018-0484-3

Kuhlen, A. K., & Brennan, S. E. (2013). Language in dialogue: When confederates might be hazardous to your data. Psychonomic Bulletin and Review, 20(1), 54–72. doi:10.3758/s13423-012-0341-8

Lay, S., Brace, N., Pike, G., & Pollick, F. (2016). Circling around the uncanny valley: Design principles for research into the relation between human likeness and eeriness. i-Perception, 7(6), 2041669516681309. doi:10.1177/2041669516681309

Lightner, A. D., Barclay, P., & Hagen, E. H. (2017). Radical framing effects in the ultimatum game: The impact of explicit culturally transmitted frames on economic decision-making. Royal Society Open Science, 4(12), 170543. doi:10.1098/rsos.170543

Lucas, G. M., Gratch, J., King, A., & Morency, L. P. (2014). It's only a computer: Virtual humans increase willingness to disclose. Computers in Human Behavior, 37, 94–100. doi:10.1016/j.chb.2014.04.043

Madary, M., & Metzinger, T. K. (2016). Real virtuality: A code of ethical conduct. Recommendations for good scientific practice and the consumers of VR-technology. Frontiers in Robotics and AI, 3. doi:10.3389/frobt.2016.00003

Mori, M. (1970). Bukimi no tani [The uncanny valley]. Energy, 7, 33–35.

Pan, X., & Hamilton, A. F. de C. (2018). Why and how to use virtual reality to study human social interaction: The challenges of exploring a new research landscape. British Journal of Psychology, 109(3), 395–417. doi:10.1111/bjop.12290

Parsons, T. D. (2015). Virtual reality for enhanced ecological validity and experimental control in the clinical, affective and social neurosciences. Frontiers in Human Neuroscience, 9, 660. doi:10.3389/fnhum.2015.00660

Pascucci, D., Hervais-Adelman, A., & Plomp, G. (2018). Gating by induced α–γ asynchrony in selective attention. Human Brain Mapping, 39(10), 3854–3870. doi:10.1002/hbm.24216

Rihs, T. A., Michel, C. M., & Thut, G. (2007). Mechanisms of selective inhibition in visual spatial attention are indexed by alpha-band EEG synchronization. European Journal of Neuroscience, 25(2), 603–610. doi:10.1111/j.1460-9568.2007.05278.x

Rizzo, A., & Shilling, R. (2017). Clinical Virtual Reality tools to advance the prevention, assessment, and treatment of PTSD. European Journal of Psychotraumatology, 8(sup5), 1414560. doi:10.1080/20008198.2017.1414560

Rizzo, A., Cukor, J., Gerardi, M., Alley, S., Reist, C., Roy, M., … Difede, J. (2015). Virtual reality exposure for PTSD due to military combat and terrorist attacks. Journal of Contemporary Psychotherapy, 45(4), 255–264. doi:10.1007/s10879-015-9306-3

Roelofs, K. (2017). Freeze for action: Neurobiological mechanisms in animal and human freezing. Philosophical Transactions of the Royal Society B: Biological Sciences, 372(1718), 20160206. doi:10.1098/rstb.2016.0206

Ros, T., Frewen, P., Théberge, J., Michela, A., Kluetsch, R., Mueller, A., … Lanius, R. A. (2016). Neurofeedback tunes scale-free dynamics in spontaneous brain activity. Cerebral Cortex, 27, 4911–4922. doi:10.1093/cercor/bhw285

Ros, T., Baars, B. J., Lanius, R. A., & Vuilleumier, P. (2014). Tuning pathological brain oscillations with neurofeedback: A systems neuroscience framework. Frontiers in Human Neuroscience, 8, 1008. doi:10.3389/fnhum.2014.01008

Ros, T., Michela, A., Bellman, A., Vuadens, P., Saj, A., & Vuilleumier, P. (2017). Increased alpha-rhythm dynamic range promotes recovery from visuospatial neglect: A neurofeedback study. Neural Plasticity, 2017, 7407241. doi:10.1155/2017/7407241

Ros, T., Munneke, M. A. M., Ruge, D., Gruzelier, J., & Rothwell, J. C. (2010). Endogenous control of waking brain rhythms induces neuroplasticity in humans. European Journal of Neuroscience, 31(4), 770–778. doi:10.1111/j.1460-9568.2010.07100.x

Scholten, H., & Granic, I. (2019). Use of the principles of design thinking to address limitations of digital mental health interventions for youth: Viewpoint. Journal of Medical Internet Research, 21(1), e11528. doi:10.2196/11528

Schoneveld, E. A., Lichtwarck-Aschoff, A., & Granic, I. (2019). What keeps them motivated? Children's views on an applied game for anxiety. Entertainment Computing, 29, 69–74. doi:10.1016/j.entcom.2018.12.003

Schubert, D. (2013). Do we always have to strive for "realism"? Gamasutra. Retrieved from https://www.gamasutra.com/view/news/196663/Do_we_always_have_to_strive_for_realism.php

Segovia, K. Y., & Bailenson, J. N. (2009). Virtually true: Children's acquisition of false memories in virtual reality. Media Psychology, 12(4), 371–393. doi:10.1080/15213260903287267

Sitaram, R., Ros, T., Stoeckel, L., Haller, S., Scharnowski, F., Lewis-Peacock, J., … Sulzer, J. (2017). Closed-loop brain training: The science of neurofeedback. Nature Reviews Neuroscience, 18(2), 86. doi:10.1038/nrn.2016.164

Stanney, K. M., & Hash, P. (1998). Locus of user-initiated control in virtual environments: Influences on cybersickness. Presence: Teleoperators and Virtual Environments, 7(5), 447–459. doi:10.1162/105474698565848

Stanney, K. M., Hale, K. S., Nahmens, I., & Kennedy, R. S. (2003). What to expect from immersive virtual environment exposure: Influences of gender, body mass index, and past experience. Human Factors: The Journal of the Human Factors and Ergonomics Society, 45(3), 504–520. doi:10.1518/hfes.45.3.504.27254

Tajadura-Jiménez, A., Banakou, D., Bianchi-Berthouze, N., & Slater, M. (2017). Embodiment in a child-like talking virtual body influences object size perception, self-identification, and subsequent real speaking. Scientific Reports, 7(1), 1–12. doi:10.1038/s41598-017-09497-3

Vourvopoulos, A., Pardo, O. M., Lefebvre, S., Neureither, M., Saldana, D., Jahng, E., & Liew, S.-L. (2019). Effects of a brain-computer interface with virtual reality (VR) neurofeedback: A pilot study in chronic stroke patients. Frontiers in Human Neuroscience, 13, 210. doi:10.3389/fnhum.2019.00210

Wang, S., & Rochat, P. (2017). Human perception of animacy in light of the Uncanny Valley Phenomenon. Perception, 46(12), 1386–1411. doi:10.1177/0301006617722742

Wixon, D. (2003). Evaluating usability methods. Interactions, 10(4), 28. doi:10.1145/838830.838870

Yu, B., Funk, M., Hu, J., Wang, Q., & Feijs, L. (2018). Biofeedback for everyday stress management: A systematic review. Frontiers in ICT, 5, 1–23. doi:10.3389/fict.2018.00023

Złotowski, J. A., Sumioka, H., Nishio, S., Glas, D. F., Bartneck, C., & Ishiguro, H. (2015). Persistence of the uncanny valley: The influence of repeated interactions and a robot's attitude on its perception. Frontiers in Psychology, 6, 883. doi:10.3389/fpsyg.2015.00883