
The 1st International Conference on Virtual Learning, ICVL 2006

Interacting with Gestures: An Intelligent Virtual Environment

Radu Daniel VATAVU (1), Stefan-Gheorghe PENTIUC (1)
(1) University Stefan cel Mare of Suceava, str. Universitatii nr. 13, 720229 Suceava, ROMANIA
E-mail: [email protected]

Abstract

The paper addresses the opportunity of using gestures in virtual environments, taking into account current technologies for capturing human gestures, existing interaction techniques, and various aspects of gesture and gesticulation functionality. Virtual environments have proven genuinely useful for a variety of applications such as medical training and exploration, military simulations, architecture and phobia therapy. A great variety of devices has been developed for interacting effectively with such environments, and many interaction techniques have been introduced. Many of them prove cumbersome, obstructing the user to a greater or lesser degree and hence affecting his/her virtual experience. Natural gestures as a means of human-computer interaction have the great advantage of being a natural, efficient and intuitive means of communication (people use gestures every day in a natural manner). The discussion also covers gestures used for the three main tasks commonly encountered in virtual environments: selection, manipulation and travel. Gesture definition and classification according to different criteria, together with their relevance for interacting in virtual environments, are equally taken into account.

Keywords: virtual environment, human computer interaction, human gestures

1 Introduction

Virtual environments have proven of great benefit and of real usefulness for a large variety of applications ranging from medical training and exploration (Meseure and Chaillou, 2000; Meseure et al., 2003; Westwood et al., 2000) to architecture (Whyte, 2003) and military simulations (Hogue et al., 2003).
As far as the teaching process is concerned, virtual reality offers great experiences for both teachers and students (Dillenbourg, 2000; Weiss et al., 2006), who may benefit from hands-on learning, group projects (in collaborative environments), simulations, field trips and the visualization of concepts. Whatever their final aim, simulation, learning or training, virtual environments can benefit greatly from natural interaction. In the end, the virtual experience perceived by the user is influenced by experiential, cognitive, perceptual and motor factors. A great variety of devices has been developed for effectively interacting with such environments and many interaction techniques have been introduced (Hirose, 2001; Sherman and Craig, 2003). Many of them prove cumbersome, obstructing the user to a greater or lesser degree and hence affecting his/her virtual experience. Natural gestures as a means of human-computer interaction have the great advantage of being a natural, efficient and intuitive means of communication (people use gestures every day in a natural manner to interact with the real world or to convey information). The paper discusses the use of human gestures for interacting inside virtual environments, with reference to the three main tasks commonly encountered in VR: selection, manipulation and travel. Gesture definition and classification according to different criteria, together with their relevance for interacting in virtual environments, are equally taken into account.

2 An overview of human gestures

First of all, gestures can be defined as physical movements of the hands, arms, face and body made with the intent of conveying information and meaning. Gestures convey information and are accompanied by content and semantics.
Many studies (Cadoz, 1994; Cassell, 1996; Kendon, 1986; McNeill, 1992) have been dedicated to studying human gestures from a psychological point of view. (Cadoz, 1994) identifies three types of gestures according to their function:
- semiotic: gestures that produce informational messages for the environment; this is the type of gesture used for yes/no, approve/deny actions in the human-computer dialogue
- ergotic: gestures associated with the idea of work and the ability to model and manipulate the environment; this is the kind of gesture that may be considered for interacting with the objects of a virtual environment
- epistemic: gestures that gather information about the environment: temperature, pressure, surface quality of a given object, shape, orientation, weight

Semiotic gestures are further classified by (McNeill, 1992) according to their role in communication as:
- iconics: gestures that describe an actual concrete object or event and that bear a close relationship to the semantic content of speech
- metaphorics: gestures similar to iconics but referring to abstract objects or events, depicting a general abstract idea
- deictics: pointing gestures
- beat-like: gestures that accentuate the meaning of a word or phrase

(Kendon, 1986) describes a gesture continuum as follows:
- gesticulation: spontaneous movements of the hands and arms that take place during speech and always accompany it
- language-like gestures: gesticulation actually integrated into speech, replacing a word or a phrase
- pantomime: gestures that depict objects, events or actions and may or may not be accompanied by speech
- emblems: familiar gestures (for example the "V" sign for victory)
- sign languages: sets of gestures and postures that define linguistic communication systems (for example ASL, the American Sign Language)

From gesticulation to sign
languages, the association with speech becomes progressively weaker, spontaneity decreases and social regulation increases. It has been observed that gesticulation, or spontaneous gestures, accounts for about 90% of all human gestures. Studies conducted on iconic, metaphoric and deictic gestures show that they segment into three phases (Kendon, 1986; Cassell, 1996), an aspect that is very important for the gesture acquisition process in an information system: shifting into the gesture space (the preparation phase), the quick stroke, and shifting back into the resting position (the retraction phase).

3 Equipment and technology

A great variety of devices combining several technologies have been developed for interacting in virtual worlds: magnetic (such as Ascension's Flock of Birds), mechanical (Fakespace's BOOM tracker), acoustic (Logitech's Fly Mouse), inertial (InterSense IS-300), vision/video camera based, or hybrid combinations of these (InterSense IS-600). One main property of input devices is the number of DOFs (degrees of freedom) they possess. According to the type of events they generate, one may distinguish between:
- discrete input devices, which generate one event at a time in response to a user action; an event is fired, for example, when the user presses a button. Examples include the traditional keyboard, the pinch glove and the virtual tool belt. In the case of the Fakespace Labs Pinch Gloves, the user pinches two or more fingers together for the device to signal an event.
- continuous input devices, which generate a stream of events; common examples are position/orientation trackers and data gloves
- hybrid devices, which combine both discrete and continuous events.
Examples of the latter include the ring mouse (such as the Pegasus FreeD) and digital pen based tablets.

With respect to the main forms of feedback, one can classify devices into:
- ground referenced (the Phantom devices)
- body referenced (for example CyberGrasp, a lightweight force-reflecting exoskeleton that fits over a CyberGlove data glove (wired version) and adds resistive force feedback to each finger; grasp forces are produced by a network of tendons routed to the fingertips via the exoskeleton)
- tactile (CyberTouch)

3.1 Video based interaction

Video based gesture acquisition presents several advantages over other technologies: it is non-intrusive and does not require the user to wear additional equipment or devices, thus giving the comfortable feeling of a natural interaction; see Figure 1 for two hand postures associated with two distinct commands in the virtual environment (select and rotate an object), as in (Vatavu et al., 2006).

There are also a few drawbacks that come with video processing:
- the dependency on the environment (a general problem in computer vision) with regard to lighting conditions, the camera's acquisition settings, the user's skin color or the "background in motion" problem
- constraints relating to the gesture dictionary (there may be hand or finger occlusion while performing certain gestures in a one-camera scenario)
- real-time processing is a must for natural interaction, and computer vision algorithms require a lot of computational resources

Figure 1. Interacting with gestures: recognizing two hand postures (select and rotate)

Compared with other gesture acquisition technologies, video sources present the advantage of being non-intrusive; the user is not required to use or wear additional equipment or instruments (for example sensor gloves), which creates the feeling of natural interaction.
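Skin color, listed above as an environmental dependency, is also widely exploited as a fast per-pixel cue for locating hands in a video frame. A minimal sketch of an explicit rule in RGB space follows; the thresholds are illustrative, in the spirit of classic explicit skin-color rules, and are not taken from the papers cited in this text:

```python
def is_skin_rgb(r, g, b):
    """Explicit per-pixel skin rule in RGB space.

    Thresholds are illustrative (typical of classic explicit
    skin-color rules), not values from the cited papers.
    """
    return (r > 95 and g > 40 and b > 20          # skin pixels are fairly bright
            and max(r, g, b) - min(r, g, b) > 15  # and not gray
            and abs(r - g) > 15                   # red clearly separated from green
            and r > g and r > b)                  # red is the dominant channel


def skin_mask(image):
    """image: rows of (r, g, b) tuples -> binary mask (1 = skin candidate)."""
    return [[1 if is_skin_rgb(*px) else 0 for px in row] for row in image]
```

Fixed thresholds like these are sensitive to lighting, which is precisely the environment dependency noted above; adaptive, histogram-based color models are one way to mitigate it.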
Video gesture recognition thus appears as an ideal technology for human-computer interaction, eliminating the inconveniences of other methods (the unsuitability of the keyboard or the traditional mouse in virtual environments, wearing sensor gloves, etc.). Nevertheless, video processing systems have limitations: video resolution is not sufficient for detecting high-fidelity finger movements; 30 frames per second, typical of conventional video capture devices, is usually not enough for capturing quick hand movements (the hand is quicker than the eye); and finger detection may become challenging because of occlusion. Employed techniques include motion detection and background modeling (Vatavu and Pentiuc, 2006), color detection (and particularly skin color detection as a preprocessing stage in hand/face detection) as in (Caetano and Barone, 2001; Gomez, 2002; Jones and Rehg, 1998; Vatavu et al., 2005), and pattern recognition methods (Davis, 1996; Starner and Pentland, 1995; Vatavu et al., 2006).

4 Gestures in Virtual Environments

Three main tasks can be distinguished with virtual objects and inside virtual environments: selection, manipulation and navigation. Selection is the action of specifying one or more objects from a set. Usual goals of selection are: indicating an action on an object, making an object active, traveling to an object's location, or setting up manipulation. Selection performance is affected by several factors: the object's distance from the user, the object's size, and the density of objects in the area. Commonly used selection techniques include the virtual hand metaphor (Poupyrev et al., 1996), ray casting (Bolt, 1980), occlusion/framing (Pierce et al., 1997), and naming or indirect selection.
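Ray casting, mentioned above among the selection techniques, can be sketched as intersecting a pointing ray with the objects in the scene. In this illustrative sketch (not the implementation of (Bolt, 1980)), objects are approximated by bounding spheres and the nearest intersected object is selected:

```python
import math

def ray_sphere_hit(origin, direction, center, radius):
    """Return the distance along the ray to the sphere, or None on a miss.

    `direction` is assumed to be a unit vector; objects are approximated
    by bounding spheres for illustration.
    """
    ox, oy, oz = (origin[i] - center[i] for i in range(3))
    b = 2.0 * (direction[0] * ox + direction[1] * oy + direction[2] * oz)
    c = ox * ox + oy * oy + oz * oz - radius * radius
    disc = b * b - 4.0 * c          # discriminant of the quadratic in t
    if disc < 0:
        return None                 # the ray misses the sphere entirely
    t = (-b - math.sqrt(disc)) / 2.0
    return t if t >= 0 else None    # hits behind the origin do not count


def select_by_ray(origin, direction, objects):
    """objects: (name, center, radius) triples; return the nearest hit name."""
    best = None
    for name, center, radius in objects:
        t = ray_sphere_hit(origin, direction, center, radius)
        if t is not None and (best is None or t < best[0]):
            best = (t, name)
    return best[1] if best else None
```

In a gesture-driven interface the ray origin and direction would come from a tracked hand posture (e.g. pointing), after which selection reduces to the geometric test above.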
Selection tasks may be classified according to: feedback (graphical, tactile and audio), object indication (object touching, pointing, indirect selection) and the indication to select (button, gesture, voice). Manipulation is the task of modifying object properties (for example color, shape, orientation, position, behavior, etc.). The goals of manipulation may be object placement, tool usage, etc. Navigation and travel are the tasks of controlling viewpoint and camera location through movement and wayfinding. Although it is very natural to gesture in the real world in order to interact with and model real objects, or simply to transmit information, performing the same activities in a virtual environment is not self-evident. This contrasts with standard graphical user interface elements such as buttons, menus or selection lists. Gesturing comes naturally, but in virtual environments guidance must be available, such as visual reminders of the interaction techniques. Several guidelines have been proposed for using gestures to interact in virtual environments (Cerney and Vance, 2005; Turk, 2001).

5 Conclusions

An overview of the use of natural gestures for interacting with virtual environments has been presented. The discussion covered the psychological nature of gestures and the particularities of interacting in the virtual world. A brief overview of gesture capturing technology was given, with an emphasis on video based gesture acquisition.

References

Books:
Hirose Michitaka (ed.) (2001): Human-Computer Interaction - INTERACT'01, IOS Press.
McNeill David (1992): Hand and Mind: What Gestures Reveal about Thought, University of Chicago Press.
Sherman William, Craig Alan (2003): Understanding Virtual Reality: Interface, Application and Design, Morgan Kaufmann.
Weiss, J., Nolan, J., Hunsinger, J., Trifonas, P. (eds.) (2006): International Handbook of Virtual Learning Environments, Springer International Handbooks of Education, Vol. 14, Netherlands.
Westwood J.D., Hoffman H.M., Mogel G.T., Robb R.A., Stredney D. (eds.) (2000): Medicine Meets Virtual Reality 2000. Envisioning Healing: Interactive Technology and the Patient-Practitioner Dialogue, IOS Press, Amsterdam.

Journal Articles:
Cadoz Claude (1994): Le geste, canal de communication homme/machine, Technique et Science Informatiques, Vol. 13, No. 1, 31-61.
Meseure P., Chaillou C. (2000): A Deformable Body Model for Surgical Simulation, Journal of Visualization and Computer Animation, 11, 4, September 2000, 197-208.
Whyte J. (2003): Industrial applications of virtual reality in architecture and construction, ITcon, Vol. 8, Special Issue Virtual Reality Technology in Architecture and Construction, 43-50.
Cassell Justine (1996): A Framework for Gesture Generation and Interpretation, in Computer Vision in Human-Machine Interaction, R. Cipolla and A. Pentland (eds.), Cambridge University Press.
Vatavu Radu Daniel, Grisoni Laurent, Degrande Samuel, Chaillou Christophe, Pentiuc Ştefan-Gheorghe (2005): Adaptive Skin Color Detection in Unconstrained Environments using 2D Histogram Partitioning, Advances in Electrical and Computer Engineering, Volume 5, Number 1(23), 101-106.

Conference Proceedings:
Caetano T. S., Barone D. A. C. (2001): A probabilistic model for human skin color. In Proceedings of the International Conference on Image Analysis and Processing, 279-283.
Bolt Richard A. (1980): "Put-That-There": Voice and Gesture at the Graphics Interface. In Proceedings of SIGGRAPH '80, ACM SIGGRAPH, NY, 262-270.
Dillenbourg P. (2000): Virtual Learning Environments.
In Proceedings of the EUN Conference 2000: "Learning in the New Millennium: Building New Education Strategies for Schools", Workshop on Virtual Learning Environments.
Gomez Giovani (2002): On selecting color components for skin detection. In Proceedings of the International Conference on Pattern Recognition.
Hogue Jeffrey R., Markham Steve, Harmsen Arvid, MacDonald Jerry, Schmucker Cliff (2003): Parachute Mission Planning, Training and Rehearsal Using a Deployable Virtual Reality Simulator. In Proceedings of the 17th AIAA Aerodynamic Decelerator Systems Conference and Seminar, Monterey, California, May 19-22.
Meseure P., Davanne J., Hilde L., Lenoir J., France L., Triquet F., Chaillou C. (2003): A Physically Based Virtual Environment Dedicated to Surgical Simulation. In Proceedings of Surgery Simulation and Soft Tissue Modeling (IS4TM '03), LNCS, June 2003, 38-47.
Pierce J., Forsberg A., Conway M., Hong S., Zeleznik R., Mine M. (1997): Image plane interaction techniques in 3D immersive environments. In Proceedings of the ACM Symposium on Interactive 3D Graphics, ACM Press, 39-44.
Poupyrev I., Billinghurst M., Weghorst S., Ichikawa T. (1996): The go-go interaction technique: non-linear mapping for direct manipulation in VR. In Proceedings of the ACM Symposium on User Interface Software and Technology, ACM Press, 79-80.
Vatavu Radu Daniel, Pentiuc Stefan Gheorghe (2006): Motion and Color Cues for Hands Detection in Video Based Gesture Recognition. In Proceedings of the International Conference on Computers, Communications & Control, Oradea, Romania, 1-3 June, 465-469.
Vatavu Radu-Daniel, Pentiuc Ştefan-Gheorghe, Chaillou Christophe, Grisoni Laurent, Degrande Samuel (2006): Visual Recognition of Hand Postures for Interacting with Virtual Environments. In Proceedings of the 8th International Conference on Development and Application Systems, 25-27 May, Suceava, Romania, 477-482.
Technical Reports:
Cerney Melinda M., Vance Judy M. (2005): Gesture Recognition in Virtual Environments: A Review and Framework for Future Development, Iowa State University Human Computer Interaction Technical Report ISU-HCI-2005-01.
Davis James William (1996): Appearance-Based Motion Recognition of Human Actions, MIT Media Lab, TR 387.
Jones Michael J., Rehg James M. (1998): Statistical color models with application to skin detection, Cambridge Research Laboratory, TR 98/11.
Starner T., Pentland A. (1995): Real-time American Sign Language recognition from video using hidden Markov models, MIT Media Laboratory, TR 375.

Internet Sources:
Ascension Technology Corporation (2005) http://www.ascension-tech.com/products/flockofbirds.php
Fakespace Labs http://www.fakespacelabs.com
Immersion Corporation, the CyberGrasp, CyberTouch and CyberGlove devices http://www.immersion.com/3d/products/cyber_grasp.php
InterSense http://www.isense.com/products
Logitech Inc. http://www.logitech.com
Pegasus Technologies Ltd. http://www.pegatech.com
SensAble Technologies, the Phantom devices http://www.sensable.com/products/phantom_ghost/phantom-omni.asp
Turk Matthew, Gesture Recognition, chapter 10, http://ilab.cs.ucsb.edu/projects/turk/TurkVEChapter.pdf