In its early years, the field of computer vision was largely motivated by researchers seeking computational models of biological vision and solutions to practical problems in manufacturing, defense, and medicine. For the past two decades or so, there has been an increasing interest in computer vision as an input modality in the context of human-computer interaction. Such vision-based interaction can endow interactive systems with visual capabilities similar to those important to human-human interaction, in order to perceive non-verbal cues and incorporate this information in applications such as interactive gaming, visualization, art installations, intelligent agent interaction, and various kinds of command and control tasks. Enabling this kind of rich, visual and multimodal interaction requires interactive-time solutions to problems such as detecting and recognizing faces and facial expressions, determining a person's direction of gaze and focus of attention, tracking movement of the body, and recognizing various kinds of gestures. In building technologies for vision-based interaction, there are choices to be made as to the range of possible sensors employed (e.g., single camera, stereo rig, depth camera), the precision and granularity of the desired outputs, the mobility of the solution, usability issues, etc. Practical considerations dictate that there is not a one-size-fits-all solution to the variety of interaction scenarios; however, there are principles and methodological approaches common to a wide range of problems in the domain. While new sensors such as the Microsoft Kinect are having a major influence on the research and practice of vision-based interaction in various settings, they are just a starting point for continued progress in the area.
In this book, we discuss the landscape of history, opportunities, and challenges in this area of vision-based interaction; we review the state of the art and seminal works in detecting and recognizing the human body and its components; we explore both static and dynamic approaches to "looking at people" vision problems; and we place the computer vision work in the context of other modalities and multimodal applications. Readers should gain a thorough understanding of current and future possibilities of computer vision technologies in the context of human-computer interaction.
Table of Contents: Preface / Acknowledgments / Figure Credits / Introduction / Awareness: Detection and Recognition / Control: Visual Lexicon Design for Interaction / Multimodal Integration / Applications of Vision-Based Interaction / Summary and Future Directions / Bibliography / Authors' Biographies