Distributed, Ambient and Pervasive Interactions: Technologies and Contexts

This two-volume set constitutes the refereed proceedings of the 6th International Conference on Distributed, Ambient and Pervasive Interactions, DAPI 2018, held as part of the 20th International Conference on Human-Computer Interaction, HCII 2018, in Las Vegas, NV, USA, in July 2018. A total of 1171 papers and 160 posters were presented at the 14 co-located HCII 2018 conferences; the papers were carefully reviewed and selected from 4346 submissions. They address the latest research and development efforts and highlight the human aspects of the design and use of computing systems, covering the entire field of Human-Computer Interaction and addressing major advances in knowledge and in the effective use of computers in a variety of application areas. LNCS 10921 and LNCS 10922 contain papers addressing the following major topics: Understanding Humans (Part I) and Technologies and Contexts (Part II).




LNCS 10922

Norbert Streitz Shin’ichi Konomi (Eds.)

Distributed, Ambient and Pervasive Interactions Technologies and Contexts 6th International Conference, DAPI 2018 Held as Part of HCI International 2018 Las Vegas, NV, USA, July 15–20, 2018, Proceedings, Part II


Lecture Notes in Computer Science Commenced Publication in 1973 Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

Editorial Board

David Hutchison, Lancaster University, Lancaster, UK
Takeo Kanade, Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler, University of Surrey, Guildford, UK
Jon M. Kleinberg, Cornell University, Ithaca, NY, USA
Friedemann Mattern, ETH Zurich, Zurich, Switzerland
John C. Mitchell, Stanford University, Stanford, CA, USA
Moni Naor, Weizmann Institute of Science, Rehovot, Israel
C. Pandu Rangan, Indian Institute of Technology Madras, Chennai, India
Bernhard Steffen, TU Dortmund University, Dortmund, Germany
Demetri Terzopoulos, University of California, Los Angeles, CA, USA
Doug Tygar, University of California, Berkeley, CA, USA
Gerhard Weikum, Max Planck Institute for Informatics, Saarbrücken, Germany

10922

More information about this series at http://www.springer.com/series/7409

Norbert Streitz Shin’ichi Konomi (Eds.) •

Distributed, Ambient and Pervasive Interactions Technologies and Contexts 6th International Conference, DAPI 2018 Held as Part of HCI International 2018 Las Vegas, NV, USA, July 15–20, 2018 Proceedings, Part II


Editors Norbert Streitz Smart Future Initiative Frankfurt am Main Germany

Shin’ichi Konomi Learning Analytics Center Kyushu University Fukuoka Japan

ISSN 0302-9743    ISSN 1611-3349 (electronic)
Lecture Notes in Computer Science
ISBN 978-3-319-91130-4    ISBN 978-3-319-91131-1 (eBook)
https://doi.org/10.1007/978-3-319-91131-1

Library of Congress Control Number: 2018942172

LNCS Sublibrary: SL3 – Information Systems and Applications, incl. Internet/Web, and HCI

© Springer International Publishing AG, part of Springer Nature 2018

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Printed on acid-free paper

This Springer imprint is published by the registered company Springer International Publishing AG, part of Springer Nature.
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Foreword

The 20th International Conference on Human-Computer Interaction, HCI International 2018, was held in Las Vegas, NV, USA, during July 15–20, 2018. The event incorporated the 14 conferences/thematic areas listed on the following page.

A total of 4,373 individuals from academia, research institutes, industry, and governmental agencies from 76 countries submitted contributions, and 1,170 papers and 195 posters have been included in the proceedings. These contributions address the latest research and development efforts and highlight the human aspects of design and use of computing systems. The contributions thoroughly cover the entire field of human-computer interaction, addressing major advances in knowledge and effective use of computers in a variety of application areas. The volumes constituting the full set of the conference proceedings are listed in the following pages.

I would like to thank the program board chairs and the members of the program boards of all thematic areas and affiliated conferences for their contribution to the highest scientific quality and the overall success of the HCI International 2018 conference.

This conference would not have been possible without the continuous and unwavering support and advice of the founder, Conference General Chair Emeritus and Conference Scientific Advisor Prof. Gavriel Salvendy. For his outstanding efforts, I would like to express my appreciation to the communications chair and editor of HCI International News, Dr. Abbas Moallem.

July 2018

Constantine Stephanidis

HCI International 2018 Thematic Areas and Affiliated Conferences

Thematic areas:
• Human-Computer Interaction (HCI 2018)
• Human Interface and the Management of Information (HIMI 2018)

Affiliated conferences:
• 15th International Conference on Engineering Psychology and Cognitive Ergonomics (EPCE 2018)
• 12th International Conference on Universal Access in Human-Computer Interaction (UAHCI 2018)
• 10th International Conference on Virtual, Augmented, and Mixed Reality (VAMR 2018)
• 10th International Conference on Cross-Cultural Design (CCD 2018)
• 10th International Conference on Social Computing and Social Media (SCSM 2018)
• 12th International Conference on Augmented Cognition (AC 2018)
• 9th International Conference on Digital Human Modeling and Applications in Health, Safety, Ergonomics, and Risk Management (DHM 2018)
• 7th International Conference on Design, User Experience, and Usability (DUXU 2018)
• 6th International Conference on Distributed, Ambient, and Pervasive Interactions (DAPI 2018)
• 5th International Conference on HCI in Business, Government, and Organizations (HCIBGO)
• 5th International Conference on Learning and Collaboration Technologies (LCT 2018)
• 4th International Conference on Human Aspects of IT for the Aged Population (ITAP 2018)

Conference Proceedings Volumes Full List

1. LNCS 10901, Human-Computer Interaction: Theories, Methods, and Human Issues (Part I), edited by Masaaki Kurosu
2. LNCS 10902, Human-Computer Interaction: Interaction in Context (Part II), edited by Masaaki Kurosu
3. LNCS 10903, Human-Computer Interaction: Interaction Technologies (Part III), edited by Masaaki Kurosu
4. LNCS 10904, Human Interface and the Management of Information: Interaction, Visualization, and Analytics (Part I), edited by Sakae Yamamoto and Hirohiko Mori
5. LNCS 10905, Human Interface and the Management of Information: Information in Applications and Services (Part II), edited by Sakae Yamamoto and Hirohiko Mori
6. LNAI 10906, Engineering Psychology and Cognitive Ergonomics, edited by Don Harris
7. LNCS 10907, Universal Access in Human-Computer Interaction: Methods, Technologies, and Users (Part I), edited by Margherita Antona and Constantine Stephanidis
8. LNCS 10908, Universal Access in Human-Computer Interaction: Virtual, Augmented, and Intelligent Environments (Part II), edited by Margherita Antona and Constantine Stephanidis
9. LNCS 10909, Virtual, Augmented and Mixed Reality: Interaction, Navigation, Visualization, Embodiment, and Simulation (Part I), edited by Jessie Y. C. Chen and Gino Fragomeni
10. LNCS 10910, Virtual, Augmented and Mixed Reality: Applications in Health, Cultural Heritage, and Industry (Part II), edited by Jessie Y. C. Chen and Gino Fragomeni
11. LNCS 10911, Cross-Cultural Design: Methods, Tools, and Users (Part I), edited by Pei-Luen Patrick Rau
12. LNCS 10912, Cross-Cultural Design: Applications in Cultural Heritage, Creativity, and Social Development (Part II), edited by Pei-Luen Patrick Rau
13. LNCS 10913, Social Computing and Social Media: User Experience and Behavior (Part I), edited by Gabriele Meiselwitz
14. LNCS 10914, Social Computing and Social Media: Technologies and Analytics (Part II), edited by Gabriele Meiselwitz
15. LNAI 10915, Augmented Cognition: Intelligent Technologies (Part I), edited by Dylan D. Schmorrow and Cali M. Fidopiastis
16. LNAI 10916, Augmented Cognition: Users and Contexts (Part II), edited by Dylan D. Schmorrow and Cali M. Fidopiastis
17. LNCS 10917, Digital Human Modeling and Applications in Health, Safety, Ergonomics, and Risk Management, edited by Vincent G. Duffy
18. LNCS 10918, Design, User Experience, and Usability: Theory and Practice (Part I), edited by Aaron Marcus and Wentao Wang


19. LNCS 10919, Design, User Experience, and Usability: Designing Interactions (Part II), edited by Aaron Marcus and Wentao Wang
20. LNCS 10920, Design, User Experience, and Usability: Users, Contexts, and Case Studies (Part III), edited by Aaron Marcus and Wentao Wang
21. LNCS 10921, Distributed, Ambient, and Pervasive Interactions: Understanding Humans (Part I), edited by Norbert Streitz and Shin’ichi Konomi
22. LNCS 10922, Distributed, Ambient, and Pervasive Interactions: Technologies and Contexts (Part II), edited by Norbert Streitz and Shin’ichi Konomi
23. LNCS 10923, HCI in Business, Government, and Organizations, edited by Fiona Fui-Hoon Nah and Bo Sophia Xiao
24. LNCS 10924, Learning and Collaboration Technologies: Design, Development and Technological Innovation (Part I), edited by Panayiotis Zaphiris and Andri Ioannou
25. LNCS 10925, Learning and Collaboration Technologies: Learning and Teaching (Part II), edited by Panayiotis Zaphiris and Andri Ioannou
26. LNCS 10926, Human Aspects of IT for the Aged Population: Acceptance, Communication, and Participation (Part I), edited by Jia Zhou and Gavriel Salvendy
27. LNCS 10927, Human Aspects of IT for the Aged Population: Applications in Health, Assistance, and Entertainment (Part II), edited by Jia Zhou and Gavriel Salvendy
28. CCIS 850, HCI International 2018 Posters Extended Abstracts (Part I), edited by Constantine Stephanidis
29. CCIS 851, HCI International 2018 Posters Extended Abstracts (Part II), edited by Constantine Stephanidis
30. CCIS 852, HCI International 2018 Posters Extended Abstracts (Part III), edited by Constantine Stephanidis

http://2018.hci.international/proceedings

6th International Conference on Distributed, Ambient, and Pervasive Interactions

Program Board Chair(s): Norbert Streitz, Germany and Shin’ichi Konomi, Japan

• Andreas Braun, Germany
• Wei Chen, P.R. China
• Alois Ferscha, Austria
• Dimitris Grammenos, Greece
• Nuno Guimarães, Portugal
• Jun Hu, The Netherlands
• Pedro Isaias, Australia
• Achilles Kameas, Greece
• Kristian Kloeckl, USA
• Antonio Maña, Spain
• Takuya Maekawa, Japan
• Panos Markopoulos, The Netherlands
• Irene Mavrommati, Greece
• Tatsuo Nakajima, Japan
• Anton Nijholt, The Netherlands
• Guochao (Alex) Peng, P.R. China
• Carsten Röcker, Germany
• Tanya Toft, Denmark
• Reiner Wichert, Germany
• Chui Yin Wong, Malaysia
• Woontack Woo, South Korea
• Xenophon Zabulis, Greece

The full list with the Program Board Chairs and the members of the Program Boards of all thematic areas and affiliated conferences is available online at:

http://www.hci.international/board-members-2018.php

HCI International 2019

The 21st International Conference on Human-Computer Interaction, HCI International 2019, will be held jointly with the affiliated conferences in Orlando, FL, USA, at Walt Disney World Swan and Dolphin Resort, July 26–31, 2019. It will cover a broad spectrum of themes related to Human-Computer Interaction, including theoretical issues, methods, tools, processes, and case studies in HCI design, as well as novel interaction techniques, interfaces, and applications. The proceedings will be published by Springer. More information will be available on the conference website: http://2019.hci.international/.

General Chair
Prof. Constantine Stephanidis
University of Crete and ICS-FORTH
Heraklion, Crete, Greece
E-mail: [email protected]

http://2019.hci.international/

Contents – Part II

Human Activity and Context Understanding

Understanding Animal Behavior Using Their Trajectories: A Case Study of Gender Specific Trajectory Trends
Ilya Ardakani, Koichi Hashimoto, and Ken Yoda

Visualization of Real World Activity on Group Work
Daisuke Deguchi, Kazuaki Kondo, and Atsushi Shimada

A Multi-level Localization System for Intelligent User Interfaces
Mario Heinz, Sebastian Büttner, Martin Wegerich, Frank Marek, and Carsten Röcker

Survey on Vision-Based Path Prediction
Tsubasa Hirakawa, Takayoshi Yamashita, Toru Tamaki, and Hironobu Fujiyoshi

Neural Mechanisms of Animal Navigation
Koutarou D. Kimura, Masaaki Sato, and Midori Sakura

Towards Supporting Multigenerational Co-creation and Social Activities: Extending Learning Analytics Platforms and Beyond
Shin’ichi Konomi, Kohei Hatano, Miyuki Inaba, Misato Oi, Tsuyoshi Okamoto, Fumiya Okubo, Atsushi Shimada, Jingyun Wang, Masanori Yamada, and Yuki Yamada

Designing a Mobile Behavior Sampling Tool for Spatial Analytics
Shin’ichi Konomi and Tomoyo Sasao

Design and Evaluation of Seamless Learning Analytics
Kousuke Mouri, Noriko Uosaki, and Atsushi Shimada

Easy-to-Install Methods for Indoor Context Recognition Using Wi-Fi Signals
Kazuya Ohara and Takuya Maekawa

Finding Discriminative Animal Behaviors from Sequential Bio-Logging Trajectory Data
Takuto Sakuma, Kazuya Nishi, Shuhei J. Yamazaki, Koutarou D. Kimura, Sakiko Matsumoto, Ken Yoda, and Ichiro Takeuchi

A Look at Feet: Recognizing Tailgating via Capacitive Sensing
Dirk Siegmund, Sudeep Dev, Biying Fu, Doreen Scheller, and Andreas Braun

Sensing, Perception and Decision for Deep Learning Based Autonomous Driving
Takayoshi Yamashita

Human Enhancement in Intelligent Environments

The Reconfigurable Wall System: Designing a Responsive Structure Reactive to Socio-Environmental Conditions
Mostafa Alani, Arash Soleimani, Evan Murray, Anthony Bah, Adam Leicht, and Salman Sajwani

Can Machine Learning Techniques Provide Better Learning Support for Elderly People?
Kohei Hatano

Holistic Quantified Self Framework for Augmented Human
Juyoung Lee, Eunseok Kim, Jeongmin Yu, Junki Kim, and Woontack Woo

An Intuitive and Personal Projection Interface for Enhanced Self-management
Doreen Scheller, Benjamin Bauer, Andrea Krajewski, Claudius Coenen, Dirk Siegmund, and Andreas Braun

Potential of Wearable Technology for Super-Aging Societies
Atsushi Shimada

Evaluating Learning Style-Based Grouping Strategies in Real-World Collaborative Learning Environment
Yuta Taniguchi, Yiduo Gao, Kentaro Kojima, and Shin’ichi Konomi

Behavior Mapping of Sketching in VR Space with Physical Tablet Interface
Wenjie Xu, Defu Bao, Qifei Wu, Yi Zhou, Xuning Wu, Fangtian Ying, and Cheng Yao

Effective Learning Environment Design for Aging Well: A Review
Masanori Yamada, Misato Oi, and Shin’ichi Konomi

Affect and Humour in Intelligent Environments

Computing Atmospheres
Yasmine Abbas

Providing Daily Casual Information Through Eye Contact with Emotional Creatures
Hina Akasaki, Kota Gushima, and Tatsuo Nakajima

Touch: Communication of Emotion Through Computational Textile Expression
Felecia Davis

Comparing Jokes with NLP: How Far Can Joke Vectors Take Us?
Xiaonan Jing, Chinmay Talekar, and Julia Taylor Rayz

Designing Humour in Interaction: A Design Experience
Andreea I. Niculescu, Bimlesh Wadhwa, and Anton Nijholt

Humor Facilitation of Polarized Events
Alessandro Valitutti

Plug and Play for a Transferrable Sense of Humour
Tony Veale

Automatic Joke Generation: Learning Humor from Examples
Thomas Winters, Vincent Nys, and Daniel De Schreye

Author Index

Contents – Part I

Designing and Developing Intelligent Environment

Design Towards AI-Powered Workplace of the Future
Yujia Cao, Jiri Vasek, and Matej Dusik

A Comparative Testing on Performance of Blockchain and Relational Database: Foundation for Applying Smart Technology into Current Business Systems
Si Chen, Jinyu Zhang, Rui Shi, Jiaqi Yan, and Qing Ke

Hybrid Connected Spaces: Mediating User Activities in Physical and Digital Space
Carla Farina, Sotirios D. Kotsopoulos, and Federico Casalegno

A Novel Interaction Design Approach for Accessing Daily Casual Information Through a Virtual Creature
Kota Gushima, Hina Akasaki, and Tatsuo Nakajima

Automatic Generation of Human-Computer Interfaces from BACnet Descriptions
Lawrence Henschen, Julia Lee, and Ries Guthmann

The AR Strip: A City Incorporated Augmented Reality Educational Curriculum
Si Jung Kim, Su Jin Park, Yunhwan Jeong, Jehoshua Josue, and Mary Valdez

Evaluating User Experience in Smart Home Contexts: A Methodological Framework
Peter Mechant, Anissa All, and Lieven De Marez

Planning Placement of Distributed Sensor Nodes to Achieve Efficient Measurement
Yuichi Nakamura, Masaki Ito, and Kaoru Sezaki

Flavor Explore: Rapid Prototyping and Evaluation of User Interfaces
Shi Qiu, Liangyi Du, Ting Han, and Jun Hu

HCI Design for People with Visual Disability in Social Interaction
Shi Qiu, Ting Han, Hirotaka Osawa, Matthias Rauterberg, and Jun Hu

On Interdependent Metabolic Structures: The Case of Cyborg Garden
Zenovia Toloudi and Spyridon Ampanavos

VisHair: A Wearable Fashion Hair Lighting Interaction System
Cheng Yao, Bing Li, Fangtian Ying, Ting Zhang, and Yijun Zhao

Design for Fetal Heartbeat Detection and Monitoring in Pregnancy Care
Biyong Zhang, Iuliia Lebedeva, Haiqiang Zhang, and Jun Hu

Internet of Things and Smart Cities

Collecting Bus Locations by Users: A Crowdsourcing Model to Estimate Operation Status of Bus Transit Service
Kenro Aihara, Piao Bin, Hajime Imura, Atsuhiro Takasu, and Yuzuru Tanaka

Home Automation Internet of Things: Adopted or Diffused?
Badar H. Al Lawati and Xiaowen Fang

Visualization of Farm Field Information Based on Farm Worker Activity Sensing
Daisaku Arita, Yoshiki Hashimoto, Atsushi Shimada, Hideaki Uchiyama, and Rin-ichiro Taniguchi

The Use of Live-Prototypes as Proxy Technology in Smart City Living Lab Pilots
Michelle Boonen and Bram Lievens

Study on Innovative Design of Urban Intelligent Lighting Appliance (UILA) Based on Kansei Engineering
Jianxin Cheng, Junnan Ye, Chaoxiang Yang, Lingyun Yao, Zhenzhen Ma, and Tengye Li

UMA-P: Smart Bike Interaction that Adapts to Environment, User Habits and Companions
Jiachun Du, Ran Luo, Min Zou, Yuebo Shen, and Ying Yang

Simulation of Energy Management by Controlling Crowd Behavior
Maiya Hori, Keita Nakayama, Atsushi Shimada, and Rin-ichiro Taniguchi

Socio-Technical Challenges of Smart Fleet Equipment Management Systems in the Maritime Industry
Jingyi Jiang, Guochao Peng, and Fei Xing

Opportunistic Data Exchange Algorithm for Animal Wearable Device Through Active Behavior Against External Stimuli
Keijiro Nakagawa, Atsuya Makita, Miho Nagasawa, Takefumi Kikusui, Kaoru Sezaki, and Hiroki Kobayashi

Measuring Scarcity or Balancing Abundance: Some Reflections on Human-Building Interaction Paradigms from an Architectural Perspective
Selena Savic

Design and Development of an Electric Skateboard Controlled Using Weight Sensors
Sai Vinay Sayyapureddi, Vishnu Raju Nandyala, Akil Komarneni, and Deep Seth

Challenges for Deploying IoT Wearable Medical Devices Among the Ageing Population
Fei Xing, Guochao Peng, Tian Liang, and Jingyi Jiang

Practical and Numerical Investigation on a Minimal Design Navigation System of Bats
Yasufumi Yamada, Kentaro Ito, Ryo Kobayashi, Shizuko Hiryu, and Yoshiaki Watanabe

Design and Research on Human-Computer Interactive Interface of Navigation Robot in the IOT Mode
Ye Zhang, Bingmei Bie, and Rongrong Fu

Intelligent Environments for Cultural Heritage and Creativity

Collaborative Music Composition Based on Sonic Interaction Design
Mauro Amazonas, Victor Vasconcelos, Adriano Brandão, Gustavo Kienem, Thaís Castro, Bruno Gadelha, and Hugo Fuks

A Study on the Virtual Reality of Folk Dance and Print Art - Taking White Crane Dance for Example
Jia-Ming Day, Der-Lor Way, Ke-Jiuan Chen, Weng-Kei Lau, and Su-Chu Hsu

LIVEJACKET: Wearable Music Experience Device with Multiple Speakers
Satoshi Hashizume, Shinji Sakamoto, Kenta Suzuki, and Yoichi Ochiai

An Interactive Smart Music Toy Design for Children
Shijian Luo, Yun Wang, Na Xiong, Ping Shan, and Yexing Zhou

Robotic Stand-Up Comedy: State-of-the-Art
Anton Nijholt

Study on the Digital Expansion of Chinese Static Works of Art
Jin Sheng and Ziqiao Wang

Case Study of AR Field Museum for Activating Local Communities
Tomohiro Tanikawa, Junichi Nakano, Takuji Narumi, and Michitaka Hirose

VR Games and the Dissemination of Cultural Heritage
Lie Zhang, Weiying Qi, Kun Zhao, Liang Wang, Xingdong Tan, and Lin Jiao

Thinking Transformation of Traditional Animation Creation Based on the Virtual Reality Presentation
Yue Zhou and Yunpeng Xu

Author Index

Human Activity and Context Understanding

Understanding Animal Behavior Using Their Trajectories: A Case Study of Gender Specific Trajectory Trends

Ilya Ardakani1, Koichi Hashimoto1(✉), and Ken Yoda2

1 Department of System Information Sciences, Graduate School of Information Sciences, Tohoku University, Aoba-ku, Sendai 980-8579, Japan
[email protected], [email protected]
2 Department of Behavior and Evolution, Graduate School of Environmental Studies, Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8601, Japan
[email protected]

Abstract. Behavior and movement are generally closely related: in most cases, behavior is expressed as a set of specific movement patterns over time. These movement traces, or trajectories, can therefore provide a window into the underlying state of the subjects. In this study, analogies are drawn between text and trajectories, which allows sentiment analysis and topic model methods to be employed for trajectory analysis. It is assumed that trajectories consist of key points that are commonly and frequently traversed. It is proposed that, analogously to words in a document, the frequencies of these key points encapsulate information about the subject, and that the key points in a trajectory are generated by a latent distribution attributed to a certain behavior or to a specific group of subjects with similar behavioral features. To test this hypothesis, an experiment was conducted that examines the influence of gender on the composition of key points in trajectories logged from a seabird species, the Streaked Shearwater Calonectris leucomelas. It is shown that the genders have distinct distributions over the key points. Therefore, key point membership in a trajectory can be attributed to a specific gender, and even a simple classifier provides information about the gender of the subject simply by observing the trajectory's key points. It is concluded that, like text, trajectories are composed of smaller elements that can be associated with a specific latent state; learning or exploiting these associations reveals essential information about the identity and behavior of the observed subject.

Keywords: Animal movement · Animal behavior · Trajectory mining

1 Introduction

Movement of an organism typically provides a considerable amount of information about its behavior and internal state. In general, movement refers to a change in spatial features over time. These features could be the size or position of parts of the organism, or the location of the organism as a whole; in this study, the focus is on the latter. Movement data offers information about the lifestyle of an organism [1], its role, and its interaction with its environment [2]. Nathan et al. [3] observed the importance of movement to the life and fate of organisms and proposed a unifying framework for research about it. Furthermore, Cagnacci et al. [4] investigated the effects and importance of advances in GPS-based radio telemetry for research on animal ecology and behavior, noting both advantages, such as the continuous availability and accuracy of the data, and disadvantages, ranging from device errors and failures to battery and memory limitations. On the other hand, considering the ever-growing volume of logged data, researchers require new methods to manage, analyze, and visualize the collected movement information [5]. The results of such analyses provide a window into the behavior of organisms, their reaction to environmental stimuli, and their life phases [6–10]. In addition, information about the environment can be inferred from movement data collected from organisms; for instance, Goto et al. showed that it is possible to use sensory data collected from seabirds to create a higher-resolution map of wind data [11].

The study of movement is typically approached in either a Eulerian or a Lagrangian way. The Lagrangian approach analyzes and quantifies an individual's movement, while the Eulerian approach concerns population movements and distributions. It is worth mentioning that the availability of lightweight GPS-based telemetry devices has increasingly made Lagrangian approaches possible for logging the movement of relatively small and long-range organisms [12].

This study limits its scope to extracting species-specific behavioral cues from movement data. It aims to contribute a solution for the semantic analysis of trajectories rather than dynamical considerations, and it is closer to the Eulerian approach than to the Lagrangian approach. Trajectories are converted into series of key points identified based on the density of visits at the corresponding spatial locations. These series are then analyzed to identify behavioral features of the subjects that generated the trajectories. Given the conceptual framework proposed in [3], only the components of internal state (why move?) and navigation capacity (where to move?) are considered. Analogies are drawn between language models and these sequences, and techniques used in sentiment analysis are then utilized to differentiate the trajectories based on behavioral features of the subject. To demonstrate the practicality of this approach, it was applied to trajectory data collected from a seabird species, the Streaked Shearwater Calonectris leucomelas, to discover well-documented categorical differences between the trajectories of male and female birds. The resulting feature differences express hidden states of the generators of the trajectories, which confirm the behavioral differences between the genders.

The rest of this paper is arranged as follows. The next section provides related work and a background review of the methods used. In the third section, the applied methods are elaborated, and in the fourth section the experiment setup and results are presented and discussed. Finally, in the last section conclusions are drawn.

2 Related Work and Literature Review

As mentioned in the previous section, the conceptual framework for movement ecology proposed by Nathan et al. [3] consists of three fundamental components. These components originated from addressing the following essential questions: Why move? How to move? When to move? What are the consequences of movement in terms of ecology and evolution? Answering these key questions brings about the major challenges of analyzing movement data. One is how the data are collected with regard to sampling frequency and resolution. The next is the segmentation of the logged stream of spatial points into a meaningful sequence of phases. Another major challenge is producing multilevel spatiotemporal scales of trajectory phases; as presented by Getz et al. [13], movement phases consist of canonical activity modes which can vary from tumbling to gliding across organisms. The final challenge is attributed to the environment in which the organisms reside. Environmental factors certainly affect all components of the introduced framework; for instance, vegetation or wind effectively adjusts the movement path of an animal [11, 14]. A complete diagram of the framework proposed by Nathan et al. [3] is shown in Fig. 1.

Given the movement path component of this framework, there have been attempts to infer the internal state, external factors, or motion capacity of the organism. For instance, Gao et al. [15] utilized sequence analysis and clustering techniques to identify land animal species based on their trajectories, since different species have distinguishable internal states, motion capacities, and navigation capacities, which lead to the generation of different movement paths. To quantify these movements, trajectories were converted to sequences of spatially clustered points, and these sequences were then compared with each other using the Longest Common SubSequence method and entropy [16, 17]. Brillinger et al. [18] used stochastic differential equation models and statistical inference on Fourier transforms of trajectory data of land animal species, elk and deer, to analyze the effects of habitat variables and the behavior of other animals on their trajectories. Bowyer and Stewart et al. [19, 20] studied gender segregation in deer species based on their behavioral response to habitat alterations; such differential movements and use of space were also observed by Beier [21].

Fig. 1. Movement ecology framework proposed in [3]. Components concerning cognitive studies contain both individual’s and external factors.


Gender-specific behavior identification based on movement data has been the topic of various biological studies. Since the focus of this study is the Streaked Shearwater, related research is reviewed here. The foraging behavior and foraging locations of Streaked Shearwaters were studied in [9, 10, 22]. A change in foraging behavior and strategy depending on trip length was observed by Matsumoto et al. [9]. Yoda et al. [22] researched the relationships between animal behavior and ocean currents, and Tiunov et al. [10] likewise observed a relationship between water flows and Streaked Shearwater visits. One can conclude that trajectories serve as a medium for understanding animal behavior. Since text mining techniques are employed in this work, the following subsections provide a brief review of the required background, starting with short definitions of the stay point, a basic concept in trajectory mining, and DBSCAN, a density-based clustering method.

2.1 Stay Point

Stay points in trajectory mining are defined as locations at which the subject remains within a certain spatial range for at least a certain span of time [1]. It has been proposed that stay points carry semantic meaning. Stay points can be divided into two major types: one where the subject remains stationary for the threshold time span, and one where the subject remains within a certain radius and the mean of the coordinates represents the stay point location.

2.2 Density-Based Clustering

In contrast to clustering techniques like K-Means and Gaussian Mixture Models, density-based clustering methods do not require an initial guess of the number of clusters. One density-based clustering method is DBSCAN, introduced by Ester et al. [23]. It only requires a minimum neighbor count and a neighborhood distance threshold to define a point as a core point. A point that lies in the neighborhood of a core point but has fewer than the minimum number of neighbors in its own neighborhood is called a border point. A directly density-reachable point is defined as a point in the minimum neighborhood of a core point, and two points are density-connected if both are members of the neighborhood of a third core point. DBSCAN starts with an arbitrary point and detects all density-reachable points from it; given that the point has the minimum number of neighbors, the set is recorded as a cluster, and then the next point is processed. If the minimum distance between the points of two clusters is larger than the distance threshold, the clusters are kept separate; otherwise they are merged. The algorithm is explained in detail in [23].

2.3 Term Frequency and Inverse Document Frequency

In text mining, to employ numerical methods, the text contents must be represented as numbers or numerical vectors. There are different approaches to this problem; only the ones used in this work are reviewed here. Typically, words in documents are represented as tokens with integer ids. Term frequency tf is the number of occurrences of a token in a document. This provides a numerical representation of documents of variable length with fixed-length vectors: by fixing the number of tokens, a corpus can be represented as an m × n matrix, where m is the number of tokens and n is the number of documents. In other words, the corpus consists of n samples with feature vectors of length m. Using raw token frequency may not be the most efficient way of representing a document, since some words, such as "is", "the", and "a", appear in most documents very frequently while carrying very little semantic information. To counter this, the term frequency-inverse document frequency (tf-idf) method is used [24]. It is calculated by weighting tf by the idf of each token and then normalizing by the Euclidean norm.

2.4 Sentiment Analysis and Topic Models

Sentiment analysis is the identification, extraction, and quantification of latent states and subjective information from text and speech. Within the scope of this study, sentiment analysis is used to infer information about the topic of a document or speech and about the internal state or attitude of its generator. This latent structure can be modeled with generative distributions that produce the words in a document, which is also referred to as topic modeling. For instance, Latent Dirichlet Allocation (LDA) is a probabilistic model which intuitively explains documents' topics and their words [25]: given a random mixture of hidden topics in a document, words are distributed according to topic-specific distributions in that document. This consists of two series of Dirichlet draws, for the document-topic and topic-word distributions, and two series of multinomial draws, for the topic assignment of each word and for the generation of each word. The model's parameters are commonly estimated using inference methods like Gibbs sampling [26]. In a discriminative approach, sentiment analysis can be utilized to detect the polarity of speech or a document. This involves a family of discriminative models which classify the text sentiment or document topic based on features extracted from the words; a prominent example is Twitter sentiment analysis, which has been the topic of numerous studies [27, 28]. Most of these methods assume the bag-of-words model, referred to in probability theory as the exchangeability assumption [29], which ignores the ordering of the words in documents.
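The stay point definition in Sect. 2.1 can be made concrete with a short sketch. This is not code from the paper; the 500 m and 10 min thresholds (values used later in the experiment) and the (timestamp, lat, lon) record layout are assumptions for illustration only.

import numpy as np

def haversine_m(lat1, lon1, lat2, lon2):
    # Great-circle distance in metres between two WGS84 points.
    r = 6371000.0
    p1, p2 = np.radians(lat1), np.radians(lat2)
    dp = p2 - p1
    dl = np.radians(lon2 - lon1)
    a = np.sin(dp / 2) ** 2 + np.cos(p1) * np.cos(p2) * np.sin(dl / 2) ** 2
    return 2 * r * np.arcsin(np.sqrt(a))

def stay_points(track, dist_thresh_m=500.0, time_thresh_s=600.0):
    # track: list of (timestamp_s, lat, lon) tuples sorted by time.
    # Returns (mean_lat, mean_lon, t_arrive, t_leave) per detected stay point.
    points, i, n = [], 0, len(track)
    while i < n - 1:
        j = i + 1
        while j < n and haversine_m(track[i][1], track[i][2],
                                    track[j][1], track[j][2]) < dist_thresh_m:
            j += 1
        # the subject stayed within dist_thresh_m of point i from i to j-1
        if track[j - 1][0] - track[i][0] >= time_thresh_s:
            lats = [p[1] for p in track[i:j]]
            lons = [p[2] for p in track[i:j]]
            points.append((float(np.mean(lats)), float(np.mean(lons)),
                           track[i][0], track[j - 1][0]))
        i = j
    return points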

3 Methodology

In this section, the intuition behind the relationship between sentiment analysis and trajectory-based behavior analysis is discussed. As suggested in [3], the navigation capacity and internal state of an organism are viewed as part of a cognitive process, as with word generation in language. The relationship between consecutive locations in a trajectory is formalized as Eq. (1), where u_t and u_{t+1} are two positions at times t and t + 1, w_t is the internal state, r_t denotes environmental factors, Φ is the navigation capacity, and Ω is the motion capacity.

u_{t+1} = F(\Omega, \Phi, r_t, w_t, u_t)    (1)

In this model, with small time steps, assuming exchangeability is unrealistic, because dynamical constraints create a dependence between consecutive locations. However, if trajectories are represented as sequences of K key points that carry semantic relevance, independence can be assumed between the key points of a trajectory given the other parameters: the destination carries a stronger semantic meaning than the path traversed to reach it (Fig. 2).

P(u_1, \ldots, u_K \mid \Omega, \Phi, W, R) = \prod_{k \in K} P(u_k \mid \Omega, \Phi, W, R)    (2)

(note) Λ = [Ω, Φ, W, R]

Fig. 2. (left) Movement model with sequential chained dependencies. (right) Movement model assuming conditional independence.

Similarly, in sentiment analysis the bag-of-words model is assumed, which neglects the ordering of the words in a sentence. This may not always produce good results for more combinatorial vocabularies, but in general it is simple and practical [27, 30]. Another advantage of the key point representation of trajectories is that it reduces the effect of transient environmental factors on the composition of a trajectory's key points: it is assumed that, in general, the animal compensates for such deviations caused by the environment, so they are only apparent in the generated path and in the time taken to reach the destination, while the location of the destination remains the same [11]. To reiterate, to analyze behavioral states using trajectories, the trajectories are converted into series of key points which are assumed to be exchangeable in order; given a point in the sequence, any permutation of the subsequent points is equally likely. The idea is that common or similar internal states tend to generate similar key points, which can be interpreted as a particular behavior expressed in the trajectories. In the rest of this section, the key point extraction method and the discriminative and generative models used are presented.

3.1 Key Point Extraction

Stay point detection algorithms are commonly used to extract points of interest from trajectories. Since the subject remained in the proximity of the location for a minimum time, these points can carry semantics such as foraging or sleeping in the case of animals, and shopping or dining in the case of humans. The stay points are then clustered to discover hubs of higher significance [31]. However, in the case of animals, the appropriate spatial range and time threshold for stay point detection may vary. Given that the logger data are sampled at a relatively fixed frequency, it is safe to assume that low-speed movement along a trajectory creates higher spatial densities of samples. By stacking all subjects' data, spatial locations with frequent low-speed crossings automatically create high-density regions that form clusters under density-based clustering methods. As a result, trajectories are converted to sequences of these clusters. To construct a feature matrix from these trajectories of variable length, a selection of key points based on popularity and frequency of appearance is chosen as the dictionary of the trajectory corpus. A toy example is shown in Fig. 3. Under the bag-of-words assumption, feature extraction is performed as follows: each key point in the dictionary is assigned an integer id, and the frequency of its occurrence in a trajectory, the key point frequency kf, is defined in Eq. (3), where N is the number of trajectories, M is the size of the key point dictionary, and \mathbb{I}_i(m) is an indicator function showing whether the ith key point of trajectory T_n is the mth key point.

kf^{n \in N}_{m \in M} = \sum_{i \in T_n} \mathbb{I}_i(m)    (3)

Fig. 3. A toy example showing density-based clustering of the below-speed-threshold trajectory points. Key points for the dictionary are then selected based on the diversity of the bird id histograms; in this case, the selected key points are A, B, and D. The blue-dot and orange-cross trajectories then reduce to the sequences AD and BD, respectively.

To attenuate the effect of trivial tokens with high frequency, the kf features are weighted by idf. In the context of trajectories, the smoothed idf [32] is defined as Eq. (4), where tf_m is the count of trajectories containing the mth key point.

idf(m) = \log \frac{1 + N}{1 + tf_m} + 1    (4)

Finally, the resulting feature vector for each trajectory is normalized by the Euclidean norm, which produces an M-dimensional feature vector for each of the N trajectories.
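As an illustration of Eqs. (3)-(4), the following is a minimal sketch of the kf-idf feature construction; it mirrors the smoothed idf and L2 normalization described above (equivalent to scikit-learn's TfidfTransformer defaults), and the toy key point ids follow the Fig. 3 example. The function name and data layout are hypothetical, not the authors' code.

import numpy as np

def keypoint_tfidf(trajectories, dictionary):
    # trajectories: list of key point id sequences; dictionary: list of M key point ids.
    index = {k: j for j, k in enumerate(dictionary)}
    N, M = len(trajectories), len(dictionary)
    kf = np.zeros((N, M))
    for n, traj in enumerate(trajectories):
        for k in traj:
            if k in index:
                kf[n, index[k]] += 1           # Eq. (3): raw key point frequency
    tf_m = np.count_nonzero(kf, axis=0)         # trajectories containing key point m
    idf = np.log((1.0 + N) / (1.0 + tf_m)) + 1.0    # Eq. (4): smoothed idf
    X = kf * idf
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    return X / norms                            # L2-normalised kf-idf features

# toy usage mirroring Fig. 3: two trajectories over key points A, B, D
X = keypoint_tfidf([["A", "D"], ["B", "D"]], dictionary=["A", "B", "D"])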


3.2 Discriminative Modeling

The choice of classifiers is abundant. The naïve Bayes model intuitively explains the probabilistic relationship between hidden states and key points, and despite its simplicity and strong prior assumptions it tends to work well [33]. Using Bayes' theorem, the hidden category probability can be modeled as Eq. (5), in which Θ and Π are the parameters of the model, H is the number of hidden state categories, T is the trajectory set, and K_t is the set of key points of trajectory t.

p(h \mid t, \Theta, \Pi) = \frac{p(h \mid \Pi)\, p(t \mid h, \Theta)}{\sum_{H} p(h, t, \Theta, \Pi)}    (5)

The prior is modeled as a categorical distribution with parameter Π, as in Eq. (6). Π has H dimensions and its components sum to unity, so each π_h can be interpreted as the prior probability of h. The likelihood of a trajectory given h can be modeled as a categorical or multinomial distribution, as shown in Eq. (7), where the key points are conditionally independent given the hidden state h. Equation (7) shows that each hidden state has its own distribution over the key points.

p(h \mid \Pi) = \prod_{h' \in H} \pi_{h'}^{\mathbb{I}_{h'}(h)} = \pi_h, \quad \sum_{h \in H} \pi_h = 1    (6)

p(t \mid h, \Theta) = \prod_{k \in K_t} p(k \mid h, \Theta) \sim \mathrm{Multinomial}(\Theta, K)    (7)

The generic form of the naïve Bayes classifier is described in Eq. (8), where π_h is the hidden state category prior and θ_{k,h} is the probability of key point k belonging to hidden state h.

\hat{h} = \arg\max_{h \in H} \pi_h \prod_{k \in K} \theta_{k,h}    (8)

The parameters Θ and Π can be determined by the Maximum Likelihood Estimation (MLE) method, which results in Eqs. (9) and (10), where n_h and n_{h,k} denote the relevant state and state-key point counts.

\pi_h = \frac{n_h}{\sum_{h' \in H} n_{h'}}    (9)

\theta_{k,h} = \frac{n_{h,k}}{n_h}    (10)
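A minimal sketch of the naïve Bayes model of Eqs. (8)-(10) follows, including the additive smoothing described in the next paragraph. The smoothing constant, function names, and 0/1 label encoding are assumptions for illustration, not the authors' implementation.

import numpy as np

def fit_nb(X, y, alpha=1.0):
    # Multinomial naive Bayes via Eqs. (9)-(10) with additive smoothing.
    # X: (N, M) key point count matrix; y: (N,) hidden state labels (e.g. 0/1 gender).
    classes = np.unique(y)
    pi = np.array([(y == c).sum() for c in classes], dtype=float)
    pi /= pi.sum()                                     # Eq. (9)
    theta = np.empty((len(classes), X.shape[1]))
    for i, c in enumerate(classes):
        counts = X[y == c].sum(axis=0) + alpha         # n_{h,k} plus smoothing prior
        theta[i] = counts / counts.sum()               # Eq. (10), denominator includes alpha*M
    return classes, pi, theta

def predict_nb(X, classes, pi, theta):
    # Eq. (8) in log space: argmax_h log pi_h + sum_k x_k log theta_{k,h}
    log_post = np.log(pi) + X @ np.log(theta).T
    return classes[np.argmax(log_post, axis=1)]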

In practice, a smoothed version of these equations is used, in which a smoothing prior constant α is added to the numerator and its size multiple αK to the denominator. Another approach to parameter estimation is the Maximum a Posteriori (MAP) method, which assigns Dirichlet priors to the parameters; since the Dirichlet is the conjugate prior of the multinomial distribution, the posterior mean estimate has a form similar to the smoothed MLE estimate.

Beyond the naïve Bayes classifier, more complex models such as Support Vector Machines (SVM), ensembles, and boosted trees can be used to predict the hidden states from the extracted feature vectors. It is also possible to further improve the performance of naïve Bayes, SVM, and tree classifiers by calibrating the membership probabilities [34]. Since these classifiers are not optimized on prediction probabilities, they often produce biased class probabilities, and this bias depends on the method. Zadrozny et al. [34, 35] proposed remedies that calibrate the class probabilities of naïve Bayes classifiers using histogram methods, and of tree classifiers using smoothing, curtailment, the Kearns-Mansour splitting criterion, and Isotonic Regression. In this experiment, Isotonic Regression, a non-parametric method, and Platt Scaling [36], a parametric method, are employed, as studied by Niculescu-Mizil et al. [37], to calibrate the class probabilities of all classifiers used. Since Isotonic Regression is very prone to overfitting, k-fold cross-validation is used in the training procedure.

To evaluate the performance of the classifiers, the following measures were employed. The Matthews correlation coefficient (MCC) [38] is defined as Eq. (11) for binary classification. This score is regarded as a balanced measure [39] and weighs performance in terms of true and false positives and negatives (TP, FP, TN, FN).

MCC = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}    (11)

The other scores used for performance evaluation are precision, recall, and the F1 measure, defined in Eqs. (12)-(14).

precision = \frac{TP}{TP + FP}    (12)

recall = \frac{TP}{TP + FN}    (13)

F1 = \frac{2 \times precision \times recall}{precision + recall}    (14)
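A hedged sketch of the probability calibration and evaluation described in this section, using scikit-learn's CalibratedClassifierCV and metric functions. The base estimator, split ratio, and fold count shown here are placeholders rather than the paper's exact configuration, and binary 0/1 labels are assumed.

from sklearn.calibration import CalibratedClassifierCV
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import (matthews_corrcoef, precision_score,
                             recall_score, f1_score)
from sklearn.model_selection import train_test_split

def evaluate_calibrated(X, y, method="isotonic"):
    # Hold out 20% for testing; calibrate on the training part with k-fold CV
    # to limit the overfitting that isotonic regression is prone to.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    clf = CalibratedClassifierCV(GradientBoostingClassifier(), method=method, cv=10)
    clf.fit(X_tr, y_tr)
    y_hat = clf.predict(X_te)
    return {
        "MCC": matthews_corrcoef(y_te, y_hat),        # Eq. (11)
        "Precision": precision_score(y_te, y_hat),    # Eq. (12)
        "Recall": recall_score(y_te, y_hat),          # Eq. (13)
        "F1": f1_score(y_te, y_hat),                  # Eq. (14)
    }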

3.3 Generative Modeling

To infer the latent states of trajectories, LDA [26], a Bayesian probabilistic model, was used. As in the text document application, H groups of hidden states are assumed, analogous to the number of topics in documents. As mentioned earlier, each hidden state has a multinomial distribution over all key points, just as topics in text documents define distributions over the words of the vocabulary. The prior for these multinomial distributions is Π, and each π_h is drawn from a Dirichlet with hyperparameter α. To generate the trajectories, for a given trajectory t, θ_t is drawn from a Dirichlet distribution with hyperparameter β. Next, for each key point k_t of trajectory t, a latent state h_{t,k} is drawn from Multinomial(θ_t), and given h_{t,k}, k_t is drawn from Multinomial(π_{h_{t,k}}). The following procedure summarizes the described generative process.


1. For each latent state h ∈ H, draw π_h ∼ Dirichlet(α).
2. For each trajectory t ∈ T, draw θ_t ∼ Dirichlet(β).
3. For each key point k in t:
   a. Draw h_{t,k} ∼ Multinomial(θ_t).
   b. Draw k_t ∼ Multinomial(π_{h_{t,k}}).

Referred to as "multinomial Principal Component Analysis (PCA)" in [40], this model provides an unsupervised approach to analyzing trajectories, and it can discover the latent structure of a trajectory collection, which is beneficial for both prediction and data exploration. To estimate the posterior shown in Eq. (15), variational inference methods or sampling methods like Markov Chain Monte Carlo are used.

p(h, \Theta, \Pi \mid k, \alpha, \beta) = \frac{p(h, \Theta, \Pi \mid \alpha, \beta)}{p(k \mid \alpha, \beta)}    (15)

In this experiment, the online variational inference method developed by Hoffman et al. [40] is used. The variational objective is derived to rely only on word frequencies per document, which matches the intuition of summarizing documents by word counts. This is clearly applicable to trajectories, where the frequency counts of the key points summarize the hidden state of the subject.
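scikit-learn's LatentDirichletAllocation implements an online variational Bayes method of this kind, so a sketch of fitting it to the key point count matrix could look as follows; the number of latent states and the default priors are illustrative assumptions, not values reported by the paper.

from sklearn.decomposition import LatentDirichletAllocation

def fit_latent_states(kf, n_states=2):
    # kf: (N trajectories x M key points) raw count matrix from Sect. 3.1 (not tf-idf).
    lda = LatentDirichletAllocation(
        n_components=n_states,        # number of latent behavioural states H
        learning_method="online",     # online variational Bayes (Hoffman et al.)
        random_state=0)
    theta = lda.fit_transform(kf)     # per-trajectory state mixture (theta_t)
    pi = lda.components_ / lda.components_.sum(axis=1, keepdims=True)
    return theta, pi                  # pi: per-state distributions over key points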

4 Experiment, Results and Discussions

This study's experiment was performed on a seabird species found off the coast of Japan, the Streaked Shearwater Calonectris leucomelas. The dataset was provided by Yoda-lab [41]. The trajectories were recorded using logging devices attached to 271 birds from 112 unique nests belonging to two separate colonies located on the east and west coasts of Japan; the west coast colony accounts for only a quarter of the bird ids under study. Due to the light weight of the logging device, it is safe to assume that attaching the device had a negligible effect on the behavior of the birds. Since the data were collected together with the gender and nest information of the birds, supervised modeling of gender segregation in trajectories is feasible, and as a result, state differences in the generation of trajectories can be inferred. The gender of the birds was identified by their gender-specific vocal features [42], which provides a ground truth for training the models used in this study. The gender distribution in each colony is balanced. Two types of key points were extracted from the trajectories: one based on the stay point definition and the other based on the procedure introduced in the previous section. In the second method, the speed threshold was 5 km/h. For stay point detection, a range threshold of 500 m and a time threshold of 10 min were used. DBSCAN was then used to extract densities of points within a 1 km radius neighborhood with 30 neighbors, and densities with a minimum of 10 unique bird ids were selected as the key point dictionary for the trajectory corpus. This threshold is essential to the performance of the classifiers, as it adjusts the size and utility of the vocabulary; for instance, decreasing the stay point threshold to 5 unique birds increases the vocabulary to 155.


However, the performance of the classifiers increases, although it remains below that of the other key point extraction method. The histogram of unique bird ids for the dictionary key points extracted using the speed threshold method and their geospatial locations are shown in Fig. 4, and a sample trajectory with key points is demonstrated in Fig. 5.

Fig. 4. Histogram of unique bird ids at key points extracted using the speed threshold method.
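A sketch of how the described preprocessing could be implemented is given below. The 5 km/h speed threshold, 1 km neighborhood, 30 neighbors, and 10-unique-bird cutoff come from the text; the input record layout, the haversine-metric DBSCAN call, and the helper names are assumptions.

import numpy as np
from sklearn.cluster import DBSCAN

EARTH_R_M = 6371000.0

def keypoint_dictionary(fixes, speed_kmh=5.0, eps_m=1000.0,
                        min_samples=30, min_birds=10):
    # fixes: array-like of (bird_id, lat, lon, speed_kmh) rows pooled over all birds.
    # Returns cluster labels for the slow fixes and the cluster ids kept as dictionary.
    fixes = np.asarray(fixes, dtype=object)
    slow = fixes[fixes[:, 3].astype(float) < speed_kmh]      # below speed threshold
    coords = np.radians(slow[:, 1:3].astype(float))
    db = DBSCAN(eps=eps_m / EARTH_R_M, min_samples=min_samples,
                metric="haversine").fit(coords)              # 1 km neighbourhood, 30 neighbours
    labels = db.labels_
    dictionary = []
    for c in set(labels) - {-1}:
        birds = set(slow[labels == c, 0])
        if len(birds) >= min_birds:                          # keep clusters visited by >= 10 birds
            dictionary.append(c)
    return labels, dictionary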

With the key point dictionary in hand, the trajectories were encoded into sequences of key points, and the tf-idf matrix of the trajectories was created. To obtain a rough estimate of classification performance, vanilla classifiers were trained on 80% of the dataset and tested on the rest. These results are shown in Table 1.

Table 1. Test accuracy of vanilla classifiers

Key point type            Classifier                    Accuracy (%)
Speed threshold/DBSCAN    Bernoulli naïve Bayes         62.75
                          SGD/Hinge Loss/Elastic net    68.63
                          Gradient Boosting             70.59
                          Ada Boost                     68.63
Stay point/DBSCAN         Bernoulli naïve Bayes         66.67
                          SGD/Hinge Loss/Elastic net    57.58
                          Gradient Boosting             66.67
                          Ada Boost                     72.73

The accuracy of the test predictions ranged between 60% and 70%, with boosted trees claiming the top performance. These results show that trajectory key points carry information about the gender of the birds. To improve the decision making based on this information, tuning of the class probabilities was then attempted.
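For reference, a minimal sketch of the vanilla-classifier comparison behind Table 1, assuming the scikit-learn implementations of the four named classifiers with default hyperparameters and an 80/20 split; it is not the authors' code, and the exact settings behind Table 1 are not specified in the text.

from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB

def vanilla_accuracies(X, y):
    # X: tf-idf key point features, y: gender labels.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    models = {
        "Bernoulli naive Bayes": BernoulliNB(),
        "SGD/Hinge/Elastic net": SGDClassifier(loss="hinge", penalty="elasticnet"),
        "Gradient Boosting": GradientBoostingClassifier(),
        "Ada Boost": AdaBoostClassifier(),
    }
    return {name: m.fit(X_tr, y_tr).score(X_te, y_te) for name, m in models.items()}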


Fig. 5. (top) Extracted key points using the speed threshold method with a minimum of 10 unique bird ids. (bottom) Extracted key points using the stay point method with a minimum of 10 unique bird ids. (note) Larger cross signs indicate a higher number of unique bird ids contained in the key point cluster.

As explained in the previous section, Platt's sigmoid [36] and Isotonic Regression [35] were used to tune the class probabilities. Since Isotonic Regression is prone to overfitting, the tuning used 10-fold cross-validation on 80% of the dataset, and the tuned classifiers were then tested on the rest of the data. The evaluation of the results using the MCC, precision, recall, and F1 measures is shown in Fig. 6 and tabulated in Table 2.


Fig. 6. (left) Sample of female trajectories. (right) Sample of male trajectories. (note) Identified speed threshold DBSCAN clusters are marked in blue and identified vocabulary key points in red. (Color figure online)

Table 2. Test results of tuned classifiers for speed threshold key points

Classifier               MCC     Precision  Recall  F1      Accuracy
Logistic                 0.3297  0.6786     0.7037  0.6909  0.6667
Naive Bayes              0.2542  0.6538     0.6296  0.6415  0.6275
Naive Bayes + Isotonic   0.3297  0.6786     0.7037  0.6909  0.6667
Naive Bayes + Sigmoid    0.2542  0.6538     0.6296  0.6415  0.6275
SVM                      0.2485  0.6250     0.7407  0.6780  0.6275
SVM + Isotonic           0.3723  0.6667     0.8148  0.7333  0.6863
SVM + Sigmoid            0.2887  0.6452     0.7407  0.6897  0.6471
SGD                      0.2887  0.6452     0.7407  0.6897  0.6471
SGD + Isotonic           0.2901  0.6364     0.7778  0.7000  0.6471
SGD + Sigmoid            0.2173  0.6400     0.5926  0.6154  0.6078
GBC                      0.4923  0.7188     0.8519  0.7797  0.7451
GBC + Isotonic           0.5000  0.7059     0.8889  0.7869  0.7451
GBC + Sigmoid            0.4876  0.7500     0.7778  0.7636  0.7451
ABC                      0.1673  0.5938     0.7037  0.6441  0.5882
ABC + Isotonic           0.3287  0.6667     0.7407  0.7018  0.6667
ABC + Sigmoid            0.0599  0.5833     0.2593  0.3590  0.5098

The results improve for certain classifiers. It must be mentioned that one of the significant limiting factors of the decision models is the lack of ubiquitous key points. The same procedure was performed on the key points extracted using the stay point detection method; similar to the vanilla classifiers' results for the stay point method, these obtained lower performance, as listed in Table 3.

Table 3. Test results of tuned classifiers for stay point/DBSCAN key points

Classifier               MCC     Precision  Recall  F1      Accuracy
Logistic                 0.3699  0.6857     0.8276  0.7500  0.6923
Naive Bayes              0.2088  0.6364     0.7241  0.6774  0.6154
Naive Bayes + Isotonic   0.2510  0.6563     0.7241  0.6885  0.6346
Naive Bayes + Sigmoid    0.2280  0.6667     0.6207  0.6429  0.6154
SVM                      0.2892  0.6667     0.7586  0.7097  0.6538
SVM + Isotonic           0.2452  0.6389     0.7931  0.7077  0.6346
SVM + Sigmoid            0.2280  0.6667     0.6207  0.6429  0.6154
SGD                      0.1505  0.6296     0.5862  0.6071  0.5769
SGD + Isotonic           0.1255  0.6250     0.5172  0.5660  0.5577
SGD + Sigmoid            0.1356  0.6364     0.4828  0.5490  0.5577
GBC                      0.4130  0.6944     0.8621  0.7692  0.7115
GBC + Isotonic           0.2892  0.6667     0.7586  0.7097  0.6538
GBC + Sigmoid            0.2370  0.6800     0.5862  0.6296  0.6154
ABC                      0.0387  0.5769     0.5172  0.5455  0.5192
ABC + Isotonic           0.2048  0.6286     0.7586  0.6875  0.6154
ABC + Sigmoid            0.1659  0.6176     0.7241  0.6667  0.5962

Compared to typical corpus analysis, the vocabulary and the corpus are small, which has negative effects on the generality of the models. This could be addressed by increasing the size of the corpus. The current dataset is equivalent to a corpus with a vocabulary of 65 words and 253 documents in the case of the speed threshold method, and a 155-word vocabulary with 257 documents for the stay point method. It must also be taken into account that the distribution of these key points depends on other factors. For instance, age can be a determining factor; unfortunately, the current dataset lacks information about the age of the birds. Given the importance of range and speed in classifying the birds' gender, aging birds may lose their ability to fly fast and far.

Figure 7 shows the importance of the features for gender classification. It is apparent that range and speed can have greater weights than key points, which suggests that age should be an effective factor in key point generation. Achieving very high classification results with a small dataset and a limited dictionary would hint at excessive overfitting; this was not the case in this experiment, as the performance of the classifiers using only key points was capped by an upper bound. Since the main objective of this work is to explore the information capacity of the extracted key points regarding the latent state of the subject, further analysis of the features and classification results was performed. To examine the effect of including key points in the classifiers' input, feature selection based on mutual information was used to evaluate the performance of logistic regression against the number of selected features. Figure 8 shows the results: including more features over the first 30 percentiles improves the classifier performance.
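The percentile sweep described above could be realized as in the following sketch, which scores features by mutual information and evaluates logistic regression for each selected percentile; the synthetic data stands in for the speed, range and key-point features and is an assumption.

```python
# Sketch: score features with mutual information and sweep the selected
# percentile, evaluating logistic regression at each step (assumed setup).
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectPercentile, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for speed, range and key-point tf-idf columns.
X, y = make_classification(n_samples=253, n_features=67, n_informative=10, random_state=0)

for pct in range(10, 101, 10):
    model = make_pipeline(
        SelectPercentile(mutual_info_classif, percentile=pct),
        LogisticRegression(max_iter=1000),
    )
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"{pct:3d}% of features -> prediction rate {score:.3f}")
```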


Fig. 7. Performance results (MCC, precision, recall, F1 and accuracy) of tuned classifiers for key points extracted using the speed threshold.

Fig. 8. Feature importance of the features for gender classification. It is apparent that speed and range are strong decision factors.

Fig. 9. Plot of selected feature percentile versus prediction rate.

This result agrees with the feature importance plot. As mentioned earlier, it is suspected that factors other than gender contribute to the latent state of the subject bird. This was examined by taking advantage of a variational Bayesian encoder model: a stochastic LDA model was trained using the online method proposed by [41] to identify the major components of the dataset. In text analysis, this is used like PCA to identify the words contributing to the principal components, or major topics. To create a generative model comparable to the discriminative model, the number of topics for LDA was chosen to be 2, and the top key points contributing to each component were extracted. Figure 9 shows the top key points of the extracted components, the trajectories of birds of different gender and habitat, and the vocabulary key points. It is evident that one component shares more key points with the male bird trajectories, even though birds of different gender from different habitats have common regions of traverse. Although not by a strong margin, the key points are gender-segregated within the species' trajectories. This should be counted as a reasonable result, since many other internal and external factors also affect the path generation of the subject.
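A 2-component online LDA of the kind described here can be sketched with scikit-learn as follows; the toy key-point documents and the component inspection are assumptions that only mirror the "top key points per topic" analysis.

```python
# Sketch (assumed data): fit a 2-topic LDA with online variational updates on
# the key-point count matrix and inspect the top key points per component.
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = ["kp1 kp3 kp3 kp4", "kp2 kp5 kp5", "kp1 kp4 kp3", "kp5 kp2 kp2"]  # toy documents
vectorizer = CountVectorizer(token_pattern=r"\S+")
counts = vectorizer.fit_transform(docs)
vocab = np.array(vectorizer.get_feature_names_out())

lda = LatentDirichletAllocation(n_components=2, learning_method="online", random_state=0)
lda.fit(counts)

for k, component in enumerate(lda.components_):
    top = vocab[np.argsort(component)[::-1][:10]]
    print(f"component {k}: top key points = {', '.join(top)}")
```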

5 Conclusions

This study provided an alternative look at behavioral analysis using trajectories. The temporal causality in the study of the trajectories was loosened, and trajectories were represented by a group of key points that are independent given the subject's internal state. This conditional independence allows trajectories to be treated in a more relaxed way. Analogous to sentiment analysis and text processing, a bag-of-words model could be applied, which, given a sufficient amount of training data and an ordinary language model, evidently produces acceptable results. One very important element of this procedure is key point identification and extraction, which amounts to identifying the major semantics. There is still a lot of room for further studies here, which could lead to the introduction of new kernel methods for fast and reliable detection of the key points.

One notable takeaway from the experiment is that, after the key point extraction process, the key points lose their spatial relations, and their similarities must be measured in a different space: two key points that are spatially close to each other may not carry similar semantics. Therefore, new points added to the dataset must be clustered again, or a projection function must be designed to transform the new points into the semantic space of the key points. For instance, even though the feeding region and the nests are spatially close, they belong to two remote points in the semantic space. This encourages the use of kernel-based methods for transforming the spatial points of trajectories into the semantic space. Furthermore, the concept of n-grams is also applicable here, to model more complex relations between consecutive points in trajectories.

There are also disadvantages to the approach used in this study. One, as with most data-driven modeling methods, is its dependency on the dataset and lack of generality. As seen in the experiment, the size of the corpus and the class balance had considerable effects on the performance. However, this is likely to be handled to acceptable margins by increasing the size of the corpus; careful selection of training data also helps. As mentioned earlier, key point extraction methods play an essential role in the fitness of the models, while these methods in turn rely on dataset properties such as sampling rate and sparseness (Fig. 10). For example, directly applying a clustering method to the points, as done in this experiment, is highly dependent on the sampling rate of the trajectories and may not be applicable in other cases with much lower sampling rates. One last downside of the introduced approach is the lack of generality across species with different attributes; this also holds true for most language and text processing methods, where learning a model of a certain language does not necessarily translate into information about other languages.

In the end, an efficient method for approaching data inputs of variable length was introduced, which is also open to semantic methods such as non-negative matrix factorization and tensor factorization, and which can adopt the analytical facilities currently available in the language modeling, sentiment analysis and text document processing literature.

Fig. 10. (top) Female bird trajectories belonging to a-colony. (bottom) Male bird trajectories belonging to t-colony. (note) Vocabulary key points are marked in blue, the first component's top 10 key points in green, and the second component's in red. (Color figure online)

Acknowledgements. This work is supported by JSPS KAKENHI Grant number 16H06536.

References 1. Ye, Y., Zheng, Y., Chen, Y., Feng, J., Xie, X.: Mining individual life pattern based on location history. In: Tenth International Conference on Mobile Data Management: Systems, Services and Middleware, MDM 2009, 18 May 2009, pp. 1–10. IEEE (2009) 2. Miller, J.A.: Using spatially explicit simulated data to analyze animal interactions: a case study with brown hyenas in Northern Botswana. Trans. GIS 16(3), 271–291 (2012) 3. Nathan, R., Getz, W.M., Revilla, E., Holyoak, M., Kadmon, R., Saltz, D., Smouse, P.E.: A movement ecology paradigm for unifying organismal movement research. Proc. Nat. Acad. Sci. 105(49), 19052–19059 (2008) 4. Cagnacci, F., Boitani, L., Powell, R.A., Boyce, M.S.: Animal ecology meets GPS-based radiotelemetry: a perfect storm of opportunities and challenges. Philos. Trans. R. Soc. Lond. B Biol. Sci. 365, 2157–2162 (2010) 5. Demšar, U., Buchin, K., Cagnacci, F., Safi, K., Speckmann, B., Van de Weghe, N., Weiskopf, D., Weibel, R.: Analysis and visualisation of movement: an interdisciplinary review. Mov. Ecol. 3(1), 5 (2015) 6. Adrienko, N., Adrienko, G.: Spatial generalization and aggregation of massive movement data. IEEE Trans. Visual Comput. Graphics 17(2), 205–219 (2011) 7. Ryan, T.J., Conner, C.A., Douthitt, B.A., Sterrett, S.C., Salsbury, C.M.: Movement and habitat use of two aquatic turtles (Graptemys geographica and Trachemys scripta) in an urban landscape. Urban Ecosyst. 11(2), 213–225 (2008) 8. Jaeger, C.P., Cobb, V.A.: Comparative spatial ecologies of female painted turtles (Chrysemys picta) and red-eared sliders (Trachemys scripta) at Reelfoot Lake, Tennessee. Chelonian Conserv. Biol. 11(1), 59–67 (2012) 9. Matsumoto, K., Oka, N., Ochi, D., Muto, F., Satoh, T.P., Watanuki, Y.: Foraging behavior and diet of Streaked Shearwaters Calonectris leucomelas rearing chicks on Mikura Island. Ornithological Sci. 11(1), 9–19 (2012)


10. Tiunov, I., Katin, I., Lee, H., Lee, S., Im, E.: Foraging areas of streaked shearwater Calonectris leucomelas nesting on the Karamzin Island (Peter the Great Bay, East Sea). J. Asia-Pacific Biodivers. 11(1), 25–31 (2017) 11. Goto, Y., Yoda, K., Sato, K.: Asymmetry hidden in birds’ tracks reveals wind, heading, and orientation ability over the ocean. Sci. Adv. 3(9), e1700097 (2017) 12. Wikelski, M., Kays, R.W., Kasdin, N.J., Thorup, K., Smith, J.A., Swenson, G.W.: Going wild: what a global small-animal tracking system could do for experimental biologists. J. Exp. Biol. 210(2), 181–186 (2007) 13. Getz, W.M., Saltz, D.: A framework for generating and analyzing movement paths on ecological landscapes. Proc. Nat. Acad. Sci. 105(49), 19066–19071 (2008) 14. Fryxell, J.M., Hazell, M., Börger, L., Dalziel, B.D., Haydon, D.T., Morales, J.M., McIntosh, T., Rosatte, R.C.: Multiple movement modes by large herbivores at multiple spatiotemporal scales. Proc. Nat. Acad. Sci. 105(49), 19114–19119 (2008) 15. Gao, P., Kupfer, J.A., Zhu, X., Guo, D.: Quantifying animal trajectories using spatial aggregation and sequence analysis: a case study of differentiating trajectories of multiple species. Geogr. Anal. 48(3), 275–291 (2016) 16. Studer, M., Ritschard, G., Gabadinho, A., Müller, N.S.: Discrepancy analysis of state sequences. Sociol. Meth. Res. 40(3), 471–510 (2011) 17. Gabadinho, A., Ritschard, G., Mueller, N.S., Studer, M.: Analyzing and visualizing state sequences in R with TraMineR. J. Stat. Softw. 40(4), 1–37 (2011) 18. Brillinger, D.R., Preisler, H.K., Ager, A.A., Kie, J.G.: An exploratory data analysis (EDA) of the paths of moving animals. J. Stat. Planning Infer. 122(1), 43–63 (2004) 19. Bowyer, R.T.: Sexual segregation in Southern Mule deer. J. Mammal. 65(3), 410–417 (1984) 20. Stewart, K.M., Fulbright, T.E., Drawe, D.L., Bowyer, R.T.: Sexual segregation in white-tailed deer: responses to habitat manipulations. Wildlife Soc. Bull. 1, 1210–1217 (2003) 21. Beier, P.: Sex differences in quality of white-tailed deer diets. J. Mammal. 68(2), 323–329 (1987) 22. Yoda, K., Shiomi, K., Sato, K.: Foraging spots of streaked shearwaters in relation to ocean surface currents as identified using their drift movements. Prog. Oceanogr. 31(122), 54–64 (2014) 23. Ester, M., Kriegel, H.P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: KDD 1996, vol. 96(34), pp. 226–231, 2 August 1996 24. Wu, H.C., Luk, R.W., Wong, K.F., Kwok, K.L.: Interpreting tf-idf term weights as making relevance decisions. ACM Trans. Inf. Syst. (TOIS) 26(3), 13 (2008) 25. Blei, D., Carin, L., Dunson, D.: Probabilistic topic models. IEEE Sig. Process. Mag. 27(6), 55–65 (2010) 26. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993– 1022 (2003) 27. Da Silva, N.F., Hruschka, E.R., Hruschka, E.R.: Tweet sentiment analysis with classifier ensembles. Decis. Support Syst. 31(66), 170–179 (2014) 28. Pak, A., Paroubek, P.: Twitter as a corpus for sentiment analysis and opinion mining. In: LREC 2010, vol. 10, May 19 2010 29. Aldous, D.J.: Exchangeability and related topics. In: Hennequin, P.L. (ed.) École d’Été de Probabilités de Saint-Flour XIII — 1983. LNM, vol. 1117, pp. 1–198. Springer, Heidelberg (1985). https://doi.org/10.1007/BFb0099421 30. Go, A., Huang, L., Bhayani, R.: Twitter sentiment analysis. Entropy 6(17), 252 (2009) 31. 
Zheng, Y., Zhang, L., Xie, X., Ma, W.Y.: Mining interesting locations and travel sequences from GPS trajectories. In: Proceedings of the 18th International Conference on World Wide Web 2009, pp. 791–800. ACM, 20 April 2009


32. Scikit-learn manual. Section 4.2: Feature extraction. http://scikit-learn.org/stable/modules/ feature_extraction.html#text-feature-extraction. Accessed 24 Jan 2018 33. Zhang, H.: The optimality of naive Bayes. AA 1(2), 3 (2004) 34. Zadrozny, B., Elkan, C.: Obtaining calibrated probability estimates from decision trees and naive Bayesian classifiers. In: ICML 2001, vol. 1, pp. 609–616, 28 June 2001 35. Zadrozny, B., Elkan, C.: Transforming classifier scores into accurate multiclass probability estimates. In: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 694–699. ACM, 23 July 2002 36. Platt, J.: Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Adv. Large Margin Classifiers 10(3), 61–74 (1999) 37. Niculescu-Mizil, A., Caruana, R.: Predicting good probabilities with supervised learning. In: Proceedings of the 22nd International Conference on Machine Learning, pp. 625–632. ACM, 7 August 2005 38. Matthews, B.W.: Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochimica et Biophysica Acta (BBA)-Protein Struct. 405(2), 442–451 (1975) 39. Boughorbel, S., Jarray, F., El-Anbari, M.: Optimal classifier for imbalanced data using Matthews Correlation Coefficient metric. PLoS One 12(6), e0177678 (2017) 40. Hoffman, M., Bach, F.R., Blei, D.M.: Online learning for latent Dirichlet allocation. In: Advances in Neural Information Processing Systems, pp. 856–864 (2010) 41. YODA Lab Homepage. Ethology and Ecology, Nagoya University. http://yodaken.sakura.ne.jp/yoda_lab/Home.html. Accessed 24 Jan 2018 42. Arima, H., Sugawa, H.: Correlation between the pitch of calls and external measurements of Streaked Shearwaters Calonectris leucomelas breeding on Kanmuri Island. Jpn. J. Ornithol. 53(1), 40–44 (2004)

Visualization of Real World Activity on Group Work

Daisuke Deguchi1(B), Kazuaki Kondo2, and Atsushi Shimada3

1 Information Strategy Office, Information and Communications, Nagoya University, Nagoya, Japan. [email protected]
2 Academic Center for Computing and Media Studies, Kyoto University, Kyoto, Japan. [email protected]
3 Faculty of Information Science and Electrical Engineering, Kyushu University, Fukuoka, Japan. [email protected]

Abstract. Group work is widely introduced and practiced as a method to achieve learning goals efficiently through collaboration among group members. However, since most types of group work are carried out in the real environment, it is very difficult to perform formative assessment and real-time evaluation without students' feedback. Therefore, there is a strong demand to develop a method that supports the evaluation of group work. To this end, this paper proposes a method to visualize the real-world activity during group work by using first person view cameras and wearable sensors. The proposed method visualizes three scores: (1) individual attention, (2) hand visibility, and (3) individual activity. To evaluate the performance and analyze the relationships between the scores, we conducted experiments on the "Marshmallow challenge", a collaborative task of constructing a tower using marshmallow and spaghetti within a time limit. Through the experiments, we confirmed that the proposed method has the potential to become an evaluation tool for visualizing the activity of group work.

Keywords: Visualization · Real world activity · Group work

1 Introduction

Group work is widely practiced as a method to improve the quality of education and to encourage students to explore and solve problems together with members who have different abilities and thoughts. In addition, it is well known that it can enhance various skills such as oral communication and leadership. Therefore, various institutions in higher education try to introduce group work to improve the quality of their classes. In terms of learning analytics, Computer Supported Collaborative Learning (CSCL) [1,2] has attracted attention, and


a computer-oriented virtual environment is commonly used as a tool for quantitative evaluation. Unfortunately, since natural communication and collaboration between students are difficult to observe in CSCL, there is a strong demand to extend CSCL to real-world group work. Because a major aspect of group work is to enhance skills through real-world activities, it is important to evaluate it online without relying on students' feedback such as questionnaires. From this point of view, technologies to measure and evaluate real-world activity in group work need to be developed.

To address this problem, this paper proposes a method to measure and visualize scores that can support the evaluation of real-world activities in group work. The paper focuses on evaluating group work in terms of attention, collaboration, and activity.

For visualizing each member's attention during group work, the number of persons existing in the visual field is one way to measure the level of individual attention. Since we usually tend to talk and collaborate with members by keeping them within our own visual field, the number of persons in the visual field will reflect the degree of attention being paid. In addition, if the discussion between members becomes active, their distance will decrease and the size of the persons in the visual field will become larger. Therefore, measuring the size of the person inside the first person view (FPV) image, which corresponds to the visual field, is another option.

For visualizing collaboration during group work, one of the relevant features is "using hand" information. This is a strong feature for characterizing a working situation, especially for assembling work. Hands are used not only for assembling an object but also for its preparation; in group work, hands are also used for demonstrating assembling methods and drawing figures to explain ideas. Since these hand behaviors are in most cases performed while looking at the hand region, the visibility of the self-hand in FPV images can tell us about the "using hand" situation. In addition, the visibility of other persons' hands is a meaningful feature: when only another person's hands appear in someone's FPV image, it can be considered that he or she did not contribute to the work physically and was only watching the working behavior from outside. From these points of view, the simultaneous presence of the self-hand and other persons' hands represents a cooperative working situation, such as assembling and supporting with hands, or handing over a material or tool. Therefore, the transition of these hand visibility patterns over time conveys an abstract process of the target group work, and comparing the features among group members is a good way to analyze their contributions and roles in the group.

On the other hand, wearable devices such as smartwatches have become popular worldwide [5]. A smartwatch generally has a three-axis accelerometer and a three-axis gyroscope, which enable accurate measurement of motion. In group work, sensing hand motion is important for understanding individual activities over time [6]. For example, larger hand movements will be observed during explanation with gestures, working on the table, or writing on the whiteboard, whereas small hand movements will be observed when a participant is listening to others, watching the activities of other people, and so on.


Therefore, through the analysis of hand motion, it is possible to investigate the action characteristics of each person.

Based on the above, the purpose of this paper is to propose a method for visualizing three scores: (1) individual attention, (2) hand visibility, and (3) individual activity, using an FPV camera and wearable activity sensors. These three scores were evaluated through experiments on the "Marshmallow challenge", a collaborative task of constructing a tower using marshmallow and spaghetti within a time limit. Section 2 describes the details of the proposed method. Section 3 gives the detailed setup of the group work experiments and the specifications of the FPV camera and the wearable activity sensors, and presents visualization results and detailed discussions. Finally, the paper is concluded in Sect. 4.

2 Methods

As described in the previous section, this paper focuses on a visualization method for supporting the evaluation of real-world activities in group work. From this point of view, the proposed method visualizes three scores: (1) individual attention, (2) hand visibility, and (3) individual activity. The following sections describe the detailed processes for extracting these three scores from FPV images and wearable sensor data.

2.1 Individual Attention Score

Object Detection and Segmentation from FPV Image. During group work, each participant watches various objects, such as group members and tools for the assembling task, and tends to focus on objects indispensable to achieving the goal. Therefore, it can be considered that the occurrence of these objects reflects the attention of each participant. From this point of view, the proposed method detects objects from the FPV image, and the mask of each detected object is extracted by an image segmentation technique. To detect objects from an FPV image, the proposed method uses YOLO [7] trained on the COCO dataset [8]. YOLO can detect objects at more than 40 frames per second, and 80 categories (person, chair, table, etc.) can be handled simultaneously. However, YOLO only outputs bounding boxes and their categories for objects appearing in an FPV image, so a detailed silhouette of a person cannot be obtained. Since the size of the bounding box changes arbitrarily according to the posture of each person, silhouette information is important for estimating attention accurately. To overcome this problem, the proposed method introduces PSPNet [9], which gives a pixel-level segmentation of the input image; here, PSPNet trained on the Pascal VOC 2012 dataset [10] is used. Both YOLO and PSPNet are CNN-based deep learning networks. Figure 1 shows an example of the results obtained by YOLO and PSPNet.


(a) Detected objects by YOLO

(b) Object mask by PSPNet

Fig. 1. Examples of results by applying YOLO and PSPNet.

Calculation of Attention Score. As the attention score, this paper proposes two types of metrics: (1) the number of visible persons inside the field of view, and (2) the area of a person accounting for the visual field. First, the proposed method obtains the set of bounding boxes using YOLO. Let p be a bounding box detected as a person, and P = {p} be the set of persons' bounding boxes. In addition, the area of each person p is obtained by taking the intersection of p and the person mask estimated by PSPNet. Based on these features, the number of visible persons inside the field of view is calculated as

P1 = Σ_{p ∈ P} 1,    (1)

and the area of the largest person accounting for the visual field is calculated as

P2 = max_{p ∈ P} p / Σ_{p ∈ P} p.    (2)

Finally, P1 and P2 are averaged over 15 s to reduce measurement noise.
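A minimal sketch of the two attention metrics, assuming person bounding boxes from a detector and a binary person mask from a segmentation network as inputs (the exact detector outputs, frame rate and smoothing are assumptions):

```python
# Sketch: compute P1 (Eq. 1) and P2 (Eq. 2) from person detections. `boxes`
# are integer person boxes (x0, y0, x1, y1) and `person_mask` a boolean
# segmentation mask of the same frame (assumed inputs).
import numpy as np

def attention_scores(boxes, person_mask):
    # Area of each person = intersection of its box with the person mask.
    areas = [int(person_mask[y0:y1, x0:x1].sum()) for (x0, y0, x1, y1) in boxes]
    p1 = len(boxes)                                            # Eq. (1)
    p2 = max(areas) / sum(areas) if sum(areas) > 0 else 0.0    # Eq. (2)
    return p1, p2

def smooth(values, fps=30, window_s=15):
    """Average per-frame scores over a 15 s window, as described above."""
    w = fps * window_s
    return [float(np.mean(values[max(0, i - w):i + 1])) for i in range(len(values))]
```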

2.2 Hand Visibility Score

Hand Detection. Figure 2 shows a flowchart of the hand detection and recognition processes. Although several hand detection methods for FPV images have been proposed [11,12], these methods assume that only the camera wearer's hands are visible in the view. Therefore, the proposed method selects the CNN-based hand classifier EgoHands [13], which allows the presence of another person's hands in an FPV image. The training data for EgoHands are hand/non-hand regions annotated in FPV videos captured in situations where two persons sit face-to-face and play card games, chess games and jigsaw puzzles on a table between them. Those situations involve physical interactions via hands and are thus similar to the setting of our group work for object assembling. Since the classifier only classifies input images into hand/non-hand categories, candidate hand windows must be cropped from the input images before applying the classifier. However, an enormous number of windows can be cropped from an entire image for classification.

Fig. 2. Hand detection flow (input image → skin color region and human region → CNN-based classification → detected hands → hand owner recognition → recognized hands).

To reduce false positives, the locations of the cropping windows are restricted to the overlapping areas of the skin color regions and the human regions obtained in Sect. 2.1.

Hand Owner Recognition. Unfortunately, the accuracy of the EgoHands classifier is not sufficient for recognizing self/other's hands correctly. The reason is the diversity of the geometric relationships among persons in the scene, which in our group work situation is quite large because we did not force the participants to sit face-to-face and allowed them to behave freely, including standing, walking, reaching hands from someone's side, looking into the working space from behind someone, and so on. In order to handle such general cases, how arms reach from the detected hands is estimated using the human regions and used as additional information for recognizing the hand's owner. Finally, the proposed method applies a mixture of a simple rule-based and a likelihood-based method using the following features on the FPV image:

– locations and sizes of hands
– directions of arms reaching from hands
– root locations of arms (cross points of arm lines and image edges)
– crossing information among arms
– the number of self-hands (up to two)

Calculation of Hand Visibility Score. This paper proposes three types of hand related scores reflecting characteristics of collaboration on group work activity. Let S(i, j, p, t) be an area of detected hand where i, j, p, t denote personal index in a group, hand index on an FPV image, hand owner, and time, respectively. The first feature Hg is a sum of hand areas over group members, which is expected to reflect degrees of hand operations in the group, and it is


formulated as

Hg(t) = (1 / S_image) Σ_i Σ_j Σ_p S(i, j, p, t),    (3)

where S_image is the area of the input image. In addition, the other two features Hd and Hs are formulated as

Hd(i, t) = (1 / S_image) Σ_j Σ_p S(i, j, p, t),
Hs(i, t) = (1 / S_image) Σ_j S(i, j, selfhand, t).    (4)

These are the sums of all-hand and self-hand areas in an individual FPV image, and they indicate the degree of hand work of each group member.
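A minimal sketch of Eqs. (3) and (4), assuming the per-frame hand detections have already been attributed to an owner; the data layout is hypothetical:

```python
# Sketch: hand visibility scores. `hands[i][t]` lists the hand detections in
# participant i's FPV frame at time t, each a dict with its pixel 'area' and
# 'owner' ('self' or another id); `image_area` is S_image (assumed layout).
def hand_scores(hands, image_area, t):
    n = len(hands)                      # number of group members
    hd = [sum(h["area"] for h in hands[i][t]) / image_area for i in range(n)]
    hs = [sum(h["area"] for h in hands[i][t] if h["owner"] == "self") / image_area
          for i in range(n)]
    hg = sum(hd)                        # Eq. (3): group total of visible hand areas
    return hg, hd, hs                   # Eq. (4): per-member all-hand / self-hand areas
```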

2.3 Individual Activity Score

Preprocessing. A smartwatch can measure three-axis accelerometer values (a_x(τ), a_y(τ), a_z(τ)) and three-axis gyroscope values (w_x(τ), w_y(τ), w_z(τ)) at time τ. First, a low-pass filter is applied to reduce noise in the original signals. Let a_k(τ) be the value of axis k ∈ {x, y, z}; the moving average of the latest three values is calculated as

ā_k(τ) = (a_k(τ) + a_k(τ − 1) + a_k(τ − 2)) / 3.    (5)

Next, the moving average is subtracted from the current value to remove the gravitational bias:

â_k(τ) = a_k(τ) − ā_k(τ − 1).    (6)

Then, the integrated accelerometer value at time t is obtained as

IA(t) = Σ_{τ ∈ t} (|â_x(τ)| + |â_y(τ)| + |â_z(τ)|).    (7)

Note that the discrete time t is defined to integrate recent observations; in our implementation, t is one second, and the most recent 50 observations are integrated into IA(t) since the measurement works at 50 Hz. The integrated value of the three-axis gyroscope is simply calculated, without low-pass filtering, as

IW(t) = Σ_{τ ∈ t} (|w_x(τ)| + |w_y(τ)| + |w_z(τ)|).    (8)

Finally, the sum of the two integrated values gives the amount of activity at time t:

I(t) = IA(t) + IW(t).    (9)
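The preprocessing of Eqs. (5)–(9) can be sketched as follows; the input array layout and the centered (rather than strictly causal) moving average are assumptions:

```python
# Sketch: per-second activity I(t) from 50 Hz smartwatch samples, following
# Eqs. (5)-(9). `acc` and `gyro` are (N, 3) arrays of raw readings (assumed).
import numpy as np

def activity_per_second(acc, gyro, rate=50):
    # Eq. (5): 3-sample moving average as a simple low-pass filter.
    kernel = np.ones(3) / 3.0
    acc_bar = np.vstack([np.convolve(acc[:, k], kernel, mode="same") for k in range(3)]).T
    # Eq. (6): subtract the previous moving average to remove the gravity bias.
    acc_hat = acc[1:] - acc_bar[:-1]
    ia = np.abs(acc_hat).sum(axis=1)           # per-sample term of Eq. (7)
    iw = np.abs(gyro[1:]).sum(axis=1)          # per-sample term of Eq. (8)
    n_sec = len(ia) // rate
    IA = ia[: n_sec * rate].reshape(n_sec, rate).sum(axis=1)
    IW = iw[: n_sec * rate].reshape(n_sec, rate).sum(axis=1)
    return IA + IW                              # Eq. (9): I(t) for each second
```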


Calculation of Activity Score. The amount of activity strongly depends on the individual person, and it is not easy to compare it among participants. Therefore, the proposed method focuses on how the individual activity changes over time. Let C(t) be the normalized cumulative value of the activity I(t). The cumulative activity score is calculated as

C(t) = Σ_{t′ ≤ t} I(t′) / Σ_{t′} I(t′).    (10)

Another score for grasping the characteristics of an individual person is the incremental amount of C(t) during a specific time period, calculated simply by subtracting the scores at the end points of the period. Let H be the length of the time period; the incremental score D(t) is given by

D(t) = C(t + H/2) − C(t − H/2).    (11)

Here, H = 60 is used in our implementation.
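Eqs. (10) and (11) then reduce to a cumulative sum and a windowed difference, for example:

```python
# Sketch: normalized cumulative activity C(t) (Eq. 10) and windowed increment
# D(t) (Eq. 11) from the per-second activity I(t); boundary handling is an
# assumption (C is clamped to 0 before the start and 1 after the end).
import numpy as np

def cumulative_and_incremental(I, H=60):
    C = np.cumsum(I) / np.sum(I)                               # Eq. (10)
    half = H // 2
    padded = np.concatenate([np.zeros(half), C, np.ones(half)])
    D = padded[H:] - padded[:-H]                               # Eq. (11)
    return C, D
```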

Fig. 3. Individual attention scores of group A (participants A1–A4): (a) the number of visible persons inside the field of view P1; (b) the area of the largest person accounting for the visual field P2. The horizontal axis and vertical axis are time (sec) and score, respectively.

3 Experiments and Discussions

3.1 Dataset Construction

Subjective experiments were conducted to construct a dataset for evaluating the effectiveness of the proposed method. We configured a cooperative construction task of building a tower using marshmallow and spaghetti, called the "Marshmallow challenge". This group work requires building a tower within a time limit, and collaboration with group members plays a key role in building a higher tower. In the experiments, eight university students joined as group activity members. They were divided into two groups, "Group A" and "Group B", each consisting of four members. In the following sections, the members of Group A are denoted A1, A2, A3, and A4; the members of Group B are denoted in the same manner. A university staff member acted as a facilitator conducting the activity. Each group used the same number of marshmallows and the same amount of pasta to build a tower within eighteen minutes. As an experimental setup, each participant attached a small first person view camera (GoPro HERO 4/5) to the head and two wrist-type watch sensors implementing inertial measurement units (IMU) to both arms. In addition, video cameras were fixed in the environment and recorded a perspective view of the activity to confirm the overall progress of the group work.

Fig. 4. Individual attention scores of group B (participants B1–B4): (a) the number of visible persons inside the field of view P1; (b) the area of the largest person accounting for the visual field P2. The horizontal axis and vertical axis are time (sec) and score, respectively.


3.2 Group Work Analysis

Individual Attention Score. Figures 3 and 4 show the individual attention scores calculated from FPV images in groups A and B, respectively. From the comparison between Figs. 3(a) and 4(a), the number of visible persons inside the visual field is quite different between groups A and B. This tendency can also be observed from the comparison between Figs. 3(b) and 4(b). Since the participants of group A met for the first time in this experiment, they tended to watch each other's faces to check the others' thoughts. On the other hand, since the participants of group B are colleagues in the same laboratory, they could behave as they wanted based on unspoken agreements with each other. From these results, the attention score can serve as a metric for measuring relationships between participants. From Fig. 4(a) and (b), it can be confirmed that the attention score of B4 sometimes drops to zero. This indicates that B4 did not watch the other participants during the group work, and there is a possibility that B4 was concentrating on assembling the tower or was distracted by something. From this result, the score may be useful for checking the concentration level during the group work.

Fig. 5. Hand visibility scores for Group A: (a) total score Hg; (b)–(e) individual scores of Hd (black line) and Hs (red line) for participants A1–A4. The horizontal axis corresponds to elapsed time (seconds) from the start of the group work. (Color figure online)

Fig. 6. Hand visibility scores for Group B: (a) total score Hg; (b)–(e) individual scores of Hd (black line) and Hs (red line) for participants B1–B4. The horizontal axis corresponds to elapsed time (seconds) from the start of the group work.

Hand Visibility Score. Figures 5 and 6 show the three types of scores Hg, Hd, and Hs related to hand visibility for groups A and B. The workflow in the "Marshmallow challenge" starts from a discussion of the assembling approach and then moves to actual operation. As seen in Fig. 5, the Hg values of group A stay high during the first half of the session. This is because the group A participants were planning the strategy by repeating small trials, as shown in Fig. 7(a). However, as seen in Fig. 7(b), small Hg values are observed around 100 s, because they used the whiteboard to discuss the strategy. Since they started assembling the tower with all participants, as shown in Fig. 7(c), the highest Hg values are observed around 630 s. This is also confirmed by the Hd values of each participant. On the other hand, since the group B participants spent most of their time on discussion, as shown in Fig. 8(a), they showed a different Hg transition from group A. The hand visibility scores also enable us to analyze the role of each participant. For example, the ratios Hs/Hd in group A tell that A2 was the main person assembling the object while the other three participants tended to look at his work, as seen in Fig. 7(d).

Fig. 7. Representative scenes of the group A's work at (a) 240 s, (b) 100 s, (c) 630 s, and (d) 818 s. Each scene is shown as tiled FPV images; the top-left, top-right, bottom-left, and bottom-right portions correspond to the views of participants A1, A2, A3, and A4, respectively.

Fig. 8. Representative scenes of the group B's work at (a) 135 s, (b) 725 s, (c) 86 s, and (d) 257 s. The top-left, top-right, bottom-left, and bottom-right portions correspond to the views of participants B1, B2, B3, and B4, respectively.

Fig. 9. (a) Cumulative activity scores and (b) incremental scores of group A. The horizontal axis and vertical axis are time (sec) and score, respectively. (Color figure online)

Individual Activity Score. Figures 9 and 10 show the cumulative activity score C(t) and the incremental score D(t) of each group, respectively. The dashed green line is a reference line representing a pseudo-situation of constant activity over time. For example, if a series of cumulative activity scores is above the reference line in the early part of the work period, it indicates that the participant moved his or her hands more frequently than in the latter part of the period. In the case of group A, the three participants A1, A2 and A4 performed average activities during the early part, up to 360 s. Around 360 s, A4 started to write his idea on the whiteboard, while the other people were working and discussing around the table; in this group work, the motion of writing on the whiteboard was larger than that observed during discussion and building a tower on the table. The series of B4's activity scores (in group B) suggests different characteristics: the score curve is below the reference line during the early part and then increases drastically during the latter part, a so-called last-spurt type. In fact, B4 moved around the table and helped other people's activities after 660 s. Comparing the two groups, we first found that the variance of C(t) is quite different: the curves of group B are widely spread compared with those of group A. We guess that this situation was caused by whether or not the group members were acquainted with each other.

Fig. 10. (a) Cumulative activity scores and (b) incremental scores of group B. The horizontal axis and vertical axis are time (sec) and score, respectively.

In fact, the participants in group B were in the same laboratory, so they would understand roles in the group such as facilitator, worker, and supporter. On the other hand, the participants in group A met for the first time on the experimental date, so they took a wait-and-see attitude, especially during the early part of the group work.

3.3 Analysis of Score Combinations

Motion and People in the Field of View. The combination of motion analytics and visual analytics, especially people counting (how many people exist in the field of view), enables us to understand activities in more detail. For example, the simultaneous observation of a high activity score with few visible people suggests a situation in which the person is working alone or writing something on the whiteboard. In contrast, the combination of high activity with the presence of some people in the view suggests that the person is working or discussing together using gestures. Visual analytics supports the understanding of activity situations that cannot be distinguished by motion analytics alone, while motion analytics gives essential information on whether a participant is contributing to the group work or just standing and listening.


Hand Visibility and Motion. The simultaneous use of the hand visibility scores and the hand motion scores, e.g. the combinations visible hand + sufficient motion or no visible hand + no motion, provides a more confident estimation of whether hand work is taking place. Additionally, we confirmed that it can turn an estimation that is ambiguous for a single modality into a certain one. For example, while the hand work of A1 was difficult to detect from his FPV images, as discussed above, it could be estimated from hand motion. Conversely, a combination of a visible self-hand and no motion sometimes corresponds to a "supporting the object" contribution.

4 Conclusions

Although group work is widely practiced and introduced in many institutions as a method to improve the quality of education and to encourage students to explore and solve problems, it is currently very difficult to evaluate it in real time without students' feedback. To tackle this problem, this paper proposed a method to visualize the real-world activity during group work using first person view cameras and wearable sensors. As evaluation scores for the group work, the proposed method visualizes three scores: (1) individual attention, (2) hand visibility, and (3) individual activity. To evaluate the performance and analyze the effectiveness of the scores, we conducted experiments on the "Marshmallow challenge", a collaborative task of constructing a tower using marshmallow and spaghetti. A first person view camera and wearable IMU sensors were used to record the group work activities and to calculate the scores. From the detailed analysis, we confirmed that the proposed method has the potential to become an evaluation metric for real-world group work. Future work will include automatic recognition of a more detailed status of individual participants, improvement of detection/recognition accuracy, and evaluation with many more cases. Acknowledgement. Parts of this research were supported by JSPS KAKENHI Grant Number 16K12786.

References 1. Baker, M., Lund, K.: Promoting reflective interactions in a CSCL environment. J. Comput. Assist. Learn. 13(3), 175–193 (1997) 2. Suthers, D.D.: Technology affordances for intersubjective meaning making: a research agenda for CSCL. Int. J. Comput. Support. Collaborative Learn. 1(3), 315–337 (2006) 3. Damsa, C.I., Kirschner, P.A., Andriessen, J.E.B., Erkens, G., Sins, P.H.M.: Shared epistemic agency: an empirical study of an emergent construct. J. Learn. Sci. 19(2), 143–186 (2010) 4. Ogata, H., Matsuka, Y., Moushir, E.M., Yano, Y.: LORAMS: capturing, sharing and reusing experience by linking physical objects and videos. In: Proceedings of the Workshop on Pervasive Learning, pp. 34–42 (2007)


5. Filippeschi, A., Schmitz, N., Miezal, M., Bleser, G., Ruffaldi, E., Stricker, D.: Survey of motion tracking methods based on inertial sensors: a focus on upper limb human motion. Sensors 17(6), 1257 (2018) 6. Deguchi, D., Kondo, K., Shimada, A.: Subjective sensing of real world activity on group study. In: The Eighth International Conference on Collaboration Technologies (CollabTech 2016), pp. 5–8 (2016) 7. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR2016), pp. 779–788 (2016) 8. Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Doll´ar, P., Zitnick, C.L.: Microsoft COCO: Common Objects in Context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1 48 9. Zhao, H., Shi, J., Qi, X., Wang, X., Jia, J.: Pyramid scene parsing network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR2017), pp. 2881–2890 (2017) 10. Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The PASCAL Visual Object Classes (VOC) challenge. Int. J. Comput. Vis. 88(2), 303– 338 (2010) 11. Li, C., Kitani, K.M.: Pixel-level hand detection in ego-centric videos. In: Proceedings of IEEE International Conference on Computer Vision and Pattern Recognition (CVPR2013) (2013) 12. Zhu, X., Liu, W., Jia, X., Wong, K.K.: A two-stage detector for hand detection in ego-centric videos. In: Proceedings of IEEE Winter Conference on Applications of Computer Vision (WACV) (2016) 13. Bambach, S., Lee, S., Crandall, D.J., Yu, C.: Lending a hand: detecting hands and recognizing activities in complex egocentric interactions. In: Proceedings of IEEE International Conference on Computer Vision (ICCV2015) (2015) 14. Lee, S., Bambach, S., Crandall, D.J., Franchak, J.M., Yu, C.: This hand is my hand: a probabilistic approach to hand disambiguation in egocentric video. In: Proceedings of IEEE Int. Conference on Computer Vision and Pattern Recognition Workshops (CVPRW2014) (2014)

A Multi-level Localization System for Intelligent User Interfaces

Mario Heinz1(✉), Sebastian Büttner1, Martin Wegerich2, Frank Marek2, and Carsten Röcker1,3

1 University of Applied Science Ostwestfalen-Lippe, 32657 Lemgo, Germany. {mario.heinz,sebastian.buettner,carsten.roecker}@hs-owl.de
2 ISI-Automation GmbH & Co. KG, 32699 Extertal, Germany. {m.wegerich,f.marek}@isi-automation.de
3 Fraunhofer IOSB-INA, 32657 Lemgo, Germany

Abstract. The localization of employees in the industrial environment plays a major role in the development of future intelligent user interfaces and systems. Yet, localizing people also raises ethical, legal and social issues. While a precise localization is essential for context-aware systems and real-time optimization of processes, a permanently high localization accuracy creates opportunities for surveillance and therefore has a negative impact on workplace privacy. In this paper, we propose a new concept of a multi-level localization system which tries to find a way to meet both the technical requirements for a localization with a high accuracy as well as the interests of employees in terms of privacy. Depending on the users’ location, different localization technologies are used, that restrict the accuracy to the least required level by design. Furthermore, we present a prototypical implementation of the concept that shows the feasibility of our multilevel localization concept. Using this system, intelligent systems become able to react on employees based on their location without permanently monitoring the precise user location. Keywords: Indoor localization · Intelligent user interface · Process planning

1 Introduction

Driven by the ongoing digitization and automation in the industrial sector, we are currently experiencing a significant increase in the complexity of production plants and manufacturing processes. This growing complexity requires the development of intelligent user interfaces in order to support workers in the regulation and execution of production processes [1]. The term intelligent hereby describes the ability of a user interface to extensively adapt to a usage context, i.e. to a specific user, task and available tools. The development of intelligent user interfaces requires detailed information about the environment and the location of existing dynamic entities. Therefore, a central requirement for the realization of such intelligent user interfaces will be an effective and robust indoor localization of dynamic entities such as machines, vehicles, tools, material


boxes and workers. This kind of information can be used to define environmental situations and to regulate production processes. The localization of dynamic entities already works on a coarse level today, i.e. based on light barriers or radio-frequency identification (RFID) readers. However, for the development of future intelligent user interfaces that aim to support people in the industrial context, more detailed contextual information is necessary than is currently available. For the ideal assistance of users, better sensors and a comprehensive database are needed to exploit the potential of intelligent user interfaces.

While a lot of technical systems are available that allow a large number of different entities to be localized, the localization of people in the working environment poses numerous ethical, legal and social questions [2–4]. The localization and identification of users in the environment of such systems is often avoided in order not to conflict with the aforementioned aspects. In many cases, however, this extensive localization would be desirable to provide user interfaces that have some intelligence. For example, an automatic rescheduling of processes could occur as soon as the position of certain employees indicates that a process is not working according to plan. Based on this rescheduling, assistance systems could in the future automatically order employees to different locations in plants and provide support for the elimination of process deviations. Before these future scenarios can be realized, however, privacy-preserving localization concepts have to be developed which meet both the technical requirements and the interests of employees [5].

In this paper, we take a first step towards this development: we propose a concept for a multi-level localization system that limits the accuracy of the localization to a necessary level depending on the location of a user. To demonstrate the feasibility of our concept, we describe a first implementation that has been carried out and tested in the SmartFactoryOWL [6]. The remainder of the paper is organized as follows: in part two, we give an overview of privacy aspects and indoor localization technologies. In part three we present our concept for a localization system that locates workers in industrial surroundings in a reasonable manner while preserving their privacy as well as possible. In part four we discuss the details of the implementation of a first prototype. Finally, we give a summary and an outlook on future work in part five.

2 Related Work

2.1 Workplace Privacy

The continuous digitization and the consequent integration of new technologies in industrial environments allow companies to perform a comprehensive collection and recording of production-related and employee-related data. From an employer's perspective, this provides significant benefits for optimizing productivity, enhancing security, and safeguarding the interests of a company [7]. But, from an employee's point of view, this kind of data collection is very likely to be seen as a form of monitoring which opens up various possibilities for performance measurements and other evaluations [8, 9]. These circumstances result in a general conflict with respect to the privacy of employees at the


workplace, which has already been investigated and discussed in numerous publications from legal [8–10], ethical [11, 12] and technological perspectives [7, 13]. In this context, privacy can be seen as a sphere of freedom and anonymity in which individuals can move and act freely without having to justify their activities to others. From a legal perspective, the claim to privacy in the workplace is enshrined to different depths in the legislation of different countries. Thus, the case law in European countries contains a much stronger and more clearly defined claim to privacy than, for example, in the US [7]. However, the question arises whether the relevant case law needs to be adapted to modern circumstances, such as the increasing digitization in industrial environments [8]. In the future, a certain trade-off will be necessary in order to meet both the interests of the employees in terms of privacy at the workplace and the requirements of employers for an effectively usable localization of workers in the industrial environment.

2.2 Localization in Industrial Environments

Over the years, numerous localization technologies have been developed with a focus on the localization of dynamic entities in industrial environments. Besides the localization of vehicles, boxes, tools and other materials, most of these systems also technically permit the detection of persons in industrial environments [14]. Many different localization technologies have been developed and evaluated, and existing technologies for the detection and tracking of dynamic entities in the industrial environment have been described and discussed in various publications [15–18]. According to [16], localization technologies used in the industrial sector can be classified into one of three basic categories: wireless-communication or wave propagation localization technologies [14], dead reckoning or motion sensing localization technologies [19], and scene analysis localization technologies [20].

The group of wireless-communication localization technologies, also known as wave propagation localization technologies, includes systems based on different radio technologies such as Wi-Fi, ultra-wideband (UWB) or Bluetooth, as well as infrared and ultrasound technologies. These systems use the characteristics of wave propagation, e.g. the phase or angle of a signal, to determine the distance between transmitters and receivers. The systems differ in their use of active or passive tags as well as in the number of sensors available to determine the position.

The group of dead reckoning localization technologies, or motion sensing localization technologies, on the other hand, includes localization systems based on inertial measurement units (IMU), which are usually integrated in mobile devices. These measuring units use the data of various motion sensors such as acceleration sensors and gyroscopes, as well as digital compass sensors, to determine a localization based on the detected movements.

Finally, the group of localization technologies based on scene analysis methods includes systems that capture the characteristics of an environment via video streams or electromagnetic sensors. Localization is done by performing pattern recognition based on comparative data. Even though video camera-based systems in particular can offer a wide range of applications for indoor localization, they potentially give employees the impression of permanent supervision by the employer.


2.3 Privacy-Aware Localization in Industrial Environments

From a technical perspective, most of the existing technologies for indoor localization in industrial environments are usable for the tracking of employees. While from a technological perspective a high precision of a localization system is desirable, the design of a localization system might be influenced by requirements counting against this high accuracy. Localization of people within an industrial environment is a sensitive topic in terms of privacy, and the possibility of locating employees creates a considerable conflict of interests: on the one hand, information about the position of employees inside an industrial area can be used to optimize production processes, so from the perspective of work organization a high localization accuracy is required. On the other hand, the system has to take into account the need of the employees not to be observed (or surveilled). A user study presented in [21] shows that privacy is a major concern when designing localization systems. Users of localization systems stated that they "wished to have complete control over the visibility of their location" [21]. From a human-centered design perspective, this implies that localization systems have to be designed so as to communicate to their users the current state of observation possibilities. Given the mentioned conflict and the design recommendations from previous user studies, it is remarkable that only few publications consider how to design localization systems for privacy awareness.

The current state of the art for the localization of persons in the industrial environment are systems based on tags or identity cards using radio-frequency identification (RFID) technology. These tags or cards have to be actively swiped by the user at a particular reader unit to register the user at a certain location. A similar, but technically different, approach is the use of barcodes that have to be scanned by a user to indicate a location [22]. The mentioned technologies are particularly suitable for recording the presence or absence of employees in a production area without determining their exact position. Since they require active gestures, users are likely aware of being registered at a certain location. One location system explicitly designed for privacy was the Cricket location-support system, a mobile system based on radio and ultrasonic beacons [23]. The system does not have a central management; instead, mobile devices determine their own location and users can control whether this information should be shared with other instances. The decentral nature of the system could be used for industrial localization systems as well; however, it requires explicit interactions by the user to share a particular location, similar to the use of RFID smartcards or tags.

3 Conceptual Approach

Based on the previously described context, a concept for a privacy-preserving localization of workers in industrial environments should consider at least the following aspects: (1) the accuracy of localization, (2) the access to localization data, (3) the location of data processing, (4) the duration for which localization information is stored, and (5) the privacy of workers.


In order to limit the accuracy of a localization, our concept for a privacy-aware localization of workers in industrial environments follows a multigranular approach. This means that the accuracy of the localization is technically reduced to a necessary level and adapted to the specific location of a worker. Such an adjustment could theoretically be realized by a software-side limitation of a localization system with high accuracy. With regard to the interests of the employees, however, we propose a hardware-side limitation by implementing multiple levels of localization based on different technologies (multi-level design). While the specific number of localization levels is directly related to the individual usage context, we suggest that a multi-level localization system include at least three levels to cover the relevant requirements for the development of intelligent user interfaces (Fig. 1):

• Level 1: The first level of localization is intended to capture the presence or absence of employees in a production area without tracking the movements within the environment. This localization can be implemented in the form of an identification process, which is carried out when entering or leaving the specific area.
• Level 2: The second level of localization is intended to track the movement of workers across large-scale environments in order to detect their presence in specific areas, such as the immediate environment of an automated production plant.
• Level 3: The third level of localization is intended to capture the exact position and the viewing direction of a person in a spatially limited area. The captured data can be used to customize user interfaces in the viewing direction of a worker in order to provide him/her with relevant information.

Fig. 1. Example for the different levels of the multi-level localization system.
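The multi-level design can also be mirrored directly in the data model. The following sketch is our own illustration (the class and field names are not from the prototype) of how a location report could expose only the fields its level permits.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional, Tuple


class LocalizationLevel(IntEnum):
    """Granularity levels from the multi-level concept."""
    PRESENCE = 1   # level 1: present/absent in the production area
    AREA = 2       # level 2: presence in a specific sub-area
    POSE = 3       # level 3: exact position and viewing direction


@dataclass
class LocationReport:
    worker_id: str
    level: LocalizationLevel
    area: Optional[str] = None                 # available from level 2 upwards
    position: Optional[Tuple[float, float]] = None   # (x, y) in metres, level 3 only
    view_direction: Optional[float] = None     # heading in degrees, level 3 only

    def coarsened(self) -> dict:
        """Expose only the fields that the reported level permits."""
        data = {"worker_id": self.worker_id, "level": int(self.level)}
        if self.level >= LocalizationLevel.AREA:
            data["area"] = self.area
        if self.level >= LocalizationLevel.POSE:
            data["position"] = self.position
            data["view_direction"] = self.view_direction
        return data
```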

Regarding the access to the localization information, we propose that the communication be initialized and controlled by the localization systems. In this way, any external access to the raw data is prevented. Furthermore, we suggest that the first stages of data processing be implemented on the side of the localization systems to limit the information density. Additionally, the data retention period should be kept relatively short in order to prevent automatic adaptations of production systems based on outdated data.

4 Prototypical Implementation of a Multi-level Localization System

In order to evaluate our concept, we implemented a prototypical multi-level localization system based on three different localization technologies: an RFID reader (first level), a UWB real-time localization system (second level) and an optical system based on a depth camera (third level). The system is installed for evaluation purposes inside the SmartFactoryOWL, a demonstration factory for industrial automation and digitization in Lemgo, Germany [6].

4.1 First Level of Localization

For the first level of localization, we used a system based on an RFID reader and personalized tags or identity cards to detect the presence or absence of workers in the production area. The system was implemented based on a Raspberry Pi (RPi) and an RFID breakout board (MFRC522), which is connected to the RPi via the general-purpose input/output (GPIO) pins. The RFID reader is located at the entry to the manufacturing area. The system was integrated into the local area network of the SmartFactoryOWL to communicate with a central logistics system (see Subsect. 4.4) (Fig. 2).

Fig. 2. The prototypical RFID reader (left) and the associated tags and identity cards (right).
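A minimal sketch of the first level, assuming the community mfrc522 Python package that is commonly used with the MFRC522 breakout board on a Raspberry Pi; the tag-to-worker mapping and the register_presence function are hypothetical placeholders for the integration with the logistics system.

```python
import time

from mfrc522 import SimpleMFRC522  # assumes the community mfrc522 package is installed

reader = SimpleMFRC522()

# Hypothetical mapping from tag UIDs to anonymized worker IDs.
KNOWN_TAGS = {123456789: "worker-01", 987654321: "worker-02"}


def register_presence(worker_id: str) -> None:
    """Placeholder: forward a level-1 presence event to the logistics system."""
    print(f"level 1: {worker_id} entered or left the production area")


try:
    while True:
        tag_id, _text = reader.read()      # blocks until a tag is presented
        worker_id = KNOWN_TAGS.get(tag_id)
        if worker_id is not None:
            register_presence(worker_id)
        time.sleep(1.0)                    # debounce repeated reads of the same tag
finally:
    import RPi.GPIO as GPIO                # SimpleMFRC522 uses RPi.GPIO internally
    GPIO.cleanup()
```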

4.2 Second Level of Localization

For the second level of localization, we used a Ubisense UWB real-time localization system which is installed inside the SmartFactoryOWL. It consists of eight sensors, which are mounted under the roof of the production area (Fig. 3 left), and multiple active tags (Fig. 3 right). The data from the eight sensors are transferred via one specific root sensor to a Linux-based web server, where the location messages are decrypted and stored within an SQL database. This database is queried by an RPi (we reuse the RPi from the first level of localization), and the latest data are retrieved and provided as a data stream to the central logistics system (see Subsect. 4.4).


Fig. 3. A sensor of the UWB-system (left) and the associated tags (right).
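A minimal sketch of the second level's data path, in which the RPi polls the latest tag positions from the location database and forwards them. SQLite stands in for the actual database, and the table and column names are assumptions for illustration; the real Ubisense schema will differ.

```python
import sqlite3
import time

# Illustrative schema: positions(tag_id TEXT, x REAL, y REAL, ts REAL)
conn = sqlite3.connect("uwb_positions.db")


def latest_positions():
    """Return the most recent (x, y) per tag from the location database."""
    rows = conn.execute(
        "SELECT tag_id, x, y, MAX(ts) FROM positions GROUP BY tag_id"
    ).fetchall()
    return [{"tag_id": r[0], "x": r[1], "y": r[2], "timestamp": r[3]} for r in rows]


while True:
    for record in latest_positions():
        print("level 2:", record)   # in the prototype this is pushed to the logistics system
    time.sleep(0.5)                 # polling interval
```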

4.3 Third Level of Localization

According to our concept, the third level of localization is used to capture the position and the viewing direction of a user in a spatially limited area. The system is implemented using a Microsoft Kinect V2 depth camera and a face-tracking algorithm provided by the Microsoft Face Basics API. The camera is mounted on top of a display at one of the demonstrators, which shows the current state of the factory. Depending on the current user or users (e.g., shop floor worker, management), suitable information is displayed. The face tracking can be used to enable interaction with the data visualizations; e.g., accordion panels can be expanded by looking at them.
In order to identify the persons to be captured in the viewing area of the camera and to distinguish them from other persons not to be captured by the overall system, it was necessary to transfer the collected position data into the coordinate system of the UWB system used for the second level of localization in order to make them comparable. For this purpose, the exact position of the camera in the coordinate system of the UWB system was determined as part of a test measurement by means of a UWB tag and used as an offset for the position data. On this basis, a comparison of the positions of the UWB tags with the positions of the captured faces achieves a unique assignment (Fig. 4).

Fig. 4. Area of the third level localization in the SmartFactoryOWL (left). The localization is done by using a Kinect depth camera system on top of a display (right).
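A small sketch of the assignment step described above: face positions from the depth camera are shifted into the UWB coordinate frame using the measured camera offset and then matched to the nearest UWB tag. The offset value, the distance threshold, and the pure-translation transform (a full calibration would also include rotation) are simplifying assumptions.

```python
import math

# Camera origin in the UWB coordinate system (determined once by a test measurement).
CAMERA_OFFSET = (12.3, 4.5)   # illustrative values, in metres
MAX_MATCH_DISTANCE = 0.8      # metres; faces farther from any tag stay unassigned


def to_uwb_frame(face_xy):
    """Translate a face position from camera coordinates into UWB coordinates."""
    return (face_xy[0] + CAMERA_OFFSET[0], face_xy[1] + CAMERA_OFFSET[1])


def assign_faces_to_tags(faces, tags):
    """Nearest-neighbour assignment of detected faces to UWB tag positions.

    faces: {face_id: (x, y)} in camera coordinates
    tags:  {tag_id: (x, y)} in UWB coordinates
    """
    assignment = {}
    for face_id, face_xy in faces.items():
        fx, fy = to_uwb_frame(face_xy)
        best_tag, best_dist = None, MAX_MATCH_DISTANCE
        for tag_id, (tx, ty) in tags.items():
            dist = math.hypot(fx - tx, fy - ty)
            if dist < best_dist:
                best_tag, best_dist = tag_id, dist
        if best_tag is not None:
            assignment[face_id] = best_tag
    return assignment
```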


4.4 Integration and Central Logistics System (ISIPlus®)

In order to create a realistic industrial scenario, we used a commercial logistics software tool (ISIPlus®) from ISI-Automation GmbH & Co. KG (https://www.isi-automation.com) that aggregates all localization data. The logistics system collects and visualizes the positioning data from the different localization levels. The communication between the localization systems and the ISIPlus® system was implemented via individual TCP socket connections (see Fig. 5). In order to restrict any external access to the localization information, the communication channels were initialized by the localization systems. We used a general data structure in order to handle the incoming data.

Fig. 5. Overview of the communication of the overall system.
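A minimal sketch of the push-style integration: the localization system itself opens the TCP connection and sends messages in one shared structure. The host, port, and JSON field names are illustrative; the actual ISIPlus® message format is not described here.

```python
import json
import socket
import time

LOGISTICS_HOST = "192.168.0.10"   # illustrative address of the logistics server
LOGISTICS_PORT = 5000             # illustrative port


def push_location(sock, level, worker_id, payload):
    """Send one location message using a shared, level-independent structure."""
    message = {
        "level": level,
        "worker_id": worker_id,
        "timestamp": time.time(),
        "data": payload,          # level-specific content, e.g. area name or (x, y)
    }
    sock.sendall((json.dumps(message) + "\n").encode("utf-8"))


# The localization system initiates the connection, so no external party
# can open a channel toward the raw sensor data.
with socket.create_connection((LOGISTICS_HOST, LOGISTICS_PORT)) as sock:
    push_location(sock, level=2, worker_id="worker-01", payload={"x": 14.2, "y": 6.7})
```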

Fig. 6. Prototypical user interface (localization manager) of the ISIPlus® system at the demonstrator in the SmartFactoryOWL.

Figure 6 shows the "localization manager" – a prototypical user interface with a table-based and a graphical visualization of the users currently detected by the system, their positions and their current localization level. Using this interface, people can be located with the necessary accuracy. Internally, the data can be used in the ISIPlus® system to optimize logistic processes or production planning.

5 Summary and Outlook

In this paper, we presented our concept and a first prototype implementation of a multi-level localization system for intelligent user interfaces. We showed the feasibility of a system that restricts the localization accuracy according to the position of a user in order to ensure workplace privacy on the one hand and localization opportunities for the implementation of intelligent user interfaces on the other hand. In addition to the already existing user interface, we will implement several other potential applications in a prototypical way in a next step and evaluate the overall system within user studies to gain insight into its usability and perceived user experience. Furthermore, the system is to be integrated as a demonstrator within the SmartFactoryOWL, where experts from research and industry, as well as other people interested in industrial digitization and automation, can learn about it. Based on a long-term written survey of this audience, further findings are to be collected for further optimization of the concept.

Acknowledgement. This work is funded by the German Federal Ministry of Education and Research (BMBF) within the context of the top-level cluster "Intelligente Technische Systeme OstWestfalenLippe (it's OWL)" for the project "Verbundprojekt: Nachhaltigkeitsmaßnahme Technologietransfer (itsowl-TT); Teilprojekt: Durchführung fokussierter Transferprojekte; Transferprojekt: Multi-Level-Lokalisierung von Nutzern für Intelligente Benutzerschnittstellen (itsowl-TT-IUILocal)" under grant number 02PQ3062. We thank our colleague Henrik Mucha for the visualization of our concept (Fig. 1).

References 1. Fellmann, M., Robert, S., Büttner, S., Mucha, H., Röcker, C.: Towards a framework for assistance systems to support work processes in smart factories. In: Holzinger, A., Kieseberg, P., Tjoa, A.M., Weippl, E. (eds.) CD-MAKE 2017. LNCS, vol. 10410, pp. 59–68. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66808-6_5 2. Lahlou, S., Langheinrich, M., Röcker, C.: Privacy and trust issues with invisible computers. Commun. ACM 48(3), 59–60 (2005) 3. Sack, O., Röcker, C.: Privacy and security in technology-enhanced environments: exploring users’ knowledge about technological processes of diverse user groups. Univ. J. Psychol. 1(2), 72–83 (2013) 4. Röcker, C., Feith, A.: Revisiting privacy in smart spaces: social and architectural aspects of privacy in technology-enhanced environments. In: Proceedings of the International Symposium on Computing, Communication and Control (ISCCC 2009), pp. 201–205 (2009) 5. Röcker, C., Hinske, S., Magerkurth, C.: Information security at large public displays. In: Gupta, M., Sharman, R. (eds.) Social and Human Elements of Information Security: Emerging Trends and Countermeasures, pp. 471–492. IGI Publishing, Niagara Falls (2009)


6. Büttner, S., Mucha, H., Robert, S., Hellweg, F., Röcker, C.: HCI in der SmartFactoryOWL – Angewandte Forschung & Entwicklung. Mensch und Computer 2017, Workshopband (2017) 7. Mitrou, L., Karyda, M.: Bridging the gap between employee surveillance and privacy protection. In: Social and Human Elements of Information Security: Emerging Trends and Countermeasures, pp. 283–300. IGI Global, New York (2009) 8. Levinson, A.R.: Industrial justice: privacy protection for the employed. Cornell J. Law Public Policy 18, 609–688 (2008) 9. Kovach, D., Kenneth, A., Jordan, J., Tansey, K., Framiñan, E.: The balance between employee privacy and employer interests. Bus. Soc. Rev. 105(2), 289–298 (2000) 10. Nord, G.D., McCubbins, T.F., Nord, J.H.: E-monitoring in the workplace: privacy, legislation, and surveillance software. Commun. ACM 49(8), 72–77 (2006) 11. Kaupins, G., Minch, R.: Legal and ethical implications of employee location monitoring. In: Proceedings of the 38th Annual Hawaii International Conference on System Sciences (HICSS 2005). IEEE Press (2005) 12. Ziefle, M., Röcker, C., Holzinger, A.: Medical technology in smart homes: exploring the user’s perspective on privacy, intimacy and trust. In: Proceedings of the IEEE 35th Annual Computer Software and Applications Conference Workshops (COMPSACW 2011), pp. 410– 415. IEEE Press (2011) 13. Röcker, C.: Social and technological concerns associated with the usage of ubiquitous computing technologies. Issues Inf. Syst. 11(1), 61–68 (2010) 14. Gu, Y., Lo, A., Niemegeers, I.: A survey of indoor positioning systems for wireless personal networks. IEEE Commun. Surv. Tutorials 11(1), 13–32 (2009) 15. Stojanović, D., Stojanović, N.: Indoor localization and tracking: methods, technologies and research challenges. Autom. Control Robot. 13(1), 57–72 (2014). Facta Universitatis 16. Liu, H., Darabi, H., Banerjee, P., Liu, J.: Survey of wireless indoor positioning techniques and systems. IEEE Trans. Syst. Man Cybern. 37(6), 1067–1080 (2007) 17. Zhang, D., Xia, F., Yang, Z., Yao, L., Zhao, W.: Localization technologies for indoor human tracking. In: Proceedings of the 5th International Conference on Future Information Technology (FutureTech 2010), pp. 1–6 (2010) 18. Roeper, D., Chen, J., Konrad, J., Ishwar, P.: Privacy-preserving, indoor occupant localization using a network of single-pixel sensors. In: Proceedings of the 13th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS 2016), pp. 214–220 (2016) 19. House, S., Connell, S., Milligan, I., Austin, D., Hayes, T.L., Chiang, P.: Indoor localization using pedestrian dead reckoning updated with RFID-based fiducials. In: Proceedings of the Annual International Conference of the Engineering in Medicine and Biology Society (EMBC 2011), pp. 7598–7601. IEEE (2011) 20. Taneja, S., Akcamete, A., Akinci, B., Garrett Jr., J.H., Soibelman, L., East, E.W.: Analysis of three indoor localization technologies for supporting operations and maintenance field tasks. J. Comput. Civil Eng. 26(6), 708–719 (2011) 21. Smailagic, A., Kogan, D.: Location sensing and privacy in a context-aware computing environment. IEEE Wirel. Commun. 9(5), 10–17 (2002) 22. Büttner, S., Cramer, H., Rost, M., Belloni, N., Holmquist, L.E.E.: φ 2: Exploring physical check-ins for location-based services. In: Adjunct Proceedings of the 12th ACM International Conference on Ubiquitous Computing, pp. 395–396. ACM (2010) 23. Priyantha, N.B., Chakraborty, A., Balakrishnan, H.: The cricket location-support system. 
In: Proceedings of the 6th Annual International Conference on Mobile Computing and Networking, pp. 32–43. ACM (2000)

Survey on Vision-Based Path Prediction

Tsubasa Hirakawa1, Takayoshi Yamashita1, Toru Tamaki2(B), and Hironobu Fujiyoshi1

1 Chubu University, Aichi 487-0027, Japan
[email protected], {yamashita,hf}@cs.chubu.ac.jp
2 Hiroshima University, Hiroshima 739-8527, Japan
[email protected]

Abstract. Path prediction is a fundamental task for estimating how pedestrians or vehicles are going to move in a scene. Because path prediction as a task of computer vision uses video as input, various information used for prediction, such as the environment surrounding the target and the internal state of the target, needs to be estimated from the video in addition to predicting paths. Many prediction approaches that include understanding the environment and the internal state have been proposed. In this survey, we systematically summarize methods of path prediction that take video as input and extract features from the video. Moreover, we introduce datasets used to evaluate path prediction methods quantitatively.

Keywords: Path prediction · Trajectory · Pedestrian · Survey · Datasets

1 Introduction

Path prediction is the task of estimating the path, or trajectory, along which a target (e.g., a pedestrian or vehicle) will move. Predicting paths from video is an important task receiving much attention as it is expected to have many potential applications, such as surveillance camera analysis, self-driving cars, and autonomous robot navigation. Path prediction has to estimate much more information—such as information of the surrounding environment, moving direction, and status of prediction targets—than other simple image recognition tasks. As a result, prediction methods are often built on top of other computer vision tasks, such as pedestrian detection [1,2], pedestrian attribute recognition [3], and semantic segmentation [4]. Moreover, in the prediction task, future observations of predicted paths are not available. In tasks of pedestrian detection and tracking, observations from the past to the present are used to locate and track the target in the current frame of the video. In contrast, the prediction task localizes and predicts the locations of the target in future frames of the video, using observations made until the present time and prior information on the surrounding environment and knowledge of the target motion.


Fig. 1. Overview of path prediction, modified from [6].

Path prediction has been studied for decades in the field of robotics. At stations and airports, robots need to move without interfering with the many people present [5] and to plan a path of efficient motion in the environment. Path prediction is necessary to achieve such tasks. However, in addition to information from cameras, robots are able to use information from many types of sensor, such as a LIDAR sensor, to obtain the three-dimensional (3D) geometry of the scene. The environment in which the robot can move around is sometimes explicitly given as an environment map. The present survey covers path prediction methods that, as a computer vision task, take only video as input. There is an alternative task called early recognition, which predicts future human behaviors in video. This task predicts future actions in the video but is excluded from the survey because the predicted categories are discrete, whereas predicted paths are sequences of continuous locations. As the task of path prediction in the field of computer vision is difficult and challenging, a variety of methods have been proposed. A common approach is shown in Fig. 1. As input, a video (or a frame of video) is given in addition to the location of the target in the current frame or a sequence of locations over the past frames of several seconds. Features useful for prediction are then extracted from the video (or frames) to predict the path in future frames. There are two important parts to the overview of Fig. 1: (b) feature extraction, where many features are extracted to understand the environment and target; and (c) path prediction, where a variety of methods have been proposed, categorized into four types. In this paper, we survey path prediction methods taking video as input and systematically summarize feature extraction and prediction approaches and datasets used for evaluation. We explain feature extraction methods in Sect. 2 and categorize prediction methods in Sect. 3. In Sect. 4, we review datasets used in evaluating the performance of path prediction. We conclude the survey in Sect. 5.

2 Feature Extraction from a Video

Table 1. Categories of feature extraction for path prediction.

Feature     | Types                | Methods
Environment | Scene label          | Stacked hierarchical labeling [7]; superpixel-based MRF [8]; fully convolutional networks [9, 10]
            | Cost                 | Bag of visual words; spatial matching network [11]
            | Global scene feature | Pre-trained AlexNet [12]; Siamese network [13]
Target      | Location             | HOG + SVM detector [14]
            | Direction            | Bayesian orientation estimation [15]; orientation network [11]
            | Attribute            | AlexNet-based multi-task learning [16]
            | Feature vector       | Mid-level patch features [17]

This section introduces methods of feature extraction from video for path prediction. The path that a pedestrian takes is implicitly affected by many factors of the surrounding environment and by the pedestrian's own status. The performance of path prediction is expected to improve when using information that largely determines how the pedestrian decides where to go. Given the video, such information is extracted prior to the prediction. Table 1 presents information extracted from video for path prediction. Such information can be broadly categorized into that of (1) the environment and (2) the target.

2.1 Environmental Features

A pedestrian decides where to go and walks along a path while being affected by the surrounding environment. For example, we usually walk along the sidewalk while avoiding obstacles on the way (e.g., parked cars and trash cans) and drive a car on the roadway as is common social practice. The movement of the target is dynamically affected by the environment, and environmental features are therefore extracted from the video.
Semantic segmentation [18–21] is the task of assigning an object class to each pixel, which is the most common task in understanding the environment in the field of computer vision. Semantic segmentation can be conducted to estimate where obstacles exist in the scene and where there are regions available for walking. Kitani et al. [18] assumed that pedestrian paths are mainly affected by the physical environment, such as sidewalks, roadways, flower beds, and buildings, and predicted posterior probabilities of each label using hierarchical segmentation [7], as shown in Fig. 2. These probabilities are used as feature vectors to form scene feature maps, which are used for path prediction. Rehder et al. [20] used segmentation results obtained using a fully convolutional network [9, 10] for prediction.
Alternative approaches do not explicitly use environmental features affecting paths but implicitly represent probabilities of paths as cost (or reward) functions [11, 22]. These methods create cost maps of the entire scene from cost functions independently estimated for each superpixel. Walker et al. [22] searched training samples for patches with similar texture using a nearest-neighbor approach, and assigned the costs of the training samples to superpixels to generate cost maps of the scene. Huang et al. [11] proposed a convolutional neural network (CNN) called the spatial matching network, which estimates rewards of local regions by comparing the similarity between the patch of the target and surrounding superpixel patches.
Yet another approach represents the scene as a single feature vector, whereas the above approaches extract local features from superpixels. Assuming that similar scenes prompt similar paths, this approach retrieves similar scenes in a training dataset with feature vectors to predict paths using the paths of the retrieved scenes. To this end, CNNs are usually used to efficiently extract scene feature vectors because of the recent success of deep learning architectures. In predicting paths in first-person video, Park et al. [23] used AlexNet [12] to extract features when retrieving scenes, and transferred paths of the retrieved scenes for prediction. Su et al. [24] used an AlexNet-based Siamese network [13] to retrieve features.

Fig. 2. Examples of environmental attributes [18]
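As a simple illustration of the segmentation-based strategy, per-pixel class posteriors can be stacked into one scene feature map per class. The sketch below uses NumPy only, with hypothetical class names and random values standing in for a segmentation output.

```python
import numpy as np

CLASSES = ["sidewalk", "road", "grass", "building", "obstacle"]  # illustrative labels


def scene_feature_maps(label_probs):
    """Turn per-pixel label posteriors into one feature map per class.

    label_probs: array of shape (H, W, C) with softmax probabilities per pixel.
    Returns a dict mapping each class name to its (H, W) probability map.
    """
    assert label_probs.shape[-1] == len(CLASSES)
    return {name: label_probs[..., i] for i, name in enumerate(CLASSES)}


# Example with random "probabilities" standing in for a segmentation output.
h, w = 60, 80
logits = np.random.rand(h, w, len(CLASSES))
probs = logits / logits.sum(axis=-1, keepdims=True)
maps = scene_feature_maps(probs)
print(maps["sidewalk"].shape)  # (60, 80)
```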

2.2 Target Features

While environmental features strongly affect the target in terms of the path decision, internal factors of the target are also important. Specifically, attributes of the target, such as age, gender, and internal demand, affect the path decision. We herein introduce methods for extracting target features.
The most common target feature is the orientation of the target [11, 16, 25], because the estimated orientation can be used to predict in which direction the target is going. In other words, the orientation constrains the moving direction of the target and thus reduces errors of prediction. Kooij et al. [25] detected pedestrians employing a histogram of oriented gradients (HOG) and support vector machine (SVM) [14] and estimated the head orientation [15] to predict the path of a pedestrian in front of a car on which a camera was mounted, focusing on whether the pedestrian will stop before stepping forward onto the roadway, as shown in Fig. 3. If the head faces the camera, then the pedestrian is assumed to notice the car and is predicted to slow down or stop before the roadway.
Physical attributes, such as age and gender, are also important to prediction. When walking in places where there are a number of people, pedestrians take actions to avoid colliding with each other. Aspects of such avoidance—when and where pedestrians start to avoid others—are different for pedestrians of different age and gender; e.g., a younger person walks faster and responds more rapidly to others than older people do. Wei et al. [16] used AlexNet to estimate the orientation, age, and gender of pedestrians as multi-task learning. Estimated attributes are used in deciding the walking speed of pedestrians. Walker et al. [22] proposed unsupervised path prediction by extracting mid-level feature vectors directly from patches containing the target, instead of direct attributes.

Fig. 3. Estimation of head orientation [25]. (a) Detection of heads and bodies of pedestrians. (b) Estimation of the orientation of the head in eight directions.
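A toy illustration of how an estimated body or head orientation can constrain prediction: candidate motion directions that disagree with the estimated facing direction are down-weighted. This is a generic sketch, not the formulation of any of the cited methods.

```python
import math

# Eight candidate motion directions (unit steps), as in grid-based prediction.
CANDIDATES = [(math.cos(a), math.sin(a)) for a in [i * math.pi / 4 for i in range(8)]]


def direction_weights(orientation_rad, sharpness=2.0):
    """Weight each candidate step by its agreement with the estimated orientation."""
    ox, oy = math.cos(orientation_rad), math.sin(orientation_rad)
    weights = []
    for dx, dy in CANDIDATES:
        agreement = max(0.0, dx * ox + dy * oy)   # cosine similarity, clipped at 0
        weights.append(agreement ** sharpness)
    total = sum(weights) or 1.0
    return [w / total for w in weights]


print(direction_weights(math.pi / 2))  # highest weight for the "up" direction
```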

3 Prediction Methods

Path prediction follows feature extraction from video. Table 2 summarizes methods of prediction, categorized according to their approach. This section describes each category and its properties.

Table 2. Categories of path prediction methods

Category            | Paper | Year | Method         | Scene        | Input  | Output
Bayesian            | [26]  | 2013 | KF             | Car          | Coord. | Coord.
                    | [25]  | 2014 | DBN            | Car          | Video  | Coord.
                    | [19]  | 2016 | DBN            | Top view     | Video  | Coord.
Energy minimization | [27]  | 2013 | Dijkstra       | Top view     | Video  | Distribution
                    | [22]  | 2014 | Dijkstra       | Surveillance | Video  | Distribution
                    | [11]  | 2016 | Dijkstra       | Surveillance | Image  | Distribution
DL                  | [28]  | 2016 | CNN            | Surveillance | Coord. | Coord.
                    | [29]  | 2016 | LSTM           | Top view     | Coord. | Coord.
                    | [30]  | 2017 | LSTM           | Top view     | Coord. | Coord.
                    | [31]  | 2017 | LSTM           | Top view     | Coord. | Coord.
                    | [21]  | 2017 | RNN Enc.-Dec.  | Car          | Video  | Coord.
IRL                 | [18]  | 2012 | IRL            | Top view     | Video  | Distribution
                    | [32]  | 2016 | IRL            | Top view     | Video  | Distribution
                    | [33]  | 2016 | IRL            | First person | Video  | Distribution
                    | [34]  | 2017 | IRL            | First person | Video  | Distribution
                    | [16]  | 2017 | IRL            | Surveillance | Image  | Distribution
                    | [20]  | 2017 | IRL            | Car          | Video  | Distribution
Others              | [35]  | 2014 | Optical flow   | Car          | Video  | Coord.
                    | [36]  | 2015 | Markov process | Car          | Video  | Coord.
                    | [23]  | 2016 | Data driven    | First person | Video  | Coord.
                    | [24]  | 2017 | Data driven    | First person | Video  | Coord.
                    | [37]  | 2011 | Social force   | Top view     | Video  | Coord.
                    | [38]  | 2016 | Social force   | Top view     | Video  | Coord.

3.1 Bayesian Models

The first approach uses online Bayes filters, such as Kalman filters (KFs) and particle filters, and infers the model to predict paths. Such modeling introduces internal states and observations as variables, and defines probabilistic models by assuming that the observations are the internal states contaminated by noise. This approach iterates the prediction step, which computes the current internal states from the previous states, and the update step, which updates the current states with the observations. In a common setup, internal states are actual coordinates of pedestrians, and observations are coordinates obtained by pedestrian detection. This is person tracking if we apply the approach to track from the past to the present, and path prediction if we only repeat the prediction step to obtain the sequence of coordinates of the pedestrian, without the update step; i.e., there are no future observations. Schneider et al. [26] used the extended KF to update the internal state of the pedestrian in front of a car. This was an early work of path prediction and showed what kind of primitive information (e.g., the walking speed and acceleration) is useful for path prediction.
Instead of using online Bayes filters, some works have used a dynamic Bayesian network (DBN) [19, 25]. Kooij et al. [25] considered a more restricted case: estimating whether the pedestrian will walk across a roadway in front of a car on which a camera is mounted. They defined a DBN model with a switching linear dynamical system (SLDS), shown in Fig. 4, that uses features extracted from the video, such as the pedestrian's head orientation, the distance to the car, and the distance between the pedestrian and the roadway. This method performs better than using the coordinates of pedestrian detection only.

Fig. 4. Graphical model of a DBN with an SLDS [25]
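A minimal constant-velocity Kalman filter sketch illustrating the prediction-only use described above: the update step runs while detections are available, and only the prediction step is repeated for future frames. It is a generic textbook filter, not the exact model of [26].

```python
import numpy as np

dt = 0.1                       # frame interval in seconds
F = np.array([[1, 0, dt, 0],   # constant-velocity state transition
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
H = np.array([[1, 0, 0, 0],    # only (x, y) is observed
              [0, 1, 0, 0]], dtype=float)
Q = np.eye(4) * 0.01           # process noise
R = np.eye(2) * 0.05           # observation noise

x = np.zeros(4)                # state: [x, y, vx, vy]
P = np.eye(4)


def predict(x, P):
    return F @ x, F @ P @ F.T + Q


def update(x, P, z):
    y = z - H @ x
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    return x + K @ y, (np.eye(4) - K @ H) @ P


# Tracking phase: predict and update with detections from past frames.
for z in [np.array([0.0, 0.0]), np.array([0.1, 0.05]), np.array([0.2, 0.1])]:
    x, P = predict(x, P)
    x, P = update(x, P, z)

# Prediction phase: no future observations, so only the prediction step is repeated.
future = []
for _ in range(10):
    x, P = predict(x, P)
    future.append(x[:2].copy())
print(np.array(future))
```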

3.2 Energy Minimization

The Bayesian approach described above is online: it estimates the coordinates of the pedestrian frame by frame in the video. Another (offline or batch) approach is an energy minimization approach that estimates the entire sequence of coordinates at the same time. This approach constructs a two-dimensional grid graph of the scene, assigns costs for moving to the edges in the graph, and then finds the combination of edges that gives the minimum energy. This is formulated as a shortest-path problem solved with the Dijkstra method. The prediction accuracy is therefore largely affected by how the cost is defined. Huang et al. [11] proposed a path prediction method using a single image. First, a patch containing the target is extracted to estimate the orientation of the target. Next, the cost for moving across the location of the patch is estimated by comparing the texture of surrounding patches. In addition to this cost, the estimated orientation of the target is used as a constraint and added to the edge weights. Walker et al. [22] compared the texture of superpixels using patches along the path that the target traced, without involving any training procedure. Appearance information (texture) of the scene can be used to define the cost function, but objects in the scene can also be used. Xie et al. [27] assumed that pedestrians have decided their goal (e.g., a food truck) according to their potential demands (hunger), and defined cost maps where the pedestrians are attracted to objects in the scene.
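A compact sketch of the energy-minimization idea: a two-dimensional grid with per-cell movement costs, solved as a shortest-path problem with Dijkstra's algorithm. The cost map here is random; in the cited methods it would be derived from scene appearance and the estimated orientation.

```python
import heapq

import numpy as np

rng = np.random.default_rng(0)
cost = rng.uniform(0.1, 1.0, size=(20, 30))   # stand-in for an appearance-based cost map


def dijkstra_path(cost, start, goal):
    """Minimum-cost 4-connected path on a grid of per-cell costs."""
    h, w = cost.shape
    dist = np.full((h, w), np.inf)
    prev = {}
    dist[start] = 0.0
    heap = [(0.0, start)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if (r, c) == goal:
            break
        if d > dist[r, c]:
            continue
        for nr, nc in [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]:
            if 0 <= nr < h and 0 <= nc < w:
                nd = d + cost[nr, nc]
                if nd < dist[nr, nc]:
                    dist[nr, nc] = nd
                    prev[(nr, nc)] = (r, c)
                    heapq.heappush(heap, (nd, (nr, nc)))
    # Reconstruct the predicted path from goal back to start.
    path, node = [goal], goal
    while node != start:
        node = prev[node]
        path.append(node)
    return path[::-1]


print(dijkstra_path(cost, start=(0, 0), goal=(19, 29))[:5])
```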

3.3 Deep Learning

Deep learning methods, such as those involving the CNN and long short-term memory (LSTM), have been used for path prediction since the emergence of deep learning frameworks. Methods of this type take as input the series of coordinates of the target over the last several frames, and produce a series of target coordinates in several successive frames. Feature extraction, described in the last section, is not explicitly performed, as feature extraction and prediction are not explicitly separated in deep learning models.
Several methods that use LSTM to deal with paths, which are sequences of two-dimensional coordinates, have been proposed. Alahi et al. [29] proposed the social-pooling (S-pooling) layer for avoiding collisions between pedestrians. A pedestrian is represented by an LSTM, and the hidden-layer outputs of the LSTMs of other people are connected to the S-pooling layer of the pedestrian. This layer allows the LSTM of the pedestrian to represent the spatial relationship with nearby people (e.g., the distance to each other), and thus predict a path avoiding collision. LSTM has a limitation of long-term memory; i.e., paths in the distant future are difficult to predict. Fernando et al. [31] assumed the necessity of more elaborate long-term memory, and proposed the tree memory network that hierarchically selects useful information of the past stored in memory cells and performs better than other LSTM models.
Besides LSTM, the CNN is also used to directly make predictions. Yi et al. [28] proposed the behavior-CNN that predicts the future path from the past path. This method first creates three-dimensional sparse data whose channels store the pedestrian's two-dimensional coordinates over the last several frames. The sparse 3D data are encoded using convolution and pooling layers and then decoded using deconvolution layers. They also added location bias maps to each channel of encoded information to account for different behaviors at different locations in the scene, such as the locations of entrances and obstacles.
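A bare-bones PyTorch sketch of the input/output convention shared by these models: a sequence of past (x, y) coordinates goes in and a sequence of future (x, y) coordinates comes out. Social pooling, tree memory, and location bias maps are deliberately omitted.

```python
import torch
import torch.nn as nn


class TrajectoryLSTM(nn.Module):
    """Encode the observed path, then roll the LSTM forward to emit future steps."""

    def __init__(self, hidden=64, pred_len=12):
        super().__init__()
        self.pred_len = pred_len
        self.embed = nn.Linear(2, hidden)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)

    def forward(self, past):                       # past: (batch, obs_len, 2)
        _, state = self.lstm(self.embed(past))     # encode the observed trajectory
        step = past[:, -1:, :]                     # start from the last observed point
        outputs = []
        for _ in range(self.pred_len):
            out, state = self.lstm(self.embed(step), state)
            step = self.head(out)                  # next predicted (x, y)
            outputs.append(step)
        return torch.cat(outputs, dim=1)           # (batch, pred_len, 2)


model = TrajectoryLSTM()
past = torch.randn(4, 8, 2)                        # 4 pedestrians, 8 observed frames
print(model(past).shape)                           # torch.Size([4, 12, 2])
```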

3.4 Inverse Reinforcement Learning

The three approaches above are examples of supervised or unsupervised learning, while the approach presented here is an example of reinforcement learning (RL). RL learns a policy to decide the actions to be taken by an agent under the current status in an environment. RL is usually defined as a Markov decision process that learns the optimal policy allowing the agent to take the best actions, maximizing the reward. As Fig. 5 shows, in this setting an agent of RL is the target of prediction, an environment is the scene given as video, a status is the pedestrian location, and an action is the movement of the pedestrian.

Fig. 5. Overview of RL, modified from [38].

RL needs to define the reward of the action of moving from one state to another, which indicates how good the action taken by the agent is. However, it is difficult to explicitly define the reward function for practical problems such as the path prediction task. This problem is called the reward design problem, and inverse reinforcement learning (IRL) is one approach taken to solve it. IRL estimates rewards that reproduce optimal sequences of actions, and decides the actions of the agent in the test phase with the estimated reward so that the agent can take similar actions. IRL has been used to learn and control the optimal motion of robots [5].
Kitani et al. [18] first introduced IRL to vision-based path prediction. Instead of estimating target locations, they estimated actions that the agent may take at a certain time or location, and predicted possible paths by sequentially applying the estimated actions to the current target location. This task is therefore called activity forecasting, in contrast to path prediction, which directly estimates locations of the target in the future. Activity forecasting is a much more complex and challenging task than path prediction, while it has great potential in terms of having a variety of predictions adapted to each possible application. Kitani et al. [18] assumed that the physical attributes of a scene strongly affect pedestrian paths, and used scene attributes estimated by semantic segmentation as feature maps. Rewards of each scene attribute are defined by the inner product of the feature maps and weight vectors, and the optimal weights are estimated from training data. For prediction, a sequence of actions that arrives at the predefined goal is generated by giving the goal and the current location of the target pedestrian. Lee et al. [32] used a similar approach to predict paths of football players in a game video. Wei et al. [16] introduced a game-theoretic concept called fictitious play to predict paths of multiple pedestrians who arrive at a goal while avoiding collisions between pedestrians. Without any predefined goals, Rehder et al. [20] proposed the destination network to estimate the goal of the target using the last several frames. The estimated goal and the environmental attributes obtained using a fully convolutional network were used to predict pedestrian paths. For first-person vision, Bokhari et al. [33] used objects held by a person and the object states to predict goals in the future. While this work considered a limited scene (e.g., a kitchen), Rhinehart et al. [34] dealt with wider areas, such as a home including a kitchen, bathroom, and living room.
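A toy sketch of the reward parameterization described above: each grid cell's reward is the inner product of its scene-attribute features with a weight vector, and a simple value iteration toward a given goal then yields a greedy path. The weights are hand-set here (negative, i.e., costs, so the iteration converges); learning them from demonstrations is the actual IRL step and is omitted.

```python
import numpy as np

H, W, C = 15, 20, 4                           # grid size and number of scene attributes
rng = np.random.default_rng(1)
features = rng.random((H, W, C))              # stand-in for segmentation-based feature maps
weights = np.array([-0.1, -1.0, -0.4, -0.6])  # in IRL these would be learned from demos

reward = features @ weights                   # per-cell reward = <features, weights>
goal = (H - 1, W - 1)

# Value iteration toward the goal (deterministic 4-connected moves).
value = np.full((H, W), -1e9)
value[goal] = 0.0
for _ in range(200):
    new_value = value.copy()
    for r in range(H):
        for c in range(W):
            if (r, c) == goal:
                continue
            best = -1e9
            for nr, nc in [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]:
                if 0 <= nr < H and 0 <= nc < W:
                    best = max(best, reward[nr, nc] + value[nr, nc])
            new_value[r, c] = best
    value = new_value

# Greedy rollout from the start cell gives a predicted path toward the goal.
path, pos = [(0, 0)], (0, 0)
while pos != goal and len(path) < H * W:
    r, c = pos
    neighbours = [(nr, nc) for nr, nc in [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
                  if 0 <= nr < H and 0 <= nc < W]
    pos = max(neighbours, key=lambda n: reward[n] + value[n])
    path.append(pos)
print(path[:10])
```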

3.5 Other Approaches

Most prediction methods can be categorized into one of the four approaches described above, but there are other approaches.


Fig. 6. Prediction from first-person videos; (left) [23], (right) [24].

The social force model [39] assumes an energy called a "social force" that acts between pedestrians and objects in the scene, and generates pedestrian movement through interaction via the force. Yamaguchi et al. [37] proposed a model with additional states, such as the preferences of pedestrians, walking speeds, goals, and the existence of other people walking together. This work was motivated by a desire to improve the accuracy of pedestrian tracking, but performed path prediction to evaluate the proposed model. Robicquet et al. [38] proposed social forces of multiple classes for avoiding collisions. They estimated "social sensitivity features" using the distances to other people, and applied K-means clustering of the features to obtain several clusters of avoidance behaviors. The cluster of the target's avoidance behavior was estimated using the target feature, and paths of the cluster were then projected back to the scene for prediction.
Optical flow extracted from car-mounted cameras was used by Keller et al. [35] to predict pedestrian paths. They used optical flows over the last several frames and computed orientation histograms as motion features of pedestrians. The sequence of histograms was used to retrieve similar scenes in the training set, and paths of the retrieved scenes were then mapped back to the scene for prediction. The use of the Markov process framework was proposed by Rehder et al. [36]. They used normal and von Mises distributions to represent the state (location) and speed of the pedestrian, and sequentially estimated the state by taking products of these distributions at each time step for prediction. To improve accuracy, the goal of the pedestrian was estimated from environmental attributes to constrain the direction of motion.
The retrieval-based approach shown in Fig. 6 was proposed by Park et al. [23] to predict the future path in a video showing the first-person view. They first extracted scene features using AlexNet and then found similar scenes in the training set by comparing extracted features. Paths of retrieved training samples were mapped onto the video. They predicted paths even in scenes with occlusions by estimating regions behind occluding objects, such as walls and obstacles. Su et al. [24] extended this work to the prediction of multiple basketball players in a game scene. In one first-person video, they estimated the region of "joint attention" to which multiple players commonly paid attention. Multiple paths were predicted by selecting the optimal path of each player and by minimizing an objective function defined by the estimated joint attention region, locations of players, and paths projected back to the scene.
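A compact sketch of the social force idea referenced above: a pedestrian is attracted toward a goal and repelled by nearby pedestrians, and integrating the resulting force yields a predicted trajectory. The parameter values are illustrative and not those of [37–39].

```python
import numpy as np


def social_force_step(pos, vel, goal, others, dt=0.1,
                      desired_speed=1.3, tau=0.5, a=2.0, b=0.3):
    """One Euler step of a simplified social force model for a single pedestrian."""
    direction = goal - pos
    direction = direction / (np.linalg.norm(direction) + 1e-9)
    # Driving force toward the desired velocity.
    force = (desired_speed * direction - vel) / tau
    # Repulsive forces from other pedestrians (exponential in distance).
    for other in others:
        diff = pos - other
        dist = np.linalg.norm(diff) + 1e-9
        force += a * np.exp(-dist / b) * diff / dist
    vel = vel + force * dt
    return pos + vel * dt, vel


pos, vel = np.array([0.0, 0.0]), np.array([0.0, 0.0])
goal = np.array([10.0, 0.0])
others = [np.array([5.0, 0.2])]            # a pedestrian standing near the straight line
path = []
for _ in range(100):
    pos, vel = social_force_step(pos, vel, goal, others)
    path.append(pos.copy())
print(np.round(path[-1], 2))               # ends near the goal, having curved around
```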

4 Datasets

This section briefly introduces datasets used to evaluate path prediction methods. Various datasets have been used, as shown in Table 3 and Fig. 7. The diversity of datasets is due to the difficulty of using a single universal dataset for many different conditions, e.g., different numbers of scenes and paths needed for learning and different types of scenes. We therefore categorize datasets into four categories in terms of the viewpoint of the camera.

Fig. 7. Datasets and results of prediction, taken and modified from [16, 18, 21, 23, 25, 28, 29, 31, 34, 38].

4.1 Videos of Entire Scenes

The most commonly used type of dataset is video that captures the entire scene taken by a wide-angle camera (for surveillance) at stations and market places. These datasets are usually used to evaluate pedestrian tracking methods; however, they are also used in evaluating path prediction because sequences of pedestrian locations are given as the ground truth.

Table 3. Comparison of datasets

Dataset                                | Year | URL | #People | Viewpoint    | #Scenes | Other targets                                  | Additional information
UCY [40]                               | 2007 | 1   | 786     | Top view     | 3       | –                                              | –
ETH [41]                               | 2009 | 2   | 750     | Top view     | 2       | –                                              | –
Edinburgh Informatics Forum [42]       | 2009 | 3   | 95,998  | Top view     | 1       | –                                              | –
Stanford Drone [38]                    | 2016 | 4   | 11,216  | Top view     | 8       | Bikers, skateboarders, cars, buses, golf carts | –
VIRAT [6]                              | 2011 | 5   | 4021    | Surveillance | 11      | Car, bike                                      | Object coordinates, activity category
Town Centre [43]                       | 2011 | 6   | 230     | Surveillance | 1       | –                                              | Head coordinates
Grand Central Station [44]             | 2015 | 7   | 12,600  | Surveillance | 1       | –                                              | –
Daimler [26]                           | 2013 | 8   | 68      | Car          | –       | –                                              | Stereo camera
KITTI [45]                             | 2012 | 9   | 6336    | Car          | –       | Car                                            | Stereo camera, LIDAR, map
EgoMotion [23]                         | 2016 | –   | –       | First person | 26      | –                                              | Stereo
First-person continuous activity [34]  | 2017 | –   | –       | First person | 17      | –                                              | Object information

1: https://graphics.cs.ucy.ac.cy/research/downloads/crowd-data
2: http://www.vision.ee.ethz.ch/en/datasets/
3: http://homepages.inf.ed.ac.uk/rbf/FORUMTRACKING/
4: http://cvgl.stanford.edu/projects/uav data/
5: http://www.viratdata.org/
6: http://www.robots.ox.ac.uk/~lav/Papers/benfold reid cvpr2011/benfold reid cvpr2011.html
7: http://www.ee.cuhk.edu.hk/~xgwang/grandcentral.html
8: http://www.gavrila.net/Datasets/Daimler Pedestrian Benchmark D/daimler pedestrian benchmark d.html
9: http://www.cvlibs.net/datasets/kitti/

Top View

The UCY Dataset [40] and ETH Dataset [41] contain videos of pedestrians walking along streets where no other moving objects exist, which is a relatively simple situation compared with the situations of other datasets. The Edinburgh Informatics Forum Pedestrian Database [42] consists of videos of pedestrians walking on the campus of the University of Edinburgh taken by a fixed camera. This dataset is large and has more than 90,000 paths.


The above datasets are constructed for pedestrian tracking and crowd behavior analysis, while the Stanford Drone Dataset [38] focuses on path prediction. This dataset has videos taken by drones flying at eight sites of Stanford University, and provides annotations of moving objects, such as cyclists, skateboarders, and cars, as well as pedestrians.

Surveillance

Videos in the datasets described above are taken from a top view, while videos in the datasets shown in Fig. 7(e, f) are taken from a bird's-eye view; i.e., the videos are taken by surveillance cameras looking downward at an angle. The physical attributes of pedestrians are observable in these videos and can be used for prediction. The VIRAT Video Dataset [6] contains videos taken by surveillance cameras at parking lots, and provides the locations of pedestrians, cars, and objects in the scene and labels of activities, such as getting into a car and opening a trunk. It contains 11 scenes, which is the largest number of scenes among the datasets of surveillance cameras in Table 3. The Town Centre Dataset [43] contains videos of pedestrians and provides bounding boxes of each pedestrian as well as labels of the head locations of pedestrians. The Grand Central Station Dataset [44] contains videos taken by a fixed camera mounted at a station, as shown in Fig. 7(g). It has a single scene but is complex owing to the many people appearing in and disappearing from the scene, because the motivation is to analyze the behaviors of many pedestrians.

4.2 Car-Mounted Cameras

Datasets of videos taken by cameras mounted on vehicles are used because path prediction is studied with the aim of developing automated driving. In this case, cameras are mounted at the front of the car to look forward, and the main objective is to predict the paths of pedestrians in front of the car. The Daimler Pedestrian Path Prediction Benchmark Dataset [26] consists of videos taken by car-mounted cameras. There are four classes of cases, including cases in which the pedestrian walks across the roadway and cases in which the pedestrian stops walking to avoid an accident. In addition to the videos themselves, depth information is available, as the videos are taken by stereo cameras. There are relatively few pedestrians; however, the dataset contains videos that are rare in other datasets, such as videos of pedestrians crossing in front of moving cars. The KITTI Vision Benchmark Suite [45] was constructed for intelligent transport systems, and is used for various evaluations such as those of the detection of pedestrians, vehicles, and white lines on the road. It contains not only RGB images but also stereo images, LIDAR 3D data, GPS locations, and street maps, and it is therefore useful for path prediction that uses rich information to understand the environment.

4.3 First-Person View

Unlike videos of entire scenes or videos taken by car-mounted cameras, which are used to predict the paths of targets in the scene, videos taken from the first-person view are used to predict the path of the person taking the video. Park et al. [23] used first-person videos taken by wearable cameras moving through indoor and outdoor environments of 26 different scenes, such as on a street and inside a store. Rhinehart et al. [34] collected first-person videos taken by a person walking around office environments and assumed that an object held by the person (e.g., a mug or towel) indicates where the person is going (e.g., the kitchen or bathroom).

5 Conclusions

We reviewed vision-based path prediction methods and common datasets. We first categorized feature extraction methods according to whether the extracted features describe the environment or the target's appearance and dynamics. We then grouped prediction methods according to the approach taken. Bayesian methods define probabilistic models of the path and sequentially estimate internal states. Energy minimization methods define a two-dimensional grid graph by computing the possibility of pedestrians moving through each local region, and then solve the shortest-path problem. Deep learning methods take a series of locations of the target over the past several seconds and output a series of future locations. IRL uses the policy and reward estimated from training samples and then selects actions iteratively to produce a future path. These approaches are of course not exclusive and are often used in combination [21]. Finally, we summarized datasets used in evaluating prediction methods. Some datasets are used for pedestrian detection and tracking, while others are used for path prediction.

Acknowledgments. This work was supported in part by JSPS KAKENHI under grant number JP16H06540.

References 1. Weinland, D., Ronfard, R., Boyer, E.: A survey of vision-based methods for action representation, segmentation and recognition. Comput. Vis. Image Underst. 115(2), 224–241 (2011) 2. Benenson, R., Omran, M., Hosang, J., Schiele, B.: Ten years of pedestrian detection, what have we learned? In: Agapito, L., Bronstein, M.M., Rother, C. (eds.) ECCV 2014. LNCS, vol. 8926, pp. 613–627. Springer, Cham (2015). https://doi. org/10.1007/978-3-319-16181-5 47 3. Deng, Y., Luo, P., Loy, C.C., Tang, X.: Pedestrian attribute recognition at far distance. In: Proceedings of the 22nd ACM International Conference on Multimedia, MM 2014, pp. 789–792. ACM, New York (2014) 4. Zhu, H., Meng, F., Cai, J., Lu, S.: Beyond pixels: a comprehensive survey from bottom-up to semantic image segmentation and cosegmentation. J. Vis. Commun. Image Represent. 34, 12–27 (2016) 5. Ziebart, B.D., Ratliff, N., Gallagher, G., Mertz, C., Peterson, K., Bagnell, J.A., Hebert, M., Dey, A.K., Srinivasa, S.: Planning-based prediction for pedestrians. In: International Conference on Intelligent Robots and Systems, pp. 3931–3936, October 2009


6. Oh, S., Hoogs, A., Perera, A., Cuntoor, N., Chen, C.C., Lee, J.T., Mukherjee, S., Aggarwal, J.K., Lee, H., Davis, L., Swears, E., Wang, X., Ji, Q., Reddy, K., Shah, M., Vondrick, C., Pirsiavash, H., Ramanan, D., Yuen, J., Torralba, A., Song, B., Fong, A., Roy-Chowdhury, A., Desai, M.: A large-scale benchmark dataset for event recognition in surveillance video. In: Computer Vision and Pattern Recognition, pp. 3153–3160, June 2011 7. Munoz, D., Bagnell, J.A., Hebert, M.: Stacked hierarchical labeling. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6316, pp. 57–70. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15567-3 5 8. Yang, J., Price, B., Cohen, S., Yang, M.H.: Context driven scene parsing with attention to rare classes. In: Computer Vision and Pattern Recognition, pp. 3294– 3301 (2014) 9. Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Computer Vision and Pattern Recognition, pp. 3431–3440 (2015) 10. Shelhamer, E., Long, J., Darrell, T.: Fully convolutional networks for semantic segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 39(4), 640–651 (2017) 11. Huang, S., Li, X., Zhang, Z., He, Z., Wu, F., Liu, W., Tang, J., Zhuang, Y.: Deep learning driven visual path prediction from a single image. IEEE Trans. Image Process. 25(12), 5892–5904 (2016) 12. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In Pereira, F., Burges, C.J.C., Bottou, L., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems, pp. 1097–1105 (2012) 13. Bromley, J., Guyon, I., LeCun, Y., S¨ ackinger, E., Shah, R.: Signature verification using a “siamese” time delay neural network. In: Advances in Neural Information Processing Systems, pp. 737–744 (1994) 14. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: Computer Vision and Pattern Recognition, pp. 886–893 (2005) 15. Enzweiler, M., Gavrila, D.M.: Integrated pedestrian classification and orientation estimation. In: Computer Vision and Pattern Recognition, pp. 982–989 (2010) 16. Ma, W., Huang, D., Lee, N., Kitani, K.M.: Forecasting interactive dynamics of pedestrians with fictitious play. In: Computer Vision and Pattern Recognition, pp. 774–782 (2016) 17. Singh, S., Gupta, A., Efros, A.A.: Unsupervised discovery of mid-level discriminative patches. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, pp. 73–86. Springer, Heidelberg (2012). https://doi.org/ 10.1007/978-3-642-33709-3 6 18. Kitani, K.M., Ziebart, B.D., Bagnell, J.A., Hebert, M.: Activity forecasting. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7575, pp. 201–214. Springer, Heidelberg (2012). https://doi.org/10. 1007/978-3-642-33765-9 15 19. Ballan, L., Castaldo, F., Alahi, A., Palmieri, F., Savarese, S.: Knowledge transfer for scene-specific motion prediction. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 697–713. Springer, Cham (2016). https:// doi.org/10.1007/978-3-319-46448-0 42 20. Rehder, E., Wirth, F., Lauer, M., Stiller, C.: Pedestrian prediction by planning using deep neural networks (2017) 21. Lee, N., Choi, W., Vernaza, P., Choy, C.B., Torr, P.H.S., Chandraker, M.K.: DESIRE: distant future prediction in dynamic scenes with interacting agents. In: Computer Vision and Pattern Recognition, pp. 336–345 (2017)


22. Walker, J., Gupta, A., Hebert, M.: Patch to the future: unsupervised visual prediction. In: Computer Vision and Pattern Recognition, pp. 3302–3309, June 2014 23. Park, H.S., Hwang, J.J., Niu, Y., Shi, J.: Egocentric future localization. In: Computer Vision and Pattern Recognition, pp. 4697–4705, June 2016 24. Su, S., Hong, J.P., Shi, J., Park, H.S.: Predicting behaviors of basketball players from first person videos. In: Computer Vision and Pattern Recognition, pp. 1502– 1510 (2017) 25. Kooij, J.F.P., Schneider, N., Flohr, F., Gavrila, D.M.: Context-based pedestrian path prediction. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8694, pp. 618–633. Springer, Cham (2014). https://doi.org/10. 1007/978-3-319-10599-4 40 26. Schneider, N., Gavrila, D.M.: Pedestrian path prediction with recursive Bayesian filters: a comparative study. In: German Conference on Pattern Recognition, pp. 174–183 (2013) 27. Xie, D., Todorovic, S., Zhu, S.C.: Inferring ‘Dark Matter’ and ‘Dark Energy’ from videos. In: International Conference on Computer Vision, pp. 2224–2231, December 2013 28. Yi, S., Li, H., Wang, X.: Pedestrian behavior understanding and prediction with deep neural networks. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 263–279. Springer, Cham (2016). https://doi.org/10. 1007/978-3-319-46448-0 16 29. Alahi, A., Goel, K., Ramanathan, V., Robicquet, A., Fei-Fei, L., Savarese, S.: Social LSTM: human trajectory prediction in crowded spaces. In: Computer Vision and Pattern Recognition, pp. 961–971, June 2016 30. Fernando, T., Denman, S., Sridharan, S., Fookes, C.: Soft + hardwired attention: an LSTM framework for human trajectory prediction and abnormal event detection (2017) 31. Fernando, T., Denman, S., McFadyen, A., Sridharan, S., Fookes, C.: Tree memory networks for modelling long-term temporal dependencies (2017) 32. Lee, N., Kitani, K.M.: Predicting wide receiver trajectories in American football. In: Winter Conference on Applications of Computer Vision, pp. 1–9, March 2016 33. Bokhari, S.Z., Kitani, K.M.: Long-term activity forecasting using first-person vision. In: Lai, S.-H., Lepetit, V., Nishino, K., Sato, Y. (eds.) ACCV 2016. LNCS, vol. 10115, pp. 346–360. Springer, Cham (2017). https://doi.org/10.1007/978-3319-54193-8 22 34. Rhinehart, N., Kitani, K.M.: First-person activity forecasging with online inverse reinforcement learning (2017) 35. Keller, C.G., Gavrila, D.M.: Will the pedestrian cross? a study on pedestrian path prediction. IEEE Trans. Intell. Transp. Syst. 15(2), 494–506 (2014) 36. Rehder, E., Kloeden, H.: Goal-directed pedestrian prediction. In: Workshop on International Conference on Computer Vision, pp. 139–147, December 2015 37. Yamaguchi, K., Berg, A.C., Ortiz, L.E., Berg, T.L.: Who are you with and where are you going? In: CVPR 2011, pp. 1345–1352 (2011) 38. Robicquet, A., Sadeghian, A., Alahi, A., Savarese, S.: Learning social etiquette: human trajectory understanding in crowded scenes. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9912, pp. 549–565. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46484-8 33 39. Helbing, D., Molnar, P.: Social force model for pedestrian dynamics. Phys. Rev. E 51(5), 4282 (1995) 40. Lerner, A., Chrysanthou, Y., Lischinski, D.: Crowds by example. Comput. Graph. Forum 26(3), 655–664 (2007)


41. Pellegrini, S., Ess, A., Schindler, K., van Gool, L.: You’ll never walk alone: modeling social behavior for multi-target tracking. In: International Conference on Computer Vision, pp. 261–268 (2009) 42. Majecka, B.: Statistical models of pedestrian behaviour in the forum. Ph.D thesis, MSc Dissertation, School of Informatics, University of Edinburgh (2009) 43. Benfold, B., Reid, I.: Stable multi-target tracking in real-time surveillance video. In: Computer Vision and Pattern Recognition, pp. 3457–3464 (2011) 44. Yi, S., Li, H., Wang, X.: Understanding pedestrian behaviors from stationary crowd groups. In: Computer Vision and Pattern Recognition, pp. 3488–3496 (2015) 45. Geiger, A., Lenz, P., Urtasun, R.: Are we ready for autonomous driving? the kitti vision benchmark suite. In: Computer Vision and Pattern Recognition, pp. 3354– 3361 (2012)

Neural Mechanisms of Animal Navigation

Koutarou D. Kimura1,2(✉), Masaaki Sato3,4, and Midori Sakura5

1 Graduate School of Science, Osaka University, Toyonaka, Osaka, Japan
[email protected]
2 Graduate School of Natural Sciences, Nagoya City University, Nagoya, Aichi, Japan
3 Graduate School of Science and Engineering, Brain and Body System Science Institute, Saitama University, Saitama, Saitama, Japan
4 RIKEN Brain Science Institute, Wako, Saitama, Japan
5 Graduate School of Science, Kobe University, Kobe, Hyogo, Japan

Abstract. Animals navigate to specific destinations for survival and reproduction. Notable examples include birds, fishes, and insects that are driven by their inherited motivation and acquired memory to migrate thousands of kilometers. The navigational abilities of these animals depend on their small and imprecise sensory organs and brains. Thus, understanding the mechanisms underlying animal navigation may lead to the development of novel tools and algorithms that can be used for more effective human-computer interactions in self-driving cars, autonomous robots and/or human navigation. How are such navigational abilities implemented in the animal brain? Neurons (i.e., nerve cells) that respond to external signals related to the animal's direction and/or travel distance have been found in insects, and neurons that encode the animal's place, direction, or speed have been identified in rats and mice. Although the research findings accumulated to date are not sufficient for a complete understanding of the neural mechanisms underlying navigation in the animal brain, they do provide key insights. In this review, we discuss the importance of neurobiological studies of navigation for engineering and computer science researchers and briefly summarize the current knowledge of the neural bases of navigation in model animals, including insects, rodents, and worms. In addition, we describe how modern engineering and computer technologies, such as virtual reality and machine learning, can help advance navigation research in animals.

Keywords: Neural computation · Spatial information · Biologically-inspired engineering

1

Neurobiological Research of Animal Navigation: Why It Matters

Most, if not all, animals navigate to their destinations over various distances to forage for food, escape from their enemies, and/or find mating partners by using their inherited motivation and acquired memory. To accomplish this, animals utilize multiple types of information and a variety of strategies. When they navigate short distances, the spatial position of their goal can be accurately located if they are able to precisely recognize it using binocular vision or binaural hearing. However, when animals navigate using

© Springer International Publishing AG, part of Springer Nature 2018 N. Streitz and S. Konomi (Eds.): DAPI 2018, LNCS 10922, pp. 65–81, 2018. https://doi.org/10.1007/978-3-319-91131-1_5


olfaction, the precise localization of the goal becomes more difficult, even for short distances, because odor does not produce a well-shaped gradient in space but rather diffuses nonuniformly as plumes. The difficulty of olfactory navigation can be easily appreciated if you close your eyes and try to reach for an odor source. Nevertheless, certain animals efficiently reach odor sources by using specialized strategies, such as zig-zag turns [1]. For long-range navigation, some animals, such as birds, fishes, and insects, can navigate to distant destinations, even if they are hundreds or thousands of kilometers away. Although the navigation goals cannot be directly recognized in these cases, these animals utilize global information, such as the position of the sun and/or the geomagnetic field, when they navigate [2, 3]. It should be noted that, because global cues change with time and season, animals need to correctly recognize their current temporal situation and compensate for errors in the relationship between the global information and their own positions. These amazing navigational abilities of animals are comparable to the engineering requirements of very precise modern technologies. To obtain a localization precision of about 10 meters with global positioning systems (GPS), latitude and longitude need to be measured accurately down to the 4th digit after the decimal point (see the short calculation below). However, neurobiological investigations of animal navigation have revealed that neurons in sensory organs and the central nervous system do not necessarily code positional information accurately; unlike electronic devices, they often do so in a variable and stochastic manner. This suggests that the principles of circuit operation and/or the information-processing algorithms underlying animal navigation must differ from those of current technology. Thus, an understanding of the neurobiological mechanisms underlying animal navigation will contribute to the technological development of compact, efficient, and inexpensive mobile systems that can be used to monitor and assist in the navigation of autonomous objects, such as robots, and of people. In addition, a neurobiological understanding of animal navigation might be useful for addressing problems in human social engineering and ergonomics, such as architectural design and the safe and efficient guidance of people toward or away from public facilities while preventing them from getting lost, because navigation behaviors in humans and animals have many similarities.
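The figure about the fourth decimal digit can be checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes a spherical Earth with a mean radius of 6,371 km rather than the WGS84 ellipsoid used by actual GPS receivers, and the example latitude of 35 degrees is an arbitrary choice.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; a spherical approximation

def degree_step_to_meters(step_deg: float, latitude_deg: float = 35.0) -> tuple:
    """Return the north-south and east-west distances (in meters) spanned by
    one step of `step_deg` degrees at the given latitude."""
    north_south = math.radians(step_deg) * EARTH_RADIUS_M
    east_west = math.radians(step_deg) * EARTH_RADIUS_M * math.cos(math.radians(latitude_deg))
    return north_south, east_west

# One unit in the 4th digit after the decimal point is a step of 0.0001 degrees.
ns, ew = degree_step_to_meters(0.0001)
print(f"0.0001 deg ~ {ns:.1f} m (latitude), {ew:.1f} m (longitude at 35 deg N)")
# Roughly 11 m and 9 m, i.e., on the order of the 10-m precision mentioned above.
```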

2

Information Used for Navigation

Animals adopt multiple navigational strategies. In particular, they are thought to inte‐ grate the sensory information available in the external environment and the internal information that is computed and stored within the animal’s nervous system. Examples of these information types are summarized in Table 1, and further details are described in Sects. 3.1 and 3.2.


Table 1. Examples of information used for navigation

1. External Sensory Information

1.1 Global Position Information
Cues: geomagnetic fields, polarized skylights, gradients in brightness and/or color caused by the sun or moon, visualization of the sun, moon, and/or stars
Effective situation: The goal is not directly recognizable
Animal strategy: Compassing: direction is determined from subtle gradients in sensory cues

1.2 Local Information
Cues: visual, auditory, and olfactory information on the features of the goal
Effective situation: The goal is directly recognizable
Animal strategy: Beaconing: the direction and distance to the goal are estimated using information from bilateral sensory organs (in visual or auditory navigation); Zig-zag turns: locating the goal by repeating right and left turns and subsequent looping behavior (in olfactory navigation)

1.3 Temporal Information
Cues: intermittently perceived stimuli
Effective situation: The goal is new and unknown
Animal strategy: Biased random walk: the direction is random and remains unchanged while the relevant stimulus is perceived

2. Internally Computed Information

2.1 Current Direction
Information: the direction of the animal
Example of use: Birds adjust their direction of travel according to the strength of the wind [4]
Time span: brief
Neural activity: head direction cells in rodents [5, 6] and insects [7, 8]

2.2 Integrated Path
Information: integration of the direction and distance from the starting point
Example of use: Desert ants return to the nest in a straight line after wandering hundreds of meters looking for food [9]
Time span: during a single navigational episode
Neural activity: possible path integration in insects [10]

2.3 Memory
Information: memories of previous experiences associated with the goal
Example of use: Animals use remembered cues to reach a destination they have previously reached
Time span: long-term
Neural activity: place cells in rodents [11]
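To make the "biased random walk" strategy of Table 1 (item 1.3) concrete, the following sketch simulates an agent that keeps its heading while the sensed stimulus keeps improving and reorients randomly otherwise. The Gaussian-shaped stimulus field, the unit step size, and the turning rule are illustrative assumptions, not parameters taken from any of the studies cited in this review.

```python
import math
import random

def stimulus(x, y):
    # Hypothetical smooth attractant gradient peaking at the origin.
    return math.exp(-(x**2 + y**2) / 200.0)

def biased_random_walk(steps=500, seed=0):
    rng = random.Random(seed)
    x, y = 20.0, 20.0                      # start away from the source
    heading = rng.uniform(0, 2 * math.pi)
    previous = stimulus(x, y)
    for _ in range(steps):
        x += math.cos(heading)
        y += math.sin(heading)
        current = stimulus(x, y)
        # Keep the heading while the stimulus increases; otherwise pick a new
        # random direction (the "direction remains unchanged while the relevant
        # stimulus is perceived" rule of Table 1, item 1.3).
        if current <= previous:
            heading = rng.uniform(0, 2 * math.pi)
        previous = current
    return x, y

# The agent tends to drift toward the peak at the origin over the run.
print("final position:", biased_random_walk())
```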

3

Exploring the Neural Bases of Navigation in Model Animals

Neurobiological studies of navigation, particularly those investigating how the external and internal information is represented and integrated in the form of neural activity, have been extensively performed using representative model animal species, such as insects (e.g., ants, bees, and locusts), rodents (mostly rats and mice), and worms. Some insect species are elegant examples of how various kinds of environmental information are


transformed into the neural activity underlying the species' amazing navigational capabilities [12]. Another advantage of studying insects is their relatively simple and compact brains. Although tens of thousands of neurons form elaborate networks in the brains of insects, these brains allow for easier experimental manipulations than the brains of higher animal species. Rodents are small mammals with brains that share many similarities with the human brain, and they can be trained to perform a variety of cognitive tasks that are relevant to human behavior. Studies conducted in highly controllable laboratory environments have revealed subsets of neurons that specifically respond to different aspects of navigation, such as location, direction, and geometric borders [11]. Finally, the nematode Caenorhabditis elegans, which is a tiny worm, has been studied in a number of laboratories because it has only 302 neurons that make up networks whose connections have been fully elucidated [13]. This neural simplicity, together with additional favorable characteristics, such as the ease of performing genetic manipulations and monitoring behavior-associated neural activity, has resulted in these worms being extensively studied to investigate how neural circuits constructed from genetic blueprints generate animal behavior. Thus, in the following sections, we discuss the current understanding of the neural mechanisms of navigation, as revealed by studies of insects, rodents, and worms.

3.1 Insects

Many insect species exhibit inherited navigational behaviors, such as orientation and migration. For example, the monarch butterfly Danaus plexippus and the desert locust Schistocerca gregaria have well-known migratory habits; they travel back and forth every year between distant areas that are often over 1,000 km apart [2]. Like many migratory birds, these insects move to a suitable area for breeding. In the next season, the newly born population successfully travels back to the place that their parental generation left, even though they have never experienced that place. These observations indicate that their migratory behavior was genetically programmed during evolution. In addition, many insects exhibit learning-dependent navigation. In particular, species that have their own nests, including many social insects, explore around the nest searching for food and memorize the locations of food so that they can visit the same places again [14]. More surprisingly, honey bees share information on food location with colony members using a waggle dance [14]. Despite this sophisticated navigational behavior, the structures of the insect brain are relatively simple compared to those of vertebrates, such as rodents. Thus, elucidating the brain mechanisms underlying navigation in insects may be useful for revealing the essential components of the sophisticated navigation of animals.

Neural Mechanisms for Detecting External Information

Types of Information. Even though the spatial resolution of the compound eye of an insect is much lower (normally below 1/20) than that of humans, many insects navigate mainly based on visual information. Referring to the polarized skylight (Fig. 1Aa) is a well-known method by which insects deduce orientation [15]. The position of the sun, moon,


and/or stars is another type of global information that helps navigating insects determine their heading direction [12]. In addition, especially in some social insects, local visual information, such as landmarks or panoramic views, is used for memorizing a familiar place, such as the nest or a frequently visited feeding place [16]. Among the types of visually guided information used for navigation in insects described above, polarization vision is the most intensively studied and perhaps the best understood example. We will now briefly review the latest findings on the neural mechanisms underlying insect polarization vision.

Fig. 1. Possible neural mechanisms underlying insect navigation. A. Honeybees use skylight polarization (a) to deduce their heading direction and optic flow generated by self-motion (b) to estimate their traveling distance. B. Morphology of the bee brain. a. A brain in a head capsule. b. A section of the area of the bee brain outlined by the white square in a (photo courtesy of Dr. R. Okada). The neuropil of the regions crucial for navigation (La, Me, Lo, and CB) is shown. The No, PB, and LAL regions are located posterior to CB. Scale bar = 200 μm. C. Neural pathways for directional and distance information processing. The solid and dashed arrows indicate direct (monosynaptic) and indirect (polysynaptic) connections, respectively. Directional and distance information converge in CC, which then sends the information to the thoracic ganglia that control the legs and wings via LAL. CE, compound eye; DRA, dorsal rim area; OL, optic lobe; La, lamina; Me, medulla; Lo, lobula; CC, central complex; CB, central body; PB, protocerebral bridge; No, noduli; LAL, lateral accessory lobe.

Processing the Information Related to a Polarized Skylight. Polarization vision in insects is mediated by a specialized region in the compound eye called the dorsal rim area (DRA, Fig. 1C). The ommatidia, which are the optical units of the compound eye, in this area are extremely polarization-sensitive because of their structural and physio‐ logical properties [17]. Information on the electric field vector (e-vector) of the light


waves of the polarized skylight that is detected in the DRA is then delivered to the central complex (CC, Fig. 1C), which is one of the integrative centers of the insect brain. Many types of CC neurons that respond to polarized light stimuli have been identified in various insect families, including Orthoptera (locusts and crickets) and Hymenoptera (bees and ants) [12, 18]. In locusts, e-vector information on polarized light is topographically represented in the protocerebral bridge (PB, Fig. 1C), which is part of the CC, suggesting that this area is the highest center of polarization vision in insects [19].

The CC Acts as an Internal Compass. To utilize the polarized skylight as a global cue for navigation, time compensation is necessary because the polarization pattern of the sky changes with solar elevation. Although the mechanisms underlying this compensation are still unclear, a group of neurons that sends time-compensated polarized light information to the CC by integrating polarized light and chromatic gradient information has been found in locusts [20]. These findings suggest that the CC is not only the highest brain center of polarization vision but also a potential internal compass that monitors the insect's orientation during navigation. Consistently, some CC neurons in Drosophila fruit flies and cockroaches show responses that are similar to those of the head-direction cells of mammals (see below) [7, 8].

Neural Mechanisms Underlying Self-motion Monitoring

Behaviorally, ants and bees estimate their travel distances by step counting and by optic flow, the image motion caused by self-motion (Fig. 1Ab), respectively [21, 22]. Many types of neurons in the optic lobe (OL, Fig. 1B and C), the primary visual center of the insect brain, respond to optic flow stimuli, but for a long time the downstream pathways that further process this information had not been described. Recently, a group of neurons in the noduli (No, Fig. 1C), one of the input sites to the CC, was found to encode the direction and speed of optic flow stimuli, indicating that they might convey travel distance information to the CC [10]. Because the CC is considered an internal compass as mentioned above, it likely integrates information on direction and distance and then computes path integration. To date, however, little is known about how the CC controls navigational behavior. The lateral accessory lobe (LAL), which is the main output region of the CC, might send steering commands to the thoracic ganglia, the motor centers for the legs and wings (Fig. 1C). More information is needed to understand how the CC stores navigational memories and controls behavior.
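The path-integration computation attributed to the CC can be caricatured in a few lines of code: accumulate displacement from a per-step compass heading and a speed estimate, and read out the homing vector at any time. The sketch below is a purely geometric illustration under simplified assumptions (noise-free heading and speed signals, discrete time steps); it is not a model of the anatomically constrained CC circuit described in [10].

```python
import math

def integrate_path(headings_rad, speeds):
    """Accumulate displacement from per-step heading (e.g., a sky-compass
    estimate) and speed (e.g., from optic flow); return the home vector."""
    x = y = 0.0
    for theta, v in zip(headings_rad, speeds):
        x += v * math.cos(theta)
        y += v * math.sin(theta)
    home_distance = math.hypot(x, y)
    home_direction = math.atan2(-y, -x)   # direction pointing back to the start
    return home_distance, home_direction

# A hypothetical foraging loop: 100 steps heading east, then 100 steps
# heading north-east, at 0.3 m per step.
headings = [0.0] * 100 + [math.pi / 4] * 100
speeds = [0.3] * 200
dist, direction = integrate_path(headings, speeds)
print(f"home vector: {dist:.1f} m at {math.degrees(direction):.1f} deg")
```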


found in the medial entorhinal cortex (MEC), which provides the main cortical input to the hippocampus, and these cells exhibit grid-like, periodic, and hexagonal firing patterns across the environment [24]. Head direction (HD) cells in the presubiculum and MEC fire when an animal faces a particular direction [5, 6]. Border cells in the MEC and boundary cells in the subiculum fire along geometric borders of the local environ‐ ment, such as walls [25, 26]. Speed cells in the MEC are a very recent addition to these cell types, and these cells change their firing rates linearly with an animal’s running speed [27].

Fig. 2. A. A schematic representation of the electrophysiological recording setup in a rodent moving freely in a recording enclosure. B. The firing fields of place cells, grid cells, border cells (also known as boundary cells in the subiculum), and head direction cells (top) and a schematic of the hippocampal-entorhinal region of the rodent brain in which each cell type is found (bottom). The blue squares indicate the recording enclosure, such as the one shown in A, which is seen from the top, and the fields shown in red indicate the subareas in which each cell type fires. Head direction cells fire when the animal's head is oriented in a particular direction. The broken lines in the bottom panel indicate that the cell type found there exhibits slightly different response properties from those of the typical cell types found in the other areas. CA1 and CA3, hippocampal CA1 and CA3 areas; DG, dentate gyrus; S, subiculum; PrS, presubiculum; PaS, parasubiculum; MEC, medial entorhinal cortex.
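For recordings of the kind sketched in Fig. 2A, a place field is conventionally visualized as an occupancy-normalized rate map: spike counts per spatial bin divided by the time spent in each bin. The following snippet shows that standard computation on synthetic data; the enclosure size, bin size, sampling rate, and the simulated Gaussian firing field are assumptions made only for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic trajectory in a 1 m x 1 m enclosure, sampled at 50 Hz (10 min).
n_samples, dt = 30_000, 0.02
position = np.cumsum(rng.normal(scale=0.01, size=(n_samples, 2)), axis=0) % 1.0

# Simulate a cell with a Gaussian firing field centered at (0.7, 0.3).
center, width, peak_rate = np.array([0.7, 0.3]), 0.1, 15.0   # field center, m; Hz
rate = peak_rate * np.exp(-np.sum((position - center) ** 2, axis=1) / (2 * width**2))
spikes = rng.poisson(rate * dt)

# Occupancy-normalized rate map on a 20 x 20 grid of spatial bins.
bins = np.linspace(0.0, 1.0, 21)
occupancy, _, _ = np.histogram2d(position[:, 0], position[:, 1], bins=[bins, bins])
spike_map, _, _ = np.histogram2d(position[:, 0], position[:, 1],
                                 bins=[bins, bins], weights=spikes)
rate_map = np.divide(spike_map, occupancy * dt,
                     out=np.zeros_like(spike_map), where=occupancy > 0)

peak_bin = np.unravel_index(np.argmax(rate_map), rate_map.shape)
print("peak firing bin:", peak_bin, "peak rate (Hz):", round(rate_map.max(), 1))
```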

Neural Circuits of Spatial Representation. The location-specific firing of place cells is shaped by visual and other types of sensory information from environmental cues in addition to self-motion cues, such as motor and vestibular information. When external cues are rotated, many place cells rotate their firing fields accordingly [28]. Moreover, once formed, place cells maintain their firing fields when the lights are off, and they can be formed even in an environment in darkness [29]. The activity of place cells is also influenced by many factors, such as context (e.g., the shape and pattern of the environ‐ ment), events and objects in the environment, and the internal states of the animal, such


as motivation and working memory. A current view hypothesizes that angular and linear self-motion integration (i.e., path integration) plays an important role in determining place cell firing and that errors accumulated from the integration are corrected by the association of the place fields with external cues. How then is the place-specific activity generated? Place cells greatly change their firing in different environments, while grid cells and HD cells maintain their coherent ensemble activity, thus suggesting that these cell types provide an intrinsic metric of space as part of the path-integration system. Since the discovery of grid cells, the place fields of the hippocampal neurons have been assumed to be created by the linear summation of inputs from upstream grid cells with different spatial scales. However, experimental evidence to date does not support this simplistic view [30, 31], which implies that place cells are created through multiple mechanisms, at least one of which is likely to be grid cell-independent. Moreover, in young infant rats, HD cells are already present at 14 days of life, which is generally before the rats exhibit significant spatial exploration and their eyes are open, whereas place cells begin to develop later, around 16 days after birth, and grid cells appear only around 20 days of age [32, 33]. These findings suggest that the directional system that develops first in life might provide inputs that shape other spatial cells. A Possible Role of Place Cells in Navigation. Place cells in the hippocampus exhibit specific activity when the animal passes through particular locations, thus implying that they represent real-time information regarding the current position of the animal when the animal is moving. However, this raises a question about the mechanism by which these place cells contribute to the animal’s navigation towards their goal. One possible answer may exist in the phenomenon called “awake replay”, in which temporallycompressed reactivations of sequences of place cells that reflect past trajectories occur when the animal is awake but not moving [34]. A recent study has demonstrated that place cell sequence events like those in awake replay not only encode spatial trajectories from the animal’s current location to remembered goals but also predict immediate future behavioral paths [35], thus suggesting a role of place cells in trajectory-finding in goal-directed navigation. 3.3 Worms A species of nematode, Caenorhabditis elegans (hereafter simply called worms), which is ~1 mm in length and only possesses ~1,000 cells, has been used worldwide to study the mechanisms of simple brain function as well as other biological phenomena. Some of these studies have resulted in three Nobel prizes (2002 and 2006 for Physiology or Medicine and 2008 for Chemistry). These simple worms are widely used as the main subject of neurobiology for the following reasons. (1) Their nervous system, which consists of only 302 neurons, exhibits simple forms of brain functions, such as sensory perception and learning and memory [36]. (2) The molecular mechanisms regulating the activities of neurons and the neurotransmission of neural activities, which are medi‐ ated by small chemical compounds, such as glutamate, GABA, dopamine, and serotonin, depend on gene products that are functionally very similar to those of higher animals. (3) Because worms crawl relatively slowly (~0.1 mm/s) on the surface of an agar layer,


their behavior can be easily monitored with high precision and little noise. (4) The relationships between behavior and genes, which are the blueprints of all life activities, can be easily analyzed with a large repertoire of sophisticated genetic techniques. (5) Because their bodies are transparent and exogenous genes can be easily introduced, optical monitoring ("imaging") and manipulation of neural activities are feasible by using genetically engineered gene products, such as calcium indicators and light-driven ion channels and pumps [37, 38]. In addition, a comprehensive platform for performing quantitative analyses of the behavior and neurophysiology of the worm's olfactory navigation has been developed by one of the authors' groups [39]. This platform allows for accurate estimates of the dynamic odor concentration changes that each worm experiences during olfactory navigation, and it reproduces these odor concentration changes on a robotic microscope system that automatically tracks and monitors the neuronal activity of freely behaving worms (Fig. 3A). Because high-quality time-series data can be obtained on the sensory stimuli, the behavior, and the neural activity in between, it has become possible to accurately describe the neural activity with a mathematical model (see next paragraph). Moreover, the high-quality data have recently been shown to be useful for machine learning analyses of feature extraction (see Sect. 4). Tanimoto et al. [39] used the robotic microscope system to reveal the following unexpected computational abilities of worm sensory neurons. (1) Increases and decreases in odor concentration are sensed by different sensory neurons. (2) Concentration increases in an unpreferred odor are transformed into neural activity that reflects the time-differential of the odor concentration, which causes the immediate behavioral response of randomly changing the migratory direction (Fig. 3B). (3) In contrast, concentration decreases in the unpreferred odor are transformed into neural activity that reflects the time-integral of the odor concentration changes, which causes a delayed behavioral response of switching the behavioral state from random searching to straight migration in that direction. Interestingly, the temporal integration of sensory information that causes delayed behavioral changes is one of the critical features of decision-making in monkeys and humans [59]. Thus, our results indicate that worms can make decisions based on neural mechanisms similar to those in humans. We have identified the genes responsible for this decision-making in worms (Fig. 3C) [39], whose counterparts in humans might also be responsible for our own decision-making.


Fig. 3. A robotic microscope system used to reveal the neural basis of decision making in worms. A. A schematic drawing of the setup of the robotic microscope system. B. The time-differential and time-integral activities of the sensory neurons involved in decision-making during navigation in an odor gradient. C. The molecular mechanisms of the neural responses. Modified from [39].
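Points (2) and (3) above describe two complementary computations: a fast time-differential that triggers immediate turning, and a slower time-integral that eventually switches the behavioral state. A toy signal-processing version of this idea is sketched below; the synthetic concentration trace, thresholds, and time constant are invented for illustration and are not the values reported in [39].

```python
import numpy as np

dt = 0.1                                   # s, sampling interval
t = np.arange(0, 60, dt)
# Hypothetical odor concentration: rises for 20 s, then decays.
concentration = np.where(t < 20, t / 20.0, np.exp(-(t - 20) / 15.0))

# (2) Time-differential pathway: a large positive dC/dt of an unpreferred odor
#     triggers an immediate, transient "turn" command.
dCdt = np.gradient(concentration, dt)
turn_now = dCdt > 0.04                     # arbitrary threshold

# (3) Time-integral pathway: a leaky integrator of the concentration decrease
#     accumulates evidence and, once it crosses a threshold, switches the
#     behavioral state from random search to straight migration.
tau, integral = 10.0, 0.0
state = []
for dc in -np.minimum(dCdt, 0.0):          # only decreases contribute
    integral += dt * (dc - integral / tau) # leaky accumulation
    state.append("straight_run" if integral > 0.05 else "random_search")

switch_idx = state.index("straight_run") if "straight_run" in state else None
print("first turn command at t =", round(t[np.argmax(turn_now)], 1), "s")
print("state switch at t =", None if switch_idx is None else round(t[switch_idx], 1), "s")
```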


4


Engineering and Computational Methods Used in Animal Navigation Research

Novel engineering and computational approaches have been adopted in neurobiological research of animal navigation to fulfill the requirement of highly accurate manipulations and measurements of sensory input, behavioral output, and neural activity. Here, we introduce two example technologies: navigation tasks using virtual reality (VR) and machine learning analyses of navigation behavior. 4.1 Navigation Tasks Using VR VR refers to a computer-simulated and immersive environment that can present a variety of sensory stimuli in an interactive manner and thereby provide animal and human subjects with simulated experiences of navigation while their brain activities are recorded with various methods, such as functional magnetic resonance imaging (fMRI) and electrophysiology. Virtual navigation tasks have been used successfully to record place cell and grid cell activity in humans [40, 41], which demonstrates the usefulness of VR in bridging the gap between findings obtained from animals and those in humans. In recent years, VR has been increasingly used in navigation research in insects and rodents since its first successful applications in these animals [42, 43]. A recent behav‐ ioral study has demonstrated that goal-directed navigation in VR in mice requires activity of the hippocampus [44], which is also required in real-world situations. VR enables measurements of navigation-related neural activity in these animals with highresolution electrophysiological and optical recording techniques, such as whole-cell patch-clamp recording or two-photon calcium imaging, which require extremely stable fixation of the subject’s head under an electrode or microscope [45]. Another advantage of the use of VR for navigation tasks is the experimental flexibility and controllability of stimulus presentations, which was exemplified in a study in which visual information and the subject’s movement were put into conflict during the recording of hippocampal place cells in mice [46]. These advantages of VR make it an attractive behavioral para‐ digm for use in animal navigation research, and it will help explore yet undiscovered cellular and circuit mechanisms and neural representation schemes underlying naviga‐ tion in the future. 4.2 Discovering Features of Navigation Behavior Using Machine Learning Behavior is the final output of massive amounts of neural activity in the brain. However, descriptions of this behavior have been seriously limited compared to those of neural activities. Currently, the dynamic activities of thousands of neurons can be simultane‐ ously measured by using the optical monitoring methods described above. Thus, neural activity can be expressed as the time-series vector data of thousands of dimensions. In contrast, behavior can still only be described with simple measures, such as speed, direction, and the probability of reaching the goal. Even worse, most video records of animal behavior are simply stored in laboratories without detailed analyses. Such a large


asymmetry in the richness of neural and behavioral data is being recognized as one of the major problems in recent neuroscience [47–49]. The poor description of behavior is due to the difficulty of analyzing measured behavioral data. Recent developments in GPS and small cameras allow us to easily record the positions and postures of animals with high precision for extended periods of time. Most, if not all, of the behavioral features of recorded animals, such as velocity, acceleration, heading direction, body rotation, and posture, must be dynamically affected by environmental information. But on which of the behavioral changes should we focus in the analyses? Should the behavioral changes be analyzed in a temporal window of milliseconds, seconds, or minutes? Should we consider temporal delays between sensory stimuli and behavioral responses? If so, how long should the delay be? Moreover, which sensory stimuli should we pay attention to? These questions show that there are too many factors to consider, and it has been extremely difficult to figure out the relationships between sensory input and behavioral output. The same is true for finding relationships between neural activity and behavior. One way to solve this problem is to use machine learning. The first step in analyzing raw animal behavior data is to classify the behaviors into several distinct behavioral states. It is easy to imagine that animal behavior can be classified into several states, such as sleeping, feeding, chasing prey, and fighting. However, these classifications have traditionally been performed manually by researchers watching videos for a long time, a process in which ambiguous boundaries between states frequently become problematic. Machine learning techniques have recently been used successfully for behavioral classification based on combinations of characteristic patterns of basic behavioral features, such as velocity, acceleration, and heading direction [50–52]. Moreover, behavioral patterns triggered by artificial activation of limited numbers of neurons have been classified by unsupervised learning to estimate functional connectivity in the brain in an unbiased way [53, 54]. However, machine learning techniques have not been used to understand the dynamic brain functions that link sensory information and behavioral responses, because it has been difficult to accurately measure the sensory information that animals receive during behavior. Even when it can be measured, no methods have been established to determine which features of the sensory information affect the behavior. Yamazaki et al. [55] revealed experience-dependent changes in the olfactory navigation of worms by analyzing the relationships between changes in odor concentration and behavior. That group previously developed a method for measuring odor concentrations at specific spatiotemporal points in the small arena used for monitoring worm olfactory navigation, and these results led to the construction of a dynamic model of the odor gradient involving evaporation and diffusion in the arena [39]. The model allowed the changes in odor concentration that each worm experienced during navigation to be estimated with an accuracy on the order of nM/s. The analyses indicated that odor learning resulted in the worms ignoring small changes in odor concentration for efficient olfactory navigation. Interestingly, changes in neural activity consistent with these behavioral changes were revealed by the robotic microscope system [55].
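A minimal version of the unsupervised classification step described above, which turns per-frame behavioral features into a small number of discrete behavioral states, can be written with scikit-learn. The two features (speed and turning rate), the choice of three states, and the synthetic data are assumptions made for illustration; studies such as [50–54] use much richer feature sets and more careful model selection.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Synthetic per-frame features for three hypothetical behavioral states:
# dwelling (slow, wiggly), roaming (fast, straight), turning (slow, high turn rate).
dwell = np.column_stack([rng.normal(0.02, 0.01, 400), rng.normal(20, 10, 400)])
roam  = np.column_stack([rng.normal(0.15, 0.03, 400), rng.normal(5, 3, 400)])
turn  = np.column_stack([rng.normal(0.03, 0.01, 200), rng.normal(90, 20, 200)])
features = np.vstack([dwell, roam, turn])        # columns: speed, |turn rate|

# Standardize, then fit a 3-component Gaussian mixture and read off state labels.
X = StandardScaler().fit_transform(features)
gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
labels = gmm.predict(X)

for k in range(3):
    speed, turn_rate = features[labels == k].mean(axis=0)
    print(f"state {k}: mean speed {speed:.2f} mm/s, mean turn rate {turn_rate:.0f} deg/s")
```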


5


Future Prospects

Understanding how the brain works is one of the big challenges in modern science, as demonstrated by the national brain science projects being conducted in the US, EU, and several countries in Asia [56, 57]. Navigation is one of the prototypical brain functions that is used for understanding the dynamic nature of information processing in the brain because sensory inputs and behavioral outputs can be quantitatively measured and the goal of the behavior is rather straightforward. As we described above, the use of model animals has resulted in significant contributions to neurobiological studies of navigation. However, the findings that have been accumulated from these animals to date are still limited and insufficient to allow for an understanding of dynamic information processing in navigation. To conclude this paper, we suggest the following key issues to be exam‐ ined in future research. (1) The characteristic of sensory information that plays the dominant role in navigation needs to be elucidated at a finer resolution. In the real world, navigation is a multi‐ sensory process that involves vision, audition, olfaction, and self-motion, and the relative importance of each modality may change in a short time span. For example, in olfactory navigation, animals continuously and simultaneously sense various odorants, each of which changes its concentration in different time scales. The situation is much more complex for visual and auditory stimuli because of the rich‐ ness of information they contain. Thus, a method needs to be developed to detect the changes in sensory information that are significantly correlated with those in behavior by considering the highly dynamic nature of some environmental stimuli. (2) Determining which aspect of behavior is changed by the sensory input is important. As described above, deciding which behavioral feature to focus on in the analysis is difficult. One possible way to solve the problem is to extract the important behav‐ ioral features with machine learning and then try to find causal relationships between the sensory inputs and behavioral outputs. However, sensory information and behavioral responses are often in a closed-loop relationship in such a way that changes in sensory information cause behavioral changes, which subsequently elicit further updates of the sensory information that the animal receives. In order to reveal the causal relationships between them, it is important to establish a working hypothesis from the observed sensory behaviors and test it in an open-loop configuration. VR setups with compensated feedback may also be useful toward this direction of research. (3) Similar problems exist in large-scale neural activity data (i.e., hundreds or thou‐ sands of time-series neural activity data). Researchers often have trouble inter‐ preting large high-dimensional datasets without specific hypotheses. However, if making a hypothesis of the relationship between sensory information and behavior becomes easier with the aid of analytical tools, such as machine learning, the iden‐ tification of neural activities that represent higher-order information, such as abstract sensory information, motor planning, Bayesian-like inferences, and deci‐ sion-making will become much easier [58, 59].


(4) Close collaborations between neurobiologists and data scientists are necessary. Navigation research is by nature a multidisciplinary science. In order to discover previously unknown principles by using biologically relevant data analyses, researchers in these two fields need to communicate intensively by speaking a common language. Although it is difficult and not necessary for a researcher to be an expert in both branches of science, a mutual understanding between these two fields is key for fruitful collaborations. The rapid advance of state-of-the-art technologies, as reviewed partly in this article, will help us tackle these problems and unquestionably lead to a better understanding of the neural mechanisms underlying navigation in the future. Acknowledgments. This work was supported by KAKENHI JP 16H06545 (K.D.K), 17H05985 (M. Sato), and 17H05975 (M. Sakura).

References 1. Kanzaki, R., Sugi, N., Shibuya, T.: Self-generated zigzag turning of Bombyx mori males during pheromone-mediated upwind walking. Zool. Sci. 9, 515–527 (1992) 2. Brower, L.P.: Monarch butterfly orientation: missing pieces of a magnificent puzzle. J. Exp. Biol. 199, 93–103 (1996) 3. Wiltschko, W., Wiltschko, R.: Magnetic orientation and magnetoreception in birds and other animals. J. Comp. Physiol. A 191, 675–693 (2005) 4. Goto, Y., Yoda, K., Sato, K.: Asymmetry hidden in birds’ tracks reveals wind, heading, and orientation ability over the ocean. Sci. Adv. 3, e1700097 (2017) 5. Taube, J.S., Muller, R.U., Ranck, J.B.: Head-direction cells recorded from the postsubiculum in freely moving rats. I. Description and quantitative analysis. J. Neurosci. 10, 420–435 (1990) 6. Taube, J.S., Muller, R.U., Ranck, J.B.: Head-direction cells recorded from the postsubiculum in freely moving rats. II. Effects of environmental manipulations. J. Neurosci. 10, 436–447 (1990) 7. Seelig, J.D., Jayaraman, V.: Neural dynamics for landmark orientation and angular path integration. Nature 521, 186–191 (2015) 8. Varga, A.G., Ritzmann, R.E.: Cellular basis of head direction and contextual cues in the insect brain. Curr. Biol. 26, 1816–1828 (2016) 9. Wehner, R., Wehner, S.: Insect navigation: the use of maps or Ariadne’s thread? Ethol. Ecol. Evol. 2, 27–48 (1990) 10. Stone, T., Webb, B., Adden, A., Weddig, N.B., Honkanen, A., Templin, R., Wcislo, W., Scimeca, L., Warrant, E., Heinze, S.: An anatomically constrained model for path integration in the bee brain. Curr. Biol. 27, 3069–3085 (2017) 11. Moser, E.I., Moser, M.-B., McNaughton, B.L.: Spatial representation in the hippocampal formation: a history. Nat. Neurosci. 20, 1448–1464 (2017) 12. Heinze, S.: Unraveling the neural basis of insect navigation. Curr. Opin. Insect Sci. 24, 58– 67 (2017) 13. White, J.G., Southgate, E., Thomson, J.N., Brenner, S.: The Structure of the nervous system of the nematode Caenorhabditis elegans. Phil. Trans. R. Soc. B 314, 1–340 (1986) 14. von Frisch, K.: Dance Language and Orientation of Bees. Harvard University Press, Cambridge (1993)


15. Wehner, R., Labhart, T.: Polarisation vision. In: Warrant, E., Nilsson, D.-E. (eds.) Invertebrate Vision, pp. 291–348. Cambridge University Press, Cambridge (2006) 16. Collett, T.S., Collett, M.: Memory use in insect visual navigation. Nat. Rev. Neurosci. 3, 542– 552 (2002) 17. Labhart, T., Meyer, E.P.: Detectors for polarized skylight in insects: a survey of ommatidial specializations in the dorsal rim area of the compound eye. Microsc. Res. Tech. 47, 368–379 (1999) 18. Sakura, M., Lambrinos, D., Labhart, T.: Polarized skylight navigation in insects: model and electrophysiology of e-vector coding by neurons in the central complex. J. Neurophysiol. 99, 667–682 (2008) 19. Heinze, S., Homberg, U.: Maplike representation of celestial e-vector orientations in the brain of an insect. Science 315, 995–997 (2007) 20. Pfeiffer, K., Homberg, U.: Coding of azimuthal directions via time-compensated combination of celestial compass cues. Curr. Biol. 17, 960–965 (2007) 21. Srinivasan, M.V., Zhang, S., Altwein, M., Tautz, J.: Honeybee navigation: nature and calibration of the “Odometer”. Science 287, 851–953 (2000) 22. Wittlinger, M., Wehner, R., Wolf, H.: The ant odometer: stepping on stilts and stumps. Science 312, 1965–1967 (2006) 23. O’Keefe, J., Dostrovsky, J.: The hippocampus as a spatial map. Preliminary evidence from unit activity in the freely-moving rat. Brain Res. 34, 171–175 (1971) 24. Hafting, T., Fyhn, M., Molden, S., Moser, M.-B., Moser, E.I.: Microstructure of a spatial map in the entorhinal cortex. Nature 436, 801–806 (2005) 25. Lever, C., Burton, S., Jeewajee, A., O’Keefe, J., Burgess, N.: Boundary vector cells in the subiculum of the hippocampal formation. J. Neurosci. 29, 9771–9777 (2009) 26. Solstad, T., Boccara, C.N., Kropff, E., Moser, M.-B., Moser, E.I.: Representation of geometric borders in the entorhinal cortex. Science 322, 1865–1868 (2008) 27. Kropff, E., Carmichael, J.E., Moser, M.-B., Moser, E.I.: Speed cells in the medial entorhinal cortex. Nature 523, 419–424 (2015) 28. O’Keefe, J., Conway, D.H.: Hippocampal place units in the freely moving rat: why they fire where they fire. Exp. Brain Res. 31, 573–590 (1978) 29. Quirk, G.J., Muller, R.U., Kubie, J.L.: The firing of hippocampal place cells in the dark depends on the rat’s recent experience. J. Neurosci. 10, 2008–2017 (1990) 30. Brandon, M.P., Koenig, J., Leutgeb, J.K., Leutgeb, S.: New and distinct hippocampal place codes are generated in a new environment during septal inactivation. Neuron 82, 789–796 (2014) 31. Koenig, J., Linder, A.N., Leutgeb, J.K., Leutgeb, S.: The spatial periodicity of grid cells is not sustained during reduced theta oscillations. Science 332, 592–595 (2011) 32. Langston, R.F., Ainge, J.A., Couey, J.J., Canto, C.B., Bjerknes, T.L., Witter, M.P., Moser, E.I., Moser, M.-B.: Development of the spatial representation system in the rat. Science 328, 1576–1580 (2010) 33. Wills, T.J., Cacucci, F., Burgess, N., O’Keefe, J.: Development of the hippocampal cognitive map in preweanling rats. Science 328, 1573–1576 (2010) 34. Foster, D.J.: Replay comes of age. Annu. Rev. Neurosci. 40, 581–602 (2017) 35. Pfeiffer, B.E., Foster, D.J.: Hippocampal place-cell sequences depict future paths to remembered goals. Nature 497, 74–79 (2013)


36. De Bono, M., Maricq, A.V.: Neuronal substrates of complex behaviors in C. elegans. Annu. Rev. Neurosci. 28, 451–501 (2005) 37. Tian, L., Akerboom, J., Schreiter, E.R., Looger, L.L.: Neural activity imaging with genetically encoded calcium indicators. Prog. Brain Res. 196, 79–94 (2012) 38. Tye, K.M., Deisseroth, K.: Optogenetic investigation of neural circuits underlying brain disease in animal models. Nat. Rev. Neurosci. 13, 251–266 (2012) 39. Tanimoto, Y., Yamazoe-Umemoto, A., Fujita, K., Kawazoe, Y., Miyanishi, Y., Yamazaki, S.J., Fei, X., Busch, K.E., Gengyo-Ando, K., Nakai, J., Iino, Y., Iwasaki, Y., Hashimoto, K., Kimura, K.D.: Calcium dynamics regulating the timing of decision-making in C. elegans. eLife 6, 13819 (2017) 40. Doeller, C.F., Barry, C., Burgess, N.: Evidence for grid cells in a human memory network. Nature 463, 657–661 (2010) 41. Ekstrom, A.D., Kahana, M.J., Caplan, J.B., Fields, T.A., Isham, E.A., Newman, E.L., Fried, I.: Cellular networks underlying human spatial navigation. Nature 425, 184–188 (2003) 42. Fry, S.N., Rohrseitz, N., Straw, A.D., Dickinson, M.H.: TrackFly: virtual reality for a behavioral system analysis in free-flying fruit flies. J. Neurosci. Methods 171, 110–117 (2008) 43. Hölscher, C., Schnee, A., Dahmen, H., Setia, L., Mallot, H.A.: Rats are able to navigate in virtual environments. J. Exp. Biol. 208, 561–569 (2005) 44. Sato, M., Kawano, M., Mizuta, K., Islam, T., Lee, M.G., Hayashi, Y.: Hippocampusdependent goal localization by head-fixed mice in virtual reality. eNeuro 4, ENURO. 0369-16.2017 (2017) 45. Dombeck, D.A., Reiser, M.B.: Real neuroscience in virtual worlds. Curr. Opin. Neurobiol. 22, 3–10 (2012) 46. Chen, G., King, J.A., Burgess, N., O’Keefe, J.: How vision and movement combine in the hippocampal place code. PNAS 110, 378–383 (2013) 47. Anderson, D.J., Perona, P.: Toward a science of computational ethology. Neuron 84, 18–31 (2014) 48. Gomez-Marin, A., Paton, J.J., Kampff, A.R., Costa, R.M., Mainen, Z.F.: Big behavioral data: psychology, ethology and the foundations of neuroscience. Nat. Neurosci. 17, 1455–1462 (2014) 49. Krakauer, J.W., Ghazanfar, A.A., Gomez-Marin, A., MacIver, M.A., Poeppel, D.: Neuroscience needs behavior: correcting a reductionist bias. Neuron 93, 480–490 (2017) 50. Baek, J.-H., Cosman, P., Feng, Z., Silver, J., Schafer, W.R.: Using machine vision to analyze and classify Caenorhabditis elegans behavioral phenotypes quantitatively. J. Neurosci. Methods 118, 9–21 (2002) 51. Branson, K., Robie, A.A., Bender, J., Perona, P., Dickinson, M.H.: High-throughput ethomics in large groups of Drosophila. Nat. Methods 6, 451–457 (2009) 52. Dankert, H., Wang, L., Hoopfer, E.D., Anderson, D.J., Perona, P.: Automated monitoring and analysis of social behavior in Drosophila. Nat. Methods 6, 297–303 (2009) 53. Robie, A.A., Hirokawa, J., Edwards, A.W., Umayam, L.A., Lee, A., Phillips, M.L., Card, G.M., Korff, W., Rubin, G.M., Simpson, J.H., Reiser, M.B., Branson, K.: Mapping the Neural substrates of behavior. Cell 170, 393–406 (2017) 54. Vogelstein, J.T., Park, Y., Ohyama, T., Kerr, R.A., Truman, J.W., Priebe, C.E., Zlatic, M.: Discovery of brainwide neural-behavioral maps via multiscale unsupervised structure learning. Science 344, 386–392 (2014) 55. Yamazaki, S.J., Ikejiri, Y., Hiramatsu, F., Fujita, K., Tanimoto, Y., Yamazoe-Umemoto, A., Yamada, Y., Hashimoto, K., Hiryu, S., Maekawa, T., Kimura, K.D.: Experience-dependent modulation of behavioral features in sensory navigation of nematodes and bats revealed by machine learning. 
bioRxiv, 198879 (2017)


56. Brose, K.: Global neuroscience. Neuron 92, 557–558 (2016) 57. Yuste, R., Bargmann, C.: Toward a global BRAIN initiative. Cell 168, 956–959 (2017) 58. Funamizu, A., Kuhn, B., Doya, K.: Neural substrate of dynamic Bayesian inference in the cerebral cortex. Nat. Neurosci. 19, 1682–1689 (2016) 59. Gold, J.I., Shadlen, M.N.: The neural basis of decision making. Annu. Rev. Neurosci. 30, 535–574 (2007)

Towards Supporting Multigenerational Co-creation and Social Activities: Extending Learning Analytics Platforms and Beyond

Shin’ichi Konomi1(✉), Kohei Hatano1, Miyuki Inaba1, Misato Oi1, Tsuyoshi Okamoto1, Fumiya Okubo1, Atsushi Shimada2, Jingyun Wang3, Masanori Yamada1, and Yuki Yamada1

1 Faculty of Arts and Science, Kyushu University, 744, Motooka, Nishi-ku, Fukuoka 819-0395, Japan [email protected] 2 Graduate School of Information Science and Electrical Engineering, Kyushu University, 744, Motooka, Nishi-ku, Fukuoka 819-0395, Japan 3 Research Institute for Information Technology, Kyushu University, 744, Motooka, Nishi-ku, Fukuoka 819-0395, Japan

Abstract. As smart technologies pervade our everyday environments, they change what people should learn to live meaningfully as valuable participants of our society. For instance, ubiquitous availability of smart devices and communication networks may have reduced the burden for people to remember factual information. At the same time, they may have increased the benefits to master the uses of new digital technologies. In the midst of such a social and technological shift, we could design novel integrated platforms that support people at all ages to learn, work, collabo‐ rate, and co-create easily. In this paper, we discuss our ideas and first steps towards building an extended learning analytics platform that elderly people and unskilled adults can use. By understanding the characteristics and needs of elderly learners and addressing critical user interface issues, we can build pervasive and inclusive learning analytics platforms that trigger contextual reminders to support people at all ages to live and learn actively regardless of age-related differences of cognitive capabili‐ ties. We discuss that resolving critical usability problems for elderly people could open up a plethora of opportunities for them to search and exploit vast amount of information to achieve various goals. Keywords: Pervasive learning · Learning analytics Multigenerational co-creation · Elderly people · Learning environment Super-aging societies

1

Introduction

As smart technologies pervade our everyday environments, they change what people should learn to live meaningfully as valuable participants of our society. For instance, ubiquitous availability of smart devices and communication networks may have reduced the burden for people to remember factual information. At the same time, they may have

© Springer International Publishing AG, part of Springer Nature 2018
N. Streitz and S. Konomi (Eds.): DAPI 2018, LNCS 10922, pp. 82–91, 2018.
https://doi.org/10.1007/978-3-319-91131-1_6


increased the benefits of mastering new digital technologies. In the midst of such a social and technological shift, we could design novel integrated platforms that support people at all ages to learn, work, collaborate, and co-create easily. In this paper, we discuss our ideas and first steps towards building an extended learning analytics platform that elderly people and unskilled adults can use. By understanding the characteristics and needs of elderly learners and addressing critical user interface issues, we can build pervasive and inclusive learning analytics platforms that trigger contextual reminders to support people at all ages to live and learn actively regardless of age-related differences in cognitive capabilities. We also discuss how resolving critical usability problems for elderly people could open up a plethora of opportunities for them to search and exploit vast amounts of information to achieve various goals. We believe that such a platform can play critical roles in addressing the societal challenges of a declining population and a super-aging society by increasing the mobility of human resources and expanding the working population. The impact can be substantial in many countries. For example, Japan has more than 6 million “potential workers,” who do not have jobs despite their willingness to work, and more than 28 million active seniors who do not need caregiving. Learning support systems for these populations would increase their opportunities to participate in various social activities, thereby potentially making major societal and economic impacts.

2

Limitations to Conventional Systems

There is increasing interest in exploiting distributed, ambient, and pervasive digital infrastructures, including mobile devices, wearable devices, and IoT devices, to support learning and intellectual work. In addition, the rise of crowdsourcing and sharing economy platforms is enabling novel and flexible means of connecting with and participating in various social activities. However, conventional technological environments for supporting learning and intellectual activities cannot fully cater to the needs and opportunities arising in this context. One of the key components in designing distributed, ambient, and pervasive learning environments is arguably the data generated and consumed by inter-connected people, things, and spaces. Indeed, data-driven approaches such as learning analytics are increasingly popular in the research and practice of learning-support technologies. Existing learning analytics platforms, however, are inherently limited in capturing the whole picture of learning and its relevant contexts. Many conventional learning analytics environments go only as far as analyzing patterns of learners’ access to digital learning materials. One can argue that this is only a first step towards understanding learners and their contexts to improve learning. We argue for the need to collect more data, for example by using sensors, so as to gain a holistic view of learners and their environments. Doing so would enable timely and appropriate feedback to learners and teachers. Recent advances in sensing and IoT technologies have made it easier to create such systems and environments. Multimodal learning analytics [4], for example, employs sensing


devices to extend conventional learning analytics for classrooms. This however is insuffi‐ cient for supporting adults and elderly people as their learning would often take place outside classrooms.

3

General Approach

Our goal is to build a learning-support system that interacts with learners and teachers by exploiting sensing and data analysis techniques. In doing so, we focus on the needs of elderly people and unskilled adults. As shown in Fig. 1, the system considers different learning environments, including classroom lectures, peer learning, and learning through practice. The development process first focuses on an e-learning platform and improves it to facilitate the acquisition of new skills. We will then support adaptive learning and the acquisition of informal knowledge. We expect that this development effort will lead to various learning-support services targeting different occupations and skills.

[Figure 1 shows a stack of development stages: e-Learning Platform; Adaptive learning-support systems; Acquiring new skills easily; Acquiring informal knowledge easily; Expansion of services targeting various occupations and skills.]

Fig. 1. Overview of the development of extended learning analytics platform.

In order to address the bottlenecks of learning and social activities by elderly people, we plan to exploit a quantitative approach to examine the bottlenecks in collaboration with brain and cognitive scientists. For example, we can measure arousal of consciousness or degree of concentration based on brain-wave and eye-gaze data in different contexts. Such quantitative measures can be useful for not only analyzing learning behaviors but also trig‐ gering contextual reminders at the right time (e.g., triggering proper reminders when consciousness is aroused.) We can then explore and examine the right timing to trigger different types of reminders in terms of effective memorizing and recalling. Although


reasonably reliable brain-sensing devices cannot be used “in the wild”, we could explore other contextual information as proxies for brain and cognitive activities. We focus on the kinds of contextual information that can be measured easily with commodity devices, including mobile, wearable, and stationary sensors. Based on such data, we can derive locations, presence, and activities of people and physical objects, as well as learning contexts and social networks. Our research efforts focus on the kinds of practical learning that lead to increased opportunities for social participation. They include acquiring the skills to use digital technologies fluently or the skills of caregiving. These kinds of learning require more than just remembering pre-packaged knowledge, as they require the acquisition of what we might consider “living knowledge.” The acquisition of such skills by elderly people and unskilled adults can be an important starting point for addressing the challenges of a decreasing labor force. There are three user interface issues that must be addressed to extend learning analytics platforms successfully:
1. Designing for all: As this type of learning-support platform could have significant long-term impacts on people’s quality of life, it should be accessible, usable, and useful for everyone. Thus, the system should be designed for inclusiveness from the very beginning. To develop inclusive user interfaces, we can employ user-centered and participatory design processes and exploit pervasive off-the-desktop computing technologies. We also have to design push-based user interfaces for people with declining cognitive capabilities. This can be extremely challenging if they have difficulties providing appropriate feedback to system designers. In this case, we can look into people’s “honest signals” based on various sensors in commodity devices as well as physiological sensors (e.g., EEG sensors, eye trackers, etc.).
2. Sustaining continuous use: For data-driven approaches such as learning analytics to work, systems must collect and accumulate a large amount of relevant data. In our effort to extend learning analytics, systems must collect data by encouraging, motivating, and sustaining continuous use by elderly and unskilled learners.
3. Support for learning communities: Although elderly people may not be good at memorizing things and quickly getting used to new environments, they can play important roles in learning communities [1]. An interesting challenge in this context is the development of social user interfaces that help people collaborate, co-create, and learn effectively in learning communities. These communities and groups would involve elderly people having similar skills and experiences, elderly people having different skills and experiences, or multigenerational people characterized by a wide range of capabilities and experiences.
Again, we aim to support learning and social activities by elderly people and unskilled adults by accumulating and using data from different learners in different contexts. We can also exploit such data to match people and jobs, thereby potentially creating workforces at companies, local communities, and homes (e.g., telework).
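As a toy illustration of triggering “proper reminders when consciousness is aroused,” the sketch below checks a stream of context samples (a location tag plus an arousal estimate in [0, 1]) against pending reminders and delivers a reminder only when both conditions are met. The arousal threshold, the context fields, and the reminder format are all invented for this example; they are not part of any existing learning analytics platform.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Reminder:
    message: str
    place: str              # deliver only at this location
    min_arousal: float      # deliver only when the learner seems attentive
    delivered: bool = False

def check_reminders(reminders: List[Reminder], place: str, arousal: float) -> Optional[str]:
    """Return the first matching undelivered reminder for the current context."""
    for r in reminders:
        if not r.delivered and r.place == place and arousal >= r.min_arousal:
            r.delivered = True
            return r.message
    return None

reminders = [Reminder("Review yesterday's caregiving exercise", "home", 0.6),
             Reminder("Ask a peer about the tablet settings", "community_center", 0.5)]

# Hypothetical context samples: (location, arousal estimate).
stream = [("home", 0.3), ("home", 0.7), ("community_center", 0.8)]
for place, arousal in stream:
    message = check_reminders(reminders, place, arousal)
    if message:
        print(f"at {place} (arousal {arousal}): {message}")
```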


4


Understanding Elderly Learners

We have reviewed relevant literature to understand the characteristics of elderly people as learners and examined their implications for designing digital learning-support environments [1]. Although elderly people may not be very good at memorizing things or getting used to new things quickly, they have good potential to play important roles in learning communities. Thus, we consider two strategies in designing learning-support technologies for elderly people: (1) facilitating the perception of the self that recognizes learning as a self-behavior, and (2) supporting collaborative learning. Collaborative learning provides a way of building knowledge through activities of collaboration with others, such as group work. We have examined the impact of group composition on group work-based learning involving university students [2], which has some implications for designing group learning environments for elderly people with diverse experiences, knowledge, and learning styles, each of which can be quantified for recommending optimal group compositions. To cope with insufficient memory abilities, people often utilize external memory aids and routines in everyday life. This means that information is often remembered not simply in the head of a person but also in the environment surrounding the person (cf. [5, 6]). If people have good tools and environments that support living and learning, their potential to learn and live active lives can increase substantially. This is relevant not only to elderly people but also to people of all ages. As we extend learning analytics platforms for people of all ages, we can consider not only the optimization of teaching methods and learning materials but also the optimization of support tools and environments for learning and living (i.e., the broader context of learning). In this context, it is critical to understand elderly learners from the perspective of their capability to organize and utilize external tools and environments effectively. In the coming years, the capability to utilize the Internet, mobile and wearable devices, social media, and AI tools effectively will likely be of critical importance for improving learning for all. Elderly people who are not yet fully exploiting digital technologies could expand their potential substantially by optimally restructuring their tools and environments for learning and living. As we will discuss later, addressing the digital divide is therefore of critical importance.
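One possible way to operationalize “recommending optimal group compositions” is to represent each learner as a vector of quantified attributes (skills, experience, learning style) and assign learners to groups so that each group mixes different attribute levels. The greedy snake-order heuristic below is only one naive possibility, applied to made-up learner profiles; the study cited as [2] does not prescribe this particular algorithm.

```python
from typing import Dict, List

# Hypothetical learner profiles: each value is a 0-10 self- or test-assessed score.
learners: Dict[str, Dict[str, int]] = {
    "Aiko":  {"digital_skill": 8, "domain_experience": 3},
    "Ben":   {"digital_skill": 2, "domain_experience": 9},
    "Chie":  {"digital_skill": 6, "domain_experience": 6},
    "Daan":  {"digital_skill": 1, "domain_experience": 2},
    "Emi":   {"digital_skill": 9, "domain_experience": 8},
    "Fumio": {"digital_skill": 4, "domain_experience": 5},
}

def diverse_groups(profiles: Dict[str, Dict[str, int]], n_groups: int) -> List[List[str]]:
    """Sort learners by overall score and deal them out in snake order,
    so each group mixes stronger and weaker members."""
    ranked = sorted(profiles, key=lambda name: sum(profiles[name].values()), reverse=True)
    groups: List[List[str]] = [[] for _ in range(n_groups)]
    for i, name in enumerate(ranked):
        round_idx, pos = divmod(i, n_groups)
        target = pos if round_idx % 2 == 0 else n_groups - 1 - pos   # snake order
        groups[target].append(name)
    return groups

print(diverse_groups(learners, n_groups=2))
```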

5 Sensing the Contexts of Learners

In order to sense and compute the contextual information of learners, we can exploit various sensors that can be used in lab settings or in everyday environments. We use the following two types of devices in a lab setting to sense learners' physiological activities in relation to their context:

1. EEG sensor (Cognionics Quick-20) for measuring brain waves of learners and quantifying alertness, etc.
2. Eye tracker (Tobii Pro Spectrum 150 Hz) for measuring eye movements of learners and quantifying degrees of concentration, etc.
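As an illustration of how such physiological measurements might be reduced to simple context features, the sketch below computes a commonly used engagement/alertness heuristic (the ratio of beta-band to alpha-plus-theta-band power) from a single raw EEG channel. This is only one possible heuristic under stated assumptions; the sampling rate, band limits, and the index itself are illustrative and are not necessarily what our platform uses.

# Illustrative sketch: estimate an alertness/engagement index from one EEG channel.
# Assumptions (not from the paper): 500 Hz sampling, beta/(alpha+theta) heuristic.
import numpy as np
from scipy.signal import welch

def band_power(freqs, psd, lo, hi):
    """Integrate the power spectral density over [lo, hi) Hz (rectangle rule)."""
    mask = (freqs >= lo) & (freqs < hi)
    return np.sum(psd[mask]) * (freqs[1] - freqs[0])

def alertness_index(eeg_channel, fs=500):
    """Return the beta / (alpha + theta) band-power ratio for one EEG channel."""
    freqs, psd = welch(eeg_channel, fs=fs, nperseg=fs * 2)  # 2-second windows
    theta = band_power(freqs, psd, 4, 8)
    alpha = band_power(freqs, psd, 8, 13)
    beta = band_power(freqs, psd, 13, 30)
    return beta / (alpha + theta + 1e-12)

# Example with synthetic data standing in for a Quick-20 recording.
rng = np.random.default_rng(0)
signal = rng.normal(size=500 * 60)  # one minute of noise-like EEG
print(f"alertness index: {alertness_index(signal):.3f}")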

We consider the following sensing technologies to sense learners in everyday environments:

1. Absolute and relative locations of people and physical objects. For example, we could obtain sub-meter location information by using QZSS (Quasi-Zenith Satellite System)/RTK-GPS (Real Time Kinematic GPS) in outdoor spaces and WiFi CSI (Channel State Information) in indoor spaces.
2. Presence of people, things, information, spaces, and events in proximity. For example, we can use WiFi and/or Bluetooth signals to capture co-presence automatically, or employ a human computation approach based on crowdsourcing. We have discussed the potential of wearable devices such as smartwatches [3].
3. Body movements. Mobile and wearable sensors as well as IoT devices can be used to recognize people's activities and detect anomalies quickly.
4. Learning behaviors and experiences. These can be captured based on the log data generated by various learning support systems. This is the kind of data that is mainly used in conventional learning analytics platforms, and may include the scores of tests for measuring the outcomes of learning.
5. Social networks of learners. We can derive various social networks based on learners' locations, presence, body movements, and learning behaviors. We can analyze and use them to support collaborative learning.
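The sketch below illustrates one way items 2 and 5 could be combined: deriving a co-presence social network from Bluetooth proximity scans recorded by learners' commodity devices. The log format, RSSI threshold, and time values are illustrative assumptions rather than the actual data model of our platform.

# Illustrative sketch: build a co-presence graph from Bluetooth proximity scans.
# Each record is (observer_id, observed_id, timestamp, rssi); all values are
# hypothetical and only stand in for data from commodity devices.
from collections import defaultdict
import networkx as nx

scans = [
    ("alice", "bob", 1000, -62),
    ("alice", "bob", 1060, -60),
    ("bob", "carol", 1100, -55),
    ("alice", "carol", 1120, -80),  # too weak to count as co-presence
]

RSSI_THRESHOLD = -70   # assumed cut-off for "nearby"
co_presence = defaultdict(int)

for observer, observed, _ts, rssi in scans:
    if rssi >= RSSI_THRESHOLD:
        pair = tuple(sorted((observer, observed)))
        co_presence[pair] += 1  # count co-presence observations per pair

# Weighted, undirected social network of learners.
graph = nx.Graph()
for (a, b), weight in co_presence.items():
    graph.add_edge(a, b, weight=weight)

print(graph.edges(data=True))  # could feed into collaborative-learning support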

6 Contextual Reminders

There has been extensive previous work in the area of context-aware reminders. The comMotion environment [7] allows users to create reminders using a graphical user interface that resembles a paper to-do list, and delivers them based on location and time. The CybreMinder tool [8] extends the range of contextual information by using the Context Toolkit [9]. More recently, push services have become available on most smartphone operating systems [10], making it easy to develop applications that send notifications based on mobile context. Commercial services such as Nixle allow authorities to send notifications to local residents via SMS, web and email [11].

Researchers have run field studies of location-based personal reminders to examine their usage patterns. The exploratory study of Place-Its [12] suggests that location can be a convenient proxy for context that cannot be easily captured. The field study of PlaceMail [13] shows that people's preferred delivery points of reminders can be affected by situational factors such as patterns of human movements and the geography of corresponding areas. Also, recent studies of mobile notifications show that people typically view Android notifications within minutes [14] and that recipients' perceived values of notifications differ across app categories [15].

Studies on mobile interruptions show that the content of a message plays an important role in influencing users' receptivity to mobile interruptions [16], and that notifications received after an episode of mobile interaction, such as calling someone or reading a text message, are responded to more quickly [17]. Other researchers report that notifications received at the transition of physical activities, such as sitting and walking, are perceived more positively [18]. The Memory Glasses project proposes to send reminders based on the user's activity using body-worn sensors [19]. Similar approaches exploit sensing devices to cope with these problems [18, 20]. Recent proposals focus on smartphones and/or exploit machine learning [21–24]. Hatano discusses machine learning techniques to provide contextual feedback to elderly learners [25].

A relevant genre of context-aware applications is mobile guides, which provide tourists and museum visitors with relevant information to support their experiences in situ [26, 27]. Magitti is a mobile guide that recommends leisure-related information based on categories of activities including Eating, Shopping, Seeing, Doing and Reading [28]. There are also mobile guides that combine context awareness and personalization [29].

Existing mobile guides and context-aware reminders often focus on the mechanisms to deliver information rather than the process to create content. In practice, a "curator" would have to create information content in many cases. This approach to content creation does not necessarily scale, as an expert is needed to create content. Also, these are usually one-way information channels (i.e., users just "consume" content and cannot generate it). Therefore, it is not easy to transfer a system to a new location or community. "Bottom-up" approaches to generating content may have interesting potential, as a recent web-based experiment with the production of personal city guides suggests [30]. Community Reminder takes advantage of communities and discusses how reminders can be created and received in relation to the collective concerns of a community [31].

Although reminders are typically triggered by automated mechanisms, it often requires some human skill to design and utilize them effectively by weaving them into the lives of different people. The notion of tools for living and tools for learning [32] is important for understanding the uses and the design of reminders, as some reminders are intended as memory aids for living (e.g., a reminder to take a medicine), and other reminders are intended as support for learning (e.g., a reminder to learn to take a medicine, which can disappear when the recipient finishes learning it). The skills to use reminders fluently can be extremely useful for living and learning actively regardless of age-related differences in cognitive capabilities.
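To make the distinction between tools for living and tools for learning concrete, the sketch below shows a minimal contextual reminder that fires on location and time, and that can be configured to retire itself once the recipient has demonstrated learning. The class name, trigger rule, and success threshold are hypothetical illustrations; they are not the design of Community Reminder or of any system cited above.

# Minimal sketch of a contextual reminder that can act as a "tool for living"
# (always fires) or a "tool for learning" (retires after repeated successes).
# All names and thresholds are hypothetical illustrations.
from dataclasses import dataclass

@dataclass
class LearningReminder:
    message: str
    place: str                 # symbolic place name, e.g. "kitchen"
    hour: int                  # hour of day at which the reminder is relevant
    for_learning: bool = True  # if True, retire after enough unaided successes
    unaided_successes: int = 0
    retired: bool = False

    def should_fire(self, current_place: str, current_hour: int) -> bool:
        """Fire when the recipient is at the right place at the right time."""
        if self.retired:
            return False
        return current_place == self.place and current_hour == self.hour

    def record_outcome(self, acted_without_reminder: bool) -> None:
        """Retire a learning reminder after, say, three unaided successes."""
        if not self.for_learning:
            return
        if acted_without_reminder:
            self.unaided_successes += 1
            if self.unaided_successes >= 3:
                self.retired = True
        else:
            self.unaided_successes = 0

reminder = LearningReminder("Take the morning medicine", place="kitchen", hour=8)
print(reminder.should_fire("kitchen", 8))   # True: support is still needed
for _ in range(3):
    reminder.record_outcome(acted_without_reminder=True)
print(reminder.should_fire("kitchen", 8))   # False: the skill has been learned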

7 Pervasive and Inclusive Learning Analytics

Pervasive computing in its ultimate form makes computers disappear physically and mentally. This would effectively make the digital divide disappear. Smartphones, tablets, wearables, interactive surfaces, networked actuators, digital fabrication tools, and various other IoT devices could be seen as transient forms of less and less obtrusive interfaces to computational services. Some of these devices, such as tablets, have already made computing services more accessible to everyone, including elderly people. Thus, pervasive learning analytics is not merely about increasing sensor data for analysis but also about reducing physical barriers to accessing computing services.

Existing research on internet skills and the digital divide shows that age may only affect some internet skills but not all [33]. What age may affect are operational and formal internet skills. Operational internet skills concern operating software tools, for example by typing URLs in the browser's location bar. Formal internet skills concern navigating websites without becoming disoriented. These are the skills that can be influenced by specific implementations of the browser's user interfaces. Improving user interfaces by designing them for inclusiveness can minimize the negative impact of the decline of these skills.

What seems encouraging is that age may not affect content-related internet skills [33]. Content-related internet skills include information internet skills and strategic internet skills. Information internet skills concern the search processes involving choosing a website or a search engine, defining search options and queries, selecting information, and evaluating information sources. Strategic internet skills concern developing an orientation towards a particular goal, taking the right action to reach this goal, making the right decision to reach this goal, and gaining the benefits resulting from this goal. All in all, improving usability sufficiently for elderly people can open up a plethora of opportunities for them to find useful information and achieve their various goals.

8 Conclusion

We have discussed our ideas and the first steps towards building an extended learning analytics platform that elderly people and unskilled adults can use. By understanding the characteristics and needs of elderly learners and addressing critical user interface issues, we can build pervasive and inclusive learning analytics platforms that trigger contextual reminders to support people of all ages in living and learning actively regardless of age-related differences in cognitive capabilities. Existing research suggests that resolving usability problems for elderly people could open up a plethora of opportunities for them to search and exploit vast amounts of information to achieve various goals. We have begun to collaborate with the city of Itoshima to test the feasibility of such a platform. Our first exploratory trials exploited an existing learning analytics platform and involved 48 elderly people with varied computer skills. We intend to support acquisition of informal as well as formal knowledge in the future to pave the way for the learning-support infrastructure of the future, which maximizes the potential of people of all ages to work and create together effectively.

Acknowledgement. This work was supported by JST Mirai Grant Number 17-171024547, Japan.

References

1. Yamada, M., Oi, M., Konomi, S.: Effective learning environment design for aging well: a review. In: Streitz, N., Konomi, S. (eds.) DAPI 2018. LNCS, vol. 10922, pp. 253–264. Springer, Heidelberg (2018)
2. Taniguchi, Y., Gao, Y., Kojima, K., Konomi, S.: Evaluating learning style-based grouping strategies in real-world collaborative learning environment. In: Streitz, N., Konomi, S. (eds.) DAPI 2018. LNCS, vol. 10922, pp. 227–239. Springer, Heidelberg (2018)
3. Shimada, A.: Potential of wearable technology for super-aging societies. In: Streitz, N., Konomi, S. (eds.) DAPI 2018. LNCS, vol. 10922, pp. 214–226. Springer, Heidelberg (2018)


4. Blikstein, P.: Multimodal learning analytics. In: Proceedings of the Third International Conference on Learning Analytics and Knowledge, pp. 102–106. ACM, New York (2013)
5. Hutchins, E.: Cognition in the Wild. MIT Press, Cambridge (1995)
6. Fischer, G., Arias, E., Carmien, S., Eden, H., Gorman, A., Konomi, S., Sullivan, J.: Supporting collaboration and distributed cognition in context-aware pervasive computing environments. In: Paper Presented at the 2004 Meeting of the Human Computer Interaction Consortium "Computing Off the Desktop", 25 pp. (2004)
7. Marmasse, N., Schmandt, C.: Location-aware information delivery with comMotion. In: Proceedings of 2nd International Symposium on Handheld and Ubiquitous Computing, pp. 157–171 (2000)
8. Dey, A.K., Abowd, G.D.: CybreMinder: a context-aware system for supporting reminders. In: Thomas, P., Gellersen, H.-W. (eds.) HUC 2000. LNCS, vol. 1927, pp. 172–186. Springer, Heidelberg (2000). https://doi.org/10.1007/3-540-39959-3_13
9. Dey, A.K., Abowd, G.D., Salber, D.: A conceptual framework and a toolkit for supporting the rapid prototyping of context-aware applications. Hum. Comput. Interact. 16, 97–166 (2001)
10. Warren, I., Meads, A., Srirama, S., Weerasinghe, T., Paniagua, C.: Push notification mechanisms for pervasive smartphone applications. IEEE Pervasive Comput. 13(2), 61–71 (2014)
11. Nixle. http://www.nixle.com/
12. Sohn, T., Li, K.A., Lee, G., Smith, I., Scott, J., Griswold, W.G.: Place-Its: a study of location-based reminders on mobile phones. In: Beigl, M., Intille, S., Rekimoto, J., Tokuda, H. (eds.) UbiComp 2005. LNCS, vol. 3660, pp. 232–250. Springer, Heidelberg (2005). https://doi.org/10.1007/11551201_14
13. Ludford, P.J., Frankowski, D., Reily, K., Wilms, K., Terveen, L.: Because I carry my cell phone anyway: functional location-based reminder applications. In: Proceedings of CHI 2006, pp. 889–898 (2006)
14. Pielot, M., Church, K., de Oliveira, R.: An in-situ study of mobile phone notifications. In: Proceedings of MobileHCI 2014, pp. 233–242 (2014)
15. Shirazi, A.S., Henze, N., Dingler, T., Pielot, M., Weber, D., Schmidt, A.: Large-scale assessment of mobile notifications. In: Proceedings of CHI 2014, pp. 3055–3064 (2014)
16. Fischer, J.E., Yee, N., Bellotti, V., Good, N., Benford, S., Greenhalgh, C.: Effects of content and time of delivery on receptivity to mobile interruptions. In: Proceedings of MobileHCI 2010, pp. 103–112 (2010)
17. Fischer, J.E., Greenhalgh, C., Benford, S.: Investigating episodes of mobile phone activity as indicators of opportune moments to deliver notifications. In: Proceedings of MobileHCI 2011, pp. 181–190 (2011)
18. Ho, J., Intille, S.S.: Using context-aware computing to reduce the perceived burden of interruptions from mobile devices. In: Proceedings of CHI 2005, pp. 909–918 (2005)
19. DeVaul, R.W., Clarkson, B., Pentland, A.S.: The memory glasses: towards a wearable, context aware, situation-appropriate reminder system. In: Proceedings of CHI 2000 Workshop on Situated Interaction in Ubiquitous Computing (2000)
20. Fogarty, J., Hudson, S.E., Atkeson, C.G., Avrahami, D., Forlizzi, J., Kiesler, S., Lee, J.C., Yang, J.: Predicting human interruptibility with sensors. ACM Trans. Comput.-Hum. Interact. 12(1), 119–146 (2005)
21. Pejovic, V., Musolesi, M.: InterruptMe: designing intelligent prompting mechanisms for pervasive applications. In: Proceedings of UbiComp 2014, pp. 897–908 (2014)
22. Pielot, M., De Oliveira, R., Kwak, H., Oliver, N.: Didn't you see my message? Predicting attentiveness to mobile instant messages. In: Proceedings of CHI 2014, pp. 3319–3328 (2014)


23. Rosenthal, S., Dey, A.K., Veloso, M.: Using decision-theoretic experience sampling to build personalized mobile phone interruption models. In: Lyons, K., Hightower, J., Huang, E.M. (eds.) Pervasive 2011. LNCS, vol. 6696, pp. 170–187. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-21726-5_11
24. Smith, J., Lavygina, A., Ma, J., Russo, A., Dulay, N.: Learning to recognise disruptive smartphone notifications. In: Proceedings of MobileHCI 2014, pp. 121–124 (2014)
25. Hatano, K.: Can machine learning techniques provide better learning support for elderly people? In: Streitz, N., Konomi, S. (eds.) DAPI 2018. LNCS, vol. 10922, pp. 178–187. Springer, Heidelberg (2018)
26. Abowd, G.D., Atkeson, C.G., Hong, J., Long, S., Kooper, R., Pinkerton, M.: Cyberguide: a mobile context-aware tour guide. Wirel. Netw. 3(5), 421–433 (1997)
27. Cheverst, K., Davies, N., Mitchell, K., Friday, A., Efstratiou, C.: Developing a context-aware electronic tourist guide: some issues and experiences. In: Proceedings of CHI 2000, pp. 17–24 (2000)
28. Bellotti, V., Begole, B., Chi, E.H., Ducheneaut, N., Fang, J., Isaacs, E., King, T., Newman, M.W., Partridge, K., Price, B., Rasmussen, P., Roberts, M., Schiano, D.J., Walendowski, A.: Activity-based serendipitous recommendations with the Magitti mobile leisure guide. In: Proceedings of CHI 2008, pp. 1157–1166 (2008)
29. Ardissono, L., Kuflik, T., Petrelli, D.: Personalization in cultural heritage: the road travelled and the one ahead. User Model. User-Adap. Inter. 22(1–2), 73–99 (2011)
30. Cranshaw, J.B., Luther, K., Kelley, P.G., Sadeh, N.: Curated city: capturing individual city guides through social curation. In: Proceedings of CHI 2014, pp. 3249–3258 (2014)
31. Sasao, T., Konomi, S., Kostakos, V., Kuribayashi, K., Goncalves, J.: Community reminder: participatory contextual reminder environments for local communities. Int. J. Hum. Comput. Stud. 102, 41–53 (2017)
32. Carmien, S., Fischer, G.: Tools for living and tools for learning. In: Proceedings of HCI International Conference (HCII), Las Vegas, CD-ROM (2005)
33. van Deursen, A., van Dijk, J.: Internet skills and the digital divide. New Media Soc. 13(6), 893–911 (2010)

Designing a Mobile Behavior Sampling Tool for Spatial Analytics

Shin'ichi Konomi1(✉) and Tomoyo Sasao2

1 Kyushu University, Fukuoka, Japan
[email protected]
2 Tokushima University, Tokushima, Japan

Abstract. In this paper, we build on our previous research [1, 4] to explore techniques and tools for collecting detailed behavioral data in large public spaces by deploying a small number of technology-armed researchers who act according to mobile notifications. To go beyond the limitations of conventional urban sensing, we first examine the challenges of human-in-the-loop sensing. We then propose a mobile behavior sampling tool based on smart notifications to address the challenge of in-situ sampling.

Keywords: Mobile behavior sampling · Spatial analytics

1 Introduction

As urban sensing technologies advance, there is an increasing amount of macroscopic and microscopic data about urban spaces, which are collected and stored in digital forms. For example, mobile phones generate a large amount of Call Detail Records (CDR) that allow researchers to analyze macroscopic patterns of urban dynamics. Surveillance cameras generate a lot of image data that can be used to analyze microscopic patterns of pedestrians and vehicles. Collecting such data would enable effective spatial analytics for designing and improving commercial districts, railway station areas, etc. However, it continues to be difficult to collect rich, microscopic data about large spaces such as city blocks and neighborhoods.

In this paper, we build on our previous research [1, 4] to explore techniques and tools for collecting detailed behavioral data in large public spaces by deploying a small number of technology-armed researchers who act according to mobile notifications. To go beyond the limitations of conventional urban sensing, we first examine the challenges of human-in-the-loop sensing. We then propose a mobile behavior sampling tool based on smart notifications to address the challenge of in-situ sampling. One of the challenges in designing smart notifications is to optimize collective data collection efforts to satisfy clearly defined criteria such as minimization of biases. We thus discuss notifications that are based on statistical and spatial models to address this challenge. When the goal of data collection is to construct datasets for machine learning algorithms, we could exploit proactive and active machine learning techniques as well.


Another challenge is due to the wickedness [12] of the data collection for understanding (and designing for) people and their practices in urban spaces. In particular, we need to consider shifting modes of observation [13] and exploratory processes [5], and thus focus on satisficing rather than optimal solutions in many cases.

2 Limitations to Conventional Urban Sensing

Despite the recent advances in urban sensing technologies, it is still difficult to collect sufficient data for detailed analysis of various human behaviors at scale. Conventional approaches to urban data collection have a number of limitations, as discussed below:

(i) Pedestrian traffic census is widely used to quantify the number of people who pass by a particular location in a city. An apparent limitation of this approach is the high cost of direct observation.
(ii) Mobile phone carriers collect Call Detail Records (CDR), which allow for tracking of mobile-phone users. However, the granularity of the data collected by using this approach is often quite coarse in terms of time and space.
(iii) We can also ask volunteers to carry location tracking devices such as GPS receivers. However, doing so requires time-consuming and costly processes to prepare the devices and to set up complicated technological infrastructures. In addition, it is difficult to collect data from a large number of unbiased samples using this approach. Thus, we cannot easily understand the behaviors of the entire population in the physical space of concern.
(iv) We can use radio signals from WiFi and/or Bluetooth-enabled commodity devices such as smartphones to estimate the locations of people in indoor spaces. In addition, advanced GNSS (Global Navigation Satellite System) technologies allow for detailed location tracking in outdoor spaces as well. However, the data collected by using these technologies would not be representative of everyone in the physical space of concern. Also, it would be difficult to employ this approach successfully without addressing privacy concerns.
(v) Networks of surveillance cameras may facilitate collection of detailed unbiased data. In particular, Benenson et al. [8] have exploited deep learning algorithms to derive pedestrians' detailed behavioral information from video-camera images. However, it may be difficult to install video cameras in certain urban spaces because of privacy concerns. Moreover, trees, vehicles and buildings can occlude the views of video cameras.
(vi) Device-free localization and activity recognition techniques [9] exploit various sensor-detectable patterns such as the changes of ambient radio signals. Despite the developments in this area, it is still difficult to collect detailed data at scale just by relying on this technology.

Crowd replication [1] has been proposed to address some of these limitations. It relies on sensor-armed volunteers who mimic behaviors of people in public spaces. The volunteers record data from their own mobile and wearable devices while replicating the behaviors of people in proximity. The feasibility of this approach has been tested through a field experiment involving 4 sensor-armed volunteers who collected data about a large space near a train station in Japan. A critical aspect of this approach is the sampling strategy for determining the people whose behaviors are to be mimicked by researchers. Without a proper sampling strategy, we may end up collecting biased and/or useless data. In this paper, we propose approaches that could complement this technique.

3 Challenges for Human-in-the-Loop Urban Sensing

We introduce five approaches to enable meaningful analysis of various human behaviors at scale, i.e., in-situ sampling, estimating social activities and emotions, improving data quality in context, meta-sensing, and context-aware privacy and data modeling.

3.1 In-Situ Sampling

In-situ sampling is the act of selecting samples in the field. For example, an urban researcher may in-situ sample the next person to observe in a public space. We can devise computational tools for supporting this process, such as a mobile tool that recommends targets for crowd replication [1] or direct observation. We can design such a tool based on relevant statistical models, thereby supporting or scaffolding the in-situ decision making by researchers. In-situ sampling tools can consider various models of the real world to help researchers collect data efficiently. For instance, Tobler's first law of geography or some patterns of spatial auto-correlation can be exploited when investigating a relevant real-world phenomenon.
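As one illustration of how a simple spatial model could drive in-situ sampling, the sketch below recommends the next observation cell by balancing spatial coverage (under-sampled cells first) against the researcher's walking distance. The grid, scoring rule, and weight are illustrative assumptions, not the recommendation logic of our actual tool.

# Illustrative sketch: recommend the next grid cell to observe, preferring
# under-sampled cells that are still close to the researcher.
# The scoring rule and weight are hypothetical.
import math

def recommend_next_cell(sample_counts, researcher_pos, distance_weight=0.3):
    """sample_counts maps (x, y) grid cells to the number of samples collected."""
    def score(cell):
        distance = math.dist(cell, researcher_pos)
        # Fewer existing samples -> higher priority; nearer cells -> small penalty.
        return sample_counts[cell] + distance_weight * distance
    return min(sample_counts, key=score)

counts = {(0, 0): 5, (0, 1): 1, (1, 0): 2, (1, 1): 0}
print(recommend_next_cell(counts, researcher_pos=(0, 0)))  # an under-sampled cell
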

3.2 Estimating Social Activities and Emotions

We can use machine learning techniques to infer social activities and emotions of the people in public spaces. Our preliminary analysis of human activities in public spaces suggested that the group size and the strength of body motion are correlated with different social activities. Moreover, in line with existing research (e.g., [10]), we can exploit off-the-shelf devices such as smartphones, smart watches and smart glasses to analyze detailed motions and infer geospatial emotional perception. In addition, those devices allow for collection and analysis of proximity patterns that may influence or reflect social interactions and activities. These provide starting points for constructing richer datasets about behaviors in public spaces.

3.3 Improving Data Quality in Context

We can exploit contextual information to improve the quality of collected data. For example, when researchers observe or mimic behaviors of people in cities, we should consider the impact of distances on human perception [11]. We should also consider temporal distances between the time something happens and the time at which it is recorded. These contextual factors may have a different impact on different researchers in terms of the quality of the data they produce. In this context, we can employ data-centric approaches to develop various types of personalized mechanisms for the improvement of data quality.

3.4 Meta-sensing

There are other approaches to improve the quality of collected data. For example, we can use notifications asking multiple researchers to observe the same target person. We can also use notifications to have researchers observe other researchers. These types of observation could be considered as 'sensing' of the urban sensing environment itself, which we call urban meta-sensing, and allow for evaluation of researchers as well as 'calibration' of the human-in-the-loop urban sensing system.

3.5 Context-Aware Privacy and Data Modeling

We should collect data carefully in order to address the privacy concerns of people in public spaces. Even when researchers collect anonymous data only, observing or mimicking behaviors of pedestrians in close proximity could disturb people. In this case, the issue is not so much the privacy of the collected data. However, it is clearly important to protect the privacy of people in the physical space, since privacy is inseparable from physical distances. It also seems inseparable from temporal factors such as the lengths of observation (e.g., milliseconds, seconds, minutes, hours, and days). As distances and temporal lengths can affect data quality as well as privacy concerns, we argue for a modeling approach that considers both privacy and data quality in relation to their physical context, such as distances and temporal lengths. This can lead to the development of a customizable, privacy-aware and data quality-aware framework for data collection.

4 Smart Notifications for In-Situ Sampling

Next we discuss the design and development of smart notifications that help researchers and volunteers select the targets for data collection in situ in urban spaces. In general, it is too time-consuming and costly to observe everyone in a public space. One might consider deploying a large number of researchers for exhaustive data collection. However, observer effects would make such an approach infeasible. We thus focus on the need to collect unbiased, useful data by observing or replicating behaviors of a limited number of people. In this context, we propose a mobile tool that recommends appropriate targets for observation (or replication) based on relevant sampling methods [14–16]. The mobile tool is intended to support the following three sampling methods:

1. Notification-based sampling method: We assume that researchers and/or volunteers use mobile devices such as smartphones, smart watches, or smart glasses to receive notifications from the computational backend system. The backend system collaborates with mobile clients to trigger notifications based on different sampling strategies so as to enhance the perceptions and support the in-situ decision making of researchers and volunteers.

2. Simulation-based spatial and cluster sampling method: Although we sometimes desire to collect perfect data, we must consider a number of practical constraints when collecting data in the real world. There are a number of practical sampling methods for observational studies [14]. In general, these practical sampling approaches consider the nature of data collection and relevant research goals in order to maximize the usefulness of the collected data while keeping the costs of data collection reasonably small. Of particular interest in the context of this research are spatial sampling [16] and cluster sampling [15], as they can be applied to the analytics of behaviors in urban spaces. One of the challenges in applying these techniques to urban spaces is the potential limitation of the models of spaces and clusters. We thus consider the use of simulation-based modeling of spaces and clusters.

3. Adaptive sampling method: When researchers and volunteers receive a notification, it guides them to a location at which appropriate samples can be observed. It also shows how they can select samples at the location (e.g., "select the person that arrives at the location next"). In this manner, the system combines computational algorithms and "physically-based algorithms" to select samples. As they receive multiple notifications and observe multiple targets, the system incrementally accumulates data that would be useful for improving and refining future sampling processes. In this context, we exploit adaptive sampling as part of the mobile tool (see the sketch below).

We intend to evaluate the effectiveness and usability of the mobile tool and the backend system, and improve it iteratively. This will include simulation-based evaluation to evaluate different aspects of the collected data as well as usability evaluation based on common assessment tools such as NASA-TLX.
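The following sketch shows one way an adaptive, notification-based sampling loop could work on the backend: each notification names a target area and a physically-based selection rule, and per-area quotas are tracked as observations come in. The data structures, quota rule, and message fields are illustrative assumptions rather than the actual backend design.

# Illustrative sketch of an adaptive notification loop for in-situ sampling.
# Areas with fewer completed observations are prioritized; each notification
# carries a simple physically-based selection rule. All fields are hypothetical.
from dataclasses import dataclass

@dataclass
class Notification:
    volunteer_id: str
    target_area: str
    selection_rule: str   # e.g. "observe the next person who enters the area"

class AdaptiveSampler:
    def __init__(self, target_per_area):
        self.target = dict(target_per_area)   # desired sample count per area
        self.collected = {area: 0 for area in target_per_area}

    def next_notification(self, volunteer_id):
        """Send the volunteer to the area with the largest remaining deficit."""
        deficits = {a: self.target[a] - self.collected[a] for a in self.target}
        area = max(deficits, key=deficits.get)
        return Notification(volunteer_id, area,
                            "observe the next person who enters the area")

    def record_observation(self, area, useful=True):
        """Update counts; adaptive variants could also grow quotas of rich areas."""
        if useful:
            self.collected[area] += 1

sampler = AdaptiveSampler({"station square": 20, "shopping street": 20})
note = sampler.next_notification("volunteer-01")
print(note)
sampler.record_observation(note.target_area)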

5 Simulation-Based Modeling of Spaces and Clusters

As discussed in the previous section, we consider the use of simulation-based modeling of spaces and clusters. We first use a model of pedestrian behaviors to generate large datasets of simulated pedestrian behaviors within the specified city blocks and neighborhoods. Subsequently, we derive relevant clusters and spatial patterns based on the datasets. We have extended the Social Force Model (SFM) [17] by integrating it with a probabilistic model of route-choice behavior [19]. SFM emulates the motion of pedestrians as if they act according to "social forces." This model can consider shapes and structures of roads and interaction among pedestrians, and can produce finer-grained pedestrian movements than simpler models such as the Random Waypoint Model.


In this model, the velocity $\vec{v}_i$ of a pedestrian is governed by four force terms:

$$\frac{d\vec{v}_i}{dt} = \vec{f}_i + \vec{f}_{iB} + \sum_{j \neq i} \vec{f}_{ij} + \sum_{k} \vec{f}_{ik} + \text{fluctuations}$$

1. $\vec{f}_i$ is the acceleration toward the next destination considering a desired speed
2. $\vec{f}_{iB}$ is the repulsive force due to borders
3. $\vec{f}_{ij}$ is the similar repulsive force due to pedestrian $j$
4. $\vec{f}_{ik}$ is the attractive force due to people, objects, and events at position $\vec{k}$

The above equation also takes into consideration fluctuations due to accidental or deliberate deviations from the optimal behavior. The desired speed is approximately Gaussian distributed with a mean value of 1.3 m/s and a standard deviation of 0.3 m/s [18]. Again, we have extended the SFM by integrating it with a simple probabilistic model of route-choice behavior. At an intersection, a pedestrian who walks on the left sidewalk of a street turns left or goes straight with the same probability of 0.5. Similarly, we determined the probabilities for pedestrians walking on the right sidewalk of a street. The map illustrated in Fig. 1 is drawn on the basis of real streets in downtown Tokyo. Three circular sensing areas (A, B, and C) with a radius of 10 m are also shown in the figure for evaluation purposes. The sensing areas A, B, and C were selected as representatives of a vertical street, a horizontal street, and an intersection, respectively.

Fig. 1. Sample visualization of simulated pedestrian behaviors based on the Social Force Model.
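For readers who want to experiment with the dynamics described above, the sketch below implements one Euler integration step of a simplified Social Force Model: a driving term toward the destination plus exponential repulsion from other pedestrians. The parameter values (relaxation time, repulsion strength and range) and the omission of border, attraction, and route-choice terms are simplifying assumptions; they are not the exact parameterization used in our simulator.

# Simplified Social Force Model step (driving force + pedestrian repulsion only).
# Parameters are illustrative; border forces, attraction terms, and route choice
# from the paper's extended model are omitted for brevity.
import numpy as np

TAU = 0.5        # relaxation time [s] (assumed)
A, B = 2.0, 0.3  # repulsion strength and range (assumed)
RADIUS = 0.3     # pedestrian body radius [m] (assumed)

def sfm_step(positions, velocities, goals, desired_speeds, dt=0.1):
    """Advance all pedestrians by one time step dt and return new (pos, vel)."""
    n = len(positions)
    accel = np.zeros_like(velocities)
    for i in range(n):
        # Driving force toward the goal at the desired speed.
        direction = goals[i] - positions[i]
        direction /= np.linalg.norm(direction) + 1e-9
        accel[i] = (desired_speeds[i] * direction - velocities[i]) / TAU
        # Repulsive forces from the other pedestrians.
        for j in range(n):
            if i == j:
                continue
            diff = positions[i] - positions[j]
            dist = np.linalg.norm(diff) + 1e-9
            accel[i] += A * np.exp((2 * RADIUS - dist) / B) * diff / dist
    # Small random fluctuations, then Euler integration.
    velocities = velocities + accel * dt + np.random.normal(0, 0.01, velocities.shape)
    positions = positions + velocities * dt
    return positions, velocities

pos = np.array([[0.0, 0.0], [1.0, 0.2]])
vel = np.zeros_like(pos)
goals = np.array([[10.0, 0.0], [-10.0, 0.0]])
speeds = np.random.normal(1.3, 0.3, size=2)  # desired speeds as in [18]
pos, vel = sfm_step(pos, vel, goals, speeds)
print(pos)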

6 The Mobile Behavior Sampling Tool

Next we describe the client-server system architecture of our mobile behavior sampling tool (see Fig. 2).

Fig. 2. System architecture of the mobile behavior sampling tool (client: mobile context sensing based on AWARE, notification manager, and local notifications on user-facing devices; server: recommender for behavior sampling, simulation of pedestrian behavior, and storage of notifications and collected data)

The software on the client side is based on Community Reminder, a smartphone-based platform that helps community members design and use context-aware reminders [6]. Its mobile context sensing module exploits the AWARE Framework [20] to detect various mobile contexts, which are used to trigger notifications at the right time. Researchers and volunteers can use Android smartphones, smart watches, and smart glasses to receive and respond to notifications. The server manages the collected data and notifications, and provides the core mechanisms for mobile behavior sampling, including the simulation of pedestrian behaviors.
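To make the client-server exchange concrete, the snippet below sketches a possible payload for a sampling notification pushed from the server to a client device. The field names and values are hypothetical; they do not reflect the actual message format of Community Reminder, the AWARE Framework, or our backend.

# Hypothetical sketch of a server-to-client sampling notification payload.
# Field names and values are illustrative only.
import json

notification_payload = {
    "notification_id": "n-0042",
    "volunteer_id": "volunteer-01",
    "trigger": {"type": "location", "lat": 35.6812, "lon": 139.7671, "radius_m": 30},
    "instruction": "Observe the next person who enters sensing area B.",
    "sampling_method": "adaptive",
    "expires_at": "2018-07-16T15:00:00+09:00",
}
print(json.dumps(notification_payload, indent=2))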

7 Conclusion

To go beyond the limitations of conventional urban sensing, we have examined the challenges for human-in-the-loop sensing, including in-situ sampling, estimating social activities and emotions, improving data quality in context, meta-sensing, and context-aware privacy and data modeling. We then proposed smart notifications with the aim of addressing the challenge of in-situ sampling. Our smart notifications play critical roles in the proposed mobile tool, which supports the notification-based, simulation-based spatial and cluster, and adaptive sampling methods. Moreover, it employs SFM-based modeling of spaces and clusters to improve sampling. We also described the system architecture to integrate different components and provide services on different mobile devices, including smartphones, smart watches, and smart glasses.


We expect that the proposed system will help people collect rich, microscopic data about large spaces such as city blocks and neighborhoods, as it has the following advantages:

1. It can help collect data about all kinds of people in public spaces, including elderly people and children who may not have GPS-enabled smartphones. It would also facilitate the use of collected data for various purposes, including inclusive design of urban environments and development of personalized, context-aware digital services.
2. It allows for smart sampling of the targets for observation. We therefore expect that researchers and volunteers will be able to collect quality data with smaller biases than what can be collected by using existing urban sensing systems. Our smart sampling mechanisms could also be used in spaces other than urban public spaces.
3. We also discussed approaches to improve data quality by exploiting contextual factors and meta-sensing. They can be considered in the design of the next versions of the system.
4. Context-aware privacy and data modeling can help select appropriate data collection methods in different situations and enhance the privacy of urban inhabitants. This can also be considered in the design of the next versions of the system.

Our future plans include iterative refinement of the system architecture and integration of system components as well as a full test of feasibility in different urban spaces. We also intend to incorporate more features for supporting exploratory data collection processes and shifting modes of observation.

Acknowledgement. This work was supported by JSPS KAKENHI Grant Numbers JP17909134 and JP17865988.

References

1. Hemminki, S., Kuribayashi, K., Konomi, S., Nurmi, P., Tarkoma, S.: Quantitative evaluation of public spaces using crowd replication. In: Proceedings of the 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM SIGSPATIAL 2016), San Francisco, CA, 31 October–3 November 2016. ACM Press, New York (2016). https://doi.org/10.1145/2996913.2996946
2. Konomi, S., Ohno, W., Sasao, T., Shoji, K.: A context-aware approach to microtasking in a public transport environment. In: Proceedings of the 5th IEEE International Conference on Communications and Electronics, Special Session on Crowdsourcing and Crowdsourcing Applications, Da Nang, 30 July–1 August 2014, pp. 498–503. IEEE, Piscataway (2014). https://doi.org/10.1109/CCE.2014.6916754
3. Konomi, S., Sasao, T.: Crowd geofencing. In: Proceedings of the 2nd EAI International Conference on IoT in Urban Space (Urb-IoT 2016), Tokyo, Japan, 24–25 May 2016, pp. 14–17. ACM Press, New York (2016). https://doi.org/10.1145/2962735.2962744


4. Sasao, T., Konomi, S.: The use of historical information to support civic crowdsourcing. In: Streitz, N., Markopoulos, P. (eds.) DAPI 2016. LNCS, vol. 9749, pp. 470–481. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-39862-4_43
5. Sasao, T., Konomi, S., Arikawa, M., Fujita, H.: Context weaver: awareness and feedback in networked mobile crowdsourcing tools. Comput. Netw. 90, 74–84 (2015). https://doi.org/10.1016/j.comnet.2015.05.022
6. Sasao, T., Konomi, S., Kostakos, V., Kuribayashi, K., Goncalves, J.: Community reminder: participatory contextual reminder environments for local communities. Int. J. Hum. Comput. Stud. 102, 41–53 (2017). https://doi.org/10.1016/j.ijhcs.2016.09.001
7. Sasao, T., Konomi, S., Kuribayashi, K.: Activity recipe: spreading cooperative outdoor activities for local communities using contextual reminders. In: Streitz, N., Markopoulos, P. (eds.) DAPI 2015. LNCS, vol. 9189, pp. 590–601. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-20804-6_54
8. Benenson, R., Mathias, M., Timofte, R., Van Gool, L.: Pedestrian detection at 100 frames per second. In: Proceedings of 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2012), pp. 2903–2910 (2012)
9. Youssef, M., Mah, M., Agrawala, A.: Challenges: device-free passive localization for wireless environments. In: Proceedings of the 13th Annual ACM International Conference on Mobile Computing and Networking (MobiCom 2007), pp. 222–229 (2007)
10. Raja, R., Exler, A., Hemminki, S., Konomi, S., Sigg, S., Inoue, S.: Towards geospatial emotional perception. Geoinform. J. (2017). https://doi.org/10.1007/s10707-017-0294-1
11. Hall, E.T.: The Hidden Dimension. Anchor Books, New York (1966)
12. Rittel, H.: Dilemmas in a general theory of planning. Policy Sci. 4(2), 155–169 (1973). https://doi.org/10.1007/bf01405730
13. Spradley, J.P.: Participant Observation. Harcourt Brace Jovanovich College Publishers, New York (1980)
14. Altmann, J.: Observational study of behavior: sampling methods. Behaviour 49(3), 227–266 (1974)
15. Thompson, S.K.: Adaptive cluster sampling. J. Am. Stat. Assoc. 85(412), 1050–1059 (1990)
16. Smith, M.J., Goodchild, M.F., Longley, P.A.: Geospatial Analysis, 5th edn. Prentice Hall, Upper Saddle River (2015)
17. Helbing, D., Molnar, P.: Social force model for pedestrian dynamics. Phys. Rev. E 51(5), 4282–4286 (1995)
18. Helbing, D., Buzna, L., Johansson, A., Werner, T.: Self-organized pedestrian crowd dynamics: experiments, simulations, and design solutions. Transp. Sci. 39(1), 1–24 (2005)
19. Thepvilojanapong, N., Konomi, S., Tobe, Y.: A study of cooperative human probes in urban sensing environments. IEICE Trans. Commun. E93-B(11), 2868–2878 (2010). https://doi.org/10.1587/transcom.E93.B.2868
20. Ferreira, D., Kostakos, V., Dey, A.K.: AWARE: mobile context instrumentation framework. Front. ICT 2, 6 (2015)

Design and Evaluation of Seamless Learning Analytics

Kousuke Mouri1(✉), Noriko Uosaki2, and Atsushi Shimada3

1 Tokyo University of Agriculture and Technology, Tokyo, Japan
[email protected]
2 Osaka University, Osaka, Japan
3 Kyushu University, Fukuoka, Japan

Abstract. This paper describes a learning analytics perspective on designing and implementing a seamless learning environment. Seamless learning focuses on supporting learning across formal and informal learning contexts, individual and social learning, and the physical world and cyberspace. Most current research realizes seamless learning environments by using technologies such as smartphones and GPS at schools or universities. However, utilization of the collected learning logs remains a challenge yet to be explored. To construct a seamless learning environment, this study developed a system that integrates a digital textbook system called AETEL with a ubiquitous learning system called SCROLL. The system enables learners to bridge digital textbook learning and real-life learning. To analyze and visualize the relationships between them, this study developed an innovative system called VASCORLL 2.0 (Visualization and Analysis System for Connecting Relationships of Learning Logs). An experiment was conducted to evaluate whether VASCORLL 2.0 can increase learners' learning opportunities. The results show that learners were able to increase their learning opportunities by using VASCORLL 2.0, which contributed to enhancing learning activities in the seamless learning environment by utilizing the collected learning logs with well-designed analysis and visualization approaches.

Keywords: Seamless learning · Learning analytics · Digital textbook

1 Introduction

In recent years, seamless learning systems have been constructed using information technologies such as mobile devices, RFID tags, QR codes and wireless networks. Seamless learning has been recognized as an effective learning approach across various dimensions, including formal and informal learning contexts, individual and social learning, and the physical world and cyberspace [1]. One of its most important issues is how to bridge in-class and out-of-class learning: it is essential to design both in-class and out-of-class activities that link what learners have learned in class with their daily-life experiences and, conversely, link what they have learned in their daily lives to their experiences in class.


So far, the majority of research on seamless learning has focused on realizing a seamless learning environment at schools or universities [2, 3]. The advantages of a seamless learning environment are that it enhances learners' learning opportunities and autonomous learning while sustaining their learning motivation. However, the collected learning logs are yet to be visualized and analyzed in order to enhance the quality of learning. This study contends that learning efficacy can be enhanced by utilizing these learning logs. First, we consider the research issues of learning analytics based on seamless learning environments to be as follows:

(1) How can we utilize the learning logs collected in a seamless learning system?
(2) How can analysis bridge the gap between formal and informal learning?
(3) How can analysis increase learners' learning opportunities?

To address these issues, our research project proposed a seamless visualization and analysis system called VASCORLL 2.0 (Visualization and Analysis System for Connecting Relationships of Learning Logs). The system seamlessly supports e-book-based learning and real-life learning by integrating a ubiquitous learning system called SCROLL with an e-book system.

The rest of this paper is structured as follows. Section 2 reviews the literature to clearly identify the difference between related work and our research. Section 3 describes our previous work on SCROLL and VASCORLL. Section 4 describes the design of VASCORLL 2.0. Section 5 describes the implementation of VASCORLL 2.0. Finally, Sect. 6 describes the evaluation and our conclusion.

2 Literature Review

2.1 Design of Seamless Learning Environments

Seamless learning describes situations where students can learn whenever they want to in a variety of scenarios, and where they can switch from one scenario to another easily and quickly using one or more devices per student as a mediator. Researchers in seamless learning have used mediating tools such as smartphones and PDAs to realize seamless learning environments. For example, Wong et al. [4] reported a seamless learning system called MYCLOUD (My Chinese UbiquitOUs learning Days), which allows students to learn the Chinese language in both in-school and out-of-school learning spaces using mobile devices. MYCLOUD consists of three components to bridge formal learning and informal learning: a mobile dictionary, digital textbooks and a social network service. In a formal learning setting, learners use the digital textbooks to highlight unfamiliar vocabulary, and the vocabulary is added to the mobile dictionary. In an informal learning setting, they use the social network service to record the artifacts (photo(s) + sentence(s)) of their experiences in daily life. The seamless learning environment is realized by linking the vocabulary between the digital textbooks and the social network service. On the other hand, Uosaki et al. [5] reported a seamless learning system called SMALL (Seamless Mobile-Assisted Language Learning support system) to support students who aim to learn the English language in formal and informal settings.


SMALL was developed by adding new functions to SCROLL. In a formal setting, learners use a digital textbook to record vocabulary that they want to remember, and the vocabulary is added to the SCROLL database. In an informal setting, learners can record the digital records (a vocabulary item with a photo or a video) of their learning experiences in their daily lives. The seamless learning environment is realized by linking the vocabulary between the digital textbook and SCROLL. Therefore, in designing seamless learning environments, researchers need to consider how formal and informal learning are linked with the use of computer technologies. To construct a seamless learning environment based on the above review, this study designed and developed a seamless learning system by integrating the digital textbook system called AETEL (Actions and learning on E-TExtbook Logging) and SCROLL. So far, researchers have constructed seamless learning environments and evaluated whether they can enhance learners' learning efficacy and autonomous learning; in contrast, this study takes a learning analytics perspective on designing a seamless learning environment, because the collected learning logs are not yet utilized to support teaching and learning.

2.2 Authentic Learning with Learning Analytics

Many empirical researchers have found that classroom-only learning is not conducive to enhancing learners' communicative skills, such as listening and speaking, or to sustaining their learning motivation. It is necessary to consider not only the design of in-class learning but also out-of-class learning or authentic learning [6]. In this study, the term "authentic learning" is defined as either experiential learning or real-life learning.

In recent years, Learning Analytics (LA) has attracted attention as a way to find useful information for improving and optimizing teaching and learning. The definition and aims of LA are actively discussed by researchers. The techniques and methods of LA include information visualization and social network analysis. Information visualization allows teachers and learners to see, explore and understand large amounts of information at once [7]. Social Network Analysis (SNA) allows teachers and learners to discover the relationships between various elements, such as learners and knowledge, and knowledge and locations [8]. Based on the above review, this study analyzes the learning logs collected by our seamless learning system with information visualization and SNA.

3 Previous Work

3.1 AETEL

This study developed a digital textbook system called AETEL [9]. Figure 1 (left) shows the directory interface of AETEL. Teachers can create e-book contents using PowerPoint and Keynote prior to the class and use them in their courses. The uploaded e-book contents are converted to EPUB format, and the contents can be accessed using smartphones and PCs. Figure 1 (right) shows digital textbooks uploaded by the teachers.


Fig. 1. Directories and digital textbooks in AETEL

Figure 2 shows the digital textbook viewer interface and slide descriptions. Learners can read the digital textbooks on their web browser anytime and anywhere. For example, when a learner clicks the memo button on the digital textbook viewer system, he/she can write a description concerning the target words as shown in Fig. 2(Right-top). When a learner clicks the highlight button, he/she can highlight the word. In addition, he/she can find the page number corresponding to the target word in the e-book by clicking the search button.

Fig. 2. Digital textbook viewer interface


3.2 SCROLL

SCROLL supports real-life language learning for international students using ubiquitous technologies such as wireless networks, the Global Positioning System (GPS) and QR codes. SCROLL provides a well-designed form to record a learning log. It adopts an approach to sharing contents with other users based on the LORE (Log-Organize-Recall-Evaluate) model proposed by [10]. For example, when international students face problems such as how to read, write or pronounce words in their real life, they can record what they have learned with a photo, location (latitude and longitude), learning place (e.g., building name), and date and time of creation as a learning log, as shown in Fig. 3. Figure 4 shows an example of a learning log. Learners can share learning logs with each other.

Fig. 3. Add learning log

Fig. 4. A learning log
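The sketch below illustrates one way a SCROLL-style learning log record could be represented, using the fields mentioned above (word, photo, latitude/longitude, place name, and creation time). The class name, field types, and example values are illustrative assumptions, not SCROLL's actual schema.

# Illustrative sketch of a learning-log record with the fields described above.
# This is a hypothetical schema, not SCROLL's actual data model.
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class LearningLog:
    learner_id: str
    word: str                    # e.g. a Japanese word learned in real life
    photo_path: Optional[str]    # optional photo of the learning context
    latitude: float
    longitude: float
    place: str                   # e.g. building name
    created_at: datetime

log = LearningLog(
    learner_id="student-07",
    word="natto",
    photo_path="photos/natto.jpg",
    latitude=33.5903,            # illustrative coordinates only
    longitude=130.4017,
    place="supermarket near campus",
    created_at=datetime(2018, 7, 16, 10, 30),
)
print(log.word, log.place)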

3.3 VASCORLL

Our previous VASCORLL could visualize and analyze learning logs accumulated in SCROLL to support real-life learning [11, 12]. For example, there is a learning log where a Japanese language learner learned "fan" at the university in the past. It means "扇風機 (mechanical fan)" in Japanese. There is another learning log where another learner learned the same word, "fan", in a different context in the past. In this case it means "うちわ (uchiwa, a round, flat paper fan with a wooden or plastic handle)" in Japanese. Even if the English word is the same, the meaning might be different if the context is different. By using VASCORLL, learners can learn such relationships. The results of an evaluation experiment indicated that VASCORLL was a useful tool in detecting the correlations among learners, words and locations in a ubiquitous learning environment.


Furthermore, VASCORLL could increase learners' learning opportunities, and learners could apply their own experiences to different learning places. However, the system did not consider learning analytics in seamless learning in order to find central words bridging digital textbook learning and real-life learning. Therefore, this study developed VASCORLL 2.0 based on the previous work.

4 VASCORLL 2.0

4.1 Design

The purpose of VASCORLL 2.0 is to support learners in applying what they have learned in digital textbooks to their real-life learning and vice versa. To link digital textbook learning and real-life learning, this study designed the visualization structures shown in Fig. 5: the Digital textbook Learning Structure (DLS) and the Real-life Learning Structure (RLS). The DLS consists of three layers, called "Digital textbook learners", "Words learned through digital textbooks", and "Digital textbooks". The RLS consists of three layers, called "Real-life learners", "Words learned in real life", and "Locations".

(1) Visualization method in the DLS: For example, when a learner reads the learning contents in a digital textbook, he/she is likely to discover unfamiliar words. When he/she highlights the words in a digital textbook using AETEL, our visualization method will first create nodes indicating the learner, the word learned through the digital textbook, and the digital textbook. Secondly, it will connect the learner's node in the upper layer of the DLS to the word's node in the intermediate layer of the DLS. The word's node will be connected to the digital textbook node in the lowest layer of the DLS. By visualizing these links, teachers and students can grasp the following information: in the upper layer of the DLS, learner nodes with a large number of edges to words show that those learners frequently highlighted words; in the intermediate layer of the DLS, word nodes with a large number of edges to learner nodes and digital textbook nodes show that those words in the digital textbooks are frequently highlighted by the learners; in the lowest layer of the DLS, digital textbook nodes with a large number of edges to word nodes show that those textbooks contain many important words to be learned.

(2) Visualization method in the RLS: For example, when a learner records a learning log using SCROLL, our visualization method will first create nodes indicating the learner, the words learned in real life, and the location. Secondly, it will connect the learner's node in the upper layer of the RLS to the word's node in the intermediate layer of the RLS. In addition, the word node will be connected to the location node in the lowest layer of the RLS. By visualizing these links, teachers and students can grasp the following information: in the upper layer of the RLS, learner nodes with a large number of edges to word nodes show that those learners recorded many learning logs in their real life; in the intermediate layer of the RLS, word nodes with a large number of edges to learner nodes and location nodes show vital words that many learners recorded in their real life; in the lowest layer of the RLS, location nodes with a large number of edges to word nodes show the locations where learners can learn a lot of words.


Fig. 5. Visualization structures in the seamless learning environment: Digital textbook Learning Structure (DLS) and Real-life Learning Structure (RLS)

To construct a seamless learning environment based on the above visualization structures, this study connects words learned through digital textbooks in the DLS to words learned in real life in the RLS if the words are the same. By connecting them, a learner who has learned a word in a digital textbook can then learn the related word in real life, and vice versa. However, simply connecting nodes makes it difficult to discover the vital nodes that bridge digital textbook learning and real-life learning. Therefore, this study applies the centralities of SNA shown in Table 1. Degree, closeness and betweenness centralities are fundamental measures in social network analysis. In particular, we hypothesize that betweenness centrality could bridge the gap between digital textbook learning and real-life learning. For example, if a learner learns the word "natto" in a digital textbook, there would be various contexts where he/she can learn it, such as supermarkets, shopping malls, and restaurants. So far, information on whether the word can be learned in other contexts, or the exact locations where it can be learned, has not been provided. This study provides such information for learners.

Table 1. Centralities of social network analysis (graph $G := (V, E)$; $N$ denotes the number of nodes)

Degree: $C_i^D = \frac{k_i}{N-1}$. Degree centrality is defined as the number of links incident upon a node, i.e., the sum of the corresponding row of the adjacency matrix representing the network. $N$ is the number of nodes and $k_i$ is the degree of node $i$.

Closeness: $C_i^C = (L_i)^{-1} = \frac{N-1}{\sum_{j \in G, j \neq i} d_{ij}}$. Closeness centrality is based on the distance of a node to all others in the network. $d_{ij}$ is the shortest path length between $i$ and $j$, and $L_i$ is the average distance from $i$ to all the other nodes.

Betweenness: $C_i^B = \frac{1}{(N-1)(N-2)} \sum_{j \in G, j \neq i} \sum_{k \neq i, k \neq j} \frac{n_{jk}(i)}{n_{jk}}$. Betweenness centrality is based on the number of shortest paths between any two nodes that pass via a given node. $n_{jk}$ is the number of shortest paths between $j$ and $k$, and $n_{jk}(i)$ is the number of those shortest paths that contain node $i$.
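As a minimal illustration of this analysis, the sketch below builds a small graph that links DLS word nodes to RLS word nodes when the word is the same, computes betweenness centrality with networkx, and lists the top bridging words. The toy data, node naming scheme, and use of networkx are illustrative assumptions; VASCORLL 2.0's actual implementation may differ.

# Illustrative sketch: find words that bridge digital-textbook learning (DLS)
# and real-life learning (RLS) via betweenness centrality. Toy data only.
import networkx as nx

g = nx.Graph()

# DLS: learner -> word -> textbook
dls_edges = [("learnerA", "w:fan"), ("w:fan", "book:JLPT-N4"),
             ("learnerA", "w:natto"), ("w:natto", "book:JLPT-N4")]
# RLS: learner -> word -> location
rls_edges = [("learnerB", "r:fan"), ("r:fan", "loc:electronics store"),
             ("learnerC", "r:natto"), ("r:natto", "loc:supermarket")]
g.add_edges_from(dls_edges + rls_edges)

# Connect DLS and RLS word nodes that refer to the same word.
for node in [n for n in g if n.startswith("w:")]:
    counterpart = "r:" + node[2:]
    if counterpart in g:
        g.add_edge(node, counterpart)

centrality = nx.betweenness_centrality(g, normalized=True)
bridging_words = sorted(
    ((n, c) for n, c in centrality.items() if n.startswith(("w:", "r:"))),
    key=lambda item: item[1], reverse=True)
for node, score in bridging_words[:4]:
    print(f"{node}: {score:.3f}")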

5 Evaluation

5.1 Participants

Twenty international students studying at Kyushu University in Japan participated in the evaluation experiments. The students were from China, Malaysia, Thailand, and Mongolia and were aged from 21 to 36 years old. Their length of stay in Japan ranged from 1 month to 5 years. The evaluation experiment was designed to evaluate the following point: whether VASCORLL 2.0 can increase the participants' learning opportunities ("learning opportunities" denotes the number of learning logs that they uploaded to the system during the evaluation period).

5.2 Procedure

An administrator uploaded digital textbook contents to the server prior to his/her class. The uploaded digital textbook contents were created according to the JLPT (Japanese Language Proficiency Test). The JLPT is offered by the Japan Foundation and Japan Educational Exchanges and Services as a reliable means of evaluating and certifying the Japanese proficiency of non-native speakers.

The evaluation was conducted over two weeks. In the first week, the administrator held a briefing session on how to use AETEL with SCROLL, since it was the participants' first time using them. Based on the learning logs uploaded during the first week, the participants were divided into two groups as evenly as possible in terms of their keenness for language learning: Group A (experimental group) and Group B (control group). Table 2 shows the number of learning logs that the participants uploaded during the first week. Group A participants uploaded 143 learning logs and Group B participants uploaded 149 learning logs to the system. The means and standard deviations were 14.3 and 6.78 for Group A, and 14.9 and 6.51 for Group B. A t-test shows that there was no significant difference between the two groups (t = 0.201, p > 0.05). This result indicates that the participants of the two groups had comparable learning opportunities before using VASCORLL 2.0. The administrator then introduced how to use VASCORLL 2.0. After the evaluation experiment, this study examines the difference in learning activities between Groups A and B using the collected learning logs.

Table 2. Number of uploaded learning logs in the first week (practice period)

Group     Participants   Learning logs   Mean   SD     t       p
Group A   10             143             14.3   6.78   0.201   p > 0.05
Group B   10             149             14.9   6.51

5.3 Result

Table 3 shows the means and standard deviations for the experimental and control groups in the 1st and 2nd weeks.

Table 3. Number of uploaded learning logs in the second week

Group     1st week      2nd week      F      p
Group A   143 (6.78)    189 (6.41)    4.11   0.08
Group B   149 (6.51)    127 (6.75)

The result of the repeated measures analysis showed that the interaction effect between group and time of measurement was significant (F = 4.11, p = 0.08).
