Advances in Computing

This book constitutes the refereed proceedings of the 13th Colombian Conference on Computing, CCC 2018, held in Cartagena, Colombia, in September 2018. The 46 revised full papers presented were carefully reviewed and selected from 194 submissions. The papers deal with the following topics: information and knowledge management, software engineering and IT architectures, educational informatics, intelligent systems and robotics, human-computer interaction, distributed systems and large-scale architectures, image processing, computer vision and multimedia, information security, formal methods, computational logic, and theory of computation.





Communications in Computer and Information Science Commenced Publication in 2007 Founding and Former Series Editors: Phoebe Chen, Alfredo Cuzzocrea, Xiaoyong Du, Orhun Kara, Ting Liu, Dominik Ślęzak, and Xiaokang Yang

Editorial Board Simone Diniz Junqueira Barbosa Pontifical Catholic University of Rio de Janeiro (PUC-Rio), Rio de Janeiro, Brazil Joaquim Filipe Polytechnic Institute of Setúbal, Setúbal, Portugal Igor Kotenko St. Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences, St. Petersburg, Russia Krishna M. Sivalingam Indian Institute of Technology Madras, Chennai, India Takashi Washio Osaka University, Osaka, Japan Junsong Yuan University at Buffalo, The State University of New York, Buffalo, USA Lizhu Zhou Tsinghua University, Beijing, China

885

More information about this series at http://www.springer.com/series/7899

Jairo E. Serrano C. Juan Carlos Martínez-Santos (Eds.)

Advances in Computing 13th Colombian Conference, CCC 2018 Cartagena, Colombia, September 26–28, 2018 Proceedings


Editors Jairo E. Serrano C. Universidad Tecnológica de Bolívar Cartagena Colombia

Juan Carlos Martínez-Santos Universidad Tecnológica de Bolívar Cartagena Colombia

ISSN 1865-0929 ISSN 1865-0937 (electronic) Communications in Computer and Information Science ISBN 978-3-319-98997-6 ISBN 978-3-319-98998-3 (eBook) https://doi.org/10.1007/978-3-319-98998-3 Library of Congress Control Number: 2018950845 © Springer Nature Switzerland AG 2018 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface

The Colombian Conference on Computing (CCC) is an annual gathering organized by the Colombian Computer Society. It aims to promote and strengthen the Colombian community in computer science, bringing together researchers, students, and practitioners, both national and international. The Colombian Computer Society has organized this conference since 2005, when it was held in Cali; successive editions took place in Bogotá (2007), Medellín (2008), Bucaramanga (2010), Manizales (2011), Medellín (2012), Armenia (2013), Pereira (2014), Bogotá (2015), Popayán (2016), and Cali (2017). The 13th Colombian Conference on Computing was held again in Cartagena de Indias, its city of birth, during September 26–28, 2018, and was attended by national and international researchers. This year the conference was organized by the Colombian Computer Society and the Universidad Tecnológica de Bolívar. The conference was an opportunity to discuss and exchange ideas about computing techniques, methodologies, and tools, among others, with a multidisciplinary approach, strengthening the synergies between researchers, professionals, and companies related to the topics of interest of the conference. The conference covers the following areas:

• Information and knowledge management
• Software engineering and IT architectures
• Educational informatics
• Intelligent systems and robotics
• Human–computer interaction
• Distributed systems and large-scale architectures
• Image processing, computer vision, and multimedia
• Security of information
• Formal methods, computational logic, and theory of computation

The conference allowed the presentation of research papers with (a) a significant contribution to knowledge or (b) innovative experiences in the different areas of computing. The conference included plenary lectures, discussion forums, tutorials, and a symposium for master's and doctoral students. All paper submissions were reviewed by two experts. Authors removed personal details, the acknowledgments section, and any reference that might disclose the authors' identity. We received 194 submissions, of which 36 were accepted as full papers. National and international reviewers participated in the review process. The EasyChair system was used for the management and review of submissions. Our sincere thanks go to all the Technical Program Committee members and authors who submitted papers to the 13th CCC, and to all speakers and participants.

July 2018

Jairo E. Serrano C. Juan Carlos Martínez-Santos

Organization

Conference Chairs

Juan Carlos Martínez-Santos - Universidad Tecnológica de Bolívar, Colombia
Jairo Serrano Castañeda - Universidad Tecnológica de Bolívar, Colombia

Program Committee

Mauricio Alba - Universidad Autónoma de Manizales, Colombia
Luis Fernando Castro - Universidad del Quindio, Colombia
César Collazos - Universidad del Cauca, Colombia
Toni Granollers - Universidad de Lleida, Spain
Leonardo Flórez - Pontificia Universidad Javeriana de Bogotá, Colombia
María Patricia Trujillo - Universidad del Valle, Colombia
Nestor Duque - Universidad Nacional de Colombia, Colombia
Iván Cabezas - Universidad de San Buenaventura, Colombia
Carlos Hernán Gómez - Universidad de Caldas, Colombia
Harold Castro - Universidad de los Andes, Colombia

Colombian Computer Society (SCO2)

Enrique González
María Clara Gómez
Yenny Alexandra Méndez Alegría
Iván M. Cabezas T.
Juan Carlos Martinez
Andrés Solano
Jorge Iván Ríos Patiño

Technical Program Committee

Gerardo M. Sarria M. - Pontificia Universidad Javeriana Cali, Colombia
Néstor Darío Duque Méndez - Universidad Nacional de Colombia, Colombia
Silvana Aciar - Instituto de Informática, Universidad Nacional de San Juan, Argentina
Maria Villegas - Universidad del Quindío, Colombia
Hector Florez - Universidad Distrital Francisco Jose de Caldas, Colombia


Fabio Martinez Carrillo - Bioingenium Research Group, National University of Colombia, Colombia
Paula Lago - Uniandes, Colombia
Francisco Alvarez - Universidad Autonoma de Aguascalientes, Mexico
Mauricio Alba-Castro - Universidad Autonoma de Manizales UAM, Colombia
Sonia Contreras Ortiz - Universidad Tecnológica de Bolívar, Colombia
Victor M. Gonzalez - Instituto Tecnológico Autónomo de México, Mexico
Edwin Puertas - Universidad Tecnologica de Bolivar, Colombia
Cristina Manresa-Yee - University of the Balearic Islands, Spain
Pablo Ruiz - Unicomfacauca, Colombia
Fabio González - Universidad Nacional de Colombia, Colombia
Luis Fernando Castro Rojas - Universidad Nacional de Colombia - Universidad del Quindío, Colombia
Harold Castro - Communications and Information Technology Group (COMIT), Universidad de Los Andes, Colombia
Helga Duarte - Universidad Nacional de Colombia, Colombia
Vanessa Agredo Delgado - Unicauca, Colombia
Jorge Villalobos - University of los Andes, Colombia
Andres Moreno - University of los Andes, Colombia
Philippe Palanque - ICS-IRIT, University of Toulouse 3, France
Pablo Torres-Carrion - UTPL, Ecuador
Patricia Paderewski - University of Granada, Spain
Juan Pavón - Universidad Complutense de Madrid, Spain
José Antonio Macías Iglesias - Universidad Autónoma de Madrid, Spain
Ana Isabel Molina Díaz - University of Castilla-La Mancha, Spain
Enrique González - Pontificia Universidad Javeriana, Colombia
Marta Rosecler Bez - UFRGS, Brazil
Ricardo Azambuja Silveira - Universidade Federal de Santa Catarina, Brazil
Ivan Cabezas - Universidad de San Buenaventura, Colombia
Alicia Mon - Universidad Nacional de La Matanza, Argentina
Carina Gonzalez - Universidad de La Laguna, Spain
Andrés Adolfo Navarro Newball - Pontificia Universidad Javeriana, Cali, Colombia
Olga Marino - Universidad de los Andes, Colombia
Jose Luis Villa - Universidad Tecnológica de Bolívar, Colombia
Gabriel Pedraza - Universidad Industrial de Santander, Colombia
Angela Carrillo-Ramos - Pontificia Universidad Javeriana, Colombia
Wilson Javier Sarmiento - Universidad Militar Nueva Granada, Colombia
Carlos Mario Zapata Jaramillo - Universidad Nacional de Colombia, Colombia
Daniela Quiñones - Pontificia Universidad Católica de Valparaíso, Chile
Yannis Dimitriadis - University of Valladolid, Spain
Jaime Muñoz-Arteaga - Universidad Autónoma de Aguascalientes, Mexico
Víctor Bucheli - Universidad del Valle, Colombia
Jaime Chavarriaga - University of Los Andes, Colombia

José Antonio Pow-Sang - Pontificia Universidad Catolica del Peru, Peru
Fernando De La Rosa R. - Universidad de los Andes, Colombia
Cristian Rusu - Pontificia Universidad Catolica de Valparaiso, Chile
Andrea Rueda - Pontificia Universidad Javeriana, Colombia
Leonardo Flórez-Valencia - Pontificia Universidad Javeriana, Colombia
Gisela T. de Clunie - Universidad Tecnológica de Panamá, Panama
Mario Alberto Moreno Rocha - Universidad Tecnológica de la Mixteca, Mexico
Jorge E. Camargo - Universidad Antonio Nariño, Colombia
Norha M. Villegas - Universidad Icesi, Cali, Colombia
Omar S. Gómez - Technical School of Chimborazo, Ecuador
Claudia Roncancio - Grenoble University, France
Cesar Collazos - Colombia
Tiago Primo - Federal University of Pelotas, Brazil
Federico Botella - UMH, Spain
Maria Patricia Trujillo - Universidad del Valle, Colombia
Kyungmin Bae - Pohang University of Science and Technology (POSTECH), South Korea
William Caicedo - Universidad Tecnológica de Bolívar, Colombia
Lyda Peña - Universidad Autonoma de Occidente, Colombia
Artur Boronat - University of Leicester
Fáber Danilo Giraldo Velásquez - Colombia
Mauricio Ayala-Rincon - Universidade de Brasilia, Brazil
Andrés Sicard-Ramírez - EAFIT University, Colombia
Mayela Coto - Universidad Nacional, Costa Rica
Gustavo Isaza - University of Caldas, Colombia
Miguel Redondo - University of Castilla-La Mancha, Spain
Carlos Hernan Gomez - Universidad de Caldas
Mauricio Toro-Bermudez - Universidad Eafit, Colombia
Leandro Krug Wives - Universidade Federal do Rio Grande do Sul (UFRGS), Brazil
Hugo Jair Escalante - INAOE
Xabiel García Pañeda - Universidad de Oviedo, Spain
Alfonso Infante Moro - Universidad de Huelva, Spain
Toni Granollers - University of Lleida, Spain
Luis Freddy Muñoz Sanabria - Fundacion Universitaria de Popayan, Colombia
Angela Villareal - Universidad del Cauca, Colombia
Juan Francisco Diaz - Universidad del Valle, Colombia
Leonardo Arturo Bautista Gomez - Barcelona Supercomputing Center, Spain


Contents

Physiological Signals Fusion Oriented to Diagnosis - A Review . . . . . . . . . . Y. F. Uribe, K. C. Alvarez-Uribe, D. H. Peluffo-Ordoñez, and M. A. Becerra Optimized Artificial Neural Network System to Select an Exploration Algorithm for Robots on Bi-dimensional Grids . . . . . . . . . . . . . . . . . . . . . . Liesle Caballero, Mario Jojoa, and Winston Percybrooks Comparative Analysis Between Embedded-Spaces-Based and Kernel-Based Approaches for Interactive Data Representation. . . . . . . . . C. K. Basante-Villota, C. M. Ortega-Castillo, D. F. Peña-Unigarro, J. E. Revelo-Fuelagán, J. A. Salazar-Castro, and D. H. Peluffo-Ordóñez Solving Large Systems of Linear Equations on GPUs . . . . . . . . . . . . . . . . . Tomás Felipe Llano-Ríos, Juan D. Ocampo-García, Johan Sebastián Yepes-Ríos, Francisco J. Correa-Zabala, and Christian Trefftz Learning Analytics as a Tool for Visual Analysis in an Open Data Environment: A Higher Education Case . . . . . . . . . . . . . . . . . . . . . . . . . . . Johnny Salazar-Cardona, David Angarita-Garcia, and Jeferson Arango-López Mathematical Model for Assigning an Optimal Frequency of Buses in an Integrated Transport System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Juan Sebastián Mantilla Quintero and Juan Carlos Martínez Santos Diatom Segmentation in Water Resources . . . . . . . . . . . . . . . . . . . . . . . . . Jose Libreros, Gloria Bueno, Maria Trujillo, and Maria Ospina Implementation of a Wormhole Attack on Wireless Sensor Networks with XBee S2C Devices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Julian Ramirez Gómez, Héctor Fernando Vargas Montoya, and Alvaro Leon Henao REAL-T: Time Modularization in Reactive Distributed Applications . . . . . . . Luis Daniel Benavides Navarro, Camilo Pimienta, Mateo Sanabria, Daniel Díaz, Wilmer Garzón, Willson Melo, and Hugo Arboleda


Odor Pleasantness Classification from Electroencephalographic Signals and Emotional States . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . M. A. Becerra, E. Londoño-Delgado, S. M. Pelaez-Becerra, L. Serna-Guarín, A. E. Castro-Ospina, D. Marin-Castrillón, and D. H. Peluffo-Ordóñez Exploration of Characterization and Classification Techniques for Movement Identification from EMG Signals: Preliminary Results . . . . . . . . . A. Viveros-Melo, L. Lasso-Arciniegas, J. A. Salazar-Castro, D. H. Peluffo-Ordóñez, M. A. Becerra, A. E. Castro-Ospina, and E. J. Revelo-Fuelagán An Automatic Approach to Generate Corpus in Spanish . . . . . . . . . . . . . . . Edwin Puertas, Jorge Andres Alvarado-Valencia, Luis Gabriel Moreno-Sandoval, and Alexandra Pomares-Quimbaya Comparing Graph Similarity Measures for Semantic Representations of Documents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Rubén Manrique, Felipe Cueto-Ramirez, and Olga Mariño Knowledge Graph-Based Teacher Support for Learning Material Authoring . . . Christian Grévisse, Rubén Manrique, Olga Mariño, and Steffen Rothkugel Building Alternative Methods for Aiding Language Skills Learning for the Hearing Impaired . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Paula A. Correa D., Juan P. Mejía P., Andrés M. Lenis L., Cristian A. Camargo G., and Andrés A. Navarro-Newball A Training Algorithm to Reinforce Generic Competences in Higher Education Students . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Sara Muñoz, Oscar Bedoya, Edwin Gamboa, and María Trujillo A Structure-from-Motion Pipeline for Topographic Reconstructions Using Unmanned Aerial Vehicles and Open Source Software . . . . . . . . . . . . Jhacson Meza, Andrés G. Marrugo, Enrique Sierra, Milton Guerrero, Jaime Meneses, and Lenny A. Romero CREANDO – Platform for Game Experiences Base on Pervasive Narrative in Closed Spaces: An Educational Experience. . . . . . . . . . . . . . . . Carlos C. Ceron Valdivieso, Jeferson Arango-López, Cesar A. Collazos, and Francisco Luis Gutiérrez Vela Towards a Smart Farming Platform: From IoT-Based Crop Sensing to Data Analytics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Héctor Cadavid, Wilmer Garzón, Alexander Pérez, Germán López, Cristian Mendivelso, and Carlos Ramírez


Instrumented Insole for Plantar Pressure Measurement in Sports . . . . . . . . . . Iván Echeverry-Mancera, William Bautista-Aguiar, Diego Florez-Quintero, Dayana Narvaez-Martinez, and Sonia H. Contreras-Ortiz UP-VSE: A Unified Process - Based Lifecycle Model for Very Small Entities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jhon Alvarez and Julio Hurtado Frame-Level Covariance Descriptor for Action Recognition . . . . . . . . . . . . . Wilson Moreno, Gustavo Garzón, and Fabio Martínez Prediction Model of Electricity Energy Demand for FCU in Colombia Based on Stacking and Text Mining Methods . . . . . . . . . . . . . . . . . . . . . . . Javier H. Velasco Castillo and Andrés M. Castillo Access Control Application Based on the IMS Communication Framework . . . Estefanía Figueroa-Buitrago and Fabio G. Guerrero Positioning of the Cutting Tool of a CNC Type Milling Machine by Means of Digital Image Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . Juan Camilo Londoño Lopera, Jhon Edison Goez Mora, and Edgar Mario Rico Mesa Support Vector Machines for Semantic Relation Extraction in Spanish Language . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jefferson Peña Torres, Raúl Gutierrez de Piñerez Reyes, and Víctor A. Bucheli A Strategy Based on Technological Maps for the Identification of the State-of-the-Art Techniques in Software Development Projects: Virtual Judge Projects as a Case Study . . . . . . . . . . . . . . . . . . . . . . . . . . . Carlos G. Hidalgo Suarez, Vıctor A. Bucheli, Felipe Restrepo-Calle, and Fabio A. Gonzalez Making Decisions on the Student Quota Problem: A Case Study Using a MIP Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Robinson Duque, Víctor Bucheli, Jesús Alexander Aranda, and Juan Francisco Díaz Towards On-Line Sign Language Recognition Using Cumulative SD-VLAD Descriptors. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jefferson Rodríguez and Fabio Martínez Applying CRISP-DM in a KDD Process for the Analysis of Student Attrition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Luis Fernando Castro R., Esperanza Espitia P., and Andrés Felipe Montilla


Fuzzy Logic Model for the Evaluation of Cognitive Training Through Videogames. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Holman Bolivar, Sonia Rios, Karol Garcia, Sandra Castillo, and Cesar Díaz Creating a Software Product Line of Mini-Games to Support Language Therapy. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Luisa Rincón, Juan-C. Martínez, María C. Pabón, Javier Mogollón, and Alejandro Caballero


Segmentation and Detection of Vascular Bifurcations and Crossings in Retinal Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Maria Aguiar, Felipe Castano, and Maria Trujillo


Object-Oriented Mathematical Modeling for Estimating Electric Vehicle’s Range Using Modelica . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . J. A. Dominguez-Jimenez and Javier Campillo


Addressing Motivation Issues in Physical Rehabilitation Treatments Using Exergames . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ruiz Camilo, Gamboa Edwin, Cortes Andres, and Trujillo Maria


HoloEasy, A Web Application for Computer Generated Holograms. . . . . . . . Alberto Patiño-Vanegas, Lenier Leonis Diaz-Pacheco, John Jairo Patiño-Vanegas, and Juan Carlos Martínez-Santos


Integrated Model AmI-IoT-DA for Care of Elderly People . . . . . . . . . . . . . . Andrés Sánchez, Enrique González, and Luis Barreto


Intelligent Hybrid Approach for Computer-Aided Diagnosis of Mild Cognitive Impairment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Juan Camilo Flórez, Santiago Murillo Rendón, Francia Restrepo de Mejía, Belarmino Segura Giraldo, and for The Alzheimer’s Disease Neuroimaging Initiative Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .



Physiological Signals Fusion Oriented to Diagnosis - A Review

Y. F. Uribe¹, K. C. Alvarez-Uribe¹, D. H. Peluffo-Ordoñez², and M. A. Becerra¹

¹ Instituto Tecnológico Metropolitano, Medellín, Colombia ([email protected])
² Yachay Tech, San Miguel de Urcuquí Canton, Ecuador

Abstract. The analysis of physiological signals is widely used for the development of diagnosis support tools in medicine, and it is currently an open research field. The use of multiple signals or physiological measures as a whole has been carried out using data fusion techniques commonly known as multimodal fusion, which has demonstrated its ability to improve the accuracy of diagnostic care systems. This paper presents a review of the state of the art, highlighting the main techniques, challenges, gaps, advantages, disadvantages, and practical considerations of data fusion applied to the analysis of physiological signals oriented to diagnosis decision support. Also, a physiological signal data fusion architecture oriented to diagnosis is proposed.

Keywords: Data fusion · Multimodal fusion · Diagnostic decision support · Signal processing · Physiological signal

1 Introduction

Physiological signals deliver relevant information on the status of the human being, which helps the physician to give a diagnosis for specific pathologies and, therefore, to provide appropriate treatment. However, in many cases these tasks become more complicated, since patients can present several pathologies that must be managed simultaneously. Additionally, physiological parameters change frequently, requiring rapid analysis and high-risk decisions [1] that result from the interpretation by the human expert who analyzes the available clinical evidence. Recently, studies on the analysis of multimodal signals for diagnostic support using data fusion have increased [2, 3]. Data fusion covers the analysis of different sources and types of data. It aims to provide information with less uncertainty [4], potentially allows ubiquitous and continuous monitoring of physiological parameters [5], and reduces adverse effects on the signals due to sensor movements, irregular sampling, bad connections, and signal noise [6–10]. Data fusion can include different processes such as the association, correlation, and combination of data and information obtained from one or multiple sources to identify objects, situations, and threats [11]. This paper presents a literature review of data fusion oriented to clinical diagnosis, discussing and identifying the most common techniques and properties, and highlighting advantages, disadvantages, challenges, lacks, and gaps. This review was


carried out using the Scopus and Web of Science databases, based on these search criteria: (i) (physiological signals) and (diagnosis decision support); and (ii) ((“data fusion”) or (“information fusion”) or (“multimodal”) and (diagnosis or diagnostic)) and (“physiological signals”). The selected papers were reported between 2013 and 2018, principally in quartile 1 and quartile 2 journals. Also, a data fusion framework oriented to clinical diagnosis was proposed for physiological signal processing, based on the Joint Directors of Laboratories (JDL) model. The rest of the document is organized as follows: Section 2 presents a description of the physiological signals; Section 3 describes the most common multimodal fusion models, spotlighting data processing and fusion techniques; Section 4 contains the proposed architecture; and, finally, the conclusions and future work are presented.

2 Physiological Signals Description

Physiological signals provide information that can be analyzed by specialists to determine diagnoses and treatments more accurately; besides, they may be used for retrospective studies by research organizations [12]. Physiological signals are obtained through a large number of biomedical measuring devices, such as multi-parameter vital-sign monitors, electroencephalographs, electrocardiographs, electromyographs, thermometers, motion sensors, oxygen saturation sensors, and glucometers, among others. These signals give a lot of information about the organs, but they suffer from multiple noise problems derived from internal and external causes. Each signal or group of signals has different applications for vital-sign monitoring or diagnosis, such as cardiovascular diseases [13], apneic events [14], assessing the activity of back muscles in patients with scoliosis, identifying locomotion modes, measuring tissue oxygenation, measuring the level of anesthesia during surgery [15], eye tracking [16], non-invasive assessment of blood flow changes in muscle and bone using photoplethysmography (PPG) [17], pulmonary embolism, acute respiratory distress syndrome [18], heart valve disease [19], changes in the severity of aortic regurgitation [20], arterial aging studies [21], human motion disorders [22], and epilepsy [23], among others. Some signals are applied to brain–computer interfaces (BCI), which provide people suffering partial or complete motor impairments with a non-muscular communication channel for transmitting commands to devices that allow managing an application, e.g., computerized spelling, robotic wheelchairs, robotic arms, teleoperated mobile robots, games, or virtual environments [24, 25]. Different signals are analyzed for developing diagnostic support systems; an important group of them capture information synchronously or asynchronously from different organs of the human being. Figure 1 shows a classification of these signals as follows: (i) bioelectric signals: variations of biopotential versus time, e.g., electrocardiography (ECG), electrooculography (EOG), electromyography (EMG), electroencephalography (EEG), and electrocorticography (ECoG); (ii) bioacoustic signals: recordings of body sounds, e.g., phonocardiography (PCG); (iii) biooptic signals: measures based on the light intensity detected from different tissues, body flows, among others, e.g., photoplethysmography (PPG); (iv) biomechanical signals: mainly pressure measures, e.g., blood pressure (BP),


intracranial pressure (ICP), body movement (BM), and systolic volume (SV); (v) bioimpedance signals: correspond to electrodermal activity, e.g., skin conductivity (SC) or galvanic skin response (GSR); and (vi) biochemical signals: based on measures of chemical components, e.g., blood glucose (BG).

Fig. 1. Physiological signals classification
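To make the taxonomy above easy to reuse in a processing pipeline, the following minimal Python sketch encodes it as a lookup structure; the class and dictionary names are illustrative assumptions, not part of the reviewed works.

```python
# Illustrative encoding of the signal taxonomy described above (Fig. 1).
# The groupings mirror the text; they are not a standard nomenclature.
from enum import Enum


class SignalCategory(Enum):
    BIOELECTRIC = "bioelectric"      # ECG, EOG, EMG, EEG, ECoG
    BIOACOUSTIC = "bioacoustic"      # PCG
    BIOOPTIC = "biooptic"            # PPG
    BIOMECHANICAL = "biomechanical"  # BP, ICP, BM, SV
    BIOIMPEDANCE = "bioimpedance"    # SC / GSR
    BIOCHEMICAL = "biochemical"      # BG


SIGNAL_TAXONOMY = {
    "ECG": SignalCategory.BIOELECTRIC, "EOG": SignalCategory.BIOELECTRIC,
    "EMG": SignalCategory.BIOELECTRIC, "EEG": SignalCategory.BIOELECTRIC,
    "ECoG": SignalCategory.BIOELECTRIC, "PCG": SignalCategory.BIOACOUSTIC,
    "PPG": SignalCategory.BIOOPTIC, "BP": SignalCategory.BIOMECHANICAL,
    "ICP": SignalCategory.BIOMECHANICAL, "BM": SignalCategory.BIOMECHANICAL,
    "SV": SignalCategory.BIOMECHANICAL, "SC": SignalCategory.BIOIMPEDANCE,
    "GSR": SignalCategory.BIOIMPEDANCE, "BG": SignalCategory.BIOCHEMICAL,
}


def category_of(signal: str) -> SignalCategory:
    """Return the category of a signal acronym used in this review."""
    return SIGNAL_TAXONOMY[signal]


print(category_of("PPG"))  # SignalCategory.BIOOPTIC
```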

ECG is widely used to understand and investigate cardiac health conditions [2, 26, 27]. EOG is related to eye movement and is derived from the cornea-retinal potential [28, 29]. EMG is acquired using electrodes placed on the skin over a muscle fiber to observe muscle activity; it is also associated with the neural signals sent from the spinal cord to the muscles [30, 31]. EEG signals indicate nervous excitement by detecting brain activity derived from neurons in the brain that communicate through electrical impulses [15, 32, 33]. ECoG records the electrical activity of the brain by means of invasive electrodes [23, 34]. Obtaining information from bioelectric signals becomes extremely difficult due to limited data and the presence of noise, which significantly affects the ability to detect weak sources of interest [26, 35]. PCG acquisition is simple, non-invasive, low-cost, and precise for assessing a wide range of heart diseases (e.g., cardiac murmurs) [19, 36]; however, these recordings are altered by external acoustic sources (such as speech or environmental noise) and physiological interference (such as lung sounds or cough) [37]. Respiratory rate (RR) [18] can be altered by noise and movement artifacts [38]. The PPG signal consists of direct current (DC) and alternating current (AC) components. The AC component represents the changes in arterial blood volume between the systolic and diastolic phases of a cardiac cycle. The DC component corresponds to the light intensity detected from tissues, venous blood, and the non-pulsatile components of arterial blood; an example of the transmission type is a fingertip pulse oximeter (SpO2), which is clinically accepted and widely used. Clinical applications of PPG sensors are limited by their low signal-to-noise ratio (SNR), which is caused by the large volume of skin, muscle, and fat relative to the small pulsatile component of arterial blood [17, 39]. BP is defined by systolic and diastolic pressure and is measured in millimeters of mercury (mmHg); the main forms of noninvasive blood pressure measurement are divided into intermittent and continuous measurements [40, 41], which in turn affect the calculated measure of systolic volume (SV). ICP is the pressure within the skull [42]; BM captures body movements [22, 43]; SC is the electrodermal activity, an indicator of sympathetic activation and a useful tool for investigating


psychological and physiological arousal [44, 45]; and BG indicates the amount of energy in the body [43, 46]. Finally, the temperature measurement (Temp) is a measure of the ability of the body or skin to generate and release heat [3, 43]. These signals can be easily altered by movement and body mass, environmental noise, intermittent connections, etc. Table 1 summarizes some applications of physiological signals in monomodal clinical support systems.

Table 1. Physiological signals applications

ECG: Cardiovascular diseases [13]; apneic events [14]
EMG: Assessing the activity of back muscles in patients suffering from scoliosis [47]; identifying locomotion modes such as level-ground walking, standing, sitting, and ascending/descending stairs and ramps [30]; measuring tissue oxygenation [48]
EEG: The level of anesthesia during surgery [15]
EOG: Eye tracking [16]; Parkinson's disease [49]
PPG: Early detection of pathologies related to the heart [15]; non-invasive assessment of blood flow changes in muscle and bone [17]
RR: Rapid breathing (tachypnea) [18]
PCG: Heart failure [19]; changes in the severity of aortic regurgitation [20]
SV: Arterial aging studies [21]
GSR: Repeatability of measurements of galvanic skin response [45]
Accelerometer: Human motion disorders [22]
Blood glucose: Diabetes or hypoglycemia [46]
BP: Hypotension or hypertension [40]
Temperature: Emotion recognition [50]
ICP: Hydrocephalus [42]
ECoG: Epilepsy [23]
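The PPG description above separates a slowly varying DC component from the pulsatile AC component. The following minimal sketch illustrates one conventional way to split a raw PPG trace with Butterworth filters; the cut-off frequencies, filter orders, and function names are illustrative assumptions rather than recommendations taken from the reviewed studies.

```python
# Minimal sketch: split a PPG trace into DC (baseline) and AC (pulsatile)
# components with Butterworth filters. Cut-offs are illustrative only.
import numpy as np
from scipy.signal import butter, filtfilt


def split_ppg(ppg: np.ndarray, fs: float):
    """Return (dc, ac) components of a raw PPG signal sampled at fs Hz."""
    # DC: keep content below ~0.5 Hz (baseline drift and the static offset).
    b_lo, a_lo = butter(2, 0.5 / (fs / 2), btype="low")
    dc = filtfilt(b_lo, a_lo, ppg)
    # AC: keep the cardiac band, roughly 0.5-8 Hz.
    b_bp, a_bp = butter(2, [0.5 / (fs / 2), 8.0 / (fs / 2)], btype="band")
    ac = filtfilt(b_bp, a_bp, ppg)
    return dc, ac


if __name__ == "__main__":
    fs = 100.0
    t = np.arange(0, 10, 1 / fs)
    # Synthetic PPG-like trace: offset + slow drift + a 1.2 Hz pulse wave.
    ppg = 2.0 + 0.1 * np.sin(2 * np.pi * 0.2 * t) + 0.05 * np.sin(2 * np.pi * 1.2 * t)
    dc, ac = split_ppg(ppg, fs)
    print(dc.mean(), ac.std())
```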

3 Signal Fusion

Multiple pieces of information about the same phenomenon can be acquired from different types of detectors or sensors, under different conditions, and in multiple experiments or subjects. In particular, multimodal fusion refers to the combination of several signals of multiple modalities to improve the performance of the systems, decreasing the uncertainty of their results. Each modality contributes a type of added value that cannot be deduced or obtained from only one type of physiological signal [51, 52]. Several multimodal fusion techniques have been reported in the literature; simple operators such as the sum and the product have been used for data fusion, and these operators have subsequently evolved into more advanced ones, particularly through the results of


soft-computing and fuzzy operator research (Fig. 2) [53]. These techniques are widely discussed in [54] as follows: (i) fusion of imperfect data: approaches capable of representing specific aspects of imperfect data, including probabilistic fusion, evidential belief reasoning, random-set-theoretic fusion, fuzzy reasoning, possibilistic fusion, rough-set-based fusion, and hybrid fusion approaches (the main idea behind the development of hybrid fusion algorithms is that different fusion methods complement each other to give a more precise approach); (ii) fusion of correlated data: approaches that require either independence or prior knowledge of the cross-covariance of the data to produce consistent results; (iii) fusion of inconsistent data: approaches that address the notion of data inconsistency (spurious data, out-of-sequence data, conflicting data); and (iv) fusion of disparate data: approaches whose input data are generated by a wide variety of sensors, humans, or even stored sensory data [54]. However, the most used categorization is described in [11, 52, 55–57] and consists of three types of fusion: (i) early fusion: the characteristics obtained from different modalities are combined into a single representation before feeding the learning phase; it is known as feature fusion, and its major advantage is the detection of correlated features generated by different sensor signals so as to identify a feature subset that improves recognition accuracy, whereas its main drawback is that finding the most significant feature subset typically requires large training sets [11, 50, 58]; (ii) intermediate fusion: it can cope with imperfect data, along with the problems of reliability and asynchrony between different modalities; and (iii) late fusion [59]: also known as decision-level fusion, each modality is processed separately by a first recognizer, and another model is trained on the unimodal predictions to predict the actual single-modal gold standard [33]; its main advantages include communication bandwidth savings and improved decision accuracy. Another important aspect of decision fusion is the combination of heterogeneous sensors whose measurement domains have been processed with different algorithms [11, 50, 58, 60].

Fig. 2. Evolution of data fusion operators [53]
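As a concrete illustration of the early (feature-level) fusion scheme described above, the sketch below concatenates per-modality feature vectors before training a single classifier; the toy features, modalities, and scikit-learn model are assumptions made only for illustration.

```python
# Minimal sketch of early (feature-level) fusion: features computed per
# modality are concatenated into one vector before a single classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression


def simple_features(window: np.ndarray) -> np.ndarray:
    """Toy per-modality features: mean, standard deviation, peak-to-peak."""
    return np.array([window.mean(), window.std(), window.max() - window.min()])


def early_fusion_vector(ecg_win, eeg_win, gsr_win) -> np.ndarray:
    """Concatenate features from time-aligned ECG, EEG, and GSR windows."""
    return np.concatenate([simple_features(ecg_win),
                           simple_features(eeg_win),
                           simple_features(gsr_win)])


# Toy training data (random noise; real use needs labeled, aligned windows).
rng = np.random.default_rng(0)
X = np.stack([early_fusion_vector(rng.normal(size=500),
                                  rng.normal(size=500),
                                  rng.normal(size=500)) for _ in range(40)])
y = rng.integers(0, 2, size=40)
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict(X[:3]))
```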

The simplest approach to multimodal analysis is to design a classifier per modality and to join the outputs of these classifiers, e.g., combining a visual model and a text model under the assumption that they are independent, so that the probabilities are simply


multiplied [61]. Nevertheless, accurate synchronization of multimodal data streams is critical to avoid parameter skews in the analysis. Table 2 summarizes the advantages and disadvantages of multimodal fusion.

Table 2. Advantages and disadvantages of multimodal fusion

Advantages:
- Improved signal-to-noise ratio
- Reduced ambiguity and uncertainty
- Increased confidence
- Enhanced robustness and reliability
- Improved resolution, precision, and hypothesis discrimination
- Interaction of the human with the machine
- Integration of independent features and prior knowledge [33, 58]

Disadvantages:
- Uncertainties in sensors arise from the ambiguities and inconsistencies present in the environment and from the inability to distinguish between them [54]
- Signal processing techniques are required
- Data distributed with a similar semantics cannot be directly fused and should be processed separately
- Primary data are only available for a short time, as in the case of stream data, which are usually processed in real time and then deleted after storing the analysis results [63]
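A minimal sketch of the decision-level (late) fusion product rule mentioned above, in which independent per-modality classifiers output class probabilities that are simply multiplied; the toy probability values are assumptions for illustration. In practice, each probability array would come from a per-modality model's predicted probabilities on synchronized windows.

```python
# Minimal sketch of decision-level (late) fusion by the product rule:
# each modality has its own classifier, and the predicted class
# probabilities are multiplied, assuming modalities are independent.
import numpy as np


def product_rule(prob_per_modality):
    """Combine a list of (n_samples, n_classes) probability arrays."""
    combined = np.ones_like(prob_per_modality[0])
    for p in prob_per_modality:
        combined *= p
    # Renormalize so each row sums to one again.
    return combined / combined.sum(axis=1, keepdims=True)


# Toy example: two modalities, three samples, two classes.
p_ecg = np.array([[0.8, 0.2], [0.4, 0.6], [0.55, 0.45]])
p_eeg = np.array([[0.7, 0.3], [0.3, 0.7], [0.35, 0.65]])
fused = product_rule([p_ecg, p_eeg])
print(fused.argmax(axis=1))  # fused class decision per sample
```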

In general, the main problems of multimodal data processing are that the data must be processed separately and combined only at the end, the dimensionality of the joint feature space, different feature formats, and time alignment. Information theory provides a set of information measures that assess not only the amount of information that a single source of data contains, but also the amount of information that two sources of data have in common [52, 61]. Table 3 shows multiple studies on the fusion of several physiological signals, alongside the techniques applied for specific clinical diagnostic decision support and their respective accuracy (Acc). We highlight the applications in emotion recognition, in monitoring and reducing false alarms in heart diagnosis, and the applicability of ECG signals for fusion with other signals for several diagnostics. Table 3. Multimodal fusion systems Ref [64]

Fused signals RR and ECG

[65]

ECG, EMG, SC and RR Acc: 71% ECG, EMG, EOG, SC, RR, and finger Temp Acc: 67.5% arousal and 73.8% valence BP and SC

[10]

[66]

[50]

BP, EMG, SC, SKT and FR Acc: 78.9%

Techniques Modified Kalman-Filter (KF) framework Hilbert-HuangTransform (HHT) Classifier fusion (Linear and Quadratic Discriminant Analysis with diagonal covariance matrix estimation) Algorithm sequence pattern mining and artificial neural network Viola-Jones face detector, Shi & Thomasi method, Euclidean distance and feature-level fusion

Diagnostic Estimating respiratory rate Emotion recognition




Table 3. (continued) Ref [67]

Fused signals GSR, attitude of the head, eyes and facial expressions

[52]

EEG, GSR, EMG and EOG Acc: 85% ECG and SpO2

[5]

[8]

[2]

ECG, PA, SV, PPG and EEG Acc: 89.63% ECG

[68]

EEG and EOG Acc: 97.3%

[69]

Change eye gaze direction and duration of flicker Acc: 70% BP, ECG, EEG, EMG, Spo2, FC, Temp and BG

[43]

[71]

ECG and PCG Acc: 97%

Techniques Reference model (CSALP), valence-arousal method, boosting algorithm, model (ASM), Haar-like features, flow-based algorithm, POSIT algorithms, RANSAC regression, entropy, SVMbased method, Support vector machine (SVM), filters and multimodal fusion Discrete wavelet transform

Diagnostic

Stochastic Petri net (SPN) and Wearable health monitoring system (WHMS) Robust algorithm

Improve monitoring and reduce the false alarms

Beat-by-beat algorithm, Function ‘gqrs’ of the WFDB toolbox, Open-source algorithm, ‘wabp’ of the WFDB Toolbox and candidate detections ratio (CDR) Approximate entropy (ApEn), Sample entropy (SampEn), Renyientropy (RenEn), Recurrence quantification analysis (RQA), Extreme learning machine (ELM) and wavelet-based nonlinear features SLD (Standard Lateral Deviation), D-S, decision fusion Preprocessing, puts filter, selfadaptive, data compression (CR and PRD), Gateway data fusion, fuzzy logic, artificial neural networks, support vector machines and classification (specificity and sensitivity) Wavelet transform, discrete wavelet transform STFT, band pass filter and decision fusion

Location of the heart beat

Predict emotions

Drowsiness

Heart rate variability [70]

(continued)



Ref [60] [72]

Fused signals BP, ECG and FC Acc: 99.7% ECG and accelerometer Acc: 99%

[73]

ECoG

[7]

BP and ECG Acc: 99.4%

[1]

ECG, BP and PPG

[6]

BP, ECG and RR Acc: 94.15% ECG, GSR, rotation of the head, movement of the eyes and yawn

[75]

[76]

Essential tremor (ET), Parkinson’s disease (PD), physiological tremor (PT) and EMG Acc: 99.6%

[42]

ICP

[77]

FC

[55]

BP, ECG and EEG Acc: 86.26%

Techniques The Processing Elements (PEs) and decision-level fusion Hamilton-Tompkins algorithm, bandpass filter, wavelet transform and data fusion algorithm Criterion of Neyman-Pearson, preprocessing, fusion channels unification and voting, ROC curve and area under the curve (AUC) Kalman Filter (KF), fusion technique Townsend and Tarassenko and signal quality index (SQI) PCA (principal component analysis), Kalman filter, LSP (Lomb - Scargleperiodogram) and data fusion covariance DWT (Discrete Wavelet transform) and decision fusion FFT, fusion based on Bayesian network data, pre-filter Butterworth fission and Gaussian filter EMD (Empirical mode decomposition), DWT (discrete wavelet transform), D S (Dempster-Shafer), BPNN (back-propagation neural network) and decision fusion The median and the tendency of the waveform, FIR (low pass filter), evidence fusion and global fusion Fuzzy logic, Neural networks, Bayesian probability and belief network Signal quality index (SQI), Estimation of regular intervals, Heartbeats detection, adaptative filter, Multimodal fusion and QRS detection

Diagnostic Hypotension and hypertension [40] Congestive heart failure and sleep apnea and asthma Epilepsy

Left ventricular hypertrophy [74]

Arrhythmias

Fatigue and stress

Tremor

Hydrocephalus

Hypovolemia

Alterations in cardiac autonomic control peripheral [78]


4 Proposed Model

Different data fusion architectures and methodologies have been reported in [11, 60, 79, 80], based on the Joint Directors of Laboratories (JDL) model, which focuses on the abstraction level of the data manipulated by a fusion system. We propose a general framework for the processing and fusion of multimodal physiological signals oriented to diagnostic support systems. The architecture consists of four levels (Fig. 3). Level 0 acquires the different physiological signals and performs the pre-processing, which consists of filtering, feature extraction, and normalization stages. Level 1 is composed of a spatio-temporal alignment and a data correlation stage; the latter checks the consistency of the information, i.e., if the information is not consistent it is fed back to the pre-processing stage, otherwise the process continues. Subsequently, the association of information executes a classification with multiple hypothesis testing, which tracks multiple targets in dense environments with the help of Bayesian networks or similar techniques, providing labels for each signal obtained from the sensors. When the target position is doubtful, data estimation is performed with the maximum a posteriori (MAP) method, which is based on Bayesian theory and is used when the parameter X to be estimated is the output of a random variable with a known prior probability function P(X).
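For reference, the maximum a posteriori estimate invoked in the association and estimation stage can be written in its standard textbook form as follows (this is the general definition, not a formula taken from the proposed architecture):

```latex
% Standard MAP estimate: x is the quantity to estimate, z the observed
% signal data, P(x) the known prior, and P(z | x) the likelihood.
\hat{x}_{\mathrm{MAP}} = \arg\max_{x} P(x \mid z)
                       = \arg\max_{x} P(z \mid x)\, P(x)
```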

[Figure 3 shows the proposed block diagram: sensors and pre-processing (filtering, feature extraction, normalization) at Level 0; spatio-temporal alignment, data correlation, association (MHT) and estimation (MAP), feature fusion, and false-alarm elimination at Level 1; machine-learning-based pathology identification at Level 2; and decision fusion with clinical diagnosis, treatment, and valuation of risk and impact at Level 3.]

Fig. 3. Proposed data fusion architecture oriented to diagnosis.


The system then performs an analysis verifying the status of the labels; if, at any moment, a label different from those assigned to the physiological parameters is identified, it is treated as a false alarm and eliminated by means of the corresponding algorithm. Afterwards, the sets of characteristics obtained are fused to form vectors of significant features. Consequently, Level 2 has the function of determining the possible pathologies presented by the patient through learning machines. Finally, Level 3 includes the decision level, which determines the best hypothesis for the pathology, providing a clinical diagnosis and a possible treatment; besides, it determines the assessment, risk, and impact of the process based on a forecasting system. All stages allow the inclusion of hard and soft data, context information, and medical criteria, together with a mapping system based on performance quality metrics that allows optimizing the processing. The proposed model was developed to diminish the high rate of false alarms in continuous monitoring services and to supply a timely diagnosis and a possible treatment for the pathology of the patient, providing support to the specialist.
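To make the four-level flow tangible, the following highly simplified Python skeleton mirrors the stages described above; every function body is a placeholder assumption, not an implementation of the proposed system. A real deployment would replace each placeholder with the filtering, alignment, MAP-based association, and trained classifiers discussed in the text.

```python
# Hypothetical skeleton of the four-level fusion flow described above.
from dataclasses import dataclass, field


@dataclass
class FusionPipeline:
    false_alarm_labels: set = field(default_factory=set)

    def level0_preprocess(self, raw_signals):
        """Level 0: filter, extract features, and normalize each signal."""
        return {name: self._normalize(sig) for name, sig in raw_signals.items()}

    def level1_align_and_fuse(self, features):
        """Level 1: align, correlate, associate (e.g., MAP), drop false alarms."""
        kept = {k: v for k, v in features.items()
                if k not in self.false_alarm_labels}
        return [x for v in kept.values() for x in v]   # fused feature vector

    def level2_diagnose(self, fused_features):
        """Level 2: map fused features to candidate pathologies (ML model)."""
        return {"pathology_A": 0.7, "pathology_B": 0.3}  # placeholder scores

    def level3_decide(self, pathology_scores):
        """Level 3: decision fusion, pick best hypothesis, suggest treatment."""
        best = max(pathology_scores, key=pathology_scores.get)
        return {"diagnosis": best, "suggested_treatment": f"protocol for {best}"}

    @staticmethod
    def _normalize(signal):
        m = sum(signal) / len(signal)
        s = (sum((x - m) ** 2 for x in signal) / len(signal)) ** 0.5 or 1.0
        return [(x - m) / s for x in signal]


pipe = FusionPipeline()
feats = pipe.level0_preprocess({"ECG": [1.0, 2.0, 3.0], "SpO2": [97.0, 96.0, 98.0]})
decision = pipe.level3_decide(pipe.level2_diagnose(pipe.level1_align_and_fuse(feats)))
print(decision)
```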

5 Conclusion

In this work, multiple physiological signals were discussed alongside multimodal data fusion systems applied in clinical diagnosis support, highlighting advantages, disadvantages, shortcomings, and challenges. The review highlights the capability of multimodal data fusion systems to obtain more reliable and robust psychological or physiological information from multiple sources with respect to unimodal systems, revealing an increase in the accuracy of diagnoses and demonstrating the complementarity of modalities. Additionally, multimodal data fusion yields important insights into processes and structures, spatio-temporal resolution complementarity, a comprehensive physiological view, quantification, generalization, and normalization [81]. Nevertheless, accurate synchronization of multimodal data streams is critical to avoid parameter skews in the analysis, and for some diagnoses the reported results can be considered low; therefore, studies in this field must continue. We consider that other signals can be included in data fusion systems and complemented with information quality evaluation systems such as the one proposed in [82]. In addition, we proposed a physiological signal fusion architecture based on the JDL model, in order to provide a more reliable diagnosis and treatment based on evidence and to support the specialist in their decisions; the interface for the model will provide continuous monitoring, without alterations, with minimum response times, and will be easy to use. Finally, to develop more effective clinical decision support mechanisms, the proposed architecture covers all levels of development of diagnostic assistance systems in the health field, taking into account the gaps found in the literature, such as the lack of traceability of the systems from acquisition to results, visualizations, and treatments, as well as other problems such as signals that cannot be directly merged and must be processed separately, the low availability of data over time, the high computational cost of complex models, and limitations in the assessment of situation and risk.


References 1. Clifford, G.D., Long, W.J., Moody, G.B., Szolovits, P.: Robust parameter extraction for decision support using multimodal intensive care data. Philos. Trans. A. Math. Phys. Eng. Sci. 367(1887), 411–429 (2009) 2. Mollakazemi, M.J., Atyabi, S.A., Ghaffari, A.: Heart beat detection using a multimodal data coupling method. Physiol. Meas. 36(8), 1729–1742 (2015) 3. Soleymani, M., Lichtenauer, J., Pun, T., Pantic, M.: A multimodal database for affect recognition and implicit tagging. IEEE Trans. Affect. Comput. 3(1), 42–55 (2012) 4. Begum, S., Barua, S., Filla, R., Ahmed, M.U.: Classification of physiological signals for wheel loader operators using multi-scale entropy analysis and case-based reasoning. Expert Syst. Appl. 41(2), 295–305 (2014) 5. Pantelopoulos, A., Bourbakis, N.: SPN-model based simulation of a wearable health monitoring system. In: Proceedings of the 31st Annual International Conference of the IEEE Engineering in Medicine and Biology Society Engineering the Future of Biomedicine, EMBC 2009, pp. 320–323 (2009) 6. Ryoo, H.C., Sun, H.H., Hrebien, L.: Two compartment fusion system designed for physiological state monitoring. In: Annual Reports Res. React. Inst., pp. 2224–2227 (2001) 7. Li, Q., Mark, R.G., Clifford, G.D.: Artificial arterial blood pressure artifact models and an evaluation of a robust blood pressure and heart rate estimator. Biomed. Eng. Online 15, 1–15 (2009) 8. Galeotti, L., Scully, C.G., Vicente, J., Johannesen, L., Strauss, D.G.: Robust algorithm to locate heart beats from multiple physiological waveforms by individual signal detector voting. Physiol. Meas. 36(8), 1705–1716 (2015) 9. Tsiliki, G., Kossida, S.: Fusion methodologies for biomedical data. J. Proteomics 74(12), 2774–2785 (2011) 10. Setz, C., Schumm, J., Lorenz, C., Arnrich, B., Tröster, G.: Using ensemble classifier systems for handling missing data in emotion recognition from physiology: one step towards a practical system. In: Affective Computing and Intelligent Interaction (ACII 2009), pp. 1–8 (2009) 11. Castanedo, F.: A review of data fusion techniques. Sci. World J. 2013, 704504 (2013) 12. Patil, R.: Digital signal preservation approaches of archived biomedical paper records - a review. In: 5th International Conference on Wireless Networks and Embedded Systems, WECON 2016, pp. 13–16 (2016) 13. Liu, T., Si, Y., Wen, D., Zang, M., Lang, L.: Dictionary learning for VQ feature extraction in ECG beats classification. Expert Syst. Appl. 53, 129–137 (2016) 14. Alvarez-Estevez, D., Moret-Bonillo, V.: Spectral heart rate variability analysis using the heart timing signal for the screening of the sleep apnea–hypopnea syndrome. Comput. Biol. Med. 71, 14–23 (2016) 15. Liu, Q., Chen, Y.F., Fan, S.Z., Abbod, M.F., Shieh, J.S.: A comparison of five different algorithms for EEG signal analysis in artifacts rejection for monitoring depth of anesthesia. Biomed. Sig. Process. Control 25, 24–34 (2016) 16. Mack, D.J., Schönle, P.: An EOG-based, head-mounted eye tracker with 1 kHz sampling rate. In: IEEE Biomedical Circuits and Systems Conference: Engineering for Healthy Minds and Able Bodies, BioCAS, pp. 7–10 (2015) 17. Khan, M., et al.: Analysing the effects of cold, normal, and warm digits on transmittance pulse oximetry. Biomed. Sig. Process. Control 26, 34–41 (2016) 18. Janik, P., Janik, M.A., Wróbel, Z.: Integrated micro power frequency breath detector. Sens. Actuators A Phys. 239, 79–89 (2016)



19. Essentials, F., Taylor, A.J.: Learning Cardiac Auscultation. Springer, London (2015). https:// doi.org/10.1007/978-1-4471-6738-9 20. Francisco, J., et al.: Changes in the severity of aortic regurgitation at peak effort during exercise ☆. Int. J. Cardiol. 228, 145–148 (2017) 21. Chuiko, G.P., Dvornik, O.V., Shyian, S.I., Baganov, Y.A.: A new age-related model for blood stroke volume. Comput. Biol. Med. 79(Oct), 144–148 (2016) 22. Lorenzi, P., Rao, R., Romano, G., Kita, A., Irrera, F.: Mobile devices for the real-time detection of specific human motion disorders. IEEE Sens. J. 16(23), 8220–8227 (2016) 23. Takaura, K., Tsuchiya, N., Fujii, N.: Frequency-dependent spatiotemporal profiles of visual responses recorded with subdural ECoG electrodes in awake monkeys: differences between high- and low-frequency activity. NeuroImage 124, 557–572 (2016) 24. Antelis, J.M., Gudi, B., Eduardo, L., Sanchez-ante, G., Sossa, H.: Dendrite morphological neural networks for motor task recognition from electroencephalographic signals. Biomed. Sig. Process. Control 44, 12–24 (2018) 25. Becerra, M.A., Alvarez-Uribe, K.C., Peluffo-Ordoñez, D.H.: Low data fusion framework oriented to information quality for BCI systems. In: Rojas, I., Ortuño, F. (eds.) IWBBIO 2018. LNCS, vol. 10814, pp. 289–300. Springer, Cham (2018). https://doi.org/10.1007/9783-319-78759-6_27 26. Kaur, H., Rajni, R.: On the detection of cardiac arrhythmia with principal. Wirel. Pers. Commun. 97(4), 5495–5509 (2017) 27. Rajesh, K.N.V.P.S., Dhuli, R.: Biomedical signal processing and control classification of imbalanced ECG beats using re-sampling techniques and AdaBoost ensemble classifier. Biomed. Sig. Process. Control 41, 242–254 (2018) 28. Mulam, H.: Optimized feature mapping for eye movement recognition using electrooculogram signals. In: 8th International Conference on Computing, Communication and Networking Technologies, ICCCNT 2017 (2017) 29. Lv, Z., Zhang, C., Zhou, B., Gao, X., Wu, X.: Design and implementation of an eye gesture perception system based on electrooculography. Expert Syst. Appl. 91, 310–321 (2018) 30. Young, A.J., Kuiken, T.A., Hargrove, L.J.: Analysis of using EMG and mechanical sensors to enhance intent recognition in powered lower limb prostheses. J. Neural Eng. 11(5), 56021 (2014) 31. Kaur, A., Agarwal, R., Kumar, A.: Adaptive threshold method for peak detection of surface electromyography signal from around shoulder muscles. J. Appl. Stat. 4763, 714–726 (2018) 32. Khurana, V., Kumar, P., Saini, R., Roy, P.P.: ScienceDirect EEG based word familiarity using features and frequency bands combination Action editor: Ning Zhong. Cogn. Syst. Res. 49, 33–48 (2018) 33. Koelstra, S.: Deap: a database for emotion analysis; using physiological signals. IEEE Trans. Affect. Comput. 3(1), 18–31 (2012) 34. Degenhart, A.D., Hiremath, S.V., Yang, Y.: Remapping cortical modulation for electrocorticographic brain–computer interfaces: a somatotopy-based approach in individuals with upper-limb paralysis. J. Neural Eng. 15(2), 026021 (2018) 35. Ravan, M.: Beamspace fast fully adaptive brain source localization for limited data sequences. Inverse Probl. 33(5), 055021 (2017) 36. Alonso-ar, M.A., Ibarra-hern, R.F., Cruz-guti, A., Licona-ch, A.L., Villarreal-reyes, S.: Design and evaluation of a parametric model for cardiac sounds. Comput. Biol. Med. 89 (Aug), 170–180 (2017) 37. Babu, K.A., Ramkumar, B., Manikandan, M.S.: Real-time detection of S2 sound using simultaneous recording of PCG and PPG. 
In: IEEE Region 10 Annual International Conference, pp. 1475–1480 (2017)

Physiological Signals Fusion Oriented to Diagnosis - A Review

13

38. Prabha, A., Trivedi, A., Kumar, A.A., Kumar, C.S.: Automated system for obstructive sleep apnea detection using heart rate variability and respiratory rate variability. In: International Conference on Advances in Computing, pp. 1303–1307 (2017) 39. Lee, H., Chung, H., Ko, H., Lee, J.: Wearable multichannel photoplethysmography framework for heart rate monitoring during intensive exercise. IEEE Sens. J. 18(7), 2983– 2993 (2018) 40. Oliveira, C.C., Machado Da Silva, J.: A fuzzy logic approach for highly dependable medical wearable systems. In: Proceedings of the 2015 IEEE 20th International Mixed-Signal Testing Workshop, IMSTW 2015 (2015) 41. Li, J., et al.: Design of a continuous blood pressure measurement system based on pulse wave and ECG signals. IEEE J. Transl. Eng. Heal. Med. 6(Jan), 1–14 (2018) 42. Conte, R., Longo, M., Marano, S., Matta, V., Elettrica, I., Dea, A.: Fusing evidences from intracranial pressure data using dempster-shafer theory. In: 15th International Conference on Digital Signal Processing, pp. 159–162 (2007) 43. Al-Saud, K., Mahmuddin, M., Mohamed, A.: Wireless body area sensor networks signal processing and communication framework: survey on sensing, communication technologies, delivery and feedback. J. Comput. Sci. 8(1), 121–132 (2012) 44. Torniainen, J., Cowley, B., Henelius, A., Lukander, K., Pakarinen, S.: Feasibility of an electrodermal activity ring prototype as a research tool. In: IEEE Engineering in Medicine and Biology Society, EMBS, pp. 6433–6436 (2015) 45. Muller, J., et al.: Repeatability of measurements of galvanic skin response – a pilot study. Open Complement. Med. J. 5(1), 11–17 (2013) 46. Wang, Y.-Z., et al.: Nonenzymatic electrochemiluminescence glucose sensor based on quenching effect on luminol using attapulgite–TiO2. Sens. Actuators B Chem. 230, 449–455 (2016) 47. Belgacem, N., Fournier, R., Nait-Ali, A., Bereksi-Reguig, F.: A novel biometric authentication approach using ECG and EMG signals. J. Med. Eng. Technol. 39(4), 226– 238 (2015) 48. Kume, D., Akahoshi, S., Yamagata, T., Wakimoto, T., Nagao, N.: Does voluntary hypoventilation during exercise impact EMG activity? SpringerPlus 5(1), 149 (2016) 49. Stuart, S., Galna, B., Lord, S., Rochester, L.: A protocol to examine vision and gait in Parkinson’s disease: impact of cognition and response to visual cues [version 2; referees: 2 approved] Referee Status, pp. 1–18 (2016) 50. Abdat, F., Maaoui, C., Pruski, A.: Bimodal system for emotion recognition from facial expressions and physiological signals using feature-level fusion. In: Symposium on Computer Modeling and Simulation, pp. 24–29 (2011) 51. Zapata, J.C., Duque, C.M., Rojas-Idarraga, Y., Gonzalez, M.E., Guzmán, J.A., Becerra Botero, M.A.: Data fusion applied to biometric identification – a review. In: Solano, A., Ordoñez, H. (eds.) CCC 2017. CCIS, vol. 735, pp. 721–733. Springer, Cham (2017). https:// doi.org/10.1007/978-3-319-66562-7_51 52. Verma, G.K., Tiwary, U.S.: Multimodal fusion framework: a multiresolution approach for emotion classification and recognition from physiological signals. NeuroImage 102(P1), 162–172 (2014) 53. Soria-Frisch, A., Riera, A., Dunne, S.: Fusion operators for multi-modal biometric authentication based on physiological signals. In: IEEE International Conference on Fuzzy Syst, FUZZ 2010, pp. 18–23 (2010) 54. Khaleghi, B., Khamis, A., Karray, F.O., Razavi, S.N.: Multisensor data fusion: a review of the state of the art. Inf. Fusion 14(1), 28–44 (2013)

14

Y. F. Uribe et al.

55. Jeon, T., Yu, J., Pedrycz, W., Jeon, M., Lee, B., Lee, B.: Robust detection of heartbeats using association models from blood pressure and EEG signals. Biomed. Eng. Online 15, 1– 14 (2016) 56. Lahat, D., Adali, T., Jutten, C.: Multimodal data fusion: an overview of methods, challenges, and prospects. Proc. IEEE 103(9), 1449–1477 (2015) 57. Van Gerven, M.A.J., Taal, B.G., Lucas, P.J.F.: Dynamic Bayesian networks as prognostic models for clinical patient management. J. Biomed. Inform. 41, 515–529 (2008) 58. Gravina, R., Alinia, P., Ghasemzadeh, H., Fortino, G.: Multi-sensor fusion in body sensor networks: state-of-the-art and research challenges. Inf. Fusion 35, 68–80 (2017) 59. Ringeval, F., et al.: Prediction of asynchronous dimensional emotion ratings from audiovisual and physiological data. Pattern Recognit. Lett. 66, 22–30 (2015) 60. Alemzadeh, H., Saleheen, M.U., Jin, Z., Kalbarczyk, Z., Iyer, R.K.: RMED: a reconfigurable architecture for embedded medical monitoring. In: 2011 IEEE/NIH Life Science Systems and Applications Workshop, pp. 112–115 (2011) 61. Magalhães, J., Rüger, S.: Information theoretic semantic multimedia indexing. In: Proceedings of the 6th ACM International Conference on Image and Video Retrieval, pp. 619–626 (2007) 62. Sivanathan, A., Lim, T., Louchart, S., Ritchie, J.: Temporal multimodal data synchronisation for the analysis of a game driving task using EEG. Entertain. Comput. 5(4), 323–334 (2014) 63. Ruiz, M.D., Gómez-Romero, J., Molina-Solana, M., Ros, M., Martin-Bautista, M.J.: Information fusion from multiple databases using meta-association rules. Int. J. Approx. Reason. 80, 185–198 (2017) 64. Nemati, S., Malhotra, A., Clifford, G.D.: Data fusion for improved respiration rate estimation. EURASIP J. Adv. Sig. Process. 2010, 926305 (2010) 65. Zong, C.Z.C., Chetouani, M.: Hilbert-Huang transform based physiological signals analysis for emotion recognition. In: 2009 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT), pp. 334–339 (2009) 66. Martínez, H., Yannakakis, G.: Mining multimodal sequential patterns: a case study on affect detection. In: International Conference on Multimodal, pp. 3–10 (2011) 67. Chen, J., Luo, N., Liu, Y., Liu, L., Zhang, K., Kolodziej, J.: A hybrid intelligence-aided approach to affect-sensitive e-learning. Computing 98(1–2), 215–233 (2016) 68. Chen, L., Zhao, Y., Zhang, J., Zou, J.: Automatic detection of alertness/drowsiness from physiological signals using wavelet-based nonlinear features and machine learning. Expert Syst. Appl. 42(21), 7344–7355 (2015) 69. Su, H., Zheng, G.: A non-intrusive drowsiness related accident prediction model based on DS evidence theory. In: 1st International Conference on Bioinformatics and Biomedical Engineering, ICBBE, pp. 570–573 (2007) 70. Cosoli, G., Casacanditella, L., Tomasini, E., Scalise, L.: Evaluation of heart rate variability by means of laser doppler vibrometry measurements. J. Phys. Conf. Ser. 658, 12002 (2015) 71. Fatemian, S.Z., Agrafioti, F., Hatzinakos, D.: HeartID: cardiac biometric recognition. In: IEEE 4th International Conference Biometrics Theory, Applications and Systems, BTAS 2010, pp. 1–5 (2010) 72. Pantelopoulos, A., Saldivar, E., Roham, M.: A wireless modular multi-modal multi-node patch platform for robust biosignal monitoring. In: Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS, pp. 6919– 6922 (2011) 73. 
Zreik, M., Ben-Tsvi, Y., Taub, A., Almog, R.O., Messer, H.: Detection of auditory stimulus onset in the pontine nucleus using a multichannel multi-unit activity electrode. In: IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP, vol. 2, no. 17, pp. 2708–2711 (2011)

Physiological Signals Fusion Oriented to Diagnosis - A Review

15

74. Ueda, H., Miyawaki, M., Hiraoka, H.: High-normal blood pressure is associated with newonset electrocardiographic left ventricular hypertrophy. J. Hum. Hypertens. 29(1), 9–13 (2015) 75. Benoit, A., et al.: Multimodal focus attention and stress detection and feedback in an augmented driver simulator. Pers. Ubiquitous Comput. 13(1), 33–41 (2009) 76. Ai, L., Wang, J., Wang, X.: Multi-features fusion diagnosis of tremor based on artificial neural network and D–S evidence theory. Sig. Process. 88, 2927–2935 (2008) 77. Sukuvaara, T., Heikela, A.: Computerized patient monitoring. Acta Anaesthesiol. Scand. 37, 185–189 (1993) 78. Liou, L.M., et al.: Functional connectivity between parietal cortex and the cardiac autonomic system in uremics. Kaohsiung J. Med. Sci. 30(3), 125–132 (2014) 79. Almasri, M.M., Elleithy, K.M.: Data fusion models in WSNs: comparison and analysis. In: Proceedings of 2014 Zone 1 Conference of the American Society for Engineering Education -Engineering Education: Industry Involvement and Interdisciplinary Trends, ASEE Zone 1, no. 203 (2014) 80. Synnergren, J., Gamalielsson, J., Olsson, B.: Mapping of the JDL data fusion model to bioinformatics. In: Conference Proceedings - IEEE International Conference on Systems, Man and Cybernetics, pp. 1506–1511 (2007) 81. Uluda, K., Roebroeck, A.: General overview on the merits of multimodal neuroimaging data fusion. NeuroImage 102(P1), 3–10 (2014) 82. Mohamed, S., Haggag, S., Nahavandi, S., Haggag, O.: Towards automated quality assessment measure for EEG signals. Neurocomputing 237, 281–290 (2017)

Optimized Artificial Neural Network System to Select an Exploration Algorithm for Robots on Bi-dimensional Grids Liesle Caballero1,2(&) , Mario Jojoa1(&) and Winston Percybrooks1(&)

,

1

Universidad del Norte, Barranquilla, Colombia {lieslec,jojoam,wpercyb}@uninorte.edu.co 2 Institución Universitaria ITSA, Barranquilla, Colombia

Abstract. This article shows how Machine learning techniques are tested to predict the performance of different exploration algorithms: Random Walk, Random Walk WSB and Q Learning, for robots moving on a bi-dimensional grid. The overall objective is to create a tool to help select the best performing exploration algorithm according to a configurable testing scenario, without the need to perform new experiments, either physical or simulated. The work presented here focuses on optimizing the topology of an Artificial Neural Network (ANN) to improve prediction results versus a previously proposed approach. The Hill Climbing algorithm is tested as optimization method, compared with manual trial and error optimization. The ANN was selected because it has the best performance indicators in terms of Relative Absolute Error and Pearson Correlation Coefficient compared with Random Forest and Decision Trees. The metric used to measure the performance of the exploration algorithms is Maximum Number of Steps to target. Keywords: Machine learning  2D grid exploration  Artificial neural network Optimization algorithms  Robots

1 Background The problem of exploration of an unknown environment has utmost importance in the area of mobile robotics due to its wide real-world applications, such as space exploration, search & rescue, hazardous material handling and military operations, among others [1, 2]. As a result, numerous publications about exploration algorithms have appeared in recent years [3–10]. In all these works, the problem of how to comparatively evaluate different competing exploration strategies is solved using a limited set of experimental conditions that are typically run in simulators. From the simulation results, evaluation metrics are quantified and used to select the algorithm with the best performance. If any change is made on the testing scenarios, then new simulations are required to determine the comparative performance of the algorithms under consideration. The work presented here shows an alternative approach to evaluate and compare the performance of exploration algorithms. The general objective is to develop a © Springer Nature Switzerland AG 2018 J. E. Serrano C. and J. C. Martínez-Santos (Eds.): CCC 2018, CCIS 885, pp. 16–27, 2018. https://doi.org/10.1007/978-3-319-98998-3_2

Optimized Artificial Neural Network System

17

practical tool that predicts the performance of a given exploration algorithm under a configurable testing scenario, without needing additional experiments (either physical or simulated). The proposed approach, first described by the authors in [11], is based on predicting, as opposed to measuring through new experiments, the performance of the algorithms under the new testing scenarios. This method uses a prediction model extracted from measured experimental algorithm’s performance under initial testing scenarios. Under this approach, algorithm’s performance is treated as a random variable that can be modeled for prediction. Traditional solutions for predicting the behavior of a random variable use statistical linear regression models and estimation of its probability density function (PDF) [12– 15]. However, as it is shown in [11], these solutions have many limitations related to meeting model requirements when they are applied to this particular application. For example, the number of initial simulations needed to obtain good PDF estimations is too large to be practical, because a different PDF is generated for each intersection point on the bi-dimensional grid. Given the shortcomings of classical prediction models for estimating navigation algorithm performance, in [11] it was proposed a Machine Learning (ML) algorithm as prediction method. Among the ML techniques tested in [11], Artificial Neural Networks (ANN) were identified as the best performers. The objective of the present work is to improve the results obtained in [11] using a optimization model to tune the parameters of the ANN. This optimized method is tested with the same two exploration algorithms considered in [11], Random Walk (RW) and Random Walk Without Step Back (RW WSB), plus a third exploration algorithm called Q Learning which is based on reinforcement learning. Then the main contribution of this paper is to measure the impact of automatic parameter optimization on the performance of the prediction algorithms, compared to the performance of the same algorithms but with manual optimization as done in [11]. In the reviewed literature, it was not possible to find additional studies that use ML to estimate and compare the performance of exploration algorithms. Using the proposed approach, it is possible to compare algorithms in environments that were not initially considered, that is, to make predictions under conditions not contemplated in the initial tests. This method saves up additional data collection, which can be costly and time consuming, becoming an invaluable tool for faster assessment of algorithmic alternatives. With ML, a set of new experimental conditions is entered as input and then the model computes a prediction of the corresponding performance metric for the algorithm under consideration. If effective, ML is a practical alternative in which experimental data needs to be collected only once for training the predictor. The rest of this document is organized as follows: Sect. 2 presents the experimental conditions and gives a detailed description of the proposed prediction system; Sect. 3 presents the methods and algorithms used for ANN optimization; in Sect. 4, the experimental results of the optimization methods are showed and analyzed; Finally, Sect. 5, provide conclusions and discusses results from the prediction models and pointers for future work.

18

L. Caballero et al.

2 Experimental Setup 2.1

Grid Based Scenario

In robotics, grid maps are often used for solving tasks like collision checking, path planning and localization [16]. The exploration algorithms chosen for testing are well known and relatively simple, the reasoning behind this decision is to reduce ambiguity in the interpretation of the new results. Since there is already a solid theoretical and empirical expectation about how those exploration algorithms perform when compared to each other, it allows us to focus solely on finding out if the machine learning predicted performance matches the expected results. For the experiment considered here, a single robot (green dot in Fig. 1(a)) will explore a rectangular grid with the task of finding a target object (red dot in Fig. 1(a)), by moving from intersection to intersection. The robot is limited to move a single intersection at a time on four possible directions: up, down, right and left. The robot knows its starting point, but does not know the location of the target object, nor of any obstacles that may be on the grid. The robot must keep moving until it reaches the target object, as illustrated in Fig. 1(a). For physical experiments, a digital computer executes the exploration algorithms, while a wireless communication module transmits motion commands to a robotic platform, which has proximity and color sensors to identify both, obstacles and target object. Square grids with 3  3, 4  4, 5  5, 6  6 and 7  7 intersections are selected considering the dimensions of the real experimental scenario available for validation. For each grid, a number of obstacles ranging from 0 to 4 are used. Because the objective of this paper is to improve the result obtained in reference [11] then our optimized method must be tested in the same experimental scenario described above, where a bi-dimensional grids, obstacles, and known initial point were used. A 7  7 grid example is shown in Fig. 1(b).

Fig. 1. Bi-dimensional grid and robotic platform in experimental scenario (Color figure online)

Optimized Artificial Neural Network System

2.2

19

General Description of the ML Prediction System

Machine Learning (ML) is a scientific discipline in the field of Artificial Intelligence (AI) that creates systems that can learn automatically from experience and includes techniques capable of generalizing behaviors from information supplied in the form of examples, identifying complex patterns present in the input data [17]. The procedure described below, and illustrated in Fig. 2, is proposed to solve the issue of performance prediction to select the most suitable exploration algorithm.

*R=Number of rows of the grid, C=Number of columns of the grid, O=Number of obstacles, X= x coordinate of target object, Y= y coordinate of target object

Fig. 2. Diagram for building the proposed ML regression system

• Execution of Exploration Algorithm To build the prediction models, each exploration algorithm under test must be executed under different experimental conditions (grid size, number and location of obstacles, location of target object) to compute training values for the chosen performance indicator variable. The Random Walk, Random Walk WSB and Q Learning algorithms were used for testing and evaluation in this work. These are all well known algorithms, therefore easing the task of validating the prediction results. For the Random Walk algorithm each step is randomly taken on one of the possible directions (up, down, right and left). After each step the robot checks if it has reached the target object, and if not, looks for obstacles on the adjacent intersections to update its internal map and decide where to take the next step.

20

L. Caballero et al.

The Random Walk WSB (Without Step Back) algorithm emerges as an improvement over the basic Random Walk described above. It works similarly to the Random Walk algorithm, but in this case the robot remembers its previous position, and so it will never choose to go immediately back to that point, unless forced by obstacles. In this way the robot avoids back-and forward loops possible in the basic random walk [11]. The last exploration algorithm under test is based on Q Learning. Here, the robot chooses its path in a way to maximize some reward measure. After a learning period, the robot finally settles on a relatively stable path as long as the scenario does not change. • Data collection The data set obtained from the simulations of the Random Walk, Random Walk WSB and Q Learning algorithms is described in Table 1. The performance indicator variable is computed for each exploration algorithm. From the work in [11], the appropriate indicator variable for the RW and RW WSB Table 1. Characteristics and experimental conditions to evaluate Random Walk, Random Walk WSB and Q Learning algorithms Configurable variables - Number of grid’s rows (R) - Number of grid’s columns (C) - Number of obstacles (O) - Target object’s X coordinate (X) - Target object’s Y coordinate (Y)

Performance indicator variable Max NS

Study cases

Total dataset

- RxC: 3  3, 4  4, 5  5, 6  6 and 77 - O: 0, 1, 2, 3, 4 - Iterations per training scenario: • 15000 simulations for each experimental condition for RW and RW WSB cases • 200 learning cycles for Q Learning case

300 distinct scenarios

algorithms is Max NS, defined as “the maximum number of steps in which the robot reached the target object, for a fixed set of conditions”. Both exploration algorithms, RW and RW WSB, are executed 15000 times to obtain a value for Max NS for each experimental condition. For the case of the Q Learning algorithm, initial tests were run with 1000, 5000, 10000, 15000, 20000 and 25000 learning cycles. However, it was identified that the learning occurred between the first 100 learning cycles in all cases. Therefore, for the results reported here, data collection was done with only 200 learning cycles for each experimental condition. Max NS is then computed from the last 100 cycles, when the variance of the data is minimal.

Optimized Artificial Neural Network System

21

After the data is collected, techniques of machine learning such as Artificial Neural Networks (ANNs) and decision trees are used to solve the prediction problem. • Prediction Performance Parameters - RAE and R The ML prediction is evaluated using the following two parameters: – Percentage relative absolute error ð%RAEÞ: This compares true values with their estimates but relating it to the scale of its true value [18].  PM  dNSi  Max NSi  Max  i¼1   100% %RAE ¼ PM    i¼1 Max NS  Max NSi

ð1Þ

Where, M is the number of experimental conditions evaluated, Max NSi are the dNSi are the predicted values experimental values that the variable Max NS takes, Max and Max NS is a mean value of Max NS. The variable Max NS could be in different ranges for each exploration algorithm, for this reason it was necessary to use a performance measure that can be easily compared across different variable ranges. Using RAE the errors should be comparable. – Pearson correlation coefficient ðRÞ: This measures the similarity between the true values and the predicted values of the variables, that is, their linear dependence. Independently of the scale of measurement of the variables, the correlation meadNS. This function gives values sures the similarity between Max N and Max between −1 and 1, where 0 is no relation, 1 is very strong, linear relation and −1 is dNS, an inverse linear relation [18]. In terms of the covariance of Max NS and Max the correlation coefficient R is definied as:   dNS   cov Max NS; Max dNS ¼ R ¼ q Max NS; Max rMax NS  r d

ð2Þ

Max NS

is the standard deviation of Max NS and r d is the standard Max NS dNS. deviation of Max For the selection of the most suitable ML algorithm the following criteria were followed: Where rMax

NS

– Low percentage Relative Absolute Error ð%RAEÞ, means that exist a narrow distance between the prediction data and true data. – High correlation coefficient ðRÞ between true values and predicted values, means that the predicted data is similar with the true data, that is, the variables have a high degree of relationship.

22

L. Caballero et al.

A low %RAE and R close to 1 will generate a good prediction and in consequence, can be reliably used to select the best performing exploration algorithm. • Training, validation and testing The selected machine learning algorithms will be evaluated using a data partition technique, cross-validation, which is commonly used in prediction problems. Crossvalidation is a way to divide the input data in one testing set and k−1 training sets to evaluate its performance. The k value is chosen according to the size of the data and the building of the sets is done randomly. Cross-validation then involves k-folds that are randomly chosen and of roughly equal size. This process is repeated k times as each subset is used once for validation [19]. • Comparison The box plot is a graph based on quartiles that visualizes the distribution of a data set and allows two or more data sets to be compared. For our testing scenario, lower values of the performance indicator variable Max NS are preferred. Consequently, the best performing algorithm is the one that has the box with the lowest height.

3 ML Optimization As a first approach, optimization by manual trial and error is applied on the following ML algorithms: ANN and Decision tree. These algorithms are commonly used to solve prediction problems. According to [11], ANN delivers better prediction results than Decision tree for RW and RW WSB which is confirmed with the results from the dataset used in this work, as shown in Table 2. For this reason, in this work the focus is on testing parameter optimization algorithms to improve the prediction indicators for the ANN. For an ANN, the number of layers and neurons can be adjusted in order to improve the performance indicators, i.e. lower %RAE and raise R. According to [11], the Bayesian regularization and Levenberg-Marquardt training algorithms achieve the lowest %RAE and the highest R for the RW and RW WSB exploration algorithms, when using manual trial and error to optimize the number of layers and neurons of the ANN. Table 2 shows the results obtained with these training methods (and manual optimization) with a new data set for three exploration algorithms: RW, RW WSB and Q Learning. The previous results, obtained using manual trial and error optimization, are used as baseline performance indicators for the proposed estimation algorithm. To improve those results, alternative optimization solutions using conventional and unconventional methods are reviewed. The work in [20] defines the optimization of a problem P, as the task of looking for parameter values that applied to P, satisfy certain expectations. For each problem P, an objective function f that measures the suitability of each possible solution of P is defined. The domain of f , that is, the set of points that can be proved as a solution to the problem, is called the space of solutions S. In an optimization problem, it is possible to find many areas of S with relatively good solutions (local optimums),

Optimized Artificial Neural Network System

23

Table 2. Comparison between the correlation coefficients and relative absolute error obtained for three exploration algorithms and two technique ML using different training algorithms. Exploration algorithm Random Walk

Technique ML ANN

Decision tree Random Walk WSB

ANN

Decision tree Q Learning

ANN

Decision tree

Training algorithm

R

RAE½%

Bayesian regularization Levenberg-Marquardt Trees: Random Forest Trees: RandomTree Bayesian regularization Levenberg-Marquardt Trees: Random Forest Trees: RandomTree Bayesian regularization Levenberg-Marquardt Trees: Random Forest Trees: RandomTree

0.9194

33.89

No. neurons per layer [27]

0.8911 0.8872

44.26 36.1397

[27] N/A

0.8135 0.9203

46.2505 34.5400

N/A [7]

0.9019 0.9093

38.2600 36.4538

[7] N/A

0.8783 0.8830

43.5831 42.89

N/A [9]

0.8658 0.8655

46.03 43.21

[9] N/A

0.7837

54.98

N/A

while a single area of S, or a few ones in the best case, provide the best overall solution (global optimum) [20]. For the work presented here, Hill Climbing optimization is proposed to search for the ANN topology (i.e. number of layers and neurons per layer) that maximizes R and minimizes RAE. In order to improve the changes of finding a global optimum as opposed to a local one, Hill Climbing is complemented with random restart, i.e. Hill Climbing is restarted several times using randomly chosen starting points [21]. According to [11] neural networks of 1 or 2 layers with few neurons generated predictors with good performance, for this reason, in this work the number of possible ANN layers has been limited to 1 or 2, while the number of neurons has been limited to a natural number in the range [1, 20]. As a result the solution space S for this optimization problem is bounded, which also increases the chances of finding the global optimum using Hill Climbing.

4 Experimental Results 4.1

Topology Optimization Using Hill Climbing Method

The results showed on Table 3 were obtained using Levenberg-Marquardt algorithm to train the ANN while performing parameter optimization with three different methods: manual trial-error, Hill Climbing and Hill Climbing (with Re-training).

24

L. Caballero et al.

Table 3. Comparison between the correlation coefficients and relative absolute error obtained for three exploration algorithms and three optimization methods to get the ANN topology using Levenberg-Marquardt training algorithm Exploration algorithms Random Walk

Random Walk WSB Q Learning

Method for optimizing ANN topology Trial and error Hill Climbing Hill Climbing (Re-training) Trial and error Hill Climbing Hill Climbing (Re-training) Trial and error Hill Climbing Hill Climbing (Re-training)

R

RAE½%

0.8911 0.9013 0.9815 0.9019 0.9206 0.9817 0.8658 0.8838 0.9628

44.26 40.84 12.79 38.26 37.03 15.49 46.03 46.63 19.40

No. neurons per layer [27] [8] [16:20] [7] [11] [10:19] [9] [2:2] [20:20]

The Hill Climbing algorithm with re-training obtained the best results for all three exploration algorithms. This indicates that the retraining process got a better adjustment of the ANN weights for the task. It is very important to use cross-validation to reduce the effect of over-fitting. For this reason, only three cycles of retraining were used. The results showed on Table 4 were obtained in a similar way to the ones on Table 3 but replacing Levenberg-Marquardt training with Bayesian regularization. Table 4. Comparison between the correlation coefficients and relative absolute error obtained for three exploration algorithms and three optimization methods to get the ANN topology using Bayesian regularization training algorithm Exploration algorithms Random Walk

Random Walk WSB Q Learning

4.2

Method for optimizing ANN topology Trial and error Hill Climbing Hill Climbing (Re-training) Trial and error Hill Climbing Hill Climbing (Re-training) Trial and error Hill Climbing Hill Climbing (Re-training)

R

RAE½%

0.9194 0.9296 0.9887 0.9203 0.9297 0.9864 0.8830 0.8900 0.9613

33.89 35.35 9.94 34.54 32.19 10.85 42.89 40.83 8.51

No. neurons per layer [8] [9:5] [19:4] [12] [5:3] [19:4] [9] [4:19] [20:8]

Comparison Between Real Data and Predicted Data

The topologies that achieved the best %RAE and R results from the previous subsection were used to build the ANN predictors. Each ANN computes an estimation of the corresponding value for Max NS given a new set of scenario parameters, obtaining immediate results without the need for further experiments.

Optimized Artificial Neural Network System

25

In order to evaluate the reliability of the predicted results to compare the performance of different exploration algorithms, we compare the predictions against the real experimental results when the exploration algorithms are used on the testing scenarios. The box plot chart in Fig. 3 shows the predicted data in contrast with the real data for each algorithm. The shape of the graphic for both sets of data is very similar. This suggests that the predicted results can be used as reliable decision parameters when choosing between exploration algorithms, since using them will gave the same conclusion than using actual experimental data. In our particular comparison the predicted data shows that the Q learning algorithm has a better performance than the Random Walk and Random Walk WSB algorithms, since the box plot corresponding to the predicted data for Q Learning, has lower values in each quartile with respect to the box plot for Random Walk and Random Walk WSB. Such result is consistent with what is indicated by the actual experimental data.

Fig. 3. Comparison between real and predicted target variable NS Max

5 Conclusions This research work extends previous work by the authors [11] demonstrating that an ML system can predict the performance of different exploration algorithms for robots moving on a bi-dimensional grid. The predicted values can then be compared to select the best exploration algorithm. A dataset of 300 different examples is enough to find a predictor with good performance, as measured by the corresponding %RAE and R values. Different experimental scenarios, not contemplated in the initial training dataset, are used for testing.

26

L. Caballero et al.

This work focuses on optimizing the topology of a neural network, that is, the number of layers and number of neurons in order to improve the initial results reported in [11]. Further effectiveness of the proposed method is also established by introducing a new exploration algorithm (Q Learning) into the tests. Three different optimization strategies are compared: manual trial and error, Hill Climbing and Hill Climbing with re-training. With single Hill Climbing the prediction performance improves versus the manual trial and error, however Hill Climbing with re-training outperforms the other two methods. Hill Climbing with re-training reduced %RAE from 33,89% to 9,94%, from 34,54% to 10,85% and from 42,89% to 8,51% respectively for the RW, RW WSB and Q learning exploration algorithms, when compared to manual trial and error optimization. Simultaneously, R was increased from 0,9194 to 0,9887, from 0,9203 to 0,9864 and from 0,8830 to 0,9313. Combining the results presented here with the ones previously reported [11] indicates the viability of building a Machine Learning-based tool to compare exploration algorithms under configurable testing scenarios. In future works, alternative optimization methods would be considered looking to reduce the current computational cost.

References 1. Zhang, Y., Gong, D., Zhang, J.: Robot path planning in uncertain environment using multiobjective particle swarm optimization. Neurocomputing 103, 172–185 (2013) 2. Wu, H., Tian, G., Huang, B.: Multi-robot collaboration exploration based on immune network model. In: IEEE/ASME International Conference on Advanced Intelligent Mechatronics, AIM 2008, pp. 1207–1212 (2008) 3. Andreychuk, A., Bokovoy, A., Yakovlev, K.: An empirical evaluation of grid-based path planning algorithms on widely used in robotics raspberry pi platform. In: The 2018 International Conference on Artificial Life and Robotics (ICAROB 2018), pp. 1–4 (2018) 4. Akutsu, T., Yaoi, S., Sato, K., Enomoto, S.: Development and comparison of search algorithms for robot motion planning in the configuration space. In: Proceedings IROS 1991: IEEE/RSJ International Workshop on Intelligent Robots and Systems ’91, pp. 429–434 (1991) 5. Faigl, J., Simonin, O., Charpillet, F.: Comparison of task-allocation algorithms in frontierbased multi-robot exploration. In: Bulling, N. (ed.) EUMAS 2014. LNCS (LNAI), vol. 8953, pp. 101–110. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-17130-2_7 6. Kulich, M., Juchelka, T., Přeučil, L.: Comparison of exploration strategies for multi-robot search. Acta Polytech. 55(3), 162 (2015) 7. Holz, F., Behnke, S., Basilico, N., Amigoni, F.: Evaluating the efficiency of frontier-based exploration strategies. In: ISR/Robotik 2010, vol. 1, no. June, p. 8 (2010) 8. Juliá, M., Gil, A., Reinoso, O.: A comparison of path planning strategies for autonomous exploration and mapping of unknown environments. Auton. Robots 33(4), 427–444 (2012) 9. Amigoni, F.: Experimental evaluation of some exploration strategies for mobile robots. In: IEEE International Conference on Robotics and Automation (2008) 10. Martínez Puerta, J.J., Vallejo Jiménez, M.M.: Comparación de estrategias de navegación colaborativa para robótica móvil. Universidad Autónoma de Manizales (2016) 11. Caballero, L., Benavides, C., Percybrooks, W.: Machine learning-based system to estimate the performance of exploration algorithms for robots in a bi-dimensional grid. In: 2017 IEEE 3rd Colombian Conference on Automatic Control (CCAC), pp. 1–6 (2018)

Optimized Artificial Neural Network System

27

12. Pitarque, A., Ruiz, J.C., Roy, J.F.: Las redes neuronales como herramientas estadísticas no paramétricas de clasificación. Psicothema 12(SUPPL. 2), 459–463 (2000) 13. Tabachnick, B., Fidell, L.: Using Multivariate Statistics. Harper & Row, New York (1996) 14. Cohen, J., Cohen, P.: Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences, 545 p. Taylor & Francis, Milton Park (1983) 15. James, G., Witten, D., Hastie, T., Tibshirani, R.: An Introduction to Statistical Learning, 8th edn. Springer, New York (2017). https://doi.org/10.1007/978-1-4614-7138-7 16. Lau, B., Sprunk, C., Burgard, W.: Efficient grid-based spatial representations for robot navigation in dynamic environments. Rob. Auton. Syst. 61(10), 1116–1130 (2013) 17. Flach, P.: Data, Machine Learning: The Art and Science of Algorithms that Make Sense of Data. Cambridge University Press, New York (2012) 18. machine learning - How to interpret error measures? - Cross Validated (2017). https://stats. stackexchange.com/questions/131267/how-to-interpret-error-measures. Accessed 28 Feb 2018 19. Cross-Validation - MATLAB & Simulink (2018). https://la.mathworks.com/discovery/ cross-validation.html. Accessed 20 Apr 2018 20. Izquierdo, S.K., Rodó, D.M., Bakx, G.E., Iglésias, R.B.: Inteligencia artificial avanzada. Editorial UOC - Editorial de la Universitat Oberta de Catalunya (2013) 21. Koller, D., Friedman, N.: Probabilistic Graphical Models: Principles and Techniques. MIT Press, Cambridge (2009)

Comparative Analysis Between Embedded-Spaces-Based and Kernel-Based Approaches for Interactive Data Representation C. K. Basante-Villota1,2,3,4 , C. M. Ortega-Castillo1,2,3,4(B) , an1 , J. A. Salazar-Castro2,3 , D. F. Pe˜ na-Unigarro1,2,3,4 , J. E. Revelo-Fuelag´ and D. H. Peluffo-Ord´ on ˜ez3,4 1

3

Universidad de Nari˜ no, Pasto, Colombia [email protected] 2 Universidad Nacional, sede Manizales, Manizales, Colombia Corporaci´ on Universitaria Aut´ onoma de Nari˜ no, Pasto, Colombia 4 Yachay Tech, Urcuqu´ı, Ecuador

Abstract. This work presents a comparative analysis between the linear combination of em-bedded spaces resulting from two approaches: (1) The application of dimensional reduction methods (DR) in their standard implementations, and (2) Their corresponding kernel-based approximations. Namely, considered DR methods are: CMDS (Classical Multi- Dimensional Scaling), LE (Laplacian Eigenmaps) and LLE (Locally Linear Embedding). This study aims at determining -through objective criteria- what approach obtains the best performance of DR task for data visualization. The experimental validation was performed using four databases from the UC Irvine Machine Learning Repository. The quality of the obtained embedded spaces is evaluated regarding the RN X (K) criterion. The RN X (K) allows for evaluating the area under the curve, which indicates the performance of the technique in a global or local topology. Additionally, we measure the computational cost for every comparing experiment. A main contribution of this work is the provided discussion on the selection of an interactivity model when mixturing DR methods, which is a crucial aspect for information visualization purposes.

Keywords: Artificial intelligence Dimensionality reduction methods CMDS · LLE · LE

1

· Kernel · Kernel PCA

Introduction

Nowadays, the large volumes of data are accompanied by the need of powerful tools for analysis and representation, as, you could have a dense repository of data, but without the appropriate tools the information obtained may not c Springer Nature Switzerland AG 2018  J. E. Serrano C. and J. C. Mart´ınez-Santos (Eds.): CCC 2018, CCIS 885, pp. 28–38, 2018. https://doi.org/10.1007/978-3-319-98998-3_3

Comparative Analysis Between Two Approaches

29

be very useful [1]. The need arises to find different techniques and tools that help researchers or analysts in tasks such as obtaining useful patterns for large volumes of data, these tools are the subject of an emerging field of research known as Knowledge Discovery in Bases of Data (KDD). Dimension reduction (DR) is considered within the KDD process as a pre-processing stage because it projects the data to a space where the original data is represented with fewer attributes or characteristics, preserving the greater intrinsic information of the original data to enhance tasks such as data mining and machine learning. For example, in classification tasks knowing the representation of the data as well as knowing whether these have separability characteristics, make easier to engage and interpret by the user [2,3]. We have two method PCA (Principal Component Analysis) and the CMDS (Classical Multi-Dimensional Scaling) which are part of those classic RD methods whose objective is to preserve variance or distance [4]. Recently, the focus of DR methods is based on criteria aimed at preserving the data topology. A topology of this type could be represented in an undirected and weighted graph based on data constructed whose points represent the nodes, and their edge’s weights are contained in an affinity and non-negative similarity matrix. This representation is leveraged by methods based on spectral and divergence approaches, for the spectral approach we can represent the weights of the distances in a similarity matrix, such as with the LE (Laplacian Eigenmaps) method [5] and using a matrix of unsymmetrical similarity and focusing on the local structure of the data, the method called LLE (Locally Linear Embedding) arises [6]. There is also the possibility of working on the high-dimensional space with the advantage of greatly enhancing the representation and the embedded data visualization of the original space mapped to the high-dimensional space, from the calculation of the eigen decomposition. An estimate of the inner product (kernel) can be designed based on the function and application which one wants to develop [7], in this work the kernel matrices will represent distance or similarity functions associated with a dimension reduction method. In this research three spectral dimension reduction methods are considered, trying to encompass different criteria which CMDS, LLE and LE are based on, these are used under two approaches, one of them is the representation of their embedded spaces obtained from their standard algorithms widely explained in [5,6,8], and the second is based on the kernel approaches of the same methods. After obtaining each of the embedded spaces, a linear weighting is performed for combine the different approaches leveraging each of the RD methods, the same is done for the kernel ma-trices obtained from the approximations of the spectral methods. Subsequently the Kernel PCA technique is applied to reduce the dimension to obtain the embedded space from the combination of the kernelbased approach. The combination of embedded spaces already obtained from the RD methods is not clear and intuitive mathematically, on the other hand, the linear combination of kernel or similarity matrices which are represented in the same infinite space is more intuitive and concise mathematically. Nevertheless, in tasks such as visualization of information, choosing any of the two interaction

30

C. K. Basante-Villota et al.

methods for dimension reduction is a crucial task on which the representation of the data and also the interpretation by the user will depend, therefore this research proposes the quantitative and qualitative comparison in addition to the demonstration of the previous assumption in order to contribute to machine learning tasks, visualization data, data mining where dimension reduction execute an imperative role, For example, perform tasks of classification of high dimension data, it is necessary to visualize them in such a way that they are understandable for non-expert users who want to know he topology of the data and characteristics such as separability which aid to determine which classifier could be adequate for determinate data record.

2

Methodology

Mathematically, the objective of dimension reduction is to map or project (linear transformation) data from a high-dimensional space Y ∈ RD×N a lowdimensional space X ∈ Rd×n , where d < D, therefore, The original data and the embedded data will consist of N points or registers, denoted respectively by yi ∈ RD and Xi ∈ Rd with {K (1) , · · · , K (M ) } [5,6]. It means that the number of samples in the high-dimensional data matrix would not be affected when the number of attributes or characteristics is reduced. In order to represent the resulting embedded space in a two-dimensional Cartesian plane, this research takes into account only the two main characteristics in the kernel matrix, which represent most of the information in the original space. 2.1

Kernel Based Approaches

The RD method known as principal component analysis (PCA) is a linear projection that tries to preserve the variance from the values and eigenvectors of the covariance matrix [9,10]. Moreover, when a data matrix is centered, which means that the average value of the rows (characteristics) is equal to zero, the preservation of variance could be named as a preservation of the Euclidean internal product [9]. Kernel PCA method is as similar as PCA method which maximizes the variance criterion, but in this case of a kernel matrix, which is basically an internal product of an unknown space of high dimension. We define φ ∈ RD×N a highdimensional space with Dh  D, which is completely unknown except for its internal product that can be estimated [9]. To use the properties of this new high-dimensional space and its internal product, it is necessary to define a function φ(· ) that can map the data from the original space to the high-dimension (φ) as follows: φ(· ) : RD RDh yi ⇒ φ(yi ), where the i-th vector column of the matrix φ = φ(yi ).

(1)

Comparative Analysis Between Two Approaches

31

Considering the conditions of Mercer [11], and the matrix f is centered, the internal product of the kernel function K(· , · ) can be calculated as follows: φ(yi )T φ(yi ) = K(yi , yj ). In short, the kernel function can be understood as a composition of the mapping generated by φ(· ) and its scalar product as follows: φ(yi )T φ(yi ), so for each pair of elements of the set Y its scalar product is directly assigned without going through the mapping (φ). Organizing all possible internal products in a KN ×N array will result in a kernel matrix: KN ×N = ϕT Dh ×N ϕDh ×N .

(2)

The advantage of working with the high-dimensional space (φ) is that it can greatly improve the representation and visualization of the embedded data from the original space mapped to the high-dimensional space, from the calculation of the eigenvalues and eigenvectors of its product internal. An estimation of the internal product (kernel) can be designed based on the function and application that the user wants to develop [12], in this case the kernel matrices will represent distance functions associated with a dimension reduction method, approximations kernels presented below are widely explained in [13]. The kernel representation for the CMDS reduction method is defined as the distance matrix D ∈ RR×N doubly centered, that is, making the mean of the rows and columns zero, as follows: 1  KCM DS = − (IN − 1N 1 N )D(IN − 1N 1N ), 2

(3)

where the ij entry of D is given by the Euclidean distance: dij = ||yi − yj ||22 .

(4)

A kernel for LLE can be approximated from a quadratic form in terms of the matrix W holding linear coefficients that sum to 1 and optimally reconstruct observed data. Define a matrix M ∈ RN ×N as M = (IN − W)(IN − W  ) and λmax as the largest eigenvalue of M . Kernel matrix for LLE is in the form: KLLE = λmax IN − M .

(5)

Considering that kernel PCA is a maximization problem in the high-dimensional covariance represented by a kernel, LE can be represented as the pseudo-inverse matrix of the graph L, as shown in the following expression: KLE = L† ,

(6)

where L = D − S, S, such that S is a dissimilarity matrix and D = Diag(S1N ) is the degree matrix is the matrix of the degree of S. The similarity matrix S is organized in such a way that the relative width parameter is estimated by maintaining the entropy of the distribution with the nearest neighbor with approximately log K, where K is the given number of neighbors as explained in [14]. For this investigation the number of neighbors was established as the integer closest to 10% of the amount of data.

32

C. K. Basante-Villota et al.

Finally, to project the data matrix Y ∈ RD×N into an embedded space X ∈ Rd×N we use the PCA dimension reduction method. In PCA, the embedded space is obtained by selecting the most representative eigenvectors of the covariance matrix [6,10]. Therefore, we obtain the d most representative eigenvectors of the kernel matrix KN ×N obtained previously, constructing the embedded space X. As it was said for this research, the embedded space with two dimensions that represents most of the characteristics of the data is established. 2.2

DR-Methods Mixturing

In terms of data visualization through RD methods, the parameters to be combined are the kernel matrices and the embedded spaces obtained in each method, each matrix corresponds to each of the M RD methods considered, that is {K (1) , · · · , K (M ) }. Consequently, a matrix is obtained depending on the kernel approach or final embedded space K resulting from the mixing of the M matrices, such that: M  = αm K (m) , (7) K m=1

Defining αm as the weighting factor corresponding to the method M and α = {α1 , · · · , αm } as the weighting vector. In this research these parameters will be defined as 0.333 for each of the three methods used, so the sum of the three will be 1 in order to provide to each method equal priority, since the aim of this research is to present a comparison of each proposed approach in a equal conditions scenario, Each K ( M ) will represent the kernel matrices obtained after applying the approximations presented in Eqs. (3), (5) and (6) or the embedded spaces obtained by applying the RD methods in their classical algorithm.

3

Results

Data-Sets: Experiments are carried out over four conventional data sets. The first data set (Fig. 1(a)) is an artificial spherical shell (N = 1500 data points and D = 3). The second data set (Fig. 1(c)) is a toy set here called Swiss roll (N = 3000 data points and D = 3). The third data set (Fig. 1(d)) is Coil 20 is a database of gray-scale images of 20 objects. Images of the objects were taken at pose intervals of 5 degrees. This corresponds to 72 images per object (N = 1440 data points 20 and D = 1282 -number of pixels) [15]. The fourth data set (Fig. 1(b)) is a randomly selected subset of the MNIST image bank [11], which is formed by 6000 gray-level images of each of the 10 digits (N = 1500 data points 150 instances for all 10 digits and D = 242). Figure 1 depicts examples of the considered data sets. Performance Measure: In dimensionality reduction, the most significant aspect, which defines why a RD method is more efficiency, is the capability of preserve the data topology in low-dimensional space regarding the high-dimension.

Comparative Analysis Between Two Approaches

33

0.5 0.5 0 −0.5

0

−0.5

−0.5 0

0.5

(a) Sphere

(c) Swiss Roll

Fig. 1. The fourth considered datasets, datasets.html.

(b) MNIST

(d) Coil

source: https://archive.ics.uci.edu/ml/

Therefore, we apply a quality criterion used by conserving the k-th closest neighbors developed in [16], as efficiency measure for each approach proposed for the interactive RD methods mixture. This criterion is widely accepted as an adequate unsupervised measure [14,17], which allows the embedded space to assess in the following way: The rank of εj with respect to εi in high-dimensional space is denoted as: pij = |{k : δ ik < δ ij or (δ ik = δ ij and 1 ≤ k < j ≤ N )}|.

(8)

34

C. K. Basante-Villota et al.

In Eq. (8) | · | denotes the set cardinality. Similarly, in [13] is defined that the range of xj with respect to xi in the low-dimensional space is: rij = |{k : dik < dij or (dik = dij and 1 ≤ k < j ≤ N )}|.

(9)

The k-th neighbors of ζi and xi are the sets defined by (10) and (11), respectively. vi k = {j : 1 ≥ pij < K}, (10) ni k = {j : 1 ≥ rij < K}.

(11)

A first performance index can be defined as: QN X (K) =

N  |vi k ∩ ni k | i=1

KN

= 1.

(12)

Equation (12) results in values between 0 and 1 and measures the normalized average according to the corresponding k-th neighbors between the highdimensional and low-dimensional spaces. Defining in this way a coclasification matrix: (13) [Q = qN X ] f or j ≥ N − 1, whit qkl = |{(i, j) : pij = k and pij = l}|. Therefore QN X (K) counts k-by-k blocks of Q, the range preserved (in the main diagonal) and the permutations within the neighbors (on each side of the diagonal) [12]. This research employs an adjustment of the curve QN X (K) introduced in [12] in order that the area under the curve is an adequate indicator of the embedded data topology preservation, hence, the quality curve that is applied into the visualization methodology is given by: RN X (K) =

(N − 1)QN X (K) − N . N −1−K

(14)

When the equation in (14) is expressed logarithmically, errors in large neighborhoods are not proportionally as significant as small ones [14]. This logarithmic expression allows obtaining the area under the curve of RN X (K) given by: N −2 AU C logK (RN X (K)) =

R N X (K ) K=1 K N −2 1 K=1 K

.

(15)

The results obtained by applying the methodology proposed over four data bases described, are shown in Fig. 2, where the curve RN X (K) of each approach is presented as well as the AU C in (13) which assess the dimension reduction quality corresponding to each proposed combination. As a result, for RD procedure in terms of visualization we show the embedded space for each test performed. It is necessary to clarify that each combination was carried out same scenario with equal conditions which allows us to measure a computational cost in terms of execution time, which are shown in Table 1. This is an important

Comparative Analysis Between Two Approaches

35

Table 1. Consumed time for performing each approach over the fourth dataset. Based approach

Dataset

Computacional time (sec)

Kernel

3D sphere 6, 27 Swiss Roll 6, 43 Coil-20 28, 94 MINST 37, 87

Embedded-spaces 3D sphere 2, 88 Swiss Roll 3, 09 Coil-20 15, 24 MINST 16, 24

issue if users are seeking for an interactive RD methods mixture which has a satisfactory performance, as well as an efficient computational development. Nevertheless, results achieved in this research allows us to conclude that in data visualization terms performing an interactive mixture RD method based on kernel is more favorable than based on standard methods, mathematically combining a kernel approximations, which means that each kernel approximation is in the same high-dimensional space where all classes are separable before developing the mixture, is more appropriate than combining obtained embedded space from an unknown space which are the standard methods. The computational cost (Table 1) allows us to infer that the cost in executing kernel approaches and PCA kernel application for dimension reduction is a slightly more elevated in all cases. This is since the databases have a high number of registers, which means that acquiring the kernel matrices involves a lot of processing, as if the data base consists of n samples, the kernel matrix size will be N × N . Making a comparison of the RN X (K)curves for each database, there is a low performance in the dimension reduction process for the case of the Coil-20 database whose AUC is the lowest among all, which means that the data topology in the embedded space obtained is not as conserved as in the other studied cases. Evidently the best performance was accomplished for 3D spherical shell and Swiss roll which obtained the best AUC and preserve the data local structure, generally preserved local structure generates superior embedded spaces [13]. On the other hand, MNIST and spherical shell database preserved the global data structure in a preferable way as regards the other cases.

36

C. K. Basante-Villota et al. Kernel−Based

RNX(K) Curve

Embedded−Based

100RNX (K)

60 40 20

100RNX (K)

0 0 10

48.5 Kernel-Based 47.6 Embedded-Based 10

1

10

K

2

60 40 20 0 0 10

44.2 Kernel-Based 50.8 Embedded-Based K

10

2

(a) Results for datasets: Sphere 3D and Swiss Roll.

100RNX (K)

RNX (K) Curve

Kernel−Based

Embeddes−Based

30 20 10 0 0 10

16.0 Kernel-Based 9.0 Embedded-Based K

10

2

100RNX (K)

60 40 20 0 0 10

43.0 Kernel-Based 41.6 Embedded-Based 1

10

K

10

2

(b) Results for datasets: MNIST and Coil-20.

Fig. 2. Results obtained for the four experimental databases

4

Conclusion

This work presented a comparative analysis of two different approaches for DRmethods mixturing which are applied in an interactive. Results obtained in this research allows us to conclude that performing an interactive DR-methods mixture could be a tough task for a dataset with a great number of points and dimensions as it was proved that the computational cost is higher but also this approach gives to users a high-quality performance since, a greater area is obtained under the quality curve which indicates that the topology of the data can be preserved more. On the other hand, embedded-spaces-based approach has a slightly difference in the RN X (K) AUC curve, but it is not wide so if the user wants to carry out a quicker mixture, the embedded-spaces-based approach

Comparative Analysis Between Two Approaches

37

will be more appropriate for data visualization where interactivity is the most important achievement seeking a better perception for the inexpert users of their datasets. Acknowledgements. This work is supported by the “Smart Data Analysis Systems SDAS” group (http://sdas-group.com), as well as the “Grupo de Investigaci´ on en Ingenier´ıa El´ectrica y Electr´ onica - GIIEE” from Universidad de Nari˜ no. Also, the authors acknowledge to the research project supported by Agreement No. 095 November 20th, 2014 by VIPRI from Universidad de Nari˜ no.


Solving Large Systems of Linear Equations on GPUs

Tomás Felipe Llano-Ríos1(B), Juan D. Ocampo-García1, Johan Sebastián Yepes-Ríos1, Francisco J. Correa-Zabala1, and Christian Trefftz2

1 Departamento de Informática y Sistemas, Universidad EAFIT, Medellín, Antioquia, Colombia
{tllanos,jocamp18,jyepesr1,fcorrea}@eafit.edu.co
2 School of Computing and Information Systems, Grand Valley State University, Grand Rapids, MI, USA
[email protected]

Abstract. Graphical Processing Units (GPUs) have become more accessible peripheral devices with great computing capacity. Moreover, GPUs can be used not only to accelerate the graphics produced by a computer but also for general-purpose computing. Many researchers use this technique on their personal workstations to accelerate the execution of their programs and have often encountered that the amount of memory available on GPU cards is typically smaller than the amount of memory available on the host computer. We are interested in exploring approaches to solve problems with this restriction. Our main contribution is to devise ways in which portions of the problem can be moved to the memory of the GPU to be solved using its multiprocessing capabilities. We implemented on a GPU the Jacobi iterative method to solve systems of linear equations and report the details of the results obtained, analyzing its performance and accuracy. Our code solves systems of linear equations large enough to exceed the card’s memory, but not the host memory. Significant speedups were observed, as the execution time taken to solve each system is faster than those obtained with Intel® MKL and Eigen, libraries designed to work on CPUs.

Keywords: GPU · System of linear equations · Jacobi · Memory limitations

1 Introduction

GPUs have become excellent resources for highly parallel numerical workloads thanks to their high-bandwidth memories and hardware that performs floating-point arithmetic at significantly higher rates than conventional CPUs. Algorithms that can be structured as stream processing can obtain an excellent performance improvement. Stream processing is a computing paradigm that takes


advantage of multiple computational units, such as the ones present in GPUs or field-programmable gate arrays (FPGAs) [1], without explicitly managing allocation, synchronization or communication among those units. One of the most important fields in which stream processing can be applied is Numerical Methods [2]. This article focuses on processing large systems of linear equations on GPUs [3] through a parallel Jacobi implementation. In this article, we consider a large system to be a matrix that does not fit into the available GPU memory yet fits in the host computer memory. Currently, there are many Numerical Linear Algebra libraries [4] that can solve this kind of system on CPUs (e.g., Eigen [5], Intel® MKL [6]) and on GPUs (e.g., cuBLAS [7], MAGMA [8]). However, these libraries are not designed to perform operations over matrices that exceed the computational resources available in conventional machines [9]. Jacobi was chosen because it is a method that can be easily parallelized. Additionally, it is an iterative method in which the accuracy can be refined in each iteration. Our solver was developed in CUDA [10] to take advantage of the GPU computing model and reduce the time needed to solve the systems. In this article, we describe the Jacobi method and explain the GPU architecture. Later, the algorithm developed in CUDA and C++ is described, along with the matrix generator software developed for testing purposes. Section 5 shows a comparison between our implementation and some of the most widely used libraries for solving these kinds of systems. Finally, in the last section, we present conclusions and future work.

2 Preliminary Concepts - The Jacobi Method

The Jacobi Method is an iterative method for solving a system of linear equations Ax = b. In general, the system Ax = b is transformed into a system of the form x = Tx + D, where x is the solution vector, and T and D are obtained from the matrix A and the vector b. The last equation defines an iterative equation of the form

    x_{n+1} = T x_n + D    (1)

We rewrite this iterative equation in the following scheme. Here we suppose that x = (x_1, x_2, x_3, ..., x_n) and that the g_i are functions of n − 1 variables. We solve each equation i for the value of x_i while assuming the other entries of x remain fixed, as shown in the following scheme:

    x_1 = g_1(x_2, x_3, x_4, ..., x_n)
    x_2 = g_2(x_1, x_3, x_4, ..., x_n)
    x_3 = g_3(x_1, x_2, x_4, ..., x_n)
    ...
    x_n = g_n(x_1, x_2, x_3, ..., x_{n−1})    (2)


Now, we illustrate how the Jacobi method can be parallelized. The following diagram shows one step of the method. Let x_c = (c_1, c_2, c_3, ..., c_n) be the current approximation to the solution (old value) and x_n = (a_1, a_2, a_3, ..., a_n) the new approximation (new value) obtained from x_c:

    a_1 = g_1(c_2, c_3, c_4, ..., c_n)
    a_2 = g_2(c_1, c_3, c_4, ..., c_n)
    a_3 = g_3(c_1, c_2, c_4, ..., c_n)
    ...
    a_n = g_n(c_1, c_2, ..., c_{n−1})    (3)

We can then obtain a new vector x_{n+1} using the previous vector x_n. The Jacobi method begins with an initial value x_0 and uses the previous equations to calculate the series:

    {x_n}_{n=0}^{∞}    (4)

We expect this sequence to converge to the value that is the solution of the system of equations. We selected this method because it is easy to parallelize; in effect, each equation i (function g_i) can be computed independently [11]. The Jacobi method finds the solution of a system of equations Ax = b if the matrix A is strictly diagonally dominant. A matrix A is called strictly diagonally dominant if:

    ∀i: |a_{ii}| > Σ_{j≠i} |a_{ij}|    (5)

There are numerous problems that can be solved as a system of linear equations of the form Ax = b. The types of problems can be classified according to the characteristics of the matrix A, the method used to solve the system of equations, and the kind of hardware used to run the numerical method. In this article, we focus on large systems of linear equations where A is strictly diagonally dominant and the hardware of interest is GPU cards.
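To make the scheme above concrete, the following minimal C++ sketch (our illustration, not code from the paper; the function name jacobi_sweep is ours) performs one Jacobi sweep for a dense, row-major matrix A, assuming A is strictly diagonally dominant so the iteration converges:

#include <vector>
#include <cstddef>

// One Jacobi sweep: computes the new iterate xn from the previous iterate xc
// for a dense, row-major A (n x n). The caller repeats until the error is small.
std::vector<double> jacobi_sweep(const std::vector<double>& A,
                                 const std::vector<double>& b,
                                 const std::vector<double>& xc,
                                 std::size_t n) {
    std::vector<double> xn(n);
    for (std::size_t i = 0; i < n; ++i) {          // each row i is independent
        double sigma = 0.0;
        for (std::size_t j = 0; j < n; ++j)
            if (j != i) sigma += A[i * n + j] * xc[j];
        xn[i] = (b[i] - sigma) / A[i * n + i];     // requires a_ii != 0
    }
    return xn;
}

Because each new entry depends only on the previous iterate xc, the outer loop over rows is exactly what the GPU implementation later distributes across threads.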

3 GPU Architecture

First of all, to understand why GPUs are one of the most suitable options nowadays for Numerical Methods, it is necessary to include a brief description of the GPU architecture in order to understand how this approach provides a feasible way to solve these systems of linear equations. The best way to explain the architecture of a GPU is by comparing it with the architecture of a CPU. Bearing this in mind, NVIDIA states: “Compare how they process tasks. A CPU consists of a few cores optimized for sequential serial processing while a GPU has a massively parallel architecture consisting of thousands of smaller, more efficient cores designed for handling multiple tasks simultaneously” [12].


In other words, the GPU is specialized for compute-intensive and highly parallel computation. The problems that are best suited for GPUs have high arithmetic intensity, and the number of subproblems that can be handled in parallel, if enough processors are available, is often a large multiple of the number of processors. Often this means that the same instruction can be executed by all processors at the same time on each subproblem. Highly parallel computation is a great way to operate over matrix problems which, frequently, can be solved using a parallel approach. As is well known, many primitive operations that work on matrices or vectors can be implemented using a SIMD (Single Instruction Multiple Data) approach. SIMD is one of the basic styles of programming in Flynn’s Taxonomy [13]: the same operation is performed simultaneously on different pieces of data. With this in mind, the NVIDIA GPU architecture is described as “A scalable array of multithreaded Streaming Multiprocessors (SMs). When a CUDA program on the host CPU invokes a kernel grid, the blocks of the grid are enumerated and distributed to multiprocessors with available execution capacity. The threads of a thread block execute concurrently on one multiprocessor and multiple thread blocks can execute concurrently on one multiprocessor. As thread blocks terminate, new blocks are launched on the vacated multiprocessors” [10]. A multiprocessor is designed to execute hundreds of threads concurrently. To manage such a large number of threads, it employs an execution model called SIMT (Single-Instruction, Multiple-Thread) used in parallel computing, where a single instruction over multiple data (SIMD) is combined with multithreading. It is also relevant to describe briefly the memory hierarchy in NVIDIA’s GPU architecture. CUDA threads may access data from multiple memory spaces during their execution, as shown in Fig. 1. Each thread has private local memory. Each thread block has shared memory visible to all threads of the block and with the same lifetime as the block. All threads have access to the same global memory. There are also two additional read-only memory spaces accessible by all threads: the constant and texture memory spaces. The global, constant, and texture memory spaces are persistent across kernel launches by the same application [10]. In our implementation, we only focused on the local and global memories.
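As a brief illustration of the SIMT execution and memory spaces described above, the following CUDA sketch (our own illustration; the kernel name and launch configuration are assumptions, not taken from the paper) launches one thread per vector element, each computing its own global index and touching global memory independently:

__global__ void scale(double *v, double alpha, int n) {
    // Global thread index: block offset plus thread offset within the block.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)              // guard: the grid may contain more threads than elements
        v[i] *= alpha;      // each thread reads and writes one element in global memory
}

// Host-side launch: enough blocks of 256 threads to cover n elements, e.g.
// scale<<<(n + 255) / 256, 256>>>(d_v, 2.0, n);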

4 Solving Big Systems of Linear Equations

Software has traditionally been written as a discrete series of instructions that are executed sequentially on a computer to solve a problem whose solution can be obtained by computing the result of those instructions one after another, an approach also known as serial computing. This concept was further improved to give rise to what is known today as parallel computing. The notion of parallel computing, in which a problem’s solution can be found by carrying out simultaneous calculations [14], is remarkably important to this article, as it leads us to consider the practicality of using graphical processing units, given their potential to accelerate data processing by


Fig. 1. Memory hierarchy in a GPU. Image retrieved from [10]

concurrently executing a large number of sets of instructions, to solve systems of linear algebraic equations. Although the speedups that can be achieved by using GPUs make them very attractive, their memory capacity is often a hardware constraint that limits the amount of information one can store, hence making them unsuitable for large systems. However, the concept of block matrix introduced by [15] can be used to break a system (or problem) into partitions (or subproblems), sequentially upload them to the memory of the GPU, and concurrently solve each one of those subproblems using GPU cores, thus overcoming the limitation created by insufficient GPU memory. To further illustrate this idea, we have written a set of basic steps (a sketch of the resulting host-side loop is given after the list):

1. Identify variables that may influence your system partitioning. Excluding algorithm-dependent variables (i.e., variables that are produced by the algorithm one selects to find a solution), the bare essentials one should consider are: available GPU memory and size of the problem (a.k.a. size of the system).

2. Find a suitable partitioning scheme. This will always depend on the algorithm one chooses to use and the available resources. For instance, consider a system such as A_(n,n) × x_(n,1) = b_(n,1) | n mod 2 = 1, where the goal is to compute b_(n,1). One algorithm could potentially partition the matrix by rows and use threads to concurrently compute b_(i,1) = Σ_{j=1}^{n} a_(i,j) × x_(j,1), while another may break A into four sub-matrices of size n/2 × n/2, x into two subvectors of size n/2, and use four threads to compute A1 × x1, A2 × x2, A3 × x1 and A4 × x2, or schematically (as shown in Theorem 1.9.6 from [15]):

    ⎡A1 A2⎤   ⎡x1⎤   ⎡A1 x1 + A2 x2⎤
    ⎣A3 A4⎦ × ⎣x2⎦ = ⎣A3 x1 + A4 x2⎦

3. Upload each partition to GPU memory. Please note that the number of uploads depends not only on the partition scheme one chooses, but also on how data is represented in memory. For instance, the partitions A1 and x1 can either be uploaded to GPU memory as two independent arrays or as one array representing [A1 | x1]. Take into consideration that whichever you use could have a great impact on the algorithm’s structure.

4. Process data concurrently. In other words, use GPU threads to compute the solution of the partial system uploaded into memory. Note this also involves CPU operations required to control the GPU’s behavior.
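The following host-side CUDA sketch illustrates steps 1–4 for a row-block partitioning. It is a simplified, hypothetical sketch, not the authors' exact code: the kernel name jacobi_block stands in for the kernels of Sect. 4.1, error checking is omitted, and the solution vectors are not shown.

#include <cuda_runtime.h>

// Hypothetical kernel processing the rows currently resident on the device
// (its body would follow Algorithm 3 in Sect. 4.1).
__global__ void jacobi_block(const double *blockA, int rows, int n, int first_row);

void solve_in_blocks(const double *A, int n) {
    size_t free_b = 0, total_b = 0;
    cudaMemGetInfo(&free_b, &total_b);
    size_t budget = (size_t)(0.8 * free_b);          // keep headroom, as in Algorithm 1
    size_t row_bytes = (size_t)n * sizeof(double);
    int rows_per_block = (int)(budget / row_bytes);  // step 1: how many rows of A fit

    double *d_block = nullptr;
    cudaMalloc(&d_block, (size_t)rows_per_block * row_bytes);

    for (int first = 0; first < n; first += rows_per_block) {        // step 2: row blocks
        int rows = (n - first < rows_per_block) ? n - first : rows_per_block;
        // step 3: upload the current block; only the host pointer shifts, no host copy
        cudaMemcpy(d_block, A + (size_t)first * n, (size_t)rows * row_bytes,
                   cudaMemcpyHostToDevice);
        // step 4: process the resident rows concurrently
        jacobi_block<<<(rows + 255) / 256, 256>>>(d_block, rows, n, first);
    }
    cudaDeviceSynchronize();
    cudaFree(d_block);
}

The pointer arithmetic on A mirrors the pointer-shifting idea described in Sect. 4.1: successive uploads reuse the same host buffer without replicating data.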

4.1 Jacobi Iterative Method Applied to Block Matrices

The steps mentioned previously are the foundation of the parallel Jacobi method used in Sect. 5. Algorithm 1 shows how step 1 influenced the structure of the traditional solver, while still satisfying A x_n ≈ A x, by introducing variables (whose names are prefixed with a ‘g’) to handle data in GPU memory:

– gMem. Stores the amount of available GPU memory. Note that only 80% (this percentage may vary depending on the overall memory usage) is actually used in order to avoid possible out-of-memory errors.
– gMemA. Stores the amount of GPU memory available for A’s allocation. There are four other vectors, x_c, x_n, e_x and b, subject to allocation, which compels the subtraction of their sizes (4 × N) from gMem.
– gB, gXc, gXn, gEx. Store data from b, x_c, x_n and e_x, respectively.

Step 2 led to the creation of variables whose only purpose is to store chunks of data and control their transformation according to a partitioning scheme where gRowsA rows from A create a block. Schematically, if gRowsA = k and 1 < i ≤ n, then:

    block_1 = ⎡ a_{11}  a_{12}  ⋯  a_{1n} ⎤      block_i = ⎡ a_{(i−1)k+1,1}  a_{(i−1)k+1,2}  ⋯  a_{(i−1)k+1,n} ⎤
              ⎢   ⋮       ⋮     ⋱    ⋮   ⎥ ,               ⎢       ⋮               ⋮         ⋱        ⋮       ⎥
              ⎣ a_{k1}  a_{k2}  ⋯  a_{kn} ⎦                ⎣    a_{ik,1}       a_{ik,2}      ⋯     a_{ik,n}   ⎦

Algorithm 1 uses this logic to indicate to Algorithm 2 how many blocks it needs to process. Thereupon, step 3 is applied to repeatedly copy a block blockA from host memory to GPU memory (gBlockA). Note that the process does not involve data replication across host memory, but rather pointer shifting:


Algorithm 1. Jacobi

global variables
    dts ≡ A’s datatype size (e.g. size of double)
end global variables

predefined functions
    zeros(x, N): Ensure ∀i | i ∈ N ∧ x_i ∈ x → x_i = 0
    getGpuMem(Void): Return available GPU memory.
    toDevice(x, N): Copy N bytes from x to GPU memory and return pointer.
    gAlloc(N): Allocate N bytes of GPU memory and return pointer.
    norm(x, N): Compute euclidean norm of x, i.e. ||x||_2
end predefined functions

Read N, A, b, tol, niter
zeros(xc, N)                               ▷ Initialize x_c to zero
gMem ← 0.8 × getGpuMem()
gMemA ← gMem − (4 × N)
gRowsA ← rDown(gMemA / (N × dts))
gBlockA ← gAlloc(gRowsA × N)
gB ← toDevice(b, N)
gXc ← toDevice(xc, N)
gXn ← toDevice(xn, N)
gEx ← gAlloc(N)
err ← tol + 1                              ▷ Initialize to some value greater than tol
cont ← 0
blocks ← N / gRowsA
while err > tol ∧ cont < niter do
    if (cont mod 2) = 0 then
        launchJacobi(A, gBlockA, gB, gXc, gXn, gRowsA, blocks, N)
        normx ← norm(gXn, N)
    else
        launchJacobi(A, gBlockA, gB, gXn, gXc, gRowsA, blocks, N)
        normx ← norm(gXc, N)
    end if
    err ← norm(gEx, N) / normx
    cont ← cont + 1
end while
if err < tol then
    if (cont mod 2) = 0 then
        print gXn
    else
        print gXc
    end if
    print succeeded in cont iterations with an error of err
else
    print failed
end if


– From the start position, take the k elements you want to process. For instance, if you were to process k rows from a linear (row-major) matrix A:

      ↓
      a_11 a_12 a_13 ⋯ a_1n a_21 a_22 ⋯ a_kn ⋯

– Process the data you selected. This involves any intermediate step you need to perform, for instance, copying data from host to GPU memory (gpuMemCpy(destination, source)).
– Recompute k if the kth element is out of bounds. This will only happen if the size of the last partition differs from the others.
– Shift the start position by k elements and repeat from the beginning:

                                              ↓
      a_11 a_12 a_13 ⋯ a_1n a_21 a_22 ⋯ a_kn a_{k+1,1} ⋯

Algorithm 2. Launch GPU kernel to process blocks from A

function launchJacobi(A, gBlockA, gB, gXc, gXn, gRowsA, blocks, N)
    lblock ← blocks − 1                    ▷ Last block (true for 0-based indexing)
    for i ← [0, blocks) do
        frb ← i × gRowsA                   ▷ Row of A corresponding to the block’s first row
        shift ← frb × N
        if i = lblock then                 ▷ Reassign last block’s size
            gRowsA ← N − i × gRowsA
        end if
        blockA ← shift(A, shift)
        gBlockxc ← shift(gXc, frb)
        gBlockxn ← shift(gXn, frb)
        gBlockex ← shift(gEx, frb)
        gpuMemCpy(gBlockA, blockA)
        runJacobi(gBlockA, gB, gXc, gXn, gEx, gRowsA, N, frb)
        computeErr(gBlockxc, gBlockxn, gBlockex, gRowsA)
    end for
end function

Finally, step 4 is shown in Algorithm 3, where runJacobi computes:

    x_i^{(k+1)} = (1 / a_{ii}) (b_i − Σ_{j≠i} a_{ij} x_j^{(k)}),   0 < i ≤ n    (6)

and computeErr:

    e_i^{(k+1)} = |x_i^{(k+1)} − x_i^{(k)}|    (7)

Note that Algorithm 1 uses the euclidean norm ||e_x||_2 as a stopping criterion instead of the maximum error within e_x. This is due to the fact that the uniform norm ||e_x||_∞ causes unexpected behaviors if the tolerance is close enough to the machine epsilon.
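For reference, a minimal host-side helper (our own illustrative code, with an assumed name relative_error, not from the paper) computes the relative stopping criterion ||e_x||_2 / ||x_n||_2 used in Algorithm 1:

#include <cmath>
#include <cstddef>

// Relative error between two iterates: ||xn - xc||_2 / ||xn||_2.
double relative_error(const double *xc, const double *xn, std::size_t n) {
    double diff2 = 0.0, norm2 = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        double e = xn[i] - xc[i];
        diff2 += e * e;          // accumulates ||e_x||_2 squared
        norm2 += xn[i] * xn[i];  // accumulates ||x_n||_2 squared
    }
    return std::sqrt(diff2) / std::sqrt(norm2);
}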


Algorithm 3. GPU kernels

predefined functions
    getThreadId(Void): Return thread’s id
end predefined functions

function runJacobi(A, b, xc, xn, ex, gRowsA, N, frb)
    tid ← getThreadId()
    if tid < gRowsA then
        cr ← frb + tid                     ▷ Current row (relative to A) for the thread to process
        σ ← 0.0
        index ← tid × N
        for j ← [0, N) do
            if cr ≠ j then
                σ ← σ + A[index + j] × xc[j]
            end if
        end for
        xn[cr] ← (b[cr] − σ) ÷ A[index + cr]
    end if
end function

function computeErr(xc, xn, ex, N)
    tid ← getThreadId()
    if tid < N then
        ex[tid] ← xn[tid] − xc[tid]
    end if
end function
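For concreteness, the runJacobi kernel above maps naturally onto CUDA as sketched below. This is our own illustrative translation, not necessarily the exact code used in the experiments; it assumes the uploaded block of A is stored row-major on the device, and the unused ex parameter is omitted.

__global__ void runJacobi(const double *A, const double *b,
                          const double *xc, double *xn,
                          int gRowsA, int N, int frb) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;  // getThreadId()
    if (tid < gRowsA) {
        int cr = frb + tid;              // row of the full system handled by this thread
        size_t index = (size_t)tid * N;  // offset of this row inside the uploaded block
        double sigma = 0.0;
        for (int j = 0; j < N; ++j)
            if (j != cr) sigma += A[index + j] * xc[j];  // off-diagonal sum of Eq. (6)
        xn[cr] = (b[cr] - sigma) / A[index + cr];
    }
}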

4.2 Generator of Systems of Linear Equations

We implemented a generator of systems of linear equations for which we know, ahead of time, their solutions. This gives us a good way to guarantee the accuracy of the final solution produced by our implementation and other methods: we ensure the correctness of the methods because we always have the true solution to compare with the computed solution. We generate a system of linear equations Ax = b for which A, x and b are known. First, we generate the coefficient matrix A with random values. Secondly, we generate a solution vector x with random values as well. Finally, we multiply A and x to calculate the right-hand side vector b. Later, A and b are passed to one method (Jacobi, MKL or Eigen), which computes a solution vector x̂ that can then be compared against the true solution x generated previously. Algorithm 4 shows this process. Notice that this process has been parallelized using threads. Each thread is in charge of generating a portion of the system. The portion assigned to each thread is calculated by dividing the number of rows of the system by the number of threads created. The initial row indicates the row where each thread starts to generate the corresponding portion of the system with respect to the general matrix.


Algorithm 4. Generate system

global variables
    A ≡ Coefficient matrix A.
    x ≡ Solution vector x.
    b ≡ Right-hand side vector b.
    cols_A ≡ Number of columns of matrix A.
    rows_A ≡ Number of rows of matrix A.
end global variables

predefined functions
    generateA(A_submatrix, rows_per_thread, initial_row): Generate portion of matrix A.
    generateX(x_subvector, rows_per_thread): Generate portion of vector x.
    generateB(b_subvector, A_submatrix, rows_per_thread): Generate portion of vector b.
end predefined functions

function generateSystem(rows_per_thread, number_threads, thread_id)
    initial_row ← thread_id × rows_per_thread
    if thread_id = (number_threads − 1) then
        rows_per_thread ← rows_A − initial_row
    end if
    A_local ← A[0] + initial_row × cols_A
    generateA(A_local, rows_per_thread, initial_row)
    x_local ← x[0] + initial_row
    generateX(x_local, rows_per_thread)
    b_local ← b + initial_row
    generateB(b_local, A_local, rows_per_thread)
end function

The Jacobi method does not always converge, so we generate a system Ax = b where the matrix A is strictly diagonally dominant in order to guarantee the convergence of the method. To do this, we fill all the cells of the matrix by rows with random values except the diagonal element (we use initial_row to point to the diagonal element; this variable is calculated with respect to the general matrix). In order to create a dominant matrix, we accumulate the absolute values of each row, omitting the diagonal element. Next, we add a random positive value to each obtained accumulator. Finally, we multiply this result by (−1)^a, where a is a random integer value. Algorithm 5 shows the process of generating matrix A.

5 Tests

The comparative tests between the algorithms previously described were run on two (2) different computers with the specifications shown in Table 1. For the following tests, each algorithm was run ten (10) times for each matrix size and the results were averaged in order to obtain better accuracy and precision.


Algorithm 5. Generate coefficient matrix (A)

global variables
    cols_A ≡ Number of columns of matrix A.
end global variables

predefined functions
    random(initial, final): Returns a random value between initial and final.
    abs(num): Returns the absolute value of num.
end predefined functions

function generateA(A_submatrix, rows_per_thread, initial_row)
    for i ← [0, rows_per_thread) do
        accum ← 0
        for j ← [0, initial_row) do
            A_submatrix[i × cols_A + j] ← random(−5000, 5000)
            accum ← accum + abs(A_submatrix[i × cols_A + j])
        end for
        for j ← [initial_row, cols_A) do
            A_submatrix[i × cols_A + j] ← random(−5000, 5000)
            accum ← accum + abs(A_submatrix[i × cols_A + j])
        end for
        δ ← random(1, 1000)
        r ← random(1, 1000)
        A_submatrix[i × cols_A + initial_row] ← (accum + δ) × (−1)^r
        initial_row ← initial_row + 1
    end for
end function
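A compact C++ sketch of the same idea (our own illustration with assumed helper code, reusing the ±5000 value range and the random positive slack from Algorithm 5) generates one strictly diagonally dominant row:

#include <cmath>
#include <cstdlib>

// Fill row 'i' of a row-major n x n matrix so that |a_ii| strictly exceeds
// the sum of the absolute values of the other entries in the row.
void generate_dominant_row(double *A, int n, int i) {
    double accum = 0.0;
    for (int j = 0; j < n; ++j) {
        if (j == i) continue;                       // the diagonal is set afterwards
        double v = -5000.0 + 10000.0 * (double)std::rand() / RAND_MAX;
        A[(long)i * n + j] = v;
        accum += std::fabs(v);
    }
    double delta = 1.0 + 999.0 * (double)std::rand() / RAND_MAX;  // random positive slack
    int sign = (std::rand() % 2) ? -1 : 1;                        // plays the role of (-1)^r
    A[(long)i * n + i] = sign * (accum + delta);                  // strict dominance
}

The diagonal entry's magnitude exceeds the accumulated off-diagonal magnitudes by the random slack, which guarantees strict diagonal dominance row by row and therefore convergence of the Jacobi iteration.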

5.1 Time

Figure 2 shows the execution times obtained with MKL, Eigen, and Jacobi for the solution of systems of linear equations Ax = b. Matrix sizes begin at 2000 and grow up to 24000 with a step of 2000. The limit is 24000 equations because MKL and Eigen do not have enough memory to operate over bigger matrices. Note that the Jacobi method on the GPU shows better performance than the others. This behavior is explained by the GPU parallelism. On the other hand, the copy operations made in memory by the other algorithms affect their performance. For instance, MKL has to transpose the matrix A in order to make use of the LAPACK routines written in Fortran, since Fortran expects the matrix A to be stored in the column-major format. In the case of Eigen, it requires a memory mapping operation between the original matrix A and its own matrix object in order to call its functions.

Table 1. Laptop features

Model      Asus G56JR                    Asus N551JW
OS         Antergos                      Manjaro
Processor  Intel(R) Core(TM) i7-4700HQ   Intel(R) Core(TM) i7-4720HQ
RAM        16 GB                         12 GB
GPU        GeForce GTX 760m - 2 GB       GeForce GTX 960m - 2 GB


[Figure 2: two panels, “ASUS N551JW” and “ASUS G56JR”; x-axis: size (N), y-axis: time (ms); series: Jacobi, Eigen, MKL.]

Fig. 2. Line charts of the average time taken by each algorithm to solve systems of linear equations of different sizes.

Figure 3 shows the measured execution times for Jacobi’s algorithm on the CPU versus our implementation on the GPU, solving systems of linear equations of the form Ax = b. Moreover, we included a comparative analysis between several versions of Jacobi featuring both double and single precision (marked with the suffix SP). In general, the results show that Jacobi obtains better performance on the GPU. In this case, matrix sizes begin at 2000 and grow to the maximum capacity of each machine with a step of 2000. The ASUS N551JW was able to handle sizes up to 36000, while the ASUS G56JR handled up to 42000.

5.2 Error

Figure 4 shows the relative error of Jacobi at the end of the fifth iteration, calculated using the Euclidean norm as mentioned for Eq. 7. The figure also shows that as the number of equations increases the error decreases faster, which means that the number of iterations decreases too. These results support the conclusion that the Jacobi method works better for big systems of linear equations under the diagonally dominant matrix restriction. In Fig. 4, for the sake of simplicity, the values obtained for N = 2000 and 4000 were omitted; these values were 4.5 × 10−7 and 1.1 × 10−7, respectively.


[Figure 3: two panels, “ASUS N551JW” and “ASUS G56JR”; x-axis: size (N x 1000), y-axis: time (ms); series: GPU DP, CPU DP, OMP DP, OMP SP, GPU SP.]

Fig. 3. Line charts of the average time taken by Jacobi’s method implemented in CPU vs our GPU to solve systems of linear equations of different sizes.

[Figure 4: relative error vs. size (N x 1000); series: G56JR, N551JW.]

Fig. 4. Line chart of the average error produced in the fifth iteration by Jacobi’s algorithm solving systems of linear equations of different sizes.

For this test, we decided to use the maximum size that could fit into the RAM of each machine (G56JR and N551JW); the maximum values of N were 42000 and 36000, respectively. Matrix sizes begin at 2000 and grow to the maximum capacity of each machine with a step of 2000.

5.3 Iterations

Figure 5 shows the number of iterations needed to solve systems of linear equations of different sizes. It can be observed that as the number of equations increases, the number of iterations decreases. This behavior shows that the Jacobi method performs better as the number of equations increases. For this test we decided to run two different sets of sizes: the first one begins at 5 and grows up to 100 with a step of 5, and the second one begins at 1000 and grows up to 24000 with a step of 1000.

[Figure 5: two panels of iterations vs. size (first panel: size (N) from 5 to 100; second panel: size (N x 1000)); series: G56JR, N551JW.]

Fig. 5. Line charts of the average iterations required by Jacobi to solve systems of linear equations of different sizes.

6 Conclusions and Future Work

The focus of our research has been solving systems of linear equations that are too large to fit in the memory of GPU cards but small enough that they fit in the main memory of a host system. Libraries that have been written to solve systems of linear equations on GPUs have not been designed to handle matrices of these sizes. Likewise, libraries that have been written to solve systems of linear equations on CPUs have not been designed to take advantage of the presence of GPUs. Our work addresses matrices in this particular range of sizes, and significant speedups were obtained by using a GPU to accelerate the execution of Jacobi’s method. Moreover, our tests show that it is feasible to


process systems on the GPU as blocks of data, rather than as a whole, while still achieving a significant increase in performance compared to CPU processing. The Jacobi method works well for diagonally dominant matrices and big systems of linear equations. Moreover, the method’s performance increases as the system’s dimension grows. In general, the Jacobi method has better performance when running on the GPU; however, its single-precision implementation shows unexpected results: on one machine the OpenMP version has better performance than the other; we think this result depends on the compute capability constraints of the GPU version. Finally, we conclude that better performance on the GPU can be achieved by using single precision. Additionally, GPU compute capability and the number of threads have to be taken into account in order to obtain better experimental results. Future work will include using other methods, different from Jacobi, so that matrices of similar sizes but with other numerical properties can be solved more rapidly by using GPUs. This also involves conducting research into partitioning schemes other than the one shown in this paper. Other future work will include using streams in CUDA, an approach that allows overlapping memory copy operations and computations, which should produce more efficient use of the GPU; addressing matrices big enough to exceed the host’s memory; and developing a better understanding of the error and the number of iterations required by systems of different sizes to converge.

References

1. Papakonstantinou, A., Gururaj, K., Stratton, J.A., Chen, D., Cong, J., Hwu, W.M.W.: FCUDA: enabling efficient compilation of CUDA kernels onto FPGAs. In: 2009 IEEE 7th Symposium on Application Specific Processors (2009)
2. Donno, D.D., Esposito, A., Tarricone, L., Catarinucci, L.: Introduction to GPU computing and CUDA programming: a case study on FDTD [EM programmer's notebook]. IEEE Antennas Propag. Mag. 52(3), 116–122 (2010)
3. Tomov, S., Nath, R., Ltaief, H., Dongarra, J.: Dense linear algebra solvers for multicore with GPU accelerators. In: Proceedings of the IEEE IPDPS 2010, 19–23 April 2010, Atlanta, GA, pp. 1–8. IEEE Computer Society (2010). https://doi.org/10.1109/IPDPSW.2010.5470941
4. Dongarra, J., et al.: Accelerating numerical dense linear algebra calculations with GPUs. In: Kindratenko, V. (ed.) Numerical Computations with GPUs, pp. 3–28. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-06548-9_1
5. Guennebaud, G., Jacob, B., et al.: Eigen v3 (2010). http://eigen.tuxfamily.org
6. Intel: Developer reference for Intel® Math Kernel Library 2018 - C (2017). https://software.intel.com/en-us/mkl-developer-reference-c
7. NVIDIA: CUDA CUBLAS Library, January 2010
8. Tomov, S., Dongarra, J., Baboulin, M.: Towards dense linear algebra for hybrid GPU accelerated manycore systems. Parallel Comput. 36, 232–240 (2010)
9. Jaramillo, J.D., Vidal Maciá, A.M., Correa Zabala, F.J.: Métodos directos para la solución de sistemas de ecuaciones lineales simétricos, indefinidos, dispersos y de gran dimensión. Universidad Eafit (2006)
10. NVIDIA: CUDA programming guide, January 2010
11. Correa Zabala, F.J.: Métodos Numéricos, 1st edn. Universidad EAFIT, November 2010
12. NVIDIA: What is GPU-accelerated computing? http://www.nvidia.com/object/what-is-gpu-computing.html
13. Flynn, M.: Very high-speed computing systems. Proc. IEEE 54, 1901–1909 (1967)
14. Almasi, G.S., Gottlieb, A.: Highly Parallel Computing. Benjamin-Cummings Publishing Co., Inc., Redwood City (1989)
15. Eves, H.: Elementary Matrix Theory, Reprinted edn. Dover Publications Inc., Mineola (1980)

Learning Analytics as a Tool for Visual Analysis in an Open Data Environment: A Higher Education Case

Johnny Salazar-Cardona1, David Angarita-Garcia1(&), and Jeferson Arango-López2,3

1 Institucion Universitaria EAM, Av. Bolívar # 3 – 11, Armenia, Colombia
[email protected]
2 FIET, Universidad del Cauca, Calle 5 N.º 4-70, Popayán, Colombia
3 Facultad de Ingeniería, Universidad de Caldas, Calle 65 Nº 26-10, Manizales, Colombia

Abstract. In the past few years, data analysis in the academic field has gained interest, because higher education institutions generate large volumes of information through historical data of students, information systems and the tools used in learning processes. The analysis of this information supports decision making, which positively impacts the academic performance of students and teachers. In addition, if the results of the analysis are shared with the institution community, it is possible to individualize the needs of the students and professors. Thus, the professional improvement process can be tailored to each person. This analysis needs approaches focused on the educational context, such as Learning Analytics (LA) and Educational Data Mining (EDM), which address the Knowledge Discovery Process (KDP). Because of this, we present in this paper the implementation of LA in an Open Data (OD) environment to analyze the information of a higher education institution. The analysis is focused on the academic performance of the students from different perspectives (social, economic, family, among others). In order to improve the results, the analysis process is done in real time through Web Analytics (WA) for each member of the institution according to their needs.

Keywords: Learning Analytics · Open Data · Educational Data Mining · Web Analytics · Academic Institution

1 Introduction

The analysis of data in the academic field arose from adapting the ideology of the Knowledge Discovery Process (KDP) and business intelligence (BI). These processes are applied in educational institutions in order to find elements that support decision making by teachers and managers. The main objective of these processes is to continuously improve the processes of the institutions as organizations. Apart from that, it is important to identify the factors that impact students’ academic performance in a positive or negative manner, through the collection of data from their histories or


academic trajectories and actions carried out in any system that supports the student training process [1]. In this field of academic data analysis, we can find two scientific communities that give the guidelines to obtain the best results: LA (Learning Analytics) and EDM (Educational Data Mining). Some authors approach these concepts incorrectly, confusing one with the other. The approach of both communities is similar, but there are key elements that differentiate their application. LA takes the human being as the main axis for the analysis of the data, using the data as a whole to describe its general behavior with a visual approach. For its part, EDM analyzes particular and individual data elements in an automated way, excluding the human being from the equation, without a high graphical level being necessary for the understanding of results [2]. Additionally, LA takes Business Analytics (BA) and Web Analytics as its main base for its application in the educational field. Based on these elements, it makes use of the basic tasks of a KDP and executes them in this context, focusing on aspects of data analysis that are more effective for educational information. This provides a high level of interaction with the final user, offering intuitive graphics and search filters with dynamic reports that allow the human being to make a judgment about the visualized data [1]. In this study, a prior analysis was made of the approach that the technology used for the design, implementation and publication of the dashboards would have, obtaining results similar to existing research [3, 4]. Subsequently, with the data centralized and treated computationally, a descriptive process was applied with visual analytics, focusing efforts on the elaboration of predefined dynamic reports or dashboards with which the users (the academic community in general) can interact and perform their own analyses in an Open Data environment. The data used in the elaboration of this prototype belongs to the students of the Software Engineering program of the “EAM University Institution”, complies with the HABEAS DATA protection regulations [5], and was focused on analyzing different aspects such as the academic performance of students throughout their career, subjects, social context, employment situation and family context [6]. The objective of this article is to present the Learning Analytics implementation process in a higher education institution with an Open Data approach. To achieve this, the document is structured as follows: Background in Sect. 2, where a conceptualization of the important elements and general terms is presented; Materials and Methods in Sect. 3, which details the materials used in the research process; Results in Sect. 4; and finally the conclusion in Sect. 5.

2 Background

In the last few years, LA research has been conducted on different aspects of the academic sector, such as data analysis to improve teaching practice [7, 8], student motivation and collaborative work analysis in virtual environments [9], understanding and improving the student learning process [10–12], and the generation of performance indicators to improve different institutional processes [11, 13], which is the objective of


this investigation. Additionally, Latin America is making some implementation efforts in this area of research, with visual analytics as one of its main focuses [14], but these have not been applied to an open data environment for public decision making by the academic community, as in this research. For the Learning Analytics application, it is necessary to be clear about some concepts on which this kind of methodology is based. Next, we present the key concepts that are necessary for the execution of the project.

2.1 Knowledge Discovery Process (KDP)

KDP is an ideology for the implementation of the specific tasks and activities that must be executed in any knowledge discovery process. From it, many methodologies have emerged according to the field of application. Some of these methodologies are: (1) KDD (Knowledge Discovery in Databases), a process to discover useful knowledge from a set of structured or semi-structured data; (2) SEMMA (Sample - Explore - Modify - Model - Assess); (3) Catalyst or P3QT; (4) CRISP-DM (Cross Industry Standard Process for Data Mining); (5) EDM (Educational Data Mining); and (6) LA (Learning Analytics), among others. Of all the existing models, the first one defined was KDD, in 1996, as a model for the research field [15–17]; subsequently, various approaches were defined, such as the models mentioned above. Among the most prominent models are CRISP-DM, which specializes in the industrial field; KDD, which offers flexibility in the execution process; and EDM-LA for the academic context, the last of which was used for the execution of this research.

2.2 Open Data (OD)

Open Data is an ideology and a practice that seeks the release of, and public access to, digitized data for free use, reuse and distribution by any interested person, provided the data are attributed and shared in the same way in which they appear [18, 19]. Open data usually come from two main sources: the scientific field (such as data on the human genome, geographic information, climate, etc.) and the government (accountability, transparency, crime, catastrophes, education, etc.). Regarding the governmental field, the data should be published by public institutions and by private institutions with public functions [20], in order to provide processes of transparency, participation, innovation and empowerment on the part of the citizenry or any interested third party [21]. In the academic sector, on the other hand, it is the educational institutions that have the duty to apply these data-opening processes [22]. Currently, Open Data is very important and is accepted as a fundamental component for decision making and the promotion of research. However, compared to other areas for which data are available, the field of education has limited resources in these repositories [23]. In the case of Colombia, the few efforts that have been made do not meet the minimum quality criteria [24, 25].

2.3 Visual Analytics (VA)

Visual analytics is the point of convergence between information visualization and scientific visualization, centered on the analytical reasoning of the human being through visual and interactive interfaces, applying a descriptive process to the data [26]. This concept is widely used in the field of business intelligence and in different areas of knowledge [27], allowing the behavior of the data being visualized to be understood in an easy and intuitive way [28]. In addition, it offers a component of interaction with the end user to support decision making. To achieve a successful human judgment on the data used, it tries to increase the available cognitive resources by using different visual elements, decreasing search times by presenting them in a compact space. This shows the data through different dimensions to understand its behavior and identify patterns, with predefined dynamic filters that allow the data to be evaluated in different contexts, supported by colors and sizes to induce knowledge [29]. Additionally, since the emergence of Web 2.0, visual analytics has turned to implementation on the web [30], in order to have dashboards on any platform, integrating different kinds of graphics such as decision trees, bar charts, maps, and any kind of graph that allows structuring of the data.

2.4 Learning Analytics (LA)

Learning Analytics is a research community seen as an additional branch in the field of knowledge discovery processes, applied specifically to data from an educational environment, where the concept of visual analytics is used to describe the behavior of the data generated in an academic context. These data involve elements such as levels of complexity of the courses, identification of aspects that affect academic performance, positive or negative reception by students of different topics, academic traceability, student desertion, access control to virtual education platforms, among others. During its history, the term Learning Analytics has been debated with respect to the concept and community of Educational Data Mining [31–33], with differences such as the focus of research and the size of the data. This gap between the research approaches and the size of the datasets is currently non-existent, since in the different scientific venues of these two communities the research topics are similar; it is the approach applied to the analysis of the data that makes the difference between them. LA makes use of the concept of Business Intelligence, specifically visual and web analytics, where the main element of judgment is the human being, who interprets and focuses the analyses according to their needs in an intuitive way, performing descriptive analysis. For its part, EDM focuses on the application of semiautomatic predictive elements, where the human being loses this relevance and is not the one who issues first-hand judgments for the interpretation of results [2]. In Latin America, the implementation of Learning Analytics is still limited. The research published in this field of knowledge is headed by Brazil, which by 2017 had a total of 14 articles and the participation of 16 institutions, followed by Ecuador with 7 publications, Chile with 3, and Colombia in the 4th Latin American position with a total of 2 articles from 2 institutions [14]. These quantities show a low level of activity considering the relevance of knowledge discovery processes at present. Among the few publications in the sector and the global research trend, the predominant theme is monitoring and analysis, applying experimental processes, with statistical processes, machine learning, social network analysis and visual analytics dominating [34].


Finally, LA defines an application process and an approach methodology, both based on the KDP ideology (see Figs. 1 and 2), and proposes three stages: (1) Data collection and pre-processing, in which the data from different sources are gathered in a single data warehouse, and a set of business rules is later applied in order to obtain adequate data to analyze; this stage is defined from the union of different KDP tasks, such as the analysis and understanding of the data environment, creation of the working database, and cleaning and transformation of the data. (2) Analytics and action, which is the application of data analysis techniques for interpretation and pattern detection and, based on the objective of the analysis, taking the corresponding measures together with the disclosure of results; this stage takes the actions of understanding and choosing the data mining technique, its application, interpretation - evaluation, and processing of results from the KDP ideology. (3) Post-processing, focused on the refinement of the working database in order to extend the analyses carried out, adding new data sources and attributes and extending the previously performed analyses to new approaches. Regarding the application model, there are four fundamental elements: (1) What? The set of data that will be analyzed; (2) Who? To whom the analysis is destined; (3) Why? The objective of the analysis; (4) How? The process applied to achieve it, and it is at this point that the application process previously described is embedded.

Fig. 1. Learning Analytics model [35].


Fig. 2. Learning Analytics process [35].

3 Materials and Methods

In the following paragraphs, we explain the different stages of the methodology used, which guided the execution of the investigation up to the results stage.

3.1 Learning Analytics ¿What?

The data set that was analyzed contains the basic information of all students of the Software Engineering program of the EAM university institution, located in Armenia, Quindío. This dataset is made up of historical data from 2007 to 2017, including family information, origin, residence, disability and employment status of students. In addition, it was integrated with another data set containing the complete academic record. In total, there are 634 student records with 50 attributes, and the academic record data set has 13282 records, one per subject taken by each student, with 13 attributes, which include the scores of the first 3 cuts and the final grade of each academic space.

3.2 Learning Analytics ¿Who and Why?

The objective of the project is to provide a mechanism by which real-time analysis can be performed in an OD environment, where any person or area of the educational institution can perform their own analysis. This was aimed at the senior management of the institution, to assess the current status of the program; at the marketing area, so that they can analyze which geo-referenced locations have the largest number of students and know which socioeconomic strata should be considered to increase the enrollment of a larger population; at the CAP area (Comprehensive Accompaniment Plan, the area responsible for the accompaniment and support of students with poor academic performance), to identify patterns and behaviors of students with low performance, students with disabilities and student desertion; at the “Parents’ School” area (an academic program focused on the integration of parents in the academic education of their children, as well as the continuity of their own professional training), to focus studies and academic training programs aimed at parents of students for their continued inclusion in the academic field; and at the students in general, so that they can carry out descriptive processes of the program, visualizing the subjects with the greatest positive and negative impact on their professional training process. Mainly, the previously proposed approaches seek to find patterns that affect the academic performance of the students according to the situation of the parents, family nucleus, place of residence, semester being studied, working day, academic spaces, socioeconomic level, disability and employment situation [6].

3.3 Learning Analytics ¿How? – Data Collection and Pre-processing

Data centralization and pre-processing were carried out through an ETL process [36], in which the data were extracted from the original data source by consuming a web service, and subsequently a set of business rules was applied in order to calculate and organize data that did not meet a minimum quality level, so as to answer the established questions. Some of the business rules applied to obtain an optimal data set were: calculation of latitude and longitude, from residence addresses, for geo-referenced visualization of students; treatment of badly named subjects and grades out of range; calculation of the time it takes students to complete their studies; determination of the last semester completed and the last year attended by each student for the detection of student desertion; elimination of attributes in which a high percentage of values were null and that were not necessary to answer the questions posed; and correction of some attributes with non-standardized values such as gender, professions of parents, cities of residence (integrating neighboring towns from which students commute daily), levels of education of parents, and monthly economic income [37]. It is noteworthy that the analyzed data comply with the HABEAS DATA protection standards [5]. Finally, once the data were clean, they were centralized on the Windows Azure platform [38], which provides a quick web access mechanism and integration with multiple data analysis platforms. In this centralization, a data warehouse with multiple dimensions was designed, which will allow optimal performance in the long term when the amount of data is even greater [39] (see Fig. 3).

Fig. 3. Data warehouse model

3.4 Learning Analytics ¿How? – Analytics and Action

In the initial stages of the project execution, a systematic review process was carried out to determine, for a higher education institution, the most adaptable and efficient kind of tool that could be applied in an academic setting, resulting in the Microsoft Power BI tool [4]. With this tool, a configuration was established for the consumption and visualization of data in real time, in order to perform analyses with updated data for the entire university community. Subsequently, we proceeded to design and implement each of the dashboards that allow the university community to perform their own analysis; these reports also provide the possibility of downloading the data at any time in order to apply external analysis processes, such as predictive analysis applying the EDM model in any of its different fields of action [40]. With the different dashboards designed, the aim was to focus the analysis on answering the questions and objectives. The first one was the analysis of the academic performance of the students by day (shift), level and semester. For this dashboard, variables were used such as: the grade of each of the 3 cuts per semester along with the final grade, and their visualization through the years of existence of the Software Engineering program, as well as through the different academic semesters. It also allows seeing the behavior of each of the academic spaces and its final grade, the total number of students who have taken each course, and the application of filters by academic day (day - night), city of residence, level (technical, technological or university), academic term (1–2), distribution by gender and a key performance indicator with the general academic average. It should be noted that all the dashboards implemented are completely dynamic, so when any element in a report is selected, all the graphics and data of the other sections in it are affected. The dashboard that allows analysis by social stratum or physical disability has variables such as the final grade of each academic space, total students per year, distribution of students by socioeconomic stratum, distribution of students by kind of disability, an indicator with the total number of students, a map with geo-referenced visualization of all students, and filters such as municipality, kind of disability and kind of document. The analysis focused on the family nucleus of students has variables such as the final grade of each academic space, the average of each cut and the final grade per year, distribution of the total siblings of students, distribution of total students by socioeconomic stratum, total students, general academic average, geo-referenced location of the students, and specific filters by HPE (Health Provider Entity), municipality, gender, number of dependents, own housing, type of document, year and marital status. An example of the generated dashboards can be seen in Fig. 4.
Regarding the analysis focused on the employment situation, the available variables were the final grade per academic space, distribution by socioeconomic stratum, distribution by economic dependence (dependent, independent, family), average per


Fig. 4. Dashboard of the family nucleus of the students

semester, total of dependent and independent students per year, total students in general, and filters such as marital status, gender, city of residence, and position in the company as a dependent or independent worker. The dashboards that do not seek to analyze academic performance as the main axis, but rather to assess the context in which the student develops day to day, such as their family situation, can cross variables such as the distribution of the students’ fathers and mothers by profession and academic level, the academic performance of the students according to that parental education, and the application of filters by year and gender, visualizing the total number of students. Finally, for the evaluation of student desertion, the analyzed variables were the average per academic space, total students per year and per semester, general academic average, student distribution by gender, last year attended by students, total students by stratum, total number of students enrolled, total number of students currently enrolled, and the provision of filters by program, year, last semester taken by a student, and time taken to complete the degree (here one can filter to see only deserters).

3.5 Learning Analytics – How? Data Collection and Post-processing

All the data and dashboards implemented were made available on the web portal of the EAM university institution, so that the entire academic community can access them and perform their own analyses. Since everything runs in the Windows Azure ecosystem, the displayed data is refreshed periodically, offering fully updated and pre-processed data.


4 Analysis of Results

In the next paragraphs, we will explain the different results obtained, dividing them by academic performance, family nucleus and student desertion.

4.1 Students' Academic Performance

By conducting an exploratory process in the different implemented dashboards, some relevant behaviors were identified among the students of the software engineering program, which allow focusing on specific points to improve the students' academic performance: (1) In general terms, women have better academic performance than men over the course of their training as software engineers, although it is a career preferred by men, who correspond to 83.44% of the total student body; (2) It is possible to visualize the constant improvement of students through their professional training, but this behavior changes in the 5th semester; this may be due to the fact that it is the semester where the change from the technical level to the technological level occurs, and subsequently to the university level (see Fig. 5); (3) The academic spaces with an average final grade below 3 are: Administration of Databases, Physics I, Geometry, Programming Logic, Mathematics, Discrete Mathematics, Applied Mathematics I, Applied Mathematics II, Pre-calculus, and Principles of Software Engineering, these being academic spaces with a high demand for logical skills; (4) Around 72% of the students belong to strata 2 and 3; in addition, there is a clear relationship between socioeconomic stratum and academic performance (see Fig. 6), where the highest socioeconomic strata show the lowest performance; (5) Half of the students in the program are only children; (6) Only 10% of the students have a job, either dependent or independent; the other 90% depend on their family; (7) Students who work as freelancers have better communication skills, and this is clearly reflected in the grades of the academic spaces where communication skills are necessary, far exceeding the general average; (8) Students with a physical or motor disability, or who are married at the time of beginning their professional training, have an academic performance higher than the general academic average; (9) A representative factor in the analyzed sample is the impact that the socioeconomic strata have on the academic performance of the students and on possible student desertion. For this reason, awareness-raising strategies should be generated to encourage and value the process of professional training in students, in order to level these differences between strata. One element that may be of interest for increasing the enrollment of students in the software engineering field is more effective dissemination in the highest strata, since the program is not being well received there, which is reflected in the low number of students from the higher socioeconomic strata.

Fig. 5. Academic average by semester

Fig. 6. Academic average by social stratum 5–6 in comparison with the general average

4.2 Family Nucleus of the Students

In the exploratory process carried out, it was identified that: (1) Although there are students whose fathers and mothers have postgraduate training, students tend to have a lower academic performance when the father has this kind of training, whereas when the mother has it, the student's academic performance is above average (see Fig. 7); (2) When the father of a student is abroad or works as an independent, the student's academic performance is low, while if the father works as a teacher the academic performance tends to be higher; (3) A high percentage of the fathers of the students of the software engineering program are retired, and none of these have postgraduate training, while the retired mothers, although few, do have postgraduate training. From the above, one element that influences the academic performance of students is their parents, which was verified with the results obtained; therefore, awareness strategies should be created so that parents positively influence their children during the training process. In addition, the diversity of the professional practice of the parents was also identified, which can be seen as an opportunity for inclusion through programs such as “School of Parents” for the integration of parents with their own professional training and that of their children.


Fig. 7. Academic average by academic level of the student's mother

4.3 Student Desertion

The most relevant results obtained from the exploratory process carried out in the student dropout dashboard are: (1) Although the great majority of the program's students are men, women are the ones who drop out the most: taken individually, 67% of women are dropouts, compared to 59% of men; (2) There are dropouts in all academic semesters, but the lower the semester, the greater the amount of desertion; (3) The distribution of desertion is 75% for strata 5 and 6, 54% for stratum 4, 60% for stratum 3, 61% for stratum 2, and 55% for stratum 1; (4) Women, although they drop out more than men, have an academic average above 3 when they make this decision, while men have an average below 3 when they drop out; (5) The academic spaces with the worst academic performance among dropouts are those related to mathematics and programming (see Fig. 8).
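Since the reports allow downloading the underlying data for external analysis, indicators such as these can also be recomputed outside Power BI. The following is a minimal pandas sketch of that idea; the file name and column names are hypothetical, not the institution's actual schema.

import pandas as pd

# Hypothetical export of the dropout dashboard; column names are ours, not the EAM schema.
df = pd.read_csv("students.csv")   # columns: gender, stratum, dropout (0/1), final_average

# Dropout rate by gender and by socioeconomic stratum, as percentages.
print((df.groupby("gender")["dropout"].mean() * 100).round(1))
print((df.groupby("stratum")["dropout"].mean() * 100).round(1))

# Academic average of dropouts vs. non-dropouts.
print(df.groupby("dropout")["final_average"].mean().round(2))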

Fig. 8. Academic average by academic space and student desertion in the first semester.


5 Conclusions and Future Work

This document presents the different elements that were taken into account for the implementation of a visual analytics process in an Open Data environment of a higher education institution, based on the Learning Analytics approach. These descriptive analyses focused mainly on the academic performance of the students, student desertion, and their family context, from data that was structured, pre-processed, transformed, and centralized in a data warehouse under the Windows Azure ecosystem and analyzed with the different elements available in the Power BI tool. Statistical patterns were identified, which can be used by different areas of the analyzed university institution, such as the Software Engineering academic program, the CAP, the Marketing area, or any member of the university community.

Among the elements identified is the impact on students of the transition from the technical level to the technological level, and later to the university level. For this, strategies must be generated that allow the change between the different academic levels to be carried out normally. Additionally, it was evident that the most difficult academic spaces for students in general are related to the use of logic to solve problems, so efforts should be focused on further developing this skill in students, in order to increase these indicators and thus avoid student desertion, which was also shown to be related to the low academic performance in this kind of subject.

This research demonstrates the possibility of integrating visual analytics in a higher education institution, where all the analysis mechanisms are made available to the academic community in general, releasing all non-sensitive data so that any interested party can rely on them for decision making, applying filters and visualizing the behavior of the data in real time without violating any kind of privacy, and always looking for the continuous improvement of the academy, both at the level of the students' training processes and of the institution seen as an organization. In order to continue refining the implementation of data analysis in the academic sector, data should be integrated not only from a particular program but from the entire university community; in addition, EDM elements such as predictive processes should be added, in order to find even deeper patterns that allow generating new strategies for strengthening university institutions.

Acknowledgements. We thank the research area of the EAM University Institution and the Software Engineering program, which facilitated the whole process of project execution.

References

1. Baepler, P., Murdoch, C.J.: Academic analytics and data mining in higher education. Int. J. Sch. Teach. Learn. 4, 11 (2010)
2. Siemens, G., Baker, R.: Learning analytics and educational data mining: towards communication and collaboration. In: Proceedings of the 2nd International Conference on Learning Analytics and Knowledge, Vancouver, British Columbia, Canada, pp. 252–254. ACM (2012)


3. Gounder, M.S., Iyer, V.V., Al Mazyad, A.: A survey on business intelligence tools for university dashboard development. IEEE (2016)
4. Salazar Cardona, J.A., Angarita Garcia, D.A.: Evaluación y selección de herramientas de analítica visual para su implementación en una institución de educación superior. IngEAM, vol. 4, pp. 1–20 (2017)
5. Cote Peña, L.F.: Hábeas Data en Colombia, un trasplante normativo para la protección de la dignidad y su correlación con la NTC/ISO/IEC 27001. Magíster en Derecho, Facultad de Derecho, División de Ciencias Jurídicas y Políticas, Universidad Santo Tomás Seccional Bucaramanga, Bucaramanga, Colombia (2016)
6. Vásquez Velásquez, J., Castaño Vélez, E., Gallón Gómez, S., Gómez Portilla, K.: Determinantes de la deserción estudiantil en la Universidad de Antioquia. Facultad de Ciencias Económicas, Centro de Investigaciones Económicas, Universidad de Antioquia, p. 43 (2003)
7. Tanaka, M.: C3.js D3-based reusable chart library, May 2014. http://c3js.org/
8. Einhardt, L., Aires Tavares, T., Cechinel, C.: Moodle analytics dashboard: a learning analytics tool to visualize users interactions in Moodle. In: XI Latin American Conference on Learning Objects and Technology (LACLO) (2016)
9. Echeverría, L., Benitez, A., Buendia, S.: Using a learning analytics manager for monitoring of the collaborative learning activities and students' motivation into the Moodle system. In: IEEE 11th Colombian Computing Conference (CCC) (2016)
10. Charleer, S., Vande Moere, A., Klerkx, J.: Learning analytics dashboards to support adviser-student dialogue. IEEE Trans. Learn. Technol. (2017)
11. Morais, C., Alves, P., Miranda, L.: Learning analytics and performance indicators in higher education. In: 12th Iberian Conference on Information Systems and Technologies (CISTI) (2017)
12. Ruiz Ferrández, M., Ortega, G., Roca Piera, J.: Learning analytics and evaluative mentoring to increase the students' performance in computer science. In: IEEE Global Engineering Education Conference (EDUCON) (2018)
13. Van der Stappen, E.: Workplace learning analytics in higher engineering education. In: IEEE Global Engineering Education Conference (EDUCON) (2018)
14. Lemos dos Santos, H., Cechinel, C., Carvalho Nunes, J.A.B., Ochoa, X.: An initial review of learning analytics in Latin America. In: 2017 Twelfth Latin American Conference on Learning Technologies (LACLO), Argentina (2017)
15. Fayyad, U., Piatesky-Shapiro, G., Smyth, P.: From data mining to knowledge discovery in databases. AI Mag. 17(3), 37–54 (1996)
16. Fayyad, U., Piatesky-Shapiro, G., Smyth, P.: The KDD process for extracting useful knowledge from volumes of data. Commun. ACM 39, 27–34 (1996)
17. Fayyad, U., Piatesky-Shapiro, G., Smyth, P.: Advances in Knowledge Discovery and Data Mining. MIT Press, Cambridge (1996)
18. Open Knowledge Foundation: Open Data Handbook Documentation: Release 1.0.0 [online], vol. 1, Cambridge (UK) (2012)
19. Salazar, J.: Marco de referencia para la implementación del mapa de ruta establecido en los lineamientos nacionales de apertura de datos del sector público y su integración con procesos de descubrimiento de conocimiento e inteligencia de negocios. Master en Ingeniería Computacional, Facultad de Ingenierías, Universidad de Caldas, Manizales, Colombia (2015)
20. El Congreso de la República: Ley 1712 del 2014, Literal J, artículo 6, Definiciones. Diario Oficial 49084 de marzo 6 de 2014, Colombia (2014)


21. Florez Ramos, E.: Open data development of countries: global status and trends. In: IEEE ITU Kaleidoscope: Challenges for a Data-Driven Society (ITU K), 27–29 November 2017
22. Heise, A., Naumann, F.: Integrating open government data with stratosphere for more transparency. J. Web Semant. 14(Jan), 3 (2012)
23. Akhtar Khan, N., Ahangar, H.: Emerging trends in open research data. In: IEEE 9th International Conference on Information and Knowledge Technology, pp. 141–146 (2017)
24. Mahecha Moyano, J.F., López Beltrán, N.E., Velandia Vega, J.A.: Assessing data quality in open data: a case study. In: 2017 Congreso Internacional IEEE De Innovacion y Tendencias en Ingenieria (CONIITI) (2017)
25. Salazar Cardona, J.A., Gomez, C.H., López Trujillo, M.: Knowledge discovery process in the open government Colombian model. In: 2014 9th Computing Colombian Conference (9CCC) (2014)
26. Wong, P.C., Thomas, J.: Visual analytics. IEEE Comput. Graph. Appl. 24, 20–21 (2004)
27. Chitvan, M., Nayan, C., Ajayshanker, S.: Scope and challenges of visual analytics: a survey. In: International Conference on Computing, Communication and Automation (ICCCA 2017) (2017)
28. Mitchell, J., Ryder, A.: Developing and using dashboard indicators in student affairs assessment. New Dir. Stud. Serv. 142, 71–81 (2013)
29. Scholtz, J., Ebert, D., Elmqvist, N.: User-centered evaluation of visual analytics. Synthesis Lectures on Visualization, p. 71 (2017)
30. Yu, S., Wu, L., Zhang, X.: Research on equipment knowledge representation based on visual analytics. In: IEEE 13th International Conference on Semantics, Knowledge and Grids, pp. 208–212 (2017)
31. Goldstein, P.J., Katz, R.N.: Academic analytics: the uses of management information and technology in higher education. ECAR Research Study, vol. 8 (2005)
32. Oblinger, D.G., Campbell, J.P.: Academic Analytics. EDUCAUSE White Paper (2007)
33. Norris, D., Baer, L., Leonard, J., Pugliese, L., Lefrere, P.: Action analytics: measuring and improving performance that matters in higher education. EDUCAUSE 43 (2008)
34. Bodily, R., Verbert, K.: Review of research on student-facing learning analytics dashboards and educational recommender systems. IEEE Trans. Learn. Technol. 10(4), 405–418 (2017)
35. Amine Chatti, M., Lea Dyckhoff, A., Schroeder, U., Thüs, H.: A reference model for learning analytics. Int. J. Technol. Enhanced Learn. 4, 318–331 (2012)
36. Mukherjee, R., Kar, P.: A comparative review of data warehousing ETL tools with new trends and industry insight. In: 2017 IEEE 7th International Advance Computing Conference (IACC) (2017)
37. Howard, H.: Knowledge Discovery in Databases (2012). http://www2.cs.uregina.ca/~dbd/cs831/index.html
38. Di Martino, B., Cretella, G., Esposito, A., Giulio Sperandeo, R.: Semantic representation of cloud services: a case study for Microsoft Windows Azure. In: 2014 International Conference on Intelligent Networking and Collaborative Systems (INCoS), Salerno, Italy (2014)
39. Carreño, J.: Descubrimiento de conocimiento en los negocios. Panorama 4, 59–76 (2008)
40. Anoopkumar, M., Zubair, R.: A Review on Data Mining Techniques and Factors Used in Educational Data Mining to Predict Student Amelioration, pp. 1–12. IEEE (2016)

Mathematical Model for Assigning an Optimal Frequency of Buses in an Integrated Transport System

Juan Sebastián Mantilla Quintero1(B) and Juan Carlos Martínez Santos2

1 Ormuco Inc., Montreal, Canada
[email protected]
2 Universidad Tecnológica de Bolívar, Cartagena, Colombia
[email protected]

Abstract. This paper proposes a mathematical model to estimate the frequency setting of the buses of a specific route at any given hour in a public transportation system. This model can be used for three different purposes: to determine how many buses a route needs to fully satisfy its demand, to estimate an optimal bus frequency that satisfies the maximum amount of demand when the number of buses is fixed, and to estimate an optimal bus frequency that satisfies a given percentage of the demand. It receives three inputs: the number of buses assigned to the route (which may be fixed or variable, depending on the purpose), the travel time of the route, and the route's demand. A series of equations is proposed using a heuristic method, which allows calculating the frequency of a route at any given hour of the day. A use case experiment is presented to help understand how to use the model for its different suggested uses, and it exposes how the proposed model could improve an existing one. The results of this experiment showed that the demand could be fulfilled using one of the model's cases.

Keywords: Public transportation · Frequency setting · Waiting time · Transportation · Mathematical model

1 Introduction

Public transportation is a key aspect of urban infrastructure, providing economic, social, and environmental benefits. With the increasing pressure upon global carbon dioxide emissions, the role of public transport has been given a renewed focus [5,6]. Under this focus, public transport systems (PTS) have to attract passengers much more than before. There have been various approaches over time, such as trains, public buses, and Integrated Public Transport Systems (IPTS), among others.

The design of an efficient PTS model is an urgent matter to encourage users away from individual transportation. This design has to be done in such a way that the overall time a user spends in public transport to get from the departure point to the final one is less than the time spent in a private vehicle; this is what we call a decent service. If a PTS does not provide a good service, passengers will not use it and the system itself will not be self-sustainable.

As mentioned before, a PTS has to be designed in such a way that it provides a good service. The way to do this is to divide the problem into smaller ones. This is well known as the Urban Transit Network Design Problem (UTNDP). There are five sub-problems of the UTNDP [1]: network design, frequency setting, timetable development, bus scheduling, and driver scheduling. Each problem requires the output of the previous one. By implementing these sub-problems, it is possible to provide a good service. In this paper, we focus on the second one, frequency setting, narrowed to ITS.

As mentioned before, there are various approaches, such as trains, community buses, and IPTS, among others. However, there are cities where a train is impossible to implement, the public bus system is over-saturated or heavily delayed, and there are not enough water bodies to cover the whole city. In these cities, the most suitable PTS is the IPTS. An IPTS model such as Transmilenio (in Bogotá, Colombia) consists in having main roads used exclusively by the IPTS's buses, which helps to minimize the in-vehicle travel time. Although this helps to minimize the overall time spent, the waiting time is an important factor for users when choosing public transport over individual transport. Moreover, assigning specific roads to IPTS's buses can increase the in-vehicle time of personal transport users, since it decreases the number of roads they can travel on; this intrinsically augments the number of users who would prefer public transport. Additionally, extensive waiting times in an IPTS can lead to a conglomeration of users, which evolves into a system that facilitates pickpocketing, harassment, and even user protests if the previous two are frequent.

The waiting time is considered an important matter for the user since, in an IPTS, this is the portion of time the user cannot anticipate, and it has to be minimized. In order to do this, we consider a way to determine the frequency setting of the buses that are sent into service by taking into account the number of users waiting for a specific bus. In Colombia, several cities have implemented ITS, e.g., Bucaramanga, Barranquilla, Cartagena, Cali, Medellin, and Risaralda. The study case in this paper is Transcaribe, Cartagena's ITS. At Transcaribe, the route planning and the frequency setting were determined 10 years before the system started operating, so the studies made did not apply very well when it started working.

This paper is organized as follows: Sect. 2 discusses how the problem has been addressed. Section 3 introduces our approach. Section 4 shows the results obtained with the current frequency setting of the use case and how it would change using the proposed model. Section 5 exposes the future work. And finally, Sect. 6 concludes the paper.

2 Related Work

This section looks at how the scientific community has approached the PTS, specifically the five sub-problems proposed by Ceder and Wilson [1]. What they proposed is a series of steps that determine the tasks to follow when designing a public transport system.

2.1 Network Design

The network design's goal is to set all the routes that the buses will follow. With the timetable, it is possible to schedule the buses according to the network design. One way to do this is by searching the shortest path repeatedly in a timetable-based public transport time-space network graph [8]. This considers the edges of the graph, the individual preferences of the travelers, and the departure time from the origin. The method takes into account bus and train transport and a series of functions to determine the output choice sets. Then, it evaluates the choice sets by their size, by asking whether at least one generated path has a high similarity to a corresponding observed path, and by the ability to generate choice sets, which provides a stable parameter in the estimation of route choices. These choice sets are used to develop the network design.

2.2 Frequency Setting

The frequency determines how often the buses will go into transit. Schéele proposed a non-linear programming solution for the frequency assignment problem [9]. He proposed to solve the problem by taking into account the capacity of the buses, attempting to minimize the general travel time (walking, in-vehicle, and waiting time), assuming that the network routes are already established and only including the most relevant vertices, which reduced the size of the problem. The proposed model was tested on the town of Linköping with 6 routes. Huang et al. proposed a bi-level formulation for assigning the frequency setting [4]. The upper level concerns two factors: the network cost, which is the passengers' expected travel time based on uncertain bus passenger demand and operating costs, and the network robustness, indicated by the variance in passengers' travel time. At the lower level, they calculated the proportional flow eigenvalue using the optimal strategy transit assignment model.

2.3 Timetable Development

Parbo et al. calculated an optimized timetable using an already established transit network [7]. They calculated a weight for the transfer waiting time, optimized this weight using a public transport assignment model to evaluate the behavioral changes of the passengers, imposed a new timetable using the changes evaluated in the optimization, and repeated this process until no further optimization was achieved. This approach was applied to a large-scale network in Denmark. Although it was applied to all transit lines of Denmark, only the bus timetables were changed by the optimization. It took five iterations to converge; the transfer waiting time weight was improved by 5% and the general travel cost was reduced by 0.3%. Song et al. proposed a modern method to determine a real-time timetable using a WiFi network on every train line [10]. They do it using a spectral clustering method based on derived commuter trajectories. The data they propose to collect is a record for every device with a minimum set of fields, including an anonymous identifier (MAC address), a timestamp, and a location. The probe time varies for every device.

2.4 Bus Scheduling

Once the timetable scheduling is done, the bus scheduling can be done. This is a schedule of the enlistment time and the departure time of all the buses on the route. According to Wagale et al., timetable and vehicle scheduling are the basis of security and efficiency for various bus enterprises [11].

2.5 Driver Scheduling

Han and Wilson had the objective of minimizing the occupancy at the most loaded point on any route through the network [3]. They added a constraint on the maximum number of buses to reduce the size of the problem. The passenger assignment problem was tackled as follows: if there is only one route at hand for an origin-destination pair, this route is used. However, if there is a set of routes available, the demand is divided between them using frequency sharing [2]. The methodology for this solution is to calculate a lower bound by assigning passenger flows and route frequencies until convergence is achieved. Then, they formulated a surplus allocation problem with linear constraints. The proposed model was tested on an instance of six vertices and three routes.

3 Our Approach

This section explains how the formulas were derived to estimate the current state of an IPTS (Transcaribe). A number of variables are used in these formulas; Fig. 1 gives a graphic visualization of them.

3.1 Road Network

Transcaribe already has the network routes defined. These are shown in Fig. 2. Since they are already established, the model has to work with these routes; to narrow the problem, the current work focuses only on one route, but the procedure is scalable to the whole system.


Fig. 1. Graphic visualization of the variables used.

3.2 Frequency Estimation

To get a formula that is capable of determining the frequency setting, the enlistment time (Et) is calculated using Eq. (1). This enlistment time is the time every bus takes from the moment it arrives at the final point of the route to the moment it is ready for transportation again. Then, the departure time (Dt) is calculated using Eq. (2), from the travel time of the current hour (th), the number of buses assigned to the route (b), and the enlistment time of the buses (Et).

$$E_t = \frac{\sum_{i=0}^{b} E_{t_i}}{b} \qquad (1)$$

$$D_t = \frac{E_t + t_h}{b} \qquad (2)$$

Now the frequency (f), Eq. (3), can be estimated with the departure time (Dt) and the quantity of time in minutes (t):

$$f = \frac{t}{D_t} \qquad (3)$$
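As an illustration only (not the authors' implementation), Eqs. (1)–(3) can be computed as follows; the per-bus enlistment times and the travel time are hypothetical values.

# Illustrative sketch of Eqs. (1)-(3); names and input values are ours, not from the paper.

def enlistment_time(enlistment_times_per_bus):
    """Eq. (1): average enlistment time E_t over the buses of the route."""
    return sum(enlistment_times_per_bus) / len(enlistment_times_per_bus)

def departure_time(e_t, t_h, b):
    """Eq. (2): departure interval D_t from the enlistment time E_t (min),
    the travel time of the current hour t_h (min) and the number of buses b."""
    return (e_t + t_h) / b

def frequency(t, d_t):
    """Eq. (3): buses dispatched during a window of t minutes."""
    return t / d_t

if __name__ == "__main__":
    e_t = enlistment_time([5, 5, 5, 5, 5, 5])   # hypothetical per-bus enlistment times
    d_t = departure_time(e_t, t_h=25, b=6)      # hypothetical travel time of 25 min
    print(frequency(60, d_t))                   # -> 12.0 buses per hour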


Fig. 2. Transcaribe's road network.

3.3 Maximum Capacity

The maximum capacity per hour for the route (Ch) can be estimated from the frequency (f) and the maximum capacity of each bus (Cb):

$$C_h = C_b\, f \qquad (4)$$

3.4 Estimating Waiting Time

Once f and all the formulas above are calculated, the waiting time can be calculated as well. The waiting time of user i (Wtui) can be calculated from the time when user i arrives at the bus stop (Atui) and the arrival time of the bus j that picks up user i (Abj), Eq. (5):

$$W_{t_{u_i}} = A_{b_j} - A_{t_{u_i}} \qquad (5)$$

Now, it is imperative to know at what time bus j arrives (Abj). Equation (6) allows calculating this time:

$$A_{b_j} = S_t + j\, D_t + t_i \qquad (6)$$

where:
– St is the start time of the system.
– f is the frequency of the system calculated with Eq. (3).
– ti is the time that bus j takes from the initial point of the course to the point where it meets passenger i.
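A minimal sketch of Eqs. (5) and (6) follows, assuming hypothetical values for the departure interval, the travel time to the user's stop, and the user's arrival time; the search for the first feasible bus j is our addition.

# Illustrative sketch of Eqs. (5)-(6); times in minutes from the system start, names are ours.

def bus_arrival_time(s_t, j, d_t, t_i):
    """Eq. (6): arrival time Ab_j of the j-th bus at the stop of passenger i."""
    return s_t + j * d_t + t_i

def waiting_time(ab_j, at_ui):
    """Eq. (5): waiting time of user i, given the bus arrival and the user arrival."""
    return ab_j - at_ui

if __name__ == "__main__":
    d_t = 5.0      # departure interval from Eq. (2), hypothetical
    t_i = 8.0      # travel time from the route start to the user's stop, hypothetical
    at_ui = 20.0   # the user reaches the stop 20 min after the system start
    # first bus j whose arrival is not earlier than the user's arrival
    j = 0
    while bus_arrival_time(0.0, j, d_t, t_i) < at_ui:
        j += 1
    print(waiting_time(bus_arrival_time(0.0, j, d_t, t_i), at_ui))  # -> 3.0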

3.5 Suggested Uses

There are three cases in which this model can be used, depending on the number of buses: a fixed number of buses, covering the whole demand, and attending a specific percentage of the demand. In any case, the output b should be rounded up if the system can afford it and rounded down otherwise; i.e., if the b obtained with the model is 12.3 and the system can afford to have as many buses as needed, b takes the value 13; if not, it takes the value 12. Likewise, f should always be rounded down.

Case 1. Fixed Number of Buses. This case is useful when the system cannot afford to add more buses to the route and wants to maximize the number of attended users. The model enables finding an optimal frequency that maximizes the number of users attended. Equations (2) and (3) can be combined to calculate f:

$$f = \frac{t\, b}{E_t + t_h} \qquad (7)$$

Case 2. Cover the Whole Demand. This case is useful when the system can afford to add as many buses as needed to fulfill the entire demand. Combining Eqs. (2), (3), and (4) allows determining the number of buses b that will cover the whole demand d:

$$b = \frac{d\,(E_t + t_h)}{C_b\, t} \qquad (8)$$

Case 3. Attend a Specific Percentage of the Demand. This case is useful when the system wants to increase the number of users attended but is not able to add as many buses to the route as needed to fulfill the whole demand. Knowing the demand, the specified percentage determines a fixed number of buses, for which Eq. (7) can be used.
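The three cases can be sketched as follows; this is our illustration, not the authors' code, and the input values (Cb, th, demand) and rounding choices are assumptions based on the text above.

import math

# Illustrative sketch of the three suggested uses (Eqs. (4), (7) and (8)); names are ours.

def frequency_fixed_buses(b, t, e_t, t_h):
    """Case 1, Eq. (7): frequency for a fixed number of buses, rounded down as the text suggests."""
    return math.floor(t * b / (e_t + t_h))

def buses_for_demand(d, e_t, t_h, c_b, t):
    """Case 2, Eq. (8): buses needed to cover a demand d, rounded up when affordable."""
    return math.ceil(d * (e_t + t_h) / (c_b * t))

def hourly_capacity(c_b, f):
    """Eq. (4): maximum passengers attended per hour."""
    return c_b * f

if __name__ == "__main__":
    e_t, t_h, t, c_b = 5, 25, 60, 50                                  # hypothetical inputs
    f1 = frequency_fixed_buses(b=6, t=t, e_t=e_t, t_h=t_h)
    print(f1, hourly_capacity(c_b, f1))                               # case 1: 12 buses/h, 600 seats
    print(buses_for_demand(d=1000, e_t=e_t, t_h=t_h, c_b=c_b, t=t))   # case 2: 10 buses
    # case 3: attend 90% of the demand -> reuse Eq. (8) with the reduced demand
    print(buses_for_demand(d=0.9 * 1000, e_t=e_t, t_h=t_h, c_b=c_b, t=t))  # 9 buses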

3.6 Data Survey

To define the time spent getting from the initial point to the final one, the model uses the Google traffic API, taking the estimated time range it shows when departing at a certain hour on a business day, together with the measured real time. Then, the percentage errors of the lower limit, the midpoint, and the upper limit of the estimated range were computed. With this information, the averages were calculated and the estimate with the minimum error was used. These results are shown in Table 1.


Table 1. Google traffic times

Hour     Estimated time (mins)  Real time (mins)  %error min  %error average  %error max
7:00     12 to 16               13                7.7         7.7             23
10:00    12 to 16               14                14.3        0.00            14.29
12:00    12 to 16               13                7.69        7.69            23.08
15:00    12 to 18               14                14.29       7.14            28.57
17:00    12 to 20               16                25.00       0.00            25.00
Average                         14.17             13.57       6.5             21.5
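As a small illustration of how the percentage errors in Table 1 can be derived from an estimated range and a measured travel time (our sketch; the API request itself is omitted):

# Percentage error of each Google-traffic estimate against the measured travel time (our sketch).

def pct_error(estimate, real):
    return abs(estimate - real) / real * 100

def row_errors(low, high, real):
    """Errors for the lower limit, the midpoint and the upper limit of the estimated range."""
    mid = (low + high) / 2
    return pct_error(low, real), pct_error(mid, real), pct_error(high, real)

print([round(e, 2) for e in row_errors(12, 16, 13)])   # 7:00 row -> [7.69, 7.69, 23.08]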

4 Results

4.1 Current State of the Route

The way Transcaribe assigns frequencies right now is by satisfying the demand it had at the same hour and day a week ago; for example, if today is Monday at 8:00, it will look up the demand of the previous Monday at 8:00 and assign a frequency that fulfills that demand. The current demand and f of the route were obtained by counting the number of passengers that used the IPTS during the last week of the first semester of 2017 (May 29th to June 2nd) on route A1-102, during the following hours: 7:00am to 8:00am, 10:00am to 11:00am, 12:00pm to 1:00pm, 3:00pm to 4:00pm, and 5:00pm to 6:00pm, which are the hours with the highest and lowest demand. The maximum capacity was obtained using Eq. (4) (the Cb of the buses assigned to the route is 50 passengers) and is shown in Table 3. Table 2 shows the number of passengers attended in one hour and how many departures it took to cover them.

Table 2. Departure times of Transcaribe's route at 7 am on the first day of data polling

Hour    Passengers attended
07:00   21
07:02   16
07:12   15
07:15   19
07:27   12
07:30   14
07:33   25
07:45   22
07:48   7
Total: 151

Table 3. Current state of the route's demand with eight buses assigned to it.

Hour    Demand  b   f (b/h)  Maximum capacity (Ch)  % of attended demand
07:00   1105    8   16       800                    72.3
10:00   550     5   9        400                    81.8
12:00   610     5   10       500                    81.9
15:00   310     4   7        350                    100
17:00   1050    9   16       800                    76.1

As seen in Table 3, the current state of the route cannot satisfy the whole demand of the users, which causes long waiting times and dissatisfaction among the users.

4.2 Implementing the Model

Using Case 1. In this case, we need a specific number of buses assigned to the route. Let us say that the system can only afford to have at most 6 buses assigned to the route for any given hour of the day. With this information, Eq. (7) can be used. Replacing variables, b cannot be higher than 6 buses assigned to the route at any hour, t is 60 min, Et is 5 min at most, and th can be determined using Table 1. If the Ch obtained with Eq. (4) is greater than the demand, Eq. (8) can be used. The results obtained by applying this case are shown in Table 4.

Table 4. State of the route's demand using case 1 of the model.

Hour    Demand  b   f (b/h)  Maximum capacity (Ch)  % of attended demand
07:00   1105    6   11       550                    49.7
10:00   550     6   11       550                    100
12:00   610     6   11       550                    90.1
15:00   310     4   8        400                    100
17:00   1050    6   9        450                    42.8

Note that for hour 15:00, Eq. (7) indicates that 11 buses per hour should be the frequency of this hour, but the Ch would then be 550, which is greater than the demand. This would be an unnecessary waste of resources, which is why Eq. (8) can be used instead of Eq. (7). This allows the system to use as few buses as needed and fulfill the demand at the same time.

Using Case 2. The goal in this case is to cover the whole demand, so Eq. (8) can be used. Ch has to be equal to the demand of the route, which is shown in Table 3. Cb is 50 passengers per bus, t is 60 min, th can be found using Table 1, and Et is kept at a maximum of 5 min. The experiment considers four different numbers of buses: 6, 8, 10, and 12, since the actual number of buses is 6 and the goal was to increase the demand attended. For the hours factor, the same hours used in the current scenario are considered. Currently, the system is using 8 buses assigned to the route. The results obtained by replacing the variables are shown in Table 5.

Table 5. State of the route's demand using case 2 of the model.

Hour    Demand  b    f (b/h)  Maximum capacity (Ch)  % of attended demand
07:00   1105    12   23       1150                   100
10:00   550     7    13       650                    100
12:00   610     7    14       700                    100
15:00   310     4    7        350                    100
17:00   1050    13   22       1100                   100

Using Case 3. Now the system needs to improve the percentage of attended demand by 20% at any given hour, if possible. This means that if, for a specific hour, 80% or more of the demand is already attended, it would have to be improved to 100%. The desired state of the route is shown in Table 6.

Table 6. Desired state of the route using case 3 of the model.

Hour    Current demand  Desired % of attended demand  Desired attended demand
07:00   1105            92.3                          1020
10:00   550             100                           550
12:00   610             100                           610
15:00   310             100                           310
17:00   1050            96.1                          1009

Now that the desired attended demand is calculated, Eq. (8) can be used by replacing Ch with the desired attended demand. The results of this calculation are shown in Table 7. Note that for hours 07:00 and 17:00 the model's output covers more demand than initially required. This is because the capacity of the buses is 50 and the desired attended demand is not a multiple of 50.

Table 7. State of the route's demand using case 3 of the model.

Hour    Demand  b    f (b/h)  Maximum capacity (Ch)  % of attended demand
07:00   1105    11   21       1050                   95
10:00   550     6    11       550                    100
12:00   610     7    13       650                    100
15:00   310     4    7        350                    100
17:00   1050    13   21       1050                   100

5 Future Work

For future work, this model can be implemented as a piece of software that requests travel times from Google's traffic API to calculate the frequencies in real time. The model also needs a way to estimate the demand of a route other than having people observe how much demand the route has and writing it down; this can be done using artificial intelligence, taking previous demand and travel times as inputs. Also, as shown in Eq. (5), by attaching the demand of the route to the point where each user boards the bus, the model could determine the waiting time of each user, using the Google traffic API to get the travel time of the bus from the point where it starts the route to the point where it reaches the user. Even though the model is only demonstrated for one route, it can be applied to several routes given the maximum number of buses of the complete system, the number of routes, and the routes themselves. This could be done by determining the demand of each route and calibrating the number of buses assigned to each route with the model. Since the waiting times can be calculated with Eq. (5), the system can ensure that, using this model, its passengers will not wait more than a determined amount of time, which can be set by the system itself or by polling the passengers. Once this amount of time is determined, the departure times (Dt) shown in Table 2 can be assigned similarly to the ones in Table 8; doing this, the number of trips needed to satisfy the demand, and hence the CO2 emissions, could be reduced by 44%. Even though an optimization function has not been found yet, we have the mathematical equations needed to achieve it. We aim to obtain an objective function for bus frequency setting that takes into account the demand, travel time, and number of buses, and optimizes the revenues of the system.


Table 8. Desired frequency setting

Hour    Passengers attended
07:12   42
07:18   19
07:30   26
07:45   47
07:48   7

6 Conclusion

This model allows controlling the attended demand as needed in an IPTS: it allows determining the frequency of a route that fulfills the entire demand, the frequency when the number of buses assigned to the route is fixed, and the frequency when the system requires setting the percentage of attended demand to a specific value. It is evidenced that using this model can improve an existing one. The current frequency assigned to the use case mentioned in Table 3, used by Transcaribe, yields an attended demand of 2808 passengers (77.46% of the total demand), whereas using Eq. (8) as proposed in case 2, it can increase to 3625 attended users (100% of the total demand). Also, one of the advantages of this model is that it can be used to estimate the waiting times of the passengers using the system when the time at which each user gets to the bus stop is known. It was established that, for this model, the travel times of the buses on the route can be calculated using the average between the lower and upper limits of the travel times that Google's traffic API provides.

References

1. Ceder, A., Wilson, N.H.M.: Bus network design. Transp. Res. Part B: Methodol. 20(4), 331–344 (1986)
2. Chriqui, C., Robillard, P.: Common bus lines. Transp. Sci. 9(2), 115–121 (1975)
3. Han, A.F., Wilson, N.H.M.: The allocation of buses in heavily utilized networks with overlapping routes. Transp. Res. Part B: Methodol. 16(3), 221–232 (1982)
4. Huang, Z., Ren, G., Liu, H.: Optimizing bus frequencies under uncertain demand: case study of the transit network in a developing city. Math. Probl. Eng. 2013, 10 (2013)
5. John, M.P.: Metaheuristics for designing efficient routes & schedules for urban transportation networks. Ph.D. thesis, Cardiff University (2016)
6. Mueller, N., et al.: Urban and transport planning related exposures and mortality: a health impact assessment for cities. Environ. Health Perspect. 125(1), 89 (2017)
7. Parbo, J., Nielsen, O.A., Prato, C.G.: User perspectives in public transport timetable optimisation. Transp. Res. Part C: Emerg. Technol. 48, 269–284 (2014)
8. Rasmussen, T.K., Anderson, M.K., Nielsen, O.A., Prato, C.G.: Timetable-based simulation method for choice set generation in large-scale public transport networks. Eur. J. Transp. Infrastruct. Res. 16(3), 467–489 (2016)
9. Schéele, S.: A supply model for public transit services. Transp. Res. Part B: Methodol. 14(1), 133–146 (1980)
10. Song, B., Wynter, L.: Real-time public transport service-level monitoring using passive WiFi: a spectral clustering approach for train timetable estimation. arXiv preprint arXiv:1703.00759 (2017)
11. Wagale, M., Singh, A.P., Sarkar, A.K., Arkatkar, S.: Real-time optimal bus scheduling for a city using a DTR model. Procedia Soc. Behav. Sci. 104, 845–854 (2013)

Diatom Segmentation in Water Resources

Jose Libreros1(B), Gloria Bueno3, Maria Trujillo1, and Maria Ospina2

1 Multimedia and Computer Vision group, Universidad del Valle, Cali, Colombia
{jose.libreros,maria.trujillo}@correounivalle.edu.co
2 Grupo de Investigación en Biología de plantas y Microorganismos, Universidad del Valle, Cali, Colombia
[email protected]
3 Grupo de Visión y Sistemas Inteligentes, Universidad de Castilla-La Mancha, Ciudad Real, Spain
[email protected]

Abstract. The amount of diatoms present in a water resource is used for monitoring environmental conditions and water quality, and for studying climate change. Diatoms are a kind of algae microorganism comprising about 20,000 species. Currently, diatomists and experienced specialists perform diatom segmentation and counting by visually identifying microscopic structures in a given water sample. Thus, differentiating diatoms from all possible unwanted objects in a sample (debris, flocs, etc.) is a task mainly based on subjective visual assessment, with limited repeatability and inter-observer agreement. In fact, researchers have a special interest in looking for different ways to perform automated segmentation, that is, combinations of algorithms, techniques, and segmentation approaches, along with identification and classification objectives, according to specific areas of application. Regardless of the application, the automated diatom image segmentation problem has a high level of difficulty, due to the precision and accuracy that it requires. In this paper, an automated diatom segmentation approach is presented. The proposed segmentation is based on a Scale and Curvature Invariant Ridge Detector filter bank (henceforth, SCIRD-TS), followed by a post-processing method for identifying diatoms present in microscopic water sample images. In addition, an evaluation based on a comparative analysis of sensitivity and specificity was made between the results obtained by the proposed approach and an existing ground truth of specimens per class. The SCIRD-TS based method and the post-processing method are able to segment structures with well-defined edges, whilst a refinement method is required in order to separate diatoms from debris and flocs.

Keywords: Diatoms · Hand-crafted filters · Segmentation · Paleo-environmental studies · Water quality monitoring

1 Introduction

Diatoms have also been studied as paleo-environmental markers, since their silica structures make it possible to reconstruct historical environmental conditions by studying diatom fossil deposits in lake sediments. Therefore, these organisms allow not only determining the current quality of water reserves but also, as the authors in [1] suggest, inferring the quality status and environmental variables that dominated in the past. Variations in temperature, pH, or conductivity over centuries can be estimated by studying these organisms in sediments, allowing us to know how climate has affected the studied area, along with establishing baseline conditions from which criteria for the optimal quality of a water reserve can be set, according to [2–4].

Currently, diatomists and experienced specialists visually identify these microscopic structures in a given image, differentiating them from all possible objects in a sample (debris, flocs, etc.). Visual identification of diatoms is a mainly subjective task, with limited repeatability, and requires inter-observer agreement [8,9]. In fact, researchers have a special interest in looking for different ways to perform a better segmentation, which means many algorithms, techniques, and segmentation approaches, with identification and classification objectives, according to specific areas of application. In spite of them, there is still an open problem in this field, especially regarding the precision and accuracy required. The problem begins with the fact that the detection and counting of diatoms are mainly based on subjective visual assessment, with limited repeatability and inter-observer agreement. Furthermore, the segmentation of thin structures, such as morphological micro-structures and the frustule –the series of linking, siliceous bands associated with a valve–, and their discrimination from other elements in an image –fractionated cells or mineral particles– are still unresolved. Some diatoms that were considered the same species for decades have now been separated into different species, and new species emerge continuously.

In this paper, a diatom segmentation approach is presented. The proposed segmentation is based on a Scale and Curvature Invariant Ridge Detector filter bank (henceforth, SCIRD-TS) [5], followed by a post-processing method for identifying diatoms present in microscopic water sample images. The evaluation is based on a quantitative analysis of sensitivity and specificity under different input conditions. The SCIRD-TS based method is able to segment structures with well-defined edges, whilst a refinement method is required in order to separate diatoms from debris and flocs.

2 Related Work

In water sample images, each structure is different, and the process of segmenting and identifying the characteristics of each one may require different approaches. Most of the existing methods for curvilinear structure segmentation rely on hand-crafted filters (henceforth HCFs) designed to model local geometric properties of ideal tubular shapes [10,12,13]. HCF local feature representations, such as the scale-invariant feature transform (SIFT) and histograms of oriented gradients (HOG), dominate the field of remote sensing image classification [23]. Relying on a combination of efficient hand-crafted features and learned filters, different proposed methods offer different levels of compromise between accuracy and running time. Different researchers have proposed sparse coding and K-means to efficiently learn the high number of filters needed to cope with the multiple challenges involved, e.g., low contrast and resolution, non-uniform illumination, tortuosity, and confounding non-target structures [24].

Pappas et al. [20] made a revision of morphometric methods to analyse the shape of structures, such as diatoms, in order to contribute to the research applied to this field. The contour of a diatom is a key element that researchers use to identify a diatom's characteristics, such as morphology and geometry, leading to methods that detect them later, although no additional information is obtained through these techniques, as for instance in the work carried out by Kloster et al. [21]. As mentioned by Pappas [20], shape analysis and pattern recognition are the main areas of study for creating automated methods to correctly identify diatoms. The former is based on kernel filters and post-processing methods, and the latter is based on pattern recognition or machine learning methods. The above is useful as long as the challenges that machine learning presents versus applied research are handled, as is the case in which many of the available characteristics are not really comparable with what an experienced biologist takes into account to detect and classify different species.

Methods for diatom detection and identification have been studied in Cairns et al. [22], Culverhouse et al. [25], Pech-Pacheco and Alvarez-Borrego [26], and Pech-Pacheco [27]. Culverhouse et al. [25] proposed diatom identification methods based on coherent optics and holography. Although it is a good technique, the high costs that it implies have not been of great attraction for the scientific community, and it has not been scaled up as an alternative applied to support biologists [28]. Pech-Pacheco et al. [26] presented a hybrid optical-digital method for the identification of five species of phytoplankton through the use of operators invariant to translation, rotation, and scale. Some research has focused on a single genus within the existing variety. For example, Pappas and Stoermer [29] used Legendre polynomials and principal component analysis in the identification of the Cymbella cistula species. Rojas Camacho et al. [14] studied the use of a tuning method to set the best parameters iteratively, as an optimisation problem, comparing the current result with the last result, and then validated them with the Canny technique and a binarisation technique. Ambauen et al. [19] devised an approach based on means of graph matching. Luo et al. [15] proposed a segmentation approach using curve fitting and texture measures with spectrum features for round diatoms, obtaining 96.2% accuracy. However, the limitation is that there is a large variety of diatoms with different forms. Segmentation based on the morphological watershed from markers is studied in [16], and a segmentation of structures such as diatoms based on a Gaussian kernel is proposed by Jalba et al. [17]. Fischer et al. [18] presented a grid-graph-matching-based identification. They found a problem with the peaks in the curvature signal, having unsatisfactory results in one of the feature vectors.

It is true that the detection and segmentation of structures like diatoms are the first step in any investigation. However, the direction of computer science applied to the field of diatoms is focused on the classification of species.


The Automatic Diatom Identification and Classification (ADIAC) project has set a very important precedent in the investigation of automatic diatom classification [28,30]. In ADIAC, 171 features were used for diatom classification. These features describe symmetry, shape, geometry, and texture by means of different descriptors. The best results, up to 97.97% accuracy, were obtained with 38 classes using Fourier and SIFT descriptors with random forests. Performance decreased down to 96.17% when classifying 55 classes with the same descriptors and classifier [31]. In other approaches, such as [8,9], most of the efforts are still carried out with hand-crafted approaches or hand-designed methods where a set of fixed features is used. However, the hand-crafted methods still present limited results, as in [32], where 14 classes were classified with Support Vector Machines (SVMs) and 10-fold cross-validation, using 44 GLCM features that describe geometric and morphological properties; they obtained an accuracy of 94.7%. Morphological operators, such as dilation and erosion, have demonstrated to be a good approach to filter noise and smooth images, and are a good alternative in addition to filtering operators [37]. For a long time, the strategy of identifying centroids has been followed in order to determine the objects present in an image; this is how bounding boxes are commonly built, and those Regions of Interest (RoIs) bounding boxes are compared with a ground truth [38]. At present, there is no system capable of taking into account variations in both contours and textures in a relatively large number of species. One of the reasons is the difficulty in acquiring a big dataset of tagged diatoms with a sufficient number of samples per species. The manual classification of diatoms is tedious and laborious, even for expert diatomists.

3 Curvilinear Structure Model with Gaussian Behavior

In much research, the process of segmenting thin structures has been tackled under the assumption of a Gaussian-like profile [13,17,33]. This assumption emerges as a result of fitting Gaussian probability distributions to the data, making critical the task of understanding and broadening the nature of Gaussian fits. A locally-straight curvilinear structure may be assumed, which can be modelled with the multivariate (n-D) Gaussian distribution with zero means and a diagonal covariance matrix [6,7]:

$$G(\psi; \sigma) = \frac{1}{\sqrt{(2\pi)^{n} \prod_{i=1}^{n} \sigma_i^2}}\; e^{-\sum_{i=1}^{n} \frac{\psi_i^2}{2\sigma_i^2}}, \qquad (1)$$

where $\psi = (\psi_1, \psi_2, ..., \psi_n)$ represents a point in the $\{\psi\}$ coordinate system, and $\sigma = (\sigma_1, \sigma_2, ..., \sigma_n)$ represents the standard deviations in each direction. The second derivative of (1) with respect to each variable is:

$$G_{\psi_j \psi_j}(\psi; \sigma) = G(\psi; \sigma)\,\frac{1}{\sigma_j^2}\left(\frac{\psi_j^2}{\sigma_j^2} - 1\right). \qquad (2)$$

In spite of that, evidence on the treatment of structures with higher tortuosity and fragmentation rates has demonstrated that this locally-straight assumption is not always valid, and that it is not necessary to assume continuity and locally straight tubular shapes [6]. R. Annunziata builds on the evidence where Lin et al. [34] extend the linearly transformed multivariate Gaussian models by introducing a non-linear transformation $\tau: \mathbb{R}^n \to \mathbb{R}^n$, with $\tau(x) = \psi = (\psi_1, \psi_2, ..., \psi_n)$, of the form:

$$\psi_1 = x_1 + k_1$$
$$\psi_2 = x_2 + \sum_i k_{2i}\, m_{2i}(x_1)$$
$$\psi_3 = x_3 + \sum_i k_{3i}\, m_{3i}(x_1, x_2)$$
$$\vdots$$
$$\psi_n = x_n + \sum_i k_{ni}\, m_{ni}(x_1, x_2, ..., x_{n-1}),$$

where the non-linear functions $m_{ij}(x_1, ..., x_{i-1})$ are suitably well-behaved, fixed basis functions of $x_1, ..., x_{i-1}$. In the 2-D case (with $n = 2$), the following equation is obtained by applying this transformation in (2), used by R. Annunziata as SCIRD:

$$F(x; \sigma; k) = \frac{1}{\sigma_2^2\, Z(\sigma)} \left( \frac{(x_2 + k x_1^2)^2}{\sigma_2^2} - 1 \right) e^{-\frac{x_1^2}{2\sigma_1^2} - \frac{(x_2 + k x_1^2)^2}{2\sigma_2^2}}. \qquad (3)$$

(3)

Methodology

The proposed segmentation approach is based on “Leveraging Modelling and Machine Learning for the Analysis of Curvilinear Structures in Medical Images” [7] along with “Accelerating Convolutional Sparse Coding for Curvilinear Structures Segmentation by Refining SCIRD-TS Filter Banks” [5], and we use the implementations available at the author's web-page. SCIRD-TS is a strategy that supports the process of segmenting retinal images and identifying thin structures. Thus, the SCIRD-TS algorithm is adapted and tested on our application domain using a set of diatom images. Later on, a post-processing method is developed in order to segment RoIs with diatoms. The main stages for creating this kind of image data-set are image acquisition, data labeling, and image processing, after which the dataset is built. The species selection and image acquisition follow the same protocol described in a previous paper [8]. Figure 1 presents the two methods proposed in this paper. A description of the other stages is presented below.

Fig. 1. Flowchart of the main steps of the two proposed methods: (a) method 1 and (b) method 2

4.1 Image Data-Set

The process begins with collecting samples from watersheds. Those samples may contain organic matter and other elements. Chemical substances are used for treating the samples and making the desired structures visible. This involves the application of hydrogen peroxide at a temperature between 70 and 90 °C. In order to open the valves of the diatoms and be able to establish a valve view, 1 N hydrochloric acid is used.


A small water portion is placed on a suitable plate and the diatoms are attached to the glass slide for analysis under a microscope using Naphrax, a synthetic resin with an optical refractive index of 1.7 [11]. Images are captured from different areas of the plate in order to detect the presence of diatoms of different species in a sample. For this study, we use 365 images of water resources, provided by the Plant Biology and Microorganisms Research Group at the Universidad del Valle. In order to preserve the integrity of the data obtained from the samples, as well as the protocols that biologists follow in sample classification, and taking into account that conditions may differ from one moment to another, the set of images is distributed according to the water basin, the date, and the plate from which a sample was obtained; this grouping is called a sample. Images have been obtained with a Nikon Eclipse Ni-U 90 microscope. Table 1 shows the number of images per sample group.

Table 1. Number of ground truth images per sample group

Sample                                   Number of images
sample 22Aug2016 Lobog1 Plate1           41
sample 22Aug2016 Lobog1 Plate2           6
sample 22Aug2016 Lobog2 Plate1           1
sample 22Aug2016 Lobog2 Plate2           2
sample 30Aug2016 ElCarmen1 Plate4        16
sample 30Aug2016 ElCarmen1 Plate5        10
sample 30Aug2016 ElCarmen1 Plate6        15
sample 30Aug2016 ElCarmen1 Plate7        5
sample July2016 ElCarmen1 Plate1         43
sample July2016 ElCarmen1 Plate1count    12
sample July2016 ElCarmen1 Plate2count    13
sample July2016 ElCarmen1 Plate4count    15
sample July2016 Harinera1 Plate1         52
sample July2016 Lobog1 plate1            58
sample July2016 Lobog1 plate2            2
sample Oct2016 BitacoDagua1 Plate1       6
sample Oct2016 BitacoDagua1 Plate2       15
sample Oct2016 BitacoDagua1 Plate3       13
sample Oct2016 BitacoDagua1 Plate1       13
sample Oct2016 ElCarmen3 Plate1          13
sample Oct2016 Harinera1 Plate1          14

4.2 Image Processing

After examining the available images, we concluded that the conditions of an image (signal-noise level of the camera, presence of debris in the captured sample, and luminosity conditions) are similar among images from the same group, while there is a substantial difference between images obtained with different microscopes. Two methods are proposed to segment diatoms from these images. Hence, the first 8 samples have been classified into a group that is segmented using method 1, and the next 13 samples are grouped and segmented using method 2.

1. Images in group 1 are characterised by high luminosity, large-size diatoms, fluorescence conditions, some of them with a concentration of large-size debris, and low noise levels (characteristic of the camera signal), and their captured structures have a high relief. Method 1 is based on the application of SCIRD-TS with σ1 = [1 2] with step 1, σ2 = [1 2] with step 1, k = [−0.1 0.1] with step 0.1 and θstep = 15, and a post-processing that uses a fixed threshold, morphological operations and filtering by area, assuming that flocs are of small size due to the low noise levels. An example of this type of image is shown in Fig. 4a.

2. Images in group 2 are characterised by high noise levels (caused by a large load of particles and flocs of organic matter and dead diatoms) and a low signal-to-noise ratio. Method 2 is based on a difference of Gaussians, obtained by subtracting the image resulting from a single application of SCIRD-TS and the image resulting from a double application of SCIRD-TS. The first image is obtained using σ1 = [1 2] with step 1, σ2 = [1 2] with step 1, k = [−0.1 0.1] with step 0.05 and θstep = 15, and the second image is obtained using a variation of these parameters: σ1 = [1 2] with step 1, σ2 = [1 11] with step 3, k = [−0.1 0.1] with step 0.05 and θstep = 15. Since the images have a high presence of fluff and dust, the first image has higher light intensity than the second. Subtracting two Gaussian blurs keeps the spatial information preserved in both blurred images, which is assumed to be the desired information [35]; that is, it purges dust and fluff. An illustration of these images is shown in Fig. 4g. After the difference of the SCIRD-TS applications (shown in Fig. 2), an adaptive threshold is applied, followed by morphological operations and filtering of objects by area (a sketch of this pipeline is given below).

Images are distributed into two different groups with similar characteristics per group, which allows us to choose between the two proposed methods based on image content. Figure 3 shows the main steps of each method applied to images of the two groups.
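As one possible reading of the method 2 pipeline, the sketch below subtracts two SCIRD-TS responses obtained with the two parameter sets quoted above and then applies a local adaptive threshold, a morphological opening and an area filter. The scird_ts_response helper is hypothetical (it stands in for the authors' SCIRD-TS code), the paper's "double application" could alternatively mean re-filtering the first response, and the block size, structuring-element radius and minimum area are illustrative values only.

import numpy as np
from skimage.filters import threshold_local
from skimage.morphology import opening, disk, remove_small_objects

def segment_method2(image, scird_ts_response, min_area=500):
    # First response: sigma2 in [1, 2] with step 1, as quoted in the text.
    r1 = scird_ts_response(image, sigma1=(1, 2, 1), sigma2=(1, 2, 1),
                           k=(-0.1, 0.1, 0.05), theta_step=15)
    # Second response: widened sigma2 range [1, 11] with step 3.
    r2 = scird_ts_response(image, sigma1=(1, 2, 1), sigma2=(1, 11, 3),
                           k=(-0.1, 0.1, 0.05), theta_step=15)
    diff = r1 - r2                                        # difference of the two responses
    mask = diff > threshold_local(diff, block_size=51)    # local adaptive threshold
    mask = opening(mask, disk(3))                         # morphological clean-up
    return remove_small_objects(mask, min_size=min_area)  # filter objects by area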

4.3 Ground Truth

The segmentation results are compared with the ground truth image data-set, which consists of 365 images with regions indicating the specimens. This allows counting the number of specimens associated with an image and locating each one. The ground truth consists of manually selected areas where diatoms are present. A labeling tool was provided to the experts in order to facilitate the annotation and to understand their needs.

Fig. 2. Image a is the result of SCIRD-TS applied to the original image with σ1 = [1 2] with step 1, σ2 = [1 2] with step 1, k = [−0.1 0.1] with step 0.05 and θstep = 15; the second application (image b) has been made with σ1 = [1 2] with step 1, σ2 = [1 11] with step 3, k = [−0.1 0.1] with step 0.05 and θstep = 15. Image c is the difference between a and b

5 Experimental Results

The performance of the proposed methods is evaluated using two levels of quantitative strategies, pixel and object detection, measured with precision, recall and accuracy. Figure 4 presents the results of the experiments using the two groups of images. Table 2 shows the results in terms of pixels correctly identified and Table 3 shows the results in terms of error analysis in object identification.

Table 2. Error analysis at the pixel level

                               p precision  p recall  p accuracy
Method 1 using image group 1   0.74         0.78      0.95
Method 2 using image group 2   0.56         0.71      0.97

Table 3. Error analysis at the object detection level

                               o precision  o recall  o accuracy
Method 1 using image group 1   0.77         0.94      0.73
Method 2 using image group 2   0.58         0.89      0.55
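For reference, the pixel-level scores reported in Table 2 can be computed from binary masks as in the short helper below; this is an illustrative sketch, not the authors' evaluation code.

import numpy as np

def pixel_scores(pred, truth):
    """Precision, recall and accuracy for boolean segmentation masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.logical_and(pred, truth).sum()
    fp = np.logical_and(pred, ~truth).sum()
    fn = np.logical_and(~pred, truth).sum()
    tn = np.logical_and(~pred, ~truth).sum()
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, accuracy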

Fig. 3. Illustration of the proposed approach: the images at the left and at the centre have characteristics for using method 1, and the image at the right has characteristics for using method 2. Top to bottom: the first row illustrates the result obtained after the application of SCIRD-TS; the second row illustrates the difference of two Gaussians using SCIRD-TS; the third row presents the result obtained after the threshold applied to the SCIRD-TS images using method 1 (images g and h) with a fixed threshold, and method 2 (image i) with local adaptive thresholds; the fourth row shows the result obtained from the morphological operations applied to each image, where the parameters of the size of the structural elements vary between the two methods; the fifth row shows the identified and segmented objects


Fig. 4. Illustration of the two groups of images. Left column, an original image: image with low noise level (a), image with a junction of frustules (d), and image with high noise level (g). Centre column, a ground truth image with the annotation provided by a diatomist (b, e and h). Right column, the image segmented by our proposed method (c, f and i). Top to bottom: images a and d illustrate characteristics that were taken into account to propose method 1, and image g illustrates characteristics that were taken into account to propose method 2.

The experimental results indicate that method 1 yields higher precision and recall at both the pixel and object levels than method 2, whilst method 2 has higher accuracy than method 1 at the pixel level.

6 Final Remarks and Future Works

In this paper, object regions are localised in an image by exploring a hand-crafted filter bank while changing the application domain. The structures to be localised share similar characteristics in terms of contrast and line width, with the advantage of little tortuosity. Nevertheless, the automated classification of diatoms (in terms of pattern recognition) or taxon identification remains a challenge. We proposed a segmentation that combines SCIRD-TS with a post-processing stage, using two different approaches based on specific image characteristics, in order to identify the diatoms present in an image of a water sample. We reckon that combining the detection of structures with a post-processing strategy to detect potential regions of interest may lead to a substantial speed-up of diatom segmentation, since post-processing allows unwanted elements to be filtered out.

We obtained good results in terms of true positives; however, it is still necessary to treat the false positives, according to the recall scores in contrast to the accuracy scores of the object detection analysis. Although morphological operations and filters remove flocs of small size, there remain regions with flocs of large size. Those flocs cannot be removed by the above-mentioned operations, because wanted structures, such as diatoms, may be affected, and they may even be smaller than the unwanted structures. Despite the process conducted at a chemical level to eliminate the largest amount of organic material, flocs and debris, some organic material, flocs and debris still remain in the images, and eliminating these contents through a purely mathematical digital image processing procedure is difficult. Up to this point, a classification method is required to filter those regions that contain structures detected by SCIRD-TS but that are unwanted, such as debris or flocs. Nevertheless, this segmentation process greatly helps to detect the areas of interest, which can then be easily purified by a classification process in which the machine knows what to eliminate and what not. The research is going in this direction.

The proposed segmentation has been discussed with biologists and diatomists during the manual segmentation of diatoms. They have expressed, according to their experience, some criteria to rule out the counting of a structure that at first glance is a diatom. Some structures have a diatom shape, but at first glance it is not possible to identify the characteristics that differentiate one species from another (number of striations, signs such as points, crosses, etc.). Thus, biologists and diatomists look at how diffuse the structure is, whether it presents an opening of the frustule (which gives the appearance of seeing two diatoms), whether it presents a pleural or valval view, whether the view of the frustule is too thick, or whether there is organic matter that overlaps or is superimposed on the diatom. This information allows us to understand that many factors have to be taken into account in order to propose a solution that suits the needs of biologists and diatomists, something that, up to this point, suggests why there are a lot of false positives when using the ground truth taken from the biologists.

Acknowledgments. The first author thanks Santander Bank for the financial support for his mobility to Universidad de Castilla-La Mancha, Ciudad Real, Spain. The authors acknowledge the financial support of the Spanish Government under the Aqualitas-retos project (Ref. CTM2014-51907-C2-2-R-MINECO) http://aqualitasretos.es/en/.


The authors acknowledge the contribution to this work of Dr. E. Peña from Universidad del Valle. The authors are also grateful to the anonymous reviewers for their valuable comments, suggestions and remarks, which contributed to improving this paper.

References

1. Smol, J.P., Stoermer, E.F. (eds.): The Diatoms: Applications for the Environmental and Earth Sciences, vol. 17, pp. 283–284. Cambridge University Press, Cambridge (2010)
2. The European Parliament and the Council of the European Union: Establishing a Framework for Community Action in the Field of Water Policy. Official Journal of the European Community, Maastricht, The Netherlands (2000)
3. Presidencia de la Republica de Colombia: Por el cual se establece el Sistema para la Proteccion y Control de la Calidad del Agua para Consumo Humano (2007). http://www.alcaldiabogota.gov.co/sisjur/normas/Norma1.jsp?i=30007#35
4. Instituto de Hidrologia, Meteorologia y Estudios Ambientales IDEAM: Lineamientos conceptuales y metodologicos para la evaluacion regional del agua (2013)
5. Annunziata, R., Trucco, E.: Accelerating convolutional sparse coding for curvilinear structures segmentation by refining SCIRD-TS filter banks. IEEE Trans. Med. Imag. 35, 2381–2392 (2016)
6. Annunziata, R., Kheirkhah, A., Hamrah, P., Trucco, E.: Scale and curvature invariant ridge detector for tortuous and fragmented structures. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 588–595. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_70
7. Annunziata, R.: Leveraging modelling and machine learning for the analysis of curvilinear structures in medical images. School of Science and Engineering (Computing), University of Dundee (2016)
8. Bueno, G., et al.: Automated diatom classification (part A): handcrafted feature approaches. Appl. Sci. 7, 753 (2017)
9. Pedraza, A., Bueno, G., Deniz, O., Cristóbal, G., Blanco, S., Borrego-Ramos, M.: Automated diatom classification (part B): a deep learning approach. Appl. Sci. 7, 460 (2017)
10. Frangi, A.F., Niessen, W.J., Vincken, K.L., Viergever, M.A.: Multiscale vessel enhancement filtering. In: Wells, W.M., Colchester, A., Delp, S. (eds.) MICCAI 1998. LNCS, vol. 1496, pp. 130–137. Springer, Heidelberg (1998). https://doi.org/10.1007/BFb0056195
11. Munné, A., Ginebreda, A., Prat, N. (eds.): Experiences from Surface Water Quality Monitoring: The EU Water Framework Directive Implementation in the Catalan River Basin District, vol. 42. Springer, Switzerland (2015). https://doi.org/10.1007/978-3-319-23895-1
12. Soares, J.V.B., Leandro, J.J.G., Cesar, R.M., Jelinek, H.F., Cree, M.J.: Retinal vessel segmentation using the 2-D Gabor wavelet and supervised classification. IEEE Trans. Med. Imaging 25(9), 1214–1222 (2006)
13. Jalba, A.C., Wilkinson, M.H.F., Roerdink, J.B.T.M.: Automatic image segmentation using a deformable model based on charged particles. In: Campilho, A., Kamel, M. (eds.) ICIAR 2004. LNCS, vol. 3211, pp. 1–8. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-30125-7_1


14. Rojas Camacho, O., Forero, M., Menéndez, J.: A tuning method for diatom segmentation techniques. Appl. Sci. 7, 762 (2017)
15. Luo, Q., Gao, Y., Luo, J., Chen, C., Liang, J., Yang, C.: Automatic identification of round diatom. In: International Conference on Biomedical Engineering and Computer Science (ICBECS), pp. 1–4. IEEE (2010)
16. Jalba, A.C., Roerdink, J.B.T.M.: Automatic segmentation of diatom images. In: Petkov, N., Westenberg, M.A. (eds.) CAIP 2003. LNCS, vol. 2756, pp. 329–336. Springer, Heidelberg (2003). https://doi.org/10.1007/978-3-540-45179-2_41
17. Jalba, A.C., Wilkinson, M.H.F., Roerdink, J.B.T.M., Bayer, M.M., Juggins, S.: Automatic diatom identification using contour analysis by morphological curvature scale spaces. Mach. Vis. Appl. 16, 217–228 (2005). https://doi.org/10.1007/s00138-005-0175-8
18. Fischer, S., Gilomen, K., Bunke, H.: Identification of diatoms by grid graph matching. In: Caelli, T., Amin, A., Duin, R.P.W., de Ridder, D., Kamel, M. (eds.) SSPR/SPR 2002. LNCS, vol. 2396, pp. 94–103. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-70659-3_9
19. Ambauen, R., Fischer, S., Bunke, H.: Graph edit distance with node splitting and merging, and its application to diatom identification. In: Hancock, E., Vento, M. (eds.) GbRPR 2003. LNCS, vol. 2726, pp. 95–106. Springer, Heidelberg (2003). https://doi.org/10.1007/3-540-45028-9_9
20. Pappas, J., Kociolek, P., Stoermer, E.F.: Quantitative morphometric methods in diatom research (2016)
21. Kloster, M., Kauer, G., Beszteri, B.: SHERPA: an image segmentation and outline feature extraction tool for diatoms and other objects. BMC Bioinform. 15, 218 (2014)
22. Cairns, J.: Determining the accuracy of coherent optical identification of diatoms. JAWRA J. Am. Water Res. Assoc. 15, 1770–1775 (1979)
23. Risojevic, V., Babic, Z.: Unsupervised quaternion feature learning for remote sensing image classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 9, 1521–1531 (2016)
24. Annunziata, R., Kheirkhah, A., Hamrah, P., Trucco, E.: Combining efficient hand-crafted features with learned filters for fast and accurate corneal nerve fibre centreline detection. In: IEEE, pp. 5655–5658 (2015)
25. Culverhouse, P.: Automatic classification of field-collected dinoflagellates by artificial neural network. Mar. Ecol. Prog. Ser. 139, 281–287 (1996)
26. Pech-Pacheco, J.L., Alvarez-Borrego, J.: Optical-digital system applied to the identification of five phytoplankton species. Mar. Biol. 132, 357–365 (1998)
27. Pech-Pacheco, J., Cristóbal, G., Alvarez-Borrego, J., Cohen, L.: Automatic system for phytoplanktonic algae identification. Limnetica 20, 143–158 (2001)
28. Du Buf, H., Bayer, M.: Series in Machine Perception and Artificial Intelligence. World Scientific Publishing Co., Singapore (2002)
29. Pappas, J.L., Stoermer, E.F.: Legendre shape descriptors and shape group determination of specimens in the Cymbella cistula species complex. Phycologia 42, 90–97 (2003)
30. Du Buf, H., et al.: Diatom identification: a double challenge called ADIAC. In: 10th International Conference on Image Analysis and Processing, pp. 734–739 (1999)
31. Dimitrovski, I., Kocev, D., Loskovska, S., Dzeroski, S.: Hierarchical classification of diatom images using ensembles of predictive clustering trees. Ecol. Inform. 7, 19–29 (2012)
32. Lai, Q.T.K., Lee, K.C.M., Tang, A.H.L., Wong, K.K.Y., So, H.K.H., Tsia, K.K.: Opt. Express 24, 28170–28184 (2016). Optical Society of America


33. Borges, V., de Oliveira, M.F., Silva, T., Vieira, A., Hamann, B.: Region growing for segmenting green microalgae images. In: IEEE/ACM Transactions on Computational Biology and Bioinformatics. IEEE (2016)
34. Lin, J.K., Dayan, P.: Curved Gaussian models with application to the modeling of foreign exchange rates. In: Computational Finance, vol. 99. MIT Press (1999)
35. Davidson, M.W., Abramowitz, M.: Molecular expressions microscopy primer: digital image processing-difference of Gaussians edge enhancement algorithm. Olympus America Inc., Florida State University (2006)
36. Mitchell, H.B.: Image Fusion: Theories, Techniques and Applications. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-11216-4
37. Haralick, R.M., Sternberg, S.R., Zhuang, X.: Image analysis using mathematical morphology. IEEE Trans. Pattern Anal. Mach. Intell. 9, 532–550 (1987)
38. Yang, H., Zhou, J.T., Zhang, Y., Gao, B., Wu, J., Cai, J.: Exploit bounding box annotations for multi-label object recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 280–288. IEEE (2016)

Implementation of a Wormhole Attack on Wireless Sensor Networks with XBee S2C Devices

Julian Ramirez Gómez, Héctor Fernando Vargas Montoya, and Alvaro Leon Henao

Metropolitan Institute of Technology, Fraternidad Campus, 050012 Medellín, Colombia
[email protected], [email protected], [email protected]

Abstract. One of the most dangerous threats to Wireless Sensor Networks (WSN) are wormhole attacks, due to their capacity to manipulate routing and application data in real time and cause important damage to the integrity, availability, and confidentiality of network data. In this work, an empirical method to launch such an attack (which is successful) on IEEE 802.15.4/ZigBee devices with source routing enabled is adopted to find signatures for detecting wormhole attacks in real environments. It uses the KillerBee framework with algorithms for packet manipulation through a malicious node to capture and inject malicious packets into victim nodes. Besides, a reverse variant of the wormhole attack is presented and executed. To evidence the realization of this threat by the attacking software, the experimental framework includes XBee S2C nodes. The results include recommendations, detection signatures and future work to face wormhole attacks involving source routing protocols like DSR.

Keywords: Wormhole · ZigBee · XBee · Source routing · Attack · IoT · Cybersecurity · WSN · DSR · KillerBee

1 Introduction

The Internet of Things (IoT) is a growing technology trend towards connecting all kinds of electronic devices to the Internet. The purpose of IoT devices is to interact and share information to ease end users' lives. Thanks to it, by 2020 nearly 37 billion devices are going to be connected to the cyberspace [1]. Nevertheless, IoT is a new challenge in the information security field because a wide range of devices with different security features can be integrated, leading to a wider security gap. Furthermore, the implementation of security measures such as strong cipher protocols on devices with reduced processing power and memory, like environmental sensors, is a difficult task [2]. One of the most important IoT technologies are Wireless Sensor Networks (WSN), which can be deployed in many places (e.g. homes, buildings, cities, factories and hospitals) to monitor environmental variables: temperature, humidity, movement, lighting, and also to improve processes in the industrial field [3].


On the other hand, a considerable number of vulnerabilities and security threats related to WSNs have been presented in various research studies [4–6] that introduce potential damages to the integrity, availability, and confidentiality of the information in a WSN. Some of these threats are related to the network layer in the protocol stack. They include attacks such as selective forwarding, sinkholes, and wormholes, and are intended to induce an unwanted behavior in specific elements of WSNs through malicious nodes and traffic manipulation. These attacks are successful because they give an attacker the ability to intercept and modify data in real time, execute denial-of-service and selective forwarding attacks, store packets, inject false information into legitimate nodes and disrupt routing processes [7]. The risks of wormhole attacks represent new security gaps that must be addressed and reduced to protect end users' data and privacy.

1.1 Background

Wormhole attacks exploit the route discovery mechanisms of on-demand routing protocols. The most remarkable cases are the Ad-Hoc On-Demand Distance Vector (AODV) and Dynamic Source Routing (DSR) protocols, which use route request (RREQ) and route reply (RREP) packets as a way for nodes in a WSN to discover routes [8]. An RREQ packet is a broadcast message sent by a source node ("S") to request a route to a destination node ("D"), while an RREP is a unicast message sent by the destination node in response to an RREQ. When the RREP that contains the route to reach "D" arrives at "S", the source node stores the route collected by the RREP in its route cache and then sends the application data to "D" through that route. Accordingly, the main goal of wormhole attacks is to build a tunnel between two remote nodes through a third node ("M") placed within transmission range of "S" and "D". This occurs when "S" needs to send application data to "D" and broadcasts an RREQ message to discover a route to "D". "M" (which is listening to network traffic) forwards the message directly to "D", because the RREQ sent by "M" reaches "D" before the original RREQ through the direct link. "M" can then listen to the RREP from "D" first and forward it to "S" with better metrics (zero hops), thus creating a false direct link between "S" and "D" through "M" in the process (Fig. 1).

Fig. 1. Wormhole attack with malicious node


At this point, the attacker can control the data that flows through the malicious tunnel and launch other attacks. Finally, if the victim nodes are too far from each other, the attacker can use two malicious nodes sharing a link to build the wormhole tunnel [9].

1.2 Related Work

In [10], wormhole attack detection is based on hop count and delay changes between source and destination nodes. If there is a wormhole tunnel between given source and destination nodes, the delay increases due to the longer path created by the wormhole tunnel, while the hop count decreases for the same reason. In that sense, the detection scheme compares the delay and hop count at a given moment with previous values to detect the attack. [11] proposes to apply modifications to the DSR routing protocol to automatically calculate a Round Trip Time (RTT) delay value between source and destination nodes at a given moment. Thus, initial RTT values are stored and compared with subsequent values of the same kind. If the RTT changes, a wormhole attack is detected. Additionally, the network nodes are set in promiscuous mode to monitor neighboring nodes. [12] introduces a modified version of the AODV routing protocol to calculate the transmission force from source nodes. The method aims to detect wormhole attacks with high transmission power by establishing a transmission power threshold for network nodes. If a node exceeds such a threshold, it could be a compromised node and a wormhole attack is detected. In another modification of the AODV protocol [13], network nodes introduce the hash of the hop addresses and hop count into the RREQ packet while it follows a path from source to destination. When the RREQ packet reaches the destination node, the expected hash of the RREP is calculated and compared with the received hash. If the hashes do not match, the packet is discarded assuming a wormhole attack in progress.

In [10], to detect a wormhole attack, the source nodes of RREQs calculate the delay between a sent RREQ and every received RREP to establish an average RTT value for all received routes. If the RTT of one or more routes is less than the average RTT, a wormhole attack is detected, the malicious routes are discarded, and the detection is reported to neighboring nodes so that they delete the malicious routes from their routing tables. In [13], every node calculates changes in the number of neighboring nodes by counting neighbors at different times. As a result, a wormhole attack is detected if a predefined threshold on the number of neighboring nodes is exceeded by one or more nodes. Besides, [14] presents a wormhole detection algorithm based on node connectivity and statistical calculation. This method defines two terms, node connectivity and network connectivity, to determine the probability of a wormhole attack in progress in the network. The probability of said attack depends on the network's density, which is based on the number of nodes and the connections between nodes.

The research studies above conducted tests in simulation environments to measure the impact of wormhole attacks and the effectiveness of different detection/prevention algorithms in WSNs. Nevertheless, they are based on simulations of routing protocol attacks and are difficult to implement in real environments because of the lack of devices with the features required by the proposed methods. Due to existing and potential cybersecurity threats to WSNs, intrusion detection systems need to be developed for real sensor nodes. At last, since most WSN security research studies are


based on simulation results, future characterization of WSN threats should focus on real devices to build actual security solutions and prevent security disasters in WSN technologies. To expose the flexibility of a wormhole attack and its impact on real cybersecurity environments, this paper proposes an algorithm to execute classic and “reverse” wormhole attacks on XBee S2C devices with source routing enabled. The main goal is to modify the route record field in routing packet headers to manipulate the routing cache in victim nodes. The algorithm is implemented in Python language using the KillerBee framework and an RZUSBSTICK dongle with preinstalled KillerBee firmware. The results include recommendations to prevent wormhole attacks, attack patterns and fingerprints to develop an Intrusion Detection System (IDS) for WSNs as future work.

2 Proposed Wormhole Attack Algorithm

The route record field in source routing packets [15, 16] contains the whole route from source to destination when the routing packet reaches the source of the data transmission. This feature allows the intermediary hops between the source and destination nodes to introduce their network address into the routing packets (RREP) while the packet follows the path from destination to source. A route is thus created and can be used by source nodes to send data packets to the corresponding destination of the source route, as shown in Fig. 2. When the route record field is empty in a received RREP packet, it means that the source and destination nodes are neighbors.

Fig. 2. Route record parameter process

In a classic wormhole attack, the main goal is to create a false neighborhood between two remote nodes through a third malicious node causing the route record field of RREP packets sent through the malicious links to be unmodifiable by intermediary nodes; as a result, they arrive at the destination with zero hops. This approach encompasses capturing packets, modifying the route record in RREP packets and injecting them into the source node to override its routing table with zero hop routes, which eventually builds a false neighborhood between source and destination nodes, as shown in Fig. 3. Consequently, the route record parameter needs to be modified because RREP packets could come from an intermediary node.


Fig. 3. False route record injection

A wormhole attack begins with an attacker introducing a malicious node into a WSN to gather critical information about network attributes related to node types, network ID, frequency and operational channel. During this step, the target nodes are selected. Subsequently, the malicious node starts a packet capture process to find routing packets involving the target nodes (interesting traffic). Once the interesting traffic is captured, the malicious node sets the hop count and relay list to zero in the route record header of the routing packets. Besides, the source and destination MAC addresses are changed to match the network addressing of the victim nodes, since packets from an intermediary node can be captured. Finally, the malicious node forwards the modified routing packet to the destination node, overriding its routing table with the false route and creating a false neighborhood between the target nodes in the process. The next step is to continue capturing packets to find application data to be modified and injected into destination nodes. When the malicious node is not able to capture interesting traffic, the packets are stored and the capture process is restarted. Figure 4 shows the workflow of the proposed algorithm. In addition, two conditions must be satisfied to carry out a successful wormhole attack: (1) source and destination addresses must match between layer 2 (802.15.4) and layer 3 (ZigBee); otherwise, the destination node of the RREP discards the packet; (2) the packet sequence number has to be different from that of the original routing packet; otherwise, the modified packet is discarded [13]. The proposed attack works by overriding the routing cache of the RREP's destination node through the injection of a modified version of the original routing packet, which prevents the ZigBee devices from using the original route.


Fig. 4. Proposed wormhole attack algorithm.

2.1 Attacking Software Design

The proposed algorithm was used to develop attacking software for real devices, as a tool to probe security levels in wireless sensor networks, since most research studies describe wormhole attacks by means of simulation environments. On the other hand, the purpose of the attacking software is to expose attributes of ZigBee devices and wormhole attacks that could be used to effectively detect the latter. This section presents a short description of every phase of the attack.

2.2 Software Requirements

Scapy and KillerBee frameworks are required to dissect, capture and store packets, and also to inject malicious traffic into victim nodes. These features are combined in a Python script to execute the wormhole attack and build the malicious tunnel. (1) Malicious node introduction: During this phase, an attacker sends "beacon-frame"1 requests channel by channel to discover routing and coordinator nodes in the network, as well as device addressing and the network ID, using the zbstumbler command of the KillerBee framework. (2) Attacking software design: The attacking software presents the following attributes and functions. Packet Capture and Network Learning: It occurs when the attacker has selected victim nodes in the network. Then, using relevant networking data like the PANID, frequency channel, and node addressing, it captures the packets transmitted over the air through a malicious node. The following pseudocode algorithm describes the packet capture phase.

1 A "beacon-frame" is a message sent by the coordinator node to synchronize the clocks with network nodes.


The attack begins by using the sniffer object of the KillerBee framework to capture packets with a ZigBee source routing header. Once a source routing packet has been captured, the next step determines whether the packet belongs to a target device. If not, the while loop continues until KillerBee's sniffer captures a source routing packet that involves the victim nodes. Interesting Traffic: A packet is interesting traffic when it originates from or is sent to an attacker-defined victim device. In that sense, the attacker must dissect the captured packet, extract the addressing data and compare it with the victim nodes' addressing. As shown in Algorithm 1, the compare function compares addresses. Since the KillerBee sniffer generates an object from the captured packet, packet dissection becomes a simple task: it consists of retrieving the addressing data from the packet object attributes (Algorithm 2).
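Since the listings of Algorithms 1 and 2 are not reproduced here, the following Python sketch illustrates the capture-and-compare loop just described. The sniff_next_packet, dissect_addresses and is_source_routing callables are assumed wrappers around the KillerBee sniffer and the Scapy ZigBee decoders (they are illustrative, not the authors' code), and the victim addresses are the ones used later in the test network.

# Illustrative sketch of the capture-and-compare phase (Algorithms 1 and 2).
# sniff_next_packet() is an assumed wrapper that returns one decoded
# 802.15.4/ZigBee packet, and dissect_addresses() is an assumed helper that
# extracts (nwk_source, nwk_destination) from it.

VICTIM_SRC = 0x88F8          # example victim addresses from the test network
VICTIM_DST = 0x72DD

def is_interesting(packet, dissect_addresses):
    """True when the packet's NWK addressing matches the victim nodes."""
    src, dst = dissect_addresses(packet)
    return {src, dst} == {VICTIM_SRC, VICTIM_DST}

def capture_source_route(sniff_next_packet, dissect_addresses, is_source_routing):
    """Loop until a source routing (route record) packet of the victims appears."""
    while True:
        packet = sniff_next_packet()
        if packet is None:
            continue
        if is_source_routing(packet) and is_interesting(packet, dissect_addresses):
            return packet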


The previous algorithm determines if a captured packet involves victim nodes’ addressing in a data transaction. Once the addressing data is compared, the result can be true if both destination and source addresses of the captured packet and victim nodes are the same. It can also be false if one or more addressing data are not equal. In that case, the packet capture algorithm is executed again. Routing Packet Modification: After finding an RREP packet with the right addressing, the attacking software changes some attributes of the routing information in the captured packet to build the malicious tunnel. It specifically modifies route record information related to hop count, relay list, and sequence number. Algorithm 3 executes the routing packet modification.

Packet modification begins by rewriting the sequence_number of the captured packet with a random number between 1 and 255, to prevent the destination node from discarding the modified packet sent by the malicious node. At that point, the wormhole attack can present two scenarios: (1) the victim nodes are further apart than one hop of distance, or (2) the victim nodes are neighbors. The first case describes a classic wormhole attack, and the modifications of hop_count and relay_list are made to "eliminate" the distance between the victim nodes. Such changes also make the nodes "think" they are neighbors because of the wormhole tunnel. Because the victim nodes are distant from each other, layer 2 addressing must be altered to match layer 3 addressing. The second scenario is a "reverse" wormhole attack, where the victim nodes are neighbors and a malicious node tries to add distance in between. In this case, packet modifications are performed by increasing the hop_count number and adding intermediary nodes to the relay_list. Routing Packet Forwarding: After the routing packet has been modified, the next step is to send it to its real destination with the send method of the KillerBee framework. Additionally, a new packet capture process is conducted to search for application data. The latter is used to make further modifications that may cause unwanted behavior in the application of the WSN. Algorithm 4 shows the packet injection process.
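As a schematic illustration of the modification step (the listing of Algorithm 3 is not reproduced here), the sketch below rewrites the route record of a captured packet; the field names are illustrative rather than the exact Scapy/KillerBee attribute names.

import random

def modify_route_record(packet, reverse=False, fake_relays=(0xABCD,)):
    """Rewrite the route record of a captured RREP-like packet (sketch).

    `packet` is assumed to expose mutable fields seq_num, hop_count, relay_list,
    mac_src, mac_dst, nwk_src and nwk_dst; real code would edit the Scapy layers.
    """
    # New sequence number so the destination does not drop a duplicate (condition 2).
    packet.seq_num = random.randint(1, 255)
    if reverse:
        # Reverse wormhole: pretend extra hops exist between neighbours.
        packet.relay_list = list(fake_relays)
        packet.hop_count = len(packet.relay_list)
    else:
        # Classic wormhole: advertise a direct (zero-hop) link.
        packet.relay_list = []
        packet.hop_count = 0
        # Layer 2 addressing must match layer 3 addressing (condition 1).
        packet.mac_src, packet.mac_dst = packet.nwk_src, packet.nwk_dst
    return packet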


Packet injection causes two possible effects in the victim nodes because, once the modified packet is processed by the destination node, it is up to the malicious node whether or not to forward the next application packets. If they are not forwarded, the attack may cause a denial-of-service (DoS) state. Data Packets Modification and Forwarding: As shown in Algorithm 4, this wormhole attack tries to modify application as well as routing data. In this case, the destination node of the application data would receive the attacker's data. The main difference from a replication attack is that the proposed wormhole prevents the direct communication between the involved victim nodes and works over real-time traffic. At last, the entire process is repeated indefinitely, injecting false routes with every modified data packet sent to the destination node to maintain the wormhole tunnel until the script is stopped or moved to another network point.
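Putting the previous sketches together, the overall behaviour described in this section could be organised as in the loop below, where capture, modify and inject are assumed wrappers around the routines sketched earlier and the KillerBee injection call; this is a sketch of the described workflow, not the authors' script.

def wormhole_loop(capture, modify, inject, forward_data=True,
                  payload=b"WORMHOLE"):
    """Sketch of the main attack loop; all callables are assumed wrappers."""
    while True:
        route_pkt = capture(kind="source_routing")
        inject(modify(route_pkt))               # override the victim's route cache
        data_pkt = capture(kind="application")
        if not forward_data:
            continue                            # dropping data causes a DoS state
        data_pkt.payload = payload              # tamper with the application data
        inject(data_pkt)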

3 Implementation and Results

In this section, the implementation of the proposed wormhole attack on a testing network takes place without encryption protocols applied to the packets, in order to measure its impact on unsecured devices.

3.1 Network Requirements and Characteristics

Table 1 lists the legitimate features of the nodes and the parameters used to build the prototype network. The malicious node specifications are shown in Table 2. An Atmel RZUSB STICK with KillerBee firmware is used in conjunction with a Raspberry Pi 3 to capture packets and inject modified data and routing packets into the victim nodes. In order to execute the reverse and classic wormhole attacks, two testing networks were built with a coordinator node and two router nodes. Figure 5 presents a reverse wormhole scenario with router nodes sharing a direct link, which is common between neighboring nodes. On the other hand, Fig. 6 shows router nodes without a direct link and the coordinator node as an intermediary node (adding one hop of distance between the router nodes) to test a classic wormhole attack. In a reverse wormhole attack, the victim nodes are identified by the network addresses 0x72DD (source of the route record) and 0x88F8 (source of the application data). In a classic wormhole attack, the source node has the network address 0xE99C, while the destination node has 0xF14B. At last, the coordinator node has the default address 0x0000 in both cases.

Table 1. Legitimate node features.

Type                    XBee S2C (XB24C)
Firmware                405E
Functions set           ZIGBEE TH Reg
Medium access control   IEEE 802.15.4
Network layer           ZigBee (Source Routing)
Frequency               2.4 GHz
Router nodes            2
Coordinator nodes       1
Network ID (PANID)      10
Microcontroller         Arduino UNO ATMEGA 328p

Table 2. Malicious node features.

Node type               Raspberry Pi 3 Model B
Network interface       ATAVRRZUSBSTICK
Firmware                Killerbee
Scripting language      Python 2.7.14
Frameworks              Scapy - Killerbee
Operating system        Raspbian

Fig. 5. Prototype network for reverse wormhole attack.

Fig. 6. Prototype network for classic wormhole attack.

3.2 Wormhole Attack Execution

(1) Reverse wormhole attack: The main goal is to add distance between the victim nodes by modifying the hop count and relay list in the routing packet, thus avoiding using the direct link shared by nodes 0x72DD and 0x88F8. The following command line output shows the execution of the wormhole attack script.

The attack starts by capturing packets until a source routing packet involving the victim nodes is found. Then, a hop count equal to 1 and an intermediary node (0xABCD) are injected into the relay list parameter of the routing packet. Finally, when the script captures an application packet, the application data is replaced with the sentence "reverse wormhole". Figure 7 shows the original application frame sent by node 0x88F8. In this case, the original application packet contains the word "TEST". When the packet arrives at the destination node, an update of the source route is sent to 0x88F8 from 0x72DD, as shown in Fig. 8.

Fig. 7. Original application packet payload.

Fig. 8. Original source routing packet for neighboring nodes.

Figure 9 shows the malicious route injected into 0x88F8 when the reverse wormhole attack captures the first source routing packet.


Fig. 9. Modified source routing packet.


Fig. 10. Malicious data received at destination node.

Figures 8 and 9 show the difference between both routes. The first one contains the attributes of the original route, with 0 relays as the "Number of addresses". The second one contains the false intermediary nodes, with a relay that has the address 0xABCD. Due to this, the malicious node is the only one that can listen to the next application packets sent by 0x88F8, which are replaced by the attacker's malicious data (Fig. 10). (2) Classic wormhole attack: Similar to the reverse wormhole attack, this variant captures routing packets to create the malicious tunnel and application packets to inject malicious data. Figure 11 shows the legitimate and malicious routing packets received at the source node. The first route record indicator entry belongs to the original source of the RREP, and the second entry is the RREP modified by the malicious node to override the routing table of a victim node.

Fig. 11. Received routing packets.

Once again, a source route is updated when the source node attempts to send the word “TEST” and the packet is captured and modified by the wormhole attack. Figures 12 and 13 present the changes in the route received by the source node.

Fig. 12. Original source route fields.

Fig. 13. False source route fields.


An evident change can be observed in the field Number of addresses (hop count) of the source routes: the value goes from 1 hop in the first packet to 0 hops in the second. After false route injection, the source node attempts to send the word “TEST” and the task of the wormhole attack script is to replace these data with the word “WORMHOLE”. Figure 14 shows the malicious data packet received by the destination node, and Fig. 15 shows the content of the packet.

Fig. 14. Modified application data received at destination node.

Fig. 15. Application data content after attack.

3.3 Signatures for Wormhole Attack Detection

(1) Routing packet duplication: In ZigBee devices, source routes can be requested by sending the Network Discovery (ND) command or updated when destination nodes receive a packet. In that sense, a wormhole attack must inject false routes for every modified packet that is sent, thus forcing the sensors/devices to receive two source routing packets per data packet transmitted to destination nodes. The abrupt changes in the route record fields of the routing packets and the increase in transmitted routing packets could be used to detect the presence of an attacker in the network (see the sketch after this list).
(2) Multiple "beacon-frame" requests without a joined device: The first step to attack WSNs is launching a discovery process to identify possible targets in the network. In 802.15.4/ZigBee networks, "beacon-frame" requests are responded to by router and coordinator nodes so that new nodes can join the network. However, after malicious nodes send a "beacon-frame" request, no new devices join the network. To monitor this behavior, pairing beacon request frames with newly joined devices in the WSN would help to detect active scans before the wormhole attack occurs.
(3) Neighborhood table and link status packets: ZigBee devices regularly send link status packets to maintain a first-hop neighborhood table. Since remote nodes cannot share link status packets, wormholes are detected by examining the previous link status messages of the nodes appearing in a routing packet with a route record of zero hops. If previous link status messages are not found, a wormhole threat is detected. On the other hand, a reverse wormhole is detected by checking routing packets with route records containing more than one hop. If the nodes involved in the transmitted packet have shared link status messages before, a reverse wormhole is detected. This approach could be used with neighborhood tables instead of link status messages.
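As an illustration of how signature (1) might be monitored, the sketch below counts route record updates per data packet and flags abrupt hop-count changes; the thresholds and field names are illustrative and not part of an existing IDS.

from collections import defaultdict

class RouteRecordMonitor:
    def __init__(self, max_ratio=1.0, max_hop_jump=1):
        self.route_counts = defaultdict(int)
        self.data_counts = defaultdict(int)
        self.last_hops = {}
        self.max_ratio = max_ratio          # expected route-record/data ratio
        self.max_hop_jump = max_hop_jump    # tolerated hop-count change

    def observe(self, src, dst, kind, hop_count=None):
        """kind is 'route_record' or 'data'; returns True if suspicious."""
        key = (src, dst)
        if kind == 'data':
            self.data_counts[key] += 1
            return False
        self.route_counts[key] += 1
        suspicious = False
        # Abrupt change in the advertised number of relays.
        if hop_count is not None and key in self.last_hops:
            if abs(hop_count - self.last_hops[key]) > self.max_hop_jump:
                suspicious = True
        self.last_hops[key] = hop_count
        # More route updates than data packets suggests injected routes.
        if self.route_counts[key] > self.max_ratio * max(1, self.data_counts[key]):
            suspicious = True
        return suspicious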

3.4 Recommendations

Due to the harmful behavior of a wormhole attack, the cryptographic features of the ZigBee specification should be used to prevent the modification of data and routing packets during wireless transmission. Besides, encryption keys must be changed regularly to prevent brute force attacks and to reduce the usefulness of keys possibly extracted from a stolen node. Additionally, a better randomization method for the sequence number of every packet should be implemented by the ZigBee specification, to make predicting this number difficult and prevent packet injection attacks, so that packets with a wrong sequence number are discarded by legitimate nodes.

4 Conclusions and Future Work

This implementation of a wormhole attack on real devices was successful, using the proposed algorithm to manipulate packets with the KillerBee framework and Scapy decoders. Besides, a new variant of the wormhole attack was introduced and tested to show the flexibility and risk of malicious nodes in a network. This variant takes advantage of the vulnerability of ZigBee devices to wormhole attacks and packet injection. On the other hand, the lack of effective security measures for WSNs must be explored from an empirical point of view to close the security gap of IoT with the available technology. This would also enable end users to implement security tools for real devices. As future work, an Intrusion Detection System (IDS) for wormhole attacks is going to be designed and implemented using the signatures and patterns presented in this paper as a result of the wormhole attack execution on real devices. Additionally, other experimental attacks, such as sinkhole and Sybil, will be explored to improve the detection system.

References

1. Sahmim, S., Gharsellaoui, H.: Privacy and security in internet-based computing: cloud computing, internet of things, cloud of things: a review. Procedia Comput. Sci. 112, 1516–1522 (2017)
2. Rani, A., Kumar, S.: A survey of security in wireless sensor networks, pp. 3–7 (2017)
3. Zhu, C., Leung, V.C.M., Shu, L.E.I.: Green internet of things for smart world. IEEE Access 3, 2151–2162 (2015)
4. Patle, A.: Vulnerabilities, attack effect and different security scheme in WSN: a survey (2016)
5. Goyal, S.: Wormhole and sybil attack in WSN: a review, pp. 1463–1468 (2015)


6. Anwar, R., Bakhtiari, M., Zainal, A., Abdullah, A.H., Naseer Qureshi, K.: Security issues and attacks in wireless sensor network. World Appl. Sci. J. 30(10), 1224–1227 (2014). ISSN 1818-4952
7. Jao, M., et al.: A wormhole attacks detection using a QTS algorithm with MA in WSN (2015)
8. Hu, Y., Perrig, A., Johnson, D.B.: Wormhole attacks in wireless networks. IEEE J. Sel. Areas Commun. 24(2), 370–380 (2006)
9. Jabeur, N., Sahli, N., Muhammad, I.: Survey on sensor holes: a cause-effect-solution perspective. Procedia Comput. Sci. 19, 1074–1080 (2013)
10. Amish, P., Vaghela, V.B.: Detection and prevention of wormhole attack in wireless sensor network using AOMDV protocol. Procedia Comput. Sci. 79, 700–707 (2016)
11. Qazi, S., Raad, R., Mu, Y., Susilo, W.: Securing DSR against wormhole attacks in multirate ad hoc networks. J. Netw. Comput. Appl. 36, 582–592 (2013)
12. Bhagat, S.: A detection and prevention of wormhole attack in homogeneous wireless sensor network, pp. 1–6 (2016)
13. Patel, A., Patel, N., Patel, R.: Defending against wormhole attack in MANET. In: 2015 Fifth International Conference on Communication Systems and Network Technologies (CSNT), pp. 674–678 (2015)
14. Zheng, J., Qian, H., Wang, L.: Defense technology of wormhole attacks based on node connectivity. In: 2015 IEEE International Conference on Smart City/SocialCom/SustainCom, pp. 421–425 (2015)
15. ZigBee Alliance: ZigBee Specification. ZigBee Alliance Board of Directors (ZigBee Standards Organization). Document 053474r17 (2008)
16. Johnson, D.B.: The Dynamic Source Routing Protocol for Mobile Ad Hoc Networks (DSR) (2004)

REAL-T: Time Modularization in Reactive Distributed Applications

Luis Daniel Benavides Navarro1(B), Camilo Pimienta2, Mateo Sanabria1, Daniel Díaz1, Wilmer Garzón1, Willson Melo1, and Hugo Arboleda2

1 Colombian School of Engineering Julio Garavito, Bogotá, Colombia
{luis.benavides,daniel.diaz,wilmer.garzon}@escuelaing.edu.co, {mateo.sanabria,willson.melo}@mail.escuelaing.edu.co
2 Universidad Icesi, Cali, Colombia
{hfarboleda,cfpimienta}@icesi.edu.co

Abstract. In this paper, we propose REAL-T, a distributed event-based language with explicit support for time manipulation. The language introduces automata for operational time manipulation, causality constructs and Linear Temporal Logic for declarative time predicates, and a distributed-time aware event model. We have developed a compiler for the language and a dynamic run-time framework. To validate the proposal we study detection of complex patterns of security vulnerabilities in IoT scenarios.

Keywords: Distributed programming · Event oriented programming · Explicit and implicit time management

1 Introduction

Time management requirements in distributed computer systems are becoming more complex. Intrusion detection systems, Internet of Things networks (IoT), autonomous vehicles, and smart cities are all examples of reactive, concurrent, and distributed systems with complex real-time management needs. Those systems support millions of interconnected devices with complex and dynamic deployment topologies. However, mainstream distributed computing tools still support relatively simple and naive models of time: namely, explicit time management using the system clock to tag events, and implicit time management by means of the next-instruction abstraction in programming languages and computer systems. These simple abstractions have created complex usage patterns to address massive parallelism (see common concurrency patterns in [7]), frequent resource sharing errors (e.g., liveness and data-race errors [5,6]) and convoluted event ordering and synchronization algorithms (see, for example, [25]). Several strategies for explicit time management have been proposed to address the problems described above. Synchronous and asynchronous state machines address the problem of event ordering, pattern recognition, and formal specification of concurrent systems. Other state machine variants consider


implicit time management and explicit time management, e.g. timed machines (see [12] for a complete overview). Temporal logic has been used to address real-time system specification and verification [4,17], error detection in concurrent systems [9,22], and intrusion detection [3,24]. Logical clocks [19] and vector clocks [21] have been proposed to address causal ordering of events in distributed systems, and have been implemented in several systems to address the detection of complex distributed event patterns and the debugging and unit testing of distributed concurrent applications, see [6,27]. All these approaches suffer from at least one of two problems. First, they provide only some abstractions for time management. Second, except for [6,27], they are not distributed and thus assume centralized access to the program trace. We argue that both restrictions severely limit the applicability of the mentioned tools, considering that current computing systems have complex and heterogeneous requirements for time management, distribution, and concurrency.

In this paper we investigate the implementation of several time management strategies in REAL-T, a reactive event-based distributed programming language. We also evaluate their applicability in the context of Intrusion Detection Systems for IoT networks. Concretely, we provide the following contributions:

- REAL-T, a decentralized, elastic, and time-aware event-based model for distributed programming and the corresponding language design;
- a prototype implementation of a compiler supporting automata for complex pattern modelling, causal predicates, and Linear Temporal Logic to address explicit time-aware predicates;
- an evaluation of usage scenarios in the context of Intrusion Detection Systems for IoT networks.

The paper is organized as follows. Section 2 motivates our research by analyzing actual problems in intrusion detection systems over IoT networks. Section 3 discusses work related to the issue. Sections 4 and 5 present the event-based distributed time model and the corresponding language design. Section 6 presents the prototype implementation of the compiler and the run-time virtual machine. In Sect. 7 we present usage scenarios. Finally, we conclude and discuss future work in Sect. 8.

2 Motivation: Time Constraints in Intrusion Prevention Systems

Intrusion Detection Systems (IDS) and Intrusion Prevention Systems (IPS) have been in the security landscape for a long time. In these systems, multi-step attacks can be modeled using an automaton that identifies a sequence of specified events. If the automaton accepts such a combination of events, there is evidence that an attack is occurring, and the system may log the attack or stop the computation. An example of a multi-step attack is a worm attack [28], since the attack pattern is based on scanning, exploiting and finally performing a malicious action. Let us imagine a worm attack over a victim host that has an IDS installed.


The first step is to scan the victim's ports and find one that allows it to infiltrate, which makes the IDS identify event e1, referring to the detected port scan activity. Then, a new event e2 will be generated by the IDS when it detects that a suspicious file is entering the system (dropper). The goal of a dropper is to download and install malware (payload); when this occurs, the IDS registers event e3. The next event generated by the IDS, e4, happens when the malware runs and tries to fulfill its purpose (ex-filtration, disruption, tampering, etc.). Finally, the malware can go through a state of self-destruction to erase any trace of its existence on the victim's host, which can also be registered by the IDS as event e5. The attack has gone through five (5) events to fulfill its objective, and each step depends on the fulfillment of the previous ones.

Now consider such an attack over an Internet of Things network. The Internet of Things is a technological paradigm envisioned as a global network of machines and devices capable of interacting with each other [20]. A mechanism that detects a simple sequence of events is not enough to recognize attacks on such a heterogeneous network; not even network-based IDSs, which capture packets of network traffic and analyze them to detect possible attacks, are sufficient. Most mainstream IDS/IPS systems have simple implementations of event sequence detectors, and even small variations in the sequence of events may affect the detection of an attack. Consider, for example, an IoT network that is under attack: due to the non-deterministic nature of the network and the distributed nature of the attack, events can be triggered in the right malicious order (actual order), but detection may happen in a different order. Having support for the detection of event sequences is not a guarantee of attack identification. A modern IDS/IPS solution needs more sophisticated mechanisms to detect possible attacks, for example, identifying causality relations, or defining predicates on complex time dependencies. Therefore, in this paper, we argue that such systems could be enriched with real-time detection of intricate patterns of distributed events with sophisticated time dependencies.
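As an illustration (not taken from the paper), the five-step worm scenario can be encoded as a small event automaton; real traffic interleaves other events and may deliver them out of order, which is precisely the limitation discussed above.

# Illustrative automaton for the worm scenario: e1 port scan, e2 dropper,
# e3 payload download, e4 malicious action, e5 self-destruction.
class WormAutomaton:
    SEQUENCE = ["e1", "e2", "e3", "e4", "e5"]

    def __init__(self):
        self.state = 0

    def on_event(self, event):
        """Advance only on the next expected event; return True when accepted."""
        if self.state < len(self.SEQUENCE) and event == self.SEQUENCE[self.state]:
            self.state += 1
        return self.state == len(self.SEQUENCE)

ids = WormAutomaton()
for e in ["e1", "other", "e2", "e3", "e4", "e5"]:
    if ids.on_event(e):
        print("multi-step attack detected")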

3 State of the Art

Several dimensions must be considered when implementing time models in computer systems (see [12] for a complete discussion). Time models may be discrete or dense (continuous). They may model time simply by imposing order on events or by means of a metric system, tagging each event with a clock reading. They may support linear time, where each state has only one successor and one predecessor, or branching time, where each state has one predecessor but could have several successors. They may model time via explicit concepts of time, e.g., a clock, or via implicit concepts of time, e.g., the next step in a sequential algorithm. Finally, the modeler should take concurrency and composition into consideration, in particular considering that the problem of synchronization of parallel activities has created a plethora of abstractions (e.g., thread, process, task) and several complex usage patterns (see common concurrency patterns in [7]), frequent resource sharing errors (e.g. liveness and data-race errors [6]) and convoluted event ordering and synchronization algorithms [25].


According to Furia et al. [12], temporal models of time in modern computer systems may be classified into three categories: operational time models, declarative time models, and hybrid time models. Operational formalisms describe the evolution of a system, starting from a certain initial state and transitioning to other states through events or transitions. Finite state automata, Statecharts [14], and Petri nets [26] are examples of operational formalisms. On the other hand, declarative models explicitly describe temporal properties that must hold during the execution of the system. Most of these models are based on temporal logic [4]. Temporal logic is a family of logics that extends classical logic with temporal operators on time-dependent propositions; it allows programmers to describe complex temporal relations among the events happening in a computation. Hybrid models include abstractions from both operational and declarative formalisms. The model proposed in REAL-T is a hybrid model, including explicit abstractions for automata and explicit abstractions for Propositional Temporal Logic. Several actual implementations of these models have been proposed, see for example [9,10]. Monitoring-Oriented Programming (MOP) frameworks aim to reduce the gap between formal specification and implementation [22]. MOP frameworks monitor whether the activities being performed by the software comply with a formal specification. The original MOP framework has been extended with state machines and temporal logic. However, the implementations of such frameworks address only non-distributed applications and assume full access to the computation trace (see the application of MOP frameworks to security in [3]). REAL-T extends these ideas into a fully distributed framework for real-time monitoring. To complete this discussion, we augment the taxonomy above with a category for distributed logical-time abstractions. Logical time was proposed by Lamport [19] and Mattern [21] to address partial orders of distributed events without a globally synchronized clock; the order of messages is based on a causality relation among events. Several implementations of these concepts have been proposed, see the logical clocks implemented in the Horus system [27] and the automata with logical clocks implemented in [6]. REAL-T includes these concepts with the additional support of Propositional Temporal Logic.
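To make the logical-time ideas of Lamport and Mattern [19,21] referenced above more concrete, the following sketch shows a generic vector-clock implementation and the happens-before test; it is an illustration only (plain Python, not part of REAL-T), and the node names are arbitrary:

    # Mattern-style vector clocks: each node keeps one counter per known node.
    class VectorClock:
        def __init__(self, node, nodes):
            self.node = node
            self.clock = {n: 0 for n in nodes}

        def local_event(self):
            self.clock[self.node] += 1
            return dict(self.clock)                 # timestamp attached to the event

        def receive(self, remote_clock):
            for n, t in remote_clock.items():       # merge: element-wise maximum
                self.clock[n] = max(self.clock[n], t)
            self.clock[self.node] += 1
            return dict(self.clock)

    def happens_before(a, b):
        # a happens-before b iff a <= b component-wise and a != b
        return all(a[n] <= b[n] for n in a) and a != b

    p1 = VectorClock("p1", ["p1", "p2"])
    p2 = VectorClock("p2", ["p1", "p2"])
    send_ts = p1.local_event()                      # p1 sends a message
    recv_ts = p2.receive(send_ts)                   # p2 receives it
    print(happens_before(send_ts, recv_ts))         # True: the send causally precedes the receive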

4 A Distributed Event-Based Time Model

The proposed programming model has three main components: an event model, a message model, and a time model. The event model describes the general architecture of the distributed application and which events are considered. The message model describes how messages are exchanged, differentiating the messages that carry meta-information about events from the messages that make up the distributed application itself. Finally, the time model describes the different notions of time available to the programming-model components.

4.1 The Event and Message Models

The model assumes the existence of a base distributed application in which specific behaviors are to be detected or enforced. The distributed application runs on distributed hardware, e.g., several servers and devices sending and receiving information. The network is a component of the distributed application: it connects all the devices and servers, and the base application exchanges messages through it to accomplish its purpose. REAL-T's main constructs are event classes. These classes are instantiated with a singleton policy, i.e., each event class creates one event monitor on each running node of the distributed application. The proposed model considers only one type of event: the method call. When a method call is detected in the base application, the meta-information of that call is broadcast to the nodes participating in the distributed application. This message does not interfere with the distributed messages of the base application; furthermore, no restriction is imposed regarding synchronization with the messages of the base application. Event monitors are the instances of the event classes. They consume messages carrying event information and react to those events. The reaction may be a simple notification, e.g., registering the event in a log file, or it may modify the original behavior of the base application. Complex patterns of events are detected using a predicate language. The current implementation supports deterministic finite automata, to detect complex sequences of distributed events [6]; causal predicates, to enforce causality [21]; and Linear Temporal Logic (LTL, sometimes called PTL or PLTL [15]), to address more complex temporal predicates [22]. The message model explicitly differentiates two types of messages in the application: first, the regular messages that serve the purpose of the distributed application; second, the messages representing the meta-data of events. The meta-data messages are exchanged over the REAL-T framework, while the regular messages are exchanged over the mechanisms defined by the distributed base application. Thus, even though regular messages and meta-data messages may be triggered by the same events, they travel over different distributed software infrastructures. Figure 1 shows the main concepts of the event and message models. Nodes one, two, and three represent a distributed application with a three-layer architecture. Each node executes a multi-threaded component of the application and exchanges messages through defined mechanisms, depicted as bidirectional solid lines and white envelopes. Each node has an instance of a monitor. The figure shows only the computation that monitor three detects; thus we only show meta-event messages (black envelopes) arriving at monitor three. Finally, the timeline at the bottom of the figure shows the arrival order of the messages carrying the events' meta-information at monitor three, emphasizing that different components may see different histories of the distributed application. The other monitors may see different histories, and even the nodes of the application may see different histories.


Fig. 1. Message model

4.2 Time Model

Above, we described the general event model and the distributed message model. Those models characterize the non-deterministic behavior of concurrent and distributed applications. We now introduce the time model of REAL-T, which considers three kinds of time. The first is operational time, where time is not explicit but is modeled only through the order of messages. The second is logical time [19,21], used to address partial orders of events, predicating, for example, over causal relations. Finally, we introduce declarative time using LTL, where custom models of time are defined by the programmers of event classes.

Operational Time: Operational formalisms [12] explicitly describe the evolution of software systems. In our model we use deterministic finite automata to describe complex sequences of events. The automaton is concerned only with the possible next transitions, thus enforcing specific sequences of events. Each transition of the automaton may be guarded with a boolean condition. In our model each monitor may have an automaton definition and, depending on the arrival order of messages, each automaton will detect a different history of the distributed computation (see Fig. 2).

Fig. 2. Operational time model

Logical Time: REAL-T also incorporates a notion of logical time, virtual time, and the global state of distributed systems, following the ideas presented by Lamport and Mattern [19,21]. In this model, each event is tagged with a value from the logical clock instance deployed on the node where the event originated. Logical clocks are updated with the information from other logical clocks; this information arrives with the events' meta-information. The model allows programmers to write predicates on the causality relationship among events, i.e., on whether an event has causal influence over another. To understand this relation, let us look at the happens-before relationship defined in [19], which states that two events are causally related if any of the following cases holds. Let a, b, and c be events:
– a and b are in the same process and a takes place before b; then a happens-before b (a is causally related to b).
– a is the event of sending a message and b is the event of receiving that message; then a happens-before b.
– If a happens-before b and b happens-before c, then a happens-before c, by transitivity of the relationship.
In any other case, the events are considered concurrent [21].

Declarative Time Model (Linear Logical Time): Finally, REAL-T incorporates a time model based on Propositional Temporal Logic (PTL) [11]. Using PTL, programmers write temporal predicates asserting temporal relations among events in a sequence of distributed events. Concretely, REAL-T supports the operators described below, where φ and ψ are PTL formulas:


– ◯φ := “Next: in the next moment φ is true”.
– ♦φ := “Eventually: in some future or present moment φ is true”.
– □φ := “Always: φ is true in all future moments”.
– φ U ψ := “Until: φ continues being true up until some future moment when ψ is true”.
– φ W ψ := “Unless: φ continues being true unless ψ becomes true”.

As an example, suppose that P2 is a process that at some point of its execution sends a message to P1; P1 then receives the message and sends a result back to P2 so that it can continue its execution (a basic example of a distributed computation). This behavior allows us to infer a main property: at some point in the execution of P2, the execution of P1 will eventually happen. This specification can be written in PTL as:

Exe(Pi) := the process Pi is being executed
Spec := ♦(Exe(P2) ⇒ ♦Exe(P1)) ∧ □(Exe(P2) ⇒ ¬◯Exe(P1))

The second condition of the specification states that the message exchange between P2 and P1 is not immediate; in the same way, it could be ensured that the message exchange between P1 and P2 is not immediate. A possible behavior of that specification would be:

[Execution timelines of P1 and P2]

LTL introduces the notion of time into a sequence of states or moments. Each state in the series is represented as a model at a different moment in time. A model (M) in LTL is composed of:
– A set of moments M.
– An order relation ≺ : M × M → {true, false}; the relation may be transitive, non-reflexive, linear, or discrete. This relation defines how moments are ordered and represents the temporal structure of the model.
– A function π : M → P(prop), such that π maps each moment/state to a set of valid propositions, where P(prop) is the power set of the propositions.
A concrete model may be represented as follows:

Moment   Valid propositions
Mi       π(Mi)   = {q, p ∨ q}
Mi+1     π(Mi+1) = {¬p, p ⇒ q ∨ r}
Mi+2     π(Mi+2) = {r, s, p ∨ s ≡ r}


where:
– M = {Mi, Mi+1, Mi+2};
– ≺ is a linear order on M, such that (∀i, j | i < j : Mi ≺ Mj);
– the function π = {(Mi, {q, p ∨ q}), (Mi+1, {¬p, p ⇒ q ∨ r}), (Mi+2, {r, s, p ∨ s ≡ r})}.
The model of PTL we are studying considers a discrete and linear model of time; thus each moment of time has at most one successor. However, the event and message models described above require the time model to have specific characteristics. First, each monitor has a concrete instance of a formula attached to it. Second, the set of formulas defines a custom model of time, where the events of interest of each formula define the set of events that moves the model from moment to moment. The next moment is determined by the arrival of an event of interest, i.e., each formula is evaluated once an event of interest arrives at the node. Third, the model of time may vary from node to node. As mentioned before, each node may see a different history of the computation, so the temporal model may differ, especially in the sequence of events of interest seen by each node. This implies that the evaluation of a formula depends on the model seen by that formula instance. This non-deterministic behavior simplifies the implementation and requires no synchronized clocks.
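The following sketch makes this evaluation style concrete on a finite trace (a simplification written only for illustration, not REAL-T's actual evaluator): each moment is represented by the set of atomic propositions that hold there, in the linear order of the example model above, and the basic operators are checked over the trace.

    # Finite-trace sketch of the model (M, <, pi): each moment is the set of atomic
    # propositions holding there (only the atoms of the example above are kept).
    moments = [{"q"}, set(), {"r", "s"}]        # pi(Mi), pi(Mi+1), pi(Mi+2)

    def holds(p, i):                            # atomic proposition p true at moment i
        return p in moments[i]

    def eventually(p, i=0):                     # <>p: p holds at some moment j >= i
        return any(holds(p, j) for j in range(i, len(moments)))

    def always(p, i=0):                         # []p: p holds at every moment j >= i
        return all(holds(p, j) for j in range(i, len(moments)))

    def next_(p, i=0):                          # ()p: p holds at the following moment
        return i + 1 < len(moments) and holds(p, i + 1)

    print(eventually("r"), always("q"), next_("s", 1))   # True False True

In REAL-T the trace is not fixed in advance: a new moment is appended, and the attached formulas are re-evaluated, each time an event of interest reaches the node.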

5 REAL-T by Example: Time Aware Constructs

REAL-T incorporates constructs to implement the event model, the message model, and the time model. In this section, we present the main elements of the language using a security-test example. Consider a distributed application with several servers performing business computations and persisting data to a database replicated on a different set of servers. We are interested in detecting write commands on the database that are not in a secure session, i.e., that are not between a login and a logout event. Figure 3 shows the implementation using an automaton in REAL-T. First, the event class is declared with the name SecurityTest. The events of interest (the alphabet of the automaton) are defined in lines 2–10. The signature of the first event, sessionLogin, is defined with a parameter uid of type String. The concrete event definition is a boolean expression. The causal construct indicates that the method call is only matched when there is a causal relation with the previous event. The call construct matches any call to the method createSession on objects of type SecurityManager; the call may be executed on any host of the distributed application. The args construct binds the parameter values to the variables. Note that, once a variable is bound, subsequent events using the variable are only matched if the value of the event parameter is the same as the one stored in the variable. The sessionLogout event is defined similarly, but it matches destroySession method calls on any host. On the other hand, the write event uses the construct host(localhost) to indicate that it is only interested in write events happening on the local host, i.e., writes that happen on the database host.

 1  eventclass SecurityTest{
 2    event sessionLogin(String uid):
 3      causal(call(SecurityManager.createSession(String)))
 4      && args(uid);
 5    event sessionLogout(Fqn name, Object x):
 6      causal(call(SecurityManager.destroySession(String)))
 7      && args(uid);
 8    event write(String uid, Fqn memorySpace, Object value):
 9      causal(call(DataBase.write(String, Fqn, Object)))
10      && args(name) && host(localhost);
11
12    automaton securityViolationDetector(String uid, Fqn memorySpace, Object value){
13      start init:
14        (write(uid, Fqn, value) −> securityViolation) ||
15        (sessionLogin(uid) −> login);
16      login:
17        (write(uid, Fqn, value) −> login) ||
18        (sessionLogout(name, x) −> init);
19      end securityViolation;}
20
21    reaction before securityViolationDetector.securityViolation(
22        String uid, Fqn memorySpace, Object value){
23      //Reaction to security violation
24    }}

Fig. 3. Example of causal automaton implementing a security test

Once the events are defined, lines 12 to 19 define the automaton. The automaton has three states and four possible transitions. From the init state, the automaton may transition to the login state if a login event is received, or it may transition to the securityViolation state if a write event is received before a login event. If the automaton is in the login state, it stays there if it receives a write event, or transitions back to the init state if it receives a logout event. Lines 21 to 24 show the reaction definition, which is executed before transitioning to the securityViolation state. Figure 4 shows the same implementation using a PTL formula. In this case the same set of events is defined; however, those events now determine the set of moments of the temporal model. Once a concrete event is detected, the temporal model moves to the next moment and evaluates the formula. The formula defined in lines 4 to 8 asserts that it is always true in the system (always construct) that, immediately (next construct) after a login event, write events are received until a logout event occurs (until construct). If the formula is violated, the reaction is triggered.

 1  eventclass SecurityTest{
 2    // Event definition
 3
 4    ltl securityViolationDetector(String uid, Fqn memorySpace, Object value){
 5      always(sessionLogin(String uid) −>
 6        next(write(uid, memorySpace, value)
 7          until sessionLogout(name,x)))}
 8    //Reaction definition}

Fig. 4. Example of PTL formula implementing the security test

6 Compiler Implementation

We have developed a runtime library and a compiler for REAL-T (https://github.com/unicesi/eketal). The event and messaging models are implemented using a group communication library [2]; this constitutes the core of the runtime framework. The compiler translates REAL-T programs into AspectJ [16] code, which is then compiled into Java bytecode. The implementation of automata support uses an automata library [23] augmented with group communication. The detection of causal predicates uses vector clocks [21]. Finally, we translate the propositional temporal logic formulae of Sect. 4.2 into Büchi automata and feed these automata with distributed events. The implementation of automata support and of causality support with distribution follows techniques similar to those described in [5,21]. We now present an overview of the translation of temporal logic into Büchi automata. A Büchi automaton [8] is an extension of classical finite automata created to read and evaluate infinite words [1,11,18]. The main difference with finite automata is that the acceptance criterion over an infinite word is that there exists a run of the automaton that visits one or more final states infinitely often. Further details of the translation are beyond the scope of this work; we encourage the interested reader to see [11]. However, we now use an example to show the mechanics of the translated automata. Consider the following formula defined in REAL-T: always(login −> next(write until logout))

The formula describes a property where always, immediately after a login event, there is a sequence of write events until a logout event occurs. Once the first event occurs, the automaton will only recognize write until a logout appears. Note that any other declared event is considered as another element of the alphabet, so if any event other than write happens between a login and a logout, the temporal property is violated. Note also that the Büchi automaton of the formula (see Fig. 5) contains a transition labeled with 1; this transition is followed when the implementation moves the clock to the next moment. Thus, even though the automaton does not have a transition for a specific event, if an event notification arrives, the model will move to the next moment. REAL-T translates the formula into a Büchi automaton using a library described in [13].

Fig. 5. Büchi automaton for the LTL property
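As an illustration of the monitoring behavior just described (a hand-written, finite-trace monitor, not the Büchi automaton actually produced by the compiler), the property can be checked over a trace of declared events as follows; only declared events are assumed to create moments:

    # Finite-trace monitor for: always(login -> next(write until logout)).
    # Inside a session (after login, before logout) only 'write' is allowed; any other
    # declared event observed there violates the property, as explained above.
    DECLARED = {"login", "write", "logout"}

    def monitor(trace):
        in_session = False
        for ev in trace:
            if ev not in DECLARED:
                continue                          # undeclared events are not moments here
            if in_session:
                if ev == "logout":
                    in_session = False
                elif ev != "write":
                    return "violation"
            elif ev == "login":
                in_session = True
        return "ok"

    print(monitor(["login", "write", "write", "logout"]))   # ok
    print(monitor(["login", "write", "login", "logout"]))   # violation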

7 Securing an IoT Ecosystem

We evaluate the applicability of the proposed language in the context of an IoT scenario. Our scenario is composed of three main components: an IoT Sentinel monitoring several IoT devices at a particular site (e.g., a home, factory, or hospital); a network-based IDS (NIDS) monitoring messages flowing from the IoT devices to an IoT platform deployed on the cloud (e.g., Oracle IoT, Samsung Artik, Amazon IoT); and a host-based IDS (HIDS) running on the storage server of the IoT platform infrastructure deployed on the cloud. The IoT Sentinel generates three kinds of events: (i) urlAccess, when an IoT device with device id (dId) accesses a web server (url); (ii) dropperDownload, when a dropper (Object X) is downloaded to an IoT device (dId) from a web server (url); and (iii) payloadDownload, when a dropper (Object X) located in an IoT device (dId) downloads a payload (Object Y). The NIDS generates the event servicesScanning when a scanning activity with a given severity (sev) is performed from an IoT device (dId) against a server (target) of the IoT platform. Finally, the HIDS generates an event injectionAttack when an injection (dataHash) coming from an IoT device (dId) is detected against the server itself (target). In such a distributed configuration, traditional IDS components detect those events as independent, non-suspicious events. Only after an attack may an automatic system notice the relation between the events, when accessing the full trace of the computation, i.e., when all the trace logs from all IDS components are compared together. REAL-T, however, can do better. In REAL-T, the specific sequence of related events can be described using a PTL formula over the distributed events of the system. The use of PTL to detect this attack is shown in Fig. 6.

Fig. 6. Example of PTL formula detecting a distributed IoT attack

Figure 6 shows the event definitions in lines 3 to 18, where the first three events are detected in the IoTDevices group of hosts, and the other two events are detected at the NIDS and HIDS groups. In lines 20 to 27 a PTL formula is defined. The formula links the events together through the common values dId, X, Y, url, and target. The formula consumes events respecting causal order and moves the model to the next moment each time a defined event arrives. The last part of the formula is negated to force the triggering of the reaction when the pattern is violated. Lines 29 to 32 define the reaction for when the security violation has occurred. The reaction is to isolate the IoT device (dId) and gather forensic evidence for later adversary analysis. Notice that the reaction is taken on all the IDS. Thus, using REAL-T, not only is the attack detected in real time, but the IDS also take a common action against it.
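To give a flavor of how the five events can be linked through shared values (a plain-Python correlation sketch using the event names above; it correlates only by device id and ignores the causal ordering that REAL-T enforces):

    # Sketch: group the IoT events by device id and alert when all five stages appear.
    STAGES = {"urlAccess", "dropperDownload", "payloadDownload",
              "servicesScanning", "injectionAttack"}

    def correlate(events):
        by_device = {}
        for ev in events:                                   # each event is a dict
            by_device.setdefault(ev["dId"], set()).add(ev["type"])
        return [d for d, seen in by_device.items() if STAGES <= seen]

    events = [
        {"type": "urlAccess",        "dId": "cam-7", "url": "http://bad.example"},
        {"type": "dropperDownload",  "dId": "cam-7", "url": "http://bad.example", "X": "dropper.bin"},
        {"type": "payloadDownload",  "dId": "cam-7", "X": "dropper.bin", "Y": "payload.bin"},
        {"type": "servicesScanning", "dId": "cam-7", "target": "storage-1", "sev": "high"},
        {"type": "injectionAttack",  "dId": "cam-7", "target": "storage-1", "dataHash": "ab12"},
    ]
    print(correlate(events))                                # ['cam-7']

A full REAL-T specification additionally requires url, X, Y, and target to match across stages and the events to respect the causal order, which this sketch does not check.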

8 Conclusions

We have presented REAL-T, a programming language for the real-time monitoring of distributed applications. The language supports a fully distributed programming model, with a notion of distributed events and distributed messages; no central control is needed for the run-time infrastructure of the language. The language also incorporates a model to detect complex temporal relations among events: it supports operational time management through automaton constructs, predicates over causal relations of events using logical vector clocks, and predicates of Propositional Temporal Logic. We have explored the composition of time models, allowing automaton transitions to depend on causal relations of atomic events. We have validated the time model by implementing a functional compiler capable of monitoring distributed Java applications. The compiler provides concrete constructs for the automata and translates PTL formulas into Büchi automata. The run-time framework supports logical clocks and constitutes a fully distributed framework based on group communication. We have evaluated the usage of the model in the context of intrusion detection systems for IoT networks. Several open questions remain. First, we must explore run-time performance in real scenarios. We should also explore different semantics for the language, addressing how the instantiation policy affects the patterns that programmers may use. Finally, we must explore applicability in other domains, for example, how real-time monitoring systems may improve the performance and behavior of autonomous vehicles.

References

1. Baier, C., Katoen, J.P.: Principles of Model Checking. MIT Press, Cambridge (2008)
2. Ban, B., Grinovero, S.: JGroups (2011)
3. Barringer, H., Goldberg, A., Havelund, K., Sen, K.: Program monitoring with LTL in EAGLE. In: 18th International Parallel and Distributed Processing Symposium, April 2004
4. Bellini, P., Mattolini, R., Nesi, P.: Temporal logics for real-time system specification. ACM Comput. Surv. 32(1), 12–42 (2000)
5. Benavides Navarro, L.D., Barrera, A., Garcés, K., Arboleda, H.: Detecting and coordinating complex patterns of distributed events with KETAL. Electron. Notes Theor. Comput. Sci. 281, 127–141 (2011)
6. Benavides Navarro, L.D., Douence, R., Südholt, M.: Debugging and testing middleware with aspect-based control-flow and causal patterns. In: Issarny, V., Schantz, R. (eds.) Middleware 2008. LNCS, vol. 5346, pp. 183–202. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-89856-6_10
7. Benton, N., Cardelli, L., Fournet, C.: Modern concurrency abstractions for C#. ACM Trans. Program. Lang. Syst. 26(5), 769–804 (2004)
8. Büchi, J.R.: Symposium on decision problems: on a decision method in restricted second order arithmetic. Stud. Log. Found. Math. 44, 1–11 (1966)
9. Chen, F., Roşu, G.: Java-MOP: a monitoring oriented programming environment for Java. In: Halbwachs, N., Zuck, L.D. (eds.) TACAS 2005. LNCS, vol. 3440, pp. 546–550. Springer, Heidelberg (2005). https://doi.org/10.1007/978-3-540-31980-1_36
10. Chen, F., Roşu, G.: MOP: an efficient and generic runtime verification framework. In: ACM SIGPLAN Notices, vol. 42. ACM (2007)
11. Fisher, M.: An Introduction to Practical Formal Methods Using Temporal Logic. Wiley, Hoboken (2011)
12. Furia, C.A., Mandrioli, D., Morzenti, A., Rossi, M.: Modeling time in computing: a taxonomy and a comparative survey. ACM Comput. Surv. 42(2), 1–59 (2010)
13. Giannakopoulou, D., Lerda, F.: From states to transitions: improving translation of LTL formulae to Büchi automata. In: Peled, D.A., Vardi, M.Y. (eds.) FORTE 2002. LNCS, vol. 2529, pp. 308–326. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-36135-9_20
14. Harel, D.: Statecharts: a visual formalism for complex systems. Sci. Comput. Program. 8(3), 231–274 (1987). https://doi.org/10.1016/0167-6423(87)90035-9
15. Haydar, M., Boroday, S., Petrenko, A., Sahraoui, H.: Propositional scopes in linear temporal logic. In: Proceedings of the 5th International Conference on Nouvelles Technologies de la Répartition (NOTERE 2005) (2005)
16. Kiczales, G., et al.: Aspect-oriented programming. In: Akşit, M., Matsuoka, S. (eds.) ECOOP 1997. LNCS, vol. 1241, pp. 220–242. Springer, Heidelberg (1997). https://doi.org/10.1007/BFb0053381
17. Konur, S.: A survey on temporal logics for specifying and verifying real-time systems. Front. Comput. Sci. 7(3), 370–403 (2013)
18. Kröger, F., Merz, S.: Temporal Logic and State Systems. Springer, Berlin (2008). https://doi.org/10.1007/978-3-540-68635-4
19. Lamport, L.: Time, clocks, and the ordering of events in a distributed system. Commun. ACM 21(7), 558–565 (1978)
20. Lee, I., Lee, K.: The internet of things (IoT): applications, investments, and challenges for enterprises. Bus. Horiz. 58(4), 431–440 (2015)
21. Mattern, F., et al.: Virtual time and global states of distributed systems. Parallel Distrib. Algorithms 1(23), 215–226 (1989)
22. Meredith, P.O., Jin, D., Griffith, D., Chen, F., Roşu, G.: An overview of the MOP runtime verification framework. Int. J. Softw. Tools Technol. Transf. 14(3), 249–289 (2012)
23. Møller, A.: dk.brics.automaton – finite-state automata and regular expressions for Java (2017). http://www.brics.dk/automaton/
24. Naldurg, P., Sen, K., Thati, P.: A temporal logic based framework for intrusion detection. In: de Frutos-Escrig, D., Núñez, M. (eds.) FORTE 2004. LNCS, vol. 3235, pp. 359–376. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-30232-2_23
25. Ousterhout, J.: Why threads are a bad idea (for most purposes). Invited talk given at USENIX Technical Conference (1996). https://web.stanford.edu/~ouster/cgi-bin/papers/threads.pdf
26. Petri, C.A.: Fundamentals of a theory of asynchronous information flow. In: IFIP Congress (1962)
27. van Renesse, R., Birman, K.P., Maffeis, S.: Horus: a flexible group communication system. Commun. ACM 39(4), 76–83 (1996)
28. Robiah, Y., Rahayu, S.S., Shahrin, S., Faizal, M., Zaki, M.M., Marliza, R.: New multi-step worm attack model. arXiv preprint arXiv:1001.3477 (2010)

Odor Pleasantness Classification from Electroencephalographic Signals and Emotional States

M. A. Becerra1(B), E. Londoño-Delgado2, S. M. Pelaez-Becerra2, L. Serna-Guarín3, A. E. Castro-Ospina3, D. Marin-Castrillón3, and D. H. Peluffo-Ordóñez4

1 Institución Universitaria Pascual Bravo, Medellín, Colombia
[email protected]
2 Institución Universitaria Salazar y Herrera, Medellín, Colombia
3 Instituto Tecnológico Metropolitano, Medellín, Colombia
4 SDAS Research Group, Yachay Tech, Urcuquí, Ecuador
http://www.sdas-group.com

Abstract. Odor identification refers to the capability of the olfactory sense to discern odors. Interest in this sense has grown in multiple fields and applications such as multimedia, virtual reality, and marketing, among others. Therefore, the objective identification of pleasant and unpleasant odors is an open research field. Some studies have been carried out based on electroencephalographic (EEG) signals; nevertheless, they can be considered insufficient due to the levels of accuracy achieved so far. The main objective of this study was to investigate the capability of classifier systems to identify pleasant and unpleasant odors from EEG signals. The methodology was carried out in three stages. First, an odor database was collected using signals recorded with an Emotiv Epoc+ device with 14 EEG channels, together with a survey establishing emotion levels based on valence and arousal, considering that odors induce emotions. The registers were acquired from three subjects, each of whom was subjected to 10 different odor stimuli twice. The second stage was feature extraction, carried out on the 5 sub-bands δ, θ, α, β, γ of the EEG signals using the discrete wavelet transform, statistical measures, and other measures such as area, energy, and entropy; feature selection was then applied based on Rough Set algorithms. Finally, in the third stage, a support vector machine (SVM) classifier was applied and tested with five different kernels. The performance of the classifiers was compared using k-fold cross-validation. The best result, 99.9%, was achieved using the linear kernel. The most relevant features were obtained from sub-bands β and α. Finally, relations among emotion, EEG, and odors were demonstrated.

Keywords: Electroencephalographic signal · Emotion · Odor pleasantness · Sensorial stimuli · Signal processing


1 Introduction

The olfactory sense is a basic function of human beings that allows them to acquire information about the external world. This processing is done by the brain through the olfactory bulb and the olfactory cortex [12]. Among the different existing brain-monitoring techniques, such as Magnetic Resonance Imaging (MRI), functional MRI, and Positron Emission Tomography (PET), electroencephalography (EEG) is the most widely used because of its low cost, portability, and non-invasive nature [1,6]. Moreover, EEG signals have been used to detect the activation of the somatosensory system [14,19], and they are a potential tool for the diagnosis and treatment of some mental disorders [18,20]. In this sense, olfactory analysis has been gaining interest in multiple fields, such as human-computer interfaces, and in applications such as the detection of olfactory impairments, multimedia, virtual reality, and marketing, among others [13]. For recognizing odor pleasantness from EEG alterations, features from different domains have been used, such as frequency analysis, time-frequency analysis, non-linear measures, Event-Related-Potential-based features, and fractal features [11,15,20]. A frequency-based feature approach was presented in [24], where the power of the brain frequency bands was used as a feature, achieving 72.89% as the best average performance among five participants using single-trial classification with a support vector machine (SVM), which could lead to low generalization. In [22], a comparison between healthy subjects and patients with olfactory impairment was performed; to this aim, features from time-frequency analysis were computed, demonstrating that such features are reliable for patient/healthy discrimination and achieving a sensitivity of 75%. Moreover, that work presents the discriminative value of each feature, which can be exploited by a machine learning technique. Another set of proposed features are the non-linear features, as in [10], where two non-linear metrics are used as features to discriminate between pleasant and unpleasant odors, achieving an average accuracy of 55.24% with a Linear Discriminant Analysis classifier and a leave-one-out validation strategy. Another approach uses a data set of five healthy subjects for different smell perceptions with eyes open and closed; features were extracted from the gamma band, which was decomposed by means of the continuous wavelet transform [1]. As a classifier, the nearest neighbor was used, achieving average accuracies of 87.7% ± 4.3 and 94.12% ± 2.9 for open eyes and closed eyes, respectively. As reported, some related works propose to handle features from different domains, and usually a single learning technique is employed. On the other hand, Barrett [2] demonstrated, from a meta-analytic investigation of autonomic features of emotion categories [23], that emotions do not have a fingerprint and that they depend on the context. Therefore, in this study we included an analysis of the effects of odors on emotions, considering that emotions are part of the context of the individuals. This analysis broadens the applications of odor recognition toward intentionally affecting emotions for different purposes, such as clinical, marketing, multimedia, and virtual reality applications, among others.


In this work, a classification system for identifying odor pleasantness from acquired EEG signals is presented. A survey of emotion levels and a wireless device were used to record an EEG database from three subjects, who were exposed to ten different odor stimuli twice. Then, a feature extraction stage was carried out using the discrete wavelet transform, statistical measures, and other features such as area, energy, and entropy, calculated from the EEG sub-bands. These features were reduced by means of Rough Set algorithms in a feature selection stage. Besides, the relation between emotions and odors, based on valence and arousal, was analyzed. Finally, a support vector machine was tested with five different kernels, and their performances were compared using k-fold cross-validation. In general, the achieved results show high performance for the SVM classifier with the extracted features, independently of the applied kernel. The best result, 99.9%, was achieved using the linear kernel, and the most relevant sub-bands were α and θ. Besides, the relation among emotion, EEG, and odors was demonstrated.

2 Materials and Methods

2.1 Discrete Wavelet Transform

The DWT subdivides a signal into its frequency bands without loss of temporal information. This time-frequency decomposition generates two sets of basis functions, i.e., wavelets and scaling functions. Mathematically, the DWT can be seen as a set of wavelet transform coefficients, computed as inner products between a wavelet basis and a finite-length sequence of the signal. These coefficients allow analyzing the spectrum of the signal as a function of time [17]. Additionally, the DWT is also known as a filter bank, because high-pass and low-pass filters are applied to the EEG time series in the decomposition process. The DWT can be expressed as [8]

$$W_f(j,k) = \sum_{n=0}^{N-1} f(n)\,\psi_{j,k}(n), \qquad (1)$$

where $W_f(j,k)$ is a DWT coefficient and $f(n)$ is a sequence of length $N$. The wavelet basis is expressed as

$$\psi_{j,k}(n) = \frac{1}{\sqrt{s_0^{\,j}}}\,\psi\!\left(\frac{n - s_0^{\,j} k}{s_0^{\,j}}\right), \qquad (2)$$

where $s_0^{\,j}$ and $s_0^{\,j}k$ refer to the discretized versions of the DWT scale and translation parameters, respectively.

2.2 Support Vector Machine – SVM

The SVM is a classification algorithm based on statistical learning theory, proposed by [5]. The aim of this method is to use training data $X = \{x_1, x_2, \ldots, x_l\}$ with its corresponding label set $Y = \{y_1, y_2, \ldots, y_l\}$ to find a function that satisfies $f(x_i) \approx y_i$. Such a function can be found by solving the following dual optimization problem:

$$\max_{\alpha_i} \; \sum_{i=1}^{l} \alpha_i \;-\; \frac{1}{2}\sum_{i,j=1}^{l} \alpha_i \alpha_j y_i y_j \,\varphi(x_i, x_j) \qquad (3)$$

$$\text{subject to:}\quad 0 \le \alpha_i \le C, \qquad \sum_{i=1}^{l} \alpha_i y_i = 0 \qquad (4)$$

where $\alpha_i$ represents a Lagrange multiplier, $\varphi(x_i, x_j)$ is a kernel function, and $C$ is a free parameter that controls the trade-off between the tolerated error and the flatness of the solution [4]. Finally, the decision function can be represented as:

$$y = \operatorname{sgn}\!\left(\sum_{i=1}^{l} y_i \alpha_i \,\varphi(x_i, x) + b\right) \qquad (5)$$

This method takes advantage of the kernel trick to compute the most discriminative non-linear hyperplane between classes. Therefore, the selection and tuning of the kernel type is an important step. In this work, five kernels were tested.
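For reference, typical forms of the kernels evaluated later in this work are given below (standard textbook definitions; the custom kernel used in the experiments is not specified by the authors and is therefore omitted):

$$\varphi_{\text{lin}}(x_i,x_j) = x_i^{\top}x_j, \qquad \varphi_{\text{poly}}(x_i,x_j) = \left(\gamma\, x_i^{\top}x_j + r\right)^{d}, \qquad \varphi_{\text{rbf}}(x_i,x_j) = \exp\!\left(-\frac{\lVert x_i - x_j\rVert^{2}}{2\sigma^{2}}\right),$$

with the quadratic kernel corresponding to the polynomial case with $d = 2$.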

3 Experimental Setup

This section explains the proposed procedure for odor classification, which is shown in Fig. 1. Five stages were carried out: (i) data acquisition, (ii) preprocessing, (iii) feature extraction, (iv) feature selection, and (v) classification.

Fig. 1. Proposed procedure

3.1 Data Acquisition and Experimental Protocol

Three voluntary subjects (two men and one woman, all with shaved heads) participated in the experiments. They had no respiratory, chronic, or mental disease, and their ages ranged between 30 and 50 years. Prior to the experiment, each participant signed a consent form, filled out an anxiety survey [7], and filled out Beck's Depression Inventory [3]. All subjects showed normal depression levels and low anxiety (see Table 1). After the experiment, each participant filled out a questionnaire about the identification and pleasantness of the odors smelled. Twenty registers were acquired per subject, and each register has 14 EEG signals per stimulus.

Table 1. Subject characterization: Beck's Depression Inventory, anxiety level, and age

Survey              Subject 1    Subject 2    Subject 3
Beck's Depression   8 (Normal)   7 (Normal)   0 (Normal)
Anxiety level       Low          Low          Low
Age (years)         52           43           30

To record the EEG signals, a controlled environment with low sound and low light was used. The participant sits on a comfortable chair, and the Emotiv Epoc+ device is placed on his or her head. Then, the test is explained, and the participant is asked to keep the eyes closed. During the trial, ten scents were presented: lavender, coconut oil, chocolate, vanilla, hand cream, acetic acid, sulfur, wasabi, ammonia, and creolin disinfectant. The odors were packaged inside covered bottles to prevent odor identification before the experiment was carried out and also to avoid contamination of the controlled environment. The odors were presented in random order, but trying to alternate pleasant and unpleasant odors. Each register lasts twenty seconds: the first thirteen seconds are recorded as a baseline without the smell, and between seconds thirteen and twenty a smell is presented. Before a new smell is presented, the participant takes a break in which he or she smells coffee for ten seconds and then rests for 20 more seconds. This trial is repeated twice with each subject. The process is shown in Fig. 2.

Fig. 2. EEG signal acquisition process


Additionally, a survey of emotion was applied, and manikins (see Fig. 3) were shown to the participant during the experiment, following the methodology depicted in [9], to determine valence and arousal before and after the stimuli (b-valence: valence before the stimulus; b-arousal: arousal before the stimulus; a-valence: valence after the stimulus; a-arousal: arousal after the stimulus). Finally, each register was labeled as pleasant or unpleasant (see Table 2).

Fig. 3. Self-assessment manikins of valence and arousal

Table 2. Label of odors - pleasant and unpleasant

Odor    Lavender      Coconut oil   Chocolate    Vanilla      Handcream
Label   Pleasant      Pleasant      Pleasant     Pleasant     Pleasant

Odor    Acetic acid   Sulfur        Wasabi       Ammonia      Creolin disinfectant
Label   Unpleasant    Unpleasant    Unpleasant   Unpleasant   Unpleasant

3.2 Preprocessing

First, artifacts in the EEG signals were removed manually using the EEGLAB software. The signals were then standardized between −1 and 1 and decomposed into the 5 sub-bands δ (0.5 Hz–4 Hz), θ (4 Hz–7.5 Hz), α (8 Hz–13 Hz), β (13 Hz–30 Hz), and γ (>30 Hz) using 10th-order Butterworth band-pass filters.
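A minimal sketch of this decomposition is shown below (assuming SciPy and a 128 Hz sampling rate, which is typical for the Emotiv Epoc+ but is an assumption here; the upper edge of the γ band is capped at 45 Hz only for the example):

    import numpy as np
    from scipy.signal import butter, sosfiltfilt

    FS = 128                                   # assumed sampling rate (Hz)
    BANDS = {"delta": (0.5, 4), "theta": (4, 7.5), "alpha": (8, 13),
             "beta": (13, 30), "gamma": (30, 45)}

    def decompose(eeg_channel):
        """Return the five band-passed versions of one EEG channel."""
        sub_bands = {}
        for name, (lo, hi) in BANDS.items():
            sos = butter(10, [lo, hi], btype="bandpass", fs=FS, output="sos")
            sub_bands[name] = sosfiltfilt(sos, eeg_channel)   # zero-phase filtering
        return sub_bands

    channel = np.random.randn(20 * FS)         # 20 s of toy data standing in for a channel
    print({k: v.shape for k, v in decompose(channel).items()})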

3.3 Feature Extraction

Statistical measures, area, energy, and Shannon entropy were calculated from the computed sub-bands and from the discrete wavelet transform coefficients obtained from each sub-band (δ, θ, α, β, and γ). The Daubechies 10 (Db10) mother wavelet was applied.
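A sketch of this computation for one sub-band signal is given below (assuming the PyWavelets package; the decomposition level and the exact feature set kept per coefficient group are assumptions, since the paper does not detail them):

    import numpy as np
    import pywt

    def band_features(band_signal, level=4):
        """Statistical, area, energy and Shannon-entropy features of one sub-band."""
        coeffs = np.concatenate(pywt.wavedec(band_signal, "db10", level=level))
        feats = {}
        for name, x in (("time", band_signal), ("dwt", coeffs)):
            p = np.abs(x) / np.sum(np.abs(x))             # pseudo-distribution for entropy
            feats.update({
                f"{name}_mean": np.mean(x), f"{name}_std": np.std(x),
                f"{name}_var": np.var(x), f"{name}_median": np.median(x),
                f"{name}_area": np.sum(np.abs(x)),        # area as sum of absolute values
                f"{name}_energy": np.sum(x ** 2),
                f"{name}_entropy": float(-np.sum(p * np.log2(p + 1e-12))),
            })
        return feats

    print(len(band_features(np.random.randn(2560))))      # 14 features per sub-band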


3.4 Feature Selection

Feature selection was carried out using four algorithms based on rough set theory: Rough Set Neighbor (RS-N), Rough Set Entropy (RS-E), Fuzzy Rough Set Neighbor (FRS-N), and Fuzzy Rough Set Entropy (FRS-E), which are discussed widely in [16]. We followed that methodology for adjusting the parameters of the algorithms: the inclusion rate was set to 0.5, and the neighbor distance tolerance was adjusted between 0.05 and 0.5 in increments of 0.05. Two relevance analyses were carried out. The first included the features obtained in the feature extraction stage along with the emotion features (b-valence, b-arousal, a-valence, and a-arousal), in order to determine the relation among odors, EEG, and emotion. The second analysis used only the features obtained in the feature extraction stage, without considering the emotion features.

3.5 Classification

A support vector machine (SVM) was applied to classify the odors as pleasant or unpleasant. Five kernels (Gaussian, polynomial, custom, linear, and quadratic) were applied using the features of each sub-band individually and the features obtained from all sub-bands, for each subject and for all subjects together. These experiments were carried out to establish the relevant sub-bands. The performance of the classifiers was validated using 10-fold cross-validation.
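A sketch of this evaluation with scikit-learn is shown below; the feature matrix X and labels y are random placeholders, and the callable standing in for the authors' custom kernel is an arbitrary choice, since its definition is not given in the paper:

    import numpy as np
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score, StratifiedKFold
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X = np.random.randn(120, 40)                 # placeholder: registers x features
    y = np.random.randint(0, 2, 120)             # placeholder labels: pleasant / unpleasant

    def custom_kernel(A, B):                     # arbitrary stand-in for the custom kernel
        return np.tanh(A @ B.T)

    kernels = {"gaussian": SVC(kernel="rbf"),
               "polynomial": SVC(kernel="poly", degree=3),
               "custom": SVC(kernel=custom_kernel),
               "linear": SVC(kernel="linear"),
               "quadratic": SVC(kernel="poly", degree=2)}

    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    for name, clf in kernels.items():
        pipeline = make_pipeline(StandardScaler(), clf)
        acc = cross_val_score(pipeline, X, y, cv=cv, scoring="accuracy")
        print(name, round(acc.mean(), 4), round(acc.std(), 4))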

4 Results and Discussion

The first experiment, with the RS-N, RS-E, FRS-N, and FRS-E algorithms, made it possible to establish the relationships among emotions and odors based on valence and arousal before and after the stimulus. The second process was carried out to determine the most relevant sub-bands of the EEG signals. The most relevant sub-band was α (24%), and the least relevant was θ (17%). In addition, the RS-N, RS-E, FRS-N, and FRS-E algorithms were applied on each sub-band to establish the most relevant features. The results indicated the frequency domain of the signal, the Shannon entropy of the DWT, and the energy of the DWT as the most relevant features for all sub-bands. The features of each sub-band were evaluated using the classifier for each subject and for all subjects together; moreover, the features of all sub-bands were also applied to the classifiers. Table 3 shows the accuracy of the SVM classifier using the Gaussian kernel. The best sub-band accuracies were achieved for γ and α; however, using all bands decreased the results. Table 4 shows the accuracy of the SVM classifier using the polynomial kernel. The best results were obtained by applying all sub-bands, while the best sub-band accuracies were achieved for δ and γ.


Table 3. Percentage of accuracy performance - SVM Gaussian Kernel.

Sub-band       δ              θ              α              β              γ              All
Subject 1      98.48 ± 1.36   97.3 ± 1.73    97.84 ± 1.62   97.74 ± 1.66   99.31 ± 1.0    94.01 ± 1.24
Subject 2      97.69 ± 1.42   98.97 ± 1.28   98.77 ± 0.95   97.74 ± 1.48   98.62 ± 1.08   99.26 ± 0.92
Subject 3      97.33 ± 1.57   97.95 ± 1.43   98.95 ± 0.98   99.38 ± 0.96   99.04 ± 0.68   99.8 ± 0.49
All subjects   98.94 ± 0.66   99.06 ± 0.57   99.2 ± 0.57    99 ± 0.54      99.29 ± 0.39   99.3 ± 0.41
Mean           98.11          98.32          98.69          98.47          99.07          98.09

Table 4. Percentage of accuracy performance - SVM Polynomial Kernel.

Sub-band       δ              θ              α              β              γ              All
Subject 1      98.67 ± 1.11   97.1 ± 1.46    96.96 ± 1.68   97.45 ± 1.38   99.46 ± 0.81   99.85 ± 0.44
Subject 2      97.89 ± 1.32   98.13 ± 1.15   98.55 ± 1.36   97.64 ± 1.42   98.52 ± 1.63   99.95 ± 0.26
Subject 3      98.28 ± 1.42   97.14 ± 1.75   98.23 ± 1.28   99 ± 1.19      98.95 ± 1.05   99.42 ± 0.71
All subjects   99.26 ± 0.46   99.13 ± 0.51   99.3 ± 0.49    99.4 ± 0.53    99.33 ± 0.44   99.95 ± 0.14
Mean           98.53          97.88          98.26          98.37          99.07          99.79

Table 5. Percentage of accuracy performance - SVM Custom Kernel.

Sub-band       δ              θ              α              β              γ              All
Subject 1      98.13 ± 1.58   97.69 ± 1.13   97.1 ± 1.18    97.15 ± 1.38   99.31 ± 0.92   100 ± 0.0
Subject 2      97.79 ± 1.67   98.28 ± 1.16   98.43 ± 1.15   98.03 ± 1.41   98.38 ± 1.18   100 ± 0.0
Subject 3      98.8 ± 1.06    97.47 ± 1.28   98.57 ± 1.18   99 ± 0.93      99.04 ± 1.14   99.6 ± 0.64
All subjects   99.5 ± 0.4     99.1 ± 0.42    99.4 ± 0.57    99.4 ± 0.49    99.44 ± 0.45   99.96 ± 0.12
Mean           98.56          98.14          98.38          98.40          99.04          99.89

Table 6. Percentage of accuracy performance - SVM Linear Kernel.

Sub-band       δ              θ              α              β              γ              All
Subject 1      98.28 ± 1.28   97.2 ± 1.24    96.61 ± 1.93   96.96 ± 1.44   99.41 ± 1.06   99.49 ± 0.37
Subject 2      97.45 ± 1.68   98.33 ± 1.0    98.62 ± 1.21   97.1 ± 1.74    98.57 ± 1.41   99.85 ± 0.44
Subject 3      98.57 ± 1.18   97.52 ± 1.34   98.57 ± 1.54   99 ± 0.85      99.14 ± 1.27   99.52 ± 0.68
All subjects   99.4 ± 0.4     90.03 ± 0.43   99.4 ± 0.48    99.4 ± 0.48    99.6 ± 0.4     99.95 ± 0.14
Mean           98.56          95.77          98.3           98.12          99.18          99.7

Table 5 shows the accuracy of the SVM classifier using the custom kernel. The best results were obtained by applying all sub-bands; however, the best sub-band accuracies were achieved for γ and δ. Table 6 shows the accuracy of the SVM classifier using the linear kernel. The best results were again obtained by applying all sub-bands, while the best sub-band accuracies were achieved for γ and δ.

Table 7. Percentage of accuracy performance - SVM Quadratic Kernel.

Sub-band       δ              θ              α              β              γ              All
Subject 1      97.1 ± 2.26    95.24 ± 2.2    92.89 ± 2.76   96.77 ± 2.29   98.12 ± 0.0    84.06 ± 4.9
Subject 2      97.84 ± 1.75   95.49 ± 2.07   93.48 ± 3.46   95 ± 2.2       95.58 ± 2.6    87.1 ± 4.06
Subject 3      95.23 ± 1.56   94.09 ± 3.53   96.76 ± 1.83   96.66 ± 2.16   94.95 ± 2.72   91.76 ± 2.79
All subjects   98.95 ± 0.6    98.93 ± 0.81   98.8 ± 0.83    98.1 ± 0.91    98.91 ± 0.79   91.28 ± 2.32
Mean           97.28          95.94          95.48          96.63          96.89          88.55

Table 7 shows the accuracy of the SVM classifier using the quadratic kernel. The best sub-band accuracies were achieved for γ and δ. The result for all bands was the opposite (the worst result) with respect to the previously discussed experiments. Globally, the best results were achieved with the γ sub-band, followed by δ. Nevertheless, the best overall results were achieved using all sub-bands, excluding the results obtained by the SVM classifier with the quadratic kernel. Finally, the best result of the proposed system is compared with the best results of other approaches in Table 8, where its higher performance is highlighted.

Table 8. Comparison with other approaches

Approach   K-NN classifier [1]   Hopfield neural networks [21]   SVM (this work)
Accuracy   94.12                 97.06                           99.89

5 Conclusions

In this paper, an SVM classification system with high performance for classifying pleasant and unpleasant odors using features obtained from EEG signals was presented. We demonstrated the relation among emotions, EEG signals, and odors. Taking into account the performance of the classifiers on the different sub-bands, we conclude that the features proposed in this work, obtained from any sub-band, are sufficient for classifying odors as pleasant or unpleasant; however, the best sub-band for this task is γ, followed by δ, using the DWT and statistical measures. Despite the limited number of subjects and trials carried out in this study, it is clear that pleasant and unpleasant odor stimuli generate effects on the subjects that affect their emotional states and elicit changes in the EEG signals. We consider that these results can be an excellent guide for generating new paradigms for further studies, which will allow building odor classification systems with high generality. As future work, we will increase the number of registers in the database (subjects and trials). Besides, we will include other environments, controlled and non-controlled, to achieve odor classification systems with better generality. Moreover, other features and classifiers can be considered in order to improve the accuracy. Finally, we consider it very important to identify whether there exists a fingerprint of the odor effect on a person from different physiological signals, and to test multiclass systems not limited to the pleasant and unpleasant classes.

References

1. Aydemir, O.: Olfactory recognition based on EEG gamma-band activity. Neural Comput. 29(6), 1667–1680 (2017)
2. Barrett, L.F.: How Emotions Are Made: The Secret Life of the Brain. Mariner Books, Boston (2017)
3. Beck, A.T., Steer, R.A., Brown, G.K.: BDI-II, Beck Depression Inventory: Manual, 2nd edn. (1996)
4. Burges, C.J.: A tutorial on support vector machines for pattern recognition. Data Min. Knowl. Disc. 2(2), 121–167 (1998). https://doi.org/10.1023/A:1009715923555
5. Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995). https://doi.org/10.1023/A:1022627411411
6. Giraldo, E., Acosta, C.D., Castellanos-Domínguez, G.: Estimación dinámica neuronal a partir de señales electroencefalográficas sobre un modelo realista de la cabeza. Tecno Lógicas, no. 25 (2010)
7. Julian, L.J.: Measures of anxiety: state-trait anxiety inventory (STAI), Beck anxiety inventory (BAI), and hospital anxiety and depression scale-anxiety (HADS-A). Arthritis Care Res. 63(Suppl 11), S467–S472 (2011). https://doi.org/10.1002/acr.20561
8. Khalid, M.B., Rao, N.I., Rizwan-i-Haque, I., Munir, S., Tahir, F.: Towards a brain computer interface using wavelet transform with averaged and time segmented adapted wavelets. In: 2009 2nd International Conference on Computer, Control and Communication, pp. 1–4. IEEE, February 2009. https://doi.org/10.1109/IC4.2009.4909189
9. Koelstra, S., et al.: DEAP: a database for emotion analysis using physiological signals (2012). https://doi.org/10.1109/T-AFFC.2011.15
10. Kroupi, E., Sopic, D., Ebrahimi, T.: Non-linear EEG features for odor pleasantness recognition. In: 2014 Sixth International Workshop on Quality of Multimedia Experience (QoMEX), pp. 147–152. IEEE (2014)
11. Min, B.C., et al.: Analysis of mutual information content for EEG responses to odor stimulation for subjects classified by occupation. Chem. Senses 28(9), 741–749 (2003)
12. Mori, K., Manabe, H.: Unique characteristics of the olfactory system. In: Mori, K. (ed.) The Olfactory System, pp. 1–18. Springer, Tokyo (2014). https://doi.org/10.1007/978-4-431-54376-3_1
13. Murray, N., Ademoye, O.A., Ghinea, G., Qiao, Y., Muntean, G.M., Lee, B.: Olfactory-enhanced multimedia video clips datasets. In: 2017 Ninth International Conference on Quality of Multimedia Experience (QoMEX), pp. 1–5. IEEE (2017)
14. Nakamura, T., Tomita, Y., Ito, S.I., Mitsukura, Y.: A method of obtaining sense of touch by using EEG. In: 2010 IEEE International Conference on RO-MAN, pp. 276–281. IEEE (2010)
15. Namazi, H., Akrami, A., Nazeri, S., Kulish, V.V.: Analysis of the influence of complexity and entropy of odorant on fractal dynamics and entropy of EEG signal. BioMed Research International 2016 (2016)
16. Orrego, D., Becerra, M., Delgado-Trejos, E.: Dimensionality reduction based on fuzzy rough sets oriented to ischemia detection. In: Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS (2012). https://doi.org/10.1109/EMBC.2012.6347186
17. Ortega-Adarme, M., Moreno-Revelo, M., Peluffo-Ordoñez, D.H., Marín-Castrillón, D., Castro-Ospina, A.E., Becerra, M.A.: Analysis of motor imaginary BCI within multi-environment scenarios using a mixture of classifiers. In: Solano, A., Ordoñez, H. (eds.) CCC 2017. CCIS, vol. 735, pp. 511–523. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66562-7_37
18. Puchala, E., Krysmann, M.: An algorithm for detecting the instant of olfactory stimulus perception, using the EEG signal and the Hilbert-Huang transform. In: Kurzynski, M., Wozniak, M., Burduk, R. (eds.) CORES 2017. AISC, vol. 578, pp. 499–505. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-59162-9_52
19. Russell, M.J.: Alpha blocking and digital filtering improve olfactory evoked potentials. In: 1991 Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, vol. 13, pp. 535–536. IEEE (1991)
20. Saha, A., Konar, A., Bhattacharya, B.S., Nagar, A.K.: EEG classification to determine the degree of pleasure levels in touch-perception of human subjects. In: 2015 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE (2015)
21. Saha, A., Konar, A., Rakshit, P., Ralescu, A.L., Nagar, A.K.: Olfaction recognition by EEG analysis using differential evolution induced Hopfield neural net. In: The 2013 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, August 2013. https://doi.org/10.1109/IJCNN.2013.6706874
22. Schriever, V.A., Han, P., Weise, S., Hösel, F., Pellegrino, R., Hummel, T.: Time frequency analysis of olfactory induced EEG-power change. PLoS One 12(10), e0185596 (2017)
23. Siegel, E.H., et al.: Emotion fingerprints or emotion populations? A meta-analytic investigation of autonomic features of emotion categories. Psychol. Bull. 144(4), 343–393 (2018). https://doi.org/10.1037/bul0000128
24. Yazdani, A., Kroupi, E., Vesin, J.M., Ebrahimi, T.: Electroencephalogram alterations during perception of pleasant and unpleasant odors. In: 2012 Fourth International Workshop on Quality of Multimedia Experience (QoMEX), pp. 272–277. IEEE (2012)

Exploration of Characterization and Classification Techniques for Movement Identification from EMG Signals: Preliminary Results

A. Viveros-Melo1, L. Lasso-Arciniegas1(B), J. A. Salazar-Castro2, D. H. Peluffo-Ordóñez2,3, M. A. Becerra4, A. E. Castro-Ospina4, and E. J. Revelo-Fuelagán1

1 Universidad de Nariño, Pasto, Colombia
[email protected]
2 Corporación Universitaria Autónoma de Nariño, Pasto, Colombia
3 Yachay Tech, Urcuquí, Ecuador
4 Instituto Tecnológico Metropolitano, Medellín, Colombia

Abstract. Today, human-computer interfaces are used increasingly often and have become necessary for daily human activities. Among some remarkable applications we find: wireless computer control through hand movement, wheelchair guiding with finger motions, and rehabilitation. Such applications are possible from the analysis of electromyographic (EMG) signals. Although some research works have addressed this issue, movement classification through EMG signals is still an open, challenging issue for the scientific community, especially because the controller performance depends not only on the classifier but also on other aspects, namely: the used features, the movements to be classified, the considered feature-selection methods, and the collected data. In this work, we propose an exploratory study on characterization and classification techniques for identifying movements through EMG signals. We compare the performance of three classifiers (KNN, a Parzen-density-based classifier, and an ANN) using spectral (wavelet) and time-domain-based (statistical and morphological descriptors) features. Also, a methodology for movement selection is proposed. Results are comparable with those reported in the literature, reaching classification errors of 5.18% (KNN), 14.7407% (ANN), and 5.17% (Parzen-density-based classifier).

Keywords: Classification · EMG signals · Movements selection · Wavelet

1 Introduction

Electromyographic (EMG) signals are graphical recordings of the electrical activity produced by the skeletal muscles during movement. The analysis of EMG signals has traditionally been used in medical diagnostic procedures, and more recently its applicability to the design of human-machine control interfaces (HCIs) has increased, becoming, to some extent, indispensable for the activities of people's lives. Some remarkable applications of EMG-based HCIs are: wireless computer control through hand movement, wheelchair guiding with finger motions, and rehabilitation [1,2]. Muscle-signal-based control is possible thanks to the development of fields such as microprocessors, amplifiers, signal analysis, filtering, and pattern recognition techniques. One of the main branches in the investigation of EMG signal recognition is the one aiming to identify features that provide a better description of a specific movement. Often, such an identification process turns out to be a difficult task, since this kind of signal is sensitive to several artifacts, such as noise from electronic components, the action potentials that activate the muscles, and the patient's health, physical condition, and hydration level, among others [3]. For this reason, it is essential to perform a proper preprocessing stage so that such artifacts can be corrected or mitigated and a cleaner EMG signal is obtained, more suitable for today's applications, e.g., the control of a prosthesis. Along with adequate acquisition and preprocessing, EMG signals also require a characterization procedure, consisting of extracting the most representative, informative, and separable features and measures from the original signal so that the subsequent classification task may work well. That said, every EMG-signal processing stage plays a crucial role in automatic movement identification [4–6]. Although this research problem can be addressed in many ways, it still lacks a definite solution and remains a challenging, open issue. Consequently, in this work, we present an exploratory study on characterization and classification techniques to identify movements through EMG signals. In particular, spectral features (wavelet coefficients) and temporal and statistical features (area under the curve, absolute mean value, effective value, standard deviation, variance, median, entropy) are used [1,4,7,8]. The characterization of the signal yields a matrix of large dimensions, so a dimensionality reduction is necessary. Two processes are performed to achieve a good reduction in size. The first is the selection of movements, a proposed methodology that compares movements, seeking those that are more easily differentiated and present a lower classification error. The second process is feature selection, which consists of calculating the contribution of each feature to the classification and is carried out with the WEKA program and its RELIEF algorithm [9,10]. Finally, the performance of three machine learning techniques is compared: K-Nearest Neighbors (KNN), an Artificial Neural Network (ANN), and a classifier based on Parzen density. Each of the stages is developed and explained in depth in the text. The rest of this paper is structured as follows: Sect. 2 describes the stages of the EMG signal classification procedure for movement identification purposes, as well as the database used for the experiments. Section 3 presents the proposed experimental setup. Results, discussion, and future work are gathered in Sect. 4.

2 Materials and Methods

This section describes the proposed scheme to explore the classification effectiveness of different machine learning techniques for upper limb movement identification. Broadly, our scheme involves stages of preprocessing, segmentation, characterization, movement selection, feature selection and classification, as depicted in the block diagram in Fig. 1.

Fig. 1. Block diagram of the proposed methodology

2.1 Database

The database considered in this study is available at the Ninaweb repository of the Ninapro project [7]. It contains the upper-limb electromyographic activity of 27 healthy people performing 52 movements, namely: 12 finger movements, 8 isometric and isotonic hand configurations, 9 wrist movements and 23 functional grip movements. These movements were selected from relevant literature as well as rehabilitation guides. The Ninapro database also includes an acquisition protocol and brief descriptions of the subjects involved in the data collection. Muscle activity is recorded by 10 double differential electrodes at a sampling rate of 100 Hz. The position of the hand is registered through a Dataglove and an inclinometer. The electrodes are equipped with an amplifier, a band-pass filter and an RMS rectifier; the amplification factor is 14000. Eight electrodes are placed uniformly around the forearm using an elastic band, at a constant distance, just below the elbow. Two additional electrodes are placed on the long flexor and extensor of the forearm [4]. Following the experimental protocol, each subject sits in a chair in front of a table with a large monitor. While the electrodes, Dataglove and inclinometer are recording, the subject repeats 10 times each of the 52 movements shown on the screen, as can be seen in Fig. 2. Each repetition takes 5 s, followed by a rest period of 3 s.


Fig. 2. EMG signal acquisition protocol for Ninapro database oriented to movement identification [4].

2.2 Stages of the System

Pre-processing. The amplitude and frequency features of the raw electromyography signal have been shown to be highly variable and sensitive to many factors, both extrinsic (electrode position, skin preparation, among others) and intrinsic (physiological, anatomical and biochemical features of the muscles, among others) [3]. A normalization procedure is therefore necessary to convert the signal to a scale relative to a known and repeatable value. Given the structure of the database, normalization was applied per electrode: for each electrode, the maximum value is found and the whole signal of that electrode is divided by it.

Segmentation. At this stage a segmentation procedure is performed. The database contains a tag vector that facilitates the trimming of the signals; it indicates what action the patient was performing throughout the data collection process. The number 0 means that the patient is at rest, and every other number identifies a specific movement. Segmentation therefore consists of taking the signal with its tag vector and eliminating all pauses, leaving only the signals of the movements. The trimmed and normalized electromyographic signals are stored in a data structure.
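As a rough illustration of these two stages, the sketch below normalizes each electrode by its maximum value and then cuts the recording into per-movement segments using the tag vector. It is a minimal sketch only: the function names, the NumPy array layout (samples × electrodes) and the handling of a trailing repetition are our own assumptions and do not come from the original processing code.

```python
import numpy as np

def normalize_per_electrode(emg):
    """Divide each electrode (column) by its maximum value.

    emg: array of shape (n_samples, n_electrodes); Ninapro DB1 signals are
    RMS-rectified, so the maximum is assumed to be positive.
    """
    max_vals = emg.max(axis=0)
    max_vals[max_vals == 0] = 1.0            # avoid division by zero
    return emg / max_vals

def segment_by_labels(emg, labels):
    """Remove rest periods (label 0) and group samples by movement label.

    labels: integer vector of length n_samples (0 = rest, k > 0 = movement k).
    Returns a dict {movement_id: list of segments}, one segment per repetition.
    """
    segments = {}
    start = None
    for i, lab in enumerate(labels):
        if lab != 0 and start is None:
            start = i                        # a movement repetition begins
        elif lab == 0 and start is not None:
            segments.setdefault(labels[start], []).append(emg[start:i])
            start = None
    if start is not None:                    # trailing repetition without a final rest
        segments.setdefault(labels[start], []).append(emg[start:])
    return segments
```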


Characterization. The feature matrix is organized as follows: each row contains the data corresponding to patient n performing movement j and repetition k, and so on. For the columns, a bibliographic review yielded 28 different features for EMG signals, which are applied to each of the 10 electrodes; the feature matrix therefore has a size of 14040 × 280. Among the 28 features, two types can be identified, explained below:

- Temporal features: these refer to the variables that can be obtained from the signals in the time domain, quantified every T seconds. The features used in this study are: area under the curve, absolute mean value, RMS value, standard deviation, variance, median, entropy, energy and power [1,2,11].

- Spectral features: the time-frequency representation of a signal provides information about the distribution of its energy in the two domains, giving a more complete description of the physical phenomenon. The most common techniques used for spectral feature extraction are the short-time Fourier transform (STFT), the continuous wavelet transform (CWT), the discrete wavelet transform (DWT) and the wavelet packet transform (WPT) [5,8,11]. On one hand, the Fourier transform is widely used in signal processing and analysis; the results obtained after its application are satisfactory when the signals are periodic and sufficiently regular, but they differ when analyzing signals whose spectrum varies with time (non-stationary signals). On the other hand, the wavelet transform is efficient for the local analysis of non-stationary and rapidly transient signals. Like the Fourier transform with a time window, it locates the signal in a time-scale representation, and the temporal aspect of the signals is preserved. The difference is that the wavelet transform provides a multi-resolution analysis with a dilated window. The wavelet transform of a function f(t) is its decomposition into a set of functions \psi_{s,\tau}(t), which form a basis called wavelets. The wavelet transform is defined as Eq. (1):

W_f(s, \tau) = \int f(t)\, \psi_{s,\tau}(t)\, dt    (1)

Wavelets are generated from the translation and scale change of a function \psi(t) called the "mother wavelet", as detailed in Eq. (2):

\psi_{s,\tau}(t) = \frac{1}{\sqrt{s}}\, \psi\!\left(\frac{t - \tau}{s}\right)    (2)

where s is the scale factor and \tau is the translation factor. The wavelet coefficients can be calculated by a discrete algorithm based on the recursive application of discrete high-pass and low-pass filters, as shown in Fig. 3. A Daubechies 2 (db2) wavelet is used with 3 levels of decomposition to obtain the approximation and detail coefficients; features are then extracted from the wavelet coefficients as well as from the discretized signal over time.


Fig. 3. Wavelet transform
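As an illustration of the characterization stage, the sketch below computes a few of the time-domain descriptors listed above for the raw segment and for each set of db2 wavelet coefficients obtained with 3 decomposition levels. It uses the PyWavelets (pywt) package, which is an assumption, since the paper does not state which wavelet implementation was used, and it reproduces only a subset of the 28 features per electrode.

```python
import numpy as np
import pywt

def segment_descriptors(x):
    """A subset of the time-domain descriptors named in the paper."""
    return [
        np.trapz(np.abs(x)),           # area under the curve
        np.mean(np.abs(x)),            # absolute mean value
        np.sqrt(np.mean(x ** 2)),      # RMS value
        np.std(x), np.var(x), np.median(x),
        np.sum(x ** 2),                # energy
    ]

def electrode_features(x, wavelet="db2", level=3):
    """Descriptors of the raw segment plus descriptors of every set of
    wavelet coefficients (approximation and details, 3 levels of db2)."""
    feats = segment_descriptors(x)
    for coeffs in pywt.wavedec(x, wavelet, level=level):
        feats.extend(segment_descriptors(coeffs))
    return feats

def build_feature_row(segment):
    """segment: array (n_samples, 10 electrodes) -> one row of the feature matrix."""
    return np.hstack([electrode_features(segment[:, e])
                      for e in range(segment.shape[1])])
```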

Movements Selection. To reduce the number of movements, a comparison methodology is proposed. It involves taking a group of 2 movements, classifying them and calculating their average error, then adding one more movement and repeating the process successively until the 52 movements are covered. Different combinations are tested to find the movements that, when classified, yield the lowest possible error. If adding a movement to the working group makes the error rise abruptly, that movement is eliminated immediately. As a result of this process, we obtained 10 movements with a very low classification error and few misclassified objects. It is important to highlight that the classifier used is KNN, that each classification was repeated 25 times per group of movements, and that all the functions applied belong to the PRTools toolbox.

Features Selection. The aim of this stage is to decrease the number of variables by deleting redundant or useless information. Training with a subset instead of the original data set improves the classifiers' training time, reduces the computational cost and improves performance. This stage is carried out with the RELIEF algorithm, originally used for binary classification, which generalizes to polynomial (multi-class) classification through different binary problems and assigns a contribution weight to each feature [9,10]. The algorithm orders the features according to their contributions, from highest to lowest, so the feature matrix is reorganized and the columns (features) that do not contribute to the classification of the movements are eliminated. To decide the appropriate number of features, tests are performed with the KNN classifier: the number of features is varied, and for each value the movements are classified over 25 iterations and the average error is calculated at the end. The result is a vector with the average error as a function of the number of features, as shown in Fig. 4 (a sketch of this sweep is given after Fig. 5). Thereby, the number of features is reduced to 60 columns, where the error is minimum. This new feature matrix is used for the next step, the comparison of the classifiers. The block diagram in Fig. 5 shows the methodology just explained.


Fig. 4. Results of the selection of features, as the average error varies according to the number of features.

Fig. 5. Block diagram of the methodology used to select the optimum number of features
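The feature-number sweep of Figs. 4 and 5 can be sketched as follows, assuming that the RELIEF ranking of the columns has already been computed (e.g., exported from WEKA). The scikit-learn calls stand in for the PRTools functions used in the paper, so this is only an approximate, hedged reconstruction; the function name, the 75/25 split and the stratified sampling are our own choices.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def error_vs_n_features(X, y, relief_ranking, feature_counts, repetitions=25):
    """Average KNN (k=5) test error as a function of the number of top-ranked
    features. relief_ranking is the list of column indices sorted from the
    highest to the lowest RELIEF weight (computed elsewhere)."""
    errors = []
    for n in feature_counts:
        cols = relief_ranking[:n]
        errs = []
        for _ in range(repetitions):
            X_tr, X_te, y_tr, y_te = train_test_split(
                X[:, cols], y, test_size=0.25, stratify=y)
            knn = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
            errs.append(1.0 - knn.score(X_te, y_te))
        errors.append(np.mean(errs))
    return errors

# e.g. curve = error_vs_n_features(X, y, ranking, range(10, 281, 10))
```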

2.3 Classification

The final stage of this process is classification. We carried out a bibliographic review of the algorithms used for movement classification with EMG signals. There are many studies on this topic in which different algorithms perform well, but we looked for algorithms that do not take much time to train, have a low computational cost, and have been tested on multi-class problems. Continuing with the process, the classification of the selected movements is carried out on the new feature matrix with the following techniques.

1. K-nearest neighbors (KNN): a non-parametric supervised classification method. A simple configuration is used: a sample is assigned the most frequent class among its K nearest neighbors. We have a data matrix that stores N cases, each defined by n features (X_1, ..., X_n), and a variable C that defines the class of each sample. The N cases are denoted:

(x_1, c_1), \ldots, (x_N, c_N),    (3)


where x_i = (x_{i,1}, \ldots, x_{i,n}) for all i = 1, \ldots, N; c_i \in \{c_1, \ldots, c_m\} for all i = 1, \ldots, N; c_1, \ldots, c_m denote the m possible values of c; and x is the new sample to classify. The algorithm calculates the Euclidean distances from the classified cases to the new case x. Once the nearest K cases have been selected, x is assigned the most frequent class c. Empirically, through different tests, k = 5 was established [8,12,13].

2. Artificial neural network (ANN): a heuristic classification technique that emulates the behavior of a biological brain through a large number of artificial neurons that are connected and activated by means of functions. The model of a single neuron can be represented as in Fig. 6, where x denotes the input values or features and each of the n inputs has an associated weight w (emulating synaptic strength). The input values are multiplied by their weights and summed, obtaining:

v = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n = \sum_{i=1}^{n} w_i x_i.    (4)

The neural network is a collection of neurons connected in a network with three layers. The input layer is associated with the input variables. The hidden layer is not connected directly to the environment; it is in this layer that the weights w are computed. The output layer is associated with the output variables and is followed by an activation function. The process of finding a set of weights w such that, for a given input, the network produces the desired output is called training [1,6,14]. In this work a neural network is trained with a back-propagation algorithm and a hidden layer of 10 neurons. The weight initialization consists of setting all weights to zero, and the dataset is also used as a tuning set. A sigmoid activation function is used in this work.

Fig. 6. Model of a single neuron


3. Parzen-density-based classifier: most classifiers are designed for binary classification, but in practical applications it is common for the number of classes to be greater than two; in our case we have ten different movements to classify [15,16]. The Parzen-density-based classifier is designed to work with multi-class problems. This probabilistic classification method requires a smoothing parameter for the Gaussian distribution computation, which is optimized.

3 Experimental Setup

Importantly for this process, we use the Matlab toolbox PRTools, which provides all the functions needed to classify the movements with the different machine learning techniques and to compute their efficiency. From the new feature matrix, two groups are obtained at random: a training group with 75% of the data and a verification group with the remaining data. With the selected groups, we proceed to classify the movements; this step is repeated 30 times with each classifier, and the error and the misclassified movements are stored in two vectors. At the end, the average error and the standard deviation are calculated, which are the measures used to estimate the classification effectiveness of the different machine learning techniques.
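A hedged sketch of this protocol is given below. The paper uses Matlab and PRTools; here equivalent scikit-learn estimators are used instead (KNeighborsClassifier with k = 5, an MLP with one hidden layer of 10 logistic units, and a simple Parzen-window classifier built from per-class Gaussian kernel densities). The bandwidth value, the class-prior weighting and all function names are our own assumptions, not the paper's actual configuration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, KernelDensity
from sklearn.neural_network import MLPClassifier

class ParzenClassifier:
    """One Gaussian kernel density per class; predict the class with the
    highest prior-weighted density. In the paper the smoothing parameter
    is optimized; here it is fixed for simplicity."""
    def __init__(self, bandwidth=0.5):
        self.bandwidth = bandwidth
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.kdes_ = {c: KernelDensity(bandwidth=self.bandwidth).fit(X[y == c])
                      for c in self.classes_}
        self.priors_ = {c: np.mean(y == c) for c in self.classes_}
        return self
    def predict(self, X):
        scores = np.column_stack([self.kdes_[c].score_samples(X) + np.log(self.priors_[c])
                                  for c in self.classes_])
        return self.classes_[np.argmax(scores, axis=1)]
    def score(self, X, y):
        return np.mean(self.predict(X) == y)

models = {
    "KNN": lambda: KNeighborsClassifier(n_neighbors=5),
    "ANN": lambda: MLPClassifier(hidden_layer_sizes=(10,), activation="logistic", max_iter=2000),
    "Parzen": lambda: ParzenClassifier(bandwidth=0.5),
}

def compare_classifiers(X, y, repetitions=30):
    """75/25 random splits, 30 repetitions, mean error and standard deviation."""
    results = {}
    for name, make in models.items():
        errs = []
        for _ in range(repetitions):
            X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, stratify=y)
            errs.append(1.0 - make().fit(X_tr, y_tr).score(X_te, y_te))
        results[name] = (np.mean(errs), np.std(errs))
    return results
```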

4 Results, Discussion and Future Work

Based on the mean error and the standard deviation, Fig. 7 makes clear that KNN and the Parzen-density-based classifier present a better overall performance, with 94.82% and 94.8%, while the neural network does not come close to the performance of the other two classifiers: with the 60 selected features it reached a recognition rate of 85.26%. Possibly the number of features or the amount of data is not enough to train the neural network, since its average number of misclassified movements does not differ greatly from KNN and Parzen, as seen in Table 1. Figure 4 shows that it is possible to obtain a good performance with the KNN algorithm using only twenty features. Figure 7 also reveals greater uniformity of the KNN and Parzen classifiers across the tests, with a standard deviation of 0.79% for KNN, 0.72% for the Parzen-density-based classifier and 4.52% for the neural network. These results are comparable to those obtained in [1], where back-propagation neural networks achieved a 98.21% performance but classified only 5 movements. The recognition rate was 84.9% for the k-NN in [13], where five wrist movements were classified; that article highlights the difficulty of placing the electrodes on the forearm, but in our case the database was acquired with a strict protocol that avoids this problem, which makes a difference in the results obtained. Ninapro also has more data acquisition channels, with which it is possible to obtain more information and to discern the most important features through the RELIEF algorithm.


As future work, we propose to develop a comparison with more classifiers, such as SVM and LDA, and to observe the results of using variations of the algorithms, such as weighted KNN and FF-ANN. Other performance parameters such as specificity, sensitivity and computational cost must also be evaluated. We will explore the possibility of applying this knowledge in a practical application, such as a hand prosthesis or real-time human-machine interaction using EMG signals of the forearm, seeking a high classification rate.

Fig. 7. Performance of the classifiers. In the following order: ANN, KNN and Parzen.

Table 1. Average of misclassified movements per movement (1-10)

Classifier   1  2  3  4  5  6  7  8  9  10
KNN          5  4  4  3  8  3  3  2  2   2
ANN          7  2  3  4  6  4  4  5  2   4
Parzen       5  4  4  3  8  4  2  2  1   2

Acknowledgements. This work is supported by the "Smart Data Analysis Systems - SDAS" group (http://sdas-group.com), as well as the "Grupo de Investigación en Ingeniería Eléctrica y Electrónica - GIIEE" from Universidad de Nariño. The authors also acknowledge the research project supported by Agreement No. 095 of November 20th, 2014 by VIPRI from Universidad de Nariño.

References 1. Phinyomark, A., Phukpattaranont, P., Limsakul, C.: A review of control methods for electric power wheelchairs based on electromyography signals with special emphasis on pattern recognition. IETE Techn. Rev. 28(4), 316–326 (2011)


2. Aguiar, L.F., B´ o, A.P.: Hand gestures recognition using electromyography for bilateral upper limb rehabilitation. In: 2017 IEEE Life Sciences Conference (LSC), pp. 63–66. IEEE (2017) 3. Halaki, M., Ginn, K.: Normalization of EMG signals: to normalize or not to normalize and what to normalize to? (2012) 4. Atzori, M., et al.: Electromyography data for non-invasive naturally-controlled robotic hand prostheses. Sci. Data 1, 140053 (2014) 5. Podrug, E., Subasi, A.: Surface EMG pattern recognition by using DWT feature extraction and SVM classifier. In: The 1st Conference of Medical and Biological Engineering in Bosnia and Herzegovina (CMBEBIH 2015), 13–15 March 2015 (2015) 6. Vicario Vazquez, S.A., Oubram, O., Ali, B.: Intelligent recognition system of myoelectric signals of human hand movement. In: Brito-Loeza, C., Espinosa-Romero, A. (eds.) ISICS 2018. CCIS, vol. 820, pp. 97–112. Springer, Cham (2018). https:// doi.org/10.1007/978-3-319-76261-6 8 7. Atzori, M., et al.: Characterization of a benchmark database for myoelectric movement classification. IEEE Trans. Neural Syst. Rehabil. Eng. 23(1), 73–83 (2015) 8. Krishna, V.A., Thomas, P.: Classification of emg signals using spectral features extracted from dominant motor unit action potential. Int. J. Eng. Adv. Technol. 4(5), 196–200 (2015) 9. Kononenko, I.: Estimating attributes: analysis and extensions of RELIEF. In: Bergadano, F., De Raedt, L. (eds.) ECML 1994. LNCS, vol. 784, pp. 171–182. Springer, Heidelberg (1994). https://doi.org/10.1007/3-540-57868-4 57 10. Kira, K., Rendell, L.A.: A practical approach to feature selection. In: Machine Learning Proceedings 1992, pp. 249–256. Elsevier (1992) 11. Romo, H., Realpe, J., Jojoa, P., Cauca, U.: Surface EMG signals analysis and its applications in hand prosthesis control. Rev. Av. en Sistemas e Inform´ atica 4(1), 127–136 (2007) 12. Shin, S., Tafreshi, R., Langari, R.: A performance comparison of hand motion EMG classification. In: 2014 Middle East Conference on Biomedical Engineering (MECBME), pp. 353–356. IEEE (2014) 13. Kim, K.S., Choi, H.H., Moon, C.S., Mun, C.W.: Comparison of k-nearest neighbor, quadratic discriminant and linear discriminant analysis in classification of electromyogram signals based on the wrist-motion directions. Curr. Appl. Phys. 11(3), 740–745 (2011) 14. Arozi, M., Putri, F.T., Ariyanto, M., Caesarendra, W., Widyotriatmo, A., Setiawan, J.D., et al.: Electromyography (EMG) signal recognition using combined discrete wavelet transform based on artificial neural network (ANN). In: International Conference of Industrial, Mechanical, Electrical, and Chemical Engineering (ICIMECE), pp. 95–99. IEEE (2016) 15. Pan, Z.W., Xiang, D.H., Xiao, Q.W., Zhou, D.X.: Parzen windows for multi-class classification. J. Complex. 24(5), 606–618 (2008) 16. Kurzynski, M., Wolczowski, A.: Hetero- and homogeneous multiclassifier systems based on competence measure applied to the recognition of hand grasping moveE., Kawa, J., Wieclawek, W. (eds.) Information Technologies in ments. In: Pietka,  Biomedicine. AISC, vol. 4, pp. 163–174. Springer, Cham (2014). https://doi.org/ 10.1007/978-3-319-06596-0 15

An Automatic Approach to Generate Corpus in Spanish

Edwin Puertas1,2,3(B), Jorge Andres Alvarado-Valencia2,3, Luis Gabriel Moreno-Sandoval2,3, and Alexandra Pomares-Quimbaya2,3

1 Universidad Tecnologica de Bolivar, Cartagena, Colombia
[email protected]
2 Pontificia Universidad Javeriana, Bogotá, Colombia
{edwin.puertas,jorge.alavarado,morenoluis,pomares}@javeriana.edu.co
3 Center of Excellence and Appropriation in Big Data and Data Analytics (CAOBA), Bogotá, Colombia
http://www.unitecnologica.edu.co/, http://www.javeriana.edu.co/home, http://alianzacaoba.co/

Abstract. A corpus is an indispensable linguistic resource for any natural language processing application. Some corpora have been created manually or semi-automatically for specific domains. In this paper, we present an automatic approach to generate a corpus from digital information sources such as Wikipedia and web pages. The extraction of information from Wikipedia is done by delimiting the domain, using a propagation algorithm to determine the categories associated with a domain region and a set of seeds to delimit the search. The extraction of information from web pages is carried out efficiently by determining the patterns associated with the structure of each page in order to assess the quality of the extraction. Keywords: Text mining · Corpus · Knowledge extraction · Natural language processing · Computational linguistics

1 Introduction

Currently, the web contains a massive amount of information from multiple sources (online social networks, blogs, newspapers and others), most of which is in human language and consists of unstructured or semi-structured data that hinder its interpretation by computational tools. Some of these information sources are provided by experts through specialized websites and wikis, offering easily accessible data resources for free. According to [5], corpus linguistics is an emerging area of cross-domain study that combines qualitative and quantitative approaches, with the purpose of understanding how people use language in different contexts and how corpora can be used to analyze the collective use of language. Moreover, the extraction of information from text is an important task in text mining, and its main objective is to extract structured information from unstructured or semi-structured text; it also allows extracting semantic relations, terminology and lexicons, among others [13]. For this reason, this work proposes an "Automatic approach to Generate Corpus in Spanish" (AGCS), a compiler that automatically gathers digital Wikipedia documents and web pages. This component can be configured for a specific domain to facilitate the construction of specialized lexicons and terminologies. Another purpose is to provide the Center of Excellence and Appropriation in Big Data and Data Analytics (CAOBA) with a linguistic tool that automates the extraction of information in Spanish in specific domains, besides facing and giving solutions to the new challenges posed by the use of language in digital media. Our approach is based on the Design Science Research in Information Systems technique developed by [7,23,24], which consists in the design of a sequence of activities by an expert that produces an innovative and useful device for a problem. The artifact must be evaluated to ensure its usefulness for the specified problem and must contribute in a novel way to the research; it must also solve a problem that has not yet been solved or provide a more effective solution. This paper is organized as follows: Sect. 2 contains the related work; Sect. 3 describes our approach; Sect. 4 presents the experiments and results; finally, Sect. 5 presents the conclusion and future work.

2 Related Work

Currently, a large number of Web Data Extraction systems are available as commercial products, and an increasing number of free and open source alternatives to commercial software are entering the market [11]. The investigations carried out in [16,18,19,21] show automatic retrieval techniques over generic information sources such as Wikipedia. At the same time, the authors of [8,12,17,25] show approaches in which WordNet-specific terms are used. On the other hand, the authors of [18,31] describe techniques such as web scraping or deep search algorithms over web pages, manipulating the structures of the Hypertext Markup Language (HTML) or of the Document Object Model (DOM) to extract information in specific domains. Also, the authors of [15,18] carry out the extraction of content in different languages through the Extensible Markup Language (XML) using Really Simple Syndication (RSS). Although many works focus on particular domains, the construction of new knowledge domains must be fed from open sources, because new terms and lexicons that enrich the quality of the knowledge domain can be extracted from them. It should be noted that the investigations mentioned above carry out the process of selecting information and terms manually or semi-automatically; besides, their contribution focuses on the semantics, morphology and syntax of texts. Therefore, knowing the structure of the data from the information sources and determining the quality of the results obtained when searching for information in a particular domain is a significant advantage, since it allows a better quality in the extraction of data. Our proposal is described in detail below.

3 Approach

The approach is organized around the process and the algorithms used to build an automatic corpus in Spanish based on digital information sources, such as Wikipedia and Web pages. Figure 1 illustrates the process used and its description. Subsequently, the algorithms used are described in detail.

Fig. 1. Process of the automatic corpus generator in Spanish.

The process begins when a user enters a set of keywords and URLs related to a particular context. Initially, the Wikipedia Extraction module calculates the domain region and identifies nearby articles using the seed words supplied by the user. Then, the Web Page Extraction module gets the adjacent DOM elements from the provided URLs and determines the patterns. Subsequently, the retrieved articles and web pages are processed by the Text Analysis module, which identifies the language of the text, eliminates tags, removes stopwords, tokenizes, and extracts n-grams of length less than 4. Finally, articles and web pages are normalized into a similar data structure and exported as an XML file. The algorithms used in each of the modules are described below. Wikipedia Extraction (WE) obtains the information of Wikipedia articles on a subject using a set of seed terms, which allow calculating the domain regions using the categories of the seed articles [1]. Subsequently, the articles adjacent to the initial articles are identified by locating articles close to the items identified in the previous step. Next, the content of the article, its subtitles, categories and links to other articles are extracted. Afterward, tokens are extracted from the links to identify articles related to the parent article, and the stopwords [20] are removed from the content of the article.


Finally, the text is normalized and the extracted data are exported in an Extensible Markup Language (XML) [4] file that contains an identifier, the title of the document, its subtitles, and its content. To expand on the activities implemented in the Wikipedia article extraction process, we developed an algorithm called Wikipedia Extraction, whose details are shown in Table 1.

Table 1. Description of the Wikipedia Extraction algorithm.

Algorithm: Wikipedia Extraction
Input: seed terms S = {T_1, T_2, ..., T_n}
Procedure:
1: Construct the domain region G = (S, W). S is a finite set of seed terms. W is a finite set of Wikipedia articles, W = (W_t, W_st, W_c, W_cont, W_link), where: W_t is the title of the article; W_st = {subtitle_1, ..., subtitle_n}, ∀ subtitle_i ∈ W_i; W_cont is the content of the article; W_c = {categorie_1, ..., categorie_n}; W_link = {link_1, ..., link_n}, ∀ link_i ∈ W_i. Domain region ⇒ RD = {W_c : W_t = S_i}.
2: Search for neighboring or adjacent articles. W_N is a finite set of articles close to the seed articles: W_N = {W_N ⊂ W_i : ∀ W_N ∈ RD}.
3: Search the child articles of those found in step 2. W_child is a finite subset of articles that belong to a seed article: W_child = {W_child ⊂ W_i : ∀ W_child ∈ RD_parent}.
4: Extract the categories of the child articles. W_childcategories is a finite subset of categories that belong to a seed article: W_childcategories = {W_childcategories ⊂ W_iparent : ∀ W_childcategories ∈ RD_parent}.
5: Continue with step 1, where the initial domain region is now the domain region of the child articles.
6: Continue with step 5 and verify that the depth of the search equals 2.
7: Then, tokens and stopwords [20] are removed and the text of the extracted articles is normalized.
8: Return a dictionary of extracted articles (identifier, extraction source, title, subtitle, content).
Output: Collection of articles
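A minimal sketch of how such a crawl can be started with the `wikipedia` package listed in Table 4 is shown below. It only illustrates the seed-driven, depth-limited collection of articles and their adjacent links; the category-based delimitation of the domain region (steps 1 and 4 of Table 1) and the stopword removal are omitted, and the function name, the `max_links` cap and the stored fields are our own assumptions.

```python
import wikipedia  # the `wikipedia` package listed in Table 4

wikipedia.set_lang("es")  # the approach only supports Spanish

def build_domain_region(seed_terms, depth=2, max_links=20):
    """Collect Spanish Wikipedia articles reachable from the seed terms up to
    the given depth, keeping title, categories, links and content."""
    corpus, frontier, seen = [], list(seed_terms), set()
    for _ in range(depth):
        next_frontier = []
        for title in frontier:
            if title in seen:
                continue
            seen.add(title)
            try:
                page = wikipedia.page(title, auto_suggest=False)
            except Exception:                 # disambiguation or missing article
                continue
            corpus.append({
                "id": page.pageid,
                "source": page.url,
                "title": page.title,
                "categories": page.categories,
                "content": page.content,
            })
            next_frontier.extend(page.links[:max_links])   # adjacent articles
        frontier = next_frontier
    return corpus

# e.g. corpus = build_domain_region(["banco", "cuenta de ahorros", "deuda"])
```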

Web Page Extraction (WPE). In this component, the technique proposed by the authors of [1] is used, where the structure of the pages is analyzed by detecting all adjacent sets of similar records in the Document Object Model (DOM) tree [33], with the aim of identifying patterns and extracting text with quality. Next, the content of the web page, its title, subtitles and links to other web pages are extracted. Subsequently, Hypertext Markup Language (HTML) [2] tags are removed from the content of the documents. Afterward, tokens are extracted from the links to identify pages related to the parent page, and the stopwords [20] are removed from the content.

Table 2. Description of the Web Page Extraction algorithm.

Algorithm: Web Page Extraction
Input: URLs or domain
Procedure:
1: Identify patterns in the elements of the DOM tree of the web pages of a particular web domain, and create regular expressions for the identified patterns, P = (E, A, D). E is a finite set of DOM elements, E_i = {<body>, <meta>, <p>, ..., <tag>_i}, ∀ E_i ∈ DOM elements. A is a finite set of attributes of DOM elements, A_i = {class, lang, ..., attribute_i}, ∀ A_i ∈ attributes of DOM elements. D is a finite set of HTML text tags, D_i = {<h1>, <h2>, <text>, ..., <tag_text>_i}, ∀ D_i ∈ HTML tags.
2: Create a vector of web page objects, excluding web pages that lead to a vector page. WP_object is a finite set of web pages: WP_object = {WP_object1, ..., WP_objectn}, ∀ WP_object ∈ Domain.
3: Iterate over the vector of web pages and extract the daughter pages of each web page. If a daughter page does not exist in the object vector, it is added. WP_child is a finite set of daughter web pages corresponding to a parent web page of the domain: WP_child = {WP_child1, ..., WP_childn}, ∀ WP_child ∈ WP_object.
4: Iterate over the vector of web pages extracting the patterns found in step 1, such as titles, subtitles and content. WP'_object is a finite set of web pages with title, subtitle and content, where the title is an h1 tag, the subtitle is an h2 tag and the content is a finite set of strings, statements, sentences and stopwords: WP'_object = {WP'_object1, ..., WP'_objectn}, ∀ WP'_object ∈ Domain.
5: Then, tags, tokens and stopwords [20] are removed and the text of the extracted pages is normalized. WP'_object is a finite set of web pages with title, subtitle and content without stopwords.
6: Return a dictionary of extracted documents (identifier, extraction source, title, subtitle, content).
Output: Collection of articles

Finally, the text is normalized and the extracted data are exported in an Extensible Markup Language (XML) [4] file that contains an identifier, the title of the document, its subtitles, and its content. To expand on the activities implemented in the web page extraction process, we developed an algorithm called Web Page Extraction, whose details are shown in Table 2. Text Analysis is a component developed in CAOBA with the purpose of grammatically analyzing texts in Spanish through a RESTful web service [29] that runs continuously to receive requests. Its responsibility is to provide the natural language processing functionalities needed to analyze written texts, as well as


Table 3. Description of the Text Analysis algorithm.

Algorithm: Text Analysis
Input: Text = {statement_1, statement_2, ..., statement_n}, where a statement is something that someone writes.
Tasks:
1: Language Detection. This task determines the language of a given content.
2: Tokenization. Tokenization is the process of splitting a text string into a sequence of tokens, where a string is a finite set of statements and a token is a finite set of elements delimited by spaces in a string: T = {token_1, token_2, ..., token_n}, ∀ token_i ∈ string.
3: Stemming. This task reduces words to their base or root form: S = {wordRoot_1, wordRoot_2, ..., wordRoot_n}.
4: Lemmatization. This task groups the inflected forms of a word so it can be analyzed as a single element: L = {lemma_1, lemma_2, ..., lemma_n}, ∀ lemma_i ∈ inflected forms of a word.
5: POS. This task automatically extracts the part of speech of each word according to the Universal POS tags [27]: P = {word_1: pos_tag, word_2: pos_tag, ..., word_n: pos_tag}.
6: n-gram Generation. This task generates contiguous sequences of n elements (phonemes, syllables, letters, words or base pairs) of a given text:

P(w_1^n) = \prod_{k=1}^{n} P(w_k \mid w_1^{k-1})

8: NLP. This task executes all the previous tasks; NLP is the set of the outputs of tokenization, stemming, lemmatization and POS: NLP = (T, S, L, P).
9: Return a JSON object depending on the selected task.
Output: JSON [6] object

other fields in string format. The functionalities include text preprocessing tasks such as tokenization, stemming, stop-word removal, deletion of symbols and special characters, and identification of emoticons. The details of the algorithm's tasks are shown in Table 3. The development of this approach is based on the Design Science Research in Information Systems technique developed by [7,23,24], the best software engineering practices, the Agile Unified Process (AUP) [9], and SCRUM [30].
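To make the Web Page Extraction module of Table 2 more concrete, the sketch below performs a simplified domain-restricted crawl with BeautifulSoup (listed in Table 4) and the `requests` library (an assumption, since it is not listed there). It extracts only <h1>, <h2> and <p> content and does not reproduce the regular-expression pattern detection over DOM elements described in step 1 of Table 2; the function names and limits are illustrative only.

```python
import requests                      # assumption: not listed in Table 4
from bs4 import BeautifulSoup        # beautifulsoup v4.6.0 is listed in Table 4
from urllib.parse import urljoin, urlparse

def crawl_domain(start_url, max_pages=100):
    """Breadth-first crawl restricted to the start URL's domain, extracting
    the title (<h1>), subtitles (<h2>) and paragraph text of each page."""
    domain = urlparse(start_url).netloc
    queue, seen, documents = [start_url], set(), []
    while queue and len(documents) < max_pages:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        try:
            html = requests.get(url, timeout=10).text
        except requests.RequestException:
            continue
        soup = BeautifulSoup(html, "html.parser")
        documents.append({
            "source": url,
            "title": soup.h1.get_text(strip=True) if soup.h1 else "",
            "subtitles": [h.get_text(strip=True) for h in soup.find_all("h2")],
            "content": " ".join(p.get_text(" ", strip=True) for p in soup.find_all("p")),
        })
        for a in soup.find_all("a", href=True):          # enqueue daughter pages
            child = urljoin(url, a["href"]).split("#")[0]
            if urlparse(child).netloc == domain and child not in seen:
                queue.append(child)
    return documents
```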

4 Experiments and Results

In this section we evaluate the experimental results of the proposed automatic corpus generator in Spanish. There is no comparative evaluation method for the corpus extraction process in Spanish in specific domains; therefore, we only compare our results with the results obtained by human experts.


First of all, we describe the input data and the characteristics of the execution environment for each module. Subsequently, the execution time and the exceptions raised by each module are detailed. Finally, the results obtained and the conclusions drawn from them are presented.

4.1 Input and Environment

The Wikipedia Extraction module is executed with a set of keywords or terms relevant to the domain for which the corpus is to be built, such as bank, savings account, debt, investment. In the same way, the Web Page Extraction module is provided with the URL from which the extraction is to be performed, for instance https://www.banco.com.co/. It should be noted that for the first module the words introduced must be in Spanish, and for the second module the page or domain must be in Spanish, because the models used in this approach are only available in Spanish. The physical and logical characteristics of the execution environment are detailed in Table 4.

Table 4. Characteristics of the execution environment.

Environment: Anaconda v5.1; Language: Python v3.6 [32]; IDE: PyCharm 2018.1.2; OS: Windows 10 x64
Libraries: Wikipedia-API v0.3.7 [26], wikipedia v1.4.0 [26], textblob v0.15.0 [22], beautifulsoup v4.6.0, spaCy v2.0.11
Machine: Processor Intel(R) Core(TM) i7-6500U CPU @ 2.50 GHz, 2592 MHz, 2 main processors, 4 logical processors; physical memory (RAM) 8.00 GB; OS Microsoft Windows 10 Enterprise x64

4.2 Execution Time and Exceptions

According to the characteristics mentioned in the previous section, the estimated execution time was 1.2 h for the Wikipedia Extraction module and 9 h for the Web Page Extraction module. On the other hand, the exceptions raised when executing the modules were: the request timed out, the provided URL was somehow invalid, and the content of the response could not be decoded. The last exception occurs because the HTML pages consulted were sometimes not encoded in Spanish, and on Wikipedia some articles existed only in English. The extraction modules log each exception to a file, with the purpose of detecting errors or frequent exceptions in the developed modules.

4.3 Results and Validation

After executing the modules with the parameters mentioned in the last section, 54 articles relevant to the domain were identified by the Wikipedia extraction and, in the same way, 1393 relevant pages were identified by the web page extraction. Additionally, the quality of the lexicon is determined by extracting terminology from the built corpus, identifying n-grams [14] of length less than 4 and associating the following linguistic rules: noun, determinant + noun, adjective + noun, noun + determinant + adjective, determinant + noun + adjective. Finally, the terms extracted from the corpus are evaluated by a couple of experts in the domain, who verified the relevance of the terms in the study domain. Table 5 shows the details of the articles and web pages extracted, in addition to the precision, recall [34] and F-score [28].

Table 5. Precision, recall, and F-score of the extracted documents.

                   Wikipedia   Web page
T.Doc relevant        54         1393
T.Doc retrieved       64         1475
Doc irrelevant        18          158
Doc relevant          46          158
Precision             72%         89%
Recall                84%         94%
F-score               78%         92%

To determine the quality of the lexicon, 1317 terms were extracted among uni-grams, bi-grams and tri-grams matching the linguistic rules mentioned in the previous section. Figure 2 shows the details of each subgroup of extracted n-grams. For the evaluation process, the inverse document frequency weighting (TF-IDF) [3] is first calculated for the 1317 terms. Then, the terms with a TF-IDF lower than 10% are excluded, resulting in 841 terms, which are then evaluated by experts in the domain. Finally, 640 terms corresponding to the domain remain. Figure 3 shows the dispersion of the terms extracted with a TF-IDF greater than 10%. The validation of terms was done by a couple of experts in the domain, who checked one by one whether the terms were relevant to the domain or not. Below is the top 20 most frequent terms of the studied domain: inflation, bank, crisis, BBVA, debt, overdraft, pension, income, save, CDT, restructuring, transfer, check, tax, law, GDP, lease, cash, price, and fund.
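A hedged sketch of this term-extraction step is shown below, using scikit-learn's TfidfVectorizer to produce uni-, bi- and tri-gram candidates and to discard the lowest-scoring ones. The POS-based linguistic rules (noun, determinant + noun, etc.) and the exact meaning of the "TF-IDF lower than 10%" threshold are not reproduced; the quantile-based filter, the function name and the use of the mean TF-IDF per term are our own assumptions.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def candidate_terms(documents, keep_fraction=0.9):
    """Extract uni-, bi- and tri-gram candidates from the corpus and keep the
    top fraction by mean TF-IDF (an illustrative reading of the 10% filter)."""
    vectorizer = TfidfVectorizer(ngram_range=(1, 3), lowercase=True)
    tfidf = vectorizer.fit_transform(documents)
    mean_scores = np.asarray(tfidf.mean(axis=0)).ravel()
    threshold = np.quantile(mean_scores, 1.0 - keep_fraction)
    terms = np.array(vectorizer.get_feature_names_out())
    keep = mean_scores > threshold
    # ranked list of (term, score), highest TF-IDF first, for expert validation
    return sorted(zip(terms[keep], mean_scores[keep]), key=lambda t: -t[1])
```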


Fig. 2. Terms extracted from the corpus

Fig. 3. Diagram of terms dispersion

5 Conclusion and Challenges

We proposed and evaluated a novel approach to extract text from unstructured Wikipedia articles and web pages. The approach is based on the analysis of the domain regions of the Wikipedia articles and on the analysis of the elements of the DOM tree by identifying patterns in the web pages. The preliminary results reported in this document helped us put into practice key elements for the construction of corpora in Spanish, as well as text mining techniques. Among the challenges that exist today, the automatic extraction of documents is associated with the complexity and dynamics of the concepts involved. This has led many strategies to opt for machine learning systems or training algorithms that derive characteristics of the corpus to determine some type of class, as in polarity analysis, which automatically detects positive and negative words. However, knowledge bases have a greater capacity for analysis within a given domain under its semantic rules; the approach of this work seeks to explore, initially, the automation of these knowledge bases, although manual assessment is still vital for the result to be optimal [10]. Other approaches to the construction of knowledge or the generation of dynamic rules would be important to take into account in this type of process, as well as the incorporation of cognitive computing, of which IBM, through Watson, makes use in systems that perform the same tasks addressed by NLP, and the challenge that social networks represent, with the possibility of multicultural interaction that also imbues knowledge with multiculturalism [10].

Acknowledgements. The tool presented was developed within the construction of research capabilities of the Center of Excellence and Appropriation in Big Data and Data Analytics (CAOBA), led by the Pontificia Universidad Javeriana and funded by the Ministry of Information Technologies and Telecommunications of the Republic of Colombia (MinTIC).

References 1. Arnold, P., Rahm, E.: Automatic extraction of semantic relations from wikipedia. Int. J. Artif. Intell. Tools 24(2), 1540010 (2015) 2. Berners-Lee, T., Connolly, D.: Hypertext markup language - 2.0. Technical report, USA (1995) 3. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent dirichlet allocation. J. Mach. Learn. Res. 3(Jan), 993–1022 (2003) 4. World Wide Web Consortium, et al.: Extensible markup language (xml) 1.1 (2006) 5. Crawford, W., Csomay, E.: Doing Corpus Linguistics. Routledge, Abingdon (2015) 6. Crockford, D.: The application/json media type for javascript object notation (JSON) (2006) 7. Drechsler, A., Hevner, A.: A four-cycle model of is design science research: capturing the dynamic nature of is artifact design. In: Breakthroughs and Emerging Insights from Ongoing Design Science Projects: Research-in-Progress Papers and Poster Presentations from the 11th International Conference on Design Science Research in Information Systems and Technology (DESRIST). DESRIST 2016, 23–25 May 2016, St. John, Canada (2016) 8. Dutta, B., Chatterjee, U., Madalli, D.P.: YAMO: yet another methodology for large-scale faceted ontology construction. J. Knowl. Manag. 19(1), 6–24 (2015) 9. Edeki, C.: Agile unified process. Int. J. Comput. Sci. 1(3), 13–17 (2013)


10. Fan, J., Kalyanpur, A., Gondek, D.C., Ferrucci, D.A.: Automatic knowledge extraction from documents. IBM J. Res. Dev. 56(3.4), 5:1–5:10 (2012) 11. Ferrara, E., De Meo, P., Fiumara, G., Baumgartner, R.: Web data extraction, applications and techniques: a survey. Knowl.-Based Syst. 70, 301–323 (2014) 12. Gharib, T.F., Badr, N.L., Haridy, S., Abraham, A.: Enriching ontology concepts based on texts from WWW and corpus. J. UCS 18(16), 2234–2251 (2012) 13. Jiang, J.: Information extraction from text. In: Aggarwal, C., Zhai, C. (eds.) Mining Text Data, pp. 11–41. Springer, Boston (2012). https://doi.org/10.1007/978-14614-3223-4 2 14. Jurafsky, D., Martin, J.H.: Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Prentice Hall series in artificial intelligence, pp. 1–1024 (2009) 15. Kanakaraj, M., Kamath, S.S.: NLP based intelligent news search engine using information extraction from e-newspapers. In: 2014 IEEE International Conference on Computational Intelligence and Computing Research (ICCIC), pp. 1–5. IEEE (2014) 16. Kanavos, A., Makris, C., Plegas, Y., Theodoridis, E.: Ranking web search results exploiting wikipedia. Int. J. Artif. Intell. Tools 25(03), 1650018 (2016) 17. Kozareva, Z., Hovy, E.: Tailoring the automated construction of large-scale taxonomies using the web. Lang. Resour. Eval. 47(3), 859–890 (2013) 18. K¨ u¸cu ¨k, D., Arslan, Y.: Semi-automatic construction of a domain ontology for wind energy using wikipedia articles. Renew. Energy 62, 484–489 (2014) 19. Lahbib, W., Bounhas, I., Slimani, Y.: Arabic terminology extraction and enrichment based on domain-specific text mining. In: 2015 IEEE 27th International Conference on Tools with Artificial Intelligence (ICTAI), pp. 340–347. IEEE (2015) 20. Leskovec, J., Rajaraman, A., Ullman, J.D.: Mining of Massive Datasets. Cambridge University Press, Cambridge (2014) 21. Liu, S., Zhang, C.: Termhood-based comparability metrics of comparable corpus in special domain. In: Ji, D., Xiao, G. (eds.) CLSW 2012. LNCS (LNAI), vol. 7717, pp. 134–144. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-363375 15 22. Loria, S., et al.: TextBlob: simplified text processing. Secondary TextBlob: simplified text processing (2014) 23. March, S.T., Smith, G.F.: Design and natural science research on information technology. Decis. Support Syst. 15(4), 251–266 (1995) 24. March, S.T., Storey, V.C.: Design science in the information systems discipline: an introduction to the special issue on design science research. MIS Q. 32, 725–730 (2008) 25. Medelyan, O., Witten, I.H., Divoli, A., Broekstra, J.: Automatic construction of lexicons, taxonomies, ontologies, and other knowledge structures. Wiley Interdisc. Rev.: Data Min. Knowl. Discov. 3(4), 257–279 (2013) 26. Morell, M.F.: The Wikimedia foundation and the governance of Wikipedias infrastructure: historical trajectories and its hybrid character. In: Critical Point of View: A Wikipedia Reader, pp. 325–341 (2011) 27. Petrov, S., Das, D., McDonald, R.: A universal part-of-speech tagset. arXiv preprint arXiv:1104.2086 (2011) 28. Powers, D.M.W.: Evaluation: from precision, recall and F-measure to ROC, informedness, markedness & correlation. J. Mach. Learn. Technol. 2(1), 37–63 (2011) 29. Richardson, L., Ruby, S.: RESTful Web Services. O’Reilly Media, Inc., Sebastopol (2008)


30. Schwaber, K., Beedle, M.: Agile Software Development with Scrum, vol. 1. Prentice Hall, Upper Saddle River (2002) 31. V´ allez, M., Pedraza-Jim´enez, R., Codina, L., Blanco, S., Rovira, C.: A semiautomatic indexing system based on embedded information in HTML documents. In: Library Hi Tech, vol. 33, no. 2, pp. 195–210 (2015) 32. Van Rossum, G., Drake, F.L.: Python Language Reference Manual. Network Theory, Bristol (2003) 33. Wood, L., Nicol, G., Robie, J., Champion, M., Byrne, S.: Document object model (DOM) level 3 core specification (2004) 34. Zhu, M.: Recall, precision and average precision. Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, vol. 2, p. 30 (2004)

Comparing Graph Similarity Measures for Semantic Representations of Documents

Rubén Manrique(B), Felipe Cueto-Ramirez, and Olga Mariño

Systems and Computing Engineering Department, School of Engineering, Universidad de los Andes, Bogotá, Colombia
{rf.manrique,f.cueto10,olmarino}@uniandes.edu.co

Abstract. Documents semantic representations built from open Knowledge Graphs (KGs) have proven to be beneficial in tasks such as recommendation, user profiling, and document retrieval. Broadly speaking, a semantic representation of a document can be defined as a graph whose nodes represent concepts and whose edges represent the semantic relationships between them. Fine-grained information about the concepts found in the KGs (e.g. DBpedia, YAGO, BabelNet) can be exploited to enrich and refine the representation. Although this kind of semantic representation is a graph, most applications that compare semantic representations reduce this graph to a “flattened” concept-weight representation and use existing well-known vector similarity measures. Consequently, relevant information related to the graph structure is not exploited. In this paper, different graph-based similarity measures are adapted to semantic representation graphs and are implemented and evaluated. Experiments performed on two datasets reveal better results when using the graph similarity measures than when using vector similarity measures. This paper presents the conceptual background, the adapted measures and their evaluation and ends with some conclusions on the threshold between precision and computational complexity.

1 Introduction

In recent years, great efforts have been made in the development of technologies and applications that incorporate semantic models exploiting the relational knowledge found in Knowledge Graphs (KG). A Knowledge Graph is defined as a large group of facts about a set of entities described by the classes that compose it and instances of these classes in a particular ontology [5]. KGs like DBpedia (http://dbpedia.org/), Yago (www.yago-knowledge.org/) and BabelNet (http://babelnet.org/) incorporate knowledge, which is freely accessible and supported by the mature technologies of the Semantic Web, from multiple domains.


To respond to the nature of this large, open, machine-readable knowledge, new representations of semantically-enriched documents have been used in different tasks such as content recommendation [10,14], user profiling [11,17], document retrieval [3,20] and query reformulation [12]. These representations are constructed from the concepts identified in the textual information of the document through Named Entity Recognition and Entity Linking tools. Using these concepts, novel algorithms have been developed to extract new, related relevant information from the KGs [10,15,18]. This extracted information is interconnected and expresses relationships between the concepts at the type level (classes), topics and hierarchies (categories), and characteristics expressed through the properties defined by the KG ontology. In essence, semantic representations are graphs whose nodes represent concepts and whose edges represent the existence of a semantic relationship between the connected nodes. Instead of exploiting this multidimensional graph structure, most applications use a “flattened” version that considers the set of nodes identified in combination with some weighting measure [10,17,20]. Even though this weighting measure may consider the importance or interconnection of the concept in the graph, much of the structural information of the graph is discarded due to flattening into vectors. The previous problem can be attributed to the fact that applications require the computation of distances and similarities between the document representations. Vectorial representations can be implemented efficiently and it is also possible to apply simple algorithms to them, such as cosine similarity. As a result of recent developments in graph matching, different algorithms have been proposed to compare graph-based representations [1]. These algorithms can be used to compare two semantic representations without flattening the representations. However, some of these proposals need to be adjusted to deal with the characteristics of semantic representations, particularly the absence of a common set of nodes between two representations. In this paper, we focus on comparing different graph similarity measures proposed in the literature about semantic representations. Using a semantic representation proposed by the authors in previous work, we implement different algorithms to calculate similarities. To compare these algorithms, we use two different datasets. The Lee50 dataset [9] consists of a set of short documents in which each pair of documents is scored according to their semantic relatedness by ten human annotators. The other set, a scholarly paper recommendation dataset [10], contains the profiles of eleven users and a corpus of more than 5000 academic papers both represented through semantic representations.

2 Related Work

With the growth and popularity of KGs, the use of semantic representations that exploit semantic content has been increasing. Semantic representations have been used to enrich vector spaces in information retrieval tasks. First, [20] shows that including a semantic layer that takes advantage of the connectivity and


hierarchical information of the concepts in the KG improves traditional text-based retrieval. Later, [3] proposes a representation for queries and documents using multiple semantic layers that exploit information from multiple KGs. These semantic layers include the unified resource identifier (URI) of each annotated concept to link them to DBpedia and YAGO, and a frame containing temporal values explicitly expressed in the text or associated with DBpedia concepts. In content-based recommendation tasks, semantic representations have been used for user modeling in social networks [17] and for modeling user research interests [10,11]. These applications based on semantic representations have shown superior results in comparison with other representations, such as the classical bag-of-words vector space model. Nevertheless, they do not exploit the structural information of the graph produced by the semantic connections of the concepts, since their measures are based on a flattened representation of the graph. On the other hand, as a result of recent developments in graph matching, different algorithms have been proposed to solve the problem of comparing graph-based representations [1]. Though little has been explored in this regard for semantic representations, these graph matching algorithms can be adapted to support the calculation of similarities; it is therefore not necessary to flatten the representation, thus allowing the structure of the graph to be taken into account. Some of the most important measures are presented in the following paragraphs. It is also important to emphasize that there is no single criterion to choose the best measure, since their performance depends greatly on the characteristics of the graph [16]. As such, experimentation is the most appropriate way to select the best algorithm for the problem at hand [8]. Since the nodes in the semantic representations are unambiguous concepts identified by URIs, we are interested in the algorithms that take advantage of this known correspondence between nodes. A basic strategy known as VEO (vertex edge overlap) [16] measures the similarity between two graphs by calculating the overlap between their edges and nodes, ignoring the edge or node weights. GED (graph edit distance) is a more flexible similarity measure that contemplates the differences in edges and nodes as well as the sets of associated weights [6]. There are many adaptations of GED; however, we use the bipartite variation of GED [4] to limit the algorithm's complexity as much as possible. Another graph similarity measure we consider is signature similarity [16]. This method creates a signature vector of 1s and 0s for each graph using the weights of the nodes. Then, it compares the vectors by counting the number of matches between the two. It normalizes the result and provides similarity measurements between 0 and 1. A different approach to graph similarity comprises the variations that have been proposed for the MCS (maximum common subgraph) algorithm [2]. The MCS is the largest sub-graph that is common to the considered graphs. Different metrics use the size of the MCS as an indicator of similarity. The size of a sub-graph can be measured in several ways; however, in this paper we focus on the number of nodes. This method is particularly useful in biological and chemical analysis [21].


There is little research in which measures of similarity based on graphs have been applied to semantic representations. Specifically, the work done in [18] is the closest to the purpose of our research. The authors present an approach to the calculation of document semantic similarity over graph-based structures. A variation of GED is used to measure the similarity of two semantic representations. In contrast to that work, we implement and compare different graph-based similarity measures on top of a more refined semantic representation that incorporates expansion and filtering processes.

3 Graph-Based Similarity Measures

To properly define and understand these algorithms, we must first set up the set-theoretic context of the problem and define the notation used to explain them. It is important to clarify that vertex and node are one and the same. Therefore, we henceforth understand G = (V, E) as a directed graph with a set V of vertices and a set E of edges, where both edges and vertices carry weights. We define an edge as a 2-tuple e = (o, d) with an origin o and a destination d. We refer to the edge and vertex weights as w(e) and w(v), respectively. For the complexity of each algorithm, we use Big O notation. The vertices and the edges are both kept in two hash tables: for the vertices, we use the label as the hashed key; for the edges, we combine the origin and the destination to form a unique hashed key. The sizes of the sets V and E are known. This allows us to estimate the complexity of the most efficient version of each algorithm.
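As an illustration of this data layout, the following minimal Python sketch stores a weighted directed graph in two hash tables, keyed by vertex label and by (origin, destination) pairs. It is only an assumption about a possible implementation, not the authors' actual code.

import math

class Graph:
    """Weighted directed graph kept in two hash tables, as described above."""

    def __init__(self):
        self.vertices = {}   # label -> vertex weight w(v)
        self.edges = {}      # (origin, destination) -> edge weight w(e)

    def add_vertex(self, label, weight=1.0):
        self.vertices[label] = weight

    def add_edge(self, origin, destination, weight=1.0):
        # the (origin, destination) tuple acts as the unique hashed key
        self.edges[(origin, destination)] = weight

# Example usage with two concept nodes
g = Graph()
g.add_vertex("While_loop", 0.8)
g.add_vertex("Control_flow", 0.5)
g.add_edge("While_loop", "Control_flow", 0.3)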

3.1 Vertex Edge Overlap

Among the simpler algorithms, we find VEO (vertex edge overlap). This method seeks to simplify the problem of graph matching by counting the total number of vertices and edges that match and dividing the result by the sum of the total number of vertices and the total number of edges of each graph. This factor is multiplied by 2 in order to normalize the result to the correct scale.

$\mathrm{VEO}(G, G') = 2\,\dfrac{|V \cap V'| + |E \cap E'|}{|V| + |V'| + |E| + |E'|}$   (1)

This algorithm can be applied on any graph structure since it only uses variables found on the graph. Even so, it is an extremely narrow approach since it does not take vertex or edge weights or path information into account. This approach is based on the simple form of the GED (graph edit distance) algorithm and is normalized to a scale of 0 to 1, where 1 means completely similar and 0 completely dissimilar. The complexity of this algorithm is O(V + E) since it only requires a single iteration over the sets of one of the graphs in order to find the matching pairs in both vertices and edges.
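A minimal Python sketch of this computation, assuming the two-hash-table layout sketched above (illustrative, not the authors' implementation):

def veo(g1, g2):
    """Vertex edge overlap between two Graph objects (Eq. 1)."""
    common_vertices = len(set(g1.vertices) & set(g2.vertices))
    common_edges = len(set(g1.edges) & set(g2.edges))
    total = len(g1.vertices) + len(g2.vertices) + len(g1.edges) + len(g2.edges)
    # the factor 2 normalizes the result to the [0, 1] range
    return 2.0 * (common_vertices + common_edges) / total if total else 0.0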

3.2 Node Graph Edit Distance

To properly take advantage of the information found on the semantic models, we devised a highly modified version of GED. The first issue was creating a method that takes the weight of the vertices into account. This algorithm would need to provide a normalized measure of similarity that used the weight of the nodes. 

$\mathrm{GED}_{nodes}(G, G') = \dfrac{\sum_{v \in V} w(v) + \sum_{v \in V'} w'(v) - \sum_{v \in V \cap V'} \left(w(v) + w'(v)\right) + \sum_{v \in V \cap V'} \left|w(v) - w'(v)\right|}{\sum_{v \in V} w(v) + \sum_{v \in V'} w'(v) - \sum_{v \in V \cap V'} \left(w(v) + w'(v)\right) + \sum_{v \in V \cap V'} \max\left(w(v), w'(v)\right)}$   (2)

We can understand the dividend as the sum of vertex weights found only in G, plus the sum of vertex weights found only in G', plus the sum of the differences in weights over the vertex intersection of G and G'. The divisor is the total sum of vertex weights found only in G, plus the total sum of vertex weights found only in G', plus the sum of the maximum weights over the vertex intersection of G and G'. The complexity of this algorithm is O(V + V') since it requires iterating over the vertices of both graphs in order to find the sum of weights in each one.
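A direct Python transcription of Eq. (2), under the same dictionary-based layout as the sketches above (illustrative, not the authors' code):

def ged_nodes(g1, g2):
    """Node graph edit distance between two Graph objects (Eq. 2)."""
    only_1 = sum(w for v, w in g1.vertices.items() if v not in g2.vertices)
    only_2 = sum(w for v, w in g2.vertices.items() if v not in g1.vertices)
    common = set(g1.vertices) & set(g2.vertices)
    diff = sum(abs(g1.vertices[v] - g2.vertices[v]) for v in common)
    max_w = sum(max(g1.vertices[v], g2.vertices[v]) for v in common)
    denominator = only_1 + only_2 + max_w
    return (only_1 + only_2 + diff) / denominator if denominator else 0.0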

3.3 Edge Graph Edit Distance

Using a similar approach to node graph edit distance, we can obtain the edge graph edit distance. This formula works for both directed and non-directed graphs. With that in mind, we can convert a directed graph into a non-directed graph by adding the weights of corresponding opposite edges. This allows us to obtain a higher, and in some cases, more appropriate similarity measure. 

$\mathrm{GED}_{edges}(G, G') = \dfrac{\sum_{e \in E} w(e) + \sum_{e \in E'} w'(e) - \sum_{e \in E \cap E'} \left(w(e) + w'(e)\right) + \sum_{e \in E \cap E'} \left|w(e) - w'(e)\right|}{\sum_{e \in E} w(e) + \sum_{e \in E'} w'(e) - \sum_{e \in E \cap E'} \left(w(e) + w'(e)\right) + \sum_{e \in E \cap E'} \max\left(w(e), w'(e)\right)}$   (3)

We can understand the dividend as the total sum of edge weights found only in G, plus the total sum of edge weights found only in G', plus the total sum of the differences in weights over the edge intersection of G and G'. The divisor is the total sum of edge weights found only in G, plus the total sum of edge weights found only in G', plus the sum of the maximum weights over the edge intersection of G and G'. The complexity of this algorithm is O(E + E') since it requires iterating over the edges of both graphs in order to find the total sum of weights in each one.
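The edge variant follows the same pattern over the edge-weight table. The sketch below also shows one way (an assumption, not the authors' code) to merge opposite directed edges into a single undirected edge by summing their weights, as described above:

def to_undirected_edges(edges):
    """Merge (a, b) and (b, a) into one order-independent key with summed weight."""
    undirected = {}
    for (a, b), w in edges.items():
        key = (a, b) if (a, b) <= (b, a) else (b, a)
        undirected[key] = undirected.get(key, 0.0) + w
    return undirected

def ged_edges(e1, e2):
    """Edge graph edit distance (Eq. 3); e1, e2 map edge keys to weights."""
    only_1 = sum(w for e, w in e1.items() if e not in e2)
    only_2 = sum(w for e, w in e2.items() if e not in e1)
    common = set(e1) & set(e2)
    diff = sum(abs(e1[e] - e2[e]) for e in common)
    max_w = sum(max(e1[e], e2[e]) for e in common)
    denominator = only_1 + only_2 + max_w
    return (only_1 + only_2 + diff) / denominator if denominator else 0.0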

3.4 Total Graph Edit Distance

Given the values of node similarity and edge similarity, we can create a more complete measure by adding them together.

$\mathrm{GED}(G, G') = \dfrac{\mathrm{GED}_{nodes}(G, G') + \mathrm{GED}_{edges}(G, G')}{2}$   (4)


This formula creates a new measurement that gives equal weight to vertex similarity and edge similarity. It provides a normalized value between 0 and 1. The complexity of this algorithm is O(V + V' + E + E') since it is the sum of both the edge graph edit distance and the node graph edit distance.

3.5 Maximum Common Subgraph

The MCS (maximum common subgraph) of two graphs can be calculated by finding the common sub-graph with the most nodes. In order to do this, the MCS algorithm finds all common sub-graphs and then calculates the number of nodes in the largest one. To normalize the result, it divides this number by the number of nodes in the graph with the most nodes.

$\mathrm{MCS}_{nodes}(G, G') = \dfrac{|\mathrm{MCS}(G, G')|}{\max(|V|, |V'|)}$   (5)

This formula creates a value between 0 and 1, where 1 is completely similar and 0 is completely dissimilar. The MCS is calculated via Algorithm 1.

Algorithm 1. Maximum common subgraph
Require: G = (V, E), G' = (V', E')

function mcsNodes(G, G')
    currentNodeMaximum := 0
    for all v ∈ V do
        count := 0
        visited := []                       ▷ Keeps track of explored nodes in the current subgraph
        result := mcsNodesRecursor(count, visited, v, G, G')
        if result[0] > currentNodeMaximum then
            currentNodeMaximum := result[0]
        end if
    end for
    return currentNodeMaximum
end function

function mcsNodesRecursor(count, visited, v, G, G')
    outwardEdges := outEdges(v)             ▷ Gets all out-edges of v in G
    inwardEdges := inEdges(v)               ▷ Gets all in-edges of v in G
    if v ∉ visited then
        if v ∈ V' then
            count := count + 1
            visited := visited + v
            for all e ∈ outwardEdges do
                if e ∈ E' then
                    r := e[1]               ▷ Extracts destination node from edge
                    result := mcsNodesRecursor(count, visited, r, G, G')
                    count := result[0]
                    visited := result[1]
                end if
            end for
            for all e ∈ inwardEdges do
                if e ∈ E' then
                    r := e[0]               ▷ Extracts origin node from edge
                    result := mcsNodesRecursor(count, visited, r, G, G')
                    count := result[0]
                    visited := result[1]
                end if
            end for
        end if
    end if
    return (count, visited)
end function


In order to find a subgraph, Algorithm 1 recursively travels through the common connections between the two considered graphs. The algorithm starts by iterating over all the nodes in one of the graphs. Once it finds a matching node, the algorithm adds it to the current subgraph hash table and iterates through its edges looking for common connections. If an edge match is found, the algorithm adds the destination node and subsequently processes the edges of this new node.

The MCS algorithm is usually defined in the context of non-directed graphs [2]. Since our graphs are directed, we adapted the original algorithm to explore nodes through outgoing and incoming edges. We accomplish this by iterating over outgoing edges and then iterating over incoming edges. We keep track of explored nodes by adding their labels to a hash table. If the two nodes in a pair are connected by both an incoming and an outgoing edge, we only explore the subsequent node once and exclude it upon the second inspection. Once a subgraph has been identified and there are no more matching edges found on the border nodes, the algorithm determines the subgraph size and compares it with the largest subgraph previously found. This algorithm provides us with the size of the largest possible common subgraph, in terms of nodes included, between two graphs.

Finding the MCS is, in general, an NP-complete problem. Nevertheless, it is possible to state the worst-case time complexity. As mentioned in [2], the worst-case time complexity of the MCS algorithm is O((|V| · |V'|)^|V|).
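Because the node correspondence is known (nodes are concepts identified by URIs), the exploration above amounts to finding connected components of the intersection graph. The following compact Python sketch (an illustrative reimplementation under that assumption, not the authors' code) counts the nodes of the largest such component and normalizes the result as in Eq. (5):

def mcs_node_count(g1, g2):
    """Size (in nodes) of the largest connected common subgraph of two Graph objects."""
    common_nodes = set(g1.vertices) & set(g2.vertices)
    common_edges = set(g1.edges) & set(g2.edges)
    # adjacency restricted to common edges; both directions are explored
    neighbours = {v: set() for v in common_nodes}
    for a, b in common_edges:
        if a in common_nodes and b in common_nodes:
            neighbours[a].add(b)
            neighbours[b].add(a)
    best, seen = 0, set()
    for start in common_nodes:
        if start in seen:
            continue
        stack, size = [start], 0
        while stack:
            v = stack.pop()
            if v in seen:
                continue
            seen.add(v)
            size += 1
            stack.extend(neighbours[v] - seen)
        best = max(best, size)
    return best

def mcs_similarity(g1, g2):
    """Normalized MCS similarity as in Eq. (5)."""
    denom = max(len(g1.vertices), len(g2.vertices))
    return mcs_node_count(g1, g2) / denom if denom else 0.0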

4 Semantic Representation

In this section, we explain the semantic representation building process in general. The essential information is taken from a KG using the concepts found in the document text. A KG consists of a set of resources C (hereafter, we use concept and entity interchangeably to refer to resources of the KG) and literals L that are interrelated through a set of properties/predicates P. Under an RDF model, KG data consists of a set of statements S ⊂ C × P × (C ∪ L). Each s ∈ S is a triplet composed of a subject, a predicate, and an object/literal. For this paper, DBpedia was employed as the KG; however, other KGs can also be employed or combined to build the representation.

Once the KG is defined, the representation of a document is constructed following the process depicted in Fig. 1. The process begins with the extraction of the concepts mentioned in the text (i.e., annotations) contained in the document. DBpedia Spotlight (http://www.dbpedia-spotlight.org/) and Babelfy (http://babelfy.org/), two automatic entity linking and word sense disambiguation tools, were used for this task. Then, the Expansion Module receives the initial set of annotations and expands it through the rich set of relationships in the KG. In this module, new expanded concepts that are not


Fig. 1. General overview of the semantic representation process

found in the text, but are related to the annotations, are incorporated into the representation. We follow two different expansion approaches:

– Category-based expansion: We add the hierarchical information of each concept. We find such information in DBpedia through the Dublin Core dct:subject property.
– Property-based expansion: The semantic representation is enriched with the set of resources recovered by following the set of properties of the KG ontology.

As a result of the expansion, an initial set of nodes for the representation is obtained. A weight for each node is assigned by the Weighting Module, which estimates the importance of each concept for the document. For annotations and expanded concepts, different weighting strategies are employed. Finally, in the Filtering Module, we apply a filtering technique to select concepts that are highly connected, such that weakly connected concepts are discarded. The strategy seeks connection paths of length l between annotations because it uses these to create edges in the representation and assign the corresponding edge weights. Following previous results, we limit the path length to a maximum of l = 2. From these processes, a graph is built whose nodes are concepts and whose edges express the existence of a linkage between two concepts in the KG. A more detailed description of each of these modules can be found in previous works by the authors [10,11]. The resulting representation follows Definition 1.

Definition 1. The semantic representation Gi of a document ri is a directed weighted graph Gi = (Vi, Ei, w(ri, c), w(e)), where both nodes and edges have an associated weight defined by the functions w(ri, c): V → R+ and w(e): E → R+. The set of nodes Vi = {c1, c2, ..., ck} are concepts belonging to the space of a KG (ck ∈ C). The node weight w(ri, c) denotes how relevant the node c is for the document. A connection edge between two nodes (ca, cb) represents the existence of at least one statement s in the KG that links both concepts. The weight of the edge w(e) denotes how relevant this linkage is in Gi.

The definition above refers to a directed graph to the extent that the direction of the relationships found in the KG is preserved. Nevertheless, it is also possible to build a non-directed version by unifying the edges that share the same nodes but go in opposite directions. The weight of the resulting non-directed edge is the sum of the weights of the directed opposite edges. We also use this non-directed version


of the semantic representation to evaluate the contribution of the direction in the similarity calculation. Additionally, even though some information is lost, this non-directed version is lighter and reduces the computational cost. Finally, we define the flattened version of the representation as that which only preserves the set of nodes and their weights (Definition 2). Put simply, edges are removed from the representation. Since it is easy to transform this flattened version into a vector representation, measures such as the cosine similarity, L2 norm or Manhattan distance can be used for the calculation of similarity between documents. For the flattened version, and following previous results [11], we use the cosine similarity.

Definition 2. The flattened semantic representation Ri of a document ri is a set of weighted KG entities/concepts. A weighted concept is a pair (c, w(ri, c)), where the weight w(ri, c) denotes how important the concept c is for the document ri, and is computed by a certain function w.

$R_i = \{(c, w(r_i, c)) \mid c \in C\}$   (6)
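For the flattened representation of Definition 2, the cosine similarity can be computed directly over the weighted concept sets. A minimal, illustrative Python sketch (not the authors' implementation):

import math

def cosine_flattened(r1, r2):
    """Cosine similarity between two flattened representations.

    r1, r2: dicts mapping concept URIs to weights w(r, c).
    """
    common = set(r1) & set(r2)
    dot = sum(r1[c] * r2[c] for c in common)
    norm1 = math.sqrt(sum(w * w for w in r1.values()))
    norm2 = math.sqrt(sum(w * w for w in r2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0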

5 Evaluation

5.1 Datasets

Our evaluation aims to compare the graph similarity measures described above when the graphs are semantic representations of documents. To this end, we select two different datasets that have been used in the past.

The first one, Lee50 [9], is a compilation of 50 short documents collected from the Australian Broadcasting Corporation's news mail service. Each possible pair of documents was scored by ten human judges on their semantic relatedness. The final similarity judgment for every pair is obtained by averaging all annotations of the judges, so the final collection contains 1225 relatedness scores. With this dataset we can compare how well the combination of the representation and the graph similarity measures approximates the human notion of similarity.

The second dataset, Man17 [10], was developed for the scholarly paper recommendation task. It contains eleven researcher profiles built from the concepts found in their open publications. For this dataset, the semantic representation is built for the research profiles and the candidate corpus of documents (>5000 documents). Different from Lee50, documents in Man17 are larger since they are academic papers that usually contain more than 2000 words. Hence, the dataset has a greater number of concepts in the text, and a larger graph is produced in terms of the number of nodes and edges. For each profile, this dataset contains the set of relevant papers from the candidate set. The task here is to try to recover relevant papers by comparing the research profile with the candidate corpus (i.e., content-based recommendation) using the different graph similarity measures.

In order to evaluate the performance on Lee50, we report the Pearson (r) and Spearman (ρ) correlations. According to [7], these correlation metrics are appropriate to evaluate relatedness measures and have been used in related work, so


we are able to compare our results to other approaches. For the Man17 dataset, we use the following typical metrics for the evaluation of Top-N recommender tasks [19]: MRR (Mean Reciprocal Rank), MAP@10 (Mean Average Precision), and NDCG@10 (Normalized Discounted Cumulative Gain). Following the original paper, we select N = 10 as the recommendation objective [10]. In this dataset, the relevance judgments are binary (i.e., a recommended document is either relevant to the user or not), so we use a binary relevance scale for the calculation of NDCG. The final NDCG is calculated by averaging the results for each user profile.
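For reference, these Top-N metrics can be computed as in the sketch below (binary relevance; one common normalization convention is assumed for MAP@10, and this is not the exact evaluation code used in this work):

import math

def mrr(ranked, relevant):
    # ranked: list of document ids in recommendation order; relevant: set of ids
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / i
    return 0.0

def average_precision_at_k(ranked, relevant, k=10):
    hits, score = 0, 0.0
    for i, doc in enumerate(ranked[:k], start=1):
        if doc in relevant:
            hits += 1
            score += hits / i
    return score / min(len(relevant), k) if relevant else 0.0

def ndcg_at_k(ranked, relevant, k=10):
    dcg = sum(1.0 / math.log2(i + 1)
              for i, doc in enumerate(ranked[:k], start=1) if doc in relevant)
    ideal = sum(1.0 / math.log2(i + 1) for i in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal else 0.0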

5.2 Semantic Annotators and Path Length

We use DBpedia Spotlight (DBS) and Babelfy to annotate text documents. DBpedia Spotlight allows us to configure the level of annotation (precision/recall trade-off) by confidence/support parameters. The support parameter specifies the minimum number of inlinks a DBpedia resource has to have in order to be annotated, while the confidence parameter controls the topical pertinence and the contextual ambiguity to avoid incorrect annotations as much as possible [13]. We define 5 different configurations for DBS: DBS1 (support: 5, confidence: 0.35), DBS2 (support: 5, confidence: 0.40), DBS3 (support: 5, confidence: 0.45), DBS4 (support: 10, confidence: 0.40), and DBS5 (support: 20, confidence: 0.40). We explore the influence of the confidence parameter with the first three configurations. Values higher than 0.45 are not considered since this would significantly reduce the number of annotations obtained. This can be particularly detrimental in short documents such as those handled in Lee50. For the support parameter we use values of 5, 10 and 20; our hypothesis is that the identification of highly specialized concepts may be affected by this parameter. For Babelfy, no special configuration was used and the complete set of annotations recovered is used in the semantic representation building process.

As previously mentioned, the semantic representation input parameter is the path length (l), which specifies the maximum depth to look for connections between concepts in the KG. This parameter affects the edge composition, and thereby the graph structure. We want to explore the effect of this parameter on the graph similarity measures, so we use path lengths of l = 1 and l = 2.
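For illustration, an annotation request with these confidence/support parameters can be issued against the public DBpedia Spotlight REST endpoint roughly as follows. This is a sketch: the endpoint URL and the exact deployment used in this work are assumptions.

import requests

def annotate(text, confidence=0.35, support=5):
    """DBS1-style request (support 5, confidence 0.35) against the public endpoint."""
    response = requests.get(
        "https://api.dbpedia-spotlight.org/en/annotate",
        params={"text": text, "confidence": confidence, "support": support},
        headers={"Accept": "application/json"},
    )
    response.raise_for_status()
    # each returned resource carries a DBpedia URI plus similarity/support scores
    return [r["@URI"] for r in response.json().get("Resources", [])]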

6 Results

We report our results on the Lee50 dataset in Table 1. Each column in the table represents one of the similarity measures presented above. To differentiate the results obtained for the directed and non-directed versions of the semantic representations, we use the letter D to indicate directed and U to indicate undirected. The column Flattened presents the results obtained with the flattened version of the representation (Definition 2). Finally, the column NodeAvg presents the average number of nodes in each case. For each correlation measure, the best result is highlighted in bold.


Table 1. Lee50 dataset results. Correlation measures: Pearson (r) and Spearman (ρ). l: path length parameter

The best results were obtained using DGEDnode. Since there is no difference in nodes between the directed and undirected versions of the graph, DGEDnode is equivalent to UGEDnode. We prefer DGEDnode because the additional step of merging edges is avoided. The superiority of this similarity measure is, in most cases, independent of other elements such as the annotation service and the path length.

Regarding the contribution of the edges through the similarity measures DGEDedge and UGEDedge, the following is observed: (a) the direction seems to favor the Pearson correlation; however, better results were obtained at the Spearman correlation level using the undirected version; and (b) surprisingly, when combining the contribution of the edges and nodes (DGED and UGED), there is no improvement in the results compared to DGEDnode and UGEDnode. The behavior described in (a) can be attributed to increases of non-equivalent magnitude among the variables that are evaluated; in particular, in the undirected version the changes in similarity are very small in comparison with the directed version. While the Pearson correlation is usually strongly affected by this, the Spearman correlation is not. Furthermore, (b) seems to indicate that the contribution of the edges is not as important as the node contribution, at least when compared to the similarity established by the human annotators. However, we believe that edges provide additional semantic information that increases the relatedness between documents on a deeper level, and this might be overlooked by human annotators when comparing text documents.

The results obtained with the VEO measure, for both the directed (DVEO) and undirected (UVEO) versions, are interesting since both have an excellent trade-off between computational complexity and performance. Since VEO does not consider the edge or node weights, we also highlight that it is


possible to reduce the time to construct the semantic representation by discarding the weighting module. Thus, VEO is an interesting alternative for applications with critical response times. In accordance with the aforementioned, another suitable way to reduce the computational cost is to consider unitary path lengths (l = 1). The results in Table 1 are not conclusive about an improvement when a longer path length is selected. In the cases where the correlation measures are improved by the selection of l = 2, the difference with respect to the values obtained by its counterpart l = 1 does not seem to be significant (i.e., less than 2%). In contrast, the selection of connection paths of l = 1 significantly reduces the complexity and/or the number of queries that must be sent to the KG.

When considering the Spearman correlation, the MCS algorithm performs poorly. The similarity measures obtained via this algorithm present the highest variance, so more appropriate ways to normalize MCS (Eq. 5) should be explored.

Clearly, the results show the superiority of DBS1 as an annotation service. Independent of the graph similarity measure or path length selected, DBS1 outperforms all the other annotation services considered. Babelfy has a high number of false positives that negatively influence the representation. Although it seems not to be properly documented, Babelfy provides confidence measures associated with each recovered annotation that can be exploited for future filtering strategies. In the case of the Spotlight service, small increases in the confidence level strongly affect the number of concepts in the final representation. The support parameter, on the other hand, does not seem to be as decisive for the final representation, and thus the results obtained are similar.

Table 2 lists the performance of our best-performing similarity measure (obtained via the DGEDnode similarity measure, DBS1 as the annotation tool, and a semantic representation built with l = 2 as input parameter), as well as the following related baselines:

– Salient Semantic Analysis (SSA): a concept-based strategy which incorporates a similar semantic abstraction and interpretation of words, by using the linkage of concepts in Wikipedia [7].
– Graph-based document similarity (GDS): similar to this work, a semantic graph using KGs is constructed. The representation in this case is basically a KG subgraph built on the basis of the annotated concepts. No refinement processing is performed.
– Vector Space Model (VSM): the cosine distance of a standard bag-of-words vector space model. We carried out typical text processing operations including tokenization, stop word removal and stemming.

In general, very competitive results of our best similarity measure are observed. At the level of the Spearman correlation (ρ), we obtain the best results; however, SSA is superior in terms of the Pearson correlation (r). There is a relative improvement of 15.4% over VSM and 4.6% over GDS at the Pearson correlation level.


Table 2. Comparison with related work for the Lee50 dataset. Correlation measures: Pearson (r) and Spearman (ρ)

               r      ρ
SSA [7]        0.684  0.488
DGEDnode       0.659  0.516
GDS [18]       0.63   -
VSM            0.571  0.402

Table 3 shows the results obtained on the Man17 dataset. For this dataset, we report the results obtained by the semantic representation using DBS1 as the annotation service. The results of the flattened version were taken from [11], where the full text of the paper was used as input for the semantic representation. We also report the results obtained via VSM.

Table 3. Man17 dataset results.

Consistent with the results obtained on Lee50, DGEDnode presents the best results in terms of MAP and NDCG. The flattened version is better in terms of MRR. In this dataset, the direction of the edges is more relevant for the recommendation quality. Indeed, there is a significant difference between the values obtained by DGEDedge and UGEDedge. The results also show that a path length of l = 1 is more appropriate for this task.

7 Conclusion

In this paper, we have compared the performance of different graph-based similarity algorithms on two different datasets that employ semantic representations. One of the datasets is focused on the similarity between short documents, while the second is focused on the recommendation of academic papers. For each dataset, different evaluation measures were used. The graph-based measures yielded better results in comparison with the flattened version of the semantic representation. The results suggest that GEDnodes, an algorithm based on comparing the weighted nodes of both graphs, is an appropriate measure and that it is not necessary to consider the edges of the graph. This goes slightly against our initial hypothesis, which suggested that the edges connecting the nodes in the semantic


graph express relevant information about the described document and should thus be taken into account. However, as the baseline used was the similarity explicitly indicated by humans, two hypotheses should be further explored: either edges do not add significant information; or, when comparing documents, humans look at the broad picture, and examining a more detailed relation might lead them to reduce the initial similarity degree. The computational complexity will also be taken into account in order to select the most appropriate similarity measure.

It should be noted that the current implementation of the algorithms discussed was done with NetworkX, a graph library for Python. Consequently, the complexity achieved was higher than theoretically possible. For VEO, we obtained a complexity of O(V + V' + E + E') since the algorithm had to create the hash tables to avoid an even higher processing cost. For node graph edit distance, we obtained a complexity of O(2V + 2V'), and for edge graph edit distance, O(V + V' + 2E + 2E'), for the same reasons.

Our future work will focus on exploring other graph matching algorithms, in particular those with a low computational complexity that can be implemented in real recommendation scenarios and/or information retrieval applications. Additionally, we would like to explore other ways to evaluate the edge correspondence based on path analysis.

Acknowledgment. This work was partially supported by a COLCIENCIAS PhD scholarship (Call 647-2014).

References

1. Bunke, H.: Recent developments in graph matching. In: Proceedings 15th International Conference on Pattern Recognition, ICPR-2000, vol. 2, pp. 117–124 (2000)
2. Bunke, H., Shearer, K.: A graph distance metric based on the maximal common subgraph. Pattern Recognit. Lett. 19(3), 255–259 (1998)
3. Corcoglioniti, F., Dragoni, M., Rospocher, M., Aprosio, A.P.: Knowledge extraction for information retrieval. In: Sack, H., Blomqvist, E., d'Aquin, M., Ghidini, C., Ponzetto, S.P., Lange, C. (eds.) ESWC 2016. LNCS, vol. 9678, pp. 317–333. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-34129-3_20
4. Fankhauser, S., Riesen, K., Bunke, H.: Speeding up graph edit distance computation through fast bipartite matching. In: Jiang, X., Ferrer, M., Torsello, A. (eds.) GbRPR 2011. LNCS, vol. 6658, pp. 102–111. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-20844-7_11
5. Färber, M., Ell, B., Menne, C., Rettinger, A.: A comparative survey of DBpedia, Freebase, OpenCyc, Wikidata, and YAGO. Semant. Web J. 1–26 (2015)
6. Gao, X., Xiao, B., Tao, D., Li, X.: A survey of graph edit distance. Pattern Anal. Appl. 13(1), 113–129 (2010)
7. Hassan, S., Mihalcea, R.: Semantic relatedness using salient semantic analysis. In: AAAI (2011)
8. Jouili, S., Tabbone, S., Valveny, E.: Comparing graph similarity measures for graphical recognition. In: Ogier, J.-M., Liu, W., Lladós, J. (eds.) GREC 2009. LNCS, vol. 6020, pp. 37–48. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-13728-0_4
9. Lee, M.D., Welsh, M.: An empirical evaluation of models of text document similarity. In: CogSci 2005, pp. 1254–1259. Erlbaum (2005)
10. Manrique, R., Herazo, O., Mariño, O.: Exploring the use of linked open data for user research interest modeling. In: Solano, A., Ordoñez, H. (eds.) CCC 2017. CCIS, vol. 735, pp. 3–16. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66562-7_1
11. Manrique, R., Mariño, O.: How does the size of a document affect linked open data user modeling strategies? In: Proceedings of the International Conference on Web Intelligence, WI 2017, pp. 1246–1252. ACM, New York (2017)
12. Manrique, R., Mariño, O.: Diversified semantic query reformulation. In: Różewski, P., Lange, C. (eds.) KESW 2017. CCIS, vol. 786, pp. 23–37. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-69548-8_3
13. Mendes, P.N., Jakob, M., García-Silva, A., Bizer, C.: DBpedia Spotlight: shedding light on the web of documents. In: Proceedings of the 7th International Conference on Semantic Systems, I-Semantics 2011, pp. 1–8. ACM, New York (2011)
14. Musto, C., Lops, P., de Gemmis, M., Semeraro, G.: Semantics-aware recommender systems exploiting linked open data and graph-based features. Knowl.-Based Syst. 136, 1–14 (2017)
15. Nunes, B.P., Fetahu, B., Kawase, R., Dietze, S., Casanova, M.A., Maynard, D.: Interlinking documents based on semantic graphs with an application. In: Tweedale, J.W., Jain, L.C., Watada, J., Howlett, R.J. (eds.) Knowledge-Based Information Systems in Practice. SIST, vol. 30, pp. 139–155. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-13545-8_9
16. Papadimitriou, P., Dasdan, A., Garcia-Molina, H.: Web graph similarity for anomaly detection. J. Internet Serv. Appl. 1(1), 19–30 (2010)
17. Piao, G., Breslin, J.G.: Analyzing aggregated semantics-enabled user modeling on Google+ and Twitter for personalized link recommendations. In: Proceedings of the 2016 Conference on User Modeling Adaptation and Personalization, UMAP 2016, pp. 105–109. ACM, New York (2016)
18. Schuhmacher, M., Ponzetto, S.P.: Knowledge-based graph document modeling. In: Proceedings of the 7th ACM International Conference on Web Search and Data Mining, WSDM 2014, pp. 543–552. ACM, New York (2014)
19. Sugiyama, K., Kan, M.Y.: A comprehensive evaluation of scholarly paper recommendation using potential citation papers. Int. J. Digit. Libr. 16(2), 91–109 (2015)
20. Waitelonis, J., Exeler, C., Sack, H.: Linked data enabled generalized vector space model to improve document retrieval. In: Proceedings of the Third NLP & DBpedia Workshop (NLP & DBpedia 2015), co-located with the 14th International Semantic Web Conference (ISWC 2015), Bethlehem, Pennsylvania, USA, 11 October 2015, pp. 33–44 (2015)
21. Willett, P.: Matching of chemical and biological structures using subgraph and maximal common subgraph isomorphism algorithms. In: Truhlar, D.G., Howe, W.J., Hopfinger, A.J., Blaney, J., Dammkoehler, R.A. (eds.) Rational Drug Design, vol. 108, pp. 11–38. Springer, New York (1999). https://doi.org/10.1007/978-1-4612-1480-9_3

Knowledge Graph-Based Teacher Support for Learning Material Authoring

Christian Grévisse1, Rubén Manrique2, Olga Mariño2, and Steffen Rothkugel1

1 University of Luxembourg, Esch-sur-Alzette, Luxembourg
{christian.grevisse,steffen.rothkugel}@uni.lu
2 Systems and Computing Engineering Department, School of Engineering, Universidad de los Andes, Bogotá, Colombia
{rf.manrique,olmarino}@uniandes.edu.co

Abstract. Preparing high-quality learning material is a time-intensive, yet crucial task for teachers of all educational levels. In this paper, we present SoLeMiO, a tool to recommend and integrate learning material in popular authoring software. As teachers create their learning material, SoLeMiO identifies the concepts they want to address. In order to identify relevant concepts in a reliable, automatic and unambiguous way, we employ state of the art concept recognition and entity linking tools. From the recognized concepts, we build a semantic representation by exploiting additional information from Open Knowledge Graphs through expansion and filtering strategies. These concepts and the semantic representation of the learning material support the authoring process in two ways. First, teachers will be recommended related, heterogeneous resources from an open corpus, including digital libraries, domain-specific knowledge bases, and MOOC platforms. Second, concepts are proposed for semi-automatic tagging of the newly authored learning resource, fostering its reuse in different e-learning contexts. Our approach currently supports resources in English, French, and Spanish. An evaluation of concept identification in lecture video transcripts and a user study based on the quality of tag and resource recommendations yielded promising results concerning the feasibility of our technique. Keywords: Learning material · Authoring support Knowledge graph · Concept recognition

1 Introduction

Teachers across all educational levels spend a significant part of their time on preparing learning material for their courses. This is a crucial task, as the quality of the provided resources is of utmost importance to the learning process and

student performance. It is common practice for teachers, when preparing learning material for their courses, to seek and consult existing resources [11], such as books on a certain study domain or subject, lecture notes or slides from other teachers or institutions, scientific papers, videos etc. The vast space of an open corpus, such as the Web, and the heterogeneity of resources constitute a challenge in retrieving relevant material. Although there has been a considerable effort in Technology Enhanced Learning (TEL) research to provide learning material repositories, the landscape is fragmented due to the use of diverging metadata schemas [3]. Annotations through informal tags hamper the establishment of semantic relations between resources and across repositories, which might be a reason why Open Educational Resources (OER) could stay hidden [1]. There is an abundance of OER, but it is still difficult to find and integrate them, due to the reduced interoperability [14]. Resources in Massive Open Online Courses (MOOC) generally lack in metadata; metadata of Learning Objects (LO) in Learning Object Repositories (LOR) are often incomplete, and metadata schemas for LO give little support in resource recommendation [4]. At the same time, Linked Data (LD) has become a standard for web-scale data sharing, facilitating the exploration of the Web of Data [3,14]. Although LD has not yet been extensively used for OER in practice [14], it could solve, to a certain extent, interoperability issues in TEL research [3] and enable semantic browsing [17]. Furthermore, Limongelli states that the semantic annotation of resources enables a fine grained retrieval of resources [8]. In this paper, we present SoLeMiO, a tool to recommend and integrate learning material in popular authoring software. During the authoring process, SoLeMiO supports teachers in two ways. First, SoLeMiO identifies concepts addressed in the already elaborated part of a new learning resource. To enable the identification of relevant concepts in a reliable, automatic and unambiguous way, we employ state of the art concept recognition and entity linking tools. Based on the recognized concepts, a semantic representation is built by exploiting additional information from Open Knowledge Graphs (KGs) through expansion and filtering strategies. The most important concepts are suggested to the author in order to annotate her resource or parts of it, fostering its reusability and interoperability in different e-learning contexts. The selected concepts are then used to pinpoint and retrieve related, heterogeneous yet semantically enriched learning resources from an open corpus, including digital libraries, domain-specific knowledge bases, and MOOC platforms. Our approach supports resources in English, French, and Spanish. An evaluation of concept identification in lecture video transcripts and a user study based on the quality of tag and resource recommendations yielded promising results concerning the feasibility of our technique. For the semi-automatic concept tagging, different ranking strategies that operate on the semantic representation were validated in order to recommend only the most relevant concepts. The remainder of this paper is organized as follows: In Sect. 2, we discuss related work. In Sect. 3, the semantic representation process using knowledge


graphs is explained. The SoLeMiO tool is presented in Sect. 4. We show an evaluation in Sect. 5, before concluding and presenting ideas for future work in Sect. 6.

2 Related Work

The enrichment of learning resources through semantic metadata may have advantages for both teachers and students. Although Linked Data has not been extensively used for OERs [14], a few research initiatives have given insights into the usefulness of this combination. Dietze et al. describe in [3] how existing TEL data could be exploited by exposing it as Linked Data, thereby providing interlinked data for education. This was done in the context of the mEducator project. Piedra et al. present in [15] how Linked Open Data (LOD) can be used to improve the search for OER in engineering classes. The resulting Serendipity tool provides a faceted search web interface for Linked OpenCourseWare Data. Sicilia et al. show in [17] how Organic.Edunet, a federation of learning repositories for organic agriculture, was redesigned to allow a LD-based navigation across learning resources. From a learner’s perspective, general Web resources are often used as learning material, in addition to the “official” resources provided by the teacher [7]. However, the act of searching for other resources may cause a split-attention effect [16] and lead to distraction from or even abandonment of the learning task. Krieger proposes to integrate Web resources as additional learning material in e-learning contexts by extracting the semantic fingerprint of such resources, publish it as Linked Data [7] and integrate them in an LMS. Thereby, the need to interrupt the learning task to consult a classical Web search engine is reduced. There is little literature focused on the authoring support Linked Data can provide in educational contexts. In a similar way, though, Fink et al. recognize the potential of well-annotated research papers to increase the amount of machine-readable literature metadata [5]. In biological sciences, ontologies are highly popular, but there are few author support tools for the annotation process during the writing phase of a paper. They created an Add-in for Microsoft Word to semantically enrich literature related to life sciences through the recognition of ontology concepts. The metadata is directly stored inside the Word document. Authoring support for Office products has also recently been achieved through a set of commercial Add-ins. While they provide information to lookup from or insert in a Word, Excel or PowerPoint document to support teachers in the authoring process and enhance the learning experience for students, their semantic browsing capabilities are rather limited. Wikipedia and Encyclopaedia Britannica provide the possibility to search for encyclopaedia articles on a certain topic out of a Word document. GeoGebraTube is an Add-in that allows to search and insert learning material regarding GeoGebra1 , a popular interactive geometry software. A recent feature of PowerPoint itself is QuickStarter, which creates an outline of slides based on a topic the user has entered. The data 1

https://www.geogebra.org.


used in this feature comes from Wikipedia and the Bing search engine. In addition, the Smart Lookup feature makes it possible to retrieve further information on a topic, comparable to the formerly mentioned encyclopaedia Add-ins. In summary, there have been tools to enhance both the authoring and the learning experience. However, the potential of semantic web technologies has not yet been fully exploited, and the interlinking of learning resources using Linked Data is still in its early stages.

3 Semantic Representation with Knowledge Graphs

Broadly speaking, our semantic resource representation is a weighted directed graph in which nodes represent concepts, and edges represent the existence of a semantic relationship. Via entity recognition and entity linking tools, concepts present in the text (i.e., annotations) of the learning resource are recognized. Then, the structure of the graph, expressed in terms of its edges, is constructed by extracting information about the concepts found in the KG. Two main processes take advantage of the KG information. The first process expands the set of annotations with related concepts following a set of properties in the ontology that governs the KG. The second process filters out concepts that are weakly connected in the induced graph, and which are not representative to express the main topic of the learning resource. Figure 1 summarizes the process of building the semantic representation. In the following paragraphs a general description of the process is presented. A detailed description can be found in [9,10].

Fig. 1. Semantic representation process

3.1 Semantic Representation Process

There are different tools for the discovery of concepts mentioned in a text and their subsequent linking to a KG [12,13]. Although many of these services return concepts associated with a single KG, it is possible to recover the Uniform Resource Identifier (URI) of the same concept in a different KG through the interconnections between datasets promoted through Semantic Web vocabularies. After the annotations are found, two different expansion modules incorporate new related concepts. The category-based expansion incorporates the


hierarchical information of each resource in the representation. The hierarchical structure of a KG is represented by child-parent relationships and usually denotes the membership of a concept in a category. The property-based expansion enriches the representation with the set of concepts recovered by following the set of non-hierarchical properties in the ontology. These annotations and the new concepts incorporated through the expansion processes are weighted following different strategies, as described in [9]. In the final filtering module, property paths (https://www.w3.org/TR/sparql11-property-paths/) between every pair of nodes are analyzed and constitute the base information for building the graph edges. Basically, an edge is created if there is a property path between the given pair of concepts. The weight of the edge is assigned according to the number and type of properties that link both concepts.

To illustrate the process, consider the first paragraph of the Wikipedia "For loop" page: "In most computer programming languages, a while loop is a control flow statement that allows code to be executed repeatedly based on a given Boolean condition. The while loop can be thought of as a repeating if statement.". After the annotation process, the following concepts are identified: (Code, Grommet, Programming language, Statement (computer science), While loop). Far from being a perfect annotation, the result is subject to the limitations of the entity identification and entity linking tools. The incomplete and incorrect annotation reflects the existing trade-off between precision and recall. In this case, the "Grommet" concept corresponds to an incorrect annotation, while "Control flow" is an unidentified concept even though it is present in the text. The expansion and filtering processes alleviate these problems and refine the representation.

Figure 2 shows the resulting graph just before weakly connected concepts are removed in the filtering module. The concepts that best represent the topic of the text tend to be strongly connected (i.e., they have a greater number of connections with other concepts) through stronger links. On the other hand, the concept "Grommet", which is indicated in the figure, does not have any incoming link and is susceptible to being removed from the representation. It is interesting to note that, as a result of the expansion process, strongly related concepts such as "Control Flow", "For Loop" and "Do While Loop" are incorporated. In particular, the node "Control Flow" is important since it is a concept that was found in the text, yet it was not identified by the annotation tool. To determine the importance of each concept, different measures of centrality can be used. In this graph, the PageRank algorithm was used, resulting in the following top three concepts: (Programming languages, Control Flow, While loop). These concepts correctly describe the analyzed text fragment.
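A minimal NetworkX sketch of this ranking step is shown below; the nodes and edge weights are made up for the example and do not reproduce the actual graph of Fig. 2.

import networkx as nx

g = nx.DiGraph()
# a few concept nodes and weighted edges in the spirit of the example above
g.add_weighted_edges_from([
    ("While_loop", "Control_flow", 0.8),
    ("For_loop", "Control_flow", 0.6),
    ("Do_while_loop", "Control_flow", 0.5),
    ("While_loop", "Programming_language", 0.4),
    ("Statement_(computer_science)", "Programming_language", 0.3),
])

ranks = nx.pagerank(g, weight="weight")
top3 = sorted(ranks, key=ranks.get, reverse=True)[:3]
print(top3)  # the highest-ranked concepts become candidate tags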

3.2 Semantic Representation for Tag Recommendation

Based on the above facts, our hypothesis is that the semantic representation is useful for the recommendation of concept-tags for learning resources during the authoring process. In SoLeMiO, the recommendation of learning resources is


made based on the concept-tags that are selected by the author, which serve as input to consult the different sources that are available. In particular, if the source also contains concept-tag metadata, the resource exchange is more efficient and effective. In the evaluation in Sect. 5, we explore different strategies for ranking concepts based on the graph structure of the semantic representation.

Fig. 2. Semantic representation before the filtering process. Blue nodes are annotations; green nodes are concepts incorporated via the expansion module. (Color figure online)

4 SoLeMiO

In this section, we present SoLeMiO, a tool to recommend and integrate learning material in popular authoring software, while semantically enriching the currently elaborated resource. SoLeMiO stands for Semantic integration of Learning Material in Office. Previous research initiatives often yielded custom educational authoring tools. However, such editors cannot reasonably reach the quality of industrial standard editors, while this quality must be present in learning resources [2]. As products from the Microsoft Office suite, such as Word and PowerPoint, are heavily used for authoring text documents and slideshows, and many higher education institutions offer free Office 365 subscriptions for both teaching staff and students, we decided to benefit from the Office Add-ins platform. In comparison to the platform available at the time of writing of Fink et


al. [5], nowadays, the Office Add-in platform enables cross-platform integration of additional UI elements, content insertion and remote service calls. An Office Add-in essentially comprises two elements, namely a manifest file locally stored on a host, and a webpage, possibly hosted at a remote server. The architecture of SoLeMiO is shown in Fig. 3.

Assume a teacher is creating a new PowerPoint slideshow, e.g., for a Java beginners course, in French. While she has already written some initial content, she now wants to see what resources already exist in this realm, without having to leave PowerPoint. To do so, the concepts covered so far have to be correctly identified. The author can now choose to analyse either a selection (e.g., some text on a single slide) or the whole document so far. As already mentioned, an Office Add-in requires a manifest file on the local machine, which states the location of the webpage. For SoLeMiO, the webpage consists of a set of microservices, which delegate the actual computation to some dedicated, remote services. When an analysis is launched, the content (selection or whole document) is sent to a microservice that calls a set of remote semantic annotators, which identify relevant concepts in the payload. For smaller selections, the correct identification of relevant concepts is challenging for semantic annotator services, which is why no knowledge graph exploration is done here. However, if the whole document has to be analysed, and the provided content is thus much bigger, the semantic annotator services tend to perform the identification of relevant concepts better. Here, the previously described expansion and filtering techniques are applied to explore and suggest further relevant concepts. While the analysis of a selection usually takes no more than 5 s, the knowledge graph exploration, depending on the size of the document, might take around 1 min. Ad-hoc suggestions for selections are thus useful as the author is writing her document, whereas the analysis of whole documents is rather suitable for a posterior, offline semi-automatic annotation process, e.g., for existing non-annotated resources.
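Such a microservice could, for instance, look like the following Flask sketch. This is purely illustrative: the actual SoLeMiO services, their routes, and their technology stack are not described here, and the names used below are assumptions.

from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/concepts", methods=["POST"])
def concepts():
    # receive the content (selection or whole document) from the Add-in,
    # delegate to remote semantic annotators, and return the identified concepts
    content = request.get_json(force=True).get("content", "")
    identified = call_semantic_annotators(content)  # hypothetical helper
    return jsonify({"concepts": identified})

def call_semantic_annotators(content):
    # placeholder: a real deployment would call DBpedia Spotlight, Babelfy
    # and the semantic representation service described in Sect. 3
    return []

if __name__ == "__main__":
    app.run(port=5000)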

Fig. 3. SoLeMiO architecture: the Add-in sends content to the SoLeMiO microservices and presents the concepts and resources returned to the author; the microservices in turn consult the semantic annotators, the semantic representation service, digital libraries, and knowledge bases.

The identified concepts are sent back to the Add-in, which will show the concepts in descending order of their relevance to the author. The relevance is determined by a set of properties, such as concept frequency and PageRank, as further described in Sect. 5. To perform a correct annotation of the selected content, she may now choose a subset of identified concepts. In addition, if there are any concepts missing but deemed important, she may add additional concepts from a set of ontologies. In our example, she may choose concepts from


the ALMA ontology [6], a modular ontology for programming education. Figure 4 shows this situation.

Fig. 4. Concept identification and selection in SoLeMiO Add-in for PowerPoint. The Task Pane on the right-hand side of the screen shows the concepts identified by the semantic annotator services. At the end of the list, manual annotations with concepts from a set of ontologies can be added.

Once the concepts are identified and confirmed, the Add-in enables her to retrieve relevant learning material. A dedicated microservice requests resources relevant for the given concepts from a set of learning material repositories, domain-specific knowledge bases, MOOC platforms and digital libraries. If these repositories contain semantically enriched resources and can be browsed using Linked Data, resources can be pinpointed and recommended to the author. Otherwise, the performance of resource recommendation depends on whether the repository has its own search engine which ranks resources with respect to the given query. In this case, the formulation of the query may heavily influence the result, as synonyms of a concept could be used in the resources. The lack of a semantic relation in such a repository can thus result in a bottleneck. However, different forms of queries could be used to mitigate the issue, by testing with closely related concepts. Still, this may not always be trivial, as described in [5]. In our example, we include two repositories containing resources related to programming and annotated using the ALMA ontology. The latter is aligned with DBpedia, such that concepts previously identified in the DBpedia knowledge graph can be used to query these repositories and retrieve resources. For instance, if the concept dbr:Assignment_(computer_science) was previously


identified (as seen in Fig. 4), and the concept programming:Assignment from the ALMA ontology is related to it, the repositories will return resources annotated with either concept. The first repository (ALMA) contains a mix of traditional learning resources (slideshows, book excerpts) and Web resources. The second repository (ALMA-DAJEE) provides excerpts of selected MOOC videos from a related study. In addition, we use the Safari Books Online digital library (https://proquest.safaribooksonline.com). As the latter does not provide means to benefit from LD-based browsing, the concept label is sent as a query, relying on its internal search engine. Overall, a heterogeneous set of resources is returned. The proposed learning material can then be consulted from within PowerPoint, and added to a bookmark list for this particular concept in the given selection. This is shown in Fig. 5.

Fig. 5. Relevant resources are suggested in a dialog, from which their content can also be consulted right away, without the need to leave PowerPoint. For future reference, these resources can be added as a bookmark.

The set of ontologies for manual annotations and learning material repositories could be extended. As the example has shown, a slideshow in French was annotated with English concepts. The French concepts could still be retrieved via the owl:sameAs property, which might be needed to retrieve only learning material in this language. In addition to the natural language, resource retrieval might also rely on additional parameters such as the programming language. However, an inherent issue with learning resources on programming languages is that the considered version of the language might be of utmost importance.


For instance, a new slideshow to introduce the Python programming language probably does not want to integrate resources on Python 2 anymore. Therefore, it might be inevitable that the user has to manually indicate some further parameters to filter resources. The semi-automatic annotation support based on Linked Data enables the reusability and interoperability of the newly authored resource in further e-learning contexts, such that, if available on a repository that provides semantic browsing capabilities, this resource itself could also be retrieved at a later point in time by some other author in a different context, thereby creating a network of interrelated documents, as is known from the research domain itself. The fact that resources can be saved as bookmarks in a document is not only a trace of sources for its author, but also helps its consumers, i.e., the students. While slideshows are often intended to represent a summary of a course topic, bookmarks to related material could be used as reading assignments, or provide additional information from a knowledge base (e.g., details on a molecule mentioned on a slide in a chemistry course). This way, students would not need to seek this information elsewhere, risking the previously mentioned split-attention effect and increasing the extraneous cognitive load.

5 Evaluation

Our evaluation aims at showing (i) how different annotators influence the performance of our approach, (ii) that relevant concepts are recommended to support the semi-automatic tagging of the resources, and (iii) that relevant learning resources are recommended to teachers. While (i) is evaluated across all our experiments, for (ii) and (iii) we use two different evaluation approaches. For the concept tagging functionality, we carry out an automatic evaluation using a dataset of learning resources human-annotated with their main concepts. For the recommendation of learning resources, we perform a small user study in which a teacher used SoLeMiO to integrate learning material in a set of learning resources related to programming topics. As a result of this user study, the set of resources that were deemed relevant is obtained from the bookmarks. Based on this information, a set of metrics reveals the viability of the recommendation process.

5.1 Experimental Setup

We implemented our experimental evaluation using the following resources:

– DBpedia: Although we can use any Knowledge Graph, we selected the DBpedia 2016-10 version. Its comprehensive vocabularies and its extensive relationships between concepts, combined with continuous updating, enable cross-domain modeling capabilities. Recent experiments conducted by [9] show the potential of this knowledge source for the efficient calculation of similarity between documents. The hierarchical structure of a concept is drawn from categories in the Wikipedia categorical system. Categories are extracted through the dct:subject predicate.


– DBpedia Spotlight semantic annotator5: We use DBpedia Spotlight (DBS) to annotate text documents with concepts. DBpedia Spotlight allows configuring the level of annotation (precision/recall trade-off) through confidence/support parameters. The support parameter specifies the minimum number of inlinks a DBpedia resource has to have in order to be annotated, while the confidence parameter controls the topical pertinence and the contextual ambiguity to avoid incorrect annotations as much as possible [12]. Following previous results, we defined two different configurations for DBS: DBS1 (support: 5, confidence: 0.35) and DBS2 (support: 5, confidence: 0.40); a request sketch is given after this list.
– Babelfy semantic annotator6: In order to compare the influence of the quality of the semantic annotation, we also use the Babelfy entity linking tool to discover concepts in the text. Unlike DBS, no special configuration was made for Babelfy and all concepts returned by the service were considered.
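As a concrete illustration of how these two resources can be queried, the following minimal sketch annotates a text fragment with DBpedia Spotlight using the DBS1 parameters and then fetches the Wikipedia categories of a returned concept through the dct:subject predicate. It assumes the public Spotlight and DBpedia SPARQL endpoints; the JSON field names ("Resources", "@URI") and the example text are illustrative and not taken from the paper's dataset.

```python
# Sketch: (1) annotate a text fragment with DBpedia Spotlight using the DBS1
# parameters (support: 5, confidence: 0.35); (2) fetch the Wikipedia categories
# of a returned concept through the dct:subject predicate on DBpedia.
import requests

SPOTLIGHT_URL = "https://api.dbpedia-spotlight.org/en/annotate"   # public endpoint (assumption)
SPARQL_ENDPOINT = "https://dbpedia.org/sparql"

def annotate(text, confidence=0.35, support=5):
    """Return the DBpedia concept URIs that Spotlight finds in `text`."""
    response = requests.post(
        SPOTLIGHT_URL,
        data={"text": text, "confidence": confidence, "support": support},
        headers={"Accept": "application/json"},
        timeout=30,
    )
    response.raise_for_status()
    return [r["@URI"] for r in response.json().get("Resources", [])]

def categories_of(concept_uri):
    """Return the dct:subject categories of a DBpedia concept."""
    query = f"""
    SELECT ?category WHERE {{
        <{concept_uri}> <http://purl.org/dc/terms/subject> ?category .
    }}"""
    response = requests.get(
        SPARQL_ENDPOINT,
        params={"query": query, "format": "application/sparql-results+json"},
        timeout=30,
    )
    response.raise_for_status()
    return [b["category"]["value"]
            for b in response.json()["results"]["bindings"]]

if __name__ == "__main__":
    slide_text = "A variable stores a value that can change during program execution."
    concepts = annotate(slide_text)            # DBS1 configuration
    print(concepts)
    if concepts:
        print(categories_of(concepts[0]))      # hierarchical context of the first concept
```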

5.2 Concept Tagging

As mentioned before, our hypothesis is that from the proposed semantic representation it is possible to extract the most important concepts by taking advantage of the graph-based structure. Even though semantic representations have shown superior results in recommendation and information retrieval tasks [9], there is no guarantee that the same approach is suitable for the tagging of learning resources. In order to validate our hypothesis, we built our own dataset of programming fundamentals learning resources in English extracted from MOOC courses. For each resource, at least 3 proficient programmers selected the most relevant concepts from the ALMA ontology. The final dataset is composed of 60 learning resources and their corresponding lists of tagged ALMA concepts, which are used as ground truth. On average, each learning resource has only 3 concept tags, making this dataset a challenge for the evaluation of automatic tagging strategies. Since our objective is to extract the most important concepts/nodes from the semantic representation in order to compare them with the ground truth, it is necessary to define different strategies for concept ranking:

– Concept Frequency (CF): We select the nodes in the representation according to their weight. Nodes with higher weight are ranked first. The node weight represents the frequency of the concept in the learning resource.
– Centrality Measures: Each node is ranked according to one of the following centrality measures calculated on the graph:
  • Degree centrality (DC): The degree centrality of a node c is the relationship between the number of nodes that are connected to it and the total number of nodes.
  • Betweenness centrality (BC): The betweenness centrality is the fraction of shortest paths between all the possible node pairs that pass through the node of interest.

5 http://www.dbpedia-spotlight.org/
6 http://babelfy.org/


  • PageRank (PR): PageRank is a well-known algorithm designed to rank web pages according to incoming links. In essence, PageRank is a measure that ranks important nodes on directed graphs.

We define the quality of the strategy in terms of the following evaluation metrics:

– Mean Reciprocal Rank (MRR) is a measure used in information retrieval tasks that gives the averaged ranking of the first correct prediction. In our case, a correct prediction indicates that the predicted tag matches one of the human-annotated tags in the dataset for a particular learning resource. MRR is defined as:

  MRR = \frac{1}{|R|} \sum_{r=1}^{|R|} \frac{1}{rank_r}    (1)

  where rank_r refers to the rank at which the first correct prediction was found for the learning resource r ∈ R.
– Precision at k (P@k) is the proportion of predicted tags in the top-k set that are correctly predicted. In our case, we select k = 3, as we have on average three concept tags for each resource in the dataset.

As shown in Table 1, the best results were obtained using PageRank as ranking strategy and DBS1 as annotation system. According to the MRR, it is possible to recommend a relevant tag within the first 3 results on average. This is an important result, as it is not usual to recommend more than 5 tags to the user [18]. Even though the precision seems low, the values obtained are suitable for the task at hand. As mentioned earlier, the dataset is challenging due to the limited number of annotations available for each resource. Finally, it is interesting to note that the frequency of the concept in the text is also an appropriate ranking strategy at a lower computational cost.
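The following sketch illustrates the ranking strategies and the evaluation metrics on a toy example. The small concept graph and the ground-truth tags are made up for illustration; only the formulas (CF, DC, BC, PR ranking, MRR, P@k) follow the definitions given above.

```python
# Illustrative sketch of the four ranking strategies and the evaluation metrics.
import networkx as nx

def rank_nodes(graph: nx.DiGraph, strategy: str) -> list:
    if strategy == "CF":          # concept frequency stored as node weight
        scores = {n: d.get("weight", 0) for n, d in graph.nodes(data=True)}
    elif strategy == "DC":
        scores = nx.degree_centrality(graph)
    elif strategy == "BC":
        scores = nx.betweenness_centrality(graph)
    elif strategy == "PR":
        scores = nx.pagerank(graph)
    else:
        raise ValueError(strategy)
    return sorted(scores, key=scores.get, reverse=True)

def reciprocal_rank(predicted: list, relevant: set) -> float:
    for position, tag in enumerate(predicted, start=1):
        if tag in relevant:
            return 1.0 / position
    return 0.0

def mean_reciprocal_rank(predictions: list, ground_truth: list) -> float:
    return sum(reciprocal_rank(p, g) for p, g in zip(predictions, ground_truth)) / len(predictions)

def precision_at_k(predicted: list, relevant: set, k: int = 3) -> float:
    return len(set(predicted[:k]) & relevant) / k

if __name__ == "__main__":
    g = nx.DiGraph()
    g.add_node("Variable", weight=5)        # weights = concept frequencies (made up)
    g.add_node("Assignment", weight=3)
    g.add_node("Data_type", weight=2)
    g.add_edges_from([("Assignment", "Variable"), ("Variable", "Data_type")])

    predicted = rank_nodes(g, "PR")
    truth = {"Assignment", "Data_type"}     # hypothetical human-annotated tags
    print(predicted)
    print("MRR =", mean_reciprocal_rank([predicted], [truth]))
    print("P@3 =", precision_at_k(predicted, truth, k=3))
```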

5.3 Learning Resource Recommendation

In order to evaluate the recommendation of learning material, a teacher was asked to use SoLeMiO on a set of learning resources of his own. The resources correspond to a set of PowerPoint presentations in French used in a programming course. The following protocol is followed: (i) for each learning resource, the most relevant set of concepts has to be annotated following the tag recommendations made by SoLeMiO, (ii) the user has to manually add the concepts that are relevant but not suggested by SoLeMiO, and (iii) for a subset of these concepts, the user has to select from the recommended learning resources those that are pertinent and appropriate for the given learning concept. Following the results of the previous section, we use the PageRank strategy to rank the concepts using the annotations retrieved by the DBS1 and Babelfy services. A total of 59 concept tags were added to the learning resources, with an average of 8.42 per resource. 21 of these concepts were entered manually, indicating that 64% of the tagged concepts were taken from the recommendations.


Table 1. Concept tagging evaluation results

Ranking strategy | S. annotators | MRR    | P@3
CF               | DBS1          | 0.4188 | 0.1888
CF               | DBS2          | 0.3352 | 0.1611
CF               | Babelfy       | 0.31   | 0.1777
DC               | DBS1          | 0.2312 | 0.1277
DC               | DBS2          | 0.1982 | 0.1
DC               | Babelfy       | 0.3082 | 0.1277
BC               | DBS1          | 0.245  | 0.1222
BC               | DBS2          | 0.1686 | 0.0833
BC               | Babelfy       | 0.1663 | 0.0833
PR               | DBS1          | 0.4562 | 0.1932
PR               | DBS2          | 0.3843 | 0.1564
PR               | Babelfy       | 0.4194 | 0.1848

The teacher selected 23 concepts from the 59 concept tags as input for the learning material recommendation process. SoLeMiO recommended an average of 35.47 learning materials per concept, and from those recommended items, 3.56 were selected on average as relevant by the teacher and bookmarked. For all the 23 concepts, relevant results were found, indicating that the sources are adequate for the programming domain. Table 2 presents the distribution of recommended and selected learning material by source. According to this table, most of the selected and recommended resources come from Safari Books Online. However, the ratio tells us that ALMA is a more appropriate source of resources for this domain, since it is more likely to obtain a relevant recommendation. In general, it was identified that results from Safari Books Online are too general and tend to be susceptible to the ambiguity problems produced by typical keyword search.

Table 2. Distribution of recommended and selected learning material by source

Source              | Avg. recommended | Selected | Ratio
ALMA                | 5.086            | 1.17     | 0.23
ALMA-DAJEE          | 10.26            | 1.0      | 0.097
Safari Books Online | 20.0             | 1.39     | 0.0695

Finally, sorting the results by source following the previous results (i.e., first ALMA, then ALMA-DAJEE and finally Safari Books Online), we obtain the following evaluation metrics: MRR = 0.5814, P@10 = 0.2478 and P@20 = 0.1804. As observed from the precision and MRR values, the most relevant results will be found among the first recommended results. This behavior is desired, since users usually tend to review only the first results that are recommended.
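A minimal sketch of this source-priority ordering is given below; the recommendation records are illustrative placeholders, not items from the actual repositories.

```python
# Sort recommended items so that ALMA results come first, then ALMA-DAJEE,
# then Safari Books Online. sorted() is stable, so the original order within
# each source is preserved.
SOURCE_PRIORITY = {"ALMA": 0, "ALMA-DAJEE": 1, "Safari Books Online": 2}

def order_recommendations(recommendations):
    return sorted(recommendations, key=lambda item: SOURCE_PRIORITY[item["source"]])

recs = [
    {"title": "Loops in practice", "source": "Safari Books Online"},
    {"title": "Assignment video excerpt", "source": "ALMA-DAJEE"},
    {"title": "Variables slideshow", "source": "ALMA"},
]
print([r["source"] for r in order_recommendations(recs)])
```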


In order to gain insights into the use of SoLeMiO during the authoring process, and not only on an existing resource, we asked the teacher to build the first slides of a new learning resource. In this exercise, an annotation process was performed on each slide. Unlike annotating the full text of a resource, the use of small pieces of text resulted in better concept-tag recommendations. Using DBS1 as the annotation service, an MRR = 0.68 and a precision P@3 = 0.38 were obtained.

6 Conclusions and Future Work

In this paper, we presented SoLeMiO, a tool to recommend and integrate learning material in popular authoring software. SoLeMiO supports teachers in the authoring process of learning material through semi-automatic annotations based on a semantic representation built by exploiting information from Open Knowledge Graphs through expansion and filtering strategies. The semantic enhancement fosters the reusability and interoperability of the resource in different e-learning contexts. Based on the selected concepts, related, heterogeneous learning material is retrieved from repositories providing resources annotated with Linked Data. The newly authored resource is also ready to be published on such a repository. Learners can also benefit from the bookmarked resources, accessible from within the same learning context. An evaluation of concept identification in lecture video transcripts and a user study based on the quality of tag and resource recommendations yielded promising results concerning the feasibility of our technique. There are several directions for future work. Manual annotations with ontology concepts could be passed to the semantic representation process to enable a user-directed influence on the identification of concepts. Other study domains, such as biology, chemistry, geography or history, could be used to study the transferability of our approach. Other types of learning resources, such as semantically enriched gamification activities, could also be integrated to diversify the learning experience for students. Other authoring environments, like LaTeX, could also be considered. Finally, an evaluation from the students' point of view could help assess the added value of our approach.

References

1. Chicaiza, J., Piedra, N., Lopez-Vargas, J., Tovar-Caro, E.: Domain categorization of open educational resources based on linked data. In: Klinov, P., Mouromtsev, D. (eds.) KESW 2014. CCIS, vol. 468, pp. 15–28. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11716-4_2
2. Dehors, S., Faron-Zucker, C.: QBLS: a semantic web based learning system. In: Proceedings of EdMedia: World Conference on Educational Media and Technology 2006, pp. 2795–2802. AACE, Orlando, June 2006
3. Dietze, S., et al.: Interlinking educational resources and the web of data: a survey of challenges and approaches. Program 47(1), 60–91 (2013)


4. Estivill-Castro, V., Limongelli, C., Lombardi, M., Marani, A.: DAJEE: a dataset of joint educational entities for information retrieval in technology enhanced learning. In: Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR 2016. ACM (2016)
5. Fink, J.L., et al.: Word add-in for ontology recognition: semantic enrichment of scientific literature. BMC Bioinform. 11(1), 103 (2010)
6. Grévisse, C., Botev, J., Rothkugel, S.: An extensible and lightweight modular ontology for programming education. In: Solano, A., Ordoñez, H. (eds.) CCC 2017. CCIS, vol. 735, pp. 358–371. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66562-7_26
7. Krieger, K.: Creating learning material from web resources. In: Gandon, F., Sabou, M., Sack, H., d'Amato, C., Cudré-Mauroux, P., Zimmermann, A. (eds.) ESWC 2015. LNCS, vol. 9088, pp. 721–730. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-18818-8_45
8. Limongelli, C., Lombardi, M., Marani, A., Taibi, D.: Enrichment of the dataset of joint educational entities with the web of data. In: 2017 IEEE 17th International Conference on Advanced Learning Technologies (ICALT), pp. 528–529, July 2017
9. Manrique, R., Herazo, O., Mariño, O.: Exploring the use of linked open data for user research interest modeling. In: Solano, A., Ordoñez, H. (eds.) CCC 2017. CCIS, vol. 735, pp. 3–16. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66562-7_1
10. Manrique, R., Mariño, O.: How does the size of a document affect linked open data user modeling strategies? In: Proceedings of the International Conference on Web Intelligence. WI 2017, pp. 1246–1252. ACM, New York (2017)
11. Marani, A.: WebEduRank: an educational ranking principle of web resources for teaching. In: Proceedings of the 15th International Conference on Web-Based Learning. ICWL 2016, pp. 25–36, October 2016. http://ceur-ws.org/Vol-1759/paper4.pdf
12. Mendes, P.N., Jakob, M., García-Silva, A., Bizer, C.: DBpedia spotlight: shedding light on the web of documents. In: Proceedings of the 7th International Conference on Semantic Systems. I-Semantics 2011, pp. 1–8. ACM, New York (2011)
13. Moro, A., Cecconi, F., Navigli, R.: Multilingual word sense disambiguation and entity linking for everybody. In: Proceedings of the 13th International Semantic Web Conference, Posters and Demonstrations. ISWC 2014 (2014)
14. Navarrete, R., Luján-Mora, S.: Use of linked data to enhance open educational resources. In: 2015 International Conference on Information Technology Based Higher Education and Training (ITHET), pp. 1–6, June 2015
15. Piedra, N., Chicaiza, J., López, J., Tovar, E.: Using linked open data to improve the search of open educational resources for engineering students. In: 2013 IEEE Frontiers in Education Conference (FIE), pp. 558–560, October 2013
16. Schmeck, A., Opfermann, M., van Gog, T., Paas, F., Leutner, D.: Measuring cognitive load with subjective rating scales during problem solving: differences between immediate and delayed ratings. Instr. Sci. 43(1), 93–114 (2015)
17. Sicilia, M., Ebner, H., Sánchez-Alonso, S., Álvarez, F., Abián, A., Barriocanal, E.: Navigating learning resources through linked data: a preliminary report on the re-design of Organic.Edunet. In: Proceedings of Linked Learning 2011: The 1st International Workshop on eLearning Approaches for the Linked Data Age (2011)
18. Song, Y., et al.: Real-time automatic tag recommendation. In: Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR 2008, pp. 515–522. ACM, New York (2008)

Building Alternative Methods for Aiding Language Skills Learning for the Hearing Impaired

Paula A. Correa D., Juan P. Mejía P., Andrés M. Lenis L., Cristian A. Camargo G., and Andrés A. Navarro-Newball(B)

Pontificia Universidad Javeriana, Cali, Cali, Colombia
{pcdiaz,juanpmejia,mauriciol59,cristiancamargo,anavarro}@javerianacali.edu.co
http://www.javerianacali.edu.co

Abstract. Rehabilitation therapy favours language development and cognitive processes in deaf children who are learning language skills. After some therapy, children should acquire narrative capabilities, which are relevant in human communication and understanding. However, it is frequent that language practice takes place during therapy sessions only. Moreover, some kinds of therapy, such as the ones related to language mechanisation, even though useful, become hard for children. In previous work, we demonstrated how videogames favour the repetitive approach required for language therapy. Nevertheless, technology offers other possibilities. In this work we propose two alternatives to video games to support language learning. First, we describe a colouring mobile application aimed at exploring the impact of art in language learning processes. Then, we describe two web applications based on mixed realities and tangible user interfaces. The idea is that these developments could be used not only during therapy sessions, but also for continuous practice at home with the support of the parents. Requirements identification with language therapists and a preliminary heuristic evaluation favour the potential success and usability of the proposed systems.

Keywords: Art · Mixed reality · Tangible user interface · Language therapy

1 Introduction

Rehabilitation therapy favours language development and cognitive processes in deaf children who are learning language skills. After some therapy, children should acquire narrative capabilities which are relevant in human communication and understanding [1]. However, it is frequent that language practice takes place during therapy sessions only. Moreover, some kinds of therapy such as the ones related to language mechanisation, even though useful, become hard for children [2].


In previous work [2] we developed a video game that takes a deaf child using a cochlear implant on an adventure with the main character. The results of the tests carried out show that children are more involved in the rehabilitation exercises through the video game than with the exercises practiced with the therapist. Both the parents of the child and the therapist said that the video game encourages children, making the mechanisation exercises funnier. In other work, Cano et al. [3] demonstrate a gamification process used to develop a game to improve the learning of children with cochlear implants. Tokuhisa and Kamiyama [4] present a mobile application to produce sketches aimed at colouring. Mich [5] proposes a method of evaluation of reading comprehension for deaf children using art as a means of evaluation. She proposes drawing as a way of evaluating children. Her study consisted of giving a reading to the children and then carrying out some comprehension exercises by elaborating an illustration of the story that they read. The project showed that drawing is a viable alternative as a method of evaluation in reading comprehension.

At the same time, other works explore alternative ways to interact. Mirzaei et al. [6] developed an application to facilitate communication between people with hearing problems and people without that condition, using augmented reality capabilities to transform oral expressions into a visualised shape. This work serves as a basis to demonstrate that it is possible to use this type of technology to improve the quality of life of people. Shen et al. [7] describe the implementation of a system capable of helping people to recognise sound sources in a visual way, which could have application in their daily life. Iversen et al. [8] implemented a tangible interface application also aimed at people with hearing problems, in which user interaction is achieved through an interactive floor. The results of their tests indicate that this type of innovative interface can act as an alternative pathway for learning and rehabilitation, where, by taking advantage of a person's multiple senses, the user experience is enriched.

Some works [2,3] use a video game as a tool for language learning, different from the artistic approach that the present work seeks. It should be noted that colouring is a playful medium that does not necessarily represent a game. Another work [4] involves art aimed at amusement only. In contrast, one project [5] is aimed at evaluating reading comprehension skills, but does not offer a tool for supporting rehabilitation and language construction. Other sorts of interfaces [6–8] are mainly used for facilitating communication but not necessarily for learning. There is no evidence of a system favouring, through art, the learning of the concepts of classification in groups and categories for deaf children associated with language. The integration of art and learning using technology for children with cochlear implants is not common and is not described in the related work discussed previously. On the other hand, augmented reality and tangible interfaces reveal an enormous potential. Experiments show enrichment capabilities for augmented reality and advantages of manipulation and interaction in tangible interfaces to generate innovative and superior user experiences [9]. Our applications are based on these two ideas. Our goal is to explore alternative interaction methods for supporting the language skills learning of children with cochlear implants. First, we


describe a colour filling application. Then, we describe the use of augmented reality and tangible user interfaces.

2 Prototyping

We have developed three applications. To get requirements and clarify concepts, we analysed related work and held an interview with a group of three phonoaudiologists and language therapists. We performed an iterative heuristic evaluation using Nielsen's principles [10] with the help of one expert and considered additional recommendations [11] in order to enhance the interfaces. We focused our evaluation on making sure the interfaces followed important principles. Now the interfaces have:

– Aesthetic and minimalist design. Irrelevant information is avoided, and the interface is suitable for children in the target age range.
– Error prevention. We minimise error-prone conditions using simple choices based on simple actions.
– Consistency. Actions mean the same thing and, for each case, follow the application's conventions.
– Feedback. All the applications are expected to display appropriate feedback in response to the user's actions.
– Recognition rather than recall. Options to perform actions are visible.

Then, we defined an architecture and developed a user interface. We performed an iterative heuristic evaluation while developing the interfaces, and we plan to perform a user validation in an institute working for the hearing impaired. For the development of the interface and the heuristic evaluation we used human-computer interaction principles found in [10,11].

3 Architecture

In all cases we use a layered architecture for the application. The independence provided by the layered architecture is a factor that led to choosing this pattern as a pillar on which to design the application. Additionally, the scalability of a layered architecture allows for extension without affecting the functionality of other layers (Fig. 1).
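As a rough illustration of this split, the following minimal sketch separates presentation, logic and persistence concerns; class and method names are illustrative and not taken from the actual implementations.

```python
# Minimal sketch of a three-layer split: each layer only talks to the layer below it,
# so a layer can be extended or replaced without touching the others.
class DataLayer:
    """Persistence: users, exercises, progress."""
    def __init__(self):
        self._progress = {}

    def save_progress(self, user: str, exercise: str, state: dict) -> None:
        self._progress[(user, exercise)] = state

    def load_progress(self, user: str, exercise: str) -> dict:
        return self._progress.get((user, exercise), {})

class LogicLayer:
    """Application rules: scoring and feedback."""
    def __init__(self, data: DataLayer):
        self._data = data

    def complete_exercise(self, user: str, exercise: str, correct: bool) -> str:
        self._data.save_progress(user, exercise, {"correct": correct})
        return "Well done!" if correct else "Try again"

class PresentationLayer:
    """Graphics and input; only talks to the logic layer."""
    def __init__(self, logic: LogicLayer):
        self._logic = logic

    def on_exercise_finished(self, user: str, exercise: str, correct: bool) -> None:
        print(self._logic.complete_exercise(user, exercise, correct))

if __name__ == "__main__":
    ui = PresentationLayer(LogicLayer(DataLayer()))
    ui.on_exercise_finished("child01", "colour_the_dogs", correct=True)
```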

4 The Colouring Application

It is aimed at children between 4 and 7 years old. The application involves art through colouring. The application will have several levels in which the child is presented with a graphic that should be coloured depending on the instructions presented. These exercises focus on reinforcing the concepts learned by the child in his/her sessions with the therapist, and as the child progresses in level, these exercises will become more complex. The idea is to use and test colouring as a tool


Fig. 1. Layered model for the applications.

to classify (e.g. the application places challenges like: fill in green all the dogs in the image). As levels get harder, the application should display greater classification challenges. At the end, the child will be awarded with a free colouring level. The application has a strong emphasis on two fronts. On the one hand, the graphical part is the most noticeable and important element of the application when considering the target audience of the project. Considering this, it is appropriate to dedicate a layer of the model fully to the processes related to this front. On the other hand, we have a database to store the information of the users and the resources, in the form of images, necessary to deploy the exercises. One important algorithm for colouring is the filling algorithm [12]. For the project, we decided to use the Flood-Fill approach. The Flood-Fill algorithm starts at a point used as a seed to fill a region in two dimensions. Its implementation is based on the use of queues and recursion, which in turn depends on the connectivity of the different regions of the area. There are 4-connected and 8-connected regions, depending on the degree of connectivity their points have.
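A queue-based, 4-connected variant of the algorithm is sketched below; the image is represented here as a simple 2D list of colour values, which is an illustrative simplification of the actual canvas structure.

```python
# Queue-based 4-connected flood fill starting from a seed pixel.
from collections import deque

def flood_fill(image, seed_row, seed_col, new_colour):
    rows, cols = len(image), len(image[0])
    target = image[seed_row][seed_col]
    if target == new_colour:
        return image
    queue = deque([(seed_row, seed_col)])
    while queue:
        r, c = queue.popleft()
        if 0 <= r < rows and 0 <= c < cols and image[r][c] == target:
            image[r][c] = new_colour
            # 4-connected neighbours; add the diagonals for 8-connectivity
            queue.extend([(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)])
    return image

if __name__ == "__main__":
    canvas = [
        [0, 0, 1, 1],
        [0, 0, 1, 0],
        [1, 1, 1, 0],
    ]
    flood_fill(canvas, 0, 0, 7)   # fill the region containing the top-left pixel
    for row in canvas:
        print(row)
```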


Fig. 2. Colouring application interface. (A) Login screen. (B) Choosing a level. (C) Main colouring interface. The work area in the centre is where the objects to classify will be placed. The palette is shown above and the brush and the eraser to the right. (D) Results screen.

Figure 2 shows the mockup of the interface. Additionally, we identified the following requirements. The application must:

– Have a colour palette in the interface.
– Have a brush of a pre-set thickness for colouring.
– Offer a login option to identify the user.
– Have a delete tool to undo what was coloured with the brush.
– Allow the user to continue with the work that was being done after the application is closed and opened again.
– Have a menu where the exercise to perform can be chosen.
– Give feedback on the performance in the exercise.
– Present colouring exercises with different levels of difficulty in terms of language classification.
– Have a minimalist interface, with large buttons to facilitate manipulation.

5 The Augmented Reality Domino and the Tangible Domino

In collaboration with the therapists, it was determined that the best option to integrate with interaction technologies was the activity known as “Animal


Domino”. This activity consists basically of a game of domino which differs from the traditional one by the fact that it uses animal figures instead of dots. Among other things, this therapy seeks to reinforce the following aspects in the child: deduction process, use of specific items, use of indeterminate items, use of demonstrative articles, use of adjectives, use of place adverbs. Additionally, we identified the following requirements. In the application: – There are two players: the child and the therapist (or tutor). – A set of animal-related nouns is used. – According to the level the animals vary (e.g. domestic, wild) each move implies a question for the child (oral production). – The objective of the game is that the child structure the sentences well as he puts the pieces on the table. – Multiple variables involved: small cow, fat, etc. – The pieces reflect both the animals and the adjectives that describe them. – In each move the child produces two sentences, one on each side of the piece he puts on the board. – It is under the judgment of the second player (therapist or tutor) if the child responds correctly or not. – The game does not continue to display domino pieces until the child responds correctly. – The game ends when it is not possible to put more pieces, or these are exhausted, or if the user wishes. – Each oral expression of the child should be fed positively or negatively. – Rewards must be offered for playing rightly (e.g. happy Caritas). Because of the first requirement, we used the layout shown in Fig. 3, which allows interaction of two people at the same time on a desk. In one case the paradigm of augmenting reality is used; in the other, the paradigm of extending the real space to the virtual one as looking through a window is used (tangible interfaces). For each of the selected interaction technologies, a series of general directives were formulated with the aim of guiding development. For the augmented reality system, physical objects (i.e. domino pieces) will have labels to allow them to augment them digitally with graphics. For the tangible interface, interaction will be achieved with elements that resemble those that would be used in real life for the accomplishment of the activities to make interaction as natural as possible. Here, the positioning of the domino pieces will be free, so that the moves will be more flexible. Augmented reality offers the possibility of enriching the game in ways that would not normally be possible. Each domino piece and every move that a child performs will be captured by the camera to generate the game board in the virtual space, where each piece will deploy a representative 3D model of the corresponding animal. In this way, we will achieve an effect of augmenting the reality through the virtual space. The biggest contribution from tangible user interfaces is that both, real and virtual space are equally important to perform the interaction. In contrast, in augmented reality the role of the real world is


Fig. 3. The desk becomes the interaction space for two people.

Fig. 4. Interaction technologies. (A) Augmented reality domino. Markers are augmented with animal domino pieces. (B) Tangible domino. The domino pieces control positions of virtual animals

limited to allowing the visualisation of the virtual elements. On the other hand, with tangible interfaces there is a collaboration between both spaces in which the real environment is the main source of interaction with the application while the virtual environment is where feedback and visual effects of the game are given (Fig. 3). Figure 4 shows the mockup of the augmented reality application and the tangible interface. In Fig. 4A the domino pieces are laid on the desk and are detected by the camera. Then, on the screen, virtual animals are superimposed


Fig. 5. Interaction technologies. (A) Colouring interface. Left: level selection. Here, classification tasks increase in difficulty according to the level. Right: colouring canvas. (B) Left: domino pieces in the augmented reality domino. Right: animals walking by in the tangible domino.

on the domino pieces. Virtual animals superimposed on the domino pieces from our final implementation can be seen to the right of this figure. In Fig. 4B the tangible elements are laid on the desk and are detected by the camera. Then, on the screen, virtual animals are displayed in a space which seems an extension from the real space. To the right we can see a virtual cow from our final implementation, which is controlled by one of our tangible elements. Note that in the tangible domino, the real tangible object is not visible in the graphical output as the graphical output is considered as an extension of the real space.

6 Conclusion and Further Work

From related work, we have evidence of the potential of novel interaction metaphors such as art, augmented reality, and tangible user interfaces applied to language skills learning for the hearing impaired. The requirements of our three applications were obtained with the help of therapists who explained to us the activities they carry out during therapy sessions. We are proposing new activities (art) to achieve some language skills, or expanding traditional activities to the digital world to demonstrate the benefits (the domino games). The layered model of the architectures has allowed easy and flexible construction of the applications, and the heuristic validation evidences a possibly good usability. Nevertheless, we still need to perform a final validation of the three systems among both therapists and patients to demonstrate real usability and language learning potential. Figure 5 shows the actual state of development of the applications.

Acknowledgements. This work is part of the project No. 125174455451, titled: Apoyo a la Terapia de Rehabilitación del Lenguaje Oral y Escrito en Niños con


Discapacidad Auditiva. This project is funded by the Departamento Administrativo de Ciencia, Tecnología e Innovación de la República de Colombia (COLCIENCIAS). We would like to acknowledge the therapists from the Instituto para Niños Ciegos y Sordos, Cali, Colombia, and the Destino Research Group, Pontificia Universidad Javeriana, Cali.

References

1. Rincón, L., Villay, J., Martínez, J., Castillo, A.D., Portilla, A.Y., Navarro, A.: Un videojuego para apoyar la terapia del lenguaje: el caso de la descripción estática. In: Congreso Iberoamericano de Tecnologías de Apoyo a la Discapacidad, pp. 597–605 (2017). ISSN 2619-6433
2. Navarro, A., et al.: Talking to Teo: video game supported speech therapy. In: Entertainment Computing (2014). ISSN 1875-9521
3. Cano, S., Collazos, C.A., Manresa-Yee, C., Peñeñory, V.: Principles of design for serious games to teaching of literacy for children with hearing disabilities. In: Moreno, L., de la Rubia Cuestas, E.J., Penichet, V.M.R., García-Peñalvo, F.J. (eds.) Proceedings of the XVII International Conference on Human Computer Interaction, Article 6, p. 2. ACM, New York (2016). https://doi.org/10.1145/2998626.2998650
4. Tokuhisa, S., Kamiyama, Y.: The world is Canvas: a coloring application for children based on physical interaction. In: Proceedings of the 9th International Conference on Interaction Design and Children. IDC 2010, pp. 315–318. ACM, New York (2010). https://doi.org/10.1145/1810543.1810601
5. Mich, O.: E-drawings as an evaluation method with deaf children. In: The Proceedings of the 13th International ACM SIGACCESS Conference on Computers and Accessibility. ASSETS 2011, pp. 239–240. ACM, New York (2011). https://doi.org/10.1145/2049536.2049586
6. Mirzaei, M.R., Ghorshi, S., Mortazavi, M.: Using augmented reality and automatic speech recognition techniques to help deaf and hard of hearing people. In: Proceedings of the 2012 Virtual Reality International Conference. VRIC 2012, Article 5, p. 4. ACM, New York (2012). https://doi.org/10.1145/2331714.2331720
7. Shen, R., Terada, T., Tsukamoto, M.: A system for visualizing sound source using augmented reality. In: Khalil, I. (ed.) Proceedings of the 10th International Conference on Advances in Mobile Computing & Multimedia. MoMM 2012, pp. 97–102. ACM, New York (2012). https://doi.org/10.1145/2428955.2428979
8. Iversen, S., Kortbek, K.J., Nielsen, K.R., Aagaard, L.: Stepstone: an interactive floor application for hearing impaired children with a cochlear implant. In: Proceedings of the 6th International Conference on Interaction Design and Children. IDC 2007, pp. 117–124. ACM, New York (2007). https://doi.org/10.1145/1297277.1297301
9. Billinghurst, M., Hirokazu, K., Poupyrev, I.: Tangible augmented reality. In: ACM SIGGRAPH ASIA Courses (2008)
10. Hartson, R., Pyla, S.: The UX Book: Process and Guidelines for Ensuring a Quality User Experience. Morgan Kaufmann, Burlington (2012). ISBN 10: 0123852412
11. Gelman, D.L.: Design For Kids: Digital Products for Playing and Learning. Rosenfeld Media, Brooklyn (2014)
12. Hughes, J.F., et al.: Computer Graphics: Principles and Practice, 3rd edn. Addison-Wesley Professional, Boston (2013). ISBN 10: 0321399528

A Training Algorithm to Reinforce Generic Competences in Higher Education Students

Sara Muñoz(B), Oscar Bedoya, Edwin Gamboa, and María Trujillo

Universidad del Valle, Cali, Colombia
{sara.munoz,oscar.bedoya,edwin.gamboa,maria.trujillo}@correounivalle.edu.co

Abstract. In recent years it has become notable that Colombian undergraduate students have deficiencies in generic competences such as languages, mathematics and quantitative reasoning. These competences are basic skills that students must develop throughout an undergraduate program. Moreover, these competences are determinants of students' learning performance throughout an undergraduate program. In this paper, we propose a gamification strategy based on the self-determination theory as a tool to motivate undergraduate students to train and improve generic competences. Gamification has been successfully used in education on several occasions; however, it has not been used to reinforce generic competences. Additionally, our strategy includes an approach to reinforce knowledge by identifying the weakest competences of a student and providing him/her with the possibility of training on specific subjects.

Keywords: Education · Gamification · Generic competences · Undergraduate students · Reinforcement

1 Introduction

In [7], the authors of "Deserción estudiantil en la educación superior colombiana" highlight that one of the main problems faced by the Colombian higher education system concerns the high levels of academic desertion in undergraduate education. Despite the increment in coverage and admission rates to higher education in recent years, the number of students who complete their higher education is low, evidencing that a large number of them drop out, mainly in the first semesters. From this point of view, the Dirección de Nuevas Tecnologías y Educación Virtual (DINTEV), the entity responsible for involving ICT (information and communication technology) in the university's educational processes, proposed the development of a pedagogical intervention with the use of ICT to strengthen the generic competences of critical reading and quantitative reasoning for students of Universidad del Valle. That intervention is framed in the construction of a Virtual Skills Gymnasium. This platform aims to support three levels of student knowledge. The first level focuses on students in first semesters


who have deficiencies in the knowledge acquired at secondary school, the second level focuses on students with intermediate concepts who enter and deepen in their areas of knowledge, and the last level focuses on students with advanced knowledge who are conducting research and/or a degree work. In this paper, we describe related works in Sect. 2. In Sect. 3, we propose a gamification strategy based on the self-determination theory and a training algorithm, which is an approach to reinforce students' knowledge by identifying the weakest competences. In Sects. 4 and 5, we present and discuss preliminary results of an evaluation of the training zone algorithm. Finally, we summarize the results of this work and draw conclusions in Sect. 6.

2 Related Work

In the context of higher education, gamification has been mainly employed on educational platforms to motivate users to use them. For instance, in [9] a mobile web application called Hortari was developed to motivate and engage students in learning some subjects. The application allows a student to view announcements given by a teacher and see a leaderboard. Each lesson is composed of a game in which the student should find and read a set of books. After finding and reading a book, the student gets a certain number of points. The results of an evaluation suggest that the application addresses students' learning objectives clearly. In [10], a gymnasium-laboratory in higher education was developed to allow collection, classification, evaluation and transmission of materials created by teachers. In the gymnasium-laboratory, students can find tutorials on the topics that cause them most difficulties. Each tutorial contains interactive exercises, videos, examples of past exams (some of them with solution and interactive explanation), course materials, simulators and various activities to develop mathematical thinking or specific objectives of the course topics. However, the authors do not include evidence of the effect of using the application with students. In [2], a tool called CodeTraining is proposed to motivate students to improve their programming skills. The platform allows instructors to create and edit new programming exercises, and students to take available courses and solve programming exercises. They are allowed to compile and run code on the web interface, receiving instant feedback. After solving problems, students receive points based on the difficulty level of the problem (normal, hard, and advanced). Also, the platform handles two types of leaderboards: a course leaderboard, which filters users by course, and a global leaderboard, which shows all ranked students considering the total points earned across the several courses. In order to evaluate the efficiency of the system, some tests were carried out involving 40 students from the Culiacan Technological Institute. However, the obtained results are not presented in that research. In [3], a zombie-themed evidence-based medicine game was developed to allow medical students to test their EBM (evidence-based medicine) skills. The game is a "choose your own adventure" style that takes students through a scenario


where a disease outbreak is taking place and a resident is asked to use evidence-based medicine skills to select a screening and diagnostic tool to use on potentially infected patients. Within the story, the resident uses her evidence-based medicine skills to choose a screening and diagnostic tool. This game is an example of a self-assessment tool for students. However, as the game was designed as a proof-of-concept only, the story is left at a cliffhanger, leaving room for future expansion. The game has been viewed over 443 times by 343 unique users between April 2016 and May 2017. Nevertheless, this research does not include any evaluation of the effect of the game on students' learning process. In [12], Duolingo, a popular language learning application, is described. The platform uses a playfully illustrated, gamified design that combines point-reward incentives with implicit instruction, mastery learning, explanations and other best practices. Also, the application uses strength meters to visualize the student model and a half-life regression based algorithm that helps a student remember what he/she has learned and identifies the vocabulary that is most difficult for him/her to master. In that research, two experiments were carried out and both concluded with successful results.

3 Proposed Approach

The results of our work will be integrated into a project called the Virtual Skills Gymnasium, which focuses on reinforcing the generic competences of quantitative reasoning and critical reading of higher education students. For the development of this project, a team of experts in pedagogy designed a bank of questions that students must answer throughout the game. These questions are intended to strengthen one or more of the generic competences. A story was composed as a motivation element. The story begins in a destroyed world with three environments, each one corresponding to one civilization. First, students can select one of three characters. Next, students are guided to an initial questions area, where the first questions to be answered are presented. The platform's objective is to reconstruct the destroyed areas of the civilizations. As students answer questions correctly, they gain resources to rebuild the world's areas and new paths are opened to more complex questions.

3.1 A Gamification Strategy Based on the Self-determination Theory

The Virtual Skills Gymnasium is composed of a set of questions per area and a training zone. The area of questions is formed by different zones, one of them being the training zone. Therefore, the question bank is divided into two parts: the questions that will be answered in the main question area and the questions that will be answered in the training zone. Below, we describe the training zone and a gamification strategy to motivate students to use it. In addition to the story of the main questions area, we propose a gamification strategy to motivate students to work on their weakest competences. For


the construction and design of the gamification strategy we use the structure proposed in [1]. The gamification strategy consists of four mechanics: a training zone, where students have the opportunity to improve the skills in which they have the greatest difficulty; a shop, where students can exchange earned points for tools or power-ups that will help them to overcome certain challenges in the main question area; a badge module, which presents different tasks that students must perform to win medals; and a skills chart, which measures the generic competences of a student according to his/her performance throughout the application. This strategy was designed based on the self-determination theory, described by Daniel Pink in his book "The Surprising Truth About What Motivates Us" [11]. According to [13], this theory considers humans as inherently proactive, with a strong internal desire for growth. With this idea, the external environment should support humans' internal motivators to avoid frustration. These internal motivators are considered innate needs for growth and well-being:

– Competence or mastery: People are attempting to be effective in dealing with the external environment.
– Relatedness: Involves social connection with family, friends and others. It can also be expressed as a desire to "make a difference" in a group.
– Autonomy: The innate need to feel in charge of life, while doing what is considered meaningful and in harmony with the values of each person.

Training Zone. This module is intended to identify the sub-competences of the generic competences in which students have performed poorly. For this purpose, a selection of questions is made according to the deficiencies of students, which are identified based on their performance in the main questions area. From this selection, a small set of questions is presented to students based on the competences that they have unlocked in the main questions area; priority is given to those questions that correspond to the weakest competences of students. Thus, this module is a tool to enhance students' weak competences and review competences unlocked in the main questions area. With regard to the self-determination theory, this module is a way to improve the player's mastery motivator, since a student may feel that the platform provides him/her the necessary tools to reach learning goals. The training module is a tool that students can use to improve their performance and gain mastery in generic competences. According to [4], gamification and learning are a natural fit; mastery is a strong motivator, and we all have an innate desire to improve. People are often inspired to work toward mastering a particular skill or building their knowledge; because of this, the training zone will help students to find their path to success.

Badge Module. In this module we propose a series of medals and awards that students obtain according to their performance in the application. These badges are distributed throughout the world, relate students' performance to


narrative, and recognize their effort in the application. Badges are also proposed as an alternative path made up of more complex challenges, which students are free to accomplish. According to [4], only badges that are meaningful to people can be motivating; thus, when badges recognize achievements, engagement may be fostered. Badges have a special meaning in developing skills because they represent micro-credentials that can be used to certify skills attainment [4]. Additionally, this module is part of a positive feedback system. Positive feedback systems are described in [8] as supportive, encouraging, and emphasizing strengths. Therefore, our proposed badge module is aligned with the three needs of the self-determination theory.

Shop Module. In this module students can exchange previously earned points for power-ups that serve as support tools to overcome some of the challenges proposed in the Virtual Skills Gymnasium; this mechanic is oriented to give value to the points obtained by the player.

Skills Module. This module is composed of a diamond-shaped graph where the performance of the student's skills is measured. Also, this module is part of the feedback system of the application; particularly, reinforcement feedback is provided [8], since it highlights weaknesses and poorly performed areas on which students should improve.

3.2 Training Algorithm

The goal of each question in the question bank is to reinforce one or more competences of the two generic competences (quantitative reasoning and critical reading). The goals of the training zone are to strengthen the competences in which students have difficulty and to rehearse competences that they have already reviewed. To that aim, each student is represented as a vector that contains a success rate for each competence in the training zone. The success rate is a numerical value between 0.0 and 1.0 and represents the probability that a student will correctly answer a question of that competence. This vector is arranged in ascending order, which means that the history of the weakest competences for the corresponding student is located in the first positions of the vector. The history of each competence is made up of the number of associated questions that a student has answered correctly, the total number of associated questions that a student has answered, and a success rate (SR) formed by the quotient of the two previous values, Correct_q/Total_q. This success rate is updated each time a student answers a question in a session. Questions that a student should answer during a training session are selected from the competences that he/she has previously worked on. In case a student has not answered a question, it is assigned a very big integer value, so it is intended that a student will answer all the items of the training bank at least once. Otherwise, each answered question is scored to identify the student's weakest competences. The following variables are used to score questions:


– Student's most difficult competences:

  X_{c_i} = \frac{1}{SR_{C_i}} \cdot e^{\frac{TotalComp - h_{C_i}}{TotalComp}}    (1)

  where X_{c_i} represents the position of competence i in a student's history vector and SR_{C_i} is the success rate of the competence i. The vector of competences is ordered in ascending order; therefore, competences with the lowest success rate will be placed in the first positions and competences with the highest success rate in the last positions. The parameter X_{c_i} measures the relationship between the total number of competences, TotalComp, and h_{C_i}, where h_{C_i} is the index of the competence in the history vector. In case the student has SR_{C_i} = 0, we assign a value very close to 0, but not 0.

– Relation between the forgetting curve and the student's performance in a competence C_i:

  X_f = (1 - SR_{C_i}) e^{\frac{\Delta t}{s}}    (2)

  This equation comes from the forgetting curve proposed by Ebbinghaus in 1885, who states that memory decays exponentially over time [6]. In [12], the authors propose a model of spaced repetition for learning a new language, where they use \Delta t as the lag time (in days) since the item was last practiced. Following this approach, we take \Delta t as the number of days between the session date and the last date when the student answered the question. The variable s denotes the strength of memory; it is calculated using the variance between SR_h - SR_{h-1}, where SR_h is the success rate obtained by the student in the last session in which the student answered a question of competence C_i and SR_{h-1} is the success rate of competence C_i obtained prior to that session. In case there is no variance between the success rates of the student (SR_h - SR_{h-1} = 0), or the student has only one record for that competence, we set s = -1. Additionally, this equation is used in [5] for measuring knowledge proficiency in students.

– Number of questions for each competence available in the training bank:

  X_q = 2^{K_2 \left(\frac{CompQuestions}{TotalQuestions}\right)}    (3)

  This variable is aimed at balancing the number of questions in the bank for each competence; we intend to make a balanced selection of questions, distributing the questions of a competence along a training session. We use the constant K_2 to handle the range of the variable X_q, because some parameters (X_{c_i}, X_f) are exponential and have a higher growth rate in comparison with this variable, so X_q can lose weight against the variables X_{c_i} and X_f. Thus, the constant K_2 controls the range of values that the variable X_q can reach.

– Success rate of the competence and the difficulty level of the question:

  X_l = 2^{K_2 (1 - SR_{C_i}) + (MaxLevel - Level_q)}    (4)

where SR_{C_i} is the success rate of each competence i, MaxLevel is the highest level of difficulty of the questions and Level_q is the difficulty level of the question


being scored. The difficulty levels of the questions were defined by the team who designed the questions. For the critical reading questions, there are three difficulty levels, 1, 2 and 3, corresponding to minor, medium and major. For the quantitative reasoning questions, the difficulty level is between 1 and 5. The constant K_2 accomplishes the same purpose explained for the previous term X_q.

The final score for a question is composed of the sum of the above variables, as follows:

  X_t = K_1 X_{c_i} + X_q + X_f + X_l    (5)

  X_t = K_1 \frac{1}{SR_{C_i}} e^{\frac{TotalComp - h_{C_i}}{TotalComp}} + 2^{K_2 \left(\frac{CompQuestions}{TotalQuestions}\right)} + (1 - SR_{C_i}) e^{\frac{\Delta t}{s}} + 2^{K_2 (1 - SR_{C_i}) + (MaxLevel - Level_q)}    (6)

where K_1 and K_2 are constants that let us handle the range of the variables. Particularly, the constant K_1 allows us to add more relevance to the variable X_{c_i}, because the selection of questions in the training zone is mainly focused on weak competences. Afterwards, the scores of the questions are sorted in descending order and the first questions with the highest scores are taken to be answered in the current training zone session.
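A minimal sketch of the scoring and selection step is given below, with K_1 = K_2 = 10 as in the experiments. The question and history data structures (fields such as "success_rate", "rank", "days_since_last", "memory_strength") are illustrative assumptions, not the actual platform schema.

```python
# Sketch of Eqs. (1)-(6): score every answered question and pick the top ones
# for a training session; never-answered questions are given a very large score
# so that they are selected first.
import math

K1, K2 = 10, 10

def score_question(question, comp, total_comp, total_questions,
                   comp_questions, max_level):
    sr = comp["success_rate"] or 1e-6                      # SR = 0 is replaced by a value close to 0
    h = comp["rank"]                                       # index in the ascending history vector
    delta_t = comp["days_since_last"]                      # days since the question's competence was practiced
    s = comp["memory_strength"]                            # variance-based strength (or -1)

    x_ci = (1.0 / sr) * math.exp((total_comp - h) / total_comp)        # Eq. (1)
    x_f = (1.0 - sr) * math.exp(delta_t / s)                           # Eq. (2)
    x_q = 2 ** (K2 * (comp_questions / total_questions))               # Eq. (3)
    x_l = 2 ** (K2 * (1.0 - sr) + (max_level - question["level"]))     # Eq. (4)
    return K1 * x_ci + x_q + x_f + x_l                                 # Eqs. (5)-(6)

def select_for_training(questions, history, total_comp, total_questions,
                        comp_counts, max_level, session_size=5):
    scored = []
    for q in questions:
        comp = history.get(q["competence"])
        if comp is None:
            scored.append((float("inf"), q))               # unanswered: selected first
            continue
        scored.append((score_question(q, comp, total_comp, total_questions,
                                      comp_counts[q["competence"]], max_level), q))
    scored.sort(key=lambda pair: pair[0], reverse=True)    # descending score
    return [q for _, q in scored[:session_size]]
```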

4 Evaluation Simulation-Based Results

4.1 Data Simulation

In order to test the proposed algorithm, we generate simulated data. Pseudo-random numbers are generated with Python's random generator, which uses the Mersenne Twister to generate integer random numbers that follow a uniform distribution. The number of questions, competences and students created, the date range of the sessions and the evaluation day are parameters that are previously defined. First, a number of competences and questions are created; then, a competence is assigned to each question randomly. Thus, the number of questions corresponding to each competence is irregular. Each question contains a level, a description, a score and answer options. During the simulation, a number of students is created and then 5 to 12 sessions per student are generated. Afterwards, 1 to 10 random answers are included per student session. Each answer corresponds to one question; a question can be answered several times, but only once in a session. Finally, the history vector of each student is calculated according to the generated answers. For each student session, a history for each competence is calculated if the student has answered at least one question of that competence. Otherwise, the history of the competence is not calculated.
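The sketch below reproduces the spirit of this data generation with Python's random module (Mersenne Twister): 5 to 12 sessions per student and 1 to 10 answers per session. The entity fields and the 0.5 correctness probability are illustrative assumptions.

```python
# Minimal sketch of the simulated-data generation described above.
import random

def simulate(num_students=10, num_questions=40, num_competences=10, days=30, seed=42):
    rng = random.Random(seed)
    questions = [{"id": q, "competence": rng.randrange(num_competences),
                  "level": rng.choice([1, 2, 3])} for q in range(num_questions)]
    answers = []
    for student in range(num_students):
        for _ in range(rng.randint(5, 12)):                    # sessions per student
            day = rng.randint(1, days)
            asked = rng.sample(questions, rng.randint(1, 10))  # each question at most once per session
            for q in asked:
                answers.append({"student": student, "question": q["id"],
                                "competence": q["competence"], "day": day,
                                "correct": rng.random() < 0.5})
    return questions, answers

questions, answers = simulate()
print(len(answers), "simulated answers")
```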

4.2 Test with Simulated Data

In this research, two experiments were carried out; we set K1 and K2 equal to 10. The initial variables of the test are presented in Table 1.

Table 1. Test 1 parameters

Variables              | Value
Number of questions    | 40
Levels of questions    | 1, 2, 3
Number of competences  | 10
Number of answers      | 67
Range of date session  | 30 days
Day of evaluation      | Day 31

For evaluation purposes we name the competences with dummy names. The period of time of the answers is one month. 67 answers were generated using the initial data. Then, historic data for each student were calculated. Table 2 summarizes the initial information of the first experiment, the percentage of questions in each competence and the success rate of the student in each competence. Table 2 also shows if any questions from that competence have been selected for the training set.

Table 2. Test 1 environment

Competence  | Percentage of questions | Success rate | Selected
competence0 | 10.0%                   | 0.44         | No
competence1 | 5.0%                    | 0.50         | No
competence2 | 10.0%                   | 0.40         | No
competence3 | 12.5%                   | 0.43         | Yes
competence4 | 10.0%                   | 0.33         | Yes
competence5 | 15.0%                   | 0.57         | Yes
competence6 | 7.5%                    | 0.00         | Yes
competence7 | 10.0%                   | 0.50         | Yes
competence8 | 10.0%                   | 0.30         | No
competence9 | 10.0%                   | 0.40         | Yes

The portion of selected questions for each competence is shown in Fig. 1. As illustrated in Fig. 1 and in relation to Table 2, six competences were selected, 83% of which have a success rate between 0.00 and 0.50. Also, 90% of

A Training Algorithm to Reinforce Generic Competences in Students

209

Fig. 1. Selected questions for training in test 1

selected questions belong to those competences. Besides, availability of questions for selected competences is between 4 and 6 questions. The minimum number of questions for a competence was 2 for competence1 and the maximum is 6 for competence5. For the second experiment, we decrease the period of time, therefore we obtained less answers than in previous test. The initial variables of the simulation data are shown in Table 3. The range of possible sessions is 20 days and 53 answers were generated. Table 4 shows the percentage of questions, success rate of a student in each competence and if any questions from that competence have been selected for the training set. The obtained results are shown in Fig. 2. Table 3. Test 2 parameters Variables

Value

Number of questions

40

Levels of questions

1,2,3

Number of competences 10 Number of answers

53

Range of date session

20 days

Day of evaluation

Day 21

Table 4. Test 2 environment

Competence     Percentage of questions   Success rate   Selected
competence0    15.0%                     0.20           No
competence1    15.0%                     0.17           Yes
competence2    12.5%                     0.67           Yes
competence3    5.0%                      1.0            No
competence4    2.5%                      No history     No history
competence5    7.5%                      0.50           Yes
competence6    7.5%                      0.83           No
competence7    12.5%                     0.50           Yes
competence8    15.0%                     0.63           Yes
competence9    7.5%                      0.00           Yes

Fig. 2. Selected questions for training in test 2

The portion of selected questions for each competence is shown in Fig. 2. Six competences were selected; 66.67% of them have a success rate between 0.00 and 0.50, and 60% of the selected questions belong to those competences. Besides, the availability of questions for the selected competences is between 3 and 5; the minimum number of questions is 3 (competence5, competence7, competence9) and the maximum is 6 (competence0, competence1, competence8). As mentioned previously, the selection of the questions also depends on the forgetting curve and the level of each question.

5 Discussion

The goal of the proposed training algorithm is to balance all of the factors that affect the selection of the questions to be answered by students in a training session: the number of questions, the success rate of the student in each competence, the difficulty level of each question, and the forgetting curve. Therefore, a student has the opportunity to identify and improve his or her weaknesses and to review all of the competences. Nevertheless, the result of the algorithm depends on the number of questions in the training bank for each competence; if the bank has a low number of questions, the selection may become repetitive and boring for students. On the other hand, the results of the training equation can be dominated by the variables Xci and Xf, because these are exponential and therefore grow faster than the Xq and Xl parameters. Owing to this, we use the constant K2 to prevent the parameters Xq and Xl from losing weight. Furthermore, Xf could become dominant depending on the period of time considered during question selection. In that case, the selected questions may not belong to the weakest competences, but to the competences that a student has not reviewed for a long time.

6 Conclusions

We proposed a complementary gamification strategy based on self-determination theory that will be integrated into a platform called the Virtual Skills Gymnasium. We expect that the mechanics proposed in the gamification strategy allow the reinforcement of the generic competences of students and that students obtain better results in their academic performance. In addition, we propose a training zone supported by an algorithm that allows identifying and addressing students' weaknesses in generic competences. As a result, the training zone is presented as a space for practicing and improving generic competences.

The obtained results showed that the algorithm focuses on balancing the variables that influence the selection of the questions in the training zone. In this way, we aim for the algorithm to make the best selection of questions for students. However, the selection depends on factors such as the number of questions available in the training bank; thus, better results will be obtained with a large bank of training questions.

Our gamification strategy not only concentrates on motivating students to answer a set of questions, but also on training and reviewing their weakest competences, which are determined based on the students' performance and the forgetting curve. Thus, our approach may be effective in supporting students' needs of mastery and autonomy. We plan to carry out tests with users and real data, which will allow us to take a closer look at the effect of the gamification strategy and the training zone on students' performance.


Acknowledgments. The authors would like to thank the Dirección de Nuevas Tecnologías y Educación Virtual (DINTEV) for its collaboration and support in this work.


A Structure-from-Motion Pipeline for Topographic Reconstructions Using Unmanned Aerial Vehicles and Open Source Software

Jhacson Meza1, Andrés G. Marrugo1(B), Enrique Sierra1, Milton Guerrero3, Jaime Meneses3, and Lenny A. Romero2

1 Facultad de Ingeniería, Universidad Tecnológica de Bolívar, Cartagena, Colombia
[email protected]
2 Facultad de Ciencias Básicas, Universidad Tecnológica de Bolívar, Cartagena, Colombia
3 Grupo de Óptica y Tratamiento de Señales, Universidad Industrial de Santander, Bucaramanga, Colombia
http://opilab.unitecnologica.edu.co/

Abstract. In recent years, the generation of accurate topographic reconstructions has found applications ranging from geomorphic sciences to remote sensing and urban planning, among others. The production of high-resolution, high-quality digital elevation models (DEMs) requires a significant investment in personnel time, hardware, and software. Photogrammetry offers clear advantages over other methods of collecting geomatic information. Airborne cameras can cover large areas more quickly than ground survey techniques, and the generated photogrammetry-based DEMs often have higher resolution than models produced with other remote sensing methods such as LIDAR (Laser Imaging Detection and Ranging) or RADAR (Radio Detection and Ranging). In this work, we introduce a Structure from Motion (SfM) pipeline using Unmanned Aerial Vehicles (UAVs) for generating DEMs for performing topographic reconstructions and assessing the microtopography of a terrain. SfM is a computer vision technique that consists in estimating the 3D coordinates of many points in a scene using two or more 2D images acquired from different positions. By identifying common points in the images, both the camera position (motion) and the 3D locations of the points (structure) are obtained. The output from an SfM stage is a sparse point cloud in a local XYZ coordinate system. We edit the obtained point cloud in MeshLab to remove unwanted points, such as those from vehicles, roofs, and vegetation. We scale the XYZ point clouds using Ground Control Points (GCP) and GPS information. This process enables georeferenced metric measurements. For the experimental verification, we reconstructed a terrain suitable for subsequent analysis using GIS software. Encouraging results show that our approach is highly cost-effective, providing a means for generating high-quality, low-cost DEMs.


Keywords: Geomatics · Structure from Motion · Open source software

1 Introduction

The digital elevation model (DEM) is a three-dimensional visual representation of the topography of a terrestrial zone and is a commonly used geomatics tool for analyzing different land properties such as slope, height and curvature, among others. There are different technologies for the generation of DEMs, which include LIDAR (Laser Imaging Detection and Ranging), RADAR (Radio Detection and Ranging) and conventional theodolites [1]. However, these techniques often do not offer enough spatial resolution to recover the terrain microtopography. On the one hand, it is frequently difficult to accurately measure intricate drain networks with conventional techniques because they are within the measurement resolution. On the other, they are also difficult to measure in the field due to access limitations.

As an alternative to these methods, Unmanned Aerial Vehicles (UAVs) equipped with high-resolution cameras have recently attracted the attention of researchers [2]. The UAVs acquire many images of an area of interest and, using stereo-photogrammetry techniques, generate a terrain point cloud. This point cloud represents an accurate DEM of the terrestrial zone [3]. However, UAV operation for precise digital terrain model estimation requires certain flight parameters and captured-image characteristics, among other aspects [4].

Stereo-photogrammetry techniques consist in estimating the 3D coordinates of several points in a scene using two or more 2D images taken from different positions. Within these images, common points are identified, that is, the same physical object point as seen in different images. Then a line-of-sight or ray is constructed from the camera location to the detected object point. Finally, the intersection between these rays is calculated; this process, known as triangulation, yields the three-dimensional location of the physical point. By doing the above for a significant number of points in the scene, it is possible to obtain a point cloud in three-dimensional space which is representative of the object or the surface. To obtain a point cloud, or to recover structure, correct correspondences between the different images should be obtained, but incorrect matches often appear, for which the triangulation fails. Therefore, it is often carried out in a robust approach. Recently, photogrammetric methodologies have been proposed to address the robust estimation of structure from multiple views, such as Structure from Motion (SfM) [5] and Multi-View Stereo (MVS) [6]. On the one hand, SfM is a methodology that, using a single camera that moves in space, allows us to recover both the position and orientation of the camera (motion) and the 3D location of the points seen in different views (structure). On the other, MVS allows us to densify the point cloud obtained with SfM.

Nowadays there are several commercial software packages, like Agisoft [7] or Pix4D [8], that allow obtaining dense 3D point clouds. However, being closed-code applications, they do not favor research reproducibility, and the code cannot be modified. In this paper, we propose a processing tool, or reconstruction pipeline, for geomatics applications based on open source software and libraries. The pipeline is mainly based on the OpenSfM [9] and OpenDroneMap [10] libraries.

Fig. 1. The testing site.

2 Method

In this work, we propose a processing pipeline for generating a digital elevation model, as depicted in Fig. 2. Our strategy consists of four stages (one of them optional) based mainly on the OpenSfM and OpenDroneMap libraries. To illustrate our approach for DEM generation, we have chosen a specific piece of land located in the south zone of the Campus Tecnológico of the Universidad Tecnológica de Bolívar (Cartagena de Indias, Colombia), as shown in Fig. 1. We acquired 140 images with a DJI Phantom 3 Professional drone.

The camera calibration stage is an optional step in the proposed methodology; for this reason, we show this block with a dotted line in Fig. 2. We carried out the camera calibration with the OpenCV library [11] to estimate the intrinsic camera parameters of the drone camera. The first stage consists in setting the flight path for the drone to carry out the image acquisition; we used the Altizure application [12] to set the flight strategy. The second stage performs the 3D reconstruction process. This stage is based mainly on the SfM photogrammetric technique implemented with the OpenSfM library, which produces a scene point cloud. If required, we can edit the obtained point cloud in MeshLab [13], an open source system for processing and editing 3D triangular meshes and point clouds. The final stage is the post-processing of the point cloud obtained with OpenSfM, which is done with the OpenDroneMap library. In this part, we convert the


Fig. 2. Reconstruction process for DEM generation: camera calibration with OpenCV (optional step). First stage consists of image acquisition with a drone and Altizure app. The second stage is based on the OpenSfM library for the 3D reconstruction. The final stage is based on the OpenDroneMap library for post-processing the point cloud.

point cloud to LAS format; we generate a 3D surface with texture, and with the captured images we generate an orthophoto mosaic.

2.1 OpenCV Stage: Camera Calibration

Camera calibration is a fundamental prerequisite for metric 3D sparse reconstruction from images [3]. It is necessary to know the intrinsic and extrinsic parameters of a camera to estimate the projection matrix P. With this matrix, we can find the position x in the image plane of a three-dimensional point, as given by Eq. (1):

$$ x = PX = K[R \mid t]X = \underbrace{\begin{bmatrix} a_x & s & x_0 \\ 0 & a_y & y_0 \\ 0 & 0 & 1 \end{bmatrix}}_{K} \underbrace{\begin{bmatrix} r_{11} & r_{12} & r_{13} & t_x \\ r_{21} & r_{22} & r_{23} & t_y \\ r_{31} & r_{32} & r_{33} & t_z \end{bmatrix}}_{M_{ext}} \underbrace{\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}}_{X}, \qquad (1) $$

where the extrinsic parameter matrix Mext describes the camera orientation and consists of a rotation matrix R and a translation vector t, and the intrinsic parameter matrix K contains the camera internal parameters, namely the focal length in the x and y directions (ax and ay), the skew (s) and the optical center (x0 and y0). In addition, the camera lens radial distortion parameters k1 and k2, which are not included in the matrix K, are important for compensating the geometric distortions in the images caused by the camera lens. The mathematical model for the lens radial distortion is given by [14]

$$ \begin{bmatrix} x_d \\ y_d \end{bmatrix} = \left[ 1 + k_1 r^2 + k_2 r^4 \right] \begin{bmatrix} x_c/z_c \\ y_c/z_c \end{bmatrix}, \qquad r^2 = \left(\frac{x_c}{z_c}\right)^2 + \left(\frac{y_c}{z_c}\right)^2, \qquad (2) $$

where (xd, yd) are the distorted image coordinates and (xc, yc, zc) are the normalized camera coordinates. The camera position and orientation, i.e., the camera extrinsic parameters, are computed in the SfM pipeline; therefore, only the intrinsic parameters have to be known before the reconstruction process. Usually, the calibration of aerial cameras is performed beforehand [3]. For this work, the camera calibration was carried out with OpenCV by acquiring images of a flat black-and-white chessboard. The intrinsic parameters required by OpenSfM are the focal ratio and the radial distortion parameters k1 and k2. The focal ratio is the ratio between the focal length in millimeters and the camera sensor width, also in millimeters. This calibration step is optional because OpenSfM can use the values stored in the EXIF information of the images, and these parameters can be optimized during the reconstruction process.
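A minimal calibration sketch along these lines, using OpenCV's standard chessboard routines, is shown below; the board size, image folder and the relation used for the focal ratio are assumptions for illustration, not the authors' exact procedure:

```python
import glob
import cv2
import numpy as np

# Inner-corner layout of the printed chessboard and its 3D object points (Z = 0 plane).
pattern = (9, 6)  # assumed board size
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)

obj_points, img_points = [], []
for fname in glob.glob("calibration/*.JPG"):  # hypothetical folder of chessboard photos
    gray = cv2.imread(fname, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# Intrinsics K and distortion coefficients [k1, k2, p1, p2, k3].
rms, K, dist, _, _ = cv2.calibrateCamera(obj_points, img_points,
                                         gray.shape[::-1], None, None)

k1, k2 = dist.ravel()[:2]
# focal_mm / sensor_width_mm equals fx (in pixels) divided by the image width in pixels.
focal_ratio = K[0, 0] / gray.shape[1]
print(f"focal ratio = {focal_ratio:.4f}, k1 = {k1:.5f}, k2 = {k2:.5f}")
```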

Fig. 3. Screenshot of Altizure, the mobile application used to implement the flight strategy for image acquisition.

2.2 Altizure Stage: Image Acquisition

Three-dimensional reconstruction algorithms require images of the object or scene of interest acquired from different positions. There has to be overlap between the acquired images to be able to reconstruct an area; any specific region must be observable in at least three images to be reconstructed [4]. Usually, image-based surveying with an airborne camera requires a flight mission, which is often planned with dedicated software [3]. In this work, we used Altizure [12], a free mobile application that allows us to design flight paths specified in a satellite view based on Google Maps [15], as shown in Fig. 3. Further, with this application we can adjust specific parameters such as the flight height, the camera angle, and the forward and side overlap percentages between images.
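When choosing these parameters, it helps to estimate the ground sampling distance (GSD) and the spacing between shots for a desired forward overlap. The sketch below uses the standard GSD formula with nominal, assumed values for the Phantom 3 Professional camera (they are illustrative, not measured specifications):

```python
# Rough flight-planning numbers; all camera values below are nominal assumptions.
focal_mm = 3.61          # focal length
sensor_w_mm = 6.32       # sensor width
img_w_px, img_h_px = 4000, 3000
height_m = 60.0          # flight height above ground
front_overlap = 0.8      # 80 % forward overlap

# Ground sampling distance (metres per pixel) and image footprint on the ground.
gsd = (sensor_w_mm * height_m) / (focal_mm * img_w_px)
footprint_w = gsd * img_w_px
footprint_h = gsd * img_h_px  # assumed to lie along the flight direction

# Distance between consecutive shots along the flight line for the chosen overlap.
shot_spacing = footprint_h * (1 - front_overlap)
print(f"GSD ~ {gsd * 100:.1f} cm/px, spacing between photos ~ {shot_spacing:.1f} m")
```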

Fig. 4. Pipeline reconstruction: each image goes through the stages of feature detection, point matching, camera pose estimation (motion), sparse reconstruction (structure) and, finally, dense reconstruction using multi-view stereo (MVS).

2.3 OpenSfM Stage: Pipeline Reconstruction

The following stage is the 3D reconstruction process, which we implemented with the OpenSfM library. This library is based on the SfM and MVS techniques. In Fig. 4 we show a workflow diagram for the 3D reconstruction process. First, the algorithm searches for features in the input images. A feature is an image pattern that stands out from its surrounding area and is likely to be identifiable in other images [16]. The following step is to find point correspondences between the images. Finally, the SfM technique uses the matched points to compute both the camera orientation and the 3D structure of the object. These steps lead to a sparse point cloud, which only includes the best-matched features from the input images. It is possible to obtain a denser point cloud using MVS; this additional process increases the number of points, resulting in a more realistic view of the scene [17]. The obtained reconstruction is then georeferenced by converting the XYZ coordinates of each point to GPS coordinates. Finally, we used MeshLab for the removal of objects which are not of interest and for the visualization of the obtained point cloud in PLY format.

Feature Detection and Matching. The search for characteristics, or feature detection, consists in calculating distinctive points of interest in an image which are readily identifiable in another image of the same scene. The feature detection process should be repeatable, so that the same features are found in different photographs of the same object. Moreover, the detected features should be unique, so that they can be told apart from each other [18].


(a) Feature detection using HAHOG algorithm.

(b) Matching of detected features in two photographs of the same scene.

Fig. 5. Key points detected with HAHOG algorithm (a) and feature matching resulting from FLANN algorithm (b). (Color figure online)
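The following sketch illustrates the detection-and-matching step of Fig. 5 using OpenCV's SIFT detector and a FLANN-based matcher. OpenSfM's default HAHOG detector is not part of OpenCV, so this is an analogous example rather than the library's internal code, and the image file names are hypothetical:

```python
import cv2

# Two overlapping aerial images (hypothetical file names).
img1 = cv2.imread("IMG_0001.JPG", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("IMG_0002.JPG", cv2.IMREAD_GRAYSCALE)

# Detect keypoints and compute descriptors (SIFT as a stand-in for HAHOG).
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# FLANN-based matching (kd-tree index) with Lowe's ratio test to reject ambiguous matches.
flann = cv2.FlannBasedMatcher(dict(algorithm=1, trees=5), dict(checks=50))
matches = flann.knnMatch(des1, des2, k=2)
good = [m for m, n in (pair for pair in matches if len(pair) == 2)
        if m.distance < 0.7 * n.distance]
print(f"{len(good)} putative correspondences between the two views")
```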

The detector used with the OpenSfM library is HAHOG (the combination of the Hessian-Affine feature point detector and the HOG descriptor), but apart from this, the AKAZE, SURF, SIFT and ORB detectors are also available [19]. These detectors calculate feature descriptors that are invariant to scale or rotation; this property enables matching features regardless of orientation or scale. In Fig. 5a we show with red marks the detected features for a given image. Using these descriptors, we can find correspondences between the images, that is, identify the 3D points of the same physical object which appear in more than one image. This process is implemented with the FLANN algorithm [20] available in the OpenSfM library. We can see an example of this process in Fig. 5b.

Sparse (SfM) and Dense (MVS) Reconstruction. The SfM technique uses the matched points uij for calculating both the camera pose, to compute the projection matrix Pi, and the 3D position Xj of specific points through triangulation. The triangulation process gives an initial estimation of Pi and Xj

which usually is refined using iterative non-linear optimization to minimize the reprojection error given by

$$ E(P, X) = \sum_{i=1}^{n} \sum_{j=1}^{m} d(u_{ij}, P_i X_j)^2, \qquad (3) $$

where d(x, y) denotes the Euclidean distance, n is the total number of images and m the number of 3D points. This minimization problem is known as bundle adjustment [21], and it yields a sparse point cloud. This approach is implemented in OpenSfM with an incremental reconstruction pipeline, which consists of performing an initial reconstruction with only two views and then enlarging this initial point cloud by adding other views until all have been included. This process yields a point cloud like the one shown in Fig. 6a.

With the sparse 3D reconstruction, we can generate dense point clouds with the MVS technique. There are many approaches to MVS, but according to Furukawa and Ponce [22], they can be classified into four categories according to the representation of the generated scene: voxels, polygonal meshes, multiple depth maps, and patches. In OpenSfM, the MVS approach is multiple depth maps, creating one for each input image. The obtained depth maps are merged into a single 3D representation of the scene [23], obtaining a dense reconstruction as shown in Fig. 6b.

Fig. 6. Reconstruction outputs from OpenSfM library. (a) Sparse point cloud. (b) Dense point cloud.

Georeferencing of Reconstruction. The point clouds obtained in the SfM and MVS processes are in a topocentric XYZ coordinate system. This coordinate system uses the observer's location as the reference point, which is set as the average value of the GPS coordinates of all photographs, as long as all the images have this information. With this reference point, we convert the point cloud from topocentric XYZ coordinates to GPS coordinates. From this transformation, we obtain the georeferenced point cloud, where each point has an associated GPS position. Using the Google Earth tool, we can locate the point cloud in the real world, as shown in Fig. 7.
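As an illustration of this georeferencing step, the sketch below converts local east-north-up offsets to GPS coordinates with a simple flat-earth approximation around the reference point; it is only a rough stand-in for the proper topocentric conversion performed by OpenSfM, and the reference coordinates are hypothetical:

```python
import math

# Reference point: average GPS position of all photographs (hypothetical values).
lat0, lon0, alt0 = 10.3702, -75.4661, 30.0

def topocentric_to_gps(east, north, up):
    """Flat-earth approximation: adequate only for small areas around the reference."""
    lat = lat0 + north / 111_320.0                                   # metres per degree of latitude
    lon = lon0 + east / (111_320.0 * math.cos(math.radians(lat0)))   # shrinks with latitude
    return lat, lon, alt0 + up

print(topocentric_to_gps(25.0, -40.0, 3.2))
```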


Fig. 7. Georeferenced sparse point cloud seen from Google Earth.

3 OpenDroneMap (ODM) Stage: Post-processing

With the sparse and dense reconstructions generated by OpenSfM in PLY format, we use OpenDroneMap to post-process the point cloud. The post-processing consists in generating a georeferenced point cloud in LAS format, a georeferenced 3D textured mesh (Fig. 8a) and an orthophoto mosaic in GeoTIFF format (Fig. 8b).

The LAS format is a standard binary format for the storage of LIDAR data, and it is the most common format for exchanging point cloud data. At the end of the OpenSfM process, the sparse and dense point clouds are generated in PLY format with georeferenced coordinates. OpenDroneMap converts these files into a georeferenced point cloud in LAS format, which can be used in other GIS software for visualization or ground analysis.

The 3D textured mesh is a surface representation of the terrain that consists of vertices, edges, faces and the texture from the input images projected onto it. ODM creates a triangulated mesh using the Poisson algorithm, which uses all the points of the dense point cloud and their respective normal vectors from the PLY file to interpolate a surface model, generating a welded manifold mesh in the form of a PLY file. Finally, the texture from the input images is projected onto the mesh, generating a 3D textured mesh in OBJ format.

An orthophoto is an orthorectified aerial image, i.e., there are no geometric distortions, and the scale is uniform throughout the image. The GeoTIFF format allows embedding the georeferencing information within an orthophoto in TIFF format generated with all the images used in the reconstruction process. The resulting orthophoto allows us to measure distances accurately and can be used as a background image for maps in applications using GIS software.
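For instance, the LAS file exported by ODM can be loaded for further analysis with any LAS reader; the sketch below assumes the laspy package (version 2 or later) and a hypothetical file name:

```python
import laspy  # assumed dependency; any LAS reader would do
import numpy as np

# Load the georeferenced point cloud exported by OpenDroneMap (hypothetical file name).
las = laspy.read("odm_georeferenced_model.las")
points = np.column_stack([las.x, las.y, las.z])

print(f"{len(points)} points")
print("elevation range: %.2f m to %.2f m" % (points[:, 2].min(), points[:, 2].max()))
```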


(a) Georeferenced 3D textured mesh.

(b) Orthophoto made with 140 images.

Fig. 8. Output files from OpenDroneMap.

4 Ground Analysis

The study area has little vegetation, with some relatively flat regions and others with a significant slope. The photographs acquired from this study area were processed as explained in the previous sections, and we obtained the different 3D models. With the LAS file and the orthophotography produced with OpenDroneMap, we can carry out many different terrain analyses. In this work, we performed a basic elevation analysis and generated land contour lines.

From the LAS file information, we generated the terrain digital elevation model (DEM) shown in Fig. 9. In this model, we can see the different height levels from the lowest (blue) to the highest (red); in total there are nine elevation levels, each with a different color. In the figure, we can also see that the lower zone is relatively flat, because most of that area is blue. In the part where there is a steep slope, we see different height levels shown in different colors, which shows that the area is not flat. In the upper zone, among the orange and red regions, there is a flat zone in orange which is a narrow dirt road (visible in the orthophoto of Fig. 8b and in the textured mesh of Fig. 8a). In red, we detect trees, which are the highest elements in the reconstructed area of interest.

Fig. 9. Digital elevation model. (Color figure online)

Using the LAS file and the orthophotography obtained with ODM, we generated terrain contour lines (Fig. 10) by placing the DEM on top of the orthophoto. In this figure we can see that the contour lines are much closer to each other in the part of the terrain with the steepest slope than in other zones, mainly because this area is not flat and the height changes faster. In contrast, since the road is slightly flat, the contour lines on it are more separated from each other.

Fig. 10. Contour lines of the reconstructed land.
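As an illustration of how such an elevation analysis can be reproduced outside a GIS package, the sketch below grids the LAS elevations into a simple raster and draws contour lines with matplotlib; laspy, the 1 m cell size and the file name are assumptions, not the workflow used in the paper:

```python
import numpy as np
from scipy.stats import binned_statistic_2d
import matplotlib.pyplot as plt
import laspy

las = laspy.read("odm_georeferenced_model.las")  # hypothetical file name
x, y, z = np.asarray(las.x), np.asarray(las.y), np.asarray(las.z)

# Average the elevations inside 1 m grid cells (empty cells become NaN).
cell = 1.0
x_bins = np.arange(x.min(), x.max() + cell, cell)
y_bins = np.arange(y.min(), y.max() + cell, cell)
dem, _, _, _ = binned_statistic_2d(x, y, z, statistic="mean", bins=[x_bins, y_bins])

# Nine elevation levels, analogous to the DEM of Fig. 9, rendered as contour lines.
plt.contour(dem.T, levels=9)
plt.savefig("contours.png")
```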

5 Conclusions

In this work, we have shown a methodology for 3D terrain reconstruction based entirely on open source software. The georeferenced point clouds, the digital elevation models and the orthophotographs resulting from the proposed processing pipeline can be used in different geomatics and terrain analysis software to generate contour lines and, for instance, to perform surface runoff analysis. Therefore, the combination of open source software with unmanned aerial vehicles is a powerful and inexpensive tool for geomatic applications.

In the bundle adjustment process discussed in Sect. 2.3 and given by Eq. (3), it is not possible to reconstruct the scene at real scale using only matched points from the images. This restriction is why it is necessary to provide additional information that can be used as initialization for the optimization process to recover the scale. This information can be an approximate position of the camera or the world position of specific points known as Ground Control Points (GCP). In our reconstruction process we did not use GCPs; only the GPS position of the camera measured by the drone was used as the initialization of the camera pose, and this measurement is not highly accurate. As future work, we want to use GCPs in addition to the camera GPS positions to compare both reconstructions and assess the elevation error.

Acknowledgement. This work has been partly funded by Universidad Tecnológica de Bolívar project (FI2006T2001). E. Sierra thanks Universidad Tecnológica de Bolívar for a Master's degree scholarship.

References

1. Nelson, A., Reuter, H., Gessler, P.: DEM production methods and sources. Dev. Soil Sci. 33, 65–85 (2009)
2. Carbonneau, P.E., Dietrich, J.T.: Cost-effective non-metric photogrammetry from consumer-grade sUAS: implications for direct georeferencing of structure from motion photogrammetry. Earth Surf. Process. Land. 42, 473–486 (2016)
3. Nex, F., Remondino, F.: UAV for 3D mapping applications: a review. Appl. Geomat. 6(1), 1–15 (2014)


4. James, M., Robson, S.: Straightforward reconstruction of 3D surfaces and topography with a camera: accuracy and geoscience application. J. Geophys. Res. Earth Surf. 117(F3) (2012)
5. Fonstad, M.A., Dietrich, J.T., Courville, B.C., Jensen, J.L., Carbonneau, P.E.: Topographic structure from motion: a new development in photogrammetric measurement. Earth Surf. Process. Land. 38, 421–430 (2013)
6. Goesele, M., Curless, B., Seitz, S.M.: Multi-view stereo revisited. In: 2006 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2006, pp. 2402–2409. IEEE (2006)
7. Agisoft PhotoScan Professional. http://www.agisoft.com/downloads/installer/
8. Pix4D. https://pix4d.com/
9. Mapillary: OpenSfM. https://github.com/mapillary/OpenSfM
10. OpenDroneMap. https://github.com/OpenDroneMap/OpenDroneMap
11. Bradski, G., Kaehler, A.: OpenCV. Dr. Dobb's J. Softw. Tools 3 (2000)
12. Altizure. https://www.altizure.com
13. Cignoni, P., Callieri, M., Corsini, M., Dellepiane, M., Ganovelli, F., Ranzuglia, G.: MeshLab: an open-source mesh processing tool. In: Eurographics Italian Chapter Conference, vol. 2008, pp. 129–136 (2008)
14. Duane, C.B.: Close-range camera calibration. Photogram. Eng. 37(8), 855–866 (1971)
15. Google Maps. https://maps.google.com
16. Tuytelaars, T., Mikolajczyk, K.: Local invariant feature detectors: a survey. Found. Trends Comput. Graph. Vis. 3(3), 177–280 (2008)
17. Bolick, L., Harguess, J.: A study of the effects of degraded imagery on tactical 3D model generation using structure-from-motion. In: Airborne Intelligence, Surveillance, Reconnaissance (ISR) Systems and Applications XIII, vol. 9828, p. 98280F. International Society for Optics and Photonics (2016)
18. Grauman, K., Leibe, B.: Visual object recognition. In: Synthesis Lectures on Artificial Intelligence and Machine Learning, vol. 5, no. 2, pp. 1–181 (2011)
19. Lindeberg, T.: Feature detection with automatic scale selection. Int. J. Comput. Vis. 30(2), 79–116 (1998)
20. Muja, M., Lowe, D.G.: Fast approximate nearest neighbors with automatic algorithm configuration. VISAPP (1) 2(331–340), 2 (2009)
21. Triggs, B., McLauchlan, P.F., Hartley, R.I., Fitzgibbon, A.W.: Bundle adjustment—a modern synthesis. In: Triggs, B., Zisserman, A., Szeliski, R. (eds.) IWVA 1999. LNCS, vol. 1883, pp. 298–372. Springer, Heidelberg (2000). https://doi.org/10.1007/3-540-44480-7_21
22. Furukawa, Y., Ponce, J.: Accurate, dense, and robust multiview stereopsis. IEEE Trans. Pattern Anal. Mach. Intell. 32(8), 1362–1376 (2010)
23. Adorjan, M.: "OpenSfM: ein kollaboratives Structure-from-Motion System"; Betreuer/in(nen): M. Wimmer, M. Birsak; Institut für Computergraphik und Algorithmen. Abschlussprüfung: 02.05.2016 (2016)

CREANDO – Platform for Game Experiences Based on Pervasive Narrative in Closed Spaces: An Educational Experience

Carlos C. Ceron Valdivieso1, Jeferson Arango-López1,2(&), Cesar A. Collazos1, and Francisco Luis Gutiérrez Vela2

1 FIET, University of Cauca, Street 5 Nº 4-70, Popayán, Colombia
[email protected]
2 ETSIIT, Department of Languages and Informatics Systems, University of Granada, Street Periodista Daniel Saucedo Aranda, s/n, 18071 Granada, Spain

 Narrative  Game experience  Geolocation

1 Introduction The PG have become increasingly popular in recent years, a factor that has influenced this trend being the advances achieved in the field of mobile technologies, their massification, wide access, cost reduction, and computational performance. All of this has allowed us to offer users new experiences that in former times were only in their imagination. Today’s powerful mobile platforms enable users to be aware of their context, have multiple media and interact with their environment [1]. This has allowed © Springer Nature Switzerland AG 2018 J. E. Serrano C. and J. C. Martínez-Santos (Eds.): CCC 2018, CCIS 885, pp. 226–236, 2018. https://doi.org/10.1007/978-3-319-98998-3_18

CREANDO - Platform for Game Experiences

227

us to break the boundaries of the game world, and a mix between the physical world and the virtual worlds has emerged. From this, pervasive experiences have been created for users. The development of this type of experience is enriched by the use of technologies such as positioning and augmented reality. In addition, these experiences must be linked by a coherent narrative, a common thread in the story, which is responsible for conveying the author’s intention and directs the user’s interactions for their use [2]. Performing the synergy between these areas of knowledge to obtain a pervasive experience is not a formal and defined procedure. On the contrary, as in most scenarios where software is developed is a task of creation to measure. This is one of the main reasons for the high cost of developing this type of experience. From industry and academia, there have been significant but uncohesive efforts in terms of developer tools and framework proposals [3]. The development of the CREANDO platform emerges as a first step in the optimization of the development of play experiences based on pervasive narrative. Specifically, the platform allows the generation of experiences in closed spaces. This is achieved through the integration of free and open source tools that allow for a cohesive experiences development environment through a platform for creating and editing stories with support for multi-media deployment and interaction with the environment through AR augmented reality, indoor location through Beacons, presentation of information through QR codes, video, HTML text, images, among others. The CREANDO platform will be validated by a test case developed on the platform, supported by statistical tools (surveys) before and after the experience has been played. This paper is structured as follow. The Sect. 2 presents the background with the main topics. In Sect. 3 the related works are presented. Section 4 shows the process to build the platform CREANDO. In Sect. 5 the case of study is presented. Finally, in Sect. 6 are presented the conclusions and the future work.

2 Background 2.1

Games

Classic definitions of formal games such as the one presented by Huizinga [4] in the first part of the 20th century, define playing as a free activity that is very consciously outside of ordinary life and is not a serious activity, but at the same time it absorbs the player intensely and completely. This was a definition according to the time, where games were created and played in the physical world making use of real world properties such as the objects used and the spaces where they were developed [5]. Given the advance of technology, this definition would be adapted in principle, moving to completely virtual environments to take advantage of the massification and reduction of computer costs. Then, the same phenomenon occurred in mobile devices where in less than a decade, this technology, coupled with the advances in wireless transmission has evolved rapidly, and now provides the means for the development of applications that break the boundaries between the physical world and the virtual world [6].

228

2.2

C. C. Ceron Valdivieso et al.

Pervasive Games

The pervasive games are characterized by being played when and where the player wishes, taking into account the presence of some limitations of the same devices such as availability or not of internet connection, or battery life of mobile devices. Despite these limitations and without distinction of the application context, its main objective is to attract and keep the player by presenting an enriched game experience, which may or may not be used for learning, recreation or the relationship with his environment [7]. In addition, it can be said that the pervasive games are made up of components that work together and on which the narrative is based. These components are the devices, context, social interaction, time, spaces, realities and multiple media as proposed by Arango-López et al. in [6]. On the other hand, narrative is seen as the succession of events with the potential to convey events through the media, which unites the different components into a coherent and satisfying story for the user, giving rise to a pervasive narrative [8]. 2.3

Indoor Location and Proximity Technologies

Location and proximity systems are mechanisms for determining the position of an object in space. There are systems with global or local coverage with precision ranges of a few meters or millimeters, for outdoor or indoor use. There are even extra planetary positioning systems as Karimi shows in [9]. These systems make use of various base technologies such as optics, electromagnetic or acoustic components; which usually use some type of device with infrared, radiofrequency or ultrasonic detection capabilities, examples of these technologies can be found in [10]. There are also positioning systems based on Bluetooth technology, with precision ranges of a couple of meters, which makes it ideal for indoor positioning, in addition to the low cost of this technology [11]. One of its disadvantages is its moderate range, which makes it necessary to use several devices to cover a larger area as proposed by Feldmann et al. [12]. Some of the best known devices for this task are the Bluetooth Beacons, which due to their characteristics are of interest for the present work. 2.4

Information Deployment Technologies Between Realities

One of the technologies for deploying information between realities is augmented reality (AR), which defines the vision of a physical environment of the real world, through a technological device, adding virtual information to existing physical information; that is, a virtual synthetic part to the real one. The combination of tangible physical elements with virtual elements creates a real time AR [13]. Azuma defines in [14] that AR as the combination of real and virtual elements interactively in real time. According to Prendes Espinosa [15], the main techniques to show AR are: AR Glasses, Handheld Screen, Spatial Projection. Of particular interest is the handheld screen technique, given that the focus of this work is on the use of mobile platforms used as the camera’s optical input to superimpose the video on the graphic information. In addition, as mentioned in [16] there are different levels of augmented reality that can be defined by the different degrees of complexity of AR based

CREANDO - Platform for Game Experiences

229

applications depending on the technologies they implement from level 0 to level 3. Where Level 0 (linked to the physical world). Applications link the physical world by using 2D barcodes such as QR codes. These codes only serve as a link to other content, so there is no 3D registration or tracking of markers. 2.5

Pervasive Narrative in the Games

Technologies are the fundamental support point for the transmission of stories to the user that is the pillar on which the pervasive narrative is based. Today, video games using narrative convey the story to players as a multi-sensory experience. The narrative makes the players the protagonists of the story [17]. The special characteristic that the narrative acquires in the games is interactivity, to better define the narrative we will use the description of Meadows [18], where he explains it as “a representation of characters and timed actions in which a reader can affect, choose or change the story”, given this definition according to the player’s actions and choices, a different path will be followed in which he must always be presented with a formally correct story and an appropriate user experience, which must be taken into account when designing and structuring the story by the creator thinking about the different branches of the story that may arise [19]. The different forms of ramifications characterized presented by Lindle [20] and that can adopt an interactive narrative are: Tree, Exploration, Parallel Frame, Nodal, Modulated, Open, Open without narrative arc.

3 Related Works LAGARTO, created by Maia et al. in [21], who propose and develop a tool for the construction of position-enhanced games based on augmented reality support. This tool is composed of a web environment for the creation and management of the games. In addition to an application for tracking the position of players. The main objective of the tool is to provide a software solution that allows non-programmer users to design, build and run mobile games based on single-player or multiplayer location using graphical notation, with the possibility of ordering missions and game mechanics. This provides a flexible way to define multiple game streams. One of the deficiencies is that it does not have a way to monitor the execution of the games. In addition, it does not have characteristics for the editing of games related to narrative since this was not its design approach. fAR-Play by Gutierrez et al. in [22] present the development of a framework for the creation of alternate/enhanced reality games for games guided by the treasure hunt metaphor. In other words, the framework is strongly focused on this type of positionbased games, which is supported for outdoor location under GPS and for indoor location in barcodes and QR. The framework is composed of four main modules, one of which is a mobile application through which players interact with the game and a website where the state of the game is reflected. Another of the modules is the game engine where the game logic remains and is maintained. WeQuest, Macvean et al. in [23] present the development of a tool for the creation and facilitation of augmented reality games based on geolocation with user-generated

230

C. C. Ceron Valdivieso et al.

content. The platform has been designed with the aim of increasing the accessibility of location-based augmented reality games through three components. One of them is the game engine, which allows you to download and run geolocated stories on mobile devices. Another component is a story creation tool for the user. The final component is the localization translation that adapts the stories to new areas other than those in which the story has been designed, making it possible for them to be played anywhere.

4 The Platform: CREANDO CREANDO platform is designed with the guidelines provided by the GeoPGD methodology, which divides the design and development of a geolocalized pervasive game experience into 2 phases. (a) Pervasive Narrative: this phase serves to define the story script, the characters, elements and scenarios that take part in the execution of the game, allowing each of these components to evolve to expand the initial narrative. (b) Game World: due to the pervasiveness of the game world, it breaks the virtual limits and integrates with reality, offering the possibility of creating a mixed world for the execution environment of the game experience. 4.1

Analysis

CREANDO platform arises from the need to have a tool for the creation and edition of pervasive game experiences based on location in closed spaces. For them, as mentioned above, the state of the art and related work was reviewed. Starting from LAGARTO as one of the best approaches to the above mentioned problems, we wanted to implement the platform CREANDO based on this tool and with the desire to adjust and adapt some of its characteristics and add others. The main objective of the study is to provide a solution that allows users with technical knowledge of programming the possibility of designing, building and executing pervasive experiences based on localization, with the possibility of defining the story through the narrative and the mechanics of the experience. 4.2

Design and Implementation

To design the platform different tools were considered in each main component (Core, Back and Database). Thinking about Core component, we built a GUI administrator that allow to the manager the creation and edition of game experiences. This GUI give to manager many options to add crossmedia content like audio, image, video, among others. In addition, several web services are able to communicate with the Back component. In the Back component, CREANDO has the authentication and logic modules, which through an ORM can communicate with the database to get information about users and game experiences. Finally, the database component is composed by two separated repositories. in the first one, the user profiles are stored and managed. The second one allows to store the game experiences information, and also, to audit the player interaction from an app.

CREANDO - Platform for Game Experiences

231

With the purpose of creating this platform, the efforts were focused on finding and relating in a coherent and optimal way some of the best free and open source tools and repositories in the market, prioritizing the integration time with a satisfactory result. The tools and repositories are show below in Table 1. Table 1. Tools used to build the CREANDO platform. Tool name TwineJS

AR.JS QR Code-Reader

Google Beacon Tool IONIC2 Microsoft Azure The Physical Web App

4.3

Description It is an open source tool for the creation and editing of interactive stories and its narrative structure developed by Chris Klimas and maintained by a large community It is a framework for the efficient display of augmented reality in web browsers developed and maintained by Jerome Etienne It is a web application for the creation and reading of QR codes developed by Jerome Etienne, the application allows the display and access to information associated with QR codes It is Google’s platform for the management of physical web projects and the beacons associated with them, allows the association of the url through the Eddystone URL protocol to the beacons The IONIC2 framework is used to create a cross-platform hybrid application for grouping all tools into a single application The use of the Microsoft Azure Platform or the deployment of the platform and the management of the pervasive experiences It is Google’s application for the detection, reading and display of the information associated with the url issued by the beacons

The Operation of the CREANDO Platform

The platform consists of a story editing tool and a mobile cross-platform application for access to the story editor and the different tools that facilitate the design of pervasive gaming experiences, including augmented reality, QR code reading and indoor location using beacon devices. The tool has an admin panel of stories created for editing or generating a new game experience. Later, when you want to generate a new game experience you can create a graph with the different scenarios and challenges you want to expose the player (Fig. 1). The mobile application is the container where the tools that facilitate the pervasiveness of the experiences are grouped, such as augmented reality, QR codes and interior positioning.

232

C. C. Ceron Valdivieso et al.

Fig. 1. Part of a graph of a story with its scenes.

5 Case Study “Unicauca Aprende – Creando” In the case study, students from the Systems Engineering program at the University of Cauca were taken into account, given the high drop-out rate and low performance due to the paradigm shift in teaching, especially in subjects related to the programming area. For this reason, it was proposed to carry out a pervasive experience of introduction to the students with the aim of measuring their previous knowledge in the area of programming as well as being a good way to make them aware of the different spaces on campus. In order to validate the educational contribution of the experience, a survey was carried out prior to the experience and another one after the experience was carried out to measure its effectiveness. The results are presented later in the results section. In the design of the experience was used as a guide the game design document of the project Juguemos version 2.0.0 by Jeferson Arango López with adjustments to the design of pervasive games georeferenced for the project JUGUEMOS, Using this work and the document as a guide allows for the design of a coherent story in Annex A is the filled out game design document for the experience “Unicauca aprende – Creando”. 5.1

Game History

The game Unicauca Aprende - Creando is a treasure hunt game for students of systems engineering at the University of Cauca, which takes place on the university campus. The mechanics of the game is the following of instructions with the purpose of collecting the information in this case lines of code, which allows to advance to the next level in a sequential way until arriving at the last point of the route where these lines of code will be used for the execution of a computer program.

CREANDO - Platform for Game Experiences

233

The importance of the game is to bring key aspects of the university and career to the attention of system engineering students, as well as to put them in context and bring them closer to the basic concepts of programming logic. A differentiating aspect of the experience is the use of tools that facilitate the pervasiveness for the development of the game. The game world is a mixture of real world and virtual world, the game’s virtual guide makes a tour around the campus with the player, the elements of interaction between the real world and the virtual world are the QR codes distributed throughout the campus that present information to the player in the form of text or links to web pages, the bluetooth beacons present notifications for access to web pages with content related to the experience, the AR markers present content actually increased to the player referring to the theme of the experience. 5.2

Results

The elements of interaction to facilitate pervasiveness used in the experience were Estimote beacons, AR markers and QR Codes (Fig. 2).

Fig. 2. Elements for interaction within the experience.

During the experience the students’ immersion in the story of the game was evident, expressed in their level of motivation during the experience and the expectation for the next levels and elements with which to interact in addition to the way they were presented with the story information. Some of the interactions with elements of the story such as QR codes, Beacons and AR markers are presented in (Fig. 3). The experience interaction graph “Unicauca Aprende-Creando” presented in (Fig. 2) is based on the treasure hunt game methodology, where the player is presented with a welcome screen, instruction screen and 5 sequential missions for advancement in the story. In order to move forward, the condition of success that is found within the information presented by the interaction elements must be met. If the correct success condition is not provided, the player remains on the current mission until the correct one is provided. As the player progresses between levels he collects elements for the final mission solution, at the end of the 6 missions a game end screen is presented to the player.

234

C. C. Ceron Valdivieso et al.

Fig. 3. Interaction with the elements within the experience.

In order to validate the contribution of the experience to 20 of the participating students, two surveys were carried out, one before and one after the experience, focused on the previous motivation and during the experience, as well as the educational contribution. In Annex B, there are the templates of the surveys carried out for the “Unicauca Aprende – Creando” experience and its results. Some of the most significant were the increase in motivation of close to 65% over a score of 3 before the experience to reach 90% over a score of 3 after the experience (Fig. 4).

Fig. 4. Motivation before and after participating in the experience. Being 5 the max value and 0 the minor value.

CREANDO - Platform for Game Experiences

235

6 Conclusions and Future Work What has been said throughout this work and the case study allows us to reach the following conclusions: The analysis made from the observation and the data collected in the surveys allow us to intuit a direct relationship between motivation and learning potential in pervasive educational experiences. The use of this platform allows the reduction of development time associated with software engineering tasks, which allows to focus efforts on the player’s experience, it is necessary to study the benefits of using this type of platform focused on the narrative and location in closed spaces in other areas of application, as well as the definition of metrics for measuring its effectiveness within a software process. It is essential to validate the educational and agile potential of the platform for the development of pervasive experiences through the creation of other case studies with greater narrative interaction and educational content to measure and contrast the results obtained with the current case study. It is also imperative that the platform is endorsed by experts to validate its level of pervasiveness in terms of time, space, social interactions and user experience. Finally, as future work, it is proposed the integration to the platform of a greater quantity of tools that facilitate the pervasiveness in the experiences, on the other hand, it is necessary to define some guidelines for the use of the platform and the definition of a methodology and metrics for a quantitative analysis of the benefits of the use of the platform. Acknowledgements. This work has been funded by the Ministry of Economy and Competitiveness of Spain as part of the JUGUEMOS project (TIN2015-67149-C3).


Towards a Smart Farming Platform: From IoT-Based Crop Sensing to Data Analytics

Héctor Cadavid(B), Wilmer Garzón, Alexander Pérez, Germán López, Cristian Mendivelso, and Carlos Ramírez

Escuela Colombiana de Ingeniería, Bogotá, Colombia
{hector.cadavid,wilmer.garzon,alexander.perez}@escuelaing.edu.co, {german.lopez-p,cristian.mendivelso,carlos.ramirez-ot}@mail.escuelaing.edu.co
http://www.escuelaing.edu.co

Abstract. Colombia is a country with huge agricultural potential, thanks to its size and geographic diversity. Unfortunately, it is far from using it efficiently: 65% of its farmland is either unused or underused due to political problems. Furthermore, much of Colombian agriculture is characterized, when compared with other countries, by low levels of productivity, due to the lack of good farming practices and technologies. The new political framework created by the recently signed peace agreement in this country opens new opportunities to increase its agricultural vocation. However, a lot of work is still required in this country to improve the synergy between academia, industry, agricultural experts, and farmers towards improving productivity in this field. Advances in smart-farming technologies such as Remote Sensing (RS), the Internet of Things (IoT), Big Data/Data Analytics, and Geographic Information Systems (GIS) bring a great opportunity to contribute to such synergy. These technologies make it possible not only to collect and analyze data directly from the crops in real time, but also to extract new knowledge from it. Furthermore, this new knowledge, combined with the knowledge of local experts, could become the core of future technical assistance and decision support tools for countries with a great variety of soils and thermal floors such as Colombia. Motivated by these issues, this paper proposes an extension to Thingsboard, a popular open-source IoT platform. This extended version aims to be the core of a cloud-based Smart Farming platform that will bring together sensors, a decision support system, and a set of remotely controlled and autonomous devices (e.g., water dispensers, rovers or drones). The architecture of the platform is described in detail and then showcased in a scenario with simulated sensors, in which early warnings of an important plant pathogen in Colombia are generated by data analytics and actions on third-party devices are dispatched in consequence.

Keywords: Smart farming · IoT · Data analytics · Precision agriculture


1 Introduction

By 2050, a world population of nearly 9.1 billion people has been estimated, which would require increasing overall food production by at least 70% (compared with 2007 production statistics) [5]. Given this unsettling scenario, it is not surprising that food security policies are among the main goals in the global agenda. However, to achieve these policies, the amount of soil devoted to agriculture should be increased and used more efficiently. A larger area devoted to agriculture with bad or outdated farming practices leads not only to low productivity rates but also to the increase of other problems, like water contamination by excessive dosage of pesticides [7]. Colombia is a strategic case study for this problem. Despite being a large country of 114 Mha (twice the size of Spain), with 42 Mha suitable for agriculture, five thermal floors, and a great diversity in terms of soil, geology, topography and vegetation, it is increasingly supplying its food needs through imports (by 2016, 30% of the food consumed by its population in one year). These statistics are explained by the fact that the country is using only about a third of the available agricultural land (14 Mha out of 42 Mha), according to the Rural Agricultural Planning Unit (UPRA, acronym in Spanish). In addition, most of such agriculture is characterized by low levels of technology, due to almost 50 years of internal conflict deterring investment in secluded farms. However, the new political framework created by the recently signed peace agreement opens new opportunities to increase the agricultural vocation of the country. Indeed, the FAO has defined Colombia as one of the possible agricultural leaders for the world, and a key actor in the fight against hunger and malnutrition. This paper describes the initial results of a research project whose final goal is to create a MaaS (Monitoring as a Service) platform that enables the synergy between IoT technology, data analytics, and experts in Colombian agricultural species. This platform, which aims to be the core of future technologies for Colombian agriculture, is expected to enable a knowledge-feedback process like the one described below:

1. A set of soil sensors, distributed through several crops, transmits data (environment and soil variables) to the MaaS platform.
2. The MaaS platform, based on the rules for pre-known and pre-configured risks and threats, fires an alarm when the conditions are met.
3. When an alarm is fired, two additional actions could be performed: (1) a static actuator (e.g. an irrigation sprinkler) is remotely activated, or (2) a request for a precision-agriculture task (e.g. applying a pesticide) is sent to the control center of an autonomous robot fleet [6].
4. When an anomaly (still not a risk) is identified, the system could also request (through the robot fleet's control center) a data-gathering task, such as taking multi-spectral pictures through a drone or a rover.
5. An expert, as a daily routine or motivated by the anomaly detection, checks all the data (sensor readings, pictures, the crop's relative localization and history). The expert, based on such information and further analysis if required, could register a spatial-temporal classification Tag (e.g. the name of a disease).


6. Once enough spatial-temporal Tags have been registered through the normal operation of the platform, a classifier (e.g. to identify the disease) is trained. Such a classifier is then included as a component of the MaaS system, so that future readings would allow the automatic detection of the newly identified disease.

The platform is built upon Thingsboard [19], a popular open-source IoT software for device management, data collection, processing and visualization. The extensions proposed in this paper for the Thingsboard architecture, so far, cover the functional requirements of steps 1 through 4 of the scenario described above. As a study case, a simulated scenario for the early detection of the Phytophthora infestans [8] pathogen is described, which makes use of the following features provided by the new architecture:

– Extended data model with sensor/crop/farm detail level, and concepts from The International Center for Tropical Agriculture guidelines [2].
– API for accessing the extended data model from within the rules, and for storing/accessing intermediate states of it.
– Geo-referenced data indexing and GIS capabilities.
– High-resolution photo storage and indexing.

The rest of the paper is organized as follows. Section 2 reviews related work. Section 3 describes the proposed architecture built upon the Thingsboard platform. Section 4 describes the problem of Phytophthora infestans, how its early detection is addressed with the proposed platform, and the outcome of preliminary experiments. Section 5 concludes the paper.

2 Related Work

The vast amount of research related to IoT and Big Data applications in different fields (Ahmed et al. present a general survey in [1]) is not surprising, given the exponential growth in the data collected around the world (it has been said that up to 90% of the world's data has been produced after 2011 [12]). When it comes to applications in agriculture, and given all the factors that affect its productivity, such as climate, soil, pests, diseases, and weather [15], there are two main approaches in previous work: (1) how to gather and transmit data from the crop, and (2) how to process data and perform actions based on such processing outcomes. For the first approach, there is a complete survey of communications-related topics on wireless sensor networks (WSN), such as communication protocols and energy efficiency, in [11]. For the second approach, Lasso et al. [13] proposed the AgroCloud platform [14], which aims at the prevention of coffee rust. Although this platform generates early warnings based on data collected by third-party weather data providers, including air temperature, relative humidity, wind speed and direction, rain and solar radiation, it is not open for processing data transmitted directly from crop sensors.


Verdouw et al. [21] proposed an analysis and decision-making model for supply chain management in the Dutch floriculture industry. Peisker et al. [17] describe a data-analysis model created by the John Deere Company to keep track of tractors' performance using big data and data collected from devices in the field. Although not specifically intended for agricultural applications, there are other works worth mentioning for this approach given their application of real-time data processing of environmental sensors. Bashir et al. [3] presented a framework for the analysis of large amounts of data from smart buildings, including oxygen levels, smoke/hazardous gases, and luminosity, among others. Sarangi et al. [18] proposed a framework for an agricultural advisory call center; here the farmer sends images of plants with crop diseases and the system makes the diagnosis and indicates the appropriate management of the disease. This framework does not allow real-time data processing, nor the detection of diseases from the information of the sensors. One of the closest works to the proposal in this paper, when it comes to long-term objectives, is FarmBeats [20], a platform that covers both approaches. On the one hand, FarmBeats addressed the problem of how to transmit data efficiently (in terms of energy and speed) from sensors in regions with low coverage of communications infrastructure. On the other hand, it proposed an architecture that aims at local data processing, integration with drone control (to take pictures) and cloud-based data persistence for centralized data analytics. There are, however, two main differences with our proposal:

– FarmBeats is a complete, full-fledged technology, with a fixed set of hardware and software components. This platform, on the other hand, has a narrower scope as an extensible software platform, where new devices, new third-party systems, and, more importantly, new data-analytics strategies for early phenomenon detection (e.g. diseases) could be integrated with ease. Moreover, this extension for Thingsboard is expected to be accepted (pulled) by the community, and to become the core of advanced smart-farming/IoT solutions with the integration of custom devices (sensors, actuators, and autonomous robots).
– FarmBeats processes data at two different places: at a local PC, for sensor monitoring and decision making, and in the cloud, to perform cross-data analytics with the information provided by all the farms in conjunction. The architecture of our proposal, on the other hand, aims to be a centralized, cloud-based smart farming solution that performs data analytics and decision making in one place. Thus, the scope of the decision-making tasks in our platform will not be limited to the local context of the event.

3 Proposed Extension Points

This section presents the main contribution of this paper. The key idea is to extend the Thingsboard platform from both the functional and the architectural points of view, in order to make it suitable for the application scenarios described in Sect. 1.

3.1 Thingsboard - Base Architecture and Data Model

The base architecture of Thingsboard aims at high scalability through the distribution of its workload across multiple processing nodes without a single point of failure. Such workload distribution is achieved with the actor model proposed by Hewitt et al. [9] and its implementation through the Akka platform [4]. Thingsboard was designed not only with scalability in mind, but also for front-end customization. On the one hand, its Widgets model enables the integration of new UI modules (for data visualization, alarms management, etc.). On the other hand, the Thingsboard Rule Engine allows messages from devices to be processed and actions to be triggered through plugins. One of the most useful plugins is the one provided to enable interoperability with Apache Spark, an analytics engine for large-scale data processing. Although a detailed description of the Thingsboard architecture (at the actor level) is available on its official website (https://thingsboard.io/docs/reference/architecture/), Fig. 1 presents a schema of the higher-level components (and their interactions) that would be involved in a conventional IoT application case study. As described in such a figure, a conventional Thingsboard configuration is limited to the analysis of data collected from sensors in real time. This, as mentioned before, is a big limitation for application scenarios such as the early detection of diseases in crops, whose rules might require access not only to real-time sensor readings, but also to crop details and their historical information. Furthermore, if an advanced action for such a rule is expected (e.g. an autonomous drone action), geo-referenced information would be required as well. In order to enable adaptability when it comes to scenario configuration, Thingsboard defines in its data and widget models the Asset entity. An Asset is an abstract IoT entity which could be related to other assets and devices (e.g. sensors), therefore allowing a hierarchical composition of such devices. For example, a scenario of a farm with two crops, each one with two sensors, could be defined (directly through the Thingsboard user interface) as a root asset (for the farm) and two child assets (for the crops), each one with two devices. Furthermore, UI widgets and dashboards for a hierarchical model such as the aforementioned could be easily configured.
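To make the hierarchy above concrete, the following is a minimal sketch of how a farm/crop/sensor composition could be created by script rather than through the user interface, assuming Thingsboard's standard REST endpoints for login, assets, devices and entity relations. The host, credentials, entity types and payload fields are placeholders, and should be checked against the REST API documentation of the deployed Thingsboard version.

```python
import requests

TB_URL = "http://localhost:8080"  # placeholder Thingsboard host

def login(username, password):
    # Obtain a JWT and build the X-Authorization header Thingsboard expects
    r = requests.post(f"{TB_URL}/api/auth/login",
                      json={"username": username, "password": password})
    r.raise_for_status()
    return {"X-Authorization": f"Bearer {r.json()['token']}"}

def create_asset(headers, name, asset_type):
    return requests.post(f"{TB_URL}/api/asset",
                         json={"name": name, "type": asset_type},
                         headers=headers).json()

def create_device(headers, name, device_type):
    return requests.post(f"{TB_URL}/api/device",
                         json={"name": name, "type": device_type},
                         headers=headers).json()

def contains(headers, parent, child):
    # "Contains" relation from the parent asset to the child asset/device
    relation = {"from": parent["id"], "to": child["id"],
                "type": "Contains", "typeGroup": "COMMON"}
    requests.post(f"{TB_URL}/api/relation", json=relation,
                  headers=headers).raise_for_status()

headers = login("tenant@thingsboard.org", "tenant")  # demo credentials, placeholder
farm = create_asset(headers, "Farm A", "farm")
for crop_name in ("Crop 1", "Crop 2"):
    crop = create_asset(headers, crop_name, "crop")
    contains(headers, farm, crop)
    for sensor in ("temperature", "humidity"):
        device = create_device(headers, f"{crop_name} {sensor} sensor", "sensor")
        contains(headers, crop, device)
```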

3.2 Proposed Extensions

Extended Data Model and Architecture. Although the Asset abstraction makes Thingsboard a highly flexible platform for most IoT application scenarios, it is not enough to represent the information which is expected to be captured and processed (by data analytics techniques) in our scenarios. Our IoT/data-analytics application goals would require not only a device hierarchy, but also details such as crop history, application of good agricultural practices, geo-referenced information and pictures, among others, as described in Fig. 2.
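As a rough, illustrative sketch of the kind of information listed above, the extended entities could be represented as follows. The entity names follow the text and Fig. 2 (Farm, Land Lot, Crop, Location, good-practices checklist), but the concrete attributes of the platform's Cassandra schema are not given here, so the fields below are assumptions.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List

@dataclass
class Location:
    latitude: float
    longitude: float

@dataclass
class Crop:
    crop_id: str
    species: str
    planting_date: date
    location: Location
    good_practices: List[str] = field(default_factory=list)  # IFOAM-style checklist items
    history: List[str] = field(default_factory=list)          # tags, treatments, observations

@dataclass
class LandLot:
    lot_id: str
    crops: List[Crop] = field(default_factory=list)

@dataclass
class Farm:
    farm_id: str
    name: str
    lots: List[LandLot] = field(default_factory=list)
```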



Fig. 1. (1) Sensors transmit a data stream (through an Internet gateway) using one of the protocols currently supported: MQTT, CoAP or HTTP. (2) The Thingsboard back-end (based on Akka actors, not detailed in the figure) transfers the data stream to all the relevant rules. (3) In this scenario, the rule is configured to use the Spark plugin, so that the stream is transmitted to a Spark task. (4) As a response, the Spark task re-publishes new types of events, such as alarms, or transformed data, in order to be presented in the front-end (5).

Based on the data model described above, and the requirements of our study case (the early detection of Phytophthora infestans), the extended architecture described in Fig. 3 was proposed. A first version of the architecture was implemented as a fork of the official Thingsboard distribution (https://github.com/LIS-ECI/thingsboard), considering the following elements:

– An extension to the default data model (implemented in Cassandra, a NoSQL time-series database) that integrates the concepts of Farm, Land Lot, Crop, and the 'good practices' check list proposed by the International Federation of Organic Agriculture Movements [16].


Fig. 2. Extended data model, including farm, land lot, crop and location entities, good-practices information, and geo-referenced data.

– An integration of complementary database engines, with a distributed transactions mechanism and an access API: MongoDB for geo-referenced indexing of sensors, crops and farms (Location entity in Fig. 2); MongoDB+GridFS for geo-referenced pictures; and Redis for keeping temporary, volatile data, such as intermediate states of a rule evaluation (a small sketch of the geo-referenced indexing is given after this list).
– An API for the registration of the third-party platforms the platform is going to interact with.
– A framework within Spark for the definition of new rules and actions for a potential phenomenon/disease in the crops monitored by the platform. Such a framework allows the definition of rules with access to the extended and complementary data model, and the definition of actions with access to the third-party platforms API.
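The geo-referenced indexing mentioned in the first item can be sketched as follows with MongoDB's 2dsphere index; this is also the kind of query an action such as "notify the tenants of nearby crops" (see Algorithm 2 later) relies on. The database and collection names, coordinates and radius are illustrative, not taken from the actual deployment.

```python
from pymongo import MongoClient, GEOSPHERE

client = MongoClient("mongodb://localhost:27017")  # assumed local instance
crops = client["smartfarm"]["crops"]               # illustrative database/collection

# GeoJSON points plus a 2dsphere index enable $near / $geoWithin queries
crops.create_index([("location", GEOSPHERE)])
crops.insert_one({
    "crop_id": "crop-001",
    "farm_id": "farm-001",
    "location": {"type": "Point", "coordinates": [-74.030, 4.780]},  # [lon, lat]
})

# Crops within 2 km of the position that raised an alarm
alarm_point = {"type": "Point", "coordinates": [-74.031, 4.781]}
nearby = crops.find({
    "location": {"$near": {"$geometry": alarm_point, "$maxDistance": 2000}}
})
for doc in nearby:
    print("notify tenant of", doc["crop_id"])
```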

Extended User Stories. Given the hierarchies of the proposed extended data model, and the guidance of potential stakeholders, new User Stories (https://trello.com/b/V6wD9VEX/thingsboard-extensi%C3%B3n) and Wireframes (https://ninjamock.com/s/9W6WWRx) were defined, including the registration of sensors and the configuration of dashboards. Figures 4 and 5 show screenshots of two of the user stories developed so far using the aforementioned Widgets model. Figure 4 shows how geo-referenced details are now used to show elements such as the physical distribution of the crops within the farm. Figure 5, on the other hand, shows how the new User Stories allow the farmer to keep track of the good practices in a crop over time [16]. Such information, as mentioned before, could therefore be accessed by the rules registered in Spark.



Fig. 3. (1) Sensors transmit a data stream (through an Internet gateway) using one of the protocols currently supported: MQTT, CoAP or HTTP. (2 & 3) The Thingsboard back-end transfers the data stream to all the relevant rules, in this case, a rule with the Spark plugin enabled. (3) A Spark task, configured by default to handle all the readings, delegates its evaluation to a series of 'Evaluation/Action' components (previously injected into such task). (4) The 'Evaluation/Action' component, based on the sensor readings and crop details, will generate warnings through the conventional Thingsboard alarm mechanism. (6) Such 'Evaluation/Action' components would be able to fire actions in third-party systems (e.g. an autonomous drone), providing them with all the details required for their mission. In the figure, the platform fires an autonomous drone (7) that will take multi-spectral pictures of the alarm zone, for further analysis.


Fig. 4. Extended Thingsboard UI, including new data hierarchy and graphical representation of geo-referenced data.

Fig. 5. Extended Thingsboard UI, including new data hierarchy and graphical representation of geo-referenced data.

4 Proof of Concept

This section presents a proof of concept of the architecture extension proposed for the Thingsboard platform in Sect. 3. A simulated scenario for the detection of the Phytophthora infestans conidia pest was chosen for the experiments. The integration of previous works in control architectures for autonomous robots [6] is expected for future field tests. However, for the testing purposes of this paper (with a software architecture scope), a simulated drone fleet controller (which simply echoes all the received instructions) was integrated as a means to verify the outcome of the proposed scenario. As an outcome of the early detection of Phytophthora infestans, not only the generation of alarms through the platform is expected, but also the activation, with the right set of instructions, of the (simulated) drone fleet controller.
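A controller that "simply echoes all the received instructions" can be stood up with very little code. The sketch below uses plain HTTP with JSON bodies, which is an assumption: the protocol actually used between the platform and the third-party controller is not specified here, and the port is arbitrary.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
import json

class EchoController(BaseHTTPRequestHandler):
    """Stand-in for the drone fleet controller: prints and echoes every mission request."""
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        print("Received mission request:", json.loads(body or b"{}"))
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)  # echo the instructions back unchanged

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8090), EchoController).serve_forever()
```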


4.1 Early Detection of Phytophthora Infestans

There are different models, documented in the literature, for the prediction of sporulation of Phytophthora infestans conidia pests. This prediction, as a means of early detection, allows applying a timely phytosanitary treatment to the crop in order to mitigate the development of the late blight disease; the consequences of the propagation of this disease are catastrophic for the crop. One of the prediction models is the Smith Period model [10], where the minimum temperature and the relative humidity are the variables considered. The authors of this model proposed that a Smith Period occurs when the minimum temperature is higher than 10 ◦C and the relative humidity is greater than 90% for 11 h, for 2 consecutive days. When two Smith Periods occur, it is necessary to perform the first application of a fungicide to mitigate the sporulation risk before the disease appears in the crop. If the temperature and humidity criteria are met only on the first day, and on the second day only 10 h of relative humidity greater than 90% are reached, it indicates that only one Smith Period has taken place. As mentioned before, the proposed extension for the Thingsboard platform makes it possible to integrate a model like the former as a software component. For evaluation purposes, the model was implemented as a sliding-window algorithm, depicted in Fig. 6. Such a figure, on one hand, shows the importance of the session-persistence feature proposed for the architecture. The details provided in Algorithm 1 show, on the other hand, how the framework within the Spark model enables access to the crop's details, including history and geo-referenced information. Furthermore, as shown in Algorithm 2, the framework also enables the definition of actions to be performed when there is an alert confirmation, including the interaction with third-party platforms (in this case, launching a hypothetical drone and notifying the tenants of nearby crops).
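Independently of the platform APIs used in Algorithms 1 and 2, the Smith Period criterion itself can be expressed compactly. The sketch below is a simplified per-day count (at least 11 risky hours per day, two consecutive days), not the exact sliding-window bookkeeping of Algorithm 1, and it assumes the input is a list of days, each holding hourly (minimum temperature, relative humidity) pairs.

```python
def is_risky_hour(t_min_c: float, rel_humidity: float) -> bool:
    """An hourly reading meets the Smith condition (T_min > 10 C, RH > 90 %)."""
    return t_min_c > 10.0 and rel_humidity > 90.0

def smith_alert_days(daily_hourly_readings):
    """Return the indices of days on which a sporulation alert would fire,
    i.e. the second of two consecutive Smith days."""
    alerts, consecutive = [], 0
    for day_idx, hours in enumerate(daily_hourly_readings):
        risky_hours = sum(1 for t, rh in hours if is_risky_hour(t, rh))
        if risky_hours >= 11:        # this day counts as a Smith Period
            consecutive += 1
        else:
            consecutive = 0
        if consecutive >= 2:         # two consecutive Smith Periods
            alerts.append(day_idx)
            consecutive = 0          # reset after raising the warning
    return alerts
```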

4.2 Experiment Setup and Results

For our experiments, 25 sensing devices, associated with 5 different crops, were simulated through the Gatling tool, an open-source load and performance profiling tool. Two of such sensors were fixed to produce data within the range of a series of Smith Periods. For testing purposes, time was scaled from 86,400 s to 60 s (1 day = 1 min). During the execution of the simulation, the dashboard of the crop with the fixed sensors started as shown in Fig. 7(a), and a few minutes later generated an alarm (as expected), as shown in Fig. 7(b). Moreover, in the same simulation scenario, the alarm is sent in real time to the simulated drone fleet controller with the geographic localization of the field, as shown in Fig. 8. For this setup, the servers were distributed in three virtual machines with 4 GB of RAM, running over an Intel(R) Xeon(R) E5620-2.4 GHz server.
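For reference, a single simulated sensor of the kind driven here by Gatling could also be scripted directly against the platform's MQTT entry point. The host and access token below are placeholders; "v1/devices/me/telemetry" is the usual Thingsboard device telemetry topic, but it should be verified against the deployed version.

```python
import json
import random
import time
import paho.mqtt.client as mqtt

BROKER_HOST = "thingsboard.example.org"   # hypothetical platform host
ACCESS_TOKEN = "DEVICE_ACCESS_TOKEN"      # hypothetical device token
SECONDS_PER_SIMULATED_DAY = 60            # 1 day compressed into 1 minute

client = mqtt.Client()                    # paho-mqtt 1.x style constructor
client.username_pw_set(ACCESS_TOKEN)      # Thingsboard uses the token as MQTT username
client.connect(BROKER_HOST, 1883, keepalive=60)
client.loop_start()

# 24 "hourly" readings per simulated day, biased towards Smith-period conditions
for hour in range(24):
    payload = {
        "temperature": round(random.uniform(10.5, 14.0), 2),  # degrees C, above 10
        "humidity": round(random.uniform(90.5, 99.0), 2),     # %, above 90
    }
    client.publish("v1/devices/me/telemetry", json.dumps(payload), qos=1)
    time.sleep(SECONDS_PER_SIMULATED_DAY / 24)

client.loop_stop()
client.disconnect()
```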


Algorithm 1. Phytophthora-infestans-risk-evaluation

procedure risk-evaluation(cropid, humidityData, temperatureData)
  Input:
  – cropid: crop's unique identifier.
  – humidityData: average humidity since last reading, provided by a sensor.
  – temperatureData: average temperature since last reading, provided by a sensor.
  cropType ← dataapi.getCropType(cropid)
  riskDetected ← False
  conditionsFulfilled ← "−"
  now ← CurrentTime
  if cropType is 'potato' and humidityData and temperatureData satisfy the condition then
    conditionsFulfilled ← "+"
  if it is the 1st time receiving data then
    window ← conditionsFulfilled
    cacheapi.saveFirstTime(cropid, now)
  else if eleven hours have already elapsed then
    window ← cacheapi.getWindow(cropid)
    window.removeFirstElement()
    window += conditionsFulfilled
    if the amount of '+' in window is high then
      if it is the 1st day and no Smith Period exists then
        cacheapi.saveSmithPeriod(cropid, True)
      else if it is the 2nd day then
        riskDetected ← True
        window ← ""
        cacheapi.saveFirstTime(cropid, now)
        cacheapi.saveSmithPeriod(cropid, False)
  else
    window += conditionsFulfilled
  if 1st day finished then
    if no Smith Period exists then
      cacheapi.saveFirstTime(cropid, now)
      window ← ""
      cacheapi.cacheSmithPeriod(cropid, False)
  if 2nd day finished then
    cacheapi.saveFirstTime(cropid, now)
    window ← ""
    cacheapi.cacheSmithPeriod(cropid, False)
  cacheapi.saveWindow(cropid, window)
  return riskDetected


Algorithm 2. Phytophthora-infestans-actions

procedure Phytophthora-infestans-actions(cropid)
  cropToken ← dataApi.getThingsboardToken(cropid)
  commApi.sendAlertToThingsboard(cropToken)
  cropCoordinates ← geoApi.getParcelCoordinates(cropid)
  neighborsCrops ← geoApi.getCropsInARadius(cropCoordinates, radius)
  for crop in neighborsCrops do
    apiData.getOwnerData(crop).sendMail("Risk of Phytophthora infestans")
  sensorLocation ← geoApi.getSensorLocation(idParcel)
  commApi.sendToThirdParty('droneController', 'applyFungicide', sensorLocation)

Fig. 6. Sliding-window approach for the evaluation of Phytophthora infestans conidia pests based on the Smith Period model. The platform keeps track (in a fixed window) of the positive or negative readings for risk conditions over the last 11 h. When a window is full of positive readings (Scenario 1), the count of Smith Periods in the 48-hour interval becomes one. With this approach, long periods of positive readings with intermediate intervals of negative readings (Scenario 2) can be easily discarded as Smith Periods.

4.3 Performance Evaluation

In order to measure the overhead of the architecture extensions, the same load test was performed over a basic configuration of Thingsboard, with a conventional set of alarms (based on simple value intervals). The outcomes provided by Gatling's dashboard after running the same load test on both configurations were nearly identical. However, this could be explained by the asynchronous nature of the platform's entry point: an MQTT server. For this reason, all the execution times of Algorithm 1 were measured over the experiment. As can be seen in Fig. 9, the overhead of the data-access API and third-party systems interaction is in most cases between 0.5 and 1.5 s, with few outlier peaks. Given the low frequency of the data transmitted in most IoT applications, this overhead could be considered negligible.


Fig. 7. Default dashboard for the simulated environment, including temperature and humidity sensors, before (a) and after (b) the warning generated by the Phytophthora infestans evaluation rule.

Fig. 8. Screenshot of the simulated Drone-Fleet controller when a message is received after firing the Phytophthora infestans alarm.

Fig. 9. Differences in milliseconds between the start and end time of each execution of Algorithm 1

5 Concluding Remarks

The FAO considers Colombia an important player in food security policy, but nowadays the country does not have enough agricultural technologies in its production processes. In the last five decades, arable land in Colombia has been disputed by internal war participants and, more recently, by criminal organizations to produce narcotics. Within the framework of the peace agreement signed in 2016, a new perspective for agricultural production is rising for land owners and farmers in terms of crop substitution and the increase of land use as crop fields. One of the most important tasks for agriculture is increasing the use of modern production techniques. However, the lack of reliable data about specific varieties of plants in Colombia is a big gap to be filled. With this panorama, the incursion into new technologies like IoT and analytics is mandatory if the country wants to increase its food exports to the rest of the world. In this work, the authors have presented an extension of a popular open-source platform called Thingsboard, which is used to collect and manage data provided by sensors. This extension aims to be the core of a future cloud-based MaaS (Monitoring as a Service) tailored to the needs of the Colombian farming industry. The architecture of the proposed extension has been validated and illustrated with a realistic scenario, where the risk of an extremely dangerous potato disease is identified in real time and a simulated controller of autonomous drones is activated in response. Future work will integrate a module for data exploration (time series, pictures, etc.) and spatial-temporal tagging by experts. In the long term, once enough data has been collected and tagged, new classification models for other diseases would be trained (with such data) and integrated into the platform.

References

1. Ahmed, E., et al.: The role of big data analytics in internet of things. Comput. Netw. 129, 459–471 (2017)
2. Alvarez Villada, D.M., Estrada Iza, M., Cock, J.H.: RASTA Rapid Soil and Terrain Assessment: Guía práctica para la caracterización del suelo y del terreno (2010)
3. Bashir, M.R., Gill, A.Q.: Towards an IoT big data analytics framework: smart buildings systems. In: 2016 IEEE 18th International Conference on High Performance Computing and Communications, IEEE 2nd International Conference on Data Science and Systems (HPCC/SmartCity/DSS), pp. 1325–1332. IEEE (2016)
4. Bonér, J., Klang, V., Kuhn, R., et al.: Akka library. http://akka.io/
5. Bruinsma, J.: World Agriculture: Towards 2015/2030: An FAO Study. Routledge, London (2017)
6. Cadavid, H., Pérez, A., Rocha, C.: Reliable control architecture with PLEXIL and ROS for autonomous wheeled robots. In: Solano, A., Ordoñez, H. (eds.) CCC 2017. CCIS, vol. 735, pp. 611–626. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66562-7_44
7. Espana, V.A.A., Pinilla, A.R.R., Bardos, P., Naidu, R.: Contaminated land in Colombia: a critical review of current status and future approach for the management of contaminated sites. Sci. Total Environ. 618, 199–209 (2018)


8. Fry, W., et al.: Five reasons to consider Phytophthora infestans a reemerging pathogen. Phytopathology 105(7), 966–981 (2015)
9. Hewitt, C., Bishop, P., Steiger, R.: A universal modular actor formalism for artificial intelligence. In: Proceedings of the 3rd International Joint Conference on Artificial Intelligence, pp. 235–245. Morgan Kaufmann Publishers Inc. (1973)
10. Iglesias, I., Escuredo, O., Seijo, C., Méndez, J.: Phytophthora infestans prediction for a potato crop. Am. J. Potato Res. 87(1), 32–40 (2010)
11. Jawad, H.M., Nordin, R., Gharghan, S.K., Jawad, A.M., Ismail, M.: Energy-efficient wireless sensor networks for precision agriculture: a review. Sensors 17(8), 1781 (2017)
12. Poole, J., Rae, B., González, L., Hsu, Y., Rutherford, I.: A world that counts: mobilising the data revolution for sustainable development. Technical report, Independent Expert Advisory Group on a Data Revolution for Sustainable Development, November 2014
13. Lasso, E., Corrales, J.C.: Towards an alert system for coffee diseases and pests in a smart farming approach based on semi-supervised learning and graph similarity. In: Angelov, P., Iglesias, J.A., Corrales, J.C. (eds.) AACC'17 2017. AISC, vol. 687, pp. 111–123. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-70187-5_9
14. Lasso, E., Valencia, O., Corrales, D.C., López, I.D., Figueroa, A., Corrales, J.C.: A cloud-based platform for decision making support in Colombian agriculture: a study case in coffee rust. In: Angelov, P., Iglesias, J.A., Corrales, J.C. (eds.) AACC'17 2017. AISC, vol. 687, pp. 182–196. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-70187-5_14
15. Nuthall, P.: Farm Business Management: Analysis of Farming Systems. Lincoln University, CABI (2011)
16. International Federation of Organic Agriculture Movements (IFOAM): Best Practice Guideline for Agriculture and Value Chains. Sustainable Organic Agriculture Action Network/International Federation of Organic Agriculture Movements (IFOAM) (2013)
17. Peisker, A., Dalai, S.: Data analytics for rural development. Indian J. Sci. Technol. 8(S4), 50–60 (2015)
18. Sarangi, S., Umadikar, J., Kar, S.: Automation of agriculture support systems using wisekar: case study of a crop-disease advisory service. Comput. Electron. Agric. 122, 200–210 (2016)
19. ThingsBoard: Thingsboard - open-source IoT platform (2018). https://thingsboard.io
20. Vasisht, D., et al.: Farmbeats: an IoT platform for data-driven agriculture. In: NSDI, pp. 515–529 (2017)
21. Beulens, A.J., Reijers, H.A., van der Vorst, J.G., Verdouw, C.N.: A control model for object virtualization in supply chain management. Comput. Ind. 68, 116–131 (2015)

Instrumented Insole for Plantar Pressure Measurement in Sports

Iván Echeverry-Mancera, William Bautista-Aguiar, Diego Florez-Quintero, Dayana Narvaez-Martinez, and Sonia H. Contreras-Ortiz(B)

Universidad Tecnológica de Bolívar, Km 1 Vía Turbaco, Cartagena, Colombia
[email protected]
http://www.utb.edu.co

Abstract. Novel technological aids have been developed to evaluate sport performance. Among these tools there are wearable sensors that monitor physical and physiological variables during the execution of exercises. This paper describes the design and construction of an instrumented insole for acquisition and transmission of plantar pressure. The system was designed to support heavy weights, such as in weightlifting. It uses five high-range force sensors located in relevant anatomical points. It can be worn comfortably by the athlete, and plantar pressure can be transmitted wirelessly to be registered and visualized in real time.

Keywords: Technology in sports · Biomechanics · Plantar pressure · Wearable device

· Biomechanics · Plantar pressure

Introduction

Electronic systems for sport performance monitoring use sensors and wireless communication to acquire and transmit data to a computer or mobile device. They can be used to quantify performance and determine optimum techniques [12,16]. Weightlifting is one of the most popular sports in Colombia. It consists of lifting a bar loaded with discs. There are two competition modalities: snatch, and clean and jerk. In the snatch, the bar is elevated without interruption from the floor to a position overhead with a squat. In the clean and jerk, the bar is lifted in two phases: from the floor to the shoulders, and from the shoulders to a position overhead. In both techniques, adequate feet position is fundamental for a safe and efficient execution of the exercises. The feet of weightlifters support high amounts of weight for short periods of time, and pressure is unevenly distributed on the soles, so pressure monitoring can be useful to evaluate efficiency and assess risks. The study of static and dynamic plantar pressure is an important task in ergonomics, medicine and sports [2,4,7,9,11,13]. Plantar pressure systems allow estimating the foot's center of pressure (COP), evaluating stability and balance, and detecting abnormal conditions. Previous works describe the development of in-shoe systems that have been designed for gait analysis. In 1992, Wertsch et al. developed a portable system that consists of two insoles with seven conductive polymer sensors each to measure plantar pressure during normal activities [17]. Shu et al. developed a measurement system based on a textile fabric sensor array with six sensors [14].


Tao et al. designed an insole that monitors triaxial ground reaction forces and the 3D orientation of the feet with inertial sensors [8]. Martinez et al. [10] developed a wireless system to measure plantar pressure at four anatomical points. Recently, Hu et al. developed a low-cost electronic insole to monitor the center of pressure (COP) during gait for fall risk assessment [5,6]. A commercial in-shoe system is F-scan (Tekscan Inc., Boston, MA, USA). It can measure plantar pressure up to 862 kPa with high spatial resolution. Another system is OpenGo (Moticon, Munich, Germany), which includes 13 sensors with a range of up to 400 kPa and wireless communication. Although these systems offer high spatial resolution, they have a reduced range that may not be appropriate for weightlifting and other sports. For example, in soccer, peak plantar pressures can be up to 680 ± 120 kPa [3]. This paper describes the design and construction of an electronic system for plantar pressure measurement in weightlifting. The system is composed of five force sensors distributed over the insole area that allow pressure monitoring in real time. The insole has a high measurement range and wireless communication. This system can be used to detect high-pressure points, estimate the foot's COP, and detect excessive foot pronation, which can cause leg injuries in weightlifting.

2 Materials and Methods

The purpose of the instrumented insole is to provide a low-cost solution for real-time monitoring of plantar pressure in sports.

2.1 Requirements of the System

The insole was designed considering the following requirements.

– High pressure range. During stance, foot pressure can be in the range of 0 to 200 kPa; during walking, it can go up to 1000 kPa, and in extreme conditions, it can go up to 3 MPa [15]. Therefore, the pressure range should be of at least 3 MPa.
– Reduced size, weight and cabling. For in-shoe placement, the insole should be thin, flexible and light [1]. Additionally, to avoid obstructing movements, the cabling should be kept to a minimum, so wireless transmission is desired.
– Low power consumption. It is required to allow enough battery autonomy of the device.
– Sensor size and placement. A minimum sensor size of 5 mm × 5 mm is recommended [1]. As most of the body weight is supported by 15 areas distributed on the heel, midfoot, metatarsals and toes [14], sensors should be located on these areas.

2.2 Description of the System

The insole is composed of the following components (see Fig. 1):


– Five piezoresistive force sensors (FlexiForce A201, Tekscan). These sensors were chosen because they have a large force range (up to 100 lb or 445 N), and are thin and flexible. The sensors were located at the hallux (big toe), first and fifth metatarsals, cuboid, and calcaneus (heel). These anatomic locations were selected because they are important for support and stability.
– A signal conditioning circuit to convert resistance into a voltage signal of up to 5 V. The circuit was designed to obtain accurate measurements and a small size.
– A microcontroller (PIC16F88, Microchip). This microcontroller has seven analog input pins, which are enough to acquire the signals from the sensors, and it is available in surface-mount technology (SMT).
– A Bluetooth module (HC-06). These Bluetooth modules allow wireless transmission with a range of up to 10 m and offer low power consumption.

Fig. 1. Block diagram of the system

2.3 Sensor Calibration

The sensors have a force range from 0 to 445 N and an area of 0.713 cm2, which allows pressure measurements up to 6.24 MPa. The conductance of the sensor is proportional to the pressure exerted on its surface, as seen in Fig. 2. For calibration, compressive forces were applied normal to the sensor's surface, as seen in Fig. 3. We used an axial loading machine (Marshal and CBR PS-25) to apply forces from 0 to 400 N. The calibration curves are shown in Fig. 4.
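A linear fit of conductance versus force is the usual way to turn curves like those of Fig. 4 into a usable calibration. The numbers below are made-up placeholders, not the values measured with the loading machine; only the procedure is illustrated.

```python
import numpy as np

# Hypothetical calibration data: applied force (N) and measured resistance (ohm).
# The sensor datasheet behaviour is that conductance (1/R) grows roughly linearly
# with force, so we fit G = a*F + b.
force_n = np.array([0.0, 50.0, 100.0, 150.0, 200.0, 300.0, 400.0])
resistance_ohm = np.array([1e6, 2.0e5, 1.0e5, 6.7e4, 5.0e4, 3.3e4, 2.5e4])

conductance = 1.0 / resistance_ohm
a, b = np.polyfit(force_n, conductance, 1)   # least-squares linear fit

def force_from_resistance(r_ohm):
    """Invert the fitted calibration: F = (1/R - b) / a."""
    return (1.0 / r_ohm - b) / a

SENSOR_AREA_M2 = 0.713e-4   # 0.713 cm^2, from the text

def pressure_from_resistance(r_ohm):
    """Convert a resistance reading to pressure in Pa."""
    return force_from_resistance(r_ohm) / SENSOR_AREA_M2
```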

2.4 Electronic Design

An electronic circuit was designed to convert resistance to voltage and acquire the signals from the sensors. Each sensor is connected to an operational amplifier (LM324, Texas Instruments, and LM358, Texas Instruments) in non-inverting mode. This configuration gives a linear output with a range from 1 to 5 V, described in Eq. 1, where RF is the feedback resistor, RS is the sensor resistance, and Vin is fixed at 1 V. Note that the output voltage has a linear relationship with the conductance of the sensor, which is proportional to the force. Additionally, as the integrated circuits have multiple operational amplifiers, the size of the circuit is reduced.


Fig. 2. Calibration curves provided by the manufacturer.

Fig. 3. Calibration set-up.

$V_{out} = \left(1 + \frac{R_F}{R_S}\right) V_{in} \quad (1)$
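Inverting Eq. (1) recovers the sensor resistance from the measured output voltage, which can then be mapped to force and pressure through the calibration above. The feedback resistor value below is illustrative, not the one used in the actual circuit.

```python
R_F = 100e3    # feedback resistor (ohm), illustrative value only
V_IN = 1.0     # fixed reference voltage (V), as stated in the text

def sensor_resistance_from_vout(v_out):
    """Invert Eq. (1): Vout = (1 + R_F/R_S) * Vin  =>  R_S = R_F / (Vout/Vin - 1)."""
    gain = v_out / V_IN
    if gain <= 1.0:
        return float("inf")   # no load: the sensor is effectively open circuit
    return R_F / (gain - 1.0)

# Example: a 3.2 V reading with the assumed R_F corresponds to about 45.5 kOhm
print(sensor_resistance_from_vout(3.2))
```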


Fig. 4. Calibration curves of the sensors obtained in the laboratory.

Fig. 5. Signal conditioning circuit

The PIC16F88 microcontroller was used for analog-to-digital conversion and data processing. Figure 5 shows the circuit diagram, and Fig. 6 shows a 3D view of the circuit board.


Fig. 6. 3D view of the circuit board

3 Results and Discussion

3.1 Web Application

The system was evaluated during the execution of exercises like walking and jumping. Figure 7 shows the web application that was designed to visualize data from the insole.

Fig. 7. Sensor data during jumping test


The signals were acquired while a volunteer performed seven jumps. It can be seen that the pressure in sensors 1 and 5 increases during the execution of the jumps. These sensors are located at the big toe and the heel, respectively, and are subjected to the greatest impact force during jumping.

3.2 Power Consumption

Among the requirements for the system is low power consumption, to allow a longer battery autonomy. The system used a 9 V battery with a capacity of 650 mAh for the power supply. The component with the highest power consumption is the antenna during data transmission. Table 1 shows the current drawn by the system and the expected runtime with a fully charged battery.

Table 1. Power consumption of the device.

Operation mode | Current | Voltage/battery | Power | Duration
Transmission   | 51 mA   |                 |       | 10 h
Not linked     |         |                 |       |
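As a sanity check on the figures above, an ideal (loss-free) estimate of the transmission runtime follows directly from the battery capacity and the measured current; the roughly 10 h reported in Table 1 is lower, presumably because of regulator losses and battery derating, which this naive calculation ignores.

```python
# Naive battery-life estimate from the values quoted in the text
battery_mah = 650.0    # battery capacity (mAh)
transmit_ma = 51.0     # current drawn while transmitting (mA)
print(f"Ideal runtime while transmitting: {battery_mah / transmit_ma:.1f} h")  # ~12.7 h
```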

$0 \le SLD_t^s < 1 \quad \forall t \in T$  (6)

$0 \le SLG_t < 1$  (7)

$CN_t = (N_{t-1}^{s_{max}} + CN_{t-1})(1 - d) - GN_{t-1} + SLC_t, \quad 0 \le SLC_t < 1 \quad \forall t \in T \mid t > 1$  (8)

$FP_1 = fp$  (9)

$FI_1 = fi$  (10)

$GI_1 = gi$  (11)

$\sum_{s \in S} FPC_t^s + SFP = FP_t \cdot mfp, \quad 0 \le SFP < mfp \quad \forall t \in T \mid t > 1$  (12)

$\sum_{s \in S} FIC_t^s + SFI = FI_t \cdot mfi, \quad 0 \le SFI < mfi \quad \forall t \in T \mid t > 1$  (13)

$\sum_{s \in S} GIC_t^s + SGI = GI_t \cdot mgi, \quad 0 \le SGI < mgi \quad \forall t \in T \mid t > 1$  (14)

$C_{t,r,h}^s = (N_t^s / mr + SLK_{t,r,h}^s) \cdot a_{r,h}^s \quad \forall s \in S, \forall t \in T, \forall r \in R, \forall h \in H$
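To give a concrete flavour of how constraints of this kind are encoded, the following is a deliberately simplified PuLP sketch that keeps only two ingredients of the full model (a teaching-capacity limit and a laboratory-hours limit) and maximizes admissions. The coefficients are illustrative placeholders; this is not the paper's calibrated model.

```python
import pulp

T = range(1, 11)              # planning horizon (terms)
MR = 30                       # maximum students per course
MFP, FP = 8, 17               # courses per faculty professor, number of professors
HMAX, HOURS_PER_COURSE = 250, 2

model = pulp.LpProblem("student_quota_sketch", pulp.LpMaximize)
N = {t: pulp.LpVariable(f"N_{t}", lowBound=0, cat="Integer") for t in T}  # admitted students
C = {t: pulp.LpVariable(f"C_{t}", lowBound=0, cat="Integer") for t in T}  # offered courses

model += pulp.lpSum(N[t] for t in T)              # maximize total admissions

for t in T:
    model += C[t] * MR >= N[t]                    # enough seats for admitted students
    model += C[t] <= FP * MFP                     # teaching capacity
    model += C[t] * HOURS_PER_COURSE <= HMAX      # laboratory-hour capacity

model.solve(pulp.PULP_CBC_CMD(msg=False))
print([int(N[t].value()) for t in T])
```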

To test this configuration, we ran 29 experiments in order to explore different values for tmax ∈ {2, ..., 30}. In Fig. 1, left scatter, we report the average number of accepted students in first semester across every term ($N_t^1$) for each experiment. Notice that experiments that run for few terms (e.g., tmax < 10) reported a high student quota average. However, as we experimented with longer horizons, we observed that such a quota tends to decrease and stabilizes at a value close to 156 students admitted in first semester. We also observe that the average number of students in the career (i.e., middle scatter in Fig. 1) seems to stabilize at a value close to 900, and the number of required professors to complete this student quota is typically less than 65 (i.e., 17 faculty professors, 2 or fewer graduate student instructors, and about 46 adjunct faculty instructors). We recall that although there is no constraint that sets an upper bound on the number of adjunct faculty instructors, they cannot grow infinitely, since Constraint (17) sets a maximum number of courses that can be offered depending on infrastructure limitations (i.e., hmax).


Fig. 2. Results of a single experiment configured with tmax = 30

This constraint has an extra effect: it limits the number of courses and the number of professors needed to cover such demand (see Constraints (12) to (17)). We now move our attention to Fig. 2, where we depict the behavior of our model with respect to the infrastructure constraints in terms of the maximum number of laboratory hours available in a week (i.e., hmax = 250 for these experiments). The scatter plot on the left reports the number of laboratory hours used as the number of terms is incremented in an experiment configured with tmax = 30. Notice that hmax is nearly reached after term number 10, which is consistent with the behavior in Fig. 1, which reports a stable number of admitted students after term 10. Furthermore, our model is able to distinguish among the number of courses using either 0, 2 or 4 laboratory hours, and we depict that information on the right scatter plot. We recall that this characteristic is useful since our model can be extended with more elaborate constraints that can be applied not only to the number of hours, but also to the number of credits per course.

4.2 Maximum Number of Graduating Students in Charge of a Faculty Professor (mn)

We now move our attention to the impact of the maximum number of graduating students in charge of a faculty professor (mn). We identified in preliminary experiments that mn = 10 had no impact on the number of admitted students, since the laboratory-hour constraint hmax keeps the number of admitted students relatively small. As a result, Constraint (18) is always satisfied, since the number of students in continuation ($CN_t$) plus the number of students in the last semester ($N_t^{s_{max}}$) is always below 170 (i.e., $FP_t \cdot mn$). We now focus on Fig. 3, where we report the impact of smaller values mn ∈ {10, 5, 3}. To test this configuration, we ran 29 experiments in order to explore different values for tmax ∈ {2, ..., 30}. In the left scatter plot, we report the average number of accepted students in first semester, across every term ($N_t^1$), for each mn configuration. Notice that experiments that run with a small mn (e.g., mn = 3) reported a lower student quota average. As expected, we also observed that the average number of students and professors in the career seems to decrease as well.


Fig. 3. Experiment results using tmax configurations ranging from 2 to 30 and featuring three different configurations for mn (maximum number of graduating students in charge of a faculty professor)

Fig. 4. Experiment results using tmax configurations ranging from 2 to 30 and featuring three different upper bounds for the maximum number of faculty instructors $FI_t$

We also observed in preliminary experiments that the CPU time to solve the model tends to grow from a few seconds up to 10 min as we increase tmax and decrease mn (e.g., tmax = 30 and mn = 3). We also reported up to 20 min of CPU time when we limited the number of adjunct instructors (e.g., $FI_t \le 35$). We attribute this behavior to the fact that the model becomes more constrained and that tmax directly impacts the number of variables in the model.

4.3 Enhancing Adjunct Instructors' Constraints

We now study the impact of setting an upper bound (u ∈ {40, 35}) on the number of adjunct faculty instructors per term, $FI_t$. This lets us compare the results with those obtained in the previous section, when the number of adjunct instructors was not constrained. We assume mn = 10 and also include Constraints (19) and (20). Additionally, we include the following constraint:

$FI_t \le u \quad \forall t \in T \mid t > 1$  (21)

Figure 4 depicts the obtained results. Notice on the left scatter plot that such constraints reduce the average number of admitted students in first semester and also impact the average number of students in the career (i.e., middle plot).


The third scatter plot, on the right, only displays the average number of adjunct faculty instructors $FI_t$, in order to show that the given constraints are satisfied. We recall that our model is flexible and must be configured to specific undergraduate program demands. For instance, further experiments with input values and constraints that were closer to our case-study undergraduate program highly impacted the number of admitted students in first semester. Thus, for future experiments, it is required to obtain accurate statistical data to tune the model parameters and also to find relations among the variables. For instance, we observed that the graduating percentage impacts the student quota, because students in continuation tend to limit the maximum number of students in charge of faculty professors, especially when mn is small.

5 Related Work

Course timetabling, student-class scheduling, and faculty-class assignment are some of the complex tasks that have been extensively studied for numerous universities in the literature. They are the kind of problems that are often encountered in arranging assignments at educational organizations. The timetabling problem consists in placing certain resources, subject to constraints, into a limited number of time slots with the aim of satisfying a set of stated objectives [18]. The timetabling problem is indeed a complex combinatorial problem that belongs to the set of known NP-complete problems. According to [12], different approaches have been applied to this problem: constraint-based approaches [4], meta-heuristic approaches [5], multi-criteria approaches that handle vectors of criteria rather than employing a single evaluation function [13], and case-based reasoning approaches, which use previously employed knowledge and experience in solving new timetabling problems [6]. With respect to class scheduling, Dimopoulou and Miliotis reported results of computer-based systems to help the construction of a combined schedule of lectures and exams at the Athens University of Economics and Business. The difficulties they reported consisted mainly in the limited availability of classroom space. They proposed an integer programming (IP) model that assigned each course to a specific time slot and room. The faculty-class assignment problem can be seen as an employee scheduling problem consisting in the assignment of faculty teachers to specific courses at a determined time slot. In [2] the authors proposed a model for assigning faculty members to classes subject to academic class scheduling issues and specific policies at Kuwait University. The authors used an integer programming model in order to minimize the individual and collective dissatisfaction of faculty members. The results showed satisfactory solution times and a good overall satisfaction level of faculty members. In [3], the authors presented an extension of the work in [2] in order to consider sections of classes, instead of an atomic class, and the faculty members' preferences. The authors used a mixed integer programming model, providing good quality solutions. A heuristic approach to this problem was explored in [8]: in particular, a heuristic-driven process with iterative mutation that applies two fitness functions to solve the problem, achieving teachers' satisfaction and fairness of the class-faculty assignment.


Finally, the student course scheduling problem consists in the assignment of students to courses offered at the university; the objective is to satisfy the student requests, providing each student with a conflict-free schedule, among other additional constraints. In [10], the authors proposed two fitness functions in order to fulfill students' needs for taking courses without delaying their graduation times. On the other hand, [9] proposes an approach in which course and student schedules are built simultaneously; it is based on heuristic functions used both to quantify the requirements and to order the processing of students. These university scheduling problems have been widely addressed in the literature. Although course timetabling, student-class and faculty-class assignment problems are, at some point, related to the problem studied in this paper, none of these well-known problems considers the main aspect of interest to our problem: deciding on the number of students to be admitted.

6 Conclusions and Future Work

The number of students attended in public and private institutions is indeed an important indicator of national and international interest. In recent years, some state universities have already been planning to increase their student quota through the promotion of new virtual education programs, the creation of new professional careers and postgraduate programs, the efficient use of resources, and also the increase of the number of admitted students in the available programs. In this paper we focused on the computational challenges that come together with the goal of increasing the number of admitted students in available programs, and we referred to this problem as the student quota problem. Deciding on the maximum number of students that can be admitted in a program is certainly not an easy task. A bad decision on the maximum student quota can either lead to the sub-utilization of resources or to unsatisfied student requirements. In this paper, we studied a set of constraints related to the utilization of facility and human resources in order to supply a student demand of courses. We introduced a Mixed Integer Programming (MIP) model with a set of linear constraints that helps to find the maximum number of students that can be admitted in a career, and presented a case study with relevant results for an undergraduate program of a Colombian university. We recall that our model is flexible, since it can be adjusted to university requirements in order to foresee upper or lower bounds on the number of admitted students per term based on student dropout and graduation percentages, the number of required professors, the number and type of courses from the pensum that need to be available for students, and the infrastructure capacity in terms of available hours. We are planning to include in our model a repetition percentage that will allow us to represent students who stay in the same semester because they did not complete the semester course requirements. Additionally, we are planning to extend our study to differentiate among the kinds of courses and the number of credits in charge of faculty professors.
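As an illustration of the kind of formulation described above, the following is a minimal sketch of a quota-maximization MIP written with the PuLP library; the course list, capacity figures and constraint structure are simplifying assumptions for the example, not the actual model of the paper.

# Hypothetical illustration of a student-quota MIP; all data values are made up.
from pulp import LpProblem, LpMaximize, LpVariable, LpInteger, lpSum, value

courses = {"calculus": 4, "programming": 4, "physics": 3}   # assumed weekly hours per course
prof_hours, room_hours, group_size = 40, 60, 30             # assumed capacity per term

model = LpProblem("student_quota", LpMaximize)
quota = LpVariable("admitted_students", lowBound=0, cat=LpInteger)
groups = {c: LpVariable("groups_" + c, lowBound=0, cat=LpInteger) for c in courses}

model += quota                                               # objective: maximize admitted students
for c in courses:
    model += group_size * groups[c] >= quota                 # every admitted student gets a seat in c
model += lpSum(h * groups[c] for c, h in courses.items()) <= prof_hours   # professor-hour capacity
model += lpSum(h * groups[c] for c, h in courses.items()) <= room_hours   # room-hour capacity

model.solve()
print("maximum student quota:", value(quota))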

References

1. Agencia de Noticias UN: Cobertura del 75% en educación superior, apuesta de la nación para 2025 (2017). https://agenciadenoticias.unal.edu.co/detalle/article/cobertura-del-75-en-educacion-superior-apuesta-de-la-nacion-para-2025.html
2. Al-Yakoob, S.M., Sherali, H.D.: Mathematical programming models and algorithms for a class-faculty assignment problem. Eur. J. Oper. Res. 173(2), 488–507 (2006)
3. Al-Yakoob, S.M., Sherali, H.D.: A column generation mathematical programming approach for a class-faculty assignment problem with preferences. Comput. Manag. Sci. 12(2), 297–318 (2015)
4. Brailsford, S.C., Potts, C.N., Smith, B.M.: Constraint satisfaction problems: algorithms and applications. Eur. J. Oper. Res. 119(3), 557–581 (1999)
5. Burke, E., Kendall, G., Newall, J., Hart, E., Ross, P., Schulenburg, S.: Hyper-heuristics: an emerging direction in modern search technology. In: Glover, F., Kochenberger, G.A. (eds.) Handbook of Metaheuristics, pp. 457–474. Springer, Boston (2003). https://doi.org/10.1007/0-306-48056-5_16
6. Burke, E.K., Petrovic, S., Qu, R.: Case-based heuristic selection for timetabling problems. J. Sched. 9(2), 115–132 (2006). https://doi.org/10.1007/s10951-006-6775-y
7. Castañeda Valle, R., Rebolledo Gómez, C.: Panorama de la educación: Indicadores de la OCDE. Nota del País, pp. 1–11 (2013)
8. Chin-Ming, H., Chao, H.M.: A heuristic based class-faculty assigning model with the capabilities of increasing teaching quality and sharing resources effectively. Comput. Sci. Inf. Eng. 4(1), 740–744 (2009)
9. Head, C., Shaban, S.: A heuristic approach to simultaneous course/student timetabling. Comput. Oper. Res. 34(4), 919–933 (2007). https://doi.org/10.1016/j.cor.2005.05.015. http://www.sciencedirect.com/science/article/pii/S0305054805001656
10. Hsu, C.M., Chao, H.M.: A student-oriented class-course timetabling model with the capabilities of making good use of student time, saving college budgets and sharing departmental resources effectively, vol. 2, pp. 379–384 (2009)
11. Oktavia, M., Aman, A., Bakhtiar, T.: Courses timetabling problem by minimizing the number of less preferable time slots. In: IOP Conference Series: Materials Science and Engineering, vol. 166, p. 012025. IOP Publishing (2017)
12. Petrovic, S., Burke, E.K.: University timetabling (2004)
13. Petrovic, S., Bykov, Y.: A multiobjective optimisation technique for exam timetabling based on trajectories. In: Burke, E., De Causmaecker, P. (eds.) PATAT 2002. LNCS, vol. 2740, pp. 181–194. Springer, Heidelberg (2003). https://doi.org/10.1007/978-3-540-45157-0_12
14. Pochet, Y., Wolsey, L.A.: Production Planning by Mixed Integer Programming. Springer, New York (2006). https://doi.org/10.1007/0-387-33477-7
15. Smith, J.C., Taskin, Z.C.: A tutorial guide to mixed-integer programming models and solution techniques. In: Optimization in Medicine and Biology, pp. 521–548 (2008)
16. SNIES - Ministry of National Education in Colombia: Resumen de indicadores de educación superior (2016). https://www.mineducacion.gov.co/sistemasdeinformacion/1735/w3-article-212350.html
17. Vielma, J.P.: Mixed integer linear programming formulation techniques. SIAM Rev. 57(1), 3–57 (2015)
18. Wren, A.: Scheduling, timetabling and rostering—a special relationship? In: Burke, E., Ross, P. (eds.) PATAT 1995. LNCS, vol. 1153, pp. 46–75. Springer, Heidelberg (1996). https://doi.org/10.1007/3-540-61794-9_51

Towards On-Line Sign Language Recognition Using Cumulative SD-VLAD Descriptors

Jefferson Rodríguez and Fabio Martínez

Grupo de investigación en ingeniería biomédica (GIIB), Motion Analysis and Computer Vision (MACV), Universidad Industrial de Santander (UIS), Bucaramanga, Colombia {jefferson.rodriguez2,famarcar}@saber.uis.edu.co

Abstract. On-line prediction of sign language gestures is nowadays a fundamental task to help and support multimedia interpretation for deaf communities. This work presents a novel approach to recognize partial sign language gestures by cumulatively coding different intervals of the video sequences. The method starts by computing volumetric patches that contain kinematic information from different appearance flow primitives. Then, several sequential intervals are learned to carry out the task of partial recognition. For each new video, a cumulative shape difference (SD)-VLAD representation is obtained at different intervals of the video. Each SD-VLAD descriptor recovers mean and variance motion information as the signature of the computed gesture. Along the video, each partial representation is mapped to a support vector machine model to obtain a gesture recognition, making the approach usable in on-line scenarios. The proposed approach was evaluated on a public dataset with 64 different classes, recorded in 3200 samples. This approach is able to recognize sign gestures using only 20% of the sequence with an average accuracy of 53.8%, and with 60% of the information, 80% accuracy is achieved. For complete sequences the proposed approach achieves 85% on average.

Keywords: On-line recognition · Motion analysis · Mid-level representation · Shape difference VLAD

1 Introduction

The deaf community and people with some auditory limitation around the world are estimated at more than 466 million according to the World Health Organization (WHO) [2]. Sign languages are the main resource of communication and interaction among deaf people, being as rich and complex as any spoken language. This articulated language is composed of coherent and continuous spatio-temporal gestures that summarize the articulated motions of upper limbs, facial expressions and trunk postures. Despite the importance of automatic interpretation of sign languages, such characterization remains an open problem because of the multiple inter- and intra-signer variations.


Also, different factors such as culture and region can introduce external variations to sign languages. Such variations imply great challenges to understanding and associating semantic language labels to spatio-temporal gestures. Moreover, for real interactions, current automatic interpretation demands on-line applications that recognize gestures while they are being developed. In this sense, the problem is even more difficult because computational strategies must predict incomplete gestures while remaining robust to illumination changes, variations of perspective and even partial occlusion of the signers. Sign language recognition (SLR) has been addressed in the literature by multiple approaches that include global shape representations, which segment all articulators but have strong limitations due to occlusions and dependence on controlled scenarios. For instance, in [21] a multi-modal analysis was proposed to recover shape information from RGB-D sequences. Local gesture representations include interest point characterization [13,20] and the analysis of appearance and geometric primitives to represent gestures in videos [16,19]. Zahedi et al. [22] proposed an SLR approach that computes appearance descriptors which, together with gradients of first and second order, characterize particular signs. Such an approach is dependent on signer appearance and perspective in the video sequence. Motion characterization has also been used to recognize gestures, being robust to appearance variance and illumination changes [11,13]. For instance, in [11,20] Lucas-Kanade motion fields were computed to characterize gestures in terms of velocity displacements. Nevertheless, this strategy is prone to errors because of the flow sensitivity to small camera displacements; also, the sparse nature of the approach captures few displacement points, which makes any statistical analysis difficult. Also, Konecný et al. [11] integrate local shape information with histograms of optical flow to describe gestures. This approach achieves a frame-level representation but loses local and regional information. Wan et al. [20] proposed a dictionary of sparse words codified from salient SIFT points and complemented with flow descriptors captured around each point. This representation achieves a proper sign recognition performance but remains limited in covering much of the gesture variability. In [13] a local frame motion description for SLR was implemented by computing motion trajectories along the sign, but losing the spatial representation of the signs. Additionally, machine learning strategies have been proposed for gesture recognition from real-time and on-line perspectives [8,14,15]. For instance, Masood et al. [14] proposed a deep convolutional model to represent spatial and temporal recurrent features. This approach allows a sign representation of multiple gestures, but with several limitations to segment the articulators of the signers. Also, in [15] a 3D convolutional network (3D CNN) was adapted to recognize gestures in sign language. Initially, the number of video frames is normalized; then the CNN model is applied with two layers, one for feature extraction and the other for classification. Although 3D feature extraction is more suitable for video processing, this method does not take motion information into account. On the other hand, Fan et al. [8] recognize frame-level gestures using a simplified two-stream CNN network. This network is trained with dense optical flow information as input to the convolutional network.


However, this single kinematic cue is insufficient to describe large human motion, which is fundamental in sign language recognition. Other alternatives have included multi-modal information [5,12]; for instance, Liu et al. [12] proposed a computational strategy over RGB-D sequences that first segments and tracks the hands. Then a convolutional proposal was adapted to learn hand trajectories, but with limitations in the representation of first-order kinematics. The main contribution of this work is a novel strategy to recognize partial gestures by using a cumulative regional mid-level representation of kinematic primitives. The proposed approach codes gestures while they are being developed in the video sequence. Firstly, a kinematic representation of gestures is carried out by coding features from a dense large-displacement optical flow. Then a patch-volume-based coding is carried out at each frame to code the developed gesture. A set of dictionaries that cover different intervals of the gestures is built from training videos. Finally, a test video is coded as a shape difference VLAD representation to recover the main mean and variance motion cues. Such a representation is carried out at different intervals of the video and mapped to a previously trained support vector machine, allowing a partial gesture recognition. The proposed approach was evaluated on a public sign gesture corpus with 64 different classes and more than 3000 videos. This approach is able to recognize sign gestures using 20% of the sequence with an average accuracy of 53.8%, and with 60% of the information, 80% accuracy on average is achieved. For complete sequences, 85% average accuracy is obtained. The rest of the paper is organized as follows: Sect. 2 introduces the proposed method, Sect. 3 presents results and the evaluation of the method, and finally Sect. 4 presents several conclusions and perspectives of the proposed approach.

2 Proposed Approach

A cumulative gesture representation is herein proposed to recognize video sequences. The proposed approach starts with a local low-level kinematic representation to achieve an appearance-independent characterization. The kinematic primitives are computed from a dense optical flow that takes into account large displacements. Multiple temporal and cumulative dictionaries are then built from patch-volume representations of the kinematic primitives. At each defined video interval, the set of recovered patches with relevant motion information is coded w.r.t. the respective cumulative dictionary from a shape difference VLAD [7] representation. Finally, the obtained representation of a particular video is mapped to a previously trained support vector machine to obtain a gesture label. The several steps considered in the proposed strategy are explained in detail in the next subsections.

2.1 Computing Kinematic Features

The method starts by characterizing sign gestures with low-level kinematic relationships from a local velocity field. In this case, it is crucial to quantify large motion regions developed by independent actuators, such as the arms, hands, face or even shoulders.


To recover such large displacements, a special dense optical flow was implemented [1], and then several measures were captured to represent motion. The set of kinematic features herein considered is illustrated in Fig. 1. The computed features are described as follows:

Fig. 1. Kinematic features computed along video sequences as a low-level description of gestures, namely: (b) large displacement optical flow, (c) divergence, (d) curl, (e) and (f) motion boundaries w.r.t. the x and y axes

– Dense flow velocity fields. Typical approaches remain limited in quantifying large displacements because of the assumption of smooth motion in local neighborhoods. To avoid these limitations, a robust optical flow approach able to capture dense flow fields while considering large gesture displacements was implemented [1]. This approach considers a variational strategy to minimize the classical flow assumptions in which color E_{color}(w) and gradient E_{gradient}(w) changes remain constant among consecutive frames. Likewise, additional assumptions are considered, such as:

  E_{smooth}(w) = \sum_{x \in \Omega} \Psi\left( |\nabla u(x)|_{t_{i+1}} + |\nabla v(x)|_{t_i} \right)    (1)

  where \Psi penalizes the atypical values in a specific neighborhood \Omega. Also, a non-local criterion allows the estimation of coherent large displacements. In this case, a SIFT point matching is carried out among consecutive frames to recover points with large displacements in space. Then the flow of such matched interest regions is measured to find similar flow patterns f_{t_i}(x), described as:

  E_{desc}(w_1) = \sum_{x \in \Omega} \delta(x)\, \Psi\left( |f_{t_{i+1}}(x + w_1(x)) - f_{t_i}(x)|^2 \right)    (2)


  with \delta(x) a step function that is active only in regions where interest points exist. The sum of all these restrictions is minimized with a variational Euler-Lagrange approach.
– Divergence fields. The physical pattern of divergence over the field was also considered as a kinematic measure of gestures. This kinematic estimation results from the derivative of the flow components (u, v) at each point x along the spatial directions (x, y), described as:

  div(p_t) = \frac{\partial u(p_t)}{\partial x} + \frac{\partial v(p_t)}{\partial y}    (3)

  This kinematic estimation captures a local field expansion and allows characterizing independent body actuators along a sign description.
– Rotational fields. The rotational flow kinematic estimation was also considered to measure local rotation around a perpendicular axis. These rotational patterns highlight circular gestures, commonly reported in sign languages [9]. Also, this measure estimates the flow rigidity, useful to distinguish articulated motions. The rotation of the field can be expressed as:

  curl(p_t) = \frac{\partial v(p_t)}{\partial x} - \frac{\partial u(p_t)}{\partial y}    (4)

– Motion boundaries. The relative speed among pixels was also recovered as the first spatial derivative of the flow components [6]. This kinematic measure allows coding the relative motion among pixels and removing constant motion information. This primitive also highlights the main articulator motions. A minimal numerical sketch of these kinematic primitives is given below.
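The sketch below, an illustration rather than the authors' implementation, computes the divergence of Eq. 3, the curl of Eq. 4 and the motion-boundary derivatives from the dense flow components u and v with NumPy:

import numpy as np

def kinematic_primitives(u, v):
    # np.gradient returns derivatives along axis 0 (rows, y) and axis 1 (columns, x)
    du_dy, du_dx = np.gradient(u)
    dv_dy, dv_dx = np.gradient(v)
    div = du_dx + dv_dy          # Eq. (3): local expansion of the field
    curl = dv_dx - du_dy         # Eq. (4): local rotation around the perpendicular axis
    mb_x = (du_dx, du_dy)        # motion boundaries of the u component
    mb_y = (dv_dx, dv_dy)        # motion boundaries of the v component
    return div, curl, mb_x, mb_y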

2.2 Coding Motion Gesture Patches

A main drawback of typical gesture strategies is their sensitivity to occlusion of the articulators and to scene perturbations while the sign is performed. The herein proposed approach is based on a local gesture representation, from which a set of volumetric motion patches is computed to represent a sign gesture. In this work only patches with motion information are taken into account, removing background patches with poor motion information. For doing so, we firstly compute the average background of the video as B(x, y) = \frac{1}{t}\sum_{t=1}^{t} f_t(x, y). Then, foreground pixels are obtained by a simple subtraction w.r.t. the background, |f_t(x, y) - B(x, y)| > \tau; pixels with differences smaller than \tau are considered static and removed. For on-line purposes, the average background can be built from a recursive mean estimator. Removing relatively static patches also improves the computational efficiency of the approach (see Fig. 2).
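A minimal sketch of this filtering step is given below; the patch size and the minimum fraction of moving pixels per patch are assumptions made for the example.

import numpy as np

def active_patch_mask(frames, tau, patch=16, min_moving=0.1):
    # frames: array of shape (t, H, W); average background B(x, y)
    B = frames.mean(axis=0)
    foreground = np.abs(frames[-1] - B) > tau       # |f_t(x, y) - B(x, y)| > tau
    H, W = foreground.shape
    keep = np.zeros((H // patch, W // patch), dtype=bool)
    for i in range(keep.shape[0]):
        for j in range(keep.shape[1]):
            cell = foreground[i * patch:(i + 1) * patch, j * patch:(j + 1) * patch]
            keep[i, j] = cell.mean() > min_moving   # keep patches with enough motion
    return keep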


Fig. 2. An efficient kinematic patch representation is achieved by considering only patches with relevant motion information. A simple but efficient background model is used to remove static pixels.

2.3 Kinematic Patch Description

Each of the recovered volumetric patches is described using kinematic histograms of the local motion information. A histogram is built for every kinematic primitive considered in the proposed approach, as:

  h(p) = \sum_{x \in p} R_b(x) W(x), \quad b = 1, 2, \ldots, \frac{2\pi}{\Delta\theta}, \qquad R_b(x) = \begin{cases} 1 & \text{if } (b-1)\Delta\theta \le \theta(x) < b\Delta\theta \\ 0 & \text{elsewhere} \end{cases}    (5)

where R_b(x) is an activation function that determines the particular bin that codes the local kinematic feature, while W(x) corresponds to a particular weight for each histogram bin. In the case of orientation flow histograms (HOOF), the bins b correspond to orientations, while W(x) is defined by the norm of each vector [4]. Likewise, the motion boundaries are codified as MBH histograms, quantified for the x and y components [6]. For divergence and curl, the primitives are statistically accumulated by defining the bins as \{\max, \frac{\max}{2}, 0, \frac{\min}{2}, \min\}. In such a case the curl histogram (HCURL) quantifies the main motion around the perpendicular axis, while the divergence histogram (HDIV) summarizes the main moments of divergence present around each spatio-temporal patch. For divergence a simple occurrence counting is carried out, while for the rotational field the occurrence is weighted according to the angular speed. The final descriptor for each patch is formed as the concatenation of all the histograms. Then, a particular sign is defined as a set of n spatio-temporal patches S = \{p^{(c,j)}_{1 \ldots n} : j \in [t_1, t_2];\ c \in [x_1, x_2]\}, bounded in a temporal interval j and spatially distributed in a region c.
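For instance, a HOOF-like histogram for one patch, with orientation bins weighted by the flow magnitude as in Eq. 5, could be sketched as follows (an illustration, not the authors' code):

import numpy as np

def hoof_histogram(u, v, bins=8):
    theta = np.arctan2(v, u) % (2 * np.pi)          # flow orientation theta(x) in [0, 2*pi)
    weight = np.sqrt(u ** 2 + v ** 2)               # W(x): norm of each flow vector
    h, _ = np.histogram(theta, bins=bins, range=(0.0, 2 * np.pi), weights=weight)
    return h / (h.sum() + 1e-8)                     # normalized patch descriptor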

2.4 Mid-Level Partial Accumulated Gesture Representation

A main contribution of this work is the possibility of predicting gestures while they are being developed in the sequence. For this purpose, a set of cumulative partial dictionaries is obtained at different periods of the sequence. Then, an SD-VLAD descriptor can be updated at different times in the video, achieving a prediction of the signs using cumulative patch information. The whole temporal representation is illustrated in Fig. 3 and explained in the following.

Fig. 3. The figure illustrates the mid-level partial accumulated gesture representation. With the patches of all the partial video sequences (a), a dictionary adapted to the partial content is created (b) and updated as the information arrives. Finally, a cumulative coded representation is obtained using hard assignment and SD-VLAD (c). The computed descriptors are mapped to support vector machines previously trained with the accumulated partial descriptors (d).
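A minimal sketch of the cumulative dictionary construction detailed in the next paragraph is shown here, assuming each training video is already described by a time-ordered array of w-dimensional patch descriptors; the scikit-learn k-means call is an illustrative choice, not the authors' implementation.

import numpy as np
from sklearn.cluster import KMeans

def cumulative_dictionaries(train_videos, k=64, steps=(0.2, 0.4, 0.6, 0.8, 1.0)):
    # train_videos: list of (n_patches, w) arrays, patches ordered in time
    dictionaries = []
    for frac in steps:
        samples = []
        for descriptors in train_videos:
            n = max(1, int(frac * len(descriptors)))
            samples.append(descriptors[:n])          # only patches seen up to this fraction
        X = np.vstack(samples)
        dictionaries.append(KMeans(n_clusters=k, n_init=10).fit(X))   # centroids of D_i
    return dictionaries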

Gesture accumulated dictionaries. To temporally recognize sign gestures, a set of cumulative dictionaries \Lambda \in \mathbb{R}^{t \times w \times k} was built from training sequences with different gesture interval lengths. Then, \Lambda = [D_1, D_2, \ldots, D_t] has t temporal dictionaries that are built in a cumulative way every 20% of the sequences, i.e., D_1 is a dictionary built only with the first 20% of active patches, D_2 summarizes a representation of 40% of the active sign patches, and so on. Each dictionary D_i = [d_{i1}, d_{i2}, \ldots, d_{ik}] \in \mathbb{R}^{w \times K} has K representative centroids that correspond to w-dimensional kinematic features. Every dictionary D_i is constructed by using a classical k-means algorithm from a cumulative set of samples X^i = [x_1, x_2, \ldots, x_N] that grows as the gesture is developed.


For each dictionary, K 0 is a parameter of energy regulation outside the window-signal. As already mentioned, if C_N < 1, the diffraction efficiency increases but the signal-to-noise ratio decreases; if, on the other hand, C_N > 1, the diffraction efficiency decreases but the signal-to-noise ratio increases.

5 Band-Limited Diffuser Without Phase Singularities

5.1 The Speckle Problem in CGH

If we are only interested in the intensity of the signal, the superposition of a diffuser with a random phase distribution over the initial amplitude, in order to smooth its spectrum, can be used as a free parameter to obtain a high diffraction efficiency and small quantization noise. But this type of phase produces speckles, that is, large fluctuations of intensity that disturb the optically obtained reconstruction. The main origins of these speckles due to the influence of the initial phase are [10,11]: aliasing, Spiral Phase Singularities (SPS) and Phase Singularities (SP).

5.2 Construction

To construct a band-limited pseudo-random diffuser we have used the method that consists in initially constructing a diffuser with a binary phase difference (DDFB) \Delta\varphi, and then sinc-interpolating it to obtain a band-limited diffuser. Such a binary phase makes the sum of the phases around any point always zero; this avoids obtaining a sum equal to 2\pi and, with it, SPS. On the other hand, an ideal sinc-interpolation generates a phase-only diffuser, that is, one of constant amplitude equal to unity. Bräuer et al. [12], performing a simple truncated numerical interpolation (only adjacent sinc functions contribute to the amplitude of the intermediate point), found a value of |\Delta\varphi| \approx 1.335, which allows an acceptable sinc-interpolation. On the other hand, to obtain a smoothed spectrum, the signs of the phase difference must be sufficiently randomized (always avoiding SPS) before performing the numerical sinc-interpolation. We have used a heuristic algorithm developed by Chhetri et al. [13] to obtain a sufficiently randomized DDFB.

5.3 Optimization

The numerical interpolation by cardinal sine functions is the basic operation to construct band-limited pseudo-random diffusers, but it does not produce a phase-only diffuser because it is an approximation of the ideal interpolation: there are always amplitude values different from 1. It is possible to develop a diffuser optimization procedure that takes into account the desired image to be obtained. The quality of the optimization process is measured by the smoothness of its power spectrum. The parameter used to measure such smoothness is the maximum-to-average ratio (\Upsilon), Eq. 14:

  \Upsilon = \frac{\max |F(m', n')|^2}{\overline{|F|^2}}    (14)

Let \exp[i\tilde{\varphi}_0(m', n')] be the interpolated version (by cardinal sines) of the pseudo-random band-limited diffuser, with the obtained binary phase difference, that we want to optimize. Let f(m', n') be the interpolated version of the object to be reconstructed, and let W_F be the domain of the power spectrum of f. The complex amplitude distribution that serves as input to the iterative optimization process is constructed by superimposing the diffuser on the interpolated version of the object, Eq. 15:

  f_0(m', n') = f(m', n') \exp[i\tilde{\varphi}_0(m', n')]    (15)

The description of the j-th iteration is as follows:
(i) A retro-propagation is performed by means of an inverse Fourier transformation to obtain Eq. 16:

  F_j(m, n) = \mathcal{F}^{-1}[f_j](m, n) = |F_j(m, n)| \exp[i\Phi_j(m, n)]    (16)


(ii) A regularization of the power spectrum is performed, keeping its support limited, Eq. 17:

  \tilde{F}_j(m, n) = \begin{cases} F_j(m, n) & \text{if } 0 \le |F_j(m, n)| \le F_U,\ (m, n) \in W_F \\ F_U \exp[i\Phi_j] & \text{if } |F_j(m, n)| > F_U,\ (m, n) \in W_F \\ 0 & (m, n) \notin W_F \end{cases}    (17)

where F_U is the value that serves as a reference for the local operation¹ that maintains the regularization of the spectrum. This value is defined by Eq. 18:

  F_U = \left( \Upsilon_{des}\, \overline{|F(m, n)|^2} \right)^{1/2}, \quad (m, n) \in W_F    (18)

where \Upsilon_{des} represents the \Upsilon that one wants to obtain. It is advisable to make a trade-off between \Upsilon and the normalized quadratic error: a low reference value F_U leads to a smoother power spectrum, the aim being to obtain it without speckle. An acceptable compromise is obtained by choosing \Upsilon_{des} as a fraction of the initial \Upsilon_{ini}, that is, \Upsilon_{des} = b\,\Upsilon_{ini}, where 0 < b < 1.
(iii) The effect of the retro-propagation is calculated by a Fourier transformation, obtaining Eq. 19:

  \tilde{f}_j(m', n') = \mathcal{F}[\tilde{F}_j](m', n') = |\tilde{f}_j(m', n')| \exp[i\tilde{\varphi}_j(m', n')]    (19)

(iv) The amplitude is replaced by that of the object to be reconstructed, obtaining the input distribution for the next iteration, Eq. 20:

  f_{j+1}(m', n') = |f_0(m', n')| \exp[i\tilde{\varphi}_j(m', n')]    (20)

At the end of the process, a band-limited pseudo-random diffuser is obtained, without SP or SPS. In addition, the complex amplitude that is obtained (desired image + diffuser) has a sufficiently smoothed power spectrum.
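The following NumPy sketch summarizes one possible reading of iterations (i)-(iv) above; the FFT conventions, the choice \Upsilon_{des} = b\,\Upsilon_{ini}, and the number of iterations are assumptions for the example, not the authors' implementation.

import numpy as np

def optimize_diffuser(f, phi0, WF, b=0.8, iterations=50):
    # f: real object amplitude; phi0: initial band-limited diffuser phase; WF: boolean support mask
    fj = f * np.exp(1j * phi0)                                   # Eq. (15)
    F = np.fft.ifft2(fj)
    upsilon_ini = np.max(np.abs(F) ** 2) / np.mean(np.abs(F) ** 2)     # Eq. (14)
    FU = np.sqrt(b * upsilon_ini * np.mean(np.abs(F[WF]) ** 2))        # Eq. (18), Y_des = b * Y_ini
    for _ in range(iterations):
        F = np.fft.ifft2(fj)                                     # (i) retro-propagation, Eq. (16)
        mag, phase = np.abs(F), np.angle(F)
        F = np.where(mag > FU, FU * np.exp(1j * phase), F)       # (ii) clip high peaks, Eq. (17)
        F = np.where(WF, F, 0.0)                                 #      and keep the support limited
        fj_back = np.fft.fft2(F)                                 # (iii) propagate back, Eq. (19)
        fj = np.abs(f) * np.exp(1j * np.angle(fj_back))          # (iv) restore object amplitude, Eq. (20)
    return np.angle(fj)                                          # optimized diffuser phase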

6 IFTA for Speckle Reduction

In what follows, we describe the variants that must be introduced in an IFTA to obtain an image with reduced speckle. These variants are applied both in the plane-signal and in the plane-image.

6.1 In the Initial-Signal Plane

The initial plane is built by placing in the window-signal W_S a sinc-interpolated version of the desired image. A band-limited diffuser without SP or SPS must be superimposed on such a version of the desired image. Such a diffuser can be a spherical one of the same size as the window-signal, or a pseudo-random diffuser such as the one resulting from the optimization process described in Sect. 5.3.

¹ The operation is local in the sense that a very high value is replaced by a lower value at a given point.

6.2 On the Hologram-Plane

Hard-Coding. For the reduction of speckles due to aliasing, the finite size of the DOE must be considered within the IFTA. To do this, the illumination wave U(x, y) is embedded in a matrix of zeros twice its size in order to simulate the finite size of the DOE. With these considerations, the operator H_{hard} is defined in Eq. 21:

  \tilde{g}_j(m, n) = H_{hard}[g_j(m, n)] = \begin{cases} U(m, n) \exp[i\Phi_j(m, n)] & (m, n) \in W_H \\ 0 & (m, n) \notin W_H \end{cases}    (21)

where W_H is the hologram window where the DOE is contained. In each iteration, the field that arrives outside the window-hologram is set to zero.

Soft-Coding. The strong restriction of maintaining the band-limited signal (limited spectrum in the hologram-plane) is imposed through the operator H_{hard}. A smooth application of such a restriction can be performed in a process that divides the total number of iterations into P steps. In the p-th step one can use Eq. 22 [11]:

  \tilde{g}_j(m, n) = H_{soft}[g_j(m, n)] = \omega(p)\, H_{hard}[g_j(m, n)] + U(m, n)\,(1 - \omega(p))\, I[g_j(m, n)]    (22)

where I is the identity operator and \omega(p) is a parameter that takes values from 0 to 1 at each step p. In the last step, the operator H_{hard} is used. The parameter \omega is such that in each step the amplitude of the signal outside W_H decreases; it is totally eliminated in the last step. Until now, \omega values that allow an optimal soft-coding have not been reported in the literature. We have deduced optimal values of \omega from an analysis of the mean square error in each iteration. The values that we have found are shown in Eq. 23:

  \omega(1) = 0.25;\ \omega(2) = 0.43;\ \omega(3) = 0.55;\ \omega(4) = 0.65;\ \omega(5) = 0.73;\ \omega(6) = 0.79;\ \omega(7) = 0.85;\ \omega(8) = 0.90;\ \omega(9) = 0.95;\ \omega(10) = 1.00.    (23)
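As a sketch of Eqs. 21-23, under the reconstruction given above and not as the authors' code, the two coding operators can be written as:

import numpy as np

OMEGA = [0.25, 0.43, 0.55, 0.65, 0.73, 0.79, 0.85, 0.90, 0.95, 1.00]   # Eq. (23)

def h_hard(g, U, WH):
    # Eq. (21): inside the hologram window keep the illuminated phase, outside set the field to zero
    return np.where(WH, U * np.exp(1j * np.angle(g)), 0.0)

def h_soft(g, U, WH, p):
    # Eq. (22): in step p (1..10), blend the hard constraint with the unconstrained field
    w = OMEGA[p - 1]
    return w * h_hard(g, U, WH) + (1.0 - w) * U * g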

7 IFTA for Beam Shaping

One application of CGH is Beam Shaping (BS). BS consists in transforming the incident beam into a flat geometric shape (square, circle, rectangle, etc.). The quality of the transformation is measured by the Uniformity (U). The hard restrictions in both the signal plane and the DOE plane make the BS problem non-linear, and the use of a simple scale factor in the operator S is not enough to maintain a compromise between diffraction efficiency and uniformity. For this reason it is necessary to use a relaxation parameter that allows controlling the convergence rate of the algorithm, and a stabilization parameter that allows a more stable process, softening the strong restriction of maintaining the object in the signal window.


To find the appropriate relaxation and stabilization parameters, Kim et al. [14] proposed a new IFTA scheme using first-order Tikhonov regularization theory [15]. In this proposed IFTA, a Laplacian filter is applied to the light field that arrives on the signal window in order to improve smoothness. They have shown that such a Laplacian operation, in combination with the parametric distribution of adaptive regularization [16,17], is a good strategy to approach the optimal compromise between diffraction efficiency and uniformity. They found that the complex amplitude distribution f_{j+1} used as input to the next iteration (given by the operator S) is given by Eq. 24:

  f_{j+1}(m, n) = S[f_j(m, n)] = \begin{cases} \tau f_0(m, n) \exp[i\tilde{\varphi}_j(m, n)] + \left[1 - \tau - \tau \beta_S(m, n)\right] f_j(m, n) + \tau \beta_D \nabla^2 |f_j(m, n)| \exp[i\tilde{\varphi}_j(m, n)] & (m, n) \in W_S \\ (1 - \tau \beta_N) f_j(m, n) & (m, n) \notin W_S \end{cases}    (24)

where:
– \tau is a constant relaxation parameter.
– \beta_S(m, n) is the parametric distribution of adaptive stabilization defined by Eq. 25:

  \beta_S(m, n) = \frac{2\gamma}{\pi} \tan^{-1}\!\left( \frac{f(m, n) - B_j f_0(m, n)}{B_j f_0(m, n)} \right) + \gamma - 1    (25)

  Here \gamma is a constant parameter that serves to control the contribution of \beta_S, and B_j is the same as defined by Eq. 13.
– \beta_D is a constant parameter that serves to control the contribution of the Laplacian \nabla^2 of |f_j(m, n)| to the object.

8 Iterative Quantization of Phase Distribution

The numerically calculated phase of the DOE transmittance function is encoded in gray levels, which must be quantized to simplify the manufacturing process. A direct quantization rule can be implemented within an IFTA: in each iteration, the phase is fully quantized into the desired number of levels. Such a direct quantization process stagnates in a few iterations. To avoid this stagnation, Wyrowski [2] proposes a step-by-step iterative quantization rule. The total number of iterations is divided into steps; in each step, a progressively larger range of values to be quantized is chosen, until a direct quantization is performed in the last step. In this way, the levels that are not yet quantized prevent the process from stagnating, so that the diffraction efficiency and the signal-to-noise ratio are not strongly affected with respect to the values obtained with the DOE with unquantized phase.
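One possible reading of this step-by-step rule is sketched below; the snapping criterion (a window around each level that grows linearly with the step number) is an assumption, since the exact rule is not given here.

import numpy as np

def quantize_step(phase, levels, step, total_steps):
    # phase in [0, 2*pi); only values close enough to a quantization level are snapped in this step
    q = 2 * np.pi / levels
    target = np.round(phase / q) * q                      # nearest quantization level
    window = (step / float(total_steps)) * (q / 2.0)      # range of values grows at each step
    snapped = np.abs(phase - target) <= window            # last step: everything is quantized
    return np.where(snapped, target, phase)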

9 Description of the WEB Application for the Calculation of Holograms

The WEB application that we have developed allows calculating a matrix whose elements are the phase distribution \Phi(m, n) between 0 and 2\pi. In this first version, only one illumination U_A(m, n), a monochromatic plane wave, is considered. The user can choose between speckle-free holograms or not. For the calculation of speckle-free holograms, only a pseudo-random diffuser constructed with binary phase differences, as described in Sect. 5.2, has been used. For the design of the WEB application a 3-layer architecture was used: layer 1 is the presentation layer, which shows and captures the user's data; layer 2, where the procedure for the calculation of the hologram is carried out, is implemented in the Python language; and layer 3, the data layer, is responsible for storing and retrieving data from the database for further analysis. The communication between the presentation layer and the calculation layer is through web services. These services will allow the application to integrate with other applications that require its services. At the moment the application is deployed on an anonymous server, and the user can access it through the address http://128.75.231.110. The main page of the application is shown in Fig. 1. This page shows the following icons:
– User's guide. Through which the user accesses a page that describes all the steps to follow to generate the hologram.
– Start. Clicking on it gives access to the calculation of the hologram.
– Supports. Where a list of the scientific publications that support the procedure used in our application for the calculation of holograms is shown.
– Team. Where the working group that has collaborated in the realization of the application is shown, with a brief description of each member.
– Contacts. Through which the user accesses a form where he enters his name, email address and a text message to contact the administrator of the application.
The requirements for the user on each page and the procedure for calculating the hologram are described below.
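As an illustration of how the presentation layer could call the Python calculation layer through a web service, a minimal endpoint is sketched below; the route name, field names and the compute_hologram() helper are hypothetical and are not the actual HoloEasy interface.

import io
import numpy as np
from flask import Flask, request, jsonify
from PIL import Image

app = Flask(__name__)

def compute_hologram(image, speckle_free=True, levels=None):
    # placeholder for the IFTA pipeline of Sect. 9.2; returns a phase matrix in [0, 2*pi)
    return np.zeros_like(image, dtype=float)

@app.route("/hologram", methods=["POST"])
def hologram():
    img = np.asarray(Image.open(io.BytesIO(request.files["image"].read())).convert("L"))
    phase = compute_hologram(
        img,
        speckle_free=request.form.get("speckle_free", "true") == "true",
        levels=request.form.get("levels", type=int))
    return jsonify({"shape": list(phase.shape), "phase": phase.tolist()})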

Fig. 1. The main page of the HoloEasy web application

9.1 Requirements

By clicking on Start, the user accesses a series of pages where all the requirements are requested. Such pages are shown in Fig. 2. The requirements are:
a. Desired image. The user must previously have the image he wants to obtain stored on his PC. On this first page, through a pushbutton, the application asks the user for the desired image. On clicking, the application opens the file browser and the user must go to the file where the image is stored. Once the image is loaded, the application displays the desired image and its size in pixels on a viewer. Once the image is verified by the user, he can continue with the second requirement through a pushbutton (see Fig. 2a).
b. Application type. On the second page, the user can choose between a speckle-free hologram or not. For this, the application has a radio button. It also has two TextEdits, one to enter the size of the plane-signal and another to enter the size of the hologram window W_H. Through a pushbutton, the user can continue with the third requirement (see Fig. 2b).
c. Quantization. On the third page, through a radio button, the user can choose whether he wants a quantized hologram or not. If he chooses a quantized hologram, he can enter, through a TextEdit, the number of levels into which he wants to quantize the phase distribution. Through a pushbutton, the user can continue with the fourth requirement (see Fig. 2c).
d. Parameters. The user, through TextEdits, can assign values to the four parameters that define the operator S given by Eq. 24: a value for the parameter \tau (regulation inside the window-signal) and for the parameter \beta_N (regulation outside the window-signal), which regulate the contribution of the field that arrives inside and outside the window-signal respectively; a value for the parameter \gamma (stabilization parameter), which serves to improve the uniformity; and a value for the parameter \beta_D (relaxation parameter), which serves to increase the diffraction efficiency without losing uniformity. Also, the user, through a TextEdit, can assign the number of iterations in each of the 10 steps of soft-coding. Here, different values of both the parameters and the number of iterations can also be assigned for the quantization stage (see Fig. 2d).

9.2 Procedure for Calculating the Hologram

Although several versions of the IFTA have been described in the previous sections, our contribution lies in establishing an adequate procedure to calculate Fourier phase holograms. Such a contribution consists in combining different versions, so that the result of a simple version serves as input to a more elaborate one. In addition, the procedure we have devised to calculate a hologram depends on the type of application the user intends for the hologram. Once all the requirements have been entered (Sect. 9.1), the WEB application performs the following procedure:


Fig. 2. Series of pages where all the requirements described in Sect. 9.1 are requested: (a) Image, (b) Application type, (c) Quantization and (d) Parameters.

1. Calculation of the initial diffuser. For speckle-free optical applications: a pseudo-random initial band-limited diffuser without spiral phase singularities is calculated from binary phase differences, as described in Sect. 5.2. For other applications: the initial diffuser is constructed with a uniform random phase distribution between -\pi and \pi.
2. Optimization of the diffuser. For speckle-free optical applications: the optimization process is performed as described in Sect. 5.3 with a value of b = 0.8. For other applications: it is not necessary to optimize the diffuser.
3. Construction of the initial plane-signal and selection of the size of the window-hologram W_H. For speckle-free optical applications: the initial plane is constructed as described in Sect. 6.1, using a pseudo-random initial band-limited diffuser without spiral phase singularities. The sizes of the plane-signal and of the window-hologram are determined by the values entered by the user in requirement Sect. 9.1(b). The window-hologram is placed in the center of the hologram-plane. For other applications: the initial plane-signal is constructed using a uniform random phase distribution, and the size of the window-hologram is the same as that of the plane-signal.


4. Calculation of the unquantized hologram. For speckle-free optical applications, the process is carried out in three stages:
– First stage: an IFTA without regulation parameters is used, as described in Eq. 10, using hard-coding for the band constraint as described in Sect. 6.2.
– Second stage: the hologram obtained in the first stage is used as input for an IFTA as described by Eq. 24, using the parameters entered by the user in requirement Sect. 9.1(d) and soft-coding for the band limitation as described in Sect. 6.2, with the number of iterations in each step entered by the user.
– Third stage: the hologram obtained in the second stage is used as input for an IFTA as described by Eq. 11, using the same parameters and hard-coding for the band limitation.
For other applications, the same procedure as above is used, without including the finite size of the hologram. Soft-coding is used only to impose the illumination step by step.
5. Quantization of the hologram. For speckle-free optical applications, the process is carried out in two stages:
– First stage: the unquantized hologram is used as input to an IFTA such as the one described by Eq. 24, using an iterative quantization as described in Sect. 8.
– Second stage: the hologram obtained in the first stage is used as input for an IFTA as described by Eq. 11, using hard-coding for the band limitation and direct quantization.
For other applications, the same procedure as above is used, without including the finite size of the hologram (the size of the hologram window is the same as that of the plane-signal). Soft-coding is used only to impose the illumination step by step.

10 Results

Once all the requested requirements have been entered, the user can make the application start the calculation of the hologram through a pushbutton. When the application finishes the calculations, it shows the obtained results. Figure 3 shows the result obtained by the HoloEasy application: Fig. 3a shows the desired image loaded by the user in the first request, and Fig. 3b shows the calculated hologram (unquantized and quantized) and the respective reconstruction. The quality parameters of the obtained reconstruction are also shown on the same page. Finally, the user can download the hologram and the respective reconstruction.


Fig. 3. Results: (a) shows the image desired by the user, and (b) shows the hologram and the respective reconstruction obtained by the HoloEasy application; the values of the quality parameters are also displayed.

11 Conclusion and Perspective

In this work, we have described the procedure used to calculate a phase hologram, with which a WEB application has been developed. The application allows the user to calculate holograms both for speckle-free optical applications and for other applications. In addition, the user can set parameters that allow obtaining the desired image with a good signal-to-noise ratio and high diffraction efficiency. For beam shaping applications, the user can control parameters that allow obtaining the desired image with good uniformity without losing diffraction efficiency. As a perspective, a new version is planned in which the user would be allowed to choose the type of illumination and the type of band-limited initial phase, together with the implementation of a repository where the band-limited initial phases generated by users will be kept. Such phases could be reused by other users, which would save time in the calculation of such diffusers. The authors thank Hernando-Claret Ariza-Pérez for his help in improving the appearance of the web application.

References

1. Herzig, H.P.: Micro-Optics: Elements, Systems and Applications. Taylor and Francis, London (1998)
2. Wyrowski, F.: Diffractive optical elements: iterative calculation of quantized, blazed structures. J. Opt. Soc. Am. 7, 961–963 (1990)
3. Pellat-Finet, P.: Optique de Fourier, théorie métaxiale et fractionnaire. Springer, Paris (2009)
4. Gerchberg, R.W., Saxton, W.O.: A practical algorithm for the determination of phase from image and diffraction plane pictures. Optik 35, 237–346 (1972)
5. Fienup, J.R.: Reconstruction of an object from the modulus of its Fourier transform. Opt. Lett. 3, 27–29 (1978)
6. Youla, D.C.: Generalized image restoration by the method of alternating orthogonal projections. IEEE Trans. Circuits Syst. 25, 694–702 (1979)
7. Gerchberg, R.W.: Super resolution through error energy reduction. Opt. Acta 21, 709–720 (1974)
8. Papoulis, A.: A new algorithm in spectral analysis and band-limited extrapolation. IEEE Trans. Circuits Syst. 22, 735–742 (1975)
9. Fienup, J.R.: Phase retrieval algorithm for a complicated optical system. Appl. Opt. 32, 1737–1746 (1993)
10. Wyrowski, F., Bryngdahl, O.: Iterative Fourier-transform algorithm applied to computer holography. J. Opt. Soc. Am. A 5, 1058–1064 (1988)
11. Aagedal, H., Schmid, M., Beth, T., Teiwes, S., Wyrowski, F., Chaussee, R.: Theory of speckles in diffractive optics and its application to beam shaping. J. Mod. Opt. 43, 1409–1421 (1996)
12. Bräuer, R., Wyrowski, F., Bryngdahl, O.: Diffuser in digital holography. J. Opt. Soc. Am. A 8, 572–578 (1991)
13. Chhetri, B., Serikawa, S., Shimomura, T.: Heuristic algorithm for calculation of sufficiently randomized object-independent diffuser for holography. SPIE 4113, 205–216 (2000)
14. Kim, H., Lee, B.: Iterative Fourier transform algorithm with adaptive regularization parameter distribution for optimal design of diffractive optical elements. Jpn. J. Appl. Phys. 43, 702–705 (2004)
15. Tikhonov, A., Goncharsky, V., Stepanov, V., Yagola, A.: Numerical Methods for the Solution of Ill-Posed Problems. Kluwer Academic, Boston (1995)
16. Kotlyar, V., Seraphimovich, P., Soifer, V.: An iterative algorithm for designing diffractive optical elements with regularization. Opt. Lasers Eng. 29, 261–268 (1998)
17. Kim, H., Yang, B., Lee, B.: Iterative Fourier transform algorithm with regularization for optimal design of diffractive optical elements. J. Opt. Soc. Am. A 21, 2353–2365 (2004)

Integrated Model AmI-IoT-DA for Care of Elderly People

Andrés Sánchez, Enrique González, and Luis Barreto

Faculty of Engineering, Pontificia Universidad Javeriana, Bogotá, Colombia {asanchez-m,egonzal,luis.barreto}@javeriana.edu.co

Abstract. Elderly people suffer physical and mental deterioration, which prevents and limits them from managing household chores; they lose their independence and autonomy, affecting their quality of life and well-being. In this paper an AmI-IoT integrated layered model is introduced. The proposed model combines functionalities of the Internet of Things (IoT), Ambient Intelligence (AmI) and Data Analytics (DA) to provide a reference for the monitoring and assistance of elderly people. The model proposes four segments responsible for automating the home, supervising the user, taking reactive actions, supervising events, identifying habits, and providing access to AmI, IoT and Data Analytics services.

Keywords: Internet of Things · Ambient intelligence · Data analytics · Elderly care

1 Introduction

This paper presents Quysqua, a layered model for integrating the reference models of IoT, AmI and Data Analytics. The model has been designed to be applied in tele-care applications. The reference case used for its validation is the care of elderly people. A problem that stands out nowadays is the way of life of a person in his own home: due to the natural process of aging, he begins to manifest symptoms of deterioration that affect his quality of life. The main difficulties that elderly people present are the decrease in physical, cognitive and physiological capacity, and the loss of their senses, among others [1]. For this reason, organizations such as the Convention on the Rights of Persons with Disabilities mention in their agreements that older adults can only carry out simple tasks [2]. Therefore, they want someone in the home to take care of them and help them in whatever they need, but in reality this is not easily fulfilled by families. This is a very relevant problem since, by 2020, it is estimated that, for every 100 adolescents under 15 years, there will be 50 people over 60 years old. In addition, the WHO [3] announces a life expectancy of 75 to 80 years, so it is understood that there will be more elderly people living alone in their homes. Persons of advanced age who are also alone and sick must assume travel costs, and face traffic problems, insecurity and the risk of contagion of diseases [4], having to attend congested hospital centers for routine checkups and treatments of their diseases. Depending on the degree of severity of the disease, it can be treated on an outpatient basis, remotely at home [5].


This benefits the health of patients, contributing to the management of hospital centers and allowing a better use of doctors' time [6]. In order to address this problem, several useful approaches have been developed, using different technologies designed for home care such as: wireless sensor networks [7], remote monitoring at home via cell phones [8], augmented reality using mobile devices [9], collection and analysis of biomedical signals [10], and remote medical devices [11]. Some of these technologies support parametric and remote management [12], satellite communication [13], and integration with geographic information systems [14]. Another approach that supports the well-being of the elderly is the one proposed by Hassanalieragh [15], which defines the opportunities and challenges of a solution for monitoring health conditions by implementing the Internet of Things (IoT) as the impact technology. Zamora [16] proposes DOMOSEC, a solution for home automation that could be used for monitoring people, using commercial devices. Mileo [17] proposes the use of wireless sensor networks (WSN) to support a "Smart House" system for the care of elderly people. Dogali [18] proposes an architecture for tele-care that monitors vital signs such as the electroencephalogram (EEG) and the electrocardiogram (ECG) in real time, for timely attention in medical applications for elderly patients. Silva [19] produced Unimeds, a system that relies on ambient intelligence for the supply and control of medicines for people who are assisted at home. To guarantee the care of vulnerable older adults, support systems and methodologies for remote health care of patients have been developed [20], and systems for the transmission, consolidation and processing of biomedical signals [21], support systems for medical decision making [22], and systems for the generation of alerts [23] have also been proposed. The main objective of this article is the development of a model that integrates some of the models already mentioned. This integration is divided into two dimensions. The first dimension focuses not only on the care and well-being of elderly people, but also on supporting sick older adults who need medical care at home. The second dimension is focused on the computing field, integrating in a common framework different components, from the hardware layer up to analytics tools. This article is organized as follows: in the first section the general description of the proposed model is introduced, the second section includes the validation tests and the analysis of the obtained results, and in the final section some conclusions and future work are presented.

2 General Description

The Quysqua model is organized in four large stacked segments: the IoT segment (Internet of Things), the AmI segment (Ambient Intelligence), the DA segment (Data Analytics) and the ApS segment (Application Segment). Each segment groups functionalities encapsulated in layers, components and sub-components, as shown in Fig. 1. This model aims to be a complete reference model, although in an implementation it is not mandatory to include all the segments, layers, components and sub-components. In practice, depending on the specific application, only the required elements have to be developed. In the work presented in this paper, the reference case of the care of elderly people was used as a basis to create and validate the model; however, most of the elements are general and could also be applicable to other types of contexts.


Fig. 1. General scheme of the Quysqua model.

In the IoT segment, the layers are responsible for the configuration of node networks (sensors and actuators), the integration of heterogeneous low-level communication technologies, and the basic management (homogenization, transformation and validation) of data collected from sensors and sent to actuators, seen as ubiquitous services. In this segment, home automation is done, creating networks of specific nodes to supervise and control the house and to monitor the user's state and behavior, thus providing a platform that is responsible for the integration and communication with the physical devices of the house and the user. The layers of the AmI segment are responsible for consuming the data supplied by the IoT segment, applying edge intelligence for reactive decision making, and providing services for sending and receiving messages through the AmI communication interface. In this segment, the interpretation of the environment and of the user's behavior is performed using the information produced by the sensors. Important situations are detected, yielding the generation of events. These events can produce immediate actions on the environment, but can also send messages to the higher segments alerting about the detected situation. The objective of the DA segment is to perform long-term and more exhaustive analysis of the data recovered from the lower segments. The results generated by this segment can be made accessible to the users as support information for taking high-level decisions. The DA segment is composed of three layers. The lower layer is devoted to the storage, cleaning and organization of the data. The second layer incorporates a set of different tools to perform data analytics, most of them based on machine learning techniques. The higher layer includes a service interface that is used to access the services delivered by the segment from the higher application layer.


Finally, the ApS segment is in charge of providing a set of services for enriching the information and its presentation by using adaptation techniques, allowing personalization and context awareness, as well as tools for the visualization and presentation of information to the users. The service manager layer is in charge of controlling session creation and ensuring correct access control to the services used by the application in all the layers of the system. The last and highest layer includes a set of tools to ease the development of applications. A more detailed description of the components included in each layer of the Quysqua model is out of the scope of this paper.

3 Validation and Analysis of Results

In order to validate that the Quysqua model is coherent and that it can be applied to solve the problem of the care of elderly people introduced in Sect. 1, four types of tests were designed and applied.

3.1 Validation of the Reference Case

The Quysqua model is applied in a physical dimension, attending to the care of a person's cardiac rhythm disease. The idea is to analyze how each layer of the model could contribute to dealing with this situation in a coherent and useful way. The reference case of the heart rhythm starts at the low level in the IoT segment by connecting wearable sensors to measure the patient's heart rate through signals such as systolic blood pressure and oxygen saturation. This information is made available using the component that publishes the data produced by these sensors; the layer not only manages the connections to the nodes that provide the data, but also monitors their proper operation and makes low-level validations regarding the correctness and coherence of the data. Once the AmI segment obtains the sensor information from the ubiquitous layer, the edge intelligence layer, according to reactive rules, can for instance inform the elderly person that he should take his medications every day at 5 pm. This layer can also analyze the body signals to detect whether anything is out of the normal margins; when an anomaly is detected, immediate actions can be taken, for instance activating an alarm at home or reminding the person to take a medication. In this case, the AmI Services layer sends the information to the upper layers that have subscribed to it, for instance, sending it to be included in the data analytics storage, or even directly creating an external notification that can be sent to the doctor or emergency services through the ApS segment. In the DA segment, a long-term analysis can detect and diagnose a variation or anomaly in the condition of the person, and even compare this result to what is happening with other persons; this kind of high-level information can be used by the doctor to decide, for instance, that the patient should take the medication on a different schedule. The access to this information by users such as the doctor is done, for instance, through a web application, using the services of the ApS segment; the doctor can decide the new frequency with which the older adult should take the medicine and also introduce it into the system to modify the rules that fire actions in the AmI segment.
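A minimal sketch of the kind of reactive rule the edge intelligence layer could evaluate is shown below; the vital-sign thresholds, the 5 pm reminder and the notify() callback are illustrative assumptions, not part of the Quysqua specification.

from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Rule:
    condition: Callable[[Dict], bool]      # predicate over the latest sensor sample
    action: Callable[[Dict], None]         # immediate action or message to upper segments

def build_rules(notify) -> List[Rule]:
    return [
        # body signals out of normal margins: fire a local alarm and alert the upper layers
        Rule(lambda s: s["heart_rate"] > 110 or s["spo2"] < 90,
             lambda s: notify("alarm", "abnormal vital signs: %s" % s)),
        # scheduled reminder: take the medication every day at 5 pm
        Rule(lambda s: s["time"].hour == 17 and s["time"].minute == 0,
             lambda s: notify("reminder", "time to take the medication")),
    ]

def evaluate(rules: List[Rule], sample: Dict) -> None:
    for rule in rules:
        if rule.condition(sample):
            rule.action(sample)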


Other cases relative to the care of an elderly and/or sick person have also been analyzed. The cases include not only physical care situations, but also emotional support requirements, such as the need of a person to communicate with his relatives. Similar results are obtained, showing that the services of each segment, layer and component are coherent and useful to support the well-being of the persons in the reference case.

3.2 Validation in Relation to Well-Known Models of Data Analytics

To prove that the Quysqua model meets the minimum functionalities required to achieve a good level of data analytics, three papers presenting well-known models of data analytics were chosen and compared to the proposed model. The purpose of this test is to demonstrate that the model covers the most important functionalities proposed by the analyzed ones. The selected works were chosen under the criteria of having a relation with the specific reference case, assisting an older adult at home, but also of including the implementation of data analytics techniques and the use of algorithms for data analytics. Table 1 shows a description of the models that were used in the comparison.

Table 1. Description of already recognized data analytics models.

Model: Oracle - 2017, Oracle Advanced Analytics' Machine Learning Algorithms SQL Functions [24]
Description: Oracle proposes techniques and algorithms that should be used in a data analytics application whose sources are the large amount of information provided by the Internet of Things.

Model: CISCO - 2016, reference model IoT/edge computing [25]
Description: Cisco proposes a decentralized reference model focused on fog computing. The main idea is that the data, computation, processing and storage are distributed between the cloud and a low-level fog layer close to the origin of the data, which processes the data and stores it in the network devices that are closest to the user (fog nodes), without needing to do everything in the high-level cloud.

Model: NIST - 2017, Big Data Reference Architecture NBDRA [26]
Description: NIST (National Institute of Standards and Technology of the USA) proposes a Big Data reference architecture called NBDRA. It provides a framework for supporting a variety of business environments and industries, including tightly integrated business systems, improving the understanding of how big data complements and differentiates itself by integrating data analysis, business intelligence, databases and existing systems.


As a result of this test, it was found that the Quysqua model encompasses most of the functionalities defined in the models with which it was compared. The final results of the analysis of each model are presented below.
• ORACLE: to comply with all the techniques and algorithms proposed by the Oracle model, the Quysqua model has the DA segment and, more specifically, the Analytic Tools component, which incorporates the different techniques (classification, regression, anomaly detection, attribute importance, feature extraction, association rules and clustering) and their implementation algorithms.
• CISCO: the Quysqua model complies with all the functionalities of the CISCO reference model, which are included in the AmI segment. Inside the Edge Intelligence layer, the mechanisms for analysis, decision making, event prediction and interaction with the user are defined, providing tools for direct supervision and control of the environment.
• NIST: to comply with all the functionalities of the NBDRA reference architecture, the Quysqua model has the DA segment, which contains the Big Data Storage, Analytic Tools and Analytic Services components. In addition, functions such as data visualization are included in the ApS segment within the Application Tools layer.

3.3 Evaluation Test Using a TAM Acceptance Model

An acceptance evaluation test was carried out in order to prove that the Quysqua model is coherent, complete, modular, relevant and applicable. The proposed model was presented to five experts in the areas of data analytics and the Internet of Things. The experts answered a model acceptance survey, which was built following the TAM methodology. The goal was to measure whether the model meets the requirements of the reference case using IoT, AmI and DA technologies. The model was evaluated using six criteria, with a set of 23 questions. Each question was scored on a scale of 1 to 5 (1 totally disagree, 2 disagree, 3 neither agree nor disagree, 4 agree and 5 totally agree). The detailed results of this third validation test are shown in Table 2. As can be seen, the expert-based evaluation of the Quysqua model was positive, and most of the scores are high.

Table 2. Results of the TAM evaluation.

Evaluator        | Coherence | Modularity | Applicability | Completeness | Relevance | Compatibility | Average evaluator
Ev. 1            | 4.6       | 4.5        | 4.0           | 4.3          | 4.6       | 4.5           | 4.41
Ev. 2            | 4.6       | 4.5        | 4.6           | 4.0          | 4.6       | 4.5           | 4.46
Ev. 3            | 4.3       | 3.5        | 4.0           | 4.0          | 4.3       | 5.0           | 4.18
Ev. 4            | 4.4       | 4.0        | 4.5           | 4.8          | 3.2       | 4.0           | 4.15
Ev. 5            | 4.2       | 4.0        | 4.5           | 3.6          | 4.6       | 3.3           | 4.03
Average judgment | 4.42      | 4.00       | 4.32          | 4.14         | 4.26      | 4.26          | -


The experts made several recommendations from their perspective and experience (industry or academia) in data analytics and the Internet of Things. These recommendations will be taken into account in future work to improve the quality of the proposed model. The most relevant recommendations are:
• For the care of older adults, a very careful use of reactive and deliberative rules is mandatory. Safeguard mechanisms should be included to prevent negative effects on the health of the user. The idea of detecting trends and habits is very good, but deeper validation should be done in the real setting of a hospital under the supervision of medical personnel.
• Ratings below 4 for the criteria of modularity, relevance and compatibility were discussed with the experts. They consider that the model is conceptually very well structured, but they expect more evidence of practical application in hospital and real-world situations.
• The low score on the modularity criterion occurred because the evaluators expected a complete functional prototype in which they would be able to use more techniques and analytics tools.

3.4 Multi-agent System Prototype

A multi-agent system (MAS) model was designed in order to build the software prototype that allows the practical application of the Quysqua model to be evaluated. A partial implementation including some relevant cases of the reference problem was constructed. To carry out the design of the multi-agent system, the AOPOA methodology [27] was used. First, the external actors were identified, which become the basis of the analysis of the environment where the system operates. Then an analysis and decomposition of the goals of the system, based on the requirements of elderly people previously described in this paper, was carried out. Next, the roles and interactions were defined to obtain the architecture of the MAS. Finally, a detailed model of the agents that integrate the defined roles was designed; this model includes the description of the behavior that the agent should have in response to the possible incoming events identified in the interaction model, and also determines whether any special intelligent mechanism is required to make decisions. Figure 2 shows the interaction diagram of the designed multi-agent system. This diagram is organized according to the segments and layers of the Quysqua model (Fig. 1). In the figure, the ovals represent the agents, the white arrows interactions, the text on the white arrows indicates the interaction protocols used (request/response and publish/subscribe), and the yellow triangles the adapters to support components (physical sensors/actuators, databases and user interface devices). Table 3 briefly describes the functionalities of each agent. For the implementation of the IoT segment, there were two data sources. The first is a PostgreSQL database that was built with real data in a previous DA project; it contains information on biomedical signals from elderly patients of the San Ignacio hospital who received medical care at home for more than six months. The second is the data, including different physical and affective variables, generated by the simulator developed in the master's thesis of Agreda [28], which reproduces the typical behavior and environment of an older adult who lives alone.


Fig. 2. Multi-agent system model (MAS).

Table 3. Description of the agents of the multi-agent system.

Agent | Description
Node network | Interacts with the environment, both receiving and acting, through the adapters
Sensor service | Sensor handler that manages data reading, for synchronous sensors whose reading is requested
Actuator service | Converts into signals the commands for each of the sensors or actuators it has in charge
Data manager | Receives messages from sensors and actuators and performs a process of homogenization and transformation of the data to generate messages that include cleaned data
Edge intelligence | Makes decisions about the environment and the user, generates knowledge for the AmI services, interacts with the user and provides a first level of data analytics
AmI service | Sends actions towards the IoT segment and notifications to agents of the DA and ApS segments
Ambient interface | Stores reactive rules and data in the database associated with the AmI segment
Analytic services | Generates the DA segment notifications to the AmI segment and the ApS segment
Analytic tools | Performs the modeling of user habits, creates suggestion rules and interprets the results produced by the analytical events supervisor
Big data storage | Stores the deliberative rules and data in the database associated with the DA segment
Notification manager | Gets the notifications from the different agents and generates messages to external users or entities
Data visualization | Manages a presentation layer through a GUI so that the user can access the information using graphics and tables
Service manager | Provides security to the multi-agent system through an access control service
Information requirement | Performs tasks related to the adaptation of the information and contents of the messages received from the AmI or DA segments


A web application with JavaEE-JSF technology, using the EclipseLink JPA 2.1 library to handle persistence to a PostgreSQL database, was developed. A SOAP web service was created for the publication of sensor data, and a middleware was built to collect data from the agents of the simulator subsystem, as well as the data collected by wearable devices. For the implementation of the AmI segment, three important agents had to be built: one for edge intelligence management, one that handles the reactive rules and AmI notifications, and another that manages the persistence in the database. The prototype developed by Agreda was modified and complemented to provide the information required in the cases of the reference problem. The implementation was made by developing a standalone Java SE application that uses the BESA3 agent framework [29] and the jFuzzy library for the management of fuzzy logic in the edge intelligence agent. MySQL-JDBC was used to handle the communication with a MySQL database associated with the ambient interface agent. A SOAP web service client connector was created for the agent in charge of notifications in order to consume the services offered by the IoT segment. A functionality that uses the DBSCAN algorithm was built to demonstrate a case of the implementation of the DA segment services. This clustering technique is used to detect anomalies in the behavior and habits of an elderly person. In the visualization service, the segments generated by the algorithm are marked and labeled with a different color for each detected habit. For greater ease of analysis, the detail of the alerts is presented in the lower part of the user screen. This functionality allows the specialist to filter information for each of the monitored patients and to combine signals for more complex analyses. The implemented system also includes a web application with JavaEE-JSF technology, which uses the EclipseLink JPA 2.1 library to manage persistence to a MySQL database associated with the DA segment data and notifications. The data produced by the AmI segment is consumed through a SOAP web service client. For the implementation of the ApS segment, the free charting libraries PrimeFaces and JFreeChart were used for data visualization. These libraries offer data visualization functionalities that can facilitate the monitoring of patients and the decision-making process, through line graphs that represent the behavior or trend of a given signal, combined graphs to represent trends between minimum and maximum values, and point clouds to identify the trends detected by the DBSCAN algorithm.
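As an illustration of the DA-segment functionality described above, the following Java sketch clusters daily feature vectors with DBSCAN and flags the days that fall outside every cluster as possible anomalies. It uses the Apache Commons Math clusterer instead of the authors' actual code, and the feature values, eps and minPts parameters are assumptions made for the example.

import java.util.ArrayList;
import java.util.List;
import org.apache.commons.math3.ml.clustering.Cluster;
import org.apache.commons.math3.ml.clustering.DBSCANClusterer;
import org.apache.commons.math3.ml.clustering.DoublePoint;

// Sketch: cluster daily feature vectors (e.g. mean heart rate, hours of activity)
// with DBSCAN; days that belong to no cluster are flagged as possible anomalies.
// Feature values, eps and minPts are illustrative assumptions.
public class HabitAnomalySketch {
    public static void main(String[] args) {
        List<DoublePoint> days = new ArrayList<>();
        days.add(new DoublePoint(new double[]{72, 6.5}));
        days.add(new DoublePoint(new double[]{75, 6.0}));
        days.add(new DoublePoint(new double[]{70, 7.0}));
        days.add(new DoublePoint(new double[]{74, 6.2}));
        days.add(new DoublePoint(new double[]{110, 1.5})); // unusual day

        DBSCANClusterer<DoublePoint> dbscan = new DBSCANClusterer<>(5.0, 3);
        List<Cluster<DoublePoint>> habits = dbscan.cluster(days);

        // Points not assigned to any cluster are treated as noise, i.e. anomalies.
        List<DoublePoint> clustered = new ArrayList<>();
        habits.forEach(c -> clustered.addAll(c.getPoints()));
        for (DoublePoint d : days) {
            if (!clustered.contains(d)) {
                System.out.println("Possible anomaly in daily habits: " + d);
            }
        }
    }
}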

4 Conclusions and Future Work

The model resulting from the integration process is a broad and robust model that covers the components and functionalities of AmI, IoT and data analytics, and it was designed to be flexible and modular. The Quysqua model is a reference model that can be applied to the reference case, which focuses on the care and welfare of an elderly person living alone in a housing unit with an automated environment.


Implementations of this model would define a base architecture, in which only the ambient intelligence rules need to be configured to guarantee care and welfare. To evaluate the proposed model, a methodology consisting of four tests was proposed; these tests showed that the model is complete and applicable to telecare. The most detailed and practical test was the development of a functional, partial prototype of the proposed model. This prototype was used to verify that the model can be implemented for the reference case. It can be concluded that data analytics oriented towards adult health care is a growing research area; a strong tendency is observed towards the creation of global DA architectures and offerings through which assistance services can be provided for any type of health condition. The correct operation of the implemented system, using previously stored real data and a simulated environment, demonstrates the feasibility of the proposed model, since it includes its main components. In the near future, a more complete implementation must be constructed in order to operate in a real environment.

References

1. Salech, M.F., Jara, L.R., Michea, A.L.: Cambios fisiológicos asociados al envejecimiento. Rev. Med. Clin. Condes 23(1), 19–29 (2012)
2. Organización de las Naciones Unidas: Convención sobre los derechos de las personas con discapacidad (CRPD). Organización de las Naciones Unidas (2006). http://www.un.org/esa/socdev/enable/documents/tccconvs.pdf. Accessed 5 Apr 2016
3. Organización Mundial de la Salud: Informe mundial sobre el envejecimiento y la salud. Organización Mundial de la Salud (2015). http://apps.who.int/iris/bitstream/10665/186466/1/9789240694873_spa.pdf. Accessed 15 Apr 2016
4. Morelos Ramírez, R.: El trabajador de la salud. Revista de la Facultad de Medicina de la UNAM 57(4), 34–42 (2014)
5. Lopardo, G.: Neumonía adquirida de la comunidad en adultos. Medicina (Buenos Aires) 75(4), 45–257 (2015)
6. Medina, V.: Sistema de pulsioximetría y capnografía para dispositivos móviles Android (Spanish). Biomed. Eng. J./Revista Ingeniería Biomédica 8(15), 36–44 (2014)
7. Villar-Montini, A.: Remote wireless monitoring. Archivos de Cardiología de México 79(2), 75–78 (2009)
8. Chandler, D.L.: How technology will be transforming both inpatient and at-home care. IEEE Pulse 5(6), 16–21 (2014)
9. Jiménez González, C.: Smart multi-level tool for remote patient monitoring based on a wireless sensor network and mobile augmented reality. Sensors 14(9), 17212–17234 (2014)
10. Peláez, L.: Estudio de redes de sensores y aplicaciones orientadas a la recolección y análisis de señales biomédicas. Gerencia Tecnológica Inform. 12(33), 85 (2013)
11. Rajkomar, A.: Understanding safety-critical interactions with a home medical device through distributed cognition. J. Biomed. Inform. 56(1), 179–194 (2015)
12. Fanucci, L.: Sensing devices and sensor signal processing for remote monitoring of vital signs in CHF patients. IEEE Trans. Instrum. Meas. 62(3), 553–569 (2013)
13. Guevara-Valdivia, M.E.: Monitoreo remoto y seguimiento del paciente con desfibrilador automático implantable y terapia de resincronización cardiaca. Archivos de Cardiología de México 81(2), 93–99 (2011)


14. Parra-Henao, G.: Sistemas de información geográfica y sensores remotos. Aplicaciones en enfermedades transmitidas por vectores. CES Medicina 24(2), 75–89 (2010)
15. Hassanalieragh, M.: Health monitoring and management using Internet-of-Things (IoT) sensing with cloud-based processing: opportunities and challenges. In: IEEE International Conference on Services Computing (2015)
16. Zamora, M.: An integral and networked home automation solution for indoor ambient intelligence. Pervasive Comput. 9, 66–77 (2010)
17. Mileo, A.: Wireless sensor networks supporting context-aware reasoning in assisted living. In: Proceedings of the 1st ACM International Conference on Pervasive Technologies Related to Assistive Environments (2008)
18. Dogali-Cetin, G.: A real-time life-care monitoring framework: WarnRed hardware and software design. Turk. J. Electr. Eng. Comput. Sci. (10.3906), 1304–178 (2015)
19. Silva, J.: UBIMEDS: a mobile application to improve accessibility and support medication adherence. In: Proceedings of the 1st ACM SIGMM International Workshop on Media Studies and Implementations that Help Improving Access to Disabled Users, MSIADU 2009 (2009)
20. Giraldo, U.: Modelo de contexto y de dominio para la ingeniería de requisitos de sistemas ubicuos. Revista Ingenierías Universidad de Medellín 9(17), 151–164 (2010)
21. Dogali Cetin, G.: A real-time life-care monitoring framework: WarnRed hardware and software design. Turk. J. Electr. Eng. Comput. Sci. 23(4), 1040–1050 (2015)
22. Celler, B.G., Sparks, R.S.: Home telemonitoring of vital signs: technical challenges and future directions. IEEE J. Biomed. Health Inform. 19(1), 82–91 (2015)
23. Skubic, R.M.: Automated health alerts using in-home sensor data for embedded health assessment. IEEE J. Transl. Eng. Health Med. 3(01), 1–11 (2015)
24. ORACLE: Oracle Advanced Analytics Data Mining Algorithms and Functions SQL API. ORACLE, 11 November 2017. http://www.oracle.com/technetwork/database/enterprise-edition/odm-techniques-algorithms-097163.html. Accessed 28 Nov 2017
25. CISCO: Fog Computing and the Internet of Things: Extend the Cloud to Where the Things Are. Cisco Public, United States (2015)
26. NIST: NIST Big Data Interoperability Framework. NIST Big Data Public Working Group (NBD-PWG), National Institute of Standards and Technology, United States of America (2017)
27. Rodríguez, J., Torres, M., González, E.: La Metodología AOPOA. Pontificia Universidad Javeriana, pp. 71–78 (2007)
28. Agreda Chamorro, J.A.: Diseño de un modelo de inteligencia ambiental para asistir a personas de la tercera edad. Trabajo de Grado: Maestría en Ingeniería de Sistemas y Computación, PI 133(01), 1–67 (2015)
29. González, E., Avila, J., Bustacara, C.J.: BESA: behavior-oriented, event-driven, social-based agent framework. In: Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, PDPTA 2003, vol. 3, pp. 1033–1039 (2003)

Intelligent Hybrid Approach for Computer-Aided Diagnosis of Mild Cognitive Impairment

Juan Camilo Flórez, Santiago Murillo Rendón, Francia Restrepo de Mejía, Belarmino Segura Giraldo, and for The Alzheimer's Disease Neuroimaging Initiative

Universidad Autónoma de Manizales, Caldas, Colombia
[email protected]

Abstract. Mild Cognitive Impairment (MCI) is a paramount nosological entity. The concept was introduced to define the clinical state of decline or loss of cognitive abilities that constitutes an initial stage of severe dementia disorders. However, diagnosing such impairment is a challenging task due to difficulties in cost and time, as well as in finding qualified experts on the topic. In this paper, a hybrid intelligent approach based on symbolic and sub-symbolic machine learning techniques is proposed. It analyzes the results of different cognitive tests to support decision-making by health service staff regarding the mental state of patients. The results show that the proposed approach has a high degree of effectiveness in the computer-aided diagnosis of Mild Cognitive Impairment.

Keywords: Computer-aided diagnosis · Cognitive tests · Machine learning · Mild Cognitive Impairment (MCI)

1 Introduction

In clinical scenarios, the early diagnosis of cognitive impairment is of utmost importance because treatments for certain types of dementia are more effective in their initial stages [31]. However, the traditional detection of cognitive impairment is time-intensive and may require multiple pieces of information (e.g. the results of various cognitive tests). This data is gathered to create a coherent picture of the person's disability, where efficiency and precision are guided by the expertise of a healthcare professional. In addition, the high cost of a medical diagnosis is a main concern and demands cheaper approaches than the traditional diagnostic methods [30]. The concept of cognitive impairment, as a condition that departs from normal aging, has been referred to in the literature for many years. It deals with several problems people have with certain cognitive functions such as thinking, reasoning, memory or attention [23].


In the 1990s, the Mild Cognitive Impairment (MCI) concept arose to designate an early (but abnormal) state of cognitive impairment [4,22]. Since then, understanding this pathology has required great efforts of inquiry from both the clinical and the research perspectives [21]. Initially, the concept of MCI only highlighted the deterioration of memory and emphasized its status as a stage prior to Alzheimer's disease (AD) [18]. Later, it was recognized that this condition could differ in terms of etiology, prognosis, clinical presentation and prevalence [29]. This allowed the concept to be expanded to include different cognitive domains, extending early detection to other dementias in prodromal phases [18]. Similar to the concept of MCI, the process of diagnosing this pathology has evolved over time. Initially, health professionals used multiple cognitive assessment tests [6]. However, modern processes also include physical examinations, laboratory tests and brain imaging [13]. Despite this progress, MCI has multiple sources of heterogeneity, which demand greater efforts in developing alternatives that facilitate the diagnosis of this condition [17]. Lately, the health sector has benefited from advances in computer science. Clinical data is gathered in many places, from hospitals to medical studies; however, this large quantity of information is raw until it is organized and analyzed [20]. Thus, computer-aided diagnostic systems and intelligent health systems [1] have acquired great importance. They provide means to improve the interpretation and manipulation of clinical data through image analysis, data mining and machine learning, among other techniques. This paper explores the benefit of the joint application of two distinct machine learning techniques to assist in the diagnosis of Mild Cognitive Impairment in people older than 60 years of age. For this, the construction of a hybrid ensemble (based on the Bagging method) of Artificial Neural Networks (ANN) and decision trees is proposed, with modifications to the creation and joint prediction algorithms to improve its performance compared with other existing methods. The resulting intelligent hybrid approach is capable of analyzing the results of multiple cognitive assessment tests in order to support MCI diagnosis in the target population. For this purpose, two databases have been used: the first one was obtained from the implementation of the project for the diagnosis and control of chronic non-communicable diseases and cervical and breast cancer, with the support of ICT in the Department of Caldas; the second one was obtained from the data of The Alzheimer's Disease Neuroimaging Initiative¹.

¹ Data used in the preparation of this paper was obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in the analysis or writing of this report. A complete listing of ADNI investigators can be found at: https://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf.

2 Literature Review

Over time, several researchers have designed multiple neuropsychological assessment scales that allow the examination of people's cognitive capacities. The Mini-Mental State Examination (MMSE) [5], the Montreal Cognitive Assessment (MoCA) [19], the Clinical Dementia Rating (CDR) [9] and the Global Deterioration Scale (GDS) [32] are nowadays considered reliable and valid instruments for detecting diseases related to cognitive impairment (including Mild Cognitive Impairment) [33]. The comprehensive application of the aforementioned tools involves long processes; thus, several researchers have focused on proposing alternatives to expedite MCI diagnosis. Consequently, studies involving image analysis (MRI or PET) to determine the prevalence of this condition in humans have increased [6]. Signs such as hypometabolism and medial temporal lobe atrophy have been recorded in people with Mild Cognitive Impairment compared with cognitively normal individuals, and the presence of these conditions even has a high predictive value for progression to other subsequent diseases, such as dementia [24]. In recent years, computer-aided diagnosis (CAD) has become a favorite subject of exploration. It allows physicians to support decision making regarding the diagnosis of a specific disease [3]. Most studies involving CAD and MCI focus on studying structural differences through the analysis of medical images with supervised learning techniques [33]. Along this line, Liu et al. [15] presented an algorithm called MBK for the assisted detection of MCI and AD. This technique models the diagnostic process as a synthesis analysis of biomarkers obtained through medical image processing (MRI and PET). Suk and Shen [25] proposed a computational model for MCI diagnosis based on the classification of components extracted from medical images using support vector machines. Liu et al. [14] designed a method to assist MCI diagnosis involving brain imaging (MRI) through multilayer neural networks. Although CAD represents an important line of research, it is still difficult to apply in primary clinical scenarios due to limited access to medical imaging equipment [33]. This constraint has favored the development of a different kind of MCI detection method, which works in a similar way to computer-aided diagnosis techniques but does not require the use of medical images. Along these lines, Sun et al. [26] propose an algorithm called MNBN to assist Mild Cognitive Impairment diagnosis. It uses information on various characteristics such as age, sex, level of education, MMSE and CDR, among others, to find similar cases that guide the physicians' diagnosis. The authors report that the MNBN method reaches an accuracy of 82%. Umer [28] presents an analysis of the performance of different machine learning methods (decision trees, random forests, Bayesian networks and multilayer perceptrons, among others) in the task of classifying the cognitive state (normal, MCI or AD). The author reports that the evaluated techniques reach between 82% and 89% accuracy.


Later, Williams et al. [30] explore the use of demographic and cognitive-testing data to predict Clinical Dementia Rating (CDR) scores and clinical diagnoses of patients (cognitively sound, mild cognitive impairment or dementia) through the implementation of four machine learning algorithms: Naïve Bayes (NB), decision trees (DT), Artificial Neural Networks (ANN) and Support Vector Machines (SVM). The authors state that the evaluated techniques reach between 74% and 84% accuracy. Finally, Yin et al. [33] propose an intelligent hybrid approach to support MCI and AD detection based on the computational analysis of the results of various cognitive tests such as MoCA and MMSE, among others. This method involves two stages: the first comprises an attribute reduction technique based on genetic algorithms; the second involves the application of uncertain reasoning techniques to predict the probability of MCI or AD. The authors state that the hybrid approach reaches an accuracy of 83.5%.

3 Methodology

3.1 Gathering Information

The current research uses the variables obtained from two databases (see Table 1). The first, named ADNI, comes from The Alzheimer's Disease Neuroimaging Initiative. This initiative was launched in 2003 as a public-private partnership led by principal investigator Michael W. Weiner, MD. The main objective of ADNI has been to test whether Magnetic Resonance Imaging (MRI), Positron Emission Tomography (PET), other biological markers, and clinical and neuropsychological evaluations can be combined to measure the progression of Mild Cognitive Impairment (MCI) and early stages of Alzheimer's disease. Currently, the database has information on 877 patients from the United States of America and Canada, classified into three different groups: 426 healthy subjects, 254 subjects with MCI and 197 subjects with AD. The second database, named Caldas, comes from the project for the diagnosis and control of non-communicable chronic diseases and cervical and breast cancer, with the support of ICT in the Department of Caldas. It provides information on the cognitive testing of the project participants. Currently, the database has information on 370 patients from the department of Caldas in Colombia, classified into three different groups: 226 healthy subjects, 115 subjects with MCI and 29 subjects with Severe Cognitive Impairment (SCI).

3.2 Fundamentals for Decision Trees

A decision tree is a hierarchical structure formed by nodes and directed edges. These trees have three types of nodes: a root node, several internal nodes, and multiple leaf (terminal) nodes. Each terminal node is associated with a class label. Nonterminal nodes, which include the root and the other internal nodes, contain the attribute test conditions used to split the elements that have different characteristics [27]. The measures designed to select the best split of the records at the nonterminal nodes are usually based on the degree of impurity of the descendant nodes: the smaller the degree of impurity, the more skewed the class distribution [27].


Table 1. Description of the demographic and neuropsychological variables included in the study. These were chosen according to an importance analysis based on random forests, as explained in [16].

Name      | Possible values (ADNI) | Possible values (Caldas)
Age       | {x ∈ R, x ≥ 50}        | {x ∈ R, x ≥ 60}
Education | {x ∈ N, x > 0}         | N/A
MMSE      | {0; 1; 2; ...; 30}     | {0; 1; 2; ...; 30}
MoCA      | N/A                    | {0; 1; 2; ...; 30}
CDR       | {0; 0.5; 1; 2; 3}      | {0; 0.5; 1; 2; 3}
GDS       | N/A                    | {1; 2; 3; ...; 7}
Diagnosis | {Healthy; MCI; SCI}    | {Healthy; MCI; SCI}

The two most common impurity measurements are entropy (see Eq. 1) and the Gini index (see Eq. 2) [11]:

Entropy(t) = -\sum_{i=0}^{c-1} p(i|t) \log_2 p(i|t)    (1)

Gini(t) = 1 - \sum_{i=0}^{c-1} [p(i|t)]^2    (2)

where c indicates the number of existing classes and p(i|t) denotes the fraction of the records belonging to class i at a specific node t.
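To make the two impurity measures concrete, the hedged Java sketch below evaluates Eqs. 1 and 2 for a node with a hypothetical class distribution; the proportions used are illustrative only.

// Sketch: impurity measures of a node given the class proportions p(i|t).
// The example distribution {0.6, 0.3, 0.1} (Healthy, MCI, SCI) is illustrative only.
public class ImpuritySketch {

    static double entropy(double[] p) {
        double h = 0.0;
        for (double pi : p) {
            if (pi > 0) {                      // 0 * log2(0) is taken as 0
                h -= pi * (Math.log(pi) / Math.log(2));
            }
        }
        return h;
    }

    static double gini(double[] p) {
        double sumSquares = 0.0;
        for (double pi : p) {
            sumSquares += pi * pi;
        }
        return 1.0 - sumSquares;
    }

    public static void main(String[] args) {
        double[] p = {0.6, 0.3, 0.1};
        System.out.printf("Entropy = %.4f, Gini = %.4f%n", entropy(p), gini(p));
        // Entropy ~ 1.2955, Gini = 0.54; a pure node would give 0 for both.
    }
}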

3.3 Performance Test for Decision Trees

After the construction of the base decision tree technique, a performance test was carried out for assisting MCI diagnosis. In this test, the technique is trained using cross-validation with 10 iterations. Subsequently, the estimates for the records of both databases (ADNI and Caldas) are consolidated in confusion matrices, and the respective metrics of accuracy, precision, sensitivity, specificity and Cohen's kappa coefficient are calculated. The latter is included since the first four may present a bias due to the existing class imbalance [29].
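As an illustration of how the reported metrics can be derived, the following Java sketch computes accuracy and Cohen's kappa from a 3x3 confusion matrix; the matrix counts are invented for the example and do not correspond to the study's data.

// Sketch: accuracy and Cohen's kappa from a 3x3 confusion matrix
// (rows = true class, columns = predicted class). The counts are invented.
public class KappaSketch {

    static double[] accuracyAndKappa(int[][] m) {
        int n = m.length;
        double total = 0, diagonal = 0, expected = 0;
        double[] rowSum = new double[n], colSum = new double[n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                total += m[i][j];
                rowSum[i] += m[i][j];
                colSum[j] += m[i][j];
                if (i == j) diagonal += m[i][j];
            }
        }
        double po = diagonal / total;                         // observed agreement = accuracy
        for (int i = 0; i < n; i++) {
            expected += (rowSum[i] * colSum[i]) / (total * total);  // chance agreement
        }
        double kappa = (po - expected) / (1.0 - expected);
        return new double[]{po, kappa};
    }

    public static void main(String[] args) {
        int[][] confusion = {       // Healthy, MCI, SCI (hypothetical counts)
            {200, 20, 6},
            {15, 100, 0},
            {5, 0, 24}
        };
        double[] r = accuracyAndKappa(confusion);
        System.out.printf("Accuracy = %.4f, Kappa = %.4f%n", r[0], r[1]);
    }
}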

3.4 Fundamentals for Artificial Neural Networks (ANN)

An Artificial Neural Network (ANN) is a mathematical model that aims to simulate the structure and functionality of biological neural networks [7]. The basic building block of any ANN is the artificial neuron (see Fig. 1), that is, a simple mathematical function [12].


Fig. 1. Principle of operation of an artificial neuron

Although the working principles of an artificial neuron are quite simple, the true potential and computational power of these models come to life when they are interconnected in layers to form networks. These networks are based on the fact that complexity can arise from basic and simple rules [12]. In the present study, a specific structure and configuration of artificial neural network is used. The structure is based on a three-layer scheme (see Fig. 2): two layers of rectified linear units (ReLU, see Eq. 3) and one layer with softmax activation functions (see Eq. 4).

ReLU(x) = max(0, x)    (3)

Softmax(x_j) = \frac{e^{x_j}}{\sum_i e^{x_i}}    (4)

The ANN inputs represent the demographic and neuropsychological variables of the patients, while the neurons of the output layer are associated with each of the three possible diagnostic classes (Healthy, MCI, SCI). Finally, a cross-entropy loss function and an Adamax optimizer are used for ANN training; the latter is a variant of the stochastic optimization method known as Adam, based on the infinity norm [10].
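The following Java sketch illustrates a forward pass through one ReLU layer and a softmax output layer, mirroring Eqs. 3 and 4; the input features, weights, biases and layer sizes are placeholders and do not correspond to the trained network described above.

import java.util.Arrays;

// Sketch of a forward pass through a dense ReLU layer followed by a softmax
// output layer (Eqs. 3 and 4). Weights, biases and sizes are placeholders.
public class ForwardPassSketch {

    static double[] dense(double[] x, double[][] w, double[] b, boolean relu) {
        double[] out = new double[b.length];
        for (int j = 0; j < b.length; j++) {
            double s = b[j];
            for (int i = 0; i < x.length; i++) s += w[i][j] * x[i];
            out[j] = relu ? Math.max(0.0, s) : s;          // Eq. 3 when relu == true
        }
        return out;
    }

    static double[] softmax(double[] z) {                  // Eq. 4
        double max = Arrays.stream(z).max().orElse(0);     // subtract max for numerical stability
        double sum = 0;
        double[] p = new double[z.length];
        for (int j = 0; j < z.length; j++) { p[j] = Math.exp(z[j] - max); sum += p[j]; }
        for (int j = 0; j < z.length; j++) p[j] /= sum;
        return p;
    }

    public static void main(String[] args) {
        double[] x = {70.0, 28.0, 27.0, 0.5};              // e.g. age, MMSE, MoCA, CDR (illustrative)
        double[][] w1 = {{0.01, -0.02}, {0.03, 0.01}, {0.02, 0.02}, {-0.5, 0.4}};
        double[] b1 = {0.1, -0.1};
        double[][] w2 = {{0.3, -0.2, 0.1}, {-0.1, 0.4, 0.2}};
        double[] b2 = {0.0, 0.0, 0.0};
        double[] hidden = dense(x, w1, b1, true);
        double[] probs = softmax(dense(hidden, w2, b2, false));
        System.out.println("Class probabilities (Healthy, MCI, SCI): " + Arrays.toString(probs));
    }
}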

3.5 Performance Test for Artificial Neural Networks (ANN)

After the construction of the base Artificial Neural Network technique, a performance test for assisting MCI diagnosis was carried out as described in Sect. 3.4.

3.6 Hybridization of Machine Learning Techniques

Bagging, in its original form, suggests that the base classifiers are constructed from the same machine learning technique. However, when using heterogeneous classifiers such as decision trees and artificial neural networks, it is necessary to adapt the ensemble creation algorithm to allow this composition [2].


Fig. 2. Structure of the neural network used.

The present study used the algorithm proposed by Hsu et al. [8] for the creation of the hybrid ensemble. It has ten base classifiers, five decision trees and five Artificial Neural Networks (ANN), trained on random subsets of the original data set. Regarding the prediction mechanism of the ensemble, Algorithm 1, based on the Bagging method, is proposed; it handles the estimations from both a classification and a regression perspective. This alleviates the problems present in the original method in the event of a tie during the voting process for the final class label. Consequently, each base classifier is required to provide its individual prediction as a pair formed by the predicted class label and the probability of belonging to each of the available classes. For example, with Healthy, MCI and SCI as the possible prediction classes, the response of a classifier could be (Healthy, [0.7, 0.2, 0.1]) or (MCI, [0.1, 0.9, 0.0]). In case the average probabilities of belonging to two or more classes are equal, the tie-breaking mechanism of the traditional Bagging method is adopted, which selects the lowest class label [2].

3.7 Validation of the Intelligent Hybrid Approach

In this stage, a performance test for assisting MCI diagnosis is carried out as described in Sect. 3.4. Additionally, the performance metrics of the intelligent hybrid approach are estimated with respect to a new database.


Input: x, the demographic and neuropsychological variables of a person; Z, the set of possible class labels; E, the set of trained base classifiers; N, the number of base classifiers.
Output: E*, the final class label, which indicates the mental state of the person.

labels ← array of size N
probability ← array of size N × Size(Z)
for i ← 1 to N do
    (labels_i, probability_i) ← predict(E_i, x)
end for
E* ← argmax_{z ∈ Z} |{i : labels_i = z}|          // most voted label(s)
if Size(E*) > 1 then                              // there is a tie in the vote
    means ← Mean(probability, axis = 0)           // average probability of each class over the classifiers
    E* ← argmax_{z ∈ Z} means_z                   // label with the highest average probability
end if
return E*

Algorithm 1. Prediction mechanism of the ensemble based on the Bagging method
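A hedged Java sketch of the prediction mechanism of Algorithm 1 is shown below; the BaseClassifier interface, the toy classifiers and the three-class setting are assumptions made for the illustration and do not reproduce the authors' ensemble code.

import java.util.ArrayList;
import java.util.List;

// Sketch of Algorithm 1: majority vote over base classifiers, with ties broken
// by the highest average class probability. BaseClassifier is an assumed interface.
public class EnsemblePredictionSketch {

    interface BaseClassifier {
        // Returns the probability of each class in Z for input x.
        double[] predictProbabilities(double[] x);
    }

    static int predict(List<BaseClassifier> ensemble, double[] x, int numClasses) {
        int[] votes = new int[numClasses];
        double[] avgProb = new double[numClasses];
        for (BaseClassifier c : ensemble) {
            double[] p = c.predictProbabilities(x);
            int label = argmax(p);
            votes[label]++;
            for (int z = 0; z < numClasses; z++) avgProb[z] += p[z] / ensemble.size();
        }
        int best = argmax(toDouble(votes));
        // Count how many labels share the maximum number of votes.
        int winners = 0;
        for (int v : votes) if (v == votes[best]) winners++;
        // Tie in the vote: pick the label with the highest average probability.
        return winners > 1 ? argmax(avgProb) : best;
    }

    static int argmax(double[] a) {
        int idx = 0;
        for (int i = 1; i < a.length; i++) if (a[i] > a[idx]) idx = i;
        return idx;
    }

    static double[] toDouble(int[] a) {
        double[] d = new double[a.length];
        for (int i = 0; i < a.length; i++) d[i] = a[i];
        return d;
    }

    public static void main(String[] args) {
        // Two toy classifiers that disagree, forcing the probability-based tie-break.
        List<BaseClassifier> ensemble = new ArrayList<>();
        ensemble.add(x -> new double[]{0.7, 0.2, 0.1});   // votes Healthy
        ensemble.add(x -> new double[]{0.1, 0.9, 0.0});   // votes MCI
        int label = predict(ensemble, new double[]{70, 28, 27, 0.5}, 3);
        System.out.println("Predicted class index: " + label); // 1 (MCI) wins the tie on average probability
    }
}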

This database comes from the continuation of the project Implementation of the program for the diagnosis and control of chronic non-communicable diseases and cervical and breast cancer, with the support of ICT in the Department of Caldas, and has information on the cognitive tests of 271 patients from Caldas, Colombia, who are classified into three different groups: 180 healthy patients, 76 patients with MCI and 15 patients with Severe Cognitive Impairment (SCI).

4 Results

4.1 Performance for Decision Trees

The performance metrics for decision trees resulting from the test described in Sect. 3.4 are shown below (Table 2).

Table 2. Performance metrics for decision trees on the ADNI and Caldas databases.

Metric      | ADNI database   | Caldas database
Accuracy    | 0.8818 ± 0.0018 | 0.9526 ± 0.0127
Precision   | 0.8818 ± 0.0020 | 0.9526 ± 0.0128
Sensitivity | 0.8818 ± 0.0019 | 0.9526 ± 0.0127
Specificity | 0.9479 ± 0.0010 | 0.9509 ± 0.0137
Kappa       | 0.8123 ± 0.0030 | 0.9091 ± 0.0237

4.2 Performance for Artificial Neural Networks

The performance metrics for Artificial Neural Networks resulting from the test described in Sect. 3.5 are shown below (Table 3).

Table 3. Performance metrics for artificial neural networks on the ADNI and Caldas databases.

Metric      | ADNI database   | Caldas database
Accuracy    | 0.8958 ± 0.0055 | 0.9664 ± 0.0056
Precision   | 0.8957 ± 0.0058 | 0.9556 ± 0.0367
Sensitivity | 0.8958 ± 0.0055 | 0.9664 ± 0.0056
Specificity | 0.9618 ± 0.0016 | 0.9692 ± 0.0035
Kappa       | 0.8341 ± 0.0085 | 0.9355 ± 0.0111

4.3 Performance for Hybrid Approach with Initial Databases

The performance metrics for the intelligent hybrid approach resulting from the first instance of the test described in Sect. 3.7 are shown below (Table 4).

Table 4. Performance metrics for the intelligent hybrid approach on the ADNI and Caldas databases.

Metric      | ADNI database   | Caldas database
Accuracy    | 0.9149 ± 0.0102 | 0.9770 ± 0.0046
Precision   | 0.9155 ± 0.0096 | 0.9771 ± 0.0046
Sensitivity | 0.9149 ± 0.0102 | 0.9770 ± 0.0046
Specificity | 0.9683 ± 0.0040 | 0.9775 ± 0.0045
Kappa       | 0.8642 ± 0.0003 | 0.9554 ± 0.0087

4.4 Performance for Hybrid Approach of New Databases

The performance metrics for the intelligent hybrid approach resulting from the second instance of the test described in Sect. 3.7 are shown in Table 5.

4.5 Intelligent Hybrid Approach Compared to Decision Trees

The comparison between the performance metrics of the intelligent hybrid approach and those obtained with decision trees is shown below (Tables 5 and 6).


Table 5. Performance metrics for the intelligent hybrid approach on the new Caldas database.

Metric      | New Caldas database
Accuracy    | 0.9439 ± 0.0144
Precision   | 0.9507 ± 0.0108
Sensitivity | 0.9439 ± 0.0144
Specificity | 0.9716 ± 0.0048
Kappa       | 0.8865 ± 0.0280

Table 6. Performance metrics of the intelligent hybrid approach compared with those obtained with decision trees.

Metric      | ADNI: Tree      | ADNI: Hybrid    | Caldas: Tree    | Caldas: Hybrid
Accuracy    | 0.8818 ± 0.0018 | 0.9149 ± 0.0102 | 0.9526 ± 0.0127 | 0.9770 ± 0.0046
Precision   | 0.8818 ± 0.0020 | 0.9155 ± 0.0096 | 0.9526 ± 0.0128 | 0.9771 ± 0.0046
Sensitivity | 0.8818 ± 0.0019 | 0.9149 ± 0.0102 | 0.9526 ± 0.0127 | 0.9770 ± 0.0046
Specificity | 0.9479 ± 0.0010 | 0.9683 ± 0.0040 | 0.9509 ± 0.0137 | 0.9775 ± 0.0045
Kappa       | 0.8123 ± 0.0030 | 0.8642 ± 0.0003 | 0.9091 ± 0.0237 | 0.9554 ± 0.0087

4.6 Intelligent Hybrid Approach Compared to Artificial Neural Networks

The comparison between the performance metrics of the intelligent hybrid approach and those obtained with artificial neural networks is shown below (Table 7).

Table 7. Performance metrics of the intelligent hybrid approach versus those obtained with artificial neural networks.

Metric      | ADNI: ANN       | ADNI: Hybrid    | Caldas: ANN     | Caldas: Hybrid
Accuracy    | 0.8958 ± 0.0055 | 0.9149 ± 0.0102 | 0.9664 ± 0.0056 | 0.9770 ± 0.0046
Precision   | 0.8957 ± 0.0058 | 0.9155 ± 0.0096 | 0.9556 ± 0.0367 | 0.9771 ± 0.0046
Sensitivity | 0.8958 ± 0.0055 | 0.9149 ± 0.0102 | 0.9664 ± 0.0056 | 0.9770 ± 0.0046
Specificity | 0.9618 ± 0.0016 | 0.9683 ± 0.0040 | 0.9692 ± 0.0035 | 0.9775 ± 0.0045
Kappa       | 0.8341 ± 0.0085 | 0.8642 ± 0.0003 | 0.9355 ± 0.0111 | 0.9554 ± 0.0087

5 Conclusions

The purpose of this study was to research the joint application of two dissimilar machine learning techniques to assist in the diagnosis of Mild Cognitive Impairment (MCI) in people over 60. This was done through an intelligent hybrid approach that is capable of analyzing the results of multiple cognitive assessment tests to support the MCI diagnosis of the target population. The proposed hybrid approach bases its operation on the Bagging method, voting over the individual predictions of multiple symbolic and sub-symbolic classifiers (decision trees and artificial neural networks, respectively) and averaging class probabilities in case of a tie, to provide health professionals with guidance regarding the patients' mental condition. The creation and prediction algorithms of the ensemble were modified to approach computer-assisted diagnosis from both the classification and the regression perspectives. Finally, some aspects related to the current study are highlighted:
– The current intelligent hybrid approach has a high performance in assisting the diagnosis of MCI, surpassing that exhibited by the individual techniques (decision trees and artificial neural networks). The results obtained by this approach also show an improvement compared with other similar methods documented in the international literature.
– The current intelligent hybrid approach can easily be used in hospitals and health centers of low complexity level because it does not require specialized clinical hardware. The input parameters are the results of four cognitive tests whose application demands little money and time. This benefits a large number of people, even those with low economic resources.
– The proposed computer-aided diagnosis approach could also be tested on any other disease that currently lacks a gold-standard criterion for its diagnosis.

6 Acknowledgments

This study was supported by the research groups of Software Engineering, Automatics and Neuro-Learning of the Universidad Autónoma de Manizales. Additionally, we thank the initiative Implementación del programa para el diagnóstico y control de enfermedades crónicas no transmisibles y Cáncer de Cérvix y Mama, con el apoyo de TIC en el Departamento de Caldas (código BPIN 2013000100132) for providing the human resources, technological infrastructure and information necessary for the development of this study. This initiative proposes the incorporation of information and communication technologies for the diagnosis and monitoring of various diseases in the department of Caldas, among them diabetes, hypertension, cervical and breast cancer and Mild Cognitive Impairment. It is financed by the Gobernación de Caldas and the Sistema General de Regalías de Colombia and executed jointly by the Universidad Autónoma de Manizales and the Universidad de Caldas.


Data collection and sharing for this project was funded by the Alzheimer's Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie, Alzheimer's Association; Alzheimer's Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer's Therapeutic Research Institute at the University of Southern California. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California. The authors would like to express their gratitude to Ines Gabriela Guerrero U., Mónica Naranjo R. and Thomas Lock, who work at the Translation Center of the Universidad Autónoma de Manizales, for translating and reviewing this manuscript.

References

1. Bramer, M. (ed.): Artificial Intelligence an International Perspective. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-03226-4
2. Breiman, L.: Bagging predictors. Mach. Learn. 24(2), 123–140 (1996)
3. Doi, K.: Computer-aided diagnosis in medical imaging: historical review, current status and future potential. Comput. Med. Imaging Graph. 31(4), 198–211 (2007)
4. Flicker, C., Ferris, S.H.: Mild cognitive impairment in the elderly: predictors of dementia. Neurology 41(7), 449–450 (1991)
5. Folstein, M.F., Folstein, S.E., McHugh, P.R.: "Mini-mental state". A practical method for grading the cognitive state of patients for the clinician. J. Psychiatr. Res. 12(3), 189–198 (1975)
6. Gauthier, S., et al.: Mild cognitive impairment. Lancet 367(9518), 1262–1270 (2006)
7. Hassanien, A.-E., Abraham, A. (eds.): Computational Intelligence in Multimedia Processing: Recent Advances. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-76827-2. Chap. 2
8. Hsu, K.W.: Hybrid ensembles of decision trees and artificial neural networks. In: IEEE International Conference on Computational Intelligence and Cybernetics (CyberneticsCom), pp. 25–29 (2012)


9. Hughes, C.P., Berg, L., Danziger, W.L., Coben, L.A., Martin, R.L.: A new clinical scale for the staging of dementia. Br. J. Psychiatry 140, 566–572 (1982)
10. Kingma, D.P., Ba, J.L.: Adam: a method for stochastic optimization (2015)
11. Kingsford, C., Salzberg, S.L.: What are decision trees? Nat. Biotechnol. 26(9), 1011–1013 (2008)
12. Krenker, A., Bester, J., Kos, A.: Introduction to the artificial neural networks. Artificial Neural Networks - Methodological Advances and Biomedical Applications. InTech, The Hague (2011)
13. Langa, K.M., Levine, D.A.: The diagnosis and management of mild cognitive impairment: a clinical review. JAMA 312(23), 2551–2561 (2014)
14. Liu, S., Liu, S., Cai, W., Pujol, S., Kikinis, R., Feng, D.: Early diagnosis of Alzheimer's disease with deep learning. In: 2014 IEEE 11th International Symposium on Biomedical Imaging (ISBI), pp. 1015–1018 (2014)
15. Liu, S., et al.: Multifold Bayesian kernelization in Alzheimer's diagnosis. In: Mori, K., Sakuma, I., Sato, Y., Barillot, C., Navab, N. (eds.) MICCAI 2013. LNCS, vol. 8150, pp. 303–310. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40763-5_38
16. Louppe, G., Wehenkel, L., Sutera, A., Geurts, P.: Understanding variable importances in forests of randomized trees. In: Burges, C.J.C., Bottou, L., Welling, M., Ghahramani, Z., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 26. Neural Information Processing Systems Foundation Inc. (2013)
17. Mariani, E., Monastero, R., Meccoci, P.: Mild cognitive impairment: a systematic review. J. Alzheimer's Dis. 12(1), 23–25 (2007)
18. McDade, E.M., Petersen, R.C.: Mild cognitive impairment: epidemiology, pathology, and clinical assessment (2015)
19. Nasreddine, Z.S., et al.: The Montreal Cognitive Assessment, MoCA: a brief screening tool for mild cognitive impairment. J. Geriatr. Soc. 53(4), 695–699 (2005)
20. University of Illinois at Chicago: When Healthcare and Computer Science Collide (2014)
21. Petersen, R.C., Morris, J.C.: Mild cognitive impairment as a clinical entity and treatment target. Arch. Neurol. 62(7), 1160–1163 (2004)
22. Petersen, R.C., et al.: Apolipoprotein E status as a predictor of the development of Alzheimer's disease in memory-impaired individuals. JAMA 273(16), 1274–1278 (1995)
23. Roy, E.: Cognitive impairment. In: Gellman, M.D., Turner, J.R. (eds.) Encyclopedia of Behavioral Medicine, pp. 449–451. Springer, New York (2013). https://doi.org/10.1007/978-1-4419-1005-9_1118
24. Stoub, T.R., et al.: MRI predictors of risk of incident Alzheimer disease: a longitudinal study. Neurology 64(9), 1520–1524 (2005)
25. Suk, H.-I., Shen, D.: Deep learning-based feature representation for AD/MCI classification. In: Mori, K., Sakuma, I., Sato, Y., Barillot, C., Navab, N. (eds.) MICCAI 2013. LNCS, vol. 8150, pp. 583–590. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40763-5_72
26. Sun, Y., Tang, Y., Ding, S., Cui, Y.: Diagnose the mild cognitive impairment by constructing Bayesian network with missing data. Expert Syst. Appl. 38, 442–449 (2011)
27. Tan, P.-N., Steinbach, M., Kumar, V.: Introduction to Data Mining. Pearson Education Ltd., London (2006). Chap. 4
28. Umer, R.: Machine learning approaches for the computer aided diagnosis and prediction of Alzheimer's disease based on clinical data. Master's thesis, University of Georgia (2011)


29. Voisin, T., Touchon, J., Vellas, B.: Mild cognitive impairment: a nosological entity? Curr. Opin. Neurol. 16, S43–S45 (2003)
30. Williams, J.A., Weakley, A., Cook, D.J., Schmitter-Edgecombe, M.: Machine learning techniques for diagnostic differentiation of mild cognitive impairment and dementia. In: Workshops at the Twenty-Seventh AAAI Conference on Artificial Intelligence, pp. 71–76 (2013)
31. Xekardaki, A., et al.: Arterial spin labeling may contribute to the prediction of cognitive deterioration in healthy elderly individuals. Radiology 274(2), 490–499 (2015)
32. Yesavage, J.A., et al.: Development and validation of a geriatric depression screening scale: a preliminary report. J. Psychiatr. Res. 17(1), 37–49 (1982)
33. Yin, Z., Zhao, Y., Xudong, L., Duan, H.: A hybrid intelligent diagnosis approach for quick screening of Alzheimer's disease based on multiple neuropsychological rating scales. Comput. Math. Methods Med. 2015, 27–40 (2015)

Author Index

Aguiar, Maria 432 Alvarado-Valencia, Jorge Andres 150 Alvarez, Jhon 260 Alvarez-Uribe, K. C. 1 Andres, Cortes 459 Angarita-Garcia, David 55 Aranda, Jesús Alexander 355 Arango-López, Jeferson 55, 226 Arboleda, Hugo 113 Barreto, Luis 487 Basante-Villota, C. K. 28 Bautista-Aguiar, William 252 Becerra, M. A. 1, 128, 139 Bedoya, Oscar 201 Benavides Navarro, Luis Daniel Bolivar, Holman 402 Bucheli, Víctor A. 326, 338 Bucheli, Víctor 355 Bueno, Gloria 83

113

Dominguez-Jimenez, J. A. 444 Duque, Robinson 355 Echeverry-Mancera, Iván 252 Edwin, Gamboa 459 Espitia P., Esperanza 386 Figueroa-Buitrago, Estefanía 301 Flórez, Juan Camilo 498 Florez-Quintero, Diego 252 Gamboa, Edwin 201 Garcia, Karol 402 Garzón, Gustavo 276 Garzón, Wilmer 113, 237 Goez Mora, Jhon Edison 312 Gómez, Julian Ramirez 98 González, Enrique 487 Gonzalez, Fabio A. 338 Grévisse, Christian 177 Guerrero, Fabio G. 301 Guerrero, Milton 213 Gutiérrez Vela, Francisco Luis 226

Caballero, Alejandro 418 Caballero, Liesle 16 Cadavid, Héctor 237 Camargo G., Cristian A. 192 Camilo, Ruiz 459 Campillo, Javier 444 Castano, Felipe 432 Castillo, Andrés M. 291 Castillo, Sandra 402 Castro R., Luis Fernando 386 Castro-Ospina, A. E. 128, 139 Ceron Valdivieso, Carlos C. 226 Collazos, Cesar A. 226 Contreras-Ortiz, Sonia H. 252 Correa D., Paula A. 192 Correa-Zabala, Francisco J. 39 Cueto-Ramirez, Felipe 162

Lasso-Arciniegas, L. 139 Lenis L., Andrés M. 192 Libreros, Jose 83 Llano-Ríos, Tomás Felipe 39 Londoño Lopera, Juan Camilo 312 Londoño-Delgado, E. 128 López, Germán 237

de Piñerez Reyes, Raúl Gutierrez 326 Díaz, Cesar 402 Díaz, Daniel 113 Díaz, Juan Francisco 355 Diaz-Pacheco, Lenier Leonis 471

Manrique, Rubén 162, 177 Maria, Trujillo 459 Marin-Castrillón, D. 128 Mariño, Olga 162, 177 Marrugo, Andrés G. 213

Henao, Alvaro Leon 98 Hidalgo Suarez, Carlos G. Hurtado, Julio 260 Jojoa, Mario

338

16

514

Author Index

Martínez Santos, Juan Carlos 70 Martínez, Fabio 276, 371 Martínez, Juan-C. 418 Martínez-Santos, Juan Carlos 471 Mejía P., Juan P. 192 Melo, Willson 113 Mendivelso, Cristian 237 Meneses, Jaime 213 Meza, Jhacson 213 Mogollón, Javier 418 Montilla, Andrés Felipe 386 Moreno, Wilson 276 Moreno-Sandoval, Luis Gabriel 150 Muñoz, Sara 201 Murillo Rendón, Santiago 498 Narvaez-Martinez, Dayana 252 Navarro-Newball, Andrés A. 192 Ocampo-García, Juan D. 39 Ortega-Castillo, C. M. 28 Ospina, Maria 83 Pabón, María C. 418 Patiño-Vanegas, Alberto 471 Patiño-Vanegas, John Jairo 471 Pelaez-Becerra, S. M. 128 Peluffo-Ordóñez, D. H. 1, 28, 128, 139 Peña-Unigarro, D. F. 28 Percybrooks, Winston 16 Pérez, Alexander 237 Pimienta, Camilo 113 Pomares-Quimbaya, Alexandra 150 Puertas, Edwin 150

Quintero, Juan Sebastián Mantilla

70

Ramírez, Carlos 237 Restrepo de Mejía, Francia 498 Restrepo-Calle, Felipe 338 Revelo-Fuelagán, E. J. 139 Revelo-Fuelagán, J. E. 28 Rico Mesa, Edgar Mario 312 Rincón, Luisa 418 Rios, Sonia 402 Rodríguez, Jefferson 371 Romero, Lenny A. 213 Rothkugel, Steffen 177 Salazar-Cardona, Johnny 55 Salazar-Castro, J. A. 28, 139 Sanabria, Mateo 113 Sánchez, Andrés 487 Segura Giraldo, Belarmino 498 Serna-Guarín, L. 128 Sierra, Enrique 213 Torres, Jefferson Peña 326 Trefftz, Christian 39 Trujillo, María 83, 201, 432 Uribe, Y. F. 1 Vargas Montoya, Héctor Fernando Velasco Castillo, Javier H. 291 Viveros-Melo, A. 139 Yepes-Ríos, Johan Sebastián 39

98
