Intelligent Computing PDF

This book, gathering the Proceedings of the 2018 Computing Conference, offers a remarkable collection of chapters covering a wide range of topics in intelligent systems, computing and their real-world applications. The Conference attracted a total of 568 submissions from pioneering researchers, scientists, industrial engineers, and students from all around the world. These submissions underwent a double-blind peer review process. Of those 568 submissions, 192 submissions (including 14 poster papers) were selected for inclusion in these proceedings. Despite computer science’s comparatively brief history as a formal academic discipline, it has made a number of fundamental contributions to science and society—in fact, along with electronics, it is a founding science of the current epoch of human history (‘the Information Age’) and a main driver of the Information Revolution. The goal of this conference is to provide a platform for researchers to present fundamental contributions, and to be a premier venue for academic and industry practitioners to share new ideas and development experiences. This book collects state of the art chapters on all aspects of Computer Science, from classical to intelligent. It covers both the theory and applications of the latest computer technologies and methodologies. Providing the state of the art in intelligent methods and techniques for solving real-world problems, along with a vision of future research, the book will be interesting and valuable for a broad readership.


Advances in Intelligent Systems and Computing 858

Kohei Arai Supriya Kapoor Rahul Bhatia Editors

Intelligent Computing Proceedings of the 2018 Computing Conference, Volume 1

Advances in Intelligent Systems and Computing Volume 858

Series editor: Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland

The series “Advances in Intelligent Systems and Computing” contains publications on theory, applications, and design methods of Intelligent Systems and Intelligent Computing. Virtually all disciplines such as engineering, natural sciences, computer and information science, ICT, economics, business, e-commerce, environment, healthcare, life science are covered. The list of topics spans all the areas of modern intelligent systems and computing such as: computational intelligence, soft computing including neural networks, fuzzy systems, evolutionary computing and the fusion of these paradigms, social intelligence, ambient intelligence, computational neuroscience, artificial life, virtual worlds and society, cognitive science and systems, Perception and Vision, DNA and immune based systems, self-organizing and adaptive systems, e-Learning and teaching, human-centered and human-centric computing, recommender systems, intelligent control, robotics and mechatronics including human-machine teaming, knowledge-based paradigms, learning paradigms, machine ethics, intelligent data analysis, knowledge management, intelligent agents, intelligent decision making and support, intelligent network security, trust management, interactive entertainment, Web intelligence and multimedia.

The publications within “Advances in Intelligent Systems and Computing” are primarily proceedings of important conferences, symposia and congresses. They cover significant recent developments in the field, both of a foundational and applicable character. An important characteristic feature of the series is the short publication time and world-wide distribution. This permits a rapid and broad dissemination of research results.

Advisory Board

Chairman
Nikhil R. Pal, Indian Statistical Institute, Kolkata, India

Members
Rafael Bello Perez, Universidad Central “Marta Abreu” de Las Villas, Santa Clara, Cuba
Emilio S. Corchado, University of Salamanca, Salamanca, Spain
Hani Hagras, University of Essex, Colchester, UK
László T. Kóczy, Széchenyi István University, Győr, Hungary
Vladik Kreinovich, University of Texas at El Paso, El Paso, USA
Chin-Teng Lin, National Chiao Tung University, Hsinchu, Taiwan
Jie Lu, University of Technology, Sydney, Australia
Patricia Melin, Tijuana Institute of Technology, Tijuana, Mexico
Nadia Nedjah, State University of Rio de Janeiro, Rio de Janeiro, Brazil
Ngoc Thanh Nguyen, Wroclaw University of Technology, Wroclaw, Poland
Jun Wang, The Chinese University of Hong Kong, Shatin, Hong Kong

More information about this series at http://www.springer.com/series/11156


Editors Kohei Arai Faculty of Science and Engineering, Department of Information Science Saga University Honjo, Saga, Japan

Rahul Bhatia The Science and Information (SAI) Organization Bradford, West Yorkshire, UK

Supriya Kapoor The Science and Information (SAI) Organization Bradford, West Yorkshire, UK

ISSN 2194-5357 ISSN 2194-5365 (electronic)
Advances in Intelligent Systems and Computing
ISBN 978-3-030-01173-4 ISBN 978-3-030-01174-1 (eBook)
https://doi.org/10.1007/978-3-030-01174-1
Library of Congress Control Number: 2018956173
© Springer Nature Switzerland AG 2019

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Editor’s Preface

On behalf of the Organizing and Program Committee of the Computing Conference 2018, we would like to welcome you to the Computing Conference 2018, which was held from July 10 to July 12, 2018, in London, UK. The Conference is organized by SAI Conferences, a group of annual conferences produced by The Science and Information (SAI) Organization, based in the UK.

Despite the short history of computer science as a formal academic discipline, it has made a number of fundamental contributions to science and society; in fact, along with electronics, it is a founding science of the current epoch of human history, called the Information Age, and a driver of the Information Revolution. The goal of this Computing Conference is to give a platform to researchers with such fundamental contributions and to be a premier venue for industry practitioners to share new ideas and development experiences. It is one of the best-respected conferences in the area of computer science.

Computing Conference 2018 began with an opening ceremony, and the Conference program featured welcome speeches followed by two keynote speeches, project demonstrations, and poster presentations. After lunch, the authors presented their papers in six sessions, which included a 15-min networking break. The day ended with an Evening London Tour. The Day-2 program started with two keynote talks followed by nine sessions of paper presentations. The Day-3 program also started with two keynote talks followed by nine sessions of paper presentations, and the Conference ended with a closing ceremony.

The Conference attracted a total of 568 submissions from pioneering academic researchers, scientists, industrial engineers, and students from all around the world. These submissions underwent a double-blind peer review process. Of those 568 submissions, 192 submissions (including 14 poster papers) have been selected to be included in these proceedings. The proceedings cover several hot topics, which include artificial intelligence, data science, intelligent systems, machine learning, communication systems, security, software engineering, e-learning, Internet of things, image processing and robotics. The Conference, held over 3 days, hosted paper presentations, poster presentations, and project demonstrations.


I would like to express my deep appreciation to the Keynote Speakers for sharing their knowledge and expertise with us and to all the authors who have spent the time and effort to contribute significantly to this Conference. We extend a sincere “thank you” to the Organizing Committee for their great efforts in ensuring the successful implementation of the Conference and the Technical Committee for their constructive and enlightening reviews on the manuscripts. Without their efforts, the Conference would not have been possible. Our sincere thanks to all the sponsors, press, print, and electronic media for their excellent coverage of this Conference.

Finally, we hope everyone who attended enjoyed the Conference Program and also their stay in London, UK. We firmly look forward to the impact of Computing Conference 2018 in promoting the standardization work of computer science. We are pleased to present the proceedings of this Conference as its published record. Hope to see you in 2019, in our next Computing Conference, with the same amplitude, focus, and determination.

Kohei Arai

Contents

Dynamic Control of Explore/Exploit Trade-Off in Bayesian Optimization
Dipti Jasrasaria and Edward O. Pyzer-Knapp

A Bayesian Approach for Analyzing the Dynamic Relationship Between Quarterly and Monthly Economic Indicators
Koki Kyo

Sentiment Analysis System for Roman Urdu
Khawar Mehmood, Daryl Essam, and Kamran Shafi

User Centric Mobile Based Decision-Making System Using Natural Language Processing (NLP) and Aspect Based Opinion Mining (ABOM) Techniques for Restaurant Selection
Chirath Kumarasiri and Cassim Farook

Optimal Moore Neighborhood Approach of Cellular Automaton Based Pedestrian Movement: A Case Study on the Closed Area
Najihah Ibrahim and Fadratul Hafinaz Hassan

Emerging Structures from Artisanal Transports System: An Agent Based Approach
Lea Wester

Automatic Web-Based Question Answer Generation System for Online Feedable New-Born Chatbot
Sameera A. Abdul-Kader, John Woods, and Thabat Thabet

Application of Principal Component Analysis (PCA) and SVMs for Discharges Radiated Fields Discrimination
Mohamed Gueraichi, Azzedine Nacer, and Hocine Moulai

Chatbot: Efficient and Utility-Based Platform
Sonali Chandel, Yuan Yuying, Gu Yujie, Abdul Razaque, and Geng Yang


Classical, Rule-Based and Fuzzy Methods in Multi-Criteria Decision Analysis (MCDA) for Life Cycle Assessment
Andrzej Macioł and Bogdan Rębiasz

The Research on Mongolian and Chinese Machine Translation Based on CNN Numerals Analysis
Wu Nier, Su Yila, and Wanwan Liu

Feature Selection for Bloom’s Question Classification in Thai Language
Khantharat Anekboon

Improved Training for Self Training by Confidence Assessments
Dor Bank, Daniel Greenfeld, and Gal Hyams

Optimal Design of Fuzzy PID Controller with CS Algorithm for Trajectory Tracking Control
Oğuzhan Karahan and Banu Ataşlar-Ayyıldız

Artificial Neural Network (ANN) Modeling of Reservoir Operation at Kainji Hydropower Dam, Nigeria
B. F. Sule, A. A. Mohammed, and A. W. Salami

Research on Mongolian-Chinese Machine Translation Annotated with Gated Recurrent Unit Part of Speech
Wanwan Liu, Yila Su, and Wu Nier

Detection of Cut Transition of Video in Transform Domain
Jharna Majumdar, M. Aniketh, and B. R. Abhishek

Vocally Specified Text Recognition in Natural Scenes for the Blind and Visually Impaired
Alhanouf Alnasser and Sharifa Al-Ghowinem

Text Categorization for Authorship Attribution in English Poetry
Catherine Gallagher and Yanjun Li

A Fuzzy Programming Approach to Solve Stochastic Multi-objective Quadratic Programming Problems
Hamiden A. Khalifa, Elshimaa A. Elgendi, and Abdul Hadi N. Ebraheim

A Two Level Hybrid Bees Algorithm for Operating Room Scheduling Problem
Lamya Ibrahim Almaneea and Manar Ibrahim Hosny

A Computational Investigation of the Role of Ion Gradients in Signal Generation in Neurons
Seyed Ali Sadegh Zadeh and Chandra Kambhampati


Simplification Method Using K-NN Estimation and Fuzzy C-Means Clustering Algorithm
Abdelaaziz Mahdaoui, A. Bouazi, A. Hsaini Marhraoui, and E. H. Sbai

Applying Data Mining Techniques to Ground Level Ozone (O3) Data in UAE: A Case Study
Faten F. Kharbat, Tarik A. Elamsy, and Rahaf K. Awadallah

Online Creativity Modeling and Analysis Based on Big Data of Social Networks
Anton Ivaschenko, Anastasia Khorina, and Pavel Sitnikov

Single Document Extractive Text Summarization Using Neural Networks and Genetic Algorithm
Niladri Chatterjee, Gautam Jain, and Gurkirat Singh Bajwa

A Framework for Feature Extraction and Ranking for Opinion Making from Online Reviews
Madeha Arif and Usman Qamar

Generating Social Relationships from Relational Databases for Graph Database Creation and Social Business Intelligence Management
Frank S. C. Tseng and Annie Y. H. Chou

Improved Classification Method for Detecting Potential Interactions Between Genes
Li-Yeh Chuang, Yu-Da Lin, and Cheng-Hong Yang

Decision Tree-Based Anonymized Electronic Health Record Fusion for Public Health Informatics
Fatima Khalique, Shoab Ahmed Khan, Qurat-ul-ain Mubarak, and Hasan Safdar

The Use of Computational Creativity Metrics to Evaluate Alternative Values for Clustering Algorithm Parameters
Andrés Gómez de Silva Garza

Overlapped Hashing: A Novel Scalable Blocking Technique for Entity Resolution in Big-Data Era
Rana Khalil, Ahmed Shawish, and Doaa Elzanfaly

A New Clustering Algorithm Based on Graph Connectivity
Yu-Feng Li, Liang-Hung Lu, and Ying-Chao Hung

An Interacting Decision Support System to Determine a Group-Member’s Role Using Automatic Behaviour Analysis
Basmah AlKadhi and Sharifa Alghowinem


Data-Driven Pattern Identification and Outlier Detection in Time Series
Abdolrahman Khoshrou and Eric J. Pauwels

Image Based Diameter Measurement and Aneurysm Detection of the Ascending Aorta
Şerife Kaba, Boran Şekeroğlu, Hüseyin Haci, and Enver Kneebone

Object Recognition Using SVM Based Bag of Combined Features
Fozia Mehboob, Muhammad Abbas, and Abdul Rauf

Optical Flow for Detection of Transitions in Video, Face and Facial Expression
Jharna Majumdar, M. Aniketh, and N. R. Giridhar

A Green Printing Method Based on Human Perceptual and Color Difference Model
Pei-Chen Wu and Chang-Hong Lin

Focused Visualization in Surgery Training and Navigation
Anton Ivaschenko, Alexandr Kolsanov, and Aikush Nazaryan

Using DSP-ASIP for Image Processing Applications
Sameed Sohail, Ali Saeed, and Haroon ur Rashid

Efficient Image Steganography Using Adaptive Cryptographic Algorithms
Mahadi Hasan, Mehnaz Tabassum, and Md. Jakir Hossain

Texture Classification Framework Using Gabor Filters and Local Binary Patterns
Farhan Riaz, Ali Hassan, and Saad Rehman

Biometric Image Enhancement, Feature Extraction and Recognition Comprising FFT and Gabor Filtering
Al Bashir, Mehnaz Tabassum, and Niamatullah Naeem

Gramatical Facial Expression Recognition with Artificial Intelligence Tools
Elena Acevedo, Antonio Acevedo, and Federico Felipe

Mathematical Modeling of Real Time ECG Waveform
Shazia Javed and Noor Atinah Ahmad

EyeHope
Zulfiqar A. Memon, Hammad Mubarak, Aamir Khimani, Mahzain Malik, and Saman Karim


Chromaticity Improvement Using the MSR Model in Presence of Shadows
Mario Dehesa Gonzalez, Alberto J. Rosales Silva, and Francisco J. Gallegos Funes

Digital Image Watermarking and Performance Analysis of Histogram Modification Based Methods
Tanya Koohpayeh Araghi

A Cognitive Framework for Object Recognition with Application to Autonomous Vehicles
Jamie Roche, Varuna De Silva, and Ahmet Kondoz

Adaptive Piecewise and Symbolic Aggregate Approximation as an Improved Representation Method for Heat Waves Detection
Aida A. Ferreira, Iona M. B. Rameh Barbosa, Ronaldo R. B. Aquino, Herrera Manuel, Sukumar Natarajan, Daniel Fosas, and David Coley

Selection of Architectural Concept and Development Technologies for the Implementation of a Web-Based Platform for Psychology Research
Evgeny Nikulchev, Pavel Kolyasnikov, Dmitry Ilin, Sergey Kasatonov, Dmitry Biryukov, and Ilya Zakharov

Modeling Race-Tracking Variability of Resin Rich Zones on 90º Composite 2.2 Twill Fibre Curve Plate
Spiridon Koutsonas

MCF: Multi Colour Flicker iOS Application for Brain-Computer Interface Research
Artur Szalowski, Thomas Pege, and Dorel Picovici

Performance of Map-Reduce Using Java-8 Parallel Streams
Bruce P. Lester

Coopetition: The New Age Panacea for Enabling Service Provider Sustainability and Profitability
Mohibi Hussain and Jon Crowcroft

On Requirements for Event Processing Network Models Using Business Event Modeling Notation
Arne Koschel, Irina Astrova, Sebastian Kobert, Jan Naumann, Tobias Ruhe, and Oleg Starodubtsev

qBitcoin: A Peer-to-Peer Quantum Cash System
Kazuki Ikeda

A Scalable, Low-Cost, and Interactive Shape-Changing Display
Amith Vijaykumar, Keith E. Green, and Ian D. Walker


Multimodal Attention for Visual Question Answering
Lorena Kodra and Elinda Kajo Meçe

Random Generation of Directed Acyclic Graphs for Planning and Allocation Tasks in Heterogeneous Distributed Computing Systems
Apolinar Velarde Martinez

Revised Theoretical Approach of Activity Theory for Human Computer Interaction Design
Ahamed M. Mithun, Z. Abu Bakar, and W. M. Shaher Yafooz

Analyzing the Customer Attitude Towards an Intention to Receive SMS Marketing via Missed Call Subscription
Taha Zafar, Anita Laila, and Yumnah Hasan

Zipf’s Law and the Frequency of Characters or Words of Oracles
Yang Bai and Xiuli Wang

Resource Planning at the Airport
Ma Nang Laik and Murphy Choy

Blockchain Time and Heisenberg Uncertainty Principle
Ricardo Pérez-Marco

Using a Hierarchical Temporal Memory Cortical Algorithm to Detect Seismic Signals in Noise
Ruggero Micheletto, Kahoko Takahashi, and Ahyi Kim

Fighting Apparent Losses in Metering Systems Through Combination of Meter Abstraction and Digital Object Architecture
Patrick Gacirane and Desire Ngabo

Real-Time Earthquake Localisation and the Elliptic Correction
George R. Daglish and Iurii P. Sizov

A Collusion Set Detection in Value Added Tax Using Benford’s Analysis
Priya, Jithin Mathews, K. Sandeep Kumar, Ch. Sobhan Babu, and S. V. Kasi Visweswara Rao

Personalized and Intelligent Sleep and Mood Estimation Modules with Web based User Interface for Improving Quality of Life
Krasimir Tonchev, Georgi Balabanov, Agata Manolova, and Vladimir Poulkov

Firefly Combinatorial Testing Strategy
AbdulRahman A. Alsewari, Lin Mee Xuan, and Kamal Z. Zamli

Autonomous Flight and Real-Time Tracking of Unmanned Aerial Vehicle
Bogdan Muresan and Shabnam Sadeghi Esfahlani


Qualitative Spatial Reasoning for Orientation Relations in a 3-D Context
Ah-Lian Kor

Puyuma: Linux-Based RTOS Experimental Platform for Constructing Self-driving Miniature Vehicles
Shao-Hua Wang, Sheng-Wen Cheng, and Ching-Chun (Jim) Huang

A New Framework for Personal Name Disambiguation
L. Georgieva and S. Buatongkue

Indoor Air Quality Monitoring (IAQ): A Low-Cost Alternative to CO2 Monitoring in Comparison to an Industry Standard Device
Darshana Thomas, Bhumika Mistry, Steven Snow, and M. C. Schraefel

The Application and Use of Information Technology Governance at the University Level
Alejandra Oñate-Andino, David Mauricio, Gloria Arcos-Medina, and Danilo Pastor

Individual Rationality and Real-World Strategic Interactions: Understanding the Competitive-Cooperative Spectrum
Predrag T. Tošić

Modelling and Simulation of Large and Complex Systems for Airport Baggage Handling Security
Saeid Nahavandi, Bruce Gunn, Michael Johnstone, and Douglas Creighton

OPC UA-Integrated Authorization Concept for the Industrial Internet of Things (IIoT)
Thomas Gamer, Johannes O. Schmitt, Roland Braun, and Alexander M. Schramm

Smart Home
Foziah Gazzawe and Russell Lock

Utilizing Smart GPS System for Monitoring and Tracking Vehicles
Jamal S. Zraqou

Intelligent Health Monitoring Using Smart Meters
Carl Chalmers, William Hurst, Michael Mackay, Paul Fergus, Dhiya Al-Jumeily, and Bryony Kendall

Detecting Situations from Heterogeneous Internet of Things Data in Smart City Context
SK Alamgir Hossain, Md. Anisur Rahman, and M. Anwar Hossain


A Secure Key Management Technique Through Distributed Middleware for the Internet of Things
Tamanna Tabassum, SK Alamgir Hossain, and Md. Anisur Rahman

On Programming Models, Smart Middleware, Cyber-Security and Self-Healing for the Next-Generation Internet-of-Things
Predrag T. Tošić and Frederick T. Sheldon

Revisiting Industry 4.0: A New Definition
Ahmad Ojra

Author Index

Dynamic Control of Explore/Exploit Trade-Off in Bayesian Optimization

Dipti Jasrasaria and Edward O. Pyzer-Knapp

IBM Research, Hartree Centre, Sci-Tech Daresbury, Warrington, UK

Abstract. Bayesian optimization offers the possibility of optimizing black-box operations not accessible through traditional techniques. The success of Bayesian optimization methods, such as Expected Improvement (EI), is significantly affected by the degree of trade-off between exploration and exploitation. Too much exploration can lead to inefficient optimization protocols, whilst too much exploitation leaves the protocol open to strong initial biases and a high chance of getting stuck in a local minimum. Typically, a constant margin is used to control this trade-off, which results in yet another hyper-parameter to be optimized. We propose contextual improvement as a simple yet effective heuristic to counter this, achieving a one-shot optimization strategy. Our proposed heuristic can be swiftly calculated and improves both the speed and robustness of discovery of optimal solutions. We demonstrate its effectiveness on both synthetic and real-world problems and explore the unaccounted-for uncertainty in the pre-determination of search hyperparameters controlling the explore-exploit trade-off.

Keywords: Bayesian optimization · Artificial intelligence · Hyperparameter tuning

1 Introduction

Many important real-world global optimization problems are so-called ‘black-box’ functions; that is to say, it is impossible, either mathematically or practically, to access the object of the optimization analytically. Instead, we are limited to querying the function at some point x and getting a (potentially noisy) answer in return. Some typical examples of black-box situations are the optimization of machine-learning model hyper-parameters [1,2] and the experimental design of new products or processes [3].

One popular framework for optimization of black-box functions is Bayesian optimization [1,4–7]. In this framework, a Bayesian model (typically a Gaussian process [1,8], although other models have been successfully used [9]) based on known responses of the black-box function is used as an ersatz, providing closed-form access to the marginal means and variances. The optimization is then performed upon this ‘response surface’ in place of the true surface. The model’s prior distribution is refined sequentially as new data is gathered by conditioning it upon the acquired data, with the resulting posterior distribution then being sampled to determine the next point(s) to acquire. In this way, all else being equal, the response surface should increasingly resemble the true surface. This is in fact dependent upon some of the choices made in the construction of the Bayesian model, and it is worth noting that a poor initial construction of the prior, through for instance an inappropriate kernel choice, will lead to a poor optimization protocol.

Since Bayesian optimization does not have analytical access to properties traditionally used in optimization, such as the gradients, it relies upon an acquisition function being defined for determining which points to select. This acquisition function takes the model means and variances derived from the posterior distribution and translates them into a measure of the predicted utility of acquiring a point. At each iteration of Bayesian optimization, the acquisition function is maximized, with those data points corresponding to maximal acquisition being selected for sampling.

Bayesian optimization has particular utility when the function to be optimized is expensive, and thus the number of iterations the optimizer can perform is low. It also has utility as a ‘fixed-resource optimizer’ since, unlike traditional optimization methods, it is possible to set a strict bound on resources consumed without destroying convergence criteria. Indeed, in abstract, the Bayesian optimization protocol of observe, hypothesize, validate is much closer in spirit to the scientific method than other optimization procedures.

© Springer Nature Switzerland AG 2019. K. Arai et al. (Eds.): SAI 2018, AISC 858, pp. 1–15, 2019. https://doi.org/10.1007/978-3-030-01174-1_1

1.1 Acquisition Functions

A good choice of acquisition function is critical for the success of Bayesian optimization, although it is often not clear a priori which strategy is best suited for the task. Typical acquisition strategies fall into one of two types: improvement-based strategies and information-based strategies. An improvement-based strategy is analogous to the traditional optimization task in that it seeks to locate the global minimum/maximum as quickly as possible. An information-based strategy is aimed at making the response surface as close to the real function as quickly as possible through the efficient selection of representative data. Information-based strategies are strictly exploratory, and thus we focus our attention on improvement-based strategies for the duration of this paper. In general, we can define the improvement, γ, provided by a given data-point, x, as

$$\gamma(x) = \frac{\mu(x) - f^{*}}{\sigma(x)} \tag{1}$$

for maximization, where f* is the best target value observed so far, μ(x) is the predictive mean supplied by the Bayesian model, and σ²(x) is the corresponding variance.
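As a minimal sketch of the improvement defined in (1), the function below computes γ for a single candidate from its predictive mean and standard deviation. The function name, argument names, and the two example candidates are illustrative assumptions, not taken from the paper.

```python
def improvement(mu: float, sigma: float, f_best: float) -> float:
    """Standardized improvement gamma(x) = (mu(x) - f*) / sigma(x), for maximization.

    mu     : predictive mean of the surrogate model at x
    sigma  : predictive standard deviation at x (must be positive)
    f_best : best target value observed so far (f*)
    """
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    return (mu - f_best) / sigma


# Hypothetical candidates, with best observed value f* = 1.0.
# A modest predicted gain with low uncertainty gives a large gamma ...
print(improvement(mu=1.2, sigma=0.1, f_best=1.0))
# ... while a larger predicted gain with high uncertainty gives a smaller one.
print(improvement(mu=1.5, sigma=1.0, f_best=1.0))
```

Dividing by σ(x) expresses the predicted gain in units of the model's own uncertainty, which is what allows γ to be passed directly to the standard normal CDF and PDF in the acquisition functions that follow.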

Dynamic Control of Explore/Exploit Trade-Oﬀ in Bayesian Optimization


Two typically used acquisition functions are the Probability of Improvement (PI) [10] and the Expected Improvement (EI) [5]. In PI, the probability that sampling a given data-point, x, improves over the current best observation is maximized:

$$PI(x) = \Phi(\gamma(x)) \tag{2}$$

where Φ is the CDF of the standard normal distribution. One problem with the approach taken in PI is that it will, by its nature, prefer a point with a small but certain improvement over one which offers a far greater improvement, but at a slightly higher risk. In order to combat this effect, Mockus proposed the EI acquisition function [5]. A perfect acquisition function would minimize the expected deviation from the true optimum, f(x∗); however, since that is not known (why else would we be performing optimization?), EI proposes maximizing the expected improvement over the current best known point:

$$EI(x) = \left(\mu(x) - f^{*}\right)\Phi(\gamma) + \sigma(x)\,\phi(\gamma) \tag{3}$$

where φ denotes the PDF of the standard normal distribution. By maximizing the expectation in this way, EI is able to weigh the risk-reward balance of acquiring a data point more efficiently, as it considers not just the probability that a data point offers an improvement over the current best, but also how large that improvement will be. Thus a larger, but more uncertain, reward can be preferred to a small but high-probability reward (which would have been selected using PI). EI has been shown to have strong theoretical guarantees [11] and empirical effectiveness [1], and so we use it throughout this study as the baseline.
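The contrast between PI and EI described above can be made concrete with a minimal sketch of (1)-(3) for maximization. The two candidate points and the value of f∗ below are hypothetical numbers chosen only for illustration; they are not taken from the experiments in this paper.

```python
import math

def std_normal_pdf(z):
    """PDF of the standard normal distribution (phi)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(z):
    """CDF of the standard normal distribution (Phi)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def improvement(mu, sigma, f_best):
    """Eq. (1): standardized improvement for maximization."""
    return (mu - f_best) / sigma

def probability_of_improvement(mu, sigma, f_best):
    """Eq. (2): PI(x) = Phi(gamma(x))."""
    return std_normal_cdf(improvement(mu, sigma, f_best))

def expected_improvement(mu, sigma, f_best):
    """Eq. (3): EI(x) = (mu - f*) Phi(gamma) + sigma phi(gamma)."""
    g = improvement(mu, sigma, f_best)
    return (mu - f_best) * std_normal_cdf(g) + sigma * std_normal_pdf(g)

# Hypothetical candidates (illustrative values only), with current best f* = 1.0
f_best = 1.0
safe = {"mu": 1.05, "sigma": 0.05}   # small but near-certain gain
risky = {"mu": 0.90, "sigma": 1.00}  # uncertain, but with a large upside

pi_safe = probability_of_improvement(safe["mu"], safe["sigma"], f_best)
pi_risky = probability_of_improvement(risky["mu"], risky["sigma"], f_best)
ei_safe = expected_improvement(safe["mu"], safe["sigma"], f_best)
ei_risky = expected_improvement(risky["mu"], risky["sigma"], f_best)
```

With these illustrative numbers PI ranks the safe point first, whereas EI ranks the risky point first, which is the behaviour the text attributes to each function.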

2 Contextual Improvement

2.1 Exploration vs. Exploitation Trade-Off

As with any global optimization procedure, in Bayesian optimization there exists a tension between exploration (i.e. the acquisition of new knowledge) and exploitation (i.e. the use of existing knowledge to drive improvement). Too much exploration will lead to an inefficient search, whilst too much exploitation will likely lead to local optimization - potentially missing completely a much higher value part of the information space. EI, in its naive setting, is known to be overly greedy, as it focuses too much effort on the area in which it believes the optimum to be, without efficiently exploring additional areas of the parameter space which may turn out to be more optimal in the long term. The addition of margins to the improvement function in (1) allows for some tuning in this regard [12,13]. A margin ε specifies a minimum amount of improvement over the current best point, and is integrated into (1) as follows:

$$\gamma = \frac{y_{pred} - f^{*} + \epsilon}{\sigma} \tag{4}$$


D. Jasrasaria and E. O. Pyzer-Knapp

for maximization, where ε ≥ 0 represents the degree of exploration. The higher ε, the more exploratory the search. This is because high values of ε require greater inclusion of predicted variance into the acquisition function.

2.2 Definition of Contextual Improvement

The use of modified acquisition functions such as (4) has one significant drawback. Through their use of a constant ε whose value is determined at the start of sampling, they include an additional hyperparameter which itself needs tuning for optimized performance. Indeed, the choice of ε can be the defining feature for the performance of the search. As Jones notes in his 2001 paper [12]:

...the difficulty is that [the optimization method] is extremely sensitive to the choice of the target. If the desired improvement is too small, the search will be highly local and will only move on to search globally after searching nearly exhaustively around the current best point. On the other hand, if ε is set too high, the search will be excessively global, and the algorithm will be slow to fine-tune any promising solutions.

Given that Bayesian optimization targets functions whose evaluations are expensive, this is clearly not desirable. In order to combat this, we propose a modification of the improvement which is implicitly tied to the underlying model, and thus changes dynamically as the optimization progresses. Since the exploration/exploitation trade-off is now dependent upon the model’s state at any point in time, we call this contextual improvement, or χ:

$$\chi = \frac{y_{pred} - f^{*} + cv}{\sigma} \tag{5}$$

for maximization, where cv is the contextual variance, which can be written as:

$$cv = \frac{\overline{\sigma^{2}}}{f^{*}} \tag{6}$$

where $\overline{\sigma^{2}}$ is the mean of the variances contained within the sampled posterior distribution and should be distinguished from σ², which is the individual variance of a prediction for a particular point in the posterior. This is an intuitive setting for improvement, as exploration is preferred when, on average, the model has high uncertainty, and exploitation is preferred when the predicted uncertainty is low. This can provide a regularization for the search, due to the effects an overly local search will have on the posterior variance.

The rationale for this is as follows: since the posterior variance can be written as

$$\sigma^{2}(x_{*}) = K(X_{*}, X_{*}) - K(X_{*}, X)\,K(X, X)^{-1}\,K(X, X_{*}) \tag{7}$$

where X∗ represents a set of as yet unsampled data-points (i.e. part of the posterior rather than the prior), K represents the kernel function, and therefore K(X, X∗) denotes the n × n∗ covariance matrix evaluated at all pairs of training (X) and test (X∗) points, and similarly for K(X, X) and K(X∗, X∗) [8], we can


see that the variance depends only upon the feature space. If a search is overly local (i.e. stuck in a non-global minimum), it will produce a highly anisotropic variance distribution, with small variances close to the local minima sampled and larger variances elsewhere in the information space. This results in a larger value for the standard deviation for the posterior variance, which in turn, through (5), forces greater sampling of the variance (equivalent to an increase in ε). Since the variance is low in the locally sampled area, the acquisition function is depressed here.

It is important to note the difference between this approach and an information-centered approach. Because (5) works directly on the acquisition function, if there are no other areas with a high expectation of improvement (i.e. the local optimum is also predicted to be a strong global optimum beyond the range of variance) then that area will continue to be sampled - this is not the case in an information-centered approach.

When the acquisition function is optimized directly (using a global optimization technique such as DIRECT - DIviding RECTangles [14]), the authors suggest providing a value for the distribution of the posterior variance required for (5), $\overline{\sigma^{2}}$, by sampling over the function bounds with a low-discrepancy sequence such as a Sobol or Halton sequence. Alternatively, if the manifold is not suited to this type of exploration, an MCMC-type sampling method such as slice sampling [15,16] will also produce satisfactory results, albeit at greater computational expense.
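As a rough illustration of (5)-(7), the sketch below estimates the contextual variance for a one-dimensional problem by averaging the posterior variance over a Halton sequence. Everything specific here is an assumption made for the example rather than a detail from the paper: the unit-interval domain, the fixed kernel length-scale, the jitter term, the number of Halton points, the value of f∗, and the two sets of sampled locations.

```python
import math

def halton(index, base=2):
    """index-th element (index >= 1) of the 1-D Halton sequence in [0, 1)."""
    result, f = 0.0, 1.0
    while index > 0:
        f /= base
        result += f * (index % base)
        index //= base
    return result

def rbf(a, b, length_scale=0.2):
    """Squared-exponential kernel; the length-scale is an illustrative choice."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def posterior_variance(x_star, X, jitter=1e-6):
    """Eq. (7) for one test point; note that only inputs X appear, never targets."""
    K = [[rbf(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    k_star = [rbf(x_star, a) for a in X]
    weights = solve(K, k_star)
    var = rbf(x_star, x_star) - sum(k * w for k, w in zip(k_star, weights))
    return max(var, 0.0)  # clip tiny negative values caused by round-off

def contextual_variance(X, f_best, n_points=64):
    """Eq. (6): mean posterior variance over Halton points, divided by f*."""
    points = [halton(i) for i in range(1, n_points + 1)]
    mean_var = sum(posterior_variance(p, X) for p in points) / n_points
    return mean_var / f_best

def contextual_improvement(y_pred, sigma, f_best, cv):
    """Eq. (5) for maximization."""
    return (y_pred - f_best + cv) / sigma

# Hypothetical sampled locations on [0, 1] (illustrative only)
clustered = [0.45, 0.50, 0.55, 0.60]  # an overly local search
spread = [0.10, 0.35, 0.60, 0.85]     # a well-distributed search
cv_local = contextual_variance(clustered, f_best=1.0)
cv_spread = contextual_variance(spread, f_best=1.0)
```

In this toy setting the clustered design leaves most of the domain unexplained, so its contextual variance is larger than that of the spread design, which pushes (5) towards exploration as the text describes.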

3 Experiments

3.1 Definition of Success Metrics

In order to separate the contribution of contextual improvement from other algorithmic contributions, we directly compare EI with traditional improvement, ε-EI with a value of 0.3 (a common value for ε), and EI using contextual improvement, which we will denote as adaptive EI (AEI). Our metrics for success are twofold: firstly, we measure the performance of the search (i.e. which method finds, on average, the best value) - this is referred to in the results tables as Mean - and secondly we measure the robustness of the search (how much variance there is between repeat searches). The robustness is measured as the difference between the 10th and 90th confidence intervals of the final sampling point (i.e. the 50th) as calculated using a bootstrap. Throughout this study, robustness is referred to in the results tables as ΔCI.

3.2 Experimental Details

For all experiments, we utilize a Gaussian process with a squared-exponential kernel function with ARD using the implementation provided in the GPFlow package [17]. We optimized the hyperparameters of the Gaussian process at each sampling point on the log-marginal likelihood with respect to the currently observed data-points. The validity of the kernels was determined by testing for


vanishing length-scales, as this is typically observed when the kernel is misspecified. Each experiment was repeated 10 times, with confidence intervals being estimated using bootstrapping of the mean function.

3.3 Optimization of Synthetic Functions

One of the traditional ways of evaluating the effectiveness of Bayesian optimization strategies is to compare their performance on synthetic functions. This has the advantage that these functions are very fast to evaluate, and their optima and bounds are well known. Unfortunately these functions are not necessarily representative of real-world problems, hence the inclusion of the other two categories. We have chosen to evaluate three well-known benchmarking functions: the Branin-Hoo function (2D, minimization), the 6-humped camelback function (2D, minimization), and the 6-dimensional Hartmann function (6D, maximization).

3.4 Tuning of Machine Learning Algorithms

A popular use of Bayesian optimization is tuning the hyperparameters of other machine-learning algorithms [1,2]. For this reason, the lack of dependence of contextual improvement on pre-set scheduling hyperparameters is particularly important. In order to test the effectiveness of contextual improvement for this task, we use it to determine optimal hyperparameters for a support vector machine for the abalone regression task [18]. In this context we have three hyperparameters to optimize - C (regularization parameter), ε (insensitive loss) for regression, and γ (RBF kernel function). For the actual prediction process, we utilize the support vector regression function in scikit-learn [19]. We also tune five hyperparameters of a 2-layer multi-layer perceptron to tackle the MNIST 10-class image classification problem [20] for handwritten digits. Here we tune the number of neurons in each layer, the level of dropout [21] in each layer, and the learning rate for the stochastic gradient descent, using the MLP implementation provided in the keras package [22], which was used in conjunction with TensorFlow [23].

3.5 Experimental Design

An obvious use for Bayesian optimization is in experimental design, where each evaluation can be expensive both in time and money, and the targets can be noisy. For this experiment, we aim to design 2D aerofoils which optimize a lift to drag ratio as calculated using the JavaFoil program [24]. In order to specify the aerofoil design, we use the NACA 4-digit classiﬁcation scheme, which denotes thickness relative to chord length, camber relative to chord length, the position of the camber along the chord length, as well as the angle of attack, thus resulting in a 4-dimensional optimization problem. It is important to note that due to the empirical treatment of the drag coeﬃcient, unrealistically high values of the


lift to drag ratio can be observed when using JavaFoil as the ground truth. We chose to simply optimize the ground-truth function as calculated, but note the potential to apply a constraint in the optimization to account for this [25].

4 Results and Discussion

4.1 Synthetic Functions

A graphical representation of the search, including the optimization progress and model fragility (the variance between runs), is shown in Fig. 1. A numerical comparison is shown below in Table 1.

Table 1. Summary of the results of experiments on synthetic functions. For Confidence Intervals (ΔCI), smaller values demonstrate reliability over multiple runs.

| Method | Branin (min) Mean | Branin (min) ΔCI | Camelback (min) Mean | Camelback (min) ΔCI | Hartmann (max) Mean | Hartmann (max) ΔCI |
|---|---|---|---|---|---|---|
| AEI | 0.406 | 0.002 | −1.000 | 0.000 | 3.074 | 0.122 |
| EI-0.0 | 0.997 | 1.481 | −0.9259 | 1.000 | 3.081 | 0.439 |
| EI-0.3 | 0.702 | 1.185 | −0.9499 | 0.816 | 3.0754 | 0.652 |
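For reference when reading Table 1, two of the benchmarks have simple closed forms. The sketch below uses the commonly quoted textbook definitions of the Branin-Hoo and six-hump camelback functions; the paper does not state the exact parameterization or search bounds it used, so these forms and the evaluation points are assumptions.

```python
import math

def branin(x1, x2):
    """Branin-Hoo function (2D, minimization), standard parameterization."""
    a = 1.0
    b = 5.1 / (4.0 * math.pi ** 2)
    c = 5.0 / math.pi
    r = 6.0
    s = 10.0
    t = 1.0 / (8.0 * math.pi)
    return a * (x2 - b * x1 ** 2 + c * x1 - r) ** 2 + s * (1.0 - t) * math.cos(x1) + s

def six_hump_camelback(x1, x2):
    """Six-hump camelback function (2D, minimization), standard form."""
    return ((4.0 - 2.1 * x1 ** 2 + x1 ** 4 / 3.0) * x1 ** 2
            + x1 * x2
            + (-4.0 + 4.0 * x2 ** 2) * x2 ** 2)

# Evaluate each function at one of its commonly quoted global minimizers
branin_min = branin(math.pi, 2.275)
camelback_min = six_hump_camelback(0.0898, -0.7126)
```

Under these definitions the global minimum of Branin-Hoo is about 0.398 and that of the camelback function is about −1.03, which gives a sense of scale for the Mean columns above.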

It can be seen that our setting of EI produces superior search capability for the three synthetic functions studied. For all but the 6-dimensional Hartmann function, AEI on average produces the best results, and in all cases it achieves that result with the greatest reliability (smallest value for ΔCI). This is due, in part, to its ability to extract itself from local minima: in the case of the Branin-Hoo function, the higher means for both settings of EI are due to the algorithm getting stuck in a local minimum with a far worse value. Even in the one case in which AEI did not perform the best - the 6-dimensional Hartmann function - it can be seen that the average result discovered is extremely close to the best discovered by EI, and for this case AEI demonstrates superior reliability. It is also interesting to observe that in general the AEI search tracks with, or outperforms, whichever method is performing best in the early sampling. Given that this method does not require tuning of the ε parameter of traditional improvement, this can be seen as a validation of the dynamic approach taken here.

4.2 Tuning of Machine Learning Algorithms

As previously described, we test our contextual improvement on two tasks - the tuning of three hyperparameters of a support vector machine for the abalone regression task, and the tuning of five parameters of a 2-hidden-layer multilayer perceptron for the MNIST classification task. The results can be seen in Table 2. For the SVM regression task, it can be seen that all methods result in


Fig. 1. Summary of the searches performed on synthetic functions. (a), (c) and (e) show the evolution of the best sampled value for Branin, camelback and 6-D Hartmann respectively, whilst (b), (d) and (f) show the evolving fragility of the model based upon the initial seed data. Values are constructed by bootstrapping the mean over each equivalent sampling position as the search progresses. If the performance of the model is strongly varying amongst the 10 trial runs performed then the value is large. Since we are aiming at developing a robust - ideally one-shot - framework, a small value is most desirable here.


the same results, on average, after 50 epochs, with very little difference in the robustness, although AEI does perform slightly worse. This could be indicative of a funnelling shape of the information landscape, in which one basin is both dominant and wide. This can be seen in Fig. 2. This is an ideal case for hyperparameter setting, as the method used does not seem to particularly impact the results although, as can be seen from the other experiments in this study, it is not a typical one. As the study in the next section clearly shows, however, this could also be due to fortunate choices of which values of ε to study, and the authors argue that in tasks such as hyperparameter searches, which can be critical to the success of tasks further down the pipeline, disconnecting the confidence in the quality of the hyperparameters from the setting of a search hyperparameter such as ε should be considered a significant advantage of this method.

Table 2. Summary of the results of experiments on the tuning of machine learning algorithms - a support vector machine, and a 2-layer multi-layer perceptron. For Confidence Intervals (ΔCI), smaller values demonstrate reliability over multiple runs.

| Method | SVM Mean | SVM ΔCI | MLP Mean | MLP ΔCI |
|---|---|---|---|---|
| AEI | 1.940 | 0.006 | 0.253 | 0.086 |
| EI-0.0 | 1.940 | 0.004 | 0.298 | 0.223 |
| EI-0.3 | 1.940 | 0.004 | 0.1938 | 0.008 |

The five-dimensional MLP-classification hyperparameter-setting task was more challenging for AEI, and the best performance was obtained using EI with ε = 0.3. It is worth noting, however, that for this task the performance of ε = 0.0 - significantly worse both in search results and in ΔCI - may suggest that the slightly worse performance of AEI is a price worth paying given the potential ramifications of getting the wrong value for ε. Of course, this is said under the assumption that there is no a priori knowledge about this value; if this is not the case then this should be taken into account when making risk-reward judgments. This is studied and discussed in more detail in the next section. The authors also recognise the possibility of building this knowledge into the contextual improvement framework, and this is an area under ongoing investigation.

4.3 Experimental Design

This problem was selected to represent a real-world design problem. Experimental design is an area in which Bayesian optimization has the potential to provide powerful new capabilities, as traditional design of experiment (DoE) approaches are static and information centric (exploratory), and thus have the potential


Fig. 2. Visualization of the search progress for EI with epsilon set to 0.0 and 0.3, and our Adaptive EI, which is based upon contextual improvement, for setting hyperparameters of support vector machines performing the abalone regression experiment. Each experiment is performed 10 times with 3 different randomly selected data points, with confidence intervals produced by bootstrapping the mean.

to be highly inefficient for design tasks. The performance of our AEI protocol here demonstrates the value of dynamic control of the explore/exploit trade-off. The results are shown in Table 3. Unlike the other problems investigated thus far, the ε = 0.3 setting of EI is highly inefficient, producing the worst lift/drag ratios out of the three protocols, although as a result of its exploratory nature it has better reproducibility (lower ΔCI). As can be seen in Fig. 3, AEI discovers the highest performing aerofoils with more reliability than the next best, the ε = 0.0 setting of EI - demonstrating how the method balances the twin goals of performance and reproducibility.

Table 3. Summary of the results of experiments on the experimental design of 2D aerofoils, a maximization problem. For Confidence Intervals (ΔCI), smaller values demonstrate reliability over multiple runs.

| Method | Aerofoil Mean | Aerofoil ΔCI |
|---|---|---|
| AEI | 255.0327 | 183.7029 |
| EI-0.0 | 234.0355 | 195.8036 |
| EI-0.3 | 187.7445 | 165.3584 |


Fig. 3. Visualization of the search progress for EI with epsilon set to 0.0, and 0.3 and our Adaptive EI, which is based upon contextual improvement. Each experiment is performed 10 times with 3 diﬀerent randomly selected data points, with conﬁdence intervals produced by bootstrapping the mean.
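The ΔCI values reported in the tables and figures are described as the spread between the 10th and 90th percentiles of a bootstrapped mean over repeated runs. A minimal sketch of one way to compute such a quantity is given below; the function name, the number of resamples, the percentile indexing and the example inputs are all assumptions, and the authors' exact bootstrap procedure may differ.

```python
import random

def bootstrap_delta_ci(final_values, n_resamples=2000, seed=0):
    """Width between the 10th and 90th percentiles of the bootstrapped mean.

    final_values holds the best value found at the last sampling point of
    each repeated search (e.g. 10 repeats).
    """
    rng = random.Random(seed)
    n = len(final_values)
    # Resample with replacement and record the mean of each resample
    means = sorted(
        sum(rng.choice(final_values) for _ in range(n)) / n
        for _ in range(n_resamples)
    )
    lower = means[int(0.10 * (n_resamples - 1))]
    upper = means[int(0.90 * (n_resamples - 1))]
    return upper - lower

# Hypothetical final values from 10 repeated searches (not data from the paper)
consistent_runs = [1.0] * 10
scattered_runs = [0.4, 0.4, 1.0, 1.0, 1.0, 0.5, 1.0, 0.9, 1.0, 0.4]

delta_consistent = bootstrap_delta_ci(consistent_runs)
delta_scattered = bootstrap_delta_ci(scattered_runs)
```

A search that ends at the same value on every repeat gives a ΔCI of zero, while repeats that end at scattered values give a positive ΔCI, matching the reading of ΔCI as a robustness measure where smaller is better.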

4.4 Overall Performance: Sensitivity to Hyperparameters

One way to measure the robustness of AEI is to compare the rankings of the search and ΔCI metrics over the whole range of tasks performed in this study. Since raw rankings can be misleading (a close second ranks the same as a search in which the gap between methods was much wider), we utilize a normalized ranking using the following method:

$$Z = \frac{s - s^{*}}{s_{max} - s_{min}} \tag{8}$$

where s represents the result of a particular strategy, s∗ the result of the best strategy, and s_max − s_min the range of results encountered in the study. Calculating the average value for Z across each of the experiments performed in this study illustrates the benefit provided by the dynamic control of the explore-exploit trade-off (essentially, ε). Our contextual-improvement based strategy (AEI) provides superior results for both search results (i.e. the discoverability of desirable solutions) and the ΔCI (i.e. the robustness of the search). Additionally, we can start to estimate the dependency of these metrics upon a good choice of epsilon by comparing the Z scores obtained using ε = 0 and ε = 0.3. Comparing the overall Z score (i.e. the combination of search and ΔCI), we see that the difference between the two settings of epsilon is around 78% of the total value of our dynamic setting (Table 4), representing a significant degradation in performance.
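One plausible reading of the normalized ranking in (8) can be sketched as follows. The use of an absolute difference (so that smaller scores are better for both minimization and maximization) and the use of the range across the three compared strategies within a single task are assumptions; the text does not specify these details, so this sketch is not expected to reproduce Table 4. The input values are the aerofoil means from Table 3.

```python
def normalized_rank(results, maximize=True):
    """Distance of each strategy from the best one, scaled by the observed range."""
    values = list(results.values())
    best = max(values) if maximize else min(values)
    spread = max(values) - min(values)
    if spread == 0:
        # All strategies tie, so none is penalized
        return {name: 0.0 for name in results}
    return {name: abs(value - best) / spread for name, value in results.items()}

# Aerofoil search means from Table 3 (a maximization problem)
aerofoil_mean = {"AEI": 255.0327, "EI-0.0": 234.0355, "EI-0.3": 187.7445}
z_scores = normalized_rank(aerofoil_mean, maximize=True)
```

Under this reading the best strategy on a task scores 0 and the worst scores 1, so a close second receives a small penalty rather than a fixed rank of two.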


Table 4. Summary of the results of experiments performed during this study using the Z criterion in (8). Bold indicates the best performing method.

| Method | Z Search | Z ΔCI | Z Overall |
|---|---|---|---|
| AEI | 0.3910 | 0.3278 | 0.3594 |
| EI-0.0 | 0.7187 | 0.7665 | 0.7426 |
| EI-0.3 | 0.4854 | 0.4369 | 0.4611 |

4.5 The Importance of a One-Shot Technique

It is important to note here that the true apples-to-apples comparison is not really with any one value of ε, be it 0.0 or 0.3 (or even the difference between these two values), but instead with the ΔCI over a wide range of ε, since the correct value cannot be determined a priori. In order to better illustrate this point, we perform two of the tasks described in the paper - the Camelback minimization (a synthetic function) and a ‘real world’ example of tuning the hyperparameters of an SVM for the abalone regression problem - over a range of values for ε from 0.0 to 1.0, with a resolution of 0.01 (i.e. 100 values of epsilon). The additional uncertainty associated with selecting a particular value of ε can clearly be seen in Fig. 4. Whilst we can see from the previous experiments that it is possible to find a value of ε which performs as well as AEI, it is hard to know what the best value should be. Figure 4 shows the potential danger of using a poor value of ε, with Fig. 4(b) showing clearly the potential danger of choosing a bad value for epsilon when samples are low. In the typical Bayesian optimization setting, this is particularly important, as there may be very little sampling because performing a ground truth evaluation can incur a significant cost, either financial or computational, and thus a method which minimizes this risk has significant benefits. Additionally, since many decision making exercises are coming to increasingly rely on deterministic (i.e. not Bayesian), but highly scalable machine learning models, the potential consequences of not locating a good set of hyperparameters can be significant. ‘One shot’ methods such as AEI afford the user a larger degree of confidence that the search has located a good set of parameters without the need to evaluate multiple search settings (such as would be required with ε-EI).

An approximation to the risk-reward trade-off can be performed visually using Fig. 4. Experiments in which the gamble failed to pay dividends (i.e. the performance of using a constant ε is worse than AEI) are represented as the shaded area above the black trace. This can be thought of as the situations in which AEI outperforms a static model. It can be seen that for both tasks evaluated there is a large density of experiments which fall into this ‘loss’ zone, especially when a small number of samples have been drawn. For an idea of the magnitude of the risk, one can compare the areas shaded grey above and below the black trace. Again, the expectation, given a random selection of ε, is significantly in the ‘loss’ zone, with this result being more pronounced at low numbers of samples.
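The visual risk-reward comparison described above can also be approximated numerically by counting, at each sampling step, how many fixed-ε searches lie in the ‘loss’ zone relative to AEI. The sketch below shows the bookkeeping only; the function name and the toy traces are made-up placeholders (not results from this study), and the grid uses 100 values starting at 0.0, which is one reading of the stated resolution.

```python
def loss_fraction(static_traces, adaptive_trace, minimize=True):
    """Per-step fraction of fixed-epsilon searches that are worse than the adaptive one."""
    fractions = []
    for step, adaptive_value in enumerate(adaptive_trace):
        if minimize:
            worse = sum(1 for trace in static_traces.values() if trace[step] > adaptive_value)
        else:
            worse = sum(1 for trace in static_traces.values() if trace[step] < adaptive_value)
        fractions.append(worse / len(static_traces))
    return fractions

# Epsilon grid with a resolution of 0.01 (100 values: 0.00, 0.01, ..., 0.99)
epsilon_grid = [i / 100 for i in range(100)]

# Made-up best-so-far traces for a minimization task, keyed by epsilon
toy_static = {
    0.0: [1.0, 0.8, 0.5],
    0.3: [0.9, 0.6, 0.3],
    0.6: [1.2, 1.1, 0.6],
}
toy_adaptive = [0.95, 0.5, 0.4]

fractions = loss_fraction(toy_static, toy_adaptive, minimize=True)
```

A fraction close to 1 at small sample counts would correspond to the dense shaded region above the black trace in Fig. 4; in practice each trace would come from a full search at one value in the grid.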


Fig. 4. Visualization of the search progress for EI with a set of ε ranging between 0.0 and 1.0 (grey) and our Adaptive EI (black), which is based upon contextual improvement. (a) shows the effect of varying ε for the camelback minimization, whilst (b) shows the effect of varying ε for the SVM hyperparameter search experiment. Each experiment is performed 5 times with 3 different randomly selected data points, with confidence intervals produced by bootstrapping the mean.

5 Conclusion

We present a simple, yet eﬀective adaptation to the traditional formulation of improvement, which we call contextual improvement. This allows a Bayesian optimization protocol to dynamically schedule the trade-oﬀ between explore and exploit, resulting in a more eﬃcient data-collection strategy. This is of critical importance in Bayesian optimization, which is typically used to optimize functions where each evaluation is expensive to acquire. We have demonstrated that EI based upon contextual improvement outperforms EI using traditional improvement, and improvement with a margin in a range of tasks from synthetic functions to real-world tasks, such as experimental design of 2-D NACA aerofoils and the tuning of machine learning algorithms. We also note that our proposed contextual improvement results in settings of expected improvement which are signiﬁcantly more robust to the random seed data, which is a highly desirable property since this allows the use of minimal seed data sets. In traditional Bayesian optimization settings, where each data point is expensive to acquire, this can result in signiﬁcant savings in costs, both in time and ﬁnancial outlay. Acknowledgements. The authors thank Dr Kirk Jordan for helpful discussions.

References

1. Snoek, J., Larochelle, H., Adams, R.P.: Practical Bayesian optimization of machine learning algorithms. arXiv:1206.2944 [cs, stat], June 2012
2. Bergstra, J., Komer, B., Eliasmith, C., Yamins, D., Cox, D.D.: Hyperopt: a Python library for model selection and hyperparameter optimization. Comput. Sci. Disc. 8(1), 014008 (2015)
3. Lisicki, M., Lubitz, W., Taylor, G.W.: Optimal design and operation of Archimedes screw turbines using Bayesian optimization. Appl. Energy 183, 1404–1417 (2016)
4. Brochu, E., Cora, V.M., de Freitas, N.: A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. arXiv:1012.2599 [cs], December 2010
5. Mockus, J.: On Bayesian methods for seeking the extremum. In: Optimization Techniques IFIP Technical Conference Novosibirsk, July 17, 1974, pp. 400–404. Springer, Heidelberg, July 1974
6. Shahriari, B., Swersky, K., Wang, Z., Adams, R.P., de Freitas, N.: Taking the human out of the loop: a review of Bayesian optimization. Proc. IEEE 104, 148–175 (2016)
7. Mockus, J.: The Bayesian approach to global optimization. In: System Modeling and Optimization, Lecture Notes in Control and Information Sciences, pp. 473–481. Springer, Heidelberg (1982)
8. Rasmussen, C., Williams, C.: Gaussian Processes for Machine Learning. MIT Press (2006)
9. Snoek, J., Rippel, O., Swersky, K., Kiros, Satish, N., Sundaram, N., Patwary, M., Ali, M., Adams, R.P., et al.: Scalable Bayesian optimization using deep neural networks. arXiv preprint arXiv:1502.05700 (2015)


10. Kushner, H.J.: A new method of locating the maximum point of an arbitrary multipeak curve in the presence of noise. J. Basic Eng. 86, 97–106 (1964)
11. Vazquez, E., Bect, J.: Convergence properties of the expected improvement algorithm with fixed mean and covariance functions. J. Stat. Plan. Infer. 140, 3088–3095 (2010)
12. Jones, D.R.: A taxonomy of global optimization methods based on response surfaces. J. Global Optim. 21, 345–383 (2001)
13. Lizotte, D.J.: Practical Bayesian Optimization. University of Alberta (2008)
14. Direct Global Optimization Algorithm. Springer
15. Neal, R.M.: Slice sampling. Ann. Stat. 31(3), 705–741 (2003)
16. Murray, I., Adams, R.P.: Slice sampling covariance hyperparameters of latent Gaussian models. In: Lafferty, J.D., Williams, C.K.I., Shawe-Taylor, J., Zemel, R.S., Culotta, A. (eds.) Advances in Neural Information Processing Systems, vol. 23, pp. 1732–1740. Curran Associates, Inc. (2010)
17. de G. Matthews, G., van der Wilk, M., Nickson, T., Fujii, K., Boukouvalas, A., León-Villagrá, P., Ghahramani, Z., Hensman, J.: GPFlow: a Gaussian process library using TensorFlow. arXiv preprint arXiv:1610.08733, October 2016
18. Nash, W.J.: T.M.R. Laboratories, The Population biology of abalone (Haliotis species) in Tasmania. 1, Blacklip abalone (H. rubra) from the north coast and the islands of Bass Strait (1994)
19. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
20. LeCun, Y., Cortes, C.: MNIST handwritten digit database (2010)
21. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014)
22. Chollet, F., et al.: Keras (2015)
23. Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G.S., Davis, A., Dean, J., Devin, M., Ghemawat, S., Goodfellow, I., Harp, A., Irving, G., Isard, M., Jia, Y., Jozefowicz, R., Kaiser, L., Kudlur, M., Levenberg, J., Mané, D., Monga, R., Moore, S., Murray, D., Olah, C., Schuster, M., Shlens, J., Steiner, B., Sutskever, I., Talwar, K., Tucker, P., Vanhoucke, V., Vasudevan, V., Viégas, F., Vinyals, O., Warden, P., Wattenberg, M., Wicke, M., Yu, Y., Zheng, X.: TensorFlow: large-scale machine learning on heterogeneous systems (2015). www.tensorflow.org
24. Hepperle, M.: JavaFoil. http://www.mh-aerotools.de/airfoils/javafoil.htm
25. Gelbart, M.A., Snoek, J., Adams, R.P.: Bayesian optimization with unknown constraints. arXiv:1403.5607 [cs, stat], March 2014

A Bayesian Approach for Analyzing the Dynamic Relationship Between Quarterly and Monthly Economic Indicators

Koki Kyo

Department of Human Sciences, Obihiro University of Agriculture and Veterinary Medicine, Inada-cho, Obihiro, Hokkaido 080-8555, Japan
[email protected]

Abstract. We propose an approach for analyzing the dynamic relationship between a quarterly economic indicator and a monthly economic indicator. In this study, we use Japan’s real gross domestic product (GDP) and whole commercial sales (WCS) as examples of quarterly and monthly indicators, respectively. We first estimate stationary components from the original time series for these indicators, with the goal of analyzing the dynamic dependence of the stationary component of GDP on that of WCS. To do so, we construct a set of Bayesian regression models for the stationary component of GDP based on the stationary component of WCS, introducing a lag parameter and a time-varying coefficient. To demonstrate this analytical approach, we analyze the relationship between GDP and WCS-FAP, the WCS of farm and aquatic products, in Japan for the period from 1982 to 2005.

Keywords: Bayesian modeling · State space model · Dynamic relationship analysis · Gross domestic product · Whole commercial sales · Analysis of Japanese economy

1 Introduction

It has been shown that there is a strong correlation between economic growth and consumption (see [1] for example). Economic analysis can benefit, however, from further analysis of the relationship between economic growth, generally indicated by the measure of real gross domestic product (GDP), and consumption, as measured by whole commercial sales (WCS). The challenge for this analysis is that GDP data are presented as a quarterly time series, while WCS data are published monthly. Thus, in this study we propose an approach for analyzing the relationship between a quarterly economic indicator and a monthly economic indicator, to examine the dynamic dependence of the quarterly GDP on the monthly WCS.

© Springer Nature Switzerland AG 2019. K. Arai et al. (Eds.): SAI 2018, AISC 858, pp. 16–28, 2019. https://doi.org/10.1007/978-3-030-01174-1_2


The present study is also related to the problem of estimating monthly GDP values. Because real GDP is a basic indicator of business conditions, it is very important for analyzing business cycles in a country. However, GDP measures present shortcomings for business cycle analysis. A major problem is the promptness of data, because GDP figures are typically published as a quarterly time series [9]. Therefore, estimation of monthly GDP is considered important for obtaining timely business cycle information [8,10,11]. This paper proposes a Bayesian dynamic modeling approach for estimating monthly GDP. We focus particularly on Japan’s business cycles.

Another challenge for analyzing the relationship between GDP and the WCS concerns the dynamics in the relationship. Regression models are often used for relationship analysis with constant regression coefficients, the implication being that no structural changes occur. However, when the study period spans several decades, it is clearly unrealistic to assume constant coefficient parameters. Thus, these conventional approaches are considered inadequate for the analysis of business cycles involving long-term time series. To address this, [3] developed a Bayesian approach based on vector autoregressive models with time-varying coefficients for analyzing time series that are nonstationary in covariance. Subsequently, [2] introduced a Bayesian time-varying regression model for dynamic relationship analysis. These approaches were applied in [5–7]. In this study, we similarly use such Bayesian dynamic modeling approaches to estimate monthly GDP in Japan.

The necessary first step in estimating monthly GDP is the estimation of the stationary component. We consider that an estimate of the stationary component of GDP is based on the stationary component of WCS. Thus, we first extract the stationary components from the original time series for GDP and WCS using a set of state space models. Then, we present a method to analyze the dynamic relationship between the stationary components of GDP and WCS using Bayesian dynamic modeling. There are two important aspects of the relationship between GDP and the WCS: the lead-lag relationship and the time-varying dependence between these two indicators. We capture these aspects of the relationship by introducing a lag parameter and a time-varying coefficient into a set of Bayesian linear models.

The rest of this paper is organized as follows. In Sect. 2, we introduce a method for estimating the stationary component from the original time series data. In Sect. 3, we show our models and methods of parameter estimation for the proposed approach. An application of the proposed approach is shown in Sect. 4. Finally, we offer conclusions in Sect. 5.

2 Estimating the Stationary Component

As mentioned above, the primary task in analyzing the relationship between GDP and WCS is the estimation of the stationary components in these time series. Thus, we first introduce a method for estimating the stationary components from the original time series.


K. Kyo

For the quarterly time series $y_m$ of GDP, we consider a set of statistical models as follows:

\begin{align}
y_m &= t_m^y + s_m^y + r_m^y + w_m^y, \tag{1}\\
t_m^y &= 2t_{m-1}^y - t_{m-2}^y + v_{m1}^y, \tag{2}\\
s_m^y &= -s_{m-1}^y - s_{m-2}^y - s_{m-3}^y + v_{m2}^y, \tag{3}\\
r_m^y &= \sum_{j=1}^{p} \alpha_j r_{m-j}^y + v_{m3}^y \quad (m = 1, 2, \ldots, M), \tag{4}
\end{align}

where $t_m^y$, $s_m^y$, and $r_m^y$ are, respectively, the trend component, the seasonal component and the stationary component of the time series $y_m$. Also, $p$ represents the order of an AR model for the stationary component and $\alpha_1, \ldots, \alpha_p$ are the AR coefficients. $w_m^y \sim N(0, \sigma^2)$ is the observation noise, while $v_{m1}^y \sim N(0, \tau_1^2)$, $v_{m2}^y \sim N(0, \tau_2^2)$ and $v_{m3}^y \sim N(0, \tau_3^2)$ are system noises for each component model. It is assumed that $w_m^y$, $v_{m1}^y$, $v_{m2}^y$ and $v_{m3}^y$ are independent of one another.

When the model order $p$ and the hyperparameters $\alpha_1, \ldots, \alpha_p$, $\sigma^2$, $\tau_1^2$, $\tau_2^2$ and $\tau_3^2$ are given, we can express the models in (1)–(4) by a state space representation. A likelihood function for the hyperparameters is defined using the Kalman filter algorithm, so we can estimate the model order and the hyperparameters using a maximum likelihood method. Then, we can estimate each component in the time series $y_m$ using the Kalman filter algorithm, so the estimate for the stationary component $r_m^y$ of GDP can be obtained (see [4] for details).

Further, to estimate the stationary component in a monthly time series $x_n$, such as WCS, we use a set of models similar to that in (1)–(4), as follows:
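The state space representation mentioned above is not written out in the text. As a minimal sketch, assuming a state vector that stacks the current and lagged trend, seasonal and AR terms (an illustrative layout, not necessarily the one used in [4]), the transition, noise-input and observation matrices could be assembled as follows. The function name `decomposition_state_space` and its arguments are hypothetical.

```python
import numpy as np

def decomposition_state_space(ar_coefs, period=4):
    """Assemble (F, G, H) for y = trend + seasonal + AR + noise.

    Assumed state layout (illustrative only):
    [t_m, t_{m-1}, s_m, ..., s_{m-period+2}, r_m, ..., r_{m-p+1}].
    """
    p = len(ar_coefs)
    k = period - 1

    # Trend block: t_m = 2 t_{m-1} - t_{m-2} + noise
    F_t = np.array([[2.0, -1.0], [1.0, 0.0]])

    # Seasonal block: s_m = -(s_{m-1} + ... + s_{m-period+1}) + noise
    F_s = np.zeros((k, k))
    F_s[0, :] = -1.0
    F_s[1:, :-1] = np.eye(k - 1)

    # AR block in companion form: r_m = sum_j alpha_j r_{m-j} + noise
    F_r = np.zeros((p, p))
    F_r[0, :] = ar_coefs
    F_r[1:, :-1] = np.eye(p - 1)

    n = 2 + k + p
    F = np.zeros((n, n))
    F[:2, :2] = F_t
    F[2:2 + k, 2:2 + k] = F_s
    F[2 + k:, 2 + k:] = F_r

    # Each system noise enters the first row of its own block.
    G = np.zeros((n, 3))
    G[0, 0] = G[2, 1] = G[2 + k, 2] = 1.0

    # The observation adds the current trend, seasonal and AR terms.
    H = np.zeros((1, n))
    H[0, 0] = H[0, 2] = H[0, 2 + k] = 1.0
    return F, G, H
```

With `period=4` this corresponds to the quarterly seasonal recursion in (3); passing `period=12` gives an eleven-term seasonal sum suitable for a monthly series.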

\begin{align}
x_n &= t_n^x + s_n^x + r_n^x + w_n^x, \tag{5}\\
t_n^x &= 2t_{n-1}^x - t_{n-2}^x + v_{n1}^x, \tag{6}\\
s_n^x &= -s_{n-1}^x - \cdots - s_{n-11}^x + v_{n2}^x, \tag{7}\\
r_n^x &= \sum_{j=1}^{q} \beta_j r_{n-j}^x + v_{n3}^x \quad (n = 1, 2, \ldots, N), \tag{8}
\end{align}

where $q$ represents the order of an AR model for the stationary component and $\beta_1, \ldots, \beta_q$ are the AR coefficients. $w_n^x \sim N(0, \psi^2)$ is the observation noise, and $v_{n1}^x \sim N(0, \eta_1^2)$, $v_{n2}^x \sim N(0, \eta_2^2)$ and $v_{n3}^x \sim N(0, \eta_3^2)$ are system noises. The other quantities correspond to each term in the models in (1)–(4). Thus, the model order $q$ and the hyperparameters $\beta_1, \ldots, \beta_q$, $\psi^2$, $\eta_1^2$, $\eta_2^2$ and $\eta_3^2$ are estimated using the same algorithm. As a result, the estimate of the stationary component $r_n^x$ in the time series $x_n$ can be obtained.

3 Proposed Approach

3.1 Modeling

To analyze the dynamic relationship between the quarterly GDP and the monthly WCS, we propose an approach based on a set of models which are called two-mode regression with time-varying coefficients (TMR-TVC).


We classify GDP growth into two states: an upside mode corresponding to situations in which the stationary component of GDP continues to increase, and a downside mode corresponding to situations in which the stationary component continues to decrease. We expect that the relationship between GDP and WCS might differ according to situation. Thus, we use different models for the two modes. For the upside mode, the TMR-TVC models are given in the form of a regression model with a time-varying coefficient as follows:

\begin{align}
r_m^y &= \sum_{i=1}^{3} a_{3(m-1)+i}\, r_{3(m-1)+i+L_1}^x + \varepsilon_m^{(1)}, \tag{9}\\
a_{3(m-1)+3} &= 2a_{3(m-1)+2} - a_{3(m-1)+1} + e_{3(m-1)+3}^{(1)}, \tag{10}\\
a_{3(m-1)+2} &= 2a_{3(m-1)+1} - a_{3(m-1)} + e_{3(m-1)+2}^{(1)}, \tag{11}\\
a_{3(m-1)+1} &= 2a_{3(m-1)} - a_{3(m-1)-1} + e_{3(m-1)+1}^{(1)} \tag{12}\\
&\qquad (m = 1, 2, \ldots, M), \notag
\end{align}

where $r_m^y$ denotes the estimate of the stationary component in the quarterly time series of GDP, which is obtained from the estimation of models in (1)–(4), and $r_n^x$ denotes the same for the monthly time series of WCS, which is obtained from the estimation of models in (5)–(8). $a_n$ is the time-varying coefficient comprising a monthly time series, and $L_1$ denotes a lag. $\varepsilon_m^{(1)} \sim N(0, \lambda_1^2)$ is the observation noise and $e_n^{(1)} \sim N(0, \phi_1^2)$ is the system noise, with $\lambda_1^2$ and $\phi_1^2$ being hyperparameters. We assume that $\varepsilon_m^{(1)}$ and $e_n^{(1)}$ are independent of each other for any values of $m$ and $n$.

The lag $L_1$ and the time-varying coefficient $a_n$ are two important parameters. The value of $L_1$ describes the lead-lag relationship between GDP and WCS, where $L_1 > 0$ implies that WCS lags GDP, and $L_1 < 0$ implies that WCS precedes GDP. Moreover, from the estimate of $a_n$ we can analyze the dynamic relationship between these indicators. The models in (9)–(12) are essentially Bayesian linear models in which the model in (9) defines the likelihood, and the models in (10)–(12) form a second order smoothness prior for the time-varying coefficient. So we can estimate the time-varying coefficient with optimal smoothness on $a_n$ by controlling the value of $\phi_1^2$.
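To make the index arithmetic in the observation equation (9) concrete, the following sketch picks out the three monthly WCS values that enter quarter m for a given lag and forms the noise-free right-hand side. It is an illustration under stated assumptions: the helper names `lagged_regressors` and `quarterly_fit` are hypothetical, series are stored as 0-based NumPy arrays while m and the month index are 1-based as in the text, and quarters whose lagged months fall outside the sample are simply reported as unavailable.

```python
import numpy as np

def lagged_regressors(r_x, m, lag):
    """Return r^x at months 3(m-1)+i+lag for i = 1, 2, 3, or None if out of range."""
    months = np.array([3 * (m - 1) + i + lag for i in (1, 2, 3)])  # 1-based
    if months.min() < 1 or months.max() > len(r_x):
        return None
    return r_x[months - 1]  # shift to 0-based array positions

def quarterly_fit(a, r_x, m, lag):
    """Noise-free right-hand side of (9) for quarter m."""
    h = lagged_regressors(r_x, m, lag)
    if h is None:
        raise ValueError("lagged months fall outside the monthly sample")
    coef = a[3 * (m - 1): 3 * (m - 1) + 3]  # a_{3(m-1)+1}, ..., a_{3(m-1)+3}
    return float(coef @ h)
```

A negative lag shifts the regressors to earlier months, which matches the reading that WCS precedes GDP when the lag is negative.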


Similarly to the upside mode, the TMR-TVC models for the downside mode are given as:

\begin{align}
r_m^y &= \sum_{i=1}^{3} b_{3(m-1)+i}\, r_{3(m-1)+i+L_2}^x + \varepsilon_m^{(2)}, \tag{13}\\
b_{3(m-1)+3} &= 2b_{3(m-1)+2} - b_{3(m-1)+1} + e_{3(m-1)+3}^{(2)}, \tag{14}\\
b_{3(m-1)+2} &= 2b_{3(m-1)+1} - b_{3(m-1)} + e_{3(m-1)+2}^{(2)}, \tag{15}\\
b_{3(m-1)+1} &= 2b_{3(m-1)} - b_{3(m-1)-1} + e_{3(m-1)+1}^{(2)} \tag{16}\\
&\qquad (m = 1, 2, \ldots, M), \notag
\end{align}

with $L_2$ and $b_n$ being the lag and the time-varying coefficient, respectively. Also, $\varepsilon_m^{(2)} \sim N(0, \lambda_2^2)$ is the observation noise and $e_n^{(2)} \sim N(0, \phi_2^2)$ is the system noise, where $\lambda_2^2$ and $\phi_2^2$ are hyperparameters. As in the models in (9)–(12), we assume that $\varepsilon_m^{(2)}$ and $e_n^{(2)}$ are independent of each other for any values of $m$ and $n$. Below we only show the methods for estimating the hyperparameters in the TMR-TVC models for the upside mode; those for the downside mode are similar.

3.2 Estimating the Time-Varying Coefficient

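Now, we put

\begin{align*}
\mathbf{z}_m &= \begin{bmatrix} a_{3(m-1)+3} \\ a_{3(m-1)+2} \\ a_{3(m-1)+1} \end{bmatrix}, \qquad
\mathbf{H}_m^T = \begin{bmatrix} r_{3(m-1)+3+L_1}^x \\ r_{3(m-1)+2+L_1}^x \\ r_{3(m-1)+1+L_1}^x \end{bmatrix}, \\
\mathbf{G} &= \begin{bmatrix} 1 & -2 & 1 \\ 0 & 1 & -2 \\ 0 & 0 & 1 \end{bmatrix}^{-1}
= \begin{bmatrix} 1 & 2 & 3 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix}, \\
\mathbf{F} &= -\mathbf{G} \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ -2 & 1 & 0 \end{bmatrix}
= \begin{bmatrix} 4 & -3 & 0 \\ 3 & -2 & 0 \\ 2 & -1 & 0 \end{bmatrix}, \\
\mathbf{e}_m &= \begin{bmatrix} e_{3(m-1)+3}^{(1)} \\ e_{3(m-1)+2}^{(1)} \\ e_{3(m-1)+1}^{(1)} \end{bmatrix}, \qquad
\mathbf{Q} = E\{\mathbf{e}_m \mathbf{e}_m^T\} = \phi_1^2 \mathbf{I}_3,
\end{align*}

with $\mathbf{I}_3$ denoting the $3 \times 3$ identity matrix. Then, the models in (9)–(12) can be expressed by the following state space model:

\begin{align}
\mathbf{z}_m &= \mathbf{F}\mathbf{z}_{m-1} + \mathbf{G}\mathbf{e}_m, \tag{17}\\
r_m^y &= \mathbf{H}_m \mathbf{z}_m + \varepsilon_m^{(1)}. \tag{18}
\end{align}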
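The matrices G and F can be checked numerically against the month-by-month recursions (10)–(12). The sketch below recomputes them with NumPy and compares one quarterly block step of (17) with three consecutive monthly steps of the second order smoothness prior. The helper names `monthly_steps` and `block_step` are hypothetical and serve only this check.

```python
import numpy as np

# Coefficient matrices read off from (10)-(12), with the state ordered
# as [a_{3(m-1)+3}, a_{3(m-1)+2}, a_{3(m-1)+1}] (latest month first).
A = np.array([[1.0, -2.0, 1.0],
              [0.0, 1.0, -2.0],
              [0.0, 0.0, 1.0]])
B = np.array([[0.0, 0.0, 0.0],
              [1.0, 0.0, 0.0],
              [-2.0, 1.0, 0.0]])

G = np.linalg.inv(A)
F = -G @ B

def monthly_steps(z_prev, e):
    """Apply a_n = 2 a_{n-1} - a_{n-2} + e_n three times; e = [e3, e2, e1]."""
    a0, a_minus1 = z_prev[0], z_prev[1]
    a1 = 2 * a0 - a_minus1 + e[2]
    a2 = 2 * a1 - a0 + e[1]
    a3 = 2 * a2 - a1 + e[0]
    return np.array([a3, a2, a1])

def block_step(z_prev, e):
    """One quarterly transition of the state equation (17)."""
    return F @ z_prev + G @ e
```

Both routes give the same state, which confirms that the block form is only a regrouping of the monthly prior into quarters.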

In the state space model comprising (17) and (18), the time-varying coefficient $a_n$ is included in the state vector $\mathbf{z}_m$, so the estimate for $a_n$ can be obtained from the estimate of $\mathbf{z}_m$. Moreover, the parameters $\lambda_1^2$ and $\phi_1^2$, which are called hyperparameters, can be estimated using the maximum likelihood method.

Let $\mathbf{z}_0$ denote the initial value of the state and $Y_1^{(k)}$ denote a set of estimates for $r_m^y$ up to time point $k$, where $k$ denotes a quarter. Assume that $\mathbf{z}_0 \sim N(\mathbf{z}_{0|0}, \mathbf{C}_{0|0})$. Because the distribution $f(\mathbf{z}_m | Y_1^{(k)})$ for the state $\mathbf{z}_m$ conditional on $Y_1^{(k)}$ is Gaussian, it is only necessary to obtain the mean $\mathbf{z}_{m|k}$ and the covariance matrix $\mathbf{C}_{m|k}$ of $\mathbf{z}_m$ with respect to $f(\mathbf{z}_m | Y_1^{(k)})$.

Given the values of $L_1$, $\lambda_1^2$ and $\phi_1^2$, the initial distribution $N(\mathbf{z}_{0|0}, \mathbf{C}_{0|0})$, and a set of estimates for $r_m^y$ up to time point $M$, the means and covariance matrices in the predictive distribution and filter distribution for the state $\mathbf{z}_m$ can be obtained using the Kalman filter for $m = 1, 2, \ldots, M$ (see [4] for example):

[Prediction]
\begin{align*}
\mathbf{z}_{m|m-1} &= \mathbf{F}\mathbf{z}_{m-1|m-1}, \\
\mathbf{C}_{m|m-1} &= \mathbf{F}\mathbf{C}_{m-1|m-1}\mathbf{F}^T + \mathbf{G}\mathbf{Q}\mathbf{G}^T.
\end{align*}

[Filter-1]
\begin{align*}
\mathbf{K}_m &= \mathbf{C}_{m|m-1}\mathbf{H}_m^T (\mathbf{H}_m \mathbf{C}_{m|m-1} \mathbf{H}_m^T + \lambda_1^2)^{-1}, \\
\mathbf{z}_{m|m} &= \mathbf{z}_{m|m-1} + \mathbf{K}_m (r_m^y - \mathbf{H}_m \mathbf{z}_{m|m-1}), \\
\mathbf{C}_{m|m} &= (\mathbf{I}_3 - \mathbf{K}_m \mathbf{H}_m)\mathbf{C}_{m|m-1}.
\end{align*}

[Filter-2]
\begin{align*}
\mathbf{z}_{m|m} &= \mathbf{z}_{m|m-1}, \\
\mathbf{C}_{m|m} &= \mathbf{C}_{m|m-1}.
\end{align*}

Note that for each value of $m$, when quarter $m$ falls in an upside period we use the step Filter-1; otherwise Filter-2 is applied. Based on the results of the Kalman filter, we can obtain the estimate for $\mathbf{z}_m$ using fixed-interval smoothing for $m = M - 1, M - 2, \ldots, 1$ as follows:

[Fixed-Interval Smoothing]
\begin{align*}
\mathbf{A}_m &= \mathbf{C}_{m|m}\mathbf{F}^T \mathbf{C}_{m+1|m}^{-1}, \\
\mathbf{z}_{m|M} &= \mathbf{z}_{m|m} + \mathbf{A}_m (\mathbf{z}_{m+1|M} - \mathbf{z}_{m+1|m}), \\
\mathbf{C}_{m|M} &= \mathbf{C}_{m|m} + \mathbf{A}_m (\mathbf{C}_{m+1|M} - \mathbf{C}_{m+1|m})\mathbf{A}_m^T.
\end{align*}

Then, the posterior distribution of $\mathbf{z}_m$ can be given by $\mathbf{z}_{m|M}$ and $\mathbf{C}_{m|M}$. Subsequently, the estimate for the time-varying coefficient $a_n$ can be obtained because the state space model described by (17) and (18) incorporates $a_n$ in the state vector $\mathbf{z}_m$.
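The prediction, filter and smoothing recursions above translate almost line by line into NumPy. The following is a minimal sketch rather than the author's implementation: the function name `filter_and_smooth`, the argument layout (one row of regressors per quarter, a boolean flag marking upside quarters) and the use of a plain matrix inverse in the smoother are all assumptions made for illustration.

```python
import numpy as np

G = np.array([[1.0, 2.0, 3.0], [0.0, 1.0, 2.0], [0.0, 0.0, 1.0]])
F = np.array([[4.0, -3.0, 0.0], [3.0, -2.0, 0.0], [2.0, -1.0, 0.0]])

def filter_and_smooth(r_y, H, upside, lam2, phi2, z0, C0):
    """Kalman filter with a skipped update outside upside quarters, then smoothing.

    r_y: (M,) quarterly targets; H: (M, 3) lagged monthly regressors;
    upside: (M,) booleans. Returns smoothed means (M, 3) and covariances (M, 3, 3).
    """
    M = len(r_y)
    Q = phi2 * np.eye(3)
    zp, Cp = np.zeros((M, 3)), np.zeros((M, 3, 3))  # predictive moments
    zf, Cf = np.zeros((M, 3)), np.zeros((M, 3, 3))  # filtered moments
    z, C = z0, C0
    for m in range(M):
        # [Prediction]
        zp[m] = F @ z
        Cp[m] = F @ C @ F.T + G @ Q @ G.T
        if upside[m]:
            # [Filter-1]: update with the observed quarterly value
            h = H[m]
            w = h @ Cp[m] @ h + lam2          # predictive error variance
            K = Cp[m] @ h / w                 # Kalman gain
            z = zp[m] + K * (r_y[m] - h @ zp[m])
            C = (np.eye(3) - np.outer(K, h)) @ Cp[m]
        else:
            # [Filter-2]: carry the prediction forward unchanged
            z, C = zp[m], Cp[m]
        zf[m], Cf[m] = z, C

    # [Fixed-Interval Smoothing], backwards from the second-to-last quarter
    zs, Cs = zf.copy(), Cf.copy()
    for m in range(M - 2, -1, -1):
        A = Cf[m] @ F.T @ np.linalg.inv(Cp[m + 1])
        zs[m] = zf[m] + A @ (zs[m + 1] - zp[m + 1])
        Cs[m] = Cf[m] + A @ (Cs[m + 1] - Cp[m + 1]) @ A.T
    return zs, Cs
```

Because G is invertible and the system noise variance is positive, each predictive covariance is positive definite, so the inverse in the smoothing step exists.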

3.3 Estimating the Hyperparameters

Given the time series data $Y_1^{(M)} = \{r_1^y, r_2^y, \ldots, r_M^y\}$ and the corresponding time series data $\{r_1^x, r_2^x, \ldots, r_{3M}^x\}$, a likelihood function for the hyperparameters $\lambda_1^2$ and $\phi_1^2$ and the parameter $L_1$ is given by:

$$
f(Y_1^{(M)} | \lambda_1^2, \phi_1^2, L_1) = \prod_{m=1}^{M} f_m(r_m^y | \lambda_1^2, \phi_1^2, L_1),
$$

where $f_m(r_m^y | \lambda_1^2, \phi_1^2, L_1)$ is the density function of $r_m^y$. By taking the logarithm of $f(Y_1^{(M)} | \lambda_1^2, \phi_1^2, L_1)$, the log-likelihood is obtained as

\begin{align}
\ell(\lambda_1^2, \phi_1^2, L_1) &= \log f(Y_1^{(M)} | \lambda_1^2, \phi_1^2, L_1) \notag\\
&= \sum_{m=1}^{M} \log f_m(r_m^y | \lambda_1^2, \phi_1^2, L_1). \tag{19}
\end{align}

As proposed by [4], using the Kalman filter, the density function $f_m(r_m^y | \lambda_1^2, \phi_1^2, L_1)$ is a normal density given by

\begin{equation}
f_m(r_m^y | \lambda_1^2, \phi_1^2, L_1) = \frac{1}{\sqrt{2\pi w_{m|m-1}}} \exp\left\{ -\frac{(r_m^y - \hat{r}_{m|m-1}^y)^2}{2 w_{m|m-1}} \right\}, \tag{20}
\end{equation}

where $\hat{r}_{m|m-1}^y$ is the one-step ahead prediction for $r_m^y$ and $w_{m|m-1}$ is the variance of the predictive error, respectively given by:

\begin{align*}
\hat{r}_{m|m-1}^y &= \mathbf{H}_m \mathbf{z}_{m|m-1}, \\
w_{m|m-1} &= \mathbf{H}_m \mathbf{C}_{m|m-1} \mathbf{H}_m^T + \lambda_1^2.
\end{align*}

Moreover, for a fixed value of $L_1$, the estimates of the hyperparameters can be obtained using the maximum likelihood method, i.e., we can estimate the hyperparameters by maximizing $\ell(\lambda_1^2, \phi_1^2, L_1)$ in (19) together with (20). In practice, when we put $\lambda_1^2 = 1$ into the above Kalman filter algorithm, the estimate $\hat{\lambda}_1^2$ for $\lambda_1^2$ is obtained analytically by:

\begin{equation}
\hat{\lambda}_1^2 = \frac{1}{M} \sum_{m=1}^{M} \frac{(r_m^y - \hat{r}_{m|m-1}^y)^2}{w_{m|m-1}}. \tag{21}
\end{equation}

So, the estimate $\hat{\phi}_1^2$ for $\phi_1^2$ can be obtained by maximizing $\ell(\lambda_1^2, \phi_1^2, L_1)$ under the use of (21).
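The analytic estimate in (21) and the resulting concentrated log-likelihood can be sketched as below. This assumes the common convention, described in [4], that the filter is run with the observation noise variance set to 1 and the system noise variance expressed relative to it, so that the true predictive variance is the observation noise variance times the value returned by that run. The function name `concentrated_loglik` and its inputs are hypothetical; how quarters skipped by Filter-2 enter the sum is not specified in the text and is not handled here.

```python
import numpy as np

def concentrated_loglik(innov, w_tilde):
    """Profile out the observation noise variance.

    innov:   one-step prediction errors r_m - r_{m|m-1}
    w_tilde: predictive variances from a filter run with lambda^2 = 1
    Returns (lambda^2 estimate, maximized log-likelihood).
    """
    innov = np.asarray(innov, dtype=float)
    w_tilde = np.asarray(w_tilde, dtype=float)
    M = len(innov)
    lam2_hat = np.mean(innov ** 2 / w_tilde)  # equation (21)
    # Sum of normal log-densities with variances lam2_hat * w_tilde,
    # simplified using the definition of lam2_hat.
    loglik = -0.5 * (M * np.log(2 * np.pi * lam2_hat)
                     + np.sum(np.log(w_tilde)) + M)
    return lam2_hat, loglik
```

With this reduction, only the relative system noise variance remains to be searched numerically, for example over a one-dimensional grid, for each candidate lag.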

Thus, for a given value of the lag $L_1$, the maximum likelihood is given as $f(Y_1^{(M)} | \hat{\lambda}_1^2, \hat{\phi}_1^2, L_1)$. Then, for a set $\{L_1^{(1)}, L_1^{(1)} + 1, \ldots, L_1^{(2)} - 1, L_1^{(2)}\}$ of values of $L_1$, we can calculate the relative likelihood by:

$$
R(L_1) = \frac{f(Y_1^{(M)} | \hat{\lambda}_1^2, \hat{\phi}_1^2, L_1)}{\sum_{j=L_1^{(1)}}^{L_1^{(2)}} f(Y_1^{(M)} | \hat{\lambda}_1^2, \hat{\phi}_1^2, j)}
\qquad (L_1 = L_1^{(1)}, L_1^{(1)} + 1, \ldots, L_1^{(2)} - 1, L_1^{(2)}),
$$

with $L_1^{(1)}$ and $L_1^{(2)}$ being a negative integer and a positive integer, respectively. So, we can analyze the lead-lag relation between GDP and WCS from the distribution of the relative likelihood on $L_1$. The same approach is used to analyze the lag $L_2$ in the downside-mode models.
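In practice the maximized likelihoods are available as log-likelihoods, and exponentiating them directly can underflow. A minimal sketch of the relative likelihood computation, assuming the per-lag maximized log-likelihoods are held in a dictionary keyed by lag (the function name `relative_likelihood` is hypothetical), subtracts the largest value before exponentiating:

```python
import numpy as np

def relative_likelihood(loglik_by_lag):
    """Map {lag: maximized log-likelihood} to {lag: relative likelihood}."""
    lags = sorted(loglik_by_lag)
    ll = np.array([loglik_by_lag[lag] for lag in lags], dtype=float)
    weights = np.exp(ll - ll.max())  # shift by the maximum for numerical stability
    return dict(zip(lags, weights / weights.sum()))
```

The values sum to one over the candidate lags, so the result can be read as a distribution over the lag, and its largest entry marks the most strongly supported lag.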

4 Application

In analyzing the dynamic relationship between GDP and the WCS, we used the WCS of farm and aquatic products (WCS-FAP) as an indicator for consumption, because we are interested in agriculture related sectors. Here, we use the data for the quarterly GDP time series and the monthly WCS-FAP time series in Japan for the period between 1982 and 2005 as the object of analysis. The data for real GDP are obtained from the Cabinet Office, Government of Japan. The data for the WCS-FAP are obtained from the website of the Ministry of Economy, Trade and Industry, Japan. Figure 1 shows the quarterly time series of real GDP in Japan for the period 1982Q1–2005Q4. Figure 2 is a line graph showing the monthly time series of the WCS-FAP during 1980.1–2007.12. In both Figs. 1 and 2, the unit of measure is billions of Japanese Yen.

Fig. 1. Time series data of real GDP in Japan (1982Q1–2005Q4).

For simplicity in parameter estimation, we adjusted the scale for each time series. Specifically, letting $y_m^*$ and $x_n^*$ denote the original data for GDP and WCS-FAP, we adjusted the associated scale by:

\begin{align*}
y_m &= 100 \times \frac{y_m^*}{y_1^*} \quad (m = 1, 2, \ldots), \\
x_n &= 100 \times \frac{x_n^*}{x_1^*} \quad (n = 1, 2, \ldots).
\end{align*}
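The scale adjustment is a rebasing of each series so that its first observation equals 100. A short sketch, with the function name `rescale_to_base_100` chosen here for illustration:

```python
import numpy as np

def rescale_to_base_100(series):
    """Express every observation as a percentage of the first one."""
    series = np.asarray(series, dtype=float)
    return 100.0 * series / series[0]
```

For instance, a hypothetical series 50, 75, 100 becomes 100, 150, 200, so the quarterly and monthly indicators are placed on comparable scales before estimation.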

In the analysis below, we use the scale-adjusted time series $y_m$ and $x_n$ as the data for GDP and WCS-FAP, respectively. To estimate the stationary component in GDP, we compute the likelihoods for the models in (1)–(4) for $p = 1, 2, 3, 4$. The maximum likelihood is obtained for the models with $p = 1$. So we use models with $p = 1$ as a set of best models in data analysis. To estimate the stationary component of WCS-FAP, we compute the likelihoods for the models in (5)–(8) for $q = 1, 2, \ldots, 10$. The maximum likelihood is obtained for the models with $q = 6$. Thus, in the data analysis we use the models with $q = 6$ for WCS-FAP.

Fig. 2. Time series data for WCS-FAP in Japan (1980.1–2007.12).

Figure 3 shows the estimate for the stationary component of GDP. The thin line expresses the value for the original estimate and the thick line shows that for a 7-quarter moving average. The vertical lines indicate inflections of the business cycle (the solid and broken lines indicate peaks and troughs, respectively). It can be seen from Fig. 3 that fluctuation of the stationary component in GDP correlates closely with business cycles in Japan.

Fig. 3. Time series for the estimation of the stationary component of real GDP in Japan (1982Q1–2005Q4), plotted against quarter. The vertical lines indicate inflections of the business cycle (the solid and broken lines indicate peaks and troughs, respectively).

Figure 4 shows the estimate for the stationary component of the WCS-FAP. Similarly to Fig. 3, the vertical lines indicate inflections in the business cycle (the solid and broken lines indicate peaks and troughs, respectively). Any correspondence between business cycles and fluctuation in the stationary component of the WCS-FAP is not immediately obvious from Fig. 4.

Fig. 4. Time series for the estimation of the stationary component of WCS-FAP in Japan (1980.1–2007.12), plotted against month. The vertical lines indicate inflections of the business cycle (the solid and broken lines indicate peaks and troughs, respectively).

Figure 5 shows the relative likelihood distribution on the lags between −24 and 24 in the models in (9)–(16). In panel (a), showing the result for the upside mode model, the relative likelihood distribution indicates a peak value occurring with a three-month lead. This indicates that, during expansion phases, movements of WCS-FAP lead GDP movements by three months. In panel (b), showing the result for the downside mode model, the relative likelihood value peaks 22 months ahead. Thus, during recession phases, the movements of WCS-FAP lead those of GDP by nearly two years. These results imply that the WCS-FAP is coincident with GDP during expansion phases. Conversely, during contraction phases, the WCS-FAP can be used as a leading indicator of GDP, with a lead of about two years, and hence it can be applied to predict the movements of GDP and the business cycle more generally.

Fig. 5. Relative likelihood distribution on the lag for WCS-FAP: (a) for upside mode model; (b) for downside mode model.

Figure 6 shows the time series of estimates for the time-varying coefficient in the models in (9)–(16). From these results we can analyze the dynamic relationship between GDP and WCS-FAP. From Fig. 6, it can be seen that the time-varying coefficient displays complicated behavior, suggesting that the time-varying relationship between WCS-FAP and GDP is not straightforward to interpret.

Fig. 6. Time series of the time-varying coefficient for WCS-FAP (1982.1–2005.12): (a) for upside mode model; (b) for downside mode model.

5 Conclusion

We proposed a Bayesian dynamic linear modeling method for analyzing the dynamic relationship between a quarterly economic indicator and a monthly economic indicator. In this study, real GDP and whole commercial sales of farm and aquatic products (WCS-FAP) in Japan were used as examples of the quarterly and the monthly indicators, respectively. The proposed approach can be used to analyze the dynamic relationship between GDP and WCS-FAP, and can hence be used to estimate monthly GDP.

Our approach comprised two main steps. First, we extracted the stationary components of the time series for GDP and the WCS-FAP, using a set of state space models. Then, we constructed a set of Bayesian regression models with a time lag parameter and a time-varying coefficient. The constructed models are called two-mode regression with time-varying coefficients (TMR-TVC). We note the importance of two parameters in the TMR-TVC models: one is the time lag between GDP and the WCS-FAP and the second is the time-varying coefficient. The value of the lag indicates the lead-lag relationship between GDP and the WCS-FAP. The estimate of the time-varying coefficient indicates the dynamic correlation between GDP and WCS-FAP.

Finally, in an empirical study based on the proposed approach, we analyzed the dynamic relationship between GDP and the WCS-FAP using Japanese economic statistics from 1982 to 2005. The empirical study found that: (1) During expansion phases of business cycles, the WCS-FAP is coincident with GDP. (2) During contraction phases, the WCS-FAP acts as a leading indicator of GDP, with a lead of approximately two years, so it can be applied to predict the movements of GDP and the business cycles.

Acknowledgment. This work is supported in part by a Grant-in-Aid for Scientific Research (C) (16K03591) from the Japan Society for the Promotion of Science. I thank Deborah Soule, DBA, from Edanz Group (www.edanzediting.com/ac) for editing a draft of this manuscript.


References

1. Anghelache, C.: Analysis of the correlation between GDP and the final consumption. Theor. Appl. Econ. 18(9), 129–138 (2011)
2. Jiang, X.-Q., Kyo, K.: A Bayesian method for the dynamic regression analysis. Trans. Inst. Syst. Control Inf. Eng. 8(1), 8–16 (1995)
3. Jiang, X.-Q., Kyo, K., Kitagawa, G.: A time-varying coefficient vector AR modeling of nonstationary covariance time series. Signal Process. 33(3), 315–331 (1993)
4. Kitagawa, G.: Introduction to Time Series Modeling. CRC Press (2010)
5. Kyo, K., Noda, H.: A new algorithm for estimating the parameters in seasonal adjustment models with a cyclical component. ICIC Express Lett. Int. J. Res. Surv. 5(5), 1731–1737 (2011)
6. Kyo, K., Noda, H.: Bayesian analysis of the dynamic relationship between oil price fluctuations and industrial production performance in Japan. Inf. Int. Interdisc. J. 16(7A), 4639–4660 (2013)
7. Kyo, K., Noda, H.: Dynamic effects of oil price fluctuations on business cycle and unemployment rate in Japan. Int. J. Innov. Manage. Technol. 6(6), 374–377 (2015)
8. Liu, H., Hall, S.G.: Creating high-frequency national accounts with state-space modelling: a Monte Carlo experiment. J. Forecast. 20(6), 441–449 (2001)
9. Mariano, R.S., Murasawa, Y.: A new coincident index of business cycles based on monthly and quarterly series. J. Appl. Econometrics 18(4), 427–443 (2003)
10. Mariano, R.S., Murasawa, Y.: A coincident index, common factors, and monthly real GDP. Oxford Bull. Econ. Stat. 72(1), 27–46 (2010)
11. Seong, B., Ahn, S.K., Zadrozny, P.: Estimation of vector error correction models with mixed-frequency data. J. Time Ser. Anal. 34(2), 194–205 (2013)

Sentiment Analysis System for Roman Urdu

Khawar Mehmood(✉), Daryl Essam, and Kamran Shafi

University of New South Wales (UNSW), Kensington, Australia
[email protected], {d.essam,k.shafi}@adfa.edu.au

Abstract. Sentiment analysis is a computational process to identify positive or negative sentiments expressed in a piece of text. In this paper, we present a sentiment analysis system for Roman Urdu. For this task, we gathered Roman Urdu data of 779 reviews for five different domains, i.e., Drama, Movie/Telefilm, Mobile Reviews, Politics, and Miscellaneous (Misc). We selected unigram, bigram and uni-bigram (unigram + bigram) features for this task and used five different classifiers to compute accuracies before and after feature reduction. In total, thirty-six (36) experiments were performed, and they established that Naïve Bayes (NB) and Logistic Regression (LR) performed better than the rest of the classifiers on this task. It was also observed that the overall results were improved after feature reduction.

Keywords: Opinion mining · Roman Urdu · Urdu · Social media

1 Introduction

Sentiment analysis, or opinion mining, is a computational process to determine the polarity of a topic, opinion, emotion, or attitude. Social media (i.e., blogs, microblogs, Facebook, discussion forums, etc.) are powerful tools for people to express their thoughts, opinions, emotions, and feelings. The proliferation of such information, related to opinions and sentiments, has made it a formidable task to manually distill the relevant details from this plethora of information. Such big data necessitates the development of an automated system, such as a Sentiment Analysis System, which can intelligently extract opinions and sentiments from such data.

Most of the work done on sentiment analysis is for major languages, such as English and Chinese [1, 2]. However, only very limited work has been done for Roman Urdu/Hindi, which is hence a resource-poor language. Developing a robust sentiment analysis system for Roman Urdu is necessary for two major reasons. First, Urdu/Hindi is the third most widely spoken language in the world, with over 500 million speakers [3]. Second, Roman Urdu is becoming increasingly used because people prefer to communicate on the web using Latin script (the 26 letters of the English alphabet), instead of typing in their language using language-specific keyboards.

The objective of this work is to develop a baseline sentiment analysis system for Roman Urdu. To this end, a corpus of 779 Roman Urdu reviews has been created. Five different classifiers are trained on this data, and their effectiveness is examined on the task of Roman Urdu sentiment analysis using three different features, namely unigram, bigram, and uni-bigram, on six different types of datasets.

© Springer Nature Switzerland AG 2019 K. Arai et al. (Eds.): SAI 2018, AISC 858, pp. 29–42, 2019. https://doi.org/10.1007/978-3-030-01174-1_3


K. Mehmood et al.

The paper is organized as follows. Section 2 explains related work, Sect. 3 discusses the dataset. Section 4 describes the machine learning algorithms used for this work, then Sect. 5 describes the Features, Sect. 6 describes the data preprocessing steps, and Sect. 7 describes the overall methodology. Next Sect. 8 describes and discusses the results. The paper concludes with Sect. 9.

2 Related Work

Sentiment analysis is one of the most active research areas in computer science, with over 7000 research papers [4]. Sentiment analysis and opinion mining is a challenging task with a wide range of applications, both at individual and enterprise level. It is helpful in business intelligence applications and recommendation systems [5].

There are three broad approaches to sentiment analysis. The first is the machine learning approach, which predicts the polarity of a sentiment using training, testing and, optionally, development data. For this, the training data needs to be hand labeled [6, 7]. The second is the lexicon-based approach, which exploits a list-based methodology of predefined words to determine the polarity of sentiment [8]. The third is a hybrid approach which combines the machine learning and lexicon-based approaches [9].

Sentiment analysis classification is done at three levels [10]. Document-level classification considers a document as an atomic unit of information and classifies it as expressing a positive or negative view [11, 12]. Sentence-level sentiment analysis classifies sentences as per their polarity [4, 13]. Aspect or attribute level sentiment analysis classifies the sentiment of specific attributes of a target entity [14, 15].

Sentiment analysis is not simple for all cases. Treebank techniques are a good approach for cases that depend on semantics; for example, a Recursive Neural Tensor Network used for sentiment analysis produced accuracies ranging from 80 to 85 per cent [16]. Contextual polarity can be determined by exploring the essence of any phrase. It can be identified by first identifying the neutral or polar sense of a sentence [17]. The Entropy Weighted Genetic Algorithm helped to generate better results using heuristic and information gain for sentiment analysis of English and Arabic with a movie review dataset and U.S. and Middle Eastern Web forum postings [14]. Feature-based and tree kernel models performed well for sentiment analysis on English Twitter data [18]. There are multiple machine learning algorithms for classification of sentiments. Naïve Bayes classifier, Support Vector Machine, and Maximum entropy classification produced relatively bad results for the classification of sentiments, even though these models produced good results for other traditional text classifications [6].

The availability of datasets is also an issue in the field of sentiment analysis. Much data is available online in the form of product reviews. Such reviews are of great importance to business holders, as they want to know the preferences of their customers to improve the performance of their products, or to add or remove attributes of a product [10]. Apart from product reviews, sentiment analysis can also be applied in a variety of other fields. For example, it can be applied to news articles to extract useful information [19]. Based on the prevailing situation in the region, it can be used to predict trends in the stock market [20, 21]. Sentiment analysis is also useful to determine the opinions of political candidates. Also, the results of an election can be predicted based on opinions gathered from political debates and people’s reviews about the political parties and election candidates in social media [22].

Since most of the leading research is being done in English speaking countries and China, all the above work, along with any others not mentioned here, has been done for resource-rich and major languages of the world. Very little work has been done for Urdu [23–34], and only limited work has been done for Roman Urdu.

3 Dataset

Different websites, such as hamariweb, YouTube, dramaonline, Facebook, etc. were identified for the collection of Roman Urdu user review data. The data, for five different domains, was extracted from these links using a semiautomatic method. In the Indian Subcontinent, people have strong interests in areas such as movies, mobiles, dramas/telefilms and politics. By building a sentiment analysis system on such data, people can be informed about positive and negative trends in their area of interest. The data gathered is multilingual with no standard language structure. We cleaned the gathered data by removing reviews written in Arabic script and the English language.

Since the downloaded reviews do not contain polarity labels, the data was then annotated using the Multi Annotator methodology. For the multi annotator methodology, the data was handed over to two annotators who independently annotated the data. As per guidelines, the annotators were asked to annotate the data into two different classes, i.e., positive and negative. The reason for this was that we focus on binary classification of reviews. There were only a few reviews which were annotated as positive by the first annotator, but marked as negative by the second annotator. Similarly, there were few reviews which were annotated as negative by the first annotator but marked as positive by the second annotator. The rest of the reviews were marked as positive or negative by both annotators, i.e., there was no disagreement between the annotators. A discussion was held on the reviews with disagreement between the two annotators, and those reviews were dropped when the disagreement could not be resolved. The reviews on which the disagreement was resolved during the discussion were either transferred to the positive or the negative category, as per the consensus.

The dataset, containing a total of 779 reviews, as shown in Table 1, is divided into five different domains. Out of the total reviews gathered, 152 reviews belong to Drama, 167 to Movie/telefilm, 149 to Mobile Reviews, 162 to Politics and 149 to a Misc category. Also, out of these 779 reviews, 412 reviews were identified as positive and 367 as negative.

Table 1. Roman Urdu dataset

Domain           No. of reviews   Positive   Negative
Drama            152              92         60
Movie/TeleFilm   167              101        66
Mobile Reviews   149              86         63
Politics         162              74         88
Misc             149              59         90
Total            779              412        367


4 Algorithms Used

This section gives brief details of the algorithms used in this paper.

Naïve Bayes. Naïve Bayes is one of the most important algorithms for text classification. It relies on a very simple representation of a document, i.e., bag of words. In the bag of words representation, information such as the order and position of the words in documents is lost. Naïve Bayes is based on Bayes rule with a very simple independence assumption, i.e., all features are independent given the class. When used in text classification, the task of Naïve Bayes is to output the class $c^*$ with the highest probability for a particular document $d$, i.e., $c^* = \arg\max_c p(c|d)$. This probability can be expanded by using Bayes rule as follows:

$$
c^* = \arg\max_c p(c|d) = \arg\max_c \frac{p(d|c)\,p(c)}{p(d)}
$$

The term $p(d)$ is the evidence, $p(c)$ is the prior probability and $p(d|c)$ is the likelihood.

Logistic Regression. Logistic regression is one of the most widely used and powerful algorithms for classification problems. In logistic regression, the selected hypothesis function always predicts output values between 0 and 1:

$$
0 \le h_\theta(\theta^T x) \le 1
$$

The hypothesis function is represented by the sigmoid function as follows:

$$
h_\theta(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}
$$

where $h_\theta(\theta^T x)$ is the hypothesis function for logistic regression, parameterized by $\theta$, and $x$ is the input variable or feature.

SVM. Support vector machines (SVMs) have been shown to be highly effective at traditional text categorization. They are large-margin classifiers and output either 0 or 1 to classify the examples as positive or negative, in contrast to logistic regression and naïve Bayes which output probabilities, i.e., values between 0 and 1. The objective of SVM is to find the optimal separating hyperplane which maximizes the margin of the training data. The distance between the hyperplane and the closest data points of different classes defines the margin. The hypothesis function of SVM is as follows:

$$
h_\theta(x) = \begin{cases} 1 & \text{if } \theta^T x \ge 1 \\ 0 & \text{if } \theta^T x \le -1 \end{cases}
$$
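The Naïve Bayes decision rule and the sigmoid hypothesis can be illustrated with a few lines of plain Python. This is a sketch under stated assumptions, not the classifiers used in the experiments: it uses a multinomial word-count model with add-one smoothing (a choice made here, since the text does not specify the variant), ignores words unseen in training, drops the evidence term because it is the same for every class, and works on invented toy tokens. The function names `train_nb`, `predict_nb` and `sigmoid` are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    """Count class frequencies and per-class word frequencies."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict_nb(model, tokens):
    """Return argmax_c of log p(c) + sum_t log p(t | c)."""
    class_counts, word_counts, vocab = model
    n_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        total = sum(word_counts[label].values())
        score = math.log(class_counts[label] / n_docs)  # log prior
        for t in tokens:
            if t in vocab:  # skip words never seen in training
                # add-one smoothed log likelihood of the word given the class
                score += math.log((word_counts[label][t] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def sigmoid(z):
    """Logistic hypothesis 1 / (1 + exp(-z)), with z standing for theta^T x."""
    return 1.0 / (1.0 + math.exp(-z))
```

Working in log space avoids underflow when many word probabilities are multiplied, and the sigmoid returns 0.5 exactly at the decision boundary where its argument is zero.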


k-nearest neighbors (KNN). This is a very simple algorithm in which the training phase consists of storing the entire training data. Whenever a new data point comes for prediction, its distance is measured from all the training examples, and the most common class of the “k” nearest training examples is assigned to the test example. One drawback of KNN is that it requires a lot of memory to store the training examples and it only performs computation while predicting test examples.

Decision Tree (DT). In a decision tree, the information gains of all the features are computed, and the feature with the highest gain becomes the root node of the tree. Similarly, the information gains for the rest of the features are calculated, and the features for the next level of the tree are selected, and the tree is built using the same procedure again and again.
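Both procedures can be sketched briefly. The code below is illustrative only: it assumes Euclidean distance and a simple majority vote for KNN, and entropy-based information gain for a single categorical feature as the splitting score of the decision tree; the text does not state which distance or impurity measure was used. The function names are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Label the query with the most common class among its k nearest neighbours."""
    by_distance = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Entropy reduction obtained by splitting the labels on one feature."""
    n = len(labels)
    gain = entropy(labels)
    for value in set(feature_values):
        subset = [y for f, y in zip(feature_values, labels) if f == value]
        gain -= len(subset) / n * entropy(subset)
    return gain
```

A tree builder would call `information_gain` for every candidate feature, split on the one with the highest value, and repeat on each resulting subset.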

5 Features

In this work, we use three different kinds of word-level N-gram features, i.e., unigram, bigram, and uni-bigram. For example, in the sentence “I go to school” there are four unigrams, i.e., “I”, “go”, “to”, “school”; there are five bigrams, i.e., “<s> I”, “I go”, “go to”, “to school”, “school </s>”, where <s> is the start and </s> is the end of the sentence; and there are nine uni-bigrams, i.e., “I”, “go”, “to”, “school”, “<s> I”, “I go”, “go to”, “to school”, “school </s>”.

There are two major uses of N-grams. First, they are used to build N-gram Language Models (LMs). Language models assign a probability to the next word given its history, p(word | history), and ultimately assign a probability to a complete sentence. For example, we may want to know the probability of the next word being “school”, given the history “he is going to”. Language models (unigram LMs, bigram LMs, trigram LMs … N-gram LMs) have a variety of applications in machine translation, computational linguistics, computational biology, etc. The second use of N-grams is to develop features for supervised machine learning algorithms such as Naïve Bayes, SVM, and Logistic Regression. In this work, we used N-grams to develop features for our machine learning algorithms.
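The three feature types can be sketched as follows. This is a minimal illustration assuming whitespace tokenization; the function names and the literal marker strings `<s>` and `</s>` for sentence start and end are assumptions of this sketch.

```python
def unigrams(sentence):
    """Word-level unigrams, obtained here by whitespace tokenization."""
    return sentence.split()

def bigrams(sentence, start="<s>", end="</s>"):
    """Adjacent word pairs, including sentence start and end markers."""
    tokens = [start] + sentence.split() + [end]
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]

def uni_bigrams(sentence):
    """Uni-bigram features: the unigrams followed by the bigrams."""
    return unigrams(sentence) + bigrams(sentence)

sentence = "I go to school"
print(unigrams(sentence))          # four unigrams
print(bigrams(sentence))           # five bigrams
print(len(uni_bigrams(sentence)))  # nine uni-bigrams
```

For “I go to school” the counts match the worked example: four unigrams, five bigrams and nine uni-bigrams.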

6 Data Preprocessing

This section discusses the data preprocessing steps needed to convert the data into the format required by machine learning algorithms, as text data cannot be fed to them directly. We performed data preprocessing to convert text data into a numeric format. At the end of the preprocessing step, a document-term matrix was built. The rows of the matrix represent the documents (in our case, reviews) and the columns represent the features (in our case, unigram, bigram, and uni-bigram). The values of the matrix are the numeric counts of the features (feature frequency). In our case, the size of the document-term matrix is 779 (rows) × number of features (columns).

To understand the document-term matrix, consider a training set that consists of the following three reviews (documents).

1. I go to school
2. They go to school
3. We go to school

The unigram features extracted from the training examples are {I, go, to, school, They, We}. From these features and the training examples, a document-term matrix (shown in Table 2) with a size of 3 × 6 is formed. Similarly, the extracted bigram features are {<s> We, <s> They, <s> I, I go, go to, to school, school </s>, They go, We go}. For the given training examples and bigram features, a document-term matrix (shown in Table 3) with a size of 3 × 9 is formed. The extracted uni-bigram features are {I, go, to, school, They, We, <s> We, <s> They, <s> I, I go, go to, to school, school </s>, They go, We go}. From the extracted uni-bigram features and training examples, a document-term matrix (shown in Table 4) with a size of 3 × 15 is formed. As is evident from the above example, the extracted features follow a bag-of-words representation. In this kind of representation, a word is not bound by any grammatical rule for its appearance, nor does its position in the text matter.

Table 2. Example of document-term matrix (Unigram)

| Document ID | I | go | to | school | They | We |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 1 | 1 | 1 | 1 | 0 | 0 |
| 2 | 0 | 1 | 1 | 1 | 1 | 0 |
| 3 | 0 | 1 | 1 | 1 | 0 | 1 |

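Table 3. Example of document-term matrix (Bigram)

| Document ID | <s> We | <s> They | <s> I | I go | go to | to school | school </s> | They go | We go |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 |
| 2 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 0 |
| 3 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 1 |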

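Table 4. Example of document-term matrix (Uni-Bigram)

| Document ID | I | go | to | school | They | We | <s> We | <s> They | <s> I | I go | go to | to school | school </s> | They go | We go |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 |
| 2 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 0 |
| 3 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 1 |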
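The construction of the example document-term matrices can be sketched in plain Python. This is an illustrative sketch rather than the pipeline used in the experiments: the function names are hypothetical, tokens are split on whitespace without lowercasing, `<s>` and `</s>` are assumed as sentence boundary markers, and columns are ordered by first appearance, which differs from the column order of the bigram table.

```python
def extract_features(doc, kind):
    """Return the unigram, bigram or uni-bigram features of one document."""
    tokens = doc.split()
    padded = ["<s>"] + tokens + ["</s>"]
    pairs = [f"{a} {b}" for a, b in zip(padded, padded[1:])]
    return {"unigram": tokens, "bigram": pairs, "uni-bigram": tokens + pairs}[kind]

def document_term_matrix(docs, kind="unigram"):
    """Build the vocabulary and a matrix of feature counts per document."""
    vocab = []
    for doc in docs:
        for feature in extract_features(doc, kind):
            if feature not in vocab:
                vocab.append(feature)
    matrix = []
    for doc in docs:
        features = extract_features(doc, kind)
        matrix.append([features.count(f) for f in vocab])
    return vocab, matrix

docs = ["I go to school", "They go to school", "We go to school"]
for kind in ("unigram", "bigram", "uni-bigram"):
    vocab, matrix = document_term_matrix(docs, kind)
    print(kind, f"{len(matrix)} x {len(vocab)}")
```

For the three example reviews this yields matrices of size 3 × 6, 3 × 9 and 3 × 15, matching the sizes stated in the text.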


7 Methodology

In this section, the proposed methodology for building the baseline sentiment analysis system for Roman Urdu is described (Fig. 1). First, we gathered the reviews mentioned in Sect. 3. For this, the source websites, blogs and social media links that contained user reviews in Roman Urdu were identified, and the reviews were then extracted using a semiautomatic methodology. Both a scraper and a manual method were used to extract the data from the identified links. After extraction, the data was cleaned by removing unwanted information and was placed in an Excel file. Since the downloaded reviews do not contain polarity (positive or negative), in the next step the Excel file was given to two different annotators to annotate the data with polarities. After annotation, the results from the two annotators were compared for any disagreements. The reviews on which disagreements could not be resolved were dropped. After that, the unified results of annotation were stored in an Excel file.

Fig. 1. Step by step methodology for Roman Urdu Sentiment Analysis (SA).

Six different datasets were used for experimentation. Five were domain specific, i.e., one dataset for each domain, and the sixth dataset was formed by combining the datasets of all five domains. Data was split into training (60%) and testing (40%) sets. We used scikit-learn [35], a free machine learning library, for the implementation of the machine learning algorithms. We conducted our experiments in two phases. In Phase I, we used all tokens (words, punctuation, etc.) to build our unigram, bigram, and uni-bigram features and then computed their accuracies. In Phase II, to improve accuracies, we performed feature reduction, where we eliminated punctuation marks, special characters, and single-letter words. We selected these features for reduction on the assumption that punctuation marks and single-letter words like “a” do not contain any sentiment value. This helped us to reduce feature dimensions and improve accuracies. In total, thirty-six experiments (6 datasets × 3 features for each phase) were performed, and the details of the results are given in the results and discussion section. To compute average accuracies, each experiment was performed thirty times. In the end, we compared the results of each phase, then selected the phase with better results for further discussion and detailed analysis.
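The experimental loop described above can be sketched as follows. This is a hedged illustration only: the function names are hypothetical, the split is assumed to be a fresh random shuffle on each of the thirty iterations, the token filter is one plausible reading of the Phase II reduction, and a trivial majority-class scorer stands in for the scikit-learn classifiers used in the paper.

```python
import random

def reduce_tokens(tokens):
    """Drop single-character tokens and tokens with no letter or digit."""
    return [t for t in tokens if len(t) > 1 and any(ch.isalnum() for ch in t)]

def repeated_holdout(samples, labels, train_and_score,
                     iterations=30, train_fraction=0.6, seed=0):
    """Average the score over repeated random 60/40 train/test splits."""
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    cut = int(round(train_fraction * len(indices)))
    scores = []
    for _ in range(iterations):
        rng.shuffle(indices)
        train_idx, test_idx = indices[:cut], indices[cut:]
        scores.append(train_and_score(
            [samples[i] for i in train_idx], [labels[i] for i in train_idx],
            [samples[i] for i in test_idx], [labels[i] for i in test_idx],
        ))
    return sum(scores) / len(scores)

def majority_class_score(train_X, train_y, test_X, test_y):
    """Placeholder scorer: predict the most frequent training label."""
    majority = max(set(train_y), key=train_y.count)
    return sum(y == majority for y in test_y) / len(test_y)

reviews = [f"review {i}" for i in range(10)]
polarity = ["pos"] * 7 + ["neg"] * 3
print(reduce_tokens(["acha", "a", "!", "hai", "??"]))
print(repeated_holdout(reviews, polarity, majority_class_score))
```

Any function with the same four-argument signature, for example one that fits a real classifier on the training part and returns its test accuracy, could be passed in place of `majority_class_score`.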

8 Results and Discussion

Table 5 shows the results of Phase I and Table 6 shows the results of Phase II using the three different features on the six types of dataset. The Phase I results show that NB performed better than LR on five of the six datasets, while LR performed better than NB on only one. Of the five datasets on which NB was best, its best result came from unigram on three, from bigram on one and from uni-bigram on one. LR achieved its single best result using unigram. Overall, the best results were achieved using unigram (four times).

Phase II results (Table 6) show that NB performed better than LR on four of the six datasets and LR performed better than NB on two, so overall NB outperformed LR. Of the four datasets on which NB was best, its best result came from unigram on one, from bigram on one and from uni-bigram on two. Similarly, of the two datasets on which LR was best, its best result came from unigram on one and from uni-bigram on one. Overall, the best results were achieved using uni-bigram, followed by unigram.

Comparing the results of Phase I (Table 5) and Phase II (Table 6), Phase II was selected for further discussion and analysis, as it outperformed Phase I in most cases. Besides calculating the accuracies for the three features of each dataset as given in Table 6, we also used the information in Table 6 to calculate the row-wise feature averages, the overall average accuracy of each feature, the column-wise average accuracies of each feature (for example, the average accuracy of all unigram features in a column) and the column average accuracies (for example, the average accuracy of all features in a column). We have also compared the accuracies on the movie reviews data of Table 6 with the accuracies achieved for English movie reviews.

After calculating the average accuracies achieved by each classifier on all features of the six datasets (Column Average (CA)), it was observed that NB outperformed all other classifiers with an average accuracy of 67.58%, followed by LR with an average accuracy of 66.17%. To see how the classifiers behaved on each feature, the classifier-wise average of accuracies for each feature over all six datasets was calculated (Column Average of Features (CAF); e.g., the average of all unigram features over all six datasets for LR is 68.88%). It was observed that NB outperformed all other classifiers on all three features, with average accuracies of 69.80% (unigram), 63.10% (bigram) and 69.83% (uni-bigram). In this setting, uni-bigram achieved the maximum accuracy and was the best.


Also, the row-wise average accuracies of each feature of Table 6, shown separately in Table 7 (for example, the average accuracy of the unigram feature of All Data (A) over all classifiers is 63.53%, i.e., how all the classifiers performed on the unigram feature of All Data (A)), show that unigram is always best across all datasets. After calculating the average of all thirty accuracy values of each feature separately (for example, five classifiers’ unigram accuracies for six datasets make thirty accuracy values), shown in Table 8, it is also observed that unigram outperforms all other features with an accuracy of 63.27%.

Overall, it is observed that NB performed best. One possible reason is that NB works on the simple assumption that features are independent of each other. It also does not have specific parameters that can be tuned to improve its performance. Therefore, it is being utilized at its full strength. SVM and KNN, on the other hand, could not perform well for this task. One possible reason is that in all cases the number of features is much larger than the number of observations. Another reason can be that the four classifiers other than NB have hyper-parameters, which could be tuned to improve their performance.

Table 5. Results after thirty iterations (without feature reduction)

| DS | Features | Total number of features | LR (%) | NB (%) | SVM (%) | DT (%) | KNN (%) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| (A) | Uni | 3844 | 70.33 | 71.79 | 53.24 | 63.01 | 56.18 |
| | Bi | 10930 | 60.56 | 65.06 | 53.11 | 54.77 | 50.89 |
| | U-B | 14774 | 70.11 | 72.00 | 53.11 | 63.05 | 53.94 |
| (D) | Uni | 1030 | 75.75 | 75.70 | 60.16 | 69.30 | 57.69 |
| | Bi | 2022 | 63.17 | 67.96 | 60.27 | 55.81 | 45.22 |
| | U-B | 3052 | 74.09 | 74.73 | 60.27 | 68.55 | 55.70 |
| (M/T) | Uni | 744 | 71.41 | 73.64 | 59.24 | 68.43 | 62.68 |
| | Bi | 1428 | 60.00 | 62.53 | 59.29 | 50.96 | 50.61 |
| | U-B | 2173 | 70.56 | 73.48 | 59.29 | 69.29 | 57.02 |
| (MR) | Uni | 612 | 69.33 | 73.56 | 57.33 | 60.67 | 57.39 |
| | Bi | 1207 | 61.78 | 65.06 | 56.78 | 58.39 | 57.22 |
| | U-B | 1819 | 67.22 | 72.94 | 57.39 | 60.11 | 57.67 |
| (P) | Uni | 1239 | 61.49 | 60.10 | 52.15 | 52.00 | 49.17 |
| | Bi | 2262 | 48.97 | 53.74 | 52.15 | 55.74 | 45.74 |
| | U-B | 3501 | 60.77 | 60.87 | 52.15 | 53.38 | 46.61 |
| (M) | Uni | 1724 | 61.72 | 61.78 | 59.72 | 60.83 | 62.50 |
| | Bi | 4410 | 61.66 | 65.56 | 59.72 | 58.61 | 48.61 |
| | U-B | 6135 | 61.94 | 62.00 | 59.72 | 60.22 | 57.56 |

DS = Dataset, (A) = All Data, (D) = Drama, (M/T) = Movie/Telefilm, (MR) = Mobile Reviews, (P) = Politics, (M) = Misc, Uni = Unigram, Bi = Bigram, U-B = Uni-Bigram. Bold values show best accuracies.

Table 6. Results after thirty iterations (with feature reduction)

| DS | Features | Total number of features | LR (%) | NB (%) | SVM (%) | DT (%) | KNN (%) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| (A) | Uni | 3794 | 71.33 | 71.98 | 53.09 | 63.25 | 58.01 |
| | Bi | 10201 | 60.81 | 64.64 | 53.1 | 54.21 | 50.58 |
| | U-B | 13995 | 70.94 | 72.37 | 53.1 | 63.67 | 53.4 |
| (D) | Uni | 1002 | 76.98 | 75.96 | 60.26 | 68.44 | 60.96 |
| | Bi | 1825 | 64.83 | 68.27 | 60.26 | 50.32 | 43.97 |
| | U-B | 2827 | 76.61 | 75.32 | 60.26 | 68.06 | 56.18 |
| (M/T) | Uni | 717 | 72.22 | 75.95 | 59.29 | 70.3 | 63.43 |
| | Bi | 1259 | 61.16 | 65.5 | 59.29 | 52.17 | 51.31 |
| | U-B | 1976 | 71.26 | 76.56 | 59.29 | 70.45 | 57.57 |
| (MR) | Uni | 588 | 69.83 | 73.27 | 56.88 | 60.38 | 58.44 |
| | Bi | 1102 | 59.5 | 61.05 | 56.61 | 56.5 | 56.94 |
| | U-B | 1690 | 67.83 | 72.44 | 56.83 | 59.88 | 57.16 |
| (P) | Uni | 1223 | 61.38 | 61.02 | 52.15 | 52.41 | 48.66 |
| | Bi | 2211 | 48.87 | 53.94 | 52.15 | 54.51 | 46.05 |
| | U-B | 3434 | 61.53 | 60.92 | 52.15 | 52.3 | 46.1 |
| (M) | Uni | 1691 | 61.55 | 60.61 | 59.72 | 58.94 | 61.38 |
| | Bi | 4105 | 62.16 | 65.22 | 59.72 | 57.05 | 45.83 |
| | U-B | 5796 | 61.83 | 61.38 | 59.72 | 58.77 | 54.66 |
| (CAF) | Uni | | 68.88 | 69.80 | 56.90 | 62.29 | 58.48 |
| | Bi | | 61.31 | 63.10 | 56.86 | 54.13 | 49.11 |
| | U-B | | 68.33 | 69.83 | 56.89 | 62.19 | 54.18 |
| (CA) | | | 66.17 | 67.58 | 56.88 | 59.53 | 53.92 |

DS = Dataset, (A) = All Data, (D) = Drama, (M/T) = Movie/Telefilm, (MR) = Mobile Reviews, (P) = Politics, (M) = Misc, (CAF) = Column Average of Features, (CA) = Column Average, Uni = Unigram, Bi = Bigram, U-B = Uni-Bigram. Bold values show best accuracies.

Table 7. Row-wise feature averages (of Table 6)

| Feat./DS | (A) (%) | (D) (%) | (M/T) (%) | (MR) (%) | (P) (%) | (M) (%) |
| --- | --- | --- | --- | --- | --- | --- |
| Uni | 63.53 | 68.52 | 68.24 | 63.76 | 55.12 | 60.44 |
| Bi | 58.77 | 57.53 | 57.89 | 58.12 | 51.10 | 58.0 |
| U-B | 62.7 | 67.29 | 67.03 | 62.83 | 54.6 | 59.27 |

DS = Dataset, Feat = Features, (A) = All Data, (D) = Drama, (M/T) = Movie/Telefilm, (MR) = Mobile Reviews, (P) = Politics, (M) = Misc, Uni = Unigram, Bi = Bigram, U-B = Uni-Bigram. Bold values show best accuracies.


Table 8. Overall average accuracies of each feature (of Table 6)

| Unigram | Bigram | Uni-bigram |
| --- | --- | --- |
| 63.27% | 56.90% | 62.28% |
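The way the averages are derived can be illustrated for the unigram feature. The sketch below copies the Phase II unigram accuracies from Table 6 and recomputes the row-wise average per dataset, the column average per classifier and the overall average of all thirty values; the variable names are assumptions of this sketch.

```python
# Phase II unigram accuracies (%) copied from Table 6.
# Column order within each row: LR, NB, SVM, DT, KNN.
CLASSIFIERS = ["LR", "NB", "SVM", "DT", "KNN"]
unigram = {
    "A":   [71.33, 71.98, 53.09, 63.25, 58.01],
    "D":   [76.98, 75.96, 60.26, 68.44, 60.96],
    "M/T": [72.22, 75.95, 59.29, 70.3, 63.43],
    "MR":  [69.83, 73.27, 56.88, 60.38, 58.44],
    "P":   [61.38, 61.02, 52.15, 52.41, 48.66],
    "M":   [61.55, 60.61, 59.72, 58.94, 61.38],
}

def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)

# Row-wise average: one dataset, all five classifiers.
row_avg = {ds: round(mean(vals), 2) for ds, vals in unigram.items()}

# Column average of the feature: one classifier, all six datasets.
col_avg = {
    clf: round(mean([vals[i] for vals in unigram.values()]), 2)
    for i, clf in enumerate(CLASSIFIERS)
}

# Overall average: all thirty unigram accuracy values.
overall = round(mean([a for vals in unigram.values() for a in vals]), 2)

print(row_avg["A"], col_avg["LR"], overall)
```

For these inputs the row-wise average for All Data is 63.53, the column average for LR is 68.88 and the overall unigram average is 63.27.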

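Table 9. Comparison of English and Roman Urdu movie review results (of Table 6)

| Feature | English NB (%) | English SVM (%) | Roman Urdu NB (%) | Roman Urdu SVM (%) |
| --- | --- | --- | --- | --- |
| Unigram | 78.7 | 72.8 | 75.95 | 59.29 |
| Bigram | 77.3 | 77.1 | 65.5 | 59.29 |
| Uni-Bigram | 80.6 | 82.7 | 76.56 | 59.29 |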

From the perspective of domain-specific results, it can be seen that the best results come from the drama and movie domains. A possible reason is that drama and movies are interrelated, sharing characteristics such as scripts, directors and actors, and the reviews about these domains contain a relatively limited set of sentiment-bearing words (like achi (good), achee (good), aala (excellent) for positive sentiment, and bakwas (rubbish), fazool (poor), burey (bad), etc. for negative sentiment). The results for the Politics and Misc domains are relatively poor, probably because in politics people discuss a variety of issues related to politicians, policies, governments, political parties, different countries and so on. In the Misc domain, the reviews are people's opinions on questions like “talba per Facebook ka asar” (effect of Facebook on students), “NATO supply ki bandish – sahi ya ghalat faisala?” (stoppage of NATO supply: correct or wrong decision?), “Pakistan ki cricket team” (Pakistan’s cricket team), etc. As a result, the Politics and Misc domains contain a wide variety of sentiment-bearing words, which causes problems when training the machine learning algorithms and contributes to the poor results.

We also compare the results for English movie reviews with those for our movie review data for two algorithms, i.e., NB and SVM [6] (as shown in Table 9). The dataset used for English consists of 1400 movie reviews, while our data consists of 167 movie reviews. Although the results of NB for Roman Urdu remained close to those for English, overall the results for Roman Urdu were poorer. There are two possible reasons for the poorer performance of NB and SVM on our data. First, our dataset has fewer movie reviews than the English dataset. Second, and most importantly, Roman Urdu is complex. The complexity arises because Roman Urdu is the Urdu language written in Latin script, with spellings chosen by each user. Further, due to variations in the pronunciation of Urdu words in different regions, people tend to type words as they pronounce them, which produces several spelling variations of a single word, e.g., Who bhot accha ha, Wo bohat acha hai, Woh bhoat acha hy (he is very good). In addition, Urdu, and hence Roman Urdu, is a morphologically rich language, e.g., wo bohat acha hai (he is very good), wo bohat achi hai (she is very good), wo booth achay hain (they are very good). Thus acha, achi and achay all correspond to one English word, i.e., good. This problem adversely affects the learning of the machine learning algorithms and consequently affects their results.


9 Conclusion and Future Work

In this work, we presented a Roman Urdu sentiment analysis system using three different features, i.e., unigram, bigram and uni-bigram, and five different classifiers, namely NB, DT, SVM, LR and KNN. A corpus of 779 reviews was also developed. Experiments on six types of dataset were conducted by repeatedly dividing the data into 60% training and 40% testing sets; average accuracies were then computed from thirty iterations. Experiments were conducted for two phases separately, i.e., before feature reduction (Phase I) and after feature reduction (Phase II). The results of both phases were compared, and the phase with better results was selected for further discussion and detailed analysis. The results of the experiments showed that in both phases, NB and LR performed better on this task than the other classifiers.

In future, we will analyze the impact of a stop-word list (corpus-specific (automatic) and general (manual/automatic)) on the results. Further, standardization of Roman Urdu data is still an open and hard problem. Efforts can, however, be made to devise a mechanism to automatically normalize Roman Urdu data and then to perform experiments on the normalized data to assess its effect on accuracies.

References

1. Wan, X.: Using bilingual knowledge and ensemble techniques for unsupervised Chinese sentiment analysis. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 553–561. Association for Computational Linguistics (2008)
2. Pak, A., Paroubek, P.: Twitter as a corpus for sentiment analysis and opinion mining. In: LREC, vol. 10, no. 2010 (2010)
3. Simons, G.F., Fennig, C.D. (eds.): Ethnologue: Languages of the World, Twentieth edition. SIL International, Dallas (2017). http://www.ethnologue.com
4. Feldman, R.: Techniques and applications for sentiment analysis. Commun. ACM 56(4), 82–89 (2013)
5. Tatemura, J.: Virtual reviewers for collaborative exploration of movie reviews. In: Proceedings of the 5th International Conference on Intelligent User Interfaces, pp. 272–275. ACM (2000)
6. Pang, B., Lee, L., Vaithyanathan, S.: Thumbs up? Sentiment classification using machine learning techniques. In: Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing, vol. 10, pp. 79–86. Association for Computational Linguistics (2002)
7. Turney, P.D.: Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pp. 417–424. Association for Computational Linguistics (2002)
8. Taboada, M., Brooke, J., Tofiloski, M., Voll, K., Stede, M.: Lexicon-based methods for sentiment analysis. Comput. Linguist. 37(2), 267–307 (2011)
9. Alessia, D., Ferri, F., Grifoni, P., Guzzo, T.: Approaches, tools and applications for sentiment analysis implementation. Int. J. Comput. Appl. 125(3) (2015)
10. Medhat, W., Hassan, A., Korashy, H.: Sentiment analysis algorithms and applications: a survey. Ain Shams Eng. J. 5(4), 1093–1113 (2014)


11. Yessenalina, A., Yue, Y., Cardie, C.: Multi-level structured models for document-level sentiment classification. In: Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pp. 1046–1056. Association for Computational Linguistics (2010)
12. Moraes, R., Valiati, J.F., Neto, W.P.G.: Document-level sentiment classification: an empirical comparison between SVM and ANN. Expert Syst. Appl. 40(2), 621–633 (2013)
13. Zhang, C., Zeng, D., Li, J., Wang, F.Y., Zuo, W.: Sentiment analysis of Chinese documents: from sentence to document level. J. Assoc. Inf. Sci. Technol. 60(12), 2474–2487 (2009)
14. Abbasi, A., Chen, H., Salem, A.: Sentiment analysis in multiple languages: feature selection for opinion classification in web forums. ACM Trans. Inf. Syst. (TOIS) 26(3), 12 (2008)
15. Singh, V.K., Piryani, R., Uddin, A., Waila, P.: Sentiment analysis of movie reviews: a new feature-based heuristic for aspect-level sentiment classification. In: 2013 International MultiConference on Automation, Computing, Communication, Control and Compressed Sensing (iMac4s), pp. 712–717. IEEE (2013)
16. Socher, R., Perelygin, A., Wu, J.Y., Chuang, J., Manning, C.D., Ng, A.Y., Potts, C.: Recursive deep models for semantic compositionality over a sentiment treebank. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1631–1642 (2013)
17. Wilson, T., Wiebe, J., Hoffmann, P.: Recognizing contextual polarity in phrase-level sentiment analysis. In: Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, pp. 347–354. Association for Computational Linguistics (2005)
18. Agarwal, A., Xie, B., Vovsha, I., Rambow, O., Passonneau, R.: Sentiment analysis of Twitter data. In: Proceedings of the Workshop on Languages in Social Media, pp. 30–38. Association for Computational Linguistics (2011)
19. Xu, T., Peng, Q., Cheng, Y.: Identifying the semantic orientation of terms using S-HAL for sentiment analysis. Knowl. Based Syst. 35, 279–289 (2012)
20. Yu, L.C., Wu, J.L., Chang, P.C., Chu, H.S.: Using a contextual entropy model to expand emotion words and their intensity for the sentiment classification of stock market news. Knowl. Based Syst. 41, 89–97 (2013)
21. Hagenau, M., Liebmann, M., Neumann, D.: Automated news reading: stock price prediction based on financial news using context-capturing features. Decis. Support Syst. 55(3), 685–697 (2013)
22. Maks, I., Vossen, P.: A lexicon model for deep sentiment analysis and opinion mining applications. Decis. Support Syst. 53(4), 680–688 (2012)
23. Malik, M.K.: Urdu named entity recognition and classification system using artificial neural network. ACM Trans. Asian Low-Resour. Lang. Inf. Process. (TALLIP) 17(1), 2 (2017)
24. Malik, M.K., Sarwar, S.M.: Urdu named entity recognition system using hidden Markov model. Pak. J. Eng. Appl. Sci. (2017)
25. Malik, M.K., Sarwar, S.M.: Named entity recognition system for postpositional languages: Urdu as a case study. Int. J. Adv. Comput. Sci. Appl. 7(10), 141–147 (2016)
26. Usman, M., Shafique, Z., Ayub, S., Malik, K.: Urdu text classification using majority voting. Int. J. Adv. Comput. Sci. Appl. 7(8), 265–273 (2016)
27. Ali, A., Hussain, A., Malik, M.K.: Model for English-Urdu statistical machine translation. World Appl. Sci. 24, 1362–1367 (2013)
28. Shahzadi, S., Fatima, B., Malik, K., Sarwar, S.M.: Urdu word prediction system for mobile phones. World Appl. Sci. J. 22(1), 113–120 (2013)


29. Karamat, N., Malik, K., Hussain, S.: Improving generation in machine translation by separating syntactic and morphological processes. In: Frontiers of Information Technology (FIT), pp. 195–200. IEEE (2011)
30. Siddiq, S., Hussain, S., Ali, A., Malik, K., Ali, W.: Urdu noun phrase chunking-hybrid approach. In: 2010 International Conference on Asian Language Processing (IALP), pp. 69–72. IEEE (2010)
31. Malik, M.K., Ali, A., Siddiq, S.: Behavior of Word ‘kaa’ in Urdu language. In: 2010 International Conference on Asian Language Processing (IALP), pp. 23–26. IEEE (2010)
32. Ali, W., Malik, M.K., Hussain, S., Siddiq, S., Ali, A.: Urdu noun phrase chunking: HMM based approach. In: 2010 International Conference on Educational and Information Technology (ICEIT), vol. 2, pp. V2-494. IEEE (2010)
33. Ali, A., Siddiq, S., Malik, M.K.: Development of parallel corpus and English to Urdu statistical machine translation. Int. J. Eng. Technol. IJET-IJENS 10, 31–33 (2010)
34. Malik, K., Ahmed, T., Sulger, S., Bögel, T., Gulzar, A., Raza, G., Hussain, S., Butt, M.: Transliterating Urdu for a broad-coverage Urdu/Hindi LFG grammar. In: Seventh International Conference on Language Resources and Evaluation, LREC 2010, pp. 2921–2927 (2010)
35. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)

User Centric Mobile Based Decision-Making System Using Natural Language Processing (NLP) and Aspect Based Opinion Mining (ABOM) Techniques for Restaurant Selection

Chirath Kumarasiri1(✉) and Cassim Farook2

1 Department of Computing, Informatics Institute of Technology, University of Westminster, Colombo, Sri Lanka
[email protected]
2 Department of Computing, Informatics Institute of Technology, Colombo, Sri Lanka
[email protected]

Abstract. Opinions about restaurants from the diners who visit them are essential to the restaurant selection process. Crowd-sourced platforms connect diners and restaurants through opinions. These platforms offer straightforward rating-based options; however, they do not address the confusion diners face in differentiating restaurants, as a rating means different things to different people. Diners resort to reading reviews as a solution, since reviews have a human touch that calculated ratings lack. Nevertheless, factors such as the long time needed to read reviews, discrepancies between reviews, non-localized platforms, biased opinions and information overload hinder diners in picking a restaurant. This research proposes a mobile-based intelligent system that combines Natural Language Processing (NLP) and Aspect Based Opinion Mining (ABOM) techniques to assist user-centric decision making in restaurant selection. A third-party public data provider was used as the data source. The research automates the reading of reviews and processes each of them using a custom algorithm. Initially, the paper focuses on a Part-of-Speech (POS) tagger based NLP technique for aspect identification from reviews. Then a Naïve Bayes (NB) classifier is used to classify the identified aspects into meaningful categories. Finally, NB-based supervised and lexicon-based unsupervised sentiment allocation techniques are used to find the opinion from the categorized aspects. Decision-making accuracies and variations were measured over three main areas: aspect extraction, aspect classification and sentiment allocation. Overall results are at an acceptable level; however, the unsupervised method always outperformed the supervised method during sentiment allocation. It was shown that NLP and ABOM techniques can be used for the restaurant domain.

Keywords: Aspect based opinion mining · Crowd sourced social platforms · Intelligent systems · Mobile applications · Natural language processing · Restaurant reviews · Sentiment analysis

© Springer Nature Switzerland AG 2019 K. Arai et al. (Eds.): SAI 2018, AISC 858, pp. 43–56, 2019. https://doi.org/10.1007/978-3-030-01174-1_4


1 Introduction

In day-to-day life, a common question is “Where shall we go for dinner tonight?”, which is passed wearily back and forth until a particular choice is made. We all have our favourite places for dinner, lunch and coffee, as eating out is no longer a treat; it is a way of life. Nevertheless, finding the perfect place while considering everyone’s opinion is always a hassle, and often it happens at the last minute. Options such as word-of-mouth recommendations and extensive Google searches on restaurants have become less preferred in this situation, as people constantly want to try new experiences given the competition and variety in the modern restaurant industry. The restaurant buying guide of Consumer Reports 2016 suggests three main issues that matter when selecting a restaurant in the US: adventurous eaters, cuisine-conscious eaters and takeout-preferring eaters [1]. Social media platforms have tried to address these issues, thereby becoming an indispensable part of our daily lives. Everyone wants to know the good and bad aspects of a restaurant before going there. Therefore, people search social websites and apps to read previous visitors’ opinions before they make a choice. According to the 2016 Restaurant Industry Forecast, a quarter of consumers say technology options are important factors, and 70% of smartphone users view restaurant menus on their phones [2]. The enormous success of giants such as Foursquare [3], TripAdvisor [4] and Yelp [5] clearly shows how the restaurant selection process has shifted towards user-centric mobile apps with the rapid expansion of smartphones during the past few years. Evaluating reviews has become as important as checking the ratings of a restaurant, since reviews capture more human aspects than impersonal ratings. For instance, one person’s “three stars” is not necessarily another person’s three-star rating.

Research firm YouGov found that 88% of consumers in the US trust reviews as much as they do personal recommendations made by friends and family [6]. However, due to the large number of opinions and reviews available in these commercial applications, people are often overwhelmed with information. Diners therefore find it extremely difficult to obtain useful opinions when deciding about menus, accommodation, restaurants, budget and attractions. Eventually people end up having an unpleasant experience even after spending their precious time on the restaurant selection process. Considering all of the above, this research paper aims to eliminate the stressful review reading and wasted time in the daily restaurant selection process. The proposed solution automates the understanding of reviews on behalf of the user and can decide the best restaurant options for the user in real time, based only on the user’s preferences.

First, the paper discusses mainstream and widely researched NLP, ABOM and sentiment analysis techniques relevant to the problem domain. Then, it justifies the selection of a reliable restaurant review data source to solve the problem. Afterwards, the paper critically analyses each component of the proposed framework. Later, the paper describes the experiments the proposed prototype went through, together with their results. The limitations of the prototype concept and future enhancements are discussed at the end of the paper.


2 Background and Related Work 2.1

Aspect Based Opinion Mining of Reviews

Opinion mining is the analysis of people's opinions, sentiments, evaluations, appraisals, attitudes and emotions, which express or imply positive or negative statements towards entities such as products, services, issues, events and topics, and their attributes [7]. Opinion mining can be performed at three main levels: (1) document level, which classifies the sentiment of a whole opinion document as negative or positive [8]; (2) sentence level, which classifies the sentiment of a whole opinion sentence as negative or positive [9]; and (3) aspect level. Since neither the document level nor the sentence level can identify exactly what people liked or did not like, the aspect level classifies the negative or positive sentiment of an aspect/feature directly [10]. Wiebe et al. [9] explain the challenges in sentiment classification posed by objective versus subjective sentences, and further explain how sentiment neutrality (no opinion) can occur. The difference between regular opinions (which describe an aspect of an entity) and comparative opinions (which compare the same aspect of different entities) is another barrier that makes opinion mining more challenging, especially at document and sentence level [11]. Aspect based opinion mining, however, can assign an overall sentiment to a referred entity (restaurant) through the sentiments associated with the aspects (food type or restaurant characteristic) of the entity being discussed [12]. Kim et al. [13] cite this as the most common opinion mining type, and its ability to segment aspects after extracting them from opinions has earned it researchers' preference over other methods. In other, non-aspect based opinion mining methodologies, opinion data must be pre-segmented for the category of an identified aspect to be known.

2.2 Aspect Identification Techniques

As the initial step of the aspect based opinion mining methodology, the salient topics discussed in the opinion text must be identified through an appropriate aspect identification technique. NLP techniques have become quite effective in feature discovery, since they commonly involve POS tagging and syntax tree parsing [10]. Most modern taggers and parsers are the product of extensive research, which has helped to produce higher accuracy in feature extraction. However, NLP based techniques have shortcomings, such as the speed of parsing or tagging during large scale processing and the failure to discover all features when a domain ontology or knowledge model is missing [13]. Mining techniques, on the other hand, can also be used for feature discovery, since they can compensate for the shortcomings of NLP techniques, such as the inability to discover every feature in the opinion text [10]. These techniques are mostly used in document and paragraph level opinion mining, because they can expand beyond known word types or phrases through support information and rule based approaches. The biggest drawback of mining techniques is that they may behave differently across domains, so a dedicated methodology is needed to discover the aspects of each domain [13].

C. Kumarasiri and C. Farook

2.3 Sentiment Classification Techniques

After the aspect discovery phase, sentiment classification of the identified aspects begins as the second step of the selected aspect based opinion mining methodology. A technique must be selected to assign an opinion or sentiment orientation (positive, negative or neutral) to each identified feature/aspect. Learning-based (supervised) sentiment prediction is a common sentiment classification technique in which a machine learning model is trained on the characteristics of the words around the targeted text to detect the actual sentiment of raw text. Obtaining an accurate sample data set with labelled aspect and sentiment examples is a potential problem with supervised techniques [14]. Although the classifier can identify a sentiment for each aspect, its accuracy is always limited by the quality of the training data set. Additionally, as in other model driven identification tasks, models must be trained exclusively for each domain, since a general domain model is not accurate enough. Lexicon/rule-based sentiment prediction is the other popular sentiment classification technique, in which a sentiment is allocated to each identified aspect through a sentiment dictionary that holds lists of positive and negative words. Its simplicity in domains that express opinion explicitly, such as product reviews, has made this lexicon based technique more popular than other methods. However, in more complex domains such as movie reviews, where people deal with emotions and other complex expressions, this technique tends to perform worse [13]. Furthermore, performance always depends on the dictionary being used, and sentiment detection is limited to the listed words. To expand beyond them, either the dictionary must be updated frequently or separate dictionaries must be used for different components.

2.4 Aspect Classification Techniques

Identified aspects must be classified according to their characteristics, since allocating sentiments to anonymous extracted aspects would be meaningless. Pontiki et al. [15] explain the importance of the aspect category and its opinion target expression, as these are useful for representing the outputs of realistic software applications. They propose a new unified aspect based sentiment analysis framework, with benchmark datasets containing reviews manually annotated for aspect categories and opinions in the restaurant review domain. The framework covers six main general aspect categories that people are interested in within the restaurant domain: ambience, drinks, food, location, restaurant and service. Additionally, with regard to industry standards in the restaurant domain, the AAA Diamond Rating Guidelines for restaurants are one of the most widely known rating scales among professionals [16], and the AAA guidelines also name the quality of food, service and ambience as the basis of their rating scale [17]. These observations further support the importance of aspect classification in restaurant reviews.

Schouten and Frasincar [12] have conducted an in-depth survey of aspect level sentiment analysis, specifically discussing sub-problems encountered in actual solutions, such as this aspect classification issue. They stress the importance of dealing with implicit and explicit aspect entities and sentiments, and show how each aspect can be linked with relevant characteristics. Moreover, they recommend an approach based on joint aspect detection and sentiment analysis methods, in which a single supervised, unsupervised or other approach is used for both the aspect detection and the sentiment allocation components. According to the findings of Schouten and Frasincar [12], supervised machine learning is the most popular classification technique among domain experts, since it does not need any direct relation between aspect and category/sentiment to conduct classification. The survey analyses techniques such as CRFs, HMMs, Naïve Bayes classifiers, SVMs and Maximum Entropy, and the Naïve Bayes classifier emerges as the most used supervised technique in past research. Furthermore, they recommend this supervised technique for non-complex domains such as product and restaurant opinion mining, owing to its practicality and effectiveness compared with other techniques. A drawback of supervised techniques, however, is that an accurate, domain related training data set is always essential. Unsupervised machine learning techniques such as decision trees, ANNs, LDA and PLSI models are also discussed in the survey [12], which notes their fair ability to obtain domain specific results. These techniques have accuracy and performance levels similar to those of supervised techniques, but overfitting and the training time required by these model based approaches are potential drawbacks. The other joint aspect detection and sentiment analysis options mentioned in the survey, such as syntax-based and hybrid approaches, remained at a theoretical level, as the suggested models would first have to be built before they could be used in the proposed solution.

3 Restaurant Review Data Sources

With the aim of building a mobile based system that processes actual diner opinions found in Location Based Social Networks (LBSN), priority was given to investigating crowd sourced social restaurant reviewing platforms that offer an API, as this is the most sensible way to aggregate and analyse restaurant reviews in real time. Platforms such as Yelp [18], TripAdvisor [19], Zomato [20] and Foursquare [21] were considered, and Foursquare was selected for the initial prototype because it has several advantages over the other platforms, such as its micro review style, trustworthiness and priority towards the restaurant category. The comprehensive survey of Chen et al. [22] further supports these advantages of the Foursquare data source, since the study depicts the diversity and richness of Foursquare tips through tips published by 6.52 million randomly selected reviewers. The study also highlights the micro review style of Foursquare tips, whose informal and straightforward manner of describing venue aspects has encouraged users to build one of the largest crowd sourced location review databases. With the developer support of the Foursquare API, relevant review data can be filtered in real time based on the user's preferences. For instance, if the user is looking for a specific cuisine or dish, the API can deliver reviews related to that preference, including all essential data such as the date and time of creation [23]. This increases the efficiency and accuracy of mining, since only reviews relevant to the user's preference are processed, rather than irrelevant and general data.

Among previous studies related to Foursquare, Scellato et al. [24] studied the socio-spatial properties of LBSN users and found that users exhibit friendship connections across a wide range of geographic distances, with variability in their own social triads. Frith [25] explores the social practices of LBSN users, taking Foursquare as an example, and shows how people establish social norms using physical location. These studies help in understanding core areas of Foursquare such as its social structure and check-in features. Moraes et al. [26] conducted an empirical study of Foursquare tip polarity using four different methods (Naïve Bayes, SVM, Maximum Entropy and an unsupervised lexicon based technique), which they tested on two tip data sets: one manually labelled by volunteers and the other automatically labelled based on emoticons. Their results further support the suitability of sentiment prediction techniques such as Naïve Bayes, SVM and lexicon based methods for Foursquare reviews, since those methods show the best overall accuracies. They also test a hybrid approach combining all four methods, which does not show notably better overall results than the individual techniques. However, their study is limited to testing polarity on two sample data sets with the selected polarity detection methods. They suggest opinion summarization and venue recommendation, using more data sets and a combination of multiple methods, as future enhancements, and the proposed solution aims to achieve these in part.

4 Proposed Framework

4.1 API Handler Component

During a search, the prototype first gathers relevant restaurant data from the API, based on the user's preferred cuisine, specific meals and other inputs. Customized GET requests carrying the latitude, longitude and search query are sent to two API endpoints: one to aggregate venue data and one to aggregate tip data for each aggregated venue [23]. In response to these GET requests, the API sends JSON responses with the relevant venue and tip information. The system deserializes the venue and tip data from the JSON dictionaries and saves them as Venue and Tip objects in the database.
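As a minimal sketch of the request and deserialization flow described above, the Python fragment below builds GET URLs from a latitude, longitude and query, and turns JSON payloads into Venue and Tip objects. The base URL, endpoint paths, parameter names and JSON field names are placeholders assumed for illustration; they are not taken from the paper or from the Foursquare documentation, and the payloads are hand-written samples rather than real API output.

```python
import json
from dataclasses import dataclass, field
from urllib.parse import urlencode

# Hypothetical base URL; the real endpoints are defined by the API provider.
BASE_URL = "https://api.example.com"

@dataclass
class Tip:
    tip_id: str
    text: str
    created_at: int  # creation time as a Unix timestamp (assumed field)

@dataclass
class Venue:
    venue_id: str
    name: str
    tips: list = field(default_factory=list)

def build_venue_search_url(lat: float, lng: float, query: str) -> str:
    """Build the GET URL for the (assumed) venue search endpoint."""
    params = urlencode({"ll": f"{lat},{lng}", "query": query})
    return f"{BASE_URL}/venues/search?{params}"

def build_tips_url(venue_id: str) -> str:
    """Build the GET URL for the (assumed) per-venue tips endpoint."""
    return f"{BASE_URL}/venues/{venue_id}/tips"

def parse_venues(payload: str) -> list:
    """Deserialize a venue search response into Venue objects."""
    data = json.loads(payload)
    return [Venue(v["id"], v["name"]) for v in data["venues"]]

def parse_tips(payload: str) -> list:
    """Deserialize a tips response into Tip objects."""
    data = json.loads(payload)
    return [Tip(t["id"], t["text"], t["createdAt"]) for t in data["tips"]]

# Hand-written sample payloads standing in for real API responses.
venue_json = '{"venues": [{"id": "v1", "name": "Thai Corner"}]}'
tips_json = '{"tips": [{"id": "t1", "text": "Good food", "createdAt": 1493596800}]}'

venues = parse_venues(venue_json)
venues[0].tips = parse_tips(tips_json)
search_url = build_venue_search_url(6.8652715, 79.8598505, "thai")
```

In a real client the two URLs would be fetched over HTTP and the resulting objects persisted; both steps are left out here so that the sketch stays independent of any network or database layer.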

4.2 Tip Processor Component

With the completion of the data aggregation phase, the system starts to process the gathered reviews to identify aspects and trends in the tips. Tip processing follows the chronological steps below.

4.2.1 Tip Normalization

(a) Removing Unicode characters in text
(b) Removing accents and diacritics in text
(c) Removing combining marks in text
(d) Tokenizing text into words
(e) Creating normalized text by joining tokens
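The normalization steps (a) to (e) could be sketched in Python as follows. This is only an illustration under stated assumptions: step (a) is read here as dropping characters that remain non-ASCII once accents have been stripped, the order of the steps is chosen for convenience, and punctuation is discarded during tokenization. The function name `normalize_tip` is hypothetical.

```python
import re
import unicodedata

def normalize_tip(raw: str) -> str:
    """Normalize a crowd sourced tip following steps (a) to (e)."""
    # (b), (c): decompose accented characters, then drop the combining marks
    decomposed = unicodedata.normalize("NFKD", raw)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # (a): drop remaining non-ASCII characters such as emoji (assumed meaning)
    ascii_only = no_marks.encode("ascii", "ignore").decode("ascii")
    # (d): tokenize into words, keeping apostrophes inside words
    tokens = re.findall(r"[A-Za-z0-9']+", ascii_only)
    # (e): rebuild the normalized text by joining the tokens
    return " ".join(tokens)

normalized = normalize_tip("Crème brûlée was amazing 😋!!")
```

Decomposing with NFKD before filtering means that an accented letter keeps its base character instead of being deleted outright, which matters for dish names borrowed from other languages.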


Each raw tip first goes through the normalization process, since the program deals with crowd sourced text. Once the normalized tip is created, it is saved in the database.

4.2.2 Tip Processing Using POS Tagger

The normalized tip then goes through the POS tagger, where aspect extraction happens. A new opinion mining algorithm was implemented on top of the POS tagger to identify each aspect together with its trend in the tip. Initially, each normalized tip was sent through the tagger, which returned each token with its tag. This initial approach had shortcomings, however, since the tagger could not distinctly identify dish names in tips. For instance, although “Spicy Tom Yum” and “Sweet Corn Soup” are whole dish names, the tagger processes them as separate English words. To overcome this issue, tips must be processed using a training data set consisting of general dish names used in restaurants. Different data sets of dish names were created by web scraping sites such as Yamu [27] and Foursquare [21]. The initial prototype includes 10 custom-made dish name data sets, as it supports 10 different cuisines. Once the tips were processed through the dish name data sets, the discrepancies were reduced significantly. Subsequently, the trained tip strings are processed, and the prototype identifies adjectives (trends) together with the related noun, place name or other name (aspect) when the trend comes before or after the aspect. Since the method handles all tokens of the tip through a dictionary, it can even identify multiple trends of a specific aspect.
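The dish name handling and the two trend/aspect patterns can be illustrated with a small Python sketch. It is not the prototype's algorithm: the toy word lists stand in for the output of a real POS tagger, the dish names are a hypothetical two-item list, and all function names are invented for this example.

```python
from collections import defaultdict

# Hypothetical dish names; the prototype scraped its lists from restaurant sites.
DISH_NAMES = ["spicy tom yum", "sweet corn soup"]
# Toy word lists standing in for a real POS tagger's output.
ADJECTIVES = {"good", "superb", "tasty", "slow", "friendly"}
NOUNS = {"food", "service", "staff", "place"}
LINKING_VERBS = {"is", "was", "are", "were"}

def merge_dish_names(tokens):
    """Join consecutive tokens that form a known dish name into one token."""
    dishes = sorted((d.split() for d in DISH_NAMES), key=len, reverse=True)
    merged, i = [], 0
    while i < len(tokens):
        for words in dishes:
            if tokens[i:i + len(words)] == words:
                merged.append(" ".join(words))
                i += len(words)
                break
        else:
            # No dish name starts at position i, so keep the single token
            merged.append(tokens[i])
            i += 1
    return merged

def tag(token):
    """Return a coarse tag; multi-word tokens are treated as dish nouns."""
    if " " in token or token in NOUNS:
        return "Noun"
    if token in ADJECTIVES:
        return "Adjective"
    return "Other"

def extract_aspects(normalized_tip):
    """Map each aspect to the trends found directly before or after it."""
    tokens = merge_dish_names(normalized_tip.lower().split())
    tags = [tag(t) for t in tokens]
    found = defaultdict(list)
    for i, token in enumerate(tokens):
        if tags[i] != "Noun":
            continue
        # Pattern 1: trend first, aspect second ("good food")
        if i > 0 and tags[i - 1] == "Adjective":
            found[token].append(tokens[i - 1])
        # Pattern 2: aspect first, trend second ("food is superb")
        if (i + 2 < len(tokens) and tokens[i + 1] in LINKING_VERBS
                and tags[i + 2] == "Adjective"):
            found[token].append(tokens[i + 2])
    return dict(found)

result = extract_aspects("Good food and the Spicy Tom Yum was superb")
```

Merging dish names before tagging is what keeps "Spicy Tom Yum" from being split into an adjective and two unrelated nouns, and collecting trends in a list per aspect lets one aspect carry several trends.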

4.3 Aspect Classifier Component

4.3.1 Training Classification Model

Two Naïve Bayes classification models were created: one to detect the aspect category (entity type) of identified aspects, and one to detect their polarity category (opinion type) during supervised calculation. Both classifier models are trained on a training data set compiled by SemEval [28]. This data set consists of restaurant reviews annotated with their opinion (such as a Positive or Negative state) and the specific entity type they focus on, such as Ambience, Food, Service and Restaurant.

4.3.2 Detecting Category Using Classification Model

During the tip processing phase, aspects are classified into three main categories (user, cuisine and general) based on keyword searching. If the identified aspect is a dish aspect in the user's preferences, it is flagged as a user aspect, and if it is a cuisine aspect in the user's preferences, it is flagged as a cuisine aspect. All remaining aspects are marked as general aspects, which then go through the built classifier model. The classifier identifies four main categories in the flagged general aspects, and the identified aspects appear as in Fig. 1 after overall categorization.


Fig. 1. Identiﬁed aspects after overall aspect classiﬁcation.
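To make the two-stage categorization of Sect. 4.3 concrete, the sketch below pairs keyword flagging with a small multinomial Naïve Bayes classifier using add-one smoothing. It is an illustration only: the prototype used an existing toolkit, whereas this class is written from scratch, the training examples are hand-made rather than SemEval data, and names such as `flag_aspect` are hypothetical.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter()            # label -> number of examples
        self.vocabulary = set()

    def train(self, text, label):
        words = text.lower().split()
        self.word_counts[label].update(words)
        self.label_counts[label] += 1
        self.vocabulary.update(words)

    def classify(self, text):
        words = text.lower().split()
        total_examples = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_examples in self.label_counts.items():
            # log prior plus the sum of smoothed log likelihoods
            score = math.log(n_examples / total_examples)
            total_words = sum(self.word_counts[label].values())
            for word in words:
                count = self.word_counts[label][word] + 1
                score += math.log(count / (total_words + len(self.vocabulary)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

def flag_aspect(aspect, user_dishes, cuisine_dishes, classifier):
    """Keyword flagging first; remaining general aspects go to the classifier."""
    if aspect in user_dishes:
        return "user"
    if aspect in cuisine_dishes:
        return "cuisine"
    return classifier.classify(aspect)

# Hand-made toy examples, not the SemEval data.
category_model = NaiveBayes()
for text, label in [
    ("food dish taste menu", "Food"),
    ("service staff waiter", "Service"),
    ("ambience music decor", "Ambience"),
    ("restaurant place prices", "Restaurant"),
]:
    category_model.train(text, label)

user_dishes = {"spicy tom yum"}
cuisine_dishes = {"pad thai", "green curry"}
labels = [flag_aspect(a, user_dishes, cuisine_dishes, category_model)
          for a in ["spicy tom yum", "green curry", "staff", "music"]]
```

A second instance of the same class, trained on positive and negative examples instead of category labels, would play the role of the polarity model used in the supervised calculation.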

4.4 Sentiment Allocation Component

4.4.1 Unsupervised Calculation

After the tip texts have been segmented into a bag-of-words model consisting of aspects with their trends, the first phase of the prototype uses an unsupervised method to extract the opinion. Each identified trend is considered, and the system gives each word a subjectivity score by referring to a sentiment lexicon. The lexicon used here is AFINN, a list of English words rated for valence with an integer between minus five (negative) and plus five (positive) [29]. Furthermore, the system calculates the cumulative score when an aspect has multiple trends.

4.4.2 Supervised Calculation

As the second sentiment allocation option, the trained sentiment classification model (the second model) from Sect. 4.3 is used. Each identified aspect, with its trend, goes through the trained classifier to detect the pertinent sentiment, as in the process of Sect. 4.3. The classifier is trained to assign +1 if the identified aspect is positive and −1 if it is negative or in any other state. Based on these ground level aspect sentiments, aggregated sentiments for each tip and venue are calculated, which are then used in presenting results to the user (Fig. 2).

Fig. 2. Aspect with its trend & weight (Unsupervised-left, Supervised-right).
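The two scoring options and the aggregation step can be sketched in Python as below. Several things here are assumptions rather than details from the paper: the lexicon entries are illustrative values in AFINN's −5 to +5 range and are not copied from the lexicon, unknown words are scored 0 (neutral), and tip and venue scores are aggregated by plain summation.

```python
# Illustrative valence scores in AFINN's -5..+5 range; not copied from the lexicon.
LEXICON = {"good": 3, "superb": 5, "tasty": 2, "slow": -2, "terrible": -3}

def unsupervised_score(trends):
    """Sum the lexicon scores of all trends; unknown words count as 0 (neutral)."""
    return sum(LEXICON.get(t.lower(), 0) for t in trends)

def supervised_score(predicted_label):
    """Map a classifier label to +1 for positive and -1 for anything else."""
    return 1 if predicted_label == "positive" else -1

def tip_score(aspects):
    """Aggregate aspect level scores into one tip level score (sum assumed)."""
    return sum(unsupervised_score(trends) for trends in aspects.values())

def venue_score(tips):
    """Aggregate tip level scores into one venue level score (sum assumed)."""
    return sum(tip_score(aspects) for aspects in tips)

# Aspect -> trends mappings as produced by the tip processing step.
tip_a = {"food": ["good", "tasty"], "service": ["slow"]}
tip_b = {"spicy tom yum": ["superb"], "decor": ["quirky"]}
scores = (tip_score(tip_a), tip_score(tip_b), venue_score([tip_a, tip_b]))
```

The trend "quirky" is absent from the toy lexicon and therefore contributes nothing, which mirrors the way out-of-lexicon words end up as neutral in the unsupervised method.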


5 Experiments

The initial prototype supports 6–10 cuisine types and can provide up to 20 venues (depending on what the API delivers), each with a sentiment score, for each restaurant search query. The search results vary with the user inputs, since a simple change in one user preference can give a different output. Because there are many possible combinations, it is not feasible to measure the accuracy and performance of every search query. To provide test results for the solution, a restaurant search query was therefore performed using a single cuisine (Thai) and a location with latitude 6.8652715 and longitude 79.8598505. This user preference represents valid and common user behaviour. For this query, 20 venues near the provided location that are popular for Thai food were aggregated from the API, along with 405 tips related to those venues. A sample of tips was then created, consisting of both the best and the worst of the gathered tips. The sample included a total of 20 tips, covering the top 10 venues in the search results with 2 good or bad tips per venue. These tips have the highest impact on the results, since the two tips selected per venue hold the highest level of aspect extraction for that venue.

5.1 Accuracy Testing

5.1.1 Overall Accuracy

Three main accuracies were calculated from the sample, as shown in Table 1, benchmarking the solution's competency against real world results.

Table 1. Overall accuracy of the solution based on sample

                                  Mean     Variance   Range of fluctuation
Aspect extraction accuracy        0.6818   0.0081     ±0.0900
Aspect classification accuracy    0.6471   0.0177     ±0.1330
Sentiment allocation accuracy     0.8824   0.0053     ±0.0730
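The paper does not define the range of fluctuation, but the reported values appear to coincide with the square root of the variance, i.e. one standard deviation. The short check below illustrates that reading; it is an inference from the numbers in Table 1, not a statement made by the authors.

```python
import math

# (variance, reported range of fluctuation) pairs copied from Table 1
rows = {
    "row 1": (0.0081, 0.0900),
    "row 2": (0.0177, 0.1330),
    "row 3": (0.0053, 0.0730),
}

# Difference between sqrt(variance) and the reported range, per row
differences = {}
for name, (variance, reported) in rows.items():
    std_dev = math.sqrt(variance)
    differences[name] = abs(std_dev - reported)
    print(f"{name}: sqrt(variance) = {std_dev:.4f}, reported ±{reported:.4f}")
```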

5.1.2 Supervised vs. Unsupervised Accuracy

As implemented, the sentiment allocation component of the solution can find the opinion in reviews using either the lexicon based unsupervised method or the Naïve Bayes classifier based supervised method. The precision of both methods was tested on the same sample as above to find the most accurate method for the solution. Confusion matrices were formed for both the unsupervised and supervised methods, and the key rates derived from them are given in Table 2.

Table 2. Sentiment accuracy (supervised vs. unsupervised)

Rate                                  Formula                                        Unsupervised method   Supervised method
Accuracy                              (TP + TN)/Total                                0.8706                0.7176
Misclassification rate (Error Rate)   (FP + FN)/Total                                0.1294                0.2824
True positive rate (Recall)           TP/Actual Yes                                  0.8514                0.8235
False positive rate                   FP/Actual No                                   0                     0.4412
Specificity                           TN/Actual No                                   1                     0.5588
Precision                             TP/Predicted Yes                               1                     0.7368
Prevalence                            Actual Yes/Total                               0.8706                0.6
F-measure                             (2*Precision*Recall)/(Precision + Recall)      0.9197                0.7777
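The formulas in Table 2 can be applied to confusion matrix counts as in the sketch below. The paper does not report the underlying counts; the values used here are back-calculated assumptions (85 aspects in total for each method) chosen because they reproduce the reported rates to within rounding, so they should be treated as a plausible reconstruction rather than the authors' data.

```python
def rates(tp, fp, fn, tn):
    """Compute the Table 2 rates from confusion matrix counts."""
    total = tp + fp + fn + tn
    actual_yes, actual_no = tp + fn, fp + tn
    precision = tp / (tp + fp)
    recall = tp / actual_yes
    return {
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,
        "recall": recall,
        "false_positive_rate": fp / actual_no,
        "specificity": tn / actual_no,
        "precision": precision,
        "prevalence": actual_yes / total,
        "f_measure": 2 * precision * recall / (precision + recall),
    }

# Back-calculated counts (assumption): chosen so the rates match Table 2.
unsupervised = rates(tp=63, fp=0, fn=11, tn=11)
supervised = rates(tp=42, fp=15, fn=9, tn=19)

for name, value in unsupervised.items():
    print(f"{name:>20}: {value:.4f} (unsupervised)  {supervised[name]:.4f} (supervised)")
```

With these counts the supervised F-measure comes out marginally above the reported 0.7777, which is consistent with the table value having been computed from already rounded precision and recall.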

The prevalence differs between the two methods, since they classify sentiments for aspects in different ways. However, the F-measure of the unsupervised method (0.9197) is closer to 1 (best accuracy) than that of the supervised method (0.7777). The unsupervised method was therefore selected as the default sentiment allocation method of the solution, since it clearly outperforms the supervised method on this sample.

5.2 Reliability Testing

5.2.1 Aspect Extraction Variations

As with accuracy testing, it is not feasible to measure reliability for every possible scenario of the solution. Sample search queries were therefore performed targeting four different cuisines, as this gives an acceptable scale of data to test the entire solution. Although the correctness of the identified aspects varies with the calculated accuracy, the solution can easily extract an adequate number of aspects from the gathered tips. Another significant fact is that cuisine aspects are scarce in most of the tips, as people tend to review a place using general aspects most of the time and the solution is limited to the dishes in the cuisine data sets. Therefore, unless the user specifically mentions dish names for the specific cuisine, the solution cannot identify many cuisine aspects (Fig. 3).

Fig. 3. Aspect extraction variations.


5.2.2 Aspect Classification Variations

Subsequently, reliability in general aspect classification was tested using the same search queries, by analysing the ambience, food, service and restaurant aspects found among the general aspects. The solution classifies ambience, food and service aspects tolerably well, with an even distribution of approximately 30% for each of these categories (Fig. 4).

Fig. 4. Aspect classiﬁcation variations.

5.2.3 Sentiment Allocation Variations

The solution identifies mostly positive and neutral trends, with comparatively few negative ones. Limitations in the lexicon used are one reason for the high number of neutral results, as user reviews contain various trends that cannot be found in the lexicon. Moreover, negativity is scarce in tips, as people tend to write about good experiences more frequently than bad ones (Fig. 5).

Fig. 5. Sentiment allocation variations (in default unsupervised method).


6 Conclusion

6.1 Proof of Concept

The prototype application was implemented as a proof of concept. Based on the availability and accessibility of hardware and software resources, an iOS mobile application targeting iOS 10/11 was built as the initial prototype, using various iOS development tools and techniques. NSLinguisticTagger [30] was used as the core NLP framework and POS tagger, and the Parsimmon linguistics toolkit [31] was selected as the Naïve Bayes classification framework of the solution. Nevertheless, all algorithms and methodologies were implemented with the intention of extending them to other platforms in future.

6.2 Limitations of the Solution

The scope of the project was limited to aspect level opinion mining techniques, where the solution focuses on good and bad aspects in reviews. The prototype depends on the Foursquare API in the data aggregation phase, so the venue and tip data per search may vary with what the API supplies. The tested prototype can only process English reviews, and its accuracy may vary with the grammar and spelling of both the aggregated tips and the aspects entered by the user. Furthermore, it processes each review sentence by sentence and can identify only two main extraction patterns in a sentence: trend first, aspect second (e.g. "Good food") and aspect first, trend second (e.g. "Food is superb"). The solution is limited to the dish names found in the data sets during the normalization process, and the user must enter a specific dish name to extend processing beyond them. The solution depends on the SemEval data set for aspect classification; it could be improved with a better data set or through a data mining approach using past search results. The unsupervised technique depends entirely on the AFINN data set and is limited to the words found in that sentiment lexicon. The supervised technique depends on the SemEval data set and can assign a sentiment without the vocabulary limitation of the unsupervised technique.

6.3 Future Enhancements

At present, the solution uses the Foursquare API as its sole review source. Utilizing other popular review sources such as TripAdvisor, Yelp and Zomato could make the solution more diversified. Users' tastes, locations and other aspects could be saved during a search, allowing the solution to process them and suggest appropriate restaurants. This could also bring considerable business value, as users' search data trends could be offered to nearby restaurants. Most importantly, to overcome language related discrepancies in aspect extraction, a new language model and frequency based aspect detection methodology could be implemented inside the tip processing module. Furthermore, the sentiment allocation module could be improved through human based reasoning techniques such as latent factors of reviews.


References

1. Consumer Reports, Restaurant Buying Guide, Consumer Reports (2017). http://www.consumerreports.org/cro/restaurants/buying-guide.html. Accessed 27 Apr 2017
2. April, G.: 10 Startling Restaurant Statistics in 2015. Blog.qsronline.com (2015). http://blog.qsronline.com/10-startling-restaurant-statistics-in-2015/. Accessed 27 Apr 2017
3. Sensor Tower, Foursquare City Guide: Restaurants & Bars Nearby - App Store revenue & download estimates - US, Sensor Tower (2017). https://sensortower.com/ios/us/foursquarelabs-inc/app/foursquare-city-guide-restaurants-bars-nearby/306934924/. Accessed 27 Apr 2017
4. Sensor Tower, TripAdvisor Hotels Flights Restaurants - App Store revenue & download estimates - US, Sensor Tower (2017). https://sensortower.com/ios/us/tripadvisor-llc/app/tripadvisor-hotels-flights-restaurants/284876795/. Accessed 27 Apr 2017
5. Sensor Tower, Yelp - Nearby Restaurants, Shopping & Services - App Store revenue & download estimates - US, Sensor Tower (2017). https://sensortower.com/ios/us/yelp/app/yelp-nearby-restaurants-shopping-services/284910350/. Accessed 27 Apr 2017
6. Gammon, J.: YouGov | Americans Rely on Online Reviews Despite Not Trusting Them, YouGov: What the world thinks (2017). https://today.yougov.com/news/2014/11/24/americans-rely-online-reviews-despite-not-trusting/. Accessed 27 Apr 2017
7. Liu, B.: Sentiment Analysis and Opinion Mining, 1st ed. Morgan & Claypool, San Francisco (2012)
8. Pang, B., Lee, L., Vaithyanathan, S.: Thumbs up? In: Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing - EMNLP 2002 (2002)
9. Wiebe, J., Bruce, R., O'Hara, T.: Development and use of a gold-standard data set for subjectivity classifications. In: Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics on Computational Linguistics (1999)
10. Hu, M., Liu, B.: Mining and summarizing customer reviews. In: Proceedings of the 2004 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD 2004 (2004)
11. Jindal, N., Liu, B.: Mining comparative sentences and relations. In: Proceedings of the 21st National Conference on Artificial Intelligence, 1st ed., vol. 2, pp. 1331–1336. AAAI Press, Menlo Park (2006)
12. Schouten, K., Frasincar, F.: Survey on aspect-level sentiment analysis. IEEE Trans. Knowl. Data Eng. 28(3), 813–830 (2016)
13. Kim, H., Ganesan, K., Sondhi, P., Zhai, C.: Comprehensive review of opinion summarization. Technical report (2011, Unpublished)
14. Lu, Y., Zhai, C., Sundaresan, N.: Rated aspect summarization of short comments. In: Proceedings of the 18th International Conference on World Wide Web - WWW 2009 (2009)
15. Pontiki, M., Galanis, D., Papageorgiou, H., Manandhar, S.: SemEval-2015 task 12: aspect based sentiment analysis. In: Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), Denver, Colorado, pp. 486–495 (2015)
16. Namasivayam, K.: The impact of intangibles on consumers' quality ratings using the star/diamond terminology. Foodserv. Res. Int. 15(1), 34–40 (2004)
17. Diamond Rating Definitions | AAA NewsRoom, AAA NewsRoom (2017). http://newsroom.aaa.com/diamond-ratings/diamond-rating-definitions/. Accessed 01 May 2017
18. Factsheet | Yelp, Yelp (2017). https://www.yelp.com/factsheet. Accessed 01 May 2017
19. Media Center, TripAdvisor (2017). https://tripadvisor.mediaroom.com/us-about-us. Accessed 01 May 2017
20. About, Zomato (2017). https://www.zomato.com/about. Accessed 01 May 2017
21. About, Foursquare.com (2017). https://foursquare.com/about. Accessed 01 May 2017
22. Chen, Y., Yang, Y., Hu, J., Zhuang, C.: Measurement and analysis of tips in foursquare. In: 2016 IEEE International Conference on Pervasive Computing and Communication Workshops (PerCom Workshops) (2016)
23. API Endpoints, Developer.foursquare.com (2017). https://developer.foursquare.com/docs/. Accessed 01 May 2017
24. Scellato, S., Lambiotte, R., Mascolo, C.: Socio-spatial properties of online location-based social networks. Artif. Intell. 11, 329–336 (2011)
25. Frith, J.: Communicating through location: the understood meaning of the foursquare check-in. J. Comput. Mediat. Commun. 19(4), 890–905 (2014)
26. Moraes, F., Vasconcelos, M., Prado, P., Dalip, D., Almeida, J., Gonçalves, M.: Polarity Detection of Foursquare Tips. Lecture Notes in Computer Science, pp. 153–162 (2013)
27. Menus, YAMU (2017). https://www.yamu.lk/menus. Accessed 01 May 2017
28. SemEval-2015 Task 12: Aspect Based Sentiment Analysis. SemEval-2015 Task 12. Alt.qcri.org (2017). http://alt.qcri.org/semeval2015/task12/. Accessed 01 May 2017
29. AFINN. www2.imm.dtu.dk (2017). http://www2.imm.dtu.dk/pubdb/views/publication_details.php?id=6010. Accessed 01 May 2017
30. NSLinguisticTagger - Foundation | Apple Developer Documentation, Developer.apple.com (2017). https://developer.apple.com/reference/foundation/nslinguistictagger. Accessed 01 May 2017
31. ayanonagon/Parsimmon, GitHub (2017). https://github.com/ayanonagon/Parsimmon. Accessed 01 May 2017

Optimal Moore Neighborhood Approach of Cellular Automaton Based Pedestrian Movement: A Case Study on the Closed Area

Najihah Ibrahim and Fadratul Hafinaz Hassan

School of Computer Sciences, Universiti Sains Malaysia, 11800 George Town, Pulau Pinang, Malaysia

Abstract. Closed area buildings are the most dangerous structures, causing more casualties than open spaces during panic situations because of their limited access. Pedestrians' unfamiliarity with the building and the unstructured arrangement of the space (e.g. furniture, exit points, rooms and workspaces) cause frequent physical collisions that can lead to casualties and heavy injuries. This research therefore simulates pedestrian movement to find the impact of building familiarity and space design on pedestrian movement speed. Pedestrian familiarity was tested with the horizontal movement of the Von Neumann approach and the optimal criterion of the Moore Neighborhood approach for a closed building area with a randomized arrangement of spatial layout obstacles. The optimal Moore Neighborhood is able to re-enact the real behavior of pedestrians with high familiarity. The results show that pedestrians who are highly familiar with a building's escape route move at a higher speed. Pedestrian movement speed also improved with a feasible spatial layout design of the closed area building, which is able to shape the flow of pedestrian movement and reduce physical collisions.

Keywords: Cellular automata · Pedestrian movement · Moore neighborhood · Von Neumann · Closed area building · Spatial layout design

© Springer Nature Switzerland AG 2019
K. Arai et al. (Eds.): SAI 2018, AISC 858, pp. 57–71, 2019. https://doi.org/10.1007/978-3-030-01174-1_5

1 Introduction

A closed area building is a space with limited access to escape routes. Nowadays, many closed area skyscrapers are built and secured with tight security to keep out outsiders and external attacks, such as the Burj Khalifa in Dubai, Shanghai Tower in Shanghai, Lotte World Tower in Seoul, One World Trade Center in New York, the Petronas Twin Towers in Malaysia and many more. Usually, a high security closed area building has a limited number of ingress and egress points so that the building administrator can control the pedestrian flow into and out of the premises [1–3]. However, such rules and building structures work well only in normal situations, when no man-made incidents or natural disasters occur. When a panic situation such as a fire, earthquake or bombing happens, pedestrians are alarmed and start to react by displaying panic behavior: they are in great shock and react drastically to save their lives and to avoid any physical contact that could harm them [4].

In recent years, many incidents in closed areas have led to high or total casualties among pedestrians. These casualties occurred because pedestrians were trapped inside the building and unable to reach the exit point during evacuation. Recently, in Malaysia, a religious school burned down at dawn in a tragedy that killed 25 people [5]. The victims were trapped behind barred windows, and the building had only a single exit point, which is believed to have been blocked during the fire [5]. In another recent tragedy in Malaysia, a family of four died when their house burned down and they were trapped inside, unable to escape [6]. Furthermore, some incidents involved heavy physical collisions among the victims, such as the Mina stampede in Mecca during the hajj pilgrimage in 2015, the fire at the Station Nightclub in Rhode Island in 2003, which led to a stampede of customers pushing and shoving one another, and the overflowing crowd at the New Year's Eve celebration at the Address Downtown Hotel in Dubai [4, 7]. Based on these incidents, many contributing factors have been identified and listed as important features of a space, especially a closed area building [4, 8–10]. There are many aspects and features of managing a space, such as the emergency warning system, geographical location, demographic setting, news and social network broadcasting, emergency education, spatial layout design and pedestrian behavior [1, 2].
According to [4], the first survival instincts in a panic situation are (1) monitoring the surrounding conditions, (2) deciding between several escape routes, and (3) finding the shortest path to the exit point. According to [1], however, a panic situation causes irrational decision making, and pedestrians react based on their surroundings and their familiarity with them. In line with its research objectives, this study therefore tries to show that the most basic aspects of managing a crowd in a panic situation are the pedestrians' familiarity with the structure of the building and their behavioral response to the layout of the surroundings. Lewin's equation states that human behavior is a function of the person and the environment:

$$B = f(P, E) \tag{1}$$

In Eq. (1), Lewin's equation, B is the behavior, P is the person and E is the environment. The equation was formulated for human psychological reactions to social and cultural influences in the surroundings [11]. It can, however, also be adopted and interpreted as follows: pedestrian behavior is affected by the structure of the surroundings. The surrounding layout can also stimulate the body and brain reflexes that pedestrians use to find the desired escape direction during evacuation [1, 2]. Given these incidents and findings, and with the advance of computer technologies, the simulation of pedestrian movement and the surrounding layout has become more important in recent years for predicting pedestrian movement, escape speed and evacuation time in a panic situation [1–4, 12]. Simulation is the computer modelling of a situation or process in a space and of the entities in that space [10, 12–14]. In this research, pedestrian simulation means simulating how pedestrians move among free space, obstacles and other pedestrians to advance and escape from a closed-area building. Such a simulation can re-enact real pedestrian movement and help predict the future development of the surroundings, the safety needs and the pedestrian traffic, which can reduce casualties during a panic situation in a particular space.

2 Panic and Normal Situations: Pedestrian Movement Behaviors

Pedestrian behavior is affected by the surrounding situation, which can be either normal or panic. Both situations strongly affect pedestrian movement behavior and hence the proportion of survivors and casualties among the pedestrians involved.

2.1 Normal Situation

Pedestrian movement in a normal situation is driven by intuition, personality and the attractions within a space. Many retail and attraction spaces have been studied through pedestrian simulation in normal situations in order to improve pedestrian flow and increase the number of customers and visitors [15]. In a normal situation, pedestrians have less physical contact with each other and with obstacles, and have more room per step, so they can move leisurely and change their route.

2.2 Panic Situation

A panic situation can arise from man-made incidents, natural disasters, or heavy collisions between pedestrians or with their surroundings during a normal situation. It makes pedestrian behavior panicked, chaotic, unpredictable and uncontrollable [4, 16]. The sudden movement causes stress and selfish reactions such as crying, yelling, shoving and pushing [4]. Past incidents show that these reactions contribute to serious injuries and deaths among pedestrians. This behavior can be subconsciously moderated by the pedestrians' familiarity with the space. Pedestrians who know the structure of the space well may be able to escape faster than newcomers who have never had access to it. The advantage in escaping and evacuating therefore lies with pedestrians who have a lot of information about the space they are in, not with those who know the least about it.


The escape approach is also an important factor in pedestrian movement during a panic situation. There are three types of movement approach: (1) microscopic, (2) macroscopic and (3) mesoscopic. The microscopic approach is the homogeneous, entity-based approach, in which each pedestrian makes individual decisions and escapes alone. The macroscopic approach is the heterogeneous, crowd-based approach, in which escape is the movement of a group of people. The mesoscopic approach is the crowd-herding approach, which is similar to the macroscopic approach but with agent-based movement. Of these, the microscopic approach is the best choice and the closest to real pedestrian evacuation, because pedestrians tend to make their own decisions to move and evacuate from a closed area. In the macroscopic and mesoscopic approaches, by contrast, the movement of a group of pedestrians increases physical collisions and reduces movement speed because of the bottleneck that forms in front of the exit point.

3 Microscopic Movement: Cellular Automata Approach

Pedestrian movement in a panic situation largely follows the microscopic movement approach, which reflects the individual survival instinct. The best approach for simulating this movement is the Cellular Automata (CA) movement model. Figure 1 shows the simulation parameters for the movement possibilities of a pedestrian in the CA model.

[Figure 1: diagram with the boxes "Microscopic Movement", "Horizontal Movement Directions", "Obstacles Avoidance", "Exit Distance" and "Simulation in Panic and Normal Situation".]

Fig. 1. CA features parameters for pedestrian movement.


As Fig. 1 shows, three parameters must be evaluated during the movement simulation so that the pedestrians move and evacuate in a way similar to a real escape.

[Figure 2: panels (a) and (b) are diagrams of the movement directions; panel (c) is the 3 × 3 grid of neighboring cells below.]

(i-1, j-1)   (i, j-1)   (i+1, j-1)
(i-1, j)     (i, j)     (i+1, j)
(i-1, j+1)   (i, j+1)   (i+1, j+1)

Fig. 2. (a) The von Neumann movement direction approach. (b) The Moore Neighborhood movement direction approach. (c) The transition probabilities of Moore neighborhood movement direction over a time step.


The CA approach describes all the movement possibilities in a pedestrian's surroundings within one time step. The time-step movement can be designed with the von Neumann approach or the Moore Neighborhood approach. Figure 2(a) shows the basic von Neumann movement directions and Fig. 2(b) shows the Moore Neighborhood directions. The von Neumann approach provides the two basic directions of regular human movement: (1) up and down, and (2) left and right. The Moore Neighborhood approach adds nautical (compass-style) navigation, that is, diagonal movement, which makes the simulated pedestrians more intelligent and brings their reactions closer to real-life movement. For the simulation of pedestrians in a closed area, the Moore Neighborhood approach is therefore designed as a 3 × 3 matrix of transition probabilities (see Fig. 2(c)). The time-step movement with these transition probabilities determines how pedestrians move and decide to head for the nearest exit point while avoiding physical collisions. Hence, the CA approach is able to re-enact a real situation, especially a panic situation.
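The two neighborhoods and the 3 × 3 transition matrix can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `neighbours` and `uniform_transition_matrix` are hypothetical, and the uniform probabilities are a placeholder rule, since the paper does not give the actual values used in its matrices.

```python
# Neighbourhood offsets (di, dj) around a cell (i, j), laid out as in
# Fig. 2(c): i changes along a row, j changes from row to row.
VON_NEUMANN = [(0, -1), (-1, 0), (1, 0), (0, 1)]
MOORE = [(di, dj) for dj in (-1, 0, 1) for di in (-1, 0, 1)
         if (di, dj) != (0, 0)]


def neighbours(i, j, offsets, width, height):
    """Return the in-bounds neighbour cells of (i, j)."""
    return [(i + di, j + dj) for di, dj in offsets
            if 0 <= i + di < width and 0 <= j + dj < height]


def uniform_transition_matrix(i, j, width, height, blocked=frozenset()):
    """3 x 3 transition probabilities for one time step.

    Assumed placeholder rule (not from the paper): every free Moore
    neighbour is equally likely, and the pedestrian stays in place
    only when no neighbour is free.
    """
    free = [c for c in neighbours(i, j, MOORE, width, height)
            if c not in blocked]
    matrix = [[0.0] * 3 for _ in range(3)]
    if not free:
        matrix[1][1] = 1.0  # nowhere to go: stay on the centre cell
        return matrix
    for ci, cj in free:
        # Row index follows j, column index follows i, as in Fig. 2(c).
        matrix[cj - j + 1][ci - i + 1] = 1.0 / len(free)
    return matrix
```

An interior cell has four von Neumann neighbours and eight Moore neighbours. A weighted rule, for example one that favours cells closer to an exit, would replace the uniform `1.0 / len(free)` term.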

4 Methodology: Pedestrian Movement Simulation

Pedestrian movement simulation is an artificial movement that imitates real human movement behavior and patterns, using a machine learning process to provide the artificial intelligence. It can be applied to a specific selected area to predict pedestrian movement, especially in a panic situation, in order to estimate the expected number of casualties and to find a feasible spatial arrangement that reduces physical collisions and injuries. In this study, the pedestrians' familiarity with their surroundings and the structural arrangement of the space were identified as factors contributing to high casualties in panic situations, through heavy physical collisions and unnecessary movement directions taken by the pedestrians. This research was therefore carried out to show the impact of familiarity with the building structure on pedestrian movement, and to show that the pedestrian flow during escape can be controlled by a good arrangement of the space. Familiarity was tested with the horizontal movement of the von Neumann approach and the optimality criterion of the Moore Neighborhood approach, in a closed building area with a randomized spatial layout. Pedestrian movement speeds in the experiment were set according to the situation: 3 m/s in a normal situation and 5 m/s in a panic situation [17].

4.1 Spatial Layout Design: Closed Area

A closed area can be the most crowded place during an event or any assembly activity. Nowadays, however, closed areas tend to be built for cultural design and aesthetic value rather than for safety and pedestrian flow. As a result, in a panic situation a small triggering incident can create great chaos and easily cause casualties and serious injuries. In this research, a grid map of a closed-area hall is introduced as the selected space for the pedestrian movement simulation. Figure 3 shows the grid map of the closed-area hall.

Fig. 3. Closed area hall’s grid map. Note – The blue color cells are the wall of the space and the yellow color cells are the ingress and egress points of the closed area hall. The white color cells are the empty floor.

4.2 Horizontal Von Neumann Approach

The horizontal von Neumann approach is the pedestrian movement approach that implements the basic von Neumann directions, which consist of two directions. It is used in this research to represent pedestrians who are unfamiliar with the structure of the selected closed area. These pedestrians wander along their route, to the left or to the right, to find the nearest exit. They change course whenever they encounter an obstacle, another pedestrian or a wall, and the change of course follows the von Neumann directions.

4.3 Optimal Moore Neighborhood Approach

The Moore Neighborhood movement approach is the closest to real human movement: it implements nautical navigation directions, which give four movement directions. It is used in this research to represent pedestrians who are familiar with the structure of the selected closed area. In this research the approach is further enhanced by applying an optimality criterion to each transition probability matrix. This creates more intelligent pedestrians, who can locate the nearest exit point and the nearest free floor cell to move to, reducing their escape time and number of steps in order to save their lives during evacuation.
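One way to read the optimality criterion is as a greedy choice: at each time step the pedestrian moves to the free neighboring cell that lies closest to an exit. The sketch below illustrates that reading only. The function names, the Euclidean distance measure and the rule of staying in place when no neighbor is closer are assumptions, not details given in the paper.

```python
import math

# The eight Moore neighbourhood offsets (di, dj).
MOORE = [(-1, -1), (0, -1), (1, -1), (-1, 0), (1, 0), (-1, 1), (0, 1), (1, 1)]


def distance_to_nearest_exit(cell, exits):
    """Euclidean distance from a cell to the closest exit cell."""
    return min(math.dist(cell, e) for e in exits)


def optimal_moore_step(position, exits, blocked, width, height):
    """Move to the free Moore neighbour that is closest to an exit.

    Assumption: the pedestrian stays in place when no free neighbour
    is strictly closer to an exit than the current cell.
    """
    i, j = position
    best = position
    best_dist = distance_to_nearest_exit(position, exits)
    for di, dj in MOORE:
        cell = (i + di, j + dj)
        if not (0 <= cell[0] < width and 0 <= cell[1] < height):
            continue  # outside the grid
        if cell in blocked:
            continue  # wall, obstacle or another pedestrian
        d = distance_to_nearest_exit(cell, exits)
        if d < best_dist:
            best, best_dist = cell, d
    return best
```

For instance, `optimal_moore_step((5, 5), [(0, 0)], set(), 26, 20)` takes the diagonal step towards the exit. A purely greedy rule like this can get stuck behind concave obstacles, so a full simulator would likely need a fallback such as the wandering behavior of Sect. 4.2.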


4.4 Randomization of Obstacles

The obstacles were placed at random in the experiments to find the impact of the obstacle arrangement on pedestrian movement speed.
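Random obstacle placement of this kind can be sketched as follows. The function name `place_obstacles`, the `reserved` parameter for wall and exit cells, and the optional `seed` are illustrative assumptions. The paper does not describe how its obstacles were sampled.

```python
import random


def place_obstacles(width, height, count, reserved=frozenset(), seed=None):
    """Pick `count` distinct obstacle cells uniformly at random.

    Cells listed in `reserved` (for example walls and exits) are never
    chosen. Passing a `seed` makes a layout reproducible.
    """
    rng = random.Random(seed)
    candidates = [(i, j) for i in range(width) for j in range(height)
                  if (i, j) not in reserved]
    if count > len(candidates):
        raise ValueError("more obstacles requested than free cells")
    return set(rng.sample(candidates, count))
```

Reserving the cells around the exits as well would avoid layouts in which an obstacle blocks the way out, which is the effect the results section later attributes to randomization.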

5 Results

Several experiments were run to show the impact of pedestrians' familiarity with a closed-area building on their movement speed during evacuation, and the impact of the spatial layout design on the pedestrian movement pattern in avoiding heavy physical collisions. The closed area is a hall replicated as a grid of 26 × 20 cells. The experiments are divided into (1) pedestrian movement with the horizontal von Neumann approach and (2) pedestrian movement with the optimal Moore Neighborhood approach. Both movement approaches were run with different numbers of obstacles (0, 20, 60 and 100) and different numbers of pedestrians (20, 60, 100, 140, 180 and 220). Figure 4 shows the simulation of the pedestrians evacuating from the hall.

Fig. 4. The simulation of pedestrian movement in a closed area hall. Note – The blue color cells are the wall of the space and the yellow color cells are the ingress and egress points of the closed area hall. The white color cells are the empty floor and the light green color cells are the obstacles. The pedestrian are the red color cells.
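The experimental design described above amounts to a full factorial grid, which can be enumerated as in the sketch below. The variable names are illustrative, and whether each configuration was run once or several times is not stated in the paper.

```python
from itertools import product

APPROACHES = ("horizontal von Neumann", "optimal Moore Neighborhood")
SITUATIONS = ("normal", "panic")
OBSTACLE_COUNTS = (0, 20, 60, 100)
PEDESTRIAN_COUNTS = (20, 60, 100, 140, 180, 220)

# One tuple per configuration: (approach, situation, obstacles, pedestrians).
runs = list(product(APPROACHES, SITUATIONS, OBSTACLE_COUNTS, PEDESTRIAN_COUNTS))
print(len(runs))  # 2 * 2 * 4 * 6 = 96 configurations
```

Each of the two result tables reports 48 of these configurations, one table per movement approach.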

Table 1 shows the evacuation times (in seconds) for pedestrian movement with the horizontal von Neumann approach. Table 2 shows the evacuation times (in seconds) for pedestrian movement with the optimal Moore Neighborhood approach.


Table 1. Experiment result on the unfamiliar movement of pedestrian using horizontal von Neumann approach in normal and panic situation

Obstacle(s)   0             20            60            100
Pedestrian    N(s)   P(s)   N(s)   P(s)   N(s)   P(s)   N(s)   P(s)
20            26     12     27     12     30     13     33     20
60            30     12     30     14     39     17     46     30
100           34     23     44     25     57     26     70     35
140           44     24     51     50     63     37     82     38
180           47     56     59     50     67     43     107    50
220           81     63     72     61     71     60     126    68

a N = Normal situation
b P = Panic situation
c s = Seconds

Table 2. Experiment result on the familiar movement of pedestrian using optimal Moore neighborhood approach in normal and panic situation

Obstacle(s)   0             20            60            100
Pedestrian    N(s)   P(s)   N(s)   P(s)   N(s)   P(s)   N(s)   P(s)
20            7      7      8      8      6      9      8      7
60            14     16     9      10     9      12     10     8
100           15     16     11     17     14     16     18     15
140           19     17     17     20     15     34     25     17
180           33     24     23     25     21     45     40     21
220           44     26     50     33     45     49     48     36

a N = Normal situation
b P = Panic situation
c s = Seconds

Fig. 5. The graph comparison of the time taken for the unfamiliar pedestrian (horizontal von Neumann approach) to exit the space in a normal situation.


According to Tables 1 and 2, the time (in seconds) taken by the pedestrians to exit the space increases with the number of pedestrians and the number of obstacles. Several graphs were plotted from Tables 1 and 2; they are shown in Figs. 5, 6, 7 and 8.

Fig. 6. The graph comparison of the time taken for the unfamiliar pedestrian (horizontal von Neumann approach) to exit the space in a panic situation.

Fig. 7. The graph comparison of the time taken for the familiar pedestrian (Optimal Moore Neighborhood approach) to exit the space in a normal situation.

Figure 5 plots the time taken by unfamiliar pedestrians to escape from the closed area in a normal situation, and Fig. 6 plots the same for a panic situation. Both figures are based on the results of the horizontal von Neumann movement approach. Figure 7 plots the time taken by pedestrians who are familiar with the spatial layout of the closed area in a normal situation, and Fig. 8 plots it for a panic situation. Figures 7 and 8 are based on the results of the optimal Moore Neighborhood movement approach.


Fig. 8. The graph comparison of the time taken for the familiar pedestrian (Optimal Moore Neighborhood approach) to exit the space in a panic situation.

The two movement approaches show a large gap in both the normal and the panic situation. Pedestrians using the optimal Moore Neighborhood approach escape in less time than those using the von Neumann approach. Figures 9, 10, 11 and 12 compare the evacuation time of pedestrians who are unfamiliar and familiar with the structure of the area, for different numbers of obstacles in a panic situation.

Fig. 9. The graph comparison of the time taken to exit the space for both movement approaches with no obstacles.


Fig. 10. The graph comparison of the time taken to exit the space for both movement approaches with 20 obstacles.

Fig. 11. The graph comparison of the time taken to exit the space for both movement approaches with 60 obstacles.

These comparisons show that pedestrians who are familiar with the structure of the space and able to locate the exit point escape faster than pedestrians with little or no familiarity with the spatial layout of the area. The results show how important it is for pedestrians to be educated about, and to familiarize themselves with, their surroundings, environment and space. Familiarity creates subconscious behavior based on basic knowledge and an idea of the particular space.


Fig. 12. The graph comparison of the time taken to exit the space for both movement approaches with 100 obstacles.

The time taken to evacuate from the closed area generally decreases from the normal situation to the panic situation, because the pedestrians move faster in a panic situation. For some combinations of pedestrian and obstacle numbers, however, escaping takes longer (in seconds) in the panic situation than in the normal situation. These abnormal evacuation times are due to the bottleneck effect and heavy collisions between pedestrians at the exit points. In this research, the abnormal times occurred because of the randomized arrangement of obstacles in the spatial layout: some obstacles were spawned near or next to the exit points and blocked the pedestrians' way out of the area (see Fig. 4). Hence, this research shows that pedestrians' familiarity with the spatial layout is important for choosing the best escape route, and that the spatial layout design affects the speed at which pedestrians evacuate from the affected area.
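The abnormal cases mentioned above can be listed directly from the tabulated times. The sketch below assumes the values of Tables 1 and 2 as transcribed here from the flattened tables (one normal and one panic time per pedestrian count and obstacle count). The names `TABLE_1`, `TABLE_2` and `slower_in_panic` are illustrative.

```python
OBSTACLES = (0, 20, 60, 100)
PEDESTRIANS = (20, 60, 100, 140, 180, 220)

# table[obstacles] = (normal times, panic times), one value per pedestrian
# count, in seconds. Values transcribed from Tables 1 and 2 (assumed reading).
TABLE_1 = {  # horizontal von Neumann approach
    0: ((26, 30, 34, 44, 47, 81), (12, 12, 23, 24, 56, 63)),
    20: ((27, 30, 44, 51, 59, 72), (12, 14, 25, 50, 50, 61)),
    60: ((30, 39, 57, 63, 67, 71), (13, 17, 26, 37, 43, 60)),
    100: ((33, 46, 70, 82, 107, 126), (20, 30, 35, 38, 50, 68)),
}
TABLE_2 = {  # optimal Moore Neighborhood approach
    0: ((7, 14, 15, 19, 33, 44), (7, 16, 16, 17, 24, 26)),
    20: ((8, 9, 11, 17, 23, 50), (8, 10, 17, 20, 25, 33)),
    60: ((6, 9, 14, 15, 21, 45), (9, 12, 16, 34, 45, 49)),
    100: ((8, 10, 18, 25, 40, 48), (7, 8, 15, 17, 21, 36)),
}


def slower_in_panic(table):
    """List (obstacles, pedestrians, normal, panic) where panic > normal."""
    cases = []
    for obstacles in OBSTACLES:
        normal, panic = table[obstacles]
        for pedestrians, n, p in zip(PEDESTRIANS, normal, panic):
            if p > n:
                cases.append((obstacles, pedestrians, n, p))
    return cases


print(slower_in_panic(TABLE_1))       # [(0, 180, 47, 56)]
print(len(slower_in_panic(TABLE_2)))  # 12
```

With the values as transcribed, the von Neumann runs contain one such case and the Moore Neighborhood runs contain twelve, including all six pedestrian counts at 60 obstacles.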

6 Conclusion

Pedestrian simulation re-enacts real-life pedestrian movement in a scene for a specific situation. It has become one of the most popular tools for designing the spatial layout and predicting the pedestrian flow. Implementing a simulation model for a specific area can help the developer increase safety, improve the spatial layout design for better use, and decrease the number of casualties in a panic situation. Through this research, the pedestrian simulation demonstrates the importance of emergency education: making pedestrians aware of the arrangement of their surroundings and ready for any incident that requires them to evacuate to a safer place. The randomization of the obstacles in this research shows that obstacles, and any other items and elements in a space, should be arranged in a feasible spatial layout design that satisfies universal standard design rules, shaping the pedestrian flow so that pedestrians can easily move and change route to find the shortest path out of the space.

Acknowledgment. The research reported here is pursued under the Fundamental Research Grant Scheme (FRGS) by the Ministry of Education Malaysia for “Enhancing Genetic Algorithm for Spatial Layout Design Optimization with Pedestrian Simulation in a Panic Situation” [203.PKOMP.6711534] and the Bridging Grant by Universiti Sains Malaysia for “Pedestrian Simulation Model for Clogging Detection and Survival Prediction in a Fire Spreading Situation” [304.PKOMP.6316019]. The preliminary study of this research was supported under the Short Term Grant Scheme by Universiti Sains Malaysia for “Pedestrian Simulator and Heuristic Search Methods for Spatial Layout Design” [304.PKOMP.6313169].

References

1. Sime, J.D.: Crowd psychology and engineering. Saf. Sci. 21(1), 1–14 (1995)
2. Huixian, J., Shaoping, Z.: Navigation system design of fire disaster evacuation path in buildings based on mobile terminals. In: 2016 11th International Conference on Computer Science & Education (ICCSE) (2016)
3. Tcheukam, A., Djehiche, B., Tembine, H.: Evacuation of multi-level building: design, control and strategic flow. In: 2016 35th Chinese Control Conference (CCC) (2016)
4. Lu, X., et al.: Impacts of anxiety in building fire and smoke evacuation: modeling and validation. IEEE Robot. Autom. Lett. 2(1), 255–260 (2017)
5. Jay, B.N.: Tahfiz did not have fire exit; bodies found piled on top of each other, in New Straits Times. New Straits Times Press, Berhad (2017)
6. Four in a Family Killed in Fire. The Star Online. Star Media Group Berhad (ROC 10894D) (2017)
7. Yamin, M., Al-Ahmadi, H.M., Muhammad, A.A.: Integrating social media and mobile apps into Hajj management. In: 2016 3rd International Conference on Computing for Sustainable Global Development (INDIACom) (2016)
8. Konstantara, K., et al.: Parallel implementation of a cellular automata-based model for simulating assisted evacuation of elderly people. In: 2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP) (2016)
9. Ruiz, S., Hernández, B.: A parallel solver for markov decision process in crowd simulations. In: 2015 Fourteenth Mexican International Conference on Artificial Intelligence (MICAI) (2015)
10. Hassan, F.H.: Using microscopic pedestrian simulation statistics to find clogging regions. In: 2016 SAI Computing Conference (SAI) (2016)
11. Kihlstrom, J.F.: The person-situation interaction. In: The Oxford Handbook of Social Cognition, pp. 786–805 (2013)
12. Zong, X., Jiang, Y.: Pedestrian-vehicle mixed evacuation model based on multi-particle swarm optimization. In: 2016 11th International Conference on Computer Science & Education (ICCSE) (2016)
13. Wang, H., et al.: Simulation research based on evacuation ability estimation method. In: 2016 12th World Congress on Intelligent Control and Automation (WCICA) (2016)
14. Miao, Q., Lv, Y., Zhu, F.: A cellular automata based evacuation model on GPU platform. In: 2012 15th International IEEE Conference on Intelligent Transportation Systems (2012)
15. Wineman, J.D., Peponis, J.: Constructing spatial meaning: spatial affordances in museum design. Environ. Behav. 42(1), 86–109 (2010)
16. Yue, H., et al.: Simulation of pedestrian flow on square lattice based on cellular automata model. Phys. A 384(2), 567–588 (2007)
17. Helbing, D., Farkas, I., Vicsek, T.: Simulating dynamical features of escape panic. Nature 407(6803), 487–490 (2000)

Emerging Structures from Artisanal Transports System: An Agent Based Approach

Lea Wester

Ecole Centrale de Casablanca, UMR 7300 ESPACE Aix Marseille University, Marseille, France
[email protected]

Abstract. Some metropolises in the world have no transport authority: collective transport is neither planned nor centralized. This lack of institutionalization in the sector leads to the emergence of alternative travel solutions based on individual initiatives. These transports are called “artisanal”. Their characteristic is that they leave the vehicle crew great freedom in the concrete operation of the transport service. Our aim is to understand how collective transport without any planning can provide the daily mobility of several million people in the world. The first part of our research consisted of identifying the essential elements of the vehicle crews' strategy. We defined two kinds of operational logic based on field surveys in Brazzaville. The objective of this paper is to examine the influence of the urban context on the adaptation of an artisanal transport system. Starting from the definition of the network structure and the distribution of travel demand, we focus on the efficiency and the emerging structure of the transport service. To understand the relation between micro-scale elements and the emerging spatial structure, we use a methodology based on computer models. We present the results of several simulations that test the adaptation of artisanal transport in different urban contexts.

Keywords: Paratransit · Multi agents system · Spatial modelling

1 Introduction

Transport systems in the major cities of southern countries usually evoke apparently archaic services with anarchic organization. These stereotypes result from a lack of understanding of these systems. Today, several metropolises of southern countries have transport services without any centralized management [1]. Alternative solutions based on individual initiatives have appeared [2]. In a sector where Northern countries place greater emphasis on centralization, these solutions challenge us: they allow an alternative form of mobility management. An emergence process is at the heart of these systems, which appear to emanate from the city itself. These transports are called “artisanal” [3]. Their characteristic is that they leave the vehicle crew great freedom in the concrete operation of the transport service. They take many different forms and involve multiple modes: mototaxis in Cotonou, minibuses in Jakarta, shared taxis in Morocco, and so on.

© Springer Nature Switzerland AG 2019 K. Arai et al. (Eds.): SAI 2018, AISC 858, pp. 72–79, 2019. https://doi.org/10.1007/978-3-030-01174-1_6

Our research aim is to understand what the spatial structures of collective transport systems are when there is no planning. Such systems provide the daily mobility of several million people in the world. More precisely, we want to understand how the transport systems adapt to the city. Cartographic and spatial analysis lets us understand the emerging properties, but we want to link these properties to the processes that build them. A dynamic approach is better suited to analysing emerging structures: the question is to clarify the link between micro-scale strategies and emerging spatial structures. The objective of this paper is to examine the influence of the urban context on the adaptation of an artisanal transport system. Starting from the definition of the network structure and the distribution of travel demand, we focus on the efficiency and the emerging structure of the transport service. This work starts with a case study, presented below. We then explain our model-building methodology. Our results come from several simulations that test the adaptation of artisanal transport in different urban contexts.

2 Starting from a Case Study

2.1 Global Approach

In Brazzaville, we observed an artisanal transport system. A survey allowed us to define the spatial, social and economic characteristics of this organization. A first model was built from these data [4]. It helped us understand how these transports adapt to the setting in which they appeared, and it also allowed us to judge the plausibility of the model. But this first experiment cannot demonstrate the impact of the characteristics of the city on collective transport. We are therefore now analysing this kind of transport in theoretical urban contexts, to understand the implications of city structure and organization for artisanal transport. Obviously, the artisanal transports of Brazzaville do not represent all types of artisanal transport. We call this kind of organization a “route management strategy”, as we will see.

2.2 The Case of Brazzaville

Our study starts with the case of Brazzaville, the capital of the Republic of the Congo, a country of four and a half million inhabitants, a third of whom live in the capital. The city concentrates the majority of activities and infrastructure. It keeps a basic structure inherited from the colonial period [5]. The current city centre corresponds to the colonial neighbourhood. It is better equipped than the rest of the town and the country; in particular, it has a great number of paved roads. The peripheries are still organized in square plots, like the neighbourhoods reserved for the local population in colonial times [6].

This city offers an example of an artisanal transport system. For more than twenty years there has been no planning or control of collective transport, which is operated by independent workers. We carried out several surveys in Brazzaville between 2013 and 2014 to understand how the system works. We collected different kinds of data: spatial information, interviews with the actors, counts, and so on. Among the actors of the transport system, the vehicle crews are the only ones in contact with the users. They pay rent to the vehicle owner and taxes to the public authority. The control agents, such as policemen, only check the driver's licence. They do not control the movement of the buses: the vehicle crew alone decides where and when to go.

To make the best choice of itinerary and speed, the vehicle crews develop strategies. When the vehicle arrives at a stop, the crew chooses a new destination. They take into account the most demanded destination and their own representation of the stop, for example traffic jams or potential users. This way of operating builds the bus route progressively: the transport service appears step by step, without planning. This is why we say that this kind of system is based on a “route management strategy”.

The spatial distribution of the transport system of Brazzaville shows some structural characteristics: the service covers a large part of the city but remains polarized on the city centre. A static observation of the data does not explain the relation between the organization of the system and its spatial anchoring. We therefore propose a way to focus on this emerging phenomenon and its dynamics, using agent-based models.

3 Modeling Artisanal Transport System

3.1 Multi-Agent System (MAS)

To understand the relation between micro-scale elements and emerging spatial structure, we use a methodology based on computer models called multi-agent systems. They are composed of several “agents”, which are small computer programs. These agents interact with each other and with a defined environment [7]. This modelling approach is ideal for our subject because it focuses on individual strategies: artisanal transport is based on the strategies of the vehicle crews, and its processes start from micro-scale actions and interactions. Another benefit of multi-agent systems is that several contexts can be programmed, which lets us develop an experimental approach that is not possible outside the model. Moreover, from a geographical point of view, MAS are particularly interesting because they integrate a spatial dimension into the interactions of the agents [8]. Our general approach is based on a series of models. The first ones include field data as far as possible: the network is implemented from spatialized data and travel demand from counts. The bus strategies follow our survey. The buses memorize the result of each trip and build a ranking of the stops. The diagram shows their sequence of actions (Fig. 1). When they have to choose their next stop, they look first at the users' destinations and then at their own ranking.

Fig. 1. Activity diagram of the agents “bus”.
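The choice rule described above (users’ destinations first, the bus’s own ranking second) can be illustrated with a minimal Python sketch. The class and method names, and the use of the average number of passengers per visit as the ranking score, are assumptions made for illustration only; the model’s actual scoring is not specified here.

```python
from collections import defaultdict


class BusAgent:
    """Hypothetical bus agent that ranks stops from remembered trip results."""

    def __init__(self):
        # Running totals per stop: passengers carried there and number of visits.
        self.total_passengers = defaultdict(float)
        self.visits = defaultdict(int)

    def remember(self, stop, passengers):
        """Memorize the result of one trip ending at `stop`."""
        self.total_passengers[stop] += passengers
        self.visits[stop] += 1

    def score(self, stop):
        """Assumed ranking score: average passengers per visit (0 if unknown)."""
        if self.visits[stop] == 0:
            return 0.0
        return self.total_passengers[stop] / self.visits[stop]

    def choose_next_stop(self, reachable, requested):
        """Pick the next stop among the reachable ones.

        reachable: list of stops connected to the current stop
        requested: dict mapping a stop to the number of waiting users asking for it
        The users' destinations are compared first, the bus's own ranking second.
        """
        return max(reachable, key=lambda s: (requested.get(s, 0), self.score(s)))
```

With no waiting users, the agent falls back on its own ranking, which is how a route would be built step by step from memorized results.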

This model allowed us to compare the outputs with field data and check their plausibility. We built a first environment with Brazzaville’s network and travel demand. It shows that the simulated service is concentrated on the same spots as the traffic observed in the city. The economic benefits of the simulated buses are also consistent with the field observations of A.W. Landa [9]. These first results showed that the system was more efficient with a small number of buses: the transport service is more spatially homogeneous and serves more users when the buses are not too numerous [4]. In this case, however, the transport system adapts to the constraints of the context, since the network and the travel demand are built directly from field data. We cannot tell what effect this urban context has on the adaptation of the system and on the emerging structure. In other words: is the context important for the adaptation of the transport system, or does this kind of collective transport have its own structure regardless of the context?

3.2 Environment and Simulations

We are now at the step of confronting the strategies with different urban contexts. Our aim is to test the adaptation of these artisanal transport strategies outside the environment in which they appeared. We present the results of several simulations designed for this purpose. We defined the urban contexts using the network structure of roads and bus stops and the structure of the demand. We propose two kinds of networks:
• Squared networks with 4 or 8 connections for each node.
• Star networks with 3 or 5 connections for each node.


These two kinds of networks are also built with a more or less concentrated travel demand. The three types of travel demand are:
• Homogeneous distribution: the users are homogeneously distributed over the stops.
• Gaussian distribution: most spots have a medium number of users and a few spots have a strong or a weak demand.
• Concentrated distribution: the users are concentrated on a small number of spots.
Using these two parameters, we can generate theoretical environments with a more or less polarized network and a more or less concentrated travel demand. Overall, we focus our analysis on the effects of the global polarization of the urban context. We thus have several models with different urban environments and the same bus strategy. We also include an economic parameter, the variation of ticket prices, and a saturation parameter, the variation of the number of buses. The latter reveals the reaction of the system when traffic jams increase. These models have three outputs: the number of served users, the spatial distribution of the transport service and the global profits generated by the buses. We can thus estimate the adaptation of the buses in terms of efficacy for the users, spatial structure and economic benefits. Each of these three outputs corresponds to a kind of efficacy for a collective transport system.
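As an illustration of how such theoretical environments might be generated, the sketch below builds the neighbours of a node in a squared network and draws a travel demand of each type. All names and numeric choices (the spread of the Gaussian weights, the share of hub stops, the hub weight) are assumptions for illustration; the values used in the actual model are not given in the text.

```python
import random


def grid_neighbours(row, col, size, connections=4):
    """Neighbours of a node in a size x size squared network (4 or 8 connections)."""
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if connections == 8:
        steps += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    return [(row + dr, col + dc) for dr, dc in steps
            if 0 <= row + dr < size and 0 <= col + dc < size]


def make_demand(n_stops, n_users, kind, seed=0):
    """Return the number of waiting users per stop for one demand type."""
    rng = random.Random(seed)
    if kind == "homogeneous":
        # Every stop is equally likely to receive a user.
        weights = [1.0] * n_stops
    elif kind == "gaussian":
        # Assumed spread: most stops get a medium weight, a few a high or low one.
        weights = [max(0.05, rng.gauss(1.0, 0.3)) for _ in range(n_stops)]
    elif kind == "concentrated":
        # Assumption: about 10% of the stops act as hubs with a much higher weight.
        hubs = set(rng.sample(range(n_stops), max(1, n_stops // 10)))
        weights = [20.0 if i in hubs else 1.0 for i in range(n_stops)]
    else:
        raise ValueError(f"unknown demand type: {kind}")
    demand = [0] * n_stops
    for stop in rng.choices(range(n_stops), weights=weights, k=n_users):
        demand[stop] += 1
    return demand
```

Fixing the seed keeps each configuration reproducible, which is convenient when the same configuration is iterated many times.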

4 Results

The following results come from 1000 iterations of each possible configuration of inputs. This way, the outputs can be observed separately. When we compare the results for each output, we observe that the best configuration differs depending on whether we focus on the served users, the spatial distribution of the transport service or the profits.

4.1 In Terms of Served Users (Fig. 2)

In the case of the served users, the squared network with a Gaussian distribution of demand is the most efficient configuration. Indeed, the squared network includes more road connections between the nodes, so the buses can build more direct itineraries according to travel demand. A Gaussian distribution of the travel demand creates many stops with a medium travel demand and a few stops with either a high or a very weak demand. In this context, the strategy of building itineraries step by step is more efficient because the buses more often find a medium travel demand at the stops. The buses therefore almost never make empty trips.


Fig. 2. Served users according to inputs.

The number of buses, which increases the traffic jams, has a slight effect on the number of served users: they are just a little more numerous with 20 buses, that is to say when traffic jams are totally avoided. In fact, the route management strategy avoids the creation of traffic jams because the buses bypass the competition. They look for unserved zones: stops with a high demand and no waiting time. The variation of ticket prices has no effect on the number of served users. Indeed, the users are only programmed as resources: they are purely reactive agents and do not have any strategy. Finally, the most efficient configuration in terms of served users combines a squared network, a Gaussian distribution of demand and not too many buses. This urban context seems to facilitate the adaptation of the route management strategy, since it allows direct and almost full trips.

4.2 In Terms of Spatial Distribution of the Transport Service (Fig. 3)

The spatial distribution of the transport service is measured by a percentage of concentration. When the percentage is high, the transport service is concentrated on a small number of stops; when it is low, the transport service is homogeneously distributed over the stops.
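The formula behind this percentage is not given in the text. Purely as an illustration, one possible measure with the same behaviour is the share of all bus visits received by the most visited stops. The function name and the 10% threshold below are assumptions, not the measure actually used in the model.

```python
def concentration_percentage(visits_per_stop, top_share=0.1):
    """Hypothetical concentration measure.

    Returns the percentage of all visits that go to the `top_share`
    most visited stops (at least one stop). High values mean a service
    concentrated on a few stops, low values a homogeneous service.
    """
    total = sum(visits_per_stop)
    if total == 0:
        return 0.0
    k = max(1, round(len(visits_per_stop) * top_share))
    top = sorted(visits_per_stop, reverse=True)[:k]
    return 100.0 * sum(top) / total
```

With ten equally visited stops the measure returns 10%, and it reaches 100% when a single stop receives every visit.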

Fig. 3. Spatial distribution of the transport service according to inputs.


We can observe that the transport service is more homogeneous with a squared network. In this kind of network, the buses can build almost direct itineraries. On the contrary, a star network forces the buses to go through the central stops and thus builds a more concentrated transport service. A Gaussian distribution of travel demand, with many stops of medium demand, creates a more homogeneous transport service. We note that a homogeneous travel demand does not create a homogeneous spatial distribution of the transport service. This is because a homogeneous demand prevents the buses from ranking the stops, so they circulate almost randomly. A high number of buses also creates traffic jams, and hence a more concentrated transport service. Once again, the ticket prices have no effect on the spatial distribution of the transport service. The configuration with the most homogeneous service is the one with a squared network, a Gaussian distribution of travel demand and a small number of buses. This urban context lets the route management strategy create direct itineraries in response to travel demand, and the buses move fast without traffic jams.

4.3 In Terms of Profits (Fig. 4)

With regard to the profits, we can observe that we are working with negative values. This is because all the values are taken into account, while only one configuration has positive profits (Fig. 4). Obviously, increasing the ticket prices increases the profits. A higher number of buses generates more profits because they all carry passengers. We consider the average of the profits independently of the number of served users. If all the buses make just one short and full trip, they generate a lot of profit, but this does not mean that they serve more users than fewer buses circulating more quickly. A fractal network allows more profits because the vehicle trips are shorter and polarized on the fractal nodes. The more homogeneous distribution of the travel demand yields more profits because all the bus trips carry users. The most concentrated travel demand also creates more profits because the buses focus on the most demanded stops and make shorter trips between them.

Fig. 4. Proﬁts according to inputs.


The most profitable configuration is the most concentrated one. When the network and the travel demand are concentrated, the bus trips are shorter and full, which makes this configuration the most profitable.

5 Conclusion

To conclude, the agent-based approach is particularly well suited to our object of research. This kind of computer model allows us to observe the emerging structures and dynamics of artisanal transport. The analysis shows that some configurations are more efficient than others, depending on the kind of efficiency considered. Efficiency in terms of served users and spatial efficiency call for similar configurations: squared networks, a Gaussian distribution of travel demand, not too many buses and low ticket prices. The best configuration for economic efficiency is different: the polarization of the network and the concentration of the travel demand generate more profits because they reduce the costs. Finally, artisanal transport is better adapted to some kinds of urban contexts. In reality, the bus crews are always trying to find an equilibrium between these three kinds of efficiency, and the urban context is an important parameter for the global efficiency of artisanal collective transport.

References

1. Wilkinson, P., Golub, A., Behrens, R., Salazar Ferro, P., Schalekamp, H.: Transformation of urban public transport systems in the global south. In: Geyer, H.S. (ed.) International Handbook of Urban Policy. Issues in the Developing World, vol. 3, p. 30 (2011)
2. Cervero, R., Golub, A.: Informal transport: a global perspective. World Transit Research, January 2007. http://www.worldtransitresearch.info/research/1434
3. Godard, X.: Les transports et la ville en Afrique au sud du Sahara: le temps de la débrouille et du désordre inventif. KARTHALA Editions (2002)
4. Wester, L.: Modélisation multi-agents de transports collectifs artisanaux: structures émergentes et stratégies individuelles. In: Actes de la conférence SAGEO 2015, vol. 11, Hammamet, pp. 74–89, November 2015. https://hal.archives-ouvertes.fr/hal-01263536/document
5. Ziavoula, R.E.: Brazzaville, une ville à reconstruire. KARTHALA Editions, November 2006
6. Balandier, G.: Sociologie des Brazzavilles noires. Colin, A. (ed.), Paris (1955)
7. Ferber, J.: LES SYSTEMES MULTI-AGENTS. Vers une intelligence collective. Dunod, Paris, December 1997
8. Daudé, E.: Systèmes multi-agents pour la simulation en géographie: vers une géographie artificielle. In: Lavoisier (ed.) Modélisations en Géographie. IGAT, Paris, pp. 353–380 (2005)
9. Landa, A.W.: Le transport en commun à Brazzaville: organisation de l’espace et effets socio-économiques, Ph.D. dissertation, Université Marien Ngouabi, Brazzaville (2014)

Automatic Web-Based Question Answer Generation System for Online Feedable New-Born Chatbot

Sameera A. Abdul-Kader1,2, John Woods1, and Thabat Thabet1,3

1 School of Computer Science and Electronic Engineering, University of Essex, Colchester, UK
{saabdua,woodjt,tfytha}@essex.ac.uk
2 University of Diyala, Diyala, Iraq
3 Technical Collage, Mosul, Iraq

Abstract. The knowledge bases of Chatbots are usually built manually, which makes them difficult and time consuming to create and maintain. The idea of automatically building a Chatbot knowledge base from the web has emerged in recent years. Question Answer (QA) pairs are acquired from existing online forums, but little work has been done on generating questions from existing factual or fictional sentences. Two main contributions are presented in this paper. The first is generating factual questions from sentences gathered by a web spider; the raw text sentences are extracted from the HTML and pre-processed. Named Entity (proper noun) Recognition (NER) is used in addition to verb tense recognition in order to identify the factual sentence category. Specific rules are built to categorize the sentences and then to generate questions based upon them. The second contribution is generating a new born Chatbot database by placing the resultant QA pairs into an SQLite database built for this purpose. The newly built database is used to nurture a Chatbot that can simulate the personality of a desired figure or the behavior of an object. The footballer David Beckham is used as an example, and the data are acquired from his Wikipedia page. The resulting QA pairs are presented, and a subjective assessment shows considerable enhancement in QA pair generation over a comparative system.

Keywords: Chatbot knowledge · Feature extraction · Information retrieval · Named entity recognition · Natural language processing · Question answer pairs

© Springer Nature Switzerland AG 2019 K. Arai et al. (Eds.): SAI 2018, AISC 858, pp. 80–98, 2019. https://doi.org/10.1007/978-3-030-01174-1_7

1 Introduction

The majority of existing Chatbot databases are hand built and take a long time to construct. These databases are not dynamic and are difficult to update. Designing a new born Chatbot and populating it from the web is a new area of research. Few researchers have investigated the possibility of educating a new Chatbot that embodies an artificial figure. Some authors suggest extracting Chatbot knowledge from the discussion forums available online [1, 2]. Others populate the database from the web or plain text relating to a particular object or person [3]. The data extracted from web pages need significant processing before they are ready for conversational systems, especially if the text is unstructured like that in Wikipedia. Filtering and analysis are required to make sure that the text in the database is meaningful.

One of the popular forms of data in Chatbot knowledge bases is QA pairs. The majority of QA pairs are either written manually or acquired from existing online discussion forums [2]. Generating QA pairs for a Chatbot requires a lot of processing and filtering of the sentences to generate the corresponding questions. A hypothesis is needed to build a framework for deriving questions from sentences. The hypothesis then needs to be converted into specific rules and processes to populate the database mechanically. Validation is also applied to the QA pairs, which are compared with those of other question generation systems.

The main purpose of this paper is to generate factual questions from existing factual answer sentences, i.e. to reverse-engineer the question from its answer. These sentences are extracted from plain text retrieved from the Web and pre-processed using NLTK. Entity Name (proper noun) recognition and verb tense recognition are used to determine the factual sentence type. Then, specific rules are used to classify the factual sentences and to generate and categorize the questions. The second purpose is to generate a new born Chatbot database by putting the resultant QA pairs into an SQLite database built from a number of tables that hold the different question and answer categories. A Chatbot database is automatically populated with QA pairs from the web pages associated with a desired figure or object. This enables users to chat with a Chatbot that emulates the behavior of the figure or object they choose. The example figure used is the footballer David Beckham, and his Wikipedia page is used to retrieve the associated data. We present results for the unranked QA pairs produced by our rule-based QA generator. The evaluation results show the enhancements achieved by our proposed system over the comparative works.

2 Related Work

The idea of acquiring Chatbot knowledge from online sources is presented in [1] as an approach to extract QA pairs from online discussion forums. The authors in [2] improved this idea by acquiring data from online discussion forums automatically, using a classification model and ensemble learning theory. In [4] the authors propose finding QA pairs on the web by detecting the question in a thread of an extracted forum; a graph-based propagation method is then used to detect the answer in the same thread. The authors in [5] present Open Instructor, which is based on Wikipedia, to extract unstructured text and transform it into corresponding sentences, without mentioning the Chatbot as an application of their idea [1, 2, 4, 5].

An automatic generating mechanism is proposed in [6] to give expressive opinion sentences from numerous reviews extracted from the web. The authors base the analysis on the frequency of adjectives, sentence length and contextual relevance to rank the reviews. However, no question generation is presented for a conversational system.

The authors in [7] present an approach to automatically generate short answer questions for reading comprehension assessment. They introduce Lexical Functional Grammar (LFG) as the linguistic framework for question generation, which enables systematic use of semantic and syntactic information. The approach can generate questions of better quality and uses paraphrasing and sentence selection to improve the cognitive complexity and effectiveness of the questions [7]. However, this system generates short answer questions, and the authors did not demonstrate that it works with factual questions.

The authors in [8] developed two algorithms to supply instructors with questions for students in introductory biology classes. One algorithm generates questions about photosynthesis and the other retrieves biology questions from the web. Students were used to validate the quality of the questions, and the quality of the generated questions was rated by the authors themselves [8]. The pattern of results shows an improvement in the pedagogical benefits of each class, which suggests that the generated questions may help students to learn [8]. However, the authors stated that the generated questions may be shallower than questions written by professionals.

The authors in [9] introduce a template-based approach that combines semantic role labels with a system that automatically generates natural language questions to support online learning. However, as the authors state, the questions generated by this approach are not answerable from the sentences they are generated from [9].

The authors in [3] describe an idea to identify significant facts in text describing the life of a historical figure in order to build a corresponding Chatbot. This Chatbot should be able to learn (supervised learning) from previous experiences in order to act more realistically. The authors provide a generic form of sentence to solve the learning problem and to enable the Chatbot to acquire as much information as possible about the personality and life of the person being simulated. The sources of information that feed this Chatbot are websites such as Wikipedia for unstructured data and DBpedia for structured data. NLP techniques are used to convert plain text to structured text and then restructure it into a generic form of sentence [3]. The input is a collection of factual sentences, which are transformed to match the generic form. The authors use an open source Chatbot called ChatScript to design the conversation. However, question generation from the corresponding data was not demonstrated.

The idea proposed in this paper carries the ideas presented in [3] further. This paper discusses extracting unstructured text from Wikipedia, pre-processing the text using NLTK to structure it into individual sentences, and then categorizing the sentences using the proposed rules to extract facts or definitions. The novelty presented here is the use of Named Entity Recognition and proposed rules for verb tense recognition to manufacture factoid or definition questions from the extracted sentences. The operation is automatic, and after a few hours of running an SQLite database is prepared from the QA pairs. The QA pairs can be used directly as a knowledge base for a Chatbot. The approach in [9] has been adapted in our system, and the resulting data set is used to generate QA pairs that are compared with those of our system. Subjective assessment is used to evaluate both our system and the comparative one, and conclusions are drawn.


3 Background

3.1 New Born Chatbot

To build a dialogue system (Chatbot), one of the most essential requirements is to design a sufficiently detailed database for that system. As the Chatbot bases its knowledge on statements or sentences and uses them to hold a conversation, it needs a large but non-overlapping knowledge base. Chatbots can assist in human-computer interaction, and they can examine and influence the behavior of the user by asking questions and responding to the user’s questions. A Chatbot is a computer program that mimics an intelligent conversation. The input to this program is natural language text, and the application should give the best intelligent response to the input sentence. This process is repeated as the conversation continues, and the response is either text or speech [10].

Building a new born Chatbot is a big challenge, and it becomes even more difficult when trying to make it learn. The trickiest part is collecting and processing the data used to populate the Chatbot database, because the only knowledge the Chatbot has access to is the information it has learnt itself. The data fed into the Chatbot should therefore be selected and filtered carefully using statistical and numerical means. A new born Chatbot can learn general facts but is often focused on a specific figure or object, and its database can be updated from the web according to a user request (i.e. the user can choose the figure or object they need). The new born Chatbot has an empty knowledge base, and it is considered to be born when its database begins to be populated from the web according to the user’s choice of figure or object.

3.2 Question Answer

Question answering is a topic of interest in both Information Retrieval and NLP. The authors in [11] argue that it needs more cooperation between the Knowledge Representation and NLP communities. A question answering system is normally a mechanism embedded within sophisticated search engines, first seen in 1999 at TREC 8 (Text Retrieval Conference) [12]. A QA system normally retrieves a particular piece of information from the web to select the optimum answer to a user query. The concepts and rules of TREC developed over a number of years to expand the range of question sets and to choose more accurate answers. The rules proposed in this paper reverse the usual process of setting the question, analysing it and then finding the best answer for it. Instead, it is proposed to retrieve the piece of information, extract and classify the sentences, analyze them, and then generate questions from the existing statements, which serve as the answers. The analysis techniques used in both directions (Q leading to A or A leading to Q) are the same, such as NLTK, n-grams and NER.

3.3 Named Entity

Named Entity Recognition is one of the main elements of QA techniques and one of the features that researchers and programmers use to extract information from text. NER covers three groups: (1) Entity Names, (2) Number Expressions and (3) Temporal Expressions. Number expressions identify numeric entities such as monetary values, and temporal expressions identify time entities such as dates and times [13]. Number and temporal expressions are outside the scope of this paper. Entity names annotate the proper nouns that represent PERSON, ORGANIZATION, LOCATION (GPE) and FACILITY names in the text (see Fig. 1), as in the NLTK-NE library. NER systems normally recognize the string that represents the entity name and then identify it as a Named Entity of a specific type, as in the example below.

The sentence “Rami Eid is studying at Stony Brook University in NY” contains entities that are proper names. Applying NER gives the following:

[[(‘Rami’, ‘PERSON’), (‘Eid’, ‘PERSON’)], [(‘Stony’, ‘ORGANIZATION’), (‘Brook’, ‘ORGANIZATION’), (‘University’, ‘ORGANIZATION’)], [(‘NY’, ‘LOCATION’)]]

Two main entity names are noticeable here: ‘Rami Eid’ and ‘Stony Brook University in NY’. This chained type of entity name is called a Cascaded Entity Name, whereas the existing Python NER tools recognize one-word entities. The cascaded names are therefore separated when NER is applied, because of the limitations of the Stanford and NLTK-NER modules. Hard coding is needed in the implementation to re-join the separated words of the same name and obtain the following form of output:

[(‘Rami Eid’, ‘PERSON’), (‘Stony Brook University’, ‘FACILITY’)]

In this paper, NER is used to detect the proper noun subjects at the beginning of the sentences and to identify their classes in order to determine the question words during the question generation process.
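The re-joining of cascaded entity names can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function name is hypothetical, the input is taken to be a flat list of (word, label) pairs in sentence order, and the label "O" is assumed to mark words that are not entities. The authors’ actual function is not shown in the paper.

```python
def merge_cascaded_entities(tagged_tokens):
    """Join neighbouring words that carry the same entity label.

    tagged_tokens: list of (word, label) pairs in sentence order;
    the label "O" (an assumption of this sketch) marks non-entity words.
    Returns a list of (entity name, label) pairs.
    """
    entities = []
    previous_label = "O"
    for word, label in tagged_tokens:
        if label == "O":
            # A non-entity word ends any chain that was being built.
            previous_label = "O"
            continue
        if label == previous_label:
            # Same label as the previous word: extend the current chain.
            name, _ = entities[-1]
            entities[-1] = (name + " " + word, label)
        else:
            entities.append((word, label))
        previous_label = label
    return entities
```

Applied to the example sentence, the sketch returns ‘Rami Eid’ as a single PERSON and ‘Stony Brook University’ as a single entity. It keeps whatever label the tagger assigned and does not relabel entities.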

[Figure: tree diagram with root “Entity Names (NE)” and four classes: FACILITY, PERSON, ORGANISATION, LOCATION (GPE)]

Fig. 1. Entity Name classes as in NLTK_NE library.

3.4 Sentence Hypothesis

There are several kinds of sentences in the English language. The simplest form of sentence has been chosen for question generation in order to simplify the acquisition procedure. The intended sentence is a factual or definition sentence. The hypothesis for active factual and definition sentences used in this paper takes the following generic forms.

A simple past sentence is supposed to have the format:

Subject + the main verb in simple past form + object or sentence completion    (1)

Auxiliary verbs are excluded from the form above since their rules differ from those of the simple past tense. The sentence hypothesis with the auxiliary verbs was and were as the main verb is:

Subject + the verb was or were + sentence completion    (2)

The simple present sentence hypothesis is similar to the simple past form except for the main verb:

Subject + the main verb in simple present form + object or sentence completion    (3)

and the hypothesis for the auxiliary verbs is and are is similar to the one for was and were:

Subject + the verb is or are + sentence completion    (4)

The only subject type chosen here is the proper noun, because facts and definitions are closely associated with this type of noun. The hypothesis for the tense of the main verb of the sentence is:
(1) It should be the first verb after the subject.
(2) An auxiliary verb should not be followed by a past participle, because this makes the sentence passive, which is not covered by the proposal in this paper.
(3) An auxiliary verb should not be followed by a present participle, since this makes the tense continuous, which is not needed in our hypothesis.
(4) Simple past and present tense verbs should not be followed by prepositions, as prepositional forms are not included in this proposal.

3.5 Syntactic Analysis for the Sentence

The normal procedure in QA systems is a syntactic analysis of the question, based on the hypothesis set for the question, followed by extraction of the information that enables the system to detect the best answer. The procedure proposed in this paper is to analyze the sentence (the answer) according to the built rules and then to generate the question according to the question hypothesis set in advance. The NLTK and NLTK-NE modules are used to analyze and filter the sentences and to acquire the desired ones according to the hypothesis stated above. Verb tense recognition exists in NLTK and is quite straightforward to perform, but differentiating between the verbs in order to filter them needs hard coding. The POS tags are mainly used to recognize the tense of the main verb, and further processing is then needed to separate the required verbs from the eliminated ones. The diagram in Fig. 2 shows sentence analysis with regard to the hypothesis built above and the NLTK analysis concepts.

[Figure: tree diagram of (Wh) questions.
Past:
  was, were: Who / Where / What + was / were + NE + ?
  Simple past: What + did + NE + the verb in simple present form + ?
Present:
  is, are: Who / Where / What + is / are + NE + ?
  Simple present: What + does + NE + the verb in simple present form + ?]

Fig. 2. The analysis of ‘Wh’ factoid questions with regard to verb tense and Named Entity type.
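The verb rules of Sect. 3.4 could be checked on POS-tagged tokens roughly as follows. This is a hedged sketch, not the authors’ code: it assumes Penn Treebank tags (the tag set produced by NLTK’s default tagger), it simplifies rule (1) by requiring the main verb to follow the subject immediately, and it treats the tags IN and TO as prepositions. The function name and the category names are hypothetical.

```python
def classify_sentence(tagged, subject_len):
    """Return the sentence category, or None if the sentence is rejected.

    tagged: list of (word, Penn Treebank tag) pairs for one sentence
    subject_len: number of tokens in the named-entity subject at the start
    Categories (hypothetical names): "was_were", "is_are",
    "simple_past", "simple_present".
    """
    rest = tagged[subject_len:]
    if not rest:
        return None
    verb, tag = rest[0]                      # simplification of rule (1)
    following = rest[1][1] if len(rest) > 1 else None
    verb = verb.lower()
    if verb in ("was", "were", "is", "are"):
        if following in ("VBN", "VBG"):      # rules (2) and (3): passive, continuous
            return None
        return "was_were" if verb in ("was", "were") else "is_are"
    if tag in ("VBD", "VBZ", "VBP"):
        if following in ("IN", "TO"):        # rule (4), with TO as an assumption
            return None
        return "simple_past" if tag == "VBD" else "simple_present"
    return None
```

Other auxiliaries such as has or did are not handled separately here; a fuller filter would need to exclude them as well.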

4 Question Hypothesis

The detailed diagram of question analysis according to verb tense is shown in Fig. 3. A ‘Wh’ question is generated from each hypothesis based on the selected answers. Factoid (factual) or definition questions are generated according to verb tense and Entity Name class. The hypothesis is as follows. The generic form of the question in the case of the simple past tense is:

What + did + NE + the verb in simple present form + ‘?’    (5)

The question format for the auxiliary verbs was and were differs from the simple past form:

{Who | What | Where} + {was | were} + NE + ‘?’    (6)

where the question word ‘Who’ is used if the Entity Name type is PERSON or ORGANIZATION, ‘What’ is used if the type is FACILITY, and ‘Where’ is used to ask about a LOCATION or GPE.

The question hypothesis for the simple present tense is similar to the one for the simple past, except that the auxiliary verb did is replaced by does:

What + does + NE + the verb in simple present form + ‘?’    (7)

Questions with is and are have the same format as those with was and were:

{Who | What | Where} + {is | are} + NE + ‘?’    (8)

The answer to each type of question must be of the same type as considered in the sentence hypothesis stated above.
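The four question forms can be sketched as a single Python function. The function name, the category names and the parameter layout are assumptions for illustration. The sketch also assumes that the simple present form of the main verb is supplied by the caller, since deriving it from a past-tense verb requires a lemmatizer that is outside this example.

```python
# Question word per Entity Name class, as stated in the hypothesis above.
QUESTION_WORDS = {
    "PERSON": "Who",
    "ORGANIZATION": "Who",
    "FACILITY": "What",
    "LOCATION": "Where",
    "GPE": "Where",
}


def generate_question(entity, entity_class, category, verb):
    """Build a 'Wh' question for one sentence category.

    category: "was_were", "is_are", "simple_past" or "simple_present"
              (hypothetical names for the four cases)
    verb: the auxiliary found in the sentence for the first two categories,
          otherwise the main verb in its simple present form
    """
    if category in ("was_were", "is_are"):
        return f"{QUESTION_WORDS[entity_class]} {verb} {entity}?"
    if category == "simple_past":
        return f"What did {entity} {verb}?"
    if category == "simple_present":
        return f"What does {entity} {verb}?"
    raise ValueError(f"unknown category: {category}")
```

For instance, a PERSON subject in the was/were category yields a question of the form “Who was David Beckham?”, and the original sentence is stored as its answer.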

[Figure: tree diagram of active tense sentences.
Past:
  was, were: NE + was / were + sentence completion
  Simple past: NE + verb + object
Present:
  is, are: NE + is / are + sentence completion
  Simple present: NE + verb + object]

Fig. 3. Factual sentence analysis relating to verb tense.

5 Source of Error

The assumptions and hypotheses considered in the theoretical part are not overly complicated. The rules are simplified to minimize the errors that may result from excessive complexity. Errors have nevertheless been found in the results, and their source is not the theoretical propositions. They are mainly due to the implementation, where tools are used to analyze the text or the sentences. The NLTK and NLTK-NE modules in the Python library are very useful tools for analyzing the sentences syntactically and identifying the NEs, but both have errors in their databases. For instance, NLTK-NE recognizes the word Please at the beginning of a sentence as a named entity, and the NLTK database considers the verb saw to be both the present and the past of the verb see. These types of defects in the tools cause many of the errors observed in the results.


Fig. 4. The main diagram of proposed QA production system.

6 SQLite Database

The data resulting from the processing need to be saved into a database for storage and later evaluation. An SQLite database has been built to hold the resultant QA pairs and the cosine similarity scores between each question and its answer.
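A minimal sketch of such a store, using Python’s standard sqlite3 module, is given below. The schema is an assumption: the table and column names are hypothetical, and a single table with a category column is used for brevity instead of separate tables per question category.

```python
import sqlite3


def create_qa_database(path=":memory:"):
    """Open (or create) the database and make sure the QA table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS qa_pairs (
               id INTEGER PRIMARY KEY,
               category TEXT NOT NULL,
               question TEXT NOT NULL,
               answer TEXT NOT NULL,
               cosine REAL
           )"""
    )
    return conn


def store_pair(conn, category, question, answer, cosine):
    """Insert one QA pair with its cosine similarity score."""
    conn.execute(
        "INSERT INTO qa_pairs (category, question, answer, cosine) VALUES (?, ?, ?, ?)",
        (category, question, answer, cosine),
    )
    conn.commit()


def answers_for(conn, question):
    """Return every stored answer for an exact question string."""
    rows = conn.execute("SELECT answer FROM qa_pairs WHERE question = ?", (question,))
    return [row[0] for row in rows]
```

The parameterized queries (the ? placeholders) keep text scraped from the web from being interpreted as SQL.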

7 Proposed System

The proposed system begins with a web crawler, which accesses web pages and retrieves plain text from the web, starting from a desired URL called the start/seed URL. Buffering is used to avoid storage limitation problems: the buffer enables the crawler to keep the number of pages within the limits of computer memory by controlling the generation of new pages. Pre-processing is applied to the HTML code to extract the plain text. Further manipulation is then applied to the resultant text to filter out redundant items such as stop words, non-English letters and words, and punctuation. NLTK operations are applied to split (tokenize) the text into sentences on separate lines, and each sentence is then tokenized into a group of words. The split words are then POS-tagged with their parts of speech. NLTK Named Entity Recognition (NER) is used to identify the sentences with proper noun subjects, and the verb tense is identified in order to set the verb form. The question and answer pairs and their cosine similarities are produced, and the results are then loaded into an SQLite database. The diagram of the proposed system is shown in Fig. 4. The system is implemented in the Python programming language, and the implementation details are presented below.
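The crawling stage can be illustrated with the following sketch, which uses only the Python standard library. It is an assumption-laden outline rather than the implemented crawler: the names are hypothetical, the page limit max_pages stands in for the buffering described above, an in-memory queue stands in for the list of URLs still to visit, and the download function is passed in by the caller so that the loop can be tried without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collect the absolute URLs of all <a href="..."> links in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))


def crawl(seed_url, fetch, max_pages=50):
    """Breadth-first crawl from seed_url; returns {url: html}.

    fetch: callable taking a URL and returning its HTML as a string;
           it may raise an exception when the URL is unavailable.
    """
    to_visit = deque([seed_url])
    seen = {seed_url}
    pages = {}
    while to_visit and len(pages) < max_pages:
        url = to_visit.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # unavailable URL: skip it and carry on
        pages[url] = html
        parser = LinkParser(url)
        parser.feed(html)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                to_visit.append(link)
    return pages
```

In real use, fetch could wrap urllib.request.urlopen and decode the response; plain-text extraction from the returned HTML is a separate step.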

Automatic Web-Based Question Answer Generation System


The web crawler for the proposed system begins with a seed URL of a page associated with a desired figure or object. The seed URL is used to make a request to the associated web page and then to receive the HTML document with the page data. URLs are extracted by parsing the HTML document and saved in a (To Visit) file. Try and Catch is used to track the saved URLs in order to check the availability of each of them and then to visit the associated new web pages. Plain text is extracted from the web pages of existing URLs and saved into a text file which is processed in the next stage. The process carries on up to the last URL in the (To Visit) file. The diagram in Fig. 5 shows the flow of the web crawler operation. The text already retrieved from the web is then read for processing. The plain text may still contain undesired residue after the HTML has been filtered out. For example, a u prefix appears before each string, marking it as Unicode. Therefore, the text is encoded to ASCII to make it easier to handle. The text is then broken down into sentences using the NLTK sentence tokenizing operation. The resultant sentences are filtered to remove redundant English symbols and punctuation, in addition to non-English letters and symbols. The filtered sentences are split into words by the NLTK word tokenize operation and each word is tagged with a part-of-speech (POS) label. Sentences with fewer than three words are not useful in this system because two words do not make a complete definition sentence. Therefore, sentences that are too short (fewer than three words) or too long (more than 20 words) are filtered out before or after POS-tagging. The main idea that we rely on to identify the subject is detecting a named entity at the beginning of the sentence as the subject. Following this concept, only the sentences that begin with a named entity are used to extract question forms.
Hence, the proper noun named entity is determined for each sentence using NLTK named entity recognition (NLTK-NE). NLTK-NE normally identifies only single-word named entities and does not recognize a multiple-word named entity as a single entity. To deal with this defect, a function has been written to detect and join a continuous (chain) named entity from the multiple entities that NLTK-NE nominates. To generate the proposed form of the questions, the named entity (subject) and the verb tense should be known. The verb tense is determined in the stage of finding the named entity subject. Based on the verb tense, the sentence is processed by one of the four question-generating functions in the implemented software. If the verb tense is past and the verb is was or were, the sentence goes to the function that generates questions of that category. If the verb is neither was nor were, the tense is simple past and the sentence goes to the function that produces simple past category questions. The present tense is also divided into two categories: the is and are category and the simple present category. Each present tense category also has a function that extracts the question from the sentence. The resultant QA pairs are then placed in an SQLite database that is designed to maintain the knowledge base of a Chatbot. Cosine similarity is calculated for each QA pair so as to know the similarity distance between the generated question and the original sentence, which is the answer. The Python modules used in the implemented program are: re (regular expressions), urllib (for web URLs), BeautifulSoup, ngram and sqlite3, in addition to NLTK. The flow diagram in Fig. 6 demonstrates the sequence of operations to treat plain text, generate questions from sentences and produce QA pairs.
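The chain named entity step and the four-way dispatch on verb tense can be sketched as below. The sketch makes assumptions that the paper does not state: each token arrives as a hypothetical `(word, tag, is_named_entity)` triple with Penn Treebank tags, the first verb after the subject decides the category, the question word is always Who or What, and base verb forms come from a small hypothetical lookup table (`base_forms`) instead of a real lemmatizer. The actual rule set is richer than this.

```python
BE_PRESENT = {"is", "are"}
BE_PAST = {"was", "were"}


def leading_entity(tagged):
    """Return the run of named-entity words that opens the sentence (the chain NE)."""
    words = []
    for word, _tag, is_ne in tagged:
        if not is_ne:
            break
        words.append(word)
    return words


def make_question(tagged, base_forms):
    """Map one tagged sentence to (category, question), or None if no rule applies."""
    entity = leading_entity(tagged)
    if not entity:
        return None  # only sentences that open with a named entity are used
    subject = " ".join(entity)
    # First verb after the subject; VBD, VBZ and VBP are Penn Treebank verb tags.
    verb = next(
        ((w.lower(), t) for w, t, _ in tagged[len(entity):] if t.startswith("VB")),
        None,
    )
    if verb is None:
        return None
    word, tag = verb
    if word in BE_PRESENT:
        return ("is_are", f"Who {word} {subject}?")
    if word in BE_PAST:
        return ("was_were", f"Who {word} {subject}?")
    if tag == "VBD":
        return ("simple_past", f"What did {subject} {base_forms.get(word, word)}?")
    if tag in ("VBZ", "VBP"):
        return ("simple_present", f"What does {subject} {base_forms.get(word, word)}?")
    return None


# Hypothetical lemma table and hand-tagged sentences used only for illustration.
base_forms = {"chose": "choose", "became": "become", "helped": "help"}
chose = [("Beckham", "NNP", True), ("chose", "VBD", False), ("to", "TO", False),
         ("wear", "VB", False), ("number", "NN", False)]
rift = [("Man", "NNP", True), ("Utd", "NNP", True), ("play", "VBP", False),
        ("down", "RP", False), ("Arsenal", "NNP", True), ("rift", "NN", False)]
club = [("Tottenham", "NNP", True), ("Hotspur", "NNP", True), ("was", "VBD", False),
        ("the", "DT", False), ("first", "JJ", False), ("club", "NN", False)]

q_chose = make_question(chose, base_forms)
q_rift = make_question(rift, base_forms)
q_club = make_question(club, base_forms)
```

Because the rule looks only at the leading entity and the verb, it produces a question even when the entity is not a person, which is one way a pair such as the Tottenham Hotspur example can come out wrong.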


8 Results The example results are four tables in an SQLite database. The tables present QA pairs for four categories of factoid or definition sentence classes. The text extracted from 100 web pages (URLs) is processed to produce 12 QA pairs in the is and are category (examples are shown in Table 1).

Table 1. Is and Are QA group examples
No. | Question | Answer
1 | Who is Elton John? | Elton John is godfather to Brooklyn and Romeo Beckham their godmother is Elizabeth Hurley
2 | Where is Beckham? | Beckham is currently playing Major League Soccer for LA Galaxy

In the simple past category, 36 QA pairs are generated and examples are shown in Table 2.

Table 2. Simple past QA group examples
No. | Question | Answer
1 | What did Beckham choose? | Beckham chose to wear number
2 | What did Beckham become? | Beckham became only the fifth Englishman to win caps
3 | What did David help? | David helped launch our Philippines Typhoon children s appeal which raised in the UK alone

Only two QA pairs result for the simple present category, as demonstrated in Table 3.

Table 3. Simple present QA group
No. | Question | Answer
1 | What does Greatest Britons award? | Greatest Britons awards The Celebrity number
2 | What does Man Utd play? | Man Utd play down Arsenal rift

Thirteen QA pairs are created within the was and were category and examples of the results are shown in Table 4. The example QA pairs include the mistaken ones.

Table 4. Was and Were group
No. | Question | Answer
1 | Who was Beckham? | Beckham was a Manchester United mascot for a match against West Ham United in
2 | Who was Tottenham Hotspur? | Tottenham Hotspur was the first club he played for
3 | Who was Ryan Giggs? | Ryan Giggs The 39-year-old was the first of Fergie s Fledglings

Fig. 5. The implemented web crawler.

Fig. 6. The implemented steps to treat plain text to generate questions from sentences and produce QA pairs.


9 Comparative System We give the comparative system in [9] the abbreviation GNLQ and our system the abbreviation AWQDG. These abbreviations are derived from the titles of the comparative paper and of this work. The approach in [9] has been selected as the comparative system to assess our hypothesis for the following reasons:
(1) The approach in GNLQ uses single sentences to generate questions, which is quite similar to our system.
(2) GNLQ performs target identification by determining which specific words or phrases to ask about, which is similar to our narrowing of the selection to a specific subject type (True Noun Named Entity) and a specific verb tense.
(3) GNLQ generates template-based questions and to some extent uses syntactic and/or semantic information to select the sentences or generate questions. Our approach uses semantic features and verb tense types to select the sentences and generate questions in a form similar to the template-based category.
(4) GNLQ generates questions to support learning on-line, and our system generates questions for a conversational agent that can be a tutor teaching the user about a figure or an object it simulates. This gives us another justification for the comparison.
(5) GNLQ does not simplify the selected sentences before generating questions from them, i.e. it does not cut words or phrases from the selected sentence. It uses the predicates of a sentence to generate a question.
(6) GNLQ focuses on generating specific kinds of questions and selects only the sentence targets appropriate for those kinds of questions. Although the kinds of questions generated in our approach are fundamentally different, similar comparisons can still be made.

10 Experimental Evaluation The experiment starts by applying the rule set to select particular sentences from pre-processed text retrieved from Wikipedia. The rule set is then used to generate questions from these carefully selected sentences. The resultant QA pairs are then compared with those of the comparative QA system, which has been incorporated into our system. The results of both systems are assessed by subjective testing. The subjects are PhD students in different research areas, i.e. Computer Science, Electronic Engineering, Linguistics, and Mathematics, at the University of Essex. The study included 34 participants, both male and female, distributed into two equal groups. The participants were chosen and split into groups following the within-group and between-group theory [14]. One of the two groups assessed our system and the other evaluated the work of the comparative system. A questionnaire was filled in by each participant to allow us to measure the level of accuracy achieved by our Question Generation system. The results of this questionnaire are calculated by aggregating the participants' responses.


11 Evaluation Results After the data classes had been calculated for both AWQDG and GNLQ, a Python program was used to calculate the precision value for each part in each group of the two compared systems. Precision is calculated for Question, Answer, and QA pair match for each of the groups is and are, simple past, simple present, and was and were, in both AWQDG and GNLQ. The precision results were recorded and then entered into a MATLAB program to produce comparative bar graphs between AWQDG and GNLQ. The graphs are as follows. The bar graph in Fig. 7 shows the precision levels of Question, Answer, and QA pair match in the is and are group for both AWQDG and GNLQ. The graph shows that the precision levels of AWQDG and GNLQ are close. However, AWQDG exceeds GNLQ by 3 points in Questions (0.96 for the former and 0.93 for the latter) and also by 3 points in QA match (0.98 for the former and 0.95 for the latter). In contrast, GNLQ exceeds AWQDG in Answers by 10 points (0.67 for the former and 0.57 for the latter). The bar graph in Fig. 8 illustrates the precision levels of Question, Answer, and QA pair match in the simple past group for both AWQDG and GNLQ.

Fig. 7. Precision comparison between AWQDG and GNLQ (is and are group).

The graph again shows close precision levels for AWQDG and GNLQ, with AWQDG ahead of GNLQ in Answers and QA pair match: 0.96 and 0.95 respectively for the former against 0.89 and 0.92 respectively for the latter. AWQDG thus exceeds GNLQ by 7 points in Answers and by 3 points in QA pair match. On the other hand, GNLQ leads by 1 point in Questions (0.89 for AWQDG and 0.90 for GNLQ).


Fig. 8. Precision comparison between AWQDG and GNLQ (simple past group).

The precision levels of Question, Answer, and QA pair match for the simple present group of both AWQDG and GNLQ are shown as a bar graph in Fig. 9. The graph shows a significant lead of 84 points for AWQDG over GNLQ in the Questions part, where the values are 0.95 for the former and 0.11 for the latter. The two systems are equal in Answers, where the value is 0.11 for each. AWQDG also exceeds GNLQ in QA pair match by 3 points (0.97 for the former and 0.94 for the latter).

Fig. 9. Precision comparison between AWQDG and GNLQ (simple present group).

Figure 10 presents the bar graph of the precision levels of Question, Answer, and QA pair match for the was and were group in AWQDG against GNLQ. The bar graph shows a considerable lead in the Questions precision of AWQDG over GNLQ, by 68 points (0.93 for the former and 0.25 for the latter). AWQDG also beats GNLQ in QA pair match by 27 points, with values of 0.94 and 0.67 respectively. GNLQ, however, exceeds AWQDG in the Answer part by 24 points, with values of 0.91 and 0.67 respectively.


Fig. 10. Precision comparison between AWQDG and GNLQ (was and were group).

The comparison of overall precision levels for AWQDG and GNLQ is shown in the bar graph of Fig. 11. The graph shows a remarkable advantage of AWQDG over GNLQ in the was and were group, by 33 points (0.89 for AWQDG and 0.56 for GNLQ), and in the simple present group, by 17 points (0.67 for the former and 0.50 for the latter). Even so, GNLQ overtakes AWQDG in the is and are group by 5 points (0.88 for GNLQ and 0.83 for AWQDG) and in the simple past group by 2 points (0.96 for GNLQ and 0.94 for AWQDG).

Fig. 11. Precision comparison between AWQDG and GNLQ (overall of the four groups).

It is noticeable from the bar graph in Fig. 12 that AWQDG exceeds GNLQ in overall Questions and QA pair matches. The graph shows a lead of 12 points in Questions for AWQDG over GNLQ (0.93 for the former and 0.81 for the latter). The graph also shows that AWQDG leads GNLQ in the QA pair match part by 7 points, with values of 0.96 and 0.89 respectively. However, GNLQ leads AWQDG in the Answers part by 8 points (0.90 for the former and 0.82 for the latter).


Fig. 12. Precision comparison between AWQDG and GNLQ (overall of Questions, Answers, QA match).

Overall, the precision value recorded for our question generation system in the subjective assessment was 0.91, whereas an overall precision value of 0.86 was obtained for the comparative system, which was adapted to our system and applied to our produced data set using the same evaluation method. The overall values show a clear advantage of 5 points for our system over the comparative system. An improvement is also shown in Questions and QA pair match, which means that our system AWQDG generates more answerable questions than the comparative system GNLQ.
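The point differences quoted in this section can be checked with a few lines of Python. The sketch below is only illustrative: the paper does not give its precision formula, so `precision` assumes the usual ratio of items judged correct to all items judged, and the counts passed to it are hypothetical. The values passed to `point_gap` are the precision figures reported above.

```python
def precision(accepted, rejected):
    """Share of judged items that were accepted (assumed definition)."""
    total = accepted + rejected
    return accepted / total if total else 0.0


def point_gap(first, second):
    """Difference between two precision values, in points of 0.01."""
    return round((first - second) * 100)


example = precision(45, 5)           # hypothetical counts, not from the paper

overall_gap = point_gap(0.91, 0.86)   # AWQDG vs GNLQ, overall
question_gap = point_gap(0.93, 0.81)  # AWQDG vs GNLQ, Questions
qa_gap = point_gap(0.96, 0.89)        # AWQDG vs GNLQ, QA pair match
answer_gap = point_gap(0.82, 0.90)    # AWQDG vs GNLQ, Answers (negative: GNLQ leads)
```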

12 Conclusion In this paper two main contributions are presented. The first contribution is generating factual questions from existing factual sentences. Plain text has been extracted from 100 URLs starting from the Wikipedia page of the footballer David Beckham. Factual sentences have been extracted from the plain text after pre-processing. Named Entity (proper noun) Recognition (NER) and verb tense recognition are used to identify the factual sentence category. Specific rules are built to categorize the sentences and then to generate questions and categorize them. The newly built database is used as knowledge for the new born Chatbot, which can simulate the personality of a desired figure or the behavior of an object and improve over time. Four categories of QA pairs are produced and examples of these categories are presented. A comparative system has been incorporated into our system, using our produced dataset, and compared with our system. A subjective test to validate the QA pairs is performed and the evaluation stage is implemented after the subjective assessment of the two systems. The overall precision levels obtained in the subjective assessment show an improvement of 5 percentage points for our system over the comparative system. The results also show a clear lead for our system in the Question and QA pair match categories, which means that our system produces more answerable questions from a sentence than the comparative system. The resultant QA pairs produced by our question generation system are put into an SQLite database to be used as part of the knowledge base for our Online Feedable New born Conversational Agent.

References
1. Huang, J., Zhou, M., Yang, D.: Extracting chatbot knowledge from online discussion forums, pp. 423–428 (2007)
2. Wu, Y., Wang, G., Li, W., Li, Z.: Automatic chatbot knowledge acquisition from online forum via rough set and ensemble learning, pp. 242–246 (2008)
3. Haller, E., Rebedea, T.: Designing a chatbot that simulates an historical figure, pp. 582–589 (2013)
4. Cong, G., Wang, L., Lin, C.-Y., Song, Y.-I., Sun, Y.: Finding question-answer pairs from online forums, pp. 467–474 (2008)
5. Wu, F., Weld, D.S.: Open information extraction using Wikipedia, pp. 118–127 (2010)
6. Matsuyama, Y., Saito, A., Fujie, S., Kobayashi, T.: Automatic expressive opinion sentence generation for enjoyable conversational systems. IEEE/ACM Trans. Audio Speech Lang. Process. 23(2), 313–326 (2015)
7. Huang, Y., He, L.: Automatic generation of short answer questions for reading comprehension assessment. Nat. Lang. Eng. 22(03), 457–489 (2016)
8. Zhang, L., VanLehn, K.: How do machine-generated questions compare to human-generated questions? Res. Pract. Technol. Enhanc. Learn. 11(1), 1 (2016)
9. Lindberg, D., Popowich, F., Nesbit, J., Winne, P.: Generating natural language questions to support learning on-line. In: ENLG 2013, p. 105 (2013)
10. Abdul-Kader, S.A., Woods, J.: Survey on chatbot design techniques in speech conversation systems. IJACSA 6(7), 72–80 (2015)
11. Peñas, A.R., Sama, V., Verdejo, F.: Testing the reasoning for question answering validation. J. Logic Comput. 18(3), 459–474 (2008)
12. Ferret, O., Grau, B., Hurault-Plantet, M., Illouz, G., Jacquemin, C., Monceaux, L., Robba, I., Vilnat, A.: How NLP can improve question answering. Knowl. Organ. 29(3/4), 135–155 (2002)
13. Chinchor, N., Robinson, P.: MUC-7 named entity task definition, p. 29 (1997)
14. Almuhaimeed, A.: Enhancing Recommendations in Specialist Search Through Semantic-Based Techniques and Multiple Resources. University of Essex (2016)

Application of Principal Component Analysis (PCA) and SVMs for Discharges Radiated Fields Discrimination
Mohamed Gueraichi, Azzedine Nacer, and Hocine Moulai
Laboratory of Electrical and Industrial Systems, Faculty of Electronic and Computer Science, University of Sciences and Technology Houari Boumediene (USTHB), 32, El Alia, 16111 Bab Ezzouar, Algiers, Algeria
{mgueraichi,anacer,hmoulai}@usthb.dz

Abstract. This paper proposes a fast method for discriminating magnetic signals radiated from electrical discharges occurring in the string insulators of overhead lines. Two types of electrical discharges exist: dangerous ones, which can lead to arcing discharges, and harmless ones, which self-extinguish. As a new method of discriminating and classifying partial discharges, we propose principal component analysis (PCA) combined with support vector machines (SVMs), which have proved their robustness in several disciplines. The database used is composed of two classes: the majority class contains 130 harmless radiated magnetic field signals, while the minority class contains 31 dangerous signals. The learning and test sets correspond respectively to 2/3 and 1/3 of the database. Applying the SVMs alone to the test set shows that no dangerous signal is detected, because the two classes are unbalanced. We were therefore led to apply Principal Component Analysis (PCA) before classification, which made it possible to select the most relevant variables. The results show that by using PCA and then SVMs, the detection rate of dangerous signals is 90%.
Keywords: Partial discharges (PD) · Insulation systems · Support Vector Machines (SVMs) · Principal component analysis (PCA)

1 Introduction Partial electrical discharges present in dielectrics, especially liquids and gases, are indicative of the vulnerability of the latter [1–7]. In fact, they are the main way to monitor the electrical failure of the equipment in which these dielectrics are used. Up to now, neural networks have been widely used to classify partial discharges into dangerous and harmless types [4, 6]. Although this type of classifier is efficient [2, 3, 5, 8], it requires a large number of parameters to be set, which makes it necessary to find another robust alternative. Support Vector Machines (SVMs) are recent classifiers that are widely used in other disciplines and require the adjustment of only two parameters.

© Springer Nature Switzerland AG 2019 K. Arai et al. (Eds.): SAI 2018, AISC 858, pp. 99–108, 2019. https://doi.org/10.1007/978-3-030-01174-1_8


M. Gueraichi et al.

In the following, we will initially apply the SVMs directly to the original signals of our database, which consists of a total of 161 signals of 25,000 points each: 130 are radiated magnetic field signals related to non-dangerous partial electrical discharges and 31 are related to partial electrical discharges which can lead to arcing breakdown. In a second step, we will apply principal component analysis to the original signals before the application of the SVMs. The rest of this article is organized as follows: in Sect. 2, we give a brief overview of SVMs, followed by Sect. 3 where we describe Principal Component Analysis (PCA). In Sect. 4 we present the experimental results and their interpretation. Finally, we conclude on the work done and the prospects envisaged.

2 General View on SVMs Classifiers Support Vector Machine classifiers (SVMs) were originally introduced by Vapnik et al. [9, 10]. They separate two classes by means of an optimal hyperplane, learned from a set $\{(x_i, y_i)\}_{i=1}^{n}$, where $x_i \in \mathbb{R}^m$ are the $n$ learning examples and $y_i \in \{-1, +1\}$ are the class labels. The aim is to find a decision function $f$ which defines the optimal separation hyperplane and maximizes the margin of separation between the closest data of the two classes (called support vectors), which minimizes the classification error for the learning set as well as for the test set. Very often, the data are not linearly separable, which is why a kernel function $K(x, x_k)$ is used [11]. The decision function $f$ is expressed as follows:

$$f(x) = \operatorname{sign}\left( \sum_{k=1}^{S_v} \alpha_k \, y_k \, K(x, x_k) + b \right) \tag{1}$$

where $\alpha_k$ are the Lagrange multipliers and $S_v$ is the number of support vectors $x_k$, which are the learning data such that $0 \le \alpha_k \le C$. Here $C$ is the regularization parameter, a trade-off between maximizing the margin and minimizing the number of training errors on non-separable points. Finally, $x$ and $b$ represent respectively the feature vector of the data and the bias, which is the translation of the hyperplane with respect to the origin. Note that the kernel $K(x, x_k)$ must satisfy the Mercer conditions. There are three common types of kernel functions [9, 11]: polynomial, sigmoid, and RBF. The radial basis function (RBF) kernel is the simplest and the easiest to implement compared to the other kernels, and moreover gives the best results in the field of pattern recognition [9, 12]. This is the reason for its use in the present work. Its expression is:

$$\mathrm{RBF}(x, x_k) = \exp\left( -\frac{1}{2\sigma^2} \, \| x - x_k \|^2 \right)$$

where $\sigma$ is the kernel parameter. The optimal hyper-parameters $(\sigma, C)$ must then be adjusted and selected to ensure that a detection rate of 100% is obtained for both dangerous and harmless radiated field signals on the training set. The optimal pair $(\sigma_{opt}, C_{opt})$ is saved and then used for the final evaluation on the test signals.
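To make Eq. (1) and the RBF kernel concrete, the following sketch evaluates the decision function for given support vectors. It is not a training procedure: the support vectors, multipliers, bias and kernel width below are hypothetical values chosen for illustration, whereas in practice they would come from the SMO optimization. The convention that a zero sum maps to +1 is also an assumption.

```python
import math


def rbf_kernel(x, xk, sigma):
    """RBF(x, xk) = exp(-||x - xk||^2 / (2 * sigma^2))."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, xk))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def decision(x, support_vectors, alphas, labels, b, sigma):
    """f(x) = sign(sum_k alpha_k * y_k * K(x, x_k) + b), with sign(0) taken as +1."""
    total = sum(
        alpha * y * rbf_kernel(x, xk, sigma)
        for alpha, y, xk in zip(alphas, labels, support_vectors)
    ) + b
    return 1 if total >= 0 else -1


# Hypothetical model: one support vector per class in a 2-D feature space.
support_vectors = [(0.0, 0.0), (2.0, 2.0)]
labels = [-1, +1]
alphas = [1.0, 1.0]
bias = 0.0
sigma = 1.0

near_first = decision((0.1, 0.1), support_vectors, alphas, labels, bias, sigma)
near_second = decision((1.9, 2.0), support_vectors, alphas, labels, bias, sigma)
```

A small $\sigma$ makes each support vector influence only its close neighbourhood, while a large $\sigma$ smooths the decision boundary, which is why $\sigma$ has to be tuned together with $C$.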

Application of Principal Components Analysis (PCA) and SVMs


3 Principal Component Analysis Principal Component Analysis was originally introduced by Pearson (1901). It is a statistical technique for data compression and reduction, in which useful information is enhanced and the signal-to-noise ratio is improved [13, 14]. The idea is to move from a starting space, i.e. an original vector $X = (x_1, x_2, \ldots, x_n)'$ whose components $x_i$ are strongly correlated with each other, to a new vector $Y = (y_1, y_2, \ldots, y_p)'$ in the arrival space whose components are decorrelated, where $p \le n$. The PCA has the following properties:
(a) The covariance matrix of the new system is diagonal, i.e. the components are pairwise orthogonal: $\mathrm{Cov}(y_i, y_j) = 0 \;\; \forall\, i \ne j$.
(b) The variances are maximal and ordered, $\mathrm{Var}(y_1) > \mathrm{Var}(y_2) > \ldots > \mathrm{Var}(y_p)$, so the components are of decreasing importance.
The principal components $y_1, y_2, \ldots, y_p$ are entirely determined by the covariance matrix of the arrival space, which is diagonal. So $Y = G X$, where $G$ is the transformation matrix between the two spaces. Recall that by definition the covariance matrix of the input space is

$$\Sigma_X = E\{(X - m_X)(X - m_X)'\},$$

where $m_X$ is the arithmetic mean, $m_X = E(X) = [m_{x_1}, m_{x_2}, \ldots, m_{x_n}]'$. Similarly, the covariance matrix of the output space is

$$\Sigma_Y = E\{(Y - m_Y)(Y - m_Y)'\},$$

where $m_Y$ is the arithmetic mean, $m_Y = E(Y) = [m_{y_1}, m_{y_2}, \ldots, m_{y_p}]'$. After transformation, one obtains

$$\Sigma_Y = E\{(Y - m_Y)(Y - m_Y)'\} = E\{(G X - G m_X)(G X - G m_X)'\} = G \, E\{(X - m_X)(X - m_X)'\} \, G',$$

therefore we have

$$\Sigma_Y = G \, \Sigma_X \, G',$$

where the diagonal elements of $\Sigma_Y$ are the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_p$, with $\mathrm{Var}(y_i) = \lambda_i$ and $\lambda_1 > \lambda_2 > \ldots > \lambda_p$. We can note that $\sum_i \mathrm{Var}(x_i) = \mathrm{Trace}(\Sigma_X)$ and that the total variance associated with the original variables $[x_1, x_2, \ldots, x_n]$ is retained after transformation into the principal components $[y_1, y_2, \ldots, y_p]$; in fact it is redistributed over them, with the greatest proportion in the first components. The proportion of the total variance expressed by each component is given by $\lambda_k / \sum_{i=1}^{p} \lambda_i$ for $k = 1, 2, \ldots, p$.


The quantity $\lambda_1 / \sum_{i=1}^{p} \lambda_i$ expresses the variability of the first component with respect to the whole set of components; likewise, $(\lambda_1 + \lambda_2) / \sum_{i=1}^{p} \lambda_i$ represents the variability of the first two components with respect to the set of components, and so on. The variability (expressed in %) is therefore often plotted as a function of the number of components, which makes it possible to filter out a few components, in particular the last ones, which are highly noisy. Generally, 90% variability is enough to fix the number of principal components that one wants to use.
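The cumulative variability and the 90% rule can be sketched in a few lines. The eigenvalues below are hypothetical and serve only to show the computation; the function names `cumulative_variability` and `components_needed` are likewise illustrative.

```python
def cumulative_variability(eigenvalues):
    """Cumulative share of total variance after sorting eigenvalues in decreasing order."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    shares = []
    running = 0.0
    for lam in ordered:
        running += lam
        shares.append(running / total)
    return shares


def components_needed(eigenvalues, threshold=0.90):
    """Smallest number of leading components whose cumulative share reaches the threshold."""
    for k, share in enumerate(cumulative_variability(eigenvalues), start=1):
        if share >= threshold:
            return k
    return len(eigenvalues)


# Hypothetical eigenvalues of a covariance matrix (not taken from the paper).
eigenvalues = [2.0, 5.0, 0.3, 2.5, 0.2]
shares = cumulative_variability(eigenvalues)
k = components_needed(eigenvalues)
```

With these values the first three components already carry 95% of the total variance, so the last two would be dropped.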

4 Experimental Results This section presents the experimental setup for recording signals from partial electrical discharges, then the application of SVMs alone for signal classification, and finally the application of PCA followed by SVMs for size reduction and classification.

4.1 Database

The experimental set-up used to record the radiated magnetic field signals allowed us to build the database by applying voltages from 63 kV to 80 kV, with increasing steps of 2 kV/s, to a high-voltage composite chain insulator used in electrical power and railway transportation systems. At 63 kV, the fault signals DS1 to DS12 appear; at 65 kV: DS13 to DS24; at 67 kV: DS25 to DS46; at 69 kV: DS47 to DS57; at 71 kV: DS58 to DS73; at 73 kV: DS74 to DS90; at 75 kV: DS91 to DS107; at 77 kV: DS108 to DS138; and finally at 80 kV: DS139 to DS161. In total, 161 radiated field signals are associated with the partial electrical discharges of the insulation system. The 161 signals obtained were organized in the form of .CSV files and each signal is represented by 25,000 points. Our database is thus formed mainly of signals associated with discharges at relatively low voltages, which are not dangerous, whereas the dangerous signals associated with discharges at higher voltages are in the minority. Signals numbered 1 to 130 are harmless signals and represent the first class, while signals numbered 131 to 161 are dangerous signals and represent the second class. For each class, 2/3 of the signals were used for learning and the remaining 1/3 were reserved for the test. Figures 1 and 2 respectively illustrate the form of a harmless and of a dangerous signal. Note that the levels of the radiated magnetic field signals are expressed in volts. The real values of the field are proportional to these levels and can readily be expressed in their own units.
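The per-class 2/3 and 1/3 split can be sketched as follows. The paper does not say which signals of each class go to the test set, so the seeded random shuffle and the function name `split_two_thirds` are assumptions; only the class sizes (130 and 31) and the signal numbering come from the text.

```python
import random


def split_two_thirds(signal_ids, seed=0):
    """Shuffle one class and return (train, test) with one third held out for testing."""
    ids = list(signal_ids)
    random.Random(seed).shuffle(ids)  # seeded shuffle: an assumed selection rule
    n_test = len(ids) // 3
    return ids[n_test:], ids[:n_test]


harmless = [f"DS{i}" for i in range(1, 131)]     # signals 1 to 130
dangerous = [f"DS{i}" for i in range(131, 162)]  # signals 131 to 161

hs_train, hs_test = split_two_thirds(harmless)
ds_train, ds_test = split_two_thirds(dangerous)

sizes = (len(hs_train), len(hs_test), len(ds_train), len(ds_test))
```

Splitting each class separately keeps the class proportions the same in both sets, giving 87 and 43 harmless signals and 21 and 10 dangerous signals for training and test respectively.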


Fig. 1. Example of a harmless signal (radiated magnetic field in V versus time in s).

Fig. 2. Example of a dangerous signal (radiated magnetic field in V versus time in s).

We can notice in Fig. 1 that the radiated magnetic field corresponding to a harmless signal has a duration practically equal to 22 µs, whereas in Fig. 2 the radiated magnetic field corresponding to a dangerous signal has a much shorter duration (a little more than 10 µs), so it is about twice as fast. Moreover, in the areas where the signals of both types have large undulations (of duration practically equal to 2 µs), the variations in the radiated magnetic field level are much denser and faster for the dangerous signal than for the harmless one. Finally, the radiated magnetic field of the harmless signal is extinguished after 10 µs, while that of the dangerous signal remains at a constant non-zero level and is even characterized by two bursts of successive undulations around this level, of seemingly low amplitude, which can lead to another disruption whose consequences can be disastrous for the equipment.


4.2 Implementing SVMs

4.2.1 Evaluation Criteria

(a) Class recognition rate: The class recognition rate (denoted CRR) is defined as the ratio between the number of well-recognized examples of a class and the total number of examples of that class. It is expressed in %:

$$\mathrm{CRR} = \frac{\text{Number of well-recognized examples of the class}}{\text{Total number of examples in the class}}$$

(b) Overall recognition rate: The overall recognition rate, also known as the good classification rate (denoted GCR), is a general observation on the recognition system. It is the ratio between the number of examples of the test base that are well recognized and their total number. It is calculated by the following formula:

$$\mathrm{GCR} = \frac{\text{Number of well-recognized examples}}{\text{Total number of examples}}$$

(c) Evaluation of results: SVM learning requires the choice of the kernel on the one hand and the selection of the model parameters on the other (in our case the parameter $\sigma$ of the selected RBF kernel and the regularization parameter $C$). Since we have two non-linearly separable classes, the bi-class (binary) implementation of SVMs is required, with an RBF kernel associated with the SMO algorithm for quadratic optimization of SVMs [11]. All the values of the preceding parameters are fixed at the start for all the tests. Moreover, the regularization parameter ($C$) and the RBF kernel parameter ($\sigma$) are adjusted and determined experimentally. After several tests, we retain the optimal values that give the best overall recognition rate GCR. But in contrast to the field of pattern recognition, in the field of insulation systems we are actually interested in the class of dangerous signals, which is why we focus our attention on the CRR and not on the GCR. Furthermore, all the results were obtained by varying the parameters $C$ and $\sigma$ over the following intervals: – $C$ from 10 to 300 with a step of 10. – $\sigma$ from 0.1 to 10 with a step of 0.3. Let $HS_{train}$ and $DS_{train}$ denote respectively the database fractions relating to the training set, while $HS_{test}$ and $DS_{test}$ are the fractions relating to the test set. For learning, the respective recognition rates for fractions $HS_{train}$ (i.e. $CRR_{HS,train}$) and $DS_{train}$ (i.e. $CRR_{DS,train}$) reached 100% for several pairs $(\sigma, C)$, while for the test fraction $HS_{test}$ the $CRR_{HS,test}$ reached 100%; unfortunately, for the other fraction $DS_{test}$, the SVMs could not recognize any signal (i.e. $CRR_{DS,test}$ = 0%).
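The difference between the two criteria is easy to see on an imbalanced test set. The sketch below computes CRR and GCR for a classifier that labels every test signal as harmless, on a test split of 43 harmless and 10 dangerous signals (one third of 130 and of 31). The label coding (−1 for harmless, +1 for dangerous) and the function names are assumptions made for the example.

```python
def class_recognition_rate(y_true, y_pred, cls):
    """CRR in %: well-recognized examples of one class over all examples of that class."""
    predictions = [p for t, p in zip(y_true, y_pred) if t == cls]
    if not predictions:
        return 0.0
    return 100.0 * sum(p == cls for p in predictions) / len(predictions)


def good_classification_rate(y_true, y_pred):
    """GCR in %: well-recognized examples over all examples."""
    return 100.0 * sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


HARMLESS, DANGEROUS = -1, +1                # assumed label coding
y_true = [HARMLESS] * 43 + [DANGEROUS] * 10
y_pred = [HARMLESS] * 53                    # every signal predicted as harmless

crr_hs = class_recognition_rate(y_true, y_pred, HARMLESS)
crr_ds = class_recognition_rate(y_true, y_pred, DANGEROUS)
gcr = good_classification_rate(y_true, y_pred)
```

Here the GCR stays around 81% although no dangerous signal is recognized, which illustrates why the CRR of the dangerous class is the more informative criterion for this application.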

Application of Principal Components Analysis (PCA) and SVMs


In summary, training is performed perfectly, as expressed by the per-class recognition rates CRR_HStrain and CRR_DStrain. The problem lies with the test: the recognition rate of fraction DStest is zero, meaning that no dangerous signal is recognized. This is because fraction DStest is very small compared with fraction HStest (about 10 dangerous signals against 43 harmless signals), so the chance of detecting a signal from fraction DStest is negligible. This is exactly the case where the classes are imbalanced and the data are most probably noisy, which explains why we are dealing with overfitting. Although it would still be possible to solve the problem by reorganizing the database or eliminating irrelevant harmless signals [12, 15], we preferred to keep the same database and turned to a data reduction method. In this case, the PCA not only selects the principal components but also reduces the number of points constituting each signal.

4.3 Application of the PCA then the SVMs

The training matrix (a mixture of HStrain and DStrain), in which each row holds one signal and each column corresponds to one variable, is optimized by the PCA, which keeps only the most relevant variables needed to separate the two classes. The algorithm is as follows:
• The mean is calculated using the function «mean», then the covariance is calculated using the function «cov».
• We calculate the eigenvectors and the eigenvalues.
• We sort the eigenvectors according to the eigenvalues.
The variability as a function of the number of components is then plotted, as shown in Figs. 3 and 4. In Fig. 3, we can notice that the variability increases very quickly and reaches 100% for 30 principal components (PCs), whereas in Fig. 4, for better visibility, we present the variability for the first ten PCs. For ten PCs the variability reaches exactly 90%, which is why this number is retained to build a new database for both training and test.

SVMs are re-applied to this new database and the results obtained are presented in Table 1, where the expressions CRR_DStest min, CRR_DStest max and CRR_DStest inter denote respectively the minimum, maximum and intermediate values of CRR_DStest.

It may be noted from Table 1 that for the values of C and σ equal to 100 and 9 respectively, training is performed correctly and reaches 100% for both classes, but for the class of harmless signals the test gives 100% and decreases to 70% for the other parameter settings. We must not lose sight of the fact that we are much more interested in the class of dangerous signals, because they represent the risk for the system insulation [7]. There is a substantial improvement: the recognition rate of dangerous signals increases from 0% (when only SVMs are applied to the original database) to 90% (when PCA + SVMs are applied), so that 9 of the 10 dangerous signals are recognized, which indicates the effectiveness of this strategy.
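The PCA steps listed in this section (mean, covariance, eigen-decomposition, sorting, cumulative variability) can be sketched in plain Python as below. This is an illustrative sketch and not the authors' implementation: the «mean» and «cov» functions they mention are replaced by hand-written equivalents, the eigen-solver is a basic cyclic Jacobi iteration chosen only to keep the example self-contained, and the function names, the toy data and the 90% default threshold parameter are assumptions.

```python
import math


def covariance(rows):
    """Sample covariance matrix; each row is one signal, each column one variable."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def jacobi_eigen(sym, sweeps=50, tol=1e-18):
    """Eigenvalues and eigenvectors of a symmetric matrix (cyclic Jacobi rotations)."""
    n = len(sym)
    a = [row[:] for row in sym]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.hypot(theta, 1.0))
                c = 1.0 / math.hypot(t, 1.0)
                s = t * c
                for m in (a, v):        # rotate columns p and q of A and of V
                    for k in range(n):
                        mkp, mkq = m[k][p], m[k][q]
                        m[k][p], m[k][q] = c * mkp - s * mkq, s * mkp + c * mkq
                for k in range(n):      # rotate rows p and q of A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    # Column i of v is the eigenvector associated with eigenvalue a[i][i].
    return [a[i][i] for i in range(n)], v


def pca_variability(rows):
    """Eigenvalues sorted in decreasing order, their eigenvectors, cumulative variability in %."""
    eigenvalues, vectors = jacobi_eigen(covariance(rows))
    order = sorted(range(len(eigenvalues)), key=lambda i: eigenvalues[i], reverse=True)
    values = [eigenvalues[i] for i in order]
    components = [[row[i] for row in vectors] for i in order]
    total, running, cumulative = sum(values), 0.0, []
    for value in values:
        running += value
        cumulative.append(100.0 * running / total)
    return values, components, cumulative


def components_needed(cumulative, threshold=90.0):
    """Smallest number of PCs whose cumulative variability reaches the threshold."""
    for k, value in enumerate(cumulative, start=1):
        if value >= threshold:
            return k
    return len(cumulative)


# Hypothetical data: 4 signals described by 3 variables.
signals = [[3.0, 1.0, 0.1], [1.0, 3.0, -0.1], [-3.0, -1.0, 0.1], [-1.0, -3.0, -0.1]]
values, components, cumulative = pca_variability(signals)
print([round(x, 1) for x in cumulative])
print(components_needed(cumulative))
```

On this toy data the first component alone stays below 90% of the variability, so two components are retained; projecting the centered signals onto the retained components would give the reduced database.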


M. Gueraichi et al.

The results show the robustness of combining PCA and SVMs. The recognition rate of dangerous signals has been significantly improved, from 0% to 90%. Therefore, the PCA method combined with the SVMs is effective even in the presence of imbalanced classes and noisy data.

Fig. 3. Variability according to the total number of PCs. (Plot: variability expressed in % versus the total number of principal components.)

Fig. 4. Zoomed variability for the first ten PCs. (Plot: variability in % versus the first ten PCs.)

Table 1. Class recognition rate of fraction DStest (CRR_DStest)

               CRR_DStest min   CRR_DStest max   CRR_DStest inter
C              10               100              110
σ              7.5              9                10
CRR_HStrain    100%             100%             100%
CRR_DStrain    100%             100%             100%
GCR_train      100%             100%             100%
CRR_HStest     100%             70%              70%
CRR_DStest     60%              90%              70%
GCR_test       80%              80%              70%

5 Conclusion

In this paper we proposed and presented the combination of PCA and SVMs for detecting a radiated magnetic field signal associated with a dangerous type of partial discharge, in order to secure the durability and operation of insulations, especially in overhead lines and power transformers. The PCA allowed the reduction of the original database, which moreover consisted of a majority class and a minority class. When SVMs were applied to the original database, overfitting was obtained because of the imbalance between the available classes. The combination of PCA and SVMs has led to a substantial improvement in the detection of dangerous signals compared with the application of SVMs alone: nine of the ten dangerous signals were well recognized even in the presence of the two imbalanced classes cited above. Future investigations will consist of applying the SVMs, and then the combination of SVMs and PCA, after partitioning the original database into three classes (for example harmless signal, suspicious signal and dangerous signal), with the aim of eliminating the class imbalance and thus further improving the detection rate of dangerous signals.

References

1. Aberkane, F., Moulai, H., Benyahia, F., Nacer, A., Beroual, A.: Pre-breakdown current discrimination diagnose technique for transformer mineral oil. In: IEEE Conference on Electrical Insulation and Dielectric Phenomena, 14–17 October 2012, Montreal, Quebec, Canada (2012)
2. Aberkane, F., Nacer, A., Moulai, H., Benyahia, F., Beroual, A.: ANN and multilinear regression line based discrimination technique between discharge currents for power transformers diagnosis. In: 2nd International Advances in Applied Physics and Material Science Congress, 26–29 April 2012, Antalya, Turkey (2012)
3. Aberkane, F., Moulai, H., Nacer, A., Benyahia, F., Beroual, A.: ANN and wavelet based discrimination technique between discharge currents in transformers mineral oil. Eur. Phys. J. Appl. Phys. 58, 20801 (2012)
4. Moulai, H.: Etude des Courants de Préclaquage dans les Diélectriques Liquides. Thèse de Doctorat d’Etat, Ecole Nationale Polytechnique, Alger, Algérie (2001)
5. Aberkane, F.: Etude des Processus de Décharges Electriques dans les Diélectriques Liquides. Thèse de Doctorat, Université des Sciences et de la Technologie Houari Boumediene, Alger, Algérie (2015)
6. Schenk, A.: Surveillance Continue des Transformateurs de Puissance par Réseaux de Neurones Auto-Organisés. Thèse de Doctorat, Ecole Polytechnique Fédérale de Lausanne (2001)
7. Sanchez, J.: Aide au Diagnostic de défauts des Transformateurs de Puissance. Thèse de Doctorat, Université de Grenoble (2006)
8. Marques de Sá, J.P.: Pattern Recognition, Concepts, Methods and Applications, pp. 21–39, 147–239. Springer, Heidelberg (2001)
9. Cheriet, M., Kharma, N., Liu, C.L., Suen, C.Y.: Character Recognition System, pp. 129–199. Wiley, New York (2007)
10. Vapnik, V.: The Nature of Statistical Learning Theory, 2nd edn. Springer, New York (1999)
11. Platt, J.C.: Fast training of support vector machines using sequential minimal optimisation. In: Scholkopf, B., Burges, C., Smola, A. (eds.) Advances in Kernel Methods-Support Vector Machines, Chap. 12, pp. 185–208. MIT Press, Cambridge (1999)
12. Abidine, M.B., Fergani, B., Oussalah, M., Fergani, L.: A new classification strategy for human activity recognition using cost sensitive support vector machines for imbalanced data. Kybernet J. 43(8), 1150–1164 (2014)
13. Webb, A.R.: Statistical Pattern Recognition, 2nd edn., pp. 319–329. Wiley, Hoboken (2002)
14. Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification, 2nd edn., pp. 114–115 (2002)
15. Chawla, N.V., Japkowicz, N., Kotcz, A.: Editorial: special issue on learning from imbalanced data sets. SIGKDD Explor. 6(1), 1–6 (2004)

Chatbot: Efficient and Utility-Based Platform

Sonali Chandel1(✉), Yuan Yuying1, Gu Yujie1, Abdul Razaque1, and Geng Yang2

1 School of Engineering & Computing Sciences, New York Institute of Technology, Nanjing, China
{schandel,yyuan11,ygu13,arazaque}@nyit.edu
2 Jiangsu Key Laboratory of Big Data Security and Intelligent Processing, Nanjing, China
[email protected]

Abstract. This paper analyzes the technology of chatbots and investigates its development, which is becoming a popular trend. A chatbot can simulate a human being and interact with people in real time using natural language, drawing its responses from a knowledge base and a set of business rules. First, using a few examples of well-known chatbots, we show that chatbots based on artificial intelligence are the latest trend. The salient features of chatbot techniques are discussed briefly, using examples of 5 chatbot-based utilities. We then analyze the significance of a chatbot and present people’s view of chatbots through a short survey, to find out whether the popularity of this utility is rising or declining. The way chatbots work and their advantages and disadvantages are also analyzed through the arrangement and analysis of information, as well as statistics and conclusions. Further, we introduce the design principles of a chatbot and use the examples of some popular utilities to explain them specifically. The empirical result to create a prototype for the proposed test is shown in the form of a questionnaire and recommendations. We try to find a relationship between chatbot and utility, and we present a study of the time complexity of the chatbot algorithm. In the future, human beings are more likely to use human-computer interaction by interacting with chatbots rather than using network connections or utilities. With this research, we hope to provide a better understanding of, and some clear information about, the relationship between chatbot and utility.

Keywords: Chatbot · Human-computer interaction · Artificial intelligence · Time complexity

Supported in part by the National Natural Science Foundation of China. © Springer Nature Switzerland AG 2019 K. Arai et al. (Eds.): SAI 2018, AISC 858, pp. 109–122, 2019. https://doi.org/10.1007/978-3-030-01174-1_9


S. Chandel et al.

1 Introduction

A chatbot (short for “chat robot”) is a computer program that communicates with a human being through text or voice messaging in real time, in a highly personalized way. Chatbots are often simply called “bots.” They are developed using artificial intelligence and natural language processing technology, and are mainly designed to simulate a conversation between a program and a human being in a way that closely resembles an actual conversation between two human beings. At present, chatbots are most commonly found in customer service centers, e-commerce, healthcare, and messaging apps. With the aid of these services, users only need to send a short message as input to get the answers they would otherwise obtain by visiting a related website or by making a phone call. Chatbots are gaining popularity very rapidly because they are much faster and cheaper to deploy than a human being doing the same job.

Artificial intelligence (AI) is a branch of cognitive science. It can accomplish specific instructions by simulating scenarios, human consciousness and thought patterns. In cognitive science, artificial intelligence is defined as “a codification of knowledge that will finally explain intelligence” [1]. While services based on artificial intelligence are gradually improving, ways still need to be developed to make an artificial conversation appear more real and more connected on an emotional level. A chatbot uses AI algorithms to process natural language and uses the resulting analysis to produce an intelligent response based on the human input. Researchers are trying to make human-computer interaction (HCI) more and more natural [2].

Although HCI is designed to provide a platform for users and computers to interact with each other, the final target is to use expert systems and deep learning more accurately to create a chatbot that can entirely copy the way a human would respond to a specific question or situation, making it very difficult for a human being to realize that they are talking to a robot and not a real person. The aim of this paper is to analyze the fields of application and the development of chatbots, so that people can be better prepared for the arrival of the chatbot era.

The rest of this paper is structured as follows. Section 2 outlines the salient features of various chatbots. It also discusses the related techniques, using the examples of five chatbot-based utilities. Section 3 introduces the significance of chatbots by analyzing the current situation. Section 4 introduces the design principles of our proposed chatbot and uses the examples of some popular utilities to explain them. It also presents a relationship between a chatbot and a utility. Section 5 shows the empirical result to design a prototype for a proposed test in the form of a questionnaire and recommendations. Sections 6 and 7 give the algorithm of the chatbot and the corresponding time complexity in the best-case and worst-case scenarios. Section 8 provides the conclusion and the future of chatbots.

2 Salient Features of Chatbots

In this section, we introduce the salient features of existing chatbots.


A. Siri: Siri is one of the most popular applications on Apple phones. It works as an intelligent chatbot that acts like a personal assistant. You can activate Siri by saying “Hi Siri” and then ask her questions or direct her to do something. She responds with the corresponding information or a recommendation. She can also help the user get simple tasks done, such as making a call, sending messages, ordering a meal, or booking a flight. Sometimes, however, Siri cannot give accurate answers to questions. She also has problems understanding various accents from different parts of the world. By working with many local online services, Apple has made her a little more diversified and local in newer versions, but she still cannot meet her users’ needs completely.

B. Simsimi: Simsimi (pronounced Shim-shimi) is a very popular South Korean chatbot. It aims to give interesting answers to its users during the chatting process to help them release their stress. The working principle of the AI chatbot Simsimi can be roughly divided into two parts: Teaching + Matching. “Teaching” is the training process: if Simsimi tells you that it cannot understand your question or cannot find the answer, the user can teach Simsimi how to answer it. The purpose is to build a rich thesaurus. “Matching” consists of taking the words given by the user and searching for them in its database to find a suitable answer. The most significant difference between Simsimi and other chatbots is that it adds users’ answers to its corpus. As a result, it is entirely possible that the answer you receive from Simsimi was supplied by another user. It is one of the simplest chatbots in the world and provides only a chatting function. Also, if a user teaches Simsimi slang or explicit words that are considered offensive in a normal conversation, it will learn them as well, because it lacks an independent mechanism to distinguish between acceptable and unacceptable words. As a result, it has not been accepted by the industry as widely as other chatbots in recent years. It is mainly popular among youngsters, who love its colorful presentation and the feature that allows them to stay anonymous.

C. Cortana: Cortana is a personal assistant created by Microsoft for Windows. Cortana works in association with the Bing search engine to respond to users’ queries or requests and can be activated by saying “Hey Cortana!”. Cortana will sometimes provide useful information even before the user asks for it, which it obtains from the user’s emails, such as the time of the next flight; based on that, it can remind users to catch the plane on time [3]. This feature reduces the risk of forgetting. However, Cortana also has problems understanding localized English accents.

D. Google Now: Google Now is a Google product that works on diverse types of Android- and iOS-compatible devices (although performance differs slightly across smartphones and operating systems). It works in association with the Google search engine and can be activated by saying “Okay Google.” It has many useful features, such as support for voice input, SMS and mail, navigation, reading the agenda, booking restaurants and searching for information. Understanding various local accents in English is also a problem for this chatbot.


E. Facebook Messenger: In 2016, Facebook released its chatbot through the new version of Facebook Messenger, which had been separated from the Facebook app and as a result became very popular among users worldwide. To use this bot, the user just needs to send a text to the designated number to get various things done, such as booking a restaurant, ordering a meal or shopping online. It can also offer its users suggestions and services such as planning an event, dating reminders, sending and receiving red envelopes, location sharing, and even facial expressions, according to the various scenarios it experiences on a daily basis. The more it is used, the more intelligent and accurate it gets. A unique feature of this bot is that it allows other chatbot developers to use its language processing technology to develop their bots for free.
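The “Teaching + Matching” principle described for Simsimi above can be illustrated with the following toy sketch. It is not Simsimi's actual implementation; the class name `TeachMatchBot`, the exact-match lookup, and the fallback message are assumptions chosen to keep the example short.

```python
class TeachMatchBot:
    """Toy bot whose answers come from a corpus that users extend by teaching."""

    def __init__(self):
        self.corpus = {}  # normalized question -> list of taught answers

    @staticmethod
    def _normalize(text):
        # Lower-case and collapse whitespace so near-identical questions match.
        return " ".join(text.lower().split())

    def teach(self, question, answer):
        """'Teaching': store a user-supplied answer for a question."""
        self.corpus.setdefault(self._normalize(question), []).append(answer)

    def match(self, question):
        """'Matching': look the question up in the corpus."""
        answers = self.corpus.get(self._normalize(question))
        if not answers:
            return "I don't understand. Can you teach me?"
        return answers[-1]  # most recently taught answer


bot = TeachMatchBot()
print(bot.match("How are you?"))
bot.teach("How are you?", "Fine, thanks!")
print(bot.match("  how ARE you? "))
```

Because the sketch stores whatever users teach without any filtering, it also reproduces the weakness noted above: offensive answers taught by one user would be returned to others.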

3 Significance of a Chatbot

Smartphone apps are in a state of change, which gives chatbots a chance to take over as a new developing service. Even though the number of application downloads is increasing, the app market has been showing signs of saturation. One consulting company indicates that “the independent developer’s dream, which relies on the app store to create a business, has burst already” [4]. The cost of developing, maintaining and promoting an app is getting higher with time. Meanwhile, according to the results of our questionnaire, users’ enthusiasm is also falling, because it is getting difficult to download or update some apps for free and to switch between different apps. About a quarter of apps are used only once before they are deleted. Many users do not like installing, updating and learning new apps, or rarely use them after installing. Most of them regularly use no more than 5 of the 20–30 apps they have on average, and the most popular ones are still messaging and e-commerce apps. As a result, this is turning out to be a costly loss for companies that spend time, money and energy developing these apps.

Vision Mobile analyst Michael Vakulenko declares that there is a great need for chatbots. A chatbot is hosted on a server instead of being installed on the user’s device, just like a web page, which reduces the cost and difficulty of development, maintenance and updating [2]. That is why chatbots appeal to enterprise users much more than to ordinary users. It takes just a few seconds to install a chatbot, and most of them work through messaging apps that users have already installed, whose combined user base is already very large. The user does not need to click on extra icons to switch among different chatbots.

Also, talking to a chatbot is less complicated and much faster than talking to a company’s customer service representative when it comes to getting a response to a question or a specific recommendation based on a particular need. The main reason for this delay is that, even though an organization may spend millions to keep its systems technologically up to date, a customer service representative is still limited by the speed at which a human can work.


However, it takes time for users to completely trust services like a chatbot. Recently, Microsoft designed Tay, a robot modeled to imitate the millennial generation [5], but it soon learned dirty words from some Twitter users and other chatting servers. As a result, Tay ended up being rebuilt. Chatbots usually provide answers by working together with search engines (depending on their brand), which sometimes does not appeal to users because it makes chatbots seem short of intellectual functions. The lack of advisory services on an emotional level is also one of the weaknesses that chatbots need to improve in the future.

3.1 Current Situation

In a utility, all the user can obtain is fixed service content. With a chatbot, however, the user gains service options through automatic message responses. Users can ask a chatbot frequent questions and get the corresponding responses or recommendations in real time. A chatbot can also automatically add relevant content and data models according to users’ questions, which allows companies to understand what data content their customers need and to analyze their services and products accordingly. That is a function the utility does not have.

The utility economy only entered a phase of rapid development and growth once Apple and Google started to develop their own app stores. The chatbot economy also needs the leadership of giant companies, and Microsoft and Facebook are hoping to play that role. Apple and Google control most smartphone operating systems at present. The chatbot market, by contrast, has yet to take shape. Facebook is expected to open its chatting platform to all kinds of robots and launch an online store dedicated to showing these services.

Currently, start-up companies associated with chatbots are growing quite rapidly, thanks to tech giants like Microsoft, Google and Facebook, which are trying their best to make chatbots accessible by providing free advanced development tools and frameworks to developers. This effort has made it much easier for developers to create a better chatbot, as they can focus on making it more accurate and more real for users by using advanced AI, deep learning, machine learning, speech recognition and other natural language processing techniques. Chatfuel, for example, is considered to be one of the most user-friendly free bot platforms for creating an AI-based chatbot on Facebook. It claims to let a developer, or anyone for that matter, create a chatbot in just 7 min without any coding. Facebook even allows any company to integrate its bots with the Messenger app, making the app very efficient and versatile for searching for specific information and getting recommendations for specialized services. Pana and SnapTravel are examples of such collaborations between Facebook and hotels and travel websites, which make a user’s life more convenient by letting them get what they want with little effort.

3.2 Future

Although all these tech giants are trying their best to make chatbots more user-friendly and popular, no one can guarantee that chatbots will be as successful as smartphone apps. According to an estimate by the Progressive Policy Institute, the latter have created 3.3 million jobs in the United States and Europe alone. For developers, the appeal of the chatbot economy is not yet as obvious as that of utilities. If a chatbot is too easy to develop, the field will also be more competitive. Furthermore, users may still be surrounded by a large number of services and user-friendly ways of interacting, and thus feel chatbots to be more of an emotionless interference than anything else. Besides, designing a more realistic text or voice interface may not be easy. After Slack launched the first edition of its bot service, Matty Mariansky, co-founder of MeeKan (an AI scheduling bot), was shocked by the diversity of ways in which users communicated. He even hired a scriptwriter, who came up with a total of 2000 sentences to deal with a meeting request.

The increasing popularity of chatbots shows that people are willing to work with robots. According to a recent Gartner report, by 2020 an average person will have more conversations with AI-enabled bots than with their spouse [6]. But their success largely depends on the “killer chatbot,” i.e., the service that is best suited to chatbots and most popular. Toby Coppel, who owns a venture capital firm, believes that health care is an up-and-coming market for chatbots. Chatbots can handle patients with routine diseases very well, while doctors can take care of more serious, incurable diseases. The chatting application Kik launched a “robot shop” on April 5. The company’s founder, Ted Livingston, expects that “instant interaction” will be the key to its success. He believes that in the future enterprises will need not only a phone number and web pages as their business ids, but also a mention of their chatbots. Restaurants can receive online orders for home delivery through instant messaging platforms and bots. In fact, many restaurants in China have already started to provide such services, and they are running very successfully.

Chatbots also need to go through a lot of smart exploration to find their positioning, as they depend on the ability of the providers managing their platform. Telegram allows developers to be engaged in almost all the development work (but it has closed its chatting channels in most Islamic countries). Microsoft also promises to remain as open as possible. Developers and investors have some scruples about Facebook’s initiative in this domain, because the company has a history of changing course, which led companies that developed apps for its website into trouble.

Some companies are hoping to become a foundation for the survival of other services. Assist wishes to play the role of “Google” for chatbot users by helping them find available chatbots according to their needs. Another company, named Operator, wants to become the “Amazon” of the chatbot business. For example, when a user is looking for some sneakers, the system will contact the nearest salesman or let its own “experts” deal with the order. Operator’s boss, Robin Chan, hopes to create a virtuous cycle in which more buyers attract more merchants, and more merchants in turn attract more buyers.


Devices such as Google Home and Amazon’s Echo are increasingly being used to interact with machines without using a screen. Gartner also predicts that by 2020, 30% of browsing sessions will be done via a screen-less interface. Advances in Natural Language Processing (NLP) and Natural Language Generation (NLG) mean that AIs such as IBM’s Watson can interact and respond in ways that are increasingly indistinguishable from person-to-person communication [7].

4 Design Principle of the Proposed Chatbot

Based on all that is currently going on in the field of chatbot development, we propose a chatbot prototype in this paper. The system framework of our proposed chatbot is shown in the diagram below (Fig. 1); it is made up of five functional modules. Automatic Speech Recognition is responsible for converting the users’ voice signal into text. The Natural Language Understanding module processes the message after receiving the text. After understanding the semantics of the users’ input, it passes the specific semantic expression to the Dialogue Management module. The Dialogue Management module coordinates the calls between the various modules and maintains the current conversational state. It chooses a particular way of replying and passes it to the Natural Language Generation module for processing.

A. Automatic Speech Recognition (ASR): This technology allows a human being to interact with a computer interface through their voice in a way that comes very close to actual human conversation, despite the various accents in their speech.

B. Natural Language Understanding: Natural Language Understanding is related to machine reading comprehension. Taking parts of sentences and analyzing their meaning is complicated, because the machine needs to determine the correct syntactic structure and semantics of the language used.

Fig. 1. Design principles of chatbot.


C. Dialogue Management: The Dialogue Management function is mainly responsible for coordinating the various components of a chatbot and maintaining the dialogue structure and its state. The key technologies involved include dialogue behavior recognition, dialogue strategy learning, dialogue state recognition, and dialogue rewards.

D. Natural Language Generation: Natural Language Generation usually starts from the non-verbal information produced by the dialogue management part and automatically generates user-oriented feedback in natural language. In recent years, dialogue generation in chatbot systems has mainly relied on two types of technology: retrieval-based and generation-based.
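The flow between the modules described above (speech recognition, language understanding, dialogue management, language generation) might be sketched as follows. This is only an illustration of the data flow, not the proposed system: every function is a hypothetical stand-in, the "audio" is assumed to be already transcribed text, and the intents and reply templates are invented for the example.

```python
from dataclasses import dataclass, field


def recognize_speech(audio):
    """ASR stand-in: the 'audio' is assumed to be already transcribed text."""
    return audio.strip()


def understand(text):
    """NLU stand-in: map text to a semantic frame using keyword rules."""
    lowered = text.lower()
    if "book" in lowered and "table" in lowered:
        return {"intent": "book_table"}
    if "hello" in lowered or "hi" in lowered.split():
        return {"intent": "greet"}
    return {"intent": "unknown"}


@dataclass
class DialogueManager:
    """Keeps the conversational state and chooses the next system action."""
    history: list = field(default_factory=list)

    def decide(self, frame):
        self.history.append(frame["intent"])
        actions = {"greet": "greet_back", "book_table": "ask_time"}
        return actions.get(frame["intent"], "ask_rephrase")


def generate(action):
    """NLG stand-in: turn the chosen action into a user-facing sentence."""
    templates = {
        "greet_back": "Hello! How can I help you?",
        "ask_time": "For what time should I book the table?",
        "ask_rephrase": "Sorry, could you rephrase that?",
    }
    return templates[action]


def reply(audio, manager):
    """Run one user turn through the whole pipeline."""
    return generate(manager.decide(understand(recognize_speech(audio))))


manager = DialogueManager()
print(reply("Hello there", manager))
print(reply("Please book a table for two", manager))
print(manager.history)
```

Keeping the state in the dialogue manager, rather than in the understanding or generation functions, mirrors the coordinating role the text assigns to that module.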

5 Empirical Results to a Prototype for the Proposed Test

5.1 A Subsection Sample

We prepared a questionnaire to analyze the current situation of chatbots and applications. The result of the survey is shown in Table 1.

Table 1. Questionnaire about chatbot features

Questions / Response

What function of a chatbot have you used?
    Online shopping
    Make restaurant reservation
    Order tickets
What’s your first impression of a chatbot?
    Lively, interesting and lovely
    Intelligent and high-tech
What do you think about the dialogue functions?
    Well, maybe still need to improve
    Lively and interesting
    The chatbot can respond to our questions accurately
What aspects of chatbot need to improve? Please suggest
    The lack of intellectual functions
    The lack of advisory services
What is the advantage of using a chatbot?
    Human-computer interaction
    Transform something into an integrated service platform
What is the disadvantage of using a chatbot?
    We need to download a variety of apps
    Some apps are not practical

After the survey mentioned above was completed, we organized a face-to-face meeting in which all the participants shared their experiences and gave their impressions of the test of our chatbot. We also received suggestions for modifying and improving our chatbot. Based on the participants’ feedback, we compiled the results and arranged them in a table for better understanding (shown in Table 2).

5.2 Recommendation

Chatbots should have the ability to analyze data sets and create links between these data. They should be capable of automatically generating more accurate and convincing answers for users and of building a better understanding of the questions being asked, so that they can project a better understanding of the context of the conversation. Chatbots should continuously update their dataset to create more elaborate answers and improve their knowledge of the parameters used as input in the conversation. They should be able to search for further analytical service source information and establish the most reliable connection for HCI between chatbots and humans. Chatbots that perform the function of customer service have a substantial role to play in the overall benefits of a service. If they can be used widely and successfully in the future, they would be able to control almost the entire service industry’s domain. People in the future will be able to interact with machines by using Automatic Speech Recognition without using a screen.

Table 2. Pre-usability testing of process

Question | Strong positive | Positive | Neutral | Negative | Strong negative
Should be supposed to use? | 18 | 2 | 0 | 0 | 0
Should be a user-friendly interface? | 19 | 0 | 1 | 0 | 0
To give more convenient daily services for people | 16 | 2 | 1 | 1 | 0
To put functions of apps together | 17 | 2 | 1 | 0 | 0
To be more practical | 18 | 2 | 0 | 0 | 0
To reduce the difficulty of developing and updating app | 15 | 3 | 2 | 0 | 0
To help users improve working efficiency | 16 | 3 | 1 | 0 | 0

6 Proposed Chatbot

The answers we get from a chatbot come from its database, which uses AI concepts and language processing techniques to produce a more personalized response. We therefore need to determine how to find the answer that corresponds to a particular question. Our analysis showed that sentence segmentation is critical. Spaces can be used to recognize the boundaries between words. The flowchart of the proposed chatbot, depicted in Fig. 2, shows the process for obtaining the key phrases.


S. Chandel et al.

Fig. 2. The flow chart of the proposed chatbot.

We also formulated an algorithm describing the working logic of the proposed chatbot:

Algorithm 1: Word-segmentation Process
1. Initialization {dic = Dictionary, Dt = Decompose the sentence, St = Segmentation text, Se = Sentences, β = dealing sentences, δ = available}
2. Input {Se: Sentence}
3. Output {St}
4. Set dic // loading dictionary
5. For St = 1 to n
6.   Do St // obtaining segmentation text
7. End for
8. Process Dt // decompose the sentence
9. If Se = β then
10.   Display St
11. Else if do maximum process
12.   Check ambiguity
13.   If ambiguity = δ, then // Is ambiguity available
14.     Resolve ambiguity, otherwise
15.     go to step 7
16.   end if
17. end else if
18. end if

The complete working process of the algorithm can be explained as follows:

Chatbot: Efﬁcient and Utility-Based Platform


Steps 1–3 cover initialization, input and output, respectively. In Step 4, the dictionary is loaded. Steps 5–7 perform the sentence checking and segmentation process. In Step 8, long sentences are decomposed into shorter ones. Steps 9–10 deal with the sentences and their display. Step 11 uses the maximum matching technique to find the longer phrases in the split parts of the sentences; these phrases are used as keywords to search for the corresponding answer in the database. Steps 12–14 determine whether the phrases are ambiguous and, if ambiguity is found, resolve it. Step 15 covers the case in which no ambiguity is found; the process of dealing with the sentences is then restarted.
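The maximum matching idea used in Step 11 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `max_match_phrases`, the toy dictionary and the `max_len` window are assumptions made for illustration. Words are split on spaces, as described in Section 6, and the longest dictionary phrase starting at each position is taken greedily.

```python
def max_match_phrases(sentence, dictionary, max_len=4):
    """Greedy forward maximum matching over space-separated words.

    `dictionary` is a set of known phrases (hypothetical content);
    `max_len` is the longest phrase, in words, that is tried.
    """
    words = sentence.lower().split()
    phrases = []
    i = 0
    while i < len(words):
        # Try the longest candidate first, then shrink the window.
        for size in range(min(max_len, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + size])
            # A single word is always accepted so the scan cannot stall.
            if size == 1 or candidate in dictionary:
                phrases.append(candidate)
                i += size
                break
    return phrases


# Hypothetical dictionary of key phrases.
toy_dictionary = {"restaurant reservation", "order tickets"}
result = max_match_phrases(
    "Make restaurant reservation and order tickets", toy_dictionary
)
```

In this sketch the multi-word phrases found in the dictionary would serve as the keywords for the database lookup, while unmatched single words are passed through unchanged.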

7 Mathematical Analysis and Complexity

The performance of a chatbot can be characterized by its time complexity. Time complexity refers to the total execution time required to run a task as a function of the input. It is measured by counting the number of elementary operations performed by the algorithm, where each elementary operation is assumed to take a constant amount of time to execute. The shorter the time, the higher the efficiency. Figure 3 shows the trend of the time complexity of our proposed chatbot and compares it with that of other well-known chatbots, namely Simsimi, Siri, Cortana, Google Now and Facebook Messenger.

[Figure 3 plot: time complexity T(n) in seconds (0 to 0.20) against input file size in KB (8.0 to 80.0) for the proposed chatbot, Simsimi, Siri, Cortana, Google Now and Facebook-M; the curves are annotated with complexity classes, including O(log n) and O(log n + n).]

Fig. 3. Showing the time complexity of our proposed chatbot and other known chatbots in the best-case.


The results show that our proposed chatbot has a time complexity of O(log n) and took 0.08 s to analyze an 80-kilobyte input file. The results indicate that our proposed chatbot improves on the other known, existing chatbots. It has the lowest time complexity because the use of word segmentation reduces the time complexity. The time complexity analysis of our proposed chatbot and the other known chatbots is given in Table 3.

Table 3. Time complexity of our proposed and known chatbots

Facebook Messenger:
$T(n) = T(n-1) + T(0) + O(n) = T(n-1) + O(n) = O(n^2)$

Proposed chatbot:
$T(n) = a\,t(n/b) + O(n)$
The problem consists of a finite set of inputs, but the computation complexity remains constant 'n'.
$T(n) = t(n/2) + O(n)$
$T(n) = t(n/2) + n$
$T(n) = t(n/n) + n$
$T(n) = t(1) + n$
$T(n) = t + n$
Ignoring $t$, we get $T(n) = n$, with $n = k$ and $k = \log n$. By substitution, the complexity is $O(\log n)$.

Siri:
$T(n) = a\,t(n/b) + O(n)$
The problem consists of a finite set of inputs, but its computation time increases linearly. Thus,
$T(n) = t(n/2) + O(n)$
$T(n) = t(n/n) + O(n)$
$T(n) = t + O(n)$
Ignoring $t$, we get $T(n) = O(n)$.

Simsimi:
$T(n) = a\,t(n/b) + O(n)$
The problem is divided into two parts of the same size. However, the algorithm is infinite. Thus,
$T(n) = 2t(n/2) + O(n)$
$T(n) = 4t(n/4) + n + n$
$T(n) = 4t(n/n) + 2n$
$T(n) = 4t + 2n$
$T(n) = O(kn)$
$T(n) = O(\log n \cdot n)$, where $k = \log n$
$T(n) = O(n \log n)$

Cortana:
$T(n) = a\,t(n/b) + O(n)$
The problem consists of a finite set of inputs, but the computation complexity remains constant 'n'.
$T(n) = t(n/2) + O(n)$
$T(n) = t(n/2) + n + n$
$\ldots$
$T(n) = t(n/n) + n + n$
$T(n) = t(1) + n + n$
$T(n) = t + n + n$
Ignoring $t$, we get $T(n) = n + n$, with $n = k$ and $k = \log n$; we get $O(\log n + n)$.

Google Now:
$T(n) = a\,t(n/b) + O(n)$
The problem is divided into two parts of different sizes, according to the needs of the proposed algorithm.
$T(n) = t(n/3) + t(2n/3) + O(n)$
$T(n) = t(n/3) + t(2n/3) + n + n$
$T(n) = t(n/n) + t(2n/n) + n + n$
$T(n) = 2t + 2n$
$T(n) = \log n \cdot n$
$T(n) = O(n \log n)$

8 Conclusion and Future Work

An efficient and utility-based chat platform with low time complexity and better efficiency is proposed. Humans interact with chatbots rather than with network connections or applications, and the chatbot integrates these utilities. As the app market matures, text- and voice-based chatbots are expected to inherit the app economy and become the new growth point of the science and technology field. Chatbots are expected to change how business is done by changing the end-user experience and the way companies advertise and make money. There is vast potential for the growth of chatbots. The future looks very promising as well as challenging. Achieving an entirely realistic conversation on an emotional level with a robot is not easy. But with advanced AI technologies, better algorithms, and machine learning techniques, it can be achieved, especially as not only tech giants like Facebook, Microsoft or Google but also many start-up companies are trying to open their chat platforms to all kinds of robots. Moreover, some companies are hoping to become a foundation for the survival of other services in this domain. The slogan “there is always a chatbot for you” will perhaps become a reality soon.

Acknowledgment. This work was supported in part by the National Natural Science Foundation of China under Grant 61572263, Grant 61502251, Grant 61502243, and Grant 61602263.

References

1. Spyrou, E., Iakovidis, D., Mylonas, P. (eds.): Semantic Multimedia Analysis and Processing. CRC Press, Boca Raton (2014)
2. Jacko, J.A. (ed.): Human-Computer Interaction Handbook: Fundamentals, Evolving Technologies, and Emerging Applications. CRC Press, Boca Raton (2012)
3. Barga, R., Fontama, V., Tok, W.H.: Cortana analytics. In: Predictive Analytics with Microsoft Azure Machine Learning, pp. 279–283. Apress, New York (2015)
4. Chaoguo, C., Rui, L.: Chatbot is the new entrance without only apps (2016)
5. Sina Technology, I’m smoking kush in front of the police (Unpublished)
6. Marr, B.: From big data to insights: what questions would you ask your AI chat robot? (2016)
7. Assefi, M., Liu, G., Wittie, M.P., Izurieta, C.: An experimental evaluation of Apple’s Siri and Google speech recognition. In: Proceedings of the 2015 ISCA SEDE (2015)

Classical, Rule-Based and Fuzzy Methods in Multi-Criteria Decision Analysis (MCDA) for Life Cycle Assessment

Andrzej Macioł and Bogdan Rębiasz

Faculty of Management, AGH University of Science and Technology, Krakow, Poland
{amaciol,brebiasz}@zarz.agh.edu.pl

Abstract. Every Life Cycle Assessment (LCA) analysis faces the problem of comparing frequently contradictory criteria related to various types of impact factor. Traditional methods of LCA analysis are not capable of making such comparisons; this is a multi-criteria evaluation problem. This study discusses the analogy between the LCA and MCDM methodologies and the description of LCA as an MCDM problem for resolving the trade-offs between multiple environmental objectives. The objective of the study is to evaluate the opportunities for using knowledge-based methods to aggregate LCA results. We compare the results obtained with knowledge-based methods with results from a variety of specialized multi-criteria methods. The research used two classical multi-criteria decision-making methods, the analytic hierarchy process (AHP) and the technique for order of preference by similarity to ideal solution (TOPSIS), together with a conventional (crisp) reasoning method and Mamdani’s fuzzy inference method. The classical rule-based approach flattens the assessment results, which makes it practically unsuitable for LCA. The results demonstrate that, among the knowledge-based methods, crisp reasoning does not give satisfactory results. Mamdani’s method, AHP and TOPSIS allow diversity in the assessment, but there are no means of assessing the quality of these valuations.

Keywords: Environmental indicators · Life-Cycle Assessment (LCA) · Multi-criteria decision analysis (MCDA) · Rule-based MCDA · Fuzzy reasoning in MCDA · Light-duty vehicles

This work is supported by the AGH University of Science and Technology statutory research No. 11/11.200.327.
© Springer Nature Switzerland AG 2019. K. Arai et al. (Eds.): SAI 2018, AISC 858, pp. 123–139, 2019. https://doi.org/10.1007/978-3-030-01174-1_10

1 Introduction

It is widely recognized that one of the most important factors affecting the quality of the environment is the use of modes of transport (especially in large urban areas). Many publications analyze the harmful environmental impact of different types of passenger cars [1–5]. This problem is global, and many actions are undertaken to limit the impact, as evidenced by the regulations adopted by the EU. Unfortunately, these regulations are often based only on impact factors associated with vehicle operation (fuel consumption, NOx emission, CO emission, particulate matter and others). Such a restriction can lead to a very imperfect solution. It is necessary to take into account the impact on the environment over the full life cycle, from cradle to grave. Every LCA analysis faces the problem of comparing frequently contradictory criteria related to various types of impact factor. Traditional methods of LCA analysis are not capable of making such comparisons; this is a multi-criteria evaluation problem. In the 1990s, an attempt was made to combine elements of LCA and MCDA. This research has continued and is presented in many publications [5–9]. Examples of discussion of the analogy between LCA and MCDM in the automotive sector include studies on different biofuels [10, 11], transportation systems [12], vehicle fuels [3, 13–16], route maintenance concepts [17] and a comparison of different personal cars with reference to their environmental impact [5]. All these authors evaluate the alternatives using MCDA methods. Some of them use weighted sums and additive value functions [12, 14, 17]; others use the AHP method [10], the preference ranking organization method for enrichment of evaluations (PROMETHEE), stochastic multiattribute analysis for life-cycle assessment (SMAA-LCIA) [15, 16, 18], compromise programming [13] and ELECTRE TRI (from the French élimination et choix traduisant la réalité) [5]. It is also possible to use methods that were developed for another class of problems but can, in some cases, be an alternative to MCDA. This multiplicity of available tools paradoxically creates additional problems. An earlier study demonstrated that various MCDA methods may produce different results for the same input data [19].
The various methods differ in their conditions of use, such as the set of necessary input data, the layout of the input data, the computational complexity, and the way the results are interpreted. Further studies are needed to evaluate the different MCDM methods and to develop criteria for selecting among them in different decision-making situations. The objective of the study is to evaluate the opportunities for using knowledge-based methods to aggregate LCA results. We compare the results obtained with knowledge-based methods with results from a variety of specialized multi-criteria methods. These studies were conducted using the LCA of personal vehicles as an example. The research focused on two classical multi-criteria decision-making methods (AHP and TOPSIS), a conventional (crisp) reasoning method and Mamdani’s fuzzy inference method.

2 Data Sources and Methods

2.1 LCA Data and Methods

To evaluate the usefulness of different methods of multi-criteria analysis, the data presented in [5] were used, taking into consideration the assumptions concerning the LCA of light-duty vehicles. LCA was applied to assess the potential environmental impacts of six EURO 5 compact passenger vehicles (light-duty vehicles): a gasoline internal combustion engine vehicle, a diesel internal combustion engine vehicle, a hybrid electric vehicle (HEV), a plug-in hybrid electric vehicle with a 10-mile battery range (PHEV10), a plug-in hybrid electric vehicle with a 40-mile battery range (PHEV40), and a battery electric vehicle (BEV). The research described in that publication is one of the few that include the overall life cycle of vehicles and their components (e.g., batteries), as well as the electricity generation system and the production of fossil fuels (gasoline and diesel), from a cradle-to-grave perspective. The inventory data were characterized into the following indicators, according to the CML 2001 LCA method [20]: abiotic depletion (AD), acidification (AC), eutrophication (EUT), global warming (GW), ozone layer depletion (OLD), and photochemical oxidation (PO). Additional indicators addressed vehicle operation: fuel consumption (primary energy) (FC) and tailpipe and abrasion emissions (NOx, CO, particulate matter PM), since the use phase was considered important in the comparison of vehicles. The MCDA method used in the referenced paper (ELECTRE TRI) does not require normalization. Nevertheless, normalization was performed there to facilitate communication with stakeholders, in particular decision makers. It consisted of representing the impact of the alternatives with respect to the emissions of a reference Portuguese fleet in 2011. The ELECTRE TRI method does not require weighting either, and none was applied in the cited research. Because the purpose of our research is to compare different MCDA methods, we concluded that the external normalization adopted in the cited paper is not adequate. Therefore, we adopted internal normalization of relative contribution [9, 21], which does not have the issues of external normalization (mainly the difficulty of finding a suitable external reference set).
Unfortunately, for many MCDA methods, internal normalization introduces a bias of dependence on other alternatives: adding or removing one alternative may change the relative positions of the remaining alternatives [22]. The choice of normalization can have an important impact on the results, as shown in [18, 23]. Despite these imperfections we concluded that, given that weighting was applied in the next stage, internal normalization is the only solution. Internal normalization consists of using the highest and lowest impacts of the alternatives being compared as references to transform the original scales into [0, 1] ranges. In our case we used the highest value of each impact as the reference. The next step of data preparation was weighting. As rightly noted in the report “Background Review of Existing Weighting Approaches in Life Cycle Impact Assessment (LCIA)” [24], according to ISO 14040 and 14044 weighting is an optional and controversial element of Life Cycle Impact Assessment (LCIA). Several weighting methods have been developed in recent years. They can be classified into three categories: subjective, so-called panel methods, in which a group of experts provides the weighting factors; “monetization” methods, in which the weighting factors are expressed as monetary costs; and distance-to-target methods, in which the weighting factors are calculated as a function of some type of target values, which are often based on political decisions. We used a distance-to-target method. In our opinion, the most mature concept is the weighting in the EDIP methodology [25]. The figures used for weighting are based on the political reduction targets for the individual substances contributing to the relevant impact category.
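The internal normalization described above, with the highest value of each impact taken as the reference, can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `normalize_by_max` and the raw impact figures are hypothetical and are not data from the cited study.

```python
def normalize_by_max(raw):
    """Scale every indicator by its highest value across the alternatives.

    `raw` maps indicator -> {alternative: impact}; impacts are assumed
    to be non-negative, so the results fall in the range [0, 1].
    """
    normalized = {}
    for indicator, impacts in raw.items():
        reference = max(impacts.values())
        if reference <= 0:
            raise ValueError(f"no positive impact for indicator {indicator!r}")
        normalized[indicator] = {
            alternative: value / reference
            for alternative, value in impacts.items()
        }
    return normalized


# Hypothetical raw impacts for one indicator (illustrative numbers only).
raw_impacts = {"GW": {"Gasoline": 250.0, "Diesel": 222.5, "BEV": 152.5}}
scaled = normalize_by_max(raw_impacts)
```

Because the reference is the maximum within the compared set, adding or removing an alternative with the highest impact rescales all the others, which is the dependence on other alternatives noted in the text.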


The weights are used directly in the classical MCDA algorithms; in the case of rule-based methods they help the experts to formulate the rules. The process for applying and using weighting in this project contains the following steps:
• Definition of actual emissions in the reference year,
• Definition of targeted emissions in the target year,
• Calculation of weighting ratios.
In our research we tried to determine the weights taking into account that the aim of the research is not to support specific decisions but only to estimate the usefulness of different MCDA methods. We assume that a good enough approximation of universal weights for impact factors in the case of light-duty vehicle LCA is the ratio of the level of environmental impacts for the Portuguese fleet to the comparable levels of impact in Europe. The impact of the different types of influence was determined based on various reliable sources [26, 27]. Such an assumption is questionable, since it can be regarded as a kind of normalization rather than weighting. Nevertheless, considering that the previous stage adopted internal normalization and that it is used in many studies, it is assumed that the weighting coefficients should be equal. The quotient of the magnitude of the environmental impact caused by the life cycles and operation of vehicles and the overall impact on the environment can be considered an acceptable way to express weights in MCDA evaluation. There is no methodical basis for concluding that such a formulation of weights can affect the outcome of the evaluation of each MCDA method. Table 1 presents the normalized results of each alternative for the respective indicators and the weights of the indicators.

2.2 MCDA Classical Methods

Using MCDA methods, decision makers can select the best alternatives in the presence of multiple criteria. The criteria are frequently contradictory. Each criterion takes into account one aspect of the analyzed problem. MCDA methods allow the weight of each criterion to be evaluated. Using these weights, the decision maker can select the preferred alternative. Given these properties, MCDA methods can be used in LCA to aggregate multiple criteria assessments of various technologies into a single synthetic indicator. This enables a clear ordering of these technologies from the point of view of their impact on the environment. These methods therefore form the basis for a clear and easy interpretation of LCA results. Several dozen multi-criteria decision-making methods are described in the literature [28–30]. The best-known MCDA methods are AHP, TOPSIS, PROMETHEE, ELECTRE and VIKOR (from the Serbian visekriterijumska optimizacija i kompromisno resenje). The most popular and most commonly used are AHP and TOPSIS [29–31]. In our research we used the two most popular classical MCDA methods:
• AHP, developed in 1980 by Saaty [28, 29] and improved, inter alia, by [32, 33].


• TOPSIS, developed by Hwang and Yoon [34, 35], based on the assumption that the best alternative should have the shortest distance from the positive ideal solution (PIS) and the farthest distance from the negative ideal solution (NIS).

Table 1. Normalized indicator values and their weights
Indicator | Weight | Gasoline | Diesel | HEV | PHEV10 | PHEV40 | BEV
AD: Abiotic depletion (g Sb eq) | 29.86% | 1.00 | 0.89 | 0.90 | 0.72 | 0.84 | 0.72
AC: Acidification (g SO2 eq) | 7.85% | 0.84 | 0.78 | 1.00 | 0.72 | 0.95 | 0.78
EUT: Eutrophication (g PO4^3- eq) | 3.19% | 0.37 | 0.40 | 0.42 | 0.49 | 0.82 | 1.00
GW: Global warming (g CO2 eq) | 15.63% | 1.00 | 0.89 | 0.90 | 0.68 | 0.79 | 0.61
OLD: Ozone layer depletion (g CFC-11 eq) | 0.01% | 0.10 | 0.09 | 0.17 | 0.28 | 1.00 | 0.03
PO: Photochemical oxidation (g C2H4 eq) | 2.23% | 0.95 | 0.66 | 1.00 | 0.81 | 0.88 | 0.46
FC: Fuel consumption (MJprim) | 1.23% | 1.00 | 0.88 | 0.86 | 0.46 | 0.28 | 0.00
NOx (g) | 14.76% | 0.17 | 1.00 | 0.17 | 0.08 | 0.04 | 0.00
CO (g) | 14.46% | 1.00 | 0.67 | 1.00 | 0.45 | 0.26 | 0.00
PM: Particulate matter (g) | 10.79% | 0.97 | 1.00 | 0.97 | 0.61 | 0.48 | 0.32

2.3 Conventional (Crisp) Reasoning Method

The use of a conventional rule-based reasoning system in MCDA is not a commonly known approach. However, efforts have been made to use it, inter alia, in agricultural sustainability investigations and investment analyses. Examples are the DEXiPM system, a decision support model for assessing the sustainability of agricultural cropping systems that was developed to design any hierarchical decision tree [36], and a method for the multi-criteria comparison of investment projects [19]. The essence of the rule-based approach is the transformation of the premises of the inference model from crisp values first into interval values and then into linguistic values. The next step is for experts to formulate rules in the form of Horn clauses that allow reliable inference about the value of the conclusions, presented in the form of linguistic variables. Due to the extent of the problem, the rules are divided into hierarchical rule sets linked by intermediate conclusions. The final conclusions can be formulated as a numerical assessment on an appropriate scale and then simply used in the rating of the analyzed alternatives.

2.4 Mamdani’s Fuzzy Inference Method

The most commonly used fuzzy inference technique is the so-called Mamdani method, proposed by Mamdani and Assilian [37]. This model was created for the implementation of control systems simulating human behavior. The Mamdani model is a set of rules, each of which defines a so-called fuzzy point. The rules are as follows:

$$
\begin{aligned}
R_1 &: \text{IF } (x_1 \text{ is } X_{11}) \text{ AND } (x_2 \text{ is } X_{12}) \text{ AND} \ldots \text{AND } (x_m \text{ is } X_{1m}) \text{ THEN } (y = Y_1)\\
R_2 &: \text{IF } (x_1 \text{ is } X_{21}) \text{ AND } (x_2 \text{ is } X_{22}) \text{ AND} \ldots \text{AND } (x_m \text{ is } X_{2m}) \text{ THEN } (y = Y_2)\\
&\;\;\vdots\\
R_n &: \text{IF } (x_1 \text{ is } X_{n1}) \text{ AND } (x_2 \text{ is } X_{n2}) \text{ AND} \ldots \text{AND } (x_m \text{ is } X_{nm}) \text{ THEN } (y = Y_n)
\end{aligned}
\tag{1}
$$

where $x_i$ are the crisp values of the current input, and $X_{ij}$ and $Y_k$ are linguistic values (represented by fuzzy sets) of the variables $x_i$ and $y$ in the respective universes. Inference is performed in the following way. The first step (fuzzification) is to take the crisp inputs $x_i$ and determine the degree to which these inputs belong to each of the appropriate fuzzy sets. In the second step, the fuzzified inputs are applied to the antecedents of the fuzzy rules. If a given fuzzy rule has multiple antecedents, the fuzzy operator AND is used to obtain a single number that represents the result of the antecedent evaluation, which in turn determines the value of the conclusion. In the third step, the membership functions of all the rule consequents are combined into a single fuzzy set. The last step is defuzzification. The most popular defuzzification method is the centroid technique, which finds the point representing the center of gravity (COG) of the aggregated fuzzy set A in the interval [a, b].
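The four inference steps can be traced in a small Python sketch for two inputs. Everything concrete in it is an assumption made for illustration: the triangular membership functions in `SETS`, the nine-rule base in `RULES`, the use of the minimum for AND and for clipping the consequents, the maximum for aggregation, and a discretized centroid. The source does not specify the membership functions or the rules actually used.

```python
def tri(x, a, b, c):
    """Triangular membership with feet a, c and peak b (shoulders allowed)."""
    if x <= a or x >= c:
        return 1.0 if x == b else 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical fuzzy sets shared by both inputs and the output, on [0, 1].
SETS = {"low": (0.0, 0.0, 0.5), "middle": (0.0, 0.5, 1.0), "high": (0.5, 1.0, 1.0)}

# Hypothetical rule base: (label of x1, label of x2) -> label of y.
RULES = {
    ("low", "low"): "low", ("low", "middle"): "low", ("low", "high"): "middle",
    ("middle", "low"): "low", ("middle", "middle"): "middle",
    ("middle", "high"): "high", ("high", "low"): "middle",
    ("high", "middle"): "high", ("high", "high"): "high",
}


def mamdani(x1, x2, steps=200):
    # Step 1: fuzzification of the crisp inputs.
    mu1 = {label: tri(x1, *abc) for label, abc in SETS.items()}
    mu2 = {label: tri(x2, *abc) for label, abc in SETS.items()}
    # Step 2: rule evaluation, with AND taken as the minimum.
    strength = {}
    for (l1, l2), out in RULES.items():
        fire = min(mu1[l1], mu2[l2])
        strength[out] = max(strength.get(out, 0.0), fire)
    # Steps 3 and 4: max-aggregation of clipped consequents, then centroid.
    num = den = 0.0
    for i in range(steps + 1):
        y = i / steps
        mu = max(min(s, tri(y, *SETS[out])) for out, s in strength.items())
        num += y * mu
        den += mu
    if den == 0.0:
        raise ValueError("no rule fired; inputs must lie in [0, 1]")
    return num / den


score = mamdani(0.72, 0.61)
```

With inputs such as 0.72 and 0.61, several rules fire partially at once, so the defuzzified score varies smoothly with the inputs instead of jumping between classes.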

3 Results

3.1 AHP Method

Table 2 presents a pairwise comparison of the criteria. The magnitudes were defined using the ordinal scale presented in [29] and the weights presented in Table 1. To construct Table 2 we compared each pair of weights assigned to the indicators and, on this basis, chose the magnitude from the scale contained in the above-mentioned publication. Table 3 presents the final priority of each criterion.

Table 2. Pairwise comparison of criteria
Criterion | Abiotic depletion | Acidification | Eutrophication | Global warming | Ozone layer depletion | Photochemical oxidation | Fuel consumption | NOx | CO | Particulate matter
Abiotic depletion | 1.00 | 7.00 | 9.00 | 5.00 | 9.00 | 9.00 | 9.00 | 5.00 | 5.00 | 7.00
Acidification | 0.14 | 1.00 | 1.00 | 0.33 | 3.00 | 1.00 | 3.00 | 0.33 | 0.33 | 1.00
Eutrophication | 0.11 | 1.00 | 1.00 | 0.20 | 1.00 | 1.00 | 1.00 | 0.33 | 0.33 | 0.33
Global warming | 0.20 | 3.00 | 5.00 | 1.00 | 5.00 | 5.00 | 5.00 | 1.00 | 1.00 | 1.00
Ozone layer depletion | 0.11 | 0.33 | 1.00 | 0.20 | 1.00 | 1.00 | 1.00 | 0.20 | 0.20 | 0.33
Photochemical oxidation | 0.11 | 1.00 | 1.00 | 0.20 | 1.00 | 1.00 | 1.00 | 0.20 | 0.20 | 0.33
Fuel consumption | 0.11 | 0.33 | 1.00 | 0.20 | 1.00 | 1.00 | 1.00 | 0.20 | 0.20 | 0.33
NOx | 0.20 | 3.00 | 3.00 | 1.00 | 5.00 | 5.00 | 5.00 | 1.00 | 1.00 | 1.00
CO | 0.20 | 3.00 | 3.00 | 1.00 | 5.00 | 5.00 | 5.00 | 1.00 | 1.00 | 1.00

Table 3. Final priority of criterion
The criterion | Final priority of criterion
Abiotic depletion | 0.390
Acidification | 0.050
Eutrophication | 0.033
Global warming | 0.123
Ozone layer depletion | 0.027
Photochemical oxidation | 0.030
Fuel consumption | 0.027
NOx | 0.117
CO | 0.117
Particulate matter | 0.087

Table 4 presents the pairwise comparison matrix of the analyzed alternatives according to the criterion Abiotic depletion. This table was derived from the data contained in Table 1 using the ordinal scale presented in [29]. To construct Table 4 we compared each pair of indicator values assigned to the analyzed vehicles and, on this basis, chose the magnitude from the scale contained in the above-mentioned publication. The same tables were developed for the remaining criteria. On the basis of these tables, the performance of each alternative with respect to each criterion was calculated (see Table 5).

Table 4. Pairwise comparison of alternatives according to the criterion abiotic depletion
 | Gasoline | Diesel | HEV | PHEV10 | PHEV40 | BEV
Gasoline | 1.00 | 0.33 | 0.33 | 0.11 | 0.20 | 0.11
Diesel | 3.00 | 1.00 | 1.00 | 0.20 | 1.00 | 0.20
HEV | 3.00 | 1.00 | 1.00 | 0.14 | 1.00 | 0.14
PHEV10 | 9.00 | 5.00 | 7.00 | 1.00 | 5.00 | 1.00
PHEV40 | 5.00 | 1.00 | 1.00 | 0.20 | 1.00 | 0.20
BEV | 9.00 | 5.00 | 7.00 | 1.00 | 5.00 | 1.00
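As an illustration of how local priorities can be derived from such a pairwise comparison matrix, the following Python sketch applies the row geometric mean method to the values of Table 4. The source does not state which prioritization technique was used, so the geometric mean is an assumption here; with these inputs it gives values close to the Abiotic depletion column of Table 5. The function name `ahp_priorities` is hypothetical.

```python
from math import prod

ALTERNATIVES = ["Gasoline", "Diesel", "HEV", "PHEV10", "PHEV40", "BEV"]

# Pairwise comparisons for abiotic depletion, as reported in Table 4.
PAIRWISE_AD = [
    [1.00, 0.33, 0.33, 0.11, 0.20, 0.11],
    [3.00, 1.00, 1.00, 0.20, 1.00, 0.20],
    [3.00, 1.00, 1.00, 0.14, 1.00, 0.14],
    [9.00, 5.00, 7.00, 1.00, 5.00, 1.00],
    [5.00, 1.00, 1.00, 0.20, 1.00, 0.20],
    [9.00, 5.00, 7.00, 1.00, 5.00, 1.00],
]


def ahp_priorities(matrix):
    """Approximate the AHP priority vector by normalized row geometric means."""
    n = len(matrix)
    means = [prod(row) ** (1.0 / n) for row in matrix]
    total = sum(means)
    return [m / total for m in means]


priorities = dict(zip(ALTERNATIVES, ahp_priorities(PAIRWISE_AD)))
```

The global priority of an alternative would then be obtained by weighting such local priorities with the criterion priorities; that aggregation step is not shown in this sketch.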

The global priority for each alternative is presented in Table 6. According to the data in Table 6, BEV is the best.

3.2 TOPSIS

The weighted normalized decision matrix was calculated using the data of Table 1. The criterion weights derived with the AHP method, presented in Table 3, were used in the calculations. Table 7 presents the weighted normalized decision matrix. Table 8 presents the distances from the PIS and NIS for the alternatives, together with the closeness coefficients.
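The distance and closeness calculation can be sketched in Python using the weighted normalized values reported in Table 7. The sketch assumes that every criterion is an impact to be minimized, so the PIS takes the lowest and the NIS the highest value of each criterion, and that distances are Euclidean. The function name `topsis_closeness` is hypothetical. Under these assumptions the sketch gives a closeness coefficient of about 0.874 for BEV, in line with Table 8.

```python
from math import sqrt

# Weighted normalized values from Table 7; criteria order:
# AD, AC, EUT, GW, OLD, PO, FC, NOx, CO, PM.
WEIGHTED = {
    "Gasoline": [0.3707, 0.0424, 0.0184, 0.1235, 0.0028, 0.0287, 0.0271, 0.0203, 0.1173, 0.0844],
    "Diesel":   [0.3305, 0.0392, 0.0195, 0.1099, 0.0025, 0.0199, 0.0239, 0.1173, 0.0782, 0.0872],
    "HEV":      [0.3350, 0.0503, 0.0205, 0.1114, 0.0045, 0.0303, 0.0232, 0.0203, 0.1173, 0.0844],
    "PHEV10":   [0.2680, 0.0360, 0.0239, 0.0843, 0.0075, 0.0244, 0.0125, 0.0094, 0.0527, 0.0534],
    "PHEV40":   [0.3126, 0.0477, 0.0404, 0.0979, 0.0271, 0.0267, 0.0075, 0.0047, 0.0306, 0.0422],
    "BEV":      [0.2680, 0.0392, 0.0491, 0.0753, 0.0009, 0.0140, 0.0000, 0.0000, 0.0000, 0.0281],
}


def topsis_closeness(matrix):
    """Closeness coefficients, assuming all criteria are costs (lower is better)."""
    columns = list(zip(*matrix.values()))
    pis = [min(col) for col in columns]  # positive ideal: lowest impact
    nis = [max(col) for col in columns]  # negative ideal: highest impact
    closeness = {}
    for name, row in matrix.items():
        d_pis = sqrt(sum((v - p) ** 2 for v, p in zip(row, pis)))
        d_nis = sqrt(sum((v - n) ** 2 for v, n in zip(row, nis)))
        closeness[name] = d_nis / (d_pis + d_nis)
    return closeness


closeness = topsis_closeness(WEIGHTED)
ranking = sorted(closeness, key=closeness.get, reverse=True)
```

The alternative with the highest closeness coefficient is the one nearest to the PIS and farthest from the NIS.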

Table 5. The performance of each alternative with respect to each criterion
Alternative | Abiotic depletion | Acidification | Eutrophication | Global warming | Ozone layer depletion | Photochemical oxidation | Fuel consumption | NOx | CO | Particulate matter
Gasoline | 0.028 | 0.149 | 0.237 | 0.028 | 0.195 | 0.047 | 0.040 | 0.196 | 0.034 | 0.039
Diesel | 0.077 | 0.225 | 0.237 | 0.061 | 0.195 | 0.253 | 0.040 | 0.022 | 0.070 | 0.039
HEV | 0.068 | 0.034 | 0.237 | 0.061 | 0.195 | 0.054 | 0.046 | 0.196 | 0.034 | 0.039
PHEV10 | 0.372 | 0.324 | 0.224 | 0.324 | 0.156 | 0.064 | 0.164 | 0.196 | 0.166 | 0.175
PHEV40 | 0.083 | 0.044 | 0.041 | 0.132 | 0.023 | 0.071 | 0.218 | 0.196 | 0.220 | 0.225
BEV | 0.372 | 0.225 | 0.023 | 0.395 | 0.235 | 0.511 | 0.492 | 0.196 | 0.477 | 0.482

Table 6. The global priority for each alternative
Alternative | Global priority
BEV | 0.357288
PHEV10 | 0.274493
PHEV40 | 0.126740
Diesel | 0.087428
HEV | 0.084214
Gasoline | 0.069837

Table 7. Weighted normalized decision matrix
Criterion | Gasoline | Diesel | HEV | PHEV10 | PHEV40 | BEV
Abiotic depletion | 0.3707 | 0.3305 | 0.3350 | 0.2680 | 0.3126 | 0.2680
Acidification | 0.0424 | 0.0392 | 0.0503 | 0.0360 | 0.0477 | 0.0392
Eutrophication | 0.0184 | 0.0195 | 0.0205 | 0.0239 | 0.0404 | 0.0491
Global warming | 0.1235 | 0.1099 | 0.1114 | 0.0843 | 0.0979 | 0.0753
Ozone layer depletion | 0.0028 | 0.0025 | 0.0045 | 0.0075 | 0.0271 | 0.0009
Photochemical oxidation | 0.0287 | 0.0199 | 0.0303 | 0.0244 | 0.0267 | 0.0140
Fuel consumption | 0.0271 | 0.0239 | 0.0232 | 0.0125 | 0.0075 | 0.0000
NOx | 0.0203 | 0.1173 | 0.0203 | 0.0094 | 0.0047 | 0.0000
CO | 0.1173 | 0.0782 | 0.1173 | 0.0527 | 0.0306 | 0.0000
Particulate matter | 0.0844 | 0.0872 | 0.0844 | 0.0534 | 0.0422 | 0.0281

Table 8. Distances from PIS and NIS and closeness coefficients for the analyzed alternatives
 | BEV | PHEV10 | PHEV40 | HEV | Gasoline | Diesel
Distance from PIS | 0.03087 | 0.062669 | 0.072008 | 0.155439 | 0.176665 | 0.170584
Distance from NIS | 0.21378 | 0.174702 | 0.163514 | 0.110373 | 0.104961 | 0.071095
Closeness coefficient | 0.873819 | 0.735986 | 0.694261 | 0.415229 | 0.372698 | 0.294171

According to the data in Table 8, BEV is the best.

3.3 Conventional (Crisp) Reasoning Method

The core idea of the rule-based reasoning approach is to evaluate the analyzed vehicles using if-then rules. In the conventional approach, crisp linguistic variables were used to describe the environmental impacts of the analyzed vehicles. The variables take values from the domain {low, middle, high}. The estimated assessment of vehicles from the point of view of their impact on the environment is expressed on a scale of 1 to 5 (1 denotes a vehicle with the lowest rating, that is, the highest negative impact on the environment).


The antecedents were transformed into linguistic variables by dividing their range of variation [0; 1] into three intervals of the same length: [0; 0.3333], (0.3333; 0.6667] and (0.6667; 1]. The transformation is very simple. For example, an Abiotic depletion value of 0.72 for the BEV vehicle gives the linguistic value high, and a Global warming value of 0.61 for the same vehicle gives the linguistic value middle. Because the contribution of the vehicle fleet to ozone layer depletion is very low in the context of the global environmental problem, this factor was omitted from further analysis. Unfortunately, building a rule set that takes into account all combinations of values of the input variables is not possible because of the exponential “explosion” of the number of rules (the number of rules grows exponentially with the number of variables in the premise). Introducing intermediate criteria (“artificial” or partial variables) is the only possible way to limit the complexity of the description and to bring the knowledge base model to a form manageable by experts. In our view, the rationale for structuring the knowledge base is as follows. First, we consider the LCA factors and the factors related to vehicle operation independently. This makes it possible to balance the ratings of these two groups of factors and so to take into account the objectives and strategies of the stakeholders. However, such a division of the rule set does not solve the problem. Since there are no grounds for a substantive decomposition of the subset, it is logical to divide the indicators according to the importance of their impact on the level of the relevant phenomena on a global scale. The first subset includes abiotic depletion and global warming, whose shares in the level of these phenomena in Europe amount to 29.86% and 15.63%, and the second subset includes acidification, eutrophication and photochemical oxidation, with shares of 7.85%, 3.19% and 2.23%, respectively.
Due to the character of the variables, it was possible to generate the examples automatically in the form of a Cartesian product. Next, the crisp values of the intermediate and final assessments were assigned. The knowledge model can be presented in the form of five decision tables (Tables 9, 10, 11, 12 and 13).

Table 9. Decision table for LCIA high weights indicators assessment (selected rows)

No.  Abiotic depletion  Global warming  LCIA high weights
1    High               High            High
2    High               Middle          High
…    …                  …               …
9    Low                Low             Low
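The automatic generation of examples as a Cartesian product can be sketched as follows. This is a minimal illustration, not the authors' tool: the helper name is hypothetical, and only the premise part of each row is generated, since the conclusions were assigned by experts.

```python
from itertools import product

LABELS = ("High", "Middle", "Low")


def decision_table_skeleton(inputs):
    """Enumerate every combination of labels for the given input names."""
    return [dict(zip(inputs, combo)) for combo in product(LABELS, repeat=len(inputs))]


table9 = decision_table_skeleton(["Abiotic depletion", "Global warming"])
table10 = decision_table_skeleton(
    ["Acidification", "Eutrophication", "Photochemical oxidation"]
)
table12 = decision_table_skeleton(["Fuel consumption", "NOx", "CO", "Particulate matter"])

print(len(table9), len(table10), len(table12))  # 9 27 81
print(table9[1])  # second row: High / Middle
```

With two, three and four inputs the skeletons contain 9, 27 and 81 rows, which matches the row counts of Tables 9, 10 and 12.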

The reasoning is carried out in five stages (steps). In the first stage, the LCA high weights indicators assessment is established from the relations between abiotic depletion and global warming, as defined by the set of rules in Table 9. Next, the LCA low weights indicators assessment is established using the rules presented in Table 10, and the global LCA indicators assessment using the decision table presented in Table 11. In the fourth stage, vehicle operation is evaluated as the joint effect of fuel consumption, NOx emission, CO emission and the particulate matter indicator (Table 12). Finally, the assessment of each vehicle type is established on the basis of the previously evaluated LCA global level and the vehicle operation complex indicator (Table 13). The intermediate and final results of the reasoning are presented in Table 14.

Table 10. Decision table for LCIA low weights indicators assessment (selected rows)

No.  Acidification  Eutrophication  Photochemical oxidation  LCIA low weights
1    High           High            High                     High
2    High           High            Middle                   High
…    …              …               …                        …
27   Low            Low             Low                      Low

Table 11. Decision table for global LCI indicators assessment (selected rows)

No.  LCA high weights  LCA low weights  LCA global
1    High              High             High
2    High              Middle           High
…    …                 …                …
9    Low               Low              Low

Table 12. Decision table for vehicle operation indicators assessment (selected rows)

No.  Fuel consumption  NOx   CO    Particulate matter  Vehicle operation
1    High              High  High  High                High
2    High              High  High  Middle              High
…    …                 …     …     …                   …
81   Low               Low   Low   Low                 Low

The crisp calculation flattens the evaluation of the individual vehicle types significantly: the diesel vehicle receives the lowest assessment and all other vehicles receive the same rating.

3.4 Mamdani’s Fuzzy Inference Method

In Mamdani’s method, we use the same rules as in the case of crisp reasoning. Mamdani’s method requires that all input variables are either presented directly in the form of linguistic variables or transformed into this form. In our example the inputs are crisp values, which is why it is necessary to transform them into linguistic variables. For each input variable, the input membership functions are defined by triangular fuzzy numbers, which can be represented in the following form: Tlow(0; 0; 0.5), Tmiddle(0; 0.5; 1.0), Thigh(0.5; 1.0; 1.0).
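A minimal Python sketch of fuzzification with these three triangular fuzzy numbers is given below. It assumes the standard piecewise-linear triangular membership formula; the function names are hypothetical, and the formulas actually used by the authors may differ in detail.

```python
def triangular(x, a, b, c):
    """Membership grade of x in the triangular fuzzy number (a; b; c)."""
    if x < a or x > c:
        return 0.0
    if x == b:
        return 1.0  # also covers the shoulders where a == b or b == c
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Parameters of the three input membership functions
TERMS = {
    "low": (0.0, 0.0, 0.5),
    "middle": (0.0, 0.5, 1.0),
    "high": (0.5, 1.0, 1.0),
}


def fuzzify(x):
    """Return the membership grade of a crisp value in each linguistic term."""
    return {name: triangular(x, *params) for name, params in TERMS.items()}


grades = fuzzify(0.72)
print({name: round(grade, 2) for name, grade in grades.items()})
```

Unlike the crisp intervals, a value such as 0.72 now belongs partly to middle and partly to high, which is what lets the fuzzy method differentiate vehicles that crisp reasoning rates identically.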


Table 13. Decision table for final assessment (selected rows)

No.  LCIA global  Vehicle operation  Final assessment
1    High         High               1
2    High         Middle             2
…    …            …                  …
9    Low          Low                5

Table 14. Intermediate and final results of crisp reasoning

Indicator          Gasoline  Diesel  HEV     PHEV10  PHEV40  BEV
LCIA high weights  High      High    High    High    High    High
LCIA low weights   High      Middle  High    High    High    High
LCIA global        High      High    High    High    High    High
Vehicle operation  Middle    High    Middle  Middle  Low     Low
Final assessment   2         1       2       2       2       2

The values of the linguistic variables needed to calculate the values of the membership functions were set according to the formulas presented in [38]. Exemplary input data, represented by membership grades, are shown in Table 15.

Table 15. Exemplary input data for Mamdani’s reasoning after fuzzification

Indicator: Abiotic depletion

Value   Gasoline  Diesel  HEV   PHEV10  PHEV40  BEV
Low     0.00      0.00    0.00  0.00    0.00    0.00
Middle  0.00      0.22    0.19  0.55    0.31    0.55
High    1.00      0.78    0.81  0.45    0.69    0.45

The final assessment was evaluated as the weighted average of the partial results (Table 16).

Table 16. The final assessment of Mamdani’s reasoning

                  Gasoline  Diesel  HEV   PHEV10  PHEV40  BEV
Final assessment  1.04      1.22    1.72  2.08    2.21    2.55

The results of the inference are basically consistent with the results of crisp reasoning; however, the method differentiates vehicles that the previous analysis recognized as identical in terms of impact on the environment. The vehicle classes Gasoline, HEV, PHEV10, PHEV40 and BEV, which crisp reasoning assessed at the same level, now vary greatly.


4 Assessment and Comparison of the Results

Because the various methods evaluate the alternatives in different ways, a direct comparison of the results is not warranted. The results obtained with the different methods were therefore normalized, using the highest value as the base. The comparison of the normalized results of the methods is presented in Table 17. Regardless of the evaluation method, the top results are similar: in every case BEV vehicles received the highest evaluation. The lowest evaluations varied: in some cases the lowest evaluation was obtained for diesel engines and in others for gasoline engines.

Table 17. Comparison of various methods results

Method                Measure     Gasoline  Diesel  HEV   PHEV10  PHEV40  BEV
AHP                   Normalized  0.20      0.25    0.24  0.77    0.36    1.00
                      Ranking     6         4       5     2       3       1
TOPSIS                Normalized  0.43      0.34    0.48  0.84    0.79    1.00
                      Ranking     5         6       4     2       3       1
Classical rule-based  Normalized  2         1       2     2       2       2
                      Ranking     1         2       1     1       1       1
Mamdani’s             Normalized  0.41      0.48    0.67  0.82    0.87    1.00
                      Ranking     6         5       4     3       2       1
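The normalization against the highest value can be checked with a short Python sketch. The function names are hypothetical; the input scores are the final assessments of Mamdani’s reasoning from Table 16.

```python
def normalize_by_max(scores):
    """Divide every score by the highest score so that the best alternative gets 1.0."""
    top = max(scores.values())
    return {name: value / top for name, value in scores.items()}


def rank(scores):
    """Rank alternatives from the highest score (rank 1) to the lowest."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


mamdani = {
    "Gasoline": 1.04, "Diesel": 1.22, "HEV": 1.72,
    "PHEV10": 2.08, "PHEV40": 2.21, "BEV": 2.55,
}

normalized = {name: round(v, 2) for name, v in normalize_by_max(mamdani).items()}
print(normalized)
print(rank(mamdani))
```

Applied to Table 16, the sketch gives the normalized values and ranking listed for Mamdani’s method in Table 17.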

The intermediate estimates obtained by the different methods also vary. The classical rule-based approach flattens the assessments to such an extent that it is practically unsuitable for LCA. Increasing the expressiveness of this approach is possible, but it would require analyzing a much greater number of examples, which virtually rules out rational expert assessment. Despite the similarity of the results obtained with the classical MCDA methods and the rule-based methods, the evaluations made by Mamdani’s method appear more in line with common-sense judgments. This is because rule-based methods reflect a human-like way of thinking. However, further research is needed to establish whether this feature favors rule-based methods. A further advantage of rule-based methods is that the knowledge model, and hence the way a conclusion is reached, is readable by the user, which can hardly be said of classical MCDA methods.


5 Conclusions

The aim of our research was to verify the hypothesis that, for the assessment of different types of vehicles, suitably good results can be obtained using the conventional (crisp) reasoning method and Mamdani’s fuzzy inference method. The results obtained by these methods were compared with those of classical multi-criteria decision making methods (AHP and TOPSIS). The results demonstrate that, among the analyzed knowledge-based methods, crisp reasoning does not give satisfactory results. The remaining methods differentiate the assessments, but there is no way to assess the quality of these valuations. The fact that the AHP method, the TOPSIS method and Mamdani’s method all significantly differentiate the types of engines despite their different reasoning mechanisms leads to the cautious hypothesis that further work should focus on Mamdani’s method. Especially promising are fuzzy knowledge-based systems that map not only knowledge but also the experts’ method of inference. Future work will analyze other fuzzy reasoning methods, among others the commonly used Takagi-Sugeno and RIMER methods presented in [19]. However, the key problem that requires further research is the weighting of the LCA and operation indicators. As mentioned in Sect. 2.1, the most mature concept for weighting is that of the EDIP methodology. Future work will use this approach to determine weights and rules based on the intentions of certain European countries or large cities.

References

1. Nemry, F., Leduc, G., Mongelli, I., Uihlein, A.: Environmental Improvement of Passenger Cars (IMPRO-car) (2008). http://www.jrc.es/publications/pub.cfm?id=1564
2. Messagie, M., Macharis, C., Van Mierlo, J.: Key outcomes from life cycle assessment of vehicles, a state of the art literature review. In: Electric Vehicle Symposium and Exhibition (EVS27), 2013 World, pp. 1–9 (2013)
3. Messagie, M., Boureima, F.-S., Coosemans, T., Macharis, C., Van Mierlo, J.: A Range-based vehicle life cycle assessment incorporating variability in the environmental assessment of different vehicle technologies and fuels. Energies 7(3), 1467–1482 (2014)
4. Bauer, C., Hofer, J., Althaus, H.-J., Del Duce, A., Simons, A.: The environmental performance of current and future passenger vehicles: life cycle assessment based on a novel scenario analysis framework. Appl. Energy 157, 871–883 (2015)
5. Domingues, R., Marques, P., Garcia, R., Freire, F., Dias, L.C.: Applying multi-criteria decision analysis to the life-cycle assessment of vehicles. J. Clean. Prod. 107, 749–759 (2015)
6. Miettinen, P., Hämäläinen, R.P.: How to benefit from decision analysis in environmental life cycle assessment (LCA). Eur. J. Oper. Res. 102(2), 279–294 (1997)
7. Chevalier, J., Rousseaux, P.: Classification in LCA: building of a coherent family of criteria. Int. J. Life Cycle Assess. 4(6), 352–356 (1999)
8. Benoit, V., Rousseaux, P.: Aid for aggregating the impacts in Life Cycle assessment. Int. J. Life Cycle Assess. 8(2), 74–82 (2003)

9. Gaudreault, C., Samson, R., Stuart, P.: Implications of choices and interpretation in LCA for multi-criteria process design: de-inked pulp capacity and cogeneration at a paper mill case study. J. Clean. Prod. 17(17), 1535–1546 (2009)
10. Narayanan, D., Zhang, Y., Mannan, M.S.: Engineering for Sustainable Development (ESD) in bio-diesel production. Process Saf. Environ. Prot. 85(5), 349–359 (2007)
11. Perimenis, A., Walimwipi, H., Zinoviev, S., Müller-Langer, F., Miertus, S.: Development of a decision support tool for the assessment of biofuels. Energy Policy 39(3), 1782–1793 (2011)
12. Bouwman, M.E., Moll, H.C.: Environmental analyses of land transportation systems in The Netherlands. Transp. Res. Part D Transp. Environ. 7(5), 331–345 (2002)
13. Tan, R.R., Culaba, A.B., Purvis, M.R.I.: POLCAGE 1.0-a possibilistic life-cycle assessment model for evaluating alternative transportation fuels. Environ. Model Softw. 19(10), 907–918 (2004)
14. Zhou, Z., Jiang, H., Qin, L.: Life cycle sustainability assessment of fuels. Fuel 86(1–2), 256–263 (2007)
15. Safaei Mohamadabadi, H., Tichkowsky, G., Kumar, A.: Development of a multi-criteria assessment model for ranking of renewable and non-renewable transportation fuel vehicles. Energy 34(1), 112–125 (2009)
16. Rogers, K., Seager, T.P.: Environmental decision-making using life cycle impact assessment and stochastic multiattribute decision analysis: a case study on alternative transportation fuels. Environ. Sci. Technol. 43(6), 1718–1723 (2009)
17. Elghali, L., Cowell, S.J., Begg, K.G., Clift, R.: Support for sustainable development policy decisions - a case study from highway maintenance. Int. J. Life Cycle Assess. 11(1), 29–39 (2006)
18. Prado-Lopez, V., Seager, T.P., Chester, M., Laurin, L., Bernardo, M., Tylock, S.: Stochastic multi-attribute analysis (SMAA) as an interpretation method for comparative life-cycle assessment (LCA). Int. J. Life Cycle Assess. 19(2), 405–416 (2014)
19. Rębiasz, B., Macioł, A.: Comparison of classical multi-criteria decision making methods with fuzzy rule-based methods on the example of investment projects evaluation. In: Neves-Silva, R., Jain, L.C., Howlett, R.J. (eds.) Intelligent Decision Technologies: Proceedings of the 7th KES International Conference on Intelligent Decis, pp. 549–561. Springer, Cham (2015)
20. Guinée, J., Heijungs, R., Huppes, G., Kleijn, R., de Koning, A., van Oers, L., Wegener Sleeswijk, A., Suh, S., Udo de Haes, H.A., de Bruijn, H., van Duin, R., Huijbregts, M.A.J., Gorree, M.: Handbook on Life Cycle Assessment. Operational Guide to the ISO Standards. Kluwer Academic Publishers, Dordrecht (2002)
21. Dahlbo, H., Koskela, S., Pihkola, H., Nors, M., Federley, M., Seppälä, J.: Comparison of different normalised LCIA results and their feasibility in communication. Int. J. Life Cycle Assess. 18(4), 850–860 (2013)
22. Dias, L.C., Domingues, A.R.: On multi-criteria sustainability assessment: spider-gram surface and dependence biases. Appl. Energy 113, 159–163 (2014)
23. Myllyviita, T., Leskinen, P., Seppälä, J.: Impact of normalisation, elicitation technique and background information on panel weighting results in life cycle assessment. Int. J. Life Cycle Assess. 19(2), 377–386 (2014)
24. Huppes, G., van Oers, L.: Background review of existing weighting approaches in life Cycle Impact Assessment (LCIA) (2011). http://publications.jrc.ec.europa.eu/repository/handle/JRC67215
25. Stranddorf, H.K., Hoffmann, L., Schmidt, A.: Impact categories, normalisation and weighting in LCA (2005)

26. EEA: Annual European Union greenhouse gas inventory 1990–2014 and inventory report (2016). http://www.eea.europa.eu/publications/annual-european-union-greenhouse-gas. Accessed 01 Oct 2016
27. EEA: European Union emission inventory report 1990–2014 under the UNECE Convention on Long-range Transboundary Air Pollution (LRTAP) (2016). http://www.eea.europa.eu/publications/lrtap-emission-inventory-report-2016. Accessed 20 Oct 2016
28. Saaty, T.L.: The Analytic Hierarchy Process: Planning, Priority Setting, Resource Allocation. McGraw-Hill International Book, New York; London (1980)
29. Olson, L.: Decision Aids for Selection Problems. Springer, New York (1996)
30. Olson, L.: Comparison of weights in TOPSIS models. Math. Comput. Model. 40(7–8), 721–727 (2004)
31. Shih, H.-S., Shyur, H.-J., Lee, E.S.: An extension of TOPSIS for group decision making. Math. Comput. Model. 45(7), 801–813 (2007)
32. Adamcsek, E.: The Analytic Hierarchy Process and its Generalizations. Eötvös Loránd University (2008)
33. Coyle, G.: The Analytic Hierarchy Process (AHP). Practical Strategy (2004)
34. Pearl, J.: Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Publishers Inc., San Francisco (1988)
35. Parsons, S.: Current approaches to handling imperfect information in data and knowledge bases. Knowl. Data Eng. IEEE Trans. 8, 353–372 (1996)
36. Pelzer, E., Fortino, G., Bockstaller, C., Angevin, F., Lamine, C., Moonen, C., Vasileiadis, V., Guérin, D., Guichard, L., Reau, R., Messéan, A.: Assessing innovative cropping systems with DEXiPM, a qualitative multi-criteria assessment tool derived from DEXi. Ecol. Indic. 18, 171–182 (2012)
37. Mamdani, E.H., Assilian, S.: An experiment in linguistic synthesis with a fuzzy logic controller. Int. J. Man Mach. Stud. 7, 1–13 (1975)
38. Maciol, A., Rebiasz, B.: Advanced Methods in Investment Projects Evaluation. AGH University of Science and Technology Press, Krakow (2016)

The Research on Mongolian and Chinese Machine Translation Based on CNN Numerals Analysis

Wu Nier(✉), Su Yila, and Wanwan Liu

College of Information Engineering, Inner Mongolia University of Technology, Hohhot, China
{451575814,1825065997}@qq.com, [email protected]

Abstract. With the progress of science and technology and the development of artificial intelligence, machine translation based on neural networks is replacing statistical methods because of its better translation quality, especially for translation among the major languages of the world. A recurrent neural network can extract more features when encoding the source language, which is vital to the quality of translation. For Mongolian, however, it is difficult to obtain sufficient semantic relations because the available corpus is small. Therefore, a method of Mongolian-Chinese machine translation based on a Convolutional Neural Network (CNN) is proposed. Mongolian numerals are analyzed to improve the encoder and the selection of out-of-vocabulary words. When the source language is encoded, the convolutional neural network obtains the semantic relations and the key information of the sentence through the pooling layer. Then a Gated Recurrent Unit (GRU) network with a global attention mechanism decodes the encoded source language into Chinese. The experimental results show that the method outperforms the recurrent neural network (RNN) in translation accuracy and training speed.

Keywords: Machine translation · Mongolian and Chinese · CNN · Global attention mechanism · Numerals

© Springer Nature Switzerland AG 2019. K. Arai et al. (Eds.): SAI 2018, AISC 858, pp. 140–151, 2019. https://doi.org/10.1007/978-3-030-01174-1_11

1 Introduction

With the development of social communication, the spread of information has become more and more important; everything from domestic exchanges to international cooperation depends on it. Text translation is an important way to achieve the spread of information and communication. Translation done mainly by humans is gradually being displaced because it cannot meet the demands of large-scale translation work, while machine translation is gradually becoming mainstream because it is fully automatic and offers high translation quality.

Since the 1990s, statistical machine translation [1] has gradually become the mainstream method of machine translation, and it has drawn both praise and criticism from scientists and linguists. Scientists believe that the log-linear model [2], which is based on statistical analysis, can predict the probability of each target word, which improves the fluency of the translation as far as possible and avoids ill-formed sentences. Linguists argue that statistical machine translation depends too heavily on mathematical statistics and barely analyzes semantics, which may produce ambiguous translations and makes higher-quality translations difficult to obtain.

To address the problems of statistical machine translation, Cho [3] proposed a neural machine translation framework known as the encoder-decoder framework. It encodes the source language sentence as a fixed-length vector that contains all the semantic information of the whole source sentence; a decoder then decodes the corresponding target language sentence from this vector, which achieves the bilingual translation. However, the neural network in this framework is an RNN [4], and the model is trained with the Back Propagation Through Time algorithm [5]. The weaker the correlation between the position of a word and the current word, the smaller the gradient, which may cause the gradient to vanish. As a result, the translation quality is very low for long sentences. Jean [6] presents a machine translation system based on sentence-level Long Short-Term Memory (LSTM). The advantage of this model is that a memory cell added to the hidden layer of the neural network memorizes the vector information and alleviates the vanishing gradient. The disadvantages are still obvious, however: the memory cell requires an input gate, an output gate and a forget gate, each with its own activation function, which makes training very slow, especially for large corpora, and gives the model a high time complexity.

In 2013, Kalchbrenner [7] therefore used a CNN for the first time to encode the source language within the encoder-decoder framework. Combined with a single-hidden-layer RNN that decodes the target language, training is fast and the semantic information is obtained at the same time, because the computation can be parallelized. However, the study of machine translation based on neural networks is still at an early stage, especially for translation between Chinese and minority languages. Liu [8] proposed a method based on a hierarchical recurrent neural network that obtains the semantic relations of sentences to improve the translation model, and achieved better results in English-Chinese translation tasks. Shi [9] proposed using a DNN (deep neural network) to add an implicit representation of Chinese-English bilingual expressions to relation recognition in order to obtain efficient translation. In Mongolian-Chinese translation, only Inner Mongolia University is currently working on RNN-based translation tasks.

The approach in this paper can be summarized as follows. First, a CNN encodes the Mongolian sentence. Unlike a time-series model, it can process all the words in the sentence in parallel, which improves the training speed while still acquiring the semantic relations. Then a GRU network with the global attention model decodes the vector. Finally, the translation is predicted through a normalization function and evaluated with the BLEU value, which ultimately improves the quality of the Mongolian-Chinese machine translation model.

2 Model Building

The CNN-based model follows the end-to-end neural network structure [10], which consists of an encoder and a decoder. The encoder encodes Mongolian words and sentences and generates a set of vectors that contain the semantic information. The decoder decodes this set of vectors and then predicts the translation. The specific process is shown in Fig. 1.

Fig. 1. Model structure. (Components: input matrix M×N, convolution layers, pooling layers, fully connected layer, global attention, hidden layer, output layer.)

As shown in Fig. 1, the words of a sentence must be converted into vectors before a Mongolian sentence is encoded. The vector dimension is fixed as N and the number of word vectors in a sentence is set to M, which gives an input matrix of size M × N. The convolution layer applies the convolution operation to the set of Mongolian word vectors using filters with a preset window size. The sampling strategy of the pooling layer then extracts the features. All collected features are passed to the fully connected layer, which completes the encoding process. In the decoding process, the global attention mechanism calculates the alignment weights between the Chinese words and each Mongolian word of the source language, and a Gated Recurrent Unit (GRU) neural network predicts the final translation. The model is described below using a “total-sub-total” structure.

2.1 Vectorization

There are two ways to vectorize the corpus. The first is a one-hot representation whose dimension equals the dictionary size: the position of the word in the dictionary is set to 1 and all other positions are set to 0. The method is simple, but it produces an input matrix of very large dimension, and the many zero elements lead to a large number of useless operations in the convolution, which wastes CPU resources. The second is a distributed word vector, in which each word has its own specific code that differs from those of other words. When context is processed, the similarity between words is computed from their word vectors, which facilitates the encoding of the text. In addition, the dimension of the vector can be set manually, so the CPU can be used efficiently in the subsequent calculations of the convolutional neural network. This paper uses the distributed word vector representation, and the word vectors are trained with Skip-gram, which predicts the surrounding words, as shown in (1).

$$p(x_{output} \mid x_{input}) = \frac{\exp\left(m'^{T}_{x_{output}}\, m_{x_{input}}\right)}{\sum_{x=1}^{W} \exp\left(m'^{T}_{x}\, m_{x_{input}}\right)} \tag{1}$$

where $m_x$ and $m'_x$ are the input and output vector representations of word x, and W is the number of words in the dictionary.
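The Skip-gram probability in (1) is a softmax over dot products between the input vector of the given word and the output vectors of all dictionary words. A minimal pure-Python sketch follows; the function name and the toy two-word dictionary are hypothetical, and subtracting the largest score is a standard numerical-stability step that does not change the result.

```python
import math


def skipgram_probability(output_index, input_index, input_vectors, output_vectors):
    """p(x_output | x_input) from Eq. (1): softmax over dot products m'_x . m_input."""
    m_in = input_vectors[input_index]
    scores = [sum(a * b for a, b in zip(m_out, m_in)) for m_out in output_vectors]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    return exps[output_index] / sum(exps)


# Toy dictionary with W = 2 words and 2-dimensional vectors (illustrative values)
input_vectors = [[1.0, 0.0], [0.0, 1.0]]
output_vectors = [[1.0, 0.0], [0.0, 1.0]]

probs = [skipgram_probability(k, 0, input_vectors, output_vectors) for k in range(2)]
print(probs)
```

The probabilities over the whole dictionary sum to 1, which is what the denominator of (1) guarantees.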

2.2 The Construction of CNN Encoder

The CNN consists of the input matrix, the convolution layer, the pooling layer (sampling layer) and the fully connected layer. The convolution layer consists of several windows (convolution kernels), and each window covers a set of Mongolian word vectors in the input matrix. The window traverses the input matrix according to its stride. Each time the window moves by one stride, the set of word vectors within the window is sampled and the optimal value within the set is selected by the sampling strategy. The result is a source language vector that contains all the keyword information. The process is shown in Fig. 2.

Fig. 2. CNN encoder.

As shown in Fig. 2, the encoder uses the wide convolution method [11]: where a convolution kernel extends beyond the available Mongolian word vectors, the empty positions are filled with 0 elements, which preserves the integrity of the semantics and the accuracy of the convolution operation. The convolution windows are computed in parallel, each on a subset of the input matrix, and the dimension of a convolution window matches the dimension of the input matrix, as shown in (2).

$$x^{l}_{j} = h\Big(\sum_{i \in M_j} x^{l-1}_{i} * k^{l}_{ij} + b^{l}_{j}\Big) \tag{2}$$

Here h is the activation function of each neuron, $x^{l}_{j}$ is the output feature of layer l, $M_j$ is the set of input matrices, $b^{l}_{j}$ is the bias of the current convolution layer output, and $k^{l}_{ij}$ is the convolution kernel used in the summation. Within one convolution layer, all parameters are shared across the current input matrix. The function h is the ELU function shown in Fig. 3.

Fig. 3. ELU activation function.

The ELU activation function of the convolution layer and its derivative are shown in (3).

$$h(x) = \begin{cases} x, & x > 0 \\ \varepsilon(\exp(x) - 1), & x \le 0 \end{cases} \qquad h'(x) = \begin{cases} 1, & x > 0 \\ h(x) + \varepsilon, & x \le 0 \end{cases} \tag{3}$$

Here ε is a constant and x is the input. The ELU function reduces the vanishing gradient problem [12] because it is the identity on the positive interval, and it improves robustness to noise because it saturates softly to a small value on the negative interval. The corresponding semantic information is obtained by the max pooling strategy from the values of each convolution layer. Take, for example, the Mongolian sentence “ ”, corresponding to the Chinese sentence “I do not have any help with you.” After feature extraction, the Mongolian words for “I”, “you”, “go”, “no” and “help” are stored, so the keywords of the sentence are extracted completely. The implementation is shown in (4).

$$x^{l}_{j} = f\left(\beta^{l}_{j}\,\mathrm{Maxpooling}\left(x^{l-1}_{j}\right) + b^{l}_{j}\right) \tag{4}$$

where β is the multiplicative bias, b is the additive bias, and Maxpooling is the max pooling function of the pooling layer. Finally, all vector sets containing semantic information are passed to the fully connected layer.
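The encoder steps described by (2) to (4) can be sketched in pure Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, ε is set to 1, the window spans the full word-vector dimension, and pooling takes the maximum over the whole feature map with β = 1, b = 0 and f as the identity.

```python
import math


def elu(x, eps=1.0):
    """ELU of Eq. (3): identity for x > 0, eps * (exp(x) - 1) otherwise."""
    return x if x > 0 else eps * (math.exp(x) - 1.0)


def elu_derivative(x, eps=1.0):
    """Derivative of ELU from Eq. (3): 1 for x > 0, h(x) + eps otherwise."""
    return 1.0 if x > 0 else elu(x, eps) + eps


def wide_convolution(sentence, kernel, bias=0.0):
    """Wide convolution over an M x N sentence matrix with a w x N kernel.

    The sentence is padded with w - 1 zero rows on each side, so the
    feature map has M + w - 1 entries; ELU is applied to every entry.
    """
    width, dim = len(kernel), len(sentence[0])
    pad = [[0.0] * dim for _ in range(width - 1)]
    padded = pad + sentence + pad
    feature_map = []
    for start in range(len(padded) - width + 1):
        total = bias
        for row, k_row in zip(padded[start:start + width], kernel):
            total += sum(v * k for v, k in zip(row, k_row))
        feature_map.append(elu(total))
    return feature_map


def max_pool(feature_map):
    """Keep the strongest response of one kernel (max-over-time pooling)."""
    return max(feature_map)


sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # M = 3 words, N = 2 dimensions
kernel = [[1.0, 1.0], [1.0, 1.0]]                # window of w = 2 words
features = wide_convolution(sentence, kernel)
print(features, max_pool(features))
```

With zero padding, the first and last words take part in as many windows as the words in the middle of the sentence, which is the point of the wide convolution.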

2.3 Numeral Analysis

Because the word segmentation granularity differs, Mongolian and Chinese numerals also differ. Take the Chinese sentence “ ”, which means “The football team has eleven people in the field”. Under fine-grained segmentation the sentence is split as “ ”, and under coarse-grained segmentation as “ ”. The Mongolian word corresponding to “ten” is “ ”, but the Mongolian word corresponding to “eleven” is “ ”. The same numeral is thus written differently. A numeral mapping table is therefore set up to solve this problem and the resulting numeral selection problem: when a numeral does not appear in the dictionary, its translation is retrieved from the mapping table. Part of the mapping table is shown in Table 1.

Table 1. The partial mapping table
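The fallback from the dictionary to the numeral mapping table can be sketched as a two-step lookup. The sketch below is only illustrative: the function name is hypothetical, and the keys `MN_TEN` and `MN_ELEVEN` are placeholders standing in for the Mongolian numerals, which are not reproduced here.

```python
def translate_token(token, vocabulary, numeral_map, unknown="<unk>"):
    """Look a token up in the dictionary first, then in the numeral mapping table."""
    if token in vocabulary:
        return vocabulary[token]
    # Out-of-vocabulary token: fall back to the numeral mapping table
    return numeral_map.get(token, unknown)


# Placeholder entries (hypothetical keys, not real Mongolian tokens)
vocabulary = {"MN_TEN": "十"}
numeral_map = {"MN_TEN": "十", "MN_ELEVEN": "十一"}

print(translate_token("MN_TEN", vocabulary, numeral_map))      # found in the dictionary
print(translate_token("MN_ELEVEN", vocabulary, numeral_map))   # found in the mapping table
print(translate_token("MN_OTHER", vocabulary, numeral_map))    # neither: unknown marker
```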

2.4 Attention and Decoder

After the encoder has finished, global attention yields the positions of the source Mongolian words that align with the current Chinese word. The decoder then uses a GRU neural network with one hidden layer to decode. The structure of the decoder is shown in Fig. 4. In the figure, new information is passed through the fully connected layer into the GRU decoder, which contains the attention mechanism. The reset gate and the update gate in each hidden-layer node of the decoder decide whether to remember it. The activation formulas of the gate units are given in (5), (6) and (7).

Fig. 4. GRU decoder. (Components: fully connected layer, global attention, reset gate, update gate, GRU decoder.)

Update gate:

$$z_t = \sigma(W_z x_t + U_z h_{t-1}) \tag{5}$$

Reset gate:

$$r_t = \sigma(W_r x_t + U_r h_{t-1}) \tag{6}$$

New memory:

$$\tilde{h}_t = \tanh(r_t \circ U h_{t-1} + W x_t) \tag{7}$$

Formula (7) shows that the new memory $\tilde{h}_t$ is obtained from $h_{t-1}$, the state of the previous hidden layer, and the current input $x_t$. Formula (6) shows that the reset signal $r_t$ determines the importance of $h_{t-1}$ to $\tilde{h}_t$. Formula (5) shows that the update signal $z_t$ determines how much of $h_{t-1}$ is passed to the next state: if $z_t$ is close to 1, $h_{t-1}$ is passed to $h_t$ almost entirely; if $z_t$ is close to 0, the new memory $\tilde{h}_t$ is passed forward to the next hidden layer. The current hidden layer state $h_t$, shown in (8), is produced according to the output of the update gate.

Current hidden layer state:

$$h_t = (1 - z_t) \circ \tilde{h}_t + z_t \circ h_{t-1} \tag{8}$$

From each hidden layer state $h_t$, the Softmax function performs normalization to predict the target word at the current time step. The vector of the current target word is then fed into the next hidden layer state, and the current hidden layer state is used to predict the next target word. The Softmax function is shown in (9).

$$p(y_t \mid y_{<t}, x) = \mathrm{softmax}(W_s h_t + b_z) \tag{9}$$
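One decoder step following (5) to (9) can be sketched in pure Python. This is a minimal illustration, not the authors' code: the function names and the parameter dictionary are hypothetical, the global attention context is assumed to be already folded into the input x_t, and the word is chosen greedily from the softmax output.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def gru_step(x_t, h_prev, p):
    """One GRU step; p holds the weight matrices Wz, Uz, Wr, Ur, W and U."""
    z = [sigmoid(a + b) for a, b in zip(matvec(p["Wz"], x_t), matvec(p["Uz"], h_prev))]  # Eq. (5)
    r = [sigmoid(a + b) for a, b in zip(matvec(p["Wr"], x_t), matvec(p["Ur"], h_prev))]  # Eq. (6)
    uh, wx = matvec(p["U"], h_prev), matvec(p["W"], x_t)
    h_new = [math.tanh(ri * u + w) for ri, u, w in zip(r, uh, wx)]                       # Eq. (7)
    return [(1.0 - zi) * hn + zi * hp for zi, hn, hp in zip(z, h_new, h_prev)]           # Eq. (8)


def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_word(h_t, Ws, b):
    """Eq. (9): distribution over target words and the index of the most likely one."""
    logits = [s + bias for s, bias in zip(matvec(Ws, h_t), b)]
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__), probs


def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]


# Toy sizes: 3-dimensional input, 2-dimensional hidden state, 3 target words
params = {"Wz": zeros(2, 3), "Uz": zeros(2, 2), "Wr": zeros(2, 3),
          "Ur": zeros(2, 2), "W": zeros(2, 3), "U": zeros(2, 2)}
h_t = gru_step([1.0, 2.0, 3.0], [1.0, -2.0], params)
word, probs = predict_word(h_t, zeros(3, 2), [0.0, 1.0, 0.0])
print(h_t, word)
```

With all weights set to zero, both gates equal 0.5 and the new memory is 0, so the step returns half of the previous state. This makes the interpolation in (8) easy to check by hand.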

2.5 System Processing

After the system model has been built, the parameters of the model must be initialized. During training, the initialized parameters are adjusted and optimized by stochastic gradient descent (SGD) [13] to obtain the optimized parameters. The flow diagram of the system is shown in Fig. 5.

Fig. 5. The chart of model. (Stages and labels: Mongolian and Chinese parallel corpus; Chinese word segmentation; train set, valid set and test set; build CNN encoder; build global attention GRU decoders; model parameter assignment; model training; parameter optimization; BPTT; Adam; SGD; test set evaluation; BLEU score.)

3 Experiments

The research data in this paper comprise 67,000 Mongolian-Chinese parallel sentences and 20,000 paper-level aligned bilingual corpus entries downloaded from Mongolia Daily and other relevant websites. After preprocessing and proofreading of the aligned corpus, a total of 85,000 bilingual sentence pairs were obtained. The corpus is divided as follows: 75,000 sentence pairs for the training set, 8,000 sentence pairs for the validation set and 3,000 sentence pairs for the test set. The 8,000 most frequent words in the bilingual dictionary are also selected. In the model, the CNN encoder adopts a structure with two convolution layers and two pooling layers. The activation function of the convolution layers is ELU, and the decoder uses a single-hidden-layer GRU neural network with global attention. The word vector dimension is set to 300, the convolution window size to 4, the number of training iterations to 30 rounds and the initial learning rate to 1; the learning rate is decayed by a factor of 0.9 after every 3 rounds of iteration. The system adopts BLEU as the translation evaluation index. The experiments run on Ubuntu 14.04 with a GTX760 GPU. The benchmark systems are a Mongolian-Chinese translation system with an RNN structure and a Mongolian-Chinese SMT system.
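The learning-rate setting can be written as a step-decay schedule. The sketch below assumes that the rate is multiplied by 0.9 after every 3 completed rounds, which is one reading of the description above; the function name and the 0-based round index are illustrative choices.

```python
def learning_rate(epoch, initial=1.0, decay=0.9, every=3):
    """Step decay: multiply the rate by `decay` after each `every` completed rounds.

    `epoch` is 0-based, so rounds 0-2 use the initial rate.
    """
    return initial * decay ** (epoch // every)


# Learning rate for each of the 30 training rounds
schedule = [learning_rate(epoch) for epoch in range(30)]
print(schedule[0], schedule[3], schedule[-1])
```

Under this assumption the rate has been decayed nine times by the last of the 30 rounds.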


3.1 Training Speed

This paper first verifies the training speed by comparing the CNN encoder with the RNN encoder. Both are trained on the GPU, and the training time of the first 10 rounds is recorded. The experimental results are shown in Table 2.

Table 2. System run speed (min)

Epoch      1    2    3    4    5    6    7    8    9    10
Baseline   9.2  8.7  9.1  8.8  9.3  9.6  8.9  8.5  8.6  9.1
CNN        3.4  2.7  2.9  3.1  2.7  2.8  2.8  3.1  3.1  2.9
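The per-epoch times in Table 2 can be summarised with a few lines of Python. The sketch below only averages the ten recorded epochs and takes their ratio; the variable names are illustrative, and how a single speed-up figure should be derived from these times is an assumption here, since the paper does not state its calculation.

```python
from statistics import mean

# Training time per epoch in minutes, copied from Table 2.
baseline = [9.2, 8.7, 9.1, 8.8, 9.3, 9.6, 8.9, 8.5, 8.6, 9.1]
cnn = [3.4, 2.7, 2.9, 3.1, 2.7, 2.8, 2.8, 3.1, 3.1, 2.9]

mean_baseline = mean(baseline)
mean_cnn = mean(cnn)
# Baseline minutes spent per minute of CNN training.
time_ratio = mean_baseline / mean_cnn

print(f"mean baseline: {mean_baseline:.2f} min")
print(f"mean CNN:      {mean_cnn:.2f} min")
print(f"time ratio:    {time_ratio:.2f}")
```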

It can be seen from the table above that the system with the CNN encoder is 2.3 times faster than the benchmark system. The convolutional neural network processes word information in parallel through matrix computation, which is much faster than the RNN time-series model, which can only process one Mongolian sentence at a time.

3.2 Experimental Evaluation and Result

After training, the model is tested on the 3,000 sentence pairs of the test set with both the CNN-based system and the benchmark system. The CNN encoder input is a 30 × 300 matrix, where 30 is the maximum length of a Mongolian sentence and 300 is the word vector dimension of each word, and the sliding window step length is 1. Figure 6(a) shows the result with the single-convolution, single-layer structure, and Fig. 6(b) shows the result with the double-convolution, double-pooling structure.

Fig. 6. Test result.

Table 3. The BLEU value of each experiment

Experiment                                                                     BLEU    Change (positive and negative values)
Baseline (RNNSearch, SMT)                                                      23.65   0
CNN (+1 convolution layer + 2 pooling layers) + GRU                            24.89   +1.24
CNN (+2 convolution layers + 2 pooling layers + numeral mapping table) + GRU   25.61   +1.96
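The encoder dimensions given in this section (a 30 × 300 sentence matrix, a convolution window of 4 words, and a step length of 1) determine how many window positions a single filter visits. The pure-Python sketch below is only an illustration with random data and one hypothetical filter. It assumes no padding, uses the ELU activation named in the experimental setup, and is not the authors' encoder.

```python
import math
import random

MAX_LEN, EMB_DIM = 30, 300   # sentence length and word-vector size from the text
WINDOW, STRIDE = 4, 1        # convolution window and step length from the text


def elu(x, alpha=1.0):
    """ELU activation: identity for x > 0, alpha * (exp(x) - 1) otherwise."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def conv1d_over_words(sentence, kernel, stride=STRIDE):
    """Slide one filter (window x embedding) along the word axis, no padding."""
    window = len(kernel)
    outputs = []
    for start in range(0, len(sentence) - window + 1, stride):
        total = 0.0
        for offset in range(window):
            word_vec = sentence[start + offset]
            total += sum(w * k for w, k in zip(word_vec, kernel[offset]))
        outputs.append(elu(total))
    return outputs


random.seed(0)
# Random stand-ins for word vectors and filter weights (illustrative only).
sentence = [[random.gauss(0, 0.1) for _ in range(EMB_DIM)] for _ in range(MAX_LEN)]
kernel = [[random.gauss(0, 0.1) for _ in range(EMB_DIM)] for _ in range(WINDOW)]

features = conv1d_over_words(sentence, kernel)
print(len(features))  # (30 - 4) / 1 + 1 = 27 window positions
```

Each of the 27 positions covers four consecutive words at once, which is what lets the encoder process a whole sentence with matrix operations rather than step by step.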

As shown in Fig. 6, as the CNN structure changes, adding convolution and pooling layers encodes the semantic information better, which helps target decoding and prediction. In Fig. 6(b), with the double convolution layer structure, the BLEU value of the translation is clearly higher than that of the benchmark system, with a maximum gain of 1.96 BLEU points. This shows that translation quality is maintained while training is faster. Table 3 lists the best BLEU value obtained in each experiment and the corresponding positive or negative change.

Table 4. Part of result


It can be seen from the table above that the BLEU value increases as the number of convolution and pooling layers increases. In theory, adding more convolution and pooling layers to the CNN encoder could improve the system further, but it would place an excessive computational burden on the GPU. This paper therefore keeps the current configuration, under which the GPU load stays normal. Table 4 shows part of the results obtained on the test set corpus with the benchmark system and with the CNN-based Mongolian-Chinese translation system. The table shows that the translations obtained in the different experiments differ. By using two sets of convolution layers, the system in this paper extracts the text more fully, and its BLEU value is higher than that of the benchmark system.

4 Conclusion

This paper proposes using a CNN and a GRU neural network to encode the Mongolian corpus and to decode into the Chinese corpus, respectively. The CNN's parallel computation, weight sharing, and sampling of extracted features improve the training speed and the quality of semantic encoding, while the update gate and reset gate of the GRU memorize the main semantic information and further ease the vanishing-gradient and exploding-gradient problems that RNNs face over long distances. In addition, this paper calculates the weights of all the Mongolian words at the target-language end to obtain the alignment probability, which further improves the quality of translation. However, translation from Mongolian to other languages still faces a series of problems: the corpus is small and the input is insufficient. This not only greatly limits the development of minority language machine translation, but also affects cultural progress. In future work, we will collect more related data and materials, optimize the framework of the system, and study details such as mistranslation and OOV words further, contributing to the development of Mongolian culture and to the information industry of China's ethnic minorities.

References

1. Miao, H., Cai, D., Song, Y.: Phrase-based statistical machine translation. J. Shenyang Inst. Aeronaut. Eng. 24(2), 32–34 (2007)
2. Knoke, D., Burke, P.J.: Log-Linear Model. Truth & Wisdom Press, Shanghai (2012)
3. Cho, K., Van Merrienboer, B., Gulcehre, C., et al.: Learning phrase representations using RNN encoder-decoder for statistical machine translation. Comput. Sci. 2(11), 23–37 (2014)
4. Mikolov, T., Karafiát, M., Burget, L., et al.: Recurrent neural network based language model. In: Conference of the International Speech Communication Association, INTERSPEECH 2010, Makuhari, Chiba, Japan, September. DBLP, pp. 1045–1048 (2010)
5. Prokhorov, D.V., Si, J., Barto, A., et al.: BPTT and DAC: A Common Framework for Comparison. Handbook of Learning and Approximate Dynamic Programming, pp. 381–404. Wiley, Hoboken (2012)
6. Jean, S., Cho, K., Memisevic, R., et al.: On using very large target vocabulary for neural machine translation. Comput. Sci. (2014)
7. Kalchbrenner, N., Blunsom, P.: Recurrent continuous translation models (2013)
8. Liu, Y., Ma, C., Zhang, Y.: Hierarchical machine translation model based on deep recursive neural network. Chin. J. Comput. 40(4), 861–871 (2017)
9. Shi, X., Chen, Y.: Machine translation prospect based on discourse. In: Chinese Information Processing Society of China 25th Anniversary Academic Conference (2006)
10. Chen-wei: The research of Machine Translation technology based on Neural Network. University of Chinese Academy of Sciences (2016)
11. Chen, X.: Research on algorithm and application of deep learning based on convolutional neural network. Zhejiang Gongshang University (2013)
12. Wang, L., Yang, J., Liu, H., et al.: Research on a self-adaption algorithm of recurrent neural network based Chinese language model. Fire Control Command Control 41(5), 31–34 (2016)
13. Wang, B., Wang, Y.: Some properties relating to stochastic gradient descent methods. J. Math. 31(6), 1041–1044 (2011)

Feature Selection for Bloom’s Question Classification in Thai Language

Khantharat Anekboon(✉)
Department of Computer and Information Science, KMUTNB, Bangkok, Thailand
[email protected]

Abstract. Bloom’s taxonomy cognitive domain is a list of words describing knowledge and the development of intellectual skills. It is widely used in assessment. Currently, for the Thai language, teachers identify the Bloom’s taxonomy cognitive level manually, which is a tedious and time-consuming task. This study presents automatic classification of natural language questions in Thai, with a focus on feature selection. Several previous works have addressed Bloom’s taxonomy cognitive domain; however, they cannot be applied to the Thai language because of its characteristics. This study shows that verbs, adverbs, adjectives, conjunctions, and question tags should be selected as features in Thai exam classification. The dataset was collected from a number of websites on Bloom’s cognitive domain literature. Each question is processed through data cleaning, word segmentation, part-of-speech tagging, and feature selection. After feature selection, 70% of the data set is used to train a model. Four different classifier models, namely Naïve Bayes, decision tree, multilayer perceptron, and support vector machine, are used to show the effects of the proposed feature selection technique. The results on the testing data (30% of the data set) show that the proposed technique with the support vector machine gives accuracy, precision, and recall of 71.2%, 72.2%, and 71.2%, respectively.

Keywords: Feature selection · Question classification · Bloom’s cognitive domain · Thai language · Natural language processing

1 Introduction

This study focuses on classifying Thai questions in education with Bloom’s taxonomy. In education, every course has learning outcomes. Learning outcomes are statements that describe what students will know and be able to do at the end of learning [1]. At the end of learning, an assessment is used to check the learning outcomes of students. Examinations are a well-known assessment and evaluation technique in learning [2]. Bloom’s taxonomy is a classification of the different learning objectives. Bloom’s taxonomy cognitive domain is a widely accepted taxonomy for classifying objectives and assessment [3]. It consists of six levels: knowledge, comprehension, application, analysis, synthesis, and evaluation.

© Springer Nature Switzerland AG 2019 K. Arai et al. (Eds.): SAI 2018, AISC 858, pp. 152–162, 2019. https://doi.org/10.1007/978-3-030-01174-1_12


Currently, in Thai, the Bloom’s level of exam questions is identified manually by teachers or lecturers. This is a difficult task for teachers who are not experts in Bloom’s taxonomy. Manual identification by non-specialists produces many incorrect classifications and is also time-consuming. There are many challenging problems in classifying examination questions into Bloom’s taxonomy cognitive levels. (1) Although Bloom’s taxonomy provides keywords for classification, in the real world examination questions are written in natural language, so people can use words outside Bloom’s taxonomy to ask a question. (2) There is the overlapping keyword problem, which occurs when some keywords are assigned to more than one level. For example, in Table 1, the keyword word and are assigned in both application level and synthesis levels.

Table 1. Six levels of Bloom’s taxonomy cognitive domain with sample keywords

(3) The number of words in an exam question is very small (shown in Table 2). A question usually consists of only one sentence. (4) There is an ambiguous word problem: one question can contain keywords of more than one level. For example, question 5 in Table 2 has 2 Bloom’s taxonomy keywords. The word is a keyword of comprehension level and the word is a keyword of analysis level. (5) Some questions contain a keyword of one Bloom’s taxonomy level but are actually classified into another level. For example, question 3 consists of the keyword . That keyword falls into the application level; however, question 3 should be classified into the comprehension level.

Table 2. Example questions in Thai

Those challenging problems are factors affecting the accuracy of a classification model. Feature selection is an important process for selecting a subset of relevant features used to construct the classification model. Generally, data contain both irrelevant and relevant features. Using all features to construct the classification model may raise the curse of dimensionality problem, take a long time to train a model, and give low accuracy. The previous studies [2, 4–13] are unable to solve those challenging problems for the Thai language. To overcome those problems, this study proposes a feature selection technique for classifying questions in Thai in order to determine the Bloom’s taxonomy cognitive level automatically. The remainder of this paper is organized as follows: Sect. 2 describes the background knowledge of Bloom’s taxonomy cognitive domain. Section 3 describes related previous works. Section 4 shows the research methodology. Section 5 gives experimental results and discussion. Section 6 is the conclusion.

2 Bloom’s Taxonomy Cognitive Domain

Bloom’s taxonomy cognitive domain was introduced in 1956 by Benjamin Bloom and his colleagues [14]. It is a widely accepted taxonomy for classifying objectives and assessment [3]. It consists of six levels: knowledge, comprehension, application, analysis, synthesis, and evaluation. The levels are arranged in a hierarchy: an upper level requires the skills of its own level and also the skills of the lower levels.

Knowledge level emphasizes remembering and recalling. It does not require the student to understand. Questions at this level ask for knowledge of (1) terminology, (2) specific facts, (3) conventions, (4) trends and sequences, (5) classifications and categories, (6) criteria, (7) methodology, (8) principles and generalizations, and (9) theories and structures. Example questions at this level are shown below:

• (Who is the composer of the Phra Abhai Mani episode of the escape from Pisua Samudr?)
• (What does Wihok mean?)
• (How many color(s) are on the Thai flag?)

Comprehension level focuses on understanding. Questions in this group ask for translating, interpreting, and extrapolating. They can also ask for evidence of interpretation behavior such as inferences, generalizations, or summarizations. Example questions at this level are shown below:

• (What does the given verse mean?)
• (What does this picture mean?)
• (What is the main idea of this story?)

Application level focuses on using acquired knowledge to solve problems in a new situation that students have never faced before in the learning process. Questions at this level ask a student to apply, develop, or restructure something from acquired knowledge, facts, techniques, and rules. Example questions at this level are shown below:

• (What is the area of this room?)
• (Does this food contain powder? Please test it.)

Analysis level focuses on the breakdown of something into its component parts. Questions in this group ask a student to analyze elements, relationships, and organizational principles. Example questions at this level are shown below:

• (Describe the relationship of the components of a computer system.)
• (What are the most critical issues in education nowadays?)

Synthesis level combines materials or elements from many sources into a structure to form a whole. Questions at this level ask a student for (1) production of a unique communication or a plan, (2) a proposed set of operations, or (3) derivation of a set of abstract relations. Example questions at this level are shown below:

• (Write your own love poem.)
• (Code a calculator program in Java.)
• (Use the following characters to create the longest word.)

Evaluation level is a level of making judgments or giving opinions about something, such as values, ideas, techniques, or methods, in terms of internal evidence or external criteria. Example questions at this level are shown below:

• (Do you think that the pioneers did the right thing?)
• (Is it good programming style?)
• (From the Ramayana story, is Pipake a good guy?)

3 Related Work

There is no automatic question classification with Bloom’s taxonomy in Thai; teachers have to identify the Bloom’s cognitive level manually. Many studies work with Bloom’s taxonomy in English, but they cannot be applied to the Thai language because of its characteristics. This section reviews previous studies on English questions and their limitations.

Question classification of Bloom’s taxonomy cognitive level in English can be divided into three feature selection groups. The first group selects features only from Bloom’s taxonomy. The second group selects features from Bloom’s taxonomy and other words. The last group selects every word in the questions.

Selecting features only from Bloom’s taxonomy maps a list of keywords to each level of Bloom’s taxonomy cognitive domain. However, different studies used different Bloom’s taxonomy keywords. Chang and Chung [4] address the overlapping keyword problem by weighting the words that exist in more than one level. They consult several sources of Bloom’s taxonomy. If every source assigns a word to the same group, its weight is one. However, if there are three sources, and two of them assign the word to one group while the third assigns it to another group, the weight of the word is 0.66 for the first group and 0.33 for the second group. The limitation of this technique appears when there is an even number of sources. For example, suppose the “classify” keyword is found in two sources; the first source assigns it to the comprehension level, whereas the second assigns it to the application level. In this situation, the weight of each level is 0.5. Therefore, an even number of sources cannot solve the overlapping keyword problem.

Supriyanto, Yusof and Nurhadiono [13] use only Bloom’s taxonomy keywords. They reduce features by using both filter-based and wrapper-based feature selection, applying chi-square, information gain, forward feature selection, and backward feature elimination with a Naïve Bayes classifier. The limitation of this work is the limitation of chi-square itself: it should not be used when more than 20% of the cells have expected values less than 5. Moreover, forward feature selection and backward feature elimination take a lot of computational time.

Haris and Omar [5] select features from Bloom’s taxonomy and other words. Their study selects the verb, topic, focus, comment, and perspective words as features for the programming subject. A limitation is that the model cannot classify a question that does not contain any verb, such as “what is Encapsulation?” For developing rules, this work groups the part-of-speech (POS) into five specific categories: (1) supporting statement, (2) symbol, (3) method, class or function’s name, (4) not method, class or function’s name, and (5) special word. Another limitation is that these five categories suit only the programming subject.
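The source-agreement weighting attributed to Chang and Chung [4] earlier in this section can be illustrated as follows. This is a sketch under the assumption that a keyword's weight for a level is simply the fraction of sources that assign it to that level; the function name and the sample source lists are hypothetical.

```python
from collections import Counter


def level_weights(assignments):
    """Map each level to the fraction of sources that assign the keyword to it.

    `assignments` holds one level name per source.
    """
    counts = Counter(assignments)
    total = len(assignments)
    return {level: count / total for level, count in counts.items()}


# Three sources, two of which agree: one level dominates.
print(level_weights(["comprehension", "comprehension", "application"]))

# Two sources that disagree: the tie described in the text.
print(level_weights(["comprehension", "application"]))
```

With an even number of disagreeing sources the weights tie, so the weighting alone cannot pick a level.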
Yusof and Hui [7] compare three types of feature selection: (1) using every word in the questions as features, (2) the document frequency method, and (3) the category frequency-document frequency method. This work found that the category frequency-document frequency feature reduction technique is not suitable for question classification. The limitation of this work appears when many words occur only once or twice in a document.

Many classification techniques have been applied to classifying Bloom’s taxonomy cognitive domain, such as rule-based approaches [5, 6, 8], Support Vector Machines (SVM) [9–11], Naive Bayes, k-Nearest Neighbor, and neural networks [7, 12].
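Document-frequency filtering, one of the feature-reduction methods compared by Yusof and Hui [7], can be sketched as follows. The threshold, the function names, and the toy questions are assumptions made for illustration; the sketch shows why a collection dominated by words that occur in only one or two questions leaves few features after filtering.

```python
from collections import Counter


def document_frequency(docs):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # set(): count each term once per document
    return df


def select_by_df(docs, min_df=2):
    """Keep terms that appear in at least `min_df` documents."""
    df = document_frequency(docs)
    return sorted(term for term, count in df.items() if count >= min_df)


# Toy, already tokenised questions (illustrative only).
questions = [
    ["define", "the", "term", "inheritance"],
    ["explain", "the", "term", "polymorphism"],
    ["write", "a", "program", "using", "inheritance"],
]
print(select_by_df(questions, min_df=2))
```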

4 Research Methodology

Selecting features only from Bloom’s taxonomy, as in the previous literature, is not suitable for Thai questions because of translation. One English word can be translated into many Thai words, which increases the number of words assigned to more than one level of Bloom’s taxonomy and therefore increases misclassification. For example, the So Sethaputra dictionary and NECTEC’s Lexitron dictionary translate the word “describe” to , , , , , , , , and [15]. It can be seen from Table 1 that the word , , and are in the knowledge level. Moreover, a word translated from a dictionary is a formal word, which is not used for asking a question in the real world such as the word .


This study proposes a method of selecting features for classifying questions in Thai to determine the Bloom’s taxonomy cognitive level. The proposed method removes insignificant features and addresses the problems of no Bloom’s keyword, overlapping keywords, and ambiguous words by selecting verbs, adverbs, adjectives, conjunctions, and question tags as features and classifying a question in feature space. Figure 1 shows the overall process of this study. The first step is data cleaning. Next, because Thai has no spaces between words, word segmentation must be performed to separate the words. The following steps are identifying each word’s POS and feature selection. After feature selection, the selected features are encoded into a numeric feature vector by bag of words. Those feature vectors are the input to the classification model. The predicted group of each question in Bloom’s taxonomy cognitive domain is the final output of this study. The following subsections describe each process in detail.

Fig. 1. Overall proposed processes.

4.1 Cleaning Data and Word Segmentation

The data cleaning step removes punctuation and non-Thai words from a sentence. It also removes any question that asks more than one point. For example: How many color(s) are on the Thai flag? Identify those color(s).

(a) 1 color, red
(b) 2 colors, red and white
(c) 3 colors, red, white, and blue
(d) 4 colors, red, white, blue, and yellow.

Moreover, all numbers are transformed into words such as “2” is transformed to . Finally, each exam question sentence is segmented into words. Word segmentation is done by ICU.
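The punctuation and non-Thai filtering step can be sketched with a regular expression over the Thai Unicode block. This is a minimal illustration and not the author's implementation: the function name is hypothetical, the number-to-word conversion and the ICU segmentation are not shown, and because Arabic digits fall outside the Thai block, the number conversion would have to run before this filter.

```python
import re

# Anything that is neither in the Thai Unicode block (U+0E00 to U+0E7F)
# nor whitespace. Thai digits and signs inside the block are kept.
NON_THAI = re.compile(r"[^\u0E00-\u0E7F\s]+")
WHITESPACE = re.compile(r"\s+")


def clean_question(text):
    """Drop punctuation and non-Thai tokens, then normalise whitespace."""
    thai_only = NON_THAI.sub(" ", text)
    return WHITESPACE.sub(" ", thai_only).strip()


# A Thai question followed by an English gloss in parentheses.
raw = "ธงชาติไทยมีกี่สี? (Thai flag)"
print(clean_question(raw))
```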

4.2 Part-of-Speech Tagging and Feature Selection

There are 11 types of words in Thai: noun, pronoun, verb, adverb, numeral, demonstrative, preposition, conjunction, ending, interjection, and negation. The types of word that relate directly to interrogative sentences are the interrogative pronoun and the interrogative adverb. Considering only verbs cannot capture the exact meaning of a question: some verbs, such as “is”, provide no information about the level of Bloom’s taxonomy. In this paper, interrogative pronouns and interrogative adverbs are called question tags. Each level of Bloom’s taxonomy focuses on a different skill. Knowledge involves recognizing or remembering. Comprehension involves demonstrating understanding. Application involves using acquired knowledge to solve problems in new situations. Analysis involves analysis of elements, relationships, or organization. Synthesis involves production of a unique communication, a plan, a proposed set of operations, or derivation of a set of abstract relations. Evaluation involves presenting and defending opinions by judgments in terms of internal evidence or external criteria. It can be seen that the types of words that represent those skills can be verbs, adverbs, and conjunctions. Therefore, in this study, a word whose POS is verb, adverb, adjective, conjunction, or question tag is selected as a feature. This study uses unigram part-of-speech tagging.

4.3 Feature Encoding

After the features are selected, the data are ready for training the model. This step transforms the selected features into a numeric feature vector by bag of words. Each column of the feature vector represents the frequency of a specific word. The order of the words is not significant in this study.

4.4 Classification Algorithms

The numeric feature vectors are split into a training set (70%) and a test set (30%). This study compares the classification techniques Naïve Bayes, decision tree, multilayer perceptron, and SVM.
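The feature selection and encoding steps of Sects. 4.2 and 4.3 can be sketched end to end as follows. The tag names, the function names, and the English-glossed toy questions are all hypothetical stand-ins; a real run would start from segmented and POS-tagged Thai questions, and the tag set of the tagger actually used may differ.

```python
from collections import Counter

# Hypothetical tag names for verb, adverb, adjective, conjunction, question tag.
SELECTED_POS = {"VERB", "ADV", "ADJ", "CONJ", "QTAG"}


def select_features(tagged_question):
    """Keep only words whose part of speech is in SELECTED_POS."""
    return [word for word, pos in tagged_question if pos in SELECTED_POS]


def build_vocabulary(questions):
    """Assign a column index to every selected word, in sorted order."""
    words = sorted({w for q in questions for w in q})
    return {word: index for index, word in enumerate(words)}


def encode(question, vocabulary):
    """Bag of words: one count per vocabulary word; word order is ignored."""
    counts = Counter(question)
    vector = [0] * len(vocabulary)
    for word, index in vocabulary.items():
        vector[index] = counts[word]
    return vector


# Toy input standing in for segmented and tagged Thai questions.
tagged = [
    [("what", "QTAG"), ("is", "VERB"), ("the", "DET"), ("main", "ADJ"), ("idea", "NOUN")],
    [("how", "QTAG"), ("many", "ADJ"), ("colors", "NOUN"), ("is", "VERB")],
]
selected = [select_features(q) for q in tagged]
vocabulary = build_vocabulary(selected)
vectors = [encode(q, vocabulary) for q in selected]
print(vocabulary)
print(vectors)
```

The resulting vectors would then be split 70/30 and passed to a classifier, as described in Sect. 4.4.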

5 Experimental Result and Discussion

5.1 Data Set

The data set consists of 762 instances from general knowledge exams, collected from a number of websites on Bloom’s cognitive domain literature in Thai. The number of instances in each group is shown in Table 3.

Table 3. Number of questions at each Bloom’s level

Level of Bloom    Number of questions
Knowledge         293
Comprehension     78
Application       80
Analysis          150
Synthesis         96
Evaluation        65

5.2 Experimental Result

The experiments evaluate the individual classifiers Naïve Bayes, decision tree, multilayer perceptron, and SVM in Weka [16]; the results are shown in Tables 4, 5, 6, and 7, respectively. All parameters are set to their default values.

Table 4. Results from Naïve Bayes classifier

Feature set                          Number of attributes   Accuracy (%)   Precision (%)   Recall (%)
Proposed technique                   188                    65.5           65.7            65.5
Only Bloom’s taxonomy                122                    57.2           56.8            57.2
Bloom’s taxonomy and question tag    174                    61.6           61.7            61.6
Whole word                           1,246                  69.0           70.2            69.0

This study compares four types of feature selection: the proposed technique, only words from Bloom’s taxonomy, words from Bloom’s taxonomy plus question tags, and the whole data. The results show that the whole word feature set gives better accuracy, precision, and recall with Naïve Bayes and SVM. However, the difference between the whole word feature set and the proposed feature set is small; in the Naïve Bayes results it is 3.5% for accuracy and recall and 4.5% for precision. By contrast, there is a large difference between the number of attributes in the whole word feature set (1,246 attributes) and in the proposed feature set (188 attributes). Moreover, the multilayer perceptron could not be run with 1,246 attributes and 762 instances on an i7 2.5 GHz machine with 8 GB of RAM. With the decision tree, the proposed feature set gives slightly better accuracy, precision, and recall than the whole word feature set; however, these values are lower than those obtained with SVM. The proposed method removes insignificant features by POS. Moreover, it addresses the problem of no Bloom’s keyword by selecting verbs, adverbs, adjectives, conjunctions, and question tags as features. Classifying in the word vector feature space addresses the overlapping keyword and ambiguous word problems because all features of a question are merged into one point in the feature space.


Table 5. Results from decision tree classifier

Feature set                          Number of attributes   Accuracy (%)   Precision (%)   Recall (%)
Proposed technique                   188                    67.2           69.4            67.2
Only Bloom’s taxonomy                122                    63.8           67.2            63.8
Bloom’s taxonomy and question tag    174                    64.2           65.6            64.2
Whole word                           1,246                  65.1           66.1            65.1

Table 6. Results from multilayer perceptron classifier

Feature set                          Number of attributes   Accuracy (%)   Precision (%)   Recall (%)
Proposed technique                   188                    67.6856        73.9            67.7
Only Bloom’s taxonomy                122                    60.262         62.7            60.3
Bloom’s taxonomy and question tag    174                    56.3319        56.1            56.3
Whole word                           1,246                  N/A            N/A             N/A

Table 7. Results from SVM classifier

Feature set                          Number of attributes   Accuracy (%)   Precision (%)   Recall (%)
Proposed technique                   188                    71.179         72.2            71.2
Only Bloom’s taxonomy                122                    64.6288        67.3            64.6
Bloom’s taxonomy and question tag    174                    66.3755        67.4            66.4
Whole word                           1,246                  74.6725        75.0            74.7

It can be concluded from the results that the proposed technique gives a good feature set, with better accuracy, precision, and recall than selecting features only from Bloom’s taxonomy or from Bloom’s taxonomy and question tags.
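In Tables 4 to 7 the recall column matches the accuracy column in every row, up to rounding. That is the expected behaviour if precision and recall are averaged over the classes with weights proportional to class size; the paper does not state which averaging was used, so this is an assumption. The sketch below uses a made-up confusion matrix and a hypothetical function name to show why class-weighted recall always equals accuracy.

```python
def weighted_metrics(confusion):
    """Return (accuracy, weighted precision, weighted recall).

    confusion[i][j] = number of class-i instances predicted as class j.
    Per-class scores are averaged with weights proportional to class size.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n_classes))
    accuracy = correct / total

    precision = 0.0
    recall = 0.0
    for i in range(n_classes):
        support = sum(confusion[i])  # true class size
        predicted = sum(confusion[r][i] for r in range(n_classes))
        tp = confusion[i][i]
        class_recall = tp / support if support else 0.0
        class_precision = tp / predicted if predicted else 0.0
        recall += (support / total) * class_recall
        precision += (support / total) * class_precision
    return accuracy, precision, recall


# Made-up three-class confusion matrix, not data from the paper.
toy = [
    [50, 5, 5],
    [10, 20, 0],
    [2, 3, 5],
]
accuracy, precision, recall = weighted_metrics(toy)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```

The weights cancel the per-class denominators in the recall sum, leaving the number of correct predictions divided by the total, which is the accuracy; weighted precision has no such identity and generally differs.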

6 Conclusion

This study aims to develop an automated feature selection technique for classifying Thai examination questions into Bloom’s taxonomy cognitive levels. The experimental results show that the proposed feature selection technique gives satisfactory accuracy, precision, and recall. The accuracy, precision, and recall of the proposed feature selection technique differ only slightly from those of the whole word feature set. On the other hand, there is a large difference between the number of whole word attributes and the number of selected attributes. However, this study applies only to questions that ask a single point. For future work, semantic similarity will be exploited to improve the effectiveness of the classification.

References

1. The Glossary of Education Reform (2017). http://edglossary.org/hidden-curriculum
2. Omara, N., et al.: Automated analysis of exam questions according to Bloom’s taxonomy. Procedia Soc. Behav. Sci. 59, 297–303 (2012)
3. Nayef, E.G., Rosila, N., Yaacob, N., Ismail, H.N.: Taxonomies of educational objective domain. Int. J. Acad. Res. Bus. Soc. Sci. 3(9), 2222–6990 (2013)
4. Chang, W., Chung, M.: Automatic applying Bloom’s taxonomy to classify and analysis the cognition level of English question items. In: Pervasive Computing (JCPC), pp. 727–733 (2009)
5. Haris, S.S., Omar, N.: Determining cognitive category of programming question with rule-based approach. Int. J. Inf. Process. Manag. 4(3), 86–95 (2013)
6. Jayakodi, K., Bandara, M., Perera, I., Meedeniya, D.: WordNet and cosine similarity based classifier of exam questions using Bloom’s taxonomy. Int. J. Emerg. Technol. Learn. 11(4), 142–149 (2016)
7. Yusof, N., Hui, C.J.: Determination of Bloom’s cognitive level of question items using artificial neural network. In: 10th International Conference on Intelligent Systems Design and Applications (ISDA), pp. 866–870 (2010)
8. Haris, S.S., Omar, N.: Bloom’s taxonomy question categorization using rules and N-gram approach. J. Theor. Appl. Inf. Technol. 76(3), 401–407 (2015)
9. Pincay, J., Ochoa, X.: Automatic classification of answers to discussion forums according to the cognitive domain of Bloom’s taxonomy using text mining and a Bayesian classifier. In: EdMedia: World Conference on Educational Media and Technology, Canada, pp. 626–634 (2013)
10. Sangodiah, A., Ahmad, R., Fatimah, W., Ahmad, W.: A review in feature extraction approach in question classification using support vector machine. In: IEEE International Conference on Control System, Computing and Engineering, pp. 536–541 (2014)
11. Osman, A., Yahaya, A.A.: Classifications of exam questions using linguistically-motivated features: a case study based on Bloom’s taxonomy. In: The Sixth International Arab Conference on Quality Assurance in Higher Education, Saudi Arabia, pp. 467–474 (2016)
12. Yusof, N., Chai, J.H.: Determination of Bloom’s cognitive level of question items using artificial neural network. In: 10th International Conference on Intelligent Systems Design and Applications, pp. 866–870 (2010)
13. Supriyanto, C., Yusof, N., Nurhadiono, B.: Two-level feature selection for Naive Bayes with kernel density estimation in question classification based on Bloom’s cognitive levels. In: Information Technology and Electrical Engineering (ICITEE) International Conference, pp. 237–241 (2013)
14. Bloom, B.S., Engelhart, M.D., Furst, E.J., Hill, W.H., Krathwohl, D.R.: Taxonomy of Educational Objectives, Handbook I: The Cognitive Domain. David McKay Co Inc., New York (1956)
15. Dictionary (2017). http://dictionary.sanook.com/search/describe
16. Hall, M., et al.: The WEKA data mining software: an update. SIGKDD Explor. 11(1), 10–18 (2009)

Improved Training for Self Training by Confidence Assessments

Dor Bank 1(✉), Daniel Greenfeld 2, and Gal Hyams 1

1 Tel Aviv University, Tel Aviv, Israel
[email protected], [email protected]
2 Weizmann Institute of Science, Rehovot, Israel
[email protected]

Abstract. It is well known that for some tasks, labeled data sets may be hard to gather. Self-training, or pseudo-labeling, tackles the problem of having insufficient training data. In the self-training scheme, the classifier is first trained on a limited, labeled dataset, and after that it is trained on an additional, unlabeled dataset, using its own predictions as labels, provided those predictions are made with high enough confidence. Using a credible interval based on MC-dropout as a confidence measure, the proposed method achieves substantially better results than several other pseudo-labeling methods and outperforms the former state-of-the-art pseudo-labeling technique by 7% on the MNIST dataset. In addition to learning from large and static unlabeled datasets, the suggested approach may be more suitable than others as an online learning method where the classifier keeps getting new unlabeled data. The approach may also be applicable to the recent method of pseudo-gradients for training long sequential neural networks.

Keywords: Semi-supervised learning · Self-training · Limited training set · MNIST · Image classification

1 Introduction

In the semi-supervised learning scheme, both labeled and unlabeled data are used to train a classifier. This is especially appealing when labeled data is very limited but unlabeled data is abundant, which is often the case when labeling new data is expensive and sufficient labeled data is not yet easy to find. Such tasks include semantic segmentation, sentence stressing, video labeling and more. A very common example of limited labeled data but practically unlimited unlabeled data is the on-line stage of a classifier, in which the training data is fully exploited, but unlabeled test data keeps coming. From a practical standpoint, self-training is one of the simplest approaches that can be utilized in those cases. In this setting, after finishing the supervised training stage of a classifier, it is possible to continue the learning process of the classifier on new unlabeled data, which may be the on-line unlabeled test samples it is asked to classify. Whenever the classifier encounters a sample on which the certainty that the classification is correct is high, this sample can be used as a training example, with the prediction serving as a replacement for a label. The crucial question is how the self-training classifier can decide which of the self-labeled samples it should train on. In other words, when should the predictions of the not yet fully trained classifier be trusted?

In this work, different methods for training a self-training classifier are suggested and their utilities are analyzed. The suggested techniques can be easily implemented on top of any boosting and data augmentation methods, improving the obtained results. The main contributions of this work are: (1) Demonstrating how to use the self-training method in the most effective manner. Specifically, using the algorithm suggested here, state-of-the-art results for self-training on MNIST were achieved, improving the former state-of-the-art results by 7%. (2) Suggesting an empirical limitation of the self-training method, including an empirical lower bound on the preliminary success rate and data set size when implementing the self-training or C-EM method on multi-class image classification tasks.

The remainder of the paper is organized as follows: A description of previous works on semi-supervised and specifically self-training classification is given in the next section. In Sect. 3, a detailed description of the techniques for deciding the trustworthiness of the classifier on unlabeled samples is given. In Sect. 4, a comparison between the results of the described approaches is presented. The contribution and importance of this article are further discussed in Sect. 5. Finally, future work is suggested in Sect. 6.

D. Bank, D. Greenfeld and G. Hyams: equally contributing authors, listed in alphabetical order. This study was supported in part by fellowships from the Edmond J. Safra Center for Bioinformatics at Tel-Aviv University and from The Manna Center for Food Safety and Security at Tel-Aviv University.

© Springer Nature Switzerland AG 2019. K. Arai et al. (Eds.): SAI 2018, AISC 858, pp. 163–173, 2019. https://doi.org/10.1007/978-3-030-01174-1_13

2 Previous Works

Self-training. The approach of self-training was first presented by Nigam et al. [1], and it was shown that it can be interpreted as an instance of the Classification Expectation Maximization algorithm [2]. An implementation of the self-training method that harnesses a Denoising Auto-Encoder and Dropout [3] on the visible layer in order to avoid over-fitting to the training set [4] has achieved the best known results for the self-training method on the MNIST data-set so far.

Semi Supervised MNIST Classification. Additional approaches to the semi-supervised task have been demonstrated for image classification on MNIST. Generative Adversarial Networks [5] have been successfully used to achieve state-of-the-art results for semi-supervised MNIST classification, based on a labeled dataset containing 10 samples for each of the 10 classes. On the other hand, such a balanced labeled set is not always available, and training a GAN well may be difficult for many tasks and datasets.


The unsupervised Ladder Networks technique [6] was harnessed for semi-supervised classification [7], with success on the MNIST task as well. Implementing this method is not trivial for a variety of tasks. Another successful technique for contending with insufficient labeled data is data augmentation, usually used for vision tasks. We note that the method examined in this work can be used on top of augmentation even after the latter has been exploited to its fullest; in addition, the method examined here can be efficiently used for multiclass classification tasks outside the area of computer vision.

3 Methods

As stated before, the proposed training is done as follows: the model is first trained on some labeled dataset, and then the model is trained on unlabeled data using its own predictions as ground truth, whenever a confidence condition is met. In classification tasks, the standard method for extracting a confidence measure is to look at the soft-max layer probabilities. Unfortunately, networks tend to be over-confident in that sense, rendering those probabilities not informative enough. We therefore applied maximal entropy regularization as recently suggested in [8], which penalizes the network whenever the soft-max probabilities are too concentrated in one class. With that said, even if we know how to measure a network's confidence in its predictions, a crucial challenge remains: setting a confidence threshold by which to decide whether to trust those predictions or not. The trade-off is clear: a low threshold will result in a high false-positive (FP) rate, which will cause the network to train on wrongly labeled samples; a high threshold will result in a low true-positive (TP) rate, which will mean that not enough additional data is obtained to make a difference. Furthermore, looking at the soft-max probabilities does not exactly yield the desired confidence measure. Those probabilities represent the network's best guess, but the quantity of interest here is the extent to which this guess is reliable.

We therefore turned to additional methods that help assess whether a prediction is trustworthy: (1) Using MC-dropout as another way to represent a network's uncertainty [9], by running the same network multiple times on a given sample, each run setting a random subset of the network's hidden units to zero, thus obtaining a distribution over the network's predictions; and (2) Bagging of two networks: even when the networks have exactly the same architecture, the random initialization of the weights and the fact that the training is stochastic due to dropout layers are enough to ensure that the networks will yield different results on the borderline cases. That is, if they agree on a prediction, the chances of it being correct are much greater. Concretely, the methods described in the following subsections were examined as ways to determine when a prediction is likely to be correct.
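The MC-dropout idea in item (1) can be illustrated with a minimal sketch. It is not the authors' implementation: the toy two-layer network, the function names (`mc_dropout_samples`, `credible_interval`, `confident_prediction`), the 90% interval mass and the acceptance rule (lower interval bound of the winning class at least 0.8) are all assumptions chosen for illustration.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def forward_with_dropout(x, w_hidden, w_out, drop_prob, rng):
    """One stochastic pass: each hidden unit is zeroed with probability drop_prob."""
    hidden = []
    for weights in w_hidden:
        act = max(0.0, sum(w * xi for w, xi in zip(weights, x)))  # ReLU
        keep = rng.random() >= drop_prob
        hidden.append(act / (1.0 - drop_prob) if keep else 0.0)  # inverted dropout
    logits = [sum(w * h for w, h in zip(weights, hidden)) for weights in w_out]
    return softmax(logits)

def mc_dropout_samples(x, w_hidden, w_out, n_passes=50, drop_prob=0.5, seed=0):
    """Run the same network n_passes times with fresh dropout masks."""
    rng = random.Random(seed)
    return [forward_with_dropout(x, w_hidden, w_out, drop_prob, rng)
            for _ in range(n_passes)]

def credible_interval(values, mass=0.9):
    """Empirical central interval covering roughly `mass` of the sampled values."""
    ordered = sorted(values)
    tail = (1.0 - mass) / 2.0
    lo = ordered[int(math.floor(tail * (len(ordered) - 1)))]
    hi = ordered[int(math.ceil((1.0 - tail) * (len(ordered) - 1)))]
    return lo, hi

def confident_prediction(samples, min_lower_bound=0.8, mass=0.9):
    """Return (label, accepted); the acceptance rule is an illustrative assumption."""
    n_classes = len(samples[0])
    mean = [sum(s[c] for s in samples) / len(samples) for c in range(n_classes)]
    label = max(range(n_classes), key=lambda c: mean[c])
    lo, _ = credible_interval([s[label] for s in samples], mass)
    return label, lo >= min_lower_bound

# Toy network with random weights: 4 inputs, 8 hidden units, 3 classes.
rng = random.Random(1)
w_hidden = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(8)]
w_out = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(3)]
samples = mc_dropout_samples([0.5, -0.2, 0.1, 0.9], w_hidden, w_out)
label, accepted = confident_prediction(samples)
```

A sample would be added to the training set with `label` as its pseudo-label only when `accepted` is true; a wide spread across the dropout passes lowers the interval's lower bound and leads to rejection.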

3.1 Soft-Max Threshold

In this method, the highest soft-max output is compared against a hard-coded threshold. The threshold is a hyper-parameter that the user should provide. It can be set differently for each class, and intuitively, choosing it requires prior knowledge about the unlabeled data.
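The soft-max threshold rule can be sketched as follows. The function name `select_pseudo_labels` and the threshold values are hypothetical; the only behavior taken from the text is that the highest soft-max output is compared against a user-supplied threshold that may differ per class.

```python
def select_pseudo_labels(probabilities, thresholds):
    """Keep samples whose top soft-max probability reaches that class's threshold.

    probabilities: list of per-sample soft-max vectors.
    thresholds: one threshold per class (illustrative values chosen by the user).
    Returns a list of (sample_index, pseudo_label) pairs.
    """
    selected = []
    for index, probs in enumerate(probabilities):
        label = max(range(len(probs)), key=lambda c: probs[c])
        if probs[label] >= thresholds[label]:
            selected.append((index, label))
    return selected

soft_max_outputs = [
    [0.97, 0.02, 0.01],  # confident in class 0
    [0.40, 0.35, 0.25],  # ambiguous, rejected
    [0.05, 0.92, 0.03],  # class 1, but below its stricter threshold
    [0.01, 0.04, 0.95],  # confident in class 2
]
per_class_thresholds = [0.90, 0.95, 0.90]
pseudo_labeled = select_pseudo_labels(soft_max_outputs, per_class_thresholds)
```

Raising a class's threshold trades fewer wrongly labeled samples for fewer accepted samples of that class, which is the FP/TP trade-off described in this section.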

3.2 Ensemble Consensus

A different approach is to ask for a vote from several classifiers. This approach is well suited to neural networks, even when the networks have the same architecture, since each network is randomly initialized to different values. The downside of using an ensemble of classifiers is that it requires storing and training several networks, which is not practical for most tasks.
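A vote among classifiers can be sketched as below. Requiring unanimous agreement is an assumption made here for simplicity (the text only says that agreement makes a correct prediction much more likely), and `consensus_labels` is a hypothetical helper name.

```python
def argmax(probs):
    return max(range(len(probs)), key=lambda c: probs[c])

def consensus_labels(member_outputs):
    """member_outputs[m][i] is the soft-max vector of ensemble member m for sample i.

    Returns (sample_index, label) for every sample on which all members agree.
    """
    n_samples = len(member_outputs[0])
    agreed = []
    for i in range(n_samples):
        votes = {argmax(outputs[i]) for outputs in member_outputs}
        if len(votes) == 1:
            agreed.append((i, votes.pop()))
    return agreed

# Two networks with the same architecture but different random initialization.
network_a = [[0.8, 0.2], [0.55, 0.45], [0.1, 0.9]]
network_b = [[0.7, 0.3], [0.45, 0.55], [0.2, 0.8]]
agreed = consensus_labels([network_a, network_b])
```

The borderline second sample, on which the two networks disagree, is left out of the pseudo-labeled set.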

3.3 Dropout Consensus

A more tractable version of such an agreement test is to run one network several times with dropout

lo_deg and (φ + Sd/2) up_deg then
      Ds = ((lo_deg + up_deg) - Sd) / 2
      break
   end if
end for
// based on the quadrant, the above radial
// coordinate and degree information is converted
// to the toner-saving result
if Ds ≥ 0 and Ds < 90 then
   (ad, bd) = (Rs*cosd(Ds), Rs*sind(Ds))
else if Ds ≥ 90 and Ds < 180 then
   (ad, bd) = (-Rs*sind(Ds-90), Rs*cosd(Ds-90))
else if Ds ≥ 180 and Ds < 270 then
   (ad, bd) = (-Rs*cosd(Ds-180), -Rs*sind(Ds-180))
else
   (ad, bd) = (Rs*sind(Ds-270), -Rs*cosd(Ds-270))
end if
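The quadrant branches at the end of the pseudocode can be checked with a short sketch. It assumes that Rs is a radius and Ds an angle in degrees on the a*-b* plane, and that the branch boundaries are 0, 90, 180 and 270 degrees; the function names are hypothetical. Under those assumptions the four branches give the same result as a single polar-to-Cartesian conversion, which the loop below verifies numerically.

```python
import math

def quadrant_to_ab(rs, ds):
    """Branch-by-quadrant conversion mirroring the pseudocode (ds in degrees, 0 <= ds < 360)."""
    def cosd(a):
        return math.cos(math.radians(a))

    def sind(a):
        return math.sin(math.radians(a))

    if 0 <= ds < 90:
        return rs * cosd(ds), rs * sind(ds)
    elif 90 <= ds < 180:
        return -rs * sind(ds - 90), rs * cosd(ds - 90)
    elif 180 <= ds < 270:
        return -rs * cosd(ds - 180), -rs * sind(ds - 180)
    else:
        return rs * sind(ds - 270), -rs * cosd(ds - 270)

def polar_to_ab(rs, ds):
    """Direct polar-to-Cartesian conversion, valid for any angle."""
    return rs * math.cos(math.radians(ds)), rs * math.sin(math.radians(ds))

# Compare both conversions on a grid of angles.
max_error = 0.0
for ds in range(0, 360, 15):
    a1, b1 = quadrant_to_ab(2.0, ds)
    a2, b2 = polar_to_ab(2.0, ds)
    max_error = max(max_error, abs(a1 - a2), abs(b1 - b2))
```

Because the results agree up to floating-point rounding, an implementation could use the single-formula form and drop the branches.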


P.-C. Wu and C.-H. Lin

Fig. 5. (a) Original image. (b) Processed image (ΔE94 = 2.0). (c) Original’s color distribution on a*-b* plane. (d) The color distribution of processed image.

A Green Printing Method Based on Human Perceptual


3 Experimental Results

The proposed approach is evaluated with two standard color image datasets, the Kodak dataset and the IMAX dataset. Figure 6 shows the 24 images of the Kodak dataset. The 18 images of the IMAX dataset are shown in Fig. 7. To demonstrate the visual effect, several CIE94 color difference settings are adopted, each generating its corresponding color scale table. The image sizes of the Kodak and IMAX datasets are 768 × 512 and 500 × 500 pixels, respectively. Generally, the default kernel size of the bilateral filter is set to 7 × 7 pixels, and its spatial-domain standard deviation is set to 3. However, because the images in these datasets are relatively noiseless, the bilateral filter is disabled while processing them. To demonstrate the improvement, the color histogram of each processed image is accumulated to objectively assess the toner usage and image quality. In addition, the total saturation is accumulated to determine the toner-saving performance. The Kodak images are named 'kodim01', 'kodim02', …, 'kodim24' from the upper-left to the bottom-right of Fig. 6. The IMAX images are named 'MCM_01', 'MCM_02', …, 'MCM_18' from the upper-left to the bottom-right of Fig. 7.
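The two evaluation measures can be sketched on a toy pixel list. The text does not define them precisely, so two assumptions are made here: "total color usage" is taken as the number of distinct RGB triples (the non-empty bins of a full-resolution color histogram), and saturation is taken as HSV saturation. The function names and the pixel values are illustrative only.

```python
import colorsys

def total_color_usage(pixels):
    """Number of distinct RGB triples among the pixels."""
    return len(set(pixels))

def average_saturation(pixels):
    """Mean HSV saturation in [0, 1] over 8-bit RGB pixels."""
    total = 0.0
    for r, g, b in pixels:
        _, s, _ = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        total += s
    return total / len(pixels)

# Toy "image": two nearly identical reds, one blue, one gray.
original = [(255, 0, 0), (250, 5, 5), (0, 0, 255), (128, 128, 128)]
# After a hypothetical remapping: similar colors merged, slightly desaturated.
processed = [(250, 5, 5), (250, 5, 5), (5, 5, 250), (128, 128, 128)]

usage_before = total_color_usage(original)
usage_after = total_color_usage(processed)
saturation_before = average_saturation(original)
saturation_after = average_saturation(processed)
```

On real images the same two functions would be applied to the flattened pixel arrays of the original and processed versions of each dataset image.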

Fig. 6. The Kodak dataset images.

Fig. 7. The IMAX dataset images.


The total color usage comparison is shown in Figs. 8 and 9.

[Figure 8: horizontal bar chart of total color usage (axis from 0 to 80000) for Kodim01 to Kodim24 and their average; series: Original, dE94 = 0.5, dE94 = 1.0, dE94 = 1.5.]

Fig. 8. Comparison of total color usage under various color difference settings for Kodak dataset.

The results show that the proposed method can be used to group similar colors. Leveraging this property, the proposed method can perform color remapping to reduce toner usage in the color management stage of the printing process. Table 1 compares the total color usage of the original images with that of the proposed method under various color difference settings.

[Figure 9: horizontal bar chart of total color usage (axis from 0 to 250000) for MCM_01 to MCM_18 and their average; series: Original, dE94 = 0.5, dE94 = 1.0, dE94 = 1.5.]

Fig. 9. Comparison of total color usage under various color difference settings for IMAX dataset.

The average saturation comparisons for the Kodak dataset and the IMAX dataset are shown in Figs. 10 and 11, respectively. The results show that the proposed method can moderately reduce the average image saturation. Leveraging the characteristics of the color difference model and human perception, the proposed method changes each pixel color to a nearby low-saturation color by color replacement processing.

Table 1. Comparison of total color usage

Dataset   Original   Proposed (ΔE94 = 0.5)   Proposed (ΔE94 = 1.0)   Proposed (ΔE94 = 1.5)
Kodak     35613      33857                   27009                   21059
IMAX      115399     70653                   45391                   32243

Table 2 compares the average saturation of the original images with that of the proposed method under various color difference settings.

[Figure 10: horizontal bar chart of average saturation (axis from 0 to 1) for Kodim01 to Kodim24 and their average; series: Original, dE94 = 0.5, dE94 = 1.0, dE94 = 1.5.]

Fig. 10. Comparison of average saturation under various color difference settings for Kodak dataset.

[Figure 11: horizontal bar chart of average saturation (axis from 0 to 1) for MCM_01 to MCM_18 and their average; series: Original, dE94 = 0.5, dE94 = 1.0, dE94 = 1.5.]

Fig. 11. Comparison of average saturation under various color difference settings for IMAX dataset.

Table 2. Comparison of average saturation

Dataset   Original   Proposed (ΔE94 = 0.5)   Proposed (ΔE94 = 1.0)   Proposed (ΔE94 = 1.5)
Kodak     0.3080     0.3002                  0.2921                  0.2837
IMAX      0.5480     0.5307                  0.5164                  0.5001
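To make the figures in Tables 1 and 2 easier to compare, the relative reduction with respect to the original images can be computed directly from the tabulated values. The short script below is only a reading aid; the helper names are hypothetical, and the numbers are copied from the two tables.

```python
# Values copied from Table 1 (total color usage) and Table 2 (average saturation):
# [original, dE94 = 0.5, dE94 = 1.0, dE94 = 1.5]
table_1 = {"Kodak": [35613, 33857, 27009, 21059],
           "IMAX": [115399, 70653, 45391, 32243]}
table_2 = {"Kodak": [0.3080, 0.3002, 0.2921, 0.2837],
           "IMAX": [0.5480, 0.5307, 0.5164, 0.5001]}
settings = [0.5, 1.0, 1.5]

def relative_reduction(original, processed):
    """Reduction relative to the original value, in percent."""
    return 100.0 * (original - processed) / original

def reductions(table):
    return {name: [relative_reduction(row[0], value) for value in row[1:]]
            for name, row in table.items()}

for title, table in (("total color usage", table_1), ("average saturation", table_2)):
    for name, values in reductions(table).items():
        cells = ", ".join(f"dE94={s}: {v:.1f}%" for s, v in zip(settings, values))
        print(f"{title:>18} | {name:<5} | {cells}")
```

At ΔE94 = 1.5 the total color usage drops by about 40.9% (Kodak) and 72.1% (IMAX), while the average saturation drops by about 7.9% and 8.7%, which is consistent with the description of the saturation reduction as moderate.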


4 Conclusion

In this paper, a toner-saving method based on human perception and the CIE94 color difference model is proposed. The proposed method is expected to help related industries improve their printing quality and reduce energy waste. The method can be combined with existing hardware-based toner-saving methods to broaden its application scope. However, the proposed algorithm cannot be directly applied to frameworks based on other color difference models. Future work will develop a toner-saving solution that can be applied to various color difference models.


Focused Visualization in Surgery Training and Navigation

Anton Ivaschenko1, Alexandr Kolsanov2, and Aikush Nazaryan2

1 Information Systems and Technologies Department, Samara National Research University, Samara, Russia
2 Simulation Center, Samara State Medical University, Samara, Russia

Abstract. 3D simulation of human anatomy and surgical intervention is now actively used in medical care and higher education. Building on recent advances in surgery modeling and augmented reality, a new solution for real-time surgery assistance was developed. The solution consists of three modules: (1) preoperative planning; (2) 3D imaging; and (3) surgery navigation. New simulation models and algorithms were introduced for focused visualization of surgery and decision-making support. The developments were successfully trialed at the clinics of Samara State Medical University for a number of medical cases. This paper describes the details of the proposed solution and its implementation in practice.

Keywords: Surgery training · Surgery navigation · 3D anatomy · Augmented reality · Image-guided surgery · Simulation

1 Introduction

Augmented reality (AR) is one of the most challenging technology trends in simulation and modeling today. The effective use of AR in tourism and game development motivates the search for new applications, e.g. in medical care. Despite the powerful capabilities of modern AR goggles and headsets and considerable experience in surgery simulation, developing an AR surgery assistant has proved complicated in practice. First, an individual set of realistic 3D models of human body parts must be developed for each medical case. Next, the simulated scene must be adaptively visualized over the human body. Finally, the movements of surgical instruments must be captured and coordinated with the surgeon's head movements. In addition, the AR scene must be wide enough to provide comprehensive imaging of the surgery scene. Considering the capabilities of the available AR equipment, a decision was made to develop an original hardware and software solution based on focused visualization of surgery scenes. Details of the proposed approach are presented below.

© Springer Nature Switzerland AG 2019 K. Arai et al. (Eds.): SAI 2018, AISC 858, pp. 537–547, 2019. https://doi.org/10.1007/978-3-030-01174-1_40


2 State of the Art

Image-guided surgery (IGS) is being successfully implemented in modern neurosurgery and helps to perform safer and less invasive procedures [1, 2]. During an IGS procedure the surgeon uses tracked surgical instruments in conjunction with preoperative or intraoperative images in order to indirectly guide the procedure. IGS systems present the patient's anatomy and the surgeon's precise movements relative to the patient on computer monitors in the operating room. The source images are captured by cameras or electromagnetic fields.

IGS belongs to the area of computer-assisted surgery (CAS), which is based on the application of various computer technologies for surgical planning and for guiding or performing surgical interventions [3]. CAS includes the phases of medical imaging (using CT, MRI, X-rays, ultrasound, etc.), image analysis and processing, preoperative planning and surgical simulation, surgical navigation, and robotic surgery. With a surgical navigation system, the surgeon uses special instruments that are tracked by the system.

The basic challenge of implementing image-guided surgery in practice is the complexity of data visualization. The surgeon needs a convenient, adequate and real-time picture presented over the area of the surgical intervention. At the same time, the presented image should not be cluttered with minor details; it should draw attention to the most essential points. Therefore, there is a strong demand for optimization of medical data visualization in image-guided surgery.

Visualization of medical data is a separate problem, which requires the application of modern technologies in the areas of computer optics, psychology, software development and cognitive science. The results of recent research in this area [4, 5] present architectures capable of interlinking various related data sets (e.g., text, measured values, images, scans) to provide adequate visual analytics for decision-making support. Such an approach makes it possible to present a complex biological system at once, taking into account the interdependencies between multiple layers.

The idea of advancing surgery through improved medical data visualization, rather than automated decision making, is supported by the "human-in-the-loop" approach [6]. According to this concept, decision makers should interact directly with the automated system, providing information about data processing details that can be used for machine learning and logic configuration. The users of software and hardware become involved in data processing and visualization through interactive user interfaces that help improve the learning capabilities of both humans and algorithms.

These trends in surgery decision-making support are highly regarded by the medical community. Interactive IT devices that are widespread nowadays, such as smartphones, tablets, specialized visual panels, and VR and AR goggles, provide new possibilities for presenting complex data and tracking the user's activity. This functionality can be used to understand the features of the user's behavior and adapt the logic of data processing and presentation. An example of the successful use of virtual reality (VR) technology as part of a 3D web-based anatomical visualization tool is presented in [7].


Existing experience in 3D simulation of surgery scenes [8–10] has inspired the idea of using AR for real-time simulation of surgical intervention in the human body, providing decision-making support as part of an image-guided surgery system. AR holds considerable promise for solving the stated problems in medical applications. Using modern solutions [11, 12] combined with adaptive user interfaces [13], a number of problems specific to medical data visualization can be solved.

User interfaces are usually designed with a focus on maximizing usability and user experience. The goal of user interface design is to make the user's interaction as simple and efficient as possible in terms of accomplishing user goals. The possibilities of AR as a ubiquitous user interface to the real world are discussed at length in [14]. Technological advances, exploding amounts of information, and user receptiveness are fueling AR's rapid expansion from a novelty concept to potentially the default interface paradigm in the coming years [15]. At the same time, AR faces the same core usability challenges as traditional interfaces, such as the potential for overloading users with too much information and making it difficult for them to determine a relevant action. However, AR exacerbates some of these problems because multiple types of augmentation are possible at once, and proactive apps run the risk of overwhelming users.

3 Augmented Reality Application

The proposed AR surgery assistant is developed as a multifunctional complex that allows surgical operations to be planned on the basis of preoperative MRI and CT examinations by building a 3D model of internal organs and tissues. The solution includes systems for 3D imaging, preoperative planning, and surgery navigation. The solution is illustrated in Fig. 1.

Fig. 1. AR headset for surgery assistance and tracking system.

Based on CT and MRI studies of the patient, the radiologist creates 3D reconstructions of the zones of surgical interest and of the tissues subject to destruction. The complex saves the radiologist's time through automatic segmentation of vessels, organs and neoplasms, using an original segmentation technology described in [16]. At the next stage, the surgeon plans the operation based on the resulting 3D model. During this process the surgeon establishes anatomical landmarks and chooses the trajectory and safe limits of surgical access. The navigation system displays the 3D model using AR goggles. The individual anatomical model is projected over the patient's actual organ, together with the surgery plan and the current clinical parameters of the patient. The tracking system includes an originally designed headset, surgical instrument markers and a specialized tracking system (see Fig. 2). The original design makes it possible to capture the movements of the surgical instruments in coordination with the surgeon's focus.

Fig. 2. The concept of AR surgery assistant.

At the beginning, the visualization system projects a personalized anatomical 3D model onto the skin. With this projection, the surgeon determines and outlines the optimal entry points and reduces the surgical field and the extent of surgical access. During the operation, the surgical navigation system provides the surgeon with continuous monitoring of the access trajectory and the position of the surgical instrument, and compares them with the surgical plan. Thus, it helps the surgeon deliberately increase the radicality and accuracy of the surgical intervention. The system targets maximum safety and efficiency of complex surgical interventions with minimal damage to the patient's tissues. Therefore, the system provides:

• Assistance in the analysis and planning of a future operation using a visual 3D model based on a preoperative CT/MRI study;
• Navigation during the operation with the help of the visualized model and the operation plan superimposed on the surgical field.


4 Focused Visualization Concept

The proposed solution, which aims to improve the quality of surgery training and navigation using modern medical data visualization technologies, is based on special software for AR devices. As part of the AR solution, a special module was designed and implemented for coordinating the surgeon's focus based on intelligent analysis of the intervention process. The focus of the AR user is captured in the form of event chains and compared with typical scenarios of surgical intervention formalized as user behavior patterns. The models of event flows are compared with behavior patterns using algorithms for cross-correlation analysis of non-equidistant time series. This technology makes it possible to identify critical deviations while remaining robust to random and human factors.

The proposed approach allows identifying possible gaps in the viewer's perception, when the required attention is not given to the necessary steps of the surgical intervention scenario at certain times. In addition to this identification feature, the system can attract the surgeon's focus to the required objects of the operating scene using specially generated notifications and alerts. These notifications include textual items, marks or highlights and provide additional information only when needed: if the system determines that the process is being performed according to the predefined scenario, no extra data needs to be presented to the surgeon. This concept minimizes the distraction of the surgeon's attention when AR technologies are used in practice.

The solution architecture for focused visualization is presented in Fig. 3. The Domain Ontology is a knowledge base used to capture and store surgery scenarios and user behaviour patterns. An additional software plug-in for the AR device is used to identify the movements of the head and eyes, track the user's focus and analyse his/her interest. The Focus attractor is introduced to match the user's models against surgical intervention patterns and to generate notifications and alerts.
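The comparison of an observed event chain with a behavior pattern can be sketched as follows. The text does not specify the cross-correlation algorithm, so this example assumes one common choice for irregularly spaced events: a Gaussian-kernel cross-correlation that needs no resampling onto a regular grid. The function names, kernel width, lag grid, deviation threshold and event times are all hypothetical.

```python
import math

def kernel_cross_correlation(observed, pattern, lag, width=1.0):
    """Similarity of two irregular event trains when `pattern` is shifted by `lag` seconds.

    Every pair of events contributes a Gaussian weight of its time difference.
    The simple normalization gives about 1 for identical, well-separated trains;
    tightly clustered events can push the score above 1.
    """
    total = 0.0
    for t_obs in observed:
        for t_pat in pattern:
            diff = t_obs - (t_pat + lag)
            total += math.exp(-0.5 * (diff / width) ** 2)
    return total / math.sqrt(len(observed) * len(pattern))

def best_lag(observed, pattern, lags, width=1.0):
    """Lag from the candidate grid at which the two trains align best."""
    return max(lags, key=lambda lag: kernel_cross_correlation(observed, pattern, lag, width))

def deviates(observed, pattern, lags, threshold=0.6, width=1.0):
    """Flag a deviation when even the best-aligned score stays below `threshold`."""
    lag = best_lag(observed, pattern, lags, width)
    return kernel_cross_correlation(observed, pattern, lag, width) < threshold

# Expected times (seconds) at which the surgeon's focus should reach each scenario step.
pattern = [0.0, 4.0, 9.0, 15.0]
# Same scenario started about two seconds late, with some jitter.
observed_ok = [2.1, 5.9, 11.2, 17.0]
# Focus events that do not follow the scenario.
observed_bad = [2.0, 25.0, 26.5, 40.0]
lags = [i * 0.5 for i in range(11)]  # candidate delays from 0 to 5 seconds

delay = best_lag(observed_ok, pattern, lags)
needs_alert_ok = deviates(observed_ok, pattern, lags)
needs_alert_bad = deviates(observed_bad, pattern, lags)
```

Searching over lags makes the comparison tolerant of a late start, and the kernel width makes it tolerant of small timing jitter, so only a substantial departure from the pattern would trigger a notification in this sketch.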

Fig. 3. Focused visualization solution vision.


Trials of the proposed approach indicate that it can improve the usability of an AR headset as a surgery support device and demonstrate the efficiency of implementing focused visualization in practice.

5 Anatomy Modeling

To generate virtual scenes for surgery simulation, a special library of 3D models of human body parts was developed. The delivered models differ from existing analogs in that they can be used as part of surgery training suites and navigation systems. They allow generating surgery scenes close to the real images encountered during surgical intervention. Up to 4000 models of human body parts were designed and conjoined (see Fig. 4), combined into 12 layers, including the ligaments, the blood vascular system, the innervation system, outflow tracts, and the lobar and segmental structures of internal organs. The models were developed from real digital volume computed tomography and magnetic resonance tomography images. Some models were developed using the 3D scanner Solutionix Regscan III, with post-processing in the 3D editors Autodesk 3ds Max and Autodesk Maya. Visualization of the 3D models was implemented using the Unity game engine (unity3d.com). As a result, a number of atlases addressing various aspects of human anatomy were worked through:

• Generic (systemic) anatomy;
• Topographic (regional) anatomy;
• Pathologic anatomy;
• Physiology;
• Histology (microanatomy).

To store and manage the developed library, a shell viewer and a database were also developed. The library of 3D models was used to develop several scenes for laparoscopy and endovascular training. To provide a realistic picture, these models were rendered with specifically designed shaders, and some fragments such as jars, liquids and blood were simulated by specific algorithms. The 3D models of human body parts are used in the educational process as an informational basis for the "Inbody Anatomy" anatomic atlas and the interactive 3D anatomy study table "Pirogov" (see Fig. 5). The Pirogov solution (http://www.nash-pirogov.ru/en/) is interactive software implementing a clearly structured natural science study program for undergraduate and graduate medical specialists, including such study fields as applied anatomy, morbid anatomy, forensic medicine, surgical studies, ophthalmology, human virtual dissection, otorhinolaryngology, etc. Serving as a surgeon's guide, this software supports a complete study cycle, from entry-level anatomy studies based on visual and descriptive materials to knowledge assessment and automatic control of test results.


Fig. 4. Anatomic models of human body: skin, skeleton and bones simulation, blood-vascular and nervous systems (head and body).

Fig. 5. Representation of anatomic pictures using Interactive 3D anatomy study table.


6 AR Implementation for Surgery Navigation

The AR headset equipped with surgical instrument markers and a tracking system is presented in Figs. 6 and 7. The designed headset allows 3D models of human body parts and other contextually required medical data to be presented over the real surgical field.

Fig. 6. AR headset for surgery assistance.

The human body is represented by 3D models with a realistic appearance and the ability to interact with surgical instruments. The inner parts of the human body are simulated in the scene by soft body models, and surgical instruments are simulated by rigid bodies. These models are rendered with specifically designed shaders and algorithms that simulate jars, liquids and blood. The solution functionality includes the following:

• 3D reconstruction of typical and atypical anatomical structures with visualization of normal and pathological cases based on X-ray studies (organs, vessels, ducts, affected parts);
• surgical intervention planning by setting the basic stages and visual anatomical information;
• visualization of personalized topographic and anatomical data of the patient in augmented reality mode;
• accompanying surgical intervention with monitoring of the location of surgical instruments; and
• training of students and beginners using the video material recorded during the operation.

The system allows loading medical data in DICOM format from various sources and exporting images to JPEG and PNG.


Fig. 7. 3D surgery scene simulator.

7 Probation Results

This section presents some results of applying the AR surgery assistant to a real medical case (lung surgery). The 3D simulation of human body parts is based on automated reconstruction from CT and MRI images with manual post-processing. The resulting image can be used by the surgeon for preliminary planning of the surgical intervention and for modeling possible alternatives.

Fig. 8. 3D reconstruction of lungs to detect the zones of increased density.


The process of determining the severity of pneumonia and lung contusion is presented in Fig. 8. Detecting the zones of increased lung density and calculating their volume makes it possible to identify zones of consolidation and “frosted glass” (ground-glass) opacity. Figure 9 presents the results of 3D lung reconstruction for surgery planning and intervention decision support. Diagnostic surgery planning makes it possible to determine the optimal puncture access point for analysis of a neoplasm.

Fig. 9. Lungs 3D reconstruction and puncture planning.

8 Conclusion

The proposed surgery assistant is based on AR and provides preoperative planning, 3D imaging and surgery navigation. An original hardware and software solution was developed that provides focused visualization for surgical intervention. The solution makes it possible to increase the radicality and precision of surgical intervention; reduce damage to the patient's tissues, the time of surgery and the amount of blood loss; obtain a well-balanced surgical plan with the optimal resection volume, entry points and access pathway; improve communication between doctors in interdisciplinary cases; and support training rehearsals of procedures.


Using DSP-ASIP for Image Processing Applications

Sameed Sohail1, Ali Saeed2, and Haroon ur Rashid1

1 Department of Electrical Engineering, Pakistan Institute of Engineering and Applied Sciences (PIEAS), Islamabad, Pakistan
[email protected], [email protected]
2 Department of Electrical Engineering, Linkoping University, Linkoping, Sweden
[email protected]

Abstract. The rapid deployment of embedded image processing applications has forced a paradigm shift away from purely hardware-based and purely software-based implementations, which provide the best performance and the lowest cost respectively, towards a hybrid approach, namely the application specific instruction-set processor (ASIP). In this paper, we evaluate the applicability of CuSP, a softcore DSP-ASIP, for image processing applications. CuSP has a Crimson DSP processor core and hardware accelerators directly coupled with the core, offering improved performance with flexibility. Results show that CuSP improves performance over the standard soft processor MicroBlaze by up to a factor of 36. The Crimson DSP core alone needs up to 5.3 times fewer execution cycles than MicroBlaze.

Keywords: DSP-ASIP · Image processing · 2D convolution

1 Introduction

Recent embedded image/video processing applications such as facial recognition and automated surveillance systems demand high performance, flexibility and a fast product development cycle. Although custom hardware units implemented in a production-grade FPGA or ASIC offer the best performance for these applications, their inflexibility and complex development cycle make them a less feasible choice for most resource-constrained and time-critical applications. The design of hardware accelerators poses a further challenge, as it requires both hardware and software experts to work on the design from start to maturity. Once designed, any future change means taking the design back to step one and starting again. To reduce the development effort required in accelerator design, some researchers have proposed schemes for automatic extraction of accelerators from algorithm source code. The authors in [1,2] propose such methods, but these approaches require well-structured algorithms and do not guarantee performance for all possible algorithms.

© Springer Nature Switzerland AG 2019. K. Arai et al. (Eds.): SAI 2018, AISC 858, pp. 548–557, 2019. https://doi.org/10.1007/978-3-030-01174-1_41


In recent years, softcore general purpose processors (GPPs) have been proposed for image processing applications. The author in [3] discusses the use of the Xilinx MicroBlaze softcore processor for image inversion. In [4], the author implements JPEG image compression on the NIOS softcore processor by Altera. Such softcore processor implementations offer the hardware designer a degree of control and flexibility that is not possible with off-the-shelf processors. Despite these attractive features, the performance of such softcore GPPs still falls significantly below that of hardware accelerators.

Researchers have proposed modified softcore GPPs for improved performance. The authors in [5] introduce the rVEX soft processor for image processing, a very long instruction word (VLIW) processor able to perform multiple operations in parallel. Although much faster than standard softcores like MicroBlaze, these designs still lag far behind hardware accelerators in terms of performance. To improve performance further, some softcore GPPs, like MicroBlaze, offer hardware acceleration support, but communication delays between the accelerator fabric and the processor considerably reduce performance [6].

Application specific instruction set processors (ASIPs) have been proposed that offer better performance than softcore GPPs and are far more flexible than hardware accelerators. An ASIP instruction set consists of general instructions that execute on the GPP core and application-specific custom instructions, such as convolution or audio compression, that execute on hardware accelerators. ASIPs offer a compromise between the performance and flexibility constraints that affect most embedded image processing systems. Digital signal processing (DSP) applications, such as signal and image processing, that involve complex performance constraints have given rise to a sub-class of ASIPs, namely DSP-ASIPs [7]. These DSP-ASIPs include a softcore DSP processor attached to multiple hardware accelerators to reduce performance bottlenecks in DSP applications. A DSP processor contains most GPP features plus DSP-specific additions like multiply and accumulate (MAC), hardware looping and multiple data memories. These additions improve its performance in comparison with GPPs. In [8], the author discusses the suitability of the TMS320C66x DSP processor for radar imaging applications. In [9], the author implements and simulates an 8-bit DSP processor for image processing applications.

In this paper CuSP, a DSP-ASIP for image processing applications, is proposed. It contains a DSP processor core called Crimson that is based on the Senior DSP processor architecture proposed in [7] and [10]. Hardware accelerators for performance enhancement of image processing tasks are also part of CuSP. The Senior DSP processor architecture has been modified to support image processing requirements, like accelerator memory interfacing. The processor has low development time [11] and supports tightly coupled hardware accelerators through custom instructions. The performance of the proposed CuSP is compared with the MicroBlaze and rVEX [5] softcore processors for 2D image convolution.

The remainder of the paper is organized as follows. Section 2 discusses the CuSP implementation. Section 3 describes CuSP results for image applications. Section 4 provides the conclusion and Sect. 5 outlines future directions.


Fig. 1. Crimson general architecture.

2 Implementation

CuSP consists of the Crimson DSP processor and hardware accelerators designed to enhance the performance of image processing applications. Design requirements for Crimson were identified by profiling some commonly used image processing algorithms such as convolution and median filtering. These included the number of data memories, data width, pipeline stages, arithmetic operations and flexibility. To improve performance further, the kernel portions of these applications that significantly reduced performance were identified. As a proof of concept, initially only a 2D convolution accelerator was implemented. Both Crimson and the convolution hardware accelerator are discussed in detail in this section.

Table 1. Crimson DSP processor instruction types

Instruction type               Instruction mnemonics
Memory/Register access         mov, load, store
General purpose (RISC/CISC)    add, or, lsr, mac, conv
Flow control                   jump, call, return
Accelerator/Custom             conv3 × 3, conv5 × 5

2.1 Crimson DSP Processor Architecture

Based on the design and performance requirements for Crimson, the Senior DSP processor architecture [7] was adopted as the reference design. Senior not only meets most of the processor design requirements, such as two memories, but also serves as a platform for hardware accelerators designed for any image processing application. Custom modifications have been made to the Senior instruction set architecture (ISA) while adopting it for Crimson to speed up image processing applications. Crimson supports 32-bit instructions with direct addressing of 16-bit memory. The data and address widths of the processor are 16 bits. The instructions supported by Crimson are given in Table 1. The processor has 32 × 16-bit general purpose registers (GRF) and 32 × 16-bit special purpose registers (SRF). It has four 32-bit accumulator (ACR) registers for storing results of multiplication, convolution, etc. The processor has 256 KB of program memory (PM) and two separate data memories (DM0 and DM1) of 64 KB each. Figure 1 shows the architectural organization of Crimson for all instructions excluding the conv instruction. The conv instruction is a CISC instruction that simultaneously fetches operands from both memories, computes the convolution and stores the result in an ACR register. The Crimson pipeline stages for RISC instructions are fetch (IF), decode (ID), operand fetch (OP), execute (EX) and write back (WB). Senior [7] follows a longer pipeline sequence than Crimson, as can be seen in [10]. This is because Crimson has been designed with the requirements of image processing applications in mind, such as low latency, multiple memory access support and low compiler complexity, whereas Senior has no such requirements. The pipeline sequence includes a program counter (PC) that points to the next instruction in program memory (PM), which is read into the instruction register (IR) at the end of the IF stage. After ID, register operands are available in OP, and arithmetic, logical or shift (ALU) operations take place in the EX stage. Results of ALU operations are written to the GRF at the start of the WB stage.
Jump and sub-routine operations are managed by the instruction flow controller (PC FSM), which decides the next value of the PC based on inputs from the ID and EX stages. Memory and register transfer operations also take place in the EX and WB stages. For the multiply and accumulate (MAC) instruction, the multiply operation takes place in the EX stage while summation with the ACR register takes place in the WB stage. To remove the data dependency between MAC and ALU instructions, the mac instruction uses the ACR registers. Thus mac instructions can execute in sequence with most RISC ALU instructions like add, sub, etc. This reduces execution time for algorithms that repeatedly call the mac instruction, such as convolution. For image applications, the data memories (DMs) have been directly interfaced with the accelerator, as shown by the dotted line in Fig. 1. The DMs can now be accessed via custom instructions as well.

Pipeline stalls can significantly reduce the performance of any processor [12]. These stalls occur because of data or control hazards in subsequent instructions and force a no operation (NOP) condition in some pipeline stages, which not only increases execution cycles but also makes the design of the compiler/scheduler more complex. Because of the pipeline structure, write after write (WAW), write after read (WAR) and data memory access hazards do not occur. For read after write (RAW) hazards, register forwarding has been implemented through a hazard detection unit, thus reducing wait/NOP cycles. These design changes have reduced execution cycles, improved performance and made the design of the scheduler much easier.
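As a rough illustration of why a mac instruction with a dedicated accumulator suits convolution, the following Python sketch models a 16-bit by 16-bit multiply accumulated into a 32-bit register. The function names, the wrap-around behaviour on overflow and the sample values are assumptions made for illustration; they are not taken from the Crimson ISA.

```python
def to_signed(value, bits):
    """Wrap an integer to a signed two's-complement value of the given width."""
    mask = (1 << bits) - 1
    value &= mask
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def mac(acr, a, b):
    """Model one mac step: 16-bit x 16-bit product added to a 32-bit accumulator.

    Wrap-around on overflow is an assumption; real hardware may saturate.
    """
    product = to_signed(a, 16) * to_signed(b, 16)
    return to_signed(acr + product, 32)


def window_sum(pixels, coeffs):
    """Accumulate one convolution window: one mac per coefficient."""
    acr = 0
    for p, k in zip(pixels, coeffs):
        acr = mac(acr, p, k)
    return acr


# 3 x 3 window flattened row by row (sample values, hypothetical)
pixels = [10, 20, 30, 40, 50, 60, 70, 80, 90]
coeffs = [1, 2, 1, 2, 4, 2, 1, 2, 1]
result = window_sum(pixels, coeffs)
```

In this model a 3 × 3 window needs nine mac steps and a 5 × 5 window needs twenty-five, which is why removing stalls around mac matters for convolution.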

Fig. 2. Datapath of 2D convolution accelerator.

2.2 2D Convolution Accelerator Design

A 2D convolution accelerator design with 3 × 3 and 5 × 5 window sizes is presented here. The accelerators are accessed via the custom instructions given in Table 1. The accelerator data path (DP) in Fig. 2 is shared by all supported windows and can be easily modified to support more window sizes for varying noise levels in an image. The control path (CP) of the convolution accelerator is too extensive to present in this paper. It consists of a separate addressing and control state machine for each window size. The accelerator is directly coupled with vacant SRF registers for passing operands from the Crimson core. The data width of the multiplier and adder units in the accelerator can be increased based on requirements. The accelerator is activated by control signals from the ID stage when a conv3 × 3/conv5 × 5 instruction is decoded. It starts by reading the window coefficients from DM0 and storing them in the coefficient file. Next, neighborhood pixels are fetched from DM0 and stored in the pixel file registers, convolution is carried out and the output is stored in DM1. This step is repeated for all pixels in the image. Direct coupling of the accelerator with the data memory reduces memory latency and improves execution time.

2.3 Image Processing System

To evaluate the performance of CuSP for image processing applications, 2D convolution was implemented. The Crimson-based implementation used general instructions, whereas the accelerator implementation used the custom instructions given in Table 1. CuSP at present lacks assembler support, so a pseudo-assembler was implemented in C/C++ to create 32-bit machine codes for CuSP instructions. CuSP supports grayscale images of different resolutions, limited only by memory capacity. The maximum supported image size is 250 × 250 pixels, as the designed system uses DM0 and DM1 for input image and output image storage, respectively.
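The 250 × 250 limit can be checked with a little arithmetic. The sketch below assumes that each data memory holds 2^16 words of 16 bits, which is our reading of the 16-bit address width stated in Sect. 2.1 rather than a figure given by the authors, and that the window coefficients share DM0 with the input image. The function name `fits` is hypothetical.

```python
ADDRESS_BITS = 16                      # address width stated for Crimson
WORDS_PER_MEMORY = 1 << ADDRESS_BITS   # assumption: one 16-bit word per address


def fits(width, height, window, words=WORDS_PER_MEMORY):
    """Return True if image pixels plus window coefficients fit in one data memory."""
    pixel_words = width * height       # one 16-bit word per pixel
    coeff_words = window * window      # window coefficients are also read from DM0
    return pixel_words + coeff_words <= words


free_words = WORDS_PER_MEMORY - 250 * 250
```

Under these assumptions a 250 × 250 image leaves a few thousand words free in DM0, whereas a 256 × 256 image would fill the memory completely and leave no room for coefficients.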


Each image pixel is stored as a 16-bit word. Steps have been taken to ensure that no extra overhead occurs during execution of the algorithm. Before convolution starts, the input image pixels are already available in DM0, and at the end of the routine the output image is available in DM1. Serial transmission delays are not part of this system, and no DDR/cache setup exists. The program code is loaded into PM at the start and cannot be updated during execution. A dedicated counter in CuSP counts the clock cycles (cc) that pass from the start of the program until execution of the last instruction in the convolution algorithm. For in-system evaluation of accuracy, a test routine that is part of the convolution code sums every pixel of the output image residing in DM1 and returns the sum at an external port. This value is then compared with the pixel sum produced by a MATLAB program implementing 2D convolution on the same image.
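The window operation and the pixel-sum accuracy check described above can be sketched in software as follows. This is a minimal reference model, not the authors' hardware or MATLAB code: border handling (only the valid region is computed), the absence of kernel flipping, and the sample image and kernel are all assumptions.

```python
def convolve2d(image, kernel):
    """Integer 2D window operation over the valid region (no border padding).

    image and kernel are lists of rows. Border handling and the absence of
    kernel flipping are assumptions; the paper does not state either.
    """
    k = len(kernel)
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(rows - k + 1):
        out_row = []
        for c in range(cols - k + 1):
            acc = 0
            for i in range(k):
                for j in range(k):
                    acc += image[r + i][c + j] * kernel[i][j]
            out_row.append(acc)
        out.append(out_row)
    return out


def pixel_sum(image):
    """Checksum used for comparison: the sum of all output pixels."""
    return sum(sum(row) for row in image)


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
kernel = [[1, 1, 1],
          [1, 1, 1],
          [1, 1, 1]]   # unnormalised box filter, hypothetical example
output = convolve2d(image, kernel)
checksum = pixel_sum(output)
```

A single checksum is cheap to return over one external port, but it only flags that some pixel differs; it does not locate the error, which is why a per-pixel error measure is still useful.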

3 Results and Discussion

This section presents the results (Figs. 3 and 4) of the 2D convolution implementation on CuSP and compares its performance with the rVEX and MicroBlaze implementations given in [5]. In the figures, the Crimson label represents the case where Crimson DSP core general instructions are used for 2D convolution, whereas CuSP refers to the accelerator implementation. CuSP simulation was performed in ModelSim, and a Digilent Genesys 3 board with a Virtex 5 FPGA was used for hardware implementation. The Crimson processor and the 2D convolution accelerator were synthesized at 83 MHz, the maximum frequency determined from place and route (PAR) analysis. Since each implementation has its own operating frequency (83 MHz for CuSP, 150 MHz for MicroBlaze and 75 MHz for rVEX), execution time was compared in clock cycles instead of seconds. As the comparison metric is clock cycles (cc), a lower value means better performance. For a fair comparison with rVEX and MicroBlaze [5], performance must be evaluated using the same image resolutions, i.e. 640 × 480, 1024 × 1024 and 1920 × 1080. Since the maximum image size supported by CuSP is 250 × 250, these resolutions cannot fit in CuSP memory. However, as the exact machine cycles required by each assembly instruction in CuSP are known, the clock cycles required for 2D convolution on any image size can be calculated. For the accelerator, the clock cycles taken for 2D convolution on any image are likewise known to the designer.

Fig. 3. Clock cycles (CC) comparison for 3 × 3 mask.

Fig. 4. Clock cycles (CC) comparison for 5 × 5 mask.

From Fig. 3 it is evident that CuSP is up to 23 times faster than MicroBlaze, 13 times faster than rVEX and 6 times faster than Crimson for the 3 × 3 convolution window size. For the 5 × 5 window it is 36 times faster than MicroBlaze, 10 times faster than rVEX and 7 times faster than Crimson, as seen in Fig. 4. Although the authors in [5] propose the rVEX processor for speeding up image applications, CuSP offers much better performance through tight coupling of the accelerator with the Crimson core and direct linking with the data memories, while avoiding rVEX shortcomings like compiler complexity and algorithm-dependent performance. Excluding the accelerator, Crimson alone is 3.6 times faster than MicroBlaze and 2.1 times faster than rVEX for the 3 × 3 window size. For the 5 × 5 window it is 5.3 times faster than MicroBlaze and 1.5 times faster than rVEX. The Crimson core performs better than MicroBlaze and rVEX because of its architectural design and instruction set. Crimson uses the mac instruction (refer to Table 1) to combine multiply and accumulate operations, whereas on MicroBlaze/rVEX more than two instructions are required to perform both operations. Separate MAC and ALU units allow most ALU instructions to execute concurrently with MAC instructions without pipeline stalls. Hardware looping instructions in Crimson reduce jump overhead, as the loop iteration counts are known in advance for the convolution algorithm.
This considerably reduces the number of wasted cycles as the problem complexity, i.e. the window size, increases. Although the Crimson DSP offers only slightly better performance than rVEX, it does not suffer from the compiler complexity of a VLIW processor like rVEX for non-parallelizable image applications, and it offers negligible communication delays through seamless accelerator integration, which is absent in MicroBlaze [6].
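Because the three designs run at different clock frequencies, a speedup measured in clock cycles does not translate one-to-one into a speedup in seconds. The sketch below converts the cycle-count ratios reported above into wall-clock ratios using the stated frequencies. The function name is hypothetical, and the results are illustrative arithmetic only, not figures measured or reported by the authors.

```python
CUSP_MHZ, MICROBLAZE_MHZ, RVEX_MHZ = 83, 150, 75   # frequencies stated in the text


def time_speedup(cycle_speedup, other_mhz, cusp_mhz=CUSP_MHZ):
    """Convert a cycle-count speedup of CuSP over another core into a time speedup.

    time = cycles / frequency, so
    t_other / t_cusp = (cycles_other / cycles_cusp) * (f_cusp / f_other).
    """
    return cycle_speedup * cusp_mhz / other_mhz


vs_microblaze = time_speedup(36, MICROBLAZE_MHZ)   # 5 x 5 window
vs_rvex = time_speedup(10, RVEX_MHZ)               # 5 x 5 window
```

Under these numbers the advantage over MicroBlaze shrinks in wall-clock terms because MicroBlaze is clocked faster, while the advantage over rVEX grows slightly.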

Fig. 5. 2D convolution output error (5 × 5 mask).

Outputs from MATLAB and CuSP for 2D convolution are shown in Fig. 5. The difference between the results of these implementations is shown as an image scaled by the maximum difference intensity. We use the root mean squared error (RMSE) to evaluate the accuracy of our system, as it is frequently used for image quality assessment:

\[ \mathrm{RMSE} = \sqrt{\frac{\sum_{r=0}^{m-1} \sum_{c=0}^{n-1} \left( I_1(r,c) - I_2(r,c) \right)^2}{m \cdot n}} \tag{1} \]

In (1), I1 is the output image from the MATLAB simulation, I2 is the output from the CuSP implementation of 2D convolution, and m × n is the image size in pixels. Evaluation of the results reveals that our implementation incurs an RMSE of 2 per pixel for the 3 × 3 window and 3 per pixel for the 5 × 5 window size. This error was calculated by averaging RMSE values for convolution over multiple image resolutions, e.g. 32 × 32, 250 × 250, 128 × 64, etc. The error occurs because of quantization effects, as CuSP stores each pixel as a 16-bit fixed point number. By replacing the 16 × 16 bit multiplier shown in Fig. 2 with a 24 × 24 bit multiplier, we have been able to reduce the average RMSE to 0.6 for the 3 × 3 mask and 0.7 for the 5 × 5 mask. This error is now in a tolerable range for our target application areas.
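For readers who want to reproduce the accuracy measure, a minimal Python sketch of the RMSE between two equally sized images is given below. The two small images are hypothetical stand-ins for the MATLAB and CuSP outputs, which are not available here.

```python
import math


def rmse(img1, img2):
    """Root mean squared error between two equally sized images (lists of rows)."""
    m, n = len(img1), len(img1[0])
    total = 0
    for r in range(m):
        for c in range(n):
            diff = img1[r][c] - img2[r][c]
            total += diff * diff
    return math.sqrt(total / (m * n))


reference = [[10, 20], [30, 40]]    # stand-in for the MATLAB output (hypothetical values)
measured = [[12, 18], [30, 44]]     # stand-in for the CuSP output (hypothetical values)
error = rmse(reference, measured)
```

Unlike the single pixel-sum checksum, the RMSE does not let positive and negative differences cancel, so it is the more informative of the two measures.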

4 Conclusion

The proposed CuSP, comprising a Crimson processor and a 2D convolution accelerator, achieves an execution time speedup of up to 36 times over Xilinx MicroBlaze and up to 13 times over the rVEX soft processor for 2D convolution. The Crimson DSP processor alone offers an improvement of up to 5.3 times compared with MicroBlaze and 2.1 times compared with rVEX. The quantization error for 2D convolution has been reduced by about a factor of 3 to further improve the results of CuSP. Crimson outperforms the above-mentioned softcores by combining DSP-specific features like MAC and hardware looping with hardware accelerators to offer both performance and flexibility for image processing applications.

5 Future Work

In future, more hardware accelerators will be added to the instruction set to increase algorithm coverage. Further improvements, such as increasing the supported image size, reducing quantization error and improving the tool chain, would make CuSP a likely choice for accelerating image processing applications.

References

1. Atasu, K., Pozzi, L., Ienne, P.: Automatic application-specific instruction-set extensions under microarchitectural constraints. In: Proceedings of the 40th Annual Design Automation Conference, series DAC 2003, pp. 256–261. ACM, New York (2003)
2. Zhao, K., Bian, J., Dong, S.: A fast custom instructions identification algorithm based on basic convex pattern model for supporting ASIP automated design. In: 2007 11th International Conference on Computer Supported Cooperative Work in Design, pp. 121–126, April 2007
3. Samanta, S., Paik, S., Gangopadhyay, S., Chakrabarti, A.: Processing of image data using FPGA-based microblaze core. In: Mantri, A., Nandi, S., Kumar, G., Kumar, S. (Eds.) HPAGC, series Communications in Computer and Information Science, vol. 169, pp. 241–246. Springer (2011)
4. McNichols, J.M., Balster, E.J., Turri, W.F., Hill, K.L.: Design and implementation of an embedded NIOS II system for JPEG2000 tier II encoding. Int. J. Reconfig. Comput. 2:2 (2013)
5. Hoozemans, J., Wong, S., Al-Ars, Z.: Using VLIW softcore processors for image processing applications. In: 2015 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS), pp. 315–318, July 2015
6. Shrimal, D., Kumar Jain, M.: Instruction customization: a challenge in ASIP realization. Int. J. Comput. Appl. 98(15), 22–26 (2014)
7. Liu, D.: Embedded DSP Processor Design, vol. 2. Elsevier (2008)
8. Vityazev, S., Kharin, A., Savostyanov, V., Vityazev, V.: TMS320C66x multicore DSP efficiency in radar imaging applications. In: 2015 4th Mediterranean Conference on Embedded Computing (MECO), pp. 115–118, June 2015
9. Rangarajan, P., Kutraleeshwaran, V., Vaasanthy, K., Perinbam, R.P.: HDL synthesis and simulation of eight bit DSP based micro-controller for image processing applications. In: The 2002 45th Midwest Symposium on Circuits and Systems MWSCAS-2002, vol. 3, pp. III–609–612, August 2002
10. Liu, D., Tell, E.: Senior Instruction Set Manual. Linkoping University, Tech. Rep. (2008)
11. Tell, E., Olausson, M., Liu, D.: A general DSP processor at the cost of 23K gates and 1/2 a man-year design time. In: 2003 IEEE International Conference on Acoustics, Speech and Signal Processing, Proceedings ICASSP 2003, vol. 2, pp. II–657–660, April 2003
12. Lilja, D., Sapatnekar, S.: Designing Digital Computer Systems with Verilog. Cambridge University Press, Cambridge (2004)

Efficient Image Steganography Using Adaptive Cryptographic Algorithms

Mahadi Hasan, Mehnaz Tabassum, and Md. Jakir Hossain

Department of Computer Science and Engineering, Jagannath University, Dhaka, Bangladesh
[email protected], [email protected], [email protected]

Abstract. In the age of the internet, protecting our information has become as important as protecting our wealth. The aim is to protect both physical and digital information from destruction or unauthorized access. Hackers want to damage or steal our information. Steganography and cryptography are techniques for hidden communication. In this research work, we combine three techniques to improve security: a message digest, encryption and steganography. First, the Secure Hash Algorithm (SHA-512) is used to create a message digest of the original message, which is later used to check the validity of the message. The message is then encrypted with the AES (Advanced Encryption Standard) algorithm. Finally, steganography hides the encrypted message in an image to create a stego image, from which the hidden data can later be restored, making the scheme more intricate and more secure.

Keywords: Steganography · AES · Message digest · LSB · Cryptography

1 Introduction

From 2000 to 2016, the growth of Internet users worldwide reached 67.7%, and Asia has the most Internet users, accounting for 50% of all Internet users worldwide [12 & Google analytics]. In today's age of rapid technological development, most users rely on the Internet to transfer information from one party to another across the world. Secrecy in digital communication is therefore needed when confidential information is shared between two users. Various techniques are used to provide security. Steganography, the Secure Hash Algorithm (SHA-512) and cryptography have been used in this research. Most researchers use a two-level data security system, whereas we use a three-level data security system. We use AES for text encryption and decryption, and LSB for creating the steganography image that hides the data. Finally, we use the Secure Hash Algorithm (SHA-512) to create a message digest and check the validity of the original message. If anyone changes the original message, we detect it by comparing the previous message digest with the current message digest. The goal of this paper is to increase security. We build strong security with cryptographic and steganographic algorithms that ensure the safekeeping of information exchange.

© Springer Nature Switzerland AG 2019. K. Arai et al. (Eds.): SAI 2018, AISC 858, pp. 558–568, 2019. https://doi.org/10.1007/978-3-030-01174-1_42


2 Literature Review

Nowadays, data security is a most important issue. There are three security goals: confidentiality, integrity and availability. Many researchers have implemented text encryption and decryption using various techniques. The papers [1, 2] conceal the message using the AES cryptography algorithm and LSB, which makes it very difficult to distinguish the steganography image from the real image. These papers did not use a hash function. For information hiding, paper [3] used Least Significant Bit steganography and the Blowfish algorithm. Other papers [4, 6, 7, 11] used LSB for steganography and a cryptographic algorithm for encrypting the message. Steganography was used for hiding text information in paper [5]. Paper [8] used the AES, DES and Blowfish algorithms. The DES and AES algorithms used in papers [9–11] achieve lower encryption and decryption times. Papers [9, 11] used AES, and [10] used the SDES algorithm for encryption.

3 Proposed Methodology

Given the importance of data security, the problem addressed in this paper is how the LSB steganography method can be combined with AES cryptography to hide messages in a digital image. As Fig. 1 shows, the research uses the logical LSB method to hide a message in the digital image, AES to encrypt the message, and the SHA-512 hash algorithm to check message integrity. First, we take a message and an image as input. The message is used as the plaintext. Then we create a message digest using the Secure Hash Algorithm (SHA-512). The plaintext is encrypted with the Advanced Encryption Standard (AES).

Fig. 1. Block diagram of our proposed model.

The encrypted message is hidden within the image using the Least Significant Bit (LSB) steganography method. For the decryption process, we first retrieve the encrypted message from the steganography image. Then we decrypt the message using the same algorithm (AES). Next, we create another message digest, called the current message digest, from the decrypted text. Finally, we compare the previous message digest with the current message digest. If the two digests match, the message is shown; otherwise, the message is rejected. This completes the proposed scheme.

3.1 Generating Message Digest with SHA-512

First, we take a message, which is called the plaintext. As Fig. 2 shows, the Secure Hash Algorithm (SHA-512) is applied to the plaintext to produce the 512-bit message digest.

Fig. 2. Messages digest with SHA-512.
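The digest step and the later digest comparison can be sketched with the Python standard library. This is only an illustrative sketch, not the implementation used in this work: the function names `message_digest` and `digests_match`, the UTF-8 encoding and the sample messages are assumptions made for the example.

```python
import hashlib
import hmac


def message_digest(plaintext: str) -> bytes:
    """Return the 64-byte (512-bit) SHA-512 digest of the UTF-8 encoded text."""
    return hashlib.sha512(plaintext.encode("utf-8")).digest()


def digests_match(previous: bytes, current: bytes) -> bool:
    """Compare the stored digest with the recomputed one in constant time."""
    return hmac.compare_digest(previous, current)


# Hypothetical messages, used only to illustrate accept / reject.
previous_digest = message_digest("secret message")   # computed before embedding
current_digest = message_digest("secret message")    # recomputed after decryption
tampered_digest = message_digest("secret massage")   # a modified message
```

With these definitions, `digests_match(previous_digest, current_digest)` accepts the recovered message, while a digest computed from a modified message is rejected.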

3.2 Encryption Process with AES (Advanced Encryption Standard)

Figure 3 shows the encryption process, which runs on the plaintext to create the ciphertext. The Advanced Encryption Standard (AES) algorithm is applied to produce the ciphertext.

Fig. 3. Cipher text using AES.

Efﬁcient Image Steganography Using Adaptive Cryptographic Algorithms

3.3 Decryption Process with AES (Advanced Encryption Standard)

Figure 4 shows that the decryption process is the inverse of encryption. The plaintext is returned after the decryption process, so we recover our original message unchanged.

Fig. 4. Plaintext using AES.

3.4 Message Inserting with LSB

Today, when converting an analog image to digital format, we usually choose between three different ways of representing color:
(1) 24-bit color: every pixel can have one of 2^24 colors, represented as different quantities of three basic colors, red (R), green (G) and blue (B), given by 8 bits (256 values) each.
(2) 8-bit color: every pixel can have one of 256 (2^8) colors, chosen from a palette, or table of colors.
(3) 8-bit gray-scale: every pixel can have one of 256 (2^8) shades of gray.

LSB insertion modifies the LSBs of each color in 24-bit images, or the LSBs of the 8-bit value in 8-bit images. For example, say our message contains the letter ‘M’, which has the ASCII code 77 (decimal), or 01001101 in binary. We need three consecutive pixels of a 24-bit image to store an ‘M’, since three pixels provide nine color-channel LSBs for the eight message bits. Let us say that the pixels before the insertion are 11001101, 01001101 and 11001101. Figure 5 shows that we first check our condition; if it succeeds, the next step is applied to the RGB pixels. After the insertion of ‘M’ the pixels are 11001101, 11001101 and 11001101.


Fig. 5. Boolean AND operation before inserting.

The method of steganography is an important motivation for feature selection. Figure 6 shows the new steganographic algorithm presented in this paper for 8-bit (gray-scale) or 24-bit (color) images, which is based on logical operations. The algorithm embeds the ASCII code of the text into the LSBs of the cover image.

Fig. 6. Boolean OR operation for inserting.
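The AND and OR operations of Figs. 5 and 6 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the cover is treated as a flat sequence of 8-bit channel values, one message bit is written per byte, bits are taken most significant first, and the names `embed_bits` and `extract_bits` are hypothetical.

```python
def embed_bits(cover: bytearray, payload: bytes) -> bytearray:
    """Hide the payload bits (MSB first) in the least significant bits of the cover bytes."""
    if len(payload) * 8 > len(cover):
        raise ValueError("cover too small for payload")
    stego = bytearray(cover)
    for i in range(len(payload) * 8):
        bit = (payload[i // 8] >> (7 - i % 8)) & 1
        # Boolean AND with 11111110 clears the LSB; Boolean OR inserts the message bit.
        stego[i] = (stego[i] & 0xFE) | bit
    return stego


def extract_bits(stego: bytes, n_bytes: int) -> bytes:
    """Rebuild n_bytes of payload from the LSBs of the first 8 * n_bytes stego bytes."""
    out = bytearray()
    for b in range(n_bytes):
        value = 0
        for k in range(8):
            value = (value << 1) | (stego[b * 8 + k] & 1)
        out.append(value)
    return bytes(out)
```

Because only the lowest bit of each byte can change, every channel value in the stego data differs from the cover by at most one, which is why the change is hard to see.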

3.5 Message Embedding into an Input Image

The encrypted message is hidden in an image using the Least Significant Bit (LSB) method, which creates the stego image shown in Fig. 7. Some parts of the image change very slightly, but it is very difficult to distinguish the real image from the stego image: human eyes cannot see the encrypted message in the stego image. The two pictures below show the real image and the stego image. Figure 8 shows the screen view of the total procedure.

4 Result Analysis

4.1 Testing Encode

Our experiments determined how much message data an image can carry without changing any important part of the image quality; Table 1 shows the results. An example is given below with three sample images and message texts.

[Fig. 7 panels: Input Real Image; Steganography Image]

Fig. 7. Message embedding into an input image.

Fig. 8. Screen view of total procedure.

Table 1. Testing encode

Image name   Image size   Dimension   Stego image size   Message text size   Status
Boat.bmp     943 KB       700 × 460   944 KB             16 byte             Success
pond.bmp     734 KB       544 × 461   736 KB             48 byte             Success
stone.bmp    1.83 MB      800 × 600   1.84 MB            128 byte            Success

4.2 Testing Decode

In our experiments, the hidden message embedded in the image during encoding can be recovered, as shown in Table 2.

Table 2. Testing decode

Image name   Stego image size   Dimension   Image size   Message text size   Status
Boat.bmp     944 KB             700 × 460   943 KB       16 byte             Success
Pond.bmp     736 KB             544 × 461   734 KB       48 byte             Success
Stone.bmp    1.83 MB            800 × 800   1.84 MB      128 byte            Success

4.3 Testing of Cryptography AES

Table 3 shows the test that creates a ciphertext from the message (encryption) and recovers the message from the ciphertext (decryption). In this cryptography test, each plaintext entered was successfully converted into ciphertext.

Table 3. Testing of cryptography AES

No.   Number of characters   Key   Size KB   Number of characters
1     32                     32    128       32
2     48                     48    192       48
3     64                     64    256       64

4.4 Cipher Text Testing

Table 4 shows that the ciphertext created from the plaintext by AES cryptography cannot be predicted. The cipher text is given below.

Table 4. Cipher text testing

4.5 Imperceptibility Testing

It is very difficult to see the secret message hidden in the image, and the original image and the stego image cannot be distinguished by eye, as Table 5 shows.

4.6 Reliability Criteria

Figures 9, 10 and 11 show the image histograms, that is, the distribution of RGB color in each digital image. The reliability criterion is that the image quality does not change much after the message is added: the stego image still looks good. The figures show the RGB color histograms after message insertion.

4.7 Recovery Criteria

The hidden message can be recovered, as shown in Table 6; this is called the recovery criterion. Messages are inserted into an image to create the stego image.

5 Discussion

Since AES-192 and AES-256 are known to be open to related-key cryptanalysis, more security is needed in this area now and in the future. The basic goal of this paper is to ensure three levels of security through a three-step sequence, rather than only encrypting the message and hiding its bits directly in the cover image. In this paper, we applied the SHA-512 hash function, which makes the message more secure than in the previous methods [1, 2, 5, 6]. The LSB method then works by replacing the least significant bit with bits of the secret message text. Here we use three bits, whereas others previously used at least four bits. Table 7 compares our work with the related work.

6 Limitations and Future Work

This work supports only digital image files as the media container; it is expected to be developed further so that it supports audio files, video and other media. In future, the whole procedure can be ported to other programming languages, e.g. Java, Ruby, Python and others, so that it can be developed into an application.

Table 5. Imperceptibility testing

Fig. 9. Comparison between Boat.bmp real image and stego image.

Fig. 10. Comparison between Pond.bmp real image and stego image.

Fig. 11. Comparison between Stone.bmp real image and stego image.

Table 6. Recovery criteria

No.   Stego image   Length of prepared character   Status
1     [image]       32, 80 and 160 characters      Successfully recovered
2     [image]       32, 80 and 160 characters      Successfully recovered
3     [image]       32, 80 and 160 characters      Successfully recovered

Table 7. Our proposal vs reference

Reference & Method

Message Digest (SHA)

Reference [1], [2], [5] & [6]

Encryption

√

Reference [17]

√

Our proposal

√

LSB with Image

√ √

√

√

References

1. Nurhayati, S., Ahmad, S.S., Hidayatullah, S.: Steganography for inserting message on digital image using least significant bit and AES cryptographic algorithm. IEEE Xplore
2. Bhardwaj, R., Khanna, D.: Enhanced the security of image steganography through image encryption. In: INDICON 2015 1570172921. IEEE, India. Engineering Department, Thapar University, Patiala
3. Patel, K., Utareja, S., Patel, H.G.: Information hiding using least significant bit steganography and blowfish algorithm. Int. J. Comput. Appl. 63(13) (2013). Institute of Technology Ratibad, Bhopal
4. Harshitha, K.M., Vijaya, P.A.: Secure data hiding algorithm using encrypted secret message. Int. J. Sci. Res. Publ. 2(6) (2012)
5. Vennice, M.G., Rao, T.V., Swapna, M., Sasikiran, J.: Hiding the text information using steganography. Int. J. Eng. Res. Appl. (IJERA) 2(1) (2012)
6. Tyagi, V.: Data hiding in image using least significant bit with cryptography. Int. J. Adv. Res. Comput. Sci. Softw. Eng. 2(4) (2012)
7. Gupta, S., Goyal, A., Bhushan, B.: Information hiding using least significant bit steganography and cryptography. In: Modern Education and Computer Science, June 2012
8. Thakur, J., Kumar, N.: DES, AES and blowfish: symmetric key cryptography algorithms simulation based performance analysis. Int. J. Emerg. Technol. Adv. Eng. 1(2) (2011)
9. Sarmah, D.K., Bajpai, N.: Proposed system for data hiding using cryptography and steganography. Int. J. Comput. Appl. (2010)
10. Agarwal, A.: Security enhancement scheme for image steganography using S-DES technique. Int. J. Adv. Res. Comput. Sci. Softw. Eng. 2(4) (2012)
11. Phad Vitthal, S., Bhosale Rajkumar, S., Panhalkar Archana, R.: A novel security scheme for secret data using cryptography and steganography. Int. J. Comput. Netw. Inf. Secur. (2012)
12. National Institute of Standards and Technology 2001. Specification for The Advanced Encryption Standard (AES), 25 July 2014. http://csrc.nist.gov/publications/fips/fips197/fips197.pdf

Texture Classification Framework Using Gabor Filters and Local Binary Patterns

Farhan Riaz(✉), Ali Hassan, and Saad Rehman

National University of Sciences and Technology (NUST), Islamabad, Pakistan
{farhan.riaz,alihassan,saadrehman}@ceme.nust.edu.pk

Abstract. In this paper, a novel method for rotation, scale and illumination invariant texture image classification is introduced. We exploit the useful rotation and scale characteristics of Gabor filters and the illumination invariance characteristics of LBP, proposing image features which are invariant to the above-mentioned imaging dynamics. The images are first filtered using Gabor filters, followed by a summation of filter responses across scales. An LBP of the resulting features is calculated, followed by an integral histogram of LBPs across various orientations of the Gabor filters. An experimental validation of the invariance of the descriptor is shown on a texture classification problem using two publicly available datasets: the USC-SIPI and CUReT texture datasets. Our experiments show that the proposed descriptor outperforms the other methods that have been considered in this paper.

Keywords: Image texture · Pattern classification · Gabor filters

1 Introduction

Texture is a vital visual property in image classification and has been an active area of research recently. The analysis of texture has many potential applications, such as remote sensing, aerial imaging and biomedical imaging. Given the importance of describing visual texture, many methods have been proposed by different researchers, and the design of each methodology is specific to the requirements of the intended application. A major issue in this area is that real-world textures are often dynamic owing to variations in orientation, scale, illumination or other visual appearances of the images. Traditionally, texture feature extraction methods are divided into four main categories [1]: (1) statistical, (2) structural, (3) modeling-based, and (4) signal processing methods. Statistical and modeling-based methods mostly assume spatial relations in the images over a relatively small neighborhood and are thus more suitable for micro-textures. In fact, the scale at which an image should be analyzed is not fixed but is a variable that depends on the image contents. Some images with a fine texture should be analyzed at smaller scales, whereas others (such as the texture of a brick) should be analyzed at least at a specific scale to correctly identify the texture in the images. Structural methods are more commonly used for semi-regular or regular textures and are thus not generic enough to yield a good classification of all types of image textures; such methods are more apt for highly deterministic textures.

Many researchers have proposed filter-bank-based methods to mitigate these problems. Filter-based methods can be categorized as (1) spatial filtering, (2) frequency filtering and (3) multiresolution approaches. The former two approaches are limited because they do not allow an analysis of images at different scales, which is one of the main requirements for an effective texture description. Given this, we have chosen Gabor filters as the basis of our texture feature extraction methodology. Our choice is motivated by several important considerations. Firstly, Gabor filters mimic the cells in the visual cortex of mammals, owing to the band-pass nature of the filters and their directional selectivity, characteristics that are shared by these cortical cells. Additionally, Gabor filters achieve an optimal joint localization in space and frequency, resulting in a very efficient multiresolution analysis.

Previously, variants of Gabor features have been used for extracting texture features. Kamarainen et al. [2] reordered the Gabor responses, achieving rotation and scale invariance for various applications. Riaz et al. [3,4] proposed the use of autocorrelation and the discrete Fourier transform on a specific arrangement of Gabor filter responses to obtain descriptors which are invariant to scale, rotation and homogeneous illumination. Xie et al. [5] devised a scheme for obtaining angular normalization and a scale searching method for extracting features. Others [6–8] have proposed image features which are either rotation or scale invariant.

In most of these approaches, either the proposed features are only rotation or only scale invariant, or the induction of invariance involves a point-wise comparison, which is not suitable for classifiers based on feature spaces such as support vector machines. Those which are invariant to both (such as Riaz et al. [3,4]) are not invariant to illumination gradients. In this paper, we propose the Invariant Gabor Local Binary Patterns (IGLBP), which are rotation, scale, illumination and illumination gradient invariant visual descriptors. The features can be used for texture analysis in applications that demand more robust texture features. A histogram of the features is obtained, which can be used in combination with state-of-the-art machine learning techniques to classify the images. We demonstrate the robustness of the descriptor to various transformations of the images. The paper is organized as follows: in Sect. 2 we discuss the methods, followed by our experimental results (Sect. 3), and we conclude the paper in Sect. 4.

© Springer Nature Switzerland AG 2019. K. Arai et al. (Eds.): SAI 2018, AISC 858, pp. 569–580, 2019. https://doi.org/10.1007/978-3-030-01174-1_43

2 Methodology

In this section, we discuss the details of the proposed framework (Fig. 1) used for classifying texture images.

[Fig. 1 blocks: Input Image; Gabor Filtering; GaborR; Uniform LBP; Integral Histogram of LBP; Classification; Output Labels]

Fig. 1. Feature extraction and classification framework.

2.1 Gabor Filters

The first step in feature extraction is Gabor filtering. Recently, Gabor filters have been widely used for applications related to texture classification. 2D Gabor filters behave much like the cells in the visual cortex of mammals [9], making them suitable candidates as image descriptors. Another important consideration is the ability of Gabor filters to achieve optimum localization in space and frequency [10]. A Gabor filter consists of a sinusoidal wave of a particular frequency, modulated by a Gaussian function. The Gabor transformation of an image is given by the convolution of the image with a bank of Gabor filters (Fig. 2):

Fig. 2. Visual depiction of the notion of multiresolution analysis in Gabor ﬁlters (adapted from [11]).

$$ G_{\theta,\sigma}(x, y) = I(x, y) * \psi_{\theta,\sigma}(x, y) \tag{1} $$

where $I(\cdot)$ denotes the input image, $\psi_{\theta,\sigma}(\cdot)$ represents the Gabor filter with orientation $\theta$ and scale $\sigma$, and $*$ is the convolution operator. A Gabor filter can be mathematically defined as follows:

$$ \psi(x, y) = \frac{f^2}{\pi\gamma\eta}\, e^{-\left(\frac{f^2}{\gamma^2} x'^2 + \frac{f^2}{\eta^2} y'^2\right)}\, e^{j 2\pi f x'} \tag{2} $$

$$ x' = x\cos\theta + y\sin\theta, \qquad y' = -x\sin\theta + y\cos\theta \tag{3} $$

where $f$ denotes the central frequency of the filter, $\theta$ is the angle of the major axis of the Gaussian curve, $\gamma$ is the deviation along the major axis and $\eta$ is the deviation along the minor axis. In the given form, $\lambda = \gamma\eta$ is the aspect ratio of the Gaussian function. The frequency representation of a Gabor filter is

$$ \Psi(u, v) = e^{-\frac{\pi^2}{f^2}\left(\gamma^2 (u' - f)^2 + \eta^2 v'^2\right)} \tag{4} $$

$$ u' = u\cos\theta + v\sin\theta, \qquad v' = -u\sin\theta + v\cos\theta \tag{5} $$

which is a band-pass filter in the frequency domain. If $S$ is the number of scales and $N$ is the number of angles at which the Gabor filters are calculated, we obtain a feature vector of size $S \times N$ for every pixel in a Gabor filtered image.

2.2 Scale Invariant Gabor Responses

The feature vectors obtained after Gabor filtering are sensitive to the scale and orientation of the images. Thus, for two images which are just rotated or scaled versions of one another, the filter responses will not be the same and will result in distinct features. This is because, if an image is scaled by a specific factor, its response will be captured by a filter that is scaled by the same amount as the image [2], resulting in a different feature vector. This is not desirable in many practical scenarios. In order to achieve scale invariance in our proposed framework, we sum the Gabor filter responses [12] across different scales:

$$ G_{\theta} = \sum_{\sigma=1}^{S} G_{\theta,\sigma}(x, y) \tag{6} $$

where $S$ is the number of scales that have been used for our Gabor decomposition. It should be noted that $G_{\theta}$ is scale invariant, as the notion of scale sensitivity has been removed from the Gabor responses by the summation operation.
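The filter of (2)–(3), the convolution of (1) at a single pixel and the summation of (6) can be sketched in pure Python as follows. This is an illustrative sketch only, not the implementation used in our experiments: the function names, the default values of γ and η, the window half-width `half`, and the choice to index each scale by its frequency f are all assumptions made for the example.

```python
import cmath
import math


def gabor(x, y, f, theta, gamma=1.0, eta=1.0):
    """Complex value of the Gabor filter of Eqs. (2)-(3) at position (x, y)."""
    xr = x * math.cos(theta) + y * math.sin(theta)      # rotated coordinate x'
    yr = -x * math.sin(theta) + y * math.cos(theta)     # rotated coordinate y'
    envelope = math.exp(-((f / gamma) ** 2 * xr ** 2 + (f / eta) ** 2 * yr ** 2))
    carrier = cmath.exp(2j * math.pi * f * xr)
    return (f ** 2 / (math.pi * gamma * eta)) * envelope * carrier


def gabor_response(image, cx, cy, f, theta, half=4):
    """Convolution of Eq. (1) evaluated at pixel (cx, cy) over a (2*half+1)^2 window."""
    if not (half <= cy < len(image) - half and half <= cx < len(image[0]) - half):
        raise ValueError("pixel too close to the image border for this window")
    total = 0j
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            total += image[cy - dy][cx - dx] * gabor(dx, dy, f, theta)
    return total


def scale_summed_response(image, cx, cy, theta, frequencies):
    """Eq. (6): sum of the responses at one orientation over all scales (assumed frequencies)."""
    return sum(gabor_response(image, cx, cy, f, theta) for f in frequencies)
```

Evaluating `scale_summed_response` for each of the N orientations leaves one value per orientation at every pixel instead of S × N values, which is the reduction that (6) describes.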

2.3 Local Binary Patterns

Next, for each scale invariant Gabor image, the local binary patterns (LBP) are computed. The LBP operator compares a pixel with its neighbors, thresholding each neighbor with reference to the center pixel and thus generating a pattern [13] (Fig. 3):

$$ LBP_{P,R} = \sum_{p=0}^{P-1} s(g_p - g_c)\, 2^p, \qquad s(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases} \tag{7} $$

where $g_c$ and $g_p$ represent the gray values of the center pixel and its neighbor respectively, and $p$ indexes the $p$th neighbor. $P$ is the total number of neighbors, and $R$ is the radius of the set surrounding a pixel. Supposing that $g_c$ is at $(0, 0)$, the spatial location of each neighboring pixel $g_p$ is computed from its index $p$ and the parameters $(P, R)$ as $(R\cos(2\pi p/P), R\sin(2\pi p/P))$. The intensity values of neighbors that do not fall on the image grid are estimated using interpolation. Statistics show that among all LBPs in an image, over 90% of the patterns usually indicate the intrinsic properties of the textures and have the ability to characterize microtextures such as corners, edges, etc. (Fig. 4). These are known as uniform patterns [14]. If an image is characterized using only uniform LBPs, the feature size can be reduced without a significant compromise on the discrimination power between different patterns. To quantify the uniformity of an LBP, the following parameter is defined:

$$ LBP^{u}_{P,R} = |s(g_{P-1} - g_c) - s(g_0 - g_c)| + \sum_{p=1}^{P-1} |s(g_p - g_c) - s(g_{p-1} - g_c)| \tag{8} $$

which corresponds to a count of spatial transitions (bitwise 0/1 changes) in the pattern. In our framework, after summation of the Gabor responses, we take the uniform LBP of the scale invariant Gabor responses, giving us a local description of the neighborhood of the scale invariant Gabor filtered images (Fig. 4):

$$ \eta_{\theta_j} = LBP^{u}_{P,R}(G_{\theta}) \tag{9} $$

where $\eta_{\theta_j}$ represents the uniform LBP image at the $j$th orientation.

2.4 Integral Histogram of LBPs

The features obtained using (9) are not invariant to rotations of the images. This is because for every rotation we obtain a distinct ηθ, and these values change when the images undergo different rotations. In this paper, we cater for these variations by merging the Gabor LBPs in the form of cumulative histograms. For each bin of the histogram, the cumulative count of pixels falling into that bin at the different rotations is computed.
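The LBP code, the transition count and the accumulation of histograms over orientations can be sketched as below. This is a simplified illustration under stated assumptions rather than the implementation used in our experiments: it fixes P = 8 and R = 1, reads the neighbors from the 3×3 square ring without interpolation, uses the common convention s(x) = 1 for x ≥ 0, and maps all non-uniform codes to one extra bin. The names `lbp_code`, `uniformity`, `lbp_histogram` and `integral_histogram` are hypothetical.

```python
P = 8  # number of neighbours (assumed); radius R = 1 on the 3x3 grid


def lbp_code(patch):
    """Return (code, bits) for a 3x3 patch, thresholding neighbours against the centre."""
    gc = patch[1][1]
    ring = [patch[1][2], patch[0][2], patch[0][1], patch[0][0],
            patch[1][0], patch[2][0], patch[2][1], patch[2][2]]
    bits = [1 if gp - gc >= 0 else 0 for gp in ring]
    return sum(b << p for p, b in enumerate(bits)), bits


def uniformity(bits):
    """Count the circular 0/1 transitions in the pattern (index -1 wraps around)."""
    return sum(abs(bits[p] - bits[p - 1]) for p in range(len(bits)))


# Codes with at most two transitions are treated as uniform; the rest share one bin.
UNIFORM_CODES = [c for c in range(2 ** P)
                 if uniformity([(c >> p) & 1 for p in range(P)]) <= 2]
BIN_OF = {code: i for i, code in enumerate(UNIFORM_CODES)}
N_BINS = len(UNIFORM_CODES) + 1


def lbp_histogram(image):
    """Histogram of uniform LBP codes over the interior pixels of one image."""
    hist = [0] * N_BINS
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            patch = [row[c - 1:c + 2] for row in image[r - 1:r + 2]]
            code, _ = lbp_code(patch)
            hist[BIN_OF.get(code, N_BINS - 1)] += 1
    return hist


def integral_histogram(images_per_orientation):
    """Bin-wise sum of the LBP histograms of the images at all orientations."""
    total = [0] * N_BINS
    for image in images_per_orientation:
        for i, count in enumerate(lbp_histogram(image)):
            total[i] += count
    return total
```

Because the histograms are added bin by bin, the result does not depend on the order in which the orientation images are supplied, which is the property the cumulative histogram relies on.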


Fig. 3. Calculation of local binary patterns.

Fig. 4. Uniform local binary patterns, corresponding to the relevant micropatterns.

This gives us the histograms of images at various resolutions. We define the integral histogram as:

$$ IGLBP = \bigcup_{j=1}^{R} f_{\theta_j}\!\left(LBP^{u_i}_{P,R}(G_{\theta})\right) \tag{10} $$

where $f(\cdot)$ denotes the frequency of occurrence of an LBP pattern represented by the $i$th bin in the histogram and $\bigcup$ is the union operator, defined as follows: IGLBP equals the sum of the previously visited LBP histogram bins, that is, the sum of all $f(\cdot)$ at various resolutions for a given $i$th LBP. The integral LBP histogram is invariant to image rotation. This is because when an image is rotated, its response will be captured by different Gabor filters (rotated by the same amount), yielding different LBPs [2]. However, when the integral histogram is calculated, the information at all rotations is summed up in the form of cumulative bins, which removes the dependence of the features on image rotation. The resulting image features are invariant to rotation, scale and illumination changes in the images.

2.5 Classification

In this paper, we have used a support vector machine (SVM) classifier for classification. The SVM, originally proposed by Vapnik et al. [15], constructs an optimum hyperplane that maximizes the separating margin between two classes. SVM classification uses a kernel function to construct the hyperplane. This approach typically produces classification models with excellent generalization ability, making it a powerful tool in various applications [16]. For our implementation, we used SVM classification with a linear kernel. The Weka tool (see footnote 1) was used in our experiments [17]. Our results were calculated using 10-fold cross validation.

3 Experiments

In this section, we empirically demonstrate the invariance characteristics of the novel descriptor and compare it with some other state-of-the-art methods. For all our experiments, we have used Gabor filters with six orientations and eight scales. This selection was made empirically using grid search. Nonetheless, the relative results in our experiments are consistent irrespective of the filter parameters.

3.1 Invariance of IGLBP

To analyse the rotation and scale invariance characteristics of the descriptor, we used images from the USC-SIPI dataset [18] (used in several other papers for the same purpose [19–21]). It has 13 images from the Brodatz album, each captured at seven different orientations (0, 30, 60, 90, 120, 150 and 200°), creating a dataset of 91 images. Each image has a resolution of 512 × 512 pixels with 8 bits per pixel. To demonstrate rotation invariance, we used the original (unrotated) images for training the classifier and their rotated versions for testing the classifier. Classification was performed using SVM. We compared the performance of IGLBP with several other feature extraction methods. The classifier performance was evaluated using overall classification accuracy. The experiments demonstrate that IGLBP shows good invariance characteristics in comparison with the other methods (Table 1).

Table 1. Classification accuracy when images are rotated at different angles (IGLBP - Invariant Gabor Local Binary Patterns; AHT - Autocorrelation Homogeneous Texture; HT - Homogeneous Texture)

         30°    60°    90°    120°   150°   200°   Mean
IGLBP    0.92   0.70   0.85   0.70   0.77   0.62   0.76
AHT      0.62   0.46   0.69   0.62   0.85   0.62   0.64
HT       0.23   0.08   0.08   0.15   0.31   0.23   0.22

To demonstrate the scale invariance of IGLBP, another experiment was performed. In this experiment, the 13 unrotated texture images were used for training the classifier. For testing, the features were extracted from the resampled texture images (×0.66, ×0.8, ×1.25 and ×1.5). Bicubic interpolation was used for resizing the images. Our experiments show that for small changes in the image scale, IGLBP shows good invariance characteristics. For a relatively large change in the scaling of the images, IGLBP performs better than the Homogeneous Texture (HT) features, but the AHT features perform even better. This is because IGLBP uses LBP, which are local image descriptors, for feature construction. When the images are scaled by a larger factor, image details are lost under downsampling or smoothed under upsampling, affecting the discrimination capability of IGLBP (Table 2).

Table 2. Classification accuracy when images are scaled by various factors (IGLBP - Invariant Gabor Local Binary Patterns; AHT - Autocorrelation Homogeneous Texture; HT - Homogeneous Texture)

         ×0.66   ×0.8   ×1.25   ×1.5   Mean
IGLBP    0.39    1      1       0.62   0.75
AHT      0.92    1      1       1      0.98
HT       0.23    0.23   0.62    0.46   0.39

Footnote 1: http://www.cs.waikato.ac.nz/ml/weka/.

To test the proposed descriptor more rigorously, we performed another set of experiments, in which we analysed the combined effect of scaling and rotation on the performance of IGLBP. For this purpose, the 13 unrotated texture images were used for training the classifier, whereas the rotated (30, 60, 90, 120, 150, 200°) and scaled (×0.66, ×0.8, ×1.25, ×1.5) images were used for testing the classifier (Table 3). Our experiments show that the IGLBP features give good classification results when the images are subjected to both scale and rotation changes. The only exception is when the images are scaled by ×0.66. In this case, the image loses a lot of detail because it has been subsampled. The LBPs are calculated from the local neighborhood of a pixel by thresholding the difference between a reference pixel and its surrounding pixels. With significant subsampling, the loss of image details results in significant changes in the LBPs, affecting the invariance characteristics of IGLBP.

Table 3. Classification accuracy when the images are rotated and scaled. The rows for each method correspond to the scaling factors of ×0.66, ×0.8, ×1.25 and ×1.5 (IGLBP - Invariant Gabor Local Binary Patterns; AHT - Autocorrelation Homogeneous Texture; HT - Homogeneous Texture)

                 30°    60°    90°    120°   150°   200°   Mean
IGLBP   ×0.66    0.39   0.46   0.46   0.39   0.39   0.39   0.42
        ×0.8     0.54   0.85   0.62   0.62   0.62   0.62   0.65
        ×1.25    0.85   0.62   0.46   0.62   0.62   0.70   0.65
        ×1.5     0.85   0.62   0.46   0.54   0.54   0.62   0.61
AHT     ×0.66    0.38   0.23   0.69   0.46   0.46   0.46   0.45
        ×0.8     0.54   0.46   0.62   0.54   0.77   0.62   0.59
        ×1.25    0.54   0.38   0.62   0.54   0.77   0.54   0.57
        ×1.5     0.54   0.38   0.69   0.54   0.77   0.69   0.60
HT      ×0.66    0.08   0.15   0.23   0.23   0.15   0.08   0.15
        ×0.8     0.15   0.23   0.23   0.15   0.3    0.15   0.20
        ×1.25    0.23   0.23   0.15   0.3    0.3    0.08   0.22
        ×1.5     0.08   0.23   0.3    0.3    0.15   0.15   0.20

3.2 Results on CUReT Dataset

We have also performed experiments on the CUReT album, which consists of 61 textures acquired under different imaging conditions. This is a challenging dataset given that some of the texture samples are very similar. Moreover, the variation in imaging conditions further complicates the feature extraction process. A total of 205 images of each texture type have been acquired. Following the evaluation methods adopted by some of the previous works on the CUReT dataset [22], we selected 92 images from each texture type. Half of the images from each texture sample (46 images) were used for training the classifier and the remaining half (46 images) were used for testing the classifier. Thus, features from a total of 5612 images were obtained (61 textures × 92 images) that were equally divided into training and testing sets (Fig. 5). Our experiments show that the proposed descriptor outperforms the other methods that have been used in this paper. We attribute this performance to the improved invariance characteristics of the proposed descriptor. The CUReT dataset consists of images acquired with different viewing perspectives and illumination characteristics. The HT descriptor, which is not invariant at all, shows poor classification results, whereas the AHT descriptor, which is invariant to rotation and scale changes, performs much better than HT. The best performance is shown by IGLBP, which exhibits better invariance characteristics than AHT because it also caters for the changing illumination characteristics of the images (Table 4).

Fig. 5. Texture images from Columbia-Utrecht database. The color images were converted to gray scale for our experiments.

Table 4. Classification performance of different texture descriptors on the CUReT texture dataset

Methods   Accuracy   ROC area
IGLBP     0.91       0.98
AHT       0.90       0.98
HT        0.70       0.85

4 Conclusions

In this paper, we have introduced a novel texture descriptor, the IGLBP, which is invariant to rotation, scale, homogeneous illumination and illumination gradients in the images. Our framework involves Gabor filtering of the images, which decomposes each image into a number of filter responses that are sensitive to image orientation and scale. The effect of scaling is curtailed by summing the filter responses across the various scales. In the next step, the LBPs of these responses are calculated, which removes homogeneous illumination and illumination gradients, and a histogram of LBPs is calculated for each Gabor image (sensitive to orientation). Finally, the integral histogram of all LBP histograms is calculated, which normalizes the effect of rotations in the images. Thus we obtain an invariant image descriptor. Feature extraction is followed by dimensionality reduction (using PCA), given that the feature space is typically skewed.


The invariance of IGLBP is demonstrated empirically using the USC-SIPI image dataset, which consists of images from the Brodatz texture album digitized at various rotations. The images are artificially scaled to demonstrate the invariance characteristics of the proposed descriptor. The experiments demonstrated the robustness of IGLBP to image transformations, with the exception that when the image is subsampled by a large amount, image details are lost, compromising the classification accuracy of IGLBP. To further validate the proposed descriptor, we performed experiments on the CUReT dataset. This is a challenging dataset as it consists of images with varying lighting conditions and viewing perspectives. Our experiments show that the proposed descriptor outperforms the other descriptors that have been considered in this paper.

References

1. Selvan, S., Ramakrishnan, S.: SVD-based modeling for texture classification using wavelets transformation. IEEE Trans. Image Process. 16(11), 2688 (2007)
2. Kamarainen, J.-K., Kyrki, V., Kälviäinen, H.: Invariance properties of gabor filter based features - overview and applications. IEEE Trans. Image Process. 15(5), 1088 (2006)
3. Riaz, F., Silva, F., Ribeiro, M., Coimbra, M.: Invariant gabor texture descriptors for classification of gastroenterology images. IEEE Trans. Biomed. Eng. 59(10), 2893–2904 (2012)
4. Riaz, F., Hassan, A., Rehman, S., Qamar, U.: Texture classification using rotation- and scale-invariant gabor texture features. IEEE Signal Process. Lett. 20(6), 607–610 (2013)
5. Xie, X., Dai, Q., Lam, K.M., Zhao, H.: Efficient rotation- and scale-invariant texture classification method based on gabor wavelets. J. Electron. Imaging 17(4), 1–7 (2008)
6. Arivazhagan, S., Ganesan, L., Priyal, S.P.: Texture classification using gabor wavelets based rotation invariant features. Pattern Recognit. Lett. 27(16), 1976 (2006)
7. Jafari-Khouzani, K., Soltanian-Zadeh, H.: Radon transform orientation estimation for rotation invariant texture analysis. IEEE Trans. Pattern Anal. Mach. Intell. 27(6), 1004 (2005)
8. Manthalkar, R., Biswas, P.K., Chatterji, B.N.: Rotation invariant texture classification using even symmetric gabor filters. Pattern Recognit. Lett. 24(12), 2061 (2003)
9. Field, D.: Relations between the statistics of natural images and the response properties of cortical cells. IEE J. Radio Commun. Eng. 4(12), 2379 (1987)
10. Granlund, D.H.: In search of a general picture processing operator. Comput. Graph. Image Process. 8, 155 (1978)
11. Kämäräinen, J.-K.: Feature extraction using gabor filters. Ph.D. thesis, Lappeenranta University of Technology (2003)
12. Tao, D., Li, X., Wu, X., Maybank, S.: General tensor discriminant analysis and gabor features for gait recognition. IEEE Trans. Pattern Anal. Mach. Intell. 29(10), 1700–1715 (2007)
13. Ojala, T., Pietikainen, M., Harwood, D.: A comparative study of texture measures with classification based on feature distributions. Pattern Recognit. 29, 51 (1996)
14. Ojala, T., Pietikainen, M., Maenpaa, T.: Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans. Pattern Anal. Mach. Intell. 24(7), 971–987 (2002)
15. Vapnik, V.: Estimation of Dependences Based on Empirical Data. Springer, New York (1979)
16. Byun, H., Lee, S.W.: A survey of pattern recognition applications of support vector machines. Int. J. Pattern Recognit. Artif. Intell. 17(3), 459–486 (2003)
17. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The weka data mining software: an update. ACM SIGKDD Explor. Newslett. 11(1), 10–18 (2009)
18. Weber, A.: The usc-sipi image database version 5. USC-SIPI Report 315, pp. 1–24 (1997)
19. Zhou, H., Wang, R., Wang, C.: A novel extended local-binary-pattern operator for texture analysis. Inf. Sci. 178(22), 4314–4325 (2008)
20. Li, Z., Hayward, R., Walker, R., Liu, Y.: A biologically inspired object spectral-texture descriptor and its application to vegetation classification in power-line corridors. IEEE Geosci. Remote. Sens. Lett. 8(4), 631–635 (2011)
21. Ojala, T., Pietikäinen, M., Mäenpää, T.: Gray scale and rotation invariant texture classification with local binary patterns. In: Computer Vision-ECCV 2000, pp. 404–420 (2000)
22. Varma, M., Zisserman, A.: A statistical approach to texture classification from single images. Int. J. Comput. Vis. 62(1), 61 (2005)

Biometric Image Enhancement, Feature Extraction and Recognition Comprising FFT and Gabor Filtering

Al Bashir(&), Mehnaz Tabassum, and Niamatullah Naeem

Department of Computer Science and Engineering, Jagannath University, Dhaka, Bangladesh
[email protected], [email protected], [email protected]

Abstract. Biometrics is a technology used to identify, analyze, and measure an individual's physical and behavioral characteristics. It is used for authenticating and authorizing a person. Among all biometric authentication methods, fingerprint recognition is the best known and most widely used solution for authenticating people on biometric systems. Fingerprint recognition approaches are usually minutiae-based or correlation-based. Although the minutiae-based approach is a popular and extensively used method for fingerprint authentication, it performs poorly on low-quality images and is insecure over a data-passing channel. In the proposed methodology, a feature-based approach to fingerprint recognition is developed by enhancing the low-quality image using FFT and a Gaussian filter and by extracting the features of the image using a Gabor filter. Similarity is measured by calculating the cosine-similarity value of correlation factors. The cosine-similarity values of the correlation factors of the input image and the template image are computed and compared. If the value is over a certain threshold, the result of the matching process is positive; otherwise it is negative.

Keywords: Biometrics · Fingerprint recognition · FFT · Gaussian filter · Gabor filter

1 Introduction

In the era of technology, digital authentication is the process of identifying a real entity or object and granting it permission within a specific system. Two types of methodology are mainly used in this process. One is the password or PIN, which has been widely used over the last few decades. The other is the biometric proofing approach, which is gaining more attention nowadays. Biometrics is a technology used to identify, analyze, and measure an individual's physical and behavioral characteristics [1]. It is used for authenticating and authorizing a person. Among all biometric authentication methods, fingerprint recognition is the best known and most widely used solution for authenticating people on biometric systems. Two methods exist for fingerprint recognition: (1) minutiae-based and (2) correlation-based.

© Springer Nature Switzerland AG 2019 K. Arai et al. (Eds.): SAI 2018, AISC 858, pp. 581–591, 2019. https://doi.org/10.1007/978-3-030-01174-1_44


A primary motivation for using biometrics is to recognize an individual easily and repeatedly. The advantage of a biometric is that it does not change. It goes where we go. It is very difficult to forge or fake; in some cases, it is next to impossible. It provides a very strong access control security solution that satisfies authentication, confidentiality, integrity, and non-repudiation. The goal of this research work is to design and develop a system that enhances and recognizes low-quality fingerprint images efficiently. Minutiae matching is a popular and widely used recognition approach, but it performs poorly on low-quality images. Instead of minutiae, the global features of the fingerprint image are the key factor in this recognition system.

2 Literature Review

Two sets of studies on fingerprint-based biometric identification are considered here. The first set focuses on the local structure of the fingerprint. Haghighat et al. (2015) proposed cloud-based face recognition in the paper "CloudID: Trustworthy cloud-based and cross-enterprise biometric identification", where they apply Gabor filters to extract features and store them in the cloud in encrypted form for online service [2]. Tuyls et al. (2005) proposed template-protecting biometric authentication in the paper "Practical Biometric Authentication with Template Protection", where they apply Gabor filtering to the fingerprint image in only four orientations and obtain a feature vector of size 1024 for matching [3]. A strongly privacy-enhanced face identification methodology was proposed by Erkin et al. (2009) in the paper "Privacy-preserving face recognition"; it efficiently hides both the biometrics and the outcome from the server by implementing secure multiparty computation. Euclidean distance is used for matching [4]. Osadchy et al. (2010, 2013) formulated a privacy-preserving face recognition system named SCiFI in the paper "SCiFI - a system for secure face identification" [5]. The implementation of this procedure is based on additive homomorphic encryption and oblivious transfer. SCiFI represents facial images by binary feature vectors and uses Hamming distances to measure image similarity.

3 Biometrics

The measurement and statistical analysis of people's physical and behavioural characteristics is known as biometrics. The technology is mainly used for recognition and access control, or for identifying individuals who are under surveillance. Biometrics is divided into two categories: (1) physiological and (2) behavioural. Different types of biometric systems are shown in Fig. 1.


Fig. 1. Types of biometric system.

3.1 Fingerprints as a Biometric

Since the inception of biometric automation, fingerprint identification has received more attention than the other modalities. The reason is that all fingerprints are unique. The procedure is traditional yet modern. Of all the biometrics available on the market (such as retina scanning, iris identification, face identification, and voice identification), the fingerprint is one of the safest and most convenient ways of obtaining fool-proof identification of a person. It is based on the basic principle that each of our ten fingerprints is unique and differs significantly from the others.

4 Proposed Methodology

Our proposed biometric authentication uses the global shape of the fingerprint image and focuses on the features extracted by a Gabor filter bank. We divide the overall methodology into two phases, shown in Fig. 2: (1) the enrollment phase and (2) the verification phase.

In the enrollment phase, we acquire the fingerprint image from the particular individual using a scanner or any other device. This task is done under the monitoring of a CA (Certified Authority) team. A number of biometric representations of an individual are acquired in the first step of enrollment. Pre-processing, enhancement, feature extraction, and post-processing are carried out after acquisition.

In the verification phase, a noisy image is normally captured, so some pre-processing and resizing are needed. A light enhancement is then applied to the sample fingerprint image. By applying the Gabor filter bank to the global shape of the sample fingerprint image, as in the enrollment phase, we obtain a column vector of features of the same size.

4.1 Data Resources and Software Used

We used the FVC2002 [7] fingerprint databases, which were collected with the optical sensor "FX2000" by Biometrika.

Fig. 2. Proposed system architecture.

4.2 Pre-processing

If we receive a raw color image, we convert it to gray-scale. If the acquired image size is not 256 × 256, we resize it. The image format we work with is TIFF (Tag Image File Format), which is identified by '.tiff' or '.tif' after the file name of the image. Most optical scanners output TIFF images; if the image is in another format, we convert it to TIFF.

4.3 Enhancement

(i) After obtaining the gray-scale pre-processed fingerprint image, we convert the template image to double type. We then call the enhancement function for the first time with this double-typed image and the numeric value 6 as parameters. (ii & iii) The numeric value 6, passed as the second parameter of the function mentioned above, sets the block size of the 6 × 6 'spectral window'. This small spectral window is created for applying the FFT to the 2-D (256 × 256) image. The FFT is defined in (1).

$$F(x, y) = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} f(m, n)\, e^{-j 2\pi \left( x \frac{m}{M} + y \frac{n}{N} \right)} \tag{1}$$
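As a hedged illustration of Eq. (1), the following Python sketch evaluates the double sum directly on a single 6 × 6 block and compares it with NumPy's FFT routine. It assumes the usual negative sign in the exponent; the function name `dft2_direct` and the random test block are hypothetical and are not taken from the authors' MATLAB program.

```python
import numpy as np

def dft2_direct(f):
    """Evaluate Eq. (1) literally: a double sum over m and n for every (x, y)."""
    M, N = f.shape
    m = np.arange(M).reshape(M, 1)   # row index of each input sample
    n = np.arange(N).reshape(1, N)   # column index of each input sample
    F = np.zeros((M, N), dtype=complex)
    for x in range(M):
        for y in range(N):
            kernel = np.exp(-2j * np.pi * (x * m / M + y * n / N))
            F[x, y] = np.sum(f * kernel)
    return F

# A 6 x 6 block, matching the 'spectral window' size mentioned in the text.
rng = np.random.default_rng(0)
block = rng.random((6, 6))

F_direct = dft2_direct(block)
F_fast = np.fft.fft2(block)   # the FFT computes the same transform efficiently
```

The direct evaluation costs on the order of (MN)² operations, which is why an FFT is preferred in practice; it is shown here only to make the formula concrete.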

4.4 Binarization

For further enhancement, the 'ridge orient filter' described above is called. The image is then divided into blocks of size 16 × 16, and the mean intensity value is calculated for each block. Subsequently, each pixel is set to 0 if its intensity value is larger than the mean intensity value of its block [6]. The output of this function is the enhanced binary image. These two processes are shown in Fig. 3.
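The block-wise thresholding step can be sketched as follows. This is a minimal Python illustration, not the authors' implementation: the function name `binarize_blockwise` is hypothetical, a random array stands in for an enhanced fingerprint, and mapping the remaining pixels to 1 is an assumption (the text only states which pixels become 0).

```python
import numpy as np

def binarize_blockwise(image, block=16):
    """Set a pixel to 0 where it exceeds its block's mean intensity, else 1."""
    h, w = image.shape
    out = np.empty((h, w), dtype=np.uint8)
    for r in range(0, h, block):
        for c in range(0, w, block):
            tile = image[r:r + block, c:c + block]
            # Assumption: pixels at or below the block mean are mapped to 1.
            out[r:r + block, c:c + block] = np.where(tile > tile.mean(), 0, 1)
    return out

rng = np.random.default_rng(1)
img = rng.random((256, 256))     # stand-in for a 256 x 256 gray-scale image
binary = binarize_blockwise(img)
```

Using a per-block mean rather than one global threshold lets the decision adapt to local contrast, which varies across a fingerprint impression.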

Fig. 3. Flow diagram of proposed ﬁngerprint image enhancement procedure.

4.5 Feature Extraction

The size of the fingerprint images used in our experiment is 256 × 256 pixels. Using forty Gabor filters (2), the dimension of the feature vector is 256 × 256 × 40 = 2,621,440. As the adjacent pixels in an image are usually highly correlated, we can reduce this data redundancy by down-sampling the feature images resulting from the Gabor filters.

$$G(x, y) = \frac{f^2}{\pi \gamma \eta} \exp\left( -\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2} \right) \exp\left( j 2\pi f x' + \phi \right) \tag{2}$$

$$x' = x \cos\theta + y \sin\theta \tag{3}$$

$$y' = -x \sin\theta + y \cos\theta \tag{4}$$
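A bank of forty Gabor filters and the down-sampled feature vector might be sketched as below. Everything numeric here is an assumption made for illustration: the split into 5 scales × 8 orientations, the frequencies, the kernel size, σ, γ, η, and the down-sampling step of 4 are not given in the text, and the helper names `gabor_kernel` and `gabor_features` are hypothetical. Filtering is done by circular convolution through the FFT to keep the sketch short.

```python
import numpy as np

def gabor_kernel(f, theta, sigma=4.0, gamma=1.0, eta=1.0, phi=0.0, size=15):
    """Sample Eq. (2) on a size x size grid (all parameter values assumed)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)     # Eq. (3)
    yr = -x * np.sin(theta) + y * np.cos(theta)    # Eq. (4)
    envelope = np.exp(-(xr**2 + gamma**2 * yr**2) / (2 * sigma**2))
    carrier = np.exp(1j * 2 * np.pi * f * xr + phi)
    return (f**2 / (np.pi * gamma * eta)) * envelope * carrier

def gabor_features(image, bank, step=4):
    """Filter with every kernel, keep every step-th pixel, and concatenate."""
    F_img = np.fft.fft2(image)
    feats = []
    for kernel in bank:
        F_k = np.fft.fft2(kernel, s=image.shape)       # zero-pad to image size
        response = np.abs(np.fft.ifft2(F_img * F_k))   # magnitude response
        feats.append(response[::step, ::step].ravel())
    return np.concatenate(feats)

# Assumed layout: 5 scales x 8 orientations = 40 filters.
bank = [gabor_kernel(0.25 / np.sqrt(2) ** u, v * np.pi / 8)
        for u in range(5) for v in range(8)]

rng = np.random.default_rng(2)
image = rng.random((256, 256))            # stand-in for an enhanced fingerprint
features = gabor_features(image, bank)    # 64 * 64 * 40 values with step=4
```

With `step=1` the vector has the full 256 × 256 × 40 = 2,621,440 entries quoted in the text; a step of 4 in each direction reduces it by a factor of 16.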

Fig. 4. Flow diagram of feature extraction by using Gabor ﬁlter bank.

4.6 Feature Storage

After the features of the fingerprint image have been extracted, as shown in Fig. 4, the template must be stored in a database. Biometric data is very sensitive and needs to be kept as secure as possible. The storage options are shown in Fig. 5. Features can be stored in (1) a remote database or (2) a local database.

Fig. 5. Feature storage of enrolled image.

4.7 Matching Procedure

The matching procedure in our proposed biometric authentication is carried out by applying cosine similarity. Because the features of a fingerprint image are contained in a column vector, cosine similarity (5) is recommended.

$$\text{Similarity} = \cos(\theta) = \frac{A \cdot B}{\|A\| \, \|B\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \, \sqrt{\sum_{i=1}^{n} B_i^2}} \tag{5}$$

We choose cosine similarity because it measures the similarity between two non-zero vectors of an inner product space. Moreover, in a positive vector space the outcome is bounded in [0, 1].

4.8 Decision Making

In cosine similarity, '0' denotes no similarity between the two vectors considered and '1' denotes absolute similarity [8]. A threshold value is therefore required. Our proposed threshold value is 0.5001.
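The matching and decision steps can be sketched in a few lines of Python. The threshold 0.5001 is the value proposed in the text; the function names `cosine_similarity` and `is_match` and the toy vectors are hypothetical, and treating "over the threshold" as a strict comparison is an assumption.

```python
import numpy as np

THRESHOLD = 0.5001   # threshold value proposed in the text

def cosine_similarity(a, b):
    """Eq. (5): dot product divided by the product of the Euclidean norms."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_match(template, sample, threshold=THRESHOLD):
    """Positive decision when the similarity is over the threshold (assumed strict)."""
    return cosine_similarity(template, sample) > threshold

same_direction = cosine_similarity([1, 2, 3], [2, 4, 6])   # parallel vectors
orthogonal = cosine_similarity([1, 0], [0, 1])             # no shared component
```

Because the measure depends only on the angle between the vectors, scaling a feature vector by a positive constant does not change the decision.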

5 Performance Analysis

Some images after enhancement and binarization, together with their corresponding raw images, are shown in Fig. 6.

Fig. 6. FVC2002 database (DB2) fingerprint images (a) 101_1.tif and (b) 102_2.tif with the corresponding enhanced images.


Experimental analysis is carried out for both genuine and imposter fingerprints using the program developed in MATLAB on the FVC 2002 DB2 database. Tables 1 and 2 show the matching percentages between fingerprint image 101_1 and all images of fingers 104 and 103. A histogram is plotted in Fig. 7 to show the acceptance and rejection rates before and after image enhancement. The imposter fingerprint match is shown in Table 3(a)–(c) to illustrate the necessity of the enhancement operation. The false match rate is calculated both with and without image enhancement. The imposter match rate is almost 0 after enhancement.

Table 1. Matching finger image 101_1 with 104

Finger image                          Percent match (101_1)    Percent match (101_1)
                                      (with enhancement)       (without enhancement)
104_1                                 48%                      42%
104_2                                 44%                      58%
104_3                                 42%                      57%
104_4                                 25%                      61%
104_5                                 62%                      46%
104_6                                 45%                      53%
104_7                                 23%                      43%
104_8                                 22%                      58%
False match rate (Threshold = 51%)    12.5%                    62.5%

To evaluate the system performance, the genuine fingerprint match, the imposter fingerprint match, the False Rejection Rate (FRR) and the False Acceptance Rate (FAR) are calculated, as shown in Fig. 8.

A. Genuine Fingerprint Match

B. Imposter Fingerprint Match

The imposter fingerprint match is evaluated in Table 3(a)–(c).

FAR and FRR Curve

FRR = Total genuine rejections / Total genuine observations = 12/56 = 0.214 = 21%.
FAR = Total imposter acceptances / Total imposter observations = 2/28 = 0.071 = 7%.
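The two rates can be recomputed with a short Python sketch. The imposter scores are transcribed from Table 3(a)–(c); treating a score at or above the 51% threshold as an acceptance is an assumption (it is the rule that reproduces the 2 acceptances out of 28), and the 107_1 entry printed as "41" is read as 41%. The genuine counts 12 and 56 are taken directly from the text, since the individual genuine scores are not listed.

```python
THRESHOLD = 51   # percent, as used in Tables 1-3

# Imposter scores (percent match) transcribed from Table 3(a)-(c).
imposter_scores = [
    48, 52, 29, 30, 26, 41, 41, 44, 49, 12,   # (a) compared with 101_1
    47, 27, 42, 26, 21, 46, 44, 33,           # (b) compared with 107_1
    42, 46, 50, 41, 17, 39, 51, 40, 42, 44,   # (c) compared with 102_2
]

def false_acceptance_rate(scores, threshold):
    """Fraction of imposter comparisons accepted (assumed rule: score >= threshold)."""
    accepted = sum(1 for s in scores if s >= threshold)
    return accepted / len(scores)

far = false_acceptance_rate(imposter_scores, THRESHOLD)   # 2 of 28 comparisons
frr = 12 / 56   # genuine rejections / genuine observations, counts from the text
```

Raising the threshold lowers the FAR and raises the FRR, which is the trade-off plotted in Fig. 8.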

Table 2. Matching finger image 101_1 with 103

Finger image                          Percent match (101_1)    Percent match (101_1)
                                      (with enhancement)       (without enhancement)
103_1                                 40%                      47%
103_2                                 43%                      53%
103_3                                 31%                      45%
103_4                                 32%                      41%
103_5                                 41%                      54%
103_6                                 33%                      43%
103_7                                 37%                      49%
103_8                                 29%                      47%
False match rate (Threshold = 51%)    0%                       25%

Fig. 7. Histogram of matched and non-matched ﬁngerprint.

6 Discussion

In this work, we have presented a fingerprint enhancement and recognition system. The proposed system is designed to recognize low-quality images as well. Global features of the fingerprint image are evaluated in the recognition process. Features are extracted using a Gabor filter, and the cosine similarity formula measures the similarity between two feature vectors in the verification process. The minutiae matching algorithm gives better accuracy, but it fails when the image quality is poor. Our system tries to fill this gap by recognizing poor-quality images more efficiently.

Table 3. Imposter fingerprint match on different images

(a)
Finger image                        Percent match (101_1)
102_1                               48%
103_1                               52%
104_1                               29%
105_1                               30%
106_1                               26%
107_1                               41%
108_1                               41%
102_2                               44%
104_2                               49%
106_2                               12%
False match rate (Threshold 51%)    10%

(b)
Finger image                        Percent match (107_1)
108_1                               47%
108_2                               27%
108_3                               42%
108_4                               26%
108_5                               21%
108_6                               46%
108_7                               44%
108_8                               33%
False match rate (Threshold 51%)    0%

(c)
Finger image                        Percent match (102_2)
101_2                               42%
103_2                               46%
104_2                               50%
105_2                               41%
106_2                               17%
107_2                               39%
108_2                               51%
101_3                               40%
103_3                               42%
105_3                               44%
False match rate (Threshold 51%)    10%


Fig. 8. FRR and FAR for different threshold values.

7 Limitation and Future Works

The quality of the image selected for the enrollment process plays an important role in system performance. An image rich in information, such as fine details of ridges, the core point, and the delta point, gives better performance in the verification step. It is therefore very important for a fingerprint recognition system to estimate the quality of the enrolled image. This can be done by developing an automatic image quality assessment system that ensures the quality of the enrolled image. Another issue is the number of features extracted by the Gabor filters. It is very high, which increases the computational complexity and decreases the performance of the system. A dimensionality reduction function can be applied to reduce the size of the feature vector. We want to address these issues in the future development of this system.

References

1. Iqbal, A.A.: An overview of leading biometrics technologies used for human identity. In: Student Conference on Engineering Sciences and Technology, SCONEST 2005. IEEE (2005)
2. Haghighat, M., Zonouz, S., Abdel-Mottaleb, M.: CloudID: trustworthy cloud-based and cross-enterprise biometric identification. Expert Syst. Appl. 42(21), 7905–7916 (2015)
3. Tuyls, P., et al.: Practical biometric authentication with template protection. In: AVBPA, vol. 3546 (2005)
4. Erkin, Z., et al.: Privacy-preserving face recognition. In: International Symposium on Privacy Enhancing Technologies Symposium. Springer, Heidelberg (2009)
5. Osadchy, M., et al.: Scifi-a system for secure face identification. In: 2010 IEEE Symposium on Security and Privacy (SP). IEEE (2010)
6. Garg, B., et al.: Fingerprint recognition using Gabor filter. In: 2014 International Conference on Computing for Sustainable Global Development (INDIACom). IEEE (2014)
7. Maltoni, D., et al.: Handbook of Fingerprint Recognition. Springer, London (2009)
8. Charikar, M.S.: Similarity estimation techniques from rounding algorithms. In: Proceedings of the Thirty-Fourth Annual ACM Symposium on Theory of Computing. ACM (2002)

Gramatical Facial Expression Recognition with Artificial Intelligence Tools

Elena Acevedo(&), Antonio Acevedo, and Federico Felipe

Instituto Politécnico Nacional, Mexico City, Mexico
[email protected], {eacevedo,macevedo,ffelipe}@ipn.mx

Abstract. The face is the reflection of our emotions. We can guess a person's state of mind by observing the face. In this paper, we apply an Associative Model algorithm to recognize Grammatical Facial Expressions. We used the dataset of the Brazilian sign language (Libras) system. The model we applied was a Morphological Associative Memory. We implemented a memory for each expression. The average recognition rate for the same expression was 98.89%. When we compared one expression with the others, we obtained 98.59%, which means that our proposal confuses few expressions.

Keywords: Computational intelligence · Facial expression · Pattern recognition · Associative memories

1 Introduction

Humans communicate with peers using nonverbal facial expressions and language [1]. Crucially, some of these facial expressions have a grammatical function and are thus part of the grammar of the language. These facial expressions are grammatical markers and are sometimes called grammaticalized facial expressions. The face is an important tool for describing a person without verbal interaction. Faces tell us the identity of the person we are looking at and provide information on gender, attractiveness, and age, among many other attributes [2].

Sign languages (SL) are the principal form of communication for hearing-impaired people all over the world. In SL, grammatical information in a sign sentence is conveyed through facial expressions, and conveying it is considered a function of those expressions. When facial expressions communicate grammatical information in a sign sentence, they are referred to as grammatical facial expressions (GFE) [3]. Note that non-manual signs and grammatical processes are integral aspects of SL communication; for instance, a sentence might be ungrammatical in the absence of a non-manual sign. To take advantage of non-manual signs (e.g. facial expressions), several analyses of facial expressions have been conducted to automate the recognition of SL.

A first reason why machine learning and computer vision researchers are interested in creating computational models of the perception of facial expressions of emotion is to aid studies in the above sciences. Furthermore, computational models of facial expressions of emotion are important for the development of artificial intelligence and are essential in human-computer interaction (HCI) systems. In this work, we develop a system able to recognize grammatical facial expressions by applying Morphological Associative Memories, which belong to the family of Artificial Intelligence tools.

© Springer Nature Switzerland AG 2019 K. Arai et al. (Eds.): SAI 2018, AISC 858, pp. 592–605, 2019. https://doi.org/10.1007/978-3-030-01174-1_45

2 Related Work

In 1988 [4], Hidden Markov models were used to accomplish facial expression recognition. Three approaches were used to extract information: facial feature point tracking, PCA, and high gradient component detection, with recognition rates of 85%, 93% and 85%, respectively. In 2015, the JAFFE database was used to recognize 6 basic facial expressions plus one neutral expression. Six algorithms were applied to perform the task [5]: Histogram of Oriented Gradients (85.71%), Local Directional Pattern (78.06%), Local Gradient Code (LGC) (87.25%), LGC based on principle on horizontal diagonal (86.22%), LGC based on principle on vertical diagonal (86.73%), and Local Binary Pattern (86.73%). Tang and Chen [6] applied an algorithm based on the Curvelet transform with an improved Support Vector Machine based on Particle Swarm Optimization to recognize facial expressions. They also used the JAFFE database. The recognition rate was 94.94%. Chakrabartia and Dutta [7] obtained a recognition rate of 65.83% by using Eigenfaces. A 2-layer Ada-Random Forests framework based on an Ada-Boost feature selector and a Random Forests classifier [8] was presented for the recognition of grammatical facial expressions. It used the Brazilian sign language (Libras) system and obtained a recognition rate of 98.53%. After testing several algorithms, the authors of [9] found that Random Forest showed the best results in most cases for recognizing GFEs.

In Sect. 2, we present the basic concepts of Associative Memories and the algorithm to calculate a Morphological Associative Memory. The description of the algorithm for the recognition of facial expressions is presented in Sect. 3. We show the results in Sect. 4, and finally, Sect. 5 contains the conclusions.

3 Methods and Materials

3.1 Associative Memories

An Associative Memory (AM) is a system that works like human memory: it associates patterns in order to recall them later. For example, we associate faces with names, and when we meet a friend we can call him by his name because we recognize the face. We can say that we are eating a guava from the smell and the taste, because we have associated that smell and taste since the first time we had a guava. Likewise, a doctor can diagnose a knee fracture just by observing an X-ray, because he has associated the pattern of a fracture with that event. We can predict rain when we see a cloudy sky. Therefore, when we have associated stimuli with responses, we can recall the response when the stimulus is presented. That is what an associative memory does. In the learning phase, it associates input patterns (stimuli) with output patterns (responses), and then, in the recalling phase, it recalls the corresponding response when a specific stimulus is presented.

The input and output patterns for an AM can be images, strings, or numbers (real, integer or binary); they can be any kind of pattern that can be represented with numbers. These patterns are stored in vectors. The task of associating these vectors is called the Training Phase, and the Recognizing Phase allows the patterns to be recovered. The stimuli are the input patterns, represented by the set $x = \{x^1, x^2, x^3, \ldots, x^p\}$, where $p$ is the number of associated patterns. The responses are the output patterns, represented by $y = \{y^1, y^2, y^3, \ldots, y^p\}$. Each vector $x^\mu$ is written as $x^\mu = \{x^\mu_1, x^\mu_2, \ldots, x^\mu_n\}$, where $n$ is the dimension of $x^\mu$. The dimension of the vectors $y^\mu$ is $m$, so $y^\mu = \{y^\mu_1, y^\mu_2, \ldots, y^\mu_m\}$. The set of associations of input and output patterns is called the fundamental set or training set and is represented as follows: $\{(x^\mu, y^\mu) \mid \mu = 1, 2, \ldots, p\}$.

3.2 Morphological Associative Memories

The basic computations occurring in the proposed morphological network [10] are based on the algebraic lattice structure $(\mathbb{R}, \vee, \wedge, +)$, where the symbols $\vee$ and $\wedge$ denote the binary operations of maximum and minimum, respectively. Using this lattice structure, for an $m \times p$ matrix $A$ and a $p \times n$ matrix $B$ with entries from $\mathbb{R}$, the matrix product $C = A \nabla B$, also called the max product of $A$ and $B$, is defined by Eq. (1).

$$c_{ij} = \bigvee_{k=1}^{p} \left( a_{ik} + b_{kj} \right) = \left( a_{i1} + b_{1j} \right) \vee \ldots \vee \left( a_{ip} + b_{pj} \right) \tag{1}$$

The min product of $A$ and $B$ induced by the lattice structure is defined in a similar fashion. Specifically, the $i,j$th entry of $C = A \,\Delta\, B$ is given by (2).

$$c_{ij} = \bigwedge_{k=1}^{p} \left( a_{ik} + b_{kj} \right) = \left( a_{i1} + b_{1j} \right) \wedge \ldots \wedge \left( a_{ip} + b_{pj} \right) \tag{2}$$

Henceforth, let $(x^1, y^1), (x^2, y^2), \ldots, (x^p, y^p)$ be $p$ vector pairs with $x^k = (x^k_1, x^k_2, \ldots, x^k_n)^t \in \mathbb{R}^n$ and $y^k = (y^k_1, y^k_2, \ldots, y^k_m)^t \in \mathbb{R}^m$ for $k = 1, 2, \ldots, p$. For a given set of pattern associations $\{(x^k, y^k) \mid k = 1, 2, \ldots, p\}$ we define a pair of associated pattern matrices $(X, Y)$, where $X = (x^1, x^2, \ldots, x^p)$ and $Y = (y^1, y^2, \ldots, y^p)$. Thus, $X$ is of dimension $n \times p$ with $i,j$th entry $x^j_i$ and $Y$ is of dimension $m \times p$ with $i,j$th entry $y^j_i$. Since $y^k \nabla (-x^k)^t = y^k \,\Delta\, (-x^k)^t$, the notational burden is reduced by denoting these identical morphological outer vector products by $y^k \times (-x^k)^t$. With these definitions, we present the algorithms for the training and recalling phases.
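The max and min products of Eqs. (1) and (2) can be illustrated with a brief NumPy sketch. The function names `max_product` and `min_product` and the small integer matrices are hypothetical choices for illustration; the sketch also checks that the two products coincide for an outer product of a column vector with a row vector, as stated above.

```python
import numpy as np

def max_product(A, B):
    """Eq. (1): C[i, j] = max over k of (A[i, k] + B[k, j])."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    return np.max(A[:, :, None] + B[None, :, :], axis=1)

def min_product(A, B):
    """Eq. (2): C[i, j] = min over k of (A[i, k] + B[k, j])."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    return np.min(A[:, :, None] + B[None, :, :], axis=1)

A = np.array([[1, 2], [3, 0]])
B = np.array([[0, 4], [5, 1]])
C_max = max_product(A, B)   # [[7, 5], [5, 7]]
C_min = min_product(A, B)   # [[1, 3], [3, 1]]

# Outer product y x (-x)^t: the inner dimension is 1, so max and min agree.
y = np.array([[2.0], [0.0]])          # column vector (m x 1)
neg_x_t = np.array([[-1.0, -3.0]])    # row vector (1 x n)
outer_max = max_product(y, neg_x_t)
outer_min = min_product(y, neg_x_t)
```

The structure mirrors ordinary matrix multiplication, with addition taking the place of multiplication and max (or min) taking the place of summation.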


(1) Training Phase

1. For each of the $p$ associations $(x^\mu, y^\mu)$, the minimum product is used to build the matrix $y^\mu \,\Delta\, (-x^\mu)^t$ of dimensions $m \times n$, where the negated transposed input pattern $x^\mu$ is defined as $(-x^\mu)^t = (-x^\mu_1, -x^\mu_2, \ldots, -x^\mu_n)$.
2. The maximum and minimum operators ($\vee$ and $\wedge$) are applied to the $p$ matrices to obtain the memories $M$ and $W$, as Eqs. (3) and (4) show.

$$M = \bigvee_{k=1}^{p} \left[ y^k \times (-x^k)^t \right] \tag{3}$$

$$W = \bigwedge_{k=1}^{p} \left[ y^k \times (-x^k)^t \right] \tag{4}$$

(2) Recalling Phase

In this phase, the minimum and maximum products, $\Delta$ and $\nabla$, are applied between the memories $M$ or $W$ and an input pattern $x^\omega$, where $\omega \in \{1, 2, \ldots, p\}$, to obtain the column vector $y$ of dimension $m$, as (5) and (6) show:

$$y = M \,\Delta\, x^\omega \tag{5}$$

$$y = W \nabla x^\omega \tag{6}$$
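The training and recalling phases can be sketched as follows, under the assumption that the patterns are stored column-wise in NumPy arrays. The function names and the two small integer associations are hypothetical. Perfect recall is not guaranteed in general; what always holds by construction is the bound $W \nabla x^k \le y^k \le M \,\Delta\, x^k$, which is what the check at the end relies on.

```python
import numpy as np

def train(X, Y):
    """Build the max memory M (Eq. 3) and the min memory W (Eq. 4).

    X is n x p and Y is m x p, with one association per column.
    """
    # outer[k, i, j] = y_i^k - x_j^k, the outer product y^k x (-x^k)^t
    outer = Y.T[:, :, None] - X.T[:, None, :]
    return outer.max(axis=0), outer.min(axis=0)

def recall_with_M(M, x):
    """Eq. (5): min product of M with the input pattern x."""
    return np.min(M + x[None, :], axis=1)

def recall_with_W(W, x):
    """Eq. (6): max product of W with the input pattern x."""
    return np.max(W + x[None, :], axis=1)

X = np.array([[1, 0], [0, 3], [2, 1]], dtype=float)   # columns x^1, x^2
Y = np.array([[5, 0], [0, 5]], dtype=float)           # columns y^1, y^2
M, W = train(X, Y)

# Each stored output lies between the two recalled vectors.
bounds_hold = all(
    np.all(recall_with_W(W, X[:, k]) <= Y[:, k])
    and np.all(Y[:, k] <= recall_with_M(M, X[:, k]))
    for k in range(X.shape[1])
)
```

In this toy case the max memory recalls (5, 2) for $x^1$ instead of (5, 0), which shows why a further decision criterion on the recalled values is needed.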

4 Facial Expression Recognition Algorithm

4.1 Description of the Dataset

In this paper, we analyze the Grammatical Facial Expressions (GFE) that are used as grammatical markers in Brazilian Sign Language. There are nine types of grammatical markers in the Libras system:

• WH-question: generally used for questions with WHO, WHAT, WHEN, WHERE, HOW and WHY;
• Yes/no question: used when asking a question to which there is a "yes" or "no" answer;
• Doubt question: this is not a "true" question, since an answer is not expected; it is used to emphasize the information that will be supplied;
• Topic: one of the sentence's constituents is displaced to the beginning of the sentence;
• Negation: used in negative sentences;
• Assertion: used when making assertions;
• Conditional clause: used in a subordinate sentence to indicate a prerequisite to the main sentence;
• Focus: used to highlight new information in the speech pattern;
• Relative clause: used to provide more information about something.


Each expression is represented by a set of three-dimensional (X, Y, Z) data points, which were captured with a Microsoft Kinect sensor. The set has 100 points. Each point is described as follows: the X and Y coordinates are positions in pixels in an image captured by the Kinect RGB camera, and the Z coordinate is a depth measurement in millimeters captured by the Kinect infrared camera. The analyzed points from each part of the face are:

0–7 (X, Y, Z) - left eye
8–15 (X, Y, Z) - right eye
16–25 (X, Y, Z) - left eyebrow
26–35 (X, Y, Z) - right eyebrow
36–47 (X, Y, Z) - nose
48–67 (X, Y, Z) - mouth
68–86 (X, Y, Z) - face contour
87 (X, Y, Z) - left iris
88 (X, Y, Z) - right iris
89 (X, Y, Z) - peak of the nose
90–94 (X, Y, Z) - line over the left eyebrow
95–99 (X, Y, Z) - line over the right eyebrow

The analyzed points of the human face are illustrated in Fig. 1.
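The point layout above can be captured in a small lookup table, sketched here in Python. The dictionary name `FACE_REGIONS`, the helper functions, and the all-zero placeholder frame are hypothetical; in particular, the ordering of coordinates inside the flattened 300-element pattern (X, Y, Z of point 0, then point 1, and so on) is an assumption, since it depends on how the files are read.

```python
import numpy as np

# Inclusive index ranges of the 100 points, as listed in the text.
FACE_REGIONS = {
    "left eye": (0, 7),
    "right eye": (8, 15),
    "left eyebrow": (16, 25),
    "right eyebrow": (26, 35),
    "nose": (36, 47),
    "mouth": (48, 67),
    "face contour": (68, 86),
    "left iris": (87, 87),
    "right iris": (88, 88),
    "peak of the nose": (89, 89),
    "line over the left eyebrow": (90, 94),
    "line over the right eyebrow": (95, 99),
}

def region_points(frame, region):
    """Return the (X, Y, Z) rows of one facial region from a 100 x 3 frame."""
    first, last = FACE_REGIONS[region]
    return frame[first:last + 1]

def to_pattern(frame):
    """Flatten a 100 x 3 frame into one 300-element vector (assumed ordering)."""
    return np.asarray(frame, dtype=float).reshape(-1)

frame = np.zeros((100, 3))   # placeholder for one record of the dataset
pattern = to_pattern(frame)
covered = sorted(i for a, b in FACE_REGIONS.values() for i in range(a, b + 1))
```

The `covered` list confirms that the twelve ranges account for each of the 100 points exactly once.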

Fig. 1. The 100 analyzed points from the human face.

The complete set of data is organized in 18 ﬁles. Nine ﬁles are from user A (for training) and the other 9 ﬁles are from user B (for testing). Table 1 shows the number of records for each ﬁle. The name of the ﬁle corresponds to each GFE.

Table 1. Number of records for each file

GFE                              User A    User B
affirmative_datapoints.txt       414       528
conditional_datapoints.txt       548       589
doubt_question_datapoints.txt    491       780
emphasis_datapoints.txt          330       531
negative_datapoints.txt          528       712
relative_datapoints.txt          644       550
topics_datapoints.txt            360       467
wh_question_datapoints.txt       609       549
yn_question_datapoints.txt       532       715

4.2 Morphological Associative Memory Design

We built nine MAMs, one for each expression. The number of input and output patterns is equal to the number of vectors in each file; for example, the affirmative file has 414 records, so the number of pairs of vectors is 414. The dimension of the input patterns is 300 because we have 100 points with three dimensions. The set of output vectors $(y)$ forms a square matrix whose diagonal has the value 2000; it is used to build the max memory. Another square matrix, whose diagonal has the value −2000, is used to build the min memory; this memory is built with the output patterns $(\hat{y})$. The remaining values in both matrices are equal to zero. The values 2000 and −2000 were chosen because the greatest number in the dataset is 1563: for the diagonal we use a number greater than the greatest number in the complete dataset to assure the best recognition. We usually build the output patterns in the same way; however, the recalled vectors always differ, so they have to be analyzed to find the particular criteria that give the best results.

We present an example to illustrate how the MAMs work. The values of the elements of the input patterns are taken from the original dataset. We use vectors of lower dimension for convenience. The three pairs of patterns used to build a max MAM are:

$$x^1 = \begin{pmatrix} 289.846 \\ 203.407 \\ 1216 \end{pmatrix} \rightarrow y^1 = \begin{pmatrix} 2000 \\ 0 \\ 0 \end{pmatrix}, \quad x^2 = \begin{pmatrix} 289.637 \\ 200.528 \\ 1216 \end{pmatrix} \rightarrow y^2 = \begin{pmatrix} 0 \\ 2000 \\ 0 \end{pmatrix}, \quad x^3 = \begin{pmatrix} 290.796 \\ 200.383 \\ 1208 \end{pmatrix} \rightarrow y^3 = \begin{pmatrix} 0 \\ 0 \\ 2000 \end{pmatrix}$$

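Now, we calculate the first association.

$$y^1 \times (-x^1)^t = \begin{pmatrix} 2000 \\ 0 \\ 0 \end{pmatrix} \times \begin{pmatrix} -289.846 & -203.407 & -1216 \end{pmatrix}$$

$$y^1 \times (-x^1)^t = \begin{pmatrix} 2000 - 289.846 & 2000 - 203.407 & 2000 - 1216 \\ 0 - 289.846 & 0 - 203.407 & 0 - 1216 \\ 0 - 289.846 & 0 - 203.407 & 0 - 1216 \end{pmatrix}$$

$$y^1 \times (-x^1)^t = \begin{pmatrix} 1710.154 & 1796.593 & 784 \\ -289.846 & -203.407 & -1216 \\ -289.846 & -203.407 & -1216 \end{pmatrix}$$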

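We apply the same process to the remaining pairs of patterns, and the results are:

$$y^2 \times (-x^2)^t = \begin{pmatrix} -289.637 & -200.528 & -1216 \\ 1710.363 & 1799.472 & 784 \\ -289.637 & -200.528 & -1216 \end{pmatrix}$$

$$y^3 \times (-x^3)^t = \begin{pmatrix} -290.796 & -200.383 & -1208 \\ -290.796 & -200.383 & -1208 \\ 1709.204 & 1799.617 & 792 \end{pmatrix}$$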

With these three associations, we calculate the max MAM.

$$M = \begin{pmatrix} 1710.154 & 1796.593 & 784 \\ -289.846 & -203.407 & -1216 \\ -289.846 & -203.407 & -1216 \end{pmatrix} \vee \begin{pmatrix} -289.637 & -200.528 & -1216 \\ 1710.363 & 1799.472 & 784 \\ -289.637 & -200.528 & -1216 \end{pmatrix} \vee \begin{pmatrix} -290.796 & -200.383 & -1208 \\ -290.796 & -200.383 & -1208 \\ 1709.204 & 1799.617 & 792 \end{pmatrix}$$

$$M = \begin{pmatrix} 1710.154 & 1796.593 & 784 \\ 1710.363 & 1799.472 & 784 \\ 1709.204 & 1799.617 & 792 \end{pmatrix}$$

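To calculate the min memory, we use the following three pairs of patterns.

\[
x^1 = \begin{pmatrix} 289.846 \\ 203.407 \\ 1216 \end{pmatrix} \to \hat{y}^1 = \begin{pmatrix} -2000 \\ 0 \\ 0 \end{pmatrix}, \quad
x^2 = \begin{pmatrix} 289.637 \\ 200.528 \\ 1216 \end{pmatrix} \to \hat{y}^2 = \begin{pmatrix} 0 \\ -2000 \\ 0 \end{pmatrix}, \quad
x^3 = \begin{pmatrix} 290.796 \\ 200.383 \\ 1208 \end{pmatrix} \to \hat{y}^3 = \begin{pmatrix} 0 \\ 0 \\ -2000 \end{pmatrix}
\]

We perform the same calculations, now taking the entry-wise minimum of the three associations, and the resulting min memory is

\[
W = \begin{pmatrix} -2289.846 & -2203.407 & -3216 \\ -2289.637 & -2200.528 & -3216 \\ -2290.796 & -2200.383 & -3208 \end{pmatrix}
\]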
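The construction of both memories in the example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `association` and `combine` are hypothetical and do not come from the paper, and the three-dimensional patterns are the reduced ones used in the example, not the 300-dimensional vectors of the dataset.

```python
# Sketch of building the max and min morphological memories of the example.
# Function names are illustrative assumptions, not taken from the paper.

def association(y, x):
    """Return the matrix with entries y[i] - x[j]."""
    return [[yi - xj for xj in x] for yi in y]

def combine(matrices, pick):
    """Entry-wise max (pick=max) or min (pick=min) of equally sized matrices."""
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [[pick(m[i][j] for m in matrices) for j in range(cols)]
            for i in range(rows)]

# Input patterns of the example.
xs = [[289.846, 203.407, 1216.0],
      [289.637, 200.528, 1216.0],
      [290.796, 200.383, 1208.0]]

# Output patterns: columns of a diagonal matrix with 2000 (max) or -2000 (min).
ys_max = [[2000.0 if i == k else 0.0 for i in range(3)] for k in range(3)]
ys_min = [[-2000.0 if i == k else 0.0 for i in range(3)] for k in range(3)]

M = combine([association(y, x) for y, x in zip(ys_max, xs)], max)
W = combine([association(y, x) for y, x in zip(ys_min, xs)], min)

for row in M:
    print([round(v, 3) for v in row])
for row in W:
    print([round(v, 3) for v in row])
```

The printed rows should agree with the matrices M and W given above, up to floating-point rounding.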

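Now, we present the first input pattern to the max and min MAMs. For the max memory, each output element is the minimum of the sums of a row of $M$ with the input pattern:

\[
M \,\Delta\, x^1 = \begin{pmatrix} 1710.154 & 1796.593 & 784 \\ 1710.363 & 1799.472 & 784 \\ 1709.204 & 1799.617 & 792 \end{pmatrix} \Delta \begin{pmatrix} 289.846 \\ 203.407 \\ 1216 \end{pmatrix}
\]

\[
M \,\Delta\, x^1 = \begin{pmatrix} (1710.154 + 289.846) \wedge (1796.593 + 203.407) \wedge (784 + 1216) \\ (1710.363 + 289.846) \wedge (1799.472 + 203.407) \wedge (784 + 1216) \\ (1709.204 + 289.846) \wedge (1799.617 + 203.407) \wedge (792 + 1216) \end{pmatrix}
\]

\[
M \,\Delta\, x^1 = \begin{pmatrix} 2000 \wedge 2000 \wedge 2000 \\ 2000.209 \wedge 2002.879 \wedge 2000 \\ 1999.05 \wedge 2003.024 \wedge 2008 \end{pmatrix} = \begin{pmatrix} 2000 \\ 2000 \\ 1999.05 \end{pmatrix}
\]

Then, we replace the elements with value 2000 with a 1 and the remaining values with zeros.

\[
M \,\Delta\, x^1 = \begin{pmatrix} 2000 \\ 2000 \\ 1999.05 \end{pmatrix} \to \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}
\]

It can be observed that the corresponding output pattern is not recalled: the result indicates that the input pattern $x^1$ is also similar to the second input pattern. Therefore, we present the first input pattern to the min memory to see whether $y^1$ can be recovered.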


\[
W \,\nabla\, x^1 = \begin{pmatrix} -2000 \vee -2000 \vee -2000 \\ -1999.791 \vee -1997.121 \vee -2000 \\ -2000.95 \vee -1996.976 \vee -1992 \end{pmatrix} = \begin{pmatrix} -2000 \\ -1997.121 \\ -1992 \end{pmatrix}
\]

In the same way, we place a 1 on the elements whose value is −2000 and a zero on the remaining elements.

\[
W \,\nabla\, x^1 = \begin{pmatrix} -2000 \\ -1997.121 \\ -1992 \end{pmatrix} \to \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}
\]

We apply the logical AND operation between both results,

\[
\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} \text{ AND } \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}
\]

The number 1 in the first element means that the first output pattern was recalled.

From the experiments, we observed that the min memory performed better than the max memory; therefore, we decided to apply only the min MAM. We trained the min memory with the records from the file a_affirmative_datapoints.txt. When we presented the testing data from the file b_affirmative_datapoints.txt to the memory, we replaced with a one those elements of the resulting vector whose value was in the range of −2000 to −1900. With this criterion, we obtained a vector in which all the elements had a value of one, which meant that the testing data corresponded to the same expression. Then we presented the testing data from the file b_conditional_datapoints.txt and applied the same criterion; we obtained a vector whose elements were all equal to zero, which indicated that the expressions were not the same.
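The recall steps of the numerical example can be sketched as follows. This is a minimal illustration, not the authors' C# implementation: the function names and the small tolerance used to compare floating-point values with ±2000 are assumptions introduced here.

```python
# Sketch of the recall step for the numerical example (names are illustrative).

M = [[1710.154, 1796.593, 784.0],
     [1710.363, 1799.472, 784.0],
     [1709.204, 1799.617, 792.0]]
W = [[-2289.846, -2203.407, -3216.0],
     [-2289.637, -2200.528, -3216.0],
     [-2290.796, -2200.383, -3208.0]]
x1 = [289.846, 203.407, 1216.0]

def recall_max(memory, x):
    """Max memory recall: minimum over j of memory[i][j] + x[j]."""
    return [min(m + xj for m, xj in zip(row, x)) for row in memory]

def recall_min(memory, x):
    """Min memory recall: maximum over j of memory[i][j] + x[j]."""
    return [max(m + xj for m, xj in zip(row, x)) for row in memory]

def binarize(values, target, tol=1e-6):
    """1 where a value equals the target within tol (assumed tolerance), else 0."""
    return [1 if abs(v - target) <= tol else 0 for v in values]

from_max = binarize(recall_max(M, x1), 2000.0)
from_min = binarize(recall_min(W, x1), -2000.0)

# Logical AND of both binary vectors selects the recalled output pattern.
recalled = [a & b for a, b in zip(from_max, from_min)]
print(from_max, from_min, recalled)
```

For the full data files, the exact-match test in `binarize` would be replaced by the interval criterion described above (values between −2000 and −1900 are mapped to one).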

5 Results

We implemented the software in C# with Visual Studio 2013 on a laptop with an Intel Core i7 processor. Figure 2 shows the main window of the interface. The first step is to read, from a text file, the points of one of the GFEs for both user A and user B. We indicate the GFE and the number of input vectors for the two users, and the matrix of output vectors is calculated. Then we generate the max and min Morphological Associative Memories. There are two options: to recognize expressions from the same user or from a different user. Finally, the system indicates the number of mistakes and the percentage of recognition.


Fig. 2. The main window of the interface of the software.

In the window, we can observe that only a button for min memory recall is included for user B, because the experiments showed that the error was smaller when we applied only the min memory. When comparing expressions from the same user (user A), we observed that the process illustrated with the numerical example was the best for obtaining zero recognition error.

Fig. 3. Percentage of the error recognition when we select emphasis expression from both users.

Figure 3 shows an example of recognition when we select the emphasis expression for both user A and user B. The message box at the bottom of the window shows that the error is zero, which means that the system identified both expressions as the same. Figure 4 shows an example in which we selected different expressions for the two users. The message box at the bottom of the window indicates that the percentage of error in the recognition was 5.45%. If the recognition error is greater than zero, we conclude that the expressions are different.

Fig. 4. Percentage of the error recognition when we select conditional expression from user A and relative expression from user B.

In Table 2, we present the percentage of recognition of GFEs. The Morphological Associative Memory was trained with the data of user A and tested with the data of user B. From Table 2, we can observe that the recognition is equal to or greater than 95% and that, in almost all cases, the memory showed correct recall. We must highlight that the percentage of recognition of the doubt GFE was initially 91.97%, which means that the memory committed 385 mistakes. We then analyzed the results from that memory and observed that we could accept a range of values of the output patterns: instead of accepting only the exact value −2000, we obtained better results when we counted as a success any value within −2000 ± 0.23. With this process we reduced the number of mistakes to 39, achieving 95% recognition.

Table 2. Percentage of recognition when the memory is trained with a GFE from user A and tested with a GFE of user B

GFE            % of recognition
affirmative    100
conditional    99.15
doubt          95
emphasis       100
negative       100
relative       100
topics         100
wh_question    100
yn_question    95.94
Average        98.89


The next experiment compared each expression from user A with the nine expressions of user B. The results are shown in Table 3 and can be interpreted as follows: in the first row and first column, the value of zero indicates that the system made no mistakes when identifying the affirmative expression from both users.

Table 3. Percentage of error when we compare one expression from user A with the nine expressions of user B. Columns correspond to the GFE of user A and rows to the GFE of user B; the entries are numbers of mistakes.

GFE user B         aff     con     dou     emp     neg     rel     top     wh      yn
affirmative        0       0       7       0       0       0       0       5       2
conditional        5       5       4       0       0       2       2       36      36
doubt              6       8       39      2       3       15      5       19      22
emphasis           0       0       5       0       0       0       0       14      14
negative           0       0       9       0       0       0       0       23      19
relative           19      30      14      4       14      0       11      50      50
topics             0       0       1       0       0       0       0       1       1
wh_question        0       0       1       0       0       0       0       0       0
yn_question        5       44      9       4       1       0       9       136     29
Error (%)          0.64    1.60    1.64    0.18    0.33    0.31    0.49    5.23    3.19
Recognition (%)    99.36   98.4    99.35   99.82   99.67   99.69   99.51   94.7    96.81

In the second row and first column, there is a value of 5, which is the number of mistakes the system committed when we compared the affirmative expression from user A with the conditional expression from user B. This means that the system confused both expressions. The percentage of error is calculated with (7),

\[
\%\,\mathrm{Error} = \frac{\text{Total error}}{5421} \times 100 \tag{7}
\]

where total error is the sum of the errors from each expression and 5421 is the total number of records in the nine expression files of user B. The average recognition of GFEs is 98.59%. The best and the worst percentages of recognition in Table 3 were 99.82% (emphasis column) and 94.7% (wh_question column), respectively. The identification of GFEs is never lower than 94%. The memory committed just one mistake with the wh_question expression of user B, while with the yn_question expression the MAM made more mistakes.

Now, we compare our results with other algorithms in Table 4. From Table 4, we can observe that Random Forest shows the worst results. Ada-Random Forests (ARF) shows better results in some cases; for example, for the doubt expression ARF achieved 98.33% recognition while our proposal achieved 95%. Similarly, for the yn_question expression, ARF achieved 4.06% more recognition than the Morphological Associative Memory. In the case of the negative expression, both algorithms showed the same recognition, and in five cases our proposal was better. In general, the MAM algorithm showed the best results, with 98.89% recognition compared with 98.53% for ARF and 93% for Random Forest.

Table 4. Comparison of the results of our proposal with two state-of-the-art algorithms that use the Brazilian sign language (Libras) system

GFE            % of rec. our proposal    % of rec. 2-layer Ada-Random forests    % of rec. random forest
affirmative    100                       100                                     89
conditional    99.15                     98.66                                   95
doubt          95                        98.33                                   90
emphasis       100                       98.67                                   98
negative       100                       100                                     91
relative       100                       99.18                                   95
topics         100                       97.70                                   90
wh_question    100                       98.53                                   95
yn_question    95.94                     100                                     94
Average        98.89                     98.53                                   93
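As a check on Eq. (7), the error row of Table 3 can be recomputed from the mistake counts. The sketch below is only illustrative: the variable and function names are assumptions, and the printed percentages are rounded to two decimals, so the last digit may differ slightly from the values printed in Table 3.

```python
# Mistake counts of Table 3: one row per GFE of user B,
# one column per GFE of user A.
COLUMNS = ["aff", "con", "dou", "emp", "neg", "rel", "top", "wh", "yn"]
MISTAKES = [
    [0, 0, 7, 0, 0, 0, 0, 5, 2],          # affirmative
    [5, 5, 4, 0, 0, 2, 2, 36, 36],        # conditional
    [6, 8, 39, 2, 3, 15, 5, 19, 22],      # doubt
    [0, 0, 5, 0, 0, 0, 0, 14, 14],        # emphasis
    [0, 0, 9, 0, 0, 0, 0, 23, 19],        # negative
    [19, 30, 14, 4, 14, 0, 11, 50, 50],   # relative
    [0, 0, 1, 0, 0, 0, 0, 1, 1],          # topics
    [0, 0, 1, 0, 0, 0, 0, 0, 0],          # wh_question
    [5, 44, 9, 4, 1, 0, 9, 136, 29],      # yn_question
]
TOTAL_RECORDS = 5421  # records in the nine files of user B

def percent_error(total_error, records=TOTAL_RECORDS):
    """Eq. (7): total error divided by the number of records, times 100."""
    return total_error / records * 100

# Sum each column to obtain the total error for one expression of user A.
column_totals = [sum(row[j] for row in MISTAKES) for j in range(len(COLUMNS))]

for name, total in zip(COLUMNS, column_totals):
    print(f"{name}: {total} mistakes, {percent_error(total):.2f}% error")
print("all mistakes:", sum(column_totals))
```

The grand total printed on the last line corresponds to the 740 mistakes quoted in the Conclusions.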

6 Conclusions

Morphological Associative Memories are a suitable Artificial Intelligence tool for recognizing Grammatical Facial Expressions. We implemented one MAM for each expression. When we trained the MAMs with the data of user A and tested them with the data of user B, we obtained 98.89% recognition; the memory committed just 73 mistakes. When we compared each expression from user A with the expressions of user B, we obtained 98.59% recognition, which means that our system committed 740 mistakes out of 5421 records. We showed that our proposal had better results than two other algorithms, Ada-Random Forests and Random Forest. The difference between the MAM and these algorithms was 0.36% and 5.89%, respectively.

References

1. Benitez-Quiroz, F., Wilbur, R., Martinez, A.: The not face: a grammaticalization of facial expressions of emotion. Cognition 150, 77–84 (2016)
2. Martinez, A., Du, S.: Model of the perception of facial expressions of emotion by humans: research overview and perspectives. J. Mach. Learn. Res. 13, 1589–1608 (2012)
3. Freitas, F., Marques, S., Aparecido de Moraes, C., Venancio, F.: Grammatical facial expressions recognition with machine learning. In: Proceedings of the Twenty-Seventh International Florida Artificial Intelligence Research Society Conference, pp. 180–185 (2014)
4. Lien, J.J., Kanade, T., Cohn, J.F., Li, C.C.: Automated facial expression recognition based on FACS action units. In: Proceedings of the Third IEEE International Conference on Automatic Face and Gesture Recognition (1998)
5. Kumaria, J., Rajesha, R., Pooja, K.M.: Facial expression recognition: a survey. Procedia Comput. Sci. 58, 486–491 (2015). Second International Symposium on Computer Vision and the Internet
6. Tang, M., Chen, F.: Facial expression recognition and its application based on curvelet transform and PSO-SVM. Optik 124, 5401–5406 (2013)
7. Chakrabartia, D., Dutta, D.: Facial expression recognition using eigenspaces. Procedia Technol. 10, 755–761 (2013). International Conference on Computational Intelligence: Modeling Techniques and Applications 2013
8. Taufeeq, M.: An Ada-Random forests based grammatical facial expressions recognition approach. In: International Conference on Informatics, Electronics & Vision (ICIEV) (2015)
9. Bhuvan, M.S., Vinay, D., Siddharth, J., Ashwin, T.S., Ram, M., Reddy, G., Sutej, P.K.: Detection and analysis model for grammatical facial expressions in sign language. In: IEEE Region 10 Symposium (TENSYMP), Bali, Indonesia (2016)
10. Ritter, G.X., Sussner, P., Diaz de León, J.L.: Morphological associative memories. IEEE Trans. Neural Netw. 9(2), 281–293 (1998)

Mathematical Modeling of Real Time ECG Waveform

Shazia Javed1(B) and Noor Atinah Ahmad2

1 Department of Mathematics, Lahore College for Women University, Lahore 54000, Pakistan
[email protected]
2 School of Mathematical Sciences, Universiti Sains Malaysia, 11800 Penang, Malaysia
[email protected]

Abstract. The electrocardiogram (ECG) is a digital recording of heart rate variability that is used to detect cardiac disorders. These recordings are often affected by physiological and instrumental noise, which hinders an accurate diagnosis of the disease. An exact understanding of the ECG waveform may help in overcoming such issues. Mathematical modeling is used here to understand the pattern of the 12-lead ECG and to simulate real time ECG waveforms. A real ECG can be taken as a superposition of bounded functions, and this property is a defining feature of almost periodic functions (APF). The proposed model uses this characteristic of ECG signals to generate the real time ECG waveform with negligibly small error.

Keywords: 12-lead ECG · Almost periodic function · Electrocardiogram

© Springer Nature Switzerland AG 2019. K. Arai et al. (Eds.): SAI 2018, AISC 858, pp. 606–614, 2019. https://doi.org/10.1007/978-3-030-01174-1_46

1 Introduction

The electrocardiogram (ECG) waveform helps in the diagnosis and treatment of heart diseases. Despite the importance of ECG signals, the clinical reliability of some ECG waveforms is questionable and can lead to a wrong diagnosis of cardiac disease, which may result in vital damage. An artificial simulation of the ECG using mathematical modeling techniques can help in understanding the behavior of the ECG signal and can therefore be helpful in improving signal quality. A closer look at the ECG waveform reveals that it consists of different events and intervals. The event corresponding to the P-wave is associated with atrial depolarization, while the events corresponding to the T-wave and U-wave are associated with ventricular repolarization. The QRS complex has the largest amplitude and shows the depolarization of the interventricular septum, the right ventricle and the left ventricle. These are the main events of the ECG; however, the positions of the leads and the power conduction may define more intervals. Sinus rhythms can be classified by the heart rate of the patient and the amplitude of the waves. Other parameters for automated detection of the waveform include the duration of the event and its location.

The ECG patterns of cardiac activity are recorded with 12 leads by placing electrodes on various limbs of the human body as well as on the chest [1]. Sometimes clinical recordings are disturbed by power line fluctuations or by instrumental or physiological interference, and the ECG waveform fails to provide an exact diagnosis of the disease. This may lead to vital damage, and therefore recording an accurate ECG is one of the biggest challenges of cardiology research. A simulation of the real ECG waveform can be of great help in understanding the characteristics and pattern of the ECG and hence in improving the quality of diagnosis.

Mathematical modeling is one of the most efficient techniques of signal simulation and has been applied successfully to ECG simulation. Researchers have used various techniques to simulate the ECG waveform. Notable work includes the Fourier analysis based quasi-periodic simulator proposed by Al-Nashash [2] in 1995 and the periodic simulator presented by Karthik [3] in 2003. Another important contribution is the synthetic ECG model of McSharry [4], which employs iterative nonlinear optimization algorithms to fit a real ECG. Borsali [5] used curve fitting techniques for matrix alignment of the beats for ECG compression, while Nunes and Nait-Ali [6] used Hilbert transform methods. Afterwards, Ghaffari [7] developed an algebraic model to generate the events of the ECG artificially. The ECG simulator proposed in 2014 [8] was designed for an adaptive noise cancellation model and is based on almost periodic functions. Interference from noise signals is evident in a real ECG and cannot be ignored while modeling a real ECG waveform. Recently, Castaño [9] presented autoregressive models of ECG contaminated with motion artifacts.
For an appropriate heartbeat frequency, the mathematical model of [8] can simulate normal as well as abnormal ECG waveforms of the 12 leads and of sinus rhythms, provided that the model parameters of the normal ECG events P, Q, R, S, T and U are suitably chosen. In this paper, this model is improved to simulate real time clinical ECG signals available in the MIT-BIH database of PhysioNet [10,11]. The ECG signals of this database are contaminated by noise of various types; for example, the major noises in the ECG 101 signals are baseline wander and muscle artifacts. The modeled waveform can help in recognizing the exact ECG waveform, which may improve the diagnosis of the disease. These real signals are simulated in Matlab by modeling the artifacts and adding them to the ECG model of [8] to make it comparable with the real ECG waveform. The improved model can produce a synthetic waveform of a real ECG with negligibly small error. The rest of the paper includes a description of the mathematical model for ECG simulation in Sect. 2, the generation of the 12-lead ECG in Sect. 2.2 and the modeling of different sinus rhythms in Sect. 2.3. The real ECG waveform is evaluated in Sect. 3, and concluding remarks are given in Sect. 4.

2 Mathematical Model for Artificial Simulation of ECG Waveform

The formulation of the mathematical model of [8] depends on the number of heartbeats per minute. The overall waveform is generated artificially as a superposition of the waveforms corresponding to each of the events, based on the fact that the amplitude of an event does not affect the amplitude of the neighboring event unless the incident times of consecutive events are almost the same.

2.1 Mathematical Model

Let $N$ denote the number of heartbeats per minute, and let $J$ be the set of all the events of the ECG waveform, corresponding to the P-wave, Q-wave, R-wave, S-wave, T-wave and U-wave. At time $n$, if $\Theta_j(n)$ is a function of time period $\tau(n) = \frac{60}{N}$ with the property $\Theta_j(n) \approx \Theta_j(n + \tau(n))$, then for an event $j \in J$ that is incident at $t_j$, [8] yields

\[
\Theta_j(n) = A_j + \sum_{n=1}^{N} \alpha_j(n) \cos\left\{ 2\pi n \int_{t_j}^{n} f(n)\,dn \right\} \tag{1}
\]

where $f(n) = 1/\tau(n)$ is the fundamental frequency of the heartbeat, whereas $A_j$ and $\alpha_j(n)$ represent the Fourier coefficients. Simplifying (1) and then substituting $\omega = 2\pi f(n) = \frac{2\pi}{\tau(n)}$,

\[
\Theta_j(n) = A_j + \sum_{n=1}^{N} \alpha_j(n) \cos\{\omega (n - t_j) n\} \tag{2}
\]

The occurrence of each event $j$ can be taken as cyclic with period $\tau_j = \frac{\tau(n)}{d_j}$ and frequency $f_j = \frac{1}{\tau_j}$. For a given amplitude $m_j$, if the duration $d_j$ and the location $t_j$ are known for each $j \in J$, then the coefficients $A_j$ and $\alpha_j(n)$ are given by

\[
A_j = \begin{cases} \dfrac{2 m_j}{\pi \tau_j}, & j \in \{P, T, U\}; \\[2mm] \dfrac{m_j}{2 \tau_j}, & j \in \{R\}; \\[2mm] \dfrac{-m_j}{2 \tau_j}, & j \in \{Q, S\}; \end{cases} \tag{3}
\]

and

\[
\alpha_j(n) = \begin{cases} \dfrac{4 f m_j d_j}{\pi \{1 - (2 n f d_j)^2\}} \cos\{\omega_j n\}, & j \in \{P, T, U\}; \\[2mm] \dfrac{2 m_j}{f d_j (n\pi)^2} \left(1 - \cos\{\omega_j n\}\right), & j \in \{R\}; \\[2mm] \dfrac{-2 m_j}{f d_j (n\pi)^2} \left(1 - \cos\{\omega_j n\}\right), & j \in \{Q, S\}; \end{cases} \tag{4}
\]


where $\omega_j = \pi f_j = \frac{\pi}{\tau_j}$. The superposition of the events $\Theta_j$ yields an ECG waveform of a single lead for appropriate values of the model parameters $(m_j, d_j, t_j)$. For known $N$, the ECG waveform is approximated as

\[
ECG(n) = \sum_{j \in J} \Theta_j(n) = A_o + \sum_{j \in J} \sum_{n=1}^{N} \alpha_j(n) \cos\{\omega (n - t_j) n\} \tag{5}
\]

where a nonzero value of $A_o = \sum_{j \in J} A_j$ represents the offset of the ECG waveform from the line of zero voltage. Since the occurrence of an event is independent of $N$, (5) can be rewritten as

\[
ECG(n) = A_o + \sum_{n=1}^{N} \sum_{j \in J} \alpha_j(n) \cos\{\omega (n - t_j) n\} \tag{6}
\]

For a given set of model parameters, (6) can generate an ECG waveform artificially. In the following sections, this mathematical model is used to generate ECG waveforms of different leads and sinus rhythms. However, a real ECG may contain several artifacts, such as baseline wander (BW), power line interference (PLI) and EMG noise. Therefore, in order to model the waveform of a real ECG signal, these artifacts should also be modeled artificially.

2.2 Generating ECG Waveforms for 12-leads

ECG signals may be recorded either as a standard 12-lead ECG waveform that looks at different parts of the heart, or as individual rhythm strips that look at heart rate variability (HRV). In this section, model (6) is used to simulate the online available 12-lead ECG waveforms of Fig. 1. Setting the number of heartbeats per minute to N = 72 and choosing appropriate values of the model parameters, Fig. 2 shows a short segment of the recording of each of the 12 leads. These segments closely resemble the online available derivations of Fig. 1.

Fig. 1. Online available 12-lead ECG waveform (http://en.wikipedia.org/wiki/File: ECG 12derivations.png).

Fig. 2. 12-lead synthetic ECG waveform. (Panels: leads I, II, III, aVR, aVL, aVF and V1 to V6, each with a horizontal axis from 0 to 1 and a vertical axis from −2 to 2.)

Fig. 3. Heart rate variability (HRV) components: (a) Normal sinus arrhythmia, with 72 beats per minute, (b) Sinus bradycardia, with 25 beats per minute, (c) Sinus Tachycardia, with 120 beats per minute, and (d) Supraventricular Tachycardia, with 170 beats per minute. (Axes: time in seconds, from 0 to 5, versus amplitude in mV.)

2.3 Modeling ECG Waveforms for Different Sinus Rhythms

An important feature of the mathematical model (5) is that the frequencies of all the events are controlled by the heartbeat frequency f. This is clearly visible in the ECG waveforms of different rhythm strips, which determine heart rate variability (HRV). HRV is a measure of alterations in heart rate and is composed of two major components: high frequency respiratory sinus arrhythmia (RSA) and low frequency sympathetic components. Some important HRV components are shown in Fig. 3 according to the following categories:

(a) Normal Sinus Arrhythmia: The frequency of a normal ECG is 60 to 100 beats per minute, i.e., 60 ≤ N ≤ 100, see Fig. 3(a).
(b) Sinus Bradycardia: A sinus rhythm of less than 60 beats per minute, as shown in Fig. 3(b).
(c) Sinus Tachycardia: The sinus node sends out electrical signals faster than usual, speeding up the rate and generating a sinus rhythm of more than 100 beats per minute, Fig. 3(c).
(d) Supraventricular Tachycardia: Usually caused by re-entry of currents within the atria or between the ventricles and atria, producing higher heart rates of 150 to 250 beats per minute. A sinus rhythm at 170 BPM is shown in Fig. 3(d).

Out of all the 12 leads, clinical ECG machines mostly record Leads I, II and III only, which causes missing information and hence wrong diagnosis. However, one lead is enough to understand the heart's rhythm, and Lead II is the obvious choice for this. Figure 3 shows the simulated waveforms of the four types of sinus rhythm described above. Starting with event P, the signals are sampled with a resolution of 10^−4 s for 5 s, with time n = 0.001 : 0.0001 : 5. The values of the model parameters are taken from Table 1 of [8].
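The rate categories listed above and the sampling grid can be illustrated with a short Python sketch. It is not part of the authors' Matlab code: the function names are hypothetical, the boundaries follow the ranges quoted in the text, and assigning the 150 to 250 range to supraventricular tachycardia (rather than to sinus tachycardia, which it overlaps) is an assumption made here for illustration.

```python
def beat_period(bpm):
    """Period of one heartbeat in seconds, tau = 60 / N."""
    return 60.0 / bpm

def classify_rhythm(bpm):
    """Map a heart rate N (beats per minute) to the categories (a)-(d)."""
    if bpm < 60:
        return "sinus bradycardia"
    if bpm <= 100:
        return "normal sinus arrhythmia"
    if 150 <= bpm <= 250:
        # Assumption: this range takes precedence over plain sinus tachycardia.
        return "supraventricular tachycardia"
    return "sinus tachycardia"

# Time grid n = 0.001 : 0.0001 : 5, built from integer steps of 1e-4 s
# to avoid accumulating floating-point error.
time_grid = [k * 1e-4 for k in range(10, 50001)]

for bpm in (72, 25, 120, 170):
    tau = beat_period(bpm)
    print(f"N = {bpm}: {classify_rhythm(bpm)}, "
          f"tau = {tau:.4f} s, about {5.0 / tau:.2f} beats in 5 s")
print("samples in the grid:", len(time_grid))
```

The four rates used in the loop are those of the panels of Fig. 3.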

3 Evaluation of a Real ECG Waveform of MIT-BIH

In this section, the proposed mathematical model is modified to simulate real ECG signals. The real time ECG waveforms of the MIT-BIH arrhythmia database [10] are used for this purpose. This database contains 48 clinical ECG recordings of patients from different age groups, labeled 101, 102, etc. These ECG signals consist of ambulatory two-channel ECG recordings obtained from patients with heart problems of different types. Of the 48 patients, 25 were men aged between 32 and 89 years and 23 were women aged between 23 and 89 years. Although the clinical machines generate analogue data, the available recordings were digitized at 360 samples per second per channel with 11-bit resolution. Out of these recordings, a one second waveform of ECG 101 is chosen to be modeled here. Being real time signals, these recordings are contaminated by different kinds of artifacts and instrumental noise. ECG 101 is contaminated by baseline wander (BW) and muscle artifacts (EMG). Furthermore, power line interference (PLI) is present in almost all the recordings; it is often caused by perturbations of the machine's power line and can be avoided by using a clean power supply and better quality machines. To model the real signals more closely, these artifacts are generated artificially and then added to the modeled ECG waveform. The simulations are made for a duration of 1 s, with 360 samples. The heartbeat frequency N is chosen according to the patient's age and the frequency range provided in the database.

3.1 MIT-BIH 101

The patient is a 75-year-old female, and we have taken N = 68 for the simulation of a one second digitized waveform. As described earlier, real time signals are mostly contaminated by artifacts, and therefore simulated noise has to be added to the modeled ECG. Starting with a suitable choice of $A_o$, the model parameters are chosen carefully to make the modeled waveform comparable with the real one. Table 1 gives the values of amplitude, duration and location that are used to generate the modeled ECG 101.

The modeled waveform is contaminated by two modeled noises: the BW noise and the EMG noise. The BW noise, modeled by $A_{BW} \sin(2\pi f_{BW} n) + \cos(2\pi f_{BW} n)$ with frequency $f_{BW} = 0.25$, is added to the modeled ECG waveform. The amplitude $A_{BW}$ of this baseline wander is taken as a fraction of the amplitude of event R, which is the largest in most of the leads. The EMG noise is due to muscle movement and varies randomly as the patient's muscles move. For this reason, EMG noise can be generated by perturbing a sinusoidal signal with a random frequency noise. Adding these artifacts to the model of Sect. 2.1 generates the modeled ECG 101, provided that the value of $A_o$ is suitably chosen.

Figure 4 shows a comparison of the modeled ECG 101 and the clinical MIT-BIH 101 for the first second. The two waveforms look similar, showing that the modeled ECG 101 is a fairly good approximation of the real signal. The differences are due to the presence of power line interference in the real ECG. To view them separately, the waveforms of the modeled ECG 101 and the real MIT-BIH 101 are shown in Fig. 5. The two waveforms resemble each other to a great extent; the minor error in Fig. 5(c) shows the need for recording machines with a clean power line. These comparisons and the error curve of Fig. 5(c) show that the modeled ECG waveform is accurate, with an error that is negligibly small compared to the range of the signals in the waveform.

Table 1. Parameters for Modeled ECG to Simulate a 1-s Waveform of MIT-BIH 101

Model Parameters    P-Wave    Q-Wave    R-Wave    S-Wave    T-Wave    U-Wave
Amplitude (mV)      32        −8        328       18        42        0.1
Duration (sec.)     0.15      0.06      0.035     0.35      0.13      0.15
Location (sec.)     0.1       0.2       0.23      0.27      0.5       0.51
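The two modeled artifacts described in this section can be sketched as follows. This is an illustration under several assumptions, not the authors' Matlab code: the baseline wander expression is read literally from the text, its frequency is taken to be in Hz, and the 5% fraction of the R amplitude as well as the EMG amplitude, base frequency and frequency jitter are hypothetical values chosen only to make the example run.

```python
import math
import random

FS = 360            # samples per second, as in the digitized recordings
F_BW = 0.25         # baseline-wander frequency from the text (assumed Hz)
R_AMPLITUDE = 328   # amplitude of event R in Table 1

def baseline_wander(n_samples, a_bw, fs=FS, f_bw=F_BW):
    """A_BW*sin(2*pi*f_BW*t) + cos(2*pi*f_BW*t), read literally from the text."""
    out = []
    for k in range(n_samples):
        t = k / fs
        out.append(a_bw * math.sin(2 * math.pi * f_bw * t)
                   + math.cos(2 * math.pi * f_bw * t))
    return out

def emg_noise(n_samples, amplitude, base_freq, jitter, fs=FS, seed=0):
    """Sinusoid whose frequency is perturbed at every sample by uniform noise."""
    rng = random.Random(seed)
    phase, out = 0.0, []
    for _ in range(n_samples):
        freq = base_freq + rng.uniform(-jitter, jitter)
        phase += 2 * math.pi * freq / fs   # accumulate phase with jittered frequency
        out.append(amplitude * math.sin(phase))
    return out

# One second of artifacts (360 samples). All numeric choices below are hypothetical.
bw = baseline_wander(FS, a_bw=0.05 * R_AMPLITUDE)
emg = emg_noise(FS, amplitude=2.0, base_freq=50.0, jitter=20.0)
artifacts = [b + e for b, e in zip(bw, emg)]
print(len(artifacts), round(bw[0], 3))
```

In the setting of this section, `artifacts` would be added sample by sample to the modeled ECG waveform before comparing it with the real recording.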

Fig. 4. Comparison of 1-s waveforms of Modeled ECG 101 and MIT-BIH 101. (Axes: time in samples, with 1 s = 360 samples, versus amplitude; curves: Modeled ECG101 and MIT-BIH101.)

Fig. 5. Sinus Rhythms: (a) Modeled ECG 101 with 68 beats per minute, (b) MIT-BIH 101, (c) Error. (Axes: time in samples, with 1 s = 360 samples, versus amplitude.)

4 Conclusion

The simulations in this paper have shown that the proposed ECG model is able to generate real ECG waveforms of various frequencies and leads. Furthermore, with suitable modeling of artifacts, the model can simulate real time signals with high accuracy. An important characteristic of the proposed model is that it is designed in Matlab and requires only the superposition of almost periodic functions. The simulations exploit the periodic nature of the functions; future work can extend the model to general almost periodic functions, which may greatly help in minimizing the error between real and modeled signals. The proposed modeling of the real time ECG waveform can help in better understanding clinical ECG recordings, which will lead to better diagnosis of heart problems. This work can be further extended to model the waveforms of longer signals.

Acknowledgment. This research work was performed at the Lahore College for Women University, Lahore, Pakistan, and is an improvement of research work done at the Universiti Sains Malaysia, Penang, Malaysia. The work is supported financially by the Punjab Higher Education Commission (PHEC) of Pakistan.

References

1. Boulakia, M., Fernández, M.A., Gerbeau, J.-F., Zemzemi, N.: Numerical simulation of electrocardiograms. In: Modeling of Physiological Flows, pp. 77–106. Springer (2012)
2. Al-Nashash, H.: A dynamic fourier series for the compression of ECG using FFT and adaptive coefficient estimation. Med. Eng. Phys. 17(3), 197–203 (1995)
3. Karthik, R.: ECG simulation using matlab (2003)
4. McSharry, P.E., Clifford, G.D., Tarassenko, L., Smith, L.A.: A dynamical model for generating synthetic electrocardiogram signals. IEEE Trans. Biomed. Eng. 50(3), 289–294 (2003)
5. Borsali, R., Naït-Ali, A., Lemoine, J.: ECG compression using an ensemble polynomial modeling: comparison with the DCT based technique. Cardiovascular Eng. Int. J. 4(3), 237–244 (2004)
6. Nunes, J.-C., Nait-Ali, A.: Hilbert transform-based ECG modeling. Biomed. Eng. 39(3), 133–137 (2005)
7. Ghaffari, A., Homaeinezhad, M., Ahmadi, Y.: An open-source applied simulation framework for performance evaluation of QRS complex detectors. Simul. Model. Pract. Theory 18(6), 860–880 (2010)
8. Javed, S., Ahmad, N.A.: An adaptive noise cancelation model for removal of noise from modeled ECG signals. In: Region 10 Symposium, 2014 IEEE, pp. 471–475. IEEE (2014)
9. Castaño, F., Hernández, A.: Autoregressive models of electrocardiographic signal contaminated with motion artifacts: benchmark for biomedical signal processing studies. In: VII Latin American Congress on Biomedical Engineering CLAIB: Bucaramanga, Santander, Colombia, October 26th-28th 2017, pp. 437–440. Springer (2016)
10. Moody, G.B., Mark, R.G.: MIT-BIH arrhythmia database (2000). http://www.physionet.org/physiobank/database/mitdb/
11. Moody, G.B., Mark, R.G.: The impact of the MIT-BIH arrhythmia database. IEEE Eng. Med. Biol. Mag. 20(3), 45–50 (2001)

EyeHope (A Real Time Emotion Detection Application)

Zulfiqar A. Memon(✉), Hammad Mubarak, Aamir Khimani, Mahzain Malik, and Saman Karim

Department of Computer Science, National University of Computer and Emerging Sciences (NUCES-FAST), Karachi, Pakistan
{zulfiqar.memon,k132113,k132072,k132004,k132071}@nu.edu.pk

Abstract. This paper describes an Android application named “EyeHope” whose sole purpose is to reduce the dependency level of blind and visually impaired people. The application helps them decrease the communication gap by allowing them to perceive and identify the expressions of the person they are communicating with, whether that person is speaking or just listening to them. The paper also describes real-time communication between the system and blind or visually impaired users, based on frames captured by the mobile camera and then processed. Processing includes face detection and emotion detection using OpenCV. The output is the emotion of the person whose image has been processed. This output is converted from text to speech so that the blind person can listen to it through earphones connected to his or her phone. Its implementation and future enhancements are expected to improve the lifestyle of blind or visually impaired people by allowing them to communicate effectively with the people around them.

Keywords: Emotion detection · Computer vision · Image processing · Assisting blind people

© Springer Nature Switzerland AG 2019. K. Arai et al. (Eds.): SAI 2018, AISC 858, pp. 615–623, 2019. https://doi.org/10.1007/978-3-030-01174-1_47

1 Introduction

Human senses are an essential part of the human body, through which a person manages to survive in society. When a person is blind or visually impaired, however, the inability to identify the presence of people at home or during meetings becomes inconvenient [1, 29]. Real-time face recognition [4, 30], text recognition [2, 31] and object detection [5, 32] are some of the dominant applications developed so far. Machine learning employs algorithms that can find patterns from exemplars and make data-driven predictions. Thus, computers have the ability to learn and act without being given explicit directions, by mimicking the human cognitive framework of collecting and applying knowledge to make decisions [7, 33]. Furthermore, the face detection algorithm proposed by Viola and Jones [4, 34] and Eigenfaces (Principal Component Analysis) are used to identify gestures, facial expressions and emotions [3, 35]. The present work is focused on developing face detection and face recognition algorithms to be used by visually impaired people [1, 36]. Previous research also indicates that body movement can contribute to detecting human emotions, but emotion communication through bodily expressions has been a neglected area for much of the history of emotion research [6, 37]. Over the last two decades, researchers have significantly advanced human facial emotion recognition with computer vision techniques. Historically, there have been many approaches to this problem, including pyramid histograms of gradients (PHOG) [8], AU aware facial features [9], boosted LBP descriptors [10], and RNNs [11, 38]. However, recent top submissions [12, 13, 39] to the 2015 Emotions in the Wild (EmotiW 2015) contest for static images all used deep convolutional neural networks (CNNs). The classification of emotions is also a major milestone. In classification problems, good classification accuracy is the primary concern; however, identifying the attributes (or features) with the largest separation power is also of interest [14, 40, 51].

Below, Sect. 2 describes background work done by various researchers in the literature. Section 3 describes various past attempts to solve emotion detection through software applications and how our approach differs from them. Section 4 describes our approach to building the application and the workflow we used for developing it. Section 5 describes the methodology, the tools and packages used to build the application, and a comparison between different tools. That section also explains the working of each feature of the application in detail. Section 6 presents the experimental analysis conducted with different algorithms to calculate accuracy and select an algorithm. It also provides an insight into our experiments and the results derived from them for building the application.

2 Literature Review

Understanding emotional facial expressions accurately is one of the determinants of the quality of interpersonal relationships. The more accurately one reads another’s emotions, the more one is included in such interactions. The problems in social interaction seen in some psychopathological disorders may be partly related to difficulties in recognizing facial expressions. Such deficits have been demonstrated in various clinical populations. Nonetheless, studies of facial expressions have so far produced discrepant findings [15, 23, 41, 52].

Emotion recognition involves processing images, detecting the face and then extracting facial features. Facial expression recognition consists of three main steps. First, the face image is acquired, the face region is detected, and the input image is pre-processed to obtain an image with normalized size or intensity. Next, expression features are extracted from the observed facial image or image sequence. Finally, the extracted features are given to a classifier, which outputs the recognized expression [16, 24, 42]. Past research also states that approaches based on Principal Component Analysis [17, 43], Local Binary Pattern (LBP) [18, 44] and Fisher’s Linear Discriminant [19, 45] are the main categories available for feature extraction and emotion recognition [16, 28, 46]. Various descriptors and techniques are used in facial expression recognition, such as gradient faces, local features, local binary pattern (LBP), local ternary pattern (LTP), local directional pattern (LDiP) and local derivative pattern (LDeP) [20, 47].

Facial expression recognition and analysis have been studied for a long time because facial expressions play an important role in natural human-computer interaction as one of many types of nonverbal communication cue [21, 48]. Paul Ekman et al. postulated six universal emotions (anger, disgust, fear, happiness, sadness, and surprise) and developed the Facial Action Coding System (FACS) as a taxonomy of facial expressions [22, 27, 49].
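The feature extraction and classification steps described in this section can be illustrated with a minimal sketch. It assumes the basic 3x3 LBP operator and a nearest-neighbour classifier with an L1 distance; the function names and these choices are illustrative assumptions and are not taken from the cited works.

```python
import numpy as np

def lbp_codes(gray):
    """Basic 8-neighbour LBP code for every interior pixel of a 2-D array."""
    g = np.asarray(gray, dtype=np.int32)
    h, w = g.shape
    center = g[1:-1, 1:-1]
    # Neighbour offsets, clockwise starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(center)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = g[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Set this bit where the neighbour is at least as bright as the centre.
        codes |= (neighbour >= center).astype(np.int32) << bit
    return codes

def lbp_histogram(gray):
    """Feature extraction: normalised 256-bin histogram of LBP codes."""
    hist = np.bincount(lbp_codes(gray).ravel(), minlength=256).astype(float)
    return hist / hist.sum()

def nearest_label(feature, train_features, train_labels):
    """Classification: label of the training histogram with the smallest L1 distance."""
    dists = [np.abs(feature - f).sum() for f in train_features]
    return train_labels[int(np.argmin(dists))]
```

The first step (face acquisition, detection and normalization) is deliberately left out here; the input is assumed to be an already cropped grayscale face.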

3 Approach

Advanced technology is an integral part of the twenty-first century. It was initially thought that the ideas of upcoming scientists mainly improve the lifestyle of people who are physically well and want to make their lives even more convenient. There should also be something for those who have lost one of their physical senses. Technological initiatives should lead them out of the confinement of blindness and guide them towards self-reliance and independence. The purpose of this application is to use current technological resources so that blind and visually impaired people can live conveniently and independently.

Our analyst team interviewed blind and visually impaired people from different societal groups (literate and illiterate, employees and managers, etc.). From their answers we observed that the most common problems in their daily routine are inferring how many people are in front of them at a given moment and interpreting the expressions and emotions of the people communicating with them. Most interviewees said that while the people in front of them are talking, they can only just judge the expressions on their faces, and once the talking stops, judging those expressions becomes very hard. These minor but daily difficulties motivated us to develop something that helps blind and visually impaired people overcome these hurdles.

Nowadays, people of almost every age own a smartphone. A global study reports that 2.32 billion people own and know how to use a smartphone [25, 26, 50]. Based on these findings, we decided to develop an application for blind and visually impaired people that tells them the number of people in front of them and the emotion on the face of each detected person. After analyzing the users' problems and setting the project goal, we discussed and selected the technical tools, the platform and the algorithms needed for good results. These are described in the next section.

Z. A. Memon et al.

4 Methodology

In the global market, about 80% of smartphone users choose Android as their platform, so we decided to build the application for Android. Android devices are also available at prices that fit a wide range of customer needs, and a higher level of customization, multi-tasking and Google integration make Android the most preferred platform among consumers. As EyeHope is a computer vision system, many experienced professionals suggested OpenCV. OpenCV (Open Source Computer Vision Library) is an efficient open source computer vision and machine learning (ML) library that provides a common infrastructure for computer vision applications. It contains approximately 2500 optimized algorithms that can be used for many computer vision tasks. OpenCV has a user community of more than 47 thousand people and more than 14 million downloads. Companies such as Google, Microsoft, IBM, Intel, Yahoo, Sony and many more have used OpenCV in their products. A commercial single-user license for MATLAB costs USD 2150, whereas OpenCV is free under its BSD license. These are the social and economic aspects of OpenCV, but its technical benefits matter as well. Many computer vision projects have to choose between OpenCV and MATLAB. The first reason to prefer OpenCV is speed. MATLAB is a high-level interpreted language, so when a MATLAB program runs, the computer spends part of its time interpreting the MATLAB code before executing it. OpenCV, on the other hand, is a library of functions written in C/C++.
With OpenCV you are closer to providing machine code directly to the computer, so more of the processor's cycles go into image processing and fewer into interpretation. For computer vision programs, one would get 3 to 4 frames analyzed per second with MATLAB, whereas OpenCV analyzes at least 20 to 30 frames per second, which results in real-time detection. The tools also differ in the resources they need. Because of its high-level nature, MATLAB uses a lot of system resources: MATLAB code requires over a gigabyte of RAM to run, whereas a typical OpenCV program requires only about 70 MB of RAM to run in real time.

Several algorithms are used in computer vision and image processing systems, but the one that best suits the EyeHope application is the Fisherface algorithm. When the goal is classification rather than representation, eigenfaces (a least-squares approach) may not give the best results. What is needed is a subspace that maps the sample vectors of the same class to a single point in the feature representation. The technique most commonly used to find such a subspace is Linear Discriminant Analysis (LDA). When LDA is used to find the subspace representation of a set of face images, the resulting basis vectors defining the space are called Fisherfaces. To compute Fisherfaces, each class is assumed to be normally distributed. The multivariate normal distribution of class $i$ is denoted by $N_i(\mu_i, \Sigma_i)$, with mean $\mu_i$, covariance matrix $\Sigma_i$ and probability density function $f_i(x \mid \mu_i, \Sigma_i)$. In this case there are more than two classes, and the procedure minimizes within-class differences and maximizes between-class distances. The within-class differences are computed using the within-class scatter matrix:

$$S_w = \sum_{j=1}^{C} \sum_{i=1}^{n_j} (x_{ij} - \mu_j)(x_{ij} - \mu_j)^T$$

The between-class differences are estimated using the between-class scatter matrix:

$$S_b = \sum_{j=1}^{C} (\mu_j - \mu)(\mu_j - \mu)^T$$

where $\mu$ is the mean of all classes. These computations lead to the generalized eigenvalue decomposition

$$S_b V = S_w V \Lambda$$

where $V$ is the matrix of eigenvectors and $\Lambda$ is the diagonal matrix of the corresponding eigenvalues. The eigenvectors in $V$ associated with non-zero eigenvalues are the Fisherfaces. A basic advantage of the Fisherface algorithm is that it is designed for classification and is considerably faster than some existing algorithms. It uses a discriminative linear projection model.

For the development of the application, the first step was to configure and integrate OpenCV with Android, which required some research and tutorials. The next step was to create a user-friendly interface. The colors, font size, design and sound quality of the application were checked so that the users (blind and visually impaired people) can operate it conveniently. After this, the first feature of the application (detecting the number of people who fit in the camera frame) was developed in detail. Specifically, Haar cascade classifiers were collected, experimented with and deployed in the application. Once the face detection feature was complete, the focus shifted to the emotion detection feature. This initially required a large dataset. The application was trained on both internal (self-made) and external (image repository) datasets. After the dataset collection, the image data was divided into training and testing sets. The application was trained and tested several times to obtain accurate results. Several experiments were performed on the application (specifically on the emotion detection feature) to increase its accuracy.
Finally, the application was tested by users in their own environments.
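The Fisherface computation described earlier in this section can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the code used in EyeHope: $C$ is taken as the number of classes and $n_j$ as the number of samples in class $j$, the overall mean $\mu$ is taken as the mean of all samples, the generalized eigenproblem is solved through a pseudo-inverse of $S_w$, and the PCA reduction normally applied before LDA on real face images is omitted. The function name `fisher_directions` is hypothetical.

```python
import numpy as np

def fisher_directions(X, y):
    """Eigenvalues and eigenvectors of S_b V = S_w V Lambda, largest first.

    X: (n_samples, n_features) array of feature vectors.
    y: one class label per sample.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    mu = X.mean(axis=0)                     # overall mean (assumed: mean of all samples)
    d = X.shape[1]
    S_w = np.zeros((d, d))                  # within-class scatter
    S_b = np.zeros((d, d))                  # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)
        centered = Xc - mu_c
        S_w += centered.T @ centered        # sum_i (x_ij - mu_j)(x_ij - mu_j)^T
        diff = (mu_c - mu).reshape(-1, 1)
        S_b += diff @ diff.T                # (mu_j - mu)(mu_j - mu)^T
    # Assumption: use the pseudo-inverse so a singular S_w does not raise.
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(S_w) @ S_b)
    order = np.argsort(-eigvals.real)
    return eigvals.real[order], eigvecs.real[:, order]
```

With $C$ classes, at most $C - 1$ eigenvalues are non-zero, so for the seven or eight emotion classes discussed in the paper only a handful of projection directions would be kept.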


5 Experiments

After developing the application using the methodology described above, our focus was to maximize its accuracy. The dataset used to train and test the classifier was the Cohn-Kanade dataset, which consists of images of 123 subjects showing 8 emotions. Using this dataset, we achieved an accuracy of 67%. After handling the noise in the dataset and the over-fitting of different classifiers, our accuracy was close to 73.7%. We then looked for ways to improve the accuracy further. Through in-depth analysis and research, we found that the FisherFace classifier usually performs better than the results we had obtained, so we shifted our focus to the dataset. Our team visited community schools and other places to collect our own realistic dataset to help the algorithm classify emotions correctly. Our dataset is diverse, as it includes subjects from almost all age groups. We then filtered the dataset to remove discrepancies. The final dataset was built by cleansing and then merging both datasets, i.e. the Cohn-Kanade dataset and our own. We tested the application on pairs of emotions to learn which emotions are classified well and which need improvement. The results are shown in Table 1.

Table 1. Accuracy results for pairs of emotions

Emotions                Accuracy achieved
Neutral and happy       96%
Neutral and disgust     90%
Neutral and angry       80%
Neutral and fear        82%
Neutral and surprise    88%
Neutral and sad         80%

After many optimizations and after evaluating the application with various approaches, our final accuracy was around 85.53%. This suggests that the emotions detected by the EyeHope application are likely to match the interpretation of a human observer.

6 Conclusion

Our aim was to create a solution that assists blind people and helps them communicate effectively using technological advancements. We used the OpenCV library for Android and Android Studio to develop an Android application. The application works in real time. It captures, on average, 30 frames per second from the mobile camera and uses Haar cascade classifiers to detect the frontal face and crop it. The picture is then converted to grayscale and tested against the set of emotions. We used the FisherFace classifier provided by OpenCV. The dataset we used is a merger of the Cohn-Kanade dataset and a dataset prepared by us. Both datasets were filtered and cleaned before the machine learning techniques were applied. The application was tested on a set of emotions and optimized repeatedly to improve the accuracy. We achieved around 85.53% accuracy in correctly classifying emotions in a real-time environment. We believe this project is an important step towards creating a sense of self-reliance and sustainability among the disabled community.
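The processing chain summarized above (detect the frontal face, crop it, convert to grayscale, classify, then speak the result) can be sketched as follows. The paper's application uses OpenCV for Android; this sketch uses OpenCV's Python bindings instead, purely for illustration. The function names, the 100x100 crop size and the `detectMultiScale` parameters are assumptions, and the `cascade` and `recognizer` objects are expected to be created and trained by the caller.

```python
def classify_faces(frame_bgr, cascade, recognizer, labels, size=(100, 100)):
    """Return one emotion label per frontal face found in a BGR frame."""
    import cv2  # imported here so spoken_summary() works without OpenCV installed
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    # Haar cascade detection; each hit is an (x, y, w, h) rectangle.
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    emotions = []
    for (x, y, w, h) in faces:
        # Fisherfaces needs every input to have the same size as the training images.
        crop = cv2.resize(gray[y:y + h, x:x + w], size)
        label_id, _distance = recognizer.predict(crop)
        emotions.append(labels[label_id])
    return emotions

def spoken_summary(emotions):
    """Build the sentence that would be handed to a text-to-speech engine."""
    if not emotions:
        return "No face detected"
    noun = "person" if len(emotions) == 1 else "people"
    return f"{len(emotions)} {noun} detected: " + ", ".join(emotions)
```

In Python, the detector could be built with `cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")` and the recognizer with `cv2.face.FisherFaceRecognizer_create()`, which is part of the opencv-contrib package; whether the original application used the same cascade file is not stated in the paper.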

7 Future Enhancements

The EyeHope application can be enhanced by integrating more features, such as text detection in frames that contain text, description of the scene in the frame, face recognition, object detection and object recognition. The current features can also be refined. For example, frontal face detection can be extended to detect a face when the camera sees only its left or right side, and the settings module can be modified to offer more filtering options.

References

1. Chillaron, M., et al.: Face detection and recognition application for Android (2015). https://www.researchgate.net/publication/284253914_Face_detection_and_recognition_application_for_Android
2. Bourlard, H., Chen, D., Odobez, J.-M.: Text detection and recognition in images and video frames. Pattern Recognit. 37, 595–608 (2004)
3. Krishna, Sh., et al.: A wearable face recognition system for individuals with visual impairments, pp. 106–113 (2005)
4. Viola, P., Jones, M.: Robust real-time face detection. Int. J. Comput. Vis. 57, 137–154 (2004)
5. Yuille, A.L., Chen, X.: A Time-Efficient Cascade for Real-Time Object Detection: With Applications for the Visually Impaired (2005)
6. Konar, A., Chakraborty, A.: Emotion Recognition: A Pattern Analysis Approach (2014)
7. Cosgrove, C., Li, K., Lin, R., Nadkarni, S., Vijapur, S., Wong, P., Yang, Y., Yuan, K., Zheng, D.: Developing an Image Recognition Algorithm for Facial and Digit Identification (1999)
8. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: NIPS, vol. 1, p. 4 (2012)
9. Yao, A., Shao, N.M., Chen, Y.: Capturing AU-aware facial features and their latent relations for emotion recognition in the wild. In: Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, ICMI 2015, New York, NY, USA, pp. 451–458. ACM (2015)
10. Shan, C., Gong, S., McOwan, P.W.: Facial expression recognition based on local binary patterns: a comprehensive study. Image Vis. Comput. 27(6), 803–816 (2009)
11. Kahou, S., Michalski, V., Konda, K., Memisevic, R., Pal, C.: Recurrent neural networks for emotion recognition in video. In: ICMI, pp. 467–474 (2015)
12. Yu, Z., Zhang, C.: Image based static facial expression recognition with multiple deep network learning. In: Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, ICMI 2015, New York, NY, USA, pp. 435–442. ACM (2015)
13. Kim, B., Roh, J., Dong, S., Lee, S.: Hierarchical committee of deep convolutional neural networks for robust facial expression recognition. J. Multimodal User Interfaces 10, 1–17 (2016)
14. Visa, S., Ramsay, B., Ralescu, A., van der Knaap, E.: Confusion matrix-based feature selection (2011)
15. Dursun, P., Emül, M., Gençöz, F.: A review of the literature on emotional facial expression and its nature (2010)
16. Raval, D., Sakle, M.: A literature review on emotion recognition system using various facial expression (2017)
17. Sagarika, S.S., Maben, P.: Laser face recognition and facial expression identification using PCA. IEEE (2014)
18. Happy, S.L., George, A., Routray, A.: A real time facial expression classification system using local binary patterns. IEEE (2012)
19. Bhadu, A., Tokas, R., Kumar, V.: Facial expression recognition using DCT, Gabor and wavelet feature extraction techniques. Int. J. Eng. Innov. Technol. 2(1), 92 (2012)
20. Bele, P.S., Mohod, P.S.: A literature review on facial and expression recognition with advanced image descriptor template (2015)
21. Suk, M., Prabhakaran, B.: Real-time Mobile Facial Expression Recognition System – A Case Study. Department of Computer Engineering, The University of Texas at Dallas, Richardson (2014)
22. Ekman, P., Friesen, W.: Facial Action Coding System: A Technique for the Measurement of Facial Movement. Consulting Psychologists Press, Palo Alto (1978)
23. https://karanjthakkar.wordpress.com/2012/11/21/what-is-opencv-opencv-vs-matlab/ (2017)
24. https://www.researchgate.net/post/Which_is_the_best_opencv_or_matlab_for_image_processing/ (2017)
25. https://www.statista.com/statistics/330695/number-of-smartphone-users-worldwide/ (2017)
26. http://topmobiletrends.com/apple-vs-android/ (2017)
27. Lyons, M.J., Akamatsu, S., Kamachi, M., Gyoba, J.: Coding facial expressions with Gabor wavelets. In: 3rd IEEE International Conference on Automatic Face and Gesture Recognition, pp. 200–205 (1998)
28. Kanade, T., Cohn, J.F., Tian, Y.: Comprehensive database for facial expression analysis. In: Proceedings of the Fourth IEEE International Conference on Automatic Face and Gesture Recognition (FG 2000), Grenoble, France, pp. 46–53 (2000)
29. Bosse, T., Duell, R., Memon, Z.A., Treur, J., van der Wal, C.N.: Computational model-based design of leadership support based on situational leadership theory. Simul. Trans. Soc. Model. Simul. Int. 93(7), 605–617 (2017)
30. Bosse, T., Duell, R., Memon, Z.A., Treur, J., van der Wal, C.N.: Agent-based modelling of emotion contagion in groups. Cognit. Comput. J. 7(1), 111–136 (2015)
31. Hoogendoorn, M., Klein, M.C.A., Memon, Z.A., Treur, J.: Formal specification and analysis of intelligent agents for model-based medicine usage management. Comput. Biol. Med. 43(5), 444–457 (2013)
32. Bosse, T., Memon, Z.A., Treur, J.: A cognitive and neural model for adaptive emotion reading by mirroring preparation states and hebbian learning. Cognit. Syst. Res. J. 12(1), 39–58 (2012)
33. Bosse, T., Hoogendoorn, M., Memon, Z.A., Treur, J., Umair, M.: A computational model for dynamics of desiring and feeling. Cognit. Syst. Res. J. 19, 39–61 (2012)
34. Bosse, T., Memon, Z.A., Treur, J.: A recursive BDI-agent model for theory of mind and its applications. Appl. Artif. Intell. J. 25(1), 1–44 (2011)
35. Bosse, T., Memon, Z.A., Oorburg, R., Treur, J., Umair, M., de Vos, M.: A software environment for an adaptive human-aware software agent supporting attention-demanding tasks. Int. J. Artif. Intell. Tools 20(5), 819–846 (2011)
36. Memon, Z.A.: Designing human-awareness for ambient agents: a human mindreading perspective. J. Ambient Intell. Smart Environ. 2(4), 439–440 (2010)
37. Memon, Z.A., Treur, J.: On the reciprocal interaction between believing and feeling: an adaptive agent modelling perspective. Cognit. Neurodynam. J. 4(4), 377–394 (2010)
38. Kashif, U.A., Memon, Z.A., et al.: Architectural design of trusted platform for IaaS cloud computing. Int. J. Cloud Appl. Comput. (IJCAC) 8(2), 47 (2018)
39. Laeeq, K., Memon, Z.A., Memon, J.: The SNS-based e-learning model to provide smart solution for e-learning. Int. J. Educ. Res. Innov. (IJERI) 10, 141 (2018)
40. Samad, F., Memon, Z.A., et al.: The future of internet: IPv6 fulfilling the routing needs in Internet of Things. Int. J. Future Gener. Commun. Netw. (IJFGCN) 11(1) (2018, in Press)
41. Memon, Z.A., Samad, F., et al.: CPU-GPU processing. Int. J. Comput. Sci. Netw. Secur. 17(9), 188–193 (2017)
42. Samad, F., Memon, Z.A.: A new design of in-memory file system based on file virtual address framework. Int. J. Adv. Comput. Sci. Appl. 8(9), 233–237 (2017)
43. Memon, Z.A., Ahmed, J., Siddiqi, J.A.: CloneCloud in mobile cloud computing. Int. J. Comput. Sci. Netw. Secur. 17(8), 28–34 (2017)
44. Abbasi, A., Memon, Z.A., Jamshed, M., Syed, T.Q., Rabah, A.: Addressing the future data management challenges in IOT: a proposed framework. Int. J. Adv. Comput. Sci. Appl. 8(5), 197–207 (2017)
45. Waheed-ur-Rehman, Laghari, A., Memon, Z.: Exploiting smart phone accelerometer as a personal identification mechanism. Mehran Univ. Res. J. Eng. Technol. 34(S1), 21–26 (2015)
46. Memon, Z.A., Treur, J.: An agent model for cognitive and affective empathic understanding of other agents. Trans. Comput. Collect. Intell. (TCCI) 6, 56–83 (2012)
47. Duell, R., Memon, Z.A., Treur, J., van der Wal, C.N.: Ambient support for group emotion: an agent-based model. In: Agents and Ambient Intelligence, Ambient Intelligence and Smart Environments, vol. 12, pp. 261–287. IOS Press (2012)
48. Hoogendoorn, M., Memon, Z.A., Treur, J., Umair, M.: A model-based ambient agent providing support in handling desire and temptation. Adv. Intell. Soft Comput. 71, 461–475 (2010)
49. Batra, R., Memon, Z.A.: Effect of icon concreteness, semantic distance and familiarity on recognition level of mobile phone icons among e-literate and non e-literates. Int. J. Web Appl. 8(2), 55–64 (2016)
50. Siddiqi, S.S., Memon, Z.A.: Internet addiction impacts on time management that results in poor academic performance. In: Proceedings of the 14th International Conference on Frontiers of Information Technology, pp. 63–68. IEEE Computer Society (2016)
51. Laghari, A., Waheed-ur-Rehman, Memon, Z.A.: Biometric authentication technique using smartphone sensor. In: Proceedings of the 13th International Conference on Applied Sciences & Technology, IBCAST 2016, pp. 381–384 (2016)
52. Kashif, U.A., Memon, Z.A., Balouch, A.R., Chandio, J.A.: Distributed trust protocol for IaaS cloud computing. In: Proceedings of the 12th International Conference on Applied Sciences & Technology, IBCAST 2015, pp. 275–279 (2015)

Chromaticity Improvement Using the MSR Model in Presence of Shadows

Mario Dehesa Gonzalez(✉), Alberto J. Rosales Silva, and Francisco J. Gallegos Funes

ESIME Zacatenco, Señales y Sistemas, Instituto Politécnico Nacional, Mexico City, Mexico
[email protected], [email protected], [email protected]

Abstract. One of the main problems in digital images is that illumination conditions affect objects differently, depending on the reflection angle and the lightness in the scene. As a result, artificial vision algorithms deliver suboptimal visual results. This phenomenon is caused in particular by shadows and by the angle of the incident light source, which produces different reflections from different objects in the scene. To address this, the Retinex algorithm is used to estimate the illumination source. In this article, an algorithm is proposed to diminish the effect of the shadows present in digital images using the color constancy of a pixel. The proposal preserves inherent image characteristics such as contrast, even under poor visibility.

Keywords: Shadows · Digital images · Retinex · Color constancy · Lightness

1 Introduction

In artificial vision, illumination conditions can modify the results obtained from algorithms [1]. For example, changes in color intensity can occur, and these changes can cause problems in post-processing procedures such as segmentation, tracking or recognition. The illumination problem depends mainly on the angle of the illuminant source, the shadows cast when objects in the scene obstruct the light coming from the source, and the position of the image capture system. As a consequence, the edges of shadows are detected and some algorithms interpret the shadows as objects [2]. It is important to understand how shadows are formed in order to understand their effect on digital images in post-processing procedures. If the shadow effects can be diminished, methods such as color identification, pattern recognition, tracking and face recognition can be improved.

Our proposal estimates the shadow location using a single digital image. The method uses an image that is invariant to changes in the color and intensity of the light, which means that the invariant image depends only on the reflectance, because a shadow edge is a change in the color and the intensity of the incident light [3]. In this case, the algorithm identifies shadow edges as changes only in the color and intensity of the incident light, because the colors captured by the sensor can differ from the real colors of the object.

The Human Vision System (HVS) is able to determine the colors of objects regardless of the light source; this ability is known as color constancy [2]. The HVS can compute illuminance and reflectance descriptors that remain constant for an object even when the illuminant changes. This shows the complexity of light processing in the HVS, since the photosensitive cells in the human eye only measure the amount of light reflected by an object, and that reflected light changes with the characteristics of the illuminant.

© Springer Nature Switzerland AG 2019 K. Arai et al. (Eds.): SAI 2018, AISC 858, pp. 624–630, 2019. https://doi.org/10.1007/978-3-030-01174-1_48

2 Shadows Identification Methodology

Under a linear model of lighting changes, the color captured by the sensor is given by (1) [4]:

$$\rho_k = \int I(\lambda) S(\lambda) Q_k(\lambda)\, d\lambda, \tag{1}$$

where $\rho_k$ is the color present in the image for layer $k$ (R, G, B), $I(\lambda)$ is the spectral power distribution, $S(\lambda)$ is the reflectance of the surface of the objects, $Q_k(\lambda)$ represents the sensitivity of the camera, and $\lambda$ is the wavelength. Suppose that the lighting can be approximated by Planck’s law, as in (2) [3]:

$$E(\lambda, T) = L c_1 \lambda^{-5} \left( e^{\frac{c_2}{T\lambda}} - 1 \right)^{-1}. \tag{2}$$

The constants are $c_1 = 3.7421 \times 10^{-16}\ \mathrm{W\,m^2}$ and $c_2 = 1.4388 \times 10^{-2}\ \mathrm{m\,K}$, and $L$ is the intensity of the light incident on the object. $E(\lambda, T)$ can be approximated by (3):

$$E(\lambda, T) \simeq L c_1 \lambda^{-5} e^{-\frac{c_2}{T\lambda}}. \tag{3}$$
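To get a feeling for how close approximation (3) is to Planck's law (2), both can be evaluated numerically. The sketch below uses the constants quoted in the text and treats $T$ as the color temperature of the illuminant in kelvin; the function names and the sample wavelength and temperatures are illustrative assumptions.

```python
import math

C1 = 3.7421e-16  # W m^2, value quoted in the text
C2 = 1.4388e-2   # m K, value quoted in the text

def planck(lam, T, L=1.0):
    """Eq. (2): E = L c1 lam^-5 / (exp(c2 / (T lam)) - 1)."""
    return L * C1 * lam ** -5 / math.expm1(C2 / (T * lam))

def wien(lam, T, L=1.0):
    """Eq. (3): E ~ L c1 lam^-5 exp(-c2 / (T lam))."""
    return L * C1 * lam ** -5 * math.exp(-C2 / (T * lam))

def relative_error(lam, T):
    """Relative difference between Eq. (2) and Eq. (3)."""
    return abs(planck(lam, T) - wien(lam, T)) / planck(lam, T)
```

Algebraically the relative error equals $e^{-c_2/(T\lambda)}$, so it shrinks as the temperature or the wavelength decreases. For green light at 550 nm and an assumed daylight-like temperature of 6500 K it stays below 2%.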

Equations (2) and (3) are smooth functions of the wavelength. The chromaticity $r_k$ is formed from the color values $\rho_k$, as shown in (4):

$$r_k = \rho_k / \rho_p. \tag{4}$$

Equation (4) removes the intensity information from the chromaticity. It is then necessary to calculate the color of the lighting so that it can later be eliminated, as shown in (5):

$$r'_k \equiv \log(r_k) = \log(s_k / s_p) + (e_k - e_p)/T. \tag{5}$$
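The band-ratio chromaticity of (4) and its logarithm in (5) can be computed per pixel as sketched below. The text does not say which channel plays the role of the reference $p$; the green channel is assumed here, and the small `eps` offset that avoids taking the logarithm of zero is also an assumption. The function name is hypothetical.

```python
import numpy as np

def log_chromaticity(rgb, ref=1, eps=1e-6):
    """Eqs. (4)-(5): r'_k = log(rho_k / rho_p) for the two non-reference channels.

    rgb: array whose last axis holds the R, G, B values.
    ref: index of the reference channel p (assumed: 1, i.e. green).
    """
    rgb = np.asarray(rgb, dtype=float) + eps      # eps avoids log(0)
    others = [k for k in range(3) if k != ref]    # with ref=1 this is R and B
    return np.log(rgb[..., others] / rgb[..., [ref]])
```

Because the ratio cancels any common scale factor, multiplying a pixel by a constant brightness leaves its log-chromaticity unchanged, which is the intensity invariance the text refers to.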


The result of applying this equation is a grayscale image ($gs$), obtained by (6):

$$gs = c_1 r'_R - c_2 r'_B. \tag{6}$$

A captured image contains objects of different colors, which can be lit by lights of different colors and intensities. At this point it is possible to smooth the shadows. This shadow identification method is based on recovering the light intensity. To calculate the lightness in the image, it can be assumed that the lighting changes slowly and the reflectance changes quickly [5]. In this case, the effects of the light source can be removed with the operator $T(\nabla \rho'(x, y))$, described in (7):

$$T(\nabla \rho'(x, y)) = \begin{cases} 0 & \text{if } \|\nabla \rho'(x, y)\| < \text{threshold} \\ \nabla \rho'(x, y) & \text{otherwise} \end{cases} \tag{7}$$

In this way the shadows present in digital images can be removed. It is then necessary to process the shadows using the Retinex model to change the light source. This model is presented in the next section.
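The thresholding operator in (7) can be sketched with a finite-difference gradient. The sketch assumes that `channel` is a single log-domain image channel and that the gradient is approximated with `numpy.gradient` (central differences); the threshold value is left to the caller, since the text does not specify one.

```python
import numpy as np

def threshold_gradient(channel, threshold):
    """Eq. (7): zero every gradient whose magnitude is below the threshold.

    Returns the thresholded x and y gradient components.
    """
    gy, gx = np.gradient(np.asarray(channel, dtype=float))  # axis 0 is y, axis 1 is x
    magnitude = np.hypot(gx, gy)
    keep = magnitude >= threshold   # small gradients are attributed to slow lighting changes
    return gx * keep, gy * keep
```

On an image made of a gentle ramp plus a sharp step, only the gradients around the step survive, which matches the assumption that lighting varies slowly while reflectance varies quickly.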

3 Retinex MSR Model

The Retinex model is based on the color perception model of the HVS, in which images are perceived as the product of the reflectance and the spectral distribution of the light source [2]. In photometry, the same spectral distribution can be produced in different ways: illuminating a green apple with a red light source or a red apple with a green light source yields the same resulting spectral distribution [6]. Color constancy in the HVS amounts to identifying the product of the luminosity $L(\lambda)$ and the reflectance $R(\lambda)$ present in a scene. The result of this product is the irradiance $E(\lambda)$, which can be approximated by (8):

$$E(\lambda) = L(\lambda) R(\lambda). \tag{8}$$

The HVS is in charge of processing the light, depending on the spatial distribution and chromaticity present in the scene [7]. The Retinex model imitates this model of color constancy. It needs references to determine the color of an object in the scene, so it is necessary to make comparisons with the wavelengths reflected by neighboring objects in the scene. The main objective of the Retinex model is to compensate for the effects of lighting on an image. It does so by decomposing the image into two images, a reflectance image (R) and a lightness image (L), so that each pixel I(x, y) of the image can be written as the product of both, as in (9).

$$I(x, y) = R(x, y) L(x, y) \tag{9}$$
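A short worked step may help connect the product form in (9) with the logarithms that appear in the MSR formula (12). Assuming all quantities are strictly positive, taking logarithms turns the product into a sum, so the reflectance can be isolated by subtraction once the lightness has been estimated:

```latex
\log I(x, y) = \log R(x, y) + \log L(x, y)
\quad\Longrightarrow\quad
\log R(x, y) = \log I(x, y) - \log L(x, y) = \log \frac{I(x, y)}{L(x, y)}
```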

This image decomposition theory was proposed by Land and McCann in 1971 [2]. According to this model, three different receptors measure the energy at different points of the visible spectrum, and each receptor set contributes to forming a representation of the visual world and processes the measured energy. Land assumed that the color interpretation process begins at the retina and continues with interpretation in the visual cortex. This retina-cerebral system is known as Retinex, and the operators that form part of the Retinex model imitate the light receptors of the HVS [2]. The Retinex model is described by biological operators in charge of light reception. They are applied iteratively to the image along a trajectory or a set of trajectories, and the value of each pixel within the trajectory is compared to that of the neighboring pixel [2]. To find the best approximation of white, the maximum intensity $I_k$ is searched for in each channel of the image $f_k(x, y)$, as shown in (10).

$$I_k = \max\{f_k(x, y)\} \tag{10}$$
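The per-channel maximum search in (10) can be sketched as follows. The function `channel_maxima` follows the formula directly; `scale_to_white` is an added assumption showing one common way such maxima are used (rescaling each channel so its maximum maps to white) and is not described in the paper. Both names are hypothetical.

```python
def channel_maxima(image):
    """Return (I_R, I_G, I_B): the maximum intensity found in each channel.

    `image` is a list of rows, each row a list of (r, g, b) tuples.
    """
    return tuple(max(px[k] for row in image for px in row) for k in range(3))


def scale_to_white(image, white=255.0):
    """Assumed usage: rescale each channel so that its maximum becomes `white`."""
    maxima = channel_maxima(image)
    return [
        [
            tuple(white * px[k] / maxima[k] if maxima[k] else 0.0 for k in range(3))
            for px in row
        ]
        for row in image
    ]


image = [[(10, 20, 30), (40, 5, 6)], [(7, 80, 9), (1, 2, 100)]]
print(channel_maxima(image))
```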

The final objective of the Retinex algorithm is to extract the reflectance information of the image, because this information is an intrinsic property of the object and is invariant to the type of lighting. In this way the lighting effects present in an image are estimated, so that an image that is invariant to lighting changes can finally be obtained. The Single Scale Retinex model (SSR) [8] is the basis of the Multiscale Retinex model (MSR) [9]. In MSR the illumination is calculated from a combination of images in which the illuminant is estimated with Gaussian filters at different scales (standard deviations $\sigma_s$), as shown in (11).

$$L(x, y) = \sum_{s=1}^{S} \omega_s \left( I(x, y) * G_s(x, y) \right) \tag{11}$$

where $\omega_s$ is the weight given to the lightness estimate obtained via single scale Retinex ($\omega = \omega_1, \omega_2, \ldots, \omega_S$) by convolving the original image with a Gaussian function of standard deviation $\sigma_s$ [10]. Using this model it is possible to identify the color dynamically: the algorithm reduces abrupt changes in the lighting, which are directly related to the lightness, and the image is defined as shown in (12):

$$R_{MSR_k} = \sum_{n=1}^{N} \omega_n R_{n_k} = \sum_{n=1}^{N} \omega_n \log \frac{I_k(x, y)}{I_k(x, y) * G_n(x, y)} \tag{12}$$

The channels of the RGB space are indexed by k, N is the number of scales, I is the original image ($I_k$ in the kth channel), and $R_{MSR_k}$ is the image processed in the kth channel by the Multiscale Retinex.
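The structure of (11) and (12) can be illustrated with a minimal sketch. To stay short it works on a one-dimensional signal of strictly positive values instead of a two-dimensional color channel, uses equal weights by default, and replicates edge values during the convolution. These choices, and the names `gaussian_kernel`, `smooth`, and `multiscale_retinex`, are assumptions made for illustration rather than details from the paper.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel truncated at about three standard deviations."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(signal, sigma):
    """Convolve the signal with a Gaussian, replicating the edge samples."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - radius, 0), n - 1)
            acc += w * signal[idx]
        out.append(acc)
    return out


def multiscale_retinex(signal, sigmas, weights=None):
    """Weighted sum over scales of log(I / (I * G_n)), as in (12)."""
    if weights is None:
        weights = [1.0 / len(sigmas)] * len(sigmas)
    result = [0.0] * len(signal)
    for sigma, w in zip(sigmas, weights):
        blurred = smooth(signal, sigma)
        for i, (value, estimate) in enumerate(zip(signal, blurred)):
            result[i] += w * math.log(value / estimate)
    return result
```

On a uniformly lit signal the output is approximately zero everywhere, since the Gaussian estimate equals the signal itself; the output departs from zero only near changes in intensity.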


M. D. Gonzalez et al.

4 Evaluation Method

The evaluation of this method is based on a comparison with the quality of the original image; quantifying the distortion suffered by the resulting image is necessary in different fields of image processing. Because the evaluation of color as perceived by human vision depends on psychophysical elements of the HVS, it is computationally complex. The evaluation can be done in a less complex way by basing it on the distortion of the image [11]. This evaluation uses the CIELab color space, because it is the most similar to what the HVS perceives. To observe the colors, this space requires a white-to-gray background and a known light source, in this case a D65 illuminant, whose properties are similar to midday light, with a color temperature of 6,504 K [12]. The model is composed of a luminance component (L) and two color vectors (a*, b*), one running from red to green and the other from blue to yellow [13]. The cylindrical coordinates of the CIELab space are related to the chromaticity of the image, which is an attribute of color images, and to the intensity of the color. This characteristic is directly proportional to the magnitude of the vector formed by the two chromatic components, as given in (13) [12].

$$a = C_1 - \frac{C_2}{11} = R_a - \frac{12 G_a}{11} + \frac{B_a}{11}, \qquad b = \frac{1}{2} \cdot \frac{C_2 - C_1 + C_1 - C_3}{4.5} = \frac{1}{9}\left[R_a + G_a - 2 B_a\right] \tag{13}$$

If the resulting chromaticity of the image is larger than the average of the original image, the colors in the processed image will be more saturated [14]. The procedure for obtaining the mean of this vector is given in (14).

$$C_{MN} = \frac{1}{MN} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} \sqrt{a^2(x, y) + b^2(x, y)} \tag{14}$$

$C_{MN}$ represents the mean chromaticity magnitude, where (x, y) gives the position of each pixel.
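As a small illustration of (14), the mean chromaticity magnitude can be computed from two equally sized grids holding the a and b components. The function name `mean_chromaticity` and the list-of-rows layout are assumptions made for this sketch.

```python
import math


def mean_chromaticity(a, b):
    """Mean of sqrt(a^2 + b^2) over all M x N pixel positions, as in (14).

    `a` and `b` are lists of rows with identical dimensions.
    """
    rows, cols = len(a), len(a[0])
    total = sum(
        math.hypot(a[x][y], b[x][y]) for x in range(rows) for y in range(cols)
    )
    return total / (rows * cols)


# Two pixels with magnitudes 5 and 10, two neutral pixels: mean is 15 / 4.
a = [[3.0, 0.0], [6.0, 0.0]]
b = [[4.0, 0.0], [8.0, 0.0]]
print(mean_chromaticity(a, b))
```

Comparing this value before and after processing gives the kind of figure reported per image in Table 1.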

5 Results

Table 1 shows the results of processing the original images. The luminance of the original images is not uniform, and the SSR and MSR models tend to overexpose the image to the light source. When the shadow detection algorithm is used to support the MSR algorithm, the chromaticity result is better than the SSR and MSR results. Although the results in this paper do not have photographic quality, the algorithm can be used in tasks such as tracking, segmentation, or face recognition, where shadows cause problems.

Chromaticity Improvement Using the MSR Model


Table 1. Results of MSR and shadow detection process

Results of chromaticity improvement using different models

Original image          SSR process             MSR process             MSR + shadow detection process
Chromaticity: 194.21    Chromaticity: 187.41    Chromaticity: 187.47    Chromaticity: 194.42
Chromaticity: 191.27    Chromaticity: 184.40    Chromaticity: 184.37    Chromaticity: 189.25
Chromaticity: 196.52    Chromaticity: 176.20    Chromaticity: 176.29    Chromaticity: 200.91
Chromaticity: 192.15    Chromaticity: 186.86    Chromaticity: 186.22    Chromaticity: 191.98

6 Conclusions

We have presented an algorithm that is able to estimate the shadow and reflectance intrinsic images, given the direction of the illumination of the scene. In essence, the algorithm gathers local evidence from color and intensity patterns in the image to determine where there is a shadow. The different variants of the Retinex algorithm have a problem when the illuminant over the image is not homogeneous. Shadows make the lighting non-homogeneous, and the Retinex algorithm treats a shadow as the effect of a second source of illumination rather than as the result of an obstacle blocking the main source of illumination. Combining the Retinex algorithm with the shadow detection algorithm improves the results in images with non-homogeneous illumination.

Acknowledgment. The authors thank the Instituto Politécnico Nacional de México and CONACyT for their support in this research work.


References

1. Barrow, H., Tenenbaum, J.: Recovering intrinsic scene characteristics from images. Comput. Vis. Syst. 2, 3–26 (1978)
2. Land, E.H., McCann, J.J.: Lightness and retinex theory. J. Opt. Soc. Am. 61, 1–11 (1971)
3. Finlayson, G.D., Hordley, S.D.: Color constancy at a pixel. J. Opt. Soc. Am. A 18, 253–264 (2001)
4. Graham, G.D., Drew, M.S., Funt, B.V.: Spectral sharpening: sensor transformations for improved color constancy. J. Opt. Soc. Am. 11, 1553–1563 (1994)
5. Blake, A.: Boundary conditions for lightness computation in Mondrian World. Comput. Vis. Graph. Image Process. 32, 314–327 (1985)
6. Ebner, M.: Color Constancy. Wiley-IS&T, Würzburg (2007)
7. Rizzi, A., Gatta, C.: From retinex to automatic color equalization: issues in developing a new algorithm for unsupervised color equalization. J. Electron. Imaging 13, 75–85 (2004)
8. Herscovitz, M., Yadid-Pecht, O.: A modified multi scale retinex algorithm with an improved global impression of brightness for wide dynamic range pictures. Mach. Vis. Appl. 15, 220–228 (2004)
9. Jobson, D.J., Rahman, Z., Woodell, G.A.: A multiscale retinex for bridging the gap between color images and the human observation of scenes. IEEE Trans. Image Process. 6(7), 965–976 (1997)
10. Oppenheim, A.V., Schafer, R.W., Stockham Jr., T.G.: Nonlinear filtering of multiplied and convolved signals. IEEE Trans. Audio Electroacoust. 16, 437–466 (1968)
11. Zhang, X., Wandell, B.A.: Colour image fidelity metrics evaluated using image distortion maps. Imaging Science and Technology Program, Department of Psychology (1998)
12. Schanda, J.: Colorimetry: Understanding the CIE System. Wiley-Interscience, New York (2007)
13. Fairchild, M.D.: Color Appearance Models, 2nd edn. Wiley, New York (2005)
14. Tsagaris, V., Ghirstoulas, G., Anastassopoulos, V.: A measure for evaluation of the information content in color images. In: IEEE International Conference on Image Processing (2005)

Digital Image Watermarking and Performance Analysis of Histogram Modification Based Methods

Tanya Koohpayeh Araghi

Computer Department, Iranian Social Security Organization, Arak, Iran

Abstract. Digital image watermarking is defined as inserting digital signals into a cover image such that the degradation of quality is minimized and most of the hidden data can be retrieved after geometric and signal processing distortions. To select an efficient digital image watermarking algorithm that fulfills criteria such as robustness, imperceptibility, and capacity, it is necessary to be aware of the specifications of the chosen method. Because the image histogram is independent of the position of the pixels, histogram modification based watermarking is an appropriate method against geometric and signal processing attacks. This paper investigates the histogram modification based image watermarking methods presented from 2010 to 2017, identifying their strengths and weaknesses in order to highlight which methods should be developed to enhance the performance of watermarking algorithms in terms of the mentioned criteria. Results show that techniques such as intelligent selection of adjacent bins, secret keys, and constant points of cover images make these methods good candidates for image watermarking that withstands geometric and signal processing attacks.

Keywords: Digital image watermarking · Histogram modification · Robustness · Imperceptibility

© Springer Nature Switzerland AG 2019 K. Arai et al. (Eds.): SAI 2018, AISC 858, pp. 631–637, 2019. https://doi.org/10.1007/978-3-030-01174-1_49

1 Introduction

Digital image watermarking is a branch of information hiding that protects images from illegitimate access and supports intellectual property. The need for research on watermarking techniques is clear, since digital media are easily and quickly accessible on the Internet and are very likely to be used illegally [1, 2]. However, even watermarked images can be misused by exposing them to affine transforms or signal processing attacks. In signal processing attacks the watermark is totally removed, while in geometric or affine transforms the hidden watermark can no longer be synchronized with the cover image and the watermark is distorted [3, 4]. Since the histogram of an image has the important property of being invariant to affine transformation, inserting a watermark into a host image by modifying its histogram is regarded as a good solution for increasing robustness in image watermarking. In this paper, the outstanding histogram modification based methods presented in recent years are investigated and classified according to their strengths and drawbacks, in order to highlight the vital factors in the design and implementation of new high performance algorithms in this area.

2 Image Histogram Specifications

The histogram of a gray image reflects the relation between each gray level of the image and the number of pixels at that level. This relation does not involve the zero-value pixels. Since the image histogram is a statistical concept, moving the location of the pixels has no effect on it. In other words, it is invariant to affine transformations. This property can be used for robustness against various geometric and signal processing attacks [2, 5]. For example, when a watermarked image is exposed to rotation and scaling attacks, only the image pixels are shifted, and the watermark information still remains in the picture [6, 7]. Hence, histogram modification is introduced as an invariant domain method for embedding the watermark. However, histogram based watermarking is limited with respect to histogram equalization, since this operation distorts the histogram shape [8–10]. Table 1 shows the general specifications of histogram based methods.

Table 1. Specifications of histogram modification methods

                          Strength points / Drawbacks
Method                    Applicability   Stability   Robustness
Histogram modification    ✓               ✓           ✓

While histogram schemes are very useful and highly applicable for enhancing image contrast, changing the contrast of the image distorts the histogram shape, which brings instability as a drawback and leads to a de-synchronization problem. An important question is therefore how to use the strengths of histograms to withstand geometric and signal processing attacks. In the following, prominent recent works that address this problem are investigated.
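The two properties discussed in this section can be checked on a toy image. The sketch below assumes an 8-bit gray image stored as a list of rows; a 90° rotation is used because it is an exact rearrangement of pixels, whereas rotations by arbitrary angles involve interpolation and preserve the histogram only approximately. The function names are hypothetical, and linear contrast stretching stands in for the contrast operations mentioned in the text.

```python
def histogram(image, levels=256):
    """Count how many pixels take each gray level."""
    counts = [0] * levels
    for row in image:
        for value in row:
            counts[value] += 1
    return counts


def rotate90(image):
    """Rotate the image 90 degrees clockwise (a pure rearrangement of pixels)."""
    return [list(col) for col in zip(*image[::-1])]


def stretch_contrast(image, levels=256):
    """Linearly stretch gray levels to the full range (changes the histogram bins)."""
    flat = [v for row in image for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [row[:] for row in image]
    return [[(v - lo) * (levels - 1) // (hi - lo) for v in row] for row in image]


image = [[10, 10, 20], [30, 20, 10]]
print(histogram(rotate90(image)) == histogram(image))
print(histogram(stretch_contrast(image)) == histogram(image))
```

The first comparison holds because rotation only moves pixels, while the second fails because contrast stretching moves pixel counts into different bins. This is the instability that the text identifies as the main drawback of histogram based embedding.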

3 Literature Review and Related Works

The work by He et al. [5] is based on histogram modification in gray scale images. The watermark information is embedded into the host image by changing the number of gray samples in the image histogram. Within a preset gray range, every three consecutive bins form a group that carries one watermark bit, which is embedded by moving samples from one bin to an adjacent bin. To avoid repeated alteration and to improve imperceptibility, only one bin in every two neighboring bins is altered. The modified histogram is then mapped to the watermarked image. Watermark detection is the inverse of embedding: the watermark is extracted by calculating and judging the relationship between the numbers of samples in three successive bins. This scheme suffers from the constraint that histogram equalization cannot be used to enhance the contrast of the image. In addition, it has a high probability of being exploited by attackers because of the small value of the pseudo random noise (PN) sequence that serves as the secret key in gray scale images.

Deng et al. [11] presented a digital image watermarking scheme that is robust against geometric distortions. The scheme has three phases. First, feature points are extracted with the Harris-Laplace detector and local circular regions (LCRs) are constructed. Second, clustering-based feature selection is used as a theoretical mechanism to select a set of non-overlapping LCRs, which are made invariant through orientation normalization. Third, the histogram and independent pixel position are computed from the chosen LCRs to embed the watermark. Despite offering sufficient robustness against signal processing and geometric attacks, this scheme suffers from low effective capacity for watermark embedding. In addition, its security and robustness need to be enhanced.

Pun and Yuan [12] described a feature-based extraction scheme using an adaptive Harris detector, in which several attacks are simulated and a response threshold is adjusted and ranked. The host image is exposed to simulated attacks in order to find consistent feature points for embedding the watermark bits. The regions around these points are then established as square areas. In each area, the intensity-level histogram is adapted by shifting a number of pixels to form a specific pattern corresponding to the watermark bit. To extract the watermark, the adaptive Harris detector is applied. When geometric attacks are encountered, the watermarked image is restored to its original location and the watermarked regions are retrieved.
Then, in these areas, a sequence of watermark bits is extracted according to the intensity-level histogram distribution. In this scheme capacity is independent of robustness, but increasing the capacity reduces imperceptibility.

Juang et al. [13] proposed a reversible watermarking scheme based on the correspondence between the values of neighboring wavelet coefficients. Most differences between two adjacent coefficients are close to zero, so the histogram is constructed from the statistics of these differences. Since many peak points are used for hiding the watermark, the capacity is higher than that of the other histogram equalization methods. Transparency is also improved, because the difference between neighboring coefficients is around zero. This technique is useful for military and medical applications because of its high transparency. However, like the other histogram equalization methods, it is affected by changes in image contrast.

In the work proposed by Divya and Kamalesh [14], the host image is first processed by a Gaussian filter in order to tackle geometric attacks such as cropping and random bending. The gray levels of the image are chosen by means of a secret key, and the histogram is then constructed. The watermark is embedded in the chosen pixel groups using the Fast Walsh-Hadamard method, which compensates for the side effects caused by Gaussian filtering and thereby enhances robustness. Security and robustness are provided by a secret key via a tree based parity check algorithm. The authors only described the methodology; they did not report results for performance metrics such as PSNR to show imperceptibility or NC to show robustness after exposing the watermarked image to attacks.

Hu and Wang [15] presented a histogram modification scheme that combines block dividing and histogram statistics to enhance imperceptibility. The histogram is calculated from the mean pixel values of the divided blocks, and to increase imperceptibility the mean square error is calculated in order to select blocks and modify them according to the HVS. By adjusting the block size, the scheme achieves a good trade-off between robustness to geometric attacks and robustness to common image processing attacks. Experimental results show outstanding robustness against geometric attacks and some signal processing attacks, such as added noise and JPEG compression. Although the scheme balances robustness against both signal processing and geometric attacks, the authors did not mention the strategy adopted for the security of the image.

Salunkhe and Kanse [16] proposed a histogram bin shifting technique that minimizes the distortion of the cover image in accordance with the size of the embedded watermark. The improvement is achieved by selecting the optimal embedding point in the frequency histogram of the cover image. As a result, fewer pixels need to be shifted while embedding the watermark, which, according to the presented experimental results, reduces the distortion of the cover image.
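To make the idea of histogram bin shifting more concrete, the following sketch shows the generic peak-bin and empty-bin variant of reversible histogram shifting on a flat list of gray values. It is not claimed to be the procedure of any of the reviewed papers, and the function names `embed` and `extract` are hypothetical. It assumes that an empty bin exists to the right of the peak bin and that the payload is no longer than the number of peak-valued pixels.

```python
def embed(pixels, bits, levels=256):
    """Embed bits by shifting the bins between the peak bin and an empty bin."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    peak = max(range(levels), key=lambda v: hist[v])
    zero = next((v for v in range(peak + 1, levels) if hist[v] == 0), None)
    if zero is None:
        raise ValueError("no empty bin to the right of the peak bin")
    if len(bits) > hist[peak]:
        raise ValueError("payload exceeds capacity (number of peak pixels)")
    payload = iter(bits)
    marked = []
    for p in pixels:
        if peak < p < zero:
            marked.append(p + 1)                 # make room next to the peak
        elif p == peak:
            marked.append(p + next(payload, 0))  # bit 1 moves the pixel, bit 0 keeps it
        else:
            marked.append(p)
    return marked, peak, zero


def extract(marked, peak, zero, n_bits):
    """Recover the bits and restore the original pixel values."""
    bits, restored = [], []
    for p in marked:
        if p == peak:
            bits.append(0)
            restored.append(peak)
        elif p == peak + 1:
            bits.append(1)
            restored.append(peak)
        elif peak + 1 < p <= zero:
            restored.append(p - 1)               # undo the shift
        else:
            restored.append(p)
    return bits[:n_bits], restored
```

In this variant the capacity equals the height of the peak bin and every pixel between the peak and the empty bin changes by one gray level, so the choice of embedding point directly controls how many pixels are shifted. This is the quantity that the text describes [16] as reducing.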

4 Comparison and Discussion

In this section the related works described above are compared in terms of important characteristics such as security, imperceptibility, robustness, and capacity; the related parameters are shown in Table 2. As the table shows, the major limitations of all the methods are sensitivity to image contrast, low capacity, and the dependence of imperceptibility on capacity. Each of the presented methods tried to increase its efficiency. The works that stand out in fulfilling the requirements of Table 1 are as follows. Salunkhe and Kanse used histogram bin shifting to minimize the dependence between capacity and imperceptibility. Divya and Kamalesh used a secret key to provide security and robustness. Hu and Wang used block dividing and histogram statistics to provide sufficient robustness. Pun and Yuan selected stable points by exposing host images to attacks, which increases robustness and makes capacity independent of robustness. He et al. modified the histogram of gray scale images to enhance robustness and imperceptibility. Table 2 shows the general specifications of the related works, while Table 3 and Fig. 1 show the attacks considered and the number of attacks that each proposed scheme withstands. According to Fig. 1, the methods proposed by He et al., Pun and Yuan, and Divya and Kamalesh have the highest robustness, withstanding 8 and 7 attacks.


Table 2. Comparison of methods Author/Year

Deng Juang (2010) (2012)

Impressed by image contrast Sufﬁcient security High capacity High imperceptibility Relativity of capacity & imperceptibility Relativity of capacity & robustness Sufﬁcient robustness

Yes

Watermark inserted in spatial/transform domain

Yes

He Pun & (2013) Yuan (2014) Yes Yes

Hu & Wang (2015) Yes

Divya and Kamalesh (2016) Yes

Salunkhe & Kanse (2017) Yes

No No Yes

Yes Yes Yes

No No Yes

Yes No Yes

No No Yes

Yes No Yes

No No Yes

Yes

Yes

Yes

Yes

No

Yes

Yes

Yes

Yes

Yes

No

Yes

Yes

No

Yes

Yes

Spatial

Hybrid (Dft + Lsb)

Not checked Spatial

No

Not Yes Yes checked Spatial Dwt Spatial Spatial

Table 3. Effectiveness of the methods against attacks Method proposed by:

Divya and Kamalesh (2016) Hu & Wang (2015) Pun & Yuan (2014) He (2013) Deng (2010)

Attacks (a: Salt & Pepper 0.01, b: Gaussian noise 0.01, c: Median ﬁltering 3 * 3, d: additive noise, e: random bending, f: Scaling ½, i: Rotation 50, j: Cropping ¼, k: Jpeg compression 50%, l: Low pass ﬁltering 3 * 3, m: Wiener ﬁlter, n: Translation) a b c d e f i j k l m n p p p p p p p p p p

p p p

p p p

p p p p

p p p p

p p p

p p

p p

p

p


Fig. 1. Robustness of the methods against attacks.

5 Conclusion

This paper presented an overview of histogram modification based methods for digital image watermarking that withstands geometric and signal processing attacks. Different techniques were investigated, and the most vital factors affecting the robustness, imperceptibility, capacity, and security of the algorithms were discussed on the basis of the strengths and weaknesses of each method. Experimental results show that histogram modification based methods suffer from vulnerabilities such as instability of the histogram shape under changes in image contrast. Nevertheless, techniques such as intelligently choosing the adjacent bins in which to embed the watermark, employing secret keys, and selecting the constant points of cover images by exposing them to attacks before watermark embedding make these methods good candidates for image watermarking that withstands geometric and signal processing attacks.

References

1. Araghi, T.K., et al.: Taxonomy and performance evaluation of feature based extraction techniques in digital image watermarking (2016)
2. Araghi, T.K., Manaf, A.B.A.: Evaluation of digital image watermarking techniques. In: International Conference of Reliable Information and Communication Technology, pp. 361–368 (2017)
3. Araghi, T.K., et al.: A survey on digital image watermarking techniques in spatial and transform domains (2016)
4. Araghi, S.K., et al.: Power of positive and negative thoughts extracted from EEG signals to find a biometric similarity. In: 6th SASTech 2012, Malaysia, Kuala Lumpur, 24–25 March 2012
5. He, X., et al.: A geometrical attack resistant image watermarking algorithm based on histogram modification. Multidimension. Syst. Signal Process. 26, 291–306 (2015)
6. Nasir, I., et al.: Robust image watermarking via geometrically invariant feature points and image normalisation. Image Process. IET 6, 354–363 (2012)
7. Licks, V., Jordan, R.: Geometric attacks on image watermarking systems. IEEE Multimedia 12, 68–78 (2005)
8. Xiang, S., et al.: Invariant image watermarking based on statistical features in the low-frequency domain. IEEE Trans. Circ. Syst. Video Technol. 18, 777–790 (2008)
9. Stark, J.A.: Adaptive image contrast enhancement using generalizations of histogram equalization. IEEE Trans. Image Process. 9, 889–896 (2000)
10. Araghi, T.K., Manaf, A.A.: Template based methods in image watermarking to avoid geometric attacks. In: International Conference on Research and Innovation in Computer Engineering and Computer Sciences (RICCES) (2018)
11. Deng, C., et al.: Local histogram based geometric invariant image watermarking. Sig. Process. 90, 3256–3264 (2010)
12. Pun, C.-M., Yuan, X.-C.: Histogram modification based image watermarking resistant to geometric distortions. Multimed. Tools Appl. 74, 7821–7842 (2015)
13. Juang, Y.-S., et al.: Histogram modification and wavelet transform for high performance watermarking. Math. Probl. Eng. 2012, 14 pages (2012)
14. Divya, M., Kamalesh, M.D.: Recovery of watermarked image from geometrics attacks using effective histogram shape based index. Indian J. Sci. Technol. 9 (2016)
15. Hu, X., Wang, D.: A histogram based watermarking algorithm robust to geometric distortions (2015)
16. Salunkhe, P.P., Kanse, Y.: Reversible image watermarking based on histogram shifting (2017)

A Cognitive Framework for Object Recognition with Application to Autonomous Vehicles

Jamie Roche, Varuna De Silva, and Ahmet Kondoz

Institute for Digital Technologies, Loughborough University London, London, UK
{A.J.Roche,V.D.De-Silva}@lboro.ac.uk

Abstract. Autonomous vehicles or self-driving cars are capable of sensing the surrounding environment so they can navigate roads without human input. Decisions are constantly made on sensing, mapping and driving policy using machine learning techniques. Deep Learning, which uses massive neural networks that exploit the power of parallel processing, has become a popular choice for addressing the complexities of real time decision making. This method of machine learning has been shown to outperform alternative solutions in multiple domains, and has an architecture that can be adapted to new problems with relative ease. To harness the power of Deep Learning, it is necessary to have large amounts of training data that are representative of all possible situations the system will face. To successfully implement situational awareness in driverless vehicles, it is not possible to exhaust all possible training examples. An alternative method is to apply cognitive approaches to perception for the situations the autonomous vehicles will face. Cognitive approaches to perception work by mimicking the process of human intelligence, thereby permitting a machine to react to situations it has not previously experienced. This paper proposes a novel cognitive approach for object recognition. The proposed cognitive object recognition algorithm, referred to as Recognition by Components, is inspired by psychological studies pertaining to early childhood development. The algorithm works by breaking down images into a series of primitive forms such as square, triangle, circle or rectangle, and uses memory based aggregation to identify objects. Experimental results suggest that the Recognition by Component algorithm performs significantly better than algorithms that require large amounts of training data.

Keywords: Object recognition · Recognition by component · Deep learning · One shot classification · Intelligent mobility · Autonomous vehicles

© Springer Nature Switzerland AG 2019 K. Arai et al. (Eds.): SAI 2018, AISC 858, pp. 638–657, 2019. https://doi.org/10.1007/978-3-030-01174-1_50

1 Introduction

Worldwide there are on average 3,287 road deaths a day. In the UK alone, from 1999 to 2010, there were more than 3 million road casualties [1]. Most recently, Transport for London (TfL) data show that in 2015, 25,193 casualties occurred at signal-controlled urban junctions [2, 3]. Increasing degrees of automation, from semi-manual to fully computer-controlled vehicles, are already being added to vehicles to improve safety, reduce the number of driver tasks, and improve fuel efficiency. It is expected that as autonomous vehicles become more ubiquitous, traffic incidents and fatalities will decrease [4, 5].

This research aims to reduce incident and casualty numbers on roads using smart autonomous systems. By applying an alternative classification method, this research will develop an improved method of computer perception and spatial cognition. The approach takes inspiration from mammal vision, early childhood development, multimodal sensor data, and spatial and social cognition. Actions such as vision, perception, and motivation assist in human development. They are derived from the internal mapping of external stimuli and the internal mapping of internally perceived stimuli [6]. Commonly referred to as Spatial Cognition and Perception, they are tools humans use to understand and navigate the world [7].

Currently, Deep Nets are a popular method of classification [8]. Their main shortcomings are the limited knowledge about the internal workings of artificial networks and the large quantities of training data required to classify objects accurately (i.e. supervised learning) [9]. As objects become more complex, the quantity of training data needed increases. Only after supervised learning has taken place can unsupervised learning begin. Unsupervised learning occurs when a machine relates an object to an event it has previously encountered before learning the new image [10]. Although improved learning methods and modern Graphic Processing Units (GPU) can bring forward the point at which unsupervised learning begins, the process is slow and relies heavily on resources.

To address the above issues this paper proposes a novel method for object classification utilizing Recognition by Component (RBC). RBC explains the cognitive approaches to perception that humans rely on to understand the surrounding world.
RBC denotes the ability to break complex images into a series of primitive forms such as squares, triangles, circles, or rectangles. Cross-correlating this information with the arrangement of the geometric primitives improves accuracy [11, 12]. This classification method, best viewed as reducing equivocation, relies on increasing the variety of data the machine is exposed to. The rest of this paper is organized as follows: Sect. 2 describes relevant literature and related work, Sect. 3 presents the proposed object recognition framework, Sect. 4 discusses the results, and Sect. 5 concludes the paper, with some references to future work.

2 Literature Review and Related Work

A review of the relevant literature and related work is presented in this section. It covers the following topics: Perception and Cognition, Childhood Development, Deep Learning, One Shot Classification, and Geometric Based Recognition.

2.1 Perception and Cognition

Cognition, or cognitive development, is how humans develop and understand their environment and the things they encounter. Researchers in neuroscience, cognition, and sociology have learned a great deal about how humans sense, interpret, and communicate information about the things they see [13, 14]. Revision of cognitive information is an ongoing process that adapts as humans age [15]. This implies that memories can be changed, some forgotten, and others out of reach at a certain time. How well humans utilize memories affects how they think and understand their surroundings.

Many cognitive pathways are employed to survey a visual scene before a judgment and an associated action are made. During this period, objects in sensory range are classified and identified. The posterior and occipital lobes are critical in linking the visual map with reality, and therefore in determining the location of objects [16]. Sound, the neck, and the extra-ocular muscles contribute to this ability to geo-locate [17]. These muscles and the auditory ability are responsible for maintaining the link between reality and visual perception [18]. For example, when the ears hear a sound, the head and eyes move with respect to the body, and synchronous input from both eyes is required to locate the object of interest. Once an object has been located, the visual input is compared to memories stored in the temporal lobes, bringing about recognition of the objects humans see.

Clearly, perception is the result of many senses working in parallel. For example, humans do not rely solely on vision to recognize people. Shape, movement, and other characteristics that are equally human contribute to classification [19]. Obstacles can be easily perceived and quickly acted on because of the importance humans assign to a response [20]. For example, when scared, humans experience an increase of adrenaline and noradrenaline in their body and brain.
Brain activity increases, and oxygen flow allowing mussels and organs to function at a faster rate so that people can move away from the feared object [21]. Not all mammals localize and identify objects in the same manner. Dolphins, Whales and Bats, for example, use echolocation in conjunction with their extra-ocular and ocular muscles to locate and identify objects they encounter [22]. Echolocation is listening for reflected sound waves from objects. The sounds generated are used to determine the position, size, structure and texture of the object [23]. While the process is similar to the echo humans hear, the term echolocation is mostly used for a select group of mammals that employ it on a regular basis [24] for environmental perception, spatial orientation and hunting [22, 25]. 2.2

Childhood Development

Research shows that approximately 50% of a person’s intellectual potential is reached by age four and that early life events have an extended effect on intellectual capacity, personality, and social behavior [26–29]. How children perceive and make sense of the world can be explored through the interpretation of objects they encounter. Although conditioning plays a crucial role in development, motivations such as ‘hunger’, a ‘desire to understand’ and other ‘basic instincts’ are equally important [30, 31].

A Cognitive Framework for Object Recognition


Lowenfeld and Edwards describe the stages of development in their research, as per Fig. 1 [32]. They hypothesized that children initially portray the world in a series of scribbles, enjoying kinesthetic activities that are merely manipulation of the environment [32, 33]. After passing through different iterations of the scribble stage, a child enters the schematic or landscape stage before eventually progressing to the realism stage [32, 34].

Fig. 1. Lowenfeld's and Edwards' stages of early childhood development [33].

The schematic or landscape stage of development is particularly useful for this research. During this time, a child uses shapes to describe complex images while only starting to discover perspective [35]. This is of vital importance and marks the point where a child arrives at a definitive way of portraying an object. Although the object will be modified with the addition of features when the child is trying to describe something familiar, the structure of the object will largely remain the same. This stage represents active knowledge of the subject and will contain order along a single line upon which all images sit [32].

2.3 Deep Learning

The structure of a Deep Net is largely the same as that of a Neural Net: there is an input layer, an output layer and connected hidden layers, as per Fig. 2 [36, 37]. The main function of such a Deep Network is to perceive an input before performing more complex calculations, resulting in an output that can be used to classify and solve a problem. Image based classification is predominantly used to categorize groups of objects using features that describe them [38]. There are many types of classifier: Logistic Regression, Support Vector Machines, Naive Bayes and Convolutional Neural Nets. When a classifier is activated it produces a score that is dependent on the weight and bias [39]. When a string of classifiers is placed in a layered web, it can be viewed as a Neural Net [40]. Each layer can be broken down into nodes that produce a score, which is passed on to the next layer, diffusing through the network before reaching the output layer. At this point the score generated by the nodes of each layer dictates the result of the classification. This process is repeated for each input into the net and is commonly referred to as feed forward propagation [40].

Fig. 2. A typical deep network layout showing weights, bias, feedforward and back propagation.

When a neural network is faced with a problem, the weights and bias that affect the output enable a prediction. Weights and bias are generated and influenced during training. When the output generated during feed forward propagation does not match the output that is known to be correct, the weights and bias change. As the net trains, the difference, often referred to as cost, is constantly reducing [41]. This is the whole point of training. The net becomes familiar with the features of the training data, adjusting the weights and bias until the predictions closely match the outputs that are known to be correct. As the problem to be solved becomes progressively more complex, Deep Neural Nets start to outperform standard classification engines. Unfortunately, as the problem becomes increasingly complex, the number of nodes within the layers grows, and the training becomes more expensive [42]. Deep Nets work around this issue by breaking objects down into a series of simpler patterns [36, 37, 41]. This important aspect of using features, edges and pixels to identify more complex patterns is what gives Deep Learning its strength and its vulnerability. For example, when learning human faces the Deep Net repeatedly maps a large region of an image onto a smaller output region until it reaches the end. The net result is a small change in output even though a large change in input has occurred. Networks that only ever make small changes do not have the opportunity to learn and never make the large change to the network that is required for autonomous decisions [43]. Consequently


“the gradients of the network’s output with respect to the parameters in the early layers become extremely small” [44]. Commonly referred to as the vanishing gradient problem, it is largely dependent on how the activation function squashes its inputs into a small output range in a non-linear manner [45]. In 2006 Hinton, Osindero and Teh published breakthrough work on the vanishing gradient problem [37, 46]. Thinking of the gradient as a hill and the training process as a wheel rolling down the hill, the wheel rolls fast along a surface with a large gradient and slowly along a surface with a small gradient. The same is true of a Deep Net: in the early layers of the net, where the gradient is small, progress is quite slow. Towards the end, where the gradient is much larger, the net learns at a much quicker rate [46]. This is a serious weakness, since the layers at the start of the net are responsible for identifying the simpler patterns and laying the building blocks of the image. If the layers at the start of the net perceive things incorrectly then the later layers will also get things wrong. To overcome this problem, a Deep Net that is learning starts by looking at the error to identify the weights that are affecting the output. The Net then attempts to reduce the error by changing those specific weights [47]. This process, known as back propagation, is used for training Deep Nets and, combined with the layer-wise pre-training introduced in [46], mitigates the issues created by the vanishing gradient problem [37, 41, 46].

2.4 One Shot Classification

Humans demonstrate a strong ability to recognize many different types of patterns. In particular, humans have an innate ability to comprehend unfamiliar concepts and to recognize many different variations on these concepts in future perception [48]. Unlike human learning, machine learning is computationally expensive, and although it has proven successful in a variety of applications (spam detection, speech and image recognition), the algorithms often falter when forced to make decisions with little supervised information. One particularly difficult task is classification under restriction, where predictions are made having observed only a single example [48–51]. Commonly referred to as One Shot Classification, this form of machine learning identifies “domain specific features” or “inference procedures” that have extremely discriminative properties for the classification task [52]. Consequently, machines that feature One Shot Classification excel at similar tasks, but fall short of providing reliable results for unfamiliar types of data.

2.5 Geometric Based Recognition

Falling somewhere between One Shot Learning and Deep Learning, Geometric based Recognition uses pre-defined metrics and some knowledge about the subject before making a decision about the objects perceived. To function effectively, Recognition by Component (RBC) requires an image to be segmented at regions of deep concavity, as per Fig. 3. This allows an image to be broken into an arrangement of simple geometric components: cubes, cylinders, prisms, etc. The theory, first proposed in 1987 by Irving Biederman, makes the fundamental assumption that humans segment objects of any form into 36 generalized components, called primitives [53].

Fig. 3. A schematic of the processes used to recognize an object, as proposed by Biederman [53].

For true identification, the position of the primitive is the key relationship between perceptual order and object recognition. This enables humans to reliably perceive an image at an obscure angle and still understand what is being observed [53]. If the image can be viewed from any orientation, the projection at that time can be regarded as two-dimensional. Objects, therefore, do not need to be presented as a whole, but can be represented as a series of simplified shapes, even if some parts are occluded [54, 55]. In addition to filling in the blanks for occluded sections of an object, humans are excellent at trying to make sense of the unknown. For example, when presented with unfamiliar objects humans easily recognize the primitives of which the image is composed, even if the overall image is not recognized [54, 55]. Biederman and others believed that humans perform this process on a regular basis [53, 55–58]. Therefore, humans rely on what the image is composed of rather than the familiarity of the image as a whole. This is a representational system that identifies elements of complex images to assist in human understanding and development [53]. This phenomenon of RBC allows humans to rapidly identify objects from obscure scenes, at peculiar angles and under noisy conditions [53, 55–58].

Deep concavities between primitives are identified using the surface characteristics of the overlapping parts. Non-accidental properties, shapes that look alike from certain angles, are distinguished by collinearity and symmetry of the primitive being observed [59]. Collinearity and symmetry play a vital role in identifying components, as does the orientation of the components. For example, a triangle on top of a square bears a striking resemblance to a house, whereas a square on a triangle makes little sense. Just as in Lowenfeld and Edwards' schematic stage of development, the components need to match the representation in memory both in shape and orientation [33].
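The house example above can be made concrete with a small sketch. It is only an illustration under stated assumptions: the `MEMORY` dictionary, the `recognise` function and the use of a single vertical coordinate are hypothetical and are not part of the RBC theory or of the framework proposed in this paper.

```python
# Hypothetical "memory" of structural descriptions: the key lists the
# component shapes from top to bottom, the value is the remembered object.
MEMORY = {
    ("triangle", "square"): "house",
}


def recognise(components):
    """Match components against memory by shape AND vertical order.

    components: list of (shape_name, y) pairs, with y increasing upwards.
    Returns the remembered object name, or None if nothing matches.
    """
    top_down = sorted(components, key=lambda item: item[1], reverse=True)
    ordered_shapes = tuple(shape for shape, _ in top_down)
    return MEMORY.get(ordered_shapes)


# A triangle above a square matches the stored description of a house.
house = recognise([("square", 0.0), ("triangle", 1.0)])

# The same two shapes in the opposite order match nothing in memory.
upside_down = recognise([("triangle", 0.0), ("square", 1.0)])
```

Here `house` is `"house"` and `upside_down` is `None`, which mirrors the point that the same components in a different orientation no longer match the stored representation.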


3 Proposed Object Recognition Framework

The previous sections discussed a commonly used method for object detection, Deep Learning. To address the problems identified with it, an alternative and novel approach to object recognition, RBC, is proposed, as depicted in Fig. 4.

Fig. 4. Proposed framework for RBC algorithm to classify circles, triangles, squares and rectangles.

The process of RBC requires the decomposition of complex patterns into basic geometric shapes or primitive forms. Difficulties arise when shapes are occluded or overlapping, as in Fig. 5. These issues can be resolved by identifying the watershed ridge line at areas of deep concavity between the individual components, as in Fig. 5. Once identified, the Euclidean distance between the geometric shapes can be computed before applying the individual component metrics. Figure 6 shows the corresponding watershed ridge lines for the same pixels displayed in Fig. 5. From Fig. 6, it is possible to determine the areas of concavity and the catchment basin of each geometric shape. Images must be in binary form to prepare for boundary tracing and to prevent inner contours from being identified. Boundary tracing of eroded images facilitates the identification of object properties (ratio of dimensions, roundness, area, etc.) before classification can occur. The segmented image output is equivalent to a book in which each page holds one geometric shape at its corresponding location in Fig. 5. It should be noted that the original image shown in Fig. 5 contained only four geometric shapes, yet there are 13 layers.
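As a rough illustration of how per-shape properties and the Euclidean distance between shapes can be obtained from a binary image, the sketch below labels connected regions in pure Python. It is a simplification under stated assumptions: it uses plain 4-connected labelling rather than the watershed transform, so touching or overlapping shapes would be merged into one region, and the function name `label_regions` and the toy image are hypothetical.

```python
import math
from collections import deque


def label_regions(binary):
    """Find 4-connected foreground regions in a binary image.

    binary: list of rows, each a list of 0/1 values.
    Returns one dict per region with its area (pixel count),
    centroid (row, col) and bounding box (min_row, min_col, max_row, max_col).
    """
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r0 in range(rows):
        for c0 in range(cols):
            if binary[r0][c0] != 1 or seen[r0][c0]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            pixels = []
            while queue:
                r, c = queue.popleft()
                pixels.append((r, c))
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    inside = 0 <= nr < rows and 0 <= nc < cols
                    if inside and binary[nr][nc] == 1 and not seen[nr][nc]:
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            rs = [p[0] for p in pixels]
            cs = [p[1] for p in pixels]
            regions.append({
                "area": len(pixels),
                "centroid": (sum(rs) / len(rs), sum(cs) / len(cs)),
                "bbox": (min(rs), min(cs), max(rs), max(cs)),
            })
    return regions


# Toy binary image with two separate shapes (illustrative data only).
image = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1],
]
regions = label_regions(image)

# Euclidean distance between the centroids of the two shapes.
distance = math.dist(regions[0]["centroid"], regions[1]["centroid"])
```

For this toy image the first region is the 2-by-2 block (area 4) and the second is the 2-by-3 block (area 6). A watershed step would be needed before this one whenever shapes touch.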


Fig. 5. Image composed of multiple primitives overlapping and touching.

Fig. 6. Image composed of primitives showing the watershed ridge lines.

These additional layers arise from irregularities within the original image. They can be viewed as valuable information if looking to identify the background, or as noise to be filtered at a later point in the process. The function “minboundrect”, developed by John D’Errico, generates the smallest rectangular bounding box that contains the primitive perimeter [60]. The orientation of a primitive influences the size of its bounding box and affects the accuracy in identifying the primitive in question. Consequently, the bounding box surrounding the primitive needs to be rotated by some angle to make the object axis parallel to the horizontal axis of the image [61]. Only then is it possible to calculate the metrics of the shape bounded by the box. There are a variety of image quantities and features that facilitate the identification of the geometric shape. One of the returned properties, centroid, generates a 1-by-Q vector that specifies the center of mass for each primitive. Additional useful properties for identifying shapes are area and perimeter. When used in conjunction with the function regionprops, area returns a scalar that specifies the number of pixels inside the region of interest. Perimeter is determined in a similar manner to the area, but focuses on the individual pixels around the shape rather than on what is inside the boundary. The distance between each adjoining pair of pixels on the primitive boundary is calculated and summed, returning a single value similar to the area scalar. For two-dimensional shapes, area and perimeter provide vital information. These properties allow for the calculation of certain metrics that distinguish the different geometric shapes and facilitate recognition.

3.1 Component Metrics

Humans recognize that simple geometric shapes are often categorized into basic classes such as square, rectangle, circle or triangle. However, most shapes frequently encountered are more complex, and are typically composed of a combination of the 36 primitive forms [53, 55–58]. If machines are to develop cognition to perceive the world as a whole, they first need to be able to understand the simple primitives. To date our research has focused on the classification of four different primitives: circle, triangle, square and rectangle. There are many methods to classify a circle, most of which rely on the radius being calculated. Since the method stated above does not produce a radius for any of the primitives, we need an alternative method of describing a circle based solely on area A and perimeter C:

\[ A = \pi r^{2} \tag{1} \]

\[ C = 2 \pi r \tag{2} \]

Looking for a solution that utilizes area and perimeter independently of radius, Eqs. 1 and 2 must be rewritten to isolate the radius, as follows:

\[ \sqrt{A/\pi} = r \tag{3} \]

\[ C/(2\pi) = r \tag{4} \]

The result is two expressions equal to r, neither of which contains r on its left-hand side. Since both expressions equal r, (3) can be substituted for r in (4), giving:

\[ C/(2\pi) = \sqrt{A/\pi} \tag{5} \]

It is possible to simplify further by squaring both sides to get:

\[ C^{2}/(4\pi^{2}) = A/\pi \tag{6} \]

Multiplying both sides of (6) by 4π² returns:

\[ C^{2} = 4 A \pi \tag{7} \]

Since C has units of length and A has units of area, C needs to be squared for the two sides to be dimensionally consistent. To confirm this, we can set the radius in (1) and (2) to 1 and substitute the results into (7) to find:

\[ C^{2} = 4 A \pi \;\Rightarrow\; (2\pi)^{2} = 4 \pi \pi \tag{8} \]

which can be rearranged as:

\[ \frac{C^{2}}{4 A \pi} = 1 \tag{9} \]
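The circle metric of Eq. (9) can be checked numerically on ideal shapes. The sketch below is illustrative only: the function name `circle_metric` and the chosen dimensions are assumptions, and real pixel-based perimeter estimates would deviate somewhat from these exact values.

```python
import math


def circle_metric(area, perimeter):
    """Eq. (9): C^2 / (4 * A * pi), which equals 1 for an ideal circle."""
    return perimeter ** 2 / (4 * area * math.pi)


# Ideal circle of radius 3: A = pi * r^2 and C = 2 * pi * r.
r = 3.0
circle = circle_metric(math.pi * r ** 2, 2 * math.pi * r)

# Ideal square of side 2: A = s^2 and C = 4 * s, so the metric is 4 / pi.
s = 2.0
square = circle_metric(s ** 2, 4 * s)

# Ideal equilateral triangle of side 2: A = sqrt(3) / 4 * s^2 and C = 3 * s,
# so the metric is 9 / (pi * sqrt(3)).
triangle = circle_metric(math.sqrt(3) / 4 * s ** 2, 3 * s)
```

The circle gives 1, the square about 1.273 and the equilateral triangle about 1.654, independent of size. This gap is what makes a threshold on the metric usable for separating circles from the other primitives.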

For a triangle, the process relies on comparing the bounding box area and the primitive area. If the ratio between the two is approximately one half, the shape is identified as a triangle and not a circle:

\[ A_{t}/A_{b} = 0.5 \tag{10} \]

where A_t is the primitive area and A_b is the bounding box area. Focusing solely on the aspect ratio of the bounding box (height to width), a square is distinguished from all other primitive forms when the ratio is equal to one:

\[ H_{b}/W_{b} = 1 \tag{11} \]

Although relatively simple, identifying shapes in this manner raises a complication. If both the square metric and the circle metric return a value close to 1, the shape could be classified incorrectly. This problem can be addressed using hierarchical conditions, with the added benefit of classifying a rectangle at the same time.
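One possible arrangement of the hierarchical conditions is sketched below. The paper does not state the order of the tests or the tolerances used, so the ordering (circle, then triangle, then square, otherwise rectangle), the tolerance `tol` and the function name `classify` are all assumptions made for illustration.

```python
import math


def classify(area, perimeter, box_w, box_h, tol=0.1):
    """Classify a primitive from its area, perimeter and bounding box.

    The tests run in a fixed order so that a circle, whose bounding box
    is also square, is caught before the aspect ratio test is reached.
    The order and the tolerance are illustrative assumptions.
    """
    circle_metric = perimeter ** 2 / (4 * area * math.pi)  # Eq. (9)
    fill_ratio = area / (box_w * box_h)                    # Eq. (10)
    aspect_ratio = box_h / box_w                           # Eq. (11)

    if abs(circle_metric - 1.0) <= tol:
        return "circle"
    if abs(fill_ratio - 0.5) <= tol:
        return "triangle"
    if abs(aspect_ratio - 1.0) <= tol:
        return "square"
    return "rectangle"


# Ideal shapes with their minimal bounding boxes (illustrative values).
circle = classify(math.pi * 10 ** 2, 2 * math.pi * 10, 20, 20)
triangle = classify(0.5 * 10 * 8, 10 + 8 + math.hypot(10, 8), 10, 8)
square = classify(5 * 5, 4 * 5, 5, 5)
rectangle = classify(10 * 4, 2 * (10 + 4), 10, 4)
```

Testing the circle metric first resolves the ambiguity noted above: an ideal square has a circle metric of about 1.27, so it fails the first test and is only labelled a square by the aspect ratio test. Anything that passes none of the first three tests falls through to rectangle.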

4 Experimental Results and Discussion

In this section, we evaluate the performance of the proposed RBC methodology against the more common method of recognition, Deep Learning. It should be noted that the data used to test the two recognition methods were of different types.

4.1 Recognition by Deep Networks

The Rasmus Berg Palm Deep Learning Toolbox in MATLAB was used to train a Deep Net with the MNIST dataset. The Modified National Institute of Standards and Technology (MNIST) database consists of centered and normalized handwritten digit images, 60,000 examples for training and 10,000 examples for testing, each measuring 28 pixels wide by 28 pixels high. Each pixel of each image is represented by a value between 0 and 255, where 0 is black, 255 is white and anything in between is a different shade of grey. When a machine must decide whether an image contains a digit of interest, a Deep Net uses features and edges to detect different parts of the number: the whips, curve, length and crown. The accuracy of the Deep Net was found to depend on the total number of images presented to the Deep Net during training, as per Fig. 7.

Fig. 7. The rate plotted against the number of images used to train the deep network.

During training, 60,000 images are broken down into a feature, edge and pixel layer. To test the Deep Net accuracy, the above process is repeated with a further 10,000 images from the same dataset. This is an arduous process that only returns positive results when the training set is sufficiently large. It was found that the error fell below 10% when the Deep Net was presented with 32,000 or more training images. Below this, the error increased and the accuracy of the Net reduced dramatically. To achieve accuracy above 99%, all 60,000 digit images were required to train the Deep Net (see Table 1). Observations of the Deep Net’s response to hand drawn digit images that were not part of the MNIST dataset were also made (see Table 2). Typically, the predicted output matched the user’s input with an accuracy of between 70% and 80%. For example, a predicted output of 5 was generated with 90% confidence when the Deep Net was presented with a hand drawn digit image of the number 8, as per Fig. 8. In another example, a predicted output of 1 was generated with 50% confidence when the Deep Net was presented with a hand drawn digit image of the number 2, as per Fig. 8. The results vary depending on the person drawing the digit image, the image type and how closely the image matches the training data. Table 2 shows the observations for the predicted value and the corresponding accuracy.
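The training loop behind such results (feed forward, compare with the known output, adjust weights and bias, as described in Sect. 2.3) can be illustrated at a very small scale. The sketch below trains a single sigmoid neuron on the logical OR function. It is not the toolbox used in this work: the quadratic cost, the learning rate, the number of epochs and all names are assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    """Logistic activation, mapping any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, epochs=2000, rate=0.5):
    """Batch gradient descent on one sigmoid neuron with quadratic cost."""
    w1 = w2 = b = 0.0
    costs = []
    for _ in range(epochs):
        cost = 0.0
        g1 = g2 = gb = 0.0
        for (x1, x2), target in samples:
            out = sigmoid(w1 * x1 + w2 * x2 + b)   # feed forward
            err = out - target                     # mismatch with known output
            cost += 0.5 * err ** 2
            delta = err * out * (1.0 - out)        # gradient through the sigmoid
            g1 += delta * x1
            g2 += delta * x2
            gb += delta
        # Move weights and bias against the gradient to reduce the cost.
        w1 -= rate * g1
        w2 -= rate * g2
        b -= rate * gb
        costs.append(cost)
    return (w1, w2, b), costs


# Logical OR: the output should be 1 unless both inputs are 0.
samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
(w1, w2, b), costs = train(samples)
predictions = [round(sigmoid(w1 * x1 + w2 * x2 + b)) for (x1, x2), _ in samples]
```

The list `costs` records the cost at each epoch, so comparing `costs[0]` with `costs[-1]` shows the reduction that training is meant to achieve. A Deep Net repeats the same idea across many layers and nodes, which is why it needs far more data and computation.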


Table 1. A confusion matrix showing the accuracy of the trained Deep Net. Rows give the output class and columns give the target class. The final column is the percentage of each output class that was correct, the final row is the percentage of each target class that was correct, and the bottom-right value is the overall accuracy.

Output                          Target class                       Correct
class          1    2    3    4    5    6    7    8    9   10      (%)
1            511    0    1    0    0    0    1    0    0    0      99.6
2              0  501    0    0    0    0    0    1    0    1      99.6
3              0    0  495    0    0    0    0    0    0    0      100
4              0    0    0  493    0    0    0    0    0    0      100
5              0    0    0    0  508    1    0    3    0    0      99.2
6              0    0    0    1    0  492    0    0    0    0      99.8
7              2    0    0    1    0    0  496    0    0    0      99.4
8              0    0    0    0    0    2    0  497    0    0      99.6
9              0    0    0    0    0    0    0    0  494    0      100
10             0    0    0    0    0    1    0    0    0  493      99.8
Correct (%) 99.6  100 99.8 99.6  100 99.2 99.8 99.2  100 99.8      99.7

Diagonal cells as a share of all test images: 10.2%, 10.0%, 9.9%, 9.9%, 10.2%, 9.8%, 10.0%, 9.9%, 9.9% and 9.9% for classes 1 to 10. Every off-diagonal cell is 0.0%.
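The summary percentages in Table 1 can be recomputed from the cell counts. In the sketch below the counts are taken from Table 1, with the positions of the off-diagonal entries reconstructed from the extracted table; that reconstruction, and the variable names, are assumptions.

```python
# Rows are output classes 1..10, columns are target classes 1..10.
CONFUSION = [
    [511, 0, 1, 0, 0, 0, 1, 0, 0, 0],
    [0, 501, 0, 0, 0, 0, 0, 1, 0, 1],
    [0, 0, 495, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 493, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 508, 1, 0, 3, 0, 0],
    [0, 0, 0, 1, 0, 492, 0, 0, 0, 0],
    [2, 0, 0, 1, 0, 0, 496, 0, 0, 0],
    [0, 0, 0, 0, 0, 2, 0, 497, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 494, 0],
    [0, 0, 0, 0, 0, 1, 0, 0, 0, 493],
]

n = len(CONFUSION)
total = sum(sum(row) for row in CONFUSION)
correct = sum(CONFUSION[i][i] for i in range(n))

# Overall accuracy: correctly classified images over all images.
accuracy = correct / total

# Per output class: share of predictions of that class that were right.
precision = [CONFUSION[i][i] / sum(CONFUSION[i]) for i in range(n)]

# Per target class: share of images of that class that were found.
recall = [CONFUSION[i][i] / sum(row[i] for row in CONFUSION) for i in range(n)]
```

With these counts `total` is 4995 and `accuracy` rounds to 99.7%, in agreement with the bottom-right cell of Table 1. The lists `precision` and `recall` correspond to the final column and final row respectively.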

4.2 Recognition by Component

To test the accuracy of the RBC algorithm, a dataset of 8 synthetic primitives was generated. The dataset consisted of five triangles, one square, one circle and one rectangle, as per Fig. 9. In all cases the RBC algorithm accurately identified all synthetic shapes. Figure 10 shows the results of the combined watershed and shape recognition RBC algorithm. The resulting matrix contains integers of different values, displayed as different colors. The accurately identified shapes were labelled with the class and metrics that define the primitive: C = circle metrics, S = square metrics and T = triangle metrics. Note that the irregularities within the original image generate additional artefacts that the RBC algorithm attempts to classify as shapes. When faced with Fig. 9, the RBC algorithm returned 8 positive results and 9 false positives. As noted, the false positives are an unwanted quantity that can either be used to identify the background or be filtered out, depending on requirements. Table 3 shows the confusion matrix for the RBC algorithm when tested with 22 different primitives of four different classes. In all cases the geometric components were identified with high levels of accuracy.

Observations of the RBC algorithm’s response to data captured by a monocular imaging sensor were also made. The dataset consisted of one triangle, one square and one rectangle, as per Fig. 11. The triangle, square and rectangle were identified with 98%, 94% and 100% accuracy respectively. Irregularities within the original image generate additional artefacts that the RBC algorithm classifies as shapes. The additional artefacts and the components of interest can be seen in Fig. 12.

Fig. 8. Hand drawn digit images not contained in the MNIST dataset that were presented to the trained Deep Net. Note the expected output did not always match the hand drawn number correctly.

Table 2. Observations of the Deep Net’s response to hand drawn digit images that were not part of the MNIST dataset

Digit (not part of the MNIST dataset)   Predicted output   Accuracy
Digit 1                                 1                  94.64%
Digit 2                                 1                  50.70%
Digit 3                                 7                  62.36%
Digit 4                                 4                  96.60%
Digit 5                                 5                  98.58%
Digit 6                                 7                  29.56%
Digit 7                                 7                  89.77%
Digit 8                                 5                  90.48%
Digit 9                                 7                  52.18%
Digit 0                                 0                  97.87%

Fig. 9. A synthetic dataset developed for testing the RBC algorithm.

Fig. 10. Primitives recognized by the RBC algorithm using a synthetic dataset.

Table 3. Confusion matrix for proposed RBC algorithm. A total of 22 primitives were used to test the algorithm

Class       Square       Circle       Triangle     Rectangle
Square      4 (94.30%)   4.60%        1.10%        0.00%
Circle      4.02%        9 (88.20%)   7.78%        0.00%
Triangle    6.90%        4.10%        4 (88.90%)   0.00%
Rectangle   0.00%        0.00%        0.00%        5 (100%)


Fig. 11. A triangle, square and rectangle captured by a monocular imaging sensor and used to test the RBC algorithm.

Fig. 12. The additional artefacts and the components of interest identiﬁed by the RBC algorithm.

4.3 Limitations of the Current Method

During the course of this research a number of limitations of the proposed framework were identified. Firstly, the proposed method works only on a single frame, and does not utilize the diversity offered by temporal redundancies. Secondly, RBC is best applied to fused perception data captured using Light Detection and Ranging (LiDAR) and a monocular image sensor. Although it is possible to apply RBC to monocular sensor data alone, the information redundancy of fused data removes the possibility of irregularities being falsely identified as a component of interest. Finally, the proposed framework is the first step towards object recognition using RBC. General density estimation models for a fixed set of fundamental objects will need to be developed before objects other than basic primitives can be recognized. Once fundamental models have been identified, it is envisaged that the RBC algorithm will adapt the original density estimation models to form new classes. We will consider the above problems and address the limitations in our future work.

5 Conclusion and Further Work

This paper illustrates some of the shortcomings of Deep Learning methods when applied to systems that do not have the luxury of massive amounts of training data. To address the situational awareness of autonomous vehicles (or similar systems), which require algorithms to react to new situations for which they were not trained, we propose a novel cognitive approach for object recognition. The proposed method, named Recognition by Components (RBC), is inspired by early childhood psychology and is shown to be more practical to use, without the need for large amounts of training data. To facilitate RBC, a method identifying the watershed ridge line between adjoining primitive forms was explored. Unlike traditional methods of machine learning, this approach mimics early childhood development and the multi-sensing methods mammals frequently use to recognize objects. The preliminary results presented in this paper indicate that the proposed method is capable of learning with small amounts of data. Planned future work includes the development of a sensor fusion framework to include multiple cameras, radar scanners and ultrasound scanners. Furthermore, methods for robust free space detection based on the data fusion framework will be investigated and refined using our RBC algorithm to identify objects in real world scenarios.

References

1. BBC: Every death on every road in Great Britain from 1999 to 2010, 2 April 2011. http://www.bbc.co.uk/news/uk-15975564
2. O.N.S.: Cycling to Work in London. G. L. Authority, London (2011)
3. Copsey, S.: A Review of Accidents and Injuries to Road Transport Drivers. EU-OSHA, Luxembourg (2012)
4. Bernini, N., Bertozzi, M., Castangia, L., Patander, M., Sabbatelli, M.: Real-time obstacle detection using stereo vision for autonomous ground vehicles: a survey. In: 17th International IEEE Conference on Intelligent Transportation Systems (ITSC), pp. 873–878 (2014)
5. Sivaraman, S., Trivedi, M.M.: Looking at vehicles on the road: a survey of vision-based vehicle detection, tracking, and behavior analysis. IEEE Trans. Intell. Transp. Syst. 14, 1773–1795 (2013)
6. Dolins, F., Mitchell, R.: Spatial Cognition, Spatial Perception: Mapping the Self and Space. Cambridge University Press, Cambridge (2010)
7. Fleming, R.: Visual perception of materials and their properties. Vision. Res. 94, 62–75 (2014)
8. Kim, J.H., Yang, W., Jo, J., Sincak, P., Myung, H.: Robot Intelligence Technology and Applications. Springer, Cham (2015)
9. Suresh, S., Sundararajan, N., Savitha, R.: Supervised Learning with Complex-valued Neural Networks. Springer, Heidelberg (2012)
10. Baruque, B.: Fusion Methods for Unsupervised Learning Ensembles. Springer, Heidelberg (2010)
11. Chen, Y., Jahanshahi, M., Manjunatha, P., Gan, W., Abdelbarr, M., Masri, S., et al.: Inexpensive multimodal sensor fusion system for autonomous data acquisition of road surface conditions. IEEE Sens. J. 16, 7731–7743 (2016)
12. Vantsevich, V.V., Blundell, M.: Advanced Autonomous Vehicle Design for Severe Environments. IOS Press, Amsterdam (2015)
13. Burack, J.: The Oxford Handbook of Intellectual Disability and Development, 2nd edn. Oxford University Press, Oxford (2012)
14. Waller, D., Nadel, L.: Handbook of Spatial Cognition. American Psychological Association, Washington, DC (2013)
15. Chowdhury, R., Sharot, T., Wolfe, T., Düzel, E., Dolan, R.: Optimistic update bias increases in older age. Psychol. Med. 44, 2003–2012 (2014)
16. Badcock, J.: The cognitive neuropsychology of auditory hallucinations: a parallel auditory pathways framework. Schizophr. Bull. 36, 576–584 (2010)
17. Wade, N.: Pioneers of eye movement research. i-Perception 1, 33–68 (2010)
18. Dutton, G.: Cognitive vision, its disorders and differential diagnosis in adults and children: knowing where and what things are. Eye 17, 289–304 (2003)
19. Yun, J., Lee, S.: Human movement detection and identification using pyroelectric infrared sensors. Biomed. Sens. Syst. 14, 24 (2014)
20. Monaco, S., Buckingham, G., Sperandio, I., Crawford, J.: Perceiving and acting in the real world: from neural activity to behavior. Front. Hum. Neurosci. 10, 179 (2016)
21. Martin, E.: Concise Colour Medical Dictionary, 3rd edn. Oxford University Press, Oxford (2002)
22. Thomas, J., Moss, C., Vater, M.: Echolocation in Bats and Dolphins. The University of Chicago Press, Chicago (2004)
23. Gudra, T., Furmankiewicz, J., Herman, K.: Bats sonar calls and its application in sonar systems. In: Sonar Systems. InTechOpen (2011)
24. Surlykke, A., Nachtigall, P., Fay, R., Popper, A.: Biosonar. Springer, New York (2014)
25. Akademiya-nauk, Airapetyants, S.O.B., Konstantinov, E.S., Ivanovich, A.: Echolocation in Animals. IPST, Jerusalem (1973)
26. UNICEF: Early Childhood Development: the key to a full and productive life. UNICEF (2014)
27. H. C. Council: Supporting Children with Dyslexia. Taylor & Francis (2016)
28. Arden, R., Trzaskowski, M., Garfield, V., Plomin, R.: Genes influence young children’s human figure drawings and their association with intelligence a decade later. Psychol. Sci. 25, 1843–1850 (2014)
29. Miles, S., Fulbrook, P., Mainwaring-Mägi, D.: Evaluation of Standardized Instruments for use in universal screening of very early school-age children. J. Psychoeduc. Assess. 36(2), 99–119 (2016)
30. Beck, R.: Motivation: Theories and Principles, 5th edn. Pearson Prentice Hall, Upper Saddle River (2004)
31. Ford, M.: Motivating Humans: Goals, Emotions, and Personal Agency Beliefs. Sage Publications, Newbury Park (1992)
32. Twigg, D., Garvis, S.: Exploring art in early childhood education. Int. J. Arts Soc. 5, 12 (2010)
33. Löwenfeld, V., Brittain, W.: Creative and Mental Growth. Macmillan, New York (1964)
34. Edwards, B.: Drawing on the Right Side of the Brain: A Course in Enhancing Creativity and Artistic Confidence. Souvenir Press, London (2013)
35. Siegler, R., Jenkins, E.: How Children Discover New Strategies. Taylor & Francis, London (2014)
36. Arel, I., Rose, D., Karnowski, T.: Deep machine learning-a new frontier in artificial intelligence research. IEEE Comput. Intell. Mag. 5, 13–18 (2010)
37. Heaton, J.: Artificial Intelligence for Humans: Deep Learning and Neural Networks. Heaton Research, Incorporated, St. Louis (2015)
38. Knerr, S., Personnaz, L., Dreyfus, G.: Single-layer learning revisited: a stepwise procedure for building and training a neural network. In: Soulié, F.F., Hérault, J. (eds.) Neurocomputing: Algorithms, Architectures and Applications, pp. 41–50. Springer, Heidelberg (1990)
39. Iliadis, L., Papadopoulos, H., Jayne, C.: Engineering Applications of Neural Networks. Springer (2013)
40. Roli, F., Kittler, J.: Multiple Classifier Systems. Springer, Heidelberg (2003)
41. Nielsen, M.: Neural Nets and Deep Learning. Determination Press (2017)
42. Sgurev, V., Hadjiski, M.: Intelligent Systems: From Theory to Practice. Springer, Heidelberg (2010)
43. Raidl, G.: Applications of Evolutionary Computing. Springer, Essex (2003)
44. Graves, A.: Supervised Sequence Labelling with Recurrent Neural Networks. Springer, Heidelberg (2012)
45. Bengio, Y., Simard, P., Frasconi, P.: Learning long-term dependencies with gradient descent is difficult. IEEE Trans. Neural Netw. 5, 157–166 (1994)
46. Hinton, G., Osindero, S., Teh, Y.: A fast learning algorithm for deep belief nets. Neural Comput. 18, 1527–1554 (2006)
47. Géron, A.: Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. O’Reilly Media, Sebastopol (2017)
48. Lake, B., Salakhutdinov, R., Gross, J., Tenenbaum, J.: One-shot learning of simple visual concepts. In: Proceedings of the 33rd Annual Conference of the Cognitive Science Society, Boston, Massachusetts, USA (2011)
49. Fei-Fei, L., Fergus, R., Perona, P.: A Bayesian approach to unsupervised one-shot learning of object categories. Presented at the Proceedings of the Ninth IEEE International Conference on Computer Vision, vol. 2 (2003)
50. Fei-Fei, L., Fergus, R., Perona, P.: One-shot learning of object categories. IEEE Trans. Pattern Anal. Mach. Intell. 28, 594–611 (2006)
51. Palatucci, M., Pomerleau, D., Hinton, G., Mitchell, T.M.: Zero-shot learning with semantic output codes. Presented at the Proceedings of the 22nd International Conference on Neural Information Processing Systems, Vancouver, British Columbia, Canada (2009)
52. Koch, G., Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. In: International Conference on Machine Learning, Lille, France (2015)
53. Biederman, I.: Recognition-by-components: a theory of human image understanding. Psychol. Rev. 94, 115–147 (1987)
54. Ronald, E.: Patterns of identity: hand block printed and resist-dyed textiles of rural Rajasthan, Ph.D., De Montfort University (2012)
55. Tversky, B., Hemenway, K.: Objects, parts, and categories. J. Exp. Psychol. Gen. 113, 169–197 (1984)
56. Binford, T.: The Vision Laboratory. M.I.T. Project MAC Artificial Intelligence Laboratory, Cambridge (1970)
57. Brooks, R.A.: Symbolic reasoning among 3-D models and 2-D images. Artif. Intell. 17, 285–348 (1981)
58. Guzman, A.: Analysis of Curved Line Drawings Using Context and Global Information. University of Edinburgh Press, Edinburgh (1971)
59. Marr, D., Nishihara, H.: Representation and Recognition of the Spatial Organization of Three Dimensional Shapes. Massachusetts Institute of Technology, Artificial Intelligence Laboratory, Cambridge (1977)
60. D’Errico, J.: A suite of minimal bounding objects. In: Tools to Compute Minimal Bounding Circles, Rectangles, Triangles, Spheres, Circles. [Program], MathWorks (2014)
61. Rege, S., Memane, R., Phatak, M., Agarwal, P.: 2d geometric shape and color recognition using digital image processing. Int. J. Adv. Res. Electr. Electron. Instrum. Eng. 2, 8 (2013)

Adaptive Piecewise and Symbolic Aggregate Approximation as an Improved Representation Method for Heat Waves Detection

Aida A. Ferreira1(B), Iona M. B. Rameh Barbosa1, Ronaldo R. B. Aquino1, Herrera Manuel2, Sukumar Natarajan3, Daniel Fosas3, and David Coley3

1 Federal Institute of Pernambuco, Recife, Brazil {aidaferreira,ionarameh}@recife.ifpe.edu.br, [email protected]
2 University of Cambridge, Cambridge, UK [email protected]
3 University of Bath, Bath, UK {s.natarajan,dfdp20,d.a.coley}@bath.ac.uk

Abstract. Mining time series has attracted increasing interest due to its wide applications in finance, industry, biology, environment, and so on. In order to reduce execution time and storage space, many high-level representations or abstractions of the raw time series data have been proposed, including Discrete Fourier Transform (DFT), Discrete Wavelet Transform (DWT), Piecewise Aggregate Approximation (PAA) and Symbolic Aggregate approXimation (SAX). In this paper, we introduce a novel adaptive piecewise and symbolic aggregate approximation (APAA/ASAX) which creates segments of variable length in order to automatically adapt each segment length to its local variability and to the difference from the average of the values over which the segment is defined. The average of each variable-length segment from APAA is represented as a symbol from an ordered alphabet, generating a modified version of SAX called adaptive SAX (ASAX). This straightforwardly allows a more versatile definition of event duration. The APAA/ASAX method was used for locating heat wave patterns in a real-world time series dataset of daily temperature information from 1970 until 2009. The experimental results show that the APAA/ASAX representation was able to locate heat wave events in a huge database. The advantages of APAA over traditional PAA stem mainly from being free of a fixed segment length scheme. APAA also self-tunes this length depending on local time series characteristics. This means that for flat time series APAA proposes fewer segments, and so a stronger dimensionality reduction, than for time series of high variability. The approach will be of use to those looking for extreme events in any time series.

Keywords: Mining time series · Piecewise aggregate approximation · Symbolic aggregate approximation · Extreme weather events

© Springer Nature Switzerland AG 2019 K. Arai et al. (Eds.): SAI 2018, AISC 858, pp. 658–671, 2019. https://doi.org/10.1007/978-3-030-01174-1_51

1 Introduction

A time series is a sequence of data points indexed (listed or graphed) in time order, commonly taken at successive, equally spaced points in time. Time series data can be defined by its specific characteristics. For large or high-dimensional cases, time series data mining stems from the desire to reproduce our natural ability to visualize the shape of data. Humans rely on complex schemes in order to perform such tasks. We can avoid focusing on small fluctuations in order to derive a notion of shape and identify almost instantly similarities between patterns on various time scales. Major time-series-related tasks include query by content [1], anomaly detection [2], motif discovery [3], prediction [4], clustering [5], classification, and segmentation [6]. The most prominent problems of time series data mining arise from the high dimensionality of time series data and the difficulty of defining a form of similarity measure based on human perception. Due to the large growth of digital data sources, time series data mining algorithms will have to handle ever larger data sets. According to Esling and Agon [7], the major issues related to time series data mining are:

• Data representation. How can the fundamental shape characteristics of a time series be represented? What invariance properties should the representation satisfy? A representation technique should derive the notion of shape by reducing the dimensionality of data while retaining its essential characteristics.
• Similarity measurement. How can any pair of time series be distinguished or matched? How can an intuitive distance between two series be formalized? This measure should establish a notion of similarity based on perceptual criteria, thus allowing the recognition of perceptually similar objects even though they are not mathematically identical.
• Indexing method. How should a massive set of time series be organized to enable fast querying? In other words, what indexing mechanism should be applied? The indexing technique should provide minimal space consumption and computational complexity.

As happens in most computer science problems, data representation is key to reaching efficient and effective solutions. One of the most commonly used representations is a piecewise linear approximation. This representation has been used by various researchers to support clustering, classification, indexing and association rule mining of time series data. A variety of algorithms have been proposed to obtain this representation, with several algorithms having been independently rediscovered several times.

A natural way to deal with time segments of time series is Piecewise Aggregate Approximation (PAA), which works with the data average over each division of the time series. PAA is the basis for symbolic aggregate approximation (SAX), which represents these average values as symbols from an ordered alphabet [8]. One of the key advantages of using SAX is the dimensionality reduction of long time series, which become so-called SAX words, that is, sequences of alphabet symbols (typically letters from a dictionary) concatenated to form a word.

This work extends the PAA concept to an adaptive piecewise aggregate approximation (APAA) which creates segments of variable length. The average of each variable-length segment from APAA is represented as a symbol from an ordered alphabet, generating a modified version of SAX. A database of real worldwide daily temperatures is used to validate the APAA/ASAX method for detecting heat waves. The database was collected from the NCEI/NOAA weather stations database. Data from more than 100,000 weather stations are analyzed and filtered to select daily station files with valid temperature data from 1970 to 2009. Over 3,300 weather stations were finally selected.

2 Background

Indexing time series is traditionally used as a way to efficiently store a large temporal database [9]. This proposal expands its use to allow the extraction of patterns from time series averages.

2.1 PAA - Piecewise Aggregate Approximation

PAA is a widely used method for time series data representation. It approximates a time series $X$ of length $n$ by a vector $\bar{X} = (\bar{x}_1, \ldots, \bar{x}_M)$ of any arbitrary length $M \le n$, where each $\bar{x}_i$ is calculated as follows:

$$\bar{x}_i = \frac{M}{n} \sum_{j=\frac{n}{M}(i-1)+1}^{\frac{n}{M} i} x_j. \qquad (1)$$
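Equation (1) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `paa` is hypothetical, and the sketch assumes that M divides n exactly so that every frame has the same length.

```python
def paa(series, m):
    """Piecewise Aggregate Approximation following Eq. (1).

    Assumption of this sketch: len(series) is an exact multiple of m,
    so each of the m frames holds n / m consecutive points.
    """
    n = len(series)
    if m <= 0 or n % m != 0:
        raise ValueError("this sketch assumes that M divides n")
    frame = n // m
    # The mean of frame i is (M / n) times the sum of its n / M points.
    return [sum(series[i * frame:(i + 1) * frame]) / frame for i in range(m)]
```

For instance, `paa([1, 2, 3, 4, 5, 6], 3)` averages the pairs (1, 2), (3, 4) and (5, 6).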

In order to reduce the dimensionality from n to M, we first divide the original time series into M equal-sized “frames”. The mean value of the data falling within a frame is calculated, and a vector of these values becomes the data-reduced representation. The sequence assembled from the mean values is the PAA approximation (i.e., transform) of the original time series. The representation can be understood as an attempt to approximate the original time series with a linear combination of box basis functions.

2.2 SAX - Symbolic Aggregate Approximation

The symbolic aggregate approximation of time series (SAX) [8] represents a time series as a sequence of symbols, such as a chain of characters. It extends the PAA-based approach, inheriting the simplicity and low computational complexity of the original algorithm while providing satisfactory sensitivity and selectivity in range query processing. Using a symbolic representation opens the door to the existing wealth of data structures and string manipulation algorithms in computer science for data mining tasks such as indexing [10], clustering [11], and classification [12]. SAX transforms a time series X of length n into a string of arbitrary length w, where w is typically much smaller than n, using an alphabet of size N > 2. The algorithm consists of three steps: (1) Divide a time series into segments of length L. (2) Compute the average of the time series on each segment. (3) Represent the average values as symbols from an alphabet of size N. The time series division is based on a previous PAA phase. SAX is based on the assumption that time series values follow a Gaussian distribution for each of the segments into which PAA divided the series. The conversion of the average values into symbols makes use of (N−1) breakpoints that divide the area under the Gaussian distribution into N equiprobable areas; the average value per segment is then quantized according to the areas of this distribution [8]. As a result, a “word” is composed containing as many letters as there are segments in the PAA. This alphabetic approach is then useful in further analyses using methods such as hashing [13], variations of Markov models [14], and suffix tree approaches [15]. In addition, it is naturally associated with a sliding-window approach in which every time frame is encoded by a letter. Figure 1 represents the output of a single use of the SAX process for a temperature time series from London (April to September, 1989).

Fig. 1. Example of the SAX conversion process for a time series with length 549, w = 9 and resolution 4 (a, b, c, d). Temperature variations from the long term baseline.
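The three SAX steps described above can be sketched as follows. This is a hedged illustration, not the authors' code: the names `sax_breakpoints` and `sax_word` are hypothetical, the series is z-normalized before quantization (standard in the SAX literature but an assumption here, since the text does not state it), and the series length is assumed to be a multiple of w.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def sax_breakpoints(alphabet_size):
    """The (N - 1) cut points that split a standard Gaussian into N equiprobable areas."""
    gaussian = NormalDist()
    return [gaussian.inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]


def sax_word(series, w, alphabet="abcd"):
    """Encode a series as a SAX word of w letters (sketch; assumes w divides len(series))."""
    n = len(series)
    if w <= 0 or n % w != 0:
        raise ValueError("this sketch assumes that w divides the series length")
    mu, sigma = mean(series), pstdev(series)
    # Assumption: z-normalize so that the Gaussian breakpoints apply.
    z = [(x - mu) / sigma for x in series] if sigma > 0 else [0.0] * n
    frame = n // w
    frame_means = [sum(z[i * frame:(i + 1) * frame]) / frame for i in range(w)]
    cuts = sax_breakpoints(len(alphabet))
    # The letter index is the number of breakpoints at or below the frame mean.
    return "".join(alphabet[bisect_right(cuts, m)] for m in frame_means)
```

With a four-letter alphabet, a frame whose normalized mean lies below the lowest breakpoint maps to 'a' and one above the highest breakpoint maps to 'd'.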

2.3 APAA/ASAX Method

This work extends the PAA concept to an adaptive piecewise aggregate approximation (APAA) which creates segments of variable length. The average of each variable-length segment from APAA is represented as a symbol from an ordered alphabet, generating a modified version of SAX called adaptive SAX (ASAX). APAA inherits ideas from change-point detection techniques [16] but remains strongly related to the PAA algorithm [17]. APAA’s aim is to automatically adapt each segment length to its local variability and to the difference from the average of the values over which the segment is defined. An a priori length for creating a segment is given by locating time series peaks. The related parameters are the minimum distance between peaks (minpeakdist in Algorithm 1) and the minimum peak size (minpeakh in Algorithm 1) for a peak to be considered proper. These parameters are tuned depending on criteria related to both minimum segment length and sensitivity. In addition to a variable length for each single segment, it is also likely that 2 or more consecutive segments have average values that map to the same SAX symbol. Algorithm 1 presents the main idea of the APAA/ASAX method.

Data: timeseries, alphabet, minpeakdist, minpeakh
Result: saxword, locs
alphabet ← size of sax alphabet;
invertedTdata ← invert timeseries;
locs ← cut points from invertedTdata;
for i ← 1 to length(locs) − 1 do
    aux ← timeseries(locs(i):locs(i+1));
    saxword(i) ← SAX alphabet symbol for the average of aux;
end

Algorithm 1. APAA/ASAX

Figure 2 shows how APAA/ASAX works for the daily maximum temperature series (year 2009 in Valladolid, Spain). The dictionary for ASAX is ‘a’, ‘b’, ‘c’, ‘d’, ‘e’, ‘f’.

Fig. 2. Example of APAA/ASAX for daily max temperatures series: year 2009 in Valladolid - Spain.
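The idea of Algorithm 1 can be sketched in Python as follows. This is an illustrative reading of the pseudocode under several assumptions, none of which are confirmed by the text: cut points are taken as the local maxima of the inverted series (that is, the valleys of the original one), `find_peaks` is a simplified stand-in for a peak finder driven by minpeakdist and minpeakh, segments are half-open intervals, the alphabet is fixed by the caller rather than tuned automatically, and symbols come from Gaussian breakpoints applied to the z-normalized segment means.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def find_peaks(values, min_dist=1, min_height=float("-inf")):
    """Simplified peak finder (hypothetical helper): local maxima that are at least
    min_height high, kept from tallest to shortest so that peaks are >= min_dist apart."""
    candidates = [i for i in range(1, len(values) - 1)
                  if values[i - 1] < values[i] >= values[i + 1]
                  and values[i] >= min_height]
    kept = []
    for i in sorted(candidates, key=lambda i: -values[i]):
        if all(abs(i - j) >= min_dist for j in kept):
            kept.append(i)
    return sorted(kept)


def apaa_asax(series, alphabet="abcdef", min_dist=1, min_height=float("-inf")):
    """Return (word, locs): one symbol per variable-length segment and the cut points."""
    inverted = [-x for x in series]                  # valleys become peaks
    cuts = find_peaks(inverted, min_dist, min_height)
    locs = sorted({0, *cuts, len(series)})           # segment boundaries
    mu, sigma = mean(series), pstdev(series)
    gaussian = NormalDist()
    breakpoints = [gaussian.inv_cdf(k / len(alphabet))
                   for k in range(1, len(alphabet))]
    word = []
    for start, end in zip(locs, locs[1:]):
        segment_mean = mean(series[start:end])       # APAA value of the segment
        z = (segment_mean - mu) / sigma if sigma > 0 else 0.0
        word.append(alphabet[bisect_right(breakpoints, z)])
    return "".join(word), locs
```

A flat series has no valleys and collapses to a single segment, while a series with frequent dips is cut into more segments, which matches the self-tuning behaviour described in the text.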

The range of ASAX alphabet symbols (dictionary) is automatically tuned for every case, and its size depends on the time series variability. The advantages of APAA over traditional PAA stem mainly from being free of a fixed segment length scheme. APAA also self-tunes this length depending on local time series characteristics. This means that for flat time series APAA proposes fewer segments, and so a stronger dimensionality reduction, than for time series of high variability. Another key feature worth discussing is the automatic selection of the SAX dictionary associated with APAA, which becomes wider as the global variability of the time series increases.

3 Heat Waves

This section proposes ASAX to index temperature time series in order to detect anomalies. These can further be classified as heat waves when both the maximum and minimum temperature series are abnormally high with respect to their corresponding average values. Peak hot temperatures are extremes lasting just 1 day, whereas a heat wave is defined as steadily abnormally high temperatures for periods longer than 2 days. The proposed method detects heat wave events when both ASAX codifications are at their highest code. For example, with the ASAX dictionary ‘a’, ‘b’, ‘c’, ‘d’, ‘e’, ‘f’, the highest code is ‘f’. Taking into account the cut-off days for the minimum and maximum temperature time series, it is possible to classify time periods as heat wave events by intersecting the periods coded as high temperature ‘f’ in both ASAX words.
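The intersection rule can be illustrated with a short sketch. The function names, the half-open segment boundaries and the day indexing from 0 are assumptions made for this example; the threshold of 3 days encodes "longer than 2 days".

```python
def expand(word, locs):
    """Per-day symbols from an ASAX word and its segment boundaries (half-open)."""
    days = []
    for symbol, start, end in zip(word, locs, locs[1:]):
        days.extend(symbol * (end - start))
    return days


def heat_waves(word_max, locs_max, word_min, locs_min, top="f", min_days=3):
    """Return (start, end) day ranges where both words carry the top symbol
    for at least min_days consecutive days; end is exclusive."""
    hot = [a == top and b == top
           for a, b in zip(expand(word_max, locs_max), expand(word_min, locs_min))]
    events, start = [], None
    for day, flag in enumerate(hot + [False]):   # sentinel closes a trailing run
        if flag and start is None:
            start = day
        elif not flag and start is not None:
            if day - start >= min_days:
                events.append((start, day))
            start = None
    return events
```

A single hot day shared by both series is treated as a peak temperature and is not reported as a heat wave.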

4 Heat Waves Database

This work uses a worldwide daily temperature database for the period 1970–2009. The data is collected from the NCEI/NOAA weather stations database of over 3,300 weather stations. The Global Historical Climatology Network (GHCN) Daily dataset is available for download at the NOAA website (footnote 1). It was developed for a wide variety of potential applications, including climate analysis and monitoring studies that require data at daily resolution (e.g., assessments of the frequency of heavy rainfall, heat wave duration, etc.). The dataset contains records from over 80,000 stations in 180 countries and territories, and its processing system produces the official archive for US daily data [18]. Each data row in a daily station file is preceded by a 21-character label containing the station id, year, month, and a 4-character parameter code, all concatenated without spaces. Following this label, there are 31 repeating groups of 8-character fields, each containing a signed 4-digit integer value and 3 separate qualifier codes, one group for each potential day in a calendar month. Overall, 189 distinct parameters are possible after compound and coded parameter types are expanded. Working with these files clearly requires custom programming and an advanced data integration methodology [19].

1 https://www1.ncdc.noaa.gov/pub/data/ghcn/daily/.
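The fixed-width row layout described above can be read with a few string slices. The sketch below is not the GCE toolbox importer used by the authors; `parse_dly_line` is a hypothetical helper. It relies on the published GHCN-Daily layout, in which the label is an 11-character station id, a 4-digit year, a 2-digit month and a 4-character element code, each day group is a 5-character signed value followed by 3 flag characters, and -9999 marks a missing value.

```python
def parse_dly_line(line):
    """Parse one row of a GHCN-Daily station file into a dictionary (sketch)."""
    line = line.rstrip("\n").ljust(21 + 31 * 8)      # tolerate stripped trailing blanks
    record = {
        "station": line[0:11],
        "year": int(line[11:15]),
        "month": int(line[15:17]),
        "element": line[17:21],                       # e.g. TMAX or TMIN
        "values": [],
        "flags": [],
    }
    for day in range(31):
        field = line[21 + 8 * day: 29 + 8 * day]
        raw = field[0:5].strip()
        value = int(raw) if raw else -9999
        record["values"].append(None if value == -9999 else value)
        record["flags"].append(field[5:8])            # the 3 qualifier codes
    return record
```

Temperature elements such as TMAX and TMIN are stored in tenths of a degree Celsius, so a parsed value of 311 corresponds to 31.1 °C.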


The Georgia Coastal Ecosystems (GCE) Data Toolbox is a free add-on library for the MATLAB technical computing language that can manage the GHCN-Daily dataset. The software has been adapted for MATLAB (footnote 2) to allow scraping of the NCEI weather database. The heterogeneity in how data are collected at each weather station in the NOAA/NCEI database makes it difficult to guarantee the usefulness of the information at each single station. Hence we have included quantity and quality criteria: (1) a minimum of 40 years of daily data from 1970; (2) files must contain maximum, minimum, and average daily temperatures. Files not meeting these conditions are discarded. Once the weather station files have successfully passed these preliminary filters, they are exported to CSV format. A first set of databases is then built, consisting of a single information table per weather station that meets the above criteria. If the number of bytes in any database is below a set value, it is assumed to contain missing or empty files.

Data: Country List, Station List
Result: Table_Station (list of valid daily station files)
initialization GCE toolbox;
qty_daily_files ← quantity of daily station files;
c ← Country List;
t ← Station List;
tx ← 1;
for ix ← 1 to qty_daily_files do
    % transforming GHCN-D file into MATLAB structure;
    station_data ← imp_ncdc_ghcnd(daily_station_file(ix));
    if (initial_year < 1971 and qty_years > 39 and exist Temperature and exist LAT and exist LONG) in station_data then
        Table_Station(tx) ← station_data;
        tx ← tx + 1;
    end
end
write Table_Station;

Algorithm 2. Filter: Selecting Valid Daily Station Files
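The acceptance test inside the loop of Algorithm 2 can be written as a small predicate. The dictionary layout used here (keys `years`, `temperature`, `lat`, `lon`) is purely hypothetical and stands in for the MATLAB structure returned by the toolbox importer.

```python
def is_valid_station(station):
    """Mirror the condition of Algorithm 2 on a hypothetical station dictionary."""
    years = station.get("years") or []
    if not years:
        return False
    has_fields = all(station.get(key) is not None
                     for key in ("temperature", "lat", "lon"))
    # initial year < 1971, more than 39 distinct years, and all fields present
    return min(years) < 1971 and len(set(years)) > 39 and has_fields


def select_valid(stations):
    """Keep only the stations that pass the filter, preserving order."""
    return [station for station in stations if is_valid_station(station)]
```

A station whose record starts in 1975, or one without coordinates, is discarded regardless of how many years it covers.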

Algorithm 2 presents the process used to obtain the valid daily station files for our experiment, applying the GCE toolbox to more than 100,000 daily station files. The output of Algorithm 2 is a comma-separated values (CSV) file that contains key information for every valid weather station. Over 3,000 daily station files were obtained. Figure 3 presents the geographic distribution of valid weather stations. The stations are not homogeneously distributed over the globe, mainly due to the requirements the data should meet regarding both the number of years and the daily resolution of temperature information.

2 http://www.mathworks.com/products/matlab/.

Fig. 3. Geographic distribution of valid weather stations (over 3,300 weather stations).

Algorithm 3 presents the process adopted to create a set of SAX_FILE files, based on the GHCN-D files filtered by Algorithm 2, to be used by the APAA/ASAX method with the aim of locating heat waves.

Data: Table_Station (list of valid daily station files)
Result: Set of SAX_FILE
initialization GCE toolbox;
qty_stations ← length(Table_Station);
c ← Country List;
for ix ← 1 to qty_stations do
    % transforming GHCN-D file into MATLAB structure;
    station_data ← imp_ncdc_ghcnd(Table_Station(ix));
    temp(year, month, day, hour) ← station_data(temperature);
    avg(year, month, day) ← average(temp);
    max(year, month, day) ← maximum(temp);
    min(year, month, day) ← minimum(temp);
    Table_Daily(ix) = [lat, long, year, month, day, avg, max, min];
    write SAX_FILE(ix) from Table_Daily(ix);
end

Algorithm 3. Creating Set of SAX FILE
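The daily aggregation step of Algorithm 3 can be sketched as follows. The input layout, an iterable of (year, month, day, hour, temperature) tuples, and the function name `daily_rows` are assumptions of this example; the output column order follows the description of a SAX FILE row given in the text.

```python
from collections import defaultdict
from statistics import mean


def daily_rows(readings, lat, lon):
    """Collapse sub-daily readings into one row per day:
    (year, month, day, max, min, average, latitude, longitude)."""
    by_day = defaultdict(list)
    for year, month, day, _hour, temperature in readings:
        by_day[(year, month, day)].append(temperature)
    return [(year, month, day, max(temps), min(temps), mean(temps), lat, lon)
            for (year, month, day), temps in sorted(by_day.items())]
```

Each resulting tuple can be written as one CSV line, for example with the standard `csv` module.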

At the end of Algorithm 3, a database of SAX_FILE files has been created. Each row of a SAX_FILE is composed of the following columns: year, month, day, maximum temperature (in Celsius), minimum temperature (in Celsius), average temperature (in Celsius), and the latitude and longitude of the weather station.

5 Experimental Validation

After collecting and pre-processing worldwide daily temperature information from 1970 until 2009, an automatic process for heat wave identification is launched (Algorithm 4).

5.1 APAA/ASAX for Heat Wave Detection

In Algorithm 4, each SAX_FILE is processed by the APAA/ASAX method (Algorithm 1), and its minimum and maximum temperatures are reduced to ASAX words and their cut-off points. This is essentially a time series indexing process over a variation of a Piecewise Aggregate Approximation (PAA) partition. If the mean of a segment is classified as top temperature (last letter of the ASAX alphabet) for both maximum and minimum temperatures, the process identifies the segment as a heat wave event.

Data: SAX_FILE
Result: prototype
[ini_year, last_year] ← first and last year from SAX_FILE;
alphabet ← size of alphabet asax;
minpeakdist ← 1;
minpeakh ← 2;
for year ← ini_year to last_year do
    [t_min, t_max] ← min and max daily temperature for year;
    [sax_max, locs] ← APAA/ASAX(t_max, alphabet, minpeakdist, minpeakh);
    sax_min ← APAA/ASAX(t_min, alphabet, locs);
    heatwaves ← heatwavesloc(sax_max, sax_min, locs, alphabet);
    prototype ← features(heatwaves, year, sax_max, sax_min, locs);
end

Algorithm 4. Creating a Set of Heat Wave Prototype

Once heat waves or hot temperature events have been identified, the aim is to extract features from them before saving all the generated information in a new database (see Algorithm 5). This database is specifically dedicated to hot temperature events. It is based on time windows of top alphabet codifications for both maximum and minimum temperatures. The prototype dataset is composed of information for each heat wave identified. It comprises the following columns: median maximum temperature (‘Tmax’), heat wave starting and ending days of the year as numbers (‘Start’ and ‘End’, respectively), median minimum temperature for these periods (‘Tmin’), geographic coordinates of the specific weather station where the heat wave is registered (‘Lat’ and ‘Long’), and median values registered for each period and location regarding the number of heat waves and their duration (‘Count’ and ‘Duration’, respectively).


Data: heatwaves, year, sax_max, sax_min
Result: features (set of heatwave prototype)
Count ← length(heatwaves);
for i ← 1 to Count do
    Start ← initial position of heatwave(i);
    End ← final position of heatwave(i);
    Tmax ← maximum temperature in heatwave(i);
    Tmin ← minimum temperature in heatwave(i);
    Duration ← length(heatwave(i));
    features(i) ← Count, Duration, Tmax, Tmin, Lat, Long;
end

Algorithm 5. Heat Waves Feature Extraction
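A possible reading of Algorithm 5 in Python is sketched below. It assumes that each heat wave is given as a half-open (start, end) pair of 0-based day indices within one year and that `t_max` and `t_min` are the daily series for that year; both conventions, and the function name, are illustrative choices and not taken from the paper.

```python
def heatwave_features(events, t_max, t_min, lat, lon):
    """One feature row per heat wave, following the fields of Algorithm 5."""
    rows = []
    for start, end in events:
        rows.append({
            "Count": len(events),            # heat waves found in this year
            "Start": start + 1,              # first day of the year, 1-based
            "End": end,                      # last day of the year, 1-based
            "Duration": end - start,         # length in days
            "Tmax": max(t_max[start:end]),   # maximum temperature in the event
            "Tmin": min(t_min[start:end]),   # minimum temperature in the event
            "Lat": lat,
            "Long": lon,
        })
    return rows
```

The rows from all years and stations can then be summarized, for instance by medians, to build the prototype dataset described in the text.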

5.2 Results

Heatwave prototypes estimate the main characteristics that define a hot time period. The information is aggregated in 5-year summaries. Consequently, 8 summary datasets are available, corresponding to the periods 1970–74, 1975–79, 1980–84, 1985–89, 1990–94, 1995–99, 2000–04, and 2005–09. A descriptive analysis comes from summarizing the 0.9 quantiles for each quinquennium. After summarizing, the difference between the first quinquennium (1970–1974) and the last quinquennium (2005–2009) was computed using median values. The 0.90 quantile of both minimum and maximum temperatures for those periods gives insight into the temperature values of the more frequent as well as the more intense hot periods. Table 1 shows that the severity of heat wave temperatures across the whole planet has shifted by about 1 °C in 40 years. Despite the robustness of the median against outliers, the median duration was the only value that decreased in the last quinquennium compared with the first one. This value could be influenced by the number of heat waves per year, which in the last quinquennium is nearly four times the number in the first one.

Table 1. Quantile Global Temperature Evolution

Count   Duration   q90 max   q90 min
3.80    −1.00      1.10      0.93
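The summary step can be illustrated with a small sketch that groups values by quinquennium, takes the 0.9 quantile of each group and subtracts the first period from the last. The quantile uses linear interpolation between order statistics, which is an assumption since the paper does not state the estimator, and all function names are hypothetical.

```python
from collections import defaultdict


def quantile(values, q):
    """Quantile by linear interpolation between order statistics (assumed estimator)."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def quinquennium(year, first_year=1970):
    """Label of the 5-year period containing year, e.g. 1972 -> '1970-74'."""
    start = first_year + 5 * ((year - first_year) // 5)
    return f"{start}-{(start + 4) % 100:02d}"


def q90_by_quinquennium(records):
    """records: iterable of (year, value); returns {period label: 0.9 quantile}."""
    groups = defaultdict(list)
    for year, value in records:
        groups[quinquennium(year)].append(value)
    return {label: quantile(vals, 0.9) for label, vals in groups.items()}


def shift(summary, first="1970-74", last="2005-09"):
    """Difference between the last and the first quinquennium."""
    return summary[last] - summary[first]
```

Applied to the heat wave maximum temperatures, `shift` would correspond to the kind of figure reported in the q90 max column of Table 1.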

Figure 4 shows that the percentage of weather stations with no heat waves during 1970–74 was approximately 4%. This value drops to a little more than 1% by 2005–09. The geographic spread of heat waves and their evolution in time can be analyzed using GIS to create heat wave maps, as shown in Figs. 5 and 6. These two figures present an estimate of the temperature for the heat waves identified by the APAA method in the periods 1970–74 and 2005–09, respectively.


Fig. 4. Analysis for the years 1970–2009.

Taking into account the importance of truly extreme temperatures, the analyses focus on the 0.9 quantile. Since data is only available at the locations of the weather stations, the IDW method is used to interpolate heat wave temperatures to areas not covered by stations (footnote 3). Table 2 shows the percentage of area covered by IDW (coloured points on the map) separated into five ranges. For the period 1970–74, the higher-temperature heat waves are localized in North America, North Africa, southern Europe, south and south-east Asia, and Australia. For 2005–09, heat waves still occur there but in a denser manner. In addition, regions of South America and west Asia now show high temperature levels when a heat wave occurs. The increase in heat wave severity is summarised in Table 2: 51.15% of the entire area covered by IDW in the 1970–74 period presents heat wave temperatures higher than 34 °C (30.42% + 20.73%). This percentage becomes 63.62% for the period 2005–09.

3 IDW (Inverse Distance Weighting) is an interpolation method. In this case, IDW uses a radius of approximately 360 km to generate maps estimating heat wave temperatures for unmonitored areas. Weather stations with no extreme events recorded are excluded. The same set of stations is used for both periods, the first half of the 1970s (Fig. 5) and the second half of the 2000s (Fig. 6). To produce these maps, a reclassification is performed by dividing them into five intervals corresponding to five temperature ranges.
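A minimal sketch of inverse distance weighting with a search radius is given below. The 360 km radius comes from the footnote; the power of 2, the haversine great-circle distance and the function names are assumptions of this example, and the GIS software used by the authors may apply different settings.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres on a sphere of radius 6371 km."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * 6371.0 * asin(min(1.0, sqrt(a)))


def idw(lat, lon, stations, radius_km=360.0, power=2):
    """Estimate a value at (lat, lon) from (lat, lon, value) station triples.
    Returns None when no station lies within the radius."""
    numerator = denominator = 0.0
    for s_lat, s_lon, value in stations:
        distance = haversine_km(lat, lon, s_lat, s_lon)
        if distance > radius_km:
            continue
        if distance == 0.0:
            return value                 # query point coincides with a station
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator if denominator else None
```

A point midway between two stations receives the average of their values, while a point with no station within 360 km stays unestimated.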


Subtracting the results for the two periods shown in Figs. 5 and 6 shows how heat wave severity has changed across the planet (Fig. 7). South and North America, South Africa, central Europe, central and east Asia and Australia show significant increases in severity. Globally, 27.19% of the land area shows an increase in the severity of heat wave events between 1970–74 and 2005–09.

Fig. 5. IDW interpolation of the 0.9 quantile of the time series of maximum temperatures associated with heat waves worldwide. Summary of the period 1970–74.

Fig. 6. IDW interpolation of the 0.9 quantile of the time series of maximum temperatures associated with heat waves worldwide. Summary of the period 2005–09.


Table 2. Percentage of area covered by IDW affected by different levels of heat wave severity for the periods 1970–74 and 2005–09

Range of temp (°C)   1970–74   2005–09
[−inf, 20]           4.04%     3.29%
[20, 29]             11.00%    8.77%
[29, 34]             33.81%    24.32%
[34, 39]             30.42%    35.68%
[39, inf]            20.73%    27.94%

Fig. 7. Diﬀerence of heat waves temperatures. Comparing 1970–74 and 2005–09 years.

6 Conclusion

In this paper, we introduced a novel adaptive technique, called APAA/ASAX, for mining time series. APAA/ASAX was applied to locate heat waves in a database of real worldwide daily temperature information. Despite the large dataset used in this work, the proposed method was able to process the dataset and locate heat waves in a short processing time. The main advantage of APAA over traditional PAA is that it is free of a fixed segment length. In principle, this avoids breaking or over-extending an event of interest (e.g. steadily high temperatures) because of a predetermined time window, which in the traditional PAA algorithm is independent of local time series characteristics. APAA also allows an easy self-tuning process for its segment lengths and avoids constraints related to the divisibility of the time series length by a number of fixed-size segments. There are a number of challenging time series data mining tasks, such as motif discovery, discord discovery, classification and clustering, to which we intend to extend the method in future work.


Acknowledgment. The authors would like to thank IFPE and COLBE for ﬁnancial support.

References

1. Faloutsos, C., Ranganathan, M., Manolopoulos, Y.: Fast subsequence matching in time-series databases. SIGMOD Rec. 23(2), 419–429 (1994)
2. Weiss, G.M.: Mining with rarity: a unifying framework. SIGKDD Explor. Newsl. 6(1), 7–19 (2004)
3. Lin, J., Keogh, E., Lonardi, S., Lankford, J.P., Nystrom, D.M.: Visually mining and monitoring massive time series. In: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2004, pp. 460–469. ACM, New York (2004)
4. Chatfield, C., Weigend, A.S.: Time series prediction: forecasting the future and understanding the past. In: Weigend, A.S., Gershenfeld, N.A. (eds.). The future of Time Series, pp. 1–70. Addison-Wesley, Reading (1994). Int. J. Forecast. 10(1), 161–163 (1994)
5. Keogh, E., Lin, J.: Clustering of time-series subsequences is meaningless: implications for previous and future research. Knowl. Inf. Syst. 8(2), 154–177 (2005)
6. Keogh, E., Chu, S., Hart, D., Pazzani, M.: Segmenting time series: a survey and novel approach. Data Mining in Time Series (2004)
7. Esling, P., Agon, C.: Time-series data mining. ACM Comput. Surv. 45(1), 12:1–12:34 (2012)
8. Lin, J., Keogh, E., Wei, L., Lonardi, S.: Experiencing SAX: a novel symbolic representation of time series. Data Min. Knowl. Discov. 15(2), 107–144 (2007)
9. Keogh, E.: Indexing and mining time series data. In: Encyclopedia of GIS, pp. 493–497. Springer (2008)
10. Toshniwal, D.: Feature extraction from time series data. J. Comput. Methods Sci. Eng. 9(1), 2S1, 99–110 (2009)
11. Aghabozorgi, S., Wah, T.Y.: Clustering of large time series datasets. Intell. Data Anal. 18(5), 793–817 (2014)
12. Yuan, J., Wang, Z., Han, M., Sun, Y.: A lazy associative classifier for time series. IDA, 19(5), 983–1002 (2015)
13. Wang, X., Mueen, A., Ding, H., Trajcevski, G., Scheuermann, P., Keogh, E.: Experimental comparison of representation methods and distance measures for time series data. Data Min. Knowl. Discov. 26(2), 275–309 (2013)
14. Lin, J., Li, Y.: Finding structural similarity in time series data using bag-of-patterns representation. In: Scientific and Statistical Database Management, pp. 461–477 (2009)
15. Rasheed, F., Alshalalfa, M., Alhajj, R.: Efficient periodicity mining in time series databases using suffix trees. IEEE Trans. Knowl. Data Eng. 23(1), 79–94 (2011)
16. Aminikhanghahi, S., Cook, D.J.: A survey of methods for time series change point detection. Knowl. Inf. Syst. 51(2), 339–367 (2017)
17. Keogh, E., Chakrabarti, K., Pazzani, M., Mehrotra, S.: Locally adaptive dimensionality reduction for indexing large time series databases. SIGMOD Rec. 30(2), 151–162 (2001)
18. Chamblee, J.J.F.: Overview guide for the GCE data toolbox for MATLAB
19. Limnol Oceanog Bull, pp. 117–120 (2015)

Selection of Architectural Concept and Development Technologies for the Implementation of a Web-Based Platform for Psychology Research

Evgeny Nikulchev1(&), Pavel Kolyasnikov1, Dmitry Ilin1, Sergey Kasatonov2, Dmitry Biryukov2, and Ilya Zakharov3

1 Moscow Technological Institute, Moscow, Russia [email protected], [email protected], [email protected]
2 Moscow Technological University MIREA, Moscow, Russia [email protected], [email protected]
3 Psychological Institute of Russian Academy of Education, Moscow, Russia [email protected]

Abstract. This paper considers the design and development of a web-based platform for conducting psychological online research. As a result, a scalable multicomponent platform architecture is formed, implying the separation into a public and private Intranet parts. To organize communication between components, it is intended to use the REST API. The basic unit of data transmission for experiments is a package that allows ensuring the autonomy of the online tool in conditions of poor Internet connection. A technological stack is selected, which includes the use of JavaScript and AngularJS 1 for the client part, Node.JS and Loopback for the server.

Keywords: Psychology research · Web-based platform · Software architecture · Requirement analysis · Web framework evaluation

© Springer Nature Switzerland AG 2019 K. Arai et al. (Eds.): SAI 2018, AISC 858, pp. 672–685, 2019. https://doi.org/10.1007/978-3-030-01174-1_52

1 Introduction

Psychological research increasingly relies on computer technology and automated tools for data collection and analysis, and is gradually moving from research laboratories to the Internet [1–6]. Computer methods are used in a number of areas, such as behavioral genetics [7–9], neuropsychology [10, 11], developmental psychology [12, 13] and cross-cultural studies [14–16]. However, the standard pen-and-paper approach is still relevant, for example in some clinical groups, and when there is no access to the Internet or to a computer at all [17–19].

Computer technologies provide a more convenient way of organizing the collection, storage and processing of research data. The use of web technologies for conducting psychological research provides a number of advantages for psychologists [4–6]:


• Easy access to ongoing research through the Internet.
• Expansion of training samples through the use of the Internet, which will lead to more reliable results.
• Radical increase in the number of people who can participate in the study through the use of the Internet (for example, from other regions).
• Individualization of research results.
• Feedback from participants in the study.
• Introduction of machine learning and artificial intelligence algorithms, made possible by the growing amount of data arriving in electronic form.
• Provision of means for centralized administration.
• Convenient automated presentation of research results.

The developed web platform for conducting psychological research is a tool for psychological, psycho-physiological and cognitive evaluation. The web platform should provide opportunities for conducting online research using automated sets of tests. The tool will include a wide range of generally accepted psychological methods and, given how rapidly the field develops, the opportunity to include new ones offered by independent researchers.

The main goal of the work is to describe the architectural solutions of the developed web platform for psychological research: the components it consists of, the method of transferring experimental data, the rationale for selecting the technology stack for the client and server parts, and the problems encountered during design together with their solutions. The following questions are considered:

• What architectural solutions should be used to organize the operation of a multicomponent web platform for psychological research?
• What technologies are most applicable to the implementation of the architecture in conditions of limited resources?

The paper consists of five sections, including this introduction. The problem statement section defines key issues and priorities. The third section describes the methods used to formulate the architectural approach and to analyse the best-suited technologies. The fourth section describes the platform components and provides details on the main approach to storing and transferring the tests, as well as on the technologies selected for the client and server sides of the platform. The conclusion summarizes the results and outlines future work.

2 Problem Statement

The architecture of an application is central to its organization and to the structure of the system as a whole; it covers the development approaches, the environment, the system components and their relationships [20–22]. In addition, the description of the system architecture includes answers to various questions that arose during the design of the system.


The main priority is the online survey tool, since it covers most of the possible use cases. Cognitive and psychophysiological assessments place greater demands on the environment in which they are run, which makes it necessary to develop an autonomous instrument as well [3].

Thus, within the scope of the present work, it is necessary to consider in more detail the architectural approaches to the development of the platform: which components it will consist of, how they will be linked, what each is responsible for and how data is transferred between them.

The platform under development is web-oriented and will consist of server and client parts. One of the main tasks is therefore the choice of programming languages and technologies suitable for developing these components.

The platform for psychological research should work in most browsers, including mobile ones, without additional plug-ins or extensions. The chosen solution must therefore require nothing from the user beyond the browser itself.

The requirements for server components are less stringent. Nevertheless, it is necessary to take into account how particular technological solutions affect the learning curve and the complexity of supporting the resulting software product. Since one of the aims is to reduce the cost of learning a large number of technologies, it is reasonable to minimize the total number of programming languages involved in the development of the web platform.

3 Research Methods

A number of methods were applied to arrive at an adequate architectural solution. The initial phase consisted of architectural requirement analysis [23, 24], carried out to identify the main use cases and the functional and non-functional requirements for the platform. In addition, to clarify the requirements, unstructured interviews with teaching staff were used to reveal how much the technical characteristics of software and hardware vary between schools.

Based on the information received, an architectural synthesis [25] was carried out with the aim of identifying a set of loosely coupled system components, the connections between them and the most effective methods of data exchange.

To select the programming languages and technologies suitable for the development of the platform, candidate solutions were studied and compared in the context of the resulting architecture, requirements and constraints. For this analysis, reports and materials from services such as StackOverflow and GitHub [26, 27], which are among the most authoritative in the software development community, were studied. Programming languages for browser applications were evaluated by whether an application can be delivered without additional software. Frameworks were assessed by how actively they are used in projects, the size of the developer community, their relevance to the task and their time on the market.


It should be noted that a direct comparison of development frameworks will not yield a decisive result, since each of them allows the final result to be reached. Nevertheless, some of them should be considered more suitable owing to better scalability, lower training costs and a larger number of existing modules.

4 Architecture of the Psychological Platform

4.1 Description of Platform Components

The architecture of the developed platform for psychological research was chosen to be multicomponent, which provides more flexibility than a monolithic one [3]. Monolithic architectures have a number of disadvantages:

• The larger the system, the more difficult it is to maintain and change.
• In a large system, changing a small portion of the code can cause errors throughout the system.
• After each code change, the entire system has to be tested for errors.

In contrast, a multicomponent architecture gives the following advantages:

• Smaller parts are easier to write and maintain than a single large one.
• It is easier to assign developers to a specific part of the system.
• The system can be heterogeneous, because for each component developers can use the most suitable programming languages and technologies, depending on the task.
• Updates are easier, as fewer components are affected.
• The system becomes more fault-tolerant, since if one of the components fails, the others may still be working.

Thus, according to the authors, the choice of a multicomponent architecture is justified by its advantages over a monolithic one, and it is the most suitable option given the requirements for the developed platform.

Figure 1 shows the diagram of the platform architecture for psychological research. The architecture is divided into separate components that can work independently and communicate with each other using the REST API [28–30].

“API Server” is the core of the system. It is a REST API server responsible for working with the data store and for performing various service functions.

“Online web services” are components that must be accessible from the Internet. They form the main online part of the platform and include the online test player, the online test designer, the researcher’s personal area and the examinee’s personal area.

“External applications” are separate applications, such as desktop and mobile ones. Unlike the online version of the test player, these applications must allow tests to be taken without an Internet connection. The data for the tests must therefore be loaded in advance, and after the tests are completed the answers are uploaded back to the server.
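The offline behaviour described for external applications can be illustrated with a minimal sketch. It assumes that tests are preloaded as plain dictionaries and that answers are buffered locally until the server is reachable. The class name `OfflineTestSession`, the payload layout and the `send` callable (a stand-in for an HTTP request to the API server) are hypothetical and are not taken from the platform itself.

```python
import json
import time
from typing import Callable, List


class OfflineTestSession:
    """Holds preloaded tests and buffers answers until they can be uploaded."""

    def __init__(self, tests: List[dict], clock: Callable[[], float] = time.time):
        if not tests:
            raise ValueError("at least one test must be preloaded")
        self._tests = {t["id"]: t for t in tests}
        self._clock = clock
        self._pending: List[dict] = []

    def record_answer(self, test_id: str, question_id: str, value) -> None:
        # No network access is needed here: the answer is stored locally
        # together with a client-side timestamp.
        if test_id not in self._tests:
            raise KeyError(f"unknown test: {test_id}")
        self._pending.append({
            "test_id": test_id,
            "question_id": question_id,
            "value": value,
            "answered_at": self._clock(),
        })

    def upload(self, send: Callable[[str], bool]) -> int:
        """Try to deliver buffered answers; keep them if delivery fails."""
        if not self._pending:
            return 0
        body = json.dumps({"answers": self._pending})
        if not send(body):
            return 0  # still offline: the answers stay buffered
        delivered = len(self._pending)
        self._pending = []
        return delivered

    @property
    def pending_count(self) -> int:
        return len(self._pending)
```

In this sketch a failed `upload` leaves the buffer untouched, so the application can simply retry later without losing data.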


Fig. 1. Multi-component platform architecture.

“Private web services (intranet)” are separate services, including the Platform Administration Panel and the Population Data Analysis Panel. These services must be isolated from direct Internet access for greater security. They also communicate with their own REST API, which includes administrative methods that likewise must not be accessible from the Internet.

4.2 Batch Approach to Storage and Transfer of Tests

Psychological research is carried out with a set of tests, which can consist of several different tests and include the necessary materials, for example images or files. A batch approach is therefore appropriate for storing and transmitting tests, since it allows the data to be stored and transferred in one file (a package). The package is also important for transferring data to the client part of the application, where Internet access cannot always be guaranteed. The client part includes an online application and a desktop application that will be used in schools, as well as an application for mobile devices. Packages will be used for sending psychological tests to the application, and for collecting test results and sending them to the server. Using a package to transfer data to online applications in the client browser is justified for the following reasons:


• One request to the server is used instead of several, which minimizes the overhead of creating HTTP connections.
• The client part does not depend on the server while the test is being taken.
• If the user’s Internet connection is lost, they will not end up in a situation where the study cannot be completed.
• Logging time on the client side minimizes recording errors.
• It is easier to track how much of the package has been loaded and to inform the user about it.

Figure 2 shows a diagram of the package structure for storing the tests. The package is self-contained: it includes information about and a description of the package, as well as the tests themselves. There must be at least one test in the package. Each test includes its description, as well as images and files if they are needed.

Fig. 2. Structure of the package for storing and transferring tests.
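As an illustration of the package idea, the following sketch packs the package information, the test descriptions and their binary materials into a single file. The ZIP container, the `manifest.json` file name and the `tests/<id>/test.json` layout are assumptions made only for this example; the paper does not specify the concrete file format.

```python
import io
import json
import zipfile
from typing import Dict, List


def build_package(info: dict, tests: List[dict], assets: Dict[str, bytes]) -> bytes:
    """Pack package info, test descriptions and binary assets into one file."""
    if not tests:
        raise ValueError("a package must contain at least one test")
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # Hypothetical manifest: package description plus the list of test ids.
        manifest = {"info": info, "tests": [t["id"] for t in tests]}
        archive.writestr("manifest.json", json.dumps(manifest))
        for test in tests:
            archive.writestr(f"tests/{test['id']}/test.json", json.dumps(test))
        for path, data in assets.items():
            archive.writestr(path, data)
    return buffer.getvalue()


def read_package(blob: bytes) -> dict:
    """Load the manifest and every test description back from the file."""
    with zipfile.ZipFile(io.BytesIO(blob)) as archive:
        manifest = json.loads(archive.read("manifest.json"))
        tests = [json.loads(archive.read(f"tests/{tid}/test.json"))
                 for tid in manifest["tests"]]
        return {"info": manifest["info"], "tests": tests,
                "files": sorted(archive.namelist())}
```

Because everything travels in one file, a client needs a single download before the test starts and no further requests while it is being taken.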

The test itself should be described in accordance with a JSON Schema [31, 32] whose structure is approved in advance; the test is validated against this schema. Using the JSON Schema standard avoids a number of problems and has the following advantages [31, 32]:

• There is no need to check the contents of documents manually.
• There is no need to create and support task-specific validators with a variety of configurations.
• With a single standard, integrating and supporting validation in the various components of the platform, such as the online test player and desktop applications, is simplified.
• Changing the schema does not require replacing the validator code.
• A psychological test can be described manually without additional tools, which is an advantage at the early stage of development, when the test designer is not yet available.
• There are many implementations for different programming languages and platforms.
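The validation step can be sketched as follows. The validator below is a deliberately simplified, hand-written check that understands only a small subset of JSON Schema keywords (`type`, `required`, `properties`, `items`, `minItems`); it is not a complete implementation of the standard, and a real component would more likely rely on one of the existing implementations mentioned above. The schema `TEST_SCHEMA` and its field names are hypothetical.

```python
from typing import Any, List

_TYPES = {"object": dict, "array": list, "string": str,
          "integer": int, "number": (int, float), "boolean": bool}

# Hypothetical schema of a test description, used only for illustration.
TEST_SCHEMA = {
    "type": "object",
    "required": ["id", "title", "questions"],
    "properties": {
        "id": {"type": "string"},
        "title": {"type": "string"},
        "questions": {
            "type": "array",
            "minItems": 1,
            "items": {
                "type": "object",
                "required": ["text"],
                "properties": {"text": {"type": "string"}},
            },
        },
    },
}


def validate(instance: Any, schema: dict, path: str = "$") -> List[str]:
    """Return a list of error messages; an empty list means the data is valid."""
    errors: List[str] = []
    expected = schema.get("type")
    if expected is not None:
        py_type = _TYPES[expected]
        # bool is a subclass of int in Python, so reject it for non-boolean types.
        is_bool = isinstance(instance, bool)
        ok = isinstance(instance, py_type) and (expected == "boolean" or not is_bool)
        if not ok:
            return [f"{path}: expected {expected}"]
    if isinstance(instance, dict):
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"{path}: missing required property '{key}'")
        for key, sub in schema.get("properties", {}).items():
            if key in instance:
                errors += validate(instance[key], sub, f"{path}.{key}")
    if isinstance(instance, list):
        if len(instance) < schema.get("minItems", 0):
            errors.append(f"{path}: fewer than {schema['minItems']} items")
        if "items" in schema:
            for i, item in enumerate(instance):
                errors += validate(item, schema["items"], f"{path}[{i}]")
    return errors
```

Since the rules live in the schema rather than in the code, tightening a requirement (for example, raising `minItems`) changes only `TEST_SCHEMA`, which mirrors the point that changing the schema does not require replacing the validator.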

4.3 The Choice of Technological Solutions for the Development of the Client Part of the Application

JavaScript, Java applets and the Adobe Flash platform were considered from the point of view of their applicability for code execution in the browser, and only JavaScript was found to be applicable. This is due not only to the fact that JavaScript is used in many areas (client browsers, the server part, mobile platforms and desktop applications), but also to the fact that Java applet technology, like Adobe Flash technology, requires the installation of additional components in the user’s system [33, 34]. Moreover, the installation and configuration process may vary depending on the operating system and the browser. Because of the great variability of hardware and software in schools, the use of these two technologies is not advisable, since it can complicate the process of conducting mass research. It should be noted that Adobe Flash and Java applets are not supported in browsers on mobile devices [34]. It is also worth noting that Adobe Flash is becoming obsolete, while HTML5 provides a similar ability to work with multimedia (video and audio) [35, 36].

Thus, the choice of JavaScript for the development of the client part is clear, and under the given conditions there is currently no alternative. JavaScript is supported by all common browsers and is included in them by default.

Developing large Single-Page Applications (SPA) in pure JavaScript on the client side is a difficult and inefficient process, so frameworks that define the application structure and provide a basic set of components are needed. Almost all modern frameworks have similar functionality and are able to solve this task. The choice should therefore be based first of all not on the functionality of the framework, but on the requirements and objectives of the particular project.

The best-known and most popular frameworks were selected for consideration: Backbone.js, AngularJS 1, AngularJS 2, React, Ember.js, Vue.js and Polymer. Table 1 shows the advantages and disadvantages of these frameworks, taking into account their applicability to the developed platform.

Table 1. Advantages and disadvantages of front-end frameworks

Framework: Backbone.js
Advantages:
– Compact
– Simple structure
– Low barriers to entry
– Good documentation
– Supports REST
Disadvantages:
– Does not support two-way data binding
– Requires additional components to implement complex functionality
– Not suitable for large projects

Framework: AngularJS 1
Advantages:
– High popularity
– Low barriers to entry
– Rich documentation
– Wide community
– Many existing solutions
– It is part of the MEAN stack (MongoDB, Express.js, AngularJS, Node.js)
– Supports REST
– High speed development
– Supports two-way data binding
Disadvantages:
– It is believed to be outdated, since there is an AngularJS 2
– Not compatible with AngularJS 2
– Performance decreases with a sufficiently large amount of data

Framework: AngularJS 2
Advantages:
– Rich documentation
– Wide community
– Has a large number of functions
– Supports REST
– Angular Universal is available for solving problems of search engine optimization (rendering of pages on the server)
– Supports two-way data binding
Disadvantages:
– Uses TypeScript, which is compiled to JavaScript
– Higher barriers to entry compared to AngularJS 1
– It is necessary to take many actions to provide even small functionality

Framework: React
Advantages:
– Compact
– High performance
– Rich documentation
– Suitable for large and complex projects with a high degree of load
Disadvantages:
– Requires an additional implementation on the server to work with data (for example, Flux or Redux)
– Does not support REST
– Not compatible with libraries that modify the DOM
– High barriers to entry
– Complex approach to development

Framework: Ember.js
Advantages:
– Rich documentation
– Large ecosystem
– Suitable for complex and large applications
– Supports REST
– Supports two-way data binding
Disadvantages:
– It is considered to be monolithic in comparison with other frameworks
– There is no reuse of components at the controller level
– High barriers to entry
– Heavy structure
– Too big for small projects

Framework: Vue.js
Advantages:
– Very rapidly growing popularity
– Low barriers to entry
– Few dependencies
– High performance
– Rich documentation
– Good ecosystem
– Supports two-way data binding
Disadvantages:
– A fairly new framework
– Developed mainly by one person
– Not many projects were done
– Does not support REST by default (there is an Axios library)

Framework: Polymer
Advantages:
– New and promising technology: Web Components
– High performance
Disadvantages:
– Relatively new solution
– Great risks when using
– Few ready solutions and examples
– High barriers to entry

Backbone.js is ill-suited for developing large projects, as it lacks the components needed to implement complex functionality. According to the authors, using this framework is inexpedient because it does not have sufficient functionality and alternative solutions exist.

Polymer is a library based on the fairly new Web Components technology, for which the W3C specification is not yet complete. There may be problems with browser support and stability, as well as high barriers to entry for developers. It was therefore decided to abandon this framework because of the possible risks.

React, unlike the others, is a library and does not by itself allow a complete web application to be created, since it is designed to build the View part and must work with data on the server in conjunction with, for example, Flux or Redux. As a result, React is difficult to understand and has an uncommon structure, which complicates the understanding of the application as a whole, and it has high barriers to entry for junior developers. According to the authors, making a quick prototype and supporting the solution is more difficult with React than with another framework.

AngularJS 1, AngularJS 2, Ember.js and Vue.js offer two-way data binding, the ability to build large systems, good documentation and an active community. The main choice is made between these frameworks.

Ember.js has a complex project structure and high barriers to entry for junior developers, and it becomes cumbersome and inflexible once a project goes beyond standard usage. In addition, the framework is less popular than AngularJS and Vue.js [26, 27].

Vue.js version 2 is currently the framework with the fastest-growing popularity; it took the best solutions from Ember.js, React and AngularJS, and also has good performance. However, Vue.js does not support REST by default and requires the additional Axios library for this. In addition, the framework is relatively new and is developed mostly by one person [27], so its use carries greater risks.

As a result, the most appropriate frameworks for developing a platform for psychological research are AngularJS 1 and AngularJS 2. AngularJS 1 is fairly simple to learn and understand, and combines low barriers to entry with a rich set of functions. AngularJS 2 is a parallel project to AngularJS 1 and is developed separately. AngularJS 2 is considerably more complicated: writing even the simplest application requires many more steps. In addition, it is written in TypeScript, which requires additional knowledge from the developers.

Taking the above into account, as well as the fact that the platform for psychological research has limited resources and involves junior developers, the most appropriate solution at the current moment, according to the authors, is AngularJS 1. In addition, AngularJS 1 is more popular than the other frameworks according to GitHub [27].

4.4 Selection of Technological Solutions for the Development of the Server Part of the Application

The server part of the platform can be developed with a fairly wide range of technologies compared with the client part. This is primarily because server technologies depend on the preferences of the developers, the equipment and the requirements for the project, while client technologies are severely limited. The choice of technological solutions for the server components is better started not with programming languages but with frameworks, because, as noted above, they set the basic structure for the development of the application. Table 2 presents the features, advantages and disadvantages of the frameworks that are, in the authors’ opinion, most suitable for developing the server part of the platform.

Table 2. Advantages and disadvantages of server-side frameworks

Framework: Laravel, Symfony (PHP)
Advantages:
– Low barriers to entry
– A large number of PHP developers
Disadvantages:
– Blocking I/O calls
– PHP interpreter has low performance
– No paid support

Framework: Django (Python)
Advantages:
– Low barriers to entry
– Generating the administration panel for relational databases
Disadvantages:
– Blocking I/O calls
– Does not support NoSQL solutions out of the box
– In the development community, there are references to scaling problems under increasing load

Framework: Ruby on Rails (Ruby)
Advantages:
– Low barriers to entry
Disadvantages:
– Blocking I/O calls
– Long-term support of the project has difficulties (complexity of refactoring)

Framework: Express.js (JavaScript/Node.js)
Advantages:
– Not blocking by default (asynchronous)
– Steep learning curve
Disadvantages:
– Development in large groups can be difficult

Framework: Loopback (JavaScript/Node.js)
Advantages:
– Not blocking by default (asynchronous)
– Generating the Preview Panel and Working with the REST API
– Declarative approach to the generation of the REST API
Disadvantages:
– The generated API does not contain methods for mass update of related entities

Framework: Play (Scala/Java)
Advantages:
– Not blocking by default (asynchronous)
– Well scaled even with blocking code
– Strict typing simplifies refactoring
Disadvantages:
– Slow compilation
– New versions of the framework require improvements in the final software

Framework: Vaadin (Java)
Advantages:
– Contains the library of pre-made UI elements
– Front-end code is generated based on the server
– Strict typing simplifies refactoring
Disadvantages:
– Blocking by default
– Slow compilation
– High barriers to entry
– The development of new UI elements is time-consuming
– There is no full control over the frontend code

Framework: ASP .NET MVC (C#)
Advantages:
– Strict typing simplifies refactoring
Disadvantages:
– Locked in to the Windows platform
– Need to purchase Windows Server licenses for deployment

Since it was determined that a high degree of project scalability is required, attention should be paid to frameworks with non-blocking I/O [37]. Laravel, Symfony, Django and Ruby on Rails must therefore be excluded from consideration. The Vaadin framework is also unsuitable for the project because of the complexity of implementing non-blocking I/O and custom interfaces. ASP .NET MVC imposes additional restrictions on the infrastructure without offering significant advantages, so it is excluded from further consideration as well. Thus, the main choice is made between the Express.js, Loopback and Play frameworks.

An important factor is the programming language used with the framework. Express.js and Loopback are written for Node.js (JavaScript), while Play uses Java and Scala. With JavaScript, a single syntax is used for both the client and server parts. This increases the effectiveness of platform development, since a developer needs to know one programming language rather than two, which is an advantage when the number of developers is small. In addition, it allows parts of the learning process to be combined and lowers the overall barriers to entry, which shortens the training of new professionals who join the development of the platform. It is also worth noting that JavaScript is the most popular language in the world according to the statistics of such large services as GitHub and StackOverflow [26, 27]. In this regard, according to the authors, it is more expedient to use the Express.js and Loopback frameworks rather than Play.

Of the remaining two frameworks, Loopback is preferable for a number of reasons:

• Loopback offers a number of patterns that help keep the code base maintainable as it grows.
• The framework is based on Express.js, which makes all of the functional components of Express.js available.
• Loopback offers functionality for simplified API generation, which greatly reduces the amount of labor involved in development.

In the authors’ opinion, these advantages are more significant than the steep learning curve. Thus, the Loopback framework is chosen.
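The elimination reasoning in this subsection can be written down as a short filtering sketch. The flags below encode only what the text and Table 2 state: which frameworks are listed as blocking, which one imposes infrastructure restrictions, and which language each uses. The `Framework` class, its field names and the `shortlist` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Framework:
    name: str
    language: str
    listed_as_blocking: bool      # blocking I/O is listed as a disadvantage
    infra_lock_in: bool = False   # imposes additional infrastructure restrictions


CANDIDATES = [
    Framework("Laravel, Symfony", "PHP", listed_as_blocking=True),
    Framework("Django", "Python", listed_as_blocking=True),
    Framework("Ruby on Rails", "Ruby", listed_as_blocking=True),
    Framework("Express.js", "JavaScript", listed_as_blocking=False),
    Framework("Loopback", "JavaScript", listed_as_blocking=False),
    Framework("Play", "Scala/Java", listed_as_blocking=False),
    Framework("Vaadin", "Java", listed_as_blocking=True),
    Framework("ASP .NET MVC", "C#", listed_as_blocking=False, infra_lock_in=True),
]


def shortlist(candidates: List[Framework], client_language: str = "JavaScript"
              ) -> Tuple[List[str], List[str]]:
    """Apply the three exclusion steps and return the two intermediate shortlists."""
    # Step 1: scalability requirement, keep only non-blocking frameworks.
    non_blocking = [f for f in candidates if not f.listed_as_blocking]
    # Step 2: drop frameworks that restrict the deployment infrastructure.
    portable = [f for f in non_blocking if not f.infra_lock_in]
    # Step 3: prefer the language already used on the client side.
    same_language = [f for f in portable if f.language == client_language]
    return [f.name for f in portable], [f.name for f in same_language]
```

The final step from the two remaining frameworks to Loopback is a qualitative judgement based on the reasons listed above, so it is intentionally not encoded here.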

5 Conclusion

Modern computer technologies can serve as a promising research tool and have a positive influence on future developments in the field of psychological research. In this connection, it is very important for Russian psychologists to have their own tool.

The main architectural aspects of the work were formulated, namely the use of a scalable multicomponent platform architecture that is divided into the server (main) part, public and private (intranet) parts, and external applications (desktop and mobile). Communication and data transfer between components are organized through the REST API. A package was chosen as the basic unit for transferring experiments, which keeps the online tool autonomous when the Internet connection is poor.

To implement these architectural aspects, the most appropriate technologies for the task were selected: the JavaScript language, which will be used to implement most of the software components; the AngularJS 1 framework for the client part of the online application; and the Loopback (Node.js) framework for the API server, which provides a single access point for all other components of the platform. Using JavaScript as the single base language simplifies the development and support of the platform, which is an advantage when the number of developers is small.

Future work will be devoted to the design and development of the platform components. In addition, there are plans to study various methods of horizontal scaling to support the growing amount of incoming data, including stress testing of the resulting web platform.

Acknowledgment. This research is supported by the RFBR grant no. 17-29-02198.

References

1. Buchanan, T., Smith, J.L.: Using the internet for psychological research: personality testing on the World Wide Web. Br. J. Psychol. 90(1), 125–144 (1999)
2. Naglieri, J.A., Drasgow, F., Schmit, M., Handler, L., Prifitera, A., Margolis, A., Velasquez, R.: Psychological testing on the Internet: new problems, old issues. Am. Psychol. 59(3), 150–162 (2004)
3. Zakharov, I., Nikulchev, E., Ilin, D., Ismatullina, V.: Web-based platform for psychology research. In: ITM Web of Conferences, vol. 10 (2017)
4. Birnbaum, M.H.: Human research and data collection via the internet. Annu. Rev. Psychol. 55, 803–832 (2004)
5. Gosling, S.D., Vazire, S., Srivastava, S., John, O.P.: Should we trust web-based studies? a comparative analysis of six preconceptions about internet questionnaires. Am. Psychol. 59(2), 93–104 (2004)
6. Kraut, R., Olson, J., Banaji, M., Bruckman, A., Cohen, J., Couper, M.: Psychological research online: report of board of scientific affairs’ advisory group on the conduct of research on the internet. Am. Psychol. 59(2), 105–117 (2004)
7. Ismatullina, V., Zakharov, I., Nikulchev, E., Malykh, S.: Computerized tools in psychology: cross cultural and genetically informative studies of memory. In: ITM Web of Conferences, vol. 6 (2016)
8. Rimfeld, K., Shakeshaft, N., Malanchini, M., Rodic, M., Selzam, S., Schofield, K., Dale, P., Kovas, Y., Plomin, R.: Phenotypic and genetic evidence for a unifactorial structure of spatial abilities. Proc. Natl. Acad. Sci. 114(10), 2777–2782 (2017)
9. Kuppermann, M., Norton, M.E., Gates, E., Gregorich, S.E., Learman, L.A., Nakagawa, S., Feldstein, V.A., Lewis, J., Washington, A.E., Nease, R.F.: Computerized prenatal genetic testing decision-assisting tool: a randomized controlled trial. Obstet. Gynecol. 113(1), 53–63 (2009)
10. Luciana, M.: Practitioner review: computerized assessment of neuropsychological function in children: clinical and research applications of the cambridge neuropsychological testing automated battery (CANTAB). J. Child Psychol. Psychiatry 44(5), 649–663 (2003)
11. Coutrot, A., Silva, R., Manley, E., de Cothi, W., Sami, S., Bohbot, V., Wiener, J., Hölscher, C., Dalton, R.C., Hornberger, M., Spiers, H.: Global determinants of navigation ability. bioRxiv (2017)
12. MacKenzie, E.P., Hilgedick, J.M.: The computer-assisted parenting program (CAPP): the use of a computerized behavioral parent training program as an educational tool. Child Fam. Behav. Ther. 21(4), 23–43 (2000)
13. Dawson, T.L., Wilson, M.: The LAAS: a computerized scoring system for small- and large-scale developmental assessments. Educ. Assess. 9(3–4), 153–191 (2004)
14. Elfenbein, H.A., Mandal, M.K., Ambady, N., Harizuka, S., Kumar, S.: Cross-cultural patterns in emotion recognition: highlighting design and analytical techniques. Emotion 2(1), 75–84 (2002)
15. Van de Vijver, F.J.R., Poortinga, Y.H.: Towards an integrated analysis of bias in cross-cultural assessment. Eur. J. Psychol. Assess. 13(1), 29–37 (1997)
16. Matsumoto, D., Van de Vijver, F.J.R.: Cross-Cultural Research Methods in Psychology. Cambridge University Press, Cambridge (2010)
17. Naquin, C.E., Kurtzberg, T.R., Belkin, L.Y.: The finer points of lying online: e-mail versus pen and paper. J. Appl. Psychol. 95(2), 387–394 (2010)
18. Weerakoon, G.L.P.: The role of computer-aided assessment in health professional education: a comparison of student performance in computer-based and paper-and-pen multiple-choice tests. Med. Teach. 23(2), 152–157 (2001)
19. Mueller, P.A., Oppenheimer, D.M.: The pen is mightier than the keyboard: advantages of longhand over laptop note taking. Psychol. Sci. 25(6), 1159–1168 (2014)
20. Maier, M.W., Emery, D., Hilliard, R.: Software architecture: introducing IEEE standard 1471. Computer 34(4), 107–109 (2001)
21. Emery, D., Hilliard, R.: Updating IEEE 1471: architecture frameworks and other topics. In: Seventh Working IEEE/IFIP Conference on Software Architecture, WICSA 2008 (2008)
22. Mei, H., Chen, F., Feng, Y.D., Yang, J.: ABC: an architecture based, component oriented approach to software development. J. Softw. 14, 721–732 (2003)
23. Wang, H., He, W., Wang, F.K.: Enterprise cloud service architectures. Inf. Technol. Manag. 13(4), 445–454 (2012)
24. Ross, D.T., Schoman, K.E.: Structured analysis for requirements definition. IEEE Trans. Softw. Eng. 3(1), 6–15 (1977)
25. Li, Z., Liang, P., Avgeriou, P.: Application of knowledge-based approaches in software architecture: a systematic mapping study. Inf. Softw. Technol. 55(5), 777–794 (2013)
26. Stack Overflow Developer Survey 2017: Stack Exchange Inc. (2017). https://insights.stackoverflow.com/survey/2017. Accessed 12 Sep 2017
27. GitHub Octoverse 2016: GitHub Inc. (2016). https://octoverse.github.com/. Accessed 12 Sep 2017
28. Li, L., Chou, W., Zhou, W., Luo, M.: Design patterns and extensibility of REST API for networking applications. IEEE Trans. Netw. Serv. Manag. 13(1), 154–167 (2016)
29. Pautasso, C.: RESTful web services: principles, patterns, emerging technologies. In: Web Services Foundations, pp. 31–51 (2014)
30. Belqasmi, F., Singh, J., Melhem, S.Y.B., Glitho, R.H.: SOAP-Based vs. RESTful web services: a case study for multimedia conferencing. IEEE Internet Comput. 16(4), 54–63 (2012)
31. Pezoa, F., Reutter, J.L., Suarez, F., Ugarte, M., Vrgoc, D.: Foundations of JSON schema. In: Proceedings of the 25th International Conference on World Wide Web, pp. 263–273 (2016)
32. JSON Schema (2017). http://json-schema.org/. Accessed 14 Sep 2017
33. Garcia-Zubia, J., Orduña, P., López-de-Ipiña, D., Hernández, U., Trueba, I.: Remote laboratories from the software engineering point of view. In: Advances on Remote Laboratories and E-Learning Experiences, pp. 131–149 (2007)
34. Garcia-Zubia, J., Orduna, P., Lopez-de-Ipina, D., Alves, G.R.: Addressing software impact in the design of remote laboratories. IEEE Internet Comput. 56(12), 4757–4767 (2009)
35. Vaughan-Nichols, S.J.: Will HTML 5 restandardize the web? Computer 43(4), 13–15 (2010)
36. Prince, J.D.: HTML5: not just a substitute for flash. J. Electron. Resour. Med. Libr. 10(2), 108–112 (2013)
37. Lei, K., Ma, Y., Tan, Z.: Performance comparison and evaluation of web development technologies in PHP, Python, and node.js. In: 2014 IEEE 17th International Conference on Computational Science and Engineering (CSE), pp. 19–21 (2014)

Modeling Race-Tracking Variability of Resin Rich Zones on 90° Composite 2/2 Twill Fibre Curve Plate

Spiridon Koutsonas

Faculty of Computing and Engineering/NIACE Advanced Composites and Engineering, Ulster University/NIACE (North Ireland Advanced Composites Engineering Centre), Belfast, UK

Abstract. Continuous fibre reinforced composites are widely used for aerospace, automotive, marine and civil applications due to their light weight and enhanced mechanical properties. Liquid Composite Moulding (LCM) processes are among the most common manufacturing routes for composites. The resin flow behaviour during impregnation is affected by the preform properties, namely fibre orientation and textile volume fraction, which can vary locally. These local variations are induced by the mould geometry, the fabric architecture and the forming process. Advanced Composites Structures are made of geometrically complex 2D or 3D woven preforms, which makes the impregnation process hard to control and can cause defects in the final component. Industrial experience has shown that during mould filling, due to race-tracking and stochastic variability in the material properties, the filling patterns and resulting cycle times are rarely the same between a given set of apparently identical mouldings. Modelling the race-tracking variability of resin rich zones on a 90° composite fibre curve plate is therefore an important issue, and it is presented in this paper.

Keywords: Modeling · Liquid infusion · Composite materials · 90° curve plate · Fibres · Race-track permeability

© Springer Nature Switzerland AG 2019. K. Arai et al. (Eds.): SAI 2018, AISC 858, pp. 686–707, 2019. https://doi.org/10.1007/978-3-030-01174-1_53

1 Introduction

Resin rich zones along component edges are a common phenomenon during liquid infusion of composites with the resin transfer moulding (RTM) process. The challenge in the present work is to predict the race-track flow behaviour along a 90° edge in order to manufacture high quality composite materials for aerospace or other applications. At present there is a lack of advanced simulation tools capable of predicting the manufacture of multi-layer textile composites. In this paper, 2D and 3D race-track prediction was investigated along a 90° edge for a composite textile. A novel numerical approach for 3D FE CAD modelling was developed in order to predict race-tracking permeability for any composite structure.

Approaches to estimate race-tracking and local permeability during the impregnation of a composite have been studied by Rudd et al. [1]. Li et al. [2] used a stochastic simulation based approach for statistical analysis and composites characterization. Liu et al. [3] modelled RTM gate control, venting and dry spot prediction. Schell et al. [4] published on the numerical prediction and experimental characterisation of meso-scale voids in the LCM process. Simacek and Advani [5] provided a numerical model to predict fibre tow saturation during LCM. Long [6, 15] gave a description of race-tracking and of what may happen during the composites manufacturing process. Lawrence et al. [7] characterized the preform permeability in the presence of race-tracking. Bickerton et al. [8, 9] researched the effects of fabric structure and mould curvature on preform permeability and mould filling during the RTM process. Endruweit et al. [10, 22] showed the influence of stochastic fibre angle variations on the permeability, first, of bi-directional textile fabrics and, second, of random discontinuous carbon fibre preforms. Babu and Pillai [11] presented an experimental investigation of the effects of fibre-mat architecture on the unsaturated flow during the LCM process. Hieber and Shen [12] studied injection mould filling computationally with finite element and finite difference methods. Frederick and Phelan [13] provided further insights into computational modelling. Trochu et al. [14] used the FE method for numerical analysis of the RTM process. Devillard et al. [16] provided an on-line characterization of bulk permeability and race-tracking during the filling stage of the RTM process. Andersson et al. [17] provided a numerical model for vacuum infusion manufacturing of polymer composites. Hammami et al. [18, 19] modelled the edge effect in LCM and analysed the vacuum infusion moulding process. Lawrence et al. [20] studied automated manufacturing in order to address bulk permeability variations and race-tracking in RTM with auxiliary gates. Weimer et al. [21] researched an approach to net-shape preforming using textile technologies on the edges. Pillai et al. [23] modelled the heterogeneities present in preforms during mould filling in RTM, addressing the influence of various race-tracking situations on the flow pattern.

The 90° composite fibre curve plate has been studied by various researchers. Dong [25] presented a model for the formation of resin-rich zones in composites processing on a 90° composite fibre curve plate. Devillard, Hsiao and Advani [26] presented flow sensing and control strategies to address race-tracking disturbances in the resin transfer moulding process. Bickerton and Advani [27] characterized and modelled race-tracking in liquid composite moulding processes. Finally, Koutsonas [28] measured the compaction and bending variability of a novel 3D woven layer-to-layer interlock composite textile around a 90° curve plate of 3.2 mm radius.

This study therefore addresses a problem of high relevance in the manufacturing of composites with complex geometry. It aims to describe the shape of the gaps forming at 90° bends in the mould, to model the related through-thickness variations in the effective preform permeability, and to simulate numerically the impregnating resin flow, with particular focus on race-tracking in the gap.

2 Race-Track Modelling

2.1 Race-Track Modelling Procedure

Macro-flow models consider flows through a fibre preform with a defined permeability. In the present work the race-track effect was modelled by grouping different zones on the 90° curved plate reinforcement and by assigning to each of them a single local permeability. The modelling software used was PAM-RTM®, developed by ESI Group [24]. Despite its many advantages, this software is quite time consuming: 3D computations are long, and complex parts (such as Advanced Composites Structures) with multiple injection ports and vent gates require high-speed processing power. Within PAM-RTM® the saturated, partially saturated and unsaturated flow regions are modelled using the Finite Element Control Volume method, where each node is assigned a fill factor of one (filled node), between zero and one (partially filled node) or zero (unfilled node).

2.2 FE CAD 90° Curved Plate Race-Track Modelling

(1) 2D, FE CAD 90° curved plate race-track modelling

As soon as the resin flow front arrives at the 90° curve plate edges of the mould (where gaps often exist between the preform and the mould wall), the resin tends to flow faster than elsewhere in the mould, as reported for example by Liu et al. [3]. Bickerton et al. [8, 9] reported the effects of mould curvature on preform permeability during mould filling in RTM in their experimental work. To take this effect into account and to model the variability, the following approach is used. First, the critical variability areas (gap channels) of a component must be considered. At edges, the local permeability during filling may differ from that of the rest of the component because of a free channel between the mould wall and the fabric. The local permeability may then be given by the empirical equation reported by Endruweit et al. [22] and Pillai et al. [23]:

$$K = \frac{h^{2}}{12} \tag{1}$$

where K (m²) is the permeability of the gap and h is the gap height between the fabric and the mould wall (quoted in mm in this paper; it has to be expressed in m for K to be obtained in m²). For all 2D and 3D curve plate models, a linear injection gate (inlet) at one end and a linear vent (outlet) at the other end were used, as shown in Fig. 1. First, a 2D CAD model of the 90° curve plate was designed and meshed with triangular finite elements in the Altair HyperMesh software, as shown in Fig. 2. Five different zones were generated along the 90° curved plate. Element properties were determined by taking the thickness-weighted average of the permeability of the compacted preform and of the gap, assuming good in-plane flow behaviour as suggested by Trochu et al. [14], according to:

$$K_{2D} = \frac{h_{fabric} K_{fabric} + h_{gap} K_{gap}}{h_{fabric} + h_{gap}} \tag{2}$$


Fig. 1. Curved plate model with injection gate (group 1, indicated by the blue line on the left) and vent (group 2, indicated by the green line at the right end). Red arrows show the resin through-thickness flow direction for all 2D and 3D 90° curve plate CAD models.

Fig. 2. 2D FE model with 9800 triangular elements (due to PAM-RTM) on a 90° curved plate geometry, with five zones (see red arrow; green, yellow, purple, turquoise and orange) of different local permeability for race-track modelling, and a blue zone with the fabric's permeability.

where K2D (m²) is the average 2D-element permeability, Kfabric is the fabric permeability, Kgap is the gap permeability, hfabric is the preform height and hgap is the gap height between the preform and the mould wall. For an anisotropic preform, the in-plane permeabilities are therefore (Trochu et al. [14]):

$$K_{1aver} = \frac{h_{fabric} K_{1} + h_{gap} K_{gap}}{h_{fabric} + h_{gap}} \tag{3}$$

and

$$K_{2aver} = \frac{h_{fabric} K_{2} + h_{gap} K_{gap}}{h_{fabric} + h_{gap}}$$

K1aver and K2aver were then used as local permeability input data, along with the material properties given as power laws in Tables 1 and 2.

Table 1. Compaction tests power law fitting

Preform      Layers      H height (mm)
2/2 Twill    3 layers    1.8637 P^(−0.078)
2/2 Twill    2 layers    1.2424 P^(−0.078)

Table 2. Preform permeability K1, K2, K3 against Vf that may be used in simulation tools

(a) In-plane K1, K2 fitting equations

Preform           K1 fitting eq. (m²)     K2 fitting eq. (m²)
2/2 Twill (cf)    0.001 Vf^(−11.28)       0.0012 Vf^(−10.73)

(b) Through-thickness K3 fitting equation

Preform      K3 fitting eq. (m²)
2/2 Twill    0.0593 Vf^(−6.644)
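To make the thickness-weighted averaging of Eqs. (1) to (3) concrete, the following minimal Python sketch computes the averaged element permeability for one bend zone. It is only an illustration under stated assumptions: the function names are hypothetical, the 1.90 mm fabric height is simply the 2.75 mm cavity minus a 0.85 mm gap, and 3.40E−10 m² is used as an assumed in-plane fabric permeability rather than a value prescribed for any particular zone.

```python
def gap_permeability(gap_height_m):
    """Equivalent permeability of an open gap, Eq. (1): K = h^2 / 12 (h in metres)."""
    return gap_height_m ** 2 / 12.0


def averaged_permeability(h_fabric_m, k_fabric_m2, h_gap_m):
    """Thickness-weighted average of fabric and gap permeability, Eqs. (2)-(3)."""
    k_gap = gap_permeability(h_gap_m)
    return (h_fabric_m * k_fabric_m2 + h_gap_m * k_gap) / (h_fabric_m + h_gap_m)


# Illustrative inputs (assumptions, not prescribed by the paper):
H_GAP = 0.85e-3              # m, gap height
H_FABRIC = 2.75e-3 - H_GAP   # m, fabric height in a 2.75 mm cavity
K_FABRIC = 3.40e-10          # m^2, assumed in-plane fabric permeability

k_avg = averaged_permeability(H_FABRIC, K_FABRIC, H_GAP)
print(f"fabric only : {K_FABRIC:.3e} m^2")
print(f"averaged    : {k_avg:.3e} m^2")
```

With these inputs the averaged value is dominated by the gap term, which is why even a thin channel can change the local 2D permeability strongly.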

FE modelling in 2D provided an average estimate of how much the local permeability may be affected by the bending of the fabric along a 90° angle during infusion in the RTM process.

(2) 3D, FE CAD 90° curved plate race-track modelling

To model three-dimensional flow, 3D FE CAD models were designed and meshed with bend zone segmentation. For each zone, the average fabric thickness Hfabric was determined together with the equivalent preform permeability in relation to the fibre volume fraction Vf, and the gap thickness hgap together with the equivalent gap permeability according to (1). In this case a mesh of tetrahedral finite elements was used, which allows higher geometrical detail than other types of finite elements. However, a high mesh density leads to an FE model which requires intensive computation. For practical reasons it was therefore preferable to limit the mesh refinement of the FE CAD model and the number of zones along the curved plate (see Fig. 3). 3D race-tracking was thus studied by creating a series of 3D FE CAD models with different maximum gap heights (hmax). In Fig. 4(a) and (b) the 2.75 mm radius height was divided into three sections:

(1) A blue zone 1 with the fabric's permeability.
(2) An outer gap section with the measured maximum gap height (hmax), consisting of resin rich zones 2, 3 and 4 (green, yellow and purple).
(3) A compressed bent fabric section along the 90° curved plate, zones 5, 6 and 7 (pink, orange and turquoise), under the race-track channel, with higher volume fraction Vf and compressed fabric permeability.

Fig. 3. 3D curve plate, 2.75 mm thick, with a refined mesh of 1064399 tetrahedral elements and 10 elements through the thickness along the 90° angle, for the race-track modelling study.

Within the zones, the permeability of the gap was assigned according to (1). The gap height was derived from stochastic modelling based on experimental observations (Koutsonas [28]). On the bent preform and along the 90° angle, the material properties (Tables 1, 2 and 3) were assigned with the appropriate volume fraction Vf. Thereafter a series of six 3D FE CAD models was designed and constructed with hmax = 0.35, 0.45, 0.55, 0.65, 0.75 and 0.85 mm, i.e. values within the range of those measured with CMM experimental observations (Koutsonas [28]). Gaps with hmax less than 0.35 mm were not modelled, because the stochastic variability of the tested preform made it impossible to quantify such a small gap experimentally. The experimental behaviour of each of the tested fabrics was thereafter compared with the closest behaviour of the 2D and 3D FE CAD models generated above. FE modelling in 3D provided more detailed information about how much the local permeability may be affected by the bending of the fabric along a 90° angle during infusion in the RTM process. For the flow behaviour during RTM, fluid mass conservation and the equations of unsaturated flow of a viscous liquid through a porous medium were solved by PAM-RTM®, based on an adaptive mesh as presented by Trochu et al. [14]. All the 2D and 3D FE CAD model simulations were scaled to the same dimensions as the Perspex cavity tool (i.e. 150 × 150 × 2.75 mm) that was subsequently used for the modelling verification.
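As a worked illustration of Eq. (1), and assuming the gap height is converted from mm to m before squaring, the two extreme gap heights of the model series give approximately:

```latex
\begin{align*}
h_{max} = 0.35\ \text{mm}: &\quad K_{gap} = \frac{(0.35\times10^{-3}\ \text{m})^{2}}{12} \approx 1.02\times10^{-8}\ \text{m}^{2} \\
h_{max} = 0.85\ \text{mm}: &\quad K_{gap} = \frac{(0.85\times10^{-3}\ \text{m})^{2}}{12} \approx 6.02\times10^{-8}\ \text{m}^{2}
\end{align*}
```

Both values are roughly 30 to 180 times the largest principal permeability used in the sensitivity studies (3.40E−10 m², Tables 4 and 5), which is consistent with the gap acting as a preferential flow channel.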


Fig. 4. 3D FE CAD 90° curved plate models meshed with a seven-zone geometry of different permeability: three hgap zones (green, yellow, purple; see red arrows), three compressed zones with different Vf (turquoise, orange, pink) and a blue zone with the fabric's permeability; (a) with 154350 tetrahedral elements and hmax 0.85 mm, (b) with 300600 tetrahedral elements and hmax 0.35 mm.

Table 3. 2/2 twill areal density measured experimentally

Mass (g)    Area (mm × mm)    d = m/A (kg/m²)
343         700 × 700         0.7
66.8        360 × 280         0.663
67.5        360 × 280         0.67
64          400 × 230         0.696
65.8        400 × 235         0.7
121.42      (average)         0.686

2.3 Permeability Vector Orientations Inside 90° 3D Curved Plate

For all the 3D FE CAD curved plate geometry models (one of which is shown in Fig. 5), permeability vector orientations were incorporated for each tetrahedral element. For this, every tetrahedral element was assigned the warp in-plane K1-permeability orientation and the weft in-plane K2-permeability orientation, as shown in Fig. 5. The through-thickness K3-permeability orientation was automatically taken by the PAM-RTM® software to be perpendicular to K1 and K2.

Fig. 5. Permeability vector orientations K1 (red arrows), K2 (green arrows) on a 3D curved plate model.
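The perpendicularity of K3 to K1 and K2 can be illustrated with a cross product. The sketch below is not how PAM-RTM® computes the orientation internally, which is not described here. It assumes a right-handed convention, a weft direction along the bend axis (y) and a hypothetical warp direction that follows the bend in the x-z plane.

```python
import math


def cross(a, b):
    """Cross product of two 3D vectors given as tuples."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def unit(v):
    """Scale a 3D vector to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)


def k3_direction(k1_dir, k2_dir):
    """Direction perpendicular to both in-plane directions (right-handed, assumed)."""
    return unit(cross(k1_dir, k2_dir))


def warp_direction_on_bend(theta_deg):
    """Hypothetical warp (K1) direction that follows the bend in the x-z plane."""
    theta = math.radians(theta_deg)
    return (math.cos(theta), 0.0, -math.sin(theta))


WEFT = (0.0, 1.0, 0.0)  # assumed: weft (K2) runs along the bend axis

for angle in (0.0, 45.0, 90.0):
    k1 = warp_direction_on_bend(angle)
    k3 = k3_direction(k1, WEFT)
    print(f"bend angle {angle:4.1f} deg -> K3 direction {k3}")
```

As the warp direction rotates around the bend, the derived K3 direction rotates with it and stays normal to the local fabric plane.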

2.4 2/2 Twill, Triaxial, 3D Woven Layer to Layer Interlock ACTS Fabric Volume Fractions for Simulations

In order to evaluate the areal density (d) of the tested fabric (2/2 twill), a series of experimental measurements of mass (g) and area (m²) was made for each preform. The average areal density of each preform was then defined as the mass (g) divided by the area (m²), as shown in Table 3. Using Eqs. (3–2), the material properties and the mould height, the fibre volume fraction was calculated as a function of mould thickness, as shown in Table 3.

(1) 2/2 twill preform

Volume fraction from the numerical calculation with Eqs. (3–2) for the 2/2 twill 3-layer preform at 2.75 mm thickness: Vf,3-layer = 42.6%. The calculated fibre volume fractions were used as simulation input data with PAM-RTM®, first for the FE CAD 90° curve plate models in this paper.
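The areal density and volume fraction calculation can be sketched as follows. Eqs. (3–2) are not reproduced in this text, so the sketch assumes the commonly used relation Vf = n·d / (ρf·h), with n layers, areal density d, fibre density ρf and cavity height h. The fibre density of 1760 kg/m³ is an assumed typical carbon fibre value, not a figure taken from the paper, and the function names are hypothetical.

```python
# Samples from Table 3: (mass in g, length in mm, width in mm)
SAMPLES = [
    (343.0, 700.0, 700.0),
    (66.8, 360.0, 280.0),
    (67.5, 360.0, 280.0),
    (64.0, 400.0, 230.0),
    (65.8, 400.0, 235.0),
]


def areal_density(mass_g, length_mm, width_mm):
    """Areal density d = mass / area, returned in kg/m^2."""
    area_m2 = (length_mm * 1e-3) * (width_mm * 1e-3)
    return (mass_g * 1e-3) / area_m2


def fibre_volume_fraction(n_layers, d_kg_m2, fibre_density_kg_m3, cavity_height_m):
    """Assumed relation: Vf = n * d / (rho_f * h)."""
    return n_layers * d_kg_m2 / (fibre_density_kg_m3 * cavity_height_m)


mean_d = sum(areal_density(*s) for s in SAMPLES) / len(SAMPLES)

ASSUMED_FIBRE_DENSITY = 1760.0  # kg/m^3, assumption (not stated in the paper)
vf = fibre_volume_fraction(3, mean_d, ASSUMED_FIBRE_DENSITY, 2.75e-3)

print(f"mean areal density: {mean_d:.3f} kg/m^2")
print(f"Vf for 3 layers in a 2.75 mm cavity: {100 * vf:.1f} %")
```

Under this assumed density the result lands close to the 42.6% quoted above. The exact figure depends on the fibre density actually used.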

Fig. 6. (a)–(c) 2D 2/2 twill Δy race-track evaluation for the 0.85 mm gap, modelled with the PAM-RTM software (red: filled area; blue: unfilled; other colours: partially filled). Panels: (a) at 5 cm from the injection gate, (b) at 10 cm from the injection gate, (c) at 15 cm from the injection gate.

2.5 Mesh Sensitivity

In this section the 2D and 3D 90° curved plate mesh sensitivity studies are presented. All sensitivity studies were performed for a linear injection gate and vent along the preform bend geometry in 2D and 3D, as shown in Fig. 6(c). The fill time (s) of each model was provided by the simulation. The analytical filling time was calculated with

$$t_{ff} = \frac{\Phi \mu x_{ff}^{2}}{2 K P_{inj}} \tag{4}$$

for rectilinear flow, as presented by Rudd et al. [1], where Φ is the bed porosity, µ is the fluid viscosity, K is the permeability (m²), Pinj is the injection pressure (bar) and xff is the filling distance (m). All models had the same dimensions as the Perspex tool cavity used for the experimental modelling verification.

2.6 2D, 3D 90° Curved Plate Sensitivity

2D and 3D 90° curved plate models without the race-track gap segmentation were studied for sensitivity verification using the K principal permeabilities (m²) shown in Tables 4 and 5. The models were simulated with 1 bar pressure and 0.3 Pa·s viscosity. All results agreed closely with the analytical fill time.

Table 4. 2D 90° curve plate sensitivity with 6899 triangular elements

K principal permeability (m²)    Filling time (s)    Analytical filling time (s)
3.40E−10                         8.79                8.82
2.22E−10                         5.77                5.75
5.21E−12                         13.36               13.5

2.7 2D, 3D Race-Tracking 90° Curved Plate Sensitivity

2D race-track sensitivity was verified for a series of models, as shown in Table 6 (first with one segment on the bend geometry), Table 7 and Fig. 4, using the permeability data presented in Tables 1 and 2 for the 2/2 twill fabric. The 2D 90° curved plate models with five-segment race-track zones, as per Fig. 2, were simulated with 1 bar pressure and 0.3 Pa·s viscosity, as measured for the industrial oil HDX30 at lab temperature (19 °C). Table 7 shows the race-track fill time against the number of elements, illustrating that convergence was reached with 93600 triangular elements. In Table 6, for example, the race-track fill time is the fill time along the race-track zone, the edge fill time is the fill time at the edge of the mould, and the Δy lag position is the distance between them.

Table 5. 3D 90° curve plate sensitivity with 169892 tetrahedral elements

K principal permeability (m²)    Filling time (s)    Analytical filling time (s)
3.40E−10                         8.69                8.82
2.22E−10                         5.76                5.75
5.21E−12                         13.29               13.5
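The analytical filling time of Eq. (4), against which the simulated values in Tables 4 and 5 are compared, can be sketched in a few lines. The viscosity (0.3 Pa·s) and pressure (1 bar) are those quoted in the text. The porosity, permeability and filling distance below are purely illustrative assumptions, so the printed time is not meant to reproduce any table entry. The pressure is converted from bar to Pa so that all quantities are in SI units.

```python
BAR_TO_PA = 1.0e5  # 1 bar = 100 000 Pa


def analytical_fill_time(porosity, viscosity_pa_s, permeability_m2,
                         injection_pressure_bar, fill_distance_m):
    """Rectilinear fill time, Eq. (4): t = phi * mu * x^2 / (2 * K * P)."""
    pressure_pa = injection_pressure_bar * BAR_TO_PA
    return (porosity * viscosity_pa_s * fill_distance_m ** 2
            / (2.0 * permeability_m2 * pressure_pa))


# Viscosity and pressure as quoted in the text; the rest are assumptions.
t_fill = analytical_fill_time(
    porosity=0.5,             # assumed
    viscosity_pa_s=0.3,
    permeability_m2=1.0e-10,  # assumed
    injection_pressure_bar=1.0,
    fill_distance_m=0.10,     # assumed
)
print(f"analytical fill time: {t_fill:.1f} s")
```

Because the time scales with the square of the filling distance and inversely with permeability, doubling the distance quadruples the fill time, while doubling the permeability halves it.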

Table 6. 2D 90° curved plate with one segment racetrack zone

No. of triag. elem.    Racetrack fill time (s)    Edge fill time (s)    Δy lag position (cm)    Racetrack zones
22320                  1.69                       2.13                  1.67                    1 zone gap

Table 7. 2D 90° curve plate 5 gap zone racetrack sensitivity studies

No. of triag. elem.    Racetrack fill time (s)    Edge fill time (s)    Δy lag position (cm)    Racetrack zones
9800                   1.77                       2.12                  1.29                    5 zones
28800                  1.77                       2.13                  1.33                    5 zones
50400                  1.74                       2.132                 1.42                    5 zones
93600                  1.74                       2.135                 1.44                    5 zones
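One simple way to read the convergence in Table 7 is to look at the relative change of each output between successive meshes. The sketch below does this with the tabulated values. The 2% threshold is an assumption chosen for illustration, since the text does not state a numerical convergence criterion.

```python
# Table 7: (elements, racetrack fill time s, edge fill time s, delta-y lag cm)
TABLE_7 = [
    (9800, 1.77, 2.12, 1.29),
    (28800, 1.77, 2.13, 1.33),
    (50400, 1.74, 2.132, 1.42),
    (93600, 1.74, 2.135, 1.44),
]


def relative_changes(values):
    """Relative change between each pair of successive mesh results."""
    return [abs(b - a) / abs(a) for a, b in zip(values, values[1:])]


def is_converged(values, tolerance):
    """True if the last refinement changed the result by at most `tolerance`."""
    return relative_changes(values)[-1] <= tolerance


TOLERANCE = 0.02  # assumed 2 % criterion, not taken from the paper

columns = {
    "racetrack fill time": [row[1] for row in TABLE_7],
    "edge fill time": [row[2] for row in TABLE_7],
    "delta-y lag": [row[3] for row in TABLE_7],
}
for name, values in columns.items():
    changes = ", ".join(f"{100 * c:.2f}%" for c in relative_changes(values))
    print(f"{name}: changes {changes}; converged: {is_converged(values, TOLERANCE)}")
```

The fill times settle early, while the Δy lag position is the slowest quantity to stabilise, which is worth keeping in mind when choosing the mesh for race-track studies.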

Table 8. 3D 90° curved plate upper side racetrack sensitivity

No. of tetrah. elem.    Racetrack fill time (s)    Edge fill time (s)    Δy lag position edge (cm)    Racetrack zones
22680                   0.736                      2.13                  6.17                         3 out of 6
154350                  0.755                      2.15                  6.2                          3 out of 6

Table 9. 3D 90° curve plate lower side racetrack sensitivity

No. of tetrah. elem.    Racetrack fill time (s)    Edge fill time (s)    Δy lag position edge (cm)    Racetrack zones
22680                   0.818                      2.13                  2.7                          3 out of 6
154350                  0.839                      2.15                  3.2                          3 out of 6

Because mesh refinement of the gap geometry required a 3D structured mesh (highly time consuming and computationally intensive), race-tracking mesh sensitivity was only considered for two 3D models like those presented in Fig. 4. The race-tracking was calculated at the vent end of the upper side of the 3D curved plate model.

The Δy lag race-track position was calculated at the injection point of the opposite (lower) side of the 3D curved plate model. Table 8 shows the race-track sensitivity for the upper side of the 3D curved plate models, and Table 9 shows the Δy lag race-track sensitivity for the lower side. The viscosity used for the 3D simulations was again 0.3 Pa·s and the injection pressure was 1 bar. The edge filling times for increasing numbers of tetrahedral elements agree well in both the 3D and 2D cases, with small variation, which demonstrates an acceptable level of mesh convergence.

3 Results and Discussion

The experimental flow behaviour was captured through imaging with web cameras controlled by the MATLAB code presented in Appendix A. In the following sections, the simulation which was the closest match to a particular experiment is presented. The race-track gap height used in that simulation is presumed to represent the gap present in the experiment. This comparison yielded the race-track behaviour of the tested fabrics along the 90° angle. Considering the different mould materials and the testing of similar preforms, some small flow variations were expected from the previously tested 2/2 twill fabrics. This was due to:

• Stochastic fabric variability, which was not introduced in the 90° curved plate FEA model.
• Local variation of the friction-compression coefficient (µfp) of the preforms.
• Lateral race-tracking on edges and corners of the model during infusion, which was likewise not considered in the FEA modelling.

However, the compaction behaviour of the fabrics tested on the aluminium base and on Perspex was not expected to differ considerably, as both surfaces were comparably smooth.

3.1 2/2 Twill Race-Tracking

In this section the 2D and 3D FE curved plate models obtained with the PAM-RTM® software, shown in Figs. 6(a)–(c), 7(a), (b) and 8(a), (b), are compared with the experimental results for the 2/2 twill fabric in Figs. 7(c), (d) and 8(c), (d).

Fig. 7. 2/2 twill race-track evaluation (at y = 10 cm of race-track) for the 0.85 mm gap size, obtained first with the PAM-RTM software 3D FE CAD model (top images (a), (b)) and second with the Perspex transparent tool (lower images (c), (d)). Panels: (a) upper side, PAM-RTM FE model; (b) lower side, PAM-RTM FE model; (c) upper side, Perspex; (d) lower side, Perspex.

(1) 2D 2/2 twill race-tracking for 0.85 mm gap FE model

As input for the 2D flow simulations, thickness-weighted averaging of the equivalent gap permeability (Fig. 2) and of the permeability of the compressed fabric allows the effective permeability of each segment of the bend to be estimated. In the 2D race-track model the flow front (red filled area) is symmetrical, as shown in Fig. 6. Table 10 presents the y-fit race-track analysis for Fig. 6.

(2) Results discussion: 2D 2/2 twill 0.85 mm gap FE model

The simulated 2D average flow front position shows race-tracking, and the lagging pressure gradient shapes reflect the dominating effect of race-tracking in the gap, as shown in Fig. 6 and presented numerically in Table 10. For the 2D case, with the simulated y-fill position at the centre measured at 5, 10 and 15 cm from the inlet, the y-fill position at the top left and bottom right was found to be 2.42, 5.98 and 9.99 cm respectively, giving an increasing Δy race-tracking flow front of 2.58, 4.02 and 5.01 cm towards the outlet.

(3) 3D 2/2 twill race-tracking for 0.85 mm gap FE model

In order to obtain more accurate results, through-thickness variations in permeability have to be taken into account in the 3D FE flow simulation. The 3D 2/2 twill weave FE model shown in Fig. 7 for a 0.85 mm gap height along the 90° angle, compared with the Perspex experiment, revealed about the same fluid flow behaviour, and therefore a gap of about the same height, for the upper side bend of the tool.

Fig. 8. 2/2 twill race-track evaluation for the 0.35 mm gap size, obtained with the PAM-RTM software 3D FE CAD model (images (a), (b)) and with the Perspex transparent tool (images (c), (d)). Panels: (a) upper side, PAM-RTM model; (b) lower side, PAM-RTM model; (c) upper side, Perspex; (d) lower side, Perspex.

Table 10. 2D 2/2 twill race-track flow shape analysis from the inlet for the 0.85 mm gap FE model of Fig. 6(a), (b) and (c). Δy race-track = fill at 90° centre (-) average of upper/lower fill.

y-fill position 90° centre (cm) (a)    y-fill position upper (cm)    y-fill position lower (cm)    Δy race-track (cm)
5                                      2.42                          2.42                          2.58
10                                     5.98                          5.98                          4.02
15                                     9.99                          9.99                          5.01

(a) For brevity, only the y-fit images are presented from here on.

The shape analysis of the resin flow simulation for the 2/2 twill along the 90° angle in Fig. 7 is presented in Table 11. The 3D modelling race-track results in Fig. 7, comparing Table 11(a) with (b) and Table 11(c) with (d), are closer to the experimental results than the 2D modelling results in Table 7, as expected due to the more realistic Vf along the 90° curved plate bend.

(4) Results discussion: 3D 2/2 twill 0.85 mm gap FE model

In this situation (case 1) the simulated flow front pressure gradient shapes reflect the dominating effect of race-tracking in the gap.

Table 11. 2/2 twill race-track flow shape analysis from the inlet for the 0.85 mm gap FE model against the 2/2 twill Perspex experiment: (a), (b) upper sides; (c), (d) lower sides (related to Fig. 7). Δy race-track = fill at 90° centre (-) average of upper/lower fill.

(a) 3D 2/2 twill FE (PAM-RTM) model shape analysis, upper side

y-fill position 90° centre (cm)    y-fill position upper (cm)    y-fill position lower (cm)    Δy race-track (cm)
5                                  1.43                          1.43                          3.57
10                                 3.33                          3.33                          6.67
15                                 6.36                          6.36                          8.64

(b) 2/2 twill Perspex exp. shape analysis, upper side

y-fill position 90° centre (cm)    y-fill position upper (cm)    y-fill position lower (cm)    Δy race-track (cm)
5                                  1.38                          1.86                          3.38
10                                 2.75                          2.83                          7.21
15                                 4.03                          5.75                          10.11

(c) 3D 2/2 twill FE (PAM-RTM) model shape analysis, lower side

y-fill position 90° centre (cm)    y-fill position upper (cm)    y-fill position lower (cm)    Δy race-track (cm)
5                                  3.32                          3.32                          1.68
10                                 6.74                          6.74                          3.26
15                                 11.2                          11.2                          3.8

(d) 2/2 twill Perspex exp. shape analysis, lower side

y-fill position 90° centre (cm)    y-fill position upper (cm)    y-fill position lower (cm)    Δy race-track (cm)
5                                  3.11                          3.01                          1.94
10                                 6.87                          6.82                          3.16
15                                 11.31                         11.41                         3.64
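The Δy column of the shape analysis tables is the centre fill position minus the average of the upper and lower fill positions. A minimal sketch, using the Perspex upper-side rows of Table 11(b) as input (the function name is hypothetical):

```python
def delta_y_racetrack(centre_cm, upper_cm, lower_cm):
    """Race-track lead: centre fill position minus mean of upper/lower positions."""
    return centre_cm - 0.5 * (upper_cm + lower_cm)


# Rows of Table 11(b): (centre, upper, lower) fill positions in cm
PERSPEX_UPPER_SIDE = [
    (5.0, 1.38, 1.86),
    (10.0, 2.75, 2.83),
    (15.0, 4.03, 5.75),
]

for centre, upper, lower in PERSPEX_UPPER_SIDE:
    dy = delta_y_racetrack(centre, upper, lower)
    print(f"centre {centre:4.1f} cm -> delta-y {dy:.2f} cm")
```

A positive value means the flow front in the bend leads the flow at the edges (race-tracking). A negative value, as in the lower-side rows of Table 12, means it lags.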

This observation indicated that the volume fraction along the 90° angle was correctly evaluated for the 2/2 twill textile and that the fluid flow between the mould and the 2/2 twill fabric was predicted more accurately with 3D FE modelling. The modelling results for the fluid flow agree well with the experiment, as shown in Fig. 7 and Table 11. Considering that the local variability of the fabric and the other phenomena listed in the bullet points of Sect. 3 were not taken into account in the 3D FE model, these are satisfactory results.

(5) 3D 2/2 twill race-tracking for 0.35 mm gap FE model

A further 2/2 twill weave Perspex experiment revealed a fluid flow filling behaviour similar to that of the 3D FE (PAM-RTM®) model with 0.35 mm gap height for the upper side bend of the tool.

Table 12. 2/2 twill race-track flow shape analysis from the inlet for the 0.35 mm gap FE model against the 2/2 twill Perspex experiment: (a), (b) upper sides; (c), (d) lower sides (related to Fig. 8). Δy = fill at 90° centre (-) average of upper/lower fill.

(a) 3D 2/2 twill FE (PAM-RTM) model shape analysis, upper side

y-fill position 90° centre (cm)    y-fill position upper (cm)    y-fill position lower (cm)    Δy race-track (cm)
5                                  3.94                          3.94                          1.06
10                                 8.7                           8.7                           1.3
15                                 13.7                          13.7                          1.3

(b) 2/2 twill Perspex exp. shape analysis, upper side

y-fill position 90° centre (cm)    y-fill position upper (cm)    y-fill position lower (cm)    Δy race-track (cm)
5                                  4.86                          4.2                           0.47
10                                 8.93                          8.82                          1.13
15                                 13.45                         12.6                          1.97

(c) 3D 2/2 twill FE (PAM-RTM) model shape analysis, lower side

y-fill position 90° centre (cm)    y-fill position upper (cm)    y-fill position lower (cm)    Δy lag (cm)
4.39                               5                             5                             -0.61
9.38                               10                            10                            -0.62
14.1                               15                            15                            -0.9

(d) 2/2 twill Perspex exp. shape analysis, lower side

y-fill position 90° centre (cm)    y-fill position upper (cm)    y-fill position lower (cm)    Δy lag (cm)
4.79                               4.97                          5.07                          -0.23
9.36                               9.87                          10.05                         -0.6
14.19                              14.9                          15.00                         -0.77
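The idea of picking the simulation that most closely matches an experiment can be illustrated with the upper-side Δy profiles of Tables 11 and 12. The paper does not state how the closest match was determined, so the root-mean-square difference used below is an assumed criterion. Only the two gap heights tabulated here (0.85 and 0.35 mm) are included, out of the six that were modelled.

```python
import math

# Upper-side delta-y profiles (cm) at centre positions 5, 10 and 15 cm.
FE_DELTA_Y = {
    0.85: [3.57, 6.67, 8.64],  # Table 11(a), gap height in mm as key
    0.35: [1.06, 1.30, 1.30],  # Table 12(a)
}
EXPERIMENTS = {
    "Perspex, Table 11(b)": [3.38, 7.21, 10.11],
    "Perspex, Table 12(b)": [0.47, 1.13, 1.97],
}


def rms_difference(profile_a, profile_b):
    """Root-mean-square difference between two equally sampled profiles."""
    squared = [(a - b) ** 2 for a, b in zip(profile_a, profile_b)]
    return math.sqrt(sum(squared) / len(squared))


def closest_gap(experiment_profile, fe_profiles):
    """Gap height (mm) whose FE profile has the smallest RMS difference (assumed metric)."""
    return min(fe_profiles,
               key=lambda gap: rms_difference(experiment_profile, fe_profiles[gap]))


for label, profile in EXPERIMENTS.items():
    gap = closest_gap(profile, FE_DELTA_Y)
    error = rms_difference(profile, FE_DELTA_Y[gap])
    print(f"{label}: closest FE gap {gap} mm (RMS difference {error:.2f} cm)")
```

With this metric the first experiment pairs with the 0.85 mm model and the second with the 0.35 mm model, which matches the pairing used in Figs. 7 and 8.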

The shape analysis of Fig. 8 for the 3D 2/2 twill race-tracking 0.35 mm gap FE model is presented in Table 12. The 3D modelling race-track results in Fig. 8 and Table 12(a), (b), (c) and (d) are more realistic due to the more realistic Vf along the 90° curved plate bend.

(6) Results discussion: 3D 2/2 twill race-tracking for 0.35 mm gap FE model

The 3D simulation results from PAM-RTM® with the 2/2 twill properties and a 0.35 mm gap showed flow behaviour similar to that of the Perspex tool experiments (with the same dimensions). In this situation the through-thickness permeability K3 is less dominant, so there is only a small race-track on top of the tool along the bend angle and a small lag of the flow front on the opposite side of the tool.


Similar flow behaviour was obtained with the 3D FE CAD model for the 0.35 mm gap with the 2/2 twill material properties, as shown in Fig. 8 and Table 12, and the modelling results for the fluid flow are in agreement with the experiment.

(7) Results discussion: 3D 2/2 twill race-tracking

Experimental observations with the Perspex transparent tool suggested that the race-track gap on the upper part of the bend angle may vary between 0.35 and 0.85 mm. The variation of the through-thickness permeability K3 with volume fraction Vf produced different shapes of race-tracking, which were modelled with the PAM-RTM® software. Observation of the experimental results for the 2/2 twill preform with the Perspex tool on a 90° curve angle bend suggested that during infusion there may be a centre gap of 0.85 mm between the reinforcement and the tool, which results in race-tracking at the upper radius of the bend. In this case the through-thickness permeability K3 is reasonably high and may therefore allow an effective through-thickness fluid exchange between both sides of the bend. The flow front shape at the top and on the opposite side was dominated by the race-track zone of the upper part. A similar behaviour was obtained with the 3D FE CAD model for the 2/2 twill material properties and a gap size of 0.85 mm, as shown in Fig. 7. For the observed 0.35 mm centre gap (transient behaviour), the through-thickness permeability K3 is less dominant, so there is only a small race-track on top of the tool along the bend angle and a small lag of the flow front on the opposite side of the tool, as presented in Fig. 8.

4 Conclusion

A novel numerical approach for 3D FE CAD modelling was developed in order to predict race-tracking and variability for advanced composites structures. A stochastic analysis technique was developed to account for the effect of node variability during the fabrication process by RTM. The study based on this technique provided important insights into flow filling variations, voidage formation and optimization on a generic advanced composite structure. The modelling technique developed in this work can be used to account for the effects of race-tracking and variability on any other composite component at the macro-scale level. The predicted race-track and variability data can complement experimental data in order to enhance flow simulations at the component scale. Future work will add the race-track behaviour to Advanced Composites Structures CAE/CAD geometries that include 90° composite fibre curve plates, in order to predict the race-tracking and flow behaviour in more advanced composite geometries.

Modeling Race-Tracking Variability of Resin Rich Zones

Appendix A: Programming Codes

A.1 Images Acquisition with 2 Web Cams (MATLAB)

%11.10.2017 Spiridon Koutsonas
%test video input of two installed webcams
%based on Matlab demo "Logging Data To Disk"
function video_input()
close all
clc
%****************************************************************
%Input:
s=90; %input in seconds
video1 = 'video_1.A'; %name of video 1
video2 = 'video_2.A'; %name of video 2
fgi = 1; % FrameGrabInterval - capture every fgi frame only
%****************************************************************
t=s*15; %total number of frames to capture -> camera has nominal 15fps
video1=sprintf([video1, '.avi']);
video2=sprintf([video2, '.avi']);
imaqhwinfo;
info=imaqhwinfo('winvideo');
disp('')
disp('*************** Supported Formats: ***************')
info.DeviceInfo.SupportedFormats
info.DeviceIDs
%Construct a video input object.
vid1 = videoinput('winvideo', 2,'YUY2_320x240'); %'YUY2_1024x768' 'YUY2_800x600' 'YUY2_320x240'
vid2 = videoinput('winvideo', 3, 'YUY2_320x240'); %'YUY2_320x240'
%
%Select the source to use for acquisition.
set(vid1, 'FramesPerTrigger', t);
set(vid1, 'ReturnedColorSpace', 'gray');
set(vid1, 'FrameGrabInterval', fgi);
disp('')
disp('*************** Video Input 1 settings: ***************')
get(vid1)
set(vid2, 'FramesPerTrigger', t);
set(vid2, 'ReturnedColorSpace', 'gray');
set(vid2, 'FrameGrabInterval', fgi);
disp('')
disp('*************** Video Input 2 settings: ***************')
get(vid2)
%
%
%View the properties for the selected video source object.
disp('*************** Video stream 1 properties: ***************')
src_vid1 = getselectedsource(vid1);
get(src_vid1)
disp('*************** Video stream 2 properties: ***************')
src_vid2 = getselectedsource(vid2);
get(src_vid2)
%
%
%Preview a stream of image frames.
preview(vid1);
preview(vid2);
% Configure the logging mode to disk.
set(vid1, 'LoggingMode', 'Disk'); %Disk&Memory
set(vid2, 'LoggingMode', 'Disk');
% Create an AVI file object.
disp('*************** AVI parameters: ***************')
logfile1 = avifile(video1)
logfile2 = avifile(video2);
% Select a codec for the AVI file.
logfile1.Compression = 'none';
logfile2.Compression = logfile1.Compression ;
%logfile.Fps=1;
logfile1.Quality=100;
logfile2.Quality=logfile1.Quality;
% Since grayscale images will be acquired, a colormap is required.
logfile1.Colormap = gray(256);
logfile2.Colormap = gray(256) ;
% Configure the video input object to use the AVI file object.
vid1.DiskLogger = logfile1;
vid2.DiskLogger = logfile2;
disp('')
disp('*************** Start acquisition... ***************')
%read: http://www.mathworks.com/matlabcentral/newsreader/view_thread/100745
start(vid1);
start(vid2);
tic
% Wait for the acquisition to finish.
wait(vid1,t);
%data = getdata(vid1,2); %read out time stamp
stop(vid1);
stop(vid2);
t1=toc %total acquiring time
disp('*************** End acquisition. ***************')
%
% Determine the number of frames acquired.
f1=vid1.FramesAcquired
f2=vid2.FramesAcquired
%
% Ensure that all acquired frames were written to disk.
% vid1.DiskLoggerFrameCount
% Once all frames have been written, close the file.
aviobj1 = vid1.Disklogger;
file = close(aviobj1);
aviobj2 = vid2.Disklogger;
file = close(aviobj2);
% Once the video input object is no longer needed, delete
% it and clear it from the workspace.
delete(vid1);
delete(vid2);
clear vid1 vid2


%***********************************************************************
%Playback movie (just by grabbing frames - not possible to play two movies
%in the same window???
pvid1=MMREADER(video1); %load video file to play back
pvid2=MMREADER(video2);
%determine the number of frames
if pvid1.NumberOfFrames

{
    for (int j = 0; j < n; j++)
        hist[image[i][j]].incrementAndGet();
});

The performance of this parallel version of the program depends on the value of max: the number of intensity levels (and the size of the hist array). If max is small compared to the total number of cores, then collisions between cores incrementing the same element of hist are frequent, whereas a large value of max spreads out the increments to hist and reduces the contention. Lester [12] presents a more complete mathematical analysis of this contention problem in a parallel Histogram program, and concludes that contention can cause overall execution time to increase by as much as a factor of two. We tested both the parallel and sequential versions of our Histogram program using a desktop Windows computer with the following 6-core processor: AMD Phenom II X6 1035T 2.6 GHz. For an image with max = 10, the total parallel execution time is 7872 ms.


B. P. Lester

Increasing to max = 200 reduces the execution time to 3989 ms. However, this is still considerably slower than the sequential version, whose execution time is 1069 ms. Increasing the value of max reduces the contention problem for access to the hist array, but still leaves the other two performance problems mentioned earlier: access time delays for the Atomic Variables, and caching issues for the hist array. To remove these other problems, we need a new parallel programming pattern that introduces local copies of the hist array for each parallel task. Then at the end of the computation, the local copies of hist are combined (reduced) into a single final output copy. The portion of the parallel computation with local copies of hist is the Map portion, and the Reduce portion combines the local copies. Thus, we have a Map-Reduce parallel programming pattern. The implementation using Java parallel streams is as follows:

AtomicLong hist[] = new AtomicLong[max];   // shared
for (int i = 0; i < max; i++)
    hist[i] = new AtomicLong(0);            // initialize
IntStream.range(0,n).parallel()
    .forEach(i -> {
        long ahist[] = new long[max];       // local
        for (int k = 0; k < max; k++)
            ahist[k] = 0;                   // initialize
        for (int j = 0; j < n; j++)
            ahist[image[i][j]]++;
        // reduce local ahist into shared hist
        for (int k = 0; k < max; k++)
            hist[k].addAndGet(ahist[k]);
    });
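The shared-array and local-copy variants discussed above can be placed side by side in one small program. The following is a minimal sketch rather than the benchmark code used for the measurements: the class name `HistogramSketch`, the helper `testImage`, and the synthetic pixel formula are assumptions made for illustration, and the timings it prints will depend on the machine.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicLong;
import java.util.stream.IntStream;

public class HistogramSketch {

    // Shared-array version: every pixel increments a shared AtomicLong.
    static long[] sharedHist(int[][] image, int max) {
        AtomicLong[] hist = new AtomicLong[max];
        for (int k = 0; k < max; k++) hist[k] = new AtomicLong(0);
        IntStream.range(0, image.length).parallel().forEach(i -> {
            for (int j = 0; j < image[i].length; j++)
                hist[image[i][j]].incrementAndGet();
        });
        return unwrap(hist);
    }

    // Local-copy version: each row task fills a private long[] (Map phase),
    // then merges it into the shared array once (Reduce phase).
    static long[] localHist(int[][] image, int max) {
        AtomicLong[] hist = new AtomicLong[max];
        for (int k = 0; k < max; k++) hist[k] = new AtomicLong(0);
        IntStream.range(0, image.length).parallel().forEach(i -> {
            long[] ahist = new long[max];          // Java zero-initializes arrays
            for (int j = 0; j < image[i].length; j++)
                ahist[image[i][j]]++;
            for (int k = 0; k < max; k++)
                hist[k].addAndGet(ahist[k]);
        });
        return unwrap(hist);
    }

    // Copies the atomic counters into a plain array for comparison.
    static long[] unwrap(AtomicLong[] hist) {
        long[] out = new long[hist.length];
        for (int k = 0; k < hist.length; k++) out[k] = hist[k].get();
        return out;
    }

    // Hypothetical synthetic n x n image with intensities in [0, max).
    static int[][] testImage(int n, int max) {
        int[][] image = new int[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                image[i][j] = (i * 31 + j * 17) % max;
        return image;
    }

    public static void main(String[] args) {
        int n = 2000, max = 200;
        int[][] image = testImage(n, max);
        long t0 = System.nanoTime();
        long[] a = sharedHist(image, max);
        long t1 = System.nanoTime();
        long[] b = localHist(image, max);
        long t2 = System.nanoTime();
        System.out.println("same result: " + Arrays.equals(a, b));
        System.out.printf("shared: %d ms, local: %d ms%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
    }
}
```

Both methods return the same counts; only the number of accesses to the shared `AtomicLong` array differs, which is the effect the text attributes the speedup to.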

Using the 6-core processor mentioned earlier, the execution time for this new parallel version is 291 ms. This provides a speedup of 3.7 over the original sequential Histogram program, and a surprising speedup of 13.7 over the previous parallel version. The small change of providing a local copy of the hist array for each parallel task has changed this parallel stream implementation from very poor performance to quite good performance.

2.3 Performance Analysis

To further investigate the performance issue, we ran benchmark tests on three different classes of computers with multi-core processors: laptop, desktop, and server. Here is the processor information for each computer:

Laptop: 2-core (hyperthreading) Intel i7 2020M 2.5 GHz
Desktop: 6-core AMD Phenom II X6 1035T 2.6 GHz
Server: 10-core Intel Xeon E5-2650L 1.7 GHz

Table 1 shows the execution time for the sequential version and the three parallel versions of the Histogram program for the three different classes of computer. All execution times are in milliseconds.

Performance of Map-Reduce Using Java-8 Parallel Streams


Table 1. Execution time of Histogram program (ms)

Program                    Server   Desktop   Laptop
Shared Hist (Max = 10)       5140      7872     4775
Shared Hist (Max = 200)      2360      3989     3540
Local Hist (Max = 200)        210       291      321
Sequential                    650      1069      378
Parallel Speedup Factor       3.1       3.7      1.2

The Parallel Speedup Factor is computed as the Sequential time divided by the best parallel version time (Local Hist). On all three classes of computers, the Map-Reduce parallel performance optimizations described above produce dramatic reductions in the execution time, transforming a parallel program with poor performance into one with good speedup compared to the sequential version.

The most dramatic performance improvement results when each parallel task is provided with its own local copy of the Hist array. There are two performance factors causing this improvement. The first is the change in the type definition of the Hist array. The shared Hist array is an array of AtomicLong, whereas the local Hist array is an array of long. Less time is required for a simple increment of the primitive type long than for an incrementAndGet() operation on an AtomicLong. However, the bulk of the performance improvement results from a second important factor: the caching behavior of the Hist array.

It is well known that caching behavior can create enormous changes in program execution times for parallel programs, as well as sequential programs. Chen and Johnson [13] show that interchanging the inner two loops of a standard matrix multiplication program gives an 8x speedup to both the sequential and parallel versions. Their analysis shows this dramatic performance improvement results completely from the caching behavior of the program. Akhter and Roberts [14, p. 195] show that a cache-friendly version of the Sieve of Eratosthenes (Prime Numbers) is five times faster than the cache-unfriendly version.

During the execution of our Shared Hist parallel version of the Histogram program, each parallel task does 20,000 increments on the elements of the Shared Hist array. These parallel increments by different processor cores prevent the Hist array from migrating to the private cache of each core. Any elements of the Hist array that are copied into the private cache of a core are quickly invalidated when another core increments those elements. Thus, the vast majority of the memory accesses to the Shared Hist array require main memory access, which takes ten times longer than cache memory access.

In the Local Hist parallel version, each parallel task increments its own Local Hist array 20,000 times. Thus, the elements of the Local Hist array quickly migrate to the lowest level of the private cache of the core on which the parallel task is running. Almost all of the increments to the Local Hist array are then done in the cache, providing a dramatic performance improvement. It is still necessary to access and update the Shared Hist array during the final Reduction Phase of the program. However, each parallel task makes only 200 accesses to the Shared Hist array, compared to the 20,000 accesses to the Local Hist array.
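As a rough check on this argument, the per-task access counts quoted in the text can be compared directly. This is only back-of-the-envelope arithmetic using the figures given above (20,000 local increments and 200 reduction accesses per task, with max = 200); it makes no claim about actual cache timings.

```latex
\frac{\text{shared accesses per task}}{\text{local accesses per task}}
  = \frac{200}{20{,}000} = 0.01
```

In the Local Hist version there is therefore roughly one access to the shared array for every hundred accesses to the local array, whereas in the Shared Hist version every one of the 20,000 increments per task goes to the shared array.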


3 Document Keyword Search

To further investigate the Map-Reduce performance optimizations for Java parallel streams, this section presents a parallel program for searching a document for keywords. To simplify the program and focus attention on the parallelism, we assume the document has already been reduced to plain text, and preprocessed to remove spaces, punctuation marks, and capital letters. The input to our Document Keyword Search program is two arrays of String: keys[] is an array of keywords, and doc[] is an array of the words from the document. The output of the program is an array of integers freq[]: for keyword keys[i], the value of freq[i] gives the total number of occurrences in the document. Following is a simple sequential program for the Document Keyword Search:

String doc[] = new String[n];    // document words
String keys[] = new String[m];   // keywords
int freq[] = new int[m];
for (int i = 0; i < m; i++)
    freq[i] = 0;                 // initialize
for (int i = 0; i < n; i++)
    for (int j = 0; j < m; j++)
        if (keys[j].equals(doc[i]))   // compare contents, not references
            freq[j]++;

The performance of this sequential program can be significantly improved by sorting the keys array first, and then using a Binary Search instead of a full linear search of the whole array. However, an even faster technique is to store the keywords in a HashMap:

HashMap<String, Integer> keyMap = new HashMap<String, Integer>(m);
for (int i = 0; i < m; i++)
    keyMap.put(keys[i], i);

For example, if the keyword “tree” is originally stored in keys[50], then the pair (“tree”, 50) will be put into the HashMap. The HashMap allows keywords to be looked up very quickly. The main loop of the sequential form of the Document Keyword Search is then as follows:

for (i = 0; i < n; i++)
    freq[keyMap.get(doc[i])]++;

Using n = 100,000,000 words in the document, and m = 1,000 keywords, the Sequential Execution time is 1784 ms on the 6-core Desktop computer described earlier.
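The HashMap-based sequential search can be written as a small self-contained program. This is a sketch under stated assumptions: the class name `KeywordSearchSketch` and the sample words are invented for illustration, and the null check for document words that are not keywords is an addition that the listing above does not show (`HashMap.get` returns null for a missing key).

```java
import java.util.HashMap;
import java.util.Map;

public class KeywordSearchSketch {

    // Counts how often each keyword occurs in doc; freq[i] belongs to keys[i].
    static int[] countKeywords(String[] doc, String[] keys) {
        Map<String, Integer> keyMap = new HashMap<>(keys.length);
        for (int i = 0; i < keys.length; i++)
            keyMap.put(keys[i], i);              // keyword -> index into freq

        int[] freq = new int[keys.length];       // zero-initialized by Java
        for (String word : doc) {
            Integer idx = keyMap.get(word);      // null when word is not a keyword
            if (idx != null)
                freq[idx]++;
        }
        return freq;
    }

    public static void main(String[] args) {
        String[] keys = {"tree", "graph"};
        String[] doc = {"a", "tree", "is", "a", "graph", "tree"};
        int[] freq = countKeywords(doc, keys);
        System.out.println(freq[0] + " " + freq[1]);   // prints "2 1"
    }
}
```

Each document word costs one hash lookup regardless of the number of keywords, which is why this form is faster than the nested linear search.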

3.1 Parallel Keyword Search

For our first parallel version, we use a parallel stream to create a parallel task for each document word. The freq array then becomes a shared array and must be converted to an array of AtomicInteger. Also, the HashMap is accessed in parallel and must be converted to a ConcurrentHashMap, the Java thread-safe version of HashMap. The Parallel Document Keyword Search program is as follows:

AtomicInteger freq[] = new AtomicInteger[m];
ConcurrentHashMap<String, Integer> keyMap =
    new ConcurrentHashMap<String, Integer>(1000);
for (int i = 0; i < m; i++) {   // initialize
    freq[i] = new AtomicInteger(0);
    keyMap.put(keys[i], i);
}
IntStream.range(0,n).parallel()
    .map(i -> keyMap.get(doc[i]))
    .forEach(j -> freq[j].incrementAndGet());

The execution time for this first parallel version on the 6-core desktop computer is 2747 ms, which is roughly a 50% increase over the execution time of the sequential version. There are two main causes for this increase: the additional time required to increment the Atomic Integers in the freq array, and the caching behavior of the shared freq array. In the sequential version of the program, the freq array migrates to the private cache of the core executing the code and therefore has a low access time. However, in the parallel version, the shared freq array cannot be effectively cached because all cores are continually updating all elements of the array. Any elements of the freq array that do reach the private cache of a core are quickly invalidated when other cores increment them.

To improve the performance of the parallel program, we would like to apply the same Map-Reduce optimization used for the Histogram program in Sect. 2: let each parallel task have its own local copy of the freq array, and then reduce them back together at the end into a shared array. However, we have a granularity problem in the case of the Document Keyword Search program. Each parallel task created by the parallel stream only processes a single document word. This is hardly enough granularity to make effective use of a whole separate local copy of the freq array. To utilize this Map-Reduce optimization here, we must first group the elements of the document and then give a local copy of the freq array to each group.

This grouping is also required in the Histogram program of Sect. 2. However, the two-dimensional image array provides a convenient way of grouping: each parallel task operates on a whole row of the image array. Since each row has 20,000 pixels, this grouping by row provides a large granularity for each parallel task, so it can efficiently use its own local copy of the shared hist array. In the case of the Document Keyword Search, the document is a one-dimensional array, and each parallel task only processes a single element of the array.


To increase the parallel task granularity in the Document Keyword Search, we use manual grouping of portions of the document array. A constant p determines the number of parallel tasks (groups). If the document array has n elements, then the group size G is n/p. In our first parallel version above, the Java Stream has n elements, one for each element of the document array. In the new parallel version, the Java Stream has only p elements. Each group will have its own local (private) copy of the freq array. At the end of each parallel task, the local freq array is reduced (merged) into the shared freq array. The Java code of this improved parallel version is as follows:

int n = 100000000;   // number of document words
int m = 1000;        // number of keywords
int p = 100;         // number of groups
int G = n/p;         // group size
AtomicInteger freq[] = new AtomicInteger[m];
for (int i = 0; i < m; i++)
    freq[i] = new AtomicInteger(0);   // initialize (keyMap is built as before)
IntStream.range(0,p).parallel()
    .forEach(w -> {
        int afreq[] = new int[m];     // local freq
        for (int c = 0; c < m; c++)
            afreq[c] = 0;             // initialize local
        for (int c = w*G; c < (w+1)*G; c++)
            afreq[keyMap.get(doc[c])]++;
        // Reduce local afreq to shared freq
        for (int c = 0; c < m; c++)
            freq[c].addAndGet(afreq[c]);
    });
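A runnable version of the grouped search might look as follows. This is a hedged sketch, not the benchmarked program: the class name `GroupedKeywordSearch` and the method signature are assumptions, the group boundaries are computed so that the last group also covers any remainder when n is not a multiple of p, and words that are not keywords are skipped with a null check.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

public class GroupedKeywordSearch {

    // Counts keyword occurrences using p parallel groups, each with a local array.
    static int[] countKeywords(String[] doc, String[] keys, int p) {
        int n = doc.length;
        int m = keys.length;
        ConcurrentHashMap<String, Integer> keyMap = new ConcurrentHashMap<>();
        AtomicInteger[] freq = new AtomicInteger[m];
        for (int i = 0; i < m; i++) {
            freq[i] = new AtomicInteger(0);
            keyMap.put(keys[i], i);
        }

        IntStream.range(0, p).parallel().forEach(w -> {
            // Group w covers doc[lo .. hi-1]; long arithmetic avoids int overflow.
            int lo = (int) ((long) w * n / p);
            int hi = (int) ((long) (w + 1) * n / p);
            int[] afreq = new int[m];                 // local copy (Map phase)
            for (int c = lo; c < hi; c++) {
                Integer idx = keyMap.get(doc[c]);
                if (idx != null)
                    afreq[idx]++;
            }
            for (int c = 0; c < m; c++)               // Reduce phase
                if (afreq[c] != 0)
                    freq[c].addAndGet(afreq[c]);
        });

        int[] out = new int[m];
        for (int i = 0; i < m; i++)
            out[i] = freq[i].get();
        return out;
    }

    public static void main(String[] args) {
        String[] keys = {"tree", "graph"};
        String[] doc = {"a", "tree", "is", "a", "graph", "tree", "too"};
        int[] freq = countKeywords(doc, keys, 3);
        System.out.println(freq[0] + " " + freq[1]);
    }
}
```

The choice of p trades off task granularity against load balance; the value 100 used in the text is one such choice, and the sketch leaves it as a parameter.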

The execution time of this new parallel version with the Map-Reduce optimizations is 556 ms on the 6-core desktop computer. This is a speedup of 3.2 over the Sequential version, and a speedup of 5 compared to the previous unoptimized parallel version. To further investigate the performance of this program, Table 2 shows the execution time for the sequential version and the two parallel versions of the Document Keyword Search program for the three different classes of computer (details given in Sect. 2.1). All execution times are in milliseconds. The Parallel Speedup Factor is computed as the Sequential time divided by the best parallel version time (Local Freq). On all three classes of computers, the Map-Reduce parallel performance optimizations described above produce dramatic reductions in the execution time, transforming a parallel program with poor performance into one with good speedup compared to the sequential version.

Table 2. Execution time of Keyword Search program (ms)

Program                    Server   Desktop   Laptop
Shared Freq Array             800      2747     1219
Local Freq Array              323       556      609
Sequential                   1117      1784      922
Parallel Speedup Factor       3.5       3.2      1.5

3.2 MapReduce Stream Operation

We believe the Map-Reduce parallel programming pattern with its associated optimizations should be added as a new terminal stream operation to the Java language. This pattern can be applied in a wide range of parallel programs to produce good performance. The parallel programming pattern which usually produces the best performance is the Relaxed Algorithm [12] (often called Embarrassingly Parallel [10]): the parallel subtasks are completely independent and do not interact at all, and shared data is read-only. Unfortunately, many parallel algorithms require some interaction between parallel tasks, in which many tasks modify the same shared data structure.

To optimize the performance of such algorithms, the Map-Reduce pattern has two phases: a relaxed phase (the Map operation) followed by the Reduce phase. During the Map Phase, each parallel task operates on its own local copy of the data structure(s). Then during the Reduce Phase, the local data of the parallel tasks is merged back into a shared data structure, which represents the output of the whole computation. As long as the Reduce Phase is relatively short compared to the Map Phase, this pattern will produce good performance.

In the above examples of the Histogram program and Document Keyword Search program, we have seen two major performance optimizations: Grouping and Localization. The Grouping is needed to create a large enough parallel task to justify having a Local Copy of the shared data structure. This creates the Map Phase of the program. Then during the Reduce Phase, these Local Copies are reduced (merged) back into the shared data structure. We propose the addition of a new terminal stream operation to Java called MapReduce, which automatically applies these optimizations:

MapReduce(fMap, D, fReduce)

where
    fMap is a function applied to each element of the Stream,
    D is the shared data structure,
    fReduce is a reduction function that merges local copies of D.

If the stream elements have type Stype and the data structure has type Dtype, then the signature of the two functions is as follows:

Dtype fMap(Stype)
Dtype fReduce(Dtype, Dtype)

This MapReduce() operation will perform the following computing steps:

(1) Divide the stream elements into separate groups, one group for each parallel task.
(2) Create a separate local copy of the shared data structure for each parallel task.
(3) Each parallel task will apply the function fMap() sequentially to each stream element in its assigned group, thereby modifying its local copy of the data structure.
(4) Reduce all the local copies of the data structure into a single shared copy using fReduce().

An operation similar to this MapReduce() is found in the standard OpenMP API for parallel programming [15]. It is the reduction clause that can be added to the OpenMP parallel directive. The general form is: reduction(operator: variable), where variable is a scalar shared variable of primitive type, and operator is a reduction operator. For example, reduction(+: sum) will perform the following steps:

(1) Create one local copy of the shared variable sum for each parallel thread.
(2) Initialize each local copy of sum using the value of the shared sum.
(3) When the parallel threads are finished, reduce all the local sum variables into the shared sum variable using the “+” (addition) operator.

This is similar to, but somewhat more restricted than, the MapReduce() operation we are proposing for Java. The OpenMP reduction clause can only deal with a scalar variable of primitive type and a simple reduction operator like addition or multiplication. Our MapReduce handles arbitrary shared data structures, such as a collection or an array, and allows a user-defined reduction function fReduce() that operates on this data structure.

Our proposed MapReduce stream operation is somewhat different from the Hadoop MapReduce operation [9], which was originally developed by Google [16] for distributed processing of big data on computer clusters. The Hadoop MapReduce has a very specific data model: key/value pairs, whereas our MapReduce allows any function fMap to be mapped onto stream elements of any type. We also allow more general shared data structures, such as arrays, and reduction functions fReduce that operate on these data structures.
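The four steps of the proposed operation can be approximated today with an ordinary helper method. The sketch below is purely illustrative: `mapReduce` is a hypothetical helper, not an existing or proposed Java API, and its parameters (`n`, `groups`, `newLocal`) are assumptions. It also adapts the signatures given in the text, in that `fMap` here updates the local copy in place for stream index `i` instead of returning a `Dtype`.

```java
import java.util.Arrays;
import java.util.function.BinaryOperator;
import java.util.function.ObjIntConsumer;
import java.util.function.Supplier;
import java.util.stream.IntStream;

public class MapReduceSketch {

    // Hypothetical helper illustrating the four steps; not part of the Java API.
    static <D> D mapReduce(int n, int groups,
                           Supplier<D> newLocal,       // creates an empty local copy
                           ObjIntConsumer<D> fMap,     // updates a local copy for index i
                           BinaryOperator<D> fReduce)  // merges two local copies
    {
        return IntStream.range(0, groups).parallel()   // (1) one element per group
            .mapToObj(g -> {
                D local = newLocal.get();              // (2) local copy per task
                int lo = (int) ((long) g * n / groups);
                int hi = (int) ((long) (g + 1) * n / groups);
                for (int i = lo; i < hi; i++)
                    fMap.accept(local, i);             // (3) sequential map within group
                return local;
            })
            .reduce(fReduce)                           // (4) merge the local copies
            .orElseGet(newLocal);
    }

    // Example use: a histogram over a one-dimensional pixel array.
    static long[] histogram(int[] pixels, int max) {
        return MapReduceSketch.<long[]>mapReduce(pixels.length, 8,
            () -> new long[max],
            (h, i) -> h[pixels[i]]++,
            (a, b) -> {
                long[] r = a.clone();                  // merge without mutating inputs
                for (int k = 0; k < r.length; k++)
                    r[k] += b[k];
                return r;
            });
    }

    public static void main(String[] args) {
        int[] pixels = {0, 1, 1, 2, 2, 2};
        System.out.println(Arrays.toString(histogram(pixels, 3)));  // prints [1, 2, 3]
    }
}
```

The merge function must be associative for `Stream.reduce` to give a well-defined result, which element-wise addition of arrays is. The standard library's mutable reduction, `collect(supplier, accumulator, combiner)`, follows a similar supplier/accumulate/combine shape, although it leaves the grouping of elements to the stream implementation.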

4 Jacobi Relaxation

To further illustrate our Map-Reduce parallel programming pattern and its associated performance optimizations, we will consider in detail one final example: solving a partial differential equation using Jacobi Relaxation [12], [17]. Consider a simple application problem of determining the voltage levels across a two-dimensional, rectangular, conducting metal sheet where the voltage level is held constant along the four outer boundaries. Then the voltage function v(x, y) on the surface of the metal sheet must satisfy Laplace’s Equation in two dimensions:

\[ \frac{\partial^2 v}{\partial x^2} + \frac{\partial^2 v}{\partial y^2} = 0 \]


The solution to this partial differential equation can be numerically computed by using a two-dimensional array of points on the surface of the metal sheet and applying the Jacobi Relaxation Algorithm: start with an arbitrary guess for the initial voltage at each point, then iteratively recompute the value at each point as the average of the four neighboring points (above, below, left, right): v(i,j) = (v(i-1,j)+v(i+1,j)+v(i,j-1)+v(i,j+1))/4
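The averaging rule above can be turned into a short sequential program. The following is a minimal sketch under stated assumptions, not the program used in this paper: the class name `JacobiSketch`, the helper `grid`, the iteration cap `maxIter`, and the use of the largest per-point change as the convergence test are all choices made for illustration. New values are written to a second array so that every point is computed from the previous iteration's values.

```java
public class JacobiSketch {

    // Relaxes the interior of a square grid whose outer ring holds fixed
    // boundary values; returns the number of iterations performed.
    static int relax(double[][] a, double tolerance, int maxIter) {
        int n = a.length - 2;                     // interior points per side
        double[][] b = new double[n + 2][n + 2];  // holds the new values
        int iter = 0;
        double maxChange;
        do {
            // Compute the new value of each interior point from its four neighbors.
            for (int i = 1; i <= n; i++)
                for (int j = 1; j <= n; j++)
                    b[i][j] = (a[i - 1][j] + a[i + 1][j]
                             + a[i][j - 1] + a[i][j + 1]) / 4.0;
            // Copy the new values back and record the largest change.
            maxChange = 0.0;
            for (int i = 1; i <= n; i++)
                for (int j = 1; j <= n; j++) {
                    maxChange = Math.max(maxChange, Math.abs(b[i][j] - a[i][j]));
                    a[i][j] = b[i][j];
                }
            iter++;
        } while (maxChange > tolerance && iter < maxIter);
        return iter;
    }

    // Hypothetical helper: (n+2) x (n+2) grid, boundary fixed, interior guess 0.
    static double[][] grid(int n, double boundary) {
        double[][] a = new double[n + 2][n + 2];
        for (int k = 0; k < n + 2; k++) {
            a[0][k] = boundary;
            a[n + 1][k] = boundary;
            a[k][0] = boundary;
            a[k][n + 1] = boundary;
        }
        return a;
    }

    public static void main(String[] args) {
        double[][] a = grid(20, 100.0);
        int iterations = relax(a, 1e-6, 100_000);
        System.out.println("iterations: " + iterations
                + ", centre value: " + a[10][10]);
    }
}
```

With the same voltage on all four boundaries, the interior should approach that boundary value, which gives a simple sanity check on the implementation.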

The algorithm converges when the changes to the points become very small. To facilitate the computation, two arrays (A and B) will be used. As the new value for each point in array A is computed, it is stored in array B. Then the new values are copied from B back to A to prepare for the next iteration. As each new value is copied, it is compared to the corresponding old value for the convergence test. The sequential Java program is as follows:

int n = 10000;
double tolerance = .1;
do {
    // Phase I: Compute new value of each point
    for (i = 1; i
