Advances in Intelligent Systems and Computing 857
Kohei Arai · Supriya Kapoor · Rahul Bhatia, Editors
Intelligent Computing: Proceedings of the 2018 Computing Conference, Volume 2
Advances in Intelligent Systems and Computing Volume 857
Series editor: Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland (e-mail: [email protected])
The series “Advances in Intelligent Systems and Computing” contains publications on theory, applications, and design methods of Intelligent Systems and Intelligent Computing. Virtually all disciplines such as engineering, natural sciences, computer and information science, ICT, economics, business, e-commerce, environment, healthcare, life science are covered. The list of topics spans all the areas of modern intelligent systems and computing such as: computational intelligence, soft computing including neural networks, fuzzy systems, evolutionary computing and the fusion of these paradigms, social intelligence, ambient intelligence, computational neuroscience, artificial life, virtual worlds and society, cognitive science and systems, Perception and Vision, DNA and immune based systems, self-organizing and adaptive systems, e-Learning and teaching, human-centered and human-centric computing, recommender systems, intelligent control, robotics and mechatronics including human-machine teaming, knowledge-based paradigms, learning paradigms, machine ethics, intelligent data analysis, knowledge management, intelligent agents, intelligent decision making and support, intelligent network security, trust management, interactive entertainment, Web intelligence and multimedia. The publications within “Advances in Intelligent Systems and Computing” are primarily proceedings of important conferences, symposia and congresses. They cover significant recent developments in the field, both of a foundational and applicable character. An important characteristic feature of the series is the short publication time and world-wide distribution. This permits a rapid and broad dissemination of research results.
Advisory Board
Chairman: Nikhil R. Pal, Indian Statistical Institute, Kolkata, India (e-mail: [email protected])
Members:
Rafael Bello Perez, Universidad Central “Marta Abreu” de Las Villas, Santa Clara, Cuba (e-mail: [email protected])
Emilio S. Corchado, University of Salamanca, Salamanca, Spain (e-mail: [email protected])
Hani Hagras, University of Essex, Colchester, UK (e-mail: [email protected])
László T. Kóczy, Széchenyi István University, Győr, Hungary (e-mail: [email protected])
Vladik Kreinovich, University of Texas at El Paso, El Paso, USA (e-mail: [email protected])
Chin-Teng Lin, National Chiao Tung University, Hsinchu, Taiwan (e-mail: [email protected])
Jie Lu, University of Technology, Sydney, Australia (e-mail: [email protected])
Patricia Melin, Tijuana Institute of Technology, Tijuana, Mexico (e-mail: [email protected])
Nadia Nedjah, State University of Rio de Janeiro, Rio de Janeiro, Brazil (e-mail: [email protected])
Ngoc Thanh Nguyen, Wroclaw University of Technology, Wroclaw, Poland (e-mail: [email protected])
Jun Wang, The Chinese University of Hong Kong, Shatin, Hong Kong (e-mail: [email protected])
More information about this series at http://www.springer.com/series/11156
Kohei Arai · Supriya Kapoor · Rahul Bhatia, Editors
Intelligent Computing: Proceedings of the 2018 Computing Conference, Volume 2
Editors Kohei Arai Faculty of Science and Engineering, Department of Information Science Saga University Honjo, Saga, Japan
Rahul Bhatia The Science and Information (SAI) Organization Bradford, West Yorkshire, UK
Supriya Kapoor The Science and Information (SAI) Organization Bradford, West Yorkshire, UK
ISSN 2194-5357  ISSN 2194-5365 (electronic)
Advances in Intelligent Systems and Computing
ISBN 978-3-030-01176-5  ISBN 978-3-030-01177-2 (eBook)
https://doi.org/10.1007/978-3-030-01177-2
Library of Congress Control Number: 2018956173
© Springer Nature Switzerland AG 2019
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Contents
Statistical Learning of Lattice Option Pricing and Traders’ Behavior Using Ising Spin Model for Asymmetric Information Transitions . . . . 1
Prabir Sen and Nang Laik Ma
Deep Time Series Neural Networks and Fluorescence Data Stream Noise Detection . . . . 18
James Obert and Matthew Ferguson
Controlled Under-Sampling with Majority Voting Ensemble Learning for Class Imbalance Problem . . . . 33
Riyaz Sikora and Sahil Raina
Optimisation of Hadoop MapReduce Configuration Parameter Settings Using Genetic Algorithms . . . . 40
Ali Khaleel and H. S. Al-Raweshidy
Fast Interpolation and Fourier Transform in High-Dimensional Spaces . . . . 53
Michael Hecht and Ivo F. Sbalzarini
A Reconfigurable Architecture for Implementing Locally Connected Neural Arrays . . . . 76
J. E. G-H-Cater, C. T. Clarke, B. W. Metcalfe, and P. R. Wilson
Minimum Spanning Tree Problem with Single-Valued Trapezoidal Neutrosophic Numbers . . . . 93
Said Broumi, Mohamed Talea, Assia Bakali, Florentin Smarandache, and Santanu Kumar Patro
Hybrid Evolutionary Algorithm Based on PSOGA for ANFIS Designing in Prediction of No-Deposition Bed Load Sediment Transport in Sewer Pipe . . . . 106
Bahram Gharabaghi, Hossein Bonakdari, and Isa Ebtehaj
Extreme Learning Machines in Predicting the Velocity Distribution in Compound Narrow Channels . . . . 119
Hossein Bonakdari, Bahram Gharabaghi, and Isa Ebtehaj
Rice Classification Using Scale Conjugate Gradient (SCG) Backpropagation Model and Inception V3 Model . . . . 129
Zahida Parveen, Yumnah Hasan, Anzar Alam, Hafsa Abbas, and Muhammad Umair Arif
A Machine Learning Approach to Analyze and Reduce Features to a Significant Number for Employee’s Turn Over Prediction Model . . . . 142
Mirza Mohtashim Alam, Karishma Mohiuddin, Md. Kabirul Islam, Mehedi Hassan, Md. Arshad-Ul Hoque, and Shaikh Muhammad Allayear
Legal Document Retrieval Using Document Vector Embeddings and Deep Learning . . . . 160
Keet Sugathadasa, Buddhi Ayesha, Nisansa de Silva, Amal Shehan Perera, Vindula Jayawardana, Dimuthu Lakmal, and Madhavi Perera
Deep Learning Based Classification System for Identifying Weeds Using High-Resolution UAV Imagery . . . . 176
M. Dian Bah, Eric Dericquebourg, Adel Hafiane, and Raphael Canals
Recognition of Heart Murmur Based on Machine Learning and Visual Based Analysis of Phonocardiography . . . . 188
Magd Ahmed Kotb, Hesham Nabih Elmahdy, Fatma El Zahraa Mostafa, Christine William Shaker, Mohamed Ahmed Refaey, and Khaled Waleed Younis Rjoob
Activity Recognition from Multi-modal Sensor Data Using a Deep Convolutional Neural Network . . . . 203
Aboozar Taherkhani, Georgina Cosma, Ali A. Alani, and T. M. McGinnity
Quality Scale for Rubric Based Evaluation in Capstone Project of Computer Science . . . . 219
Zeeshan Haider Malik, Sabur Butt, and Hanzla Sajid
Transfer of Empirical Engineering Knowledge Under Technological Paradigm Shift . . . . 234
Xinyu Li, Zuhua Jiang, Yeqin Guan, and Geng Li
Role of Digital Fluency and Spatial Ability in Student Experience of Online Learning Environments . . . . 251
Tatiana Tchoubar, Thomas R. Sexton, and Lori L. Scarlatos
Identifying the Underlying Factors of Students’ Readiness for E-Learning in Studying English as a Foreign Language in Saudi Arabia: Students’ and Teachers’ Perspectives . . . . 265
Ibrahim M. Mutambik, John Lee, and Yvonne Foley
Using Cyber Competitions to Build a Cyber Security Talent Pipeline and Skilled Workforce . . . . 280
R. Cherinka
Practical Security for Electronic Examinations on Students’ Devices . . . . 290
Bastian Küppers, Marius Politze, Richard Zameitat, Florian Kerber, and Ulrik Schroeder
Automating the Configuration Management and Assessment of Practical Outcomes in Computer Networking Laboratories . . . . 307
Neville Palmer, Warren Earle, and Jomo Batola
Designing the University’s Creative Environment: Structural-Functional Analysis . . . . 319
Alexander O. Karpov
Birds Control in Farmland Using Swarm of UAVs: A Behavioural Model Approach . . . . 333
Chika O. Yinka-Banjo, Wahab A. Owolabi, and Andrew O. Akala
A Review of Path Smoothness Approaches for Non-holonomic Mobile Robots . . . . 346
Iram Noreen, Amna Khan, and Zulfiqar Habib
Design and Implementation of Bluetooth Controlled Painting Robot for Auto Industry . . . . 359
Bilal Ahmad, Ayesha Iqbal, Roshaan Saqib, Mohammad Mustafa Mirza, and Atta ul Mohsin Lali
A Survey on Trust in Autonomous Systems . . . . 368
Shervin Shahrdar, Luiza Menezes, and Mehrdad Nojoumian
An Energy Efficient Coverage Path Planning Approach for Mobile Robots . . . . 387
Amna Khan, Iram Noreen, and Zulfiqar Habib
SOAP-Based Web Service for Localization of Multi-robot System in Cloud . . . . 398
Tlijani Hayet and Jilani Knani
Metrics for Real-Time Solutions Design . . . . 411
Khaldia Laredj, Mostefa Belarbi, and Abou Elhassan Benyamina
GGSE-Website Usability Evaluation Framework . . . . 426
Aiman Khan Nazir, Iqra Zafar, Asma Shaheen, Bilal Maqbool, and Usman Qamar
Measuring Application Stability Using Software Stability Assessment Technique . . . . 437
Bassey Asuquo Ekanem and Evans Woherem
Automated Scenario-Based Evaluation of Embedded Software and System Architectures . . . . 452
Thomas Kuhn, Pablo Oliveira Antonino, and Andreas Morgenstern
Development Approaches for Mobile Applications: Comparative Analysis of Features . . . . 470
Lisandro Delia, Pablo Thomas, Leonardo Corbalan, Juan Fernandez Sosa, Alfonso Cuitiño, Germán Cáseres, and Patricia Pesado
Using Linked Data Resources to Generate Web Pages Based on a BBC Case Study . . . . 485
Leila Zemmouchi-Ghomari, Rania Sefsaf, and Kahina Azni
Trade-off Analysis Among Elicitation Techniques Using Simple Additive Weighting Method . . . . 498
Asif Ullah, Shah Nazir, and Sara Shahzad
Risk Management in Software Engineering: What Still Needs to Be Done . . . . 515
Tauqeer Hussain
A Survey of Quality of Service (QoS) Protocols and Software-Defined Networks (SDN) . . . . 527
Ronak Al-Haddad and Erika Sanchez Velazquez
Ambiguity Function Analysis of Frequency Diverse Array Radar Receivers . . . . 546
Jianbin Gao, Kwame Opuni-Boachie Obour Agyekum, Emmanuel Boateng Sifah, Qi Xia, and Edward Agyemang-Duah
WebNSM: A Novel Scalable WebRTC Signalling Mechanism for One-to-Many Bi-directional Video Conferencing . . . . 558
Naktal Moaid Edan, Ali Al-Sherbaz, and Scott Turner
Developing an Asynchronous Technique to Evaluate the Performance of SDN HP Aruba Switch and OVS . . . . 569
Ameer Mosa Al-Sadi, Ali Al-Sherbaz, James Xue, and Scott Turner
Energy Aware Cluster-Head Selection for Improving Network Life Time in Wireless Sensor Network . . . . 581
Faheem Khan, Toor Gul, Shujaat Ali, Abdur Rashid, Dilawar Shah, and Samiullah Khan
Small Cells Solution for Enhanced Traffic Handling in LTE-A Networks . . . . 594
Raid Sakat, Raed Saadoon, and Maysam Abbod
Differential Cooperative E-health System over Asymmetric Rayleigh-Weibull Fading Channels . . . . 607
Sara AlMaeeni
Calculating Minimum Rotation Angle to Find Marked Geographical Locations Using Built-in Mobile Sensors . . . . 617
Naween Fonseka and Cassim Farook
Performance Evaluation of Sending Location Update Packet to a Locator Identity Split Mapping Infrastructure . . . . 626
Avinash Mungur, Atish Foolchand, and Avishan Gopaul
Power Allocation Scheme Using PSO for Amplify and Forward Cooperative Relaying Network . . . . 636
Kamarul Ariffin Bin Noordin, Mhd Nour Hindia, Faizan Qamar, and Kaharudin Dimyati
Performance Evaluation of Resources Management in WebRTC for a Scalable Communication . . . . 648
Naktal Moaid Edan, Ali Al-Sherbaz, and Scott Turner
Barriers to Adopting Interoperability Standards for Cyber Threat Intelligence Sharing: An Exploratory Study . . . . 666
Nicole Gong
qCB-MAC: QoS Aware Cluster-Based MAC Protocol for VANETs . . . . 685
A. F. M. Shahen Shah, Haci Ilhan, and Ufuk Tureli
Error Reconciliation with Turbo Codes for Secret Key Generation in Vehicular Ad Hoc Networks . . . . 696
Dhouha Kbaier Ben Ismail, Petros Karadimas, Gregory Epiphaniou, and Haider M. Al-Khateeb
Enumeration and Applications of Spanning Trees in Book Networks with Common Path . . . . 705
Raihana Mokhlissi, Dounia Lotfi, Joyati Debnath, and Mohamed El Marraki
Digital Chunk Processing with Orthogonal GFDM Doubles Wireless Channel Capacity . . . . 719
Mohammad R. Kadhum, Triantafyllos Kanakis, Ali Al-Sherbaz, and Robin Crockett
Model of Maturity of Communication Processes in Human Capital Management in an Organisation . . . . 732
Andrzej Sołtysik
PD Controller for Resilient Packet Ring Networks . . . . 751
Fahd Alharbi
Visual Meaningful Encryption Scheme Using Intertwinning Logistic Map . . . . 764
Saadullah Farooq Abbasi, Jawad Ahmad, Jan Sher Khan, Muazzam A. Khan, and Shehzad Amin Sheikh
Behavior of Organizational Agents on Managing Information Technology . . . . 774
Mark van der Pas and Rita Walczuch
Expanded Algorithm for Inertial Navigation . . . . 789
P. M. Aksonenko, V. V. Avrutov, Yu. F. Lazarev, P. Henaff, and L. Ciarletta
In-Body Antenna for Wireless Capsule Endoscopy at MICS Band . . . . 801
Md. Rasedul Islam, Raja Rashidul Hasan, Md. Anamul Haque, Shamim Ahmad, Khondaker Abdul Mazed, and Md. Rafikul Islam
Classification and Analyses of Tendencies of Prior Published Research Studies on Unmanned Aerial Vehicles . . . . 811
Il-Kyu Ha
Interference and Channel Quality Based Channel Assignment for Cognitive Radio Networks . . . . 823
Manish Wadhwa and Komalpreet Kaur
Super Adaptive Routing Protocol for Mobile Ad Hoc Networks . . . . 834
Firas Sabah Al-Turaihi and Hamed S. Al-Raweshidy
Multi-hop Hierarchical Routing Based on the Node Health Status in Wireless Sensor Network . . . . 849
A. Anhar and R. Nilavalan
TCP-MAC Cross Layer Integration for Xor Network Coding . . . . 860
Khaled Alferaidi and Robert Piechocki
Rotating Signal Point Shape Code . . . . 876
Janak Sodha
Measuring the Effectiveness of TCP Technique for Event Sequence Test Cases . . . . 881
Johanna Ahmad, Salmi Baharom, Abdul Azim Abd Ghani, Hazura Zulzalil, and Jamilah Din
Ternary Computing to Strengthen Cybersecurity . . . . 898
Bertrand Cambou and Donald Telesca
A Novel Spatial Domain Semi-fragile Steganography Algorithm and Authentication Method . . . . 920
Nikolaos G. Bakaoukas and Anastasios G. Bakaoukas
Towards Personalised, DNA Signature Derived Music via the Short Tandem Repeats (STR) . . . . 951
Tirthankar Paul, Seppo Vainio, and Juha Roning
SEAMS: A Symmetric Encryption Algorithm Modification System to Resist Power Based Side Channel Attacks . . . . 965
K. P. A. P. Pathirana, L. R. M. O. Lankarathne, N. H. A. D. A. Hangawaththa, K. Y. Abeywardena, and N. Kuruwitaarachchi
A Light Weight Cryptographic Solution for 6LoWPAN Protocol Stack . . . . 977
Sushil Khairnar, Gaurav Bansod, and Vijay Dahiphale
Forensics Data Recovery of Skype Communication from Physical Memory . . . . 995
Ahmad Ghafarian and Charlie Wood
Looking Through Your Smartphone Screen to Steal Your Pin Using a 3D Camera . . . . 1010
Diksha Shukla, Vir V. Phoha, and Saurav Prakash
Cyber Physical Security Protection in Online Authentication Mechanisms for Banking Systems . . . . 1021
Ahmed Yousuf Jama, Tansal Güçlüoğlu, and Maheyzah Md Siraj
The Impact of Social Networks on Students’ Electronic Privacy in Saudi Arabia Society . . . . 1032
Nabih T. Abdelmajeed
Comparison of Different Types of ANNs for Identification of Vulnerable Web Components . . . . 1042
Mahmoud O. Elish
E-Secure: An Automated Behavior Based Malware Detection System for Corporate E-Mail Traffic . . . . 1056
K. Thebeyanthan, M. Achsuthan, S. Ashok, P. Vaikunthan, A. N. Senaratne, and K. Y. Abeywardena
An Intelligent Path Towards Fast and Accurate Attribution . . . . 1072
Jim Q. Chen
Survey of Automated Vulnerability Detection and Exploit Generation Techniques in Cyber Reasoning Systems . . . . 1083
Teresa Nicole Brooks
Managing Privacy Through Key Performance Indicators When Photos and Videos Are Shared via Social Media . . . . 1103
Srinivas Madhisetty and Mary-Anne Williams
Incentivizing Blockchain Miners to Avoid Dishonest Mining Strategies by a Reputation-Based Paradigm . . . . 1118
M. Nojoumian, A. Golchubian, L. Njilla, K. Kwiat, and C. Kamhoua
A BI Solution to Identify Vulnerabilities and Detect Real-Time Cyber-Attacks for an Academic CSIRT . . . . 1135
Francsico Reyes, Walter Fuertes, Freddy Tapia, Theofilos Toulkeridis, Hernán Aules, and Ernesto Pérez
Security Metrics for Ethical Hacking . . . . 1154
Reem Al-Shiha and Sharifa Alghowinem
Spanning Tree Protocol for Preventing Loops and Saving Energy in Software Defined Networks Along with Its Vulnerability and Threat Analyses . . . . 1166
Aleena Rehman, Farhan A. Siddiqui, Jibran R. Khan, and Muhammad Saeed
Simulations for Deep Random Secrecy Protocol . . . . 1181
Thibault de Valroger
PREDECI Model: An Implementation Guide . . . . 1196
Fernando Molina-Granja, Glen D. Rodríguez Rafael, Washington Luna, Raúl Lozada-Yanez, Fabián Vásconez, Juan Santillan-Lima, Katherine Guerrero, and Cristian Rocha
SAHCE: Strong Authentication Within the Hybrid Cloud Environment . . . . 1212
Belbergui Chaimaa, Elkamoun Najib, and Hilal Rachid
A Blockchain-Based Decentralized System for Proper Handling of Temporary Employment Contracts . . . . 1231
Andrea Pinna and Simona Ibba
Poster: Password Input Method Using Simple Device . . . . 1244
Reina Momose, Manabu Okamoto, and Miyu Shibata
Detection of Prolonged Stress in Smart Office . . . . 1253
Elena Vildjiounaite, Ville Huotari, Johanna Kallio, Vesa Kyllönen, Satu-Marja Mäkelä, and Georgy Gimel’farb
An Efficient NTRU-Based Authentication Protocol in IoT Environment . . . . 1262
SeongHa Jeong, KiSung Park, YoHan Park, and YoungHo Park
Pain Evaluation Using Analgesia Nociception Index During Surgical Operation . . . . 1269
Jiann-Shing Shieh, Bhekumuzi Mathunjwa, Muammar Sadrawi, and Maysam F. Abbod
Automated Inner Limiting Membrane Segmentation in OCT Retinal Images for Glaucoma Detection . . . . 1278
Aneeqa Ramzan, M. Usman Akram, Javeria Ramzan, Qurat-ul-Ain Mubarak, Anum Abdul Salam, and Ubaid Ullah Yasin
Kernel Matrix Regularization via Shrinkage Estimation . . . . 1292
Tomer Lancewicki
Single Image Based Random-Value Impulse Noise Level Estimation Algorithm . . . . 1306
Long Bao, Karen Panetta, and Sos Agaian
Hybrid Vehicular Network System for the Malfunctioned Vehicles Over Highways . . . . 1319
Weam Gahsim Mergani, Abo-Obyida Mohammed Alahssen, L. M. Ahmed, and M. F. L. Abdullah
International Cyber Attackers Eyeing Eastern India: Odisha - A Case Study . . . . 1328
Bhaswati Sahoo, Rabindra Narayan Behera, and Sanghamitra Mohanty
Computer Graphics Based Approach as an Aid to Analyze Mechanics of the Replaced Knee . . . . 1340
Ahmed Imran
Efficient Set-Bit Driven Shift-Add Binary Multiplier . . . . 1346
Alvernon Walker and Evelyn Sowells-Boone
Estimation of Stress Condition of Children by Nasal Skin Temperature . . . . 1351
Madoka Suzuki and Tomoaki Ohtsuki
Private Collaborative Business Benchmarking in the Cloud . . . . 1359
Somayeh Sobati-Moghadam and Amjad Fayoumi
Mobile Phone Operations Just by Sight and Its Applications . . . . 1366
Kohei Arai
Music4D: Audio Solution for Virtual Reality . . . . 1375
Shu-Nung Yao and Cheng-Yuan Yu
c3d.io: Enabling STEAM (Science, Technology, Engineering, Arts, Mathematics) Education with Virtual Reality . . . . 1380
Jason Madar
Author Index . . . . 1387
Statistical Learning of Lattice Option Pricing and Traders’ Behavior Using Ising Spin Model for Asymmetric Information Transitions
Prabir Sen¹ and Nang Laik Ma²
¹ STATGRAF Research, Vancouver, Canada ([email protected])
² School of Business, Singapore University of Social Sciences, Singapore ([email protected])
Abstract. Financial fluctuations are a class of complex problems in determining market behavior. The study of such fluctuations, and of statistical (machine) learning methods to predict option price changes, has been undertaken by many researchers in the past. With the advancement of technology, one can capture the complexities in financial systems, use deep statistical (machine) learning, and apply unique sets of rules and principles to these multi-layered complex networks. This paper provides a framework for lattice option pricing to determine the state of choice-sets, as one such unique set, in complex financial networks. This is largely based on human intelligence that learns features of each individual stock, and their trade-offs, pay-offs, preferential attachments and strategic options in the decision-making process. The paper also focuses on cases where both price and demand fluctuate stochastically, and where both buyers and sellers have asymmetric information with limited time at their disposal for high-quality decisions to encourage or deter behavioral change. The treatment draws on statistical mechanics and Ising-spin approaches to derive computational methods that infer and explain patterns and themes from high-dimensional data, to “manage the probable” as well as “lead the possibilities” for multi-stage optimal control in dynamic systems.
Keywords: Data science · Machine learning · Lattice option pricing · Signal processing · Cognitive decisions · Statistical physics · Ising-spin model · Intelligent systems · Network science · Statistical mechanics
1 Introduction
Trading is very common in the financial industry. A trader sends important information, as a signal, to another party, who may be a buyer, a seller, or an IT system or platform. The question of interest in this business scenario is how the interactive structure of a market accounts for the informational content, if any, of these potential signals (Spence 1973) [1]: the endogenous market process whereby the trader, for example, acquires (and transmits) information about potential partners (as buyer or seller), which ultimately determines the implicit speculation involved in selecting portfolios,
allocating information to people and triggering people to seek further information in the market. As additional new partners join the market, this study assumes repeated cycles around the loop to “manage the probable”. This study also modifies the trader’s conditional probabilistic outcomes: end-reward schedules are adjusted to partners’ behavior with respect to signal choice, signal change, and post-change to new signals, as they become available to the trader. It is useful to introduce a distinction among the attributes that collectively constitute the state of the partner’s present: some are fixed, while others are alterable but irreversible. It is these aspects of the mind or process, as cognitive state, that correspond to thinking and feeling, comprising a conglomerate of mental representations and propositional attitudes (Simon 1973) [2]. At any point in time when interacting with an individual partner, the trader’s subjective assessment of the speculation which he or she confronts is defined by these states or conditions, based on probability distributions. This viewpoint regards signals and states as drivers in probability distributions that define a trader’s eventual outcome or pay-offs. At this point it is perhaps clear that there are actions of the trader that “lead the possibilities” for a pre-defined outcome or a series of outcomes. These actions, attributes of the dynamics in a complex financial network, are a mathematical function that takes the trajectory or history of the financial system as an argument and the metrics as results. As new market information becomes available to the trader through selection and subsequent observations on partner capabilities, as they represent signals and state, the trader’s conditional probabilistic outcomes are adjusted, and a new round starts. The fee schedule of each subsequent new entrant in this network generally differs from that of the previous partners in the network. It is desirable to find the expected response to each action, and its impact in the market over time. It could be something said or written as a reply to a possible action: something that is expected as a response to an action. To avoid studying the complex financial system in a continuous state of action and response, it is useful to view this as a non-transitory configuration of the responsive system, where the market is generating, either through each “satisficing” decision or a series of small (quantum-like) decisions towards a big decision, an empirical distribution of partner capabilities through observable action-responses or signals (in cognitive states). The trader’s pay-off, or reward, is an intended outcome of this relationship for information or knowledge, so that $y_t \geq y_p$, where the trader’s outcome $y_t$ and the partner’s outcome $y_p$ are acceptable to both parties. A reward is an appetitive stimulus given to a human or some other entity in recognition of their service, effort or achievement, in order to alter their behavior. Again, the information or knowledge level conveys the partners’ intent to invest in information or knowledge in the financial network (Barabasi 2002) [3]. If they do not invest, they might incur lower fees, but the loss would exceed the gain from not making knowledge-based decisions. This implies that there could be prerequisites that convey no information, and hence it is not useful to make any decision based on that information.
In recent years, due to the advancement in data science and advanced analytics techniques as well as the massive amount of data available, the vision of better prediction to ‘manage the probable’ and prescription to “lead the possibilities” is gaining
momentum. In the next few sections, we discuss our “slicing” approach of training data for machines to learn probabilities and their densities: by breaking down the deep financial network into several layers for a data model, where individual systems – trader, buyer and seller – interact to take multiple quanta-like small decisions towards a big decision that naturally generalizes the model (Watts and Strogatz 1998) [4]. We further slice the data for each machine to learn features and trade-offs, for one asset vis-à-vis another, that likely attract buyers and sellers to the trader. This multi-layered, multi-dimensional learning model is designed to determine probabilities and volatilities – using the Ising-spin model in asymmetric information transition algorithms – for finding short paths with high probability and low errors for each option. We further use a multi-layered, multi-dimensional optimization model for efficient learning and steepest descent to execute the model in a decentralized financial system for dynamic decisions (Scholtes and Tessone 2011) [5]. Later, we also present some computational results and the performance of the model. Finally, we discuss the limitations of the model and propose some challenges and opportunities that emerge when using quantum-like machine learning for strategic decisions with predictable and quantifiable prescriptive properties.
2 Quantum Neural Data
Individual training data, including asset features, trading (exchange) location, pre-order, order, delivery, volatility, etc., were structured in a way to provide an understanding of the membrane properties of different features of a particular type of partner or individual. These features of a neuron (partner or individual) in the neural-network structure show different (labeled) compartments, together with a canonical representation. Therefore, the search for membrane properties (set to data search), instead of inferring deep-layer properties, gives an overview of the membrane properties found in all compartments in the neuron, while connectivity shows the membrane properties and synaptic connections to all compartments in this neuron, and inference provides interactions collected at a particular time for all properties of all compartments in this neuron. The search enables the user to observe the combination of membrane properties that mediates the integrated activity of a given compartment; compare it to the combined property in different compartments; identify a series of inferences that are established in a property in a compartment of this neuron type; and learn the differences using the evidence from a particular property as a signal. A cognitive and predictive system (Sen 2015) [6] is operable to estimate a human cognitive state based on activities and trajectories, using multiple sources of dynamic behavioral data and advanced analysis techniques in real time to draw inferences for individually-signaled adaptive decisions. Entities may respond to these inferences with personalized services at a location and at a moment in time that is relevant to the individual. Inferences for each individual machine may be aggregated to form a collection of dynamic decisions relevant to an individual, and the collection of dynamic decisions may also be used to draw inferences for the individual. Representation of various canonical forms of cognitive states requires the diversity of their dendritic trees, as in a neural-network structure. A single dendrite represented as an equivalent
cylinder (e) consists of a chain of three decision compartments: (p) for proximal, (m) for middle, and (d) for distal, with respect to the cell body. The approach (Shepherd 1968) [7] represents these patterns as equivalent dendrites, with corresponding p, m and d compartments. A lattice fractal is a graph which corresponds to a fractal, with self-similarities, but most of them have no translation invariance (Chen 1992) [8]. They are generally distributed compositional semantics of multi-dimensional objects. Chen’s model shows that the Ising model on the lattice Sierpinski carpet exhibits a phase transition in any dimension, but has no phase transition (because of the character of the finitely ramified fractal) on the lattice Sierpinski gasket. When missing data are present, the model uses the expectation-maximization (EM) algorithm to estimate parameters in Bayesian networks when there is training data (Friedman 1997) [9], and in record linkage when there is no training data (unsupervised learning). The model further uses EM and Monte Carlo Markov Chain (MCMC) methods to estimate the error rate automatically in some record linkage situations (Larsen and Rubin 2001) [10].
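Since this section leans on the EM algorithm for estimating parameters when some variables are unobserved, a minimal sketch may help fix ideas. It fits a two-component 1-D Gaussian mixture by EM with numpy, treating component membership as the missing variable; the data, component count, and initialization are illustrative assumptions, not the Bayesian-network or record-linkage setup of [9, 10].

```python
import numpy as np

def em_gaussian_mixture(x, n_iter=100):
    """Fit a two-component 1-D Gaussian mixture by EM.

    The component membership of each point is the unobserved
    ('missing') variable that EM integrates over.
    """
    # Crude initialization: split the sorted sample in half.
    x = np.sort(np.asarray(x, dtype=float))
    mu = np.array([x[: len(x) // 2].mean(), x[len(x) // 2:].mean()])
    sigma = np.array([x.std(), x.std()])
    pi = np.array([0.5, 0.5])  # mixing weights

    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point.
        dens = (pi / (sigma * np.sqrt(2 * np.pi))
                * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2))
        resp = dens / dens.sum(axis=1, keepdims=True)

        # M-step: re-estimate weights, means and standard deviations.
        nk = resp.sum(axis=0)
        pi = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    return pi, mu, sigma

rng = np.random.default_rng(0)
sample = np.concatenate([rng.normal(-2, 1, 500), rng.normal(3, 0.5, 500)])
print(em_gaussian_mixture(sample))
```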
3 Measuring Trade-offs
Transforming signals in complex financial networks begins with understanding the flow of information between the trader and partner machines. Trader systems try to identify signals in the individual machines which are relatively more attractive, and measure the extent of their (dis)satisfaction so as to prioritize areas for improvement. Individual machines and their interactions with the financial network have two fundamental properties: the small-world property (getting “satisfied” or “dissatisfied” from one or two network interactions) and the scale-free property (“making it viral” in other network interactions). In a small-world network, one can reach a given node from another one with the smallest number of links between the nodes, in a very small number of steps. Many naturally occurring networks are of this nature. This is mathematically expressed by the slow (logarithmic) increase of the average diameter of the network, $\ell$, with the total number of nodes N, $\ell \sim \ln N$, where $\ell$ is the shortest distance between two nodes and defines the distance metric in complex networks (Erdos and Renyi 1960) [11]; equivalently, one obtains $N \sim e^{\ell/\ell_0}$, where $\ell_0$ is a characteristic length. Secondly, these complex networks arise with the discovery that the probability distribution of the number of links per node, P(k) (also known as the degree distribution), can be represented by a power law (“scale-free”) with a degree exponent $\gamma$ that is usually in the range $2 < \gamma < 3$ (Bollobas 1985) [12]:

$$P(k) \sim k^{-\gamma}$$

These discoveries have been confirmed in many empirical studies of diverse networks. Multi-layered, multi-dimensional networks (Newman 2003) [13], as depicted in Fig. 1, explicitly incorporate multiple service channels of connectivity and constitute the natural environment to describe systems interconnected through different categories of connections: each partner channel (relationship, activity, category) is represented by
a layer, and the same node or entity may have different kinds of interactions (a different set of neighbors in each layer) for different sets of decisions. Partner (dis)satisfaction measurement efforts generate accurate, actionable insights based on three guiding principles:
Fig. 1. Multi-layered relationships in the financial network
(a) Individuals signal what matters most, without being asked directly: Explicitly asking people what they consider most important to improve in economic delivery, for example the time required to respond to a request versus the direction and guidance from a staff member, is unlikely to produce accurate results. Unfortunately, most people will say every aspect is equally important. So individual machines are trained to learn how to rate each feature (for example, the overall trading process of settlement) or the trade-offs between features. This method provides insights about partners’ needs and priorities.
(b) Identify natural break points in service quality: It is unrealistic to achieve zero wait times and one-click transactions across all services, as this would increase operational cost exponentially. Thus, traders need to find a balance between delivering high-quality, responsive services and managing resources efficiently and effectively, using metrics to determine “satisficing” economic levels. One way for machines to learn is to identify break points – the “last-mile” journey of experience in trading – at which delays or shortfalls cause (dis)satisfaction. Machines are able to identify the trade-offs between trader and partner satisfaction for both of these channels in real time, which will in turn raise overall partner satisfaction (Faloutsos et al. 1999) [14].
(c) Combine pay-offs from internal data with partner data and external public data to uncover hidden pain points: Combining the trader’s internal information with operational data of partners – usage data, volume, delay, annual reports, social media
data, public reviews, market trends, etc. – can produce additional insights on machines’ cognitive state, implicitly or explicitly. This will help to identify the correlation between problematic machines and the key drivers of their (dis)satisfaction. The trader system requires a structured model, as in Fig. 2, to foresee the unintended consequences and “manage the probable” for a collective decision-making process.
Fig. 2. These steps in decision making give machines the essential elements of a structured process model.
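The small-world (average path length growing like ln N) and scale-free (P(k) ~ k^-γ) properties invoked at the start of this section can be checked empirically on a graph. A minimal sketch, assuming networkx is available; the generated graphs are illustrative stand-ins for a real trader-partner network:

```python
import math
import networkx as nx

# Small-world check: average shortest path length should grow like ln N.
ws = nx.connected_watts_strogatz_graph(n=1000, k=6, p=0.1, tries=100, seed=42)
l_avg = nx.average_shortest_path_length(ws)
print(f"avg path length {l_avg:.2f} vs ln(N) = {math.log(1000):.2f}")

# Scale-free check: a preferential-attachment graph has P(k) ~ k^-gamma.
ba = nx.barabasi_albert_graph(n=1000, m=3, seed=42)
degrees = [d for _, d in ba.degree()]
# Crude tail-exponent estimate (Hill-style MLE over degrees >= k_min).
k_min = 3
tail = [d for d in degrees if d >= k_min]
gamma = 1 + len(tail) / sum(math.log(d / (k_min - 0.5)) for d in tail)
print(f"estimated degree exponent gamma ~ {gamma:.2f}")  # near 3 for this model
```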
In addition, the trader system also requires a deeper understanding of the underlying latent variables to identify how individual machines interact in different financial networks. These so-called interaction mechanisms are paramount in understanding the emergent nature of collective decisions, as they often lead to common features and the same patterns within the network structure. This also hypothesizes what other outcomes there could be, should the interactions change; and that shines light on the unintended consequences at some future time. Very little study has been done on the connectivity of interconnected nodes of different length, but many examples in finance exhibit the importance of individual and collective trade-off behavior, for example between helpful responses and long wait times in banking networks, and links between clusters of similar-subject payment sites that exhibit similar behavior (Berlingerio et al. 2011) [15]. In such structured systems, virtual connections between participating machines are created in a globally consistent way to construct a pre-determined topology. While this allows for the development of highly efficient algorithms for distributed search, routing or information dissemination, the fact that this network topology has to be maintained at all times in the face of highly dynamic constituents is a major challenge. If the expected activities for a group of partners indicate that a large number of machines are exercising a particular choice, the forecasting module may modify the machine’s target choices to encourage (e.g. frequency of trade) or deter
behavior (e.g. disinvestment). This model also estimates the cognitive state of a machine, which may include its experience of an activity. A stochastic sub-process may be executed to determine the probability of a machine (n) being in an unobserved state r at time t, and a state-choice sub-process determines the cognitive state for each choice associated with a choice-set attribute. The method applied trains one support vector machine (SVM) per individual choice basket (b…B) and computes the individual partworths, generated from a Gaussian distribution, by regularizing with the aggregated partworths. A choice from the choice set (j…J), with feature variables shown as X and a choice of one over another expressed as (1…i) of X, that was selected by a machine – the trader or partner – is determined, and the maximum likelihood probability for that choice may be used to determine the expected cognitive state, defined as:

$$P_{nr}(j) = \frac{1}{B(j)} \prod_{i=1}^{K} X_i^{j_i}$$
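A minimal sketch of the “one SVM per choice basket, regularized toward the aggregated partworths” idea, using scikit-learn (a library choice of ours; the paper names no implementation). The features, basket labels, and the shrinkage weight lam standing in for the Gaussian-prior regularization are all illustrative assumptions:

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(1)
n, d, n_baskets = 600, 5, 3
X = rng.normal(size=(n, d))                    # choice-set feature variables
y = (X @ rng.normal(size=d) > 0).astype(int)   # chosen vs. not chosen
basket = rng.integers(0, n_baskets, size=n)    # basket id b = 1..B

# Aggregated partworths: one linear SVM over all observations.
global_svm = LinearSVC(C=1.0, dual=False).fit(X, y)
w_agg = global_svm.coef_.ravel()

# One SVM per basket, then shrink toward the aggregate (lam is the
# regularization strength, a stand-in for the Gaussian prior in the text).
lam = 0.5
partworths = {}
for b in range(n_baskets):
    mask = basket == b
    w_b = LinearSVC(C=1.0, dual=False).fit(X[mask], y[mask]).coef_.ravel()
    partworths[b] = (1 - lam) * w_b + lam * w_agg
    print(b, np.round(partworths[b], 2))
```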
The expected cognitive states for each machine’s decision on a choice-set are accumulated based on the last activity performed, for machines 1 to N. A relational activity model is used to accumulate the cognitive states. A relational clique is a construct of a clique over all activities at various states on a trajectory, which may be a travel path of one or more machines. Each clique C is associated with a potential function that maps a tuple (values of decisions or aggregations). Together they provide (a) the activity-based decision, (b) the state, and (c) the actions of consecutive activities, as expressed by the following equation [6]:

$$\sum_{C} \prod_{c \in C} \phi_c\!\left(a_c, v'_c, y'_c\right)$$
Since each machine is modeled as a quantum candidate, each machine’s last decision, for example, is modeled as a function of one or more of time, state, transition and constraints. A single machine’s cognitive state is therefore determined, and then Z is defined, where Z is the statistical probability distribution of finding the machine in any particular cognitive state associated with a decision U, machines N and state-density V. Z is proportional to the degeneracy of the accumulated cognitive states (of R, as in the relational activity model). The grand sum is the sum of the exponential, which may be determined by expanding the exponential in a Taylor series, over all possible combinations of U, V and N. The single-particle density of states is investigated for a financial system of localized assets (electrons) with Coulomb interaction in a random potential situated on a fractal lattice of the Vicsek type with a fractal dimension of two, embedded in three-dimensional Euclidean space. The study found that the density is determined by the geometric fractal dimension of the lattice instead of the spectral dimension, which usually determines the density.
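Since Z here plays the role of a partition function over accumulated cognitive states, a minimal sketch of the grand sum may be useful: it enumerates all states of a toy chain of ±1 “decision spins” exactly. The chain length, coupling, and temperature are illustrative assumptions:

```python
import itertools
import math

def partition_function(n_spins=8, J=1.0, T=2.0):
    """Exact Z = sum over all 2^n spin states of exp(-E/kT), with k = 1.

    E is the nearest-neighbor Ising energy of an open chain; each state
    plays the role of one accumulated 'cognitive state' in the text.
    """
    Z = 0.0
    for state in itertools.product((-1, +1), repeat=n_spins):
        energy = -J * sum(state[i] * state[i + 1] for i in range(n_spins - 1))
        Z += math.exp(-energy / T)
    return Z

# The probability of any particular state is exp(-E/kT) / Z.
print(partition_function())
```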
4 Use of Ising-Spin Model for Stochastic Volatility
Next we consider a price model of auctions for an asset in an exchange (stock) market. Assume that each trader can trade the asset several times on each day t ∈ {1, 2, …, T}, but at most one unit of the asset at each time. Let S(t) denote the daily closing price of the t-th trading day. Let $K_n$ be a subset of $S^{(2)}$, where

$$K_n = \{\{y, x\} \in S^{(2)} : -3^n \leq y \leq 3^n,\; -3^n \leq x \leq 3^n\}$$

and let $C_t(0)$ be a random open cluster on $K_n$. Suppose that this asset market consists of $|K_n|$ investors (with n large enough), located on the lattice $K_n$, and that $C_t(0)$ is a random set of the selected traders who receive the information. At the beginning of trading each day, suppose that the investors receive some news. We define a random variable $f_t$ for these investors and suppose that they take buying positions ($f_t = 1$), selling positions ($f_t = -1$), or neutral positions ($f_t = 0$) with probability $q_1$, $q_{-1}$, or $1 - (q_1 + q_{-1})$ (with $q_1, q_{-1} > 0$ and $q_1 + q_{-1} \leq 1$), respectively. Then these investors send bullish, bearish or neutral signals to the market. However, the volatility of the underlying asset is a function of an exogenous stochastic process, typically assumed to be mean-reverting. Assuming that only discrete past asset data is available, we adapt an interacting-particle stochastic Monte Carlo algorithm (Moral et al. 2001) [16] to estimate the stochastic volatility (SV), and construct a multinomial tree which samples volatilities from the SV filter’s empirical measure approximation at time 0. The study applied the construction of a multivariate generalization of the Dirichlet-multinomial distribution (DMD), instead of a binomial or trinomial tree, as approximations to continuous-time short-rate models (Kalwani 1980) [17]. The proposed DMD model determines the distribution f(X) of the proportion of the partners exposed to none ($F_1 = 0$), one ($F_2 = 1$), two ($F_3 = 2$), …, mn ($F_{mn+1} = mn$) of the n positions of m assets. This model can be expressed in the general form:
1
Z
1p0
Z ...
0
with Ai ¼ pi :ðSÞ and S ¼
1
Pm2 i¼0
pi
f ðXjP; nÞDðPjAÞdpm1 . . .. . .dp1 dp0
0 m P
Ai .
i¼0
The P vector contains the probabilities of the m + 1 occurrences ($p_0, p_1, \ldots, p_m = P$) and is the multinomial probability vector. The random variables $p_i$ satisfy the following constraint:

$$\sum_{i=0}^{m} p_i = 1 \quad \text{and} \quad 0 \leq p_i \leq 1$$
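A quick numeric check of this simplex constraint, sampling P from a Dirichlet prior with numpy; the parameter vector A is an illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(7)
A = np.array([2.0, 1.0, 1.0, 0.5])      # Dirichlet parameter vector {A0..Am}
P = rng.dirichlet(A, size=10_000)       # each row is one probability vector

assert np.allclose(P.sum(axis=1), 1.0)  # sum_i p_i = 1
assert (P >= 0).all() and (P <= 1).all()
print("mean of P:", P.mean(axis=0))     # approaches A / A.sum()
```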
Such machine probabilities are Dirichlet distributed, expressed as

$$D(P \mid A) = \frac{\Gamma(A_0 + A_1 + \cdots + A_m)}{\Gamma(A_0)\,\Gamma(A_1) \cdots \Gamma(A_m)}\; p_0^{A_0 - 1}\, p_1^{A_1 - 1} \cdots p_m^{A_m - 1}$$
where the A vector $\{A_0, A_1, \ldots, A_m\}$ is the Dirichlet parameter vector. When the multivariate Dirichlet and multinomial distributions are compounded, the form of the proposed model is given as:

$$f(X \mid A, n) = \frac{n!}{x_0!\, x_1! \cdots x_m!}\; \frac{\Gamma\!\left(\sum_{i=0}^{m} A_i\right)}{\prod_{i=0}^{m} \Gamma(A_i)}\; \frac{\prod_{i=0}^{m} \Gamma(x_i + A_i)}{\Gamma\!\left(n + \sum_{i=0}^{m} A_i\right)}$$
where the X vector $\{x_0, x_1, \ldots, x_m\}$ specifies the expected return from each of the m assets at n level-position opportunities. As with any arbitrage-free pricing scheme, we also introduce one or more martingale (a.k.a. risk-neutral) probability measures, usually denoted by Q, defined as the SV model with $\mu$ replaced by r, the risk-free rate. The risk-neutral price of an option is its discounted expected future value under such a martingale measure Q. This means that, given the present state of the asset at the time of pricing, the price must be determined by using the model for the asset under Q. For simplicity, the study assumes that each partner can be in either one of two spin states: one state is designated by A, ↓ or +1, and the other by B, ↑ or −1. The total number of spins (partner states) is B, the number of +1 spins is N (or $N_A$), and the number of −1 spins is B − N. The intensity of action (magnetization), I, may be conveniently defined as the net number of −1 spins. The intensity of action per spin is then

$$\frac{I}{B} = 1 - \frac{2N}{B} = 1 - 2q$$

where q = N/B. Then the canonical ensemble is

$$Q_m(B, I, T) = j_m(T)^B \sum_{i=1}^{\Omega(B,I)} e^{-E_i/kT}$$
where $E_i$ is the sum of nearest-neighbor (pair) trades for the i-th configuration, over the $\Omega(B, I)$ configurations consistent with (B, I), and $j_m$ represents the non-configurational trade (partition) function of each of the B partners of the system. To derive buyer (a), trader (b), seller (c) configurations at exchanges (d) and (e), the system limits itself to the simple cubic lattice, allowing nearest-neighbor interactions not only between $\langle ab \rangle$, $\langle bc \rangle$, $\langle cd \rangle$ and $\langle de \rangle$, but also $\langle bd \rangle$ (trader and exchange), in a three-dimensional Ising model, since nearest-neighbor interactions in a self-similar way do affect critical behavior (Gefen et al. 1980) [18]. We derive the model as:

$$K(y, x) = \frac{(1+x)^3 (1+y)}{4} \sum_{r \geq 0} \frac{u_r(y)\, u_r}{2\,(1+y)^{2r}}$$
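A minimal Metropolis sketch of the two-state spin system above, tracking the per-spin intensity of action (magnetization) I/B = 1 − 2q on a 2-D lattice. Lattice size, temperature, and sweep count are illustrative assumptions, and the plain square lattice stands in for the buyer-trader-seller-exchange geometry:

```python
import numpy as np

def metropolis_magnetization(L=20, T=2.0, sweeps=500, seed=0):
    """Metropolis sampling of a 2-D Ising lattice with J = k = 1.

    Returns the final per-spin magnetization, the analogue of the
    'intensity of action per spin' I/B = 1 - 2q in the text.
    """
    rng = np.random.default_rng(seed)
    spins = rng.choice([-1, 1], size=(L, L))
    for _ in range(sweeps):
        for _ in range(L * L):
            i, j = rng.integers(L), rng.integers(L)
            # Energy change from flipping spin (i, j), periodic boundaries.
            nb = (spins[(i + 1) % L, j] + spins[(i - 1) % L, j]
                  + spins[i, (j + 1) % L] + spins[i, (j - 1) % L])
            dE = 2.0 * spins[i, j] * nb
            if dE <= 0 or rng.random() < np.exp(-dE / T):
                spins[i, j] *= -1
    return spins.mean()

print(metropolis_magnetization())  # near +/-1 below T_c ~ 2.27, near 0 above
```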
The consensus time is characterized in terms of a certain measure of segregation that depends only on large-scale linking patterns among groups, and (with high probability) not on idiosyncratic details of network realizations, as depicted in Fig. 3. This measures the extent to which machines of an economic (dis)satisfaction group are biased toward forming links with other, similar types.
Fig. 3. Features and attributes of assets at different buying and selling processes.
In terms of modeling their performance and robustness, most unstructured approaches to the information in such interactions through economic (dis)satisfaction rely, either explicitly or implicitly, on random network models. In order to appreciate the analogies between the information of large, dynamic networked systems, statistical mechanics and thermodynamics, we briefly recall one of the basic models of random graph theory. This model defines a probability space that contains all possible graphs or networks G with n nodes. Assuming that edges between pairs of nodes are generated by a stochastic process with uniform probability p, the G(n, p) model assigns each network G = (V, E), with $n = |V|$ nodes and $m = |E|$ edges, the same probability

$$P_G(n, p) = p^m\,(1-p)^{n(n-1)/2 - m}$$

This simple stochastic model for networks has been used in the modeling of a variety of structural features of real-world networks. This mapping predicts that the common epithets used to characterize competitive systems, such as “winner takes all,” “fit get rich” (FGR), or “first-mover advantage,” emerge naturally as thermodynamically and topologically distinct phases of the underlying complex evolving network
(Bianconi and Barabasi 2001) [19]. To demonstrate the existence of a state transition from the FGR phase, the pay-off distribution follows

$$g(\epsilon) = C\,\epsilon^{\theta}$$

where $\theta$ is a free parameter and the pay-offs are chosen from $\epsilon \in (0, \epsilon_{\max})$, with normalization $C = (\theta + 1)/\epsilon_{\max}^{\theta + 1}$. For this class of distributions the cognitive state (a Bose condensation) is:

$$\frac{\theta + 1}{(\beta\,\epsilon_{\max})^{\theta + 1}} \int_{\beta\,\epsilon_{\min}(t)}^{\beta\,\epsilon_{\max}} \frac{x^{\theta}}{e^{x} - 1}\, dx < 1$$
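The condensation condition above can be evaluated numerically. A minimal sketch with scipy.integrate.quad; the values of θ, β, and the pay-off bounds are illustrative assumptions:

```python
import numpy as np
from scipy.integrate import quad

def condensation_lhs(theta=1.0, beta=1.0, eps_min=0.01, eps_max=1.0):
    """Left-hand side of the Bose-condensation condition above."""
    integrand = lambda x: x**theta / np.expm1(x)   # x^theta / (e^x - 1)
    val, _ = quad(integrand, beta * eps_min, beta * eps_max)
    return (theta + 1) / (beta * eps_max) ** (theta + 1) * val

lhs = condensation_lhs()
print(f"lhs = {lhs:.3f}; condensed (fit-get-rich) phase: {lhs < 1.0}")
```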
As the classical G(n, p) model can be viewed alternatively as an adiabatic ensemble with a fixed Poissonian degree distribution, and since power-law (or at least heavy-tailed) degree distributions have been observed for a number of real-world networks (Berg and Lassig 2002) [20], it is useful to formalize a representation of a graph in terms of its adjacency matrix A, where each element $a_{ij} = 1$ ($a_{ij} = 0$) if the nodes i and j are connected (respectively, disconnected). Then the spectrum of such a network is given by the set of n eigenvalues of its adjacency matrix. For the G(n, p) model, it is possible to characterize the spectrum of the networks in the limit of diverging network sizes. In this regime, if there is a giant cluster that spans the complete financial network, the probability $\rho(\lambda)$ of finding an eigenvalue $\lambda$ in the spectrum follows the so-called semi-circle law (Cohen et al. 2001) [21]. It is recognized that different kinds of transitions appear when studying evolutionary processes in networks, as depicted in Fig. 4. When the different nodes exhibit common dynamics at the global level, it can be said that a synchronized state has emerged. This also shows that we need a large enough grid to have sharp predictions. The idea of this model is to start with the estimated volatility distribution $Y_0^n := U_n^K$ at the present time, represented by its weighted particles $Y^n = \{(Y^j, p^j) : j = 1, 2, \ldots, n\}$, and with the logarithm of the price of the asset today, $x_0 = \log S_0$, and then to take advantage of the same particle filtering scheme to generate future asset prices and volatility values; that is to say, we use stochastic volatility filtering in a dynamic way for pricing. Once we find the value at expiration, $X(x_0)_N = X_T$, we can compute the value of the option at expiration, and then discount back to the present value using the risk-free rate. This represents one replication of a Monte Carlo method: to compute an estimate of the option price we generate many replications (typically of the order $n'' = 10^6$) and then compute the average of the values obtained. This average is our estimate for the price of the option today. The convergence, and the convergence speed, of order $(n'')^{-1/2}$, is of course guaranteed by a standard argument based on the central limit theorem. Since the option prices given by our dynamic model are not as close to those given in the market (the static model’s prices), there seems little reason to improve the efficiency of our Monte Carlo method for the dynamic model. Interestingly, this also means that
Fig. 4. The same mechanisms of learning models that lead to the improvements in transitions with larger lattices for both partners (M) and traders (E) – a myopic tendency to ignore the larger picture.
there exists a relationship between trader and partners in the financial network (in terms of its eigenvalues) and the dynamic processes are happening at the level of nodes. Dynamic replication processes also take place in a discrete space. (Farbod 2009) [22] uses the discrete logarithmic mapped in a graph for such process, it is equivalent to stochastic search, and transport phenomena within the network. This study introduces the basic idea of the replica method by computing the spectrum of random matrices. To be confident, the study explores the case of real symmetric matrices of large order N having random elements Jij that are independently distributed with zero mean and variance J 2 ¼ N 1 . ^ of the N N matrix ^J is defined as follows: The resolvent R ^ ðeÞ ¼ eI ^j 1 R where I is the N N identity matrix and e is a complex number. This also formulates ^ ðeÞ: in computing the trace of R
$$\mathrm{Tr}\,\hat{R}(\epsilon) = \sum_k \frac{1}{\epsilon - \epsilon_k}$$
in the large N limit. One can argue that decision-making is not invariant to the physical environment in which a decision is made. In addition, a partner with access to deeper (quantum) information resources may be able to do strictly better than a partner with access only to classical information resources. In this respect, our findings are somewhat akin to those in computer science that have established the superiority of quantum over classical algorithms for certain problems. In a nutshell, the function $\hat{R}(\epsilon)$ is a real analytic function with a cut on the real line, and the discontinuity on the cut is the density of states. Now we can write the average of the logarithm of the determinant in the following way (Dotsenko 2005) [23]:

$$\left\langle \log \det(\epsilon I - \hat{J}) \right\rangle = -2 \lim_{n \to 0} \frac{d}{dn} \left\langle \det(\epsilon I - \hat{J})^{-n/2} \right\rangle$$

Operatively, one computes the quantity on the r.h.s. for integer (positive) values of the replica number n, then one performs an analytical continuation of the result to real values of n and eventually takes the limit $n \to 0$. Such replication in decision-making in complex systems is based on a multi-layered, multi-dimensional optimization problem with several criteria to identify the sensitive preferences of a decision maker. The key idea of quantum decisions (QD), with decentralized deeper (quantum) signals in the decision-making, is to provide the simplest generalization of the underlying individualized multiple decisions towards a single decision, so as to account for the complex dynamics of the many non-local hidden variables that may be involved in the cognitive trade-offs and decision-making processes of the brain (Yukalov and Sornette 2009) [24]. In such decision-making, involving signals with unknown states of nature, actions of trade-offs, responses to actions, and rewards in the subconscious play the role of hidden variables. However, the learning model represented the neural network as a quantum-like object for which several mechanisms have been suggested (Lockwood 1989) [25]. It also implies a genuine quantum nature of some elements in the psychological processes of decision-making (Cheon and Takahashi 2010) [26]. However, to rigorously test the existence or the absence of genuinely quantum effects, the study, in Fig. 5, considered a decision-making experiment with incomplete information, analogous to the Harsanyi-type quantum game (Cheon and Iqbal 2008) [27] in which Bell inequalities (Bell 1964) [28] were tested. In the case of uncertainty, when the outcomes are unknown, partners may lack a clear reason for choosing an option and consequently they could abstain and make an irrational choice. When partners confronting uncertainty, which paralyzes them against acting, are presented with a detailed explanation of the possible outcomes, they change their minds and decide to act, thus reducing the disjunction effect (Tversky and Shafir 1992) [29]. Thus, by providing the partner with additional explanations, it is possible to influence their state or conditions. This gives the partner a drive to either alter their existing cognitions, or to reframe their interpretation of a situation, through a reorientation of their local frame. Rather than positing a collapse of the partner's state to whichever axis represents their decision, this model updates their decision by shifting it towards the axis representing the decision.
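The spectral statements above are easy to check numerically. The sketch below is a minimal illustration, assuming only numpy: it draws a real symmetric random matrix with element variance $1/N$, compares the empirical eigenvalue density against the semi-circle law, and evaluates the resolvent trace $\sum_k 1/(\epsilon - \epsilon_k)$ directly from the eigenvalues (the replica method reproduces the same quantity without diagonalization).

```python
import numpy as np

N = 2000
rng = np.random.default_rng(0)

# Real symmetric matrix with elements of zero mean and variance J^2 = 1/N.
A = rng.normal(0.0, 1.0, size=(N, N))
J = (A + A.T) / np.sqrt(2.0 * N)   # symmetrised; element variance is 1/N

eigvals = np.linalg.eigvalsh(J)

# Empirical spectral density versus the semi-circle law rho(x) = sqrt(4 - x^2)/(2*pi),
# which holds on the support [-2, 2] for this choice of variance.
hist, edges = np.histogram(eigvals, bins=50, density=True)
centres = (edges[:-1] + edges[1:]) / 2.0
semicircle = np.sqrt(np.clip(4.0 - centres**2, 0.0, None)) / (2.0 * np.pi)
print("max deviation from semi-circle law:", np.abs(hist - semicircle).max())

# Trace of the resolvent R(eps) = (eps*I - J)^(-1), evaluated via sum_k 1/(eps - eps_k),
# at a point eps outside the support of the spectrum.
eps = 3.0
print("Tr R(eps) =", np.sum(1.0 / (eps - eigvals)))
```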
Fig. 5. Results show big fluctuations near the critical point as the behavior of (Partner) M vs (Trader) E is very different.
5 Multi-layered, Multi-dimension Optimization

This learning model is optimized using a parallelizable Branch-and-Fix Combinatorial (BFC) algorithm. For the overall process flow, we want to minimize the overall time spent at each stage of learning, which is a function of trading rate, buy-sell rate and number of trades assigned to derive an outcome, as expressed below:

$$\min \; f(t, a_t^i, l_t^i, X_t^i) + g(t+c, a_{t+c}^{i'}, l_{t+c}^{i'}, Y_{t+c}^{i'}) + h(t+k, a_{t+k}^{i''}, l_{t+k}^{i''}, Z_{t+k}^{i''})$$

where $t_i$ is the set of time periods {1, 2, …, T} at stage i; $D_t^i$ the total partners' buy-sell at stage i at time t; $D_{t+c}^{i'}$ the total partners' buy-sell at stage $i'$ at time t + c; $D_{t+k}^{i''}$ the total partners' buy-sell at stage $i''$ at time t + k; $a_t^i$ the buy-sell rate at stage i at time t; $a_{t+c}^{i'}$ the buy-sell rate at stage $i'$ at time t + c; $a_{t+k}^{i''}$ the buy-sell rate at stage $i''$ at time t + k; $l_t^i$ the trade rate at stage i at time t; $l_{t+c}^{i'}$ the trade rate at stage $i'$ at time t + c; $l_{t+k}^{i''}$ the trade rate at stage $i''$ at time t + k; $C_t^i$ the delay cost at stage i at time t; $C_{t+c}^{i'}$ the delay cost at stage $i'$ at time t + c; $C_{t+k}^{i''}$ the delay cost at stage $i''$ at time t + k; $R_t$ the available trades at time t; $R'_{t+c}$ the available trades at time t + c; and $R''_{t+k}$ the available trades at time t + k. The model includes

$$X_t^i = \begin{cases} 1 & \text{if trades are assigned at stage } i \text{ at time } t \\ 0 & \text{otherwise} \end{cases}$$

$$Y_{t+c}^{i'} = \begin{cases} 1 & \text{if trades are assigned at stage } i' \text{ at time } t+c \\ 0 & \text{otherwise} \end{cases}$$

$$Z_{t+k}^{i''} = \begin{cases} 1 & \text{if trades are assigned at stage } i'' \text{ at time } t+k \\ 0 & \text{otherwise} \end{cases}$$
In order to obtain a set of Pareto optimal solutions efficiently, the learning model uses an improved algorithm with multiple-choice constraints to arrive at a new solution method. To introduce the uncertainty in the parameters, the model uses a scenario analysis approach. A scenario consists of a realization of all random variables in all stages, that is, a path through the scenario tree. The scenario tree, in real life, is very frequently asymmetric. Thus, the learning model for the trader's interaction with a partner under the first-encounter scenario is defined as:

$$\min \sum_{t=1}^{T} C_t^i X_t^i$$

subject to $\sum_{t=1}^{T} a_t^i \le \sum_{t=1}^{T} l_t^i X_t^i$; $\sum_{i=1}^{I} X_t^i \le R_t$; $\sum_{i=1}^{I}\sum_{t=1}^{T} l_t^i X_t^i \le \sum_{i=1}^{I}\sum_{t=1}^{T} D_t^i$.

The trader's subsequent interaction with another partner in the relational activity is the second-encounter scenario, which is conditional on the first partner in the first scenario, and is defined as:

$$\min \sum_{t=t+c}^{T'} C_{t+c}^{i'} Y_{t+c}^{i'}$$

subject to $\sum_{t=t+c}^{T'} a_{t+c}^{i'} \le \sum_{t=t+c}^{T'} l_{t+c}^{i'} Y_{t+c}^{i'}$; $\sum_{i=i'}^{I'} Y_{t+c}^{i'} \le R'_{t+c}$; $\sum_{i=i'}^{I'}\sum_{t=t+c}^{T'} l_{t+c}^{i'} Y_{t+c}^{i'} \le \sum_{i=i'}^{I'}\sum_{t=t+c}^{T'} D_{t+c}^{i'}$.

The trader's subsequent interaction with another partner in the relational activity is the third-encounter scenario, which is conditional on the first partner in the first scenario and the second partner in the second encounter, and is defined as:

$$\min \sum_{t=t+k}^{T''} C_{t+k}^{i''} Z_{t+k}^{i''}$$

subject to $\sum_{t=t+k}^{T''} a_{t+k}^{i''} \le \sum_{t=t+k}^{T''} l_{t+k}^{i''} Z_{t+k}^{i''}$; $\sum_{i=i''}^{I''} Y_{t+k}^{i''} \le R''_{t+k}$; $\sum_{i=i''}^{I''}\sum_{t=t+k}^{T''} Y_{t+k}^{i''} = 1$; $\sum_{i=i''}^{I''}\sum_{t=t+k}^{T''} l_{t+k}^{i''} Z_{t+k}^{i''} \le \sum_{i=i''}^{I''}\sum_{t=t+k}^{T''} D_{t+k}^{i''}$.

In order to reduce the dimensions of the model, the explicit representation of the non-anticipativity constraints is not desirable for all pairs of scenarios. The learning model takes astonishingly little computing time, a fraction of the time required by a plain use of the proposed algorithm, for solving a large-scale real-world problem.
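To make the first-encounter formulation concrete, the following sketch states a one-stage instance of it as a small binary programme. This is an illustrative reading under stated assumptions, not the authors' BFC implementation: PuLP is used only as a convenient solver interface, the inequality directions follow the reconstruction above, and all cost, rate, demand and availability numbers are invented placeholders.

```python
import pulp

T = 4                      # time periods t = 1..T (a single stage i, for brevity)
C = [5, 4, 6, 3]           # hypothetical delay costs C_t
l = [10, 12, 9, 11]        # hypothetical trade rates l_t
a = [6, 7, 5, 8]           # hypothetical buy-sell rates a_t
D = [9, 10, 8, 12]         # hypothetical partner buy-sell totals D_t
R = [1, 1, 1, 1]           # available trades R_t

prob = pulp.LpProblem("first_encounter", pulp.LpMinimize)
X = [pulp.LpVariable(f"X_{t}", cat="Binary") for t in range(T)]

# Objective: minimise the total delay cost of the assigned trades.
prob += pulp.lpSum(C[t] * X[t] for t in range(T))

# Buy-sell activity must be covered by the trade rate of the assigned trades.
prob += pulp.lpSum(a) <= pulp.lpSum(l[t] * X[t] for t in range(T))
# Assignments cannot exceed the trades available in each period.
for t in range(T):
    prob += X[t] <= R[t]
# Total assigned trading cannot exceed total partner demand.
prob += pulp.lpSum(l[t] * X[t] for t in range(T)) <= sum(D)

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print([int(X[t].value()) for t in range(T)], pulp.value(prob.objective))
```

The second- and third-encounter scenarios would chain further copies of the same block, linking the Y and Z variables to the realization of the preceding stage in the scenario tree.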
6 Conclusion

The training data-sets and statistical (machine) learning methods that derive the lattice option pricing model require a larger grid for sharper predictions and, therefore, require quantum-like decisions in small-world networks as a foundation to "manage the probable" and to derive the maximum likelihood probability for signals where financial network properties affect individual (quantum) states, actions and rewards that "lead the possibilities" thereof. This study presents the Ising-spin methods and the critical subroutines of statistical quantum mechanics learning algorithms for all-photonic continuous variables that achieve an exponential speedup compared to their equivalent classical counterparts. Finally, the study also maps out a real-life implementation that can be used as a blueprint for future demonstrations in order to take advantage of asymmetric information interests and their scalability. However, please note that our method is not limited to photonic demonstrations but is applicable to a variety of substrates, including spin ensemble systems, such as trapped humans and, therefore, machines in complex networks. We have also looked at the optimal control policies of such multi-layered, multi-dimensional models for one type of interaction of signals. More broadly, future researchers will ask how well other probabilistic models of complex networks with asymmetry of information can be applied using the same basic concepts and analysis.
References

1. Spence, M.: Job market signaling. Q. J. Econ. 87(3), 355–374 (1973)
2. Simon, H.A.: The structure of ill-structured problems. Artif. Intell. 4, 181–201 (1973/1984)
3. Albert, R., Barabasi, A.L.: Statistical mechanics of complex networks. Rev. Mod. Phys. 74, 47–97 (2002)
4. Watts, D., Strogatz, S.: Collective dynamics of 'small-world' networks. Nature 393, 440 (1998)
5. Scholtes, I., Tessone, C.J.: Organic design of massively distributed systems: a complex networks perspective
6. Sen, P.: Location-based cognitive and predictive communication system. US Patent 9,026,139 (2015)
7. Rall, W., Shepherd, G.M.: Theoretical reconstruction of field potentials and dendrodendritic synaptic interactions in olfactory bulb. J. Neurophysiol. 31, 884–915 (1968)
8. Chen, M.-F.: From Markov Chains to Non-Equilibrium Particle Systems. World Scientific Publishing, River Edge (1992)
9. Friedman, N.: Learning belief networks in the presence of missing values and hidden variables. In: Fisher, D. (ed.) Proceedings of the Fourteenth International Conference on Machine Learning, pp. 125–133. Morgan Kaufmann, San Francisco, CA
10. Larsen, M.D., Rubin, D.B.: Iterative automated record linkage using mixture models. J. Am. Stat. Assoc. 79, 32–41 (2001)
11. Erdős, P., Renyi, A.: On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci. 5 (1960)
12. Bollobás, B.: Random Graphs. Academic, London (1985)
13. Newman, M.E.J.: The structure and function of complex networks. SIAM Rev. 45, 167–256 (2003)
14. Faloutsos, M., Faloutsos, P., Faloutsos, C.: On power-law relationships of the Internet topology. Comput. Commun. Rev. 29, 251–262 (1999)
15. Berlingerio, M., Coscia, M., Giannotti, F., Monreale, A., Pedreschi, D.: Foundations of multidimensional network analysis. In: International Conference on Advances in Social Networks Analysis and Mining, pp. 485–489 (2011)
16. Del Moral, P., Jacod, J., Protter, P.: The Monte-Carlo method for filtering with discrete time observations. Probab. Theory Relat. Fields 120, 346–368 (2001)
17. Kalwani, M.U.: Maximum likelihood estimation of the multinomial-Dirichlet distribution. Paper No. 741 (September), Krannert Graduate School of Management, Purdue University (1980)
18. Gefen, Y., Mandelbrot, B.B., Aharony, A.: Critical phenomena on fractal lattices. Phys. Rev. Lett. APS (1980)
19. Bianconi, G., Barabási, A.L.: Bose-Einstein condensation in complex networks. Phys. Rev. Lett. 86, 5632 (2001)
20. Berg, J., Lassig, M.: Correlated random networks. Phys. Rev. Lett. 89 (2002)
21. Cohen, R., Erez, K., Ben-Avraham, D., Havlin, S.: Breakdown of the Internet under intentional attack. Phys. Rev. Lett. 86, 3682 (2001)
22. Shokrieh, F.: The monodromy pairing and discrete logarithm on the Jacobian of finite graphs. J. Math. Cryptol. 4 (2010)
23. Dotsenko, V.: Introduction to the Replica Theory of Disordered Statistical Systems. Cambridge University Press, Cambridge (2005)
24. Yukalov, V.I., Sornette, D.: Physics of risk and uncertainty in quantum decision making. Eur. Phys. J. B 71, 533–548 (2009)
25. Lockwood, M.: Mind, Brain and the Quantum. Basil Blackwell, Oxford (1989)
26. Cheon, T., Takahashi, T.: Interference and inequality in quantum decision theory. Phys. Lett. A 375, 100–104 (2010)
27. Cheon, T., Iqbal, A.: Bayesian Nash equilibria and Bell inequalities. J. Phys. Soc. Jpn. 77, 024801 (2008)
28. Bell, J.S.: On the Einstein-Podolsky-Rosen paradox. Physics 1, 195 (1964)
29. Tversky, A., Shafir, E.: The disjunction effect in choice under uncertainty. Psychol. Sci. 3, 305–309 (1992)
Deep Time Series Neural Networks and Fluorescence Data Stream Noise Detection

James Obert and Matthew Ferguson
Biophysics Department, Boise State University, Boise, ID, USA
{jamesobert,mattferguson}@boisestate.edu
Abstract. The advent of fluorescence microscopy and the discovery of green fluorescent protein has resulted in an explosion of fluorescence techniques and the promise of characterizing complex biochemical reactions in the living cell. Fluorescence Correlation Spectroscopy (FCS), which relies on correlation function analysis (CFA), is a powerful tool used for single molecule and single cell measurements that provide new insights into complex biomolecular interactions. FCS time series analysis coupled with the use of deep time-series neural networks provides better insight into a data set than correlation functions alone. Typical correlation functions average over many molecular events to give an average characterization of concentration, mobility, binding or enzymatic processivity. The benefit of coupling CFA with deep learning is the added ability to perform quantitative measurements in the presence of noise. In this paper, a unique systematic approach for identifying noise in fluorescence data is explained. How classical CFA can be enhanced using a deep time-series neural network is illustrated. How wavelet transforms facilitate signal noise isolation and root cause analysis in both stationary and non-stationary fluorescence signals is also explained. While CFA has been used to study neural networks, this article uniquely combines classical CFA with a time-series neural network for the detection of noise in fluorescence data streams.

Keywords: Time-series neural networks · Fluorescence correlation spectroscopy · Correlation function analysis
1 Introduction and Related Work

Temporal correlation function analysis dates back originally to the Hanbury Brown and Twiss experiment [1] and to van Hove in 1962 [2] and the development of Quasielastic Neutron Scattering (QNS). More commonly today it is used in Dynamic Light Scattering (DLS) [3] or Fluorescence Correlation Spectroscopy (FCS) [4, 5]. The advent of fluorescence microscopy and the discovery of green fluorescent protein have resulted in an explosion of FCS related techniques and the promise of characterizing complex biochemical reactions in the living cell [6–11]. Fluorescence Correlation Spectroscopy is a technique for measuring the amplitude and duration of fluctuations which arise from the Brownian motion of fluorescent molecules or the rate of chemical reactions between fluorescent species inside a solution or in a living cell [5]. Historically, digital correlators have been used to collect
data and correlation function analysis (CFA) has been used to interpret FCS data [12, 13]. Over time, experiments have increased in complexity in terms of instrumentation [14, 15], acquisition [7, 8] and the processes under consideration [9–11]. With new faster clock speeds on computers and digital-to-analog converters (DAC), digital correlators are no longer needed. In fact, use of a 1 MHz DAC now allows experimenters to save time series data which includes photon arrival times and thus more information on the underlying molecular processes in single living cell [9] and single molecule [16] experiments. In this article, it is shown that FCS time series analysis coupled with the use of deep time-series neural networks provides substantial analysis benefits over correlation function analysis (CFA) alone [17]. Correlation functions average over many molecular events to give an average characterization of concentration [8], mobility [7], binding [18] or enzymatic processivity [9]. Coupling CFA with deep learning allows for quantitatively more accurate measurements in the presence of noise. Fluorescence signal noise generally arises from a lack of detector measurement efficiency (which is termed shot noise) or high background intensities. Other fluorescence signal noise sources include low-frequency photobleaching or high-frequency vibrations. Such noise degrades the correlation signal and can cause artifacts in the resulting correlation function. While shot noise and statistical error have been frequently discussed in the literature [17–19], deep neural net detection of such noise has not been discussed, since historically this was not possible, either due to the frequent use of a digital correlator or the absence of suitable computational tools. Convolutional Neural Networks have been used to predict correlations in fluorescence imaging of neural activity [20], and non-negative matrix factorization has been used to analyze fluorescence correlation spectroscopy data containing several components [21], but this study represents the first application of deep time-series neural networks directly to fluorescence time series rather than correlation functions and constitutes a powerful advance over traditional methods. In this paper, it is shown that this uniquely devised system provides superior fluorescence signal processing capabilities over classical CFA by capturing and analyzing noise amplitude, frequency and time domain information. In Sect. 2 the process of calculating cross-correlation for two-color Fluorescence Cross Correlation Spectroscopy (FCCS) is explained, and a detailed systematic approach for detecting fluorescence signal noise using a deep time-series neural network is described. In Sect. 3, results of a two-color FCCS experiment are explained, and the use of a time-series neural network and wavelet transform analysis in noise detection are illustrated. Finally, in Sect. 4 conclusions are summarized and recommendations are briefly discussed.
2 Correlation and Noise Detection

In order to detect noise in a two-color FCS experiment, the system described in Fig. 1 was devised. Figure 1 depicts a cross-correlation noise detection system which enhances traditional CFA. Two time-series fluorescence signals (Ri and Gi) are introduced to the Cross-Correlator, which outputs the cross-correlation time-series Cd.
Fig. 1. The deep time series network predicts future values of Cd and detects noise present in leading signal Ri. The spectrum analyzer characterizes signal Ri noise by comparing the phase differences and signal coherence between the calculated cross-correlation Cd and the predicted cross-correlation Fd.
Let us assume that Ri leads Gi in the time domain, and that the Deep Time Series Network is trained using Cd and Gi. The trained Deep Time Series Network is used to predict future values of Cd and to identify patterns in Cd that indicate the presence of noise in Ri. The Spectrum Analyzer calculates signal coherence (Hk) and phase (Pj) between the signals Fd and Cd. Analysis of the spectral phase differences within Pj, coupled with the cross-signal coherence Hk, is conducted in order to study the effects that Ri noise has on Cd's spectral properties. When processing non-stationary signals where frequency content varies over time, time-frequency domain cross-signal coherence and phase shift analysis is used to analyze spectral differences between signals. Detailed descriptions of each of the components shown in Fig. 1 are included in the following sections.
2.1 Signal Cross-Correlation
Traditional CFA is primarily based on (1), where the cross-correlation between signals Ri and Gi is calculated. Consider signals Ri and Gi represented as two series, where i = 1, 2, 3…N and N equals the total number of samples in the series. The cross-correlation Cd at delay d is defined as [22]:

$$C_d = \frac{\sum_{i=1}^{N} (R_i - \mathrm{mean}(R))\,(G_{(i-d)} - \mathrm{mean}(G))}{\sqrt{\sum_{i=1}^{N} (R_i - \mathrm{mean}(R))^2}\,\sqrt{\sum_{i=1}^{N} (G_{(i-d)} - \mathrm{mean}(G))^2}} \quad (1)$$
Equation (1) is computed for all delays d = 1, 2, 3…N. When (i − d < 0 or i − d ≥ N), the series Ri and Gi are assumed to be zero for i < 0 and i ≥ N. In this analysis, values for Cd that are within the bounds of (i − d > 0 and i − d < N) are computed, and cross-correlations at short delays are tested. The denominator in (1) normalizes the correlation coefficients such that −1 ≤ Cd ≤ 1, with these bounds indicating maximum correlation (either negative or positive) and 0 indicating no correlation between Ri and Gi.
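A direct implementation of (1) takes only a few lines. The sketch below, assuming numpy, computes Cd for non-negative delays and retains only index pairs inside the bounds just discussed; the shift of 3 samples in the toy signals is an arbitrary choice for illustration.

```python
import numpy as np

def cross_correlation(R, G, max_delay):
    """Normalised cross-correlation C_d of Eq. (1) for d = 0..max_delay."""
    r = np.asarray(R, dtype=float) - np.mean(R)
    g = np.asarray(G, dtype=float) - np.mean(G)
    denom = np.sqrt(np.sum(r**2) * np.sum(g**2))
    C = np.empty(max_delay + 1)
    for d in range(max_delay + 1):
        # only index pairs with 0 <= i - d < N contribute to the sum
        C[d] = np.sum(r[d:] * g[:len(g) - d]) / denom
    return C

rng = np.random.default_rng(1)
R = rng.normal(size=1000)
G = np.roll(R, -3)                               # shifted copy, so C_d peaks at d = 3
print(np.argmax(cross_correlation(R, G, 10)))    # prints 3
```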
2.2 Deep Time-Series Network
In order to predict values for Fd given time-series Ri and Gi, a dynamic neural network with feedback elements and input delays is required. A dynamic network with enough memory (a deep network) is required to learn the large number of sequential patterns present in signals Gi and Cd. Furthermore, it is desirable to recognize noise present within Cd and forward the predicted cross-correlation values present in Fd to the spectrum analyzer for further analysis. The network output Fd is inherently dependent on current or previous network states and inputs, and it has been shown that a deep nonlinear autoregressive network with exogenous inputs (NARX) is capable of learning complex sequential patterns [23]. Figure 2 shows the deep NARX series-parallel training network, while Fig. 3 shows the parallel prediction network with feedback connections enclosing hidden layers.
Fig. 2. Series-parallel training network.
For analysis sake in Figs. 2, 3 and 4 and in (1), let G(t) = Gi, C(t) = Cd and F(t) = Fd. The NARX model is based on the linear ARX model, a prediction error method based on least squares [24]. The least squares method is the most efficient polynomial estimation method because it solves linear regression equations in analytic form. The NARX model in terms of the detection system is as follows:

$$F(t) = f(F(t-1), F(t-2), \ldots, F(t-n), G(t-1), G(t-2), \ldots, G(t-n)) \quad (2)$$
where t is time, n = 1, 2, 3, …, N and f is the function being approximated. As shown in Figs. 2, 3 and 4, each successive value of the output signal F(t) is regressed on previous values of the output signal and previous values of the independent exogenous input signal G(t). Figure 4 is a more detailed view of a single hidden layer parallel prediction network illustrated in Fig. 3. Generally speaking, increasing the number of hidden layers increases the learning and prediction capacity of the network.
Fig. 3. Parallel prediction network.
Fig. 4. Single hidden layer series network with inputs G(t), the "green signal Gi", and F(t), the "predicted cross-correlation Fd".
In Sect. 3, a discussion of how the number of hidden layers was determined for the highest effectiveness in this application is provided.
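The series-parallel versus parallel distinction can be illustrated compactly. The sketch below is a simplified stand-in for the deep NARX described here, assuming scikit-learn (whose MLPRegressor uses a different training algorithm than the Levenberg-Marquardt scheme used in this study): training is open-loop on tapped delays of the measured C(t) and G(t) (series-parallel, Fig. 2), while prediction feeds the network's own outputs back through the delay line (parallel, Fig. 3). The synthetic G(t) and C(t) signals are placeholders.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

n = 2  # tapped-delay order

def regressors(F_hist, G, t):
    # inputs of Eq. (2): F(t-1..t-n) and G(t-1..t-n)
    return np.concatenate([F_hist[t - n:t][::-1], G[t - n:t][::-1]])

rng = np.random.default_rng(0)
G = np.sin(np.linspace(0.0, 40.0, 600)) + 0.05 * rng.normal(size=600)
C = np.convolve(G, np.ones(5) / 5.0, mode="same")   # stand-in for the measured C(t)

# Series-parallel (open-loop) training: true past values of C are the feedback inputs.
X = np.array([regressors(C, G, t) for t in range(n, len(G))])
y = C[n:]
net = MLPRegressor(hidden_layer_sizes=(20, 20), max_iter=2000,
                   random_state=0).fit(X, y)

# Parallel (closed-loop) prediction: the network's own outputs F(t) are fed back.
F = np.zeros_like(C)
F[:n] = C[:n]
for t in range(n, len(G)):
    F[t] = net.predict(regressors(F, G, t).reshape(1, -1))[0]

print("closed-loop MSE:", np.mean((F - C) ** 2))
```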
2.3 Choice of NARX Model
With regard to the NARX non-linear function f(.) in (2), various models are available. A commonly used algorithm is the Forward Regression Orthogonal Estimator (FROE) [31]. FROE is based on a prediction error minimization (PEM) algorithm and uses Orthogonal Least Squares (OLS) to estimate the parameters. FROE decouples the regressors via orthogonalization, and does this by iterating and keeping the regressors that best enhance the prediction performance. The Fast Recursive Algorithm (FRA) [32] is a PEM-based approach that uses OLS recursively and more efficiently, without matrix decomposition and transformation. There can be issues with using PEM when selecting a NARX model, and these issues are outlined in [33, 34]. Simulation Error Minimization (SEM) with
pruning (SEMP) improves model selection [34] using iteration, pruning and by rating candidate models based on simulation performance. Computational complexity can be greatly reduced in SEMP using cluster selection methods [35]. The greatest success was observed when using SEMP for model selection, and an expanded explanation of this analysis is provided in Sect. 3.
2.4 Spectrum Analyzer
The output signal Fd is not an exact copy of Cd; rather, because of the non-linear auto-regressive nature of the NARX, pattern differences between the lagging Gi signal and the jointly derived (Cd & Gi) cross-correlation signal Cd are amplified. In order to characterize pattern differences, the spectrum analyzer compares Cd and Fd signal phase and spectral cross-coherence between the two signals. To address potential non-stationary characteristics of Cd and Fd, it is important to have a measure of coherence in the time-frequency plane. The time-frequency representation is constructed using continuous wavelet transforms (CWT) and wavelet transform coherence (WTC) measurements, which are discussed in Sects. 2.5 and 2.6 below.
2.5 Wavelet Transform
A wavelet is an oscillation that fluctuates between zero and some defined amplitude, and wavelets are constructed for the purpose of extracting information from signals. Specific mother wavelets with inherent signal extraction properties are convolved with signals of interest using the continuous wavelet transforms (CWT) shown in (3) and (4) below. For analysis sake in (3)–(7), let C(t) = Cd and F(t) = Fd.

$$W_C(s, \tau) = \frac{1}{|s^{1/2}|} \int C(t)\, \psi^{*}\!\left(\frac{t-\tau}{s}\right) dt \quad (3)$$

$$W_F(s, \tau) = \frac{1}{|s^{1/2}|} \int F(t)\, \psi^{*}\!\left(\frac{t-\tau}{s}\right) dt \quad (4)$$
where s is the scale (s > 0), $s \in \mathbb{R}^{+}$, and $\tau$ is the translational value, $\tau \in \mathbb{R}$. C(t) is the computed cross-correlation, F(t) equals the predicted cross-correlation, $\psi^{*}$ is the complex conjugate of the mother wavelet, and $W_C(s, \tau)$, $W_F(s, \tau)$ are wavelet coefficients. The mother wavelet used for the analysis is the Morse wavelet. Morse wavelets are a family of analytic wavelets that are complex-valued and are particularly useful for analyzing signals with modulated, time-varying amplitude and frequency [25–28]. Equations 3 and 4 divide the non-stationary signals C(t) and F(t) into wavelets and transform them into a time-frequency form with high-resolution time-frequency localization. To compare Cd and Fd signal amplitude, time and frequency characteristics, (3) and (4) are used. In Sect. 3.4 and Fig. 6 it is shown how well noise is detected in Fd through such analysis.
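As a concrete illustration of (3) and (4), the sketch below computes a CWT with PyWavelets. A complex Morlet wavelet is used as a stand-in, since the generalized Morse family of the study is not available in PyWavelets, and the sampling rate, scales and injected disturbance are placeholder choices.

```python
import numpy as np
import pywt

fs = 100.0                                   # placeholder sampling rate (Hz)
t = np.arange(0.0, 10.0, 1.0 / fs)
C = np.sin(2.0 * np.pi * 3.0 * t)            # stand-in for the cross-correlation C(t)
C[450:550] += 0.5 * np.random.default_rng(0).normal(size=100)  # localized noise burst

scales = np.arange(1, 64)
coeffs, freqs = pywt.cwt(C, scales, "cmor1.5-1.0", sampling_period=1.0 / fs)

# High-frequency (small-scale) power: the broadband burst stands out there,
# while the 3 Hz carrier dominates only the larger scales.
hf_power = np.abs(coeffs[:8]) ** 2
print("strongest high-frequency response near sample:",
      np.argmax(hf_power.sum(axis=0)))
```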
2.6 Wavelet Transform Coherence
Wavelet coherence (Hk) detects common time-localized oscillations in non-stationary signals, and thus wavelet coherence is used to measure the influence and differences between Cd and Fd. In the time-frequency domain, signals are correlated by measuring signal coherence versus phase. The appearance of uncorrelated frequency components between signals within specific timeframes, and shifts in phase (Pj) and coherence of cross-signal frequency components, often signify the presence of noise. Equations 5, 6 and 7 below are used to calculate the wavelet transform coherence and phase between signals Cd and Fd.

$$W_{CF}(s, \tau) = W_C(s, \tau)\, W_F^{*}(s, \tau) \quad (5)$$

$$WLCC(s, \tau) = \frac{\mathrm{real}[W_{CF}(s, \tau)]}{|W_C(s, \tau)|\,|W_F(s, \tau)|} \quad (6)$$

$$CWCF(s, \tau) = \frac{2\,|W_{CF}(s, \tau)|^2}{|W_C(s, \tau)|^4 + |W_F(s, \tau)|^4} \quad (7)$$
The wavelet local correlation coefficient (WLCC) is a measure of the phase correlation between Cd and Fd in the time-scale domain, and the cross-wavelet coherence function (CWCF) measures the cross-signal coherence [29]. Equations 6 and 7 are used to calculate the signal phase and cross-wavelet coherence differences between Cd and Fd. In Sect. 3.4 and Fig. 7, it is shown that phase and cross-wavelet coherence differences between Cd and Fd can be used to fully characterize the noise in Fd.
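Given complex wavelet coefficient arrays for Cd and Fd (for example from a CWT like the sketch above), (5)–(7) reduce to a few array operations. The sketch below assumes numpy and conjugates one factor in the cross term, which the rendering of (5) leaves implicit; identical inputs give WLCC = CWCF = 1 everywhere, a quick sanity check.

```python
import numpy as np

def coherence_measures(WC, WF):
    """WLCC (Eq. 6) and CWCF (Eq. 7) from complex wavelet coefficients."""
    WCF = WC * np.conj(WF)                         # cross spectrum, Eq. (5)
    wlcc = WCF.real / (np.abs(WC) * np.abs(WF))    # phase correlation, time-scale plane
    cwcf = 2.0 * np.abs(WCF) ** 2 / (np.abs(WC) ** 4 + np.abs(WF) ** 4)
    return wlcc, cwcf

rng = np.random.default_rng(0)
W = rng.normal(size=(32, 500)) + 1j * rng.normal(size=(32, 500))
wlcc, cwcf = coherence_measures(W, W)
print(wlcc.min(), cwcf.max())    # both 1.0 up to rounding
```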
3 Methods, Results and Evaluation

3.1 Data
The data analyzed in this paper is from a 3D orbital tracking experiment in the lab. 3D Orbital Tracking (3DOT) is a technique for following brightly labeled fluorescent particles in time by monitoring photon counts on a detector as a laser focus is scanned in a circular orbit around the particle of interest [14]. The experimental platform used was a 3DOT-FCCS system for the characterization of transcriptional and splicing regulation with dual-color (red Ri and green Gi) in vivo RNA labelling [10]. In this analysis, a single-cell FCS experimental database with red and green signals was used. Poisson-distributed noise was mixed with the red signal to simulate shot noise and low-frequency photobleaching. To test the noise detection capabilities of the deep time-series Artificial Neural Network (ANN), 4 separate datasets were created. Each dataset was composed of base red signal data mixed with a different shot noise frequency (λ) and split into training, validation and testing subsets.
3.2 Model Development
The calculation of the cross-correlation Cd with inputs Ri and Gi constitutes a dynamic linear system. Accordingly, as shown in Sect. 2.2 and Fig. 2, it was of interest to create a model that is capable of estimating the non-linear dynamic time-series values of Cd. As shown in Fig. 2, the first step was to develop a dynamic feedforward network with tapped delay lines and inputs Gi and Cd. The ANN was trained using the series-parallel feedforward network shown in Fig. 2 with the lagging green signal Gi and the calculated cross-correlation Cd as inputs. The output of the series-parallel network is the predicted cross-correlation value Fd. Next, the series-parallel network's ability to detect noise in the leading red signal Ri was tested. This testing was accomplished by introducing Gi and Fd to the trained network shown in Fig. 3. Section 3.3 and Fig. 5(c) and (d) show an example of Cd versus Fd. To establish a relationship between the number of network hidden layers and the network's noise detection capacity, the aforementioned training and testing process was repeated while varying the number of hidden layers (input delays = 2, # training epochs = 1000, individual hidden layer sizes = 20 neurons). In deriving the most effective model, both gradient and Jacobian-based back propagation training algorithms were utilized in the experiments, and error and convergence times were closely observed. In selecting the non-linear function f(.) in (2), Simulation Error Minimization with Pruning was utilized.
3.3 Noise Detection
To establish a relationship between noise detection and a deep ANN architecture, the number of hidden layers (h) was varied and each of the datasets was introduced to each unique ANN. Using independent component analysis, the predicted cross-correlation signal (Fd) average inverse Peak Signal-to-Noise Ratio (PSNR) was then measured for each ANN architecture-noise frequency combination. The expectation was that an ANN which yielded a larger inverse PSNR is a better noise detector. Figure 5 below illustrates the effects of input signal noise on cross-correlation where red Ri leads green Gi. Figure 5(a) shows the input leading red signal (Ri) and 5(b) shows the same input signal with the noise removed. Figure 5(c) illustrates the effect that red input signal noise has upon the NARX-predicted cross-correlation (Fd) versus the calculated cross-correlation (Cd), while 5(d) compares Cd with Fd when red signal noise is low. As shown, the mean square error (MSE) between the calculated Cd and the NARX-predicted Fd is approximately 14.8% in the noisy red signal case 5(c) and 6.4% when red signal noise is low 5(d). It should be noted, however, that the cross-correlation difference between calculated Cd in the noisy red signal instance 5(c) versus Cd in the low red signal noise case 5(d) is only 1.5% average MSE. The NARX's ability to detect the presence of simulated shot noise, created by adding Poisson-distributed noise to the leading red signal, was tested. Table 1 below shows the inverse of the Peak Signal-to-Noise Ratio average in dB versus the number of hidden layers (h) with λ = 2, 3, 4, and 5. A high MSE between Cd and Fd indicates that the NN has detected a high level of noise in Ri, and 1/PSNR is directly proportional to the MSE. If the MSE increases, a proportional increase in 1/PSNR is expected.
Fig. 5. (a) Noisy red signal, (b) Low-noise red signal, (c) Low-noise red Cd and Fd, (d) Noisy red Cd and Fd (blue: Cd, black: Fd).
Fig. 6. (a) Low-noise red signal Cd CWT, (b) Low-noise red signal Fd CWT, (c) Noisy red signal Cd CWT, (d) Noisy red signal Fd CWT.
Table 1. Average inverse PSNR versus number of hidden layers

1/PSNR | h = 15 | h = 20 | h = 25 | h = 30 | h = 35
λ = 2  | 0.0493 | 0.0518 | 0.0537 | 0.0574 | 0.0624
λ = 3  | 0.0494 | 0.0519 | 0.0538 | 0.0575 | 0.0625
λ = 4  | 0.0495 | 0.0520 | 0.0538 | 0.0576 | 0.0626
λ = 5  | 0.0497 | 0.0522 | 0.0541 | 0.0579 | 0.0629
Over the span of the experiments, on average the observed MSE was found to be approximately 1.7% when the signal-to-noise ratio satisfied 1/PSNR < 0.051, and approximately 10.8% when the red signal 1/PSNR > 0.057. In general, red signal noise could not be easily detected by direct signal analysis of the calculated Cd values, but it was found to be detectable in the NARX-predicted values of Fd. Analysis of the NARX-predicted values of Fd showed that red signal noise was most effectively detected, on average, when the number of hidden NARX layers was 25. The NARX was trained using both gradient and Jacobian-based back propagation algorithms. It was subsequently found that the best noise detection performance was achieved using the Jacobian-based Levenberg-Marquardt backpropagation algorithm [30] when configured with a minimum average performance gradient of 1e−7, and average mu increase and decrease factors of 10 and 0.1, respectively.
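For reference, one common way to compute the inverse PSNR reported in Table 1 is sketched below. The exact peak convention of the study is not spelled out, so this assumes the standard peak-signal definition, under which 1/PSNR rises monotonically with the MSE, consistent with the proportionality noted above.

```python
import numpy as np

def inverse_psnr(C, F):
    """1/PSNR (PSNR in dB) between computed C_d and predicted F_d."""
    C = np.asarray(C, dtype=float)
    F = np.asarray(F, dtype=float)
    mse = np.mean((C - F) ** 2)
    peak = np.max(np.abs(C))                  # assumed peak-signal convention
    psnr_db = 10.0 * np.log10(peak**2 / mse)
    return 1.0 / psnr_db

C = np.sin(np.linspace(0.0, 6.0, 200))
F = C + 0.02 * np.random.default_rng(0).normal(size=200)
print(round(inverse_psnr(C, F), 4))
```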
3.4 Spectral Analysis
As mentioned in the previous section, the autoregressive non-linear nature of the NARX was found to be highly sensitive to red signal noise. In Fig. 6, 3D CWT graphs are shown for Fd and Cd. Figure 6(b) shows only slightly elevated wavelet coefficient levels at time 450. Where red signal noise is high, greatly elevated wavelet coefficient levels for the NARX-predicted Fd cross-correlation values are shown in Fig. 6(d). In order to characterize the effects of red signal noise on Cd and Fd in relation to one another, cross-wavelet coherence (7) and wavelet local correlation coefficient (6) data were captured and represented in a two-dimensional plot with frequency on the vertical axis and time on the horizontal, as shown in Fig. 7. Figure 7(a) shows wavelet coherence and phase between Cd and Fd when the red signal contains high levels of noise, and 7(b) shows the same when red signal noise is low. Comparing the signal coherence (Hk) and phase differences (Pj) between Cd and Fd, it is evident that the primary noise frequency components reside in the frequencies between 2 and 4 Hz. Additionally, these same components are present throughout the entire timeframe. Observations indicate that important information-rich, high-coherence bands between (freq: 8–32 Hz, time: 0–200) and (freq: 4–18 Hz, time: 800–1000) are present in the low-noise signals 7(b), but absent in the high-noise signals in 7(a).
Fig. 7. (a) Noisy red Cd and Fd wavelet coherence (Hk) and phase (Pj), (b) Low-noise red Cd and Fd wavelet coherence (Hk) and phase (Pj).
4 Conclusions and Limitations

As mentioned earlier, classical fluorescence signal CFA (1) does not exclude noise data points when calculating cross-correlation. In cases where signal noise is considerable, it is important to detect when such noise is undermining the correlation integrity. This research has shown that the use of deep time-series neural networks coupled with wavelet spectral analysis is able to identify extraneous noise in FCS two-color experiments. It has been shown that it is possible for a deep time-series neural network (DTSNN) to detect leading source signal noise when trained with a lagging source signal and calculated cross-correlation data. It was observed that as the number of hidden layers grows, the noise detection sensitivity of the DTSNN increases. Additionally, through the use of wavelet time-frequency analysis it was shown that the DTSNN's predicted cross-correlation signal contained leading signal noise predictors. When inspected, these predictors (wavelet coherence and phase) were shown to be guides in characterizing leading fluorescent signal noise. That is not to say there are no limitations to these devised methods. DTSNNs are based on supervised learning, and careful consideration must be given to the overhead of training. In this research the training outcomes were often nondeterministic and depended greatly on the choice of initial parameters (input delays, number of neurons, number of layers, training algorithm choice, etc.). The networks implemented in this research rely on regression, and there is always the possibility of overfitting the data, resulting in unpredictable data generalization. To avoid such situations, careful statistical preprocessing of training data must be performed to predict whether a selected network architecture is subject to overfitting or is unable to converge. Nevertheless, knowing which signal noise frequencies are degrading cross-correlations enables one to implement countermeasures. Such countermeasures include implementing digital band-pass filters that effectively remove known harmful frequencies from fluorescence input signals, physically removing noise sources, or insulating the measurement system itself. Innovative countermeasures are a topic of our future research.
References

1. Brown, R.H., Twiss, R.Q.: A test of a new type of stellar interferometer on Sirius. Nature 178, 1046–1048 (1956)
2. Van Hove, L., McVoy, K.W.: Pair distribution functions and scattering phenomena. Phys. Rev. C: Nucl. Phys. 33, 468–476 (1962)
3. Laser light scattering. Ann. Rev. Phys. Chem. 21, 145–174 (1970)
4. Elson, E.L., Magde, D.: Fluorescence correlation spectroscopy I. Conceptual basis and theory. Biopolymers 13, 1–27 (1974)
5. Magde, D., Elson, E.L., Webb, W.W.: Fluorescence correlation spectroscopy II. An experimental realization. Biopolymers 13, 29–61 (1974)
6. Schwille, P., Haupts, U., Maiti, S., Webb, W.W.: Molecular dynamics in living cells observed by fluorescence correlation spectroscopy with one- and two-photon excitation. Biophys. J. 77, 2251–2265 (1999)
7. Digman, M.A., Sengupta, P., Wiseman, P.W., Brown, C.M., Horwitz, A.R., Gratton, E.: Fluctuation correlation spectroscopy with a laser-scanning microscope: exploiting the hidden time structure. Biophys. J. 88, L33–L36 (2005)
8. Digman, M.A., Dalal, R., Horwitz, A.F., Gratton, E.: Mapping the number of molecules and brightness in the laser scanning microscope. Biophys. J. 94, 2320–2332 (2008)
9. Larson, D.R., Zenklusen, D., Wu, B., Chao, J.A., Singer, R.H.: Real-time observation of transcription initiation and elongation on an endogenous yeast gene. Science 332, 475–478 (2011)
10. Coulon, A., Ferguson, M.L., de Turris, V., Palangat, M., Chow, C.C., Larson, D.R.: Kinetic competition during the transcription cycle results in stochastic RNA processing. Elife 3 (2014). https://doi.org/10.7554/elife.03939
11. Morisaki, T., Lyon, K., DeLuca, K.F., DeLuca, J.G., English, B.P., Zhang, Z., et al.: Real-time quantification of single RNA translation dynamics in living cells. Science 352, 1425–1429 (2016)
12. Berne, B.J., Pecora, R.: Dynamic Light Scattering: With Applications to Chemistry, Biology, and Physics. Courier Corporation (2000)
13. Coulon, A., Larson, D.R.: Fluctuation analysis: dissecting transcriptional kinetics with signal theory. Methods Enzymol. 572, 159–191 (2016)
14. Levi, V., Ruan, Q., Gratton, E.: 3-D particle tracking in a two-photon microscope: application to the study of molecular dynamics in cells. Biophys. J. 88, 2919–2928 (2005)
15. Levi, V., Ruan, Q., Plutz, M., Belmont, A.S., Gratton, E.: Chromatin dynamics in interphase cells revealed by tracking in a two-photon excitation microscope. Biophys. J. 89, 4275–4285 (2005)
16. Yu, J., Xiao, J., Ren, X., Lao, K., Xie, X.S.: Probing gene expression in live cells, one protein molecule at a time. Science 311, 1600–1603 (2006)
17. Wohland, T., Rigler, R., Vogel, H.: The standard deviation in fluorescence correlation spectroscopy. Biophys. J. 80, 2987–2999 (2001)
18. Digman, M.A., Gratton, E.: Analysis of diffusion and binding in cells using the RICS approach. Microsc. Res. Tech. 72, 323–332 (2009)
19. Nickolls, J., Buck, I., Garland, M., Skadron, K.: Scalable parallel programming with CUDA. Queue 6(2) (2008). http://dl.acm.org/citation.cfm?id=1365500
20. Hinton, G.E., Osindero, S., Teh, Y.-W.: A fast learning algorithm for deep belief nets. Neural Comput. 18, 1527–1554 (2006)
21. Bengio, Y., Goodfellow, I.J., Courville, A.: Deep learning. Nature 521, 436–444 (2015)
22. Tahmasebi, P., Hezarkhani, A., Sahimi, M.: Multiple-point geostatistical modeling based on the cross-correlation functions. Comput. Geosci. 16(3), 779–797 (2012)
23. Caudill, M., Butler, C.: Understanding Neural Networks: Computer Explorations, vols. 1 and 2. The MIT Press, Cambridge (1992)
24. Chang, G.: Time-frequency dynamics of resting-state brain connectivity measured with fMRI. NeuroImage 50(1), 81–98 (2010)
25. Olhede, S.C., Walden, A.T.: Generalized Morse wavelets. IEEE Trans. Signal Process. 50(11), 2661–2670 (2002)
26. Lilly, J.M., Olhede, S.C.: Higher-order properties of analytic wavelets. IEEE Trans. Signal Process. 57(1), 146–160 (2009)
27. Lilly, J.M., Olhede, S.C.: On the analytic wavelet transform. IEEE Trans. Inf. Theory 56(8), 4135–4156 (2010)
28. Lilly, J.M., Olhede, S.C.: Generalized Morse wavelets as a superfamily of analytic wavelets. IEEE Trans. Signal Process. 60(11), 6036–6041 (2012)
29. Sello, S., Bellazzini, J.: Wavelet cross-correlation analysis of turbulent mixing from large-eddy-simulations. arXiv:physics/0003029v1 (2000)
30. Hagan, M.T., Demuth, H.B., Beale, M.H.: Neural Network Design. PWS Publishing, Boston (1996). Chaps. 11 and 12
31. Billings, S.A., Chen, S., Korenberg, M.J.: Identification of MIMO non-linear systems using a forward-regression orthogonal estimator. Int. J. Control 49, 2157–2189 (1989)
32. Li, K., Peng, J.-X., Irwin, G.W.: A fast nonlinear model identification method. IEEE Trans. Autom. Control 50(8), 1211–1216 (2005)
33. Aguirre, L.A., Billings, S.A.: Dynamical effects of overparametrization in nonlinear models. Physica D 80, 26–40 (1995)
34. Piroddi, L., Spinelli, W.: An identification algorithm for polynomial NARX models based on simulation error minimization. Int. J. Control 76(17), 1767–1781 (2003)
35. Piroddi, L.: Simulation error minimization methods for NARX model identification. Int. J. Model. Identif. Control (to be published)
Controlled Under-Sampling with Majority Voting Ensemble Learning for Class Imbalance Problem

Riyaz Sikora and Sahil Raina
Department of Information Systems, University of Texas at Arlington, Arlington, TX 76019, USA
[email protected], [email protected]
Abstract. The class imbalance problem has been widely studied in data mining. In this paper, we present a new filter approach to the class imbalance problem that uses repeated under-sampling to create balanced data sets and then uses majority voting ensemble type learning to create a meta-classifier. We test our method on five imbalanced data sets and compare its performance with that of three other techniques. We show that our method significantly improves the prediction accuracy of the under-represented class while also reducing the gap in prediction accuracy between the two classes.

Keywords: Class imbalance · Under sampling · Majority voting · Anomaly detection
1 Introduction

Imbalanced data sets are a growing problem in data mining and business analytics. A data set is imbalanced when, for a two-class classification problem, the data for one class (majority) greatly outnumbers the other class (minority). In recent years, the imbalanced learning problem has drawn a significant amount of interest from academia, industry, and government funding agencies [1]. Most predictive machine learning or data mining algorithms assume balanced data sets and their ability to predict the minority class deteriorates in the presence of class imbalance. This is especially troubling when the minority class is the class of interest and it is costly to misclassify examples of the minority class. For example, in detecting fraudulent credit card transactions the fraudulent transactions may be less than 1% of the total transactions. However, it is imperative that we detect these transactions with high accuracy. In the presence of such severe imbalance most data mining algorithms would predict all instances as belonging to the majority class and be more than 99% accurate [2, 3]. Imbalanced data sets exist in many real-world domains, such as predicting fraudulent credit card transactions, diagnosing cancer, spotting malicious incoming internet traffic, text classification, detection of fraudulent telephone calls, information retrieval and filtering tasks, and so on. In these domains the goal is to predict the minority class with high precision. Although most of the studies on class imbalance only look at a two-class problem, imbalance between classes does exist in multi-class problems too
[4, 5]. There are many approaches that have been studied in the literature to tackle the imbalance problem. Most of these approaches focus either on manipulating the composition of the data by using sampling or modifying the metrics used by the data mining algorithms. Most of these approaches have been met with limited success. In this working paper we propose combining the use of under-sampling with ensemble learning to improve the performance of a standard data mining algorithm. We test our algorithm on 5 data sets collected from the UCI repository [6] with an imbalance ratio of at least 70%. We compare its performance with two other traditional techniques for dealing with the imbalance problem and show significant improvement in the prediction accuracy for the minority class.
2 Literature Review

Various techniques have been proposed to solve the problems associated with class imbalance [7]. Traditionally, research on this topic has focused on solutions at both the data and algorithm levels. These can be broadly classified into three categories: (1) resampling methods for balancing the data set, (2) modification of existing learning algorithms, and (3) measuring classifier performance with different metrics. Resampling techniques can again be broadly classified into over-sampling and under-sampling methods. In over-sampling, the representation of minority examples is artificially boosted. In the simplest case, the minority class examples are duplicated to balance their numbers with those of the majority class [8–10]. In another widely used technique, SMOTE [2, 11], new minority instances are synthetically created by interpolating between several minority instances that lie close together, as sketched below. In under-sampling [10], only a small subset of the majority class instances is sampled so as to create a balanced sample with the minority class. Despite its simplicity, it has empirically been shown to be an effective resampling method. However, the major problem of under-sampling is that it can discard data that is potentially important for the classification process. Several techniques at the algorithmic level have been proposed for solving the class imbalance problem. The goal of this approach is to optimize the performance of the learning algorithm on unseen data. One-Class learning methods [12], as the name implies, learn concepts from instances of only one class. In this case the minority class instances are used as the main target class, with either no instances of the majority class or just a few instances of the majority class treated as outliers. Another technique is to use cost-sensitive learning algorithms [13, 14]. The goal of cost-sensitive classification is to minimize the cost of misclassification, which can be realized by choosing the class with the minimum conditional risk. In the next section we describe our method that combines under-sampling and ensemble learning.
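The SMOTE interpolation idea mentioned above fits in a few lines. The sketch below is a simplified illustration assuming numpy, not the reference implementation: it creates one synthetic point per minority instance by stepping a random fraction of the way toward one of its k nearest minority neighbours.

```python
import numpy as np

def smote_like(X_min, k=5, seed=0):
    """One synthetic sample per minority instance via neighbour interpolation."""
    rng = np.random.default_rng(seed)
    synth = []
    for x in X_min:
        dists = np.linalg.norm(X_min - x, axis=1)
        neighbours = np.argsort(dists)[1:k + 1]      # skip the point itself
        nn = X_min[rng.choice(neighbours)]
        synth.append(x + rng.random() * (nn - x))    # random point on the segment
    return np.array(synth)

X_min = np.random.default_rng(1).normal(size=(20, 3))   # toy minority instances
print(smote_like(X_min).shape)                          # (20, 3)
```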
3 Our Method

Figure 1 describes our method, where we combine majority voting ensemble learning with under-sampling. The majority class instances are randomly split into disjoint sub-samples that are similar in size to the minority class instances. Each majority class sub-sample is then combined with the minority class instances to create multiple balanced sub-sets. The number of balanced sub-sets thus created depends on the ratio of imbalance in the original data set. For example, if the imbalance ratio is 75% then three balanced sub-sets will be created, each containing about one-third of the majority class instances and all of the minority class instances. Each sample is then used by the data mining algorithm to create a classifier. The individual classifiers are then combined into a meta-classifier by using majority voting when predicting instances from the test set. The test set is created before the balanced sub-sets are created by using stratified sampling so as to make sure that it represents the original class imbalance.
Fig. 1. Using under-sampling with majority voting ensemble learning.
To test our method we collected 5 data sets from the UCI Learning Repository [6] that had an imbalance ratio of at least 70%. Table 1 gives the details about the data sets used. For data sets with more than one class we converted the problem into a binary class by combining the minority classes into one class.
Table 1. The data sets used in our experiments

Data set         | # of attributes | # of instances | Majority %
Adult            | 14              | 30,603         | 76.04%
Statlog German   | 20              | 1000           | 70%
Bank marketing   | 21              | 41,188         | 88.73%
Student alcohol  | 33              | 395            | 88.35%
Red wine quality | 12              | 1599           | 86.43%
We ran our experiments as 10-fold cross-validation by creating 10 stratified folds of the original data set. In each run we used one fold as the testing set and, for our method, used the remaining 9 folds to create the balanced training sub-sets using under-sampling as described above. Similarly, in each run we also applied SMOTE and over-sampling only on the training set consisting of the 9 folds. In all experiments we used the decision tree learning algorithm J48 from the Weka Machine Learning software. We compared our method with using the J48 algorithm on the original data set, and on balanced training sets created using SMOTE and over-sampling.
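A compact analogue of this setup is sketched below using scikit-learn, with DecisionTreeClassifier standing in for Weka's J48 (both are C4.5-style learners); fold handling is condensed and integer class labels are assumed, so this illustrates the splitting and voting logic rather than reproducing the exact experiments.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_undersampled_ensemble(X, y, minority_label, seed=0):
    rng = np.random.default_rng(seed)
    X_min, y_min = X[y == minority_label], y[y == minority_label]
    X_maj, y_maj = X[y != minority_label], y[y != minority_label]
    k = max(1, len(X_maj) // len(X_min))       # number of balanced sub-sets
    order = rng.permutation(len(X_maj))
    models = []
    for chunk in np.array_split(order, k):     # disjoint majority sub-samples
        Xb = np.vstack([X_maj[chunk], X_min])
        yb = np.concatenate([y_maj[chunk], y_min])
        models.append(DecisionTreeClassifier(random_state=0).fit(Xb, yb))
    return models

def predict_majority_vote(models, X):
    votes = np.stack([m.predict(X) for m in models])
    # per-instance majority vote across the ensemble of classifiers
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
```

The 10-fold evaluation loop of the paper would wrap these two helpers, fitting the ensemble on the 9 training folds and voting on the held-out stratified fold.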
4 Results and Discussion

Table 2 presents the results for the total accuracy across the four methods. All the results reported here are averages over the 10 runs described earlier. We also report the results of a paired t-test comparing our method with the other three methods. As can be seen, our method's total accuracy drops compared to all the other methods. Although most of the differences are statistically significant, the drop itself is not by a significant amount. For example, the maximum drop in the total accuracy is 14% compared to the original data on the Student Alcohol data set. Note that all the three methods of dealing with the imbalance problem show a drop in the total accuracy. This indicates that there is usually a trade-off in treating the class imbalance problem.

Table 2. Total accuracy of the four methods

Data set         | Original | SMOTE | Over sampling | Our method | P-original | P-SMOTE  | P-over
Adult            | 86%      | 85%   | 82%           | 82%        | 6.03E−12   | 1.42E−10 | n.s.
Statlog German   | 70%      | 65%   | 66%           | 60%        | 0.000517   | n.s.     | n.s.
Bank marketing   | 91%      | 90%   | 86%           | 86%        | 3.44E−16   | 6.8E−14  | n.s.
Student alcohol  | 86%      | 85%   | 85%           | 72%        | 4.79E−06   | 5.02E−05 | 9.11E−06
Red wine quality | 88%      | 85%   | 88%           | 78%        | 3.12E−05   | 0.001208 | 3.93E−05
To better study the trade-off, we look at the accuracy of predicting the individual classes. Since the minority class is the class of interest, we treat it as the positive class and the majority class as the negative class. Our goal is to improve the prediction accuracy of the minority class. In Table 3, we compare the prediction accuracy of the majority class, or the true negative rate, also known as Specificity, defined by TN/(TN + FP), where TN is the true negatives, FN is the false negatives, TP is the true positives, and FP is the false positives. In Table 4, we compare the prediction accuracy of the minority class, or the true positive rate, also known as Sensitivity, defined by TP/(TP + FN). These results clearly show the trade-off. Although the prediction accuracy of the majority class for our method drops, our method significantly improves the accuracy of predicting the minority class compared to all the other methods. In some cases, like the Student Alcohol data set, it more than doubles the prediction accuracy of the minority class compared to all the other methods.

Table 3. Accuracy of predicting the majority class

Data set         | Original | SMOTE | Over sampling | Our method | P-original | P-SMOTE  | P-over
Adult            | 94%      | 91%   | 84%           | 82%        | 1.68E−18   | 1.49E−16 | 0.00017
Statlog German   | 92%      | 70%   | 68%           | 52%        | 2.74E−11   | 5.81E−05 | 0.000578
Bank marketing   | 96%      | 93%   | 87%           | 85%        | 2.88E−23   | 8.22E−19 | n.s.
Student alcohol  | 94%      | 91%   | 91%           | 71%        | 1.64E−09   | 2.83E−08 | 1.76E−08
Red wine quality | 94%      | 87%   | 91%           | 77%        | 6.36E−08   | 0.000122 | 8.08E−07
Since most data mining algorithms work best on a balanced data set, the ideal performance goal of an algorithm should be to have high but similar prediction accuracies for both the classes even in the presence of class imbalance. To evaluate this relative performance between the two classes we combine the results from Tables 3 and 4 and report the gap between the prediction accuracies of the two classes in Table 5. Again, our method provides the best performance in terms of minimizing the gap in performance between the two classes. Most of the improvements shown by our method are also statistically significant.
Table 4. Accuracy of predicting the minority class

Data set         | Original | SMOTE | Over sampling | Our method | P-original | P-SMOTE  | P-over
Adult            | 63%      | 66%   | 77%           | 84%        | 3.93E−16   | 5.96E−15 | 2.56E−09
Statlog German   | 19%      | 55%   | 62%           | 77%        | 3.01E−07   | 0.005079 | n.s.
Bank marketing   | 54%      | 65%   | 74%           | 94%        | 2E−18      | 6.03E−16 | 1.26E−16
Student alcohol  | 22%      | 36%   | 37%           | 78%        | 8.49E−07   | 1.01E−05 | 1.22E−05
Red wine quality | 53%      | 74%   | 63%           | 86%        | 1.64E−06   | 0.003636 | 5.6E−06
Table 5. Gap between the prediction accuracy of both classes

Data set         | Original | SMOTE | Over sampling | Our method | P-original | P-SMOTE  | P-over
Adult            | 31%      | 26%   | 7%            | 2%         | 3.44E−19   | 2.6E−17  | 2.28E−08
Statlog German   | 73%      | 20%   | 17%           | 29%        | 5.52E−06   | n.s.     | n.s.
Bank marketing   | 42%      | 27%   | 18%           | 9%         | 7.08E−17   | 5.42E−12 | 2.21E−09
Student alcohol  | 72%      | 56%   | 54%           | 13%        | 5.5E−08    | 3.04E−07 | 1.49E−06
Red wine quality | 41%      | 13%   | 29%           | 10%        | 6.2E−06    | n.s.     | 0.000375
5 Conclusions and Future Work

In this paper we presented a new technique for overcoming the class imbalance problem in data mining. We used under-sampling to create multiple disjoint sub-sets of the majority class, which were then combined with the minority class instances to create balanced sub-sets of data. We then applied ensemble-type learning, where a data mining algorithm is applied on the individual sub-sets and the resulting classifiers are combined into a meta-classifier by using majority voting for predicting the test cases. We used the decision tree learning algorithm J48 as the data mining algorithm and tested our method on five data sets collected from the UCI Repository that had an imbalance ratio of at least 70%. We compared our method with just using the J48 algorithm on the original data and with two other methods that have been proposed for the class imbalance problem. We showed that our method was the best in terms of significantly improving the prediction accuracy of the minority class and minimizing the gap in prediction accuracy of both the classes.
This was a preliminary working paper, and we plan on testing several modifications to improve on our algorithm. In this version we created mutually exclusive sub-sets of the majority class. The drawback is that the number of sub-sets that have to be created then becomes fixed. In future we would like to try a more general random sampling approach so that different sub-sets can have common instances. We can then try varying the number of sub-sets to find the optimal number. In terms of the data mining algorithm, instead of using the same data mining algorithm on all the sub-sets of data as we have done in this paper, we will experiment with using different algorithms to see if that can further improve the results. Finally, in creating the meta-classifier we would like to test using a weighted average to combine the predictions of the individual classifiers instead of the simple voting mechanism currently used. On the experimental side we would like to test our method on 20 or more data sets to show the consistency and validity of our method. Another promising avenue for further research is modifying the metrics used by the standard data mining algorithms to make them more cost-sensitive.
Optimisation of Hadoop MapReduce Configuration Parameter Settings Using Genetic Algorithms
Ali Khaleel and H. S. Al-Raweshidy
Department of Electronic and Computer Engineering, Brunel University London, London, UK
{Ali.Khaleel,Hamed.Al-Raweshidy}@brunel.ac.uk
Abstract. In this paper, the tuning of Hadoop MapReduce parameter settings is addressed. Significant parameters of Hadoop MapReduce are tuned using a novel intelligent technique based on both Genetic Programming and a Genetic Algorithm to optimise the performance of Hadoop MapReduce jobs. The Hadoop framework has more than 150 configuration parameters, and setting them manually is difficult and time-consuming. These algorithms are therefore used to search for optimum values of the parameter settings. Experiments have been carried out on two typical Hadoop applications, Word Count and Tera Sort. The results show that our proposed technique improves MapReduce job performance for 20 GB of data on the Word Count application by 69.63% and 30.31% in comparison with the default settings and Gunther, respectively, while on the Tera Sort application the performance of Hadoop MapReduce is improved by 73.39% and 55.93% compared with the default settings and Gunther, respectively. Keywords: Hadoop MapReduce · Genetic Algorithm · Genetic Programming · Big data · Parameter settings optimisation
1 Introduction
Big data is a term that refers to large and complex data sets that cannot be processed, captured, stored, and analysed using traditional tools [1]. These huge amounts of data are generated from different sources such as social media, sensor devices, the Internet of Things, mobile banking, and commercial transactions. Furthermore, many governments and commercial organisations produce large amounts of data, including financial and banking statements and data from healthcare providers, higher education and research centres, the manufacturing sector, insurance companies, and the transportation sector. For instance, the International Data Corporation (IDC) reported that 2,800 exabytes of data were stored worldwide in 2012 and that this number will reach 40,000 exabytes within the following 10 years. Similarly, Facebook processes around 500,000 GB of data every day. This vast amount of data includes structured data such as relational databases, as well as semi-structured and unstructured data such as texts, videos, images, multimedia, and web pages. Such huge data volumes in various formats led to the emergence of the term big data [2]. As a result, these massive data sets are hard to process using traditional tools and current database systems. Hadoop MapReduce is a powerful computing technology for supporting big data applications [3]. Hadoop is an open source framework that enables the implementation of the MapReduce algorithm for data processing purposes. It is scalable, fault-tolerant, and able to process massive data sets in parallel, since large data sets can be distributed across several computing nodes of a Hadoop cluster to obtain better computational resources and power [4]. Hadoop has a complex structure that contains several parts interacting with each other across several computing devices. Moreover, Hadoop has more than 150 configuration parameters, and recent studies have shown that tuning some of them has a considerable effect on the performance of a Hadoop job [5, 6]. Because of the black-box nature of the Hadoop framework, manually tuning the configuration parameter values is a challenging and time-consuming task. To tackle this issue, Genetic Algorithms for Hadoop have been developed to achieve optimum or near-optimum performance of Hadoop MapReduce parameter settings. The rest of this paper is organised as follows. Section 2 presents related work. In Sect. 3, a set of Hadoop MapReduce parameters is introduced. Section 4 presents the implementation of Genetic Programming for building an objective function over the Hadoop MapReduce parameters. The implementation of the Genetic Algorithm for MapReduce parameter optimisation is explained in Sect. 5. Section 6 presents the performance evaluation environment of the proposed work, a Hadoop cluster in the Microsoft Azure cloud. Section 7 describes and discusses the experimental results, and the paper is concluded in Sect. 8.
2 Related Work
Many works have been proposed for the automatic tuning of Hadoop MapReduce parameter settings. One of them is PPABS [7] (Profiling and Performance Analysis-Based Self-tuning). In this framework, Hadoop MapReduce parameter settings are tuned automatically using an analyser, which groups MapReduce applications into equivalence classes by means of modified k-means++ clustering and simulated annealing, and a recogniser, which assigns unknown jobs to one of these equivalence classes. However, PPABS cannot tune the parameters of an unknown job that does not fall into any of the equivalence classes. Another approach, called Gunther, optimises Hadoop configuration parameters using a Genetic Algorithm. However, because Gunther has no objective function for the parameters, all MapReduce jobs have to be executed physically to evaluate candidate settings, and the execution time of running MapReduce jobs for this evaluation is very long [8]. The Panacea framework optimises Hadoop applications based on a combination of static and trace analysis using a compiler-guided tool; it divides the search space into subspaces and then searches for the best values within predetermined ranges [9]. A performance evaluation model of MapReduce is proposed in [10]; this framework correlates performance metrics from different layers in terms of hardware, software, and network. Industrial professionals have proposed Rule-Of-Thumb (ROT) settings, which are merely common practice for tuning Hadoop parameters [11, 12]. In [13], an online performance tuning system for MapReduce is proposed that monitors the execution of a Hadoop job and tunes the associated performance-tuning parameters based on collected statistics. The author in [14] optimises MapReduce parameters with a profiler that collects profiles online during the execution of MapReduce jobs in the cluster. In [15], a self-tuning system for big data analytics called Starfish is proposed to find the best configurations of the Hadoop framework and thereby better utilise cluster resources in terms of CPU and memory.
3 Hadoop MapReduce Parameter Settings
Hadoop is a software platform written in Java that enables distributed storage and processing of massive data sets using clusters of computing nodes. Thanks to its scalability and fault tolerance, it provides large-scale storage for any type of data (structured, semi-structured, and unstructured). Furthermore, it has more than 150 tuneable parameters that determine the behaviour of Hadoop MapReduce jobs, some of which have a remarkable influence on the performance of a Hadoop job. Table 1 presents the main parameters of Hadoop; a further description of each follows below.

Table 1. The main parameter settings of the Hadoop framework

| Parameter | Default |
|---|---|
| MapReduce.task.io.sort.mb | 100 |
| MapReduce.task.io.sort.factor | 10 |
| Mapred.compress.map.output | false |
| MapReduce.job.reduces | 1 |
| Mapreduce.map.sort.spill.percent | 0.80 |
| MapReduce.tasktracker.map.tasks.maximum | 2 |
| MapReduce.tasktracker.reduce.tasks.maximum | 2 |
| Mapred.job.shuffle.input.buffer.percent | 0.70 |
(1) MapReduce.task.io.sort.mb: This parameter determines the amount of buffer memory available while sorting files; by default, 1 MB is used per merge stream, for a total of 100 MB.
(2) MapReduce.task.io.sort.factor: This parameter determines the number of streams merged at once while sorting files. The default value is 10, as shown in Table 1.
(3) Mapred.compress.map.output: The output produced by the mappers is sent to the reducers through the shuffle phase, which generates high network traffic, especially when the mapper output is large. Compressing the mapper output reduces this network overhead during shuffling and thus accelerates hard disk I/O.
(4) MapReduce.job.reduces: This parameter specifies the number of reduce tasks used to perform a MapReduce job in the Hadoop cluster. Its default setting is 1, and it has a significant effect on Hadoop job performance.
(5) Mapreduce.map.sort.spill.percent: The default setting of this parameter is 0.80, which represents the threshold of the in-memory buffer used in the map process: the buffer contents are spilled to the hard disk once the buffer is 80% full.
(6) MapReduce.tasktracker.reduce.tasks.maximum: Each MapReduce job has several map and reduce tasks running simultaneously on each data node of the Hadoop cluster under the control of the task tracker. This parameter determines the number of concurrent reduce tasks, and its default setting is 2. Tuning it to the optimal value can have an important impact on the performance of the Hadoop cluster by better utilising cluster resources in terms of CPU and memory.
(7) MapReduce.tasktracker.map.tasks.maximum: While the number of concurrent reduce tasks is determined by parameter 6, this parameter defines the number of map tasks running simultaneously on each data node. Its default value is 2, and changing it can have a positive impact on the total time of a MapReduce job.
(8) MapReduce.reduce.shuffle.input.buffer.percent: The mapper output consumed during the shuffling process requires a specific amount of memory, taken from the maximum heap size, for storage purposes. This parameter determines that percentage, and its default value is 0.70.
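To make this concrete, the following is a minimal sketch of launching a job with such settings supplied as Hadoop generic -D options; the example jar, application name, and HDFS paths are hypothetical, and the values correspond to the 20 GB settings recommended later in Table 4.

```python
# A minimal sketch of applying tuned parameter values per job via Hadoop's
# generic "-D property=value" options (jar, app name, and paths hypothetical).
import subprocess

tuned = {
    "mapreduce.task.io.sort.mb": 165,
    "mapreduce.task.io.sort.factor": 160,
    "mapred.compress.map.output": "true",
    "mapreduce.job.reduces": 10,
    "mapreduce.map.sort.spill.percent": 0.77,
    "mapreduce.tasktracker.map.tasks.maximum": 4,
    "mapreduce.tasktracker.reduce.tasks.maximum": 4,
    "mapreduce.reduce.shuffle.input.buffer.percent": 0.71,
}

cmd = ["hadoop", "jar", "hadoop-mapreduce-examples.jar", "wordcount"]
for k, v in tuned.items():
    cmd += ["-D", f"{k}={v}"]          # generic options precede the I/O paths
cmd += ["/input/20gb", "/output/wordcount"]
subprocess.run(cmd, check=True)
```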
4 Hadoop MapReduce Parameters Evolved with Genetic Programming
Genetic Programming [16] is a technique for solving problems automatically using a set of genes and chromosomes, which are evolved using two essential genetic operations, crossover and mutation. In this work, GP is employed to create an objective function over the MapReduce parameters, which are represented as (k1, k2, ..., kn); eight parameters are tuned using the Genetic Algorithm. The objective function must be built first using Genetic Programming, so a mathematical expression relating these parameter settings needs to be determined. Genetic Programming evolves an expression between these parameters using the arithmetic operations (*, +, -, /). The fitness assigned to each candidate during the population process in Genetic Programming should reflect how closely the output of the mathematical expression for the parameters matches the true value. In Genetic Programming, the arithmetic operations are called functions, while the parameters (k1, ..., kn) are the leaves of the expression tree, also called terminals. The candidate mathematical expressions between Hadoop MapReduce parameters are determined based on the data types of the parameters: an expression must have the same input data types and the same number of input parameters. After a candidate expression has been determined, its predicted completion time needs to be calculated and compared with the real completion time. The best mathematical expression over the parameters (k1, ..., kn) is selected based on its approximated completion time, which should be very close to the real completion time. The tree in Genetic Programming holds both functions and terminals. Figure 1 shows an example of the representation of parameters using Genetic Programming.
Fig. 1. An example of a Genetic Programming expression tree.
The figure shows that the function (*) has two input arguments, (+) and (/), and that the function (+) itself has two input arguments (k1, k2). The completion time of a MapReduce job can be represented as a function f(k1, k2, ..., kn) of the Hadoop parameters. The approximated completion time of a Hadoop MapReduce job is the evolved function, which is compared with the real completion time, the target function. According to [16], the approximated completion time of the Hadoop MapReduce job (evolved function) should be very close to the real completion time (target function). Algorithm 1 shows the procedure of GP. In this work, a list of MapReduce jobs is used as the input data set, and a large number of experiments were run for both the Word Count and Tera Sort applications, processing different sizes of input data as presented in Sect. 7. Genetic Programming is run to explore the possible expressions between the Hadoop MapReduce parameters by generating hundreds of chromosomes; in this work, 600 chromosomes were generated initially. All linear chromosomes are represented in the form of expression trees. The fitness value of every chromosome is calculated from the completion time of a Hadoop MapReduce job for each training data set: the completion time f(k1, k2, ..., kn) predicted by a genetic chromosome for a training data set is compared with the real completion time of the corresponding Hadoop MapReduce job, and the difference between the approximated and real completion times should not exceed 40 s, a threshold referred to as TS.
The chromosome with the highest fitness value is selected. The maximum possible fitness value equals the number of Hadoop MapReduce jobs used in this process; this measure is adopted from the soccer-player example used to test fitness in [17]. The evolution process terminates once the best fitness value is obtained, i.e., when the fitness value reaches the number of Hadoop MapReduce jobs used in the process. Moreover, genetic selection and operators such as mutation and crossover are applied to produce new chromosomes and update the current ones. The expression between the parameters was obtained after 40,000 iterations.
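As an illustration of how such a fitness evaluation can look, below is a minimal Python sketch; the nested-tuple encoding of expressions and the training records are hypothetical stand-ins, and only the arithmetic operator set and the TS = 40 s tolerance follow the description above.

```python
# A minimal sketch of the GP fitness measure: one point per training job
# whose predicted completion time lies within TS = 40 s of the measured one.
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": lambda a, b: a / b if b else 0.0}  # protected division

def evaluate(expr, params):
    if isinstance(expr, str):                  # a terminal such as "k5"
        return params[expr]
    op, left, right = expr                     # an internal node such as ("+", ..., ...)
    return OPS[op](evaluate(left, params), evaluate(right, params))

def fitness(expr, training_jobs, ts=40.0):
    # training_jobs: list of (params_dict, measured_completion_time_seconds)
    return sum(1 for params, measured in training_jobs
               if abs(evaluate(expr, params) - measured) <= ts)

expr = ("+", ("*", "k3", "k7"), "k5")          # one hypothetical candidate chromosome
```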
Equation 1 below represents the mathematical expression relating the Hadoop MapReduce parameters, which is used as the objective function in the next algorithm (the Genetic Algorithm):

f(k1, k2, ..., k8) = (k3 + k7)(k5/k2) + (k1 · k6)(k4 + k8)    (1)
5 Hadoop MapReduce Parameter Settings Tuning Using a Genetic Algorithm
The Genetic Algorithm (GA) is a metaheuristic belonging to the group of evolutionary algorithms (EA) and was first proposed by John Holland to provide better solutions to complex problems. The GA is widely used to solve many optimisation problems based on natural evolution processes. It works with a set of artificial chromosomes that represent possible solutions to a particular problem, where each chromosome has a fitness value that evaluates its quality as a solution to the given problem [18]. The GA starts by generating a random population of chromosomes. A set of essential genetic operations such as crossover, mutation, and update is applied to the chromosomes to perform recombination and selection on the candidate solutions. Chromosomes are selected based on their fitness values: a chromosome with high fitness has a higher chance of being chosen to create offspring for the next population [19]. Algorithm 2 describes the GA procedure, where Eq. (1), generated by Genetic Programming, is used as the objective function to be minimised:

f(k1, k2, ..., kn) = (k3 + k7)(k5/k2) + (k1 · k6)(k4 + k8)
In Algorithm 2, an initial population of chromosomes is randomly generated. Each candidate parameter setting is represented as a chromosome in the population, i.e., chromosome(i) = (k1, k2, ..., kn), where n is the number of parameters; in this work, eight parameters need to be tuned. After the population has been generated, the fitness value of each chromosome is evaluated based on the objective function f(k1, k2, ..., kn). The fittest chromosomes are selected, and the genetic operators, selection, crossover, and mutation, are applied to update the current population and generate a new one. The procedure is repeated until the best fitness values of the chromosomes, which represent the optimised MapReduce parameters, are obtained or the iteration budget is exhausted. In this algorithm, 15 chromosomes are used as the population size, and the number of iterations was set to 100. Furthermore, the crossover probability Pc = 0.2 and the mutation probability Pm = 0.1 were determined empirically and used for the genetic operators. Roulette-wheel spinning is employed as the selection process. The ranges and recommended values of the Hadoop MapReduce parameters are presented in Table 2.
Table 2. Hadoop MapReduce parameters recommended from the Genetic Algorithm

| Parameter | Name | Range |
|---|---|---|
| K1 | MapReduce.task.io.sort.mb | 100–165 |
| K2 | MapReduce.task.io.sort.factor | 10–160 |
| K3 | Mapred.compress.map.output | True |
| K4 | MapReduce.job.reduces | 1–16 |
| K5 | MapReduce.task.io.sort.spill.percent | 0.60–0.80 |
| K6 | MapReduce.tasktracker.map.tasks.maximum | 2–4 |
| K7 | MapReduce.tasktracker.reduce.tasks.maximum | 2–4 |
| K8 | MapReduce.reduce.shuffle.input.buffer.percent | 0.70–0.71 |
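To make the search loop concrete, the following is a minimal sketch of such a GA in Python, using Eq. (1) as the objective over the Table 2 ranges (the boolean flag K3 is fixed to 1 for simplicity) with the population size, iteration count, Pc, Pm, and roulette-wheel selection stated above; carrying the best chromosome over to the next population (elitism) is an added detail not specified in the paper.

```python
# A minimal sketch of the GA stage minimising Eq. (1) within the Table 2 ranges.
import random

BOUNDS = [(100, 165), (10, 160), (1, 1), (1, 16),
          (0.60, 0.80), (2, 4), (2, 4), (0.70, 0.71)]

def objective(k):
    k1, k2, k3, k4, k5, k6, k7, k8 = k
    return (k3 + k7) * (k5 / k2) + (k1 * k6) * (k4 + k8)   # Eq. (1)

def random_chromosome():
    return [random.uniform(lo, hi) for lo, hi in BOUNDS]

def ga(pop_size=15, iterations=100, pc=0.2, pm=0.1):
    pop = [random_chromosome() for _ in range(pop_size)]
    for _ in range(iterations):
        weights = [1.0 / (1.0 + objective(c)) for c in pop]  # lower cost -> higher weight
        new_pop = [min(pop, key=objective)]                  # keep the best (elitism)
        while len(new_pop) < pop_size:
            a, b = random.choices(pop, weights=weights, k=2)  # roulette wheel
            child = list(a)
            if random.random() < pc:                          # one-point crossover
                cut = random.randrange(1, len(child))
                child = a[:cut] + b[cut:]
            if random.random() < pm:                          # mutation: re-draw one gene
                g = random.randrange(len(child))
                child[g] = random.uniform(*BOUNDS[g])
            new_pop.append(child)
        pop = new_pop
    return min(pop, key=objective)
```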
6 Performance Evaluation Environment
The proposed work was implemented and evaluated using a Hadoop cluster of eight virtual machines placed in the Microsoft Azure cloud. Each VM is assigned 8 GB of memory and 4 CPU cores, with 320 GB of storage for the whole cluster. Hadoop Cloudera (Hadoop 2.6.0-cdh5.9.0) was installed on all nodes; one node was configured as the master node and the rest as slave nodes, with the master node also able to run as a slave node. For fault-tolerance purposes, we set the replication factor of the data blocks to 3 and the HDFS block size to 128 MB. Table 3 presents the specifications of the Hadoop cluster.
Table 3. Hadoop cluster setup

| Item | Specification |
|---|---|
| Servers | Intel Xeon X5550 (server 1) and uxisvm04 (server 2) |
| CPU | 32 cores |
| Processor | 2.27 GHz |
| Hard disk | 360 GB |
| Connectivity | 1 GBit Ethernet LAN interconnectivity between the two servers |
| Memory | 64 GB |
| Host operating system | Microsoft Windows Server 2012 R2 |
| Guest operating system | Ubuntu 14.04.4 LTS (GNU/Linux 4.2.0-27-generic x86_64) |
7 Experimental Results
Both the Word Count and Tera Sort applications were run as real job programs on the Hadoop MapReduce framework to evaluate the performance of our proposed work on the Hadoop cluster. A clear difference can be observed between the configurations tuned by our proposed system, the default settings, and Gunther. For instance, Fig. 2 shows that raising the value of io.sort.mb decreases the execution time of the Hadoop MapReduce job. The io.sort.factor parameter defines the number of data streams to merge during the sorting of files; as Fig. 3 shows, when the value of this parameter goes up, the execution time of the job goes down. It can also be observed from Fig. 4 that when the number of reduce tasks is increased from 5 to 10, the execution time of the Hadoop MapReduce job decreases; however, increasing the number of reduce tasks further increases the execution time, due to network overhead and over-utilisation of computing resources such as CPU and memory. Any further increase in reduce tasks generates high network traffic and consequently increases the overall time of the Hadoop job. Figure 5 shows that increasing the map and reduce slots can play a crucial role in better utilising cluster resources and accordingly minimising the overall time.
Fig. 2. The effect of the io.sort.mb parameter.
Fig. 3. The effect of io.sort.factor.
Fig. 4. Reduce tasks influence.
Fig. 5. Map and Reduce slots influence on MapReduce job.
Table 4 shows the optimised values of the Hadoop MapReduce parameters for each data set size on eight virtual machines. To assess the performance of our method, data sets of 1 GB, 10 GB, and 20 GB were generated, and the tuned parameters were used for both the Word Count and Tera Sort applications. The execution times obtained with the settings tuned by our proposed method were compared against those of the default settings and of Gunther for both applications, each run twice. It can be observed that our proposed method improves MapReduce job performance in the Hadoop cluster, especially for large input data sizes.

Table 4. Hadoop MapReduce parameter settings recommended from the Genetic Algorithm on eight virtual machines

| Name | Default | Optimised (1 GB) | Optimised (10 GB) | Optimised (20 GB) |
|---|---|---|---|---|
| mapreduce.task.io.sort.mb | 100 | 100 | 140 | 165 |
| mapreduce.task.io.sort.factor | 10 | 50 | 125 | 160 |
| mapred.compress.map.output | false | True | True | True |
| mapreduce.job.reduces | 1 | 16 | 10 | 10 |
| mapreduce.map.sort.spill.percent | 0.80 | 0.87 | 0.68 | 0.77 |
| mapreduce.tasktracker.map.tasks.maximum | 2 | 4 | 4 | 4 |
| mapreduce.tasktracker.reduce.tasks.maximum | 2 | 4 | 3 | 4 |
| mapreduce.reduce.shuffle.input.buffer.percent | 0.70 | 0.70 | 0.71 | 0.71 |

Figures 6 and 7 show the completion times of the Hadoop MapReduce jobs using the proposed method in comparison with the default settings and Gunther. From Fig. 6, it can be observed that the performance of the Hadoop Word Count application is improved by the proposed work by 63.15% and 51.16% for the 1 GB data set compared with the default and Gunther settings, respectively. Furthermore, experiments on the 10 GB data set showed that our proposed method improves the performance of the Word Count application by roughly 69% and 37.93% in comparison with the default settings and Gunther, respectively. The proposed method also achieved better performance than the default and Gunther settings on the Word Count application for 20 GB, by 69.62% and 30.31%, respectively.
Fig. 6. Comparison of word count application.
Fig. 7. Comparison of Tera sort application.
From Fig. 7, it can be clearly seen that our proposed method improved Tera Sort application performance by 52.72% compared with the default system and by 44.28% compared with the Gunther settings for 1 GB. For 10 GB, the performance is improved by 55.17% compared with the default settings and by 51.25% compared with Gunther. Finally, Tera Sort application performance for 20 GB is improved by 73.39% and 55.93% compared with the default and Gunther settings, respectively.
8 Conclusion
This paper has presented a Genetic Algorithm and Genetic Programming approach to optimising the performance of Hadoop MapReduce parameter settings. The configuration parameter settings were tuned automatically: Genetic Programming built the objective function, and the Genetic Algorithm searched for the optimal parameter values. Two applications, Word Count and Tera Sort, were run to evaluate the MapReduce job performance of the Hadoop framework. The work was evaluated using a cluster of 8 virtual machines placed on an internal cloud at Brunel University London. The results showed that our proposed method improves MapReduce job performance in the Hadoop cluster in comparison with both Gunther and the default system. Acknowledgment. The authors would like to thank the Iraqi Ministry of Higher Education and Scientific Research for financially sponsoring this research, and Brunel University London for providing an efficient environment in which to implement this work experimentally.
References
1. Nandimath, J., et al.: Big data analysis using Apache Hadoop. In: IEEE 14th International Conference on Information Reuse and Integration (IRI), pp. 700–703 (2013)
2. Patel, A.B., Birla, M., Nair, U.: Addressing big data problem using Hadoop and map reduce. In: Nirma University International Conference on Engineering (NUiCONE), pp. 1–5 (2012)
3. Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. In: Proceedings of the 6th Conference on Symposium on Operating Systems Design & Implementation, vol. 6, p. 10 (2004)
4. Pavlo, A., Paulson, E., Rasin, A.: A comparison of approaches to large-scale data analysis. In: SIGMOD '09: Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, pp. 165–178 (2009)
5. Tips for improving MapReduce performance (2009). http://blog.cloudera.com/blog/2009/12/7-tips-for-improving-mapreduce-performance/. Accessed 15 Aug 2017
6. Apache Hadoop. http://hadoop.apache.org/. Accessed 15 Aug 2017
7. Wu, D., Gokhale, A.: A self-tuning system based on application profiling and performance analysis for optimizing Hadoop MapReduce cluster configuration. In: High Performance Computing (HiPC), 20th International Conference, pp. 89–98 (2013)
8. Liao, G., Datta, K., Willke, T.L.: Gunther: search-based auto-tuning of MapReduce. In: European Conference on Parallel Processing, pp. 406–419 (2013)
9. Liu, J., et al.: Panacea: towards holistic optimization of MapReduce applications. In: Proceedings of the Tenth International Symposium on Code Generation and Optimization, pp. 33–43 (2012)
10. Li, Y., et al.: Breaking the boundary for whole-system performance optimization of big data. In: Proceedings of the International Symposium on Low Power Electronics and Design, pp. 126–131 (2013)
11. Hadoop Performance Tuning. https://hadoop-toolkit.googlecode.com/files/WhitepaperHadoopPerformanceTuning.pdf. Accessed 15 Aug 2017
12. White, T.: Hadoop: The Definitive Guide, 3rd edn, p. 688. Yahoo Press, Sebastopol (2012)
13. Li, M., et al.: MROnline: MapReduce online performance tuning. In: Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing, pp. 165–176 (2014)
14. Herodotou, H., Babu, S.: Profiling, what-if analysis, and cost-based optimization of MapReduce programs. Proc. VLDB Endowment 4(11), 1111–1122 (2011)
15. Herodotou, H., et al.: Starfish: a self-tuning system for big data analytics. In: CIDR, pp. 261–272 (2011)
16. McPhee, N.F., Poli, R., Langdon, W.B.: A Field Guide to Genetic Programming (2008)
17. Walker, M.: Introduction to Genetic Programming. University of Montana (2001)
18. McCall, J.: Genetic algorithms for modelling and optimisation. J. Comput. Appl. Math. 184(1), 205–222 (2005)
19. Sivanandam, S., Deepa, S.: Introduction to Genetic Algorithms. Springer, Heidelberg (2007)
Fast Interpolation and Fourier Transform in High-Dimensional Spaces
Michael Hecht and Ivo F. Sbalzarini
Chair of Scientific Computing for Systems Biology, Faculty of Computer Science, TU Dresden, Dresden, Germany
Center for Systems Biology Dresden, Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
{hecht,ivos}@mpi-cbg.de
Abstract. In scientific computing, the problem of finding an analytical representation of a given function f : Ω ⊆ R^m → R, C is ubiquitous. The most practically relevant representations are polynomial interpolation and Fourier series. In this article, we address both problems in high-dimensional spaces. First, we propose a quadratic-time solution of the Multivariate Polynomial Interpolation Problem (PIP), i.e., the N(m,n) coefficients of a polynomial Q, with deg(Q) ≤ n, uniquely fitting f on a determined set of generic nodes P ⊆ R^m are computed in O(N(m,n)^2) time, requiring storage in O(mN(m,n)). Then, we formulate an algorithm for determining the N(m,n) Fourier coefficients with positive frequency of the Fourier series of f up to order n in the same amount of computational time and storage. Especially in high dimensions, this provides a fast Fourier interpolation, outperforming modern Fast Fourier Transform methods. We expect that these fast and scalable solutions of the polynomial and Fourier interpolation problems in high-dimensional spaces are going to influence modern computing techniques occurring in Big Data and Data Mining, Deep Learning, Image and Signal Analysis, Cryptography, and Non-linear Optimization.
Keywords: Fast Fourier transform · (Multivariate) Polynomial interpolation · Signal analysis · Gradient descent · Optimization · Newton-Raphson iteration · Integration of multivariate functions · Big data · Machine & deep learning · Data mining
1 Introduction
Almost all applications of scientific computing involve a function f : R^m → R, C, m ≥ 1, that is not analytically known, i.e., there is no closed formula describing f, but the value of f(x) can be evaluated in O(1) time for any x ∈ R^m. This includes the problem of numerically computing an integral κ = ∫_Ω f dΩ (i.e., quadrature) by sampling f on Ω with respect to a resolution r ∈ R^+ such that a chosen discretized version of the integral is close to κ and can be efficiently determined. However, the number of required sample points increases exponentially with the space dimension m, which is a crucial problem in modern applications of computing and is not limited to numerical integration. Especially in Big Data and Deep Learning, functions f or distributions γ are often defined on high-dimensional spaces. In addition to their integral, quantities such as the gradient ∇f(x), x ∈ R^m, products f_1 · f_2 ⋯ f_k, and ratios f/g need to be computed [1–3]. The gradient ∇f, for example, determines the dynamics of the modeled system through the classical flow equation ẋ(t) = −∇f(x(t)), x(0) = x_0, or some Partial Differential Equation (PDE) in terms of partial derivatives of f. In other applications, f plays the role of an objective function and Ω models the possible solutions of an optimization problem, e.g., in machine learning and computer vision. Also there, gradients and/or integrals are required in order to (approximately) solve the resulting high-dimensional optimization problem. Examples of applications where the aforementioned issues arise notably include signal analysis, image processing, data compression, partial differential equations, and large-integer multiplication [4,5], with significance for instance for astronomy [6,7], systems biology, molecular chemistry, and biophysics [8,9], genomics [10], as well as market and financial analysis [11], and associated ethical questions [12]. However, for increasing dimension m ≫ 1, the computation of the models and identities typically becomes infeasible due to the exponentially increasing number of sample points needed to represent f with reasonable resolution. While this can be alleviated by interpolation in an analytical basis, the available interpolation algorithms, such as polynomial interpolation or Fourier transforms, also scale unfavorably with problem dimension. Here, we propose two interpolation solutions, polynomial and Fourier, with improved computational and storage complexity. This extends applications to higher dimensions m ∈ N than currently possible. In particular, we consider the following computational problems:

Problem 1 (PIP): Given parameters n, m ∈ N, m ≥ 1, and a function f : R^m → R, the problem is to choose N(m,n) generic nodes P = {p_1, ..., p_{N(m,n)}} ⊆ R^m such that there is exactly one polynomial Q ∈ R[x_1, ..., x_m] of degree deg(Q) ≤ n that coincides with f on P, i.e., Q(p) = f(p) for all p ∈ P, and to determine Q once P has been chosen. Note that if f is a polynomial with deg(f) ≤ n, then Q = f.

Problem 2 (FIP): Let m, n ∈ N, m ≥ 1, and let f : R^m → C be a multivariate complex periodic function. Then the problem is to compute the Fourier coefficients c_α ∈ C of the Fourier series

f(x) ≈ s_n(x) = Σ_{α ∈ A_{m,n}} c_α e^{2πi α·x},    (1)
where A_{m,n} is the set of multi-indices of order |α| = Σ_{i=1}^{m} |α_i| less than n and α·x = α_1 x_1 + ⋯ + α_m x_m.
Formally, f is called periodic iff there exists a lattice L ⊆ R^m with fundamental domain Ω = [t_1, t_2] × ⋯ × [t_{2m−1}, t_{2m}] ⊆ R^m, t_i ∈ R, i = 1, ..., 2m, such that the function f̄ : T^m_L → C defined by f̄ ∘ π_L = f is well defined on the torus T^m_L := R^m/L, where π_L : R^m → T^m_L is the natural projection. In this case, the approximation of f by its Fourier series is possible. The formulation above only considers positive frequencies, which is typically the case in applications for physical reasons. In this article, we state a key result, whose detailed proof is given in [13], reducing the computational and memory complexity of the PIP in high dimensions. We furthermore show here for the first time how this result can be used to also efficiently solve the FIP in high dimensions and/or for certain types of scattered points. We formulate a recursive decomposition algorithm to efficiently compute the solutions to both problems. Consequently, for the first time, interpolation of functions f : R^m → R, C defined on high-dimensional spaces m ≫ 1, and Fourier transforms on scattered points, become efficiently possible.

1.1 Paper Outline
In Sect. 2, we state the main result. In Sects. 3, 4, 5 and 6, we provide the mathematical background and the proof. In Sect. 7, we describe a novel recursive algorithm for computationally solving the PIP/FIP, and in Sect. 8 we evaluate a MATLAB prototype implementation. Finally, in Sect. 9, we discuss the most relevant future applications and close with conclusions and future prospects in Sect. 10.
2 Main Result
The main result of this article is that the PIP and the FIP can be solved by a recursive decomposition algorithm. More precisely:

Theorem 1 (Main Result): Let m, n ∈ N, m ≥ 1. Then there exist algorithms with runtime complexity O(N(m,n)^2), requiring storage in O(mN(m,n)), solving the PIP and the FIP, respectively.

Indeed, in the case of the PIP, this result yields an algorithm that scales significantly better with problem size than previous approaches, as we show below. The essential reason for this is that all other approaches require the inversion of the Vandermonde matrix, which requires runtime at least in O(N(m,n)^2.8) [14] and storage in O(N(m,n)^2). To solve the FIP, classical Fast Fourier Transform (FFT) methods approximate the Fourier transform

f̂(y) = (1/|Ω|) ∫_Ω f(x) e^{−2πi x·y} dΩ
of f on regular Cartesian grids. To do so, f is exhaustively sampled on Ω with respect to a regular grid D ⊆ R^m of resolution r ∈ N with 2^r nodes in every dimension, i.e., #D = 2^{rm}. Then, based on the algorithm of James W. Cooley and John W. Tukey [15], modern FFTs require a runtime in O(#D log(#D)) to compute the values of the discrete Fourier transformation

f̂_D(y) = (1/(|D||Ω|)) Σ_{d ∈ D} f(d) e^{−i d·y},   ∀ y ∈ D̃,    (2)

where D̃ ≅ D denotes a regular grid in the domain of the Fourier transform f̂ of f, see [16]. If D̃ is chosen such that A_{m,n} ⊆ D̃, then f̂_D(α) = c_α, ∀ α ∈ A_{m,n}. Thus, the FIP can be solved by FFT with exponential runtime in O(rm2^{rm}). In contrast, the fact that N(m,n) ∈ O(m^n) shows that our solution yields a polynomial-time solution of the FIP as long as n is bounded, which is usually the case in practice. Moreover, our solution works on certain scattered point distributions and is not limited to Cartesian grids.
3 Systems of Polynomials
We first show the solution of the PIP by considering systems of polynomials as function basis. We follow [17] to introduce some basic notions and helpful notations. A polynomial Q ∈ R[x_1, ..., x_m] of degree deg(Q) = n possesses N(m,n) := \binom{m+n}{m} monomials, while the number of monomials of degree k is given by M(m,k) := \binom{m+k}{m} − \binom{m+k−1}{m}. We enumerate the coefficients c_0, ..., c_{N(m,n)−1} of Q such that

Q(x) = c_0 + c_1 x_1 + ⋯ + c_m x_m + c_{m+1} x_1^2 + c_{m+2} x_1 x_2 + ⋯ + c_{2m} x_1 x_m + c_{2m+1} x_2^2 + ⋯ + c_{M(m,n−1)+1} x_1^n + c_{M(m,n−1)+2} x_1^{n−1} x_2 + ⋯ + c_{N(m,n)−2} x_{m−1} x_m^{n−1} + c_{N(m,n)−1} x_m^n.    (3)

We assume that 0 ∈ N and consider A_{m,n} := {α ∈ N^m : |α| ≤ n}, the set of all multi-indices of order |α| := Σ_{k=1}^{m} α_k ≤ n. The multi-index is used to address the monomials of a multivariate polynomial. For a vector x = (x_1, ..., x_m) and α ∈ A_{m,n} we define x^α := x_1^{α_1} ⋯ x_m^{α_m}, and we order the M(m,k) multi-indices of order k with respect to lexicographical order, i.e., α^1 = (k, 0, ..., 0), α^2 = (k−1, 1, 0, ..., 0), ..., α^{M(m,k)} = (0, ..., 0, k). The k-th symmetric power x^k = (x^k_1, ..., x^k_{M(m,k)}) ∈ R^{M(m,k)} is defined by the entries

x^k_i := x^{α^i},   i = 1, ..., M(m,k).    (4)

Thus, x^0 = 1 and x^1_i = x_i.
Definition 1: Given parameters n, m ∈ N and a set P = {p_1, ..., p_{N(m,n)}} ⊆ R^m of nodes, with p_i = (p_{1,i}, ..., p_{m,i}), we define the multivariate Vandermonde matrix V_{m,n}(P) by

V_{m,n}(P) =
( 1  p_1          p_1^2          ⋯  p_1^n          )
( 1  p_2          p_2^2          ⋯  p_2^n          )
( 1  p_3          p_3^2          ⋯  p_3^n          )
( ⋮  ⋮            ⋮              ⋱  ⋮              )
( 1  p_{N(m,n)}   p_{N(m,n)}^2   ⋯  p_{N(m,n)}^n   ),

where p_i^k denotes the k-th symmetric power of the node p_i.
= 0}, set of all generic node sets is given by P m,n m,n which is open in Rm since P → det Vm,n (P ) is a continuous function. Given a real-valued function f on Rm and assuming that there exist generic nodes P = {p1 , . . . pN (m,n) } ⊆ Rm , then the linear system of equations Vm,n (P )x = F , with T T and F = f (p1 ), . . . , f (pN (m,n) ) x = x1 , . . . , xN (m,n) possesses the unique solution s = (s1 , . . . , sN (m,n) ) ∈ RN (m,n) given by s = Vm,n (P )−1 F . Thus, by setting ci := si for the coefficients of Q ∈ R[x1 , . . . , xm ], enumerated as in (3), we have uniquely determined the solution of Problem 1. The essential difficulty herein lies in finding a good generic node set by such that inversion of the Vandermonde matrix Vm,n (P ) is numerically stable and accurate. In this regard, the following observation mentioned in [18] and again in [17] is crucial. Theorem 2: Let m, n ∈ N and P ⊆ Rm , #P = N (m, n). The Vandermonde matrix Vm,n (P ) is regular if and only if the nodes P do not belong to a common algebraic hyper-surface of degree ≤ n, i.e., if there exists no polynomial Q ∈ R[x1 , . . . , xm ], Q = 0 of degree deg(Q) ≤ n such that Q(p) = 0 for all p ∈ P . Proof: Theorem 2 says that P is generic with respect to the PIP if and only if the homogeneous Vandermonde problem Vm,n (P )x = 0 possesses no non-trivial solution, which is equivalent to the fact that there is no hyper-surface V of degree deg(V ) ≤ n with P ⊆ V .
4 Fourier Systems
Analogously to the PIP, we now consider trigonometric polynomials as Fourier bases:

Definition 2: We consider the set Θ_{m,n} of all trigonometric polynomials θ ∈ Θ_{m,n} of order deg(θ) ≤ n depending on m variables, i.e.,

θ(x) = Σ_{α ∈ A_{m,n}} c_α e^{2πi x·α},   c_α ∈ C, ∀ α ∈ A_{m,n}.    (5)
Following Sect. 3, we enumerate the coefficients c_0, ..., c_{N(m,n)−1} of a trigonometric polynomial θ such that

θ(x) = c_0 + c_1 e^{2πi x_1} + ⋯ + c_m e^{2πi x_m} + c_{m+1} e^{2πi 2x_1} + c_{m+2} e^{2πi(x_1+x_2)} + ⋯ + c_{2m} e^{2πi(x_1+x_m)} + c_{2m+1} e^{2πi 2x_2} + ⋯ + c_{M(m,n−1)+1} e^{2πi n x_1} + c_{M(m,n−1)+2} e^{2πi((n−1)x_1+x_2)} + ⋯ + c_{N(m,n)−2} e^{2πi(x_{m−1}+(n−1)x_m)} + c_{N(m,n)−1} e^{2πi n x_m}.    (6)
Definition 3: In analogy to the previous section, we order the multi-indices α^1, ..., α^{M(m,k)} ∈ A_{m,n} of order |α| = k, k ∈ N. For a vector x = (x_1, ..., x_m) we define the k-th exponential power x̃^k = (x̃^k_1, ..., x̃^k_{M(m,k)}) by the entries

x̃^k_j := e^{2πi α^j·x},   j = 1, ..., M(m,k).    (7)

Thus, x̃^0 = 1 and x̃^1_j = e^{2πi x_j}. Given nodes P = {p_1, ..., p_{N(m,n)}} ⊆ R^m, we consider the Fourier coefficient matrix

Ṽ_{m,n}(P) =
( 1  p̃_1          p̃_1^2          ⋯  p̃_1^n          )
( 1  p̃_2          p̃_2^2          ⋯  p̃_2^n          )
( 1  p̃_3          p̃_3^2          ⋯  p̃_3^n          )
( ⋮  ⋮            ⋮              ⋱  ⋮              )
( 1  p̃_{N(m,n)}   p̃_{N(m,n)}^2   ⋯  p̃_{N(m,n)}^n   ),

where p̃_i^k denotes the k-th exponential power of the node p_i.
We call a set of nodes P ⊆ R^m generic with respect to the FIP if and only if Ṽ_{m,n}(P) ∈ C^{N(m,n)×N(m,n)} is regular. Analogously to Theorem 2, we observe that P is generic if and only if there exists no trigonometric polynomial θ ∈ Θ_{m,n} with θ(P) = 0 or, in other words, if there is no trigonometric hyper-surface T of order less than n such that P ⊆ T. In this case, as in Sect. 3, we observe that the solution c ∈ C^{N(m,n)} of the linear system of equations

Ṽ_{m,n}(P)c = F,   F = (f(p_1), ..., f(p_{N(m,n)}))^T,

is uniquely determined and yields the desired Fourier coefficients.
5 Special Cases
We efficiently solve the PIP and the FIP by recursively decomposing them into linear or 1-dimensional sub-PIPs/FIPs. Before presenting the decomposition method, we therefore first describe how to solve these sub-PIPs/FIPs. To this end, we define the following central concepts:

Definition 4: Let τ : R^m → R^m be given by τ(x) = Ax + b, where A ∈ R^{m×m} is a full-rank matrix, i.e., rank(A) = m, and b ∈ R^m. Then we call τ an affine transformation on R^m.

Definition 5: For every ordered tuple of integers i_1, ..., i_k ∈ N, i_q < i_p if 1 ≤ q < p ≤ k, we consider H_{i_1,...,i_k} = {(x_1, ..., x_m) ∈ R^m : x_j = 0 if j ∉ {i_1, ..., i_k}}, the k-dimensional hyperplanes spanned by the i_1, ..., i_k-th coordinates. We denote by π_{i_1,...,i_k} : R^m → H_{i_1,...,i_k} and i_{i_1,...,i_k} : R^k → R^m, with i_{i_1,...,i_k}(R^k) = H_{i_1,...,i_k}, the natural projections and embeddings, and by π*_{i_1,...,i_k} : R[x_1, ..., x_m] → R[x_{i_1}, ..., x_{i_k}], i*_{i_1,...,i_k} : R[y_1, ..., y_k] → R[x_1, ..., x_m] the induced projections and embeddings on the polynomial ring.

Definition 6: Let m, n ∈ N, let ξ_1, ..., ξ_m ∈ R^m be an orthonormal frame (i.e., ⟨ξ_i, ξ_j⟩ = δ_ij, ∀ 1 ≤ i, j ≤ m, where δ_ij denotes the Kronecker symbol), and let b ∈ R^m. For I ⊆ {1, ..., m} we consider the hyperplane H_{I,ξ,b} := {x ∈ R^m : ⟨x − b, ξ_i⟩ = 0, ∀ i ∈ I}. Given functions f : R^m → R, g : R^m → C, we say that sets of nodes P_0, P_1 ⊆ H_I and polynomials Q ∈ R[x_1, ..., x_m] with deg(Q) ≤ n, θ ∈ Θ_{m,n} solve the PIP or the FIP with respect to n, k = dim H_I, and f, g on H_I if and only if Q(p) = f(p), θ(q) = g(q) for all p ∈ P_0, q ∈ P_1. Furthermore, it has to hold that whenever there are Q' ∈ R[x_1, ..., x_m] with deg(Q') ≤ n, θ' ∈ Θ_{m,n} and Q'(p) = f(p), θ'(q) = g(q) for all p ∈ P_0, q ∈ P_1, then Q'(x) = Q(x), θ'(x) = θ(x) for all x ∈ H_I, respectively.

Definition 7: Let H ⊆ R^m be a hyperplane of dimension k ∈ N and τ : R^m → R^m an affine transformation such that τ(H) = H_{1,...,k}. Then we denote by τ* : R[x_1, ..., x_m] → R[x_1, ..., x_m] the induced transformation on the polynomial ring, defined on the monomials as τ*(x_i) = η_1 x_1 + ⋯ + η_m x_m with η = (η_1, ..., η_m) = τ(e_i), where e_1, ..., e_m denotes the standard basis of R^m.
If μ : R^m → R^m, μ(x) = x + b, b ∈ R^m, is a translation, then we define the induced transformation on the trigonometric polynomials μ* : Θ_{m,n} → Θ_{m,n} by μ*(θ)(x) := θ(μ(x)). Thus,

μ*(θ)(x) = Σ_{α ∈ A_{m,n}} c_α e^{2πi α·μ(x)} = Σ_{α ∈ A_{m,n}} c_α e^{2πi α·(x+b)} = Σ_{α ∈ A_{m,n}} c_α e^{2πi α·b} e^{2πi α·x} = Σ_{α ∈ A_{m,n}} c'_α e^{2πi α·x},
where c'_α = c_α e^{2πi α·b} ∀ α ∈ A_{m,n}.

Lemma 1: Let m, n ∈ N and let P_0, P_1 be generic node sets with respect to m, n and the PIP and FIP, respectively.
(i) Let τ : R^m → R^m, τ(x) = Ax + b, be an affine transformation. Then τ(P_0) is also a generic set with respect to the PIP.
(ii) Let μ : R^m → R^m, μ(x) = x + b, be a translation. Then μ(P_1) is also a generic set with respect to the FIP.

Proof: We assume there are polynomials Q_0 ∈ R[x_1, ..., x_m] with deg(Q_0) ≤ n, θ_0 ∈ Θ_{m,n}, such that Q_0(τ(P_0)) = 0, θ_0(μ(P_1)) = 0, respectively. Then, setting Q_1 := τ*(Q_0), θ_1 := μ*(θ_0) yields non-zero polynomials with deg(Q_1) ≤ n, θ_1 ∈ Θ_{m,n} such that

Q_1(P_0) = τ*(Q_0)(P_0) = Q_0(τ(P_0)) = 0,   θ_1(P_1) = μ*(θ_0)(P_1) = θ_0(μ(P_1)) = 0,

which contradicts the assumption that P_0, P_1 are generic with respect to the PIP and FIP, respectively. Hence, τ(P_0), μ(P_1) must be generic sets.

5.1 1-Dimensional Polynomial Interpolation
We review the classical results of polynomial interpolation in the special case of dimension m = 1 and refer the reader to [19] for a more detailed discussion. In one dimension, i.e., f : R → R, P ⊆ R, the Vandermonde matrix V_{1,n}(P) takes its classical form

V_{1,n}(P) =
( 1  p_1      ⋯  p_1^n      )
( ⋮  ⋮        ⋱  ⋮          )
( 1  p_{n+1}  ⋯  p_{n+1}^n  ).

For the matrix V_{1,n}(P) to be regular, the nodes p_1, ..., p_{n+1} have to be pairwise different, and Theorem 2 implies that this is also a sufficient condition for the nodes P to be generic. In light of this fact, we consider the Lagrange polynomials

L_j(x) = Π_{h=1, h≠j}^{n+1} (x − p_h)/(p_j − p_h),   j = 1, ..., n+1,
fulfilling L_j(p_k) = δ_jk, where δ_jk is the Kronecker symbol. Note that V_{1,n} induces a linear bijection φ : R^{n+1} → R_n[x], where R_n[x] is the vector space of real polynomials Q ∈ R[x] with deg(Q) ≤ n and φ(s) is given by the polynomial with coefficients V_{1,n}s. Hence, if we use the L_j instead of {1, x, x^2, ..., x^n} as a basis of R_n[x], the transformed Vandermonde matrix becomes the identity matrix. Thus, the solution of the PIP in dimension m = 1 is given by choosing pairwise different nodes P and setting

Q(x) = Σ_{j=1}^{n+1} f(p_j) L_j(x).

The coefficients of Q are given by

c_i = Q^{(i)}(0) = Σ_{j=1}^{n+1} f(p_j) L_j^{(i)}(0),   i = 0, ..., n,    (8)

where Q^{(i)}, L_j^{(i)} denote the i-th derivatives of Q and L_j, respectively. Thus, starting from an analytical expression for L_j^{(i)}(0), we can establish an analytical formula for the c_i that requires computing n derivatives of n + 1 polynomials, therefore incurring a computational cost of O(n^2) overall. This observation yields the following statement for the embedded 1-dimensional PIP, proven in [13]:

Proposition 1: Let m, n, j ∈ N, 1 ≤ j ≤ m, let ξ_1, ..., ξ_m ∈ R^m be an orthonormal frame, b ∈ R^m, and I = {1, ..., m} \ {j} such that H_I := H_{I,ξ,b} is a line. Further, let a function f : R^m → R be given.
(i) If ξ_1, ..., ξ_m is the standard basis and b_j = 0, then there exists an algorithm that solves the PIP with respect to (1, n) and f on H_I in O(n^2).
(ii) If ξ_1, ..., ξ_m is an arbitrary frame, then there exists an algorithm that solves the PIP with respect to (1, n) and f on H_I in O(mnN(m,n) + n^2).
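The following is a minimal NumPy sketch of this 1-dimensional solution with illustrative nodes: it obtains the monomial coefficients by expanding the Lagrange basis via polynomial multiplication, which gives the same output as the derivative formula (8), though at O(n^3) overall; the formula-based evaluation is what achieves the stated O(n^2) total cost.

```python
# A minimal sketch of the 1-D PIP via Q(x) = sum_j f(p_j) L_j(x).
import numpy as np

def lagrange_coefficients(nodes, values):
    n1 = len(nodes)
    coeffs = np.zeros(n1)
    for j in range(n1):
        # Expand L_j as the product of (x - p_h)/(p_j - p_h), h != j.
        lj = np.array([1.0])
        for h in range(n1):
            if h != j:
                lj = np.convolve(lj, [1.0, -nodes[h]]) / (nodes[j] - nodes[h])
        coeffs += values[j] * lj[::-1]          # ascending order c_0, ..., c_n
    return coeffs

nodes = np.array([0.0, 0.5, 1.0, 1.5])          # pairwise distinct, hence generic
coeffs = lagrange_coefficients(nodes, nodes**3) # recovers x^3: [0, 0, 0, 1]
```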
5.2 1-Dimensional Fourier Interpolation
For n ∈ N, a one-dimensional function f : R → C, and nodes P ⊆ R, the Fourier coefficient matrix becomes

Ṽ_{1,n}(P) =
( 1  e^{2πi p_1}      ⋯  e^{2πi n p_1}      )
( ⋮  ⋮                ⋱  ⋮                  )
( 1  e^{2πi p_{n+1}}  ⋯  e^{2πi n p_{n+1}}  ).

In analogy to the Lagrange polynomials, we set

L̃_j(x) = Π_{h=1, h≠j}^{n+1} (e^{2πi x} − e^{2πi p_h})/(e^{2πi p_j} − e^{2πi p_h}),   j = 1, ..., n+1.
As long as the nodes P are chosen such that they differ pairwise on T^1, i.e., ℓ : R → R/(1/n)Z with ℓ(x) = x + (1/n)Z is injective on P, the L̃_j(x) are C-linearly independent. Thus, the analogous argumentation as for polynomial bases yields that the desired trigonometric polynomial is given by

θ(x) = Σ_{j=1}^{n+1} f(p_j) L̃_j(x).

Moreover, we have that Ṽ_{1,n}(P) is regular, which shows that the condition on the nodes is necessary and sufficient for them to be generic. Hence, the coefficients of θ can be computed as

c_i = θ^{(i)}(0) = Σ_{j=1}^{n+1} f(p_j) L̃_j^{(i)}(0),   i = 0, ..., n,    (9)
yielding an O(n^2)-time algorithm for solving the 1-dimensional FIP. More generally, this yields:

Proposition 2: Let m, n, j ∈ N, 1 ≤ j ≤ m, let ξ_1, ..., ξ_m ∈ R^m be the standard basis of R^m, b ∈ R^m, and I = {1, ..., m} \ {j} such that H_I := H_{I,ξ,b} is a line. Further, let a function f : R^m → C be given. Then there exists an algorithm that solves the FIP with respect to f on H_I in O(n^2 + mn).

Proof: By assumption, the translation μ : R^m → R^m, μ(x) = x + b, maps H_{I,ξ,0} to H_{I,ξ,b}. Thus, by setting f' := f ∘ μ and using the argumentation above, we can solve the FIP on H_{I,ξ,0} ≅ R with respect to f' in O(n^2), denoting the computed trigonometric polynomial by θ' ∈ Θ_{m,n}. The desired solution then is θ := (μ^{−1})*(θ'). Since the translation requires O(mN(1,n)) ⊆ O(mn) operations to compute the scalar product α·b for every trigonometric monomial, this shows the claim.
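A minimal NumPy sketch of the 1-dimensional FIP follows, solving Ṽ_{1,n}(P)c = F directly instead of using formula (9); the nodes are chosen pairwise distinct on the torus, and the test function is a single Fourier mode, both for illustration only.

```python
# A minimal sketch of the 1-D FIP: assemble the Fourier coefficient matrix
# with entries e^{2*pi*i*k*p_j} and solve for c_0, ..., c_n directly.
import numpy as np

def fourier_coefficients(nodes, values, n):
    k = np.arange(n + 1)
    V = np.exp(2j * np.pi * np.outer(nodes, k))   # (n+1) x (n+1) matrix
    return np.linalg.solve(V, values)

n = 3
nodes = np.linspace(0.0, 1.0, n + 1, endpoint=False) / n  # distinct on the torus
f = lambda x: np.exp(2j * np.pi * 2 * x)                  # single mode alpha = 2
c = fourier_coefficients(nodes, f(nodes), n)              # ~ [0, 0, 1, 0]
```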
5.3 Linear Polynomial Interpolation
If we choose P = {0, e_1, ..., e_m}, where e_1, ..., e_m is the standard basis of R^m, we get

V_{m,1}(P) =
( 1  0  0  ⋯  0 )
( 1  1  0  ⋯  0 )
( 1  0  1  ⋱  0 )
( ⋮  ⋮  ⋱  ⋱  ⋮ )
( 1  0  ⋯  0  1 ).

Thus, c = (c_0, ..., c_m)^T ∈ R^{m+1} with

c_0 = f(p_1),   c_i = f(p_{i+1}) − f(p_1) for i ≥ 1,
solves V_{m,1}(P)c = F with F = (f(p_1), ..., f(p_{m+1}))^T. Hence,

Q(x) = c_0 + c_1 x_1 + ⋯ + c_m x_m    (10)

is the unique solution of the PIP with respect to P and the given function f : R^m → R. Thus, P is generic. In [13], this observation yielded a solution of the linear PIP on the general hyperplanes H_{I,ξ,b} from Definition 6 as follows:

Proposition 3: Given m, n ∈ N, a function f : R^m → R, an orthonormal frame ξ_1, ..., ξ_m ∈ R^m, b ∈ R^m, I ⊆ {1, ..., m}, and H_I := H_{I,ξ,b}, the following holds:
(i) If ξ_1, ..., ξ_m is the standard basis, then there exists an algorithm that solves the PIP with respect to (m, 1) and f on H_I in O(k), k = |I|.
(ii) If ξ_1, ..., ξ_m is an arbitrary frame, then there exists an algorithm that solves the PIP with respect to (m, 1) and f on H_I in O(mk), k = |I|.
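The linear case thus needs only m + 1 evaluations and subtractions, as this minimal sketch (with an illustrative test function) shows:

```python
# A minimal sketch of the linear PIP on the nodes P = {0, e_1, ..., e_m}:
# the coefficients of Q(x) = c_0 + c_1 x_1 + ... + c_m x_m need no solve.
import numpy as np

def linear_pip(f, m):
    c0 = f(np.zeros(m))
    ci = np.array([f(np.eye(m)[i]) - c0 for i in range(m)])
    return c0, ci                      # Q(x) = c0 + ci @ x

c0, ci = linear_pip(lambda x: 2.0 + x.sum(), m=4)  # c0 = 2, ci = [1, 1, 1, 1]
```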
5.4 Linear Fourier Interpolation
For m ≥ 1 we choose λ_1, ..., λ_m ∈ (0, 1) and consider P = {0, λ_1 e_1, ..., λ_m e_m}, where e_1, ..., e_m is the standard basis of R^m. Then, the Fourier coefficient matrix with respect to P becomes

Ṽ_{m,1}(P) =
( 1  1             1             ⋯  1             )
( 1  e^{2πi λ_1}   1             ⋯  1             )
( 1  1             e^{2πi λ_2}   ⋯  1             )
( ⋮  ⋮             ⋮             ⋱  ⋮             )
( 1  1             ⋯             1  e^{2πi λ_m}   ).

Thus, by subtracting the first row from all other rows, we obtain that c = (c_0, ..., c_m)^T with

c_0 = f(0) − Σ_{j=1}^{m} c_j,   c_j = (f(p_j) − f(0))/(e^{2πi λ_j} − 1), j ≥ 1,

solves

Ṽ_{m,1}(P)c = F,   F = (f(0), f(λ_1 e_1), ..., f(λ_m e_m))^T.
Hence, P is generic with respect to the linear FIP, and we can compute the solution with respect to P in O(m) time. Therefore, by the same argument as in the proof of Proposition 2, we find:

Proposition 4: Let m, n ∈ N, f : R^m → C, let ξ_1, ..., ξ_m ∈ R^m be the standard basis, b ∈ R^m, I ⊆ {1, ..., m}, and H_I := H_{I,ξ,b}. Then there exists an algorithm that solves the FIP with respect to (m, 1) and f on H_I in O(mk), k = |I|.
6 Solution of the General PIP/FIP
In dimensions m ≥ 2, we establish a recursive decomposition of the PIP/FIP into two sub-problems, one of lower dimension (m−1, n) and one of lower degree (m, n−1). To do so, we let T^m_n = R^m/(1/n)Z^m be the m-dimensional torus with fundamental domain [−1/n, 1/n)^m. Further, we denote by π_{m,n} : R^m → T^m_n, π_{m,n}(x) = x + (1/n)Z^m, the natural projection onto it.

Theorem 3: Let m, n ∈ N, m ≥ 1, let ξ_1, ..., ξ_m be an orthonormal frame of R^m, b ∈ R^m, I ⊆ {1, ..., m} with |I| = m − 1, and let H = H_{I,ξ,b} be as in Definition 6. Further, let P ⊆ R^m be such that:
(i) There is a hyperplane H ⊆ R^m of co-dimension 1, and P_1 := P ∩ H is generic with respect to the PIP and H, i.e., by identifying H ≅ R^{m−1}, the Vandermonde matrix V_{m−1,n}(P_1) is regular.
(ii) The set P_2 = P \ H is generic with respect to the PIP for parameters (m, n−1), i.e., the Vandermonde matrix V_{m,n−1}(P_2) is regular.
Then, P is generic with respect to the PIP for (m, n). If ξ_1, ..., ξ_m is the standard basis and P is such that:
(a) The nodes P_1 := P ∩ H_{I,ξ,b} are generic with respect to the hyperplane H = H_{I,ξ,b} and the FIP, i.e., by identifying μ(H) ≅ R^{m−1}, where μ : R^m → R^m is given by μ(x) = x − b, the Fourier coefficient matrix Ṽ_{m−1,n}(P_1) is regular.
(b) The nodes P_2 = P \ P_1 satisfy π_{m,n}(P_2) ∩ π_{m,n}(H) = ∅ and are generic with respect to the FIP for parameters (m, n−1).
Then, P is generic with respect to the FIP for (m, n).

Proof: We prove the theorem with respect to the FIP; the proof for the PIP follows analogously. Due to Lemma 1(ii), generic node sets remain generic under translation. Thus, by applying the translation μ and re-indexing the standard basis ξ, we can assume w.l.o.g. that H = H_{1,...,m−1}. By (a), there is no trigonometric polynomial σ ∈ Θ_{m−1,n} with σ(P_1) = 0. Hence, if there is θ ∈ Θ_{m,n} with θ(P_1) = 0, then θ(H) = 0, which implies that θ can be decomposed into θ = θ_0 · θ_1, where θ_0 ∈ Θ_{m,n−k} and θ_1 = 1 − e^{2πi k x_m}, k ≥ 1. Therefore, θ_1(x) ≠ 0 for all x ∈ R^m with π_{m,n}(x) ∉ π_{m,n}(H). By (b), there is no polynomial ν ∈ Θ_{m,n−1} with ν(P_2) = 0. Hence, there is no trigonometric polynomial θ ∈ Θ_{m,n} with θ(P) = 0, which is equivalent to P being generic.

The recursive decomposition defined by Theorem 3 allows us to decompose the PIP/FIP into smaller and therefore simpler sub-problems:

Theorem 4: Let m, n ∈ N, m ≥ 1, let f : R^m → R be a given function, let H ⊆ R^m be a hyperplane of co-dimension 1, and let Q_H ∈ R[x_1, ..., x_m] be a polynomial with deg(Q_H) = 1 and Q_H^{−1}(0) = H. Further, let H and P = P_1 ∪ P_2 ⊆ R^m be such that (i) and (ii) of Theorem 3 hold with respect to H. Require Q_1, Q_2 ∈ R[x_1, ..., x_m] to be such that:
(i) Q_1 has deg(Q_1) ≤ n and solves the PIP with respect to f and P_1 on H.
(ii) Q_2 has deg(Q_2) ≤ n−1 and solves the PIP with respect to f̂ := (f − Q_1)/Q_H and P_2 = P \ H on R^m.
Then, Q = Q_1 + Q_H Q_2 is the uniquely determined polynomial with deg(Q) ≤ n that solves the PIP with respect to f and P on R^m.

For the FIP, we analogously find:

Theorem 5: Let m, n ∈ N, m ≥ 1, let f : R^m → C be a given periodic function, let ξ_1, ..., ξ_m be the standard basis, b ∈ R^m, I ⊆ {1, ..., m} with |I| = m − 1, H = H_{I,ξ,b}, and let P be such that (a) and (b) of Theorem 3 hold with respect to H. Consider θ_H(x) = 1 − e^{2πi ξ_j·(x−b)}, j ∈ {1, ..., m} \ I, with π_{m,n}(θ_H^{−1}(0)) = π_{m,n}(H). Require θ_1 ∈ Θ_{m,n}, θ_2 ∈ Θ_{m,n−1} to be such that:
(a) θ_1 has deg(θ_1) ≤ n and solves the FIP with respect to f and P_1 on H.
(b) θ_2 has deg(θ_2) ≤ n − 1 and solves the FIP with respect to f̂ := (f − θ_1)/θ_H and P_2 on R^m.
Then, θ = θ_1 + θ_H θ_2 is the uniquely determined trigonometric polynomial θ ∈ Θ_{m,n} that solves the FIP with respect to f and P on R^m.

Proof: We again prove the claim for the FIP; the proof for the PIP is analogous. By our assumptions on θ_H and θ_1, we have that θ(x) = θ_1(x) ∀ x ∈ H and therefore θ(p) = f(p) ∀ p ∈ P_1. At the same time, θ_H(x) ≠ 0 for all x ∈ R^m with π_{m,n}(x) ∉ π_{m,n}(H). Therefore, f̂ is well defined on P_2, and θ(p) = θ_1(p) + f̂(p)θ_H(p) = f(p) ∀ p ∈ P_2. Hence deg(θ) ≤ n, and θ solves the FIP with respect to f and a generic set of nodes P ⊆ R^m. Thus, θ is the unique solution of the FIP with respect to P and f.

We close this section by stating the main result in a more precise way and delivering its proof.

Theorem 6 (Main Result): Let m, n ∈ N, m ≥ 1, let f : R^m → R be a function, and let g : R^m → C be a periodic function. Then there exist algorithms with runtime complexity O(N(m,n)^2), requiring storage in O(mN(m,n)), that compute:
(A1) a generic set of nodes P_0 ⊆ R^m for the PIP with respect to (m, n);
(A2) a polynomial Q ∈ R[x_1, ..., x_m] with deg(Q) ≤ n, such that Q is the unique solution of the PIP with respect to f and P'_0 = P_0 + η, where P_0 was generated in (A1) and η ∈ R^m is an arbitrary vector;
(B1) a generic set of nodes P_1 ⊆ R^m for the FIP with respect to (m, n);
(B2) a trigonometric polynomial θ ∈ Θ_{m,n}, such that θ is the unique solution of the FIP with respect to g and P'_1 = P_1 + ζ, where P_1 was generated in (B1) and ζ ∈ R^m is an arbitrary vector.

Proof: Again, we show (B1) and (B2); (A1) and (A2) follow analogously. To do so, we claim that there is a constant C ∈ R^+ and an algorithm computing (B1) and (B2) in less than CN(m,n)^2 operations. To prove this claim, we argue
by induction on N(m, n). If N(m, n) = 1, then n = 0 and, according to Sect. 5, the claim holds. Now let N(m, n) > 1. If m = 1 or n = 1, Propositions 1 and 3 provide us with the required runtime bounds; thus, the claim holds by choosing C as the largest occurring constant. If m > 1 and n > 1, we choose θ_H(x) = 1 − e^{2πi e_m·(x−b)}, b ∈ R^m, e_m = (0, . . . , 0, 1)^T ∈ R^m, and consider the hyperplane H = θ_H^{-1}(0). By translation we get H ≅ R^{m−1}. Thus, induction yields that we can determine a set of generic nodes P_{1,a} ⊆ H in less than CN(m − 1, n)^2 operations. Induction also yields that a set P_{1,b} ⊆ R^m of generic nodes can be determined with respect to degree n − 1 in less than CN(m, n − 1)^2 operations. By translating P_{1,b} with ζ ∈ R^m \ H, i.e., setting P_{1,b}' = P_{1,b} + ζ, we can guarantee that π_{m,n}(P_{1,b}') ∩ π_{m,n}(H) = ∅. Hence, the union P_1 = P_{1,a} ∪ P_{1,b}' of the corresponding generic sets of nodes is also generic due to Theorem 3, proving (B1).
It is trivial to see that N(m − 1, n) + N(m, n − 1) = N(m, n). Thus, by (B1) and induction, trigonometric polynomials θ_1 ∈ Θ_{m−1,n}, θ_2 ∈ Θ_{m,n−1} can be determined in less than CN(m−1, n)^2 + CN(m, n−1)^2 ≤ CN(m, n)^2 operations, such that θ_1 solves the FIP on H with respect to g and P_{1,a}, while θ_2 solves the FIP with respect to ĝ = (g − θ_1)/θ_H and P_{1,b}'. Due to Theorem 5, we have that θ_1 + θ_H θ_2 solves the FIP with respect to g and P_1. Thus, for any ζ ∈ R^m, setting g'(x) = g(x + ζ) yields a solution with respect to g and P_1' = P_1 + ζ.
Therefore, it remains to bound the steps required for computing θ_1 + θ_H θ_2. The bottleneck herein lies in the computation of θ_H θ_2, which requires C_2(m + 1)N(m, n − 1) operations, with constant C_2 ∈ R_+. Now observe that 2N(m − 1, 2) = m(m + 1) and N(m − 1, k) ≤ N(m − 1, k') for k ≤ k' ∈ N. Thus, (m + 1)N(m, n − 1) ≤ 2N(m − 1, n)N(m, n − 1) for n > 1, which shows that θ_1 + θ_H θ_2 can be computed in less than C_3 N(m − 1, n)N(m, n − 1) operations, with constant C_3 ∈ R_+. Hence, by using again N(m − 1, n) + N(m, n − 1) = N(m, n) and safely assuming 2C ≥ C_3, we have
C(N(m − 1, n)^2 + N(m, n − 1)^2) + C_3 N(m − 1, n)N(m, n − 1) ≤ CN(m, n)^2,
proving (B2).
We bound the memory complexity of the decomposition algorithm using an analogous induction. If N(m, n) = 1, we need to store at most CmN(m, n) = D numbers, D ∈ N. If N(m, n) > 1 and m = 1 or n = 1, Propositions 1 and 3 imply that we have to store the generic nodes, which requires D_1 mN(m, n) numbers, and the coefficients, requiring D_2 N(m, n) numbers. This shows the claim by setting D = max{D_1, D_2}. If N(m, n) > 1 and m, n > 1, using the same splitting of the problem as above, induction yields that we have to store at most D(m − 1)N(m − 1, n) or DmN(m, n − 1) numbers for each sub-problem, respectively. Altogether, we therefore need to store D(m − 1)N(m − 1, n) + DmN(m, n − 1) ≤ DmN(m, n) numbers, proving the storage complexity.
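The identity N(m − 1, n) + N(m, n − 1) = N(m, n) that drives this induction is simply the Pascal-type recurrence of binomial coefficients, since N(m, n) = (m + n)!/(m! n!) (cf. the estimate in Sect. 9.1). A minimal Python check (names are our own illustration, not the paper's code):

```python
from math import comb

def N(m, n):
    """Number of coefficients of a degree-n polynomial in m variables."""
    return comb(m + n, n)

# Pascal-type recurrence used in the runtime induction of Theorem 6:
assert all(N(m - 1, n) + N(m, n - 1) == N(m, n)
           for m in range(1, 10) for n in range(1, 10))
print(N(3, 3))  # 20 interpolation nodes for m = n = 3
```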
7 Algorithms
By recursively applying the decomposition of Theorems 3 and 4, we obtain a general fast PIP/FIP-SOLVER algorithm. The algorithm starts by constructing
Fast Interpolation and Fourier Transform in High-Dimensional Spaces
67
Fig. 1. The PIP decomposition tree TH,Q for m = n = 3.
a binary tree T_{H,Q} or T_{H,θ} until the leaves contain only linear and 1-dimensional sub-problems. These are then solved following Sect. 5, and the final result is composed according to the theorems. We illustrate this algorithm in the following examples.
Example 1: Let m = n = 3 and F = {e_1, e_2, e_3}. Then, the decomposition is illustrated in Fig. 1. The labels (k, l) ∈ N^2 on the vertices are the parameters of the sub-problem, i.e., k denotes the dimension and l the degree, while (H_ε, Q_ε), ε ∈ {0, 1}^d, d ∈ {1, 2, 3}, denote the solution Q_ε on the hyperplane H_ε, with d the tree depth. The Q_{H_ε} denote the polynomials defining the corresponding hyperplanes. For instance, setting
Q_{H_1}(x) = e_3·(x − 2e_3), Q_{H_{1,1}}(x) = e_2·(x − 2e_3 + 2e_2), Q_{H_{0,1}}(x) = e_3·(x + 4e_3), Q_{H_{0,1,1}}(x) = e_2·(x + 4e_3 − 4e_2), Q_{H_{1,0,1}}(x) = e_2·(x − 2e_3 − 8e_2)
defines the hyperplanes
H_0 = R^3, H_1 = H_{x,y} + 2e_3, H_{0,0} = R^3, H_{0,1} = H_{x,y} − 4e_3, H_{1,0} = H_1, H_{1,1} = H_x + 2e_3 − 2e_2, H_{0,1,1} = H_x − 4e_3 + 4e_2, H_{1,0,1} = H_x + 2e_3 + 8e_2,
where H_{x,y}, H_x, H_y, H_z denote the x,y-plane, the x-line, the y-line, and the z-line, respectively. Therefore, we get H_{0,1,1} = {x ∈ R^3 | Q_{H_0}(x) = Q_{H_{0,1}}(x) = Q_{H_{0,1,1}}(x) = 0} with Q_{H_0} ≡ 0. Indeed, no planes of equal dimension intersect, which implies that the splitting criteria of Theorems 3 and 4 are fulfilled for the PIP for these choices.
Example 2: We continue Example 1 by computing the interpolating polynomial for the generic nodes determined therein. Starting at the first leaf from the left, v_1 with parameters (1, 3), we compute a generic set of nodes P_{v_1} by following Proposition 1, and we compute the solution Q_{1,1} of the 1-dimensional PIP with respect to f and P_{v_1}. Then we go to the next leaf v_2 with parameters (1, 2) and
compute P_{v_2} and Q_{1,0,1} with respect to f̂ = (f − Q_{1,1})/Q_{H_1} and P_{v_2}. We proceed to v_3 with (2, 1) and use Proposition 3 to compute linear generic nodes P_{v_3} and Q_{1,0,0} with respect to f̂ = (f − Q_{1,1} − Q_{1,0,1}·Q_{H_{1,1}})/(Q_{H_1}·Q_{H_{1,1}}). Then, we can compose Q_1 = Q_{1,1} + Q_{H_{1,1}} Q_{1,0}, where Q_{1,0} = Q_{1,0,1} + Q_{H_{1,0,1}} Q_{1,0,0}. We then proceed to v_4 with parameters (1, 2), solving the 1-dimensional PIP with respect to
f̂ = (f − Q_1)/Q_{H_1} = (f − Q_{1,1} − Q_{1,0,1}·Q_{H_{1,1}} − Q_{1,0,0}·Q_{H_{1,1}}·Q_{H_{1,0,1}})/Q_{H_1}.
We solve the PIP for the remaining leaves analogously, observing that finally
Q = Q_1 + Q_{H_1} Q_0 = Q_{1,1} + Q_{H_{1,1}} Q_{1,0} + Q_{H_1}(Q_{0,1} + Q_{H_{0,1}} Q_{0,0}) = Q_{1,1} + Q_{H_{1,1}}(Q_{1,0,1} + Q_{H_{1,0,1}} Q_{1,0,0}) + Q_{H_1}(Q_{0,1,1} + Q_{H_{0,1,1}} Q_{0,1,0} + Q_{H_{0,1}} Q_{0,0})
solves the PIP with respect to f and P on R^3.
Remark 1: Classical methods for polynomial interpolation are based on the inversion of the Vandermonde matrix, which requires additional communication overhead. In contrast, our approach requires less memory (O(mN(m, n)) instead of O(N(m, n)^2)) and less computation (O(N(m, n)^2) instead of O(N(m, n)^{2.8}) [14]). Thus, large instances, e.g. m ≥ 1000 and n ≤ 5, can be solved on standard hardware.
8 Numerical Experiments
We verify and illustrate some of the findings of the previous sections in numerical experiments. We present experimental results only for the PIP, as the implementation of a FIP-SOLVER is work in progress. For benchmarking, we use a prototype MATLAB (version R2015b (8.6.0.267246)) implementation of the algorithm described above, running on an Apple MacBook Pro (Retina, 15-in., Mid 2015) with a 2.2 GHz Intel Core i7 processor and 16 GB 1600 MHz DDR3 memory under macOS Sierra (version 10.12.14). For given m, n ∈ N and a function f : R^m → R, we compare the following approaches:
(i) Our solution described in the previous section, which we call PIP-SOLVER.
(ii) Finding generic nodes P = {p_1, . . . , p_N}, N = N(m, n), using our approach, but then solving the PIP by generating the Vandermonde matrix V_{m,n}(P) and using the MATLAB linear solver, which uses an LU-decomposition, to compute C ∈ R^N with V_{m,n}(P)C = F, F = (f(p_1), . . . , f(p_N))^T. We call this approach Linsolve.
(iii) Finding generic nodes P = {p_1, . . . , p_N}, N = N(m, n), using our approach, but then solving the PIP by generating the Vandermonde matrix V_{m,n}(P) and using MATLAB matrix inversion, a hybrid of modern matrix inversion approaches, to compute C ∈ R^N as C = V_{m,n}(P)^{-1}F, F = (f(p_1), . . . , f(p_N))^T. We call this approach Inversion.
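To make the baselines concrete for readers without MATLAB, here is a minimal NumPy sketch of the Linsolve variant for the 1-dimensional case (function and variable names are our own, not the benchmark code):

```python
import numpy as np

def linsolve_pip_1d(nodes, values):
    """Solve the 1-D PIP by building the Vandermonde system V C = F."""
    V = np.vander(nodes, increasing=True)   # V[i, j] = nodes[i] ** j
    return np.linalg.solve(V, values)       # coefficients C

nodes = np.linspace(-1.0, 1.0, 4)               # 4 nodes suffice for degree n = 3
coeffs = np.array([0.5, -1.0, 2.0, 0.25])       # ground truth, increasing degree order
values = np.polyval(coeffs[::-1], nodes)        # evaluate the polynomial at the nodes
print(np.allclose(linsolve_pip_1d(nodes, values), coeffs))  # True
```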
Fig. 2. Runtimes for n = 3, m = 2, . . . , 35.
Experiment 1: We first compare the accuracy of the three approaches, which also serves to validate our method. To do so, we choose uniformly distributed random numbers C = (c_0, . . . , c_{N−1}) ∈ [−1, 1]^N, N = N(m, n), to be the coefficients of a polynomial Q_f in m variables of degree deg(Q_f) = n. Then we generate a set of generic nodes P_{m,n} and evaluate F = (Q_f(p_1), . . . , Q_f(p_N))^T. Afterwards, we compute the solutions C_k of the PIP with respect to these nodes using the three approaches k = 1, 2, 3 above. Finally, we measure the numerical accuracy of each approach k by computing ‖C − C_k‖_{ℓ∞}, the maximum absolute error in any coefficient. Figure 3 shows the average and min–max span of the errors (over 10 repetitions with different i.i.d. random polynomials; the same 10 polynomials for the three approaches) for fixed degree n = 3 and dimensions m = 2, . . . , 35, plotted versus the quadratic size of the problem. In the case tested here, all methods show high accuracy, which reflects the fact that our generic nodes are well chosen. However, even though all approaches use the same "good" generic nodes, we observe a significant gain in accuracy for the PIP-SOLVER and, importantly, an error that seems to be independent of problem size.
Experiment 2: We compare the computational runtimes of the three approaches. For this, we choose uniformly distributed random function values F = (f_1, . . . , f_N) ∈ [−1, 1]^N, N = N(m, n), as interpolation targets. We then measure the time required to generate the generic nodes P = {p_1, . . . , p_{N(m,n)}} and add the time taken by each approach to solve the PIP with respect to f : R^m → R with f(p_i) = f_i, i = 1, . . . , N(m, n). The average and min–max span (over 10 repetitions with different i.i.d. random function values) are shown in Fig. 2 versus the square of the problem size. Because all repetitions use the same set of generic nodes, the min–max span error bars are very small.
Fig. 3. Accuracy of the three approaches for n = 3, m = 2, . . . , 35.
The data clearly show that the PIP-SOLVER scales significantly better with problem size than the other approaches. For small problems, however, the overhead of the PIP-SOLVER may not be amortized, and there is a cross-over (for our implementation and benchmark setup at m = 7 and n = 9) below which previous methods may be faster. This is shown in the inset plot for m = 2, . . . , 9 or n = 2, . . . , 10, respectively. However, the absolute runtimes are below 0.1 seconds at the point of cross-over, which may not be too relevant in practice. Moreover, we expect the cross-over to shift to lower N once a properly optimized implementation of the present approach is available. In any case, the results confirm that the PIP-SOLVER scales better than the previous approaches and that its computational cost is in O(N(m, n)^2).
9 Potential Applications
We close by highlighting some potential applications of the present algorithms in scientific computing and computational science. This list is by no means exhaustive, as PIPs and FIPs are fundamental components of many numerical methods, but the following applications may be less obvious.
9.1 Classical Numerical Computations
Let m, n ∈ N, m ≥ 1, and let f : Ω ⊆ R^m → R be a function. It is a classical task in numerical analysis to determine the integral ∫_Ω f dΩ, Ω ⊆ R^m, and the derivative ∂_v f(x_0) of f in direction v ∈ R^m at x_0 ∈ R^m. By solving the PIP, these quantities can easily be computed for the interpolation polynomial Q_f of f evaluated at some collocation points, converging to the corresponding quantities of f with
increasing degree deg(Q_f) = n. While classical numerical methods can compute ∂_v f(x_0) by sampling f locally around x_0, classical integration methods require sampling f on a discrete set D ⊆ Ω that exhausts the whole domain Ω at a certain resolution. D is often chosen as a regular grid with #D = r^m points, where r ∈ R_+ reflects the resolution. Thus, #D grows exponentially with increasing dimension, and so does the runtime of all classical integration methods. However, if f can be approximated by a polynomial of bounded degree, e.g. n ≤ 5, then we can determine the integral analytically. The estimate
N(m, n) = (m + n)!/(n! m!) ≤ (m + n)^n/n! ∈ O(m^n)
implies that, due to Theorem 6, this method requires runtime in O(m^{2n}), which is therefore polynomial in the dimension m for bounded degree n ∈ N. As suggested by Fig. 3, the PIP-SOLVER produces a constant absolute error ε_C in any coefficient, of order 10^{-11}, for n = 3 and m = 2, . . . , 35. If Ω ⊆ R^m is a hypercube of side length l, by summing up the integrals of every monomial we obtain the quadrature error bound
ε_T ≤ Σ_{α∈A_{m,n}} (ε_C/(|α|+1)) l^{|α|+1} ≤ (ε_C/(n+1)) l^{n+1} N(m, n).
Since l^{n+1} N(m, n) ∈ O(m^n) for fixed length l and degree n, the total error grows polynomially with increasing dimension, not exponentially as it does for the classical alternatives [20].
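To make the analytic-integration step concrete, here is a small Python sketch (our own illustration, not the paper's implementation) that integrates a multivariate polynomial, given as a coefficient map over multi-indices, over the hypercube [0, l]^m:

```python
def integrate_poly_hypercube(coeffs, l):
    """Integrate sum_alpha c_alpha * x^alpha over [0, l]^m analytically.

    coeffs: dict mapping multi-index tuples alpha -> coefficient c_alpha.
    Each monomial contributes c_alpha * prod_k l**(alpha_k + 1) / (alpha_k + 1).
    """
    total = 0.0
    for alpha, c in coeffs.items():
        term = c
        for a in alpha:
            term *= l ** (a + 1) / (a + 1)
        total += term
    return total

# Example: the integral of 1 + x*y over [0, 2]^2 is 4 + 4 = 8.
print(integrate_poly_hypercube({(0, 0): 1.0, (1, 1): 1.0}, 2.0))  # 8.0
```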
9.2 Numerical Solution of Partial Differential Equations
A core application of numerical analysis is approximating the solution of Partial Differential Equations (PDEs). This always involves a spatial discretization scheme or solver, of which there are three classes: collocation schemes, Galerkin schemes, and spectral methods. Spectral methods are based on Fourier transforms. Since FFTs are only efficient on regular Cartesian grids, spectral methods are hard to apply in complex geometries and in adaptive-resolution settings. The present FIP-SOLVER, however, is not limited to grids and could enable spectral methods on arbitrary distributions of discretization points in arbitrary geometries. Collocation methods can generally be understood as PIPs, as becomes obvious in the generalized formulation of finite-difference schemes and mesh-free collocation methods [21]. The inversion of the Vandermonde matrix implied in mesh-free methods, or the choice of mesh nodes in compact finite-difference schemes, could benefit from the algorithms presented here. Finally, Galerkin schemes are based on expanding the solution of the PDE in some function basis, which is essentially what the PIP and FIP do for the bases of monomials and trigonometric functions, respectively. It is therefore conceivable that the approaches presented here can be applied to other bases as well, potentially even non-orthogonal ones.
9.3 Cryptography
A perhaps surprising application is found in cryptography, where the PIP is used to "share a secret" by choosing a random polynomial Q ∈ Z[x] in dimension m = 1. Knowing the values of Q at n + 1 different nodes enables one to determine Q(0) mod p for some large prime number p ∈ N, while knowing only n "keys" prevents one from opening the "door". Certainly, this method can be generalized to arbitrary dimensions, and our approach therefore applies to this type of cryptography [22]. In particular, the improved numerical accuracy of the PIP-SOLVER would prevent the reconstructed message from being corrupted by numerical noise.
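For reference, a minimal Python sketch of the classical 1-dimensional scheme [22] (parameter names and the small prime are our own illustrative choices; real deployments use cryptographically large primes and secure randomness):

```python
import random

P = 7919  # small prime modulus, for illustration only

def make_shares(secret, n, k):
    """Split `secret` into n shares; any k of them recover it."""
    coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
    return [(x, sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P)
            for x in range(1, n + 1)]

def recover(shares):
    """Lagrange interpolation at x = 0 over the field GF(P)."""
    secret = 0
    for j, (xj, yj) in enumerate(shares):
        num, den = 1, 1
        for i, (xi, _) in enumerate(shares):
            if i != j:
                num = num * -xi % P          # numerator: prod of (0 - x_i)
                den = den * (xj - xi) % P    # denominator: prod of (x_j - x_i)
        secret = (secret + yj * num * pow(den, P - 2, P)) % P
    return secret

shares = make_shares(1234, n=5, k=3)
print(recover(shares[:3]))  # 1234
```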
9.4 Black-Box Gradient Descent
Gradient descent over multivariate functions is often used to (locally) solve non-convex optimization problems, where Ω ⊆ R^m models the space of possible solutions for a given problem and f : R^m → R is interpreted as an objective function. Thus, one wants to minimize f on Ω. Often, the function f is not explicitly known for all x ∈ Ω, but can be evaluated point-wise. If m ≫ 1, an interpolation of f on Ω has so far often not been possible or not been considered. As a consequence, classical numerical methods like the Newton–Raphson iteration could not be used; instead, discrete or stochastic gradient descent methods are applied to minimize f. However, these methods are time-consuming and inaccurate. Our approach allows interpolating f even for m ≫ 1 and therefore allows applying the classical Newton–Raphson method. Consequently, local minima can be found much faster and more accurately than with the previous methods. Moreover, the analytical representation of f allows determining the global minimum, see [23,24].
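As a toy illustration (our own, with hypothetical numbers): once f is interpolated by a quadratic model Q(x) = c + g·x + ½ x^T H x, a single Newton step jumps directly to the stationary point x* = −H^{-1}g, instead of taking many small stochastic-gradient steps:

```python
import numpy as np

g = np.array([1.0, -2.0])                 # gradient of the quadratic model at 0
H = np.array([[4.0, 0.0], [0.0, 2.0]])    # Hessian of the quadratic model
x_star = -np.linalg.solve(H, g)           # Newton step: solve H x = -g
print(x_star)                             # [-0.25  1.  ]
```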
9.5 Analytical Gradient Descent
In principle, gradient descent methods use the following fact: let x = x(t) be the global solution of the ODE ẋ(t) = −∇f(x(t)), x(0) = x_0, where f : Ω ⊆ R^m → R. Then it is well known in Morse theory [25] that the set of initial points x_0 ∈ Ω for which x(+∞) is not a minimum, i.e., for which x(+∞) is a saddle point, is of lower dimension than m and is therefore a zero set with respect to the Lebesgue measure. Thus, in the generic situation, i.e., for f being a Morse function and for almost all x_0, we have that x(+∞) is a local minimum. If f can be interpolated by a polynomial Q_f, one can hope to solve the ODE ẋ(t) = −∇f(x(t)) analytically. For example, if f is a quadratic polynomial, the ODE becomes linear and therefore easily solvable. Consequently, by sampling over initial conditions, the global optimum of the interpolation polynomial can be found faster than by using iterative gradient descent methods.
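To spell out the quadratic case (our own one-line derivation, not from the paper): for $f(x) = \frac{1}{2}(x - x^*)^T H (x - x^*)$ with $H$ positive definite, the gradient flow is the linear ODE

$\dot{x}(t) = -H\,(x(t) - x^*), \qquad x(t) = x^* + e^{-Ht}\,(x_0 - x^*),$

so $x(+\infty) = x^*$ for every initial condition $x_0$, without any iterative descent.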
9.6 Deep Learning
Deep Learning methods consider artificial neural networks consisting of layers X_0, X_1, . . . , X_K, K ≥ 1, where X_i = (x_{i1}, . . . , x_{ik_i}), k_i ∈ N, i = 0, . . . , K. X_0 is the input layer, X_K is the output layer, and all other X_i are hidden layers. Each neuron x_{ij} possesses an activation function α_{ij} : [0, 1] → [0, 1]. In general, each neuron x_{ij} can send a signal to any other neuron x_{pq} with rate R_{ijpq} ∈ [0, r], r ∈ R_+. For chosen activation functions α_{ij}, training the network aims to adjust the rates R ∈ ℛ such that the output X_K solves a given problem [2]. Training is formulated as an optimization problem over a set of training data T = {(X_{0,1}, X_{opt,1}), . . . , (X_{0,d}, X_{opt,d})}, d ∈ N, of input and optimal output layers. Further, an energy/loss function E : ℛ → R_+ is introduced to measure the quality of a generated output, e.g., E(R) = Σ_{i=1}^{d} ‖X_{K,i} − X_{opt,i}‖^2, where X_{K,i} is the output for input X_{0,i} with respect to R. We denote by σ : X_0 × ℛ → X_K the signal function of the neural network, depending on the input layer and the rates. Then, back-propagation methods allow a stochastic gradient approximation of E : ℛ → R_+ at R_0 ∈ ℛ, i.e.,
∇_R E(R_0) = Σ_{i=1}^{d} ∇_R ‖σ(X_{0,i}, R_0) − X_{opt,i}‖^2.
Discrete gradient descent then yields a locally optimal configuration of rates R_opt ∈ ℛ. The weakness of such training is that a large amount of training data is required, i.e., d ≫ 1, and that the gradient descent is time-consuming and inaccurate, as already mentioned in Sect. 9.4. Using the PIP-SOLVER, we can approximate σ by interpolating all coordinate functions σ_i, i = 1, . . . , k_K. This fact has three crucial consequences:
(C1) For given training data T, we can parameterize the manifold of optimal rates by M_{R,T} = {R ∈ ℛ | σ(X_{0,i}, R) = X_{opt,i} ∀ i = 1, . . . , d}. Since the coordinate functions of σ are understood as polynomials, M_{R,T} is given as an intersection of algebraic hyper-surfaces. M_{R,T} constrains the rates as known in manifold learning [2]. Furthermore, if R_0 ∈ M_{R,T} is a design center, i.e., R_0 maximizes the distance dist(R_0, ∂M_{R,T}) to the boundary of M_{R,T}, then R_0 is robust against perturbations, and it is therefore likely that R_0 will remain a good set of rates also for new inputs that were not trained so far. An efficient randomized algorithm for finding approximate solutions to the NP-hard design-centering problem has recently become available [26]. In addition, the volume of M_{R,T} can be approximated by this algorithm, yielding a measure of whether the neural network is over-fitted or not. If vol(M_{R,T}) ≪ vol(ℛ) = r^{dim ℛ}, then the neural network is likely to be over-fitted and not to generalize robustly.
(C2) We can apply the more efficient gradient descent methods introduced in Sects. 9.4 and 9.5 to more rapidly find optimal rates R_0 ∈ ℛ. If R_0 is such that R_0 ∈ M_{R,T}, then we additionally know that R_0 is a global minimum of
E ◦ σ with respect to the training data T, i.e., E ◦ σ is minimal on M_{R,T}. For larger training data T' ⊇ T, we have that M_{R,T'} ⊆ M_{R,T}. Thus, starting with R_0 ∈ M_{R,T}, it suffices to apply a gradient descent tangentially to M_{R,T} to reach a point R_1 ∈ M_{R,T'}.
(C3) If #ℛ is too large to interpolate σ at once in reasonable time, an admissible decomposition of the network might enable interpolations on the resulting sub-networks. Consequently, a hybrid approach of stochastic gradient descent and signal interpolation might improve the current state-of-the-art performance.
10 Conclusion
We have presented efficient solutions of the PIP and FIP in high dimensions and proved their correctness and runtime bounds. We experimentally evaluated a prototype implementation of the PIP-SOLVER and have shown that it performs better than the known alternatives and scales as theoretically predicted, which is important for Big-Data applications. The FIP-SOLVER also relaxes the requirement of FFT methods that the points lie on a regular Cartesian grid. Summarizing, our results open the door to significant progress in the computational sciences, with many potential applications. In the future, besides working on those applications, developing interpolations with respect to scattered data and other L^2 function bases seems to be the most promising direction.
References

1. Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, New York (2006)
2. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016)
3. Pal, A., Pal, S.K.: Pattern Recognition and Big Data. World Scientific (2016)
4. Chu, E., George, A.: Inside the FFT Black Box: Serial and Parallel Fast Fourier Transform Algorithms. CRC Press (1999)
5. Rockmore, D.N.: The FFT: an algorithm the whole family can use. Comput. Sci. Eng. 2(1), 60–64 (2000)
6. Edwards, K.J., Gaber, M.M.: Astronomy and Big Data: A Data Clustering Approach to Identifying Uncertain Galaxy Morphology. Springer, Heidelberg (2014)
7. Kremer, J., Stensbo-Smidt, K., Gieseke, F., Pedersen, K.S., Igel, C.: Big universe, big data: machine learning and image analysis for astronomy. IEEE Intell. Syst. 32(2), 16–22 (2017)
8. Altaf-Ul-Amin, M., Afendi, F.M., Kiboi, S.K., Kanaya, S.: Systems biology in the context of big data and networks. BioMed Research International, vol. 2014 (2014)
9. Howe, D., Costanzo, M., Fey, P., Gojobori, T., Hannick, L., Hide, W., Hill, D.P., Kania, R., Schaeffer, M., St. Pierre, S., et al.: Big data: the future of biocuration. Nature 455(7209), 47–50 (2008)
10. Telenti, A., Pierce, L.C., Biggs, W.H., di Iulio, J., Wong, E.H., Fabani, M.M., Kirkness, E.F., Moustafa, A., Shah, N., Xie, C., et al.: Deep sequencing of 10,000 human genomes. Proc. Natl. Acad. Sci. USA (2016). 201613365
11. Walker, S.J.: Big data: a revolution that will transform how we live, work, and think (2014)
12. Nunan, D., Di Domenico, M.: Market research & the ethics of big data. Int. J. Market Res. 55(4), 505–520 (2013)
13. Hecht, M., Cheeseman, B.L., Hoffmann, K.B., Sbalzarini, I.F.: A quadratic-time algorithm for general multivariate polynomial interpolation. arXiv preprint arXiv:1710.10846 [math.NA]
14. Strassen, V.: Gaussian elimination is not optimal. Numerische Mathematik 13(4), 354–356 (1969)
15. Cooley, J.W., Tukey, J.W.: An algorithm for the machine calculation of complex Fourier series. Math. Comput. 19, 297–301 (1965)
16. Beard, J.: The FFT in the 21st Century: Eigenspace Processing. Springer, Heidelberg (2013)
17. Olver, P.J.: On multivariate interpolation. School of Mathematics, University of Minnesota, Minnesota (2009)
18. Yao, T.H., Chung, K.C.: On lattices admitting unique Lagrange interpolations. SIAM J. Numer. Anal. 14(4), 735–743 (1977)
19. Meijering, E.: A chronology of interpolation: from ancient astronomy to modern signal and image processing. Proc. IEEE 90(3), 319–342 (2002)
20. Stoer, J., Bauer, F.L., Bulirsch, R.: Numerische Mathematik, vol. 4. Springer, Heidelberg (1989)
21. Schrader, B., Reboux, S., Sbalzarini, I.F.: Discretization correction of general integral PSE operators for particle methods. J. Comput. Phys. 229(11), 4159–4182 (2010). https://doi.org/10.1016/j.jcp.2010.02.004
22. Shamir, A.: How to share a secret. Commun. ACM 22(11), 612–613 (1979). http://doi.acm.org/10.1145/359168.359176
23. Hanzon, B., Jibetean, D.: Global minimization of a multivariate polynomial using matrix methods. J. Global Optim. 27(1), 1–23 (2003). http://dx.doi.org/10.1023/A:1024664432540
24. Parrilo, P.A., Sturmfels, B.: Minimizing polynomial functions. Algorithmic and Quantitative Real Algebraic Geometry, DIMACS Series in Discrete Mathematics and Theoretical Computer Science 60, 83–99 (2003)
25. Milnor, J.: Morse Theory. Annals of Mathematics Studies, vol. 51. Princeton University Press (1963)
26. Asmus, J., Müller, C.L., Sbalzarini, I.F.: Lp-adaptation: simultaneous design centering and robustness estimation of electronic and biological systems. Sci. Rep. 7 (2017)
A Reconfigurable Architecture for Implementing Locally Connected Neural Arrays J. E. G-H-Cater(B) , C. T. Clarke, B. W. Metcalfe, and P. R. Wilson Department of Electronic and Electrical Engineering, University of Bath, Bath, England
[email protected]
Abstract. Moore's law is rapidly approaching a long-predicted decline, and with it the performance gains of conventional processors are becoming ever more marginal. Cognitive computing systems based on neural networks have the potential to provide a solution to the decline of Moore's law. Identifying common traits in neural systems can lead to the design of more efficient, robust and adaptable processors. Despite this potential, large-scale neural systems remain difficult to implement due to constraints on scalability. Here we introduce a new hardware architecture for implementing locally connected neural networks that can model biological systems with a high level of scalability. We validate our architecture using a full model of the locomotion system of Caenorhabditis elegans. Further, we show that our proposed architecture achieves a ninefold increase in clock speed over existing hardware models. Importantly, the clock speed for our architecture is found to be independent of system size, providing an unparalleled level of scalability. Our approach can be applied to the modelling of large neural networks, with greater performance, easier configuration and a high level of scalability. Keywords: Neural network · Neuromorphic · PNAA · Caenorhabditis elegans · Reconfigurable hardware
1 Introduction
Moore's Law has, thus far, provided a clear path for improving computational solutions and their associated power efficiency through the shrinking of transistor scales. However, Moore's Law is fast approaching its long-predicted end, and novel ways to continue the trend must therefore be developed [1]. Carver Mead drew parallels between the operation of neurons and that of transistors, concluding that greater efficiency may be achieved in transistor-based systems by fundamentally changing the way the devices are used [2]. Mead went on to coin the term 'Neuromorphic Systems' to refer to computing platforms that are in some way inspired by naturally occurring neural processing systems, and since this original work there have been many attempts to implement various types of neuromorphic systems.
In the last two decades, there has been an increase in research efforts to advance our understanding of brain behaviour using massively parallel processing architectures [1]. Many such systems use communications principles, such as packet-switched networks, to produce a fully connected and therefore highly configurable neural architecture. Full connectivity between all neurons provides maximum flexibility, at the potential price of redundancy in unused connections. While beneficial for investigating new topologies or training regimes, this exhaustive architecture can result in large hardware redundancies. The internal routing often presents design challenges, and contributes significantly to the scale or power consumption of the final device. Drawing inspiration from nature, this work proposes a new reconfigurable neural architecture that is optimised for predominantly locally connected network implementations. The relevant approaches for modelling neurons are discussed in Sect. 2, where the key aspects, features and limitations of the various techniques are reviewed. Section 3 outlines existing hardware neural arrays, comparing their power consumption and implementations. These concepts are then encapsulated in a proposed programmable neural array architecture (PNAA) in Sect. 4. Finally, Sect. 5 details a C. elegans locomotive model implementation using the PNAA of Sect. 4, with the validation of the results against previous work and a discussion given in Sect. 6.
2 Modelling of Neurons
Representing the basic actions of a neural network in hardware requires the selection of an appropriate and representative model for the particular action or mechanism of interest. There are a significant number of models available, and a large subset of these see frequent use in neuromorphic systems. It is important to note that neuromorphic hardware uses a representation of a neuron (and synaptic behaviour) to provide a suitably accurate implementation, not simply for simulation purposes. Cases that require detailed neurological representations (such as applications interfacing directly to the nervous system itself) may require biophysically accurate details, allowing for the precise and complex emulation of neural systems. In contrast, neuromorphic systems that are designed for a single computational task may use a more abstract approach, resulting in greater computational efficiency at the cost of biophysical accuracy. A number of available models are shown in Fig. 1, with the trade-off between biophysical accuracy and computational efficiency illustrated. One of the most famous biophysically accurate models was constructed by Alan Hodgkin and Andrew Huxley, who were exploring the relationship between chemical and electrical signalling within a squid (Loligo forbesi) giant axon [3]. This model has had a large impact in the field of neuromorphic systems [4]; however, its use is limited because it is both computationally expensive and does not scale well for simulations of many neurons at a time. A significantly
simpler model, the binary model, was first developed in 1943 by McCulloch and Pitts [5]. In this model, each neuron has one or more inputs and a single output. The inputs are summed and compared with an internal threshold that, when exceeded, activates the output. This economical approach to neural dynamics is highly efficient, but less biologically relevant.
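As a concrete illustration (our own minimal sketch, not taken from [5]), the McCulloch-Pitts neuron just described reduces to a sum and a threshold comparison:

```python
def binary_neuron(inputs, threshold):
    """McCulloch-Pitts style unit: fire iff the summed inputs exceed the threshold."""
    return 1 if sum(inputs) > threshold else 0

print(binary_neuron([1, 0, 1], 1.5))  # 1: two active inputs exceed the threshold
```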
Fig. 1. The trade-off between biophysical accuracy and computational efficiency for a number of model classes.
There are some models that appear to exist outside of the scale shown in Fig. 1, such as the Izhikevich model, which uses bifurcation theory to greatly simplify the Hodgkin-Huxley dynamics [6]. The resultant model is highly efficient while providing a good approximation of the bursting dynamics seen within nature; however, the accuracy of the temporal morphology of individual action potentials is forfeited. Different models will require different communications hardware support. Biophysically accurate models may require attenuation or dendrite emulation as an inherent feature of their connection implementation. Spiking neural networks will produce sparse pulse trains which must be transmitted with known timings and amplitudes, while binary systems will only require the communication of a single bit. The model selection for any neuromorphic system must therefore be considered before the interconnect technology is designed. The trade-off for the modelling of neurons depends partly on the scale of the network to be analysed. For example, if the number of neurons is low, the scope
for using detailed biophysical models may be justified; however, as the number of neurons moves into the hundreds (and beyond), the required simulation time may be difficult to justify. An effective understanding of biological behaviour in neural networks can be achieved by using a simplified approach to modelling neurons, either in simulations or on hardware emulators [7]. The method of choice for implementing neural networks can therefore be made in a pragmatic fashion based on a trade-off between accuracy and speed, depending largely on the scale and size of the network in question.
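For instance, the Izhikevich model mentioned above reduces to two coupled update equations plus a reset rule; a minimal Euler-integration sketch in Python (using the published 'regular spiking' parameters from [6]; the step size and input current are our own illustrative choices):

```python
def izhikevich(I, T_ms=1000, dt=0.5, a=0.02, b=0.2, c=-65.0, d=8.0):
    """Simulate one Izhikevich neuron [6] and return its spike times (ms)."""
    v, u, spikes = c, b * c, []
    for step in range(int(T_ms / dt)):
        v += dt * (0.04 * v * v + 5.0 * v + 140.0 - u + I)  # membrane potential
        u += dt * a * (b * v - u)                           # recovery variable
        if v >= 30.0:                                       # spike: record and reset
            spikes.append(step * dt)
            v, u = c, u + d
    return spikes

print(len(izhikevich(I=10.0)))  # number of spikes in one simulated second
```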
3 Implementing Neural Architectures
The implementation of connectivity in neuromorphic hardware is device specific. For example, the SpiNNaker platform has the ability to emulate massively parallel neural networks. It uses a packet-switched asynchronous network, based on a fabric of ARM9 cores on dedicated multi-core processors [8–10]. The SpiNNaker architecture is highly flexible; however, it suffers from excessively high power consumption when compared to other approaches. SpiNNaker's key strength is its ability to handle arbitrary local and global connections; in other words, it supports full arbitrary connectivity between any neurons, and it has been used in a range of applications including tasks such as vision systems [11]. Other example neuromorphic systems are TrueNorth and Neurogrid. TrueNorth is a platform whereby (local) short-distance communications are handled by an on-chip 64K crossbar array and asynchronous routers, while (global) long-distance communications are handled by protocols for intra-chip connections [12]. In contrast, Neurogrid is a mixed analogue-digital implementation that operates in the deep sub-threshold region of the semiconductor device. Synaptic inputs between spatially neighbouring neurons are distributed using a custom local arbor system, or a multi-cast router [13]. A common denominator of neuromorphic platforms is the flexibility to handle arbitrary levels of connectivity between any and all neurons. Whilst this is a clear advantage in general, for specific examples it can lead to reduced efficiency. A key approach to managing this issue is the implementation of a neural fabric that can be optimally configured for different scales of connectivity, whilst retaining the reconfigurability of the large-scale neuromorphic implementations.
3.1 Modelling Connectivity
In order to understand the connectivity in hardware neural networks, we can apply 'Rent's Rule' to relate the internal (local) and external (global) connections in the network. Rent's Rule was described in [14]; its basis was first explained earlier in [15]. More specifically, Rent's rule is the relationship between the number of terminals (T) and the number of internal components, such as logic gates (g), given by
T = t g^p    (1)
where t and p are parameters that define the connectivity of the network in question, with p < 1. The relation can be used to indicate the level of connectivity, for example in neural networks, in which the terminals are synapses and the gates are neurons. Understanding the value of p and how it correlates to a network's connectivity can be used to optimize the resulting hardware implementation. An important aspect of Rent's rule is that if the equation is re-framed in terms of logarithms [16], the relationship between T and g becomes linear when both axes are expressed in logarithmic terms, as shown in (2):
log(T) = log(t) + p · log(g)    (2)
This results in an offset value log(t) and a slope (in the log domain) determined by the Rent coefficient p. One of the interesting observations in nature is that this coefficient tends to be the same for all neural structures: typically it is about p = 0.75, both for lower-order animals such as C. elegans and for human brains. A random circuit, however, has a Rent value of p = 1 [16].
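In practice, p can be estimated from terminal and gate counts of nested partitions by a least-squares fit in log-log space, per (2); a small NumPy sketch with synthetic data (values and names are our own illustration):

```python
import numpy as np

g = np.array([10, 100, 1000, 10000])         # components (neurons) per partition
T = 2.0 * g ** 0.75                          # terminals, synthesised with t = 2, p = 0.75
p, log_t = np.polyfit(np.log(g), np.log(T), 1)
print(round(p, 2), round(np.exp(log_t), 2))  # 0.75 2.0 -- recovered Rent parameters
```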
3.2 Hardware Comparison
A comparison of the relative power per neuron of SpiNNaker, Neurogrid and TrueNorth may be seen in Table 1. SpiNNaker is largely simulation focused, using conventional processors to emulate its neural models. Both Neurogrid and TrueNorth, however, are hardware implementations and have achieved a significant power efficiency as a result. This supports the theory that such dedicated hardware may provide low-power neuromorphic processing solutions.

Table 1. Power usage per neuron in three of the main hardware neural network systems

System      Quoted power (W)   Quoted neuron count   Power per neuron (µW)
SpiNNaker   0.75               17,000                44.12
Neurogrid   3.10               1,000,000             3.10
TrueNorth   0.65               1,000,000             0.65
All three of these systems make use of packet-based communications, resulting in fully connected systems (i.e., a Rent exponent of p = 1) that support the emulation of any network topology. From a practical perspective, the use of spiking neuron networks is an efficient mechanism for modelling neural networks, and is particularly appropriate for hardware implementation on digital platforms such as FPGAs. Given the locally connected nature of biological networks, it may be possible to improve on the capabilities of the packet-based systems by exploring the effect of local vs. global connectivity.
4 Existing Programmable Neural Array Architecture
In this section we demonstrate the limitations of an existing programmable neural array that exploits local connectivity, and we define our improved design that overcomes those limitations.
4.1 Locally Connected Architectures
An existing architecture for implementing locally connected neural arrays is that of Bailey [7]. In this approach, networks of neurons may be formed by connecting them all to a shared bus. If each neuron is assigned an address, it is possible to ensure that only one neuron drives the bus at a time using a global controller [7,17]. However, this implementation has the fundamental drawback that the address space must be accessed at every simulated time-step. The implication of such an approach is that two clocks must be implemented: the internal system clock, which drives the address unit, and the simulation clock, which marks off a single simulated time-step. Since every address (1 to n) must be accessed in each system step, the simulation clock frequency, F_c, is limited by the internal system clock frequency, F_i, as shown in (3):
F_c = F_i / n    (3)
This means that the simulation clock frequency is directly tied to the network size, making this design unscalable.
An alternative solution is to consider the basic structure of multilayer feed-forward neural networks, which are widely used in a number of applications [18–21]. In this type of system, the synapses and neurons are brought together in what is referred to as a 'node'. These nodes can be arranged in layers, with each layer connected to the previous and next layer by two separate buses. Input/Output (IO) units are situated on the extremities of each layer, and a full layer of IO units is added as the first and last layers of the system. Each row is assigned a unique address, and a global address controller is used to dictate which row of nodes drives the buses at any given time. An internal register in each node controls which addresses the node receives input from, allowing a node to connect to any of its neighbours in the previous layer. Each node can operate in a number of modes, such as forward, reverse or pass-through, depending on the requirements of the network. This system allows a designer to add as many layers as they desire, with only the internal system clock propagation delay acting to limit the simulation clock rate. The simulation clock frequency is limited by the number of nodes in a single layer. The internal clock frequency is limited by the node propagation delay, which may differ between network implementations if pass-through nodes are utilized.
4.2 2-Dimensional Hardware Architecture
We propose a 2-dimensional architecture that overcomes the scalability issues of locally connected networks present in the work of Bailey [7].
82
J. E. G-H-Cater et al.
This Programmable Neural Array Architecture (PNAA), shown in Fig. 2, expands the idea of locally connected networks into two dimensions.
Fig. 2. Infrastructure of the PNAA, showing a number of ‘loop’ configurations labelled A to E. The circular nodes are the IO blocks that surround all edges of the architecture, the square nodes represent the processing, or neuron, blocks. Potential connectivity is indicated by the dashed lines and in the example loops given actual connectivity is indicated by a solid line.
The system is made up of two distinct elements: the node, which is an arbitrary neuron implementation, and the IO block, operating as a connection to external actuators and sensors. Nodes are connected in loops, as shown in Fig. 2. All nodes on a loop can communicate with one another and are assigned an address within the loop, with the largest loop dictating the largest address required. The system clock speed is dependent on the network and may be loosely associated with the largest number of connections that the most connected neuron requires to or from itself. In Fig. 2, loop 'C' is seen to be the largest connected loop, with 8 units contained within its domain. The system clock must therefore run at least 7 times slower than the internal clock, regardless of the number of neurons used overall. The loops are implemented as a ring of shift registers, with a register located at each node input, meaning that the propagation delay remains fixed regardless of loop size. This architecture can mimic a fixed-width layered network by simply assigning loops down the full length of each column. This architecture supports two-dimensional scaling which is independent of the system clock timings. A pseudo-global connection can be achieved, with any node located a maximum of one node away from any other node, and a single neuron can send and receive in four directions at the same time, allowing for rapid transmission through a network. This architecture can support locally connected networks with a small number of global connections.
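The ring-of-shift-registers idea can be sketched in a few lines of Python (a behavioural toy of our own, not the RTL): each internal clock tick rotates every loop value by one position, so the per-hop delay is constant no matter how long the loop is, and an 8-node loop like 'C' needs 7 ticks for a value to reach every other node:

```python
def tick(loop):
    """One internal clock cycle: every node's input register latches the
    value held by its upstream neighbour on the previous cycle."""
    return [loop[-1]] + loop[:-1]

loop = [1, 0, 0, 0, 0, 0, 0, 0]   # a token injected at node 0 of an 8-node loop
for _ in range(7):
    loop = tick(loop)
print(loop)  # [0, 0, 0, 0, 0, 0, 0, 1]: the token reached node 7 after 7 ticks
```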
4.3 Intermediate Representation
In order to produce a scalable and practical PNAA, it is important to consider how a particular network may be synthesised onto hardware. There are a number of different methods for high-level representation of a network. An intermediate translation must be defined to allow a user to move from such high-level descriptions to a physically mapped solution, and this translation often reflects the physical architecture through its defined structure. A number of different connection options are shown in Fig. 3 as an example of the available connection routing. Each node can support four connections, with one starting on each face of the node. These optional connections are used to form loops of nodes, allowing data-flow between neurons.
Fig. 3. Examples of connection options for a node in the PNAA, the interconnect blocks may be configured for X = not used, 1 = straight and 0 = clockwise routing patterns. This enables a programmable connection path between a source node and a connected node.
The implications of the interconnect options offered by this architecture must also be considered when mapping a high-level design, and inherent to this is the concept of locality between neurons. For two neurons to be connected, they must exist within one another's neighbourhoods, and it is therefore up to the synthesis engine to find an arrangement of nodes in which all nodes sit within the neighbourhoods of their connected nodes. This task becomes a fitting problem, similar to that performed by FPGA synthesis engines. Neurons sitting on the same row or column share in the row or column neighbourhood, respectively. This can be used to create layers of neurons, allowing a simple implementation of feed-forward neural networks.
4.4 Configuration
Each node is connected to a set of interconnect switches that can be configured to route connections to the neighbouring nodes in a North, East, South, West fashion. The PNAA has a rectangular structure and so the configuration process may be split into columns, where each column starts and stops with an IO block.
Each neuron has three standard configurable parameters: the input weights, the input bias and the threshold. The bias and threshold values are singular for each neuron block, whereas the weights are a vector with an entry for each possible connected input.
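A plausible shape for such a per-node configuration record, sketched in Python (field names and routing labels are our own assumptions, mirroring the routing options of Fig. 3, not the paper's actual bitstream format):

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class NodeConfig:
    bias: float = 0.0
    threshold: float = 0.5
    weights: Dict[int, float] = field(default_factory=dict)  # loop address -> input weight
    routing: Dict[str, str] = field(default_factory=dict)    # face ('N','E','S','W') -> mode

cfg = NodeConfig(bias=0.1, threshold=0.8,
                 weights={3: 0.5, 7: -0.25},
                 routing={'N': 'straight', 'E': 'clockwise', 'S': 'unused', 'W': 'unused'})
print(cfg)
```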
5 Results from a C. elegans Locomotive Model
Section 4.2 has provided an overview of a new type of PNAA, demonstrating the fundamental principles by which locally connected neural networks may be implemented in an efficient fashion. In this section, we demonstrate this architecture using a working model of a biological system.
5.1 Overview of the C. elegans Locomotory System Model
The locomotory system of the free-living nematode Caenorhabditis elegans (C. elegans) was used to demonstrate the ability of this programmable neural array to replicate a real neural network that exhibits a high level of local connectivity. C. elegans is a transparent, free-living nematode with a generation time of about 3.5 days [7]. With only 302 neurons in its nervous system, C. elegans is the only organism to have had its connectome (the neuronal 'wiring diagram') fully mapped [22]. The relative simplicity and high level of local connectivity of the C. elegans locomotory nervous system make it a good naturally occurring network against which artificial solutions may be verified. Subsets of neurons within C. elegans can be identified by the incorporation of appropriate promoter sequences to drive expression of fluorescent proteins. A micrograph image is given in Fig. 4, where the animals were transgenic for the expression of two fluorescent proteins, DS-red and GFP (green fluorescent protein). The subset of neurones expressing the neuropeptide FLP-8 fluoresces green, and those containing the putative receptor fluoresce red. To demonstrate the operation of our model at a network level, we present simulations of the locomotory system of C. elegans on the PNAA described in Sect. 4.2. The structure of the C. elegans locomotory system is regular, and there are very few variations from animal to animal within the species. In total there are 86 neurons and 180 synapses [22]. There are six different types of neuron and four different types of synapse. Further, the locomotory system of C. elegans has already seen common usage, both in the MBED Cellular Automata Neuron model by Claverol et al. [23] and by Bailey [7]. The locomotion system that was modelled by Bailey used a direct-implementation VHDL model [17] and was based upon work performed by Claverol [23]. This model showed that an artificial neural network was capable of generating the same motor neuron patterns as had been observed in living C. elegans. The locomotive model of C. elegans may be divided into small, repeated segments (as shown in Fig. 5a). Each segment exhibits a high order of local connectivity, with relatively sparse connections from one segment to another.
By utilising the local connectivity seen within this model, it may be implemented using the PNAA system described in Sect. 4.2 (an example mapping is shown in Fig. 5b). The largest loop within this implementation, which lies along the width of a single segment, contains only 10 neurons. This system therefore requires only 9 clock cycles per simulated system time-step, regardless of the number of segments implemented. This represents a 9× speed increase for the 10-segment system when compared to the direct implementation of Bailey [7]. The sensory AVA and AVB neurons are actually two single neurons that connect along the entire length of the C. elegans. These neurons have no incoming connections from within the network, and they may therefore be divided into a number of identical neurons that are all driven by the same external stimulus, as shown in Fig. 5a and b.
5.2 Validation of the C. elegans Model Using Bi-phasic Stimulation
In order to validate the mapping of the model onto the PNAA, an RTL simulation was performed for a hypothetical C. elegans with 10 segments. Forward and backward locomotion was achieved using a bi-phasic control signal, which produced the appropriate stimulation to the head and tail of the model; the resulting motor neuron activations are shown in Fig. 6a. The top trace (Control) shows the control signal, resulting in motor neuron activation along both the ventral and dorsal sides in a sinusoidal fashion. The result of the bi-phasic control signal is an alternating forwards and backwards locomotion.
Fig. 4. Fluorescent subset of neurones within C. elegans showing projections running alongside the pharynx. The overall structure of the head region can be seen in the blue, overlaid transmission. (Image courtesy of John Chad and Ilya Zheludev).
Fig. 5. C. elegans locomotion model and equivalent architecture
Fig. 6. C. elegans simulation results.
The top bank of traces represents the Dorsal motor neurons and the lower bank the Ventral motor neurons; activation is shown as a solid block of colour due to the high-frequency oscillations of the neuron outputs. Neurons numbered 0 are at the head end, and 9 at the tail end. As the stimulus pulse is alternated from head to tail and back again, there is a propagating wave of muscle activation along the length of the model. This wave produces a sinusoidal locomotion action. Vertical lines are drawn every 500 ms on the x-axis. Activity begins on the Ventral side at the head (VM0) and propagates steadily down through the Dorsal muscles, reaching the tail end (VD9) in approximately 2900 ms. Comparing this result to those of Bailey and Claverol confirms that the same propagating waves are observed in all models. This evidence shows that the PNAA model behaves correctly when compared against previous work.
5.3 Validation of the C. elegans Model Coiling Behaviour
A second behaviour explored in earlier models was coiling. Figure 6b shows the activation of the Dorsal and Ventral motor neurons in response to a stimulus
applied to the Ventral side of both the head and tail simultaneously. Motor neuron activation begins at each end of the model and propagates towards the centre. This muscle activation results in a coiling motion towards the stimulus, which is consistent with the models of both Bailey and Claverol and further confirms the effectiveness of the PNAA model when compared to previous work.
5.4 Validation of the C. elegans Model UNC-25 Knockout Behaviour
A third behaviour explored in earlier models is the response when the UNC-25 gene knockout mutation is applied to the model. In this circumstance, the Dorsal and Ventral motor neurons' response to a constant stimulus is modified. Motor neuron activation begins on both the Dorsal and Ventral sides of the model and propagates from head to tail; this muscle activation results in a shrinking behaviour as each of the motor neurons locks into a constant firing mode. This behaviour manifests itself in both the Dorsal and Ventral motor neurons firing in the same sequence, leading to all the neurons firing simultaneously. This result is consistent with the models of both Bailey and Claverol and further confirms the effectiveness of the PNAA model when compared to their previous work.
6 Discussion of Results

6.1 Introduction
Given the range of possible modelling approaches for systems such as the C. elegans locomotory system, there is always a question over the correct level of detail needed to accurately reflect the true biological behaviour in simulation. In many cases it comes down to a subjective choice based on software preferences, hardware availability and ease of use. However, it is possible to have some basis for comparison using four fundamental criteria: biological accuracy, scalability, performance (which could be in terms of speed, area or both) and flexibility. Therefore, we will consider these four criteria in a qualitative manner, using quantifiable metrics wherever possible. In each case the proposed approach will be compared with the previous hardware implementations of Claverol [23] and Bailey [7].
6.2 Biological Accuracy
As demonstrated in Sect. 5, the basic locomotory behaviour, coiling, and the effect of the UNC-25 knockout have been validated in the model as implemented on the PNAA architecture. These results are consistent not only with the observed biological behaviour, but also with the hardware models developed by Claverol [23] and Bailey [7]. There are more mutation conditions that could be evaluated; however, these three fundamental tests validate the efficacy of the model, as they show the independent neuron behaviour under locomotion,
the asymmetric behaviour of coiling, and the shrinking behaviour of the UNC-25 knockout. This clearly demonstrates that the fundamental behaviour of the model is correct.
6.3 Scalability
The key advantage of our PNAA approach is its 2-dimensional nature and the ability to scale the model in both axes arbitrarily. Using the software interface, the design is abstracted to a network level, making it very easy to generate and programme a network accordingly. The connectivity is assumed to be highly local, and while this will not provide full connectivity for completely arbitrary networks, global connections can be defined in addition to the core local networks. This is analogous to the approach used in the well-known GALS systems. Importantly, this feature permits the rapid scaling of models such as C. elegans to an arbitrary number of segments without any change in network architecture.
6.4 Performance (Speed/Resources)
From a resource perspective, the PNAA is an efficient method, using a level of granularity that maps from the neuron network level directly to hardware, with sufficient local connectivity to ensure that the model utilises the connectivity available. The intrinsic efficiency of the clocking approach leads directly to a speed improvement of 9× for the 10-segment system when compared to the direct implementation of Bailey [7]. This is primarily due to the efficient connectivity in this implementation, which limits the loop size; therefore, for locally connected networks with relatively low numbers of connections, the performance improvement can be dramatic, as in this case. The largest loop within this implementation, which lies along the width of a single segment, contains only 10 neurons. This system therefore requires only 9 clock cycles per simulated system time-step, regardless of the number of segments implemented.
6.5 Flexibility
The abstraction of the model to a hardware architecture makes the whole approach extremely flexible from a network definition and programming perspective, unlike the fully coded approaches of Claverol and Bailey. The library-based approach of Bailey meant that a standard synthesis software tool would be required to configure the network, making the ultimate design not necessarily representative of the locally connected nature of the nervous system being analysed. Using the software interface to define the connectivity and the weights makes the process straightforward, and most importantly, the definition of an underlying hardware fabric gives a known structure and fundamental behaviour that is consistent and simple to extend or modify.
6.6 Future Work
Using the PNAA as a test platform, the impact of hardware-enforced locality on neural systems may be easily assessed. Classic artificial neural models, such as convolutional neural networks, demonstrate high measures of locality, and further mapping of these systems onto the PNAA will provide meaningful insight into how a network's underlying structure and its connection requirements relate to one another. A digital model of the C. elegans is also under construction, using physical constraints and a soft-body IK model to support simulations of the nematode's locomotive response to neural stimulus.
7 Conclusion
With Moore's Law reaching its inevitable end, a new processing paradigm must be developed. A number of neurological hardware implementations have been identified and compared, leading to the observation that most hardware implementations are large systems with support for full global connectivity. However, many naturally occurring neurological systems are locally connected, specialised systems. A constraint-focused approach has been outlined, resulting in a new programmable neural array architecture, or PNAA. Section 4 outlines our novel approach to connectivity within neural hardware. Focusing on predominantly local resources while allowing a number of neurons to connect to any other neuron within the system, this architecture can realise speed improvements over other address-based systems. A form of synthesis or intermediate representation is discussed, showing how a design may be implemented on such a reconfigurable architecture. Further, an implementation of the C. elegans locomotive model is developed, validating our architecture and resulting in a speed increase of around 9× over previous solutions when modelling a 10-segment system. Our approach can be applied to the modelling of many different neural networks, with a potential for greater performance, easier configuration and, most importantly, a high level of scalability. Such systems can provide a novel solution to the problems associated with conventional processing methods, leading to new and efficient processing technologies.
Acknowledgements. The authors gratefully acknowledge the permission to reproduce the image of the C. elegans nervous system taken by Dr. Ilya Zheludev working with Professor John Chad at the University of Southampton.
References

1. Monroe, D.: Neuromorphic computing gets ready for the (really) big time. Commun. ACM 57(6), 13–15 (2014)
2. Mead, C.: Neuromorphic electronic systems. Proc. IEEE 78(10), 1629–1636 (1990)
3. Hodgkin, A.L., Huxley, A.F.: A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol. 117, 500–544 (1952)
4. Catterall, W.A., Raman, I.M., Robinson, H.P.C., Sejnowski, T.J., Paulsen, O.: The Hodgkin-Huxley heritage: from channels to circuits. J. Neurosci. 32(41), 14064–14073 (2012)
5. McCulloch, W.S., Pitts, W.: A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 5(4), 115–133 (1943)
6. Izhikevich, E.: Simple model of spiking neurons. IEEE Trans. Neural Netw. 14(6), 1569–1572 (2003)
7. Bailey, J.A.: Towards the neurocomputer: an investigation of VHDL neuron models. Ph.D. thesis, University of Southampton (2010)
8. Furber, S.B., Lester, D.R., Plana, L.A., Garside, J.D., Painkras, E., Temple, S., Brown, A.D.: Overview of the SpiNNaker system architecture. IEEE Trans. Comput. 62(12), 2454–2467 (2013)
9. Furber, S.B., Galluppi, F., Temple, S., Plana, L.A.: The SpiNNaker project. Proc. IEEE 102(5), 652–665 (2014)
10. Painkras, E., Plana, L.A., Garside, J., Temple, S., Davidson, S., Pepper, J., Clark, D., Patterson, C., Furber, S.: SpiNNaker: a multi-core System-on-Chip for massively-parallel neural net simulation. In: Proceedings of the IEEE 2012 Custom Integrated Circuits Conference, vol. 4, pp. 1–4. IEEE, September 2012
11. Kawasetsu, T., Ishida, R., Sanada, T., Okuno, H.: Live demonstration: a hardware system for emulating the early vision utilizing a silicon retina and SpiNNaker chips. In: 2014 IEEE Biomedical Circuits and Systems Conference (BioCAS) Proceedings, p. 188, October 2014
12. Akopyan, F., Sawada, J., Cassidy, A., Alvarez-Icaza, R., Arthur, J., Merolla, P., Imam, N., Nakamura, Y., Datta, P., Nam, G.-J., Taba, B., Beakes, M., Brezzo, B., Kuang, J.B., Manohar, R., Risk, W.P., Jackson, B., Modha, D.S.: TrueNorth: design and tool flow of a 65 mW 1 million neuron programmable neurosynaptic chip. IEEE Trans. Comput. Aid. Des. Integr. Circ. Syst. 34(10), 1537–1557 (2015)
13. Benjamin, B.V., Gao, P., McQuinn, E., Choudhary, S., Chandrasekaran, A.R., Bussat, J., Alvarez-Icaza, R., Arthur, J.V., Merolla, P.A., Boahen, K.: Neurogrid: a mixed-analog-digital multichip system for large-scale neural simulations. Proc. IEEE 102(5), 699–716 (2014)
14. Lanzerotti, M.Y., Fiorenza, G., Rand, R.A.: Microminiature packaging and integrated circuitry: the work of E. F. Rent, with an application to on-chip interconnection requirements. IBM J. Res. Dev. 49(4.5), 777–803, July 2005
15. Landman, B., Russo, R.: On a pin versus block relationship for partitions of logic graphs. IEEE Trans. Comput. C-20(12), 1469–1479 (1971)
16. Bassett, D.S., Greenfield, D.L., Meyer-Lindenberg, A., Weinberger, D.R., Moore, S.W., Bullmore, E.T.: Efficient physical embedding of topologically complex information processing networks in brains and computer circuits. PLOS Comput. Biol. 6(4), e1000748+ (2010). http://dx.doi.org/10.1371/journal.pcbi.1000748
17. Bailey, J.A., Wilcock, R., Wilson, P.R., Chad, J.E.: Behavioral simulation and synthesis of biological neuron systems using synthesizable VHDL. Neurocomputing 74(14–15), 2392–2406 (2011)
18. Azalan, M.S.Z., Paulraj, M.P., bin Yaacob, S.: Classification of hand movement imagery tasks for brain machine interface using feed-forward network. In: 2014 2nd International Conference on Electronic Design (ICED), vol. 3, pp. 431–436. IEEE, August 2014
19. Deng, B., Zhang, M., Su, F., Wang, J., Wei, X., Shan, B.: The implementation of feedforward network on field programmable gate array. In: 2014 7th International Conference on Biomedical Engineering and Informatics (BMEI), pp. 483–487. IEEE, October 2014
20. Upadhyay, D., Tarun, N., Nayak, T.: ANN based intelligent controller for inverted pendulum system. In: 2013 Students Conference on Engineering and Systems (SCES), pp. 1–6. IEEE, April 2013
21. Arce, P., Arce, P.: Forecasting high frequency financial time series using parallel FFN with CUDA and ZeroMQ. In: 2012 9th Asia-Pacific Symposium on Information and Telecommunication Technologies (APSITT), November 2012
22. White, J.G., Southgate, E., Thomson, J.N., Brenner, S.: The structure of the nervous system of the Nematode Caenorhabditis elegans. Philos. Trans. Roy. Soc. B Biol. Sci. 314(1165), 1–340 (1986)
23. Claverol, E.T.: An event-driven approach to biologically realistic simulation of neural aggregates. Ph.D. dissertation, University of Southampton (2000)
Minimum Spanning Tree Problem with Single-Valued Trapezoidal Neutrosophic Numbers

Said Broumi1, Mohamed Talea1, Assia Bakali2, Florentin Smarandache3, and Santanu Kumar Patro4

1 Laboratory of Information Processing, Faculty of Science Ben M'Sik, University Hassan II, B.P. 7955 Sidi Othman, Casablanca, Morocco
[email protected], [email protected]
2 Ecole Royale Navale, Boulevard Sour Jdid, B.P. 16303 Casablanca, Morocco
[email protected]
3 Department of Mathematics, University of New Mexico, 705 Gurley Avenue, Gallup, NM 87301, USA
[email protected], [email protected]
4 Department of Mathematics, Berhampur University, Bhanja Bihar -76007, Berhampur, Odisha, India
[email protected]
Abstract. A single valued trapezoidal neutrosophic number (SVTNN) is a special case of a single valued neutrosophic set (SVNS) defined on the real number set. This paper investigates the single valued trapezoidal neutrosophic minimum spanning tree (SVTNMST) problem, where the edge weights are assumed to be single valued trapezoidal neutrosophic variables. A neutrosophic Kruskal algorithm is presented for searching the minimum spanning tree in a single valued trapezoidal neutrosophic graph (SVTN-graph). To check the validity of the proposed algorithm, an illustrative example is worked through. Finally, a comparative study is made with Mullai's algorithm for neutrosophic graphs.

Keywords: Single valued trapezoidal neutrosophic sets · Neutrosophic graph · Minimum spanning tree · Score function
1 Introduction

Prof. Smarandache [1] proposed the concept of neutrosophic logic, which is a very useful tool for dealing with the indeterminate and inconsistent information that exists in various real-life problems. The concept of the neutrosophic set (NS) generalizes the concepts of the classic set, fuzzy set (FS) and intuitionistic fuzzy set (IFS). The three characteristic functions of neutrosophic sets are the truth-membership function (T), the indeterminacy-membership function (I) and the falsity-membership function (F), each taking values in the real standard or non-standard unit interval ]−0, 1+[. As an extension of NS, single-valued neutrosophic sets (SVNS) were introduced by Prof. Smarandache [1] and Prof. Wang et al. [2] to deal with real scientific and engineering applications. Many researchers have made particularizations on the T, I, F components, leading to the definition of particular cases of
neutrosophic sets, such as simplified neutrosophic sets [3], interval-valued neutrosophic sets [4], bipolar neutrosophic sets [5], multi-valued neutrosophic sets [6] and so on. In addition, Deli and Subas [7] developed a new ranking method by proposing the concept of cut sets for SVTNNs, and applied it to solving multi-criteria decision-making problems. Thamaraiselvi and Santhi [8] proposed an approach based on SVTNNs for solving the neutrosophic transportation problem. Singh et al. [29] resolved some drawbacks in the definition of the score function over SVTNNs in [8] and investigated a new method for the transportation problem with SVTNNs. Ru-xia et al. [9] proposed the concepts of score, accuracy and certainty functions, as well as a method to rank SVTNNs; the authors improved on the drawbacks of the score function of SVTNNs defined in [7]. Ru-xia et al. [30] defined the single-valued trapezoidal neutrosophic normalized weighted Bonferroni operator, and an approach based on this operator was developed to solve MAGDM problems with SVTNNs. Peide et al. [31] extended the Maclaurin symmetric mean (MSM) operator to SVTNNs and defined some SVTNMSM operators; the authors discussed some interesting properties and developed a method to deal with the MGDM problem based on the SVTNWMSM operator. In classical graph theory, there are common algorithms [10–13] for solving the minimum spanning tree problem, including the Prim and Kruskal algorithms. Two types of graphs exist in the literature: undirected and directed graphs. A graph can be represented in two ways: by an adjacency matrix or by an adjacency list. By integrating SVNS into graph theory, Broumi et al. [14–21] developed a new version of graph theory called single valued neutrosophic graph theory (SVNGT) and proposed different classes of graphs based on the triad (T, I, F). Later on, a few scholars presented neutrosophic techniques to deal with the neutrosophic minimum spanning tree. Broumi et al. [22] presented an algorithm to search for a minimum spanning tree of a neutrosophic graph whose edge weights are labeled by interval-valued bipolar neutrosophic values. In [23], the authors proposed a new matrix algorithm for the MST in an undirected interval-valued neutrosophic graph whose edge weights are interval-valued neutrosophic values. Kandasamy [28] put forward the concept of double-valued neutrosophic sets (DVNS), introduced some basic operational laws of DVNS, and proposed the double-valued neutrosophic minimum spanning tree (DVN-MST) clustering algorithm to cluster data represented by double-valued neutrosophic information. Kandasamy [24] proposed the triple refined indeterminate neutrosophic minimum spanning tree (TRIN-MST) clustering algorithm based on the distance matrix, as a generalization of the intuitionistic fuzzy minimum spanning tree and the single valued neutrosophic minimum spanning tree. Based on the definition of SVNS, Prof. Ye introduced a method for finding the MST of a single valued neutrosophic graph whose nodes are represented by SVNS [25]. In another study, Mandal and Basu [26] developed an algorithm for finding the optimum spanning tree in a neutrosophic environment based on an improved similarity measure, and applied it to a network problem with multiple criteria. Dr. Mullai also presented an algorithm for searching the minimum spanning tree in an undirected bipolar neutrosophic graph [27].
The main contribution of this research paper is to develop a neutrosophic Kruskal algorithm, based on a matrix approach, for finding the minimum cost spanning tree of an undirected single valued trapezoidal neutrosophic graph. In this paper, we adopt the score function defined in [9].
The rest of the paper is structured as follows: Sect. 2 presents the basic definition of single valued trapezoidal neutrosophic numbers and other definitions. Section 3 describes the neutrosophic Kruskal algorithm for solving the minimum spanning tree problem on a SVTN-graph. To check the effectiveness of the proposed method, a numerical example is worked through in Sect. 4. Section 5 conducts a comparison with other existing algorithms. Finally, Sect. 6 presents the main conclusions and future work.
2 Preliminaries

The fundamental results used here may be found in [1, 2, 7, 9].

Definition 2.1 [2]. Let W be a universal set. A neutrosophic set Q is given by:

$$Q = \{\langle T_Q(x), I_Q(x), F_Q(x)\rangle \mid x \in W\} \quad (1)$$

A neutrosophic set Q is described using three characteristic functions: the membership degree $T_Q(x)$, the indeterminacy degree $I_Q(x)$ and the non-membership degree $F_Q(x)$, each contained in the real standard or non-standard subset of ]−0, 1+[, such that

$$^-0 \le T_Q(x) + I_Q(x) + F_Q(x) \le 3^+ \quad (2)$$
Definition 2.2 [1, 2]. Let W be a universal set. A single valued neutrosophic set R is given by:

$$R = \{\langle T_R(x), I_R(x), F_R(x)\rangle \mid x \in W\} \quad (3)$$

where $T_R(x): W \to [0,1]$ is the degree of membership of each element $x \in W$, $I_R(x): W \to [0,1]$ is the degree of indeterminacy of each element $x \in W$, and $F_R(x): W \to [0,1]$ is the degree of non-membership of each element $x \in W$, such that

$$0 \le T_R(x) + I_R(x) + F_R(x) \le 3 \quad (4)$$
Definition 2.3 [7]. A single valued trapezoidal neutrosophic number (SVTNN) $\tilde{a} = \langle (a_1, b_1, c_1, d_1); T_a, I_a, F_a \rangle$ is a special single valued neutrosophic set defined on the real number set, characterized by three functions, the truth-membership, indeterminacy-membership and falsity-membership functions, presented as follows:

$$T_a(x) = \begin{cases} (x-a_1)T_a/(b_1-a_1) & a_1 \le x \le b_1 \\ T_a & b_1 \le x \le c_1 \\ (d_1-x)T_a/(d_1-c_1) & c_1 \le x \le d_1 \\ 0 & \text{otherwise} \end{cases} \quad (5)$$

$$I_a(x) = \begin{cases} (b_1-x+I_a(x-a_1))/(b_1-a_1) & a_1 \le x \le b_1 \\ I_a & b_1 \le x \le c_1 \\ (x-c_1+I_a(d_1-x))/(d_1-c_1) & c_1 \le x \le d_1 \\ 1 & \text{otherwise} \end{cases} \quad (6)$$

$$F_a(x) = \begin{cases} (b_1-x+F_a(x-a_1))/(b_1-a_1) & a_1 \le x \le b_1 \\ F_a & b_1 \le x \le c_1 \\ (x-c_1+F_a(d_1-x))/(d_1-c_1) & c_1 \le x \le d_1 \\ 1 & \text{otherwise} \end{cases} \quad (7)$$
where $0 \le T_a \le 1$, $0 \le I_a \le 1$, $0 \le F_a \le 1$, $0 \le T_a + I_a + F_a \le 3$ and $a_1, b_1, c_1, d_1 \in \mathbb{R}$.

Definition 2.4 [9]. Let $K = [a_1, a_2, a_3, a_4]$ be a trapezoidal fuzzy number on the real set $\mathbb{R}$, with $a_1 \le a_2 \le a_3 \le a_4$; then the center of gravity (COG) of K can be defined as:

$$\mathrm{COG}(K) = \begin{cases} a & \text{if } a_1 = a_2 = a_3 = a_4 = a \\[4pt] \dfrac{1}{3}\left[a_1 + a_2 + a_3 + a_4 - \dfrac{a_4 a_3 - a_1 a_2}{(a_4 + a_3) - (a_1 + a_2)}\right] & \text{otherwise} \end{cases} \quad (8)$$
Ru-xia et al. [9] also proposed the concepts of score, accuracy and certainty functions, as well as a method to rank SVTN-numbers, as shown in the following definition.

Definition 2.5 [9]. Let $\tilde{b} = \langle [b_1, b_2, b_3, b_4], (T_{\tilde{b}}, I_{\tilde{b}}, F_{\tilde{b}}) \rangle$ be a SVTN-number; the score function, accuracy function and certainty function of $\tilde{b}$ are presented as follows:

$$E(\tilde{b}) = \mathrm{COG}(K) \times \frac{2 + T_{\tilde{b}} - I_{\tilde{b}} - F_{\tilde{b}}}{3} \quad (9)$$

$$A(\tilde{b}) = \mathrm{COG}(K) \times (T_{\tilde{b}} - F_{\tilde{b}}) \quad (10)$$

$$C(\tilde{b}) = \mathrm{COG}(K) \times T_{\tilde{b}} \quad (11)$$
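As a concrete illustration, the following Python sketch computes the COG of Eq. (8) and the score function of Eq. (9) for a SVTN-number. It is a minimal reading of the definitions above, assuming the reconstructed form of Eq. (9) with the (2 + T − I − F)/3 factor; the example number is illustrative, not taken from the paper's figures.

```python
# Minimal sketch of Definitions 2.4 and 2.5, assuming the reconstructed
# score function E = COG(K) * (2 + T - I - F) / 3 from Eq. (9).

def cog(a1, a2, a3, a4):
    """Center of gravity of the trapezoidal fuzzy number [a1,a2,a3,a4], Eq. (8)."""
    if a1 == a2 == a3 == a4:
        return a1
    return (a1 + a2 + a3 + a4
            - (a4 * a3 - a1 * a2) / ((a4 + a3) - (a1 + a2))) / 3.0

def score(svtnn):
    """Score E of a SVTN-number <[b1,b2,b3,b4], (T, I, F)>, Eq. (9)."""
    (b1, b2, b3, b4), (t, i, f) = svtnn
    return cog(b1, b2, b3, b4) * (2 + t - i - f) / 3.0

# Illustrative SVTN-number (values assumed for demonstration only).
print(score(((2, 4, 6, 7), (0.5, 0.3, 0.2))))
```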
3 Neutrosophic Kruskal Algorithm for Solving the Spanning Tree Problem

In this section, a neutrosophic version of Kruskal's algorithm based on a matrix approach is presented to handle indeterminate data. The proposed algorithm is described as follows:

Algorithm:
Input: the matrix $M = [W_{ij}]_{n\times n}$ of edge weights for the undirected weighted neutrosophic graph G.
Output: a minimum cost spanning tree T.
Step 1: Represent the SVTN-graph G by its SVTN-adjacency matrix A.
Step 2: Translate the SVTN-matrix into the score matrix $[S_{ij}]_{n\times n}$ by taking the score function value of each SVTNN as its crisp value.
Step 3: Repeat Steps 4 and 5 until n − 1 entries are highlighted, highlighting each examined entry of the matrix or setting it to zero.
Step 4: Scan the score matrix (one column at a time) to find the minimum unmarked entry $S_{ij}$, denoting the weight of edge $e_{ij}$.
Step 5: If the edge $e_{ij}$ corresponding to $S_{ij}$ generates a cycle with the previously highlighted entries of the score matrix, then set $S_{ij} = 0$; else highlight $S_{ij}$.
Step 6: The highlighted entries of the score matrix form the required MST.
Step 7: Stop.
The proposed algorithm is illustrated by an example in the next section; a runnable sketch follows.
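The sketch below gives a compact Python rendering of the matrix-based procedure above. It assumes the SVTN edge weights have already been converted to crisp scores (Step 2), uses sorting in place of the column-wise scan of Step 4, and replaces the explicit cycle test of Step 5 with a standard union-find structure; the function names are illustrative, not from the paper.

```python
# Sketch of the neutrosophic Kruskal algorithm (Steps 3-6), operating on a
# crisp score matrix. Union-find implements the cycle check of Step 5.

def neutrosophic_kruskal(score):
    """score: n x n symmetric matrix; 0 marks 'no edge'. Returns MST edges."""
    n = len(score)
    parent = list(range(n))

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path compression
            u = parent[u]
        return u

    # Collect upper-triangular entries, i.e. each undirected edge once.
    edges = [(score[i][j], i, j) for i in range(n) for j in range(i + 1, n)
             if score[i][j]]
    tree = []
    for w, i, j in sorted(edges):      # ascending score, as in Step 4
        ri, rj = find(i), find(j)
        if ri == rj:
            continue                   # edge would form a cycle: set to zero
        parent[ri] = rj
        tree.append((i + 1, j + 1, w))  # 1-based vertex labels, as in the paper
        if len(tree) == n - 1:
            break
    return tree
```

Applied to the crisp score matrix of Fig. 2, such a routine would return the four edges highlighted in Sect. 4.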
4 Numerical Example

An example of SVTNMST is examined in this section as an illustration of the neutrosophic Kruskal algorithm. Consider the graph G = (V, E) depicted in Fig. 1. Applying the steps of the neutrosophic Kruskal algorithm for searching the minimum cost spanning tree, we proceed as follows:
Fig. 1. Undirected SVTN- graphs.
The SVTN-adjacency matrix A is given below. A is a symmetric 5 × 5 matrix in which a zero entry denotes the absence of an edge; its non-zero entries are the following SVTNN edge weights:

A(1,2) = ⟨[4, 5, 6, 7]; (0.5, 0.4, 0.1)⟩
A(1,3) = ⟨[1, 4, 6, 7]; (0.5, 0.1, 0.3)⟩
A(1,4) = ⟨[1, 2, 5, 6]; (0.4, 0.2, 0.1)⟩
A(2,4) = ⟨[2, 4, 6, 7]; (0.1, 0.3, 0.6)⟩
A(3,4) = ⟨[3, 4, 5, 6]; (0.6, 0.3, 0.5)⟩
A(3,5) = ⟨[1, 2, 3, 4]; (0.7, 0.4, 0.4)⟩
A(4,5) = ⟨[1, 3, 4, 5]; (0.3, 0.2, 0.5)⟩

Thus, using the score function, we get the score matrix (Fig. 2).
Fig. 2. Score matrix.
Fig. 3. The selected edge (3, 5) of G.
Based on the matrix illustrated in Fig. 2, we observe that the least value is 2.81, so the corresponding edge (3, 5) is marked red in Fig. 3. This procedure is repeated until the last iteration. According to Figs. 4 and 5, the next least value, 3.73, is highlighted and the corresponding edge is marked red.
Fig. 4. The highlighted next least value 3.73 of S.
Fig. 5. The selected edge (4, 5) of G.
Fig. 6. The highlighted next least value 3.85 of S.
Based on the matrix illustrated in Fig. 6, the next least value, 3.85, is highlighted in Fig. 7. Based on the matrix illustrated in Fig. 8, the next least value is 4.89, but its corresponding edge (1, 3), portrayed in Fig. 9, forms a cycle, so this value is set to zero. Based on the matrix illustrated in Fig. 10, the next least value is 5.1, but its corresponding edge forms a cycle, so this value is set to zero.
Fig. 7. The selected edge (1, 4) of G.
Fig. 8. The highlighted next least value 4.89 of S.
Fig. 9. Corresponding edge (1, 3).
Fig. 10. The highlighted next least value 5.1 of S.
The next least value, 5.69, is highlighted, as shown in Fig. 11. The corresponding highlighted edges are portrayed in Fig. 12.
Fig. 11. The highlighted next least value 5.69 of S.
Fig. 12. The selected edge (2, 4) of G.
Based on the matrix illustrated in Fig. 13, the next least value is 6.11, but its corresponding edge forms a cycle, so this value is set to zero.
Fig. 13. The highlighted next least value 6.11 of S.
Finally, the SVTN minimum spanning tree generated by the neutrosophic Kruskal algorithm includes only the edges highlighted in red, as illustrated in Fig. 14.
Fig. 14. SVTNMST with proposed algorithm.
Following the steps of the neutrosophic Kruskal algorithm presented in Sect. 3, the cost of the spanning tree T based on the crisp (score) data is 16.08 (= 3.85 + 5.69 + 3.73 + 2.81), and the final minimum cost tree comprises the edges {1, 4}, {2, 4}, {4, 5}, {5, 3}.
5 Comparative Study

In this section, we compare the proposed approach of Sect. 3, applied to the example of Sect. 4, with the algorithm presented by Mullai et al. [27]. Following the steps of Mullai's algorithm, we obtain:

Iteration 1: Let $C_1 = \{1\}$ and $\bar{C}_1 = \{2, 3, 4, 5\}$
Iteration 2: Let $C_2 = \{1, 4\}$ and $\bar{C}_2 = \{2, 3, 5\}$
Iteration 3: Let $C_3 = \{1, 4, 5\}$ and $\bar{C}_3 = \{2, 3\}$
Iteration 4: Let $C_4 = \{1, 4, 5, 3\}$ and $\bar{C}_4 = \{2\}$

Finally, the SVTN minimum spanning tree is presented in Fig. 15.
Fig. 15. SVTNMST with Mullai’s algorithm.
So, after deneutrosophication of the edge weights using the score function, it can be observed that the SVTN minimum spanning tree {1, 4}, {2, 4}, {4, 5}, {5, 3} obtained by executing Mullai's algorithm is the same SVTNMST produced by the neutrosophic Kruskal algorithm. The algorithm proposed by Mullai in [27] is based on a comparison of the edges of the graph at each iteration, whereas the neutrosophic Kruskal algorithm proposed by us is simpler, because it is based on a matrix approach and can be easily implemented in a MATLAB workspace.
6 Conclusion

In this paper, a neutrosophic Kruskal algorithm is developed for solving the spanning tree problem on a neutrosophic network, where the edges are labeled by SVTNNs. As future work, we would like to extend the neutrosophic Kruskal algorithm to other structures of neutrosophic graphs, such as bipolar neutrosophic graphs, bipolar neutrosophic digraphs and rough neutrosophic graphs.

Acknowledgment. The authors are very grateful to the chief editor and reviewers for their comments and suggestions, which were helpful in improving the paper.
References

1. Smarandache, F.: Neutrosophy. Neutrosophic Probability, Set, and Logic. ProQuest Information & Learning, Ann Arbor, Michigan, USA, 105 p. (1998)
2. Wang, H., Smarandache, F., Zhang, Y., Sunderraman, R.: Single valued neutrosophic sets. Multispace Multistructure 4, 410–413 (2010)
3. Ye, J.: Vector similarity measures of simplified neutrosophic sets and their application in multicriteria decision making. Int. J. Fuzzy Syst. 16(2), 204–211 (2014)
4. Wang, H., Smarandache, F., Zhang, Y.Q., Sunderraman, R.: Interval Neutrosophic Sets and Logic: Theory and Applications in Computing. Hexis, Arizona (2005)
5. Deli, I., Ali, M., Smarandache, F.: Bipolar neutrosophic sets and their applications based on multi-criteria decision making problems. In: International Conference on Advanced Mechatronic Systems (ICAMechS), pp. 249–254 (2015)
6. Wang, J.J., Li, X.E.: TODIM method with multi-valued neutrosophic sets. Control Decis. 30, 1139–1142 (2015). (in Chinese)
7. Deli, I., Subas, Y.: A ranking method of single valued neutrosophic numbers and its application to multi-attribute decision making problems. Int. J. Mach. Learn. Cybern. pp. 1–14 (2016)
8. Thamaraiselvi, A., Santhi, R.: A new approach for optimization of real life transportation problem in neutrosophic environment. Math. Probl. Eng. 2016, article ID 5950747, 9 pages (2016)
9. Liang, R., Wang, J.Q., Zhang, H.: A multi-criteria decision making method based on single valued trapezoidal neutrosophic preference relations with complete weight information. Neural Comput. Appl. (2017). https://doi.org/10.1007/s00521-017-2925-8
10. Bazlamacc, F., Hindi, K.S.: Minimum-weight spanning tree algorithms: a survey and empirical study. Comput. Oper. Res. 28, 767–785 (2001)
11. Mandal, A., Dutta, J., Pal, S.C.: A new efficient technique to construct a minimum spanning tree. Int. J. Adv. Res. Comput. Sci. Softw. Eng. (10), 93–97 (2012)
12. Dey, A., Pal, A.: Prim's algorithm for solving minimum spanning tree problem in fuzzy environment. Ann. Fuzzy Math. Inf. (2016)
13. Patel, N., Patel, K.M.: A survey on: enhancement of minimum spanning tree. J. Eng. Res. Appl. 5(1, Part 3), 06–10 (2015)
14. Broumi, S., Bakali, A., Talea, M., Smarandache, F., Vladareanu, L.: Computation of shortest path problem in a network with SV-trapezoidal neutrosophic numbers. In: Proceedings of the 2016 International Conference on Advanced Mechatronic Systems, Melbourne, Australia, pp. 417–422 (2016)
15. Broumi, S., Bakali, A., Talea, M., Smarandache, F., Vladareanu, L.: Applying Dijkstra algorithm for solving neutrosophic shortest path problem. In: Proceedings of the 2016 International Conference on Advanced Mechatronic Systems, Melbourne, Australia, November 3–December 3, pp. 412–416 (2016)
16. Broumi, S., Bakali, A., Talea, M., Smarandache, F., Kishore Kumar, P.K.: Shortest path problem on single valued neutrosophic graphs. In: 2017 International Symposium on Networks, Computers and Communications (ISNCC) (2017, in press)
17. Broumi, S., Bakali, A., Mohamed, T., Smarandache, F., Vladareanu, L.: Shortest path problem under triangular fuzzy neutrosophic information. In: 10th International Conference on Software, Knowledge, Information Management & Applications (SKIMA), pp. 169–174 (2016)
18. Broumi, S., Talea, M., Bakali, A., Smarandache, F.: Single valued neutrosophic graphs. J. New Theory 10, 86–101 (2016)
19. Broumi, S., Talea, M., Smarandache, F., Bakali, A.: Single valued neutrosophic graphs: degree, order and size. In: IEEE International Conference on Fuzzy Systems (FUZZ), pp. 2444–2451 (2016)
20. Broumi, S., Bakali, A., Talea, M., Smarandache, F.: Isolated single valued neutrosophic graphs. Neutrosophic Sets Syst. 11, 74–78 (2016)
21. Broumi, S., Smarandache, F., Talea, M., Bakali, A.: Decision-making method based on the interval valued neutrosophic graph. In: Future Technologies, pp. 44–50. IEEE (2016)
22. Broumi, S., Bakali, A., Talea, M., Smarandache, F., Verma, R.: Computing minimum spanning tree in interval valued bipolar neutrosophic environment. Int. J. Model. Optim. 7(5), 300–304 (2017). https://doi.org/10.7763/IJMO.2017.V7.602
23. Broumi, S., Bakali, A., Talea, M., Smarandache, F., Kishore Kumar, P.K.: A new concept of matrix algorithm for MST in undirected interval valued neutrosophic graph. In: Smarandache, F., Abdel-Basset, M., Chang, V. (eds.) Neutrosophic Operational Research, Volume II, pp. 54–69 (2017). ISBN 978-1-59973-537
24. Kandasamy, I., Smarandache, F.: Clustering algorithm of triple refined indeterminate neutrosophic set for personality grouping. In: Computing Conference 2017 (2017, in press)
25. Ye, J.: Single valued neutrosophic minimum spanning tree and its clustering method. J. Intell. Syst. 23, 311–324 (2014)
26. Mandal, K., Basu, K.: Improved similarity measure in neutrosophic environment and its application in finding minimum spanning tree. J. Intell. Fuzzy Syst. 31, 1721–1730 (2016)
27. Mullai, M., Broumi, S., Stephen, A.: Shortest path problem by minimal spanning tree algorithm using bipolar neutrosophic numbers. Int. J. Math. Trends Technol. 46(2), 80–87 (2017)
28. Kandasamy, I.: Double-valued neutrosophic sets, their minimum spanning trees, and clustering algorithm. J. Intell. Syst. 1–17 (2016)
29. Singh, A., Kumar, A., Appadoo, S.S.: Modified approach for optimization of real life transportation problem in neutrosophic environment. Math. Probl. Eng. (2017), 9 pages
30. Liang, R., Wang, J.Q., Li, L.: Multi-criteria group decision-making method based on interdependent inputs of single-valued trapezoidal neutrosophic information. Neural Comput. Appl. (2016). https://doi.org/10.1007/s00521-016-2672-2
31. Liu, P., Zhang, X.: Some Maclaurin symmetric mean operators for single-valued trapezoidal neutrosophic numbers and their applications to group decision making. Int. J. Fuzzy Syst. 1–17 (2017). https://doi.org/10.1007/s40815-017-0335-9
Hybrid Evolutionary Algorithm Based on PSOGA for ANFIS Designing in Prediction of No-Deposition Bed Load Sediment Transport in Sewer Pipe

Bahram Gharabaghi1, Hossein Bonakdari1, and Isa Ebtehaj2

1 School of Engineering, University of Guelph, Guelph, Canada
[email protected], [email protected]
2 Department of Civil Engineering, Razi University, Kermanshah, Iran
[email protected]
Abstract. In this paper, a new hybrid algorithm, known as PSOGA, is introduced based on the advantages of two evolutionary algorithms, Particle Swarm Optimization (PSO) and Genetic Algorithms (GA). The proposed algorithm was utilized in the optimal design of an ANFIS network (ANFIS-PSOGA). The ANFIS-PSOGA was employed to predict the limiting velocity in sewer pipes that prevents sediment deposition. Firstly, the effective dimensionless variables were derived. Then, the minimum velocity parameter, presented as the densimetric Froude number (Fr), was predicted using ANFIS-PSOGA. The results of the proposed hybrid method were evaluated using different statistical indices (R2 = 0.98; MAPE = 3.62; RMSE = 0.23; RRMSE = 0.05). The performance of the new hybrid algorithm (PSOGA) is compared with GA, PSO and a hybrid algorithm combining back-propagation and least-squares (BPLS). The results show that the presented hybrid algorithm for the optimal design of ANFIS (PSOGA) has better accuracy than the other algorithms.

Keywords: ANFIS · Bed load · Genetic Algorithm (GA) · Particle Swarm Optimization (PSO) · PSOGA hybrid algorithm · Sediment transport
1 Introduction

Sediment deposition in sewer systems is one of the main challenges in sediment transport, which is a complex three-dimensional phenomenon. The inflow to a sewer often carries solid particles. Material along the pathway is washed into the sewer with the main stream according to the discharge, and if the flow does not have sufficient capacity to transport these particles, they will be deposited. Sediment deposition increases bed roughness and decreases the flow cross-section, so the shear stress distribution on the channel bed is changed. Hence, a method is required to determine the minimum velocity necessary to prevent sediment deposition in the sewer system.
The easiest method, using a fixed velocity in the range of 0.6–0.9 m/s, has been suggested by several researchers [1]. However, because this method does not consider the hydraulic parameters that affect the limiting velocity, several researchers have presented different regression equations based on experimental studies [1–4]. Azamathulla et al. [5] introduced a new multiple-linear-regression-based equation using a combination of the Ab Ghani [2] and Vongvisessomjai et al. [1] datasets, applicable to different boundaries with partially full flow. Ebtehaj et al. [4] modified the coefficients of Vongvisessomjai et al.'s [1] equations for two states, bed load and suspended load, using a wide range of experimental datasets covering different hydraulic conditions. One of the main problems of regression-based equations for limiting velocity prediction is the lack of flexibility of the relations under various hydraulic conditions [4, 6]. Therefore, using these equations is associated with uncertainty and leads to underestimation or overestimation of the limiting velocity, which results in sedimentation or an uneconomic design, respectively. Thus, an approach is needed with high capability in the prediction of complex nonlinear systems. In the recent decade, artificial intelligence (AI) has been utilized in many engineering fields, including hydraulic and hydrological engineering, for problems such as diversion channels [7–9], scour depth [10, 11] and sediment transport [12, 13]. Evolutionary algorithms, which are generic population-based meta-heuristic optimization algorithms, have been used in the optimization of different classical AI techniques such as artificial neural networks (ANN), Adaptive Neuro-Fuzzy Inference Systems (ANFIS) and Support Vector Machines (SVM) [14–17]. Qasem et al. [18] estimated the minimum velocity to prevent sedimentation in drainage systems using a Radial Basis Function (RBF) network and an RBF optimized with particle swarm optimization (PSO-RBF). Ebtehaj and Bonakdari [19] forecasted the limiting velocity in sewer pipes using two neuro-fuzzy-based techniques, ANFIS and ANFIS optimized with differential evolution (ANFIS-DE), to overcome the limitations of regression-based equations. They found that evolutionary algorithms are stronger than gradient algorithms in the optimal design of AI methods. Generally, when a method is presented its advantages over existing methods are emphasized, but it is clear that almost any method provided for a specific problem has disadvantages in addition to its various advantages. PSO is a popular evolutionary algorithm which has been employed in different problems. The most important advantages of PSO include having memory, cooperation and information sharing between particles, high convergence speed, easy implementation, high strength in finding the global answer, a low number of control parameters and very low complexity, requiring less memory than other evolutionary algorithms. The main problems of PSO are premature convergence [20], the local minimum problem and the decrease of population diversity over iterations. In this study, a new hybrid method combining Adaptive Neuro-Fuzzy Inference Systems (ANFIS), Particle Swarm Optimization (PSO) and the Genetic Algorithm (GA), called ANFIS-PSOGA, is presented. In fact, in ANFIS-PSOGA, the ANFIS training is performed by the new PSOGA algorithm. To prevent the PSO algorithm from being trapped in local optima, the strategy presented in the genetic algorithm is used.
To predict the limiting velocity using ANFIS-PSOGA, the parameters affecting the estimation were first determined and then, using dimensional analysis, the effective
dimensionless parameters for the minimum velocity estimation were derived. Using the introduced dimensionless parameters, the minimum velocity is predicted using ANFIS-PSOGA. The results of the proposed method are compared with ANFIS-PSO, ANFIS-GA and ANFIS-BPLS, where BPLS is a hybrid algorithm combining back-propagation (BP) and least-squares (LS). In the following sections, the proposed hybrid method, which employs a hybrid algorithm (a combination of PSO and GA) for the optimal design of ANFIS, is presented: Sect. 2 gives an overview of ANFIS, the key features of GA and the basics of PSO. The third section presents the methodology of this paper, consisting of the derivation of the parameters affecting limiting velocity prediction and the data employed for developing ANFIS-PSOGA. Finally, the results of the proposed method are presented in comparison with existing methods.
2 ANFIS-PSOGA

2.1 ANFIS
ANFIS is a model that exploits the advantages of neural networks and fuzzy logic simultaneously, providing a fuzzy modeling approach to learn information about a given dataset. A fuzzy logic system [21] organizes a system by fuzzy rules, which are statements that elucidate the relationship between sets of input-output data in IF-THEN form, depending on linguistic variables. The rules are expressed in the following form:

$$\text{IF } x_1 \text{ is } A_1 \text{ and } x_2 \text{ is } A_2, \text{ THEN } y = p x_1 + q x_2 + r \quad (1)$$

where p, q, r are the consequent parameters of the Takagi-Sugeno fuzzy model, $x_1$ and $x_2$ are input parameters, and $A_1$ and $A_2$ are the fuzzy sets defined for $x_1$ and $x_2$, respectively. The ANFIS network [22] is a network structure consisting of a number of nodes connected by direct links, organized in five layers. The node functions are similar within the same layer and are expressed as follows:

Layer 1: Each node in this layer implements a fuzzy membership function (MF), mapping the input parameter to the corresponding fuzzy membership degree:

$$O_i^1 = \mu_{A_i}(x) \quad (2)$$

where $A_i$ is the linguistic label, $O_i^1$ is the MF value of $A_i$, and x is the input of node i. The MF used in this study is the Gaussian type, which is smooth, non-zero and has fewer parameters than other MFs.

Layer 2: Each node calculates the activation degree of each rule, which indicates the firing strength of the rule:
$$O_i^2 = w_i = \mu_{A_1}(x)\cdot\mu_{A_2}(y)\cdots, \quad i = 1, 2, \ldots, N \quad (3)$$
Layer 3: The normalized firing strengths of the rules are calculated in this layer: the ith node computes the ratio of the firing strength of rule i to the sum of the firing strengths of all rules:

$$O_i^3 = \bar{w}_i = \frac{w_i}{w_1 + w_2 + \cdots + w_N}, \quad i = 1, 2, \ldots, N \quad (4)$$
Layer 4: The output of each node is computed as follows:

$$O_i^4 = \bar{w}_i f_i = \bar{w}_i (p x + q y + \cdots + r), \quad i = 1, 2, \ldots, N \quad (5)$$
where {p, q, r} is the adaptive parameter set of layer i, referred to as the consequent parameters.

Layer 5: The summation of all incoming signals, which is the overall output of the ANFIS network, is calculated in this layer:

$$O_i^5 = \sum_i \bar{w}_i f_i = \frac{\sum_i w_i f_i}{\sum_i w_i} \quad (6)$$
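A minimal numerical sketch of the five-layer forward pass described above is given below, for a first-order Sugeno system with two inputs and two rules using Gaussian MFs; all parameter values are illustrative assumptions, not trained values from this study.

```python
import math

# Minimal sketch of the ANFIS forward pass (Layers 1-5) for a first-order
# Sugeno system with two rules. All parameters are illustrative assumptions.

def gauss(x, c, sigma):
    """Gaussian membership function (Layer 1)."""
    return math.exp(-((x - c) ** 2) / (2 * sigma ** 2))

def anfis_forward(x, y, rules):
    """rules: list of (cx, sx, cy, sy, p, q, r) tuples, one per rule."""
    w = [gauss(x, cx, sx) * gauss(y, cy, sy)            # Layer 2: firing strengths
         for (cx, sx, cy, sy, _, _, _) in rules]
    total = sum(w)
    wbar = [wi / total for wi in w]                     # Layer 3: normalization
    f = [p * x + q * y + r
         for (_, _, _, _, p, q, r) in rules]            # Layer 4: rule consequents
    return sum(wb * fi for wb, fi in zip(wbar, f))      # Layer 5: overall output

rules = [(0.0, 1.0, 0.0, 1.0, 1.0, 0.5, 0.1),
         (2.0, 1.0, 2.0, 1.0, -0.3, 1.2, 0.4)]
print(anfis_forward(1.0, 1.5, rules))
```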
To generate the fuzzy inference system (FIS), the grid partitioning method is utilized. The training algorithm in standard ANFIS consists of two algorithms: back-propagation, and a hybrid combining back-propagation and least-squares (BPLS). Moreover, evolutionary algorithms can be used as alternative training algorithms in the design of the ANFIS structure. The particle swarm optimization (PSO), the genetic algorithm (GA) and the combination of these algorithms (PSOGA) are presented in the following sections.

2.2 Key Features in GA Search
The Genetic Algorithm (GA) was introduced by Holland [23]. The GA is employed as an effective universal search technique in various optimization problems. The standard and other forms of GA are coded in the search space of variables (genotypes) instead of phenotypes. Thus, any chromosome (or genotype) comprises a number of genes containing the coded values of the design variables of a specific problem. Therefore, each individual chromosome, which defines a distinct vector of design variables, is decoded into a phenotype to evaluate the objective function or the corresponding fitness. The main operators in GA are mutation and crossover (or recombination). The new generation of chromosomes is reproduced out of the current population using fitness-based selection. Through selection by priority and fitness evaluation, precise problem information is transferred to the search space. Exploitation of the corresponding search space is done by crossover, which assesses the probable exchanges between the genes of the current population. Mutation is also vital to maintain genetic diversity from one generation of chromosomes to the next. Therefore, it is a purely explorative operator that empowers GA to reach
points outside the current space, drawn from the entire search space. These exploration operators, together with the distinguished exploitation operators, permit GA to fine-tune the balance between diversification and intensification for a more effective search.

2.3 Basics of the PSO Algorithm
The Particle Swarm Optimization (PSO) algorithm was introduced by Eberhart and Kennedy [24]. PSO is an evolutionary algorithm inspired by the schooling of fish, the flight of birds and their social behaviour. In this algorithm, each particle has a certain fitness, which is evaluated using a fitness function. The position (x) and velocity (v) of particle i in the d-dimensional space at iteration t are indicated as follows:

$$X_i^t = (x_{i1}^t, x_{i2}^t, \ldots, x_{id}^t) \quad (7)$$

$$V_i^t = (v_{i1}^t, v_{i2}^t, \ldots, v_{id}^t) \quad (8)$$
To save the best previous position in each iteration, particle i utilizes a memory vector P:

$$P_i^t = (p_{i1}^t, p_{i2}^t, \ldots, p_{id}^t) \quad (9)$$
In each iteration, the particle is updated by considering two different best values: the best position the particle itself has experienced so far (pbest) and the best position ever achieved in the population (gbest).

$$v_i^{k+1} = w v_i^k + c_1 r_1 (pbest_i - x_i^k) + c_2 r_2 (gbest - x_i^k) \quad (10)$$

$$x_i^{k+1} = x_i^k + v_i^{k+1} \quad (11)$$
where $r_1$ and $r_2$ are random numbers in [0, 1], $c_1$ and $c_2$ are weight (acceleration) factors, and w is the inertia weight. After pbest and gbest are determined, the velocity and position of each particle are updated using the above equations. Generally, training in PSO starts with a random selection of the initial population. In each iteration the fitness of the population is evaluated, pbest and gbest are determined, and the velocity and position are updated until a termination criterion is reached. This criterion can be a maximum number of iterations or a certain value of the fitness function.
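The loop just described can be condensed into the following Python sketch of a standard PSO minimizer, written directly from Eqs. (10)-(11); the parameter values (w, c1, c2, swarm size, search bounds) are illustrative assumptions.

```python
import random

# Minimal PSO minimizer implementing Eqs. (10)-(11); parameter values are
# illustrative assumptions, not those used in the paper.

def pso(fitness, dim, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5):
    x = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    pbest = [xi[:] for xi in x]
    pbest_f = [fitness(xi) for xi in x]
    gbest = pbest[min(range(n_particles), key=lambda i: pbest_f[i])][:]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                v[i][d] = (w * v[i][d]
                           + c1 * r1 * (pbest[i][d] - x[i][d])   # cognitive term
                           + c2 * r2 * (gbest[d] - x[i][d]))     # social term
                x[i][d] += v[i][d]
            f = fitness(x[i])
            if f < pbest_f[i]:                                   # update pbest
                pbest[i], pbest_f[i] = x[i][:], f
                if f < fitness(gbest):                           # update gbest
                    gbest = x[i][:]
    return gbest

# Example: minimize the sphere function.
print(pso(lambda z: sum(t * t for t in z), dim=2))
```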
2.4 PSOGA Algorithm
Both GA and PSO are population-based evolutionary algorithms, but each of them has advantages and disadvantages. One disadvantage of GA is the high computational cost of the algorithm; it also needs many function evaluations to optimize the variables. Furthermore, because of the crossover, mutation and selection operators, the convergence speed of this algorithm is low. PSO has a memory to store the results of previous generations, while in GA the information of a non-elected
individual is lost. In PSO, interactions between particles intensify the search for the optimal solution, while in GA finding a precise solution in the optimal region of the search space is difficult. Therefore, a new hybrid algorithm based on PSO and GA (PSOGA) is introduced to exploit the advantages of both PSO and GA simultaneously and to increase the precision of the search process. In PSOGA, the global solution is computed in view of the whole search space by combining the standard velocity and position rules with the GA operators (selection, crossover, mutation). Thus, some individuals are similar to those of the previous generation but occupy different positions in the search space due to PSO, while the remaining individuals are replaced with new individuals produced by the GA operators. This kind of updating mechanism results in a more natural evolution compared with standard PSO. The PSOGA updating rules are as follows:

$$v_j^{t+1} = w v_j^t + c_1\,\mathrm{rand}_1 (pbest_j(GA) - x_j^t) + c_2\,\mathrm{rand}_2 (gbest(GA) - x_j^t) \quad (12)$$

$$x_j^{t+1} = x_j^t + v_j^{t+1}, \quad x_{min} < x_j^{t+1} < x_{max} \quad (13)$$
where pbest(GA) and gbest(GA) are, respectively, the best position reached by particle j and the best solution in the swarm after the GA operation.
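The sketch below shows one way the PSO update of the previous section could be combined with GA operators within a single generation, as described above. The split between PSO-updated and GA-generated individuals, and the operator details, are illustrative assumptions rather than the exact scheme of the paper.

```python
import random

# Illustrative sketch of one PSOGA generation: part of the population moves by
# the rules of Eqs. (12)-(13), the rest is regenerated by GA operators.
# The 50/50 split and operator details are assumptions, not the paper's scheme.

def psoga_generation(pop, vel, pbest, gbest, fitness,
                     w=0.7, c1=1.5, c2=1.5, ga_fraction=0.5,
                     x_min=-5.0, x_max=5.0, mut_rate=0.1):
    n, dim = len(pop), len(pop[0])
    n_pso = int(n * (1 - ga_fraction))
    # PSO part: velocity/position update against the GA-refined bests,
    # with the position clamped to [x_min, x_max] as in Eq. (13).
    for i in range(n_pso):
        for d in range(dim):
            r1, r2 = random.random(), random.random()
            vel[i][d] = (w * vel[i][d] + c1 * r1 * (pbest[i][d] - pop[i][d])
                         + c2 * r2 * (gbest[d] - pop[i][d]))
            pop[i][d] = min(max(pop[i][d] + vel[i][d], x_min), x_max)
    # GA part: fitness-based selection, single-point crossover, mutation.
    ranked = sorted(range(n), key=lambda i: fitness(pop[i]))
    for i in range(n_pso, n):
        a = pop[ranked[0]]                          # best individual
        b = pop[random.choice(ranked[: n // 2])]    # mate from the fitter half
        cut = random.randrange(1, dim) if dim > 1 else 0
        child = a[:cut] + b[cut:]
        for d in range(dim):
            if random.random() < mut_rate:
                child[d] = random.uniform(x_min, x_max)
        pop[i] = child
    return pop, vel
```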
3 Methodology

One of the important issues in designing sewer pipes is preventing sedimentation, in order to decrease operation and maintenance costs. Therefore, the parameters affecting the minimum velocity required to prevent sediment deposition should be determined. Based on the experimental studies in the literature [1–3], the effective parameters include sediment, flow and pipe channel characteristics: the gravitational acceleration (g), kinematic viscosity (ν), water density (ρ), sediment density (ρs), pipe diameter (D), cross-sectional area of flow (A), hydraulic radius (R), flow depth (y), median diameter of particles (d), channel slope (S0), sediment friction factor (ks), volumetric sediment concentration (CV) and mean flow velocity (V). These parameters are expressed as a functional relationship:

$$V = \Phi(k_s, C_V, \nu, g, s, d, D, y, R, A) \quad (14)$$
where s is the specific gravity of the sediment (= ρs/ρ). Using dimensionless variables can produce more precise predictions of sediment transport than those attained with dimensional parameters [1–5]. Hence, dimensionless variables were considered in the following form to develop the ANFIS-PSOGA model [12, 25]:

$$Fr = \Phi\!\left(k_s, C_V, \frac{D^2}{A}, \frac{d}{D}, \frac{R}{D}, \frac{d}{R}, \frac{d}{y}, D_{gr}\right) \quad (15)$$
where $Fr = V/\sqrt{g(s-1)d}$ is the densimetric Froude number and $D_{gr} = d\,(g(s-1)/\nu^2)^{1/3}$ is the dimensionless particle number. Ebtehaj and Bonakdari [25] categorized the dimensionless variables of the above equation into five different groups: movement (Fr), transport (CV), sediment (Dgr, d/D), transport mode (d/R, D2/A, d/y, R/D) and flow resistance (ks). They found that the best results of the ANFIS model in predicting Fr are obtained using the following functional relationship:

$$Fr = \Phi\!\left(C_V, \frac{d}{R}, D_{gr}, k_s\right) \quad (16)$$
To predict sediment transport in a sewer pipe, three different datasets [1, 2, 26] were used, totalling 218 samples. Using random selection without replacement, 30% of the data (66 samples) were utilized to validate the model and the remaining samples were used to train the ANFIS-PSOGA. The ranges of the samples used are: 0.1 < D (m) < 0.45; 0.005 < R (m) < 0.136; 1 < CV (ppm) < 1280; 0.072 < d (mm) < 8.3; 0.153 < y/D < 0.84; and 0.237 < V (m/s) < 1.216.
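The 70/30 split described above can be reproduced with a few lines of Python; the fixed seed is an illustrative assumption for repeatability, not a detail given in the paper.

```python
import random

# Sketch of the random train/validation split without replacement described
# above: 218 samples, 66 (about 30%) for validation, the rest for training.
# The seed is an illustrative assumption for reproducibility.

random.seed(0)
indices = list(range(218))
validation = set(random.sample(indices, 66))   # sampling without replacement
train = [i for i in indices if i not in validation]
print(len(train), len(validation))             # 152 66
```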
4 Results and Discussion

The performance of the ANFIS-PSOGA, ANFIS-PSO, ANFIS-GA and ANFIS-BPLS is evaluated using different statistical indices, including the correlation coefficient (R2), Mean Average Percentage Error (MAPE), Root Mean Square Error (RMSE), Relative Root Mean Square Error (RRMSE) and BIAS. The statistical indices are presented as follows:

$$R^2 = \left(\frac{\sum_{i=1}^{N} (O_i - \bar{O}_i)(P_i - \bar{P}_i)}{\sqrt{\sum_{i=1}^{N} (O_i - \bar{O}_i)^2 \sum_{i=1}^{N} (P_i - \bar{P}_i)^2}}\right)^2 \quad (17)$$

$$MAPE = \frac{100}{N} \sum_{i=1}^{N} \frac{|O_i - P_i|}{O_i} \quad (18)$$

$$RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (O_i - P_i)^2} \quad (19)$$

$$RRMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(\frac{O_i - P_i}{O_i}\right)^2} \quad (20)$$

where $P_i$ and $O_i$ are the predicted and observed values, respectively, $\bar{P}_i$ and $\bar{O}_i$ are the average predicted and observed values, respectively, and N is the number of samples.
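For reference, a direct Python transcription of Eqs. (17)-(20) is given below; it is a plain reading of the reconstructed formulas, not code from the study.

```python
import math

# Direct transcription of the statistical indices of Eqs. (17)-(20);
# O = observed values, P = predicted values.

def metrics(O, P):
    n = len(O)
    mo, mp = sum(O) / n, sum(P) / n
    num = sum((o - mo) * (p - mp) for o, p in zip(O, P))
    den = math.sqrt(sum((o - mo) ** 2 for o in O)
                    * sum((p - mp) ** 2 for p in P))
    r2 = (num / den) ** 2                                              # Eq. (17)
    mape = 100.0 / n * sum(abs(o - p) / o for o, p in zip(O, P))       # Eq. (18)
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(O, P)) / n)      # Eq. (19)
    rrmse = math.sqrt(sum(((o - p) / o) ** 2 for o, p in zip(O, P)) / n)  # Eq. (20)
    return r2, mape, rmse, rrmse
```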
The limiting velocity, presented as the dimensionless variable Fr, was predicted using ANFIS-BPLS, ANFIS-GA, ANFIS-PSO and ANFIS-PSOGA. The results of the Fr prediction using ANFIS-PSOGA, ANFIS-PSO, ANFIS-GA and ANFIS-BPLS are shown in Fig. 1. The input combination used for Fr prediction is as in Eq. (16). The figure shows that ANFIS-BPLS, which employed a combination of back-propagation and least-squares methods in the optimization of ANFIS, performs with overestimation and underestimation at relative errors of over 10%. This trend appears across different ranges of Fr. The error distribution for ANFIS-BPLS is shown in Fig. 2: only 50% of all samples have a relative error of less than 10%, and 20% of all predicted Fr values have a relative error of more than 15%. The highest relative error, 21%, occurs at Fr = 2.07. Figure 3 presents the statistical indices for the quantitative evaluation of the different ANFIS designs in the prediction of Fr. The values of the indices in this figure (R2 = 0.95; MAPE = 9.57; RMSE = 0.41; RRMSE = 0.11) also confirm the low accuracy of ANFIS-BPLS. Therefore, the classical training algorithms considered in conventional ANFIS cannot provide a high level of accuracy in the prediction of Fr over a wide range of datasets. Thus, new algorithms such as PSO and GA, which are popular evolutionary algorithms, should be employed to increase the precision of the Fr prediction. The use of PSO and GA as two evolutionary algorithms (ANFIS-PSO and ANFIS-GA) increased the prediction accuracy relative to ANFIS-BPLS. The scatter plot presented in Fig. 1 indicates that the use of GA in the ANFIS design decreases the prediction error at low values of Fr (Fr