Springer Proceedings in Mathematics & Statistics
Valery A. Kalyagin · Panos M. Pardalos · Oleg Prokopyev · Irina Utkina, Editors
Computational Aspects and Applications in Large-Scale Networks NET 2017, Nizhny Novgorod, Russia, June 2017
Springer Proceedings in Mathematics & Statistics Volume 247
This book series features volumes composed of selected contributions from workshops and conferences in all areas of current research in mathematics and statistics, including operations research and optimization. In addition to an overall evaluation of the interest, scientific quality, and timeliness of each proposal at the hands of the publisher, individual contributions are all refereed to the high quality standards of leading journals in the field. Thus, this series provides the research community with well-edited, authoritative reports on developments in the most exciting areas of mathematical and statistical research today.
More information about this series at http://www.springer.com/series/10533
Editors
Valery A. Kalyagin, Higher School of Economics, National Research University, Nizhny Novgorod, Russia
Panos M. Pardalos, Department of Industrial and Systems Engineering, University of Florida, Gainesville, FL, USA
Oleg Prokopyev, Department of Industrial Engineering, University of Pittsburgh, Pittsburgh, PA, USA
Irina Utkina, Higher School of Economics, National Research University, Nizhny Novgorod, Russia
ISSN 2194-1009 ISSN 2194-1017 (electronic) Springer Proceedings in Mathematics & Statistics ISBN 978-3-319-96246-7 ISBN 978-3-319-96247-4 (eBook) https://doi.org/10.1007/978-3-319-96247-4 Library of Congress Control Number: 2018948592 Mathematics Subject Classification (2010): 05C82, 90B10, 90B15 © Springer International Publishing AG, part of Springer Nature 2018 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
This volume is based on the papers presented at the 7th International Conference on Network Analysis held in Nizhny Novgorod, Russia, June 22–24, 2017. The main focus of the conference and this volume is the development of new computationally efficient algorithms as well as the underlying analysis and optimization of large-scale networks. Various applications in the context of social networks, power transmission grids, stock market networks, and human brain networks are also considered. The previous books based on the papers presented at the 1st–6th International Conferences on Network Analysis can be found in [1–6]. The current volume consists of three major parts, namely, network computational algorithms, network models, and network applications. The first part of the book is focused on algorithmic methods in large-scale networks. In the Chapter “Tabu Search for Fleet Size and Mix Vehicle Routing Problem with Hard and Soft Time Windows”, the authors present an efficient algorithm for solving a computationally hard problem in a real-life network setting. In the Chapter “FPT Algorithms for the Shortest Lattice Vector and Integer Linear Programming Problems”, fixed-parameter tractable (FPT) algorithms are discussed. The main parameter is the maximal absolute value of rank minors of matrices in the problem formulation. The Chapter “The Video-Based Age and Gender Recognition with Convolution Neural Networks” reviews age and gender recognition methods for video data using modern deep convolutional neural networks. A new video-based recognition system is implemented with several aggregation methods that improve identification accuracy. The Chapter “On Forbidden Induced Subgraphs for the Class of Triangle-König Graphs” considers a hereditary class of graphs with the following property: the maximum cardinality of a triangle packing for each graph of the class is equal to the minimum cardinality of a triangle vertex cover.
Then, some minimal forbidden induced subgraphs for this class are described. The Chapter “The Global Search Theory Approach to the Bilevel Pricing Problem in Telecommunication Networks” deals with the hierarchical problem of optimal pricing in telecommunication networks.
New methods of local and global search for finding an optimistic solution are developed. In the Chapter “Graph Dichotomy Algorithm and Its Applications to Analysis of Stocks Market”, a new approach to measuring graph complexity is presented. This new measure is used to study short-term predictions of crises in the stock market. In the Chapter “Cluster Analysis of Facial Video Data in Video Surveillance Systems Using Deep Learning”, a new approach is proposed for structuring information in video surveillance systems. The first stage of the new approach consists of grouping videos that contain identical faces. Based on this idea, a new efficient and fast algorithm of video surveillance is developed. The Chapter “Using Modular Decomposition Technique to Solve the Maximum Clique Problem” applies the modular decomposition technique to solve the weighted maximum clique problem. The developed technique is compared against the state-of-the-art algorithms in the area. The second part of the book is focused on network models. In the Chapter “Robust Statistical Procedures for Testing Dynamics in Market Network”, a sign similarity market network is considered. The problem of testing dynamics in a market network is formulated as a problem of testing homogeneity hypotheses. Multiple testing techniques to solve this problem are developed and applied to different stock markets. The Chapter “Application of Market Models to Network Equilibrium Problems” describes an extension of the network flow equilibrium problem with elastic demands and develops a new equilibrium-type model for resource allocation problems in wireless communication networks. In the Chapter “Selective Bi-coordinate Variations for Network Equilibrium Problems with Mixed Demand”, a modification of the method of bi-coordinate variations for network equilibrium problems with mixed demand is proposed. Some numerical results that confirm the efficiency of the method are presented.
The Chapter “Developing a Model of Topological Structure Formation for Power Transmission Grids Based on the Analysis of the UNEG” studies the node degree distribution in the United National Electricity Grid (UNEG) of Russia. This study makes it possible to develop a new Random Growth Model (RGM) to simulate the UNEG and to identify the key principles of network formation. In the Chapter “Methods of Criteria Importance Theory and Their Software Implementation”, a general approach for solving the multicriteria choice problem is developed on the basis of the criteria importance theory. These methods are implemented in the computer system DASS. The Chapter “A Model of Optimal Network Structure for Decentralized Nearest Neighbor Search” discusses the problem of finding the best network structure for the decentralized nearest neighbor search algorithm. A mathematical programming model for the problem is proposed. Optimal structures of small-size networks are computed. A generalization for large-scale networks is also discussed. In the Chapter “Computational Study of Activation Dynamics on Networks of Arbitrary Structure”, new results on modeling dynamical properties of collective systems are presented. A general technique for reducing these problems to the SAT problem is developed. This approach is then applied to computer security networks. The Chapter “Rejection Graph for Multiple Testing of Elliptical Model for Market Network” uses the symmetry condition of tail
distributions to test the elliptical hypothesis for stock return distributions. Multiple testing procedures are developed to solve this problem. These procedures are then applied to the stock markets of different countries. A specific structure of the rejection graph is observed and discussed. In the Chapter “Mapping Paradigms of Social Sciences: Application of Network Analysis”, the network approach is applied to studying the relationships between the various elements that constitute any particular research in the social sciences. A set of relations between various elements from textbooks on the methodology of social and political sciences is extracted and analyzed. The third part of the book deals with network applications. The Chapter “Using Geometry of the Set of Symmetric Positive Semidefinite Matrices to Classify Structural Brain Networks” presents a method for classifying Symmetric Positive Semidefinite (SPSD) matrices and its application to the analysis of structural brain networks (connectomes). Existing Symmetric Positive Definite (SPD) matrix-based algorithms are generalized to the SPSD case. The performance of the proposed pipeline is demonstrated on structural brain networks reconstructed from the Alzheimer's Disease Neuroimaging Initiative (ADNI) data. In the Chapter “Comparison of Statistical Procedures for Gaussian Graphical Model Selection”, the uncertainty of statistical procedures for Gaussian graphical model selection is studied. Different statistical procedures are compared using different uncertainty measures, such as Type I and Type II errors, ROC curves, and AUC. The Chapter “Sentiment Analysis Using Deep Learning” analyzes the advantages of deep learning methods over other baseline machine learning methods using a sentiment analysis task on Twitter. In the Chapter “Invariance Properties of Statistical Procedures for Network Structures Identification”, the optimality of some multiple decision procedures is discussed in the class of scale/shift invariant procedures.
In the Chapter “Topological Modules of Human Brain Networks Are Anatomically Embedded: Evidence from Modularity Analysis at Multiple Scales”, an MRI data set is used to demonstrate that the modular structure of brain networks is well reproducible in test–retest settings. These results provide evidence for the theoretically well-motivated hypothesis that brain regions that neighbor each other in anatomical space also tend to belong to the same topological modules. The Chapter “Commercial Astroturfing Detection in Social Networks” is devoted to constructing a model capable of detecting astroturfing in customer reviews based on network analysis. In the Chapter “Information Propagation Strategies in Online Social Networks”, the authors discuss the problem of predicting information propagation using social network interactions. They suggest a new approach to constructing a model of information propagation and test it on a real data set from social networks. The Chapter “Analysis of Co-authorship Networks and Scientific Citation Based on Google Scholar” investigates how scientific collaboration, represented by co-authorship, is related to the citation indicators of a scientist. In the Chapter “Company Co-mention Network Analysis”, a new company network is constructed on the basis of news items mentioning two companies together. Different types of social network analysis metrics (degree centrality, closeness centrality, betweenness centrality, eigenvector centrality, and frequency) are used to identify key companies in the network.
We would like to thank all the authors and referees for their efforts. This work is supported by the Laboratory of Algorithms and Technologies for Network Analysis (LATNA) of the National Research University Higher School of Economics and by RSF grant 14-41-00039. Nizhny Novgorod, Russia Gainesville, FL, USA Pittsburgh, PA, USA Nizhny Novgorod, Russia
Valery A. Kalyagin Panos M. Pardalos Oleg Prokopyev Irina Utkina
References

1. Goldengorin, B.I., Kalyagin, V.A., Pardalos, P.M. (eds.): Models, Algorithms and Technologies for Network Analysis: Proceedings of the First International Conference on Network Analysis. Springer Proceedings in Mathematics and Statistics, vol. 32. Springer, Cham (2013)
2. Goldengorin, B.I., Kalyagin, V.A., Pardalos, P.M. (eds.): Models, Algorithms and Technologies for Network Analysis: Proceedings of the Second International Conference on Network Analysis. Springer Proceedings in Mathematics and Statistics, vol. 59. Springer, Cham (2013)
3. Batsyn, M.V., Kalyagin, V.A., Pardalos, P.M. (eds.): Models, Algorithms and Technologies for Network Analysis: Proceedings of the Third International Conference on Network Analysis. Springer Proceedings in Mathematics and Statistics, vol. 104. Springer, Cham (2014)
4. Kalyagin, V.A., Pardalos, P.M., Rassias, T.M. (eds.): Network Models in Economics and Finance. Springer Optimization and Its Applications, vol. 100. Springer, Cham (2014)
5. Kalyagin, V.A., Koldanov, P.A., Pardalos, P.M. (eds.): Models, Algorithms and Technologies for Network Analysis: NET 2014, Nizhny Novgorod, Russia, May 2014. Springer Proceedings in Mathematics and Statistics, vol. 156. Springer, Cham (2016)
6. Kalyagin, V.A., Nikolaev, A.I., Pardalos, P.M., Prokopyev, O.A. (eds.): Models, Algorithms and Technologies for Network Analysis: NET 2016, Nizhny Novgorod, Russia, May 2016. Springer Proceedings in Mathematics and Statistics, vol. 197. Springer, Cham (2017)
Contents
Part I
Network Computational Algorithms
Tabu Search for Fleet Size and Mix Vehicle Routing Problem with Hard and Soft Time Windows . . . . 3
Mikhail Batsyn, Ilya Bychkov, Larisa Komosko and Alexey Nikolaev

FPT Algorithms for the Shortest Lattice Vector and Integer Linear Programming Problems . . . . 19
D. V. Gribanov

The Video-Based Age and Gender Recognition with Convolution Neural Networks . . . . 37
Angelina S. Kharchevnikova and Andrey V. Savchenko

On Forbidden Induced Subgraphs for the Class of Triangle-König Graphs . . . . 47
Dmitry B. Mokeev

The Global Search Theory Approach to the Bilevel Pricing Problem in Telecommunication Networks . . . . 57
Andrei V. Orlov

Graph Dichotomy Algorithm and Its Applications to Analysis of Stocks Market . . . . 75
Alexander Rubchinsky

Cluster Analysis of Facial Video Data in Video Surveillance Systems Using Deep Learning . . . . 113
Anastasiia D. Sokolova and Andrey V. Savchenko

Using Modular Decomposition Technique to Solve the Maximum Clique Problem . . . . 121
Irina Utkina
Part II
Network Models
Robust Statistical Procedures for Testing Dynamics in Market Network . . . . 135
A. P. Koldanov and M. A. Voronina

Application of Market Models to Network Equilibrium Problems . . . . 143
Igor Konnov

Selective Bi-coordinate Variations for Network Equilibrium Problems with Mixed Demand . . . . 161
Igor Konnov and Olga Pinyagina

Developing a Model of Topological Structure Formation for Power Transmission Grids Based on the Analysis of the UNEG . . . . 171
Sergey Makrushin

Methods of Criteria Importance Theory and Their Software Implementation . . . . 189
Andrey Pavlovich Nelyubin, Vladislav Vladimirovich Podinovski and Mikhail Andreevich Potapov

A Model of Optimal Network Structure for Decentralized Nearest Neighbor Search . . . . 197
Alexander Ponomarenko, Irina Utkina and Mikhail Batsyn

Computational Study of Activation Dynamics on Networks of Arbitrary Structure . . . . 205
Alexander Semenov, Dmitry Gorbatenko and Stepan Kochemazov

Rejection Graph for Multiple Testing of Elliptical Model for Market Network . . . . 221
D. P. Semenov and Petr A. Koldanov

Mapping Paradigms of Social Sciences: Application of Network Analysis . . . . 235
Dmitry Zaytsev and Daria Drozdova

Part III
Network Applications
Using Geometry of the Set of Symmetric Positive Semidefinite Matrices to Classify Structural Brain Networks . . . . 257
Mikhail Belyaev, Yulia Dodonova, Daria Belyaeva, Egor Krivov, Boris Gutman, Joshua Faskowitz, Neda Jahanshad and Paul Thompson

Comparison of Statistical Procedures for Gaussian Graphical Model Selection . . . . 269
Ivan S. Grechikhin and Valery A. Kalyagin
Sentiment Analysis Using Deep Learning . . . . 281
Nikolay Karpov, Alexander Lyashuk and Arsenii Vizgunov

Invariance Properties of Statistical Procedures for Network Structures Identification . . . . 289
Petr A. Koldanov

Topological Modules of Human Brain Networks Are Anatomically Embedded: Evidence from Modularity Analysis at Multiple Scales . . . . 299
Anvar Kurmukov, Yulia Dodonova, Margarita Burova, Ayagoz Mussabayeva, Dmitry Petrov, Joshua Faskowitz and Leonid E. Zhukov

Commercial Astroturfing Detection in Social Networks . . . . 309
Nadezhda Kostyakova, Ilia Karpov, Ilya Makarov and Leonid E. Zhukov

Information Propagation Strategies in Online Social Networks . . . . 319
Rodion Laptsuev, Marina Ananyeva, Dmitry Meinster, Ilia Karpov, Ilya Makarov and Leonid E. Zhukov

Analysis of Co-authorship Networks and Scientific Citation Based on Google Scholar . . . . 329
Nataliya Matveeva and Oleg Poldin

Company Co-mention Network Analysis . . . . 341
S. P. Sidorov, A. R. Faizliev, V. A. Balash, A. A. Gudkov, A. Z. Chekmareva and P. K. Anikin
Contributors
Marina Ananyeva Higher School of Economics, National Research University, Moscow, Russia
P. K. Anikin Saratov State University, Saratov, Russian Federation
V. A. Balash Saratov State University, Saratov, Russian Federation
Mikhail Batsyn Higher School of Economics, National Research University, Moscow, Russia; Laboratory of Algorithms and Technologies for Network Analysis, National Research University Higher School of Economics, Nizhny Novgorod, Russia
Mikhail Belyaev Skolkovo Institute of Science and Technology, Moscow, Russia; Kharkevich Institute for Information Transmission Problems, Moscow, Russia
Daria Belyaeva Skolkovo Institute of Science and Technology, Moscow, Russia; Kharkevich Institute for Information Transmission Problems, Moscow, Russia
Margarita Burova Higher School of Economics, National Research University, Moscow, Russia
Ilya Bychkov Laboratory of Algorithms and Technologies for Network Analysis, National Research University Higher School of Economics, Nizhny Novgorod, Russia
A. Z. Chekmareva Saratov State University, Saratov, Russian Federation
Yulia Dodonova Kharkevich Institute for Information Transmission Problems, Moscow, Russia
Daria Drozdova Higher School of Economics, National Research University, Moscow, Russia
A. R. Faizliev Saratov State University, Saratov, Russian Federation
Joshua Faskowitz Imaging Genetics Center, Stevens Neuroimaging and Informatics Institute, University of Southern California, Marina del Rey, CA, USA; Indiana University, Bloomington, USA
Dmitry Gorbatenko Matrosov Institute for System Dynamics and Control Theory SB RAS, Irkutsk, Russia
Ivan S. Grechikhin Laboratory of Algorithms and Technologies for Network Analysis, National Research University Higher School of Economics, Nizhny Novgorod, Russia
D. V. Gribanov Lobachevsky State University of Nizhny Novgorod, Nizhny Novgorod, Russian Federation; National Research University Higher School of Economics, Nizhny Novgorod, Russian Federation
A. A. Gudkov Saratov State University, Saratov, Russian Federation
Boris Gutman Kharkevich Institute for Information Transmission Problems, Moscow, Russia; Imaging Genetics Center, Stevens Neuroimaging and Informatics Institute, University of Southern California, Marina del Rey, CA, USA
Neda Jahanshad Imaging Genetics Center, Stevens Neuroimaging and Informatics Institute, University of Southern California, Marina del Rey, CA, USA
Valery A. Kalyagin Laboratory of Algorithms and Technologies for Network Analysis, National Research University Higher School of Economics, Nizhny Novgorod, Russia
Ilia Karpov Higher School of Economics, National Research University, Moscow, Russia
Nikolay Karpov Higher School of Economics, National Research University, Moscow, Russia
Angelina S. Kharchevnikova Higher School of Economics, National Research University, Nizhny Novgorod, Russia
Stepan Kochemazov Matrosov Institute for System Dynamics and Control Theory SB RAS, Irkutsk, Russia
A. P. Koldanov Laboratory of Algorithms and Technologies for Network Analysis, National Research University Higher School of Economics, Nizhny Novgorod, Russia
Petr A. Koldanov Laboratory of Algorithms and Technologies for Network Analysis, National Research University Higher School of Economics, Nizhny Novgorod, Russia
Larisa Komosko Laboratory of Algorithms and Technologies for Network Analysis, National Research University Higher School of Economics, Nizhny Novgorod, Russia
Igor Konnov Kazan Federal University, Kazan, Russia; Kazan Federal University, Institute of Computational Mathematics and Information Technologies, Kazan, Russia
Nadezhda Kostyakova Higher School of Economics, National Research University, Moscow, Russia
Egor Krivov Skolkovo Institute of Science and Technology, Moscow, Russia; Kharkevich Institute for Information Transmission Problems, Moscow, Russia; Moscow Institute of Physics and Technology, Moscow, Russia
Anvar Kurmukov Higher School of Economics, National Research University, Moscow, Russia
Rodion Laptsuev Higher School of Economics, National Research University, Moscow, Russia
Alexander Lyashuk Higher School of Economics, National Research University, Moscow, Russia
Ilya Makarov Higher School of Economics, National Research University, Moscow, Russia
Sergey Makrushin Financial University under the Government of the Russian Federation, Moscow, Russia
Nataliya Matveeva Higher School of Economics, National Research University, Moscow, Russia
Dmitry Meinster Higher School of Economics, National Research University, Moscow, Russia
Dmitry B. Mokeev Department of Algebra, Geometry and Discrete Mathematics, Lobachevsky State University of Nizhny Novgorod, N. Novgorod, Russia; Laboratory of Algorithms and Technologies for Networks Analysis, Higher School of Economics in Nizhny Novgorod, N. Novgorod, Russia
Ayagoz Mussabayeva Higher School of Economics, National Research University, Moscow, Russia
Andrey Pavlovich Nelyubin Mechanical Engineering Research Institute of the RAS, Moscow, Russia
Alexey Nikolaev Laboratory of Algorithms and Technologies for Network Analysis, National Research University Higher School of Economics, Nizhny Novgorod, Russia
Andrei V. Orlov Matrosov Institute for System Dynamics and Control Theory of SB RAS, Irkutsk, Russia
Dmitry Petrov Kharkevich Institute for Information Transmission Problems, Moscow, Russia; Imaging Genetics Center, University of Southern California, Los Angeles, USA
Olga Pinyagina Kazan Federal University, Institute of Computational Mathematics and Information Technologies, Kazan, Russia
Vladislav Vladimirovich Podinovski Higher School of Economics, National Research University, Moscow, Russia
Oleg Poldin Higher School of Economics, National Research University, Moscow, Russia
Alexander Ponomarenko Higher School of Economics, National Research University, Moscow, Russia
Mikhail Andreevich Potapov Institute of Computer Aided Design of the RAS, Moscow, Russia
Alexander Rubchinsky Higher School of Economics, National Research University, Moscow, Russia; National Research Technological University “MISIS”, Moscow, Russia
Andrey V. Savchenko Laboratory of Algorithms and Technologies for Network Analysis, National Research University Higher School of Economics, Nizhny Novgorod, Russia
Alexander Semenov Matrosov Institute for System Dynamics and Control Theory SB RAS, Irkutsk, Russia
D. P. Semenov Laboratory of Algorithms and Technologies for Network Analysis, National Research University Higher School of Economics, Nizhny Novgorod, Russia
S. P. Sidorov Saratov State University, Saratov, Russian Federation
Anastasiia D. Sokolova Laboratory of Algorithms and Technologies for Network Analysis, National Research University Higher School of Economics, Nizhny Novgorod, Russia
Paul Thompson Imaging Genetics Center, Stevens Neuroimaging and Informatics Institute, University of Southern California, Marina del Rey, CA, USA
Irina Utkina Higher School of Economics, National Research University, Moscow, Russia; Laboratory of Algorithms and Technologies for Network Analysis, National Research University Higher School of Economics, Nizhny Novgorod, Russia
Arsenii Vizgunov Higher School of Economics, National Research University, Moscow, Russia
M. A. Voronina Laboratory of Algorithms and Technologies for Network Analysis, National Research University Higher School of Economics, Nizhny Novgorod, Russia
Dmitry Zaytsev Higher School of Economics, National Research University, Moscow, Russia
Leonid E. Zhukov Higher School of Economics, National Research University, Moscow, Russia
Part I
Network Computational Algorithms
Tabu Search for Fleet Size and Mix Vehicle Routing Problem with Hard and Soft Time Windows Mikhail Batsyn, Ilya Bychkov, Larisa Komosko and Alexey Nikolaev
Abstract The paper presents a tabu search heuristic for the Fleet Size and Mix Vehicle Routing Problem (FSMVRP) with hard and soft time windows. The objective function minimizes the sum of travel costs, fixed vehicle costs, and penalties for soft time window violations. The algorithm is based on tabu search with several neighborhoods. The main contribution of the paper is an efficient algorithm for a real-life vehicle routing problem. To the best of our knowledge, there are no papers devoted to the FSMVRP with soft time windows, while in real-life problems this is a common case. We investigate the performance of the proposed heuristic on the classical Solomon instances with additional constraints. We also compare our approach, without soft time windows and a heterogeneous fleet of vehicles, with recently published results on the VRP with hard time windows.

Keywords Soft time windows · Fleet size and mix · Vehicle routing · Variable neighborhood search · Tabu search
M. Batsyn (B) · I. Bychkov · L. Komosko · A. Nikolaev
Laboratory of Algorithms and Technologies for Network Analysis, National Research University Higher School of Economics, 136 Rodionova, 603093 Nizhny Novgorod, Russia
e-mail: [email protected]
I. Bychkov
e-mail: [email protected]
L. Komosko
e-mail: [email protected]
A. Nikolaev
e-mail: [email protected]
© Springer International Publishing AG, part of Springer Nature 2018
V. A. Kalyagin et al. (eds.), Computational Aspects and Applications in Large-Scale Networks, Springer Proceedings in Mathematics & Statistics 247, https://doi.org/10.1007/978-3-319-96247-4_1

1 Introduction

The Vehicle Routing Problem (VRP) is an NP-hard combinatorial optimization problem. The classical VRP consists in finding the optimal set of routes for a fleet of vehicles located at the central depot to deliver goods to a given set of customers.
The practical interest in this problem lies in its application to various spheres of real life: delivery of goods to supermarkets [26], waste collection [3], school bus routing [20], mail delivery [25], transportation of people with disabilities [8], and many others. However, real-life vehicle routing problems differ considerably from classical ones, featuring various constraints such as hard and soft time windows, different vehicles, capacity constraints, split deliveries, and others. All these additional features make the process of finding a solution extremely difficult and time-consuming. VRP problems with a heterogeneous fleet of vehicles are frequently encountered in logistics. In real-world applications, fleets of vehicles with different capacities and costs are more common than homogeneous ones [1]. There are two types of heterogeneous fleet VRP [16, 17]. The first is the Fleet Size and Mix Vehicle Routing Problem (FSMVRP), in which there are no limits on the number of available vehicles of each type. The second is the Heterogeneous Fleet Vehicle Routing Problem (HFVRP), where the number of vehicles of each type is limited. In this paper, the FSMVRP is considered [4, 23, 29, 30]. According to Laporte [18], time windows are also among the most vital constraints used in real-world applications. There are two types: hard and soft time windows. In real life, a customer rarely imposes only one type of time window. The most common situation is when there are certain working hours of a customer—a hard time window—and a desirable period of time for receiving an order—a soft time window. In the academic world, much attention is paid to the Vehicle Routing Problem with hard Time Windows (VRPTW) [2, 5, 6, 10, 12, 13, 21, 25, 27, 28]. Fewer papers deal with the vehicle routing problem with soft time windows [9, 14, 22, 24].
Papers considering both a heterogeneous fleet of vehicles and soft time windows are not known to the authors. Note that the existing VRP models with Soft Time Windows (VRPSTW) also include hard time window constraints, so hereafter VRPSTW denotes a VRP problem with both soft and hard time windows. In this paper, we suggest an efficient approach for the Fleet Size and Mix Vehicle Routing Problem with Soft Time Windows (FSMVRPSTW). Our algorithm is based on tabu search with variable neighborhoods. We use swap and relocate neighborhoods for inter-route optimization and a swap neighborhood for intra-route optimization. A special procedure is developed for soft time window optimization. The paper is organized as follows. In the next section, we provide an integer linear programming model for the considered problem. In Sect. 3, the pseudocode and description of our algorithm are presented. Section 4 contains computational experiments and results, and Sect. 5 concludes the paper.
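To make the inter-route relocate neighborhood concrete, the sketch below enumerates relocate moves (removing one customer from a route and reinserting it into another route) over a simple list-of-routes representation. This is a minimal illustrative sketch under our own assumptions, not the authors' implementation: the function names and the depot-free route encoding are ours, and cost evaluation and tabu-list checks are omitted.

```python
import itertools

def relocate_moves(routes):
    """Enumerate inter-route relocate moves. A move (a, i, b, j) takes the
    customer at position i of route a and inserts it at position j of
    route b (a != b). Routes are lists of customer ids; the depot is implicit."""
    for a, b in itertools.permutations(range(len(routes)), 2):
        for i in range(len(routes[a])):
            for j in range(len(routes[b]) + 1):
                yield (a, i, b, j)

def apply_relocate(routes, move):
    """Return a new solution with the relocate move applied (input unchanged)."""
    a, i, b, j = move
    new_routes = [r[:] for r in routes]      # deep-enough copy of the route lists
    customer = new_routes[a].pop(i)          # remove customer from route a
    new_routes[b].insert(j, customer)        # reinsert into route b at position j
    return new_routes
```

For example, applying the move `(0, 0, 1, 0)` to routes `[[1, 2], [3]]` relocates customer 1 to the front of the second route, yielding `[[2], [1, 3]]`. In a tabu search, each candidate move would be evaluated for its cost change and tabu status before the best admissible one is applied.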
2 Mathematical Model

The fleet size and mix vehicle routing problem with soft time windows can be described as follows. Let $G = (V_0, A)$ be a complete directed graph, where
Tabu Search for Fleet Size and Mix Vehicle Routing Problem ...
$V_0 = \{0, \dots, n\}$ is the set of vertices comprising the customers $V = \{1, \dots, n\}$ and the depot node $0$, and $A$ is the set of arcs. A nonnegative travel cost $c_{ij}$ and travel time $t_{ij}$ are assigned to every arc $(i, j) \in A$, $i \neq j$. Each customer $i \in V$ has a nonnegative demand $q_i$, service time $s_i$, a hard time window $[e_i, l_i]$ (the earliest and latest possible start of service), and a soft time window $[\tilde{e}_i, \tilde{l}_i]$ (the earliest and latest preferred start of service). $K$ denotes the set of vehicle types available at the depot, and $k$ is the index of a vehicle type. Parameters $Q_k$ and $f_k$ are the capacity and fixed cost of a vehicle of type $k \in K$. The beginning of service of customer $i \in V$ by vehicle $l$ of type $k \in K$ in a solution is denoted by the variable $b_i^{kl}$. Note that the number of available vehicles of any type is not limited in this model, and therefore the index $l$ can take any value from $1$ to $n$. If a vehicle arrives at customer $i \in V$ earlier than the earliest possible start of service $e_i$, it has to wait (without any restrictions) at least until $e_i$ to start the service. If a vehicle starts service before the earliest preferred time $\tilde{e}_i$ but after $e_i$, the penalty $\alpha(\tilde{e}_i - b_i^{kl})$ is paid, where the parameter $\alpha$ denotes the cost of one time unit of soft time window violation. Similarly, if a vehicle starts service later than the latest preferred time $\tilde{l}_i$ but earlier than the latest possible time $l_i$, the penalty $\alpha(b_i^{kl} - \tilde{l}_i)$ is paid. The objective of the FSMVRPSTW is to choose vehicles and their routes so that all customers are served at minimum total cost, including travel costs, fixed vehicle costs, and penalties for soft time window violations. The following mixed integer linear programming model for the fleet size and mix vehicle routing problem with hard and soft time windows is introduced.
Decision variables:

$x_{ij}^{kl} \in \{0, 1\}$: $x_{ij}^{kl} = 1$ if vehicle $l$ of type $k$ travels along arc $(i, j) \in A$ in the solution.
$y^{kl} \in \{0, 1\}$: $y^{kl} = 1$ if vehicle $l$ of type $k$ is used in the solution for serving customers.
$z_i^{kl} \in \{0, 1\}$: $z_i^{kl} = 1$ if vehicle $l$ of type $k$ serves customer $i$ in the solution.
$b_i^{kl} \ge 0$: beginning of service at customer $i$ by vehicle $l$ of type $k$.
$v_{i-}^{kl} \ge 0$: violation of the earliest preferred time at customer $i$ by vehicle $l$ of type $k$.
$v_{i+}^{kl} \ge 0$: violation of the latest preferred time at customer $i$ by vehicle $l$ of type $k$.

Objective function:

$$\sum_{k \in K} f_k \sum_{l=1}^{n} y^{kl} \;+\; \sum_{k \in K} \sum_{l=1}^{n} \sum_{i \in V_0} \sum_{j \in V_0} c_{ij} x_{ij}^{kl} \;+\; \alpha \sum_{k \in K} \sum_{l=1}^{n} \sum_{i \in V} \left( v_{i-}^{kl} + v_{i+}^{kl} \right) \;\to\; \min \qquad (1)$$
M. Batsyn et al.
Constraints:

$$\sum_{k \in K} \sum_{l=1}^{n} z_i^{kl} = 1, \quad \forall i \in V \qquad (2)$$

$$y^{kl} \ge z_i^{kl}, \quad \forall k \in K,\ \forall l = \overline{1, n},\ \forall i \in V \qquad (3)$$

$$\sum_{i \in V} x_{0i}^{kl} = \sum_{i \in V} x_{i0}^{kl} = y^{kl}, \quad \forall k \in K,\ \forall l = \overline{1, n} \qquad (4)$$

$$\sum_{j \in V_0} x_{ij}^{kl} = \sum_{j \in V_0} x_{ji}^{kl} = z_i^{kl}, \quad \forall k \in K,\ \forall l = \overline{1, n},\ \forall i \in V \qquad (5)$$

$$\sum_{i \in V} q_i z_i^{kl} \le Q_k, \quad \forall k \in K,\ \forall l = \overline{1, n} \qquad (6)$$

$$b_j^{kl} \ge b_i^{kl} + s_i + t_{ij} - M\left(1 - x_{ij}^{kl}\right), \quad \forall k \in K,\ \forall l = \overline{1, n},\ \forall i \in V_0,\ \forall j \in V \qquad (7)$$

$$e_i \le b_i^{kl} \le l_i, \quad \forall k \in K,\ \forall l = \overline{1, n},\ \forall i \in V \qquad (8)$$

$$v_{i-}^{kl} \ge \tilde{e}_i - b_i^{kl}, \quad \forall k \in K,\ \forall l = \overline{1, n},\ \forall i \in V \qquad (9)$$

$$v_{i+}^{kl} \ge b_i^{kl} - \tilde{l}_i, \quad \forall k \in K,\ \forall l = \overline{1, n},\ \forall i \in V \qquad (10)$$
The objective function (1) minimizes the total cost of all routes, including travel costs, fixed vehicle costs, and penalties for soft time window violations. Constraints (2) require that each customer is serviced exactly once. Inequalities (3) guarantee that a vehicle is used in the solution, and its fixed cost taken into account, only if it serves at least one customer. Equalities (4, 5) specify that every used vehicle has to leave the depot, arrive at all its appointed customers, leave them, and finally return to the depot. Constraints (6) require the total demand of all customers in one route not to exceed the capacity of the vehicle serving it. Inequalities (7) guarantee that the service of a customer is not started before the vehicle completes service at the previous customer and travels to this one. Here, we use big-M notation, which makes the inequality automatically satisfied for any variable values when $x_{ij}^{kl} = 0$. Finally, constraints (8) require beginning times to lie inside the hard time windows, and constraints (9, 10) determine how much the soft time windows are violated.
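As an illustration of how the soft-window penalty terms (9)–(10) enter objective (1), the following sketch computes the penalty for one service start time. The function name and interface are ours; the paper gives only the mathematical model.

```python
# Sketch: evaluating the soft time window penalty terms of objective (1).
# All names are illustrative; only the formulas come from the model.

def soft_window_penalty(b, e_soft, l_soft, alpha):
    """Penalty alpha*(v_minus + v_plus) for a service starting at time b
    with preferred (soft) window [e_soft, l_soft]."""
    v_minus = max(0.0, e_soft - b)   # earliness w.r.t. preferred start
    v_plus = max(0.0, b - l_soft)    # lateness w.r.t. preferred start
    return alpha * (v_minus + v_plus)

# A vehicle starting service at t=8 with preferred window [10, 14]
# and alpha=4 pays 4*(10-8) = 8.
print(soft_window_penalty(8, 10, 14, 4))   # 8.0
print(soft_window_penalty(12, 10, 14, 4))  # 0.0 (inside the soft window)
print(soft_window_penalty(16, 10, 14, 4))  # 8.0
```

In the model, the two max-terms are linearized through the nonnegative variables $v_{i-}^{kl}$ and $v_{i+}^{kl}$ together with constraints (9) and (10).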
3 Algorithm Description

The developed algorithm is based on the tabu search and variable neighborhood search metaheuristics. The main function of our algorithm is TABUSEARCH() (Algorithm 1). The idea of the developed algorithm for the fleet size and mix vehicle routing problem with soft time windows consists in applying tabu search with several neighborhoods to an initial greedy solution. One of the advantages of the
tabu search is the tabu list, which forbids certain edges and thus diversifies the search. This approach helps to avoid loops and to move away from a local optimum, out of the neighborhood of the current best solution. New solutions are generated by means of 2-opt, swap, and relocate operators, taking the tabu edges into consideration. If a better value of the objective function is found and the new solution is feasible in terms of hard time windows and vehicle loads, then it replaces the current best solution. In that case, the algorithm tries to further optimize the found solution by means of intra-route improvement operators. If the solution is infeasible, then its objective value is penalized. These steps are repeated until the current solution has remained infeasible for SC (Stopping Criterion) consecutive iterations.

Algorithm 1. Tabu search

function TABUSEARCH()        ⊳ builds solution S* for the fleet size and mix vehicle routing problem with soft time windows
  NS ← 1000                  ⊳ the number of shakes – restarts from a new initial solution
  σ ← 0.833                  ⊳ diversification parameter, the best value was found empirically
  TL ← ∅                     ⊳ tabu list
  S* ← ∅                     ⊳ f(S*) = +∞
  while NS > 0 do
    S ← SOLOMONINSERTION(V)
    β ← 1                    ⊳ the penalty for violating a hard time window by 1 time unit
    γ ← 1                    ⊳ the penalty for exceeding a vehicle capacity by 1 unit
    SC ← 1500                ⊳ stopping criterion – the number of infeasible steps in tabu search
    while SC > 0 do
      S′ ← CHOOSENEIGHBORSOLUTION(S)
      if f(S′) ≥ f(S*) then
        TL ← TL ∪ S
      end if
      S ← S′
      if (w(S) = 0) & (q(S) = 0) then    ⊳ w(S) – total violation of hard time windows in S; q(S) – total overload of vehicle capacities in S
        INTRAOPTIMISATION(S)
        SOFTWINDOWOPTIMISATION(S)
        if f(S) < f(S*) then
          S* ← S
        end if
        SC ← 1500
      else
        f(S) ← f(S) + β·w(S) + γ·q(S)
        SC ← SC − 1
      end if
      UPDATEPARAMETER(β, w(S))
      UPDATEPARAMETER(γ, q(S))
    end while
    TL ← ∅
    NS ← NS − 1
  end while
  return S*
end function
Algorithm 2. Updating objective function parameters

function UPDATEPARAMETER(parameter, criterion)   ⊳ assigns a new value to parameter to control the direction of the search space exploration
  if criterion > 0 then
    parameter ← parameter · (1 + σ)
  else if parameter > 1 then
    parameter ← parameter / (1 + σ)
  end if
end function
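Algorithm 2 above can be sketched in Python as follows (variable and function names are ours; σ is the diversification parameter from Algorithm 1):

```python
SIGMA = 0.833  # diversification parameter from Algorithm 1

def update_parameter(parameter, criterion, sigma=SIGMA):
    """Adaptive penalty update: grow the penalty while the solution is
    infeasible w.r.t. the criterion, shrink it (down to 1) otherwise."""
    if criterion > 0:
        return parameter * (1 + sigma)
    if parameter > 1:
        return parameter / (1 + sigma)
    return parameter

beta = 1.0
beta = update_parameter(beta, criterion=5)  # infeasible step: beta grows to 1.833
beta = update_parameter(beta, criterion=0)  # feasible step: beta shrinks back
print(round(beta, 3))  # 1.0
```

The effect is that penalties β and γ oscillate around the feasibility boundary, steering the search back toward feasible regions without forbidding infeasible moves outright.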
3.1 Initial Solution

The construction of the initial solution for the stated problem is based on Solomon's [28] insertion heuristic. First, a route consisting of only one customer, the one farthest from the depot, is created. Afterward, new customers are gradually added to the current route, taking into account hard time windows and capacity constraints. To evaluate the insertion cost of a new customer $u$ between two adjacent customers $i$ and $j$, the criterion $c_1(i, u, j)$ is computed, and to choose the best customer for insertion, the criterion $c_2(u)$ is computed. Let $(r_1, r_2, \dots, r_m)$ be the current route, where the first and last points are the depot: $r_1 = 0$, $r_m = n + 1$ (vertex $n + 1$ is a copy of vertex $0$). For each unserved customer, we determine the best insertion place, taking into account the change in travel cost, beginning times, and fixed vehicle cost. The evaluation of these factors enters the calculation of the $c_1(i, u, j)$ criterion. After applying this procedure, the insertion place with the least insertion cost is chosen. Next, for every customer $u$, the $c_2(u)$ criterion is calculated. It takes into account how expensive it is to reach this customer from the depot (the direct travel cost $c_{0u}$), the cost of the smallest vehicle able to serve the route with this customer, and the cheapest insertion cost $c_1^*(u)$. The calculation of these criteria is implemented in SOLOMONINSERTION().
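To make the insertion-cost idea concrete, here is a minimal sketch of choosing the cheapest insertion position. It is our own simplification: only the travel-distance increase is used for $c_1$, while the paper's full criterion also accounts for time shifts and vehicle costs.

```python
import math

def dist(a, b):
    """Euclidean distance between two points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def best_insertion(route, u, coords):
    """Return (position, cost increase) of the cheapest place to insert
    customer u between consecutive customers of the route.
    Simplified c1: only the travel-cost increase c_iu + c_uj - c_ij."""
    best_pos, best_cost = None, math.inf
    for pos in range(len(route) - 1):
        i, j = route[pos], route[pos + 1]
        c1 = dist(coords[i], coords[u]) + dist(coords[u], coords[j]) - dist(coords[i], coords[j])
        if c1 < best_cost:
            best_pos, best_cost = pos + 1, c1
    return best_pos, best_cost

coords = {0: (0, 0), 1: (10, 0), 2: (10, 10), 3: (5, 1)}
route = [0, 1, 2, 0]  # depot - customer 1 - customer 2 - depot
pos, cost = best_insertion(route, 3, coords)
print(pos)  # 1: inserting customer 3 between the depot and customer 1 is cheapest
```

In the full heuristic, a candidate position is additionally rejected if the insertion violates the hard time windows or the vehicle capacity.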
To better determine the best insertion place for the customer in terms of vehicle costs, Dullaert et al.’s [7] approach AROS (Adapted Realistic Opportunity Savings)
is used. The idea of Realistic Opportunity Savings (ROS) was suggested by Golden et al. [11]. The method takes into account the opportunity to fill the unused capacity of the bigger vehicle needed after inserting a new customer $u$. The new vehicle cost is taken into account only if the new total demand $q' = q + q_u$ in the route is greater than the capacity $Q$ of the current vehicle and requires at least a vehicle of capacity $Q'$. The difference $F(q') - F(q)$ shows how much greater the fixed cost $F(q')$ of the smallest vehicle needed to serve the new total demand $q'$ is than the fixed cost $F(q)$ of the current vehicle. The additional term $F'(Q' - q')$ is called the opportunity savings. Here, $Q' - q'$ is the unused capacity of the bigger vehicle, which gives an opportunity to add further customers with total demand not greater than $Q' - q'$; for such customers, we can avoid using any vehicle with capacity not greater than $Q' - q'$. The fixed cost of the largest such vehicle is $F'(Q' - q')$.
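Under our reading of the opportunity-savings idea, the vehicle-cost component of an insertion can be sketched as follows. The fleet below and all names are illustrative, not one of the benchmark fleets, and the exact way savings enter the AROS criterion in [7] may differ.

```python
# Sketch of the opportunity savings idea. F(q) is the fixed cost of the
# smallest vehicle able to carry demand q; F'(r) is the fixed cost of the
# largest vehicle with capacity at most r (0 if no such vehicle exists).
FLEET = [(100, 300), (200, 800), (300, 1350)]  # (capacity, fixed cost), sorted

def F(q):
    for cap, cost in FLEET:
        if cap >= q:
            return cost
    raise ValueError("demand exceeds largest vehicle")

def F_prime(r):
    best = 0
    for cap, cost in FLEET:
        if cap <= r:
            best = cost
    return best

def vehicle_cost_change(q, q_new):
    """Extra fixed cost of upgrading the vehicle from demand q to q_new,
    reduced by the opportunity savings of the freed capacity."""
    Q_new = next(cap for cap, _ in FLEET if cap >= q_new)
    return F(q_new) - F(q) - F_prime(Q_new - q_new)

# Route demand grows from 90 to 150: a 200-capacity vehicle is now needed,
# and its 50 spare units fit no vehicle in the fleet (savings 0), so the
# change is 800 - 300 - 0 = 500.
print(vehicle_cost_change(90, 150))  # 500
```

The point of the savings term is that spare capacity in the upgraded vehicle is worth something only if a vehicle of that size exists in the fleet and could otherwise have been hired.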
3.2 Solution Improvement

As tabu search implies, once the initial solution is known, we generate its neighbor solutions and choose one of them as the current solution. Swap and relocate operators are chosen as the neighborhood construction methods in the developed algorithm. Note that while generating neighbor solutions, it is allowed to violate hard time window and capacity constraints. This is done to diversify the search, because the way to the best solution often lies through infeasible ones. After the neighbor solutions are constructed, to avoid repeating solutions, a random solution among the best 3% of the generated solutions (those with the minimal objective value) is chosen as the current one. When a good feasible solution is found, changes within each route can sometimes further decrease the objective value. Hard time window constraints impose a certain sequence of customers in a route, so not all improvement methods are suitable for this operation; we consider only swaps of neighboring customers. For details, see Braysy and Gendreau [2]. Besides accounting for the soft time window violation penalty in the objective function, a soft time window optimization procedure is developed. It works as follows: for each route, starting from its last customer, the method tries to distribute the soft time window intervals with their smallest violations among all nodes in the route, using the latest preferred time where possible.
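A minimal sketch of the two inter-route operators; the list-of-routes representation and the function names are ours:

```python
def relocate(routes, r_from, i, r_to, j):
    """Move the customer at position i of route r_from to position j of route r_to."""
    new = [list(r) for r in routes]   # copy so the original solution is kept
    cust = new[r_from].pop(i)
    new[r_to].insert(j, cust)
    return new

def swap(routes, r1, i, r2, j):
    """Exchange the customers at position i of route r1 and position j of route r2."""
    new = [list(r) for r in routes]
    new[r1][i], new[r2][j] = new[r2][j], new[r1][i]
    return new

routes = [[0, 1, 2, 0], [0, 3, 4, 0]]  # 0 is the depot
print(relocate(routes, 0, 1, 1, 1))    # [[0, 2, 0], [0, 1, 3, 4, 0]]
print(swap(routes, 0, 2, 1, 1))        # [[0, 1, 3, 0], [0, 2, 4, 0]]
```

In the algorithm, each generated neighbor would then be evaluated with the penalized objective f(S) + β·w(S) + γ·q(S), since infeasible neighbors are allowed.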
4 Computational Experiments

The experiments are carried out on an Intel Xeon X5690 3.47 GHz with the following parameter values determined empirically: stopping criterion SC = 1500, tabu list size |TL| = 24, diversification parameter σ = 0.833, and the penalty for one time unit of soft time window violation α = 4. Since the main goal of this article is to develop an algorithm for the fleet size and mix vehicle routing problem with soft time windows, a problem for which no published results were found, the effectiveness of the proposed heuristic is tested on VRPTW problems. Our results are compared with the latest VRPTW results from the article of Jawarneh and Abdullah [15].
4.1 VRPTW Results Comparison

As we compare the performance of our algorithm with the one from the article of Jawarneh and Abdullah [15], the computation times for every instance are set to nearly
the same: from 1 to 10 min. This is done by setting an appropriate number of shakes (the NS parameter) in the main algorithm. Table 1 shows that in 34 of the 56 cases, our approach is better than the approach proposed by Jawarneh and Abdullah [15]. In eight cases, our algorithm shows the same performance, and in the remaining 14 cases, we have a slightly lower performance. We obtain the best results on random instances (R101–R211), where we find better solutions on 18 of the 23 instances, and our improvement reaches 10%. We have good results on random clustered instances (RC101–RC208), where our algorithm gives lower cost on 10 of the 16 instances, with a maximal improvement of 6.4%. Finally, on clustered instances (C101–C208), our best improvement is 4.4%; we are better on six of the 17 instances and obtain the same result on eight instances.
4.2 Results for FSMVRPSTW

As the fleet size and mix vehicle routing problem with hard and soft time windows has not been considered in the literature, we have generated instances for the computational experiments. As a base, we take the benchmarks of Liu and Shen [19] (Table 2) and add soft time windows with sizes of 10, 20, 30, and 40% of the hard time window size. The position of a soft time window inside the hard one is chosen uniformly at random. In each instance, there are several vehicle types with different capacities and fixed costs, denoted by letters A, B, C, D, E, and F. For each VRPTW instance, we generate three instances with large, medium, and small fixed vehicle costs, marked with letters a, b, and c, respectively. In Tables 3, 4, 5 and 6, the average results for each class of instances are presented. The generated instances are available at the following link: https://nnov.hse.ru/en/latna/benchmarks. The computational experiments show that the size of the soft time windows influences the value of the objective function: the larger the soft time window interval, the better the solution. However, the travel cost of the routes in a solution can be even smaller in the case of narrow soft time windows, because only short routes can satisfy such windows. At the same time, such a solution has more routes and a greater fixed cost of used vehicles.
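The soft-window generation step can be sketched as follows; the uniform placement inside the hard window is as described above, while the function name is ours:

```python
import random

def make_soft_window(e, l, fraction, rng=random):
    """Place a soft window of width fraction*(l - e) uniformly at random
    inside the hard window [e, l]."""
    width = fraction * (l - e)
    start = rng.uniform(e, l - width)
    return start, start + width

# A 30% soft window inside the hard window [100, 200] always has width 30
# and lies fully inside the hard window.
random.seed(1)
e_soft, l_soft = make_soft_window(100, 200, 0.30)
assert 100 <= e_soft <= l_soft <= 200
assert abs((l_soft - e_soft) - 30) < 1e-9
```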
Table 1  Comparison of the results for the VRPTW with Jawarneh and Abdullah [15]

Instance   Jawarneh and Abdullah   Proposed heuristic   Improvement, %
C101       828.94                  828.94                0.00
C102       828.94                  828.94                0.00
C103       835.71                  832.07                0.44
C104       885.06                  847.45                4.44
C105       828.94                  828.94                0.00
C106       828.94                  828.94                0.00
C107       828.94                  828.94                0.00
C108       831.73                  828.94                0.34
C109       840.66                  828.94                1.41
C201       591.56                  591.56                0.00
C202       591.56                  591.56                0.00
C203       593.21                  636.08               -6.74
C204       606.9                   648.07               -6.35
C205       588.88                  588.88                0.00
C206       588.88                  588.49                0.07
C207       590.59                  602.99               -2.06
C208       593.15                  588.49                0.79
R101       1643.18                 1626.42               1.03
R102       1476.11                 1452.20               1.65
R103       1245.86                 1232.13               1.11
R104       1026.91                 979.63                4.83
R105       1361.39                 1362.88              -0.11
R106       1264.5                  1246.08               1.48
R107       1108.11                 1075.48               3.03
R108       994.68                  955.35                4.12
R109       1168.91                 1178.57              -0.82
R110       1108.22                 1114.86              -0.60
R111       1080.84                 1081.08              -0.02
R112       992.22                  991.46                0.08
R201       1197.09                 1152.34               3.88
R202       1092.22                 1077.94               1.32
R203       983.06                  893.26               10.05
R204       845.3                   775.19                9.04
R205       999.54                  983.99                1.58
R206       955.94                  929.16                2.88
R207       903.59                  858.53                5.25
R208       769.96                  749.70                2.70
R209       935.57                  936.29               -0.08
R210       988.34                  965.07                2.41
R211       867.95                  808.00                7.42
RC101      1637.4                  1640.34              -0.18
RC102      1486.85                 1488.28              -0.10
RC103      1299.38                 1288.07               0.88
RC104      1200.6                  1174.37               2.23
RC105      1535.8                  1568.02              -2.05
RC106      1403.07                 1370.83               2.35
RC107      1230.32                 1248.13              -1.43
RC108      1165.17                 1155.29               0.86
RC201      1315.57                 1291.96               1.83
RC202      1169.72                 1154.40               1.33
RC203      1010.74                 984.07                2.71
RC204      890.28                  837.58                6.29
RC205      1221.28                 1209.98               0.93
RC206      1097.65                 1106.28              -0.78
RC207      1024.17                 1047.52              -2.23
RC208      864.56                  812.43                6.42
Table 2  Data set for the FSMVRPTW

Data set   Vehicle type   Capacity   Fixed cost a   Fixed cost b   Fixed cost c
C1         A              100        300            60             30
C1         B              200        800            160            80
C1         C              300        1350           270            135
C2         A              400        1000           200            100
C2         B              500        1400           280            140
C2         C              600        2000           400            200
C2         D              700        2700           540            270
R1         A              30         50             10             5
R1         B              50         80             16             8
R1         C              80         140            28             14
R1         D              120        250            50             25
R1         E              200        500            100            50
R2         A              300        450            90             45
R2         B              400        700            140            70
R2         C              600        1200           240            120
R2         D              1000       2500           500            250
RC1        A              40         60             12             6
RC1        B              80         150            30             15
RC1        C              150        300            60             30
RC1        D              200        450            90             45
RC2        A              100        150            30             15
RC2        B              200        350            70             35
RC2        C              300        550            110            55
RC2        D              400        800            160            80
RC2        E              500        1100           220            110
RC2        F              1000       2500           500            250
Table 3  Results of our approach for the FSMVRPSTW with 10% soft windows size

Instances   Travel cost   Vehicle cost   Penalty    Total cost
R1a         1650.29       3061.67        2587.00    7298.96
R1b         1487.75       680.00         2524.42    4692.17
R1c         1426.78       357.42         2566.67    4350.86
C1a         1204.69       7500.00        9063.56    17768.24
C1b         1191.56       1581.11        8432.78    11205.44
C1c         1166.14       808.89         8424.67    10399.70
RC1a        1768.90       3686.25        2517.25    7972.40
RC1b        1650.56       774.75         2330.88    4756.19
RC1c        1646.78       401.25         2382.50    4430.53
R2a         1323.69       2968.18        12243.00   16534.87
R2b         1315.86       560.00         12471.91   14347.77
R2c         1335.55       326.36         12452.00   14113.91
C2a         851.28        6212.50        25503.38   32567.15
C2b         900.74        1327.50        24791.88   27020.11
C2c         836.25        645.00         26021.50   27502.75
RC2a        1728.08       4037.50        10707.00   16472.58
RC2b        1586.48       895.00         10251.63   12733.10
RC2c        1546.64       484.38         10716.88   12747.89
Table 4  Results of our approach for the FSMVRPSTW with 20% soft windows size

Instances   Travel cost   Vehicle cost   Penalty    Total cost
R1a         1681.72       3104.17        2364.17    7150.05
R1b         1481.03       677.00         2270.75    4428.78
R1c         1424.44       356.33         2366.75    4147.53
C1a         1211.20       7522.22        7691.00    16424.42
C1b         1117.91       1566.67        7450.67    10135.24
C1c         1021.01       798.89         7514.78    9334.68
RC1a        1839.18       3663.75        2179.00    7681.93
RC1b        1652.99       780.00         2156.63    4589.61
RC1c        1612.03       404.25         2252.75    4269.03
R2a         1354.00       2877.27        10552.55   14783.82
R2b         1331.91       602.73         9844.00    11778.64
R2c         1290.53       317.73         10400.55   12008.80
C2a         939.58        6112.50        21657.38   28709.45
C2b         878.56        1250.00        23326.50   25455.06
C2c         842.30        646.25         22229.25   23717.80
RC2a        1662.14       4287.50        10084.25   16033.89
RC2b        1651.98       1041.25        9961.38    12654.60
RC2c        1584.89       471.88         9797.63    11854.39
Table 5  Results of our approach for the FSMVRPSTW with 30% soft windows size

Instances   Travel cost   Vehicle cost   Penalty    Total cost
R1a         1659.97       3075.83        1990.92    6726.72
R1b         1484.86       688.00         2078.00    4250.86
R1c         1432.79       359.50         2084.17    3876.46
C1a         1226.03       7577.78        6442.22    15246.03
C1b         1088.10       1516.67        6943.00    9547.77
C1c         997.82        778.89         6453.00    8229.71
RC1a        1827.35       3652.50        1937.63    7417.48
RC1b        1705.35       800.25         1736.00    4241.60
RC1c        1606.41       395.25         1876.63    3878.29
R2a         1373.71       2959.09        10392.55   14725.35
R2b         1404.22       736.36         9547.18    11687.76
R2c         1311.35       335.91         9917.09    11564.35
C2a         844.48        6112.50        20286.00   27242.98
C2b         821.24        1287.50        21192.75   23301.49
C2c         767.10        608.75         20877.00   22252.85
RC2a        1726.11       4118.75        8404.13    14248.99
RC2b        1695.56       970.00         8269.88    10935.44
RC2c        1673.14       543.13         8160.63    10376.89
Table 6  Results of our approach for the FSMVRPSTW with 40% soft windows size

Instances   Travel cost   Vehicle cost   Penalty    Total cost
R1a         1588.19       3131.67        1812.50    6532.36
R1b         1477.42       666.50         1838.83    3982.75
R1c         1429.51       344.08         1911.92    3685.51
C1a         1280.06       7577.78        5549.33    14407.17
C1b         1198.17       1601.11        5663.78    8463.06
C1c         1071.29       773.89         5453.78    7298.96
RC1a        1736.29       3757.50        1775.13    7268.91
RC1b        1650.96       777.75         1762.50    4191.21
RC1c        1627.43       403.88         1744.00    3775.30
R2a         1403.66       2890.91        9052.45    13347.03
R2b         1323.85       610.91         8211.27    10146.03
R2c         1375.59       359.55         8222.91    9958.05
C2a         926.33        6537.50        18681.50   26145.33
C2b         816.90        1222.50        19201.50   21240.90
C2c         807.35        586.25         17781.63   19175.23
RC2a        1631.05       4231.25        7804.13    13666.43
RC2b        1637.08       803.75         7501.88    9942.70
RC2c        1618.71       445.63         7459.75    9524.09
5 Conclusion

In this paper, a tabu search heuristic for the fleet size and mix vehicle routing problem with hard and soft time windows is suggested. The advantage of the developed algorithm is that it solves the vehicle routing problem with a heterogeneous fleet of vehicles combined with hard and soft time window constraints; to the best of our knowledge, there are no papers devoted to this problem statement. To confirm the efficiency of the developed approach to the FSMVRPSTW, a comparison with state-of-the-art VRPTW results is conducted. According to our results, the proposed algorithm performs better than the approach of Jawarneh and Abdullah [15] in 34 cases, and in eight cases it shows the same performance.

Acknowledgements The research was funded by the Russian Science Foundation (RSF Project No. 17-71-10107).
References

1. Andersson, H., Hoff, A., Christiansen, M., Hasle, G., Lokketangen, A.: Industrial aspects and literature survey: combined inventory management and routing. Comput. Oper. Res. 37(9), 1515–1536 (2010)
2. Braysy, O., Gendreau, M.: Vehicle routing problem with time windows, part I: route construction and local search algorithms. Transp. Sci. 39(1), 104–118 (2005)
3. Buhrkala, K., Larsena, A., Ropkea, S.: The waste collection vehicle routing problem with time windows in a city logistics context. Soc. Behav. Sci. 39, 241–254 (2012)
4. Burchett, D., Campion, E.: Mix fleet vehicle routing problem—an application of tabu search in the grocery delivery industry. Manag. Sci. Honor. Proj. (2002)
5. Cordeau, J.F., Laporte, G., Mercier, A.: A unified tabu search heuristic for vehicle routing problems with time windows. J. Oper. Res. Soc. 52, 928–936 (2001)
6. Czech, Z.J., Czarnas, P.: Parallel simulated annealing for the vehicle routing problem with time windows. In: 10th Euromicro Workshop on Parallel, Distributed and Network-Based Processing, Canary Islands, Spain, pp. 376–383 (2002)
7. Dullaert, W., Janssens, G., Sorensen, K., Vernimmen, B.: New heuristics for the fleet size and mix vehicle routing problem with time windows. J. Oper. Res. Soc. 53, 1232–1238 (2002)
8. Feillet, D., Garaix, T., Lehuede, F., Peton, O., Quadri, D.: A new consistent vehicle routing problem for the transportation of people with disabilities. Networks 63(3), 211–224 (2014)
9. Figliozzi, M.: An iterative route construction and improvement algorithm for the vehicle routing problem with soft time windows. Transp. Res. Part C 18, 668–679 (2010)
10. Gendreau, M., Tarantilis, C.: Solving large-scale vehicle routing problems with time windows: the state-of-the-art. Technical Report 2010-04, CIRRELT, Montreal, Canada (2010)
11. Golden, B., Assad, A., Levy, L., Gheysens, E.: The fleet size and mix vehicle routing problem. Comput. Oper. Res. 11, 49–66 (1984)
12. Hedar, A., Bakr, M.: Three strategies tabu search for vehicle routing problem with time windows. Comput. Sci. Inf. Technol. 2(2), 108–119 (2014)
13. Hwang, H.S.: An improved model for vehicle routing problem with time constraint based on genetic algorithm. Comput. Ind. Eng., pp. 1–9 (2002)
14. Iqbal, S., Rahman, M.: Vehicle routing problems with soft time windows. In: 7th International Conference on IEEE Electrical & Computer Engineering (ICECE), pp. 634–638. https://doi.org/10.1109/icece.2012.6471630
15. Jawarneh, S., Abdullah, S.: Sequential insertion heuristic with adaptive bee colony optimisation algorithm for vehicle routing problem with time windows. PLoS ONE 10, 1–23 (2015)
16. Koc, C., Bektas, T., Jabali, O., Laporte, G.: A hybrid evolutionary algorithm for heterogeneous fleet vehicle routing problems with time windows. Comput. Oper. Res. 64, 11–27 (2015)
17. Koc, C., Bektas, T., Jabali, O., Laporte, G.: A hybrid evolutionary algorithm for heterogeneous fleet vehicle routing problems with time windows. Comput. Oper. Res. 64, 11–27 (2015)
18. Laporte, G.: The vehicle routing problem: an overview of exact and approximate algorithms. Eur. J. Oper. Res. 59, 345–358 (1992)
19. Liu, F.-H., Shen, S.-Y.: The fleet size and mix vehicle routing problem with time windows. J. Oper. Res. Soc. 50(7), 721–732 (1999)
20. Minocha, B., Tripathi, S.: Solving school bus routing problem using hybrid genetic algorithm: a case study. Adv. Intell. Syst. Comput. 236, 93–103 (2014)
21. Moccia, L., Cordeau, J.F., Laporte, G.: An incremental tabu search heuristic for the generalized vehicle routing problem with time windows. Technical Report, https://www.cirrelt.ca/DocumentsTravail/CIRRELT-2010-12.pdf (2010)
22. Mouthuy, S., Massen, F., Deville, Y.: A multi-stage very large-scale neighborhood search for the vehicle routing problem with soft time-windows. Transp. Sci. 49(2), 223–238 (2015)
23. Mutingi, M., Mbohwa, C.: A group genetic algorithm for the fleet size and mix vehicle routing problem. In: IEEE Conference on Industrial Engineering and Management, Hong Kong (2012)
24. Nai-Wen, L., Chang-Shi, L.: A hybrid tabu search for the vehicle routing problem with soft time windows. In: Proceedings of the 2012 International Conference on Communication, Electronics and Automation Engineering, vol. 181, pp. 507–512 (2012)
25. Niroomand, I., Khataie, A.H., Galankashi, M.R.: Vehicle routing with time window for regional network services. In: IEEE International Conference on Industrial Engineering and Engineering Management (IEEM), pp. 903–907 (2014)
26. Rizzoli, A.E., Montemanni, R., Lucibello, E., Gambardella, L.M.: Ant colony optimization for real-world vehicle routing problems. Swarm Intell. 1(2), 135–151 (2007)
27. Rousseau, L.M., Gendreau, M., Pesant, G., Focacci, F.: Solving VRPTWs with constraint programming based column generation. Ann. Oper. Res. 130, 199–216 (2004)
28. Solomon, M.M.: Algorithms for the vehicle routing and scheduling problems with time window constraints. Oper. Res. 35(2), 254–265 (1987)
29. Subramanian, A., Penna, P., Uchoa, E., Ochi, L.: A hybrid algorithm for the fleet size and mix vehicle routing problem. Eur. J. Oper. Res. 221(2), 285–295 (2012)
30. Suthikarnnarunai, N.: A sweep algorithm for the mix fleet vehicle routing problem. Lect. Notes Eng. Comput. Sci. 2169(1), 1914–1919 (2008)
FPT Algorithms for the Shortest Lattice Vector and Integer Linear Programming Problems

D. V. Gribanov
Abstract In this paper, we present FPT algorithms for special cases of the shortest vector problem (SVP) and the integer linear programming problem (ILP), in which the matrices included in the problem formulations are near square. The main parameter is the maximal absolute value of the rank minors of the matrices included in the problem formulation. Additionally, we present FPT algorithms with respect to the same parameter for these problems when the matrices have no singular rank sub-matrices.

Keywords Integer programming · Shortest lattice vector problem · Matrix minors · FPT algorithm
1 Introduction

Let $A \in \mathbb{Z}^{d \times n}$ be an integral matrix. Its $(i,j)$-th element is denoted by $A_{ij}$, $A_{i*}$ is the $i$-th row of $A$, and $A_{*j}$ is the $j$-th column of $A$. The set of integers from a value $i$ to a value $j$ is denoted by $i{:}j = \{i, i+1, \dots, j\}$. Additionally, for subsets $I \subseteq \{1, \dots, d\}$ and $J \subseteq \{1, \dots, n\}$, $A_{IJ}$ denotes the sub-matrix of $A$ generated by all rows with indices in $I$ and all columns with indices in $J$. Sometimes we replace $I$ or $J$ by the symbol $*$, meaning that we take the set of all rows or all columns, respectively. Let $\operatorname{rank}(A)$ be the rank of the integral matrix $A$. The lattice spanned by the columns of $A$ is denoted $\Lambda(A) = \{At : t \in \mathbb{Z}^n\}$. Let $\|A\|_{\max}$ denote the maximal absolute value of the elements of $A$. We refer to [13, 27, 47] for mathematical introductions to lattices.
D. V. Gribanov (B)
Lobachevsky State University of Nizhny Novgorod, 23 Gagarina Avenue, Nizhny Novgorod 603950, Russian Federation
e-mail: [email protected]

D. V. Gribanov
National Research University Higher School of Economics, 136 Rodionova, Nizhny Novgorod 603093, Russian Federation

© Springer International Publishing AG, part of Springer Nature 2018
V. A. Kalyagin et al. (eds.), Computational Aspects and Applications in Large-Scale Networks, Springer Proceedings in Mathematics & Statistics 247, https://doi.org/10.1007/978-3-319-96247-4_2
An algorithm parameterized by a parameter $k$ is called fixed-parameter tractable (an FPT algorithm) if its complexity can be expressed by a function from the class $f(k)\, n^{O(1)}$, where $n$ is the input size and $f(k)$ is a function that depends only on $k$. A computational problem parameterized by a parameter $k$ is called fixed-parameter tractable (an FPT problem) if it can be solved by an FPT algorithm. For more information about parameterized complexity theory, see [15, 19].

Shortest Lattice Vector Problem. The shortest lattice vector problem (SVP) consists in finding $x \in \mathbb{Z}^n \setminus \{0\}$ minimizing $\|Hx\|$, where $H \in \mathbb{Q}^{d \times n}$ is given as input. The SVP is known to be NP-hard with respect to randomized reductions, see [1]. The first polynomial-time approximation algorithm for the SVP was proposed by Lenstra et al. in the paper [35]. Shortly afterward, Fincke and Pohst [20, 21] and Kannan [30, 31] described the first exact SVP solvers. Kannan's solver has complexity $2^{O(n \log n)} \operatorname{poly}(\operatorname{size} H)$. The first SVP solvers achieving complexity $2^{O(n)} \operatorname{poly}(\operatorname{size} H)$ were proposed by Ajtai et al. [2, 3] and by Micciancio and Voulgaris [43]. The solvers discussed so far are designed for the Euclidean norm $l_2$; recent results on SVP solvers for more general norms are presented in the papers [10, 16, 17]. The paper of Hanrot et al. [28] is a good survey of SVP solvers. Recently, a novel polynomial-time approximation SVP solver was proposed by Cheon and Lee in the paper [14]. The algorithm is parameterized by the lattice determinant, and its complexity and approximation factor are the best known for lattices with bounded determinant. In our work, we consider only integral lattices whose generating matrices are near square. The goal of Sect. 2 is the development of an exact FPT algorithm for the SVP parameterized by the lattice determinant. Additionally, in Sect. 3 we develop an FPT algorithm for lattices whose generating matrices have no singular sub-matrices.
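For intuition about the problem statement, here is a naive exact SVP solver for tiny instances. It is a brute-force enumeration, in no way resembling the solvers cited above, and the enumeration bound is a simplifying assumption of ours.

```python
import itertools
import math

def shortest_vector(H, bound=3):
    """Enumerate integer coefficient vectors t with |t_i| <= bound and
    return a nonzero lattice vector H t of minimal Euclidean norm.
    Only meant for tiny matrices; bound must be large enough that the
    search box contains a shortest vector."""
    n = len(H[0])
    best, best_norm = None, math.inf
    for t in itertools.product(range(-bound, bound + 1), repeat=n):
        if all(c == 0 for c in t):
            continue  # the zero vector is excluded by definition
        v = [sum(H[i][j] * t[j] for j in range(n)) for i in range(len(H))]
        norm = math.sqrt(sum(x * x for x in v))
        if norm < best_norm:
            best, best_norm = v, norm
    return best, best_norm

# Lattice spanned by the columns (2, 0) and (1, 2): a shortest nonzero
# vector is (±2, 0), of norm 2.
print(shortest_vector([[2, 1], [0, 2]])[1])  # 2.0
```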
The proposed algorithms work for the $l_p$ norm for any finite $p \ge 1$.

Integer Linear Programming Problem. The integer linear programming problem (ILPP) can be formulated as $\min\{c^\top x : Hx \le b,\ x \in \mathbb{Z}^n\}$ for integral vectors $c$, $b$ and an integral matrix $H$. There are several polynomial-time algorithms for solving linear programs; we mention Khachiyan's algorithm [33], Karmarkar's algorithm [32], and Nesterov's algorithm [44, 46]. Unfortunately, it is well known that the ILPP is NP-hard. Therefore, it is interesting to reveal polynomially solvable cases of the ILPP. Recall that an integer matrix is called totally unimodular if every minor of it equals $0$, $+1$, or $-1$. It is well known that all optimal solutions of any linear program with a totally unimodular constraint matrix are integer. Hence, for any primal linear program and the corresponding integer linear program with a totally unimodular constraint matrix, the sets of optimal solutions coincide. Therefore, any polynomial-time linear optimization algorithm (such as the algorithms in [32, 33, 44, 46]) is also an efficient algorithm for the ILPP. The next natural step is to consider the bimodular case, i.e., the ILPP with constraint matrices whose rank minors all have absolute values in the set $\{0, 1, 2\}$. The first paper that establishes fundamental properties of the bimodular ILPP is the
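The total unimodularity property recalled above can be checked by brute force on small matrices. This is an exponential-time illustration of the definition only; the function names are ours.

```python
from itertools import combinations

def det(M):
    """Determinant by Laplace expansion along the first row
    (fine for tiny matrices)."""
    n = len(M)
    if n == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(n))

def is_totally_unimodular(A):
    """True iff every square minor of A lies in {-1, 0, 1}."""
    m, n = len(A), len(A[0])
    for k in range(1, min(m, n) + 1):
        for rows in combinations(range(m), k):
            for cols in combinations(range(n), k):
                sub = [[A[i][j] for j in cols] for i in rows]
                if det(sub) not in (-1, 0, 1):
                    return False
    return True

# An interval matrix (consecutive ones in each row) is totally unimodular:
print(is_totally_unimodular([[1, 1, 0], [0, 1, 1], [0, 0, 1]]))  # True
print(is_totally_unimodular([[1, 1], [-1, 1]]))                   # False (det = 2)
```

The second matrix above is exactly the kind of matrix the bimodular case admits: its only rank minor of absolute value greater than 1 equals 2.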
FPT Algorithms for the Shortest Lattice Vector …
paper of Veselov and Chirkov [53]. Very recently, using results of [53], strongly polynomial-time solvability of the bimodular ILPP was proved by Artmann et al. in [9]. More generally, it is interesting to investigate the complexity of problems with constraint matrices having bounded minors. The maximum absolute value of the rank minors of an integer matrix can be interpreted as a proximity measure to the class of unimodular matrices. Let the symbol ILPP$_\Delta$ denote the ILPP with a constraint matrix each rank minor of which has absolute value at most $\Delta$. A conjecture arises that for each fixed natural number $\Delta$ the ILPP$_\Delta$ can be solved in polynomial time [48]. There are variants of this conjecture, where the augmented matrices $\binom{c^\top}{A}$ and $(A\ b)$ are considered [7, 48]. Unfortunately, not much is known about the complexity of the ILPP$_\Delta$; for example, even the complexity status of the ILPP$_3$ is unknown. The next step toward clarifying the complexity was made by S. Artmann, F. Eisenbrand, C. Glanzer, O. Timm, S. Vempala, and R. Weismantel in [8]. Namely, it has been shown that if the constraint matrix additionally has no singular rank sub-matrices, then the ILPP$_\Delta$ can be solved in polynomial time. Some results about polynomial-time solvability of the Boolean ILPP$_\Delta$ were obtained in [7, 11, 24]. Additionally, the class ILPP$_\Delta$ has a set of interesting properties. In [23, 26], it has been shown that any lattice-free polyhedron of the ILPP$_\Delta$ has relatively small width, i.e., the width is bounded by a function that is linear in the dimension and exponential in $\Delta$. Interestingly, due to [26], for this case the width of an empty lattice simplex can be estimated by $\Delta$. In [25], it has been shown that the width of any simplex induced by a system with bounded minors can be computed by a polynomial-time algorithm.
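To make the parameter concrete: for a small integer matrix, $\Delta$ can be computed directly by enumerating all $n \times n$ (rank-order) minors. The sketch below is illustrative only and exponential in general; the function names are ours, not from the paper:

```python
from itertools import combinations

def det_int(M):
    # exact integer determinant by Laplace expansion (fine for tiny matrices)
    n = len(M)
    if n == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det_int([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(n))

def delta(H):
    # maximal absolute value of the n x n minors of a d x n integer matrix H, d >= n
    d, n = len(H), len(H[0])
    return max(abs(det_int([H[i] for i in rows])) for rows in combinations(range(d), n))
```

For example, `delta([[1, 0], [0, 1], [1, 2]])` returns 2, so that matrix would fall into the class ILPP$_2$.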
An additional result of [25] states that any simple cone can be represented as a union of $n^{2\log\Delta}$ unimodular cones, where $\Delta$ is the parameter bounding the minors of the cone constraint matrix. As mentioned in [9], due to Tardos's results [52], linear programs with constraint matrices whose minors are bounded by a constant $\Delta$ can be solved in strongly polynomial time. Bonifas et al. [12] showed that polyhedra defined by a totally $\Delta$-modular constraint matrix have small diameter, i.e., the diameter is bounded by a polynomial in $\Delta$ and the number of variables. Very recently, Eisenbrand and Vempala [18] presented a randomized simplex-type linear programming algorithm whose running time is strongly polynomial whenever all minors of the constraint matrix are bounded by a constant. The second goal of our paper (Sect. 4) is to improve the results of [8]. Namely, we present an FPT algorithm for the ILPP$_\Delta$ with the additional property that the problem's constraint matrix has no singular rank sub-matrices. Additionally, we improve some inequalities established in [8]. The authors consider this paper as part of the general aim to find critical values of parameters at which a given problem changes its complexity. For example, the integer linear programming problem is polynomial-time solvable on polyhedra with all-integer vertices, due to [33]. On the other hand, it is NP-complete in the class of polyhedra whose extreme points have denominators equal to 1 or 2, due to [45].
D. V. Gribanov
The famous $k$-satisfiability problem is polynomial for $k \le 2$, but NP-complete for all $k > 2$. A theory of when an NP-complete graph problem becomes easier is developed for families of hereditary graph classes in [4–6, 34, 36–42].
2 FPT Algorithm for the SVP

Let $H \in \mathbb{Z}^{d \times n}$. The SVP related to the $l_p$ norm can be formulated as follows:

$$\min_{x \in \Lambda(H) \setminus \{0\}} \|x\|_p, \tag{1}$$

or equivalently

$$\min_{x \in \mathbb{Z}^n \setminus \{0\}} \|Hx\|_p.$$
Without loss of generality, we can assume that the following properties hold: (1) the matrix $H$ is already reduced to the Hermite normal form (HNF) [49, 51, 54], (2) the matrix $H$ has full rank and $d \ge n$, and (3) using additional permutations of rows and columns, the HNF of the matrix $H$ can be brought to the following form:

$$H = \begin{pmatrix}
1 & 0 & \dots & 0 & 0 & 0 & \dots & 0 \\
0 & 1 & \dots & 0 & 0 & 0 & \dots & 0 \\
\dots & \dots & \dots & \dots & \dots & \dots & \dots & \dots \\
0 & 0 & \dots & 1 & 0 & 0 & \dots & 0 \\
a_{11} & a_{12} & \dots & a_{1k} & b_{11} & 0 & \dots & 0 \\
a_{21} & a_{22} & \dots & a_{2k} & b_{21} & b_{22} & \dots & 0 \\
\dots & \dots & \dots & \dots & \dots & \dots & \dots & \dots \\
a_{s1} & a_{s2} & \dots & a_{sk} & b_{s1} & b_{s2} & \dots & b_{ss} \\
\bar a_{11} & \bar a_{12} & \dots & \bar a_{1k} & \bar b_{11} & \bar b_{12} & \dots & \bar b_{1s} \\
\dots & \dots & \dots & \dots & \dots & \dots & \dots & \dots \\
\bar a_{m1} & \bar a_{m2} & \dots & \bar a_{mk} & \bar b_{m1} & \bar b_{m2} & \dots & \bar b_{ms}
\end{pmatrix}, \tag{2}$$
where $k + s = n$ and $k + s + m = d$. Let $\Delta$ be the maximal absolute value of the $n \times n$ minors of $H$ and let $\delta = |\det(H_{1:n\,*})|$; let also $A \in \mathbb{Z}^{s \times k}$, $B \in \mathbb{Z}^{s \times s}$, $\bar A \in \mathbb{Z}^{m \times k}$, and $\bar B \in \mathbb{Z}^{m \times s}$ be the matrices defined by the elements $\{a_{ij}\}$, $\{b_{ij}\}$, $\{\bar a_{ij}\}$, and $\{\bar b_{ij}\}$, respectively. Hence, $B$ is lower triangular. The following properties are standard for the HNF of any matrix: (1) $0 \le a_{ij} \le b_{ii}$ for any $i \in 1:s$ and $j \in 1:k$, (2) $0 \le b_{ij} \le b_{ii}$ for any $i \in 1:s$ and $j \in 1:i$, and (3) $\Delta \ge \delta = \prod_{i=1}^{s} b_{ii}$, and hence $s \le \log_2 \Delta$.

In the paper [8], it was shown that $\|(\bar A\ \bar B)\|_{\max} \le B_q$, where $q = \log_2 \Delta$ and the sequence $\{B_i\}$ is defined for $i \in 0:q$ as follows:
$$B_0 = \Delta, \qquad B_i = \Delta + \sum_{j=0}^{i-1} B_j\, \Delta^{\log_2 \Delta} (\log_2 \Delta)^{(\log_2 \Delta)/2}.$$
It is easy to see that $B_q = \Delta\bigl(\Delta^{\log_2 \Delta} (\log_2 \Delta)^{(\log_2 \Delta)/2} + 1\bigr)^{\log_2 \Delta}$.

We will show that the estimate on $\|(\bar A\ \bar B)\|_{\max}$ can be significantly improved by a slightly more accurate version of the analysis in [8].

Lemma 1 Let $j \in 1:m$; then the following inequalities are true:

$$\bar b_{ji} \le \frac{\Delta}{2}(3^{s-i} + 1) \ \text{ for } i \in 1:s, \quad \text{and} \quad \bar a_{ji} \le \frac{\Delta}{2}(3^{s} + 1) \ \text{ for } i \in 1:k.$$

Hence, $\|(\bar A\ \bar B)\|_{\max} \le \frac{\Delta}{2}(\Delta^{\log_2 3} + 1) < \Delta^{1 + \log_2 3}$.
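The size of the improvement is easy to check numerically: the sketch below evaluates the old bound $B_q$ of [8] against the new bound $\Delta^{1+\log_2 3}$ for a small determinant (function names are ours):

```python
from math import log2

def old_bound(delta):
    # recurrence of [8]: B_0 = Delta, B_i = Delta + c * sum_{j<i} B_j,
    # where c = Delta^{log2 Delta} * (log2 Delta)^{(log2 Delta)/2}
    q = int(log2(delta))
    c = delta ** log2(delta) * log2(delta) ** (log2(delta) / 2)
    B = [float(delta)]
    for _ in range(q):
        B.append(delta + c * sum(B))
    return B[q]

def new_bound(delta):
    # the estimate of Lemma 1
    return delta ** (1 + log2(3))
```

For $\Delta = 4$ the old bound equals $4 \cdot 33^2 = 4356$ (matching the closed form above), while the new bound is below 37.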
Proof The main idea and the skeleton of the proof are the same as in [8]. Assume that $H$ has the form (2). Let $c$ be any row of $\bar A$, and let $w$ be the row of $\bar B$ with the same row index as $c$. Let $H_i$ denote the square sub-matrix of $H$ that consists of the first $n$ rows of $H$, except that the $i$-th row is replaced by the row $(c\ w)$. Let also $b_i$ denote $b_{ii}$. Since $|\det(H_n)| = b_1 \dots b_{s-1} |w_s|$, it follows that $|w_s| \le \Delta$. Similarly to the reasoning of [8], we consider two cases.

Case 1: $i > k$. We can express $\det(H_i)$ as follows:

$$|\det(H_i)| = |b_1| \dots |b_{r-1}| \cdot \Biggl| \det \underbrace{\begin{pmatrix}
w_r & \dots & \dots & \dots & w_s \\
* & b_{r+1} & & & \\
* & * & \ddots & & \\
\dots & \dots & \dots & b_{s-1} & \\
\dots & \dots & \dots & \dots & b_s
\end{pmatrix}}_{:=\,\bar H} \Biggr|,$$

where $r = i - k$. Let $\bar H^j$ be the sub-matrix of $\bar H$ obtained by deleting the first row and the column indexed by $j$. Then

$$\Delta \ge |\det \bar H| = \Biggl| w_r \det \bar H^1 + \sum_{j=2}^{s-r+1} (-1)^{j+1} w_{r+j-1} \det \bar H^j \Biggr| \ge |w_r \det \bar H^1| - \Biggl| \sum_{j=2}^{s-r+1} (-1)^{j+1} w_{r+j-1} \det \bar H^j \Biggr|,$$

and thus

$$|w_r| \le \frac{1}{|\det \bar H^1|} \Biggl( \Delta + \sum_{j=2}^{s-r+1} |w_{r+j-1}| \, |\det \bar H^j| \Biggr).$$
Let $\bar\delta = |\det \bar H^1| = b_{r+1} \dots b_s$. We note that for any $2 \le j \le s - r + 1$ the matrix $\bar H^j$ is a lower triangular matrix with an additional over-diagonal. The over-diagonal is a vector consisting of at most $j - 2$ leading nonzeros, the remaining elements being zeros. The following example shows the structure of the matrix $\bar H^5$, with three additional nonzero over-diagonal elements above the (lower triangular) diagonal:

$$\begin{pmatrix}
* & * & & & & & \\
* & * & * & & & & \\
* & * & * & * & & & \\
* & * & * & * & & & \\
* & * & * & * & * & & \\
\dots & \dots & \dots & \dots & \dots & \ddots & \\
* & \dots & \dots & \dots & \dots & \dots & *
\end{pmatrix}.$$

It is easy to see that $|\det \bar H^j| \le 2^{j-2} \bar\delta$ for $2 \le j \le s - r + 1$. Hence, the recurrence for $w_r$ takes the following form:

$$|w_r| \le \Delta + \sum_{j=2}^{s-r+1} 2^{j-2} |w_{r+j-1}| = \Delta + \sum_{j=0}^{s-r-1} 2^{j} |w_{r+j+1}|.$$
Case 2: $i \le k$. Similarly to the previous case, we can express $|\det H_i|$ as

$$|\det H_i| = \Biggl| \det \underbrace{\begin{pmatrix}
c_i & w_1 & \dots & \dots & w_s \\
* & b_1 & & & \\
* & * & \ddots & & \\
\dots & \dots & \dots & b_{s-1} & \\
\dots & \dots & \dots & \dots & b_s
\end{pmatrix}}_{:=\,\bar H} \Biggr|.$$

Let again $\bar H^j$ be the sub-matrix of the matrix $\bar H$ obtained by deleting the first row and the column indexed by $j$; then
$$\Delta \ge |\det \bar H| = \Biggl| c_i \delta + \sum_{j=2}^{s+1} (-1)^{j+1} w_{j-1} \det \bar H^j \Biggr| \ge |c_i \delta| - \Biggl| \sum_{j=2}^{s+1} (-1)^{j+1} w_{j-1} \det \bar H^j \Biggr|,$$

and thus

$$|c_i| \le \frac{1}{\delta} \Biggl( \Delta + \sum_{j=2}^{s+1} |w_{j-1}| \, |\det \bar H^j| \Biggr).$$

As in Case 1, we have the inequality $|\det \bar H^j| \le 2^{j-2} \delta$ for $2 \le j \le s + 1$. Hence, we obtain

$$|c_i| \le \Delta + \sum_{j=2}^{s+1} 2^{j-2} |w_{j-1}| = \Delta + \sum_{j=0}^{s-1} 2^{j} |w_{j+1}|.$$
Let $\{B_i\}_{i=0}^{s}$ be the sequence defined as follows:

$$B_0 = \Delta, \qquad B_i = \Delta + \sum_{j=0}^{i-1} 2^{i-j-1} B_j.$$

Using the final inequality from Case 1, we have $|w_i| \le B_{s-i}$ for any $i \in 1:s$; and using the final inequality from Case 2, we have $|c_i| \le B_s$ for any $i \in 1:k$. For the sequence $\{B_i\}$, the following equalities hold:

$$B_i = \Delta + B_{i-1} + \sum_{j=0}^{i-2} 2^{i-j-1} B_j = \Delta + B_{i-1} + 2(B_{i-1} - \Delta) = 3 B_{i-1} - \Delta.$$

Finally,

$$B_i = 3^i B_0 - \Delta \sum_{j=0}^{i-1} 3^j = \Delta \Bigl( 3^i - \frac{3^i - 1}{2} \Bigr) = \frac{\Delta}{2} (3^i + 1),$$
and the lemma follows.

Theorem 1 If $n > \Delta^{1+m(1+\log_2 3)} + \log_2 \Delta$, then there exists a polynomial-time algorithm that solves problem (1) with bit-complexity $O(n \log n \, \log \Delta \, (m + \log \Delta))$.

Proof If $n > \Delta^{1+m(1+\log_2 3)} + \log_2 \Delta$, then $k > \Delta^{1+m(1+\log_2 3)}$. Consider the matrix $\bar H = \binom{A}{\bar A}$. By Lemma 1, there are strictly fewer than $\Delta^{1+m(1+\log_2 3)}$ possibilities to generate a column of $\bar H$, so if $k > \Delta^{1+m(1+\log_2 3)}$, then $\bar H$ has two equal columns. Hence, the lattice $\Lambda(H)$ contains a vector $v$ such that $\|v\|_p = \sqrt[p]{2}$ (and $\|v\|_\infty = 1$). We can find equal columns using any sorting algorithm with $O(n \log n)$ comparisons, where the bit-complexity of comparing two vectors is $O(\log \Delta \, (m + \log \Delta))$.

The lattice $\Lambda(H)$ contains a vector of norm 1 (for $p = \infty$) if and only if the matrix $\bar H$ contains a zero column. Indeed, let $\bar H$ have no zero columns and let $u$ be a vector of norm 1 in $\Lambda(H)$. Then $u \in H_{*\,i} + H_{*\,(k+1):n}\, t$ for some nonzero integral vector $t$ and $i \in 1:k$. Let $j \in 1:(n-k)$ be the first index such that $t_j \ne 0$. Since $H_{jj} > H_{ji}$, we have $u_j \ne 0$ and $\|u\|_p \ge \sqrt[p]{2}$; this is a contradiction.

In the case when $m = 0$ and $H$ is a square nonsingular matrix, we have the following trivial corollary.

Corollary 1 If $n \ge \Delta + \log_2 \Delta$, then there exists a polynomial-time algorithm to solve problem (1) with bit-complexity $O(n \log n \, \log^2 \Delta)$.

Let $x^*$ be an optimal vector of problem (1). Minkowski's classical theorem in the geometry of numbers states that
$$\|x^*\|_p \le 2 \left( \frac{\det \Lambda(H)}{\operatorname{Vol}(B_p)} \right)^{1/n},$$

where $B_p$ is the unit ball of the $l_p$ norm. Using the inequalities $\det \Lambda(H) = \sqrt{\det(H^\top H)} \le \Delta \sqrt{\binom{d}{n}} \le \Delta \left(\frac{ed}{n}\right)^{n/2}$, we can conclude that

$$\|x^*\|_p \le 2 \sqrt{\frac{ed}{n}} \, \sqrt[n]{\frac{\Delta}{\operatorname{Vol}(B_p)}}.$$

On the other hand, by Lemma 1, the last column of $H$ has norm at most $\Delta \sqrt[p]{m+1}$. Let

$$M = \min\left\{ \Delta \sqrt[p]{m+1}, \; 2 \sqrt{\frac{ed}{n}} \, \sqrt[n]{\frac{\Delta}{\operatorname{Vol}(B_p)}} \right\} \tag{3}$$

be the minimum of these two estimates on the norm of a shortest vector.

Lemma 2 Let $x^* = \binom{\alpha}{\beta}$ be an optimal solution of (1); then:
(1) $\|\alpha\|_1 \le M^p$,
(2) $|\beta_i| \le 2^{i-1}(M^p + M/2)$ for any $i \in 1:s$, and
(3) $\|\beta\|_1 \le 2^s (M^p + M/2) \le \Delta (M^p + M/2) < 2\Delta M^p$ and $\|x^*\|_1 \le (1 + \Delta) M^p + \frac{\Delta M}{2} < 2(1+\Delta) M^p$.

Proof The statement (1) is trivial.
For $\beta_1$, we have

$$\Bigl| b_{11} \beta_1 + \sum_{i=1}^{k} a_{1i} \alpha_i \Bigr| \le M,$$

$$-\sum_{i=1}^{k} a_{1i} \alpha_i - M \le b_{11} \beta_1 \le -\sum_{i=1}^{k} a_{1i} \alpha_i + M,$$

$$-M^p - M/2 \le \frac{1}{b_{11}} \Bigl( -\sum_{i=1}^{k} a_{1i} \alpha_i - M \Bigr) \le \beta_1 \le \frac{1}{b_{11}} \Bigl( -\sum_{i=1}^{k} a_{1i} \alpha_i + M \Bigr) \le M^p + M/2.$$
For $\beta_j$, we have

$$\Bigl| \sum_{i=1}^{j} b_{ji} \beta_i + \sum_{i=1}^{k} a_{ji} \alpha_i \Bigr| \le M,$$

$$\beta_j \le \frac{1}{b_{jj}} \Bigl( \sum_{i=1}^{j-1} b_{ji} |\beta_i| + \sum_{i=1}^{k} a_{ji} |\alpha_i| + M \Bigr) \le \sum_{i=1}^{j-1} |\beta_i| + M^p + M/2,$$

$$|\beta_j| \le 2^{j-1} (M^p + M/2).$$

The statement (3) follows from the statement (2).

Let $\operatorname{Prob}(l, v, u, C)$ denote the following problem:

$$\sum_{i=1}^{l} |\alpha_i|^p + \sum_{j=1}^{s} \Bigl| \sum_{i=1}^{l} a_{ji} \alpha_i + v_j \Bigr|^p + \sum_{j=1}^{m} \Bigl| \sum_{i=1}^{l} \bar a_{ji} \alpha_i + u_j \Bigr|^p \to \min$$
$$\alpha \in \mathbb{Z}^l \setminus \{0\}, \quad \|\alpha\|_1 \le C, \tag{4}$$

where $1 \le l \le k$, $1 \le C \le M^p$, $v \in \mathbb{Z}^s$, $u \in \mathbb{Z}^m$, $\|v\|_\infty \le 2\Delta(1+\Delta)M^p$, and $\|u\|_\infty \le 2\Delta^{1+\log_2 3}(1+\Delta)M^p$.

Let $\sigma(l, v, u, C)$ denote the optimal value of the objective function of $\operatorname{Prob}(l, v, u, C)$. Then we trivially have

$$\sigma(1, v, u, C) = \min\Bigl\{ |z|^p + \sum_{i=1}^{s} |a_{i1} z + v_i|^p + \sum_{i=1}^{m} |\bar a_{i1} z + u_i|^p : z \in \mathbb{Z},\ |z| \le C \Bigr\}. \tag{5}$$

The following formula gives the relation between $\sigma(l, v, u, C)$ and $\sigma(l-1, v, u, C)$; its correctness can be checked directly:
$$\sigma(l, v, u, C) = \min\{ f(\bar v, \bar u, z) : z \in \mathbb{Z},\ |z| \le C,\ \bar v_i = v_i + a_{il} z,\ \bar u_i = u_i + \bar a_{il} z \}, \tag{6}$$

where

$$f(v, u, z) = \begin{cases} \sigma(l-1, v, u, C), & \text{for } z = 0 \\ |z|^p + \min\{\sigma(l-1, v, u, C - |z|),\ \|v\|_p^p + \|u\|_p^p\}, & \text{for } z \ne 0. \end{cases}$$

Let $\overline{\operatorname{Prob}}(l, v, u, C)$ denote the following problem:

$$\sum_{i=1}^{k} |\alpha_i|^p + \sum_{j=1}^{s} \Bigl| \sum_{i=1}^{k} a_{ji} \alpha_i + \sum_{i=1}^{\min\{j,l\}} b_{ji} \beta_i + v_j \Bigr|^p + \sum_{j=1}^{m} \Bigl| \sum_{i=1}^{k} \bar a_{ji} \alpha_i + \sum_{i=1}^{\min\{j,l\}} \bar b_{ji} \beta_i + u_j \Bigr|^p \to \min$$
$$\alpha \in \mathbb{Z}^k,\ \beta \in \mathbb{Z}^l, \quad 1 \le \|\alpha\|_1 + \|\beta\|_1 \le C, \tag{7}$$

where $1 \le l \le s$, $1 \le C \le 2(\Delta + 1) M^p$, and the values of $v$, $u$ are the same as in (4). Let $\bar\sigma(l, v, u, C)$ denote the optimal value of the objective function of $\overline{\operatorname{Prob}}(l, v, u, C)$. Again, it is easy to see that

$$\bar\sigma(1, v, u, C) = \min\{ f(\bar v, \bar u, z) : z \in \mathbb{Z},\ |z| \le C,\ \bar v_i = v_i + b_{i1} z,\ \bar u_i = u_i + \bar b_{i1} z \}, \tag{8}$$

where

$$f(v, u, z) = \begin{cases} \sigma(k, v, u, \min\{C, M^p\}), & \text{for } z = 0 \\ \min\{\sigma(k, v, u, \min\{C - |z|, M^p\}),\ \|v\|_p^p + \|u\|_p^p\}, & \text{for } z \ne 0. \end{cases}$$

The following formula gives the relation between $\bar\sigma(l, v, u, C)$ and $\bar\sigma(l-1, v, u, C)$:

$$\bar\sigma(l, v, u, C) = \min\{ f(\bar v, \bar u, z) : z \in \mathbb{Z},\ |z| \le C,\ \bar v_i = v_i + b_{il} z,\ \bar u_i = u_i + \bar b_{il} z \}, \tag{9}$$

where

$$f(v, u, z) = \begin{cases} \bar\sigma(l-1, v, u, C), & \text{for } z = 0 \\ \min\{\bar\sigma(l-1, v, u, C - |z|),\ \|v\|_p^p + \|u\|_p^p\}, & \text{for } z \ne 0. \end{cases}$$
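The recursion (5)–(6) can be prototyped directly. The sketch below computes $\sigma(l, v, u, C)$ both by brute force over $\alpha$ and by the recursive reduction, for a toy instance; all names are ours, and no memoization or complexity tuning is attempted:

```python
import itertools

def objective(alpha, A, Abar, v, u, p):
    # sum |alpha_i|^p + sum_j |A_j . alpha + v_j|^p + sum_j |Abar_j . alpha + u_j|^p
    dot = lambda row: sum(row[i] * alpha[i] for i in range(len(alpha)))
    return (sum(abs(a) ** p for a in alpha)
            + sum(abs(dot(A[j]) + v[j]) ** p for j in range(len(v)))
            + sum(abs(dot(Abar[j]) + u[j]) ** p for j in range(len(u))))

def sigma_brute(l, v, u, C, A, Abar, p):
    # direct minimization over alpha in Z^l \ {0} with ||alpha||_1 <= C, as in (4)
    cand = (objective(a, A, Abar, v, u, p)
            for a in itertools.product(range(-C, C + 1), repeat=l)
            if any(a) and sum(map(abs, a)) <= C)
    return min(cand, default=float('inf'))

def sigma_rec(l, v, u, C, A, Abar, p):
    # the reduction (6) with base case (5); z plays the role of alpha_l
    if l == 1:
        return min((abs(z) ** p
                    + sum(abs(A[i][0] * z + v[i]) ** p for i in range(len(v)))
                    + sum(abs(Abar[i][0] * z + u[i]) ** p for i in range(len(u)))
                    for z in range(-C, C + 1) if z != 0), default=float('inf'))
    best = sigma_rec(l - 1, v, u, C, A, Abar, p)  # z = 0 branch of (6)
    for z in range(-C, C + 1):
        if z == 0:
            continue
        vb = tuple(v[i] + A[i][l - 1] * z for i in range(len(v)))
        ub = tuple(u[i] + Abar[i][l - 1] * z for i in range(len(u)))
        tail = sum(abs(t) ** p for t in vb) + sum(abs(t) ** p for t in ub)
        rest = tail if C - abs(z) < 1 else min(sigma_rec(l - 1, vb, ub, C - abs(z), A, Abar, p), tail)
        best = min(best, abs(z) ** p + rest)
    return best
```

On small instances the two agree; memoizing `sigma_rec` on $(l, v, u, C)$ turns it into the table-driven dynamic program used in Theorem 2.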
Theorem 2 There is an algorithm solving problem (1) that is polynomial in $n$, $\operatorname{size} H$, and $\Delta$. Its bit-complexity equals

$$O\bigl(n\, d\, M^{p(2+m+\log_2 \Delta)}\, \Delta^{3+4m+2\log_2 \Delta}\, \operatorname{Mult}(\log \Delta)\bigr),$$

where $\operatorname{Mult}(k)$ is the complexity of multiplying two $k$-bit integers. Since $M \le \Delta \sqrt[p]{m+1}$ (see (3)), problem (1), parameterized by the parameter $\Delta$, belongs to the FPT complexity class for fixed $m$ and $p$.

Proof By Lemma 2, the optimal value of the objective function equals $\bar\sigma(s, 0, 0, 2(1+\Delta)M^p)$. Using the recursive formula (9), we can reduce instances of the type $\bar\sigma(s, \cdot, \cdot, \cdot)$ to instances of the type $\bar\sigma(1, \cdot, \cdot, \cdot)$. Using formula (8), we can reduce instances of the type $\bar\sigma(1, \cdot, \cdot, \cdot)$ to instances of the type $\sigma(k, \cdot, \cdot, \cdot)$. Using formula (6), we can reduce instances of the type $\sigma(k, \cdot, \cdot, \cdot)$ to instances of the type $\sigma(1, \cdot, \cdot, \cdot)$. Finally, an instance of the type $\sigma(1, \cdot, \cdot, \cdot)$ can be computed using formula (5).

The bit-complexity of computing an instance $\sigma(1, v, u, C)$ is $O(C\, d\, \operatorname{Mult}(\log \Delta))$. The vector $v$ can be chosen in $(2\Delta(1+\Delta)M^p)^s$ ways, and the vector $u$ in $(2\Delta^{1+\log_2 3}(1+\Delta)M^p)^m$ ways; hence, the complexity of computing all instances of the type $\sigma(1, \cdot, \cdot, \cdot)$ is roughly $O(d\, M^{p(2+m+\log_2 \Delta)}\, \Delta^{1+4m+2\log_2 \Delta}\, \operatorname{Mult}(\log \Delta))$. The reduction of $\sigma(l, \cdot, \cdot, \cdot)$ to $\sigma(l-1, \cdot, \cdot, \cdot)$ (the same holds for $\bar\sigma$) consists of $O(C)$ minimum computations and $O(dC)$ multiplications of integers of size $O(\log \Delta)$. So, the bit-complexity for all instances of the type $\sigma(l, \cdot, \cdot, \cdot)$, $1 \le l \le k$, can be roughly estimated as $O(k\, d\, M^{p(2+m+\log_2 \Delta)}\, \Delta^{1+4m+2\log_2 \Delta}\, \operatorname{Mult}(\log \Delta))$, and the bit-complexity for all instances of the type $\bar\sigma(l, \cdot, \cdot, \cdot)$, $1 \le l \le s \le \log_2 \Delta$, as $O(\log_2 \Delta\, d\, M^{p(2+m+\log_2 \Delta)}\, \Delta^{3+4m+2\log_2 \Delta}\, \operatorname{Mult}(\log \Delta))$. Finally, the algorithm complexity can be roughly estimated as

$$O\bigl(n\, d\, M^{p(2+m+\log_2 \Delta)}\, \Delta^{3+4m+2\log_2 \Delta}\, \operatorname{Mult}(\log \Delta)\bigr).$$
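Before moving on, note that the polynomial-time case handled by Theorem 1 rests on a very simple procedure: sort the columns of $\binom{A}{\bar A}$ and look for two equal ones, since $He_i - He_j$ is then a lattice vector of infinity-norm 1. A sketch (names are ours):

```python
def short_vector_from_equal_columns(Hbar):
    # Hbar: the rows of (A; Abar) restricted to the first k columns.
    # Two equal columns i, j give x = e_i - e_j, and H x has infinity-norm 1
    # (the identity block of (2) contributes the entries +1 and -1).
    k = len(Hbar[0])
    order = sorted(range(k), key=lambda j: tuple(row[j] for row in Hbar))
    for a, b in zip(order, order[1:]):
        if all(row[a] == row[b] for row in Hbar):
            x = [0] * k
            x[a], x[b] = 1, -1
            return x
    return None  # no equal columns; Theorem 1 guarantees one only when k is large enough
```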
3 The SVP for a Special Class of Lattices

In this section, we consider the SVP (1) for a special class of lattices induced by integral matrices without singular rank sub-matrices. We inherit all notation and special symbols from the previous section. Suppose additionally that $H$ has no singular $n \times n$ sub-matrices. One of the results of [8] states that if $n \ge f(\Delta)$, then the matrix $H$ has at most $n + 1$ rows, where $f(\Delta)$ is a function that depends only on $\Delta$.
The paper [8] contains a super-polynomial estimate on the value of $f(\Delta)$. Here, we show the existence of a polynomial estimate.

Lemma 3 If $n > \Delta^{3+2\log_2 3} + \log_2 \Delta$, then $H$ has at most $n + 1$ rows.

Proof Our proof has the same structure and ideas as in [8]; we only make a small modification using Lemma 1. Let the matrix $H$ be as in (2). Recall that $H$ has no singular $n \times n$ sub-matrices. For the purpose of deriving a contradiction, assume that $n > \Delta^{3+2\log_2 3} + \log_2 \Delta$ and $H$ has precisely $n + 2$ rows. Let again, as in [8], $\bar H$ be the matrix $H$ without the rows indexed by $i$ and $j$, where $i, j \le k$ and $i \ne j$. Observe that

$$|\det \bar H| = \Biggl| \det \underbrace{\begin{pmatrix}
a_{1i} & a_{1j} & b_{11} & & \\
\vdots & \vdots & & \ddots & \\
a_{si} & a_{sj} & \dots & \dots & b_{ss} \\
\bar a_{1i} & \bar a_{1j} & \dots & \dots & \bar b_{1s} \\
\bar a_{2i} & \bar a_{2j} & \dots & \dots & \bar b_{2s}
\end{pmatrix}}_{:=\,\bar H^{ij}} \Biggr|.$$
The matrix $\bar H^{ij}$ is a nonsingular $(s+2) \times (s+2)$ matrix. This implies that the first two columns of $\bar H^{ij}$ must be different for any $i$ and $j$. By Lemma 1 and the structure of the HNF, there are at most $\Delta \cdot \Delta^{2(1+\log_2 3)}$ possibilities to choose the first column of $\bar H^{ij}$. Consequently, since $n > \Delta^{3+2\log_2 3} + \log_2 \Delta$, we have $k > \Delta^{3+2\log_2 3}$, and there must exist two indices $i \ne j$ such that $\det \bar H^{ij} = 0$. This is a contradiction.

Using Lemma 3 and Theorem 1 of the previous section, we can develop an FPT algorithm that solves the announced problem.

Theorem 3 Let $H$ be the matrix defined as in (2). Let also $H$ have no singular $n \times n$ sub-matrices and let $\Delta$ be the maximal absolute value of the $n \times n$ minors of $H$. If $n > \Delta^{3+2\log_2 3} + \log_2 \Delta$, then there is an algorithm with complexity $O(n \log n \, \log^2 \Delta)$ that solves problem (1).

Proof If $n > \Delta^{3+2\log_2 3} + \log_2 \Delta$, then, by Lemma 3, we have $m = 1$ or $m = 0$. In both cases, $n > \Delta^{3+2\log_2 3} + \log_2 \Delta > \Delta^{1+m(1+\log_2 3)} + \log_2 \Delta$. The last inequality meets the conditions of Theorem 1, and the theorem follows.
4 Integer Linear Programming Problem (ILPP)

Let $H \in \mathbb{Z}^{d \times n}$, $c \in \mathbb{Z}^n$, $b \in \mathbb{Z}^d$, $\operatorname{rank}(H) = n$, and let $\Delta$ be the maximal absolute value of the $n \times n$ minors of $H$. Suppose also that all $n \times n$ sub-matrices of $H$ are nonsingular.
Consider the ILPP:

$$\max\{c^\top x : Hx \le b,\ x \in \mathbb{Z}^n\}. \tag{10}$$
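As a naive baseline, (10) can be solved by searching the integer points in a box around the fractional optimum; the proximity bound used below (Tardos-type, $\|z - v\|_\infty \le n\Delta$) guarantees that such a box contains an optimal integer point. The sketch is exponential and purely illustrative, with names of our own choosing:

```python
import itertools

def ilp_box_search(H, b, c, center, radius):
    # max c.x over integer x with H x <= b and ||x - center||_inf <= radius;
    # by the proximity bound, radius = n * Delta around the LP optimum suffices
    n = len(c)
    lo = [int(center[i]) - radius for i in range(n)]
    best = None
    for off in itertools.product(range(2 * radius + 1), repeat=n):
        x = [lo[i] + off[i] for i in range(n)]
        if all(sum(H[r][i] * x[i] for i in range(n)) <= b[r] for r in range(len(H))):
            val = sum(c[i] * x[i] for i in range(n))
            if best is None or val > best[0]:
                best = (val, x)
    return best
```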
Theorem 4 Let $n > \Delta^{3+2\log_2 3} + \log_2 \Delta$; then problem (10) can be solved by an algorithm with complexity

$$O\bigl(\log \Delta \cdot n^4 \Delta^5 (n + \Delta) \cdot \operatorname{Mult}(\log \Delta + \log n + \log \|w\|_\infty)\bigr).$$

Proof By Lemma 3, for $n > \Delta^{3+2\log_2 3} + \log_2 \Delta$ the matrix $H$ can have at most $n + 1$ rows. Let $v$ be an optimal solution of the linear relaxation of problem (10). Let us also suppose that $\Delta = |\det(H_{1:n\,*})| > 0$ and $H_{1:n\,*}\, v = b_{1:n}$.

First of all, the matrix $H$ needs to be transformed to the HNF. Suppose that it has the same form as in (2). Let us split the vectors $x$ and $c$ such that $x = \binom{\alpha}{\beta}$ and $c = \binom{c_\alpha}{c_\beta}$. We note that the sub-matrix $(\bar A\ \bar B)$ of $H$ is actually a single row. Problem (10) takes the form:

$$c_\alpha^\top \alpha + c_\beta^\top \beta \to \max$$
$$\begin{cases} \alpha \le b_{1:k} \\ A\alpha + B\beta \le b_{k+1:k+s} \\ \bar A \alpha + \bar B \beta \le b_d \\ \alpha \in \mathbb{Z}^k,\ \beta \in \mathbb{Z}^s. \end{cases}$$
As in [8], the next step consists of the integral transformation $\alpha \to b_{1:k} - \alpha$ and of introducing slack variables $y \in \mathbb{Z}^s_{+}$ for the rows with numbers in the range $k+1 : k+s$. The problem becomes

$$-c_\alpha^\top b_{1:k} + c_\alpha^\top \alpha - c_\beta^\top \beta \to \min$$
$$\begin{cases} B\beta - A\alpha + y = \hat b \\ \bar B \beta - \bar A \alpha \le \hat b_d \\ \alpha \in \mathbb{Z}^k_{+},\ y \in \mathbb{Z}^s_{+},\ \beta \in \mathbb{Z}^s, \end{cases}$$

where $\hat b = b_{k+1:k+s} - A\, b_{1:k}$ and $\hat b_d = b_d - \bar A\, b_{1:k}$. We note that

$$\Bigl\| \binom{\alpha}{y} \Bigr\|_\infty \le n\Delta \tag{11}$$

due to the classical theorem of Tardos (see [49, 52]), which states that if $z$ is an optimal integral solution of (10), then $\|z - v\|_\infty \le n\Delta$. Since $H_{1:n\,*}\, v = b_{1:n}$ for the optimal solution $v$ of the relaxed linear problem, the variables $\alpha$ and $y$ must equal zero vectors when the $x$ variables equal $v$.
Now, using the formula $\beta = B^{-1}(\hat b + A\alpha - y)$, we can eliminate the $\beta$ variables from the last constraint and from the objective function:

$$-c_\alpha^\top b_{1:k} - c_\beta^\top B^{-1} \hat b + (c_\alpha^\top - c_\beta^\top B^{-1} A)\alpha + c_\beta^\top B^{-1} y \to \min$$
$$\begin{cases} B\beta - A\alpha + y = \hat b \\ (\bar B B^{*} A - \Delta \bar A)\alpha - \bar B B^{*} y \le \Delta \hat b_d - \bar B B^{*} \hat b \\ \alpha \in \mathbb{Z}^k_{+},\ y \in \mathbb{Z}^s_{+},\ \beta \in \mathbb{Z}^s, \end{cases}$$

where the last constraint was additionally multiplied by $\Delta$ to become integral, and $B^{*} = \Delta B^{-1}$ is the adjugate matrix of $B$. Finally, we transform the matrix $B$ into the Smith normal form (SNF) [49, 50, 54], $B = P^{-1} S Q^{-1}$, where $P^{-1}$, $Q^{-1}$ are unimodular matrices and $S$ is the SNF of $B$. After the transformation $\beta \to Q\beta$, the initial problem becomes equivalent to the following problem:

$$w^\top x \to \min$$
$$\begin{cases} Gx \equiv g \pmod S \\ h^\top x \le h_0 \\ x \in \mathbb{Z}^n_{+},\ \|x\|_\infty \le n\Delta, \end{cases} \tag{12}$$

where $w = (\Delta c_\alpha^\top - c_\beta^\top B^{*} A,\ c_\beta^\top B^{*})$, $G = (P\ \ {-PA}) \bmod S$, $g = P\hat b \bmod S$, $h = (\bar B B^{*} A - \Delta \bar A,\ -\bar B B^{*})$, and $h_0 = \Delta \hat b_d - \bar B B^{*} \hat b$. The inequalities $\|x\|_\infty \le n\Delta$ are an additional tool to localize an optimal integral solution; they follow from the argument behind Tardos's theorem (see (11)).

Trivially, $\|(G\ g)\|_{\max} \le \Delta$. Since $\|\bar A\|_{\max} \le \Delta^{1+\log_2 3}$, $\|A\|_{\max} \le \Delta$, and $\|\bar B B^{*}\|_{\max} \le \Delta^2$, we have $\|h\|_{\max} \le \Delta^2 (n + \Delta^{\log_2 3})$.

Actually, problem (12) is the classical Gomory group minimization problem [22] (see also [29]) with an additional linear constraint (the constraint $\|x\|_\infty \le n\Delta$ only helps to localize the minimum). As in [22], it can be solved using the dynamic programming approach. To do that, let us define the subproblems $\operatorname{Prob}(l, \gamma, \eta)$:

$$w_{1:l}^\top x \to \min$$
$$\begin{cases} G_{*\,1:l}\, x \equiv \gamma \pmod S \\ h_{1:l}^\top x \le \eta \\ x \in \mathbb{Z}^l_{+}, \end{cases}$$

where $l \in 1:n$, $\gamma \in \operatorname{int.hull}(G) \bmod S$, $\eta \in \mathbb{Z}$, and $|\eta| \le n^2 \Delta^3 (n + \Delta)$. Let $\sigma(l, \gamma, \eta)$ be the optimal value of the objective function of $\operatorname{Prob}(l, \gamma, \eta)$. When the problem $\operatorname{Prob}(l, \gamma, \eta)$ is infeasible, we put $\sigma(l, \gamma, \eta) = +\infty$. Trivially, the optimum of (10) is $\sigma(n, g, \min\{h_0, n^2 \Delta^3 (n + \Delta)\})$.
The following formula gives the relation between $\sigma(l, *, *)$ and $\sigma(l-1, *, *)$:

$$\sigma(l, \gamma, \eta) = \min\{ \sigma(l-1,\ \gamma - z G_{*\,l},\ \eta - z h_l) + z w_l : |z| \le n\Delta \}.$$

The value $\sigma(1, \gamma, \eta)$ can be computed using the following formula:

$$\sigma(1, \gamma, \eta) = \min\{ z w_1 : z G_{*\,1} \equiv \gamma \pmod S,\ z h_1 \le \eta,\ |z| \le n\Delta \}.$$

Both the computational complexity of $\sigma(1, \gamma, \eta)$ and the complexity of reducing $\sigma(l, \gamma, \eta)$ to $\sigma(l-1, \cdot, \cdot)$ for all $\gamma$ and $\eta$ can be roughly estimated as

$$O\bigl(\log \Delta \cdot n^3 \Delta^5 (n + \Delta) \cdot \operatorname{Mult}(\log \Delta + \log n + \log \|w\|_\infty)\bigr).$$

The final complexity result is obtained by multiplying the last estimate by $n$.
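A toy version of this dynamic program (a group minimization with one extra linear constraint, solved by scanning variables while carrying the residue vector and the used budget as state) can be prototyped as follows; the state space is kept naively, with no attempt at the bounds of Theorem 4, and all names are ours:

```python
from math import inf
from itertools import product

def group_minimize(w, G, S, g, h, h0, box):
    # min w.x s.t. G x = g (mod S, componentwise), h.x <= h0, 0 <= x_i <= box
    n, r = len(w), len(S)
    states = {(tuple([0] * r), 0): 0}  # (residue, h-budget used) -> min cost so far
    for l in range(n):
        new = {}
        for (gamma, hs), cost in states.items():
            for z in range(box + 1):
                key = (tuple((gamma[i] + G[i][l] * z) % S[i] for i in range(r)), hs + h[l] * z)
                val = cost + w[l] * z
                if val < new.get(key, inf):
                    new[key] = val
        states = new
    target = tuple(g[i] % S[i] for i in range(r))
    feas = [c for (gam, hs), c in states.items() if gam == target and hs <= h0]
    return min(feas) if feas else None

def group_minimize_brute(w, G, S, g, h, h0, box):
    # reference brute force for cross-checking the DP on tiny instances
    best = None
    for x in product(range(box + 1), repeat=len(w)):
        ok = all(sum(G[i][l] * x[l] for l in range(len(w))) % S[i] == g[i] % S[i]
                 for i in range(len(S)))
        if ok and sum(h[l] * x[l] for l in range(len(w))) <= h0:
            v = sum(w[l] * x[l] for l in range(len(w)))
            best = v if best is None else min(best, v)
    return best
```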
5 Conclusion

We have presented FPT algorithms for SVP instances parameterized by the lattice determinant, on lattices induced by nearly square matrices and on lattices induced by matrices with no singular sub-matrices. In the first case, the developed algorithm is applicable for the norm $l_p$ for any finite $p \ge 1$; in the second case, the algorithm is also applicable for the $l_\infty$ norm. Additionally, we have presented an FPT algorithm for ILPP instances whose constraint matrices have no singular sub-matrices. In the full version of the paper, we are going to extend the results related to the SVP to more general classes of norms. Next, we are going to extend the results related to the ILPP to nearly square constraint matrices. Finally, we will present an FPT algorithm for the simplex width computation problem.

Acknowledgements The research reported in this paper was conducted under the financial support of the Russian Science Foundation Grant No. 17-11-01336.
References

1. Ajtai, M.: Generating hard instances of lattice problems (extended abstract). In: Proceedings of the STOC, pp. 99–108 (1996)
2. Ajtai, M., Kumar, R., Sivakumar, D.: A sieve algorithm for the shortest lattice vector problem. In: Proceedings of the STOC, pp. 601–610 (2001)
3. Ajtai, M., Kumar, R., Sivakumar, D.: Sampling short lattice vectors and the closest lattice vector problem. In: Proceedings of the CCC, pp. 53–57 (2002)
4. Alekseev, V.E.: On easy and hard hereditary classes of graphs with respect to the independent set problem. Discret. Appl. Math. 132(1–3), 17–26 (2003)
5. Alekseev, V.E., Boliac, R., Korobitsyn, D.V., Lozin, V.V.: NP-hard graph problems and boundary classes of graphs. Theor. Comput. Sci. 389(1–2), 219–236 (2007)
6. Alekseev, V.E., Korobitsyn, D.V., Lozin, V.V.: Boundary classes of graphs for the dominating set problem. Discret. Math. 285(1–3), 1–6 (2004)
7. Alekseev, V.V., Zakharova, D.: Independent sets in the graphs with bounded minors of the extended incidence matrix. J. Appl. Ind. Math. 5, 14–18 (2011)
8. Artmann, S., Eisenbrand, F., Glanzer, C., Timm, O., Vempala, S., Weismantel, R.: A note on non-degenerate integer programs with small sub-determinants. Oper. Res. Lett. 44(5), 635–639 (2016)
9. Artmann, S., Weismantel, R., Zenklusen, R.: A strongly polynomial algorithm for bimodular integer linear programming. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1206–1219 (2017)
10. Blömer, J., Naewe, S.: Sampling methods for shortest vectors, closest vectors and successive minima. Theor. Comput. Sci. 410(18), 1648–1665 (2009)
11. Bock, A., Faenza, Y., Moldenhauer, C., Vargas, R., Jacinto, A.: Solving the stable set problem in terms of the odd cycle packing number. In: Proceedings of the 34th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS), pp. 187–198 (2014)
12. Bonifas, N., Di Summa, M., Eisenbrand, F., Hähnle, N., Niemeier, M.: On sub-determinants and the diameter of polyhedra. Discret. Comput. Geom. 52(1), 102–115 (2014)
13. Cassels, J.W.S.: An Introduction to the Geometry of Numbers, 2nd edn. Springer (1971)
14. Cheon, J.H., Lee, C.: Approximate algorithms on lattices with small determinant. Cryptology ePrint Archive, Report 2015/461 (2015). http://eprint.iacr.org/2015/461
15. Cygan, M., Fomin, F.V., Kowalik, L., Lokshtanov, D., Marx, D., Pilipczuk, M., Pilipczuk, M., Saurabh, S.: Parameterized Algorithms. Springer International Publishing, Switzerland (2015)
16. Dadush, D., Peikert, C., Vempala, S.: Enumerative algorithms for the shortest and closest lattice vector problems in any norm via M-ellipsoid coverings.
In: 2011 IEEE 52nd Annual Symposium on Foundations of Computer Science (2011)
17. Eisenbrand, F., Hähnle, N., Niemeier, M.: Covering cubes and the closest vector problem. In: SoCG '11: Proceedings of the Twenty-Seventh Annual Symposium on Computational Geometry, pp. 417–423 (2011)
18. Eisenbrand, F., Vempala, S.: Geometric random edge (2016). https://arxiv.org/abs/1404.1568v5
19. Downey, R.G., Fellows, M.R.: Parameterized Complexity. Springer, New York (1999)
20. Fincke, U., Pohst, M.: A procedure for determining algebraic integers of given norm. In: Proceedings of the EUROCAL, LNCS, vol. 162, pp. 194–202 (1983)
21. Fincke, U., Pohst, M.: Improved methods for calculating vectors of short length in a lattice, including a complexity analysis. Math. Comp. 44(170), 463–471 (1985)
22. Gomory, R.E.: On the relation between integer and non-integer solutions to linear programs. Proc. Natl. Acad. Sci. USA 53(2), 260–265 (1965)
23. Gribanov, D.V.: The flatness theorem for some class of polytopes and searching an integer point. In: Springer Proceedings in Mathematics & Statistics. Models, Algorithms and Technologies for Network Analysis, vol. 104, pp. 37–45 (2013)
24. Gribanov, D.V., Malishev, D.S.: The computational complexity of three graph problems for instances with bounded minors of constraint matrices. Discret. Appl. Math. 227, 13–20 (2017)
25. Gribanov, D.V., Chirkov, A.J.: The width and integer optimization on simplices with bounded minors of the constraint matrices. Optim. Lett. 10(6), 1179–1189 (2016)
26. Gribanov, D.V., Veselov, S.I.: On integer programming with bounded determinants. Optim. Lett. 10(6), 1169–1177 (2016)
27. Gruber, M., Lekkerkerker, C.G.: Geometry of Numbers. North-Holland (1987)
28. Hanrot, G., Pujol, X., Stehlé, D.: Algorithms for the shortest and closest lattice vector problems. In: Coding and Cryptology, IWCC 2011, Lecture Notes in Computer Science, vol. 6639, pp. 159–190 (2011)
29. Hu, T.C.: Integer Programming and Network Flows.
Addison-Wesley Publishing Company (1970)
30. Kannan, R.: Improved algorithms for integer programming and related lattice problems. In: Proceedings of the STOC, pp. 99–108 (1983)
31. Kannan, R.: Minkowski's convex body theorem and integer programming. Math. Oper. Res. 12(3), 415–440 (1987)
32. Karmarkar, N.: A new polynomial time algorithm for linear programming. Combinatorica 4(4), 373–395 (1984)
33. Khachiyan, L.G.: Polynomial algorithms in linear programming. Comput. Math. Math. Phys. 20(1), 53–72 (1980)
34. Korpelainen, N., Lozin, V.V., Malyshev, D.S., Tiskin, A.: Boundary properties of graphs for algorithmic graph problems. Theor. Comput. Sci. 412, 3545–3554 (2011)
35. Lenstra, A.K., Lenstra Jr., H.W., Lovász, L.: Factoring polynomials with rational coefficients. Math. Ann. 261, 515–534 (1982)
36. Malyshev, D.S.: Continued sets of boundary classes of graphs for colorability problems. Discret. Anal. Oper. Res. 16(5), 41–51 (2009)
37. Malyshev, D.S.: On minimal hard classes of graphs. Discret. Anal. Oper. Res. 16(6), 43–51 (2009)
38. Malyshev, D.S.: A study of the boundary graph classes for colorability problems. J. Appl. Ind. Math. 2, 221–228 (2013)
39. Malyshev, D.S.: Classes of graphs critical for the edge list-ranking problem. J. Appl. Ind. Math. 8, 245–255 (2014)
40. Malyshev, D., Pardalos, P.M.: Critical hereditary graph classes: a survey. Optim. Lett. 10(8), 1593–1612 (2016)
41. Malyshev, D.: Critical elements in combinatorially closed families of graph classes. J. Appl. Ind. Math. 11(1), 99–106 (2017)
42. Malyshev, D., Sirotkin, D.: Polynomial-time solvability of the independent set problem in a certain class of subcubic planar graphs. J. Appl. Ind. Math. 24(3), 35–60 (2017)
43. Micciancio, D., Voulgaris, P.: A deterministic single exponential time algorithm for most lattice problems based on Voronoi cell computations. In: Proceedings of the STOC, pp. 351–358 (2010)
44. Nesterov, Y.E., Nemirovsky, A.S.: Interior Point Polynomial Methods in Convex Programming.
Society for Industrial and Applied Math, USA (1994)
45. Padberg, M.: The boolean quadric polytope: some characteristics, facets and relatives. Math. Program. 45(1–3), 139–172 (1989)
46. Pardalos, P.M., Han, C.G., Ye, Y.: Interior point algorithms for solving nonlinear optimization problems. COAL Newsl. 19, 45–54 (1991)
47. Siegel, C.L.: Lectures on the Geometry of Numbers. Springer (1989)
48. Shevchenko, V.N.: Qualitative Topics in Integer Linear Programming (Translations of Mathematical Monographs). AMS (1996)
49. Schrijver, A.: Theory of Linear and Integer Programming. Wiley Interscience Series in Discrete Mathematics. Wiley (1998)
50. Storjohann, A.: Near optimal algorithms for computing Smith normal forms of integer matrices. In: ISSAC'96: Proceedings of the 1996 International Symposium on Symbolic and Algebraic Computation, pp. 267–274. ACM Press (1996)
51. Storjohann, A., Labahn, G.: Asymptotically fast computation of Hermite normal forms of integer matrices. In: Proceedings of the 1996 International Symposium on Symbolic and Algebraic Computation, pp. 259–266 (1996)
52. Tardos, E.: A strongly polynomial algorithm to solve combinatorial linear programs. Oper. Res. 34(2), 250–256 (1986)
53. Veselov, S.I., Chirkov, A.J.: Integer program with bimodular matrix. Discret. Optim. 6(2), 220–222 (2009)
54. Zhendong, W.: Computing the Smith Forms of Integer Matrices and Solving Related Problems (2005)
The Video-Based Age and Gender Recognition with Convolution Neural Networks

Angelina S. Kharchevnikova and Andrey V. Savchenko
Abstract The paper considers the problem of age and gender recognition for video data using modern deep convolutional neural networks. We present a comparative analysis of classifier fusion algorithms that aggregate decisions for individual frames. We implemented a video-based recognition system with several aggregation methods to improve age and gender identification accuracy. An experimental comparison of the proposed approach with traditional simple voting using the IJB-A, Indian Movies, and Kinect datasets is provided. It is demonstrated that the most accurate decisions are obtained using the geometric mean and the mathematical expectation of the outputs of the softmax layers of the convolutional neural networks for gender recognition and age prediction, respectively.

Keywords Age and gender recognition · Contextual advertising · Convolutional neural networks · Classifier fusion
1 Introduction

Nowadays, due to the rapid growth of interest in video processing, modern face analysis technologies aim to identify various characteristics of an observed person. In particular, age and gender characteristics can be applied in retail for contextual advertising targeted at particular groups of customers [1]. In recent years, a large number of age and gender recognition algorithms have been proposed [2]. However, the reliability of the existing solutions remains insufficient for practical

A. S. Kharchevnikova (B) · A. V. Savchenko National Research University Higher School of Economics, Nizhniy Novgorod, Russia e-mail:
[email protected] A. V. Savchenko e-mail:
[email protected] A. V. Savchenko Laboratory of Algorithms and Technologies for Network Analysis, National Research University Higher School of Economics, Nizhny Novgorod, Russia © Springer International Publishing AG, part of Springer Nature 2018 V. A. Kalyagin et al. (eds.), Computational Aspects and Applications in Large-Scale Networks, Springer Proceedings in Mathematics & Statistics 247, https://doi.org/10.1007/978-3-319-96247-4_3
application. Therefore, increasing the accuracy of age and gender identification from video frames remains an acute challenge for researchers in the fields of computer vision and machine learning. Unlike traditional single-image processing systems, video analysis lets us use additional information. For reasonably fast recognition algorithms, one can obtain more than 100 frames of the classified object within a few seconds from the camera. This is sufficient to guarantee that at least several frames belong to the same class from the reference base [1]. Thus, this paper considers the video-based age and gender recognition task as the problem of choosing the most reliable decision using classifier fusion (ensemble) methods [3]. The rest of the paper is organized as follows: In Sect. 2, we describe the proposed algorithm. In Sect. 3, an experimental study is conducted using the IJB-A, Indian Movies, and Kinect datasets. In Sect. 4, we present the findings and give concluding comments.
2 Materials and Methods

2.1 Literature Survey

The task of video classification is to assign a newly arriving sequence of T frames {X(t)}, t = 1, 2, ..., T, containing the face of an individual to one of L classes [4]. For simplicity, we assume that the whole video contains only one person and that the facial region is preliminarily detected in each frame, so that the images {X(t)} contain only this facial region. Though the age prediction problem is an example of a regression task, the highest accuracy is obtained if several age categories are defined (L = 8 in [5]) and a general classification task is solved. Early age estimation methods are based on computing relationships between different dimensions of facial features [6]. Since these algorithms require an accurate localization of facial features, which is a fairly complex problem in itself, they are unsuitable for raw images and video frames. The authors of [7] propose a method for automatic age recognition, AGing pattErn Subspace (AGES), whose idea is to create an aging pattern. However, the requirement of frontal alignment of the images imposes significant restrictions on the set of input parameters. Frequency-based approaches are also known among age identification algorithms. For instance, a combination of biological features of the image, called Biologically Inspired Features (BIF), is studied in [8]. Age classification can also be conducted using linear Gabor filters and Support Vector Machines (SVM) [9]. Gender recognition from a facial image [10] is a much simpler task, because it includes only L = 2 classes. Hence, traditional binary classifiers can be used, among them SVM [11], boosting-based algorithms, and neural networks.
Nevertheless, the accuracy of traditional computer vision and pattern recognition algorithms does not satisfy practical requirements. Given the effectiveness of Convolutional Neural Networks (CNNs), in particular in classification challenges, the authors of [5] proposed applying them to the age and gender recognition problems. Since then, several other papers have confirmed the efficiency of deep CNNs in these tasks [12–15]. Hence, we use this deep learning approach to recognize age and gender in video data.
2.2 Proposed Algorithm

At first, each frame is assigned to one of the L classes by feeding the RGB matrix of pixels of a facial image {X(t)} to the CNN. This deep neural network should be preliminarily trained on a very large dataset of facial images with known age and gender labels [5]. The output of the CNN is usually obtained in the softmax layer, which provides the estimates of the posterior probabilities P(l|X(t)) that the tth frame belongs to the lth class [16]:

$$P(l|X(t)) = \operatorname{softmax}(z_l(t)) = \frac{\exp z_l(t)}{\sum_{j=1}^{L} \exp z_j(t)}, \quad l = 1, 2, \ldots, L, \tag{1}$$
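As a numerical illustration of (1), the softmax transform can be computed as follows (a generic sketch, not the authors' code; the scores are hypothetical):

```python
import math

def softmax(z):
    """Estimate the posteriors P(l|X(t)) from the last-layer outputs z_l(t), Eq. (1)."""
    m = max(z)
    e = [math.exp(v - m) for v in z]  # shifting by max(z) avoids overflow, result unchanged
    s = sum(e)
    return [v / s for v in e]

scores = [2.0, 1.0, 0.1]          # hypothetical outputs z_l(t) for L = 3 classes
posteriors = softmax(scores)       # non-negative estimates summing to 1
```

The per-frame MAP decision described below is then simply the index of the largest posterior.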
where z_l(t) is the output of the lth neuron in the last (usually fully connected) layer of the neural network. The decision is made in favor of the class with the maximum a posteriori probability (MAP). Rather high accuracy has been obtained with the MAP approach applied to single-image age and gender recognition [5]. However, due to the influence of diverse factors such as unknown illumination, quick changes of the camera angle, low resolution of the video camera, etc., making a decision with the MAP approach for every individual frame is usually inaccurate. Therefore, we use the fusion of the decisions for individual frames to increase recognition accuracy. In this paper, we examine the following aggregation algorithms [3]:

1. Simple voting, in which the final decision is made in favor of the class [11]

$$l^* = \arg\max_{l = 1, \ldots, L} \sum_{t=1}^{T} \delta(l^*(t) - l), \tag{2}$$

where δ(·) is the discrete delta function and l*(t) is the MAP decision (1) for the tth frame.

2. The arithmetic mean of the posterior probability estimates (1), i.e., the sum rule [11]:

$$l^* = \arg\max_{l = 1, \ldots, L} \frac{1}{T} \sum_{t=1}^{T} P(l|X(t)). \tag{3}$$
Fig. 1 The block diagram of the proposed recognition system
3. If we follow the "naive" assumption about the independence of all frames, then the decision should be taken according to the geometric mean of the posterior probabilities, or the product rule [11]:

$$l^* = \arg\max_{l = 1, \ldots, L} \prod_{t=1}^{T} P(l|X(t)) = \arg\max_{l = 1, \ldots, L} \sum_{t=1}^{T} \log P(l|X(t)). \tag{4}$$
In addition, we recall that the age prediction task is essentially a regression problem. Hence, in this case it is possible to compute an expected value (mathematical expectation):

$$l^* = \sum_{l=1}^{L} P(l|X(t)) \cdot l. \tag{5}$$
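The aggregation rules (2)–(5) are easy to express over the T × L matrix of per-frame softmax outputs. The sketch below is an illustration, not the authors' implementation; in particular, for rule (5) the per-frame expectations are averaged over the track, which is one natural way to combine (5) with frame-level fusion, and the class labels are taken as 1, ..., L:

```python
import math

def fuse(posteriors, rule):
    """Aggregate the per-frame softmax outputs (a list of length-L lists)
    by the rules (2)-(5) of the paper."""
    T, L = len(posteriors), len(posteriors[0])
    if rule == "voting":    # (2): count the per-frame MAP decisions
        votes = [0] * L
        for p in posteriors:
            votes[p.index(max(p))] += 1
        return votes.index(max(votes))
    if rule == "sum":       # (3): arithmetic mean of the posteriors
        means = [sum(p[l] for p in posteriors) / T for l in range(L)]
        return means.index(max(means))
    if rule == "product":   # (4): geometric mean via a sum of logarithms
        logs = [sum(math.log(p[l] + 1e-12) for p in posteriors) for l in range(L)]
        return logs.index(max(logs))
    if rule == "expected":  # (5): expectation over the class labels 1..L
        means = [sum(p[l] for p in posteriors) / T for l in range(L)]
        return sum((l + 1) * means[l] for l in range(L))
    raise ValueError(rule)

# Three frames, two classes: the rules may disagree on an ambiguous track
track = [[0.6, 0.4], [0.7, 0.3], [0.1, 0.9]]
```

On this toy track, voting follows the two confident early frames, while the sum and product rules are swayed by the very confident last frame, which illustrates why the choice of rule matters.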
The general data flow in the proposed video-based age and gender recognition system is presented in Fig. 1. In the first step, images from the video camera are supplied to the input of the system. Isolated frames are selected from the video stream with a fixed frequency (about 10–20 frames per second) in the frame selection block. Then it is important to detect and keep only the face area, which is performed in the corresponding block. Face detection is conducted using the Viola–Jones cascade method with Haar features [17] from the OpenCV library. In the next step each frame is recognized by the CNN, and the estimates of the posterior probabilities are obtained from its softmax layer. The product rule (4) is used for aggregation of the gender classification results. The per-frame age predictions are combined using the expected value (5). Based on the output of the classifier fusion block, the final recognition decision is made in favor of the corresponding class.
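The data flow of Fig. 1 can be sketched with pluggable detector, classifier, and fusion components. All names below are hypothetical; in the actual system the detector wraps the OpenCV Viola–Jones cascade and the classifier is a forward pass of the CNN:

```python
def recognize_track(frames, detect_face, classify, fuse, step=3):
    """Fig. 1 data flow: sample frames at a fixed rate, crop the face region,
    classify each crop, and fuse the per-frame decisions into one answer.

    detect_face(frame) -> face crop or None; classify(face) -> posterior vector;
    fuse(list_of_posteriors) -> final decision.  `step` models the fixed
    sampling frequency of the frame-selection block.
    """
    posteriors = []
    for frame in frames[::step]:
        face = detect_face(frame)
        if face is not None:          # frames without a detected face are skipped
            posteriors.append(classify(face))
    if not posteriors:
        raise ValueError("no face detected in the whole track")
    return fuse(posteriors)
```

With stub components this runs end to end; swapping in a real detector and CNN does not change the control flow.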
3 Experimental Results and Discussion

We implemented the described approach (Fig. 1) in MS Visual Studio 2015 in C++. The algorithm is implemented using either the OpenCV DNN module or the Caffe framework [18]. The GUI of the proposed system is shown in Fig. 2. We compare two publicly available CNN architectures: the Age_net and Gender_net models [5] and the deep VGG-16 neural network [19] trained for age/gender prediction [20]. Moreover, we applied image normalization techniques, namely adding a Mean-Variance Normalization (MVN) layer to the models and mean image subtraction (Caffe), to cope with illumination effects, low camera resolution, etc. Choosing datasets for testing the recognition accuracy and the aggregation algorithms is an inherently challenging problem, because only a very limited number of databases provide such personal information as the age or gender of the person in an image. Furthermore, a video-based approach is considered in this work. For this reason, the datasets used to train and test the described CNN architectures [5, 20] could not be applied. Hence, in this work the accuracy of recognition and of the aggregation algorithms is tested on the facial datasets IARPA Janus Benchmark A (IJB-A), Indian Movie, and
Fig. 2 Experiment results
Table 1 Inference time (s)

| Implementation means | Gender_net and Age_net, CPU | Gender_net and Age_net, GPU | VGG-16, CPU | VGG-16, GPU |
|---|---|---|---|---|
| OpenCV DNN | 4.805 | – | 28.981 | – |
| OpenCV DNN + MVN layer | 8.734 | – | 34.984 | – |
| Caffe + mean image subtraction | 1.502 | 0.867 | 6.671 | 3.395 |
| Caffe + mean image subtraction + MVN layer | 3.012 | 1.064 | 12.973 | 9.012 |
EURECOM Kinect, for which gender and age information is available and the video frames of a single track are stored. The first dataset consists of 2,043 videos, where only the gender information is present [21]. The Indian Movie database is a collection of video frames assembled from Indian films. In total, there are about 332 different videos and 34,512 frames of one hundred Indian actors, whose ages are divided into four categories: "Child", "Young", "Middle", and "Old" [22]. In our experiments, these verbal descriptions of age were replaced by specific age intervals according to an approximate estimation of the people in the frames: 0–15, 16–35, 36–60, and 60+. In the experiments below, the intersections of the recognition results with the given intervals are estimated. The Kinect dataset contains 104 videos of 52 people (14 women and 38 men) [23]. The database provides information about the gender and the year of birth, which simplifies the estimation of age. When evaluating the algorithm, a predicted age interval is considered correct if it intersects the range of plus or minus 10 years around the true age. Thus, age recognition is again treated as a classification task. The average inference time of the individual CNNs on an Intel Core i5-2400 CPU, 64-bit machine with an NVIDIA GeForce GT 440 GPU is presented in Table 1. According to the experiments, recognition using the Caffe functionality is several times faster, since it is possible to use the power of the GPU. The inference time (Table 1) of recognition with OpenCV limits the usage of the VGG-16 architecture, so this combination was excluded from the accuracy tests. In Tables 2 and 3, the implementation means are compared on the example of the sum rule for gender and age recognition, respectively. The highest accuracies of age and gender recognition have been obtained with the Caffe framework and mean image subtraction.
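The two evaluation protocols just described reduce to an interval-intersection test; a sketch follows (the helper names and the upper bound for "Old" are illustrative assumptions, not taken from the paper):

```python
# Hypothetical mapping of the Indian Movie verbal labels to age intervals;
# the cap of 120 for "Old" is an arbitrary choice for illustration
AGE_INTERVALS = {"Child": (0, 15), "Young": (16, 35), "Middle": (36, 60), "Old": (61, 120)}

def intervals_intersect(a, b):
    """True iff the closed intervals a = (a1, a2) and b = (b1, b2) overlap."""
    return max(a[0], b[0]) <= min(a[1], b[1])

def kinect_age_correct(predicted_interval, true_age, tolerance=10):
    """Kinect protocol: a predicted interval counts as correct when it
    intersects [true_age - tolerance, true_age + tolerance]."""
    return intervals_intersect(predicted_interval,
                               (true_age - tolerance, true_age + tolerance))
```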
For example, the difference in error rates with the OpenCV DNN module was 6–10% for the gender recognition problem and 7–35% for the age classification on the different databases. The results of gender and age prediction for traditional per-frame recognition (the MAP approach) and for the classifier fusion schemes (2)–(5) are shown in Tables 4 and 5, respectively. It can be noticed that the accuracy obtained by the MAP
Table 2 Gender recognition accuracy (%) (sum rule)

| Implementation means | IJB-A, Gender_net | IJB-A, VGG-16 | Indian Movie, Gender_net | Indian Movie, VGG-16 | Kinect, Gender_net | Kinect, VGG-16 |
|---|---|---|---|---|---|---|
| OpenCV DNN | 53 | – | 70 | – | 55 | – |
| OpenCV DNN + MVN | 56 | – | 67 | – | 61 | – |
| Caffe + mean image subtraction | 59 | 81 | 72 | 87 | 75 | 84 |
| Caffe + mean image subtraction + MVN | 59 | 81 | 71 | 83 | 63 | 84 |

Table 3 Age recognition accuracy (%) (sum rule)

| Implementation means | Indian Movie, Age_net | Indian Movie, VGG-16 | Kinect, Age_net | Kinect, VGG-16 |
|---|---|---|---|---|
| OpenCV DNN | 16 | – | 8 | – |
| OpenCV DNN + MVN | 19 | – | 26 | – |
| Caffe + mean image subtraction | 23 | 65 | 45 | 79 |
| Caffe + mean image subtraction + MVN | 24 | 64 | 38 | 75 |

Table 4 Gender recognition accuracy (%) with classifier fusion

| Algorithm | IJB-A, Gender_net | IJB-A, VGG-16 | Indian Movie, Gender_net | Indian Movie, VGG-16 | Kinect, Gender_net | Kinect, VGG-16 |
|---|---|---|---|---|---|---|
| Frame by frame (MAP) | 51 | 72 | 61 | 75 | 55 | 69 |
| Simple voting (2) | 60 | 81 | 71 | 81 | 73 | 84 |
| Sum rule (3) | 59 | 81 | 72 | 87 | 75 | 84 |
| Product rule (4) | 59 | 82 | 75 | 88 | 77 | 84 |
approach is about 10–15% lower for the gender recognition problem and 10–20% lower for the age classification in comparison with the classifier fusion implementations. All aggregation methods performed at approximately the same level, but the geometric mean (4) showed the best result on each of the databases. Thus, it can
Table 5 Age prediction accuracy (%) with classifier fusion

| Algorithm | Indian Movie, Age_net | Indian Movie, VGG-16 | Kinect, Age_net | Kinect, VGG-16 |
|---|---|---|---|---|
| Frame by frame (MAP) | 10 | 42 | 22 | 58 |
| Simple voting (2) | 23 | 62 | 41 | 73 |
| Sum rule (3) | 23 | 65 | 43 | 79 |
| Product rule (4) | 23 | 65 | 45 | 79 |
| Expected value (5) | 31 | 63 | 50 | 78 |
be noticed that the VGG-16 architecture is ahead of the Gender_net and Age_net models in the accuracy of age and gender recognition. Unfortunately, the inference time (Table 1) of the very deep VGG-16 is much higher than that of the CNN models from [5]. Here we have a general trade-off between performance and accuracy. In addition, the testing data confirmed the need for normalization of the input video images, as the mean image subtraction showed its efficiency. Finally, according to Table 4, the product rule (4) is slightly more accurate for gender recognition than the other aggregation techniques. Estimation of the expected value (5) shows its efficiency in age identification (Table 5).
4 Conclusion and Future Work

In this work, we proposed a video-based age and gender recognition algorithm based on classifier committees. The experimental results have demonstrated the increased recognition accuracy of the proposed algorithm when compared with traditional frame-by-frame decisions. The geometric mean (the product rule [3]) with normalization of the input video images is the most accurate in the gender classification task. At the same time, the most accurate age prediction is achieved through the computation of the expected value (5). We presented the results of comparing two CNN architectures: the Age_net and Gender_net models [5] and VGG-16 [20]. The accuracy of the VGG-16 architecture is about 15% and 20% higher than that of the age and gender net models for gender recognition (Table 4) and age prediction (Table 5), respectively. However, the inference time of VGG-16 is 4–9 times higher (Table 1). In the future, we plan to treat age estimation as a regression task and to evaluate the mean squared error (MSE) or the Huber loss. Moreover, we would like to implement gender/age recognition in an offline Android mobile application, where the insufficient performance of the VGG-16 model can be a limiting factor for its practical usage. Hence, modern techniques for the optimization of deep CNNs can be applied [24].
Acknowledgements The paper was prepared within the framework of the Academic Fund Program at the National Research University Higher School of Economics (HSE) in 2017 (grant 17-050007) and by the Russian Academic Excellence Project “5–100”. Andrey V. Savchenko is partially supported by Russian Federation President grant no. MD-306.2017.9.
References

1. Savchenko, A.V.: Search Techniques in Intelligent Classification Systems. Springer International Publishing (2016)
2. Chao, W.L., Liu, J.Z., Ding, J.J.: Facial age estimation based on label-sensitive learning and age-oriented regression. Pattern Recogn. 46(3), 628–641 (2013)
3. Kittler, J., Alkoot, F.M.: Sum versus vote fusion in multiple classifier systems. IEEE Trans. Pattern Anal. Mach. Intell. 25(1), 110–115 (2003)
4. Wang, H., Wang, Y., Cao, Y.: Video-based face recognition: a survey. World Acad. Sci. Eng. Technol. Int. J. Comput. Electr. Autom. Control Inf. Eng. 3(12), 2809–2818 (2009)
5. Levi, G., Hassner, T.: Age and gender classification using convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 34–42 (2015)
6. Kwon, Y.H.: Age classification from facial images. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR'94, pp. 762–767. IEEE (1994)
7. Geng, X.: Learning from facial aging patterns for automatic age estimation. In: Proceedings of the 14th ACM International Conference on Multimedia, pp. 307–316. ACM (2006)
8. Guo, G., Mu, G., Fu, Y.: Human age estimation using bio-inspired features. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR, pp. 112–119. IEEE (2009)
9. Choi, S.E.: Age estimation using a hierarchical classifier based on global and local facial features. Pattern Recogn. 44(6), 1262–1281 (2011)
10. Makinen, E.: Evaluation of gender classification methods with automatically detected and aligned faces. IEEE Trans. Pattern Anal. Mach. Intell. 30(4), 541–547 (2008)
11. Shan, C.: Face recognition and retrieval in video. In: Video Search and Mining, pp. 235–260. Springer, Berlin, Heidelberg (2010)
12. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)
13. Rothe, R., Timofte, R., Van Gool, L.: Deep expectation of apparent age from a single image. In: Proceedings of the IEEE International Conference on Computer Vision Workshops, pp. 10–15 (2015)
14. Ekmek, A.: Convolutional Neural Networks for Age and Gender Classification (2016). http://cs231n.stanford.edu/reports/2016/pdfs/003_Report.pdf
15. Rude Carnie: Age and Gender Deep Learning with TensorFlow. https://github.com/dpressel/rude-carnie
16. Szegedy, C.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9 (2015)
17. Lienhart, R., Maydt, J.: An extended set of Haar-like features for rapid object detection. In: Proceedings of the International Conference on Image Processing, vol. 1, pp. I–I. IEEE (2002)
18. Caffe. http://caffe.berkeleyvision.org/
19. Simonyan, K., Zisserman, A.: Very Deep Convolutional Networks for Large-Scale Image Recognition (2014). arXiv:1409.1556
20. Rothe, R., Timofte, R., Van Gool, L.: DEX: deep expectation of apparent age from a single image. In: Proceedings of the IEEE International Conference on Computer Vision Workshops, pp. 10–15 (2015)
21. Klare, B.F., et al.: Pushing the frontiers of unconstrained face detection and recognition: IARPA Janus Benchmark A. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2015)
22. Setty, S., et al.: Indian movie face database: a benchmark for face recognition under wide variations. In: Fourth National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG). IEEE (2013)
23. Min, R., Kose, N., Dugelay, J.: KinectFaceDB: a Kinect database for face recognition. IEEE Trans. Syst. Man Cybern. Syst. 44(11), 1534–1548 (2014). https://doi.org/10.1109/TSMC.2014.2331215
24. Rassadin, A.G., Savchenko, A.V.: Compressing deep convolutional neural networks in visual emotion recognition. In: Proceedings of the International Conference Information Technology and Nanotechnology (ITNT), CEUR-WS, vol. 1901, pp. 207–213 (2017). http://ceur-ws.org/Vol-1901/paper33.pdf
On Forbidden Induced Subgraphs for the Class of Triangle-König Graphs

Dmitry B. Mokeev
Abstract A triangle packing of a graph G is a set of pairwise vertex-disjoint 3-vertex cycles in G. A triangle vertex cover of a graph G is a subset S of vertices of G such that every 3-vertex cycle in G contains at least one vertex from S. We consider the hereditary class of graphs with the following property: the maximum cardinality of a triangle packing is equal to the minimum cardinality of a triangle vertex cover. In this paper, we present some minimal forbidden induced subgraphs for this hereditary class.

Keywords Maximum triangle packing · Minimum triangle vertex cover · König graph · Hereditary class · Forbidden subgraphs
1 Introduction

We use the standard notation Pn, Cn, Kn for the simple path, the chordless cycle, and the complete graph with n vertices, respectively. Let X be a set of graphs. A set of pairwise vertex-disjoint induced subgraphs of a graph G, each isomorphic to a graph in X, is called an X packing of G. The X packing problem is to find a maximum X packing in a graph. A subset of vertices of a graph G which covers all induced subgraphs of G isomorphic to graphs in X is called an X vertex cover of G. In other words, removing all vertices of any X vertex cover of G produces a graph that contains none of the graphs in X as an induced subgraph. The X vertex cover problem is to find a minimum X vertex cover in a graph. A König graph for X is a graph in which every induced

D. B. Mokeev (B) Department of Algebra, Geometry and Discrete Mathematics, Lobachevsky State University of Nizhni Novgorod, 23, Gagarina Ave., N.Novgorod, Russia e-mail:
[email protected] D. B. Mokeev Laboratory of Algorithms and Technologies for Networks Analysis, Higher School of Economics in Nizhni Novgorod, 136, Rodionova Str., N.Novgorod, Russia © Springer International Publishing AG, part of Springer Nature 2018 V. A. Kalyagin et al. (eds.), Computational Aspects and Applications in Large-Scale Networks, Springer Proceedings in Mathematics & Statistics 247, https://doi.org/10.1007/978-3-319-96247-4_4
subgraph has the property that the maximum cardinality of its X packings is equal to the minimum cardinality of its X vertex covers [2]. The class of all König graphs for a set X is denoted by K(X). If X consists of a single graph H, then we talk about H packings, H vertex covers, and König graphs for H, respectively. A lot of papers on the X packing and X vertex cover problems are devoted to algorithmic aspects (see [6, 12, 18, 24]). It is known that the matching problem (equivalently, the P2 packing problem) can be solved in polynomial time [8], but the H packing problem is NP-complete for any graph H having a connected component with three or more vertices [13]. Formulated as integer linear programs, the X packing and X vertex cover problems form a pair of dual problems. So König graphs for X are exactly the graphs for each of whose induced subgraphs there is zero duality gap between the X packing and X vertex cover problems. In this regard, König graphs are similar to perfect graphs, which have the same property with respect to another pair of dual problems (vertex coloring and maximum clique), a property that helps to solve these problems efficiently on perfect graphs [9]. The cycle packing problem in an undirected graph has several applications, for instance, in computational biology [4]. It is known that the triangle packing problem can be solved in polynomial time if the maximum vertex degree of the input graph is at most 3 [5], and it is NP-complete for planar graphs, chordal graphs, line graphs, and total graphs [10]. Several polynomial-time approximation algorithms for this problem are known [1, 11, 14, 22]. There is a line of systematic study of the computational complexity of graph problems on important subfamilies of the family of hereditary graph classes; we mention the papers [15–17, 19–21]. In this paper, we consider the class of triangle-König graphs, i.e., the König graphs for C3.
We suppose that both the triangle packing and the triangle vertex cover problems can be solved in polynomial time on such graphs. Every hereditary class H can be described by a set of its minimal forbidden induced subgraphs, i.e., the graphs that do not belong to H and are minimal with respect to the induced subgraph relation. The class K(X) is hereditary for any X and, therefore, can be described by a set of forbidden induced subgraphs. Such a characterization of K(P2) is given by the König theorem. In addition to this classical theorem, the following results are known. All minimal forbidden induced subgraphs for the class K(C), where C is the set of all chordless cycles, are described in [7]. All minimal forbidden induced subgraphs for the class K(P3) are described, and a full structural description of this class is presented, in [2, 3]. Some forbidden induced subgraphs for K(P4) are found in [23]. The aim of this paper is to describe some (not all) minimal forbidden induced subgraphs for the class of König graphs for C3. We show that K4 is a forbidden induced subgraph for K(C3). In addition, we give definitions of belts and rings and show when they are triangle-König graphs or minimal forbidden induced subgraphs for this class.
2 Notation and Definitions

We always denote by (v1, v2, v3) a triangle that consists of the vertices v1, v2, v3. We say that two different triangles are adjacent if they have at least one common vertex. We say that two triangles are weakly adjacent if they have exactly one common vertex. We say that two different triangles are strongly adjacent if they have a common edge. In what follows, a triangle-König graph means a König graph for C3. The maximum number of subgraphs in a triangle packing of G is denoted by μC3(G), and the minimum number of vertices in a triangle vertex cover of G is denoted by βC3(G). Let v be a vertex and e an edge of a graph G. We denote by G\v and G\e the graphs obtained from G by the deletion of v and e, respectively.

Definition 1 We say that a vertex v of a graph G is △-isolated if there is no triangle in G which contains v. In other words, v is △-isolated if NG(v) is an independent set.

Definition 2 We say that a graph G is a △-graph if it has no △-isolated vertices and NG(u) ∩ NG(v) ≠ ∅ for every edge (u, v) of G.
Definition 3 Let G be a graph, and let G′ be obtained from G by deleting all △-isolated vertices. We say that a △-graph H is a △-skeleton of the graph G if H is a spanning subgraph of G′. We denote by △(G) the △-skeleton of a graph G. We call a △-connected component of a graph G a connected component of △(G).

Definition 4 We say that a graph G is △-isomorphic to a graph H if △(G) ≅ △(H).
We denote this relation by G ∼△ H. Obviously, the sets of triangles of G and △(G) coincide. Hence, μC3(G) = μC3(△(G)) and βC3(G) = βC3(△(G)). If H is an induced subgraph of G, then △(H) is an induced subgraph of △(G). Therefore, μC3(H) = βC3(H) iff μC3(△(H)) = βC3(△(H)). So, the following lemmas are true.

Lemma 1 A graph G is triangle-König iff every △-connected component of G is a triangle-König graph.

Lemma 2 A graph G is a forbidden induced subgraph for the class K(C3) iff △(G) is a forbidden induced subgraph for this class.
Corollary 1 Let G ∼△ H. Then G is a forbidden induced subgraph for the class K(C3) iff H is a forbidden induced subgraph for this class.
3 The Simplest Minimal Forbidden Induced Subgraph

Obviously, μC3(K4) = 1 and βC3(K4) = 2. If a graph H is a proper induced subgraph of K4, then, obviously, μC3(H) = βC3(H). So, the following lemma is true.

Lemma 3 The graph K4 is a minimal forbidden induced subgraph for the class K(C3).
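The values μC3(K4) = 1 and βC3(K4) = 2 can be verified by exhaustive search. The following sketch (illustrative code, not part of the paper) works for any small graph:

```python
from itertools import combinations

def triangles(vertices, edges):
    """All 3-vertex cycles of the graph, as vertex triples."""
    e = set(map(frozenset, edges))
    return [t for t in combinations(vertices, 3)
            if all(frozenset(pair) in e for pair in combinations(t, 2))]

def mu(vertices, edges):
    """Maximum number of pairwise vertex-disjoint triangles (brute force)."""
    ts = triangles(vertices, edges)
    for k in range(len(ts), 0, -1):
        for pack in combinations(ts, k):
            if len({v for t in pack for v in t}) == 3 * k:  # all vertices distinct
                return k
    return 0

def beta(vertices, edges):
    """Minimum number of vertices meeting every triangle (brute force)."""
    ts = triangles(vertices, edges)
    for k in range(len(vertices) + 1):
        for cover in combinations(vertices, k):
            if all(set(cover) & set(t) for t in ts):
                return k

K4 = (list(range(4)), list(combinations(range(4), 2)))
```

On K4 the search exhibits the duality gap of Lemma 3, while on a single triangle the two quantities agree.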
4 Belts

Definition 5 Let G be a graph obtained from a simple path by adding some edges, each of which connects a pair of vertices at distance 2 in the path. We call G a belt. We call the edges of the basic path inner edges and the added edges outer edges.

Considering a belt of n vertices, we assume that its vertices are labeled along the basic simple path as 0, 1, ..., n − 1, and each outer edge (i − 1, i + 1) is labeled as i for each i from 1 to n − 2. We say that a belt is full if it has all possible outer edges. We denote by Bn the full belt of n vertices and by Bn(e1, e2, ..., em), where 1 ≤ e1 < e2 < ··· < em ≤ n − 2, the belt obtained from Bn by deleting the outer edges e1, e2, ..., em. For example, Fig. 1 shows the belt B13(2, 8, 9); there, thick lines denote inner edges and thin lines denote outer edges.

Lemma 4 Each belt is a triangle-König graph.

Proof Let G be a belt Bn(e1, e2, ..., em). The proof is by induction on the number of vertices. If n ≤ 3, then G is isomorphic to one of the graphs K1, K2, P3, C3, which are all, obviously, triangle-König. Suppose that for each n < k the equality μC3(G) = βC3(G) holds, and let n = k.
Fig. 1 The graph B13(2, 8, 9)

If e1 = 1, then the vertex 0 is △-isolated and G ∼△ G\0. By the inductive assumption, μC3(G\0) = βC3(G\0); therefore, by Lemma 1, μC3(G) = βC3(G). If e1 > 1, then (0, 1, 2) is a triangle in G. We consider the graph G′ obtained from G by deleting the vertices 0, 1, 2. Let M′ be a maximum triangle packing and C a minimum triangle vertex cover of the graph G′. By the inductive assumption, |M′| = |C|. Obviously, M′ ∪ {(0, 1, 2)} and C ∪ {2} are a triangle packing and a triangle vertex cover of the graph G of the same cardinality, since every triangle of G meeting {0, 1, 2} contains the vertex 2. Thus, if G is a belt, then μC3(G) = βC3(G). Moreover, each △-connected component of any induced subgraph H of G is a belt. So, by Lemma 1, μC3(H) = βC3(H), and G is a triangle-König graph.

We call the number of triangles in a belt its length. In particular, the length of Bn equals n − 2.
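The induction in Lemma 4 is constructive: the triangles of a belt are intervals [i − 1, i + 1] on the path, and a left-to-right greedy simultaneously builds a packing and a cover of the same size, certifying μC3 = βC3. A sketch (illustrative code using the labeling of Definition 5, not part of the paper):

```python
def belt_mu_beta(n, deleted=()):
    """Return (mu, beta) for the belt B_n(deleted) by producing a triangle
    packing and a triangle vertex cover of equal cardinality.

    Triangle i = (i-1, i, i+1) exists iff outer edge i is not deleted; it
    occupies the vertex interval [i-1, i+1] on the path.
    """
    gone = set(deleted)
    packing, cover = [], []
    last = -1  # rightmost vertex used by the packing / placed in the cover
    for i in range(1, n - 1):            # triangles sorted by right endpoint
        if i not in gone and i - 1 > last:
            packing.append((i - 1, i, i + 1))
            cover.append(i + 1)          # pierces every later overlapping triangle
            last = i + 1
    # every skipped triangle overlaps a chosen one at its right endpoint,
    # so |packing| = |cover| certifies mu = beta for the belt
    return len(packing), len(cover)
```

For the full belt B5 all three triangles pass through vertex 2, while the belt B13(2, 8, 9) of Fig. 1 splits into four disjoint triangles.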
5 Rings

Definition 6 Let G be a graph obtained from a chordless cycle of 5 or more vertices by adding some edges, each of which connects a pair of vertices at distance 2 in the cycle. We call G a ring. We call the edges of the basic cycle inner edges and the added edges outer edges.

Considering a ring of n vertices, we assume that its vertices are labeled along the basic cycle as 0, 1, ..., n − 1, and that the arithmetic operations with vertex labels are performed modulo n. Each outer edge (i − 1, i + 1) is labeled as i, and each triangle (i − 1, i, i + 1) is labeled as i, for each i from 0 to n − 1; the arithmetic operations with outer edge labels and triangle labels are also performed modulo n. We say that a ring is full if it has all possible outer edges. We denote by Rn the full ring of n vertices and by Rn(e1, e2, ..., em), where 0 ≤ e1 < e2 < ··· < em ≤ n − 1, the ring obtained from Rn by deleting the outer edges e1, e2, ..., em. For example, Fig. 2 shows the ring R15(0, 5, 7, 9); there, thick lines denote inner edges and thin lines denote outer edges.

It is easy to see that a triangle i can be strongly adjacent to the triangles i − 1 and i + 1 and weakly adjacent to the triangles i − 2 and i + 2, if such triangles exist in the graph. It is obvious that each △-connected component of each proper induced subgraph of a ring is a belt.

Theorem 1 A full ring Rn is a triangle-König graph if n is divisible by 3. If n = 5, then Rn ∉ K(C3). Otherwise, Rn is a minimal forbidden induced subgraph for K(C3).

Proof R5 is isomorphic to K5 and contains K4 as a subgraph, so, by Lemma 3, it is not a triangle-König graph. Suppose now that n > 5. Each triple of vertices (i − 1, i, i + 1), where 0 ≤ i ≤ n − 1, forms a triangle in Rn, so a maximum triangle packing consists of disjoint consecutive triples and μC3(Rn) = ⌊n/3⌋. Each vertex i of Rn is contained in exactly the three triangles i − 1, i, i + 1, and there are n triangles in total, so βC3(Rn) = ⌈n/3⌉. Thus, obviously, μC3(Rn) = βC3(Rn) iff n is divisible by 3.
Since each △-connected component of each induced subgraph of a ring is a belt, by Lemmas 1 and 4 each induced subgraph of Rn is a triangle-König graph.
D. B. Mokeev
Fig. 2 The graph R15 (0, 5, 7, 9)
So Rn is a triangle-König graph if n is divisible by 3 and is a minimal forbidden induced subgraph for K(C3) otherwise.

Corollary 2 A graph G is a minimal forbidden induced subgraph for K(C3) if G ≅ Rn, where n ≥ 7 and n is not divisible by 3.

One can see that if a ring is not full, then it consists of full belts of lengths e2 − e1 − 1, e3 − e2 − 1, ..., em − em−1 − 1, n + e1 − em − 1, which are connected in series via common vertices. We call such belts the blocks of the ring. We associate a block-vector (j1, j2, ..., jm) with each ring which is not full. Here, ji = ei+1 − ei − 1 for each i from 1 to m − 1, and jm = n + e1 − em − 1. In other words, ji equals the length of the corresponding block. We say that a ring Rn(e1, e2, ..., em) is crowded if ji > 0 for each i from 1 to m in its block-vector. Note that if a ring is not crowded, then it is △-isomorphic to a belt and, therefore, is a triangle-König graph. For example, the ring R15(0, 5, 7, 9) (Fig. 2) is crowded and has the block-vector (5, 4, 1, 1). Considering a ring with a block-vector (j1, j2, ..., jm), assume that the arithmetic operations with the indexes of j are performed modulo m.

Lemma 5 Let G be a crowded ring with a block-vector (j1, j2, ..., jm) such that jl = jl+1 = 1 for some l from 1 to m. Then G is a triangle-König graph iff a crowded ring G′ with the block-vector (j1, j2, ..., jl−1, 2, jl+2, ..., jm) is a triangle-König graph.

Proof Let M be a maximum triangle packing and C be a minimum triangle vertex cover of G. Note that if C contains one of the vertices i − 1, i + 1, then C \ {i − 1, i + 1} ∪ {i} is a minimum triangle vertex cover of G. Thus, we can assume that C contains neither i − 1 nor i + 1. The blocks which correspond to jl and jl+1 in G consist of two triangles which are weakly adjacent. Let these triangles be labeled as i − 1, i + 1. Since jl = jl+1 = 1, G does not contain the triangles i − 2, i, i + 2. It is easy to see that G′ can be obtained from G by deleting the vertex i − 1 and adding the edge (i − 2, i + 1). The triangle i − 1 of the graph G is transformed into the triangle (i − 2, i, i + 1) of the graph G′. We assume that this triangle is labeled as i − 1 in G′. Note that the triangles i − 1, i + 1 are strongly adjacent in G′. One can see that the sets of triangles in G and G′ are equal. Moreover, two triangles are adjacent in G iff they are adjacent in G′. So M is a maximum triangle packing of G′ and, since C does not contain the vertex i − 1, C is a minimum triangle vertex cover of G′. Thus, μC3(G) = μC3(G′) and βC3(G) = βC3(G′). Each △-connected component of each induced subgraph of both graphs G and G′ is a belt. So, by Lemmas 1 and 4, each induced subgraph of G and G′ is a triangle-König graph. Thus, G and G′ either belong or do not belong to the class K(C3) simultaneously.

Lemma 6 Let G be a crowded ring with a block-vector (j1, j2, ..., jm). Let l be a number from 1 to m such that jl ≥ 3 and there exists a crowded ring G′ with the block-vector (j1, j2, ..., jl−1, jl − 3, jl+1, ..., jm). Then G is a triangle-König graph iff G′ is a triangle-König graph.

Proof It is obvious that rings with the block-vectors (j1, j2, ..., jl−1, jl, jl+1, ..., jm) and (jl+1, jl+2, ..., jm, j1, j2, ..., jl) are isomorphic. So assume that l = m. Then G′ can be obtained from G by deleting the vertices n − 3, n − 2, n − 1; i.e., if G ≅ Rn(e1, e2, ..., em), then G′ ≅ Rn−3(e1, e2, ..., em). Let M′ be a maximum triangle packing of G′.
Suppose a triangle (n − 4, 1, 2) ∈ M′; then set M = M′ \ {(n − 4, 1, 2)} ∪ {(n − 1, 1, 2), (n − 4, n − 3, n − 2)}. Otherwise, set M = M′ ∪ {(n − 3, n − 2, n − 1)}. In both cases, M is a maximum triangle packing of G of cardinality |M′| + 1. Let C′ be a minimum triangle vertex cover of G′, and let i be the maximum label of a vertex in C′. It is easy to see that i ∈ {n − 4, n − 5, n − 6}. Set C = C′ ∪ {i + 3}; then C is a minimum triangle vertex cover of G of cardinality |C′| + 1. Thus, μC3(G) = μC3(G′) + 1 and βC3(G) = βC3(G′) + 1. Each △-connected component of each induced subgraph of both graphs G and G′ is a belt. So, by Lemmas 1 and 4, each induced subgraph of G and G′ is a triangle-König graph. Thus, G and G′ either belong or do not belong to the class K(C3) simultaneously.

We call a sunflower a crowded ring such that each component of its block-vector equals 1. We denote by Sunflowern a sunflower of n triangles. It is easy to see that the block-vector of Sunflowern consists of n ones.
Fig. 3 The graph Sunflower5
For example, the block-vector of the graph Sunflower5 (Fig. 3) is (1, 1, 1, 1, 1). Let G be a crowded ring with a block-vector (j1, j2, ..., jm), and let j′i be the remainder of dividing ji by 3, for each i from 1 to m. We say that the sunflower-factor of G equals 0 if at least one of j′1, j′2, ..., j′m equals 0, and equals Σ_{i=1}^{m} j′i otherwise. Note that there are only three crowded rings of 5 vertices up to isomorphism: R5, R5(0), and R5(0, 3). Each of them contains K4 as a subgraph. So none of the rings of 5 vertices is a minimal forbidden induced subgraph for the class K(C3).

Theorem 2 Let G be a ring of 6 or more vertices which is not full. G is a triangle-König graph if its sunflower-factor is even. Otherwise, G is a minimal forbidden induced subgraph for K(C3).

Proof Each △-connected component of each induced subgraph of the graph G is a belt. So, by Lemmas 1 and 4, each induced subgraph of G is a triangle-König graph. Let (j1, j2, ..., jm) be the block-vector of G and f be its sunflower-factor, and let j′i be the remainder of dividing ji by 3 for each i from 1 to m. Suppose f = 0. Then there exists l from 1 to m such that j′l = 0. Consider a ring G′ with the block-vector (j1, j2, ..., jl−1, j′l, jl+1, ..., jm). By Lemma 6, G is a triangle-König graph iff G′ is a triangle-König graph. But the ring G′ is not crowded. Therefore, G′ is △-isomorphic to a belt and, by Lemma 4, is a triangle-König graph. Thus, if f equals 0, then G is a triangle-König graph. Suppose now that j′i ∈ {1, 2} for each i from 1 to m. Then f = Σ_{i=1}^{m} j′i. Consider four cases.
On Forbidden Induced Subgraphs for the Class of Triangle-König Graphs
1. m = 1 and f = j′1 = 1. Consider the ring R5(0) with the block-vector (4). By Lemma 6, G is a triangle-König graph iff R5(0) is a triangle-König graph. But R5(0) ∉ K(C3). So G does not belong to K(C3).
2. m = 1 and f = j′1 = 2. Consider the ring R6(0) with the block-vector (5). M = {1, 4} is a maximum triangle packing and C = {0, 3} is a minimum triangle vertex cover of R6(0). So μC3(R6(0)) = βC3(R6(0)) = 2 and R6(0) belongs to K(C3). Thus, G belongs to K(C3).
3. m = 2, f = 2, and j′1 = j′2 = 1. Consider the ring R7(0, 5) with the block-vector (1, 4). By Lemma 6, G is a triangle-König graph iff R7(0, 5) is a triangle-König graph. M = {2, 6} is a maximum triangle packing and C = {0, 3} is a minimum triangle vertex cover of R7(0, 5). So μC3(R7(0, 5)) = βC3(R7(0, 5)) = 2 and R7(0, 5) belongs to K(C3). Thus, G belongs to K(C3).
4. f ≥ 3. Consider a ring G′ with the block-vector (j′1, j′2, ..., j′m). By Lemma 6, G is a triangle-König graph iff G′ is a triangle-König graph. Replace each j′i = 2 in the block-vector (j′1, j′2, ..., j′m) with a pair of ones. The obtained block-vector corresponds to Sunflowerf. By Lemmas 5 and 6, G is a triangle-König graph iff Sunflowerf is a triangle-König graph. One can see that the outer edges of Sunflowerf form a chordless cycle Cf. A set of triangles is a triangle packing of Sunflowerf iff the set of their outer edges is a matching of this cycle. A set of vertices is a triangle vertex cover of Sunflowerf iff it is a vertex cover of this cycle. So, by the König–Egerváry theorem, μC3(Sunflowerf) = βC3(Sunflowerf) = f/2 iff Cf is a bipartite graph, i.e., iff f is even.

Thus, G is a triangle-König graph if f is even, and G is a minimal forbidden induced subgraph for K(C3) if f is odd.

Corollary 3 A graph G is a minimal forbidden induced subgraph for K(C3) if G is a ring of 6 or more vertices which is not full and whose sunflower-factor is odd.
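The parity criterion for sunflowers can be verified by brute force on small instances. A minimal sketch, assuming Sunflower_f is realized as the cycle on 2f vertices 0, ..., 2f − 1 with the outer edges (2k, 2k + 2 mod 2f) joining the even vertices, so that its f triangles are (2k, 2k + 1, 2k + 2); this concrete embedding is an assumption made for illustration:

```python
from itertools import combinations

def sunflower(f):
    """Sunflower_f (assumed construction): cycle on 2f vertices plus
    outer edges (2k, 2k+2) on the even-numbered vertices."""
    n = 2 * f
    edges = [(i, (i + 1) % n) for i in range(n)]
    edges += [(2 * k, (2 * k + 2) % n) for k in range(f)]
    return n, edges

def triangles(n, edges):
    E = set(map(frozenset, edges))
    return [t for t in combinations(range(n), 3)
            if all(frozenset(p) in E for p in combinations(t, 2))]

def mu(tris):
    """Maximum vertex-disjoint triangle packing size (brute force)."""
    for k in range(len(tris), 0, -1):
        for pack in combinations(tris, k):
            vs = [v for t in pack for v in t]
            if len(vs) == len(set(vs)):
                return k
    return 0

def beta(n, tris):
    """Minimum triangle vertex cover size (brute force)."""
    for k in range(n + 1):
        if any(all(set(c) & set(t) for t in tris)
               for c in combinations(range(n), k)):
            return k

for f in range(3, 7):
    n, edges = sunflower(f)
    tris = triangles(n, edges)
    # triangle-König exactly when f is even, as in Theorem 2
    assert (mu(tris) == beta(n, tris)) == (f % 2 == 0)
```

Note that for f = 3 the outer cycle C3 is itself a triangle, so the brute force sees an extra triangle there; the parity conclusion still holds.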
6 Common Theorem

The common theorem is a corollary of Lemmas 1 and 4 and Theorems 1 and 2.

Theorem 3 A graph G is a triangle-König graph if each of its △-connected components is a belt, or a full ring Rn, where n is divisible by 3, or a crowded ring with an even sunflower-factor.

Acknowledgements This work was supported by the Russian Science Foundation, Grant No. 17-11-01336.
References

1. Abdelsadek, Y., Herrmann, F., Kacem, I., Otjacques, B.: Branch-and-bound algorithm for the maximum triangle packing problem. Comput. Ind. Eng. 81, 147–157 (2015)
2. Alekseev, V.E., Mokeev, D.B.: König graphs with respect to 3-paths. Diskretnyi Analiz i Issledovanie Operatsii 19, 3–14 (2012)
3. Alekseev, V.E., Mokeev, D.B.: König graphs for 3-paths and 3-cycles. Discrete Appl. Math. 204, 1–5 (2016)
4. Bafna, V., Pevzner, P.A.: Genome rearrangements and sorting by reversals. SIAM J. Comput. 25, 272–289 (1996)
5. Caprara, A., Rizzi, R.: Packing triangles in bounded degree graphs. Inform. Process. Lett. 84(4), 175–180 (2002)
6. Cornuéjols, G.: Combinatorial Optimization: Packing and Covering. Society for Industrial and Applied Mathematics, Philadelphia (2001)
7. Ding, G., Xu, Z., Zang, W.: Packing cycles in graphs II. J. Comb. Theory Ser. B 87, 244–253 (2003)
8. Edmonds, J.: Paths, trees, and flowers. Canadian J. Math. 17(3–4), 449–467 (1965)
9. Grötschel, M., Lovász, L., Schrijver, A.: Geometric Algorithms and Combinatorial Optimization. Springer, Heidelberg (1993)
10. Guruswami, V., Rangan, C.P., Chang, M.S., Chang, G.J., Wong, C.K.: The Kr-packing problem. Computing 66(1), 79–89 (2001)
11. Hassin, R., Rubinstein, S.: An approximation algorithm for maximum triangle packing. Discrete Appl. Math. 154(6), 971–979 (2006)
12. Hell, P.: Graph packing. Electron. Notes Discrete Math. 5, 170–173 (2000)
13. Kirkpatrick, D.G., Hell, P.: On the completeness of a generalized matching problem. In: Proceedings of the Tenth Annual ACM Symposium on Theory of Computing (San Diego, May 1–3), pp. 240–245. ACM, New York (1978)
14. Kloks, T., Poon, S.H.: Some results on triangle partitions. arXiv preprint (2011). https://arxiv.org/abs/1104.3919v1. Cited 25 Aug 2017
15. Korpelainen, N., Lozin, V.V., Malyshev, D., Tiskin, A.: Boundary properties of graphs for algorithmic graph problems. Theor. Comput. Sci. 412, 3545–3554 (2011)
16. Malyshev, D.: On the infinity of the set of boundary classes for the 3-edge-colorability problem. J. Appl. Ind. Math. 4(2), 213–217 (2010)
17. Malyshev, D.: On the number of boundary classes in the 3-colouring problem. Discrete Math. Appl. 19(6), 625–630 (2010)
18. Malyshev, D.S.: The impact of the growth rate of the packing number of graphs on the computational complexity of the independent set problem. Discrete Math. Appl. 23(3–4), 245–249 (2013)
19. Malyshev, D.: A study of the boundary graph classes for colorability problems. J. Appl. Ind. Math. 7(2), 221–228 (2013)
20. Malyshev, D.: Classes of graphs critical for the edge list-ranking problem. J. Appl. Ind. Math. 8(2), 245–255 (2014)
21. Malyshev, D., Pardalos, P.M.: Critical hereditary graph classes: a survey. Optim. Lett. 10(8), 1593–1612 (2016)
22. Manić, G., Wakabayashi, Y.: Packing triangles in low degree graphs and indifference graphs. Discrete Math. 308(8), 1455–1471 (2008)
23. Mokeev, D.B.: König graphs for 4-paths. In: Models, Algorithms and Technologies for Network Analysis. Springer Proceedings in Mathematics & Statistics, vol. 104, pp. 93–103. Springer, Heidelberg (2014)
24. Yuster, R.: Combinatorial and computational aspects of graph packing and graph decomposition. Comput. Sci. Rev. 1, 12–26 (2007)
The Global Search Theory Approach to the Bilevel Pricing Problem in Telecommunication Networks Andrei V. Orlov
Abstract In this paper, we develop new methods of local and global search for finding optimistic solutions to the hierarchical problem of optimal pricing in telecommunication networks. These methods are based on the fact that a bilevel optimization problem can be equivalently represented as a nonconvex optimization problem (with the help of the Karush–Kuhn–Tucker conditions and the penalty approach). To solve the resulting nonconvex problem, we apply the Global Search Theory (GST). Computational testing of the developed methods on a test problem demonstrated the workability and efficiency of the proposed approach.

Keywords Bilevel pricing problem · Telecommunication networks · KKT-approach · Penalty approach · Global Search Theory · Numerical experiment
1 Introduction

Research on economic systems, which, as a rule, possess a hierarchical structure, often leads to conflicting statements of optimization problems. This can be explained by the fact that certain elements of a system have their own goals that generally do not coincide with the goal of the system as a whole. Analysis of such systems does not fit into regular optimization theory, because the interaction between the elements of the system makes the concept of optimality more complex. At present, the most popular tool for modeling problems that appear in hierarchical systems is bilevel programming (optimization) [8], which makes it possible to write down and simultaneously take into account the interests of two adjacent levels of such a system.
A. V. Orlov (B) Matrosov Institute for System Dynamics and Control Theory of SB RAS, Lermontov St. 134, Irkutsk 664033, Russia e-mail:
[email protected] © Springer International Publishing AG, part of Springer Nature 2018 V. A. Kalyagin et al. (eds.), Computational Aspects and Applications in Large-Scale Networks, Springer Proceedings in Mathematics & Statistics 247, https://doi.org/10.1007/978-3-319-96247-4_5
In addition, a hierarchy has been one of the most promising paradigms in mathematical programming in recent years [23]. The pioneering work on bilevel optimization [4] developed into the monograph on the class of mathematical programs with equilibrium constraints (MPECs) [17] and, somewhat later, into the classical monographs on the Bilevel Programming Problems (BPPs) [1, 8]. There exist a lot of applications of the MPECs and BPPs in control, economics, traffic, energy systems, telecommunication networks, etc. (see, e.g., [7, 9]). Investigation of the MPECs or the BPPs from the viewpoint of designing efficient numerical methods is an ongoing issue in the modern theory and methods of Mathematical Optimization [23]. The bilevel problems in the classical statement [1, 8] are optimization problems which, side by side with ordinary constraints, such as equalities and inequalities, include a constraint described as an optimization subproblem:

F(x, y) ↓ min_{x,y},  x ∈ D,  y ∈ Y∗(x),
Y∗(x) = Argmin_y {G(x, y) | (x, y) ∈ D1}.    (BP_g)
Note that we consider an optimistic formulation of the bilevel problem (the goal of the upper level can be adjusted to the actions of the lower level) [8]. Therefore, the goal function F is to be minimized with respect to x and y simultaneously. It seems that direct investigation of the problem (BP_g) in order to develop solution methods is quite challenging to implement. Therefore, we suggest investigating special bilevel classes [13, 22, 30, 31] or bilevel applications. In particular, tarification problems, which are popular in decision making, naturally result in bilevel programming formulations with a specific structure (see, e.g., [7]). Several methods have been proposed in the literature for problems of this kind [5, 6, 10, 15, 16, 35]. Nevertheless, the development of new numerical methods that can handle high-dimensional problems remains an important issue. In this work, we propose new methods of local and global search for the hierarchical problem of optimal pricing in telecommunication networks [33]. These methods are based on the following. First, we use the fact that a bilevel optimization problem (in a global sense) can be equivalently represented as a nonconvex optimization problem [1, 8] (with the help of the Karush–Kuhn–Tucker conditions and the penalty approach). Second, we apply the special Global Search Theory (GST) developed by A.S. Strekalovsky to solving the resulting nonconvex problem [24–28, 32]. The GST allows the construction of efficient numerical methods for several classes of one-level and bilevel problems with nonconvex structures (of dimension up to 1000) [13, 19, 20, 27, 28, 30, 31]. In contrast to mainstream approaches, such as branch-and-bound algorithms, cutting methods, outer and inner approximations, the covering method, etc., the GST builds on achievements of modern convex optimization [2, 3, 18] and extensively uses effective numerical methods for solving convex problems when designing global search algorithms.
Therefore, the GST will be useful for the bilevel problem of optimal pricing in telecommunication networks.
2 Problem Statement and Reduction

Consider a telecommunication traffic network G = (V, U, A) consisting of a set of nodes V (|V| = m) and a set of arcs U (|U| = n) with the incidence matrix A. The nodes represent origin, destination, or transit stations for telecommunication traffic, and the arcs represent transmission lines. Let the upper level player be the leading telecommunication operator, and the lower level player be the client that needs to communicate between the nodes to transfer information. The arc set of the network is split into two subsets, U1 (toll arcs) and U2, which are the sets of links operated by the leading telecommunication operator and by the competing operators, respectively. Let |U1| = p, |U2| = q (p + q = n). Let c¹ᵢ be the fixed part of the cost of transmitting an information unit on the corresponding arc (for each arc of the graph G belonging to U1), and let xᵢ be an additional fee for the transfer, determined by the upper level player. So, the total cost of transmitting an information unit on an arbitrary arc belonging to U1 is c¹ᵢ + xᵢ, i = 1, 2, ..., p. Thus, x = (x₁, x₂, ..., x_p) is the upper level variable. The set X = {x ∈ IR^p | x̲ᵢ ≤ xᵢ ≤ x̄ᵢ, i = 1, 2, ..., p} defines the bounds of tariff changes. Also, let c²ᵢ, i = 1, 2, ..., q, be the fixed part of the cost of transmitting an information unit on the arcs belonging to U2. Further, assume that the client (the player of the lower level) wants to transfer K datasets from the node s^k to the node r^k. Each dataset has the volume δ^k, k = 1, 2, ..., K. In that case, the lower level variable is determined by the vector
y = (y¹, ..., y^K), where y^k = (y^{k1}, y^{k2}) ∈ IR^{p+q} = IR^n, k = 1, 2, ..., K. Each component of the vector y^k = (y^{k1}, y^{k2}) is the volume of traffic passing through the corresponding arc: y^{k1} = (y₁^{k1}, y₂^{k1}, ..., y_p^{k1}) is the traffic flow on the arcs belonging to U1, and y^{k2} = (y₁^{k2}, y₂^{k2}, ..., y_q^{k2}) is the traffic flow on the arcs belonging to U2. In addition, the bandwidth constraints imposed on each arc have the form

0 ≤ y_j^k ≤ ȳ_j^k,  j = 1, 2, ..., n,  k = 1, 2, ..., K.

Also, we should introduce the following arc flow constraints:

Ay^k = δ^k d^k,  k = 1, 2, ..., K,
where d^k ∈ IR^m is the vector with all zero components except d^k_{s^k} = +1 and d^k_{r^k} = −1. The latter constraint ensures that the total outgoing traffic at each node of the network is equal to the total incoming traffic (it is known as Kirchhoff's first law). The goal of the upper level is to maximize the revenue, and the goal of the lower level is to minimize the cost of the data transmission. So, we can formulate the following bilevel optimization problem:
Σ_{k=1}^{K} ⟨x, y^{k1}⟩ ↑ max_{x,y},  x ∈ X,

y ∈ Y∗(x) = Argmin_y { Σ_{k=1}^{K} (⟨c¹ + x, y^{k1}⟩ + ⟨c², y^{k2}⟩) | Ay^k = δ^k d^k,  0 ≤ y_j^k ≤ ȳ_j^k,  j = 1, 2, ..., n,  k = 1, 2, ..., K }.    (BP)
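The flow-conservation constraint Ay^k = δ^k d^k can be checked mechanically once the incidence matrix is written down. A minimal sketch; the 4-node network, the sign convention for A, and the flow values are invented for illustration:

```python
# Toy network: nodes 0..3, arcs given as (tail, head) pairs. The incidence
# matrix A gets +1 at the tail and -1 at the head of each arc (one common
# convention; the paper does not spell out its sign convention).
arcs = [(0, 1), (1, 2), (0, 2), (2, 3)]
m, n = 4, len(arcs)

A = [[0] * n for _ in range(m)]
for j, (tail, head) in enumerate(arcs):
    A[tail][j] = 1
    A[head][j] = -1

# One unit of traffic from node s = 0 to node r = 3, split over two routes.
delta, s, r = 1.0, 0, 3
d = [0.0] * m
d[s], d[r] = +1.0, -1.0

y = [0.4, 0.4, 0.6, 1.0]   # flows on arcs 0-1, 1-2, 0-2, 2-3

# A @ y must equal delta * d: net outflow +1 at the source, -1 at the sink,
# and 0 at every transit node (Kirchhoff's first law).
Ay = [sum(A[i][j] * y[j] for j in range(n)) for i in range(m)]
assert all(abs(Ay[i] - delta * d[i]) < 1e-9 for i in range(m))
```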
Note that we obtain a special bilinear–bilinear problem of bilevel optimization. It is worth mentioning that optimization at the upper level in the problem (BP) is carried out simultaneously with respect to the two variables, which means that the problem is formulated in the optimistic statement [8]. It seems that in this case we only have to search for an optimistic solution to the problem (BP), because, for example, the data transmission contract between the upper and the lower level might take into account just the upper level goal. Given the separable structure of the problem, for the sake of simplicity, let K = 1 without loss of generality. We obtain the following bilevel problem:

F(x, y) = ⟨x, y¹⟩ ↑ max_{x,y},  x ∈ X,
y ∈ Y∗(x) = Argmin_y {⟨c¹ + x, y¹⟩ + ⟨c², y²⟩ | Ay = δd, 0 ≤ y ≤ ȳ},    (P)
where x ∈ X = {x ∈ IR^p | x̲ᵢ ≤ xᵢ ≤ x̄ᵢ, i = 1, 2, ..., p}, y = (y¹, y²) ∈ IR^n, c¹ ∈ IR^p, c² ∈ IR^q, A is the (m × n)-matrix, d ∈ IR^m, δ ∈ IR, and c¹ ≥ 0, c² ≥ 0. To reduce the problem (P) to a one-level problem, it is common to apply a classic approach, which involves replacement of the lower level problem by its optimality conditions [8, 13, 22, 29–31]. In our case, since the lower level problem at a fixed upper level variable is a linear programming problem, we use the corresponding duality theory (see, for example, [2]). Once all constraints are transformed to a standard inequality form, we can formulate the following one-level problem, which is equivalent to the problem under study from the viewpoint of searching for a global solution:

⟨x, y¹⟩ ↑ max_{x,y,v},
⟨v, b1 − B1 y⟩ = 0,
x̲ ≤ x ≤ x̄,  v ≥ 0,  B1 y ≤ b1,
(c¹, c²)^T + x^T Q + v B1 = 0,    (DC_C)

where the variable v is the vector of the Lagrange multipliers, v ∈ IR^{2n+2m}, B1 = (A, −A, E_n, −E_n)^T, b1 = (δd, −δd, ȳ, 0_n)^T, Q = (E_p, 0_{n−p}). Therefore, instead of the bilevel problem (P), we can solve the one-level problem (DC_C) with a bilinear goal function and a bilinear constraint. As is well known [19, 20, 28], a bilinear function is nonconvex and can be represented as a difference of two convex functions (i.e., it belongs to the class of d.c. functions) [34]. Optimization problems with such nonconvex functions can be effectively addressed by the Global Search Theory [24, 32]. However, this situation is complicated by the
fact that nonconvexity of the problem (DC_C) is found in the constraints as well as in the goal function, which means that this problem has a second degree of nonconvexity [24]. To reduce it to a problem that has nonconvexity only in the goal function, we apply the penalty method [2, 3]. Introduce a σ-parametric family of problems:

Φ(x, y, v) = ⟨x, y¹⟩ − σ⟨v, b1 − B1 y⟩ ↑ max_{x,y,v},
(x, v) ∈ U = {(x, v) | x̲ ≤ x ≤ x̄, v ≥ 0, (c¹, c²)^T + x^T Q + v B1 = 0},
y ∈ Y = {y | B1 y ≤ b1},    (DC(σ))

where σ is the penalty parameter, σ > 0. According to the constraints of this problem, it is obvious that

⟨v, b1 − B1 y⟩ ≥ 0  ∀(x, y, v) ∈ U × Y =: D.    (1)

Further, consider the connection between the solutions to (DC(σ)) and (DC_C). Let (x(σ), y(σ), v(σ)) be a solution to (DC(σ)) at some value of σ. Then the following statement holds.

Proposition 1 (i) Let, for some σ = σ̂, the solution (x(σ̂), y(σ̂), v(σ̂)) to the problem (DC(σ̂)) satisfy the equality ⟨v(σ̂), b1 − B1 y(σ̂)⟩ = 0. Then the triple (x(σ̂), y(σ̂), v(σ̂)) is a solution to (DC_C). (ii) For all values of the parameter σ > σ̂, it is true that ⟨v(σ), b1 − B1 y(σ)⟩ = 0 and (x(σ), y(σ), v(σ)) is a solution to (DC_C).

The proof of this statement is carried out by means of the inequality (1), using standard results on the penalty function method (see, for example, [2, 3]). Therefore, once ⟨v(σ), b1 − B1 y(σ)⟩ = 0, the solution to the problem (DC(σ)) is a solution to (DC_C), and this remains the case as the value of σ grows. Hence, to solve the problem (P), we can use a directed enumeration with respect to σ, which involves the search for the (global) solution in a series of problems (DC(σ)) corresponding to increasing values of the parameter σ [29–31]. Moreover, it can be shown that there exists a finite value of σ with ⟨v(σ), b1 − B1 y(σ)⟩ = 0. The latter problem is a problem of d.c. maximization over a convex feasible set [24, 28]. It is known that such nonconvex problems may have a large number of local solutions located quite far from a global one, even from the viewpoint of the goal function value [11, 14, 24, 34]. Direct application of standard convex optimization methods [2, 3, 18] turns out to be inefficient when it comes to the global search. So, it is required to construct new global search methods that allow us to escape from a stationary (critical) point. To solve the problem formulated above, we construct algorithms based on the Global Search Theory (GST) in d.c. programming problems, which was developed in [24, 32].
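The penalty scheme behind Proposition 1, solving the penalized problem for increasing σ until the complementarity-type term vanishes, can be illustrated on a toy one-dimensional problem. Everything below (the objective, the constraint, and the grid search standing in for a real solver) is invented for illustration; in the actual method each (DC(σ)) is attacked with the local and global search of Sects. 3 and 4:

```python
# Toy problem: maximize f(x) = x on [0, 2] subject to c(x) = max(0, x - 1) = 0,
# i.e. the constraint x <= 1 written as an equality-type penalty term.
def solve_penalized(sigma, grid):
    # "solver" for the penalized problem: maximize f(x) - sigma * c(x) on a grid
    return max(grid, key=lambda x: x - sigma * max(0.0, x - 1.0))

grid = [i / 1000 for i in range(2001)]      # the box [0, 2]
sigma = 0.5
while True:
    x = solve_penalized(sigma, grid)
    if max(0.0, x - 1.0) < 1e-9:            # penalty term vanished: done
        break
    sigma *= 2                               # directed enumeration over sigma

assert abs(x - 1.0) < 1e-6                   # constrained maximizer recovered
```

For small σ the penalized maximizer violates the constraint (it sits at x = 2); once σ exceeds a finite threshold, the solution of the penalized problem solves the constrained one, mirroring part (ii) of Proposition 1.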
Global Search Algorithms based on the GST consist of two principal stages: (1) a special local search method that takes into account the structure of the problem under scrutiny [12, 19, 20, 24, 27, 29]; (2) procedures, based on the Global Optimality Conditions (GOCs), which allow us to improve the point provided by the local search method [19–22, 24–28].
3 Special Local Search Method It is noteworthy that nonconvexity in the problem (DC (σ )) is generated by bilinear components only. On the basis of this fact, we suggest to perform the local search in the problem (DC (σ )) using the idea of the successive solution by different groups of variables. Earlier this idea was successfully applied in solving bimatrix games [19, 28], problems of bilinear programming [20, 28], and quadratic-linear bilevel problems [31]. Here we single out a pair (x, v) and a variable y, and then the problem under study becomes a linear programming problem at the fixed pair (x, v) as well as at the fixed variable y. These new problems can be easily solved. Therefore, we arrive at a specialized local search method. Let there be given a starting point (x0 , y0 , v0 ) ∈ D. Y -procedure Step 0. Set s := 1, y s = ((y 1 )s , (y 2 )s ) := y0 . ρs -solution Step 1. Using a suitable linear programming method, find the 2 s+1 s+1 (x , v ) to the problem (y 1 )s , x − σ (b1 − B1 y s ), v ↑ max,
x,v
(x, v) ∈ U = {(x, v) | x ≤ x ≤ x, v ≥ 0, (c1 , c2 )T + x T Q + v B1 = 0}, (L P xv (y s )) such that the following inequality holds: ⎫ ρs ⎬ ≥ 2 1 s s ≥ sup{(y ) , x − σ (b1 − B1 y ), v | (x, v) ∈ U }. ⎭ (y 1 )s , x s+1 − σ (b1 − B1 y s ), vs+1 +
(2)
x, v
Step 2. Find the
ρs -solution y s+1 to the following LP problem: 2
x s+1 , y 1 + σ (vs+1 )T B1 , y ↑ max, y
such that
y ∈ Y = {y | B1 y ≤ b1 },
(L P y (x s+1 , vs+1 )) ρs ⎫ ≥⎬ x s+1 , (y 1 )s+1 + σ (vs+1 )T B1 , y s+1 + 2 (3) s+1 1 s+1 T ≥ sup{x , y + σ (v ) B1 , y | y ∈ Y }. ⎭ y
Step 3. Set s := s + 1 and go to Step 1.

Note that to launch the Y-procedure we do not need the whole point (x0, y0, v0); it is sufficient to know just y0, which is why the procedure was given this name. The following convergence theory is valid for the Y-procedure.

Theorem 1 (i) Given ρs > 0, s = 0, 1, 2, ..., with Σ_{s=0}^{∞} ρs < +∞, the numerical sequence of the values of the function Φs = Φ(x^s, y^s, v^s) generated by the Y-procedure converges. (ii) Any accumulation point (x̂, ŷ, v̂) of the sequence {(x^s, y^s, v^s)} satisfies the following inequalities:

Φ(x̂, ŷ, v̂) ≥ Φ(x, ŷ, v)  ∀(x, v) ∈ U,    (4)
Φ(x̂, ŷ, v̂) ≥ Φ(x̂, y, v̂)  ∀y ∈ Y.    (5)

Proof (i) Denote Φ̄s = Φ(x^{s+1}, y^s, v^{s+1}). Then, taking into account (2) and (3), the following chain of inequalities is valid:

Φs ≤ sup_{x,v} {⟨(y¹)^s, x⟩ − σ⟨b1 − B1 y^s, v⟩ | (x, v) ∈ U} ≤ Φ̄s + ρs/2 ≤
≤ sup_y {⟨x^{s+1}, y¹⟩ + σ⟨(v^{s+1})^T B1, y⟩ | y ∈ Y} − σ⟨b1, v^{s+1}⟩ + ρs/2 ≤ Φ_{s+1} + ρs.    (6)

Whence it follows that Φ_{s+1} ≥ Φs − ρs, i.e., the numerical sequence {Φs}, s = 0, 1, 2, ..., is almost monotonically nondecreasing. Besides, this sequence is bounded from above, since the function Φ(·) is bounded from above on D (because the function F(·) is bounded from above) and by construction (x^s, y^s, v^s) ∈ D for s = 0, 1, 2, .... Then, taking into account the condition imposed on ρs, we conclude that the sequence {Φs} converges (see, for example, Lemma 2.6.3 from [36] or Lemma 1 from [31]).

(ii) Let (x^s, y^s, v^s) → (x̂, ŷ, v̂). According to Step 1 of the Y-procedure, the following inequality holds:

Φ(x^{s+1}, y^s, v^{s+1}) + ρs/2 ≥ Φ(x, y^s, v)  ∀(x, v) ∈ U.

Passing to the limit as s → +∞ (here ρs ↓ 0) and taking into account that Φ(x, y, v) is continuous, we obtain the inequality (4). Similarly, according to Step 2 of the Y-procedure, the following inequality is valid:

Φ(x^{s+1}, y^{s+1}, v^{s+1}) + ρs/2 ≥ Φ(x^{s+1}, y, v^{s+1})  ∀y ∈ Y.

Passing to the limit, as above, we obtain the inequality (5).
Definition 1 The triple (x̂, ŷ, v̂) satisfying (4) and (5) is said to be a critical point of the problem (DC(σ)). If for some point the inequalities (4) and (5) are valid with a certain accuracy, then the point is said to be approximately critical.

Further, it can be shown that if at iteration s of the local search method the following inequality holds:

Φ_{s+1} − Φ̄s ≤ τ,    (7)

then it follows from (6) that

Φ(x^{s+1}, y^s, v^{s+1}) ≥ sup_{x,v} {Φ(x, y^s, v) | (x, v) ∈ U} − (τ + ρs),    (8)

which means that (x^{s+1}, y^s, v^{s+1}) is a partially global (τ + ρs)-solution to (DC(σ)) with respect to (x, v). On the other hand, taking (7) into account, it follows from (6) that

Φ(x^{s+1}, y^s, v^{s+1}) ≥ sup_y {Φ(x^{s+1}, y, v^{s+1}) | y ∈ Y} − (τ + ρs/2).    (9)

It may be concluded from (8) and (9) that the inequality (7) can be used as a stopping criterion for the Y-procedure. In this case, the triple (x^{s+1}, y^s, v^{s+1}) will be an approximately critical point of the problem (DC(σ)). Note that it is possible to suggest other stopping criteria for the proposed local search procedure that yield a different accuracy in the inequalities from the definition of the critical point (see also [28, 29]). To perform the local search in the problem (DC(σ)), we can also propose another implementation, "symmetric" to the one described above: the auxiliary problems are solved in a different order, and, to launch the algorithm from the starting point (x0, y0, v0), we require not the component y0 but the pair (x0, v0) (the XV-procedure) (see [28, 29]).
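The alternating scheme of the Y-procedure — fix one group of variables, solve a linear problem in the other, and repeat — can be sketched on a toy bilinear problem over boxes, where each partial "LP" has a closed-form corner solution. The objective matrix, box constraints, and starting point below are invented for illustration; in the actual method each step is a genuine LP handled by a standard solver:

```python
# Toy bilinear objective Phi(x, y) = x^T A y over unit boxes. For fixed y,
# the best x sits at a corner determined by the signs of A @ y, and
# symmetrically for y: each half-step solves its subproblem exactly.
A = [[1.0, -2.0, 0.5],
     [-1.0, 3.0, -0.5]]

def best_x(y):   # argmax over x in [0,1]^2 at fixed y
    return [1.0 if sum(A[i][j] * y[j] for j in range(3)) > 0 else 0.0
            for i in range(2)]

def best_y(x):   # argmax over y in [0,1]^3 at fixed x
    return [1.0 if sum(A[i][j] * x[i] for i in range(2)) > 0 else 0.0
            for j in range(3)]

def phi(x, y):
    return sum(A[i][j] * x[i] * y[j] for i in range(2) for j in range(3))

x, y = [0.0, 0.0], [1.0, 1.0, 1.0]   # as in the Y-procedure, only y matters
values = [phi(x, y)]
for _ in range(20):                   # alternate until the value stabilizes
    x = best_x(y)
    y = best_y(x)
    values.append(phi(x, y))
    if values[-1] - values[-2] <= 0:
        break

assert values == sorted(values)       # nondecreasing values, as in Theorem 1
```

The monotone improvement of `values` is exactly the mechanism behind part (i) of Theorem 1; like the Y-procedure itself, the scheme only guarantees a (partially global) critical point, not a global solution.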
4 Global Search

In accordance with the Global Search Theory [24, 32], the second stage of finding global solutions to nonconvex problems involves the construction of a procedure for escaping from critical points based on the Global Optimality Conditions (GOCs) [24, 32]. To this end, we have to rewrite the goal function as a difference of two convex functions, since, as observed above, the problem under study belongs to the class of d.c. maximization problems. Using a known property of the scalar product, we can obtain, for example, the following representation of the goal function of (DC(σ)) as a difference of two convex functions: Φ(x, y, v) = h(x, y, v) − g(x, y, v),
where

h(x, y, v) = (1/4)‖x + y1‖² + (σ/4)‖B1 y + v‖²,
g(x, y, v) = (1/4)‖x − y1‖² + (σ/4)‖B1 y − v‖² + σ⟨b1, v⟩

are convex functions.

The necessary Global Optimality Conditions by A.S. Strekalovsky that constitute the basis of the global search procedure have the following form in terms of the problem (DC(σ)).

Theorem 2 [24, 31, 32] If the feasible point (x∗, y∗, v∗) is a global solution to (DC(σ)), then ∀(z, u, w, γ):

h(z, u, w) = γ + ζ, ζ := Φ(x∗, y∗, v∗),    (10)

g(z, u, w) ≤ γ ≤ sup(g, D),    (11)

g(x, y, v) − γ ≥ ⟨∇h(z, u, w), (x, y, v) − (z, u, w)⟩  ∀(x, y, v) ∈ D.    (12)
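The "known property of the scalar product" underlying the d.c. representation above is the polarization identity ⟨a, b⟩ = (1/4)‖a + b‖² − (1/4)‖a − b‖², which splits each bilinear term into a difference of convex quadratics. A quick numerical sanity check of the identity in pure Python (toy vectors chosen for illustration only):

```python
# Sanity check of the polarization identity <a, b> = (1/4)||a+b||^2 - (1/4)||a-b||^2,
# which makes Phi = h - g with both h and g convex.
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sq_norm(v):
    return dot(v, v)

def polarization(a, b):
    s = [x + y for x, y in zip(a, b)]  # a + b
    d = [x - y for x, y in zip(a, b)]  # a - b
    return 0.25 * sq_norm(s) - 0.25 * sq_norm(d)

a, b = [1.0, -2.0, 3.0], [0.5, 4.0, -1.0]
assert abs(dot(a, b) - polarization(a, b)) < 1e-12
```

Applying the identity to ⟨x, y1⟩ and to σ⟨B1 y, v⟩ and collecting the convex terms yields exactly the functions h and g given above.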
The conditions (10)–(12) possess the so-called algorithmic (constructive) property: if the GOCs are violated, we can construct a feasible point that is better than the point in question [24, 32]. Indeed, suppose that for some (z̃, ũ, w̃, γ̃) satisfying (10) on some level ζ := ζk := Φ(x^k, y^k, v^k), the inequality (12) is violated at a feasible point (x̃, ỹ, ṽ) ∈ D:

g(x̃, ỹ, ṽ) < γ̃ + ⟨∇h(z̃, ũ, w̃), (x̃, ỹ, ṽ) − (z̃, ũ, w̃)⟩.

Then it follows from the convexity of h(·) that

Φ(x̃, ỹ, ṽ) = h(x̃, ỹ, ṽ) − g(x̃, ỹ, ṽ) > h(z̃, ũ, w̃) − γ̃ = ζ = Φ(x^k, y^k, v^k),
i.e., Φ(x̃, ỹ, ṽ) > Φ(x^k, y^k, v^k). Therefore, the point (x̃, ỹ, ṽ) ∈ D happens to be better than the point (x^k, y^k, v^k) with respect to the goal function value. Consequently, by varying the parameters (z, u, w, γ) in (10) for a fixed ζ = ζk and finding approximate solutions (x(z, u, w, γ), y(z, u, w, γ), v(z, u, w, γ)) to the linearized problems (see (12))

g(x, y, v) − ⟨∇h(z, u, w), (x, y, v)⟩ ↓ min over (x, y, v) ∈ D,    (PL(z, u, w))
we obtain a family of starting points to launch the local search procedure. Additionally, we do not need to sort through all (z, u, w, γ) at each level ζ, because it is sufficient to find a single 4-tuple (z̃, ũ, w̃, γ̃) at which the inequality (12) is violated. After that, we move to a new level: (x^{k+1}, y^{k+1}, v^{k+1}) := (x̃, ỹ, ṽ), ζ_{k+1} := Φ(x^{k+1}, y^{k+1}, v^{k+1}), and vary the parameters again. The above procedure forms the basis of the following global search algorithm (GSA) in the problem (DC(σ)), which takes into account the adopted d.c. representation.
A. V. Orlov
Let there be given a starting point (x0, y0, v0) ∈ D; numerical sequences {τk}, {δk} (τk, δk > 0, k = 0, 1, 2, …; τk ↓ 0, δk ↓ 0 as k → ∞); a set Dir = {(z̄^1, ū^1, w̄^1), …, (z̄^N, ū^N, w̄^N) ∈ IR^{m+n+q} | (z̄^i, ū^i, w̄^i) ≠ 0, i = 1, …, N}; the numbers γ− = inf(g, D) and γ+ = sup(g, D); and the algorithm's parameters M > 0 and η > 0.

Step 0. Set k := 0, (x̄^k, ȳ^k, v̄^k) := (x0, y0, v0), i := 1, γ := γ−, Δγ := (γ+ − γ−)/M.
Step 1. Proceeding from the point (x̄^k, ȳ^k, v̄^k), build a τk-critical point (x^k, y^k, v^k) ∈ D to the problem (DC(σ)) by the Y- or XV-procedure of the local search. Set ζk := Φ(x^k, y^k, v^k).
Step 2. Using (z̄^i, ū^i, w̄^i) ∈ Dir, construct a point (z^i, u^i, w^i) of the approximation Ak = {(z^1, u^1, w^1), …, (z^N, u^N, w^N) | h(z^i, u^i, w^i) = γ + ζk, i = 1, …, N} of the level surface U(ζk) = {(x, y, v) | h(x, y, v) = γ + ζk} of the convex function h(·).
Step 3. If g(z^i, u^i, w^i) > γ + ηγ, then set i := i + 1 and return to Step 2.
Step 4. Find a δk-solution (x̄^i, ȳ^i, v̄^i) of the linearized problem (PL(z^i, u^i, w^i)).
Step 5. Starting at the point (x̄^i, ȳ^i, v̄^i), build a τk-critical point (x̂^i, ŷ^i, v̂^i) ∈ D to the problem (DC(σ)) by means of the local search method.
Step 6. If Φ(x̂^i, ŷ^i, v̂^i) ≤ Φ(x^k, y^k, v^k) and i < N, then set i := i + 1 and return to Step 2.
Step 7. If Φ(x̂^i, ŷ^i, v̂^i) ≤ Φ(x^k, y^k, v^k), i = N, and γ < γ+, then set γ := γ + Δγ, i := 1, and go to Step 2.
Step 8. If Φ(x̂^i, ŷ^i, v̂^i) > Φ(x^k, y^k, v^k), then set γ := γ−, (x̄^{k+1}, ȳ^{k+1}, v̄^{k+1}) := (x̂^i, ŷ^i, v̂^i), k := k + 1, i := 1, and return to Step 1.
Step 9. If Φ(x̂^i, ŷ^i, v̂^i) ≤ Φ(x^k, y^k, v^k), i = N, and γ = γ+, then stop. The point (x^k, y^k, v^k) is the obtained solution to the problem.
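The logic of the scheme above can be illustrated on a toy problem. The following self-contained Python fragment is a hypothetical 1-D sketch, not the paper's MATLAB implementation: it maximizes Φ(x) = h(x) − g(x) with convex h(x) = 2x² and g(x) = x⁴ on D = [−2, 2], a dense grid stands in for the convex-subproblem solvers, the direction set is Dir = {+1, −1}, and the level-surface point is obtained analytically from h(λ·z̄) = γ + ζ. All names and numbers here are illustrative assumptions.

```python
import math

# Toy d.c. maximization instance (hypothetical): maximize Phi(x) = h(x) - g(x)
# on D = [-2, 2], h(x) = 2x^2, g(x) = x^4. Global maxima: x = +/-1, Phi = 1.
D = [i / 1000.0 for i in range(-2000, 2001)]  # dense grid standing in for D

def h(x): return 2.0 * x * x
def dh(x): return 4.0 * x          # derivative of h
def g(x): return x ** 4
def Phi(x): return h(x) - g(x)

def solve_linearized(z):
    # linearized problem (PL(z)): minimize g(x) - h'(z) * x over D
    return min(D, key=lambda x: g(x) - dh(z) * x)

def local_search(x, max_iters=100):
    # d.c. local search: iterate the linearized problem until Phi stabilizes
    for _ in range(max_iters):
        x_new = solve_linearized(x)
        if abs(Phi(x_new) - Phi(x)) < 1e-12:
            return x_new
        x = x_new
    return x

def global_search(x0, directions, M=2):
    x = local_search(x0)                       # Step 1: critical point
    gamma_max = max(g(t) for t in D)           # rough bound for sup(g, D)
    improved = True
    while improved:
        improved = False
        zeta = Phi(x)                          # current level zeta_k
        for k in range(1, M + 1):              # passive search over gamma
            gamma = k * gamma_max / M
            for zbar in directions:            # Step 2: point on h(z) = gamma + zeta
                lam = math.sqrt(max(gamma + zeta, 0.0) / 2.0)  # h(lam*zbar) = 2*lam^2
                cand = local_search(solve_linearized(lam * zbar))  # Steps 4-5
                if Phi(cand) > Phi(x) + 1e-9:  # Step 8: better point found
                    x, improved = cand, True
                    break
            if improved:
                break
    return x

best = global_search(0.0, directions=[1.0, -1.0])
# best is approximately +1 or -1, with Phi(best) approximately 1
```

Starting from the critical point x = 0 (where Φ = 0), the level-surface points let the procedure escape to the global maximizers x = ±1, exactly the escape mechanism Steps 2-8 formalize.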
It can be readily seen that this is not an algorithm in the conventional sense, because some of its steps are not fully specified. For example, we have not yet said how to construct a starting point and the set Dir, or how to solve the problem (PL(z^i, u^i, w^i)). These issues are covered below.
5 Implementation of the GSA First, to construct a feasible starting point, we used the projection of the chosen infeasible point (x^0, y^0, v^0) onto the feasible set D by solving the following quadratic programming problem:

(1/2)‖(x, y, v) − (x^0, y^0, v^0)‖² ↓ min over (x, y, v) ∈ D.    (PR(x^0, y^0, v^0))
The solution to (PR(x 0 , y 0 , v0 )) was taken as a starting point (x0 , y0 , v0 ) ∈ D.
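As an aside, when the constraints involved are simple bounds (like the tariff box 0.5 ≤ xi ≤ 2 used in the case study below), the projection has a closed form by componentwise clipping; for the general polyhedral set D of (DC(σ)) a QP solver is required. A minimal pure-Python sketch under this box assumption (the box itself is illustrative):

```python
# Projection of an infeasible point onto a box l <= p <= u: clip each coordinate.
# This is the special case of (PR(x0, y0, v0)) when D is a box; a general
# polyhedral D needs a quadratic programming routine instead.
def project_onto_box(point, lower, upper):
    return [min(max(p, lo), hi) for p, lo, hi in zip(point, lower, upper)]

p0 = [-1.0, 3.5, 0.2]                      # hypothetical infeasible point
lower, upper = [0.0] * 3, [2.0] * 3        # hypothetical box D
assert project_onto_box(p0, lower, upper) == [0.0, 2.0, 0.2]
```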
The value of the penalty parameter σ was chosen experimentally: σ = 20. The accuracy for the linear programming problems in the local search method is ρs = 10⁻⁷. The accuracy of the local search is τk = 10⁻⁵. The key feature of the above GSA consists in constructing an approximation of the level surface of the convex function h(·) that generates the basic nonconvexity in the problem under consideration. For the problem (DC(σ)), the approximation Ak = A(ζk) has been constructed with the help of a special set of directions Dir = {(e_l, e_j), l = 1, …, m + n, j = 1, …, q}, where e_l ∈ IR^{p+n} and e_j ∈ IR^{2n+2m} are the Euclidean basis vectors of the corresponding dimensions. According to our previous experience [19–21, 31], Dir is one of the standard direction sets for problems with a bilinear structure. Unfortunately, we cannot theoretically guarantee the global optimality of the point generated by the GSA with the set Dir; however, the numerical experiments yielded global solutions in most cases [19–21, 31]. We also employ the following technique to construct the approximation points of Ak = A(ζk) on the basis of the direction set: the triples (z^i, u^i, w^i) are found in the form (z^i, u^i, w^i) = λi(z̄^i, ū^i, w̄^i), i = 1, …, N, where λi ∈ IR are computed from the condition h(λi(z̄^i, ū^i, w̄^i)) = γ + ζk. In this case, the search for λi can be performed analytically (see also [19–21, 31]). Further, note that the selection of the algorithm parameters M and η can also be carried out on the basis of our previous experience [19–21, 31]. The parameter η is responsible for the accuracy of the inequality (11) from the GOCs (in order to minimize rounding errors) [19–21, 31] (Step 3). The parameter M is responsible for splitting the interval [γ−, γ+] into a suitable number of subintervals to realize a passive one-dimensional search along γ. Here, we use the following values: M = 2, η = 0.0.
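The analytic search for λi is possible because h(·) here is a positively homogeneous quadratic (a sum of squared norms with no linear term), so h(λ·z̄) = λ²·h(z̄). A minimal sketch of this computation (function and variable names are hypothetical):

```python
import math

# Level-surface scaling: for a quadratic h with h(lam * zbar) = lam**2 * h(zbar),
# the condition h(lam * zbar) = gamma + zeta gives lam = sqrt((gamma + zeta) / h(zbar)).
# h_val = h(zbar) is assumed positive, which holds since zbar != 0 in Dir.
def level_lambda(h_val, gamma, zeta):
    return math.sqrt((gamma + zeta) / h_val)

lam = level_lambda(h_val=4.0, gamma=6.0, zeta=10.0)
assert abs(lam - 2.0) < 1e-12  # check: 2**2 * 4 = 16 = 6 + 10
```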
To compute the segment [γ−, γ+] for the one-dimensional search according to the GOCs, we need to solve two problems: minimization and maximization of the convex quadratic function g(·). The minimization problem can be solved by any quadratic programming method and an appropriate software subroutine; the same is true for the linearized problem (PL(z^i, u^i, w^i)) at Step 4 (δk = 10⁻⁵). To tackle the maximization problem, we can employ a known global search strategy for convex maximization problems [24]. However, in this case, the computational process does not require exact knowledge of these bounds; it is sufficient to have comparatively rough estimates [24]. Therefore, here we use γ− := 0.0 and γ+ := (p + n + 2n + 2m)·σ. Finally, Steps 6–9 represent verification of the main inequality (12) from the GOCs, the stopping criteria, and looping.
6 Case Study To demonstrate the performance of the algorithm described above, consider a test telecommunication problem on a small graph, the nodes of which are the cities of Russia, China, and Mongolia (m = 8) (Fig. 1). The set of arcs that correspond to the available transmission lines is split into two subsets: U1 and U2 , where the arcs from U1 = {1, ..., 16} are the communication channels serviced by the upper level player (Russia) (regular lines in the figure), and the arcs from U2 = {17, ..., 32} are the channels that belong to Mongolia (dotted lines) and China (solid lines). Assume that we have to transmit δ = 30 units of information from the node s = 2 (Bratsk) to the node r = 7 (Beijing). Then the vector defining the network nodes for the information transmission is equal to d = (0, 1, 0, 0, 0, 0, −1, 0). Also, set the cost of the information unit transmission as c1 = (6, 6, 4, 4, 8, 8, 9, 9, 10, 10, 3, 3, 2, 2, 12, 12) and c2 = (6, 6, 2, 2, 7, 7, 4, 4, 8, 8, 5, 5, 6, 6, 2, 2) via channels from U1 and U2 , respectively. The cost was calculated depending on the distance between the cities and difficulties in installation of telecommunications. In addition, set the acceptable boundaries for variation of
Fig. 1 The example of telecommunication network
Table 1 Results derived by the Global Search Algorithm for the test example
| (x^0, y^0, v^0) | Φ(x0, y0, v0) | F(x0, y0, v0) | Φ̃   | F̃   | Loc | GIt | T (s) | Φ∗   | F∗   |
| (−1, −1, …, −1) | −11960.791    | 21.244        | 66   | 66   | 347 | 2   | 6.39  | 68.5 | 68.5 |
| (0, 0, …, 0)    | −12067.973    | 21.567        | 66   | 66   | 347 | 2   | 6.23  | 68.5 | 68.5 |
| (3, 3, …, 3)    | −20301.527    | 36.159        | 68.5 | 68.5 | 243 | 1   | 4.49  | 68.5 | 68.5 |
the additional tariffs for the information unit transmission via the channels from U1: 0.5 ≤ xi ≤ 2, i = 1, …, 16, and introduce bandwidth constraints for each channel: 0 ≤ y1 ≤ (10, 10, 15, 15, 10, 10, 12, 12, 8, 8, 15, 15, 15, 15, 8, 8)ᵀ; 0 ≤ y2 ≤ (10, 10, 15, 15, 10, 10, 20, 20, 10, 10, 12, 12, 14, 14, 25, 25)ᵀ. Note that the problem (DC(σ)) constructed in accordance with the above data has the dimension p + n + 2n + 2m = 128 and 2m + 2(2n + 2m) + n = 224 constraints.
The software that implements the method developed was coded in MATLAB 7.11.0.584 R2010b. To run the software, we used a computer with an Intel Core i5-2400 processor (3.1 GHz) and 4 GB RAM. Testing of the global search algorithm was performed using the Y-procedure of the local search and three different infeasible starting points. Table 1 shows the results derived by the algorithm and employs the following notation: (x^0, y^0, v^0) is the infeasible starting point; Φ(x0, y0, v0) and F(x0, y0, v0) are the goal function values of the problem (DC(σ)) and of the upper level of the bilevel problem (P), respectively, computed at the feasible starting point obtained by solving the projection problem (PR(x^0, y^0, v^0)); Φ̃ and F̃ are the values of the goal functions of the problems (DC(σ)) and (P) at the first critical point obtained; Loc is the total number of runs of the Y-procedure required by the GSA; GIt is the number of iterations of the GSA; T is the execution time (in seconds); Φ∗ is the goal function value of the problem (DC(σ)) and F∗ is the goal function value of the bilevel problem (P) at the point obtained by the GSA.
First of all, our attention is drawn to the large negative values of the goal function of the problem (DC(σ)) at the feasible starting points. It is easy to see that this is mostly due to negative values of the penalty component −σ⟨v, b1 − B1 y⟩ of this function. At the same time, the absolute values of the goal function of the original bilevel problem are small and, as expected, nonnegative. However, note that these points are not feasible in the problem (P), because σ⟨v, b1 − B1 y⟩ ≠ 0 (the conditions of Proposition 1 are violated). It is worth mentioning that the local search procedure performed really well and derived a considerable improvement of Φ as well as of F. Moreover, at this stage, the critical point in the problem (DC(σ)) has become feasible in the bilevel problem (P), which is testified by the fact that the values of Φ̃ and F̃ coincide (and σ⟨v, b1 − B1 y⟩ = 0). The obtained components of the critical point (without the auxiliary variable v) have the following form: x̂ = (2, 2, …, 2) ∈ IR¹⁶, ŷ = (0, 0, 0, 0, 10, 0, 12, 0, 8, 0, 0, 0, 3, 0, 0, 0, 7, 0, 15, 0, 0, 0, 20, 0, 0, 0, 8, 0, 2, 0, 0, 10). It should also be noted that the global search algorithm found the identical best values of the functions Φ and F for all starting points. This suggests that we found nothing else but the global solution to the problem. The execution time for a problem of this dimension is short and does not exceed 7 s. In addition, for the third point, the local search was enough to derive the best value of the goal function, which the global search procedure failed to improve; in the other cases, two iterations of the global search were required. Note that, as expected, the obtained best values of the functions Φ and F are equal in all cases, because the value of the penalty function is equal to zero at the obtained point, and this point is feasible in the original bilevel problem. Here are the components of the best solution: x̂ = (2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0.5, 2, 1.5, 2, 2, 2), ŷ = (0, 0, 0, 0, 10, 0, 12, 0, 8, 0, 2, 0, 5, 0, 0, 0, 5, 0, 15, 0, 0, 0, 20, 0, 0, 0, 10, 0, 0, 0, 0, 10).
Fig. 2 The critical point and the best obtained solution to the test problem
Let us highlight once more the efficiency of the local search: the solution that it derived was improved by a rather complicated global search procedure only from 66 to 68.5. Therefore, when we face a complex real-life network problem with a large number of nodes and arcs, or when the global solution is not necessary, it may be enough to use just the local search procedure. Figure 2 shows the obtained critical point and the best obtained solution on the telecommunication network graph. Specially marked numbers denote the components of the critical point. The components relating to the upper level are marked in bold, and the components relating to the lower level are marked in italic. Recall that it was required to find an optimal way of transmitting δ = 30 information units from the node s = 2 to the node r = 7. Note that many components of the critical point and of the best obtained solution coincide. When passing from the first to the second, the maximum additional fee was reduced on the arcs 11 and 13, which were not exploited at full capacity. From the viewpoint of the upper level alone, this reduction is not obvious, but it proves to be profitable as far as the common solution to the bilevel problem is concerned.
7 Concluding Remarks In the present paper, new procedures for finding the solution to a bilevel pricing problem in telecommunication networks were developed. The procedures are based on the special Global Search Theory for nonconvex (d.c.) optimization by A.S. Strekalovsky. We described in detail the reduction of the original bilevel problem to a sequence of nonconvex one-level problems and showed how to develop special local and global search methods that take into account the properties of the problem under study. The performance of the proposed technique was demonstrated on a model example. Based on our previous numerical experience (see, for example, the results on solving other bilevel problems [13, 30, 31]), we hope that the algorithms developed can be used for solving telecommunication network problems with a more complex structure. Acknowledgements This work has been supported by the Russian Science Foundation (Project no. 15-11-20015).
References 1. Bard, J.F.: Practical Bilevel Optimization. Kluwer Academic Publishers, Dordrecht, The Netherlands (1998) 2. Bazaraa, M.S., Shetty, C.M.: Nonlinear Programming: Theory and Algorithms. Wiley, New York (1979)
3. Bonnans, J.-F., Gilbert, J.C., Lemaréchal, C., Sagastizábal, C.A.: Numerical Optimization: Theoretical and Practical Aspects, 2nd edn. Springer (2006) 4. Bracken, J., McGill, J.T.: Mathematical programs with optimization problems in the constraints. Oper. Res. 21, 37–44 (1973) 5. Brotcorne, L., Labbe, M., Marcotte, P., Savard, G.: A bilevel model and solution algorithm for a freight tariff-setting problem. Transp. Sci. 34(3), 289–302 (2000) 6. Brotcorne, L., Labbe, M., Marcotte, P., Savard, G.: A bilevel model for toll optimization on a multicommodity transportation network. Transp. Sci. 35(4), 345–358 (2001) 7. Colson, B., Marcotte, P., Savard, G.: An overview of bilevel optimization. Ann. Oper. Res. 153(1), 235–256 (2007) 8. Dempe, S.: Foundations of Bilevel Programming. Kluwer Academic Publishers, Dordrecht, The Netherlands (2002) 9. Dempe, S., Kalashnikov, V.V., Perez-Valdes, G.A., Kalashnykova, N.: Bilevel Programming Problems: Theory, Algorithms and Applications to Energy Networks. Springer-Verlag, Berlin-Heidelberg (2015) 10. Didi-Biha, M., Marcotte, P., Savard, G.: Path-based formulations of a bilevel toll setting problem. In: Dempe, S., Kalashnikov, V. (eds.) Optimization with Multivalued Mappings, pp. 29–50. Springer Science + Business Media, LLC (2006) 11. Floudas, C.A., Pardalos, P.M. (eds.): Frontiers in Global Optimization. Kluwer Academic Publishers, New York (2004) 12. Gruzdeva, T.V., Strekalovsky, A.S.: Local search in problems with nonconvex constraints. Comput. Math. Math. Phys. 47(3), 397–413 (2007) 13. Gruzdeva, T.V., Petrova, E.G.: Numerical solution of a linear bilevel problem. Comput. Math. Math. Phys. 50(10), 1631–1641 (2010) 14. Horst, R., Tuy, H.: Global Optimization: Deterministic Approaches. Springer-Verlag, Berlin (1993) 15. Kalashnikov, V., Camacho, F., Askin, R., Kalashnykova, N.: Comparison of algorithms for solving a bi-level toll setting problem. Int. J. Innov. Comput. Inf. Control 6(8), 3529–3549 (2010) 16.
Labbe, M., Marcotte, P., Savard, G.: A bilevel model of taxation and its application to optimal highway pricing. Manag. Sci. 44(12), Part 1 of 2, 345–358 (1998) 17. Luo, Z.-Q., Pang, J.-S., Ralph, D.: Mathematical Programs with Equilibrium Constraints. Cambridge University Press, Cambridge (1996) 18. Nocedal, J., Wright, S.J.: Numerical Optimization. Springer, Berlin, New York (2006) 19. Orlov, A.V., Strekalovsky, A.S.: Numerical search for equilibria in bimatrix games. Comput. Math. Math. Phys. 45(6), 947–960 (2005) 20. Orlov, A.V.: Numerical solution of bilinear programming problems. Comput. Math. Math. Phys. 48(2), 225–241 (2008) 21. Orlov, A.V., Strekalovsky, A.S., Batbileg, S.: On computational search for Nash equilibrium in hexamatrix games. Optim. Lett. 10(2), 369–381 (2016) 22. Orlov, A.V.: A nonconvex optimization approach to quadratic bilevel problems. In: Battiti, R., Kvasov, D., Sergeyev, Y. (eds.) 11th International Conference, LION 11, Nizhny Novgorod, Russia, June 19–21, 2017, Revised Selected Papers. LNCS 10556, pp. 222–234 (2017) 23. Pang, J.-S.: Three modelling paradigms in mathematical programming. Math. Program., Ser. B 125(2), 297–323 (2010) 24. Strekalovsky, A.S.: Elements of Nonconvex Optimization. Nauka Publ., Novosibirsk (2003). (in Russian) 25. Strekalovsky, A.S.: On the minimization of the difference of convex functions on a feasible set. Comput. Math. Math. Phys. 43(3), 399–409 (2003) 26. Strekalovsky, A.S.: Minimizing sequences in problems with d.c. constraints. Comput. Math. Math. Phys. 45(3), 418–429 (2005) 27. Strekalovsky, A.S., Orlov, A.V.: A new approach to nonconvex optimization. Numer. Meth. Programm. (internet-journal: http://num-meth.srcc.msu.su/english/index.html) 8, 160–176 (2007)
28. Strekalovsky, A.S., Orlov, A.V.: Bimatrix Games and Bilinear Programming. FizMatLit, Moscow (2007). (in Russian) 29. Strekalovsky, A.S., Orlov, A.V., Malyshev, A.V.: Local search in a quadratic-linear bilevel programming problem. Numer. Anal. Appl. 3(1), 59–70 (2010) 30. Strekalovsky, A.S., Orlov, A.V., Malyshev, A.V.: Numerical solution of a class of bilevel programming problems. Numer. Anal. Appl. 3(2), 165–173 (2010) 31. Strekalovsky, A.S., Orlov, A.V., Malyshev, A.V.: On computational search for optimistic solutions in bilevel problems. J. Global Optim. 48, 159–172 (2010) 32. Strekalovsky, A.S.: On solving optimization problems with hidden nonconvex structures. In: Rassias, T.M., Floudas, C.A., Butenko, S. (eds.) Optimization in Science and Engineering, pp. 465–502. Springer, New York (2014) 33. Tsevendorj, I.: Optimality conditions in global optimization: contributions to combinatorial optimization. University of Versailles Saint-Quentin, Habilitation to Supervise Research (2007) 34. Tuy, H.: D.c. Optimization: theory, methods and algorithms. In: Horst, R., Pardalos, P.M. (eds.) Handbook of Global optimization, pp. 149–216. Kluwer Academic Publisher (1995) 35. Vallejo, J.F., Sanchez, R.M.: A path based algorithm for solve the hazardous materials transportation bilevel problem. Appl. Mech. Mater. 253–255, 1082–1088 (2013) 36. Vasilyev, F.P.: Optimization Methods. Factorial Press, Moscow (2002). (in Russian)
Graph Dichotomy Algorithm and Its Applications to Analysis of Stocks Market Alexander Rubchinsky
Abstract A new approach to a graph complexity measure is presented in this article. The essence of the approach consists in using the recently suggested and published frequency dichotomy algorithm to construct a family of graph dichotomies. This family includes a prespecified number of dichotomies, some of which can be different and some of which can coincide. The normalized entropy of this family is a reasonable index for a formal definition of the complexity of a given graph. This property allows us to make daily short-term predictions of big crises at stock markets. Keywords Graph dichotomy · Complexity index · Entropy · Stability · Stock market · Crisis
1 Introduction The present article is devoted to the development of a new approach to the notion of graph complexity and its applications, suggested in publications [1–3] and the works cited therein. The essential differences between the current and previous articles can be briefly described as follows. 1. The basic construction, named in the mentioned works the "agglomerative-divisive algorithm," is significantly simplified. Instead of an alternating sequence of dividing and agglomerative steps, only one dividing (into two parts) step is considered. This step is referred to as a "dichotomy". 2. A detailed experimental analysis of the stability and reproducibility of the computational results concerning the family of dichotomies is carried out. This analysis was not done in the previous articles. A. Rubchinsky (B) Higher School of Economics, National Research University, Moscow, Russia e-mail:
[email protected] A. Rubchinsky National Research Technological University “MISIS”, Moscow, Russia © Springer International Publishing AG, part of Springer Nature 2018 V. A. Kalyagin et al. (eds.), Computational Aspects and Applications in Large-Scale Networks, Springer Proceedings in Mathematics & Statistics 247, https://doi.org/10.1007/978-3-319-96247-4_6
3. The pattern in the form of a system of linear inequalities was significantly modified and simplified. It allows obtaining short-term predictions of big crises at the stock market without mistakes, while in the previous results a few mistakes were encountered. In the present publication, all the abovementioned necessary notions and definitions (as well as some new ones) are introduced. The article does not presuppose the reader's knowledge of the previous works. The material of the article is structured as follows. It includes the introduction (Sect. 1), two main sections (Sects. 2 and 3), and the conclusion (Sect. 4). Section 2 is devoted to the central notion of this work, the notion of dichotomy complexity of a graph. In Sect. 3, this notion is applied to stock market analysis, particularly to short-term prediction of big crises. Other examples of application of the suggested approach are not considered here; note that some of them were considered in publications [1–3]. In this article, attention is focused on stability issues that have not been considered before. Indeed, natural stability requirements arise in any reasonable attempt at practical application of the suggested algorithm. Therefore, an advance in this direction seems expedient and helpful.
2 Dichotomy Complexity of a Graph The exact definition of the central notion of this section, the notion of graph complexity, requires the results of a special computational algorithm as preliminary data. This algorithm was first described in [1] and comprehensively considered also in [2–4]. In Sect. 2.1, only a brief description of the algorithm is given, though it is sufficient for a complete program implementation. The exact definition and construction of a family of indices, designated as "graph dichotomy complexity", are given in Sect. 2.2.
2.1 Frequency Dichotomy Algorithm An arbitrary simple undirected graph can be an input of the frequency dichotomy algorithm (FDA for brevity). Simplicity of a graph means that no pair of vertices is connected by more than one edge. The case of graphs having several connected components is not excluded. The FDA output consists of two subgraphs of the input graph such that their vertex sets form a partition of the vertex set of the input graph. This pair of subgraphs is designated as a dichotomy of the input graph. The parameters of the FDA are discussed below. Assume that a pair of different vertices of the input graph G has been chosen. Let us describe the following essential step of the FDA.
Fig. 1 Illustration of the essential step
1. Construction of a minimal path connecting the two chosen vertices by Dijkstra's algorithm. The length of an edge is its current frequency. The length of a path is equal to the length of its longest edge, not to the sum of the lengths of all its edges. It is well known that Dijkstra's algorithm is applicable in such cases. 2. Frequency modification. The value 1 is added to the frequencies of all the edges on the path found at the previous substep 1. Illustration of the essential step: the chosen pair of vertices {a, b} is connected by the path (bold lines), whose edge frequencies are increased by 1 (see Fig. 1). There are three possible situations just before execution of any essential step, designated as cases A, B, and C. In Fig. 2a–c, bold lines present edges with the maximal frequency, while thin lines present paths connecting pairs of vertices. In case A (Fig. 2a), the set of all the edges with the maximal frequency does not contain any cut of the graph. Therefore, the found path does not contain edges with the maximal frequency, because of the minimax definition of the path length. Hence, after execution of the essential step, the maximal (over the considered graph) edge frequency does not change. In case B (Fig. 2b), the set of all the edges with the maximal frequency does contain a cut of the graph, but the found path does not contain edges with the maximal frequency (both chosen vertices are located on the same side of the cut). Hence, after execution of the essential step, the maximal (over the considered graph) edge frequency does not change. In case C (Fig. 2c), the set of all the edges with the maximal frequency does contain a cut of the graph, and the two chosen vertices are located on different sides of the cut. Therefore, there is a pair of vertices such that any path connecting these vertices
Fig. 2 a Case A, b Case B, c Case C
must include at least one edge with the maximal frequency. Hence, the maximal (over the considered graph) edge frequency must increase by 1.

Let us consider the main stages of the FDA.

1. Initial setting
   1. Find the connected components of the given graph G (by any standard algorithm).
   2. If the number of components is greater than 1 (i.e., graph G is disconnected), then the component with the maximal number of vertices is declared the first part of the constructed dichotomy of the initial graph, and all the other components form its second part; thus, the dichotomy is constructed and the algorithm stops. Otherwise, go to substep 3.
   3. The frequency of every edge is initialized by 1.

2. Cumulative stage
   1. Choose at random a pair of different vertices of graph G.
   2. Execute the above-described essential step.
   Repeat these two substeps T times.

3. Stopping rule
   1. Continue both substeps of the cumulative stage up to the moment of increasing the maximal (over all the edges) edge frequency Fmax (see case C in the above illustration of the essential step).
   2. Subtract 1 from the frequencies of all the edges forming the last found path.
   3. Remove all the edges whose frequency is equal to Fmax.
   4. The component of the graph obtained at the previous substep 3 with the maximal number of vertices is the first part of the found dichotomy. All the other vertices form its second part.

There is only one FDA parameter: the number T of repetitions of the essential step. The following simple statement guarantees the construction of a dichotomy by the suggested algorithm.

Statement 1 In the graph obtained after the edge removal at substep 3 of the above-described stopping rule, the number of connected components exceeds 1.
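The stages above can be sketched compactly in Python. The fragment below is an illustrative reimplementation under stated assumptions, not the authors' code: a graph is an adjacency dict, frequencies are kept in a dict keyed by frozenset edges, and the minimax path of the essential step is found by a bottleneck variant of Dijkstra's algorithm.

```python
import heapq
from collections import deque

def minimax_path(adj, freq, a, b):
    # Dijkstra with minimax path length: the "length" of a path is the
    # maximal edge frequency along it, as in substep 1 of the essential step.
    best = {a: 0}
    heap = [(0, a, [a])]
    while heap:
        cost, v, path = heapq.heappop(heap)
        if v == b:
            return cost, path
        if cost > best.get(v, float("inf")):
            continue
        for w in adj[v]:
            c = max(cost, freq[frozenset((v, w))])
            if c < best.get(w, float("inf")):
                best[w] = c
                heapq.heappush(heap, (c, w, path + [w]))

def components(adj):
    # connected components by breadth-first search
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, q = {s}, deque([s])
        seen.add(s)
        while q:
            for w in adj[q.popleft()]:
                if w not in seen:
                    seen.add(w); comp.add(w); q.append(w)
        comps.append(comp)
    return comps

def fda(adj, rng):
    verts = sorted(adj)
    comps = components(adj)
    if len(comps) > 1:                       # initial setting, substep 2
        first = max(comps, key=len)
        return first, set(verts) - first
    freq = {frozenset((v, w)): 1 for v in adj for w in adj[v]}
    while True:
        f_max = max(freq.values())
        a, b = rng.sample(verts, 2)          # cumulative stage, substep 1
        cost, path = minimax_path(adj, freq, a, b)
        if cost == f_max:                    # case C: F_max would increase; stop
            break                            # (the last path is left uncounted)
        for e in zip(path, path[1:]):        # essential step, substep 2
            freq[frozenset(e)] += 1
    # stopping rule, substeps 3-4: drop all edges at F_max, split by components
    kept = {v: [w for w in adj[v] if freq[frozenset((v, w))] < f_max] for v in adj}
    first = max(components(kept), key=len)
    return first, set(verts) - first
```

Note the sketch detects case C before incrementing, which is equivalent to incrementing and then subtracting 1 along the last path; by the argument behind Statement 1, the removed edges contain a cut, so the final split always yields two nonempty parts.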
2.2 Family of Dichotomies and Its Properties For any value of the single parameter T, as well as for any run of the above-described FDA, the result of its work is a dichotomy, i.e., a division of the initial graph into two parts. For many graphs, the found dichotomy is the same for all runs of the FDA, but this is not true for arbitrary graphs. Remember that the FDA repeatedly uses a random generator for the choice of consecutive random pairs of vertices. Therefore, in some cases,
especially practically important ones, different runs of the FDA produce dichotomies that do not always coincide. These cases are not vexatious mistakes. Moreover, it is possible to assert that various dichotomies naturally arise in the study of many complex systems, especially systems whose functioning is determined by human activity. Hence, it seems expedient to consider, as an important generalization of the conventional dichotomy problem, the construction of a family of dichotomies instead of a single one. The constructed family of dichotomies characterizes the initial system presented by a graph. Moreover, in such situations the dichotomies themselves, forming the abovementioned family, are of little interest. It turns out to be much more expedient to focus attention on the properties of the constructed family of dichotomies as an entire object. Let us start with a simple algorithm of construction of the abovementioned family of dichotomies (AFC for brevity). An arbitrary simple undirected graph G can be an input of the AFC; its output is defined below. The AFC consists of M consecutive independent runs of the basic dichotomy algorithm, whose input coincides with the same graph G that is the input of the AFC. Thus, there are two integer parameters of the AFC: the single parameter T of the FDA and the number M of repetitions of the FDA. The output of the AFC is the constructed family of M dichotomies of the input graph G. Let us introduce the notions and definitions necessary to describe the properties of the constructed families of dichotomies. The family of all the dichotomies produced by all these runs consists of M items. Some of them can coincide and some of them can be different. Assume that among the M found dichotomies, dichotomy d_p is encountered m_p times (p = 1, …, t), where

Σ_{p=1}^{t} m_p = M.    (1)

The numbers m_p (p = 1, …, t) entering (1) are calculated directly from the family of M dichotomies found by the FDA; it is enough to have an obvious algorithm for comparing two dichotomies. Let us order the numbers m_p (p = 1, …, t) nonincreasingly. The set of these ordered numbers forms the dichotomies distribution for the given graph G under parameters T and M. It is intuitively clear that the properties of these distributions reflect properties of the input graphs and, hence, properties of the considered systems presented by these graphs. They have different meaningful interpretations in different situations, but generally they describe complexity, entanglement, intricacy, and other similar hardly definable, though important, properties of various real systems. Some examples of such distributions are given in Sect. 3.2. Assume the initial graph consists of two connected components. According to substeps 1 and 2 of the initial setting of the FDA, in this case only one dichotomy, whose parts coincide with these two components, is constructed, and the random generator is not used at all. Therefore, all the constructed dichotomies are the same independently of the number M of runs, and the distribution consists of the single number M.
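The computation of the numbers m_p from a family of dichotomies can be sketched as follows (pure Python; a dichotomy is compared up to the order of its two parts, which is the "obvious algorithm" of two dichotomies' comparison mentioned above):

```python
from collections import Counter

# Dichotomies distribution m_1 >= m_2 >= ... >= m_t from a family of M
# dichotomies. Each dichotomy is a pair of vertex sets; representing it as a
# frozenset of frozensets makes coinciding dichotomies compare equal
# regardless of the order of the two parts.
def distribution(dichotomies):
    keys = [frozenset((frozenset(p1), frozenset(p2))) for p1, p2 in dichotomies]
    return sorted(Counter(keys).values(), reverse=True)

fam = [({1, 2}, {3, 4}), ({3, 4}, {1, 2}), ({1, 3}, {2, 4})]
assert distribution(fam) == [2, 1]   # first two items are the same dichotomy
```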
Graph Dichotomy Algorithm and Its Applications …
Now let us consider the other extreme case. Assume G is a complete graph with n vertices. By the symmetry of the graph, any run of FDA constructs a new dichotomy into two parts with the same number of vertices for even n, or a dichotomy into parts with k and k + 1 vertices (n = 2k + 1) for odd n. For n large enough, for instance n > 50, the chance of obtaining two coinciding dichotomies is practically equal to zero. Therefore, all the constructed M dichotomies are different, and the distribution consists of M ones. It is possible to say that extreme situations of the first type, in which only one dichotomy can be constructed, occur in graphs having the corresponding special "binary" structure: there are two subsets of vertices connected by a relatively small number of edges. The above-described FDA just reveals this structure. The extreme situations of the second type correspond to graphs that have a "chaotic" structure. This means only that all the possible outcomes of the above-described FDA are equitable and equiprobable. This structure is also revealed by FDA. There are other, less obvious, examples of both extreme cases, which are also revealed by FDA. But in most graphs that occur in real and theoretical problems, both structures are present partially and simultaneously. We introduce a formal index that expresses numerically the presence of both types of structures in one graph. Let us define

  E = −Σ_{p=1}^{t} µ_p ln µ_p,  where µ_p = m_p / M.    (2)
E is the conventional entropy of the division of M items into t parts consisting of coinciding items. Obviously, the minimal possible value of E is 0, while the maximal possible value is ln(M). In the first case, we have only one group of coinciding items; in the second case, we have M groups consisting of one element each. Both cases actually can occur. Assume

  I = E / ln(M),    (3)

where E is determined by formula (2). I is a random value, because the random generator is repeatedly used in each of the M runs of FDA. By construction, 0 ≤ I ≤ 1. Note that I is just a random value, not its average: formula (2) clearly shows that E is not equal to some averaged value related to a single FDA run, and the same concerns the value I. In order to underline the dependence of I upon the two considered parameters, we will use the notation I(T, M). The expectation Ī(T, M) of this random value is called the dichotomy complexity of the initial graph G at level (T, M). It is supposed that the introduced dichotomy complexity of a graph reflects its important properties, mentioned above in this section. But this value itself is unknown, and the suggested algorithm of family construction (AFC) gives only its approximation I(T, M). The main issue concerns the variance of these approximations. Relatively low values of the variance provide reproducibility and reliability of the experimental results describing the behavior of stock markets. The issue is considered in more detail in Sect. 3.
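Formulas (2) and (3) can be checked with a short sketch; the two extreme cases described above (one dichotomy repeated M times versus M pairwise different dichotomies) serve as test inputs. The function name is illustrative.

```python
import math

def normalized_entropy(multiplicities):
    """Compute E = -sum_p mu_p ln(mu_p), mu_p = m_p / M  (formula (2)),
    and the normalized index I = E / ln(M)  (formula (3)), so 0 <= I <= 1."""
    M = sum(multiplicities)
    E = sum(-(m / M) * math.log(m / M) for m in multiplicities)
    I = E / math.log(M) if M > 1 else 0.0
    return E, I

# Extreme case 1: one dichotomy repeated M = 1000 times -> E = 0, I = 0
print(normalized_entropy([1000]))      # (0.0, 0.0)

# Extreme case 2: M = 1000 pairwise different dichotomies -> E = ln(M), I = 1
E, I = normalized_entropy([1] * 1000)
print(round(E, 4), round(I, 4))        # 6.9078 1.0
```

The guard for M = 1 only avoids division by ln(1) = 0; for a single-run family both E and I are naturally zero.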
A. Rubchinsky
3 Stock Market Analysis

The remaining part of the present article is devoted to experimental estimations. Though all the introduced notions are of general character and relate to arbitrary undirected graphs, the special graphs, the so-called market graphs generated by stock markets, are considered in more detail. First, these graphs describe real processes in the financial sphere, one of the most important fields of human activity. Second, they have many nontrivial properties at reasonable dimensionality. Finally, the designed methods and algorithms allow analyzing the stock market and, particularly, suggest an efficient approach to short-term prediction of big crises in this market. In Sect. 3.1, the known construction of the so-called market graph is described. It associates an undirected graph to every day of a considered period, based on the closing prices of every share of a given stock market on this day and several previous days. Therefore, using the algorithms of Sect. 2, we can associate to every day a numerical value of normalized entropy (see formulae (2) and (3)). It is done for the stock market S&P-500 (500 greatest companies of the USA) during 9 years, from January 1, 2001 till December 31, 2009. In Sect. 3.2, the found values are analyzed and a pattern, based on linear inequalities between these values during 7 consecutive days, is suggested. As it turns out, all these inequalities are true for the 7 days ending on March 4, 2001 and September 22, 2008, i.e., just 5 and 7 days prior to two big crises. On every other day among the remaining 3285 days, at least one of the inequalities is wrong. Moreover, this property is stable in the following sense: under the variance typical for the calculated complexity values, at least one of the wrong inequalities remains wrong. Therefore, the suggested system of inequalities can be considered as a pattern that works during the whole period without mistakes.
3.1 Everyday Complexity Values for S&P-500

The stock market S&P-500 (500 greatest companies in the USA) is considered. First of all, let us describe the well-known graph model of an arbitrary stock market (see, for instance, [5, 6]). The objects correspond to the shares considered during some period. The distance between two shares is determined as follows.
1. Let us define the basic minimal period, consisting of l consecutive days. All the data found for the period x, x − 1, …, x − l + 1 are related to day x. Assume the length l of the considered period is equal to 16. This choice is determined by the following meaningful reasons: for a short period, the data are too variable; for a long period, too smooth.
2. The prices of all the shares at closing time are considered for days x, x − 1, …, x − l + 1. The matrix R of pairwise correlation coefficients is calculated based on these prices.
3. The distance d_ij between two shares (say, i and j) is defined by the formula d_ij = 1 − r_ij, where r_ij is the corresponding element of matrix R. The determined distance is close to 0 for "very similar" shares and close to 2 for "very dissimilar" shares. Therefore, the matrix D = (d_ij) is considered as the dissimilarity matrix.
4. The graph vertices are in one-to-one correspondence with the considered shares. For every vertex (say, a), all the other vertices are ordered as follows: the distance between the ith object in the list and object a is a nondecreasing function of the index i. The first four vertices in this list (i.e., the four closest vertices), as well as all the other vertices (if they exist) whose distances from a are equal to the distance from a to the fourth vertex in the list, are connected by an edge to vertex a. It is easy to see that the constructed graph does not depend on a specific numeration satisfying the above conditions. The number of closest vertices (here 4) is a parameter of the graph construction.

The market graph was constructed for every day of the 9-year period from January 1, 2001 till December 31, 2009. The estimations I(T, M) of dichotomy complexities under parameters T = 10000 and M = 1000 can be calculated by formulae (1) and (2) applied to the families of dichotomies found by the above-described AFC. These complexities are given in Table 1. Two 7-tuples of data, related to 7 consecutive days, are marked in Table 1 with gray color; one 7-tuple is marked with a frame. These selections will be explained further in the article. All the data in Table 1 are realizations of different random values, defined for every day. In order to rely on these data, we must have some estimation of their range or variance. To achieve this goal, computational experiments were repeated 5 times for 6 random days (with the same parameters T = 10000 and M = 1000). The results are presented in Table 2.
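Steps 1–4 above can be sketched as follows, assuming the closing prices for the 16-day window are given as a shares × days array. The function name and the synthetic toy data are illustrative assumptions, not material from the paper.

```python
import numpy as np

def market_graph(prices, k=4):
    """Build the market graph from an (n_shares x l_days) array of closing
    prices: R = pairwise correlations (step 2), d_ij = 1 - r_ij (step 3),
    and each vertex is joined to its k closest vertices plus all ties at
    the k-th distance (step 4). Returns the edge set of a simple graph."""
    R = np.corrcoef(prices)          # correlation matrix, shares as rows
    D = 1.0 - R                      # dissimilarity matrix
    n = D.shape[0]
    edges = set()
    for a in range(n):
        others = [b for b in range(n) if b != a]
        others.sort(key=lambda b: D[a, b])   # nondecreasing distance from a
        threshold = D[a, others[k - 1]]      # distance to the k-th closest
        for b in others:
            if D[a, b] <= threshold:         # k closest vertices, plus ties
                edges.add((min(a, b), max(a, b)))
            else:
                break                        # list is sorted, rest are farther
    return edges

# Toy example: 6 synthetic shares over a 16-day window
rng = np.random.default_rng(0)
prices = rng.random((6, 16)).cumsum(axis=1)  # random-walk-like price series
g = market_graph(prices, k=4)
print(len(g))  # number of edges; every vertex ends up with degree >= 4
```

Since every vertex selects its 4 closest neighbors, each vertex has degree at least 4; an edge may also be contributed by the other endpoint's selection, which is why the graph is kept as a set of unordered pairs.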
We see that the range (difference between the maximal and minimal values) is small enough. Let us consider another computational experiment. It concerns two special periods, both consisting of 7 consecutive days: February 26, 2001–March 4, 2001 and September 16, 2008–September 22, 2008. The first sequence ends on March 4, 2001, 5 days prior to the abrupt beginning of the so-called "dotcom crisis"; the second sequence ends on September 22, 2008, 7 days prior to the abrupt beginning of the so-called "hypothec crisis". Results of three independent runs of the above-described algorithms are presented in Table 3. One of these results is included in Table 1. The ranges are also given in Table 3. The values in Tables 2 and 3, as well as many other computational experiments, clearly demonstrate the stability and reproducibility of the general results in Table 1, concerning every day of the long 9-year period. Formally, in all the experiments the range does not exceed 0.03. We can conclude that the same is true for every day. Of course, this is an experimental phenomenon rather than a theoretical conclusion. In Sect. 3.2, a simple algorithm of short-term prediction of big crises at the stock market is suggested, based on the reliable data collected in Table 1.
Table 1 Normalized entropy for every day of the period January 1, 2001–December 31, 2009, grouped by year (2001–2009). [The daily values, spread over several pages in the original, were scrambled in extraction and are omitted here.]
96
A. Rubchinsky
Table 2  Dichotomy complexity evaluations for several days

     1          2          3          4          5
     0.263157   0.285477   0.276258   0.278195   0.279098
     0.374670   0.378985   0.384670   0.372934   0.381546
     0.000000   0.000000   0.000000   0.000000   0.111111
     0.788328   0.795698   0.793323   0.777154   0.798290
     0.305912   0.303997   0.297838   0.307115   0.306610
     0.314111   0.327307   0.315890   0.322806   0.314019
Table 3  Dichotomy complexity prior to two big crises

     26.02.2001  27.02.2001  28.02.2001  01.03.2001  02.03.2001  03.03.2001  04.03.2001
1a   0.1697      0.6978      0.8313      0.6256      0.9541      0.3815      0.3028
2a   0.1727      0.6978      0.8274      0.6218      0.9613      0.3740      0.3089
3a   0.1818      0.6899      0.8241      0.6190      0.9567      0.3930      0.3141
a    0.0121      0.0079      0.0072      0.0066      0.0072      0.0190      0.0102

     16.09.2008  17.09.2008  18.09.2008  19.09.2008  20.09.2008  21.09.2008  22.09.2008
1b   0.2816      0.1738      0.8262      0.4806      0.9546      0.5419      0.1922
2b   0.2820      0.1809      0.8363      0.4763      0.9560      0.5437      0.1903
3b   0.2694      0.1764      0.8304      0.5033      0.9516      0.5434      0.1922
b    0.0122      0.0071      0.0101      0.0270      0.0044      0.0018      0.0019
3.2 Short-Term Prediction of Big Crises

Let us consider the data in Table 3 in more detail. The plots of entropy for the two 7-day periods are shown in Fig. 3. These plots are similar: both have an M-type form. In order to better understand the similarity of these two plots and, perhaps, the similarity of the corresponding periods, let us consider the following system of linear inequalities, constructed using the data of Table 3. Put the real variable Z0 in correspondence to the rightmost column of Table 3, the variable Z1 to the column to its left, and so on, up to the variable Z6, corresponding to the leftmost column of the table. The following system reflects the relations between these variables:

  Z2 > Z0, Z2 > Z1, Z2 > Z3, Z2 > Z4, Z2 > Z5, Z2 > Z6, Z2 > 0.95;
  Z4 > Z0, Z4 > Z1, Z4 > Z3, Z4 > Z5, Z4 > Z6, Z4 > 0.82;
  Z3 > Z0, Z3 > Z6, Z3 < 0.80, Z3 > 0.45;                              (3)
  Z1 > Z0, Z1 < 0.60, Z5 > Z6 − 0.15, Z5 < 0.75;
  Z0 < 0.33, Z6 < 0.30.
By construction, any row of Table 3, considered as a vector of dimension 7, satisfies system (3). Thus, the 7-tuples of dichotomy complexities for the 7-day periods ending on March 4, 2001 and on September 22, 2008 satisfy system (3). What about the 7-tuples ending on all the other days of the considered 9-year period? We can simply take all the necessary values from Table 1 and check them. The result seems unexpected: none of the remaining 3285 7-tuples satisfies system (3).
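This exhaustive check is mechanical and easy to automate. The sketch below is my own encoding of system (3), not the author's code: it lists the 23 inequalities and returns the largest violation, so a non-positive result means that the 7-tuple satisfies the whole system.

```python
# Encode system (3) for a 7-tuple z = (Z0, ..., Z6), Z0 being the most
# recent day. All names and the encoding are illustrative assumptions.

# Two-variable constraints Zi > Zj, stored as (i, j); violation is Zj - Zi.
VAR_PAIRS = [
    (2, 0), (2, 1), (2, 3), (2, 4), (2, 5), (2, 6),  # Z2 dominates all others
    (4, 0), (4, 1), (4, 3), (4, 5), (4, 6),          # Z4 dominates the rest
    (3, 0), (3, 6),
    (1, 0),
]
# Lower bounds Zi > C, stored as (i, C); violation is C - Zi.
LOWER = [(2, 0.95), (4, 0.82), (3, 0.45)]
# Upper bounds Zi < C, stored as (i, C); violation is Zi - C.
UPPER = [(3, 0.80), (1, 0.60), (5, 0.75), (0, 0.33), (6, 0.30)]

def max_infringement(z):
    """Largest violation among the 23 inequalities of system (3);
    a value <= 0 means the tuple satisfies the whole system."""
    s = [z[j] - z[i] for i, j in VAR_PAIRS]
    s += [c - z[i] for i, c in LOWER]
    s += [z[i] - c for i, c in UPPER]
    s.append((z[6] - 0.15) - z[5])  # the shifted pair Z5 > Z6 - 0.15
    return max(s)

# Row 1a of Table 3 (week ending March 4, 2001), read right-to-left:
z_2001 = [0.3028, 0.3815, 0.9541, 0.6256, 0.8313, 0.6978, 0.1697]
print(max_infringement(z_2001) <= 0)  # prints True: the crisis week fits
```

Running this function over the whole 9-year series reproduces the check described above, one 7-tuple per day.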
Fig. 3 Plots of day entropy prior to the two big crises
System (3) includes 23 inequalities. Some of them are consequences of the others. An arbitrary 7-tuple of values X0, X1, …, X6 satisfies some of these inequalities and does not satisfy the rest. Assume the inequality Xi > Xj is infringed; it means that actually Xi ≤ Xj. An infringed inequality Xi > C means that actually Xi ≤ C. Notice that system (3) includes only inequalities with one or two variables. Define Sij = Xj − Xi and SiC = C − Xi for all the infringed inequalities, and let S = max(max Sij, max SiC). The values of S for each of the 3285 days were calculated and are presented in Table 4. The minimal value of S over all the days of the period (except the two abovementioned special days) is equal to 0.0886; it occurs on August 12, 2006. Remember that computational experiments have demonstrated that the range of every random value in Table 1 does not exceed 0.03. Because each inequality in system (3) contains no more than two variables, their summary deviation cannot exceed 0.06 = 0.03 × 2. At the same time, on every day at least one infringement reaches the minimal value 0.0886, which is larger. It means that the corresponding infringement S remains positive under all technically possible deviations of the found values of dichotomy complexity. Therefore, system (3) can be used as a reliable pattern for short-term prediction of an incoming big crisis: indeed, it does not make any mistake for the considered 9-year period. The prediction algorithm is as follows.

1. Every day after the closure, consider the prices of all the shares and form, together with the prices of the previous 15 days, the correlation matrix.
2. Execute the operation described in Sect. 3.2. As the result, the value of dichotomy complexity for the current day is found.
3. Consider the 7-tuple of values, where X0 is the newly found value, X1 is the dichotomy complexity of the previous day, found yesterday, and so on, up to the value X6, found 6 days ago.
Table 4  Everyday maximal deviations from the considered linear pattern

[Daily values of S for each year from 2001 through 2009; the table's numeric columns were flattened during extraction and are omitted here]
4. Substitute the values X0, X1, …, X6 into system (3). If even one of the inequalities is infringed, we conclude that a big crisis will not happen in the next 5 days. Otherwise, we conclude that a big crisis is coming soon (in 5–8 days).

The suggested algorithm works correctly for 9 years. Some preliminary considerations show that the algorithm works without mistakes up to now. Of course, no prediction can be correct forever.
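Steps 1–4 amount to maintaining a sliding window of the last seven complexity values and testing it against system (3) every day. A minimal sketch of that loop follows; the daily complexity computation and the concrete test of system (3) are abstracted away as inputs, and all names are mine, not the author's.

```python
from collections import deque

def predict_stream(daily_complexities, satisfies_system3):
    """For each day t, yield (t, crisis_flag).

    `daily_complexities` is an iterable of dichotomy complexity values,
    one per trading day; `satisfies_system3` is a predicate on the tuple
    (X0, ..., X6), where X0 is today's value and X6 the value 6 days ago.
    """
    window = deque(maxlen=7)              # the last seven daily values
    for t, x in enumerate(daily_complexities):
        window.append(x)
        if len(window) < 7:               # not enough history yet
            yield t, False
            continue
        x_tuple = list(reversed(window))  # newest first: X0, X1, ..., X6
        yield t, satisfies_system3(x_tuple)
```

A positive flag would then be read as "a big crisis is expected in 5–8 days", a negative one as "no big crisis within the next 5 days".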
4 Conclusion

Let us give some final remarks and comments on the presented material.

1. There are two parameters in the general algorithm for constructing the family of dichotomies (AFC) and two parameters in its application to stock market analysis: the number of days in the basic minimal period (16) and the number of closest vertices in the market graph construction (4). The choice of these parameters has not been accompanied by any explanations. One can say that the suggested algorithm is an algorithm of data analysis, not an algorithm imitating system behavior. The only requirement on such algorithms and their parameters is their usefulness.
2. Some elements of the general scheme can be improved. In particular, it is possible to choose the values of parameters T and M adaptively, depending on the convergence of the corresponding complexity values. This version can significantly reduce calculation time and will be considered in future publications.
3. The experimental results concern only one stock market. In further investigations, essentially wider datasets are supposed to be considered, as the required information becomes available.
4. The author is familiar with other approaches to short-term prediction of big crises (see, for instance, publications [4, 5]). However, there is an essential difference between the known approaches and the approach suggested here: unlike all of them, it makes no assumptions about market behavior. In particular, market activity cannot be adequately described by any kind of random process, or by network modeling and differential equations, especially at the moments of crises. Finally, the author has never encountered everyday predictions over such a long period, only reasoning about how they could be made.
5. The last remark is as follows. The expression "big crisis" is used several times without any formal definition. It seems practically impossible to give a formal definition of this notion. However, such a definition is unessential: we can simply assume that a big crisis is a state of the stock market which most of the participants perceive as a big crisis, and this assumption leads to their behavior, characterized by the found pattern.

The author is grateful to F.T. Aleskerov for his support and attention to this work.
Graph Dichotomy Algorithm and Its Applications …
The article was prepared within the framework of the Basic Research Program at the National Research University Higher School of Economics (HSE) and supported within the framework of a subsidy by the Russian Academic Excellence Project “5–100”.
References

1. Rubchinsky, A.: A new approach to network decomposition problems. In: Springer Proceedings in Mathematics & Statistics: Models, Algorithms, and Technologies for Network Analysis, pp. 127–152 (2017)
2. Rubchinsky, A.: Family of graph decompositions and its applications to data analysis. Working Paper WP7/2016/09, Higher School of Economics Publ. House, Moscow (2016). Series WP7 "Mathematical methods for decision making in economics, business and politics", 60 p
3. Rubchinsky, A.: Divisive-agglomerative algorithm and complexity of automatic classification problems. Working Paper WP7/2015/09, Higher School of Economics Publ. House, Moscow (2015). Series WP7 "Mathematical methods for decision making in economics, business and politics", 44 p
4. Kim, M., Sayama, H.: Predicting stock market movements using network science: an information theoretic approach. arXiv:1705.07980v1 [cs.SI], 22 May 2017
5. Reider, R.: Volatility Forecasting I: GARCH Models
Cluster Analysis of Facial Video Data in Video Surveillance Systems Using Deep Learning Anastasiia D. Sokolova and Andrey V. Savchenko
Abstract In this paper, we propose an approach to structuring information in video surveillance systems by grouping videos that contain identical faces. First, faces are detected in each frame, and the features of each facial region are extracted at the output of a preliminarily trained deep convolutional neural network. Second, the tracks that contain identical faces are grouped using face verification algorithms and hierarchical agglomerative clustering. In an experimental study with the YTF dataset, we examined several ways to aggregate the features of individual frames in order to obtain a descriptor of the whole video track. It was demonstrated that the most accurate and fastest algorithm is the matching of normalized average feature vectors.

Keywords Organizing video data · Deep convolutional neural networks · Video surveillance systems
1 Introduction

Nowadays, the high volume of available multimedia data has led to the creation of automatic systems that can organize visual information. Some user applications, such as Google Photos, provide the opportunity to search for a required image and to organize and display photos of the same person. Such systems are also becoming all the more important in public safety and video surveillance technologies [1]. However, processing this information is a challenge because such systems accumulate a huge amount of video data [2]. They can obtain hundreds of frames in dynamics in a few seconds [3–5]. Hence, increasing attention is attracted to the problem of extracting video tracks for each visitor whose face has been observed by the video camera [2].

A. D. Sokolova (B) · A. V. Savchenko National Research University Higher School of Economics, Laboratory of Algorithms and Technologies for Network Analysis, Nizhny Novgorod, Russia e-mail:
[email protected] A. V. Savchenko e-mail:
[email protected] © Springer International Publishing AG, part of Springer Nature 2018 V. A. Kalyagin et al. (eds.), Computational Aspects and Applications in Large-Scale Networks, Springer Proceedings in Mathematics & Statistics 247, https://doi.org/10.1007/978-3-319-96247-4_7
The cluster analysis of the video stream can be performed in order to select clusters of tracks [6]. The main part of such clustering is the rule that verifies that two video tracks contain the same person. This subtask can be solved with known face verification methods [3, 4] based on modern deep convolutional neural networks (CNNs) [7, 8]. The goal of our paper is to carry out a comparative analysis of face verification algorithms, namely, of the various ways to aggregate deep features extracted from individual frames [2]. The rest of the paper is organized as follows: in Sect. 2, we formulate the proposed approach to cluster analysis of video data. In Sect. 3, we present the experimental results for the YouTubeFaces (YTF) dataset [9]. In Sect. 4, the concluding comments are given.
2 Cluster Analysis in Video Data

The task of this work can be formulated as follows. The given video sequence of T > 1 frames should be processed in order to extract subsequences (tracks) with observations of one person and then unite subsequences that contain the same person. At first, the faces are detected [10] with the Viola-Jones method. To increase the detection accuracy, a face region is considered only if eyes can be detected in it using the same Viola-Jones approach. For simplicity, we assume that each frame contains only one face. Next, an appropriate tracker algorithm [10, 11] divides the input sequence into M < T disjoint subsequences (tracks) {X(m)}, m = 1, 2, ..., M, where the m-th track is characterized by the indices of its first t_1(m) and last t_2(m) frames. We will denote the duration of this track by t(m) = t_2(m) − t_1(m) + 1. Then, we use a clustering method in order to group similar tracks [6, 12]. Splitting into clusters should be performed so that tracks from different clusters differ significantly from each other, while tracks within a single cluster are sufficiently close. In order to implement the clustering algorithm, it is necessary to extract facial features from every frame, aggregate (pool) the features of individual frames into a single descriptor for the whole track, and then match these descriptors. Deep CNNs are widely used nowadays to extract the features. There are a lot of CNNs [7, 8, 13] trained on external large datasets, e.g., the Casia WebFaces, MS-Celeb-1M, or VGGFace datasets. By using domain adaptation techniques, the outputs of one of the last CNN layers for the t-th frame are stored in the D-dimensional feature vector x(t). The matching of deep features is usually done with the Euclidean distance \rho(x(t_1), x(t_2)) [7]. Next, it is necessary to define the dissimilarity \rho(X(m_1), X(m_2)) of tracks X(m_1) and X(m_2) as a summary statistic of distances between individual frames to implement clustering.
Nowadays, there are many examples of using Long Short-Term Memory (LSTM) networks, which compute the dissimilarity frame-by-frame. However, they normally cannot be trained on such limited data. Therefore, it is sometimes possible to simply average the pairwise distances between all frames:

\rho(X(m_1), X(m_2)) = \frac{1}{t(m_1)\, t(m_2)} \sum_{t=t_1(m_1)}^{t_2(m_1)} \sum_{t'=t_1(m_2)}^{t_2(m_2)} \rho(x(t), x(t')).   (1)
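As a minimal illustration (not the authors' implementation), the frame-by-frame averaging in (1) can be sketched in Python, assuming each track is given as a list of per-frame feature vectors:

```python
from math import dist  # Euclidean distance between two sequences (Python 3.8+)

def avg_pairwise_distance(track1, track2):
    """Dissimilarity of two tracks as the mean of all pairwise frame
    distances, as in Eq. (1). Each track is a list of D-dimensional
    feature vectors, one per frame."""
    total = sum(dist(x, y) for x in track1 for y in track2)
    return total / (len(track1) * len(track2))
```

Note the quadratic number of distance computations, which is exactly the runtime problem that motivates the fixed-size track representations discussed below.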
The variance of distances can also be taken into account, as is done in hypothesis testing for statistical homogeneity using the Student's t-test [14]:

t(X(m_1), X(m_2)) = \frac{\rho_{in}(X(m_1), X(m_2))}{\sqrt{\frac{D_{in}(m_1)}{t(m_1)} + \frac{D_{in}(m_2)}{t(m_2)}}},   (2)

where

\rho_{in}(X(m_1), X(m_2)) = \frac{1}{t(m_1)\, t(m_2)} \sum_{t=t_1(m_1)}^{t_2(m_1)} \sum_{t'=t_1(m_2)}^{t_2(m_2)} \rho(x(t), x(t')),   (3)

D_{in}(m) = \frac{2}{t(m)(t(m)-1)} \sum_{t=t_1(m)}^{t_2(m)-1} \sum_{t'=t+1}^{t_2(m)} \rho(x(t), x(t'))^2.   (4)
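The Student-type statistic (2)–(4) can be sketched as follows; the function names are illustrative only, and the mean pairwise distance between the two tracks plays the role of \rho_{in} from (3):

```python
from math import dist, sqrt
from itertools import combinations

def within_track_variance(track):
    """D_in(m) from Eq. (4): mean squared distance over all unordered
    frame pairs inside one track (requires at least two frames)."""
    pairs = list(combinations(track, 2))
    return sum(dist(x, y) ** 2 for x, y in pairs) / len(pairs)

def t_statistic(track1, track2):
    """Student-type statistic from Eq. (2); rho_in is the mean pairwise
    distance between the two tracks, Eq. (3)."""
    rho_in = sum(dist(x, y) for x in track1 for y in track2) \
        / (len(track1) * len(track2))
    d1, d2 = within_track_variance(track1), within_track_variance(track2)
    return rho_in / sqrt(d1 / len(track1) + d2 / len(track2))
```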
Unfortunately, matching all frames in two tracks leads to very high runtime complexity [23]. Hence, the computation of the distance between tracks X(m_1) and X(m_2) is usually treated as the distance between their fixed-size representations. Yang et al. [15] proposed a two-layer neural network with attention blocks to aggregate the CNN features of all frames. However, in our experimental study, this approach demonstrated insufficient accuracy. Therefore, in this paper, we implement straightforward aggregation (or pooling [15, 16]) techniques:

1. Calculation of the distance between the medoids of the tracks:

\rho(X(m_1), X(m_2)) = \rho(x^*(m_1), x^*(m_2)),   (5)

x^*(m_i) = \arg\min_{x(t),\; t \in [t_1(m_i), t_2(m_i)]} \sum_{t'=t_1(m_i)}^{t_2(m_i)} \rho(x(t), x(t')).   (6)

2. The average features of each track are matched:

\rho(X(m_1), X(m_2)) = \rho(\bar{x}(m_1), \bar{x}(m_2)), \qquad \bar{x}(m_i) = \frac{1}{t(m_i)} \sum_{t=t_1(m_i)}^{t_2(m_i)} x(t).   (7)

3. Calculation of the distance between the median features of the tracks:

\rho(X(m_1), X(m_2)) = \rho(\tilde{x}(m_1), \tilde{x}(m_2)),   (8)

where \tilde{x}(m) is the vector of median features of track X(m) [17]. It is worth noting that in static image recognition tasks, the deep feature vectors are typically divided by their norms [13]. Such normalization is known to
Fig. 1 The data flow in the proposed video processing system
make these features more robust to variations of observation conditions, e.g., camera resolution, illumination, and occlusion. Hence, in this work, we use conventional L2-normed features. In fact, there are two possible ways to implement such normalization. The traditional approach includes the preliminary normalization of the features of individual frames [13]. However, in this paper, we argue that video verification is much more accurate if the obtained track representations are normalized only after pooling of the initial features (5)–(8). Chen et al. [3] proposed a face verification approach based on deep convolutional neural networks. Their end-to-end system consists of three modules: face detection, alignment into canonical coordinates using the detected landmarks, and verification to compute the similarity between a pair of images/videos. Li et al. [4] presented the Eigen-PEP representation, built upon the recent success of the probabilistic elastic part model. It integrates the information from relevant video sources by part-based average pooling through the model, then compresses the intermediate representation through principal component analysis, and only a number of principal eigen dimensions are kept. Our proposed approach involves face detection using Viola-Jones cascades with additional tracking by the KCF algorithm. Then, D CNN bottleneck features are extracted from each frame, and the tracks are organized. Simple online clustering of the normalized descriptors [12, 18] is implemented. In particular, the feature vector of each track is sequentially matched with the features of the previously detected clusters. If the distance to the nearest cluster does not exceed a certain threshold, the track is added to that cluster, and its information is updated. The complete approach is presented in Fig. 1.
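The pipeline described above (average pooling (7) followed by L2-normalization, then threshold-based online clustering) can be sketched as follows. This is only an illustration: the running-mean update rule for cluster centers is an assumption, since the text only says that the cluster information is updated.

```python
from math import dist, sqrt

def track_descriptor(track):
    """AvePool -> norm, Eq. (7): average the raw frame features, then
    L2-normalize the pooled vector (not the individual frames)."""
    d = len(track[0])
    mean = [sum(x[j] for x in track) / len(track) for j in range(d)]
    norm = sqrt(sum(v * v for v in mean)) or 1.0
    return [v / norm for v in mean]

def online_cluster(descriptors, threshold):
    """Simple online clustering: each track descriptor joins the nearest
    existing cluster if it is closer than `threshold`, otherwise it
    starts a new cluster. Returns the cluster label of each track."""
    centers, counts, labels = [], [], []
    for x in descriptors:
        best = min(range(len(centers)),
                   key=lambda i: dist(x, centers[i]), default=None)
        if best is not None and dist(x, centers[best]) <= threshold:
            counts[best] += 1
            # hypothetical update rule: running mean of the cluster
            centers[best] = [c + (v - c) / counts[best]
                             for c, v in zip(centers[best], x)]
            labels.append(best)
        else:
            centers.append(list(x))
            counts.append(1)
            labels.append(len(centers) - 1)
    return labels
```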
3 Experimental Results

In this section, we provide an experimental study of the proposed system (Fig. 1). Development and testing of the described approach were conducted in an MS Visual Studio 2015 project using the C++ language and the OpenCV library. The PC characteristics: Lenovo IdeaPad 310, 64-bit Windows 10 operating system. To extract features, we used the
Caffe framework [19] and two publicly available CNNs suitable for face recognition, namely, the VGGNet [8] and the Lightened CNN (version C) [13]. The VGGNet extracts D = 4096 nonnegative features at the output of the "fc7" layer from 224 × 224 RGB images. The Lightened CNN extracts D = 256 features at the "eltwise_fc2" layer from a 128 × 128 grayscale facial image. An advantage of the latter network is its low inference time (60 ms on average), as well as the high accuracy of face verification on several modern datasets [13]. To determine that the same individual is present in consecutive frames, a threshold value was experimentally selected; the Euclidean distance between the feature vectors is compared with this threshold. This parameter was obtained using the Labeled Faces in the Wild (LFW) dataset, which contains 13,000 images of faces, where 1680 people have two or more images. Next, we examined the described aggregation techniques for tracks ("Distance" and "norm -> Distance" (1), "norm -> Medoid" and "Medoid -> norm" (5), (6), "norm -> AvePool", "AvePool -> norm", and "AvePool" (7), "Median -> norm" (8)). We implemented two kinds of distance: L2 (the Euclidean metric) and the Student's t-test (2)–(4). The experimental study was conducted with the YTF dataset [9], which contains 3425 videos of 1595 different people. An average of 2.15 videos is available for each subject. The shortest track duration is 48 frames, the longest track contains 6070 frames, and the average length of a video clip is 181.3 frames. The following
Table 1 Results of video-based face verification, YTF dataset (Lightened CNN)

Algorithm                Distance  AUC (%)      FRR@FAR=1%    Average time (s)
Distance (1)             L2        90.7 ± 0.6   77.0 ± 8.4    0.017
norm -> Distance (1)     L2        98.2 ± 0.4   14.1 ± 3.6    0.016
Medoid (5), (6)          L2        89.7 ± 0.6   80.6 ± 6.4    0.029
                         t-test    84.7 ± 0.7   72.9 ± 7.8    0.41
Medoid (5), (6) -> norm  L2        97.2 ± 0.6   19.1 ± 4.3    0.03
                         t-test    88.8 ± 0.6   54.1 ± 5.9    0.043
AvePool (7)              L2        91.3 ± 1.3   71.8 ± 10.0   0.2
                         t-test    91.8 ± 1.4   72.3 ± 11.5   0.017
norm -> AvePool (7)      L2        97.7 ± 0.5   21.4 ± 6.4    0.019
                         t-test    96.8 ± 0.5   37.2 ± 7.6    0.017
AvePool (7) -> norm      L2        98.3 ± 0.7   12.4 ± 3.1    0.014
                         t-test    97.6 ± 0.5   12.5 ± 3.1    0.015
Median (8) -> norm       L2        96.7 ± 0.6   22.3 ± 7.2    0.011
                         t-test    94.4 ± 0.5   37.0 ± 7.5    0.013
Table 2 Results of video-based face verification, YTF dataset (VGGNet)

Algorithm                Distance  AUC (%)      FRR@FAR=1%    Average time (s)
Distance (1)             L2        83.3 ± 0.8   85.8 ± 9.0    0.081
norm -> Distance (1)     L2        97.9 ± 0.6   23.2 ± 6.3    0.06
Medoid (5), (6)          L2        85.7 ± 1.8   86.0 ± 8.4    0.097
                         t-test    80.8 ± 1.2   83.9 ± 7.7    1.25
Medoid (5), (6) -> norm  L2        93.5 ± 1.1   25.4 ± 6.2    0.12
                         t-test    85.2 ± 0.7   69.9 ± 7.9    0.16
AvePool (7)              L2        89.1 ± 0.9   79.7 ± 7.8    1.07
                         t-test    87.4 ± 1.2   81.2 ± 5.8    0.081
norm -> AvePool (7)      L2        97.2 ± 0.6   54.4 ± 6.3    0.088
                         t-test    96.3 ± 0.7   76.9 ± 6.8    0.082
AvePool (7) -> norm      L2        98.1 ± 1.0   19.4 ± 5.9    0.073
                         t-test    97.7 ± 0.6   25.3 ± 7.9    0.078
Median (8) -> norm       L2        96.2 ± 0.7   32.4 ± 6.5    0.053
                         t-test    94.8 ± 0.7   41.1 ± 7.3    0.068
parameters were calculated: AUC (area under curve), FRR (false reject rate) at FAR (false accept rate) fixed to 1%, and the average verification time. The results are shown in Tables 1 and 2 for the Lightened CNN and VGGNet, respectively, in the format mean ± standard deviation, where the values of the standard deviation are calculated using the YTF face verification protocol. The results emphasize the need for proper normalization of the feature vectors. Both tables indicate that the most efficient algorithm is the computation of average track features with their subsequent normalization (AvePool (7) -> norm). The Lightened CNN demonstrated better results: the AUC of this method is 98.3 ± 0.7, which is 10–13% more accurate than the known face verification approaches [3, 4]. The ROC-curves for the best methods are shown in Fig. 2. The next step is the clustering experiment. Thresholds were selected by fixing different FAR values. The results are shown in Table 3. The total number of clusters is larger than the number of people in the dataset because the thresholds were selected so that the error value was very small. This means that images of one person can sometimes be split into different clusters.
Fig. 2 ROC-curves for video-based face verification task (Lightened CNN)

Table 3 Results of clustering, YTF dataset

FAR (%)   Lightened CNN                   VGGNet
          All clusters  Wrong clusters    All clusters  Wrong clusters
1         2492          35                2634          44
10        2147          162               2195          171
4 Conclusion

In this paper, the task of video data clustering was examined. We particularly focused on ways to efficiently compute the dissimilarity of video tracks by using rather simple aggregation techniques for features extracted by state-of-the-art deep CNNs. Experiments demonstrated that the most accurate and computationally cheap technique involves the L2-normalization of the average of unnormalized features of individual frames. The main direction for further research is applying our approach to organizing data from real video surveillance systems. It is also important to examine more sophisticated distances between video tracks, e.g., metric learning [20] or statistical homogeneity testing [21]. If the number of observed persons is high, it is necessary to deal with the insufficient performance of our simple online clustering by using, e.g., approximate nearest neighbor search [5, 22].

Acknowledgements The work was conducted at the Laboratory of Algorithms and Technologies for Network Analysis, National Research University Higher School of Economics and supported by RSF (Russian Science Foundation) grant 14-41-00039.
References

1. Zhang, Y.J., Lu, H.B.: A hierarchical organization scheme for video data. Pattern Recogn. 35(11), 2381–2387 (2002)
2. Sokolova, A.D., Kharchevnikova, A.S., Savchenko, A.V.: Organizing multimedia data in video surveillance systems based on face verification with convolutional neural networks. arXiv preprint arXiv:1709.05675 (2017)
3. Chen, J.C., Ranjan, R., Kumar, A., Chen, C.H., Patel, V.M., Chellappa, R.: An end-to-end system for unconstrained face verification with deep convolutional neural networks. In: IEEE International Conference on Computer Vision Workshops, pp. 118–126 (2015)
4. Li, H., Hua, G., Shen, X., Lin, Z., Brandt, J.: Eigen-PEP for video face recognition. In: Cremers, D., Reid, I., Saito, H., Yang, M.H. (eds.) Asian Conference on Computer Vision (ACCV 2014). LNCS, vol. 9005, pp. 17–33. Springer, Cham (2014)
5. Savchenko, A.V.: Deep neural networks and maximum likelihood search for approximate nearest neighbor in video-based image recognition. Opt. Memory Neural Netw. (Information Optics) 26(2), 129–136 (2017)
6. Kaufman, L., Rousseeuw, P.J.: Finding Groups in Data: An Introduction to Cluster Analysis. Wiley (2009)
7. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016)
8. Parkhi, O.M., Vedaldi, A., Zisserman, A.: Deep face recognition. In: Proceedings of the British Machine Vision Conference, pp. 6–17 (2015)
9. Wolf, L., Hassner, T., Maoz, I.: Face recognition in unconstrained videos with matched background similarity. In: IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 529–534 (2011)
10. Szeliski, R.: Computer Vision: Algorithms and Applications. Springer Science and Business Media (2010)
11. Henriques, J.F., Caseiro, R., Martins, P., Batista, J.: Exploiting the circulant structure of tracking-by-detection with kernels. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) European Conference on Computer Vision (ECCV 2012). LNCS, vol. 7575, pp. 702–715. Springer, Berlin, Heidelberg (2012)
12. Savchenko, A.V.: Clustering and maximum likelihood search for efficient statistical classification with medium-sized databases. Opt. Lett. 11(2), 329–341 (2017)
13. Wu, X., He, R., Sun, Z.: A Lightened CNN for deep face representation. arXiv:1511.02683 (2015)
14. Seltman, H.J.: Experimental Design and Analysis. Carnegie Mellon University, Pittsburgh (2012)
15. Yang, J., Ren, P., Chen, D., Wen, F., Li, H., Hua, G.: Neural aggregation network for video face recognition. arXiv:1603.05474 (2016)
16. Miech, A., Laptev, I., Sivic, J.: Learnable pooling with context gating for video classification. arXiv preprint arXiv:1706.06905 (2017)
17. Rassadin, A.G., Gruzdev, A.S., Savchenko, A.V.: Group-level emotion recognition using transfer learning from face identification. arXiv preprint arXiv:1709.01688, accepted at ACM ICMI (2017)
18. Savchenko, V.V.: Study of stationarity of the random time series using the principle of the information divergence minimum. Radiophys. Quantum Electron. 60(1), 81–87 (2017)
19. Jia, Y., et al.: Caffe: Convolutional architecture for fast feature embedding. In: Proceedings of the 22nd ACM International Conference on Multimedia, pp. 675–678 (2014)
20. Kulis, B.: Metric learning: a survey. Found. Trends Mach. Learn. 5(4), 287–364 (2013)
21. Savchenko, A.V., Belova, N.S.: Statistical testing of segment homogeneity in classification of piecewise-regular objects. Int. J. Appl. Math. Comput. Sci. 25(4), 915–925 (2015)
22. Savchenko, A.V.: Maximum-likelihood approximate nearest neighbor method in real-time image recognition. Pattern Recogn. 61, 459–469 (2017)
23. Nikitin, M.Y., Konushin, V.S., Konushin, A.S.: Neural network model for video-based face recognition with frames quality assessment. Comput. Opt. 5, 732–742 (2017)
Using Modular Decomposition Technique to Solve the Maximum Clique Problem Irina Utkina
Abstract In this article, we use the modular decomposition technique to solve the weighted maximum clique problem exactly. The proposed algorithm takes the modular decomposition tree, constructed in linear time, and finds the maximum weighted clique via a recursive tree search (DFS). We want to show that modular decomposition reduces calculation time. However, not all graphs have modules, so the article also presents algorithms for generating graphs that do. The results compare the proposed solution with Ostergard's algorithm on DIMACS benchmarks and on generated graphs.

Keywords Graphs · Maximum clique problem · Modular decomposition
1 Introduction

Today, graphs are used as a data structure in various fields, such as biology, chemistry [5, 7], data analysis, mathematics, and others. Due to the enormous expansion of information, the size of the graphs to be analyzed is increasing; they can have hundreds, thousands, and even hundreds of thousands of vertices. Since the computational time of any graph algorithm depends on the graph's size, this has become a great problem for the community. A clique C of a graph G(V, E) is a subset of vertices all of which are connected to each other (a complete subgraph). A maximum clique (MC) is a clique of maximum size, or of maximum weight if the vertices are weighted. The maximum clique problem is to find the maximum complete subgraph of G. It is an NP-complete problem [1], which is why increasing the size of the input graph increases the computational time of any exact solver exponentially. In order to avoid this, we can divide the problem into a number of smaller subproblems and solve them faster.
I. Utkina (B) Laboratory of Algorithms and Technologies for Network Analysis, National Research University Higher School of Economics, 136, Rodionova Str., Nizhny Novgorod, Russia e-mail:
[email protected] © Springer International Publishing AG, part of Springer Nature 2018 V. A. Kalyagin et al. (eds.), Computational Aspects and Applications in Large-Scale Networks, Springer Proceedings in Mathematics & Statistics 247, https://doi.org/10.1007/978-3-319-96247-4_8
There are many graph decomposition techniques for reducing a graph to smaller fragments; one of them is modular decomposition. In this article, we use the fastest algorithm for constructing the modular decomposition, proposed by Tedder et al. [8]. It creates a modular decomposition tree for any input graph in linear time. Then, we use this tree to solve the maximum clique problem on graphs from the DIMACS benchmarks using Ostergard's [6] algorithm and compare the computational time with Ostergard's algorithm without the modular decomposition technique. Also, we construct some other types of graphs, such as co-graphs and graphs of mutual simplicity. As a result, we found out that not many graphs have modules, which limited our options, but there are special types of graphs that have the needed structure, such as co-graphs and graphs of mutual simplicity. The computational time on such graphs is reduced compared to Ostergard's algorithm.
2 Modular Decomposition Algorithm

A module M of a graph G(V, E) is a subset of vertices all of which have the same neighbors outside the set. For example, in Fig. 1a, vertices a, b, c form a module, because each has the single common neighbor d. Also, vertices f and e form a module with common neighbors d and g. As seen from the example, inside a module, vertices can be connected and/or disconnected. This results in three types of modules: parallel, series, and prime. The first type describes a module where all vertices are disconnected, so it is basically an independent set, whereas the second one is characterized by fully connected vertices. Finally, the third one relates to a set in which not all vertices are connected. The modular decomposition technique suggests reducing a module to one vertex with some changed attribute, for example, a weight, label, or color, depending on the problem; after finding all modules and reducing them, we get a graph with fewer vertices. So first, we take module {a, b, c} and make one vertex abc as shown in Fig. 1b, and then reduce module {e, f} to vertex ef. As a result, the input graph has 4 vertices, as shown in Fig. 1c. Tedder et al. proposed a linear algorithm for constructing the modular decomposition tree, in which the root represents the input graph and its children represent strong modules (modules that do not overlap each other); each module then decomposes into its own strong modules, and so on; the leaves of this tree are the vertices of the input graph. An example of such a tree is shown in Diagram 1.
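The defining property of a module (every vertex outside it is adjacent to either all or none of its members) can be checked directly. This sketch, which assumes the graph is given as a dictionary mapping each vertex to its neighbor set, is only an illustration of the definition, not part of the decomposition algorithm:

```python
def is_module(adj, module):
    """Return True iff `module` is a module of the graph: every vertex
    outside it must see either all or none of its members.
    `adj` maps each vertex to the set of its neighbors."""
    m = set(module)
    for v in set(adj) - m:
        inside = adj[v] & m   # members of the module seen by v
        if inside and inside != m:
            return False      # v sees some, but not all, members
    return True
```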
Fig. 1 Graph reducing: (a) Example 1 [3]; (b) Step 1 of reducing graph size [3]; (c) Step 2 of reducing graph size [3]

Diagram 1 Example of MD tree for the graph from Example 1:
Prime(Series(a, b, c), d, Parallel(e, f), g)
3 The Maximum Clique Solver Based on the Modular Decomposition Tree

The proposed algorithm takes the modular decomposition tree as input and recursively, via depth-first search, computes the maximum clique at each level by first solving all of a node's children. See the pseudocode:

function solve(node)
    if node is a leaf then
        return this node with its weight
    end if
    if node has type parallel then
        return max(solve(children))
    end if
    if node has type series then
        return sum(solve(children))
    end if
    if node has type prime then
        subgraph = create-subgraph(children)
        return Ostergard(subgraph)
    end if
end function

There are three types of nodes: parallel, series, and prime. When the algorithm finds a node of the parallel type, it returns the maximum over the solutions for its children, because they are not connected and cannot form a clique together. When it finds a node of the series type, it returns the sum over its children, as they are all connected. If node i is of the prime type, the algorithm constructs a new graph with n_i vertices (n_i is the number of modules of node i), connects them as in the input graph, assigns them weights equal to the solutions of the MC problem on the corresponding children, and then solves this graph with some general algorithm for the weighted maximum clique problem. In our case, we use Ostergard's algorithm, because its input format makes it easy to embed in the new algorithm. Our approach was to create such an algorithm and compare it with a known solver, like Ostergard's. In this article, we show results against Ostergard's algorithm, because it is used inside the suggested approach (Fig. 2). Let us consider the graph in Fig. 1 and its modular decomposition tree from Diagram 1. The proposed algorithm goes to leaf a and returns 1, because the vertices have no weights; likewise, it returns 1 from leaves b and c; then the algorithm goes to the node of type series and returns 3 as the sum of 1, 1, and 1. At leaves d and g, it returns 1.
At the parallel node, it also returns 1, because nodes e and f have the same weight; if they had different weights, it would return the maximum. After that, at the node of type prime, it constructs a graph with 4 vertices with weights 3, 1, 1, and 1 and the structure as shown in Fig. 3, uses the solver, and obtains as a solution the maximum clique a, b, c, d with weight equal to 4.
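The recursion above can be sketched in runnable Python. The brute-force max_weight_clique below is only a stand-in for Ostergard's weighted solver used in the paper, and the tuple-based tree encoding is an assumption made for illustration:

```python
from itertools import combinations

def max_weight_clique(weights, edges):
    """Brute-force weighted maximum clique on a small quotient graph;
    a stand-in for Ostergard's exact solver used in the paper."""
    n = len(weights)
    adj = {i: set() for i in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    best = 0
    for r in range(1, n + 1):
        for subset in combinations(range(n), r):
            if all(v in adj[u] for u, v in combinations(subset, 2)):
                best = max(best, sum(weights[i] for i in subset))
    return best

def solve(node):
    """DFS over an MD tree. A node is ('leaf', weight),
    ('parallel', children), ('series', children), or
    ('prime', children, quotient_edges)."""
    kind = node[0]
    if kind == 'leaf':
        return node[1]
    if kind == 'parallel':   # children disconnected: take the best one
        return max(solve(c) for c in node[1])
    if kind == 'series':     # children fully connected: sum them
        return sum(solve(c) for c in node[1])
    # prime: solve children, then weighted MC on the quotient graph
    weights = [solve(c) for c in node[1]]
    return max_weight_clique(weights, node[2])
```

On the worked example (Diagram 1, unit weights), the quotient graph at the prime root has weights 3, 1, 1, 1 and the solver returns 4.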
Fig. 2 Structure of the algorithm (* Tedder, Corneil, Habib, and Paul, "Simpler Linear-Time Modular Decomposition via Recursive Factorizing Permutations"; ** Patric R. J. Ostergard)
4 Results on DIMACS Benchmarks

After running the algorithm of Tedder et al. on the DIMACS benchmarks, it was found that only a few graphs have modules, so we compare results only on them; see Table 1. As can be seen from the results table, the proposed algorithm is faster only on c-fat200-5, and after analyzing the MD tree of this graph, we found that for this particular graph the MD tree contains only parallel and series types of nodes. Hence, we decided to generate such structures to test on them.
Table 1 Results on DIMACS benchmarks

Graph        My algorithm (s)  MD (s)        My + MD (s)   Ostergard (s)
c-fat200-1   0.000118          0.068662612   0.068780612   0.001059
c-fat200-2   0                 0.104443933   0.104443933   0.001371
c-fat200-5   0                 0.226667621   0.226667621   2.61894
c-fat500-1   0.015956          0.147924545   0.163880545   0.002184
c-fat500-10  0.006224          0.680824483   0.687048483   0.011369
c-fat500-2   0.011647          0.216331646   0.227978646   0.003734
c-fat500-5   0.009128          0.324011131   0.333139131   0.006868
5 Algorithms for Generating Graphs with Modules

In this article, two algorithms are proposed to generate graphs whose MD tree has a special structure: it contains only parallel and series nodes. This is needed to avoid constructing subgraphs and calling Ostergard's algorithm [6] inside the solver. Such structures do not appear in many graphs, so we need to generate them to test our theory.
5.1 Graphs of Mutual Simplicity

A graph of mutual simplicity is generated as follows: we connect two vertices i and j only if their greatest common divisor equals 1; see, for example, Fig. 3. The MD tree for this graph is shown in Diagram 2.
Fig. 3 Graph of mutual simplicity with eight nodes
Diagram 2 MD tree for graph of mutual simplicity with eight vertices
Series(1, 5, 7, Parallel(6, Series(3, Parallel(2, 4, 8))))
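Such graphs are straightforward to generate; a minimal sketch, assuming vertices are labeled 1..n and the graph is returned as an edge list:

```python
from math import gcd

def mutual_simplicity_graph(n):
    """Edges of the graph on vertices 1..n in which i and j are
    connected iff gcd(i, j) == 1."""
    return [(i, j) for i in range(1, n + 1)
                   for j in range(i + 1, n + 1) if gcd(i, j) == 1]
```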
5.2 Co-Graphs

Co-graphs are another type of graph that has the necessary structure. In this case, we build the MD tree by recursively partitioning the given vertices into modules and randomly assigning the type of each node as parallel or series. The algorithm works as follows:

function partition(n)
    parts = []
    while n > 0 do
        p = randint(1, n)
        parts.append(p)
        n = n - p
    end while
    return parts
end function

function CreateCoGraph(n)
    if n == 1 then
        return a single vertex
    end if
    parts = partition(n)
    graph = empty graph
    for part in parts do
        subgraph = CreateCoGraph(part)
        graph.add(subgraph)
    end for
    if randint(0, 1) == 1 then
        connect all vertices of different subgraphs (series node)
    end if
    return graph
end function
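The generator can be sketched in runnable form; the flattened vertex/edge-list representation is an assumption, since the paper does not give implementation details:

```python
import random

def create_cograph(n):
    """Recursively build a random co-graph on n vertices, returning
    (vertices, edges). Each recursion level randomly splits its
    vertices into groups and, with probability 1/2, joins all pairs
    of vertices from different groups (a series node); otherwise it
    adds no cross edges (a parallel node)."""
    counter = [0]   # next fresh vertex label

    def build(k):
        if k == 1:
            v = counter[0]
            counter[0] += 1
            return [v], []
        parts = []                       # random composition of k
        while k > 0:
            p = random.randint(1, k)
            parts.append(p)
            k -= p
        verts, edges, groups = [], [], []
        for p in parts:
            gv, ge = build(p)
            groups.append(gv)
            verts += gv
            edges += ge
        if random.randint(0, 1):         # series node: join the groups
            for i in range(len(groups)):
                for j in range(i + 1, len(groups)):
                    edges += [(u, w) for u in groups[i] for w in groups[j]]
        return verts, edges              # parallel node: no cross edges

    return build(n)
```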
6 Results on Generated Graphs

6.1 Graphs of Mutual Simplicity

Table 2 presents the results of solving the maximum clique problem. The first column contains the graphs' sizes; we used graphs with 100–2450 vertices. The second column
Table 2 Results on graphs of mutual simplicity

Size   My algorithm (s)   MD (s)         My + MD (s)   Ostergard (s)
100    0                  0.06906381     0.06906       0.001743
150    0                  0.160086907    0.16009       0.004758
200    0                  0.311571475    0.31157       0.006921
1000   0.00015            0.830395642    0.83055       300.001
1050   0.000132           2.77094437     2.77108       0.709009
1100   0.000149           2.962479483    2.96263       300
1150   0.00014            1.005088013    1.00523       300
1200   0.000178           1.102902136    1.10308       300
1250   0.000163           3.624508829    3.62467       300
1300   0.000237           6.191885617    6.19212       300
1350   0.000169           3.158776487    3.15895       300
1400   0.000192           0.448514811    0.44871       300
1450   0.000263           5.838179619    5.83844       300
1500   0.000273           8.211194675    8.21147       300
1550   0.000358           1.049262402    1.04962       300
1600   0.000262           1.861131225    1.86139       300
1650   0.000268           2.108732494    2.10900       300
1700   0.000283           13.137433843   13.13772      300
1750   0.000284           6.082156016    6.08244       300
1800   0.000295           2.649852882    2.65015       300
1850   0.00024            1.641608819    1.64185       300
1900   0.000331           22.834692949   22.83502      300
1950   0.000284           15.864648365   15.86493      300
2050   0.000315           19.971808447   19.97212      300
2100   0.000323           6.728151243    6.72847       300
2200   0.000331           3.152012792    3.15234       300
2250   0.000347           3.20890262     3.20925       300
2300   0.000519           26.991591521   26.99211      300
2350   0.000395           26.367774638   26.36817      300
2400   0.000368           17.514787453   17.51516      300
2450   0.00046            41.651295783   41.65176      300
shows the time taken by the proposed algorithm to find the maximum clique using the MD tree. The next column gives the time for constructing the MD tree by the algorithm of Tedder et al. [8]. The fourth is the sum of the two previous columns. The last column gives the time of Ostergard's algorithm [6]. As can be seen from this table, as the graphs' size increases, the proposed algorithm works faster than Ostergard's algorithm.
Using Modular Decomposition Technique …

Table 3 Results on co-graphs (clique found and running time)

Size  My algorithm (clique, s)   MD(s)          My+MD(s)      Ostergard (clique, s)
500   107 vertices, 0.000119     0.599508522    0.599627522   107 vertices, 5.19397
550   219 vertices, 0.000128     1.126792247    1.126920247   219 vertices, 0.420204
600   206 vertices, 0.000123     1.613567235    1.613690235   206 vertices, 0.385463
650   143 vertices, 0.000124     1.079984486    1.080108486   37 vertices, 300
700   197 vertices, 0.000151     1.49792208     1.49807308    197 vertices, 0.10201
750   279 vertices, 0.000184     2.111751452    2.111935452   279 vertices, 0.096327
800   92 vertices, 0.00017       0.592107428    0.592277428   37 vertices, 300
850   155 vertices, 0.000172     1.634152987    1.634324987   29 vertices, 300
900   134 vertices, 0.00016      1.25222059     1.25238059    36 vertices, 300
1000  241 vertices, 0.00018      3.105684402    3.105864402   241 vertices, 4.2414
1050  338 vertices, 0.000162     4.151108273    4.151270273   116 vertices, 300
1100  329 vertices, 0.000182     4.811745176    4.811927176   329 vertices, 19.4748
1150  408 vertices, 0.000139     5.756633889    5.756772889   408 vertices, 0.302615
1200  362 vertices, 0.000165     5.614880477    5.615045477   362 vertices, 2.85711
1250  151 vertices, 0.000193     2.125359674    2.125552674   54 vertices, 300
1300  292 vertices, 0.00019      5.384282824    5.384472824   292 vertices, 12.8855
1350  178 vertices, 0.000214     2.035435868    2.035649868   63 vertices, 300
1400  216 vertices, 0.000186     2.634297357    2.634483357   32 vertices, 300
1450  279 vertices, 0.00026      5.657872455    5.658132455   35 vertices, 300
1500  377 vertices, 0.000217     9.239033114    9.239250114   377 vertices, 13.7176
1550  238 vertices, 0.000268     6.540190934    6.540458934   238 vertices, 19.8803
1600  409 vertices, 0.000262     10.753900639   10.75416264   409 vertices, 48.6752
1650  278 vertices, 0.00028      10.456971305   10.45725131   41 vertices, 300
1700  229 vertices, 0.000327     10.474212124   10.47453912   30 vertices, 300
1800  543 vertices, 0.000345     21.152246751   21.15259175   47 vertices, 300
1850  494 vertices, 0.000296     18.978203233   18.97849923   43 vertices, 300
1900  263 vertices, 0.000303     5.338127994    5.338430994   103 vertices, 300
1950  355 vertices, 0.000308     17.914419038   17.91472704   44 vertices, 300
6.2 Co-Graphs

Table 3 shows similar results on co-graphs with 500–1950 vertices. In addition to the computational time, we present the size of the clique that was found. As can be seen, Ostergard's algorithm sometimes could not find the maximum clique within 300 s, so in the last column of the table it reports fewer vertices in the maximum clique than the proposed algorithm.
7 Conclusion

As can be seen from the results, the proposed algorithm works faster on graphs without prime nodes in the MD tree. This is because, at a prime node, it is necessary to construct a subgraph and call different solvers on it to compute the maximum clique. One can also notice that the construction of the MD tree takes a significant share of the computation time, even though that algorithm is linear.

Acknowledgements The author is grateful to Dmitry Malyshev for the problem statement and fruitful discussions.
References 1. Berman, P., Schnitger, G.: On the complexity of approximating the independent set problem. Lecture Notes in Computer Science, vol. 349, pp. 256–267. Springer (1989) 2. Bomze, I.M. et al.: The maximum clique problem. Handbook of Combinatorial Optimization, pp. 1–74. Springer, U.S. (1999)
3. Gagneur, J., Krause, R., Bouwmeester, T., Casari, G.: Modular decomposition of protein–protein interaction networks. Genome Biol. 5, R57 (2004) 4. Habib, M., Paul, C.: A survey on algorithmic aspects of modular decomposition. Comput. Sci. Rev. 4, 41–59 (2010) 5. Kuhl, F.S., Crippen, G.M., Friesen, D.K.: A combinatorial algorithm for calculating ligand binding. J. Comput. Chem. 5(1), 24–34 (1983) 6. Ostergard, P.R.J.: A fast algorithm for the maximum clique problem. Discret. Appl. Math. 120(1–3), 197–207 (2002) 7. Rhodes, N., Willett, P., Calvet, A., Dunbar, J.B., Christine, H.: CLIP: similarity searching of 3D databases using clique detection. J. Chem. Inf. Comput. Sci. 43(2), 443–448 (2003) 8. Tedder, M., Corneil, D., Habib, M., Paul, C.: Simpler linear-time modular decomposition via recursive factorizing permutations. In: 35th International Colloquium on Automata, Languages and Programming, ICALP 2008, Part 1, LNCS, vol. 5125, pp. 634–645. Springer (2008)
Part II
Network Models
Robust Statistical Procedures for Testing Dynamics in Market Network A. P. Koldanov and M. A. Voronina
Abstract Market network analysis has attracted growing attention in recent decades. One of the most important problems related to it is the detection of dynamics in a market network. In the present paper, the market network of stock returns is considered. The probability of sign coincidence of stock returns is used as the measure of similarity between stocks. A robust (distribution-free) multiple testing statistical procedure for testing the dynamics of the network is proposed. The constructed procedure is applied to the German, French, UK, and USA markets. It is shown that in most cases where the dynamics is observed, it is determined by a small number of hubs in the associated rejection graph. Keywords Stock returns · Probability of sign coincidence · Sign correlation · Stock markets · Uniformly most powerful test · Multiple hypothesis testing · Bonferroni correction
1 Introduction

Market network analysis has attracted growing attention in recent decades. A market network is a complete weighted graph in which the nodes correspond to stocks and the weights of edges between nodes are given by a measure of similarity (dependence) of stock returns [2]. Most publications on market network analysis deal with the Pearson correlation as the measure of similarity. The main drawback of the Pearson correlation is its weak robustness to failure of the assumption of normality of the joint distribution of stock returns. A different measure of similarity, sign similarity (probability of sign coincidence), was proposed in [1]. It was shown in [4] that the statistical procedure of A. P. Koldanov · M. A. Voronina (B) Laboratory of Algorithms and Technologies for Network Analysis, National Research University Higher School of Economics, Nizhny Novgorod, Russia e-mail:
[email protected] M. A. Voronina e-mail:
[email protected] © Springer International Publishing AG, part of Springer Nature 2018 V. A. Kalyagin et al. (eds.), Computational Aspects and Applications in Large-Scale Networks, Springer Proceedings in Mathematics & Statistics 247, https://doi.org/10.1007/978-3-319-96247-4_9
network structure identification based on this measure is distribution-free (robust) in the class of elliptical distributions of stock returns. One of the most important problems in market network analysis is the detection of dynamics in a market network. The experimental study of the dynamics of the USA market network was initiated in [3]. A statistical approach to the detection of the dynamics of the Pearson correlation market network was proposed in [7]. This approach was applied to the French and German stock markets. In the present paper, we propose a statistical approach to the detection of the dynamics of the sign similarity market network. This approach gives distribution-free (robust) statistical procedures, and it is applied to the USA, UK, French, and German stock markets. Stock returns over the period 2003–2014 are considered. The hypothesis of stability of sign connections for the whole market is rejected for the period 2003–2014. However, a detailed analysis showed that the rejection of hypotheses is associated with a few stocks. To visualize the results, we use the notion of the rejection graph, introduced in [5]. It is shown that in most cases where the dynamics is observed, it is determined by a small number of hubs in the associated rejection graph.
2 Methodology

We model the daily stock returns on a financial market by random variables R_i^k(t), i = 1, . . . , N, where N is the number of stocks, k is the year of observation (k = 2003, . . . , 2014), and t is the day of observation, t = 1, 2, . . . , n_k, where n_k is the number of observations for the year k. Let M be the number of years of observations (in our case M = 12). Suppose that R_i^k(t), t = 1, . . . , n_k are i.i.d. random variables distributed as R_i^k. The sign similarity of stock returns i and j for the year k is defined by

  p_{i,j}^k = P((R_i^k − E(R_i^k))(R_j^k − E(R_j^k)) > 0).   (1)

Introduce the set of individual hypotheses (k, l = 2003, . . . , 2014, k ≠ l; i, j = 1, . . . , N, i ≠ j):

  h_{i,j}^{k,l}: p_{i,j}^k = p_{i,j}^l  vs  k_{i,j}^{k,l}: p_{i,j}^k ≠ p_{i,j}^l.   (2)

We say that there is no significant difference between the sign similarity market networks for the years k and l if all hypotheses h_{i,j}^{k,l} are accepted. We say that the sign similarity market networks for the years k and l are different (dynamics is observed) if at least one hypothesis h_{i,j}^{k,l} is rejected (i.e., there exists a pair of stocks (i, j) such that the hypothesis h_{i,j}^{k,l} is rejected). For a more detailed investigation, we introduce the rejection graph for the years (k, l). The rejection graph has N nodes; an edge (i, j) is included in the rejection graph if the hypothesis h_{i,j}^{k,l} is rejected. The rejection graph gives insights for a detailed understanding of market network dynamics. Let S^k be the matrix of sign similarities of stock returns for the year k, S^k = (p_{i,j}^k). The overall hypothesis for the period of M years from 2003 to 2014 has the form

  H_0: S^{2003} = S^{2004} = · · · = S^{2014}
To detect the dynamics of the market network, we use some multiple testing statistical procedures. To test the individual hypotheses (2), we apply the test of comparison of two binomial distributions, and to test the overall hypothesis H_0 we use the Bonferroni correction [6], which is known to control the Family Wise Error Rate (FWER, the probability of at least one type I error). To test the individual hypotheses (2), we use the following estimate of the probability (1) from the observations:

  I_{i,j}^k(t) = { 1 if sign((r_i^k(t) − r̄_i^k)(r_j^k(t) − r̄_j^k)) > 0;
                   0 if sign((r_i^k(t) − r̄_i^k)(r_j^k(t) − r̄_j^k)) ≤ 0 },

where r̄_i^k = (1/n_k) Σ_{t=1}^{n_k} r_i^k(t). The random variable T_{i,j}^k = Σ_{t=1}^{n_k} I_{i,j}^k(t) has the binomial distribution b(n_k, p_{i,j}^k). To test the individual hypotheses (2), we apply the test φ(T_{i,j}^k, T_{i,j}^l) [6]:

  φ(T_{i,j}^k, T_{i,j}^l) = { 1 if U_{i,j} < c_1(V_{i,j}) or U_{i,j} > c_2(V_{i,j});
                              0 if c_1(V_{i,j}) ≤ U_{i,j} ≤ c_2(V_{i,j}) },

where U_{i,j} = Σ_{t=1}^{n_k} I_{i,j}^k(t) and V_{i,j} = Σ_{t=1}^{n_k} I_{i,j}^k(t) + Σ_{t=1}^{n_l} I_{i,j}^l(t). It can be shown that the test φ(T_{i,j}^k, T_{i,j}^l) is the UMPU test when the constants c_1, c_2 are defined from the equations:

  P(T_{i,j}^k < c_1 | T_{i,j}^k + T_{i,j}^l = ξ) = Σ_{z=0}^{c_1−1} C_{n_k}^z C_{n_l}^{ξ−z} / C_{n_k+n_l}^ξ ≤ α/2,

  P(T_{i,j}^k > c_2 | T_{i,j}^k + T_{i,j}^l = ξ) = Σ_{z=c_2+1}^{n_k+n_l} C_{n_k}^z C_{n_l}^{ξ−z} / C_{n_k+n_l}^ξ ≤ α/2.
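Conditionally on V_{i,j} = ξ and under the hypothesis h_{i,j}^{k,l}, the distribution of T_{i,j}^k is hypergeometric, so the cutoffs c_1, c_2 are its α/2 tail quantiles. The two-sample test above can then be computed with the standard library alone. A sketch (function names are mine; the integer cutoffs omit the randomization needed for exact UMPU optimality, giving a slightly conservative test):

```python
from math import comb

def sign_indicators(r_i, r_j):
    """I_{ij}(t): 1 when the centered returns of stocks i and j share a sign."""
    mean_i = sum(r_i) / len(r_i)
    mean_j = sum(r_j) / len(r_j)
    return [1 if (a - mean_i) * (b - mean_j) > 0 else 0 for a, b in zip(r_i, r_j)]

def critical_values(n_k, n_l, xi, alpha=0.05):
    """Largest c1 with P(T < c1 | V = xi) <= alpha/2 and smallest c2 with
    P(T > c2 | V = xi) <= alpha/2, under the hypergeometric conditional law."""
    denom = comb(n_k + n_l, xi)

    def pmf(z):
        if 0 <= z <= n_k and 0 <= xi - z <= n_l:
            return comb(n_k, z) * comb(n_l, xi - z) / denom
        return 0.0

    c1, lower = 0, 0.0
    while lower + pmf(c1) <= alpha / 2:   # accumulate the lower tail
        lower += pmf(c1)
        c1 += 1
    c2, upper = min(n_k, xi), 0.0
    while upper + pmf(c2) <= alpha / 2:   # accumulate the upper tail
        upper += pmf(c2)
        c2 -= 1
    return c1, c2

def reject(t_k, t_l, n_k, n_l, alpha=0.05):
    """phi(T^k, T^l) = 1 iff U < c1(V) or U > c2(V), with U = T^k, V = T^k + T^l."""
    c1, c2 = critical_values(n_k, n_l, t_k + t_l, alpha)
    return t_k < c1 or t_k > c2
```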
3 Experimental Results

We apply the described methodology to detect the dynamics of the market network for the German, French, UK, and USA stock markets over the 12-year period from 2003 to 2014.

Germany

We selected 91 German companies traded on 250 days in each year of the period from 2003 to 2014. Following the Bonferroni correction, we tested each individual hypothesis (2) at the significance level 0.05/(C_12^2 C_91^2) = 1.85 × 10^{−7}. The results are presented in Table 1.
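The stated individual level is easy to verify with a quick stdlib check:

```python
from math import comb

# Bonferroni: the overall 0.05 level split over all C(12,2) pairs of years
# and all C(91,2) pairs of stocks
level = 0.05 / (comb(12, 2) * comb(91, 2))
print(f"{level:.3g}")  # prints 1.85e-07
```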
138
A. P. Koldanov and M. A. Voronina
Table 1 Results. Germany

      2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014
2003    0    1    1    1    1    1    1    1    1    1    1    1
2004    1    0    0    0    1    1    1    1    1    1    1    1
2005    1    0    0    0    0    1    1    1    1    1    1    1
2006    1    0    0    0    0    1    1    1    1    1    1    1
2007    1    1    0    0    0    1    1    1    1    1    1    1
2008    1    1    1    1    1    0    0    1    1    1    1    0
2009    1    1    1    1    1    0    0    1    1    1    1    1
2010    1    1    1    1    1    1    1    0    1    1    0    1
2011    1    1    1    1    1    1    1    1    0    0    0    0
2012    1    1    1    1    1    1    1    1    0    0    0    0
2013    1    1    1    1    1    1    1    0    0    0    0    0
2014    1    1    1    1    1    0    1    1    0    0    0    0

Table 2 Results. France

      2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014
2003    0    1    1    1    1    1    1    1    1    1    1    1
2004    1    0    0    1    1    1    1    1    1    1    1    1
2005    1    0    0    1    1    1    1    1    1    1    1    1
2006    1    1    1    0    1    0    1    1    1    1    1    1
2007    1    1    1    1    0    0    1    1    1    1    1    1
2008    1    1    1    0    0    0    0    1    1    0    1    0
2009    1    1    1    1    1    0    0    1    1    1    1    0
2010    1    1    1    1    1    1    1    0    1    1    1    1
2011    1    1    1    1    1    1    1    1    0    1    1    0
2012    1    1    1    1    1    0    1    1    1    0    1    1
2013    1    1    1    1    1    1    1    1    1    1    0    1
2014    1    1    1    1    1    0    0    1    0    1    1    0
The entry (k, l) of the table has value 0 if the hypothesis S^k = S^l is accepted, and value 1 if the hypothesis S^k = S^l is rejected (we reject the hypothesis if there is a pair of stocks (i, j) such that h_{i,j}^{k,l} is rejected). As one can see, for most pairs of years (k, l) the stability hypothesis S^k = S^l is rejected.

France

We selected 100 French companies traded on 250 days in each year of the period from 2003 to 2014. Following the Bonferroni correction, we tested each individual hypothesis (2) at the significance level 0.05/(C_12^2 C_100^2) = 1.53 × 10^{−7}. The results are presented in Table 2.
Table 3 Results. UK

      2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014
2003    0    1    1    1    1    1    1    1    1    1    1    1
2004    1    0    0    0    0    1    1    1    1    1    1    1
2005    1    0    0    0    0    1    1    1    1    0    1    1
2006    1    0    0    0    0    1    1    1    1    0    1    0
2007    1    0    0    0    0    1    1    1    1    0    0    0
2008    1    1    1    1    1    0    0    0    0    0    0    0
2009    1    1    1    1    1    0    0    0    1    0    0    1
2010    1    1    1    1    1    0    0    0    0    0    0    0
2011    1    1    1    1    1    0    1    0    0    0    0    0
2012    1    1    0    0    0    0    0    0    0    0    0    0
2013    1    1    1    1    0    0    0    0    0    0    0    0
2014    1    1    1    0    0    0    1    0    0    0    0    0
The entry (k, l) of the table has value 0 if the hypothesis S^k = S^l is accepted, and value 1 if it is rejected (we reject the hypothesis if there is a pair of stocks (i, j) such that h_{i,j}^{k,l} is rejected). As one can see, the situation is similar to the German stock market: for most pairs of years (k, l) the stability hypothesis S^k = S^l is rejected.

UK

We selected 91 UK companies traded on 250 days in each year of the period from 2003 to 2014. Following the Bonferroni correction, we tested each individual hypothesis (2) at the significance level 0.05/(C_12^2 C_91^2) = 1.85 × 10^{−7}. The results are presented in Table 3. The entry (k, l) of the table has value 0 if the hypothesis S^k = S^l is accepted, and value 1 if it is rejected (we reject the hypothesis if there is a pair of stocks (i, j) such that h_{i,j}^{k,l} is rejected). As one can see, the situation differs from the German and French stock markets: there are many pairs of years (k, l) for which the stability hypothesis S^k = S^l is confirmed.

USA

We selected 98 USA companies traded on 250 days in each year of the period from 2003 to 2014. Following the Bonferroni correction, we tested each individual hypothesis (2) at the significance level 0.05/(C_12^2 C_98^2) = 1.594 × 10^{−7}. The results are presented in Table 4. The entry (k, l) of the table has value 0 if the hypothesis S^k = S^l is accepted, and value 1 if it is rejected (we reject the hypothesis if there is a pair of stocks (i, j) such that h_{i,j}^{k,l} is rejected). As one can see, the situation differs from the German and French stock markets and is close to the UK market: there are many more pairs of years (k, l) for which the stability hypothesis S^k = S^l is confirmed. The USA market is the most stable in comparison with the German, French, and UK markets.
Table 4 Results. USA

      2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014
2003    0    1    1    1    1    1    1    1    1    1    1    1
2004    1    0    0    0    0    1    1    1    1    1    1    1
2005    1    0    0    0    0    1    1    1    1    0    1    1
2006    1    0    0    0    0    1    1    1    1    0    1    0
2007    1    0    0    0    0    1    1    1    1    0    0    0
2008    1    1    1    1    1    0    0    0    0    0    0    0
2009    1    1    1    1    1    0    0    0    1    0    0    1
2010    1    1    1    1    1    0    0    0    0    0    0    0
2011    1    1    1    1    1    0    1    0    0    0    0    0
2012    1    1    0    0    0    0    0    0    0    0    0    0
2013    1    1    1    1    0    0    0    0    0    0    0    0
2014    1    1    1    0    0    0    1    0    0    0    0    0
4 Rejection Graph

The results of the previous section are in some sense disappointing: market network stability is not observed. At the same time, one does not observe a clear tendency in the instability of the market networks. To gain insight and better understand the situation, we use the notion of the rejection graph for the years (k, l). The rejection graph has N nodes; an edge (i, j) is included in the rejection graph if the hypothesis h_{i,j}^{k,l} is rejected. The figures below represent the rejection graphs for the German market network for the pairs of years 2003–2004, 2003–2005, 2003–2006, …, 2003–2014.
Fig. 1 Rejection graphs. Left: rejection graph for the years 2003 and 2004. Right: rejection graph for the years 2003 and 2005
Fig. 2 Rejection graphs. Left: rejection graph for the years 2003 and 2006. Center: rejection graph for the years 2003 and 2007. Right: rejection graph for the years 2003 and 2008
Fig. 3 Rejection graphs. Left: rejection graph for the years 2003 and 2009. Center: rejection graph for the years 2003 and 2010. Right: rejection graph for the years 2003 and 2011
Fig. 4 Rejection graphs. Left: rejection graph for the years 2003 and 2012. Center: rejection graph for the years 2003 and 2013. Right: rejection graph for the years 2003 and 2014
One can observe a surprising phenomenon: each rejection graph includes a small number of hubs, and if one deletes these hubs, the stability hypothesis is no longer rejected. This phenomenon needs further investigation (Figs. 1, 2, 3 and 4).
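This hub check is easy to mechanize: build the rejection graph from the rejected pairs, repeatedly delete the highest-degree node, and watch how many rejected edges survive. A small illustrative sketch (names are mine, not from the paper):

```python
from collections import defaultdict

def rejection_graph(rejected_pairs):
    """Adjacency sets of the rejection graph: one edge per rejected hypothesis."""
    adj = defaultdict(set)
    for i, j in rejected_pairs:
        adj[i].add(j)
        adj[j].add(i)
    return dict(adj)

def remove_hubs(adj, max_hubs=5):
    """Greedily delete the highest-degree node and record how many rejected
    edges survive after each deletion."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}
    surviving = []
    for _ in range(max_hubs):
        if not any(adj.values()):      # no edges left
            break
        hub = max(adj, key=lambda v: len(adj[v]))
        for nbr in adj.pop(hub):
            adj[nbr].discard(hub)
        surviving.append(sum(len(nbrs) for nbrs in adj.values()) // 2)
    return surviving
```

If the surviving edge count drops to zero after a few deletions, the instability between the two years is attributable to those few hub stocks.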
5 Concluding Remarks

The paper describes a methodology for testing the dynamics of stock market networks. This methodology is applied to the German, French, UK, and USA stock markets. Some instability of the networks is observed. It is shown that this instability
is connected with a few stocks of the market. This phenomenon will be a subject of future investigations. Acknowledgements The work is partially supported by RFHR grant 15-32-01052.
References 1. Bautin, G.A., Kalyagin, V.A., Koldanov, A.P., Koldanov, P.A., Pardalos, P.M.: Simple measure of similarity for the market graph construction. Comput. Manag. Sci. 10, 105–124 (2013) 2. Boginski, V., Butenko, S., Pardalos, P.M.: Statistical analysis of financial networks. J. Comput. Stat. Data Anal. 48(2), 431–443 (2005) 3. Boginski, V., Butenko, S., Pardalos, P.M.: Mining market data: a network approach. J. Comput. Oper. Res. 33(11), 3171–3184 (2006) 4. Kalyagin, V.A., Koldanov, A.P., Koldanov, P.A.: Robust identification in random variables networks. J. Stat. Plan. Inference 181, 30–40 (2017) 5. Koldanov, P.A., Lozgacheva, N.N.: Multiple testing of sign symmetry for stock return distributions. Int. J. Theor. Appl. Finance (2016) 6. Lehmann, E.L., Romano, J.P.: Testing Statistical Hypotheses. Springer, New York (2005) 7. Voronina, M.A., Koldanov, P.A.: Stability testing of stock returns connections. In: Springer Proceedings in Mathematics and Statistics, vol. 197, pp. 163–174 (2017)
Application of Market Models to Network Equilibrium Problems Igor Konnov
Abstract We present a general two-side market model with divisible commodities and price functions of participants. A general existence result on unbounded sets is obtained from its variational inequality reformulation. We describe an extension of the network flow equilibrium problem with elastic demands and a new equilibrium type model for resource allocation problems in wireless communication networks, which appear to be particular cases of the general market model. This enables us to obtain new existence results for these models as some adjustments of that for the market model. Under certain additional conditions, the general market model can be reduced to a decomposable optimization problem where the goal function is the sum of two functions and one of them is convex separable, whereas the feasible set is the corresponding Cartesian product. We discuss some versions of the partial linearization method, which can be applied to these network equilibrium problems. Keywords Market models · Divisible commodities · Price functions · Variational inequality · Existence results · Partial linearization · Componentwise steps Network flow equilibria · Elastic demands · Wireless communication networks
1 Introduction Investigation of complex systems with active elements having their own interests and sets of actions is usually based on a suitable equilibrium concept. Such a concept should equilibrate different interests and opportunities of the elements (agents, participants) and provide ways of its proper implementation within some accepted basic (information) behavior framework of the system under investigation. For instance, the classical perfectly (Walrasian) and imperfectly (Cournot– Bertrand) competitive models, which are most popular in economics (see e.g. [1, 2] I. Konnov (B) Kazan Federal University, Kremlevskaya 18, Kazan 420008, Russia e-mail:
[email protected] URL: http://kpfu.ru/Igor.Konnov?p_lang=2 © Springer International Publishing AG, part of Springer Nature 2018 V. A. Kalyagin et al. (eds.), Computational Aspects and Applications in Large-Scale Networks, Springer Proceedings in Mathematics & Statistics 247, https://doi.org/10.1007/978-3-319-96247-4_10
and references therein), reflect different equilibration mechanisms and information frameworks. We recall that the actions of any separate agent within a perfect competition model cannot impact the state of the whole system; hence, any agent may utilize some integral system parameters (say, prices) rather than information about the behavior of other agents. On the contrary, the actions of any separate agent in an imperfectly competitive model can change the state of the whole system. Therefore, the model is formulated as a (non-cooperative) game problem and is usually based on the well-known Nash equilibrium concept [3]. Nevertheless, real systems (markets) may exhibit a wide variety of these features and different information frameworks. Hence, flexible equilibrium models could also be very useful for the derivation of efficient decisions in complex systems. In this paper, we consider a general two-side market model with divisible commodities and price functions of participants. It is based on the auction market models proposed in [4, 5], where the equivalence result with a variational inequality problem was established. Afterwards, some extensions to the multi-commodity case and applications to resource allocation in telecommunication networks were suggested in [6, 7]. The alternative equilibrium concept related to this model was proposed in [8], where it was also shown that the same equilibrium state can be attained within different mechanisms and information exchange schemes, including the completely decentralized competitive mechanism. We now suggest a somewhat more general class of market equilibrium models, which follows the approach from [8]. It is subordinated to the material balance condition and can also be formulated as a variational inequality problem; hence, one can utilize the well-developed theory and methods of variational inequalities for the investigation and solution finding of this equilibrium model.
We give a new existence result for the model in the case where the feasible set is unbounded. Besides, under certain integrability conditions, the model can be also reduced to an optimization problem. We suggest a new cyclic version of the partial linearization method for its decomposable case. We describe extensions of the known network flow equilibrium problem with elastic demands and a resource allocation problem in wireless communication networks and show they are particular cases of the presented market model. This enables us to obtain new existence results for these models and to solve these problems with the partial linearization method.
2 A General Multi-commodity Market Equilibrium Model

We start our considerations from a general market model with n divisible commodities, which somewhat extends those in [6, 8]; see also [9] for the vector model. For each commodity s, each trader i chooses some offer value x_is in his/her capacity segment [α′_is, α″_is] and has a price function g_is. Similarly, each buyer j chooses some bid value y_js in his/her capacity segment [β′_js, β″_js] and has a price function h_js. We denote by I_s and J_s the finite index sets of traders and buyers attributed to commodity s and set N = {1, . . . , n}. Clearly, each trader/buyer can be attributed
to many commodities. We suppose that the prices may in principle depend on all the bid/offer volumes of all the commodities. That is, if we set x_(s) = (x_is)_{i∈I_s}, x = (x_(s))_{s∈N}, y_(s) = (y_js)_{j∈J_s}, y = (y_(s))_{s∈N}, and w = (x, y), then g_is = g_is(w) and h_js = h_js(w). Let b_s denote the value of the external excess demand for commodity s, and b = (b_s)_{s∈N}. If it equals zero, the market is closed. Any market solution must satisfy the balance equation, hence we obtain the feasible set of offer/bid values

  W = Π_{s∈N} W_s, where

  W_s = { w_(s) = (x_(s), y_(s)) | Σ_{i∈I_s} x_is − Σ_{j∈J_s} y_js = b_s;
          x_is ∈ [α′_is, α″_is], i ∈ I_s; y_js ∈ [β′_js, β″_js], j ∈ J_s },

for s ∈ N. A vector w̄ = (x̄, ȳ) ∈ W is said to be a market equilibrium point if there exists a price vector p̄ = (p̄_s)_{s∈N} such that

  g_is(w̄) { ≥ p̄_s if x̄_is = α′_is; = p̄_s if x̄_is ∈ (α′_is, α″_is); ≤ p̄_s if x̄_is = α″_is },  for i ∈ I_s;   (1)

and

  h_js(w̄) { ≤ p̄_s if ȳ_js = β′_js; = p̄_s if ȳ_js ∈ (β′_js, β″_js); ≥ p̄_s if ȳ_js = β″_js },  for j ∈ J_s;   (2)

for s ∈ N.

We now give the basic relation between the market equilibrium problem (1)–(2) and a variational inequality (VI, for short). Its proof is almost the same as that in [6, Theorem 2.1] and is omitted.

Proposition 1 (a) If (w̄, p̄) satisfies (1)–(2) for s ∈ N and w̄ ∈ W, then w̄ solves the VI: find w̄ ∈ W such that

  Σ_{s∈N} [ Σ_{i∈I_s} g_is(w̄)(x_is − x̄_is) − Σ_{j∈J_s} h_js(w̄)(y_js − ȳ_js) ] ≥ 0  ∀w ∈ W.   (3)

(b) If a vector w̄ solves VI (3), then there exists p̄ ∈ R^n such that (w̄, p̄) satisfies (1)–(2) for s ∈ N.

The presence of the price functions is motivated by the complexity of the whole system: the price functions may contain participants' intentions or reflect the interdependence (mutual influence) of the elements, which need not be known to the participants. It follows from Proposition 1 that we can establish existence results for equilibrium problems of form (1)–(2) by using suitable results from the theory of VIs or more general equilibrium problems. For instance, if the feasible set W is bounded and
the cost mapping of VI (3) is continuous, then the equilibrium problem (1)–(2) has a solution. In the unbounded case, we need certain coercivity assumptions. We follow the approach from [8, 9] and consider for simplicity the case where all the lower bounds α′_is and β′_js of the capacities are fixed and greater than −∞, whereas some upper bounds α″_is and β″_js can be absent. Then, for each commodity s ∈ N, we define the index sets

  I_s^u = {i ∈ I_s | α″_is = +∞} and J_s^u = {j ∈ J_s | β″_js = +∞},

and take the following coercivity condition.

(C) There exists a number r > 0 such that for any point w = (x, y) ∈ W and for each s ∈ N it holds that

  ∀l ∈ J_s^u, y_ls > max{r, β′_ls} ⟹ ∃k ∈ I_s^u such that x_ks > α′_ks and g_ks(w) ≥ h_ls(w).

This condition seems rather natural: at any feasible point w and for each fixed commodity s, any large demand value of a buyer l invokes the existence of a trader k whose price is not less than the price of buyer l.

Proposition 2 Suppose that the set W is non-empty and all the functions g_is and h_js are continuous for all i ∈ I, j ∈ J, and s ∈ N. If condition (C) is fulfilled, then VI (3) has a solution.

The proof of this assertion is almost the same as those in [8, Theorem 1] and [9, Theorem 4.3] and is omitted.
3 Partial Linearization Methods

Due to Proposition 1, we can take various iterative solution methods for optimization and variational inequality problems (see, e.g., [5, 8, 10]) for finding solutions of the market equilibrium problems of form (1)–(2). We now intend to consider a special integrable class of these problems that admits efficient iterative solution methods. Let us first take a problem of minimization of the sum of two functions μ(w) + η(w) over a feasible set W ⊆ R^m, or briefly,

  min_{w∈W} { μ(w) + η(w) }.   (4)
We suppose that the set W ⊂ Rm is non-empty, convex, and compact, both the functions are convex and μ : Rm → R is smooth. Moreover, the minimization of the function η over the set W is not supposed to be difficult. In this case, one can apply the partial linearization (PL for short) method, which was first proposed in [11].
Method (PL). Choose a point w^0 ∈ W and set k = 0. At the k-th iteration, k = 0, 1, . . ., we have a point w^k ∈ W. Find some solution v^k of the problem

  min_{v∈W} { ⟨μ′(w^k), v⟩ + η(v) }   (5)

and define p^k = v^k − w^k as a descent direction at w^k. Take a suitable stepsize λ_k ∈ (0, 1], set w^{k+1} = w^k + λ_k p^k and k = k + 1.

The stepsize can be found either with some one-dimensional minimization procedure as in [11] or with an inexact Armijo type line search; see also [12, 13] for substantiation and further development. The usefulness of this approach becomes clear if problem (4) is (partially) decomposable, which is typical for very large dimensional problems. For instance, let

  η(w) = Σ_{s∈N} η_s(w_(s)) and W = Π_{s∈N} W_s,

where w_(s) ∈ W_s ⊂ R^{m_s}, so that m = Σ_{s∈N} m_s, i.e., there is some concordant partition of the initial space R^m. Then, we have the problem

  min_{w ∈ Π_{s∈N} W_s} { μ(w) + Σ_{s∈N} η_s(w_(s)) },   (6)

and (5) becomes equivalent to several independent problems of the form

  min_{v_(s)∈W_s} { ⟨v_(s), ∂μ(w^k)/∂w_(s)⟩ + η_s(v_(s)) },   (7)

for s ∈ N. The above descent method admits various componentwise iterative schemes; see, e.g., [14]. Our market equilibrium problem from the previous section, written as VI (3), is reduced to problem (4) in the case where the price functions are integrable, i.e.,

  g_is(w) = ∂μ(w)/∂x_is, i ∈ I_s, and h_js(w) = −∂η_s(w_(s))/∂y_js, j ∈ J_s; s ∈ N.
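To illustrate, Method (PL) with the Armijo line search fits in a few lines of Python. The quadratic instance at the end is purely hypothetical test data: μ couples the variables, η is convex separable, and W = [0, 1]^2 is a box, so subproblem (7) is solved componentwise in closed form.

```python
import numpy as np

def partial_linearization(mu, grad_mu, eta, solve_sub, w0,
                          max_iter=1000, beta=0.5, theta=0.5, tol=1e-10):
    """Method (PL): w_{k+1} = w_k + lam_k (v_k - w_k), with v_k solving (5)/(7).

    solve_sub(g) must return argmin_{v in W} <g, v> + eta(v)."""
    w = np.asarray(w0, dtype=float)
    for _ in range(max_iter):
        g = grad_mu(w)
        v = solve_sub(g)                      # partially linearized subproblem
        p = v - w                             # descent direction p^k
        gap = g @ (w - v) + eta(w) - eta(v)   # gap value (cf. phi_s)
        if gap < tol:
            break
        f_w = mu(w) + eta(w)
        lam = 1.0
        while mu(w + lam * p) + eta(w + lam * p) > f_w - beta * lam * gap and lam > 1e-12:
            lam *= theta                      # Armijo backtracking
        w = w + lam * p
    return w

# Hypothetical instance: mu(w) = 0.5 w'Qw - b'w, eta(w) = sum_s c_s w_s^2,
# W = [0, 1]^2; then (7) gives v_s = clip(-g_s / (2 c_s), 0, 1) componentwise.
Q = np.array([[2.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, 1.0])
c = np.array([0.1, 0.1])
mu = lambda w: 0.5 * w @ Q @ w - b @ w
grad_mu = lambda w: Q @ w - b
eta = lambda w: c @ (w * w)
solve_sub = lambda g: np.clip(-g / (2 * c), 0.0, 1.0)
w_star = partial_linearization(mu, grad_mu, eta, solve_sub, np.zeros(2))
```

Since η is strongly convex here, the gap value bounds both the descent obtained per step and the distance to optimality, so the loop terminates with a near-optimal point.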
This is the case if these functions are separable, i.e., gis (w) = gis (xis ) for each i ∈ Is and h js (w) = h js (y js ) for each j ∈ Js , for all s ∈ N . More precisely, VI (3) becomes the necessary optimality condition for (4). The reverse assertion is true if the functions μ and η are convex. We now describe an adaptive cyclic componentwise PL method for problem (6), which is some implementation of that from [15]. For each point w ∈ W and each s ∈ N , we define by Vs (w) the solution set of the optimization problem:
  min_{v_(s)∈W_s} { ⟨v_(s), ∂μ(w)/∂w_(s)⟩ + η_s(v_(s)) };

cf. (7). As above, we suppose that the functions μ and η are convex, μ is smooth, and the set W ⊂ R^m is non-empty, convex, and compact. Under these assumptions, V_s(w) is also non-empty, convex, and compact. We define the gap function

  φ_s(w) = max_{v_(s)∈W_s} { ⟨w_(s) − v_(s), ∂μ(w)/∂w_(s)⟩ + η_s(w_(s)) − η_s(v_(s)) }
for each s ∈ N. For brevity, set f(w) = μ(w) + η(w) and denote by Z_+ the set of nonnegative integers. The optimal value of the function f in (4) (or (6)) will be denoted by f*. The adaptive cyclic PL method is described as follows.

Method (CPL).
Initialization: Choose a point z^0 ∈ W, numbers β ∈ (0, 1), θ ∈ (0, 1), and a sequence {δ_l} ↘ 0.
Step 0: Set l := 1, d := 0, s := 1, w^0 := z^{l−1}, k := 0.
Step 1: Solve problem (7).
Step 2: Take v_(s) ∈ V_s(w^k) and compute φ_s(w^k). If φ_s(w^k) ≥ δ_l, take

  p^k_(i) = { v_(i) − w^k_(i) if i = s; 0 if i ≠ s }

and go to Step 4.
Step 3: If d = 0, set μ̄ := φ_s(w^k), s̄ := s. If d > 0 and μ̄ < φ_s(w^k), set μ̄ := φ_s(w^k), s̄ := s. Set d := d + 1. If d < n, go to Step 5. Otherwise, set l_1 := l and determine l as the smallest number in Z_+ such that δ_l < μ̄. Then set z^m := w^k for all m = l_1, . . . , l − 1, set s := s̄, and go to Step 2. (Restart)
Step 4: Determine j as the smallest number in Z_+ such that

  f(w^k + θ^j p^k) ≤ f(w^k) − β θ^j φ_s(w^k);

set λ_k := θ^j, w^{k+1} := w^k + λ_k p^k, k := k + 1, d := 0.
Step 5: If s = n, set s := 1; otherwise, set s := s + 1. Afterwards go to Step 1.

Thus, the method has two levels. Each of its outer iterations l contains some number of inner iterations in k with the sequential verification of the descent value for each component with the fixed tolerance δ_l. Completing each stage, which is marked as a restart, leads to decreasing the tolerance value and increasing the counter value l at Step 3. The value s indicates the current component index. The basic properties of (CPL) are deduced along the same lines as in [15].

Proposition 3 Suppose in addition that the gradient map of the function μ is uniformly continuous on W. Then
Application of Market Models to Network Equilibrium Problems
(i) the number of inner iterations at each outer iteration l is finite;
(ii) the sequence {z^l} generated by Method (CPL) has limit points, all these limit points are solutions of problem (6), and, moreover,

$$\lim_{l \to \infty} f(z^l) = f^*.$$
The line search procedure in the method admits various modifications. For instance, we can use the exact one-dimensional minimization rule instead of the current Armijo rule. If the gradient of the function μ is Lipschitz continuous, we can take fixed stepsize values and remove the line search procedure altogether; see [15] for more details.

Remark 1 Due to the presence of the control sequence {δ_l}, CPL differs essentially from the usual decomposition methods; see, e.g., [14, 16]. At the same time, this technique is rather common for non-differentiable optimization methods; see, e.g., [17]. It was also applied in iterative methods for linear inequalities [18] and for decomposable variational inequalities [19].
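Step 4 of the method is a standard Armijo backtracking line search. It can be sketched as follows; this is only an illustrative sketch with a toy quadratic objective and our own function names, not the paper's implementation.

```python
def armijo_step(f, w, p, phi, beta=0.5, theta=0.5, max_tries=50):
    """Find the smallest j with f(w + theta**j * p) <= f(w) - beta * theta**j * phi.

    f   : objective function
    w   : current point (list of floats)
    p   : descent direction
    phi : gap value at w (predicted decrease along p)
    """
    fw = f(w)
    step = 1.0
    for _ in range(max_tries):
        trial = [wi + step * pi for wi, pi in zip(w, p)]
        if f(trial) <= fw - beta * step * phi:
            return step, trial
        step *= theta
    raise RuntimeError("line search failed")

# Toy example: f(w) = w1^2 + w2^2 from (1, 1) along -grad f.
f = lambda w: w[0] ** 2 + w[1] ** 2
w = [1.0, 1.0]
p = [-2.0, -2.0]            # -grad f(w)
phi = 8.0                   # <grad f(w), w - v> with v = w + p
step, w_new = armijo_step(f, w, p, phi)
```

The first trial step θ⁰ = 1 fails the descent test here, so the sketch halves the step once and accepts θ¹ = 0.5.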
4 A Generalization of Network Equilibrium Problems with Elastic Demands

We now consider network flow equilibrium problems with elastic (inverse) demands, which find various applications; see [20, 21, Chap. IV] and references therein. Let us be given a graph with a finite set of nodes M and a set of oriented arcs A joining the nodes, so that any arc a = (i, j) has origin i and destination j. Next, among all the pairs of nodes of the graph, we extract a subset N of origin–destination (O/D) pairs of the form s = (i → j). Each pair s ∈ N is associated with the set of paths P_s which connect the origin and destination for this pair. Also, denote by x_p the path flow for the path p. Given a flow vector x = (x_p)_{p∈P_s, s∈N}, one can determine the value of the arc flow

$$f_a = \sum_{s \in N} \sum_{p \in P_s} \alpha_{pa} x_p \tag{8}$$

for each arc a ∈ A, where

$$\alpha_{pa} = \begin{cases} 1 & \text{if arc } a \text{ belongs to path } p, \\ 0 & \text{otherwise.} \end{cases} \tag{9}$$

If the vector f = (f_a)_{a∈A} of arc flows is known, one can determine the disutility value c_a(f) for each arc. Then one can compute the disutility value for each path p:

$$g_p(x) = \sum_{a \in A} \alpha_{pa} c_a(f). \tag{10}$$
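Formulas (8)–(10) amount to two sparse sums over the arc–path incidence. A minimal sketch in Python with a hypothetical two-path network, using the linear arc costs c_a(f_a) = 1 + f_a that appear later in the computational experiments:

```python
# Paths are given as lists of arcs; the network data below is hypothetical.
paths = {                       # path id -> list of arcs it uses
    "p1": ["a1", "a2"],
    "p2": ["a1", "a3"],
}
x = {"p1": 2.0, "p2": 1.0}      # path flows

def arc_flows(paths, x):
    """Eq. (8): f_a is the sum of flows of all paths containing arc a."""
    f = {}
    for p, arcs in paths.items():
        for a in arcs:
            f[a] = f.get(a, 0.0) + x[p]
    return f

def path_costs(paths, x, c):
    """Eq. (10): g_p is the sum of arc costs c_a(f) along path p."""
    f = arc_flows(paths, x)
    return {p: sum(c(f[a]) for a in arcs) for p, arcs in paths.items()}

c = lambda fa: 1.0 + fa         # arc cost c_a(f_a) = 1 + f_a (as in Sect. 8)
g = path_costs(paths, x, c)
```

Here the shared arc "a1" carries the flows of both paths, so congestion on it raises the cost of both.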
I. Konnov
In the known elastic demand models, each (O/D) pair s ∈ N is associated with one variable flow demand and hence one inverse demand (disutility) function; see, e.g., [21, Chap. IV] and references therein. However, in many networks arising in applications, many active agents (users) with different disutility functions may have the same physical location. For this reason, we now consider the generalization where each (O/D) pair s ∈ N may have several pairs of active users; hence it is associated with a set B_s of such pairs, so that each pair of users j ∈ B_s has its particular flow demand y_j and disutility function h_j, which can in principle depend on the whole demand vector y = (y_j)_{j∈B_s, s∈N}. Then one can define the feasible set of flows:

$$W = \left\{ w = (x, y) \;\middle|\; \begin{array}{l} \sum_{p \in P_s} x_p = \sum_{j \in B_s} y_j, \ x_p \ge 0, \ p \in P_s, \\ 0 \le y_j \le \gamma_j, \ j \in B_s; \ s \in N \end{array} \right\}. \tag{11}$$

We say that a feasible flow/demand pair (x*, y*) ∈ W is an equilibrium point if it satisfies the following conditions: for each s ∈ N there exists λ_s such that

$$g_p(x^*) \begin{cases} \ge \lambda_s & \text{if } x_p^* = 0, \\ = \lambda_s & \text{if } x_p^* > 0, \end{cases} \quad \forall p \in P_s; \tag{12}$$

and

$$h_j(y^*) \begin{cases} \le \lambda_s & \text{if } y_j^* = 0, \\ = \lambda_s & \text{if } y_j^* \in (0, \gamma_j), \\ \ge \lambda_s & \text{if } y_j^* = \gamma_j, \end{cases} \quad \forall j \in B_s. \tag{13}$$
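Conditions (12)–(13) can be verified numerically for a candidate pair once a multiplier λ_s is given. The following check is an illustrative sketch; the tolerance handling and the data are ours, not from the paper.

```python
def check_equilibrium(g, x, h, y, lam, gamma, tol=1e-9):
    """Check conditions (12)-(13) for one O/D pair with multiplier lam.

    g, x  : path costs and path flows (lists, same order)
    h, y  : user disutility values and demands (lists, same order)
    gamma : demand upper bounds
    """
    for gp, xp in zip(g, x):
        if xp > tol and abs(gp - lam) > tol:        # (12): = lam if x_p > 0
            return False
        if xp <= tol and gp < lam - tol:            # (12): >= lam if x_p = 0
            return False
    for hj, yj, gj in zip(h, y, gamma):
        if yj <= tol and hj > lam + tol:            # (13): <= lam if y_j = 0
            return False
        if tol < yj < gj - tol and abs(hj - lam) > tol:  # interior: = lam
            return False
        if abs(yj - gj) <= tol and hj < lam - tol:  # (13): >= lam at the bound
            return False
    return True

# One O/D pair: two paths, two user pairs, lam = 5 (hypothetical data).
ok = check_equilibrium(g=[5.0, 6.0], x=[3.0, 0.0],
                       h=[5.0, 4.0], y=[3.0, 0.0],
                       lam=5.0, gamma=[10.0, 10.0])
```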
Clearly, the equilibrium conditions (12)–(13) represent an implementation of the multi-commodity two-sided market equilibrium model (1)–(2), where each commodity is associated with an (O/D) pair s ∈ N; its set of traders (carriers) with price functions g_p(x) is represented by the paths p ∈ P_s, whereas its set of buyers with price functions h_j(y) is represented by the pairs of users j ∈ B_s. We observe that the prices here are not fixed; the dependence of the offer price functions g_p on volumes is given in (8)–(10) and is caused by the complexity of the system topology and by the fact that carriers of different (O/D) pairs can utilize the same links (arcs). We now show that conditions (11)–(13) can be equivalently rewritten in the form of a VI: Find a pair (x*, y*) ∈ W such that

$$\sum_{s \in N} \sum_{p \in P_s} g_p(x^*)(x_p - x_p^*) - \sum_{s \in N} \sum_{j \in B_s} h_j(y^*)(y_j - y_j^*) \ge 0 \quad \forall (x, y) \in W. \tag{14}$$
Proposition 4 A pair (x*, y*) ∈ W solves VI (14) if and only if it satisfies conditions (12)–(13).

Proof Writing the usual necessary and sufficient optimality conditions (see [5, Proposition 11.7]) for problem (14), we obtain that there exist x* ≥ 0, y* ∈ [0, γ], and λ such that
$$\sum_{p \in P_s} (g_p(x^*) - \lambda_s)(x_p - x_p^*) \ge 0 \quad \forall x_p \ge 0, \ p \in P_s, \ s \in N;$$

$$\sum_{j \in B_s} (\lambda_s - h_j(y^*))(y_j - y_j^*) \ge 0 \quad \forall y_j \in [0, \gamma_j], \ j \in B_s, \ s \in N;$$

$$\sum_{p \in P_s} x_p^* = \sum_{j \in B_s} y_j^*, \quad s \in N;$$

where λ = (λ_s)_{s∈N}. However, the first and second relations are clearly equivalent to (12)–(13).

If each (O/D) pair is attributed to only one pair of users, we obtain the customary network equilibrium problems with elastic (inverse) demands; see, e.g., [21, Chap. IV]. If all the (O/D) traffic demands in this model are not restricted by upper bounds, we obtain the model considered in [20]. Let us insert the same condition into our model:

$$\gamma_j = +\infty \quad \forall j \in B_s, \ s \in N. \tag{15}$$

Then (13) reduces to the following condition:

$$h_j(y^*) \begin{cases} \le \lambda_s & \text{if } y_j^* = 0, \\ = \lambda_s & \text{if } y_j^* > 0; \end{cases} \quad \forall j \in B_s. \tag{16}$$
We can also write some other equivalent network equilibrium conditions, for instance,

$$g_p(x^*) - h_j(y^*) \begin{cases} = 0 & \text{if } x_p^* > 0 \text{ and } y_j^* > 0, \\ \ge 0 & \text{if } x_p^* = 0 \text{ or } y_j^* = 0; \end{cases} \quad \forall p \in P_s, \ j \in B_s, \ s \in N. \tag{17}$$

Proposition 5 Let (15) hold. Then, for any pair (x*, y*) ∈ W, condition (17) is equivalent to (12) and (16).

Proof Take an arbitrary pair s ∈ N. Suppose a pair (x*, y*) ∈ W satisfies conditions (12) and (16). Then, for any p ∈ P_s and j ∈ B_s, the relations x_p* > 0 and y_j* > 0 imply g_p(x*) = λ_s = h_j(y*). Next, each of the relations x_p* = 0 or y_j* = 0 implies g_p(x*) ≥ λ_s ≥ h_j(y*), and (17) holds true. Conversely, suppose a pair (x*, y*) ∈ W satisfies condition (17). Fix any s ∈ N and set

$$\alpha' = \min_{p \in P_s} g_p(x^*), \quad \alpha'' = \max_{j \in B_s} h_j(y^*);$$

then α′ ≥ α″. If x_p* = 0 for all p ∈ P_s, then y_j* = 0 for all j ∈ B_s, and conversely. Then taking any λ_s ∈ [α″, α′] yields (12) and (16). Otherwise, there exists at least one pair of indices p ∈ P_s, j ∈ B_s such that x_p* > 0 and y_j* > 0. Then setting λ_s = α′ = α″ again yields (12) and (16).
It is easy to see that conditions (17) can be replaced with the following:

$$g_p(x^*) - h_j(y^*) \begin{cases} > 0 \;\Longrightarrow\; x_p^* = 0 \text{ or } y_j^* = 0, \\ \ge 0 \;\Longleftrightarrow\; x_p^* \ge 0 \text{ and } y_j^* \ge 0; \end{cases} \quad \forall p \in P_s, \ j \in B_s, \ s \in N. \tag{18}$$

Proposition 6 Let (15) hold. Then, for any pair (x*, y*) ∈ W, condition (18) is equivalent to (12) and (16).

The equivalent VI formulation of network equilibrium problems enables us to obtain the existence of solutions rather easily. The feasible set W of the network equilibrium problem defined in (11) is bounded if γ_j < +∞ for all j ∈ B_s, s ∈ N. Then VI (14), and hence the equivalent network equilibrium problem, is solvable if all the mappings c_a, a ∈ A, and h_j, j ∈ B_s, s ∈ N, are continuous.

Let us turn to the pure unbounded case (15). Then the feasible set W in (11) is unbounded. We now deduce a new existence result for VI (14), and hence for the equivalent network equilibrium problem, by a direct application of Proposition 2. We need the following proper coercivity condition; cf. (C).

(C1) There exists a number r > 0 such that, for any point w = (x, y) ∈ W and for each s ∈ N, it holds that

∃ j ∈ B_s, y_j > r ⟹ ∃ p ∈ P_s such that x_p > 0 and g_p(x) ≥ h_j(y).

We observe that condition (C1) implies condition (C) for VI (14), and we obtain the desired existence result.

Theorem 1 Suppose that (15) holds, the set W defined in (11) is non-empty, and all the functions c_a and h_j are continuous for all a ∈ A, j ∈ B_s, and s ∈ N. If condition (C1) is fulfilled, then VI (14) has a solution.
5 Implementation of Partial Linearization Methods for Integrable Network Equilibrium Problems

In Sect. 3, several versions of partial linearization (PL) methods for special decomposable optimization problems over Cartesian product sets were described for the general multi-commodity market equilibrium model of Sect. 2 in the integrable case. Hence, PL methods can also be applied to integrable network equilibrium problems with elastic demands of Sect. 4. Therefore, we will now suppose that all the functions c_a and h_j are continuous and separable, i.e., c_a(f) = c_a(f_a) and h_j(y) = h_j(y_j). Besides, we assume that c_a(f_a) and −h_j(y_j) are monotone increasing functions. Next, we assume that

γ_j < +∞ ∀ j ∈ B_s, s ∈ N;
then the feasible set W is non-empty, convex, and compact, and W = ∏_{s∈N} W_s, where

$$W_s = \left\{ w_{(s)} = (x_{(s)}, y_{(s)}) \;\middle|\; \begin{array}{l} \sum_{p \in P_s} x_p = \sum_{j \in B_s} y_j, \\ x_p \ge 0, \ p \in P_s, \ 0 \le y_j \le \gamma_j, \ j \in B_s \end{array} \right\}$$

for s ∈ N. Here x_(s) = (x_p)_{p∈P_s} and y_(s) = (y_j)_{j∈B_s}. Due to the separability of the functions c_a and h_j, their continuity implies integrability, i.e., there exist functions

$$\mu_a(f_a) = \int_0^{f_a} c_a(t)\,dt \quad \forall a \in A, \qquad \eta_j(y_j) = \int_0^{y_j} h_j(t)\,dt \quad \forall j \in B_s, \ s \in N.$$
Taking into account (8), we see that VI (14) gives a necessary and sufficient optimality condition for the following optimization problem:

$$\min_{(x, y) \in W} \left\{ \sum_{a \in A} \mu_a(f_a) - \sum_{s \in N} \sum_{j \in B_s} \eta_j(y_j) \right\}. \tag{19}$$
However, this problem falls into the basic format (6), and the suggested PL methods can be applied to (19). We now describe the solution of the basic direction finding problem (7). It consists in finding an element w̄_(s) = (x̄_(s), ȳ_(s)) ∈ W_s which solves the optimization problem

$$\min_{(x_{(s)}, y_{(s)}) \in W_s} \left\{ \sum_{p \in P_s} g_p(x^k) x_p - \sum_{j \in B_s} \eta_j(y_j) \right\} \tag{20}$$

for some selected pair s ∈ N. The solution of (20) can be found with the simple procedure below, which is based on the optimality conditions (12)–(13). First, we calculate the shortest path q ∈ P_s, i.e., the path with the minimal cost. Set λ̃_s = g_q(x^k) and x̄_p = 0 for all p ∈ P_s. For each j ∈ B_s, we verify three possible cases.

Case 1. If h_j(0) ≤ λ̃_s, set ȳ_j = 0. Otherwise, go to Case 2.

Case 2. If h_j(γ_j) ≥ λ̃_s, set ȳ_j = γ_j and x̄_q = x̄_q + γ_j. Otherwise, go to Case 3.

Case 3. We have h_j(γ_j) < λ̃_s < h_j(0). By continuity of h_j, we find the value ȳ_j ∈ [0, γ_j] such that h_j(ȳ_j) = λ̃_s, and set x̄_q = x̄_q + ȳ_j.

Therefore, the suggested PL methods can be implemented rather easily.
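For linear disutilities h_j(y_j) = b_j − k_j y_j (the form used in the test problems of Sect. 8), Case 3 admits a closed-form solution, so the whole three-case procedure for (20) becomes a few lines. The sketch below assumes this linearity; the data are illustrative.

```python
def solve_subproblem(lam, users):
    """Solve the direction-finding problem (20) for one O/D pair by the three cases.

    lam   : cost of the shortest path q, lam = g_q(x^k)
    users : list of ((b, k), gamma), with disutility h(y) = b - k*y, k > 0,
            and demand upper bound gamma
    Returns the demands y_j and the total flow assigned to path q.
    """
    y_bar, flow_q = [], 0.0
    for (b, k), gamma in users:
        h0, hg = b, b - k * gamma          # h_j(0) and h_j(gamma_j)
        if h0 <= lam:                      # Case 1: no demand from this user pair
            yj = 0.0
        elif hg >= lam:                    # Case 2: demand at its upper bound
            yj = gamma
        else:                              # Case 3: solve b - k*y = lam
            yj = (b - lam) / k
        y_bar.append(yj)
        flow_q += yj
    return y_bar, flow_q

# Disutilities from Sect. 8: h1(y) = 30 - 0.5y and h2(y) = 28 - 0.3y.
y_bar, flow_q = solve_subproblem(lam=25.0, users=[((30.0, 0.5), 100.0),
                                                  ((28.0, 0.3), 100.0)])
```

With λ̃ = 25 both user pairs fall into Case 3, giving demands 10 and 10 and a total flow of 20 on the shortest path.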
6 Application of Market Models to Resource Allocation in Wireless Networks

In contemporary wireless networks, the increasing demand for services leads to serious congestion effects, whereas significant network resources (say, bandwidth and battery capacity) are utilized inefficiently in systems with fixed allocation rules. This situation forces one to apply more flexible market-type allocation mechanisms. Due to the presence of conflicts of interest, most papers on allocation mechanisms are devoted to pure game-theoretic models reflecting imperfect competition; see, e.g., [22, 23]. However, a certain lack of information about the participants is typical for wireless telecommunication networks (see, e.g., [23, 24]), and some other market models may be suitable here because they can be utilized under minimal information requirements on the involved users. We now consider the problem of allocating the services of several competing wireless network providers among a large number of users, which is essential for contemporary communication systems. This problem was investigated in [25–28] for wired and wireless network settings, where game-theoretic models for competing providers were presented. An alternative model, based on a VI formulation with proper equilibrium conditions, was suggested for this problem in [29, Sect. 6]. We now propose its extension that admits different kinds of users' behavior. Namely, we suppose that there are m wireless network providers and that all the users are divided into n classes; that is, the users belonging to the same class j are considered as one service consumer with a price function h_j(y_j) and a scalar bid volume y_j ∈ [0, β_j] for j ∈ N = {1, …, n}. Next, each provider i announces his/her price function b_i(x_i) depending on the offer volume x_i ∈ [0, α_i] for i ∈ M = {1, …, m}.
However, such joint consumption of wireless network resources yields an additional disutility l_i(x) for users consuming the resources of provider i, where x = (x_1, …, x_m); see [26–28] for more detail. Hence, the actual price function of provider i for users becomes g_i(x) = b_i(x_i) + l_i(x). We can thus define the feasible set of offer/bid values

$$D = \left\{ (x, y) \;\middle|\; \sum_{i \in M} x_i = \sum_{j \in N} y_j; \ x_i \in [0, \alpha_i], \ i \in M; \ y_j \in [0, \beta_j], \ j \in N \right\},$$

where y = (y_1, …, y_n). Then we can write the two-sided equilibrium problem that consists in finding a feasible pair (x̄, ȳ) ∈ D and a price λ such that

$$g_i(\bar{x}) \begin{cases} \ge \lambda & \text{if } \bar{x}_i = 0, \\ = \lambda & \text{if } \bar{x}_i \in (0, \alpha_i), \\ \le \lambda & \text{if } \bar{x}_i = \alpha_i, \end{cases} \ i \in M; \qquad h_j(\bar{y}_j) \begin{cases} \le \lambda & \text{if } \bar{y}_j = 0, \\ = \lambda & \text{if } \bar{y}_j \in (0, \beta_j), \\ \ge \lambda & \text{if } \bar{y}_j = \beta_j, \end{cases} \ j \in N. \tag{21}$$
Clearly, it is a particular case of problem (1)–(2). Due to Proposition 1, (21) can be replaced with the equivalent VI: Find (x̄, ȳ) ∈ D such that

$$\sum_{i \in M} g_i(\bar{x})(x_i - \bar{x}_i) - \sum_{j \in N} h_j(\bar{y}_j)(y_j - \bar{y}_j) \ge 0 \quad \forall (x, y) \in D. \tag{22}$$

This property enables us to establish the existence of solutions for the above problem and to develop efficient iterative solution methods. In fact, if all the price functions are continuous and the set D is non-empty and bounded, then VI (22) has a solution. In the unbounded case, some coercivity condition is necessary. For instance, let us consider the case where α_i = +∞ for i ∈ M and β_j = +∞ for j ∈ N, and take the following condition; cf. (C).

(C2) There exists a number r > 0 such that, for any pair (x, y) ∈ D, it holds that

y_l > r ⟹ ∃ k ∈ M such that x_k > 0 and g_k(x) ≥ h_l(y_l).

Clearly, (C2) implies (C) for VI (22), and Proposition 2 provides the existence result.

Theorem 2 Suppose that the set D is non-empty and the functions g_i and h_j are continuous for all i ∈ M, j ∈ N. If condition (C2) is fulfilled, then VI (22) has a solution.
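For separable increasing g_i and decreasing h_j, a solution of the scalar price system (21) can be found by bisection on λ: the total offer is nondecreasing and the total bid nonincreasing in the price. This is only an illustrative sketch with hypothetical linear price functions, not a method from the paper.

```python
def clear_market(supply_coef, demand_coef, alphas, betas, lo, hi, iters=60):
    """Bisection on lambda until total offer(lambda) = total bid(lambda).

    supply_coef : list of (a, b), provider price g_i(x) = a + b*x, b > 0
    demand_coef : list of (a, b), user price     h_j(y) = a - b*y, b > 0
    alphas, betas : upper bounds for offers x_i and bids y_j
    lo, hi : initial price bracket
    """
    def offer(lam):
        # invert g_i(x) = lam and clip to [0, alpha_i]
        return sum(min(max((lam - a) / b, 0.0), cap)
                   for (a, b), cap in zip(supply_coef, alphas))

    def bid(lam):
        # invert h_j(y) = lam and clip to [0, beta_j]
        return sum(min(max((a - lam) / b, 0.0), cap)
                   for (a, b), cap in zip(demand_coef, betas))

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if offer(mid) < bid(mid):
            lo = mid              # excess demand: raise the price
        else:
            hi = mid              # excess supply: lower the price
    return 0.5 * (lo + hi)

# One provider with g(x) = 1 + x and one user class with h(y) = 9 - y:
# clearing condition 1 + x = 9 - x gives x = y = 4 at price lambda = 5.
lam = clear_market([(1.0, 1.0)], [(9.0, 1.0)], [100.0], [100.0], lo=0.0, hi=10.0)
```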
7 The Partial Linearization Method for Resource Allocation Problems in Wireless Networks Iterative solution methods for solving VI of form (22) in general require additional monotonicity assumptions for convergence; see, e.g., [5, 10, 14]. Additional solution methods appear in the integrable case where gi (x) =
∂μ(x) , i ∈ M; h j (y j ) = −η j (y j ), j ∈ N . ∂xi
Then, VI (22) gives the optimality condition for the optimization problem: min → f (w), w∈D
f (w) = f (x, y) = {μ(x) + η(y)} , η(y) =
(23)
η j (y j );
j∈N
cf. (4) and (6). In particular, conditional gradient, gradient projection, and Uzawa type methods then can be utilized; see, e.g., [7, 8]. We now only describe a way to implement the custom PL method since the problem is not separable. We suppose in addition that the function μ is smooth and convex, αi = +∞ for all i ∈ M, and
0 ≤ β_j < +∞ for all j ∈ N. Then the feasible set D is non-empty, convex, and compact. For more clarity, we rewrite the PL method for problem (23). We define the gap function

$$\varphi(w) = \varphi(x, y) = \max_{(x', y') \in D} \left\{ \langle \mu'(x), x - x' \rangle + \eta(y) - \eta(y') \right\}.$$
Method (PL).

Initialization: Choose a point w⁰ ∈ D and numbers β ∈ (0, 1), θ ∈ (0, 1); set k := 0.

Step 1: Find a solution v^k = (x̄^k, ȳ^k) of the problem

$$\min_{(x, y) \in D} \left\{ \langle \mu'(x^k), x \rangle + \eta(y) \right\}. \tag{24}$$

Step 2: If v^k = w^k, stop. Otherwise, set d^k := v^k − w^k.

Step 3: Find p as the smallest number in Z₊ such that

f(w^k + θ^p d^k) ≤ f(w^k) − βθ^p ϕ(w^k),

set σ_k := θ^p, w^{k+1} := w^k + σ_k d^k, k := k + 1, and go to Step 1.

The solution of the basic direction finding problem (24) can also be found with a simple procedure, similar to that from Sect. 5 and based on the optimality conditions. First, we calculate an index q ∈ M that corresponds to the minimal value

$$g_q(x^k) = \min_{i \in M} g_i(x^k)$$

and set λ̃ = g_q(x^k), x̄_i^k = 0 for all i ∈ M. For each j ∈ N, we verify three possible cases.

Case 1. If h_j(0) ≤ λ̃, set ȳ_j^k = 0. Otherwise, go to Case 2.

Case 2. If h_j(β_j) ≥ λ̃, set ȳ_j^k = β_j and x̄_q^k = x̄_q^k + β_j. Otherwise, go to Case 3.

Case 3. We have h_j(β_j) < λ̃ < h_j(0). By continuity of h_j, we find the value ȳ_j^k ∈ [0, β_j] such that h_j(ȳ_j^k) = λ̃, and set x̄_q^k = x̄_q^k + ȳ_j^k.

Let us now consider the case where 0 ≤ α_i < +∞ for all i ∈ M and 0 ≤ β_j < +∞ for all j ∈ N. Then the feasible set D is also non-empty, convex, and compact. Hence, the above PL method can be applied to (23); however, we should then use more complex procedures for the solution of problem (24). Alternatively, we can eliminate the upper bounds for the variables x_i via a suitable penalty approach. For instance, replace problem (23) with the sequence of auxiliary problems of the form
$$\min_{w \in D} \Phi(w, \tau), \tag{25}$$

$$\Phi(w, \tau) = \mu(x) + \tau \phi(x) + \eta(y), \qquad \phi(x) = 0.5 \sum_{i \in M} \max\{x_i - \alpha_i, 0\}^2,$$

where τ > 0 is a penalty parameter and the functions μ and η are defined as above. Under the standard assumptions, the sequence of solutions of (25) approximates a solution of (23) as τ → +∞; see, e.g., [10]. Next, each problem (25) has the previous format without the upper bounds for the variables x_i. Hence, we can apply the above version of the PL method directly to (25), with f(w) replaced by Φ(w, τ). Clearly, (24) is then replaced by

$$\min_{(x, y) \in D} \left\{ \langle \mu'(x^k) + \tau \phi'(x^k), x \rangle + \eta(y) \right\}.$$

We also have to substitute each function g_i(x) with g̃_i(x) = g_i(x) + τ max{x_i − α_i, 0} in the procedure for finding its solution. This gives us an alternative way to solve such resource allocation problems in wireless networks.
8 Computational Experiments with Network Equilibrium Test Problems

In order to compare the performance of the PL methods, we carried out preliminary series of computational experiments on network equilibrium test problems of form (11)–(13) or (14). We took their adjustment described in Sect. 5. For comparison, we took proper extensions of the known test examples of network equilibrium problems with elastic demands; namely, each (O/D) pair was associated with two pairs of active users. We used the arc cost functions c_a(f_a) = 1 + f_a for all a ∈ A and the disutility functions h_{j1(s)}(y_{j1(s)}) = 30 − 0.5 y_{j1(s)} and h_{j2(s)}(y_{j2(s)}) = 28 − 0.3 y_{j2(s)}, where B_s = {j1(s), j2(s)} for all s ∈ N. We took

$$\Delta_k = \varphi(w^k) = \sum_{s \in N} \varphi_s(w^k)$$

as the accuracy measure for the methods. Both the PL and CPL methods were implemented with the Armijo line search rule with β = θ = 0.5. Due to the above description, we can take the total number of blocks where the line search procedure was utilized as a unified complexity measure for both methods; these will be called block iterations. Hence, we report this value in the tables for different attained accuracies. The methods were implemented in C++ with double precision arithmetic.

The topology of Example 1 was taken from [30]. The graph contains 25 nodes, 40 arcs, and 5 O/D pairs. We used two rules for changing the parameter δ_l, with δ_0 = 10, in CPL. The performance results are given in Table 1.
Table 1 Example 1. The numbers of block iterations

Accuracy   PL       CPL (δ_{l+1} = δ_l/2)   CPL (δ_l = δ_0/l)
0.2        4970     4427                    3519
0.1        10785    8747                    6411
0.05       21260    17284                   13425

The topology of Example 2 was taken from [31, Network 26]. The graph contains 22 nodes, 36 arcs, and 12 O/D pairs. We used the rule δ_l = δ_0/l with δ_0 = 10 in CPL. The performance results are given in Table 2.

Table 2 Example 2. The numbers of block iterations

Accuracy   PL    CPL
0.2        420   233
0.1        468   246
0.05       504   256

In Example 3, the data were generated randomly. The graph contained 20 nodes, 114 arcs, and 10 O/D pairs. We used the rule δ_l = δ_0/l with δ_0 = 10 in CPL. The results are given in Table 3.

Table 3 Example 3. The numbers of block iterations

Accuracy   PL        CPL
1          135730    106308
0.5        271830    217932
0.2        662220    531032
0.1        1329910   1082449

In all the cases, CPL showed a certain advantage over PL in the number of block iterations.
9 Conclusions

We considered a general market model with many divisible commodities and price functions of the participants, and established existence results for this problem under natural coercivity conditions in the case of an unbounded feasible set. We described extensions of the known network flow equilibrium problems with elastic demands and of a resource allocation problem in wireless communication networks, and showed that they are particular cases of the presented market model. This property enabled us to obtain new existence results for all these models as adjustments of that for the general market model. Besides, under certain integrability conditions, the market model can be reduced to an optimization problem. We suggested a new cyclic version of the partial linearization (PL) method for its decomposable case, and described ways to implement PL methods for solving the network equilibrium problems and the resource allocation problems in wireless communication networks.
Acknowledgements The results of this work were obtained within the state assignment of the Ministry of Science and Education of Russia, project No. 1.460.2016/1.4. This work was also supported by the RFBR grant, project No. 16-01-00109a. The author is grateful to Olga Pinyagina for her assistance in carrying out computational experiments.
References

1. Nikaido, H.: Convex Structures and Economic Theory. Academic Press, New York (1968)
2. Okuguchi, K., Szidarovszky, F.: The Theory of Oligopoly with Multi-Product Firms. Springer, Berlin (1990)
3. Nash, J.: Non-cooperative games. Ann. Math. 54, 286–295 (1951)
4. Konnov, I.V.: On modeling of auction type markets. Issled. Inform. 10, 73–76 (2006) (in Russian)
5. Konnov, I.V.: Equilibrium Models and Variational Inequalities. Elsevier, Amsterdam (2007)
6. Konnov, I.V.: On variational inequalities for auction market problems. Optim. Lett. 1, 155–162 (2007)
7. Konnov, I.V.: Equilibrium models for multi-commodity auction market problems. Adv. Model. Optim. 15, 511–524 (2013)
8. Konnov, I.V.: An alternative economic equilibrium model with different implementation mechanisms. Adv. Model. Optim. 17, 245–265 (2015)
9. Konnov, I.V.: On vector formulations of auction-type problems with applications. Optimization 65, 233–251 (2016)
10. Konnov, I.V.: Nonlinear Optimization and Variational Inequalities. Kazan University Press, Kazan (2013) (in Russian)
11. Mine, H., Fukushima, M.: A minimization method for the sum of a convex function and a continuously differentiable function. J. Optim. Theory Appl. 33, 9–23 (1981)
12. Patriksson, M.: Cost approximation: a unified framework of descent algorithms for nonlinear programs. SIAM J. Optim. 8, 561–582 (1998)
13. Bredies, K., Lorenz, D.A., Maass, P.: A generalized conditional gradient method and its connection to an iterative shrinkage method. Comput. Optim. Appl. 42, 173–193 (2009)
14. Patriksson, M.: Nonlinear Programming and Variational Inequality Problems: A Unified Approach. Kluwer Academic Publishers, Dordrecht (1999)
15. Konnov, I.V.: An adaptive partial linearization method for optimization problems on product sets. arXiv:1605.01971v2. Accessed 25 May 2016
16. Migdalas, A.: Cyclic linearization and decomposition of team game models. In: Butenko, S., Murphey, R., Pardalos, P. (eds.) Recent Developments in Cooperative Control and Optimization, pp. 332–348. Kluwer Academic Publishers, Dordrecht (2004)
17. Balinski, M.L., Wolfe, P. (eds.): Nondifferentiable Optimization. Math. Program. Study 3. North-Holland, Amsterdam (1975)
18. McCormick, S.F.: The methods of Kaczmarz and row orthogonalization for solving linear equations and least squares problems in Hilbert space. Indiana Univ. Math. J. 26, 1137–1150 (1977)
19. Konnov, I.V.: A class of combined relaxation methods for decomposable variational inequalities. Optimization 51, 109–125 (2002)
20. Dafermos, S.: The general multimodal network equilibrium problem with elastic demand. Networks 12, 57–72 (1982)
21. Nagurney, A.: Network Economics: A Variational Inequality Approach. Kluwer, Dordrecht (1999)
22. Leshem, A., Zehavi, E.: Game theory and the frequency selective interference channel: a practical and theoretic point of view. IEEE Signal Process. 26, 28–40 (2009)
23. Raoof, O., Al-Raweshidy, H.: Auction and game-based spectrum sharing in cognitive radio networks. In: Huang, Q. (ed.) Game Theory, Ch. 2, pp. 13–40. Sciyo, Rijeka (2010)
24. Iosifidis, G., Koutsopoulos, I.: Double auction mechanisms for resource allocation in autonomous networks. IEEE J. Sel. Areas Commun. 28, 95–102 (2010)
25. Hayrapetyan, A., Tardos, É., Wexler, T.: A network pricing game for selfish traffic. Distrib. Comput. 19, 255–266 (2007)
26. Korcak, O., Iosifidis, G., Alpcan, T., Koutsopoulos, I.: Competition and regulation in a wireless operators market: an evolutionary game perspective. In: Proceedings of the 6th International Conference on Network Games, Control and Optimization (NETGCOOP), Avignon, pp. 17–24. IEEE (2012)
27. Maillé, P., Tuffin, B., Vigne, J.M.: Competition between wireless service providers sharing a radio resource. In: Proceedings of the 11th IFIP Networking Conference, Part II, pp. 355–365. Springer, Berlin (2012)
28. Zhang, F., Zhang, W.: Competition between wireless service providers: pricing, equilibrium and efficiency. In: 11th International Symposium and Workshops on Modeling and Optimization in Mobile, Ad Hoc and Wireless Networks (WiOpt 2013), pp. 208–215. IEEE (2013)
29. Konnov, I.V.: On auction equilibrium models with network applications. Netnomics 16, 107–125 (2015)
30. Bertsekas, D., Gafni, E.: Projection methods for variational inequalities with application to the traffic assignment problem. Math. Program. Study 17, 139–159 (1982)
31. Nagurney, A.: Comparative tests of multi-modal traffic equilibrium problems. Transp. Sci. B. 18, 469–485 (1984)
Selective Bi-coordinate Variations for Network Equilibrium Problems with Mixed Demand

Igor Konnov and Olga Pinyagina
Abstract In the present paper, we propose a modification of the method of bi-coordinate variations for network equilibrium problems with mixed demand. The method is based on the equilibrium conditions of the problem under consideration. It uses a special tolerance control and thresholds for constructing descent directions, and a variant of the Armijo-type line-search procedure as the step-choice rule. Some results of preliminary numerical calculations, which confirm the efficiency of the method, are also presented.

Keywords Bi-coordinate variations · Network equilibrium · Mixed demand
1 Introduction

Network equilibrium problems with fixed and elastic demands, which arise in different areas including transport and telecommunication networks, have been known for a long time and examined in detail (see [2, 3, 9, 10]). Based on the relationships between the market and network equilibrium models discovered in [4], one of the authors recently proposed a new formulation of the network equilibrium problem with mixed demand, which includes those with fixed and elastic demands as particular cases [11, 12]. An important advantage of network equilibrium problems is the simple structure of their feasible sets. At the same time, they usually have a large dimension, which causes practical difficulties when performing calculations.

This work was supported by Russian Foundation for Basic Research, project No 16-01-00109a.

I. Konnov · O. Pinyagina (B)
Kazan Federal University, Institute of Computational Mathematics and Information Technologies, Kremlevskaya st. 18, 420008 Kazan, Russia
e-mail:
[email protected]; [email protected]

I. Konnov
e-mail: [email protected]

© Springer International Publishing AG, part of Springer Nature 2018
V. A. Kalyagin et al. (eds.), Computational Aspects and Applications in Large-Scale Networks, Springer Proceedings in Mathematics & Statistics 247, https://doi.org/10.1007/978-3-319-96247-4_11
For solving large-dimensional problems, coordinate descent methods are attractive, and they are now being intensively developed and applied. For constrained optimization problems, one can use marginal-based bi-coordinate descent methods originally proposed in [1, 7]. More flexible versions of bi-coordinate descent methods were presented in [5, 6], which describe a method of bi-coordinate variations with a special threshold control and tolerances for optimal resource allocation problems with simplex-type constraints. In the present paper, we apply this approach to network equilibrium problems with mixed demand, propose the corresponding modification of the method for this problem, and perform preliminary numerical calculations.
2 Network Equilibrium Problems

Let us recall the formulation of the network equilibrium problem with fixed demand (see, for example, [2, 9]). Consider a network composed of a set of nodes V and a set of directed links A. In addition, we have a set W of origin–destination (O/D) pairs (i, j), i, j ∈ V. For each O/D pair with index w ∈ W, a set of paths P_w is known (each path is a simple chain of links starting at the origin and ending at the destination of the O/D pair), and a demand value y_w > 0 is given, which presents a flow outgoing from the origin and ingoing to the destination. Usually, it corresponds to a transport or information flow. We denote I = {1, …, n}, where n = Σ_{w∈W} |P_w|. The problem is to distribute the required demands for all O/D pairs among the sets of paths by using a certain (equilibrium) criterion.

We denote by x_p the value of flow passing along path p. Then the feasible set for the path flow vectors is defined as follows:

$$X = \left\{ x \;\middle|\; \sum_{p \in P_w} x_p = y_w, \ x_p \ge 0, \ p \in P_w, \ w \in W \right\}.$$

Paths and links are connected with the help of the incidence matrix with elements

$$\alpha_{pa} = \begin{cases} 1, & \text{if link } a \text{ belongs to path } p; \\ 0, & \text{otherwise.} \end{cases}$$

The flow value for each link a ∈ A is defined as the sum of the corresponding path flows:

$$f_a = \sum_{w \in W} \sum_{p \in P_w} \alpha_{pa} x_p. \tag{1}$$
For each link a, a continuous cost function c_a is given; in the general case, it can depend on all link flows. The summary cost function for path p has the form

$$g_p(x) = \sum_{a \in A} \alpha_{pa} c_a(f),$$

where f is the vector with components f_a, a ∈ A. The equilibrium condition for this network consists in finding an element x* ∈ X such that

$$\forall w \in W, \ q \in P_w: \quad x_q^* > 0 \ \Longrightarrow\ g_q(x^*) = \min_{p \in P_w} g_p(x^*).$$
Therefore, only paths with minimal costs have nonzero flows. This is the Nash equilibrium based on the user-optimization principle, which asserts that a network equilibrium is established when no O/D pair can decrease its cost by making a unilateral decision to change its path flows. It is well known [9, 10] that this problem is equivalent to the variational inequality: find a point x* ∈ X such that

$$\langle G(x^*), x - x^* \rangle \ge 0 \quad \forall x \in X,$$

where the vector G is composed of the components g_p, p ∈ P_w, w ∈ W, respectively.

In contrast to the network equilibrium problem with fixed demand, in the problem with elastic demand the demand values are variables. Then the feasible set takes the form

$$K = \left\{ (x, y) \;\middle|\; \sum_{p \in P_w} x_p = y_w, \ x_p \ge 0, \ p \in P_w, \ w \in W \right\}.$$

Here, y is a vector with variable components y_w, w ∈ W. In this problem, for each O/D pair w ∈ W, a continuous so-called disutility function h_w with respect to demand is given. In the general case, it can depend on the whole demand vector y. Therefore, the network equilibrium problem with elastic demand is to find an element (x*, y*) ∈ K such that

$$\langle G(x^*), x - x^* \rangle - \langle H(y^*), y - y^* \rangle \ge 0 \quad \forall (x, y) \in K. \tag{2}$$

Here, the vector H is composed of the components h_w, respectively. It is well known [9, 10] that the equilibrium conditions for this problem have the following form: a vector (x*, y*) ∈ K is a solution to problem (2) if, for all p ∈ P_w, w ∈ W, it holds that

$$g_p(x^*) \begin{cases} = h_w(y^*), & \text{if } x_p^* > 0; \\ \ge h_w(y^*), & \text{if } x_p^* = 0. \end{cases}$$
I. Konnov and O. Pinyagina
In other words, at each equilibrium point, the path costs (for nonzero flows) are equal to the disutility function value for the associated O/D-pair. Finally, we consider the network equilibrium problem with mixed demand, originally proposed in [11]: find a vector (x*, y*) ∈ U such that

⟨G(x*), x − x*⟩ − ⟨H(y*), y − y*⟩ ≥ 0  ∀(x, y) ∈ U,  (3)
where

U = {(x, y) | Σ_{p∈P_w} x_p = y_w + y_w^const, x_p ≥ 0, y_w ≥ 0, p ∈ P_w, w ∈ W}.
In this problem, for each O/D-pair, the variable demand y_w and the fixed demand y_w^const are present simultaneously (y_w^const ≥ 0 ∀w ∈ W). It is well known that the equilibrium conditions for problem (3) are the following [11]: a vector (x*, y*) ∈ U is a solution to problem (3) if and only if for all p ∈ P_w, w ∈ W it satisfies the conditions
(a) if x*_p > 0, then g_p(x*) = min_{q∈P_w} g_q(x*);
(b) if x*_p > 0 and y*_w > 0, then g_p(x*) = h_w(y*);
(c) if x*_p = 0 or y*_w = 0, then g_p(x*) ≥ h_w(y*).
In what follows, we assume that each link cost function c_a depends on f_a only, ∀a ∈ A, and that each disutility function h_w depends on y_w only, ∀w ∈ W. Then, the mappings G and H are potential, and there exist functions

μ_a(f_a) = ∫₀^{f_a} c_a(t) dt ∀a ∈ A,  σ_w(y_w) = ∫₀^{y_w} h_w(t) dt ∀w ∈ W.
In this case, variational inequality (3) presents the optimality condition for the following optimization problem:

min_{u∈U} ψ(u),  (4)

where u = (x, y), ψ(x, y) = Σ_{a∈A} μ_a(f_a) − Σ_{w∈W} σ_w(y_w), and f_a, ∀a ∈ A, are defined in (1). Therefore, each solution to problem (4) solves problem (3). The reverse assertion is true if, for example, the mappings G and −H are monotone. In the following section, we propose the method of bi-coordinate variations for solving problem (3) when the mappings G and H are potential.
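To make the notation above concrete, the following toy computation evaluates the link flows (1) and the path costs g_p on a hypothetical 3-link, 2-path network with a simple affine link cost; all names and values here are illustrative assumptions, not data from the paper.

```python
# Toy illustration of formula (1) and the path cost g_p(x).
# Path-link incidence: paths[p] is the set of links traversed by path p,
# i.e. alpha[p][a] = 1 exactly when a is in paths[p].
paths = {"p1": {"a1", "a2"}, "p2": {"a1", "a3"}}
links = {"a1", "a2", "a3"}

def link_flows(x):
    """f_a = sum of x_p over all paths p containing link a (formula (1))."""
    return {a: sum(x[p] for p, arcs in paths.items() if a in arcs) for a in links}

def path_costs(x, cost):
    """g_p(x) = sum of c_a(f_a) over the links a of path p."""
    f = link_flows(x)
    return {p: sum(cost(f[a]) for a in arcs) for p, arcs in paths.items()}

x = {"p1": 2.0, "p2": 3.0}                 # path flows
cost = lambda fa: 1.0 + 0.5 * fa           # a simple affine link cost
f = link_flows(x)                          # shared link a1 carries both paths
g = path_costs(x, cost)
```

Note how the shared link a1 couples the two path costs: changing either path flow changes both g values, which is what makes the equilibrium problem nontrivial.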
3 Method of Bi-coordinate Variations

The classical coordinate descent methods for unconstrained minimization problems are based on the choice of only one coordinate as a descent direction. For constrained optimization problems, this rule may cause difficulties. In the bi-coordinate method, as its name suggests, one chooses (at least) two coordinates. In papers [5, 6], a method of bi-coordinate variations with a special threshold control and tolerances has been proposed for solving resource allocation problems with simplex-type constraints. Let us explain the idea of this method as applied to the network equilibrium problem (3).

We note that at any optimal point (x*, y*) ∈ U of problem (3), for any w ∈ W, in view of equilibrium condition (a), the components of vector G corresponding to nonzero path flows are equal:

∀w ∈ W, i, j ∈ P_w: x*_i > 0, x*_j > 0 ⟹ g_i(x*) = g_j(x*).

At the same time, due to equilibrium condition (b), for each nonzero variable demand, the cost values for paths p ∈ P_w with nonzero flows are equal to the value of the disutility function for this O/D-pair w ∈ W: if x*_p > 0 and y*_w > 0, then g_p(x*) = h_w(y*). Therefore, it is reasonable to "adjust" deviating values of path cost functions and disutility functions, i.e., to increase the small ones and to decrease the large ones. We also need an additional condition specifying which coordinate values we may decrease: a value is suitable for decreasing if it exceeds a certain threshold ε > 0. We denote the sets of "active" indices by

I_ε(x) = {i = 1, 2, ..., n | x_i ≥ ε},  J_ε(y) = {j ∈ W | y_j ≥ ε},

respectively. The proposed method has a two-level scheme. On the inner level, we minimize the objective function with fixed values of the parameters ε and δ; on the upper level, we decrease the values of these parameters. We use an inexact Armijo-type line-search as the step-choice rule.

Method of bi-coordinate variations (BCVM)
Step 0.
Choose a stop criterion and an accuracy value, an initial point u^0 ∈ U, sequences {ε_k} ↘ 0, {δ_k} ↘ 0, k = 1, 2, ..., and parameters β ∈ (0, 1), θ ∈ (0, 1). Set k := 1.
Step 1. Set l := 0, v^l := u^{k−1}.
Step 2. If the stop criterion is fulfilled at the point v^l, then the given accuracy has been attained and the iterative process stops. Otherwise, set (x^l, y^l) := v^l.
Step 3. Choose one or more pairs of indices, with no more than one pair for each w ∈ W, such that either

(i, n + j): g_i(x^l) − h_j(y_j^l) ≤ −δ_k, i ∈ P_j, j ∈ W,  (5)
(denote the sets of chosen indices by I_l^+ and J_l^+, respectively), or

(i, n + j): g_i(x^l) − h_j(y_j^l) ≥ δ_k, i ∈ I_{ε_k}(x^l), j ∈ J_{ε_k}(y^l), i ∈ P_j,  (6)
(denote the sets of chosen indices by I_l^− and J_l^−, respectively), or

(i, j): g_i(x^l) − g_j(x^l) ≥ δ_k, i ∈ I_{ε_k}(x^l), i, j ∈ P_w,  (7)
(denote the sets of chosen indices by I_l and J_l, respectively). If no such pairs exist, set u^k := (x^l, y^l), k := k + 1 and go to Step 1.
Step 4. Construct the descent direction d^l with components

d_s^l = 1, if s ∈ I_l^+ ∪ J_l^+ ∪ J_l;
d_s^l = −1, if s ∈ I_l^− ∪ J_l^− ∪ I_l;
d_s^l = 0, in all other cases.
Step 5. Find the smallest nonnegative integer b such that the following condition (the Armijo-type inexact line-search) is fulfilled:

ψ(v^l + θ^b ε_k d^l) − ψ(v^l) ≤ β θ^b ε_k ⟨ψ′(v^l), d^l⟩.

Set λ_l := ε_k θ^b, v^{l+1} := v^l + λ_l d^l, l := l + 1 and go to Step 2.

We note that for the network equilibrium problem with elastic demand, only conditions (5) and (6) should be applied in Step 3. In the network equilibrium problem with fixed demand, there are only the variables x ∈ X, and Step 3 reduces to condition (7). Now we establish the convergence properties of the proposed method.

Proposition 1 The line-search procedure at Step 5 of BCVM is finite.

The proof follows Lemma 3.1 of [5] and is omitted.

Proposition 2 Let the function ψ be coercive on U. Then, the inner iterative process (Steps 2–5) of BCVM is finite.

The proof follows Propositions 3.1 and 6.1 from [5] and is also omitted.

Theorem 1 Let the function ψ be coercive on U. Then, the sequence {u^k} generated by BCVM has limit points, all of which are solutions to VI (3). Provided that the function ψ is convex, they are also solutions to the optimization problem (4).

Proof Although this theorem follows Theorems 3.1 and 6.1 from [5], we present its complete proof in order to make the paper self-contained. First, we note that the sequence {u^k} is bounded and has limit points. In addition, ψ(u^{k+1}) ≤ ψ(u^k); therefore, there exists a limit lim_{k→∞} ψ(u^k) = ψ̄. We take any limit point ū = (x̄, ȳ) of the sequence {u^k} and denote by {u^{k_s}} a subsequence
converging to this point. For convenience, we write u^k = (x^k, y^k) within this proof. Further, by construction, for all k > 0 the following condition is fulfilled:

g_i(x^k) − h_j(y^k) > −δ_k  ∀j ∈ W, i ∈ P_j.

Proceeding to the limit as k = k_s → ∞, we obtain

g_i(x̄) ≥ h_j(ȳ)  ∀j ∈ W, i ∈ P_j.  (8)
Therefore, equilibrium condition (c) holds. On the other hand, by construction, for all k > 0 the following condition is true:

g_i(x^k) − h_j(y^k) < δ_k  ∀i ∈ I_{ε_k}(x^k), j ∈ J_{ε_k}(y^k).

Let w ∈ W, p ∈ P_w be arbitrary indices such that x̄_p > 0 and ȳ_w > 0. Then, the conditions x_p^{k_s} ≥ ε_{k_s}, y_w^{k_s} ≥ ε_{k_s} are fulfilled for sufficiently large numbers s. Therefore,

g_p(x^{k_s}) − h_w(y^{k_s}) < δ_{k_s}.

Proceeding to the limit as s → ∞, we obtain g_p(x̄) ≤ h_w(ȳ). Hence, in view of (8), with x̄_p > 0, ȳ_w > 0 and p ∈ P_w, we have

g_p(x̄) = h_w(ȳ).  (9)
Finally, by construction, for all k > 0 the following condition holds:

g_i(x^k) − g_j(x^k) < δ_k  ∀i ∈ I_{ε_k}(x^k), i, j ∈ P_w, w ∈ W.

Let p ∈ P_w be an arbitrary index for some w ∈ W such that x̄_p > 0. Then, x_p^{k_s} ≥ ε_{k_s} for sufficiently large numbers s. Therefore,

g_p(x^{k_s}) − g_j(x^{k_s}) < δ_{k_s}  ∀j ∈ P_w.

Proceeding to the limit as s → ∞, we obtain

g_p(x̄) ≤ g_j(x̄)  ∀j ∈ P_w.  (10)
Hence, due to (8)–(10), the chosen point (x̄, ȳ) satisfies equilibrium conditions (a)–(c) and is a solution to VI (3). If the function ψ is convex, then it is also a solution to problem (4).
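The Armijo-type step-size rule of Step 5 can be sketched in a few lines; the function name, the parameter values, and the one-dimensional quadratic test problem below are illustrative assumptions, not part of the paper.

```python
# Sketch of the Armijo-type inexact line search of Step 5: find the smallest
# nonnegative integer b such that
#   psi(v + theta**b * eps * d) - psi(v) <= beta * theta**b * eps * <psi'(v), d>.
def armijo_step(psi, grad, v, d, eps, beta=0.3, theta=0.5, max_b=60):
    gd = sum(gi * di for gi, di in zip(grad(v), d))  # <psi'(v), d>, negative for descent
    for b in range(max_b):
        lam = eps * theta ** b
        trial = [vi + lam * di for vi, di in zip(v, d)]
        if psi(trial) - psi(v) <= beta * lam * gd:
            return lam
    return 0.0

# Illustrative quadratic: psi(v) = (v0 - 1)^2, descent direction d = +1 at v0 = 0.
psi = lambda v: (v[0] - 1.0) ** 2
grad = lambda v: [2.0 * (v[0] - 1.0)]
lam = armijo_step(psi, grad, [0.0], [1.0], eps=1.0)
```

The accepted step always decreases ψ, which is the monotonicity used at the start of the proof above.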
Fig. 1 Example 1, 22 nodes, 12 O/D-pairs
4 Computational Experiments

We compare the method of bi-coordinate variations (BCVM) with the ordinary conditional gradient method (CGM) and present the results of preliminary numerical experiments for the network equilibrium problem with fixed demand. In Example 1 (see Fig. 1), we consider a network from paper [8]. We set the link cost functions c_a(f_a) = 1 + 0.5 f_a for all a ∈ A and the fixed demand y_w^const = 5 for all w ∈ W. We use the stop criterion of the conditional gradient method:

⟨ψ′(x), x − x̄⟩ < Δ,  (11)

where ⟨ψ′(x), x̄⟩ = min_{z∈X} ⟨ψ′(x), z⟩. The numbers of iterations of the inner process (Steps 2–5 of BCVM) and the calculation times for different accuracy values are presented in Table 1.

Table 1 Example 1, numbers of iterations and calculation time

Δ     | BCVM           | CGM
0.1   | 116 it., 16 ms | 885 it., 93 ms
0.01  | 137 it., 31 ms | 11228 it., 141 ms
0.001 | 147 it., 47 ms | 114961 it., 1076 ms

In the following examples, we used random data. We generated N nodes with random coordinates in the rectangle (0, 0, 1000, 750). If the distance between two nodes is less than 300, there exist two directed links connecting these nodes in opposite directions. We also randomly generated K O/D-pairs. We note that the dimension of the network equilibrium problem (the number of feasible paths over all O/D-pairs) is usually large, but the solution often contains many zero values. Therefore, in practice, we use the following "trick": instead of the sets P_w, w ∈ W, we use their approximations P̄_w. At the initial stage, we choose some nonempty subset P̄_w ⊂ P_w for each w ∈ W; at each iteration, these subsets can grow by including new shortest paths. At some moment, the subsets P̄_w stop growing.

The calculation results for several problems with a given error Δ = 0.1 are presented in Table 2. Besides the numbers of nodes and O/D-pairs, we give the approximate dimensions of the solutions, i.e., the value Σ_{w∈W} |P̄_w|.

Table 2 Numbers of iterations and calculation time for examples with different dimensions

N   | K  | Σ_{w∈W} |P̄_w| | BCVM               | CGM
50  | 20 | 200             | 2715 it., 0.09 s   | 6302 it., 3.04 s
80  | 26 | 600             | 6143 it., 0.218 s  | 8254 it., 13.1 s
100 | 30 | 850             | 11029 it., 0.39 s  | 9516 it., 22.8 s
100 | 40 | 1000            | 13608 it., 0.515 s | 11953 it., 32.34 s
200 | 50 | 2700            | 37719 it., 3.5 s   | 9896 it., 130.2 s

The program has been written in Visual C++ and tested on an Intel i3-4170 CPU at 3.7 GHz with 4 GB RAM, running under Windows 7.
5 Conclusion In the present paper, we apply the bi-coordinate variations approach to network equilibrium problems with mixed demand. It is based on the equilibrium conditions of the problem under consideration and uses a special tolerance control and thresholds for constructing descent directions. Results of preliminary numerical calculations show that this approach is rather efficient and promising for further investigations.
References

1. Dafermos, S., Sparrow, F.: The traffic assignment problem for a general network. J. Res. Nat. Bur. Stand. 73B, 91–118 (1969)
2. Dafermos, S.: Traffic equilibrium and variational inequalities. Transp. Sci. 14, 42–54 (1980)
3. Dafermos, S.: The general multimodal network equilibrium problem with elastic demand. Networks 12, 57–72 (1982)
4. Konnov, I.V.: On auction equilibrium models with network applications. Netnomics 16, 107–125 (2015). https://doi.org/10.1007/s11066-015-9095-6
5. Konnov, I.V.: Selective bi-coordinate variations for resource allocation type problems. Comput. Optim. Appl. 64(3), 821–842 (2016). https://doi.org/10.1007/s10589-016-9824-2
6. Konnov, I.V.: A method of bi-coordinate variations with tolerances and its convergence. Russ. Math. (Iz. VUZ) 60(1), 68–72 (2016). https://doi.org/10.3103/S1066369X16010084
7. Korpelevich, G.M.: Coordinate descent method for minimization problems with constraints, linear inequalities, and matrix games. In: Goldshtein, E.G. (ed.) Mathematical Methods for Solving Economic Problems, vol. 9, pp. 84–97. Nauka, Moscow (1980) (In Russian)
8. Nagurney, A.: Comparative tests of multimodal traffic equilibrium methods. Transp. Res. 18B(6), 469–485 (1984). https://doi.org/10.1016/0191-2615(85)90013-X
9. Nagurney, A.: Network Economics: A Variational Inequality Approach. Kluwer, Dordrecht (1999)
10. Patriksson, M.: The Traffic Assignment Problem: Models and Methods. Courier Dover Publications, Mineola, USA (2015)
11. Pinyagina, O.: On a network equilibrium problem with mixed demand. In: Kochetov, Yu. et al. (eds.) DOOR-2016. LNCS, vol. 9869, pp. 578–583. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-319-44914-2_46
12. Pinyagina, O.: The network equilibrium problem with mixed demand. J. Appl. Ind. Math. 11(4), 554–563 (2017). https://doi.org/10.1134/S1990478917040135
Developing a Model of Topological Structure Formation for Power Transmission Grids Based on the Analysis of the UNEG

Sergey Makrushin
Abstract The methods of the current research are based on analysis of the nodes degrees distribution. Information about the United National Electricity Grid (UNEG), Russia's power transmission grid, is used in the paper as a source of empirical data about power grid topology. As a result of the analysis, we conclude that universal models of complex network theory are not applicable to modelling the UNEG. Nevertheless, the creation of a compound model, which generates networks with nodes degrees distributions similar to the negative binomial distribution, is promising. The analysis of the UNEG nodes degrees distribution helped us to identify the key principles for a compound model. According to these principles, we chose from the ad hoc models of power network formation the Random Growth Model (RGM) as a good base for creating the compound model. However, the RGM has a significant flaw in the mechanism of formation of transit nodes in a power grid. We found a way to fix it by adding to the RGM an intermediate phase of network growth, which occurs between the phase of globally optimal growth and the phase of self-organized growth. A new 'project' stage of network formation could correctly depict the formation of transit nodes, which are created in large-scale projects of building long power line routes.

Keywords Power grid · Degree distribution · Network topology · Network growth
S. Makrushin (B)
Financial University under the Government of the Russian Federation, 49 Leningradsky Prospekt, GSP-3, 125993 Moscow, Russia
e-mail: [email protected]

© Springer International Publishing AG, part of Springer Nature 2018
V. A. Kalyagin et al. (eds.), Computational Aspects and Applications in Large-Scale Networks, Springer Proceedings in Mathematics & Statistics 247, https://doi.org/10.1007/978-3-319-96247-4_12

1 Introduction

The aim of the current research is to find principles for creating a model of topological structure formation for power transmission grids. In this research, information about the United National Electricity Grid (UNEG) is used as a source of empirical data about the structure of power grids. The UNEG is the power transmission grid of Russia,
the major part of which is managed by the Federal Grid Company of the United Energy System (FGC UES). The research methods are based on complex network theory, which is widely used for the analysis of power transmission grids [11] and for developing models of topological structure formation for infrastructure networks [13, 15, 16]. A special role in the research is assigned to the analysis of the nodes degrees distributions of the UNEG and other power transmission grids.
1.1 Motivation of the Research

A model of topological structure formation for power transmission grids would give us a tool for advanced analysis of networks and would help to develop a new approach for long-term forecasts of network development. In the process of developing a model of topological structure formation, we will in fact conduct reverse engineering of a complex system and will describe it as a relatively simple set of rules which manage the growth of the network. Analysis of these rules will give us an understanding of the key parameters which define the network structure.

With a model of topological structure formation, we will be able to generate a large number of test cases of artificial power grids which have realistic topologies and match specified parameters, such as the number of links and nodes, the spatial distribution of nodes, etc. Generated power grids could be used as null models for discovering individual characteristics of a real power grid in comparison with a class of networks with the same topological structure. This approach leads to a more correct analysis of network properties than using absolute values of network topology properties, or than using null models generated by one of the common topology formation models, which do not form a topological structure similar to power grids. Moreover, the model of topological structure formation could also be used for the generation of scenarios of power grid development. A statistical analysis of the topology properties of a large set of generated scenarios could be used as a new approach for long-term forecasting of power grid structure development.
1.2 Empirical Data

In this work, a UNEG computer model is used as a source of empirical data about power grid structure. In the computer model, the UNEG appears as a network with electric power stations and substations as nodes and high-voltage power lines as links. The UNEG computer model has been created for the main operating regions of the UNEG, except for the united national energy systems of Siberia, the East and (partially) the Urals. The model contains data on 514 nodes and 614 links. Besides network topology, the computer model includes information about geographical coordinates, the binding of nodes to administrative regions, and the voltage levels of nodes and links.
Fig. 1 Visualization of the UNEG network model (nodes shown in accordance with their geographical coordinates)
The data for the computer model creation has been taken from official papers [10], the online UNEG map from FGC UES [14] and the OpenStreetMap GIS [9]. A visualization of the UNEG computer model with nodes situated in accordance with their geographical coordinates is shown in Fig. 1. The computer model processing and the algorithm development have been done with the Python programming language and the NetworkX library [8]. Using the computer model of the UNEG, the basic properties of the network have been found: the average path length is 11.9 (hops) and the average clustering coefficient is 0.081.

This paper is organized as follows. Section 2 describes using the nodes degrees distribution as a criterion of applicability of models of topological structure formation. Section 3 reviews universal models of topological structure formation which are applicable to power transmission grids. Section 4 evaluates the acceptance of universal models of topological structure formation for the UNEG. Section 5 describes the special role of transit nodes in power grids. Section 6 reviews ad hoc models of topological structure formation. Section 7 describes the random growth model analysis and the possibility of its development. Section 8 concludes the paper.
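The two basic properties quoted above are one-liners in NetworkX (average_shortest_path_length and average_clustering); the following dependency-free sketch shows the same computations on a hypothetical 5-node toy graph (illustrative only, not UNEG data).

```python
from collections import deque
from itertools import combinations

# Toy undirected graph as an adjacency dict (illustrative stand-in).
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5}, 5: {4}}

def avg_path_length(adj):
    """Mean BFS distance in hops over all ordered node pairs (connected graph)."""
    total, pairs = 0, 0
    for s in adj:
        dist = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        total += sum(d for n, d in dist.items() if n != s)
        pairs += len(dist) - 1
    return total / pairs

def avg_clustering(adj):
    """Mean local clustering: share of each node's neighbour pairs that are linked."""
    coeffs = []
    for n, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            coeffs.append(0.0)
            continue
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        coeffs.append(2.0 * links / (k * (k - 1)))
    return sum(coeffs) / len(adj)
```

For the UNEG these two quantities evaluate to the 11.9 hops and 0.081 reported above; the toy graph here only demonstrates the definitions.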
2 Nodes Degrees Distribution Criterion

Works in the field of complex network theory have shown that there are several universal network models which are applicable to a wide set of real-world networks from different problem domains. The point is that real-world complex systems (which complex network theory represents as networks) are not formed instantaneously, but grow in a process managed by a set of rules. The similarity of the rules which manage the growth of complex networks makes it possible to create universal complex network models, which correctly describe the topology of networks from different problem domains. Thus, when we choose or build a model of topological structure formation for a type of network, we gain an understanding of the rules which manage the growth of these networks. Analysis of these rules could provide clues to the main parameters which define the structure of a network and, consequently, could help to compare correctly different networks from one problem domain and to build long-term forecasts of network development.

In the current research, two approaches are used for developing a model of topological structure formation for power grids: analysis of the applicability of universal models from complex network theory (current section and Sect. 4) and analysis of ad hoc models of topological structure formation for power grids (Sect. 6). Analysis of the nodes degrees distribution is used in the research as the main criterion of applicability for different network models. This criterion is quite popular in complex network theory [2, 5], and particularly in the analysis of power grids [11]. The nodes degrees distribution P(k) defines the probability that a randomly picked network node has degree k (i.e. has k links). In fact, a nodes degrees distribution can be considered as a probability mass function for a discrete random variable: the degree of a network node.
The popularity of the nodes degrees distribution is explained by the simplicity of its construction for any network: the value P(k) can be found as P(k) = N(k)/N, where N(k) is the number of nodes of the network which have degree k and N is the total number of nodes in the network. Usage of the nodes degrees distribution as a criterion of applicability for models of topological structure formation is based on the requirement of similarity between the nodes degrees distributions of an original network and of its analogue generated by a model of topological structure formation. As the shapes (distribution laws) of nodes degrees distributions are well known for many universal models, analysis of nodes degrees distributions can help to choose the most suitable universal model. Analysis of power grid nodes degrees distributions in different works [11] shows that the most interesting distributions for power grids are the Poisson distribution, the zeta distribution (the discrete analogue of the power-law distribution) and the geometric distribution (the discrete analogue of the exponential distribution). For each of these distributions, complex network theory offers a universal model of topological structure formation; we provide their short descriptions below.
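The construction P(k) = N(k)/N can be sketched in a few lines of Python; the degree sequence below is a made-up stand-in, not UNEG data.

```python
from collections import Counter

# Hypothetical degree sequence of a 10-node network.
degrees = [1, 1, 2, 2, 2, 2, 3, 3, 4, 10]

def degree_distribution(degrees):
    """Empirical probability mass function P(k) = N(k)/N of node degrees."""
    counts = Counter(degrees)
    n = len(degrees)
    return {k: counts[k] / n for k in sorted(counts)}

pk = degree_distribution(degrees)
```

Each value pk[k] is an aggregation of N(k) observations, which is exactly why the regression criticism in Sect. 3.3 applies: points of this function carry very different amounts of evidence.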
3 Universal Models of Topological Structure Formation for Power Transmission Grids

3.1 Barabási–Albert Model

In previous works, there are many statements about empirical evidence of a power law in the nodes degrees distributions of different types of networks, including power grids [1, 11]. As a nodes degrees distribution is defined for a discrete random variable, instead of the power law it is more correct to use the zeta distribution P(k) = k^{−γ}/ζ(γ), which is its discrete analogue (here, ζ(γ) = Σ_{n=1}^{∞} n^{−γ} is the Riemann zeta function). Networks with the zeta distribution of nodes degrees are usually called scale-free networks in the literature.

The Barabási–Albert model [1] is the most popular model of network topology structure formation for scale-free networks. In the Barabási–Albert model, new nodes are added sequentially and form new links to previously added nodes in accordance with the preferential attachment principle. This principle declares that a new node 'prefers' (has a greater probability) to form a link to nodes which already have more links than other nodes in the network. As a result of the preferential attachment principle, the network acquires several hub nodes with very high degrees and many nodes with degrees significantly lower than average. The presence of a few hubs is observed as the fat tail of the distribution, which is typical for the zeta distribution. Besides the preferential attachment principle, the Barabási–Albert model also uses the historical principle, which declares that nodes are added sequentially. As a result of the historical principle, the degree of a node depends on its lifetime in the network.
3.2 Model of Growing Random Network

The meta-analysis in [11] shows that in works on complex network theory in which the nodes degrees distribution of power grids was analysed, an exponential distribution of nodes degrees was found in most cases (in 8 papers out of 11). As in the previous case, instead of the continuous exponential distribution P(k) = Ce^{−ak}, it is more correct to use its discrete analogue. Using the substitutions a = −ln(1 − p) and C = p/(1 − p), we get P(k) = (1 − p)^{k−1} p, the discrete geometric distribution law.

For the generation of networks with the geometric law of nodes degrees distribution, there is the model of the growing random network (a dynamic variation on the Poisson random network model) described in [8]. In this model, a complete graph with m nodes is created at the first step. After that, new nodes are added sequentially, and each of them forms m new links to previously added nodes, choosing them randomly without any preferences. Like the Barabási–Albert model, the model of the growing random network uses the historical principle, but it does not use
the preferential attachment principle. This means that the difference between the expected degrees of nodes is primarily explained by the difference in their lifetimes in the network. As a result of the historical principle, the model of the growing random network forms a group of 'aged' hub nodes with relatively high degrees in a generated network. But this group is bigger, and the difference in degrees between hubs and other nodes is lower, than in the case of the fat tail of the Barabási–Albert model.
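A minimal sketch of the growing random network model described above; the function name, the seed and the sizes are illustrative assumptions.

```python
import random

def growing_random_network(n, m, seed=0):
    """Start from a complete graph on m nodes; each new node links to m
    uniformly chosen existing nodes (historical principle only, no
    preferential attachment)."""
    rng = random.Random(seed)
    edges = {(i, j) for i in range(m) for j in range(i + 1, m)}  # complete K_m
    for new in range(m, n):
        targets = rng.sample(range(new), m)  # uniform choice among older nodes
        edges |= {(t, new) for t in targets}
    return edges

edges = growing_random_network(200, 2)
# Every node added after the seed graph contributes exactly m = 2 new links.
```

Replacing the uniform `rng.sample` with a degree-weighted choice would turn this into the Barabási–Albert mechanism, which is exactly the difference between the two models discussed above.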
3.3 Methods of Nodes Degrees Distribution Analysis

In previous works on the nodes degrees distributions of power transmission grids, a regression model was used for the analysis of points on a nodes degrees distribution plot, which looks like the plot of the UNEG data (see Fig. 2). However, despite the high value of R², using a regression model for the probability mass function of a discrete random variable is incorrect, because different points on such graphs are not observations, but aggregations of different quantities of observations. For example, for the UNEG, the data point for P(k = 2) is defined as a result of more than 200 facts (observations of node degrees), but the data point for P(k = 10) is defined as a result of only one fact. Formally, a regression model in this case violates the assumption of the Gauss–Markov theorem about the homoscedasticity of errors. In this case, it is better to use methods based on maximum likelihood estimation, such as the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), instead of a regression model; for example, this approach is used in [6].

Consequently, regressions made in the previous research are disproportionately sensitive to the shape of the tail of a nodes degrees distribution and do not sufficiently take into account the shape of the head of the distribution. We could therefore interpret previous studies as estimations of the shapes of the tails of the nodes degrees
Fig. 2 The UNEG nodes degrees distribution approximation (shown in different coordinate axes)
distributions of power grids, which predominantly follow the geometric (exponential) distribution. But besides testing distributions which fit the tails of power grid nodes degrees distributions, we need to consider distributions which could fit the heads of these distributions. This is especially important because most nodes in power grids have a low degree; for example, in the UNEG, 81% of nodes have degrees less than or equal to 3. Visual analysis of the head of the UNEG nodes degrees distribution (see Fig. 2) and of the heads of other power grid distributions [11, 13] shows that it looks like the head of the Poisson distribution. But in our case, it is necessary to use the Poisson distribution law shifted by one to the right, because the degrees of all nodes in a power grid are greater than or equal to 1.
3.4 Geometric Graph Model

In complex network theory, there are two widely used network topology formation models whose nodes degrees distributions fit the Poisson distribution law: the Erdős–Rényi model [3] and the random geometric graph model [12]. Further, we discuss only the random geometric graph model, because it is much closer to the power grid case. The random geometric graph model describes network topology generation for a network with nodes situated on a plane (or in a metric space with another number of dimensions). In this model, all nodes are randomly situated in some region of a plane, and then every node is linked to all nodes which are situated nearer than R to it. Here, R is a parameter of the model: the maximum length of a link.

When the random geometric graph model generates a network with a large number of nodes, the nodes are randomly and uniformly distributed in a plane region, and R is much smaller than the size of the region, any link creation is virtually the result of a large number of Bernoulli trials with a low probability. Due to the Poisson limit theorem, the distribution of the number of links of a random node in this model can be approximated by the Poisson distribution law P(k) = e^{−λ} λ^k / k!, where the parameter λ equals the average degree of nodes in a generated network [12]. The random geometric graph model has a good interpretation for networks with spatial referencing of nodes and a cost of link creation dependent on link length. This model strongly uses the geometrical principle of network growth, which postulates that link creation depends on the metric distance between the nodes which the link connects.
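A minimal sketch of the random geometric graph model on the unit square (NetworkX also provides it as random_geometric_graph); the parameter values and the seed are illustrative assumptions.

```python
import random

def random_geometric_graph(n, R, seed=1):
    """Place n nodes uniformly in the unit square; link every pair of nodes
    whose Euclidean distance is below R."""
    rng = random.Random(seed)
    pos = [(rng.random(), rng.random()) for _ in range(n)]
    edges = {(i, j) for i in range(n) for j in range(i + 1, n)
             if (pos[i][0] - pos[j][0]) ** 2 + (pos[i][1] - pos[j][1]) ** 2 < R ** 2}
    return pos, edges

pos, edges = random_geometric_graph(500, 0.05)
mean_degree = 2 * len(edges) / 500
# For small R, mean_degree approximates lambda = n * pi * R^2 (up to border effects),
# and the degree histogram approaches the Poisson law discussed above.
```
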
3.5 Negative Binomial Distribution

Nevertheless, as we can see from Fig. 2, the shifted Poisson distribution law and the geometric (exponential) distribution law are adequate to the empirical UNEG
distribution only in the head and in the tail of this distribution, respectively. However, there exists the negative binomial distribution (or the Pascal distribution), which for certain parameter values could have a head similar to the head of the Poisson distribution and a tail similar to the tail of the geometrical distribution. The negative binomial distribution is a discrete probability distribution of the number of failures in a sequence of independent and identically distributed Bernoulli trials (with success probability p) before a specified number of successes (denoted r ) occurs. Probability −1 (1 − p)k pr . From the form of mass function of this distribution is P(k) = Crr+k−1 the function, we can conclude that geometrical distribution is a special case of the negative binomial distribution for r = 1; moreover, when the value of r is low, the tail of the negative binomial distribution (P(k) values for k r ) will have a shape alike to the tail of a geometrical (exponential) distribution law. As in the case of the Poisson distribution law, we need to shift the negative binomial distribution by one to the right, because degrees of all nodes in our networks are greater than or equal to 1. Unfortunately, at the moment, a network topology structure formation model for the negative binomial distribution of nodes degrees does not exist. But despite this, the negative binomial distribution is used in works based on complex network theory as a variant of the describing function in the analysis of nodes degrees distribution of in empirical networks. In some cases, negative binomial distribution even becomes the best variant of the describing function [6]. Despite the absence of a corresponding model, we could very tentatively interpret the negative binomial distribution of nodes degrees as a superposition of the random geometric graph model and the growing random network for nodes with low and high degrees, respectively. 
This interpretation is quite acceptable with regard to the principles which we postulate for each model.
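To make the fitting concrete, here is a minimal sketch of maximum likelihood estimation for the shifted negative binomial distribution described above. The degree sequence is invented for illustration and does not reproduce the real UNEG data; the crude grid search stands in for a proper numerical optimizer.

```python
import math

# Invented degree sequence standing in for the UNEG data (illustration only).
degrees = [1, 1, 2, 2, 2, 2, 3, 3, 4, 5, 6, 2, 1, 3, 2, 4]

def nbinom_logpmf(k, r, p):
    """log P(k) for P(k) = C(r+k-1, r-1) * (1-p)^k * p^r."""
    return (math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
            + r * math.log(p) + k * math.log(1 - p))

def log_likelihood(r, p, data):
    # Node degrees are >= 1, so the distribution is shifted by one: fit k - 1.
    return sum(nbinom_logpmf(k - 1, r, p) for k in data)

# Crude MLE by grid search over integer r and p in (0, 1) -- a sketch only.
best = max(((r, p / 100) for r in range(1, 10) for p in range(1, 100)),
           key=lambda rp: log_likelihood(rp[0], rp[1], degrees))
print("r =", best[0], ", p =", best[1])
```

For r = 1 the same code fits the geometric special case, which is a quick consistency check of the claim above.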
4 Acceptance of Universal Models of Topological Structure Formation for the UNEG Data

As a result of the above analysis, we have chosen the following distributions for testing their accordance with the UNEG nodes degrees distribution: the zeta distribution (a discrete analogue of a power law, corresponding to the Barabási–Albert model), the geometric distribution (a discrete analogue of the exponential distribution, corresponding to the model of the growing random network), the Poisson distribution (corresponding to the random geometric graph model) and the negative binomial distribution (very tentatively, as a superposition of the random geometric graph model and the growing random network). Taking into account the criticism of the regression method, we have used a more appropriate methodology for testing the accordance of different distributions to the UNEG empirical data on the nodes degrees distribution. First, for every tested degrees distribution, we fitted parameters to the UNEG distribution using maximum likelihood estimation. Second, AIC and BIC criteria values were calculated for the comparison of the adjustment quality
Developing a Model of Topological Structure Formation for Power Transmission …
179
Table 1 Comparison of different distributions with the parameters adjusted to the UNEG nodes degrees distribution

Distribution        Parameters        ln of likelihood   AIC      BIC      χ²      χ² P-value
                                      function
Zeta                γ = 1.83          −1025.7            2053.4   2057.6   430.5   7.73e−91
Geometric           p = 0.40          −864.3             1730.6   1734.8   95.1    5.57e−19
Poisson             λ = 1.49          −856.8             1715.6   1719.8   88.2    1.57e−17
Negative binomial   r = 3, p = 0.67   −834.4             1672.8   1681.3   33.1    1.13e−6
between different distribution laws. Also, for every distribution law, the hypothesis of its applicability to the UNEG distribution was tested using the χ² statistic. A similar technique was used in other studies which apply complex network theory in other domains (see, for example, [6]). In Table 1, the results of the comparison of statistical properties of different distributions with the parameters adjusted to the UNEG distribution are presented. In Fig. 2, these distributions are shown together with the empirical data on the UNEG in different coordinate axes. Among the tested distributions, the negative binomial distribution has the smallest values of AIC and BIC. Nevertheless, the chi-square test rejects the strict hypothesis of applicability for all of these distributions, including the negative binomial distribution, to the UNEG. Based on this analysis, we conclude that using the zeta distribution (power law) and the geometric distribution (exponential law), which are typically used for the description of nodes degrees distributions of power grids, is incorrect for the UNEG and apparently for many other power grids. The popularity of these distributions in the previous works reviewed in [11] is the result of using incorrect statistical methods, such as regression analysis of the nodes degrees distribution. It also follows that universal models of topological structure formation, such as the Barabási–Albert model, the model of the growing random network and the random geometric graph model, cannot correctly describe the process of the UNEG topology formation. Considering the good results of the negative binomial distribution, we conclude that the creation of a compound model of topological structure formation for power transmission grids is promising. For this compound model, we need to use basic principles from the source models.
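The AIC and BIC values in Table 1 follow directly from the log-likelihoods and the number of free parameters of each distribution. A minimal sketch (the node count n is an assumption for illustration, as the exact value is not restated in this section):

```python
import math

def aic(log_l, k):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_l

def bic(log_l, k, n):
    """Bayesian information criterion: k ln n - 2 ln L."""
    return k * math.log(n) - 2 * log_l

# Log-likelihoods and parameter counts taken from Table 1; n = 500 is an
# assumed UNEG node count, used only to illustrate the BIC sample-size term.
n = 500
models = {"zeta": (-1025.7, 1), "geometric": (-864.3, 1),
          "poisson": (-856.8, 1), "negative binomial": (-834.4, 2)}
for name, (log_l, k) in models.items():
    print(f"{name}: AIC = {aic(log_l, k):.1f}, BIC = {bic(log_l, k, n):.1f}")
```

The AIC values reproduce column 4 of Table 1 exactly; the negative binomial attains the smallest AIC and BIC even though it pays for one extra parameter.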
In our case, we need to use the historical principle from the random growing network model and the spatial principle from the geometrical graph model. In accordance with the historical principle, most nodes are added to a network sequentially, and it is more likely that older nodes will have more links than younger ones. This principle is especially important for the generation
of nodes with high degree (the tail of a nodes degrees distribution). In accordance with the spatial principle, all nodes in a network are embedded in metrical space, and the existence of a link between nodes depends on the spatial distance between them. This principle plays a major role for nodes with low degree.
5 Special Role of Transit Nodes in Power Grids

The analysis of the cause of the chi-square test failure for the negative binomial distribution shows that this distribution fails in the estimation of the fraction of nodes with degree 2 in the UNEG. Figure 2 shows that even the Poisson distribution significantly underestimates the proportion of this type of nodes. This is a consequence of the fact that nodes with degree 2 are transit nodes in power grids: they have a special role and a special mechanism of formation, which we need to take into account in the compound model. Visual analysis of power grids (see Fig. 1) shows that the high proportion of transit nodes is a consequence of the fact that chain patterns are widespread in power grids. A chain pattern is a chain of transit nodes which forms a long power line route. This route usually has a certain spatial direction and connects distant parts of a power grid. Typically, all nodes of a chain are added to a grid within the framework of one project of grid development. Using a chain of links separated by nodes instead of one long link for connecting distant parts of a power grid has technical and economic reasons. Building power lines with a lower voltage level is cheaper, but these lines have more losses in power transport due to electrical resistance. Moreover, higher electrical resistance causes a more intensive voltage drop throughout a line. Consequently, power lines with lower voltage need more frequent installation of transit substations which support the nominal voltage level. Nevertheless, using lower voltage levels for transporting electricity over long distances can be economically effective if the power flow is relatively low. Another advantage of using a chain of power lines separated by substations instead of one long power line of higher voltage is the possibility to access the power grid in all regions near the transit substations.
Power lines which are used to access the power grid through transit substations are not seen in the network model, because distribution power lines have a lower voltage level and are not mapped in the model of the power transmission grid. That is why transit nodes which are used by local consumers have a degree of 2. The geometrical graph model and the random growing network model do not form a sufficient quantity of chain patterns in a network; thus, the formation of chain patterns should be used as an additional principle of the compound model.
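The share of transit nodes discussed here is straightforward to measure on any edge list. A sketch on a toy graph (the edge list is invented and contains one chain pattern b–c–d on a long route):

```python
from collections import Counter

# Toy grid fragment: a chain of transit nodes (b, c, d) on a route from a to e,
# plus a redundancy link and one dead-end node g. Illustration only.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"),
         ("a", "f"), ("e", "f"), ("e", "g")]
degree = Counter()
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

# Transit nodes are exactly the degree-2 nodes of the transmission-grid model.
transit = [n for n, d in degree.items() if d == 2]
print(f"transit share: {len(transit) / len(degree):.2f}")
```

Running the same count on a graph generated by the random geometric graph or growing random network model would expose the deficit of degree-2 nodes noted above.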
6 Ad Hoc Models of Topological Structure Formation

After formulating the principles for the compound model, we searched previous papers for ad hoc models of topological structure formation for power transmission grids which implement these principles. As a result, we have found three models of topological structure formation: the geographical network learner and generator (GNLG) model [15], the RT-nested-small-world model [16] and the random growth model (RGM) [13]. The following analysis showed that only the random growth model is based on principles similar to the principles formulated in this paper.
6.1 Geographical Network Learner and Generator (GNLG) Model

The aim of the geographical network learner and generator (GNLG) model from [15] is to generate synthetic networks with structural and spatial properties similar to real power grids. The algorithm first generates a set of nodes with a spatial distribution similar to the nodes in a given empirical network. Then, it connects the nodes using two procedures: the tunable weight spanning tree (TWST), which is applied first, and the reinforcement procedure, which is applied last. Their design is inspired by the historical evolution of power grids, and their emphases are on the connectivity of a network and the network robustness, respectively. The TWST procedure imitates a gradual network evolution and builds a quasi-minimum spanning tree with a root in the centre of mass of the spatially distributed nodes of the generated network. As a result of running the TWST procedure, all nodes are connected in a tree structure, which is constructed as a compromise between the total link length and the average path length between nodes. Soltan et al. [15] concluded that the tail of the degree distributions of all four empirical power grids tested in the paper follows a power law. Nevertheless, the authors did not try to test an exponential law, and they admit that they do not have enough statistical evidence to support the hypothesis about a power law distribution for their networks. Proceeding from this, in [15], the conclusion is drawn that nodes degrees distributions for power grids are very similar to those of scale-free networks, but grids have fewer nodes with degrees 1 and 2 and do not have nodes with very high degrees. Apart from this postulate, the reinforcement procedure of the GNLG model is based on the following observations: power grids do not include very long lines, and nodes in denser areas are more likely to have higher degrees.
On that basis, the reinforcement procedure of the GNLG model works as follows: it selects a low-degree node (for large networks, the procedure only considers nodes with degrees 1 and 2) in a dense area of a network, and connects it to a high-degree node (as in the preferential attachment model [1]) from the nearest neighbourhood.
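A reinforcement-like step of this kind can be sketched as follows. This is not the actual GNLG implementation from [15]: the coordinates, adjacency and neighbourhood size are invented for illustration.

```python
import math

# Toy spatial network; positions and adjacency are assumptions for the sketch.
pos = {"a": (0, 0), "b": (1, 0), "c": (2, 0), "d": (1, 1), "e": (3, 0)}
adj = {"a": {"b"}, "b": {"a", "c", "d"}, "c": {"b", "e"}, "d": {"b"}, "e": {"c"}}

def dist(u, v):
    (x1, y1), (x2, y2) = pos[u], pos[v]
    return math.hypot(x1 - x2, y1 - y2)

# 1. Pick a low-degree node (degree 1 or 2).
low = min((n for n in adj if len(adj[n]) <= 2), key=lambda n: len(adj[n]))
# 2. Among its two nearest non-neighbours, attach to the highest-degree one --
#    a preferential-attachment flavoured choice from the nearest neighbourhood.
candidates = sorted((n for n in adj if n != low and n not in adj[low]),
                    key=lambda n: dist(low, n))[:2]
target = max(candidates, key=lambda n: len(adj[n]))
adj[low].add(target)
adj[target].add(low)
print(low, "->", target)
```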
Thus, we can conclude that the GNLG model actively uses the spatial principle (both in the TWST procedure and in the reinforcement procedure) and uses elements of the historical principle in a very special way. Also, the TWST procedure uses the preferential attachment principle very actively. Thus, the principles of the GNLG model are quite different from our results, and this model cannot be used as a basis for the creation of the compound model.
6.2 RT-nested-Small-world Model

In a paper by Wang et al. [16], the RT-nested-small-world model for the generation of power grid topology is presented. This model constructs a large-scale power grid in a hierarchical way: first, it forms connected subnetworks of limited size; then, it connects the subnetworks through lattice connections. The hierarchy in the model arises from observations of real-world power grids: usually, a large-scale system consists of smaller subsystems, which are interconnected by sparse and important tie lines. As a result of the analysis of nodes degrees distributions of empirical power grid networks, the authors found that they have an exponential tail, analogous to that of the geometric distribution. However, it is also noticed that for small node degrees (less than or equal to 3), the empirical probability mass function curves clearly deviate from that of a geometric distribution. The authors concluded that a nodes degrees distribution for power grids can be well fit by a mixture distribution coming from the sum of a truncated geometric degree distribution and an irregular discrete random variable for nodes with low degree. The first step of the RT-nested-small-world model is the generation of connected subnetworks using a modified small-world model, called the clusterSmallWorld procedure. This logic is based on the fact that there is a lot of evidence that power grids have properties of a small world [4]. The clusterSmallWorld procedure differs from the Watts–Strogatz small-world model [17] in two aspects: the initial link creation and the link rewiring process. Instead of connecting every node to the same quantity of immediate neighbours as in the Watts–Strogatz model, in the clusterSmallWorld procedure, every node is connected to k nodes from a local neighbourhood. Here, k is not a constant, but comes from the geometric distribution.
It is important to note that in the RT-nested-small-world model, there is no binding of nodes to metric space, and an artificial index of nodes is used to define the neighbourhood of a node. A special procedure based on a Markov chain process is used in clusterSmallWorld for the link rewiring process, instead of rewiring links to an arbitrary node chosen randomly, as is done in the Watts–Strogatz model. The Markov chain process is used here to produce a correlation between the rewired links, as in real power grids. After the subnetworks are generated, they are connected through lattice connections, which are selected randomly from neighbouring subnetworks to form a whole large-scale power grid network.
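The initial wiring step just described can be sketched as follows. This is a clusterSmallWorld-flavoured sketch, not the exact procedure from [16]: the network size, the geometric parameter and the ring-index neighbourhood are assumptions, and the rewiring stage is omitted.

```python
import random

random.seed(42)
N, p_geo = 30, 0.4  # p_geo: assumed parameter of the geometric distribution

def geometric(p):
    """Draw from a geometric distribution (number of trials until success)."""
    k = 1
    while random.random() > p:
        k += 1
    return k

# Each node i connects to k nodes from its index neighbourhood, with k drawn
# from a geometric distribution rather than held constant as in Watts-Strogatz.
edges = set()
for i in range(N):
    k = geometric(p_geo)
    for step in range(1, k + 1):
        j = (i + step) % N  # "local neighbourhood" defined by an artificial index
        if j != i:
            edges.add((min(i, j), max(i, j)))
print(len(edges), "links created")
```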
We can conclude that the RT-nested-small-world model uses neither the spatial nor the historical principle. However, the model generates networks with an exponential tail of the nodes degrees distribution. Despite this, the large difference in principles forces us to reject the use of the RT-nested-small-world model as a basis for the creation of the compound model.
6.3 Random Growth Model (RGM)

In a paper by Schultz et al. [13], a random growth model (RGM) is proposed. The aim of the model is to create synthetic analogues of networks for power grids and other infrastructure networks. The algorithm of the RGM consists of an initialization phase, in which a tree-like kernel of a network is created, and a growth phase, in which new nodes are added sequentially. In both phases, an attachment rule is used which is a trade-off between cost optimization and creating redundancy of a network. For this purpose, in the process of adding links, a special heuristic target function is used:

f_RC(n_i, n_j) = (d_G(n_i, n_j) + 1)^r / d_S(n_i, n_j),   (1)

where d_G is the length of the shortest path between nodes i and j in network G (i.e. their network distance measured in hops) and d_S is the spatial distance between the nodes. Here, r is a coefficient which lets us tune the relative importance given to redundancy versus costs, from r = 0 (redundancy is disregarded) to r → ∞ (costs are disregarded). Because the main logic of the RGM algorithm is similar to the model of the growing random network (the dynamic variant of the Poisson random network model described in [5]), the RGM algorithm generates networks with an exponential tail of the nodes degrees distribution which is not imposed exogenously but emerges endogenously from the growth process. Furthermore, it implements a feature which provides a high proportion of transit nodes in a generated network: a special subprocedure of the grid's evolution algorithm splits some links in a generated network by adding transit nodes to them. Below, a basic description of the RGM algorithm is given (see the full description in [13]).

Phase 1: Initialization of the kernel. At the first (initialization) phase, the connected kernel of a generated network is created from a small share of the nodes of the network.

1.1 Creation of the kernel nodes: Using the parameter N_0 (the quantity of nodes in the kernel) and the rule of spatial distribution for new nodes, the new nodes of the kernel of the network are created and placed in a metrical space.
1.2 Connection of the kernel: Kernel nodes are linked into a connected tree graph with minimal total length (spatial distance d_S) of links, using a minimal spanning tree algorithm.
1.3 Enforcing of the kernel: Parameters of the model define the quantity of additional links which provide the redundancy (additional reliability) of the kernel of the network. Additional links are added sequentially; at every step, the additional link with the highest value of the redundancy/cost function f_RC is chosen for addition.

Phase 2: Network growth. In the network growth phase of the RGM algorithm, the network grows by sequential addition of new nodes and links.

2.1 Choosing the iteration type: For every new iteration of the growth process, with a probability s (a parameter of the model), one of two variants of the growth process is chosen: the first one consisting of step 2.2 and the second one consisting of steps 2.3–2.5.
2.2 Splitting a line: One of the network links is chosen randomly; then, in the middle of the link, a new node is added and the link is split into two new links, each of them connected to the new node and to one old neighbour of the split link.
2.3 Adding a new node: Using the rule of spatial distribution of new nodes, a new node is created. The node is connected to the nearest node (here, the spatial distance d_S is used as a distance measure).
2.4 Increasing linkage redundancy of the new node: This step is executed with a probability p (a parameter of the model). For the new node created at step 2.3, a new link is added, aiming to increase the redundancy of the node's connectedness. The additional link is chosen with the maximum value of the redundancy/cost function f_RC among all possible new links for the new node.
2.5 Increasing linkage redundancy of a random node: This step is executed with a probability q (a parameter of the model). A random node is chosen in the network, and a link is added to it, aiming to increase the redundancy of the node's connectedness. The additional link is chosen with the maximum value of the redundancy/cost function f_RC among all possible new links for the chosen node.
2.6 Looping: Return to step 2.1 until the target quantity of nodes in the network is reached.
This model stands out among other models because it implements the historical and the spatial principles and even has a mechanism for the formation of chain patterns. In the paper [13], the RGM algorithm is used for the generation of an analogue of an empirical model of the Western US power grid, which consists of 4941 nodes and 6594 links. With parameters r = 0.3, N_0 = 50, s = 0.28, p = 0.03, q = 0.44, the analogue has a nodes degrees distribution very close to that of the empirical network. Thus, the RGM can form a topological structure of the network which has a nodes degrees distribution consistent with empirical distributions for power grids. The RGM agrees with the principles for the compound model which we found as a result of the analysis of the UNEG. In the next section, we will make an additional analysis of the RGM.
7 Random Growth Model Analysis

The high performance of the combination of the historical and the spatial principles as a basis for the model of topological structure formation of power grids follows from the fact that these principles agree not only with the mathematical models from which we have deduced them, but also with the real practice of power grids development. In this section, we will compare the RGM basic principles with the real practice of power grids development and try to find possibilities to improve the RGM. In the first step, we will go deeper into the implementation of the historical principle in the RGM. We deduced the historical principle from the model of the growing random network (the dynamic variant of the Poisson random network model), and broadly defined the principle as the necessity to add nodes sequentially to a generated network. Nevertheless, the structure of the RGM algorithm is even more similar to the model of the growing random network than the historical principle requires. The RGM has two phases, the initialization of a kernel and the network growth, quite similar to the analogous phases of the model of the growing random network. In both models, at the first phase, all kernel nodes are created simultaneously and form a connected graph. The RGM kernel network is based on the minimal spanning tree and is close to a globally optimal state by cost and reliability criteria. The kernel nodes cover the whole area of the generated power grid, but with a lower density of nodes and longer links. The network growth phase of the RGM algorithm imitates an evolutionary development of a network, with spontaneous addition of nodes and links and the usage of locally optimal decisions in optimization problems, which is typical for self-organizing systems. The logic of these phases is in good agreement with the real practice of development of power grids. At the beginning of the development of a nationwide power grid, a strategic plan is usually developed at the state level.
In particular, in the USSR, the GOELRO plan (a strategic plan) was developed at the initial phase of creating the nationwide power grid [7]. Due to the GOELRO plan implementation, gross electricity generation in the USSR increased more than 50 times in the 15 years from 1920 to 1935. It is usual for strategic planning to use system-level (global) criteria for the optimization of the target state. Because the costs of network creation and reliability issues are usually the main criteria in power network development, the logic of the RGM kernel network creation process is highly correlated with the real practice of the first stage of national power transmission grid creation. In practice, as in the RGM, a power grid structure is created at the beginning as globally optimal in terms of costs and reliability. At the later stages of development of a power grid, when the main framework of a network has already been created, the real practice of power grid development looks quite similar to the second phase of the RGM algorithm. The main exception here is the link (power line) splitting procedure in the RGM. This procedure is very important for the RGM: for example, in the case of the generation of the analogue of the Western US power grid, about 28% of nodes are created by this procedure. Since the procedure creates new transit nodes, its importance is explained by the high share of nodes of degree 2 in real power grids. However, as
mentioned above, in the real practice of power grid development, transit nodes are usually created as a result of the creation of a chain of transit nodes which forms a long power line route, and only in rare cases are they the result of splitting an existing line by a new transit node. Moreover, the high quantity of splitting operations and the specificity of link creation in the phase of the initialization of the kernel in the RGM presuppose that at the early stage of a power grid development a lot of links (power lines) have an extremely high length. In reality, this was impossible, at least for technological reasons: in practice, at the early stage of development of power transmission grids, long routes were constructed as chains of transit nodes. The problem of replacing the line splitting procedure by the use of chains of nodes in the implementation of a transit node generation process could be solved if we look in detail at the gap between the phase of initialization of the kernel and the phase of network growth in the RGM. Indeed, between the stage of implementation of a strategic plan for creating a power grid from scratch and the stage when the framework of the network is already created and its development looks like a self-organized process, there is a long stage when power grid development is governed by a series of long-term plans. Usually, the planning horizon of long-term plans of power grid development is from 10 to 20 years. While a power grid is developed, the relative role of every subsequent long-term plan diminishes, and the projects in it move from a global (system) level to a local level. This means that an important role in power grid development is played by the implementation of medium-range development projects. A common purpose of such projects is the creation of long power line routes, which form chain patterns in a power grid.
The logic of node addition in these projects can be placed between global optimization (the first phase of the RGM) and local optimization for individual nodes and links (the second phase of the RGM). As with the link splitting procedure in the RGM, the creation of power line routes leads to an abundance of transit nodes, but it forms a different topology structure of a network. In particular, in the case of power line route creation, transit nodes will be neighbours of other transit nodes more frequently than in the case of the link splitting procedure. We can conclude that the line splitting procedure is very important for the RGM, but it does not correspond to the real power grid development process which leads to an abundance of transit nodes in power grids. To correct the RGM algorithm, we need to modify it by removing the link splitting procedure and by adding a process of network growth based on medium-range projects. These projects create long power line routes, which form chain patterns in a power grid. The new project phase of network formation in the modified RGM algorithm will ensure the transition of optimization principles from the global to the local level as a power grid develops.
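The proposed 'project' step can be sketched as a route-building primitive: instead of splitting an existing link, two distant nodes are connected by a chain of new transit nodes. The node names and the helper are assumptions for illustration, not code from [13].

```python
# Toy adjacency map; "a" and "z" stand for two distant parts of a grid.
adj = {"a": {"b"}, "b": {"a"}, "z": set()}

def add_route(adj, start, end, n_transit):
    """Connect start and end by a chain of n_transit transit nodes,
    as one medium-range project building a long power line route."""
    chain = [start] + [f"{start}-{end}-t{i}" for i in range(n_transit)] + [end]
    for u, v in zip(chain, chain[1:]):
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return chain

chain = add_route(adj, "a", "z", n_transit=3)
# Every interior node of the route is a transit node of degree 2,
# and each is a neighbour of another transit node (the chain pattern).
print([len(adj[n]) for n in chain[1:-1]])
```

Unlike the line splitting procedure, which scatters isolated degree-2 nodes, this primitive produces runs of adjacent transit nodes, matching the chain patterns observed in real grids.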
8 Conclusion

In the current research, the analysis of the applicability of universal models to the UNEG network based on nodes degrees distribution considerations allows us to draw the following conclusions. Current universal models are not applicable to UNEG modelling. Nevertheless, the creation of a compound model which has a nodes degrees distribution similar to the negative binomial distribution is promising. The analysis of the UNEG nodes degrees distribution helped us to identify the key principles of a model of topological structure formation for power transmission grids: the historical and the spatial principles, and the necessity for a special mechanism for the formation of chain patterns consisting of transit nodes in a network. Especially important for nodes with high degree is the historical principle of network growth, which postulates that nodes are added sequentially and the degree of a node depends on the lifespan of the node. The spatial principle of network growth postulates that link creation is defined by the metric distance between nodes. This principle and the special role of transit nodes are important for nodes with low degree. Guided by these principles, we chose the RGM, an ad hoc model of network formation which could be a good base for the creation of a compound model. This model has a good implementation of the historical and the spatial principles, but the mechanism for the formation of transit nodes in the RGM is not consistent with the real process of power grid development. The analysis has shown that the most promising way of developing a model of topological structure formation for power transmission grids is to further develop the RGM algorithm by adding a new 'project' stage of network formation. This stage is a transition from the global to the local optimization level and could correctly reflect large-scale projects of network development.
References

1. Barabási, A., Albert, R.: Emergence of scaling in random networks. Science 286(5439), 509–512 (1999)
2. Barthelemy, M.: Spatial networks. Physics Reports 499, 1–101 (2011). arXiv:1010.0302
3. Erdős, P., Rényi, A.: On random graphs I. Publ. Math. Debrecen 6, 290–297 (1959)
4. Makrushin, S.: Analysis of Russian power transmission grid structure: small world phenomena detection. In: Kalyagin, V., Nikolaev, A., Pardalos, P., Prokopyev, O. (eds.) Models, Algorithms, and Technologies for Network Analysis. NET 2016, Springer Proceedings in Mathematics & Statistics, vol. 197. Springer, Cham (2017)
5. Jackson, M.O.: Social and Economic Networks. Princeton University Press (2010)
6. Moslonka-Lefebvre, M., Bonhoeffer, S., Alizon, S.: Weighting for sex acts to understand the spread of STI on networks. J. Theor. Biol. 311, 46–53 (2012)
7. Neporozhnii, P.S.: 50th anniversary of the Lenin GOELRO plan and hydropower development. Power Technol. Eng. 4(12), 1089–1093 (1970)
8. NetworkX (2017). https://networkx.github.io. Accessed 30 Sept 2017
9. OpenStreetMap (2017). http://www.openstreetmap.org. Accessed 30 Sept 2017
10. Order of the Ministry of Energy of Russia: Shema i programma razvitiya ENES na 2013–2019 gody (Scheme and development program of the UNES for 2013–2019). Order of the Ministry of Energy of Russia from 19.06.2013 No. 309 (2013)
11. Pagani, G.A., Aiello, M.: The power grid as a complex network: a survey. Physica A: Statistical Mechanics and its Applications 392(11) (2011)
12. Penrose, M.: Random Geometric Graphs. Oxford University Press, Oxford (2003)
13. Schultz, P., Heitzig, J., Kurths, J.: A random growth model for power grids and other spatially embedded infrastructure networks. Eur. Phys. J. 223(2593) (2014)
14. Services for technological connection: power distribution centers (2017). http://portaltp.fskees.ru/sections/Map/map.jsp. Accessed 30 Sept 2017
15. Soltan, S., Zussman, G.: Generation of synthetic spatially embedded power grid networks. In: Proceedings of the IEEE PES-GM'16, July 2016
16. Wang, Z., Scaglione, A., Thomas, R.J.: Generating statistically correct random topologies for testing smart grid communication and control networks. IEEE Trans. Smart Grid 1(1), 28–39 (2010)
17. Watts, D.J., Strogatz, S.: Collective dynamics of 'small-world' networks. Nature 393, 440–442 (1998)
Methods of Criteria Importance Theory and Their Software Implementation

Andrey Pavlovich Nelyubin, Vladislav Vladimirovich Podinovski and Mikhail Andreevich Potapov
Abstract The article presents a general approach to solving the multicriteria choice problem by methods of the Criteria Importance Theory. An overview of methods for comparing vector estimates by preference, using various types of information about the preferences of the decision-maker, is given. These methods are implemented in the computer system DASS.

Keywords Multicriteria analysis · Criteria importance theory · Decision support system · Graphical-analytical methods
1 Introduction

Most real decision-making problems are inherently multicriterial. The decision-maker (DM) needs to assess his/her subjective preferences as accurately as possible to choose the best final alternative. For this purpose, he/she can utilize mathematical and computer tools that provide opportunities to use complex mathematical methods for multicriteria analysis and optimization. Among other approaches to analyzing and solving multicriteria problems, the Criteria Importance Theory (CIT) developed in Russia has a number of special advantages [1–3]. In this theory, a formal definition of the relative importance of criteria is introduced, which makes it possible to correctly take into account incomplete and inaccurate information about the preferences of the DM: the relative importance of

A. P. Nelyubin (B) Mechanical Engineering Research Institute of the RAS, Moscow, Russia e-mail:
[email protected] V. V. Podinovski Higher School of Economics, National Research University, Moscow, Russia e-mail:
[email protected] M. A. Potapov Institute of Computer Aided Design of the RAS, Moscow, Russia e-mail:
[email protected] © Springer International Publishing AG, part of Springer Nature 2018 V. A. Kalyagin et al. (eds.), Computational Aspects and Applications in Large-Scale Networks, Springer Proceedings in Mathematics & Statistics 247, https://doi.org/10.1007/978-3-319-96247-4_13
189
criteria and the change in preferences along the scale of criteria. This information can be expressed both qualitatively and quantitatively, in the form of intervals of possible values of preference parameters. In this article, we consider only those methods of CIT that are implemented in the computer decision support system DASS version 2.4 [3, 4]. This system is designed to help in solving the problem of multicriteria choice among a finite set of alternatives. There are some new features in version 2.4 compared with previous versions. One of them is the implementation of conciliatory decisions [5] in the case of incomplete information on preferences. Also, some new algorithms were developed, which will be described in this article. The reported study was carried out within the framework of the State Program of the ICAD RAS during research in 2016–2018, with the financial support of the RFBR (research project No. 16-01-00404 a).
2 Methods of the Criteria Importance Theory

To describe the CIT methods, let us introduce the following mathematical model of the situation of individual decision making under certainty: M = ⟨X, K, Z, R⟩, where X is the set of decision variants (alternatives); K = (K_1, …, K_m) is the vector of m ≥ 2 individual criteria; Z is the range of values of the vector criterion K; and R is a non-strict preference relation of the DM. Criterion K_i is a function defined on X with range of values Z_i. It is assumed that all the criteria are homogeneous or have been reduced to such. This means that the criteria have a common scale and, in particular, a common range of values, which is the set of grades Z_0 = {1, …, q}, q ≥ 2. It is assumed that each criterion is preferentially independent of the others and that its larger values are preferable to smaller ones. The values of all the criteria K_i(x) of a variant x from X form the vector estimate of this variant, y = K(x) = (K_1(x), …, K_m(x)). Vector estimates from the set Z = Z_0^m can correspond to the available variants from X or be hypothetical. Comparison of the variants by preference reduces to comparing their vector estimates. To do this, the non-strict preference relation R is introduced on the set Z of vector estimates: the notation y R z means that the vector estimate y is no less preferable than z. The relation R is a (partial) quasi-order (that is, it is reflexive and transitive) and generates on Z the indifference relation I and the (strict) preference relation P: y I z ⇔ y R z ∧ z R y;
y P z ⇔ y R z ∧ ¬(z R y).
Since the DM preferences increase along the scale of the criteria Z_0, the Pareto relation R^∅ is defined on the set of vector estimates Z:
Methods of Criteria Importance Theory and Their Software Implementation
y R^∅ z ⇔ y_i ≥ z_i, i = 1, …, m.

As a rule, it is not possible to solve the multicriteria choice problem with the help of the Pareto relation alone. Therefore, it needs to be extended using additional information about the preferences of the DM. Let R_Ω be the preference relation constructed on Z using the obtained information Ω. If initially there is no such information (Ω = ∅), then R_Ω plays the role of R^∅. CIT uses information about the relative importance of criteria for the DM, which can be expressed qualitatively or quantitatively [1]. Qualitative information Ω on the importance of the criteria consists of messages of the form "Criteria K_i and K_j are equally important" (denoted by i ∼ j) and "Criterion K_i is more important than criterion K_j" (denoted by i ≻ j). Exact definitions of these concepts are given in [1]. Further in the article, we assume that the information is consistent and complete, i.e., it allows one to rank all the criteria by importance. For convenience, we number the criteria in order of nonincreasing relative importance. Positive numbers α_1, …, α_m summing to 1 are called importance coefficients of the criteria consistent with the information Ω if they satisfy the following conditions: i ∼ j ∈ Ω ⇒ α_i = α_j,
i ≻ j ∈ Ω ⇒ α_i > α_j.
Consistency of the information ensures the existence of importance coefficients. The importance coefficients consistent with the full information Ω are called ordinal [2]. The set A of feasible values of the vector α = (α_1, …, α_m) is given by the system of linear constraints (1)–(2):

α_1 + ⋯ + α_m = 1,  (1)

α_i = α_{i+1} if i ∼ i+1; α_i > α_{i+1} if i ≻ i+1; i = 1, …, m − 1.  (2)
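As an illustration (ours, not part of the paper), the Pareto relation R^∅ introduced above and the selection of nondominated vector estimates can be sketched in Python; function names are our own:

```python
# Pareto relation R0 on vector estimates: y R0 z iff y_i >= z_i for all i.

def pareto_dominates_or_equal(y, z):
    """y R0 z: vector estimate y is componentwise no worse than z."""
    return all(yi >= zi for yi, zi in zip(y, z))

def pareto_optimal(estimates):
    """Keep the vector estimates not strictly dominated by any other."""
    def strictly_dominated(y):
        return any(pareto_dominates_or_equal(z, y) and z != y for z in estimates)
    return [y for y in estimates if not strictly_dominated(y)]
```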
Quantitative information on the importance of the criteria consists of messages like "Criterion K_i is h_{ij} times more important than criterion K_j." An exact definition of this concept is given in [1, 6]. The information is said to be complete and consistent if it can be used to specify the quantitative, or cardinal, importance coefficients of the criteria α_i: positive numbers summing to 1 and possessing the property h_{ij} = α_i/α_j, i, j = 1, …, m. Quantitative information refines qualitative information. To see this, consider the degree of superiority in importance h_i of each criterion K_i over the next criterion K_{i+1}:

h_i = α_i/α_{i+1} ≥ 1, i = 1, …, m − 1.  (3)

Here, h_i = 1 if i ∼ i+1 ∈ Ω, and h_i > 1 if i ≻ i+1 ∈ Ω.
Quantitative information on the criteria importance can also be given not exactly, but with interval constraints:

1 ≤ l_i ≤ h_i ≤ r_i, i = 1, …, m − 1.  (4)

Here, l_i = r_i = 1 if i ∼ i+1 ∈ Ω, and l_i > 1 if i ≻ i+1 ∈ Ω. This interval quantitative information on the importance of the criteria is denoted by [Ω]. In addition to information on the relative importance of criteria, CIT takes into account information on the criteria scale Z_0, namely its type and the rate of growth of preferences along this scale. If we only know that the values v(k) = v_k of the grades k of the scale Z_0 increase with increasing k, then the scale is called ordinal:

v(1) < v(2) < ⋯ < v(q).  (5)

For q ≥ 3, information on the rate of growth of preferences along the criteria scale may serve as additional information. For this purpose, the value increments between adjacent grades, δ(k) = v(k+1) − v(k), k = 1, …, q − 1, are ranked by preference. With such information, the criteria scale is referred to as a first ordered metric scale [7]. In this article, we consider only the law of decreasing growth rate of preferences (information ↓):

δ(1) > δ(2) > ⋯ > δ(q − 1) > 0.  (6)
Quantitative information on the criteria scale refines the information ↓. Exact information V about the rate of growth of preferences can be specified by the ratios δ(k)/δ(k+1), k = 1, …, q − 2. In this case, the criteria scale is an interval scale. Interval information [V] can also be specified:

1 ≤ d_k ≤ δ(k)/δ(k+1) ≤ u_k, k = 1, …, q − 2.  (7)
All these types of information about the preferences of the DM and their combinations require their own methods of analyzing the problem and algorithms for comparing alternatives by preference. Within the framework of CIT, precise and efficient methods and algorithms for solving such problems have been developed; this required solving a number of linear, nonlinear, and discrete problems. For some of these problems, analytical solutions have been obtained. Here is a brief overview of the methods developed. First, consider the information Ω on ordering the criteria by importance. For the types of criteria scale (5) and (6), analytical decision rules were developed in CIT [2, 8, 9]. To formulate them, for each k = 1, …, q − 1, we introduce the vector α^k(y) = (α_1^k, α_2^k, …, α_m^k) whose elements are:

α_i^k(y) = α_i if y_i > k, and α_i^k(y) = 0 if y_i ≤ k; i = 1, …, m.  (8)
Let ψ^{(n)}(x) be a vector function that orders the components of the n-dimensional vector x in nonincreasing order: ψ_1^{(n)}(x) ≥ ψ_2^{(n)}(x) ≥ ⋯ ≥ ψ_n^{(n)}(x). The analytical decision rule specifying the relation R_Ω is:

y R_Ω z ⇔ ψ_i^{(m)}(α^k(y)) ≥ ψ_i^{(m)}(α^k(z)), i = 1, …, m; k = 1, …, q − 1.  (9)
If in (9) all the non-strict inequalities hold as equalities, then y I_Ω z, and if at least one holds as a strict inequality >, then y P_Ω z. The analytical decision rule defining the relation R_{Ω&↓} is similar in form to (9), but instead of the vector α^k(y) it uses the vector α^{[1,k]↓}(y) composed of the vectors α^1(y), α^2(y), …, α^k(y) [9]. In the case of exact quantitative information about the relative importance of criteria having an ordinal scale (5), the decision rule that defines the preference relation R can be represented as follows [6]:

y R z ⇔ B_k(y) ≥ B_k(z), k = 1, …, q − 1,  (10)
where B_k(y) = Σ_{i=1}^m α_i^k(y). If, however, only a set A of possible values of the vector α is known (for example, under constraints (1)–(4)), then y R_A z is true if and only if the inequalities (10) hold for every α ∈ A, or equivalently:

y R_A z ⇔ inf_{α∈A} (B_k(y) − B_k(z)) ≥ 0, k = 1, …, q − 1.  (11)
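A minimal sketch (ours; function names are our own) of decision rules (9) and (10), for a vector `alpha` of importance coefficients consistent with the available information (positive, summing to 1, nonincreasing) and vector estimates on the scale {1, …, q}:

```python
def alpha_k(alpha, y, k):
    # Elements (8): alpha_i if y_i > k, else 0.
    return [a if yi > k else 0.0 for a, yi in zip(alpha, y)]

def rule_9(alpha, y, z, q):
    """y R z per rule (9): for every k, the nonincreasingly sorted vector
    alpha^k(y) must dominate the sorted alpha^k(z) componentwise."""
    for k in range(1, q):
        sy = sorted(alpha_k(alpha, y, k), reverse=True)
        sz = sorted(alpha_k(alpha, z, k), reverse=True)
        if any(a < b for a, b in zip(sy, sz)):
            return False
    return True

def rule_10(alpha, y, z, q):
    """y R z for exact coefficients: B_k(y) >= B_k(z) for every k."""
    B = lambda v, k: sum(alpha_k(alpha, v, k))
    return all(B(y, k) >= B(z, k) for k in range(1, q))
```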
In the case of exact quantitative information about the relative importance of criteria having a first ordered metric scale (6), the decision rule defining the preference relation R_{&↓} can be represented as follows [8]:

y R_{&↓} z ⇔ D_k(y) ≥ D_k(z), k = 1, …, q − 1,

where D_k(y) = Σ_{t=1}^k B_t(y) = Σ_{t=1}^k Σ_{i=1}^m α_i^t(y). If only a set A of possible values of the vector α is known, then

y R_{A&↓} z ⇔ inf_{α∈A} (D_k(y) − D_k(z)) ≥ 0, k = 1, …, q − 1.  (12)
To construct decision rules that use quantitative (in particular, interval) information about the growth of preferences along the criteria scale (7), we use the additive value function:

F(y | α, v) = Σ_{i=1}^m α_i v(y_i).
Let us introduce the matrix C(y, z) with the elements

c_ik(y, z) = 1 if z_i ≤ k < y_i; −1 if y_i ≤ k < z_i; 0 otherwise; i = 1, …, m; k = 1, …, q − 1.  (13)

With its help, one can express the difference of the value functions for the compared vector estimates y and z:

G(y, z | α, v) = F(y | α, v) − F(z | α, v) = Σ_{i=1}^m α_i (v(y_i) − v(z_i)) = Σ_{i=1}^m α_i Σ_{k=1}^{q−1} c_ik(y, z) δ_k = α^T C(y, z) δ = G(y, z | α, δ).

If the values of the importance coefficients α_i and the increments δ(k) of the values of the scale grades are known exactly, then the following decision rule can be used: y P_{&V} z ⇔ G(y, z | α, δ) > 0. And if only (non-empty) sets A and Δ of possible values of the vectors α and δ are known, then

y P_{A&Δ} z ⇔ inf_{(α,δ)∈A×Δ} G(y, z | α, δ) > 0.  (14)
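The matrix C(y, z) from (13) and the value difference G(y, z | α, δ) admit a direct computation; the following sketch is ours (names `c_matrix` and `g_value` are not from the paper), with `delta` holding δ(1), …, δ(q−1):

```python
def c_matrix(y, z, q):
    # Elements (13): +1 if z_i <= k < y_i, -1 if y_i <= k < z_i, else 0.
    def c(yi, zi, k):
        if zi <= k < yi:
            return 1
        if yi <= k < zi:
            return -1
        return 0
    return [[c(yi, zi, k) for k in range(1, q)] for yi, zi in zip(y, z)]

def g_value(y, z, alpha, delta, q):
    """G(y, z | alpha, delta) = alpha^T C(y, z) delta."""
    C = c_matrix(y, z, q)
    return sum(a * sum(cik * dk for cik, dk in zip(row, delta))
               for a, row in zip(alpha, C))
```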
The decision rules (11), (12), and (14) require solving optimization problems. Even with a relatively small number of variants (several dozen), this makes their direct use difficult. For interval information about the relative importance of the criteria, it was shown in [10, 11] how to solve these optimization problems with the help of recurrence formulas. The analytical decision rule specifying the relation R_{[Ω]} is:

y R_{[Ω]} z ⇔ γ_k^* ≥ 0, k = 1, …, q − 1.  (15)
Here, the quantities γ_k^* are calculated consecutively for each k according to the recurrence formulas:

γ_{1k} = l_1 c_{1k} if c_{1k} ≥ 0, and γ_{1k} = r_1 c_{1k} if c_{1k} < 0;

γ_{ik} = l_i (c_{ik} + γ_{i−1,k}) if c_{ik} + γ_{i−1,k} ≥ 0, and γ_{ik} = r_i (c_{ik} + γ_{i−1,k}) if c_{ik} + γ_{i−1,k} < 0, i = 2, …, m − 1;  (16)
γ_k^* = c_{mk} + γ_{m−1,k}.

The decision rules for y R_{[Ω]&↓} z and y R_{[Ω]&V} z are similar in form to (15)–(16), but instead of the numbers c_ik(y, z) (introduced in (13)) they use the numbers d_ik(y, z) = Σ_{t=1}^k c_it(y, z) and e_i = Σ_{k=1}^{q−1} c_ik(y, z) δ_k, respectively. When interval information [Ω] on the importance of the criteria is combined with interval information [V] on the scale, the decision rule (14) requires solving a bilinear programming problem. An algorithm for this problem was proposed in [12]; it uses formulas for the extreme points of the sets A and Δ given by constraints (1)–(4) and (5)–(7), respectively.
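The recurrence (16) and rule (15) can be sketched directly (our illustration, not the authors' code); `C` is assumed to hold the numbers c_ik(y, z) as an m × (q−1) matrix, and `l`, `r` the interval bounds from (4):

```python
def gamma_star(c, l, r):
    """Recurrence (16) for one fixed k: c = [c_1k, ..., c_mk]."""
    m = len(c)
    # Base case: multiply a nonnegative term by the lower bound l_1
    # (to minimize), a negative term by the upper bound r_1.
    g = l[0] * c[0] if c[0] >= 0 else r[0] * c[0]
    for i in range(1, m - 1):
        s = c[i] + g
        g = l[i] * s if s >= 0 else r[i] * s
    return c[m - 1] + g

def r_interval(C, l, r):
    """y R z per rule (15): gamma*_k >= 0 for every column k of C(y, z)."""
    m, q1 = len(C), len(C[0])
    return all(gamma_star([C[i][k] for i in range(m)], l, r) >= 0
               for k in range(q1))
```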
3 Software Implementation of the Methods

To solve multicriteria choice problems using the CIT methods considered, the authors have developed the computer system DASS, which is freely available at http://www.mcodm.ru/soft/dass. In this system, the solution of the choice problem is organized as an iterative process, during which the DM gradually specifies information about his/her preferences in interactive mode [3, 4]. At each step of this process, the system computes the results of comparisons of alternatives, based on the available information on the DM preferences and using the methods described above. First, the DM enters basic information about the problem: the set of criteria, the decision variants, and their evaluations by each of the criteria. At this stage, the Pareto relation R^∅ is constructed on the set of vector estimates of the variants. The nondominated (Pareto-optimal) variants are singled out, among which the choice is to be made taking into account the preferences of the DM. The DM does not need to provide exact information about his/her preferences immediately. First, with the help of special elicitation methods, qualitative information on the importance of the criteria [1] is inquired. Using the decision rule (9), the system tries to compare the Pareto-optimal variants among themselves. As a result, some of these variants turn out to be dominated, and the number of nondominated variants decreases. To further narrow the set of nondominated variants, one can begin to inquire, using special methods, quantitative information about the relative importance of criteria in the form of intervals (4), consistent with the qualitative information [1]. Alternatively, one can begin to refine the type of the criteria scale Z_0 in the form (6) or (7). Using the appropriate decision rules, the system will again try to compare by preference the nondominated variants obtained in the previous step.
This continues until a single nondominated variant is obtained. In practice, for this purpose it is not necessary to elicit exact values of the importance coefficients α_i of the criteria or exact values v(k) of the grades of the criteria scale Z_0. The preference relation R at each step of the iterative process of solving the choice problem can be represented as an incomplete oriented graph. The vertices of the graph are variants or their vector estimates. The arcs represent the preference relation R.
Variants incomparable by the relation R are not connected in the graph. In addition, for analyzing the solution it is convenient to arrange the vertices of the graph in the plane so that all the arcs have a common direction, for example, downwards. Then, the nondominated variants appear on top. Such a representation of intermediate results in the form of an oriented graph belongs to the graphical-analytical methods for solving the choice problem. A graphic representation promotes comprehensive perception of a large amount of complex information. The visual representation of the set of variants makes it possible to reveal the relationships existing between them. When the DM sees this picture of the relationships as a whole, he/she can best plan the further course of solving the problem. As a result, visualization tools improve the quality of decision making.
References 1. Podinovski, V.V.: Introduction into the Theory of Criteria Importance in Multi-Criteria Problems of Decision Making. Fizmatlit, Moscow (2007) (in Russian) 2. Podinovski, V.V.: Criteria importance coefficients in decision making problems. Ordinal importance coefficients. Autom. Remote Control 10, 130–141 (1978) (in Russian) 3. Podinovski, V.V., Potapov, M.A.: Criteria importance in multicriteria decision making problems: theory, methods, soft and applications. Otkrytoe obrazovanie, No. 2, 55–61 (2012) (in Russian) 4. Podinovski, V.V.: Analysis of multicriteria choice problems by methods of the theory of criteria importance, based on computer systems of decision-making support. J. Comput. Syst. Sci. Int. 47(2), 221–225 (2008) 5. Nelyubin, A.P., Podinovski, V.V.: Multicriteria choice based on criteria importance methods with uncertain preference information. Comput. Math. Math. Phys. 57(9), 1475–1483 (2017) 6. Podinovski, V.V.: The quantitative importance of criteria for MCDA. J. Multi-Criteria Decis. Anal. 11, 1–15 (2002) 7. Fishburn, P.C.: Decision and Value Theory. Wiley, New York (1964) 8. Podinovski, V.V.: On the use of importance information in MCDA problems with criteria measured on the first ordered metric scale. J. Multi-Criteria Decis. Anal. 15, 163–174 (2009) 9. Nelyubin, A.P., Podinovski, V.V.: Analytical decision rules using importance-ordered criteria with a scale of the first ordinal metric. Autom. Remote Control 73(5), 831–840 (2012) 10. Podinovski, V.V.: Interval information on criteria importance in multi-criteria decision making analysis. Nauchno-Tech. Inf. Ser. 6(2), 15–18 (2007) 11. Nelyubin, A.P., Podinovski, V.V.: Optimization methods in multi-criteria decision making analysis with interval information on the importance of criteria and values of scale gradations. Autom. Doc. Math. Linguist. 45(4), 202–210 (2011) 12. Nelyubin, A.P., Podinovski, V.V.: Bilinear optimization in the analysis of multicriteria problems using criteria importance theory under inexact information about preferences. Comput. Math. Math. Phys. 51(5), 751–761 (2011)
A Model of Optimal Network Structure for Decentralized Nearest Neighbor Search Alexander Ponomarenko, Irina Utkina and Mikhail Batsyn
Abstract One of the approaches to the nearest neighbor search problem is to build a network whose nodes correspond to the given set of indexed objects. In this case, the search for the closest object can be thought of as the search for a node in the network. A procedure in a network is called decentralized if it uses only local information about the visited nodes and their neighbors. Networks whose structure allows a decentralized search procedure started from any node to perform the nearest neighbor search efficiently are of particular interest, especially for purely distributed systems. Several algorithms that construct such networks have been proposed in the literature. However, the following questions arise: "Are there network models in which decentralized search can be performed faster?"; "What are the optimal networks for decentralized search?"; "What are their properties?" In this paper, we give partial answers to these questions. We propose a mathematical programming model for the problem of determining an optimal network structure for decentralized nearest neighbor search. We have found an exact solution for a regular lattice of size 4 × 4 and heuristic solutions for sizes from 5 × 5 to 7 × 7. As a distance function, we use the L_1, L_2, and L_∞ metrics. We hope that our results and the proposed model will initiate the study of optimal network structures for decentralized nearest neighbor search. Keywords Nearest neighbour search · Optimisation model · Network decentralised search
A. Ponomarenko (B) · I. Utkina · M. Batsyn Higher School of Economics, National Research University, Moscow, Russia e-mail:
[email protected];
[email protected] I. Utkina e-mail:
[email protected] M. Batsyn e-mail:
[email protected] © Springer International Publishing AG, part of Springer Nature 2018 V. A. Kalyagin et al. (eds.), Computational Aspects and Applications in Large-Scale Networks, Springer Proceedings in Mathematics & Statistics 247, https://doi.org/10.1007/978-3-319-96247-4_14
A. Ponomarenko et al.
1 Introduction

The nearest neighbor search (NNS) problem appears in many fields of computer science. The problem of building a data structure for nearest neighbor search is formulated as follows. Let D be a domain and d : D × D → [0, +∞) be a distance function. One needs to preprocess a finite set X ⊆ D so that the search for the object in X closest to any given query q ∈ D is as fast as possible. A huge number of methods have been proposed. Of particular interest is the case when the nearest neighbor search should run in a distributed environment without any central coordination point. For this case, a natural approach to organizing the nearest neighbor search is to build a network whose nodes correspond to the given set X. The search for the closest object can then be thought of as the search for a node in the network. Moreover, a distributed environment, especially in the p2p case, requires that all procedures involved in the search or indexing processes be decentralized. This means that all procedures have only local information about the visited nodes and their neighbors and do not have access to information about the whole structure of the network. As a rule, such an approach implies searching via a greedy walk algorithm [1–4] or its modifications [5, 6]. Thus, many p2p systems, including DHT protocols [7–9], use the same search algorithm but employ different distance functions and have different network structures. In the present paper, we address the problem of optimal network structure for NNS. We emphasize that for any fixed input set, there exists an optimal network structure with respect to the chosen search algorithm. To study the properties of such networks, we present a Boolean nonlinear programming model of optimal network structure. The objective is to minimize the expected number of distance computations made by the greedy walk algorithm to find the nearest neighbor for an arbitrary query starting from an arbitrary node.
As a first step, we solve this problem for the case when the input set X corresponds to the set of nodes of a two-dimensional regular lattice. We have found an exact solution for size 4 × 4 and heuristic solutions for sizes from 5 × 5 to 7 × 7. As a distance function, we use the L_1, L_2, and L_∞ metrics.
2 Mathematical Formulation

We consider a network as a graph G(V, E) with vertex set V = X = {1, …, n} and edge set E ⊂ V × V. Let d(i, q) be the distance between vertex i and query q. The neighborhood of vertex i is defined as N(i) = {j ∈ V : (i, j) ∈ E}. We denote by f_q the probability function for a query in a discrete domain and by f(q) the probability density function for a query in a continuous domain.
A Model of Optimal Network Structure for Decentralized …
2.1 Decentralized Search Algorithm: Greedy Walk

The goal of the search algorithm is to find the vertex in the graph G closest to the query (the target vertex), going from one vertex to another through the set of edges E of G. The search is based on information related to the vertices. During the search process, the algorithm can calculate the distance between the query and the vertices that it knows. The greedy walk algorithm works as follows.
Starting from vertex s, the algorithm calculates the value of the distance function d(y, q) between query q and every neighbor y of s. After that, the algorithm is recursively called for the vertex c closest to q. The algorithm stops at the vertex whose neighborhood contains no vertices closer to the query than the vertex itself. The greedy walk algorithm can also be considered as a process of routing a search message in a network. At each step, the node (vertex) which has received the message (the message holder) passes it to the neighbor closest to the query according to the function d.
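A minimal Python sketch (ours, not the authors' code) of the greedy walk just described; `graph` maps each vertex to its neighborhood N(v), and `d` is the distance function:

```python
def greedy_walk(graph, d, start, q):
    """Follow the neighbor closest to the query q until no neighbor
    improves on the current vertex; return the final vertex."""
    current = start
    while True:
        best = min(graph[current], key=lambda v: d(v, q))
        if d(best, q) >= d(current, q):
            return current  # local minimum with respect to q: stop here
        current = best
```

For instance, on a path graph with d(v, q) = |v − q|, the walk moves monotonically toward the target.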
2.2 Mathematical Programming Model

By no means do all graphs have a structure suitable for searching via greedy walk. In our model, we require that the structure of the graph G be such that a greedy walk for any target vertex, started from an arbitrary vertex, reaches the target. In general, this requires the graph to contain the Delaunay graph as a subgraph. Similar to the Kleinberg model [1], in this paper we consider the particular case when the vertices are nodes of a regular lattice with integer coordinates. In this case, the Delaunay graph is just the set of edges of the regular lattice. The complexity of the search algorithm is measured as the number of distinct vertices for which the distance to the query has been calculated. We take this number as the objective function. Equations (1)–(9) define the Boolean nonlinear programming formulation for the optimal graph structure. Decision variables:

x_ij = 1 if edge (i, j) belongs to the solution, and x_ij = 0 otherwise.  (1)
y_{ij}^k = 1 if vertex k belongs to the greedy walk from i to j, and y_{ij}^k = 0 otherwise.  (2)

Objective function:

min (1/n) Σ_{i=1}^n Σ_{q∈D} O(i, j_q) f_q  (discrete domain),  (3a)

min (1/n) Σ_{i=1}^n ∫_D O(i, j_q) f(q) dq  (continuous domain),  (3b)

where

j_q = arg min_{j=1,…,n} d(j, q),  (4)

O(i, j_q) = |{l ∈ V : ∃k  x_lk = 1 and y_{i j_q}^k = 1}|.  (5)

Constraints:

x_ii = 0  ∀i ∈ V,  (6)

y_{i j_q}^i = y_{i j_q}^{j_q} = 1  ∀i, j_q ∈ V,  (7)

Σ_{k=1}^n x_lk y_{i j_q}^k ≥ y_{i j_q}^l  ∀i, j_q, l ∈ V,  (8)

l* = arg min_{l∈V : x_kl=1} d(l, q) ⇒ y_{i j_q}^{l*} ≥ y_{i j_q}^k  ∀q ∈ D, ∀i, k ∈ V.  (9)
The decision variables x_ij (1) determine the adjacency matrix of the optimal graph which we want to find. The indicator variables y_{ij}^k (2) are used to calculate the number of operations O(i, j_q) performed during the search from vertex i to vertex j_q, which is the vertex closest to the query q (the target vertex) (4). In our case, it is the number of distinct vertices for which the distance to the query has been calculated. This equals the cardinality of the union of the neighborhoods of the vertices k for which y_{i j_q}^k = 1 (5). Since we want to find the optimal graph in the general case (for any starting vertex and any query), our objective is to minimize the average number of operations required for the search algorithm to reach a target vertex (3a, 3b, 4, 5). Constraint (6) guarantees that there are no loops in the graph, and constraint (7) requires the greedy walk (i, j) to start at vertex i and stop at vertex j. Constraint (8) links the variables x_ij and y_{ij}^k and requires that the greedy walk go through one of the neighbors of a vertex l if it goes through l itself. Constraint (9) describes the greedy strategy of the greedy walk algorithm: if vertex k belongs to the greedy walk from vertex i to
Fig. 1 Exact solutions found by our branch-and-bound algorithm for the regular lattice 4 × 4: (a) L_2, f ≈ 7.093; (b) L_1, f ≈ 7.039; (c) L_∞, f ≈ 7.203
vertex j_q (y_{i j_q}^k = 1), then its neighbor l* closest to the query q among all its neighbors l should also belong to this greedy walk (y_{i j_q}^{l*} = 1). The presented model is applicable to an arbitrary metric space. In the next section, we present results for the particular case when the vertices are the nodes of a two-dimensional regular lattice and the distance function is L_1, L_2, or L_∞.
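Under the setting of the experiments below (every vertex equally likely to be the target), the objective (3a) for a candidate adjacency structure can be estimated by direct simulation. The following sketch is ours (function names are not from the paper): per (5), the cost of one search is the number of distinct vertices in the union of the neighborhoods of the walk vertices.

```python
from itertools import product

def search_cost(adj, d, start, target):
    """Run a greedy walk from start toward target and return O(i, j):
    the size of the union of neighborhoods of the walk vertices (5)."""
    walk, cur = [start], start
    while True:
        nxt = min(adj[cur], key=lambda v: d(v, target))
        if d(nxt, target) >= d(cur, target):
            break
        cur = nxt
        walk.append(cur)
    return len(set().union(*(adj[k] for k in walk)))

def average_cost(adj, d):
    """Objective (3a) with uniform targets: average cost over all
    (start, target) pairs."""
    n = len(adj)
    return sum(search_cost(adj, d, i, j)
               for i, j in product(range(n), repeat=2)) / n ** 2
```

A branch-and-bound or heuristic search over edge sets would then minimize `average_cost` subject to the lattice edges being present.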
3 Computational Experiments and Results

In this work, we suppose that the input set corresponds to the nodes of a two-dimensional regular lattice and that the domain is such that all nodes have the same probability of being the nearest neighbor for a query. In this case, the nearest neighbor search can be thought of as a node discovery procedure: we need to find a given node in the network. Obviously, we could find the optimal graph structure by checking all possible configurations of the set of edges. However, the number of such configurations grows as 2^{n(n−1)/2}. To find an exact solution, we have implemented a branch-and-bound algorithm. The exact solutions found by this algorithm for the regular lattice 4 × 4 are presented in Fig. 1. The solutions found by our heuristic are presented in Figs. 2, 3, and 4.
4 Conclusion and Future Work We have proposed a Boolean nonlinear programming model to determine an optimal graph structure, which minimizes the complexity of the nearest neighbor search by the greedy walk algorithm. We have found an exact solution for a regular lattice of size 4 × 4, and presented the results found by our heuristic for sizes from 5 × 5 to 7 × 7 with the three most popular distances: L 1 , L 2 , and L ∞ .
Fig. 2 Solutions found by our heuristic for the regular lattice 5 × 5: (a) L_2, f ≈ 8.974; (b) L_1, f = 8.784; (c) L_∞, f ≈ 9.036

Fig. 3 Solutions found by our heuristic for the regular lattice 6 × 6: (a) L_2, f ≈ 10.509; (b) L_1, f ≈ 10.485; (c) L_∞, f ≈ 10.756

Fig. 4 Solutions found by our heuristic for the regular lattice 7 × 7: (a) L_1, f ≈ 12.054; (b) L_2, f ≈ 12.136; (c) L_∞, f ≈ 12.328
However, we realize that the most important characteristic to be studied is the asymptotic behavior of the objective function. Therefore, our future work will be focused on improving the efficiency of our exact and heuristic algorithms. We also plan to develop models describing optimal network structures for
approximate nearest neighbor search. We hope that this work will draw attention to the study of graph structures optimal for decentralized nearest neighbor search. Acknowledgements This research is conducted in LATNA Laboratory, National Research University Higher School of Economics and supported by RSF grant 14-41-00039.
References 1. Kleinberg, J.: The small-world phenomenon: an algorithmic perspective. In: Proceedings of the Thirty-Second Annual ACM Symposium on Theory of Computing, pp. 163–170. ACM, May 2000 2. Beaumont, O., Kermarrec, A.-M., Marchal, L., Riviere, E.: VoroNet: a scalable object network based on Voronoi tessellations. In: Parallel and Distributed Processing Symposium, 2007. IPDPS 2007. IEEE International, pp. 1–10. IEEE (2007) 3. Malkov, Y., Ponomarenko, A., Logvinov, A., Krylov, V.: Scalable distributed algorithm for approximate nearest neighbor search problem in high dimensional general metric spaces. In: Similarity Search and Applications, pp. 132–147. Springer, Berlin, Heidelberg (2012) 4. Beaumont, O., Kermarrec, A.M., Rivière, É.: Peer to peer multidimensional overlays: approximating complex structures. In: Principles of Distributed Systems, pp. 315–328. Springer, Berlin, Heidelberg (2007) 5. Malkov, Y., Ponomarenko, A., Logvinov, A., Krylov, V.: Approximate nearest neighbor algorithm based on navigable small world graphs. Inf. Syst. 45, 61–68 (2014) 6. Ruiz, G., Chávez, E., Graff, M., Téllez, E.S.: Finding near neighbors through local search. In: Similarity Search and Applications, pp. 103–109. Springer International Publishing (2015) 7. Maymounkov, P., Mazières, D.: Kademlia: a peer-to-peer information system based on the XOR metric. In: Peer-to-Peer Systems, pp. 53–65. Springer, Berlin, Heidelberg (2002) 8. Stoica, I., Morris, R., Karger, D., Kaashoek, M.F., Balakrishnan, H.: Chord: a scalable peer-to-peer lookup service for internet applications. ACM SIGCOMM Comput. Commun. Rev. 31(4), 149–160 (2001) 9. Rowstron, A., Druschel, P.: Pastry: scalable, decentralized object location, and routing for large-scale peer-to-peer systems. In: Middleware 2001, pp. 329–350. Springer, Berlin, Heidelberg, Nov 2001
Computational Study of Activation Dynamics on Networks of Arbitrary Structure Alexander Semenov, Dmitry Gorbatenko and Stepan Kochemazov
Abstract In this paper, we present results on describing and modeling dynamical properties of collective systems. In particular, we consider the problems of activating and deactivating collectives, represented by networks, by placing special agents called activators and deactivators in a network. Such problems are combinatorial, and to solve them we employ algorithms for the Boolean satisfiability problem (SAT). We describe a general technique for reducing problems of the considered class to SAT. The paper presents a novel approach to the analysis of problems related to computer security. In particular, we propose to study the development and blocking of attacks on computer networks as activation/deactivation processes. We give a number of theoretical properties of the corresponding discrete dynamical systems. For the problems of blocking attacks on computer networks, the corresponding reduction to SAT was implemented and tested. Using state-of-the-art SAT solvers, it is currently possible to solve such problems for networks with 200 vertices. Keywords Networks · Discrete dynamical systems · Boolean satisfiability problem (SAT) · Vulnerabilities and attacks on computer networks
A. Semenov (B) · D. Gorbatenko · S. Kochemazov Matrosov Institute for System Dynamics and Control Theory SB RAS, Irkutsk, Russia e-mail:
[email protected] D. Gorbatenko e-mail:
[email protected] S. Kochemazov e-mail:
[email protected] © Springer International Publishing AG, part of Springer Nature 2018 V. A. Kalyagin et al. (eds.), Computational Aspects and Applications in Large-Scale Networks, Springer Proceedings in Mathematics & Statistics 247, https://doi.org/10.1007/978-3-319-96247-4_15
A. Semenov et al.
1 Introduction

It can be said with confidence that the study of networks is at present one of the most active areas of science. Suffice it to say that the citation counts in Google Scholar of several key papers on the properties of networks number in the tens of thousands (see, for example, [1, 2]). Such interest in networks is explained by the fact that they are very expressive when viewed as models of collective systems. An additional impulse in that direction is supplied by social networks becoming a part of everyday life. Since one can naturally move from a social network (of relatively small size) to its abstract model, there appears a symbiosis between the abstract and real parts which is very productive for science: theoretical properties that arise when working with the abstract model can be confirmed or rejected by observing a real network. On the other hand, a new property discovered for a model can have a useful interpretation in the context of a real network. It is possible to informally outline two main parts in the modern study of networks. The first part studies the properties of networks which can be characterized as qualitative. By these, we mean the various consistent patterns observed in networks when postulating some general principles of their genesis. For example, one can consider characteristics of randomly generated networks: generate random graphs using one of the known models and track their characteristics, such as the small-world effect, the clustering coefficient, various types of centrality, etc. Another large part is formed by the study of processes of information dissemination in networks. Note that the division implied here is rather loose, because the two mentioned areas are intertwined. For example, the small-world effect can be viewed as a property of networks of some type (say, Watts–Strogatz networks [2]) to provide fast transmission of information between any two nodes in a network.
The present paper belongs to the second area, because hereinafter we consider a relatively common type of information dissemination processes in networks. In particular, we discuss activation processes, where an arbitrary network node (to which we refer as an agent) at each particular time moment can be in one of several states, which can be viewed as grades of activity. In the simplest situation, there are only two grades: 0 (inactive) and 1 (active). A particular combination of active/inactive states of agents at some time moment directly influences this combination at the next time moment. Hereinafter, time is assumed to be discrete. Therefore, a network is viewed as a Discrete Dynamical System (DDS). Note that the DDSs considered below are often called DDSs of automaton type or discrete automata [3]. In accordance with [4], we associate with processes of network activation combinatorial problems on distributing activating/deactivating agents (activators/deactivators, respectively). To solve problems of this kind, one can use state-of-the-art combinatorial algorithms which demonstrate high effectiveness on large classes of the so-called industrial tests. In particular, in [4] the algorithms for solving the Boolean Satisfiability problem (SAT) [5] were applied for this purpose. The main novelty of the present paper consists in the models of processes related to computer
Computational Study of Activation Dynamics on Networks of Arbitrary Structure
security. In particular, we consider models of attacks in computer networks, where the attacks are viewed as network activation processes. Let us give a brief outline of the paper. In the next section, we introduce basic notation and give known results, which are used as a basis for further constructions. In Sect. 3, we formulate the problems of activation dynamics of collectives: we consider the problems of finding dispositions of activating/deactivating agents in a network that result in this network's soonest activation/deactivation. The fourth section is the main section of the paper. In this section, we consider the processes of development and blocking of attacks on computer networks from the point of view of activation dynamics. Here, we give theoretical results related to the type of State Transition Graphs for the corresponding discrete dynamical systems. We also propose an effective algorithm that, for an arbitrary computer network, constructs a graph representing all possible attacks within this network. In the same section, we briefly describe the procedure for reducing to SAT the problem of blocking attacks by patching some of the vulnerabilities in a computer network. This problem is viewed as a special case of a deactivation problem: to use an advantageous disposition of patches in order to block an activation process corresponding to some attack. In the conclusion, we briefly summarize the achieved results.
2 Preliminaries

Hereinafter, by a network we mean an arbitrary graph (usually directed and sometimes labeled), the vertices of which interpret the members of some collective (agents), and the edges or arcs interpret binary relations over the set of agents (such as "friendship" or "influence"). Consider a simple (i.e., without loops and multiple arcs) directed graph G = (V, A), called a network. We assume that the network structure does not change with time. Here, V is a set of vertices (agents), |V| = n, and A is a set of arcs. An arbitrary arc (w, v) ∈ A interprets the relation "agent w influences agent v". For an arbitrary v ∈ V, we define its neighborhood Vv as follows: Vv = {w ∈ V, w ≠ v | (w, v) ∈ A}. Introduce a discrete time t ∈ {0, 1, 2, . . .}. At each time moment t, with an arbitrary v ∈ V we associate an element ωv(t) from some finite set Ωv (in the general case, an arbitrary symbol, such as, e.g., 0 or 1). Let v ∈ V be an arbitrary network vertex and Vv = {v1, . . . , vl} be its neighborhood. Let us define the rule fv, in accordance with which at time moment t + 1 we associate with the elements ωv1(t), . . . , ωvl(t) an element ωv(t + 1), s.t.

ωv(t + 1) = fv(ωv1(t), . . . , ωvl(t)).    (1)
The mapping fv : Ωv1 × · · · × Ωvl → Ωv is called the weight function of vertex v. By defining the weight functions of all network vertices, we define the following function:

FG : Ω → Ω,    (2)

where Ω = Ωv1 × · · · × Ωvn. The values of FG, considered at time moments t, are called network states (we refer to the state at time moment t as WG(t)). If for each v ∈ V it holds that Ωv = {0, 1}, then the described network with weight functions specified by (1) is called a Synchronous Boolean Network (SBN) or a Kauffman network [6, 7]. In an SBN, the weight functions are Boolean functions, which can be specified by truth tables or formulas. Although in the original paper by S. Kauffman [7] SBNs were used to model the behavior of gene networks, it is possible to extend a number of notions and ideas from this paper to arbitrary collectives. In particular, we will represent the function (2) in the form of a special graph called the State Transition Graph (STG). The vertices of the STG interpret network states, i.e., n-tuples from Ω. The arcs interpret transitions between states at moments t and t + 1. Thus, when G is an SBN, the STG contains 2^n vertices, each associated with a binary word of size n representing a certain set of values of all weight functions of SBN G. Because the number of network states is always finite, whatever the initial network state (i.e., the state at t = 0), there exist natural numbers L and M, 0 ≤ L < M, such that in the sequence of states WG(0), WG(1), . . . , WG(M) it holds that WG(L) = WG(M). Denote the smallest L and M that have this property by L∗ and M∗, respectively. Then, the sequence WG(L∗), . . . , WG(M∗) is called a cycle (or attractor) of length M∗ − L∗. A cycle of length 1 is called a stationary state or a fixed point of the mapping FG.
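The notions of STG, attractor, and fixed point can be illustrated on a toy SBN. The three weight functions below are assumptions chosen purely for the example; they are not taken from the paper.

```python
from itertools import product

# A toy synchronous Boolean network (SBN) on 3 vertices; the weight
# functions below are illustrative assumptions.
def step(state):
    a, b, c = state
    return (b & c, a | c, a ^ b)  # next state of each vertex

# Build the State Transition Graph (STG): one outgoing arc per state.
stg = {s: step(s) for s in product((0, 1), repeat=3)}

# Every trajectory eventually enters a cycle (attractor): follow each
# state until a repeat occurs and record the cycle that is reached.
def attractor(state):
    seen = []
    while state not in seen:
        seen.append(state)
        state = stg[state]
    i = seen.index(state)
    return tuple(seen[i:])  # the cycle, as a tuple of states

attractors = {attractor(s) for s in stg}
fixed_points = [a for a in attractors if len(a) == 1]
```

For instance, the all-zero state is a fixed point of this particular toy network, since every weight function evaluates to 0 on a zero input.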
Since in the general case the number of vertices in the STG is exponential in the number of vertices of network G, the problems of finding stationary states and cycles of mappings of the kind (2) are combinatorial. When studying particular networks, one can apply to such problems various combinatorial algorithms which are effective in practice. In a number of papers, algorithms for solving the Boolean Satisfiability problem (SAT) were used for this purpose [3, 4, 6]. Below, we briefly touch on the question of reducing the problems of finding cycles in State Transition Graphs to SAT. Let U = {u1, . . . , ul} be a set of variables, each taking a value from a set of two elements. For convenience, we denote this set by {0, 1}. The variables from U in that case are called Boolean. Any mapping of the kind μ : U → {0, 1} is called an assignment of variables from U. Let us denote by {0, 1}^n the set of all possible binary words of length n. We call an arbitrary total mapping of the kind f : {0, 1}^n → {0, 1} a Boolean function of arity n. Also in this case, we use the notation f(x1, . . . , xn) and refer to the Boolean function f as a function with n
arguments. Boolean functions can be specified in different ways. First, one can use tables for this purpose (they are sometimes referred to as truth tables). Second, any Boolean function can be specified using special expressions called Boolean formulas. A Boolean formula of n arguments is a word constructed according to special rules over an alphabet that, apart from variables x1, . . . , xn, includes special symbols called logical connectives. Each logical connective specifies a Boolean function of arity 1 or 2. The corresponding Boolean functions are called elementary. As mentioned above, for any n ≥ 1 any Boolean function can be expressed using a Boolean formula with Boolean variables from a set X = {x1, . . . , xn} and logical connectives. Such a Boolean function can be viewed as a superposition of elementary Boolean functions. Some logical connectives have an explicit technical interpretation [8] and can be viewed as nodes of an electrical circuit. An important fact from the basics of the theory of Boolean functions is that any Boolean function f : {0, 1}^n → {0, 1} can be represented by a directed acyclic labeled graph Sf called a Boolean circuit. Among the vertices of such a graph, the vertices without parents are explicitly outlined. These vertices are called the input nodes of a circuit. Input nodes are labeled with variables from the set X = {x1, . . . , xn}. All the other nodes of Sf are called gates. Each gate is labeled with some logical connective. A gate without children is called a circuit output. The set B of different logical connectives which is used to label the gates of circuit Sf is called its basis. It is also said that the function f is realized by a Boolean circuit over basis B. Any basis over which it is possible to realize any Boolean function (for any n) is called complete. One of the most commonly used complete bases is B = {∧, ∨, ¬}. Here, ∧ is a logical connective called conjunction, ∨ is disjunction, and ¬ is negation.
One can refer to Boolean formulas, where the elements of some (usually complete) basis act in the role of logical connectives, as formulas over basis B. It is well known that any Boolean function can be expressed by a formula over basis B = {∧, ∨, ¬} called a Conjunctive Normal Form (CNF). To give a formal definition of CNF, we need to borrow several notions from mathematical logic. Any formula of the kind u or ¬u, where u is a Boolean variable, is called a literal. An arbitrary disjunction of different literals is called a clause. A CNF is an arbitrary conjunction of different clauses. For example, the formula (u1 ∨ ¬u3) ∧ (¬u1 ∨ ¬u2 ∨ u3) ∧ (u2 ∨ ¬u3) ∧ (¬u1 ∨ ¬u2 ∨ ¬u3) is a CNF over the set of Boolean variables U = {u1, u2, u3}. Let C be an arbitrary CNF over a set of variables U, and let fC be the Boolean function specified by formula C. A CNF C is called satisfiable if there exists an assignment μ of variables from U such that fC(μ) = 1. In that case, μ is called a satisfying assignment for C. If C has no satisfying assignments, then it is called unsatisfiable. It is also possible to represent any Boolean function as a Disjunctive Normal Form (DNF), which is a disjunction of conjunctions of literals. However, traditionally, algorithms for solving the Boolean satisfiability problem work with CNFs. The Boolean Satisfiability Problem, or SAT for short, consists in the following: for an arbitrary CNF C, answer the question "Is it true that C is satisfiable?". SAT is historically the first NP-complete problem [9, 10]. In a wider sense, by SAT one usually means not only the decision problem but also the corresponding search problem: if C
is satisfiable, to find at least one of its satisfying assignments. Despite its intractability in the structural sense, in numerous special cases SAT can be effectively solved using combinatorial algorithms that employ various heuristics to reduce the search space. The performance of the corresponding algorithms has increased dramatically over the last 15 years. Hereinafter, we refer to functions that transform binary words into binary words as discrete functions. Let {0, 1}∗ be the set of all binary words of arbitrary finite length. Let F be a total discrete function, F : {0, 1}∗ → {0, 1}∗, and assume that F is specified by some algorithm (program) M. This program naturally specifies a countable family of functions of the kind:

Fn : {0, 1}^n → {0, 1}^m, n ∈ N.    (3)
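For the small example CNF given above, satisfiability can be checked directly. The following sketch enumerates all assignments in place of a real SAT solver; the encoding of a literal as a (variable index, polarity) pair is our own illustrative convention.

```python
from itertools import product

# The example CNF from the text over U = {u1, u2, u3}; a clause is a list
# of literals, each literal being (variable index, polarity).
cnf = [[(1, True), (3, False)],
       [(1, False), (2, False), (3, True)],
       [(2, True), (3, False)],
       [(1, False), (2, False), (3, False)]]

def satisfies(assignment, cnf):
    # A clause is satisfied when at least one of its literals is true;
    # a positive literal (pol=True) is true when its variable is True.
    return all(any(assignment[v] == pol for v, pol in clause)
               for clause in cnf)

# Exhaustive search stands in for a SAT solver on this tiny formula.
models = [dict(zip((1, 2, 3), bits))
          for bits in product((False, True), repeat=3)
          if satisfies(dict(zip((1, 2, 3), bits)), cnf)]
```

The all-false assignment, for example, satisfies every clause of this CNF, so the formula is satisfiable.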
For a fixed n, by the inversion problem for discrete function Fn we mean the following problem: for an arbitrary γ ∈ Range Fn, find α ∈ {0, 1}^n such that Fn(α) = γ. Note that an arbitrary function of the kind (3) can be viewed as a set of m Boolean functions of arity n. Therefore, it is possible to specify (3) using a Boolean circuit SFn over an arbitrary complete basis. The circuit SFn has n inputs and m outputs. There is a special technique called Tseitin transformations [11] which puts in correspondence to an arbitrary circuit SFn a CNF CFn over a set of Boolean variables U, X ⊆ U. Here, X = {x1, . . . , xn} are the variables associated with the input nodes of circuit SFn, and U \ X are the variables associated with all the gates of SFn. The time complexity of the procedure that constructs CFn from SFn is linear in the total number of gates in circuit SFn. Let us pick in U a subset Y of variables associated with the output nodes of SFn. Let γ ∈ {0, 1}^m be an arbitrary assignment of variables from Y. Let us assign the values from γ to the corresponding variables from Y and perform all possible elementary transformations with the obtained formula. Denote the resulting CNF by CFn(γ). From the properties of Tseitin transformations, it follows that if γ ∈ Range Fn, then CFn(γ) is satisfiable, and from each of its satisfying assignments it is possible to extract an assignment α of variables from X such that Fn(α) = γ. Thus, the problem of inversion for an arbitrary function of the kind (3) is reduced to SAT. The transition from the inversion problem for (3) to SAT for CNF CFn(γ) makes it possible to find preimages of γ using state-of-the-art SAT solving algorithms. Sometimes, it is productive to associate with a considered original problem some discrete function (3) and use the inversion problem for this function as an intermediate stage of solving the original problem. As an example, consider the problem of finding stationary points of a mapping (2) for one particular case.
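The Tseitin transformations can be sketched on a toy circuit. The gate set {AND, OR, NOT}, the DIMACS-style convention of signed integers for literals, and the example circuit itself are assumptions made for illustration; real encoders handle arbitrary bases and add an output constraint.

```python
# A sketch of Tseitin transformations for a tiny circuit over {AND, OR, NOT}.
# Variables are positive integers; a negative integer denotes a negated
# literal (the DIMACS convention). The circuit is an assumed example:
#   g4 = x1 AND x2,  g5 = NOT x3,  g6 = g4 OR g5

def tseitin_gate(kind, out, ins):
    """Clauses asserting out <-> kind(ins), one standard set per gate type."""
    if kind == "AND":
        a, b = ins
        return [[-out, a], [-out, b], [-a, -b, out]]
    if kind == "OR":
        a, b = ins
        return [[out, -a], [out, -b], [-out, a, b]]
    if kind == "NOT":
        (a,) = ins
        return [[out, a], [-out, -a]]
    raise ValueError(kind)

circuit = [("AND", 4, (1, 2)), ("NOT", 5, (3,)), ("OR", 6, (4, 5))]
cnf = [cl for kind, out, ins in circuit
       for cl in tseitin_gate(kind, out, ins)]
# Appending a unit clause such as [6] would fix the circuit's output to 1.
```

The resulting CNF grows linearly in the number of gates, as stated above: each gate contributes a constant number of clauses.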
Let G = (V, A) be an SBN over n vertices, for which the weight functions are specified by Boolean formulas. It is easy to see that in this case the function (2) is a discrete function of the kind:

Fn : {0, 1}^n → {0, 1}^n,    (4)
which can be specified by a simple algorithm. Using the technique described above, let us construct CNF C Fn . Assume that U is a set of Boolean variables from
CFn, and X = {x1, . . . , xn} and Y = {y1, . . . , yn} are the variables associated with the inputs/outputs of circuit SFn. Consider the following CNF:

CFn ∧ (x1 ≡ y1) ∧ . . . ∧ (xn ≡ yn),    (5)

where by ≡ we denote the logical connective called logical equivalence. It is clear that CNF (5) is satisfiable if and only if the mapping Fn has stationary points. If (5) is satisfiable and μ is its satisfying assignment, then one can extract from μ an assignment α of variables from X which defines a stationary point in the corresponding STG. It is possible to reduce the problem of finding cycles of an arbitrary length in STGs of mappings (2) to SAT in a similar way. In the case when G is an SBN over n vertices, for this purpose one can consider the function Fn^r, which is an r-fold superposition of the function of the kind (4), where r is the length of the sought cycle. The corresponding circuit SFn^r is essentially a chain of r circuits of the kind SFn, where the inputs of each successive circuit are linked to the outputs of the preceding one. Similarly to (5), it is necessary to add to CNF CFn^r a constraint specifying that the input and output states of Fn^r are equal. The described algorithm for reducing the problems of finding cycles in STGs of mappings of the kind (2) to SAT belongs to a class of procedures called propositional encodings.
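The conditions encoded by CNF (5) and by its r-fold variant can be mimicked by brute force on a small example: a state x is a stationary point iff Fn(x) = x, and lies on a cycle of length r iff Fn^r(x) = x. The SBN below (a plain rotation of the state vector) is an assumed toy, not a network from the paper.

```python
from itertools import product

# Brute-force analogue of the SAT reduction (5) and its r-fold variant.
# The toy SBN rotates the state vector, so every state lies on a cycle.
def F(state):
    a, b, c = state
    return (b, c, a)

def iterate(state, r):
    # r-fold superposition F^r, mirroring the chained circuit S_{F^r}
    for _ in range(r):
        state = F(state)
    return state

states = list(product((0, 1), repeat=3))
fixed = [s for s in states if F(s) == s]            # cycles of length 1
on_3_cycle = [s for s in states
              if iterate(s, 3) == s and F(s) != s]  # proper 3-cycles
```

For this rotation, the only stationary points are the two constant states, and the remaining six states split into 3-cycles.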
3 Problems of Activation Dynamics on Networks of Arbitrary Structure

The famous paper of Granovetter [12] became a prologue to the study of the problems described in this section. In [12], several fundamental problems related to the mathematical modeling of sociological processes were considered. In particular, M. Granovetter considered activation problems for collectives where agents can have different activation thresholds. For example, an agent with activation threshold 0 is initially active. For an agent with activation threshold k to become active, it is necessary and sufficient that k agents in its neighborhood are active. In his paper, M. Granovetter gives many real-world examples of collective processes that fit into his general threshold model. In [4], the activation dynamics of networks was considered in the context of two well-known phenomena: conformity and anticonformity. Conforming behavior consists in that an agent-conformist transitions to the active state under the influence of some number (not less than a specific threshold) of active agents in its neighborhood. If the corresponding threshold is not reached, then this agent transitions to the inactive state. Essentially, this form of activation corresponds to Granovetter's model. Anticonforming behavior until recently has not been studied as extensively as the conforming one. In this context, we would like to cite [13]. The phenomenon of anticonformity is much less widespread than that of conformity,
however, it is not hard to find corresponding examples. Anticonformity implies that an agent transitions to the active state only if the number of inactive agents in its neighborhood is greater than or equal to its threshold. Otherwise, an anticonformist moves to the inactive state. The main conceptual difference of the approach proposed in [4] from that used by M. Granovetter consists in that conforming and anticonforming behavior in [4] was considered in the context of activation processes of Kauffman networks (SBNs). This point of view made it possible to naturally associate with the considered phenomena combinatorial problems that consist in finding dispositions of activating agents in a network that push this network to an active state in a relatively short time. From our point of view, the results of [4] contain at least two valuable components. First, considering conformity and anticonformity in the context of SBNs made it possible to obtain several interesting qualitative theoretical results. For example, from the SBN point of view, the two phenomena are different in principle. In particular, in [4] we found out that, under some natural initial conditions, an STG for a network of conformists contains a stationary point but cannot have cycles of length ≥ 2. Under the same conditions, an STG for a network of anticonformists may contain stationary points and cycles of length 2, but not cycles of length ≥ 3. The second valuable contribution of [4] is the procedures for reducing the problems of finding dispositions of agents which activate/deactivate a network to SAT. They were implemented in the form of a computer program and tested on networks of random structure constructed according to well-known random graph models: Erdős–Rényi [14], Watts–Strogatz [2], and Barabási–Albert [1]. At the moment of writing [4], this approach made it possible to solve combinatorial problems of the proposed type for networks with 500 vertices.
Later, in [15, 16], thanks to the use of parallel SAT solvers and state-of-the-art propositional encoding techniques [21], it became possible to solve these problems for networks with up to 4000 vertices. Because the main results of the present paper can be viewed as a development of the ideas of [4] in application to problems of Computer Security, let us first give several key notions and facts from [4]. Initially, M. Granovetter referred to agents that kickstart the activation process, i.e., that have an activation threshold equal to 0, as instigators. The same notion with a similar meaning was employed in [4]. For agents that are never active (it can be said that the activation threshold of such an agent is greater than the number of agents in its neighborhood), in [4] we used the term loyalists. Hereinafter, we will use more neutral terminology and refer to instigators as activators and to loyalists as deactivators. Agents that are neither activators nor deactivators are referred to as simple agents. For an SBN which models a collective of conformists, the weight functions of simple agents look as follows:

ωv(t + 1) = 1, if Σv′∈Vv ωv′(t) ≥ θv · |Vv|;  0, if Σv′∈Vv ωv′(t) < θv · |Vv|.    (6)

Here, θv ∈ [0, 1] is the activation threshold of agent v and |Vv| is the number of agents in the neighborhood of v (if Vv = ∅, then it is assumed that Σv′∈Vv ωv′(t) = 0).
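Rule (6) can be implemented directly. The three-agent network, the thresholds, and the treatment of the always-active activator below are illustrative assumptions.

```python
# A direct implementation of the conformist update rule (6) on a small
# directed network; the graph and thresholds are assumed for illustration.
neighbors = {            # Vv: the agents influencing v
    "a": [],             # an activator: no inbound influence
    "b": ["a", "c"],
    "c": ["a", "b"],
}
theta = {"a": 0.0, "b": 0.5, "c": 0.5}   # activation thresholds

def step(state, activators):
    new = {}
    for v, preds in neighbors.items():
        if v in activators:
            new[v] = 1                    # activators stay active
        elif not preds:
            new[v] = 0                    # empty neighborhood: sum is 0
        else:
            active = sum(state[w] for w in preds)
            new[v] = 1 if active >= theta[v] * len(preds) else 0
    return new

state = {"a": 1, "b": 0, "c": 0}          # only the activator is active
for t in range(3):
    state = step(state, activators={"a"})
```

Here a single activator suffices: after one synchronous step both simple agents meet their thresholds, and the all-active state is stationary.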
In [4], the following combinatorial problems were considered. For an SBN with weight functions of the kind (6), assume that at the initial time moment t = 0 all agents of the network are inactive simple agents. Then, some of the simple agents are replaced by activators. The goal is to find a disposition of activators that would, after a small number of time moments, force the network into a condition in which the majority of agents are active (say, ≥ 80%). We also studied an inverse problem: for an activated network (for example, with at least 80% active agents), it is necessary to replace a small number of simple agents by deactivators and to find a disposition of deactivators that would, after a small number of time moments, force the network into a state in which only the activators are active. As mentioned above, in [4] we reduced the corresponding problems to SAT and solved them for networks of random structure with several hundreds of vertices. In the next section, we employ some ideas of the described approach to study problems related to Computer Security. In particular, we focus on models of attack development in computer networks.
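The activator-disposition problem can be sketched by exhaustive enumeration, standing in for the SAT-based approach of [4]. The 6-vertex cycle network, the choice of k = 2 activators, the uniform threshold, and the horizon T = 3 are all assumptions for the example.

```python
from itertools import combinations

# Brute-force sketch of the disposition problem: among all ways to replace
# k = 2 simple agents by activators, find one that activates at least 80%
# of the network within T steps. Network, k and T are assumed.
neighbors = {v: [(v - 1) % 6, (v + 1) % 6] for v in range(6)}  # a 6-cycle
theta = 0.5

def run(activators, T):
    state = {v: 1 if v in activators else 0 for v in neighbors}
    for _ in range(T):
        state = {v: 1 if v in activators or
                 sum(state[w] for w in neighbors[v]) >= theta * len(neighbors[v])
                 else 0
                 for v in neighbors}
    return sum(state.values()) / len(state)   # fraction of active agents

solution = next((set(c) for c in combinations(range(6), 2)
                 if run(set(c), T=3) >= 0.8), None)
```

On real instances the search space is exponential in the number of candidate positions, which is exactly why [4] reduces the problem to SAT instead of enumerating.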
4 Problems of Blocking Attacks on Computer Networks in the Context of Their Activation Dynamics

By a computer network, we mean a directed labeled graph G = (V, A, LV, LA), where by LV and LA we denote the sets of symbols called labels (for vertices and for arcs, respectively). It is implied that there are specified labeling mappings V → LV and A → LA. We refer to the vertices of G as hosts. Attacks on computer networks are dynamic processes, in the course of which the adversary attempts to obtain unauthorized access to some hosts. Hereinafter, we define in more detail the main issues related to attacks. Unfortunately, the related terminology in the corresponding area is not formalized in any single source. We believe that one of the most complete papers in this sense is the Ph.D. Thesis by M. Danforth [17]. Therefore, all the notions used below have the same meanings as in [17]. So, consider a computer network G = (V, A, LV, LA). Hereinafter, we study only the case LA = ∅, i.e., the arcs are not labeled and only reflect that two hosts are directly linked. The neighborhood of an arbitrary vertex v ∈ V, interpreting some host, stands for the same as above. If informative formulations are available, the technique described below can also be applied to networks where LA ≠ ∅. In the majority of problems in computer security, the actions of some hostile entity, called the adversary or malefactor, are analyzed. It is assumed that the adversary violates the properties of a considered system (such as a computer network or a cryptographic protocol) which are set in its original specification. With each network host, a number of entities are associated, the specifics of which stem from computer security. In the first place, here we mean vulnerabilities. A vulnerability is a complex of conditions, thanks to which an adversary can use a specific host for its hostile actions with regard to a network.
The corresponding process is referred to as exploiting a vulnerability. The number of currently known
vulnerabilities is huge.1 Often, to clearly understand a specific vulnerability, it is necessary to deeply understand the practical area where it was discovered. Below, we focus on several vulnerabilities that were employed in a number of key papers on models of security of computer networks [17–20]. We use the notation from [17]. Below, the part "pe" in the names of vulnerabilities means Privilege Escalation. Vulnerabilities of this type allow an adversary, which we hereinafter denote by M, to escalate its access rights on a corresponding host. Usually, it is implied that an adversary can use such a vulnerability to obtain root access. The "noauth" part is short for No Authorization. Such vulnerabilities mean that M can perform some actions on a host without authorization. Thus, for example, pe-noauth means that M can get root access on a corresponding host without authorization. The part "local" means that an adversary can perform actions on a specific host only using user rights on this host. That is, pe-local means that if M has user rights on a host, it can escalate them to the superuser level (root). We also consider the following two vulnerabilities. The first is called write-noauth. It allows an adversary to add any host (including the ones controlled by the adversary) to the file of trusted hosts of an attacked host. The second vulnerability is called trust-login: it makes it possible for an adversary to log in on an attacked host and obtain user access rights on it, on the condition that the attacked host trusts the attacker. As noted in [17, 19, 20], the listed vulnerabilities often arise in real computer networks. Another important notion is that of an elementary attack. An elementary attack is defined for a pair of hosts, which are called the source (S) and the target (T). It is formulated as the following implication: from the preconditions on hosts S and T follow the postconditions on host T.
Usually, the preconditions contain some vulnerabilities, while the postconditions contain access rights. Thus, an elementary attack describes a one-time exploitation of available vulnerabilities on host T. Time in the cited papers is not explicitly defined. Hereinafter, we assume that the preconditions are satisfied at time moment t and the postconditions take place at time moment t + 1. Below, we show four elementary attacks. They were studied in [17], and we use them in what follows.

1. Remote To Root: if host T has the pe-noauth vulnerability, then as the result of this attack any host S in the neighborhood of T is granted superuser access rights on T.
2. Trust Establishment: if host T has the write-noauth vulnerability, then the result of this attack is that host T starts to trust all hosts in its neighborhood.
3. Trust Login: if host T has the trust-login vulnerability and it trusts host S, then the result of the attack is that host S obtains user rights on host T.

1 http://www.cvedetails.com/browse-by-date.php.
4. Local Privilege Escalation: if T has the pe-local vulnerability and S has user rights on T, then after this attack S gets superuser rights on T.

Any attack on a computer network is considered as a sequence of elementary attacks. Usually, in a network there are explicitly outlined hosts S∗, the host from which M starts an attack, and T∗, the host on which M wants to obtain root access. If M succeeds, then the attack is successful. Below, we consider attacks in the context of the introduced formalism of synchronous discrete dynamics. In more detail, consider a network G = (V, A, LV) as a discrete dynamical system. If v is an arbitrary host of the network, different from S∗, then we assume that at time moment t it is associated with a Boolean vector ωv(t) which reflects all vulnerabilities of this host, as well as the results of elementary attacks on this host at the previous time moments 0, . . . , t. For each host v ∈ V \ {S∗}, at time moment t + 1 we synchronously compute ωv(t + 1) using the set of vectors {ωv(t)}v∈V\{S∗} and the rules defining the elementary attacks described above. Let Q = {q1, . . . , ql} be all possible vulnerabilities. In accordance with the above, for an arbitrary v ∈ V \ {S∗} the vector ωv(t) looks as follows:

ωv(t) = (ω1, . . . , ωl, ωl+1(t), . . . , ωr(t)).

The (ω1, . . . , ωl) part of this vector is called the vector of vulnerabilities. This part does not change with time: ωj = 1 if host v has vulnerability qj, otherwise ωj = 0. The (ωl+1(t), . . . , ωr(t)) part is called the vector of possibilities and can change with time. Below, we use the following important assumption, referred to as the monotonicity property: if at some time moment some coordinate of the vector of possibilities becomes 1, then it does not change at the following time moments. We refer to discrete dynamical systems of the described type, specified by computer networks, as Discrete Dynamical Systems for Computer Networks (DDSCN).
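One synchronous DDSCN step can be sketched in code. The host records, the chosen possibility coordinates, and the two rules implemented (Remote To Root and Trust Establishment) are a simplified assumption made for illustration, not the paper's full model.

```python
# A minimal sketch of one synchronous DDSCN step: each host carries a
# static vulnerability vector and a monotone possibility vector.
hosts = {
    "H1": {"vulns": {"pe-noauth"}, "trust": False, "root_of": set()},
    "H2": {"vulns": {"write-noauth"}, "trust": False, "root_of": set()},
}

def step(hosts, attacker_neighbors):
    # attacker_neighbors: hosts whose neighborhood contains a host
    # the adversary M already controls
    new = {h: {"vulns": d["vulns"],            # vulnerabilities are static
               "trust": d["trust"],
               "root_of": set(d["root_of"])}
           for h, d in hosts.items()}
    for h in attacker_neighbors:
        d = new[h]
        if "pe-noauth" in d["vulns"]:
            d["root_of"].add("M")              # Remote To Root
        if "write-noauth" in d["vulns"]:
            d["trust"] = True                  # Trust Establishment
    # Monotonicity: possibility coordinates only ever change from 0 to 1.
    return new

hosts = step(hosts, attacker_neighbors={"H1", "H2"})
```

After one step, M gains root on the host with pe-noauth, while the host with write-noauth starts to trust its neighborhood; no coordinate is ever reset to 0.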
Let us illustrate this with an example which appears in the majority of the cited papers. We mean the network of three hosts, which was first studied in [20]. In this example, the four described vulnerabilities and the four corresponding elementary attacks are employed. In Fig. 1, at time moment t = 0 the adversary M controls host H0. Its goal is to get root access on host H2. In Fig. 2, the vector associated with host H1 at time moment t = 0 is shown.

Fig. 1 The network with three hosts
Fig. 2 The ω H1 (0) vector
Fig. 3 STG for DDSCN defined by network from Fig. 1
The first four components of this vector form the vector of vulnerabilities (here, three out of four vulnerabilities are present). In the vector of possibilities, the coordinate trust takes the value 1 if and only if the considered host trusts all other hosts. A coordinate from the user privileges family takes the value 1 if and only if H1 has user access rights on the corresponding host. The coordinates from the root privileges family have a similar meaning. Let us define a discrete dynamics of synchronous type for this network, according to the rules described above. The corresponding transitions between states form the State Transition Graph depicted in Fig. 3. Let us comment on Fig. 3. A network state here is a matrix of size 3 × 9 with components from {0, 1}. The coordinates in the vectors of possibilities that have changed at step t = 1 are marked with squares, at step t = 2 with ovals, and at step t = 3 with rhombuses. It is easy to see that at t = 3, host H0 (i.e., host S∗) gets root access rights on host H2 (host T∗). The state at time moment t = 3 is a stationary point of the mapping (2). The following statement holds.

Theorem 1 The STG for a DDSCN with explicitly outlined host S∗ contains one stationary point and does not contain cycles of length ≥ 2.

Sketch proof. Consider an arbitrary computer network G = (V, A, LV). Let S∗ be some host from which an adversary attacks this network. Define t = 0 as the moment when M has not yet exploited any vulnerabilities. At time moment t = 1, the adversary M exploits all available vulnerabilities on accessible network hosts whose neighborhoods contain S∗. At time moment t = 2, it exploits the vulnerabilities of hosts whose neighborhoods contain not only S∗ but also other hosts whose vulnerabilities were exploited at t = 1, etc.
By the monotonicity property, during each transition t → t + 1 the network state either does not change, in which case the corresponding state is a stationary point, or the number of 1s in the vectors of possibilities increases. Since the total number of coordinates in the vectors of possibilities
is finite, the system will inevitably move to a stationary point. This stationary point is unique due to the deterministic nature of state transitions. Let us now consider the question of representing the set of attacks on a computer network in the form of some compact data structure. Such structures are usually known as attack graphs. They were first introduced in [18, 20]. Note that the term attack graph does not have a unique interpretation. The attack graphs from the cited papers differ both in their nature and in the way they are generated. In [20], a model checking solver was used to construct an attack graph; thus, there were no guarantees of effectiveness. In [18], a direct approach to constructing attack graphs based on the description of a computer network was proposed. It also employs the lists of vulnerabilities and possible attacks. This algorithm has polynomial complexity; in more detail, the corresponding estimation is O(|Σ|² · |E|), where Σ is a set of entities called attributes and E is the set of elementary attacks, called exploits in [18]. Below, we show that in the context of the approach developed in the present paper, for an arbitrary computer network G = (V, A, LV) with explicitly outlined host S∗, considered as a DDS, it is possible to construct a special graph representing all attacks on this network in time bounded from above by a polynomial in |V|.

Theorem 2 For an arbitrary computer network G = (V, A, LV) with explicitly outlined S∗ and lists of vulnerabilities and elementary attacks Q and E, respectively, it is possible to deterministically construct a graph Δ(G) representing all attacks on the considered network. The time of constructing Δ(G) is O(|V|² · R), where R is the maximal number of coordinates in a vector of possibilities over all network hosts.

Sketch proof. Consider a network G = (V, A, LV) as a DDS.
The graph Δ(G) is a directed acyclic labeled graph with an explicitly outlined root vertex. Any vertex in Δ(G) corresponds to some host, thus by the term "vertex" below we mean the corresponding host. The root vertex of Δ(G) corresponds to host S*. The set of remaining vertices of this graph is split into layers corresponding to time moments t = 1, 2, .... In the first layer (L_1) there are vertices in whose neighborhood S* lies and for which, during the transition (t = 0) → (t = 1), at least one coordinate in the vector of possibilities has changed from 0 to 1. If there are no such vertices, then L_1 = ∅. If L_1 ≠ ∅, then a second layer L_2 is constructed. In L_2 there are all vertices in whose neighborhood there is either S* or a vertex from the first layer, such that during (t = 1) → (t = 2) at least one of the coordinates in the vector of possibilities changes from 0 to 1. If there are no such vertices, then L_2 = ∅. And so on. The arcs of Δ(G) connect vertices from different layers. Each arc goes from a layer number i to a layer number j, j > i. It interprets an elementary attack and is labeled by the corresponding symbol from the set E. Due to the monotonicity property, the process of constructing layers must end at some finite step. Moreover, it is easy to see that the number of layers in Δ(G) does not exceed |V| · R, where R is the maximal number of coordinates in a vector of possibilities over all network hosts. Indeed, the "extremal" case is when in each layer there is exactly one host for which one coordinate in a vector of possibilities
A. Semenov et al.
has changed. On the other hand, the number of vertices in each layer of Δ(G) is limited by |V|. Thus, we have the proposed upper bound.
Suppose that some computer network G = (V, A, L_V) is considered. Let Δ(G) be an attack graph for this network. Also, we can consider the STG of the corresponding DDSCN as an interpretation of the set of all possible attacks. Assume that in Δ(G) (or in the STG) the adversary reaches its goal and obtains root access on one or several target hosts. Further, we study the problem of patching vulnerabilities in order to block successful attacks. In real-world computer networks, some vulnerabilities cannot be removed. Therefore, the following problem is of particular interest: is it possible to patch a small number of vulnerabilities, or all vulnerabilities of a certain type, in such a way that afterwards no successful attack can reach its goal. In the context of the above, we have a typical problem of activation dynamics: in an activated network, to construct a disposition of a small number of "deactivators" to block network activation. In other words, we need to provide conditions for the DDSCN to enter a stationary point earlier (since in that case, M will not be able to obtain root access on the target hosts). The problem of finding the corresponding dispositions of deactivators is combinatorial, and to solve it we employ the approach outlined above, modified to take its specifics into account. Below, we briefly describe the corresponding actions. First, using Δ(G) (or the STG), we find the smallest number of steps t*, that M needs to get root access on the target hosts. A single step of the considered DDSCN corresponds to a discrete function F_G of the kind (2). Then, a function F_G^{t*}, which is a t*-fold superposition of F_G, will correspond to t* steps. For the Boolean circuit representing F_G^{t*}, we construct the CNF C_{t*}(G) in accordance with the propositional encoding procedure described above.
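The layer-by-layer construction of Δ(G) described in the proof sketch of Theorem 2 can be illustrated in code. The following is a minimal sketch, not the authors' implementation; the network, the possibility vectors, and the attack rule `grant` are made-up toy examples:

```python
# Illustrative sketch: building the layers L1, L2, ... of the attack graph
# Delta(G) by monotone propagation of possibility vectors; stops when a
# stationary point is reached (no coordinate changes from 0 to 1).

def build_layers(neighbors, possibilities, root, attack):
    """neighbors: dict host -> set of adjacent hosts;
    possibilities: dict host -> list of 0/1 coordinates;
    attack(src, dst) -> new dst possibilities (monotone: 0 -> 1 only)."""
    layers = []
    active = {root}                      # hosts that may influence others
    while True:
        new_layer = set()
        for a in active:
            for b in neighbors.get(a, ()):
                updated = attack(possibilities[a], possibilities[b])
                if updated != possibilities[b]:   # some 0 -> 1 flip occurred
                    possibilities[b] = updated
                    new_layer.add(b)
        if not new_layer:                # stationary point reached
            break
        layers.append(new_layer)
        active |= new_layer
    return layers

# Toy example: root S* grants its "possibility" to each neighbor per step.
neighbors = {'S': {'a'}, 'a': {'b'}, 'b': set()}
poss = {'S': [1], 'a': [0], 'b': [0]}
grant = lambda src, dst: [max(s, d) for s, d in zip(src, dst)]
print(build_layers(neighbors, poss, 'S', grant))  # [{'a'}, {'b'}]
```

By the monotonicity argument of the theorem, the loop terminates after at most |V| · R iterations.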
We introduce additional constraints regarding all vulnerabilities that limit their activity/inactivity. This is done by means of special auxiliary Boolean variables called patching variables. Let x_q be a variable that takes the value 1 if and only if there is a vulnerability q ∈ Q on some host. Introduce a new Boolean variable p_q that takes the value 1 if and only if x_q takes the value 0. It is clear that in that case x_q ⊕ p_q must take the value 1. Let us introduce another variable y_q such that y_q ≡ (x_q ⊕ p_q). Let C(y_q) be the CNF representing the Boolean function of three variables specified by the formula y_q ≡ (x_q ⊕ p_q). Denote by C̃_{t*}(G) the CNF produced from C_{t*}(G) as a result of replacing all appearances of the variable x_q by the variable y_q. Let x_{(S*,T*)} be a Boolean variable that takes the value 1 if and only if the host S* has root access rights on host T*. From the above (including the properties of Tseitin transformations), it follows that the CNF C̃_{t*}(G) ∧ C(y_{q1}) ∧ · · · ∧ C(y_{qs}) ∧ (¬x_{(S*,T*)}) is satisfiable if and only if there is a set of patching variables p_{q1}, ..., p_{qs} for vulnerabilities q1, ..., qs such that, if they are assigned truth values (which corresponds to the absence of the associated vulnerabilities), then the variable x_{(S*,T*)} takes the value 0. In other words, in this case M cannot obtain root access on T* in time t*. In a
similar way, it is possible to write such a condition on the absence of root access on several target hosts. To encode the fact that not all, but only some part of the vulnerabilities can be blocked, we used the technique described in [4] (it was used there to compute the number of active agents in a neighborhood). This technique is based on sorting networks [21]. The described method for patching vulnerabilities was implemented as a computer program. At the current stage, it makes it possible to solve the corresponding combinatorial problems on an ordinary PC for networks with 200 or more vertices. We believe that in the near future, the application of new methods for encoding cardinality constraints, described in [21], will make it possible to significantly increase the dimension of the problems of blocking attacks that can be solved in reasonable time.
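For illustration, the CNF C(y_q) encoding y_q ≡ (x_q ⊕ p_q) consists of four clauses. The sketch below (our own illustrative code, with a DIMACS-like literal convention) checks by brute force that exactly the assignments with y_q = x_q ⊕ p_q satisfy it:

```python
from itertools import product

# Clauses for y <-> (x XOR p); positive int = positive literal.
# Variables: 1 = x_q, 2 = p_q, 3 = y_q (hypothetical numbering).
clauses = [[-3, 1, 2], [-3, -1, -2], [3, -1, 2], [3, 1, -2]]

def satisfies(assign, clauses):
    # assign: dict var -> bool; a clause is satisfied if some literal is true
    return all(any(assign[abs(l)] == (l > 0) for l in c) for c in clauses)

models = [a for a in ({1: x, 2: p, 3: y}
                      for x, p, y in product([False, True], repeat=3))
          if satisfies(a, clauses)]
# Exactly the four assignments with y == x XOR p satisfy C(y_q):
print(len(models))  # 4
```

Setting p_q to true then forces y_q = ¬x_q, which models patching the vulnerability q in C̃_{t*}(G).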
5 Conclusions
The paper contains results regarding the application of combinatorial algorithms to the study of dynamical properties of collectives represented by networks. The new results develop the ideas expressed in [4], where it was proposed to use algorithms for solving the Boolean satisfiability problem to find dispositions of activating/deactivating agents in a network. More concretely, in the present paper, we consider the problems of modeling the development of attacks on computer networks and also of blocking such attacks. The blocking of a particular attack is achieved by distributing patches that remove vulnerabilities on some network hosts. The additional constraints on the number of patches or on the kind of patched vulnerability make the corresponding problem combinatorial. In the paper, we describe a general technique for reducing the problems of blocking attacks to SAT. Besides that, we propose new theoretical results regarding the properties of discrete mappings specified by networks from the considered class. The current practical implementation of the algorithms for blocking attacks makes it possible to solve the corresponding problems for networks with hundreds of vertices on an ordinary PC. In the near future, we plan to significantly increase the dimension of solved problems thanks to recently developed techniques of propositional encoding for mappings specified by networks.
Acknowledgements The research was funded by the Russian Science Foundation (project No. 16-11-10046).
References
1. Barabasi, A.L., Albert, R.: Emergence of scaling in random networks. Science 286(5439), 509–512 (1999)
2. Watts, D., Strogatz, S.: Collective dynamics of 'small-world' networks. Nature 393(6684), 440–442 (1998)
3. Evdokimov, A.A., Kochemazov, S.E., Otpushchennikov, I.V., Semenov, A.A.: Study of discrete automaton models of gene networks of nonregular structure using symbolic calculations. J. Appl. Ind. Math. 8(3), 307–316 (2014)
4. Kochemazov, S., Semenov, A.: Using synchronous Boolean networks to model several phenomena of collective behavior. PLOS ONE 9(12), 1–28 (2014)
5. Biere, A., Heule, M.J.H., van Maaren, H., Walsh, T.: Handbook of Satisfiability, Frontiers in Artificial Intelligence and Applications, vol. 185. IOS Press (2009)
6. Dubrova, E., Teslenko, M.: A SAT-based algorithm for finding attractors in synchronous Boolean networks. IEEE/ACM Trans. Comput. Biol. Bioinform. 8(5), 1393–1399 (2011)
7. Kauffman, S.: Metabolic stability and epigenesis in randomly constructed genetic nets. J. Theor. Biol. 22(3), 437–467 (1969)
8. Shannon, C.E.: A symbolic analysis of relay and switching circuits. Electr. Eng. 57(12), 713–723 (1938)
9. Cook, S.A.: The complexity of theorem proving procedures. In: Proceedings of the Third Annual ACM Symposium, pp. 151–158. ACM, New York (1971)
10. Levin, L.: Universal sequential search problems. Probl. Inf. Transm. 9(3), 265–266 (1973)
11. Tseitin, G.: On the complexity of derivation in propositional calculus. Studies in constructive mathematics and mathematical logic, part II, Seminars in mathematics, pp. 115–125 (1970)
12. Granovetter, M.: Threshold models of collective behavior. Am. J. Sociol. 83(6), 1420–1443 (1978)
13. Breer, V.V.: Game-theoretic models of collective conformity behavior. Autom. Remote Control 73(10), 1680–1692 (2012)
14. Erdös, P., Rényi, A.: On random graphs. Publications Mathematicae 6, 290–297 (1959)
15. Kochemazov, S., Semenov, A., Zaikin, O.: The application of parameterized algorithms for solving SAT to the study of several discrete models of collective behavior.
In: 39th International Convention on Information and Communication Technology, Electronics and Microelectronics, MIPRO 2016, Opatija, Croatia, May 30–June 3, 2016, pp. 1288–1292. IEEE (2016)
16. Kochemazov, S., Zaikin, O., Semenov, A.: Improving the effectiveness of SAT approach in application to analysis of several discrete models of collective behavior. In: 40th International Convention on Information and Communication Technology, Electronics and Microelectronics, MIPRO 2017, Opatija, Croatia, May 22–26, 2017, pp. 1172–1177. IEEE (2017)
17. Danforth, M.: Models for threat assessment in networks. California Univ. Davis Dept. of Computer Science, Technical report (2006)
18. Ammann, P., Wijesekera, D., Kaushik, S.: Scalable, graph-based network vulnerability analysis. In: Proceedings of the 9th ACM Conference on Computer and Communications Security, CCS '02, pp. 217–224. ACM, New York, NY, USA (2002)
19. Jha, S., Sheyner, O., Wing, J.M.: Two formal analyses of attack graphs. In: 15th IEEE Computer Security Foundations Workshop (CSFW-15 2002), 24–26 June 2002, Cape Breton, Nova Scotia, Canada, pp. 49–63 (2002)
20. Sheyner, O., Haines, J.W., Jha, S., Lippmann, R., Wing, J.M.: Automated generation and analysis of attack graphs. In: 2002 IEEE Symposium on Security and Privacy, Berkeley, California, USA, May 12–15, 2002, pp. 273–284 (2002)
21. Asin, R., Nieuwenhuis, R., Oliveras, A., Rodriguez-Carbonell, E.: Cardinality networks: a theoretical and empirical study. Constraints 16(2), 195–221 (2011)
Rejection Graph for Multiple Testing of Elliptical Model for Market Network D. P. Semenov and Petr A. Koldanov
Abstract Models of stock return distributions have attracted growing attention in the last decade. Elliptically contoured distributions became popular as a probability model of stock market returns. The question of the adequacy of this model to real market data is open. There are known results that reject such a model and, at the same time, there are results that approve it. The known results are concerned with testing some properties of the elliptical model. In the paper, another property of the elliptical model, namely the symmetry condition for the tails of two-dimensional distributions, is considered. A multiple statistical procedure for testing the elliptical model for stock return distributions is proposed. Sign symmetry conditions of tail distributions are chosen as individual hypotheses for multiple testing. Uniformly most powerful tests of Neyman structure are constructed for individual hypotheses testing. The associated stepwise multiple testing procedure is applied to real market data. The main result is that under some conditions the tail symmetry hypothesis is not rejected.
Keywords Elliptically contoured distributions · Tail symmetry condition · Multiple decision statistical procedure · Family-wise error rate · Holm procedure · Rejection graph · Stock market
D. P. Semenov (B) · P. A. Koldanov Laboratory of Algorithms and Technologies for Network Analysis, National Research University Higher School of Economics, Nizhny Novgorod, Russia e-mail:
[email protected] P. A. Koldanov e-mail:
[email protected] © Springer International Publishing AG, part of Springer Nature 2018 V. A. Kalyagin et al. (eds.), Computational Aspects and Applications in Large-Scale Networks, Springer Proceedings in Mathematics & Statistics 247, https://doi.org/10.1007/978-3-319-96247-4_16
D. P. Semenov and P. A. Koldanov
1 Introduction
Models of stock return distributions have attracted growing attention in the last decade in theoretical and applied finance, portfolio selection, and risk management. Elliptically contoured distributions became popular as a probability model of stock market returns [1]. The question of the adequacy of this model to real market data is open. There are known results that reject such a model and, at the same time, there are results that approve it using a statistical approach. For example, in [2], it was shown that while Student copulas provide a good approximation for highly correlated pairs of stocks, discrepancies appear when the correlation between pairs of stocks decreases, which excludes the use of elliptical models to describe the joint distribution of stocks. These results were obtained by testing symmetry, symmetry of tails, and some other properties of elliptically contoured distributions. But, as the authors point out, their approach differs from the usual testing of hypotheses by statistical tools. In the paper [3], a statistical methodology for testing the symmetry condition was proposed. A distribution-free multiple decision statistical procedure based on uniformly most powerful tests of Neyman structure was constructed, and it was shown that under some conditions the sign symmetry hypothesis is not rejected. To describe the results of applying the multiple decision statistical procedure to the USA and UK stock markets, the concept of a rejection graph was introduced. In this paper, we consider the problem of testing the symmetry condition for the tails of two-dimensional distributions. Note that the sign symmetry condition is a particular case of this condition. A uniformly most powerful Neyman structure test for the hypothesis of the symmetry condition for tail distributions for any pair of stocks is constructed. A multiple decision statistical procedure for simultaneously testing such hypotheses is constructed and applied to the stock markets of different countries.
Numerical experiments show that the hypothesis of the tail symmetry condition for the overall stock market is rejected. At the same time, it is observed that the graph of rejected individual hypotheses has an unexpected structure. Namely, this graph is sparse and has several hubs of high degree. Removing these hubs leads to non-rejection of the hypothesis of the tail symmetry condition. The paper is organized as follows. In Sect. 2, basic definitions and the problem statement are given. In Sect. 3, a multiple statistical procedure for testing the symmetry condition for tails of two-dimensional distributions is constructed. In Sect. 4, the constructed procedure is applied to the analysis of the UK, USA, Germany, France, India, and China stock markets. In Sect. 5, some concluding remarks are given.
2 Basic Notations and Problem Statement
Let N be the number of stocks on the stock market and n be the number of observations. Let p_i(t) be the price of stock i at day t (i = 1, ..., N; t = 1, ..., n). Then, the return of stock i for day t is defined as follows:
x_i(t) = ln( p_i(t) / p_i(t − 1) )   (1)
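Equation (1) can be computed directly; the prices in this sketch are made-up numbers:

```python
import math

# Log return (1) of stock i at day t, from consecutive prices.
def log_return(p_t, p_prev):
    return math.log(p_t / p_prev)

print(round(log_return(110.0, 100.0), 5))  # 0.09531
```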
Let x_i(t) (i = 1, ..., N; t = 1, ..., n) be a sample (i.i.d.) from the distribution of the random variable X_i. The distribution of the random vector X = (X_1, X_2, ..., X_N) belongs to the class of elliptically contoured distributions if its density function is [4]:

f(x; μ, Λ) = |Λ|^{−1/2} g( (x − μ)′ Λ^{−1} (x − μ) ),   (2)

where Λ is a positive definite covariance matrix, μ = (μ_1, μ_2, ..., μ_N) is the vector of expectations, g(x) ≥ 0, and

∫_{−∞}^{∞} ... ∫_{−∞}^{∞} g(y′ y) dy_1 ... dy_N = 1.   (3)
In the following, we assume that μ = (μ_1, μ_2, ..., μ_N) is a known vector. Let μ = (0, 0, ..., 0). Well-known distributions such as the multivariate Gaussian (often used for describing the multivariate distribution of stock returns) and multivariate Student distributions belong to this class. The "upper" tail dependency ratio is defined by [2]: τ_uu(p) = P(X_i > c_{1−p} | X_j > c_{1−p})
(4)
where c_{1−p} is the quantile of level 1 − p of the distribution of the variable X_i. The "lower" tail dependency ratio is defined by: τ_ll(p) = P(X_i < c_p | X_j < c_p)
(5)
For elliptical distributions, for any p, 0 < p < 1, the tail symmetry condition has the form: τ_uu(p) = τ_ll(p)   (6)
For a random vector X = (X_1, X_2, ..., X_N) with an elliptical distribution, the tail symmetry condition has to be satisfied for any pair of stocks. Moreover, if there exists a pair of random variables X_i, X_j for which the tail symmetry condition does not hold, then the random vector X = (X_1, X_2, ..., X_N) does not have an elliptical distribution. Note that, for p = 1/2 and known μ = (μ_1, μ_2, ..., μ_N), the tail symmetry condition is equivalent to the sign symmetry condition, which was tested in [3] and is shown in (7): h_{i,j} : P(X_i > 0, X_j > 0) = P(X_i < 0, X_j < 0)
(7)
The authors detected pairs of stocks for which the sign symmetry hypotheses are rejected and studied the associated rejection graph.
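As an illustration (not taken from the paper), the sign symmetry condition (7) can be checked by simulation for a zero-mean bivariate Gaussian, which is a particular elliptical distribution:

```python
import random

# Monte Carlo check of sign symmetry (7) for a bivariate Gaussian:
# P(Xi > 0, Xj > 0) should equal P(Xi < 0, Xj < 0).
random.seed(1)
rho, n = 0.5, 100_000
pp = mm = 0
for _ in range(n):
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    xi = z1
    xj = rho * z1 + (1 - rho ** 2) ** 0.5 * z2   # correlated Gaussian pair
    pp += (xi > 0) and (xj > 0)
    mm += (xi < 0) and (xj < 0)
print(abs(pp - mm) / n < 0.02)  # True: the two quadrant frequencies agree
```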
In the paper, we construct and apply a multiple statistical procedure for the simultaneous testing of the hypotheses: h_{i,j} : P(X_i > c_i, X_j > c_j) = P(X_i < −c_i, X_j < −c_j)
(8)
where i, j = 1, ..., N, i ≠ j. In order to accept the hypothesis about an elliptical distribution, it is necessary to accept hypotheses (8) for any i and j (i ≠ j). Our main goal is to determine which hypotheses are true and which are false. This information gives an opportunity to find a set of stocks such that for any pair of stocks the symmetry property is satisfied. For this set of stocks, the hypothesis of the symmetry condition for tail distributions cannot be rejected.
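Later, the constants c_i are chosen as estimates of quantiles of the marginal return distributions; a minimal order-statistic sketch (illustrative only, the returns are made up):

```python
# Simple order-statistic estimate of the quantile of order p of a sample;
# the paper uses quantile estimates of the marginal return distributions.
def empirical_quantile(sample, p):
    s = sorted(sample)
    idx = min(int(p * len(s)), len(s) - 1)   # clamp to the last element
    return s[idx]

returns = [-0.03, 0.01, -0.01, 0.02, 0.05, -0.02, 0.00, 0.04, -0.04, 0.03]
print(empirical_quantile(returns, 0.25))  # -0.02
```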
3 Multiple Hypotheses Testing Procedure
3.1 Test for Individual Hypothesis
Consider the individual hypothesis (8) for stocks i and j:
P(X_i > c_i, X_j > c_j) = P(X_i < −c_i, X_j < −c_j)   (9)
Denote by:
p1 = P(X i > ci , X j > c j )
(10)
p2 = P(−ci < X i < ci , X j > c j )
(11)
p3 = P(X i < −ci , X j > c j )
(12)
p4 = P(X i < −ci , −c j < X j < c j )
(13)
p5 = P(−ci < X i < ci , −c j < X j < c j )
(14)
p6 = P(X i > ci , −c j < X j < c j )
(15)
p7 = P(X i > ci , X j < −c j )
(16)
p8 = P(−ci < X i < ci , X j < −c j )
(17)
p9 = P(X i < −ci , X j < −c j )
(18)
Then, hypothesis (8) can be written as: h_{i,j} : p_1 = p_9
(19)
To construct a test for testing hypothesis h_{i,j}, let us introduce the indicators:
I^1_{x_i x_j}(t) = 1, if x_i(t) > c_i and x_j(t) > c_j; 0, else   (20)
I^2_{x_i x_j}(t) = 1, if −c_i < x_i(t) < c_i and x_j(t) > c_j; 0, else   (21)
I^3_{x_i x_j}(t) = 1, if x_i(t) < −c_i and x_j(t) > c_j; 0, else   (22)
I^4_{x_i x_j}(t) = 1, if x_i(t) < −c_i and −c_j < x_j(t) < c_j; 0, else   (23)
I^5_{x_i x_j}(t) = 1, if −c_i < x_i(t) < c_i and −c_j < x_j(t) < c_j; 0, else   (24)
I^6_{x_i x_j}(t) = 1, if x_i(t) > c_i and −c_j < x_j(t) < c_j; 0, else   (25)
I^7_{x_i x_j}(t) = 1, if x_i(t) > c_i and x_j(t) < −c_j; 0, else   (26)
I^8_{x_i x_j}(t) = 1, if −c_i < x_i(t) < c_i and x_j(t) < −c_j; 0, else   (27)
I^9_{x_i x_j}(t) = 1, if x_i(t) < −c_i and x_j(t) < −c_j; 0, else   (28)
Let:
T_k = I^k_{x_i x_j} = Σ_{t=1}^{n} I^k_{x_i x_j}(t), k = 1, ..., 9.
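The nine counts T_1, ..., T_9 can be computed in one pass over the sample; a sketch (our own illustrative code, thresholds and returns are made up):

```python
# Counts T1, ..., T9 of Sect. 3.1: classify each observation pair into one
# of the nine regions defined by the thresholds c_i, c_j.
def region(x, c):
    # 'U': x > c, 'M': -c < x < c, 'L': x < -c (boundary points fall into
    # 'L'; for continuous returns this event has probability zero)
    return 'U' if x > c else ('M' if -c < x < c else 'L')

# map (region of x_i, region of x_j) to the index of p_1, ..., p_9
IDX = {('U', 'U'): 1, ('M', 'U'): 2, ('L', 'U'): 3, ('L', 'M'): 4,
       ('M', 'M'): 5, ('U', 'M'): 6, ('U', 'L'): 7, ('M', 'L'): 8,
       ('L', 'L'): 9}

def counts(xs, ys, ci, cj):
    T = [0] * 9
    for x, y in zip(xs, ys):
        T[IDX[(region(x, ci), region(y, cj))] - 1] += 1
    return T

xs = [0.5, -0.7, 0.1, 0.9, -0.2]
ys = [0.6, -0.8, 0.0, -0.9, 0.3]
print(counts(xs, ys, 0.4, 0.4))  # [1, 0, 0, 0, 2, 0, 1, 0, 1]
```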
The joint distribution of the statistics T_k (k = 1, ..., 9) has the form:

P(T_1 = k_1, T_2 = k_2, ..., T_9 = k_9) = n! / (k_1! k_2! ... k_9!) · p_1^{k_1} p_2^{k_2} ... p_9^{k_9},   (29)

where k_1 + k_2 + ... + k_9 = n. In exponential form, with p_8 = 1 − p_1 − p_2 − ... − p_7 − p_9 and k_8 = n − k_1 − k_2 − ... − k_7 − k_9, the joint distribution can be written as:

P(T_1 = k_1, T_2 = k_2, ..., T_9 = k_9) = n! / (k_1! k_2! ... k_7! k_9! (n − k_1 − k_2 − ... − k_7 − k_9)!) · exp( k_1 ln(p_1/p_9) + (k_1 + k_9) ln(p_9/p_8) + k_2 ln(p_2/p_8) + k_3 ln(p_3/p_8) + ... + k_7 ln(p_7/p_8) + n ln p_8 ).
Then, the test for testing the individual hypothesis has the form [5]:

φ_{ij} = 0, if d_1((k_1 + k_9), k_2, ..., k_7) < k_1 < d_2((k_1 + k_9), k_2, ..., k_7); 1, else   (30)

where d_1 and d_2 are defined from:

P( k_1 < d_1((k_1 + k_9), k_2, ..., k_7) or k_1 > d_2((k_1 + k_9), k_2, ..., k_7) | h_{i,j} ) = P( T_1 < d_1((k_1 + k_9), k_2, ..., k_7) or T_1 > d_2((k_1 + k_9), k_2, ..., k_7) | T_1 + T_9 = k_1 + k_9, T_2 = k_2, ..., T_7 = k_7 ) = α,

where α is the given significance level. One has:

P(T_1 = k_1 | T_1 + T_9 = k_1 + k_9, T_2 = k_2, ..., T_7 = k_7) = P(T_1 = k_1, T_1 + T_9 = k_1 + k_9, T_2 = k_2, ..., T_7 = k_7) / P(T_1 + T_9 = k_1 + k_9, T_2 = k_2, ..., T_7 = k_7),

P(T_1 + T_9 = k, T_2 = k_2, ..., T_7 = k_7) = n! / (k! k_2! k_3! ... k_7! (n − k − k_2 − ... − k_7)!) · (p_1 + p_9)^k p_2^{k_2} p_3^{k_3} ... p_7^{k_7} p_8^{n − k − k_2 − ... − k_7},

P(T_1 = k_1, T_1 + T_9 = k, T_2 = k_2, ..., T_7 = k_7) = n! / (k_1! k_9! k_2! k_3! ... k_7! (n − k − k_2 − ... − k_7)!) · p_1^{k_1} p_9^{k_9} p_2^{k_2} p_3^{k_3} ... p_7^{k_7} p_8^{n − k − k_2 − ... − k_7}.

The conditional distribution has the form:

P(T_1 = k_1 | T_1 + T_9 = k, T_2 = k_2, ..., T_7 = k_7) = C_k^{k_1} ( p_1/(p_1 + p_9) )^{k_1} ( p_9/(p_1 + p_9) )^{k − k_1}.

Then, the test φ_{ij} can be written as:

φ_{ij} = 0, if d_1(k) < k_1 < d_2(k); 1, else   (31)

where k_1 + k_9 = k. If hypothesis (19) is true:

P(T_1 = k_1 | T_1 + T_9 = k, T_2 = k_2, ..., T_7 = k_7) = C_k^{k_1} (1/2)^{k_1} (1/2)^{k − k_1} = C_k^{k_1} (1/2)^k.
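Since, under h_{i,j}, T_1 given T_1 + T_9 = k is binomial with success probability 1/2, the critical values d_1(k) and d_2(k) are two-sided binomial tail cutoffs. A minimal sketch (illustrative code, the function name is ours):

```python
from math import comb

# Two-sided critical values of the conditional Binomial(k, 1/2) test:
# d1(k) is the largest C with P(T1 <= C) <= alpha/2, and d2(k) is the
# smallest C with P(T1 >= C) <= alpha/2.
def critical_values(k, alpha=0.05):
    pmf = [comb(k, i) / 2 ** k for i in range(k + 1)]
    d1 = max((C for C in range(k + 1) if sum(pmf[:C + 1]) <= alpha / 2),
             default=-1)
    d2 = min((C for C in range(k + 1) if sum(pmf[C:]) <= alpha / 2),
             default=k + 1)
    return d1, d2

print(critical_values(10))  # (1, 9)
```

The hypothesis h_{i,j} is then rejected when k_1 falls outside the interval (d_1(k), d_2(k)).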
Finally, d_1(k) and d_2(k) are defined by:

d_1(k) = max{ C : (1/2)^k Σ_{i=0}^{C} C_k^i ≤ α/2 },
d_2(k) = min{ C : (1/2)^k Σ_{i=C}^{k} C_k^i ≤ α/2 }.
3.2 Holm Procedure
The Holm step-down procedure [6] is applied for the simultaneous testing of the individual hypotheses h_{i,j}. This procedure consists of at most M = C_N^2 steps. At each step, either an individual hypothesis h_{i,j} is rejected or all remaining hypotheses are accepted. Let α be the family-wise error rate (FWER) of the multiple testing procedure and q_{i,j} be the p-value of the individual test for testing hypothesis h_{i,j}. The procedure is constructed as follows:
• Step 1: If
min_{i,j=1,...,N} q_{i,j} ≥ α/M   (32)
then accept all hypotheses h_{i,j}, i, j = 1, ..., N; else, if min_{i,j=1,...,N} q_{i,j} = q_{i_1,j_1}, then reject hypothesis h_{i_1,j_1} and go to Step 2.
• …
• Step K: Let I = {(i_1, j_1), (i_2, j_2), ..., (i_{K−1}, j_{K−1})} be the set of indices of previously rejected hypotheses. If
min_{(i,j)∉I} q_{i,j} ≥ α/(M − K + 1)   (33)
then accept all hypotheses h_{i,j}, (i, j) ∉ I; else, if min_{(i,j)∉I} q_{i,j} = q_{i_K,j_K}, then reject hypothesis h_{i_K,j_K} and go to Step (K + 1).
• …
• Step M: Let I = {(i_1, j_1), (i_2, j_2), ..., (i_{M−1}, j_{M−1})} be the set of indices of previously rejected hypotheses. Let (i_M, j_M) ∉ I. If
q_{i_M,j_M} ≥ α   (34)
then accept hypothesis h_{i_M,j_M}; else reject hypothesis h_{i_M,j_M} (reject all hypotheses).
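The steps above can be sketched compactly as follows (illustrative code; the p-values in the example are made up):

```python
# Holm step-down procedure over the individual p-values q_{i,j}:
# at step K the smallest remaining p-value is compared with alpha/(M-K+1).
def holm(pvalues, alpha=0.05):
    """pvalues: dict (i, j) -> p-value; returns the set of rejected pairs."""
    M = len(pvalues)
    rejected = set()
    for step, (pair, q) in enumerate(sorted(pvalues.items(),
                                            key=lambda kv: kv[1])):
        if q >= alpha / (M - step):      # step = K - 1
            break                        # accept this and all remaining
        rejected.add(pair)
    return rejected

qs = {(1, 2): 0.001, (1, 3): 0.02, (2, 3): 0.3}
print(sorted(holm(qs)))  # [(1, 2), (1, 3)]
```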
4 Practical Application
The procedure for testing the individual hypotheses (8) was applied to analyze data from the US, UK, France, Germany, India, and China stock markets for the period from January 1, 2006 to December 31, 2016. For each of the countries, the stocks greatest by sales that were present on the market for the whole period were selected. The attribute of a stock is its logarithmic return. For each country, the number of stocks is N = 100 and a sample of size n = 250 (one calendar year) was taken. The constants c_i and c_j are chosen as estimates of the quantiles of order p of the marginal distributions of stocks i and j, respectively. For each year and each country, the symmetry condition for tail distributions was tested and the hubs of the graph of rejections were obtained. These hubs are removed from consideration and the procedure is repeated until all individual hypotheses are accepted. At each iteration, the Holm procedure is used. The obtained results are shown in Tables 1, 2, 3, 4, 5 and 6. The significance level for the Holm procedure was chosen equal to 0.05. In each table, an entry k in the row for quantile level p and the column for a given year means that it is necessary to delete k stocks for the tail symmetry property to hold.
Table 1 USA market
           2006  2007  2008  2009  2010  2011  2012  2013  2014  2015  2016
p = 0.05     22    14    17     8     9    17    18    22    23    19    25
p = 0.1      29    32    26    16    20    25    22    37    33    25    40
p = 0.25     32    25    43    32    27    32    28    36    38    39    55
p = 0.5      31    32    38    26    30    50    30    34    41    36    57
Table 2 UK market
           2006  2007  2008  2009  2010  2011  2012  2013  2014  2015  2016
p = 0.05     21    13     5    11    10     8    21    27    24    23    25
p = 0.1      28    16     8    25    17    18    34    35    38    32    29
p = 0.25     46    34    17    36    42    27    41    44    47    47    45
p = 0.5      24    22    22    49    46    20    39    41    38    34    31
Table 3 Germany market
           2006  2007  2008  2009  2010  2011  2012  2013  2014  2015  2016
p = 0.05      8    13     9    16     6     3    19    17    12    11    16
p = 0.1      22    27    24    19    16    17    34    30    21    17    32
p = 0.25     31    42    33    27    29    21    45    41    28    31    29
p = 0.5      23    57    40    32    34    23    64    43    20    22     9

Table 4 France market
           2006  2007  2008  2009  2010  2011  2012  2013  2014  2015  2016
p = 0.05     22    13    13     9    21     7    14    19    21    18    21
p = 0.1      34    20    23    21    39    17    30    25    32    25    30
p = 0.25     45    31    30    29    49    21    35    35    36    35    35
p = 0.5      57    25    33    23    40    21    20    25    32    36    38

Table 5 China market
           2006  2007  2008  2009  2010  2011  2012  2013  2014  2015  2016
p = 0.05     13    17    12    28    24    14    13    23    38    14    12
p = 0.1      20    34    21    51    33    25    20    31    61    28    22
p = 0.25     37    38    32    60    41    37    48    48    72    56    46
p = 0.5      49    50     0    51    41    45    41    45    76    61    57
For example, for the USA market, year 2006, and p = 0.05, it is necessary to delete 22 stocks to obtain a set of stocks such that for any pair of stocks the symmetry property is satisfied.
Table 6 India market
           2006  2007  2008  2009  2010  2011  2012  2013  2014  2015  2016
p = 0.05      3    11     9    11    18    12    12    13    21    13    10
p = 0.1       6    20    16    11    23    23    28    19    26    17    15
p = 0.25     16    26    13    27    36    38    35    29    32    22    23
p = 0.5      74     9     6     8     6    10    10     9     7     8     9
5 Discussion
In this paper, the elliptical model as a model of the multivariate distribution of stock returns was investigated. A statistical procedure for the multiple testing of a property (the tail symmetry condition) that is satisfied by elliptical distributions is constructed. The property has been tested on the data of the US, UK, China, India, Germany, and France markets. The sets of stocks for any pair of which the properties of elliptical distributions are satisfied have been obtained. The tables show that for lower quantile levels, it is necessary to remove fewer stocks to satisfy the tail symmetry condition for all pairs of stocks. There are years where it is necessary to remove only a small number of stocks (for example, 3 for India (year 2006) or 3 for Germany (year 2012)) to satisfy the tail symmetry condition for all pairs of stocks. A separate discussion can be devoted to the choice of the constants c_i and c_j. In the paper, these constants are chosen as estimates of the quantiles of order p of the marginal distributions of stocks i and j, respectively. But there are other ways to choose them, for example, from a segment with given boundaries with a given step. This is a problem for further investigation.
Acknowledgements The work of P. A. Koldanov was conducted at the Laboratory of Algorithms and Technologies for Network Analysis of National Research University Higher School of Economics. The work is partially supported by RFHR grant 15-32-01052.
Appendix 1
For example, in 2016 for the US market (quantile p = 0.1), the set of stocks for any pair of which the properties of elliptical distributions are satisfied is as follows (60 tickers): 'AVY', 'EXR', 'AON', 'FRT', 'NDAQ', 'LVLT', 'LEN', 'ARNC', 'TDC', 'CLX', 'HST',
‘CRM’, ‘SYK’, ‘GPN’, ‘TAP’, ‘GPC’, ‘RIG’, ‘MMC’, ‘EA’, ‘NLSN’, ‘MCHP’, ‘SLB’, ‘AAPL’, ‘NUE’, ‘GM’, ‘COH’, ‘MKC’, ‘ICE’, ‘CSRA’, ‘AMZN’, ‘ACN’, ‘LKQ’, ‘MA’, ‘CXO’, ‘CTL’, ‘HON’, ‘PNW’, ‘AEP’, ‘BMY’, ‘DVA’, ‘XOM’, ‘KSU’, ‘CBG’, ‘AME’, ‘GOOGL’, ‘ETN’, ‘SPLS’, ‘BAC’, ‘PXD’, ‘MTD’, ‘O’, ‘MTB’, ‘JCI’, ‘NTRS’, ‘RF’, ‘KHC’, ‘COF’, ‘PGR’, ‘TXN’, ‘ROK’
Appendix 2. The "Evolution" of the Rejection Graph
In the figures below, the change in the structure of the rejection graph depending on the number of stocks is demonstrated. It is clearly seen that after the removal of hubs, the number of rejected hypotheses gradually decreases and eventually tends to zero.
Fig. 1 Germany, 2016. Quantile p = 0.25. 100 vertices. Max degree of graph is 43
Fig. 2 Germany, 2016. Quantile p = 0.25. 90 vertices. Max degree of graph is 27
In particular, for 100 vertices, the maximum degree of the graph is 43. After deleting hubs from this graph so that, for example, 90 vertices remain, the maximum degree of the graph decreases (this means that the number of pairs of stocks for which the symmetry property is not satisfied decreases). And when 30 vertices are deleted, a graph with maximum degree equal to 2 is obtained: for the node that has maximum degree, there are only two stocks for which the symmetry property is not satisfied (Figs. 1, 2, 3 and 4).
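The hub-removal process shown in the figures can be sketched as follows (illustrative code on a toy graph, not market data):

```python
# Iterative hub removal from the rejection graph: repeatedly delete a
# highest-degree vertex until the maximum degree is at most max_degree.
def remove_hubs(edges, max_degree):
    """edges: set of frozenset vertex pairs; returns the surviving edges."""
    edges = set(edges)
    while True:
        deg = {}
        for e in edges:
            for v in e:
                deg[v] = deg.get(v, 0) + 1
        if not deg or max(deg.values()) <= max_degree:
            return edges
        hub = max(deg, key=deg.get)              # vertex of highest degree
        edges = {e for e in edges if hub not in e}

# Toy star graph plus one extra edge: the hub 'h' is removed first.
star = {frozenset({'h', v}) for v in 'abcd'} | {frozenset({'a', 'b'})}
print(len(remove_hubs(star, 2)))  # 1: only the edge a-b survives
```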
Fig. 3 Germany, 2016. Quantile p = 0.25. 80 vertices. Max degree of graph is 10
Fig. 4 Germany, 2016. Quantile p = 0.25. 70 vertices. Max degree of graph is 2
References
1. Gupta, A.K., Varga, T., Bodnar, T.: Elliptically Contoured Models in Statistics and Portfolio Theory. Springer (2013)
2. Chicheportiche, R., Bouchaud, J.-P.: The joint distribution of stock returns is not elliptical. Int. J. Theor. Appl. Finan. 15(3) (2012)
3. Koldanov, P.A., Lozgacheva, N.: Multiple testing of sign symmetry for stock return distributions. Int. J. Theor. Appl. Finan. (2016)
4. Anderson, T.W.: An Introduction to Multivariate Statistical Analysis, 3rd edn. Wiley-Interscience, New York (2003)
5. Lehmann, E.L., Romano, J.P.: Testing Statistical Hypotheses, 3rd edn. Springer, New York (2005)
6. Holm, S.: A simple sequentially rejective multiple test procedure. Scand. J. Stat. 6(2), 65–70 (1979)
Mapping Paradigms of Social Sciences: Application of Network Analysis Dmitry Zaytsev and Daria Drozdova
Abstract In this paper, we propose to utilize the methods of network analysis to analyze the relationship between the various elements that constitute any particular research in the social sciences. Four levels that determine a design of the research can be established: ontological and epistemological assumptions that determine what the reality under study is and how we can obtain knowledge about it; a general methodological frame that defines the object of the study and the spectrum of research questions we are allowed to pose; and, finally, a list of methods that we might use in order to get answers. All these levels are interrelated, sometimes in a very confusing way. We propose to extract a preliminary set of relations between various elements from textbooks on the methodology of social and political sciences and to visualize and analyze their relations using network analytic methods. Keywords Social science methodology · Network analysis · Philosophy of science · Political science methodology · Paradigms · Quantitative and qualitative methods
The article was prepared within the framework of the Basic Research Program at the National Research University Higher School of Economics (HSE) and supported within the framework of a subsidy by the Russian Academic Excellence Project ‘5-100.’ D. Zaytsev (B) · D. Drozdova Higher School of Economics, National Research University, Moscow, Russia e-mail:
[email protected] D. Drozdova e-mail:
[email protected] © Springer International Publishing AG, part of Springer Nature 2018 V. A. Kalyagin et al. (eds.), Computational Aspects and Applications in Large-Scale Networks, Springer Proceedings in Mathematics & Statistics 247, https://doi.org/10.1007/978-3-319-96247-4_17
D. Zaytsev and D. Drozdova
1 Introduction Although the social science fields are relatively well established with respect to the methodologies they use, researchers who look for causal associations between variables use quantitative, statistical methods to prove cause–effect relations between them, while political linguists tend to use methods such as discourse analysis to disclose cognitive manipulation by power-holders through the imposition of certain beliefs defining social behavior. Are methods in the social sciences just neutral tools, or are they related to general philosophical assumptions that impose certain research designs and methodological decisions? Social research is driven not only by the methods chosen but also by the general methodological approaches and philosophical assumptions shared by researchers. Methodological conventions typically favor certain practical approaches associated with a certain ontology and epistemology, which, at times, can lead to biased research results. For example, the use of statistical methods to find a cause–effect relation between the level of economic growth and democratization would almost surely yield a significant positive association, leading to an inference about such causality. This is shown by multiple political science studies analyzing cross-country statistics [1]. At the same time, some scientists arrive at a reverse-causality conclusion: that democratization leads to economic growth [2]. In both examples, researchers are driven by a naturalist ontology and positivist epistemology that induce them to believe that the social world is governed by regular laws similar to the laws of the natural sciences, and that the aim of a researcher is to reveal these regularities by systematizing and analyzing numerical data related to "economic growth" and "democratization." In contrast to typical cause–effect studies, Huntington [1] develops a complex model of the relationship between economic growth and democratization.
He claims that innovative economic growth leads to the rise of the middle class, an increased level of education, and, as a result, an increased demand for accountability of the government to its citizens in budgetary matters. This conclusion became possible because Huntington bases his research on the method of small-N comparison and uses a historical approach grounded in a realist ontology and a post-positivist epistemology. These allow him to shift the focus of research from the abstract processes of "economic growth" and "democratization" to the political and economic situation in particular countries, looking for the deeper meaning in the studied relationship. This example demonstrates how the ontology and epistemology implicitly assumed by researchers define their research findings, determine their choice of methodological approaches, and limit their analytical tools or methods. Research following this "path dependency" is unlikely to produce results contrary to those previously obtained with the same methods. This can lead to a vicious circle in which ontology and epistemology define approaches and methods, leading to expected outcomes known a priori and determined by the philosophical foundations of the ontology. We can break this vicious circle by identifying different clusters of related ontologies, epistemologies, approaches, and methods in the social sciences in order to see how they interrelate and thus avoid this methodological trap. Despite numerous studies attempting to do just that, to the best
of our knowledge, none of the previous studies has used a relational approach to analyze the complex relationship between these components of scientific study. We propose to use network analysis in order to escape the path dependency of implicit methodological assumptions inherent in the ontological choice.
2 Methodological Diversity of Social Sciences While there is a large literature on the methodology and philosophical foundations of the social sciences, there is no consensus or dominant position shared by the majority of social scientists regarding the list of ontologies, epistemologies, methodological approaches, and methods of the social sciences and the associations between them. For example, Moses and Knutsen argue that despite the clear and reasonable statement that the methodology and methods of social science depend on ontological and epistemological justifications, "in practice it is hard to follow-up," because the "methodological diversity of the social sciences can be confusing" [3, p. 4]. As a result, scientists often disagree about fundamental issues relating to ontologies, epistemologies, methodological approaches, and associated methods. It is clear, however, that such relations exist and drive our research and scientific discovery along a given path. For their results to be accepted by the academic community, researchers and students of the social sciences have to be clear about the methodology used and recognize the restrictions of the methods they are using. To achieve the highest level of reliability and validity of a study, an ultimate requirement is to apply "methodological triangulation" [4, 5], that is, a combination of a variety of methodologies and methods in a single research project. While teaching the methodology of political research and the philosophy of science, the authors of this paper faced the problem of the lack of a unified approach to the ontological and epistemological foundations, methodologies, and methods of the social sciences, even in textbooks for beginners. Each textbook proposes its own way to label, list, and describe philosophical assumptions, methodologies, and methods. For example, Moses and Knutsen [3] propose to divide the different methods in social science along two main methodological perspectives: naturalist and constructivist.
Porta and Keating [6] insist that we can identify at least four traditions in the social sciences according to their ontological and epistemological presuppositions, which can be distinguished by the methods they use. Meanwhile, Abbot [7] describes methods and methodological approaches relative to an entire list of methodological, ontological, and epistemological debates. All these discrepancies may give the impression that there is no common understanding of ontologies, epistemologies, methodologies, and methods in the social sciences, that they are all mixed with each other, and that the relationship between them is chaotic. However, an in-depth study of textbooks on social science methodology, performed for this paper, has shown that despite such diversity, it is possible to establish a common structure. Despite the diversity in labeling, listing, and linking ontologies, epistemologies, methodologies, and methods, the differences
between textbooks are limited, and many similarities can be found. Many textbooks recognize that in order to describe the methodological diversity of the social sciences, several distinct levels should be identified. The upper level is related to philosophical assumptions. It includes ontology and epistemology, which determine a research agenda: ontology answers the question "What is the social world really?", while epistemology answers the question "What and how can we learn about the social world?" The intermediate level is the methodological tradition or methodological approach (e.g., rational choice theory, ethnomethodology, institutionalism, etc.). This approach determines a proper research focus, an aim of the research, a relevant research question, and the hypotheses that have to be tested in order to answer that question. The last level is the methods themselves, often considered to be just a toolkit for collecting and analyzing data. Therefore, in the social sciences, four levels define the research design: ontologies (O), epistemologies (E), methodological approaches (A), and methods (M). All levels are interconnected, but the exact nature of this relationship is unclear. Ontologies and epistemologies are undoubtedly connected to each other. Approaches (A) have strong relations to the philosophical assumptions (O and E) and, at the same time, can be characterized by the set of methods (M) they use. Finally, methods, even as a neutral set of tools, depend in a certain way on the philosophical positions implicitly adopted by the corresponding methodological approaches. So far, it appears clear that an interconnected set of relations between ontologies, epistemologies, approaches, and methods in the social sciences can be established. But a relational approach is needed in order to examine the deeper structure of these relationships. Therefore, we propose to utilize the methods of network analysis to analyze the described relationship as a multilevel network.
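The four-level structure just described lends itself directly to a network encoding. A minimal sketch in Python (the node names and edges are illustrative examples, not code or data from the paper):

```python
# Encode the four levels -- ontologies (O), epistemologies (E),
# approaches (A), and methods (M) -- as a typed multilevel network.
# Node and edge choices below are illustrative examples only.
nodes = {
    "Naturalism": "O",
    "Positivism": "E",
    "Behaviorism": "A",
    "Statistics": "M",
}
edges = [
    ("Naturalism", "Positivism"),   # O-E link
    ("Positivism", "Behaviorism"),  # E-A link
    ("Behaviorism", "Statistics"),  # A-M link
]

# In this encoding, edges only connect adjacent levels (O-E, E-A, A-M):
order = {"O": 0, "E": 1, "A": 2, "M": 3}
assert all(abs(order[nodes[u]] - order[nodes[v]]) == 1 for u, v in edges)
```

The particular chain shown (naturalism, positivism, behaviorism, statistics) is one of the strong associations reported later in the paper.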
3 Network Design and Description As a source of data, we used five textbooks on social and political science methodology [3, 6–9]. These textbooks were chosen out of 186 books on political and social science methodology, 27 of which were textbooks describing methods in detail. Only the chosen five provide a description of the various levels indicated previously (ontologies, epistemologies, methodological approaches, and methods): they label, define, and describe them and propose a system of their interconnection, which allows us to examine the relationships between the different levels. From the chosen textbooks, we compiled lists of ontologies, epistemologies, approaches, and methods. However, even in these five textbooks, there were some differences in descriptions and labels that needed to be reconciled. The first issue was the problem of synonyms: authors often used different labels for essentially the same entity. For example, some identify the same position (e.g., "positivism") as an epistemological tradition, while others treat it as a methodological one. Therefore, we documented all possible synonyms in order to identify a list of the common concepts that they describe (Table 1).
Table 1 Ontologies, epistemologies, methodologies, and methods according to the textbooks

Ontology
• "Deals with the things that we think exist in the world" [8, p. 5]; "the theory of what kinds of objects exist" [8, p. 49]; "Ontology seeks to answer the question, what exists?" [8, p. 60]
• "Is a theory of being" [9, p. 185]; "The key ontological question is: What is the form and nature of reality and, consequently, what is there that can be known about it?" [9, p. 185]
• "Related to the existence of a real and objective world" [6, p. 21]; "The ontological question is about what we study, that is, the object of investigation" [6, p. 21]
• "Ontology ... means the study of being—the study of the basic building blocks of existence. The fundamental question in the field of ontology is: 'What is the world really made of?'" [3, p. 4]
• [social ontology]: "Debates about the nature of social reality itself" [7, p. 44]; "the elements and processes that we imagine make up the world" [7, p. 179]

Epistemology
• "[Deals] with how we come to know about those things" [8, p. 5]; "is the theory of knowledge" [8, p. 49]
• "Reflects …what we can know about the world" [9, p. 185]; "the key epistemological question is: What is the nature of the relationship between the knower and what can be known?" [9, p. 185]
• "Epistemology is about how we know things. It is a branch of philosophy that addresses the question of the 'nature, sources and limits of knowledge'" [6, p. 22]
• "Epistemology …denotes the philosophical study of knowledge. 'What is knowledge?' is the basic question of epistemology" [3, p. 4]

Methodology/approach
• Approaches = traditions(?) Traditions of political science: modernist empiricism, behavioralism, institutionalism, and rational choice
• "Different general ways of approaching the subject matter of political science… Each… combines a set of attributes, understandings and practices that define a certain way of doing political science" [9, p. 3]
• Methodology: "the form of coherent research designs" [6, p. 25]; "referring to the technical instruments that are used in order to acquire … knowledge" [6, p. 21]; "The methodological question refers to the instruments and techniques we use to acquire knowledge" [6, p. 25]
• "Methodology … refers to the ways in which we acquire knowledge. 'How do we know?' is the basic question in methodology. …methodology … is… the study of which methods are appropriate to produce reliable knowledge" [3, pp. 4–5]

Methods
• "Tools of the trade" [6, p. 7]; "merely ways of acquiring information" [6, p. 7]; "Methods are no more than ways of acquiring data" [6, p. 28]
• "Methods as tools, …methodologies as well-equipped toolboxes" [3, p. 3]; "method refers to research techniques, or technical procedures of a discipline" [3, p. 5]
• "…stylized ways of conducting … research that comprise routine and accepted procedures for doing the rigorous side of science" [7, p. 13]
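The synonym-reconciliation step can be sketched as a simple lookup table; the mappings below are a small illustrative subset drawn from the synonym groups in Tables 2 and 3, not the full dictionary compiled for the paper:

```python
# Map each textbook-specific label to the common concept it denotes
# (illustrative subset of the synonym groups in Tables 2 and 3).
SYNONYMS = {
    "empiricism": "positivism",
    "causal thinking": "positivism",
    "rationalism": "post-positivism",
    "holism": "interpretivism",
    "anti-foundationalism": "relativism",
}

def canonical(label):
    """Return the shared concept for a label; unknown labels pass through."""
    key = label.strip().lower()
    return SYNONYMS.get(key, key)

# Two books using different labels contribute to the same node:
assert canonical("Empiricism") == canonical("Causal thinking") == "positivism"
```

With such a table, every mention in every textbook can be counted against a single canonical node before the network is built.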
The second issue was establishing the list of elements to include in the analysis. Given that our aim is to maximize the number of elements using the maximum number of textbooks, we chose to list the elements that were mentioned in at least four books out of five (Tables 2, 3, 4 and 5), because the number of elements mentioned in all five texts is too small. As a result, we established a list of 3 ontologies (Naturalism, Realism, and Relativism), 4 epistemologies (Positivism, Post-Positivism, Interpretivism, and Post-Structuralism), 8 methodologies (Behaviorism, Rational Choice Theory, Institutionalism, Marxism, Political Psychology, Constructivism, Cultural Approach, and Normative Approach), and 14 methods (Historical Narrative, Participant Observation, Experiment, Interpretative Experiment, Survey, Interview, Focus Group, Within-case Study, Small-N Comparison, Large-N Comparison, Statistics, Formal Modeling, Bayesian, and Discourse Analysis).
Table 2 Ontologies (Total: number of the five textbooks [8], [9], [6], [3], [7] mentioning the group)
• Realism [6–9]/Foundationalism [9]/Objectivism [9]/Naturalism (syn.: 'positivism', 'empiricism', 'behavioralism') [3, 6] (Total: 5)
• Construct(-ivism)(-ionism) [3, 6–9]/Relativism [3, 6–9]/Anti-foundationalism [9] (Total: 5)
• Naturalism + Relativism [6, 8, 9]/Scientific Realism [3, 6]/Reductionism [3, 6]/Critical Realism [3, 6] (Total: 4)
• Nominalism [6] (Total: 1)

Table 3 Epistemologies (Total: number of the five textbooks mentioning the group)
• Positivism [3, 6–9]/'Causal thinking' [8]/(inductive, experimental) Empiricism [3, 6–9] (Total: 5)
• Post-Positivism [6, 9]/(deductive) Rationalism [3, 6, 8]/Realism [9] (Total: 4)
• (hermeneutic) Interpretivism [3, 6, 7, 9]/'Interpretive thinking' [8]/Holism [3, 6, 8] (Total: 5)
• Postmodern interpretivism [3, 6, 9]/(post)Structuralism [3, 9]/'Interpretive thinking' [8]/Humanistic [6] (Total: 4)
Table 4 Methodological approaches (Total: number of the five textbooks mentioning the group)
• Behaviorism [3, 6–9] (Total: 5)
• Rational choice [3, 6–9]/Game Theory [3, 6–8]/individualism [6–9] (Total: 5)
• Institutional [6, 9]/(historical, new) Institutionalism [6, 8, 9]/Institutional game theory [3, 6, 7, 9] (Total: 5)
• Marxism [3, 6, 9]/emergentism [7] (Total: 4)
• Political psychology [3, 6, 7, 9] (Total: 4)
• Constructivism [3, 6, 8, 9]/(social) constructionism [6, 7]/contextualism [6, 7] (Total: 5)
• Cultural [3, 6–8]/culturalism [7] (Total: 4)
• Normative [6, 8, 9]/contextualism [6, 7] (Total: 4)
• Feminism [6, 9] (Total: 2)
Fig. 1 Relations between ontologies and epistemologies
Table 5 Methods (Total: number of the five textbooks mentioning the group)
• Description [3, 6, 7]/thick description [3, 6]/narrative [3, 6–8]/historical narrative [3, 7, 8] (Total: 4)
• Observation [3]/ethnographic observation [9]/ethnography(-ic) [6, 7, 9]/ethnomethodology [3]/participant observation [6, 7, 9] (Total: 4)
• Semistructured interviews [8, 9]/unstructured interviews [6]/(in-depth, intensive) interview [3, 8, 9] (Total: 4)
• Experiment [3, 6–9] (Total: 5)
• Interpretive (quasi-) experiment [3, 6–8] (Total: 4)
• Survey [3, 6–9]/(structured) interviews [3, 6–8] (Total: 5)
• Focus group [3, 6, 7, 9] (Total: 4)
• Compar(-ison), (-ative) [3, 6, 8, 9]/small-N compar(-ison), (-ative) [3, 6, 7, 9] (Total: 5)
• (within-) case(-)study [3, 6–9]/process(-)tracing [3, 6, 8, 9] (Total: 5)
• Large-N comparison (analysis/studies) [3, 6, 7, 9] (Total: 4)
• (descriptive) Statistics [3, 6–9]/standard causal analysis [7] (Total: 5)
• Mathematical modeling [8, 9]/game theory [6–9]/formal(-ization) model(-ing) [7–9] (Total: 4)
• Bayesian (analysis, statistic, inference) [3, 6, 8]/Bayesian network analysis (modeling) [7, 8] (Total: 4)
• (text(-ual)) Discourse analysis [3, 6, 9]/record-based analysis [7] (Total: 4)
• Historical analysis [3, 8]/historiography [3, 9]/life history [9] (Total: 3)
• Content analysis [6, 9] (Total: 2)

From these sets, we created three relational tables: relations between ontologies and epistemologies (see Table 6), between epistemologies and methodological approaches (Table 7), and between approaches and methods (Table 8). We use a valued network, where the weight of an edge is proportional to the number of textbooks that indicate the existence of that relation. The simplest relations are established between ontologies and epistemologies (see Fig. 1). Ontological "naturalism" (the belief in the existence of the "Real World") is strongly connected to the positivist epistemology (an empirical and inductive research strategy), while ontological "relativism" (which supposes that the "Real World" exists only together with our perceptions of it) relates to the interpretivist epistemology (a historiographic, ethnographic, and contextual research strategy) and to the extreme relativist epistemology called post-structuralism (knowledge discovery is always subjective, which is why objective knowledge in the social sciences is impossible). However, these strongly dichotomous ontological and epistemological foundations in political science can be merged into an intermediate position: critical or scientific "realism" (the assumption that there are real and powerful structures independent of our knowledge of them, which is always subject to historical and cultural determination), which is related to both the post-positivist (an empirical deductive research strategy) and interpretivist epistemologies.
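The valued-network construction used here (each edge's weight equals the number of textbooks asserting that relation) amounts to simple counting. A minimal sketch, with a hypothetical excerpt of per-book relations rather than the full data set:

```python
from collections import Counter

# Relations asserted by each of the five textbooks; the lists below are an
# illustrative excerpt, not the complete data extracted for the paper.
book_relations = {
    "[8]": [("Naturalism", "Positivism"), ("Relativism", "Interpretivism")],
    "[9]": [("Naturalism", "Positivism"), ("Realism", "Post-positivism")],
    "[6]": [("Naturalism", "Positivism"), ("Relativism", "Post-structuralism")],
    "[3]": [("Naturalism", "Positivism"), ("Relativism", "Interpretivism")],
    "[7]": [("Naturalism", "Positivism")],
}

# Edge weight = number of textbooks that assert the relation.
weights = Counter(edge for edges in book_relations.values() for edge in edges)

assert weights[("Naturalism", "Positivism")] == 5  # asserted by all five books
```

The weight 5 for the naturalism–positivism edge matches the strongest cell of Table 6.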
Fig. 2 Relations between epistemologies and methodological approaches
Table 6 Ontologies and epistemologies
              Positivism   Post-positivism   Interpretivism   Post-structuralism
Naturalism        5               1                 0                  0
Realism           0               2                 1                  0
Relativism        0               0                 4                  3

Table 7 Approaches and epistemologies
                          Positivism   Post-positivism   Interpretivism   Post-structuralism
Behaviorism                   5               2                 0                  0
Rational choice theory        1               4                 0                  0
Institutionalism              1               2                 1                  0
Marxism                       0               2                 1                  0
Political psychology          0               1                 1                  0
Constructivism                0               0                 4                  1
Cultural approach             0               0                 3                  2
Normative approach            0               0                 2                  2

A simple visual examination of the epistemologies–approaches relations (see Fig. 2) shows that it is possible to divide the methodological approaches into three groups. The first group includes behaviorism and rational choice theory, which are related exclusively to the positivist and post-positivist epistemologies. The second group includes the normative, cultural, and constructivist approaches, which are related to the interpretivist and post-structuralist epistemologies. An intermediate position between the "positivist" and "interpretivist" poles is occupied by the psychological, institutional, and Marxist approaches. However, the picture becomes more complicated when we take into account the relation between methods and methodological approaches (see Fig. 3). Here, we can see that political psychology uses only a small number of the available methods and, as a result, stands apart from the other methodologies. Institutionalism comes closer to behaviorism and rational choice theory: they share a relation to a group of "quantitative" methods, which includes experiment, survey, statistics, and formal modeling. Further, Marxism gets close to the "interpretivist" approaches (cultural, normative, and constructivist). These two approaches have a strong and almost exclusive relation
Fig. 3 Relations between approaches and methods
Table 8 Approaches and methods
[Valued 8 × 14 matrix relating the eight methodological approaches (behaviorism, rational choice theory, institutionalism, Marxism, political psychology, constructivism, cultural approach, and normative approach) to the fourteen methods (historical narrative, participant observation, experiment, interpretive experiment, survey, interview, focus group, within-case study, small-N comparison, large-N comparison, statistics, formal modeling, Bayesian, and discourse analysis); each cell gives the number of textbooks that link the approach to the method.]
to a group of "qualitative" methods, which includes historical narrative, interview, participant observation, discourse analysis, and focus group study. At the same time, there is a group of methods (small-N and large-N comparisons, interpretative experiment, within-case study, and Bayesian statistics) that, in some way, are related to both the "positivist" and "interpretivist" approaches.

The last step in our data description is to put all three networks together. The combined network is presented in Fig. 4, where the size of a method's node is proportional to the number of approaches it is connected to (its degree).

Fig. 4 Ontologies, epistemologies, approaches, and methods

With a weighted network, the expected association between the different levels becomes more apparent. For example, we can observe a strong connection between the naturalist ontology, positivist epistemology, behaviorist approach, and statistical methods, and between the post-positivist epistemology, rational choice theory, and formal modeling. It is also clear that the naturalism/relativism opposition at the ontological level is reflected in the positivism/post-structuralism opposition at the epistemological level, and in the corresponding grouping of methods and methodological approaches. The spectrum of methods is defined by a tension between "quantitative" methods (statistics, formal modeling, and survey) and "qualitative" methods (focus group, interview, narrative, and discourse analysis), which corresponds to the positivism/interpretivism opposition in epistemology. Some methods (for example, large-N comparison, experiment, and Bayesian methods) occupy an intermediate position. They are assumed to fulfill a brokerage role, not only between diverse methods but also between contrary ontological and epistemological foundations, and can be used for diverse methodological issues and research subjects.

Fig. 5 Two-dimensional representation of ontologies, epistemologies, approaches, and methods relations

Another way of visualizing a multilevel network is a two-dimensional representation obtained by correspondence analysis [10]. In order to compute this analysis, we constructed a new bipartite contingency matrix that relates epistemologies to ontologies, approaches, and methods (an ExOAM matrix with dimensions 4 × 29). The first two relations (ExO and ExA) are part of our initial data, whereas the epistemologies-to-methods relations (ExM) were constructed by multiplication of the ExA and AxM matrices, with subsequent normalization according to the maximum value of the other data. Having obtained the bipartite matrix, we performed correspondence analysis
Table 9 Principal inertias (eigenvalues)
Dim     Value      %      Cum %
1       0.716300   77.9    77.9
2       0.173253   18.8    96.7
3       0.030167    3.3   100.0
Total   0.919719  100.0
using the R packages ca and ggplot2. As a result, we obtained a two-dimensional picture (Fig. 5), in which the first dimension explains 77.9% of the variation and the second dimension explains 18.8% (for the eigenvalues, see Table 9). The two-dimensional visualization presents the same bipolar distinction we described before, but in a new way: on the left, we have a number of methods related to the positivist and post-positivist epistemologies; on the right, those related to the interpretivist and post-structuralist epistemologies. Here, a dense group contains the relativist ontology, the interpretivist and post-structuralist epistemological positions, three methodological approaches (cultural, normative, and constructivist), and a number of methods. The Marxist methodological approach stands apart in this representation and is closely related to the realist ontology. Indeed, in some textbooks, Marxism is described as the main representative of the realist understanding of reality [9, p. 204].
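The construction of the ExM block described above, a matrix product of ExA and AxM rescaled to the maximum value of the directly observed data, can be sketched as follows. The matrices here are small toy values, not the paper's actual 4 × 8 and 8 × 14 data:

```python
# Toy sketch of the ExM construction: multiply ExA by AxM, then rescale so
# the largest entry matches the scale of the directly observed relations
# (5, the maximum possible number of textbooks). Values are toy examples.
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def rescale(M, target_max):
    m = max(max(row) for row in M)
    return [[x * target_max / m for x in row] for row in M]

ExA = [[5, 2],
       [0, 4]]   # epistemologies x approaches (toy)
AxM = [[3, 0],
       [1, 2]]   # approaches x methods (toy)

ExM = rescale(matmul(ExA, AxM), target_max=5)
assert ExM[0][0] == 5.0  # the largest raw product (17) is rescaled to 5
```

Stacking ExO, ExA, and this ExM side by side yields the bipartite ExOAM matrix that was fed to the correspondence analysis.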
4 Network Analysis So far, we have used network methods merely for the description and representation of the relations between ontologies, epistemologies, methodological approaches, and methods obtained by a close reading of the textbooks. We were able to observe a certain association between the different elements—ontological, epistemological, and methodological—that define research practice in the social sciences. However, the positivist/interpretivist divergence we observed was already contained in our initial data, because it is exactly the distinction that constitutes the main theme of any methodological discussion in the social sciences. As the next step, we calculated the main network indexes (degree, betweenness centrality, and eigenvector centrality) for the methods–approaches network (M × A, Table 8). Network methods are inherently relational, and they describe a much more complex underlying structure of the relationships that is not apparent with other, nonnetwork methods. The degree of a node is equal to the number of its connections. In our example, for an approach, it is the number of methods associated with that approach; for a method, it is the number of approaches in which it is usually used. The betweenness centrality is a measure of the probability that a node is situated on the shortest path between two others. The eigenvector centrality is a measure of a node's "influence," that is, of its relation to other nodes with high degree centrality. The degree distribution for the methods–approaches network differs somewhat if we consider methods and approaches separately, as the network is directed and nonsymmetric. An average method is connected to 3–4 approaches (median: 3.5, mean: 3.6, standard deviation: 1.8), whereas an average approach is associated with 6–7 methods (median: 6.5, mean: 6.4, standard deviation: 2.5).
But we have to take into account that the maximum number of possible connections for a method is 8 (the number of approaches), whereas the maximum number of possible connections for an approach is 14 (the number of methods). Therefore, it is more informative to normalize the degree according to the type of node: a method's degree is divided by 8, whereas an approach's degree is divided by 14.
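A sketch of this normalization, using a few of the degree values reported in Table 10:

```python
# Degrees as reported in Table 10; a method's degree is normalized by the
# number of approaches (8), an approach's by the number of methods (14).
method_degree = {"Interpretive experiment": 5, "Focus group": 2,
                 "Statistics": 3, "Discourse analysis": 5}
approach_degree = {"Constructivism": 10, "Political psychology": 2,
                   "Behaviorism": 5}

norm = {m: d / 8 for m, d in method_degree.items()}
norm.update({a: d / 14 for a, d in approach_degree.items()})

assert norm["Interpretive experiment"] == 0.625        # reported as 0.63
assert round(norm["Constructivism"], 2) == 0.71        # 10/14
assert round(norm["Political psychology"], 2) == 0.14  # 2/14
```

After this rescaling, methods and approaches become directly comparable, which is what makes the shared mean of 0.455 reported below meaningful.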
Table 10 Approaches and methods
                          Degree   Normalized   Normalized      Eigenvector
                                   degree       betweenness     centrality
                                                centrality
Historical narrative         4      0.5          0.051           0.429
Participant observation      3      0.38         0.016           0.509
Experiment                   4      0.5          0.231           0.139
Interpretive experiment      5      0.63         0.411           0.285
Survey                       3      0.38         0.043           0.065
Interview                    3      0.38         0.016           0.574
Focus group                  2      0.25         0               0.174
Within-case study            5      0.63         0.315           0.603
Small-N comparison           5      0.63         0.315           0.551
Large-N comparison           4      0.5          0.419           0.22
Statistics                   3      0.38         0.043           0.171
Formal modeling              3      0.38         0.043           0.141
Bayesian                     2      0.25         0.080           0.128
Discourse analysis           5      0.63         0.332           0.383
Behaviorism                  5      0.36         0.205           0.201
Rational choice theory       6      0.43         0.536           0.167
Institutionalism             7      0.5          0.813           0.222
Marxism                      7      0.5          0.284           0.478
Political psychology         2      0.14         0.074           0.057
Constructivism              10      0.71         1               1
Cultural approach            9      0.64         0.675           0.589
Normative approach           5      0.38         0.116           0.372
The normalized degree values can be seen in Table 10. After normalization, the mean value of both degree distributions becomes the same, 0.455, which means that an average method or approach is related to approximately 45% of its counterparts. The betweenness centrality index in Table 10 is divided by the maximum value (48.6); therefore, it should be read as a relative value. Again, we analyze methods and approaches separately. Among methods, the interpretative experiment, small-N
and large-N comparisons, discourse analysis, and within-case study have relatively high betweenness indexes. This group includes almost exclusively the methods that are common to both "positivist" and "interpretivist" approaches (see Sect. 3), so they act as bridges between methodologies that some consider to be opposites. The outcome at the approaches level is unexpected and cannot be derived from general observation or other nonnetwork methods. The constructivist and cultural approaches have almost the same connections with methods, and they have the highest level of connectedness in the whole network (degrees equal to 10 and 9, respectively). However, they have very different betweenness indexes: the betweenness centrality of constructivism is significantly higher than that of the cultural approach. It appears that constructivism acts as a bridge between different methods and other approaches much more frequently. Interpretatively, this means that constructivism is linked to approaches that usually do not use each other's methods. The last index we use is the eigenvector centrality, which is a measure of the "influence" of a node. Here, we have a similar situation: the constructivist approach has the maximum value of the index, whereas the eigenvector centrality of the cultural approach is significantly smaller. Among approaches that have the same number of connections, the normative approach has more influential connections than behaviorism, and Marxism has more influential connections than institutionalism. Among methods, the interpretive experiment is a curious case because it has the highest values of degree and betweenness centrality, whereas its eigenvector centrality is relatively small. At the same time, participant observation and interview, both with average connectedness and near-zero betweenness, have quite a high level of influence.
While interpretation should be made with caution, from the network analytic perspective it is apparent that these two methods have influenced many more approaches than they are given credit for in the methodological textbooks.
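For illustration, eigenvector centrality of the kind reported in Table 10 can be computed by power iteration. The toy graph below uses illustrative nodes and edges, not the paper's full methods–approaches network; adding each node's own score at every step is a standard trick that keeps the iteration stable on bipartite graphs:

```python
# Power-iteration sketch of eigenvector centrality on a toy undirected
# approach-method graph (node names and edges are illustrative only).
def eigenvector_centrality(adj, iters=200):
    nodes = sorted(adj)
    x = {n: 1.0 for n in nodes}
    for _ in range(iters):
        # self term + neighbor sum = one step of (A + I) power iteration
        nxt = {n: x[n] + sum(x[m] for m in adj[n]) for n in nodes}
        top = max(nxt.values())
        x = {n: v / top for n, v in nxt.items()}  # normalize by the maximum
    return x

adj = {
    "Constructivism": {"Interview", "Narrative", "Discourse analysis"},
    "Cultural": {"Interview", "Narrative"},
    "Behaviorism": {"Statistics"},
    "Interview": {"Constructivism", "Cultural"},
    "Narrative": {"Constructivism", "Cultural"},
    "Discourse analysis": {"Constructivism"},
    "Statistics": {"Behaviorism"},
}

c = eigenvector_centrality(adj)
assert max(c, key=c.get) == "Constructivism"  # best-connected node scores highest
```

On this toy graph the best-connected node, constructivism, receives the maximum normalized score, mirroring the pattern reported in Table 10.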
5 Discussion and Conclusion In this paper, we identify three clusters, or paradigms, of related ontologies, epistemologies, methodological approaches, and methods in political science. The first is a very dense and consistent paradigm associated with qualitative methods; the normative, cultural, and constructivist approaches; interpretivist and post-structuralist epistemologies; and relativism as ontology. Contrary and opposite to this "qualitative interpretivism" is "quantitative behaviorism." The latter is a less consistent and fuzzier cluster of quantitative methods, which share a behavioral research focus, positivist epistemology, and naturalist ontology. The dichotomy and opposition between qualitative and quantitative methodology is well known, described, and extensively studied in the literature, but we have found specific methods, methodological approaches, and even epistemological and ontological foundations that occupy an intermediate position on the quantitative–qualitative continuum of methods. These provide some evidence that by using such methods as large-N comparison, interpretative experiment, and Bayesian methods, studying institutions, and sharing the post-positivist
Mapping Paradigms of Social Sciences: Application of Network Analysis
251
theory-driven logic of scientific discovery, the social science researcher might be able to expand the horizons of applied methods and overcome the shortcomings of quantitative as well as qualitative methodologies. It remains unclear, and should be the subject of follow-up studies, whether these methods, due to their high betweenness centrality, have more explanatory power. Also, in our analysis, the system of axes is set by ontologies. The opposition between naturalism and relativism defines the first dimension, and realism defines the second. As a result, we can argue that methodology is defined, or biased, by the ontological position of the researcher. We obtain not only a justification of this statement but also concrete descriptions and definitions of these oppositional ontologies. Of course, this conclusion can be biased by the design of this research and remains the subject of further research. We propose the following directions for a follow-up study. We limited our analysis to five textbooks of political science methodology. Adding more textbooks from other social science disciplines (such as sociology or social psychology) could change the picture described in this paper. We propose to analyze more paradigms specific to concrete social science disciplines and to test the universal character of some already discovered paradigms. In addition to books, journal articles can provide a rich source of information. Interviews with experts on the philosophy of science and the methodology of social research can complement the data already collected, as can a survey of social scientists (political scientists, sociologists, and psychologists). Additional data at a different level of analysis (either the journal or the scholar, rather than the textbook) will provide more opportunities to build networks of the methodologies and methods used and of the ontological and epistemological foundations of the scholars.
Usually, in the textbooks there is no disagreement about the quantitative methods: they are all related to positivism, while the position of the qualitative methods is blurry. We obtained contrary results: qualitative methods form one dense group, associated with each other and with normative, cultural, and constructivist approaches, interpretative and post-structuralist epistemologies, and relativism. The Marxist approach stands alone but is close to realism. In the middle, there are the methods of large-N comparison, Bayesian methods, and interpretive experiment, which, again, can indicate their role as brokers between different methods, approaches, epistemologies, and ontologies. The rest are spread between positivism and post-positivism. Bayesian methods and interpretive experiment occupy the position closest to the center, between institutionalism and post-positivist epistemology. They can be viewed as brokers at the more general, meta-method level. Another explanation of brokerage can be the concept of “integrators of the discipline,” defined as people “whose work is influential across multiple branches of the discipline” [11]. In our case, such integrators are not scholars but methods, approaches, and even epistemologies, which are appropriate for a great diversity of tasks in knowledge discovery and generation. There are no ontologies in this list, as they are the most opposed to each other: they define the basic paradigmatic dimensions and boundaries, along with the most opposed epistemologies (positivism versus post-structuralism) and approaches (behaviorism versus normativism).
252
D. Zaytsev and D. Drozdova
Each new source of information can provide more data, which can be transformed into correlation matrices, allowing a comparative analysis of the sources and the identification of the strengths and weaknesses of the proposed methods of analysis. More importantly, additional analysis with more data and units of analysis will verify the results obtained in this paper.

Acknowledgements The authors express their gratitude to Dr. Valentina Kuskova, Head of the International Laboratory for Applied Network Research, who has supervised this research from the initial idea to the final version.

Funding The article was prepared within the framework of the Basic Research Program at the National Research University Higher School of Economics (HSE) and supported within the framework of a subsidy by the Russian Academic Excellence Project ‘5-100.’
Appendix List of textbooks used for the analysis: • [8]: Box-Steffensmeier, J.M., Brady, H.E., Collier, D. (eds.) The Oxford Handbook of Political Methodology (2008) • [9]: Marsh D., Stoker G. (eds.) Theory and Methods in Political Science (2010) • [6]: Della Porta D., Keating M. (eds.) Approaches and Methodologies in Social Sciences (2008) • [3]: Moses, J.W., Knutsen, T.L. Ways of Knowing: Competing Methodologies in Social and Political Research (2012) • [7]: Abbott, A. Methods of Discovery: Heuristics for the Social Sciences (2004)
References 1. Huntington, S.P.: The Third Wave: Democratization in the Late Twentieth Century. University of Oklahoma Press, Norman and London (1991) 2. Acemoglu, D., Naidu, S., Restrepo, P., Robinson, J.A.: Democracy Does Cause Growth (2015). https://economics.mit.edu/files/10759. Accessed 18 Aug 2017 3. Moses, J.W., Knutsen, T.L.: Ways of Knowing: Competing Methodologies in Social and Political Research. Palgrave Macmillan (2012) 4. Denzin, N.K.: The Research Act in Sociology. Butterworth, London (1970) 5. Silverman, D.: Qualitative Methodology and Sociology: Describing the Social World. Gowel, Aldershot (1985) 6. Della Porta, D., Keating, M.: Approaches and Methodologies in Social Sciences. Cambridge (2008) 7. Abbott, A.: Methods of Discovery: Heuristics for the Social Sciences. University of Chicago, Chicago (2004) 8. Box-Steffensmeier, J.M., Brady, H.E., Collier, D. (eds.): The Oxford Handbook of Political Methodology. The Oxford University Press, Oxford (2008) 9. Marsh, D., Stoker, G.: Theory and Methods in Political Science. Palgrave Macmillan (2010)
10. Zhu, M., Kuskova, V., Wasserman, S., Contractor, N.: Correspondence analysis of multirelational multilevel network affiliations. In: Lazega, E., Snijders, T.A.B. (Eds.) Multilevel Network Analysis for the Social Sciences. Methodos Series, vol. 12, pp. 145–172 (2016) 11. Goodin, R.E.: The state of the discipline, the discipline of the state. In: Goodin, R.E. (ed.) The Oxford Handbook of Political Science. The Oxford University Press, Oxford (2011)
Part III
Network Applications
Using Geometry of the Set of Symmetric Positive Semidefinite Matrices to Classify Structural Brain Networks Mikhail Belyaev, Yulia Dodonova, Daria Belyaeva, Egor Krivov, Boris Gutman, Joshua Faskowitz, Neda Jahanshad and Paul Thompson
Abstract This paper presents a method for classifying symmetric positive semidefinite (SPSD) matrices and its application to the analysis of structural brain networks (connectomes). Structural connectomes are modeled as weighted graphs in which edge weights are proportional to the number of streamline connections between brain regions detected by a tractography algorithm. The construction of structural brain networks does not typically guarantee that their adjacency matrices lie in some topological space with known properties. This makes them different from functional connectomes—correlation matrices representing co-activation of brain regions, which are usually symmetric positive definite (SPD). Here, we propose to transform structural connectomes by taking their normalized Laplacians prior to any analysis, to put them into the space of symmetric positive semidefinite (SPSD) matrices, and to apply methods developed for manifold-valued data. The geometry of the SPD matrix manifold is well known and used in many classification algorithms. Here, we expand existing SPD matrix-based algorithms to the SPSD geometry and develop classification pipelines on SPSD normalized Laplacians of structural connectomes. We demonstrate the performance of the proposed pipeline on structural brain networks reconstructed from the Alzheimer's Disease Neuroimaging Initiative (ADNI) data.

Keywords Brain networks · Riemannian geometry
M. Belyaev (B) · D. Belyaeva · E. Krivov Skolkovo Institute of Science and Technology, Moscow, Russia e-mail:
[email protected] M. Belyaev · Y. Dodonova · D. Belyaeva · E. Krivov · B. Gutman Kharkevich Institute for Information Transmission Problems, Moscow, Russia E. Krivov Moscow Institute of Physics and Technology, Moscow, Russia B. Gutman · J. Faskowitz · N. Jahanshad · P. Thompson Imaging Genetics Center, Stevens Neuroimaging and Informatics Institute, University of Southern California, Marina del Rey, CA, USA © Springer International Publishing AG, part of Springer Nature 2018 V. A. Kalyagin et al. (eds.), Computational Aspects and Applications in Large-Scale Networks, Springer Proceedings in Mathematics & Statistics 247, https://doi.org/10.1007/978-3-319-96247-4_18
257
258
M. Belyaev et al.
1 Introduction

Riemannian geometry offers a powerful approach to processing structured data that can be represented by symmetric positive definite (SPD) matrices; these belong to a differentiable manifold with a corresponding nonlinear structure. Riemannian geometry tools allow us to find geodesics (the shortest differentiable paths between two SPD matrices) and to calculate their length explicitly, so that any distance-based algorithm can be used. Alternatively, SPD matrices may be projected to a tangent space, which approximates the local structure of the manifold. As this space is Euclidean, the projections can be analyzed as usual vectors. Over the last decade, Riemannian-based analytic tools have been used in various medical imaging applications, including DTI-based tensor computing [15] and classifying brain functional data from electroencephalography (EEG) or functional magnetic resonance imaging (fMRI). EEG covariance matrices are commonly used in brain–computer interface algorithms; these matrices are symmetric positive definite, so methods based on Riemannian geometry can be applied. In EEG-based classification, such methods resulted in state-of-the-art algorithms that outperformed existing benchmarks [1, 2]. Covariance matrices based on fMRI data have attracted particular attention, given the rapid development of connectomics approaches [19]. These matrices can represent macroscale functional brain connectivity (functional connectomes). Recently, several Riemannian geometry-based algorithms have been proposed to classify functional connectomes. Slavakis et al. [18] introduced a novel approach to track time-varying brain connectivity patterns. They used geodesics on a Riemann manifold and tangent spaces of this manifold to cluster their observations, with excellent performance on synthetic data. Ng et al.
[14] projected fMRI covariance matrices onto a tangent space of the Riemann manifold of SPD matrices to reduce the interrelation of covariance matrix elements; this significantly improved classification accuracy in four task-related fMRI experiments. Functional activation signals across multiple brain regions may be represented by covariance matrices, allowing us to apply Riemannian geometry to their classification. However, this does not hold for structural connectivity matrices, which capture information on anatomical pathways rather than functional interrelations between brain regions. Elements of these matrices represent the number of streamlines between the respective cortical regions estimated using a diffusion-based tractography algorithm. As such, structural connectomes are undirected graphs, and their adjacency matrices are symmetric; importantly, as opposed to functional connectomes, their connectivity matrices are indefinite. This difference is crucial for approaches based on Riemannian geometry, as by definition these techniques can be used only for symmetric positive definite matrices. Dodero et al. [6] proposed a method to overcome this limitation. They studied a Riemannian geometry-based algorithm for both functional and structural connectomes. For structural connectomes, they calculated the Laplacians of the original connectivity matrices to obtain symmetric positive semidefinite (SPSD) matrices,
and added the identity matrix with a small multiplier to produce SPD matrices, and applied the same classification pipeline as for functional connectomes. The experiments on the Autism Spectrum Disorder connectome database demonstrated classification accuracies of 60.76% and 68% for functional and structural connectomes, respectively. The Laplacian is a useful transformation for a structural connectivity matrix, as it guarantees positive semi-definiteness. However, Laplacians are not invariant to scaling of the corresponding adjacency matrix, and scaling of structural connectivity matrices is known to affect classification accuracy [16]. Another limitation of Dodero et al.'s method is the regularization of the Laplacians. It serves the purpose of converting SPSD Laplacians into SPD matrices, but manipulating the Laplacian's diagonal elements makes the Laplace operator hard to interpret. Here, we introduce an approach that overcomes these two limitations, scaling and regularization, providing a framework to classify disease in human structural connectivity matrices. First, we use normalized Laplacians to facilitate intersubject comparability of connectomes and avoid the scaling problem. Second, we introduce a Riemannian metric for SPSD matrices, a step that allows us to work with network Laplacians directly, without any preliminary regularization. This paper is organized as follows: Sect. 2 introduces notation and elements of Riemannian geometry, Sect. 3 reviews Riemannian geometry-based methods for classifying SPD matrices, Sect. 4 introduces the two novel classification algorithms that use the SPSD normalized Laplacians of the structural matrices, and Sect. 5 demonstrates disease classification results on the structural connectomes derived from the Alzheimer's Disease Neuroimaging Initiative (ADNI).
2 Notation and Elements of Riemannian Geometry

Before proceeding to classification approaches, we introduce some notation. We consider a binary classification of SPD matrices, so as input data we have a set of pairs consisting of an input (an SPD matrix) and an output (a class label): $\{X_i, y_i\}$, $X_i \in P(n)$, $y_i \in \{0, 1\}$, $i = 1, \ldots, N$, where $n$ is the size of the matrices, $P(n) \subset \mathbb{R}^{n \times n}$ is the space of SPD matrices, and $N$ is the number of observations. The set of all SPD matrices—the positive symmetric cone—forms a differentiable manifold $\mathcal{M}$ of $n^* = n(n+1)/2$ dimensions; see [7] for details. Two key requirements for the construction of classification algorithms are the Riemannian distance and a projection of SPD matrices into a Euclidean space. The Riemannian distance between two SPD matrices is the length of a geodesic curve, which is the shortest differentiable path connecting these matrices represented as points on $\mathcal{M}$. This can be calculated in an explicit way from the following equation:

\[
\delta_{spd}(X_i, X_j) = \left\| \operatorname{logm}\!\left( X_i^{-1/2}\, X_j\, X_i^{-1/2} \right) \right\|_F , \qquad (1)
\]
where $\operatorname{logm}(\cdot)$ is the matrix logarithm and $\|\cdot\|_F$ is the Frobenius norm. $\delta_{spd}$ takes into account the nonlinear structure of $\mathcal{M}$ and provides a better metric than the usual Euclidean distance between the vectorized upper triangles of the matrices $X_i, X_j$, both in terms of theoretical properties and the quality of covariance matrix classification [1]. The second useful property of Riemannian geometry is the ability to project SPD matrices onto a tangent space which, by definition, approximates the local structure of $\mathcal{M}$. Let $\tau_X$ be the tangent space of $\mathcal{M}$ at point $X$. Then we can project a covariance matrix $X_i$ onto the tangent plane $\tau_X$ by

\[
\tilde{X}_i = X^{1/2} \operatorname{logm}\!\left( X^{-1/2} X_i X^{-1/2} \right) X^{1/2}. \qquad (2)
\]
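As a numerical illustration of (1) and (2), the following sketch (not the authors' code; it assumes NumPy and SciPy are available) computes the Riemannian distance and the tangent-space projection for toy matrices:

```python
import numpy as np
from scipy.linalg import logm, fractional_matrix_power

def delta_spd(Xi, Xj):
    """Affine-invariant Riemannian distance (1) between two SPD matrices."""
    P = fractional_matrix_power(Xi, -0.5)           # Xi^{-1/2}
    return np.linalg.norm(logm(P @ Xj @ P), 'fro')

def tangent_projection(X, Xi):
    """Projection (2) of the SPD matrix Xi onto the tangent space at X."""
    S = fractional_matrix_power(X, 0.5)             # X^{1/2}
    Sinv = fractional_matrix_power(X, -0.5)         # X^{-1/2}
    return S @ logm(Sinv @ Xi @ Sinv) @ S

I2 = np.eye(2)
print(delta_spd(I2, np.diag([np.e**2, np.e**2])))   # ||diag(2, 2)||_F = 2*sqrt(2)
print(tangent_projection(I2, I2))                   # zero matrix: the base point maps to the origin
```

For diagonal matrices the distance reduces to the Euclidean norm of the log-eigenvalue ratios, which makes the first printed value easy to verify by hand.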
3 Classification of SPD Matrices

Most SPD matrix-based classification algorithms may be split into two groups:

1. Kernel methods based on a precomputed matrix of pairwise distances between SPD matrices [2, 6, 10].
2. Two-step algorithms, which consist of (1) a projection of all SPD matrices to an appropriately selected tangent space and (2) a classification of the projected matrices, which are now vectors, by a linear classification algorithm such as linear discriminant analysis or logistic regression [1, 11, 14].
3.1 Kernel Methods

Kernel-based methods are quite popular machine learning methods for connectomics, as they can process connectomes as graphs, whereas other methods usually require connectomes to be converted into vectors (e.g., vectors of global graph metrics) [4]. The core idea of kernel-based methods is to introduce a distance or a distance-like measure between connectomes. For example, a distance based on similarity of structural connectomes' partitions into communities was introduced in [13]. Riemannian-based approaches naturally use a metric on $\mathcal{M}$. Once the distance $\delta(X_i, X_j)$ is selected, the classification algorithm is straightforward. We can build a kernel from a distance by

\[
K_{RBF}(X_i, X_j) = \exp\!\left( -\gamma\, \delta(X_i, X_j)^2 \right), \qquad (3)
\]
where $\gamma > 0$ is a parameter. As $\delta$ is a metric, $K_{RBF}$ is a positive definite kernel [10]. Now, we can use a Support Vector Machine [17] to build a classifier based on $K_{RBF}$.
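Building such a kernel matrix is a one-liner once the pairwise distances are available; the sketch below (toy random SPD matrices, not the paper's data) constructs (3) and could then be passed to any SVM implementation that accepts a precomputed kernel:

```python
import numpy as np
from scipy.linalg import logm, fractional_matrix_power

def delta_spd(Xi, Xj):
    # Riemannian distance (1) between SPD matrices
    P = fractional_matrix_power(Xi, -0.5)
    return np.linalg.norm(logm(P @ Xj @ P), 'fro')

# Toy set of SPD matrices (assumed data): random covariances pushed off singularity
rng = np.random.default_rng(0)
mats = [M @ M.T + np.eye(3) for M in rng.standard_normal((4, 3, 3))]

gamma = 0.5
D = np.array([[delta_spd(X, Z) for Z in mats] for X in mats])
K = np.exp(-gamma * D**2)        # RBF kernel (3); usable as a precomputed SVM kernel

print(np.round(np.diag(K), 6))   # ones on the diagonal: zero self-distance
```

All kernel values lie in (0, 1], with 1 exactly on the diagonal, which is a quick sanity check before fitting a classifier.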
Using Geometry of the Set of Symmetric Positive Semidefinite Matrices …
261
3.2 Classification in a Tangent Space

We can now project our SPD matrices to a tangent space $\tau_X$ using (2). As $\tau_X$ is a Euclidean space, any ordinary classification algorithm can be applied to classify the projections. The most essential question is the choice of the reference point $X$. It should be close to the training points $\{X_i, y_i\}_{i=1}^{N}$ to keep the local approximation by $\tau_X$ accurate enough. Usually, the geometric mean of the training points is used as the reference point. An efficient algorithm to compute the mean of a set of SPD matrices was described in [2].
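The geometric mean used as the reference point can be computed by a simple fixed-point iteration; the sketch below is only an illustration of the idea (the algorithm described in [2] is a refined version), on toy commuting matrices where the answer is the elementwise geometric mean:

```python
import numpy as np
from scipy.linalg import logm, expm, fractional_matrix_power

def geometric_mean(mats, n_iter=20):
    """Karcher-mean fixed-point iteration: repeatedly move the estimate
    along the average tangent vector of all points (a sketch, not [2])."""
    G = sum(mats) / len(mats)                      # start from the arithmetic mean
    for _ in range(n_iter):
        Gh = fractional_matrix_power(G, 0.5)
        Gih = fractional_matrix_power(G, -0.5)
        # average of the log-maps of all points at the current estimate
        T = sum(logm(Gih @ X @ Gih) for X in mats) / len(mats)
        G = Gh @ expm(T) @ Gh
    return G

A, B = np.diag([1.0, 4.0]), np.diag([4.0, 1.0])
print(geometric_mean([A, B]))   # diag(2, 2): elementwise geometric mean for commuting matrices
```

For scalars this iteration reaches sqrt(ab) in a single step; for general SPD matrices it converges to the unique Riemannian barycenter.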
4 The Proposed Approaches

In this section, we consider a binary classification of structural connectome matrices, so as input data we have a set of pairs of an input (an adjacency matrix) and an output (a class label): $\{A_i, y_i\}$, $A_i \in S(n)$, $y_i \in \{0, 1\}$, $i = 1, \ldots, N$, where $n$ is the number of nodes in each connectome, $S(n) \subset \mathbb{R}^{n \times n}$ is the space of all symmetric matrices, and $N$ is the number of observations.
4.1 Converting Structural Connectomes to SPSD Matrices

The first step is to convert a structural connectome to an SPSD matrix; we calculate its normalized Laplacian by

\[
X_i = D_i^{-1/2} \left( D_i - A_i \right) D_i^{-1/2}, \qquad (4)
\]

where $D_i$ is the diagonal matrix of weighted node degrees, $D_i|_{k,k} = \sum_l A_i|_{k,l}$.
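Transformation (4) is straightforward to compute; the minimal sketch below (toy adjacency matrix, assumed data) confirms that the result is SPSD but singular, hence not SPD:

```python
import numpy as np

def normalized_laplacian(A):
    """Normalized Laplacian (4): X = D^{-1/2} (D - A) D^{-1/2}."""
    d = A.sum(axis=1)                      # weighted node degrees
    Dis = np.diag(1.0 / np.sqrt(d))        # D^{-1/2}
    return Dis @ (np.diag(d) - A) @ Dis

# Toy structural connectome (assumed data): symmetric non-negative weights
A = np.array([[0., 5., 1.],
              [5., 0., 2.],
              [1., 2., 0.]])
X = normalized_laplacian(A)
ev = np.linalg.eigvalsh(X)
print(ev.min())   # ~0: the matrix is positive semidefinite but singular
```

The smallest eigenvalue is exactly zero (with eigenvector $D^{1/2}\mathbf{1}$), and the spectrum of a normalized Laplacian always lies in [0, 2], which is why the SPD machinery of Sect. 3 cannot be applied to it directly.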
4.2 Distance for SPSD Matrices

Structural connectomes are indefinite matrices, but their normalized Laplacians are symmetric positive semidefinite. To use the algorithms of Sect. 3, we could replace our SPSD matrices by their regularized SPD versions $X_i + \alpha I_n$ ($\alpha$ is a small positive number, $I_n$ is the identity matrix), as in [6]. As an alternative, more accurate approach, we propose to use the similarity measure for SPSD matrices of a fixed rank proposed in [3]. This is a generalization of the Riemannian distance $\delta_{spd}$ to the SPSD case, and can be computed as follows:
1. Let $X$, $Z$ be two SPSD matrices of rank $m$.
2. Find the nonzero eigenvalues $S_x$, $S_z$ and the corresponding eigenvectors $V_x$, $V_z$ of the matrices $X$ and $Z$, respectively. The matrices $V_*$ have shape $n \times m$; the diagonal matrices $S_*$ have shape $m \times m$; $X = V_x S_x V_x^T$, $Z = V_z S_z V_z^T$. $V_x$ and $V_z$ are matrices that span $\operatorname{range}(X)$ and $\operatorname{range}(Z)$.
3. Apply the Singular Value Decomposition to calculate orthogonal matrices $O_x$, $O_z$ and a diagonal matrix $\Sigma$ such that $O_x \Sigma O_z^T = V_x^T V_z$.
4. Calculate $\theta_k = \arccos(\sigma_k)$ (where $\sigma_k$ are the diagonal elements of $\Sigma$); these are the principal angles between the two subspaces defined by $V_x$ and $V_z$.
5. Select the principal vectors as $U_x = V_x O_x$, $U_z = V_z O_z$.
6. Compute the Riemannian distance between the projections of the matrices $X$, $Z$ onto the common subspace:

\[
\delta_{cs}(X, Z) = \delta_{spd}\!\left( (U_x^T X U_x)^{1/2},\, (U_z^T Z U_z)^{1/2} \right). \qquad (5)
\]

7. Compute the distance-like measure between the matrices $X$ and $Z$:

\[
\delta_{spsd}^{k} = \left( \|\Theta\|^2 + k\, \delta_{cs}^2 \right)^{1/2}, \qquad (6)
\]

where $k > 0$ is a parameter and $\Theta$ is the vector composed of the principal angles $\theta_k$. $k$ is a parameter of the described family of similarity measures. Theoretical results do not offer recommendations for selecting $k$ [3], so we investigated this question empirically (Sect. 5.3). The measure $\delta_{spsd}$ is not a metric, as it does not satisfy the triangle inequality, but it has several useful properties and a vivid geometric interpretation. Considering our SPSD matrices $X_i$, $X_j$ as flat ellipsoids in $\mathbb{R}^n$, we can decompose $\delta_{spsd}(X_i, X_j)$ into two parts: $\|\Theta\|$, a distance between the subspaces in which the ellipsoids are contained, and $\delta_{cs}$, the Riemannian distance between the ellipsoids within a common subspace of SPD matrices $P(m)$. For more detailed explanations, see [3, Sect. 5].
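Steps 1–7 can be condensed into a short routine; the sketch below (not the authors' implementation; NumPy/SciPy assumed) runs the whole procedure on a toy rank-2 matrix:

```python
import numpy as np
from scipy.linalg import logm, fractional_matrix_power

def delta_spd(X, Z):
    # Riemannian distance (1) between SPD matrices
    P = fractional_matrix_power(X, -0.5)
    return np.linalg.norm(logm(P @ Z @ P), 'fro')

def delta_spsd(X, Z, m, k=1.0):
    """Similarity measure (6) between two rank-m SPSD matrices, steps 1-7."""
    # Step 2: the top-m eigenvectors span range(X) and range(Z)
    _, Vx = np.linalg.eigh(X); Vx = Vx[:, -m:]
    _, Vz = np.linalg.eigh(Z); Vz = Vz[:, -m:]
    # Steps 3-4: SVD of Vx^T Vz yields the principal angles between the subspaces
    Ox, sigma, OzT = np.linalg.svd(Vx.T @ Vz)
    theta = np.arccos(np.clip(sigma, -1.0, 1.0))
    # Step 5: principal vectors
    Ux, Uz = Vx @ Ox, Vz @ OzT.T
    # Step 6: Riemannian distance within the common subspace, Eq. (5)
    dcs = delta_spd(fractional_matrix_power(Ux.T @ X @ Ux, 0.5),
                    fractional_matrix_power(Uz.T @ Z @ Uz, 0.5))
    # Step 7: combine the subspace-angle and within-subspace terms
    return np.sqrt(np.linalg.norm(theta)**2 + k * dcs**2)

B = np.random.default_rng(1).standard_normal((3, 2))
X = B @ B.T                    # a rank-2 SPSD matrix in R^{3x3}
print(delta_spsd(X, X, m=2))   # ~0: zero dissimilarity of a matrix to itself
```

Note that the restriction $U_x^T X U_x$ is an SPD $m \times m$ matrix (the full-rank part of $X$), so $\delta_{spd}$ is well defined on it even though $X$ itself is singular.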
4.3 Algorithm for Kernel SVM Based on δ_spsd

A generalization of kernel-based approaches is straightforward:

1. Calculate the matrix of pairwise “distances” δ_spsd using (6).
2. Use (3) to calculate the RBF kernel.
3. Build an SVM model using the defined kernel.
4. The kernel width γ and the δ_spsd weight k are two hyperparameters of the algorithm, which should be selected using a nested cross-validation procedure.
4.4 Algorithm for Dimensionality Reduction Based Classification

An adaptation of tangent space-based methods to the SPSD case is more complex, as we cannot build a tangent space at the matrices' geometric mean. To avoid this limitation, we apply Isomap, a nonlinear dimensionality reduction algorithm [21]. The algorithm is based on the idea of geodesic distances on a manifold. To find such distances, it internally builds a neighborhood graph and then computes the shortest path between two nodes, which can be considered a good approximation of the geodesic distance. Then it applies classical multidimensional scaling [12] to find a low-dimensional embedding which preserves the matrix of pairwise geodesic distances. The original Isomap algorithm works in Euclidean space and uses the L2 distance to build the neighborhood graph, but other distances can be used as well. In particular, the Riemannian distance between EEG covariance SPD matrices was used in conjunction with Isomap in [11]. We propose δ_spsd as a distance-like measure which can replace L2 in the case of the space of SPSD matrices. The algorithm is as follows:

1. Calculate the matrix of pairwise “distances” δ_spsd using (6).
2. Find a d-dimensional embedding using the Isomap algorithm based on the precomputed matrix of pairwise “distances”. This step converts our initial set of SPSD matrices into a set of Euclidean vectors.
3. Apply a standard classification method. In our experiments, we used logistic regression with an L2 penalty and regularization coefficient λ.
4. The dimensionality d and the regularization coefficient λ are two hyperparameters of the algorithm, which should be selected using a nested cross-validation procedure.
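The Isomap step (neighborhood graph, graph geodesics, classical MDS) can be sketched in a self-contained way; the toy example below uses points on a line with the absolute-difference matrix standing in for the precomputed δ_spsd matrix (a sketch, not the implementation of [21]):

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

def isomap_embed(D, n_neighbors, d):
    """Minimal Isomap on a precomputed dissimilarity matrix D:
    k-NN graph -> graph geodesics -> classical MDS."""
    n = D.shape[0]
    G = np.full((n, n), np.inf)                     # inf marks a missing edge
    for i in range(n):
        nbrs = np.argsort(D[i])[1:n_neighbors + 1]  # skip self (distance 0)
        G[i, nbrs] = D[i, nbrs]
    geo = shortest_path(G, method='D', directed=False)  # geodesic distances
    # Classical MDS on the geodesic distance matrix
    J = np.eye(n) - np.ones((n, n)) / n
    Bmat = -0.5 * J @ (geo ** 2) @ J                # double-centered Gram matrix
    w, V = np.linalg.eigh(Bmat)
    idx = np.argsort(w)[::-1][:d]                   # top-d eigenpairs
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))

# Points on a line; |i - j| plays the role of the precomputed delta_spsd matrix
pts = np.arange(8.0)
D = np.abs(pts[:, None] - pts[None, :])
Y = isomap_embed(D, n_neighbors=2, d=2)
print(Y.shape)   # (8, 2)
```

For this toy chain the graph geodesics equal the original distances, so the first embedding coordinate reproduces the pairwise distances exactly; after this step any linear classifier (logistic regression in the paper) operates on plain Euclidean vectors.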
5 Experiments

To evaluate the proposed methods, we conducted computational experiments using ADNI, one of the largest public neuroimaging databases. We estimated the quality of phenotype classification, compared the proposed approaches with two baseline methods, and investigated the effect of the multiplier k, which determines the impact of each of the two terms of the similarity measure (6).
5.1 Data

We used the ADNI2 database, consisting of 228 individuals (748 scans; mean age at baseline visit 72.9 ± 7.4 y, 96 women). The ADNI2 cohort subdivides people into participants with Alzheimer's disease (AD), mild cognitive impairment, and normal controls (NC), including 47 people (135 scans) with AD, 40 (145 scans) NC, and 80
(278) individuals with late-stage and early-stage MCI (LMCI/EMCI), and 61 (190) NC. T1-weighted (T1w) images were processed with FreeSurfer [8], where we used cortical parcellation based on the Desikan-Killiany atlas [5] of 68 cortical brain regions. In parallel, the average b0 of the DWI images was registered to the downsampled (2 mm isotropic MNI) T1w image, to account for susceptibility artifacts. DWI images were corrected for eddy current and motion-related distortions; b-vectors were rotated accordingly. Probabilistic streamline tractography was performed using the Dipy [9] LocalTracking module with constrained spherical deconvolution (CSD) [20]. Streamlines longer than 5 mm with both ends intersecting the cortical surface were retained. Edge weights in the original cortical connectivity matrices were thus proportional to the number of streamlines detected by the algorithm.
5.2 Setup of Experiments

We compared four methods:

1. As the simplest baseline method, we used kernel SVM with the L2 distance between the matrices A_i and the RBF kernel (3).
2. The kernel-based approach proposed by Dodero et al. [6]: we calculated the connectomes' Laplacians, then added the identity matrix with coefficient α = 10^{-3} (the algorithm is robust to the selection of this coefficient, as suggested by Dodero et al.), and calculated the log-Euclidean distances between the obtained SPD matrices. For details, see [6].
3. RBF kernel SVM based on the δ^k_spsd similarity measure for a set of k values, see Sect. 4.3.
4. Dimensionality reduction for SPSD matrices combined with the L2-penalized logistic regression classifier in the reduced space, see Sect. 4.4.

For the kernel-based methods, we used a logarithmic grid of γ parameters in (3) and varied the SVM regularization parameter. For the Isomap algorithm, we varied the reduced dimensionality d from 2 to 10. For logistic regression, we used a logarithmic grid of regularization parameters λ. Parameters for all algorithms were selected using group 10-fold cross-validation (we use the group K-fold approach to ensure that all scans from a single subject belong to a single fold). For the best set of parameters, we repeated 10-fold cross-validation with 10 new partitionings of the data to evaluate the performance.
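The group constraint described above (all scans of one subject must land in the same fold) can be sketched in a few lines; the subject IDs below are toy data, and in practice an off-the-shelf implementation such as scikit-learn's GroupKFold serves the same purpose:

```python
import numpy as np

def group_kfold_indices(groups, n_splits, seed=0):
    """Assign whole groups (here: subjects) to folds, so that no subject's
    scans are split across folds. A sketch of the group K-fold idea."""
    uniq = np.unique(groups)
    rng = np.random.default_rng(seed)
    rng.shuffle(uniq)
    fold_of = {g: i % n_splits for i, g in enumerate(uniq)}
    return np.array([fold_of[g] for g in groups])

# 12 scans from 5 subjects (toy data); the real data has 748 scans from 228 subjects
subjects = np.array([0, 0, 0, 1, 1, 2, 2, 2, 3, 3, 4, 4])
folds = group_kfold_indices(subjects, n_splits=3)
# every subject's scans land in exactly one fold
print(all(len(set(folds[subjects == s])) == 1 for s in np.unique(subjects)))
```

Without this constraint, scans of the same subject could appear in both the training and test folds, leaking subject identity and inflating the estimated accuracy.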
5.3 Results

Table 1 contains all results of our computational experiments. The obtained quality of classification suggests that for the AD versus NC classification problem both the
Table 1 Quality of the compared methods on the 4 binary classification problems. Quality was measured as the area under the ROC curve, estimated by ten repeats of group 10-fold cross-validation

Task             | Isomap: SPSD  | Kernel SVM: SPSD | Kernel SVM: reg. SPD | Kernel SVM: L2
AD versus NC     | 0.816 ± 0.006 | 0.816 ± 0.007    | 0.800 ± 0.005        | 0.772 ± 0.010
AD versus LMCI   | 0.678 ± 0.006 | 0.686 ± 0.014    | 0.688 ± 0.015        | 0.655 ± 0.016
LMCI versus EMCI | 0.341 ± 0.059 | 0.441 ± 0.022    | 0.478 ± 0.024        | 0.447 ± 0.029
EMCI versus NC   | 0.538 ± 0.012 | 0.571 ± 0.015    | 0.539 ± 0.016        | 0.504 ± 0.018
Fig. 1 Structural connectomes for all AD and NC individuals projected to the plane. Blue and red colors depict NC and AD cohorts, respectively. The dots depict MRI scans, the transparent areas depict kernel density estimation for each group
proposed approaches outperform the baseline methods, whereas the method from Dodero et al. [6] shows better performance than the simple approach which ignores the matrix structure of connectomes completely. As an additional evaluation of the proposed SPSD-based dimensionality reduction approach, we use it to project our data into a two-dimensional space. Figure 1 shows the projections of the input matrices for all AD and NC individuals; even the 2-D representation is quite informative. As we mentioned in Sect. 4.2, we need to study the effect of the parameter k empirically. Figure 2 shows AD versus NC classification quality as a function of k. Notably, the curves are smooth in general, so we can assume that
Fig. 2 Quality of classification as a function of the similarity measure parameter k. The four lines depict the mean value of the area under the ROC curve estimated by group 10-fold cross-validation. Transparent areas depict the standard deviation
the proposed methods are robust to small variations of the parameter. Interestingly, the optimal values of the parameter k are different for the SVM-based method and the dimensionality reduction-based one. Further investigation of this effect is needed.
6 Conclusions

We demonstrated how the powerful framework of Riemannian geometry can be generalized for the analysis and classification of positive semidefinite matrices. We ensured positive semi-definiteness of the connectivity matrices of human structural brain networks by taking their normalized Laplacians. We showed that the proposed pipeline outperforms the baselines on the Alzheimer's Disease Neuroimaging Initiative dataset, which was selected as one of the largest publicly available sets of neuroimaging data.

Acknowledgements Some data used in preparing this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. A complete listing of ADNI investigators and imaging protocols may be found at http://www.adni.loni.usc.edu. The results of Sects. 2–6 are based on the scientific research conducted at IITP RAS and supported by the Russian Science Foundation under grant 17-11-01390.
References 1. Barachant, A., Bonnet, S., Congedo, M., Jutten, C.: Multiclass brain-computer interface classification by Riemannian geometry. IEEE Trans. Biomed. Eng. 59(4), 920–928 (2012) 2. Barachant, A., Bonnet, S., Congedo, M., Jutten, C.: Classification of covariance matrices using a Riemannian-based kernel for BCI applications. Neurocomputing 112, 172–178 (2013) 3. Bonnabel, S., Sepulchre, R.: Riemannian metric and geometric mean for positive semidefinite matrices of fixed rank. SIAM J. Matrix Anal. Appl. 31(3), 1055–1070 (2009) 4. Brown, C.J., Hamarneh, G.: Machine learning on human connectome data from MRI. arXiv:1611.08699 (2016) 5. Desikan, R.S., Ségonne, F., Fischl, B., Quinn, B.T., Dickerson, B.C., Blacker, D., Buckner, R.L., et al.: An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. Neuroimage 31(3), 968–980 (2006) 6. Dodero, L., Minh, H.Q., San Biagio, M., Murino, V., Sona, D.: Kernel-based classification for brain connectivity graphs on the Riemannian manifold of positive definite matrices. In: 2015 IEEE 12th International Symposium on Biomedical Imaging (ISBI), pp. 42–45. IEEE (2015) 7. Faraut, J., Korányi, A.: Analysis on symmetric cones (1994) 8. Fischl, B.: Freesurfer. Neuroimage 62(2), 774–781 (2012) 9. Garyfallidis, E., Brett, M., Amirbekian, B., Rokem, A., Van Der Walt, S., Descoteaux, M., Nimmo-Smith, I.: Dipy, a library for the analysis of diffusion MRI data. Front. Neuroinform. 8, 8 (2014) 10. Jayasumana, S., Hartley, R., Salzmann, M., Li, H., Harandi, M.: Kernel methods on the riemannian manifold of symmetric positive definite matrices. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 73–80 (2013) 11. Krivov, E., Belyaev, M.: Dimensionality reduction with isomap algorithm for EEG covariance matrices. In: 2016 4th International Winter Conference on Brain-Computer Interfaces (BCI), pp. 1–4. IEEE (2016) 12. 
Kruskal, J.B., Wish, M.: Multidimensional Scaling, vol. 11. Sage (1978) 13. Kurmukov, A., Dodonova, Y., Zhukov, L.: Classification of normal and pathological brain networks based on similarity in graph partitions. In: 2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW), pp. 107–112. IEEE (2016) 14. Ng, B., Varoquaux, G., Poline, J.B., Greicius, M., Thirion, B.: Transport on Riemannian manifold for connectivity-based brain decoding. IEEE Trans. Med. Imaging 35(1), 208–216 (2016) 15. Pennec, X., Fillard, P., Ayache, N.: A Riemannian framework for tensor computing. Int. J. Comput. Vis. 66(1), 41–66 (2006) 16. Petrov, D., Dodonova, Y., Zhukov, L., Belyaev, M.: Boosting connectome classification via combination of geometric and topological normalizations. In: 2016 International Workshop on Pattern Recognition in Neuroimaging (PRNI), pp. 1–4. IEEE (2016) 17. Scholkopf, B., Smola, A.J.: Learning with kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press (2001) 18. Slavakis, K., Salsabilian, S., Wack, D.S., Muldoon, S.F.: Clustering time-varying connectivity networks by Riemannian geometry: the brain-network case. In: 2016 IEEE Statistical Signal Processing Workshop (SSP), pp. 1–5. IEEE (2016) 19. Smith, S.M., Vidaurre, D., Beckmann, C.F., Glasser, M.F., Jenkinson, M., et al.: Functional connectomics from resting-state fMRI. Trends Cognitive Sci. 17(12), 666–682 (2013) 20. Tax, C.M., Jeurissen, B., Vos, S.B., Viergever, M.A., Leemans, A.: Recursive calibration of the fiber response function for spherical deconvolution of diffusion MRI data. Neuroimage 86, 67–80 (2014) 21. Tenenbaum, J.B., De Silva, V., Langford, J.C.: A global geometric framework for nonlinear dimensionality reduction. Science 290(5500), 2319–2323 (2000)
Comparison of Statistical Procedures for Gaussian Graphical Model Selection Ivan S. Grechikhin and Valery A. Kalyagin
Abstract Graphical models are used in a variety of problems to uncover hidden structures. Many different identification procedures exist to recover a graphical model from observations. In this paper, undirected Gaussian graphical models are considered. Several statistical procedures for Gaussian graphical model identification are compared using different measures, such as the numbers of Type I and Type II errors and ROC AUC. Keywords Gaussian graphical model · Identification procedure · Statistical inference · Risk function · ROC AUC
1 Introduction Graphical models are applied in various areas of science such as bioinformatics, economics, cryptography, and many more [1, 2]. The main reason for the extensive use of graphical models is visualization, which helps to uncover hidden patterns and connections between entities. A Gaussian graphical model is a graph where every vertex is a random variable and the random vector is distributed according to the multivariate normal distribution [1]. Edges in the graph show some kind of pair-wise dependency between variables. Three types of graphical models are widely used. The first of them is the bi-directed graphical model, in which two random variables are marginally independent if there is no edge between the corresponding vertices. Such graphs are usually constructed from correlation matrices, where zero correlation means marginal independence, i.e., absence of the edge in the graph. The second I. Grechikhin (B) · V. A. Kalyagin Laboratory of Algorithms and Technologies for Network Analysis, National Research University Higher School of Economics, 136 Rodionova St, Nizhny Novgorod 603093, Russia e-mail:
[email protected] V. A. Kalyagin e-mail:
[email protected] © Springer International Publishing AG, part of Springer Nature 2018 V. A. Kalyagin et al. (eds.), Computational Aspects and Applications in Large-Scale Networks, Springer Proceedings in Mathematics & Statistics 247, https://doi.org/10.1007/978-3-319-96247-4_19
type of pair-wise independence is conditional independence given all other variables, which leads to the undirected graphical model. An undirected graphical model is constructed from the concentration matrix, the inverse of the covariance matrix; a zero element of the concentration matrix means absence of the edge between the two corresponding vertices. Finally, a directed acyclic graphical model displays conditional independence on a subset of random variables. The Gaussian graphical model selection problem is to recover the graphical model from observations. There are different statistical procedures for Gaussian graphical model selection; however, the reliability of the obtained results is under discussion. Drton and Perlman [1] suggested methods of family-wise error rate control using different p-value adjustments. Those methods control a predetermined level of family-wise Type I error (the probability of at least one Type I error in the model). The same authors [3] suggested the SINful procedure as a way of obtaining more and less conservative models. Later, this procedure was made more robust by introducing the Minimum Covariance Determinant estimator [4]. Another approach comes from the optimization field: one looks for the best network corresponding to the graphical model by optimizing some function that scores a model [5–7]. Recently, L1-regularization of the objective function has gained popularity in the literature [6, 7], because lasso regularization drives a large part of the parameters to zero. This helps to distinguish zero conditional correlation coefficients from nonzero values and to obtain a network where edges connect conditionally correlated variables. This regularization is particularly useful in the case of sparse concentration matrices.
The goal of the present paper is to analyze the uncertainty of the statistical procedures for Gaussian graphical model selection described by Drton and Perlman. In addition to FWER, Type II errors, ROC AUC, and risk functions are used for the comparison of uncertainty of the selection procedures. The number of Type II errors estimates the total number of missed edges, i.e., the difference between the true and the selected model. The risk function is a linear combination of the numbers of Type I and Type II errors [8]. ROC curves are a usual quality characteristic in binary classification. Numerical simulations are used to obtain the results.
2 Undirected Gaussian Graphical Models In this article, we consider undirected Gaussian graphical models. Let Y = (Y1, ..., Yp) be a random vector with multivariate Gaussian distribution N_p(μ, Σ). The undirected graphical model is the graph G = (V, E), constructed using the dependency information between pairs of random variables. The set E of edges represents conditional independence through Markov properties: if the edge (i, j) is absent from the set E, then the two corresponding random variables are conditionally independent given all other variables:
Fig. 1 Example of a graph constructed from the concentration matrix in Table 1

Yi ⊥ Yj | YV\{i,j}   (1)
This pair-wise Markov property, the conditional independence of two random variables, corresponds to a zero element of the concentration matrix, which is obtained from the covariance matrix Σ by inversion. The concentration matrix can be converted to the partial correlation matrix as well: if the elements of the concentration matrix are {σ^ij}, then

ρ^ij = σ^ij / √(σ^ii · σ^jj)   (2)

Figure 1 shows an example of a graph constructed from a concentration matrix (Table 1). If the dimensionality of the random vector is p, then the number of possible edges is P = p(p − 1)/2. For every edge, there is a hypothesis

h_ij: Yi ⊥ Yj | YV\{i,j}, or ρ^ij = 0   (3)

against the alternative

k_ij: Yi ⊥̸ Yj | YV\{i,j}, or ρ^ij ≠ 0   (4)
As a result, there are P different hypotheses to test to determine the structure of the graph. A hypothesis is rejected if its p-value, computed from the existing data, is smaller than some significance level α chosen beforehand. Many procedures for constructing graphs or networks from data are described in the literature. Usually, the exact steps of a procedure depend on the end goal; however, we can distinguish three common types of procedures. The first type might be called statistical, because these procedures rely on statistical properties of the source data in order to find the network. In particular, different identification procedures of this type are suggested by Drton and Perlman [1]. Some properties of these statistical identification procedures are studied in this paper.
Fig. 2 Average number of Type I errors for different number of observations
Fig. 3 Average number of Type II errors for different number of observations
Table 1 Example of concentration matrix

      1      2      3      4      5      6      7
1     1      0.465  0      0      0.511  0.392  0
2     0.465  1      0      0      0.448  0      0
3     0      0      1      0      0      0.32   0
4     0      0      0      1      0.262  0      0.314
5     0.511  0.448  0      0.262  1      0.459  0.42
6     0.392  0      0.32   0      0.459  1      0
7     0      0      0      0.314  0.42   0      1
Other types of procedures include those that optimize some goodness-of-fit function based on the graph structure. Some procedures use a Bayesian approach, which is considered more demanding because it requires prior information about the distribution and computation of the posterior distribution. However, in this paper, only properties of statistical procedures are studied. The goal of any procedure is to uncover the underlying connections between random variables. There are basically two kinds of networks. The true network, or true correlation matrix, defines how the random variables are connected in reality; however, we usually do not know the true network structure. Therefore, we use data from observations of the random variables and construct a sample correlation matrix, or sample network. This network represents a graph that is considered close to the true network, although the data may be misleading. As a result, there are errors that we can consider as a measure of uncertainty in the network. There are two types of errors: adding an edge that does not exist in the true network and missing an edge that exists in the true graph. A Type I error occurs when a hypothesis is rejected although it is true: in our case, when we establish an edge between two vertices while in reality the edge does not exist. The other way round, a Type II error occurs when we accept a hypothesis that is not true: we do not establish an edge that exists in the true network. Based on these two types of errors, there are many different measures of uncertainty. The simplest measure is the number of Type I or Type II errors, in other words, the total number of wrongly added or wrongly missed edges.
Errors of Type I are called False Positives (FP) and errors of Type II are called False Negatives (FN); the number of correctly allocated edges is called True Positives (TP), and the number of correctly absent edges is True Negatives (TN). One of the most popular measures is the Family-Wise Error Rate (FWER), the probability of making at least one Type I error in the whole network: FWER = P(FP > 0). The False Discovery Rate (FDR) is the ratio of the number of Type I errors to the total number of edges in the sample network: FDR = FP / (FP + TP). Another known measure of errors is the so-called risk function, a linear combination of the numbers of Type I and Type II errors:
R(Y, α) = E(FP) · (1 − α) + E(FN) · α   (5)
Additionally, in this paper, we consider the area under the receiver-operating characteristic (ROC) curve (ROC AUC). This curve is constructed for a procedure as follows: for every significance level, we calculate two characteristics, specificity Spe = TN / (FP + TN) and sensitivity Sen = TP / (TP + FN), and draw the points with X = 1 − Spe and Y = Sen in the plane. When the significance level is zero, no hypothesis is rejected, because there is no probability less than zero; TP = 0, FP = 0, and the resulting point on the plot is (0, 0). On the contrary, if the significance level is 1, by the same reasoning all hypotheses are rejected, TN = 0, FN = 0, and the point is (1, 1). The resulting curve is thus drawn from (0, 0) to (1, 1). If on some interval the curve lies below the line y = x, the output of the procedure can be inverted on that interval, which would only improve the procedure; optimal procedures are expected to lie above the line y = x. One can estimate the area under this curve: if it equals one, the procedure is optimal for every significance level; if the area is close to 0.5, the procedure practically does not differ from a random decision. To sum up, ROC AUC allows one to estimate the efficiency of a procedure as a whole, across significance levels.
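The ROC construction above can be sketched directly: sweep the significance level over all observed p-values, collect (1 − Spe, Sen) points, and integrate by the trapezoid rule. This is our own minimal sketch; the function names and the tiny p-value dictionary are illustrative, not the paper's data.

```python
def confusion(true_edges, selected_edges, all_edges):
    """Count TP, FP, FN, TN for a selected edge set against the true one."""
    tp = len(selected_edges & true_edges)
    fp = len(selected_edges - true_edges)
    fn = len(true_edges - selected_edges)
    tn = len(all_edges - true_edges - selected_edges)
    return tp, fp, fn, tn

def roc_auc(pvalues, true_edges):
    """ROC AUC of an edge-selection rule: reject edge (i, j) when its
    p-value <= alpha, for every alpha among the observed p-values."""
    all_edges = set(pvalues)
    pts = [(0.0, 0.0)]
    for alpha in sorted(set(pvalues.values())):
        selected = {e for e, p in pvalues.items() if p <= alpha}
        tp, fp, fn, tn = confusion(true_edges, selected, all_edges)
        sen = tp / (tp + fn) if tp + fn else 1.0
        spe = tn / (fp + tn) if fp + tn else 1.0
        pts.append((1 - spe, sen))
    pts.append((1.0, 1.0))
    pts.sort()
    # Trapezoid rule over the collected (1 - Spe, Sen) points.
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(pts, pts[1:]))

# Hypothetical p-values: true edges get small p-values, so separation is perfect.
pvals = {(1, 2): 0.001, (1, 3): 0.02, (2, 3): 0.6, (1, 4): 0.9}
auc = roc_auc(pvals, true_edges={(1, 2), (1, 3)})
```

With perfectly separated p-values, the sketch returns an AUC of 1; random p-values would drive it towards 0.5.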
3 Identification Procedures All described measures allow us to compare different procedures; the procedures themselves are described below. Suppose we have observations Y^(1), ..., Y^(n) from a given multivariate normal distribution, where each observation is a vector of length p. The sample correlation matrix can be derived from the observations using the sample covariance matrix and the sample mean:

S = (1 / (n − 1)) Σ_{m=1}^{n} (Y^(m) − Ȳ)(Y^(m) − Ȳ)^T   (6)

Ȳ = (1 / n) Σ_{m=1}^{n} Y^(m)   (7)
Usually, for statistical procedures, one needs to compute p-values. A p-value is the probability, under the null hypothesis, of observing data at least as extreme as the current observations. The distribution of the sample correlation coefficient, when the true correlation coefficient is zero, is known for components of a normal random vector: if r_ij is such a sample correlation coefficient, then √(n − 2) · r_ij / √(1 − r_ij²) has a t-distribution with n − 2 degrees of freedom. For the sample partial correlation coefficient, n should be replaced by n − p. Knowing this, we can obtain p-values for every sample correlation coefficient.
The simultaneous multiple testing procedure is the simplest way to obtain the network. In this procedure, every hypothesis is tested independently at the same time with some chosen significance level: we compare the significance level with the p-value of the hypothesis, and if the p-value is lower, the hypothesis is rejected, and vice versa. It is worth noting that this procedure only controls the level of error in every individual hypothesis, not in the whole network, which may still be useful in some cases. Drton and Perlman [1] described procedures that control FWER in a network by adjusting the p-value of every hypothesis. In this paper, four different adjustments are studied.

Bonferroni adjustment:

π_ij^Bonf = min{C_p² · π_ij, 1}   (8)

Sidak adjustment:

π_ij^Sidak = 1 − (1 − π_ij)^{C_p²}   (9)

If we reorder the p-values so that π_(1) ≤ π_(2) ≤ ... ≤ π_(C_p²), then the Bonferroni adjustment with Holm step-down procedure is

π_(a)^{Bonf.Step} = max_{b=1,...,a} min{(C_p² − b + 1) · π_(b), 1}   (10)

and the Sidak adjustment with Holm step-down procedure is

π_(a)^{Sidak.Step} = max_{b=1,...,a} [1 − (1 − π_(b))^{C_p² − b + 1}]   (11)
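The four adjustments (8)–(11) can be implemented in a few lines of pure Python. A minimal sketch, with function name and toy p-values of our own choosing; C_p² is the number of tested edge hypotheses.

```python
from math import comb

def adjust(pvals, p, method="sidak_step"):
    """FWER-controlling p-value adjustments of Eqs. (8)-(11): plain and
    Holm step-down versions of Bonferroni and Sidak, for C(p, 2) tests."""
    P = comb(p, 2)
    if method == "bonferroni":
        return [min(P * q, 1.0) for q in pvals]
    if method == "sidak":
        return [1.0 - (1.0 - q) ** P for q in pvals]
    # Step-down (Holm) versions: sort ascending, take the running maximum.
    order = sorted(range(len(pvals)), key=lambda i: pvals[i])
    adj = [0.0] * len(pvals)
    running = 0.0
    for b, i in enumerate(order, start=1):
        if method == "bonferroni_step":
            val = min((P - b + 1) * pvals[i], 1.0)
        else:  # "sidak_step"
            val = 1.0 - (1.0 - pvals[i]) ** (P - b + 1)
        running = max(running, val)
        adj[i] = running
    return adj

# p = 4 variables -> C_p^2 = 6 hypotheses (only 3 p-values shown for brevity).
raw = [0.01, 0.002, 0.2]
bonf = adjust(raw, 4, "bonferroni")        # min(6 * q, 1) per hypothesis
step = adjust(raw, 4, "bonferroni_step")   # Holm step-down variant
```

An edge is kept in the selected graph when its adjusted p-value is below the chosen FWER level α.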
4 Experiments and Results In the article by Drton and Perlman [1], experiments were conducted to show that the described procedures control FWER at a predetermined level. They used a generated concentration matrix with p = 7, which has nine nonzero elements from the interval [0.2, 0.55]. The number of observations in one trial varied from 25 to 500. Their experiments showed that the step procedures with Bonferroni and Sidak adjustments approach the predetermined significance level α, whereas the non-step procedures with the same adjustments move further away from it, and the real FWER control level for those procedures is much lower than the chosen α. The first goal of our experiments was to repeat the described experiments for matrices of higher dimensionality (p = 25) and different concentration matrix densities (q = 0.2, 0.4, 0.6, 0.8, 0.95). The number of observations was chosen from 100 to 500 with step 100. The obtained values are averaged over 1000 trials. Tables 2, 3, 4, and 5 show the four different adjustment procedures for a matrix with p = 25 and q = 0.2. It means that there are 300 possible connections, and about
Table 2 Type I and Type II errors with p = 25, q = 0.2 (Bonferroni)

              100     200     300     400     500
E(P(FP > 0))  0.069   0.057   0.063   0.058   0.06
E(P(FN > 0))  1       1       1       1       1
E(FP)         0.096   0.075   0.079   0.068   0.065
E(FN)         42.124  33.732  29.145  26.204  23.865

Table 3 Type I and Type II errors with p = 25, q = 0.2 (Bonferroni Step)

              100     200     300     400     500
E(P(FP > 0))  0.07    0.062   0.069   0.062   0.062
E(P(FN > 0))  1       1       1       1       1
E(FP)         0.098   0.08    0.085   0.073   0.069
E(FN)         42.033  33.658  29.046  26.123  23.803

Table 4 Type I and Type II errors with p = 25, q = 0.2 (Sidak)

              100     200     300     400     500
E(P(FP > 0))  0.071   0.065   0.071   0.063   0.071
E(P(FN > 0))  1       1       1       1       1
E(FP)         0.099   0.084   0.089   0.075   0.078
E(FN)         42.016  33.59   29.963  26.045  23.734

Table 5 Type I and Type II errors with p = 25, q = 0.2 (Sidak Step)

              100     200     300     400     500
E(P(FP > 0))  0.077   0.07    0.074   0.065   0.075
E(P(FN > 0))  1       1       1       1       1
E(FP)         0.105   0.089   0.093   0.08    0.083
E(FN)         41.921  33.507  28.866  25.96   23.666

Table 6 Type I and Type II errors with p = 25, q = 0.6 (Sidak Step)

              100      200      300     400     500
E(P(FP > 0))  0.038    0.048    0.052   0.041   0.067
E(P(FN > 0))  1        1        1       1       1
E(FP)         0.105    0.089    0.093   0.08    0.083
E(FN)         153.506  117.174  95.569  82.109  73.635

Table 7 Type I and Type II errors with p = 25, q = 0.95 (Sidak Step)

              100      200      300      400      500
E(P(FP > 0))  0.003    0.008    0.004    0.005    0.012
E(P(FN > 0))  1        1        1        1        1
E(FP)         0.003    0.008    0.004    0.005    0.012
E(FN)         263.477  219.876  191.627  171.954  157.58
60 true edges. As a result, there can be no more than 60 Type II errors in total. Tables 6 and 7 show the Sidak step-down adjustment procedure for different densities of matrices. Additionally, Figs. 2 and 3 show the progression of the average numbers of Type I and Type II errors with increasing number of observations. To sum up, the experiments showed that:

• The achieved practical level of FWER depends on the density of the matrix for any of the procedures.
• The number of Type II errors decreases with the number of observations; however, it remains particularly high for higher dimensionality, even for a significant number of observations (for p = 7 this number is usually 0 for 400–500 observations, which is not true for p = 25).
• The number of Type II errors is slightly smaller for step procedures and the number of Type I errors is slightly bigger for them; however, the difference is small in comparison with the total number of errors, or wrongly defined connections.

In another experiment, we compared ROC AUC for the four procedures with adjustments and the simultaneous multiple testing procedure. The ROC AUC measure allows us to compare these procedures without fixing a significance level. The experiments were conducted for the matrix with p = 7, which has nine nonzero elements from the interval [0.2, 0.55]; the values are averaged over 1000 trials. Increasing the number of observations led to increasing ROC AUC, which can be clearly seen from Figs. 4 and 5. The best ROC AUC is achieved for the Sidak adjustment
Fig. 4 ROC for different procedures, n = 10, 20
Fig. 5 ROC for different procedures, n = 50, 150
Fig. 6 Risk function for different procedures, n = 10, 20
Fig. 7 Risk function for different procedures, n = 50, 150
and simultaneous multiple testing procedure without adjustments. Differences between those three procedures are insignificant.
Finally, we compare risk functions for different procedures. In Figs. 6 and 7, the horizontal axis shows different values of α and the vertical axis shows values of the risk function. The values are averaged over 1000 trials. According to Figs. 6 and 7, the maximum of the risk function for the simultaneous multiple testing procedure occurs at a different α than for the Sidak and Sidak step-down adjustment procedures. This can be explained by the structure of the experiment: the α-level for the risk function should be taken as in the simultaneous multiple testing procedure; however, for the adjustment procedures we control the FWER level instead of the individual level of every hypothesis. As a result, we have to obtain the individual α-level from the FWER level, for example as

α_ind = 1 − (1 − α_FWER)^{1/C_p²}

Additionally, the value of the risk function for the Sidak adjustment and simultaneous multiple testing procedures decreases with the number of observations, whereas for the Sidak step-down adjustment the value of the risk function rises with the number of observations.
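The FWER-to-individual-level conversion and the risk function of Eq. (5) are one-liners. A small sketch of our own; the expected error counts plugged in below are read off Table 2 (Bonferroni, n = 500), and α = 0.5 is an arbitrary illustrative weight.

```python
from math import comb

def individual_alpha(alpha_fwer, p):
    """Per-hypothesis significance level recovered from a FWER level
    via the Sidak relation (inverting Eq. (9))."""
    return 1.0 - (1.0 - alpha_fwer) ** (1.0 / comb(p, 2))

def risk(e_fp, e_fn, alpha):
    """Risk function of Eq. (5): R = E(FP) * (1 - alpha) + E(FN) * alpha."""
    return e_fp * (1.0 - alpha) + e_fn * alpha

a = individual_alpha(0.05, 25)   # p = 25 -> 300 hypotheses, so a is tiny
r = risk(0.065, 23.865, 0.5)     # E(FP), E(FN) from Table 2, n = 500
```

Because Type II errors dominate in these experiments, the risk at moderate α is driven almost entirely by E(FN).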
5 Conclusion In this article, we analyzed the procedures described by Drton and Perlman [1] from some new points of view. Despite controlling FWER, these procedures perform poorly in terms of Type II errors. The Sidak and Sidak step-down adjustment procedures and the simultaneous multiple testing procedure show similar ROC curves with almost equal AUC scores, which improve with a growing number of observations. As a result, the Sidak, Sidak step-down, and simultaneous multiple testing procedures may be considered the best amongst those analyzed; however, some of their properties are still not satisfactory. Directions for future work include analysis of goodness-of-fit procedures and research into properties of the studied procedures for other elliptical distributions. Acknowledgements The work is partially supported by RFBR grant 18-07-00524.
References 1. Drton, M., Perlman, M.D.: Multiple testing and error control in Gaussian graphical model selection. Stat. Sci. 22(3), 430–449 (2007) 2. Jordan, M.I.: Graphical models. Stat. Sci. 19(3), 140–155 (2004) 3. Drton, M., Perlman, M.D.: A SINful approach to Gaussian graphical model selection. J. Stat. Plan. Inference 138, 1179–1200 (2008) 4. Gottard, A., Pacillo, S.: Robust concentration graph model selection. Comput. Stat. Data Anal. 54, 3070–3079 (2010) 5. Schafer, J., Strimmer, K.: An empirical Bayes approach to inferring large-scale gene association networks. Bioinformatics 21(6), 754–764 (2005) 6. Khondker, Z.S., et al.: The Bayesian covariance lasso. Stat. Interface 6(2), 243–259 (2013) 7. Yuan, M., Lin, Y.: Model selection and estimation in the Gaussian graphical model. Biometrika 94(1), 19–35 (2007) 8. Kalyagin, V.A., Koldanov, A.P., Koldanov, P.A., Pardalos, P.M.: Optimal statistical decision for Gaussian graphical model selection. arXiv:1701.02071
Sentiment Analysis Using Deep Learning Nikolay Karpov, Alexander Lyashuk and Arsenii Vizgunov
Abstract This study analyzes the advantages of deep learning methods over baseline machine learning methods on a sentiment analysis task in Twitter. All techniques were evaluated using a set of English tweets with classification on a five-point ordinal scale provided by the SemEval-2017 organizers. For the implementation, we used two open-source Python libraries. The results and conclusions of the study are discussed. Keywords Natural language processing · Sentiment analysis · Deep learning
1 Introduction There is currently a growing interest in social network analysis due to the expanded role of social networks: people increasingly communicate and collaborate inside them. According to the statistics, the popularity of social networks is growing steadily. The majority of people now have at least one social network account, and people communicate there by exchanging text messages. It is therefore extremely important to be able to analyze this type of data and reveal its hidden properties. One of the most popular formats of communication in a social network is the post: an arbitrary message expressing the thoughts and ideas of its author. Such a post usually appeals to people's emotions, and the audience starts actively discussing it by adding more and more comments. An increasing number of publications in the sphere of Internet text analysis reveals quite interesting N. Karpov (B) · A. Lyashuk · A. Vizgunov Higher School of Economics, National Research University, Moscow, Russia e-mail:
[email protected] A. Lyashuk e-mail:
[email protected] A. Vizgunov e-mail:
[email protected] © Springer International Publishing AG, part of Springer Nature 2018 V. A. Kalyagin et al. (eds.), Computational Aspects and Applications in Large-Scale Networks, Springer Proceedings in Mathematics & Statistics 247, https://doi.org/10.1007/978-3-319-96247-4_20
properties of modern texts. In particular, one can assign a sentiment index to every piece of text. The task usually involves detecting whether the text expresses a POSITIVE, NEGATIVE, or NEUTRAL sentiment. As sentiment analysis remains popular, more and more mathematical models are being applied to automated prediction of sentiment. We investigate new models mentioned in the literature in comparison with other baseline methods. The remainder of the paper is organized as follows. In Sect. 2, we give a detailed overview of existing approaches and significant theoretical trends. In Sect. 3, we describe the experiment methodology. A detailed overview of the proposed approach and baseline methods is given in Sect. 4. Results of the proposed approach are shown in Sect. 5. Finally, we draw conclusions and discuss further research directions in Sect. 6.
2 Literature Review Our research builds on modern work in automatic natural language processing. The task of automatically identifying sentiment polarity is usually formulated as a classification problem. Classification of short messages is a well-known problem in the natural language processing field, traditionally solved using machine learning approaches. For instance, sentences can be classified according to their readability using pre-built features and classification algorithms like SVM, Random Forest, and others [5]. The related task of sentiment analysis is explored by many researchers [7, 9]. The practical applications of this task are wide, from monitoring popular events (e.g., presidential debates, the Oscars, etc.) to extracting trading signals by monitoring tweets about public companies. If the distribution of sentiment towards a topic across a number of tweets is more interesting than the polarity of each message, the task is usually called quantification [4, 6]. For instance, the distribution of sentiment of USA social network users during a particular period is helpful for predicting future movements of stock market indicators like the DJIA (Dow Jones Industrial Average) or S&P 500 [10]. The competition platform for sentiment analysis in Twitter, called SemEval, has been run since 2013 [11]. These applications often benefit greatly from the best possible accuracy, which is why the SemEval-2017 Twitter competition promotes research in this area. During the last few years, neural networks have become very popular for various machine learning tasks. Recent advances in deep learning allow us to effectively analyze user sentiment while handling a big number of messages in social networks [1]. Two of the most popular deep-learning techniques for sentiment analysis are CNNs and LSTMs [2].
In the next section, we show our empirical results of applying a deep-learning method in contrast to some traditional machine learning methods.
Table 1 Source data table

Id                  Topic       Text                                 p
628949369883000832  @microsoft  dear @Microsoft the newOoffice...    −1
628976607420645377  @microsoft  @Microsoft how about you make...     −2
629023169169518592  @microsoft  I may be ignorant on this issue...   −1
629179223232479232  @microsoft  Thanks to @microsoft, I just may...  −1
629186282179153920  @microsoft  If I make a game as a #windows1...    0
3 Experiment Setup We consider the message polarity classification task: for a statement from Twitter, it is required to determine the conveyed sentiment on a five-point scale from negative to positive. We use the SemEval-2017 dataset, which contains English tweets. Each sentence is labeled (−2 strongly negative, −1 negative, 0 neutral, 1 positive, 2 strongly positive). For evaluation, we use macro-averaged mean absolute error (MAE). For all implementations, we use open-source Python libraries.
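Macro-averaged MAE computes the mean absolute error within each true class and then averages over classes, so rare extreme classes weigh as much as the frequent neutral one. A minimal sketch (function name and toy labels are ours):

```python
def macro_mae(y_true, y_pred, classes=(0, 1, 2, 3, 4)):
    """Macro-averaged MAE: per-true-class MAE, averaged over the classes
    that actually occur in y_true."""
    per_class = []
    for c in classes:
        errs = [abs(t - p) for t, p in zip(y_true, y_pred) if t == c]
        if errs:
            per_class.append(sum(errs) / len(errs))
    return sum(per_class) / len(per_class)

# Toy labels on the 0..4 scale used after preprocessing.
y_true = [0, 0, 2, 4]
y_pred = [1, 0, 2, 2]
m = macro_mae(y_true, y_pred)
```

Lower is better; a perfect classifier scores 0.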
3.1 Dataset Overview We have 10,000 training and 20,000 test objects. Each object contains a tweet id, subject, text, and polarity (Table 1).
3.2 Data Preprocessing We removed the id and topic features and mapped polarity values from −2..2 to 0..4 (Table 2).
Table 2 The source data used for training

  Text                                 p
0 dear @Microsoft the newOoffice...    1
1 @Microsoft how about you make...     0
2 I may be ignorant on this issue...   1
3 Thanks to @microsoft, I just may...  1
4 If I make a game as a #windows1...   2
Table 3 The preprocessed data used for training

  Text                                            p
0 Dear the newooffice for mac is great and...     1
1 How about you make system that doesn eat...     0
2 May be ignorant on this issue but should we...  1
3 Thanks to just may be switching over to...      1
4 If make game as universal app will owners...    2
When building machine learning systems based on tweet data, preprocessing is required. We used tweet-preprocessor, which makes it easy to clean tweets from URLs, hashtags, mentions, reserved words, etc. After that, the data is ready to be used (Table 3).
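A rough stand-in for what this cleaning step does can be written with stdlib regexes. This is our own approximation, not the tweet-preprocessor library itself (whose exact rules differ, e.g., it can drop the whole hashtag token):

```python
import re

def clean_tweet(text):
    """Strip URLs and mentions, drop the '#' marker but keep the hashtag
    word, then collapse whitespace."""
    text = re.sub(r"https?://\S+", " ", text)   # URLs
    text = re.sub(r"@\w+", " ", text)           # mentions
    text = re.sub(r"#", " ", text)              # hashtag marker
    return re.sub(r"\s+", " ", text).strip()

cleaned = clean_tweet("dear @Microsoft the new #office for mac is great http://t.co/x")
```

After cleaning, only plain words remain for feature extraction.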
4 Proposed Approach

4.1 Baseline

First, we transformed tweets into TF-IDF features and used the following baseline methods:

• Logistic Regression
• Decision Tree
• Gradient Boosting
• K Nearest Neighbors
• Multilayer Perceptron
• Naive Bayes
• Random Forest
• Support Vector Machine
It can be seen from the bar chart that the perceptron showed the best results. Therefore, we decided to move towards neural networks (Fig. 1).
4.2 Deep Learning We used a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM) layers [3]. Neural networks cannot work with text directly; they need numerical features. We did not apply the same features as for the baseline methods. To take word order into consideration, we transformed tweets into numerical sequences of the same length (Fig. 2 and Table 4).
Fig. 1 Amount of the MAE for baseline methods
Fig. 2 Structure of the LSTM layer

Table 4 The preprocessed data used for training in neural network

  Text                                                p
0 [1182, 1, 13480, 8, 715, 7, 132, 4, 32, 21, 66...   1
1 [88, 40, 10, 77, 1998, 17, 286, 1229, 20, 9673...   0
2 [11, 13, 3280, 5, 22, 949, 21, 124, 33, 615, 9...   1
3 [385, 2, 23, 11, 13, 6623, 110, 2]                  1
4 [29, 77, 87, 39, 4675, 942, 26, 3659, 13, 738...    2
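The text-to-sequence transformation behind Table 4 can be sketched in pure Python. This is our own minimal version of what a tokenizer does before the Embedding layer; the actual ids in Table 4 come from the real 25,000-word vocabulary, not this toy one.

```python
from collections import Counter

def texts_to_sequences(texts, max_features=25000, maxlen=50):
    """Index words by frequency (1-based; 0 is reserved for padding),
    map each tweet to integer ids, then left-pad/truncate to maxlen."""
    counts = Counter(w for t in texts for w in t.lower().split())
    vocab = {w: i + 1 for i, (w, _) in enumerate(counts.most_common(max_features - 1))}
    seqs = []
    for t in texts:
        ids = [vocab[w] for w in t.lower().split() if w in vocab][:maxlen]
        seqs.append([0] * (maxlen - len(ids)) + ids)  # left-pad, Keras-style
    return seqs, vocab

# Tiny two-tweet illustration.
seqs, vocab = texts_to_sequences(["good movie", "bad movie"], maxlen=4)
```

The resulting fixed-length integer sequences are exactly what the Embedding layer consumes.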
Inside the neural network, we used a vector representation of words: each word corresponds to a vector of n real numbers in an n-dimensional space. The idea is that similar words are located side by side. Figure 3 shows, for example, how this would look in two-dimensional space.
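"Side by side" is usually measured by cosine similarity of the word vectors. A small sketch with hypothetical 2-d embeddings (the words and coordinates are invented for illustration; real GloVe vectors have 200 dimensions here):

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity: close to 1 for vectors pointing the same way,
    negative for vectors pointing in opposite directions."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

# Hypothetical 2-d embeddings, as in the figure.
emb = {"good": (0.9, 0.1), "great": (0.85, 0.2), "terrible": (-0.8, 0.1)}
sim_pos = cosine(emb["good"], emb["great"])      # similar sentiment words
sim_neg = cosine(emb["good"], emb["terrible"])   # opposite sentiment words
```

This geometric closeness is what lets the Embedding layer generalize across words the training set never pairs together.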
Fig. 3 Two-dimensional example of vector representation of words
To implement the neural network, we used the Keras library,1 which in our case is a shell over the Theano library. We use a sequential model (the layers go one after another) to input a sequence of words:

1. The Embedding layer: the layer for the vector representation of words. The settings indicate that there are 25,000 different words in the dictionary, a sequence consists of no more than 50 words, and the dimension of the vector representation is 200.
2. Two LSTM layers.
3. The Dropout layer counteracts overfitting: it zeroes a random half of the features and prevents co-adaptation of the weights in the layers.
4. Dense: a fully connected layer.
5. The Activation layer gives an integer value from 0 to 4 (using the softmax activation function).

Listing 1.1 Neural Network Definition.
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dropout, Dense, Activation

max_features = 25000
maxlen = 50
embedding_dim = 200

model = Sequential()
model.add(Embedding(max_features, embedding_dim, input_length=maxlen))
model.add(LSTM(64, return_sequences=True))
model.add(LSTM(64))
model.add(Dropout(0.5))
model.add(Dense(5))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

1 https://keras.io.
Fig. 4 Amount of MAE for evaluated method with baselines
First, we tried the approach with automatic learning of the Embedding layer weights. After that, we tried Global Vectors for Word Representation (GloVe) [8]: we used word vectors pretrained on 2 billion tweets as the Embedding layer.2
5 Experimental Results Neural networks show the best results for message polarity classification. A recurrent neural network with long short-term memory and a pretrained Embedding layer showed the best MAE of 0.8390. The reason this network is better than the one without the pretrained Embedding layer is most likely that the training sample is quite small (10,000 tweets) and very noisy. Consequently, the pretrained Embedding layer compensates for the small size of the training set and improves the quality of the classification, because the pretrained word vectors are tailored to the language model of the social network Twitter. Our open repository includes all source codes and experimental results3 (Fig. 4).
2 https://nlp.stanford.edu/projects/glove/.
3 https://github.com/lxdv/SemEval-2017.

6 Conclusion

The aim of this research was to apply a simple deep learning approach to predict the sentiment that users express in messages of a social network. This approach
is compared with several baseline methods. We used the dataset prepared by the organizers of the SemEval-2017 shared task, five-point scale subtask. Using this dataset, we built several machine learning models to automatically predict the polarity of an arbitrary message from any social network user. Experimental results showed that the model based on an LSTM neural network with pretrained word embeddings obtains significantly better results. We explain this by the fact that the neural network with pretrained word embeddings used more data, even if in an unsupervised regime.

Acknowledgements The reported study was funded by RFBR according to research Project No. 16-06-00184 A.
References

1. Chen, D., Manning, C.D.: A fast and accurate dependency parser using neural networks. In: EMNLP, pp. 740–750 (2014)
2. Cliche, M.: BB_twtr at SemEval-2017 task 4: Twitter sentiment analysis with CNNs and LSTMs. In: Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), pp. 564–571, Vancouver, Canada. Association for Computational Linguistics, Aug 2017
3. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
4. Karpov, N.: NRU-HSE at SemEval-2017 task 4: tweet quantification using deep learning architecture. In: Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), pp. 681–686, Vancouver, Canada. Association for Computational Linguistics, Aug 2017
5. Karpov, N., Baranova, J., Vitugin, F.: Single-sentence readability prediction in Russian. In: International Conference on Analysis of Images, Social Networks and Texts, pp. 91–100. Springer (2014)
6. Karpov, N., Porshnev, A., Rudakov, K.: NRU-HSE at SemEval-2016 task 4: comparative analysis of two iterative methods using quantification library. In: Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), pp. 171–177, San Diego, California. Association for Computational Linguistics, June 2016
7. Kiritchenko, S., Mohammad, S.M., Salameh, M.: SemEval-2016 task 7: determining sentiment intensity of English and Arabic phrases. In: Proceedings of SemEval, pp. 42–51 (2016)
8. Pennington, J., Socher, R., Manning, C.D.: GloVe: global vectors for word representation. In: EMNLP, vol. 14, pp. 1532–1543 (2014)
9. Pontiki, M., Galanis, D., Papageorgiou, H., Androutsopoulos, I., Manandhar, S., Al-Smadi, M., Al-Ayyoub, M., Zhao, Y., Qin, B., De Clercq, O., Hoste, V., Apidianaki, M., Tannier, X., Loukachevitch, N., Kotelnikov, E., Bel, N., Jiménez-Zafra, S.M., Eryiğit, G.: SemEval-2016 task 5: aspect based sentiment analysis. In: Proceedings of the 10th International Workshop on Semantic Evaluation, SemEval, vol. 16 (2016)
10. Porshnev, A., Redkin, I., Karpov, N.: Modelling movement of stock market indexes with data from emoticons of Twitter users. In: Russian Summer School in Information Retrieval, pp. 297–306. Springer (2014)
11. Rosenthal, S., Farra, N., Nakov, P.: SemEval-2017 task 4: sentiment analysis in Twitter. In: Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), pp. 493–509, Vancouver, Canada. Association for Computational Linguistics, Aug 2017
Invariance Properties of Statistical Procedures for Network Structures Identification

Petr A. Koldanov
Abstract Invariance properties of statistical procedures for threshold graph identification are considered. An optimal procedure in the class of invariant multiple decision procedures is constructed.

Keywords Random variable network · Network model · Network structures · Threshold graph · Uniformly most powerful test · Invariance · Unbiasedness
P. A. Koldanov (B): Laboratory of Algorithms and Technologies for Network Analysis, National Research University Higher School of Economics, Bolshaya Pecherskaya 25/12, Nizhny Novgorod 603155, Russia; e-mail: [email protected]

© Springer International Publishing AG, part of Springer Nature 2018. V. A. Kalyagin et al. (eds.), Computational Aspects and Applications in Large-Scale Networks, Springer Proceedings in Mathematics & Statistics 247, https://doi.org/10.1007/978-3-319-96247-4_21

1 Introduction

A random variable network (RVN) is a pair (X, γ), where X is a random vector and γ is a measure of similarity (dependency) between pairs of its components. The concept of random variable network was introduced in [5]. These networks appear in different applications. The market network introduced in [2] is a particular case of a random variable network. Random variable networks are closely connected with graphical models, which are known to be useful in bioinformatics and signal processing [4]. Any random variable network generates a network model, which can be considered as a visualization of the similarities (dependencies) in the random variable network. The network model is a complete weighted graph. One approach to analyzing such a graph is to consider its subgraphs (network structures), which contain the key information of the graph. In practice, a random variable network model is given by observations. The identification problem is to reconstruct a network structure from observations. Any identification algorithm is therefore a statistical procedure. In the present paper, we investigate general statistical properties of some popular identification algorithms.

One important subgraph of a complete weighted graph is the threshold graph (TG). In a market network, the threshold graph is known as the market graph [2]. The general statistical
approach for market graph identification was developed in [8]. An optimal unbiased multiple decision statistical procedure for market graph identification in a sign similarity network was constructed in [6]. The aim of the present paper is to investigate the optimality property of multiple decision procedures for threshold graph identification in a Pearson correlation network. Our approach follows the general approach of Lehmann [9], where Bayes, minimax, and optimal unbiased procedures were investigated. The main result is the optimality of the multiple decision statistical procedure based on individual correlation tests in the class of invariant procedures. In the proof, we use the optimality property of individual correlation tests discussed in [1]. Note that some invariant multiple decision procedures were considered in [3].

The paper is organized as follows. In Sect. 2, the main concepts of random variable network, network model, and network structures are introduced and discussed. In Sect. 3, the problem of network structures identification is formulated as a multiple decision problem. In Sect. 4, the question of optimality of the individual correlation test is discussed. In Sect. 5, the theorem on optimality of the multiple decision statistical procedure based on individual correlation tests in the class of invariant procedures is proved.
2 Random Variable Network

A random variable network [5] is a pair (X, γ), where X = (X_1, ..., X_N) is a random vector and γ is a measure of dependence between components of the vector X. There is a great variety of random variable networks. In the present paper, we consider the Gaussian Pearson correlation network, where the random vector X has a multivariate Gaussian distribution and γ is the Pearson correlation. Such random variable networks are connected with stock market and gene co-expression networks [2, 4]. Another example is the Gaussian partial correlation network, where the random vector X has a multivariate Gaussian distribution and γ is the partial correlation. Such random variable networks are connected with gene expression networks [4].

Any random variable network (X, γ) generates a network model, which is a complete weighted undirected graph. Nodes of the graph correspond to the random variables X_i, i = 1, 2, ..., N, and the weight of edge (i, j) is given by γ(X_i, X_j), i, j = 1, 2, ..., N. This network model will be called the reference network model. Network structures, which are subgraphs of the reference network model, will be called reference network structures. Any reference network structure with N nodes can be defined by its adjacency matrix. For example, the matrix

G_1 = \begin{pmatrix} 0 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \cdots & \cdots & \cdots & \cdots \\ 0 & 0 & \cdots & 0 \end{pmatrix}

is associated with the subgraph of N isolated vertices (there are no edges). The matrix
G_2 = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 1 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 0 & \cdots & 0 \\ \cdots & \cdots & \cdots & \cdots & \cdots \\ 0 & 0 & 0 & \cdots & 0 \end{pmatrix}

is associated with the subgraph with N nodes and only one edge (1, 2). For the matrix

G_3 = \begin{pmatrix} 0 & 1 & \cdots & 1 \\ 1 & 0 & \cdots & 1 \\ \cdots & \cdots & \cdots & \cdots \\ 1 & 1 & \cdots & 0 \end{pmatrix}

the corresponding structure is the complete graph with N nodes. Let G be the set of all adjacency matrices with N nodes.

In practice, a random variable network model is given by observations. The identification problem is to reconstruct a reference network structure from observations. In the present paper, we consider the threshold graph identification problem, i.e., the problem of identifying the reference threshold graph from observations.
3 Multiple Decision Approach

In this section, we recall the multiple decision approach for threshold graph identification, first developed in [8]. Let (X, γ) be a random variable network, and let γ_0 be a given threshold. The reference threshold graph is defined as follows: the graph has N nodes, and edge (i, j) is included in the reference threshold graph if γ(X_i, X_j) > γ_0.

Assume that the random vector X has a distribution from the class K = {f(x, θ), θ ∈ Θ}, where Θ is a parametric space. Let us introduce the hypothesis H_S : θ ∈ Θ_S that the reference network structure has adjacency matrix S ∈ G (recall that G is the set of all adjacency matrices with N nodes). Any multiple decision statistical procedure δ(x) is a partition of the sample space into regions D_S such that if x ∈ D_S then δ(x) = d_S, where d_S is the decision that the network structure has adjacency matrix S ∈ G, i.e.,

δ(x) = d_S, \quad x ∈ D_S

According to Wald decision theory [11], the quality of any statistical procedure is measured by a risk function. The risk function of the statistical procedure δ(x) is defined by

\mathrm{Risk}(S, θ; δ) = \sum_{Q ∈ G} w(S, Q) P_θ(δ(x) = d_Q), \quad θ ∈ Θ_S, \ S ∈ G
where P_θ(δ(x) = d_Q) is the probability that the decision d_Q is taken while the true decision is d_S, and w(S, Q) is the loss from the decision d_Q when the hypothesis H_S is true. We assume w(S, S) = 0, S ∈ G. The risk function allows us to introduce the concept of optimality of a multiple decision statistical procedure.

Definition 3.1 The statistical procedure δ is optimal in the class D if

R(S, θ, δ) ≤ R(S, θ, δ'), \quad ∀S ∈ G, ∀θ ∈ Θ_S, ∀δ' ∈ D

It is natural to connect the loss function with the difference of two graphs: the reference graph and the graph obtained by application of the procedure δ(x). This difference is defined by the numbers of erroneously included and erroneously excluded edges. Therefore, for problems of network structures identification it is natural to consider a loss function which takes these numbers into account. To deal with this problem, we apply the concept of an additive loss function, introduced in [9]. Let a_{i,j} be the loss from false inclusion of edge (i, j) in the structure and b_{i,j} be the loss from false exclusion of edge (i, j) from the structure, i, j = 1, 2, ..., N; i ≠ j. The loss function w(S, Q) is additive [9] if

w(S, Q) = \sum_{(i,j): s_{i,j}=0,\, q_{i,j}=1} a_{i,j} + \sum_{(i,j): s_{i,j}=1,\, q_{i,j}=0} b_{i,j} \qquad (1)

In what follows, we consider additive loss functions only.

In the present paper, we investigate the properties of statistical procedures for threshold graph (TG) identification in the Gaussian Pearson correlation network. In this case the parameter θ is defined by θ = (μ, Σ), where μ ∈ R^{1×N} and Σ is a positive definite (N × N) matrix. The reference threshold graph for a given threshold ρ_0 is defined as follows: the graph has N nodes, and edge (i, j) is included in the reference threshold graph if ρ(X_i, X_j) > ρ_0. Here ρ(X_i, X_j) is the Pearson correlation between X_i and X_j. To identify the reference threshold graph from observations, one can test the individual hypotheses

h_{i,j} : ρ_{i,j} ≤ ρ_0 \ \text{vs} \ k_{i,j} : ρ_{i,j} > ρ_0, \quad i ≠ j, \ i, j = 1, 2, ..., N \qquad (2)
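To make the additive loss (1) concrete, the following sketch computes w(S, Q) for two adjacency matrices. The function name and the scalar losses a, b are illustrative assumptions (the paper allows edge-specific a_{i,j}, b_{i,j}); since adjacency matrices are symmetric, each mismatched undirected edge is counted once for (i, j) and once for (j, i).

```python
import numpy as np

def additive_loss(S, Q, a=0.5, b=0.5):
    """Additive loss w(S, Q) from Eq. (1): a per falsely included edge
    (s_ij = 0, q_ij = 1) and b per falsely excluded edge (s_ij = 1, q_ij = 0)."""
    S, Q = np.asarray(S), np.asarray(Q)
    false_incl = np.sum((S == 0) & (Q == 1))   # edges in Q but not in S
    false_excl = np.sum((S == 1) & (Q == 0))   # edges in S but not in Q
    return a * false_incl + b * false_excl

# True structure S has the single edge (1, 2); the identified structure Q
# misses it and falsely includes the edge (1, 3) instead.
S = np.array([[0, 1, 0],
              [1, 0, 0],
              [0, 0, 0]])
Q = np.array([[0, 0, 1],
              [0, 0, 0],
              [1, 0, 0]])
```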
Hypothesis h_{i,j} means that there is no edge between vertices i and j in the reference TG. The alternative k_{i,j} means that there is an edge between vertices i and j in the reference TG. Let ϕ_{i,j}(x) be a test for the individual hypothesis h_{i,j}: ϕ_{i,j} = 1 means that we reject the hypothesis, and ϕ_{i,j} = 0 means that we accept it. Then, the associated multiple decision statistical procedure for TG identification is defined as

δ(x) = d_Q \ \text{if} \ Φ(x) = Q \qquad (3)

where Φ(x) is the following matrix:

Φ(x) = \begin{pmatrix} 0 & ϕ_{1,2}(x) & \cdots & ϕ_{1,N}(x) \\ ϕ_{2,1}(x) & 0 & \cdots & ϕ_{2,N}(x) \\ \cdots & \cdots & \cdots & \cdots \\ ϕ_{N,1}(x) & ϕ_{N,2}(x) & \cdots & 0 \end{pmatrix} \qquad (4)
4 Properties of Individual Correlation Tests

In this section, we recall some properties of correlation tests for testing the individual hypotheses h_{i,j}. Let

x(t) = (x_1(t), x_2(t), ..., x_N(t)), \quad t = 1, 2, ..., n \qquad (5)

be a sample of observations from the random vector X = (X_1, X_2, ..., X_N). Let

G_{c,d} = \{g_{c,d} : x → y = cx + d; \ c, d ∈ R^1\}, \quad x, y ∈ R^{N×n}

be the group of scale/shift transformations of the sample space R^{N×n}. A function M(x) is called invariant with respect to the group G_{c,d} if M(g_{c,d} x) = M(x), ∀g_{c,d} ∈ G_{c,d}, ∀x ∈ R^{N×n}. Similarly, a statistical test ϕ is called invariant with respect to the group G_{c,d} if ϕ(g_{c,d} x) = ϕ(x) for ∀g_{c,d} ∈ G_{c,d}. A function M(x) is maximal invariant with respect to the group G_{c,d} if M(x_1) = M(x_2) implies x_1 = g_{c,d} x_2 for some g_{c,d} ∈ G_{c,d}. It follows from the Neyman–Pearson fundamental lemma ([10], p. 59) that if the maximal invariant has a monotone likelihood ratio, then the test based on the maximal invariant is uniformly most powerful invariant. It is proved in [10] that the sample correlation r_{i,j} is maximal invariant with respect to the group G_{c,d}. The distribution of r_{i,j} depends on ρ_{i,j} only and can be written as [1]:

f_ρ(r) = \frac{(n-2)\,Γ(n-1)}{\sqrt{2π}\,Γ(n-\frac{1}{2})} (1-ρ^2)^{\frac{1}{2}(n-1)} (1-r^2)^{\frac{1}{2}(n-4)} (1-ρ r)^{-n+\frac{3}{2}} \, F\!\left(\frac{1}{2}, \frac{1}{2}; n-\frac{1}{2}; \frac{1+ρ r}{2}\right) \qquad (6)

where

F(a, b, c, x) = \sum_{j=0}^{∞} \frac{Γ(a+j)\,Γ(b+j)\,Γ(c)}{Γ(a)\,Γ(b)\,Γ(c+j)\,j!}\, x^j

The distribution (6) has a monotone likelihood ratio [10]. Therefore, the following test

ϕ^{Corr}_{i,j}(x_i, x_j) = \begin{cases} 1, & \dfrac{r_{i,j} - ρ_0}{1 - r^2_{i,j}} > c_{i,j} \\ 0, & \dfrac{r_{i,j} - ρ_0}{1 - r^2_{i,j}} ≤ c_{i,j} \end{cases} \qquad (7)
is the uniformly most powerful invariant test with respect to the group G_{c,d} for testing the individual hypothesis h_{i,j} defined by (2). Here, c_{i,j} is chosen to make the significance level of the test equal to a prescribed value α_{i,j}. This means that for any test ϕ_{i,j} invariant with respect to the group G_{c,d} with E_{ρ_0} ϕ_{i,j} = α_{i,j}, one has

P_{ρ_{i,j}}(ϕ^{Corr}_{i,j} = 1) ≥ P_{ρ_{i,j}}(ϕ_{i,j} = 1), \quad ρ_{i,j} > ρ_0
P_{ρ_{i,j}}(ϕ^{Corr}_{i,j} = 1) ≤ P_{ρ_{i,j}}(ϕ_{i,j} = 1), \quad ρ_{i,j} ≤ ρ_0 \qquad (8)

where P_{ρ_{i,j}}(ϕ_{i,j} = 1) is the probability of rejecting hypothesis h_{i,j} for a given ρ_{i,j}.
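A minimal sketch of the individual test (7) follows. The critical value c_{i,j} is assumed to be given; in practice it would be calibrated so that the rejection probability at ρ_{i,j} = ρ_0 equals α_{i,j} (e.g., from the null distribution (6) or by Monte Carlo simulation). The function name and the simulated data are our own illustration.

```python
import numpy as np

def corr_test(x_i, x_j, rho0, c):
    """Individual test (7): reject h_{i,j} (declare an edge) when the
    statistic (r - rho0) / (1 - r^2), monotone in the sample correlation r,
    exceeds the critical value c."""
    r = np.corrcoef(x_i, x_j)[0, 1]
    stat = (r - rho0) / (1.0 - r ** 2)
    return int(stat > c)

rng = np.random.default_rng(0)
n = 200
# A strongly dependent pair: x_j equals x_i plus small noise, so the
# sample correlation is far above the threshold rho0 = 0.3.
x_i = rng.normal(size=n)
x_j = x_i + 0.1 * rng.normal(size=n)
edge = corr_test(x_i, x_j, rho0=0.3, c=0.0)
```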
5 Optimal Invariant Statistical Procedure for Threshold Graph Identification

A statistical procedure δ(x) is called invariant with respect to the group G_{c,d} if δ(g_{c,d} x) = δ(x) for ∀g_{c,d} ∈ G_{c,d}.

In this paper, we consider the following class D of statistical procedures δ(x) for network structures identification:

1. Any statistical procedure δ(x) ∈ D is invariant with respect to the group G_{c,d} of shift/scale transformations of the sample space.
2. The risk function of any statistical procedure δ(x) ∈ D is continuous with respect to the parameter.
3. The individual tests ϕ_{i,j} generated by any δ(x) ∈ D depend on the observations x_i(t), x_j(t), t = 1, ..., n only.

Define the statistical procedure δ_{Corr}(x) by

δ_{Corr}(x) = d_Q \ \text{if} \ Φ^{Corr}(x) = Q \qquad (9)

where

Φ^{Corr}(x) = \begin{pmatrix} 0 & ϕ^{Corr}_{1,2}(x) & \cdots & ϕ^{Corr}_{1,N}(x) \\ ϕ^{Corr}_{2,1}(x) & 0 & \cdots & ϕ^{Corr}_{2,N}(x) \\ \cdots & \cdots & \cdots & \cdots \\ ϕ^{Corr}_{N,1}(x) & ϕ^{Corr}_{N,2}(x) & \cdots & 0 \end{pmatrix} \qquad (10)
The main result of the paper is given in the following theorem.

Theorem 5.1 Let (X, ρ) be a Gaussian Pearson correlation network, let the loss function be additive, and let the losses a_{i,j}, b_{i,j} of the individual tests ϕ_{i,j} for testing the hypotheses h_{i,j} be connected with the significance levels α_{i,j} of the tests by a_{i,j} = 1 − α_{i,j}, b_{i,j} = α_{i,j}.
Then the statistical procedure δ_{Corr} defined by (9)–(10) is the optimal multiple decision statistical procedure in the class D.

Proof The statistical tests ϕ^{Corr} defined by (7) are invariant with respect to the group G_{c,d} of scale/shift transformations of the sample space. This implies that the multiple decision statistical procedure δ_{Corr} is invariant with respect to the group G_{c,d}. The statistical tests ϕ^{Corr} depend on x_i(t), x_j(t), t = 1, ..., n only. According to [7, 9], the risk function of the statistical procedure δ_{Corr} for an additive loss function can be written as

R(S, ρ, δ_{Corr}) = \sum_{i,j: i ≠ j} r(ρ_{i,j}, ϕ^{Corr}_{i,j}) \qquad (11)

where r(ρ_{i,j}, ϕ^{Corr}_{i,j}) is the risk function of the individual test ϕ^{Corr}_{i,j} for the individual hypothesis h_{i,j}, and ρ = (ρ_{i,j}) is the matrix of correlations. One has
r(ρ_{i,j}, ϕ^{Corr}_{i,j}) = \begin{cases} (1 − α_{i,j})\, E_{ρ_{i,j}} ϕ^{Corr}_{i,j}, & \text{if } ρ_{i,j} ≤ ρ_0 \\ α_{i,j} (1 − E_{ρ_{i,j}} ϕ^{Corr}_{i,j}), & \text{if } ρ_{i,j} > ρ_0 \end{cases}

Since E_{ρ_{i,j}} ϕ^{Corr}_{i,j} = α_{i,j} if ρ_{i,j} = ρ_0, the function r(ρ_{i,j}, ϕ^{Corr}_{i,j}) is continuous as a function of ρ_{i,j}. Therefore, the multiple decision statistical procedure δ_{Corr} belongs to the class D.

Let δ' ∈ D be another statistical procedure for TG identification. Then ϕ'_{i,j}(x) depends on x_i = (x_i(1), ..., x_i(n)), x_j = (x_j(1), ..., x_j(n)) only. A statistical procedure δ' ∈ D is invariant with respect to the group G_{c,d} if and only if the associated individual tests ϕ'_{i,j}(x) are invariant with respect to the group G_{c,d} for all i, j = 1, ..., N, i ≠ j. For an additive loss function one has (see [7, 9]):

R(S, θ, δ') = \sum_{i,j} r(θ, ϕ'_{i,j}) \qquad (12)
where

r(θ, ϕ'_{i,j}) = \begin{cases} (1 − α_{i,j})\, E_θ ϕ'_{i,j}, & \text{if } ρ_{i,j} ≤ ρ_0 \\ α_{i,j} (1 − E_θ ϕ'_{i,j}), & \text{if } ρ_{i,j} > ρ_0 \end{cases}

Since the tests ϕ'_{i,j} are invariant with respect to the group G_{c,d}, the distributions of the tests ϕ'_{i,j} depend on ρ_{i,j} only [10]. This implies r(θ, ϕ'_{i,j}) = r(ρ_{i,j}, ϕ'_{i,j}) and

R(S, θ, δ') = R(S, ρ, δ')

Risk functions of statistical procedures from the class D are continuous with respect to the parameter, so one gets E_{ρ_0} ϕ'_{i,j} = α_{i,j}. This means that the test ϕ'_{i,j} has significance
level α_{i,j}. The test ϕ^{Corr}_{i,j} is the UMP invariant test at significance level α_{i,j}; therefore, one has r(ρ_{i,j}, ϕ^{Corr}_{i,j}) ≤ r(ρ_{i,j}, ϕ'_{i,j}), i, j = 1, ..., N. Then
R(S, ρ, δ_{Corr}) ≤ R(S, ρ, δ'), \quad ∀S ∈ G, ∀δ' ∈ D, ∀ρ
Note that in many experimental works on market network analysis, the authors use sample correlations for the market graph construction. More precisely, edge (i, j) is included in the market graph if r_{i,j} > ρ_0, and it is not included if r_{i,j} ≤ ρ_0. This corresponds to the statistical procedure δ_{Corr}(x) with α_{i,j} = 0.5, i, j = 1, 2, ..., N, i ≠ j. Therefore, this practical identification procedure is optimal in the class D for the risk function equal to the sum of the expected number of false edge inclusions and the expected number of false edge exclusions. For the case ρ_0 = 0, the statistical procedure δ_{Corr}(x) is in addition optimal in the class of unbiased statistical procedures for the same additive loss function. This follows from the optimality of the tests ϕ^{Corr}(x) in the class of unbiased tests and from results in [8, 9].
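The practical procedure described above (α_{i,j} = 0.5, i.e., include edge (i, j) whenever r_{i,j} > ρ_0) can be sketched as follows; the function name and the simulated data are our own illustration, not part of the paper.

```python
import numpy as np

def market_graph(X, rho0):
    """Practical threshold-graph identification: X is an n-by-N observation
    matrix; edge (i, j) is included whenever the sample Pearson correlation
    r_{i,j} exceeds the threshold rho0."""
    R = np.corrcoef(X, rowvar=False)      # N x N sample correlation matrix
    A = (R > rho0).astype(int)
    np.fill_diagonal(A, 0)                # no self-loops
    return A

rng = np.random.default_rng(1)
n, N = 500, 4
base = rng.normal(size=n)
# Variables 0 and 1 share a strong common factor; 2 and 3 are independent noise.
X = np.column_stack([base + 0.2 * rng.normal(size=n),
                     base + 0.2 * rng.normal(size=n),
                     rng.normal(size=n),
                     rng.normal(size=n)])
A = market_graph(X, rho0=0.5)
```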
6 Concluding Remarks

The class D is defined by three conditions. All of them are important in the proof of the main theorem and cannot be removed. The condition that the risk function is continuous with respect to the parameter cannot be removed, because in the larger class without this condition the statistical procedure δ_{Corr} with significance levels α_{i,j} of the individual tests is no longer optimal. A counterexample is given by any statistical procedure of the same type as δ_{Corr} but with different significance levels of the individual tests. Note that all such statistical procedures have a discontinuous risk function for the losses a_{i,j} = 1 − α_{i,j}, b_{i,j} = α_{i,j}. The condition that the individual tests ϕ_{i,j} depend on the observations x_i(t), x_j(t) only also cannot be removed. A counterexample is given by Holm-type step-down procedures, as can be shown by numerical experiments.

Acknowledgements The work is partially supported by RFHR grant 15-32-01052 (Sections 3, 4) and RFFI grant 18-07-00524 (Section 5).
References

1. Anderson, T.W.: An Introduction to Multivariate Statistical Analysis, 3rd edn. Wiley-Interscience, New York (2003)
2. Boginski, V., Butenko, S., Pardalos, P.M.: On structural properties of the market graph. In: Nagurney, A. (ed.) Innovations in Financial and Economic Networks, pp. 29–45. Springer (2003)
3. Gather, U., Pawlitschko, J.: A note on invariance of multiple tests. Stat. Neerl. 51(3), 366–372 (1997)
4. Jordan, M.I.: Graphical models. Stat. Sci. 19, 140–155 (2004)
5. Kalyagin, V.A., Koldanov, A.P., Koldanov, P.A.: Robust identification in random variables networks. J. Stat. Plann. Inference 181, 30–40 (2017)
6. Kalyagin, V.A., Koldanov, A.P., Koldanov, P.A., Pardalos, P.M.: Optimal decision for the market graph identification problem in a sign similarity network. Ann. Oper. Res. (2017). https://doi.org/10.1007/s10479-017-2491-6
7. Koldanov, P.A.: Risk function of statistical procedures for network structures identification. Vestnik TvGU. Seriya: Prikladnaya Matematika [Trudy of Tver State University. Series: Applied Mathematics], no. 3, pp. 45–59 (2017) (in Russian)
8. Koldanov, A.P., Kalyagin, V.A., Koldanov, P.A., Pardalos, P.M.: Statistical procedures for the market graph construction. Comput. Stat. Data Anal. 68, 17–29 (2013)
9. Lehmann, E.L.: A theory of some multiple decision problems. Ann. Math. Stat. 1–25 (1957)
10. Lehmann, E.L., Romano, J.P.: Testing Statistical Hypotheses. Springer, New York (2005)
11. Wald, A.: Statistical Decision Functions. Springer (1950)
Topological Modules of Human Brain Networks Are Anatomically Embedded: Evidence from Modularity Analysis at Multiple Scales

Anvar Kurmukov, Yulia Dodonova, Margarita Burova, Ayagoz Mussabayeva, Dmitry Petrov, Joshua Faskowitz and Leonid E. Zhukov

Abstract Human brain networks show modular organization: cortical regions tend to form densely connected modules with only weak inter-modular connections. However, little is known about whether the modular structure of brain networks is reliable in terms of test–retest reproducibility and, most importantly, to what extent these topological modules are anatomically embedded. To address these questions, we use MRI data of the same individuals scanned with an interval of several weeks, reconstruct structural brain networks at multiple scales, partition them into communities, and evaluate the similarity of partitions (i) stemming from the test–retest data of the same versus different individuals and (ii) implied by network topology versus anatomy-based grouping of neighboring regions. First, our results demonstrate that the modular structure of brain networks is well reproducible in test–retest settings. Second, the results provide evidence for the theoretically well-motivated hypothesis that brain regions neighboring in anatomical space also tend to belong to the same topological modules.

Keywords Brain networks · Modularity · Community structure · Test–retest reliability · Physically embedded networks
A. Kurmukov (B) · M. Burova · A. Mussabayeva · L. E. Zhukov: Higher School of Economics, National Research University, Moscow, Russia; e-mail: [email protected]

Y. Dodonova · D. Petrov: Kharkevich Institute for Information Transmission Problems, Moscow, Russia

D. Petrov: Imaging Genetics Center, University of Southern California, Los Angeles, USA

J. Faskowitz: Indiana University, Bloomington, USA

© Springer International Publishing AG, part of Springer Nature 2018. V. A. Kalyagin et al. (eds.), Computational Aspects and Applications in Large-Scale Networks, Springer Proceedings in Mathematics & Statistics 247, https://doi.org/10.1007/978-3-319-96247-4_22
1 Introduction

Brain networks (also called connectomes) are known to have modular organization [1, 2]. This means that cortical brain regions tend to group into modules (communities) with dense anatomical and functional within-module connections and weak inter-module links. Moreover, the community structure of anatomical networks was shown to capture important information about brain anatomy and functioning, including information useful for classifying healthy and pathological brains [3, 4]. However, there are still many important questions to be answered.

First, little is known about the test–retest reliability of the modular structure of human brain networks. Recently, there have been several papers on the reproducibility of brain networks themselves (e.g., [5] and references therein), but to our knowledge the test–retest reliability of brain community structures has never been the subject of special evaluation.

Another important question is how topological modules are anatomically colocalized in the brain. Theoretically, it can be expected that brain regions belonging to the same topological module should also be neighbors in physical space. This should be advantageous in terms of minimizing the connection distance and wiring cost within a module. However, empirical work is still strongly needed to understand how the modular structure of connectomes is embedded in brain anatomy [1, 2].

In this study, we address both of the above questions. We first ask whether the community structure of anatomical brain networks is reproducible in the sense that optimal partitions of the connectomes of the same individual scanned with an interval of several weeks show high similarity. To address this question, we analyze appropriate MRI data from the Consortium for Reliability and Reproducibility [6]; we reconstruct connectomes based on different MRI scans of the same subject, find their optimal partitions into communities, and evaluate the similarity of these partitions.
We expect that the similarity of partitions stemming from connectomes of the same individual is high and, in particular, higher than that of connectomes of different individuals.

Second, we ask whether topological modules of brain networks are at least to some extent anatomically embedded. For each high-resolution brain network, we independently produce two partitions. The first partition is purely topological and only uses information about the network connectivity of the respective regions. The second partition uses only minor topological information and mainly groups nodes into modules based on their anatomical spacing (and hence their belonging to the same anatomical regions in a low-resolution analysis). We estimate and compare the modularity of both types of partitions, and evaluate the similarity of the obtained community structures. Details on the procedure are provided in the next section.
2 Data and Experimental Settings

We describe the data and preprocessing steps in this section, and also explain the terms from network analysis most relevant to our study. Data analysis procedures are described in the next section, each immediately followed by the results obtained on this dataset.
2.1 Dataset

We used a dataset from the Consortium for Reliability and Reproducibility (CoRR, [6]). Data sites within CoRR were chosen due to the availability of T1-weighted (T1w) and diffusion-weighted images (DWI) with a retest period of less than two months, as described in [5]. The dataset included 49 participants aged 19–30 years. All subjects were healthy volunteers without any psychiatric or neurological disorders. For each participant, two MRI sessions were performed with a retest period of about 6 weeks (40.9 ± 4.51 days). Details on the sample and MRI acquisition procedure are available on the website of the CoRR project [6].
2.2 Tractography and Brain Parcellation

We followed the same MRI preprocessing steps as described in [5]. We used the probabilistic constant solid angle (CSA) approach to reconstruct white matter structures. This method allows for estimating the fiber orientation distribution (FOD) within each voxel. Streamlines reconstructed with this method gave information about the edge weights of brain networks.

To define the nodes of brain networks at multiple scales, we followed a procedure similar to that described in [7, 8]. Based on the Lausanne atlas, the cortex of each subject was divided into 68 cortical regions. The obtained regions were then subdivided into 1000 small regions so that the coverage area of each region was about 1.5 cm². To obtain parcellations at multiple scales, these 1000 small regions were additionally combined into 448, 219, and 114 nodes; in each step, neighboring regions of interest were manually combined into larger regions. This successive grouping thus gave us a hierarchical decomposition between 68 and 1000 brain regions; importantly, smaller regions in high-resolution parcellations were strictly embedded into larger low-resolution regions. We counted the number of streamlines having endpoints in each pair of labels for each parcellation and used these counts as edge weights in each constructed network. We thus obtained sets of 98 brain networks (two for each of 49 individuals) at five different scales.
2.3 Notation

We use the following definitions.

Weighted network. Let G = (V, E, W) be an undirected graph (network), where V = {1, ..., n} is a set of nodes, E is a set of edges of the graph, A = {a_{i,j}}_{i,j=1}^{n} is its adjacency matrix, and W = {w_{i,j}}_{i,j=1}^{n} is a connectivity matrix in which each element represents the weight of the respective edge.

Modularity is defined as follows:

Q = \frac{1}{s} \sum_{i,j=1}^{n} \left( w_{i,j} − \frac{s_i s_j}{s} \right) δ_{i,j},

where s_i = \sum_{j=1}^{n} w_{i,j} is the weighted degree of node i, s = \sum_{i=1}^{n} s_i is the sum of weights of the whole graph, and δ_{i,j} equals 1 if nodes i and j belong to the same community and 0 otherwise. Note that modularity is always defined with respect to some network partition.

Rand Index (RI) [9] is used to measure the similarity of two partitions U, V:

RI(U, V) = \frac{a + d}{a + b + c + d},

where a is the number of pairs of objects placed in the same community in both U and V, b is the number of pairs placed in the same community in U and in different communities in V, c is the number of pairs placed in different communities in U and in the same community in V, and d is the number of pairs placed in different communities in both U and V. Here, we use an adjusted version of RI (ARI) [10]. Similarly to RI, it takes values up to 1, with values closer to 1 for more similar partitions.
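Both quantities have standard implementations. The following sketch computes the modularity of a partition of a small weighted graph with networkx and the adjusted Rand index with scikit-learn; the toy graph and label vectors are our own illustration.

```python
import networkx as nx
from networkx.algorithms.community import modularity
from sklearn.metrics import adjusted_rand_score

# A tiny weighted graph with two obvious modules, {0,1,2} and {3,4,5},
# joined by a single weak inter-module link.
G = nx.Graph()
G.add_weighted_edges_from([(0, 1, 2.0), (1, 2, 2.0), (0, 2, 2.0),
                           (3, 4, 2.0), (4, 5, 2.0), (3, 5, 2.0),
                           (2, 3, 0.1)])
partition = [{0, 1, 2}, {3, 4, 5}]
Q = modularity(G, partition, weight="weight")

# ARI between two partitions given as node-label vectors.
labels_a = [0, 0, 0, 1, 1, 1]     # the partition above
labels_b = [0, 0, 1, 1, 1, 1]     # node 2 moved to the other module
ari_same = adjusted_rand_score(labels_a, labels_a)
ari_diff = adjusted_rand_score(labels_a, labels_b)
```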
3 Experiments and Results

3.1 Modular Structure of Brain Networks and Its Reliability

We first analyzed the set of connectomes with 68 nodes. This was the only reasonable network size for which the globally optimal modularity score and the corresponding community structure could be found. First, all networks showed strong modularity, with a sample average modularity score of 0.568 and standard deviation of 0.010. Figure 1 shows an example connectome of this size, with nodes located at their physical 3D coordinates and colored according to their best partition into communities.
Topological Modules of Human Brain Networks …
303
Fig. 1 Example brain network and its optimal modularity structure. a Brain network with 68 nodes shown in their physical 3D coordinates (axial view). b The same brain network with 68 nodes, node colors show its optimal modular structure. c Connectivity matrix of the same network. d Connectivity matrix with four optimal modules shown by colored squares
Although the number of modules was not constrained during partitioning, the algorithm stopped at a four-module community structure in all networks. All modules were intra-hemispheric, with two of them located in the left hemisphere and two in the right one (with a single exception: in one network, one node was attributed to a module from the “wrong” hemisphere). We next addressed the question of reliability of the obtained community structures. For each individual, we had two MRI scans and thus two connectomes and two corresponding partitions. We estimated the similarity of these two partitions by computing the ARI. We also computed the ARI for all pairs of partitions from different individuals. For the same-individual pairs, the average ARI value was 0.976, with a standard deviation of 0.037 across individuals; the median value was exactly 1, indicating that for most individuals the community structures of the test and retest connectomes were identical. For interindividual pairs, ARI values were still high, but significantly lower than those obtained for the same-individual pairs. The average ARI value for modular structures of different participants was 0.920, with a standard deviation of 0.061 (median value 0.923).
Finally, we performed an additional analysis to evaluate whether the same or very similar modular structures could be found by an algorithm that does not necessarily stop at the globally maximal modularity value (compared with the respective optimal partitioning). We used the Louvain algorithm for this analysis. For each network of size 68, we found its best Louvain partition in addition to the optimal one. Modularity values obtained with the Louvain algorithm were exactly the same as or close to the optimal ones in most networks, with a sample average Louvain modularity score of 0.568 and a standard deviation of 0.010. However, close modularity scores do not necessarily indicate similarity of the corresponding partitions. To compare the optimal modular structure to that obtained by the Louvain algorithm, we again used the ARI. We computed the pair-wise similarity score between the partitions obtained with the two algorithms for each connectome. The mean sample ARI value was 0.895, with a standard deviation of 0.095 (recall that the ARI value for two identical partitions is 1). High ARI values indicate that the Louvain algorithm revealed modular structures identical or very close to those corresponding to globally maximal modularity. This allows us to use the Louvain algorithm instead of global optimization in further analyses, a step needed to evaluate the modular structure of the larger networks.
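This kind of comparison can be sketched on a synthetic planted-partition graph rather than the connectome data (networkx's Louvain implementation is assumed; community sizes and edge probabilities are invented for illustration):

```python
import networkx as nx

# Four planted modules of 10 nodes each: dense inside, sparse between
G = nx.planted_partition_graph(4, 10, p_in=0.8, p_out=0.05, seed=1)

# Reference ("optimal-like") partition vs. the Louvain partition
reference = [set(range(k * 10, (k + 1) * 10)) for k in range(4)]
louvain = nx.community.louvain_communities(G, seed=1)

Q_ref = nx.community.modularity(G, reference)
Q_louvain = nx.community.modularity(G, louvain)
# On strongly modular graphs the two scores are very close, mirroring the
# near-optimal behaviour of Louvain reported for the 68-node connectomes.
```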
3.2 Anatomical Embedding of Topological Modules We now consider sets of 98 connectomes reconstructed at other scales, with 114, 219, 448, and 1000 nodes (note that information on whether connectomes belong to the same or different individuals is not relevant in what follows). Figure 2 shows an example of 1000-node connectome in its physical 3D coordinates, with the corresponding 68-node connectome put in the same space. Each node in a 1000-node connectome corresponds to some node in a 68-node connectome; this means that for the 1000-node connectome, we already have a partitioning into 68 communities that is strictly anatomically determined. By addressing a question of whether topological modules are anatomically embedded, we, in fact, aim to check that optimal topological partition of a network is close to that defined based on anatomical neighboring. The most straightforward approach to checking this hypothesis would be to find the best possible topological partition of a 1000-node network into 68 communities and compare this topological partition to that defined a priori from anatomical parcellation. However, this approach does not work in practice because the number of communities is too large and hence the obtained topological partitions are too far from being optimal (to compare with, even the first iteration of the Louvain algorithm reveals on average only 22 communities in a 1000-node network). We thus propose the following procedure. For a 1000-node network, we find its best topological partition using Louvain algorithm. On the other hand, we consider a corresponding 68-node network (obtained from the same MRI scan) and its optimal partition into 4 modules described in the previous section. We map this four-module
Fig. 2 Left Brain networks with 68 (light red circles) and 1000 (black dots) nodes shown in the same physical space (axial view); 3 nodes of the 68-node network are colored with blue, pink, and green, and their corresponding nodes in a 1000-node network are colored accordingly. Right Connectivity matrix of the same 1000-node network; light red boxes show how the nodes of this network are grouped to produce nodes of a 68-node network, and the same 3 nodes are additionally colored
optimal partition from a low-resolution network to a high-resolution network so that each node in a 1000-node network inherits its community membership from the respective parental node in the 68-node network. This mapped partition of the 1000-node network certainly carries some information about network topology because we used topological modules of a low-resolution network for mapping. But still, this mapped partition is substantially determined by the anatomy. It, in fact, ignores topology of a 1000-node network and mainly puts nodes to a module based on whether or not the respective small cortex areas belong to a given larger region in the anatomical parcellation. For each 1000-node network, we thus know its best Louvain partition and the corresponding modularity score, and its mapped partition and the corresponding modularity score. We finally evaluate similarity between these two partitions by computing ARI. The above procedure was repeated for networks of size 114, 219, and 448 (with 68-node networks always being a reference for obtaining mapped partitions). Table 1 reports the results. First, networks of all sizes were again confirmed to be highly modular. Network modularity steadily increased with increasing resolution (network size). Second, modularity scores of the anatomically mapped partitions were only marginally lower than those obtained for the best topological partitions. In other words, ignoring actual connections between nodes in a high-resolution network and simply coloring them based on anatomical membership of the respective cortex areas did not result in a dramatic drop in modularity values.
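The mapping step can be sketched as follows; the parent relation and partitions here are tiny invented stand-ins for the 68-node and 1000-node networks:

```python
import networkx as nx

# child -> parental low-resolution node (hypothetical hierarchy)
parent = {0: "A", 1: "A", 2: "B", 3: "B", 4: "C", 5: "C"}
# module of each parental node in the low-resolution partition
parent_module = {"A": 0, "B": 0, "C": 1}

# Each high-resolution node inherits the module of its parental node
mapped = {v: parent_module[parent[v]] for v in parent}

# Modularity of the mapped partition on a toy high-resolution graph
G = nx.Graph([(0, 1), (0, 2), (1, 3), (2, 3), (4, 5), (3, 4)])
modules = {}
for v, m in mapped.items():
    modules.setdefault(m, set()).add(v)
Q_mapped = nx.community.modularity(G, list(modules.values()))
```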
Table 1 Mean values (standard deviations) of the modularity scores obtained for each network size by the Louvain partitioning algorithm (top row) and by mapping the partitioning of the corresponding low-resolution network (middle row), and the adjusted Rand index indicating the similarity between these two partitions (bottom row)

                              114 nodes      219 nodes      448 nodes      1000 nodes
Modularity, best partition    0.598 (0.011)  0.631 (0.010)  0.665 (0.011)  0.691 (0.011)
Modularity, mapped partition  0.592 (0.010)  0.615 (0.009)  0.629 (0.009)  0.642 (0.009)
ARI for two partitions        0.762 (0.114)  0.652 (0.099)  0.561 (0.068)  0.525 (0.051)
Fig. 3 Different ways to illustrate that topological modules of high-resolution networks are nested within the anatomically mapped network structure inherited from the low-resolution parental network. a Connectivity matrix of the high-resolution network; the eight topological modules of the best partition are shown with light red squares, and the four anatomically mapped modules inherited from the low-resolution network are shown with blue, green, violet, and pink squares; the former partition is almost strictly embedded into the latter. b Hierarchical tree with four levels: I—whole 1000-node connectome, II—two hemispheres, III—four modules inherited from the anatomically mapped partition of the low-resolution network, and IV—eight modules obtained by Louvain partitioning of the high-resolution network. c Cross tabulation (contingency table) of co-occurrence of the high-resolution network labels in the eight modules of the best Louvain partition (horizontal) and the four modules of the anatomically mapped partition (vertical); again, the best-partitioned topological modules are almost strictly submodules of the anatomically mapped partition
Third, these anatomically mapped partitions were highly similar to the best topological partitions of the respective networks, as indicated by moderate to high ARI values. Importantly, computation of ARI does not require that partitions have the same number of modules. The obtained moderate to high ARI values also do not necessarily
indicate that the best partitions of the high-resolution networks resembled the four-module structure of the anatomically mapped partitions and the parental low-resolution networks. High-resolution networks could show a larger number of modules, but what was important for ARI not to drop was that these smaller topological modules were hierarchically nested within the four-module anatomically mapped structure. Figure 3 illustrates this idea for a high-resolution 1000-node network. The Louvain algorithm revealed an eight-module community structure in this network; however, these topological modules were almost strictly embedded into the four-module structure obtained by anatomical community mapping of the parental low-resolution network.
4 Conclusions We analyzed the modular structure of anatomical brain networks defined at different scales, from low-resolution networks that represented connectivity of atlas cortical regions to high-resolution 1000-node networks. We worked with a dataset from the Consortium for Reliability and Reproducibility that included data obtained from the same individuals with a time interval of several weeks, which additionally allowed us to evaluate test–retest reliability of the community structure of brain networks. First, our experiments confirm that structural brain networks are highly modular. Brain regions tend to form a small number of densely connected modules, with the number of topologically defined modules varying from four in the low-resolution networks based on the Lausanne anatomical atlas to eight in the highest resolution 1000-node networks. Second, we proposed to use a measure of similarity between partitions (the Rand index) for evaluating test–retest reliability of the community structure of brain networks. Our experiment confirmed that the modular organization of anatomical brain networks is highly reliable in the sense that it is almost perfectly reproduced in networks reconstructed from different MRI scans of the same individual taken with an interval of several weeks. Third, and most important, for the high-resolution networks, we considered two approaches to partitioning brain regions into communities. The first one was purely topological and revealed optimal modules based only on network connections between the nodes. The second one was largely anatomical, because nodes were put into communities based on the fact that they were anatomically neighboring and belonged to the same cortical region in the anatomical atlas (and hence to the same parental node in a low-resolution network).
We demonstrated that modularity with respect to these latter partitions was still very high, and only slightly lower than modularity estimated with respect to topologically optimal partitions. Moreover, we demonstrated that the similarity between topologically optimal and anatomically implied partitions was very high. Topological modules largely resembled the anatomical grouping of neighboring cortical regions. By using multi-scale analysis and network algorithms for partitioning and for comparing partitions, we found new
evidence in support of the theoretically well-motivated hypothesis that brain regions neighboring in anatomical space also tend to belong to the same topological modules. Acknowledgements The publication was prepared within the framework of the Academic Fund Program at the National Research University Higher School of Economics (HSE) in 2017 (grant 16-05-0050) and by the Russian Academic Excellence Project “5-100”.
References
1. Meunier, D., Renaud, L., Bullmore, E.T.: Modular and hierarchically modular organization of brain networks. Front. Neurosci. 4 (2010)
2. Meunier, D., Lambiotte, R., Fornito, A., Ersche, K.D., Bullmore, E.T.: Hierarchical modularity in human brain functional networks. Front. Neuroinform. 3 (2009)
3. Kurmukov, A., Dodonova, Y., Zhukov, L.: Classification of normal and pathological brain networks based on similarity in graph partitions. In: 2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW)
4. Kurmukov, A., Dodonova, Y., Zhukov, L.: Machine learning application to human brain network studies: a kernel approach. In: Models, Algorithms, and Technologies for Network Analysis. NET 2016. Springer Proceedings in Mathematics and Statistics, vol. 197 (2017)
5. Petrov, D., Ivanov, A., Faskowitz, J., Gutman, B., Moyer, D., Villalon, J., Thompson, P.: Evaluating 35 methods to generate structural connectomes using pairwise classification. In: International Conference on Medical Image Computing and Computer-Assisted Intervention (2017)
6. He, Y.: Connectivity-based brain imaging research database (C-BIRD) at Beijing Normal University
7. Cammoun, L., Gigandet, X., Meskaldji, D., Thiran, J.P., Sporns, O., Do, K.Q., Hagmann, P.: Mapping the human connectome at multiple scales with diffusion spectrum MRI. J. Neurosci. Methods 203(2) (2012)
8. Hagmann, P., Kurant, M., Gigandet, X.: Mapping human whole-brain structural networks with diffusion MRI. PLoS ONE 2(7) (2007)
9. Rand, W.M.: Objective criteria for the evaluation of clustering methods. J. Am. Stat. Assoc. 66(336), 846–850 (1971)
10. Vinh, N.X., Julien, E., Bailey, J.: Information theoretic measures for clusterings comparison: variants, properties, normalization and correction for chance. J. Mach. Learn. Res. 11 (2010)
Commercial Astroturfing Detection in Social Networks Nadezhda Kostyakova, Ilia Karpov, Ilya Makarov and Leonid E. Zhukov
Abstract One of the major problems for recommendation services is commercial astroturfing. This work is devoted to constructing a model capable of detecting astroturfing in customer reviews based on network analysis. The model projects a multipartite network onto a unipartite graph, in which we detect communities that represent actors with falsified opinions. Keywords Social network analysis · Astroturfing · Bipartite network · Recommendation system
1 Introduction Social networking services have become highly popular around the world. Most of these services consist of connections between people, while there are also many recommendation networks, which connect people and products. People recommend products by rating them, writing reviews about them, and commenting on the reviews of others. Consequently, recommendation networks contain a large amount of useful information, which can be used for the analysis of different patterns. However, since this information is used by consumers to make decisions about buying products, there is a strong motivation to abuse it by promoting or disparaging certain products. This type of abuse is called commercial astroturfing, and detecting it is the main goal of this paper. N. Kostyakova · I. Karpov · I. Makarov (B) · L. E. Zhukov Higher School of Economics, National Research University, Moscow, Russia
© Springer International Publishing AG, part of Springer Nature 2018 V. A. Kalyagin et al. (eds.), Computational Aspects and Applications in Large-Scale Networks, Springer Proceedings in Mathematics & Statistics 247, https://doi.org/10.1007/978-3-319-96247-4_23
309
310
N. Kostyakova et al.
Let us formulate the model in terms of network structure. A social network of recommendations can be represented as a bipartite network: the first subset of nodes consists of users, and the second one consists of brands. If a user i wrote a review on a product of a brand j, then there is an edge (i, j). Hence, the main goal of this analysis is to find a cluster of users who produce fake reviews. Different approaches to solving the community detection problem were described in [1]. However, all of these methods were intended for unipartite networks. Consequently, algorithms for community detection in bipartite networks were described in [2–4], such as modularity-based, label propagation, and weighted projection algorithms. The best results were shown by weighted projection algorithms based on random walks. We propose a new pipeline that uses existing community detection algorithms for detecting commercial astroturfing and evaluate our model on a collected dataset of reviews from a certain Internet shop. This paper consists of seven parts. The second part presents a detailed description of related papers and their comparison. The third part covers the construction of the dataset, comprising the loading stage, a description of the dataset, and the preprocessing step. The fourth part gives a detailed description of the constructed model. The fifth part describes the implementation process. The sixth part contains the results of the analysis on the dataset. Finally, the last part summarizes the results and presents the conclusions of the entire paper.
2 Related Work This section is devoted to the analysis of existing approaches and basic notions in the problem area. In general, a subset of vertices forms a community if the induced subgraph is dense, but there are relatively few connections from the included vertices to vertices in the rest of the graph [5, 6]. The main approaches to the community detection problem were reviewed in [1]. We briefly describe the algorithms that we use in our research. The first method is the divisive algorithm by Girvan and Newman [7]. The idea is to detect the edges that connect vertices of different communities and remove them, so that the clusters get disconnected from each other. Detecting these edges is based on the edge betweenness metric, i.e., the number of shortest paths between all vertex pairs that run along the edge. The second method uses the modularity metric [1], forming communities with the highest modularity value. The modularity metric stands for the difference between the actual density of edges in a subgraph and the expected density of that subgraph in a random graph. The third method uses random walks [1] for detecting communities, because a random walker will spend a long time inside a community due to its high internal edge density. The fourth method is the label propagation model [8], which initially takes a subset of nodes with their labels; then, at every step, a randomly picked vertex takes the label shared by the majority of its neighbours. In what follows, we describe algorithms specifically designed for community detection in bipartite networks. In [4], a modularity-based algorithm was applied
to bipartite networks. A modified modularity metric suitable for bipartite networks was defined, together with an algorithm for community detection based on this metric (BRIM). The authors of [3] presented a model based on a combination of two algorithms: the extended modularity-based algorithm (BRIM) and the label propagation method. The main disadvantage of this model is that it needs correct (at least, not random) initial labelling, which is not always available. In contrast with standard approaches, in [2] a weighted projection algorithm was presented, showing state-of-the-art results. It transforms a bipartite graph into a unipartite one and uses random walks to detect communities. Although the astroturfing problem is relatively new, there are many papers examining different techniques for modelling it. In [9], the authors investigate political astroturfing in social networks. Their system collects data from networking services (Twitter, Yahoo Meme, and Google Buzz), builds a dataset based on network analysis and sentiment analysis, marks every post as “truthy”, “legitimate”, or “remove”, and finally uses classification methods (AdaBoost and SVM) on this dataset. The results of this analysis are quite good: the system accurately detects truthy memes based on features extracted from the topology of the diffusion networks. In [10], different astroturfing campaigns in social networks relying on free-text posts were studied. This research was based on two datasets of users and their posts collected from Twitter. The preprocessing step consisted of manually marking a small dataset of posts as sharing similar talking points, then building a network in which nodes are posts and edges represent similarities between these posts (based on text mining), and attempting to identify campaigns using network analysis. Campaigns were defined as connected components, cliques, or dense subgraphs.
In [11], the authors studied the problem of social botnets. This research is based on a dataset of suspected users from Twitter who posted tweets about the Syrian war at a particular time and retweeted tweets of a particular user (a news aggregator bot). This dataset forms a network in which nodes are users and connections represent retweet events. The research made it possible to reveal some behavioural patterns of bots. The problem of detecting promoted social media campaigns was studied in [12]. The dataset consisted of Twitter posts corresponding to Twitter trends. There were two types of trends in this social network: organic trends, which appear naturally, and promoted ones, which were formed for a fee. After extracting data from the social service, several network-related, user account, timing, text, and sentiment features were computed. For the network-related features, three networks were built (retweet, mention, and hashtag co-occurrence), and some network properties were calculated for each of them. The user account group of features consisted of some simple metrics, such as the distribution of follower numbers, the number of tweets produced by the users, etc. Time-related features included the number of tweets produced in a given time interval, etc. K-Nearest Neighbours with dynamic time warping and Random Forest were used to analyse the dataset. The results of this analysis showed high accuracy (AUC = 95%) and led to the conclusion that network-based and content-based features are the most valuable for separating promoted campaigns from non-promoted ones.
To sum up, two main approaches to solving the astroturfing problem were described. The first one uses network analysis, while the other one is based on forming a dataset using specific methods such as text mining or content analysis, marking each observation as promoted or not, and, finally, classification. We concentrate on the first approach and aim to show that it is sufficient to use network properties, even for a small dataset, to detect astroturfing. We used a modified version of the Weighted Projection Algorithm, since it allows us to use well-studied methods of community detection in unipartite networks. The first step of this algorithm is building a multimodal network. The second step is building a bimodal network based on the multimodal one. The third step is projecting the bimodal network onto a unimodal one. And the final step is detecting communities in this unipartite network using two methods: Louvain and label propagation.
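The classical community detection methods reviewed in this section are available in networkx; the sketch below runs three of them on Zachary's karate club graph (a standard benchmark, not the review data). The random-walk family (e.g., Walktrap) is not part of networkx itself and would come from libraries such as igraph.

```python
import networkx as nx
from networkx.algorithms import community

G = nx.karate_club_graph()

# Girvan-Newman: repeatedly remove the highest edge-betweenness edge;
# the first yielded level is a partition into two communities
first_split = next(community.girvan_newman(G))

# Greedy modularity maximisation
greedy = community.greedy_modularity_communities(G)

# Label propagation: nodes repeatedly adopt the majority label of neighbours
lp = list(community.label_propagation_communities(G))
```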
3 Dataset Loading To address the problem of commercial astroturfing, a dataset was collected from a recommendation social network, IRecommend. This service is available online, but its site is protected from automated parsing; therefore, the data loading process was built using the Python package Selenium, which can run a browser and imitate a real user. There was no information about which reviews were artificial, so our task had to be formulated in terms of unsupervised learning. The IRecommend service contains information on the products in each product category, and for each product there are reviews written by users. All reviews written by a particular user can be seen. This is a key property of a recommendation system because it makes it possible to observe the connections between users and products. The dataset consists of 36916 reviews and has seven features. Each review includes the features presented in Table 1.
Table 1 Description of the dataset

Name      Description
User      Link to the user who wrote the review
Product   Link to the product on which the review was written
Brand     Brand of the product in the review. Most reviews have this feature, but some products do not
Time      Date and time of posting the review
Text      Text of the review. The language of the reviews is Russian. Since the text processor cannot process emoji, they have been deleted from the text
Rating    Rating of the product given by the user in the review. It can take values from 1 to 5
Sent      Dummy variable which takes the value 0 if the review is negative and 1 otherwise
Preprocessing Since there were empty values in the dataset, preprocessing was performed. First, if the value of the User or Brand feature of an observation was empty, that observation was removed. Second, if the value of the Rating feature was empty, a neutral value of 3 was set. After this step, the dataset consists of 36894 reviews from 266 users on 5407 brands.
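These two preprocessing rules can be sketched with pandas on a toy frame that mimics the dataset's schema (all values invented):

```python
import pandas as pd

reviews = pd.DataFrame({
    "User":   ["u1", "u2", None, "u3"],
    "Brand":  ["b1", None, "b2", "b1"],
    "Rating": [5, 4, 3, None],
})

# First rule: drop observations with an empty User or Brand value
clean = reviews.dropna(subset=["User", "Brand"]).copy()
# Second rule: impute the neutral rating 3 where Rating is empty
clean["Rating"] = clean["Rating"].fillna(3)
```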
4 Model Description Our method consists of several steps. Since the initial network in the recommendation service is three-partite, the first step is building the three-partite network of users, products, and brands. The second step is building a bipartite network of users and brands. The third step is projecting the bipartite network onto a unipartite network of users. And the final stage is community detection using the modified version of the Weighted Projection Algorithm.
4.1 Building a Three-Partite Network The first step, building the three-partite network of users, products, and brands, is presented in Fig. 1. Every edge between users and products has a weight equal to the rating value in the review.
4.2 Building a Bipartite Network The second step is building a bipartite network of users and brands using averaging of weights (Fig. 2).
Fig. 1 Example of a three-partite network
Fig. 2 A bipartite network
Fig. 3 A fully connected unipartite network
a_i^j = \begin{cases} \dfrac{\sum_{k=1}^{n} c_i^{kj}}{n}, & \text{if } n > 0, \\ 0, & \text{otherwise}, \end{cases}

where c_i^{kj} is the rating value of the i-th user on the k-th product of the j-th brand, n is the number of products of the j-th brand reviewed by the i-th user, and a_i^j is the resulting rating value of the i-th user on the j-th brand.
4.3 Building a Unipartite Network The third step is projecting the bipartite network onto a unipartite network of users (Fig. 3). First, a similarity metric should be calculated between all users. In this work, the cosine similarity metric was used. Let A = (a_i^j) be the adjacency matrix of the bipartite network. Then, the similarity measure between the h-th and l-th users is defined as follows:

p_{hl} = \frac{(a_h, a_l)}{\|a_h\|_2 \, \|a_l\|_2},
where a_h is the h-th row of the matrix A, representing the vector of aggregated brand ratings of the h-th user, and a_l is the l-th row of A, the corresponding vector for the l-th user. Second, to decrease the number of edges, a threshold was defined, so that only highly similar users are connected in this network (Fig. 4).
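The projection and thresholding can be sketched with NumPy; the rating matrix below is invented, and the 75th-percentile threshold matches the implementation parameters reported later in the paper.

```python
import numpy as np

# Rows = users, columns = brands; entries = averaged brand ratings (toy values)
A = np.array([[5.0, 4.0, 0.0],
              [4.0, 5.0, 0.0],
              [0.0, 1.0, 5.0]])

# Cosine similarity p_hl between every pair of users
norms = np.linalg.norm(A, axis=1)
P = (A @ A.T) / np.outer(norms, norms)
np.fill_diagonal(P, 0.0)

# Connect only highly similar users: keep edges above the 75th percentile
threshold = np.percentile(P[np.triu_indices_from(P, k=1)], 75)
adjacency = (P > threshold).astype(int)
```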
Fig. 4 A unipartite network
4.4 Community Detection on a Unipartite Network The fourth step is detecting communities in this unipartite network. Two methods were used: Louvain and label propagation (LP). Since the LP algorithm needs initial labelling, a set of suspicious users and a set of suspicious brands should be considered. We used the assumption that users giving only high grades can be considered suspicious. Since the dataset does not contain the actual labelling needed to evaluate the obtained results, a manual verification procedure was performed, i.e., after labelling nodes using LP, every node labelled as “suspicious” was checked manually.
4.5 Implementation Here, we describe the parameters used in the model. The first parameter is the threshold defining edges in the unipartite network; it was set to the 75th percentile. The second parameter is the threshold defining which grades are considered high at the initial labelling step; it was set to the 90th percentile. The third parameter is the fraction of neighbours that should be labelled in the LP algorithm; it was set to 0.2.
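A sketch of the final step on a toy user-similarity graph (networkx's Louvain and asynchronous label propagation are assumed stand-ins; the seeded LP variant with suspicious initial labels described above would need a custom implementation):

```python
import networkx as nx

# Toy unipartite user graph: two triangles bridged by one edge
G = nx.Graph([("u1", "u2"), ("u2", "u3"), ("u1", "u3"),
              ("u4", "u5"), ("u5", "u6"), ("u4", "u6"), ("u3", "u4")])

# Louvain partitioning; seed fixed for reproducibility
louvain = nx.community.louvain_communities(G, seed=42)

# Unseeded asynchronous label propagation (no suspicious-label initialisation)
lp = list(nx.community.asyn_lpa_communities(G, seed=42))
# Both methods assign every user to exactly one community
```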
5 Discussion 5.1 Louvain First, the Louvain algorithm was used. Six communities were detected (Fig. 5). To understand which clusters were found, some measures were calculated (Table 2). It can be seen that
• there are two single-node communities;
• the third community has high values of the average number of products, the average number of brands, and the average rating value compared to the others;
• the node in the sixth community wrote an extremely large number of reviews.
Fig. 5 Communities using Louvain

Table 2 Results of community detection using Louvain
                             1 com.   2 com.   3 com.   4 com.   5 com.   6 com.
Size of community            80       39       86       59       1        1
Average number of products   39.51    130.02   284.5    48.98    171      1068
Average number of brands     21.26    58.51    119.23   24.25    101      568
Average rating value         4.213    4.215    4.274    4.191    4.456    4.075
Additionally, frequent 3-grams were extracted. It was found that
• the third community uses many verbs such as “buy”, “like”, “love”, and so on;
• the node in the sixth community is very different from the others in terms of frequent 3-grams; it used many words about food, such as “taste”, “composition of the product”, and so on.
To sum up, these communities differ not only in terms of their brand preferences, but also in terms of their behaviour.
Fig. 6 Communities using LP

Table 3 Results of community detection using LP
                             1 com.    2 com.
Size of community            37        234
Average number of products   154.67    134.61
Average number of brands     57.648    61.683
Average rating value         4.588     4.172
5.2 LP Second, the LP algorithm was used. Two communities were detected (Fig. 6). To understand which clusters were found, some measures were calculated (Table 3). It can be seen that
• these communities are approximately equal in terms of the average number of products and the average number of brands;
• the average rating value of the first community is much higher; this can be explained by the design of the initial labelling.
Additionally, frequent 3-grams were extracted. It was found that reviews of users in the first community frequently contain the word “review”. This can be explained by the fact that the service pays users if their reviews are popular, and users refer to their previous reviews to earn money. Consequently, this community can be defined as users who write reviews for money. We can say that these communities differ not only in terms of their brand preferences, but also in terms of their behaviour.
One of the key text features was the number of links to one's own reviews, which was found to be a property of a normal user who wants to attract attention and earn trust from readers. Apart from that, although we have no actual supervised data, we found that the suspicious communities detected by our algorithm indeed exhibited biased average ratings for groups of brands having no connection to each other, which may be regarded as an indirect sign of commercial astroturfing.
6 Conclusion

We proposed a method for astroturfing detection. It consists of four steps: building a multimodal network, building a bimodal network, building a unimodal network, and community detection on the unimodal network. This method makes it possible to detect communities of users that are similar in terms of their recommendations. It was found that these communities differ in terms of their behavioural and linguistic features. Moreover, suspicious communities can be found using additional information.

Acknowledgements The article was supported within the framework of a subsidy by the Russian Academic Excellence Project '5-100' and RFBR grant 16-29-09583 "Methodology, techniques and tools of recognition and counteraction to organized information campaigns on the Internet".
References
1. Fortunato, S.: Community detection in graphs. Phys. Rep. (2010)
2. Alzahrani, T., Horadam, K.: Community detection in bipartite networks: algorithms and case studies. Complex Systems and Networks (2016)
3. Liu, X., Murata, T.: Community detection in large-scale bipartite network. In: Proceedings of IEEE/WIC/ACM (2009)
4. Barber, M.: Modularity and community detection in bipartite networks. Phys. Rev. (2007)
5. Schaeffer, S.: Graph clustering. Comput. Sci. Rev. (2007)
6. Wasserman, S., Faust, K.: Social Network Analysis: Methods and Applications (Structural Analysis in the Social Sciences) (1994)
7. Newman, M., Girvan, M.: Finding and evaluating community structure in networks. Phys. Rev. (2004)
8. Raghavan, U., Albert, R., Kumara, S.: Near linear time algorithm to detect community structures in large-scale networks. Phys. Rev. (2007)
9. Ratkiewicz, J., Conover, M., Meiss, M., Gonçalves, B., Patil, S., Flammini, A., Menczer, F.: Detecting and tracking the spread of astroturf memes in microblog streams. Technical Report [cs.SI], CoRR (2010). arXiv:1011.3768
10. Abokhodair, N., Yoo, D., McDonald, D.: Content-driven detection of campaigns in social media. In: Proceedings of CIKM (2011)
11. Lee, K., Caverlee, J., Cheng, Z., Sui, D.: Dissecting a social botnet: growth, content and influence in Twitter. In: Proceedings of the 18th ACM Conference on Computer-Supported Cooperative Work and Social Computing (2015)
12. Ferrara, E., Varol, O., Menczer, F., Flammini, A.: Detection of promoted social media campaigns. In: Proceedings of the International AAAI Conference on Web and Social Media (2016)
Information Propagation Strategies in Online Social Networks
Rodion Laptsuev, Marina Ananyeva, Dmitry Meinster, Ilia Karpov, Ilya Makarov and Leonid E. Zhukov
Abstract Online social networks play a major role in the spread of information on a very large scale. One of the major problems is to predict information propagation using social network interactions. The main purpose of this paper is to construct a heuristic model of a weighted graph, based on empirical data, that can outperform the existing models. We suggest a new approach to constructing an information propagation model by fitting specific weights to a given network.
Keywords Social network analysis · Information propagation · Social networks
1 Introduction

Online social networks are among the most effective and fastest tools for the spread of information. These technologies enable individuals to share information simultaneously with an audience of any size on different topics of interest. For instance, in online social networks, this process can be implemented via reposts, i.e., posts that copy information from another post while preserving a link to the source. Profound knowledge of the core principles of information propagation and the ways of its
R. Laptsuev · M. Ananyeva · D. Meinster · I. Karpov · I. Makarov (B) · L. E. Zhukov Higher School of Economics, National Research University, Moscow, Russia e-mail:
[email protected] R. Laptsuev e-mail:
[email protected] M. Ananyeva e-mail:
[email protected] D. Meinster e-mail:
[email protected] I. Karpov e-mail:
[email protected] L. E. Zhukov e-mail:
[email protected] © Springer International Publishing AG, part of Springer Nature 2018 V. A. Kalyagin et al. (eds.), Computational Aspects and Applications in Large-Scale Networks, Springer Proceedings in Mathematics & Statistics 247, https://doi.org/10.1007/978-3-319-96247-4_24
spread provide us with many opportunities. For example, one might influence target groups through a minimal sufficient number of active actors, depending on the purpose: propagation of rumours, political propaganda, placement of a new product on the market, viral advertisement, and many others. A lot of effort has already been made towards understanding the principles of information propagation and formalizing them through mathematical models. The range of currently applied models mostly consists of linear threshold models, cascade models, and mixed models [4, 5, 10, 15]. However, there are still many challenges on the way to improving the existing models. In this study, we propose a weighted graph model with tuned parameters. The weights are modified in order to fit the empirical data. The article is organised as follows. In Sect. 2, we give a brief overview of the most frequently used strategies for modelling information propagation. Section 3 contains the main definitions and notions on the topic. In Sect. 4, we introduce the dataset used in our experiments. In Sect. 5, we describe two models for the weighted graph and a baseline model for further verification. In Sect. 6, we conclude the paper and discuss possible directions for future work.
2 Related Work

The problems of information propagation, social influence maximization, and their applications to online social networks have been widely studied in research papers [15, 16]. One of the main prerequisites for the spread of information via social networks is that a user receives new information and takes one of two possible strategies: to spread the information further or not. The propagation can occur either through direct messages or through reposts. Since direct conversations are treated as confidential information, we focus our attention on reposts. We consider only direct reposts, leaving indirect reposting to future research. There are two probability models used by researchers: the Independent Cascade (IC) model [9, 10] and the Linear Threshold (LT) model [12]. Cascade models were borrowed from particle physics [19]. They are used for simulating processes similar to the activation of a node as a result of on–off independent attempts by already activated neighbouring nodes. Aside from physics, the models were also inspired by medicine. The systematic study of the adoption of medical diffusion models in human networks started with the research by Coleman et al. [6]. In [20], Morris described the contagion theory of the spread of behaviours. The basic premise behind the theory of social contagion is that a node is driven to adopt a behaviour based on the behaviours of its neighbours, more precisely the fraction of the neighbours who have adopted the behaviour. The idea was further generalized through linear threshold models [14] incorporating different weights for neighbours and different individual thresholds. Regarding the IC model, Kimura and Saito [16] proposed several shortest-path-based influence cascade models and provided efficient heuristic algorithms for computing influence spread under their models. In [5], researchers modified the algorithm by adding degree discount heuristics for the uniform-independent cascade
model with the same probabilities for all edges of the graph. One more feature proposed by the authors is called maximum influence arborescence, which is a tree in a directed graph where all edges either point towards the root or point away from the root [9]. All of the models mentioned above use specific features of the IC model; therefore, they cannot be applied directly to LT models. The cascade and linear threshold classes of models show good results for influence spread and active-actor maximization in comparison to other suggested models [14, 15]. Nevertheless, one of the most important disadvantages of the models described above is their slow running time and lack of scalability. Other algorithms do not provide consistently good performance on influence diffusion [19]. A generalized cascade model was shown to be equivalent to the generalized threshold model [14], while providing theoretical performance guarantees for several hill-climbing strategies for many general instances of the NP-hard models for optimal solutions. The first researchers who considered influence maximization within the probabilistic approach as an algorithmic problem were Domingos and Richardson [8]. They used methods based on Markov random fields in order to model the final state of node activation in the network directly. Driven by its application in viral marketing, a lot of recent effort in diffusion processes has focused on finding the set of nodes that would maximize the spread of information in a network, also called the target set selection problem. Despite the problem of scalability of their greedy algorithms, Kempe et al. [14] studied influence maximization as a discrete optimization problem. In [18], the lazy-forward optimization model was presented, which selects new seeds in order to reduce the number of influence propagation evaluations.
However, it is also not scalable to graphs with thousands of nodes and edges or larger, and its computational time is not efficient enough. Consequently, several extensions of the original cascade and threshold models have been developed. In [3], the authors collected available traces of photo spreading in social networks like Flickr and tried to reveal the role of friendship in the diffusion process and the length of photo spread. The results showed that information exchanged between friends was likely to account for over 50% of the spread, with significant delays at each hop, and revealed obstacles to spreading each photo widely among users. In [2], it was found that the long ties in social networks prohibit complex cascades. The authors of [21] found a coupling between interaction strengths and the network's local structure in a mobile communication network. It was shown that if the weak ties are gradually removed, a phase transition takes place in the network. In addition, it is important to mention several studies that designed machine learning algorithms to extract the parameters of influence cascade models from empirical datasets [11, 19]. We have also used this principle to generate graphs in our paper.
3 Definitions

In our research, the term 'information propagation' refers to the diffusion of information via reposts among the members of a given community. We denote the online social network as a graph G(V, E), where V stands for the set of vertices and E is the
set of directed weighted edges. A node may be defined as a user or a group of users, but taking into account our empirical dataset, we define one vertex as a member (subscriber) of a community. In what follows, we call the spread of a repost over the graph a 'wave'. We define an Information Cascade, or Information Diffusion, as the behaviour of information adoption by people in a social network resulting from the fact that people ignore their own information signals and make decisions based on inferences from other people's previous actions.
4 Dataset

We used Vkontakte as the source of the dataset. We chose one political community ('United Russia') as an example for examining the way information is spread via reposts among the community subscribers. The dataset contains the user discussions obtained during the Parliament elections in Russia in 2016. It consists of approximately 6,000,000 messages and more than 36,000 actors who re-posted a message more than 10 times. It should be said that there are still a few problems with the quality of the database.
1. It is difficult to model users and groups simultaneously, because two different types of objects are present in the same graph.
2. The betweenness centrality metric has low values (lack of connections).
3. Groups do not make direct references to other groups that are the sources of information.
The following pre-processing steps were conducted. First, the BIRCH (balanced iterative reducing and clustering using hierarchies) unsupervised data mining algorithm was used to perform hierarchical clustering over the particularly large dataset. We also used locality-sensitive hashing (LSH) to reduce the dimensionality of the dataset.
1. All given documents were sorted by length.
2. The analysis included a context window of 10,000 documents.
3. The search window size was set to 2 weeks.
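The LSH step can be illustrated with a minimal MinHash sketch in pure Python. The hash scheme, signature length, and toy documents below are our own illustrative choices, not details of the authors' pipeline:

```python
import hashlib

def minhash_signature(tokens, num_hashes=32):
    """MinHash signature: for each seeded hash function, keep the minimum
    hash value over the token set; similar sets yield similar signatures."""
    return [
        min(int(hashlib.md5(f"{seed}:{t}".encode()).hexdigest(), 16)
            for t in tokens)
        for seed in range(num_hashes)
    ]

def estimated_jaccard(sig_a, sig_b):
    """The fraction of matching signature positions approximates the
    Jaccard similarity of the underlying token sets."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

# Toy documents; the true Jaccard similarity of these token sets is 0.6.
doc_a = {"party", "won", "the", "election"}
doc_b = {"party", "lost", "the", "election"}
sig_a, sig_b = minhash_signature(doc_a), minhash_signature(doc_b)
print(round(estimated_jaccard(sig_a, sig_b), 2))
```

Comparing short fixed-length signatures instead of full token sets is what makes near-duplicate search over millions of messages tractable.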
5 Model Description

In our project, we have used two different models. The first model constructs a weighted graph whose edges reflect the a priori extent to which users influence each other. The second model provides an advanced model of wave propagation on the baseline graph in order to adjust the weights for better wave propagation modelling.
This simulation is used to measure the similarity of our model to information spread in real networks.
5.1 Architecture

Weight Generating Model. We begin with two types of data taken from the real network. The first type is a list of communities with their users. With this data, a baseline graph is built, with nodes corresponding to communities and edges connecting communities with common users. The weights correspond to the Jaccard index for the users of connected nodes in the baseline graph. The second type of data is a "wave", which is a message that is posted in communities. Each wave is a list of users with the times of their activation. For each wave, we consequently modify the baseline weights and adjust the model based on the new information on information propagation. The algorithm is correct with respect to different orders of waves during graph processing. In what follows, we describe processing one wave. In the beginning, we make all the nodes belonging to the wave active simultaneously. Then, we look at the neighbours of the active nodes. If a neighbour is not active, then we decrease the weight of the edge from the active node to its neighbour by applying a function of a specific type depending on the current weight. Otherwise, we choose the node that was activated earlier and increase the weight of the edge connecting the first activated node to its neighbour activated later. To change the weight, we find the argument of a chosen monotonic piecewise continuous function taking values in [0, 1], change the argument by adding/subtracting a constant parameter (computed via model fitting), and finally calculate the new weight as the function value at the new argument. We choose the Sigmoid function among the many so-called "activation functions" [13] often used in emulating nonlinear processes. Let us demonstrate processing a simple graph with a short wave consisting of three vertices (see Fig. 1). In this wave, vertex 0 was activated at time t1 and vertex 1 was activated at time t2.

Fig. 1 Example of re-weighing on waves with 3 vertices
Dark blue colour indicates the activated status of a vertex. The left graph presents a wave with vertex 0 activated at time t1 and vertex 1 at time t2 < t1. The right graph shows the updates of the weights: the weights f(d) and f(h) on their edges were reduced due to the inactivity of node 2, while f(b) was increased due to the activation of node 1.
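The two ingredients described above, the Jaccard baseline weights and the sigmoid re-weighting, can be sketched as follows (the step delta = 0.5 and the toy user sets are illustrative; in the paper the step parameter is computed via model fitting):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def logit(w):
    """Inverse of the sigmoid, defined for w in (0, 1)."""
    return math.log(w / (1.0 - w))

def update_weight(w, delta, reinforce):
    """Shift the weight's sigmoid preimage up by delta when the edge
    carried an activation (reinforce=True), down otherwise; the result
    always stays in (0, 1)."""
    return sigmoid(logit(w) + (delta if reinforce else -delta))

def jaccard(a, b):
    """Baseline weight between two communities: overlap of their user sets."""
    return len(a & b) / len(a | b)

users_a = {"u1", "u2", "u3", "u4"}
users_b = {"u3", "u4", "u5"}
w = jaccard(users_a, users_b)                    # 2 shared of 5 total = 0.4
w_up = update_weight(w, 0.5, reinforce=True)     # edge confirmed by a wave
w_down = update_weight(w, 0.5, reinforce=False)  # neighbour stayed inactive
print(round(w, 2), round(w_up, 3), round(w_down, 3))
```

Updating in the preimage of the sigmoid rather than on the weight itself keeps every weight strictly inside (0, 1) no matter how many reinforcements or penalties an edge accumulates.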
5.2 Model Evaluation

After adjusting the weighted graph based on the "wave" data, we aim to preserve the following property of modelling the information waves: simulating the existing waves on the obtained graph should activate almost the same number of vertices as in the original wave. The question of comparing the differences between the sets of activated vertices is left for future work. We take the vertex with the earliest activation time from wave_i and make it active on our graph. Next, we make each of its neighbours active with probability p = W_out, similar to the IC model. In order to reflect the assumption that users who have seen information from a fairly reliable source and refused to get activated will never be activated within this wave, we remove all the nodes that were not activated if the weight of the edge connecting them is greater than the parameter Q, which stands for the threshold of information spread reliability. By choosing a proper reliability parameter, we guarantee convergence of our simulation to a non-trivial solution close to the original wave. In what follows, we continue this process for all the activated neighbours of the graph without the removed vertices. Hereby, we demonstrate the process of verification on the small network shown in Fig. 2, with the simple case Q = 0 of complete reliability. The wave starts from vertex 0. Next, vertex 2 was activated and vertex 1 was not, so it was removed from further analysis. Then, we analyse only the neighbours of vertex 2 that were not removed. Suppose that vertex 3 was removed; now our algorithm stops and returns the final number of nodes activated by this wave. These two steps are represented in Fig. 2. One can observe a graphical representation of this process in Fig. 3.
Fig. 2 Verification graphs
Fig. 3 Example of node activation in the information spread simulation. The three graphs, from left to right, demonstrate the spread of information at three moments of time: the start, an intermediate step, and the finish of the simulation
5.3 Implementation First, we write an algorithm for assigning and evaluating the weights (see Algorithm 1). Next, we provide the step-by-step algorithm for implementing the simulations (see Algorithm 2).
5.4 Discussion

The main result of our project is a new model that can be used for modelling information propagation in social networks. In order to check the applicability of our model, we have built two metrics that check whether our algorithm works better than the cascade model.

Algorithm 1: Weight generating model
Data: G(V, E, W), Waves = (S ⊆ V, T : V → N)
Result: G(V, E, W_new)
for wave in Waves do
    for u in wave do
        for v : u → v do
            if v ∈ wave then
                if T_u < T_v then
                    W_uv := σ(σ⁻¹(W_uv) + δ)
                end
            else
                W_uv := σ(σ⁻¹(W_uv) − δ)
            end
        end
    end
end
Algorithm 2: Simulation
Data: G(V, E, W), s ∈ V
Result: {F_i}_{i=1}^{t}
F_0 := {s}
Excluded := ∅
t := 0
while F_t ≠ ∅ do
    for u ∈ F_t do
        for v : u → v do
            if v ∉ Excluded then
                F_{t+1}.add(v) with probability W_uv
            end
        end
    end
    Excluded.add({v | u → v, v ∉ F_{t+1}})
    t := t + 1
end
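A runnable sketch of the simulation in Algorithm 2 follows; the graph and edge weights are toy values chosen to make the run deterministic, whereas the real model uses the fitted weights from Algorithm 1:

```python
import random

def simulate_wave(weights, seed_node, Q=0.0, rng=None):
    """IC-style spread: activate each neighbour with probability equal to
    the edge weight; a neighbour that refuses an edge heavier than the
    reliability threshold Q is excluded for the rest of the wave.
    `weights` maps directed edges (u, v) to weights in [0, 1]."""
    rng = rng or random.Random(0)
    out = {}
    for (u, v), w in weights.items():
        out.setdefault(u, []).append((v, w))
    active, frontier, excluded = {seed_node}, [seed_node], set()
    while frontier:
        nxt = []
        for u in frontier:
            for v, w in out.get(u, []):
                if v in active or v in excluded:
                    continue
                if rng.random() < w:       # activation succeeds
                    active.add(v)
                    nxt.append(v)
                elif w > Q:                # refused a reliable source
                    excluded.add(v)
        frontier = nxt
    return active

# Deterministic toy chain: a -> b -> c always fire, c -> d never does.
weights = {("a", "b"): 1.0, ("b", "c"): 1.0, ("c", "d"): 0.0}
print(sorted(simulate_wave(weights, "a")))  # ['a', 'b', 'c']
```

Raising Q above zero makes the simulation more forgiving: a node that declines a weak edge is merely skipped for now rather than excluded from the whole wave.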
The first metric is an interval for the number of activated nodes. To obtain this metric, we take a random node V0 from a wave. Next, we apply it to our model and look at the number of nodes that were finally activated during 500 iterations. Then, we build the intervals containing 95% of the values and repeat the same procedure for the cascade model with different thresholds, observing which of them is narrower and less biased. Next, we build two graphs of the dependence between the number of nodes and the period of activation. The aim is to find the final activation value of each particular wave. Finally, we merge these three graphs in order to compare the results.
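The 95%-interval metric can be computed from repeated runs as follows; here a toy binomial process stands in for the 500 simulation runs, whereas with the real model each count would be the activated-node total of one run:

```python
import random

rng = random.Random(42)
# Stand-in for 500 runs of the spread simulation: each entry mimics the
# final number of activated nodes in one run (toy binomial process).
counts = sorted(sum(rng.random() < 0.3 for _ in range(100)) for _ in range(500))

# Empirical 95% interval: drop the lowest and highest 2.5% of runs.
lo = counts[int(0.025 * len(counts))]
hi = counts[int(0.975 * len(counts)) - 1]
print(lo, hi)
```

The narrower and the less biased this interval is around the original wave's size, the better the fitted weights reproduce the empirical spread.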
6 Conclusion

We proposed a weighted graph model that tunes specific weights based on empirical data and might outperform the existing models. There are still many directions for improvement in this research area. Considering the last step performed in this paper, it would be a great improvement to suggest an optimisation model for the threshold parameter, which is required for discarding the inactive nodes (in other words, removing them from the final subset of activated nodes). It also makes sense to compare the results obtained via the proposed model with those obtained using the independent cascade model or other state-of-the-art models (e.g., another type of cascade model, the linear threshold model). Depending on the results, we could conclude whether the proposed model is more effective or better by any other criterion.
Information Propagation Strategies in Online Social Networks
327
Acknowledgements The article was supported within the framework of a subsidy by the Russian Academic Excellence Project ‘5-100’ and RFBR grant 16-29-09583 “Methodology, techniques and tools of recognition and counteraction to organised information campaigns on the Internet”. The dataset used in this research paper was provided by the NRU HSE International Laboratory for Applied Network Research.
References
1. Bakshy, E., Rosenn, I., Marlow, C., Adamic, L.: The role of social networks in information diffusion. In: Proceedings of the 21st International Conference on World Wide Web, pp. 519–528. ACM, Apr 2012
2. Centola, D., Eguiluz, V., Macy, M.: Cascade dynamics of complex propagation. Phys. A Stat. Mech. Appl. 374(1), 449–456 (2007)
3. Cha, M., Mislove, A., Gummadi, K.: A measurement-driven analysis of information propagation in the Flickr social network. In: Proceedings of the 18th International Conference on World Wide Web, WWW 09, pp. 721–730 (2009)
4. Chen, W., Wang, C., Wang, Y.: Scalable influence maximization for prevalent viral marketing in large-scale social networks. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1029–1038. ACM, July 2010
5. Chen, W., Wang, Y., Yang, S.: Efficient influence maximization in social networks. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 199–208. ACM, June 2009
6. Coleman, J., Katz, E., Menzel, H.: The diffusion of an innovation among physicians. Sociometry 20(4), 253–270 (1957)
7. Dodds, P., Watts, D.: A generalized model of social and biological contagion. J. Theor. Biol. 232(4), 587–604 (2005)
8. Domingos, P., Richardson, M.: Mining the network value of customers. In: Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 57–66. ACM, Aug 2001
9. Goldenberg, J., Libai, B., Muller, E.: Using complex systems analysis to advance marketing theory development. Acad. Mark. Sci. Rev.
10. Goldenberg, J., Libai, B., Muller, E.: Talk of the network: a complex systems look at the underlying process of word-of-mouth. Mark. Lett. 12(3), 211–223 (2001)
11. Goyal, A., Bonchi, F., Lakshmanan, L.V.: A data-based approach to social influence maximization. Proc. VLDB Endow. 5(1), 73–84 (2011)
12. Granovetter, M.: Threshold models of collective behavior. Am. J. Sociol. 83(6), 1420–1443 (1978)
13. Karlik, B., Olgac, A.V.: Performance analysis of various activation functions in generalized MLP architectures of neural networks. Int. J. Artif. Intell. Exp. Syst. 1(4), 111–122 (2011)
14. Kempe, D., Kleinberg, J., Tardos, E.: Maximizing the spread of influence in a social network. In: Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 137–146 (2003)
15. Kempe, D., Kleinberg, J., Tardos, E.: Influential nodes in a diffusion model for social networks. In: Proceedings of the 32nd International Colloquium on Automata, Languages and Programming (ICALP). Springer Berlin Heidelberg, Lisbon
16. Kimura, M., Saito, K.: Tractable models for information diffusion in social networks. In: European Conference on Principles of Data Mining and Knowledge Discovery, pp. 259–271. Springer Berlin Heidelberg, Sept 2006
17. Kostka, J., Oswald, Y.A., Wattenhofer, R.: Word of mouth: rumor dissemination in social networks. In: International Colloquium on Structural Information and Communication Complexity, pp. 185–196. Springer Berlin Heidelberg, June 2008
18. Leskovec, J.: Epinions social network (2010)
19. Liggett, T.: Interacting Particle Systems. Springer (1985)
20. Morris, S.: Contagion. Rev. Econ. Stud. 67, 57–78 (2000)
21. Onnela, J., Saramaki, J., Hyvonen, J., et al.: Structure and tie strengths in mobile communication networks. PNAS 104(18), 7332–7336 (2007)
22. Subbian, K., Aggarwal, C., Srivastava, J.: Mining influencers using information flows in social streams. ACM Trans. Knowl. Discov. Data 10(3), 26 (2016)
23. Zhao, J., Wu, J., Feng, X., Xiong, H., Xu, K.: Information propagation in online social networks: a tie-strength perspective. Knowl. Inf. Syst. 32(3), 589–608 (2012)
Analysis of Co-authorship Networks and Scientific Citation Based on Google Scholar
Nataliya Matveeva and Oleg Poldin
Abstract In this study, we investigated how scientific collaboration, represented by co-authorship, is related to the citation indicators of a scientist. We use a co-authorship network to explore the structure of scientific collaboration. For network construction, the Google Scholar profiles of scientists from various countries and scientific fields were used. We ran a count data regression model on a sample of more than 30 thousand authors with their first citation after 2007 to analyze the correlation between scientists' co-authorship network parameters and their citation characteristics. We identify a positive correlation between a scientist's citation count and the number of his or her co-authors, between citation count and the author's closeness centrality, and between a scholar's citation count and the average citation of his or her co-authors. We also reveal that the h-index and i10-index are correlated significantly with the number of co-authors and the average citation of co-authors. Based on these results, we can conclude that scientists who maintain more contacts and are more active than others have, on average, better bibliometric indicators.
Keywords Co-authorship · Social network analysis · Bibliometric analysis · Google Scholar · Count data models
1 Introduction

Collaboration between researchers is an essential feature of scientific activity; it assumes the work of several researchers on a scientific problem. Collaboration requires scientists to invest time and financial resources, but these expenses can be paid off by the growth of scientific output. How does scientific collaboration influence scientific impact, and how can this influence be estimated? These questions are crucial to the investigation of scientific activity [32].
N. Matveeva (B) · O. Poldin Higher School of Economics, National Research University, Moscow 110000, Russia e-mail:
[email protected] © Springer International Publishing AG, part of Springer Nature 2018 V. A. Kalyagin et al. (eds.), Computational Aspects and Applications in Large-Scale Networks, Springer Proceedings in Mathematics & Statistics 247, https://doi.org/10.1007/978-3-319-96247-4_25
In the short term, the evaluation of scientific activity is measured by the relevance of publications. The widespread indicators of publications' relevance are the citation index and its derivatives, such as the Hirsch index, the i10-index, and others [6, 7, 13, 22, 36]. Different scientific fields, and the researchers who work in them, have different publication cultures and different approaches to cooperation, so the evaluation of scientific activity is challenging. Scientific collaboration is most easily traced through co-authorship, because the product of co-authorship is a publication. Co-authorship networks are a form of social network that reflects the collaboration between authors. In such a network, the authors are nodes and their joint publications are links. Glänzel and colleagues, as well as Heffner, investigated how the network characteristics vary depending on the country and scientific field. They found that the metrics of co-authorship networks differ across scientific fields [17, 18, 21]. The ways of constructing such networks and the methods for their evaluation have been described by Newman and his colleagues [16, 29, 30]. Information and knowledge are spread through co-authorship, so it represents the social resources that scientists have. It is reasonable to assume that proximity to resources facilitates the achievement of greater scientific output. Studies have shown that there is a positive correlation between some network characteristics (normalized degree centrality, normalized eigenvector centrality, average tie strength) and the citation index (g-index) [1], and between the citation count of an article and the tie strength between the authors [26, 35]. Li and colleagues, as well as Guan, found that research collaboration positively affects a paper's citation count [19, 26].
The drawbacks of some other prior studies stem from using a small sample of scientists from different fields [39] or data about scientists from only one scientific field [4, 33, 35, 38], which in both cases affects the reliability of the results. Co-author networks constructed from parsed articles also affect the quality of the analysis, due to the complications in determining the correspondence between authors with identical surnames and their works [28]. The present study is based on the bibliographic database Google Scholar (GS). In contrast to Web of Science (WoS) and Scopus, GS indexes more scientific sources, has free access, and provides relatively simple information extraction. Thereby, GS allows analyzing large amounts of data about scientists from various scientific fields. Furthermore, since scientists can manage their profiles in GS, the misidentification of authors with the same names and of their works is reduced significantly. To date, there are very few studies considering co-authorship networks and scientific citation that are based on the authors' GS profiles [3, 23]. Possible reasons for this are the indexation of unverified scientific sources [24] and the misidentification of authors and their works, the presence of "phantom" authors, as well as incorrect identification of the year of publication, which certainly affects the citation indexes [24]. Despite this, there is a strong correlation between authors' ratings based on GS and on other databases [14, 37]. And, in the past decade, GS has made great efforts to make its database more relevant (https://scholar.google.com/intl/en/scholar/citations.html).
Here, we evaluate the correlation between network characteristics of scientists, such as degree centrality and closeness centrality, and indicators of citation: the total number of citations, the h-index, and the i10-index. In addition, we identify a correlation between the indicators of citation and the average citation of a scientist's co-authors. The investigated network parameters indicate the position of the scientists in the network of collaboration and characterize the potential of the scientists. To fill the void in the literature, in this study we use a large dataset (110 thousand authors for network analysis), taking into account the size of the sample (34 thousand for regression analysis). The sample was divided into subsets corresponding to tenure and scientific field, because citation indicators depend on both tenure and scientific field [12, 27]. Moreover, we investigate not only how scientific collaboration is linked with the citation indexes, but we also take into account a new parameter, the average citation of co-authors, and investigate how it correlates with the citation count of relatively young researchers.
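The two network characteristics just mentioned can be computed with plain breadth-first search; a minimal sketch follows (the four-author graph is illustrative, not taken from the GS data):

```python
from collections import deque

def closeness(adj, v):
    """Closeness centrality of v: (n - 1) divided by the sum of BFS
    shortest-path distances from v to every reachable node."""
    dist = {v: 0}
    q = deque([v])
    while q:
        u = q.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                q.append(w)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0

# Toy co-authorship graph: authors are nodes, joint papers give edges.
adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
degree = {v: len(neigh) for v, neigh in adj.items()}  # number of co-authors
print(degree["C"], round(closeness(adj, "C"), 2))     # 3 1.0
```

Author C has the highest degree (three co-authors) and the maximal closeness of 1.0, since every other author is one step away, which is exactly the kind of central position the regression analysis relates to citation indicators.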
2 Study Context and Data

2.1 Google Scholar

We constructed the co-authorship network based on the profiles of authors in Google Scholar. GS is a free web search system that provides a full-text search of all types of publications from various disciplines. The system was launched in November 2004. GS contains basic information about publications, such as the title, authors, and total number of citations. Information about citing sources, the publisher, the year of publication, and links to the full-text publication is also provided by GS. GS has features of an online social networking site, since it allows users to register researcher profiles in which they can define their co-authors in a special section of the personal page. The personal page of a researcher contains data about his or her publications indexed by GS, their citations, the i10-index, the Hirsch index for all years and for the past 5 years, and optional information about the researcher's affiliation, position, and scientific interests. Registered researchers can manage the list of their publications and the list of co-authors. The drawbacks of GS are the indexing of nonscientific sources (blogs and presentations), the indexing of articles in non-refereed journals and proceedings, and, for some profiles, the misidentification of authors and their works. All this leads to incorrect citation counts [24, 25]. It can be assumed that these drawbacks are not prevalent, since there is a strong correlation between bibliometric indicators based on GS and those based on WoS [20, 37].
N. Matveeva and O. Poldin
2.2 Citation and Its Derivatives

The citation count of a publication is the number of references to it in other publications; the citation count of a scientist is the sum over all his/her publications. In contrast to the number of publications, citations capture a qualitative aspect of scientific activity: it has been shown that better works are cited more often [15]. The major shortcoming of citation counts is their dependence on the discipline and on the year of publication. Citation indexes also vary across bibliographic databases [5]. In addition to the Hirsch index (h-index), Google Scholar uses the i10-index, the number of publications that have at least 10 citations. It was introduced in 2011 and is calculated only by GS. Its merits are a more accurate (in comparison with the h-index) reflection of the number of publications with ten or more citations, and ease of calculation. Its shortcomings are its purely local use and, like the h-index, its failure to distinguish publications with significantly more than ten citations. Google Scholar indexes not only journal articles and English-language works but also other types of publications, so citation indicators calculated from GS are higher than those from WoS and Scopus. However, there is a strong correlation between citation ratings compiled by these databases [10, 14, 37].
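As a concrete illustration of these definitions, both indicators can be computed from a researcher's list of per-publication citation counts. A minimal Python sketch (the function names are ours, for illustration only):

```python
def h_index(citations):
    """Largest h such that h publications have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    # with citations ranked in descending order, c_i >= i holds for
    # exactly the first h positions, so counting them yields h
    return sum(1 for i, c in enumerate(ranked, start=1) if c >= i)

def i10_index(citations):
    """Number of publications with at least 10 citations."""
    return sum(1 for c in citations if c >= 10)

# a researcher with per-paper citations [10, 8, 5, 4, 3] has h-index 4
# (four papers with at least 4 citations) and i10-index 1
```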
2.3 Co-authorship Network

In a co-authorship network, nodes represent authors, and joint publications are the links between them. Such a network represents the interactions between authors: how often they collaborate (expressed as the number of joint publications), who the most active authors are, the positions of authors in the network, how information can spread, and so on. The parameters of the network give a quantitative estimate of these interactions. In the present study, the network was constructed from the information about co-authors and scientific interests that scientists indicate in their GS profiles. Based on scientific interests, scientists were divided into groups corresponding to the fields: computer science, economics and finance, biology and medicine, physics and chemistry, mathematics, and social sciences and humanities. For the full co-authorship network and for each discipline, the following were calculated: the total number of nodes, degree centrality, average co-author citation, and closeness centrality. The total number of nodes is the number of scientists who have a publication; it differs across disciplines and reflects the presence of the various disciplines in GS. Degree centrality is the number of co-authors a scientist has; the degree centrality of a node is measured as the number of its adjacent nodes [34]. Average co-author citation indicates how many citations the co-authors of a scientist have on average in the sample. Closeness centrality measures how close a particular author is to the others and how easily they can be reached.¹ Data collection started from a list of 12 profiles of Russian researchers; then information was extracted about their co-authors, the co-authors of those co-authors, and so on, up to 110 thousand profiles. Based on these data, an adjacency matrix was constructed, in which rows and columns are nodes and the elements indicate whether a pair of vertices is connected. After preparing the adjacency matrix, we computed shortest paths for each node using Dijkstra's algorithm [11]. In addition, we calculated network parameters and citation indicators.
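The closeness computation described above can be sketched as follows. This is a toy Python illustration of the formula C_i = (N − 1)/Σ_j d_ij (see footnote 1); breadth-first search stands in for Dijkstra's algorithm, which is equivalent here because links have unit length, and the formula assumes a connected network (as in the snowball-sampled network of the paper):

```python
from collections import deque

def closeness(adj, i):
    """C_i = (N - 1) / sum_j d_ij, with d_ij computed by BFS
    (all edges have unit length)."""
    dist = {i: 0}
    q = deque([i])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    n = len(adj)
    total = sum(d for j, d in dist.items() if j != i)
    return (n - 1) / total if total else 0.0

# toy path graph a-b-c: the middle node is closest to everyone
adj = {'a': ['b'], 'b': ['a', 'c'], 'c': ['b']}
# closeness(adj, 'b') == 2/(1+1) == 1.0; closeness(adj, 'a') == 2/(1+2)
```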
3 Analysis and Results

Using our data, we calculated network metrics and citation indicators. The results for the full sample and for groups of disciplines are shown in Table 1. The highest average values of citations, the h-index, and the i10-index are in biology and medicine and in physics and chemistry; the lowest are in economics and finance. The average number of co-authors across all disciplines is 6; this indicator is above average in computer science and mathematics, and below average in economics and finance. For the full sample, closeness centrality is 0.185; it is above average in computer science, and below average in economics and finance and in physics and chemistry. The variable tenure in Table 1 represents the average tenure of scientists in the sample, calculated as the average number of years in which a scientist's publications were cited. This analysis confirms the result obtained by Ortega [31] that scientists engaged, to a greater or lesser extent, in computer science prevail in GS.
3.1 Count Regression Models

In our study, the dependent variables represent the number of events (citations) and are therefore nonnegative integers, so count data models are appropriate for the regression analysis. We use model variations with the negative binomial distribution, which is a generalization of the simpler and more limited Poisson model [8]. In particular, the negative binomial model is more flexible in describing the variance of the dependent variable and is better suited to situations in which individual citation events are interrelated [2]. With the number of citations as the dependent variable, we ran a zero-truncated model, because we do not take zero citation values into account. Not all scientists in the sample have publications

¹ Closeness centrality was calculated as C_i = (N − 1) / Σ_j d_ij, where C_i is the closeness centrality of participant i; N is the total number of participants; d_ij is the shortest distance from participant i to participant j.
Table 1 Descriptive statistics: mean (std. deviation)

| | Full sample | BM | CS | EF | Math | PhCh | SH |
|---|---|---|---|---|---|---|---|
| Citation | 333.9 (1041.9) | 542.8 (1529.2) | 262.1 (873.4) | 234.1 (414.0) | 266.0 (504.3) | 470.3 (1153.3) | 307.6 (611.7) |
| h-index | 6.4 (4.94) | 7.6 (5.6) | 5.7 (4.3) | 5.7 (3.8) | 6.5 (4.3) | 7.9 (5.8) | 6.542 (4.755) |
| i10-index | 5.8 (9.3) | 7.8 (10.3) | 4.8 (7.3) | 4.7 (5.7) | 5.9 (7.2) | 8.4 (11.9) | 6.113 (8.597) |
| Number of co-authors | 6.03 (6.62) | 5.93 (6.34) | 6.65 (7.59) | 4.07 (3.75) | 6.11 (5.67) | 5.93 (5.39) | 5.913 (6.521) |
| Average citation of co-authors | 4285.2 (5728.4) | 6355.0 (7630.2) | 3705.9 (4498.8) | 4209.0 (6239.2) | 3984.9 (4600.6) | 5060.1 (5720.8) | 4019.3 (5194.9) |
| Tenure | 4.928 (1.725) | 5.111 (1.651) | 4.758 (1.746) | 5.221 (1.672) | 5.088 (1.669) | 5.041 (1.661) | 5.081 (1.671) |
| Closeness centrality | 0.185 (0.0163) | 0.185 (0.0141) | 0.190 (0.0136) | 0.165 (0.0188) | 0.185 (0.0154) | 0.178 (0.0134) | 0.180 (0.0186) |
| Observations | 34701 | 1878 | 6514 | 1004 | 1247 | 1353 | 2259 |

BM—Biology and Medicine, CS—Computer Science, EF—Economics and Finance, Math—Mathematics, PhCh—Physics and Chemistry, SH—Social Sciences and Humanities
with 10 or more citations, and for that reason a hurdle model was used for the i10-index. For interpreting the results, it is more convenient to consider not the coefficients themselves, which enter the model through an exponential function, but the changes of the dependent variable under a discrete increment of the explanatory factors. Since the explanatory variables differ in scale, we considered the effects of a discrete increment of each variable from its mean by one standard deviation, with the other variables fixed at their means. For citations as the dependent variable, the mean effects and the bounds of the 95% confidence intervals for the full sample and for individual specializations are shown in Table 2. In the full sample, an increase in the average citation of co-authors by one standard deviation, 5728 (Table 1), is associated with an increase in citations of 116.8 units; an increase in the number of co-authors from 6 to 13 increases citations by 92.4. Among the specializations, the greatest absolute effect is observed for authors working in biology and medicine, and the lowest in economics and finance. However, the mean and variance of citations in biology and medicine are also greater than those in economics and finance. In the last column of Table 2, the absolute effects are therefore normalized by the standard deviation of citations in the corresponding sample; the figures show by how many standard deviations citations change when the explanatory variable grows by one standard deviation. In these terms, the average citation of co-authors has the greatest effect for the math specialization, and the number of co-authors has the greatest effect
Analysis of Co-authorship Networks …
Table 2 Changes of citations when an explanatory variable increases by 1 std

| | Δy | Low level | Higher level | Δy/σy |
|---|---|---|---|---|
| Full sample | | | | |
| Average citation of co-authors | 116.8 | 103.9 | 129.7 | 0.11 |
| Number of co-authors | 92.4 | 86.1 | 98.8 | 0.09 |
| Ln(tenure) | 309.9 | 292.7 | 327.0 | 0.30 |
| Computer science | | | | |
| Average citation of co-authors | 65.5 | 49.1 | 81.9 | 0.07 |
| Number of co-authors | 105.8 | 91.6 | 120.1 | 0.12 |
| Ln(tenure) | 223.4 | 206.1 | 240.6 | 0.26 |
| Economics and Finance | | | | |
| Average citation of co-authors | 24.7 | 6.5 | 42.8 | 0.06 |
| Number of co-authors | 71.6 | 55.3 | 88.0 | 0.17 |
| Ln(tenure) | 266.7 | 231.0 | 302.4 | 0.64 |
| Biology and Medicine | | | | |
| Average citation of co-authors | 176.2 | 123.5 | 228.8 | 0.12 |
| Number of co-authors | 156.5 | 117.4 | 195.5 | 0.10 |
| Ln(tenure) | 483.4 | 416.6 | 550.2 | 0.32 |
| Physics and Chemistry | | | | |
| Average citation of co-authors | 124.4 | 85.6 | 163.2 | 0.11 |
| Number of co-authors | 112.0 | 81.5 | 142.5 | 0.10 |
| Ln(tenure) | 420.8 | 355.3 | 486.4 | 0.36 |
| Math | | | | |
| Average citation of co-authors | 86.8 | 44.6 | 129.1 | 0.17 |
| Number of co-authors | 80.1 | 61.5 | 98.7 | 0.16 |
| Ln(tenure) | 260.4 | 218.9 | 302.0 | 0.52 |
| Social Sciences and Humanities | | | | |
| Average citation of co-authors | 93.4 | 46.5 | 140.2 | 0.15 |
| Number of co-authors | 88.2 | 69.2 | 107.2 | 0.14 |
| Ln(tenure) | 223.1 | 137.1 | 309.0 | 0.36 |
for economics and finance. The normalized effects of the number of co-authors vary from 0.09 to 0.17, and those of co-author citations from 0.06 to 0.17. To identify how the explanatory variables (average co-author citation, number of co-authors, and the logarithm of tenure) affect the h-index, we ran a negative binomial regression that takes zero values into account. The results of this model for the full sample are presented in Table 3. All variables are significant. The negative binomial regression coefficients are interpreted as follows: the number of co-authors has a greater effect on the h-index (0.25) than the citation of the co-authors (0.12), whereas for citations the effects of these variables are comparable (0.09 and
Table 3 Regression estimates for the h-index

| | Coefficient β | Δy | Low level | Higher level | Δy/σ |
|---|---|---|---|---|---|
| Average citation of co-authors × 10⁻⁴ | 0.173*** (0.010) | 0.58 | 0.52 | 0.65 | 0.12 |
| Number of co-authors | 0.031*** (0.001) | 1.25 | 1.18 | 1.32 | 0.25 |
| Ln(tenure) | 0.985*** (0.008) | 3.26 | 3.20 | 3.32 | 0.66 |
| Constant | –0.021* (0.012) | | | | |
| ln(α) | –2.246*** (0.035) | | | | |
| Number of observations | 34701 | | | | |
| pseudo-R² | 0.123 | | | | |

Δy gives the change of the dependent variable when x changes by 1 std. *p < 0.1, ***p < 0.01
0.11). Another difference between the results for citations and for the h-index concerns the relation with tenure: for the h-index the relation is almost linear, while for citations it is quadratic. The normalized effect of tenure for the h-index (0.66) is twice that for citations (0.30). The results of the hurdle models for the i10-index are shown in Table 4. The first part is a logistic binary choice model in which the dependent variable is the probability that an author reaches a nonzero i10-index (i.e., has at least one publication with ten citations). The coefficients are interpreted as follows: with an increase of the average co-author citation by one standard deviation, the probability of reaching a nonzero index increases by 2.6%, and by 6% with an increase of the number of co-authors by one standard deviation. In the second part, we run a negative binomial model with truncated zeros in which the nonzero value of the i10-index is the dependent variable. As can be seen from the last column, for the i10-index the normalized effect of a discrete change in the average citation of co-authors (0.12) is noticeably weaker than that of a change in the number of co-authors (0.22).
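The "discrete increment" effects reported in Tables 2–4 can be illustrated for any count model with conditional mean E[y] = exp(xβ): the effect is the change in the predicted mean when one variable moves from its mean to its mean plus one standard deviation, with the others held at their means. A Python sketch with made-up coefficients (not those of the paper's tables):

```python
import math

def one_sd_effect(beta0, betas, means, sds, k):
    """Change in E[y] = exp(b0 + sum(b * x)) when variable k moves
    from its mean to mean + 1 std, other variables fixed at their means."""
    base = beta0 + sum(b * m for b, m in zip(betas, means))
    return math.exp(base + betas[k] * sds[k]) - math.exp(base)

# toy illustration: two regressors with means 1.0 and 2.0,
# stds 0.5 and 1.0; effect of a 1-std move in the first regressor
effect = one_sd_effect(0.5, [0.2, 0.1], [1.0, 2.0], [0.5, 1.0], 0)
# effect = e^(0.9 + 0.2*0.5) - e^0.9 = e^1.0 - e^0.9 ≈ 0.259
```

Dividing such an effect by the sample standard deviation of the dependent variable gives the normalized Δy/σ values shown in the last columns of the tables.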
4 Conclusion

In this work, we investigated how scientific collaboration, represented by a co-authorship network, relates to scientific impact. For this purpose, we analyzed the correlations between the number of co-authors, the author's centrality, the average citation of co-authors, and citation tenure on the one hand, and the citation characteristics (the number of citations, the h-index, and the i10-index) on the other. To avoid the dataset limitations of previous works, we used a large dataset about scientists from various countries
Table 4 Regression estimates for the i10-index

Dependent variable: P(i10 > 0)

| | Coefficient β | Δy | Low level | Higher level |
|---|---|---|---|---|
| Average citation of co-authors × 10⁻⁴ | 0.730*** (0.060) | 0.026 | 0.023 | 0.030 |
| Number of co-authors | 0.208*** (0.008) | 0.060 | 0.058 | 0.062 |
| Ln(tenure) | 2.804*** (0.040) | 0.058 | 0.056 | 0.061 |
| Constant | –3.374*** (0.062) | | | |
| Number of observations | 34701 | | | |
| pseudo-R² | 0.342 | | | |

Dependent variable: i10, conditional on i10 > 0

| | Coefficient β | Δy | Low level | Higher level | Δy/σ |
|---|---|---|---|---|---|
| Average citation of co-authors × 10⁻⁴ | 0.365*** (0.019) | 1.156 | 1.021 | 1.292 | 0.12 |
| Number of co-authors | 0.050*** (0.001) | 2.024 | 1.921 | 2.128 | 0.22 |
| Ln(tenure) | 1.861*** (0.030) | 4.093 | 3.925 | 4.262 | 0.44 |
| Constant | –1.909*** (0.051) | | | | |
| ln(α) | –0.384*** (0.030) | | | | |
| Number of observations | 28690 | | | | |
| pseudo-R² | 0.0738 | | | | |

Δy gives the change of the dependent variable when x changes by 1 std. Standard errors in brackets; ***p < 0.01
and scientific fields. We divided scientists by the year of their first citation and ran the regression analysis for relatively young scientists. The results of the count data regression models show a positive correlation between a scholar's citation count and the number of co-authors, and between citations and the author's centrality. The average citation of co-authors was also found to have a positive effect on a scholar's citations. We additionally estimated the strength of the statistical association between the variables using normalized coefficients that characterize the variation of the dependent variable (in absolute figures and in standard deviations) when a factor increases by one standard deviation. The results vary with the scientific specialization of the author. In addition, this study uses the h-index and the i10-index as indicators of scientific impact. These indicators correlate significantly with the number of co-authors and the average
citations of co-authors, while the normalized effect of the number of co-authors is approximately twice as large as the effect of the average co-author citation. Thus, we may conclude that scientists who maintain more contacts and are more active than others have, on average, better bibliometric indicators. These results correspond to other studies, e.g., [17, 32], which show that the citations of a publication grow with the number of co-authors, and that more frequently cited papers are mostly co-authored by scientists with higher network characteristics [9, 35]. On the other hand, our results do not coincide with Abbasi et al. [1], who concluded that closeness centrality is not positively correlated with the citation indicators of scientists. A possible reason is the inclusion in their sample of scientists with different citation tenures: the presence in a co-authorship network of professors with a large number of co-authors can influence the centrality of their graduate students. In our work, we obtain the opposite result due to the separation of the sample by the year of the first citation.
References

1. Abbasi, A., Altmann, J., Hossain, L.: Identifying the effects of co-authorship networks on the performance of scholars: A correlation and regression analysis of performance measures and social network analysis measures. J. Inf. 5(4), 594–607 (2011)
2. Ajiferuke, I., Famoye, F.: Modelling count response variables in informetric studies: Comparison among count, linear, and lognormal regression models. J. Inf. 9(3), 499–513 (2015)
3. Arnaboldi, V., Dunbar, R.I., Passarella, A., Conti, M.: Analysis of co-authorship ego networks. In: International Conference and School on Network Science, pp. 82–96. Springer, Cham (2016)
4. Avkiran, N.K.: An empirical investigation of the influence of collaboration in finance on article impact. Scientometrics 95(3), 911–925 (2013)
5. Bakkalbasi, N., Bauer, K., Glover, J., Wang, L.: Three options for citation tracking: Google Scholar, Scopus and Web of Science. Biomed. Digit. Libr. 3(1), 7 (2006)
6. Bergstrom, C.T., West, J.D., Wiseman, M.A.: The eigenfactor metrics. J. Neurosci. 28(45), 11433–11434 (2008)
7. Bornmann, L., Daniel, H.-D.: What do citation counts measure? A review of studies on citing behavior. J. Doc. 64(1), 45–80 (2008)
8. Cameron, A.C., Trivedi, P.K.: Regression Analysis of Count Data. Cambridge University Press, Cambridge (2013)
9. Cimenler, O., Reeves, K.A., Skvoretz, J.: A regression analysis of researchers' social network metrics on their citation performance in a college of engineering. J. Inf. 8(3), 667–682 (2014)
10. Delgado, E., Repiso, R.: The impact of scientific journals of communication: comparing Google Scholar Metrics, Web of Science and Scopus. Comunicar 21(41), 45–52 (2013)
11. Dijkstra, E.W.: A note on two problems in connexion with graphs. Numer. Math. 1(1), 269–271 (1959)
12. Ductor, L., Fafchamps, M., Goyal, S., van der Leij, M.J.: Social networks and research output. Rev. Econ. Stat. 96(5), 936–948 (2014)
13. Egghe, L.: Theory and practice of the g-index. Scientometrics 69(1), 131–152 (2006)
14. Franceschet, M.: A comparison of bibliometric indicators for computer science scholars and journals on Web of Science and Google Scholar. Scientometrics 83(1), 243–258 (2009)
15. Garfield, E.: Citation Indexing: Its Theory and Application in Science, Technology, and Humanities. Wiley, New York (1979)
16. Girvan, M., Newman, M.E.J.: Community structure in social and biological networks. Proc. Natl. Acad. Sci. 99(12), 7821–7826 (2002)
17. Glanzel, W.: Coauthorship patterns and trends in the sciences (1980–1998): A bibliometric study with implications for database indexing and search strategies. Libr. Trends 50(3), 461–475 (2002)
18. Glänzel, W., Schubert, A.: Analysing scientific networks through co-authorship. In: Handbook of Quantitative Science and Technology Research, pp. 257–276. Springer (2004)
19. Guan, J., Yan, Y., Zhang, J.J.: The impact of collaboration and knowledge networks on citations. J. Inf. 11(2), 407–422 (2017)
20. Harzing, A.-W.K., Van der Wal, R.: Google Scholar as a new source for citation analysis. Ethics Sci. Environ. Polit. 8(1), 61–73 (2008)
21. Heffner, A.: Funded research, multiple authorship, and subauthorship collaboration in four disciplines. Scientometrics 3(1), 5–12 (1981)
22. Hirsch, J.E.: An index to quantify an individual's scientific research output. Proc. Natl. Acad. Sci. USA 102(46), 16569–16572 (2005)
23. Hossain, M.I., Kobourov, S.: Research Topics Map: Rtopmap (2017). arXiv:1706.04979
24. Jacso, P.: Testing the calculation of a realistic h-index in Google Scholar, Scopus, and Web of Science for F.W. Lancaster. Libr. Trends 56(4), 784–815 (2008)
25. Jacsó, P.: The pros and cons of computing the h-index using Google Scholar. Online Inf. Rev. 32(3), 437–452 (2008)
26. Li, E.Y., Liao, C.H., Yen, H.R.: Co-authorship networks and research impact: a social capital perspective. Res. Policy 42(9), 1515–1530 (2013)
27. Murugesan, P., Moravcsik, M.J.: Variation of the nature of citation measures with journals and scientific specialties. J. Am. Soc. Inf. Sci. 29(3), 141–147 (1978)
28. Méndez-Vásquez, R.I., Suñén-Pinyol, E., Cervelló, R., Camí, J.: Identification and bibliometric characterization of research groups in the cardio-cerebrovascular field, Spain 1996–2004. Revista Española de Cardiología (English Edition) 65(7), 642–650 (2012)
29. Newman, M.E.J.: Fast algorithm for detecting community structure in networks. Phys. Rev. E 69(6), 066133 (2004)
30. Newman, M.E.J.: The structure of scientific collaboration networks. Proc. Natl. Acad. Sci. 98(2), 404–409 (2001)
31. Ortega, J.L.: How is an academic social site populated? A demographic study of Google Scholar Citations population. Scientometrics 104(1), 1–18 (2015)
32. Persson, O., Glänzel, W., Danell, R.: Inflationary bibliometric values: the role of scientific collaboration and the need for relative indicators in evaluative studies. Scientometrics 60(3), 421–432 (2004)
33. Puuska, H.-M., Muhonen, R., Leino, Y.: International and domestic co-publishing and their citation impact in different disciplines. Scientometrics 98(2), 823–839 (2014)
34. Scott, J.: Social Network Analysis: A Handbook. Sage (1991)
35. Uddin, S., Hossain, L., Rasmussen, K.: Network effects on scientific collaborations. PLoS ONE 8(2) (2013)
36. West, J., Bergstrom, T., Bergstrom, C.T.: Big Macs and eigenfactor scores: don't let correlation coefficients fool you. J. Am. Soc. Inf. Sci. Technol. 61(9), 1800–1807 (2010)
37. Wildgaard, L.: A comparison of 17 author-level bibliometric indicators for researchers in astronomy, environmental science, philosophy and public health in Web of Science and Google Scholar. Scientometrics 104(3), 873–906 (2015)
38. Yu, Q., Shao, H., Long, C., Duan, Z.: The relationship between research performance and international research collaboration in the C&C field. Exp. Clin. Cardiol. 20(6), 145–153 (2014)
39. Zuckerman, H.: Nobel laureates in science: Patterns of productivity, collaboration, and authorship. Am. Soc. Rev. 32(3), 391–403 (1967)
Company Co-mention Network Analysis S. P. Sidorov, A. R. Faizliev, V. A. Balash, A. A. Gudkov, A. Z. Chekmareva and P. K. Anikin
Abstract In network analysis, the importance of an object can be assessed using different centrality metrics such as degree, closeness, and betweenness. In our research, we form a network that we call the company co-mention network. It is constructed much like social networks or co-citation networks: each company is a node, and a news item mentioning two companies establishes a link between them. Each company acquires a certain value based on the number of news items in which it is mentioned. This research examines the network of companies using company co-mention news data. A matrix containing the number of co-mentioning news items between pairs of companies is created for the network analysis of companies whose shares are traded on major financial markets. We use different SNA metrics (degree centrality, closeness centrality, betweenness centrality, eigenvector centrality, and frequency) to find the key companies in the network. Moreover, we show that the degree distribution and the clustering-degree relation of our network follow a power law, although with atypical values of the degree exponent. News analytics data have been employed to collect the co-mention data, and R packages have been used for network analysis and visualization.

Keywords Network analysis · News analytics · Degree distribution · SNA metrics
1 Introduction

The paper [7] describes network analysis as a set of research procedures used to identify structures in systems based on the relations among the system's components (nodes) rather than the attributes of individual cases. Social network analysis (SNA)

This work was supported by the Russian Fund for Basic Research, project 18-37-00060.

S. P. Sidorov (B) · A. R. Faizliev · V. A. Balash · A. A. Gudkov · A. Z. Chekmareva · P. K. Anikin
Saratov State University, Saratov, Russian Federation
S. P. Sidorov et al.
allows us to analyze the structure of relations in an organization [8, 10]. The paper [53] considers SNA as a method of examining relationships among social entities. The fundamental concepts of SNA are the node and the link: a node is a unit (individual, object, item), and a link is a relationship between nodes. Company co-mention news data can easily be transformed into network data, since a company can be a node and the act of mentioning two companies in one news item can be considered a link between them. Moreover, based on co-mention data, one can examine different properties of networks, such as centrality and tie strength. Highly mentioned companies can be considered key companies, which have more influence on the economy or on other companies; key companies in co-mention analysis should be central nodes in the network. In addition, we can use the number of co-mentions as the link weight, so that frequently co-mentioned pairs of companies have a higher link weight. A company co-mention network is a set of companies connected in pairs to represent their co-mention relationships. Two companies are considered related if a publicly available news item mentioning both has been published. In such a network, a company is called a "node" or "vertex", and a connection is an "edge". The company co-mention network is represented by an undirected weighted graph. It represents only the links between companies and does not reflect how many news items were published mentioning each company individually. Since the company co-mention network is similar to social networks, different social network analysis metrics and citation indices can be used to find the key companies in the network, and in this paper we apply several of them.
In our research, we would like to answer the following questions:
• What are the key companies in the network?
• What type of degree distribution does the company co-mention network exhibit?
• What functional form does the clustering-degree relation of the network have?
Note that the past two decades have seen extensive research on the degree distributions of complex networks arising in sociology, physics, and biology. It has been shown that many networks have similar degree distributions [2, 4, 13, 22, 38, 42], and that most real networks are scale free [4]; in other words, their degree distributions follow a power law. The paper [9] studies the structure of international Internet hyperlinks; its results show that some countries are more central than others in the hyperlink network and that this structure corresponds to the core-periphery global structure suggested by world-systems theory.
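For the degree-exponent question, a standard way to estimate a power-law exponent from a degree sample is the continuous maximum-likelihood estimator α̂ = 1 + n / Σ_i ln(k_i / k_min). The Python sketch below illustrates this common estimator; it is not necessarily the procedure used later in the paper:

```python
import math

def powerlaw_alpha(degrees, k_min=1.0):
    """Continuous MLE of the power-law exponent, fitted to the tail
    of the degree sample with degrees >= k_min."""
    tail = [k for k in degrees if k >= k_min]
    return 1.0 + len(tail) / sum(math.log(k / k_min) for k in tail)

# sanity check: four observations at k = e give sum(ln k) = 4,
# hence alpha = 1 + 4/4 = 2
alpha = powerlaw_alpha([math.e] * 4, k_min=1.0)
```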
2 Data

News agencies, stock exchanges, companies, and journals generate a huge amount of economic and financial news in which company names are mentioned. As pointed out in the Introduction, company co-mention analysis can be carried out as follows:
1. Full texts of all economic and financial news published during a period of time are collected;
2. For each news item, the list of companies it mentions is gathered;
3. Over the whole set of available news, a weighted co-mention count is determined for each pair of co-mentioned companies;
4. The weighted co-mention counts are accumulated into a symmetric co-mention matrix;
5. The co-mention matrix is analyzed statistically, and the results are visualized and interpreted.
Steps 1 and 2 require the processing of a large number of news resources. Fortunately, these steps are performed by news analytics providers, and in our study we use their already prepared news analytics data.
2.1 News Analytics Data

Providers of news analytics (such as RavenPack or Thomson Reuters) collect news from different sources in real time. They extract data from sources such as news agencies and social media (blogs, social networks, etc.), and also examine the so-called pre-news: SEC reports, court documents, reports of various government agencies, business resources, company reports, announcements, and industrial and macroeconomic statistics. Providers then conduct a preliminary analysis of every news item in real time, analyzing news-related expectations (sentiment) using AI and taking into account the current market situation. For each news item, providers generate the following fields, among others: time stamp, company name, company id, relevance of the news, event category, event sentiment, novelty of the news, novelty id, and composite sentiment score. Subscribers receive the news analytics data feed in real time and may design and employ quantitative models or trading strategies that use both financial and news analytics data. A review of the methods and tools of news analytics can be found in the books [40, 41]. In recent years, news analytics tools have also been developed and used in social network analysis [11, 33, 39, 47].
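The per-item fields listed above can be pictured as a single analytics record. The Python sketch below is purely illustrative: the field names follow the list in the text, not any provider's actual schema, and the sample values are made up.

```python
from dataclasses import dataclass

@dataclass
class NewsRecord:
    """One analytics record per (news item, company) pair.
    Field names are illustrative, mirroring the fields listed in the
    text; they do not reproduce any real provider's schema."""
    timestamp: str
    company_id: str
    company_name: str
    relevance: float          # how relevant the item is to the company
    event_category: str
    event_sentiment: float
    novelty: float
    composite_sentiment: float

# a hypothetical record for a fictitious company
rec = NewsRecord("2015-02-02T09:30:00Z", "C001", "Acme Corp",
                 0.9, "earnings", 0.4, 1.0, 0.35)
```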
Table 1 News/companies

| | c1 | c2 | c3 | c4 | c5 | c6 | c7 |
|---|---|---|---|---|---|---|---|
| N1 | + | + | + | | | | |
| N2 | | | + | + | | | |
| N3 | | | | + | + | + | |
| N4 | | | | | | + | + |
| N5 | + | + | | | | | |
| N6 | | | | + | | + | |
| N7 | | | | + | + | + | |
| N8 | + | + | | | | | |
Table 2 The matrix of weights

| | c1 | c2 | c3 | c4 | c5 | c6 | c7 |
|---|---|---|---|---|---|---|---|
| c1 | 0 | 3 | 1 | 0 | 0 | 0 | 0 |
| c2 | 3 | 0 | 1 | 0 | 0 | 0 | 0 |
| c3 | 1 | 1 | 0 | 1 | 0 | 0 | 0 |
| c4 | 0 | 0 | 1 | 0 | 2 | 3 | 0 |
| c5 | 0 | 0 | 0 | 2 | 0 | 2 | 0 |
| c6 | 0 | 0 | 0 | 3 | 2 | 0 | 1 |
| c7 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
2.2 Generating the Company Co-mention Network

Methodology. The company co-mention network is formed based on co-mention: a company is connected to those companies with which it has been mentioned together in a news item. Based on the available news analytics data, we built an adjacency matrix representing the relationships between companies using the following approach. Suppose there are eight news items, N1–N8. Let N1 mention three companies c1, c2, and c3; N2 mention c3 and c4; N3 mention c4, c5, and c6; N4 mention c6 and c7; N5 mention c1 and c2; N6 mention c4 and c6; N7 mention c4, c5, and c6; and N8 mention c1 and c2 (see Table 1). The resulting nondirectional symmetric matrix of weights is presented in Table 2: the pair c1–c2 is linked with weight 3, c1–c3 with weight 1, and so on. The company co-mention network (Fig. 1) is constructed from the matrix in Table 2; nodes represent companies, and a link between two nodes emerges if there is a news item that mentions both companies in its text.

Network. Our data cover the 1-month period from February 1, 2015 to February 28, 2015 (i.e., 20 trading days). We consider all the news released during this period.
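The counting rule that turns the news/company incidence of Table 1 into the weight matrix of Table 2 can be sketched in a few lines. The paper used R packages for this; the Python version below is only an illustration of the rule, run on the N1–N8 example:

```python
from collections import defaultdict
from itertools import combinations

def comention_weights(news_to_companies):
    """Weight of each unordered company pair: the number of news items
    mentioning both companies. Keys are sorted (a, b) tuples."""
    w = defaultdict(int)
    for companies in news_to_companies.values():
        for a, b in combinations(sorted(set(companies)), 2):
            w[(a, b)] += 1
    return dict(w)

# the example of Table 1
news = {"N1": ["c1", "c2", "c3"], "N2": ["c3", "c4"],
        "N3": ["c4", "c5", "c6"], "N4": ["c6", "c7"],
        "N5": ["c1", "c2"], "N6": ["c4", "c6"],
        "N7": ["c4", "c5", "c6"], "N8": ["c1", "c2"]}
w = comention_weights(news)
# w[("c1", "c2")] == 3 and w[("c4", "c6")] == 3, matching Table 2
```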
Fig. 1 Company co-mention network
Table 3 The descriptive statistics of the daily number of news items

| day | n | Sum | Mean | Minimum | Maximum | St. deviation | Median | Skewness | Kurtosis |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 28 | 234736 | 8383.43 | 262 | 15069 | 5415.21 | 10653 | −0.60 | 1.75 |
We removed all the news on the imbalance of supply and demand released before the opening and the closing of trading at different stock exchanges. Such news items come out each trading day at the same time (usually several hundred items appear within a short interval at the beginning and at the end of the trading sessions). During February 2015, more than 230 thousand news items mentioning more than 18000 companies were released. We formed the list of all companies that share at least one news item with at least one other company, in order to include the companies that may be relevant to all aspects of our research; there are more than 7000 such companies in February 2015. Descriptive statistics of the resulting time series can be found in Table 3. After identifying and matching the indexed name of each company, co-mention frequencies between each pair of companies were calculated. A nondirectional symmetric matrix with valued weights was formed from the co-mention counts of each pair of companies, and an R package was used to compute basic statistics and to visualize the network. After generating the network, we can start our analysis.
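Summary statistics of the kind reported in Table 3 can be computed with standard moment formulas. A Python sketch follows; the exact skewness and kurtosis estimators used in the paper are not stated, so population-moment versions are assumed here, with kurtosis reported as excess kurtosis (the data below are toy values, not the paper's series):

```python
from statistics import mean, median, pstdev

def skewness(xs):
    """Moment-based skewness: third central moment / sigma^3."""
    m, s = mean(xs), pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)

def kurtosis(xs):
    """Excess kurtosis (0 for a normal distribution)."""
    m, s = mean(xs), pstdev(xs)
    return sum((x - m) ** 4 for x in xs) / (len(xs) * s ** 4) - 3.0

daily = [300, 8200, 9100, 10600, 15000]  # toy daily news counts
# a long left tail pulls the mean below the median, so skewness < 0
s = skewness(daily)
```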
3 Network Analysis

3.1 Methodology of Social Network Analysis
Social Network Analysis (SNA) considers social relations in terms of graph theory. SNA represents objects within the network as nodes, and relationships between the objects, such as friendship, co-authorship, organizational membership, or sexual relationships, as links [1, 26, 36]. These networks can be drawn as diagrams in which nodes are represented as points and links as lines. SNA consists of measuring the relationships and
S. P. Sidorov et al.
flows between objects (individuals, groups, organizations, URLs, and other connected entities) [18, 37, 53]. The nodes in our network are companies, while the links show relationships between them. SNA provides both a visual and a mathematical analysis of the relationships between individuals and objects [46, 49]. In general, analysis of social networks can help to evaluate the performance of individuals, groups, or the entire network.
3.2 SNA Metrics
Key objects are those that are related to or involved with many other objects. In the context of our analysis, a company with extensive links or co-mentions with many other companies in the economy is considered more important than a company with relatively fewer links. Different types of SNA metrics can be used to find a key company in the network. In this section, we present the following well-known metrics: degree centrality, closeness centrality, betweenness centrality, eigenvector centrality, and frequency.

Frequency
The frequency of a company is defined as the total number of news items in which the company is mentioned.

Degree Centrality
Degree centrality (or connectedness) is simply a node's number of direct links [25]. The degree centrality of a node is equal to the number of edges adjacent to this node [32, 36]. Degree centrality is the simplest measure of centrality, since it quantifies only how many connections link companies to their immediate neighbors in the network [50]. Thus, key nodes usually have a high degree, and degree centrality is an indicator of a company's influence. The normalized degree centrality is defined as the number of links of a node divided by the maximal possible number of links [20, 23]. The normalized degree centrality d_i of node i is defined as

d_i = d(i) / (n − 1),

where d(i) is the degree of node i and n is the number of nodes in the network.
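On the toy network of Fig. 1 (a graph we reconstruct here from the example weights; the paper's actual computations were done in R over the full network), normalized degree centrality is a one-liner:

```python
# Toy co-mention graph from the example (Fig. 1), as unweighted adjacency sets.
graph = {
    "c1": {"c2", "c3"}, "c2": {"c1", "c3"}, "c3": {"c1", "c2", "c4"},
    "c4": {"c3", "c5", "c6"}, "c5": {"c4", "c6"}, "c6": {"c4", "c5", "c7"},
    "c7": {"c6"},
}

def normalized_degree(graph, i):
    """d_i = d(i) / (n - 1): direct links over the maximal possible number."""
    return len(graph[i]) / (len(graph) - 1)

print(normalized_degree(graph, "c4"))  # 3 links out of 6 possible -> 0.5
```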
Closeness Centrality
To quantify how close a node is to all other nodes in a network (both directly and indirectly), one can use the measure of closeness [36]. Closeness is based on the average length (number of steps) of the shortest paths connecting a node to all other nodes in the network, and it is measured as the inverse of this average [25]. The closeness metric can only be used in a network in which every two nodes are joined by at least one path, i.e., a connected network [17, 19, 31]. The closeness centrality C_c(i) of a node i is defined as follows:

C_c(i) = 1 / avg(L(i, j)),    (1)
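For an unweighted network, the shortest-path lengths L(i, j) in Eq. (1) can be obtained by breadth-first search; a minimal sketch on the toy graph (our own illustration, not the paper's R code):

```python
from collections import deque

# Toy co-mention graph from the example (Fig. 1).
graph = {
    "c1": {"c2", "c3"}, "c2": {"c1", "c3"}, "c3": {"c1", "c2", "c4"},
    "c4": {"c3", "c5", "c6"}, "c5": {"c4", "c6"}, "c6": {"c4", "c5", "c7"},
    "c7": {"c6"},
}

def bfs_distances(graph, source):
    """Shortest-path lengths from source to every reachable node (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def closeness(graph, i):
    """Eq. (1): inverse of the average shortest-path length to all other nodes."""
    dist = bfs_distances(graph, i)
    lengths = [d for node, d in dist.items() if node != i]
    return len(lengths) / sum(lengths)

print(round(closeness(graph, "c4"), 3))  # average distance 9/6 -> 0.667
```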
where L(i, j) is the length of the shortest path between two nodes i and j. The closeness centrality of every node lies between 0 and 1. In the framework of SNA, closeness centrality measures the speed with which information spreads from a given node to the other reachable nodes in the network.

Betweenness Centrality
Betweenness centrality displays the extent to which a node lies on the paths connecting other pairs of nodes [34]. A node in this location is in a position to control or influence information flows in the network. It is the proportion of shortest paths between pairs of nodes in the network that contain a given node. Thus, betweenness centrality measures an object's potential control of communication within the network [32, 36]. By definition, betweenness centrality is the ratio of the number of shortest paths (between all pairs of nodes) that pass through a given node to the total number of shortest paths. The betweenness centrality C_b(i) of a node i is equal to

C_b(i) = (2 / ((n − 1)(n − 2))) · Σ_{j ≠ i ≠ k} σ_jk(i) / σ_jk,    (2)

where the sum is taken over all nodes j and k that are distinct from i, σ_jk is the number of shortest paths from j to k, and σ_jk(i) denotes the number of shortest paths from j to k that contain the node i [24, 26]. The coefficient (n − 1)(n − 2)/2 is the number of node pairs excluding i; its inverse normalizes the value of betweenness centrality, so that for each node i, the value of C_b(i) lies between 0 and 1 [16, 27, 52].

Eigenvector Centrality
Eigenvector centrality shows how central (or peripheral) an object is based on its loading on the largest eigenvector of the network's sociomatrix [14]. Eigenvector centrality was proposed to measure the influence of a node in a network, and it extends the notion of degree centrality [48].
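Equation (2) can be evaluated directly on a small unweighted graph: BFS from each node yields the distances and the shortest-path counts σ, and a shortest j–k path passes through i exactly when d(j, i) + d(i, k) = d(j, k), in which case σ_jk(i) = σ_ji · σ_ik. A brute-force sketch on the toy graph (our own illustration; production code would use Brandes' algorithm):

```python
from collections import deque
from itertools import combinations

# Toy co-mention graph from the example (Fig. 1).
graph = {
    "c1": {"c2", "c3"}, "c2": {"c1", "c3"}, "c3": {"c1", "c2", "c4"},
    "c4": {"c3", "c5", "c6"}, "c5": {"c4", "c6"}, "c6": {"c4", "c5", "c7"},
    "c7": {"c6"},
}

def bfs_paths(graph, s):
    """Shortest-path distances and path counts (sigma) from s, via BFS."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma

def betweenness(graph, i):
    """Eq. (2): 2/((n-1)(n-2)) * sum over pairs j,k != i of sigma_jk(i)/sigma_jk."""
    n = len(graph)
    dist, sigma = {}, {}
    for s in graph:
        dist[s], sigma[s] = bfs_paths(graph, s)
    total = 0.0
    for j, k in combinations([v for v in graph if v != i], 2):
        # A shortest j-k path passes through i iff d(j,i) + d(i,k) = d(j,k).
        if dist[j][i] + dist[i][k] == dist[j][k]:
            total += sigma[j][i] * sigma[i][k] / sigma[j][k]
    return 2 * total / ((n - 1) * (n - 2))

print(round(betweenness(graph, "c6"), 3))  # 5 of 15 pair paths -> 0.333
```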
While the degree centrality of a node counts the total number of nodes adjacent to it, eigenvector centrality takes into account both the total number of adjacent nodes and the importance of each of them. In some sense, links to influential individuals give an individual more influence than connections to less important persons; therefore, links are not equal in eigenvector centrality. To find eigenvector centrality, we should take into consideration not only the connections of a node but also the eigenvector centrality values of the associated nodes. Eigenvector centrality can be found by estimating how well connected an individual is to the parts of the network with the greatest connectivity [15]. If an individual has a high eigenvector score, then it must have many direct links to other individuals which themselves have many direct connections, and so on out to the ends of the network. Eigenvector centrality was proposed in 1987 by Philip Bonacich [21, 43]. It is defined simply as the dominant eigenvector of the adjacency matrix. Eigenvector centrality is a useful measure for networks in which the weights (quantifying the frequency of communication) between objects are known, rather than only the presence or absence of a tie between nodes. It has the effect of making nodes with high connectivity to more important nodes appear relatively more important.
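The dominant eigenvector of the adjacency matrix can be approximated by power iteration: repeatedly set each node's score to the sum of its neighbors' scores and renormalize. A minimal sketch on the toy graph (our own illustration; it assumes a connected, non-bipartite network so the iteration converges):

```python
# Toy co-mention graph from the example (Fig. 1).
graph = {
    "c1": {"c2", "c3"}, "c2": {"c1", "c3"}, "c3": {"c1", "c2", "c4"},
    "c4": {"c3", "c5", "c6"}, "c5": {"c4", "c6"}, "c6": {"c4", "c5", "c7"},
    "c7": {"c6"},
}

def eigenvector_centrality(graph, iterations=200):
    """Power iteration x <- A x, rescaled each step so the largest score is 1.
    Converges to the dominant eigenvector of the adjacency matrix."""
    x = {v: 1.0 for v in graph}
    for _ in range(iterations):
        x_new = {v: sum(x[u] for u in graph[v]) for v in graph}
        norm = max(x_new.values())
        x = {v: s / norm for v, s in x_new.items()}
    return x

scores = eigenvector_centrality(graph)
print(max(scores, key=scores.get))  # c4: well connected to well-connected nodes
```

Note that the pendant node c7 inherits only a fraction of c6's score, which is exactly the "links are not equal" effect described above.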
As pointed out in [12], individuals with high eigenvector centrality cannot necessarily perform the roles of individuals with high closeness and betweenness. As can be seen from Table 4, Apple, with high frequency and a high degree, is a key company by a significant margin, as it has the largest number of joint news items and the greatest number of links with other companies. We note that in terms of closeness centrality, all the companies in question are nearly identical. The betweenness centrality indicator suggests that Apple is also one of the connecting nodes of the network. Note that Bank of America falls into this list with a high betweenness centrality, although this company is not among the top ten in terms of frequency or degree. Eigenvector centrality indicates that Continental Resources has connections with the largest and most important companies, while Apple's indicator is low (0.20). It should also be noted that Apple trades on the Nasdaq exchange, while the remaining 19 companies from the table belong to the New York Stock Exchange.
3.3 Key Company Analysis and Results
Our aim is to detect a key company in the company co-mention network. We constructed the company co-mention network from the available news analytics data and conducted frequency, degree centrality, closeness centrality, betweenness centrality, and eigenvector centrality analyses. We start by calculating the frequency, degree centrality, closeness centrality, betweenness centrality, and eigenvector centrality for every company using the R package. We then sort the companies in descending order for each measure, select the top 10 companies under each measure, and combine the lists, obtaining the 20 important companies presented in Table 4. The table shows the list of companies, ordered first by decreasing frequency and then by degree centrality, betweenness centrality, and eigenvector centrality. The data of these companies are available for our analysis. Companies with a high frequency are key since they appear in a larger number of news items mentioning other companies. Companies with a high degree centrality are important since they have more co-mention relationships with other companies. However, we cannot always claim that a company with a high degree centrality is important, because a company can sometimes be connected to a large number of companies, many of which share only a very small number of news items with other companies. For this reason, degree centrality is not entirely appropriate for detecting the key company in the network. Companies with a high closeness centrality are also important, since they are close, on average, to the other companies. Companies with a high betweenness centrality are also significant, because such companies regularly stand in the way of information flow in the network. Companies with a high eigenvector centrality are key in the network, since they are connected to companies that themselves have high eigenvector centrality scores.
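The selection step, sorting companies by each measure, taking the top k of each, and merging the lists, can be sketched as follows (the scores below are a small hypothetical subset taken from Table 4, used only to illustrate the merging logic):

```python
# Hypothetical per-measure scores (subset of Table 4); the paper computes
# these over the full February 2015 network in R.
metrics = {
    "frequency":   {"Apple": 3590, "Apache": 2335, "Exxon Mobil": 2061},
    "betweenness": {"Apple": 4.18, "Bank of America": 3.13, "Exxon Mobil": 3.02},
}

def top_union(metrics, k=10):
    """Sort companies by each measure, take the top k of each, merge the lists."""
    selected = set()
    for scores in metrics.values():
        ranked = sorted(scores, key=scores.get, reverse=True)
        selected.update(ranked[:k])
    return selected

print(sorted(top_union(metrics, k=2)))  # ['Apache', 'Apple', 'Bank of America']
```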
Company Co-mention Network Analysis
349
Table 4 Top 20 companies with the highest frequency

| Company | Frequency | Degree | Degree centrality ×10−1 | Closeness centrality ×10−7 | Betweenness centrality ×10−2 | Eigenvector centrality |
| Apple | 3590 | 759 | 1,08 | 1,48825 | 4,18 | 0,20 |
| Apache | 2335 | 327 | 0,47 | 1,48774 | 0,65 | 0,97 |
| Continental Resources | 2291 | 292 | 0,42 | 1,48776 | 0,23 | 1,00 |
| General Motors | 2062 | 393 | 0,56 | 1,48818 | 2,31 | 0,09 |
| Exxon Mobil | 2061 | 523 | 0,74 | 1,48823 | 3,02 | 0,19 |
| Halliburton | 1973 | 450 | 0,64 | 1,48803 | 0,61 | 0,17 |
| Anadarko Petroleum | 1914 | 339 | 0,48 | 1,48790 | 1,05 | 0,72 |
| Chesapeake Energy | 1840 | 336 | 0,48 | 1,48807 | 0,84 | 0,66 |
| Basic Energy Services | 1828 | 406 | 0,58 | 1,48813 | 0,60 | 0,14 |
| Devon Energy | 1714 | 244 | 0,35 | 1,48777 | 0,13 | 0,83 |
| Transocean Ltd | 1710 | 469 | 0,67 | 1,48818 | 1,62 | 0,14 |
| ACT.N | 1678 | 416 | 0,59 | 1,48801 | 0,91 | 0,11 |
| Bank of America | 1651 | 389 | 0,55 | 1,48834 | 3,13 | 0,09 |
| EP Energy | 1547 | 203 | 0,29 | 1,48795 | 0,60 | 0,82 |
| JPMorgan Chase | 1541 | 362 | 0,52 | 1,48820 | 2,56 | 0,08 |
| Concho Resources | 1540 | 163 | 0,23 | 1,48745 | 0,08 | 0,77 |
| Pioneer Natural Resources | 1535 | 83 | 0,12 | 1,48697 | 0,00 | 0,94 |
| Antero Resources | 1502 | 280 | 0,40 | 1,48752 | 0,10 | 0,47 |
| RSP Permian | 1496 | 80 | 0,11 | 1,48703 | 0,00 | 0,91 |
| American Express | 1493 | 383 | 0,54 | 1,48816 | 1,32 | 0,09 |
Closeness and betweenness centrality are useful in information flow networks, but the company co-mention network is a news co-mention relationship network. Both the stability and the behavior of financial markets are highly dependent on news flow, and a chain reaction of bad or good news for a company can affect other companies, which might lead to turmoil in financial markets. Such an effect depends on the company: if a company is highly connected and has links with other highly connected companies, then there is a high probability that news mentioning the company will affect the market. Therefore, eigenvector centrality is more suitable for identifying the key company in the network.
3.4 Degree Distribution Analysis
A local topological measure such as the degree of a given node is not sufficient to characterize the whole network. One can integrate this local measure into a global description of the network by calculating the degree distribution, P(k), which represents the probability that a randomly chosen node has degree k. To find the degree distribution, one should count the number of nodes, n(k), that have k = 1, 2, 3, . . . edges, and then divide n(k) by the total number of nodes n of the network. A degree distribution is called a power-law distribution if P(k) ∼ Ak^(−γ), where A is a constant such that the sum of P(k) over all k is equal to 1. The degree exponent γ is usually the same for similar networks. For example, it was shown in [4, 29, 30, 35] that 2 < γ < 3 for metabolic networks and for the out-degree distribution of most gene-regulatory networks. The power law of the degree distribution for the vast majority of real networks reveals the fractal structure of the underlying systems. It means that node degrees are highly dissimilar and there is no average node in the network that characterizes the rest of the nodes. To be fair, it should be noted that some networks demonstrate exponential degree distributions; these include the network of substations and power lines that forms the North American power grid [3] and the network of worldwide air transportation routes [5]. For the network under consideration, the degree distribution follows the power law with γ = 1.41:

P(k) ∼ 8.34 k^(−1.41),    R² = 0.87.

The resulting model is statistically significant at any level of significance and is significantly better than the exponential model. However, the degree exponent γ = 1.41 does not fall within the interval (2, 3). Figure 2 shows the dependence of the number of companies n(k) on the node degree k.
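The exponent γ is typically estimated by a least-squares line in log-log coordinates, since P(k) ∼ Ak^(−γ) implies log P(k) = log A − γ log k. A minimal sketch on hypothetical (k, n(k)) counts (the real fit in the paper is over the full network and yields γ = 1.41):

```python
import math

# Hypothetical degree-count pairs (k, n(k)), roughly power-law shaped.
samples = [(1, 2000), (2, 800), (4, 300), (8, 120), (16, 45), (32, 18)]
total = sum(n for _, n in samples)

def fit_power_law(samples, total):
    """Least-squares line in log-log space: log P(k) = log A - gamma * log k."""
    xs = [math.log(k) for k, _ in samples]
    ys = [math.log(n / total) for _, n in samples]   # log of empirical P(k)
    m = len(xs)
    mean_x, mean_y = sum(xs) / m, sum(ys) / m
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    gamma = -slope
    a = math.exp(mean_y - slope * mean_x)
    return gamma, a

gamma, a = fit_power_law(samples, total)
print(round(gamma, 2))
```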
Fig. 2 The degree distribution of the companies’ co-mention network
3.5 Clustering Analysis
The local clustering coefficient of node i is defined by

C_i = 2E_i / (k_i(k_i − 1)),

where E_i is the number of links connecting the immediate neighbors of node i, and k_i is the degree of node i. The average value of the clustering coefficients of all nodes in a network is called the average clustering coefficient; its value quantifies the strength of connectivity within the network. The paper [51] examines protein–protein interaction networks and metabolic networks, which were shown to demonstrate large average clustering coefficients. Analogous results were established for collaboration networks in academia and the entertainment industry in papers [6] and [28]. Let C(k) denote the average clustering coefficient of nodes with degree k. It has been found that for most real networks C(k) follows C(k) ∼ Bk^(−β), where the exponent β usually lies between 1 and 2 [44, 45, 54]. For the given network, the clustering–degree relation follows the power law:

C(k) ∼ 1.02 k^(−0.32),    R² = 0.69.

The resulting model is statistically significant at any significance level. Here, however, the exponent β = 0.32 turns out to be less than 1.
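The local clustering coefficient counts how many of the possible links among a node's neighbors are actually present; a minimal sketch on the toy graph (our own illustration):

```python
from itertools import combinations

# Toy co-mention graph from the example (Fig. 1).
graph = {
    "c1": {"c2", "c3"}, "c2": {"c1", "c3"}, "c3": {"c1", "c2", "c4"},
    "c4": {"c3", "c5", "c6"}, "c5": {"c4", "c6"}, "c6": {"c4", "c5", "c7"},
    "c7": {"c6"},
}

def local_clustering(graph, i):
    """C_i = 2 E_i / (k_i (k_i - 1)), with E_i the links among i's neighbors."""
    neighbors = graph[i]
    k = len(neighbors)
    if k < 2:
        return 0.0                     # coefficient undefined; conventionally 0
    e = sum(1 for u, v in combinations(neighbors, 2) if v in graph[u])
    return 2 * e / (k * (k - 1))

print(local_clustering(graph, "c1"))   # both neighbors (c2, c3) linked -> 1.0
```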
Fig. 3 The clustering–degree relation of the companies' co-mention network
The plot of the clustering–degree relation, i.e., C(k) as a function of the node degree k, is shown in Fig. 3. The proportion of present edges out of all possible edges in the network (its density) is 0.005. The diameter of our undirected graph, i.e., the maximum length of the shortest path between two vertices, is 11, and the average distance between vertices is 3.48.
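The density, diameter, and average distance quoted above follow directly from the edge count and all-pairs BFS distances; a sketch on the toy graph (our own illustration, whose much smaller network naturally has different values):

```python
from collections import deque

# Toy co-mention graph from the example (Fig. 1): 7 nodes, 8 edges.
graph = {
    "c1": {"c2", "c3"}, "c2": {"c1", "c3"}, "c3": {"c1", "c2", "c4"},
    "c4": {"c3", "c5", "c6"}, "c5": {"c4", "c6"}, "c6": {"c4", "c5", "c7"},
    "c7": {"c6"},
}

def density(graph):
    """Present edges divided by all possible edges n(n-1)/2."""
    n = len(graph)
    edges = sum(len(nbrs) for nbrs in graph.values()) // 2
    return edges / (n * (n - 1) / 2)

def diameter_and_average_distance(graph):
    """BFS from every node; max and mean of all pairwise shortest-path lengths."""
    lengths = []
    for s in graph:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        lengths.extend(d for node, d in dist.items() if node != s)
    return max(lengths), sum(lengths) / len(lengths)

print(round(density(graph), 3))        # 8 of 21 possible edges -> 0.381
diam, avg = diameter_and_average_distance(graph)
print(diam)                            # longest shortest path: c1-c7 -> 4
```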
4 Conclusion
This article examined the company co-mention network and identified key companies in the network based on different network analysis indicators: frequency, normalized degree centrality, closeness centrality, betweenness centrality, and eigenvector centrality. Most of the leading (key) companies belong to the New York Stock Exchange. It was shown that the degree distribution and the clustering–degree relation of our network adhere to the power law, although with atypical values of the degree exponents.
References
1. Abbasi, A., Altmann, J.: On the correlation between research performance and social network analysis measures applied to research collaboration networks. In: 44th Hawaii International Conference on System Sciences (HICSS), pp. 1–10. IEEE (2011)
2. Albert, R.: Scale-free networks in cell biology. J. Cell Sci. 118, 4947–4957 (2005)
3. Albert, R., Albert, I., Nakarado, G.: Structural vulnerability of the North American power grid. Phys. Rev. E 69, 025103(R) (2004)
4. Albert, R., Barabasi, A.L.: Statistical mechanics of complex networks. Rev. Mod. Phys. 74, 47–97 (2002)
5. Amaral, L.A.N., Scala, A., Barthelemy, M., Stanley, H.E.: Classes of small-world networks. Proc. Natl. Acad. Sci. USA 97, 11149–11152 (2000)
6. Anthonisse, J.M.: The rush in a directed graph. Technical report (1971)
7. Barnett, G.: A longitudinal analysis of the international telecommunication network. Am. Behav. Sci. 44, 1655–1938 (2001)
8. Barnett, G., Danowski, J.A.: The structure of communication: a network analysis of the international communication association. Hum. Commun. Res. 19(2), 264–285 (1992)
9. Barnett, G., Park, H.: The structure of international internet hyperlinks and bilateral bandwidth. Annales des télécommunications 60(9–10), 1110–1127 (2005)
10. Barnett, G., Salisbury, J.: Communication and globalization: a longitudinal analysis of the international telecommunication network. J. World Syst. Res. 2(16), 1–17 (1996)
11. Batrinca, B., Treleaven, P.C.: Social media analytics: a survey of techniques, tools and platforms. AI & Society 30(1), 89–116 (2015)
12. Bihari, A., Pandia, M.: Key author analysis in research professionals' relationship network using citation indices and centrality. Procedia Comput. Sci. 57, 606–613 (2015)
13. Boccaletti, S., Latora, V., Moreno, Y., Chavez, M., Hwang, D.U.: Complex networks: structure and dynamics. Phys. Rep. 424, 175–308 (2006)
14. Bonacich, P.: Factoring and weighting approaches to status scores and clique identification. J. Math. Soc. 2, 113–120 (1972)
15. Bonacich, P., Lloyd, P.: Eigenvector-like measures of centrality for asymmetric relations. Soc. Netw. 23(3), 191–201 (2001)
16. Borgatti, S.P.: Centrality and AIDS. Connections 18(1), 112–114 (1995)
17. Borgatti, S.P., Everett, M.G.: A graph-theoretic perspective on centrality. Soc. Netw. 28(4), 466–484 (2006)
18. Correa, C., Crnovrsanin, T., Ma, K.L.: Visual reasoning about social networks using centrality sensitivity. IEEE Trans. Vis. Comput. Graph. 18(1), 106–120 (2012)
19. Correa, C.D., Crnovrsanin, T., Ma, K.L., Keeton, K.: The derivatives of centrality and their applications in visualizing social networks. Citeseer (2009)
20. Deng, Q., Wang, Z.: Degree centrality in scientific collaboration supernetwork. In: International Conference on Information Science and Technology (ICIST), pp. 259–262. IEEE (2011)
21. Ding, D., He, X.: Application of eigenvector centrality in metabolic networks. In: 2nd International Conference on Computer Engineering and Technology (ICCET), vol. 1, pp. 1–89. IEEE (2010)
22. Dorogovtsev, S.N., Mendes, J.F.F.: Evolution of networks. Adv. Phys. 51, 1079 (2002)
23. Erkan, G., Radev, D.R.: LexRank: graph-based lexical centrality as salience in text summarization. J. Artif. Intell. Res. (JAIR) 22(1), 457–479 (2004)
24. Estrada, E., Rodríguez-Velázquez, J.A.: Subgraph centrality in complex networks. Phys. Rev. E 71(5), 056103 (2005)
25. Freeman, L.: Centrality in networks: I. Conceptual clarification. Soc. Netw. 1(3), 215–239 (1979)
26. Friedl, D.B., Heidemann, J.: A critical review of centrality measures in social networks. Bus. Inf. Syst. Eng. 2(6), 371–385 (2010)
27. Gómez, D., Figueira, J.R., Eusébio, A.: Modeling centrality measures in social network analysis using bi-criteria network flow optimization problems. Eur. J. Oper. Res. 226(2), 354–365 (2013)
28. Granovetter, M.: The strength of weak ties. Am. J. Soc. 78, 1360 (1973)
29. Guelzim, N., Bottani, S., Bourgine, P., Kepes, F.: Topological and causal structure of the yeast transcriptional network. Nat. Genet. 31, 60–63 (2002)
30. Jeong, H., Tombor, B., Albert, R., Oltvai, Z.N., Barabasi, A.L.: The large-scale organization of metabolic networks. Nature 407, 651–654 (2000)
31. Jin, J., Xu, K., Xiong, N., Liu, Y., Li, G.: Multi-index evaluation algorithm based on principal component analysis for node importance in complex networks. IET Networks 1(3), 108–115 (2012)
32. Kas, M., Carley, L.R., Carley, K.M.: Monitoring social centrality for peer-to-peer network protection. IEEE Commun. Mag. 51(12), 155–161 (2013)
33. Khan, W., Daud, A., Nasir, J.A., Amjad, T.: A survey on the state-of-the-art machine learning models in the context of NLP. Kuwait J. Sci. 43(4), 95–113 (2016)
34. Kincaid, D.: Communication network dynamics: cohesion, centrality, and cultural evolution. Prog. Commun. Sci. XII, 111–133 (1993)
35. Lee, T.I.: Transcriptional regulatory networks in Saccharomyces cerevisiae. Science 298, 799–804 (2002)
36. Liu, B.: Web Data Mining. Springer (2007)
37. Liu, X., Bollen, J., Nelson, M.L., Van de Sompel, H.: Co-authorship networks in the digital library research community. Inf. Process. Manag. 41(6), 1462–1480 (2005)
38. Lofdahl, C., Stickgold, E., Skarin, B., Stewart, I.: Extending generative models of large scale networks. Procedia Manufacturing 3(Supplement C), 3868–3875 (2015). In: 6th International Conference on Applied Human Factors and Ergonomics (AHFE 2015) and the Affiliated Conferences
39. Manaman, H.S., Jamali, S., AleAhmad, A.: Online reputation measurement of companies based on user-generated content in online social networks. Comput. Hum. Behav. 54(Supplement C), 94–100 (2016)
40. Mitra, G., Mitra, L. (eds.): The Handbook of News Analytics in Finance. Wiley (2011)
41. Mitra, G., Yu, X. (eds.): Handbook of Sentiment Analysis in Finance (2016)
42. Newman, M.E.J.: The structure and function of complex networks. SIAM Rev. 45, 167–256 (2003)
43. Newman, M.E.J.: The mathematics of networks. In: The New Palgrave Encyclopedia of Economics, vol. 2, pp. 1–12 (2008)
44. Ravasz, E., Barabasi, A.L.: Hierarchical organization in complex networks. Phys. Rev. E 67, 026112 (2003)
45. Ravasz, E., Somera, A.L., Mongru, D.A., Oltvai, Z.N., Barabasi, A.L.: Hierarchical organization of modularity in metabolic networks. Science 297, 1551–1555 (2002)
46. Said, Y.H., Wegman, E., Sharabati, W.K., Rigsby, J.: Social networks of author-coauthor relationships. Comput. Stat. Data Anal. 52(4), 2177–2184 (2008)
47. Schuller, B., Mousa, A.E., Vryniotis, V.: Sentiment analysis and opinion mining: on optimal parameters and performances. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 5(5), 255–263 (2015)
48. Spizzirri, L.: Justification and application of eigenvector centrality
49. Tang, J., Zhang, D., Yao, L.: Social network extraction of academic researchers. In: Seventh IEEE International Conference on Data Mining, pp. 292–301. IEEE (2007)
50. Umadevi, V.: Automatic co-authorship network extraction and discovery of central authors. Int. J. Comput. Appl. 74(4), 1–6 (2013)
51. Wagner, A., Fell, D.A.: The small world inside large metabolic networks. Proc. R. Soc. Lond. Ser. B Biol. Sci. 268, 1803–1810 (2001)
52. Wang, G., Shen, Y., Luan, E.: A measure of centrality based on modularity matrix. Prog. Nat. Sci. 18(8), 1043–1047 (2008)
53. Wasserman, S., Faust, K.: Social Network Analysis: Methods and Applications. Cambridge University Press (1994)
54. Yook, S.H., Oltvai, Z.N., Barabasi, A.L.: Functional and topological characterization of protein interaction networks. Proteomics 4, 928–942 (2004)