Gilbert Müller

Workflow Modeling Assistance by Case-based Reasoning

Dissertation, University of Trier, Department IV, 2018
Trier, Germany
ISBN 978-3-658-23558-1
ISBN 978-3-658-23559-8 (eBook)
https://doi.org/10.1007/978-3-658-23559-8
Library of Congress Control Number: 2018955144

Springer Vieweg
© Springer Fachmedien Wiesbaden GmbH, part of Springer Nature 2018

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer Vieweg imprint is published by the registered company Springer Fachmedien Wiesbaden GmbH, part of Springer Nature. The registered company address is: Abraham-Lincoln-Str. 46, 65189 Wiesbaden, Germany.
Acknowledgements

In recent years, many people greatly supported my work and the final outcome presented in this thesis. First and foremost, I would like to express my sincere gratitude to my supervisor Prof. Dr. Ralph Bergmann for his inspiring guidance and constant readiness to engage in dialogue. His great expertise strongly influenced my knowledge on Case-based Reasoning and Workflow Management. Not only for my research, but also for the completion of this thesis, his constructive advice was always highly valuable. There are countless things I learned from him during that time.

Furthermore, I would like to thank Prof. Dr. Mirjam Minor for her immediate agreement to be my second supervisor. The discussions we shared since the very beginning of the EVER project and also her feedback regarding PO-CBR were very helpful for my work.

Moreover, I am especially grateful for the cooperative and friendly atmosphere at the Center for Informatics Research and Technology (CIRT) at the University of Trier. In particular, I would like to thank my colleagues Dr. Sebastian Görg, Christian Zeyen, Lisa Grumbach, Sarah Gessinger, and Maria Gindorf, who supported me in various ways throughout the accomplishment of this thesis. For implementing some parts of the presented ideas in the CAKE framework, I appreciate the work of Christian Zeyen, Maximilian Hoffman, and Jens Manderscheid. Furthermore, I gratefully acknowledge the detailed proofreading of this thesis by Lisa Grumbach and Mirna Stieler. Their feedback was highly useful for the revision of my drafts and the final version.

Also, I would like to thank the entire ICCBR community for the brilliant conferences and many great debates we had. Especially, I am very glad that Prof. Dr. Miltos Petridis and Prof. Dr. Susan Craw were my mentors at the ICCBR doctoral consortiums and took time to discuss my work with me.

Certainly, all of this would not have been possible without the great support of my family and my friends, who always encouraged me on this exciting journey.
Gilbert Müller
Trier, January 2018
Summary

Nowadays, workflows are applied in an increasing number of application areas to represent and (semi-)automatically execute various kinds of processes. The modeling of workflows, however, is a demanding task, as it is not only a time-consuming but also a complex and error-prone activity. Thus, there is a high demand for methods to support workflow modelers in this endeavor. As a consequence, several approaches were recently presented that enable the search for already modeled workflows. However, search is often not sufficient, because workflows more frequently need to be tailored to individual circumstances. This work addresses that problem by presenting a novel workflow modeling assistance, which automatically constructs workflows based on a given query. The approach applies methods from artificial intelligence, in particular from the field of Case-based Reasoning (CBR). Case-based Reasoning is a problem-solving methodology that reuses experience from previous problem-solving episodes (here, previously modeled workflows). Following the CBR principle, the workflow modeling assistance searches for the best-matching workflow from a repository of previously modeled workflows according to a specified query and then automatically adapts this workflow, if required. As a result, an individual workflow is automatically constructed. Overall, this work lays highly relevant and new foundations in the field of Process-Oriented Case-based Reasoning (PO-CBR), in which automated workflow construction has hardly been investigated so far. From a workflow management perspective, this work further presents an innovative contribution to support workflow modelers and may be a basis for further research in many workflow application areas.

In more detail, this work first summarizes the most important foundations of workflow management and Case-based Reasoning. Next, a novel query language for the retrieval of workflows is presented, which captures the restrictions and requirements on the desired workflow model. Based on this, three new workflow adaptation approaches are introduced, which are based on well-established methods in CBR. More precisely, the presented compositional adaptation decomposes workflows into meaningful components called workflow streams that can be replaced by other matching workflow streams. The operator-based adaptation uses operators that specify the deletion, insertion, or replacement of workflow fragments. Finally, generalization and specialization enable workflow adaptation by means of generalized workflows. The methods are then integrated into a combined adaptation process. In general, all adaptation approaches require adaptation knowledge (e.g., rules or operators) that specifies appropriate modifications in the particular domain. To obviate the need for an extensive manual acquisition of adaptation knowledge, new methods are developed to learn the required adaptation knowledge automatically. This work also presents several new approaches to further improve the presented workflow modeling assistance, which comprise the automated completion of missing information in workflows, the adaptation-guided retrieval to identify better adaptable workflows, and the consideration of workflow complexity during the construction process.

All methods were integrated into the workflow management system CAKE, which is developed at the University of Trier. Based on this, CookingCAKE was developed as a prototypical application in the field of cooking, in which workflows represent real cooking recipes. Furthermore, a comprehensive evaluation in the cooking domain demonstrates the feasibility of the workflow modeling assistance. For this purpose, automatically constructed workflows are compared with workflows resulting from search, based on a set of user-generated queries. The evaluation shows that the automatically constructed workflows have a slightly lower quality than workflows manually modeled by humans. However, the queries can be better fulfilled by the automatically constructed workflows compared to the respective workflows resulting from search. Thus, the automatically constructed workflows were preferred and rated with a significantly higher utility by the participants of the study. Overall, this clearly demonstrates the benefit and potential of the developed approach. This work concludes with a discussion on the implications and limitations of the presented workflow modeling assistance and also highlights potential future research directions.
Contents

1 Introduction
  1.1 Motivation
  1.2 Aims of the Thesis
  1.3 Approach
  1.4 EVER Project
  1.5 Outline

2 Foundations
  2.1 Process-Aware Information Systems
  2.2 Business Process Modeling
    2.2.1 Workflow Terminology
    2.2.2 Build-time versus Run-time
    2.2.3 Workflow Representations
    2.2.4 Workflow Categories
    2.2.5 Workflow Quality
    2.2.6 Conclusions
  2.3 Workflow Management
    2.3.1 Workflow Management System Architecture
    2.3.2 Flexible and Adaptive Workflow Management
    2.3.3 Conclusions
  2.4 Case-Based Reasoning
    2.4.1 CBR Cycle
    2.4.2 Knowledge Containers
    2.4.3 Case Representation
    2.4.4 Retrieval
    2.4.5 Adaptation
    2.4.6 Maintenance
    2.4.7 Conclusions
  2.5 Process-Oriented Case-Based Reasoning
    2.5.1 Case Representation
    2.5.2 Retrieval & Adaptation
    2.5.3 Conclusions
  2.6 Related Domains & Related Work
    2.6.1 Application Visions
    2.6.2 Taxonomy of Workflow Modeling Support Approaches
    2.6.3 Related Fields in Case-Based Reasoning
    2.6.4 Conclusions

3 Domain & Notation
  3.1 Business Workflows and Cooking Workflows
  3.2 Workflow Notation
    3.2.1 Workflow Properties
    3.2.2 Partial Workflows
  3.3 Workflow Ontology
  3.4 Semantic Workflows
  3.5 Semantic Workflow Similarity
  3.6 Conclusions

4 Query Language
  4.1 Requirements
  4.2 POQL Queries
  4.3 Query Processing
  4.4 POQL-Lite
  4.5 Query Consistency
  4.6 Potential Extensions
  4.7 Resulting Modeling Support Scenarios
  4.8 Conclusions and Related Work

5 Workflow Adaptation
  5.1 Generalization and Specialization
    5.1.1 Generalized Workflows
    5.1.2 Specialization of Workflows
    5.1.3 Summary
  5.2 Compositional Adaptation by Workflow Streams
    5.2.1 Automated Learning of Adaptation Knowledge
    5.2.2 Application of Adaptation Knowledge
    5.2.3 Additional Application Fields of Workflow Streams
  5.3 Transformational Adaptation by Adaptation Operators
    5.3.1 Representation of Adaptation Operators
    5.3.2 Semantics of Adaptation Operators
    5.3.3 Details of the Operator Application
    5.3.4 Automated Learning of Adaptation Knowledge
    5.3.5 Workflow Adaptation using Adaptation Operators
  5.4 Characteristics of Workflow Adaptation Methods
  5.5 Integrated and Combined Workflow Adaptation
    5.5.1 Learning of Adaptation Knowledge
    5.5.2 Performing Adaptation
  5.6 Conclusions

6 Improving Workflow Modeling Assistance
  6.1 Data-Flow Completion of Workflows
    6.1.1 Complete Workflows
    6.1.2 Workflow Completion Operators
    6.1.3 Learning Workflow Completion Operators
    6.1.4 Workflow Completion
    6.1.5 Evaluation
    6.1.6 Conclusions and Related Work
  6.2 Adaptation-Guided Workflow Retrieval
    6.2.1 Adaptation-Guided Retrieval
    6.2.2 Workflow Adaptability Assessment
    6.2.3 Evaluation
    6.2.4 Conclusions and Related Work
  6.3 Complexity-Aware Workflow Construction
    6.3.1 Process Complexity
    6.3.2 Complexity Measure
    6.3.3 Complexity-Aware Query Fulfillment
    6.3.4 Experimental Evaluation
    6.3.5 Conclusions
  6.4 Summary

7 Workflow Modeling Assistance Architecture
  7.1 Collaborative Agile Knowledge Engine
  7.2 Workflow Modeling Assistance Architecture
  7.3 CookingCAKE: Prototypical Application
  7.4 Interactive Modeling Interface
  7.5 Conclusions

8 Evaluation
  8.1 Experimental Setup
    8.1.1 Utility & Application Scenario
    8.1.2 Evaluation Criteria
    8.1.3 Hypotheses
    8.1.4 Workflow Repository
    8.1.5 Taxonomies
    8.1.6 Real User Queries
    8.1.7 Data Revision & Data Set Summary
    8.1.8 Adaptation Parameters
    8.1.9 Computing Evaluation Samples
  8.2 Evaluation Results
    8.2.1 Analytical Evaluation
    8.2.2 Experimental Evaluation
  8.3 Summary

9 Conclusion
  9.1 Contributions
  9.2 Future Research
List of Figures

1.1 Workflow modeling assistance by PO-CBR
1.2 Information systems research framework
2.1 PAIS lifecycle
2.2 Workflow build-time and workflow run-time
2.3 Sequence pattern
2.4 AND-split and AND-join nodes
2.5 XOR-split and XOR-join nodes
2.6 Cycle pattern
2.7 Example data-flow
2.8 Workflow categories
2.9 Semiotic quality model
2.10 Devil's quadrangle
2.11 Workflow reference model
2.12 Structure of a workflow management system
2.13 Status transitions of workflow instances
2.14 Status transitions of workflow activities
2.15 Types of workflow flexibility
2.16 CBR cycle
2.17 CBR knowledge containers
2.18 Case retrieval
2.19 Adaptation models
2.20 Rule-based adaptation
2.21 Operator-based adaptation
2.22 Compositional adaptation cycle
2.23 Learning adaptation knowledge from case base
2.24 Application and maintenance of CBR systems
2.25 PO-CBR setting for workflows
2.26 Workflow modeling support taxonomy
3.1 Example business workflow
3.2 Example cooking workflow
3.3 Workflow block elements
3.4 Example of a partial workflow
3.5 Example of a data taxonomy
3.6 Workflow fragment with semantic annotations
4.1 POQL query example
4.2 Query fulfillment computation
4.3 Inconsistent query examples
4.4 Example transitive control-flow and data-flow connectedness
5.1 Example generalization/specialization relationships
5.2 Example generalization
5.3 Example specialization scenarios
5.4 Workflow and workflow streams
5.5 Workflow stream extraction for a creator task
5.6 Workflow stream extraction for remaining task
5.7 Example stream
5.8 Substitute stream
5.9 Example stream removal
5.10 Example stream insertion
5.11 Workflow stream abstraction examples
5.12 Example of component-based workflow modeling
5.13 Late-binding by component-based workflow modeling
5.14 Workflow stream chains example
5.15 Example pasta workflow
5.16 Example streamlet
5.17 Example adaptation operator
5.18 Example of head data node removal
5.19 Example of plain streamlet insertion
5.20 Example cleanup after operator application
5.21 Mapping example and resulting adaptation operators
5.22 Operator-based adaptation
5.23 Generation of adaptation knowledge
5.24 Combined workflow adaptation process
6.1 Workflow modeling assistance (abstract illustration)
6.2 Framework of workflow completion
6.3 Examples for incomplete data-flow information
6.4 Example of strict consistency (identical branches)
6.5 Example of strict consistency (identical creator data node)
6.6 Example completion operator
6.7 Example generalized completion operator
6.8 Example mapping between operator and incomplete workflow
6.9 Example result for an operator application
6.10 Default completion rule bases
6.11 Example of default completion rule 1
6.12 Example of default completion rule 2
6.13 Example of default completion rule 3
6.14 Example of default completion rule 4
6.15 Example of default completion rule 5
6.16 Workflow retrieval and workflow adaptability
6.17 Adaptability assessment
6.18 Difference-based query construction
6.19 Evaluation result of adaptation-guided retrieval
7.1 CAKE architecture
7.2 CAKE configuration
7.3 CAKE data types
7.4 Workflow modeling assistance architecture
7.5 CookingCAKE: Query definition (simple mode)
7.6 CookingCAKE: Query definition (expert mode)
7.7 CookingCAKE: Result page
7.8 Interactive workflow modeling interface
8.1 Application environment and utility of a PO-CBR system
8.2 Workflow modeling example based on textual descriptions
8.3 Extract of task and data taxonomies
8.4 POQL-Lite query example
8.5 POQL-Full query example
8.6 Adaptations performed for evaluation
8.7 Success rate of adaptation
8.8 Example of textual workflow representation
8.9 Overall workflow quality ratings in comparison
List of Tables

5.1 Properties of workflow adaptation methods
6.1 Evaluation results of workflow completion
6.2 Adaptability table
6.3 Evaluation results of adaptation-guided retrieval
6.4 Complexity criteria
6.5 Evaluation results of complexity-aware workflow construction
8.1 Average workflow size
8.2 Average query size
8.3 Basic data set
8.4 Adaptation parameters
8.5 Average query fulfillment & computation time
8.6 Structural change of workflow
8.7 Quality criteria
8.8 Quality assessment
8.9 Average utility
8.10 Application preference
1 Introduction

1.1 Motivation

In recent years, workflows have developed into an important paradigm to represent and execute various kinds of processes. From a business perspective, workflows are "the automation of a business process, in whole or part, during which documents, information or tasks are passed from one participant to another for action, according to a set of procedural rules" [266][p. 8]. Broadly speaking, workflows serve as a model to represent sequences of activities as well as information that is shared between those activities in order to execute a certain business process. Workflows representing business processes are applied in e-commerce [123], financial services [214, 196], the service industry [197], and many other application areas.

Beyond this business perspective, workflows recently gained significant importance to represent and execute many kinds of processes in various domains. A well-established application area, for example, is the analysis of large data sets employing so-called scientific workflows [242, 76, 36], including domains such as biology, ecology, physics, and geology [34]. Besides, workflows are also used to support medical treatment processes [176, 52] and provide a basis for a novel programming paradigm: flow-based programming [165] links several small subcomponents with each other in order to construct an entire program. Moreover, workflows are considered to support processes in everyday life [80] and can also be applied to represent information gathering processes [69, p. 90-93] or cooking instructions [222]. In summary, it can be stated that workflows are applied in numerous domains and application areas.

It is commonly agreed that the creation of workflows, also referred to as workflow modeling, is a time-consuming task with a high degree of complexity [253]. Thus, in all the previously mentioned domains, an important key to the successful application of the workflow paradigm is the reuse of those workflows. This implies that once created, workflows are used several times again in the same or similar scenarios. Hence, workflows do not have to be modeled from scratch, but can be based on already modeled workflows,
which significantly reduces workflow modeling time [79]. Furthermore, the quality of modeled workflows is increased by utilizing experience gained by other workflow modelers in previous workflow modeling scenarios [79]. Moreover, based on established and validated workflows, reuse additionally leads to less error-prone workflow models [79]. Workflow reuse is thus of vital importance, as it substantially facilitates workflow modeling and improves the resulting workflow in many respects.

Reuse of workflows is currently supported by various search methods (e.g., [20, 59, 6, 13, 14]). They make it possible to identify the most suitable workflow stored in a workflow repository with regard to given restrictions and requirements. More precisely, a search for the workflow best matching the current situation is executed. This workflow can either be reused directly or requires a manual modification to suit the particular situation. Consequently, workflows do not have to be modeled from scratch, which significantly supports workflow modelers.

However, current trends in workflows and their application areas indicate an increasing demand for individuality and flexibility, which is partially caused by a turbulent market and the force of technological innovation [3]. Flexibility addresses the need to modify or change the workflow on demand during execution and has been addressed by adaptive and flexible workflow management [262, 249, 194, 219]. Individuality, in contrast, implies the requirement to model workflows tailored to the particular needs in the current scenario or situation. The increasing demand for individuality means that the mere reapplication of or search for workflows is no longer sufficient, since available workflows often do not match the current situation. As a result, workflow modelers are frequently required to adapt the provided workflow manually. However, the manual modification of the workflow can be a time-consuming and complex task. If the found workflow consequently differs clearly from the desired workflow, the requirement of manual modification can be a threat to successful reuse. At worst, workflow reuse is circumvented if the creation of a new workflow from scratch is preferred. This then results in a loss of all the previously described benefits of workflow reuse. Consequently, new adaptive methods are required to support workflow modelers during the construction of individual workflows tailored to their particular needs.

This thesis presents an approach to workflow modeling assistance by automated workflow construction. More precisely, in order to support the user, the requirements and restrictions of the particular scenario are captured in a query, and the most suitable workflow from a workflow repository is identified. Subsequently, this workflow is adapted automatically by removing, adding, or replacing workflow elements such that
the workflow better matches the specified query. As a result, a workflow is automatically constructed according to the requirements and restrictions in the particular scenario. This substantially facilitates the elaborate task of workflow modeling and ensures that the workflow modeler can still benefit from workflow reuse, since the automatically created workflow is based on previously modeled workflows. Thus, the described automated workflow construction can be a means to a novel workflow modeling assistance that can cope with the increasing demand for individuality.
1.2 Aims of the Thesis

This thesis aims at developing methods to provide a workflow modeling assistance that facilitates the complex, error-prone, and time-consuming task of workflow modeling [253, 79]. The developed workflow modeling assistance is based on methods that have already been applied successfully in Case-Based Reasoning (CBR). Case-Based Reasoning is a problem-solving methodology that reuses previously gathered experiences [200, 1, 16]. Problems are solved by searching for the most similar situation experienced in the past. Then, the corresponding solution can be reused directly or is adapted automatically to suit the current situation. Process-Oriented Case-Based Reasoning (PO-CBR) [157] applies this Case-Based Reasoning methodology to processes, for example, represented as workflows. This implies that workflows can be suggested based on experience gained in previous workflow modeling scenarios. For this purpose, a search is performed to identify a workflow from a repository of previously modeled workflows that best matches the current scenario. Thus, PO-CBR supports workflow modeling, since workflows do not have to be modeled from scratch, but can be created based on an existing workflow that already (partially) matches the desired solution. By means of a subsequent adaptation process, PO-CBR would further enable the automated construction of workflows tailored to particular needs. However, besides some initial investigations (e.g., [152, 155]), research in PO-CBR has not yet addressed workflow adaptation. This gap in research on workflow adaptation certainly limits the scope and application of current PO-CBR systems¹ with regard to workflow modeling assistance, as they can only provide a search for previously modeled workflows. Therefore, this thesis will particularly investigate the important topic of workflow adaptation in PO-CBR in order to establish an enhanced workflow modeling assistance that may cope with the increasing demand for individual workflow reuse, i.e., that workflows more frequently need to be tailored to the particular needs. Hence, this work addresses a highly relevant problem in numerous domains.

¹ Any CBR system without adaptation capabilities is limited with regard to scope and application [257].
adaptation
retrieval searching
learning adaptation knowledge
workflow repository
workflow
applying adaptation knowledge
adaptation knowledge CBR system
Figure 1.1: Workflow modeling assistance by PO-CBR
The basic idea of the intended workflow modeling assistance by means of PO-CBR is illustrated in Figure 1.1. More precisely, the user first specifies the restrictions and requirements on the desired workflow in a query. Next, for the specified query, the most suitable workflow from a workflow repository is retrieved. In a subsequent adaptation process, workflow elements are modified such that deficiencies of the workflow with regard to the particular query are compensated. For this purpose, adaptation knowledge describing valid transformations of a workflow is employed. As a result, a workflow according to the specified query is automatically constructed, which can significantly support users in the elaborate task of workflow modeling. The main research objective of this thesis is to investigate this sketched idea of workflow modeling assistance by means of PO-CBR by addressing the following partial objectives. Novel Query Language. As this thesis aims at assisting the user in the elaborate task of workflow modeling, an easy to understand query language is required. Additionally, the query language needs to capture the restrictions and requirements on the desired workflow. This query language further must be able to assess the query fulfillment of workflow solutions and possible adaptations in order to guide the retrieval and adaptation process. Novel Adaptation Methods. A workflow W can be transformed into s1 s2 an adapted workflow Wn by chaining various adaptation steps W → W1 → sn . . . → Wn . This process can be considered as a search process towards an optimal solution with regard to the query. For enabling workflow adaptation in PO-CBR, novel adaptation methods have to be developed. The approaches
1.2 Aims of the Thesis
5
should be able to consider the previously defined query language during the transformation of the workflow in order to find the best possible solution. For this purpose, various workflow adaptation methods for PO-CBR will be developed that draw on adaptation principles successfully applied in CBR. Integrated Adaptation. Adaptation methods are usually associated with their respective advantages and disadvantages or restrictions and requirements. Consequently, the competence of the system is affected depending on the particular scenario or domain. Thus, this thesis further aims at integrating and combining all developed adaptation approaches into a single adaptation process in order to compensate the disadvantages of the particular workflow adaptation methods. Thereby, a capable tool for the adaptation of workflows should be constructed. Automated Learning of Adaptation Knowledge. A major threat to the adaptation in CBR, so far not mentioned, is that adaptation knowledge usually has to be defined manually. Such a manual modeling process is expensive with regard to time and complexity. This concerns in particular PO-CBR due to the inherent intricacy of workflows. During the definition of adaptation knowledge, many restrictions and scenarios in the particular domain have to be considered. Moreover, the adaptation knowledge depends on the adaptation algorithm. Hence, an expert in workflow modeling, who is further able to understand the consequences of the adaptation knowledge with regard to the adaptation algorithms in the particular domain, would be required in order to acquire useful adaptation knowledge. This usually leads to an acquisition bottleneck of adaptation knowledge [93], as the manual acquisition of adaptation knowledge is mostly not feasible. As a result, the competence of the CBR system is reduced. The developed adaptation methods should therefore include methods to learn adaptation knowledge automatically. In this thesis, the required adaptation knowledge will be obtained from the workflows stored in the repository (see Figure 1.1). Thus, the applicability of the presented adaptation methods is increased by reusing domain specific-knowledge automatically. This additionally reduces the setup time of the PO-CBR system significantly, as no expert is required to define adaptation knowledge manually. Workflow Completion. The intended workflow modeling assistance by means of PO-CBR needs to be based on mostly complete workflows. However, existing workflow repositories frequently contain incomplete workflows with insufficiently specified information. This may not only lead to inappropriate and incomplete workflows selected during retrieval, but also results in adaptation knowledge that is incomplete in itself. Employing such incomplete workflows may thus significantly affect the entire PO-CBR
6
1 Introduction
system. Hence, methods are required to complete missing information in the stored workflows automatically. By this means, the PO-CBR system can be based on complete workflows, which improves the quality of the workflows constructed by the workflow modeling assistance. Integrating Retrieval and Adaptation. The previous aims address the retrieval and adaptation of workflows by considering adaptation as a postprocessing step after workflow retrieval. However, Smyth and Keane [233] already stated that it is important to consider the adaptability during the retrieval stage. Otherwise, retrieval may provide a workflow that cannot be at best adapted according to the query, resulting in a non-optimal workflow solution. Hence, methods are required to assess the adaptability of workflows, for instance, by performing several example adaptations. Considering this estimated adaptability value during workflow retrieval could ensure that the selection of the workflow is not a limiting factor for the subsequent adaptation. Complexity-Aware Workflow Construction. In certain situations, further requirements on the desired workflow must be considered in addition to the query. In particular, the reduction of complexity is an important criterion, since it increases the understandability of the workflow model (see, e.g., [198, 148]). Thus, less complex workflow models can be highly beneficial, especially for novice workflow modelers. Moreover, a lower complexity facilitates the maintenance and contributes to a reduced error-proneness (see ,e.g, [43, 145]) of the workflow model. Consequently, an approach is required to integrate this additional criterion into the workflow modeling assistance such that workflows with a low complexity can be generated, if required.
1.3 Approach The research approach of this work follows the design science theory in information systems research [99, 185], which “[. . . ] addresses important unsolved problems in unique or innovative ways or solves problems in more effective or efficient ways” [99][p. 81]. According to Hevner et al. design science “[. . . ] creates and evaluates IT artifacts intended to solve identified organizational problems. Such artifacts are represented in a structured form that may vary from software, formal logic, and rigorous mathematics to informal natural language descriptions.” [99][p. 77]. Furthermore, the authors present an information systems research framework (see Fig. 1.2), which describes that the environment of people, organizations and technology defines business needs on information systems
1.3 Approach
7
research. Pursuing information systems research with the goal to address those needs with the developed methods in the given environment ensures the relevance of the research. Thus, artifacts have to be constructed that aim at fulfilling or satisfying business needs. On the other hand, research rigor must be ensured, which can be “derived from the effective use of the knowledge base - theoretical foundations and research methodologies” [99]. Hevner et al. [99] state that information systems research has to build artifacts that are both relevant and rigorous.
Environment People Organizations Technology
Relevance
Business Needs
Rigor
IS Research Develop/Build Assess
Refine
Applicable Knowledge
Knowledge Base Foundations Methodologies
Justify/Evaluate
Application in the appropriate Environment
Additions to the Knowledge Base
Figure 1.2: Information systems research framework by Hevner et al. [99] (simplified)
The business needs can usually be identified by a field or case study or result directly from gaps in the environment. This thesis addresses a highly relevant problem in numerous workflow domains, which directly results from the increasing demand to construct individual workflows tailored to particular needs. The developed artifacts are based on theories and methodologies from Case-Based Reasoning, methods in Artificial Intelligence, and on related work already addressing workflow modeling support. Hence, research rigor and relevance of the presented research is ensured. The developed methods will be integrated into the CAKE framework [19]. CAKE “[. . . ] is a prototypical generic software system for integrated process and knowledge management. CAKE integrates recent research results on agile workflows, process-oriented Case-Based Reasoning, and web technologies into a common platform that can be configured to different application domains and needs.“ [19][p. 1]. To access the utility of the developed artifacts, a comprehensive evaluation in the cooking domain is conducted. In the cooking domain, workflows represent cooking recipes that consist of cooking instructions describing
8
1 Introduction
preparation steps to be executed and ingredients to be used in order to prepare a particular dish. Cooking workflows will not only be used for the evaluation, but also as a running example for demonstrating the presented approaches throughout this thesis. The cooking domain has been chosen due to multiple reasons. A major argument is that repositories of workflows can be acquired easily, as the information is readily accessible through cooking recipes. Moreover, the required background knowledge is easy to obtain (if not already available). This can be very difficult in other domains leading to an increased effort in developing methods and finding experts evaluating the presented approach. Furthermore, in CBR research the cooking domain has been established as a frequently used example domain, which enables to compare the presented approaches. A natural reason is also that the presented methods have been developed within the EVER project, in which the cooking domain has been chosen for similar reasons.
1.4 EVER Project The author was a research associate within the EVER project [22]2 “Extraction and Processing of Procedural Experience Knowledge in Workflows”3 funded by the German Research Foundation DFG4 , which initiated the contributions presented in this thesis. The project was a joint work by the University of Trier, Goethe University Frankfurt, and the University of Marburg. The objective of this research project was to gather procedural experience that is implicitly available on websites, blogs or discussion boards, and to make this knowledge available for others in a structured manner, i.e., workflows. Methods were consequently developed to gather the procedural knowledge automatically from websites in the form of workflows. Based on this, the search for workflows was investigated, which enables the reuse of the gathered procedural knowledge. Also, the reasoning on workflows was addressed such that procedural knowledge for new situations can be provided based on the gathered experience. In this sense, retrieval and adaptation of workflows also played a key role in the EVER project.
2 http://ever.wi2.uni-trier.de 3 Ger. 4 Ger.
“Extraktion und Verarbeitung von prozeduralem Erfahrungswissen in Workflows” “Deutsche Forschungsgemeinschaft”
1.5 Outline
9
1.5 Outline The remainder of this thesis is organized as follows. In the next chapter, the fundamentals for the intended workflow modeling assistance will be explained. This comprises workflow management, workflow modeling, as well as Case-Based Reasoning (CBR). Moreover, PO-CBR is described in more detail. This chapter further sketches some visionary applications and describes various related approaches to workflow modeling assistance in general. Next, Chapter 3 introduces the example domain and formal notations used in the following chapters. More precisely, cooking workflows that represent cooking recipes will be described, which serve as a running example for illustrating the approaches throughout this thesis. Based on this, the notations on workflows are introduced, which are use to formalize and explain the developed approaches. Furthermore, workflow ontologies, semantic workflows, and a semantic workflow similarity measure are described that also represent the foundations for the remaining chapters. In Chapter 4, a novel query language POQL for the retrieval and adaptation of workflows is introduced. It captures the restrictions and requirements of the user and enables to assess how well these are fulfilled. Thus, POQL can guide the entire workflow modeling assistance process. In the next chapter (see Chap. 5), three different adaptation methods for workflows are presented that are able to adapt a workflow with regard to a given POQL query. For all introduced adaptation approaches, methods are presented to obtain the required adaptation knowledge automatically. Further, the characteristics of the different approaches are discussed. These adaptation methods are then integrated and combined into a single adaptation process. Chapter 6 illustrates various approaches to improve the retrieval and adaptation process of the workflow modeling assistance. This comprises the completion of missing information within workflows, an approach to complexity-aware workflow construction, and an integrated method of retrieval and adaptation aiming at regarding the adaptability already during retrieval. In Chapter 7, the CAKE framework developed at the University of Trier is described and technical insights about the implementation of the novel approaches within this framework are provided. Furthermore, a prototypical application called CookingCAKE for demonstrating the workflow modeling assistance is presented, which enables the automated construction of cooking recipes.
10
1 Introduction
Next, Chapter 8 presents a comprehensive evaluation in the cooking domain. The evaluation is based on a comparison of workflows constructed by the presented workflow modeling assistance with workflows resulting from search for a set of user-generated queries. Automatically computed evaluation criteria as well as evaluation criteria assessed by experts are investigated in this evaluation in order to demonstrate the feasibility and usability of the novel workflow modeling assistance. Finally, Chapter 9 summarizes the achievements and discusses potential future research directions.
2 Foundations Triggered by the emergence of Business Process Reengineering in the ’90s, the perspective of software systems shifted from a pure data perspective towards process orientation. This resulted in the development of Process-Aware Information Systems (PAIS), which are systems that deal with the management and execution of processes. One instance of PAIS are workflow management systems (WfMS), which manage and execute processes represented as workflows. While PAIS were traditionally developed for business processes, the application field of workflows nowadays goes beyond this traditional perspective. One of the most essential characteristics in any WfMS is that workflows have to be modeled, prior to execution. This thesis addresses that particular stage. Thus, this chapter explains the foundations for structuring and modeling workflows. It will further be illustrated how the modeled workflows are executed within a workflow management system, since this has to be considered also during the modeling stage. Moreover, workflow quality will be discussed in this chapter. The quality of the modeled workflow is highly important, since it greatly determines the successful execution of the modeled process. Quality dimensions will be described and comprise, for example, the performance of the process with regard to quality of the final product or the understandability of the modeled process. Furthermore, the foundations will also describe the rudiments from a computational perspective. Case-Based Reasoning (CBR) has been successfully employed as a problem-solving paradigm in many application areas and domains. During workflow modeling, the workflow designer has to identify and create a suitable workflow process model for a particular scenario, which is basically such a kind of a problem-solving activity. Consequently, Process-Oriented CBR (PO-CBR) can be a means to support the modeling of workflows. However, only little research exists in the field of PO-CBR so far. Thus, this thesis will illustrate how to transfer successfully applied methods from CBR to PO-CBR in order to support workflow modeling. This section will therefore explain the fundamentals of CBR, in particular focusing on retrieval as well as adaptation methods. For modeling support, the retrieval (search) for already known workflow models is crucial. This helps the user by avoiding the need to model the workflow from scratch. Instead, the user can © Springer Fachmedien Wiesbaden GmbH, part of Springer Nature 2018 G. Müller, Workflow Modeling Assistance by Case-based Reasoning, https://doi.org/10.1007/978-3-658-23559-8_2
12
2 Foundations
reuse the identified model either directly or performs few modifications to adapt the model to match the particular demands. However, as this thesis addresses the creation of individual workflows, only considering search is not sufficient. More useful workflows can be created only if they are adapted automatically towards the desired restrictions and needs of the user. Thus, this chapter also discusses various adaptation methods successfully applied in CBR. These approaches usually base on adaptation knowledge. In order to prevent the extensive manual creation of such adaptation knowledge, this chapter further illustrates how it is possible to learn the required adaptation knowledge automatically. After introducing the foundations of Business Process Management and Case-Based Reasoning, this chapter will describe the basic idea of ProcessOriented Case-Based Reasoning. Based on this, several application visions will be sketched. Next, related approaches to workflow modeling assistance will be presented.
2.1 Process-Aware Information Systems In The Wealth of Nations published in 1776, Adam Smith described the fragmentation of work into specialized tasks as the so-called division of labor, which significantly increased enterprises productivity [230]. Hammer and Champy [87][p.7-30] argued in the early ’90s that task-oriented jobs according to the division of labor became inappropriate to handle new developments on the market. That is because the organization around activities neglect the perspective on the outcoming result. Since then, customer demands could no longer be fulfilled with the mass market as customers call for individual products for their particular needs. Furthermore, globalization has led to an increased competition and the way of doing business has changed through technology. Another reason is that changes in the business environment became ubiquitous and continuous. Specialized work in fragmented processes is not very responsive and flexible to changes on the market. Finally, taskorientation also hampers innovation and creativity in an organization, results in a high overhead, and leads to an absence of customer orientation. In the course of Business Process Reengineering in the early ’90s the process perspective has consequently gained significant importance. According to Hammer and Champy “[. . . ] reengineering is the fundamental rethinking and radical design of business processes to achieve dramatic improvements in critical, contemporary measures of performance, such as cost, quality, service and speed” [87][p. 32]. Thus, business process redesign focuses on
2.1 Process-Aware Information Systems
13
the improvement of processes in order to increase, for example, customer satisfaction, improve the efficiency and quality of business processes, reduce the cost of processes, and cope with new business challenges and opportunities [75]. A shift towards process-orientation means that the work is organized around the business process. Davenport has defined a business process as “simply a structured, measured set of activities designed to produce a specified output for a particular customer or market. [. . . ] A process is thus a specific ordering of work activities across time and space, with a beginning and an end, and clearly defined inputs and outputs: a structure for action.” [55, p.5]. This definition captures the customer-orientation as well as the process-orientation focusing on the particular output. Hammer and Champy [87][p. 47] highlight that information technology enables a radical redesign of business processes whereby the process-orientation facilitates to overcome organizational boundaries improving the performance of business processes. Consequently, business process reengineering has significantly changed the perspective of information systems. From a technological perspective, the early business information systems mainly focused on storing, searching, and displaying of information [64][p. 4-5]. These systems were thus solely driven by data for the purpose of supporting employees in specific tasks following the division of labor. For enterprises, this means that the actual business processes have been neglected in those IT systems. In the worst case, business processes were structured such that they suit the present IT systems. The entire business processes consequently involved multiple IT systems and manual procedures which hampered the efficient execution of business processes. Factors for inefficiencies included manual resource allocation and work routing, no clear separation of responsibilities, possible work overflows, and redundant manual data input [64][p. 4]. Furthermore, due to multiple IT systems, a need for a global view on the operation of information systems emerged. Thus, there was a high demand for so called Process-Aware Information Systems (PAIS) systems following Business Process Reengineering developments towards a process-orientation. A holistic view on process-orientation with regard to the modeling of information systems to support business processes has also been described by the ARIS architecture [217] [219][p. 51]. A PAIS is “a software system that manages and executes operational processes involving people, applications, and/or information sources on the basis of process models” [64][p. 7]. Thus, Process-Aware Information Systems support entire business processes and have been enablers for a radical redesign of business processes. PAIS usually separate process logic from
14
2 Foundations
the applications and facilitate a modeling of processes without a recoding of the system [64][p. 4-5]. This is also reflected by the four phases of the so-called PAIS lifecycle (see Fig. 2.1) [64][p. 11-12]. In the design phase process models are constructed, i.e., processes are modeled. Next, software systems, for example, a workflow management system, have to be implemented and developed which support the execution of these process models (system configuration phase). In the next phase, these process models can be executed, which is also referred to as process enactment. In the final phase, the executed processes are analyzed to identify problems and optimization options in order to revise the design of the processes (diagnosis phase). Thus, the modeling of processes is a main aspect in any PAIS as without process models no system can be designed and no processes can be executed. Prominent examples of PAIS systems are, for example, Enterprise Resource Planning (ERP) systems, Customer Relationship Management (CRM) systems, and Workflow Management systems (WfMS) [209]. diagnosis
Figure 2.1: PAIS lifecycle by Dumas et al. [64][p. 11-12] (phases: process design, system configuration, process enactment, diagnosis)
The shift towards a process-orientation by means of PAIS enabled improved communication between the involved stakeholders, for example, between managers and IT professionals, through the use of explicit process models. These explicit process models further support the management in controlling, evaluating, and improving processes. Moreover, they enable a more global view on and an automatic enactment of business processes. The separation of process models from the PAIS also allows the modification of business processes, which is demanded by continuously changing business environments.
2.2 Business Process Modeling
Business Modeling [129][p. 31] is one of the first phases of business process management and commonly involves different stakeholders [252] [263][p. 11-16]. During this phase, surveys, discussions, and process improvement activities lead to an informal business process description, in which process goals, critical success factors, organization structures, and business objects have to be identified first [129][p. 31-32]. Business Process Modeling is the phase in which this informal description is formalized by constructing a business process model using a particular process model notation. A process model is often used throughout the entire design process, since it is continuously validated and improved until it represents the desired business process. Thus, a comprehensible representation of a process model is obviously required. Various types of PAIS have been created to support different process representations such as event-driven process chains (EPCs) [218], Petri nets [57], or UML [67]. This thesis focuses on processes represented as workflows, which will be introduced in the following.
2.2.1 Workflow Terminology
The workflow terminology is based on the specifications of the Workflow Management Coalition (WfMC, http://www.wfmc.org). The WfMC is a non-profit organization founded in 1993, aiming at the development of common terminologies and standards to enhance the opportunities of workflow technology [266][p. 5]. The WfMC defines workflows as “the automation of a business process, in whole or part, during which documents, information or tasks are passed from one participant to another for action, according to a set of procedural rules” [266][p. 8]. Broadly speaking, a workflow consists of an ordered set of tasks. The WfMC defines these tasks or activities as “a description of a piece of work that forms one logical step within a process. An activity may be a manual activity, which does not support computer automation, or a workflow (automated) activity. A workflow activity requires human and/or machine resource(s) to support process execution; where human resource is required an activity is allocated to a workflow participant.” [266][p. 13] Consequently, these tasks are either manual tasks, which have to be performed by a particular participant, or automated tasks, which are autonomously executed by a certain application. The execution order of these tasks, also referred to as the control-flow, is defined by constructors
[251]. These constructors determine whether certain tasks are executed, for example, sequentially, in parallel, or within loops. In order to construct an entire process, information or data needs to be shared between the workflow’s tasks. The WfMC defines such data as “data that is used by a workflow management system to determine the state transitions of a workflow [. . . ]” [266][p. 45]; the WfMC distinguishes various types of data, and in the following, data refers to so-called workflow-relevant data [266][p. 45]. This data can be modified by applications via automated tasks or by the user performing the particular activity. Furthermore, this data can also define which activity is performed next. Usually, each task requires a determined set of information for its proper execution and modifies this information or produces new data, which can subsequently be consumed by other tasks. The sharing of data or information is commonly denoted as the workflow’s data-flow and is essential for performing single tasks and consequently for the entire workflow execution. Such workflows are also referred to as control-flow oriented workflows (see, e.g., [267, 182, 250]), as the execution order of the activities is explicitly specified via the control-flow. In contrast, data-flow oriented workflows, which are, for example, employed as scientific workflows (e.g., [242, 183]), implicitly derive the execution order of tasks via the data-flow, i.e., a task can be executed when all required input data provided by previous tasks are available. Thus, both types of workflows differ fundamentally. In the following, this thesis focuses on control-flow oriented workflows, which are conventionally applied to represent business processes.
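To make the distinction concrete, the following is a minimal sketch, not taken from the cited literature, of the enablement rule that drives data-flow oriented workflows: a task becomes executable as soon as all of its required input data is available. All task and data names are purely illustrative.

```python
# Minimal sketch of data-flow driven task enablement: a task is
# executable once all of its required input data items are available.

def enabled_tasks(task_inputs, available_data):
    """task_inputs maps each task name to its set of required inputs."""
    return [task for task, inputs in task_inputs.items()
            if inputs <= available_data]

# Illustrative scientific-workflow tasks: "plot" consumes what "align" produces.
task_inputs = {"align": {"raw sequences"}, "plot": {"alignment"}}

print(enabled_tasks(task_inputs, {"raw sequences"}))
# ['align'] -- "plot" must wait until the alignment has been produced
print(enabled_tasks(task_inputs, {"raw sequences", "alignment"}))
# ['align', 'plot']
```

In a control-flow oriented workflow, by contrast, the same ordering would be stated explicitly by a control-flow edge from the first task to the second.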
2.2.2 Build-time versus Run-time
In the PAIS lifecycle, it has already been implied that there are mainly two perspectives on workflows, namely the modeling perspective (build-time) and the enactment perspective (run-time) of a workflow (see Fig. 2.2). In this section, a more detailed view on these perspectives will be given. During the modeling of a workflow, the process designer defines, configures, and verifies the workflow model [195][p. 31 ff]. This stage is referred to as the process design phase in the PAIS lifecycle in Figure 2.1, or the build-time. The WfMC defines a workflow model (referred to as a workflow definition by the WfMC) as “(a) set of one or more linked procedures or activities which collectively realize a business objective or policy goal, normally within the context of an organizational structure defining functional roles and relationships.” [266][p. 10]. The process designer makes use of a process editor and determines tasks as manual or automated activities.
Then, the designer orders these tasks by defining the control-flow and further specifies the data produced and consumed by the tasks by defining the data-flow. This is commonly a complex task for the workflow designer, especially if the workflow modeling language is highly expressive, providing many constructs, or if many activities, data objects, or automated services are involved. The final result is a single workflow model, which represents a generic business process for a certain business scenario.
Figure 2.2: Workflow build-time and workflow run-time, based on [263][p. 88] (a workflow instance is an instance of a workflow model)
The enactment of a workflow causes the instantiation of the previously mentioned workflow model [195][p. 31 ff], which is referred to as the process enactment stage in PAIS. Thus, a workflow model is a template for a set of workflow instances [263][p. 88]. More precisely, a model is instantiated such that it represents the execution of a particular business scenario. The workflow enactment is referred to as the run-time of the workflow by means of workflow instances. The WfMC defines workflow instances as “the representation of a single enactment of a process [. . . ], including its associated data. Each instance represents a separate thread of execution of the process [..], which may be controlled independently and will have its own internal state and externally visible identity, which may be used as a handle, for example, to record or retrieve audit data relating to the individual enactment” [266][p. 15]. Consequently, a workflow instance involves, for example, concrete tasks and data of a particular business process. Furthermore, during enactment, the workflow instance and its activities possess a current execution state in order to manage the workflow execution. This is handled by a so-called workflow engine, which creates workflow instances and interprets and manages them according to the particular workflow model and the current status of the workflow instance. Following the WfMC definition, workflow
instances additionally serve as a documentation of the particular business process. Consequently, the modeling of a workflow (build-time) is usually separated from its enactment (run-time). However, also during the run-time of a workflow, the remodeling of a workflow instance might be required because the process cannot be executed as previously determined (see Sect. 2.3.2). This thesis primarily focuses on the modeling of workflows during build-time, but in the following also presents related topics from workflow enactment in order to cover the entire application field.
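The template relationship between a workflow model and its instances can be made concrete with a small sketch. The following Python fragment is purely illustrative (the class and attribute names are not taken from the WfMC specifications); it merely shows that each enactment yields a separate instance with its own identity and state.

```python
# Minimal sketch: a workflow model as a template for workflow instances.
from dataclasses import dataclass, field
from itertools import count

_instance_ids = count(1)  # externally visible identities

@dataclass
class WorkflowModel:
    name: str
    tasks: list  # control-flow reduced to a plain sequence for brevity

    def instantiate(self):
        """Create a separate thread of execution of this model."""
        return WorkflowInstance(model=self, identity=next(_instance_ids))

@dataclass
class WorkflowInstance:
    model: WorkflowModel
    identity: int
    state: str = "initiated"
    completed: list = field(default_factory=list)  # audit data

model = WorkflowModel("travel expense accounting",
                      ["submit claim", "approve claim", "refund"])
first, second = model.instantiate(), model.instantiate()
first.state = "running"
print(second.state)  # 'initiated' -- instances are controlled independently
```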
2.2.3 Workflow Representations
In this section, the modeling stage of a workflow will be explained in more detail. As soon as an informal business process has been derived from business modeling or business process reengineering, it has to be modeled explicitly in a particular workflow notation (or workflow language). These notations can be based on many different representations defining the constructs that the workflow modeler can use to formally express the business process. Workflows can, for example, be represented by rule-based modeling approaches, which basically define the order of activities by rules [132]. The most common way, however, to represent workflows is the use of a graphical notation. A graphical workflow notation means that workflows are represented as graphs consisting of nodes and edges. While the nodes reflect activities as well as data, edges define the control-flow or data-flow, respectively. Control-flow edges define the ordering of activities by stating which task is executed after another one. Data consumed or produced by these tasks is defined by data-flow edges. Today, many graphical workflow modeling languages have been developed with a varying degree of expressiveness and for different purposes, such as BPMN (Business Process Modeling Notation) [182], YAWL (Yet Another Workflow Language) [250], BPEL (WS-Business Process Execution Language) [181], and XPDL (XML Process Definition Language) of the WfMC [267]. Van der Aalst et al. [251] discussed how to structure the control-flow of a workflow according to so-called workflow patterns. These constructs enable, for example, the execution of activities in parallel or in sequences. Van der Aalst et al. [251] investigated 15 different workflow management systems and showed that common software usually does not support all possible workflow patterns. In the following, the most important workflow
patterns according to van der Aalst et al. [251] are summarized based on the work of Weske [263][p. 126 ff].
Sequence Pattern The most essential pattern, available in any common workflow language, is the sequence pattern (see Fig. 2.3). It defines the order of two control-flow elements by describing that a task or control-flow node A is executed prior to another task or control-flow node B. Thus, B can only be executed after A has terminated.
Figure 2.3: Sequence pattern (task A is followed by task B)
Parallel Patterns Parallel patterns enable that, instead of a single sequence, multiple sequences can be executed in parallel. The parallelization is represented by AND-split and AND-join nodes as exemplified in Figure 2.4. The AND-split denotes that after task A is finished, the tasks B and C can be executed, possibly in parallel. The AND-join synchronizes the parallel branches and requires that the tasks B and C are completed before D is enabled for execution.
Figure 2.4: AND-split and AND-join nodes (parallel pattern)
Exclusive Patterns In contrast to the AND-split/-join, the XOR-split/-join nodes enable that only one of multiple sequences can be executed (see Fig. 2.5). The XOR-split means that after task A is executed, either B or C can be executed. Which one is executed is usually defined via a certain condition. The XOR-join finally synchronizes the alternative branches again. However, it only requires that either B or C must be completed prior to the execution of D.
Figure 2.5: XOR-split and XOR-join nodes (exclusive pattern)
Choice Patterns Choice patterns are similar to exclusive patterns, differing only in the respect that more than one sequence can potentially be executed. Instead of XOR-split/-join nodes, OR-split/-join nodes are simply used (compare Fig. 2.5). Here, the OR-split denotes that after task A is executed, depending on the condition, either B, C, or both can be executed. Again, the OR-join synchronizes the branches, which means that if any branch is completed, D can be executed. Compared to the previous patterns, the choice pattern may lead to significantly more error-prone and less comprehensible workflow models (see Sect. 2.2.5).
Cycle Pattern Arbitrary cycles, also referred to as loops, define the possibility to execute sequences multiple times. Loops, however, are basically composed of only an XOR-join and an XOR-split node (see Fig. 2.6). At the end of the loop, the XOR-split decides, based on a defined condition, whether the sequence is repeated or the loop is terminated. In case of re-execution, the XOR-join enables that the nodes of the loop’s sequence can be executed again.
Figure 2.6: Cycle pattern (a sequence X between an XOR-join and an XOR-split that may loop back)
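All of these control-flow patterns can be expressed over one simple graph representation. The following Python sketch uses an assumed, strongly simplified encoding rather than a standard notation from the cited literature: nodes are tasks or typed connector nodes, and control-flow edges state which node follows which.

```python
# Minimal sketch: a workflow graph with typed control-flow nodes.
# The parallel pattern of Figure 2.4 appears as an AND-split/AND-join subgraph.

workflow = {
    "nodes": {"A": "task", "s": "AND-split", "B": "task",
              "C": "task", "j": "AND-join", "D": "task"},
    # control-flow edges: the target may start once the source has terminated
    "edges": [("A", "s"), ("s", "B"), ("s", "C"),
              ("B", "j"), ("C", "j"), ("j", "D")],
}

def successors(node, wf):
    """Nodes enabled directly after the given node."""
    return [t for s, t in wf["edges"] if s == node]

print(successors("s", workflow))  # ['B', 'C'] -- both branches may run in parallel
# An XOR-split would look identical structurally; the node type tells the
# engine to activate only one outgoing branch, chosen by a condition.
```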
Data Patterns Besides the control-flow, also the data-flow is essential to the workflow paradigm. Usually, workflow data patterns are categorized into data visibility, data interaction, data transfer, and data-based routing [263][p. 98 ff] [208]. Data visibility restricts the access to the data, such that it can only be accessed by a particular task (task data), by a certain sub-process (block
data), by the entire workflow (workflow data), or by the entire business process execution environment (environment data). Data interaction patterns describe the level at which information can be shared, for example, between two tasks of the same workflow, between different workflows, or between the workflow and the business process execution environment. Furthermore, data can be transferred either by call-by-value or by call-by-reference. Finally, data-based routing describes the influence of data on the process execution. In this regard, available data can enable the execution of a particular task or be used to evaluate a condition, for example, in a LOOP or XOR structure, in order to decide which path is going to be executed. Whenever data objects are mentioned in the following, they refer to task data that can be shared between any tasks in the workflow as defined by the particular data-flow.
Figure 2.7: Example data-flow (data objects 1 and 2 connected to task A, which lies between a LOOP-join and a LOOP-split)
The data-flow is usually defined by edges such that edges pointing from a data node to a task node denote input data objects and edges in the reversed direction denote output data objects (see, e.g., [263][p. 230 ff]). An example data-flow is illustrated in Figure 2.7. The dashed lines mark the data-flow edges, which describe that the data objects 1 and 2 are consumed by task A as input data objects. After the execution of A, data object 2 is overwritten by a new value, as denoted by the output data edge. Data can further be used to evaluate conditions of control-flow nodes, as depicted at the loop split node, which defines, based on data object 2, whether the loop sequence is executed again or terminated.
It has been shown that user acceptance of workflow modeling languages usually depends on their expressiveness and ease of use [190], which is a trade-off difficult to fulfill, since the language would need to be tailored to the particular application. Despite its expressiveness (e.g., the BPMN specification
comprises 53 constructs plus attributes), workflow modeling languages like BPMN have found widespread acceptance. However, more than 70% of their users are not professionally trained workflow modelers [190], although modeling skills significantly affect the quality and the overall success of the modeled process. Modeled workflows serve as the basis for the documentation, enactment, or redesign of processes. Patching insufficiently modeled workflows can become a costly undertaking. Consequently, workflow modeling support could be an important means to guide less trained workflow modelers in performing their daily work. This may be useful in different kinds of workflow categories, as sketched in the next section.
2.2.4 Workflow Categories
Workflows are modeled for different types of scenarios or purposes. Leymann and Roller, for example, categorized workflows according to their business value and their repetition rate [129][p. 10-12] (see Fig. 2.8). Workflows with a high business value are those implementing the core competencies of the company. The repetition rate defines how often a workflow model is executed in the same manner (i.e., the number of workflow instances).
Figure 2.8: Workflow categories by Leymann and Roller [129][p. 10-12] (business value versus repetition rate: ad hoc workflows (low/low), administrative workflows (low value, high repetition), collaborative workflows (high value, low repetition), and production workflows (high/high))
• Workflows with a low repetition rate and a low business value are referred to as ad hoc workflows. They are characterized by having no defined procedure. An example is the unregulated and unstructured sharing of information in a company. This information could trigger new actions by the recipients, which may again produce new information that is shared. Thus, the flow of information causes the execution of several activities without following an explicit structure.
• Administrative workflows have a high repetition rate but are characterized by a low business value, since they do not manage the core competencies of the company. In contrast to ad hoc workflows, a clearly defined structure exists due to their repetitive and standardized execution. Travel expense accounting is an example of such an administrative workflow.
• Production workflows have a high business value as well as a high repetition rate. They usually implement the core competencies of the enterprise, for example, claims management processes in insurance companies. These processes need to be executed efficiently in order to gain competitive advantage.
• The fourth class of workflows is denoted as collaborative workflows. Though they have a high business value, their repetition rate is low, as they are created for a particular task. This occurs, for example, whenever products are tailored to the particular needs of the customer. Usually, they are constructed by adapting a formerly applied process. Examples comprise ship construction, the technical documentation for a software product, or the brand management of a customer product.
From a traditional perspective, workflows with a high repetition rate (i.e., production and administrative workflows) are those which should be modeled, since the initial effort is reasonable compared to the benefit. Consequently, the effort for modeling ad hoc workflows is usually disproportionate, as they are executed rarely and further have a low business value. In contrast, collaborative workflows have a high business value and are characterized by being tailored to a particular task. As this occurs ever more frequently, workflow modeling support by automated workflow construction plays a key role. Consequently, all types of workflows where a procedure is explicitly defined (i.e., collaborative, production, and administrative workflows) may significantly benefit from a workflow modeling assistance.
2.2.5 Workflow Quality
Though workflows are characterized by different repetition rates and inherently different business values, the performance of processes strongly depends on the quality of the workflow model. The quality is primarily determined during workflow modeling. Consequently, workflow modeling support by automated workflow construction can only be beneficial if it ensures that the quality is not significantly reduced compared to the additional effort of creating workflow models from scratch. Reijers et al. defined quality as “the totality of features and characteristics of a process model that bear on its ability to satisfy stated or implied needs” [198][p. 173], based on the ISO 9000 guideline and Moody’s quality definitions for conceptual models [164]. While this seems rather abstract, several concrete quality metrics to assess certain aspects of workflows have been presented (e.g., [254, 146, 97]). In general, two main perspectives on workflow quality can be distinguished, namely the workflow as a conceptual model and the workflow as a productive business process. These two perspectives will be explained in more detail below.
Conceptual Workflow Quality Lindland et al. presented a framework for the quality of conceptual models and divided quality into syntactic, semantic, and pragmatic quality [130] (see Fig. 2.9). While the framework of Lindland et al. is a more theoretical approach, Rittgen [204] investigated how these quality dimensions can be measured in practice. An overall framework for these three quality dimensions of workflows has been proposed by the 3QM-Framework [186] (an acronym for quality marks, metrics, measures, and procedures). In the following, the syntactic, semantic, and pragmatic quality of workflows will be discussed in more detail.
Figure 2.9: Semiotic quality model by Lindland et al. [130], taken from [186] (syntactics relates the model to the modeling language, semantics to the real-world excerpt, and pragmatics to the audience interpretation)
Syntactic Quality The syntax describes the vocabulary and grammar that is available to describe the model, i.e., the workflow meta-model including the workflow patterns. Syntax checking ensures the syntactic correctness of workflows. Reijers [198] differentiates between static verification and behavioral verification. A workflow is statically verified with regard to the syntax if only permitted types of elements and permitted connections between those elements are used. Behavioral verification is ensured if an option exists that guarantees the termination of the workflow at any time from any task and that each task can in principle be reached. This is basically also known as the soundness of the workflow [247]. Reijers also highlights the importance of verification during the design of process models, since a range of studies showed that 10-12% of process models from practice lack behavioral verification. According to Reijers [198], a valid syntax can be ensured, for example, during the modeling of the workflow by the so-called correctness-by-construction principle [193, 53], which restricts the use of control-flow edges. Further, also dynamic changes of workflow instances or modifications of workflow models can preserve the syntactic correctness of a workflow [201, 259]. Workflow modeling assistance should prevent syntax errors, as this can be ensured completely automatically and provides a first means for eliminating error-prone workflow models.
Semantic Quality Semantic quality addresses the completeness (“contains all statements about the domain that are correct and relevant to the problem” [130][p. 46]) and the validity (“all statements made by the model are correct and relevant to the problem” [130][p. 46]) of the workflow for the particular process goal, which relates to the particular domain knowledge. If a model is complete and valid, it is consistent, which means that no invalid statement is contained in the model and no statement is missing or superfluous. However, since domain knowledge depends on particular perceptions, semantic quality can usually only be estimated subjectively. Based on the work of Maes et al. [135], Rittgen [204] constructed four indicators for the perceived semantic quality (definitions taken from [204], adapted to workflows):
• Correctness The workflow represents the business process correctly.
• Relevance All workflow elements are relevant for the representation of the business process.
• Completeness The workflow completely represents the business process.
• Authenticity The workflow is a realistic representation of the business process.
Each indicator should be rated on a 7-point Likert scale from “strongly disagree” to “strongly agree”. As an overall measure of the perceived semantic quality, the average value of the four indicators is proposed. Please note that, as Reijers stated, the rating of semantic quality requires comprehending the meaning of modeling constructs, understanding the domain, and further knowing the goal of the process model [198].
Pragmatic Quality Finally, the pragmatic quality measures the comprehensibility of the model with regard to its addressed audience. Comprehensibility relates to the executability, expressive economy, and structuredness of the process model. Reijers [198] explains that workflows can have a low semantic quality (important parts of the real world are neglected) but a high pragmatic quality (as the model is still highly comprehensible for the audience) and vice versa. Mendling et al. [148], for example, introduced seven process modeling guidelines for the creation of comprehensible process models (i.e., with a high pragmatic quality), which were constructed based on empirical insights. From these guidelines, quality impacts, which usually hold, can be extracted as follows (several of them are illustrated by the sketch after this list):
• A process model with a smaller number of workflow elements has a higher pragmatic quality and a reduced error rate.
• A lower number of input and output edges per element is easier to comprehend and leads to a lower error rate.
• Using only one start and one end element increases the understandability and reduces the error rate.
• Workflows that are structured such that each SPLIT matches a JOIN connector of the same type have proven to be better understandable and less error-prone.
• In contrast to an exclusive use of AND and XOR control-flow elements, OR control-flow nodes have a negative impact on the pragmatic quality and the error-proneness.
• The labeling of activities should be in verb-object order for a higher comprehensibility.
• Workflows with more than 50 elements should be decomposed into smaller workflow models (in alignment with the first listed quality impact).
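Some of these guidelines are directly measurable. The following is a minimal sketch, using the simple graph encoding assumed in Section 2.2.3, of how a modeling assistance might check a few of them automatically; the function name and thresholds are illustrative, not taken from [148].

```python
# Minimal sketch: automatic checks for a few of the measurable guidelines.

def guideline_report(wf, max_elements=50):
    nodes, edges = wf["nodes"], wf["edges"]
    degree = {n: 0 for n in nodes}  # input plus output edges per element
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    starts = [n for n in nodes if all(t != n for _, t in edges)]
    ends = [n for n in nodes if all(s != n for s, _ in edges)]
    return {
        "at_most_50_elements": len(nodes) <= max_elements,
        "single_start_and_end": len(starts) == 1 and len(ends) == 1,
        "max_edges_per_element": max(degree.values()),
        "avoids_or_connectors": not any(kind.startswith("OR-")
                                        for kind in nodes.values()),
    }

wf = {"nodes": {"A": "task", "B": "task"}, "edges": [("A", "B")]}
print(guideline_report(wf))
# {'at_most_50_elements': True, 'single_start_and_end': True,
#  'max_edges_per_element': 1, 'avoids_or_connectors': True}
```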
Finally, also naming conventions regarding the labels of workflow elements are an important factor for the pragmatic workflow quality, as stated by Becker et al. [12]. They ease the comparison of different workflow models and ensure a common understanding of the intended activity or data object represented by the particular workflow element.
Workflow Performance Quality The presented quality types refer to workflows as conceptual models. Since workflows are executable models, their performance can also be measured from various perspectives. The performance perspective on workflow quality includes aspects such as time, resource allocation (e.g., human resources), and the quality of the resulting product. Rittgen [204] introduces the perceived usefulness of the models, which depends on the concrete business scenario. Jansen-Vullers et al. present a framework to measure the performance of a workflow [105] based on the devil’s quadrangle introduced by Brand and van der Kolk [37] (a publication to which the author had no access). The devil’s quadrangle separates the performance of a workflow into four dimensions (see Fig. 2.10): time, cost, flexibility, and quality, in the following denoted as performance quality for distinction purposes. For the sake of simplicity, the different categories of the particular dimensions are shortened in this section; please see [105] for more details.
Figure 2.10: Devil’s quadrangle by [37], based on [105] (dimensions: quality, cost, time, flexibility)
The time dimension is a fundamental performance measure, which is reflected by a broad range of available measures. Several categories of time can be considered; for example, the time to handle the entire process is referred to as the lead time. The throughput time measures the time between the completion of one task and the completion of another one. Consequently, the throughput time also includes the time between the completion of a task and the start of the subsequent task, for instance, due to being queued in the worklist (referred to as queue time). The cost dimension, for example, comprises running costs such as human resources or machinery as well as transportation costs, administrative costs, and the costs for utilizing materials. Please note that the cost dimension is affected by the other performance dimensions. A more costly workflow may, for example, result from long lead times or from a low-quality workflow that consequently requires more elaborate rework. The next dimension covers the flexibility of the process, which may address individual resources or tasks as well as the entire workflow. This dimension not only includes the ability to modify the process but also the ability to pursue alternative routes. Performance quality can be categorized into internal performance quality and external performance quality. While the latter describes the quality of the process from the customer perspective, the internal performance quality refers to the employee perspective. External performance quality comprises the quality of the process for the customer, measuring the provided service quality during the process and the quality of the final output. Furthermore, internal quality is essential as it may result in “high motivation, high job satisfaction, high external quality, and low absenteeism” [105]. This dimension depends on the autonomy of the employee, task variety, and task significance.
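As a small illustration of the time measures just introduced, the following sketch computes lead, throughput, and queue times from task timestamps; the task names and numbers are invented for illustration only.

```python
# Minimal sketch: lead, throughput, and queue time from task timestamps
# (minutes since the process started; all values are hypothetical).

tasks = [  # (task name, start, completion)
    ("register claim", 0, 10),
    ("assess claim", 25, 55),  # waited 15 minutes in the worklist
    ("pay out", 60, 70),
]

lead_time = tasks[-1][2] - tasks[0][1]  # handling the entire process
queue_times = [start2 - end1
               for (_, _, end1), (_, start2, _) in zip(tasks, tasks[1:])]
throughput_times = [end2 - end1  # completion of one task to completion of the next
                    for (_, _, end1), (_, _, end2) in zip(tasks, tasks[1:])]

print(lead_time)         # 70
print(queue_times)       # [15, 5]
print(throughput_times)  # [45, 15]
```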
2.2.6 Conclusions
This section introduced the workflow terminology. The basic workflow patterns to construct workflow models were illustrated. It was shown that workflow modeling is a complex and time-consuming task, which is usually not performed by professionally trained workflow modelers. Thus, workflow modeling assistance can be highly beneficial. Besides the construction of the workflow model during build-time, also the enactment of workflow instances during run-time was described. More details on workflow instances will be presented in the next section. Furthermore, it was shown that workflows with a low repetition rate (few workflow instances) can also be of high business value. For these workflows,
the modeling assistance is highly beneficial because the initially large modeling effort can be reduced and models can be more easily tailored to the particular scenario. Finally, quality dimensions of workflows have been discussed, which are essential criteria to be considered during the modeling of workflows and thus also by the developed workflow modeling assistance.
2.3 Workflow Management
The management and execution of workflows is usually facilitated by so-called Workflow Management Systems (WfMS). Early Process-Aware Information Systems, such as Customer Relationship Management (CRM) and Enterprise Resource Planning (ERP) systems, were unable to capture overall business processes, which usually incorporate multiple (PAIS) applications [263][p. 32 ff]. The holistic view on business processes in the course of business process reengineering, however, requires a more interlaced handling of processes in the IT systems. Furthermore, as business-to-business processes construct a corporate value chain between various stakeholders (e.g., the buyer, the manufacturer, and the reseller company), they demand a collaboration among the business partners’ information systems. These considerations ultimately resulted in workflow management systems as a novel type of PAIS. The Workflow Management Coalition defines a workflow management system as a “system that defines, creates and manages the execution of workflows through the use of software, running on one or more workflow engines, which is able to interpret the process definition, interact with workflow participants and, where required, invoke the use of IT tools and applications” [266][p. 9]. Thus, a workflow management system is able to interpret process definitions, to manage the execution of workflow instances, and to organize the activities of the involved workflow participants or applications. The term Business Process Management is also commonly used in the context of workflow management and broadens the scope of workflow management systems with extended analysis and management support [248]. Thus, according to Weske, “business process management includes concepts, methods, and techniques to support the design, administration, configuration, enactment, and analysis of business processes” [263][p. 5]. In the following, the architecture of workflow management systems to model and execute workflows will be introduced. Moreover, it will be illustrated how current workflow management systems cope with changing
environments and an increased demand for workflows tailored to the particular needs.
2.3.1 Workflow Management System Architecture
The WfMC published a well-known reference model [265] describing a generic architecture of workflow management systems by identifying their basic interfaces (see Fig. 2.11) [266][p. 23].
Figure 2.11: Workflow reference model [266][p. 23] (a workflow enactment service with one or more workflow engines, connected via interfaces 1-5 to process definition tools, workflow client applications, invoked applications, other workflow enactment services, and administration and monitoring tools)
These interfaces define the interaction of the workflow management system with its environment [266][p. 23]. The core of a workflow management system is the workflow enactment service, “[. . . ] that may consist of one or more workflow engines in order to create, manage and execute particular workflow instances” [266][p. 59]. A workflow engine “[. . . ] provides the run time execution environment for a process instance” [266][p. 57] based on a particular process model [266][p. 57]. More precisely, the engine interprets the process model, creates a particular instance, and manages its execution by organizing the handling of the involved activities. The interfaces to the workflow management system’s environment are arranged around this enactment service. A workflow management system can import and export process models that have been constructed in a process modeling tool, which can then be enacted by a workflow engine. During run-time, applications are invoked or participants are involved in the workflow execution process via workflow client applications. Further, a workflow management system may interoperate with other workflow engines such that process execution is coordinated between those engines [266][p. 58]. Finally, a workflow
management system usually provides functions for monitoring workflow events during workflow execution. This enables supervising the performance of the process [266][p. 56] and administrating user data, process models, or the assignment of activities [266][p. 61].
Figure 2.12: Structure of a workflow management system, based on [129][p. 62] and [266] (build-time: a workflow model editor and a workflow model repository holding workflow models/workflow definitions; run-time: a workflow engine enacting workflow instances and interacting with worklists and applications)
While the workflow reference model is rather abstract, another view on the architecture of workflow management systems is provided here, which focuses on the build-time and run-time perspective (see Fig. 2.12). During build-time, the process designer defines, configures, and verifies workflow models [195][p. 31]. For this purpose, the process designer makes use of a process editor, which also contains repositories of application services that can be included in the workflow. Further, the user may access previously created workflow models through the workflow repository, which could serve as guidelines for the new workflow model. The final workflow design is then stored within the workflow model repository. By enacting a workflow, a workflow instance is created and executed by the workflow engine. For the communication between the particular workflow engine and the user, a “list of work items associated with a given workflow participant [. . . ]” [266][p. 20] is provided, which is also referred to as a worklist. This worklist is managed
and created by the worklist handler, which requests work items or activities from the workflow engine in order to enable the execution of manual activities [266][p. 20-21]. Each employee has a separate worklist containing a set of assigned work items. The handler enables the assignment of a work item to a participant such that the user can perform the assigned task, which may also be performed within a provided tool of the workflow management system or via a particular client application. For the assignment, information about the organizational structure, skills, and competences is usually accessed in order to assign tasks to a qualified participant, which also ensures that responsibilities are clearly defined. Further, the handler may also manage the reassignment of work items to other participants and notifies the workflow management system if a work item has been completed. Moreover, during enactment, applications may be invoked, which automatically execute particular activities or work items [266][p. 41]. Workflows can consequently be modeled involving manual tasks as well as applications and automated tasks without recoding the workflow management system, which is a characteristic property of a PAIS [64][p. 4-5].
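The assignment step can be made concrete with a small sketch. The following Python fragment is hypothetical (the participant data, role names, and function are invented and not part of the WfMC reference model); it shows a worklist handler allocating a work item to a qualified participant.

```python
# Minimal sketch: a worklist handler assigning a work item to the first
# participant with a matching role and skill (all data is illustrative).

participants = {
    "alice": {"role": "clerk", "skills": {"accounting"}},
    "bob": {"role": "manager", "skills": {"approval"}},
}
worklists = {name: [] for name in participants}

def assign(work_item, required_role, required_skill):
    for name, profile in participants.items():
        if profile["role"] == required_role and required_skill in profile["skills"]:
            worklists[name].append(work_item)  # item appears in the user's worklist
            return name
    raise LookupError("no qualified participant available")

assign("check travel expenses", required_role="clerk", required_skill="accounting")
print(worklists["alice"])  # ['check travel expenses']
```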
Figure 2.13: Status transitions of workflow instances by the WfMC [265][p. 23] (states: initiated, running, active (one or more activity instances), suspended, completed, terminated)
During run-time, the workflow enactment service manages the various states of its particular workflow instances and their corresponding activities [265][p. 23-25]. A state change is triggered by external events, such as the completion of a work item, or by control-flow related decisions of the workflow engine, such as selecting the next step within a process depending on conditions. With its creation, a process instance is “initiated” but not yet started (see Fig. 2.13). As soon as the execution of the process has been triggered, the instance is “running”. If work items have been created and assigned to the worklist or to applications, the workflow instance is considered “active”. The process can also be “suspended”, meaning that currently no activities are started until the process instance resumes to the
running state. Finally, the instance has either been successfully “completed”, or a “termination” has been explicitly invoked, for example, by a manual termination of the process or due to an unexpected error.
Figure 2.14: Status transitions of workflow activities by the WfMC [265][p. 24] (states: inactive, active (has work item), suspended, completed)
The activities likewise transit through various states (see Fig. 2.14). Initially, they are considered inactive until the starting conditions of an activity are met [265][p. 23-25]. This causes the creation of a corresponding work item. Then, the activity is considered active until it has been completed. Further, the activity can be suspended similarly to the workflow instance, which means that no work item is allocated until the activity resumes to the inactive state (waiting for activation).
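Such state transitions can be encoded compactly as an explicit transition table. The following is a minimal sketch reflecting one possible reading of Figure 2.13; the event names are illustrative and not WfMC-normative.

```python
# Minimal sketch: the instance states of Figure 2.13 as a transition table.

INSTANCE_TRANSITIONS = {
    "initiated": {"start": "running"},
    "running": {"activate": "active", "suspend": "suspended",
                "complete": "completed", "terminate": "terminated"},
    "active": {"suspend": "suspended", "complete": "completed",
               "terminate": "terminated"},
    "suspended": {"resume": "running", "terminate": "terminated"},
}

def fire(state, event):
    """Return the follow-up state, rejecting transitions the table forbids."""
    try:
        return INSTANCE_TRANSITIONS[state][event]
    except KeyError:
        raise ValueError(f"event '{event}' is not allowed in state '{state}'")

state = "initiated"
for event in ("start", "activate", "suspend", "resume", "complete"):
    state = fire(state, event)
print(state)  # completed
```

The activity states of Figure 2.14 can be encoded analogously.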
2.3.2 Flexible and Adaptive Workflow Management
In recent years, the increased variety of products and services as well as increasing time-to-market pressure and business-on-demand have changed the way of doing business significantly [260]. Business agility became an important enabler of competitive advantages, which can only be reached by an efficient and repeated adaptation of the business processes in order to suit changing environmental circumstances. Consequently, the ability to change processes within PAIS became a critical success factor. In order to react to changing environments, adaptive workflow management systems have been developed [219][p. 65 ff]. An adaptive workflow management system is characterized by the ability to change workflow instances to suit altered environments. When changes of processes are discussed, literature frequently employs the terms flexibility and adaptation (sometimes also the term agility occurs in literature (e.g., [159]), which is usually enabled by ad-hoc adaptations; see below). Thus, in the following, these terms will be introduced and defined, showing that adaptation can mostly be seen as a special kind of flexibility.
Definition 1. Flexibility is the ability of process instances to encompass changes of the environment in which they operate (based on the work of Schonenberg et al. [221]).
Thus, flexibility mainly refers to the ability of process instances to react to changes in the environment, which can basically be achieved by five principles, namely by design, notation, deviation, change, or underspecification (see Fig. 2.15; based on [221], extended by flexibility by notation).
Figure 2.15: Types of flexibility, based on [221] (flexibility by notation, design, deviation, underspecification, and change)
• Flexibility by notation. The notation of the workflow primarily determines the flexibility of its process instances. The introduced workflow notation is a so-called imperative workflow notation, which exactly states how work has to be done in a previously defined order [187]. Its flexibility depends on the available constructs; for example, if no XOR-constructs are available, alternative execution paths cannot be defined, reducing its flexibility. Another approach is the use of so-called declarative workflows, which describe what has to be done in order to achieve the entire process goal [187]. This means that there is usually a certain degree of freedom regarding the order of the tasks, considering defined constraints. Thus, declarative workflows provide a certain degree of inherent flexibility. This thesis focuses on imperative workflows, which were already introduced in the previous sections.
• Flexibility by design means that during the modeling of the workflow, several alternative ways of execution are considered, for example, by the use of XOR-elements. During the run-time of the workflow, the appropriate path of the workflow can be chosen.
• Flexibility by deviation refers to the ability of a process instance to deviate from the prescribed execution path of the process model [221]. Thus, the order of activities is dynamically changed, if required. In imperative workflow notations, this can be achieved by introducing operators enabling
deviation. In declarative workflows, deviation is managed as soon as constraints are violated (e.g., [85]).
• Flexibility by underspecification. During workflow modeling, certain parts can be underspecified such that during run-time the incomplete process specification of the instance can be extended by appropriate information [221] [219][p. 70-71]. It can be distinguished between late modeling and late binding, both providing flexibility by underspecification. Late modeling is the construction of workflow models in such a manner that placeholders mark underspecified parts within the workflow model, which are concretized during run-time by a user modeling the missing workflow fragments. Late binding is a special kind of late modeling, in which a placeholder construct used during workflow modeling is linked to various fragments that can replace this placeholder during run-time.
• Flexibility by change. The previously introduced flexibility types aim at providing flexible execution ways of a workflow instance by defining a workflow model during build-time in such a manner that it legitimizes corresponding run-time flexibility of its particular workflow instances. In cases where this is not possible, the process instance or process model has to be explicitly modified, which is referred to as flexibility by change. It can basically be distinguished between a so-called momentary change, referring to the modification of a particular workflow instance, and an evolutionary change, where the workflow model is modified such that newly instantiated processes comprise these changes.
Adaptation in general aims at explicitly modifying a process model or process instance to cope with a changed environmental situation of the generic process or the particular business situation, respectively. Hence, adaptation can mostly be considered as flexibility by change.
Definition 2. Adaptation is the explicit restructuring of process models or process instances in response to changes of the environment.
If adaptations are performed on workflow instances, they are referred to as so-called ad-hoc adaptations (or momentary change). Such ad-hoc adaptations [219][p. 70] enable an agile workflow management (e.g., [159, 261]) and are required during run-time whenever the process cannot be executed as originally defined. Modifications of the workflow model occur in the course of an evaluation of the process to cope with new circumstances or due to process improvement (evolutionary change), which is also referred
to as schema evolution [219][p. 68-70]. Transferring these modifications to particular workflow instances is denoted as schema migration. Thus, running workflow instances can incorporate these new adaptations without the need for restarting the entire process. Various research has investigated schema evolution (summarized by Rinderle et al. [202]); in the most straightforward case, migration of an instance is possible (the instance is compliant with the new workflow model), for example, if the current trace of the workflow instance is included in the adapted workflow model. Considering workflow instances, adaptation is required whenever the proper execution of a workflow is prevented by the occurrence of unexpected exceptions [66]. In contrast, expected exceptions usually occur frequently as an alternative execution variant and can, for example, already be considered during the creation of the workflow model, thus making use of the previously introduced flexibility methods. Ad-hoc adaptations [219][p. 70] enable the dynamic modification to cope with changed circumstances that only affect one particular workflow instance. This can, for example, be achieved by a breakpoint construct, which is dynamically inserted into the control-flow of a workflow instance [158, 219]. Subsequent tasks will then be suspended. This enables a modification of the control-flow of the not yet executed activities. After the modification, the breakpoint can be removed again, which triggers the initialization of the subsequent control-flow. Thereby, the workflow can be changed dynamically during run-time. Adaptations of processes on a low abstraction level, i.e., modifications of nodes and edges of the workflow graph, require high expertise and increase the error-proneness of the resulting workflows [260]. Thus, changes on workflow models should be performed at a higher level of abstraction, ensuring the syntactic quality of the process. Weber et al. introduce so-called adaptation patterns that enable modifying a workflow model or workflow instance on the control-flow level with high-level operations instead of low-level change operations such as removing or adding a node or edge. In graph theory, such modifications are considered as graph transformation problems [95] or graph rewriting [61]. The application of these adaptation patterns may further ensure the syntactic correctness of the workflow model, if certain preconditions hold. According to Weber et al., this is fundamental if changes are applied by end users or performed automatically (as presented in this thesis). Of the 14 presented change patterns, the most essential ones will be summarized below. The most basic patterns are the insertion and deletion patterns, which add or remove an entire process fragment. Please note that already the insertion or removal of a single activity requires inserting/removing control-flow edges, which are conjunctly executed in order to ensure the syntactic
correctness of the entire operation. Weber et al. also introduce patterns to move a process fragment to another location, to replace a process fragment by another one, to swap the locations of two process fragments, or to copy a process fragment to an additional location. These can basically be implemented by combining insertion and deletion pattern(s). Furthermore, the authors introduce the extract sub-process pattern, which describes that an existing process fragment can be extracted from a process schema and replaced by a sub-process. This sub-process can be inserted into the workflow model again by the inline sub-process pattern. Further adaptation patterns comprise adding and removing control-flow edges and control-flow elements such as AND, XOR, and LOOP as well as their conditions. The authors also highlight that, besides the discussed correctness criteria, flexible support requires including schema evolution, version control, and analysis support for the changes, supporting the handling of concurrent changes, and regarding that changes might be restricted to certain users. Further, they highlight that change reuse is very important, since the definition of changes from scratch requires high user skills. In this regard, workflow modeling assistance can be beneficial. A minimal sketch of such a high-level change operation is given below.
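The following Python fragment is an assumed, strongly simplified rendering of a serial insertion operation (it is not Weber et al.'s formalization): the high-level operation rewires the affected control-flow edges conjunctly, so the model cannot be left in a syntactically broken intermediate state.

```python
# Minimal sketch: a serial "insert task between two nodes" operation that
# performs all low-level edge changes together.

def insert_serial(wf, new_task, after, before):
    if (after, before) not in wf["edges"]:
        raise ValueError("the two nodes are not directly connected")
    wf["nodes"][new_task] = "task"
    wf["edges"].remove((after, before))                     # drop the old edge ...
    wf["edges"] += [(after, new_task), (new_task, before)]  # ... and rewire it

wf = {"nodes": {"A": "task", "B": "task"}, "edges": [("A", "B")]}
insert_serial(wf, "check invoice", after="A", before="B")
print(wf["edges"])  # [('A', 'check invoice'), ('check invoice', 'B')]
```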
2.3.3 Conclusions
In this section, the historic origins of workflow management systems were expounded. Next, a detailed view on the basic architecture of workflow management systems was given. This showed that workflow modeling is an essential part of any workflow management system. In this regard, workflow designers could significantly benefit from automated support for the elaborate task of workflow modeling. Next, various methods for adaptive and flexible workflow management systems were presented, which aim at ensuring the appropriate execution of processes despite changing environments. Moreover, adaptation was classified within these flexibility approaches and discussed in more detail. Processes within workflow management systems increasingly often have to be adapted to suit individual situations and changed requirements. This results in a high demand for workflow modeling support by automated workflow construction, since it is considerably less complex and time-consuming than modeling new workflows from scratch.
2.4 Case-Based Reasoning
Case-Based Reasoning (CBR) is a discipline in the field of Artificial Intelligence whose origins can be found in cognitive science, machine learning, and knowledge-based systems [18]. Cognitive science investigates the human way of processing information [117, 227, 1]. This kind of processing involves remembering, understanding, experiencing, and learning, which cannot be separated from each other ([117] based on [216, 215]). From a technological point of view, this has been addressed on a large scale by machine learning, which deals with improving the performance of software by experiencing and learning [160][p. 17]. This naturally also requires remembering and understanding information. According to Russell and Norvig [210], the first thoughts on machine learning were introduced by Alan Turing in his 1950 paper “Computing Machinery and Intelligence” [210, 246]. Case-Based Reasoning is a machine learning approach which resembles human problem-solving behavior [1, 200]. Whenever humans solve problems, they access experience made in the past. Hence, they search for a similar problem situation and investigate whether the past solution can be applied or partially used to solve the current problem. Thus, solutions are derived by transferring old experience to new problem situations [40] (in computer science also known as analogical problem solving). In CBR terminology, experience that has been acquired during solving a particular problem is captured as a so-called case [200, 16]. A case is a pair consisting of a problem and a corresponding successfully applied solution. All made experiences, i.e., all cases, are stored within the case base, which serves as the foundation for solving future arising problems. In order to solve new problems, Case-Based Reasoning relies on the fundamental assumption that “similar problems tend to have similar solutions” [117, 1, 200], which is also naturally utilized in human problem-solving behavior [117][p. 138-139]. Thus, for solving a problem, previously made experiences need to be accessed. The most similar problem within the case base is consequently searched under the assumption that it comprises an appropriate solution. The found solution can then be reused and applied to solve the problem at hand. As in human problem-solving, solutions may require modifications to suit the current problem, which are referred to as adaptations in CBR. Although adaptation may be required, parts of the solution can be reused, which is beneficial because otherwise a solution would have to be constructed from scratch without the ability to access previously made experiences. As an example, a
car mechanic may reuse the repair process of a broken backlight directly for a car produced by the same vehicle manufacturer. For other cars, the solution might require minor adaptations. With no experience, however, this can become an elaborate task (example based on the work of Bergmann [17][p. 11-12 ff]). After applying the solution, new experience has been gained, i.e., a solution for a new problem has been learned. This experience can again be stored as a case within the case base, improving the competence in future scenarios. The first CBR applications were introduced in the early 1980s [258][p. 18]. Today, Case-Based Reasoning is successfully applied in many industries such as manufacturing, electronics, transportation, banking, insurance, and health care, to name but a few [17, p. 14-15]. In this regard, CBR applications have become numerous and can, for example, be found in help desk support, sales support (w.r.t. product recommendation, cost estimation, or market analysis), or troubleshooting. Enterprises can benefit from the CBR methodology, since it can lead to “smarter business decisions (that) can be made in less time and/or lower cost” [17, p. 9]. This is mainly because CBR provides Knowledge Management techniques (see [17]). More precisely, CBR enables the storage and reuse of knowledge, which can significantly increase the effectiveness and efficiency of employees by supporting them with required data and information.
2.4.1 CBR Cycle
The previously introduced reasoning process in CBR is now illustrated in more detail with reference to the CBR cycle (see Fig. 2.16) presented by Aamodt and Plaza [1]. The CBR cycle consists of the stages REtrieve, REuse, REvise, and REtain. These “four REs” are indicated as follows:
• Whenever a new problem arises, a search for the case with the most similar problem is executed (retrieve stage; some CBR approaches retrieve multiple cases). This requires the definition of a similarity measure between the new problem and the problems within the stored case base.
• Next, the solution of the retrieved case is applied to the current problem (reuse stage). Depending on its applicability, the solution is either directly applied or modified to adapt the solution to the new problem. This modification is executed automatically and referred to as the adaptation of a case.
• In the revise stage, the solution to the new problem is verified. This task is usually performed manually. This means that the quality of the solution for the problem scenario is assessed, and the solution is revised in case it is dissatisfying.
• The retain stage is the basic learning process of the CBR cycle. Generally, the newly made experience, i.e., the new case, is stored. The new and revised problem-solution pair increases the competence of the system for further problem scenarios. In addition to the new case, failure information gathered from the revise stage could additionally be included. Hence, similar to a human, the system could also learn from mistakes, avoiding the repetition of similar mistakes in similar problem scenarios.
New Case
TR
RETA
E
IN
RE
IE V
Learned Case
Confirmed Solution
Tested/ Repaired Case
Retrieved Case
Previous Cases
General Knowledge
V
SE
RE
IS E
Solved Case
Suggested Solution
Figure 2.16: CBR cycle by Aamodt & Plaza [1]
RE
U
New Case
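To make the interplay of the four stages concrete, the following minimal Python sketch outlines one pass through the cycle. All names (Case, cbr_cycle, and the passed-in sim, adapt, and revise functions) are illustrative assumptions and are not taken from any particular CBR framework.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Case:
    problem: dict   # attribute-value description of the problem
    solution: dict  # attribute-value description of the applied solution

def cbr_cycle(query: dict, case_base: list,
              sim: Callable[[dict, dict], float],
              adapt: Callable[[Case, dict], Case],
              revise: Callable[[Case], Case]) -> Case:
    """One pass through the four REs for a single query problem."""
    # REtrieve: find the case whose problem is most similar to the query.
    retrieved = max(case_base, key=lambda c: sim(query, c.problem))
    # REuse: apply the solution directly or adapt it to the new problem.
    if sim(query, retrieved.problem) == 1.0:
        suggested = retrieved
    else:
        suggested = adapt(retrieved, query)
    # REvise: verify (typically manually) and repair the suggested solution.
    confirmed = revise(suggested)
    # REtain: store the confirmed problem-solution pair as a new case.
    learned = Case(problem=query, solution=confirmed.solution)
    case_base.append(learned)
    return learned
```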
2.4.2 Knowledge Containers

Another perspective on CBR has been introduced by the so-called knowledge containers by Richter [199] [200, p. 34-37] (see Fig. 2.17). CBR is a knowledge-based system and thus relies heavily on knowledge. The containers reflect that various types of knowledge are accessed during the different stages of problem solving. The knowledge containers classify knowledge into vocabulary, similarity, case base, and adaptation knowledge. The construction of these knowledge containers is essential for building a CBR system [16, p. 50-51, p. 94].
Figure 2.17: CBR knowledge containers by Richter [199][200, p. 34-37]
• The vocabulary defines the possible use of terms and the structure of experience. Hence, the vocabulary container "[. . . ] explicitly describe(s), how knowledge elements can be used" [200, p. 35]. Thus, it specifies a domain ontology capturing information on the application domain. More precisely, the vocabulary determines the problem space as well as the solution space of the cases. A common structure for representing cases is an attribute-value definition. This means that the problem description part as well as the solution part are described by a fixed set of attributes. Each of these attributes is specified by values from a certain range, for example, boolean values, numerical values, strings, or sets of values. The attributes and the corresponding value ranges define the vocabulary container, which means that cases can be represented only by the defined attributes and their corresponding value ranges [16, p. 61-62]. Thus, the vocabulary basically defines the case representation.

• Based on the vocabulary, a case is then represented as a pair of problem and solution. The case base stores all cases and is a main source of knowledge in CBR systems. Usually, an initial case base is constructed during the development of the CBR application. Further cases commonly derive from the retain stage after a solution for a new problem has been found.
• The similarity container comprises similarity measures which assess the similarity between two problems. This enables the identification of the most similar problem(s) with regard to the current problem (see Fig. 2.18). Naturally, a solution is searched for that provides a high utility for the new problem. Thus, the defined similarity measure usually approximates the utility of the retrieved solutions with respect to the defined problem; for instance, it is assumed that the higher the similarity between the defined problem and the case problem, the higher the case's (solution) utility.

• After the most similar problem(s) have been retrieved, the solutions are reused to solve the current problem. Whenever a modification of a solution is required in order to be applicable to the current problem, the adaptation container is used. This container comprises adaptation knowledge to fulfill this task, for example, in the form of a set of rules and adaptation algorithms. The definition of this container is important, as "without adaptation, Case-Based Reasoning (CBR) systems are restricted both in scope and application" [257]. The goal of adaptation is to modify the solution such that it matches the current problem.

Knowledge can be "shifted" between these knowledge containers [200, p. 279] while still being able to solve the same problems. This means that insufficient knowledge in one container can be compensated by more knowledge in another container. For example, an insufficiently filled similarity container or a smaller case base could be compensated by more adaptation knowledge. The problem-solving capacity is hence not necessarily bound to a particular container. In the following, the vocabulary (case representation), similarity (retrieval), and adaptation knowledge containers will be described in more detail (the case base container is not discussed further, since it is simply defined as a set of cases).
2.4.3 Case Representation

Considering the vocabulary container, Case-Based Reasoning systems can use various kinds of case representations. Bergmann distinguished between three categories of CBR systems according to the case representation: textual CBR, conversational CBR, and structural CBR [17] [16, p. 53-61].

• In textual CBR, cases are defined only by a textual description that follows no or only marginally defined structures (such as headers). Querying is executed by entering free text, based on which the CBR system identifies relevant documents.
• Conversational CBR represents cases by a set of answers and corresponding questions. There is, however, no clear case structure, since the questions and answers may differ between the cases. In order to identify the most relevant cases that are annotated with a corresponding solution, questions are posed to the user based on the questions available in the cases.

• The last type of CBR system is referred to as structural CBR. Here, an explicit vocabulary is used which defines how a case can be structured. A straightforward possibility is an attribute-value representation as previously sketched. Besides attribute-value representations, object-oriented, graph, or predicate logic representations, for example, can be used for structural CBR [16, p. 61 ff]. Although structural CBR involves a relatively high initial effort, it is appropriate for complex applications and characterized by a high accuracy.

In this work, a structural CBR approach is presented. Traditionally, in structural CBR, a case c is represented as a pair of problem and solution c = (p, s) based on the vocabulary V determining the problem space P and the solution space S, i.e., c ∈ P × S [16, p. 50]. However, as pointed out by several authors (e.g., [74, 237, 26]), this traditional view on cases is not always suitable. In some application domains, the problem and solution cannot be clearly distinguished, which means that a case c consists of a single description only. Thus, the vocabulary rather describes a description space C to define cases, i.e., c ∈ C.
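The two views on cases can be made concrete with a small sketch; the names PSCase and DescriptionCase as well as the recipe example are purely illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class PSCase:
    """Traditional structural case: an explicit problem-solution pair c = (p, s)."""
    problem: dict    # attributes spanning the problem space P
    solution: dict   # attributes spanning the solution space S

@dataclass
class DescriptionCase:
    """Case without a problem/solution distinction: a single description c from
    the description space C, as used later for workflow cases in PO-CBR."""
    description: dict

# Example of a case consisting of a single description only:
recipe = DescriptionCase(description={
    "ingredients": {"flour", "milk", "egg"},
    "steps": ["mix", "fry", "serve"],
})
```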
2.4.4 Retrieval

As previously introduced, retrieval aims at finding the most similar problem description in the case base for a particular problem situation, since it is assumed that the corresponding solution is the most useful one [16, p. 94 ff]. In formal terms, this means that for a given problem description p ∈ P, the solution si ∈ S of a case ci = (pi, si) with the highest utility value u(p, si) is searched (see Fig. 2.18). Thereby, the utility depends on the current problem-solving task in the given domain. This poses a major problem, since it is not feasible to assess utility a priori, i.e., before the solution is actually applied to the problem. Thus, CBR aims at approximating the utility of the cases by the use of a similarity measure suitable for the particular environment of the CBR system. A similarity measure is commonly defined between the description of the current problem p ∈ P and the problem description pi ∈ P of a particular case ci = (pi, si) within the case base. Thus, a similarity
measure is a function sim : P × P → [0, 1]. The restriction to the value range [0, 1] makes it possible to explicitly express unrelated problem descriptions (0) or identical ones (1). The latter means that the solution can be applied directly to solve the new problem. Thus, it can be assessed "how similar" two related problem descriptions are. In general, it is useful that the similarity measure enables a ranking between the cases in order to estimate the best-matching case. For cases without a clear distinction between the problem and the solution part, the similarity measure is not specified between two problems, but between a query q ∈ Q and the case c ∈ C instead, i.e., sim : Q × C → [0, 1]. Here, the query q ∈ Q can either be a partial description of the case (i.e., q ∈ C) or represented in a specific query language Q.

Figure 2.18: Case retrieval based on Bergmann [16, p. 94]
The literature discusses various kinds of similarity measures for CBR retrieval, which can mostly be divided into syntactic and semantic similarity measures [1, p. 14]. Syntactic measures are useful in domains in which the acquisition of general domain knowledge is very difficult. In contrast, semantic similarity measures include the contextual meaning of a problem description in an environment. The definition of such a semantic similarity measure can become a very complex and elaborate task.

A basic approach aiming at simplifying the modeling of similarity measures is the local-global principle. It can be applied to attribute-value representations and many other case representations. A similarity measure according to the local-global principle is separated into local similarity functions and a global similarity function. This means that for each attribute a local similarity function is defined with the value range [0, 1]. To estimate an overall similarity, the local values are aggregated to an overall similarity value, for example, by computing the average, weighted average, maximum, or minimum similarity value.

Please note that as soon as adaptation is provided by the system, the utility function u(p, s) may change, since a case with an initially lower utility may then be adaptable to provide a matching solution for p. Thus, during
retrieval, the adaptability of cases [233] should also be considered in order to maximize the previously introduced similarity measure [16, p. 228].
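The local-global principle can be illustrated with a minimal sketch, assuming two invented attributes (duration and category) with hypothetical weights; it is a toy measure for illustration, not one used in this thesis.

```python
def local_num(x: float, y: float, value_range: float) -> float:
    """Local similarity for numeric attributes: 1 minus the normalized distance."""
    return 1.0 - abs(x - y) / value_range

def local_eq(x, y) -> float:
    """Trivial local similarity for symbolic attributes: exact match or not."""
    return 1.0 if x == y else 0.0

def global_sim(p: dict, pi: dict) -> float:
    """Global similarity: weighted average of the local similarity values."""
    weighted = [
        (0.7, local_num(p["duration"], pi["duration"], value_range=100.0)),
        (0.3, local_eq(p["category"], pi["category"])),
    ]
    return sum(w * s for w, s in weighted) / sum(w for w, _ in weighted)

# 1.0 for identical problem descriptions, lower values otherwise:
print(global_sim({"duration": 30, "category": "repair"},
                 {"duration": 40, "category": "repair"}))  # approx. 0.93
```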
2.4.5 Adaptation

As long as the solution of the best-matching case can be applied, retrieval is sufficient and no adaptation is required. However, as pointed out by Richter and Weber [200, p. 198], problems are usually not identical, and the former solution cannot be reused exactly in altered problem scenarios. In this case, adaptation is required. In CBR, adaptation aims at finding novel solutions for newly arising problem situations [16, p. 141] through the reuse of cases. Thus, adaptation in CBR relies on three requirements [117, p. 21-22]: First, a method is needed that is able to identify those parts that represent deficiencies in the solution with regard to the given problem situation (1). Further, adaptation strategies are required, which use this information to modify the case in such a manner that the solution becomes more appropriate (2). The strategies define modifications on an abstract level, for example, replacing the value of an attribute. The third requirement is the availability of adaptation knowledge, which is domain-specific and defines, for example, which values can be replaced and how (3). In the following, these three requirements are illustrated in more detail ((1) Utility and Adaptation, (2) Adaptation Strategies, (3) Learning of Adaptation Knowledge).

Utility and Adaptation

Commonly, adaptation is attempted whenever the solution of the best-matching case is not sufficient for solving the current problem situation. This means that the solution does not prove to be fully applicable. As the utility is approximated by the similarity measure, adaptation is attempted whenever the similarity value is below 1. Thus, a deficiency of the provided solution is detected. If the similarity measure is based on the local-global principle, for example, insufficient parts can easily be obtained by identifying those which have a local similarity value below 1. During adaptation, a solution with the highest utility, or similarity value respectively, is searched for. Thus, the identified faulty parts have to be replaced by more appropriate solutions with regard to the similarity measure. The provided solution is then usually more useful than the retrieved solution, as it better matches the described problem scenario [200, p. 201].
Adaptation Strategies

In order to achieve this, various adaptation approaches have been presented for CBR, which can be classified into different so-called adaptation models [264] (see Fig. 2.19):

Figure 2.19: Adaptation models according to Wilke and Bergmann [264]
• Transformational Adaptation describes that the solution of a case is transformed into a new solution in order to solve a new problem. This means that either case elements are modified, i.e., their values are substituted (Substitutional Adaptation), or the case structure is modified, which means that case elements are deleted or added (Structural Adaptation).

• Generative Adaptation is based on general domain knowledge that is used by an automated problem solver to solve problems from scratch. This requires the definition of the entire domain knowledge, which is an expensive knowledge engineering task and in many cases not feasible.

• In contrast to the previous methods, Compositional Adaptation involves multiple cases to provide a solution. More precisely, several fragments of various cases in the case base are combined, which is usually achieved by transformational or generative adaptation methods.

The previously introduced adaptation models represent basic strategies for how the adaptation of cases can be performed in principle. These adaptation strategies will be explained in more detail below. Further, additional adaptation methods such as hierarchical adaptation, adaptation by generalization, as well as case-based adaptation will be discussed, which usually utilize one or more of the previously introduced adaptation models.
Transformational Adaptation

The basic principle of transformational adaptation is to transform the case c1 stepwise into an altered case cn by applying various adaptation steps as1 . . . asn, aiming at achieving a higher utility by altering the case:

$$c_1 \xrightarrow{as_1} c_2 \xrightarrow{as_2} \cdots \xrightarrow{as_n} c_n \qquad (2.1)$$
Transformational adaptation is usually facilitated by adaptation rules or adaptation operators. Adaptation rules aim at transferring the solution of the original case in order to find a solution for the given problem scenario. "An adaptation rule [. . . ] represents certain knowledge about how a retrieved case must be changed if its characterization differs from the current situation" [16, p. 143]. This means that adaptation rules must be aware of the problem situation (described within the query case) in order to modify the retrieved case into an appropriate target case, thereby aiming at compensating the deficiencies between the query and the retrieved case within the desired target case [16, p. 141-148] (see Fig. 2.20). More precisely, adaptation rules consist of a precondition part and a conclusion part. The precondition part determines whether the rule can be applied and refers to the target case, the problem situation, and the retrieved case. If a rule is applied, a set of actions is performed on the target case as defined in the conclusion part of the adaptation rule. The rule-based adaptation itself is then straightforward: all rules whose preconditions match are applied, i.e., rule-based adaptation represents a single adaptation step resulting in a final target case.

Figure 2.20: Rule-based adaptation taken from [16, p. 156]
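A minimal sketch of such a rule-based adaptation step is given below; the rule for the earlier car-repair example is hypothetical and only illustrates the precondition/conclusion structure.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class AdaptationRule:
    # The precondition refers to the query (problem situation) and the target case.
    precondition: Callable[[dict, dict], bool]
    # The conclusion performs a set of actions on the target case.
    conclusion: Callable[[dict], None]

def rule_based_adaptation(query: dict, retrieved_solution: dict,
                          rules: list) -> dict:
    """Single adaptation step: apply all rules whose preconditions match."""
    target = dict(retrieved_solution)  # start from a copy of the retrieved solution
    for rule in rules:
        if rule.precondition(query, target):
            rule.conclusion(target)
    return target

# Hypothetical rule: a different manufacturer in the query triggers the
# substitution of the spare part in the target case.
rules = [AdaptationRule(
    precondition=lambda q, t: q["manufacturer"] != t["manufacturer"],
    conclusion=lambda t: t.update(part="manufacturer-specific backlight"),
)]
print(rule_based_adaptation(
    {"manufacturer": "B"},
    {"manufacturer": "A", "part": "backlight model A"},
    rules,
))  # {'manufacturer': 'A', 'part': 'manufacturer-specific backlight'}
```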
Another approach to transformational adaptation is the use of adaptation operators. An "adaptation operator is a partial function that transforms a case into a successor case. An adaptation operator represents valid transformations in the sense that if a valid case is transformed the resulting case after adaptation is still valid." [16, p. 149]. The applicability of operators may
also depend on certain conditions [16, p. 141-148]. However, operators differ from rules in that operators simply describe possible transformations of the solution, while rules aim at compensating the difference between the new problem and the retrieved problem and describe corresponding actions within the solution part.

Figure 2.21: Operator-based adaptation based on [16, p. 232]
Operator-based adaptation is enabled by chaining several operators in a sequence (see Fig. 2.21). The first operator transforms the retrieved case into a successor case, which can again be transformed by another operator. For each of these cases, possibly multiple operators are applicable, whereby the semantic correctness of the case has to be ensured. Thus, each case transformation may result in multiple successor cases. This results in a search for the best-matching solution according to the previously introduced adaptation utility. Consequently, this can also be considered a search process involving several adaptation steps (one per applied operator). A drawback of operator-based adaptation is that the search space is potentially large, resulting in a high complexity [16, p. 231-232]. In this regard, Richter and Weber sketched the reduction of the search space or the application of heuristics [200, p. 199-210]. Further, based on Schmitt et al. and Meyfarth [220, 149], Bergmann illustrated the potential of interactive operator-based adaptation, in which the user selects the components to be adapted as well as the operators applied to these components. The user is also able to undo previously executed adaptation steps. This approach can significantly reduce computational complexity and further enables more flexible adaptation approaches.
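The search character of operator-based adaptation can be sketched as a simple best-first search over operator applications. The bound on expansions reflects the potentially large search space and is an illustrative simplification, not the procedure described in [16].

```python
import heapq
from itertools import count
from typing import Callable, List

# An operator is a partial function mapping a case to its valid successor cases.
Operator = Callable[[dict], List[dict]]

def operator_based_adaptation(retrieved: dict, operators: List[Operator],
                              utility: Callable[[dict], float],
                              max_expansions: int = 100) -> dict:
    """Best-first search over chained operator applications (cf. Fig. 2.21)."""
    tie = count()  # tie-breaker so that dicts are never compared directly
    frontier = [(-utility(retrieved), next(tie), retrieved)]
    best = retrieved
    expansions = 0
    while frontier and expansions < max_expansions:
        neg_u, _, case = heapq.heappop(frontier)
        if -neg_u > utility(best):
            best = case
        for op in operators:
            for successor in op(case):  # each step may yield several successors
                heapq.heappush(frontier, (-utility(successor), next(tie), successor))
        expansions += 1
    return best
```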
Generative Adaptation

In contrast to transformational adaptation, generative adaptation does not start with an initial case but plans the solution to the given problem from scratch [16, p. 222-224]. A generative planner, however, needs a vastly greater amount of knowledge. Furthermore, generating solutions from scratch is usually not satisfying with regard to the quality of the solution and computation time. Thus, generative adaptation in CBR usually only alters the solution parts required to compensate the difference between the problem and the retrieved case. More precisely, the planner uses solution traces capturing information of previous planning episodes. It replays those traces such that it reuses those solving paths that can be applied to the current problem and only solves the remaining parts from scratch.

Compositional Adaptation

Compositional adaptation usually involves the reuse of case components contained in various cases [16, p. 224-225, 236-239]. A prerequisite is that cases must be divisible into components that are mostly independent from each other. This means that a case can be separated into independent sub-problems with corresponding sub-solutions. The basic idea is to replace components of cases by components of other cases. Thus, an overall solution can be identified by replacing inappropriate sub-components and combining valid sub-solutions. Constraints have to be defined which verify the correctness of a component replacement. Thereby, the components represent the adaptation knowledge from the adaptation container. Compared to the previous approaches, this can significantly lessen the adaptation knowledge acquisition effort.

The compositional adaptation approach is now illustrated on the basis of an adaptation cycle for compositional adaptation by Bergmann [16, p. 236-239] (see Fig. 2.22, whose presentation differs marginally). Prior to adaptation, retrieval with a given query (or problem description) is usually executed with the goal of identifying the best-matching case from the case base (a case could possibly also be adapted directly without retrieval; see the CBR cycle). The retrieved case is then handed to the adaptation algorithm along with the corresponding query. First, the adaptation algorithm determines which part of the case is to be modified. This can be based on the query with regard to the similarity value of the component. Another possibility is to fragment the case into components and attempt adaptations for each component. In the next stage, alternative components are retrieved and ranked according to their similarity value, i.e., according to their usefulness. The most suitable component is then applied, and the modified case needs to be verified with
regard to the constraints that must be fulfilled. Constraints ensure that the replacement still represents a correct case. Additionally, it is validated whether the similarity value has increased, indicating a more suitable solution. If either of these conditions does not hold, no applicable component has been found. Then, the component is replaced with the next most suitable component, and the modified case needs to be validated again. This is repeated until a valid replacement has been found or all replacements have been attempted. Next, the algorithm continues with the next component to be replaced with a more suitable one. This kind of compositional adaptation resembles a hill-climbing search, as each component is replaced by the most suitable replacement component and the replacements are applied successively.

Figure 2.22: Compositional adaptation cycle based on [16, p. 236]
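A minimal sketch of this hill-climbing behavior is shown below, assuming cases are dictionaries of components and that the constraint check and similarity measure are supplied as functions; it simplifies the cycle of Fig. 2.22 considerably.

```python
from typing import Callable

def compositional_adaptation(query: dict, retrieved: dict,
                             alternatives: dict,
                             sim: Callable[[dict, dict], float],
                             valid: Callable[[dict], bool]) -> dict:
    """Hill-climbing replacement of case components (cf. Fig. 2.22).

    alternatives maps each component name to candidate sub-solutions
    taken from other cases in the case base."""
    adapted = dict(retrieved)
    for component in retrieved:
        # Rank the alternative components by the similarity they would yield.
        candidates = sorted(
            alternatives.get(component, []),
            key=lambda alt: sim(query, {**adapted, component: alt}),
            reverse=True)
        for alt in candidates:
            modified = {**adapted, component: alt}
            # Accept a replacement only if the constraints hold and the
            # similarity value increases.
            if valid(modified) and sim(query, modified) > sim(query, adapted):
                adapted = modified
                break  # continue with the next component to be replaced
    return adapted
```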
Hierarchical Adaptation

Hierarchical adaptation is not covered by the previously introduced adaptation models but usually involves them (e.g., [30, 232]). The basic idea is that cases are defined at different levels of abstraction. Abstraction means that the complexity of the case representation is reduced, and adaptation starts from the most abstract representation. The solution is then gradually refined by adding more and more details.

Adaptation by Generalization

While a single case is represented by a problem-solution pair, a "generalized case can be understood as a possibly infinite set of closely related problems and solutions" [241, p. 2]. Thus, as concluded by Tartakovski [241, p. 2] (based on [16, 28]), a generalized case provides solutions for a set of problems. Consequently, generalized cases [241, 16] also provide a means for adaptation and contribute to a reduction of the size of the case base, since multiple cases can be represented by a single one [16, p. 73]. Assuming an attribute-value representation, this means that a single case is denoted by a concrete value for each attribute, while a generalized case may contain, for example, a set of valid values or a range of numerical values for an attribute. Thus, the case already contains case-specific adaptation knowledge [28]. For the purpose of adaptation, the generalized case is specialized such that it matches the target problem to be solved [30]. This means that for the generalized attributes, best-matching values are chosen (a sketch follows below). However, for generalized cases, there is usually an increased representation or acquisition effort, and retrieval can become more complex [16, p. 73]. Please note that the abstraction used for hierarchical adaptation differs from generalization [21]. Generalization aims at covering more problems and solutions and thus extends the knowledge stored in the case. In contrast, abstraction reduces the details of the case, i.e., it reduces the knowledge stored in the case [29].

Case-based Adaptation

Case-based adaptation is another approach recently introduced by various authors (e.g., [152, 127, 124, 51]). The basic idea is that a so-called adaptation case captures the retrieved and the adapted case as well as the corresponding transformations. The adaptation itself thus becomes case-based. More precisely, the adaptation case most similar to the current problem situation and the retrieved case is identified and subsequently applied to the retrieved case by performing the stored transformations in order to provide a new and adapted solution. This approach may involve any of the previously introduced adaptation methods.
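To illustrate the specialization of generalized cases mentioned above, the following sketch assumes generalized attributes are given as numeric (min, max) ranges or lists of valid values; this representation is invented for illustration.

```python
def specialize(generalized: dict, query: dict) -> dict:
    """Specialize a generalized case so that it best matches the query.

    Generalized attributes are given as a numeric (min, max) range or as a
    list of valid values; concrete attributes are kept as they are."""
    specialized = {}
    for attr, value in generalized.items():
        target = query.get(attr)
        if isinstance(value, tuple) and target is not None:
            lo, hi = value                                # numeric range:
            specialized[attr] = min(max(target, lo), hi)  # clamp into the range
        elif isinstance(value, list) and target is not None:
            # value set: pick the element closest to the queried value
            specialized[attr] = min(value, key=lambda v: abs(v - target))
        else:
            specialized[attr] = value                     # already concrete
    return specialized

# (120, 180) is a generalized range, [2, 4, 6] a set of valid values:
print(specialize({"duration": (120, 180), "persons": [2, 4, 6], "dish": "pasta"},
                 {"duration": 200, "persons": 5}))
# {'duration': 180, 'persons': 4, 'dish': 'pasta'}
```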
Learning of Adaptation Knowledge

All the previously introduced adaptation methods require adaptation knowledge to be defined, which is stored in the adaptation knowledge container [16, p. 141]. The engineering effort for adaptation knowledge, however, is huge, and the task is complex [92]. An engineer of adaptation knowledge must consider numerous possible future problems as well as dependencies between cases, problems, solutions, and their transformations with regard to newly arising problems. Furthermore, adaptation knowledge is usually bound to a particular domain and cannot be reused. This leads to a so-called acquisition bottleneck of adaptation knowledge [93]. This bottleneck significantly reduces the adaptation capabilities and thus the competence of the entire CBR system, since the adaptation container is only sparsely filled. Thus, various methods have been investigated to learn adaptation knowledge automatically (e.g., [92, 144, 49, 106, 10]). This is particularly important for complex cases such as workflows, since the definition of adaptation knowledge is even more challenging there. Based on Hanney and Keane [92], approaches to learning adaptation knowledge can be distinguished into three categories depending on the knowledge source:

• Learning from domain-specific knowledge addresses the definition of general adaptation knowledge strategies, which are then applied to domain knowledge already available. This results in domain-specific adaptation cases, which can be used to derive additional domain-specific adaptation knowledge. Thus, domain-specific knowledge already available can be reused. However, Hanney and Keane argue that the effort for the definition of general transformation strategies is high.

• Another option is to learn adaptation knowledge from user interactions. To this end, the CBR system must enable a modification of the solution by the user. If the user then changes the current solution, adaptation knowledge can be drawn from this modification. This approach relies significantly on the expertise of the user.

• Finally, the cases stored in the case base provide another source of adaptation knowledge. Learning from cases is, for instance, based on the differences between pairs of cases. Based on these differences, adaptation knowledge can be learned automatically. However, as this is a fully automatic process, the learning algorithms must ensure an appropriate quality of the learned adaptation knowledge.
Figure 2.23: Learning adaptation knowledge from the case base [106]
In Figure 2.23, the latter approach is illustrated by a more concrete example based on the works of Craw, Jarmulak, and Rowe [51, 106]. A case is extracted from the case base, and a retrieval for the most similar case excluding the extracted case is executed (leave-one-out approach). Next, both problem descriptions can be compared and a problem difference can be constructed. In the same manner, a solution difference can be constructed, resulting in a set of actions that have to be executed in order to transform one solution into the other. Knowing which problem differences result in which solution differences is adaptation knowledge. This knowledge can be transferred to new scenarios in which only a problem part is known. Thus, differences between the retrieved solution and the desired solution can possibly be compensated by such adaptation knowledge.
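The difference-based idea can be sketched as follows, using a simplified pairwise variant instead of the leave-one-out retrieval of Fig. 2.23 and a crude attribute-level difference notion; both simplifications are assumptions made for illustration.

```python
from itertools import combinations

def attribute_diff(a: dict, b: dict) -> frozenset:
    """Simple difference notion: the attributes on which two descriptions disagree."""
    return frozenset(k for k in set(a) | set(b) if a.get(k) != b.get(k))

def learn_adaptation_knowledge(case_base: list) -> dict:
    """Relate each problem difference to the corresponding solution difference.

    case_base is a list of (problem, solution) pairs; the result maps
    problem differences to solution differences (cf. Fig. 2.23)."""
    knowledge = {}
    for (p1, s1), (p2, s2) in combinations(case_base, 2):
        problem_diff = attribute_diff(p1, p2)
        solution_diff = attribute_diff(s1, s2)
        # Knowing which problem differences lead to which solution
        # differences constitutes adaptation knowledge.
        knowledge[problem_diff] = solution_diff
    return knowledge
```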
2.4.6 Maintenance

The previous sections described the knowledge containers of a CBR system in detail. These knowledge containers, however, must be maintained, for example, to adjust the CBR system to a changed environment or changed knowledge, or to improve its performance with regard to effectiveness and efficiency [200, p. 260]. Thus, maintenance is important for CBR systems. Roth-Berghofer and Iglezakis [207] highlighted that maintenance is not reflected by the CBR cycle, as it merely focuses on the application phase, and maintenance is not entirely covered by the retain step. In other words, retainment is certainly an elementary maintenance activity of CBR, but maintenance in general is of a broader scope [200, p. 260].
Figure 2.24: Application and maintenance of CBR systems, based on [180], adapted to CBR
Based on the work of Nick et al. [180], Figure 2.24 illustrates the difference between the online application stage and the offline maintenance stage of CBR systems. Problem-solving activities are supplied by the four REs. During this, the CBR system accesses the knowledge base consisting of the previously introduced knowledge containers: vocabulary, case base, similarity, and adaptation. For the CBR application stage, interfaces to the particular problem-solving application or problem-solving context must consequently be identified. Maintenance, in contrast, is performed offline, as it is not involved in the usual CBR application cycle. Maintenance activities can be triggered manually or automatically, for example, by the running application when a new case is stored in the case base. In contrast to the application stage (except for the retain step), maintenance activities modify the knowledge base. This can be performed either by a maintenance operator or by the CBR system itself, involving one or more knowledge containers [200, p. 260-266].

In this regard, the vocabulary is affected, for example, in an attribute-value representation, by adding or deleting an attribute or changing its value range.

Case base maintenance includes, for example, the removal of invalid cases, the removal of cases with low quality, or the addition of new cases. The maintenance of the case base further includes the modification of a case, for example, by
changing values or by generalization or specialization of a case. In addition, a well-known problem is that a large case base reduces the performance, as the similarity between the new problem and all the cases has to be calculated each time. Thus, another maintenance activity is the reduction of the case base size in order to speed up the retrieval time of the application without affecting the competence of the system. To this end, it has to be decided which cases to retain and which to discard [234]. A case may be dispensable if its problem and solution space is already covered by other cases; conversely, a case that can solve multiple problems (possibly using adaptation) can replace several cases within the case base without reducing the competence of the system. More precisely, given a certain problem scenario, reachability defines the set of cases that can be adapted such that they provide a solution for the given problem [234] [200, p. 206-209]. Coverage of a case is defined as the set of all problems that can be solved by it. Thus, removing cases without reducing the coverage of the entire CBR system does not decrease its competence (see the sketch below).

For the case base, the provenance [200, p. 265-266] of cases is also important. Cases can be added to the case base because they have been newly created or adapted from another one. This information can be useful for the reuse of such cases. Further, a case could be adapted successively several times. Thus, capturing the history of a case provides helpful information on the reliability and the context of the case and can, for example, be employed during maintenance decisions.

Another approach that can mostly be seen as maintenance of the case base is so-called case completion, which aims at completing missing information in the case or correcting false case information, for example, by completion rules [200, p. 193-198]. Bergmann defines a completion rule as "a rule that represents knowledge about how an attribute value of a case can be determined from other attribute values of the same case under certain conditions" [16, p. 142]. Hence, a completion rule consists of a precondition and a conclusion similar to those of adaptation rules. In contrast to adaptation rules, the precondition may only refer to the case itself, as completion rules aim at enhancing the information within the case itself, independent of a particular problem-solving scenario. More precisely, missing information is completed or inconsistent information resolved (conclusion) if the precondition matches. Completion rules can consequently enhance the information of the cases within the case base or the problem description of the query. This can lead to a more precise similarity assessment as well as increased adaptation capabilities [16, p. 142]. Completion rules and adaptation rules can easily be combined in order to benefit from each other.
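The coverage-based reduction mentioned above can be illustrated by the following sketch; the greedy removal order and the solves predicate are simplifying assumptions, not the competence model of [234].

```python
from typing import Callable

def reduce_case_base(case_base: list, problems: list,
                     solves: Callable[[object, object], bool]) -> list:
    """Discard cases whose coverage is preserved by the remaining case base.

    coverage(c): set of problems that case c can solve (possibly via
    adaptation); a problem stays reachable if at least one kept case
    still covers it."""
    coverage = {i: {j for j, p in enumerate(problems) if solves(c, p)}
                for i, c in enumerate(case_base)}
    kept = set(coverage)
    # Greedily try to remove cases with small coverage first.
    for i in sorted(coverage, key=lambda i: len(coverage[i])):
        others = set().union(*(coverage[j] for j in kept - {i}), set())
        if coverage[i] <= others:  # every covered problem stays reachable
            kept.discard(i)
    return [case_base[i] for i in sorted(kept)]
```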
Maintaining the similarity container addresses modifications of the similarity measures. Changes to local-global similarity measures could, for example, be the adjustment of weights or the replacement of local similarity measures. Moreover, similarity measures can be learned automatically from the case base such that the retrieval accuracy is improved [236]. Furthermore, adaptation-guided retrieval also falls into this category, if similarity measures are modified in order to consider the adaptability of the cases (e.g., [128, 233]).

Finally, the adaptation knowledge can be maintained, which means that adaptation knowledge, for example, adaptation rules, is added, deleted, or changed. In the previous section, it was illustrated how adaptation knowledge can be learned automatically, which is consequently considered a maintenance activity and is not part of the application stage of a CBR system. Further, the retention of adaptation knowledge has also been investigated, since the adaptation knowledge can become very large [104, 125].
2.4.7 Conclusions

The modeling assistance presented in this thesis is based on the Case-Based Reasoning methodology. Thus, this section introduced the history and foundations of Case-Based Reasoning. In particular, the CBR cycle was presented, which must at least partially be considered during the development of a CBR system. It was shown that knowledge is spread over various containers, which conjointly constitute the competence of a CBR system. This section further demonstrated the differences between the application stage (the four REs) and the maintenance stage (offline maintenance activities). Moreover, adaptation was illustrated in more detail, since the presented workflow modeling assistance will fundamentally draw on traditional CBR adaptation methods. Thus, various adaptation models and approaches were illustrated. Since these adaptation approaches are based on adaptation knowledge, and the manual acquisition of adaptation knowledge is an elaborate task, the automated learning of adaptation knowledge was also discussed. This section further showed that adaptation itself is performed during the application stage of the CBR system, whereas the learning of adaptation knowledge is a maintenance activity.
2.5 Process-Oriented Case-Based Reasoning

Process modeling is basically a problem-solving activity [198]. Moreover, companies today already possess large repositories of modeling experience stored in the form of workflow models [205]. Combining these facts, it stands to reason that the CBR methodology can be employed for the purpose of workflow modeling support. This approach is known as Process-Oriented Case-Based Reasoning (PO-CBR) [157], which applies CBR to process and workflow management. It mainly focuses on the reuse of experience gained in previous modeling, execution, or change activities in order to support the creation of process models and the adaptation of process models or process instances. In PO-CBR, a case is usually specified by a process model, for example, a workflow.

The basic idea is to support workflow modeling by employing the retrieval stage of CBR in such a manner that the user models a small set of workflow elements and the best-matching workflow is suggested as a kind of auto-completion (e.g., [83]). Thus, the workflow does not have to be modeled from scratch. Various approaches exist that address the retrieval of workflows in the context of PO-CBR (e.g., [163, 20]). In fact, many of today's WfMSs support the search for workflow models in a variety of ways. However, search alone cannot keep up with the increasing demand for more and more individual workflows and continuously changing business environments. The retrieved workflows are then less likely to match the given scenario, resulting in an extensive modification of the found workflow model, which is a complex and error-prone task. In this regard, adaptation capabilities of the PO-CBR system could modify the retrieved workflow automatically in such a manner that it more likely matches the given scenario. This increases the competence of the CBR system and reduces the effort for the workflow designer to perform workflow modifications.

The major challenge of implementing a CBR system, according to Aamodt and Plaza [1, p. 11], is the development of suitable CBR methods for particular domains and application environments, since CBR methods cannot be applied universally. This particularly concerns PO-CBR, as the traditional CBR methods introduced previously cannot be applied straightforwardly to PO-CBR, because it deals with complex case representations, as will be shown in the next section.
2.5.1 Case Representation

As PO-CBR deals with case representations of processes (e.g., workflows), usually represented as graphs, it can mostly be categorized as a structural CBR approach. This is because processes or workflows commonly follow a predefined structure, which is usually not the case in textual or conversational CBR approaches. Despite the fact that many structural CBR approaches exist, they cannot be applied straightforwardly to PO-CBR, because it deals with so-called complex case representations. In alignment with the characteristics described by Gebhardt et al. [74, p. 25-26] (the authors name additional possible properties of complex cases; only the most basic ones regarding workflows are listed here), workflows represent complex cases. First, workflows are denoted by a complex structure, which means that they cannot be represented simply by an attribute-value representation. Furthermore, there is no distinction between the problem and the solution, as the workflow model represents both at the same time. Moreover, CBR with workflow cases naturally involves complex similarity measures and complex adaptation methods, since graph algorithms are typically characterized by a high complexity [16, p. 72]. Finally, for workflows as well as for complex cases in general, adaptation cannot be neglected, since a single case usually cannot be applied to a broad range of problems. Overall, this makes workflow reuse quite complex, since retrieval and the required adaptation are computationally expensive.

2.5.2 Retrieval & Adaptation

In the following, a holistic view on PO-CBR retrieval and adaptation is illustrated (see Fig. 2.25). Since PO-CBR commonly dissolves the traditional separation of a case into a problem and a solution part, a case is represented by the workflow itself. Hence, the case base consists of a set of workflows, also referred to as a workflow repository. The problem is described as a query defining the restrictions and requirements on the desired workflow. This query could, for example, be a partial workflow or a set of desired and undesired workflow elements. Similarity measures in PO-CBR must estimate how well the query q matches a workflow case, i.e., sim(q, wi) ∈ [0, 1], in order to retrieve the best-matching workflow case wi. If some query elements do not match the best-matching workflow, i.e., if sim(q, wi) < 1, adaptation is required. The goal of this adaptation stage is then to increase the value sim(q, wi) as far as possible.
Figure 2.25: PO-CBR setting for workflows
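As a crude illustration of such a query-based similarity, the following sketch scores a workflow against sets of desired and undesired elements; it is an invented toy measure for illustration, not the similarity model developed later in this thesis.

```python
def workflow_query_sim(query: dict, workflow: dict) -> float:
    """Rough estimate of sim(q, w) in [0, 1] for a query consisting of
    desired and undesired workflow elements (e.g., task labels)."""
    desired, undesired = query["desired"], query["undesired"]
    elements = workflow["elements"]
    if not desired:
        return 0.0 if undesired & elements else 1.0
    covered = len(desired & elements)     # desired elements present
    violated = len(undesired & elements)  # undesired elements present
    return max(0.0, (covered - violated) / len(desired))

workflow = {"elements": {"mix ingredients", "bake", "add nuts"}}
query = {"desired": {"mix ingredients", "bake"}, "undesired": {"add nuts"}}
print(workflow_query_sim(query, workflow))  # 0.5
```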
Consequently, this setting supports the modeling of workflows by searching for a best-matching workflow according to the user-defined query and, if necessary, adapting the workflow to the given restrictions and resources. Thus, based on the user query, a workflow is automatically constructed. Even if the suggested workflow does not fully match the expectations of the user expressed in the query, the workflow does not have to be modeled from scratch. Rather, the suggested workflow just needs to be modified, which eases and speeds up the modeling process.

Another usage scenario for the workflow adaptation capabilities of a PO-CBR system is the adaptation of an arbitrary workflow model. The adaptation process here adjusts the respective workflow model according to the needs specified in a query. Furthermore, the adaptation of workflow instances during execution could be supported, e.g., if unexpected situations arise and the workflow needs to be modified to ensure a correct execution. In this case, additional actions have to be taken to ensure that the modifications of the workflow model do not conflict with the running instance. One possibility is to adapt the corresponding workflow model (schema evolution) and migrate the adapted workflow model to the running instance (schema migration). Schema evolution and schema migration have already proven useful for workflow instance adaptations (see Sect. 2.3.2 or [44] respectively). In this case, the adaptations must additionally ensure that already executed parts of the workflow are not altered, since otherwise the migration of the workflow schema to the running instance is mostly hampered.

In this regard, direct workflow modeling, workflow model adaptation, as well as workflow instance adaptation can be supported by PO-CBR. The
experience gained in any of these three scenarios could be stored within the PO-CBR system and reused in similar future scenarios. This triggers the learning cycle of the CBR system, improving its capabilities in future situations. Despite its importance, little research exists in the field of PO-CBR so far, in particular addressing the adaptation of workflows or workflow modeling support (see Sect. 2.6).
2.5.3 Conclusions

This section introduced a holistic view on PO-CBR and its usability with regard to workflow modeling support. Furthermore, the special characteristics of PO-CBR that prevent the straightforward application of established CBR methods were described. This again emphasizes the need for the development of the new PO-CBR methods that will be presented in this work. This section further showed that retrieval as well as adaptation by means of PO-CBR can be useful approaches to enhance workflow reuse. Thus, PO-CBR also has a strong interrelationship with Experience and Knowledge Management, as already pointed out by Bergmann [16] and various other authors (e.g., [200, 137, 5]) for CBR in general. More precisely, the modeling of processes is a method to capture, formalize, and structure knowledge of the business process, making it accessible to others, and is in this regard a powerful tool for knowledge management [108]. PO-CBR can consequently serve as a methodology to manage and reuse formalized knowledge on business processes. The reuse of business process knowledge by PO-CBR can significantly improve quality through the reuse of established and optimized processes and further reduces process modeling time by avoiding the modeling of identical process parts multiple times [139]. Thus, workflow modeling support by means of PO-CBR, which utilizes knowledge and experience gained from previous modeling scenarios, is a highly important field also from the perspective of knowledge and experience management.
2.6 Related Domains & Related Work

This section will first sketch some visionary application examples for the presented workflow modeling assistance. These application examples comprise the traditional domain of business processes and also various other domains such as scientific workflows, social workflows, and workflows representing medical treatment processes, programming code, or cooking instructions. Thus, a view on the potential and importance of workflow modeling assistance in various related domains
is provided, in which the workflow paradigm is already successfully applied. Next, related approaches to workflow modeling support in general will be discussed, which also includes existing approaches in the field of PO-CBR. For this purpose, a taxonomy of workflow modeling support techniques is presented, in which the related work is then classified and discussed.
2.6.1 Application Visions

Workflow modeling assistance can be beneficial for numerous domains, since a broad range of applications employ the workflow paradigm. In Section 2.2, it was shown that these are based on various types of workflow representations or workflow modeling languages. This section will sketch a vision of some generic workflow modeling support scenarios, showing their importance and potential. The methods presented in this thesis could serve as a foundation to investigate workflow modeling assistance, for example, in these domains or for other workflow representations.

From the traditional business perspective, the use of workflow modeling assistance concerns many domains such as banking, accounting, manufacturing, brokerage, insurance, government departments, telecommunications, university administration, and customer service [161]. In these domains, the assistance could help to facilitate more customer-oriented processes or processes suited to a particular business situation. In this regard, also workflows with a low repetition rate, which entail a relatively high modeling effort, could be significantly supported. Further, cooperative experience on workflow modeling can be shared among the employees, reducing modeling time and significantly reducing the error-proneness of such workflow models. Storing, capturing, and sharing experience also ensures that this valuable knowledge is no longer bound to a particular situation or employee.

Another well-established application area is the in silico analysis of large data sets employing so-called scientific workflows in fields such as life science, geophysics, climate research, or bioinformatics [36, 238, 79]. Further, scientific workflows can also be applied in health science, for example, to detect diseases (e.g., [239]). A scientific workflow aims at providing an abstract model consisting of components, while the latter actually perform the desired computations. The importance of sharing knowledge on scientific workflow modeling is already reflected by the fact that large repositories of scientific workflows are publicly available (e.g., http://www.myexperiment.org [77] or http://www.mygrid.org.uk [238]). However, these scientists are commonly
novices in workflow modeling [36]. Thus, workflows initially seem rather complex than helpful to those users, who then prefer to write the required scripts on their own. This is, for example, because they do not know which components to use and how to align them. Cohen-Boulakia and Leser [36] highlighted that the manual reuse of workflows is consequently hampered, which is one reason why scientific workflows have so far not found acceptance by their particular users. Thus, workflow modeling assistance including retrieval and adaptation can facilitate a broader reuse of workflows, since it eases the finding of a solution suited to the particular needs. Further, reusing those workflows eases the comparison of results of various researchers. Moreover, the reutilization of processes or process fragments in new data analyses is increased, which helps scientists to benefit from the experience of other scientists.

In the medical domain [176, 52, 90], workflows are used, for example, to represent the treatment of a patient during hospitalization. In certain situations, the medical treatment process needs to be tailored to the particular patient, e.g., due to drug incompatibility. A single physician handles many patients and findings per patient a day [176], which makes adapting those processes to the particular patient an elaborate task. Furthermore, these processes involve numerous organizational units or different hospitals or health care providers, which further increases their complexity [52]. The discovery of new drugs, medical treatments, or devices, as well as changes in hospital policies, also enforces a constant adaptation of the treatment processes [90]. However, considering all possible situations in a single process is not feasible. Recapitulating, adaptations are frequently required. Adaptations in treatment processes can be time-consuming, because alternative solutions, if not already known to the physician, need to be acquired by reading literature or discussing with colleagues. Furthermore, side-effects of and dependencies on the medical treatment have to be considered. Adaptation of these processes is required during run-time, to be able to handle newly arising information or unexpected situations in a flexible manner, but also during the modeling stage of the workflow itself. In this regard, modeling assistance for medical treatment processes could be beneficial, as it helps to recognize that certain medical treatments or diagnostic tests are not appropriate for the particular patient and proposes alternatives. Improving clinical processes is important, as it can shorten the length of stay, reduce the number of procedures, and decrease the number of complications [52]. Workflow modeling support is a potential candidate to achieve such improvements.

Social workflows [80] support tasks in everyday life such as planning a trip, amateur car repair, or moving to another city. These processes usually involve
the access to online services and also various participants (professionals or friends), which is organized by these workflows. Similar to scientific workflow users, social workflow users are no experts in workflow modeling. As already sketched, such novices heavily rely on reuse by retrieval and adaptation. Modeling assistance would consequently significantly support these users in creating new processes or adapting processes of other participants to suit their needs, thereby accessing the valuable experience of other social workflow modelers.

In flow-based programming [165], which links several small sub-components with each other in order to construct an entire program, workflow modeling support could assist programming by suggesting a certain linkage of sub-components. Flow-based programming already tries to ease programming and to reuse knowledge (reuse of sub-components), which reduces the error-proneness of programming code as well as programming time. Supporting the composition of such sub-components would be the next natural step, further improving these issues by reusing the experience gained in linking those sub-components.

Cooking recipes can also be represented by workflows [222]. Thus, a step-by-step guidance for the preparation of a particular dish can be given. Traditional cooking websites usually provide a search function for recipes regarding the name, contained ingredients, and possibly food categories. In this regard, the presented approach could be beneficial in three respects: First, it supports an extended search for workflows such that, besides desired ingredients, also undesired ingredients, desired and undesired preparation steps as well as their relationships, or diets can be considered. Preferences on the desired recipe can consequently be defined in more detail. Here, not only exact matches of desired ingredients are regarded, but similar alternatives are suggested in case the defined set of preferences is not found. Second, it could be used to adapt a certain recipe exactly to particular preferences, which could help in finding new variations of a given recipe. This is highly beneficial in case of a missing ingredient or cooking tool, or if a friend with a food allergy is visiting spontaneously. Finally, the approaches could be used to create new recipes by combining parts of other recipes, either for discovery purposes or for creating recipes based on the ingredients and cooking tools currently available in the kitchen.
2.6.2 Taxonomy of Workflow Modeling Support Approaches

Workflow modeling assistance can be facilitated by various approaches, as illustrated in the taxonomy of Figure 2.26 (partially based on [60]). Acquiring and storing assists workflow modeling, since it aims at increasing the repository and consequently helps users to more likely find their desired workflow by providing more workflow modeling variants. However, as the workflow repository grows, search methods are required to further support users. This is usually enabled by similarity-based retrieval or querying [60]. Alternative methods have been suggested to provide navigation support through the repository. If the search for workflows is not sufficient because the found processes hardly match the requirements, automatic adaptation of workflows can be a means to support workflow modelers. Furthermore, the quality of workflows is of high importance and can partially be ensured by several revision approaches that assist during the modeling of workflows. Finally, workflow modeling can be supported by the illustration of the particular workflow such that only the required and helpful information according to the competence and skills of the user is presented.
Figure 2.26: Workflow modeling support taxonomy
All the previously introduced ways of modeling support can address various parts of the workflow modeling process. Thus, they usually focus either on the build-time (workflow model) or the run-time (workflow instance) of the workflows. Furthermore, various entities of workflows can be addressed, for example, the entire workflow, workflow fragments or components, single workflow elements, or the linkage of control-flow elements, data-flow elements, or components. Additionally, the approaches could also be classified by the
demand for user interaction. Based on the introduced taxonomy (see Fig. 2.26), related work is now illustrated, including general as well as (PO-)CBR-specific approaches for workflow modeling support.

Acquire/Store

As depicted, acquiring workflow models automatically can increase the knowledge stored, thus providing enhanced modeling support. This is also important for PO-CBR, since a sufficiently defined case base is required for appropriate problem-solving capabilities (see the knowledge containers in Sect. 2.4). This can be achieved, for example, by extracting process models automatically from textual process descriptions [222, 62]. Workflow mining (also referred to as workflow discovery or process mining) [253, 98, 225] is another related approach. It aims at using execution traces of processes in order to construct a related workflow model, thus generating process models automatically from execution log files, which supports the modeling of workflows. In this regard, it identifies the way employees are actually performing their work, which could be used to trigger business process reengineering tasks on the underlying business process. Furthermore, the management of change logs resulting from workflow adaptations has been discussed [203], which could be used for the purpose of reusing adaptations or for the analysis of conflicting workflow executions.

Search

Numerous methods have been presented that enable a search for the best-matching workflow [60, 139]. This becomes important especially as the workflow repository grows. The basic search methods are based either on similarities or on queries [60]. Similarity measures can be used to identify the best-matching workflow or workflow fragment, which can consequently provide a means for the auto-completion of partially modeled workflows. Furthermore, this similarity search can be used to provide processes similar to the current one, thus supporting the identification of alternative processes. Common workflow similarity measures rely, for example, on graph matching algorithms or the graph edit distance. Furthermore, various query languages for workflow search have been presented, which are based on SQL-like statements or the definition of desired and undesired workflow elements. This enables workflow models or workflow fragments with particular properties, i.e., tasks and their relationships, to be retrieved. Both workflow similarity and workflow
querying are useful for workflow modeling, since they support the reuse of appropriate workflow models or workflow parts. In this regard, they also provide a navigation through the workflow repository, which can be highly supportive for the user. This becomes essential especially when the workflow repository grows. Navigation could also be eased by clustering, categories, tags, or search trees [24, 162, 240, 102]. Some selected approaches for modeling support will be illustrated below, whereas similarity measures and query languages will be discussed in Chapter 4 in more detail. An approach to modeling support for scientific workflows was presented by Leake and Kendall-Morwick [126]. During the modeling of a workflow, services are suggested to the user. The approach considers the current state of the workflow and previously performed workflows. This simplifies the modeling process by completion support, as searching for the desired service can be avoided. Instead of starting to model, the user could also define the inputs or outputs of the workflow or workflow part they are currently modeling [46]. Based on this information, the system can propose matching workflows or workflow parts. If the found workflows or workflow parts do not match, automatic adaptation is attempted, for example, by adding an additional service to match the set of inputs and outputs. Finding best-matching workflows or workflow parts could, as already sketched, be achieved by tags and classifications. Using tags, the user can search for similar processes or process fragments and can further be supported in completion by suggesting process models or process fragments whose tags are similar to the ones inferred from the currently modeled workflow [102]. This approach was further enhanced by considering the relationships between workflow modelers with regard to their modeled workflow parts, aiming at supporting the user during the selection of suitable workflow fragments [119], i.e., similar modelers have similar process usage patterns.
Adaptation
For workflow models, adaptation can be a means to create new workflows or to construct variants of a current workflow model. Minor et al. have investigated the adaptation of the control-flow of workflows based on adaptation cases [152, 153] (see Sect. 2.4) in the field of PO-CBR. A workflow adaptation case is represented by an original workflow, the workflow solution after modification, referred to as target workflow, and a set of workflow fragments describing which workflow fragments have to be deleted or added in order to
transform the original workflow into the corresponding target workflow. Based on this, the authors studied learning adaptation cases automatically by investigating the differences between workflow revisions of scientific workflows [155]. Another approach for the adaptation of workflow models has been introduced by Kim et al. [114]. Here, a query is represented by a kind of partial workflow, in which tasks can be weighted according to their importance. Based on this, CBR is employed for the retrieval of the most similar workflow model, mainly based on the similarity of terms between the query workflow and the retrieved model. A vocabulary of terms helps to adapt the workflow such that it more likely matches the given query, i.e., task terms are replaced if they are defined as similar in the vocabulary. Finally, workflows are stored within the case base. In contrast to the approach presented in this work, no structural changes are performed on the retrieved model (a kind of generalization approach) and adaptation knowledge acquisition is an entirely manual task. Automatic workflow support during run-time is important whenever situations change unexpectedly, so that the previously defined workflow model cannot be enacted as designated. For run-time support, CBRFlow [262] initiates a dialog with the user whenever changes to the running workflow instance have to be performed. This dialog captures the problem of why the predefined workflow is not applicable, as well as a corresponding solution resulting from manual workflow adaptation. If similar situations arise, this workflow change can be retrieved and reapplied. Workflow modelers finally have the possibility to abstract frequent changes into ECA (event-condition-action) rules. Based on these rules, future exceptions can be handled autonomously. ECA rules are also employed by the workflow management system AgentWork [176]. ECA rules in general are triggered on defined exceptions. Then, necessary workflow adaptations are automatically executed by identifying those workflow parts that are affected by a failure. The workflow is then modified by adding or removing tasks and adjusting the data-flow correspondingly. In summary, it can be stated that, in contrast to the search for workflow models, little research on workflow adaptation exists so far. In particular, workflow adaptation for the purpose of automated workflow construction has hardly been addressed and currently lacks a systematic methodology. This thesis addresses this research gap by investigating the field of PO-CBR in more detail, especially focusing on the development of new workflow adaptation algorithms.
Revise
As previously introduced, workflow modeling is an error-prone task. Consequently, various approaches (see also Sect. 2.2.5) have been presented that enable the verification of the workflow syntax and ensure that the workflow model can terminate (e.g., [211, 116, 245, 243]). Thus, ensuring the correctness of the workflow is already an initial means of workflow modeling support. The Composition Analysis Tool (CAT) is an example of an intelligent assistant tool for interactive workflow modeling [115]. It analyzes the currently modeled workflow and generates error messages, aiming at reducing the error-proneness of the workflow, as well as suggestions on possible completions in order to support workflow modeling. Further, the optimization of workflow models is also important, since if new workflows are created based on other, possibly inadequate models, the created workflows will also be of low quality. This initially requires a Business Process Analysis aiming at identifying deficiencies in the process models, for example, by simulation. These deficiencies are then reduced by Business Process Optimization [256]. Both Business Process Analysis and Business Process Optimization are broad fields with many directions. Also, the structure of the workflow repository requires revision, since it affects search and adaptation activities. Thus, the management of process model variants (e.g., [47]), the merging of process models, or the refactoring of the repository are also relevant fields [60].
Illustration
Illustration of the workflow or the workflow repository can also support the user significantly. Hierarchical workflow representations or business process model abstraction can be helpful during workflow modeling [268, 188]. In this regard, an abstract model reduces the level of detail, for example, by replacing workflow fragments with a single abstract task. In order to perform abstraction, the abstraction criterion has to be known, which depends on the current user. The aim of abstraction is to remove information unimportant to the user and highlight the details of interest. In this regard, the number of occurrences of a certain task label, the execution time of a task, the costs of a task, or fragments of workflows can be of interest. Combining this with an appropriate level of abstraction may then support various stakeholders in modeling their processes. Thus, the user can flexibly choose the abstraction level that is in alignment with his or her skills and expertise. The flexibility
of abstraction levels is not only helpful during modeling itself, but also in identifying workflows that already (partially) match the desired needs, such that the new workflow does not have to be modeled from scratch. Further approaches could support the user, for example, by explicitly enabling the filtering of certain information or the viewing of the entire repository or a particular workflow at various levels of detail [100]. Following the assumption that knowledge of the process is spread across various stakeholders, processes should rather be modeled in fragments [65]. Each stakeholder models the process parts addressing his or her competencies and skills, which reduces the competencies required regarding the overall business process, thus providing modeling support. The business fragments are finally composed into an entire process model. Furthermore, fragments can be defined during run-time of a process, i.e., to support late binding when the concrete procedure is not known during build-time, thus supporting a flexible execution.
Hybrid Approaches
The presented taxonomy is not a strict classification. Various hybrid approaches or approaches covering a larger scale of supportive techniques exist. For example, Business Process Intelligence [45] covers acquisition/storage, revision, and illustration activities from the taxonomy in Figure 2.26, addressing also business process modeling and redesign. In a concrete application, process mining could track the actions of a user within an information system [111]. If another user subsequently aims at performing the same actions again, the previously mined workflow could be retrieved by a similarity measure, possibly manually adapted to the particular needs, and finally be enacted automatically, i.e., performing the remaining actions within the information system autonomously. A broad scale of hybrid approaches is conceivable.
2.6.3 Related Fields in Case-Based Reasoning
In this section, CBR-related fields will be sketched in more detail in order to demonstrate similarities and dissimilarities to workflow modeling support by means of PO-CBR. Since the modeling assistance recommends ways of modeling workflows, the approach is somewhat related to case-based recommendation [231]. However, recommendations are usually made based on similar preferences of other users or based on previous preferences of the same user. In the
presented scenario, the aim is not to identify recommendations based on automatically gathered preferences. Furthermore, since PO-CBR is a synthetic task, it is also related to configuration and design [200][p. 53-70]. In configuration, a complex object (e.g., a technical device) is constructed by the specification of several parameters and sub-components, typically involving constraints on valid configurations. Design addresses the creation of a particular object, for example, as an architect designs a house. The quality of a design is mainly affected by the ensemble of its various features. The creation of processes differs from design and configuration insofar as temporal relationships between activities especially need to be considered. Next, a more detailed view on Conversational Case-Based Reasoning and Case-Based Planning will be given, since they are the fields most closely related to the presented approach that have already partially addressed workflow modeling support.
Conversational Case-Based Reasoning
In traditional CBR applications, the user needs to fully specify the problem situation. Thus, he or she already needs to define all restrictions and requirements on the expected solution. Conversational Case-Based Reasoning (CCBR) [4] overcomes this complication by presenting questions on the given problem that the user can answer easily. In this regard, CCBR may provide a means for the modeling support of workflows, as it eases the navigation through the workflow repository. However, in the field of PO-CBR, only few approaches exist (e.g., [262, 272]). As an example, Zeyen et al. [272] investigated CCBR on workflow cases for the purpose of modeling support by search. Though adaptations are not considered, the user is supported by questions on workflow cases in the case base that help the user in finding the most suitable workflow. Questions are automatically derived from the case base and comprise single workflow elements, such as activities and data, as well as workflow fragments. The questions are ranked and posed to the user, which helps him or her to identify suitable workflow models.
Case-Based Planning
Case-Based Planning (CBP) [89, 48, 178] is a CBR approach for reasoning on plans. Plans are similar to workflows, since they represent a process that describes how to achieve a particular goal under given circumstances. Cox et
al. see planning as a “search problem for finding a sequence of actions that can transform an initial state of the world into a given goal state” [48][p. 1]. According to these authors, plans can, for example, be represented as graphs where nodes are states and actions are edges linking the nodes, transferring one state into another. Recapitulating, plans are similar to workflows, yet fundamentally different, since they are based on different assumptions. Traditionally, plans are expressed by states and actions with explicit preconditions determining their applicability and postconditions defining the resulting states (effects of the action) (see, e.g., the STRIPS planner [68]). Thus, a plan can be constructed automatically, but the planning process usually requires a complete domain model. In contrast, the pre- and postconditions of task executions in a traditional workflow domain are often not clearly determinable. This usually requires a manual modeling of the desired process or at least a manual revision of the constructed process in order to guarantee the correctness of the workflow. Thus, PO-CBR differs significantly from CBP. However, in planning as well as in PO-CBR, cooking is occasionally applied as an example domain (e.g., [89, 222, 62]). Case-Based Planning differs from the typical planning process in that, instead of constructing a new plan from scratch, existing plans are modified or adapted to suit new situations [48, 178]. Thus, it deals with the reuse of experience in the planning process, considering failures and successes from previous planning episodes [89][p. 1-29]. Most basically, plans are constructed by the modification or adaptation of existing plans. This is in particular related to the construction of workflows by retrieval and adaptation as presented in this thesis. During the construction of a plan, not all required information might have been considered. This results in failures of plans during run-time, which requires a repair of the plan. Thus, plan repair is related to the adaptation of workflow instances. Workflow adaptations during run-time can also be required if not all occurring exceptions have been considered during the modeling of the workflow. In general, the adaptation of plans can also be achieved by abstraction and generalization of plans [31] or hierarchical plan representations [177]. However, since plans can be constructed fully automatically, CBP essentially aims at increasing the performance of the planning process. Despite the difference between workflow and plan representations, CBP was recently proposed to support the reuse of workflows. Madhusudan et al. [134] presented an approach using CBP for supporting the modeling of workflows by reusing and adapting process models stored in a repository. In this regard, their approach is highly related to the presented thesis. In their work, the composition of workflows with regard to the control-flow of
the workflow was considered. However, this approach requires the definition of states and actions for the workflow models, which is a highly elaborate task. Furthermore, Homburg et al. [101] also presented an approach to the construction of workflows by means of planning. However, this approach again requires the specification of a complete domain model. As opposed to these approaches, the workflow modeling assistance presented in this thesis focuses on reducing the initial effort for setting up the modeling support functionalities. Hence, the workflow construction is based on a minimum of domain-specific knowledge, and the required adaptation knowledge is learned fully automatically from the case base.
2.6.4 Conclusions
This section sketched several application visions for workflow modeling assistance by PO-CBR in general, showing its importance and potential in various domains. Next, a taxonomy of approaches to workflow modeling assistance was illustrated. Based on this, related work was discussed, which included workflow reuse by searching, querying, and navigation support within a workflow repository. Furthermore, approaches were described that support workflow modelers by appropriate workflow illustrations, by automated verification of the workflow, and by workflow acquisition. Moreover, the adaptation of workflows during run-time or build-time was discussed. Overall, this section revealed several basic usage scenarios for workflow modeling assistance by retrieval and adaptation, which are also partially sketched by Markovic and Pereira [139] (who describe business process reuse in general): the ability to search for workflows or workflow fragments, the auto-completion of a partially modeled workflow, and finally the automated construction of workflows. However, this section further showed that PO-CBR has hardly been investigated so far. This particularly applies to workflow adaptation for the purpose of automated workflow construction. Consequently, this thesis will provide a first detailed contribution to workflow modeling support by means of PO-CBR. It thereby aims at closing an important research gap in workflow modeling assistance in general and in the under-researched field of PO-CBR in particular. Additionally, the presented PO-CBR approach was delimited from other fields in Case-Based Reasoning. Here, especially Conversational Case-Based Reasoning and Case-Based Planning were discussed. While Conversational
Case-Based Reasoning supports navigating through the workflow repository by asking the user questions, Case-Based Planning creates plans automatically by reusing previously created plans. However, as pointed out, plans differ from workflows, and the use of Case-Based Planning for workflows usually involves an elaborate acquisition of domain knowledge. Please note that more related research will be discussed in the corresponding chapters.
3 Domain & Notation
This chapter first introduces cooking workflows, which will serve as a running example throughout this thesis and which will also be used to evaluate the presented workflow modeling assistance. Based on a business workflow representing the process of a business trip, a comparison will demonstrate that there is a high structural analogy between business workflows and workflows representing cooking recipes, such that cooking workflows can mostly be considered a representative example. Based on this, the formal notations on workflows used in the following chapters are introduced, which result from various publications related to the presented workflow modeling assistance [168, 171, 173, 170]. This chapter further describes the starting point for the presented work. In this regard, ontologies are introduced, which are used to enrich information in workflows, resulting in so-called semantic workflows [20]. Furthermore, a corresponding similarity measure [20] that assesses the similarity between two semantic workflows is described.
3.1 Business Workflows and Cooking Workflows
The workflow graph representation illustrated in Section 2.2.3 showed that workflows usually consist of a set of activities, also referred to as tasks. These activities are ordered by so-called control-flow patterns, enabling the definition of subsequent executions (sequences), parallel executions (AND block), repeated executions (LOOP block), and exclusive executions (XOR block) of the corresponding activities. This order defines the control-flow of the workflow. Furthermore, the data-flow specifies which data is consumed or produced by the workflow tasks. Based on this rough definition, an example workflow in the business domain is illustrated in Figure 3.1 [23]. The workflow describes the process for a business trip, comprising several tasks such as requesting trip approval, booking of the accommodation, and the final trip expense accounting subsequent to the actual business trip. For this purpose, several documents are shared between those tasks, containing business trip data, booking
information, or travel expenses. Based on the trip approval notification, the illustrated workflow only executes one specific task sequence by use of an XOR-split/join control-flow pattern, i.e., either a notification of rejection is submitted or the preparation of the particular business trip is initiated. The process then finishes with a report that is created based on the particular travel information (in case of trip approval) or the trip rejection information.
Figure 3.1: Example of a block-oriented business workflow by Bergmann and Müller [23]. The workflow comprises the tasks gather trip information, request for approval, send notification of rejection, book hotel, book travel, request cash advance, business trip, expense accounting, travel expense payout, and write report, sharing documents such as business trip data, trip approval, booking information, cash advance, travel expenses, and report; the legend distinguishes data-flow edges, control-flow edges, XOR control-flow nodes, document/data nodes, and business activity/task nodes.
Cooking instructions of recipes can also be formalized as workflows. In this regard, the cooking workflow describes the preparation steps and the ingredients required to prepare a particular dish [224]. Here, the task nodes are used to describe preparation steps, while the data nodes are used to represent ingredients that are processed during the preparation of the dish. An example cooking workflow representing a sandwich recipe is illustrated in Figure 3.2. Here, basically a mayonnaise-based sauce is produced and toppings such as salami and cheese are placed on the baguette slices. An AND-split/join control-flow pattern defines that preparation steps can be executed in parallel, i.e., the sauce can be prepared by a chef, while another one slices the baguette for preparing the placement of toppings and the sauce.
Figure 3.2: Example of a block-oriented cooking workflow. A sandwich dish is prepared from the ingredients mayonnaise, italian seasoning, mustard, baguette, salami, cucumber, and cheese by the preparation steps grate, mix, add, slice, spread, layer, sprinkle, and bake; an AND control-flow block encloses the parallel preparation of the sauce and the baguette. The legend distinguishes data-flow edges, control-flow edges, AND control-flow nodes, ingredient/data nodes, and preparation step/task nodes.
Comparing both workflow domains, business workflows represent processes of information, whereas cooking workflows describe a production process. More precisely, business workflows share data, for example, in the form of documents, to perform certain business activities. Each of these activities consumes a set of data, which is modified or used to produce new data or documents for subsequent activities. The original documents still remain accessible. After the business workflow has been performed, a final document is produced. Likewise, a cooking workflow consumes certain ingredients in each preparation step in order to modify those ingredients or to create new ones, which finally results in a specific product, i.e., a particular dish. By processing an ingredient, however, its original form is no longer accessible. For example, after chopping onions, the onion tuber does not exist anymore. Although these workflow domains show differences, their structural properties are highly similar, since the workflows in both domains produce a final data node by executing sequences of tasks that consume and produce certain data objects, respectively. Consequently, cooking workflows can mostly (domain-specific characteristics may still differ) be considered a representative example also for other domains with control-flow oriented workflow representations (see Sect. 2.2.3), such as business processes. This is mainly because the general structure of control-flow oriented workflows is usually highly similar, since they are based on the typical workflow patterns (see Sect. 2.2.3). Furthermore, cooking has served as a research domain in many applications, such as CHEF [88], Taaable [9], or WISE [226], and is also a relevant field for recipe websites and novel cooking applications such as IBM Chef Watson (https://www.ibmchefwatson.com). Consequently, the use of this domain in research is widely accepted, since it can serve as a comprehensible illustration of various approaches. Thus, in the following sections, the cooking domain is chosen to demonstrate the developed workflow modeling assistance.
3.2 Workflow Notation
In this section, formal notations of commonly used graph-based workflows are defined and illustrated by means of the previously introduced cooking domain. Furthermore, various properties addressing the quality of workflows are presented and a formal notation for workflow sub-components is defined. A graph-based workflow is essentially considered a directed graph with different types of nodes and edges, as determined in Definition 3.

Definition 3. A workflow is a directed labeled graph $W = (N, E)$ where $N$ is a set of nodes and $E \subseteq N \times N$ is a set of edges. Nodes $N = N^D \cup N^T \cup N^C$ can be data nodes $N^D$, task nodes $N^T$, or control-flow nodes $N^C$. In addition, $N^S = N^T \cup N^C$ is referred to as the set of sequence nodes, i.e., task nodes plus control-flow nodes. Edges $E = E^C \cup E^D$ can be control-flow edges $E^C \subseteq N^S \times N^S$, which define the order of the sequence nodes, or data-flow edges $E^D \subseteq (N^D \times N^S) \cup (N^S \times N^D)$, which define how data is shared between the tasks. The labels of the nodes are assigned by a function $S : N \rightarrow \Sigma$, where $\Sigma$ denotes the set of possible labels.

Based on this definition, further notations on the control-flow and the data-flow of workflows are introduced, respectively. In a workflow $W = (N, E)$, the control-flow edges $E^C$ define the execution order of the particular sequence nodes $N^S$. Hence, in the following, $s_1 < s_2$ defines a transitive relation expressing that sequence node $s_1 \in N^S$ is executed prior to sequence node $s_2 \in N^S$. Furthermore, $n \in\, ]s_1, s_2[$ denotes that a sequence node $n \in N^S$ is executed after $s_1$ but prior to $s_2$, i.e., $s_1 < n < s_2$. This means that for the workflow given in Figure 3.2, sprinkle has to be executed prior to the bake activity, i.e., sprinkle < bake, and layer prior to
the sprinkle activity, i.e., layer < sprinkle < bake. Consequently, for the sprinkle activity, sprinkle $\in\, ]$layer, bake$[$ holds.
Considering the data-flow edges, they define the data nodes produced ($t^{D}_{out}$) as well as the data nodes consumed ($t^{D}_{in}$) by a task $t \in N^T$, i.e., $t^{D}_{out} = \{d \in N^D \mid (t, d) \in E^D\}$ and $t^{D}_{in} = \{d \in N^D \mid (d, t) \in E^D\}$. These are also referred to as input (i.e., $t^{D}_{in}$) and output data nodes (i.e., $t^{D}_{out}$), respectively. Data nodes that are defined as an output but not as an input of the same task (they are produced but not consumed) are so-called creator data nodes $N^{D*}$, since they are newly created after execution of the particular task, i.e., $N^{D*} = \{d \in N^D \mid \exists t \in N^T : d \in t^{D}_{out} \wedge d \notin t^{D}_{in}\}$. The related tasks are accordingly referred to as creator task nodes $N^{T*} = \{t \in N^T \mid \exists d \in N^D : d \in t^{D}_{out} \wedge d \notin t^{D}_{in}\}$. While such creator tasks consequently create new data nodes, the remaining processing task nodes $N^T \setminus N^{T*}$ have at least one identical input and output data node, which is processed during task execution. As an example, for the workflow given in Figure 3.2, the task spread is a creator task with creator data node sandwich dish, describing that spreading sauce on the baguette creates the sandwich dish. More precisely, the input or consumed data nodes for this task are sauce and baguette and the output or produced data node is the sandwich dish. The preparation step mix, in contrast, is a processing task, which seasons the mayonnaise by consuming mayonnaise and italian seasoning as input and producing a processed mayonnaise as output.
Additionally, two data nodes $d_1, d_2 \in N^D$ are data-flow connected, $d_1 \triangleright d_2$, if a task exists such that $d_1$ is consumed and $d_2$ is produced by the same task. Furthermore, let $d_1 \triangleright^{+} d_2$ define the corresponding transitive data-flow connectedness, as defined below.

$d_1 \triangleright d_2$, iff $\exists t \in N^T : (d_1 \in t^{D}_{in} \wedge d_2 \in t^{D}_{out})$ (3.1)

$d_1 \triangleright^{+} d_2$, iff $d_1 \triangleright d_2 \vee \exists d \in N^D : (d_1 \triangleright d \wedge d \triangleright^{+} d_2)$ (3.2)

For instance, the data nodes sauce and sandwich dish in Figure 3.2 are data-flow connected, i.e., sauce $\triangleright$ sandwich dish. The nodes mustard and sandwich dish are transitively data-flow connected, i.e., mustard $\triangleright^{+}$ sandwich dish. Moreover, let $t_1 \triangleright t_2$ describe that two tasks $t_1, t_2 \in N^T$ are data-flow connected and let $t_1 \triangleright^{+} t_2$ denote the corresponding transitive relationship, as defined below.

$t_1 \triangleright t_2$, iff $t_1 < t_2 \wedge \exists d \in N^D : ((t_1, d) \in E^D \wedge (d, t_2) \in E^D)$ (3.3)

$t_1 \triangleright^{+} t_2$, iff $t_1 \triangleright t_2 \vee \exists t \in N^T : (t_1 \triangleright t \wedge t \triangleright^{+} t_2)$ (3.4)

Furthermore, let $t_1 \stackrel{d}{\triangleright} t_2$ describe that two tasks $t_1, t_2 \in N^T$ are data-flow connected via a particular data node $d$ as follows:

$t_1 \stackrel{d}{\triangleright} t_2$, iff $t_1 < t_2 \wedge ((t_1, d) \in E^D \wedge (d, t_2) \in E^D)$ (3.5)

In the given example workflow (see Fig. 3.2), the task sprinkle is data-flow connected to the task bake via the data node sandwich dish, i.e., sprinkle $\stackrel{\text{sandwich dish}}{\triangleright}$ bake and sprinkle $\triangleright$ bake. Furthermore, the task spread is transitively data-flow connected to the task bake, i.e., spread $\triangleright^{+}$ bake.
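To make these relations concrete, the following minimal Python sketch represents a workflow by its task nodes, data nodes, and data-flow edges and computes the data-flow connectedness relations (3.1) and (3.2). All class and function names are illustrative choices, not part of the formal notation.

```python
# Minimal sketch of the workflow graph (Definition 3, data-flow part only)
# and of data-flow connectedness (Formulas 3.1 and 3.2); names are illustrative.
class Workflow:
    def __init__(self, tasks, data, dataflow):
        self.tasks = set(tasks)        # task nodes N^T
        self.data = set(data)          # data nodes N^D
        self.dataflow = set(dataflow)  # data-flow edges E^D as (source, target)

    def consumed(self, t):             # t^D_in: data nodes consumed by task t
        return {d for d in self.data if (d, t) in self.dataflow}

    def produced(self, t):             # t^D_out: data nodes produced by task t
        return {d for d in self.data if (t, d) in self.dataflow}

def connected(w, d1, d2):
    """d1 |> d2: some task consumes d1 and produces d2 (Formula 3.1)."""
    return any(d1 in w.consumed(t) and d2 in w.produced(t) for t in w.tasks)

def transitively_connected(w, d1, d2):
    """d1 |>+ d2: reachability over the |> relation (Formula 3.2)."""
    frontier, visited = [d1], set()
    while frontier:
        d = frontier.pop()
        for dn in w.data - visited:
            if connected(w, d, dn):
                if dn == d2:
                    return True
                visited.add(dn)
                frontier.append(dn)
    return False

# Excerpt of the sandwich workflow of Figure 3.2:
w = Workflow(tasks={"spread", "bake"},
             data={"sauce", "baguette", "sandwich dish"},
             dataflow={("sauce", "spread"), ("baguette", "spread"),
                       ("spread", "sandwich dish"),
                       ("sandwich dish", "bake"), ("bake", "sandwich dish")})
assert transitively_connected(w, "sauce", "sandwich dish")
```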
3.2.1 Workflow Properties
The workflow definition introduced previously does not restrict the use of control-flow edges. This can reduce the understandability of the modeled workflow significantly and impede its validation with regard to syntactic correctness. One approach to tackle these issues is to construct workflows from permitted blocks of workflow elements, thereby limiting valid control-flow compositions. Such workflows are referred to as block-oriented workflows and are defined as follows:

Definition 4. A block-oriented workflow $W = (N, E)$ is a workflow with a block-oriented graph structure according to the following rules:
a) A block element is a workflow sub-graph of the graph $(N \setminus N^D, E^C)$ which contains either:
• a task node
• a sequence of block elements
• a LOOP block containing an opening loop node, one block element, and a closing loop node
• an XOR/AND block containing an opening XOR/AND node, two branches each containing a block element, and a matching closing XOR/AND node (for the sake of simplicity, only two branches are contained in an XOR/AND block, as multiple branches can easily be constructed by nesting further XOR/AND blocks)
• an XOR block containing an opening XOR node, one branch with a block element, an empty branch, and a closing XOR node
b) The workflow $W$ consists of a single root block element.

Workflow block elements, as exemplarily illustrated in Figure 3.3 as dashed rectangles, are accordingly used as bricks to construct workflows. These bricks include sequences of blocks, looping sequences (LOOP), parallel sequences (AND), and exclusive sequences (XOR). An additional XOR block with only one sequence enables the definition of an optional path that can be skipped during run-time of the workflow. According to the definition, these bricks may be nested but not interleaved. Consequently, a block-oriented workflow has exactly one start node (no ingoing control-flow edges) and one end node (no outgoing control-flow edges). In literature, such process graphs are also referred to as hammocks [260, 273].
Figure 3.3: Workflow block elements (task block, block element sequence, LOOP block, AND block, and the two XOR block variants)
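As an illustration of how this block grammar can be represented, the following sketch models block elements as a small recursive data type in Python (3.10+); the class names and the example instance are illustrative assumptions, not notation from the thesis.

```python
# Illustrative sketch of the block grammar in Definition 4: a workflow is a
# single root block element, and blocks nest but never interleave.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Task:                 # a single task node
    label: str

@dataclass
class Sequence:             # a sequence of block elements
    elements: List["Block"]

@dataclass
class Loop:                 # LOOP-split + one body block + LOOP-join
    body: "Block"

@dataclass
class Parallel:             # AND-split + two branches + AND-join
    left: "Block"
    right: "Block"

@dataclass
class Exclusive:            # XOR-split + two branches + XOR-join;
    left: "Block"           # right=None models the optional (skippable) path
    right: Optional["Block"] = None

Block = Task | Sequence | Loop | Parallel | Exclusive

# The AND block of Figure 3.2, sketched with this grammar:
sauce_branch = Sequence([Task("grate"), Task("mix"), Task("add")])
bread_branch = Sequence([Task("slice"), Task("slice")])
root = Sequence([Parallel(sauce_branch, bread_branch),
                 Task("spread"), Task("add"), Task("layer"),
                 Task("sprinkle"), Task("bake")])
```

Because the grammar only permits nesting of whole blocks, a workflow built this way cannot contain interleaved or unmatched split/join nodes by construction.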
The workflow shown in Figure 3.2 represents such a block-oriented workflow, since it basically consists of a single AND block and various task sequences. As already stated, this way of construction restricts the usage of control-flow edges with regard to valid block elements. This follows the correctness-by-construction principle [193, 53], which ensures, as its name implies, the syntactic correctness of a workflow during construction (see Sect. 2.2.5). More precisely, certain properties, for example, soundness or termination of the workflow, do not have to be checked. Thus, the validation of the syntax becomes superfluous, which is a particular advantage if workflows are adapted, since valid modifications can mostly be obtained without effort. Though block-orientation reduces the expressiveness of workflows compared to BPMN models (not all possible workflow patterns are supported, see [251]) in favor of syntactic correctness, it mostly guarantees compliance with various guidelines concerning pragmatic quality [148] (see Sect. 2.2.5) during workflow modeling. This means that block-oriented workflows are characterized by an increased comprehensibility and a reduced error rate. On the one hand, single start and end elements are ensured and workflows are structured such that each split matches a joining connector of the same type (see Sect. 2.2.3). On the other hand, OR blocks are not permitted in the previous definition, as, in contrast to XOR and AND blocks, they have a negative impact on the pragmatic quality according to the guidelines. Moreover, a block-oriented workflow comprises a lower number of input and output edges per element, as the blocks only consist of at most two sequences, which prevents arbitrary edges possibly resulting in many edges on a single task node. Please note, however, that in the case of multiple branches within a single control-flow block, several control-flow blocks must be nested, which introduces additional workflow elements into the workflow, possibly having a negative impact on the pragmatic quality. Finally, since the labels of the workflows are determined by a specified description language $\Sigma$, this enables the restriction of labels to certain naming conventions. Overall, block-orientation mostly ensures compliance with several guidelines on pragmatic workflow quality.
Recapitulating, the block-oriented workflow construction reduces complexity by limiting valid control-flow definitions, which increases the comprehensibility of the process model, reduces its error-proneness, and eases syntax validation. The data-flow, however, is thereby not restricted. Consequently, a terminology of consistent block-oriented workflows (here, consistency refers to something different than in logics) is introduced to extend the syntactic correctness also towards the use of data-flow edges. This terminology is based on Davenport's perspective of processes, who stated that “[...] a process is simply a structured, measured set of activities designed to produce a specific output [...]” [55]. In the following, these specific workflow outputs for a workflow $W$ are denoted as $W^O \subseteq N^D$. In the cooking domain, the specific output is the particular dish produced, i.e., sandwich dish in Figure 3.2. For a consistent workflow, it is required that each data node
must at least be transitively data-flow connected to this specific workflow output and, further, each task must consume and produce at least one data object. This can be formalized as follows:

Definition 5. A block-oriented workflow is consistent, iff each task has at least one input and one output data object, i.e., $\forall t \in N^T : t^{D}_{in} \neq \emptyset \wedge t^{D}_{out} \neq \emptyset$, and each data node is transitively data-flow connected to the specific output $W^O$, i.e., $\forall d \in (N^D \setminus W^O)\ \exists o \in W^O : d \triangleright^{+} o$.

Basically, such workflows are considered to be complete, denoting that all information required for a proper execution of the workflow is defined. Hence, for the cooking workflow in Figure 3.2, each ingredient must end up in the specific output $W^O \subseteq N^D$, i.e., the sandwich dish. Otherwise, the particular ingredient as well as the related tasks would have no effect on the final dish and would thus be superfluous. Consequently, this definition also prevents non-relevant activities by prohibiting preparation steps that do not process an ingredient or create a new one, i.e., at least one input and one output ingredient must be defined. In the following, consistent block-oriented workflows are considered syntactically valid with regard to control-flow as well as data-flow.
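A consistency check in the sense of Definition 5 can be sketched directly on top of the previous workflow sketch; the function and parameter names are illustrative.

```python
# Sketch of the consistency test of Definition 5, reusing the illustrative
# Workflow class and transitively_connected function from above.
def is_consistent(w, outputs):
    """outputs: the specific workflow outputs W^O, a subset of w.data."""
    # Every task must consume and produce at least one data node.
    if any(not w.consumed(t) or not w.produced(t) for t in w.tasks):
        return False
    # Every remaining data node must be transitively data-flow
    # connected to at least one specific output.
    return all(any(transitively_connected(w, d, o) for o in outputs)
               for d in w.data - set(outputs))
```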
3.2.2 Partial Workflows
Workflows can be organized into sub-components, representing a partial process of the entire workflow. These sub-components are usually determined by a given subset of related activities. Here, such workflow sub-components are considered as so-called partial workflows, formalized as follows:

Definition 6. For a subset of tasks $N^{T'} \subseteq N^T$, a partial workflow $W'$ of a block-oriented workflow $W = (N, E)$ is a block-oriented workflow $W' = (N', E' \cup E^{C'}_{+})$ with a subset of nodes $N' = N^{T'} \cup N^{C'} \cup N^{D'} \subseteq N$. $N^{D'} \subseteq N^D$ is defined as the set of data nodes that are linked to any task in $N^{T'}$, i.e., $N^{D'} = \{d \in N^D \mid \exists t \in N^{T'} : d \in t^{D}_{in} \vee d \in t^{D}_{out}\}$. $W'$ contains a subset of edges $E' = E \cap (N' \times N')$ connecting two nodes of $N'$, supplemented by a set $E^{C'}_{+}$ of additional control-flow edges that retain the execution order of the sequence nodes, i.e., $E^{C'}_{+} = \{(n_1, n_2) \in N^{S'} \times N^{S'} \mid n_1 < n_2 \wedge \nexists n \in N^{S'} : ((n_1, n) \in E^{C'} \vee (n, n_2) \in E^{C'} \vee n \in\, ]n_1, n_2[\,)\}$.

Control-flow blocks of the workflow are contained in the partial workflow if they do not violate the block-oriented workflow structure. Hence, each workflow block element within a control-flow block must not be empty, i.e., it must
contain at least one task or another valid child block element. Otherwise, the corresponding control-flow block is not an element of the partial workflow. The partial workflow may contain additional edges $E^{C'}_{+}$ in order to retain the execution order of the sequence nodes. This basically means that an additional edge is required if a sequence node $s_2 \in N^S$ that is located between two sequence nodes $s_1, s_3 \in N^S$ in the original workflow, i.e., $s_2 \in\, ]s_1, s_3[$, is not an element of the partial workflow, i.e., $s_2 \notin N^{S'}$.
Figure 3.4: Example of a partial workflow $W'$, comprising the tasks slice, add, layer, sprinkle, and bake together with the data nodes salami, cucumber, cheese, and sandwich dish
A partial workflow $W'$ for the original workflow $W$ given in Figure 3.2 is illustrated in Figure 3.4. This workflow was constructed based on the subset of tasks slice, add, layer, sprinkle, and bake. Here, an additional edge (depicted by the double-line arrow) is required to retain the execution order, since slice and add are not directly linked by the control-flow in the original workflow $W$. Furthermore, the AND block of $W$ is not contained in this partial workflow, as it does not comprise any related tasks. This sub-component construction ensures that partial workflows are again block-oriented workflows. Please note that partial workflows differ from so-called sub-workflows. Sub-workflows require that all intermediate sequence nodes (e.g., spread in the previous example) are also contained within the sub-workflow, i.e., no control-flow or task node can be skipped.
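For workflows without control-flow blocks, the construction of Definition 6 can be sketched as follows; the function name, the `before` order predicate, and the restriction to pure data-flow edges are simplifying assumptions.

```python
# Simplified sketch of partial-workflow construction (Definition 6) for
# workflows without control-flow nodes; `before(a, b)` is assumed to
# implement the transitive execution order a < b of the original workflow.
def partial_workflow(tasks_sub, dataflow, before):
    # N^D': data nodes consumed or produced by any chosen task
    data_sub = ({d for (d, t) in dataflow if t in tasks_sub} |
                {d for (t, d) in dataflow if t in tasks_sub})
    # E': induced data-flow edges touching a chosen task
    edges = {(a, b) for (a, b) in dataflow
             if a in tasks_sub or b in tasks_sub}
    # Control-flow edges retaining the execution order (cf. E^C'_+):
    # link t1 to t2 if no other chosen task lies strictly between them.
    order_edges = {(t1, t2) for t1 in tasks_sub for t2 in tasks_sub
                   if before(t1, t2)
                   and not any(before(t1, t) and before(t, t2)
                               for t in tasks_sub)}
    return tasks_sub | data_sub, edges | order_edges
```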
3.3 Workflow Ontology
The labels of task and data nodes used in workflows can be organized in ontologies. For this purpose, a taxonomy according to Definition 7 can be applied, arranging the labels into a tree structure. In the cooking domain, this means that a data taxonomy specifies the labels of ingredients and a task taxonomy specifies the labels of preparation steps, respectively.
Definition 7. A taxonomy $\psi$ is a tree of a partially ordered set of semantic terms $\Gamma = \{\gamma_0, \ldots, \gamma_n\}$, where $\gamma_i \sqsubset \gamma_j$ denotes that $\gamma_j$ is more general than $\gamma_i$ ($\gamma_i$ is more specific than $\gamma_j$). Further, $\gamma_i \sqsubseteq \gamma_j$ holds, iff $\gamma_i \sqsubset \gamma_j \vee \gamma_i = \gamma_j$. A taxonomy consists of a single most general term (root term) denoted by $\bar{\psi} \in \Gamma$, i.e., $\nexists \gamma' \in \Gamma : \bar{\psi} \sqsubset \gamma'$. The set of most specific terms (leaf terms in the tree) is denoted by $\underline{\psi} = \{\gamma \in \Gamma \mid \nexists \gamma' \in \Gamma : \gamma' \sqsubset \gamma\}$.
Figure 3.5: Example of a data taxonomy. The root term ingredients branches into vegetarian (comprising, e.g., side dish and vegetables, where side dish contains potatoes, rice, and noodles) and non vegetarian (comprising, e.g., seafood and meat, where meat contains beef, pork, chicken, and turkey); inner terms are annotated with similarity values, e.g., 0.01 for ingredients and 0.6 for meat.
The example taxonomy given in Figure 3.5 contains ingredient labels as semantic terms. According to the taxonomical structure, meat is a more general term compared to beef, i.e., beef $\sqsubset$ meat. Here, the most general term is ingredients. Potatoes, rice, noodles, beef, and pork, for example, denote the most specific terms or leaf terms, respectively. An inner node $\gamma$ represents a generalized term as a class of most specific terms, i.e., $\{\gamma' \in \underline{\psi} \mid \gamma' \sqsubseteq \gamma\}$. The generalized term vegetarian within the illustrated taxonomy represents a class of ingredients comprising, for example, potatoes, rice, and noodles. Based on this fundamental taxonomy definition, some further notations are introduced below.

Definition 8. $\Gamma^{\downarrow}(\gamma)$ defines the one-step specializations of term $\gamma$, i.e., $\Gamma^{\downarrow}(\gamma) = \{\gamma_x \in \Gamma \mid \gamma_x \sqsubset \gamma \wedge \nexists \gamma_y \in \Gamma : \gamma_x \sqsubset \gamma_y \sqsubset \gamma\}$. Likewise, $\Gamma^{\uparrow}(\gamma)$ defines the one-step generalization of term $\gamma$, i.e., $\Gamma^{\uparrow}(\gamma) = \gamma_x \in \Gamma$ such that $\gamma \in \Gamma^{\downarrow}(\gamma_x)$. Let further $LCA(\gamma_1, \gamma_2) \in \Gamma$ be the lowest common ancestor (most specific generalization) of $\gamma_1, \gamma_2 \in \Gamma$ in taxonomy $\psi$, i.e., $LCA(\gamma_1, \gamma_2) = \gamma \in \Gamma$ such that $\gamma_1 \sqsubseteq \gamma \wedge \gamma_2 \sqsubseteq \gamma \wedge \nexists \gamma' : \gamma' \sqsubset \gamma \wedge \gamma_1 \sqsubseteq \gamma' \wedge \gamma_2 \sqsubseteq \gamma'$.

These notations are now illustrated based on the taxonomy example of Figure 3.5. It can be seen that rice (besides potatoes and noodles) is a possible one-step specialization of side dish, i.e., rice $\in \Gamma^{\downarrow}$(side dish).
Here, vegetarian is the one-step generalization of side dish, i.e., vegetarian $= \Gamma^{\uparrow}$(side dish). Consequently, also side dish $= \Gamma^{\uparrow}$(rice) $\in \Gamma^{\downarrow}$(vegetarian) holds. Examples of lowest common ancestors are side dish for potatoes and noodles, or vegetarian for vegetables and rice, i.e., $LCA$(potatoes, noodles) $=$ side dish and $LCA$(vegetables, rice) $=$ vegetarian.
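These taxonomy notations can be sketched compactly by storing each term's parent; the following Python sketch uses the terms of Figure 3.5, and the function names are illustrative.

```python
# Sketch of the taxonomy notations of Definitions 7 and 8, storing each
# term's parent; term names follow the example taxonomy of Figure 3.5.
parent = {"vegetarian": "ingredients", "non vegetarian": "ingredients",
          "side dish": "vegetarian", "vegetables": "vegetarian",
          "meat": "non vegetarian", "seafood": "non vegetarian",
          "potatoes": "side dish", "rice": "side dish", "noodles": "side dish",
          "beef": "meat", "pork": "meat", "chicken": "meat", "turkey": "meat"}

def ancestors(term):
    """Path from the term up to the root, the term itself included."""
    path = [term]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path

def is_more_specific(x, y):          # x more specific than or equal to y
    return y in ancestors(x)

def lca(x, y):
    """Lowest common ancestor: the most specific shared generalization."""
    anc_y = set(ancestors(y))
    return next(a for a in ancestors(x) if a in anc_y)

def one_step_specializations(term):  # one-step specializations of a term
    return {t for t, p in parent.items() if p == term}

assert lca("potatoes", "noodles") == "side dish"
assert lca("vegetables", "rice") == "vegetarian"
```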
3.4 Semantic Workflows
Semantic workflows as defined by Bergmann and Gil [20] are workflows that are annotated with ontological information. In the cooking domain, these semantic annotations could, for example, include properties such as the baking time or baking temperature, or the required amount of an ingredient (see the example workflow fragment in Fig. 3.6). In the business domain, annotations could comprise, for example, the assigned employee or the execution status of a task, or meta information of a particular document. Thus, semantic workflows in general provide a flexible way to annotate workflows with additional information, which can be very useful in many domains. However, for the sake of simplicity, in the following only task and data labels from the previously introduced taxonomies are considered as possible semantic annotations.
Figure 3.6: Workflow fragment with semantic annotations (e.g., the data node cheese with label: cheese, amount: 200g, and the task node bake with label: bake, duration: 20 min, degree: 180 °C)
Definition 9. The labels of a semantic workflow $W = (N, E)$ are determined by the function $S : N \rightarrow \Sigma$, which assigns each node $n \in N$ an appropriate label from the metadata language $\Sigma$. Hence, $S(n)$ denotes the particular label of a node $n \in N$ in a workflow $W$.
The semantic labeling of a workflow is now formalized in Definition 9. The function $S$ assigns each node $n \in N$ of a workflow $W$ an appropriate label from a metadata language $\Sigma$. Here, this metadata language $\Sigma$ is defined by a taxonomy of task node labels $\psi_{tasks}$ (preparation steps) and a taxonomy of data node labels $\psi_{data}$ (ingredients, see Fig. 3.5), as introduced in the previous section. Additionally, the metadata language comprises the possible types of control-flow nodes $CFT$, required for the definition of opening and closing AND/XOR/LOOP control-flow nodes (see Sect. 2.2.3), i.e., $CFT = \{$"ANDsplit", "ANDjoin", "XORsplit", "XORjoin", "LOOPsplit", "LOOPjoin"$\}$. Thus, the labeling function $S$ assigns each task or data node a label from the corresponding taxonomy and each control-flow node a particular type, i.e., $\Sigma = \Gamma_{\psi_{tasks}} \cup \Gamma_{\psi_{data}} \cup CFT$ (in the following, the index of the taxonomy is omitted if it is obvious which taxonomy is referenced). The label of a node $n \in N$ is denoted as $S(n)$. If a workflow is annotated with generalized labels from a taxonomy, they describe that an arbitrary more specific label can be chosen, respectively. In general, it can be assumed that an executable workflow usually consists of leaf terms (see Sect. 3.3) only. Otherwise, the particular ingredient or preparation step is selected at the latest during the actual execution of the activity. If a cooking workflow, for example, contains the ingredient meat, a particular meat type such as pork has to be chosen during workflow execution. Consequently, the generalized nodes within the taxonomy are mostly considered for internal computations or reasoning (note that in some workflows also generalized labels occur; for more details see Sect. 5.1). An advantage of limiting the set of possible labels compared to free-text descriptions is that this increases the uniformity of workflow models, and therewith also their pragmatic quality (see Sect. 2.2.5), by use of a consistent terminology. Furthermore, such taxonomies may ease the modeling process, since suggestions for task and data node labels can be provided. In the next section, it will be shown that the semantic labels can further be used to define a semantic workflow similarity.
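A semantically labeled node can be sketched as a plain record that combines a taxonomy term with further annotations; the class name and fields below are illustrative assumptions, with the annotation values taken from Figure 3.6.

```python
# Sketch of a semantically labeled node (Definition 9): each node carries a
# taxonomy term as its label plus optional domain-specific annotations.
from dataclasses import dataclass, field

@dataclass(eq=False)  # identity-based equality keeps nodes usable as dict keys
class SemanticNode:
    kind: str                      # "task", "data", or "control"
    label: str                     # term from the task/data taxonomy, or a CFT type
    annotations: dict = field(default_factory=dict)

cheese = SemanticNode("data", "cheese", {"amount": "200g"})
bake = SemanticNode("task", "bake", {"duration": "20 min", "degree": "180 °C"})
and_join = SemanticNode("control", "ANDjoin")
```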
3.5 Semantic Workflow Similarity Workflow similarity measures in general can be used to identify alternative workflow modeling variants for a given workflow or to identify best-matching workflows for a partially modeled one. For the presented workflow modeling 5 In
the following, the index of the taxonomy is omitted, if it is obvious which taxonomy is referenced. 6 Please note that in some workflows also generalized labels occur (for more details see Sect. 5.1)
assistance, the semantic workflow similarity measure by Bergmann and Gil [20] is employed. This similarity measure is based on traditional approaches already successfully applied in Case-Based Reasoning and has further been proven to match the expectations of experts. The semantic similarity measure uses the local-global principle introduced in Section 2.4.4. Here, the local similarity measure utilizes the previously introduced taxonomies, whose terms are assigned to the nodes of the workflow by the function $S$. In order to assess the similarity between two workflow nodes, the taxonomic similarity approach by Bergmann [15] is applied. This means that a similarity value $sim_\psi$ is annotated to each non-leaf term $\gamma$ in the taxonomy, i.e., $\gamma \in \psi \setminus \underline{\psi}$. This value defines the similarity between all terms that are more specific than $\gamma$, i.e., between all child terms. For example, $sim_\psi(\text{meat}) = 0.6$ in Figure 3.5 denotes a similarity of 0.6 between all types of meat. The similarity $sim_\psi(\gamma_x, \gamma_y)$ of two terms $\gamma_x, \gamma_y \in \psi$ within the taxonomy is then determined by the corresponding similarity value of the lowest common ancestor (LCA) (see Formula 3.6).

$sim_\psi(\gamma_x, \gamma_y) = \begin{cases} 1 & \text{if } \gamma_x \sqsubseteq \gamma_y \vee \gamma_y \sqsubset \gamma_x \\ sim_\psi(LCA(\gamma_x, \gamma_y)) & \text{otherwise} \end{cases}$ (3.6)

In the illustrated taxonomy example in Figure 3.5, the following similarity values apply: $sim$(seafood, meat) $= 0.1$, $sim$(noodles, vegetables) $= 0.1$, $sim$(meat, chicken) $= 1.0$, and $sim$(chicken, meat) $= 1.0$. This enables the computation of similarities between domain-specific task or data labels by employing the particular taxonomical structure. Within a workflow, the similarity of two nodes $sim_N : N \times N \rightarrow [0, 1]$ is then defined as the similarity between the corresponding annotated labels of the particular taxonomy $sim_\psi$ (see Formula 3.7). For the similarity assessment between two control-flow nodes, the types of the control-flow nodes (i.e., AND-split/join, XOR-split/join, or LOOP-split/join) are considered: control-flow nodes of the same label are assigned a similarity of 1, otherwise a similarity of 0.

$sim_N(n_1, n_2) = \begin{cases} sim_\psi(S(n_1), S(n_2)) & \text{if } n_1 \text{ and } n_2 \text{ are task or data nodes} \\ 1 & \text{if } S(n_1) = S(n_2) \\ 0 & \text{otherwise} \end{cases}$ (3.7)
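Building on the taxonomy and node sketches above, the taxonomic and node similarities of Formulas 3.6 and 3.7 can be sketched as follows; only the similarity values needed for the examples in the text are listed, and the remaining inner-term values would have to be taken from the actual taxonomy.

```python
# Sketch of the taxonomic similarity (Formula 3.6) and the node similarity
# (Formula 3.7), reusing parent/ancestors/is_more_specific/lca and
# SemanticNode from the sketches above.
sim_value = {"ingredients": 0.01, "vegetarian": 0.1,
             "non vegetarian": 0.1, "meat": 0.6}   # sim_psi per inner term

def sim_taxonomy(x, y):
    # Similarity 1 if one term generalizes the other, otherwise the
    # similarity value annotated at the lowest common ancestor.
    if is_more_specific(x, y) or is_more_specific(y, x):
        return 1.0
    return sim_value[lca(x, y)]

def sim_node(n1, n2):
    if n1.kind in ("task", "data") and n1.kind == n2.kind:
        return sim_taxonomy(n1.label, n2.label)
    return 1.0 if n1.label == n2.label else 0.0

assert sim_taxonomy("chicken", "meat") == 1.0
assert sim_taxonomy("seafood", "meat") == 0.1
assert sim_taxonomy("noodles", "vegetables") == 0.1
```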
Furthermore, a local edge similarity $sim_E : E \times E \rightarrow [0, 1]$ defines the similarity between two edges by aggregating the similarities of the corresponding pre-nodes (denoted by $e.pre$) and post-nodes (denoted by $e.post$) of the particular edges (see Formula 3.8). In case the corresponding pre- and post-nodes both have a similarity value of 1, the similarity value of the edge is consequently defined as 1.

$sim_E(e_1, e_2) = \dfrac{sim_N(e_1.pre, e_2.pre) + sim_N(e_1.post, e_2.post)}{2}$ (3.8)
For computing a global similarity between two workflows, a mapping function $m$ is defined, which maps the nodes and edges of a query workflow $W_q = (N_q, E_q)$ to those of a case workflow $W_c = (N_c, E_c)$. This mapping is defined as a type-preserving, partial, and injective mapping function $N_q \cup E_q \rightarrow N_c \cup E_c$. Type-preserving means that only workflow elements of the same type are mapped. This prevents, for example, mappings between a data node and a task node or between a data-flow edge and a control-flow edge, respectively. Furthermore, as the mapping is a partial function, not all nodes of the case workflow have to be mapped by the query nodes. Moreover, edges are only mapped by this function if their corresponding nodes are mapped as well. In the following, let $N_q^m = N_q \cap D(m)$ be the set of mapped query nodes and $E_q^m = E_q \cap D(m)$ the set of mapped query edges, where $D(m)$ denotes the domain of the mapping function.

$sim_m(W_q, W_c) = \dfrac{\sum_{n \in N_q^m} sim_N(n, m(n)) + \sum_{e \in E_q^m} sim_E(e, m(e))}{|N_q| + |E_q|}$ (3.9)
The similarity between the query workflow $W_q$ and the case workflow $W_c$ is then estimated for a particular mapping $m$ by use of the local node and edge similarities. More precisely, node similarities $sim_N(n_q, m(n_q))$ are computed between a mapped query node $n_q \in N_q^m$ and the corresponding mapped case node $m(n_q) = n_c \in N_c$, and edge similarities $sim_E(e_q, m(e_q))$ are assessed between a mapped query edge $e_q \in E_q^m$ and the corresponding mapped case edge $m(e_q) = e_c \in E_c$. Based on these local similarities, a workflow similarity can be estimated by aggregating all similarity values into a mean score, denoted as $sim_m(W_q, W_c)$ (see Formula 3.9). Since for two workflows usually multiple admissible mappings exist, the overall workflow similarity is determined by the mapping resulting in the highest similarity
denotes the domain of the mapping function
90
3 Domain & Notation
value (see Formula 3.10). This best possible mapping is in the following also referred to as mmax . sim(Wq , Wc ) = max{simm (Wq , Wc )|admissible mapping m}
(3.10)
Since the similarity is based on a partial mapping function, this similarity measure is asymmetrical. Hence, the similarity measure is basically a kind of sub-graph-matching, assessing how well the query workflow is covered by the case workflow. The measure denotes a similarity of 1, if the query workflow is a sub-graph of the case workflow. Thus, for a partially modeled workflow, a matching complete workflow model could be determined from a workflow repository. Given the partial workflow of Figure 3.4 as a query workflow Wq and a case workflow Wc as illustrated in Figure 3.2, the similarity measure would assess a high similarity value because the elements of Wq are mostly contained in Wc . Here, the additionally added control-flow edge in the partial workflow is the only element that cannot be mapped exactly, which thus marginally reduces the similarity value. In the opposite direction, a much lower similarity value would result, since many elements cannot be mapped. The computation of the similarity requires to construct the set of all possible admissible mappings m in order to choose the mapping resulting in the highest workflow similarity. This leads to an exhaustive search for the best possible mapping, which is computationally not feasible. Thus, Bergmann and Gil [20] presented a heuristic search algorithm based on A* to speed-up this search process. If the most similar workflow within the repository is searched, the heuristic similarity computation is then applied to each workflow within the repository. To further improve this process for similarity computations on the entire repository an A* parallel retriever was proposed by Bergmann and Gil [20], which computes similarity values in parallel and aborts similarity computations, if it is guaranteed that a particular workflow is not contained within the k-most similar workflows. As long as the memory bounds are not exceeded, it is ensured that the most similar workflow can still be identified. In the following, workflow retrieval or workflow similarity refers to these heuristic algorithms, respectively. Though its computational complexity, the semantic workflow similarity has been chosen as a basis for the workflow modeling assistance, since the local-global principle as well as the mapping are useful properties for deriving detailed similarity information. These are highly beneficial during computational reasoning with workflows as will be shown in the next chapters.
3.6 Conclusions
91
3.6 Conclusions This chapter showed that despite their differences business processes and cooking workflows are structured highly similar and illustrated two example scenarios respectively. The latter will serve as a running example to demonstrate the approaches for the workflow modeling assistance. Based on this, workflow notations that will be used in the following were introduced. Next, block-oriented and consistent workflows were defined that can be used to ensure syntactical correctness of workflows thereby easing workflow adaptations. Furthermore, workflow ontologies have been introduced and their usage for semantic workflows [20] has been described. Finally, a semantic similarity measure [20] has been sketched, which enables the assessment of a similarity value between two workflows. From a Case-Based Reasoning perspective, the workflow notation as well as the ontology basically represent the vocabulary container (see Sect. 2.4.2). The stored workflows based on this vocabulary represent the case base and the similarity measure represents the similarity container. In order to implement such a PO-CBR system, the ontology would usually have to be constructed manually and annotated by local similarity values for determining the presented similarity measure in the particular domain. If the required ontologies are already available, these could be reused and semantic similarity measures could be applied for computing local similarity values for terms within these ontologies [270, 94]. Thus, the ontology construction can potentially be avoided and similarity values would not have to be manually specified within the ontology. However, the investigations have shown that a manual construction of taxonomies and annotations with similarity values is mostly feasible. Based on such a PO-CBR system, the semantic workflow similarity can already serve as modeling support by two means. Either alternative workflow models for a particular workflow or a matching workflow for a partially modeled one could be identified. The following chapter sketches that this similarity measure can serve as a basis for a more expressive query language to facilitate a more comprehensive workflow modeling assistance.
4 Query Language Workflow modeling assistance by retrieval and adaptation requires a query language that captures the particular needs of the individual workflow model to be created. Query languages for workflow models can be basically classified into similarity search and querying. Relating to this, Dijkman et al. state that “[. . . ] querying searches for exact matches of a query to a part of a process model, while similarity searches for inexact matches of the query to a complete process model” [60][p. 2]. A query language for the intended workflow modeling assistance must accordingly be based on similarity search, since inexact matches with stored workflow models are very likely, if individual workflows shall be created. The similarity assessment can then be applied to identify the best-matching workflow and to guide the subsequent adaptation process of the workflow. Traditional workflow similarities, however, are fairly limited for this purpose, since they merely capture desired properties of a workflow in order to support the identification of similar processes, searching for workflow fragments or enabling the auto-completion of a process (see also Sect. 2.6.2). The automatic construction of individual workflow models, however, additionally requires that restrictions such as undesired workflow elements or fragments can be defined. Thus, a novel process-oriented query language POQL is presented in this chapter for the primary objective to support the creation of individual workflow models. POQL extends traditional similarity search of workflows by capturing requirements as well as restrictions on the desired workflow model. The main contributions of this chapter have been published by M¨ uller and Bergmann [172, 169] and are organized as follows: First, requirements on a query language for the intended modeling assistance will be explained in more detail. Next, the corresponding query language will be introduced in two different versions and illustrated based on examples from the cooking domain. It is further shown, how the consistency of POQL queries can be validated. Moreover, potential extensions of the query language will be discussed. Finally, this chapter will present its possible usage scenarios for workflow modeling assistance and position the approach in the field of related work. © Springer Fachmedien Wiesbaden GmbH, part of Springer Nature 2018 G. Müller, Workflow Modeling Assistance by Case-based Reasoning, https://doi.org/10.1007/978-3-658-23559-8_4
4.1 Requirements

As previously mentioned, the intended workflow modeling assistance necessitates a novel query language that meets certain requirements. Evidently, the query language must satisfy expressiveness demands, meaning that the workflow modeler can define the required types of information to describe the individual needs of the desired workflow model. Moreover, such a query language should be characterized by high intuitiveness, as it constitutes the core means of communication with the workflow modeling assistance. Finally, the query processing must provide a ranking functionality to assess the degree of compliance between a workflow model and the defined query, such that the workflow modeling assistance can provide an appropriate workflow model. These three requirements, which are also partially discussed in the literature (e.g., [138, 6]), are explained in more detail below.
Expressiveness
The expressiveness requirements of a query language for the purpose of workflow modeling assistance are derived by exposing the limitations of currently available methods. Applying similarity measures, for example the semantic workflow similarity (see Sect. 3.5), would enable the definition of a single partially modeled workflow as a query. This query would then describe the tasks and data objects, as well as the relationships between them, that should be contained in the desired workflow. Consequently, the optimal workflow from the repository contains the most components, or the most similar components, for the given query workflow; in the best case, all desired elements are contained. In this regard, similarity can be used to retrieve the most suitable workflow for a given set of workflow elements. In the cooking domain, this could comprise several desired ingredients, preparation steps, or relationships between them (e.g., mashed tomatoes). However, there might be certain restrictions on the desired workflow, which requires defining undesired workflow elements such as tasks, data elements, or entire workflow fragments that must not occur in the workflow. In the cooking domain, it is very likely that certain ingredients, preparation tools (e.g., oven), or cooking skills (e.g., blanching of celery) are not available. If the provided workflow contains such elements because they cannot be excluded within the query, the workflow modeling assistance becomes significantly less useful. Thus, it is highly important for a query language to additionally consider undesired workflow properties. Since the purpose of traditional similarity measures is to identify similar workflows or similar workflow
fragments, they consequently need to be extended to additionally capture undesired workflow elements. In order to further enhance the expressiveness, the query language should additionally be able to capture more general desired or undesired properties of the workflow. This means that generalized workflow elements with regard to ontological information (see Sect. 3.3), i.e., more general task and data labels, can be specified as desired or undesired elements within the query. In the cooking domain, this could be used to express that a vegetarian dish (by defining meat as undesired) or a meat-containing recipe (by defining meat as desired) is requested by the user. Because such generalized workflow elements significantly increase the expressiveness, they should be considered as part of the query language. In summary, the query language must be expressive enough to represent the relevant requirements of the workflow modeler. Thus, it should be able to represent not only desired but also undesired properties of the workflow to be created. Moreover, generalized query elements should be supported. These expressiveness requirements most likely also apply in various other domains such as business processes, where a query would be defined by desired or undesired business activities, business documents, and their relationships, respectively. Since the semantic workflow similarity measure provides a means to capture desired as well as generalized workflow elements, it is used as a basic functionality for the presented query language, which additionally considers undesired workflow properties.
Intuitiveness
The query language must be easy to understand for the particular audience, i.e., the workflow modelers. Otherwise, the utility and benefit of the workflow modeling assistance compared to manual workflow construction would be reduced substantially. Consequently, a query language is required that closely resembles workflow modeling itself. This suggests that a visual query language is preferable, which would also facilitate the definition of complex workflow queries. In this regard, new notations should only be introduced if required, and they should be based on already known concepts.
Ranking
A query language for the intended workflow modeling support should facilitate the retrieval of the most suitable workflows and further be able to guide the subsequent adaptation process by providing information about which workflow elements have to be added, deleted, or modified to construct a better-matching workflow. In order to identify the most suitable workflow from the repository, the query language must enable a ranking of the workflows with regard to query fulfillment. More precisely, during query processing all workflows need to be ordered such that workflows with a higher query fulfillment are ranked at higher positions. For the automatic adaptation process itself, it is important that workflow modifications can be assessed, i.e., whether they lead to a higher query fulfillment. This assessment can be used to guide the entire adaptation process by choosing appropriate workflow modifications.
4.2 POQL Queries

Based on the previously mentioned requirements, a POQL query basically defines desired and undesired properties of a workflow. Thus, the query consists of a set of desired workflow components and a set of undesired workflow components, as specified in Definition 10. Each workflow component consists of nodes and edges representing desired or undesired parts of the workflow. This ensures high intuitiveness, as the query construction is highly similar to modeling a workflow. Consequently, task nodes, data nodes, and control-flow nodes, as well as their relationships with regard to control-flow and data-flow, can be defined either within the desired or the undesired part of the query.

Definition 10. A POQL query $Q = (\{Q_1^+, \ldots, Q_n^+\}, \{Q_1^-, \ldots, Q_m^-\})$ consists of a set of desired workflow components $Q_1^+, \ldots, Q_n^+$ and a set of undesired workflow components $Q_1^-, \ldots, Q_m^-$ defining the desired and undesired properties of the desired workflow model, respectively.
More precisely, the query basically specifies that each desired workflow component should be included in the workflow and that no restriction workflow component should be contained. Each workflow component is represented by an arbitrarily fragmented workflow that is not necessarily consistent or block-oriented (see Sect. 3.2.1). In the following, the set of desired workflow components is also referred to as the query workflow $Q^+ = Q_1^+ \cup \ldots \cup Q_n^+$ and the set of undesired workflow components as the restriction workflow $Q^- = Q_1^- \cup \ldots \cup Q_m^-$,
respectively. Thus, a POQL query is also referred to as a pair of a query and a restriction workflow, i.e., Q = (Q+, Q−). Besides such desired and undesired properties, a query language should be able to consider more generalized properties according to the expressiveness requirements. Thus, POQL employs the task and data ontologies (see Sect. 3.3). More precisely, the generalized terms specifying classes of task and data labels can be used to define generalized workflow properties within the query.
[Figure 4.1: POQL query example Q. The query workflow Q+ (desired properties) consists of the components chop with herbs, heat, and cheese; the restriction workflow Q− (undesired properties) consists of the components bake, meat, and mash with tomatoes. Dashed rectangles denote the individual workflow components.]
Figure 4.1 shows an example of a POQL query containing desired and undesired properties, including specific as well as generalized elements. The dashed rectangles denote the particular workflow components of the query and the restriction workflow. The desired properties specify that a cooking workflow is sought for preparing a hot dish (heating desired) that contains some kind of cheese and freshly chopped herbs. The latter indicates that instant mixed herbs are less preferred. Please note that herbs, cheese, and heat are generalized labels that can be fulfilled in various ways. Additionally, the generalized labels of the undesired properties require that the recipe be a vegetarian meal and that the dish not be baked, i.e., casserole recipes are undesired (e.g., no oven is present or baking is too much effort). The restriction workflow further specifies that the desired workflow should not contain any mashed tomatoes. However, the activity mash and the ingredient tomatoes are not undesired elements in general (as, for example, bake and meat are), but should not be contained in combination. While this section described the syntax and intended semantics of the query, the processing of a POQL query requires a ranking of the workflows within the repository with regard to their query fulfillment. If a query consisted only of desired properties, it would simply represent a partially modeled workflow. In this case, the semantic workflow similarity measure
could be applied for ranking purposes. For more expressive POQL queries, the ranking procedure is illustrated in the next section.
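To make this structure concrete, the following minimal Python sketch models the example query of Figure 4.1 as a pair of desired and undesired workflow components. The class names and the node/edge representation are illustrative assumptions for this sketch, not the data model of an actual POQL implementation.

from dataclasses import dataclass, field

@dataclass
class WorkflowComponent:
    """An arbitrarily fragmented workflow: labeled nodes and edges.

    Nodes are node_id -> label entries; edges connect node ids. The
    fragment need not be consistent or block-oriented (Sect. 3.2.1).
    """
    nodes: dict = field(default_factory=dict)   # node_id -> label
    edges: set = field(default_factory=set)     # (from_id, to_id)

@dataclass
class POQLQuery:
    """A POQL query Q = (Q+, Q-) as defined in Definition 10."""
    desired: list    # desired workflow components Q+_1, ..., Q+_n
    undesired: list  # undesired workflow components Q-_1, ..., Q-_m

# The example query of Figure 4.1: chopped herbs, some cheese, and a
# heating step are desired; baking, meat, and mashed tomatoes are not.
query = POQLQuery(
    desired=[
        WorkflowComponent(nodes={1: "chop", 2: "herbs"}, edges={(1, 2)}),
        WorkflowComponent(nodes={3: "heat"}),
        WorkflowComponent(nodes={4: "cheese"}),
    ],
    undesired=[
        WorkflowComponent(nodes={5: "bake"}),
        WorkflowComponent(nodes={6: "meat"}),
        WorkflowComponent(nodes={7: "mash", 8: "tomatoes"}, edges={(7, 8)}),
    ],
)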
4.3 Query Processing

As previously sketched, the query language should be able to rank the workflows in the repository according to a given query. The intended ranking function $QF(Q, W) \rightarrow [0, 1]$ assesses the query fulfillment for a workflow W and a query Q based on the following conditions:

1. QF = 1, if the workflow W exactly matches all desired properties Q+ and is not exactly matched by any component of the undesired properties Q− (optimal workflow).
2. QF = 0, if the workflow W is exactly matched by all components of the undesired properties Q− and none of the desired properties Q+ are at least partially fulfilled.
3. QF ∈ ]0, 1[, in all other cases in which the previous conditions do not apply.

According to these conditions, the ranking most basically requires that each element of the workflow W that is similar to an element defined in the query workflow Q+ increases the ranking value, while only exact matches of entire restriction workflow components $Q_1^-, \ldots, Q_m^-$ reduce the ranking value. For example, if cut tomatoes were desired, a cooking workflow containing cut vegetables or mashed tomatoes represents an alternative, which is only preferable if no exact matches can be found. Consequently, the ranking value is increased, but slightly less than for an exact match. If, however, a restriction workflow component defined cut tomatoes as undesired, cut vegetables or mashed tomatoes are not exactly matched and should consequently not reduce the ranking value. If the ingredient tomatoes or the preparation step cut is generally not desired, it should be added as a single restriction workflow component; in this case, it would lead to a reduction of the ranking value. The concrete implementation is now formalized in Definition 11 by introducing a measure for the query fulfillment QF(Q, W).

Definition 11. The query fulfillment of a workflow W for a given query Q = (Q+, Q−) is defined as $QF(Q, W) = \frac{sim(Q^+, W) + RS(Q^-, W)}{2}$.
Thus, the query fulfillment for a query Q and a workflow W is determined by the semantic similarity $sim(Q^+, W) \rightarrow [0, 1]$ and a restriction satisfaction function $RS(Q^-, W) \rightarrow [0, 1]$. In this regard, the semantic similarity (see Sect. 3.5) assesses the fulfillment of the desired properties, while the restriction satisfaction implements a corresponding measure for the undesired workflow properties (see Fig. 4.2). More precisely, the restriction satisfaction is defined as the proportion of restriction workflow components that are not exactly matched relative to the total number of restriction workflow components:
$$RS(Q^-, W) = 1 - \frac{|\{q^- \in \{Q_1^-, \ldots, Q_m^-\} \mid sim(q^-, W) = 1\}|}{m} \quad (4.1)$$
Exactly matched restriction workflow components are those entirely contained in the workflow W, i.e., they are assigned a similarity value of 1 according to the semantic similarity measure.
[Figure 4.2: Query fulfillment computation. QF(Q, W) combines the semantic similarity sim(Q+, W) between the query workflow (desired workflow components $Q_1^+, \ldots, Q_n^+$) and the workflow W with the restriction satisfaction RS(Q−, W) over the undesired workflow components $Q_1^-, \ldots, Q_m^-$.]
This query fulfillment is consequently in alignment with the previously introduced ranking function, i.e., each fulfilled desired property increases the query fulfillment, while each exactly matched component of the undesired properties reduces the value of the query fulfillment. Thus, the query fulfillment measure enables a detailed assessment of a workflow with regard to a particular query. Moreover, this allows a ranking of all workflows within the repository, i.e., workflows with a higher query fulfillment are preferred, since more restrictions and requirements are satisfied.
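Continuing the sketch above, the following Python fragment illustrates Definition 11 and Formula 4.1. The semantic workflow similarity is assumed to be available as a function sim(fragment, workflow) returning a value in [0, 1]; its name and signature are placeholders.

def union_components(components):
    """Merge workflow components into a single query workflow Q+ (Sect. 4.2)."""
    merged = WorkflowComponent()
    for c in components:
        merged.nodes.update(c.nodes)
        merged.edges.update(c.edges)
    return merged

def restriction_satisfaction(undesired, workflow, sim):
    """RS(Q-, W): proportion of restriction components that are NOT
    exactly matched by the workflow (Formula 4.1)."""
    if not undesired:
        return 1.0  # no restrictions are trivially satisfied
    exact_matches = sum(1 for q in undesired if sim(q, workflow) == 1.0)
    return 1.0 - exact_matches / len(undesired)

def query_fulfillment(query, workflow, sim):
    """QF(Q, W) = (sim(Q+, W) + RS(Q-, W)) / 2 (Definition 11)."""
    q_plus = union_components(query.desired)
    return (sim(q_plus, workflow)
            + restriction_satisfaction(query.undesired, workflow, sim)) / 2.0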
Please note that during the computation of the query fulfillment, the individual elements of the workflow W can be mapped multiple times by the mapping function of the semantic similarity. More precisely, they can potentially be mapped by the query workflow and by each restriction workflow component independently. This is important, since otherwise the query and restriction workflow components of the query Q could only refer to different workflow elements, which would lead to a different and undesired query semantics.

This basic query fulfillment is further enhanced by weighting the particular properties with regard to their size. This ensures that components that the user defined more elaborately during query modeling have a higher importance within the ranking function. Such properties are then more likely to be contained in the best-matching workflow, which avoids the most laborious manual adaptations if an exactly matching workflow cannot be identified. Hence, the presented query fulfillment function is adjusted to a weighted measure $QF_{weight}(Q, W) \rightarrow [0, 1]$ of a workflow W for a given query Q = (Q+, Q−) as follows:

$$QF_{weight}(Q, W) = \frac{size(Q^+) \cdot sim(Q^+, W) + size(Q^-) \cdot RS_{weight}(Q^-, W)}{size(Q^+) + size(Q^-)} \quad (4.2)$$

In general, the size of a workflow W is defined as the number of nodes and edges contained in the workflow, i.e., size(W) = |N| + |E|. Here, the workflow size is used to weight the query workflow Q+ and the restriction workflow Q−, respectively. The weighted restriction satisfaction is defined as the aggregated size of the non-matching restriction workflow components relative to the size of all restriction workflow components, i.e., the size of the restriction workflow (see Formula 4.3). Consequently, more or larger matching components lead to a lower restriction satisfaction than fewer or smaller matching components. The overall query fulfillment is finally normalized by the size of the defined query.

$$RS_{weight}(Q^-, W) = \frac{\sum_{c \in \{q^- \in Q^- \mid sim(q^-, W) \neq 1\}} size(c)}{size(Q^-)} \quad (4.3)$$
In the following, only the weighted function is applied and denoted by QF(Q, W). Furthermore, let $rank_k(Q, RP)$ denote the k-th best-matching workflow from a repository RP with regard to query Q.
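The weighted variant of Formulas 4.2 and 4.3 can be sketched analogously; again, sim is a placeholder for the semantic workflow similarity measure, and the helper names are illustrative.

def size(fragment):
    """size(W) = |N| + |E|: the number of nodes plus edges of a workflow."""
    return len(fragment.nodes) + len(fragment.edges)

def weighted_restriction_satisfaction(undesired, workflow, sim):
    """RS_weight(Q-, W): aggregated size of the non-matching restriction
    components relative to the size of the restriction workflow (Formula 4.3)."""
    total = sum(size(c) for c in undesired)
    if total == 0:
        return 1.0
    unmatched = sum(size(c) for c in undesired if sim(c, workflow) != 1.0)
    return unmatched / total

def weighted_query_fulfillment(query, workflow, sim):
    """QF_weight(Q, W) according to Formula 4.2."""
    q_plus = union_components(query.desired)
    s_plus = size(q_plus)
    s_minus = sum(size(c) for c in query.undesired)
    if s_plus + s_minus == 0:
        return 1.0
    return (s_plus * sim(q_plus, workflow)
            + s_minus * weighted_restriction_satisfaction(query.undesired,
                                                          workflow, sim)
            ) / (s_plus + s_minus)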
Please note that the processing of POQL queries is computationally expensive, since the ranking function QF(Q, W) invokes the semantic similarity measure several times. In order to control retrieval time, the ranking is implemented by means of an A* algorithm, extending the A* parallel retriever mentioned in Section 3.5 to POQL queries.
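The semantics of rank_k can be illustrated by the brute-force sketch below, which simply scores the whole repository; the actual system avoids exhaustive scoring by means of the A*-based retriever.

import heapq

def rank_k(query, repository, k, sim):
    """Return the k best-matching workflows from RP, ordered by descending
    query fulfillment. Brute-force illustration only; see Sect. 3.5 for the
    A* parallel retriever used in practice."""
    return heapq.nlargest(
        k, repository,
        key=lambda w: weighted_query_fulfillment(query, w, sim))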
4.4 POQL-Lite

Investigations during the development of POQL have shown that in some scenarios a lightweight version of the previously introduced query language is sufficient. This version is hereinafter referred to as POQL-Lite. It is basically the same query language, but it does not allow edges to be defined within the components of the query. Thus, POQL-Lite reduces expressiveness in favor of intuitiveness compared to POQL-Full. This means that POQL-Lite captures only the desired and undesired task and data nodes of a workflow within a query Q. The query is thus merely defined by a set of desired nodes and a set of undesired nodes, as specified in Definition 12. In the cooking domain, it captures desired and undesired ingredients or preparation steps of a cooking workflow.

Definition 12. A POQL-Lite query $Q = (\{x_1, \ldots, x_n\}, \{y_1, \ldots, y_m\})$ consists of a set of desired task or data nodes $x_1, \ldots, x_n$ and a set of undesired task or data nodes $y_1, \ldots, y_m$.

A POQL-Lite query still allows generalized elements to be defined within the set of desired or undesired nodes. For example, a vegetarian recipe can be requested by adding meat to the set of undesired nodes. An optimally matching workflow for a query Q contains all desired nodes and no undesired node. Again, the query $Q = (\{x_1, \ldots, x_n\}, \{y_1, \ldots, y_m\})$ can basically be represented by two workflows (see Sect. 4.2), i.e., a query workflow $Q^+ = (N^+, \emptyset)$ consisting only of the desired nodes $N^+ = \{x_1, \ldots, x_n\}$ and a restriction workflow $Q^- = (N^-, \emptyset)$ consisting only of the undesired nodes $N^- = \{y_1, \ldots, y_m\}$. Thus, POQL-Lite queries are also referred to as a pair of a query and a restriction workflow, i.e., Q = (Q+, Q−). The corresponding query fulfillment measure $QF(Q, W) \rightarrow [0, 1]$ implementing the ranking basically assigns the highest value to the workflow that contains the most desired nodes and the fewest undesired nodes. More precisely, the query fulfillment is determined by the similarity of the desired task and data nodes contained in workflow W and the number of undesired task and data nodes not contained in W, in relation to the size of the query (see Formula 4.4). According to
this definition, similar desired nodes increase the query fulfillment, while only exactly matching undesired nodes reduce the fulfillment of the query.

$$QF(Q, W) = \frac{\sum_{x \in N^+} sim(x, m^+(x)) + |\{y \in N^- \mid sim(y, m^-(y)) \neq 1\}|}{|N^+| + |N^-|} \quad (4.4)$$

In the formula, $m^+$ denotes the similarity mapping resulting from the similarity computation sim(Q+, W) between the query workflow Q+ and the workflow W. Furthermore, $m^-$ denotes the similarity mapping resulting from the similarity computation sim(Q−, W) between the restriction workflow Q− and the workflow W. These similarity mappings are employed by the query fulfillment to assess local node similarities (see Sect. 3.5) between desired or undesired nodes of the query and the mapped nodes of the workflow (see Formula 4.4). Thus, it can be determined whether desired or undesired nodes of the query are exactly matched by the workflow, i.e., whether they have a local similarity value of 1. Please note that POQL-Lite mostly¹ leads to the same ranking value as the ranking function of POQL-Full if the query consists only of nodes. Moreover, the ranking conditions introduced in Section 4.3 also apply to POQL-Lite.

¹POQL-Lite only estimates one similarity mapping for the entire restriction workflow, while POQL-Full computes one similarity mapping for each undesired workflow component.
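A sketch of Formula 4.4 in Python, assuming the similarity mappings m+ and m− are given as dictionaries from query nodes to workflow nodes and that the local node similarity returns 0.0 for unmapped nodes; all names are placeholders.

def poql_lite_fulfillment(desired, undesired, m_plus, m_minus, node_sim):
    """QF(Q, W) for a POQL-Lite query according to Formula 4.4.

    desired/undesired: the node sets N+ and N-;
    m_plus/m_minus: similarity mappings from sim(Q+, W) and sim(Q-, W);
    node_sim: local node similarity (Sect. 3.5), 0.0 for unmapped nodes.
    """
    if not desired and not undesired:
        return 1.0
    gained = sum(node_sim(x, m_plus.get(x)) for x in desired)
    avoided = sum(1 for y in undesired
                  if node_sim(y, m_minus.get(y)) != 1.0)
    return (gained + avoided) / (len(desired) + len(undesired))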
4.5 Query Consistency

A system based on queries should be able to automatically determine whether a contradiction exists within a specified query. In such a case, the user should be informed, as this indicates that the query does not reflect the user's intentions. An appropriate POQL query without contradictions can be entirely fulfilled by a matching workflow model. Such queries are referred to as consistent, as specified in Definition 13.

Definition 13. A POQL query $Q = (\{Q_1^+, \ldots, Q_n^+\}, \{Q_1^-, \ldots, Q_m^-\})$ is consistent, iff $\nexists q^- \in \{Q_1^-, \ldots, Q_m^-\} \wedge \nexists q^+ \subseteq \{Q_1^+ \cup \ldots \cup Q_n^+\} : q^+ \sqsubseteq q^-$

Consequently, a query is consistent if no sub-graph ($q^+ \subseteq \{Q_1^+ \cup \ldots \cup Q_n^+\}$) of the query workflow is matched by a (more general) restriction workflow component ($q^+ \sqsubseteq q^-$). Accordingly, a query is inconsistent if any part of
the query workflow is matched by a restriction workflow component. The corresponding part cannot be fulfilled, since it is prohibited by the restriction workflow. In case of a generalized restriction workflow component, a broader range of components in the desired workflow is excluded, prohibiting the definition of more specific sub-graphs within the query workflow.
[Figure 4.3: Inconsistent query examples. Example 1: salami desired and salami undesired; example 2: salami desired and meat undesired; example 3: cut salami desired and meat undesired.]
Three examples of inconsistent POQL queries in the cooking domain are illustrated in Figure 4.3. Obviously, as sketched in the first two examples, a workflow cannot contain the ingredient salami and no salami at the same time (example 1), nor can it contain salami and also be a vegetarian dish (example 2). One of the respective nodes must be removed in order to reestablish a consistent query. Likewise, this also applies to preparation steps and, furthermore, to entire workflow fragments, as shown in the third example. Here, cut salami is desired while meat should not be contained in the workflow, which is obviously not possible. The removal of salami (the corresponding edge would be removed automatically) or meat from the query would resolve the inconsistency. In the following chapters, only consistent queries are considered. However, inconsistent queries could still be specified in general, but a workflow entirely matching such a query can never be created, i.e., a query fulfillment of 1 can never be achieved (one of the corresponding inconsistent parts is never fulfilled). The remaining parts of the query are not affected, such that the
query fulfillment still appropriately measures the compliance of the workflow with the query, but with a lower maximum value due to the inconsistency.
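For node-level queries, the consistency check of Definition 13 reduces to testing whether any desired label is subsumed by an undesired one. The following sketch covers this case (examples 1 and 2 of Figure 4.3); detecting inconsistencies involving edges, as in example 3, additionally requires matching restriction components against sub-graphs of the query workflow. The taxonomy is assumed to be given as a simple parent map, and all names are illustrative.

def subsumes(general, specific, parent):
    """True if `general` equals `specific` or is one of its ancestors in the
    taxonomy; `parent` maps each label to its one-step generalization."""
    label = specific
    while label is not None:
        if label == general:
            return True
        label = parent.get(label)
    return False

def is_consistent_lite(desired_nodes, undesired_nodes, parent):
    """Node-level consistency in the spirit of Definition 13: no desired
    label may be matched by a (more general) undesired label."""
    return not any(subsumes(y, x, parent)
                   for x in desired_nodes for y in undesired_nodes)

# Example 2 of Figure 4.3: salami desired, meat undesired -> inconsistent.
parent = {"salami": "meat", "meat": "ingredient", "ingredient": None}
assert not is_consistent_lite({"salami"}, {"meat"}, parent)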
4.6 Potential Extensions

Potential extensions of the presented POQL approach mainly comprise methods to increase its expressiveness. Though the ideas presented below might be required in certain scenarios, they do not address primary requirements for the intended workflow modeling support. Consequently, they are considered possible future enhancements.

POQL is currently not able to refer to particular workflow elements by means of the corresponding element ID. This kind of binding would, for example, enable expressing dependencies between desired and undesired elements within the query, which could also be important during adaptation. The query could then contain workflow elements with IDs referring to the original workflow, for example, defining that a specific task t contained in the workflow W should be replaced by another task t′. Thus, t would be contained in the restriction workflow and t′ in the query workflow, both annotated with the corresponding ID of the element in workflow W. As an example, a particular preparation step cut into strips could be replaced by cut into cubes, instead of removing an arbitrary cut into strips task and inserting a cut into cubes activity at an arbitrary position within the workflow. However, this is not considered in the following, as it would increase the complexity of the query language and the query processing.

If workflows from various categories are available, additional filtering techniques could further be supported. For example, the specific output as defined in Section 3.2 marks the resulting data node after executing the workflow. In the cooking domain, this represents the particular dish (e.g., pasta dish or sandwich dish). Thus, this particular data node could be used to strictly filter workflows by identifying those that fulfill a specific goal or purpose.

Moreover, the expressiveness of POQL could be increased by more generalized constructs. In POQL, nodes are defined based on task and data labels from the taxonomies. The semantic workflow representation, however, can possibly capture further information such as the duration of a preparation step or the required amount of an ingredient in the cooking domain (see Sect. 3.5). Furthermore, some generic properties of the workflow itself could be annotated (e.g., the required cooking skill level) that should be considered during retrieval. This would require that the workflow similarity and the
ontologies consider these additional semantics. For reasons of usability, POQL considers task and data labels only.

Additionally, more generalized constructs could comprise transitive control-flow and data-flow properties. In POQL-Full, the query and restriction workflow exactly specify the control-flow and data-flow of a workflow. In some situations, more generalized definitions could be required: a transitive control-flow relationship describes that a task is executed some time prior to another one, while a transitive data-flow relationship denotes that a produced data node is later somehow consumed by another task (possibly already contained within another data node). These transitive relationships between the tasks and data nodes of a workflow W = (N, E) were formally introduced in Section 3.2 as follows:

1. Let $s_1 < s_2$ denote a transitive relation expressing that sequence node $s_1 \in N^S$ is executed prior to sequence node $s_2 \in N^S$. Consequently, $t_1 < t_2$ defines that a task $t_1 \in N^T$ is executed prior to a task $t_2 \in N^T$.

2. Let $t_1 \triangleright t_2$ denote that the tasks $t_1, t_2 \in N^T$ are data-flow connected, iff $t_1 < t_2 \wedge \exists d \in N^D : ((t_1, d) \in E^D \wedge (d, t_2) \in E^D)$, i.e., task $t_1$ produces a data node d that is consumed by task $t_2$. Then, two tasks $t_1, t_2 \in N^T$ are transitively data-flow connected, $t_1 \triangleright^+ t_2$, iff $t_1 \triangleright t_2 \vee \exists t \in N^T : t_1 \triangleright t \triangleright^+ t_2$.

To represent transitive control-flow or data-flow connectedness, new types of edges could be introduced. These additional edge constructs are exemplarily illustrated in Figure 4.4. The transitive control-flow edge on the left-hand side can define that a certain preparation step has to be executed some time (but not necessarily directly) before another one. Here, the preparation step stuffing must be performed prior to the baking of the particular dish. The transitive data-flow edge can be used to express that a preparation step should process an ingredient whose result is later consumed by another preparation step. Here, the cheese has to be sprinkled over the particular dish before the dish is placed in the oven for baking, i.e., a process for a casserole recipe is desired.
[Figure 4.4: Example transitive control-flow and data-flow connectedness. Left: a transitive control-flow edge stuff < bake; right: a transitive data-flow edge expressing that sprinkle (consuming cheese) is transitively data-flow connected to bake.]
The construction of transitive edges differs significantly from the usual definition of a workflow and would thus not straightforwardly gain acceptance among workflow modelers. Moreover, the computation time of the similarity assessment would increase considerably: the mapping approach of the semantic similarity measure would require that the workflows be extended by all relations $t_1 < t_2$ and $t_1 \triangleright^+ t_2$ for task nodes $t_1, t_2 \in N^T$, which would significantly enlarge the search space for the best possible mapping (see Sect. 3.5).
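The following sketch materializes both transitive relations for a workflow given as simple adjacency structures; it illustrates why extending workflows by all pairs t1 < t2 and t1 ▷+ t2 enlarges the mapping search space. Splits and joins are ignored in this sketch, and all names are illustrative.

def transitive_control_flow(successors):
    """All pairs (t1, t2) with t1 < t2, i.e., t1 is executed some time
    before t2; `successors` maps each node to its direct successors."""
    closure = set()
    for start in successors:
        stack = list(successors.get(start, ()))
        while stack:
            node = stack.pop()
            if (start, node) not in closure:
                closure.add((start, node))
                stack.extend(successors.get(node, ()))
    return closure

def transitive_data_flow(before, produces, consumes):
    """All pairs (t1, t2) with t1 |>+ t2: t1 |> t2 holds iff t1 < t2 and a
    data node produced by t1 is consumed by t2; the closure chains links."""
    direct = {(t1, t2) for (t1, t2) in before
              if produces.get(t1, set()) & consumes.get(t2, set())}
    closure = set(direct)
    changed = True
    while changed:  # naive fixpoint computation of the transitive closure
        changed = False
        for (a, b) in direct:
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure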
4.7 Resulting Modeling Support Scenarios

The presented query language POQL is primarily designed to support the automatic construction of workflow models. It is therefore based on a similarity measure in order to retrieve suitable workflows even if exact matches cannot be identified. Furthermore, this makes it possible to guide the subsequent adaptation process. Both usage scenarios are illustrated in more detail below.

For the purpose of modeling support by workflow retrieval, POQL can simply retrieve the best-ranked or the k best-ranked workflows from the repository. As the query is divided into several workflow components, it is very expressive. Thus, it is possible to define various desired and undesired workflow components, which enables a more expressive search for workflow models than traditional workflow similarity measures. Moreover, POQL would also be suitable for searching for workflow fragments or for the auto-completion of a partially modeled workflow by searching for a matching workflow from the repository.

The POQL query can further be used to automatically guide the adaptation either of a certain workflow selected by the user or directly after a best-matching workflow has been identified. In the first case, an adaptation query Q has to be defined, while in the second case the already given query Q can be used to adapt the workflow according to the user's requirements. Based on the query Q, elements that do not match can be identified, which makes it possible to determine the elements to be deleted from or added to the workflow. More precisely, the similarity mappings of the employed semantic workflow similarity measure enable the identification of the parts of the corresponding workflow W that have to be changed in order to satisfy a restriction component. This means that if an undesired component of the query has a similarity value of 1, an arbitrary part of the workflow W that is matched by this
component has to be deleted or changed (in case the component consists of a single element, this denotes the element that has to be removed). Likewise, any element of the desired workflow components that has a similarity value of less than 1 is not fulfilled and has to be added to the workflow in some way. All desired elements that are matched with a similarity value of 1 must in general not be changed during adaptation, as this would reduce the ranking value. Furthermore, adaptation must avoid the creation of components violating the undesired components of the query. Since particular modifications can be assessed with regard to query fulfillment by computing the corresponding ranking value, this provides an opportunity to select the most appropriate workflow adaptation.
4.8 Conclusions and Related Work

POQL is a capable query language for guiding the entire workflow modeling assistance. It extends traditional similarity search with increased expressiveness, enabling the definition of undesired properties of a workflow. The asymmetric characteristic of the similarity measure means that not all properties of the desired workflow have to be defined. If this is required in a certain application scenario (e.g., find a recipe that prepares a dish from exactly the ingredients currently available in the kitchen), the similarity measure must be adjusted to a symmetric similarity measure (see [24]). POQL-Full queries are highly intuitive, as the query construction is highly similar to modeling a workflow. Thus, a query can easily be defined via a graphical workflow editor, which includes the available task and data node labels defined in the corresponding taxonomy (see Sect. 3.3). Furthermore, this chapter presented a lightweight version of the query language: POQL-Lite has been introduced with reduced expressiveness in favor of intuitiveness, since investigations showed that a lightweight version is sufficient in certain situations. Both approaches can easily be exchanged with each other, since an implementation of POQL-Full also enables the definition of POQL-Lite queries. Thus, depending on the particular application and domain, an appropriate implementation can be chosen. This chapter further discussed the validation of query consistency and possible extensions of the query language.

Related query languages can mostly be classified into similarity search and querying [60]. Similarity search of workflows is an important research field, which is reflected by the large number of measures presented, including graph edit metrics, graph/sub-graph isomorphisms, and most common subgraph approaches [11, 20, 58, 110, 133]. A survey by Becker and Lauer [13] showed that such similarity measures can facilitate many application scenarios for business process models, but are usually tailored to these scenarios with regard to their inherent properties. This chapter demonstrated, however, that similarity search covers only a limited range of possible application scenarios for business process modeling assistance, exposing the demand for a novel query language. For business process querying, several approaches have been presented. These approaches are usually also of limited use for the intended workflow modeling assistance, since querying does not enable a ranking of workflows. Thus, they cannot be used to assess inexactly matching workflows and cannot guide the automatic workflow adaptation. BPMN-Q [6, 212, 7] is a query language that identifies matching BPMN business processes for a partially modeled workflow. This workflow is defined by its control-flow, meaning that data-flow information or undesired properties cannot be defined. The related approaches by Beeri et al. [14] or Markovic et al. also cannot capture undesired properties of the workflow. The process query language PQL [109] addresses the querying and adaptation of process models with SQL-like statements. In PQL, the required workflow adaptations must be explicitly defined. Another approach for the purpose of workflow (fragment) querying is the use of description logic [78]. In general, query languages can be used for a broad range of usage scenarios including the auto-completion of workflow models, workflow reuse by search, or fragment substitution [140]. Furthermore, as illustrated by Markovic et al. [140] and Awad [6], they can also be applied in additional scenarios such as the dependency analysis between workflow elements for the determination of adaptation effects [54] and for decision-making support [96].
5 Workflow Adaptation

The demands on workflow modeling assistance have shifted from workflow reuse towards the creation of individual workflow models tailored to particular needs. Workflow reuse by searching for the best-matching workflow is no longer sufficient, owing to the fact that workflows increasingly need to suit an individual application scenario. As a consequence, reusing the provided workflow would require a laborious manual adaptation process in order to satisfy the restrictions and requirements of the individual situation. This significantly hampers the successful reuse of workflows and the entire modeling support. Consequently, there is a high demand for enhancing workflow modeling assistance with an automatic adaptation of workflows.

Generally speaking, adaptation to particular needs is basically a problem-solving scenario. As described by Newell and Simon [179] in 1972, human problem solving requires identifying a sequence of operators that can transform the initial problem state into the desired goal state. For workflow adaptation this means that several adaptation steps are performed in order to gradually transform the workflow such that it better matches the particular scenario. More precisely, an initial workflow $w_0$ is modified by an adaptation step $as_1$ into a workflow $w_1$, which can be adapted again. The transformation finally results in an adapted workflow $w_n$ (see Formula 5.1).
$$w_0 \xrightarrow{as_1} w_1 \xrightarrow{as_2} \ldots \xrightarrow{as_n} w_n \quad (5.1)$$
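A greedy hill-climbing loop is one simple way to realize such a chain of adaptation steps; the sketch below uses the query fulfillment of Chapter 4 as its objective. The concrete search strategies of the three approaches presented in this chapter differ from this naive loop, and all names are illustrative.

def adapt(workflow, query, candidate_steps, qf, max_steps=50):
    """Chain adaptation steps w0 -> w1 -> ... -> wn (Formula 5.1) as long
    as the query fulfillment improves.

    candidate_steps(w): yields workflows reachable by one adaptation step;
    qf(query, w): the query fulfillment measure QF(Q, W)."""
    current = workflow
    current_qf = qf(query, current)
    for _ in range(max_steps):  # bounds the trade-off with processing time
        scored = [(qf(query, w), w) for w in candidate_steps(current)]
        if not scored:
            break
        best_qf, best = max(scored, key=lambda pair: pair[0])
        if best_qf <= current_qf:
            break  # no single step improves the fulfillment any further
        current, current_qf = best, best_qf
    return current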
Each adaptation step modifies the workflow by adding missing desired properties or removing undesired ones in order to construct a better-matching workflow. Thus, an appropriate sequence of adaptation steps has to be identified in order to maximize the query fulfillment of the final workflow. This usually results in a trade-off between the final query fulfillment and the adaptation processing time. The adaptation steps can be performed by various adaptation approaches. In general, any adaptation approach relies on some kind of adaptation knowledge defining the possible workflow modifications. The manual acquisition of such adaptation knowledge, however, is commonly a laborious task: the semantics of the particular adaptation knowledge have to be known, and its effects on the adaptation algorithm in
many different application situations have to be considered. Moreover, comprehensive adaptation requires a substantial amount of adaptation knowledge to be defined. Overall, this results in a so-called acquisition bottleneck for adaptation knowledge [93], which hampers adaptation capabilities significantly. Consequently, the automatic learning of adaptation knowledge is of vital importance for the intended workflow modeling assistance.

This chapter presents three different adaptation approaches, namely adaptation by generalization and specialization, compositional adaptation, and transformational adaptation. These approaches are based on principles successfully applied in Case-Based Reasoning, and each of them learns the required adaptation knowledge automatically from the workflow repository (see Sect. 2.4.5). The three approaches are briefly summarized below.

• Generalization and Specialization: A generalized workflow [170] consists of generalized workflow elements, each of which represents a set of data nodes or task nodes, respectively. Thus, it has an increased coverage, as it substitutes a set of possible workflows. A generalized workflow can be specialized to particular needs by selecting appropriate task and data nodes for each generalized workflow element. Thus, generalization in combination with specialization provides a means for the adaptation of workflows.

• Compositional Adaptation: The basic idea of compositional adaptation of workflows [168] is to analyze the workflows in the repository and to decompose them into meaningful sub-components, called workflow streams. A given workflow can then be adapted to individual demands by replacing its workflow streams with more appropriate ones from other workflows within the repository.

• Transformational Adaptation: Transformational workflow adaptation [171] is implemented by means of adaptation operators explicitly describing possible modifications of the workflow. More precisely, they determine modifications that add, remove, or replace certain workflow fragments. By chaining various adaptation operators, the workflow can be adapted to a more suitable workflow according to a given query.

All three methods support the adaptation of a workflow with regard to a given POQL query (see Chap. 4) describing the requirements and restrictions of the desired workflow. Furthermore, all approaches learn the required adaptation knowledge automatically from the workflow repository. Thus,
the acquisition bottleneck for adaptation knowledge is avoided. As a result, no additional effort is required to implement the particular workflow adaptation approaches if the semantic similarity measure and the corresponding ontologies for workflow retrieval have already been constructed (see Sect. 3.5). Moreover, the three approaches ensure the syntactical correctness of the adapted workflow. This prevents the user from having to manually repair malformed workflow models; otherwise, the utility of the workflow modeling assistance would be significantly reduced, as it might produce flawed and inexecutable workflow models. In order to achieve this, each modification ensures that the workflow remains a consistent and block-oriented workflow, i.e., that the workflow is syntactically valid with regard to control-flow as well as data-flow (see Sect. 3.2.1).

The remainder of this chapter is organized as follows: First, the three briefly sketched adaptation approaches are presented in more detail. Then, the properties of the presented adaptation methods are analyzed and compared in Section 5.4. Next, it is shown that these adaptation approaches can be integrated and combined within a single adaptation framework, such that a comprehensive adaptation process is constructed. This process aims at utilizing the advantages and compensating for the disadvantages of the particular approaches in order to provide more capable adaptation support. Finally, the chapter is summed up with conclusions and related work regarding workflow adaptation.
5.1 Generalization and Specialization

Generalization is a basic concept in Machine Learning (e.g., [213, 150, 192]) and is also successfully applied in Case-Based Reasoning (e.g., [241, 274, 28, 142]). While in CBR (see Sect. 2.4) a traditional case is considered to provide a solution for a single problem, a generalized case describes solutions that cover a range of problems. Thus, a generalized case can be adapted to a specific problem situation.

Applying generalization to workflows means that certain task or data nodes are denoted by generalized labels (e.g., meat) from the taxonomies (see Sect. 3.3). Each generalized label thereby represents a class of more specialized labels (e.g., chicken, pork, or beef). Thus, a generalized workflow represents a set of possible workflows. However, an executable workflow usually requires a concrete specification for each task and data node. Thus, for an executable workflow model, a specialization of the generalized workflow is required, which replaces the generalized labels (e.g., meat) with a concrete specialization
(e.g., chicken, pork, or beef). Furthermore, in order to suit the demands of the user, the specialization must consider the current application scenario, i.e., the labels are specialized such that they best match the requirements and restrictions given in the query. The modification of labels by means of generalization and specialization thus enables the adaptation of a workflow according to a specific query. Thereby, the structure of the workflow is not changed, which ensures that the syntactical correctness of the workflow is not violated by adaptation.

Generalization of workflows may additionally be useful beyond adaptation, as it can improve the performance of workflow retrieval. If structurally identical workflows can be represented by a single generalized workflow, only the generalized workflow has to be stored. This reduces the size of the workflow repository and consequently speeds up the retrieval time for identifying the best-matching workflow.

The basic generalization approach presented in this section has been published by Müller and Bergmann [170] (© of original publication by AAAI Press 2015). The remainder of this section first introduces the concept of generalized workflows. Next, an algorithm is presented that learns appropriate workflow generalizations automatically from the repository of workflows. Finally, an approach is described that employs such generalized workflows for the purpose of workflow adaptation.
5.1.1 Generalized Workflows

The concept of generalized workflows sketched above is now formalized in Definition 14. A workflow W* is a generalization of a workflow W if both workflows are structurally equal (a graph isomorphism I : N → N* exists) and all labels of the generalized workflow W* are identical to or more general than the respective labels of the workflow W. Generalized labels are those representing a class of labels (see Sect. 3.3), i.e., labels from a higher level in the taxonomy. As an example from the cooking domain, the generalized ingredient meat covers particular ingredients such as pork and chicken, while the generalized preparation step heat represents activities such as bake and boil. If a workflow W* is a generalization of the workflow W, the workflow W is also a possible specialization of the generalized workflow W*. Thus, generalization and specialization describe an inverse relationship.
Definition 14. A workflow W* is a generalization of the workflow W (denoted as $W \sqsubseteq W^*$), iff there exists a graph isomorphism $I : N \rightarrow N^*$ between the workflows W and W* such that $\forall n \in N : S(n) \sqsubseteq S(I(n))$.

Following these considerations, definitions of equivalent workflows and strictly generalized workflows based on the comparison of two workflows are introduced (see Def. 15). Two workflows are equivalent if they are structurally identical and their related labels are also identical. A workflow is a strict generalization if it is structurally equal to the other workflow and at least one of its labels is more general.

Definition 15. Two workflows $W_1, W_2$ are equivalent, denoted by $W_1 \equiv W_2$, iff $W_1 \sqsubseteq W_2 \wedge W_2 \sqsubseteq W_1$. Furthermore, $W \sqsubset W^*$ denotes that W* is a strict generalization of workflow W, iff $W \sqsubseteq W^* \wedge W^* \not\sqsubseteq W$.

An example illustrating these relationships is depicted in Figure 5.1. Here, the workflows $W_1$ and $W_2$ are structurally equivalent, and both represent the process of preparing a sauce. $W_2$ contains two variables $x_1$ and $x_2$ to demonstrate various possibilities of label assignments resulting in different relationships between the two workflows. Consequently, if $x_1$ were defined as italian seasoning and $x_2$ as mix, both workflows would be considered equivalent. If instead $x_1$ were generalized to seasoning or the task label $x_2$ were generalized to make small, $W_2$ would be a strict generalization of workflow $W_1$.
[Figure 5.1: Example generalization/specialization relationships. Two structurally identical sauce-preparation workflows are shown: $W_1$ with the data nodes italian seasoning, mayonnaise, onions, mustard, and sauce and the tasks mix, cut, and add; and $W_2$, in which italian seasoning is replaced by the variable $x_1$ and the task mix by the variable $x_2$.]
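Assuming the graph isomorphism I between two structurally equal workflows is already given as a node mapping, the generalization relation of Definitions 14 and 15 can be checked label by label, as in the following sketch (the taxonomy is again a parent map; all names are illustrative).

def label_subsumes(general, specific, parent):
    """S(n) is subsumed by S(I(n)): the general label equals the specific
    one or is an ancestor of it in the taxonomy."""
    label = specific
    while label is not None:
        if label == general:
            return True
        label = parent.get(label)
    return False

def is_generalization(iso, labels, parent):
    """W ⊑ W* (Definition 14): for every node n of W, the label of the
    isomorphic node I(n) subsumes the label of n."""
    return all(label_subsumes(labels[iso[n]], labels[n], parent)
               for n in iso)

def is_strict_generalization(iso, labels, parent):
    """W ⊏ W* (Definition 15): W* generalizes W, but not vice versa."""
    inverse = {w_star: w for w, w_star in iso.items()}
    return (is_generalization(iso, labels, parent)
            and not is_generalization(inverse, labels, parent))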
Automated Workflow Generalization
Based on the previously formalized definitions of generalized workflows, this section presents an algorithm for learning appropriate workflow generalizations automatically from the workflow repository. It employs the taxonomies of task and data labels as illustrated in Section 3.3. Referring to
this, $\Gamma^\uparrow(\gamma)$ denotes a one-step generalization (the next more general label) and $\Gamma^\downarrow(\gamma)$ defines a one-step specialization (the next more specialized labels) for a given label $\gamma$. Furthermore, each generalized label $\gamma$ is annotated with a similarity value $sim_\psi(\gamma)$ denoting the similarity between all its specialized labels. Consequently, this taxonomic structure determines the relationships between the data and task labels, which serves as a knowledge source during generalization, defining the possible generalizations of particular workflow nodes.

The main idea of automated generalization is based on the assumption that if similar workflows with similar labels exist, these labels can be generalized. This is because similar workflows indicate that a single generalized workflow could appropriately cover all of their application scenarios. Hence, the algorithm basically generalizes a label of a workflow if similar labels occur within similar workflows. For example, if many similar cooking workflows (e.g., all representing similar pasta dishes) with various kinds of meat exist, it can be assumed that the particular decision of which meat is used is not highly relevant, i.e., any kind of meat can potentially be chosen. Hence, the corresponding labels within the workflows could each be replaced by a generalized label meat. As the similarities and the relationships of the labels are based on the particular taxonomy, its proper definition is an important factor for appropriate generalizations.

In order to derive additional information for generalization decisions, the approach further employs the data-flow and control-flow information of the workflows. More precisely, similar labels should also occur in similar contexts (similar data- and control-flow) within the workflows. Overall, these assumptions aim at avoiding arbitrary generalizations and providing mostly reasonable ones. Otherwise, inappropriate generalizations may lead to undesirable workflow adaptations. An unreasonable generalization, for example, would be a recipe for stuffed zucchinis in which the zucchinis are generalized to vegetables. These vegetables could later be specialized to beans, resulting in a stuffed beans recipe. Obviously, beans can hardly be stuffed, resulting in an insufficient quality of the provided cooking workflow.

The approach for automated generalization is now explained in more detail. Workflows that are sufficiently similar for deriving appropriate generalization information are determined based on a similarity threshold parameter $\Delta_W \in [0, 1]$. Hence, for a workflow W, the set of sufficiently similar workflows $RP_W \subseteq RP$ within the workflow repository RP comprises those having at least a similarity value of $\Delta_W$ according to the semantic workflow similarity measure (see Sect. 3.5). Consequently, $RP_W = \{W' \in RP \mid W' \not\equiv W \wedge sim(W, W') \geq \Delta_W\}$ denotes the set of workflows that are employed during the generalization of a particular workflow W.
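The set RP_W can be computed directly from its definition, as the following small sketch shows; sim and equivalent stand for the semantic workflow similarity and the equivalence test of Definition 15, and are placeholders.

def sufficiently_similar(workflow, repository, sim, equivalent, delta_w):
    """RP_W: all workflows of RP that are not equivalent to W and reach at
    least the similarity threshold Delta_W (see Sect. 3.5)."""
    return [w for w in repository
            if not equivalent(w, workflow) and sim(workflow, w) >= delta_w]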
[Figure 5.2: Example generalization. The node chicken (n) of workflow W is mapped by $m_{max}$ to pork ($n_1$) in workflow $W_1$ and to beef ($n_2$) in workflow $W_2$, where $sim(W, W_1) \geq \Delta$ and $sim(W, W_2) \geq \Delta$; in the generalized workflow W* the node is labeled meat (n*).]
The workflow fragments illustrated in Figure 5.2 will be used to explain the generalization approach based on the cooking domain. Here, W represents the workflow to be generalized. The set of sufficiently similar workflows $RP_W$ with regard to the similarity threshold $\Delta_W$ consists of the depicted workflows $W_1$ and $W_2$, i.e., $RP_W = \{W_1, W_2\}$. During the computation of the sufficiently similar workflows $RP_W$, the best possible mapping $m_{max}$ between the workflow W and each workflow $W' \in RP_W$ was determined in order to estimate the workflow similarity (see Sect. 3.5). This mapping is highly useful for the generalization approach, as it provides information about which nodes of the two workflows are related to each other. For a particular node $n \in N$ of workflow W, the set of related node labels from the sufficiently similar workflows $RP_W$ is defined as $\Omega(n) = \{\gamma' \in \psi \mid \exists (N', E') \in RP_W \wedge \exists n' \in N' : m_{max}(n) = n' \wedge \gamma' = S(n')\} \cup \{S(n)\}$. Consequently, the set of related node labels comprises all labels of mapped nodes as well as the original label of the respective node n of workflow W. In the example illustrated in Figure 5.2, the node chicken is mapped to pork in workflow $W_1$ and to beef in workflow $W_2$. Thus, $\Omega(n) = \{pork, beef, chicken\}$ states that for node n (chicken), the labels pork, beef, and chicken have been identified as related labels in sufficiently similar workflows. The use of this mapping function ensures that the context of the nodes is regarded in two respects: First, labels are only mapped to labels of the same type, i.e., task or data labels, respectively. Second, the control-flow and
data-flow of the nodes are also considered, since the semantic similarity aims at identifying the best possible mapping for the nodes and corresponding edges. Consequently, labels are usually only considered related if they occur in the same context within the workflow. In the cooking domain, for example, the context of stuffing a vegetable usually differs from cutting or mashing a vegetable, i.e., not all vegetables can be meaningfully stuffed, as previously described.

Whether a node n is finally generalized to its parent label $\gamma^* = \Gamma^\uparrow(S(n))$ is determined by means of two criteria: the similarity values annotated in the taxonomy (see Sect. 3.5) and the related node labels within similar workflows. The first criterion aims at considering the substitutability of the particular labels. Therefore, the annotated similarity value $sim_\psi(\gamma^*)$ of the parent label is regarded, which denotes the similarity between its child labels. Since a higher similarity value indicates that sibling labels can more likely be substituted for each other, labels are more likely generalized if their parent label is annotated with a higher similarity value. The second criterion is used to assess the evidence for an appropriate generalization, i.e., whether similar labels are contained within sufficiently similar workflows. More precisely, the evidence measure is defined as the proportion of related node labels found in $RP_W$ among the possible one-step specializations (child labels) of the parent label, i.e.,

$$\varphi(\gamma^*) = \frac{|\{\gamma' \mid \gamma' \in \Omega(n) \wedge \gamma' \in \Gamma^\downarrow(\gamma^*)\}|}{|\{\gamma' \mid \gamma' \in \Gamma^\downarrow(\gamma^*)\}|}$$

Consequently, the more sibling labels with similar contexts (validated by the mapping) are identified in sufficiently similar workflows, the higher the value of $\varphi(\gamma^*)$. As an example, the evidence for generalizing chicken to meat is higher if the set of related node labels contains pork, beef, turkey, and chicken instead of only pork and chicken.

Both criteria are then aggregated into a single value by the arithmetic mean, i.e., $\frac{sim_\psi(\gamma^*) + \varphi(\gamma^*)}{2} \geq \Delta_\psi$. A parameter $\Delta_\psi \in [0, 1]$ specifies the final threshold that determines whether a particular node is generalized to its one-step generalization $\gamma^* = \Gamma^\uparrow(\gamma)$ in the taxonomy (e.g., chicken is replaced by meat). Consequently, if the parameter $\Delta_\psi$ were set to 0.5, this would lead to generalizations either if the labels are considered identical (denoted as fully substitutable by a similarity of 1.0) or if each of the sibling labels has been identified as a related node label (i.e., maximum evidence for generalization).

The complete workflow generalization approach is summarized in Algorithm 1. For all nodes $n \in N$ of the workflow W, it is checked whether the label S(n) is to be replaced by its one-step generalization $\Gamma^\uparrow(S(n))$. For this
purpose, the set of related node labels $\Omega(n)$ is determined, and the fulfillment of the generalization condition with regard to the threshold $\Delta_\psi$ is verified. The algorithm results in a one-step generalized workflow, meaning that the labels of the workflow W are replaced by their one-step generalizations (parent labels from the taxonomy) where appropriate. For further generalization, the algorithm can be reapplied to inductively generalize a workflow. Additionally, all information of the original workflow is retained, as the original label prior to generalization is stored in $S_{orig}(n)$, unless this value has already been set by a prior generalization. More precisely, for all nodes that are not generalized, no original specification $S_{orig}(n)$ is defined, i.e., $S_{orig}(n) = \emptyset$. Thus, if a node is generalized for the first time, its original label prior to generalization is stored in $S_{orig}(n)$.

Algorithm GENERALIZE_WF(W, RP)
Input: Workflow W and workflow repository RP
Output: Generalized workflow W*
  Compute RP_W;
  forall n ∈ N do
    Compute Ω(n);
    γ ← S(n); γ* ← Γ↑(γ);
    if (sim_ψ(γ*) + ϕ(γ*)) / 2 ≥ ∆_ψ then
      S(n) ← γ*;
      if S_orig(n) = ∅ then S_orig(n) ← γ;
  return W

Algorithm 1: Workflow generalization
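A runnable Python counterpart of Algorithm 1, assuming the taxonomy is given as parent/children maps and that Ω(n) has been precomputed from the mappings into RP_W; all parameter names are illustrative.

def generalize_workflow(nodes, labels, orig_labels, omega,
                        parent, children, sim_psi, delta_psi):
    """One generalization pass over a workflow (cf. Algorithm 1).

    omega[n]: the related label set Ω(n); parent/children: Γ↑ and Γ↓ as
    dictionaries; sim_psi[label]: annotated similarity of a generalized
    label; delta_psi: the generalization threshold ∆_ψ."""
    for n in nodes:
        gamma = labels[n]
        gamma_star = parent.get(gamma)
        if gamma_star is None:
            continue  # label is already at the root of the taxonomy
        siblings = children[gamma_star]  # one-step specializations Γ↓(γ*)
        evidence = len(omega[n] & siblings) / len(siblings)  # ϕ(γ*)
        if (sim_psi[gamma_star] + evidence) / 2 >= delta_psi:
            labels[n] = gamma_star
            orig_labels.setdefault(n, gamma)  # keep the original label once
    return labels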
Repository Generalization
Based on this, the entire repository of workflows can easily be generalized (see Algorithm 2) as follows: Each workflow W ∈ RP in the repository is generalized according to the previously introduced algorithm and stored in a new repository RP*, unless this repository already contains an equal or more general workflow. In that case, the generalized workflow is not stored, as it is already covered by another workflow in RP*. This reduces the size of the workflow repository but also leads to a loss of information with regard to the original labels of the discarded workflow. The generalization of the workflow repository is iteratively repeated until no workflow can be
further generalized. The result is a generalized workflow repository in which each workflow carries potential adaptation knowledge by representing a set of possible specialized workflows.

Algorithm GENERALIZE_REPOSITORY(RP)
Input: Workflow repository RP
Output: Generalized workflow repository RP*
  generalization ← true;
  while generalization do
    generalization ← false;
    RP* ← ∅;
    forall W ∈ RP do
      W* ← GENERALIZE_WF(W, RP);
      if ¬∃W′ ∈ RP* : W* ⊑ W′ then RP* ← RP* ∪ {W*};
      if W* ≢ W then generalization ← true;
    RP ← RP*;
  return RP

Algorithm 2: Generalization of workflow repository
In principle, generalization could also be performed on demand, for example, as presented by Gaillard et al. [70] for RDFS ontologies. However, as this workflow generalization approach aims at learning only appropriate generalizations by considering all available knowledge stored in the repository, generalization becomes highly computationally expensive. Thus, the generalization of the workflow repository is pre-computed and stored as the case base of the PO-CBR system. Workflow retrieval for a particular query (see Chap. 4) then determines the best-matching generalized workflow.
5.1.2 Specialization of Workflows

Generalized workflows as previously sketched cannot be executed directly. For example, a sandwich recipe containing the ingredients bread and meat cannot be prepared in reality. Instead, the definition of a particular bread or meat type would be necessary (e.g., baguette or chicken). Thus, to enable execution, generalized workflows require a specialization of each generalized task and data label. However, in certain scenarios, the specialization of certain generalized labels is left to the involved users. For example, a cooking workflow containing the task cut represents an executable
recipe, although potential specializations such as cut into strips or cut into cubes exist. Here, the concrete specialization is chosen by the chef during workflow execution.

Based on these considerations, this section presents an automated approach to specialize a generalized workflow such that an applicable workflow model can be constructed. Furthermore, in order to suit the particular application scenario, the automated specialization considers the restrictions and requirements given in a query. Automated generalization and specialization thereby become a means for workflow adaptation: a workflow previously modeled for another scenario can be generalized and subsequently specialized, such that it satisfies the requirements of the new scenario, by modifying the task and data labels in a reasonable manner.

Since many possible specializations of a generalized workflow exist, a generalized workflow W represents a set of more specific workflows W+ (see Def. 16). The number of specific workflows further provides a measure for the adaptability of a particular generalized workflow W. More precisely, the coverage |W+| denotes the number of different specializations or adaptations possible and thus determines the number of workflows that can be created automatically from the workflow W.

Definition 16. The set of specific workflows W+ of a workflow W = (N, E) is defined as W+ = {W′ = (N′, E′) | W′ ⊑ W}. Further, the coverage of W is denoted by |W+|.

Determining the best specific workflow for a POQL query Q = (Q+, Q−) basically requires constructing all workflow specializations ⋃W∈RP∗ W+ from the generalized workflow repository RP∗ (see [28]) and identifying the best-matching specialized workflow rank1(Q, ⋃W∈RP∗ W+) by means of workflow retrieval (see Chap. 4). This would, however, vastly increase the size of the workflow repository, resulting in an unacceptable retrieval time for the best-matching specific workflow.

In order to circumvent this expensive procedure, an alternative approach is now illustrated, which exploits the fact that the best-matching generalized workflow for a given query Q generally comprises the best-matching specific workflow within the entire repository. This is because the ranking produced by the query fulfillment measure of POQL, which is based on the taxonomic similarity measure described in Section 3.5, already considers generalized labels during workflow retrieval (see Chap. 4). More precisely, the ranking already reflects the highest possible query fulfillment of a generalized workflow after specialization. Consequently, it is not possible to increase the query fulfillment any further by specialization. For example, if the query defines chicken as desired and the workflow contains meat, a corresponding query
fulfillment of 1.0 is determined. Specializing meat to chicken does not affect the corresponding query fulfillment QF(Q, W). If instead the query specifies that no chicken is desired, the query fulfillment only changes if meat is specialized to chicken (reduction of the query fulfillment). Any other arbitrary type of meat, such as beef or pork, can be chosen without affecting the query fulfillment QF(Q, W). Considering the desired and undesired properties of the query Q = (Q+, Q−) during specialization consequently ensures that the query fulfillment does not change. The generalized workflow with the highest query fulfillment can consequently lead to the best specialized workflow according to a given query Q.

An approach for specializing the best-matching generalized workflow W∗ according to the previous considerations is now illustrated, resulting in a specific workflow adapted to the particular scenario specified in the query Q = (Q+, Q−). The approach employs the mapping between the query workflow Q+ and the workflow W∗, which is already available since it is estimated during workflow retrieval (see Sects. 3.5 and 4.3). This mapping is hereinafter denoted as m+max. Based on this, the generalized workflow is specialized node by node. This means that the specialization is based on the successive replacement of the generalized node labels with specific ones according to the query. This simplifies the search for an optimal solution, as dependencies between various nodes are not considered, and further prevents multiple and elaborate re-computations of the mappings. Prior investigations showed that this heuristic approach is sufficient in most application cases; it is thus chosen in order to reduce the run-time of the algorithm.

In general, the node-by-node specialization aims at fulfilling desired properties and at avoiding undesired ones. Moreover, the original specification is considered, if appropriate. This ensures that information of the original workflow is reused, since it most likely leads to a higher quality of the resulting workflow. Additionally, the original specification can be utilized to determine an appropriate level of specialization. For example, if cheese is an original specification, it can be considered as a potential label occurring in a valid cooking workflow. It is further assumed that the query node itself is not a desired specialization unless it is a leaf node in the taxonomy. For example, if meat is specified as desired, usually a concrete type of meat would be expected in the resulting workflow.

For the node-by-node specialization, five scenarios have been derived, as illustrated in Figure 5.3. Each of them is described below by the generalized workflow node label meat, an example query, and the resulting specialization considering the depicted partial example taxonomy. The first two conditions
(1+2) aim at identifying a suitable specialization if a label from the query was mapped to the generalized node in the workflow. If no such mapping exists, the original specification is chosen unless it is undesired (Condition 3). If the previous conditions are not applicable, Conditions 4 and 5 determine a suitable specialization and additionally ensure that, if possible, more closely related labels from the taxonomy are considered according to the original specification.
[Figure 5.3: Example specialization scenarios: (1) specialization by the original label, (2) specialization by a leaf node of the query label, (3) specialization by the original label, (4) specialization by a sibling of the original label, (5) specialization by a leaf of the current label. The scenarios are illustrated with a partial example taxonomy (ingredients with the children meat and seafood; meat with the children beef, pork, and poultry; poultry with the children duck, turkey, and chicken; seafood with the children tuna and scampi).]
1. Assuming a first scenario in which poultry is desired, the generalized node meat is replaced by the original specification, i.e., chicken, because it represents a valid specialization of the desired query label and is not undesired.

2. In the second scenario, the original specification, i.e., chicken, would be inappropriate, since the query explicitly specifies this ingredient as undesired. Thus, the workflow node meat is replaced by a random leaf node of the mapped query node that is not undesired, e.g., turkey.
3. Another scenario occurs if the mapped query node is not related or no related mapping is present at all, such that a valid specialization according to the query cannot be identified (see scenario three). In this case, the label is replaced by its original specification, unless it is defined as undesired according to the query. In the illustrated example, meat is specialized to the original specification chicken, as no specialization of meat could fulfill the desired property that tuna is contained in the workflow.

4. In the fourth scenario, the query denotes that chicken is undesired. This means that the original specification is undesired. Thus, meat is specialized to a random sibling label of the original specification that does not lead to any undesired workflow element, here turkey. Sibling labels are assumed to be more likely replaceable with each other and thus more likely to result in an appropriate specialization. This aims at avoiding potential effects of over-generalization, since generalized nodes at a high level of the taxonomy represent a broad range of valid specializations.

5. In the last case, an appropriate sibling label cannot be identified, because no poultry should be contained. Then, a random leaf label of the taxonomy below the current label of the generalized node (i.e., meat) is chosen that represents a valid and not undesired specialization, here beef steak.

In order to describe the concrete implementation of the node-by-node specialization approach, formal notations are introduced below. For a generalized workflow W∗ = (N∗, E∗) and a query Q = (Q+, Q−), let nq+ := n ∈ N+ s.t. m+max(n) = n∗ specify a potential desired node of the query workflow Q+ = (N+, E+) mapped to the node n∗ ∈ N∗ of the generalized workflow W∗. Moreover, the undesired labels from the taxonomy for the given restriction workflow Q− = (N−, E−) are defined as γ− = {γ ∈ ψ | ∃n− ∈ N− : γ ⊑ S(n−)}.

Based on these notations, the workflow is specialized node by node according to the five previously sketched specialization scenarios (see Algorithm 3). If a desired query node nq+ is mapped to the generalized node n∗ ∈ W∗ such that the query label would represent a valid specialization (S(nq+) ⊏ S(n∗)), two cases are distinguished (1+2). If the original specification Sorig(n∗) is not undesired (Sorig(n∗) ∉ γ−) and is further either identical to the label of the mapped desired node nq+ or a valid specialization of it (Sorig(n∗) ⊑ S(nq+)), the original specification Sorig(n∗) is chosen as the specialization of the generalized node n∗ ∈ W∗
(1). Otherwise, all leaf labels ψ̲ of the taxonomy below the mapped query label that are not undesired (γ′ ⊑ S(nq+) ∧ γ′ ∉ γ−) represent potential labels for the generalized node n∗, and a random one is chosen as the new label of n∗ (2).

Algorithm SPECIALIZE WF(W∗, Q)
Input: Generalized workflow W∗, query Q = (Q+, Q−)
Output: Specialized workflow W ∈ W+
  Compute γ−;
  forall n∗ ∈ N∗ do
    if nq+ ⊏ n∗ ∧ Sorig(n∗) ⊑ nq+ ∧ Sorig(n∗) ∉ γ− then
      // original label, if matched by query label and not undesired (1)
      S(n∗) ← Sorig(n∗);
    else if nq+ ⊏ n∗ ∧ ∃γ′ ∈ ψ̲ : γ′ ⊑ nq+ ∧ γ′ ∉ γ− then
      // random leaf child of mapped query label, if not undesired (2)
      S(n∗) ← γ′;
    else if Sorig(n∗) ∉ γ− then
      // original label, if not undesired (3)
      S(n∗) ← Sorig(n∗);
    else if Sorig(n∗) ≠ ∅ ∧ ∃γ′ ∈ Γ↓(Γ↑(Sorig(n∗))) : γ′ ∉ γ− then
      // random sibling label of original label that is not undesired (4)
      S(n∗) ← γ′;
    else if Sorig(n∗) ≠ ∅ ∧ ∃γ′ ∈ ψ̲ : γ′ ⊑ S(n∗) ∧ γ′ ∉ γ− then
      // random leaf child of generalized label that is not undesired (5)
      S(n∗) ← γ′;
    else if Sorig(n∗) ≠ ∅ then
      // choose original label as fallback (6)
      S(n∗) ← Sorig(n∗);
  return W

Algorithm 3: Specialization of a workflow
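The cascade of conditions in Algorithm 3 translates almost directly into code. The following sketch is illustrative only and assumes a taxonomy object with the helpers strictly_below (⊏), below (⊑), leaves_below, and siblings; these names are not part of the presented formalism.

import random

def specialize_label(label, orig, query_label, undesired, tax):
    # label:       current (possibly generalized) label of node n*
    # orig:        original label S_orig(n*), or None if never generalized
    # query_label: label of the mapped desired query node, or None
    # undesired:   the set of undesired labels (gamma^-)
    if query_label is not None and tax.strictly_below(query_label, label):
        # (1) original label, if matched by the query label and not undesired
        if orig and tax.below(orig, query_label) and orig not in undesired:
            return orig
        # (2) random leaf below the mapped query label, if not undesired
        leaves = [l for l in tax.leaves_below(query_label) if l not in undesired]
        if leaves:
            return random.choice(leaves)
    if orig is None:
        return label                   # node was never generalized
    if orig not in undesired:
        return orig                    # (3) original label, if not undesired
    sibs = [l for l in tax.siblings(orig) if l not in undesired]
    if sibs:
        return random.choice(sibs)     # (4) sibling of the original label
    leaves = [l for l in tax.leaves_below(label) if l not in undesired]
    if leaves:
        return random.choice(leaves)   # (5) leaf below the generalized label
    return orig                        # (6) fallback: original label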
If the mapped query node is not a valid specialization of the current generalized label (i.e., ¬(S(nq+) ⊏ S(n∗))), none of the possible specializations of n∗ is related to the query, i.e., the query need not be considered. Thus, in case of irrelevant or missing mappings to the generalized node n∗, the original specification Sorig(n∗) is chosen, unless it is undesired (3). If the node n∗ has not yet been specialized by one of the previous conditions, it is specialized by a random not undesired sibling label (Γ↓(Γ↑(Sorig(n∗)))) of the original specification with regard to the taxonomy (4). Failing this,
the generalized node n∗ is replaced by a random leaf label γ′ ∈ ψ̲ of the taxonomy representing a valid and not undesired specialization with regard to the generalized node (γ′ ⊑ S(n∗)) (5). Consequently, the algorithm basically aims at replacing the generalized labels by the mapped desired query terms or the corresponding original specifications, and further avoids that specialization produces any undesired node according to the query. If no appropriate specialization can be identified by the previous conditions, the original specification is chosen (6). Please note that the label of node n∗ is not changed during specialization if the node was not generalized during the generalization of the workflow, i.e., Sorig(n∗) = ∅ (in this case, no original specification was set during generalization, as defined in Algorithm 1). The only exception is when a certain specialization is explicitly requested in the query and it represents a valid specialization according to the current node label (Condition 2).

By this approach, the query fulfillment of a workflow W for a POQL query Q is not changed, i.e., QF(Q, W) = QF(Q, SPECIALIZE WF(W, Q)). This is because undesired nodes are avoided in general and the specialization process of the generalized node n∗ considers the query. More precisely, if a desired query node nq+ was mapped to the generalized node n∗ such that it represents a valid specialization (nq+ ⊏ n∗), any child node y of the query node that is not undesired (y ⊑ nq+ ∧ S(y) ∉ γ−) results in the same similarity value according to the taxonomic similarity measure (see Sect. 3.5), i.e., simψ(S(nq+), S(n∗)) = simψ(S(nq+), S(y)) = 1. This also applies to the original specification, if it is a child node of the mapped query node (Sorig(n∗) ⊑ S(nq+)). Thus, Conditions 1 and 2 do not change the query fulfillment. Applying the remaining conditions (3+4+5) does not change the query fulfillment either, since undesired elements are avoided. More precisely, the same similarity value results for a generalized node when choosing a valid specialized label (y ⊑ n∗) that is not undesired (S(y) ∉ γ−), i.e., simψ(LCA(S(nq+), S(n∗))) = simψ(LCA(S(nq+), S(y))). In case the query cannot be satisfied (6), because all leaf labels below the generalized node n∗ and all sibling labels of the original specification Sorig(n∗) are undesired, replacing the current label of the generalized node with the original specification also does not change the query fulfillment, since both labels are undesired. Because specialization cannot increase the query fulfillment, this algorithm thus results in an optimal specific workflow. Consequently, for the best-matching generalized workflow, the specialization algorithm can be considered a useful heuristic for producing a specific
workflow with the same query fulfillment as the best-matching workflow from ⋃W∈RP∗ W+ with regard to a POQL query Q. The resulting workflow W is then specialized according to the requirements defined in the particular POQL query Q.
5.1.3 Summary

In summary, the presented approaches for generalization and specialization provide a means for the adaptation of workflows. The workflow repository is automatically generalized, whereby appropriate workflow adaptation knowledge is learned. A heuristic specialization approach then selects the best-matching generalized workflow from this repository and specializes it according to the restrictions and requirements denoted in the query. In order to diminish the loss of quality during this process, the approach limits the potential impact of over-generalization by storing the original labels of the workflow. Moreover, the use of the mappings ensures that the context of the nodes with regard to the control-flow and data-flow can be considered during specialization and generalization.
5.2 Compositional Adaptation by Workflow Streams

Compositional adaptation (see Sect. 2.4) is a well-established method in Case-Based Reasoning (e.g., [189, 191, 118]). This adaptation principle basically assumes that cases can be divided into several components, such that these components can be substituted with others in order to adapt the case towards varying demands. Accordingly, compositional workflow adaptation requires the determination of workflow components for each stored workflow. These workflow components could be explicitly defined as sub-workflows during modeling of the workflow. In order to obviate the need for defining components manually, the presented approach partitions a workflow automatically into so-called workflow streams. In this manner, the required adaptation knowledge is again learned automatically. Compositional adaptation of a workflow is then conducted by replacing its workflow streams with more appropriate ones from other workflows. A similar approach to adaptation has been presented by Dufour-Lussier et al. [63], in which the process is partitioned by certain manually defined types
of tasks. In contrast, the partition of the workflow into workflow streams can be determined fully automatically. The basic approach to compositional adaptation presented in this section has been previously published by Müller and Bergmann [168] (© of original publication by Springer 2014).

The remainder of this section will first present an approach to determine workflow streams automatically by decomposing all workflows stored in the workflow repository. Next, a compositional adaptation approach is introduced, which modifies a workflow for a given query by replacing workflow streams with more appropriate ones such that a higher query fulfillment is achieved. Finally, this section will discuss additional and alternative application fields of workflow streams for the purpose of workflow modeling support.
5.2.1 Automated Learning of Adaptation Knowledge

This section introduces a method to decompose workflows into meaningful components referred to as workflow streams. The definition of these workflow streams thereby aims to identify mostly independent components of a workflow that can easily be exchanged with each other. The idea of workflow streams is based on Davenport's perspective on processes, who stated that a process is "[. . . ] designed to produce a specific output [. . . ]" [55, p. 5] (see Sect. 2.1). In case of cooking workflows, the specific output is the particular dish finally produced, for example, pizza salami or spaghetti bolognese. In order to construct the specific output of a workflow, partial outputs have to be produced first and then combined to the final output of the workflow. Assuming that a workflow prepares a pizza dish, this workflow in general also prepares the corresponding dough, the sauce, and the toppings, which are then combined to create the particular pizza. The basic idea is that each workflow stream represents the production of such a specific partial output, for example, the partial process to produce the sauce.

The partial outputs of a workflow are hereinafter considered as those data nodes (or ingredients, respectively) that are newly produced. Such nodes were introduced in Chapter 3 as creator data nodes N^D∗ that are produced by specific activities referred to as creator tasks N^T∗. A creator data node d is linked only as an output to at least one task t, i.e., there is a task t ∈ N^T such that d is an output but not an input of t, while a creator task t accordingly is linked with at least one output data node d that is not among its input data nodes, i.e., there is a data node d ∈ N^D such that d is an output but not an input of t. As an example, Figure 5.4 illustrates a workflow representing the process to prepare a pizza dish. Here, the creator tasks of the workflow produce the partial outputs sauce and dough. These are then combined to create a pizza, which is subsequently layered with various toppings.
The illustrated workflow will be used in the following to demonstrate the formalized approach by an example scenario.

[Figure 5.4: Workflow and workflow streams: a pizza recipe workflow decomposed into stream 1 (PROCESS pizza), stream 2 (PREPARE sauce), stream 3 (PREPARE dough), and stream 4 (PREPARE pizza); the anchor data nodes sauce, dough, and pizza are marked with ∞.]
As can easily be derived from the example workflow in Figure 5.4, the sole execution of the particular creator task is not sufficient to produce the corresponding partial output. Instead, previous workflow activities also have to be performed. For example, the creation of the sauce requires that tomatoes are peeled and mashed and finally seasoned with salt and pepper. Consequently, workflow streams usually involve multiple tasks, since they represent the entire partial processes to produce the specific partial output. In order to identify these partial processes, all activities need to be determined that are required before the particular creator task can be executed. This can be achieved by employing the data-flow information of the workflow. In Chapter 3 it was defined that two tasks t1, t2 ∈ N^T are data-flow connected, t1 ⋉ t2, if t2 consumes a data node produced by the prior activity t1. Thus, to ensure the correct processing of the data node, task t1 needs to be executed prior to task t2. Furthermore, t1 ⋉∗ t2 denotes the corresponding transitive data-flow connectedness between two tasks, and t1 ⋉^d t2 describes that t1 is data-flow connected to t2 via the data node d. In the illustrated example workflow in Figure 5.4, knead ⋉ roll or grate ⋉∗ sprinkle would hold. Additionally, the tasks grate cheese and sprinkle cheese over the pizza are data-flow connected via the data node cheese, i.e., grate ⋉^cheese sprinkle. Hence, the preparation step grate cheese has to be performed prior to sprinkle cheese over the pizza, and more importantly, this is also required in order to produce the final specific output, i.e., the pizza dish. Employing
the data-flow can thus serve as a means to identify related activities that need to be performed prior to the execution of a particular creator task.

The introduced data-flow connectedness can be utilized to decompose a workflow W into meaningful partial workflows. As a reminder, partial workflows basically represent a block-oriented workflow for a given set of tasks, including their linked data, related control-flow nodes, and the involved data- and control-flow edges (see Chap. 3). The decomposition into meaningful components then aims at identifying partial workflows such that each represents a process to produce a certain partial output. Moreover, these workflow components should be easily substitutable. This requires a clear affiliation of the associated tasks, since overlaps would hamper the independent replacement of components. Hence, each task is included in exactly one workflow component. Based on these considerations, each workflow W from the workflow repository RP is partitioned according to the definition given below.

Definition 17. A partition S^W = {S1^W, . . . , Sn^W} of a block-oriented workflow W is a set of partial workflows Si^W, such that each task t ∈ N^T is contained in a partial workflow Si^W and such that the tasks in each Si^W are transitively data-flow connected and not contained in any other partial workflow Sj^W, j ≠ i. All creator tasks y ∈ N^T∗ are end nodes of any partial workflow in S^W. Each partial workflow Si^W is referred to as a workflow stream.

More precisely, for each creator task y ∈ N^T∗ the transitively data-flow connected tasks are added to the set TS(y) (see Formula 5.2), unless they are data-flow connected to a predecessor creator task x ∈ N^T∗ in the workflow (creator tasks are end nodes). This ensures that the tasks are associated with the closest creator task. The workflow stream S is then created by constructing a partial workflow according to Definition 6 for the tasks TS(y). This stream then basically represents a partial workflow to produce the partial output (creator data node) determined by the creator task y.

TS(y) := {t ∈ N^T | t ⋉∗ y ∧ t ∉ N^T∗ ∧ ∄x ∈ N^T∗ : (t ⋉∗ x ∧ x < y)} ∪ {y}   (5.2)

Figure 5.5 illustrates an example of extracting the related tasks of a stream for a particular creator task y. The example creator task combines the sauce and the dough to produce a plain pizza. All tasks that are not relevant for the particular stream are marked by grey areas in the figure. More precisely, the stream must not contain any task that is a successor of the creator task (1), which here excludes all tasks that place the toppings
on the pizza after the plain pizza was produced. Furthermore, other creator tasks must not belong to the stream, since they are assigned to other streams per definition (2+3). Also, the tasks that are transitively data-flow connected to a previous creator task are excluded, because they are assigned to the stream of that creator task (4). In the example, the tasks peel and mash are used to produce the sauce and would consequently be part of that stream. Finally, all tasks that are not transitively data-flow connected to the creator task are excluded (5). Here, cut and grate are not part of the stream, since they are used to process the toppings but are not relevant for the production of the plain pizza. All remaining tasks then determine the tasks of the stream for the creator task y, i.e., knead, roll, and combine are the tasks that are extracted as a stream to produce the plain pizza (stream 4 in Fig. 5.4).
[Figure 5.5: Workflow stream extraction for a creator task: the tasks excluded from the stream of the creator task are marked as (1) successor tasks, (2+3) other creator tasks, (4) tasks connected to a previous creator task, and (5) tasks not connected to the creator task.]
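A compact way to operationalize Formula 5.2 is sketched below. It assumes a deliberately simplified workflow representation in which each task carries its control-flow position and its input and output data nodes; the names Task, dataflow_connected, and stream_tasks are hypothetical, and the construction of the actual partial workflow is omitted.

from dataclasses import dataclass

@dataclass(frozen=True)
class Task:
    name: str
    order: int                        # position in the control-flow sequence
    inputs: frozenset = frozenset()   # consumed data nodes
    outputs: frozenset = frozenset()  # produced data nodes

def dataflow_connected(t1, t2):
    # t1 ⋉ t2: t2 consumes a data node produced by the prior task t1
    return t1.order < t2.order and bool(t1.outputs & t2.inputs)

def transitively_connected(t1, t2, tasks):
    # t1 ⋉* t2, computed by a simple search over direct data-flow links
    frontier, seen = [t1], set()
    while frontier:
        t = frontier.pop()
        if t == t2:
            return True
        if t not in seen:
            seen.add(t)
            frontier.extend(u for u in tasks if dataflow_connected(t, u))
    return False

def stream_tasks(y, tasks, creators):
    # TS(y) of Formula 5.2: tasks transitively feeding the creator task y,
    # unless they are creator tasks themselves or already feed an earlier
    # creator task x < y
    result = {y}
    for t in tasks:
        if t in creators or not transitively_connected(t, y, tasks):
            continue
        feeds_earlier = any(x.order < y.order and transitively_connected(t, x, tasks)
                            for x in creators)
        if not feeds_earlier:
            result.add(t)
    return result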
According to Definition 17, each task in the workflow is contained in exactly one workflow stream Si^W. Thus, the remaining tasks not contained in any stream constructed for the creator tasks N^T∗ are also assigned to streams. This means that for each set of transitively data-flow connected remaining tasks, an additional workflow stream, i.e., a partial workflow, is constructed. For the example workflow (see Fig. 5.6), the tasks that place toppings on the pizza (1) and those preparing the toppings (2) remain after the extraction of streams for each creator task. Because all these tasks are transitively data-flow connected, they are assigned
to a single stream determining the process to place toppings on the pizza (stream 1 in Fig. 5.4).

[Figure 5.6: Workflow stream extraction for remaining tasks: the tasks placing toppings on the pizza (1) and those preparing the toppings (2) form one additional stream.]
In contrast to the prior workflow streams, which describe a partial process to produce a particular creator data node, these streams process a certain creator data node, i.e., the data node by which the tasks are data-flow connected. Hence, two classes of workflow streams are distinguished, namely producer streams producing a creator data node and processor streams processing a creator data node. For the workflow illustrated in Figure 5.4, the dashed lines represent three different producer streams (streams 2, 3, 4) determined by the particular creator tasks producing sauce, dough, or pizza, as well as an additional processor stream (stream 1), which places toppings on the pizza. Please note that the definition of workflow streams does not require that tasks within a stream are uninterruptedly connected with regard to the control-flow (e.g., see stream 1), as the affiliation is mostly determined via the data-flow. However, the determined workflow streams are disjoint; for example, task peel is not part of stream 4 to produce the pizza, since their transitive data-flow connection is separated by the creator task to produce the sauce.

The decomposition of a workflow into workflow streams according to the previous definition constructs workflow components that are syntactically valid with regard to the control-flow, since they are block-oriented workflows according to the definition of partial workflows (see Sect. 3.2.2). Furthermore, if the workflow is consistent, this also applies to the determined workflow streams, since the data-flow is considered during stream construction. By this means, the construction also ensures a meaningful decomposition of the workflow.

Based on the determined workflow streams, compositional adaptation aims at adapting a workflow by replacing the workflow streams with more
appropriate ones. Streams are thereby considered appropriate if they basically represent the same partial process, but in a different manner, employing other task or data nodes. Such processes use the same data nodes that are consumed and produced by the other streams in the workflow. In the illustrated example workflow (see Fig. 5.4), an appropriate replacement for stream 4 must also consume the previously created sauce and dough to create the pizza, because otherwise the replacement stream would represent another kind of partial process. These designated data nodes thus represent the dependency conditions towards the remaining components of the workflow and are hereinafter referred to as the anchors of a particular workflow stream (see Def. 18).

Definition 18. For a workflow stream S, the set of input anchors is defined as S_A = {S(d) | d ∈ N_S^D ∧ ∃t ∈ N^T \ N_S^T, ∃t_S ∈ N_S^T : t ⋉^d t_S} and the set of output anchors is defined as S^A = {S(d) | d ∈ N_S^D ∧ ∃t ∈ N^T \ N_S^T, ∃t_S ∈ N_S^T : t_S ⋉^d t}.

According to this definition, the input anchors of a stream are the data nodes that are produced by a task of another stream, and the output anchors are those consumed by a task of another stream within the same workflow. In Figure 5.4, those anchor nodes are denoted by ∞, i.e., sauce, dough, and pizza. Here, sauce is an output anchor of producer stream 2 and dough is an output anchor of producer stream 3, respectively. Producer stream 4 is denoted by the input anchors sauce and dough and by the output anchor pizza. Thus, these anchors mark dependencies to other streams within the workflow. In the example, stream 4 can only be executed after stream 2 and stream 3 have produced the required sauce and dough data nodes, respectively. For processor stream 1, pizza is both an input and an output anchor. Extracting stream 1 from the workflow would then be represented as a partial workflow denoted with the particular anchors, as illustrated in Figure 5.7.
Figure 5.7: Stream S (stream 1 in workflow from Figure 5.4)
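On the simplified task representation of the earlier sketch, the anchor sets of Definition 18 can be computed directly. Again, this is only an illustrative fragment; in the presented approach, the anchors are annotated to the extracted partial workflows.

def anchors(stream, all_tasks):
    # Def. 18: input anchors are data nodes consumed inside the stream and
    # produced outside of it (t ⋉^d t_S); output anchors are data nodes
    # produced inside the stream and consumed outside of it (t_S ⋉^d t).
    outside = [t for t in all_tasks if t not in stream]
    input_anchors, output_anchors = set(), set()
    for ts in stream:
        for t in outside:
            if t.order < ts.order:
                input_anchors |= t.outputs & ts.inputs
            if ts.order < t.order:
                output_anchors |= ts.outputs & t.inputs
    return input_anchors, output_anchors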
The definition of workflow streams enables the automatic learning of the required adaptation knowledge (see Algorithm 4). To this end, each workflow W = (N, E) ∈ RP in the repository is decomposed into workflow streams (see Def. 17) by extracting a corresponding stream for each creator task t∗ ∈ N^T∗. For stream extraction (see extractStream), the related tasks TS(t∗) are determined and a corresponding partial workflow (see Sect. 3.2.2) is constructed. Furthermore, the anchors of the stream (see Def. 18) are annotated to the partial workflow, i.e., the data nodes that are linked to other streams of the workflow. After the streams for each creator task have been constructed, the remaining tasks that are so far not assigned to any workflow stream are also extracted as workflow streams (see extractRemainingStreams). More precisely, an additional stream is extracted for each set of transitively data-flow connected remaining tasks. All streams are then stored in a separate workflow stream repository.

After setup, this repository is analyzed in order to remove duplicate workflow streams (see removeDuplicates). This means that as long as there exist two equal workflow streams (S1 ≡ S2), one of them is removed from the stream repository. Furthermore, if one stream is more general than another one in the repository (S2 ⊏ S1), only the more general workflow stream (S1) is retained, as it basically covers both streams. This can be seen as a kind of maintenance process of the adaptation knowledge container (see Sect. 2.4.6), since redundant and superfluous knowledge is eliminated, accelerating the subsequent adaptation process.

Algorithm EXTRACT STREAMS(RP)
Input: Workflow repository RP
Output: Set of workflow streams streams
  streams ← ∅;
  forall W = (N, E) ∈ RP do
    forall y ∈ N^T∗ do
      stream ← extractStream(W, TS(y));
      streams ← streams ∪ {stream};
    streams ← streams ∪ extractRemainingStreams(W);
  streams ← removeDuplicates(streams);
  return streams
Algorithm 4: Basic extraction algorithm of workflow streams
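Building on the previous sketches, the structure of Algorithm 4 can be outlined as follows. Streams are represented here merely as task sets; constructing the partial workflows, annotating the anchors, and the generality-based duplicate removal are elided, so this is a structural illustration only.

def extract_streams(workflows):
    # workflows: iterable of (tasks, creator_tasks) pairs
    streams = set()
    for tasks, creators in workflows:
        assigned = set()
        for y in sorted(creators, key=lambda t: t.order):
            ts = stream_tasks(y, tasks, creators)    # TS(y), Formula 5.2
            streams.add(frozenset(ts))
            assigned |= ts
        # remaining tasks: one additional stream per transitively
        # data-flow connected group
        remaining = set(tasks) - assigned
        while remaining:
            group = {remaining.pop()}
            grew = True
            while grew:
                grew = False
                for t in list(remaining):
                    if any(dataflow_connected(t, u) or dataflow_connected(u, t)
                           for u in group):
                        group.add(t)
                        remaining.discard(t)
                        grew = True
            streams.add(frozenset(group))
    return streams    # set semantics already removes exact duplicates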
5.2.2 Application of Adaptation Knowledge

Once a workflow stream repository has been initialized according to Section 5.2.1, a workflow W can be adapted towards the restrictions and requirements defined in a POQL query Q = (Q+, Q−). This means that the compositional adaptation aims at achieving a higher query fulfillment for the particular query by replacing workflow streams. A stream from the workflow stream repository can roughly be regarded as more appropriate than a stream within the workflow W if it is characterized by a higher query fulfillment with regard to the query Q. Thus, the replacement would adapt the workflow such that more desired properties are fulfilled and fewer undesired ones are contained. Replacing those streams consequently leads to a higher query fulfillment of the workflow and thereby adapts it according to the users' demands.

In the next sub-section, substitute stream candidates are defined, which ensure that the replacement of a stream does not violate the consistency of the adapted workflow. Then, the replacement of streams within a workflow is explained in more detail. Based on this, an algorithm for the compositional adaptation of a workflow is finally presented.

Replaceable Workflow Streams

Adaptation of workflows by exchanging streams fundamentally requires determining those streams that can be replaced with each other without violating the consistency of the workflow. Such streams are hereinafter referred to as substitute stream candidates. They can be identified by considering the anchors of the particular stream, which denote the dependencies towards the remaining workflow. More precisely, stream S′ is a substitute stream candidate for stream S if the streams have matching input and output anchors (see Def. 19). Anchors of the substitute stream candidate S′ are matching if their labels are equal to or more general than those of stream S. Thus, workflow streams defined at a more general level, i.e., covering a broader range of possible workflow specifications, are also potential substitute stream candidates.

Definition 19. A stream S′ is a substitute stream candidate for a given stream S iff the anchor labels of the substitute stream candidate S′ are equal to or more general than those of stream S, i.e., S_A ⊑ S′_A ∧ S^A ⊑ S′^A.

This definition ensures that replacing a stream does not violate the consistency of the workflow, since the dependencies towards the remaining workflow
components are considered. More precisely, the substitute stream candidate consumes the corresponding data nodes provided by prior streams and produces the corresponding data nodes required for the execution of successor streams. Thus, each data node is still contained in the final output of the workflow after stream replacement (see Def. 5). In this regard, the anchors of a workflow stream determine consistency constraints [16, pp. 151-152] on the adaptation process. Provided that these constraints are regarded during adaptation, the streams of the workflow can be replaced independently of the remaining workflow components.

As intended, none of the streams within the workflow illustrated in Figure 5.4 can be replaced with one another, since they represent different processes to produce or process various partial outputs. More precisely, replacing stream 2 would require a stream that also produces a sauce without having further dependencies. A substitute stream candidate of stream 4 must be denoted with the input anchors dough and sauce as well as the output anchor pizza.
[Figure 5.8: Substitute stream S′: a processor stream with input and output anchor pizza, in which ground beef is cooked, onions are chopped and added, and cheese is sprinkled before baking.]
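Definition 19 amounts to a per-anchor subsumption test. A minimal sketch, assuming that each extracted stream carries its annotated anchor label sets and that tax.below implements the taxonomic relation ⊑:

def is_substitute_candidate(candidate, stream, tax):
    # Each anchor label of `stream` must be matched by an equal or more
    # general anchor label of `candidate` (S_A ⊑ S'_A and S^A ⊑ S'^A).
    def covered(labels, general_labels):
        return all(any(tax.below(l, g) for g in general_labels)
                   for l in labels)
    return (covered(stream.input_anchors, candidate.input_anchors)
            and covered(stream.output_anchors, candidate.output_anchors))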
An example substitute stream candidate of stream 1 is illustrated in Figure 5.8. Both streams have identical input and output anchors, i.e., pizza (see ∞). Thus, it is ensured that both processor workflow streams consume the pizza data node, process this data node, and finally return it as an output data node. Hence, exchanging those two streams preserves the consistency of the workflow, as they are characterized by the same dependencies towards the remaining workflow.

Replacing Workflow Streams

The process of replacing a stream S in a workflow W by a substitute stream candidate S′ basically removes stream S and subsequently inserts the substitute stream candidate S′ into workflow W. This replacement is illustrated in more detail below.
Removing stream S from workflow W means that a partial workflow (see Sect. 3.2.2) is constructed, containing all tasks of W except those contained in S. By definition, this partial workflow also does not include data nodes and control-flow nodes that are only related to the activities of the particular stream. This partial workflow is the resulting workflow W after removing stream S. Since the resulting workflow is a partial workflow, it is a block-oriented workflow by definition. Consequently, removing a stream from a workflow ensures that a syntactically valid workflow with regard to the control-flow is constructed.

[Figure 5.9: Stream S removed from W]
As an example, Figure 5.9 shows the removal of workflow stream 1 from the workflow W illustrated in Figure 5.4. The grey-marked elements are those that are thereby removed from workflow W. As can be seen, the resulting workflow after removal is again block-oriented. Removing a stream, however, may violate the consistency of the workflow. This is, for instance, caused by removing a producer stream, since the particular creator data node required by other workflow components is not created anymore. Literally, the information how to produce the desired output anchor from the given set of input anchors is missing. If stream 4 of Figure 5.4 were removed, the information how to produce a pizza by use of dough and sauce would not be defined in the workflow. Thus, the data objects dough and sauce would not be contained in the final specific workflow output pizza, and consequently the consistency of the workflow would be violated.

In order to retain workflow consistency, a corresponding substitute stream candidate S′ comprising the missing information is again included into the workflow. The new stream S′ is inserted at the position of the last sequence node of the removed workflow stream S in W. This means that all edges, tasks, and data nodes (if not already present) of S′ are inserted into the workflow W. Then, the inserted stream S′ is connected
with control-flow edges such that the stream is linked at the old position (i.e., the last sequence node) of the removed stream S in W. In the special case that the last sequence node of S is a control-flow node that is still part of the partial workflow W after removing workflow stream S, the stream S′ is inserted behind this particular control-flow node. Inserting a workflow stream as described ensures the syntactical correctness, because a partial workflow (the new stream) is basically inserted at a valid position of the control-flow, such that the resulting workflow is again block-oriented. Since the substitute stream candidate consumes and produces the same data nodes related to other streams, the insertion also restores the consistency.

[Figure 5.10: Stream S′ added to W]
In the illustrated scenario, the stream S′ is inserted behind the last combine task (see Fig. 5.10), since this marks the position of the last sequence node of stream S, i.e., bake. Replacing a stream in such a manner can cause a shift of tasks. In the given example, the start nodes of the original workflow (see Fig. 5.4) and of the adapted workflow (see Fig. 5.10) differ. However, the insertion at the old position of the last sequence node ensures that dependencies towards the remaining workflow components are considered, i.e., the data is still processed in the right order. This is because the last sequence node of the workflow stream marks the position at which all required data nodes produced by prior workflow components are available at the latest, and also marks the position at which creator data nodes consumed by successor components are potentially produced. Since both the removal and the insertion of a workflow stream lead to a block-oriented workflow, the replacement of a workflow stream also results in a block-oriented workflow. Furthermore, as only substitute stream candidates are used for replacement, the consistency of the workflow is retained, since dependencies towards the remaining workflow components are considered. Thus, the compositional adaptation by workflow streams ensures syntactically valid workflows with regard to control-flow and data-flow.
Compositional Workflow Adaptation

The overall compositional workflow adaptation is implemented as a local hill-climbing approach, which replaces each of the workflow streams in the workflow by the most appropriate substitute stream candidate. Considering two workflow streams, a stream S′ is more appropriate than another stream S if it leads to a higher query fulfillment for a given query Q, i.e., QF(Q, S′) > QF(Q, S). If a query, for example, specified that ground beef is desired and salami is undesired, workflow stream 1 of Figure 5.4, containing salami but no ground beef, would be less appropriate than the stream illustrated in Figure 5.8, which entirely fulfills the query.

Assessing the substitution of a stream with respect to workflow adaptation, however, requires that the entire workflow and not only the particular workflow streams are considered. For example, a stream might be more appropriate than another one because it contains a desired workflow element specified in the query. If this desired workflow element is already included in other workflow components, the exchange of the particular streams would not increase the query fulfillment of the adapted workflow. In this regard, a stream replacement is only appropriate if it leads to a higher query fulfillment of the adapted workflow W′ compared to the original workflow W, i.e., QF(Q, W′) > QF(Q, W). The most appropriate stream replacement consequently maximizes the query fulfillment QF(Q, W′) of the adapted workflow W′ after replacement by the substitute stream candidate S′.

Based on these considerations, a heuristic hill-climbing approach for compositional workflow adaptation is now presented (see Algorithm 5). All streams in the workflow W are initially sorted by their execution order. This means that the workflow streams are sorted by the position of their last sequence node within the workflow. Next, the streams are successively replaced, starting with the first stream in the workflow. More precisely, for each stream S, the substitute stream candidate S′ with the highest query fulfillment is searched, i.e., rank1(Q, SC). The query fulfillment of the substitute stream candidate must be higher than that of the stream to be replaced, i.e., QF(Q, S′) > QF(Q, S). Furthermore, it is verified whether the replacement increases the query fulfillment QF(Q, W′) of the resulting workflow W′. If such a substitution can be found, the resulting workflow W′ is considered as the new workflow W, on which the remaining workflow streams are then successively replaced in the same manner. After the processing of all streams, the final workflow W is the resulting adapted
workflow of the compositional workflow adaptation approach. Since this approach is implemented by a hill-climbing algorithm, an optimal solution cannot always be ensured. However, this heuristic approach significantly speeds up the adaptation processing time. Otherwise, all possible combinations of substitute stream candidates would have to be considered, resulting in a large set of possible workflow adaptations to be compared.

Algorithm APPLY STREAMS(Q, W, streams)
Input: Query Q, workflow W, stream repository streams
Output: Adapted workflow W
  forall streams S in the workflow W (sorted by execution order) do
    SC ← substitutionCandidates(S);
    S′ ← rank1(Q, SC);
    if QF(Q, S′) > QF(Q, S) then
      W′ ← replaceStream(W, S, S′);
      if QF(Q, W′) > QF(Q, W) then W ← W′;
  return W

Algorithm 5: Compositional workflow adaptation
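Algorithm 5 corresponds to a straightforward greedy loop. In the sketch below, qf stands for the query fulfillment measure QF, while last_sequence_position, substitution_candidates, and replace_stream are assumed helpers implementing the operations described above; since a replacement only touches the stream at hand, the remaining streams can be processed on the adapted workflow.

def apply_streams(query, workflow, stream_repo, qf):
    # Greedily replace each stream by its best substitute candidate,
    # keeping a replacement only if the entire workflow improves.
    for s in sorted(workflow.streams, key=last_sequence_position):
        candidates = substitution_candidates(s, stream_repo)
        if not candidates:
            continue
        best = max(candidates, key=lambda c: qf(query, c))   # rank1(Q, SC)
        if qf(query, best) > qf(query, s):
            adapted = replace_stream(workflow, s, best)
            if qf(query, adapted) > qf(query, workflow):
                workflow = adapted
    return workflow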
5.2.3 Additional Application Fields of Workflow Streams

In the following, additional application fields of workflow streams for the purpose of workflow modeling support are sketched.

Workflow Abstraction

Workflow abstraction means that the level of detail of the workflow is reduced. This abstracted view supports the workflow designer in focusing on the most relevant workflow elements and the overall context of the modeled process. Thus, abstraction can support the challenging task of workflow modeling. Please note that abstraction differs from generalization [21]. While in CBR generalization is a means to represent a set of cases by a single case (see Sect. 5.1), abstraction aims at storing the case information hierarchically at different levels of detail. Applied to workflows, this means that workflow generalization addresses the replacement of single node labels with more general ones, whereas workflow abstraction aims at reducing the granularity of workflows such that the workflow is basically reduced to the most important workflow elements, i.e., particular workflow elements are explicitly not visualized or workflow elements are aggregated.
The workflow streams presented in this section could serve as a means for workflow abstraction, since they basically aggregate tasks into meaningful sub-processes. This means that each stream in the workflow could be replaced by an abstract task. Since workflow streams are either producer streams or processor streams, each abstract task then represents a sub-process that basically processes or creates a particular data node. This idea is illustrated in Figure 5.11, showing an example of an abstracted producer stream and an abstracted processor stream. Here, the abstract tasks describe that a sauce can be produced with tomatoes, salt, and pepper, or that ground beef, onions, and cheese can be used as a combination of pizza toppings. Both abstractions neglect the particular preparation steps involved and are each represented by an abstract task. Thus, the number of tasks is reduced, which also causes a reduction of the data-flow information with regard to the data links. A further data-flow related abstraction could be achieved by only retaining the anchor data nodes of the workflow (here, pizza or sauce).

[Figure 5.11: Workflow stream abstraction examples: a producer stream for sauce and a processor stream for pizza toppings, each collapsed into a single abstract task.]
These considerations further demonstrate that compositional adaptation by workflow streams is also a kind of hierarchical adaptation (see Sect. 2.4.5), since replaceable workflow streams are identical at a higher level of abstraction. A related approach to workflow abstraction has been presented by Garijo et al. [72, 73] for scientific workflows. Their so-called workflow motifs aim at providing an abstracted description of workflows, at highlighting overlapping parts between workflows, and at identifying common workflow fragments in the repository. Thus, the workflow designer is supported during workflow modeling, and the analysis of the particular workflow repository is enhanced.
Component-Based Workflow Modeling

From the perspective of workflow modeling support, workflow components are mostly considered as bricks from which workflows are constructed. The reuse of established workflow components significantly supports workflow modeling by decreasing the modeling time and increasing the quality and soundness of the constructed workflow model. Since workflow streams represent workflow components, they could also be considered as building blocks of the workflow. This means that the workflow modeler could construct a workflow simply by selecting and connecting appropriate building blocks (see Fig. 5.12). More precisely, a building block would basically represent a stream as an abstract task with the specified data anchors as ports. The workflow modeler would then select appropriate building blocks and connect their ports with each other to construct a workflow model. Consequently, this component-based workflow modeling simplifies the construction of a workflow.

[Figure 5.12: Example of component-based workflow modeling: building blocks produce sauce (from salt, pepper, tomatoes) and produce dough (from butter, rye-flour, yeast, water) are connected to a produce pizza block.]
In the example illustrated in Figure 5.12, a dough and a sauce building block are connected to a pizza building block in order to produce a pizza dish. The building blocks would thereby literally shift control-flow oriented workflow modeling towards data-flow oriented workflow modeling, since the control-flow is not explicitly defined. Furthermore, during the modeling of a workflow, matching building blocks could be automatically derived (via unconnected ports) and suggested to the user as possible workflow completions.

A related approach to component-based workflow modeling are the so-called action patterns [229]. They have been suggested as "chunks of actions often appearing together in business processes" [229, p. 1] for the purpose of workflow modeling assistance. By considering the action patterns in the
current workflow model, next actions to be included can be suggested, which serves as a kind of auto-completion during workflow modeling.

[Figure 5.13: Example of late-binding by component-based workflow modeling: at build-time, an abstract component to produce a sauce from tomatoes, salt, and pepper is defined; at run-time, one of two alternative workflow streams is selected, in which the tomatoes are either peeled and mashed, or chopped and mixed.]
Moreover, such workflow stream components could also be used to support late-binding (see Sect. 2.3.2) of particular sub-workflows. The basic idea is that an abstract component represents a set of workflow streams (with identical data nodes and anchors) of which the most appropriate one is selected during the run-time of the workflow. The example illustrated in Figure 5.13 demonstrates that during build-time it is defined that tomatoes, salt, and pepper are somehow used to produce a sauce, and that during run-time a particular specification of this process can be selected, i.e., the tomatoes are either peeled and mashed or otherwise chopped.

A similar approach to support the late-binding of workflow components was presented by Adams et al. [2]. Their workflow components, called worklets, are defined as ". . . small, self-contained, complete workflow process which handles one specific task [. . . ], in a larger, composite process [. . . ]" [2, p. 3]. This means that a generic process model represents the business process at a macro level. For a particular task, multiple worklets can be defined, representing alternative ways to perform the particular activity. During run-time, appropriate worklets are selected by means of rules, such that the workflow can be adapted according to the particular situation.
Partial Generative Adaptation

Generative adaptation is enabled by a problem solver that is able to solve problems from scratch based on general domain knowledge (see Sect. 2.4.5). This has been successfully applied in domains such as planning and would mostly obviate the need for any workflow modeling, since a workflow matching the requirements defined in the query Q could be constructed automatically. Workflow streams could be considered as partial knowledge for such a generative adaptation approach, as already briefly sketched by Müller and Bergmann [167]. Partial means that not all possible workflow solutions can be provided, since the required generative domain knowledge is restricted to the streams extracted from the repository. Here, the workflow designer would, in addition to the usual query Q, have to define the specific output (i.e., the "goal") to be produced. Based on this, a backward search for an appropriate solution is performed. Starting from all workflow streams producing the specific output, matching predecessor streams are identified with regard to the particular anchors. More precisely, streams are iteratively searched for predecessor streams whose output anchors are identical to the current open input anchors, until the constructed chain has no unconnected input anchors anymore. Thus, entire workflows producing the specific output can be created from scratch by chaining appropriate workflow streams into so-called workflow stream chains. The constructed workflow with the highest query fulfillment is finally selected as the provided solution. Since this approach usually generates many possible solutions, a heuristic approach is required in order to control the run-time of the construction process. This construction principle can loosely be compared with Case-Based Planning (see Sect. 2.6.3).
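The backward search for workflow stream chains can be sketched as a bounded breadth-first construction over the anchor sets (see the example in Figure 5.14 below). The anchor attributes follow the earlier sketches, and the depth bound stands in for the heuristic run-time control mentioned above; this is an illustration, not the published algorithm.

def build_stream_chains(goal_output, producer_streams, max_depth=10):
    # Start from all streams producing the goal output and extend each
    # partial chain backwards until no input anchor is unconnected.
    complete = []
    frontier = [[s] for s in producer_streams if goal_output in s.output_anchors]
    for _ in range(max_depth):
        if not frontier:
            break
        next_frontier = []
        for chain in frontier:
            open_inputs = {a for s in chain for a in s.input_anchors
                           if not any(a in t.output_anchors for t in chain)}
            if not open_inputs:
                complete.append(chain)      # chain is fully connected
                continue
            a = next(iter(open_inputs))     # satisfy one open anchor at a time
            for s in producer_streams:
                if a in s.output_anchors and s not in chain:
                    next_frontier.append(chain + [s])
        frontier = next_frontier
    return complete   # candidate chains are then ranked by query fulfillment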
[Figure 5.14: Workflow stream chains example: a stream producing pizza sauce directly from tomato, and a chain producing tomato sauce from sieved tomatoes, which are in turn produced from tomato by a predecessor stream.]
An example of this process is illustrated in Figure 5.14. Assume that sauce describes the desired specific output to be produced at a generalized level. Starting from all workflow streams producing the desired specific output (here, produce pizza sauce and produce tomato sauce), matching predecessor workflow streams have to be identified. In the illustrated example, the stream to produce tomato sauce requires that sieved tomatoes have been produced beforehand, while the stream to produce the pizza sauce has no further requirements.

Moreover, the presented workflow stream chain approach could also be considered during compositional workflow adaptation, meaning that a workflow stream is not replaced by a single stream but by a corresponding workflow stream chain, or vice versa, which would further increase the adaptability of the presented approach. For example, the stream on the left side of Figure 5.14 could then be replaced by the workflow stream chain on the right side.

Concurrency Optimization

Another approach to support workflow modelers is to optimize the modeled workflow with regard to its execution time in order to improve the performance quality of the workflow (see Sect. 2.2.5). In this regard, parallelizing tasks in the workflow model, where possible, speeds up the execution of the entire process during enactment. Optimization of concurrency could be achieved by rearranging the workflow streams of the currently modeled workflow. Since the anchors of the streams represent clear execution dependencies with regard to the remaining workflow streams, these anchors could be employed to identify whether certain streams have to be executed sequentially or whether two streams could be executed in parallel. A search algorithm could then identify the best stream composition, thereby optimizing the workflow. By use of the workflow streams, it is further ensured that tasks are not arbitrarily rearranged, but that meaningful and coherent components are considered instead. In the cooking domain, for example, the preparation of the dough and the preparation of the sauce for a pizza can be conducted by two different chefs at the same time. Concurrency should be considered especially if the workflow is created automatically, for example, by the previously introduced workflow stream chain approach, as it would fundamentally construct sequential workflows.
5.3 Transformational Adaptation by Adaptation Operators

In Case-Based Reasoning, transformational adaptation is usually based on adaptation rules or adaptation operators (see Sect. 2.4.5) defining the particular adaptation knowledge. Various approaches addressing transformational adaptation have been presented in CBR (e.g., [32, 50, 35]). Adaptation rules contain a set of actions (rule conclusion), which are triggered automatically after retrieval if certain conditions occur with regard to the query and the retrieved case (rule condition). These actions aim at transferring the problem description and the solution of the retrieved case simultaneously to match the particular query. Adaptation operators [32], in contrast, define valid modifications of a case into another valid case, independent of the particular query. Thus, a search algorithm is required in order to identify the most appropriate modifications of a case regarding a particular query. This section presents an approach for transformational workflow adaptation based on adaptation operators. The workflow adaptation operators here basically consist of a workflow fragment describing the elements to be deleted and a workflow fragment describing the elements to be inserted into the workflow. This enables the construction of add, delete, and replace operators. The main idea is then to adapt a workflow by applying chains of adaptation operators such that the query fulfillment of the adapted workflow is maximized. The adaptation operators thereby implicitly define applicability constraints with regard to the particular workflow. This limits the set of valid transformations, aiming at avoiding inappropriate workflow adaptations. The approach of operator-based workflow adaptation can thus be loosely compared with graph transformation approaches [95, 184]. In contrast to the previous compositional adaptation approach, which exchanges meaningful components of the workflow referring to the same kind of sub-process, almost arbitrary workflow fragments can be removed, added, or replaced by operator-based adaptation. As the operator-based approach fundamentally differs from adaptation by workflow streams, another adaptation algorithm and a different method to learn the particular adaptation knowledge are required. The basic approach of operator-based workflow adaptation illustrated in this section was published by Müller and Bergmann [171] (© of original publication by Springer 2015). The remainder of this section will first define the representation as well as the semantics of workflow adaptation operators. Next, an algorithm is presented, which is able to learn these adaptation
operators automatically from the repository of workflows. Finally, the approach for transformational workflow adaptation based on these operators is presented.
5.3.1 Representation of Adaptation Operators

The idea of adaptation operators for transformational workflow adaptation is loosely based on STRIPS operators [68]. These operators basically describe preconditions determining the requirements of the operator's applicability and postconditions describing the result after operator application. Roughly speaking, the workflow adaptation operators presented here contain a fragment to be removed from the workflow and a fragment to be added to the workflow, respectively. The pre- and postconditions as previously sketched then result implicitly from these two workflow fragments. In the following, the approach for operator-based adaptation will be explained by the example workflow illustrated in Figure 5.15, which describes the process of a pasta dish to prepare penne with shrimps in tomato sauce garnished with rocket.
Figure 5.15: Example pasta workflow
The previously mentioned workflow fragments are basically partial workflows according to Definition 6 in Section 3.2.1. Compared to workflow streams, these fragments represent smaller entities and are consequently referred to as workflow streamlets. Such a streamlet s is defined for a specific data node d ∈ N^D of a workflow W = (N, E), which is also referred to as head data node d_s (see Def. 20). A streamlet contains all relevant information (related tasks and data nodes) to process the particular head data node d_s. This means that a partial workflow of W is constructed for all tasks linked with d_s, i.e., {t ∈ N^T | d_s ∈ t^{D−} ∨ d_s ∈ t^{D+}}. Per definition (see Sect. 3.2.2), this partial workflow does not only consist of task nodes but
also of all data nodes linked to these tasks as well as related control-flow nodes. Additionally, streamlets include information on the position of the particular head data node within the remaining workflow. More precisely, anchor tasks A_s denote the activities at which the head data node d_s is integrated into other data nodes of the workflow. According to this definition, streamlets are syntactically valid with regard to the control-flow, since they are block-oriented workflows.

Definition 20. A streamlet s = (N_s, E_s) is defined as a partial workflow of a workflow W = (N, E) specified by the subset of tasks connected to a specific data node d ∈ N^D, i.e., N_s^T = {t ∈ N^T | d ∈ t^{D−} ∨ d ∈ t^{D+}}. Furthermore, let the tasks in streamlet s that do not produce the specific data node d be defined as anchor tasks A_s = {t ∈ N_s^T | ∄(t, d) ∈ E_s^D}. The specific data node d is also referred to as the head data node d_s of the streamlet s.

Based on this definition, plain streamlets are now introduced, which will be used later to describe the application of adaptation operators. A plain streamlet basically represents the particular streamlet without the corresponding anchor tasks (see Def. 21).

Definition 21. A plain streamlet ŝ for the streamlet s is specified as a partial workflow without the anchor tasks A_s of the streamlet s, i.e., N_ŝ = N_s \ A_s.

An example of a streamlet s extracted from the workflow illustrated in Figure 5.15 for the head data node garlic is shown in Figure 5.16 (see double-lined circle). This means that the depicted streamlet contains all tasks of the workflow linked to garlic. As previously explained, anchor tasks are those activities that consume the head data node but that do not produce it, here, add (see double-lined rectangle). In contrast to the anchors of workflow streams (see Sect. 5.2.1), these anchors do not explicitly denote dependencies to the remaining workflow. Instead, they describe positions at which the particular head data node is integrated into another data node. In the example streamlet, the anchor task add describes that after adding garlic to the seafood sauce, it is integrated as part of the pasta dish. Furthermore, streamlets may also contain more than one anchor task. For example, if salt is inserted into boiled water and into the pasta sauce, a respective anchor task is required for each of these data nodes. The plain streamlet ŝ without the anchor task would, for the given example, solely consist of the task grate, the ingredient garlic, and the corresponding input and output data-flow edges. Using these streamlets, three different types of adaptation operators are introduced, namely exchange, insert, and delete operators (see Def. 22).
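As a rough illustration of Definitions 20 and 21, the following sketch extracts a streamlet for a given head data node from a workflow reduced to task dicts with input and output data labels; control-flow nodes and edges are omitted, and extract_streamlet is a hypothetical name.

    def extract_streamlet(workflow, head):
        # tasks linked with the head data node (consume or produce it)
        tasks = [t for t in workflow
                 if head in t["inputs"] or head in t["outputs"]]
        # anchor tasks consume the head data node but do not produce it (Def. 20)
        anchors = [t for t in tasks if head not in t["outputs"]]
        # the plain streamlet omits the anchor tasks (Def. 21)
        plain = [t for t in tasks if t not in anchors]
        return tasks, anchors, plain

    pasta = [
        {"name": "grate", "inputs": {"garlic"}, "outputs": {"garlic"}},
        {"name": "add", "inputs": {"garlic", "seafood sauce"},
         "outputs": {"seafood sauce"}},
    ]
    tasks, anchors, plain = extract_streamlet(pasta, "garlic")
    print([t["name"] for t in anchors])  # ['add']
    print([t["name"] for t in plain])    # ['grate']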
Figure 5.16: Example of a streamlet
Basically, an adaptation operator consists of an insertion streamlet and a deletion streamlet. While the deletion streamlet, loosely speaking, describes the elements to be removed from the workflow, the insertion streamlet describes those to be added. Consequently, by defining both fragments, an exchange operator is constructed. Furthermore, one of these streamlets may also be empty. An empty insertion streamlet specifies a workflow operator removing workflow elements (delete operator), while an empty deletion streamlet determines an operator inserting workflow elements (insert operator), respectively.

Definition 22. An adaptation operator O = (I, D) is specified by an insertion streamlet I and a deletion streamlet D, of which one can also be empty. Based on the presence of the two streamlets, three types of adaptation operators are distinguished:

• An insert operator consists of only an insertion streamlet I with the head data node d_I and the anchor tasks A_I.
• A delete operator consists of only a deletion streamlet D with the head data node d_D and the anchor tasks A_D.
• An exchange operator consists of an insert and a delete operator.

An example exchange operator is illustrated in Figure 5.17. It consists of a deletion streamlet with the head data node shrimps and an insertion streamlet with the head data node salmon, respectively. This operator basically describes the replacement of shrimps by salmon, which would additionally require that the preparation steps peel and devein are replaced by chop. The particular anchors add and mix (marked by double-lined rectangles) provide the necessary information at which position of the workflow the corresponding modifications are executed.
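One possible, purely illustrative encoding of Definition 22, reusing the task representation of the previous sketch; this is not the representation used in the CAKE framework:

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Streamlet:
        head: str      # head data node label
        tasks: list    # task dicts as in the extraction sketch above
        anchors: list  # anchor tasks (subset of tasks)

    @dataclass
    class AdaptationOperator:
        insertion: Optional[Streamlet]  # I, may be None
        deletion: Optional[Streamlet]   # D, may be None

        @property
        def kind(self):
            if self.insertion and self.deletion:
                return "exchange"
            return "insert" if self.insertion else "delete"

    # the shrimps -> salmon exchange operator of Figure 5.17 (abbreviated)
    op = AdaptationOperator(
        insertion=Streamlet("salmon", tasks=[], anchors=[]),
        deletion=Streamlet("shrimps", tasks=[], anchors=[]),
    )
    print(op.kind)  # exchange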
Figure 5.17: Example adaptation operator
5.3.2 Semantics of Adaptation Operators

The insertion and deletion streamlet of the adaptation operators implicitly describe preconditions determining the requirements for an operator to be applicable to a workflow W and postconditions describing the result after operator application. As a precondition for all adaptation operators, it is required that the resulting workflow after application is a consistent block-oriented workflow according to Definition 5 in Section 3.2.1. Thus, the syntactical correctness with regard to control-flow as well as data-flow is ensured after application. This requirement can be compared to constraints in graph transformation [95], which restrict the set of applicable operators to those leading to admissible and valid graphs. An insert operator O = (I, ∅) further requires that the head data node d_I of the insertion streamlet I is not contained in the workflow. This ensures that streamlets with the same head data node that are already contained in the workflow are not extended, since streamlets must be either deleted, inserted, or replaced. Additionally, matching anchor tasks A_I of the insertion streamlet I must be available in the workflow W, since otherwise the position to insert the particular workflow elements cannot be identified. The application then basically inserts the plain insertion streamlet at the corresponding position. The application of a delete operator O = (∅, D) requires that a streamlet s exists within the workflow W which is sufficiently similar to the deletion streamlet D. If such a streamlet is available, the application of the delete operator basically removes the related elements of streamlet s from the workflow. The exchange operator O = (I, D) aggregates the preconditions and postconditions of the insert and delete operator, which means that an exchange operator first executes the related delete operation and subsequently the related insert operation, if the corresponding applicability conditions are fulfilled. The consistency might be temporarily violated after delete operator
application, but must be restored after the application of the related insert operation. The exchange operator is transactional, meaning that it can only be applied entirely. These considerations are summarized in the following definition.

Definition 23. Let W be the set of all consistent block-oriented workflows. A workflow adaptation operator is a partial function o : W ↦ W transforming a workflow W ∈ W into an adapted and consistent block-oriented workflow o(W) ∈ W by use of the corresponding insertion streamlet I or deletion streamlet D.

• The application of an insert operator O = (I, ∅) to W inserts the plain insertion streamlet Î at the positions of the best-matching anchor tasks A_I. The operator is only applicable (precondition A) iff tasks matching A_I exist in W, the head data node d_I is not already contained in the workflow, and the resulting workflow is consistent.
• The application of a delete operator O = (∅, D) to W deletes the elements related to the plain deletion streamlet D̂ from W. The operator is only applicable (precondition B) iff there exists a workflow streamlet s in W which is sufficiently similar to the deletion streamlet D, and the resulting workflow after deletion is consistent.
• The application of an exchange operator O = (I, D) to W applies the deletion operation determined by D and subsequently applies the insertion operation determined by I to the workflow W. The operator is only applicable iff both previously defined preconditions A and B are fulfilled.

According to this definition, the example exchange adaptation operator O = (I, D) in Figure 5.17, roughly speaking, exchanges a streamlet s in a workflow W by the insertion streamlet I. The operator is only applicable if the streamlet s is sufficiently similar to the deletion streamlet D, if matching anchor tasks A_I (here, mix) for the insertion streamlet I exist, and if the corresponding head data node d_I (here, salmon) is not already included in the workflow.
5.3.3 Details of the Operator Application

The application of the workflow adaptation operators as previously introduced will now be explained in more detail and demonstrated by an example in which the exchange operator O = (I, D) shown in Figure 5.17 is applied to the workflow W illustrated in Figure 5.18.
In general, the application of adaptation operators must result in consistent block-oriented workflows to ensure the executability of the modified workflow (see Sect. 3.2.1). It has been shown that this definition needs to be extended for operator-based adaptation, since almost any modifications on the input and output data nodes of tasks are permitted, though certain tasks absolutely require more than one input data node for a proper execution (e.g., season tomatoes with salt). Consequently, for the consistent block-oriented workflows considered in this particular approach, it is additionally required that tasks that have more than one input data node prior to the application of an operator, i.e., {t ∈ N^T | |t^{D−}| > 1}, also have more than one input after the operator has been applied. According to the definition, a delete or exchange operator O = (I, D) with deletion streamlet D is applicable to a workflow W if a workflow streamlet s ∈ W with the head data node s_d exists that is sufficiently similar. Thereby, the application context of the operator is verified. This condition is implemented by a similarity threshold Δ_s ∈ [0, 1], denoting the required semantic workflow similarity between the corresponding plain streamlets of s and D, i.e., sim(ŝ, D̂) ≥ Δ_s. The comparison of the plain streamlets verifies whether the corresponding head data nodes are processed in a similar manner without considering the context of the remaining workflow (by ignoring anchor tasks). Furthermore, this condition requires that the head data nodes of the streamlets are identical, which means that the head data node D_d of the deletion streamlet D is equal to or more general than the head data node s_d of the workflow streamlet s, i.e., S(s_d) ⊑ S(D_d). Thus, generalized operators can also be supported. The requirements of identical head data nodes and the minimum similarity between the plain streamlets then serve as a precondition to check whether the streamlet s in the workflow W is similar enough to the deletion streamlet D of the operator.
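A toy version of this precondition check might look as follows; sim and more_general stand in for the semantic workflow similarity and the taxonomy ordering of Chapter 3, and the flat taxonomy, the set-overlap similarity, and the threshold value are invented for illustration only.

    def delete_applicable(s_head, s_plain, d_head, d_plain,
                          sim, more_general, delta_s=0.8):
        # precondition B (simplified): compatible head data nodes,
        # S(s_d) ⊑ S(D_d), and sufficiently similar plain streamlets
        return more_general(d_head, s_head) and sim(s_plain, d_plain) >= delta_s

    # toy stand-ins: a one-level taxonomy and a label-overlap similarity
    taxonomy = {"shrimps": "seafood", "salmon": "seafood"}
    more_general = lambda general, specific: (
        general == specific or taxonomy.get(specific) == general)
    sim = lambda a, b: len(a & b) / max(len(a | b), 1)

    print(delete_applicable("shrimps", {"peel", "devein"},
                            "seafood", {"peel", "devein"},
                            sim, more_general))  # True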
Figure 5.18: Head data node s_d removed from W
The deletion streamlet of the example operator fulfills these conditions for the workflow in Figure 5.15, since the head data nodes and the streamlets are identical, i.e., S(s_d) ⊑ S(D_d) and sim(ŝ, D̂) = 1. By application of the delete or exchange operator, the streamlet s is indirectly removed from W. More precisely, the corresponding matching head data node s_d of the streamlet s is deleted first. In the given example, this means that shrimps and the related data links are removed from the workflow (see grey elements in Fig. 5.18). In a following cleanup function, further workflow elements that are only related to the streamlet s will be removed. Moreover, this function restores the consistency potentially violated by the deletion of the head data node. In case of an exchange operator, this cleanup function is executed after the insertion operator has been applied, because otherwise potential matching anchors might have already been removed.

To determine the applicability of an insert or exchange operator O = (I, D) to a workflow W with regard to the insertion streamlet I, matching anchor tasks for A_I in workflow W must be identified. Therefore, a partial workflow is constructed for each anchor task in A_I of the insertion streamlet, containing the particular anchor task as well as all connected data nodes, respectively. Additionally, for each anchor task in A_W of the workflow W, such a graph is determined. The previously constructed anchor graphs of the insertion streamlet AG_I and of the workflow AG_W are defined as matching if the output data nodes and the types of the anchor tasks are identical. This means that the outputs of the anchor tasks in AG_I must contain equal or more generalized labels compared to those of the corresponding anchor tasks in AG_W. Furthermore, the anchor task in the insertion streamlet graph AG_I and in the workflow graph AG_W must be of the same type. More precisely, a creator anchor task denotes that the head data node is consumed to produce a new data node, while a processing anchor task specifies that the head data node is processed, but not integrated within another data node (see Sect. 3.2). Thus, both types of anchor tasks differ, and for applicability both anchors must either be creator tasks or processing tasks. This ensures that the insertion streamlet is only inserted at appropriate positions that are similar to those the operator was learned from. Moreover, the similarity of the respective tasks must be higher than the anchor task threshold Δ_T, ensuring that anchors are only considered if they at least partially represent the same activity. This is important, as creator tasks fundamentally differ from other tasks occurring in workflows. Altogether, this ensures that the application context for the insert operation is considered. From all matching anchor graphs, the matching with the highest similarity between the respective graph of the insertion streamlet AG_I and the respective graph of the workflow AG_W
according to the semantic similarity (see Sect. 3.5), i.e., sim(AG_I, AG_W), is chosen as the best-matching anchor. Regarding the applicability of an insert or exchange operator, such a best-matching anchor is needed to ensure that the streamlet can be added at a suitable position of the workflow, merging into an appropriate data node. Moreover, the head data node must not already be contained in the workflow, as this would extend an already present streamlet, which is not desired by operator application. The insertion streamlet I of the example exchange operator is applicable to the workflow shown in Figure 5.18, since the head data node salmon is not already contained in the workflow and, for the mix anchor in the insertion streamlet, add has been identified as matching anchor, i.e., both tasks are similar (mix ⊑ add) and produce the same output (seafood sauce).
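Under strong simplifications, the selection of the best-matching anchor can be sketched as follows: anchor graphs are reduced to dicts with a task label, an anchor type, and a set of output labels; task_sim is a placeholder for the taxonomy-based task similarity, and the threshold value is invented.

    def best_matching_anchor(insert_anchor, workflow_anchors, task_sim,
                             delta_t=0.5):
        # candidates: identical outputs, same anchor type (creator/processing),
        # and task similarity above the anchor task threshold Δ_T
        candidates = [a for a in workflow_anchors
                      if a["outputs"] == insert_anchor["outputs"]
                      and a["type"] == insert_anchor["type"]
                      and task_sim(insert_anchor["task"], a["task"]) > delta_t]
        # among the candidates, pick the most similar one
        return max(candidates,
                   key=lambda a: task_sim(insert_anchor["task"], a["task"]),
                   default=None)

    mix = {"task": "mix", "type": "creator", "outputs": {"seafood sauce"}}
    add = {"task": "add", "type": "creator", "outputs": {"seafood sauce"}}
    task_sim = lambda a, b: 1.0 if a == b else 0.6  # toy similarity
    print(best_matching_anchor(mix, [add]))         # the 'add' anchor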
Figure 5.19: Plain streamlet Î added to W
If these conditions are successfully validated, the insertion streamlet is inserted at the position of the best-matching anchor by adding all edges, tasks (except for anchor tasks), and data nodes (except those already present or only linked to anchor tasks) of the insertion streamlet I to the workflow W, i.e., the plain streamlet Î is basically added to the workflow. The inserted tasks are then linked in front of the corresponding best-matching anchor task in W. In the special case that the insertion streamlet contains more than one anchor, the streamlet is split such that each part of the split streamlet contains all preceding, transitively data-flow-connected tasks until another anchor is reached. Then, a partial workflow is constructed for each part of the split streamlet and the insertion procedure is applied, respectively. As the insertion operation thereby only adds partial workflows at specified positions within the workflow, the resulting workflow is again block-oriented. Thus, syntactical correctness with regard to the control-flow is ensured. In the given example scenario, the insertion streamlet I (see Fig. 5.19, marked in grey) is inserted in front of the best-matching anchor add.
As already mentioned, a cleanup function removes further elements from the workflow. This function is only triggered if a head data node has been previously removed due to the application of a deletion streamlet D. More precisely, all unproductive tasks X previously connected to the head data node D_d, i.e., tasks that at this point have no input or output data nodes, are removed. Hence, a partial workflow is constructed containing all tasks of W except for the unproductive tasks X, i.e., N^T \ X. By this means, the corresponding workflow streamlet is not removed entirely, but rather those tasks becoming unproductive after the removal of the head data node. This is important, as a single task can be related to more than one streamlet. In this case, the task cannot be deleted, since otherwise an inconsistent workflow would more likely be constructed. Again, as the result is a partial workflow, the syntactical correctness after applying a deletion or exchange operator is ensured. In the example scenario, this cleanup results in the workflow illustrated in Figure 5.20, meaning that the peel and devein tasks are removed from the workflow, since they do not contain any input or output data and were previously connected to the removed head data node shrimps.
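The cleanup step can be illustrated as follows, again on the reduced task representation used above; touched names the set of tasks previously linked to the removed head data node, and the function name is hypothetical.

    def cleanup(tasks, touched):
        # drop tasks that were connected to the deleted head data node and
        # have become unproductive (no input and no output data nodes left)
        return [t for t in tasks
                if not (t["name"] in touched
                        and not t["inputs"] and not t["outputs"])]

    tasks = [
        {"name": "peel", "inputs": set(), "outputs": set()},    # lost 'shrimps'
        {"name": "devein", "inputs": set(), "outputs": set()},  # lost 'shrimps'
        {"name": "add", "inputs": {"tomato sauce"},
         "outputs": {"seafood sauce"}},
    ]
    kept = cleanup(tasks, touched={"peel", "devein", "add"})
    print([t["name"] for t in kept])  # ['add']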
Figure 5.20: Cleanup applied to W
Since the insertion as well as the deletion modifications retain a block-oriented workflow structure and the adaptation operators are only applicable if the resulting workflows are consistent, the application of adaptation operators ensures the syntactical correctness of the adapted workflows.
5.3.4 Automated Learning of Adaptation Knowledge

The learning of the required adaptation operators (see Algorithm 6) is based on a comparison of similar workflows from the repository. As with workflow generalization (see Sect. 5.1.1), the threshold Δ_W ∈ [0, 1] defines whether a query workflow W_q and a case workflow W_c are similar with regard to the semantic workflow similarity (see Sect. 3.5), i.e., sim(W_q, W_c) ≥ Δ_W. For each pair of similar workflows, adaptation operators are derived from the data nodes that distinguish the particular workflows. More precisely, the created operators roughly describe the workflow elements to be exchanged, inserted, or deleted in order to transform the data nodes of the query workflow W_q into those of the case workflow W_c.

Algorithm LEARN_OPERATORS(RP)
Input: Workflow repository RP
Output: Set of operators operators

operators = ∅
forall W_q ∈ RP do
  forall {W_c ∈ RP | sim(W_q, W_c) ≥ Δ_W ∧ W_q ≢ W_c} do
    forall d ∈ N_q^D do
      init operator o
      o.insert = ∅
      o.delete = construct_streamlet(W_q, d)
      if sim(d, m_max(d)) ≥ Δ_D then
        o.insert = construct_streamlet(W_c, m_max(d))
      if o.delete ≢ o.insert ∧ applicable(o, W_q) then
        operators = operators ∪ {o}
    forall {d ∈ N_c^D | ∄d′ ∈ N_q^D : (m_max(d′) = d ∨ sim(d′, d) ≥ Δ_D)} do
      init operator o
      o.delete = ∅
      o.insert = construct_streamlet(W_c, d)
      if applicable(o, W_q) then
        operators = operators ∪ {o}
return operators

Algorithm 6: Learning of workflow adaptation operators
The mapping determined during the similarity computation (see Sect. 3.5) provides useful information for this purpose. It describes how data nodes of the query workflow N_q^D are mapped to data nodes of the case
workflow N_c^D. A higher similarity between mapped nodes thereby indicates that they can more likely be substituted for each other. Thus, for each pair of mapped data nodes, an exchange operator o is constructed if the similarity between the particular nodes is higher than a specified threshold parameter Δ_D ∈ [0, 1]. The corresponding exchange operator consists of a deletion streamlet determined by the query data node d ∈ N_q^D and an insertion streamlet specified by the corresponding mapped case data node m_max(d) ∈ N_c^D. Furthermore, both streamlets must not be identical (i.e., o.insert ≢ o.delete), because otherwise the operator is ineffectual. For the remaining data nodes, insert or delete operators are subsequently constructed. More precisely, the corresponding data nodes of the query workflow determine delete operators and the corresponding data nodes of the case workflow determine insert operators, respectively. In general, the learned operators are only stored if they are applicable to the query workflow W_q, i.e., applicable(o, W_q). This prevents the learning of inappropriate and dispensable workflow operators that are not applicable in the situation they are learned from.
Figure 5.21: Similarity mappings and resulting adaptation operators
This operator construction principle is illustrated in Figure 5.21 by means of an example mapping of data nodes between a query workflow W_q and a case workflow W_c. Here, the query workflow W_q contains the ingredients (or data nodes) chicken and cheese, and the case workflow W_c contains the ingredients ham and garlic. In the illustrated example, chicken has been mapped to ham with a sufficient similarity value with regard to the threshold Δ_D, resulting in the creation of an exchange operator consisting of a deletion streamlet determined by chicken and an insertion streamlet specified by ham. The data node cheese has not been mapped with a sufficient similarity to any
data node of workflow W_c. Thus, a delete operator is learned, consisting of a corresponding cheese deletion streamlet. Since the ingredient garlic of workflow W_c has also not been mapped with a sufficient similarity, an insert operator is constructed with a corresponding garlic insertion streamlet. Applying the operators learned from a particular mapping to the query workflow W_q would then basically result in the corresponding case workflow W_c with regard to the set of data nodes. However, the learned operators do not only transform the particular data nodes, but also describe how the control-flow of the workflow has to be adapted accordingly. This is because the workflow streamlets also contain information about the processing of the particular head data node (see the example streamlet in Fig. 5.16). Thus, as defined in Section 5.3.3, the application also causes an exchange, deletion, or insertion of tasks or control-flow nodes when applying the learned operators. The operators learned from all pairs of similar workflows are finally stored in a specific adaptation operator repository. For maintaining this repository, duplicate adaptation operators are removed to eliminate redundant knowledge and to speed up the adaptation process. This means that equal or more specific adaptation operators are deleted (similar to the maintenance approach for workflow streams, see Sect. 5.2.1). Given two adaptation operators o1 and o2, the operator o1 is considered as equal or more generalized than o2, i.e., o1 ⊑ o2, iff the operators o1 and o2 are of the same type (insert, delete, or exchange operator) and the corresponding streamlets of o1 are equal or more generalized compared to the respective streamlets of o2 according to Definition 14 in Section 5.1.1. Since this definition requires that the workflows are structurally equal and each mapping represents an equal or more generalized relationship, this also ensures that the head data node as well as the anchor tasks of the corresponding streamlets must be equal or more generalized. Furthermore, exchange operators whose insertion and deletion streamlets are identical are removed, since they would basically perform no operation on the workflow.
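The duplicate removal based on the generalization ordering can be sketched as a single filtering pass; more_general_op stands in for the o1 ⊑ o2 test defined above, and the one-level taxonomy in the example is invented for illustration.

    def maintain(operators, more_general_op):
        # keep only the most general representatives: drop an operator if an
        # equal or more general operator is already kept, and drop previously
        # kept operators that the new one generalizes
        kept = []
        for o in operators:
            if any(more_general_op(k, o) for k in kept):
                continue
            kept = [k for k in kept if not more_general_op(o, k)] + [o]
        return kept

    taxonomy = {"shrimps": "seafood", "salmon": "seafood"}
    covers = lambda g, s: g == s or taxonomy.get(s) == g
    more_general_op = lambda o1, o2: (o1["type"] == o2["type"]
                                      and covers(o1["head"], o2["head"]))
    ops = [{"type": "delete", "head": "shrimps"},
           {"type": "delete", "head": "seafood"}]
    print(maintain(ops, more_general_op))
    # [{'type': 'delete', 'head': 'seafood'}]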
5.3.5 Workflow Adaptation using Adaptation Operators

By employing the previously introduced adaptation operators, transformational workflow adaptation aims at increasing the query fulfillment of the workflow W considering a query Q = (Q⁺, Q⁻). This basically means that undesired properties are removed and desired properties are added to the workflow in order to achieve a higher query fulfillment. More precisely, the
goal of transformational adaptation is to maximize the query fulfillment QF(Q, W) by applying a chain of workflow adaptation operators.
Figure 5.22: Operator-based adaptation, based on Bergmann [16, p. 232]
As the identification of the best chain is computationally expensive, a hill-climbing search algorithm is employed (see Fig. 5.22), which locally optimizes the query fulfillment QF(Q, W) by choosing the best operator in each adaptation step. Roughly speaking, all applicable operators are initially used to adapt a given workflow W, resulting in a set of adapted workflows. Next, the best adapted workflow W1 is chosen, i.e., the one with the highest query fulfillment. The adaptation process then continues with the best adapted workflow W1. Again, the operator that leads to the highest query fulfillment is chosen. This process finally returns the adapted workflow Wn, constructed by applying a chain of adaptation operators. Using such a hill-climbing approach may result in a local optimum of the query fulfillment, but significantly speeds up the entire adaptation process compared to a brute-force search for the best operator chain.

The operator-based workflow adaptation is now described in more detail based on Algorithm 7. Prior to workflow adaptation, the number of adaptation operators is reduced, which aims at improving the adaptation performance with regard to time and quality. This means that certain operators are filtered from the adaptation operator repository for a particular workflow adaptation. More precisely, exchange and delete operators are filtered out if the corresponding head data node of the deletion streamlet is not contained in the workflow W, as those operators are not applicable. Furthermore, insert and exchange operators must not contain undesired workflow components of the query Q⁻ within the insertion streamlet, which aims at avoiding the insertion of undesired workflow components during the adaptation process. Insert operators are filtered out if they do not contain a head data node or task node that is exactly matched by one of the desired properties Q⁺. This should avoid the application of many insert
operators, each of which only slightly increases the query fulfillment. This would potentially result in a large set of elements added to the workflow W without addressing actual requirements specified in the query Q⁺. Most likely, this would significantly reduce the semantic quality of the adapted workflow. Please note that the filtering of insert operators does not consider desired properties Q⁺ already fulfilled in the workflow W, since those elements might be removed beforehand during adaptation by the application of a delete or exchange operator.

Algorithm APPLYING_OPERATORS(Q, W, Operators)
Input: query Q, workflow W, repository of operators Operators
Output: adapted workflow

Operators = filterOperators(Operators)
forall d ∈ N^D (sorted by first usage) do
  streamlet = extractStreamlet(d, W)
  O = exchangeDeleteOperators(Operators, streamlet, W)
  o ∈ {o ∈ O | ∄o′ ∈ O : QF(Q, o′(W)) > QF(Q, o(W))}
  if QF(Q, o(W)) > QF(Q, W) then
    W = o(W)
IO = insertOperators(Operators)
while true do
  o ∈ {o ∈ IO | ∄o′ ∈ IO : QF(Q, o′(W)) > QF(Q, o(W))}
  if QF(Q, o(W)) > QF(Q, W) then
    W = o(W)
    IO = IO \ {o}
  else
    break
return W

Algorithm 7: Operator-based workflow adaptation
Based on the filtered adaptation operator repository (see Algorithm 7), the workflow W is adapted according to the query Q. Initially, the data nodes N^D are ordered with regard to their first usage in the workflow W. More precisely, a data node d1 ∈ N^D is utilized before another data node d2 ∈ N^D if d1 is consumed prior to d2 as an input of a task, i.e., if ∃t, t′ ∈ N^T : d1 ∈ t^{D−} ∧ d2 ∈ t′^{D−} ∧ t < t′. In the given example workflow, shrimps would be utilized before mixed herbs as an input of a task. Starting with the streamlet specified by the first data node according to the previously constructed order (i.e., shrimps), exchange or delete operators are subsequently applied to the streamlets in workflow W. This means that for each workflow streamlet the applicable
exchange or delete operators are determined (see exchangeDeleteOperators in Algorithm 7) and the operator o that maximizes the query fulfillment QF(Q, o(W)) is applied. The following operators are then applied to the resulting workflow W = o(W). After all workflow streamlets have been processed, the workflow is further adapted by applying insert operators that maximize the query fulfillment QF(Q, o(W)) after application. The insert operators are iteratively applied to the resulting workflow W = o(W) until the query fulfillment cannot be increased any further. The final workflow W is then considered the result of the operator-based adaptation. Please note that operators that do not increase the query fulfillment, i.e., QF(Q, W) = QF(Q, o(W)), are never applied during the entire adaptation process.

In general, the conception of adaptation approaches can be considered as a trade-off between increasing the query fulfillment and decreasing the adaptation time. Though the presented approach is based on a heuristic search, it is still highly computationally expensive compared to the previously presented adaptation approaches. This is caused by the large amount of operators learned and their elaborate applicability validation, which basically requires that the operator is actually applied in order to determine the consistency of the adapted workflow. Moreover, after each adaptation step, the adapted workflow with the highest query fulfillment needs to be identified, which results in many expensive computations. For the operator-based adaptation, the adaptation processing time could be reduced by computing the query fulfillment only locally for the modified workflow elements. However, as this may significantly reduce query fulfillment potentials with regard to the entire workflow, it is not considered further.
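The greedy operator chaining can be condensed into the following sketch. Workflows are reduced to sets of labels, operators to (remove, add) label pairs, and qf to the fraction of desired labels present; all of this is illustrative and far simpler than the actual operator application described above.

    def hill_climb(workflow, operators, qf, apply_op):
        # in each step, apply the applicable operator that maximizes the
        # query fulfillment; stop at a (possibly local) optimum
        while True:
            best, best_score = None, qf(workflow)
            for o in operators:
                adapted = apply_op(o, workflow)
                if adapted is not None and qf(adapted) > best_score:
                    best, best_score = adapted, qf(adapted)
            if best is None:  # no operator improves the query fulfillment
                return workflow
            workflow = best

    desired = {"salmon", "rocket"}
    qf = lambda w: len(w & desired) / len(desired)

    def apply_op(op, w):
        remove, add = op
        if remove is not None and remove not in w:
            return None  # operator not applicable
        return (w - {remove}) | {add}

    operators = [("shrimps", "salmon"), (None, "rocket")]
    print(hill_climb({"shrimps", "penne"}, operators, qf, apply_op))
    # {'penne', 'salmon', 'rocket'} (set order may vary)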
5.4 Characteristics of Workflow Adaptation Methods

The various characteristics of the three presented adaptation approaches are now demonstrated, also highlighting their different advantages and disadvantages. In contrast to other workflow adaptation approaches (see Sect. 2.6.2) that only focus on the control-flow of the particular workflow, the presented approaches explicitly involve the corresponding data-flow information. Considering the data-flow has the advantage that the context of the tasks can be taken into account during adaptation by regarding the associated data nodes. Moreover, by considering the data-flow information, actually related tasks that process the same data nodes can be identified. When the considered information is restricted to the control-flow, this is not possible,
since subsequently executed tasks may be unrelated. Thus, the presented approaches are deliberately based on specified control-flow as well as data-flow information.

In general, adaptation can be considered as a multi-objective optimization problem, which aims at optimizing the query fulfillment, the adaptation time, and the quality. More precisely, adaptation should maximize query fulfillment, minimize the adaptation time, and also construct a workflow with a high quality. Since all these criteria can rarely be fulfilled simultaneously, an appropriate trade-off between these dimensions has to be identified. All adaptation approaches were developed aiming at finding a suitable trade-off between those dimensions, which individually depends on the respective approach. Additionally, the approaches can be configured by several parameters, which enables an adjustment to the particular application scenario. The adaptation by specialization and generalization, for example, can be configured by specifying the degree of generalization. If workflow nodes are generalized to higher nodes in the taxonomy, they cover a larger range of most specific values, thus increasing query fulfillment, but also reducing the quality of the produced output.
Table 5.1: Properties of workflow adaptation methods

                generalization   compositional                  operator-based
changes         elements         components                     fragments
run-time        low              medium                         high
coverage        low              medium                         high
requirements    none             similarly structured           consistent
                                 workflows and consistent       block-oriented
                                 block-oriented workflows       workflows
The main characteristics of the three approaches are illustrated in Table 5.1, demonstrating their different properties. Regarding the type of changes applied to the workflow, generalization is based on the replacement of task and data labels. Thus, the structure of the workflow is retained. Compositional adaptation, in contrast, exchanges entire meaningful workflow components, and the operator-based approach modifies the workflow by smaller workflow fragments. This shows that adaptation by
generalization may be inappropriate if a structural change is required, whereas the replacement of workflow streams could lead to an over-adaptation if only a few modifications are needed. In the event that comprehensive changes are necessary, operator-based adaptation usually applies many operators, potentially resulting in a less appropriate adapted workflow. However, the adaptation in each of the three approaches is only performed if it explicitly increases the overall query fulfillment, indicating that the particular adaptation is desired. Overall, it can still be concluded that the optimal type of change largely depends on the particular adaptation scenario.

Furthermore, the approaches differ significantly regarding their run-time. In this respect, adaptation by generalization is characterized by a low computation time, since the required adaptation knowledge (i.e., the generalized workflow) is already available after retrieval. In contrast, compositional adaptation as well as operator-based adaptation require determining and applying appropriate adaptation knowledge (i.e., workflow streams or adaptation operators) and assessing the resulting workflows in each adaptation step. This significantly affects the run-time of both approaches. Compared to compositional adaptation, the operator-based adaptation results in an even higher computation time. This is because the estimation of applicable adaptation knowledge is more expensive (no simple anchor verification), the adaptation knowledge repository is larger (more workflow operators learned), and finally more adaptation steps are performed (replacement of smaller fragments instead of components). Furthermore, the run-time of both approaches, i.e., operator-based and compositional adaptation, is affected by the size of the workflow, since to a larger workflow (typically containing more streamlets or streams) usually more adaptation knowledge can potentially be applied. In case of a continuously expanding adaptation knowledge repository, cluster-based retrieval [166] or a MAC/FAC approach [27] could potentially be employed in order to control the increasing run-time for determining appropriate adaptation knowledge.

The number of workflows that can be constructed from a particular workflow additionally distinguishes the presented adaptation approaches. This number, also referred to as the coverage or adaptability of a workflow, measures the number of scenarios for which the particular workflow can provide appropriate solutions by adaptation. In general, the coverage significantly relies on the amount of available adaptation knowledge. However, the approaches are based on different kinds of changes that can be performed on the workflow and thus also differ with regard to their adaptability. As generalizations facilitate no structural changes, the number of supported scenarios is limited to structurally identical workflows. Consequently, their coverage is
considered to be rather low. Compositional adaptation is characterized by a higher adaptability, since structural changes are supported. However, these changes are restricted to replacing components with a similar purpose. In contrast to this, operator-based adaptation enables almost arbitrary modifications on the workflow by replacing, adding, or removing workflow fragments. This again shows that the adaptability of a workflow with regard to a particular adaptation approach depends on the respective adaptation scenario. Finally, the approaches have different requirements. In general, taxonomies of task and data labels must be specified and annotated with similarity values in order to implement the semantic workflow similarity and the POQL query fulfillment measure (see Chap. 3+4). Adaptation by generalization has no additional requirements. Thus, it can basically be applied to most kinds of workflow representations. In contrast, operator-based as well as compositional adaptation are restricted to consistent block-oriented workflows when learning and applying the corresponding adaptation knowledge. If a workflow repository violates this particular requirement, the stored workflow models could potentially be transformed into block-oriented workflows (e.g., [147]). Furthermore, compositional adaptation only exchanges workflow components with a similar purpose (components with identical anchor data nodes), which means that learned adaptation knowledge can only be applied to workflows with a similar structure.
5.5 Integrated and Combined Workflow Adaptation

The various characteristics of the presented adaptation approaches point out that, depending on the particular adaptation scenario, a different adaptation approach is required. Thus, a single adaptation process is constructed, aiming at increasing the adaptability of the workflow modeling assistance by combining and integrating all three adaptation approaches. This integrated workflow adaptation process was presented by Müller and Bergmann [169] and can basically be divided into two phases. In the learning phase, all required adaptation knowledge is determined automatically from the workflow repository (see Sect. 5.5.1). In the following application phase, the learned adaptation knowledge is accessed in order to adapt a workflow with regard to the requirements and restrictions specified in a POQL query (see Sect. 5.5.2).
5.5.1 Learning of Adaptation Knowledge

Learning the required adaptation knowledge for the integrated adaptation process is basically accomplished by determining the corresponding adaptation knowledge for each integrated adaptation approach. The automatic generation of adaptation knowledge is implemented as an integrated and combined learning process as illustrated in Figure 5.23. Initially, the workflow repository is generalized automatically according to the approach presented in Section 5.1.1. Based on the generalized workflow repository, workflow streams and adaptation operators (see Sect. 5.2+5.3) are determined automatically. Thus, the learned workflow streams and adaptation operators also become generalized. This increases the adaptability of the respective adaptation knowledge and thereby of the corresponding adaptation approach. Finally, redundant and superfluous adaptation knowledge is removed from each adaptation knowledge repository (see the respective approaches in this chapter) in order to accelerate the following adaptation process.
Figure 5.23: Generation of adaptation knowledge
The generation of adaptation knowledge is highly computationally intensive. However, the learning process is executed only once whenever the workflow repository changes. This process is implemented as a background process, updating the adaptation knowledge repositories immediately after the re-computation has finished.
5.5.2 Performing Adaptation

Based on the automatically learned adaptation knowledge, a workflow can be adapted for a given POQL query. Roughly speaking, the integrated adaptation process is performed by chaining all adaptation approaches presented in this chapter. The approaches are thereby arranged according to their run-time and their coverage characteristics (see Sect. 5.4), such that a
suitable workflow solution is provided as soon as possible and only minimal changes are applied. The latter aims at preventing unnecessary modifications that would potentially reduce the quality of the resulting workflow.
Figure 5.24: Combined workflow adaptation process
Following these considerations, the integrated adaptation process (see Fig. 5.24) is now explained in more detail. The best-matching generalized workflow is initially specialized (see Sect. 5.1.2) in order to determine the restrictions and requirements denoted in the query that can be satisfied without structural changes. The respective query elements can then be neglected during the remaining adaptation process. Next, workflow streams (see Sect. 5.2.1) are applied in order to exchange components of the workflow. Then, the more time-consuming operator-based adaptation (see Sect. 5.3) is executed. Finally, the adapted workflow is again specialized, since the workflow may contain generalized workflow elements potentially inserted by workflow streams or adaptation operators. As a result, a workflow is constructed automatically according to the query by chaining all three adaptation methods. This adaptation process is terminated prematurely as soon as the adapted workflow entirely fulfills the query, i.e., QF(Q, W) = 1, since the application of additional modifications cannot optimize the query fulfillment of the workflow any further.
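The overall control flow of Figure 5.24 can be summarized in a few lines. The stage functions are placeholders for the specialization, stream-based, and operator-based adaptation described above, qf stands in for the POQL query fulfillment, and the function name is hypothetical.

    def integrated_adaptation(query, workflow, specialize, adapt_streams,
                              adapt_operators, qf):
        # chain the stages ordered by run-time and coverage; terminate early
        # once the query is entirely fulfilled (QF = 1)
        for stage in (specialize, adapt_streams, adapt_operators, specialize):
            if qf(query, workflow) == 1.0:
                return workflow
            workflow = stage(query, workflow)
        return workflow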
5.6 Conclusions

Despite its importance, only a few approaches for the adaptation of workflow models have been presented so far (see Sect. 2.6.2). This chapter introduced three different adaptation approaches for workflows that are based on common Case-Based Reasoning adaptation models (see Sect. 2.4.5). Furthermore, for each of these adaptation algorithms an automated learning method
for the required adaptation knowledge was presented. This chapter also discussed various characteristics of the approaches, showing their particular advantages and disadvantages. Based on these considerations, a single workflow adaptation process was constructed that combines and integrates all three presented adaptation methods. Chaining the retrieval and adaptation approaches presented in this thesis then enables the automated construction of individual workflows, which may significantly support workflow modelers. Moreover, additional application scenarios of workflow streams for the purpose of workflow modeling were sketched (see Sect. 5.2.3). A comprehensive evaluation of the three adaptation approaches is described in Chapter 8.
6 Improving Workflow Modeling Assistance

This chapter discusses several approaches to improve the presented workflow modeling assistance (see Fig. 6.1). The main objective is to further increase the utility of the constructed workflow. The approaches address the completion of underspecified workflow models, the maximization of the query fulfillment by considering the adaptability already during retrieval, and the integration of additional criteria into the retrieval and adaptation process, as will be explained in more detail below.
Figure 6.1: Workflow modeling assistance (abstract illustration)
The workflow modeling assistance entirely depends on the information specified in the four CBR knowledge containers (see Sect. 2.4), i.e., the case base (workflow repository), the vocabulary, the similarity measure, and the adaptation knowledge container. Underspecified or incomplete knowledge containers result in the construction of inappropriate or deficient workflow models. Thus, their appropriate definition is essential to ensure that useful workflow models are generated. For implementing the workflow modeling assistance, the requirements on the knowledge containers are mostly limited. More precisely, to deploy the vocabulary and the similarity container, only domain-specific taxonomies annotated with similarity values (see Sect. 3.3) have to be defined. Moreover, the required adaptation knowledge is learned fully automatically (see Sect. 2.4.5). However, the workflow repository
requires sufficiently specified workflows for a proper execution of the workflow modeling assistance, i.e., consistent block-oriented workflows (see Sect. 3.2.1) annotated with labels from the specified taxonomies. The construction of an appropriate repository with sufficiently specified workflow models can become a demanding task, since the manual modeling of workflows is time-consuming and other methods to gather workflow models (e.g., automatic extraction or reuse of available workflow repositories) often still require an extensive manual revision. Thus, this chapter introduces an approach to automatically complete missing information in workflow models. This supports the construction of consistent block-oriented workflows, thereby helping to ensure that the knowledge containers are sufficiently specified such that appropriate workflow models can be generated. Moreover, this further reduces the effort for implementing the PO-CBR application.

Another impact on the utility of the constructed workflow solution is caused by the basic principle of the CBR cycle (see Sect. 2.4.1). Commonly, the best-matching case is selected during retrieval and then adapted according to the query. Smyth and Keane [233] state that this clear separation of the retrieval and adaptation stage can prevent the identification of the best-matching case, i.e., the workflow with the highest query fulfillment. More precisely, during traditional retrieval, the best-matching case is estimated independently of its adaptation capabilities. However, an initially less matching case might be better adaptable to the particular query. Thus, adaptation capabilities have to be considered already during retrieval in order to select the most appropriate case. Consequently, this chapter introduces approaches to heuristically determine the adaptability of workflows and to consider the estimated adaptability during retrieval in order to provide workflows with a higher query fulfillment.

Finally, it can be assumed that the utility of the constructed workflow is not only determined by the query fulfillment but depends on several additional criteria. For instance, a workflow with a high query fulfillment is not useful if the workflow is characterized by a low quality (see Sect. 2.2.5). Thus, the presented workflow modeling assistance already aims at ensuring the workflow quality by guaranteeing the syntactical correctness of the workflow and by learning appropriate chunks of adaptation knowledge. However, in certain scenarios, additional utility criteria on the constructed workflow may apply. In particular, the reduction of complexity can be important, as it increases the understandability of the workflow. Thus, the complexity can significantly affect the utility of the constructed workflow (e.g., for novice workflow modelers). Consequently, this chapter presents an approach to also consider the workflow complexity during retrieval and adaptation.
As a result, the workflow modeling assistance provides workflows that are optimized with regard to a high query fulfillment as well as a low workflow complexity. Moreover, the presented approach may also be applicable to various other criteria. In the remainder of this chapter, these approaches are explained in more detail. Furthermore, experimental evaluations will be summarized that demonstrated their feasibility and their potential to enhance the presented workflow modeling assistance.
6.1 Data-Flow Completion of Workflows

The modeling of control-flow oriented workflows (see Sect. 2.2.3) is focused on the activities involved in the process and their particular execution order. As a consequence, the data-flow information is frequently neglected. Most publicly available workflow repositories reflect this frequently occurring incompleteness. The IWi process model corpus [244], for example, comprises more than 4000 process models in various domains. In most of these processes, the control-flow is fully specified, whilst the data-flow information is at most sparsely defined. This incompleteness also concerns automatically constructed workflow models from textual process descriptions or process event logs (see Sect. 6.1.6). However, such incomplete process models affect the workflow modeling assistance in two ways. On the one hand, the modeling assistance might construct inappropriate workflow models, whose execution is impeded as certain activities are not provided with all required information. On the other hand, the reasoning of the modeling assistance is hampered, since the missing information impairs the retrieval and adaptation capabilities. A workflow repository with incomplete workflows thus constitutes a critical problem for the entire workflow modeling assistance. More precisely, the estimated ranking of the workflows in the retrieval stage might be deficient due to missing information. Furthermore, the enactment as well as the adaptation of the retrieved workflows are affected. Additionally, since the adaptation knowledge is learned automatically from the workflow repository, it becomes incomplete as well. As a consequence, the adaptation quality is reduced or adaptation is prevented entirely. This demands an automated completion approach in order to avoid the impact of missing information on the workflow modeling assistance and to enhance the provided workflow model. The problems resulting from such incomplete information are not limited to PO-CBR but also apply to CBR in general. Here, case completion (see
Sect. 2.4.6) is traditionally addressed by manually defined completion rules (see, e.g., [33]). However, similar to adaptation knowledge (see Chap. 5), this quickly results in an acquisition bottleneck. Thus, in the following, an automated completion approach for the data-flow information of workflows is presented, in which completion operators are learned automatically from the workflow repository.
workflow completion
training workflows
query
default completion rules completion operators
setup initial learn repository workflow completion operators repository learn adaptation knowledge workflow retrieval workflow adaptation
adaptation knowledge CBR System
workflow solution
Figure 6.2: Framework of workflow completion based on [173, 33]
The basic approach to workflow completion presented in this section was first published by Müller and Bergmann [173] (© of original publication by Springer 2016) and is now sketched in more detail. Initially, a training repository with revised and completed workflows is constructed (see Fig. 6.2). This repository then serves as a basis to learn completion operators automatically in order to limit the acquisition bottleneck of completion knowledge. The learned workflow completion operators represent domain-specific knowledge of valid data-flow completions. The remaining workflows are added to the repository and completed automatically by applying these operators in combination with a small set of mostly domain-independent completion rules. From each newly stored workflow, further completion operators can be obtained. The data-flow completion increases the information stored in the workflow repository as well as in the adaptation knowledge, which consequently enhances the entire workflow modeling assistance.

Moreover, the workflow designer can be supported during the manual modeling or adaptation of a workflow by proposing a potential data-flow completion for the current workflow model. Likewise, the query for the workflow modeling assistance could be automatically completed by suggesting a data-flow completion for each contained workflow fragment. Hence, in both scenarios, the workflow designer is supported by completion recommendations that can be verified and revised, if required. In summary, the data-flow completion can support the workflow modeler in three respects: by auto-completion of a manually modeled or adapted workflow, by enhancing the solution provided by the workflow modeling assistance, and finally by enriching corresponding workflow queries.

In the next section, the definition of complete workflows is introduced, which is derived from the requirements of the workflow modeling assistance on the workflows stored in the repository. Then, completion operators are presented and methods are described to learn and apply them automatically. Next, it is described how the degree of completion can be enhanced by generic completion rules. Since the workflow completion may result in many solutions, a heuristic solution selection is presented in order to determine an appropriate completion solution. Finally, this section concludes with the summary of an experimental evaluation.
6.1.1 Complete Workflows

The definition of complete workflows followed in this section is derived from the requirements imposed by the modeling assistance on the workflows stored in the repository. More precisely, the adaptation approaches require consistent block-oriented workflows¹ (see Sect. 3.2.1), which specify the necessary data-flow information for a proper adaptation of the respective workflow. Based on consistent workflows, compositional adaptation by workflow streams (see Sect. 5.2) can be performed, and consistent workflows further enable a flexible operator-based replacement of workflow fragments (see Sect. 5.3). Hence, consistent block-oriented workflows contain all required information demanded by the workflow modeling assistance. Thus, they are considered as complete workflows.
¹ Except for adaptation by generalization and specialization (see Sect. 5.1.1).
[Figure 6.3: Examples for incomplete data-flow information]
Recapitulating, consistent workflows require that all data is linked to the final specific workflow output (see Def. 5). As an example, in Figure 6.3 the final output of the workflow is the sandwich dish composed of a sandwich topping and a sandwich sauce. The latter contains the ingredients mayonnaise, ketchup, and tabasco. Each of the three denoted data links (1, 2, 3) is required. Otherwise, the workflow would be considered as incomplete, since entire components such as the sandwich sauce would not be contained in the final output (1), ingredients within the sandwich sauce component would be missing (2), or no data would be linked to the specific workflow output (3). Workflow completion thus aims at identifying and reconstructing those missing data links.

Furthermore, the completion aims at the construction of appropriate data-flow specifications. Thus, the definition of consistent block-oriented workflows (see Def. 5) is now extended to prevent most implausible workflow definitions. More precisely, it can be assumed that identical branches of the same XOR control-flow block, i.e., branches with identical sequences of tasks and identical data-flow, are not reasonable. Furthermore, the multiple creation of the same data output at different parts of the workflow is usually not plausible. Based on these considerations, strictly consistent block-oriented workflows are introduced as follows:

Definition 24. A block-oriented workflow is strictly consistent, if and only if all of the following conditions hold:
1. The workflow is consistent (according to Def. 5).
2. The workflow has no identical XOR branches within the same control-flow block.
3. The workflow does not have several creator tasks $t_1, t_2 \in N^{T*}$ producing the same data output, i.e., $t_1^{D\triangleright} = t_2^{D\triangleright}$ (where $t^{D\triangleright}$ denotes the set of output data nodes of a task $t$), unless they are contained in different XOR branches within the same control-flow block.
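The third condition can also be checked mechanically. The following sketch illustrates such a check on a deliberately simplified task representation (an output label plus identifiers of the enclosing XOR block and branch); these attribute names are illustrative assumptions rather than the data structures of the actual framework.

```python
from collections import defaultdict

def violates_creator_condition(creator_tasks):
    """Check condition 3 of Def. 24: no two creator tasks may produce the
    same data output, unless they lie in different XOR branches of the
    same control-flow block.

    Each task is a dict with keys (assumed for this sketch):
      'output'    -- label of the produced data node
      'xor_block' -- id of the enclosing XOR block (None if not in an XOR)
      'branch'    -- id of the branch within that block (None if not in an XOR)
    """
    by_output = defaultdict(list)
    for t in creator_tasks:
        by_output[t['output']].append(t)

    for tasks in by_output.values():
        for i in range(len(tasks)):
            for j in range(i + 1, len(tasks)):
                a, b = tasks[i], tasks[j]
                same_block = (a['xor_block'] is not None
                              and a['xor_block'] == b['xor_block'])
                # The only permitted duplication: different branches of
                # the same XOR block (only one branch is ever executed).
                if not (same_block and a['branch'] != b['branch']):
                    return True  # duplicate creator output found
    return False
```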
This definition of strictly consistent workflows is now explained in more detail based on two examples (see Figs. 6.4 and 6.5).
[Figure 6.4: Example of strict consistency (identical branches)]
The second condition prohibits equal XOR branches, since the corresponding control-flow block would be superfluous. For instance, in the example illustrated in Figure 6.4, the identical XOR branches literally mean to "either create the sauce or create the identical sauce in the same manner". In the corresponding example of a strictly consistent XOR block in Figure 6.4, either a mayonnaise-based sauce or a tomato-based sauce is created instead, i.e., the branches specify different possibilities for preparation. The third condition prohibits the multiple creation of identical data nodes at different parts of the workflow, since it would override the previously created data node (unless they are contained in different XOR branches).

[Figure 6.5: Example of strict consistency (identical creator data node)]
In the example illustrated in Figure 6.5, a mayonnaise-based sauce is created first and then replaced by a tomato-based sauce, i.e., the creation
of the mayonnaise-based sauce becomes superfluous. Thus, an appropriate workflow that creates several sauces should also create different data nodes representing the two sauces respectively. The corresponding example of a strictly consistent workflow fragment (see Fig. 6.5) consequently creates a mayonnaise sauce and a tomato sauce. However, in different XOR branches, the creation of the same data node is permitted, since only one branch is executed (see the right-sided example in Fig. 6.4).

To summarize, leveraging the automatic data-flow completion to construct consistent block-oriented workflows ensures an appropriate basis for the entire workflow modeling assistance, since such workflows enable retrieval and adaptation. Thus, these consistent block-oriented workflows are considered as complete workflows. Furthermore, in order to avoid most implausible workflow specifications, the presented completion approach aims at constructing complete workflows that are also strictly consistent.
6.1.2 Workflow Completion Operators

The automated workflow completion applies completion operators in order to compensate missing information within the data-flow. Roughly speaking, a completion operator specifies a valid data-flow for a particular task. Hence, the operator represents a possible combination of input and output data nodes for a certain activity (see Def. 25). Similar to the adaptation operators (see Sect. 5.3), the completion operators implicitly contain pre- and postconditions on the operator application.

Definition 25. A completion operator $o$ is a workflow $o = (N_o, E_o)$ consisting of a single task $t$ (also denoted as $t_o$), i.e., $N_o^T = \{t\}$ and $N_o^C = \emptyset$, together with a set of input data nodes $t^{D\triangleleft}$ and a set of output data nodes $t^{D\triangleright}$, i.e., $N_o^D = t^{D\triangleleft} \cup t^{D\triangleright}$.

[Figure 6.6: Example completion operator]
Figure 6.6 illustrates an example completion operator, describing that a sandwich sauce can be created by mixing mayonnaise, ketchup, and tabasco. Moreover, completion operators can also be generalized (see Sect. 5.1.1). A possible generalization of the previous completion operator is illustrated in Figure 6.7. Here, the data nodes sauce and flavoring as well as the task combine represent more general labels. Hence, the sandwich sauce is created by arbitrarily combining mayonnaise with any sauce and any flavoring.

[Figure 6.7: Example generalized completion operator]
These completion operators enable the identification and reconstruction of missing data-flow information in case they represent an appropriate completion scenario. To determine the applicability of a completion operator $o$ with operator task $t_o$ to a task $t$ in a workflow $W = (N, E)$, two mapping functions $m_{in}: t_o^{D\triangleleft} \to N^D$ and $m_{out}: t_o^{D\triangleright} \to N^D$ are introduced in Definition 26. These functions map the input and output data nodes of the operator to the data nodes of the workflow such that the labels of the operator data nodes are equal or more generalized. In case of input or output data nodes already linked to the respective task $t$, they must be mapped by the corresponding mapping function. Existing input and output data nodes must consequently match the respective data nodes of the completion operator. The mapping $m_{in}$ is a total mapping function, which requires that each input data node of the operator can be mapped to a data node of the workflow (missing input data links will be reconstructed). In contrast, the output data nodes of the operator do not need to be mapped by the partial mapping function $m_{out}$ (in this case, new output data nodes will be created).

Definition 26. For a task $t$ in a workflow $W = (N, E)$ and a completion operator $o = (N_o, E_o)$ with task $t_o$, let $m_{in}: t_o^{D\triangleleft} \to N^D$ be a total injective mapping s.t. $\forall d \in t^{D\triangleleft}: \exists d' \in t_o^{D\triangleleft}: d = m_{in}(d')$ and $\forall d \in t_o^{D\triangleleft}: S(m_{in}(d)) \sqsubseteq S(d)$. Furthermore, $m_{out}: t_o^{D\triangleright} \to N^D$ is a partial injective mapping s.t. $\forall d \in t^{D\triangleright}: \exists d' \in t_o^{D\triangleright}: d = m_{out}(d')$ and $\forall d \in t_o^{D\triangleright}: S(m_{out}(d)) \sqsubseteq S(d) \lor m_{out}(d) = \emptyset$².
out (d) = ∅ denotes that the mapping is undefined for output data node d, i.e, d is not within the domain of the function (d 6∈ Dmout )
The applicability of a completion operator is determined by these mapping functions as follows:

Definition 27. An operator $o = (N_o, E_o)$ with operator task $t_o \in N_o^T$ is applicable to a task $t$ under the mapping pair $(m_{in}, m_{out})$ iff $S(t) \sqsubseteq S(t_o)$ and if the mapping functions $m_{in}: t_o^{D\triangleleft} \to N^D$ and $m_{out}: t_o^{D\triangleright} \to N^D$ exist as defined in Definition 26.

[Figure 6.8: Example mapping between operator and incomplete workflow]
As an example, Figure 6.8 illustrates the mappings between a generalized operator and an incomplete workflow. The depicted operator is applicable to the workflow task mix, since the corresponding tasks match (i.e., combine is more general than mix) and the input data nodes of the operator are already contained in the workflow (see mappings $m_{in}$). Since no output data node for the task mix exists, the output data node of the operator does not have to be mapped.

The application of a completion operator reconstructs missing data-flow information based on the two mapping functions. Since the operator describes valid flows of data, the mappings basically refer to the corresponding input and output data of the workflow task after application, i.e., missing data links are added and, if required, new output data nodes are created. The
completion thereby assumes that the workflow contains each data node only once (i.e., no two data nodes with identical labels exist). Thus, the completion operators link already existing data nodes instead of creating new ones, if possible. As for a completion operator and a particular workflow task multiple valid mappings may exist (inputs or outputs of the operator task can be mapped differently), the application can result in several possible workflow completions. The application of a completion operator with operator task $t_o \in N_o^T$ under a certain mapping $(m_{in}, m_{out})$ is specified as follows:

Definition 28. The application of a completion operator $o = (N_o, E_o)$ with operator task $t_o \in N_o^T$ to the task $t$ of a workflow $W = (N, E)$ under the mapping pair $(m_{in}, m_{out})$ results in the workflow $W' = (N \cup N^{D+}, E \cup E^{D+})$. Here, $N^{D+}$ is the set of output data nodes newly produced by the operator application, i.e., $N^{D+} = \{d \in t_o^{D\triangleright} \mid \nexists d' \in N^D : S(d') \sqsubseteq S(d)\}$. $E^{D+}$ determines the new input and output data links generated by the operator (if not already present), i.e., $E^{D+} = \{(m_{in}(d), t) \mid d \in t_o^{D\triangleleft}\} \cup \{(t, m_{out}(d)) \mid d \in t_o^{D\triangleright}\} \cup \{(t, d) \mid d \in N^{D+}\}$.
[Figure 6.9: Example result for an operator application]
The application of a completion operator to a task t is now explained based on the example illustrated in Figure 6.8. As previously described, the generalized operator o is applicable to the task mix of the workflow illustrated in Figure 6.8. Considering a certain mapping, missing data-flow information related to this task is added to the workflow. More precisely, mapped input and output data nodes not linked to the particular workflow task are linked by corresponding data-flow edges. In the illustrated example, the data nodes mayonnaise and ketchup mapped by $m_{in}$ are already linked to the task mix. Thus, only for the data node tabasco an additional data-flow
edge has to be inserted into the workflow (see the data link marked with ∗ in Fig. 6.9). New data nodes are added to the workflow if an output data node of the respective operator is not mapped by $m_{out}$. Here, a new data node sandwich sauce is created as an output of the task mix (see the workflow elements marked with ∗ in Fig. 6.9).
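To make the mechanics of Definition 28 concrete, the following sketch applies an operator to a task on a minimal set-based workflow representation. The dictionary layout and names are illustrative assumptions; in particular, the sketch ignores label generalization and treats data nodes as plain labels.

```python
def apply_completion_operator(workflow, task, op, m_in, m_out):
    """Apply a completion operator to `task` under (m_in, m_out).

    workflow: dict with 'data_nodes' (set of labels) and
              'data_edges' (set of (source, target) pairs)
    op:       dict with 'inputs' and 'outputs' (sets of operator data labels)
    m_in:     total mapping  operator input  -> workflow data node
    m_out:    partial mapping operator output -> workflow data node
    """
    # Link every mapped input data node to the task (part of E^{D+}).
    for d in op['inputs']:
        workflow['data_edges'].add((m_in[d], task))
    # Mapped outputs are linked; unmapped outputs yield new nodes (N^{D+}).
    for d in op['outputs']:
        if d in m_out:
            workflow['data_edges'].add((task, m_out[d]))
        else:
            workflow['data_nodes'].add(d)
            workflow['data_edges'].add((task, d))
    return workflow
```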
6.1.3 Learning Workflow Completion Operators

Similar to the learning of adaptation knowledge (see Sect. 2.4.5), the completion operators can be learned automatically from the workflow repository. This is essential in order to avoid a manual definition of these operators by a domain expert. Otherwise, a straightforward manual completion of the workflows would probably be more appropriate. The completion operators are automatically extracted from the training workflows stored in the repository, since they mostly contain complete and valid data-flow information. Hence, for each complete task $t$ (i.e., $t^{D\triangleleft} \neq \emptyset \wedge t^{D\triangleright} \neq \emptyset$) in the repository, a completion operator is constructed respectively. The completion operator $o = (N_o, E_o)$ is specified by the workflow task $t$, i.e., $N_o^T = \{t\}$, the respective input and output data nodes, i.e., $N_o^D = \{d \in N^D \mid d \in t^{D\triangleleft} \vee d \in t^{D\triangleright}\}$, as well as the corresponding data-flow edges, i.e., $E_o^D = \{(d, t), (t, d) \in E^D \mid d \in N_o^D\}$.

The learned completion operators are subsequently generalized, aiming at increasing their applicability. Here, a simple approach to generalization is performed, which is based on a threshold parameter $\theta$ denoting the level of generalization within the taxonomy. More precisely, each task and data node label is generalized to the highest parent label annotated with a similarity value less than $\theta$, i.e., $sim_\psi(S(n)) < \theta$. This generalization approach is less restrictive than the generalization approach for workflows presented in Section 5.1.1. This is important, as the workflow generalization requires a sufficient number of workflows stored in the repository for determining appropriate generalizations. On the assumption that the initial training repository is relatively small, the workflow generalization approach becomes unsuitable for generalizing the completion operators. However, for larger training repositories, the workflow generalization could also be applied straightforwardly.

Several different operators for the same task label could be derived by extracting the completion operators from all workflows within the repository. If, however, two completion operators $o_1, o_2$ are entirely identical (i.e., $o_1 \equiv o_2$), they are only stored once. In this case, the stored operator is annotated with an evidence score $Score(o)$, which defines how often the operator has
been extracted from the repository. The completion algorithm will prefer operators with a higher evidence score $Score(o)$, since a higher score implies that these operators are more universally applicable.
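As a rough illustration of this extraction step, the following sketch counts identical operators over a repository. The per-task attribute names are assumptions made for this example, and the subsequent taxonomy-based generalization step is omitted.

```python
from collections import Counter

def learn_completion_operators(repository):
    """Extract one completion operator per complete task and count how
    often identical operators occur (the evidence score Score(o)).

    Each workflow is assumed to provide, per task, its label and the sets
    of input and output data labels -- an illustrative representation.
    """
    evidence = Counter()
    for workflow in repository:
        for task in workflow['tasks']:
            if task['inputs'] and task['outputs']:  # complete tasks only
                # Canonical, hashable representation of the operator.
                op = (task['label'],
                      frozenset(task['inputs']),
                      frozenset(task['outputs']))
                evidence[op] += 1
    return evidence  # operators together with their evidence scores
```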
6.1.4 Workflow Completion

Based on the completion operators learned from the initial workflow repository, further workflows can be automatically completed. The approach is illustrated in Algorithm 8 and will be explained in more detail in the following sections. The basic idea is to apply all learned and applicable domain-specific completion operators (see Sect. 6.1.4), resulting in a set of completed workflows. Next, generic default completion rules are applied to those tasks not yet completed by an appropriate operator (see APPLY RULES in Algorithm 8, Sect. 6.1.4). Finally, from the set of completed workflows, a heuristic method aims at identifying the most appropriate workflow completion (see SOLUTION SELECTION in Algorithm 8, Sect. 6.1.4).

Algorithm COMPLETE WF(W, O)
Input: Workflow W = (N, E) and completion operators O
Output: Completed workflow solution W′ = (N′, E′)

SOL := {W};
for each task t ∈ N^T (ordered by control-flow, first task first) do
    newSOL := ∅;
    for each completion solution s = (N_s, E_s) ∈ SOL do
        for each operator o with mapping pair (m_in, m_out) ∈ CO(t) do
            s′ := workflow after applying o under (m_in, m_out) on s;
            if ¬ (s′ can never become strictly consistent) then
                newSOL := newSOL ∪ {s′};
    SOL := newSOL;
SOL_operators := {s ∈ SOL | s ≢ W};
SOL_rules := APPLY RULES(SOL_operators, RB);
return SOLUTION SELECTION(SOL_rules);

Algorithm 8: Data-flow completion
Applying Completion Operators

The completion of workflows by the learned operators is based on a breadth-first search (see Algorithm 8) aiming at determining all attainable workflow completions. Starting with the first task regarding the control-flow, the algorithm successively completes the data-flow information of all tasks in the workflow. Initially, the data-flow information of the first task is completed by using all applicable completion operators, which results in multiple completion solutions. Next, the subsequent task is again completed by applying all applicable operators to each of the previously determined solutions. The data-flow of the completion solutions is thereby successively expanded. This process is continued until all tasks of the workflow have been processed, which finally results in a set of completed workflows SOL. In Algorithm 8, CO(t) represents the set of all applicable completion operators for a task t along with all valid mapping pairs (m_in, m_out) respectively.

This approach is characterized by a high complexity with regard to run-time and memory consumption. Thus, it is important to limit the set of possible workflow completions. Here, current solutions are not considered further if they cannot result in strictly consistent block-oriented workflows (see Def. 24). More precisely, a workflow can never become strictly consistent by applying additional completion operators if it already contains several creator tasks with identical outputs (unless they are contained in different branches of the same XOR block) or if an already completed XOR block contains identical branches. Consequently, those completion solutions are discarded, which sufficiently limits the memory consumption as well as the run-time of the algorithm and further prevents undesired workflow completions. However, in case of a significantly increased number of workflow completion operators or a significantly increased workflow size, a heuristic search algorithm may be required.

Only those completion solutions s = (N_s, E_s) ∈ SOL for which at least one completion operator was applied (i.e., solutions not equivalent to the original workflow W) are retained and stored in the set SOL_operators. Some tasks of these completed solutions may, however, still be incomplete. Thus, a set of default rules is subsequently applied, aiming at further completing the data-flow information, as will be described in the next section.

Applying Default Completion Rules

After determining possible completions for a particular workflow by applying the learned operators, the data-flow information of each completion solution
is further increased by using a set of generic and mostly domain-independent default rules (see APPLY RULES(SOL_operators, RB) in Algorithm 8). Most basically, the default rules aim at identifying a suitable output for each task without an output data node such that the data-flow consistency of the respective completion solution is established (see Sect. 3.2.1), i.e., each data node is contained in the final specific workflow output.

Initially, pre-computations are performed to derive additional information that facilitates the identification of appropriate output data nodes. More precisely, it is determined which data nodes are creator data nodes (see Sect. 3.2.1) within the learned completion operators $O$, i.e., $D^* = \bigcup_{o \in O} N_o^{D*}$. Moreover,
for each task $t$ in the completion solution $W = (N, E) \in SOL_{operators}$, a set of unproduced data nodes $t^{D-} \subseteq N^D$ is determined. Unproduced data nodes of a task $t \in N^T$ are those data nodes not produced by a previous task in the workflow, i.e., $t^{D-} = \{d \in N^D \mid \nexists t' \in N^T : d \in t'^{D\triangleright} \wedge t' < t\}$.

RULE BASE 1
1: IF $t^{D\triangleright} = \emptyset \wedge \exists! d \in t^{D\triangleleft}$ THEN $t^{D\triangleright} := \{d\}$;
2: IF $t^{D\triangleright} = \emptyset \wedge \exists! d \in (t^{D\triangleleft} \cap N^{D*})$ THEN $t^{D\triangleright} := \{d\}$;

RULE BASE 2
3: IF $t^{D\triangleright} = \emptyset \wedge \exists! d \in t^{D\triangleleft} : \exists t' \in N^T : t < t' \wedge d \in t'^{D\triangleleft}$ THEN $t^{D\triangleright} := \{d\}$;

RULE BASE 3
4: FOR EACH $t' \in N^T : t < t'$ (ordered by control-flow, closest task first)
   IF $t^{D\triangleright} = \emptyset \wedge \exists! d \in t^{D-} : d \in t'^{D\triangleleft}$ THEN $t^{D\triangleright} := \{d\}$;
5: FOR EACH $t' \in N^T : t < t'$ (ordered by control-flow, closest task first)
   IF $t^{D\triangleright} = \emptyset \wedge \exists! d \in (t^{D-} \cap D^*) : d \in t'^{D\triangleleft}$ THEN $t^{D\triangleright} := \{d\}$;

Figure 6.10: Default rule bases ($\exists! d$ denotes that exactly one $d$ exists)
The completion rules are organized into several rule bases (see Fig. 6.10). Ordered by priority, the rule bases are consecutively applied to a particular completion solution $W = (N, E) \in SOL_{operators}$ (see Algorithm 9) in order to further extend the corresponding data-flow information. The application of a rule base successively matches the specified rules against the tasks currently lacking an output data node (i.e., $t^{D\triangleright} = \emptyset$), starting with the
first task regarding the control-flow. During this rule-based completion process, only the first matching rule is used to determine the respective output. Thereby, no new output data nodes are created. Instead, suitable data nodes already contained in the workflow are identified and linked to the respective task as an output data node. The rules thereby basically consider the current input data nodes of the particular task and the relationships to the remaining workflow in order to determine missing data-flow links such that the data-flow consistency can be established.

Algorithm APPLY RULES(W, RB)
Input: Workflow W = (N, E) and rule bases RB
Output: Rule-based completed workflow W = (N, E)

for each rule base rb ∈ RB (ordered by priority, ascending) do
    for each task t ∈ N^T (ordered by control-flow, first task first) do
        W := APPLY RULEBASE(t, rb);
return W;

Algorithm 9: Application of rule bases
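As an illustration of how a single rule base could be realized, the following sketch implements rules 1 and 2 on a simplified task representation (an input list, an output set, and a precomputed set of creator data nodes); the names are assumptions made for this example, not the implementation of the framework.

```python
def apply_rule_base_1(task, creator_nodes):
    """Rules 1 and 2 of the first rule base: pick the task's output
    among its own input data nodes.

    task:          dict with 'inputs' (list of labels) and 'outputs' (set)
    creator_nodes: set of creator data node labels (N^{D*})
    """
    if task['outputs']:
        return  # the rules only apply to tasks without an output
    if len(task['inputs']) == 1:
        # Rule 1: a single input is passed through as the output.
        task['outputs'] = {task['inputs'][0]}
        return
    creators = [d for d in task['inputs'] if d in creator_nodes]
    if len(creators) == 1:
        # Rule 2: exactly one input is a creator data node.
        task['outputs'] = {creators[0]}
```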
The rule bases illustrated in Figure 6.10 are now explained in more detail. For each of the five rules, an exemplary application scenario is sketched, illustrating the task t with its corresponding data-flow information, the relationships to the remaining workflow, as well as the resulting data-flow completion. In the illustrated scenarios (see, e.g., Fig. 6.12), dashed lines mark transitive control-flow edges, and the resulting data-flow completion, i.e., the new data-flow edge, is marked with an asterisk ∗.

[Figure 6.11: Example of default completion rule 1]
According to the first rule, tasks with exactly one input data node are regarded as processor tasks that produce and consume the same data node. Thus, the input data node is also linked as the corresponding output of the task. In the illustrated example (see Fig. 6.11), the execution of the preparation step slice cucumber would naturally result in the data output cucumber, which is thus specified as the corresponding output data node.
[Figure 6.12: Example of default completion rule 2]

[Figure 6.13: Example of default completion rule 3]
In case more than one input data node exists for a task t, the next two rules (rules 2 and 3) aim at choosing an appropriate input data node as the output. If exactly one input data node of a task t is a creator data node in the current workflow (i.e., contained in $N^{D*}$), rule 2 specifies this data node as the corresponding output. This rule is based on the assumption that combining a creator data node with other data nodes usually enriches the creator data node by additional data. In the example (see Fig. 6.12), the sandwich sauce is considered as a creator data node, for example, produced by mixing mayonnaise, ketchup, and tabasco. If this created data node is processed by seasoning the sandwich sauce with herbs, the resulting sandwich sauce represents an appropriate output for the particular task.

The third rule selects an input data node of a task t as output that is consumed by a subsequent task, if only one such input data node exists. This rule ensures that the produced output is consumed by a subsequent task in the workflow. Otherwise, the data-flow consistency would be violated, i.e., the produced output would not be contained in the specific workflow output $W^O$ (see Sect. 3.2.1). For example (see Fig. 6.13), if multiple ingredients such as salt, pepper, and mayonnaise are combined and only the mayonnaise
is reused, it is linked as the corresponding output data node. Without this data link, the subsequent task would consume an unseasoned mayonnaise, and the data-flow consistency would be violated, since the seasoned mayonnaise would not be contained in the final specific workflow output.

[Figure 6.14: Example of default completion rule 4]

[Figure 6.15: Example of default completion rule 5]
The rules specified in the last rule base (rules 4 and 5) again focus on linking a data node as an output that is required by a successor task. In contrast to the previous rules, however, the output is not selected from the inputs of the current task but from the inputs of a successor task. The rules are based on the assumption that more closely related tasks are also more related to each other concerning the data-flow, i.e., a produced data node is more likely consumed by a closely related successor task. Consequently, the last two rules search for the first successor task matching the particular rule's precondition. More precisely, rule 4 considers a data node for task t as an output that has not been produced by a task prior to t, i.e., that is contained within the unproduced data nodes $t^{D-}$, but is required by a successor task. If exactly one such data node for a successor task exists, the corresponding data node linked to the
closest successor task is specified as the output of task t. This potentially constructs novel creator data nodes in case the linked output data node is no input of task t. In the illustrated example (see Fig. 6.14), the task t (combine) consumes mushrooms and bell pepper, and the closest successor task for which the rule matches consumes vegetable mix and garlic. Here, the vegetable mix is linked as the output, since the garlic data node has already been produced.

The last rule (rule 5) is an extension of the fourth rule, which additionally requires that the data node linked as output to task t is a creator data node determined from the learned completion operators (i.e., contained in $D^*$); the precondition is extended because no single data node could be identified matching rule 4. In the example (see Fig. 6.15), task t combines mayonnaise, ketchup, and tabasco, while a successor task seasons sandwich sauce with salt and pepper. Here, the sandwich sauce is the only currently unproduced creator data node ($t^{D-} \cap D^*$) that is consumed by a subsequent task. Thus, sandwich sauce is linked as the output of task t, since an unproduced creator data node indicates that it must be produced at some point in the workflow in order to establish the consistency of the data-flow.

These rules are based on common data-flow patterns and should therefore be mostly domain-independent. However, in certain application scenarios, adjustments might be required. For this purpose, the particular rule bases can easily be exchanged.

Solution Selection

The automated data-flow completion of a workflow as previously described might result in a large number of possible completion solutions. Since the manual selection of an appropriate solution is not reasonable, a method to automatically determine a single completion solution (see the function SOLUTION SELECTION(SOL_rules) in Algorithm 8) is required. Therefore, the constructed completion solutions are filtered based on various criteria. Since strictly consistent block-oriented workflows are appropriate for retrieval and adaptation and are usually characterized by a meaningful definition of the data-flow, only such workflows are retained, if possible. The final solution selection then mostly depends on the rate of completeness of the particular solution and the evidence scores of the applied operators, i.e., how often these operators have been learned from the workflow repository, indicating their universal applicability in the particular domain (see Sect. 6.1.3). The process for the solution selection is consequently defined as follows:
1. Initialize SOL′ with the set of completed workflow solutions SOL_rules.
2. If SOL′ contains at least one strictly consistent workflow, then all workflows that are not strictly consistent are removed from SOL′.
3. Next, only retain those completed workflow solutions in SOL′ with the largest number of complete tasks, i.e., tasks that have at least one input and one output data node.
4. Finally, rank the remaining workflows in SOL′ by the sum of the evidence scores annotated to the respectively applied completion operators. Select the completed workflow with the highest score.
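This four-step procedure can be sketched compactly as follows, assuming that each candidate solution exposes its strict consistency, its number of complete tasks, and the summed evidence score of the applied operators; these attribute names are illustrative.

```python
def select_solution(solutions):
    """Heuristic selection among completed workflow candidates.

    Each candidate is assumed to be a dict with the keys
    'is_strictly_consistent' (bool), 'complete_tasks' (int), and
    'evidence' (summed Score(o) of the applied operators).
    """
    sol = list(solutions)                                    # step 1
    strict = [s for s in sol if s['is_strictly_consistent']]
    if strict:                                               # step 2
        sol = strict
    best = max(s['complete_tasks'] for s in sol)
    sol = [s for s in sol if s['complete_tasks'] == best]    # step 3
    return max(sol, key=lambda s: s['evidence'])             # step 4
```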
6.1.5 Evaluation

This section presents an experimental evaluation to demonstrate the feasibility of the described approach to workflow completion. The evaluation strictly follows an evaluation performed for a previous version of the described approach [173]. Since strictly consistent workflows reflect the perception of complete workflows, the evaluation primarily investigated whether the data-flow completion results in strictly consistent workflows (Hypothesis H1). Furthermore, it was verified whether the workflow completion produces workflows of high quality (Hypothesis H2).

H1. The presented workflow completion results in strictly consistent workflows in most cases.

H2. The presented workflow completion results in completed workflows with a high quality.

The evaluation used a workflow repository with 61 sandwich recipes manually modeled based on cooking recipes from various Internet sources. For all 61 cooking workflows, strict consistency and a high quality were ensured. The workflows contained AND, XOR, as well as LOOP structures. The required taxonomies of ingredients and preparation steps were manually constructed based on the WikiTaaable ontology [9] (http://wikitaaable.loria.fr).

The evaluation is performed by an ablation study in which data-flow information is removed from the workflows. The deconstructed workflows are then automatically completed and compared with the corresponding original workflows. Thus, the evaluation uses a repository with incomplete
workflows and a repository of golden standard workflows, specifying the optimal workflow completion. In this evaluation, the original 61 workflows serve as golden standard workflows and are also used as the training repository $RP_{train}$ for learning completion operators. Incomplete workflows were automatically constructed by removing data-flow information from the golden standard workflows. As the incomplete workflows should reflect scenarios occurring in the cooking domain, real cooking recipes were analyzed. This investigation showed that the input ingredients for each preparation step are usually explicitly specified, but the output ingredients are commonly missing. Based on this consideration, two test workflow repositories with incomplete workflows were constructed. For the generation of the first test workflow repository $RP_{test}^{I}$, all output data links were removed from the workflows. In the second workflow repository $RP_{test}^{II}$, even less data-flow information is specified. More precisely, the workflows in this repository only contain the data nodes given in the ingredient list of a recipe, i.e., the input data. Furthermore, only those data links are contained that connect the input data to the first task consuming the corresponding data node, i.e., the first consumption of the input data with regard to the control-flow.

Evaluation Criteria

Based on the precision and recall measures as defined by Schumacher et al. [223], the compliance of the completed workflow with the golden standard workflow was measured to determine the quality of the completed workflow. For computing these measures, the recall and precision are assessed for each task in the completed workflow and then aggregated to an overall workflow recall or precision value.

$$precision_T(t, t^*) = \frac{|t^{D\triangleleft} \cap t^{*D\triangleleft}| + |t^{D\triangleright} \cap t^{*D\triangleright}|}{|t^{D\triangleleft}| + |t^{D\triangleright}|} \qquad (6.1)$$

$$recall_T(t, t^*) = \frac{|t^{D\triangleleft} \cap t^{*D\triangleleft}| + |t^{D\triangleright} \cap t^{*D\triangleright}|}{|t^{*D\triangleleft}| + |t^{*D\triangleright}|} \qquad (6.2)$$
The percentage of the data links connected to the completed task $t$ that are also data links of the golden standard task $t^*$ is defined as the precision (see Formula 6.1), whereas the percentage of data links connected to the golden standard task $t^*$ that are also contained in the completed task $t$ is specified as the recall (see Formula 6.2).
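Under the assumption that each task is represented by sets of input and output data labels ('in' and 'out' below are illustrative names), these task-level measures could be computed as follows; the guard against empty denominators is an added assumption of the sketch.

```python
def task_precision(t, t_star):
    """Formula 6.1: share of the completed task's data links that also
    occur in the golden standard task. Tasks carry sets 'in' and 'out'."""
    matched = len(t['in'] & t_star['in']) + len(t['out'] & t_star['out'])
    total = len(t['in']) + len(t['out'])
    return matched / total if total else 1.0

def task_recall(t, t_star):
    """Formula 6.2: share of the golden standard task's data links that
    are recovered by the completed task."""
    matched = len(t['in'] & t_star['in']) + len(t['out'] & t_star['out'])
    total = len(t_star['in']) + len(t_star['out'])
    return matched / total if total else 1.0
```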
$$precision(W, W^*) = \frac{1}{|N^T|} \cdot \sum_{i=1}^{|N^T|} precision_T(t_i, t_i^*) \qquad (6.3)$$

$$recall(W, W^*) = \frac{1}{|N^T|} \cdot \sum_{i=1}^{|N^T|} recall_T(t_i, t_i^*) \qquad (6.4)$$
Consequently, the workflow precision (see Formula 6.3) computes the average ratio of the data links in the completed workflow $W$ that are also contained in the golden standard workflow $W^*$ over all tasks. In contrast, the workflow recall (see Formula 6.4) measures the average ratio of data links in the golden standard workflow $W^*$ that are also contained in the completed workflow $W$ over all tasks. In both formulas, $t_i$ refers to a task in the completed workflow and $t_i^*$ refers to the corresponding task in the golden standard workflow.

$$F1(W, W^*) = 2 \cdot \frac{precision(W, W^*) \cdot recall(W, W^*)}{precision(W, W^*) + recall(W, W^*)} \qquad (6.5)$$
Finally, an overall measure F1 (see Formula 6.5) aggregates the workflow precision and workflow recall into a single value. This score is defined as the harmonic mean of both values.

Evaluation Results

For the evaluation of the hypotheses, leave-one-out experiments were performed. More precisely, each workflow $W$ from the test repositories ($RP_{test}^{I}$ or $RP_{test}^{II}$, respectively) was selected as a test workflow. Then, the completion operators were learned from the training repository $RP_{train}$ without the corresponding test workflow $W$. Thus, only completion operators resulting from other workflows were learned.

The workflow completion constructed 44 strictly consistent test workflows (72.13%) for $RP_{test}^{I}$ but only 5 strictly consistent test workflows (8.20%) for $RP_{test}^{II}$. Thus, Hypothesis H1 (completion results in strictly consistent workflows in most cases) cannot be confirmed for $RP_{test}^{II}$. However, for $RP_{test}^{I}$, Hypothesis H1 is mostly confirmed. Consequently, for a successful workflow completion, a certain amount of information in the workflow or the learned completion operators is required.

Additionally, the quality of the completed workflows was analyzed for validating Hypothesis H2.
Table 6.1: Quality compared to golden standard workflow

             $RP_{test}^{I}$                  $RP_{test}^{II}$
             uncompleted  completed           uncompleted  completed
precision    1.00         0.98 (-0.02)        1.00         0.84 (-0.16)
recall       0.59         0.98 (+0.39)        0.45         0.73 (+0.28)
F1           0.74         0.98 (+0.24)        0.62         0.78 (+0.16)
In Table 6.1, the average precision, recall, and F1 score values between the test workflow $W$ and the corresponding training workflow $W_{train}$ are shown prior to completion as well as after completion. For the test workflow repository $RP_{test}^{I}$, the workflow completion results in only a slight decrease in precision (-0.02), showing that only a low rate of wrong data links is inserted during the completion process. In contrast, the recall increased significantly from 0.59 to 0.98. Consequently, the F1 score is also significantly increased (+0.24). For the second workflow repository $RP_{test}^{II}$, the recall (+0.28) and F1 score (+0.16) were likewise increased significantly. However, the reduction of precision (-0.16) indicates that for this repository some inserted data-flow information is not in compliance with the golden standard workflow. Thus, the quality of the workflow resulting from the workflow completion is significantly determined by the completeness of the initial workflow prior to completion. However, for both repositories, the F1 score is significantly increased. Thus, Hypothesis H2 is clearly confirmed.

Overall, the experiments showed that the presented approach produces workflow completions of high quality. However, a training repository with a sufficient number of manually completed workflows is required in order to extract enough completion information, such that the learned operators cover a broad range of domain-specific data-flow patterns. The required amount of training workflows thereby depends on the data-flow information specified in the workflow to be completed, i.e., a sparsely defined data-flow requires more completion knowledge.
6.1.6 Conclusions and Related Work

This section presented an approach for the automated completion of workflows by reconstructing missing data-flow information. An experimental evaluation demonstrated the feasibility of the presented approach, showing that incomplete workflows can be transformed into sufficiently specified workflows if enough completion knowledge is available. The automated com-
pletion enhances the workflow modeling assistance in three ways: First, the completion lessens the knowledge gap within the workflow repository and the adaptation knowledge container. Thus, the workflow modeling assistance is improved, since the corresponding retrieval and adaptation processes are provided with more information. Furthermore, based on a repository of complete workflows, only consistent block-oriented workflows are constructed by the workflow modeling assistance. Next, the user could trigger this approach during the manual modeling or adaptation of a particular workflow, which would automatically complete the data-flow of the respective workflow model. As a consequence, the efficiency of the manual modeling or adaptation is increased. Finally, a POQL query could also be completed automatically, which eases the specification of a comprehensive query. The result of the workflow modeling assistance is thereby improved, since a comprehensively specified query more likely yields a workflow matching the particular needs.

In Case-Based Reasoning, various approaches have addressed the completion of missing case information (e.g., [39, 8, 91, 206]). For PO-CBR in particular, workflow completion has been considered by approaches automatically extracting processes from textual process descriptions (e.g., [222, 62]). Here, the completion is based on linguistic methods compensating missing information that only implicitly results from the textual descriptions. In the field of workflow management systems, several approaches address the completion of workflows with regard to the construction of executable workflows [245, 116]. Process mining is also closely related to workflow completion, but the current literature and available process mining tools mostly focus on the identification of the control-flow. Only recent process mining approaches have highlighted and addressed the analysis of the data-flow (e.g., [56]).
6.2 Adaptation-Guided Workflow Retrieval

The presented workflow modeling assistance constructs a workflow solution by retrieving the best-matching workflow from the repository and by compensating the respective deficiencies in a subsequent adaptation process. However, an initially less matching workflow, which is consequently disregarded, may be better adaptable to the particular needs. Hence, this retrieval approach cannot ensure that the selected workflow results in the best possible workflow solution. This isolation between the retrieval and adaptation stages is a well-known problem in CBR. As pointed out by Smyth and Keane, "the success of case-based problem solving is critically dependent
on the retrieval of a suitable case; that is, one that can be adapted to give the desired solution" [233, p. 1]. Motivated by this, several adaptation-guided retrieval approaches were presented that consider adaptability during retrieval (e.g., [233, 128]).
[Figure 6.16: Workflow retrieval and workflow adaptability]
For PO-CBR, this issue is illustrated in more detail based on Figure 6.16. First, workflow retrieval is performed, ranking the workflows in the repository $RP$ according to the fulfillment of a given query $q$. Here, $W_1$ denotes the best-matching workflow and $W_n$ the least matching workflow respectively, i.e., $QF(q, W_1) \geq QF(q, W_i) \geq QF(q, W_n)$. By using the traditional retrieval approach, the best-matching workflow $W_1$ is selected and subsequently adapted with regard to the query, which finally results in the adapted workflow $a(q, W_1)$ (see bold lines). However, the adaptability of the ranked workflows differs, since the available adaptation knowledge depends on the respective workflow. Consequently, compared to the best-matching workflow $W_1$, an initially less matching workflow $W_i$ may have a higher query fulfillment after adaptation, i.e., $QF(q, a(q, W_i)) > QF(q, a(q, W_1))$, if more appropriate adaptation knowledge is available for this workflow. Selecting the best-matching workflow $W_1$ for the adaptation process can thus result in a non-optimal workflow solution.

As an example, a lasagna recipe can hardly be prepared without an oven. If the query specifies that a no-bake pasta dish is desired, the lasagna recipe should consequently not be selected (even though all other requirements and restrictions are matching), since a pasta recipe that is already prepared without an oven could more likely be adapted to the particular query specifications.
Retrieval must thus consider the adaptability of the workflows to ensure that the workflow modeling assistance provides the best possible workflow solution. The POQL query language (see Chap. 4) already considers generalized labels, thereby measuring the adaptability of the stored workflow models with regard to adaptation by generalization and specialization (see Sect. 5.1). For the remaining approaches, however, the adaptability is basically unknown until the adaptation has actually been performed. Hence, each workflow in the repository would have to be adapted according to the particular query in order to determine the best possible workflow solution (see Fig. 6.16). This is computationally not feasible, especially for large workflow repositories. Furthermore, a pre-computation is virtually impossible due to the infinite number of possible workflow solutions. Hence, this section presents an adaptation-guided retrieval approach, which assesses the adaptability of the workflows heuristically. Based on this, the workflows are ranked in such a manner that workflows at higher positions more likely match the query after adaptation. By selecting the highest-ranked workflow for the subsequent adaptation process, better matching workflow solutions can then be provided, since the adaptability of the workflows is considered.

The contents of this section originate from a student research project [271] and the resulting publication by Bergmann, Müller, Zeyen, and Manderscheid [25] (© of original publication by AAAI Press 2016). In the remainder of this section, the general approach to adaptation-guided retrieval will be presented and various methods to estimate the adaptability of workflows will be discussed. Finally, an experimental evaluation of the presented approaches is described.
6.2.1 Adaptation-Guided Retrieval

In Case-Based Reasoning, retrieval aims at assessing the utility of the cases in order to determine the most useful case for a particular query (see Sect. 2.4.4). The utility is, however, not known a priori, since it can only be determined after the case has been applied to a particular problem (see, e.g., [16, p. 94]). Thus, the basic assumption of CBR is that the most similar case contains the most useful solution. CBR systems incorporating adaptation capabilities, however, require another retrieval approach (e.g., [128, 233]). The utility no longer depends on the case itself but is based on its adaptation capabilities. For example, a case with a low similarity that can be fully adapted to a desired solution is highly useful. Thus, the retrieval stage needs to consider adaptability in order to better assess the final utility.
This also applies to the presented workflow modeling assistance, in which the retrieval is based on the assumption that better matching workflows are also more useful. Thus, the POQL query fulfillment measure (see Chap. 4) approximates the utility of a workflow $W$ with regard to the fulfillment of a given query $q$, i.e., $utility(q, W) \approx QF(q, W)$. However, with the ability to adapt workflows automatically, this approximation becomes inappropriate. Instead, the most useful workflow is the best-matching workflow after adaptation, i.e., $utility(q, W) \approx QF(q, a(q, W))$, which requires another utility assessment. Consequently, the adaptation-guided retrieval for workflows is implemented by a new function $QF_{adapt}(q, W) \to [0, 1]$, which considers the adaptability of the workflows in addition to the query fulfillment. More precisely, the workflows are ranked according to the current query fulfillment and the gain that can be achieved by adaptation (see Formula 6.6).

$$QF_{adapt}(q, W) = QF(q, W) + (1 - QF(q, W)) \cdot adaptability(q, W) \qquad (6.6)$$

Here, $1 - QF(q, W)$ denotes the potential increase in query fulfillment that can be achieved by adaptation. More precisely, if the fulfillment of the query $QF(q, W) \in [0, 1]$ is 1, adaptation cannot increase this value any further, while a lower query fulfillment could possibly be compensated by adaptation. The particular increase that can be achieved is determined by a query-specific adaptability (QSA) of a workflow $W$, i.e., $adaptability(q, W) \to [0, 1]$. An adaptability value of 1 denotes that the workflow can be completely modified such that it matches the given query $q$, while an adaptability value of 0 denotes that no further adaptations can be performed that increase the fulfillment of the query. Overall, the new function estimates the query fulfillment of a workflow $W$ after adaptation, i.e., $QF_{adapt}(q, W) = QF(q, a(q, W))$.

Under the simplified assumption that the adaptability of the workflow is mostly independent of the query, the adaptation-guided query fulfillment function is specified by a global adaptability (GA) for a particular workflow $W$, i.e., $adaptability(W) \in [0, 1]$ (see Formula 6.7). A global adaptability value of 1 denotes that a workflow can be adapted to any query, while a value of 0 specifies that the workflow cannot be adapted at all.

$$QF_{adapt}(q, W) = QF(q, W) + (1 - QF(q, W)) \cdot adaptability(W) \qquad (6.7)$$
Both the global adaptability and the query-specific adaptability function can be used to implement the adaptation-guided retrieval approach, as will be shown in the next section. Adaptation-guided retrieval then ranks the workflows such that their adaptability is considered during retrieval.
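The ranking scheme itself is a one-line combination of Formulas 6.6 and 6.7. The sketch below illustrates it; the parameters qf_fn and adaptability_fn stand for the POQL query fulfillment and for one of the adaptability estimates described in the following section, and are assumptions of this example.

```python
def qf_adapt(qf, adaptability):
    """Formulas 6.6/6.7: query fulfillment plus the adaptable remainder."""
    return qf + (1.0 - qf) * adaptability

def adaptation_guided_ranking(query, repository, qf_fn, adaptability_fn):
    """Rank workflows by their estimated fulfillment after adaptation.

    qf_fn(q, w) and adaptability_fn(q, w) are assumed to be supplied by
    the surrounding system.
    """
    scored = [(qf_adapt(qf_fn(query, w), adaptability_fn(query, w)), w)
              for w in repository]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [w for _, w in scored]
```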
6.2.2 Workflow Adaptability Assessment

The precise determination of the adaptability for all workflows stored in the repository and all potential queries is computationally not feasible. Hence, the adaptability of the workflows needs to be determined heuristically. The basic idea is to perform several training adaptations prior to retrieval in order to estimate the adaptability of the stored workflows (see Fig. 6.17). The precomputed adaptability values can then be reused during adaptation-guided retrieval. A particular advantage of this adaptability assessment is that it is independent of the underlying adaptation method. Thus, no adaptation-specific implementations are required. Furthermore, as adaptability is derived from experience on prior adaptations [128], it can automatically be refined when new adaptation knowledge is stored, i.e., whenever new workflow cases are added to the workflow repository.

[Figure 6.17: Adaptability assessment based on [25]]
The adaptability assessment is now described in more detail. In order to derive appropriate adaptability values for the adaptation-guided retrieval, the adaptability assessment performs several representative adaptations for the stored workflows. The sampling of representative adaptations is based on
the assumption that transformations from one workflow to another workflow represent common adaptation scenarios. Consequently, each stored workflow $W_i \in RP$ is basically adapted to each other workflow $W_j \in RP$, $i \neq j$, in the repository. The query describing such an adaptation can easily be constructed by determining the differences between the two workflows $W_i$ and $W_j$. Here, only POQL-Lite queries (see Sect. 4.4) are considered, since they represent more general adaptation scenarios that more likely reflect the general adaptability of the workflow. Thus, the query is constructed based on the workflow nodes contained in $W_i$ and $W_j$ respectively. More precisely, all nodes in the workflow $W_i$ that are not contained in the workflow $W_j$ regarding the semantic labels are determined as undesired nodes, while those contained in workflow $W_j$ but not in workflow $W_i$ are specified as desired nodes. The corresponding query specifying the transformation of the workflow $W_i$ into the workflow $W_j$ is hereinafter denoted as $q_{ij}$.

[Figure 6.18: Difference-based query construction to transform workflow $W_i$ into $W_j$]
An example of this query construction is depicted in Figure 6.18. Here, the two illustrated workflows $W_i$ and $W_j$ produce two different seafood sauces. While $W_i$ creates a sauce by combining chopped salmon with mayonnaise and dill, $W_j$ uses shrimps that are peeled and combined with mayonnaise only. The query $q_{ij}$ to transform $W_i$ into $W_j$ consequently specifies the
ingredients salmon and dill as well as the preparation step chop as undesired ($q_{ij}^-$), since they are only contained in the workflow $W_i$. In contrast, the ingredient shrimps and the preparation step peel are only contained in workflow $W_j$ and are thus specified as desired nodes ($q_{ij}^+$).

For each pair of workflows in the repository, i.e., $(W_i, W_j): W_i \in RP \wedge W_j \in RP, i \neq j$, such a query $q_{ij}$ is determined to perform the respective training adaptations. The adaptation then transforms the workflow $W_i$ into an adapted workflow $a(q_{ij}, W_i)$ regarding query $q_{ij}$. Based on this, the query-specific adaptability of the particular adaptation scenario (see Formula 6.8) can be determined by considering the gain in query fulfillment that was achieved by adaptation, i.e., $QF(q_{ij}, a(q_{ij}, W_i)) - QF(q_{ij}, W_i)$. If the exact same query occurs again in the future, the determined query-specific adaptability can be reused for this particular workflow during adaptation-guided retrieval.
$$adaptability(W_i, q_{ij}) = \frac{QF(q_{ij}, a(q_{ij}, W_i)) - QF(q_{ij}, W_i)}{1 - QF(q_{ij}, W_i)} \qquad (6.8)$$
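The following sketch combines the difference-based query construction with Formula 6.8. It assumes that workflows expose their semantic node labels as a set and that the adaptation and query fulfillment functions are supplied by the surrounding system; all names are illustrative.

```python
def difference_query(w_i, w_j):
    """Construct the POQL-Lite query q_ij describing the transformation
    of W_i into W_j: labels only in W_j are desired, labels only in W_i
    are undesired. Workflows are assumed to carry a set 'labels'."""
    return {'desired': w_j['labels'] - w_i['labels'],
            'undesired': w_i['labels'] - w_j['labels']}

def query_specific_adaptability(q_ij, w_i, adapt_fn, qf_fn):
    """Formula 6.8: the achieved gain relative to the possible gain."""
    before = qf_fn(q_ij, w_i)
    if before >= 1.0:
        return 0.0  # added guard: a fulfilled query leaves no room for gain
    after = qf_fn(q_ij, adapt_fn(q_ij, w_i))
    return (after - before) / (1.0 - before)
```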
The query-specific adaptability values resulting from the training adaptations are then stored for each workflow in the repository. Based on these values, global or query-specific adaptability values for other adaptation scenarios can also be derived, as will be shown in the next sections.

Global Adaptability

The global adaptability approach is based on the assumption that the adaptability of a workflow is independent of the particular query. Consequently, a workflow that is adaptable in many training scenarios is considered to be highly adaptable in general. The global adaptability of a workflow, $adaptability(W) \to [0, 1]$, is thus computed as the mean of the previously determined query-specific adaptability values (see Formula 6.9).
$$adaptability(W_i) = \frac{\sum_{j=1, j \neq i}^{|RP|} adaptability(W_i, q_{ij})}{|RP| - 1} \qquad (6.9)$$
This global adaptability value is then annotated to the workflow. Thus, during adaptation-guided retrieval, the corresponding adaptability is immediately available to compute the particular fulfillment $QF_{adapt}(q, W)$ (see Sect. 6.2.1) for any query $q$.
Query-Specific Adaptability

Assuming that the query has a larger impact on the adaptability, a single global adaptability value is insufficient. The adaptability then varies widely depending on the particular adaptation scenario. This assumption is supported by Smyth and Keane [235], who argue that adaptability usually depends on the problem-solving context, i.e., the case and the desired solution. This requires that a query-specific adaptability is considered during the adaptation-guided retrieval (see Sect. 6.2.1). As a consequence, a more precise adaptability assessment can be achieved.

Table 6.2: Adaptability table for workflow $W_i$, based on [25]

goal workflow    query-specific adaptability
$W_1$            $adaptability(W_i, q_{i1})$
...              ...
$W_n$            $adaptability(W_i, q_{in})$
The query-specific approach is based on an adaptability table, constructed for each workflow Wi ∈ RP in the repository (see Table 6.2). This table stores the adaptability values determined from the training adaptations as well as a reference to the corresponding workflow Wj used to construct the respective adaptation scenario (hereinafter referred to as goal workflow). Based on these adaptability tables, query-specific adaptabilities can also be determined for new queries. For this purpose, the most appropriate adaptability value for a workflow W is selected from the respective table. Since the query specifies the goal of the adaptation, the example adaptation scenario with the most similar adaptation goal presumably reflects the most accurate adaptability value. Hence, for the query q, the best-matching goal workflow from the adaptability table is identified. The corresponding adaptability value is then considered during adaptation-guided retrieval in order to compute the query fulfillment QFadapt(q, W) (see Sect. 6.2.1). To determine this adaptability-oriented query fulfillment, the traditional query fulfillment QF(q, W) of all workflows W ∈ RP in the repository is also computed (see Sect. 6.2.1). Thus, the workflows in the repository are basically ranked according to the query fulfillment. Consequently, for deriving the required adaptability value adaptability(q, W), the best-matching goal workflow (also contained in the repository) can be promptly determined
without any additional computations.4

4 This is, however, not considered in the corresponding publication [25], which resulted in an increased retrieval time.
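The lookup of the best-matching goal workflow could be realized roughly as follows; it reuses the QF values that are computed for the ranking anyway (a sketch with illustrative names).

    def adaptability_for_query(q, i, tables, qf_values):
        """Select the adaptability value of workflow W_i whose goal
        workflow best matches the query q. qf_values[j] = QF(q, W_j) is
        assumed to be available from the ranking of the repository."""
        best_goal = max(tables[i], key=lambda j: qf_values[j])
        return tables[i][best_goal]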
6.2.3 Evaluation

The approach to adaptation-guided retrieval of workflows was investigated in an experimental evaluation [25], which is summarized below. It was studied whether better adaptable workflows can be selected (Hypothesis H1) such that the final query fulfillment is higher after adaptation (Hypothesis H2) compared to the standard approach.

H1. Adaptation-guided retrieval selects better adaptable workflows compared to the standard approach.

H2. Adaptation-guided retrieval results in a higher query fulfillment after adaptation compared to the standard approach.

In order to investigate these hypotheses, the adaptation principle of compositional adaptation by workflow streams5 was applied (see Sect. 5.2).

5 The compositional adaptation approach was chosen for this evaluation, since it facilitates the validation and the assessment of the evaluation results (i.e., the manual determination of workflow adaptability and potential adaptations) compared to the operator-based or combined adaptation.

The workflow repository used in the evaluation consists of 58 pasta recipes manually modeled based on real cooking recipes. The evaluation considered two different conditions on the workflow repository. Condition A uses all 58 pasta recipes for learning adaptation knowledge, while in condition B 20 randomly selected workflows were marked as not adaptable and were not used to learn adaptation knowledge. Thus, condition A results in a large amount of adaptation knowledge and in highly adaptable workflows. In contrast, less adaptation knowledge exists under condition B and the adaptability of the workflows differs significantly. For both conditions, adaptation knowledge (i.e., workflow streams) was then extracted automatically.

Next, 58 POQL-Lite queries were automatically generated. More precisely, for each workflow W in the repository, the most similar workflow W′ was selected. The nodes that are only contained in workflow W were considered as desired nodes and the nodes that only occur in workflow W′ were considered as undesired nodes of the query. Based on these generated queries, a leave-one-out experiment was performed, i.e., the particular workflow that generated the query was temporarily removed from the workflow repository. The adaptation knowledge, however, is not changed by removing the workflow, which
prevents potential effects on the evaluation caused by varying adaptation knowledge depending on the query. Then, the estimated adaptability values were computed under both conditions (A + B). Subsequently, each query was used to validate the standard retrieval approach (QF) as well as the two adaptation-guided retrieval approaches using the global adaptability (GA) and the query-specific adaptability (QSA), respectively.

Table 6.3: Experimental results

             increase in query fulfillment    final query fulfillment
condition    QF      GA      QSA              QF      GA      QSA
A            0.030   0.076   0.149            0.834   0.850   0.850
B            0.013   0.063   0.123            0.817   0.826   0.832
The evaluation results under both conditions for the standard (QF) and the two adaptation-guided retrieval approaches (GA + QSA) are illustrated in Table 6.3. This table shows the average increase in query fulfillment that was achieved during adaptation (i.e., the actual adaptability) and the average final query fulfillment of the adapted workflows. In general, both adaptation-guided retrieval approaches result in a higher increase in query fulfillment during adaptation compared to the standard approach. This increase is statistically significant under both conditions, as determined by a paired t-test (p < 0.005). Thus, better adaptable workflows are selected by adaptation-guided retrieval, which clearly confirms Hypothesis H1. Furthermore, the results show that the query-specific adaptability approach (QSA) outperforms the global adaptability approach (GA) in this respect.

Moreover, the final query fulfillment was analyzed. Under both conditions, the adaptation-guided retrieval approaches achieve a higher query fulfillment compared to the standard approach. These findings are statistically significant under condition A (p < 0.05). Under condition B, the improvement of the adaptation-guided approaches is weaker but statistically significant for the query-specific adaptability approach. Overall, the increase in query fulfillment is small but statistically significant in most cases. Thus, Hypothesis H2 is mostly confirmed.

Figure 6.19 illustrates the increase in query fulfillment of both adaptation-guided retrieval approaches (GA + QSA) compared to the standard approach (QF) for condition A. While for most queries an increase in query fulfillment can be measured, the opposite effect can also be observed in some cases (e.g., Q29). Investigations showed that the assessed adaptability of the
workflows selected in these cases is high, but the actual adaptability is significantly lower. This over-estimation of the adaptability thus results in an inappropriate workflow selection in some cases. In order to prevent such scenarios, it is conceivable to employ an ensemble approach in which the standard as well as the adaptation-guided retrieval approach are performed and the better-matching workflow is finally selected. This would increase the computational effort but could significantly increase the resulting query fulfillment.
[Bar chart: per-query differences GA − QF and QSA − QF for queries Q1–Q58 and their average]
Figure 6.19: Increase in query fulfillment compared to standard approach (taken from [25])
Overall, the experiments have shown that the presented adaptation-guided retrieval can provide more suitable workflow solutions, i.e., workflows with a higher query fulfillment. The query-specific adaptability assessment is more precise than the global adaptability, resulting in better-matching workflows. However, the general increase in query fulfillment is less than expected, which partially results from an over-estimation of the adaptability. Based on these findings, investigations showed that an ensemble approach combining standard and adaptation-guided retrieval could further increase the query fulfillment.
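One possible realization of the suggested ensemble, under the same assumptions as in the sketches above: both retrieval variants are executed, both candidates are adapted, and the workflow with the higher final query fulfillment is kept, at the price of roughly doubled retrieval and adaptation effort.

    def ensemble_construct(q, repository, retrieve_standard,
                           retrieve_adaptation_guided, adapt,
                           query_fulfillment):
        """Run standard and adaptation-guided retrieval, adapt both
        candidate workflows, and return the better-matching result."""
        candidates = [retrieve_standard(q, repository),
                      retrieve_adaptation_guided(q, repository)]
        adapted = [adapt(q, w) for w in candidates]
        return max(adapted, key=lambda w: query_fulfillment(q, w))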
6.2.4 Conclusions and Related Work

This section illustrated that workflow modeling assistance using traditional retrieval based on query fulfillment cannot guarantee that the optimal workflow with regard to the query is constructed. Instead, the adaptability of the workflows must be considered during retrieval to provide better-matching workflow solutions. An approach for workflow adaptability assessment has been presented that is independent of the specific implementation of the adaptation process. More precisely, several training adaptations are executed to learn representative adaptability values prior to retrieval. Based on the determined adaptability values, adaptation-guided retrieval can be
performed considering query-specific or global workflow adaptabilities. An experimental evaluation demonstrated that the adaptation-guided retrieval can result in workflows with a higher query fulfillment. In general, the accuracy of the adaptation-guided retrieval could be continuously improved by recomputing the adaptability assessment whenever new workflows are added to the repository (resulting in more training scenarios).

Several approaches related to adaptation-guided retrieval have been presented in CBR research. Smyth and Keane [233], for example, analyze the available adaptation knowledge to consider the adaptability during retrieval. Another approach, introduced by Stahl [236], discusses the training of similarity measures such that they reflect the adaptability of the cases. Similar to the presented approach, Leake et al. [128] estimate the adaptability of the cases based on previous adaptation scenarios in the domain of Case-Based Planning. In general, the learned adaptability values represent an indicator for the coverage of the workflows [234], i.e., the set of possible workflow solutions that can be provided by a single workflow. Thus, the assessed adaptability values could also be considered during workflow retainment (see Sect. 2.4.6).
6.3 Complexity-Aware Workflow Construction

The presented workflow modeling assistance so far only considers the query fulfillment as the optimization criterion for the construction of a workflow. More precisely, during workflow retrieval, the best-matching workflow (i.e., the workflow with the highest query fulfillment) is selected and subsequently adapted such that the resulting workflow maximizes the query fulfillment. Consequently, workflows can be generated such that desired and undesired workflow components are considered. However, in certain application scenarios, the consideration of additional criteria is required. In particular, the workflow complexity can be important in many respects. Complex workflow models are usually characterized by a low understandability, which complicates the verification and maintenance of the modeled process. As a consequence, the error-proneness is increased [43, 145]. Hence, a reduced complexity can contribute to the appropriate enactment of the workflow. Furthermore, the associated increase in understandability facilitates access to workflow technology for novice workflow modelers.

For assessing the complexity of workflow models, several measures have been proposed in the business process literature. These measures mostly focus on properties of the model such as the number of workflow elements or the connectivity of workflow elements with regard to the data- or control-flow.
In addition to these traditional measures, complexity can also refer to the enactment of workflows. For example, workflows with long execution times or workflows containing demanding tasks can indicate a high complexity of the modeled process with regard to its execution. Consequently, it can be assumed that the complexity of a workflow as a whole cannot be measured by a single criterion but is rather composed of several indicators.

The reduction of workflow complexity is not only important for business processes but can be highly relevant in various domains. In cooking, for example, a low complexity is desirable if the user requests an easy-to-prepare cooking recipe, i.e., a recipe with a short preparation time, a low number of ingredients, or with low requirements on the cooking skills. Similar scenarios are conceivable in many domains.

Consequently, this section presents an approach to complexity-aware workflow construction that was published by Müller and Bergmann [174] (© of original publication by Springer 2017). First, existing literature on assessing the complexity of processes is discussed in more detail. Subsequently, a domain-independent complexity measure for workflows is introduced that considers various complexity criteria. This measure is then integrated into the workflow modeling assistance such that workflows with a low complexity can be generated. The section finally concludes with the presentation of an experimental evaluation. The corresponding results demonstrate that the approach reduces the complexity of the constructed workflows and further indicate that the approach could be applied to optimize various other criteria in general.
6.3.1 Process Complexity

Process quality is widely discussed particularly in the business process literature (see Sect. 2.2.5). An important aspect of quality that is frequently a subject of discussion is the complexity of process models. The correlation between process complexity and the understandability of the process model is emphasized by several authors (e.g., [198, 148]). Due to this importance, various measures for the automated assessment of process complexity have been proposed in the literature. These measures are, for instance, inspired by established measures in software development [42, 41, 122] or graph complexity measures such as the coefficient of network complexity or the complexity index [121]. Cardoso et al. [43], for example, consider the number of workflow elements (e.g., number of tasks) or analyze the structure of the data- or control-flow to assess the workflow complexity. An alternative approach for estimating the complexity is the cross-connectivity metric of Vanderfeesten
et al. [255], which measures the connection strength between process elements. Also for process models defined as Petri nets, several complexity metrics have been presented [120]. While these traditional complexity measures usually assess the complexity of the process model from a conceptual perspective (see Sect. 2.2.5), the implications of workflow execution may also indicate a certain kind of complexity. For example, time-consuming workflows or workflows containing tasks with high skill demands on the involved users commonly characterize a rather complex process.
6.3.2 Complexity Measure

Since the business process literature proposes a diversity of complexity measures focusing on different aspects of the workflow, it can be assumed that a more holistic view of complexity is composed of several criteria. Based on this assumption, a new domain-independent complexity measure complexity(W) → [0, 1] for a workflow W = (N, E) is now introduced, covering five different complexity criteria (see Table 6.4). Each of the single complexity criteria indicates a complexity in the range [0, 1], respectively. The single criteria are then aggregated to an overall workflow complexity complexity(W) ∈ [0, 1] that assigns a high value to complex workflows and a lower value to less complex workflows.

Table 6.4: Complexity criteria

criteria description               criteria measure
number of data nodes               |N^D| / max{|N_1^D|, ..., |N_n^D|}
number of control-flow elements    |N^T ∪ N^C| / max{|N_1^T ∪ N_1^C|, ..., |N_n^T ∪ N_n^C|}
complexity of data-flow            1 − (2 · |N^T|) / |E^D|
complexity of tasks                (Σ_{t ∈ N^T} taskComplexity(t)) / |N^T|
lead time                          leadTime(W) / max{leadTime(W_1), ..., leadTime(W_n)}
The single complexity criteria of the overall workflow complexity are now introduced (see Table 6.4). In general, larger workflows containing more data nodes or control-flow elements tend to be more complex. Thus, the workflow complexity measure considers the number of data nodes as well as the number of control-flow elements (task and control-flow nodes) as the first
two basic complexity criteria. These criteria are normalized by the highest number of data nodes or control-flow elements occurring in any workflow from the repository. The third criterion is the complexity of the data-flow, which is determined by the average number of data nodes consumed or produced (i.e., the data-flow edges E^D) by the tasks N^T in the workflow.6 Consequently, this criterion indicates a high complexity if the tasks in the workflow consume and produce a great number of data nodes.

6 Please note that each task in a consistent workflow consumes and produces at least one data node, respectively.

While the complexity of the tasks is affected by the number of linked data nodes, the task itself may also be more or less complex. Thus, the next criterion determines the complexity of the tasks by considering the required skill level of the users for executing the tasks in the workflow [131]. Therefore, task complexity values taskComplexity(t) ∈ [0, 1] are annotated to each task in the corresponding taxonomy (see Sect. 3.3). The average skill level of the tasks contained in the workflow is then computed to measure the task complexity as the fourth criterion.

Finally, the overall execution time, also referred to as the lead time [105] (see Sect. 2.2.5), is considered as an indicator of the workflow complexity. The lead time is computed from the single throughput times [105] (see Sect. 2.2.5) of the tasks in the workflow, i.e., the elapsed time until a task is completed after the previous task was finished. The approximated throughput times throughputTime(t) ∈ N are again annotated to the tasks in the corresponding taxonomy (see Sect. 3.3). For measuring the overall lead time of the workflow, the path with the highest overall throughput time from the first to the last workflow activity is considered, i.e., maxPath(W) ⊆ N^T. More precisely, the single throughput times of this path are aggregated to determine the lead time of the workflow as the last criterion, i.e., leadTime(W) = Σ_{t ∈ maxPath(W)} throughputTime(t). The computed lead time is then normalized by the highest occurring value in any workflow from the repository.

The overall workflow complexity measure complexity(W) for a workflow W is finally determined by the arithmetic mean of these five complexity criteria.
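The aggregation of the five criteria can be summarized in the following sketch; the workflow attributes and annotation functions are hypothetical placeholders for the respective CAKE structures, and the normalization maxima are assumed to be precomputed over the repository.

    def complexity(w, repo_max, task_complexity, throughput_time, max_path):
        """Overall workflow complexity in [0, 1] (Table 6.4)."""
        # (1) number of data nodes, normalized over the repository
        c1 = len(w.data_nodes) / repo_max["data_nodes"]
        # (2) number of control-flow elements (task and control-flow nodes)
        c2 = (len(w.task_nodes) + len(w.control_nodes)) / repo_max["cf_elements"]
        # (3) data-flow complexity: each task consumes and produces at
        #     least one data node, hence |E^D| >= 2*|N^T| and c3 is in [0, 1]
        c3 = 1.0 - (2 * len(w.task_nodes)) / len(w.dataflow_edges)
        # (4) average required skill level of the tasks
        c4 = sum(task_complexity(t) for t in w.task_nodes) / len(w.task_nodes)
        # (5) lead time along the path with the highest throughput time
        c5 = sum(throughput_time(t) for t in max_path(w)) / repo_max["lead_time"]
        return (c1 + c2 + c3 + c4 + c5) / 5.0  # arithmetic mean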
The single complexity criteria of the domain-independent measure are now exemplified in the cooking domain (see Sect. 3.1). Here, complexity refers to the difficulty of preparation. A workflow with a low complexity thus represents an easy-to-prepare recipe. In a cooking workflow, the first two criteria would basically consider the number of ingredients (number of data nodes) as well as the number of preparation steps (number of control-flow elements). The third criterion assesses the complexity of the preparation (complexity of data-flow). More precisely, a high complexity according to this criterion results if each preparation step involves a large number of ingredients. For example, a preparation step consuming only two ingredients is usually less complex than the same task with five ingredients consumed. The previously introduced task complexity here refers to the complexity of the preparation steps. As an example, the preparation step devein is more complex than the rather simple preparation step mix. Finally, the last criterion (lead time) specifies the preparation time. By using this criterion, it can be taken into account that recipes containing time-consuming preparation steps such as bake are usually more complex than quick recipes that, for example, merely mix ingredients.
6.3.3 Complexity-Aware Query Fulfillment

The presented complexity measure is integrated into the workflow modeling assistance by introducing a new criterion to guide the retrieval and adaptation process. This new criterion needs to incorporate the query fulfillment as well as the complexity of the workflow. Thus, a so-called complexity-aware query fulfillment QFcomplexity(q, W) → [0, 1] is introduced in Formula 6.10. In order to control the weighting between query fulfillment and complexity, a parameter α ∈ [0, 1] can be specified.

QFcomplexity(q, W) = α · QF(q, W) + (1 − α) · (1 − complexity(W))    (6.10)

The new criterion then replaces the current query fulfillment measure QF(q, W) of the workflow modeling assistance. Consequently, during retrieval the workflow from the repository with the highest value QFcomplexity(q, W) is selected and subsequently adapted according to this new criterion. Thus, retrieval as well as adaptation become complexity-aware. As a result, workflows should be constructed such that they are characterized by a low complexity and a high query fulfillment. Please note that the complexity-aware query fulfillment specifies a multi-objective optimization problem. Thus, the workflow modeling assistance cannot guarantee to maximize both criteria at the same time. However, the new complexity-aware query fulfillment is highly flexible such that basically any complexity measure could be considered. Furthermore, it is conceivable to integrate various other criteria into the workflow modeling assistance in the same manner.
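As a minimal sketch, the combined criterion is a single weighted sum; with α = 1 it degenerates to the standard query fulfillment, while α = 0.5 (as used in the evaluation below) weights both criteria equally.

    def qf_complexity(q, w, query_fulfillment, complexity, alpha=0.5):
        """Complexity-aware query fulfillment (Formula 6.10)."""
        return (alpha * query_fulfillment(q, w)
                + (1.0 - alpha) * (1.0 - complexity(w)))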
6.3.4 Experimental Evaluation

This section presents an experimental evaluation [174], which was performed in order to validate the proposed complexity-aware workflow construction. The evaluation investigated whether the complexity of the constructed workflows can be reduced (Hypothesis H1). Moreover, the impact on the query fulfillment was analyzed, assuming that the complexity-aware approach still constructs workflows with a high query fulfillment (Hypothesis H2).

H1. The complexity-aware approach constructs workflows with a significantly lower complexity compared to the standard approach.

H2. The complexity-aware approach constructs workflows with a high query fulfillment.

For this evaluation, a workflow repository with 61 sandwich recipes was manually constructed based on real cooking recipes. The cooking workflows in the repository only contained a single sequence of tasks, i.e., they did not contain any control-flow nodes. Based on the WikiTaaable ontology, taxonomies of preparation steps and ingredients were constructed and annotated with similarity values (see Sect. 3.5). Additionally, the taxonomy of preparation steps was annotated with the approximated throughput times and task complexity values (see Sect. 6.3.2) that are required for the estimation of the workflow complexity.

Furthermore, 61 POQL-Lite queries (see Sect. 4.4) were determined automatically by analyzing pairs of workflows in the repository. More precisely, for each workflow W ∈ RP in the repository, the most similar workflow W′ ∈ RP was selected. Based on the difference between these two workflows, a query was constructed automatically such that the workflow nodes that are only contained in workflow W are defined as desired and the workflow nodes that only occur in workflow W′ are defined as undesired. The size of the resulting query was limited to a maximum of 4 ingredients and 2 preparation steps in the desired or undesired part of the query, respectively.

To evaluate the hypotheses, several leave-one-out experiments were performed, i.e., the workflow W used to construct the respective query was temporarily removed from the repository. For each of the 61 queries, the standard as well as the complexity-aware workflow construction was executed, which consider the query fulfillment or the new complexity-aware query fulfillment (with parameter α = 0.5) as the respective optimization criterion.
Table 6.5: Evaluation results: average values over all queries
                              query fulfillment   complexity   combined
standard retrieval            0.83                0.43         0.70
standard adaptation           0.92                0.48         0.72
complexity-aware retrieval    0.75                0.28         0.74
complexity-aware adaptation   0.87                0.29         0.79
The evaluation results illustrated in Table 6.5 show the average values of the query fulfillment, the complexity, and the combined measure QFcomplexity(q, W) for both approaches after retrieval and after adaptation. In general, the complexity increases for both approaches during adaptation in favor of query fulfillment. However, the complexity-aware approach selects a workflow with a lower complexity during retrieval as the starting point for adaptation. As a result, the complexity-aware approach produces workflows with an average complexity of 0.29, while the standard approach results in workflows with an average complexity of 0.48. Thus, the workflows constructed by the complexity-aware approach are characterized by a significantly lower complexity (−40%), which confirms Hypothesis H1.

Regarding the query fulfillment, a significant increase can be achieved by adaptation for both approaches. In the standard approach, the query fulfillment is increased from 0.83 to 0.92 and for the complexity-aware approach from 0.75 to 0.87. Thus, both approaches construct workflows with a high query fulfillment, which confirms Hypothesis H2. Furthermore, these results show that the complexity-aware approach only slightly reduces the query fulfillment (−5%, 0.87 in relation to 0.92) compared to the standard approach.

Summarizing, the complexity-aware approach constructs workflows with a high query fulfillment but a significantly reduced complexity. Overall, these results clearly confirm the evaluation hypotheses and the feasibility of the approach. Most importantly, the evaluation demonstrated that the workflow modeling assistance is highly flexible and can be used to construct workflows according to a multi-dimensional complexity criterion and the query fulfillment at the same time. This strongly indicates that other complexity measures or additional criteria in general could be integrated.
6.3.5 Conclusions

This section introduced a complexity-aware criterion for the presented workflow modeling assistance. By using this criterion during retrieval and adaptation, the generated workflows are optimized with regard to query fulfillment as well as workflow complexity. An evaluation demonstrated the feasibility of the approach, showing that well-matching workflows with a low complexity can be generated. Furthermore, the results indicate that various other criteria could be considered by the workflow modeling assistance in general.
6.4 Summary

This chapter introduced several approaches to enhance the presented workflow modeling assistance. First, the completion of missing data-flow information of workflows was addressed in Section 6.1. By means of automatically learned completion operators and a set of default completion rules, more complete workflow models are constructed, which improves the solution provided by the workflow modeling assistance. Next, the separation of the retrieval and the adaptation stage was discussed in Section 6.2. A workflow with an initially lower query fulfillment may be better adaptable according to a given query. Selecting such a workflow during retrieval would result in a higher query fulfillment after adaptation. Thus, two different retrieval approaches were presented that consider the adaptability of workflows. Finally, this chapter discussed the construction of workflows with regard to workflow complexity (see Sect. 6.3). A complexity-aware query fulfillment that covers various aspects of workflow complexity is integrated into the retrieval and adaptation process of the workflow modeling assistance. As a result, workflows with a low complexity can be generated.
7 Workflow Modeling Assistance Architecture

In this chapter, the implementation of the workflow modeling assistance will be described, which includes all the previously discussed approaches within a single system. The architecture is based on the Collaborative Agile Knowledge Engine CAKE, which will be described in the next section. Then, the architecture of the implemented workflow modeling assistance will be illustrated in more detail. Furthermore, a prototype system in the cooking domain will be presented, which creates individual sandwich recipes according to desired and undesired preparation steps or ingredients. Finally, the idea of an interactive workflow modeling interface will be discussed, which provides modeling suggestions to the user during the actual modeling of a workflow.
7.1 Collaborative Agile Knowledge Engine

The intelligent workflow modeling assistance is implemented within the Collaborative Agile Knowledge Engine CAKE [19]. Most basically, CAKE integrates process management and knowledge management in a single system. More precisely, CAKE combines a workflow management system with an artificial intelligence component implemented by a CBR system. CAKE is a framework that has been developed and improved at the University of Trier throughout various research projects and theses.1

1 http://cake.wi2.uni-trier.de

The foundations for the CBR system were laid in 2004 by the AMIRA project, which focused on providing diagnostics and decision support for mobile workers such as firefighters in urgent and critical situations. By that time, a process-oriented perspective had been considered within the APPEAR project, which focused on the analysis of medical treatment processes of patients. In 2006, the URANUS project finally addressed the construction of an agile workflow management system, for example, to support flexible workflow management of chip design processes. The WEDA project focused on the
implementation of a cloud-based workflow management system in which REST-based web applications can easily be included and triggered during the execution of workflows. This thesis resulted from the EVER project (see Sect. 1.4), which dealt with the automated extraction of workflows from textual process descriptions and the reasoning with these workflows by means of PO-CBR methods. So far, CAKE has been applied in numerous domains such as construction processes, processes in daily life, and also cooking recipes [19].

Recent research projects will extend the work related to the CAKE framework. More precisely, the SEMAFLEX project [86] aims at supporting the execution of highly flexible workflows (see Sect. 1, flexibility by deviation) and integrates knowledge-based document management. The eXplore! project includes the application of the workflow paradigm to support researchers in the field of digital humanities during their daily research activities. The broad range of application fields for CAKE and the necessity to model workflows within most of these domains again highlights the importance of a workflow modeling assistance as a supporting component within the CAKE framework.
Figure 7.1: CAKE architecture (taken from [19])
The architecture of the CAKE framework is illustrated in Figure 7.1. Basically, the CAKE framework is implemented as Software as a Service and consequently consists of a client and a server component as well as several databases. The client component of CAKE provides browser-based system access and also a worklist app for Android devices [19]. The browser-based client offers the ability to collaboratively model, execute, and adapt workflows in real-time [81]. Furthermore, the client provides access to the worklist of the particular user and to the administration of the users of the workflow management system. The communication between the user clients and the CAKE server is organized by the CAKE interface layer via the server API.

The CAKE agile workflow engine manages the execution of workflows and organizes the performance of the involved activities, which are either assigned to users via the worklist manager or executed automatically by web services. The architecture is based on the reference model of the Workflow Management Coalition (see Sect. 2.3.1). The workflow engine executes workflows represented in a block-oriented workflow language called CAKE Flow Cloud Notation (CFCN) [151], supporting sequence, parallel, exclusive, and cycle patterns (see Sect. 2.2.3). The agility of the workflow engine enables ad-hoc changes (see Sect. 2.3.2, flexibility by change) such that workflow instances can be dynamically modified during run-time. Either the entire workflow is suspended and manually adapted, or a stop sign [158] is placed within the workflow such that only subsequent tasks are suspended, which can then be modified. After adaptation, the workflow execution can be resumed.

The CAKE storage layer manages all data, resources, and user profiles within the system by accessing the databases. It further supports a workflow-specific control mechanism that enables the definition of access rights for user groups or users that are permitted to view, modify, or execute the respective workflows and related data resources [82].

The CAKE knowledge engine is based on a generic CBR system [141], which was recently integrated as a component of CAKE to support users of the workflow management system by means of PO-CBR methods. The knowledge engine utilizes a separate workflow repository (case base) consisting of semantic workflows (see Sect. 3.4) that is not managed by the storage layer. Instead, this repository only contains a specific subset of workflows, for example, revised best practices, that should be considered by the PO-CBR methods and that can further be shared among all users. These workflows are then made accessible via the retrieval engine using the semantic workflow similarity measure (see Sect. 3.5). Thus, for a partially
modeled workflow, relevant best practices can be identified. In this work, the retrieval engine was extended by the POQL query language (see Chap. 4), which additionally allows the specification of undesired workflow elements or undesired workflow fragments. Furthermore, an adaptation engine was implemented, which adapts workflows according to the desired specifications as described in Chapter 5. More details on the knowledge engine will be explained in the next section.

The CAKE acquisition layer acquires information for the CAKE knowledge engine. The workflow repository of the knowledge engine could be initialized, for example, by selecting appropriate best-practice workflows stored in the workflow management system or by automatic workflow extraction from textual process descriptions crawled from the Internet (see [222] and Sect. 2.6.2). Furthermore, required ontologies (see Sect. 3.3) and similarity configurations (see Sect. 3.5) can be defined and integrated via the acquisition layer.
7.2 Workflow Modeling Assistance Architecture

The workflow modeling assistance is implemented within the knowledge engine of the previously introduced CAKE framework (see Sect. 7.1), which is based on a generic CBR system [141]. The generic architecture enables a flexible application of the workflow modeling assistance to various workflow domains. In this section, the system configuration as well as the overall architecture of the workflow modeling assistance will be described.

[Diagram: each CBR knowledge container is mapped to an XML configuration file (CAKE representation) and a system handler — vocabulary → model.xml (CDM) → Domain Knowledge Manager; case base → casebase.xml (CDOP) → Repository Manager; similarity → sim.xml (CDSM) → Similarity Valuator/Retrieval Manager; adaptation → adaptationConfig.xml (CAM) → Adaptation Manager]
Figure 7.2: System configuration
The configuration of the system (see Fig. 7.2) is structured into the four CBR knowledge containers (see Sect. 2.4.2), i.e., the vocabulary, the case base, the similarity, and the adaptation knowledge container. The knowledge containers are specified via XML-based configuration files, which
enables a customization of the modeling assistance to the requirements in the particular application domain. The configuration files are read on system initialization. At run-time, each knowledge container of the CBR system is managed by a separate system handler based on the configuration settings.

[Diagram: excerpt of the CAKE data type hierarchy — Atomic (Boolean, Numeric (Double, Integer), String, Void), Collection (Set, List), Aggregate, DataObject, NESTGraph, and NESTGraphItem with NESTNode (NESTSequenceNode: NESTTaskNode, NESTControlflowNode; NESTDataNode) and NESTEdge (NESTControlflowEdge, NESTDataflowEdge)]
Figure 7.3: CAKE data types (in part, see [141][p. 149])
The vocabulary container defines the basic structure of the workflow models as well as the domain-specific knowledge that can be used to describe these workflows. The configuration file (model.xml) is based on the CAKE Data Model (CDM) [143][p. 148 ff], which comprises all data types handled by the system. The CDM offers a wide range of atomic values, for example, boolean, numeric, or string values (see Fig. 7.3). Furthermore, entire classes (aggregates) can be specified by a set of attributes containing atomic values or references to other classes. The CDM also comprises workflow edges (e.g., data-flow or control-flow edges) as well as workflow nodes (e.g., data or task nodes) that can be used to represent semantic workflows (also referred to as NESTGraphs2).

2 Semantic workflows are also referred to as NESTGraphs, since in their original definition (see [20]) they consist of Nodes, Edges, Semantic descriptions and different Types of workflow elements (e.g., data or task nodes).

The domain-specific knowledge that needs to be specified in the configuration file consists of a taxonomy of task labels and a taxonomy
of data labels (see Sect. 3.3), representing the labels that can be used to describe workflow tasks or workflow data nodes, respectively. In principle, the semantic workflows (see Sect. 3.4) could also be annotated with more complex information by use of an aggregate class.

The repository manager handles the case base, i.e., the workflow repository. All workflow models are stored within a single XML file (casebase.xml) representing the workflow repository. Depending on the particular domain, this repository could comprise the workflows accessible to all users (see CAKE architecture in Sect. 7.1) or a set of verified workflows such that only best-practice workflows are contained. The repository can be extended continuously, for example, in case a new workflow has been modeled or adapted. The repository manager internally stores the workflow repository in a so-called CAKE Data Object Pool (CDOP) [141][p. 20 ff].

The similarity container comprises the information required to compute similarities between workflows by means of the semantic workflow similarity measure (see Sect. 3.5). Thus, in the corresponding configuration file (sim.xml), the parameters of the semantic similarity measure can be defined. Most importantly, for the labels in the domain taxonomies, similarity values must be specified in order to define the similarity measure in the particular domain (see Sect. 3.5). This configuration file is based on the so-called CAKE Data Similarity Model (CDSM) [141][p. 24 ff]. A Similarity Valuator can then be accessed during run-time to compute the defined domain-specific workflow similarities. The Retrieval Manager calls this similarity computation to rank the workflows stored in the repository for a given POQL query according to their query fulfillment (see Chap. 4).

The adaptation knowledge container stores the adaptation algorithms as well as the adaptation knowledge itself. A new configuration file developed in this thesis (adaptationConfig.xml) determines the CAKE adaptation model (CAM), which specifies the adaptation process of the workflow modeling assistance. More precisely, in this configuration file, the adaptation algorithms (see Chap. 5) to be used during adaptation and their execution order are defined. Furthermore, the parameters of the respective adaptation algorithms can be specified. Hence, the workflow adaptation process can be individually adjusted to the particular domain and is not necessarily bound to the proposed adaptation process of Section 5.5.2. Moreover, the CAM specifies the location of the adaptation knowledge repository and allows configuring whether adaptation knowledge should be learned automatically when initializing the system. Finally, in this configuration file, an alternative adaptation criterion (or adaptation goal) may be specified. Instead of increasing the query fulfillment, for example, the reduction of the workflow
model's complexity could be such a criterion (see Sect. 6.3). Based on the configuration file, the Adaptation Manager performs the adaptation of workflows and also manages the adaptation knowledge repository.

A more detailed perspective of the overall CAKE architecture, focusing only on the components important for the workflow modeling assistance, is illustrated in Figure 7.4. Most basically, the architecture rests upon a client-server model in which the client-server communication is realized via a REST-based API.
[Diagram: the client (query editor, workflow editor) communicates with the server, which comprises the Domain Knowledge Manager (domain taxonomies), the Repository Manager (workflow repository), the Retrieval Manager (similarity valuator, retriever algorithms), and the Adaptation Manager (workflow validator/verifier, workflow modifier, adaptation algorithms, adaptation knowledge repository)]
Figure 7.4: Workflow modeling assistance architecture
The client interface is implemented as a browser-based application and consists of a query editor for defining POQL queries as well as a workflow editor, which enables the manual modification of a workflow model. The server component is structured according to the four CBR knowledge containers and consequently consists of a domain knowledge manager (vocabulary), a repository manager (case base), a retrieval manager (similarity), and an adaptation manager (adaptation). Essentially, the Domain Knowledge Manager maintains the domain taxonomies defining the hierarchical structure of the domain-specific workflow labels. All workflows maintained
by the system are stored in a workflow repository, which is controlled by the Repository Manager. The Retrieval Manager component is responsible for computing similarities and for executing retrieval processes. More precisely, the integrated similarity valuator computes similarities between workflow models based on the defined taxonomies and the semantic workflow similarity measure (see Sect. 3.5). Furthermore, various retriever algorithms enable the computation of a ranking between workflows for a defined POQL query (see Sect. 4.3) and the retrieval of appropriate adaptation knowledge during adaptation.

Finally, the adaptation manager coordinates the various adaptation algorithms and monitors their proper execution. Each adaptation algorithm comprises a method to execute the workflow adaptation and a method to learn the required adaptation knowledge from the workflow repository. The adaptation manager initially triggers the learning of the required adaptation knowledge and maintains it in a designated adaptation knowledge repository. During the adaptation of a workflow, the workflow modifier provides basic operations for the modification of workflows, such as the insertion or deletion of single workflow nodes or edges and also of entire workflow components or fragments. Additionally, it provides cleanup functions, for example, to remove unused control-flow blocks (not containing a single task) or unused data objects (not linked to any task); a sketch of such a modifier interface is given at the end of this section. This significantly facilitates the implementation of new workflow adaptation algorithms. Moreover, the adaptation manager contains a workflow verifier/validator component, which is able to validate the syntactical correctness of adaptations as well as to verify the semantical correctness with regard to the workflows' consistency (see Sect. 3.2.1). The adaptation manager as well as the adaptation algorithms utilize this component to monitor the adaptation process such that the syntactical and semantical correctness of the adapted workflow is ensured.

The adaptation-guided retrieval approach and the data-flow completion of workflows described in Chapter 6 are not explicitly depicted in the illustrated architecture (see Figure 7.4). The data-flow completion of workflows (see Sect. 6.1) is a separate component within the server module, which is able to learn the required domain-specific completion operators from the workflow repository and completes the data-flow of a workflow on demand. In contrast, the adaptation-guided retrieval approaches (see Sect. 6.2) are components of the retrieval manager, implemented by specific adaptation-guided retrieval algorithms. For enabling the adaptation-guided retrieval, the adaptability of the stored workflows is assessed during initialization of the system.
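To illustrate the role of the workflow modifier described above, the following is a hypothetical sketch of such an interface; the method names and workflow attributes are illustrative and do not reflect the actual CAKE API.

    class WorkflowModifier:
        """Hypothetical sketch of basic modification and cleanup operations."""

        def insert_node(self, workflow, node):
            workflow.nodes.add(node)

        def delete_node(self, workflow, node):
            workflow.nodes.discard(node)
            # drop all edges that reference the deleted node
            workflow.edges = {e for e in workflow.edges if node not in e}

        def cleanup(self, workflow):
            # remove control-flow blocks that do not contain a single task
            for block in list(workflow.control_blocks):
                if not block.tasks:
                    workflow.remove_block(block)
            # remove data objects that are not linked to any task
            for data in list(workflow.data_nodes):
                if not workflow.dataflow_edges_of(data):
                    workflow.delete_node(data)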
7.3 CookingCAKE: Prototypical Application

CookingCAKE3 [169] is a prototypical application of the presented workflow modeling assistance, which recently participated in the Computer Cooking Contest in 2014 [167], 2015 [169], and 2017 [175, 269]. The CookingCAKE system is based on the previously described architecture and supports amateur chefs in finding suitable cooking recipes adapted to their individual demands. Given a query that specifies desired and undesired preparation steps or ingredients, a cooking workflow is constructed automatically. Consequently, amateur chefs do not need to construct a novel recipe from scratch, but are rather supported by an example cooking process to prepare the desired dish. The internal workflow repository of CookingCAKE currently features 61 different sandwich recipes.

3 http://cookingcake.wi2.uni-trier.de
Figure 7.5: CookingCAKE: Query definition (simple mode)
The user can define the desired properties of the sandwich dish either in an easy query interface using POQL-Lite or an expert interface using POQL-Full (see Chap. 4). In either case, 156 different ingredients and 55 different cooking steps are currently supported. The simple query interface captures desired and undesired preparation steps or ingredients. In the screenshot shown in Figure 7.5, the query defines that bagel and bacon are desired ingredients, while no tomatoes should be contained. Furthermore, the sandwich dish should contain fried ingredients and no baking step.

In certain scenarios, the easy query definition mode may be too restrictive. In this case, the user can switch to the expert query interface, which uses POQL-Full (see screenshot in Fig. 7.6). This interface is basically separated into two workflow editors4, which capture desired or undesired properties, respectively.

4 The workflow editors are implemented based on the JavaScript visualization library vis.js.
More precisely, the green-framed editor can be used to define desired workflow fragments (query workflow, see Chap. 4) and the red-framed editor to specify undesired fragments (restriction workflow, see Chap. 4). Each of these workflow editors enables the specification of single workflow elements, i.e., preparation steps and ingredients, as well as their relationships by use of control-flow and data-flow edges. Furthermore, control-flow blocks such as the choice, parallel, or cycle pattern can also be defined (see Sect. 2.2.3). Compared to the simple mode, the user is now able to express relationships between workflow elements or to define entire workflow fragments as desired or undesired. In the screenshot in Figure 7.6, the user specified that the bacon of the sandwich should be fried, which cannot be modeled by a POQL-Lite query (see Fig. 7.5).
Figure 7.6: CookingCAKE: Query definition (expert mode)
After the definition of a query, the workflow modeling assistance process (see Sect. 5.5.2) can be triggered by clicking on a create recipe button, resulting in a cooking workflow adapted to the specified query. The result page (see screenshot in Fig. 7.7) first provides a name as a short description
of the sandwich dish. This name is derived by simple rules based on the contained ingredients [169].5 Furthermore, a detailed list of the contained ingredients as well as a textual step-by-step instruction6 to create the sandwich dish is provided. Below this information, the corresponding cooking workflow describes the cooking process of the recipe in more detail. In case the user is not satisfied with the presented solution (here, possibly since bacon is included but not fried), the user can directly modify the suggested workflow. CookingCAKE also offers a feedback mechanism to assess the quality of the produced recipe, which may be used in future versions to improve the workflow constructed by the workflow modeling assistance.
Figure 7.7: CookingCAKE: Result page
5 A more sophisticated approach for the labeling of CBR cases in general or cooking recipes in particular was presented by Kiani et al. [113].
6 The instruction is derived from the workflow by listing preparation steps with their corresponding input ingredients.
7.4 Interactive Modeling Interface

In this section, the idea of an interactive workflow modeling interface is sketched as a potential extension of the current architecture. This interface aims at facilitating the actual modeling of a workflow, based on the newly developed retrieval and adaptation components (see Chap. 4+5) in the CAKE framework (see Sect. 7.2). The basic idea of the interactive workflow modeling is that the user starts to model a workflow and is constantly supported by proposals to construct the desired workflow model. More precisely, modeling suggestions as well as an auto-modeling procedure guide the user during the current modeling task. Furthermore, the user does not need to define a query. Instead, the implicit query specifications of the user are gradually refined based on the user's interactions, thereby ensuring that the modeling suggestions and the auto-modeling are continuously adjusted towards the user's intentions.
[Mockup: a workflow editor (currently modeling slice salmon and mix mayonnaise into a sauce) next to menus for undesired elements (meat, bake, cut -> tomatoes), modeling suggestions (dill -> mix; sandwich dish from bread, sauce, salmon), auto-modeling, and a workflow model search]
Figure 7.8: Interactive workflow modeling interface
The interface of the interactive workflow modeling support is illustrated in Figure 7.8. The left-sided workflow editor enables the modeling of a particular workflow. In the menu on the right side, undesired workflow components may be specified. Furthermore, modeling suggestions, an auto-modeling function, and a search for related workflow models are provided in this part of the interface. The underlying idea is that these functions are based on the presented workflow modeling assistance.

This assistance requires a query Q = (Q+, Q−) that consists of desired properties Q+ and undesired properties Q− (see Chap. 4). Since the currently modeled workflow in the editor is a partial solution of the desired workflow model, it specifies desired workflow elements. In the given example, slice salmon is modeled, since it is a
desired component in the final workflow. Thus, the current workflow is considered as the desired part Q+ of the query. Additionally, the user may specify and manage the set of undesired workflow components on the right-hand side, i.e., Q− = Q−1 ∪ . . . ∪ Q−n. In the illustrated example, meat, bake, and cut tomatoes are specified as undesired. By modifying the workflow in the editor or by changing the undesired workflow components, the query is automatically refined. Whenever such modifications cause an inconsistency with regard to the query (see Sect. 4.5), the user is notified. Based on the given query, the user can be supported in three different ways (see Fig. 7.8):

• Auto-modeling: The auto-modeling enables the automatic construction of a workflow model guided by the current query Q = (Q+, Q−). The basic idea is to apply the presented workflow modeling assistance. More precisely, a retrieval for the best-matching workflow is executed, which is subsequently adapted according to the query. Next, the resulting workflow is presented to the user. The user may then accept or reject the presented workflow. Accepting the presented workflow will replace the current workflow in the editor with the auto-modeled workflow, which thereby refines the query. When declining the presented workflow, the user may subsequently add certain elements of the auto-modeled workflow to the list of undesired components. This will also refine the query and ensure that those elements will not be suggested in future auto-modeling scenarios.

• Modeling suggestions: Furthermore, the user is continuously provided with modeling suggestions during the actual workflow modeling (see the sketch after this list). Whenever the query is changed, the auto-modeling is triggered and the auto-completed workflow is used to derive potential fragments to complete the current workflow automatically. The completion fragments are considered as those workflow fragments that are only contained in the auto-completed workflow but not in the specified query, i.e., they are neither undesired nor already present in the currently modeled workflow. These modeling suggestions are then presented to the user on the right side. Modeling suggestions can be accepted, meaning that they are automatically applied to the current workflow model in the editor. Furthermore, modeling suggestions can also be rejected. In this case, the related workflow elements are automatically added to the list of undesired workflow elements. Thus, by accepting or rejecting modeling suggestions, the query is further refined, which will then also result in new modeling suggestions being generated. In the given example in Figure 7.8, it is suggested to add dill to the sauce or to combine the ingredients bread, sauce, and salmon to create a sandwich dish.
• Search workflow model: In addition, the user could be supported by a search for related workflows in the repository based on the current query. In contrast to the auto-modeling, no adaptation would be performed. The user may then compare workflows related to the currently modeled workflow in order to analyze possible alternatives for modeling the desired workflow. A suitable workflow model can then also be selected as a new starting point for workflow modeling.
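The query refinement and suggestion loop behind these functions can be sketched as follows, again reducing workflows to sets of elements; auto_model stands for the retrieval plus adaptation of the workflow modeling assistance (all names illustrative).

    def modeling_suggestions(editor_elements, undesired_elements, auto_model):
        """Derive completion fragments for the current modeling state.
        editor_elements: elements currently modeled (desired part Q+).
        undesired_elements: union of the undesired components (Q-).
        """
        completed = auto_model(editor_elements, undesired_elements)
        # suggest what the auto-modeled workflow adds beyond the query
        return completed - editor_elements - undesired_elements

    def reject_suggestion(suggestion, undesired_elements):
        """Rejecting a suggestion refines the query: its elements become
        undesired and will not be suggested again."""
        undesired_elements |= suggestion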
7.5 Conclusions

This chapter presented technical details on the implementation of the workflow modeling assistance. The constructed system integrates all the presented approaches and was implemented within the workflow management system CAKE. Its architecture allows a flexible configuration to match the demands of various domains. For the cooking domain, a prototypical system CookingCAKE was described, which provides cooking recipes represented as workflows. CookingCAKE differs significantly from today's traditional cooking websites, which exclusively support recipe search by name, contained ingredients, or dish category. In contrast, CookingCAKE provides a more expressive query that also considers semantical relations. More importantly, CookingCAKE does not only provide a search for recipes, but also creates individual cooking recipes adapted to the specified query. For a comprehensive application in real cooking scenarios, further development is required; for example, the amounts of ingredients need to be considered. However, cooking is an excellent domain to demonstrate new CBR-related approaches [71, 112, 38, 84] and to evaluate the presented workflow modeling assistance. Finally, an interactive modeling interface was described as a potential extension of the current architecture. Based on the presented retrieval and adaptation methods, this interface aims at supporting the user during the actual modeling of a workflow by providing modeling suggestions.
8 Evaluation This chapter presents the evaluation of the described workflow modeling assistance by workflow retrieval and adaptation, which is based on a prestudy [136, 22]. According to design science in information systems research [99] (see Sect. 1.3) newly developed artifacts have to address a significant problem. To evaluate the developed artifacts, the utility of the approach for the particular problem situation thus needs to be determined. In this chapter, the utility of the novel workflow adaptation approaches is demonstrated by means of analytical evaluation methods as well as an experimental evaluation with real users in the cooking domain. The next section describes the experimental setup, including the application scenario, the investigated evaluation criteria, and hypotheses as well as the used data set. Subsequently, a detailed description of the analytical evaluation is presented, which examines the influence of workflow adaptation on the query fulfillment, the computation time as well as on structural properties of the workflow. Next, the experimental evaluation is described, in which real users were asked to rate the quality and the utility of the adapted workflows in comparison to non-adapted workflows. Finally, the implications of the evaluation are summarized in Section 8.3.
8.1 Experimental Setup In general, the workflow modeling assistance could be investigated by means of a Turing test [246] to verify whether users can distinguish between manually modeled and automatically constructed workflows. Such a test has already been applied to automatically generated cooking recipes (e.g., [228]). In contrast, this evaluation aims at a substantially more detailed assessment of the workflows constructed by the workflow modeling assistance. More precisely, the utility of the workflow modeling assistance according to design science in information systems research [99] (see Sect. 1.3) is analyzed, as will be explained in the next section.
8.1.1 Utility & Application Scenario As in any information system, the utility of the workflow modeling assistance strictly depends on the application environment, as illustrated in Figure 8.1. More precisely, the application environment determines the general setting in which the particular information system is applied. For a PO-CBR application, this usually means that based on a given query, a suitable workflow model in the respective domain should be provided. When the user then specifies a concrete query, she/he is situated in a particular application scenario that determines the restrictions and requirements captured in the query. A workflow model that does not match the desired purpose would thus be inappropriate. Consequently, the utility of the provided workflow model strictly depends on the application environment as well as on the respective application scenario.
Figure 8.1: Application environment and utility of a PO-CBR system (based on [99] and [16][p.94 ff])
In the field of CBR, Bergmann and Wilke stated that many possible interpretations of utility exist (see [16][p.50-53] and [32]). These comprise, for example, the quality achieved or the effort saved by using the CBR system. For this evaluation, the utility is focused on the solution provided by the workflow modeling assistance. More precisely, the utility of the workflow modeling assistance is considered as the satisfaction of the user with the presented workflows in the particular application scenario.
Hence, for assessing the utility of the workflow adaptation, the specification of an appropriate application scenario in the application environment is required. For this evaluation, the cooking domain, which was also used to exemplify the approaches throughout this thesis, was chosen as the application environment. The concrete application scenario is defined as follows: An amateur chef wants to prepare a sandwich dish for an upcoming event at the weekend, having only a rough idea of the ingredients that should be used or avoided. In addition, the amateur chef may have a partial perception of how the preparation needs to be performed. It is further assumed that all ingredients must still be bought. Thus, the amateur chef decides to use a cooking system to search for a matching sandwich recipe. Several studies showed that the workflow adaptation is also applicable to other cooking workflows such as pasta recipes [170] or pizza recipes [168]. However, for a better comparison of the different workflows, the presented study is focused on a single dish type for all workflows.
8.1.2 Evaluation Criteria The presented evaluation considers various criteria to investigate the utility of the workflows constructed by the workflow modeling assistance. While several criteria can be determined automatically, others require the explicit involvement of users in the particular domain. Thus, this evaluation is divided into an analytical evaluation that assesses several evaluation criteria automatically and an experimental evaluation in which real users were asked to rate several criteria manually (a differentiation based on Hevner et al. [99]).
Analytical Evaluation Criteria The utility of the workflow modeling assistance can be analyzed automatically from various perspectives. The presented analytical evaluation will investigate the applicability of the adaptation methods as well as the impact of the adaptation on the query fulfillment and on structural properties of the workflow. Furthermore, the required computation time is analyzed.
• Applicability of Workflow Adaptation. The general applicability of the automatically learned adaptation knowledge is an important factor for the utility of the presented adaptation approaches. In case of limited adaptation capabilities, the workflow modeling assistance is basically reduced to a search for workflow models. Thus, the success rate of adaptation, i.e., the number of scenarios in which the automatically learned adaptation knowledge can be applied, is investigated.
• Query Fulfillment. The query fulfillment (see Chap. 4) assesses the compliance of the constructed workflow model with the user-specified query by considering the desired and undesired workflow components given in the query. It can be assumed that a workflow with a higher query fulfillment better matches the expectations of the user and is consequently preferred over workflows with a lower query fulfillment.
• Structural Properties. By applying workflow adaptation, structural properties of the workflow such as the size or the complexity can be changed. Since structural properties indicate the understandability of a workflow model (see Sect. 2.2.5), they can also affect the workflow's utility. In this evaluation, the size of the workflow (i.e., the number of nodes and edges) as well as the different types of nodes and edges are analyzed. Additionally, the complexity is investigated by the coefficient network complexity (CNC) measure [121, 43], which assesses the connectivity of the workflow elements. More precisely, the CNC measure is defined as the number of edges divided by the number of nodes. For a more detailed discussion on workflow complexity, the reader is referred to Section 6.3.
• Computation Time. When considering the utility of the workflow modeling assistance in a broader sense, the computation time represents an additional criterion. For example, in certain scenarios, a fast workflow construction is required. Thus, the adaptation approaches are also analyzed and compared in this respect.
Experimental Evaluation Criteria In the presented experimental evaluation, the quality as well as the utility of the workflow models constructed by the workflow modeling assistance will be assessed by users and compared with original workflows that were manually modeled based on real cooking recipes.
• Workflow Quality. Workflow quality (see Sect. 2.2.5) is an important factor for the usefulness of a workflow model. The presented workflow adaptation approaches can affect the quality of the workflows, since adaptation knowledge is learned and applied fully automatically, i.e., workflow adaptation usually constructs new and unverified workflow models. In this evaluation, the quality assessment is based on five different criteria from the 3QM-Framework (see Sect. 2.2.5) and comprises the completeness and relevance as well as the correctness and comprehensibility of the cooking instructions specified in the workflow. While the completeness,
relevance, and correctness quantify the semantic quality, the comprehensibility assesses the pragmatic quality of the workflow. Furthermore, the users were asked to rate the expected tastefulness of the sandwich dish described in the recipe to determine whether adaptation, for example, results in undesirable ingredient combinations. This item was used to measure the performance quality of the workflows (see Sect. 2.2.5) by evaluating the quality of the final workflow output (i.e., the final dish produced). Additionally, an overall workflow quality is automatically derived by computing the average of the five quality criteria.
• Utility of constructed workflows. In addition to the previous criteria, which are investigated to analyze the utility of the workflow modeling assistance, the constructed workflows are also explicitly rated by users with regard to their utility for the particular application scenario. It can be assumed that the perceived utility of the constructed workflow basically covers all the previous evaluation criteria1 and furthermore reflects the utility of the workflow modeling assistance as previously defined (see Sect. 8.1.1), i.e., the satisfaction of the user with the constructed workflow in the particular application scenario. Moreover, it is assumed that workflows with a higher utility are preferred over workflows with a lower utility (see [16][p.50-53]).
1 Except for computation time, which was not a criterion for the user assessment and was not known to the users.
8.1.3 Hypotheses Based on these criteria, the presented evaluation analyzes five hypotheses to measure the utility of the developed adaptation methods. As a basic requirement, workflow adaptation should be able to apply the automatically learned adaptation knowledge in most cases and to increase the query fulfillment such that the workflows better match the specified query (Hypothesis 1). However, it can be assumed that the quality of the workflows is decreased by adaptation, because the adaptation knowledge is learned and applied fully automatically. Since the adaptation methods have been developed in such a manner that they limit potential quality effects, workflow adaptation should, however, not considerably reduce the workflows' quality (Hypothesis 2). The third hypothesis is based on the assumption that adapted workflows provide a higher utility for the users in the concrete application scenario (see Fig. 8.1) compared to the retrieved workflow (Hypothesis 3). Finally, the adaptation approaches are also investigated in comparison. Since the
combined and integrated adaptation approach was developed to exploit the advantages of the different adaptation approaches and to cover potential disadvantages, it can be assumed that the combined adaptation approach outperforms the single adaptation techniques with regard to query fulfillment and utility (Hypothesis 4). However, the combined adaptation most likely involves a high computation time, since it basically requires performing all single adaptation approaches successively (Hypothesis 5).
Hypothesis 1. Workflow adaptation significantly increases the query fulfillment compared to retrieved workflows.
Hypothesis 2. Workflow adaptation does not considerably reduce the quality of the workflows.
Hypothesis 3. Considering the query, users significantly prefer adapted workflows over retrieved workflows with regard to utility.
Hypothesis 4. The combined adaptation outperforms all single adaptation approaches with regard to query fulfillment and utility.
Hypothesis 5. The computation times of the single adaptation approaches and the combined adaptation approach differ significantly.
8.1.4 Workflow Repository As a basis for the evaluation, a workflow repository was manually constructed by modeling cooking workflows representing real sandwich recipes from various Internet sources (e.g., WikiTaaable [9]2). In total, 70 workflows were modeled based on the textual recipe descriptions. On average, a constructed workflow consists of 23.4 nodes and 41.7 edges (for more details see Table 8.1).
Table 8.1: Average workflow size
data nodes          10.41
task nodes          10.86
control-flow nodes   2.11
data-flow edges     29.43
control-flow edges  12.23
2 http://wikitaaable.loria.fr
Based on keywords contained in the textual recipe, control-flow blocks were modeled if they represent the instructions of the cooking recipe appropriately. For example, keywords such as "while" were used to model AND blocks, "if desired" to model XOR blocks, and "until" to model LOOP blocks. Of the 70 constructed workflows, 63% contain at least one control-flow block. More precisely, 57% of the workflows contain LOOP blocks, 13% XOR blocks, and 11% AND blocks.
Figure 8.2: Workflow modeling example (recipe based on WikiTaaable [9])
An example3 of this modeling process is depicted in Figure 8.2, which shows a textual recipe description including the ingredient list and the single cooking instructions as well as the resulting workflow graph for a sandwich dish. The example demonstrates that during the modeling process, information that is not explicitly specified in the recipe needs to be completed. For instance, the layering of the second slice of bread on top of the sandwich dish is not explicitly specified in the cooking instructions of the textual recipe description, but is required for an appropriate workflow representation. Furthermore, all workflows were modeled in a similar manner to ensure the basic application requirement of workflow streams, i.e., that workflows with a similar structure must be present (see Sect. 5.2.1).
3 Please note that this example is chosen for illustrative purposes and does not cover all details.
8.1.5 Taxonomies As a next step, initial task and data ontologies were manually constructed considering the preparation steps and ingredients contained in the modeled workflows.
Figure 8.3: Extract of task and data taxonomies
An extract of each taxonomy is illustrated in Figure 8.3. Here, the bold ingredients or cooking steps represent leaf nodes in the taxonomies (e.g., fry or roast turkey). For the inner nodes representing generalized ingredients or cooking steps, the corresponding similarity values are depicted (e.g., 0.6 for cut). Overall, the workflows stored in the repository used 198 different ingredients and 84 different preparation steps. On average, each workflow contained 1.5 generalized ingredients (e.g., cheese or bread) and 1.8 generalized preparation steps (e.g., cut or cook).
8.1.6 Real User Queries Based on these initial ontologies, realistic queries were gathered from 9 potential users in the cooking domain, i.e., amateur chefs4. The users were familiarized with the application scenario (see Sect. 8.1.1), the semantics of the query, and the available taxonomies of ingredients and preparation steps that can be used to formulate queries. Each user was given 30 minutes5 to specify POQL-Lite queries and another 30 minutes to specify POQL-Full6 queries. Both query types were entered via the respective query interface as illustrated in Chapter 7. In addition, the amateur chefs were asked to describe each defined query textually. This description was used to verify the plausibility of the created query in a subsequent manual revision step. In total, 51 plausible queries were identified.
Figure 8.4: POQL-Lite query example (desired ingredients and steps: leaf salad, bagel, pastrami, gherkin, relish, heat, spread; undesired: tropical fruit, citrus fruit, cheese, deep-fry, mix, cut into shape)
An example of a POQL-Lite query is illustrated in Figure 8.4, showing the desired and undesired ingredients and preparation steps as well as the corresponding textual description of the query specified by the user7. Additionally, an example of a POQL-Full query is depicted in Figure 8.5. Here, the user also defined several desired and undesired ingredients. Furthermore, by use of the control-flow and data-flow edges available in POQL-Full, the user specified that the fish should be grilled and that toppings should subsequently be placed on the ciabatta.
4 Amateur chefs hereinafter refers to students of business informatics who stated that they have cooking experience and are able to prepare a particular dish for a given recipe. All amateur chefs were German speakers. Thus, the experiment was performed entirely in German, i.e., the workflows, taxonomies, queries, and evaluation instructions were described in German. The evaluation instructions and examples presented in this section are translated into English for illustrative purposes.
5 Each of the 30 minutes included the time for the study introduction.
6 POQL-Full was limited to control-flow edges and input data edges (for the sake of simplicity, no control-flow nodes or output data nodes).
7 Please note that the users textually described the POQL-Lite and POQL-Full queries in various ways.
Figure 8.5: POQL-Full query example
On average, the constructed POQL-Lite queries consist of 7.7 desired and 3.0 undesired workflow elements and the POQL-Full queries of 9.3 desired and 2.9 undesired workflow elements (see Table 8.2 for more details).
Table 8.2: Average query size
                        data nodes   task nodes   control-flow edges   data-flow edges
POQL-Lite  desired         5.22         2.52              -                  -
POQL-Lite  undesired       1.43         1.61              -                  -
POQL-Full  desired         4.07         2.57             0.57               2.04
POQL-Full  undesired       1.79         0.86             0.00               0.25
8.1.7 Data Revision & Data Set Summary Based on the gathered queries, obvious deficiencies in the initial taxonomies and recipes were revised in order to prevent distortion in the assessment of the workflows. More precisely, modeling errors in the workflows were corrected and the taxonomies were revised in a reasonable manner by reordering the hierarchical structure, which also involved the renaming, adding, or removal of certain ingredients and preparation steps. The final data taxonomy contains 241 different ingredients (176 leaf nodes) and the final task taxonomy consists of 105 different preparation steps (76 leaf nodes). The key figures of the final evaluation data set are summarized in Table 8.3.
Table 8.3: Basic data set
amateur chef queries                           51
workflows in repository                        70
different ingredients in the workflows        198
different preparation steps in the workflows   84
8.1.8 Adaptation Parameters Furthermore, the presented workflow modeling assistance requires the specification of several adaptation parameters. When determining these parameters, it has to be considered that adaptation is a multi-objective optimization problem (see Sect. 5.4). An optimal adaptation algorithm would be fast, achieve the highest possible query fulfillment, and result in workflows of the best possible quality. However, a parameter configuration that optimizes all these criteria is usually not feasible. For example, certain parameter settings may prevent inappropriate adaptations, leading to a higher quality, but may also reduce the increase in query fulfillment. Thus, the appropriate selection of parameters is essential to demonstrate the potential of the workflow modeling assistance. The following evaluation parameters have consequently been manually optimized by performing some initial trials on the evaluation data set, aiming at an appropriate trade-off between quality, computation time, and query fulfillment. More precisely, starting from an initial configuration and a verification of the results, the parameter configuration was iteratively refined until an appropriate parameter setting was determined.
Table 8.4: Adaptation parameters
∆W = 0.0 (generalization / operators): specifies the minimum similarity of a pair of workflows to be considered during the learning of adaptation knowledge (see Sect. 5.1.1 + 5.3.4)
∆ψdata = 0.6 (generalization): determines the probability and the level of generalization in the data taxonomy (see Sect. 5.1.1)
∆ψtasks = 0.7 (generalization): determines the probability and the level of generalization in the task taxonomy (see Sect. 5.1.1)
∆D = 0.7 (operators): specifies whether two data nodes result in replace or in insert and delete operators (see Sect. 5.3.4)
∆s = 0.5 (operators): determines whether an exchange or delete operator is applicable on a streamlet in the workflow (see Sect. 5.3.3)
∆T = 0.5 (operators): defines the minimum similarity of streamlet anchors (see Sect. 5.3.3)
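Purely as an illustration, this parameter setting could be captured in a simple configuration mapping; the key names and the dictionary format below are assumptions of this sketch, not the actual CAKE configuration syntax.

# Illustrative encoding of the adaptation parameters from Table 8.4.
ADAPTATION_PARAMETERS = {
    "delta_W": 0.0,          # min. similarity of workflow pairs for learning
    "delta_psi_data": 0.6,   # generalization probability/level, data taxonomy
    "delta_psi_tasks": 0.7,  # generalization probability/level, task taxonomy
    "delta_D": 0.7,          # replace vs. insert/delete operator threshold
    "delta_s": 0.5,          # applicability of exchange/delete on streamlets
    "delta_T": 0.5,          # min. similarity of streamlet anchors
}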
The parameters resulting from the manual optimization process are listed in Table 8.4. This table comprises all required parameter values for the three single adaptation approaches. The first parameter ∆W specifies the minimum similarity between pairs of workflows to be considered during the learning of adaptation knowledge for adaptation by generalization and specialization as well as for the operator-based adaptation. Here, the minimum similarity of 0.0 is chosen because the workflow repository only contains workflows of a single dish type, i.e., sandwich recipes. Consequently, all pairs of workflows are considered to learn adaptation knowledge. Given a data set with more different types of workflows, a higher similarity threshold is recommended to prevent the acquisition of inappropriate adaptation knowledge. The next two parameters in the table (∆ψdata and ∆ψtasks) specify the probability and the level of workflow generalization, whereas the last three parameters are used to configure the learning (∆D) and the applicability (∆s and ∆T) of adaptation operators. For the compositional adaptation by workflow streams, no parameters need to be specified. The
combined adaptation approach integrating all three adaptation methods is parametrized by the same parameter values as the respective single adaptation approaches. While the combined adaptation may further be configured individually by selecting the adaptation methods and their application order (see Sect. 7.2), the standard setting as described in Section 5.5 is chosen. This means that first, specialization is executed, followed by the application of workflow streams. Next, operator-based adaptation is performed, and finally, specialization is again applied to the adapted workflow, which potentially contains some generalized workflow elements.
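The standard ordering just described can be summarized in a short sketch; the step functions are hypothetical placeholders for the actual adaptation components and simply return the workflow unchanged here.

def specialize(workflow, query):
    # placeholder: would replace generalized tasks and data by suitable
    # specializations with respect to the query (Sect. 5.1)
    return workflow

def apply_workflow_streams(workflow, query):
    # placeholder: compositional adaptation by exchanging workflow streams
    return workflow

def apply_operators(workflow, query):
    # placeholder: operator-based adaptation (insert/delete/replace)
    return workflow

def combined_adaptation(workflow, query):
    """Standard order of the combined adaptation (see Sect. 5.5)."""
    wf = specialize(workflow, query)
    wf = apply_workflow_streams(wf, query)
    wf = apply_operators(wf, query)
    # a previous step may have introduced generalized elements again
    return specialize(wf, query)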
8.1.9 Computing Evaluation Samples After finalizing the setup of the PO-CBR application, the automated learning of adaptation knowledge was triggered, which analyzes the workflows stored in the repository (see Chap. 5). The resulting adaptation knowledge consists of generalized workflows with a total coverage of more than 1.9 · 10^15 possible specializations (see Def. 16 in Sect. 5.1.2). Furthermore, 220 distinct workflow streams and 3562 different adaptation operators (529 insert, 539 delete, 2494 replace) were extracted. For the combined adaptation approach with generalized adaptation knowledge, 212 different workflow streams and 3845 different operators8 (599 insert, 592 delete, 2654 replace) were obtained.
8 More different generalized operators were learned, since the generalized workflow repository basically results in more replace operators that are less likely to be identical.
Figure 8.6: Adaptations performed for evaluation
For each of the 51 user queries, the retrieved as well as the adapted workflows resulting from the four different adaptation approaches were
computed (see Fig. 8.6). For adaptation by generalization and specialization as well as for the combined adaptation, the generalized workflow repository was used during retrieval, resulting in a generalized workflow, which is specialized during the subsequent adaptation process. As a result of these computations, 51 evaluation samples, each consisting of one retrieved and four adapted workflows, were constructed. These samples are used in the evaluation in order to assess the quality, the utility, and the query fulfillment of the workflows in comparison. The retrieved workflow then serves as a baseline to measure the potential effect of workflow adaptation. In the remainder of this chapter, the corresponding retrieved workflows are denoted by WR. The adapted workflows are denoted by WG (generalization and specialization), WS (compositional adaptation by workflow streams), WO (operator-based adaptation), and WC (combined adaptation), respectively.
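The following minimal sketch illustrates how these samples could be assembled, reusing the placeholder adaptation functions from the earlier combined-adaptation sketch; retrieve() is likewise a hypothetical placeholder and not the actual CAKE retrieval interface.

def retrieve(repo, query):
    # placeholder PO-CBR retrieval: would return the best-matching workflow
    return {"repo": id(repo), "query": query}

def build_samples(queries, repository, generalized_repository):
    # One sample per query: the retrieved workflow WR plus the four
    # adapted workflows WG, WS, WO, and WC.
    samples = []
    for query in queries:
        w_r = retrieve(repository, query)  # baseline WR
        samples.append({
            "WR": w_r,
            # WG and WC start from a retrieval on the generalized repository
            "WG": specialize(retrieve(generalized_repository, query), query),
            "WS": apply_workflow_streams(w_r, query),
            "WO": apply_operators(w_r, query),
            "WC": combined_adaptation(retrieve(generalized_repository, query), query),
        })
    return samples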
8.2 Evaluation Results This section presents the evaluation results of the workflow modeling assistance. First, the results of the analytical evaluation measuring the query fulfillment, computation time as well as structural properties of the workflows are described. Next, the experimental evaluation with real users analyzing the quality and the utility of the workflows is presented.
8.2.1 Analytical Evaluation The analytical evaluation used all 51 evaluation samples (see Sect. 8.1.9) to measure the effects of the workflow adaptation on the query fulfillment, the computation time, and structural properties of the workflows.
Table 8.5: Average query fulfillment (QF) and computation time (time)
        WR       WG            WS            WO             WC
QF      0.78     0.83 (+6%)    0.82 (+5%)    0.85 (+9%)     0.89 (+14%)
time    1.12 s   +0.08 s       +2.69 s       +13.90 s       +28.58 s
For investigating Hypothesis 1 (query fulfillment can be increased), the query fulfillment of the retrieved and adapted workflows was analyzed. On average, all adaptation approaches lead to a higher query fulfillment compared to the retrieved workflows (see Table 8.5). However, only a slight increase in query fulfillment was achieved by adaptation. On average,
generalization and specialization as well as workflow streams increased the query fulfillment by 5-6% and the operator-based approach by 9%. Overall, the highest query fulfillment increase of 14% was measured for the combined adaptation approach. The slight increase in query fulfillment can be partially explained by the relatively high query fulfillment of the retrieved workflows9. The increase in query fulfillment is, however, statistically significant for all adaptation approaches, which is confirmed by a paired t-test (p < 0.025 = α). Overall, even though query fulfillment is only slightly increased, Hypothesis 1 is confirmed. Furthermore, Hypothesis 4 is also partially confirmed, since the combined adaptation approach on average leads to a higher query fulfillment than the single adaptation approaches.
9 E.g., a replacement of a similar but not desired element in the workflow with the actually desired element only slightly increases the query fulfillment.
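The significance claim above corresponds to a standard paired t-test over per-query values; the following sketch uses SciPy, and the two value lists are fabricated placeholder numbers for illustration only, not the actual study data.

from scipy import stats

# Per-query query fulfillment of retrieved vs. adapted workflows
# (placeholder values; the real evaluation used 51 paired values).
qf_retrieved = [0.71, 0.80, 0.78, 0.75, 0.82, 0.79]
qf_adapted   = [0.85, 0.88, 0.86, 0.84, 0.90, 0.87]

t_stat, p_value = stats.ttest_rel(qf_adapted, qf_retrieved)
print(f"t = {t_stat:.3f}, p = {p_value:.5f}")
# the increase counts as significant here if p < 0.025 (= alpha)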
Figure 8.7: Success rate of adaptation
For a more comprehensive investigation of Hypothesis 1, the number of successful adaptations was analyzed in addition (see Fig. 8.7). Successful adaptations here refer to those adaptations in which at least one adaptation step increased the matching of the query. Thus, this investigation indicates whether the adaptation algorithms are applicable in many scenarios. All four adaptation approaches were successful in the majority of the 51 evaluation samples. However, the workflow streams were only successful in 57% of the cases. This is most likely because workflow streams can only be replaced if similarly structured workflows exist in the workflow repository10.
10 This requirement was considered during the modeling of the workflow repository, but could not always be ensured.
Furthermore, the exchange of entire workflow components is not always appropriate with regard to query fulfillment (see Sect. 5.4). In contrast, generalization and operator-based adaptation were successful in 90% and 78% of the scenarios, respectively. Most importantly, the combined adaptation approach was successful for all 51 queries. This reconfirms that query fulfillment can be increased by adaptation (Hypothesis 1) by showing that the automatically learned adaptation knowledge can be applied in many situations. Moreover, these results confirm that the combined adaptation approach outperforms the single adaptation approaches with regard to query fulfillment (Hypothesis 4). Next, the retrieval as well as the adaptation times were compared (see Table 8.5) to investigate Hypothesis 5 (the combined adaptation is denoted by the highest computation time). On average, retrieval required 1.12 seconds, while the computation time of the subsequent adaptation procedure largely depends on the particular adaptation algorithm. As expected, the specialization can be computed promptly (0.08 seconds). Also for compositional adaptation, the average computation time of 2.69 seconds is rather low. In contrast, the operator-based approach increases the adaptation time to 13.90 seconds, and the combined approach requires 28.58 seconds until the adapted workflow is computed. Thus, the adaptation approaches differ significantly with regard to computation time (see Sect. 5.4). Furthermore, Hypothesis 5 is confirmed, since the combined adaptation is characterized by the highest computation time. Overall, both query fulfillment and computation time largely depend on the chosen adaptation algorithm. Additionally, the effect of the adaptation on the structural properties (see Sect. 8.1.2) of the workflows was investigated to determine whether the quality is potentially affected (see Hypothesis 2). Table 8.6 illustrates the average number of nodes N and edges E contained in the retrieved as well as in the four adapted workflows. Furthermore, the average numbers of data nodes ND, task nodes NT, control-flow nodes NC, data-flow edges ED, and control-flow edges EC are illustrated (see Sect. 3.2). Moreover, the CNC measure is depicted in the table for the retrieved as well as the adapted workflows as a basic complexity criterion (see Sect. 6.3 for a more detailed discussion on workflow complexity). The table shows that on average all adaptation approaches11 increase the size of the workflow, i.e., more nodes and edges are contained in the adapted workflows. The general increase in
workflow size can be explained by the asymmetric property of the similarity measure used by POQL, which usually prefers larger workflow fragments over smaller ones during the adaptation process12. In more detail, all approaches inserted additional task and data nodes into the workflow, which also results in more control-flow and data-flow edges. Comparing the single approaches, the size of the workflows is increased most by the operator-based adaptation. This is primarily caused by the fact that usually more insert operators than delete operators are applied. The smallest increase in workflow size was measured for the compositional adaptation by workflow streams. However, since workflow stream adaptation failed in 43% of the scenarios, the effects on the structural properties are generally less distinctive. Overall, the size of the workflow increased the most for the combined adaptation approach. In contrast to the general increase in workflow size, the number of control-flow blocks was marginally reduced by adaptation. The largest effect was measured for the workflow stream adaptation. This relatively high reduction of control-flow nodes can be explained by the definition of the streams, since they basically only contain a control-flow block if at least one task of the stream is contained in each of the corresponding branches13. Despite the general increase in workflow size for all adaptation approaches, the coefficient network complexity (CNC) measure indicates that the connection strength of the workflow elements is not substantially affected. Overall, for verifying Hypothesis 2, a more detailed investigation is required to determine the impact on the workflow quality.
Table 8.6: Structural change of the workflows
       WR      WG      WS      WO      WC
N      27.08   +0.71   +0.17   +2.83   +2.79
E      50.92   +2.54   +0.48   +5.04   +8.08
ND     12.58   +0.13   +0.33   +1.75   +0.64
NT     13.08   +0.67   +0.38   +1.13   +2.25
NC      1.42   -0.08   -0.54   -0.04   -0.08
ED     35.63   +1.96   +1.46   +4.17   +6.35
EC     15.29   +0.58   -1.00   +0.86   +0.01
CNC     1.87   +0.05   -0.01   -0.02   +0.09
11 Please note that adaptation by generalization and specialization itself does not change structural properties of the workflows. However, due to the generalization, a different workflow may be selected during retrieval, which results in an adapted workflow that differs with regard to the structural properties compared to standard retrieval.
12 Larger workflow fragments more likely contain desired workflow elements or elements similar to desired workflow elements.
13 Except for those branches that are empty in the original workflow (e.g., in XOR blocks).
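As a small illustration of the complexity figures above, the CNC value follows directly from the edge and node counts; note that Table 8.6 reports the average of per-workflow CNC values, which need not equal the CNC computed from the averaged counts.

def cnc(num_edges: float, num_nodes: float) -> float:
    # coefficient network complexity: number of edges divided by number of nodes
    return num_edges / num_nodes

# Average retrieved workflow (Table 8.6): 50.92 edges, 27.08 nodes
print(round(cnc(50.92, 27.08), 2))  # 1.88; Table 8.6 reports an average CNC of 1.87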
8.2.2 Experimental Evaluation
The analytical evaluation showed that the adaptation is able to increase the query fulfillment. However, in order to derive the actual quality and the utility of the provided solutions, the involvement of experts is necessary. While first approaches to automatically assess the quality of adapted workflows exist (e.g., [156]), they cannot fully cover the users' perception of quality and utility. Thus, an experimental evaluation with real users (i.e., amateur chefs) was performed. Since an evaluation with all 51 queries was not feasible, 30 queries were randomly selected from those for which the highest number of adaptation algorithms performed at least one adaptation step. This ensures a detailed assessment of the different adaptation algorithms in comparison. For the experimental evaluation, 20 amateur chefs were acquired and divided into 5 different groups. Each group rated the same set of workflows resulting from 6 queries with regard to quality and utility. Thus, each amateur chef rated 30 different workflows.
1. mince [garlic]->[garlic]
2. LOOP
2.1 whisk [mayonnaise, olive oil, garlic, black pepper]->[sandwich sauce]
3. /LOOP
4. refrigerate [sandwich sauce]->[sandwich sauce]
5. toast [bread]->[bread]
6. roast [bacon]->[bacon]
7. slice [tomato]->[tomato]
8. set aside [bacon, tomato, salad greens mix]->[sandwich topping]
9. spread [bread, sandwich sauce]->[sandwich dish]
10. layer [sandwich dish, sandwich topping]->[sandwich dish]
11. top with [sandwich dish, bread]->[sandwich dish]
12. cut in half [sandwich dish]->[sandwich dish]
Figure 8.8: Example of textual workflow representation
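As a minimal sketch, the following fragment shows how such a textual line could be generated from the workflow graph; representing each task as a (name, consumed, produced) tuple is a hypothetical simplification, and control-flow blocks such as LOOP are omitted for brevity.

# Hypothetical simplification: each task as (name, consumed, produced).
tasks = [
    ("mince", ["garlic"], ["garlic"]),
    ("whisk", ["mayonnaise", "olive oil", "garlic", "black pepper"],
     ["sandwich sauce"]),
    ("refrigerate", ["sandwich sauce"], ["sandwich sauce"]),
]

for number, (name, consumed, produced) in enumerate(tasks, start=1):
    # first square bracket: consumed data; second: produced data
    print(f"{number}. {name} [{', '.join(consumed)}]->[{', '.join(produced)}]")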
The experimental evaluation was divided into two parts. The amateur chefs first rated the quality of the 30 workflows independently of the application scenario and the query (see Sect. 8.2.2). In the second part, the amateur chefs were familiarized with the application scenario as well as the particular query and rated the utility of the workflows in comparison (see Sect. 8.2.2). The assessment of the workflows in both parts was based on a textual workflow representation automatically derived from the workflow graph. An example of this representation is illustrated in Figure 8.8 (derived from the workflow in Fig. 8.2). The textual representation basically lists
the single cooking steps in sequential order and describes the respective ingredients that are consumed (first square bracket) and produced (second square bracket) in each cooking step. This representation is more similar to the cooking instructions given in textual recipe descriptions and thus easier for amateur chefs to assess than cooking workflows represented as graphs. Initially, the amateur chefs were introduced to the semantics of this workflow representation as well as to the taxonomies containing the ingredients and preparation steps that may occur in a recipe.
Quality Assessment For the quality assessment, 30 workflows were presented to each expert in random order such that the amateur chefs did not know which workflow was adapted and which was an original recipe generated by a human. Furthermore, the application scenario and the particular query were not given to the amateur chefs.
Table 8.7: Quality criteria
completeness: Is all required information for preparation contained?
relevance: Is all information relevant and not superfluous?
correctness: Are all instructions correct and executable?
comprehensibility: How comprehensible is the current recipe?
tastefulness: How would you rate the tastefulness of the dish described in the recipe? (neglecting personal ingredient preferences)
The experts were asked to rate 5 different quality criteria (see Sect. 8.1.2) on a Likert scale from 1 to 5. Each quality criterion was described to the experts by means of a general question (see Table 8.7), which should reflect the corresponding ratings. More precisely, the five criteria comprised the completeness, relevance, correctness, and comprehensibility of the cooking instructions specified in the workflow. Furthermore, the users were asked to rate the expected tastefulness of the sandwich dish described in the recipe. In addition to these criteria, an overall workflow quality was automatically derived by computing the average of the five quality criteria.
The average quality ratings of the retrieved as well as of the adapted workflows are illustrated in Table 8.8. The results show that all adaptation approaches cause a relatively small reduction of workflow quality. The smallest decrease in quality was measured for the operator-based approach, which even results in a slight improvement with regard to the relevance criterion. This improvement may be explained by insert operators that include additional information into the workflow. The compositional adaptation also only marginally decreased the quality, while a larger decrease was measured for the adaptation approach by generalization and specialization. Overall, the combined adaptation is characterized by the largest decrease in quality. However, for all approaches, the degree of quality reduction is very low.
Table 8.8: Quality assessment (* p < 0.025 = α)
                   WR     WG       WS      WO      WC
completeness       4.28   -0.20*   -0.15   -0.10   -0.23*
relevance          4.09   -0.20*   -0.00   +0.08   -0.21*
correctness        4.03   -0.29*   -0.18   -0.03   -0.32*
comprehensibility  4.23   -0.27*   -0.09   -0.03   -0.38*
tastefulness       3.73   -0.23*   -0.04   -0.03   -0.26*
overall quality    4.07   -0.24*   -0.09   -0.02   -0.28*
Furthermore, the statistical significance of the slight quality reduction was investigated by a paired t-test. For the compositional and operator-based adaptation, the quality reduction is not statistically significant for any quality criterion. In contrast, for the adaptation approach by generalization and specialization and for the combined adaptation approach, the quality reduction is statistically significant for all quality criteria. However, since the quality reduction is small for all approaches, it can still be concluded that the quality of the workflows is not considerably reduced by adaptation. These results are further supported by considering the individual number of ratings in which the retrieved or the adapted workflow was rated higher, lower, or equal with regard to the overall workflow quality (see Fig. 8.9). More precisely, for the adaptation approach by generalization and specialization, in more than 47.2% of the cases the adapted workflow was rated higher than or equal to the retrieved workflow. Thus, in more than 47.2% of the adaptations, no negative effect on the workflow quality was measured. For the combined adaptation approach, the quality was not reduced in 55.8% of the cases. These findings are even more significant for the operator-based and
compositional adaptation approaches (62.5-64.2%). Overall, in a relatively high number of cases, no negative effect on the workflow quality was determined. However, the results reconfirm that the combined adaptation as well as the adaptation by generalization and specialization have a larger impact on the workflow quality. The quality assessments further demonstrated that the users have a similar perception of workflow quality, which is shown by the average standard deviation of 0.631 for the overall quality ratings of each workflow. Altogether, the analysis showed that the adaptation only slightly decreases the quality of the workflows. Thus, the quality is not considerably affected by adaptation, which confirms Hypothesis 2.
Figure 8.9: Overall workflow quality ratings in comparison
Utility The previous section showed that workflow adaptation results in a slight decrease in workflow quality. However, these findings are not surprising considering the fact that the adaptation knowledge is learned and applied fully automatically. On the contrary, the query fulfillment is increased by workflow adaptation. Thus, in order to assess the trade-off between the increase in query fulfillment and the loss of quality, the utility was considered as an additional evaluation criterion. According to Hevner et al. [99], utility is the most basic criterion when designing new artifacts for information systems. It can
be assumed that if the utility of the adapted workflow is significantly higher than the utility of the retrieved workflow, even a loss of quality is acceptable. Thus, the utility of the workflows was investigated in a second part of the experimental evaluation. More precisely, the amateur chefs were asked to rate the utility of exactly those workflows whose quality they previously rated. Each of the 20 experts thus rated 30 workflows resulting from 6 queries according to utility. This time, the experts were aware of the general application scenario and of the particular query the workflows were generated from. Both pieces of information are important because otherwise the assessment of the utility is not possible (see Sect. 8.1.1). The utility was described to the experts as the usefulness of the recipe in the particular application scenario and for the specified query, independent of personal ingredient preferences. The amateur chefs rated the retrieved as well as the four adapted workflows for a single query in comparison, i.e., the 5 workflows resulting from the query were presented to the user simultaneously. Again, the amateur chefs did not know which workflow was adapted and which was an original recipe generated by a human. Each workflow was rated with regard to utility on a Likert scale from 1 to 5. During this process, the experts were supported by highlighted workflow elements that match the desired (green font) or undesired (red font) elements specified in the query.
Table 8.9: Average utility
WR    2.41
WG    3.31 (+0.90)
WS    3.08 (+0.67)
WO    3.00 (+0.59)
WC    3.99 (+1.58)
The utility ratings illustrated in Table 8.9 show that on average the adapted workflows are more useful than the retrieved workflows. The smallest increase in utility was measured for the operator-based and compositional adaptation, which still raise the utility of the workflow from 2.41 to approximately 3.00. For the generalization approach, a utility of 3.31 was achieved. Finally, the combined adaptation approach outperforms all single adaptation approaches with an average utility of 3.99. For all adaptations, the increase in utility is statistically significant compared to the retrieved workflow, which is confirmed by a paired t-test (p < 0.025 = α).
Furthermore, the utility improvement of the combined adaptation approach compared to all other adaptation methods is statistically significant (p < 0.025 = α). Relating these results to the quality reduction by adaptation indicates that the loss of quality has no major impact on the utility. For example, the adaptation by generalization and specialization as well as the combined adaptation approach decreased the quality of the workflows the most, but led to the highest utility values at the same time. Consequently, this implies that the utility of the workflows is affected more by the query fulfillment than by the slight decrease in workflow quality. This confirms that the adapted workflows are more useful than the retrieved workflow for a given query. Thus, Hypothesis 3 is confirmed. Furthermore, the results also confirm that the utility of the combined adaptation approach is the highest (Hypothesis 4).
Table 8.10: Application preference
      adapted workflow preferred   no preference   retrieved workflow preferred
WG    76 (63.3%)                   29 (24.2%)      15 (12.5%)
WS    63 (52.5%)                   41 (34.2%)      16 (13.3%)
WO    67 (55.8%)                   41 (34.2%)      12 (10.0%)
WC    94 (78.3%)                   13 (10.8%)      13 (10.8%)
These findings are also confirmed by investigating the single utility ratings. Table 8.10 illustrates the measured preferences by stating the number of samples in which the utility was rated higher for the adapted or for the retrieved workflow as well as the number of equal utility ratings. The results show that the retrieved workflow is only preferred in at most 13.3% of the cases (see stream adaptation WS). Consequently, the adapted workflows are preferred or considered equivalent to the retrieved workflow in at least 86.7% of the cases. The highest utility preference or utility equality is achieved for the operator-based adaptation (90.0%) and the combined adaptation approach (89.2%). Overall, this demonstrates that the adapted workflows have higher utility ratings in most scenarios. Finally, the correlation between the utility and the query fulfillment as well as between the utility and the quality was investigated. The Pearson correlation coefficient of r = 0.34 shows that there is a correlation between the query fulfillment and the utility which is not strong, but statistically significant
(p < 0.025 = α). In contrast to the query fulfillment, the quality affected the utility ratings less, which is shown by a correlation coefficient of r = 0.111 between the overall quality and the utility (p < 0.025 = α). Also for the single quality criteria, no high correlations could be identified. In case of a larger decrease in quality, a more distinct impact on the utility ratings would most likely be expected. However, please note that due to the experimental setting, the computed correlations may only indicate a tendency of dependencies (e.g., utility ratings may differ for workflows with identical query fulfillment). Overall, the correlations imply that the query fulfillment can reflect the utility to some degree, but other factors (e.g., quality or personal preferences) may additionally affect the utility. This is confirmed by the average standard deviation of 0.93 for the utility ratings of each workflow, which shows that the users partially have a different perception of utility but mostly agree with regard to the overall utility tendency. In summary, it can be concluded that the users primarily expect a workflow that best matches the specified query and accept a loss of quality if necessary.
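The reported correlation analysis corresponds to a standard Pearson correlation; the sketch below uses SciPy, and the rating lists are fabricated placeholder values for illustration only, not the actual study data.

from scipy import stats

# Placeholder per-workflow values (the real study used the rated samples).
utility     = [2.5, 3.0, 4.0, 3.5, 4.5, 2.0, 3.0, 4.0]
fulfillment = [0.75, 0.80, 0.90, 0.85, 0.92, 0.70, 0.78, 0.88]

r, p = stats.pearsonr(utility, fulfillment)
print(f"r = {r:.2f}, p = {p:.4f}")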
8.3 Summary This chapter presented a comprehensive evaluation of the presented workflow modeling assistance by workflow retrieval and adaptation. The analytical and experimental evaluation results showed that the adaptation methods increase the query fulfillment while only slightly decreasing the quality of the constructed workflow models. As a consequence, real users rated the utility of the adapted workflows higher than that of the retrieved workflows, which demonstrates the utility of the developed adaptation methods. Comparing the workflow adaptation methods in more detail, the evaluation showed that the combined adaptation approach outperforms the single adaptation approaches with regard to query fulfillment and utility.
9 Conclusion This thesis addresses the complex, error-prone, and time-consuming task [253, 79] of workflow modeling. This demanding task is an essential requirement to organize and automatically execute various kinds of processes such as business processes, medical treatment processes [176, 52], or research processes in the natural sciences [34, 36]. Consequently, the support of workflow modeling is a highly relevant and important topic in many domains. In this work, an innovative approach for workflow modeling assistance was presented that automatically constructs workflows by means of newly developed PO-CBR methods. The remainder of this chapter discusses the achieved contributions and sketches potential future work.
9.1 Contributions Nowadays, workflows are applied in an increasing number of application areas. In all these domains, the modeling of workflows is an essential but difficult task, which calls for methods to support workflow modelers. Thus, this thesis addresses a highly relevant and significant problem that concerns numerous domains. Several approaches already exist to support workflow modelers, including workflow acquisition, search for workflow models, and workflow illustration techniques (see Sect. 2.6.2). In contrast, the automated workflow construction by means of workflow retrieval and adaptation, which was investigated in this thesis, is a novel approach. Thus, this thesis fills a significant gap in workflow modeling assistance. Furthermore, this work presents a first detailed investigation of workflow construction by retrieval and adaptation in the under-researched field of PO-CBR. Consequently, this thesis is an important contribution to Workflow Management as well as to Case-Based Reasoning. According to design science in information systems research, the contributions of this work are threefold [99] and comprise the foundations laid, the artifact designed, and the evaluation performed. First, new and highly relevant foundations were established in PO-CBR, which had only been investigated to a limited extent so far. More precisely, a comprehensive
view on the retrieval and adaptation of workflows was presented. In particular, this thesis is a first detailed investigation of workflow adaptation, which is known to be a highly difficult stage in CBR and for which only initial approaches existed (see Sect. 2.6.2). In order to ensure the rigor1 and structuredness of the research, the newly developed PO-CBR methods are based on well-established methods in CBR. Overall, this work laid comprehensive foundations for the retrieval and adaptation of workflows by PO-CBR, which may contribute to further research in Case-Based Reasoning and Workflow Management.
1 According to Hevner et al., "rigor addresses the way in which research is conducted" [99][p. 87] and "[. . . ] is achieved by appropriately applying existing foundations and methodologies" [99][p. 80].
Second, the designed artifact itself, i.e., the workflow modeling assistance, is an important contribution. In more detail, a new query language POQL was introduced that enables the search for workflow models considering desired and undesired workflow elements. While several similarity measures and query languages have been proposed in the literature for the retrieval of workflows (see Sect. 4.8), the novelty of this approach is that it can be used to guide the automated adaptation of workflows. Next, this thesis contributed a first detailed investigation of workflow adaptation in the field of PO-CBR. This investigation comprised the development of three different kinds of workflow adaptation approaches that basically cover all main adaptation methods successfully applied in CBR. Furthermore, for all adaptation algorithms, this thesis presented new approaches to learn the required adaptation knowledge automatically by analyzing the workflow repository. The automated learning of adaptation knowledge is an important contribution because otherwise the setup of the workflow modeling assistance becomes a time-consuming and elaborate task, which may hamper the successful application of the workflow modeling assistance. Again, this represents a new contribution in the field of PO-CBR, in which only first approaches existed (e.g., [155]). Additionally, a combined adaptation approach was presented, which integrates the three novel adaptation methods aiming at enhancing the resulting workflow solution. This thesis further discussed several approaches to improve the workflow modeling assistance. First, an approach was presented to automatically complete the data-flow of the workflows in order to ensure that workflows stored in the repository contain the required information for learning and applying adaptation knowledge appropriately. In CBR, the completion of cases is a common problem, but it is usually only addressed by a set of manually configured completion rules. The novelty of this approach, also for CBR in general, is that in addition to a
small set of manually defined but domain-independent completion rules, the required domain-specific knowledge (represented as completion operators) is derived fully automatically from already completed workflows. Next, adaptation-guided retrieval was discussed, aiming at selecting the workflow that can best be adapted to the particular needs specified in the query. This had so far only been investigated in the field of CBR in general (see Sect. 6.2.4). Thus, the presented adaptation-guided retrieval of workflows is also a highly novel contribution for PO-CBR. Finally, a new approach for the complexity-aware construction of workflows was presented. While the business process literature presents several complexity measures, the automated construction of workflows with a low complexity according to a specified query is a novel contribution. In summary, this thesis presents a first comprehensive investigation of workflow construction by means of PO-CBR. By integrating all the previously described methods into a workflow management system (see Chap. 7), the following usage scenarios for the workflow modeling assistance can be provided:
• Workflow search: Based on POQL-Full or POQL-Lite queries, the workflow modeler can be supported by presenting workflows from a repository that best match the user's demands (see Chap. 4). This supports workflow modelers having only a rough perception of the desired solution by enabling a search based on partial requirements and restrictions.
• Workflow construction: In case the repository does not contain a suitable workflow model, the adaptation component can automatically construct a better matching workflow model by adapting the best-matching workflow from the repository (see Chap. 5).
• Auto-completion: More advanced workflow modelers that prefer to model a workflow from scratch can be supported during the actual modeling of a workflow by auto-completion of the data-flow (see Sect. 6.1).
• Workflow adaptation: In addition, a manually selected workflow model from the workflow repository can be adapted to the user's needs by defining a specific POQL query determining the respective adaptation goal (see Chap. 5).
The presented workflow modeling assistance is not limited to a specific domain, although cooking workflows served as running examples. This is because the developed methods mostly abstract from domain-specific assumptions. More precisely, the methods were developed for control-flow
250
9 Conclusion
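Reusing the Query sketch from above, the interplay of these scenarios can be outlined as a simple retrieve-then-adapt control flow. The Workflow and Adapter abstractions below are hypothetical and do not correspond to the actual CAKE API; the sketch only illustrates that adaptation is invoked when retrieval alone does not satisfy the query.

    import java.util.Comparator;
    import java.util.List;
    import java.util.Set;

    interface Workflow { Set<String> elements(); }
    interface Adapter { Workflow adapt(Workflow workflow, Query query); }

    class ModelingAssistance {
        // Retrieve the best-matching workflow and adapt it toward the query
        // if it does not already fulfill all desired and undesired conditions.
        static Workflow construct(Query query, List<Workflow> repository,
                                  Adapter adapter) {
            Workflow best = repository.stream()
                .max(Comparator.comparingDouble(w -> query.fulfillment(w.elements())))
                .orElseThrow();
            return query.fulfillment(best.elements()) < 1.0
                ? adapter.adapt(best, query)   // adaptation guided by the query
                : best;
        }
    }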
The presented workflow modeling assistance is not limited to a specific domain, although cooking workflows served as running examples. This is because the developed methods mostly abstract from domain-specific assumptions. More precisely, the methods were developed for control-flow oriented workflows (see Sect. 2.2.3) in general, which are applied in many domains and are traditionally used to represent business processes. The general structure of such control-flow oriented workflows is usually highly similar (see Sect. 3.1), since they are based on the typical control-flow patterns (see Sect. 2.2.3). Furthermore, the domain-specific information is not part of the designed algorithms, but is considered as input data. Thus, the workflow modeling assistance can readily be implemented in a certain domain by modeling a set of consistent block-oriented workflows (see Sect. 3.2.1) and by specifying the required domain-specific task and data taxonomies with corresponding similarity values (see Sects. 3.3 and 3.5). All other required domain-specific knowledge, in particular the adaptation knowledge (see Chap. 5), is learned fully automatically, which significantly facilitates the application of the workflow modeling assistance in the particular domain. Furthermore, the integrated adaptation process combines three different adaptation methods with different advantages and requirements (see Sects. 5.4 and 5.5), such that workflow adaptation can still be performed even if a single adaptation method cannot be applied in the respective domain. Consequently, the presented workflow modeling assistance is in principle applicable to control-flow oriented workflows in various domains. However, demonstrating this generalizability and enabling the application in other domains requires additional research, as will be discussed in the future research section (see Sect. 9.2).

Finally, a new systematic approach for the evaluation of PO-CBR systems was developed, which comprised the conception of several evaluation criteria for automatically constructed workflows. The comprehensive study that was performed contributed a detailed assessment of the developed adaptation methods, demonstrating the feasibility and utility of the workflow modeling assistance. In more detail, the results showed that the presented adaptation algorithms are able to increase the query fulfillment without a significant reduction of workflow quality. Consequently, the utility of the adapted workflows is higher compared to the retrieved workflows. The evaluation thereby also confirmed the appropriateness of the approaches to automatically learn adaptation knowledge. An important conclusion from the evaluation is that the combined approach outperforms all three single adaptation methods with regard to query fulfillment and utility. Overall, the evaluation demonstrated the major contribution of this thesis: the presented workflow modeling assistance is able to enhance workflow modeling support compared to the mere search for workflow models.
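Returning to the domain-setup requirements described above, the main manual effort lies in providing task and data taxonomies with similarity values. As a rough illustration, the following sketch computes the similarity of two labels as the similarity value attached to their deepest common ancestor, in the spirit of taxonomy-based similarity measures (cf. [15]); the data structure is deliberately simplified and is not the representation used in this thesis.

    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    // Simplified taxonomy: each label points to its parent, and each inner
    // node carries a similarity value for all pairs of labels below it.
    class Taxonomy {
        private final Map<String, String> parent = new HashMap<>();
        private final Map<String, Double> simValue = new HashMap<>();

        void add(String child, String par, double similarityAtParent) {
            parent.put(child, par);
            simValue.put(par, similarityAtParent);
        }

        // Similarity of two labels = value of their deepest common ancestor;
        // identical labels are maximally similar.
        double similarity(String a, String b) {
            if (a.equals(b)) return 1.0;
            Set<String> ancestorsOfA = new HashSet<>();
            for (String n = a; n != null; n = parent.get(n)) ancestorsOfA.add(n);
            for (String n = b; n != null; n = parent.get(n))
                if (ancestorsOfA.contains(n)) return simValue.getOrDefault(n, 0.0);
            return 0.0;
        }
    }

For instance, after add("butter", "fats", 0.8) and add("margarine", "fats", 0.8), the call similarity("butter", "margarine") yields 0.8.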
9.2 Future Research

Besides the contributions presented in this thesis, the foundations laid in this work provide a broad range of opportunities for potential future research.

First, various optimizations of the current workflow modeling assistance could be investigated. More precisely, adaptation is a multi-objective optimization problem, including the dimensions of query fulfillment, adaptation time, and quality. The trade-off between these dimensions can so far be governed by several parameters and by the selection of appropriate adaptation methods for the particular application domain. Further research may address the improvement of each of these dimensions. Regarding quality, the current approaches do not consider dependencies between adaptation steps. However, taking into account that a particular executed adaptation step may enable or disable the application of another adaptation step could significantly improve the quality of the result. Moreover, the development of an approach for the automated assessment of workflow quality (e.g., [156]), which is then integrated and considered as an additional adaptation criterion during the workflow modeling process (such as the complexity, as described in Sect. 6.3), could further increase the quality of the constructed workflow. Regarding adaptation time, the operator-based and the combined adaptation approach are characterized by a relatively high computation time. This could, for example, be addressed by parallelization or by applying more heuristic approaches to compute the similarity, to search for the best workflow solution, or to select appropriate adaptation knowledge (e.g., by cluster-based retrieval [166]).

Furthermore, many established CBR methods can be applied in order to extend and improve the current workflow modeling assistance. Retainment (see Sect. 2.4.6), for example, addresses the maintenance of the CBR system with regard to quality and performance. For the workflow modeling assistance, user feedback on the provided solutions could be gathered to identify and remove inappropriate adaptation knowledge, thereby improving quality and performance of the system. Moreover, if an automated quality assessment is developed, as previously described, it could also be used to verify whether the application of adaptation knowledge results in workflows of acceptable quality in order to control the retainment of the adaptation knowledge. Furthermore, preference-based CBR methods (e.g., [103]) that consider user preferences during the generation of workflow models could implement a personalized workflow modeling assistance.
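As a purely illustrative sketch of the multi-objective trade-off discussed at the beginning of this section, candidate adaptations could be ranked by a weighted score; the weights and the normalization of all inputs to [0,1] are assumptions made for the example, not values proposed in this thesis.

    class AdaptationObjective {
        // Illustrative weighted objective over the three dimensions discussed
        // above; all inputs are assumed to be normalized to [0,1].
        static double score(double queryFulfillment, double quality, double time,
                            double wFulfillment, double wQuality, double wTime) {
            return wFulfillment * queryFulfillment + wQuality * quality
                 - wTime * time;
        }
    }

A retainment strategy as described above could then, for example, discard adaptation knowledge whose application systematically lowers this score according to user feedback.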
Relatedly, case-based recommender systems (e.g., [231]) could support the user during the actual workflow modeling by suggesting appropriate completion fragments based on workflow models constructed by related users (e.g., [119]). Another potential investigation is the storing of provenance information (see Sect. 2.4.6), which enables a more detailed assessment of the reliability of the knowledge source. Here, the adaptation cases developed by Minor et al. [152] could be used to capture the information of manually adapted workflows and to use this information as an additional knowledge source during workflow construction. Finally, conversational CBR (see Sect. 2.6.3) could be applied to integrate the user into the workflow modeling process, which means that the workflow modeling assistance becomes interactive (e.g., as partially sketched in Sect. 7.4). This can be important, as queries are often insufficiently specified initially, which may result in the construction of undesired workflow models. The basic idea is that, instead of specifying a query, the user is guided during the modeling of the workflow by interactively traversing the space of potential adaptations.

Also from a more workflow management related perspective, several possibilities for future investigations arise. Currently, the workflow modeling assistance only supports workflows in which the task and data nodes are annotated by labels from a taxonomy (see Sect. 3.2). However, this is not sufficient in many application areas, because they usually require a more expressive representation that captures more semantic information. For example, in the domain of cooking, the quantities of ingredients or the durations of certain activities need to be captured. Thus, the full expressiveness of semantic workflows (see [20]) needs to be exploited, which means that the task and data nodes are annotated by a set of attribute-value pairs or by semantic descriptions based on OWL ontologies rather than by single labels (see Sect. 3.4). For this purpose, the retrieval approach and the adaptation approaches must be modified such that the additional semantic information can be handled.
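To illustrate such richer node annotations, a node could carry a taxonomy label plus attribute-value pairs, as in the following deliberately simplified sketch; it does not reflect the semantic workflow representation of [20].

    import java.util.Map;

    // Simplified semantically annotated node: a taxonomy label plus
    // attribute-value pairs instead of a single label.
    record AnnotatedNode(String label, Map<String, Object> attributes) {}

    // Example (cooking domain): 100 g of butter as an ingredient node, e.g.,
    // new AnnotatedNode("butter", Map.of("quantity", 100, "unit", "g"))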
Furthermore, the current restriction of the adaptation approaches (except for adaptation by generalization and specialization, see Sect. 5.4) to consistent block-oriented workflows with limited control-flow patterns (see Sect. 3.2.1) may also be a starting point for future research in order to extend their scope of application. An additional potential application of the current workflow adaptation methods, so far not investigated, is the adaptation of workflow instances (see Sect. 2.2.2). The adaptation of workflows during run-time is important, since it enables flexible reactions to unexpected situations during workflow execution. If the presented adaptation approaches are extended such that they ensure that already executed parts of the workflow remain unchanged, schema evolution and schema integration (see Sect. 2.3.2) could be applied to enable the adaptation of running workflow instances.

Even though the presented comprehensive evaluation enabled a detailed assessment of the workflow modeling assistance, future research could also address broader investigations in this regard. For example, the evaluation of the workflow modeling assistance with a larger workflow repository that covers more diverse aspects (e.g., different dish types in the cooking domain) could demonstrate the feasibility of the approach on a larger scale. Furthermore, in order to demonstrate the generalizability of the developed methods, the evaluation of the workflow modeling assistance in other domains is required.

Finally, the discovery of new domains for the workflow modeling assistance also raises several opportunities for additional research. For the application of the presented workflow modeling assistance in other domains with similar workflow representations, additional research is required. This primarily includes an extended investigation to identify the requirements of the particular domain and the development of appropriate domain-specific solutions. For example, in real business domains, more complex workflow structures (e.g., including parameters of automated tasks) need to be supported. Furthermore, an appropriate concept to integrate the modeling support into business process reengineering tasks is required. For the application to other kinds of workflows, such as data-flow oriented workflows (see, e.g., [242, 76, 36]), the presented adaptation principles have to be transferred to the respective workflow representations. In the case of entirely new domains, i.e., domains in the early application stages of workflow management, the workflow repository is typically only sparsely populated. This can be a limiting factor for establishing the workflow modeling support, since the presented approaches require a sufficiently large workflow repository in order to learn the required adaptation knowledge automatically. To overcome this problem, methods from transfer learning (e.g., [154]) could be employed to transfer knowledge from an established domain with a more substantial basis of workflow repositories to the new, insufficiently specified domain.

Overall, though more research is required, many application visions for the presented workflow modeling assistance are conceivable (see Sect. 2.6.1), ranging from the support of medical treatment processes or private processes in everyday life to areas such as flow-based programming. Moreover, this work also laid foundations for investigating approaches to support flexible manufacturing in the emerging field of Industry 4.0 [107] by the automated construction and adaptation of production processes.
To conclude, this thesis is a pioneering contribution to automated workflow construction by the retrieval and adaptation of workflows and thereby presents an innovative approach to workflow modeling support. The approaches developed and the foundations laid in this thesis provide a substantial basis for further research on workflow modeling assistance by means of PO-CBR.
Bibliography

[1] Agnar Aamodt and Enric Plaza. Case-based reasoning: Foundational issues, methodological variations, and system approaches. AI Commun., 7(1):39–59, 1994.
[2] Michael Adams, Arthur H. M. ter Hofstede, David Edmond, and Wil M. P. van der Aalst. Worklets: A service-oriented implementation of dynamic flexibility in workflows. In Robert Meersman and Zahir Tari, editors, On the Move to Meaningful Internet Systems 2006: CoopIS, DOA, GADA, and ODBASE, OTM Confederated International Conferences, CoopIS, DOA, GADA, and ODBASE 2006, Montpellier, France, October 29 - November 3, 2006, Proceedings, Part I, volume 4275 of Lecture Notes in Computer Science, pages 291–308. Springer, 2006.
[3] Alessandra Agostini and Giorgio De Michelis. Improving flexibility of workflow management systems. In Wil M. P. van der Aalst, Jörg Desel, and Andreas Oberweis, editors, Business Process Management, Models, Techniques, and Empirical Studies, volume 1806 of Lecture Notes in Computer Science, pages 218–234. Springer, 2000.
[4] David W. Aha. Conversational case-based reasoning. In Sankar K. Pal, Sanghamitra Bandyopadhyay, and Sambhunath Biswas, editors, Pattern Recognition and Machine Intelligence, First International Conference, PReMI 2005, Kolkata, India, December 20-22, 2005, Proceedings, volume 3776 of Lecture Notes in Computer Science, page 30. Springer, 2005.
[5] Klaus-Dieter Althoff and Rosina O. Weber. Knowledge management in case-based reasoning. Knowledge Eng. Review, 20(3):305–310, 2005.
[6] Ahmed Awad. BPMN-Q: A language to query business processes. In Manfred Reichert, Stefan Strecker, and Klaus Turowski, editors, Enterprise Modelling and Information Systems Architectures - Concepts and Applications, Proceedings of the 2nd International Workshop on Enterprise Modelling and Information Systems Architectures (EMISA’07), St. Goar, Germany, October 8-9, 2007, volume P-119 of LNI, pages 115–128. GI, 2007.
[7] Ahmed Awad, Artem Polyvyanyy, and Mathias Weske. Semantic querying of business process models. In 12th International IEEE Enterprise Distributed Object Computing Conference, EDOC 2008, 15-19 September 2008, Munich, Germany, pages 85–94. IEEE Computer Society, 2008.
[8] Kerstin Bach, Klaus-Dieter Althoff, Julian Satzky, and Julian Kroehl. CookIIS Mobile: A Case-Based Reasoning Recipe Customizer for Android Phones. In Miltos Petridis, Thomas Roth-Berghofer, and Nirmalie Wiratunga, editors, Proceedings of UKCBR 2012, December 11, Cambridge, United Kingdom, pages 15–26. School of Computing, Engineering and Mathematics, University of Brighton, UK, 2012.
[9] Fadi Badra, Rokia Bendaoud, Rim Bentebibel, Pierre-Antoine Champin, Julien Cojan, Amélie Cordier, Sylvie Desprès, Stéphanie Jean-Daubias, Jean Lieber, Thomas Meilender, Alain Mille, Emmanuel Nauer, Amedeo Napoli, and Yannick Toussaint. TAAABLE: text mining, ontology engineering, and hierarchical classification for textual case-based cooking. In Martin Schaaf, editor, ECCBR 2008, The 9th European Conference on Case-Based Reasoning, Trier, Germany, September 1-4, 2008, Workshop Proceedings, pages 219–228, 2008.
[10] Fadi Badra, Amélie Cordier, and Jean Lieber. Opportunistic Adaptation Knowledge Discovery. In Lorraine McGinty and David C. Wilson, editors, Case-Based Reasoning Research and Development, ICCBR 2009, Seattle, WA, USA, Proceedings, volume 5650 of LNCS, pages 60–74. Springer, 2009.
[11] Joonsoo Bae, Ling Liu, James Caverlee, and William B. Rouse. Process mining, discovery, and integration using distance measures. In 2006 IEEE International Conference on Web Services (ICWS 2006), 18-22 September 2006, Chicago, Illinois, USA, pages 479–488. IEEE Computer Society, 2006.
[12] Jörg Becker, Michael Rosemann, and Christoph von Uthmann. Guidelines of business process modeling. In Wil M. P. van der Aalst, Jörg Desel, and Andreas Oberweis, editors, Business Process Management, Models, Techniques, and Empirical Studies, volume 1806 of Lecture Notes in Computer Science, pages 30–49. Springer, 2000.
[13] Michael Becker and Ralf Laue. A comparative survey of business process similarity measures. Computers in Industry, 63(2):148–167, 2012.
[14] Catriel Beeri, Anat Eyal, Simon Kamenkovich, and Tova Milo. Querying business processes. In Umeshwar Dayal, Kyu-Young Whang, David B. Lomet, Gustavo Alonso, Guy M. Lohman, Martin L. Kersten, Sang Kyun Cha, and Young-Kuk Kim, editors, Proceedings of the 32nd International Conference on Very Large Data Bases, Seoul, Korea, September 12-15, 2006, pages 343–354. ACM, 2006.
[15] Ralph Bergmann. On the Use of Taxonomies for Representing Case Features and Local Similarity Measures. In Lothar Gierl and Mario Lenz, editors, Proceedings of the 6th German Workshop on Case-Based Reasoning (GWCBR’98), 1998.
[16] Ralph Bergmann. Experience Management: Foundations, Development Methodology, and Internet-Based Applications, volume 2432 of Lecture Notes in Computer Science. Springer, 2002.
[17] Ralph Bergmann, Klaus-Dieter Althoff, Sean Breen, Mehmet H. Göker, Michel Manago, Ralph Traphöner, and Stefan Wess. Developing Industrial Case-Based Reasoning Applications: The INRECA-Methodology, volume 1612 of Lecture Notes in Computer Science. Springer, 1999.
[18] Ralph Bergmann, Klaus-Dieter Althoff, Mirjam Minor, Meike Reichle, and Kerstin Bach. Case-based reasoning: Introduction and recent developments. KI, 23(1):5–11, 2009.
[19] Ralph Bergmann, Sarah Gessinger, Sebastian Görg, and Gilbert Müller. The collaborative agile knowledge engine CAKE. In Sean P. Goggins, Isa Jahnke, David W. McDonald, and Pernille Bjørn, editors, Proceedings of GROUP ’16, Sanibel Island, FL, USA, November 09 - 12, 2014, pages 281–284. ACM, 2014.
[20] Ralph Bergmann and Yolanda Gil. Similarity assessment and efficient retrieval of semantic workflows. Inf. Syst., 40:115–127, 2014.
[21] Ralph Bergmann, Janet L. Kolodner, and Enric Plaza. Representation in case-based reasoning. Knowledge Eng. Review, 20(3):209–213, 2005.
[22] Ralph Bergmann, Mirjam Minor, Gilbert Müller, and Pol Schumacher. Project EVER: Extraction and processing of procedural experience knowledge in workflows. In Antonio A. Sánchez-Ruiz and Anders Kofod-Petersen, editors, Proceedings of ICCBR 2017 Workshops (CAW, CBRDL, PO-CBR), Doctoral Consortium, and Competitions co-located with the 25th International Conference on Case-Based Reasoning (ICCBR 2017), Trondheim, Norway, June 26-28, 2017, volume 2028 of CEUR Workshop Proceedings, pages 137–146. CEUR-WS.org, 2017.
[23] Ralph Bergmann and Gilbert Müller. Similarity-based Retrieval and Automatic Adaptation of Semantic Workflows. In Grzegorz J. Nalepa and Joachim Baumeister, editors, Knowledge Engineering and Software Engineering (working title). Springer, forthcoming, 2017.
[24] Ralph Bergmann, Gilbert Müller, and Daniel Wittkowsky. Workflow clustering using semantic similarity measures. In Ingo J. Timm and Matthias Thimm, editors, KI 2013: Advances in Artificial Intelligence - 36th Annual German Conference on AI, Koblenz, Germany, September 16-20, 2013, Proceedings, volume 8077 of Lecture Notes in Computer Science, pages 13–24. Springer, 2013.
[25] Ralph Bergmann, Gilbert Müller, Christian Zeyen, and Jens Manderscheid. Retrieving adaptable cases in process-oriented case-based reasoning. In Zdravko Markov and Ingrid Russell, editors, Proceedings of the Twenty-Ninth International Florida Artificial Intelligence Research Society Conference, FLAIRS 2016, Key Largo, Florida, May 16-18, 2016, pages 419–424. AAAI Press, 2016.
[26] Ralph Bergmann, Michael M. Richter, Sascha Schmitt, Armin Stahl, and Ivo Vollrath. Utility-Oriented Matching: A New Research Direction for Case-Based Reasoning. In 9th German Workshop on Case-Based Reasoning (GWCBR’2001), 2001.
[27] Ralph Bergmann and Alexander Stromer. MAC/FAC retrieval of semantic workflows. In Chutima Boonthum-Denecke and G. Michael Youngblood, editors, Proceedings of the Twenty-Sixth International Florida Artificial Intelligence Research Society Conference, FLAIRS 2013, St. Pete Beach, Florida, May 22-24, 2013. AAAI Press, 2013.
[28] Ralph Bergmann and Ivo Vollrath. Generalized cases: Representation and steps towards efficient similarity assessment. In Wolfram Burgard, Thomas Christaller, and Armin B. Cremers, editors, KI-99: Advances in Artificial Intelligence, 23rd Annual German Conference on Artificial Intelligence, Bonn, Germany, September 13-15, 1999, Proceedings, volume 1701 of Lecture Notes in Computer Science, pages 195–206. Springer, 1999.
[29] Ralph Bergmann and Wolfgang Wilke. Learning abstract planning cases. In Nada Lavrac and Stefan Wrobel, editors, Machine Learning: ECML-95, 8th European Conference on Machine Learning, Heraclion, Crete, Greece, April 25-27, 1995, Proceedings, volume 912 of Lecture Notes in Computer Science, pages 55–76. Springer, 1995.
[30] Ralph Bergmann and Wolfgang Wilke. On the role of abstraction in case-based reasoning. In Ian F. C. Smith and Boi Faltings, editors, Advances in Case-Based Reasoning, Third European Workshop, EWCBR-96, Lausanne, Switzerland, November 14-16, 1996, Proceedings, volume 1168 of Lecture Notes in Computer Science, pages 28–43. Springer, 1996.
[31] Ralph Bergmann and Wolfgang Wilke. PARIS: Flexible plan adaptation by abstraction and refinement. In ECAI 1996 Workshop on Adaptation in Case-Based Reasoning, 1996.
[32] Ralph Bergmann and Wolfgang Wilke. Towards a new formal model of transformational adaptation in case-based reasoning. In ECAI, pages 53–57, 1998.
[33] Ralph Bergmann, Wolfgang Wilke, Ivo Vollrath, and Stefan Wess. Integrating General Knowledge with Object-Oriented Case Representation and Reasoning. In Hans-Dieter Burkhard and Stefan Wess, editors, 4th German Workshop: Case-Based Reasoning - System Development and Evaluation, pages 120–126, Informatik-Berichte Nr. 55, 1996. Humboldt-Universität Berlin.
[34] Chad Berkley, Shawn Bowers, Matthew B. Jones, Bertram Ludäscher, Mark Schildhauer, and Jing Tao. Incorporating semantics in scientific workflow authoring. In James Frew, editor, 17th International Conference on Scientific and Statistical Database Management, SSDBM 2005, 27-29 June 2005, University of California, Santa Barbara, CA, USA, Proceedings, pages 75–78, 2005.
[35] Katy Börner. Structural similarity as guidance in case-based design. In Stefan Wess, Klaus-Dieter Althoff, and Michael M. Richter, editors, Topics in Case-Based Reasoning, First European Workshop, EWCBR-93, Kaiserslautern, Germany, November 1-5, 1993, Selected Papers, volume 837 of Lecture Notes in Computer Science, pages 197–208. Springer, 1993.
[36] Sarah Cohen Boulakia and Ulf Leser. Search, adapt, and reuse: the future of scientific workflows. SIGMOD Record, 40(2):6–16, 2011.
[37] N. Brand and H. van der Kolk. Workflow Analysis and Design. Kluwer Bedrijfswetenschappen, 1995.
[38] Derek Bridge and Henry Larkin. Creating new sandwiches from old. In Emmanuel Nauer, Mirjam Minor, and Amélie Cordier, editors, Proceedings of the Seventh Computer Cooking Contest (Workshop Programme of the Twenty-second International Conference on Case-Based Reasoning), pages 117–124, 2014.
[39] Hans-Dieter Burkhard. Case completion and similarity in case-based reasoning. Comput. Sci. Inf. Syst., 1(2):27–55, 2004.
[40] Jaime G. Carbonell. Derivational analogy and its role in problem solving. In Michael R. Genesereth, editor, Proceedings of the National Conference on Artificial Intelligence, Washington, D.C., August 22-26, 1983, pages 64–69. AAAI Press, 1983.
[41] Jorge Cardoso. About the data-flow complexity of web processes. In 6th International Workshop on Business Process Modeling, Development, and Support: Business Processes and Support Systems: Design for Flexibility, pages 67–74, 2005.
[42] Jorge S. Cardoso. Business process control-flow complexity: Metric, evaluation, and validation. Int. J. Web Service Res., 5(2):49–76, 2008.
[43] Jorge S. Cardoso, Jan Mendling, Gustaf Neumann, and Hajo A. Reijers. A discourse on complexity of process models. In Johann Eder and Schahram Dustdar, editors, Business Process Management Workshops, BPM 2006 International Workshops, BPD, BPI, ENEI, GPWW, DPM, semantics4ws, Vienna, Austria, September 4-7, 2006, Proceedings, volume 4103 of Lecture Notes in Computer Science, pages 117–128. Springer, 2006.
[44] Fabio Casati, Stefano Ceri, Barbara Pernici, and Giuseppe Pozzi. Workflow evolution. Data Knowl. Eng., 24(3):211–238, 1998.
[45] M. Castellanos, A.K. Alves de Medeiros, J. Mendling, B. Weber, and T. Weitjers. Business Process Intelligence. In Handbook of Research on Business Process Modeling, pages 456–480. Idea Group Inc., 2009.
[46] Eran Chinthaka, Jaliya Ekanayake, David B. Leake, and Beth Plale. CBR based workflow composition assistant. In 2009 IEEE Congress on Services, Part I, SERVICES I 2009, Los Angeles, CA, USA, July 6-10, 2009, pages 352–355. IEEE Computer Society, 2009.
[47] Riccardo Cognini, Knut Hinkelmann, and Andreas Martin. A case modelling language for process variant management in case-based reasoning. In Manfred Reichert and Hajo A. Reijers, editors, Business Process Management Workshops - BPM 2015, 13th International Workshops, Innsbruck, Austria, August 31 - September 3, 2015, Revised Papers, volume 256 of Lecture Notes in Business Information Processing, pages 30–42. Springer, 2015.
[48] Michael T. Cox, Héctor Muñoz-Avila, and Ralph Bergmann. Case-based planning. Knowledge Eng. Review, 20(3):283–287, 2005.
[49] Susan Craw, Jacek Jarmulak, and Ray Rowe. Learning and applying case-based adaptation knowledge. In David W. Aha and Ian D. Watson, editors, Case-Based Reasoning Research and Development, 4th International Conference on Case-Based Reasoning, ICCBR 2001, Vancouver, BC, Canada, July 30 - August 2, 2001, Proceedings, volume 2080 of Lecture Notes in Computer Science, pages 131–145. Springer, 2001.
[50] Susan Craw, Nirmalie Wiratunga, and Ray Rowe. Case-based design for tablet formulation. In Barry Smyth and Padraig Cunningham, editors, Advances in Case-Based Reasoning, 4th European Workshop, EWCBR-98, Dublin, Ireland, September 1998, Proceedings, volume 1488 of Lecture Notes in Computer Science, pages 358–369. Springer, 1998.
[51] Susan Craw, Nirmalie Wiratunga, and Ray Rowe. Learning adaptation knowledge to improve case-based reasoning. Artif. Intell., 170(16-17):1175–1192, 2006.
[52] Peter Dadam, Manfred Reichert, and Klaus Kuhn. Clinical Workflows - The Killer Application for Process-oriented Information Systems?, pages 36–59. Springer, London, 2000.
[53] Peter Dadam, Manfred Reichert, Stefanie Rinderle-Ma, Kevin Goeser, Ulrich Kreher, and Martin Jurisch. Von ADEPT zur AristaFlow BPM Suite - Eine Vision wird Realität: “Correctness by Construction” und flexible, robuste Ausführung von Unternehmensprozessen. Technical Report UIB-2009-02, University of Ulm, Ulm, January 2009.
[54] W. Dai, H. Dominic Covvey, Paulo S. C. Alencar, and Donald D. Cowan. Lightweight query-based analysis of workflow process dependencies. Journal of Systems and Software, 82(6):915–931, 2009.
[55] Thomas H. Davenport. Process innovation: Reengineering work through information technology. Harvard Business Press, 2013.
[56] Massimiliano de Leoni and Wil M. P. van der Aalst. Data-aware process mining: discovering decisions in processes using alignments. In Sung Y. Shin and José Carlos Maldonado, editors, Proceedings of the 28th Annual ACM Symposium on Applied Computing, SAC ’13, Coimbra, Portugal, March 18-22, 2013, pages 1454–1461. ACM, 2013.
[57] Jörg Desel. Process modeling using Petri nets. In Process-Aware Information Systems: Bridging People and Software Through Process Technology. Wiley, 2005.
[58] Remco M. Dijkman, Marlon Dumas, and Luciano García-Bañuelos. Graph matching algorithms for business process model similarity search. In Umeshwar Dayal, Johann Eder, Jana Koehler, and Hajo A. Reijers, editors, Business Process Management, 7th International Conference, BPM 2009, Ulm, Germany, September 8-10, 2009, Proceedings, volume 5701 of Lecture Notes in Computer Science, pages 48–63. Springer, 2009.
[59] Remco M. Dijkman, Marlon Dumas, Boudewijn F. van Dongen, Reina Käärik, and Jan Mendling. Similarity of business process models: Metrics and evaluation. Inf. Syst., 36(2):498–516, 2011.
[60] Remco M. Dijkman, Marcello La Rosa, and Hajo A. Reijers. Managing large collections of business process models - current techniques and challenges. Computers in Industry, 63(2):91–97, 2012.
[61] Heiko Dörr. Efficient Graph Rewriting and Its Implementation, volume 922 of Lecture Notes in Computer Science. Springer, 1995.
[62] Valmi Dufour-Lussier, Florence Le Ber, Jean Lieber, and Emmanuel Nauer. Automatic case acquisition from texts for process-oriented case-based reasoning. Inf. Syst., 40:153–167, 2014.
[63] Valmi Dufour-Lussier, Jean Lieber, Emmanuel Nauer, and Yannick Toussaint. Text adaptation using formal concept analysis. In Isabelle Bichindaritz and Stefania Montani, editors, Case-Based Reasoning. Research and Development, 18th International Conference on Case-Based Reasoning, ICCBR 2010, Alessandria, Italy, July 19-22, 2010, Proceedings, volume 6176 of Lecture Notes in Computer Science, pages 96–110. Springer, 2010.
[64] Marlon Dumas, Wil M. P. van der Aalst, and Arthur H. M. ter Hofstede. Process-Aware Information Systems: Bridging People and Software Through Process Technology. Wiley, 2005.
[65] Hanna Eberle, Tobias Unger, and Frank Leymann. Process fragments. In Robert Meersman, Tharam S. Dillon, and Pilar Herrero, editors, On the Move to Meaningful Internet Systems: OTM 2009, Confederated International Conferences, CoopIS, DOA, IS, and ODBASE 2009, Vilamoura, Portugal, November 1-6, 2009, Proceedings, Part I, volume 5870 of Lecture Notes in Computer Science, pages 398–405. Springer, 2009.
[66] Johann Eder and Walter Liebhart. The workflow activity model WAMO. In CoopIS, pages 87–98, 1995.
[67] Gregor Engels, Alexander Förster, Reiko Heckel, and Sebastian Thöne. Process modeling using UML. In Process-Aware Information Systems: Bridging People and Software Through Process Technology. Wiley, 2005.
[68] Richard Fikes and Nils J. Nilsson. STRIPS: A new approach to the application of theorem proving to problem solving. Artif. Intell., 2(3/4):189–208, 1971.
[69] Andrea Freßmann. Knowledge management support for collaborative and mobile working. PhD thesis, University of Trier, Germany, 2007.
[70] Emmanuelle Gaillard, Laura Infante-Blanco, Jean Lieber, and Emmanuel Nauer. Tuuurbine: A generic CBR engine over RDFS. In Luc Lamontagne and Enric Plaza, editors, Case-Based Reasoning Research and Development - 22nd International Conference, ICCBR 2014, Cork, Ireland, September 29, 2014 - October 1, 2014, Proceedings, volume 8765 of Lecture Notes in Computer Science, pages 140–154. Springer, 2014.
[71] Emmanuelle Gaillard, Jean Lieber, and Emmanuel Nauer. Improving ingredient substitution using formal concept analysis and adaptation of ingredient quantities with mixed linear optimization. In Joseph Kendall-Morwick, editor, Workshop Proceedings from The Twenty-Third International Conference on Case-Based Reasoning (ICCBR 2015), Frankfurt, Germany, September 28-30, 2015, volume 1520 of CEUR Workshop Proceedings, pages 209–220. CEUR-WS.org, 2015.
[72] Daniel Garijo, Pinar Alper, Khalid Belhajjame, Óscar Corcho, Yolanda Gil, and Carole A. Goble. Common motifs in scientific workflows: An empirical analysis. In 8th IEEE International Conference on E-Science, e-Science 2012, Chicago, IL, USA, October 8-12, 2012, pages 1–8. IEEE Computer Society, 2012.
[73] Daniel Garijo, Óscar Corcho, and Yolanda Gil. Detecting common scientific workflow fragments using templates and execution provenance. In V. Richard Benjamins, Mathieu d’Aquin, and Andrew Gordon, editors, Proceedings of the 7th International Conference on Knowledge Capture, K-CAP 2013, Banff, Canada, June 23-26, 2013, pages 33–40. ACM, 2013.
[74] F. Gebhardt, A. Voß, W. Gräther, and B. Schmidt-Belz. Reasoning with complex cases. Kluwer Academic Publishers, New York, 1997.
[75] Dimitrios Georgakopoulos, Mark F. Hornick, and Amit P. Sheth. An overview of workflow management: From process modeling to workflow automation infrastructure. Distributed and Parallel Databases, 3(2):119–153, 1995.
[76] Yolanda Gil, Ewa Deelman, Mark H. Ellisman, Thomas Fahringer, Geoffrey Fox, Dennis Gannon, Carole A. Goble, Miron Livny, Luc Moreau, and Jim Myers. Examining the challenges of scientific workflows. IEEE Computer, 40(12):24–32, 2007.
[77] Carole A. Goble, Jiten Bhagat, Sergejs Aleksejevs, Don Cruickshank, Danius T. Michaelides, David R. Newman, Mark Borkum, Sean Bechhofer, Marco Roos, Peter Li, and David De Roure. myExperiment: a repository and social network for the sharing of bioinformatics workflows. Nucleic Acids Research, 38(Web-Server-Issue):677–682, 2010.
[78] Antoon Goderis, Ulrike Sattler, and Carole A. Goble. Applying description logics for workflow reuse and repurposing. In Ian Horrocks, Ulrike Sattler, and Frank Wolter, editors, Proceedings of the 2005 International Workshop on Description Logics (DL2005), Edinburgh, Scotland, UK, July 26-28, 2005, volume 147 of CEUR Workshop Proceedings. CEUR-WS.org, 2005.
[79] Antoon Goderis, Ulrike Sattler, Phillip W. Lord, and Carole A. Goble. Seven bottlenecks to workflow reuse and repurposing. In Yolanda Gil, Enrico Motta, V. Richard Benjamins, and Mark A. Musen, editors, The Semantic Web - ISWC 2005, 4th International Semantic Web Conference, ISWC 2005, Galway, Ireland, November 6-10, 2005, Proceedings, volume 3729 of Lecture Notes in Computer Science, pages 323–337. Springer, 2005.
[80] Sebastian Görg and Ralph Bergmann. Social workflows - vision and potential study. Inf. Syst., 50:1–19, 2015.
[81] Sebastian Görg, Ralph Bergmann, Sarah Gessinger, and Mirjam Minor. Real-time collaboration and experience reuse for cloud-based workflow management systems. In IEEE 15th Conference on Business Informatics, CBI 2013, Vienna, Austria, July 15-18, 2013, pages 391–398. IEEE Computer Society, 2013.
[82] Sebastian Görg, Ralph Bergmann, Sarah Gessinger, and Mirjam Minor. A resource model for cloud-based workflow management systems enabling access control, collaboration and reuse. In Frédéric Desprez, Donald Ferguson, Ethan Hadar, Frank Leymann, Matthias Jarke, and Markus Helfert, editors, CLOSER 2013 - Proceedings of the 3rd International Conference on Cloud Computing and Services Science, Aachen, Germany, 8-10 May, 2013, pages 263–272. SciTePress, 2013.
[83] Sebastian Görg, Ralph Bergmann, Mirjam Minor, Sarah Gessinger, and Siblee Islam. Collecting, reusing and executing private workflows on social network platforms. In Alain Mille, Fabien L. Gandon, Jacques Misselis, Michael Rabinovich, and Steffen Staab, editors, Proceedings of the 21st World Wide Web Conference, WWW 2012, Lyon, France, April 16-20, 2012 (Companion Volume), pages 747–750. ACM, 2012.
[84] Kazjon Grace, Mary Lou Maher, David C. Wilson, and Nadia A. Najjar. Combining CBR and deep learning to generate surprising recipe designs. In Ashok K. Goel, M. Belén Díaz-Agudo, and Thomas Roth-Berghofer, editors, Case-Based Reasoning Research and Development - 24th International Conference, ICCBR 2016, Atlanta, GA, USA, October 31 - November 2, 2016, Proceedings, volume 9969 of Lecture Notes in Computer Science, pages 154–169. Springer, 2016.
[85] Lisa Grumbach and Ralph Bergmann. Using constraint satisfaction problem solving to enable workflow flexibility by deviation (best technical paper). In Max Bramer and Miltos Petridis, editors, Artificial Intelligence XXXIV - 37th SGAI International Conference on Artificial Intelligence, AI 2017, Cambridge, UK, December 12-14, 2017, Proceedings, volume 10630 of Lecture Notes in Computer Science, pages 3–17. Springer, 2017.
[86] Lisa Grumbach, Eric Rietzke, Markus Schwinn, Ralph Bergmann, and Norbert Kuhn. SEMAFLEX - semantic integration of flexible workflow and document management. In Ralf Krestel, Davide Mottin, and Emmanuel Müller, editors, Proceedings of the Conference “Lernen, Wissen, Daten, Analysen”, Potsdam, Germany, September 12-14, 2016, volume 1670 of CEUR Workshop Proceedings, pages 43–50. CEUR-WS.org, 2016.
[87] Michael Hammer and James Champy. Reengineering the Corporation: A Manifesto for Business Revolution. HarperCollins Publishers, Inc., New York, NY, USA, 1993.
[88] Kristian J. Hammond. CHEF: A model of case-based planning. In Tom Kehler, editor, Proceedings of the 5th National Conference on Artificial Intelligence, Philadelphia, PA, August 11-15, 1986, Volume 1: Science, pages 267–271. Morgan Kaufmann, 1986.
[89] Kristian J. Hammond. Case-based planning: A framework for planning from experience. Cognitive Science, 14(3):385–443, 1990.
[90] Yanbo Han, Amit Sheth, and Christoph Bussler. A taxonomy of adaptive workflow management. In Workshop of the 1998 ACM Conference on Computer Supported Cooperative Work, 1998.
[91] Alexandre Hanft, Oliver Schäfer, and Klaus-Dieter Althoff. Integration of Drools into an OSGI-based BPM-Platform for CBR. In Mirjam Minor, Stefania Montani, and Juan A. Recio-García, editors, Workshop on Process-Oriented Case-Based Reasoning, ICCBR 2011, September 12-15, Greenwich, London, United Kingdom. University of Greenwich, 2011.
[92] Kathleen Hanney and Mark T. Keane. Learning adaptation rules from a case-base. In Ian F. C. Smith and Boi Faltings, editors, Advances in Case-Based Reasoning, Third European Workshop, EWCBR-96, Lausanne, Switzerland, November 14-16, 1996, Proceedings, volume 1168 of Lecture Notes in Computer Science, pages 179–192. Springer, 1996.
[93] Kathleen Hanney and Mark T. Keane. The Adaption Knowledge Bottleneck: How to Ease it by Learning from Cases. In David B. Leake and Enric Plaza, editors, Case-Based Reasoning Research and Development, ICCBR-97, Providence, USA, volume 1266 of LNCS, pages 359–370. Springer, 1997.
[94] Sébastien Harispe, Sylvie Ranwez, Stefan Janaqi, and Jacky Montmain. The semantic measures library and toolkit: fast computation of semantic similarity and relatedness using biomedical ontologies. Bioinformatics, 30(5):740–742, 2014.
[95] Reiko Heckel. Graph transformation in a nutshell. Electr. Notes Theor. Comput. Sci., 148(1):187–198, 2006.
[96] Martin Hepp, Frank Leymann, John Domingue, Alexander Wahler, and Dieter Fensel. Semantic business process management: A vision towards using semantic web services for business process management. In Francis C. M. Lau, Hui Lei, Xiaofeng Meng, and Min Wang, editors, 2005 IEEE International Conference on e-Business Engineering (ICEBE 2005), 18-21 October 2005, Beijing, China, pages 535–540. IEEE Computer Society, 2005.
[97] Mitra Heravizadeh, Jan Mendling, and Michael Rosemann. Dimensions of Business Processes Quality (QoBP). In Wil van der Aalst, John Mylopoulos, Norman M. Sadeh, Michael J. Shaw, Clemens Szyperski, Danilo Ardagna, Massimo Mecella, and Jian Yang, editors, Business Process Management Workshops, volume 17, pages 80–91. Springer, 2009.
[98] Joachim Herbst and Dimitris Karagiannis. Integrating machine learning and workflow management to support acquisition and adaptation of workflow models. In Proceedings of the 9th International Workshop on Database and Expert Systems Applications, DEXA ’98, pages 745–, Washington, DC, USA, 1998. IEEE Computer Society.
[99] Alan R. Hevner, Salvatore T. March, Jinsoo Park, and Sudha Ram. Design science in information systems research. MIS Quarterly, 28(1):75–105, 2004.
[100] Markus Hipp, Bela Mutschler, and Manfred Reichert. Navigating in process model collections: A new approach inspired by google earth. In Florian Daniel, Kamel Barkaoui, and Schahram Dustdar, editors, Business Process Management Workshops - BPM 2011 International Workshops, Clermont-Ferrand, France, August 29, 2011, Revised Selected Papers, Part II, volume 100 of Lecture Notes in Business Information Processing, pages 87–98. Springer, 2011.
[101] Timo Homburg, Pol Schumacher, and Mirjam Minor. Towards workflow planning based on semantic eligibility. In 28. PuK-Workshop “Planung/Scheduling und Konfigurieren/Entwerfen”, Stuttgart, Germany, September 2014.
[102] Thomas Hornung, Agnes Koschmider, and Georg Lausen. Recommendation based process modeling support: Method and user experience. In Qing Li, Stefano Spaccapietra, Eric S. K. Yu, and Antoni Olivé, editors, Conceptual Modeling - ER 2008, 27th International Conference on Conceptual Modeling, Barcelona, Spain, October 20-24, 2008, Proceedings, volume 5231 of Lecture Notes in Computer Science, pages 265–278. Springer, 2008.
[103] Eyke Hüllermeier and Patrice Schlegel. Preference-based CBR: first steps toward a methodological framework. In Ashwin Ram and Nirmalie Wiratunga, editors, Case-Based Reasoning Research and Development - 19th International Conference on Case-Based Reasoning, ICCBR 2011, London, UK, September 12-15, 2011, Proceedings, volume 6880 of Lecture Notes in Computer Science, pages 77–91. Springer, 2011.
[104] Vahid Jalali and David Leake. On retention of adaptation rules. In Luc Lamontagne and Enric Plaza, editors, Case-Based Reasoning Research and Development - 22nd International Conference, ICCBR 2014, Cork, Ireland, September 29, 2014 - October 1, 2014, Proceedings, volume 8765 of Lecture Notes in Computer Science, pages 200–214. Springer, 2014.
[105] M.H. Jansen-Vullers, M.W.N.C. Loosschilder, P.A.M. Kleingeld, and H.A. Reijers. Performance measures to evaluate the impact of best practices. In Proceedings of Workshops and Doctoral Consortium of the 19th International Conference on Advanced Information Systems Engineering (BPMDS workshop), volume 1, pages 359–368. Tapir Academic Press Trondheim, 2007.
[106] Jacek Jarmulak, Susan Craw, and Ray Rowe. Using case-base data to learn adaptation knowledge for design. In Bernhard Nebel, editor, Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence, IJCAI 2001, Seattle, Washington, USA, August 4-10, 2001, pages 1011–1020. Morgan Kaufmann, 2001.
[107] Henning Kagermann, Wolfgang Wahlster, and Johannes Helbig. Securing the future of German manufacturing industry: Recommendations for implementing the strategic initiative INDUSTRIE 4.0. Final report of the Industrie 4.0 Working Group, 2013.
[108] Brane Kalpic and Peter Bernus. Business process modelling in industry - the powerful tool in enterprise management. Computers in Industry, 47(3):299–318, 2002.
[109] Klaus Kammerer, Jens Kolb, and Manfred Reichert. PQL - A descriptive language for querying, abstracting and changing process models. In Khaled Gaaloul, Rainer Schmidt, Selmin Nurcan, Sérgio Guerreiro, and Qin Ma, editors, Enterprise, Business-Process and Information Systems Modeling - 16th International Conference, BPMDS 2015, 20th International Conference, EMMSAD 2015, Held at CAiSE 2015, Stockholm, Sweden, June 8-9, 2015, Proceedings, volume 214 of Lecture Notes in Business Information Processing, pages 135–150. Springer, 2015.
[110] Stelios Kapetanakis, Miltos Petridis, Brian Knight, Jixin Ma, and Liz Bacon. A case based reasoning approach for the monitoring of business workflows. In Isabelle Bichindaritz and Stefania Montani, editors, Case-Based Reasoning. Research and Development, 18th International Conference on Case-Based Reasoning, ICCBR 2010, Alessandria, Italy, July 19-22, 2010, Proceedings, volume 6176 of Lecture Notes in Computer Science, pages 390–405. Springer, 2010.
[111] Daniel S. Kaster, Claudia Bauzer Medeiros, and Heloisa Vieira da Rocha. Supporting modeling and problem solving from precedent experiences: the role of workflows and case-based reasoning. Environmental Modelling and Software, 20(6):689–704, 2005.
[112] Marius Keppler, Moritz Kohlhase, Niels Lauritzen, Matthias Schmidt, Pol Schumacher, and Alexander Spät. GoetheShaker - Developing a rating score for automated evaluation of cocktail recipes. In ICCBR Workshop Proceedings, pages 101–108, 2014.
[113] Nadia Kiani, Jean Lieber, Emmanuel Nauer, and Jordan Schneider. Analogical transfer in RDFS, application to cocktail name adaptation. In Ashok K. Goel, M. Belén Díaz-Agudo, and Thomas Roth-Berghofer, editors, Case-Based Reasoning Research and Development - 24th International Conference, ICCBR 2016, Atlanta, GA, USA, October 31 - November 2, 2016, Proceedings, volume 9969 of Lecture Notes in Computer Science, pages 218–233. Springer, 2016.
[114] Jae Ho Kim, Woojong Suh, and Heeseok Lee. Document-based workflow modeling: a case-based reasoning approach. Expert Syst. Appl., 23(2):77–93, 2002.
[115] Jihie Kim, Marc Spraragen, and Yolanda Gil. An intelligent assistant for interactive workflow composition. In Jean Vanderdonckt, Nuno Jardim Nunes, and Charles Rich, editors, Proceedings of the 2004 International Conference on Intelligent User Interfaces, January 13-16, 2004, Funchal, Madeira, Portugal, pages 125–131. ACM, 2004.
[116] Jana Koehler, Giuliano Tirenni, and Santhosh Kumaran. From Business Process Model to Consistent Implementation: A Case for Formal Verification Methods. In Proceedings of EDOC 2002, 17-20 September 2002, Lausanne, Switzerland, page 96. IEEE Computer Society, 2002.
[117] Janet Kolodner. Case-based Reasoning. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1993.
[118] Janet L. Kolodner and Robert L. Simpson Jr. The MEDIATOR: Analysis of an early case-based problem solver. Cognitive Science, 13(4):507–549, 1989.
[119] Agnes Koschmider, Minseok Song, and Hajo A. Reijers. Social software for modeling business processes. In Danilo Ardagna, Massimo Mecella, and Jian Yang, editors, Business Process Management Workshops, BPM 2008 International Workshops, Milano, Italy, September 1-4, 2008, Revised Papers, volume 17 of Lecture Notes in Business Information Processing, pages 666–677. Springer, 2008.
[120] Kristian Bisgaard Lassen and Wil M. P. van der Aalst. Complexity metrics for workflow nets. Information & Software Technology, 51(3):610–626, 2009.
[121] Antti M. Latva-Koivisto. Finding a complexity measure for business process models. Helsinki University of Technology, Systems Analysis Laboratory, 2001.
[122] Ralf Laue and Volker Gruhn. Complexity metrics for business process models. In BIS, volume 85 of LNI, pages 1–12. GI, 2006.
[123] Amaia Lazcano, Heiko Schuldt, Gustavo Alonso, and Hans-Jörg Schek. WISE: process based e-commerce. IEEE Data Eng. Bull., 24(1):46–51, 2001.
[124] D. Leake. Combining rules and cases to learn case adaptation. In Proceedings of the Seventeenth Annual Conference of the Cognitive Science Society, volume 84, page 89, 1995.
[125] David B. Leake and Scott A. Dial. Using case provenance to propagate feedback to cases and adaptations. In Klaus-Dieter Althoff, Ralph Bergmann, Mirjam Minor, and Alexandre Hanft, editors, Advances in Case-Based Reasoning, 9th European Conference, ECCBR 2008, Trier, Germany, September 1-4, 2008, Proceedings, volume 5239 of Lecture Notes in Computer Science, pages 255–268. Springer, 2008.
[126] David B. Leake and Joseph Kendall-Morwick. Towards case-based support for e-science workflow generation by mining provenance. In Klaus-Dieter Althoff, Ralph Bergmann, Mirjam Minor, and Alexandre Hanft, editors, Advances in Case-Based Reasoning, 9th European Conference, ECCBR 2008, Trier, Germany, September 1-4, 2008, Proceedings, volume 5239 of Lecture Notes in Computer Science, pages 269–283. Springer, 2008.
[127] David B. Leake, Andrew Kinley, and David Wilson. Learning to improve case adaptation by introspective reasoning and CBR. In International Conference on Case-Based Reasoning, pages 229–240. Springer, 1995.
[128] David B. Leake, Andrew Kinley, and David C. Wilson. Case-based similarity assessment: Estimating adaptability from experience. In Benjamin Kuipers and Bonnie L. Webber, editors, Proceedings of the Fourteenth National Conference on Artificial Intelligence and Ninth Innovative Applications of Artificial Intelligence Conference, AAAI 97, IAAI 97, July 27-31, 1997, Providence, Rhode Island, pages 674–679. AAAI Press / The MIT Press, 1997.
[129] Frank Leymann and Dieter Roller. Production workflow - concepts and techniques. Prentice Hall, 2000.
[130] Odd Ivar Lindland, Guttorm Sindre, and Arne Sølvberg. Understanding quality in conceptual modeling. IEEE Software, 11(2):42–49, 1994.
[131] Matthias Lohrmann and Manfred Reichert. Understanding Business Process Quality. In Business Process Management - Theory and Applications, pages 41–73. Springer, Berlin Heidelberg, 2013.
[132] Ruopeng Lu and Shazia Wasim Sadiq. A survey of comparative business process modeling approaches. In Witold Abramowicz, editor, Business Information Systems, 10th International Conference, BIS 2007, Poznan, Poland, April 25-27, 2007, Proceedings, volume 4439 of Lecture Notes in Computer Science, pages 82–94. Springer, 2007.
[133] Yinglong Ma, Xiaolan Zhang, and Ke Lu. A graph distance based metric for data oriented workflow retrieval with variable time constraints. Expert Syst. Appl., 41(4):1377–1388, 2014.
[134] Therani Madhusudan, J. Leon Zhao, and Byron Marshall. A case-based reasoning framework for workflow model management. Data Knowl. Eng., 50(1):87–115, 2004.
[135] A. Maes, G. Poels, F. Gailly, and R. Paemeleire. Measuring User Beliefs and Attitudes towards Conceptual Models: A Factor and Structural Equation Model. Working Papers of Faculty of Economics and Business Administration, Ghent University, Belgium 05/311, Ghent University, Faculty of Economics and Business Administration, June 2005.
[136] Jens Manderscheid. Evaluation von Adaptionsmethoden im prozessorientierten Fallbasierten Schließen. Master thesis, University of Trier, Germany, 2013.
[137] Selma Limam Mansar, Farhi Marir, and Hajo A. Reijers. Case-based reasoning as a technique for knowledge management in business process redesign. Electronic Journal on Knowledge Management, 1(2):113–124, 2003.
[138] Ivan Markovic. Advanced querying and reasoning on business process models. In Witold Abramowicz and Dieter Fensel, editors, Business Information Systems, 11th International Conference, BIS 2008, Innsbruck, Austria, May 5-7, 2008, Proceedings, volume 7 of Lecture Notes in Business Information Processing, pages 189–200. Springer, 2008.
[139] Ivan Markovic and Alessandro Costa Pereira. Towards a formal framework for reuse in business process modeling. In Arthur H. M. ter Hofstede, Boualem Benatallah, and Hye-Young Paik, editors, Business Process Management Workshops, BPM 2007 International Workshops, BPI, BPD, CBP, ProHealth, RefMod, semantics4ws, Brisbane, Australia, September 24, 2007, Revised Selected Papers, volume 4928 of Lecture Notes in Computer Science, pages 484–495. Springer, 2007.
[140] Ivan Markovic, Alessandro Costa Pereira, David de Francisco Marcos, and Henar Muñoz. Querying in business process modeling. In Elisabetta Di Nitto and Matei Ripeanu, editors, Service-Oriented Computing - ICSOC 2007 Workshops, International Workshops, Vienna, Austria, September 17, 2007, Revised Selected Papers, volume 4907 of Lecture Notes in Computer Science, pages 234–245. Springer, 2007.
[141] Kerstin Maximini and Rainer Maximini. Collaborative Agent-Based Knowledge Engine, Anforderungen - Konzept - Lösungen. Technical Report V2, Universität Trier, Lehrstuhl für Wirtschaftsinformatik II, D-54286 Trier, January 2007.
[142] Kerstin Maximini, Rainer Maximini, and Ralph Bergmann. An investigation of generalized cases. In Kevin D. Ashley and Derek G. Bridge, editors, Case-Based Reasoning Research and Development, 5th International Conference on Case-Based Reasoning, ICCBR 2003, Trondheim, Norway, June 23-26, 2003, Proceedings, volume 2689 of Lecture Notes in Computer Science, pages 261–275. Springer, 2003.
[143] Rainer Maximini. Advanced techniques for complex case based reasoning applications. PhD thesis, University of Trier, Germany, 2006.
[144] David McSherry. Demand-driven discovery of adaptation knowledge. In Thomas Dean, editor, Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence, IJCAI 99, Stockholm, Sweden, July 31 - August 6, 1999, 2 Volumes, 1450 pages, pages 222–227. Morgan Kaufmann, 1999.
[145] J. Mendling, M. Moser, G. Neumann, H. M. W. Verbeek, and B. F. van Dongen. A quantitative analysis of faulty EPC in the SAP reference model. BPM Center Report BPM-06-08, BPMcenter.org, 2006.
[146] Jan Mendling. Metrics for Process Models: Empirical Foundations of Verification, Error Prediction, and Guidelines for Correctness, volume 6 of LNBIP. Springer, 2008.
[147] Jan Mendling, Kristian Bisgaard Lassen, and Uwe Zdun. On the transformation of control flow between block-oriented and graph-oriented process modelling languages. IJBPIM, 3(2):96–108, 2008.
[148] Jan Mendling, Hajo A. Reijers, and Wil M. P. van der Aalst. Seven process modeling guidelines (7PMG). Information & Software Technology, 52(2):127–136, 2010.
[149] Susanne Meyfarth. Entwicklung eines transformationsorientierten Adaptionskonzeptes zum fallbasierten Schließen in INRECA. Master thesis, University of Kaiserslautern, Germany, 1997.
[150] Ryszard S. Michalski. A theory and methodology of inductive learning. Artif. Intell., 20(2):111–161, 1983.
[151] Mirjam Minor, Ralph Bergmann, and Sebastian Görg. Adaptive Workflow Management in the Cloud - Towards a Novel Platform as a Service. In Proceedings of the ICCBR 2011 Workshops, pages 131–138, 2011.
[152] Mirjam Minor, Ralph Bergmann, and Sebastian Görg. Case-based adaptation of workflows. Inf. Syst., 40:142–152, 2014.
[153] Mirjam Minor, Ralph Bergmann, Sebastian Görg, and Kirstin Walter. Reasoning on business processes to support change reuse. In Birgit Hofreiter, Eric Dubois, Kwei-Jay Lin, Thomas Setzer, Claude Godart, Erik Proper, and Lianne Bodenstaff, editors, 13th IEEE Conference on Commerce and Enterprise Computing, CEC 2011, Luxembourg-Kirchberg, Luxembourg, September 5-7, 2011, pages 18–25. IEEE Computer Society, 2011.
[154] Mirjam Minor, Ralph Bergmann, Jan-Martin Müller, and Alexander Spät. On the transferability of process-oriented cases. In Ashok K. Goel, M. Belén Díaz-Agudo, and Thomas Roth-Berghofer, editors, Case-Based Reasoning Research and Development - 24th International Conference, ICCBR 2016, Atlanta, GA, USA, October 31 - November 2, 2016, Proceedings, volume 9969 of Lecture Notes in Computer Science, pages 281–294. Springer, 2016.
[155] Mirjam Minor and Sebastian Görg. Acquiring adaptation cases for scientific workflows. In Ashwin Ram and Nirmalie Wiratunga, editors, Case-Based Reasoning Research and Development - 19th International Conference on Case-Based Reasoning, ICCBR 2011, London, UK, September 12-15, 2011, Proceedings, volume 6880 of Lecture Notes in Computer Science, pages 166–180. Springer, 2011.
[156] Mirjam Minor, Siblee Islam, and Pol Schumacher. Confidence in workflow adaptation. In Belén Díaz-Agudo and Ian Watson, editors, Case-Based Reasoning Research and Development - 20th International Conference, ICCBR 2012, Lyon, France, September 3-6, 2012, Proceedings, volume 7466 of Lecture Notes in Computer Science, pages 255–268. Springer, 2012.
[157] Mirjam Minor, Stefania Montani, and Juan A. Recio-García. Process-oriented case-based reasoning. Inf. Syst., 40:103–105, 2014.
[158] Mirjam Minor, Daniel Schmalen, Andreas Koldehoff, and Ralph Bergmann. Structural adaptation of workflows supported by a suspension mechanism and by case-based reasoning. In 16th IEEE International Workshops on Enabling Technologies: Infrastructures for Collaborative Enterprises (WETICE 2007), 18-20 June 2007, Paris, France, pages 370–375. IEEE Computer Society, 2007.
[159] Mirjam Minor, Alexander Tartakovski, and Daniel Schmalen. Agile workflow technology and case-based change reuse for long-term processes. IJIIT, 4(1):80–98, 2008.
[160] Tom M. Mitchell. Machine learning. McGraw Hill series in computer science. McGraw-Hill, 1997.
[161] C. Mohan. Recent Trends in Workflow Management Products, Standards and Research, pages 396–409. Springer Berlin Heidelberg, Berlin, Heidelberg, 1998.
[162] Stefania Montani and Giorgio Leonardi. Retrieval and clustering for business process monitoring: Results and improvements. In Belén Díaz-Agudo and Ian Watson, editors, Case-Based Reasoning Research and Development - 20th International Conference, ICCBR 2012, Lyon, France, September 3-6, 2012, Proceedings, volume 7466 of Lecture Notes in Computer Science, pages 269–283. Springer, 2012.
[163] Stefania Montani and Giorgio Leonardi. Retrieval and clustering for supporting business process adjustment and analysis. Inf. Syst., 40:128–141, 2014.
[164] Daniel L. Moody. Theoretical and practical issues in evaluating the quality of conceptual models: current state and future directions. Data Knowl. Eng., 55(3):243–276, 2005.
[165] J. Paul Morrison. Flow-Based Programming: A New Approach to Application Development. CreateSpace, Paramount, CA, 2010.
[166] Gilbert Müller and Ralph Bergmann. A cluster-based approach to improve similarity-based retrieval for process-oriented case-based reasoning. In Torsten Schaub, Gerhard Friedrich, and Barry O’Sullivan, editors, ECAI 2014 - 21st European Conference on Artificial Intelligence, 18-22 August 2014, Prague, Czech Republic - Including Prestigious Applications of Intelligent Systems (PAIS 2014), volume 263 of Frontiers in Artificial Intelligence and Applications, pages 639–644. IOS Press, 2014.
[167] Gilbert Müller and Ralph Bergmann. Compositional Adaptation of Cooking Recipes using Workflow Streams. In Computer Cooking Contest, Workshop Proceedings ICCBR 2014. Springer, 2014.
[168] Gilbert Müller and Ralph Bergmann. Workflow streams: A means for compositional adaptation in process-oriented CBR. In Luc Lamontagne and Enric Plaza, editors, Case-Based Reasoning Research and Development - 22nd International Conference, ICCBR 2014, Cork, Ireland, September 29, 2014 - October 1, 2014, Proceedings, volume 8765 of Lecture Notes in Computer Science, pages 315–329. Springer, 2014.
[169] Gilbert Müller and Ralph Bergmann. CookingCAKE: A framework for the adaptation of cooking recipes represented as workflows. In Joseph Kendall-Morwick, editor, Workshop Proceedings from The Twenty-Third International Conference on Case-Based Reasoning (ICCBR 2015), Frankfurt, Germany, September 28-30, 2015, volume 1520 of CEUR Workshop Proceedings, pages 221–232. CEUR-WS.org, 2015.
[170] Gilbert Müller and Ralph Bergmann. Generalization of workflows in process-oriented case-based reasoning. In Ingrid Russell and William Eberle, editors, Proceedings of the Twenty-Eighth International Florida Artificial Intelligence Research Society Conference, FLAIRS 2015, Hollywood, Florida, May 18-20, 2015, pages 391–396. AAAI Press, 2015.
[171] Gilbert Müller and Ralph Bergmann. Learning and applying adaptation operators in process-oriented case-based reasoning. In Eyke Hüllermeier and Mirjam Minor, editors, Case-Based Reasoning Research and Development - 23rd International Conference, ICCBR 2015, Frankfurt am Main, Germany, September 28-30, 2015, Proceedings, volume 9343 of Lecture Notes in Computer Science, pages 259–274. Springer, 2015.
[172] Gilbert Müller and Ralph Bergmann. POQL: A new query language for process-oriented case-based reasoning. In Ralph Bergmann, Sebastian Görg, and Gilbert Müller, editors, Proceedings of the LWA 2015 Workshops: KDML, FGWM, IR, and FGDB, Trier, Germany, October 7-9, 2015, volume 1458 of CEUR Workshop Proceedings, pages 247–255. CEUR-WS.org, 2015.
[173] Gilbert Müller and Ralph Bergmann. Case completion of workflows for process-oriented case-based reasoning. In Ashok K. Goel, M. Belén Díaz-Agudo, and Thomas Roth-Berghofer, editors, Case-Based Reasoning Research and Development - 24th International Conference, ICCBR 2016, Atlanta, GA, USA, October 31 - November 2, 2016, Proceedings, volume 9969 of Lecture Notes in Computer Science, pages 295–310. Springer, 2016.
[174] Gilbert Müller and Ralph Bergmann. Complexity-aware generation of workflows by process-oriented case-based reasoning. In Gabriele Kern-Isberner, Johannes Fürnkranz, and Matthias Thimm, editors, KI 2017: Advances in Artificial Intelligence - 40th Annual German Conference on AI, Dortmund, Germany, September 25-29, 2017, Proceedings, volume 10505 of Lecture Notes in Computer Science, pages 207–221. Springer, 2017.
278
Bibliography
[175] Gilbert M¨ uller and Ralph Bergmann. Cooking made easy: On a novel approach to complexity-aware recipe generation. In Computer Cooking Contest 2017 at the 25th International Conference on Case-Based Reasoning (ICCBR 2017), Trondheim, Norway, June 26-18, 2017., 2017. [176] Robert M¨ uller, Ulrike Greiner, and Erhard Rahm. AgentWork: A workflow system supporting rule-based workflow adaptation. Data Knowl. Eng., 51(2):223–256, 2004. [177] H´ector Mu˜ noz, J¨ urgen Paulokat, and Stefan Wess. Controlling a nonlinear hierarchical planner using case replay. In Jean Paul Haton, Mark T. Keane, and Michel Manago, editors, Advances in CaseBased Reasoning, Second European Workshop, EWCBR-94, Chantilly, France, November 7-10, 1994, Selected Papers, volume 984 of Lecture Notes in Computer Science, pages 266–279. Springer, 1994. [178] H´ector Mu˜ noz-Avila and Michael T. Cox. Case-based plan adaptation: An analysis and review. IEEE Intelligent Systems, 23(4):75–81, 2008. [179] Allen Newell, Herbert Alexander Simon, et al. Human problem solving, volume 104. Prentice-Hall Englewood Cliffs, NJ, 1972. [180] Markus Nick, Klaus-Dieter Althoff, and Ralph Bergmann. Experience management. In Alexander Hinneburg, editor, LWA 2007: Lernen - Wissen - Adaption, Halle, September 2007, Workshop Proceedings, page 339. Martin-Luther-University Halle-Wittenberg, 2007. [181] OASIS Standard. Web services business process execution language version 2.0, 2007. last access on 17-01-2018. [182] Object Management Group, Inc. Web services business process execution language version 2.0, 2010. last access on 17-01-2018. [183] Thomas M. Oinn, Matthew Addis, Justin Ferris, Darren Marvin, Martin Senger, R. Mark Greenwood, Tim Carver, Kevin Glover, Matthew R. Pocock, Anil Wipat, and Peter Li. Taverna: a tool for the composition and enactment of bioinformatics workflows. Bioinformatics, 20(17):3045–3054, 2004. [184] Santiago Onta˜ no´n. RHOG: A refinement-operator library for directed labeled graphs. CoRR, abs/1604.06954, 2016.
Bibliography
279
¨ [185] Hubert Osterle, J¨org Becker, Ulrich Frank, Thomas Hess, Dimitris Karagiannis, Helmut Krcmar, Peter Loos, Peter Mertens, Andreas Oberweis, and Elmar J. Sinz. Memorandum on design-oriented information systems research. EJIS, 20(1):7–10, 2011. [186] Sven Overhage, Dominik Birkmeier, and Sebastian Schlauderer. Quality marks, metrics, and measurement procedures for business process models - the 3qm-framework. Business & Information Systems Engineering, 4(5):229–246, 2012. [187] Maja Pesic and Wil M. P. van der Aalst. A declarative approach for flexible business processes management. In Johann Eder and Schahram Dustdar, editors, Business Process Management Workshops, BPM 2006 International Workshops, BPD, BPI, ENEI, GPWW, DPM, semantics4ws, Vienna, Austria, September 4-7, 2006, Proceedings, volume 4103 of Lecture Notes in Computer Science, pages 169–180. Springer, 2006. [188] Artem Polyvyanyy, Sergey Smirnov, and Mathias Weske. Business process model abstraction. In Jan vom Brocke and Michael Rosemann, editors, Handbook on Business Process Management 1, Introduction, Methods, and Information Systems, 2nd Ed., International Handbooks on Information Systems, pages 147–165. Springer, 2015. [189] Lisa Purvis and Pearl Pu. COMPOSER: A case-based reasoning system for engineering design. Robotica, 16(3):285–295, 1998. [190] Jan Recker. Opportunities and constraints: the current struggle with BPMN. Business Proc. Manag. Journal, 16(1):181–201, 2010. [191] Michael Redmond. Distributed cases for case-based reasoning: Facilitating use of multiple cases. In Howard E. Shrobe, Thomas G. Dietterich, and William R. Swartout, editors, Proceedings of the 8th National Conference on Artificial Intelligence. Boston, Massachusetts, July 29 - August 3, 1990, 2 Volumes., pages 304–309. AAAI Press / The MIT Press, 1990. [192] Russell Reed. Pruning algorithms-a survey. IEEE Trans. Neural Networks, 4(5):740–747, 1993. [193] Manfred Reichert. Dynamische Ablauf¨ anderungen in WorkflowManagement-Systemen. PhD thesis, University of Ulm, Germany, 2000.
280
Bibliography
[194] Manfred Reichert, Stefanie Rinderle, and Peter Dadam. ADEPT workflow management system: Flexible support for enterprise-wide business processes. In Wil M. P. van der Aalst, Arthur H. M. ter Hofstede, and Mathias Weske, editors, Business Process Management, International Conference, BPM 2003, Eindhoven, The Netherlands, June 26-27, 2003, Proceedings, volume 2678 of Lecture Notes in Computer Science, pages 370–379. Springer, 2003. [195] Manfred Reichert and Barbara Weber. Enabling Flexibility in ProcessAware Information Systems - Challenges, Methods, Technologies. Springer, 2012. [196] Hajo A. Reijers. Product-based design of business processes applied within the financial services. Journal of Research and Practice in Information Technology, 34(2):110–122, 2002. [197] Hajo A. Reijers. Design and Control of Workflow Processes: Business Process Management for the Service Industry, volume 2617 of Lecture Notes in Computer Science. Springer, 2003. [198] Hajo A. Reijers, Jan Mendling, and Jan Recker. Business process quality management. In Jan vom Brocke and Michael Rosemann, editors, Handbook on Business Process Management 1, Introduction, Methods, and Information Systems, 2nd Ed., International Handbooks on Information Systems, pages 167–185. Springer, 2015. [199] Michael M. Richter. The knowledge contained in similarity measures. Invited Talk ICCBR95, Sesimbra, Portugal., 2005. [200] Michael M. Richter and Rosina O. Weber. Case-Based Reasoning - A Textbook. Springer, 2013. [201] Stefanie Rinderle, Manfred Reichert, and Peter Dadam. Correctness criteria for dynamic changes in workflow systems - a survey. Data Knowl. Eng., 50(1):9–34, 2004. [202] Stefanie Rinderle, Manfred Reichert, and Peter Dadam. Flexible support of team processes by adaptive workflow systems. Distributed and Parallel Databases, 16(1):91–116, 2004. [203] Stefanie Rinderle, Manfred Reichert, Martin Jurisch, and Ulrich Kreher. On representing, purging, and utilizing change logs in process management systems. In Schahram Dustdar, Jos´e Luiz Fiadeiro, and
Bibliography
281
Amit P. Sheth, editors, Business Process Management, 4th International Conference, BPM 2006, Vienna, Austria, September 5-7, 2006, Proceedings, volume 4102 of Lecture Notes in Computer Science, pages 241–256. Springer, 2006. [204] Peter Rittgen. Quality and perceived usefulness of process models. In Sung Y. Shin, Sascha Ossowski, Michael Schumacher, Mathew J. Palakal, and Chih-Cheng Hung, editors, Proceedings of the 2010 ACM Symposium on Applied Computing (SAC), Sierre, Switzerland, March 22-26, 2010, pages 65–72. ACM, 2010. [205] Michael Rosemann. Potential pitfalls of process modeling: part B. Business Proc. Manag. Journal, 12(3):377–384, 2006. [206] Thomas Roth-Berghofer. Explanations and Case-Based Reasoning: Foundational Issues. In Peter Funk and Pedro A. Gonz´alez-Calero, editors, Advances in Case-Based Reasoning, ECCBR 2004, Madrid, Spain, Proceedings, volume 3155 of LNCS, pages 389–403. Springer, 2004. [207] Thomas Roth-Berghofer and Ioannis Iglezakis. Six steps in case-based reasoning: Towards a maintenance methodology for case-based reasoning systems. In In: Professionelles Wissensmanagement: Erfahrungen und Visionen (includes the Proceedings of the 9th German Workshop on Case-Based Reasoning (GWCBR, pages 198–208. Shaker-Verlag, 2001. [208] Nick Russell, Arthur H. M. ter Hofstede, David Edmond, and Wil M. P. van der Aalst. Workflow data patterns: Identification, representation and tool support. In Lois M. L. Delcambre, Christian Kop, Heinrich C. Mayr, John Mylopoulos, and Oscar Pastor, editors, Conceptual Modeling - ER 2005, 24th International Conference on Conceptual Modeling, Klagenfurt, Austria, October 24-28, 2005, Proceedings, volume 3716 of Lecture Notes in Computer Science, pages 353–368. Springer, 2005. [209] Nick Russell, Wil M. P. van der Aalst, Arthur H. M. ter Hofstede, and David Edmond. Workflow resource patterns: Identification, representation and tool support. In Oscar Pastor and Jo˜ ao Falc˜ao e Cunha, editors, Advanced Information Systems Engineering, 17th International Conference, CAiSE 2005, Porto, Portugal, June 13-17, 2005, Proceedings, volume 3520 of Lecture Notes in Computer Science, pages 216–232. Springer, 2005.
282
Bibliography
[210] Stuart J. Russell and Peter Norvig. Artificial Intelligence: A Modern Approach. Pearson Education, 2 edition, 2003. [211] Wasim Sadiq and Maria E. Orlowska. On correctness issues in conceptual modelling of workflows. In Robert D. Galliers, Ciaran Murphy, Sven A. Carlsson, Claudia Loebbecke, Hans Robert Hansen, and Ramon O’Callaghan, editors, Proceedings of the Fifth European Conference on Information Systems, ECIS 1997, Cork, UK, 1997, pages 943–964. Cork Publishing Ltd, 1997. [212] Sherif Sakr, Ahmed Awad, and Matthias Kunze. Querying process models repositories by aggregated graph search. In Marcello La Rosa and Pnina Soffer, editors, Business Process Management Workshops - BPM 2012 International Workshops, Tallinn, Estonia, September 3, 2012. Revised Papers, volume 132 of Lecture Notes in Business Information Processing, pages 573–585. Springer, 2012. [213] Steven Salzberg. A nearest hyperrectangle learning method. Machine Learning, 6:251–276, 1991. [214] Thomas Schael and Buni Zeller. Workflow management systems for financial services. In Proceedings of the Conference on Organizational Computing Systems, COOCS 1993, Milpitas, California, USA, November 1-4, 1993, pages 142–153. ACM, 1993. [215] Roger C. Schank. Dynamic Memory: a Theory of Reminding and Learning in Computers and People. Cambridge University Press, New York, 1982. [216] Roger C. Schank and R. P. Abelson. Scripts, Plans, Goals and Understanding. Lawrence Erlbaum Associates, Hillsdale, New Jersey, 1977. [217] August-Wilhelm Scheer. ARIS - vom Gesch¨ aftsprozess zum Anwendungssystem. Springer, Berlin Heidelberg New York, 4th edition edition, 2002. [218] August-Wilhelm Scheer, Oliver Thomas, and Otmar Adam. Process modeling using event-driven process chains. In Process-Aware Information Systems: Bridging People and Software Through Process Technology. Wiley, 2005.
Bibliography
283
[219] Daniel Schmalen. Adaptives Workflow Management - Referenzmodell und Umsetzung. PhD thesis, Dr. Hut Verlag, 80538 M¨ unchen, 2011. ISBN 9783868539905. [220] Sascha Schmitt, Rainer Maximini, Gerhard Landeck, and J¨org Hohwiller. A product customization module based on adaptation operators for CBR systems in e-commerce environments. In Enrico Blanzieri and Luigi Portinale, editors, Advances in Case-Based Reasoning, 5th European Workshop, EWCBR 2000, Trento, Italy, September 6-9, 2000, Proceedings, volume 1898 of Lecture Notes in Computer Science, pages 504–516. Springer, 2000. [221] Helen Schonenberg, Ronny Mans, Nick Russell, Nataliya Mulyar, and Wil M. P. van der Aalst. Towards a taxonomy of process flexibility. In Zohra Bellahsene, Carson Woo, Ela Hunt, Xavier Franch, and Remi Coletta, editors, Proceedings of the Forum at the CAiSE’08 Conference, Montpellier, France, June 18-20, 2008, volume 344 of CEUR Workshop Proceedings, pages 81–84. CEUR-WS.org, 2008. [222] Pol Schumacher. Workflow Extraction from Textual Process Descriptions. PhD thesis, Goethe University Frankfurt, Frankfurt am Main, Germany, 2016. [223] Pol Schumacher, Mirjam Minor, and Eric Schulte-Zurhausen. Extracting and enriching workflows from text. In IEEE 14th International Conference on Information Reuse & Integration, IRI 2013, San Francisco, CA, USA, August 14-16, 2013, pages 285–292. IEEE, 2013. [224] Pol Schumacher, Mirjam Minor, Kirstin Walter, and Ralph Bergmann. Extraction of procedural knowledge from the web: a comparison of two workflow extraction approaches. In Alain Mille, Fabien L. Gandon, Jacques Misselis, Michael Rabinovich, and Steffen Staab, editors, Proceedings of the 21st World Wide Web Conference, WWW 2012, Lyon, France, April 16-20, 2012 (Companion Volume), pages 739–747. ACM, 2012. [225] Laura A. Seffino, Claudia Bauzer Medeiros, Jansle V. Rocha, and Bei Yi. WOODSS - a spatial decision support system based on workflows. Decision Support Systems, 27(1-2):105–123, 1999. [226] Qihong Shao, Peng Sun, and Yi Chen. WISE: A workflow information search engine. In Yannis E. Ioannidis, Dik Lun Lee, and Raymond T.
284
Bibliography Ng, editors, Proceedings of the 25th International Conference on Data Engineering, ICDE 2009, March 29 2009 - April 2 2009, Shanghai, China, pages 1491–1494. IEEE Computer Society, 2009.
[227] Herbert A. Simon. Cognitive science: The newest science of the artificial. Cognitive Science, 4(1):33–46, 1980. [228] Kari Fosshaug Skjold, Marthe Sofie Øynes, Kerstin Bach, and Agnar Aamodt. IntelliMeal - Enhancing creativity by reusing domain knowledge in the adaptation process. In Computer Cooking Contest 2017 at the 25th International Conference on Case-Based Reasoning (ICCBR 2017), Trondheim, Norway, June 26-18, 2017., 2017. [229] Sergey Smirnov, Matthias Weidlich, Jan Mendling, and Mathias Weske. Action patterns in business process model repositories. Computers in Industry, 63(2):98–111, 2012. [230] Adam Smith. An Inquiry into the Nature and Causes of the Wealth of Nations, 1776. [231] Barry Smyth. Case-based recommendation. In Peter Brusilovsky, Alfred Kobsa, and Wolfgang Nejdl, editors, The Adaptive Web, Methods and Strategies of Web Personalization, volume 4321 of Lecture Notes in Computer Science, pages 342–376. Springer, 2007. [232] Barry Smyth and Padraig Cunningham. D´ej` a vu: A hierarchical casebased reasoning system for software design. In ECAI, pages 587–589, 1992. [233] Barry Smyth and Mark T. Keane. Retrieving adaptable cases: The role of adaptation knowledge in case retrieval. In Stefan Wess, KlausDieter Althoff, and Michael M. Richter, editors, Topics in Case-Based Reasoning, First European Workshop, EWCBR-93, Kaiserslautern, Germany, November 1-5, 1993, Selected Papers, volume 837 of Lecture Notes in Computer Science, pages 209–220. Springer, 1993. [234] Barry Smyth and Mark T. Keane. Remembering to forget: A competence-preserving case deletion policy for case-based reasoning systems. In Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence, IJCAI 95, Montr´eal Qu´ebec, Canada, August 20-25 1995, 2 Volumes, pages 377–383. Morgan Kaufmann, 1995.
Bibliography
285
[235] Barry Smyth and Mark T. Keane. Adaptation-guided retrieval: Questioning the similarity assumption in reasoning. Artif. Intell., 102(2):249–293, 1998. [236] Armin Stahl. Learning of knowledge-intensive similarity measures in case-based reasoning. PhD thesis, University of Kaiserslautern, 2004. [237] Armin Stahl. Learning similarity measures: A formal view based on a generalized CBR model. In H´ector Mu˜ noz-Avila and Francesco Ricci, editors, Case-Based Reasoning, Research and Development, 6th International Conference, on Case-Based Reasoning, ICCBR 2005, Chicago, IL, USA, August 23-26, 2005, Proceedings, volume 3620 of Lecture Notes in Computer Science, pages 507–521. Springer, 2005. [238] Robert D. Stevens, Alan J. Robinson, and Carole A. Goble. myGrid: personalised bioinformatics on the information grid. In Proceedings of the Eleventh International Conference on Intelligent Systems for Molecular Biology, June 29 - July 3, 2003, Brisbane, Australia, pages 302–304, 2003. [239] Robert D. Stevens, Hannah J. Tipney, Chris Wroe, Thomas M. Oinn, Martin Senger, Phillip W. Lord acomnd Carole A. Goble, Andy Brass, and M. Tassabehji. Exploring williams-beuren syndrome using my grid. In Proceedings Twelfth International Conference on Intelligent Systems for Molecular Biology/Third European Conference on Computational Biology 2004, Glasgow, UK, July 31-August 4, 2004, pages 303–310, 2004. [240] Julia Stoyanovich, Ben Taskar, and Susan Davidson. Exploring repositories of scientific workflows. In Proceedings of the 1st International Workshop on Workflow Approaches to New Data-centric Science, Wands ’10, pages 7:1–7:10, New York, NY, USA, 2010. ACM. [241] Alexander Tartakovski. Reasoning with generalized cases. PhD thesis, University of Trier, Germany, 2008. [242] Ian J. Taylor, Ewa Deelman, Dennis B. Gannon, and Matthew Shields. Workflows for E-Science - Scientific Workflows for Grids. Springer, London, 2007. [243] Arthur H. M. ter Hofstede, Maria E. Orlowska, and Jayantha Rajapakse. Verification problems in conceptual workflow specifications. Data Knowl. Eng., 24(3):239–256, 1998.
286
Bibliography
[244] Tom Thaler, Sharam Dadashnia, Andreas Sonntag, Peter Fettke, and Peter Loos. The IWi process model corpus. Technical report, Institute for Information Systems (IWi) at the German Research Center for Artificial Intelligence (DFKI), 10 2015. [245] Nikola Trcka, Wil M. P. van der Aalst, and Natalia Sidorova. Workflow completion patterns. In IEEE Conference on Automation Science and Engineering, CASE 2009, Bangalore, India, 22-25 August, 2011, pages 7–12. IEEE, 2009. [246] A. M. Turing. Computing machinery and intelligence, 1950. [247] Wil M. P. van der Aalst. Verification of workflow nets. In Pierre Az´ema and Gianfranco Balbo, editors, Application and Theory of Petri Nets 1997, 18th International Conference, ICATPN ’97, Toulouse, France, June 23-27, 1997, Proceedings, volume 1248 of Lecture Notes in Computer Science, pages 407–426. Springer, 1997. [248] Wil M. P. van der Aalst. Process-aware information systems: Lessons to be learned from process mining. Trans. Petri Nets and Other Models of Concurrency, 2:1–26, 2009. [249] Wil M. P. van der Aalst, Twan Basten, H. M. W. (Eric) Verbeek, Peter A. C. Verkoulen, and Marc Voorhoeve. Adaptive workflow-on the interplay between flexibility and support. In ICEIS, pages 353–360, 1999. [250] Wil M. P. van der Aalst and Arthur H. M. ter Hofstede. YAWL: yet another workflow language. Inf. Syst., 30(4):245–275, 2005. [251] Wil M. P. van der Aalst, Arthur H. M. ter Hofstede, Bartek Kiepuszewski, and Alistair P. Barros. Workflow patterns. Distributed and Parallel Databases, 14(1):5–51, 2003. [252] Wil M. P. van der Aalst, Arthur H. M. ter Hofstede, and Mathias Weske. Business process management: A survey. In Wil M. P. van der Aalst, Arthur H. M. ter Hofstede, and Mathias Weske, editors, Business Process Management, International Conference, BPM 2003, Eindhoven, The Netherlands, June 26-27, 2003, Proceedings, volume 2678 of Lecture Notes in Computer Science, pages 1–12. Springer, 2003.
Bibliography
287
[253] Wil M. P. van der Aalst, Ton Weijters, and Laura Maruster. Workflow mining: Discovering process models from event logs. IEEE Trans. Knowl. Data Eng., 16(9):1128–1142, 2004. [254] Irene Vanderfeesten, Jorge Cardoso, Jan Mendling, Hajo A Reijers, and Wil Van der Aalst. Quality metrics for business process models. BPM and Workflow handbook, 144:179–190, 2007. [255] Irene T. P. Vanderfeesten, Hajo A. Reijers, Jan Mendling, Wil M. P. van der Aalst, and Jorge S. Cardoso. On a quest for good process models: The cross-connectivity metric. In Zohra Bellahsene and Michel L´eonard, editors, Advanced Information Systems Engineering, 20th International Conference, CAiSE 2008, Montpellier, France, June 16-20, 2008, Proceedings, volume 5074 of Lecture Notes in Computer Science, pages 480–494. Springer, 2008. [256] Kostas Vergidis, Ashutosh Tiwari, and Basim Majeed. Business process analysis and optimization: Beyond reengineering. IEEE Trans. Systems, Man, and Cybernetics, Part C, 38(1):69–82, 2008. [257] Angi Voss, Ralph Bergmann, Brigitte Bartsch-Spoerl, Pearl Pu, and Barry Smyth. Preface of the Workshop on Adaptation in Case-Based Reasoning. In Workshops of the 12th European Conference on Artificial Intelligence, Budapest, Hungary, August 11-16, 1996, Proceedings, 1996. [258] Ian D. Watson. Applying case-based reasoning - techniques for the enterprise systems. Morgan Kaufmann, 1997. [259] Barbara Weber, Manfred Reichert, and Stefanie Rinderle-Ma. Change patterns and change support features - enhancing flexibility in processaware information systems. Data Knowl. Eng., 66(3):438–466, 2008. [260] Barbara Weber, Stefanie Rinderle, and Manfred Reichert. Change patterns and change support features in process-aware information systems. In Janis A. Bubenko Jr., John Krogstie, Oscar Pastor, Barbara Pernici, Colette Rolland, and Arne Sølvberg, editors, Seminal Contributions to Information Systems Engineering, 25 Years of CAiSE, pages 381–395. Springer, 2013. [261] Barbara Weber and Werner Wild. Towards the agile management of business processes. In Klaus-Dieter Althoff, Andreas Dengel, Ralph Bergmann, Markus Nick, and Thomas Roth-Berghofer, editors, WM
288
Bibliography 2005: Professional Knowledge Management - Experiences and Visions, Contributions to the 3rd Conference Professional Knowledge Management - Experiences and Visions, April 10-13, 2005, Kaiserslautern, Germany, pages 375–382. DFKI, Kaiserslautern, 2005.
[262] Barbara Weber, Werner Wild, and Ruth Breu. CBRFlow: Enabling adaptive workflow management through conversational case-based reasoning. In Peter Funk and Pedro A. Gonz´alez-Calero, editors, Advances in Case-Based Reasoning, 7th European Conference, ECCBR 2004, Madrid, Spain, August 30 - September 2, 2004, Proceedings, volume 3155 of Lecture Notes in Computer Science, pages 434–448. Springer, 2004. [263] Mathias Weske. Business Process Management - Concepts, Languages, Architectures, 2nd Edition. Springer, 2012. [264] Wolfgang Wilke and Ralph Bergmann. Techniques and knowledge used for adaptation during case-based problem solving. In Angel P. Del Pobil, Jos´e Mira, and Moonis Ali, editors, Tasks and Methods in Applied Artificial Intelligence, 11th International Conference on Industrial and Engineering Applications of Artificial In telligence and Expert Systems, IEA/AIE-98, Castell´ on, Spain, June 1-4, 1998, Proceedings, Volume II, volume 1416 of Lecture Notes in Computer Science, pages 497–506. Springer, 1998. [265] Workflow Management Coalition. Workflow management coalition the workflow reference model, 1995. last access on 17-01-2018. [266] Workflow Management Coalition. Workflow management coalition glossary & terminology, 1999. last access on 17-01-2018. [267] Workflow Management Coalition. Workflow management coalition workflow standard process definition interface - – xml process definition language, 2012. last access on 17-01-2018. [268] X. Xiang and G. Madey. Improving the reuse of scientificworkflows and their by-products. In IEEE International Conference on Web Services (ICWS 2007), pages 792–799, July 2007. [269] Chistian Zeyen, Gilbert M¨ uller, and Ralph Bergmann. Conversational retrieval of cooking recipes. In Computer Cooking Contest 2017 at the 25th International Conference on Case-Based Reasoning (ICCBR 2017), Trondheim, Norway, June 26-18, 2017., 2017.
Bibliography
289
¨ [270] Christian Zeyen. Bewertung ausgew¨ahlter Ahnlichkeitsmaße des Semantic-Webs zur Nutzung im Fallbasierten Schließen. Bachelor thesis, University of Trier, Germany, 2013. [271] Christian Zeyen and Jens Manderscheid. Retrieving adaptable workflows in Process-Oriented CBR. Student research project, University of Trier, Germany, 2015. [272] Christian Zeyen, Gilbert M¨ uller, and Ralph Bergmann. Conversational process-oriented case-based reasoning. In David W. Aha and Jean Lieber, editors, Case-Based Reasoning Research and Development 25th International Conference, ICCBR 2017, Trondheim, Norway, June 26-28, 2017, Proceedings, volume 10339 of Lecture Notes in Computer Science, pages 403–419. Springer, 2017. [273] Fubo Zhang and Erik H. D’Hollander. Using hammock graphs to structure programs. IEEE Trans. Softw. Eng., 30(4):231–245, April 2004. [274] Roland Zito-Wolf and Richard Alterman. Multicases: A case-based representation for procedural knowledge. In Proceedings of the Fourteenth Annual Conference of the Cognitive Science Society, pages 331–336, 1992.
Index

abstraction, 65
acquisition bottleneck, 66, 124
ad-hoc adaptation, 49
adaptability, 205
adaptation, 59, 60
    combined, 178
    compositional, 151
    guided retrieval, 70, 205
    transformational, 158
adaptation case, 65, 80
adaptation knowledge, 56
    learning, 66, 177
adaptation operators, 61, 161, 162
anchor, 145, 160, 166
attribute-value representation, 55
auto-completion, 120, 185, 263
best-matching workflow, 114
block-oriented workflow, 94
BPMN, 32
build-time, 30
business process, 27, 89
business process modeling, 29
CAKE, 223
case, 52, 57
case base, 52, 55
Case-Based Reasoning (CBR), 52
CBR cycle, 53
child node, 98
consistent workflow, 96
control-flow, 29, 32, 92
control-flow block, 95
control-flow edge, 92
control-flow node, 35, 92
cooking workflow, 90
data node, 35, 92
    creator data node, 93
    input data node, 93
    output data node, 93
data-flow, 30, 34, 93
data-flow connected, 93
data-flow edge, 93
devil’s quadrangle, 41
enactment, 28, 31, 32, 44
flexibility, 48
generalization, 65, 99, 126
generalized workflow, 126
head data node, 159
knowledge containers, 55
label, 99
lead time, 42
leaf node, 98
machine learning, 52
maintenance, 67
ontology, 55, 98
PAIS, 25
partial workflow, 97
POQL, 110
Process-Oriented CBR, 71
query, 110
query fulfillment, 112, 115
ranking, 110, 112
retrieval, 57, 107
run-time, 30
schema evolution, 50, 73
scientific workflow, 75
semantic workflow, 100
    similarity, 102
similarity, 56, 58
specialization, 65, 99, 132
specialized workflow, 139
specific output, 96, 140, 186
specific workflow, 133
streamlet, 159
strictly consistent workflow, 194
sub-workflow, 98, 139
syntactic correctness, 39, 94
task node, 35, 92
    creator task node, 93
taxonomy, 98
    similarity, 102
term, 99
throughput time, 42
understandability, 40
utility, 57, 238
vocabulary, 55
workflow, 29, 92
    abstraction, 82, 152
    complexity, 216, 217
    editor, 30, 45, 229
    instance, 31
    model, 30, 31
    modeling, 29
    modeling language, 32
    stream, 140, 142
workflow adaptation, 123
workflow completion, 183
    complete workflow, 185
    completion operator, 188
    completion rules, 195
workflow construction, 73, 178
workflow engine, 31, 44
workflow graph, 92
Workflow Management System, 43
workflow mapping, 103
workflow pattern, 32
    AND, 33
    LOOP, 34
    OR, 34
    sequence, 33
    split/join node, 32
    XOR, 33
workflow quality, 38
    pragmatic quality, 40
    semantic quality, 39
    syntactic quality, 39
worklist, 45