Information Systems Series
TEDIUM and the Software Process
Bruce I. Blum
The MIT Press
This file has been authorized and provided by the publisher, The MIT Press, as part of its ongoing efforts to make available in digital form older titles that are no longer readily available. The file is provided for non-commercial use through the Internet Archive under a CC-BY-NC-ND 4.0 license. For more information please visit www.creativecommons.org.
TEDIUM and the Software Process
Information Systems Michael Lesk, editor
Nested Transactions: An Approach to Reliable Distributed Computing, J. Eliot B. Moss, 1985
Advanced Database Techniques, Daniel Martin, 1986
Text, Context, and HyperText: Writing with and for the Computer, edited by Edward Barrett, 1988
Effective Documentation: What We Have Learned from Usability Research, edited by Stephen Doheny-Farina, 1988
The Society of Text: Hypertext, Hypermedia, and the Social Construction of Information, Edward Barrett, 1989
TEDIUM and the Software Process, Bruce I. Blum, 1989
TEDIUM and the Software Process
Bruce I. Blum
The MIT Press Cambridge, Massachusetts London, England
© 1990 Massachusetts Institute of Technology

All rights reserved. No part of this book may be reproduced in any form by any electronic or mechanical means (including photocopying, recording, or information storage and retrieval) without permission in writing from the publisher.

This book was printed and bound in the United States of America.

Library of Congress Cataloging-in-Publication Data

Blum, Bruce I.
TEDIUM and the software process / Bruce I. Blum.
p. cm. — (Information systems)
Includes bibliographical references.
ISBN 0-262-02294-X
1. TEDIUM (Computer system) 2. Medical informatics. 3. Software engineering. I. Title. II. Series: Information systems (Cambridge, Mass.)
R859.7.T45B58 1989
610'.285-dc20    89-14576
CIP
£"/39 The Software Process
Initially, the program generator addressed only the issues of file maintenance and report preparation. The designer would define relations, program names, and functions; the system would translate this into MUMPS programs. Given that the generator worked, I next tried to close the system to provide a more complete development environment. I had to expand the range of the generator capabilities. The interface had to be improved so that the designers need not be concerned with the generator’s internal structure. The definitions used by the generator had to be combined and listed in a variety of formats. Tools had to be added to support communications and maintenance. In short, the initial generator was expanded into an application development environment. The period from 1980 to 1983 was stressful. A new environment was implemented, a staff of ten was trained in its use, the Phase I OCIS was maintained, the Phase II OCIS was developed and installed, and the faculty and staff of the Oncology Center were constantly requesting additional support. But about a year later than originally promised, the Phase II OCIS was operating on two computers. The use of TEDIUM in the Oncology Center and elsewhere in the Johns Hopkins Hospital was considered a success. The name TEDIUM, which stands for The Environment for Developing Information Utility Machines, emerged in 1981 when I tried to interest the School of Medicine in exploiting the concept as a product. Although the generator was only used internally and I had little empirical evidence of its utility, I attempted to get the school to find a company that would market the product and pay royalties to the university (and, of course, to me). After a year of discussion, I was told to start my own company and -- if I made lots of money -- to donate some to the university. I opened a company in my basement and decided to trademark the name of the generator. A patent attorney friend did the work for me. First I tried positive sounding names like SIMPLE, but they already had been registered. It cost $50 per search to find out that someone had previously claimed a similar name. In desperation, I chose a name that I was sure would not have been registered already, and so TEDIUM and Tedious Enterprises, Inc., came into being. During the first few years of the existence of Tedious Enterprises, Inc., I believed that I had a special insight and that TEDIUM was the proper way to develop applications. I still believe that I have a special insight and that TEDIUM is the proper way to develop applications. But I now also recognize that much work is required to make TEDIUM into a commercially viable product, and I am unwilling to undertake that task. Thus, I now pursue my insights into the software process using TEDIUM as a laboratory to test my theories. The remainder of this chapter discusses the software process from a variety of perspectives. As soon will be clear, my primary concern is how we build software products to provide the functions required by the sponsoring organization. I am not especially interested in the details of the implementation. Many of these details are expansions of existing conventions
and therefore can be generated automatically. TEDIUM is such a generator; there are many others.
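To make the idea of a definition-driven generator concrete, the following sketch is hypothetical Python rather than TEDIUM's actual notation or its MUMPS output; the relation and field names (patient, patient_id, protocol) are invented for the illustration. It shows only the general pattern: a declarative definition is written once, and routine file-maintenance code is produced from it.

    # Hypothetical sketch of a definition-driven generator; not TEDIUM itself.
    # A relation is declared once, and file-maintenance routines are generated.
    RELATION = {
        "name": "patient",
        "key": "patient_id",
        "fields": ["patient_id", "name", "protocol"],
    }

    def generate(relation):
        """Emit source text for add/get/report routines derived from the definition."""
        name, key, fields = relation["name"], relation["key"], relation["fields"]
        params = ", ".join(fields)
        record = ", ".join("'%s': %s" % (f, f) for f in fields)
        lines = [
            "_store = {}",
            "def add_%s(%s):" % (name, params),
            "    _store[%s] = {%s}" % (key, record),
            "def get_%s(%s):" % (name, key),
            "    return _store.get(%s)" % key,
            "def report_%s():" % name,
            "    for k in sorted(_store):",
            "        rec = _store[k]",
            "        print(' | '.join(str(rec[f]) for f in %r))" % (fields,),
        ]
        return "\n".join(lines)

    module = {}
    exec(generate(RELATION), module)              # "compile" the generated routines
    module["add_patient"]("P001", "Doe, J.", "BMT-12")
    module["report_patient"]()                    # prints: P001 | Doe, J. | BMT-12

The point of the analogy is that the designer's statement is the definition at the top; everything below it is clerical and can be regenerated whenever the definition changes.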
It is useful here to draw an analogy between the standard QWERTY keyboard and TEDIUM. The keyboard layout was designed to make it difficult to use; there was a concern that fast typing would jam the keys. The fact that individuals now can type in excess of 100 words per minute is a tribute to human adaptability and not the keyboard design. This book uses TEDIUM to examine how we can produce effective software more efficiently. I am more interested in what this experience tells us about the process than I am concerned with the mechanism used for the investigation. Naturally, to understand the former we first must understand TEDIUM.

Before continuing with the substance of the book, let me establish a convention for personal references. Throughout this discussion I have used the first person singular and will continue to do so wherever I make reference to my personal views and experience. I will use the first person plural when I speak as a member of the software engineering community. (For example, we know that the life cycle cost of software maintenance is considerably greater than the initial cost to implement the product.) Finally, I will use "author" to reference personal activities that are incidental to the basic theme of this book. (For example, the author was active in the field of medical informatics between 1975 and 1984; since then he has devoted himself to research in software engineering.)
1.2
DEVELOPING SOFTWARE AND HARDWARE
I now begin the body of the book. This first chapter examines the software process independent of TEDIUM. I begin with a review of the classical process model as represented by the waterfall diagram. I next abstract that model by removing the management considerations. The result is an essential model of the process; it is presented at two levels of abstraction. Finally, I review how others seek to improve this process. Naturally, the remainder of the book concentrates on how TEDIUM approaches this problem.

The software process (sometimes called the software life cycle) includes all activities related to the life of a software product, from the time of initial concept until final retirement. Because the software product is generally part of some larger system comprising hardware, people, and operating procedures, the software process is always embedded in some broader system engineering problem. The subset of system engineering devoted to the design, development, and evolution of the software product is called software engineering.

The origin of the term software engineering is associated with two conferences sponsored by the NATO Science Committee, the first in Garmisch, Germany in 1968 [NaRa69] and the second near Rome, Italy in 1969 [BuRa70]. In less than a quarter of a century, the technology had advanced from the first of the electronic computers, the ENIAC, to computer applications so large and complex that their implementation was difficult to manage. At the
conferences there were frequent references to the OS/360 experience and AT&T's telephone switches, but the central focus was on the development of large systems that combined new hardware and software components. Although the NATO conferences emphasized computer science issues related to the development of large applications with embedded computers, there also was a concern for how the development process should be managed. In the early days of computing, programming was taught on the job, and the development methods were decidedly ad hoc. By the late 1960s, however, a discipline of programming was being developed at both the coding level (structured programming) and the systems engineering level (the specify-design-implement-test cycle).
Figure 1.1 The software life cycle as modeled by Boehm in 1976. (© 1976, IEEE, reprinted with permission.)
Before the NATO conferences there was considerable experience in managing the development of large systems. That methodology would now be used to guide the development of the software components. In 1970, Royce expressed the need for a detailed design step and testing feedback [Royc70], and eventually the modern waterfall diagram emerged. Figure 1.1 displays the representation of the software life cycle used by Boehm in his 1976 review [Boeh76]. Notice how the box labels have been derived from the hardware development model.

Hardware: System Requirements; Hardware Requirements; Preliminary Design; Detailed Design; Fabrication; Test and Preoperations; Operations and Maintenance.

Software: System Requirements; Software Requirements; Preliminary Design; Detailed Design; Code and Debug; Test and Preoperations; Operations and Maintenance.
In the search for methods to improve the software development process, the flow shown in Figure 1.1 was adapted as the standard structure for organizing and managing software projects. The process could be decomposed into discrete phases with criteria for starting an activity and determining when it had been completed. Yet there are major differences between hardware and software.

Hardware engineering has a long history. There are physical models that establish a foundation for decision making, and handbooks exist to provide guidance. Software engineering is new; as its name implies, it relies on "soft" models of reality.

Hardware normally deals with multiple copies of a product. Thus, the effort to control and document design decisions can be prorated over the many units produced. In fact, it is common to produce an engineering prototype for manufacturing purposes. Conversely, software has a negligible production cost; what is delivered is the final engineering model.

Production hardware is expensive to modify. Therefore, there is a major incentive to prove the design before production begins. But software is simply text; it is easy to change. (Naturally, the verification of a change is a complex process; its cost is directly proportional to the number of existing design decisions affected.)

Hardware reliability is a measure of how the parts wear out. Software does not wear out; its reliability is an estimate of the number of undetected errors.

These differences suggest that, although there is a strong parallel between hardware and software, too rigid a commitment to a hardware model may prove detrimental to the software process. Despite its limitations, there are many reasons for using the traditional waterfall model. When the system to be developed combines both hardware and software, the organization of the project into parallel development phases facilitates management and coordination. Of even greater significance is the fact that virtually all our empirical data regarding software development is based on this process model. For example, all of the cost estimation and project scheduling methods build on data organized by waterfall phases.

The waterfall diagram depicts one way to manage the development of software; it does not necessarily represent how software should be produced. In fact, it has become fashionable at recent software engineering conferences to declare the waterfall organization obsolete. Until tested alternatives are available, however, prudence demands that the standard flow be followed for large projects. Most of the problems associated with the traditional waterfall project organization reflect misinterpretations of task goals when software is viewed
as a form of hardware. The most common dangers in confusing these two classes of product are:
Premature formalization of the specification. Because the design activities cannot begin until the analysis is performed and the specification is complete, there often is a tendency to produce a complete specification before the product needs are understood fully. This frequently results in an invalid system. Unlike hardware, software can be incrementally developed very effectively. Experience has shown that when the product is decomposed into many small builds and deliveries, each can build on experience with its predecessors. Another alternative is to use prototypes as a learning tool; the lessons learned from the prototype are preserved in the specifications.

Excessive documentation or control. Software development is a problem-solving activity, and documentation serves many purposes. It establishes a formal mechanism for structuring a problem solution, it is a natural means of communicating the current design decisions, and it provides an audit trail for the maintenance process. But documentation demands often go beyond these pragmatic needs, and the transfer of paper becomes an end in itself, which is counterproductive.

Delay of software decisions to accommodate hardware limitations. Frequently, because software is relatively easy to change, there is the perception that hardware limitations can be compensated for by changes to the software. From a systems engineering perspective, this strategy is obviously inappropriate. Although this may be the only reasonable alternative, it clearly is not a desirable design approach.

Emphasis on physical products such as program code. Because code frequently is viewed as a product, there is a tendency to place considerable store in its existence. The most difficult part of software design, however, is the determination of what the code is to implement. In fact, the production of the code and its debugging typically takes less than half the time of its design. Also, most errors are errors in design and not errors in writing code. Therefore, managers should not be too concerned that little code has been produced as long as there is evidence that the designers are gaining an understanding of how the software is to solve the problem. And programmers should not be encouraged to code before they have worked out the intended function of their application. (The exception, of course, is the prototype, which is discarded after its lessons have been assimilated into the design.)
1.3
ESSENTIAL STEPS IN THE SOFTWARE PROCESS
The previous section presented a view of software development as if it were an adjunct to hardware development. The discussion was strongly biased
by the need to manage the process. In what follows I consider the software process without reference to any management concerns and focus on what I call the essential software process. I adapt the term from a recent paper by Brooks in which he borrows the concept from Aristotle [Broo87]. In that paper, Brooks divides the difficulties in software technology into essence, those that are inherent in the nature of software, and accidents, those that today attend its production but are not inherent. Brooks identifies four essential properties of software: complexity, conformity (the fact that the software must conform to the product's other interfaces), changeability, and invisibility (the observation that there are no physical models or realizations). He states that past breakthroughs (e.g., high-level languages, time-sharing, and unified programming environments) solved accidental difficulties. The title of the paper, "No Silver Bullet," suggests Brooks' conclusion that a major breakthrough is not expected. (I will return to this important paper later.)

The object of concern for Brooks is the software product. In contrast, I view the objective of the software process to be one of creating a software solution to a problem. There is a subtle distinction between these two views that will be addressed in Chapter 2. Here I will limit the discussion to the issue of how software is developed assuming that there is no need for management constraints. The remainder of this section offers an essential model based on current experience; the following section contains a somewhat more abstract model.

If one examines the essential software process fixed in the waterfall flow, one sees that software is developed using the same general process as that used for building a bridge, conducting a scientific experiment, or managing hardware development:

First, we determine what is to be done. This step is called analysis. The statement of what the software is to do is its specification, which determines the behavior of the software and establishes non-functional criteria (e.g., performance requirements or storage constraints).

Next, we decide how to realize the desired behavior. This step is called the design. With software, it involves several levels of transformation, each of which adds details (e.g., from a component design to module designs to detailed designs to code). This defines the structure of the implementation. Often in software one speaks of programming-in-the-large, which involves the architecture of the system, and programming-in-the-small, which involves the creation of the code.

Following this, we test the product (and each of the intermediate components). Because there are many implementation steps (i.e., the how detailing), some form of testing must follow each step. There are two dimensions of testing. Verification tests if the software product is correct with respect to the specification. That is, does it exhibit the desired behavior; is the product right? Validation determines if the software corresponds to the intended needs; is it the right product?
Finally, we use the software product. In most cases, the product changes the environment it was intended to support, thereby altering its initial specification. Consequently, the software will evolve continuously until its structure degrades to the point that it is less expensive to retire the software than to modify it. This maintenance activity can be viewed as an iteration of the preceding steps.

This general flow is illustrated in Figure 1.2. The four steps are shown as three transformations. Determining what to do is shown as a transformation from the real world to a problem statement. Establishing how to do it is shown as a transformation from the problem statement to the implementation statement. Using the product is shown as a transformation of the implementation statement into a system that is then embedded in the real world. Because the last step alters the real world and thereby modifies the validity of the problem statement, the figure represents one iteration of a continuous process.
Figure 1.2 A simplified software process model.
The figure also displays the two quality measures for the system and the processes used to evaluate them. Correspondence measures how well the delivered system corresponds to the needs of the organizational environment. Validation is an activity used to predict correspondence; true correspondence
cannot be determined until the system is in place. (Moreover, as those experienced in system evaluation know, the determination of correspondence for an operational system can be difficult and controversial.) Unlike correspondence, correctness can be established formally. Correctness measures the consistency of a product with respect to its specification. Verification is the process of determining correctness. Notice that correctness is always objective. Given a specification and a product, it always is possible to determine if the product precisely satisfies the requirements of the specification. Validation, however, is always subjective; if the behaviors could be detailed, they would have been included in the specification. Validation begins as soon as the project begins, but verification can begin only after a specification has been accepted.

Verification and validation are independent of each other. It is possible to have a product that corresponds but is incorrect (for example, a necessary report is part of the delivered product but not included in the specification). A product also may be correct but not correspond (for example, after years of waiting, a product finally is delivered that satisfies the initial design statement but no longer reflects operating practice). Finally, observe that when the specification is informal, it is difficult to separate verification from validation.

In this simplified view of the software process, the transformation from a need to a problem statement represents only a small portion of the total activity. As I will demonstrate later, this is a key dividing point in the process. Before the problem statement exists, the activity centers about determining what is to be done. Once it exists, the construction of a software product becomes the central concern. The transformation from the problem statement into an implementation statement entails the design, code, and test steps of the waterfall life cycle model. The process is one of adding details to ensure completeness and to meet the performance objectives. The transformation from the implementation statement to the system always is supported by automation. For example, a compiler translates the code form of the implementation statement into an executable product. Thus, virtually the entire software process effort is allocated to the second transformation: from a problem statement to an implementation statement.

The flow shown in Figure 1.2 includes all the waterfall processes. Here the testing feedback is distributed throughout the process. In contrast, Boehm's waterfall diagram shows each step with an explicit verification or validation obligation and identifies a separate testing and preoperations phase. Nevertheless, there is a general consistency between his waterfall model and the more reduced model in Figure 1.2, which implies that empirical experience with the former will provide insight into the latter.
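The verification/validation distinction drawn above can be restated with a toy example. The sketch below is my own illustration, not one from the book, and the record layout is invented. The specification is written as a checkable predicate, so verification is an objective test; the validation question (is ordering by identifier what the users really need?) cannot be settled by any assertion.

    # Verification vs. validation, in miniature (illustrative only).
    def specified_behavior(inputs, output):
        """Specification: the report lists every input record, ordered by id."""
        return output == sorted(inputs, key=lambda rec: rec["id"])

    def build_report(records):
        """The product: one possible implementation of the specified report."""
        return sorted(records, key=lambda rec: rec["id"])

    census = [{"id": 2, "name": "B"}, {"id": 1, "name": "A"}]

    # Verification is objective: the product either satisfies the specification or not.
    assert specified_behavior(census, build_report(census))

    # Validation is subjective: perhaps the users actually needed the report ordered
    # by name; that judgment lies outside both the specification and the test.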
On the basis of many years of study, the following statements are held to be true by most software engineers:

Relatively little of the total effort is devoted to the preparation and testing of code. One scheduling guideline, the 40-20-40 rule, allocates 40% of the effort to analysis and design, 20% to code and debug, and 40% to integration and testing.

Most errors detected are errors in analysis or design; relatively few are programming errors. That is, error correction generally requires changes to the design documentation. This implies that coding represents a small fragment of the total effort, and it is a task we perform well.

Verification and validation are feedback activities that should be conducted as early as possible for each product component. The later an error is detected, the more expensive it will be to repair. The cost to repair a defect after product release can be as much as 100 times greater than the cost to fix it at the time of requirements specification.

The individual productivity of programmers, when measured in terms of lines of code produced per unit of time, is roughly independent of the language used. Naturally, use of a higher level language will result in the production of more function per unit of time.
Individual productivity decreases as the number of individuals assigned to a project increases. This reflects the project's complexity, the difficulty that individuals find in building a good understanding of the total task, and the administrative burden of control and communication.

In projecting costs, the single most important factor is the quality and experience of the individuals assigned to the project. Although some early studies suggested that individual differences could be as great as 28:1, most of those differences can be reduced by training and education.

The cost to maintain a system over its lifetime normally will be twice as much as that required for initial development. Over half the maintenance cost is perfective, that is, it enhances the operational product. About one-fourth of the maintenance effort is devoted to adapting the system to meet changed needs, and only one-fifth of the total maintenance activity involves the correction of errors.

Each of these factors has been reported for many different projects and environments. It is doubtful that any structuring of the software process will change them. Therefore, they should be accepted as the essential characteristics of the software process. If we expect to improve productivity
or quality, we will have to continue to work within the context of these constraints. (Rather than cite individual references for each statement, I refer the reader to such standard software engineering sources as [Boeh81a, ViRa84, Fair85, Somm85, MaBu87, and Pres87].)

1.4
THE ESSENTIAL SOFTWARE PROCESS MODEL
The simplified software process model (Figure 1.2) presented the flow from the real world to a system as three transformations. The composite transformation may be viewed as one that goes from the identification of a problem to be solved to the production of a computer-supported solution to that problem (i.e., from a real world need to a system that satisfies that need). Rephrasing the last statement in terms of knowledge domains, the software process is a transformation from a set of needs in the application domain into a solution in the implementation domain. Actually, four overlapping knowledge domains are involved:

Application knowledge. This includes all knowledge about the problem space in the real world. It comprises both generic knowledge (such as accounting, missile guidance, or medical diagnosis) and local specifics (such as knowledge about the organization needing an accounting system, the characteristics of the missile that will use the guidance system, or the medical framework for the diagnostic system).

Application-class knowledge. This includes the software process experience that offers insight into product conventions for a given class of application. Examples of application-class knowledge are the linking of input validation with online interactions and the need for fault tolerance in mission-critical applications.

Software-tool knowledge. This includes an understanding of how the products of computer science are used to realize a correct and efficient implementation. The knowledge encompasses the use of both methods (such as composition, entity-relationship modeling, or structured analysis) and tools (such as programming language compilers, documentation systems, and database management systems).

Computer science knowledge. This includes the first principles used to produce and evaluate the software methods and tools. This knowledge is embedded in the software-tool knowledge and need not be referenced explicitly except when building or evaluating methods and tools.

Of these four categories of knowledge, the software process typically is concerned with only the second and third. The software process does not attempt to capture all of the understanding of the application domain in which the system will be used; rather, the goal of the analysis step is to identify the essential features of the intended product, which (when combined with the implicit requirements of the application class) will specify the desired product. The implementation of that product relies on the available methods and tools.
There is seldom a need to investigate theories in either the application or the computer science domain. Given this identification of the domains of knowledge, the essential software process can be modeled as a transformation from some application domain into an implementation, as illustrated in Figure 1.3. The figure is drawn from the perspective of the software process. Two lines of activity are shown. Conceptual modeling is the activity of formalizing the problem solution and the associated application-class knowledge to create a formal statement of the automated implementation. Formal modeling is the process of converting the specified solution into the implemented solution.
Figure 1.3 The essential software process model.
As before, three transformations are shown. The first is from a perceived need in the application domain into some understandable statement of a solution to that need. I use the term conceptual modeling to suggest that the model of the computer-supported solution is based on the analysts’ understanding of the problem and the technology available for the solution. Conceptual modeling is a problem solving activity; it is subjective, and there is no concept of correctness. Note that, from the perspective of the application domain, there may be formal models to define what is correct. For example, there are formal methods to determine the effects of a missile control system, but subjective (conceptual) judgments are required to establish what should be implemented in
the software solution. Thus, when viewed from either the perspective of a software engineer or a domain specialist, human judgment is an essential ingredient. Only one of the process' models can be formal; the other must be conceptual. (Persons working with information systems use the term conceptual modeling in a much more restrictive way. I use the term conceptual in its most generic sense; in another version of Figure 1.3, I label the models subjective and objective.)

Returning to the software perspective shown in Figure 1.3, the second transformation goes from the conceptual model, which describes the intended solution in terms that are understandable in the application domain, to the formal model, which can be correctly transformed into an implementation. The third transformation is from a formal specification into an implementation.

Both Figures 1.2 and 1.3 contain three transformations, but the transformations are different. In the simplified software process model (Figure 1.2), the output of the first transformation is the problem statement, which we now call the specification. The second transformation produces the implementation statement by adding details that are correct with respect to the specification. The final transformation uses automation to produce an implementation of the desired system. In the essential software process model (Figure 1.3), the formal modeling line begins with the existence of the specification. Here, the first two transformations are concerned with the building of the specification: first by deciding what should be done through the use of conceptual models, and then by transcribing the conceptual model into that formal statement (i.e., the specification).

The two models emphasize different aspects of the process. The simplified software process model is more concerned with the activities that follow the existence of the specification; the essential software process model offers a more complete analysis of what goes into the creation of the specification. I will return to the second issue later, but for now I will combine the two models into a simple pair of transformations: one that goes from an application concept (A) to a specification (S), and one going from the specification (S) to a system product (P).

Lehman, Stenning, and Turski have described a software process model (LST) that examines these two transformations [LeST84]. They call the transformation A => S abstraction and the transformation S => P reification. Others use the terms analysis and detailing for these activities. Many have studied the software process by examining what happens before and after the specification exists, and the discussion now turns to that well-traveled path.

Although I do not find that the LST model sheds much light on abstraction, it does offer insight into the reification process. Reification begins with the existence of a specification S. The goal is to transform that specification into a correct product. Correctness with respect to S means that all behaviors specified in S are preserved in P. Because S is the root of this chain of transformations, there is no concept of correctness for S.
The process can be represented as follows:

S => S1 => S2 => ... => Si => Si+1 => ... => Sn => P

It begins with some specification S. For a given problem, there are many potentially valid specifications that could be selected. Reification is not concerned with why S was selected. In the LST model the primary concern is that P be correct with respect to S. Presumably, this correctness can be demonstrated logically. Although there is an explicit validation obligation, it is recognized as an extralogical activity; validity cannot be proven.

The central issue is how to preserve correctness. The specification S is considered a theory in the sense that a set of axioms provides a mathematical theory. Given a theory, the designers derive a model in a linguistic form that is closer to that of the desired product P. The result is a model of the theory. Many correct models for a theory exist, and the designers must select the one they consider to be the best. If the model is not derived by use of behavior-preserving transformations, then it must be verified. A validation obligation follows to ensure that no inappropriate behaviors have been introduced. Finally, the accepted model becomes the theory for another iteration of this canonical step.

Expressed in this fashion, it is clear that reification is a sequential process. The model Si+1 cannot be built until the model Si is accepted as a theory. Once Sj has been accepted as a theory, any changes to the theories Sk, k < j, will invalidate the chain of logic from Sk forward. The goal is to maintain a logically correct trail until some model (Sn) exists that is isomorphic to the desired product P. The Sn represents a program that will be correctly transformed into the product.

Because it is unlikely that a single linguistic representation is appropriate for each level of modeling, the canonical steps are presented in terms of different linguistic levels. One may think of going from a top-level design to a detailed design to a program design language (PDL) to code. Of course, these are all informal models, and it is difficult to prove that one (model) is correct with respect to its parent (theory). To guarantee rigor, one must have formalisms to express the descriptive theories of the application domain in ways that can be transformed correctly. For example, to the extent that FORTRAN expresses the scientist's intent, the FORTRAN program can be considered a descriptive theory. The compiled code will be a correct model of that theory. (Different compilers might produce alternative, but equally correct, models.)

Even when an appropriate formalism for the specification exists, the specification normally will not be complete. Maibaum and Turski observe that there are two reasons for this [MaTu84]. First, the sponsors seldom know enough about the application to define it fully at the start. Second, and from the LST model perspective even more importantly, there are many behaviors in the product P that are not important to the sponsor. If the specification S is complete, then S will be isomorphic to P. In short, the problem will be overspecified. What is necessary is that S contain only the essential
behaviors. Any behaviors added to some Sj are permissive so long as they do not violate the behavior of S.

In this presentation, the LST model of reification represents the formal modeling line of Figure 1.3. It starts with a specification and describes how details are added to the specification until the product is created. Each canonical step in this sequence is based on the logical assumption that correctness is preserved. The validation obligation and the need to admit permissive behaviors indicate, of course, that the process is always subject to an extralogical review, which is independent of each model's correctness. Thus there is a tension between the rigorous and the judgmental, between the objective and the subjective.

To conclude this examination of reification, I borrow some concepts from Koomen. In his discussion of the process, he identifies an "information content" function associated with the designer [Koom85]. Combining this idea with what has been presented already, the LST canonical step can be expanded as shown in Figure 1.4. At the top, the boxes represent the flow from some Si as the theory (Ti), through a correctness proof (C), to the correct models of that theory (Mij), past the validation obligation (V), and finally to the selected model (Mi) that becomes the next theory, Si+1. Below this is shown the role of the designer. He embeds his knowledge about how to transform the theory correctly (K1, or software-tool knowledge) and the intent of the product (K2, or application-class knowledge). The former has an objective core in that (for formal specifications) there are objective criteria for establishing correctness; the latter is always subjective.
Figure 1.4 Model of the canonical step Si => Si+1.
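A single canonical step can be mimicked in a few lines of code. The sketch below is my own illustration, not the LST authors' formalism: the current theory Ti is expressed as a behavior predicate, a candidate model is a more concrete realization, the correctness check C is run over sample cases, and the validation obligation V remains a matter of judgment.

    # One canonical step Si => Si+1, in miniature (illustrative only).
    def theory_Ti(xs, ys):
        """Specified behavior: ys is xs arranged in nondecreasing order."""
        return ys == sorted(xs)

    def candidate_model(xs):
        """A more concrete realization (insertion sort); one of many correct models."""
        out = []
        for x in xs:
            i = 0
            while i < len(out) and out[i] <= x:
                i += 1
            out.insert(i, x)
        return out

    # C: verify that the candidate preserves the theory's behavior on the cases examined.
    for case in ([], [3, 1, 2], [5, 5, 4]):
        assert theory_Ti(case, candidate_model(case))

    # V: validation remains extralogical -- e.g., is this treatment of equal keys what
    # the sponsor intended?  If accepted, the candidate becomes the theory for Si+1.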
What started out as a clean description of a logical process has become quite messy. If reification -- which I associate with formal modeling -- depends on human judgment, then what can be said about abstraction -- which I associate with conceptual modeling? I assert that the LST process of abstraction is a human problem solving activity. It begins with some application domain problem to be solved and fashions a software solution to that problem. It requires knowledge of the application domain and the
particular context of the problem. The potential solution is stated in terms of how software has been applied in similar situations (application-class knowledge) with an understanding of the technology available to realize a software solution (software-tool knowledge). Formal models of the application domain, where they exist, are used to avoid errors; they cannot guarantee validity of the software product. Whereas reification must be a sequential (and hierarchical) process, abstraction tends to reflect its cognitive origins. Problem understanding and solution definition will involve "chunking" at various levels of abstraction, along with the reuse of prior knowledge and experience to guide the process. The less well understood the problem (i.e., the less knowledge available for reuse), the greater the probability that the initial specification will be invalid and/or that invalid behaviors will be accepted as permissive during reification. In the LST model we assumed that S contains all the essential behaviors of the desired P. But Maibaum and Turski note that the sponsor may not know what those behaviors are. Therefore, during reification it may be necessary to augment S as the problem becomes better understood. Changes to S, naturally, may invalidate models that were derived from the original S. Consequently, the software process is not one of the logical derivation of a product from a specification. Rather, it is best understood as one of problem solving in two domains using both subjective and objective criteria. Of course, that is what makes it so difficult. The last paragraph is summarized in Figure 1.5. It divides knowledge of the solution into subjective, which relies on judgment for evaluation, and objective, which has precise criteria for evaluating correctness. The dashed lines suggest that there is no clear division between the two categories. Here, the LST model is represented as a subjective abstraction process followed by a series of reification steps that include verification (objective) and validation (subjective).
Figure 1.5 Objectivity and subjectivity in the software process.
Comparing Figure 1.5 to Figure 1.3, we see that the subjective activities are guided by the conceptual models, and the objective activities are implicit in the formal models. Notice that the essential software process model does not depend on a specification in the same ways that the LST model does. Figure 1.3 suggests that the conceptual modeling can be done in parallel with the formal modeling. In this figure, it is not clear where the specification S is defined; in fact, it is possible that the first formal model will be the program code. This discussion of software process models has gone from the traditional waterfall flow to a simplified flow and finally to a minimal essential model. Each model describes the process imperfectly. One advantage of the essential model is that it reduces the process to its elementary activities. The next section uses this model as a framework for examining how researchers are approaching the software process.
1.5
ALTERNATIVE APPROACHES TO THE SOFTWARE PROCESS
The essential software process model of Figure 1.3 presents the process as two parallel activities that transform a need recognized in the application domain into a solution in the implementation domain. Two different kinds of modeling tools are required, and there is also a need to convert one class of representation into another, which suggests that there are two basic approaches to improving the process. One research path begins in the application domain and looks for conceptual models that can represent the knowledge about the application domain and the problem solution. The other seeks to extend the formal modeling line to the left through the use of higher level and more expressive languages. Both seek to find representations that facilitate the transfer from one modeling domain to the other. Turski, one of the most articulate champions of the formal approach, notes:

The history of advances in programming -- the little that there is of it -- is the history of successful formalisation: by inventing and studying formalism, by extracting rigorous procedures, we progressed from programming in machine code to programming in high level languages [HLL].... For many application domains HLLs provide an acceptable level for program (and system) specification.... The expert views of such domains, their descriptive theories, can be easily expressed on the level of HLL, thus becoming prescriptive theories (specifications) for computer software. If it is desired to use computers reliably in domains where this is not the case, then we must try to narrow the gap between the linguistic level on which domain-descriptive theories are formulated and the linguistic level on which programs are prescribed (specified). [Turs85, p. 400]
The goal of the computer scientist is to devise representations that allow the domain specialist to specify his needs. The domain specialist has the background to create descriptive theories; the responsibility of the computer scientist is to produce software tools that transform the specification into an implementation. Viewed this way, solutions to the software crisis reduce to problems that are natural for the computer scientists to solve: the definition of computational formalisms and methods and the development of software tools. Application domain knowledge, when needed, is limited to that of the application class.

Researchers working at the conceptual modeling level take a variety of different approaches. One is the concept of automatic programming [RiWa88a]. Here the environment maintains knowledge about both the application domain and what I have called software-tool knowledge. As Barstow states:

An automatic programming system allows a computationally naive user to describe problems using the natural terms and concepts of a domain with informality, imprecision and omission of details. An automatic programming system produces programs that run on real data to effect useful computations and that are reliable and efficient enough for routine use. [Bars84, p. 26]

Barstow's automatic programming system was designed to analyze the data from oil well drilling tools. Less sophisticated applications of this category include end-user tools such as spreadsheets and database query languages, in which the link between the conceptual and formal models has been automated. Conversely, most computer-assisted software engineering (CASE) tools are examples of conceptual models that are not linked to the formal models. One uses the tool to model both the requirements and the implementation. The models usually rely on diagrams to express conceptual decisions. Many tools contain facilities to test the completeness or consistency of the conceptual models, but few can translate the conceptual model into a formal one. Hence, the conceptual model is an adjunct to the formal model; one can change either without affecting the other.

A third class of conceptual models grew out of work in knowledge representation and abstract data types. Here, portions of the application domain are modeled in an operational form that can support some computation. The most common example is the modeling of objects. Because the term "object" is now very popular, it is used for a broad variety of models, ranging from conceptual adjuncts (e.g., Booch's object oriented design [Booc86]) to formal definition (e.g., an object oriented programming language like Smalltalk). Sometimes the model is part of the definition of what is to be done; at other times the model is used to detail the structure of the solution.

One major difference between the conceptual and formal models is that the former often are restricted to only some part of the total problem solution. Semantic data models, for example, deal only with the objects to be represented in the database. However, the specification used in the reification process must be complete, and logical consistency demands that it be stable.
When there is a well-understood problem, it is easy to define a specification first and then to proceed with development in the necessary top-down manner. But when the problem is poorly understood, the methods used must minimize the impact of change. One common approach to software development in new domains is to reduce the risk by limiting the size of each increment produced [Gilb88]. A project may be decomposed into builds or small deliverables, wherein each build is developed, tested, and delivered. Later builds utilize experience with the earlier deliverables. Schedule delays are easy to identify, and the risk can be distributed over many tasks.

For more complex problems, prototyping offers a technique for gaining knowledge. In one prototyping method, a software product is created to experiment with some part of the problem that is not yet understood. The prototype serves as a laboratory for formalizing the solution. The lessons learned are documented in the requirements specification, and the prototype is discarded. Gomaa and Scott suggest that it is reasonable to have a prototyping activity cost up to one-tenth of the total project cost [GoSc85].

Boehm's spiral cost model, shown in Figure 1.6, uses a similar strategy in prototyping [Boeh88]. Here, the risk is considered too great to begin with a full requirements specification at project start. Therefore an iterative process is initiated to improve the understanding of the problem. One begins with an analysis of the objectives, alternatives, and constraints and then performs a risk analysis. The prototype is targeted for the areas of highest risk. On the basis of the outcome, additional requirements are identified. If necessary, plans for another cycle are made. Prototypes are constructed until the problem is understood; the final activity is the construction of the operational product using a waterfall-like sequence of phases.

In these two examples, the prototype is used as an analysis tool to refine the specification so that the product can be built in a top-down process, sometimes referred to as decomposition or step-wise refinement. It also is possible to construct a system without waiting for the definition of the full specification. This approach is called composition. With composition, one completely models selected portions of the whole. The process is completed when all components have been modeled. (In contrast, decomposition begins with the entire system, and one divides the system into partially modeled components. Each component is again decomposed until all components are complete.) It is argued that because composition reflects what is known about portions of the whole, the resultant system will come closer to matching the reality of the environment. Consequently, the software should be easier to maintain. With decomposition, advocates of composition state, one is forced to make system level decisions very early in the process when the problem is least understood. Structural flaws may result, which will make evolution very costly [Came86].
Figure 1.6 The Boehm spiral model of software development and enhancement. (© 1988, IEEE, reprinted with permission.) [Figure labels: cumulative cost; progress through steps; determine objectives, alternatives, constraints; evaluate alternatives, identify, resolve risks; review; plan next phases; develop, verify next-level product.]
Notice how composition is presented as a cognitive activity. One defines things as one understands them; the sequence of definitions is not very important. The requirements (i.e., specification) emerge when the process is complete. Because most composition methods have no clear dividing line between the definition of the requirements and the start of implementation, the full specification may not emerge until the product is available. Therefore, rather than follow the transformations A => S => P, the transformation may become A => P, with S as a byproduct. Said another way, the specification becomes a description of what was constructed rather than a statement of what is to be
built. As I will discuss in Chapter 2, this may be a very reasonable approach for some applications.

The operational approach [Zave84] is a special case of composition wherein one builds an executable specification, which serves as a prototype for examining the behavior of the target product. According to Zave and Schell, there are several differences between an executable specification and a program.

An executable specification should be tolerant of incompleteness while a program need not be. The other differences between the two classes of language have to do with performance and efficiency. Performance constraints are part of a system description in an executable specification, but only as an observable effect in a programming language. Also, a programming language must be automatically compilable into an efficient implementation on one type of runtime configuration, while an executable specification language must be translatable into efficient implementations in all possible runtime configurations. [ZaSc86, p. 323]

In summary, the specification must establish the behavior for all implementations, while the programming language represents a solution for a specific environment. For example, one can specify the behavior of a chess-playing program in terms of valid moves and endgame outcome without providing any information about how (or if) the program will be implemented. All chess-playing programs must satisfy the specified behaviors.

Once a prototype has been demonstrated in the form of an executable specification, the reification process can begin, using either traditional manual methods or behavior-preserving transformations. In the paradigm being developed by Balzer and his colleagues [Balz83, Balz85], there is a division between specification and the optimization required to provide the necessary responsiveness. Specification focuses on what the software product is to do; optimization distributes that knowledge to make the product efficient. This distribution makes the product more difficult to maintain. Thus, the program is maintained at the specification level with transformations used to create an efficient product. During the product's operational lifetime, responses to requests for changes are made at the specification level, and the transformations are reapplied as appropriate. Because the application domain in which Balzer works is complex, human intervention is required to select the transformations necessary to produce an efficient program.
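To make the chess illustration slightly more concrete, here is a hypothetical fragment of such a specification written in Python (my sketch, not Zave's notation). It constrains one behavior, the geometry of a rook move on an otherwise empty board, without saying how a program should represent the board, search, or choose its moves; any implementation's move generator can be checked against it.

    # A fragment of an "executable specification" (illustrative only).
    def rook_move_permitted(frm, to):
        """Specified behavior: a rook moves along a single rank or file, and must move."""
        (r1, c1), (r2, c2) = frm, to
        return frm != to and (r1 == r2 or c1 == c2)

    def some_move_generator(square):
        """Stand-in for any implementation's rook-move generator."""
        r, c = square
        moves = [(r, c2) for c2 in range(8) if c2 != c]
        moves += [(r2, c) for r2 in range(8) if r2 != r]
        return moves

    # Every generated move must satisfy the specified behavior.
    for move in some_move_generator((3, 4)):
        assert rook_move_permitted((3, 4), move)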
1.6
CONCLUSION
Figure 1.7 presents the last of the software process model diagrams. It depicts the process in two views. On the left is shown the time spent in each activity. (Selling the concept consumes either very little or all of the effort.) In this allocation of time, the life cycle cost is divided equally between development and maintenance, and the 40-20-40 rule is used.
Figure 1.7 Two views of the software process. [Left, percentage of effort: selling the concept, 0; statement of requirements, 5; allocation of functions, 5; detailed design, 10; programming, 10; testing, 20; maintenance, 50; total, 100. Right, an emotional quotient (scale undefined) running from FUN to TEDIUM.]
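The left-hand percentages follow from the stated assumptions; the sketch below simply does the arithmetic. The division of the analysis-and-design share among the three early activities is my reading of the figure, not a rule given in the text.

    # Deriving the left-hand column of Figure 1.7 (illustrative arithmetic only).
    development, maintenance = 50, 50              # life cycle cost split equally

    analysis_and_design = 0.40 * development       # 20, per the 40-20-40 rule
    code_and_debug = 0.20 * development            # 10
    integration_and_test = 0.40 * development      # 20

    allocation = {
        "selling the concept": 0,
        "statement of requirements": 0.25 * analysis_and_design,   # 5
        "allocation of functions": 0.25 * analysis_and_design,     # 5
        "detailed design": 0.50 * analysis_and_design,             # 10
        "programming": code_and_debug,                             # 10
        "testing": integration_and_test,                           # 20
        "maintenance": maintenance,                                # 50
    }
    assert sum(allocation.values()) == 100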
The emotional quotient scale on the right suggests the path of undisciplined software production. People tend to maximize fun and minimize tedium. Therefore, the most rewarding approach is one that begins with a selling of the concept (that is, speculating about how computers will solve the problem) and then jumping directly to the programming activity. There are strong incentives for the early production of code; code is a product that can be quantified and presented as progress. Programming also is a rewarding intellectual activity; in fact, the personal computer has elevated it to hobby status. The first wave of software engineering in the early 1970s imposed a structure of discipline on programming. It was obvious that programs would not be useful unless they did what was required of them. This implied the definition of the requirements, their restatement in the form of a product architecture that would meet those requirements, and successive levels of detailing that ultimately led to the program. Such an organization facilitated the management of the large projects that the technology could support. Although the new structure offered a discipline that improved productivity in large and complex projects, a major perception of the on-the-job-trained programmers was that the fun was going out of the work. Emotionally, detailed design was considered a barrier to the start of the programming activity. From the designers’ perspective, testing was an equally distasteful responsibility best left to those who lacked the flair to be designers. And maintenance, that most tedious of assignments, should be a training ground for new programmers; a place for them to pay their dues. Thus, the disciplined organization provided an environment to get the job done, but at a cost.
In the 1980s, interactive programming caused a cultural change. The issues of disciplined programming (as opposed to disciplined development) had been resolved. The new programmers had formal academic training. Structured programming was the norm. PDL replaced flowcharts. But the problems to be solved were changing. Knowledge of what a product should do was dynamic. Users were unwilling to wait years for the full development cycle; their requirements would not remain stable. The types of large projects also changed. Most early projects were well-defined engineering tasks; the major issue was one of making the software produce an effective solution. New problems had humans in the loop. Man-machine interaction was difficult to model; prototypes offered an effective learning device. Finally, the work of Lehman and Belady [Lehm80, BeLe76] and Lientz and Swanson [LiSw80] clearly demonstrated how much of the software process is maintenance.

We are moving away from the waterfall diagram with its associated hardware development models [Agre86]. One direction of research, typified by the LST reification step, emphasizes formal modeling. Although there is a recognition that the specification is subject to change and that the validation obligation is extralogical, the goal is to extend the languages used in specification to preserve the correctness and to model partial solutions. This is a generalization of the HLL concept. Definition of a language is a computer science problem guided by application-class experience. Once the language is available, the application specialist uses it to define his descriptive theory. For example, the Fourth Generation Language (4GL) is used by the analyst to describe the behavior of the program; the system translates this specification into a correct implementation. (Of course, more complex problems may require multiple levels of HLL before an efficient product can be produced.)

The other line of research builds on what I have described as conceptual modeling. However, there is a major difference between what I have called conceptual modeling and most of the research under that title. In the essential software process model, the conceptual line represents the definition of what the product is to do. The example of Barstow's automatic programming system showed how a tool could manage all the necessary knowledge so the final program could be created by using only the domain specialist's conceptual understandings. Most conceptual models today, however, are independent of the target application domain. Most have no mathematically constructed foundations that can be mapped onto a formal model.

In both the conceptual and formal modeling lines, current research tends to be bottom up. We are concerned with higher level representations for portions of the implementation and the transformations of the specification constructs into operational algorithms. Most of the study of automatic programming and the psychology of programmers focuses on the cliches and plans used during detailed implementation [Wate85, SoEh84]. Only a few AI projects attempt to provide an intelligent, domain-specific tool that links a conceptual solution to an implementation.

There are several reasons for this concentration on the form of the implementation. First, and most obviously, it is the domain of the computer
scientist. Second, by historic accident, this is where the application of computers began. (In the early days of the PC revolution, for example, computer literacy was equated with an understanding of BASIC -- a reassertion of the premise that to understand computers, one had to program them.) It was only by abstracting out the domain complications that the problems became tractable. Yet the software process begins in the application domain; the formal part of the process cannot begin until the conceptual models exist. Brooks puts it as follows: I believe the hard part of building software to be the specification, design, and testing of this conceptual construct, not the labor of representing it and testing the fidelity of the representation. [Broo87, p. 11] What I have done with TEDIUM is to address this "hard part." The following chapters describe how an environment can be used to improve the process of specifying, designing, and testing that conceptual construct. The goal of TEDIUM is to mask the details of implementation by offering an environment in which it is easy to build descriptive models. The users of this environment are assumed to be professional system developers (software engineers). The products are assumed to be within the state of the art; that is, they extend prior experience. Given these two limitations, I believe that TEDIUM offers insights into how a shift to an application orientation can lead to an order of magnitude improvement in productivity.
Chapter 2 A Philosophic Framework
2.1 INTRODUCTION
This chapter offers a context for understanding TEDIUM. It is divided into two parts. First, there is a presentation of philosophical assumptions, better termed biases, that have guided the development of TEDIUM. Some of these concepts have matured after TEDIUM, and they serve as a rationale for choices already made. The second part of the chapter builds on the analysis of the software process given in the previous chapter. It proposes general criteria for software improvement. In subsequent chapters, TEDIUM is described as an instance of an environment that meets many of these criteria. The material in this chapter is fragmented. The topics touch on some philosophical issues that help bridge the gap between what TEDIUM is and how I see it contributing to an understanding of the software process. Unfortunately, I am not yet ready to present a coherent, unified view of the latter. Nevertheless, I have chosen to offer my views here because I think that we seldom step back and question what we are doing. In this sense, the discussion provides a context for an evaluation of TEDIUM. Much of what I say has been said elsewhere, better and in more depth. Of course, any questioning leads to controversy, and computer science is built on deeply held, occasionally emotional beliefs. Therefore I conclude this introduction with McCulloch’s admonition, "Don’t bite my finger -- look where I’m pointing."
2.2 SOME PRELIMINARY ASSERTIONS AND BIASES
In this section I address four topics. First I consider to what extent we can rely on scientific enquiry to establish or certify theories about the software process. The essential software process involves two types of modeling. The conceptual models are constructed by subjective reasoning, and their unambiguous analysis is inherently difficult. The formal models, on the other hand, have criteria to preserve correctness. The key issue is how the
software process can combine the formalisms with a judgmentally founded universe. Next I consider the matter of knowledge representation. At one level, systems analysis can be considered the formalization of knowledge for a particular domain. System design and implementation, then, can be viewed as the transformation of that knowledge into an operational representation. The term knowledge representation usually is reserved for artificial intelligence (AI) applications, and the subsection considers how (and if) TEDIUM relates to AI research. The third subsection presents a brief overview of what we know about human information processing. This is of interest because TEDIUM provides an environment for analysis and design; consequently, it should be organized to complement human abilities related to those tasks. The section is sketchy, but it serves to identify some essential characteristics for an environment such as TEDIUM. The final subsection borrows its title from Dijkstra's famous GO TO letter [Dijk68] and justifies my disregard of diagrammatic design tools. It is fair to state here that I didn't have the hardware to produce graphics during the early stages of TEDIUM's development; perhaps my arguments are simply "sour grapes." Nevertheless, I explain why I believe that a reliance on pictorial techniques can be counterproductive.
2.2.1 The Limits of Scientific Investigation in Software Engineering
I do not believe that there are "right answers" to the question of, "How should software be developed and maintained?" I think there are only "wrong answers" and that progress is made when we recognize and avoid those wrong answers. That is, there is no truth, but falseness can be identified; one can refute but not prove [Popp59]. This bias may seem negative and unscientific. It is broadly accepted that we need more of a scientific foundation for software engineering. I concur, but I also fear that we may be unrealistic in our expectations. First, the software process depends on a sound understanding of the application domain; this knowledge and experience is a necessary precondition to success. Second, as the conceptual modeling line in the essential model suggests, much of the software process involves cooperative problem solving. Therefore, it is based on behavioral rather than mathematical science; the process must rely on art and craft as well as science. A scientific theory can be regarded as a formalism that rejects a very large number of wrong conclusions by offering an explanation for a phenomenon. Kuhn, in his study of scientific revolutions, stated that every explanation is fixed within a paradigm that provides a context for understanding the problem [Kuhn70]. Scientific discovery, he continues, can be viewed as puzzle solving within that paradigm. The paradigm provides a framework for understanding what the problem is and how it may be solved.
In time, Kuhn states, the problem domain grows, and the paradigm is found to be inadequate. Not all questions can be answered satisfactorily. He calls the result a paradigm shift, which produces a new context for re-explaining the old while also offering an interpretation for the as-yet unexplained. Sometimes, the old theories are rejected (Copernicus replaced Ptolemy); at other times they are refined (Einstein augmented Newton). In this Darwinian view of scientific progress, the better theories may not be correct. They are better simply because they offer more complete explanations in the current paradigm. I find it useful to view science (and formalism) as a filter that removes wrong answers. This perspective has one major advantage. It separates reality from the models of reality. Science produces only models. Models become tractable by obscuring unwanted details with abstraction. The theories are valuable when they mimic some aspect of reality. But reality is holistic; it can support many contradictory reductionist theories. Furthermore, science, which involves the building and testing of theories, is a human activity. Consequently, the resultant theories are filled with human artifacts. The power of the scientific method is that it eliminates theories that are patently wrong within the current paradigm and the limits of human reasoning. Mahoney has written about self-deception in science [Maho86, Dick86]. He argues that "the psychological processes powerfully influence and, in many cases, constrain the quality of everyday scientific inquiry. Moreover, most of these processes are so familiar and pervasive that they escape the conscious awareness of the scientific investigator." [Maho86, p. 1] Citing his own work and that of others, Mahoney shows that scientists seldom try to disconfirm their theories, that papers with identical experimental procedures but with different results are rated as methodologically superior when the results are positive, and that the use of self-citations has a positive effect on paper acceptance by journal reviewers. He concludes, "My recommendation is not that we strive to eliminate the human component in science — which, besides being impossible, would be epistemically disastrous — but that we more openly and actively study our own enquiry process." [Maho86, p. 15] Not only does man bias the results of his analysis, but there also are many scientific questions that cannot be approached in a rigorously formal manner. Consider Gardner's observations about modern psychology. In my view, its methodological sophistication is among psychology's proudest achievements, but it is an attainment that has not yet been fully integrated with the substance of the field. Many of psychology's most important issues need to be approached from a molar perspective and entail a top-down perspective. And yet the most rigorous psychological methods often are not appropriate for these large-scale issues. [Gard85, p. 98] Moving closer to the software process, Brooks, addressing the computer-human interface community, states that there is a tension between
o  narrow truths proved convincingly by statistically sound experiments, and

o  broad truths, generally applicable, but supported only by possibly unrepresentative observations.
Some of us are scientists, insisting that standards of rigor be applied before new knowledge enters the accepted corpus.... Others of us are systems engineers, forced to make daily decisions about user interfaces and seeking guidance from any previous experience whatever. [Broo88, p. 2] The context of this citation is the tension between two computer-human interface communities, but the observation can be extended to the software process in general. Narrow truths in software engineering are the formalisms to which Turski referred. The specification is accepted as a theory, and the formalisms ensure that the product will be a correct model of that theory. This kind of proof gives us confidence in our compilers, but it says little about our programs or specifications. There also are broader truths, some of which were reported in Chapter 1. For example, individual productivity is an inverse function of project size, most errors are design errors, and the cost for product maintenance far exceeds the initial development cost. Unfortunately, the narrow truths are independent of the real world complexities of software development, and the broad truths are not subject to the critical evaluation of a narrow truth. Thus, we must learn to live with two facts. Only small parts of our universe can be measured precisely, and ruling out errors can be a useful compromise to finding a truth. For example, consider a postcondition on a program to sort an array A[i] in ascending order. The following is a natural statement of the expected outcome:

    i > j => A[i] >= A[j].

This statement is a necessary but not a sufficient postcondition. It is satisfied by the "sort" program

    A[i] := 0 for all i.

Although it may be difficult to establish the sufficient postconditions, it still is useful to know that the first test can reject faulty programs. In fact, in domains where there is limited experience, it is doubtful that the sufficiency conditions will be immediately apparent. Moreover, where there is uncertainty regarding the validity of the theory (specification), sufficiency conditions may have little utility. (For more on this subject from a different perspective, see [Fetz88 and Edit89].)
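To make the distinction concrete, the following sketch (my own illustration, not from the text; Python is used only for convenience, and the helper names are hypothetical) applies the ordering test alone and then adds a permutation test. The degenerate program that sets every element to zero passes the first test but fails the second.

    from collections import Counter

    def satisfies_ordering(after):
        # The necessary postcondition alone: i > j implies after[i] >= after[j]
        return all(after[k] <= after[k + 1] for k in range(len(after) - 1))

    def satisfies_sort(before, after):
        # A sufficient postcondition also requires the output to be a permutation of the input
        return satisfies_ordering(after) and Counter(before) == Counter(after)

    data = [3, 1, 2]
    zeroed = [0] * len(data)                      # the faulty "sort" A[i] := 0 for all i
    print(satisfies_ordering(zeroed))             # True  -- the ordering test cannot reject it
    print(satisfies_sort(data, zeroed))           # False -- the permutation test does
    print(satisfies_sort(data, sorted(data)))     # True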
If I take such a weak stand with regard to correctness, then what are my feelings about quantification and metrics? DeMarco states that if you can't measure it, you can't manage it [DeMa82]. Gilb has techniques to measure most software properties [Gilb88]. User engineering focuses on quantifying the user attributes of a system. In each instance, the measurements are independent predictors that serve as guidelines for determining project status, product utility, etc. Measurement is an essential management activity, but, as DeMarco and Gilb clearly indicate, the metrics must reflect local needs and constraints. There are no software process universals in the sense that they are found in physics. In software engineering, we cannot measure properties using narrow truths. Our metrics are designed to apply previous experience, and they are derived from management and project data collected within a fixed process model. These metrics describe the process model and not the essential software process. As Curtis and his MCC colleagues observe, the process models associated with the metrics "focus on the series of artifacts that exist at the end of phases of the process, rather than on the actual processes that are conducted to create the artifacts." [CuKS87, p. 96] I see the software process as an activity so complex that I doubt it can be understood with any scientific rigor. Even in well-defined, controllable domains that can support repeatable experiments, there are limits to scientific enquiry and the ability of humans to manage the scientific process. Science is dynamic, and the perception of progress is not always real [Kuhn70, GaI188]. Therefore, I believe that the process will always be soft. We must rely on our instincts for authority. Of course, science does play a role in the software process. Turski's statement, which links progress in programming to successful formalization, suggests how I view the separation between the conceptual and formal models. Clearly, computer science must focus on the formal approach. Yet the process of developing software is a problem-solving activity in which one translates a need into a software solution. Application domain knowledge is critical, and the activities are primarily cognitive and social. Like narrow truths, the formalisms are indispensable, but their role in the process is restricted. This observation about our ability to quantify precisely is reflected in the title of the Curtis paper [CuKS87], "On Building Software Process Models under the Lamppost." What we can measure, we may not be interested in, and what we are interested in we may not be able to measure. While this is not a very cheerful admission from one who would like to consider himself a computer scientist, it does at least warn the reader that everything that follows (including the quantification) is derived from the author's judgment. I make no pretence about the correctness of my assertions. I only assert that what I say is not wrong. Perhaps a better way to conclude this discussion is to return to Brooks, who proposes three nested classes of results:
Findings will be those results properly established by soundly designed experiments, and stated in terms of the domain for which generalization is valid.

Observations will be reports of facts of real user-behavior, even those observed in under-controlled, limited-sample experiences.

Rules-of-thumb will be generalizations, even those unsupported by testing over the whole domain of generalization, believed by the investigators willing to attach their names to them.

Each of our [human-computer interaction] conferences or journals should accept some reports of each kind.... The appropriate criteria for quality will differ: truthfulness and rigor for findings; interestingness for observations; usefulness for rules-of-thumb; and freshness for all three. [Broo88, p. 2]

With this stratification, the experience with TEDIUM is best considered an observation.
2.2.2 On Representing Knowledge
My view of the software process is that it is one of knowledge capture and representation: knowledge of the specific application constraints, the application class, the software tools, and the specific implementation environment. Common usage implies that knowledge processing involves the application of AI, but, as will be explained, what I am doing with TEDIUM differs from the research in AI. For some time I have searched for a good definition of knowledge [Blum85b]. Machlup distinguishes between knowledge and information by suggesting that information is acquired by being told, whereas knowledge can be acquired by thinking. In this sense, the former is a process and the latter a state [Mach83]. By analogy, one can map this distinction between information and knowledge onto the three types of memory use: direct recall, mental images, and inference [Norm82]. The first two represent information, that is, the direct recall of previously stored representations. Inference, which provides a mechanism for creating information, epitomizes knowledge. When one ignores implementation dogma, one finds that information and knowledge are both symbolically stored as data in a computer. The key characteristic of knowledge, however, is not its representation; rather, it is its use. Newell speaks of a knowledge level that uses the principle of rationality as the law of behavior [Newe82]. An agent at the knowledge level has actions, a body of knowledge and a set of goals. The body of knowledge is represented symbolically at a lower level, and it intermixes both what is known and how to achieve the desired behavior in meeting the goal. That is, Representation = Knowledge + Access. Of course, this is a view with an artificial intelligence perspective.
I am indebted to Jack Smith for his instructive comments in the revision of this section. Any errors in fact or judgment that remain represent advice not taken.
One also may view the algorithm as an expression of knowledge. It represents a form of expression that is computationally efficient for a well-understood transformation. The disadvantage of the algorithm is that it can be used in only a very restricted context. Of course, the same is true of the production rule; it represents a transformation that will be valid only in the presence of a static set of concepts. The major difference between these two representations is that in the algorithm the domain and access knowledge are bound together, whereas with the production rule and its inference engine there is an attempt to separate these two classes of knowledge. That is, there is a trade-off between computational efficiency and generality of use. From the perspective of the software process, I think that a non-AI view of knowledge is the most helpful for the near term. The focus ought to be the capture of what we need to know rather than on how we can process that knowledge. Once we have identified the desired knowledge, it turns out that we have the technology to reuse much of it automatically. There are many AI techniques that can help in this process, but I feel that it is premature to attempt building large-scale applications using an AI paradigm. Thus I concentrate on identifying what we need to know and how to express it unambiguously. Once this has been determined, the use of this knowledge (or information) can be rather straightforward. To illustrate the transition that I have in mind, I present an overview of the history of computer applications in medicine. This evolution serves as a model for computer applications in general, and I would expect to see a parallel description for the history of computer tools. I organize the applications by the objects that they process. Three categories are identified.
Data. The individual items made available to the analyst.

Information. A set of data with some interpretation or value added.

Knowledge. A set of rules, formulas, or heuristics used to create information from data and information.
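A minimal illustration of the three categories (my own sketch with hypothetical clinical values, not an example taken from the text): the raw measurement is data, the rule that interprets it is knowledge, and the interpreted record is information.

    # Data: an individual item made available to the analyst
    temperature_c = 39.4

    # Knowledge: a rule (heuristic) used to create information from data
    def interpret_temperature(value_c):
        # the 38.0 threshold is a hypothetical value used only for illustration
        return "febrile" if value_c >= 38.0 else "afebrile"

    # Information: the data with an interpretation (value) added
    report = {"temperature_c": temperature_c,
              "interpretation": interpret_temperature(temperature_c)}
    print(report)    # {'temperature_c': 39.4, 'interpretation': 'febrile'}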
Table 2.1 shows the progress in medical computing for applications in each of these three categories. Within a class of application, research begins only after the supporting technology is mature enough to carry it beyond the conceptual level. Once the underlying assumptions are established, prototypes are used to generate operational feedback. Finally, some of the prototypes mature, and new, complementary applications are disseminated.
Application Type    1950s       1960s       1970s       1980s
Data                Research    Prototype   Mature      Refined
Information         Concepts    Research    Prototype   Mature
Knowledge           Concepts    Concepts    Research    Prototype

Table 2.1 Scope of Medical Computing
For medical applications, the data-oriented products went from the signal processing and data recording tasks of the 1960s to the current systems for diagnostic imaging, implantable devices, and physiologic monitoring. The information-oriented applications have expanded from simple reporting and communication systems, and knowledge-based systems are just beginning to be used in clinical settings. When one examines the information and knowledge-based medical applications, one sees that there is a blurring of boundaries [Blum86e]. Many modern information systems rely on a knowledge base to make recommendations, raise alerts, and interact with the clinicians. AI applications designed for operational use also depend on traditional data management tools to maintain the patient data. What separates the systems is not the implementation environment, the programming language used, or even the complexity of the data structures that represent the knowledge. Rather, it is the product's objectives or the research goals. The research interests of TEDIUM have little to do with AI. A common AI objective is to have programs exhibit intelligent behavior in some domain, often by simulating some aspect of human behavior. (Some view this as being one step toward a more ambitious objective of developing an artificial intelligence; others are content in simply expanding the range of application function.) My intent, however, is to expand the scope of the information-oriented tools by using knowledge about the application to be developed, the application class under consideration, and the software tools available for the implementation. Obviously, wherever computational methods developed as the result of AI research are appropriate, they will be used. Nevertheless, the primary direction of my work with TEDIUM is the collection and organization of knowledge that restricts the solution space for a class of application problem. Clearly, for any problem, there will be an infinite number of valid solutions that share the same essential properties. The task of TEDIUM, as I see it, is to offer a representation for those essential properties that partitions the solution space. All valid solutions in the solution space will exhibit these properties, and any solutions that do not will be excluded. Although the process is stated in terms of knowledge representation, TEDIUM does not provide any intelligent support in the AI sense. Traditional data capture tools are used to record what is known about the problem and its solution. Procedural inference mechanisms organize the knowledge into an efficient implementation. TEDIUM does not complement the designers' more complete understanding of the application domain; it simply combines the behaviors implied by the knowledge already available to it. In summary, TEDIUM builds on what researchers in AI have discovered about knowledge representation. Its implementation, however, is based on a database paradigm. TEDIUM makes no attempt to exhibit intelligence. It is really like the clinical information systems that have access to medical knowledge; they make simple inferences and display the results for action by the decision maker. In medicine, the knowledge is complex and there is
always uncertainty; consequently, computers can play only a limited role. As I will show, however, the application of knowledge for the automation of software generation is a much simpler task.
2.2.3 Human Information Processing
The argument so far has clearly indicated that human judgment and the representation of human knowledge are two key elements in the transformation from the recognition of a need to the installation of an automated solution. Therefore, it will be useful to review some facts about human information processing that have guided my understanding of what the TEDIUM environment should provide. It already has been stated that the software process involves problem solving in two domains. For all but the most trivial of applications, the process will extend for periods of months and years. For significant developments, many people will be involved and interpersonal communications will be necessary. Consequently, as the discussion in section 2.2.1 suggests, a thorough understanding of the human aspects of the process may be impossible; it is just too complex to evaluate. Yet, if we are to build an environment, we must have some model of how people will interact with it. Newell and Card categorized the theories of human actions with reference to a time scale [NeCa85]. Above the neural and biochemical level is the level of psychological theory. The temporal granularity for these phenomena is measured in a range from tenths of a second for a cycle to tens of seconds for a unit task. As Gardner suggests, this is a bounded domain that can be studied in some detail. In fact, the information processing theory of [NeSi72] and the keystroke model of [CaMN83] are examples of this kind of psychological theory. Although these theories offer considerable understanding of what is going on during the problem-solving tasks, they offer limited insight into the design process as it extends over longer periods. My interest, of course, is the design process, and here the theories will be difficult to evaluate. I begin with the discussion of the better understood psychological theory because it provides a foundation for ruling out false higher-level theories. The ultimate interest is in what Newell and Card call bounded rationality theory, which involves tasks that may extend for minutes or days, and social and organizational theory, which functions in a time scale measured in weeks, months, years, and decades. Card, Moran and Newell have constructed a model of the human processor that offers a framework for understanding and evaluating human-computer interactions [CaMN83]. The Model Human Processor consists of three systems. The Perceptual Processor receives visual and auditory inputs and enters them into a store; the Motor Processor takes commands and translates them into motor actions; and the Cognitive Processor operates on working memory (including the visual and auditory image stores) and generates commands for the Motor Processor. The duration of most actions is a fraction of a second, and the major use of the model is in the analysis of short-term actions such
as keystroke entry or the response to a visual alert. However, the model of the Cognitive Processor also provides insight into the action of longer-term activities. There are two classes of memory. Short Term Memory (STM). This holds the information under current consideration; it is the working memory that interacts with the Cognitive Processor. There are a limited number of active STM processes (3 to 7), and — without rehearsal (reactivation) — their contents have a half-life of from 7 to 30 seconds. Long Term Memory (LTM). This stores the knowledge for future use. LTM is considered to be nonerasable and unlimited in size. Retrieval from LTM is instantaneous, but the storage of new information with its associated links takes several seconds. The units managed by memory are called chunks. A chunk is a symbolic entity (or pattern) that generally is expressed in terms of other chunks. Because STM is relatively small, the Cognitive Processor continuously activates chunks from LTM, which replace chunks in STM. The result is that there are bounds on what can be retained in STM during a short period. For example, when one is asked to listen to and then repeat a string of random digits, the ceiling on a correct response is seven plus or minus two digits [Mill56]. This is not a function of intelligence; rather, it is related to the number of STM chunks available and the time it takes to activate LTM. Of course, it is possible to repeat longer strings when the strings are expressed as chunks. For example, the string CBSIBMRCAUSA is in reality only four chunks, CBS IBM RCA USA. This illustration provides a clue for a representation of LTM, which can be viewed as a network of related chunks accessed associatively from the contents of STM. The chunks contain facts, procedures, history, etc. We retrieve chunks from LTM that are "closely related" to chunks in STM. We navigate through LTM by going from chunk to related chunk. For example, in one scenario CBS makes one think of a television network, which makes one think of a TV set, which makes one think of the room in which the TV set is located, which makes one think of .... This is a common association game for children. In actual problem solving, the chain of reasoning occurs quite rapidly, and it is very difficult to reconstruct the symbolic sequences applied. Moving to time frames too long for psychological theory, LTM is used in the execution of tasks extending from tens of seconds to minutes. Because the contents of STM are very perishable, the implication is that a task requiring minutes to perform will involve multiple retrievals from LTM. As
already suggested, the chunks that can be activated from LTM will be restricted to the current context of STM, i.e., the LTM chunks associated with chunks in STM. Because STM is small, the number of associative selections also will be limited. The selection of chunks for retrieval from LTM is goal-based. That is, only chunks that contribute to the problem (or subproblem) currently being considered will be retrieved.
Thus, a "train of thought" will be directed by a goal. Once chunks are established in LTM, they may used without reference to the events leading to their creation. For example, in computing the area of a rectangle, we have two choices. We may use the formula A=L W without reapplying the deeper reasoning process of deriving the formula. Alternatively, we may rederive the formula and apply it. When the goal is to compute the area and when we have constructed (learned) a procedural chunk that encapsulates the formula, we naturally select the former path. In an information-hiding sense, we would say that the chunk represents an abstraction that we can apply without reference to its derivation. It is this ability to instantaneously retrieve chunks from a very large LTM without rejustification that makes the human information processor so powerful. Skills are required for actions that extend over hours or days. A skill is a learned activity that we can conduct in parallel with other learned activities. For example, we can drive an automobile, listen to the radio, and read road signs concurrently because we are skilled in each activity and do not have to manage them consciously. But the new driver is easily distracted, and the lost driver will concentrate on reading road signs. Once a skill is learned, it is difficult to manage consciously; for example, one may stumble when trying to concentrate on how to walk up a flight of stairs. One can think of learned skills as patterns (chunks) that help in LTM recall. They speed the processing of learned tasks and bypass conscious information processing. That is, they let us operate with abstractions without requiring recourse to first principles. Learning may be viewed as an increase in the number of available patterns (chunks) in LTM. In experiments, chess masters were shown chess boards with legal situations and then were asked to reconstruct them. Their ability to recall was significantly better than novice players. When shown illegal chess boards, the masters’ responses were no better than those of the novices; there were no patterns that could be used to reduce the number of chunks. The same experiments have been repeated with programmers using real and scrambled programs; they produced similar results [Shne80]. Learned skills and knowledge arc stored in long term memory in schemata that provide the organization for retrieving and recognizing patterns. Studies of how students learn science, for example, show that learners look for meaning and will try to construct order even in the absence of complete information [Resn83]; they seek to find a way to append the new information onto their existing schemata. Consequently, naive theories will always be formed as part of the learning process; understanding will always rely on relationships to established knowledge in the schemata. Information isolated from these structures will be lost or forgotten. Thus, all learning is carried out in the context of current perceptions.
Problem solving involves recall and selection from learned patterns. The chess master experiments suggest that experience produces more stored patterns, which implies that there will be a higher probability that the "best" patterns will be recalled for a specific situation. Studies of how clinicians perform medical problem solving offer further insight into the process, which I illustrate with the hypothetico-deductive model constructed by Elstein and his colleagues [E1SS78]. The model contains four steps.

Cue acquisition. Taking a history, performing a physical examination, reviewing test results, etc.

Hypothesis generation. Retrieving alternative hypotheses of the diagnosis from long term memory.

Cue interpretation. Considering the data in the context of the hypotheses previously generated.

Hypothesis evaluation. Weighing and combining the data to determine which hypotheses are supported or ruled out.
The process is iterative. It may result in a decision that more data are required (i.e., that a test should be ordered), that a probable diagnosis (and associated therapy) is indicated, or both. In analyzing the model, researchers found that the generation of early hypotheses had considerable natural force; medical students generated early hypotheses even when asked to withhold judgment. The number of hypotheses is usually around four or five and appears to have an upper bound of six or seven. The generation of hypotheses was based more consistently on single salient cues rather than on combinations of clues. Very few cues seemed to be used, and hypothesis selection was biased by recent experience. Also, cue interpretation tended to use three measures: confirm, disconfirm and noncontributory; the use of a seven-point scale had no greater explanatory power. Finally, researchers noted that lack of thoroughness was not as important a cause of error in diagnosis as were problems in integrating and combining information. (See [AdSo85] for an analysis of the design process.) Shneiderman adds another dimension to LTM by suggesting that (in the context of computer user behavior) there are two types of knowledge: syntactic and semantic [Shne87]. Syntactic knowledge is varied, device dependent, and acquired by rote memorization (e.g., the symbol used by a programming language as a statement terminator). Semantic knowledge, however, is structured, device independent, and acquired by meaningful learning (e.g., the concept of a do while). Syntactic knowledge is easily forgotten, but semantic knowledge is stable in memory. Consequently, tasks are best learned and retained when presented in a semantically meaningful way. Shneiderman uses the term direct manipulation for human-computer interaction tasks in which the action and its intent are directly linked (e.g., word processor use of the cursor and delete keys). One may think of these actions as being related to existing schemata and therefore natural.
This brief survey suggests the following model. The human information processor has a small working memory capacity. The working memory processes symbolic chunks. Its contents decay rapidly, and it must be either rehearsed or activated from long term memory. Activation is associative and is guided by the current context (goal). The hypothetico-deductive model suggests that few cues are processed in combinations and that a three-point evaluation scale is satisfactory. This is consistent with the statement that the reasoning in short term memory deals with abstractions; that is, it relies on surface pattern matches and does not reason from first principles. The power of the human information processor lies in its vast store of long term memory with its many links for recall. The weakness in this mechanism is that there is a strong force to retrieve chunks even when they may not be appropriate for the desired goal. For learned skills, selection of the desired chunk is automatic. (In fact, well-learned tasks can be carried out in parallel.) During learning, however, current context influences the response. When learning new information, chunks are stored to match patterns already in the schema. Where the existing schema is incomplete, naive theories are created to integrate the new material. As experience grows, the schemata are reorganized and more and better patterns become available for skilled activities and problem solving. Unfortunately, from the perspective of the software process, most of the skills relate to tool use, general domain understanding, communications, etc., whereas most of the critical decisions are made during the process of learning about the problem and its solution. The number of active tasks that can be considered at one time is quite small, and the processor is subject to information overload [McDo76]. In a time-sensitive situation we become saturated with the information presented and cannot access all the knowledge available to us. Fortunately, processing continues in the subconscious (with an activity called incubation), and solutions to complex problems often emerge days after the initial problem is identified. In fact, much of the process of software design seems to involve working out detailed problems using the conscious application of learned skills in parallel with a subconscious examination of deeper problems. There is much in the previous discussion that is oversimplified and speculative. Analysis of the design process is a difficult task [GoPi89], and I clearly am not an expert in this field. Nevertheless, I believe it useful to have some model of how people design systems and the kinds of problems they may encounter. It seems to me that the human information processor is quite adept at working with independent abstractions without concern for their consistency or reification. Information is learned and retained only as it relates to the existing schema. Naive (false) theories may be generated to accommodate this. Time pressures may limit the critical evaluation of these theories. The goal of TEDIUM, therefore, is to assist in the unambiguous and consistent recording of what is known about the application to be developed and to rely on human inspiration for the rest.
2.2.4 Diagrams Considered Harmful
Thus far I have concentrated the discussion on symbolic representations and not on the issues of imagery and color. Modern brain research has shown that the two brain hemispheres support different functions [SpDe81]. The left hemisphere is more adept at generating rapidly changing motor patterns, processing rapidly changing auditory patterns, and other forms of sequential processing. The right hemisphere, by contrast, is more effective in simultaneously processing the type of information required to perceive spatial patterns and relationships. Thus, knowledge is expressed analytically and holistically. The question asked now is, how can this extra dimension be applied in TEDIUM? I begin by observing that a diagram is a symbolic representation that conveys information. Its strength is that it presents its information by use of multiple dimensions. In some problem domains, there are informationally equivalent sentential and diagrammatic representations, and here the latter have a clear advantage [LaSi87]. In the field of software development, however, there is evidence to suggest that graphics are best suited for presentation and less for programming or manipulation [Solb87]. Indeed, Dijkstra has been quoted (by Parnas) as saying that pictures are useful only when one does not understand something or when one wants to explain something. Clearly, diagrams are effective in those situations. The figures in this book are a testament to such use. The issue, of course, is how diagrams can be beneficial in a closed and integrated development environment. Consider first Orr's entity diagram, which he uses to go from fuzzy thinking to clear thinking [Orr81]. Once the diagram is complete, its contents are recorded in a formal notation, and the diagram is discarded. The diagram represents a cognitive aid that is outside the automated flow; it is a back-of-the-envelope calculation that has no role in the closed environment. This is not to say that its use is bad. It certainly fits Dijkstra's criteria. I simply suggest that it is the diagram's nonformality that makes it a useful device. I assert that it is difficult to represent the conceptual and formal software process models diagrammatically. First, most diagrams rely on oversimplified structures. The most common diagrammatic organization is the hierarchy. There are several reasons for this.

A printed document is always structured in the form of a hierarchy (outline). In fact, there is no way to list any nonsequential collection without repetition, except as a hierarchy.

The implemented software product often is represented as a call-tree, which implies a hierarchical structure.

Mathematical proof (such as exhibited in the reification process described in Chapter 1) requires a top-down (and thus hierarchical) flow.

Analyses of search trees, which trace a (linear) time-sequential flow through an arbitrary search space, are always presented as hierarchies.
Because cognition limits the number of chunks that can be active at any one time, the hypotheses currently under consideration are normally perceived to be arranged in some broader-than and narrower-than hierarchical organization. (That is, small segments of a network are seen as trees.) Hierarchies can be used to represent many useful conceptual properties; inheritance is one common example. On closer examination, however, all of the hierarchies will be seen to be small portions of a larger network. For example, the call tree masks its network structure by repeating calls to utility programs; the power of hypertext is that it overcomes the structural limitations of a linear document. Thus, the real world is complex and multidimensional. Hierarchies allow us to extract views to comprehend individual concepts. Unfortunately, these views obscure the links as they clarify the characteristics of immediate concern. For example, a data flow diagram (DFD) fixes a process node in a single role within a fixed decomposition. The role binds the process to the designer's understanding of the problem. It is difficult to identify the same process in a different context or diagram. Because human memory relies on surface pattern matching, the diagram may make it more difficult to recognize the similarities of (or differences between) nodes in other diagrams. By concentrating on one projection (or view) of a multidimensional space, we may lose sight of alternative, and perhaps more important, associations. A second objection I have to diagrams is that their utility is directly related to their imprecision. A DFD, for example, offers three dimensions of expression in a two-dimensional medium. Each symbol, such as a process bubble or a data flow line, abstracts a concept. This makes it easier for the designer, whose working memory can manage only a limited number of chunks of information, to suppress details. In fact, this is precisely what DeMarco states is an advantage of the DFD. With a DFD, you have the opportunity to absorb the information in parallel; not to read it, but to see it.... And though you may not yet be sure of specifically what is done and how it is done, you know precisely where to look for everything. [DeMa78, p. 95] Unfortunately, there is an imperfect link between the higher level abstractions and the defining details. A bubble labeled "initialize" may be correct at some generic level without differentiating between the initialization tasks to be done and the preconditions assumed. Generally the detail is expressed in a nondiagrammatic representation. The question then becomes one of determining how the semiformal diagrams of the conceptual models should be transformed into a formal model. Most modern design practices rely on diagrams and pictures. Given the volume of our design documents, pictures clearly are better than text alone. Moreover, if we are to use pictures, then standards and conventions for producing and interpreting them are essential. That is why all computer-
assisted software engineering (CASE) tools support diagram building functions. In some methods, such as with the Jackson System Design, several of the diagram forms are isomorphic to pseudocode text. But this is the exception. Most diagrams are inherently informal; they permit overlapping and conflicting models that can mislead. Therefore, the question addressed here is not if diagrams can be helpful. Certainly, the answer is yes. The question is, are diagrams appropriate tools for modeling complex software entities? I do not believe so, and I close the discussion with an example that illustrates how a diagram can be harmful. Examine Figure 2.1, which depicts an arbitrary triangle [Cipr88]. We draw line AD to bisect the angle BAC and the perpendicular bisector, MD, of line BC. These two lines meet at the point D. Finally, we draw lines DE and DF such that they are perpendicular to lines BA and CA, respectively.
Figure 2.1 Diagram of an arbitrary triangle. (© 1988, AAAS, reprinted with permission).
Notice that BM = MC by construction, DM = DM by identity, and the two right angles, BMD and CMD, are equal. Therefore the triangles BMD and CMD are congruent. It follows that BD = CD and that the angles MBD and MCD are equal. Next, we note that DE = DF because a point on an angle bisector is equidistant from the sides. Given that angles BED and CFD are right angles, we have the triangles EBD and CFD congruent. This implies that the angles EBD and FCD are equal. By composition, angles ABC and ACB also are equal. Consequently, because this was an arbitrary triangle, it follows that all triangles are isosceles. (By repeating the proof with another angle, we can even show that all triangles are equilateral.) What has happened in this example is that the diagram is misleading; except for isosceles triangles, point D is below line BC.
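The flaw can be checked numerically. The sketch below (my own verification, not part of the original argument; the coordinates are arbitrary) constructs D for a scalene triangle and confirms that D lies on the opposite side of line BC from A, that is, outside the triangle, which is exactly what the drawn figure conceals.

    # Vertices of an arbitrary scalene triangle (hypothetical coordinates)
    A, B, C = (0.0, 0.0), (4.0, 0.0), (1.0, 3.0)

    def sub(p, q): return (p[0] - q[0], p[1] - q[1])
    def unit(v): n = (v[0] ** 2 + v[1] ** 2) ** 0.5; return (v[0] / n, v[1] / n)
    def cross(u, v): return u[0] * v[1] - u[1] * v[0]

    # Bisector of angle BAC: its direction is the sum of the unit vectors along AB and AC
    u, v = unit(sub(B, A)), unit(sub(C, A))
    d_bis = (u[0] + v[0], u[1] + v[1])

    # Perpendicular bisector of BC: through the midpoint M, perpendicular to BC
    M = ((B[0] + C[0]) / 2, (B[1] + C[1]) / 2)
    bc = sub(C, B)
    d_perp = (-bc[1], bc[0])

    # Intersection D: solve A + s*d_bis = M + t*d_perp for s by Cramer's rule
    rhs = sub(M, A)
    det = cross(d_bis, (-d_perp[0], -d_perp[1]))
    s = cross(rhs, (-d_perp[0], -d_perp[1])) / det
    D = (A[0] + s * d_bis[0], A[1] + s * d_bis[1])

    # Signed side of line BC: opposite signs mean A and D lie on opposite sides
    side = lambda P: cross(bc, sub(P, B))
    print(D, side(A), side(D))    # side(A) > 0 and side(D) < 0 for these coordinates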
I believe that I have made my point. Diagrams can be helpful. That is how we learned to solve problems in Euclidean geometry. But the diagrams fix a view of reality that may be limiting. Just as geometry provided a better model of reality once the parallel line axiom was removed, we must search for formal models that allow us to express the complexity of reality. Perhaps diagrams represent a very promising approach, but, as the experience with TEDIUM demonstrates, there are effective alternatives.
2.3 TOWARD A THEORY OF SOFTWARE PROCESS IMPROVEMENT
Now that some of my personal biases have been aired, I can present an overview of a general theory on how to improve the software process. Historically, computer science has concerned itself with issues relating to the operation and programming of computer machinery. Obviously, we have been very successful. In what follows, I suggest that -- for mature application domains -- it now is possible for us to shift our perspective and concentrate on the application problems that the software is to solve. The remaining discussion examines what we are attempting to do as we develop software. Obviously, software is too broad a term to be used in this analysis, and I will limit the universe to problem categories in which we already have sufficient experience to ensure a viable product. Much of what follows offers insight into what TEDIUM attempts to do; however, I believe that the concepts examined are valid for many other environments as well.
2.3.1 The Problem or the Product
Chapter 1 introduced Brooks' "No Silver Bullet" paper [Broo87], wherein he separates software difficulties into those that are essential and those that are accidental. The essential difficulties of software are its complexity, conformity, changeability, and invisibility. Conformity denotes that the software is expected to conform to the other interfaces, and this adds to its complexity. Invisibility implies that the software is not inherently embedded in space (i.e., it cannot be visualized). All these properties, with the exception of complexity, are related to cognition and understanding. Unlike problem solving in the domain of the physical sciences, there are no constraints that restrict the range of software solutions. Brooks continues in his analysis by identifying some past breakthroughs that have solved accidental software difficulties (i.e., those that attend its production but that are not inherent). The contributions identified are high-level languages, time-sharing, and unified programming environments. Each has improved productivity without affecting the essential characteristics. To gain an insight into the difference between the essential and accidental characteristics, I borrow from Herzberg's study of work motivation [HeMS59]. He identified factors that served to motivate (such as achievement) and factors that, if negative, would contribute to dissatisfaction (such as company policy). The latter he called hygiene factors. Hygiene factors contribute little to motivation when positive, but they inhibit work motivation
when negative. Further, he showed that the motivator factors are related to job content, while the hygiene factors are related to job context. Restating Brooks' observations in these terms, the accidental properties of software can be viewed as hygiene factors. Accidental improvements remove negative effects without affecting the essential characteristics. They address the context of the software process without altering how they deal with its content. The impossibility of finding essential, positive factors that will significantly improve productivity gives rise to the title of his paper. Brooks' perspective is that of the implementation domain (i.e., the realization of the solution). One finds different answers when one examines software productivity in the context of problem solving in the application domain. Here the designer applies domain knowledge and prior experience to effect a solution. Problem solving without prior experience (either direct or by analogy) is very difficult, and I exclude the consideration of such problems. Problem solving in domains in which there is considerable experience, however, can be made much easier if that experience can be reused. Techniques include education, the use of prototypical examples, and the application of specialized tools. When the focus is on the problem solving process in some application domain, two things occur. First, generality is restricted as the approach becomes more domain specific. (This is probably true even when one attempts to avoid specialization.) Second, by specializing, there is an opportunity to create a notation (or environment) in which the volume of the problem statement is reduced. That is, the domain understanding provides a set of defaults for the problem, and only the desired behaviors need to be specified. For example, the 4GLs exist in the data processing application class. The 4GL problem statement details only the application specifics; the rest is supplied by default. Compare this approach to the reification process described in Chapter 1. Assume that a specification exists containing the complete definition of the essential functions that a product should support. Software development then becomes a detailing process in which the essential behaviors of the initial specification are augmented with those details necessary to achieve an effective realization of the requirements. But from an application domain perspective, these details contribute nothing to the product. A projection of the implemented product with respect to the application requirements should eliminate all the details added during reification. Stated differently, the detailing (building) activity ought to add nothing to the desired functionality other than to make it perform. All the details are permissive. This argument has simplified reality. It makes the assumption that the specification is complete and static. Of course, this is seldom true, and the detailing activity often involves application-oriented decisions. In fact, the process would not be possible if designers did not have implementation knowledge or if the implementors had no application knowledge. Nevertheless, the argument serves to underscore the distinction between the conceptual modeling in the application domain and the formal modeling in the
implementation domain (i.e., specifying what the product is to do and creating an effective executable model). Brooks’ analysis of the essence of software is concerned largely with implementation. I agree with him, and I also do not expect major productivity improvements in software development. The goal of TEDIUM, however, is to produce applications of a particular type. Where we have experience with a class of applications, we can develop environments that restrict the problem solving activity to that application domain. With such specialization, there are many defaults to draw on, and the solution space shrinks. The detailing activity is reduced, and some of the essential problems of software become less severe. In other words, there is a transition to the problem to be solved rather than the product that represents the realized solution. The goal of TEDIUM is not to offer an environment for developing software, rather it is to use automation to produce a software solution to an application problem. This presents a different view of software and its essential characteristics. The desired knowledge need only address the problem to be solved. For relatively mature application domains, as I will show, knowledge of the translation to an implementation can be structured for automatic reuse. Thus, for such an application class, the software process reduces to the much smaller task of defining the problem solution in terms of the application domain (i.e., specifying what the software is to do).
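As a toy illustration of this shift (my own sketch; the notation is hypothetical and is not TEDIUM's), the application is described by a minimal declarative statement, and a small generator supplies the routine checking behavior by default, so only the application-specific facts need to be written down.

    # Hypothetical minimal specification: only application-specific facts are stated;
    # everything left unsaid is supplied by the environment's defaults.
    SPEC = {
        "entity": "patient",
        "fields": {
            "id":   {"type": "str", "required": True},
            "name": {"type": "str", "required": True},
            "age":  {"type": "int"},
        },
    }

    def make_validator(spec):
        # The generated behavior (required-field and type checking) is reused by default.
        types = {"str": str, "int": int}
        def validate(record):
            errors = []
            for field, rules in spec["fields"].items():
                if rules.get("required") and field not in record:
                    errors.append("missing " + field)
                elif field in record and not isinstance(record[field], types[rules["type"]]):
                    errors.append(field + " must be of type " + rules["type"])
            return errors
        return validate

    check = make_validator(SPEC)
    print(check({"id": "p1", "name": "Smith", "age": "old"}))   # ['age must be of type int']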
2.3.2 The Concept of Volume
The idea of volume comes from Halstead's software science [Hals77, FiLo78]. The objects of study in software science are individual algorithms. It is assumed that there are intrinsic, measurable properties that can be defined in terms of the operators and operands used in the algorithm's implementation. Some of the properties that have been defined are:

Program length (L). Halstead introduced both a defining measure and an estimate for length. Where the program is "pure" (contains no redundancies or needless constructs), the estimated value is close to a value defined in terms of the numbers of unique and total operators and operands.

Program volume (V). This was defined to approximate the number of bits required to specify a program. By counting operators and operands the effect of character string length is minimized; however, volume is not independent of language used.

Program level (LV). To define program level, Halstead introduced the concept of the most compact, or highest level, representation of the algorithm (V*). For example, V* for the sine of the angle X program in FORTRAN would be Y=SIN(X)
with 2 operators (= and SIN()) and 2 operands (X and Y) for a V* of 8. Program level, a measure of abstraction, then is defined as LV = V*/V.

Language level (λ). Because the language is related to the volume, language level is defined to suggest the inherent level imposed by the language. It was defined as LV·V*.

Effort (E). The difficulty of programming increases as the volume increases, thus leading him to define effort as E = V/LV and programming time (T) as E/S, where S = 18, the Stroud rate.

The evaluation of software science is a continuing topic [CaAg87]; I draw on only the concepts just outlined.
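As a concrete illustration of these definitions, the following short sketch (my own, not Halstead's or the text's) computes volume, potential volume, level, and effort from operator and operand counts; the counts for the sine example come from the discussion above, and the counts for the "longer implementation" are invented for illustration.

```python
import math

def halstead(unique_operators, unique_operands, total_operators, total_operands):
    """Return vocabulary, observed length, and volume (in bits)."""
    n = unique_operators + unique_operands      # vocabulary
    N = total_operators + total_operands        # observed length
    V = N * math.log2(n)                        # volume
    return n, N, V

# Y = SIN(X): 2 operators (= and SIN()), 2 operands (X and Y),
# each used once, so the totals equal the unique counts.
_, _, V_star = halstead(2, 2, 2, 2)             # potential (most compact) volume V*
print(round(V_star))                            # -> 8, as in the text

# A hypothetical, more verbose implementation of the same algorithm.
_, _, V = halstead(10, 8, 40, 30)
level = V_star / V                              # program level LV
effort = V / level                              # effort E = V / LV
print(round(V), round(level, 3), round(effort))
```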
The major difference between software science and the idea of volume to be introduced here is that Halstead’s volume is a measurable property of a coded algorithm. What I call the "problem volume" relates to a system specification; the focus is on the software system and not its implementation. (The problem and not the product). Unfortunately, there is no uniform written representation for the problem that can provide comparability or repeatability across problems. Thus, we are faced with the question of how to measure the volume of a problem. From the number of paragraphs in its specification? By the number of pages in a description? By the number of elements in its data model? Clearly, lacking a formal representation, we recognize that problem volume is a concept rather than a metric. Albrecht’s work with function points can be viewed as a problem volume measure for a specific domain [Albr79]. For data processing applications, it is possible to characterize a product by its outward manifestations: the number of external user interfaces, inputs, inquiries, outputs, and master files. A weighted sum of these factors, called the "function points," estimates the size of the problem to be solved. Contrast this technique with the practice of estimating source lines of code (SLOC) of the product to be delivered. Although both the function points and SLOC metrics are used to estimate the development cost, the former is based on a model of the problem and the latter on a projection of the product. The choice of language has a major impact on the required effort per function point. One multi-application study reported that the code necessary to produce one function point averaged 110 lines of COBOL code, 65 lines of PL/1 code, and 25 lines of DMS (an early 4GL) code [AlGa83]. The function points represent a measure of the problem size, and the SLOC quantifies the product realization. The fact that some languages require a larger statement for the product is an attribute of the language chosen. The selection of a more expansive language does not necessarily imply anything about the problem volume. Note how the concept of problem volume suggested by the function point parallels that of Halstead’s algorithm volume. In both cases, there is a relationship between the representation (language) used and the volume of the object. One can assume that there is an ideal, most compact representation (V*) of the specification such that one can rank representation schemes
according to a language level (λ). Because effort is directly proportional to volume, productivity should improve as we approach V*.

This interpretation of function points is atypical. It has been shown that there is a high degree of correlation between function points and SLOC [AlGa83], and most study this measure in the context of product size estimation [VeTa87]. Some, such as Reifer, have refined the factors for sizing estimates in more complex domains [Reif87]. Because problem size is a good predictor of product size, fixed (and therefore predictable) development procedures are followed. This assumption ensures that changes to the software process will be of the accidental kind. The goal of TEDIUM is to introduce a new process model that begins at the problem level using a representation that is minimal (i.e., close to V*).
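To make the contrast between problem size and product size concrete, here is a small sketch of a function point estimate. The counts and weights are purely illustrative (Albrecht's method also applies complexity adjustments); the SLOC-per-function-point ratios are those reported in the multi-application study cited above [AlGa83].

```python
# Hypothetical unadjusted function point count for a small application.
counts  = {"inputs": 20, "outputs": 12, "inquiries": 8, "files": 6, "interfaces": 3}
weights = {"inputs": 4,  "outputs": 5,  "inquiries": 4, "files": 10, "interfaces": 7}

function_points = sum(counts[k] * weights[k] for k in counts)
print("function points:", function_points)

# Average SLOC needed to deliver one function point [AlGa83].
sloc_per_fp = {"COBOL": 110, "PL/1": 65, "DMS": 25}
for language, ratio in sloc_per_fp.items():
    print(f"{language}: ~{function_points * ratio} SLOC")
```

The same problem size projects onto very different product sizes, which is the point of treating function points as a problem volume rather than a product measure.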
2.3.3
Reducing the Volume
Given the distinction between the problem and the product, one can think in terms of both problem and product volumes. Clearly, the larger the volume, the greater the effort required to effect a solution. The volume for the problem (Problem-V) is derived from the description of the desired system's essential behaviors and the assumed defaults. The product volume (Product-V) is a function of both the implementation details in the target language and Problem-V. Generally, we expect Problem-V < Product-V, and therefore the key to reducing product volume is to consider first the lower bound on problem volume.
If we have a most compact product volume (Product-V*), then we might expect to have Problem-V = Product-V*; that is, as with the sine function, a statement of the function is its most compact representation. Of course, this is less than satisfying. The statement Y = SIN(X) really does not specify the sine function; rather, it symbolically represents a specification that is elaborated elsewhere. It is an abstraction of the sine function.

When we operate in an environment that establishes a context for the target product, we can reduce the volume (and effort) by building on the implicit assumptions of that context. For example, the context of an HLL includes input/output functions. In an assembly language environment, one would have to specify procedurally (or by use of a macro) the desired read/write processing. In the HLL the context provides a baseline for expressing input/output intent. Thus, the product volume is reduced, and that
baseline does not have to be explicitly reexpressed. Naturally, whenever the HLL features do not provide the necessary functionality, one must resort to lower levels of representation. Often we rely on our experience to reconstruct sequences in the target language to accomplish tasks that are not more concisely supported in the target HLL. But whenever we respecify previously established processes, the volume is increased with no net benefit in either functionality or performance. It is as if we had to recode the sine function for each use.

I use the term housekeeping to represent that portion of the problem or product volume that could be eliminated through the use of a better representation. Reducing housekeeping is one way to improve the detailing process. Naturally, not all detailing activities are housekeeping; some details must be added to clarify, improve the interface, optimize, etc. By definition, housekeeping is the elaborated detail that adds no function. If a most compact representation were available, then the housekeeping, H, might be expressed as H = V - V*. Of course, V* seldom exists, and different domains will have different low-housekeeping representations. Nevertheless, the goal is clear. We wish to identify what is housekeeping (the text that adds effort without providing function or performance) and find representations to reduce it. Carried to the extreme, one hopes to find the most compact representation. TEDIUM looks for a housekeeping-free representation for interactive information system applications. If one is found, then it represents a minimal specification.

Many current approaches to productivity improvement can be viewed in the context of volume reduction. Program libraries reduce volume; in fact, one can view the library as being a collection of V* representations for the specified functions. (This is consistent with the sine function example.) The executable specification offers another approach; it can be viewed as a method in which one defines a Problem-V that also serves as an operational Product-V. Unfortunately, the formal specification often is not an intuitively obvious representation for the problem; Problem-V is considerably larger than Problem-V*. Artificial intelligence techniques that build programs from domain or programming knowledge demonstrate a different kind of volume reduction. These approaches enable Product-V to be closer to Problem-V, but as we have seen most projects are still in the research stage. Finally, one can reduce the detailing activity by constraining the size of Product-V through the disciplined application of simpler constructs. Representations such as objects, abstract data types, and generic packages encapsulate knowledge and reduce volume. Mills and Linger have experimented with data structured programming wherein stacks and queues are used in place of arrays and pointers [MiLi86]. The authors observe that these structures seem to increase the function per line, which reduces product volume (see Table 2.2).
Item                    Array Design    Queue Design
Lines                         18              18
Variables                      6               2
Variable references           31              18

Table 2.2 Data Structured Programming, Design Comparisons.
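The following sketch suggests the flavor of the comparison; it is my own reconstruction of a trivial task (removing negative values from a sequence), not an example taken from [MiLi86]. The first version carries explicit index and count bookkeeping; in the second the queue structure absorbs that bookkeeping, which is the effect the table summarizes.

```python
from collections import deque

def drop_negatives_array(values):
    """Array design: explicit result array and index/count variables."""
    result = [0] * len(values)
    count = 0
    for i in range(len(values)):
        if values[i] >= 0:
            result[count] = values[i]
            count += 1
    return result[:count]

def drop_negatives_queue(values):
    """Queue design: the data structure carries the bookkeeping."""
    pending = deque(values)
    kept = deque()
    while pending:
        item = pending.popleft()
        if item >= 0:
            kept.append(item)
    return list(kept)

print(drop_negatives_array([3, -1, 4, -2, 5]))  # [3, 4, 5]
print(drop_negatives_queue([3, -1, 4, -2, 5]))  # [3, 4, 5]
```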
At the problem definition level, one can reduce the volume by building on primitives within an existing context. Similar to the way in which the sine function was specified compactly, one could specify a PC word processor simply as word processor (WP). The context would provide all the essential characteristics of WP (for any implementation of WP). If there were any special features of WP that were not implicit in the specification, then they would have to be explicitly called out. The detailing of such features would not be considered housekeeping because the notation available offered no other way to specify the desired behavior. But if there were a notation WP1, WP2, ... such that WPn specified all the desired behaviors, then any additional detailing description would represent housekeeping. Notice that this illustration assumes that the context of word processor is well understood even though its implementation may be difficult. Here, given the WP specification, a make or buy decision would clearly favor the latter course. Fifteen years ago, this WP specification would have had no meaning; there was no experience base to support the context.

Certainly, most problem specifications will not be as compact as the WP example. Except for a few formal specifications, problem statements tend to be textual, and volume remains a concept rather than a measure. Still, the path to volume reduction at the problem level is clear: one must find representations that express the problem in a succinct (i.e., housekeeping-free) manner that may be mapped directly onto a formal model in the implementation domain. Using the notation of this section, one would hope to establish a process in which Problem-V* ~ Problem-V ~ Product-V* ~ Product-V; that is, the actual representation of the problem is close to the most compact forms of both the problem and product representations, and the actual product representation is close to that which is most compact.

Again, the 4GL serves as an example. It illustrates how one can describe a report with a conceptual model that is isomorphic to a formal declarative representation. Of course, there are many things wrong with the currently available 4GLs. They are pragmatically evolving commercial products built without accepted external standards or conventions; they are limited to a narrow application domain; and they tend to operate efficiently only for applications in which there are few performance limitations. The intent of
TEDIUM is to provide an environment that maps the conceptual into the formal representation without the limitations of the 4GL.

2.3.4

Conceptual Closeness
Reducing the volume of the problem statement may be a necessary condition for improving productivity, but it is not sufficient. Recall that conceptual modeling involves problem solving in the application domain. Also, the quality of any model can be appraised only subjectively. Thus, the conceptual model should have two properties. It should be compact (or of low volume) and it should be clear. In other words, the model representation should be conceptually close to the objects being modeled. I illustrate this concept of closeness by citing from a 1957 paper by Backus.
A brief case history of one job done with a system seldom gives a good measure of its usefulness, particularly when the selection is made by the authors of the system. Nevertheless, here are the facts about a rather simple but sizable job. The programmer attended a one-day course on FORTRAN and spent some more time referring to the manual. He then programmed the job in four hours, using 47 FORTRAN statements. These were compiled by the 704 in six minutes, producing about 1000 instructions. He ran the program and found the output incorrect. He studied the output and was able to localize his error in a FORTRAN statement he had written. He rewrote the offending statement, recompiled, and found that the resulting program was correct. He estimated that it might have taken three days to code the job by hand, plus an unknown time to de-bug it, and that no appreciable improvement in speed of execution would have been achieved thereby. [Back57]

In this example, the problem statement was conceptually close to the 47-statement representation of its solution. The 1000 minus 47 additional assembly language statements would have been housekeeping. They would have added nothing to either the functionality or the performance; they would have served only to increase the effort and introduce a potential for error.

The Backus example demonstrates a benefit of the HLL. As previously noted, this is one of the breakthroughs that Brooks associates with the accidental difficulties of software. He explains the benefits as follows:

What does a high level language accomplish? It frees a program from much of its accidental complexity. An abstract program consists of conceptual constructs: operations, data types, sequences and communication. The concrete machine program is concerned with bits, registers, conditions, branches, channels, disks, and such. To the extent that the high-level language embodies the constructs one wants in the abstract program and avoids all lower ones, it
eliminates a whole level of complexity that was never inherent in the program at all. [Broo87, p. 12]

Brooks adds that the most an HLL can do is "to furnish all the constructs that the programmer imagines in the abstract program." Yet, this is precisely the reason for the success of the 4GL. In the context of database applications and report generation, the 4GL provides a representation that contains all the abstract constructs. Put another way, the form is conceptually close to the problem being solved. Thus the modeling of the application function and the program abstraction share a common notation. Where there are no other problem requirements that must be specified explicitly -- for example, performance constraints -- the 4GL formulation exhibits the desirable properties of a compact volume and a close representation.

Of course, most design methods rely on conceptual closeness. An Entity-Relationship model for a particular application is close to the model of the target organization, just as the 47-line FORTRAN program was close to the mathematical problem to be solved. Unfortunately, the Entity-Relationship model generally is an adjunct to the implementation; it does not readily map onto and remain part of the implementation solution. What we require is not that the conceptual models be close to the problem definition; they would be worthless if they were not. Instead, we must have the product definition conceptually close to the problem definition. This facilitates the mapping of the problem statement onto the product statement, improves the validity of the specification, eases problem refinement when iterative development is used, and reduces the volume of the product specification.
2.3.5
Closing the Environment
In Chapter 1 the essential software process was presented as a series of transformations from a problem need in some application domain into a product that could satisfy that need. A software development environment that supports the entire process can be abstracted as a relation Env with the domain of problems for which a software solution is sought and the range of applicable software products. Because the problems have no realization until they can be expressed conceptually, Env can be viewed as a relation on the conceptual models and implementations: Env(C,I), where
C = {c | c is a conceptual model}, I = {i | i is a software implementation}.
Env is considered a relation because there are many possible implementations ij that can be derived from a given conceptual model c; moreover, a fixed product i may satisfy the requirements of many different conceptual models cj. Because every software development environment can support the implementation of only a limited domain of problems and produce only a limited range of solutions, the relation Env represents a restriction to the set of problems and product solutions that can be managed by that
environment. Within this restriction, I am interested in how the environment supports the realization of some i in Env for a given c. That is, I am concerned with how the development environment aids the design team in going from a conceptual model to its implementation.

Let F = {f | f is a formal model}. Then for each pair (c,i) in Env there is at least one sequence of transformations (functions) T1:C => C, T2:C => F, T3:F => F, and T4:F => I whose composition maps the concept onto an implementation. In particular, Env can be trivially represented as a pair of transformations defined on c to produce an f that is isomorphic to i. But I seek greater granularity than this, and I look to those transformations that can provide the designer with control over the process. Therefore, I define an environment to be closed if and only if the environment supports at least one nontrivial sequence of transformations for each (c,i) in Env.

If one allows an environment to use both automated and nonautomated transformations, then by definition the environment is closed. For every pair (c,i) in the relation, the environment supports transformations Tj such that Env(c,i) can be expressed as a composition of those transformations. (If this were not true, then the developers would have to go outside the environment, which would contradict our understanding of what an environment is.) Because I am concerned with the building of an automated environment, it is necessary to restrict the Tj to transformations that are, in some sense, automated. When this condition is added, the closure property as described above must be modified. The class of conceptual models is reduced to those that can be represented in an automated environment, and the transformations are reduced to those that operate on machine-stored models. The resulting automated transformations can be categorized as follows.

Formally Automated. A transformation is said to be formally automated if both its domain and range are machine-processible objects within the environment and if there exists a program that can manage the transformation without human intervention. An example is a compiler that transforms a source code file into an object code file.

Conceptually Automated. Because the software process begins with some preliminary human analysis before any machine-processible object can exist, there is a need to represent the mapping from an idea to a machine-processible object. I say that the transformation is conceptually automated if the mental conceptualization and the machine-processible representations are very similar. Work with user interfaces suggests the terms semantic and articulatory directness (to indicate that there is little cognitive activity required to map goals and intentions into system states) and direct engagement (to reflect the feeling that the user is manipulating objects from the task environment) [HuHN85]. An example is the use of a CASE tool to draw Entity-Relationship Diagrams, when this is a natural way for the user to define a data model.

Partially Automated. Let equivalent objects be defined as machine-processible objects having essentially the same information content but
for which no formally automated transformation exists that can produce one from the other. Then a partially automated transformation is a map from one equivalent object to another. An example is the use of a data dictionary, created with a structured design tool, to produce a formal DBMS scheme when the environment cannot automatically generate the latter from the former. (Notice that partial automation is achieved by manual transformation; automation of these transformations results in formal automation.)

An environment also may include automated tools that are independent (i.e., not automated in any of the above senses). For example, a word processor may be used to prepare documentation without reference to any of the machine-processible objects described in the documents.
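A small sketch may make the formalism more concrete. The code below is my own illustration, not part of TEDIUM: it models Env as a set of (conceptual model, implementation) pairs, represents the transformations T1 through T4 as ordinary functions over toy string-valued models, and checks the closure condition that each pair is realized by some nontrivial composition of the registered transformations.

```python
from itertools import permutations

# Toy universe: conceptual models, formal models, and implementations are strings.
def t1(c): return c.strip().lower()                 # T1: C => C, normalize the concept
def t2(c): return f"formal({c})"                    # T2: C => F, formalize
def t3(f): return f                                 # T3: F => F, identity refinement
def t4(f): return f.replace("formal", "code")       # T4: F => I, generate code

TRANSFORMS = [t1, t2, t3, t4]

def compose(sequence, x):
    for func in sequence:
        x = func(x)
    return x

def is_closed(env, transforms, max_len=4):
    """True if every (c, i) pair is realized by some nontrivial composition."""
    for c, i in env:
        realizable = any(
            compose(seq, c) == i
            for n in range(2, max_len + 1)
            for seq in permutations(transforms, n)
        )
        if not realizable:
            return False
    return True

env = {("Patient Census ", "code(patient census)")}
print(is_closed(env, TRANSFORMS))   # True for this toy relation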
The goal of TEDIUM is to present a closed environment for the development and maintenance of a specific class of application. I begin by defining representations that are both close to the designers' concepts (in the sense of the previous discussion) and close to some machine-processible representation of knowledge. These representations provide a set of conceptually automated transformations that offers both a tool for conceptual modeling and a mechanism for translating the conceptual model into the formal model. Next transformations must be found that map the conceptual representation into an implementation. TEDIUM's implementation domain is relatively mature, and the translation problem is manageable. The challenge is to find (a) the functions that can transform the machine-processible representations into efficient program code and (b) the changes to the representation necessary to convert partially automated transformations into formally automated transformations. During this process, all independent transformations are eliminated. The guideline is one of having a mechanism for the analyst to specify all desired attributes of the final product so that, once an attribute has been defined, the definition is applied wherever appropriate. In this sense, closure implies that once a property of the desired product has been specified, there are transformations that will ensure that the property is included in every related component of the implementation.

Stated another way, TEDIUM attempts to produce a closed environment by offering a formal model that will be isomorphic to the conceptual models used by the designers (i.e., the mapping from the conceptual to the formal models of the essential software process ought to be the identity). If this can be achieved, then the formal model will be composed as the conceptual understanding of the application is documented. Automated transformations (program generation) will create products from portions of the conceptual (and its equivalent formal) model, which can serve as prototypes to validate and expand the definition of the target product.

The overall objective is to provide a closed development environment that supports the process from initial concept formulation through the creation of an efficient operational product. Figure 2.2 compares this approach with two other development paradigms and shows the percent of the knowledge about
the problem that is retained in an automated form as a function of the processing phase. Three paradigms are illustrated. The first, called the standard development cycle, retains only the necessary code in the computer. This paradigm was the standard approach during a period of high equipment costs and is seldom practiced today. The second paradigm is labeled a nonintegrated adjunct. Here, tools are added to the environment to aid the process. These tools are neither integrated nor formally automated. Nonintegration implies that one can make changes to the tool database without affecting the product description and vice versa. Partial automation implies that the designers must manually transform the tool databases from one automated form to another. The final paradigm, here called fully automated, implies that all knowledge of the target application is maintained in an integrated database and that automated transformations are available to produce a product from the knowledge in this application database. The discussion of TEDIUM should be regarded as an observation (in the sense that Brooks uses the term) with respect to the viability of this paradigm.
Figure 2.2 Three paradigms for software development: the standard development cycle, the nonintegrated adjunct, and the fully automated paradigm.
2.4
CONCLUSION
Although TEDIUM is a product that has been used to develop sophisticated applications over the past ten years, I believe its main value lies in what it tells us about how we develop and maintain software. Some preliminary results are detailed in Chapter 8. To calibrate the data in Chapter 8, one must understand TEDIUM. To understand TEDIUM, one ought to be aware of the context in which it was conceived. This background has now been presented, and I can stall no longer. The next chapter begins the description of TEDIUM.
Chapter 3 TEDIUM, MUMPS and the INA Example
3.1
INTRODUCTION
This chapter presents an overview of TEDIUM, and the three chapters of Part II follow with detailed descriptions of the language, environment, and use of TEDIUM. Because some of the TEDIUM syntax draws on conventions established in the MUMPS programming language, Chapter 3 also contains a brief overview of that language. Finally, there is a short description of a modestly difficult problem cited throughout Part II to illustrate how TEDIUM is used.
3.2
OVERVIEW OF TEDIUM
TEDIUM is an automated environment designed to manage virtually all aspects of a software product's life cycle from initial concept formulation to final retirement. The central core of TEDIUM is a program generator that converts definitions of the product's characteristics into efficient code. To make that generator perform effectively, it is necessary to restrict the range of the products to a class of applications that is (a) simple enough to represent completely at the conceptual level, and (b) sufficiently well understood to ensure the automatic transformation from a conceptual model into an efficient implementation.

During the early days of TEDIUM's development, my wife -- who knows very little of computers -- offered this naive description, "You punch in your qualifications, and it prints out your program." [BlBl83] Close, but not quite correct. TEDIUM's goal is to create systems (applications) and not programs. Thus, a better description is that it is an environment in which (a) the designers define attributes of the target application and (b) the generator creates a system that contains all of these attributes.

The designer begins by entering a set of characteristics that specify the application's functions. Once that definition is available in the database, a complete default application can be generated. (Completeness here implies that
all implicit functions, such as responding to help messages and validating all inputs, have been included in the generated programs.) Additional characteristics can be defined to modify the behavior or improve the performance of the product. Development (or enhancement) is complete as soon as a satisfactory solution has been generated. The remainder of this section identifies what kinds of applications TEDIUM can be used for; what categories of knowledge are captured by TEDIUM and how they are stored; what guidelines TEDIUM follows to improve productivity; and how some of these concepts can be transferred to other target application classes.
3.2.1
The Application Class
There are many different kinds of software. It would be foolish to suggest that one environment (or language, for that matter) could be effective for all software problems. There are, however, classes of application that share many common concepts. (In Chapter 1 I referred to these concepts as application class knowledge.) One goal of TEDIUM is to formalize what we know from experience about an application class so that we can reuse that knowledge implicitly. Consequently, TEDIUM must be restricted to one class of application. The application class in which TEDIUM operates is that of the interactive information system (IIS). An IIS maintains a large database and relies on human-computer interactions to manage many of its key activities. It has few real-time demands or computational requirements, and normally it is constructed using off-the-shelf software tools and hardware systems. The IIS builds on a relatively mature technology. A clinical information system is an IIS that manages clinical data to support medical decision making and patient management. This was the initial target for TEDIUM. Most software tools also are instances of the IIS class. TEDIUM, which is an IIS, is implemented in TEDIUM. Many knowledge based applications can be implemented as an IIS. The Intelligent Navigational Assistant (INA) example described in Section 3.4 is an AI application implemented as an IIS. Thus, the IIS application class is quite broad. It includes, but is not limited to, the traditional COBOL-coded data processing applications. Many IIS applications depend on complex data structures and sophisticated human interactions, and these are the primary targets for TEDIUM. Within the IIS class, TEDIUM is especially effective for those applications in which there is uncertainty regarding the desired requirements. Recall that in the early days of computing, at the time of the NATO conferences on software engineering, most of the software applications were firmly rooted in engineering problems. Requirements could be specified, and the challenge was to implement a product that satisfied the desired constraints.
Since that time, however, we have addressed the creation of even more sophisticated applications. Consequently, there is less certainty about how the desired product ought to behave in the operational environment. We are confronted with the fact that our requirements specifications -- which are an essential prerequisite to the reification process -- may be inherently fuzzy. In Chapter 1 it was shown how different development paradigms deal with these issues of uncertainty and incompleteness. TEDIUM offers an alternative approach suited to a specific, mature application class. To illustrate the target problems for which TEDIUM is appropriate, it is useful to describe what we know about a requirements specification in terms of risk. Figure 3.1 shows a projection of risk onto two dimensions. The first is application risk, which measures the probability that we know what the application is to do. Technical risk, on the other hand, is the probability that we can implement a product that does what was specified.
Projects high in application risk tend to be underconstrained, and projects high in technical risk tend to be overconstrained. Applications with specifications that are low in both dimensions of risk are relatively safe to produce; often off-the-shelf software products already are available. Conversely, projects high in both dimensions of risk are beyond the state of the art. Typically, we make simplifying assumptions and build prototypes to add to our knowledge. The risk scales contract as we gain experience.

TEDIUM was designed for low technical risk applications. Once it is understood what is desired, there is a high probability that an efficient product can be produced. Because the IIS uses relatively mature technology, the technical risk normally is quite low. The major challenge in developing IIS products is that the analysts may not understand what the product is to do until there is some operational experience with it in the target environment. Such products are of high application risk, and these are TEDIUM's target applications. (Naturally, TEDIUM can be used for applications that are low in both dimensions of risk, but this is a less interesting problem.)
3.2.2
The Application Database
TEDIUM differs from most other software process paradigms in that the unit of concern is the application and not the individual program. TEDIUM maintains an application database (ADB) that contains all information about the application to be developed. The designers* always work in the context of the target application. All definitions in the data model are global with respect to the application; once a property has been defined for an application, it is applied throughout the application. Because the ADB is an integrated database, it is impossible to define some types of conflicting properties within an application. The organization of the ADB is shown in Figure 3.2.
Figure 3.2 Three Levels of Representation.
* All persons developing systems with TEDIUM are called designers. The use of this term was intended to suggest that the writing of programs is a very secondary activity. Despite that fact, several TEDIUM users refer to themselves as programmers. I shall call them designers.
The figure displays three levels of representation for an application. At the descriptive level there is subjective text that describes the characteristics of the application. The text is presented in three perspectives.

Requirements. This is a description of what functions the application is to perform.

Data groups. This is a description of what objects are to be represented in the application's database.

Processes. This is a description of the process flows that the application is to support.

Notice that all three categories are descriptive of the application's behavior and not of its structure; they describe what is to be done and not how it is to be done. (The choice of words for these categories obscures this fact.)

Below the descriptive level is the conceptual level. Because the TEDIUM conceptual model is isomorphic to its formal model, the objects at this level are actually formal entities that can be used to generate the implementation. Two categories of conceptual object are identified:

Semantic data model. This is an extension of the relational model that expresses additional semantic properties. The data dictionary also is extended to contain all validity criteria for each element.

Program specifications. This is a high-level specification language for the application programs. There are both declarative and procedural specifications.
At the lowest level are the executable representations (i.e., the operational programs). Because the programs are generated from the information in the conceptual level, they contain no knowledge not already expressed at the conceptual level. Thus it is sufficient for the ADB to maintain only the first two levels of representation. Because the programs are separate from the rest of the ADB, TEDIUM-generated applications can run without the presence of TEDIUM; however, some TEDIUM utility programs are required, and portions of the ADB normally are copied to provide help messages and page heading text. The figure also displays the links among the various ADB objects. Descriptive nodes are linked both to the conceptual objects that represent their realization and to each other. All descriptive text within a category is structured as a hierarchy. This is a necessary condition for producing linear listings without replication. Links among objects, however, have few restrictions. Consequently, a full listing of the links may not be useful, but online local views will be practical. (Compare this to the data flow diagram, wherein the method forces each diagram to exist within a hierarchy of diagrams.)
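To suggest how the descriptive and conceptual levels and their links might sit in a single integrated database, here is a hypothetical sketch of an ADB fragment. The field names and structure are mine, chosen for illustration; they are not TEDIUM's actual storage scheme.

```python
# Hypothetical ADB fragment: descriptive text, conceptual objects, and links.
adb = {
    "descriptive": {
        "requirements": {"R1": "Record each patient's clinic assignments."},
        "data_groups":  {"D1": "Patients, clinics, and their doctors."},
        "processes":    {"P1": "Clerk assigns a patient to a clinic."},
    },
    "conceptual": {
        "data_model": {
            "PATIENT": {"keys": ["PATID"], "data": ["NAME", "CLINIC"],
                        "validation": {"CLINIC": "must exist in CLINIC table"}},
        },
        "program_specs": {
            "ASSIGN": "Input PATID; Input CLINIC; file the assignment",
        },
    },
    # Links tie descriptive nodes to the conceptual objects that realize them.
    "links": [("R1", "PATIENT"), ("R1", "ASSIGN"), ("P1", "ASSIGN")],
}

# A local view: everything that realizes requirement R1.
print([obj for src, obj in adb["links"] if src == "R1"])
```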
A more complete description of the ADB contents and the TEDIUM documentation tools is contained in Chapter 6. Before moving to the next topic, I observe that the kinds of descriptive and conceptual objects selected for TEDIUM reflect the specifics of the IIS application class. If more complex human-computer interfaces were used, it might be appropriate to augment the views with one that addressed dialogue definition [HaHi89]. For real-time applications, it might be necessary to identify events and/or external interfaces as conceptual objects, and the descriptive text could be expanded to include the identification of host environment constraints.
3.2.3
System Style
Given an ADB that contains the complete description of an application, TEDIUM uses a program generator to translate the ADB specifications into an operational system. Obviously, there are an infinite number of implementations that will satisfy those specifications, and the program generator will select a specific one. The system style is the set of guidelines and defaults used by the generator during the translation. It defines the characteristics of the specific implementation for the given specifications and ensures consistency among the product's interactions.

One justification for developing TEDIUM for the Phase II Oncology Clinical Information System (OCIS) was that the Phase I system lacked a uniform style. Help messages were arbitrary; user prompts were individualistic; input validation was random; and formatting conventions were applied casually. What was desired was a standard system style that all programs would adhere to. This would make the system easier for the users to learn; it also would facilitate the training and orientation of new designers. For the OCIS revision we needed a style guide that reused previous experience with the application class and established conventions for operation in the target environment. In theory, a programmer could be given a copy of this style guide plus a listing from the ADB and -- without any further direction -- produce a complete and correct program (with respect to both the ADB and style guide). This implies that all knowledge of the application is in the ADB and system style, and it should be possible to automate the transformation to the product.* This translation would both improve productivity and guarantee that all programs conformed to the style.

* Note that the IIS is a mature application class, and therefore automatic transformation without human intervention is possible. For other, more complex domains this may not be true, and the designers may have to access software tool and computer science knowledge in order to produce an efficient system. For example, see [Balz85].

In the TEDIUM implementation described in this book, the system style is integrated into the program generator. The present version of TEDIUM was bootstrapped from earlier versions of TEDIUM. The program generator is the only remaining part of TEDIUM that uses custom MUMPS code. I am in the process of replacing the old generator with a more efficient TEDIUM-defined generator. Nevertheless, only one system style currently is implemented, and it was frozen in 1982. Once the generator has been recoded, there should be no limit on the number of system styles that can be supported. (In fact, the INA example uses a prototype of the system style that I intend to develop next.)

A generalized facility should support two dimensions of style. First, there will be a choice of target language for the generated program. The present version of TEDIUM generates only MUMPS programs, but this is not an inherent characteristic of TEDIUM. All programs are defined using a specification language that describes actions in the context of their external behavior. For example, let there be an element called CLINIC that is defined in the ADB. Then the TEDIUM statement

Input CLINIC

will cause the generator to create code that: prompts for a value of CLINIC using the external name defined in the ADB data dictionary; validates the input on the basis of the ADB definition; manages requests for help; and causes a branch if the user aborts processing. Clearly, there is nothing language specific about this function.

The second dimension of the system style involves how the generated code interacts with the user. For example, the following statement specifies a prompt to the user:

PRompt (A)dd (E)dit (D)elete (Q)uit
The system style establishes the functionality and form to be supported by the generated code. In the current style, the parentheses are removed, and the generated program prints the following in a scrolling dialogue style:

Add   Edit   Delete   Quit
A null return is set to the first valid value (A), and any value other than A, E, D, or Q causes the prompt to repeat. Control is also provided for help messages and the processing of an escape; the next specification statement cannot be reached unless a valid input has been entered. Alternative system styles might:

Open a pop-up window with the four options displayed.

Print the option line in a mouse-selectable menu bar.

Print the option line in a fixed option window.

Superimpose the options onto a map of function keys (the INA style).
Obviously, the TEDIUM statement contains enough information to support any of these styles. The challenge is to find ways to represent the two style dimensions so that the generator will be both flexible and efficient. Of course, this is a problem for future research, and I shall make no further reference to it.
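As an illustration of how one specification statement plus a style can drive the dialogue, here is a toy interpreter of my own; it is not TEDIUM (which generates MUMPS code rather than running the dialogue directly), and the function name and style keyword are invented. It applies the scrolling-style rules described above: the parentheses are dropped, a null return defaults to the first option, and invalid input repeats the prompt.

```python
def prompt(options, style="scrolling"):
    """Interpret a PRompt-style statement, e.g. options = ['Add','Edit','Delete','Quit'].

    The 'style' argument stands in for the system style: the same statement
    could instead drive a pop-up window, a menu bar, or function keys."""
    letters = [option[0].upper() for option in options]
    if style == "scrolling":
        line = "   ".join(options)
        while True:
            reply = input(line + " > ").strip().upper()
            if reply == "":          # null return defaults to the first valid value
                return letters[0]
            if reply in letters:     # anything else causes the prompt to repeat
                return reply
    raise NotImplementedError(f"style '{style}' not sketched here")

# Example use (uncomment to run interactively):
# choice = prompt(["Add", "Edit", "Delete", "Quit"])
# print("selected:", choice)
```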
3.2.4
Minimal Specification
In the discussion of volume in Chapter 2, I defined problem volume (Problem-V) and stated that one goal of TEDIUM was to eliminate housekeeping by producing minimal specifications whose volume was close to Problem-V*. In this context, housekeeping was defined to be any code, beyond that necessary to describe the product’s function, that was required to complete the product and make it efficient. The previous discussion of system style indicates that most implementation details have been abstracted for automatic default reuse. Without automatic reuse, the style conventions would have to be detailed explicitly. This would represent housekeeping: the avoidable code that contributes nothing to the product’s functionality or performance. (For example, recall Backus’ 47-line FORTRAN program and its 1,000-line assembly equivalent.) TEDIUM tries to reduce housekeeping by using a specification language that allows the designer to define compactly the application programs (within a fixed style). These are called minimum specifications because every statement defines product behavior without the need to repeat any functions or conventions already in the system style or the ADB. They are minimal in that any smaller specification will not exhibit the same behavior. In effect, the minimal specification is a form of Problem-V that is close to Problem-V*. The concept of a minimum depends on the task and language used; it is not an absolute. For example, FORTRAN has a style that makes mathematical computation easy to express, but some types of formatting are cumbersome; however, COBOL has tools to facilitate formatting, but it is not effective for computation. The printing of an integer with leading zeros is supported by the COBOL style but involves housekeeping when implemented in FORTRAN. Thus, the concept of a minimal specification, as used here, is with respect to a system style and the IIS application class. For tasks on the periphery of this target, the specifications may not be minimal; indeed, they may be less compact than other high level languages. Because the specifications are locally minimal in some domain, I now address the criteria for the definition of a specification language. First, the language should aim at being housekeeping free. That is, for the functions (activities) that are important in the given domain, the language should provide housekeeping-free (minimal) statements to define that function. Two examples were given in the earlier subsection. Both the Input and PRompt commands use minimal parameter fields that convey only the information necessary to
interpret what the generated code should do. (To make the commands shorter, the lowercase letters normally are not used.)
In these examples the commands relied on the system style embedded in the generator to define the default interactions. The Input command also relied on information about CLINIC that was entered into the ADB as part of the data model. These minimal statements may not be what are desired, and the commands may have to be altered. For example, the command

PRompt (A)dd (E)dit (D)elete (Q)uit//CMND
will cause the result of the prompt to be stored in the variable CMND. The command

Input CLINIC/PROMPT,Enter New Clinic

will change the prompt line from the standard external name for CLINIC in the data model to the string "Enter New Clinic". (The validation of the input, naturally, will use the ADB data dictionary criteria as in the previous illustration.) The justification for adding to the PRompt might be the need for a variable CMND to be passed to some other program. The Input prompt might be changed for a program that reassigns doctors among clinics; in the interactive flow, the revised prompt might be more expressive than some standard name such as "Clinic Identifier". Both extensions represent minimal specifications for the task to be performed. Each defines a program action in the context of the processing flow that could not be stated in the current language more concisely.

Obviously, it would be possible to construct additional, more concise operators for each command option. To do so, however, would add to the overall size of the specification language. Thus, the second criterion for the TEDIUM specification language is that the syntax be limited to a relatively small number of commands and constructs, which makes the language easier to learn and use. Brooks, in citing why high-level languages simply solve accidental difficulties, concludes with this observation:

... at some point the elaboration of a high-level language creates a tool-mastery burden that increases, not reduces, the intellectual task of the user who rarely uses the esoteric constructs. [Broo87, p. 12]

The TEDIUM specification language seeks to avoid this problem by limiting both the domain of applicability and the number of available constructs. Specifying outside this boundary, as is sometimes necessary, involves housekeeping and/or the use of alternative languages.

The final criterion used in the definition of the specification language is that it allow the creation of compound (or generic) specifications that could be used declaratively. (The program specifications written entirely in the TEDIUM command language are called common specifications.) For example,
assume that a table (relation) called DOCTOR has been defined in the data model. The definition will include the definition of all key and non-key attributes (called index and data elements in TEDIUM) plus their external names, validation criteria, and descriptive help messages. Here, a generic Entry program that will perform all file management functions on that table has the following two-line minimal specification:

EDDOC is an Entry program with the table DOCTOR.

(Actually, an interactive dialogue is used, and the above syntax is only for illustration purposes.) In the generic specification for the program EDDOC, the system style defines what functions should be performed and what the interactive flow should be. In the present style, the Entry program adds, edits, deletes, and lists entries in the DOCTOR table. Naturally, all validation tests defined in the ADB are performed, and help requests respond by listing the appropriate data from the ADB.

The generic program can be changed in two ways. First, by selecting alternative options for the Entry program, functions can be added or removed. With the Entry specification, the Full option produces a program that also allows modification to the keys plus scanning through the table. Second, the designer may modify the specification by adding commands at different points in the program. For example, the designer may use a declarative command to indicate that a specific listing program is to replace the default listing program normally generated. The designer also may insert procedural commands (in the TEDIUM specification language) at preselected portions of the generic program (e.g., before the update is made or when prompting for a specific element in the table).

The present implementation of the generic programs is an extension of the program generator. Consequently, it consists of custom MUMPS code. Four generic programs have been written, and one (for reports) has been removed because I found it was too restrictive. In Chapter 5 I discuss how these and new generic programs may be defined by the designers; the facility to do so, however, is not part of the environment as it now operates.
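The following sketch, again my own and with invented names, suggests the kind of default file-maintenance loop that a generic Entry specification such as EDDOC might expand into. It ignores the ADB-driven validation, help text, and key handling that the real generic program supplies.

```python
def entry_program(table_name, tables):
    """Default add/edit/delete/list loop for one table, driven by its name."""
    table = tables.setdefault(table_name, {})
    while True:
        command = (input("Add   Edit   Delete   List   Quit > ") or "A").upper()[0]
        if command == "Q":
            return
        if command == "L":
            for key, row in table.items():
                print(key, row)
            continue
        key = input("Key: ")
        if command == "D":
            table.pop(key, None)       # delete silently ignores a missing key
        elif command in ("A", "E"):
            table[key] = input("Value: ")

# The two-line EDDOC specification would reduce to something like:
# database = {}
# entry_program("DOCTOR", database)
```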
3.2.5
System Sculpture
I present here a software process approach that integrates the characteristics of TEDIUM already mentioned. Recall that TEDIUM was intended to be used in high application risk projects, where the major emphasis is on understanding what the end product is to do. Once that is known, the implementation of an effective product should be easy. For these projects, TEDIUM offers a system style that defines all the implicit defaults and conventions to be used by the implementation, compact (minimal) specifications that describe the product’s behavior in terms of
application domain function, and an integrated database (the ADB) that retains what has been defined about the product. Finally, there is a program generator that transforms portions of the ADB into efficient code.

With these tools available, the most effective design philosophy would be one in which the designers attempt to understand the problem to be solved by entering various product characteristics into the ADB as they are established. Designers would then generate prototypes to validate their preliminary understandings. Through iteration, the ADB would be refined until it fully describes the desired product. When this milestone is met, the program generator has produced the operational product.

This is a constructionist method for application development. It assumes that there are portions of the problem solution that are reasonably well understood, and it concentrates on capturing information about what is known. The information is entered into the ADB, which serves to constrain the solution space. When enough has been defined about some application task, programs can be generated to experiment with the initial concepts. Because the specifications are compact and task oriented, it should be easy for the designers to translate their concepts into an operational product. Because all design information is maintained in the ADB, the new features to be tried will be integrated with previously accepted features. Gradually the ADB establishes what the product is to do. Each time a prototype is generated, the result is complete with respect to the system style; it does all validity testing, manages requests for help text, and supports all implicitly required functions. Thus, once the designers perceive that the problem solution has been specified, the product solution exists. The emphasis is placed on the tools that build the designers' knowledge of the desired application; the implementation is a by-product.

The method just described is called system sculpture. It is based on a sculpture metaphor in which the artist begins with the vision of a product. The vision may require preliminary sketches and models. At some time, however, the process begins. A framework is constructed and clay is added, molded, and removed. Eventually the work is considered to be aesthetically pleasing, and the process is complete.

Compare this to the architect metaphor. A complex structure is to be built, and the preliminary modeling is used to establish the requirements. Once they are agreed on, formal drawings are produced and construction begins. As work progresses, freedom to make structural changes is reduced; at the end only cosmetic modifications are allowed.

The traditional process model typified by the waterfall diagram follows the architect metaphor. For large projects, the need to coordinate many people and activities makes it the safest development model. The approach breaks down when there is uncertainty about the validity of the initial drawings (i.e., the requirements). In that event, using prototypes as models or restructuring the process to learn through evolution can reduce the risk. Nevertheless, once the plans are agreed upon, the degrees of freedom are limited.
With system sculpture, definition of the product and feedback from project experience are tightly coupled. There is little distinction between the delivery of the initial operational system and its continuous evolution. Maintenance is facilitated because it is managed as a continuation of ADB definition. The designers work in the Problem space and not in the Product space, thus eliminating many of the housekeeping details and thereby improving productivity and problem understanding.

Not all environments are suitable for system sculpture. The method will not be effective for high technical risk projects in which there is little implicit knowledge that can be encoded and reused. It also may be impractical for very large projects in which the activities of many designers must be coordinated. Finally, system sculpture addresses only the essential software process described in Chapter 1. It is not concerned with management issues, and in fact it may prove very difficult to control except in selected, highly cooperative environments. Thus, I do not propose that system sculpture should be considered an alternative to the standard process models. There are obvious limits. I only assert that this is the process model used by TEDIUM, and I provide evaluation data in the final part of this book that quantifies and qualifies how effective system sculpture can be.
3.2.6
Program Generation and Bridge Technologies
Because TEDIUM represents a different environment and a new process model, I thought it appropriate to consider one of software productivity's most important questions: If we ever find a better way, then what do we do with all our existing code? Clearly, backlogs, labor limitations, and the dependency on current databases indicate that -- even if we were to achieve orders of magnitude of improvement in software productivity -- we still would need ways to transition from the old to the new. I call the tools of transition bridge technology, and in this section I speculate on how the concepts embedded in TEDIUM may be extended to serve as a bridge technology. This capability is not a feature of the present TEDIUM environment. Nevertheless, it is such an important, and generally unrecognized, topic that I thought it worthwhile to comment on it here.*

* This seemed like a reasonable place for me to address this issue, but the reader may find it better to return to this section after reading about TEDIUM in Part II.

The present version of TEDIUM was designed to provide application-oriented specifications for the product to be built. In this way it captures knowledge of the target application, and the program generator creates the desired programs. When systems already exist, however, some form of reverse engineering is required. The major problem is that the TEDIUM specifications represent the application's knowledge at a very high level (often expressed declaratively) whereas most knowledge in existing programs is represented at a very low level in procedural code. I see no way in which one can apply automated reverse engineering to procedural code and get TEDIUM-like specifications.

For example, consider the validation of inputs expressed procedurally throughout the programs in a large system. If all validations always perform the equivalent process, then one may feel confident that the implemented validation represents universal properties of the element under review. But if not all implementations are equivalent, how does one explain the differences? Is it a different context or use, or does the variation represent program inconsistency? Also, there is the problem of automatically recognizing equivalent procedural code. Finally, and most important from my perspective, how sure are we that the existing code represents what was desired?

Therefore, although I recognize that many are working on the reverse engineering problem, I offer a different solution. On the basis of the TEDIUM philosophy of focusing on the application domain, I would suggest augmentation of the command knowledge and system style to capture the constructs already used in existing older programs. Some limited reverse engineering could be used to build an initial ADB. For example, COBOL Data Division text could be used to construct a skeletal TEDIUM data model to which additional knowledge could be appended. But the major activity would be the definition, in the ADB, of what the application presently is presumed to be doing. As the ADB is built, implementations can be generated in two system styles. The first would use the limits and conventions of the existing programs; the second would use all TEDIUM constructs and facilities.

The bridge would be constructed as follows. First, the overall architecture of the present systems would be modeled in the ADB. This could be considered, in part, to be a documentation exercise. Where automated definitions already exist, they could be transformed automatically. Once the general architecture is available in the ADB, three types of activity would be identified:
Maintenance of existing programs that is most effectively performed by modifying existing code. Maintenance for existing programs of degraded structure. Here the goal is to capture an understanding of what the program does and formalize it in the ADB. A modified program, which interfaces with the existing programs, then can be generated by TEDIUM. Creation of new programs and applications at the TEDIUM level using the established architecture. The objects produced by these activities will be small programs to be integrated with an existing application. The architecture represents the accepted structure for that application (i.e., the sets of operational systems of
Chapter 3
68
concern). The architecture is not complete in the sense of a TEDIUM application because not all of the application processes need be defined. Modification to the system style will ensure that the generated programs conform to the general style already implemented (albeit imperfectly). Those programs will read from and write to the existing database using the same conventions as the other existing programs, and they will be able to do this because the program generator has been tuned to reflect the needs of the operational environment. (After all, the generator is simply the automated expression of the system style that all the organization's programmers are expected to follow.)

In this way, it should be possible to transition from the maintenance of existing code to the capture of the knowledge that the code represents. Once this is done, the program generator can produce code that operates in the old environment. Alternatively, a different system style can be used to take advantage of newer or better computer environments. Of course, TEDIUM does not yet support more than one style, and I hesitate to suggest when such an ambitious undertaking can be initiated. Nevertheless, the problem of bridge technologies is one of the most important that we face. Moreover, I see the combination of knowledge capture and program generation as a very promising method for moving from the maintenance of implementation-oriented code to the far more productive exercise of directly responding to the organization's needs.
3.2.7 Observations on Reuse
Because reuse is such an important topic of current research [Free87, Trac88], it is useful to review the TEDIUM characteristics in this context. There are two perspectives for reuse. First, it can be viewed as a formalization of previous experience or knowledge. In domains where such experience exists and can be organized, housekeeping can be reduced. Second, reuse also can be considered to be the avoidance of redundancy, such as the respecification of established design decisions. As has been shown, the IIS is a well-understood application domain, and there is experience in representing models, defining conventions and styles, and generating programs automatically. Thus, for this domain, it is possible for TEDIUM to achieve its goals by applying both categories of reuse. The following summarizes how TEDIUM approaches the software process in a reuse context.

Reuse of domain expressions. Experience in the IIS domain has led to the development of data models and design methods that capture some of the semantics of the objects being modeled. Structuring this knowledge in a form that represents intuitively understood concepts of the application domain while also being formal enough to be executed allows the design team to reuse their application domain experience with few implementation distractions.
TEDIUM, MUMPS, and the INA Example
69
Reuse of domain conventions. The set of conventions and practices for IIS applications can be formalized as a system style that must be followed by all developers (i.e., implicitly reused). For example, there are conventions that state that every input must be tested using all validation criteria in the ADB data model and that help messages must be available for all interactions.

Reuse of implementation cliches. TEDIUM program specifications are compact because they rely on a representation (language) that encodes frequently used patterns (cliches or plans) to reduce housekeeping. Two types of specification are available. Generic specifications are defined declaratively and are available for frequently used processes such as specifying a menu. Common specifications, on the other hand, are built from procedural statements. In both cases, the specification formulates the behavior of the program in terms of concise cliches, while the system style determines how it should be implemented. For example, the cliche "Prompt for ..." would have different implementations with system styles that default to pop-up menus, a fixed prompt window, or a scrolling dialogue.

Reuse of implementations. As with most environments, facilities are provided for copying programs and data models from existing applications, maintaining libraries, and validating the consistency of the copied objects in the context of the new application.

Reuse of formal specification objects. The design of the ADB minimizes redundancy. Any formal object defined in the ADB is used by all objects that reference it, thereby reducing housekeeping and enforcing consistency.

Reuse of informal specification objects. All subjective information in the ADB is available to support multiple roles. For example, the descriptive text is referenced during design, is available for online help, and is integrated into the delivered documents. Wherever possible, text is fragmented and organized to serve several needs, which facilitates maintenance and preserves consistency in the documentation.

Of course, this list says nothing new; it just says it differently.

What TEDIUM Is Not

I have thus far focused on the description of how TEDIUM implements the paradigm described in the previous chapter. The three chapters that follow will clarify exactly how TEDIUM works. A history of how TEDIUM came into being is given in Chapter 3 of [EnLB89]. Therefore, before continuing with a description of MUMPS and the INA example, it remains only to explain what TEDIUM is not.

It is not a Fourth Generation Language (4GL). While both are concerned with the IIS, TEDIUM was intended to implement products of arbitrary
complexity that operate with the efficiency of custom crafted software. These are not characteristics of the 4GL.

It is not an AI application. While both are concerned with issues of knowledge representation, TEDIUM does not attempt to offer intelligent support. It is based on a database paradigm, and most of its inference mechanisms are procedural. Although it produces programs automatically, it is not related to AI research in automatic programming.

It is neither a specification language nor a high level language (HLL). Although the term specification is used, the range of the specification implementations is limited. Although TEDIUM can be used as a HLL for some portions of an application, its use for that purpose offers little benefit.

It is not a rapid prototyping tool. Although TEDIUM supports the rapid evaluation of prototypes, each prototype is a complete system ready for operational use.

It is not an end-user tool. TEDIUM assumes that the final product will be sufficiently complex to require the services of a designer with both application-class and software-tool knowledge. It requires an investment to learn TEDIUM; until its concepts are understood, it may be difficult to use.

It is not a modern product. One of the major advantages of TEDIUM is a decade of operational experience that can be analyzed. The present version of TEDIUM was frozen in 1982. Thus, it takes limited advantage of much of what we have learned in this decade. As with cancer research, if we seek the confidence of five-year survival data, then we must rely on therapeutic regimens that are eight years old.
3.3 OVERVIEW OF MUMPS
The Massachusetts General Hospital Utility Multi-Programming System (MUMPS) was designed in the late 1960s to provide interactive programming facilities for medical database applications using minicomputers. The system was developed to operate in a dedicated interpretative environment, and several versions of MUMPS evolved. In the late 1970s, a Standard MUMPS language emerged [Conn83, Zimm84, Lewk89]. Since then, all implementations have conformed to that standard. Most new implementations are offered as layered systems running under a host operating system.

MUMPS is a nondeclarative, typeless language with very powerful facilities for database access and string manipulation. All elements are treated as variable length strings. The numeric value of a string is computed until a non-numeric character is encountered. For example, "1234" would be evaluated as 1234, the string "150 lbs" would be evaluated as 150, and the string "string" would be evaluated as 0. Strings are stored internally as either characters or numbers; the latter are recognized by the absence of
non-numeric characters or formatting information (such as leading zeros). Thus, although the variables are treated as characters, computation is fairly efficient. The maximum size for a string is usually 1024 characters, but many vendors support longer strings. Storage is allocated to a string when it is assigned or read; the storage can be released with the Kill command. (All standard MUMPS commands can be reduced to the initial letter. Consequently, there should be no more than 26 commands. Actually, some letters are not used, and the same letter can be used for different commands. Interpretation is taken from the context of use.)

Among the most powerful string manipulation functions are:

$Extract(String,Pos1,Pos2), which returns the substring from String that begins with character position Pos1 and ends at position Pos2.

$Find(String1,String2,Pos), which begins a search of String1 starting at Pos. If a match with String2 is found, it returns the character position at the end of the match. It returns 0 if no match is found.

$Piece(String,Delimiter,Num1,Num2), which returns a substring of String between the Num1st and the Num2nd occurrences of Delimiter. MUMPS programmers generally separate fields in a string by a character such as the circumflex (^). Here, $P("A^B^CDEF^G","^",3) will return the string "CDEF". (Num2 defaults to Num1; if Num1 is not supplied, it defaults to 1.) Any string can be used as the delimiter, and $P("A^B^CDEF^G","DEF") returns "A^B^C". $Piece can be used on the left-hand side of an assignment to set a portion of an existing string.

Additionally, there are some character-oriented operators for pattern matching (?), for the contained-in attribute ([), for collating sequence order (]), and for concatenation (_).

The language is line oriented and cannot be considered structured. The do while function is performed by the For command, and the scope of the For is the end of the current line. There is no if ... then ... else ... in the traditional sense. The MUMPS If command terminates the processing of the remainder of the current line if the predicate is false. When terminating the processing of the line, a truth value is set in the system variable $T. The MUMPS Else command is similar to the If command, except it uses the current value of $T as its predicate. Once tested, $T is returned to true.

Control is managed through two branching commands and one returning command. The Do is the equivalent of a call in that, before it transfers control, it enters the next command position on a return stack. The Goto causes an unconditional transfer. The return command is the Quit; it transfers control to the location at the top of the stack. Recently, user-defined functions and parameter passing were added to the MUMPS standard. These features were not in effect when TEDIUM was frozen, and consequently they are not used in the present TEDIUM version.
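As a rough sketch of how these facilities combine, assume a record REC that holds four fields separated by the circumflex; the variable names and data are invented for illustration, and only the standard features described above are used.

 Set REC="Jones^Mary^150 lbs^Oncology"
 Set NAME=$Piece(REC,"^",1,2)             ; "Jones^Mary"
 Set WEIGHT=+$Piece(REC,"^",3)            ; the string "150 lbs" evaluates to 150
 Set INIT=$Extract($Piece(REC,"^",2),1)   ; "M"
 For I=1:1:4 Write $Piece(REC,"^",I),!    ; the scope of the For is the rest of the line
 Write:WEIGHT>100 "Weight exceeds 100",!  ; a postcondition attached to the Write command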
TEDIUM parameter passing assumes that all values exist in a single symbol table shared by all routines in the active partition. (Recent changes in the standard also have allowed for symbol tables to be associated with routines, but that facility was not available when TEDIUM was constructed.)

Assignment is accomplished with the Set command. Like most MUMPS commands, multiple arguments, separated with commas, can be used. A predicate (called a postcondition) also may be appended to a command that will allow its execution only if true. For example, the following line will result in the variable X being assigned the value of 1.
Set X=1,Y=2 S:Y0 an error.

S (string) - This supplies a string of characters, separated by commas, that represents the valid values for the element. A better form of the function is the table criterion. The string normally is used for simple options such as Y,N for yes or no.

T (table) - This requires the definition of a table that contains both a valid code plus a short description. The table is printed out in response to a help request. For example, a request for help regarding the kinds of TEDIUM validation tests would include the following table:

C   Computed
E   Expression
L   Limits
P   Pattern
R   Routine
S   String
T   Table
V   Verified as
The last value read is always stored in the variable YC. All variables beginning with the letter Y are considered reserved TEDIUM terms in the M1 version. (This restriction is not carried over to the T1 version.)
V (verified as) - This is used to establish the same validation criteria for this element as already specified for a defined element. For example, given a table MANUFACTURERS(ManufactId) = ... that defines all valid ManufactId values for the application, assume that we want every instance of the element ElectronicsVendor to be taken from the set of valid ManufactId values (i.e., have a corresponding entry in MANUFACTURERS). In specifying the validation, we would indicate that it should be verified as ManufactId. (Defined in Version M1 means that the element is defined in some table. Version T1 uses a less stringent form of the V criterion. In effect, it makes the object of the "verified as" reference a data type independent of its table definition status.)

Both the data type and validation criteria establish properties for the element that must be true whenever the term is referenced in the application. Additional properties, called the role, may be imposed when the element is used in a particular table. The following roles are supported by TEDIUM:

C (computed) - This is identical to the computed validation criterion described above, except that the computation is limited to the use of the element in the current table. The element may have other roles in different tables. The element-level computed value, if it exists, always takes precedence.

D (default) - This is the value to be entered into the table at initial creation if the user does not supply a non-null value. For example, the default value for ValidationCriteria in the table ELEMENT is "N" for none.

M (mandatory) - This indicates that a valid non-null value must be supplied before the entry can be entered into the database. In TEDIUM, the role for an index element is defaulted to "M" for mandatory.

O (optional) - This indicates that the term is optional and a null value will be accepted. In TEDIUM, the role for a data element is defaulted to "O" for optional.
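To make the distinction between element-level criteria and table-level roles concrete, consider a hypothetical fragment in the functional notation used in this chapter; the validation values are invented for illustration.

PATIENT(PatientId) = PatName,PatSex,PatAge

Here PatSex might carry the string criterion M,F and PatAge the limits criterion 0 to 120; those properties hold wherever the elements are referenced. Within this table, PatientId, as an index element, defaults to the mandatory role, PatName could be assigned the mandatory role, and PatSex and PatAge could retain the default optional role.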
4.2.3 Data Model Definition in TEDIUM
Having described some of the properties of TEDIUM tables and elements, we now can examine how these objects are used in the data model. Recall that TEDIUM is an IIS developed with TEDIUM. Thus, the ADB that contains the definitions of the TEDIUM applications is defined as a set of TEDIUM tables. The tables in this set are accessible only within the context of some application set (i.e., ApplicationId). Within this context, there are tables that
define ADB objects, such as ELEMENT cited above, and tables that describe the objects' structure in further detail, such as in the table containing the list of valid role values. The design (and evolution) process consists of specifying application objects and their properties in the ADB and then generating programs to test the validity of the model. The entry and maintenance of the ADB objects is managed by the system style and the TEDIUM table definitions, and that is the topic of this section.

Generally, TEDIUM organizes the objects in the ADB in two levels. The first level is a class-identification table that defines (in the underlined sense) the identifier and appends a short title, used in system documentation, along with some optional descriptive text. TEDIUM also automatically enters into the class-identification entry the designer's identification and date every time an object is defined or edited. Objects so defined include tables, elements, and programs. The second level of table definition normally is required to define the object's contents or structure. Examples here include the dictionary of valid values for elements with the T validation criterion, and the definition of which index and data elements are associated with a given table.

When using TEDIUM to process an application's data model, the designer begins by selecting the option to enter or edit a table. A table identifier (not to exceed six uppercase characters) is entered, and TEDIUM checks to see if that table has been defined already (i.e., the program tests to see if there is an entry in the table-class-identification table for this application and identifier). If so, then TEDIUM allows the user to edit the table contents (i.e., the definition of the order and roles of the index and data elements). Assuming that the table has not yet been defined, TEDIUM prompts the designer with Generate a new entry (Y/N) to indicate that the entered value is not in the ADB. The first option, Y for yes, is the default, and a null return will initiate the definition process. Because the identifier has been supplied already, the system need only prompt for the (mandatory) short title and the (optional) text field description. Once these are entered, the designer can either ignore what was entered, edit the information, or accept it.

After the name of a new table has been entered into the ADB, a program prompts for the definition of its contents. Using a scrolling dialogue, the program processes at least one index element and an arbitrary number of data elements. For each element to be linked to the table definition, an element identifier is requested. If the element is not already defined in the ADB, then a flow similar to that just described is repeated. TEDIUM prompts with Generate a new entry (Y/N) and, if yes, it follows with prompts for the short title, data type, length, validation criteria, and descriptive text. The user can reject, edit, or accept the results. Once the element definition is entered in the ADB, the program
prompts for the element's role in the table. (If the element was already in the ADB, then the program would skip immediately to the point where the role in the table is established.)

In this flow, up to four different sets of ADB tables are referenced:

The dictionary of valid table identifiers.

The definition of the table contents, including the order of the index and data elements with their roles.

The dictionary of valid element identifiers along with specifics regarding their data type and validation criteria.

The definition of the table of valid values if the T validation criterion is specified.

Consequently, four different sets of edit functions are required. The enter/edit table definition function just described is used to create new table definitions and edit the definitions of tables in the ADB. A separate TEDIUM Edit menu provides access to functions for editing class-identification entries and the element definition. Naturally, TEDIUM offers tools to retain referential integrity. One cannot delete an element definition if it is referenced as an index or data element in some table. Also, the deletion of a table definition, when valid, results in the deletion of table entries that define its contents as well as any element definitions referenced only by the table to be deleted.

Although I have used a notation to describe the TEDIUM data model, there is no external data model. The entire model is stored as a network in the ADB. It is created and updated in the context of existing definitions. Listings of the definition are prepared from the ADB to reflect particular views of interest to the user or designer. Such views are projections of the whole and only partially represent the data model.
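A hypothetical fragment of the scrolling dialogue may help fix the flow described above. Only the prompt Generate a new entry (Y/N) is quoted from the text; the other prompts, the default notation <Y>, and the responses are invented for illustration.

Table: PATDX
Generate a new entry (Y/N) <Y>
Short title: Patient diagnoses
Description: <null>
Index element: PatientId          (already defined; only its role is requested)
Index element: DxDate
Generate a new entry (Y/N) <Y>
Short title: Date of diagnosis
Data type: Date
...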
4.2.4 Related Tables and Structures
Before examining an example of the TEDIUM data model, two additional concepts must be introduced. The first is the idea of related tables, which integrates the concepts of relationships and secondary keys. In brief, two or more tables are related to each other if:

They are in a one-to-one correspondence with each other. (TEDIUM version T1 allows a many-to-one correspondence if the relationship can be expressed as a TEDIUM predicate.)

One table is designated the primary table, and the other tables are called secondary tables. Only the primary table can be updated; the secondary tables are read only.

All of the elements of a secondary table are contained in the primary table.

All index elements of the primary table are contained in each secondary table. (That is, given a secondary table entry, one can find the corresponding primary entry.)

With this definition, the program generator can distribute all changes to a primary table among the secondary tables. A simple appointment system example will illustrate the concept. Let there be an appointment table
APPOINTMENTS(Patient,AppointmentDate) = Doctor,Clinic,Time,Complaint

that defines the appointments for a patient. There also is a need for tables to indicate what appointments are scheduled for each doctor and clinic. These may be defined as secondary tables:

DOCTORAPTS(Doctor,AppointmentDate,Time) = Patient
CLINICAPTS(Clinic,AppointmentDate,Time,Doctor) = Patient

Notice the semantic information that is contained in these three tables:

Patients can have no more than one appointment in a single day.

Doctors never have more than one patient scheduled for the same time.

Clinics normally schedule more than one patient for the same time.

Also observe that the secondary tables contain the full key of the primary table (i.e., Patient is a data element), which allows cross references if a delete is requested when processing the database by secondary table. The full index also is used when the primary table contains elements not included in the secondary table (e.g., the patient's complaint).

In addition to establishing some characteristics of the data model, the secondary tables also determine the most important access orders for the application database. In this appointment system illustration, it is important to have online access to patient appointments, ordered first by patient identifier and then by date, to doctor schedules, ordered first by doctor identifier and then by date and time, and to clinic schedules, ordered first by clinic identifier, then by date and time, and finally by doctor. Also, it was not considered important to have access to appointments by date independent of patient, doctor, or clinic. Clearly, a great deal of semantic information about the problem domain has been captured in this simple data model.
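Although the generated file management code is not shown at this point, a minimal sketch in MUMPS suggests how the generator distributes a single update across the primary and secondary tables; the global names, field order, and sample values are assumptions for illustration only.

 ; Filing one appointment updates the primary table and both secondary tables.
 Set ^APPT("P123","19891012")="Smith^Oncology^0900^fever"
 Set ^DOCAPT("Smith","19891012","0900")="P123"
 Set ^CLINAPT("Oncology","19891012","0900","Smith")="P123"
 ; Deleting the appointment would Kill the same three nodes, keeping the
 ; primary and secondary tables in one-to-one correspondence.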
A second example further illustrates how the concept of secondary tables extends the concept of a secondary index to express semantic knowledge of the application. Let me define a node to be maintained in some larger structure as NODE(Term) = Title,Text. To express the structure as a hierarchy, I define an element, LinkedTerm, to be verified as (V validation criterion) Term. Then the following two related tables define a hierarchy:

STRUCT(Term,Order) = LinkedTerm
STRUCTINVERSE(LinkedTerm) = Term,Order

Term identifies the node in the hierarchy, and Order establishes the listing order for children of the same parent. LinkedTerm indicates the child for the next node. STRUCTINVERSE has a single index element, implying that there can be only one entry in that table with the given value for LinkedTerm (i.e., LinkedTerm can have only one parent Term). The program generator will create code to enforce this requirement. By changing the definition of STRUCTINVERSE as follows, the structure reduces to a network in which a LinkedTerm can have more than one parent:

STRUCTINVERSE(LinkedTerm,Term) = Order

In this relationship, no LinkedTerm can be tied to a given Term more than once. (For a given LinkedTerm,Term pair, Order is unique.) The following variation of STRUCTINVERSE allows LinkedTerm to have more than one association with a given Term:

STRUCTINVERSE(LinkedTerm,Term,Order)

Thus, the TEDIUM data model not only indicates the semantics of use, it also implies some dynamic tests required to preserve database integrity. What is not said in the data model also is important. In none of these last three cases does it seem semantically valuable to view terms in the context of their order. (If it were useful, then additional secondary tables would be defined; if this were required only occasionally, then the information could be derived procedurally or by a query.)

The final TEDIUM data model concept is that of the structure, which is both a useful modeling tool and an implementation device to improve performance. I explain the latter first. Because TEDIUM generates MUMPS code, it must define the names of the globals where the data are to be stored. TEDIUM provides a unique global name for each table. This may lead to some operational inefficiency. Different tables may describe similar aspects of the same entity; because they define different nodes in an entity's hierarchy, they will be stored in separate locations.
For example, returning to the patient example of Chapter 3, assume that we have

PATIENT(PatientId) = PatName,PatSex,PatAge
PATDX(PatientId,DxDate) = Diagnosis
PATTEST(PatientId,TestDate,TestName) = TestResult

and that whenever we reference one table we want also to reference the other two. Keeping the tables as three globals will require more accesses than if they were organized as three nodes in a single global. To accomplish this, TEDIUM allows the designer to impose an internal (MUMPS global) name to be used for storage. Here, if all three tables were given the same internal name (e.g., ^Pat), the entries in each table could be identified by the number of index elements. As I will show below, if such uniqueness did not exist, the addition of a coded index element would allow the generator to distinguish among the different nodes in a structure.

The structure feature also allows the designer to create complex structures for an entity. The designer can produce listings that indicate the entity's overall organization. Tools exist to delete or rename the entity with a single operation. Although more work will be done on this conceptual level, the structure has proven itself to be a very valuable device for improving efficiency and for integrating TEDIUM-generated code with other applications.
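A minimal sketch of the resulting storage, with invented patient data and an assumed field layout, shows how the three tables can share the internal name ^Pat and still be distinguished by the number of index elements.

 Set ^Pat("P123")="Jones^F^45"               ; PATIENT entry: one index element
 Set ^Pat("P123","19891012")="lymphoma"      ; PATDX entry: two index elements
 Set ^Pat("P123","19891015","WBC")="4.2"     ; PATTEST entry: three index elements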
4.3 THE INA DATA MODEL
I now illustrate the features of TEDIUM's semantic data model with the INA Data Model Manager. As described in Chapter 3, the INA retains a model of a database to be accessed by SQL statements, a domain vocabulary that allows the users to express their queries without table references, and a set of links that guides the Query Resolution Manager in the translation of the users' intent into an SQL query. The Data Model Manager (or the application set DATMOD as it is defined in TEDIUM) provides the tools to define and maintain these INA objects. Thus, the DATMOD data model illustrates how TEDIUM can be used to represent a relational data model, an Entity-Relationship model, and an external vocabulary.

In describing the data model I use the names defined in the DATMOD application set, which are restricted to six characters with no lowercase letters. The discussion includes a brief overview of the elements; for reference, Table 4.1 contains the element names and their formats. In some cases the same mnemonic is used for both a table and an element; table identifiers are always in boldface. At times, I have eliminated details that do not contribute to the illustrative goal of the example.
(As will be described in the following chapters, the version M1 tools are limited. They are being expanded in version T1.)
Element   Descriptive Name              Format   Verification
ELCDNM    Element Code Name             V(36)
ELDESC    Element Description           T
ELEM      Element Name                  V(30)    Def by ELEM
ELSIZE    Element Size                  I(3)
ELTYPE    Element Type                  V(3)     Table of values
FN1       Foreign Key Sequence No       I(5)
FN2       Element Sequence No           I(5)
GELEM     Generic Element Name          V(36)    Def by GELDEF
KN        Sequence Number               I(5)
MODE      Mode of Element               C(1)     Table of values
NN        Sequence Number               I(5)
PRIELE    Primary Element Identifier    V(30)    Verified as ELEM
RANRS     Analyzed Relationship         V(12)
RANRSI    Relationship (Inverted)       V(12)
REL       Relation Name                 V(30)    Def by REL
RELDES    Relation Description          T
RELL      Linked Relation               V(30)    Verified as REL
RELPRE    Relation Prefix               V(30)
RN        Sequence Number               I(5)
RSHKY     Relationship Key Status       C(4)     Table of values
RSHKYE    Explanation of Relationship   V(30)
RSHMNA    Root Minimum Tuples           C(1)     String of values
RSHMNB    Link Minimum Tuples           C(1)     String of values
RSHMXA    Root Maximum Tuples           C(1)     String of values
RSHMXB    Link Maximum Tuples           C(1)     String of values
SUBDES    Subsystem Description         T
SUBN      Sequence Number               I(5)
SUBSYS    View (Subsystem) Name         V(30)    Def by SUBSYS
SYSDES    System Description            T
SYSPRE    System Prefix                 V(24)
SYSTEM    System Id                     V(15)    Def by SYSTEM
UVDSR     Definition Source             C(1)     Table of values
UVGESR    Source of Generic Term        C(1)     Table of values
UVGRNO    Sequence Number               I(5)
UVHNAM    Name for Help Message         V(45)    Verified as UVNAM
UVLNAM    Linked External Term          V(45)    Verified as UVNAM
UVNAM     Vocabulary Term (LC)          V(45)    Def by UVNAM
UVSTNO    Sequence Number               I(5)
UVTERM    External Term                 V(45)
UVTEXT    External Term Description     T

Table 4.1 List of elements used in the INA data model example.
I previously indicated that the INA maintains models for many systems and that all relations and attributes are defined within the context of a system. The following table defines the valid systems. SYSTEM(SYSTEM) = SYSPRE,SYSDES
where SYSTEM is the system identifier, SYSPRE is a system prefix required in the SQL queries by the DRD system, and SYSDES is a text field used for descriptive information. This is a typical format for a class-identification table. It may be expanded with internal characteristics associated with the item plus additional identifiers. My normal style is to use both a short identifier (typically under 45 characters long so that it can be used in tabular displays) and a text description. For the INA, the sponsor felt that the SYSTEM names (as well as the relation names) were sufficiently descriptive in their mnemonic form.

In the INA, systems can be subdivided into subsystems. Recall that the DRD is constructed from files extracted from several functional systems. For both analysis and query resolution purposes, it was considered important to group the DRD relations by the systems from which they were derived. These groupings were called subsystems in the INA data model. Each relation would be associated with a single subsystem. For listing purposes, an ordering of the subsystems within the system was desired. Following this principle, the relations were ordered within a subsystem. If no ordering was supplied, then an alphabetical ordering of the subsystems within the system (relations within the subsystem) would be used. This is defined in the following way. Two related tables are defined to establish the subsystems.

SUBSYS(SYSTEM,"SU",SUBSYS) = SUBN,SUBDES
SUBORD(SYSTEM,"SO",SUBN,SUBSYS)

where SUBSYS is the subsystem identifier, SUBN is a sequence number that establishes the order among the subsystems, and SUBDES is a text description. As will be discussed below, all of the system and subsystem tables are maintained in a single structure, and the second literal index element serves to define the table type uniquely. That is, given an entry in the structure, the generated program can use the index pattern to identify the table that defines its contents. (For the portion of the model shown here, uniqueness can be established from the number of the index elements.) The secondary table SUBORD provides a list of subsystem identifiers in numeric order (i.e., the subsystems are sorted by SUBN). If all values for SUBN are zero (its default), then the identifiers will be in alphabetical order.

The tables that define the relations and their ordering in a subsystem follow a similar organization. Again, two related tables are defined.

REL(SYSTEM,REL) = SUBSYS,RN,RELDES
SYSREL(SYSTEM,"SU",SUBSYS,"REL",RN,REL)

where REL is the relation identifier, RN is the relation order within SUBSYS, and RELDES is a descriptive text field. (In the INA implementation there are some additional fields, but they do not contribute to the present discussion.) SYSREL, like SUBORD, is a secondary table to establish the order for listings.

Given that the relation identifier exists in REL, an extension of the relation structure is required to define its specification. This can be
generalized as an ordered sequence of key attributes, an ordered sequence of nonkey attributes, and an ordered sequence of foreign keys where each foreign key is specified as a sequence of attributes. This is defined as follows.

RELK(SYSTEM,REL,"K",KN) = ELEM
RELN(SYSTEM,REL,"N",NN) = ELEM
RELFST(SYSTEM,REL,"F",FN1)
RELF(SYSTEM,REL,"F",FN1,FN2) = ELEM

where RELK defines the sequence of key attributes for the relation REL, RELN defines the sequence of nonkey attributes, and RELFST defines the sequence of foreign key strings. The sequence of attributes in a foreign key string is defined in RELF. The name of the attribute is given in the variable ELEM.

To find out, for a given value of ELEM, which relations (REL) reference it, the tables RELK, RELN, and RELF have the following secondary tables:

ELRELK(SYSTEM,ELEM,"K",REL) = KN
ELRELN(SYSTEM,ELEM,"N",REL) = NN
ELRELF(SYSTEM,ELEM,"F",REL) = FN1,FN2
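As a hypothetical illustration (the system, relation, and attribute names are invented), a relation EMPLOYEE in a system PERS whose key is EMPNO, whose nonkey attributes are NAME and DEPTNO, and whose single foreign key string consists of DEPTNO would be recorded, writing index values in place of the element names, as:

RELK(PERS,EMPLOYEE,"K",1) = EMPNO
RELN(PERS,EMPLOYEE,"N",1) = NAME
RELN(PERS,EMPLOYEE,"N",2) = DEPTNO
RELFST(PERS,EMPLOYEE,"F",1)
RELF(PERS,EMPLOYEE,"F",1,1) = DEPTNO

with corresponding secondary entries such as ELRELN(PERS,DEPTNO,"N",EMPLOYEE) = 2.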
Once the relations have been defined, the relationships among the relations should be specified. Two symmetric related tables are used.

RSH(SYSTEM,REL,"RE",RELL) = RSHMNA,RSHMXA,RSHMNB,RSHMXB,RSHKY,RSHKYE
RSHL(SYSTEM,RELL,"RE",REL) = RSHMNB,RSHMXB,RSHMNA,RSHMXA,RSHKY,RSHKYE

where RELL is the relation that has a relationship with REL. (RELL is defined to be verified as REL, and both REL and RELL come from the attribute set defined by REL.) The relationship is described with a pair of maximum and minimum values. For example, a 1:m relationship, where the single record was mandatory and the m records were optional, would be defined in an RSH entry as:

Node          Minimum   Maximum   Table element
Origin        1                   RSHMNA
Origin                  1         RSHMXA
Destination   0                   RSHMNB
Destination             m         RSHMXB

The INA listings would show this as a 1:0-m relationship.

(As will be shown below, the INA data model makes it very easy to identify the existence of foreign keys. Consequently, their explicit specification is not necessary. The INA sponsors, however, wanted the foreign keys to be listed in the format used by their implementation and analysis programs. Thus the INA carries them as cross-reference documentation.)

The data element RSHKY indicates how this relationship was established. The following table of codes is valid:
Code   Explanation
D      Designer-specified when INA is used as a modelling tool
K      Key-defined by an analysis of the INA data model
M      Model-specified by using some external modeling tool
O      Other, the explanation is supplied in RSHKYE
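For instance, a 1:0-m relationship between two hypothetical relations DEPT (the mandatory single side) and EMP (the optional m side), established by key analysis (code K) and needing no further explanation in RSHKYE, might be carried as the pair of entries

RSH(PERS,DEPT,"RE",EMP) = 1,1,0,m,K
RSHL(PERS,EMP,"RE",DEPT) = 0,m,1,1,K

again writing values in place of the element names.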
Notice that the two tables are symmetrical, and one cannot determine from an entry in the structure which is the primary or the secondary table. As long as the file management programs retain the one-to-one correspondence for entry pairs, this will not cause any problems.

The code of K for RSHKY suggests that an examination of the key and nonkey attributes in the INA definitions can supply information about relationships. Often, one can determine from the keys if the relationships are 1:1, 1:m, or m:n. One also can identify relations that are not in third normal form. Thus an analysis program was written to create entries in the tables:

RANRS(SYSTEM,REL,"RAR",RELL) = RANRS
RANRSI(SYSTEM,RELL,"RAR",REL) = RANRSI

where RANRS is the analyzed relationship (1:1, 1:m, m:1, or m:n) or a question mark (?) if the relationship is not uniquely determined by the graph implied by the syntax. RANRSI is the inverse of RANRS; it substitutes m:1 for 1:m and 1:m for m:1. Attributes that violate third normal form criteria are identified in the table

RANKD(SYSTEM,REL,"RAD",MODE,ELEM,RELL)

where MODE is K, N, or F for key, nonkey, or foreign key, respectively, and RELL identifies the relation where an ELEM dependency exists. (The full definition of this table is shown below in Figure 4.1.)

The tables defined so far describe the system, its subsystems, relations, and relationships among relations. It remains to define the elements. Recall that the goal of the INA was to offer an interface allowing the user to navigate through a vocabulary of domain terms, select the items of interest without any reference to the relations in which they are used, and then have that request translated into an SQL query that references attributes known to the system being searched. The INA data model, therefore, must contain both an external vocabulary with all the appropriate terms in the universe of discourse plus an internal vocabulary containing the attribute names and codes used by the target system. Because there may be many-to-many mappings between these two vocabularies, a third vocabulary of generic terms is introduced to facilitate the mapping process. Beginning with the internal vocabulary, its elements are defined in the tables:
ELEM(SYSTEM,ELEM) = ELCDNM
ELDEF(SYSTEM,ELEM,"D") = ELNAME,ELTYPE,ELSIZE,ELDESC

where ELCDNM is the attribute name in the target database (the name to be used in the SQL query), ELNAME is a short title for the name, ELTYPE and ELSIZE define its type and length, and ELDESC is a short text description to be used for help messages and documentation. The literal index element in ELDEF suggests that additional tables are available to define selection criteria, lists of domain values for the element, etc.

The generic vocabulary terms are defined in

GELDEF(SYSTEM,GELEM) = PRIELE

where PRIELE is the primary value of ELEM to be used for that generic element unless other conditions override this selection. The element PRIELE is verified as ELEM.

The user vocabulary terms are defined in

UVNAM(SYSTEM,UVNAM) = UVTERM,UVHNAM,UVTEXT

where UVNAM is the term name in lowercase and UVTERM is the same term in uppercase and lowercase. Only UVTERM values will be displayed to the user; all searching and ordering will use the lowercase UVNAM terms. UVTEXT is a text description for help messages, and UVHNAM, if supplied, indicates a UVNAM value whose UVTEXT value should be used for help messages. Naturally, UVHNAM is verified as UVNAM.

Links among the internal vocabulary terms (ELEM) and the relations (REL) are maintained in the tables ELRELK, ELRELN, and ELRELF described above. Links between the internal and generic vocabularies are maintained by the pair of related tables:

GELDB(SYSTEM,GELEM,"GE",ELEM)
GELDBI(SYSTEM,ELEM,"GE",GELEM)

Links between the generic and user vocabularies are maintained in the following pair of related tables.
GELUV(SYSTEM,GELEM,"UV",UVNAM) = UVGESR
UVGEL(SYSTEM,UVNAM,"UV",GELEM) = UVGESR

where UVGESR is the source of the user vocabulary term. (These two pairs of related tables are not symmetric as in the case of RANRS and RANRSI. Here, the same literal index is used because the tables will be maintained in different structures. INA structures are defined for internal elements, generic elements, and the user vocabulary terms. In these structures, the two high-order index elements of each structure table must be identical. The INA structures are described below.) Two sources are permitted. The term may be taken from the definition of the primary element
(PRIELE) for the generic element (i.e., it may be the value of ELNAME associated with PRIELE). After changes have been made to the internal data model, such terms can be automatically deleted and regenerated to reflect those changes. Alternatively, the value of UVGESR may indicate that the association between the GELEM and UVNAM values was established by an analyst; such links can be deleted only by an analyst.

The user vocabulary must support two different functions. First, it must facilitate the identification of domain terms that map directly onto database attributes. Second, it must express frequently used query concepts that imply selection criteria and groupings of terms to be listed. Considering the first function, three ways to identify a user vocabulary term are provided.

By direct selection of the term after entering the initial letters and selecting from a list that matches the input pattern.

By navigating through a structure that lists broader than, narrower than, and similar to terms to identify the desired term (see Figure 3.6). This structure is called a tangle (from Pople's term tangled hierarchy [Popl82]). None of these relationships is transitive, and the relationships displayed at the local level will depend on the navigation path taken.

By navigating through a formal thesaurus in which all terms are in a fixed hierarchy with links for see also, broader than, and narrower than.

The first of these access methods is supported directly by the table UVNAM. Two pairs of related tables define the tangle:
UVTRD(SYSTEM,UVNAM,"D",UVLNAM)
UVTRU(SYSTEM,UVLNAM,"U",UVNAM)
UVTRS(SYSTEM,UVNAM,"S",UVLNAM)
UVTRSI(SYSTEM,UVLNAM,"S",UVNAM)

where UVLNAM is the linked term defined to be verified as UVNAM. The pair of tables UVTRD and UVTRU complement each other; they define the narrower-than category (D for down) and broader-than category (U for up). The symmetric tables UVTRS and UVTRSI define the similar-to property.

The vocabulary structure (thesaurus) is defined as a tree in the tables:

UVSTR(SYSTEM,UVNAM,"ST",UVSTNO) = UVLNAM
UVSTRI(SYSTEM,UVLNAM,"STX") = UVNAM,UVSTNO

where UVSTNO is a sequence number used to establish an ordering among the children of a parent node. See-also, broader-than, and narrower-than links, which refer to nodes in the structure hierarchy, can be defined in a form similar to that used for the tangle.

The second function of the user vocabulary terms is to abstract a broader concept for use in a query. There are two categories of abstraction. The
first, called the defined terms, is established by a table with the following general organization:

UVDEF(SYSTEM,UVNAM,"DEF") = Definition of selection criteria

Here the data elements contain information necessary to link the name given in UVNAM with selection criteria expressed in terms of the internal vocabulary. In the example given in Chapter 3, Current is defined as YR="89". The data terms for UVDEF have not been listed; they include the evaluated expression plus the elements necessary to preserve referential integrity among the three vocabularies.

The second abstraction, the group, identifies a set of attributes to be listed when a single user term is specified. Groups are established in the table:

UVGRP(SYSTEM,UVNAM,"GRP",UVGRNO) = UVLNAM

where UVGRNO identifies the default order for the listing of the attributes in the output. Thus, the group defines one user term to be an ordered list of terms. (In the example given in Chapter 3, Unit is defined to be the coded unit identifier plus a description.)

All query definition functions operate at the level of the user vocabulary; the query resolution manager translates (via the generic vocabulary) the stated query into an SQL query using the internal vocabulary. In this processing of a query, two types of user vocabulary terms can be identified. Terminal terms are those that map into some internal vocabulary elements. The map may be either direct, as with terms linked to generic elements through UVGEL, or indirect, as with the terms in the tables UVDEF and UVGRP. No user selection of a term that is not a terminal can be processed by the query resolution manager. The nonterminal user vocabulary includes terms used to select the terminals. This includes terms in the tangle or the thesaurus structure. Such terms may be referenced to identify a term to be used in the query, but they may not be selected in specifying a query. Naturally, the tangle and thesaurus consist of both terminal and nonterminal user terms. In the data model just presented, no distinction between these two types of terms was indicated. As I will show in the following chapter, procedural code that relies on an existence function is used to determine the classification.

In the preceding data model description, five structures containing 35 tables were used. These structures are identified in Table 4.2.
System Level Information (Structure DMSY)
  SYSTEM   System Identification
  SYSPHN   System Phone Numbers
  SUBSYS   Subsystem Identification
  SYSREL   Relation Order in Subsystem
  SUBORD   Subsystem Order

Relations (Structure DMRE)
  REL      Relation Definition
  RELK     Relation Key
  RELN     Relation Nonkey Elements
  RELF     Relation Foreign Keys
  RELFST   Foreign Key Strings
  RSH      Relationship Definitions
  RSHL     Relationship Definition (Link)
  RANRS    Analyzed Relationships
  RANRSI   Analyzed Relationships, Inv
  RANKD    Analyzed Element Dependencies

Internal Elements (Structure DMELI)
  ELEM     Element Name
  ELDEF    Element Definition
  ELRELK   Element Index, Keys
  ELRELN   Element Index, Nonkeys
  ELRELF   Element Index, Foreign Keys
  GELDBI   DB Refs to Generic Element

Generic Elements (Structure DMELG)
  GELDEF   Generic Element Definitions
  GELDB    DB Refs for Generic Element
  GELUV    Generic to Universal Vocab

External Elements (Structure DMELU)
  UVNAM    Universal Vocabulary Terms
  UVTRD    Universal Vocab Tree - Down
  UVTRU    Universal Vocab Tree - Up
  UVTRS    Universal Vocab Tree - Same
  UVTRSI   Universal Vocab Tree - S Inv
  UVGEL    Universal to Generic Terms
  UVDEF    Vocabulary SQL Definitions
  UVGRP    Output Groups
  UVSTR    Vocabulary Structure
  UVSTRI   Vocabulary Structure, Inverted
  UVSTRX   Vocabulary Structure Reference

Table 4.2 List of structures defined in the INA data model example (with DATMOD program identifiers and 30-character names).
In this discussion, it is clear that TEDIUM uses no diagrams. The design is viewed as a tangle; it can be understood in the small, but it is difficult to display the relationships in a single representation. Thus, TEDIUM offers several ways to view the application's current specification. The designer has online access to the definitions of terms and the organization of tables. Objects may be viewed or listed in various levels of detail (see Chapter 6). For the purposes of this discussion, most of the detail was suppressed. Figure 4.1 illustrates the more complete specification of a table with its associated element definitions. It contains a full listing for the table RANKD, which contains the element dependencies identified by the data model manager. The table listing includes the descriptive information about the table plus the definitions for each element in the table.*
11/14/88        TEDIUM LISTING FOR APPLICATION DATMOD        DATA GROUPS

RANKD    Analyzed Element Dependencies    ^DMRE    BIB 04/13/87    BIB 04/13/87

    This table identifies all dependencies that violate third normal
    form for the given relation.

INDEX TERMS:

SYSTEM   System Id                  VARIABLE LENGTH (15)
    This is the identifier for the system whose data model is described.

REL      Relation Name              VARIABLE LENGTH (30)
    This is the identifier of the relation in the data model.

IKEY     Index key variable         VARIABLE LENGTH (6)
    *** THE INDEX VALUE IN THIS TABLE IS COMPUTED AS "RAD" ***
    This is a general key used in indicies to separate tables in a structure.

MODE     Mode of Element            CHARACTER (1)
             F  Foreign key
             K  Key term
             N  Nonkey term
    This is the mode of use of an element in a relation.

ELEM     Element Name               VARIABLE LENGTH (30)
    This is the name of the element (attribute) used in a relation.

RELL     Linked Relation            VARIABLE LENGTH (30)
             VERIFIED AS REL
    This is the identifier of a relation that is linked to the root relation.

Figure 4.1 Listing of the table RANKD.
* A better format for this listing has been implemented in the T1 version, but only M1 outputs will be used in this book.
4.4 THE TEDIUM DATA MODEL AS A SEMANTIC DATA MODEL
In the earlier chapters I asserted that TEDIUM uses a semantic data model. Given that this term is poorly defined, any data model that formalizes semantic information beyond that implicit in the relational model qualifies to use the label. This minimal criterion is not particularly satisfying, and this small section examines why I believe that the title is valid even though the claim may not be immediately obvious.

TEDIUM does not follow the patterns now used in semantic data modeling research [Brod84, KiMc85, HuKi87, PeMa88]. There are two reasons for this. First, the M1 version of TEDIUM was frozen in 1982, before much of the current research was published. Second, and of greater importance, TEDIUM and semantic data modeling research address very different questions. Hull and King describe the role of the semantic data model as follows.

At the present time the practical use of semantic models has been generally limited to the design of record-oriented schemas. Designers often find it easier to express the high-level structure of an application in a semantic model and then map the semantic schema into a lower level model. [HuKi87, p. 211]

Restating this in terms of the essential software process model discussed in Chapter 1, semantic models are used as conceptual models (in the field of information systems, these two terms are sometimes accepted as synonymous) to express application domain concepts so that a formal, implementation-oriented model can be constructed. TEDIUM, however, attempts to capture as much semantic information as possible in a form that can be managed automatically by the program generator. Semantic information that cannot be expressed in the data model must be implemented in the form of procedural specifications. Thus, this section on semantic data models really asks two questions: how does TEDIUM represent the semantic information that other semantic modeling methods capture, and what are TEDIUM's limitations in this regard?

I begin with a comparison of TEDIUM to one of the most widely used approaches, the Entity-Relationship model (E-R) [Chen76, Chen85]. In the INA data model, two tables (RSH and RSHL) were expressly defined to capture the (non-graphic) information expressed in E-R diagrams. Two other tables (RANRS and RANRSI) were defined to capture a subset of that information implicit in the relational data model. Because the TEDIUM data model is similar to the data model tables defined for the INA, I begin by examining what is in RSH that is not in RANRS. By analyzing the use of the index elements in the TEDIUM data model of the INA, it is possible to identify all relationships among relations (in the E-R sense). Often the cardinality of the relationships can be identified as 1:1 (e.g., they are related tables or a pair of tables with equivalent key elements),
m:n (e.g., tables that define objects joined by a table containing both object identifiers as index elements), or 1:m (e.g., between two tables where the index elements of one are a proper subset of the index elements of the other). However, it is not always possible to determine the kind of correspondence by examining only the keys; in these cases, some additional information must be supplied by the analyst. Nevertheless, much of the explicit E-R relationship information is implicit in the definition of the relation dependencies.

One important piece of information in the RSH table that cannot be established by examining the relation definitions is the fact that a relationship is mandatory or optional. For example, ELEM identifies the valid internal elements and ELDEF defines its properties. These two tables have identical (nonliteral) keys and consequently are in 1:1 correspondence. What is not known, however, is if this relationship is mandatory or optional. Because ELEM defines the values for ELEM, we know that ELEM entries are mandatory. The issue is, will the creation of an entry in ELEM force the creation of an entry in ELDEF before the former can be saved? In the INA implementation, the ELDEF entry is optional; the designer felt that one should not be forced to define the attributes of an internal element when the ELEM value is defined as an attribute of a relation. Thus, in the INA notation, the relationship is 1:0-1 (1:1 with the second relation optional).

Extending these remarks to a comparison of the TEDIUM and E-R models, we see that the TEDIUM data model cannot always derive the cardinality of the relationship and has no means to express a mandatory/optional constraint for the relationship. Moreover, because all tables are defined in a uniform notation, TEDIUM offers no mechanism for distinguishing between entities and relationships. (Indeed, as shown in Table 4.2, both can be combined in the same structure.) Finally, because of the author's prejudice, TEDIUM eschews any form of diagram. On the positive side, however, TEDIUM offers a much richer form for expressing dependency, supports the concept of a defined relationship between an element and a table, and produces a model that can be implemented directly. Thus, TEDIUM provides much, but not all, of the functionality of the E-R model and includes features that it does not contain.

Still, the question of how well TEDIUM satisfies the goals of a semantic data model remains largely unanswered. To provide a more complete response, I now examine the TEDIUM data model in the context of the criteria established by Peckham and Maryanski [PeMa88]. They surveyed the major semantic data models and, with a focus on conceptual modeling issues, identified eight concepts to compare the semantic data modeling capabilities of each model. I conclude this section by explaining how TEDIUM supports semantic data modeling in those categories.

1. Representation of unstructured objects. Unstructured objects are the low level types that cannot be constructed through aggregation. In TEDIUM there are the standard types of integer, number, and character string. There also are the less common types of date and time, which imply the automatic validation of inputs. Finally, there is the text data type, which may be considered an instance of a more general class that includes graphic, image, and voice objects.
2. Relationship representation. This refers to the representations presented to the modeler for analysis (or, in TEDIUM, to the designer for implementation). They note that a relationship may be embodied by attributes, as in the case of a defined element in a table. The relationship also may be presented as an entity, as in the relationships implicit in a TEDIUM structure. Finally, the relationships may be expressed as an independent object (e.g., the links between the internal and generic elements). Because TEDIUM relies on a single formal construct without a diagrammatic notation, the classification of TEDIUM definitions as relationships is ambiguous at times.

3. Standard abstractions present. This refers to the kinds of abstraction mechanisms that are supported. There is a philosophical conflict that affects how TEDIUM supports this concept. The semantic data model is intended to operate at the conceptual modeling level, and it provides mechanisms for dynamically assisting the modeler in understanding the problem. In this context, generalization aids the modeler in ignoring minor differences among objects to produce a higher order type that captures what is common among those objects. Similarly, aggregation provides a means by which relationships among low-level types can be considered a higher-level type. TEDIUM is implementation oriented, and it expresses the abstractions as tables for implementation. Hierarchical (and inherited) relationships are implicit in the data model definition and not expressed explicitly. For example, if it was considered useful, the INA design could have generalized the common features of the internal and generic elements by defining an "element" table; ELEM and GELEM then would be defined as verified as (V) the identifier used in that table. Although that might be a useful concept, it would have added nothing to the implementation. An example of an implemented generalization is the internal element structure in which the literal keys indicate that each table shares the common properties of the highest-level entity, ELEM, which defines the internal element. An example of aggregation is the definition of the table RSH that defines a relationship between two tables. Finally, extending the discussion to other abstractions such as classification (e.g., is-instance-of) and association (e.g., is-member-of), the verified as element constraint can be used to represent hierarchical relationships.

4. Network or hierarchies of relationships. This refers to a diagramming context, and there is no need to reiterate my prejudices again.

5. Derivation/inheritance. This relates to the way in which the semantic model handles repeated information. In TEDIUM, the approach is to define the objects at a relatively detailed level and then build larger structures that reuse them. However, TEDIUM does not support the creation of derived information. For example, the value of NumberOfIndexElements must be computed explicitly; it cannot be derived implicitly from a definition or automatically recomputed when the number of elements changes. There also are no automatic mechanisms for inheritance; multiple inheritance is always managed procedurally.
6. Insertion/deletion/modification. This refers to the support provided by the data manipulation language (DML) to preserve integrity. For the TEDIUM data model, the DML is integrated into the specification language. (See Chapter 5.) There also are services for automatically ensuring integrity, which will be improved in the T1 version.

7. Degree of expressing relationship semantics. TEDIUM cannot express cardinality or derivation. It can denote the role of elements in a table, attribute validation criteria, and some interrelation integrity requirements. TEDIUM also has a rich notation for describing semantically meaningful dependencies and for grouping lesser units into structures.

8. Dynamic modeling. This describes the semantic properties of the database transactions (in contrast to the static model, which describes the properties of the data objects and relationships). In TEDIUM, some of the dynamics are incorporated in the system style, and the static definitions in the data model imply default dynamic processes. Of course, this is the difference between a modeling and an implementation environment. The latter must have more tools for expressing the dynamics, but those tools need not be part of the data modeling component.
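As promised earlier, the following short Python sketch illustrates the key comparison used to derive a candidate correspondence between two tables. It is only an illustration of the rule described in the text, not TEDIUM code; the table representation (an ordered list of index element names) and the key names used in the example are assumptions made for this sketch.

# Illustrative sketch only: infer a candidate correspondence between two
# tables from their index (key) elements, as discussed in the text.
def infer_correspondence(keys_a, keys_b):
    """Return a candidate correspondence: '1:1', '1:m', 'm:1', or 'unknown'."""
    a, b = set(keys_a), set(keys_b)
    if a == b:
        return "1:1"      # identical keys, as with ELEM and ELDEF
    if a < b:
        return "1:m"      # keys of the first are a proper subset of the second
    if b < a:
        return "m:1"
    return "unknown"      # the analyst must supply additional information

# Key names below are hypothetical; mandatory/optional still cannot be
# decided from the keys alone.
print(infer_correspondence(["SYSTEM", "ELEM"], ["SYSTEM", "ELEM"]))           # 1:1
print(infer_correspondence(["SYSTEM", "TABLE"], ["SYSTEM", "TABLE", "ELEM"])) # 1:m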
4.5
AREAS OF CONTINUING RESEARCH
Before concluding this chapter on the data model, I identify some extensions that are either in progress or planned. First, the T1 version has a much better user interface. Tables are defined with the short functional notation used throughout this chapter. Direct manipulation is applied wherever possible, and there is considerable flexibility in managing the processing flow. For example, after a table has been identified, the program asks the designer to define the new elements. This definition activity can be deferred, and status tables identify objects whose definitions are not complete. At the end of each session, the designer is given an opportunity to review the status data and close any open definitions.

The T1 version also provides more flexible flow when defining related tables and structures. The six-character, uppercase limitation on object names has been removed. After any object is edited, comprehensive validation tests are performed for that object and its related objects. For instance, after editing a table with a defined index element, the system checks that the defined index element still satisfies the criteria for a valid definition. Finally, most report formats have been revised slightly to make them more effective.

In the T1 version, some additional data structures have been provided, including serial tables (with no index other than the implicit first and next), arrays, and stacks. Versions of each structure have also been defined for internal program use. That is, the designer may use the same tools (and DML) for working structures that are deleted at the end of the session.

I plan to extend the related table concept to include a predicate (expressed in the TEDIUM command language) that will establish if a
secondary table entry is to be created or maintained. For example, given the table

ACCOUNTS(Customer,Date) = Amount,...

one could define

OVERDUE(Customer,Date) = Amount where Difference(Today,Date) between 120 and 360.

The system style would examine the predicate each time the primary entry is processed and add or delete secondary records as necessary.

The role of an element in a table establishes a first level of intratable constraint definition facility. It indicates, for example, that certain elements are mandatory. I plan to extend this to define further constraints among elements in a table; for example,

SCHEDULE(TaskId) = TaskTitle,DateStart,DateStop where DateStart < DateStop.

Such information is easily processed by the program generator.

An extension of the above constraints would be to include intertable constraints, for example, an indication that a relationship between two tables is mandatory if a given predicate is satisfied. To illustrate the concept,

PERSONS(PersonId) = PersonName,MaritalStatus,...
SPOUSE(PersonId,SpouseId) = PlaceEmployed,...
PERSONS: If MaritalStatus = "M" then SPOUSE mandatory.

This is similar to the related table predicate except that these two tables are not related. My major concern with this extension is that it is not obvious how the system style can define what should be done when the predicate is true. In this example, should the generator prompt for spouse information when the marital status is "M"? If it does so, how will this affect the flow of the user interaction? In short, the real question is, once TEDIUM collects the knowledge about what the application is to do, what should it do with it?

When using a semantic data model as a design tool, the implementation decisions can be deferred. All information about the problem space should be documented for use during implementation. But with TEDIUM, information collected about the application should be transformed by the system style/program generator into part of the implementation. If the program generator cannot be instructed on how to manage this transformation, then the information should be treated as text.
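To suggest concretely what a generator might do with the related-table predicate sketched above (the OVERDUE example), here is a minimal Python illustration. It is not generated TEDIUM code; the dictionary stand-ins for the tables, the function name, and the date arithmetic are all assumptions made for the sketch.

# Minimal sketch: keep the secondary table OVERDUE consistent with its
# predicate each time a primary ACCOUNTS entry is processed.
from datetime import date

ACCOUNTS = {}   # (customer, posting_date) -> amount
OVERDUE  = {}   # (customer, posting_date) -> amount

def put_account(customer, posting_date, amount, today=None):
    today = today or date.today()
    ACCOUNTS[(customer, posting_date)] = amount
    age = (today - posting_date).days
    if 120 <= age <= 360:                        # the related-table predicate
        OVERDUE[(customer, posting_date)] = amount
    else:
        OVERDUE.pop((customer, posting_date), None)

put_account("C1", date(2024, 1, 15), 250.00, today=date(2024, 7, 1))
print(OVERDUE)    # the entry qualifies: roughly 168 days old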
Finally, some preliminary work has gone into extending the definitions of objects in the data model. Two facilities are being experimented with: the first is concerned with derived elements, and the second introduces a limited form of logic inferencing. For example, given the table

PARENT(Parent,Child)

one should be given the facility to define the element

Parent:NumberChildren := Card(Child in PARENT)

and

GRANDPARENT(Person1,Person2) := PARENT(Person1,x),PARENT(x,Person2).

The NumberChildren element could be used whenever the context of Parent was already established, and the values of Person1 and Person2 could be displayed using a slight modification of the For Each command to be described in the following chapter. In both cases, assuming that the scope of the syntax is not too broad, there should be little difficulty in establishing how these constructs are to be interpreted by the generator.
The elements to the left of the colon establish the context. Read the following line as, for a fixed value of Parent, the value of NumberChildren is defined to be....
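The intent of the two proposed facilities can be paraphrased in a few lines of Python. This is only an illustration of the derivation and inference rules written above, under the assumption that the PARENT table is held as a set of (parent, child) pairs; it is not the TEDIUM implementation.

# Illustrative sketch: the derived element NumberChildren and the
# inferred relation GRANDPARENT over a PARENT table of (parent, child) pairs.
PARENT = {("Ann", "Bob"), ("Ann", "Carl"), ("Bob", "Dina")}

def number_children(parent):
    """Card(Child in PARENT) for a fixed value of Parent."""
    return sum(1 for p, _ in PARENT if p == parent)

def grandparents():
    """GRANDPARENT(P1,P2) := PARENT(P1,x), PARENT(x,P2)."""
    return {(p1, c2) for p1, c1 in PARENT
                     for p2, c2 in PARENT if c1 == p2}

print(number_children("Ann"))   # 2
print(grandparents())           # {('Ann', 'Dina')}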
Chapter 5 Program Specifications
5.1
INTRODUCTION
All processing flow in TEDIUM is defined by the program specifications. Two types of specifications are available. Common specifications, written in the TEDIUM command language, are compact procedural expressions of process flow. Generic specifications, on the other hand, are declarative statements for highly specialized processes. The flow of a generic program may be modified by use of TEDIUM command statements. Because both types of specification rely on the TEDIUM command language, this chapter begins with a description of the language.

This immediately raises the question of whether the TEDIUM command language is a very high level language, a fourth generation language, or a specification language. The context of the language's definition was given in Part I of the book, and so I answer this question with an old riddle. If you call the tail of a dog a leg, how many legs does a dog have? The answer: four, because calling the tail a leg doesn't make it one. I call the TEDIUM command language a specification language; I leave the proof to the reader.

Once the command language is described, there are sections that detail the common and generic specifications. Examples build on the INA data model defined in Chapter 4. The final section examines some extensions planned for the next version of TEDIUM (T1).
5.2
THE TEDIUM COMMAND LANGUAGE
In keeping with the general philosophy of parsimony, the number of commands in the TEDIUM language is small. Table 5.1 lists the general categories of command with their associated counts; a summary of the commands is contained in Appendix B. Before explaining the commands, I first describe the organization of the command statements and command primitives that they use.
Type of command           Number of commands   Examples
TEDIUM functions          6                    Set up background job, control generator, no operation
Control functions         5                    Loop through database, call programs
Computation functions     3                    Assign and free variables
Database functions        9                    Low level navigation, put, get
Input/output functions    16                   Read input, prompt, write, head listings

Table 5.1 Summary of TEDIUM command functions.

5.2.1

Command Statements
The command language is organized as five fields. The objective is to structure a language that is easy to read on a display screen. Procedural commands are drawn in a box. Outside the box to the left are the labels that indicate where control can be transferred into the flow, and outside the box to the right are the labels that indicate transfers from the sequential flow. Obviously, the language is not structured.

Inside the box, the commands always appear in the same position relative to the left side of the box. I experimented with indentation to indicate nesting level but found that it didn't seem to add much to the clarity of the program flow. Having the commands in the same position, however, did seem to make it easier to scan through the program text to identify specific commands such as call or input. Thus, the command mnemonic and the associated parameters are always in the same position relative to the left border of the box. Also within the box, each command has an optional condition statement, called the guard, that must be true if that statement (or the block initiated by that statement) is to be executed. If the guard predicate is given, it is listed following the string "ON CONDITION that" supplied by the TEDIUM listing program.

Figure 5.1 shows a small box of TEDIUM commands that was inserted into the specification of the menu that controls the INA data model manager listing functions. Here a request to browse the user vocabulary interactively causes the following sequence of actions:
With the graphics now available, I plan to experiment with the width of the left margin of the box to indicate nesting level. I have tried different characters for the left side of the box, but the result was an overloading of information rather than a clarification. It may simply be that the indentation so commonly used with other languages is incompatible with the structure I selected for TEDIUM.
Figure 5.1 Sample box from a TEDIUM specification. [The box contains four statements: a call to SETUV at the label BEG, a call to UVLS1 (both with abnormal returns to ASK), a prompt at the label ASK reading "Browse starting with another term (N/Y)", and a no-operation guarded by the condition YC="Y" that transfers control back to BEG.]
At the statement with the label BEG, there is a call to the program SETUV, which allows the user to select a term from the user vocabulary. If a term is selected, it is part of the context for all that follows. If a term cannot be selected, then an ABNORMAL RETURN is signaled, and control is transferred to the statement with the label ASK.

For the context established by the previous command, the program UVLS1 is called. It provides an interactive browsing facility similar to that shown in Figure 3.6. It can terminate normally, in which case control passes to the next command, or abnormally, in which case control is transferred to the statement with the label ASK.

The statement with the label ASK prompts the user with the statement:

Browse starting with another term (N/Y)

In the present system style, the user may enter any of the following valid commands:

Null            The first option (N) will be stored in the return variable (YC).
N or Y          The validated input will be stored in YC.
Help request    A help message, if available, will be listed, and the prompt will be repeated.
Escape request  The command transfers control to the location defined by the abnormal return field. If this field is null, then the default is an abnormal return from the box.
If an invalid entry is made, the generated program repeats the prompt. Notice that this command establishes a context in which the return variable YC is defined and contains either an "N" or a "Y". No other possibilities are allowed, and no other tests are required. Control now transfers to the next statement. The final statement has a guard of:

YC="Y"
The command is the no-operation. If the guard is true, then control is transferred to the statement with the label BEG. (The dashed line to the left of BEG is present to eliminate parallax errors.) If the guard is not true, then control is transferred to the next statement in the box. Because this is the last statement in the box, the result is a normal return from the box.

This simple example illustrates how each TEDIUM statement is decomposed into five fields.
Label
This optional field is required if transfer is to be sent to the command line. It is also used for help messages. For example, in the above example, ASK is used to associate a help message with a specific prompt in the program. If the label were not supplied for control purposes, then another label might be provided to link this statement with its help message.
Guard
If this optional field does not evaluate to true, then the command (or block that the command initiates) will not be executed. The listing program precedes the guard with ON CONDITION that and prints the guard on a line (or lines) before the command line.
Operator
This is the only mandatory field. A complete list of TEDIUM M1 commands is given in Appendix B.
Parameters
The format of this optional field is a function of the operator (see Appendix B).
Control
Two optional control fields may be specified, one for normal and the other for abnormal returns. The definition of an abnormal return will depend on the operator (see Appendix B). Two generic labels are available for the control field: $QUIT for the normal return and $ERR for the abnormal return. A null abnormal return is processed as $ERR.
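To summarize the decomposition, the following Python sketch models a command statement as a record with the five fields just described. The class and the sample instance (an approximation of the prompt statement at label ASK in Figure 5.1) are assumptions made for illustration; they are not the representation used by the TEDIUM generator.

# Sketch only: a TEDIUM command statement as a five-field record.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Statement:
    operator: str                   # the only mandatory field
    parameters: str = ""            # format depends on the operator
    label: Optional[str] = None     # target for transfers; also links help text
    guard: Optional[str] = None     # listed as "ON CONDITION that ..."
    normal: Optional[str] = None    # control field for the normal return
    abnormal: Optional[str] = None  # a null abnormal return is processed as $ERR

# The prompt statement at label ASK in Figure 5.1, approximately:
ask = Statement(operator="PR",
                parameters="Browse starting with another term (N/Y)",
                label="ASK")
print(ask)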
Because the average specification box contains only 15 statements, most boxes can be seen on a single display screen. By keeping the control flow keys outside the box, it is easy to follow the flow, even though each statement has two built-in GOTOs. By keeping the commands in a single column, it is easy to scan a listing and find the calls to other programs. By preceding the guards with the uppercase ON CONDITION, the predicates are readily recognized. Thus, although the general format is quite different from most other programming and specification languages, the present form is the result of considerable experimentation. Once learned, it is very easy to work with.
Various forms of indentation and presentation were tried for the guard, and the present form was found to be the most aesthetically pleasing and effective.
The structure of the TEDIUM command language is isomorphic to the augmented transition network (ATN) used by Winograd [Wino83] as a graphic tool for natural language processing applications. This is not to suggest that TEDIUM would be effective for that application class, nor that the ATN served as a model for the TEDIUM command language.
5.2.2
Command Primitives
Two types of primitives are used throughout the language: functions and parameter substrings. The former includes the standard MUMPS functions identified in Chapter 3 plus some higher level TEDIUM functions; the latter comprises units used for parameter fields and function arguments. The parameter substrings are considered first. The most important of these are associated with the identification of table-related objects.
Recall from the example in Chapter 4 that terms in the user vocabulary were linked to attributes in the internal vocabulary via generic elements. The tables GELUV and UVGEL contained the links between a user vocabulary term (UVNAM) and a generic element (GELEM). UVGEL was defined as

UVGEL(SYSTEM,UVNAM,"UV",GELEM) = UVGESR

To find out, for a given SYSTEM, if a user term UVNAM had a generic element associated with it, it would suffice to find out if there existed in the database some entry

UVGEL(SYSTEM,UVNAM,"UV",-)

where the dash indicates any GELEM value.
In the TEDIUM notation, this is called a table stub, and the stub for the above example is

UVGEL//IKEY

where IKEY is the name of the element that contains the literal string "UV". The slash (/) is used as a field delimiter in TEDIUM, and the table stub requires an initial field that identifies the table and a third field that identifies the lowest-order index element in the stub. The second field in the table stub indicates alternative values to be used for the keys. For example, recall that UVLNAM is verified as UVNAM and shares the same domain. To find out if the UVLNAM value has an associated generic element, one would ask if, in the TEDIUM notation,

UVGEL/UVNAM=UVLNAM/IKEY

is defined in the database. (The equal sign in this notation is not an assignment, and the current value of UVNAM is not referenced.) That is, substituting UVLNAM for UVNAM, one asks if
UVGEL(SYSTEM,UVLNAM,"UV",-) is defined in the database. The first two fields of the table stub are called the table indicator, and the indicator is used for several different data manipulation functions. Some of these functions process index elements, others process data elements. For example, the Set FirST index (SFST) command operates on an index element. Assuming that UVGEL//IKEY is defined, then SFST GELEM/UVGEL will set GELEM to the first value (in collating sequence order) of GELEM in UVGEL for the given context of SYSTEM and UVNAM. If there were no first value (i.e., if UVGEL//IKEY were not defined), then SFST would activate the abnormal return. The command SFST GELEM/UVGEL/UVNAM=UVLNAM would perform the same function in the context of SYSTEM and UVLNAM. The Get (G) command illustrates how the parameter substring is used with data elements. This command gets data elements from a table whose full context has been established. Thus, G UVGESR/GELEM sets UVGESR to the current established by the context of below, there are processing context does not define a table
value of the UVGESR data field in the entry SYSTEM, UVNAM, and GELEM. (As discussed options for specifying the outcome when the entry.)
Although there are further variations of the table substrings as well as other types of parameter substrings, these will be introduced when the operators are discussed. Thus, the discussion now turns to the functions.

The most useful of these is the existence function, which establishes if a table stub is defined in the database. It normally is used only in the guard field. Continuing with the current example, the guard

ON CONDITION that @EX(UVGEL//IKEY)

would allow the block starting with the current statement to be executed only if UVGEL(SYSTEM,UVNAM,"UV",-) exists in the database. All TEDIUM functions begin with the at sign (@) and have a two-character identifier. Often, a function is defined for both the positive and negative case. For example, there is @EX() for exists and @NE() for does not
exist; the latter also could be written '@EX(), where the apostrophe (') is the negation operator. A second pair of functions determines if an element is composed of only blank characters or is null. To test if the most recent input (YC) does not contain any nonblank characters, one might use the guard

ON CONDITION that @BL(YC)

The negation of @BL is @NB. There also are functions for manipulating dates, controlling output, and converting formats, but these will not be used in any of the examples that follow.

The MUMPS functions begin with a dollar sign ($) and normally are identified by a single character. Chapter 3 described $Piece, $Extract, and $Find. One additional function is $Data, which takes either an element or a node in a sparse array as an argument and returns a value indicating if the object contains data and (for a sparse array) pointers. The function is generally used in TEDIUM to test for the existence of an element. For example, unless it is defined as part of the context for the program, the element SYSTEM cannot be assumed to be defined (i.e., the symbol table for the generated MUMPS routine may not have SYSTEM as an entry). If so, then the statement

ON CONDITION that '$D(SYSTEM)
Call program to set a value for SYSTEM

may be appropriate. Again, there are other very useful MUMPS functions, but they will not be used in the examples that follow.
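For readers unfamiliar with the MUMPS notation, the guard predicates can be paraphrased with a few Python helpers. This is only a rough analogy; the names and the dictionary stand-in for the symbol table are assumptions made for the sketch.

# Sketch only: rough analogues of the guard predicates described above.
symbols = {"YC": "   "}          # stand-in for the symbol table of a routine

def bl(value):                   # @BL: only blank characters or null
    return value.strip() == ""

def nb(value):                   # @NB: the negation of @BL
    return not bl(value)

def d(name, table=symbols):      # rough analogue of $Data on an element
    return name in table

print(bl(symbols["YC"]))         # True: no nonblank characters
print(d("SYSTEM"))               # False: SYSTEM is not yet defined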
5.2.3
TEDIUM Commands
A summary of the commands is contained in Appendix B, and in this section only the most commonly used commands are described. I begin with some historical observations about the M1 commands. First, the generator used a very simplistic parser. It looked for only certain TEDIUM features and assumed that the remainder was in a correct MUMPS syntax. The result was that many parameter fields were written as MUMPS arguments and some of the syntax testing was deferred to the MUMPS compiler (or, worse yet, the interactive testing).
The fact that the TEDIUM and MUMPS functions have a different syntax is an historical accident that will be corrected in version T1. During the building of the M1 version, all users were familiar with MUMPS; there was no perceived need to generate code in any language other than MUMPS. This is no longer true, and the T1 version of TEDIUM will eliminate its MUMPS artifacts so that programs can be generated in any language. A complete definition of all commands, generic programs, and functions is available in [Tedi85].
Consequently, as already illustrated with the functions, the command syntax is not as unified as one would wish. A second limitation was that the first portions of the generator were implemented in MUMPS code, and I never had the time to go back and redo things properly. (Of course, this is the objective of version T1.) Nevertheless, I did begin to define some primitive operators that could be used to generate higher level operators. I also altered the generator so that incompletely implemented TEDIUM operators could not be used in situations that might result in an error. For example, to delete entries in a table with a related table, the user was required to use the primitive delete operator. These primitive operators have the initial letter Y, and they are used in some of the examples below. Finally, the elegance of the language was crimped by the 1982 features of MUMPS plus the fact that the M1 version was bootstrapped from even earlier versions of TEDIUM. For example, as seen with the data model definition, the variable and program names are limited to six characters. As a result, my apparently arbitrary selection of mnemonics can make the examples difficult to comprehend. Despite these limitations, however, I believe that the command language is small enough and sufficiently intuitive to be learned rapidly.

Table 5.1 indicates that there are five control functions. The most important of these are:
Call
This transfers control to another program; return is to the following statement unless a control field is supplied. In keeping with the 1982 MUMPS conventions, the parameter field specifies only the program to be called; there is no passing of parameters.
For Each
This loops through a single index in a table. The end of the loop is signified by an End statement, and if there is no next value for the indicated index, control is sent to the location specified by the End.
NeXt
This is similar to the For Each operator except that it sets all lower order indices from the given index element so that a table entry is defined.
To illustrate the For Each (FE) and NeXt (NX) operators, assume that we wished to list out all the user terms linked to a generic term. This could be done with the following code:

FE UVNAM/UVGEL
Write out UVNAM
END

The output will identify all user vocabulary terms related to one or more generic elements. Within this loop, nothing is known about GELEM; it may not be defined, and in general it will not be linked to the current value of UVNAM. Changing the problem, assume that we now wished to write out the GELEM linked to each UVNAM. Here the NeXt operator could be used.
NX UVNAM/UVGEL
Write out UVNAM," is linked to ",GELEM
END

The NeXt establishes the context for an element in UVGEL, and (if there were data elements) it could be followed by a Get. Of course, the NeXt block is equivalent to

FE UVNAM/UVGEL
FE GELEM/UVGEL
Write out UVNAM," is linked to ",GELEM
END
END

Both the For Each and NeXt accept the table indicator as part of the parameter string. They each also accept a primitive form of a "where clause" as an optional parameter string. Neither operator accepts a control field. (A sketch of how these loops behave appears after the database functions below.)

Of the three computation functions, the most important is the Assign operator. Like all MUMPS commands, it accepts multiple arguments if they are separated by commas. Because MUMPS is nondeclarative, if the left side of an assignment argument is not already in the symbol table, then it is added. This is true for both variables and sparse arrays. TEDIUM does not provide for the declaration of variables at the program level. Variable names are removed from the symbol table by the Undefine operator. (The Assign maps onto the MUMPS Set and the Undefine onto the Kill.)

The most important of the database functions are

Get
This gets data elements from the entry of the specified table with the current context. If there is no entry for the current context, then all data elements identified in the parameter string will be set to blanks or nulls. Thus, control will always be passed forward with all the identified elements defined. If a location is given for the abnormal return control, then the command begins by testing to see if an entry already exists; if it does not, the abnormal return is executed.
Put
This puts the set of elements into the table entry identified by the parameter field and context. If only a proper subset of the table's data elements are identified, the operator causes only the identified elements to be updated in this entry. (For the partial put, the abnormal return is taken if the entry does not already exist.) If there are related tables, they are updated at this time as well.
It is possible to define elements in the data model that are not part of any table, and this is often done for documentation purposes. However, because the assignment string is not parsed, type checking at the assignment level is not possible in version M1. This will be corrected with T1, and a local declaration statement will be introduced. Also, as discussed in Chapter 4, more extensive working (internal) table structures will be supported.
Delete
This deletes an entry in the table, a subset of the table as defined by a table stub, or an entire table. If the table is part of a larger structure, then other tables in the structure also may be deleted. Thus, if the GELEM context is established, DEL GELDEF/GELEM will delete the entire structure for that generic element in the generic element structure. Unfortunately, the M1 version does not automatically preserve referential integrity, and so GELUV will be deleted (because it is in the same structure as GELDEF), but its related table, UVGEL, will not be deleted because it is in a different structure. As will be shown below, procedural code and the primitive delete must be used in this case.
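As promised above, the following Python sketch suggests how the For Each and NeXt loops and the partial Put behave. The dictionary stand-ins (with the SYSTEM and literal "UV" keys omitted for brevity), the helper names, and the sample values are assumptions made for the illustration; the sketch is not TEDIUM code.

# Sketch only: a dictionary keyed by (UVNAM, GELEM) stands in for UVGEL.
UVGEL = {("weight", "WT"): "", ("weight", "MASS"): "", ("pulse", "HR"): ""}

def for_each_uvnam(table):                 # FE UVNAM/UVGEL: one index only
    return sorted({uvnam for uvnam, _ in table})

def next_entries(table):                   # NX UVNAM/UVGEL: full entries
    return sorted(table)

for uvnam in for_each_uvnam(UVGEL):
    print(uvnam)
for uvnam, gelem in next_entries(UVGEL):
    print(uvnam, "is linked to", gelem)

# A partial Put updates only the named data elements of an existing entry;
# the abnormal return is taken if the entry does not already exist.
PERSONS = {"P1": {"PersonName": "Smith", "MaritalStatus": "S"}}

def partial_put(table, key, **fields):
    if key not in table:
        raise LookupError("abnormal return")
    table[key].update(fields)

partial_put("P1" in PERSONS and PERSONS or PERSONS, "P1", MaritalStatus="M") if False else partial_put(PERSONS, "P1", MaritalStatus="M")
print(PERSONS["P1"])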
Other commands include the copy and move, which copies or moves (with a delete) an element, subset, or full table (or structure) to another position in the same or a different table or structure. Both require procedural code and primitives when dealing with related tables. Finally, there are commands like the Set FirST index illustrated above that get the first, last, or next index value. Of the 16 input/output functions, the most used include Input
This inputs and validates an element that has been defined in the data model. On option, it will allow editing, changes to the prompt line, execution in the command mode, and the acceptance of an undefined value. The default for Input is the data mode, and commas are accepted as data. When in the command mode, a user can enter several commands at one time, separating the commands with commas. The generated code to process the input first examines a command string to see if any input has been entered already. If so, the input to the next comma is processed, and there is no user interaction. Otherwise, the command prompts normally. The Input Command is an example of an operator that is defaulted to the command mode. It is used to input an element that need not be defined in the data model.
PRompt
This provides a prompt string and accepts only inputs that are valid in the context of the string. It is in the command mode by default.
WRite
This is the standard write statement. Its parameter string uses the standard MUMPS formatting operators augmented with some special TEDIUM functions.
There also are operators for defining head and foot statements, for writing the text elements, for modifying the text processor defaults, and for managing page formatting.
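The command-mode behavior of the Input operator, in which several commands entered at once are separated by commas and consumed before any further prompting, can be paraphrased with a short Python sketch. The buffer and the function name are assumptions for the illustration, not the generated TEDIUM code.

# Sketch only: consume a pending command string (commands separated by
# commas) before prompting the user.
pending = ["Y,N"]          # commands the user has typed ahead

def prompt(text):
    if pending and pending[0]:
        value, _, rest = pending[0].partition(",")
        pending[0] = rest
        return value                     # no user interaction needed
    return input(text + " ")             # otherwise prompt normally

print(prompt("Browse starting with another term (N/Y)"))   # Y
print(prompt("Browse starting with another term (N/Y)"))   # N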
Finally, the TEDIUM functions include the No-operation, which normally is used as a GOTO; a comment operator (*), which treats the parameter field as comments; STacKer operators, which set up background jobs that are managed by the TEDIUM stacker (see Chapter 6); and the Source command, which allows the designer to enter code in the target language — a feature normally reserved for file-oriented commands such as open, close, and lock.
5.3
COMMON PROGRAMS
TEDIUM common programs are composed of a control shell and a set of TEDIUM statements. The control shell establishes the context for entering and exiting the program. (In version T1, this shell will be expanded to support parameter passing and also to establish preconditions and normal/abnormal postconditions.) Definition of the control shell parameters is managed by menu, which also controls the definition of help messages. (Help messages may be defined after the program has been generated; any other changes to the program specification or the data model it references require a program regeneration for implementation.)

The following common programs illustrate how the TEDIUM commands are used. The first example is shown in Figure 5.2. It contains the specification for the program GETSYS, which gets a value for the system identifier SYSTEM. If the value input is not already defined in the INA data model, then the user is given an opportunity to enter a new value. The specification lists the program's identifier and short name, the designer and dates when the specification was first and last edited, and some descriptive text. The line numbers to the left of the box are used by the editor; naturally, version T1 will support full screen editing.

The control shell for GETSYS indicates that it has no initial context, but that after a normal exit it will establish the context of SYSTEM and SUBSYS. The program uses the following flow. At BEG it calls PRSYS, which is a generic prompt program described in the next section. If the prompt program can identify a value for SYSTEM, then the normal return is taken, and control is sent to the statement with the label SUBSYS. This sets the subsystem identifier (SUBSYS) to the same name as the system identifier (SYSTEM), which is the default. Because this is the last statement, the normal return is taken.

If the generic prompt program cannot find a matching value for SYSTEM, then it takes the abnormal return and sends control to ADD. The last read input is maintained in YC, and the statement with the label ADD tests to see if it is null or blank; if so, the program takes the abnormal return (to the generic label $ERR). If the abnormal return is taken, then the desired context
In the M1 version I used the terms preset and passed, but the T1 version will support parameter passing and so such concepts will be managed properly. As explained below, the T1 version also will have tools to control and test for the context throughout the application.
[Figure 5.2 (TEDIUM listing for application DATMOD): the specification GETSYS, Get or Define System, designed by BIB. The descriptive text reads "This program prompts for a system id, and if one is not found allows the user to define one." The passed elements are SUBSYS (View (Subsystem) Name) and SYSTEM (System Id). The box contains the 16 specification statements discussed in the text, and the listing ends with the internal help message for the label PR1.]

Figure 5.2 Listing of the program specification GETSYS.
may not have been established, and it becomes the responsibility of the calling program to decide what action to take. The default action for the calling program is to take the abnormal return; thus, the decision is defaulted to some higher level of control.

Assuming that a new value is to be entered, the contents of YC are stored in SYSTEM and the following prompt is printed:

Add this as a system    Enter system    Retry
Again, the response is in YC. If an "R" was entered, control is sent back to BEG. The next three statements illustrate how the present limitation of TEDIUM's blocking structure introduces housekeeping. Here, if the length of SYSTEM is less than 2 or over 15, we want to prompt the user with [Please reenter the full System Id] and then set the last read variable to indicate that the value for SYSTEM is to be entered in its entirety. Because version M1 does not support this elementary form of block structure, YC is set to a flag value to act as a processing flag. In any case, control falls to the statement that checks to see if YC is equal to "E". If so, it requests a mandatory (MAND) input for SYSTEM accepting an undefined (UNDEF) value (i.e., the input will not be tested for existence in the table SYSTEM). The next statement can be reached only if SYSTEM contains a non-null value that is valid. It tests to see if the value already has been defined; if so, it prints the message and returns control to BEG.

If this is a new system identifier, then the text for the system description (SYSDES) is processed with the Input Text command. Following this, the system prefix (SUBPRE) is requested. The default is that this is optional, and the next statement will assume that the context includes a valid value for SUBPRE. It Puts all elements into the table SYSTEM. (The null field before the table name implies that a full record is to be updated.) The default subsystem name then is defined to be the system name, and the name and sequence order value are Put into SUBSYS. Because the context has now been defined, the program takes the normal return by transferring control to $QUIT.
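The flow just described can be paraphrased in Python for readers who prefer to see it as ordinary code. This sketch is only an approximation of the GETSYS specification: the dictionary stand-ins for the SYSTEM and SUBSYS tables, the function name, and the reduction of the generic prompt program PRSYS to simple input calls are all assumptions made for illustration.

# Sketch only: the GETSYS control flow paraphrased, not generated TEDIUM code.
SYSTEMS = {}      # SYSTEM id -> {"SYSDES": ..., "SYSPRE": ...}
SUBSYSTEMS = {}   # (SYSTEM, subsystem name) -> sequence number

def getsys():
    while True:
        system = input("ENTER System Id: ").strip()        # stands in for PRSYS
        if system in SYSTEMS:
            SUBSYSTEMS.setdefault((system, system), 0)     # default subsystem
            return system, system                          # normal return
        if not system:
            raise LookupError("abnormal return")           # context not established
        choice = input("Add this as a system    Enter system    Retry ").strip().upper()
        if choice == "R":
            continue                                       # back to BEG
        if choice == "E" or not 2 <= len(system) <= 15:
            system = input("Please reenter the full System Id: ").strip()
        if system in SYSTEMS:
            print("This system has already been defined")
            continue
        sysdes = input("PROCESS TEXT FOR System Description: ")
        syspre = input("System Prefix: ")                  # optional
        SYSTEMS[system] = {"SYSDES": sysdes, "SYSPRE": syspre}
        SUBSYSTEMS[(system, system)] = 0
        return system, system                              # normal return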
The database is updated, and a subsequent call to GETSYS with an input of D produces the result shown in Figure 5.4.
ENTER System Id: DEMO
 1 DEMONSTRATION
END OF LIST
ACCEPT (Y/N) N
Add this as a system    Enter system    Retry ?
The system identifier that you selected is not known to the data model manager. You may wish to:
A - add a new system using the name that you have entered
E - add a new system but reenter the identifier already supplied
R - retry to enter a previously defined system identifier
Add this as a system    Enter system    Retry
PROCESS TEXT FOR System Description
:This is another small demonstration database used to illustrate special
:features of the Data Model Manager.
:
System Prefix : ?
The system prefix defines authority in the SQL queries. It is prefixed to the table name.
MORE INFORMATION (N/Y)
System Prefix :

Figure 5.3 Sample dialogue using GETSYS.
ENTER System Id: D
 1 DEMO
 2 DEMONSTRATION
 3 DRD
END OF LIST
ENTER NUMBER OR (Q)UIT

Figure 5.4 Query using GETSYS after an update.
From this short example, it can be seen how the specification defines the flow to be supported by the program. Despite its cumbersome blocking structure, the specification’s 16 lines do a considerable amount of work. Much of the housekeeping is managed by TEDIUM. The generated program processes help requests for the prompt (where the message is linked to the label PR1) and the two inputs (where the messages are taken from the data model dictionary). The specification guarantees that the value for SYSTEM will be valid, and it incorporates a test to ensure that an existing SYSTEM definition will not be overwritten. Clearly, the specification is procedural, but that is an expression of the desired flow. Nothing in the specified flow is implicit in the system style; neither is there a generic program that generalizes the flow.
Thus, for the given style, this specification is minimal; using the TEDIUM command statements, it indicates how the process is to be carried out.

Figure 5.5 contains the specification for a program to list out the generic elements. The general style is to limit comments to the descriptive text associated with the specification; the specifications are usually small enough to eliminate the need for additional commentary. In this case, the program is called with the SYSTEM context set and OPT2 set to either S, for a one-line-per-generic-element listing, or D, for the full definition. OPTS identifies the element with which to begin the listing. (As in most TEDIUM applications, the same listing programs are used online and offline; the OPTS is most useful for online scanning starting with some prefix or element.)
[Figure 5.5 (TEDIUM listing for application DATMOD): the specification GELS, List Generic Elements, designed by BIB. The descriptive text reads "This program lists out the generic elements. The option flag, OPT2 is S for one line summary and D for full definition. The listing begins with the element OPTS." SYSTEM (System Id) is preset, and OPT2 and OPTS are saved. The statements begin with a HD command that heads the listing "List of Generic Elements", a WP INIT, guarded assignments that supply default values for OPT2 and OPTS, and a FE loop over GELEM in GELDEF starting with OPTS.]