
Using program slicing in software maintenance

1991, IEEE Transactions on Software Engineering

Program slicing, introduced by Weiser, is known to help programmers in understanding foreign code and in debugging. We apply program slicing to the maintenance problem by extending the notion of a program slice (that originally required both a variable and line number) to a decomposition slice, one that captures all computation on a given variable; i.e., is independent of line numbers. Using the lattice of single variable decomposition slices, ordered by set inclusion, we demonstrate how to form a slice-based decomposition for programs. We are then able to delineate the effects of a proposed change by isolating those effects in a single component of the decomposition. This gives maintainers a straightforward technique for determining those statements and variables that may be modified in a component and those that may not. Using the decomposition, we provide a set of principles to prohibit changes that will interfere with unmodified components. These semantically consistent changes can then be merged back into the original program in linear time. Moreover, the maintainer can test the changes in the component with the assurance that there are no linkages into other components. Thus, decomposition slicing induces a new software maintenance process model that eliminates the need for regression testing.

K. B. Gallagher
Computer Science Department
Loyola College in Maryland
4501 N. Charles St.
Baltimore, Maryland 21210

J. R. Lyle
Computer Science Department
University of Maryland, Baltimore Campus
5401 Wilkens Avenue
Baltimore, Maryland 21228

August, 1991

Index terms: Software Maintenance, Program Slicing, Decomposition Slicing, Software Process Models, Software Testing, Software Tools, Impact Analysis

1 Introduction

In "Kill that Code!," [32] Gerald Weinberg alludes to his private list of the world's most expensive program errors.
The top three disasters were caused by a change to exactly one line of code: "each one involved the change of a single digit in a previously correct program." The argument goes that since the change was to only one line, the usual mechanisms for change control could be circumvented. And, of course, the results were catastrophic. Weinberg offers a partial explanation: "unexpected linkages," i.e., the value of the modified variable was used in some other place in the program. The top three of this list of ignominy are attributed to linkage. More recently, in a special section of the March, 1987 issue of IEEE Transactions on Software Engineering, Schneidewind [30] notes that one of the reasons that maintenance is difficult is that it is hard to determine when a code change will affect some other piece of code. We present herein a method for maintainers to use that addresses this issue.

While some may view software maintenance as a less intellectually demanding activity than development, the central premise of this work is that software maintenance is more demanding. The added difficulty is due in large part to the semantic constraints that are placed on the maintainer. These constraints can be loosely characterized as the attempt to avoid unexpected linkages. Some [4, 14] have addressed this problem by attempting to eliminate these semantic constraints and then providing the maintainer with a tool that will pinpoint potential inconsistencies after changes have been implemented. This makes maintenance appear to be more like development, since the programmer does not need to worry about linkages: once the change is made, the tool is invoked and the inconsistencies (if any) are located. One would expect that the tool would proceed to resolve these inconsistencies, but it has been shown that this problem is NP-hard [14]. Thus, the maintainer can be presented with a problem that is more difficult to resolve than the original change.
We take the opposite view: present the maintainer with a semantically constrained problem and let him construct the solution that implements the change within these constraints. The semantic context with which we propose to constrain the maintainer is one that will prohibit linkages into the portions of the code that the maintainer does not want to change. This approach uncovers potential problems earlier than the aforementioned methods, and, we believe, is worth any inconvenience that may be encountered due to the imposition of the constraints.

Our program slicing based techniques give an assessment of the impact of proposed modifications, ease the problems associated with revalidation and reduce the resources required for maintenance activities. They work on unstructured programs, so they are usable on older systems. They may be used for white-box, spare-parts and backbone maintenance without regard to whether the maintenance is corrective, adaptive, perfective or preventive.

2 Background

Program slicing, introduced by Weiser [33, 36], is a technique for restricting the behavior of a program to some specified subset of interest. A slice S(v, n) (of program P) on variable v, or set of variables, at statement n yields the portions of the program that contributed to the value of v just before statement n is executed. S(v, n) is called a slicing criterion. Slices can be computed automatically on source programs by analyzing data flow and control flow. A program slice has the added advantage of being an executable program. Slicing is done implicitly by programmers while debugging [33, 35]; slices can be combined to isolate sections of code likely to contain program faults and significantly reduce debugging times [23, 24, 25].

There has been a flurry of recent activities where slicing plays a significant role. Horwitz, Reps, et al. [15, 16, 28] use slices in integrating programs.
Their results are built on the seminal work of Ottenstein and Ottenstein [7, 27] combining slicing with the robust representation afforded by program dependence graphs. Korel and Laski [20, 21, 22] use slices combined with execution traces for program debugging and testing. Choi et al. [6] use slices and traces in debugging parallel programs. Reps and Wang [29] have investigated termination conditions for program slices. Hausler [13] has developed a denotational approach to program slicing. Gallagher [8] has improved Lyle's [23] algorithm for slicing in the presence of goto's and developed techniques for capturing arbitrarily placed output statements. We will not discuss slicing techniques in this paper and instead refer the interested reader to these works.

Since we want to avoid getting bogged down in the details of a particular language, we will identify a program with its flowgraph. Each node in the graph will correspond to a single source language statement. Henceforth, the term statement will mean a node in the flowgraph. Using a common representation scheme keeps the presentation clear, although any tool based on these techniques will need to account for the nuances of the particular language. In this paper, we also ignore problems introduced by having dead code in the source program, and declare that the programs under consideration will not have any dead code. See [8] for slicing based techniques to eliminate dead code.

     1  #define YES 1
     2  #define NO 0
     3  main()
     4  {
     5      int c, nl, nw, nc, inword ;
     6      inword = NO ;
     7      nl = 0;
     8      nw = 0;
     9      nc = 0;
    10      c = getchar();
    11      while ( c != EOF ) {
    12          nc = nc + 1;
    13          if ( c == '\n')
    14              nl = nl + 1;
    15          if ( c == ' ' || c == '\n' || c == '\t')
    16              inword = NO;
    17          else if (inword == NO) {
    18              inword = YES ;
    19              nw = nw + 1;
    20          }
    21          c = getchar();
    22      }
    23      printf("%d \n",nl);
    24      printf("%d \n",nw);
    25      printf("%d \n",nc);
    26  }

Figure 1: Program to be Sliced
Figures 2-6 illustrate slicing on the program of figure 1, a bare bones version of the Unix utility wc, word count, taken from [19]. The program counts the number of characters, words, and lines in a text file. It has been slightly modified to illustrate more clearly the slicing principles. The slices of figures 2-4 are complete programs that compute a restriction of the specification. The slice on nw (fig. 2) will output the number of words in a file; the slice on nc (fig. 3) will count the number of characters in the input text file; the slice on nl (fig. 4) will count the number of lines in the file.

3 Using Slices for Decomposition

This section presents a method for using slices to obtain a decomposition of the program. Our objective is to use slicing to decompose a program "directly," into two (or more) components. A program slice will be one of the components. The construction is a two step process. The first step is to build, for one variable, a decomposition slice, which is the union of certain slices taken at certain line numbers, on the given variable. Then the other component of the decomposition, called the "complement," will also be obtained from the original program.
The complement is constructed in such a way that when certain statements of the decomposition slice are removed from the original program, the program that remains is the slice that corresponds to the complement (in a sense to be defined) of the given criteria, with respect to the variables defined in the program. Thus the complement is also a program slice. The decomposition slice is used to guide the removal of statements in a systematic fashion to construct the complement. It is insufficient to merely remove the slice statements from the original program. Since we require that a slice be executable, there will be certain crucial statements that are necessary in both the slice and its complement.

     1  #define YES 1
     2  #define NO 0
     3  main()
     4  {
     5      int c, nw, inword ;
     6      inword = NO ;
     8      nw = 0;
    10      c = getchar();
    11      while ( c != EOF ) {
    15          if ( c == ' ' || c == '\n' || c == '\t')
    16              inword = NO;
    17          else if (inword == NO) {
    18              inword = YES ;
    19              nw = nw + 1;
    20          }
    21          c = getchar();
    22      }
    24      printf("%d \n",nw);
    26  }

Figure 2: Slice on (nw,26): Word Counter

     3  main()
     4  {
     5      int c, nc ;
     9      nc = 0;
    10      c = getchar();
    11      while ( c != EOF ) {
    12          nc = nc + 1;
    21          c = getchar();
    22      }
    25      printf("%d \n",nc);
    26  }

Figure 3: Slice on (nc,26): Character Counter

     3  main()
     4  {
     5      int c, nl ;
     7      nl = 0;
    10      c = getchar();
    11      while ( c != EOF ) {
    13          if ( c == '\n')
    14              nl = nl + 1;
    21          c = getchar();
    22      }
    23      printf("%d \n",nl);
    26  }

Figure 4: Slice on (nl,26): Line Counter

     1  #define YES 1
     2  #define NO 0
     3  main()
     4  {
     5      int c, inword ;
     6      inword = NO ;
    10      c = getchar();
    11      while ( c != EOF ) {
    15          if ( c == ' ' || c == '\n' || c == '\t')
    16              inword = NO;
    17          else if (inword == NO) {
    18              inword = YES ;
    20          }
    21          c = getchar();
    22      }
    26  }

Figure 5: Slice on (inword,26)

     3  main()
     4  {
     5      int c ;
    10      c = getchar();
    11      while ( c != EOF ) {
    21          c = getchar();
    22      }
    26  }

Figure 6: Slice on (c,26)

     1  input a
     2  input b
     3  t = a + b
     4  print t
     5  t = a - b
     6  print t

Figure 7: Requires a Decomposition Slice
For example, if we start with the slice of figure 2 and remove all its statements from the original program, the resulting object will not even compile! We use this decomposition to break the program into manageable pieces and automatically assist the maintainer in guaranteeing that there are no ripple effects induced by modifications in a component. We use the complement to provide a semantic context for modifications in the decomposition slice; the complement must remain fixed after any change.

The decomposition ideas presented in this section are independent of a particular slicing method. Once a slice is obtained, by any slicing algorithm, a program decomposition may be computed. Clearly, the quality of the decomposition will be affected by the quality of the slice, in the sense that more refined slices give a finer granularity and also deliver more semantic information to the maintainer.

A program slice is dependent on a variable and a statement number. A decomposition slice does not depend on statement numbers. The motivation for this concept is easily explained using the example of figure 7. The slice S(t, 4) is statements 1, 2, 3, 4, while the slice S(t, 6) is statements 1, 2, 5, 6. Slicing at statement last (in this case 6) of a program is insufficient to get all computations involving the slice variable, t. A decomposition slice captures all relevant computations involving a given variable.

To construct a decomposition slice, we borrow the concept of critical instructions from an algorithm for dead code elimination as presented in Kennedy [18]. A brief reprise follows. The usual method for dead code elimination is to first locate all instructions that are useful in some sense. These are declared to be the critical instructions. Typically, dead code elimination algorithms start by marking output instructions to be critical. Then the use-definition [18] chains are traced to mark the instructions that impact the output statements.
Any code that is left unmarked is useless to the given computation.

Definition 1 Let Output(P, v) be the set of statements in program P that output variable v, let last be the last statement of P, and let $N = Output(P, v) \cup \{last\}$. The statements in $\bigcup_{n \in N} S(v, n)$ form the decomposition slice on v, denoted S(v).

The decomposition slice is the union of a collection of slices, which is still a program slice [36]. We include statement last so that a variable that is not output may still be used as a decomposition criterion; this will also capture any defining computation on the decomposition variable after the last statement that displays its value. To successfully take a slice at statement last, we invoke one of the crucial differences between the slicing definitions of Reps and those of Weiser, Lyle and this work. A Reps slice must be taken at a point, p, with respect to a variable that is defined or referenced at p. Weiser's slices can be taken at an arbitrary variable at an arbitrary line number. This difference prohibits Reps' slicing techniques from being applicable in the current context, since we want to slice on every variable in the program at the last statement.

We now begin to examine the relationship between decomposition slices. Once we have this in place, we can use the decomposition slices to perform the actual decompositions. To determine the relationships, we take the decomposition slice for each variable in the program and form a lattice of these decomposition slices, ordered by set inclusion. It is easier to gain a clear understanding of the relationship between decomposition slices if we regard them without output statements. This may seem unusual in light of the above definition, since we used output statements in obtaining relevant computations. We view output statements as windows into the current state of computation, which do not contribute to the realization of the state.
This coincides with the informal definition of a slice: the statements that yield the portions of the program that contributed to the value of v just before statement n is executed. Assuming that output statements do not contribute to the value of a variable precludes from our discussion output statements (and therefore programs) in which the output values are reused, as is the case with random access files or output to files that are later reopened for input. Moreover, we are describing a decomposition technique that is not dependent on any particular slicing technique; we have no way of knowing whether or not the slicing technique includes output statements. We say a slice is output-restricted if all its output statements are removed.

Definition 2 Output-restricted decomposition slices S(v) and S(w) are independent if $S(v) \cap S(w) = \emptyset$.

It would be a peculiar program that had independent decomposition slices; they would share neither control flow nor data flow. In effect, there would be two programs with non-intersecting computations on disjoint domains that were merged together. The lattice would have two components. In Ott's slice metric terminology [26], independence corresponds to low (coincidental or temporal) cohesion.

Output-restricted decomposition slices that are not independent are said to be (weakly) dependent. Subsequently, when we speak of independence and dependence of slices it will always be in the context of output-restricted decomposition slices.

Definition 3 Let S(v) and S(w) be output-restricted decomposition slices, $w \neq v$, and let $S(v) \subset S(w)$. S(v) is said to be strongly dependent on S(w).

Thus output-restricted decomposition slices strongly dependent on independent slices are independent. The definitions of independence and dependence presented herein are themselves dependent on the notion of a slice. The analogous definitions are used by Bergeretti and Carre [3] to define slices.
In Ott's metric terminology [26], strong dependence corresponds to high (sequential or functional) cohesion.

Strong dependence of decomposition slices is a binary relation; in most cases, however, we will not need an explicit reference to the containing slice. Henceforth, we will write "S(v) is strongly dependent" as a shorthand for "S(v) is strongly dependent on some other slice S(w)" when the context permits it.

Definition 4 An output-restricted slice S(v) that is not strongly dependent on any other slice is said to be maximal.

Maximal decomposition slices are at the "ends" of the lattice. This definition gives the motivation for output restriction; we do not want to be concerned with the possible effects of output statements on the maximality of slices or decomposition slices. This can be observed by considering the decomposition slices on nw and inword, of figures 2 and 5. If we regarded output statements in defining maximal, we could force the slice on inword to be maximal by the addition of a print statement referencing inword along with the others at the end of the program. Such a statement would not be collected into the slice on nw. Since this added statement is not in any other slice, the slice on inword would be maximal and it should not be.

Figure 8 gives the lattice we desire. S(nc), S(nl) and S(nw) are the maximal decomposition slices. S(inword) is strongly dependent on S(nw); S(c) is strongly dependent on all the other decomposition slices. The decomposition slices S(nw), S(nc), and S(nl) (figures 2-4) are weakly dependent and maximal when the output statements are removed. There are no independent decomposition slices in the example. Recall that independent decomposition slices cannot share any control flow: the surrounding control statements would make them dependent.

We now begin to classify the individual statements in decomposition slices.
Definition 5 Let S(v) and S(w) be output-restricted decomposition slices of program P. Statements in $S(v) \cap S(w)$ are called slice dependent statements.

    S(nc)   S(nl)   S(nw)
       \      |       |
        \     |   S(inword)
         \    |     /
            S(c)

Figure 8: Lattice of Decomposition Slices.

Slice independent statements are statements that are not slice dependent. We will refer to slice dependent statements and slice independent statements as dependent statements and independent statements. Dependent statements are those contained in decomposition slices that are interior points of the lattice; independent statements are those in a maximal decomposition slice that are not in the union of the decomposition slices that are properly contained in the maximal slice. The terms arise from the fact that two or more slices depend on the computation performed by dependent statements. Independent statements do not contribute to the computation of any other slice. When modifying a program, dependent statements cannot be changed, or the effect will ripple out of the focus of interest.

For example, statement 12 of the slice on nc (fig. 3) is a slice independent statement with respect to any other decomposition slice. Statements 13 and 14 of the slice on nl (fig. 4) are also slice independent statements with respect to any other decomposition slice. The decomposition slice on c (fig. 6) is strongly dependent on all the other slices, thus all its statements are slice dependent statements with respect to any other decomposition slice. Statements 6 and 15-20 of the slice on nw (fig. 2) are slice independent statements with respect to decomposition slices S(nc), S(nl), and S(c); only statement 19 is slice independent when compared with S(inword). Statements 6, 15-18 and 20 of the decomposition slice on inword (fig. 5) are slice independent statements with respect to decomposition slices S(nc), S(nl), and S(c); no statements are slice independent when compared with S(nw).
We have a relationship between maximal slices and independent statements. This proposition permits us to apply the terms "(slice) independent statement" and "(slice) dependent statement" in a sensible way to a particular statement in a given maximal decomposition slice without reference to the binary relation between decomposition slices that is required in definition 5.

Proposition 1 Let
1. Varset(P) be the set of variables in program P.
2. S(v) be an output-restricted decomposition slice of P.
3. $M = \{ m \in Varset(P) \mid S(m) \text{ is maximal} \}$.
4. $U = M - \{v\}$.
The statements in $S(v) - \bigcup_{u \in U} S(u)$ are independent.

Proof Sketch: Let $U = \{u_1, \ldots, u_m\}$. Then $S(v) - \bigcup_{u \in U} S(u) = S(v) - S(u_1) - \cdots - S(u_m)$. End.

There is a relationship between the maximal slices and the program. (Recall that dead code has been excluded from our discussions.)

Proposition 2 Let $M = \{ m \in Varset(P) \mid S(m) \text{ is maximal} \}$. Then $\bigcup_{m \in M} S(m) = P$.

Proof Sketch: Since $S(m) \subset P$, $\bigcup_{m \in M} S(m) \subset P$. If $P \not\subset \bigcup_{m \in M} S(m)$, then the statements in P that are not in $\bigcup_{m \in M} S(m)$ are dead code. End.

Maximal slices capture the computation performed by the program. Maximal slices and their respective independent statements also are related:

Proposition 3 An output-restricted decomposition slice is maximal iff it has at least one independent statement.

Proof Sketch: Suppose S(v) is maximal. By definition S(v) has at least one statement that no other slice has. This statement is an independent statement. Now suppose that S(v) has an independent statement, s. Then s is not in any other slice, and the slice that contains s is maximal. End.

Conversely, a slice with no independent statements is strongly dependent. We also have another characterization of strongly dependent slices.

Proposition 4 Let
1. Varset(P) be the set of variables in program P.
2. S(v) be an output-restricted decomposition slice of P.
3. $D = \{ w \in Varset(P) \mid S(v) \text{ is strongly dependent on } S(w) \}$.
4. $M = \{ m \in Varset(P) \mid S(m) \text{ is maximal} \}$.
5. $U = M - \{v\}$.
An output-restricted decomposition slice S(v) is strongly dependent (on some S(d)) iff $\bigcup_{u \in U} S(u) = P$.

Proof Sketch: Suppose S(v) is strongly dependent. We need to show that D has a maximal slice. Partially order D by set inclusion. Let d be one of the maximal elements of D. The element d is maximal; if it is not, then it is properly contained in another slice $d_1$, which is in D and contains S(v). Then $d \in M$, $d \neq v$, and S(v) makes no contribution to the union. Suppose $\bigcup_{u \in U} S(u) = P$. Since $U \subset M$, S(v) makes no contribution to the union. By proposition 3, S(v) is strongly dependent. End.

We are now in a position to state the decomposition principles. Given a maximal output-restricted decomposition slice S(v) of program P, delete the independent and output statements of S(v) from P. We will denote this program $\overline{P}(v)$ and call it the complement of decomposition slice S(v) (with respect to P). Henceforth, when we speak of complements, it will always be in the context of decomposition slices. The decomposition slice is the subset of the program that computes a subset of the specification; the complement computes the rest of the specification.

Figures 9-11 give the complements of the slices on nw, nc and nl of figures 2-4. Using proposition 4 we obtain that the complement of both the slice on inword and the slice on c is the entire program.

     3  main()
     4  {
     5      int c, nl, nw, nc, inword ;
     7      nl = 0;
     9      nc = 0;
    10      c = getchar();
    11      while ( c != EOF ) {
    12          nc = nc + 1;
    13          if ( c == '\n')
    14              nl = nl + 1;
    21          c = getchar();
    22      }
    23      printf("%d \n",nl);
    25      printf("%d \n",nc);
    26  }

Figure 9: $\overline{P}(nw)$. Complement of slice on nw: computes line count and character count

This yields the approximation of a direct sum decomposition of a program that preserves the computational integrity of the constituent parts.
This also indicates that the only useful decompositions are done with maximal decomposition slices. A complement, $\overline{P}$, of a maximal slice can be further decomposed, so the decomposition may be continued until all slices with independent statements (i.e. the maximal ones) are obtained.

In practice, a maintainer may find a strongly dependent slice as a starting point for a proposed change. Our method will permit such changes. Such a change may be viewed as properly extending the domain of the partial function that the program computes, while preserving the partial function on its original domain.

4 Application to Modification and Testing

Statement independence can be used to build a set of guidelines for software modification. To do this, we need to make one more set of definitions regarding variables that appear in independent and dependent statements. With these definitions we give a set of rules that maintainers must obey in order to make modifications without ripple effects and unexpected linkages. When these rules are obeyed, we have an algorithm to merge the modified slice back into the complement and effect a change. The driving motivation for the following development is: "What restrictions must be placed on modifications in a decomposition slice so that the complement remains intact?"

Definition 6 A variable that is the target of a dependent assignment statement is called a dependent variable. Alternatively, and equivalently, if all assignments to a variable are in independent statements, then the variable is called an independent variable.

An assignment statement can be an independent statement while its target is not an independent variable. In the program of figure 12 the two maximal decomposition slices are S(a) and S(e) (figures 13 and 14). Slice S(b) (figure 15) is strongly dependent on S(a), and S(f) (figure 16) is strongly dependent on S(b) and S(a). S(d) and S(c) (not shown) are strongly dependent on both maximal slices.
In S(a), statements 8, 10, and 11 are independent, by the proposition. But variables a and b are targets of assignment statements 6 and 5, respectively. So, in the decomposition slice S(a), only variable f is an independent variable.

     1  #define YES 1
     2  #define NO 0
     3  main()
     4  {
     5      int c, nl, nw, nc, inword ;
     6      inword = NO ;
     7      nl = 0;
     8      nw = 0;
    10      c = getchar();
    11      while ( c != EOF ) {
    13          if ( c == '\n')
    14              nl = nl + 1;
    15          if ( c == ' ' || c == '\n' || c == '\t')
    16              inword = NO;
    17          else if (inword == NO) {
    18              inword = YES ;
    19              nw = nw + 1;
    20          }
    21          c = getchar();
    22      }
    23      printf("%d \n",nl);
    24      printf("%d \n",nw);
    26  }

Figure 10: $\overline{P}(nc)$. Complement of slice on nc: computes word count and line count

     1  #define YES 1
     2  #define NO 0
     3  main()
     4  {
     5      int c, nl, nw, nc, inword ;
     6      inword = NO ;
     8      nw = 0;
     9      nc = 0;
    10      c = getchar();
    11      while ( c != EOF ) {
    12          nc = nc + 1;
    15          if ( c == ' ' || c == '\n' || c == '\t')
    16              inword = NO;
    17          else if (inword == NO) {
    18              inword = YES ;
    19              nw = nw + 1;
    20          }
    21          c = getchar();
    22      }
    24      printf("%d \n",nw);
    25      printf("%d \n",nc);
    26  }

Figure 11: $\overline{P}(nl)$. Complement of slice on nl: computes character count and word count

     1  main()
     2  {
     3      int a, b, c, d, e, f;
     4      c = 4;
     5      b = c;
     6      a = b + c;
     7      d = a + c;
     8      f = d + b;
     9      e = d + 8;
    10      b = 30 + f;
    11      a = b + c;
    12  }

Figure 12: Dependent Variable Sample Program

     1  main()
     2  {
     3      int a, b, c, d, e, f;
     4      c = 4;
     5      b = c;
     6      a = b + c;
     7      d = a + c;
     8      f = d + b;
    10      b = 30 + f;
    11      a = b + c;
    12  }

Figure 13: Slice on a

     1  main()
     2  {
     3      int a, b, c, d, e, f;
     4      c = 4;
     5      b = c;
     6      a = b + c;
     7      d = a + c;
     9      e = d + 8;
    12  }

Figure 14: Slice on e

     1  main()
     2  {
     3      int a, b, c, d, e, f;
     4      c = 4;
     5      b = c;
     6      a = b + c;
     7      d = a + c;
     8      f = d + b;
    10      b = 30 + f;
    12  }

Figure 15: Slice on b

     1  main()
     2  {
     3      int a, b, c, d, e, f;
     4      c = 4;
     5      b = c;
     6      a = b + c;
     7      d = a + c;
     8      f = d + b;
    12  }

Figure 16: Slice on f

A similar argument applies for independent control flow statements that reference dependent variables. A dependent variable in an independent statement corresponds to the situation where the variable in question is required for the compilation of the complement, but the statement in question does not contribute to the complement. If a variable is referenced in a dependent statement, it is necessary to the complement and cannot be independent.

If decomposing on a single variable yields a strongly dependent slice, we are able to construct a slice where the original slice variable is an independent variable.

Proposition 5 Let
1. Varset(P) be the set of variables in program P.
2. S(v) be a strongly dependent output-restricted decomposition slice of P.
3. $D = \{ w \in Varset(P) \mid S(v) \text{ is strongly dependent on } S(w) \}$.
4. $M = \{ m \in Varset(P) \mid S(m) \text{ is maximal} \}$.
5. $U = D \cap M$.
6. $T = \bigcup_{u \in U} S(u)$.
The variable v is an independent variable in T.

In other words, when S(v) is a strongly dependent slice and T is the union of all the maximal slices upon which S(v) is strongly dependent, then v is an independent variable in T.
Proof Sketch: We show that the complement of T, P - T, has no references to v: if variable v is in the complement of T, then there is a maximal slice in the complement upon which S(v) is strongly dependent. This contradicts the hypotheses, so the complement of T has no references to v and the variable v is independent in T. End.

This can be interpreted as the variable version of proposition 1, which refers to statements.

This has not addressed the problem that is presented when the decomposition slice on a variable is maximal, but the variable itself remains dependent. This is the situation that occurred in the example at the beginning of this section; the slice on variable a (figure 13) is maximal but the variable is dependent. The solution is straightforward: we construct the slice that is the union of all slices in which the variable is dependent.

Proposition 6 Let
1. Varset(P) be the set of variables in program P.
2. S(v) be an output-restricted decomposition slice of P.
3. $E = \{ w \in Varset(P) \mid v \text{ is a dependent variable in } S(w) \}$.
4. $T = \bigcup_{e \in E} S(e)$.
We have two cases:
1. $E = \emptyset$ (and thus T is empty also), in which case v is an independent variable.
2. $E \neq \emptyset$, so T is not empty and the variable v is an independent variable in T.

Proof Sketch:
Case 1: $E = \emptyset$. S(v) contains all references to v. In particular, S(v) contains all assignments to v. So v is an independent variable in S(v). End Case 1.
Case 2: $E \neq \emptyset$. T contains all references to v. In particular, T contains all assignments to v. So v is an independent variable in T. End Case 2.
End.

This proposition is about variables.

4.1 Modifying Decomposition Slices

We are now in a position to answer the question posed at the beginning of this section. We present the restrictions as a collection of rules with justifications. Modifications take three forms: additions, deletions and changes. A change may be viewed as a deletion followed by an addition.
We will use this second approach, and determine only those statements in a decomposition slice that can be deleted and the forms of statements that can be added. Again, we must rely on the fact that the union of decomposition slices is a slice, since the complementary criteria will usually involve more than one maximal variable. We also assume that the maintainer has kept the modified program compilable and has obtained the decomposition slice of the portion of the software that needs to be changed. (Locating the code may be a highly nontrivial activity; for the sake of the current discussion, we assume its completion.)

Since independent statements do not affect data flow or control flow in the complement, we have:

Rule 1 Independent statements may be deleted from a decomposition slice.
Reason: Independent statements do not affect the computations of the complement. Deleting an independent statement from a slice will have no impact on the complement. End.

This result applies to control flow statements and assignment statements. The statement may be deleted even if it is an assignment statement that targets a dependent variable, or a control statement that references a dependent variable. The point to keep in mind is that if the statement is independent it does not affect the complement. If an independent statement is deleted, there will certainly be an effect in the slice. But the purpose of this methodology is to keep the complement intact.

There are a number of situations to consider when statements are to be added. We progress from simple to complex. Also note that for additions, new variables may be introduced, as long as the variable name does not clash with any name in the complement. In this instance the new variable is independent in the decomposition slice. In the following, independent variable means an independent variable or a new variable.

Rule 2 Assignment statements that target independent variables may be added anywhere in a decomposition slice.
Reason: Independent variables are unknown to the complement. Thus changes to them cannot affect the computations of the complement. End.

This type of change is permissible even if the changed value flows into a dependent variable. In figure 13, changes are permitted to the assignment statement at line 8, which targets f. A change here would propagate into the values of dependent variables a and b at lines 10 and 11. The maintainer would then be responsible for the changes that would occur to these variables. If lines 10 and 11 were dependent (i.e., contained in another decomposition slice), line 8 would also be contained in this slice, and variable f would be dependent.

Adding control flow statements requires a little more care. This is required because control statements have two parts: the logical expression, which determines the flow of control, and the actions taken for each value of the expression. (We assume no side effects in the evaluation of logical expressions.) We discuss only the addition of if-then-else and while statements, since all other language constructs can be realized by them [5].

Rule 3 Logical expressions (and output statements) may be added anywhere in a decomposition slice.
Reason: We can inspect the state of the computation anywhere. Evaluation of logical expressions (or the inclusion of an output statement) will not even affect the computation of the slice. Thus the complement remains intact. End.

We must guarantee that the statements that are controlled by newly added control flow do not interfere with the complement.

Rule 4 New control statements that surround (i.e., control) any dependent statement will cause the complement to change.
Reason: Suppose newly added code controls a dependent statement. Let C be the criteria that yield the complement.
When using these criteria on the modified program, the newly added control code will be included in this complementary slice. This is due to the fact that the dependent statements are in both the slice and the complement. Thus any control statements that control dependent statements will also be in the slice and the complement. End.

By making such a change, we have violated our principle that the complement remain fixed. Thus new control statements may not surround any dependent statement. This short list is necessary and sufficient to keep the slice complement intact. This also has an impact on testing the change that will be discussed later.

Changes may be required to computations involving a dependent variable, v, in the extracted slice. The maintainer can choose one of the following two approaches:
1. Use the techniques of the previous section to extend the slice so that v is independent in the slice.
2. Add a new local variable (to the slice), copy the value to the new variable, and manipulate the new name only. Of course, the new name must not clash with any name in the complement. This technique may also be used if the slice has no independent statements, i.e., it is strongly dependent.

4.2 Merging the Modifications into the Complement

Merging the modified slice back into the complement is straightforward. A key to understanding the merge operation comes from the observation that through the technique, the maintainer is editing the entire program. The method gives a view of the program with the unneeded statements deleted and with the dependent statements restricted from modification. The slice gives a smaller piece of code for the maintainer to focus on, while the rules of the previous subsection provide the means by which the deleted and restricted parts cannot be changed accidentally.

We now present the merge algorithm.
1. Order the statements in the original program.
(In the following examples, we have one statement per line so that the ordering is merely the line numbering.) A program slice and its complement can now be identified with the subsequence of statement numbers from the original program. We call the sequence numbering from the slice the slice sequence, and the numbering of the complement the complement sequence. We now view the editing process as the addition and deletion of the associated sequence numbers.
2. For deleted statements, delete the sequence number from the slice sequence. Observe that since only independent statements are deleted, this number is not in the complement sequence.
3. For statements inserted into the slice, a new sequence number needs to be generated. Let P be the sequence number of the statement preceding the statement to be inserted. Let M be the least value in the slice sequence greater than P. Let $F = \min(\mathrm{int}(P + 1), M)$. Insert the new statement at sequence number $(F + P)/2$. (Although this works in principle, in practice more care needs to be taken in the generation of the insertion sequence numbers to avoid floating point errors after 10 inserts.)
4. The merged program is obtained by merging the modified slice sequence values (i.e., statements) into the complement sequence.

Thus, the unchanged dependent statements are used to guide the reconstruction of the modified program. The placement of the changed statements within a given control flow is arbitrary. Again, this becomes clearer when the editing process is viewed as modification to the entire program. The following example will help clarify this.

4.3 Testing the Change

Since the maintainer must restrict all changes to independent or newly created variables, testing is reduced to testing the modified slice. Thus the need for regression testing in the complement is eliminated. There are two alternative approaches to verifying that only the change needs testing.
The first is to slice on the original criteria plus any new variables minus any eliminated variables, and compare its complement with the complement of the original: they should match exactly. The second approach is to preserve the criteria that produced the original complement. Slicing out on this must produce the modified slice exactly.

An axiomatic consideration illuminates this idea. The slice and its complement perform a subset of the computation; where the computations meet are the dependencies. Modifying code in the independent part of the slice leaves the independent part of the complement as an invariant of the slice (and vice versa).

If the required change is "merely" a module replacement, the preceding techniques are still applicable. The slice will provide a harness for the replaced module. A complete independent program supporting the module is obtained. One of the principal benefits of slicing is highlighted in this context: any side effects of the module to be replaced will also be in the slice. Thus the full impact of change is brought to the attention of the modifier.

As an example, we make some changes to S(nw), the slice on nw, the word counter of figure 2. The changed slice is shown in figure 17. The original program determined a word to be any string of "nonwhite" symbols terminated by a "white" symbol (space, tab, or newline). The modification changes this requirement to be alphabetical characters terminated by white space. (The example is illustrating a change, not advocating it.)

3 main()
4 {
* int ch;
5 int c, nw;
* ch = 0;
8 nw = 0;
10 c = getchar();
11 while ( c != EOF ) {
* if (isspace(c) && isalpha(ch))
* nw = nw + 1;
* ch = c;
21 c = getchar();
22 }
24 printf("%d \n", nw);
26 }
Figure 17: Modified slice on nw, the word counter

Note the changes. We have deleted the independent "variables" YES and NO, added a new, totally independent variable ch, and revamped the independent statements.
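The sequence numbering of the merge algorithm in Section 4.2 can be traced on the modified slice of figure 17. The sketch below is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, `fractions.Fraction` is swapped in for the floating point midpoints so that repeated inserts stay exact, and the complement sequence is inferred from the statement numbers of figure 18 because the complement itself (figure 9) is not reproduced here.

```python
from fractions import Fraction
import math


def insertion_number(slice_seq, prev):
    """Sequence number for a statement inserted directly after `prev`."""
    p = Fraction(prev)
    later = [n for n in slice_seq if n > p]
    f = Fraction(math.floor(p + 1))          # int(P + 1)
    if later:
        f = min(f, min(later))               # F = min(int(P + 1), M)
    return (f + p) / 2                       # midpoint (F + P) / 2


def insert_after(slice_seq, prev):
    """Insert a new statement after `prev`; return its sequence number."""
    number = insertion_number(slice_seq, prev)
    slice_seq.append(number)
    slice_seq.sort()
    return number


def merge(slice_seq, complement_seq):
    """Merged statement order: the sorted union of both sequences."""
    return sorted(set(slice_seq) | set(complement_seq))


if __name__ == "__main__":
    # Unstarred statement numbers of the slice in figure 17.
    slice_seq = [3, 4, 5, 8, 10, 11, 21, 22, 24, 26]
    # Assumed complement sequence, inferred from figure 18.
    complement_seq = [3, 4, 5, 7, 9, 10, 11, 12, 13, 14, 21, 22, 23, 25, 26]

    insert_after(slice_seq, 4)                # int ch;
    insert_after(slice_seq, 5)                # ch = 0;
    first = insert_after(slice_seq, 11)       # if (isspace(c) && isalpha(ch))
    second = insert_after(slice_seq, first)   # nw = nw + 1;
    insert_after(slice_seq, second)           # ch = c;

    for number in merge(slice_seq, complement_seq):
        print(number)
```

Because F never exceeds the next integer after P, the three statements inserted after statement 11 all receive numbers between 11 and 12, so they precede statement 12 of the complement in the merged order.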
The addition of the C macros isspace and isalpha is safe, since the results are only referenced. We test this program independently of the complement. Figure 18 shows the reconstructed, modified program. Taking the decomposition slice on nw generates the program of figure 17. Its complement is already given in figure 9. The starred (*) statements indicate where the new statements would be placed using the line number generation technique above.

5 A New Software Maintenance Process Model

The usual Software Maintenance Process Model is depicted in figure 19. A request for change arrives. It may be adaptive, perfective, corrective, or preventive. In making the change, we wish to minimize defects, effort, and cost, while maximizing customer satisfaction [12]. The software is changed, subject to pending priorities. The change is composed of two parts. The first is understanding the code, which may require documentation, code reading, and execution. Then the program is modified. The maintainer must first design the change (which may be subject to peer review), then alter the code itself, while trying to minimize side effects. The change is then validated. The altered code itself is verified to assure conformance with the specification. Then the new code is integrated with the existing system to ensure conformance with the system specifications. This task involves regression testing.

The new model is depicted in figure 20. The software is changed, subject to pending priorities. The change is composed of two parts. Understanding the code will now require documentation, code reading, execution, and the use of decomposition slices. The decomposition slices may be read and executed (a decided advantage of having executable program slices). The code is then modified, subject to the strictures outlined. Using those guidelines, no side effects or unintended linkages can be induced in the code, even by accident. This lifts a substantial burden from the maintainer.
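An editing tool could enforce the strictures of Section 4.1 mechanically. The following Python sketch illustrates one way to do so; it is not the tool of [9], and `SliceView`, `may_delete`, `may_add_assignment` and `may_add_control` are hypothetical names. It assumes the dependent and independent statements and variables of the slice have already been computed, and uses the slice on a (figure 13) against its complement, the slice on e (figure 14), as sample data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SliceView:
    """What an editor would need to know about one decomposition slice."""
    dependent: frozenset          # statements shared with the complement
    independent: frozenset        # statements found only in this slice
    independent_vars: frozenset   # variables unknown to the complement
    complement_names: frozenset   # names the complement's statements refer to


def may_delete(view, line):
    """Rule 1: only independent statements may be deleted."""
    return line in view.independent


def may_add_assignment(view, target):
    """Rule 2: the target must be independent or a new, non-clashing name."""
    return (target in view.independent_vars
            or target not in view.complement_names)


def may_add_control(view, controlled_lines):
    """Rule 4: new control flow must not surround a dependent statement."""
    return not (set(controlled_lines) & view.dependent)


# Sample data transcribed by hand from figures 13 and 14 (an assumption,
# not the output of a slicer).
SLICE_ON_A = SliceView(
    dependent=frozenset({4, 5, 6, 7}),
    independent=frozenset({8, 10, 11}),
    independent_vars=frozenset({"f"}),
    complement_names=frozenset({"a", "b", "c", "d", "e"}),
)

if __name__ == "__main__":
    print(may_delete(SLICE_ON_A, 8))             # independent statement
    print(may_add_assignment(SLICE_ON_A, "b"))   # b is a dependent variable
    print(may_add_control(SLICE_ON_A, [7, 8]))   # line 7 is dependent
```

Rule 3 needs no check in this sketch, since logical expressions and output statements may be added anywhere; only the statements placed under a new control statement are examined.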
The change is tested in the decomposition slice. Since the change cannot ripple out into other modules, regression testing is unnecessary. The maintainer need only verify that the change is correct. After applying the merge algorithm, the change (of the code) is complete.

3 main()
4 {
* int ch;
5 int c, nl, nw, nc;
* ch = 0;
7 nl = 0;
8 nw = 0;
9 nc = 0;
10 c = getchar();
11 while ( c != EOF ) {
* if (isspace(c) && isalpha(ch))
* nw = nw + 1;
* ch = c;
12 nc = nc + 1;
13 if ( c == '\n')
14 nl = nl + 1;
21 c = getchar();
22 }
23 printf("%d \n", nl);
24 printf("%d \n", nw);
25 printf("%d \n", nc);
26 }
Figure 18: Modified Program

6 Future Directions

The underlying method and the tool based on it [9] need to be empirically evaluated. This is underway using the Goal-Question-Metric paradigm of Basili, et al. [2]. Naturally, we are also addressing questions of scale, to determine if existing software systems decompose sufficiently via these techniques, in order to effect a technology transfer. We are also evaluating decomposition slices as candidates for components in a reuse library.

Although they seem to do well in practice, the slicing algorithms have relatively bad worst case running times, $O(n\,e \log(e))$, where n is the number of variables and e is the number of edges in the flowgraph. To obtain all the slices, this running time becomes $O(n^2 e \log(e))$. These worst case times would seem to make an interactive slicer for large (i.e., real) programs impractical. This difficulty can be assuaged by making the data flow analysis one component of the deliverable products that are handed off from the development team to the maintenance team. An interactive tool could then be built using these products. Then, as changes are made by the maintainers, the data flow data can be updated using the incremental techniques of Keables [17].

Interprocedural slices can be attacked using the techniques in Weiser [36] and Barth [1]. The interprocedural slicing algorithms of Horwitz, et al.
[16] cannot be used since they require that the slice be taken at a point where the slice variable is defined or referenced; we require that all slices be taken at the last statement of the program. For separate compilation, worst case assumptions must be made about the external variables if the source is not available. If the source is available, one proceeds as with procedures.

Berzins [4] has attacked the problem of software merges for extensions of programs. To quote him:

An extension extends the domain of the partial function without altering any of the initially defined values, while a modification redefines values that were defined initially.

We have addressed the modification problem by first restricting the domain of the partial function to the slice complement, modifying the function on the values defined by the independent variables in the slice, then merging these two disjoint domains.

[Figure 19: A Software Maintenance Process Model. Diagram labels: Request for Change, pending priorities, Change Software, Design Change, Alter Code, minimize side effects, Test Change, Revalidate, regression testing, Integrate.]

[Figure 20: A New Software Maintenance Process Model. Diagram labels: Request for Change, pending priorities, Change Software, decomposition slicing, Design Change, Alter Component, no side effects, Test Change, no regression testing, Merge.]

Horwitz, et al. [15] have addressed the modification problem. They start with a base program and two modifications of it, A and B:

Whenever the changes made to base to create A and B do not "interfere" (in a sense defined in the paper), the algorithm produces a program M that integrates A and B.
The algorithm is predicated on the assumption that differences in the behavior of the variant programs from that of base, rather than the differences in text, are significant and must be preserved in M.

Horwitz, et al. do not restrict the changes that can be made to base; thus their algorithm produces an approximation to the undecidable problem of determining whether or not the behaviors interfere. We have side-stepped this unsolvable problem by constraining the modifications that are made. Our technique is more akin to the limits placed on software maintainers. Changes must be done in a context: independence and dependence provide the context. It is interesting to note, however, that their work uses program slicing to determine potential interferences in the merge.

They do note that program variants, as they name them, are easily embedded in a change control system, such as RCS [31]. Moreover, the direct sum nature of the components can be exploited to build related families of software. That is, components can be "summed" as long as their dependent code sections match exactly and there is no intersection of the independent domains. We also follow this approach for component construction.

Weiser [34] discusses some slice-based metrics. Overlap is a measure of how many statements in a slice are found only in that slice, measured as a mean ratio of non-unique to unique statements in each slice. Parallelism is the number of slices which have few statements in common, computed as the number of slices which have pairwise overlap below a certain threshold. Tightness is the number of statements in every slice, expressed as a ratio over program length. Programs with high overlap and parallelism, but with low tightness, would decompose nicely: the lattice would not get too deep or too tangled.

We have shown how a data flow technique, program slicing, can be used to form a decomposition for software systems. The decomposition yields a method for maintainers to use.
The maintainer is able to modify existing code cleanly, in the sense that the changes can be assured to be completely contained in the modules under consideration and that no unseen linkages with the modified code are infecting other modules.

References

[1] J. M. Barth. A practical interprocedural data flow analysis algorithm. Communications of the Association for Computing Machinery, 21(9):724-726, September 1978.
[2] V. Basili, R. Selby, and D. Hutchens. Experimentation in software engineering. IEEE Transactions on Software Engineering, 12(7):352-357, July 1984.
[3] J-F. Bergeretti and B. Carre. Information-flow and data-flow analysis of while-programs. ACM Transactions on Programming Languages and Systems, 7(1):37-61, January 1985.
[4] V. Berzins. On merging software extensions. Acta Informatica, 23:607-619, 1985.
[5] C. Bohm and G. Jacopini. Flow diagrams and languages with only two formation rules. Communications of the Association for Computing Machinery, 9(5):366-371, May 1966.
[6] J-D. Choi, B. Miller, and P. Netzer. Techniques for debugging parallel programs with flowback analysis. Technical Report 786, University of Wisconsin - Madison, August 1988.
[7] J. Ferrante, K. Ottenstein, and J. Warren. The program dependence graph and its use in optimization. ACM Transactions on Programming Languages and Systems, 9(3):319-349, July 1987.
[8] K. B. Gallagher. Using Program Slicing in Software Maintenance. PhD thesis, University of Maryland, Baltimore, Maryland, December 1989.
[9] K. B. Gallagher. Surgeon's assistant limits side effects. IEEE Software, May 1990.
[10] K. B. Gallagher and J. R. Lyle. Using program decomposition to guide modifications. In Conference on Software Maintenance - 1988, pages 265-268, October 1988.
[11] K. B. Gallagher and J. R. Lyle. A program decomposition scheme with applications to software modification and testing. In Proceedings of the 22nd Hawaii International Conference on System Sciences, pages 479-485, January 1989.
Volume II, Software Track.
[12] R. Grady. Measuring and managing software maintenance. IEEE Software, 4(9), September 1987.
[13] P. Hausler. Denotational program slicing. In Proceedings of the 22nd Hawaii International Conference on System Sciences, pages 486-494, January 1989. Volume II, Software Track.
[14] S. Horwitz, J. Prins, and T. Reps. Integrating non-interfering versions of programs. In Proceedings of the SIGPLAN 88 Symposium on the Principles of Programming Languages, January 1988.
[15] S. Horwitz, J. Prins, and T. Reps. Integrating non-interfering versions of programs. ACM Transactions on Programming Languages and Systems, 11(3):345-387, July 1989.
[16] S. Horwitz, T. Reps, and D. Binkley. Interprocedural slicing using dependence graphs. ACM Transactions on Programming Languages and Systems, 12(1):35-46, January 1990.
[17] J. Keables, K. Robertson, and A. von Mayrhauser. Data flow analysis and its application to software maintenance. In Conference on Software Maintenance - 1988, pages 335-347, October 1988.
[18] K. Kennedy. A survey of data flow analysis techniques. In Steven S. Muchnick and Neil D. Jones, editors, Program Flow Analysis: Theory and Applications. Prentice-Hall, Englewood Cliffs, New Jersey, 1981.
[19] B. Kernighan and D. Ritchie. The C Programming Language. Prentice-Hall, Englewood Cliffs, New Jersey, 1978.
[20] B. Korel and J. Laski. Dynamic program slicing. Information Processing Letters, 29(3):155-163, October 1988.
[21] B. Korel and J. Laski. STAD - A system for testing and debugging: User perspective. In Proceedings of the Second Workshop on Software Testing, Verification and Analysis, pages 13-20, Banff, Alberta, Canada, July 1988.
[22] J. Laski. Data flow testing in STAD. The Journal of Systems and Software, 1989.
[23] J. R. Lyle. Evaluating Variations of Program Slicing for Debugging. PhD thesis, University of Maryland, College Park, Maryland, December 1984.
[24] J. R. Lyle and M. D. Weiser. Experiments on slicing-based debugging aids.
In Elliot Soloway and Sitharama Iyengar, editors, Empirical Studies of Programmers. Ablex Publishing Corporation, Norwood, New Jersey, 1986.
[25] J. R. Lyle and M. D. Weiser. Automatic program bug location by program slicing. In Proceeding of the Second International Conference on Computers and Applications, pages 877-882, Peking, China, June 1987.
[26] L. Ott and J. Thuss. The relationship between slices and module cohesion. In International Conference on Software Engineering, May 1989.
[27] K. Ottenstein and L. Ottenstein. The program dependence graph in software development environments. In Proceedings of the ACM SIGSOFT/SIGPLAN Software Engineering Symposium on Practical Software Development Environments, pages 177-184, May 1984. L. Ottenstein is now known as L. Ott.
[28] T. Reps and S. Horwitz. Semantics-based program integration. In Proceedings of the Second European Symposium on Programming (ESOP '88), pages 133-145, Nancy, France, March 1988.
[29] T. Reps and W. Yang. The semantics of program slicing. Technical Report 777, University of Wisconsin - Madison, June 1988.
[30] N. Schneidewind. The state of software maintenance. IEEE Transactions on Software Engineering, 13(3):303-310, March 1987.
[31] W. Tichy. RCS: A system for version control. Software - Practice & Experience, 15(7):637-654, July 1985.
[32] G. Weinberg. Kill that code! Infosystems, pages 48-49, August 1983.
[33] M. Weiser. Program Slicing: Formal, Psychological and Practical Investigations of an Automatic Program Abstraction Method. PhD thesis, The University of Michigan, Ann Arbor, Michigan, 1979.
[34] M. Weiser. Program slicing. In Proceeding of the Fifth International Conference on Software Engineering, pages 439-449, May 1981.
[35] M. Weiser. Programmers use slices when debugging. Communications of the Association for Computing Machinery, 25(7):446-452, July 1982.
[36] M. Weiser. Program slicing. IEEE Transactions on Software Engineering, 10:352-357, July 1984.