Revisions to Name a set of program variables

replaced http://cstheory.stackexchange.com/ with https://cstheory.stackexchange.com/


edited Apr 13, 2017 at 12:32

1

For the latter, the usual term is live variables, as remarked in a comment by Klaus Draeger.

Bounty Ended with 50 reputation awarded by SoftTimur

occurred Nov 14, 2014 at 16:47

Adapted to question evolution, and accounting for comments


edited Nov 14, 2014 at 15:12

babou

1.5k
10
18

This new version of the answer tries to take into account the changes in the question, and the information exchanged in the comments.

This answer assumes that $S$ should be the set of variables that have a content that is used in some defined fragment of the program, rather than, at some point in the program, the variables with a content that will be needed before the end of the program.

For the latter, the usual term is live variables, as remarked in a comment by Klaus Draeger.

The former is a generalisation of the concept of use, as it appears in dataflow analysis, and particularly in such concepts as Use-Define or Use-Definition chains (UD chains), as well as Definition-Use chains (DU chains). The concept of use is usually intended for elementary program statements such as an expression appearing in an assignment or a function call. Recall that this originates with the analysis and optimization of old Fortran programs in the late 1960s and in the 1970s, and there was no real concept of a compound statement in the Fortran language at the time.

But there is no reason not to extend the concept to larger program fragments that form a meaningful whole. Thus used variable or used set seem to be exactly the concepts and terminology that you are looking for.

Now, there may be a difficulty in defining what is implied in the expression "is used by". It may just mean "syntactically appears in", which is as simple as you can make it. This is indeed purely syntactic, but rather unsatisfactory, because the variable may well appear only in a statement that changes its value, rather than one that uses the value it already has. This is too simplistic and is clearly not what was intended by the creators of the concept.

Then a better definition will state that the occurrence of a variable must imply that its value is actually used. But as soon as you say that, you are no longer in syntax since you must use some of the meaning of the program fragment to know whether the occurrence of the variable is for using or for changing its value (or possibly is simply irrelevant). And when you start using the semantics of the language construct, there is no clear limit on how much of the semantics you can use.
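As a rough illustration of the gap between "syntactically appears in" and "its value is read", here is a minimal sketch that uses Python's standard `ast` module on a small Python fragment. The function name `occurring_and_read` and the sample fragment are assumptions made for this example only. It is a shallow, per-occurrence classification, not a dataflow analysis: for instance, the target of an augmented assignment such as `x += 1` is marked as a store by the parser even though its value is read.

```python
import ast

def occurring_and_read(source):
    """Return (names that occur anywhere, names that occur in a read position)."""
    tree = ast.parse(source)
    occurring, read = set(), set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            occurring.add(node.id)
            # ast.Load marks an occurrence whose value is read;
            # ast.Store marks an occurrence that is only assigned to.
            if isinstance(node.ctx, ast.Load):
                read.add(node.id)
    return occurring, read

# Hypothetical fragment: x and y only receive values, z is actually read.
occurring, read = occurring_and_read("x = 1\ny = z + 1\n")
```

On this fragment the purely syntactic set contains `x`, `y` and `z`, while only `z` occurs in a position where its value is read.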

The definition given on page 632 of the Dragon Book (1988) is:

We say that a variable is used at a statement $s$ if its $r$-value may be required.

First, one should note the use of "if" rather than "iff", which clearly indicates that some variables not meeting the condition may end up being qualified as used. This is even reinforced by the "may be".

Then, this near-definition does not shed much light on the issue. What does it mean to be required? Typically, if a variable is an argument to a subprogram (procedure, function, method, ...), you may think it is required. But if further analysis shows that the value of this argument is not actually used in the subprogram, and the argument is only used to return a result, then the $r$-value of the variable is not required. Some may object that this could be handled by an appropriate type system, which they consider syntax (I do not). But the fact that a value is not required may depend on deeper semantic analysis (such as in example 4 of the question). Furthermore, the evolution of type systems and type theory tends to allow the inclusion in types of most things you may want to say or prove about a variable, which would hardly qualify as syntactic.

Thus the situation is that there is not really an indisputable syntactic (?) reference definition of "used variable". It depends essentially on an arbitrary choice implied by the set of techniques used to analyze the program, and on the level of knowledge they bring regarding actual use of the variable's value at run-time.

On the other hand it is possible to give a reference semantic definition for a set $S$ of used variables:

A variable is used in some program fragment iff there is a computation of that program such that the results and effects of executing that fragment depend on the value of that variable before execution.

This is not completely satisfactory, because there are other parameters that could be considered and that might suggest a different terminology, though they would not change the above remarks about syntax and semantics.

For example, in a sequential context, the above definition is clear, and refers to the value of the variable before executing the program (fragment). However, one may still question what is considered an effect, what is an impact on execution. Execution time could well be part of the semantics of a program, or not. Then, in a parallel execution context, you would consider more than the initial value of the variable. But I will ignore this, as the question has not been considering it (as far as I can see), and I will stick to the definition above.
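To make the semantic definition concrete, here is a minimal brute-force sketch for a toy setting. Everything in it is an assumption made for illustration: the fragment is modelled as a Python function from an initial state to the dictionary of assignments it performs (taken here as its whole "effect"), the variables range over a small finite domain, and the names `fragment`, `copy_fragment` and `used_variables` are hypothetical. Exhaustive enumeration is only possible because the domain is finite; it does not contradict the fact that the general problem is undecidable.

```python
from itertools import product
from math import factorial

def fragment(state):
    """Toy fragment: if factorial(3) == 6 then v1 := 1 else v1 := v2."""
    if factorial(3) == 6:
        return {"v1": 1}
    return {"v1": state["v2"]}

def copy_fragment(state):
    """Toy fragment: v1 := v2."""
    return {"v1": state["v2"]}

def used_variables(run, names, domain):
    """Variables whose initial value can change the effect of `run`."""
    used = set()
    for values in product(domain, repeat=len(names)):
        state = dict(zip(names, values))
        for name in names:
            for alt in domain:
                # Change only `name` in the initial state and compare effects.
                if run({**state, name: alt}) != run(state):
                    used.add(name)
    return used

S_conditional = used_variables(fragment, ["v1", "v2"], range(3))
S_copy = used_variables(copy_fragment, ["v1", "v2"], range(3))
```

In this model the conditional fragment uses no variable at all, because its test is always true, whereas the plain copy uses `v2`. An analysis that only looks at read occurrences would report `v2` in both cases.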

The set $S$ thus defined semantically is the most precise (i.e. minimal) set of variables that can matter in any computation, though some may not matter in some computation.

But this set is not necessarily computable. It is recursively enumerable, since a Turing Machine could simulate in parallel all possible computations, and enumerate all the variables that turn out to be used effectively (in the sense of our definition) in some computation. But determining that some variable will not be used in a computation is undecidable.

The best we can hope for is to exhibit a superset of $S$, such that we are sure not to miss any relevant variable. This is precisely what is done by the various techniques that can be called upon to determine the used variables. But they do not all produce the same result and the results they give can be more or less precise.

If some program analysis technique $T_i$ produces a set $S_i$ of used variables, we say that $T_i$ is correct iff $S\subseteq S_i$.

If two techniques $T_1$ and $T_2$ produce respectively the sets $S_1$ and $S_2$, then $T_1$ is better, i.e. more precise, than $T_2$ iff $S\subseteq S_1\subseteq S_2$.
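These two conditions are plain set inclusions, so they can be sketched directly with Python sets. The function names and the sample sets below are hypothetical and chosen only for illustration. In practice $S$ itself is not available in general, so such checks only make sense on examples where $S$ is known by other means.

```python
def is_correct(S, S_i):
    """T_i is correct iff its result S_i contains every truly used variable."""
    return S <= S_i

def is_at_least_as_precise(S, S_1, S_2):
    """T_1 is at least as precise as T_2 iff S is a subset of S_1, itself a subset of S_2."""
    return S <= S_1 <= S_2

# Hypothetical reference set and analysis results.
S = {"a"}
S_1 = {"a", "b"}        # correct, one spurious variable
S_2 = {"a", "b", "c"}   # correct, two spurious variables
S_3 = {"b"}             # incorrect: misses the used variable "a"
```

Here `S_1` and `S_2` are both correct results, `S_1` is the more precise of the two, and `S_3` is not acceptable because it misses a variable of `S`.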

The only thing that has a precise definition (up to the above caveat) is the set $S$, even though it is possibly not computable. It is also the reference for any other set to be considered acceptable in practice, and it is the smallest, the most precise of them all.

Hence, the expressions "used set" and "used variable" should be reserved for that set.

And I suggest that the other sets that are considered in practice should use the same name, qualified with the name of the technique that produced them. If the technique is unique and unnamed, the set could be called the computed use set (what is called SY in the question), or just used set when there is no ambiguity.

clarified decidability of S


edited Nov 13, 2014 at 12:16

babou


This seems related to the use-define or use-definition chains in program analysis (there are also Definition-use chains). So a consistent terminology would simply be used variables, or used set.

Of course, in practice, the definition of such a set $S$ is only as precise as your static analysis of the program. If you have the statement "If factorial(3)==6 Then $v_1$ := 1 Else $v_1$ := $v_2$ End", will you still consider that $v_2$ does not have an impact on the statement? Your knowledge of the fact will depend on the depth of your analysis.

If you take a strict theoretical definition, i.e. you want to exclude all variables which will never impact the execution, then it may be undecidable whether some variables do not belong to $S$ ($S$ is recursively enumerable, but not recursive in general).

added 5 characters in body


edited Nov 13, 2014 at 10:42

babou



created Nov 13, 2014 at 10:25

babou

