This new version of the answer tries to take into account the changes
in the question, and the information exchanged in the comments.
This answer assumes that $S$ should be the set of variables that have a
content that is used in some defined fragment of the program, rather
than, at some point in the program, the variables with a content that
will be needed before the end of the program.
For the latter, the usual term is live variables, as remarked in a comment by Klaus Draeger.
The former is a generalisation of the concept of use, as appears
in dataflow analysis, and particularly in such concepts as
Use-Define or Use-Definition chains (UD chains), as well as
Definition-Use chains (DU chains). The concept of use is usually
intended for elementary program statements such as an expression
appearing in an assignment or a function call. Recall that this
originates with analysis and optimization of old Fortran programs in
the late 1960s and in the 1970s, and there was no real concept of a
compound statement at the time in the Fortran language.
But there is no reason not to extend the concept to larger program
fragments that form a meaningful whole. Thus "used variable" and
"used set" seem to be exactly the concepts and terminology that you
are looking for.
Now, there may be a difficulty in defining what is implied in the
expression "is used by". It may just mean: "syntactically appears in",
which is as simple as you can make it. This is indeed purely
syntactic, but rather unsatisfactory, because the variable may well
appear only in a statement that changes its value, rather than use the
value it already has. This is too simplistic and is clearly not what
was intended by the creators of the concept.
Then a better definition will state that the occurrence of the variable
must imply that its value is actually used. But as soon as you say
that, you are no longer in syntax, since you must use some of the
meaning of the program fragment to know whether the occurrence of the
variable is for using or for changing its value (or possibly is simply
irrelevant). And when you start using the semantics of the language
construct, there is no clear limit on how much of the semantics you
can use.
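To make the gap concrete, here is a minimal sketch that contrasts "syntactically appears in" with "appears in a position where the value is read". It uses Python and its standard `ast` module purely as an illustration; the function names and the sample fragment are hypothetical and not taken from the question.

```python
import ast


def syntactic_occurrences(source: str) -> set[str]:
    """Every variable name that appears anywhere in the fragment."""
    return {
        node.id
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Name)
    }


def read_occurrences(source: str) -> set[str]:
    """Names that occur where their value is read (Load context)."""
    return {
        node.id
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load)
    }


# Hypothetical fragment: x and y are only assigned, z is read.
FRAGMENT = "x = 0\ny = z + 1\n"

print(sorted(syntactic_occurrences(FRAGMENT)))
print(sorted(read_occurrences(FRAGMENT)))
```

On this fragment the first set contains `x`, `y` and `z`, while the second contains only `z`. Even the second is only an approximation: in `x += 1` the parser marks `x` as a store although its old value is read, and in `y = z * 0` the variable `z` is read although its value cannot matter. Deciding such cases already requires some of the semantics.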
The definition given in page 632 of the Dragon Book 1988 is:
We say that a variable is used at a statement $s$ if its $r$-value may be required.
First one should note the use of "if" rather than "iff", which clearly
indicates that some variables not meeting the condition may end up
being qualified as used. This is even reinforced by the "may be".
Then, this almost-definition does not bring much light on the issue. What does it mean to
be required? Typically, if a variable is an argument to a subprogram
(procedure, function, method, ...), you may think it is required. But
if further analysis shows that the value of this argument is not
actually used in the subprogram, and the argument is only used to
return a result, the $r$-value of the variable is not required. Some
may object that this could be handled by an appropriate type system,
which they consider syntax (I do not). But the fact that a value is
not required may depend on deeper semantic analysis (such as in
example 4 of the question). Furthermore, the evolution of type systems
and type theory tends to allow inclusion in types of most things you
may want to say or prove about a variable, which would hardly qualify
as syntactic.
Thus the situation is that there is not really an indisputable
syntactic (?) reference definition of "used variable", and it depends
essentially on an arbitrary choice implied by the set of techniques
used to analyze the program and the level of knowledge they bring
regarding actual use of the variable value at run-time.
On the other hand it is possible to give a reference semantic
definition for a set $S$ of used variables:
A variable is used in some program fragment iff there is a computation of that program such that the results and effects
of executing that fragment depend on the value of that variable before execution.
This is not completely satisfactory, because there are other
parameters that could be considered, and might suggest a different
terminology, though not change the above remarks about syntax and
semantics.
For example, in a sequential context, the above definition is clear,
and refers to the value of the variable before executing the program
(fragment). However, one may still question what is considered an
effect, what is an impact on execution. Execution time could well be
part of the semantics of a program, or not. Then, in a parallel
execution context, you would consider more than the initial value of
the variable. But I will ignore this, as the question has not been
considering it (as far as I can see), and I will stick to the
definition above.
The set $S$ thus defined semantically is the most precise
(i.e. minimal) set of variables that can matter in any computation,
though some may not matter in some computation.
But this set is not necessarily computable. It is recursively
enumerable, since a Turing Machine could simulate in parallel all possible
computations, and enumerate all the variables that turn out to be
effectively used (in the sense of our definition) in some computation. But
determining that some variable will not be used in any computation is
undecidable.
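On a finite domain, and for fragments that always terminate, the semantic definition can be checked by brute force, which may help to see what it asks for. The sketch below is only an illustration under those assumptions: the fragment is modeled as a Python function from an initial state to whatever is taken as its observable outcome, and all names (`used_variables`, `example_fragment`, `v1`, `v2`, `v3`) are hypothetical.

```python
from itertools import product


def used_variables(fragment, variables, domain):
    """Variables whose initial value can change the outcome of `fragment`.

    `fragment` maps an initial state (a dict) to an observable outcome.
    The search is exhaustive over `domain`, so the answer is exact only
    for that finite domain and for fragments that always terminate.
    """
    domain = list(domain)
    used = set()
    for values in product(domain, repeat=len(variables)):
        state = dict(zip(variables, values))
        baseline = fragment(dict(state))
        for var in variables:
            if var in used:
                continue
            for alternative in domain:
                changed = dict(state)
                changed[var] = alternative
                if fragment(changed) != baseline:
                    used.add(var)  # some computation depends on var
                    break
    return used


def example_fragment(state):
    # The test is always true, but only semantic reasoning reveals it,
    # so v2 occurs in a read position without ever being needed.
    if 3 * 2 * 1 == 6:
        return {"v1": state["v3"] + 1}
    return {"v1": state["v2"]}


print(used_variables(example_fragment, ["v1", "v2", "v3"], range(3)))
```

Here only `v3` is reported. What the function returns as "outcome" is exactly the modeling choice discussed above (final values, output, timing, and so on), and nothing of the sort is available for unbounded domains or possibly non-terminating fragments.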
The best we can hope for is to exhibit a superset of $S$, such that we
are sure not to miss any relevant variable. This is precisely what is
done by the various techniques that can be called upon to determine
the used variables. But they do not all produce the same result and
the results they give can be more or less precise.
If some program analysis technique $T_i$ produces a set $S_i$ of used
variables, we say that $T_i$ is correct iff $S\subseteq S_i$.
If two techniques $T_1$ and $T_2$ produce respectively the sets $S_1$
and $S_2$, then $T_1$ is better, more precise than $T_2$ iff
$S\subseteq S_1\subseteq S_2$.
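These two conditions are plain set inclusions, as the following small sketch shows. The sets are made up for illustration: `S` stands for the semantic reference set, and the two others for what a purely syntactic technique and a slightly finer one might report.

```python
def is_correct(S, S_i):
    """A technique is correct iff it misses no really used variable."""
    return S <= S_i


def at_least_as_precise(S, S_1, S_2):
    """T_1 is better than T_2 iff S is included in S_1, itself included in S_2."""
    return S <= S_1 <= S_2


# Hypothetical sets for one program fragment.
S = {"v3"}                        # semantic reference set
S_reads = {"v2", "v3"}            # variables in a read position
S_syntactic = {"v1", "v2", "v3"}  # variables appearing anywhere

print(is_correct(S, S_reads), is_correct(S, S_syntactic))
print(at_least_as_precise(S, S_reads, S_syntactic))
```

Both techniques are correct here, and the first is the more precise one. A technique reporting `{"v2"}` alone would be incorrect, however small its result.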
The only thing that has a precise definition (up to the above caveat)
is the set $S$, even though it is possibly not computable. It is also the
reference for any other set to be considered acceptable in practice, and is the smallest, the most precise of them all.
Hence, the expressions "used set" and "used variable" should be
reserved for that set.
And I suggest that the other sets that are considered in practice,
should use the same name, qualified with the name of the technique
that produced them. If the technique is unique and unnamed, it could
be called the computed use set (what is called SY in the question),
or just used set when there is no ambiguity.