Seven Principles of Software Testing


SOFTWARE TECHNOLOGIES

Bertrand Meyer, ETH Zürich and Eiffel Software

Testing is about producing failures.

While everyone knows the theoretical limitations of software testing, in practice we devote considerable effort to this task and would consider it foolish or downright dangerous to skip it. Other verification techniques such as static analysis, model checking, and proofs have great potential, but none is ripe for overtaking tests as the dominant verification technique. This makes it imperative to understand the scope and limitations of testing and perform it right.

The principles that follow emerged from experience studying software testing and developing automated tools such as AutoTest (http://se.inf.ethz.ch/research/autotest).

Defining testing

As a verification method, testing is a paradox. Testing a program to assess its quality is, in theory, akin to sticking pins into a doll: very small pins, very large doll. The way out of the paradox is to set realistic expectations.

Too often the software engineering literature claims an overblown role for testing, echoed in the Wikipedia definition (http://en.wikipedia.org/wiki/Software_testing): “Software testing is the process used to assess the quality of computer software. Software testing is an empirical technical investigation conducted to provide stakeholders with information about the quality of the product or service under test, with respect to the context in which it is intended to operate.” In truth, testing a program tells us little about its quality, since 10 or even 10 million test runs are a drop in the ocean of possible cases.

There are connections between tests and quality, but they are tenuous: A successful test is only relevant to quality assessment if it previously failed; then it shows the removal of a failure and usually of a fault. (I follow the IEEE standard terminology: An unsatisfactory program execution is a “failure,” pointing to a “fault” in the program, itself the result of a “mistake” in the programmer’s thinking. The informal term “bug” can refer to any of these phenomena.)

If a systematic process tracks failures and faults, the record might give clues about how many remain. If the last three weekly test runs have evidenced 550, 540, and 530 faults, the trend is encouraging, but the next run is unlikely to find no faults, or 100. (Mathematical reliability models allow more precise estimates, credible in the presence of a sound long-term data collection process.)

The only incontrovertible connection is negative, a falsification in the Popperian sense: A failed test gives us evidence of nonquality. In addition, if the test previously passed, it indicates regression and points to possible quality problems in the program and the development process. The most famous quote about testing expressed this memorably: “Program testing,” wrote Edsger Dijkstra, “can be used to show the presence of bugs, but never to show their absence!”

Less widely understood (and probably not intended by Dijkstra) is what this means for testers: the best possible self-advertisement. Surely, any technique that uncovers faults holds great interest for all “stakeholders,” from managers to developers and customers.

Rather than an indictment, we should understand this maxim as a definition of testing. While less ambitious than providing “information about quality,” it is more realistic, and directly useful.

Principle 1: Definition
To test a program is to try to make it fail.

This keeps the testing process focused: Its single goal is to uncover faults by triggering failures. Any inference about quality is the responsibility of quality assurance but beyond the scope of testing. The definition also reminds us that testing, unlike debugging, does not deal with correcting faults, only finding them.

Tests and specifications

Test-driven development, given prominence by agile methods, has brought tests to the center stage, but

August 2008 99

sometimes with the seeming implication that tests can be a substitute for specifications. They cannot. Tests, even a million of them, are instances; they miss the abstraction that only a specification can provide.

Principle 2: Tests versus specs
Tests are no substitute for specifications.

The danger of believing that a test suite can serve as specification is evidenced by several software disasters that happened because no one had thought of some extreme case. Although specifications can miss cases too, at least they imply an effort at generalization. In particular, specifications can serve to generate tests, even automatically (as in model-driven testing); the reverse is not possible without human intervention.

Regression testing

A characteristic of testing as practiced in software is the deplorable propensity of previously corrected faults to resuscitate. The hydra’s old heads, thought to have been long cut off, pop back up. This phenomenon is known as regression and leads to regression testing: Checking that what has been corrected still works. A consequence is that once you have uncovered a fault it must remain part of your life forever.

Principle 3: Regression testing
Any failed execution must yield a test case, to remain a permanent part of the project’s test suite.

This principle covers all failures occurring during development and testing. It suggests tools for turning a failed execution into a reproducible test case, as have recently emerged: Contract-Driven Development (CDD), ReCrash, JCrasher.

Oracles

A test run is only useful if you can unambiguously determine whether it passed. The criterion is called a test oracle. If you have a few dozen or perhaps a few hundred tests, you might afford to examine the results individually, but this does not scale up. The task cries for automation.

Principle 4: Applying oracles
Determining success or failure of tests must be an automatic process.

This statement of the principle leaves open the form of oracles. Often, oracles are specified separately. In research such as ours, they are built in, as the target software already includes contracts that the tests use as oracles.

Principle 4 (variant): Contracts as oracles
Oracles should be part of the program text, as contracts. Determining test success or failure should be an automatic process consisting of monitoring contract satisfaction during execution.

This principle subsumes the previous one but is presented as a variant so that people who do not use contracts can retain the weaker form.

Manual and automatic test cases

Many test cases are manual: Testers think up interesting execution scenarios and devise tests accordingly. To this category we may add cases derived, according to principle 3, from the failure of an execution not initially intended as a test run. It is becoming increasingly realistic to complement these two categories by automatic test cases, derived from the specification through an automatic test generator. A process restricted to manual tests underutilizes the power of modern computers. The approaches are complementary.

Principle 5: Manual and automatic test cases
An effective testing process must include both manually and automatically produced test cases.

Manual tests are good at depth: They reflect developers’ understanding of the problem domain and data structure. Automatic tests are good at breadth: They try many values, including extremes that humans might miss.

Testing strategies

We now move from testing practice to research investigating new techniques. Testing research is vulnerable to a risky thought process: You hit upon an idea that seemingly promises improvements and follow your intuition. Testing is tricky; not all clever ideas prove helpful when submitted to objective evaluation. A typical example is random testing. Intuition suggests that any strategy using knowledge about the program must beat random input. Yet objective measures, such as the number of faults found, show that random testing often outperforms supposedly smart ideas. Richard Hamlet’s review of random testing (Encyclopedia of Software Engineering, J.J. Marciniak, ed., Wiley, 1994, pp. 970-978) provides a fascinating confrontation of folk knowledge and scientific analysis. There is no substitute for empirical assessment.

Principle 6: Empirical assessment of testing strategies
Evaluate any testing strategy, however attractive in principle, through objective assessment using explicit criteria in a reproducible testing process.

I was impressed as a child by reading in The Life of the Bee (Fasquelle, 1901) by Maurice Maeterlinck (famous as the librettist of Debussy’s

100 Computer
Pelléas et Mélisande) what happens when you put a few bees and a few flies in a bottle and turn the bottom toward the light source. As Figure 1 shows, bees, attracted by the light, get stuck and die of hunger or exhaustion; flies don’t have a clue and try all directions, getting out within a couple of minutes.

Maeterlinck was a poet, not a professional biologist, and I don’t know if the experiment holds up. But it is a good metaphor for cases of apparent stupidity outsmarting apparent cleverness, as happens in testing.
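
The flies' approach can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the mechanism of AutoTest or of any other tool mentioned in this article: the routine `middle`, the fault seeded into it, the oracle `contract_holds`, and the helpers `failing_inputs` and `random_cases` are all hypothetical names introduced here. The sketch judges every run with the same automatic oracle (a postcondition), compares three hand-picked cases against a thousand random ones, and keeps each distinct failing input as a permanent test case.

```python
import random


def middle(a: int, b: int, c: int) -> int:
    """Hypothetical routine under test: meant to return the median of a, b, c."""
    if a <= b:
        if b <= c:
            return b
        return c if a <= c else a
    # From here on, b < a.
    if a <= c:
        return a
    if b <= c:
        return b  # seeded fault for illustration: the median here is c
    return b


def contract_holds(args: tuple, result: int) -> bool:
    """Postcondition acting as the oracle: result is the middle of the sorted inputs."""
    return result == sorted(args)[1]


def failing_inputs(cases) -> list:
    """Run every case and return those whose result violates the contract."""
    return [args for args in cases if not contract_holds(args, middle(*args))]


def random_cases(runs: int, seed: int):
    """Yield `runs` random argument triples; the fixed seed keeps the run reproducible."""
    rng = random.Random(seed)
    for _ in range(runs):
        yield tuple(rng.randint(-10, 10) for _ in range(3))


# Three hand-picked cases; all of them happen to pass despite the fault.
manual_failures = failing_inputs([(1, 2, 3), (3, 2, 1), (2, 1, 3)])

# One thousand random cases, judged by the same automatic oracle.
random_failures = failing_inputs(random_cases(runs=1000, seed=2008))

# In the spirit of principle 3, each distinct failing input becomes a permanent test case.
regression_suite = sorted(set(random_failures))

print(f"manual: {len(manual_failures)} failing, random: {len(random_failures)} failing")
print(f"{len(regression_suite)} distinct inputs kept for regression")
```

The seeded fault only shows up when b < c < a, an ordering the three manual cases never exercise. Fixing the seed is a design choice that keeps the random run reproducible, so that two strategies can be compared on the same footing.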

Figure 1. Smarter is not always better. Maeterlinck observed that if you put bees and flies into a bottle and turn the bottom toward the light source, the supposedly clever bees, attracted by the light, get stuck and die, while apparently stupid flies get out within a couple of minutes. Is this a metaphor for testing strategies?

Assessment criteria

In applying the last principle, the issue remains of which criteria to use. The testing literature includes measures such as “number of tests to first failure.” For the practitioner this is not the most useful: We want to find all faults, not just one. Granted, the idea is that the first fault will be corrected and the criterion applied again. But successive faults might be of a different nature; an automated process must trigger as many failures as possible, not stop at the first.

The number of tests is not that useful to managers, who need help deciding when to stop testing and ship, or to customers, who need an estimate of fault densities. More relevant is the testing time needed to uncover the faults. Otherwise we risk favoring strategies that uncover a failure quickly but only after a lengthy process of devising the test; what counts is total time. This is why, just as flies get out faster than bees, a seemingly dumb strategy such as random testing might be better overall.

Other measures commonly used include test coverage of various kinds (such as instruction, branch, or path coverage). Intuitively they seem to be useful, but there is little actual evidence that higher coverage has any bearing on quality. In fact, several recent studies suggest a negative correlation; if a module has higher test coverage, this is usually because the team knew it was problematic, and indeed it will often have more faults.

More than any of these metrics what matters is how fast a strategy can produce failures revealing faults.

Principle 7: Assessment criteria
A testing strategy’s most important property is the number of faults it uncovers as a function of time.

The relevant function is fault count against time, fc(t), useful in two ways: Researchers using a software base with known faults can assess a strategy by seeing how many of them it finds in a given time; project managers can feed fc(t) into a reliability model to estimate how many faults remain, addressing the age-old question “when do I stop testing?”

We never strayed far from where we started. The first principle told us that testing is about producing failures; the last one is a quantitative restatement of that general observation, which also underlies all the others.

Bertrand Meyer is professor of Software Engineering at ETH Zürich and chief architect at Eiffel Software in Santa Barbara, Calif. Contact him at [email protected].

Editor: Mike Hinchey, Lero, the Irish Software Engineering Research Centre; [email protected]

