Software Reliability

Software Reliability
1
 What is software reliability?
the probability of failure-free software operation for a specified

period of time in a specified environment
Software is subject to input output

sw
 design flaws:
- mistakes in the interpretation of the specification
that the software is supposed to satisfy (ambiguities)
- mistakes in the implementation of the specification:
carelessness or incompetence in writing code,
inadequate testing
 operational faults
incorrect or unexpected usage faults (operational profile)
2
Design Faults
hard to visualize, classify, detect, and correct.
 closely related to human factors and the design

process, of which we don't have a solid understanding
Given a design flaw, only some type of inputs will exercise that
fault to cause failures. Number of failures depend on how
often these inputs exercise the sw flaw
Apparent reliability of a piece of software is correlated to how

frequently design faults are exercised as opposed to number
of design faults present
3
Software reliability
4
5
We assume that programs will not be fault free
6
Software faults and Failure regions
We assume that programs will not be fault-free
The input to the software is a set of variables, defining a Cartesian

space, e.g. x and y
Failure regions
y
x
The software contains bugs if some inputs are processed erroneously
Effcacy of software fault tolerance techniques depends on how

disjoint the failure regions of the versions are
7
 Software reliability is not a direct function of time.
Electronical and mechanical parts may become old, and
wear-out with time and usage.
Software DOES NOT wear-out during its life.
Software DOES NOT change over time unless
intentionally changed or upgraded
 As a software is used, design faults are discovered and

corrected. Consequently, the reliability should improve,
and the failure rate should decrease BUT corrections could
cause new faults
8
SOFTWARE RELIABILITY EVOLUTION
 upgrades imply feature upgrades, not upgrades for reliability.
From “Software Reliability”,

J. Pan, Carnegie Mellon University, 1999
identify periods of reliability growth and decrease
9
SOFTWARE RELIABILTY EVOLUTION
 in the last phase, software does not have an
increasing failure rate as hardware does. In this phase,
software is approaching obsolescence; there are no
motivations for any upgrades or changes to the software.
Therefore, the failure rate will not change.
 in the useful-life phase, software will experience a

drastic increase in failure rate each time an upgrade is made.
The failure rate levels off gradually, partly because of the defects
found and fixed after the upgrades.
 Even bug fixes may be a reason for more software failures,

if the bug fix induces other defects into software
10
 Reliability upgrades drop in software failure rate, if redesign or
reimplementation of some modules with better engineering
approaches
From “Software Reliability”, J. Pan, Carnegie Mellon University, 1999
11
12
Software Reliability Growth Models
Removal of implementation errors should increse MTTF, and

correlation of bug-removal history with the time evolution of the
MTTF value may allow the prediction of when a given MTTF
value will be reached.
Disadvantages:
Do not consider that correct a bug may introduce new bugs
Do not consider specification errors (only implementation faults)
13
Reliability growth characterization
 Time between failure: the time between failure is increasing

Random Variables T1, ..., Tn
Ti = time between failure i-1 and failure i
Reliability growth: Ti <=st Tk for all i < k

Prob {Ti < x} >= Prob {Tk <= x} -> FTi(x) >= FTk(x) forall i < k and for all x
Tk = time between failure k-1 and k
T1 T2 Tk
0
fault fault fault

fault
14
Reliability growth characterization
 Number of failure: the number of failure is decreasing

Cumulative number of failure law: the number of failure events in an interval
of the form [0, tk] is larger than the number of events taking place in an interval
of the same length beginning later
Random Variables N(t1), ..., N(tn)

N(ti) = cumulative number of failures between 0 and ti
0 x xx x x xx x x
N(1) N(2)
N(k)
15
Jelinski and Moranda Model
(the earliest and the most commonly used model)
N faults at the beginning of the testing process

- each fault is independent of others and
- equally likely to cause a failure during testing
- detected fault is removed in a negligible time and no new faults are introduced
the fault manisfestation rate

Ti time between the failure (i-1) and the failure i
depends on the fault manifestation rate and the number of faults in the system
l(i) = [N-(i-1)] failure rate of the i-th failure
P(Ti < ti)
16
Schick and Wolver ton Model
Software failure rate is proportional to the current fault content of the
program as well as to the time elapsed since the last failure
Goel and Okumoto Imperfet Debbugging Model

The number of faults in the system at time t is treated as a Markov
process whose transition probabilities are governed by the
probability of imperfect debugging.
Other models ….
17
Dependency analysis
 Workload/failure dependency
workload appers to act as a stress factor: the failure rate increases as
the workload increases
 Correlation among failures on different components

- exists significantly in distributed systems
- for example, disk and network errors are strongly correlated,
because the processors in the system heavily use and share
the disk and the network concurrently
- generally the error correlation is high (0.62), the failure correlationis
low (0.06)
Common Cause Failure

a failure of two or more structures, systems or components due to a
single specific event or cause
18
DEFENSE against application sw CCF
• The software development process is robust and of high quality,

• The OS platform and its software development life cycle process are mature,
• Rigorous V&V methodology is used,
• Configuration management after deployment is robust (including control of software
versions, setpoint changes, spares),
• Standardized software development tools and function libraries,
• Exclusive use of pre-defined and rigorously qualified function block libraries for
application programming,
• Clearly defined rules for use of the software functional blocks (including exception
handling),
• Thorough coverage of pre-operational testing,
• Comprehensive exception handling,
• Deterministic program execution,
• Strictly cyclic operation, and
• OS defensive measures
From: B. Enzinna, L. Shi, S. Yang, Software Common-Cause Failure Probability Assessment,

NPIC&HMIT 2009
19
Software Reliability Engineering
Software Reliability Engineering (SRE) is the

quantitative study of the operational behavior of
software-based systems with respect to user
requirements concerning reliability.
20
A global software reliability analysis method
(In Karama Kanoun, ReSIST network of Excellence Courseware “Software Reliability

Engineering”, 2008 http://www.resist-noe.org/)
21
 Data collection process
- includes data relative to product itself (software size, language,
workload, ...), usage environment: verification & validation
methods and failures
- Failure reports (FR) and correction reports (CR) are generated
 Data validation process

data elaborated to eliminate FR reporting of the same failure, FR
proposing a correction related to an already existing FR, FR
signalling a false or non identified problem, incomplete FRs or
FRs containing inconsistent data (Unusable) …
Data extracted from FRs and CRs are:

Time to failures (or between failures)
Number of failures per unit of time
Cumulative number of failures
22
 Descriptive statistics
make syntheses of the observed phenomena
Analyses Fault typology, Fault density of components, Failure /
fault distribution among software components (new, modified,
reused)
Analyses Relationships Fault density / size / complexity;
Nature of faults / components; Number of components affected by
changes made to resolve an FR .
…….
 Trend tests
Control the efficiency of test activities
- Reliability decrease at the beginning of a new activity: OK
- Reliability grow after reliability decrese: OK
- Sudden reliability grow CAUTION!
- .......
 Model application
Trend in accordance with model assumptions
23
 Due to the nature of software, no general accepted mechanisms
exist to predict software reliability
 Important empirical observation and experience
 Good engineering methods can largely improve software reliability
 Software testing serves as a way to measure and improve

software reliability
 Unfeasibility of completely testing a software module:

defect-free software products cannot be assured
 Databases with software failure rates are available but numbers

should be used with caution and adjusted based on observation
and experience
24

Software Reliability

Uploaded by

Copyright:

Available Formats

Software Reliability

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Software Reliability

Uploaded by

Copyright:

Available Formats

Software Reliability

the probability of failure-free software operation for a specified

Software is subject to input output

hard to visualize, classify, detect, and correct.

 closely related to human factors and the design

Apparent reliability of a piece of software is correlated to how

We assume that programs will not be fault-free

The input to the software is a set of variables, defining a Cartesian

Effcacy of software fault tolerance techniques depends on how

 As a software is used, design faults are discovered and

From “Software Reliability”,

identify periods of reliability growth and decrease

 in the useful-life phase, software will experience a

 Even bug fixes may be a reason for more software failures,

From “Software Reliability”, J. Pan, Carnegie Mellon University, 1999

Removal of implementation errors should increse MTTF, and

 Time between failure: the time between failure is increasing

Reliability growth: Ti <=st Tk for all i < k

Tk = time between failure k-1 and k

fault fault fault

 Number of failure: the number of failure is decreasing

Random Variables N(t1), ..., N(tn)

N faults at the beginning of the testing process

the fault manisfestation rate

l(i) = [N-(i-1)] failure rate of the i-th failure

P(Ti < ti)

Goel and Okumoto Imperfet Debbugging Model

 Correlation among failures on different components

Common Cause Failure

• The software development process is robust and of high quality,

From: B. Enzinna, L. Shi, S. Yang, Software Common-Cause Failure Probability Assessment,

Software Reliability Engineering (SRE) is the

(In Karama Kanoun, ReSIST network of Excellence Courseware “Software Reliability

 Data validation process

Data extracted from FRs and CRs are:

 Important empirical observation and experience

 Good engineering methods can largely improve software reliability

 Software testing serves as a way to measure and improve

 Unfeasibility of completely testing a software module:

 Databases with software failure rates are available but numbers

You might also like