Software Reliability

• What is software reliability?
The probability of failure-free software operation for a specified period of time in a specified environment.
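A minimal sketch of this definition. It assumes a constant failure rate λ, so the time to failure is exponential and R(t) = P(no failure in [0, t]) = e^(-λt). The constant-rate assumption and the names `reliability` and `mttf` are illustrative only. As the following slides point out, a software failure rate is not fixed: it changes as faults are found, fixed, or introduced.

```python
import math

def reliability(t: float, failure_rate: float) -> float:
    """Probability of failure-free operation over [0, t].

    Assumes (illustratively) a constant failure rate in a fixed environment,
    i.e. an exponentially distributed time to failure.
    """
    if t < 0 or failure_rate < 0:
        raise ValueError("t and failure_rate must be non-negative")
    return math.exp(-failure_rate * t)

def mttf(failure_rate: float) -> float:
    """Mean time to failure under the same constant-rate assumption."""
    return 1.0 / failure_rate

# Hypothetical figures: 0.001 failures per hour, 100-hour mission
print(round(reliability(100.0, 0.001), 4))  # 0.9048
print(mttf(0.001))
```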

Software is subject to:

[Figure: input → sw → output]

• design flaws:
- mistakes in the interpretation of the specification that the software is supposed to satisfy (ambiguities)
- mistakes in the implementation of the specification: carelessness or incompetence in writing code, inadequate testing

• operational faults:
incorrect or unexpected usage faults (operational profile)
Design Faults

• Hard to visualize, classify, detect, and correct.

• Closely related to human factors and the design process, of which we don't have a solid understanding.

• Given a design flaw, only some types of input will exercise that fault to cause failures. The number of failures depends on how often these inputs exercise the software flaw.

• The apparent reliability of a piece of software is correlated with how frequently design faults are exercised, as opposed to the number of design faults present.
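A small illustration of the last point. It assumes a hypothetical operational profile with three input classes, and that a faulty class fails on every input it receives while the other classes never fail. Programs A and B each contain one fault but differ sharply in apparent reliability; program C contains two faults and still fails less often than B. The class names and probabilities are invented for the example.

```python
# Hypothetical operational profile: probability that a run draws its input
# from each input class (values are illustrative, not measured)
profile = {"query": 0.70, "update": 0.25, "admin": 0.05}

def failure_probability_per_run(profile, faulty_classes):
    """P(a run fails), assuming every input in a faulty class triggers a
    failure and inputs in the other classes never do."""
    return sum(p for cls, p in profile.items() if cls in faulty_classes)

p_a = failure_probability_per_run(profile, {"admin"})            # 1 fault, rarely exercised
p_b = failure_probability_per_run(profile, {"query"})            # 1 fault, often exercised
p_c = failure_probability_per_run(profile, {"update", "admin"})  # 2 faults, both in rarer classes
print(p_a, p_b, p_c)
```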
Software faults and Failure regions

We assume that programs will not be fault-free

• The input to the software is a set of variables defining a Cartesian space, e.g. x and y.

[Figure: failure regions in the x-y input space]

• The software contains bugs if some inputs are processed erroneously.

• The efficacy of software fault tolerance techniques depends on how disjoint the failure regions of the versions are.
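A sketch of why disjointness matters. It assumes inputs are uniformly distributed over the unit square and that each version's failure region is a single axis-aligned rectangle; under these assumptions the probability that two versions fail on the same input, the case redundancy cannot mask, is the area of the intersection of their regions. The `Rect` type and the coordinates are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rect:
    """Axis-aligned failure region in the (x, y) input space [0, 1] x [0, 1]."""
    x0: float
    y0: float
    x1: float
    y1: float

    def area(self) -> float:
        # An empty (inverted) rectangle has zero area
        return max(0.0, self.x1 - self.x0) * max(0.0, self.y1 - self.y0)

    def intersect(self, other: "Rect") -> "Rect":
        return Rect(max(self.x0, other.x0), max(self.y0, other.y0),
                    min(self.x1, other.x1), min(self.y1, other.y1))

def coincident_failure_prob(a: Rect, b: Rect) -> float:
    """P(both versions fail on the same input), assuming uniform inputs."""
    return a.intersect(b).area()

v1 = Rect(0.1, 0.1, 0.3, 0.3)
v2 = Rect(0.2, 0.2, 0.4, 0.4)  # overlaps v1
v3 = Rect(0.6, 0.6, 0.8, 0.8)  # disjoint from v1
print(coincident_failure_prob(v1, v2))
print(coincident_failure_prob(v1, v3))  # 0.0
```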
Software Reliability
• Software reliability is not a direct function of time. Electronic and mechanical parts may age and wear out with time and usage. Software DOES NOT wear out during its life. Software DOES NOT change over time unless intentionally changed or upgraded.

• As software is used, design faults are discovered and corrected. Consequently, reliability should improve and the failure rate should decrease, BUT corrections could introduce new faults.
SOFTWARE RELIABILITY EVOLUTION
• Upgrades imply feature upgrades, not upgrades for reliability.

From “Software Reliability”, J. Pan, Carnegie Mellon University, 1999

• Identify periods of reliability growth and decrease.
SOFTWARE RELIABILITY EVOLUTION
• In the last phase, software does not have an increasing failure rate as hardware does. In this phase, software is approaching obsolescence; there is no motivation for any upgrades or changes to the software. Therefore, the failure rate will not change.

• In the useful-life phase, software will experience a drastic increase in failure rate each time an upgrade is made. The failure rate then levels off gradually, partly because of the defects found and fixed after the upgrades.

• Even bug fixes may be a reason for more software failures, if the bug fix introduces other defects into the software.
• Reliability upgrades: the software failure rate drops if some modules are redesigned or reimplemented with better engineering approaches.

From “Software Reliability”, J. Pan, Carnegie Mellon University, 1999
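The useful-life behaviour described above (a jump in failure rate at each upgrade, then a gradual levelling off) can be mimicked with a toy curve. This is purely illustrative: it assumes each upgrade adds a fixed excess `jump` that decays exponentially with time constant `tau` on top of a `base` rate. The functional form and every parameter value are hypothetical, not fitted to data.

```python
import math

def failure_rate(t, base, upgrades, jump, tau):
    """Toy useful-life failure rate (hypothetical shape).

    Each feature upgrade at time u adds `jump` to the rate; the excess decays
    as exp(-(t - u) / tau) while the new defects are found and fixed.
    """
    rate = base
    for u in upgrades:
        if t >= u:
            rate += jump * math.exp(-(t - u) / tau)
    return rate

upgrades = [100.0, 300.0]
for t in (99.0, 100.0, 150.0, 299.0, 300.0, 600.0):
    r = failure_rate(t, base=0.01, upgrades=upgrades, jump=0.05, tau=20.0)
    print(t, round(r, 4))
```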
Software Reliability Growth Models

Removal of implementation errors should increase the MTTF, and correlating the bug-removal history with the time evolution of the MTTF value may allow prediction of when a given MTTF value will be reached.

Disadvantages:
- They do not consider that correcting a bug may introduce new bugs.
- They do not consider specification errors (only implementation faults).
Reliability growth characterization

• Time between failures: the time between failures is increasing.

Random variables T1, ..., Tn, with Ti = time between failure i-1 and failure i.

Reliability growth: $T_i \le_{st} T_k$ for all $i < k$, that is,

$$P\{T_i \le x\} \ge P\{T_k \le x\} \iff F_{T_i}(x) \ge F_{T_k}(x) \quad \forall\, i < k,\ \forall\, x$$

[Figure: time axis from 0 marking successive faults; T1, T2, ..., Tk are the intervals between them]
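A sketch of the stochastic-ordering condition. It assumes, for illustration only, that each Ti is exponential with rate λi, and checks F_Ti(x) ≥ F_Tk(x) for every pair i < k on a grid of x values. Under this assumption the condition holds exactly when the rates are non-increasing; the rates and grid below are hypothetical.

```python
import math

def exp_cdf(x, rate):
    """F_T(x) = P{T <= x} for an exponential time between failures."""
    return 1.0 - math.exp(-rate * x) if x >= 0 else 0.0

def is_reliability_growth(rates, xs):
    """True if F_{T_i}(x) >= F_{T_k}(x) for all i < k and all x in xs,
    assuming T_i ~ Exponential(rates[i]) (an illustrative assumption)."""
    for i in range(len(rates)):
        for k in range(i + 1, len(rates)):
            if any(exp_cdf(x, rates[i]) < exp_cdf(x, rates[k]) for x in xs):
                return False
    return True

grid = [0.5 * j for j in range(1, 41)]
print(is_reliability_growth([0.5, 0.4, 0.3, 0.2], grid))  # True
print(is_reliability_growth([0.5, 0.6, 0.3], grid))       # False
```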
Reliability growth characterization

• Number of failures: the number of failures is decreasing.

Cumulative number of failures law: the number of failure events in an interval of the form [0, tk] is larger than the number of events taking place in an interval of the same length beginning later.

Random variables N(t1), ..., N(tn), with N(ti) = cumulative number of failures between 0 and ti.

[Figure: time axis from 0 with failure events marked x; N(1), N(2), ..., N(k) mark the cumulative counts]
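A minimal sketch of computing N(t) and comparing failure counts in windows of equal length, assuming a hypothetical list of failure times. Under the cumulative-number-of-failures law, the first window should contain more failures than later windows of the same length.

```python
from bisect import bisect_right

def cumulative_failures(failure_times, t):
    """N(t): number of failures in [0, t]; failure_times must be sorted."""
    return bisect_right(failure_times, t)

def counts_per_window(failure_times, width, horizon):
    """Failures in consecutive windows [0, w), [w, 2w), ... up to horizon."""
    counts = []
    start = 0.0
    while start < horizon:
        end = start + width
        counts.append(sum(1 for ft in failure_times if start <= ft < end))
        start = end
    return counts

# Hypothetical failure times (hours) observed during testing
times = [2, 5, 7, 11, 14, 20, 31, 45, 70, 98]
print([cumulative_failures(times, t) for t in (25, 50, 75, 100)])  # [6, 8, 9, 10]
counts = counts_per_window(times, 25, 100)
print(counts)  # [6, 2, 1, 1]
print(all(counts[0] >= c for c in counts[1:]))  # True
```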
Jelinski and Moranda Model
(the earliest and the most commonly used model)

• N faults at the beginning of the testing process:
- each fault is independent of the others and
- equally likely to cause a failure during testing
- a detected fault is removed in negligible time and no new faults are introduced

• φ: the fault manifestation rate

• Ti, the time between failure (i-1) and failure i, depends on the fault manifestation rate and on the number of faults remaining in the system:

$$\lambda(i) = \phi\,[N-(i-1)]$$ (failure rate of the i-th failure)

$$P(T_i < t_i) = 1 - e^{-\phi\,[N-(i-1)]\,t_i}$$
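A sketch of the Jelinski-Moranda formulas above, plus a parameter fit. The fit uses profile maximum likelihood: for a fixed integer N ≥ n, the likelihood is maximised by φ̂ = n / Σ (N - i + 1) ti, and N̂ is then chosen by a grid search. The grid search, the `max_extra` bound, and the sample data are assumptions made for illustration. If N̂ lands on the upper bound, the data show no clear reliability growth.

```python
import math

def jm_failure_rate(i, n_faults, phi):
    """lambda(i) = phi * [N - (i - 1)]: failure rate while waiting for failure i."""
    return phi * (n_faults - (i - 1))

def jm_cdf(t, i, n_faults, phi):
    """P(T_i < t) = 1 - exp(-phi * [N - (i - 1)] * t)."""
    return 1.0 - math.exp(-jm_failure_rate(i, n_faults, phi) * t)

def jm_fit(times_between_failures, max_extra=1000):
    """Profile maximum-likelihood estimates (N_hat, phi_hat) via grid search on N."""
    n = len(times_between_failures)
    best = None
    for N in range(n, n + max_extra + 1):
        # enumerate() starts at 0, so failure i = k + 1 sees N - k remaining faults
        weighted = sum((N - k) * t for k, t in enumerate(times_between_failures))
        phi = n / weighted
        # log L = n log(phi) + sum log(N - k) - phi * weighted, and phi * weighted = n
        loglik = n * math.log(phi) + sum(math.log(N - k) for k in range(n)) - n
        if best is None or loglik > best[0]:
            best = (loglik, N, phi)
    _, n_hat, phi_hat = best
    return n_hat, phi_hat

def jm_expected_next(n_hat, phi_hat, n_observed):
    """Expected time to the next failure once n_observed faults have been removed."""
    remaining = n_hat - n_observed
    return math.inf if remaining == 0 else 1.0 / (phi_hat * remaining)

# Hypothetical inter-failure times (hours) that lengthen as faults are removed
data = [1, 2, 3, 5, 8, 12, 20, 35]
n_hat, phi_hat = jm_fit(data)
print(n_hat, round(phi_hat, 4), jm_expected_next(n_hat, phi_hat, len(data)))
```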
Schick and Wolverton Model
Software failure rate is proportional to the current fault content of the
program as well as to the time elapsed since the last failure
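A sketch of this model in its commonly cited form, assuming the hazard during the i-th interval is z(t) = φ [N - (i-1)] t, where t is the time since the last failure. Integrating the hazard gives R_i(t) = exp(-φ [N - (i-1)] t²/2), a Rayleigh law, whose mean is sqrt(π / (2φ [N - (i-1)])). The parameter values below are hypothetical.

```python
import math

def sw_hazard(t, i, n_faults, phi):
    """z(t) = phi * [N - (i - 1)] * t, with t the time since failure i-1."""
    return phi * (n_faults - (i - 1)) * t

def sw_reliability(t, i, n_faults, phi):
    """R_i(t) = exp(-phi * [N - (i - 1)] * t**2 / 2), from integrating z(t)."""
    k = phi * (n_faults - (i - 1))
    return math.exp(-k * t * t / 2.0)

def sw_mttf(i, n_faults, phi):
    """Mean time to the i-th failure for this Rayleigh law: sqrt(pi / (2k))."""
    k = phi * (n_faults - (i - 1))
    return math.sqrt(math.pi / (2.0 * k))

# Hypothetical parameters: 10 initial faults, phi = 0.01
print(round(sw_mttf(1, 10, 0.01), 3), round(sw_mttf(5, 10, 0.01), 3))
```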

Goel and Okumoto Imperfect Debugging Model
The number of faults in the system at time t is treated as a Markov process whose transition probabilities are governed by the probability of imperfect debugging.
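A minimal simulation of the fault-content process. It assumes each failure triggers one repair attempt that removes the fault with probability p (perfect debugging) and otherwise leaves the fault count unchanged. After k failures with k < N, the expected fault content is then N - pk. The parameter names and values are illustrative.

```python
import random

def simulate_fault_content(n_faults, p_perfect, n_failures, rng):
    """Fault count after each failure, as a Markov chain: the fault behind a
    failure is removed with probability p_perfect, otherwise it stays."""
    content = [n_faults]
    faults = n_faults
    for _ in range(n_failures):
        if faults == 0:
            break
        if rng.random() < p_perfect:
            faults -= 1
        content.append(faults)
    return content

rng = random.Random(42)
finals = [simulate_fault_content(50, 0.9, 20, rng)[-1] for _ in range(5000)]
print(sum(finals) / len(finals))  # close to 50 - 0.9 * 20 = 32
```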

Other models ….

Dependency analysis
• Workload/failure dependency:
workload appears to act as a stress factor: the failure rate increases as the workload increases.

• Correlation among failures on different components:
- exists significantly in distributed systems
- for example, disk and network errors are strongly correlated, because the processors in the system heavily use and share the disk and the network concurrently
- generally the error correlation is high (0.62), while the failure correlation is low (0.06)
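A sketch of how such a correlation could be computed. It assumes, as an illustration, that the figures are Pearson correlation coefficients over per-interval event counts of two components. The same function would be applied separately to error counts and to failure counts. The count series below are invented.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-hour error counts on two shared resources
disk_errors = [0, 3, 1, 0, 5, 2, 0, 4]
network_errors = [0, 2, 1, 0, 4, 3, 0, 3]
print(round(pearson(disk_errors, network_errors), 2))
```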

• Common Cause Failure: a failure of two or more structures, systems or components due to a single specific event or cause.
DEFENSE against application sw CCF

• The software development process is robust and of high quality,
• The OS platform and its software development life cycle process are mature,
• Rigorous V&V methodology is used,
• Configuration management after deployment is robust (including control of software versions, setpoint changes, spares),
• Standardized software development tools and function libraries,
• Exclusive use of pre-defined and rigorously qualified function block libraries for application programming,
• Clearly defined rules for use of the software functional blocks (including exception handling),
• Thorough coverage of pre-operational testing,
• Comprehensive exception handling,
• Deterministic program execution,
• Strictly cyclic operation, and
• OS defensive measures

From: B. Enzinna, L. Shi, S. Yang, Software Common-Cause Failure Probability Assessment, NPIC&HMIT 2009
Software Reliability Engineering

Software Reliability Engineering (SRE) is the quantitative study of the operational behavior of software-based systems with respect to user requirements concerning reliability.
A global software reliability analysis method

(In Karama Kanoun, ReSIST Network of Excellence courseware “Software Reliability Engineering”, 2008, http://www.resist-noe.org/)
• Data collection process:
- includes data relative to the product itself (software size, language, workload, ...), the usage environment, verification & validation methods, and failures
- failure reports (FR) and correction reports (CR) are generated

• Data validation process:
the data are processed to eliminate FRs reporting the same failure, FRs proposing a correction related to an already existing FR, FRs signalling a false or non-identified problem, and incomplete FRs or FRs containing inconsistent data (unusable) …

Data extracted from FRs and CRs are:
- times to failure (or between failures)
- number of failures per unit of time
- cumulative number of failures
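A sketch of the validation and extraction steps. The `FailureReport` record and its fields are hypothetical, not a real FR schema. Duplicates, false alarms and incomplete reports are dropped, and the three data sets listed above are derived from the remaining failure times.

```python
from dataclasses import dataclass
from itertools import accumulate
from typing import List, Optional, Tuple

@dataclass
class FailureReport:
    """Hypothetical FR record; field names are illustrative, not a real schema."""
    fr_id: str
    time: Optional[float]               # failure time (e.g. test hours); None if incomplete
    duplicate_of: Optional[str] = None  # set if the FR repeats an already reported failure
    false_alarm: bool = False           # false or non-identified problem

def validate(reports: List[FailureReport]) -> List[FailureReport]:
    """Drop duplicate, false-alarm and incomplete (unusable) FRs."""
    return [r for r in reports
            if r.duplicate_of is None and not r.false_alarm and r.time is not None]

def extract(reports: List[FailureReport], unit: float) -> Tuple[list, list, list]:
    """Times between failures, failures per unit of time, cumulative failures."""
    times = sorted(r.time for r in reports)
    between = [b - a for a, b in zip([0.0] + times, times)]
    per_unit = [0] * (int(times[-1] // unit) + 1)
    for t in times:
        per_unit[int(t // unit)] += 1
    cumulative = list(accumulate(per_unit))
    return between, per_unit, cumulative

reports = [
    FailureReport("FR1", 3.0),
    FailureReport("FR2", 8.0),
    FailureReport("FR3", 8.5, duplicate_of="FR2"),
    FailureReport("FR4", None),
    FailureReport("FR5", 15.0, false_alarm=True),
    FailureReport("FR6", 21.0),
    FailureReport("FR7", 26.0),
]
valid = validate(reports)
between, per_unit, cumulative = extract(valid, unit=10.0)
print(between)     # [3.0, 5.0, 13.0, 5.0]
print(per_unit)    # [2, 0, 2]
print(cumulative)  # [2, 2, 4]
```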
• Descriptive statistics: make syntheses of the observed phenomena
- Analyses: fault typology, fault density of components, failure/fault distribution among software components (new, modified, reused)
- Analyses of relationships: fault density / size / complexity; nature of faults / components; number of components affected by changes made to resolve an FR
- …….

• Trend tests: control the efficiency of test activities
- reliability decrease at the beginning of a new activity: OK
- reliability growth after a reliability decrease: OK
- sudden reliability growth: CAUTION!
- .......
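One widely used trend test is the Laplace test, sketched here for failure times observed over [0, T] (the time-truncated form). The factor u compares the mean failure time with T/2. Negative values suggest reliability growth, since failures concentrate early; positive values suggest reliability decrease. Recomputing u at successive checkpoints shows how the trend evolves. The failure times and checkpoints below are hypothetical.

```python
import math

def laplace_factor(failure_times, horizon):
    """Laplace trend factor for failure times observed over [0, horizon].

    u < 0 suggests reliability growth, u > 0 reliability decrease,
    values near 0 suggest stable reliability.
    """
    n = len(failure_times)
    mean_time = sum(failure_times) / n
    return (mean_time - horizon / 2.0) / (horizon * math.sqrt(1.0 / (12.0 * n)))

def running_laplace(failure_times, checkpoints):
    """Laplace factor recomputed at each checkpoint to follow the trend."""
    out = []
    for T in checkpoints:
        seen = [t for t in failure_times if t <= T]
        if seen:
            out.append((T, laplace_factor(seen, T)))
    return out

# Hypothetical failure times (hours)
times = [2, 5, 7, 11, 14, 20, 31, 45, 70, 98]
print(round(laplace_factor(times, 100.0), 2))  # -2.16
for T, u in running_laplace(times, [25.0, 50.0, 75.0, 100.0]):
    print(T, round(u, 2))
```

Under a constant failure rate, u is approximately standard normal, so |u| > 1.96 is commonly read as a significant trend at the 5% level. A sudden swing of u toward large negative values is the kind of "sudden reliability growth" that calls for caution.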

• Model application: the observed trend must be in accordance with the model assumptions
Software Reliability
• Due to the nature of software, no generally accepted mechanisms exist to predict software reliability.

• Important empirical observations and experience:

• Good engineering methods can largely improve software reliability.

• Software testing serves as a way to measure and improve software reliability.

• Completely testing a software module is unfeasible, so defect-free software products cannot be assured.

• Databases with software failure rates are available, but the numbers should be used with caution and adjusted based on observation and experience.
