Digital IC Testing: An Introduction

Taxonomy and Definitions
The evaluation of the reliability and quality of a digital device is commonly called
testing, yet it comprises distinct phases that are usually kept separate, both in the research
community and in industrial practice.
1. Verification is the initial phase, at design time, in which one ensures that the
implemented functionality matches the requirements specification; that is, a digital device
must be verified for correctness. Verification also checks that all design rules are adhered to. More
generally, this type of functional testing checks that the circuit: (a) implements what it is
supposed to do and (b) does not do what it is not supposed to do. Both conditions are
necessary. This type of evaluation uses a variety of techniques, including logic
verification with the use of hardware description languages, full functional simulation,
and generation of functional test vectors. We do not discuss verification techniques here.
2. Testing, more precisely, refers to the production phase in which one must ensure that
only defect-free devices are shipped, that is, that faults arising from manufacturing are
detected. Testing methods must: (a) be fast enough to be applied to large numbers of
devices during production; (b) take into consideration possible access to large, expensive
external tester machines; and (c) consider whether the implementation of Built-In Self-
Test (BIST) is advantageous. In BIST, the circuit is designed to include its
own extra self-testing circuitry and can thus signal its possible failure status directly,
both during testing and in the field. Of course, this involves a certain amount of area
overhead, and the trade-offs must be considered. The development of appropriate testing
algorithms and their tool support can require a large amount of engineering effort, but
note that this may need to be done only once per design; the speed with which the method
can then be applied to many copies of the device is often of greater importance.
If many defects are present during the manufacturing process, the final yield is
lowered. Estimates can be derived for the relationship between manufacturing yield,
effectiveness of testing (fault coverage) and defect level remaining after test [Williams,
1986]. Let Y denote the yield, which is some value between 1 (100% defect-free
production) and 0 (all circuits faulty after testing), and is approximated by the ratio of
“good devices” over “total devices”. Let FC be the fault coverage, calculated as the
percentage of detected faults over the total number of detectable modeled faults (see
below for fault models). The value of FC ranges from 1 (all possible faults detected) to 0
(no useful testing done). We are interested in estimating the final defect level (DL) after
test, defined as the probability of shipping a defective product. That is, DL measures the
number of bad devices that pass all tests; its value is commonly expressed in defective parts
per million (DPM). It has been shown that tests with high fault coverage for certain fault
models also have high defect coverage. The empirical equation is: DL = 1 − Y^(1−FC).
Plotting this equation gives interesting and practical results. For example, if a value of
DPM = 300 (that is, DL = 0.0003 = 0.03%) and a value of Y = 0.5 are desired, then it
must be that FC = 1 − log(1 − DL)/log Y = 0.999567, that is, a fault coverage of 99.957%. Conversely,
with the same yield Y = 0.5 but FC = 0.9, then DL = 1 − 0.5^(1−0.9) = 0.06697,
which implies that about 6.7% of shipped devices are defective. The main conclusion to
be drawn is that a very high fault coverage must be achieved to obtain any acceptable
defect level value, and manufacturing yield must be continually improved to maintain the
reliability of shipped products.
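As a quick check of this relationship, the short Python sketch below (the helper names are hypothetical) evaluates the empirical equation both ways: the defect level obtained from a given yield and fault coverage, and the fault coverage required for a target defect level. It reproduces the two numerical examples quoted above.

    import math

    def defect_level(yield_, fault_coverage):
        # Empirical relation from the text: DL = 1 - Y**(1 - FC).
        return 1.0 - yield_ ** (1.0 - fault_coverage)

    def required_fault_coverage(yield_, target_dl):
        # Solving for FC: FC = 1 - log(1 - DL) / log(Y).
        return 1.0 - math.log(1.0 - target_dl) / math.log(yield_)

    # Reproduce the numbers quoted in the text.
    print(required_fault_coverage(0.5, 0.0003))   # ~0.999567, i.e. 99.957% coverage needed
    print(defect_level(0.5, 0.9))                 # ~0.06697, i.e. about 6.7% of shipped parts defective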
Figure 1 shows the taxonomy for testing methods. The main division is between on-line
and off-line. In the former, each input/output word from a circuit can be tested during
normal operation, usually implying that the circuit has been augmented to contain some
embedded coding scheme to provide the error detection. In the latter, the circuit must
suspend normal operation and enter a “test mode,” at which time the appropriate method
of testing is applied. Off-line testing can be executed either through external testing, for
example a tester machine external to the circuitry, or through the use of BIST.
Fault Models
At the defect level, an enormous number of different failures could be present and it is
totally infeasible to analyze them individually. Thus, failures are grouped according
to their logical fault effect on the functionality of the circuit, and this leads to the
construction of logical fault models as the basis for testing algorithms and the evaluation
of fault coverage [Abramovici et al., 1992; Jha et al., 2003]. More precisely, a fault
denotes the physical failure mechanism, the fault effect denotes the logical effect of a
fault on a signal-carrying net, and an error is defined as the condition (or state) of a
system containing a fault (deviation from correct state). Faults can be further divided into
classes according to their characteristics, as shown in Figure 2. Here we discuss only
permanent faults, that is, faults in existence long enough to be observed at test time, as
opposed to temporary faults (transient or intermittent), which appear and disappear in
short intervals of time, or delay faults, which affect the operating speed of the circuit.
The fundamental fault model is the stuck-at fault, in which the fault effect is a
line stuck at logic 0 or 1 (stuck-at 0 or stuck-at 1). Testing may consider single
or multiple stuck-at faults and Figure 3 shows an example for a simple circuit. The fault-
free function is shown as Z, while the faulty output functions, under the occurrence of the
single stuck-at faults of either line a stuck-at 0 (a/0) or of line b stuck-at 1 (b/1), are
shown as Z*. A stuck-at fault requires a single appropriate input pattern that stimulates the
fault by controlling the inputs and makes the faulty output observable; the goal of
test pattern generation algorithms is to find such patterns for all detectable faults.
On the other hand, a delay fault models the fault effect as slow-to-rise (from 0
to 1) or slow-to-fall (from 1 to 0): the final value may indeed be correct, but it arrives
outside the expected timing parameters. To detect such faults, one must apply two
patterns as stimuli: the first to set a line at a certain value and the second to change that
value. This, of course, increases the complexity of a fault detection algorithm.
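A minimal sketch of the two-pattern requirement, assuming a hypothetical one-gate circuit (not one of the figures) and a deliberately simplified delay model in which a slow-to-rise line still shows its previous value at sampling time:

    def and_gate(a, b):
        return a & b

    def faulty_and_slow_rise_a(a_prev, a_now, b_now):
        # Simplified slow-to-rise model on line a: if a makes a 0->1 transition,
        # the old value (0) is still seen at sampling time; otherwise a is normal.
        a_seen = a_prev if (a_prev == 0 and a_now == 1) else a_now
        return and_gate(a_seen, b_now)

    p1, p2 = (0, 1), (1, 1)      # pattern pair (a, b): launch a 0->1 on a; b = 1 keeps the path sensitized
    good = and_gate(*p2)                                # fault-free response to the second pattern: 1
    bad = faulty_and_slow_rise_a(p1[0], p2[0], p2[1])   # slow line still 0 at sample time: 0
    print(good, bad, "detected" if good != bad else "missed")

A single pattern cannot detect this fault, since without the first pattern there is no transition to be slowed down.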
Even considering only single stuck-at faults, not all possible faults need to be
explicitly tested, as many have the same fault effect and are indistinguishable. For
example, an input stuck-at-0 fault for an AND gate has the same logical effect as the
output stuck-at-0, so only one of them needs to be tested. Fault collapsing is the process
of reducing the total number of faults to be examined using fault equivalence classes
based on fault effects. Table 1 shows the main fault equivalence classes for gates.
Most algorithmic test pattern generators are based on the concept of sensitized
paths, through controllability and observability. Given a line in a circuit, one wants to
find a sensitized path to stimulate a possible fault and carry its logical effect to an
observable output. For example, to sensitize a path that passes through an AND gate, one
must set all other inputs of the gate to logic 1 to permit the sensitized signal to carry
forward.
An example of fault propagation in the circuit of the corresponding figure shows how the controlling inputs are set. Lines a and d must be set
to 1 so that line g is controlled to 1, thus assigning x1 = x2 = 1, which also forces line f to 0.
This leaves x3 with a choice of either 0 or 1 to maintain h=0. Thus 2 test patterns are
found: (x1 x2 x3) = {(1 1 0) or (1 1 1)}.
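The side-input condition for an AND gate stated above can be checked exhaustively in a few lines; the gate and signal values here are generic, not those of the figure. A fault effect on one input reaches the output only when the other input is held at 1.

    def and_gate(a, b):
        return a & b

    for side in (0, 1):                     # value assigned to the other (side) input
        good = and_gate(0, side)            # fault-free value on the target input is 0
        faulty = and_gate(1, side)          # a stuck-at-1 effect flips the target input to 1
        propagated = good != faulty
        print(f"side input = {side}: fault effect {'propagates' if propagated else 'is blocked'}")
    # Only side input = 1 lets the difference reach the output.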
The best-known algorithms are the D-algorithm (precursor to all), PODEM, and
FAN, plus many variations with heuristics and optimizations, often tailored to certain
classes of devices [Abramovici et al., 1992; Jha et al., 2003]. Major steps can be identified in
most automatic test pattern generation (ATPG) programs: (1) setting the signal on the line
on which a fault is to be detected to the value opposite the stuck-at value (fault excitation); (2) sensitizing the path from that line to a primary
output such that the fault can be observed; (3) determining the primary input conditions
necessary to set the testing signal (back propagation); and (4) repeating this procedure
until all detectable faults in a given fault set have been covered. Powerful heuristics are
used to speed the steps by aiding in the sequential selection of faults to be examined and
by cutting the amount of back and forward propagation necessary. The overall process is
summarized graphically in Figure 6.
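These steps can be illustrated, in a very reduced form, by exhaustive search on a toy circuit. The sketch below assumes a hypothetical circuit z = (x1 AND x2) OR NOT x3 (not one of the figures) and simply tries every input pattern, keeping the first one for which the faulty and fault-free outputs differ; real ATPG tools such as PODEM replace the exhaustive loop with guided backtracking over the circuit structure.

    from itertools import product

    def circuit(x1, x2, x3, fault=None):
        # fault = (net_name, stuck_value); nets: "g" = x1 AND x2, "z" = primary output.
        g = x1 & x2
        if fault and fault[0] == "g": g = fault[1]
        z = g | (1 - x3)                 # OR with NOT x3
        if fault and fault[0] == "z": z = fault[1]
        return z

    def generate_test(fault):
        # Steps (1)-(3) collapsed into exhaustive search: excite the fault and
        # check that its effect is observable at the primary output.
        for pattern in product((0, 1), repeat=3):
            if circuit(*pattern) != circuit(*pattern, fault=fault):
                return pattern
        return None                      # no pattern exists: the fault is undetectable (redundant)

    print(generate_test(("g", 0)))       # (1, 1, 1): x1 = x2 = 1 excites g, x3 = 1 sensitizes the OR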
Random Testing and Signature Analysis

In pseudo-random testing with output data compaction, the circuit under test is driven by a
pseudo-random pattern generator, its output responses are compacted into a short signature,
and the signature is compared against a known good value. The main advantages of this method are that (1) the testing can be
done at circuit speed by appropriate choice of the pseudo-random generator; (2) there is
no need to generate algorithmic test patterns; and (3) the testing circuitry involves a very
small area, especially if the circuit has been designed using scan techniques (see next
Chapter).
The issues revolve around designing very efficient input generators and
compactors. The main disadvantage of this method is the possibility of aliasing. When
the short signature is formed, a loss of information occurs, and it can be the case that a
faulty circuit produces the same signature as a fault-free circuit. The design method for
data compaction aims at minimizing the probability of aliasing. Using the compactors
described below, the probability of aliasing has been theoretically proven to be 2^(−k), where
k is the length of the compactor and thus the length of the signature. It is important to
note that the result is asymptotically independent of the size and complexity of the circuit
under test. For example, for k = 16, the probability of aliasing is about 1.5 × 10^(−5). The
empirical results show that in practice this method is even more effective. Most of all,
this is the chosen methodology when BIST is required for its effectiveness, speed, and
small area overhead.
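The order of magnitude follows directly from the 2^(−k) expression; a quick check for a few illustrative signature lengths:

    for k in (16, 20, 32):
        print(f"k = {k:2d}: aliasing probability ~ {2.0 ** -k:.2e}")
    # k = 16: ~1.53e-05, k = 20: ~9.54e-07, k = 32: ~2.33e-10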
A secondary issue in data compaction is in the determination of the expected
“good” signature. The best way is to use fault-free simulation for both the circuit and the
compactor, and then the appropriate comparator can be built as part of the testing
circuitry [Bardell et al., 1987; Abramovici et al., 1992; Jha et al., 2003].
The most important issues are in the choices of a pseudo-random generator and a
compactor. Although no “perfect” compactor can be found, several have been shown to
be very effective. Several compaction techniques have been researched: counting
techniques, as in one’s count, syndrome testing, transition count, and Walsh spectra
coefficients; and techniques based on linear feedback shift registers (LFSR) or linear
cellular automata registers (LCA). Only these latter ones are discussed here. LFSRs and
LCA are also the preferred implementation for the input pattern generators.
The state of the ALFSR (autonomous LFSR, an LFSR running with no external input) is denoted by the binary state of its cells. In Figure 8(a),
the next state of each cell is determined by the implementation given by its polynomial
and can be summarized as follows: s0+ = s2, s1+ = s0 ⊕ s2, and s2+ = s1, where
the si+ denote the next state of cell si at each clock cycle. If the ALFSR is initialized in a
nonzero state, it cycles through a sequence of states and eventually comes back to the
initial state, following the functionality of the next-state rules implemented by its
polynomial description. If the polynomial chosen to describe the ALFSR is primitive, the
ALFSR of length k cycles through all possible 2^k − 1 nonzero states in a single cycle (see
the theory of Galois fields for the definition of primitive), and such polynomials can be
found from tables [Bardell et al., 1987]. By connecting the output of each cell to each
input of a circuit under test, the ALFSR provides an ideal input generator, as it is
inexpensive in its implementation and it provides the stimuli in pseudo-random order.
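The next-state rules quoted above for the three-cell ALFSR of Figure 8(a) can be exercised directly. The short sketch below iterates them from an arbitrary nonzero seed and confirms that all 2^3 − 1 = 7 nonzero states are visited before the register returns to its initial state, as expected for a primitive polynomial.

    def alfsr_step(s):
        # Next-state rules from the text: s0+ = s2, s1+ = s0 XOR s2, s2+ = s1.
        s0, s1, s2 = s
        return (s2, s0 ^ s2, s1)

    state = (1, 0, 0)                 # any nonzero seed
    seen = []
    while state not in seen:
        seen.append(state)
        state = alfsr_step(state)

    print(len(seen), "distinct states:", seen)   # 7 states, i.e. 2**3 - 1, then back to (1, 0, 0)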
For LCA, analogous synthesis tables of cell rules are now available [Cattell and Muzio, 1996]. The main disadvantage of LCA is the area overhead incurred by
the extra EXOR gates necessary for the implementation of the cell rules.
In the process of computing a signature for testing a circuit, the input stream to
the LFSR used as a compactor is the output stream from the circuit under test. At the end
of the testing cycles, only the last state of the LFSR is examined and considered to be the
compacted signature of the circuit. In most real cases, circuits have many outputs, and the
LFSR is converted into a multiple-input shift register (MISR). A MISR is constructed by
adding EXOR gates to the input of some or all the flip-flop cells; the outputs of the
circuit are then fed through these gates into the compactor. The probability of aliasing for
a MISR is the same as that of an LFSR; however, some errors are missed due to
cancellation. This is the case when an error in one output at time t is canceled by the
EXOR operation with the error in another output at time t + 1. Assuming equal
probabilities for the different errors occurring, the probability of error cancellation has
been shown to be 2^(1−m−N), where m is the number of outputs compacted and N is the length
of the output streams.
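A hedged sketch of signature compaction with a MISR, assuming, purely for illustration, the same three-cell feedback structure as the generator above with one circuit output XORed into each cell. On every clock, the register advances by its next-state rules and absorbs one slice of the output streams; the final state is the signature.

    def misr_step(s, outputs):
        # Advance the three-cell register (rules as above) and XOR one output bit into each cell.
        s0, s1, s2 = s
        nxt = (s2, s0 ^ s2, s1)
        return tuple(c ^ o for c, o in zip(nxt, outputs))

    def signature(output_streams, seed=(0, 0, 0)):
        # output_streams: list of per-cycle tuples (o0, o1, o2) from the circuit under test.
        state = seed
        for slice_ in output_streams:
            state = misr_step(state, slice_)
        return state                 # the compacted signature, compared against the fault-free one

    good = signature([(1, 0, 1), (0, 1, 1), (1, 1, 0)])
    bad  = signature([(1, 0, 1), (0, 0, 1), (1, 1, 0)])   # one flipped bit in the second cycle
    print(good, bad, "fault detected" if good != bad else "aliased")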
Given that the normal length of signatures used varies between k = 16 and k = 32,
the probability of aliasing is minimal and considered to be acceptable in practice. In
MISR, the length of the compactor also depends on the number of outputs tested. If the
number of outputs is greater than the length of the MISR, algorithms or heuristics exist
for combining outputs with EXOR trees before feeding them to the compactor. If the
number of outputs is much smaller, various choices can be evaluated. The amount of
aliasing that actually occurs in a particular circuit can be computed by full fault
simulation, that is, by injecting each possible fault into a simulated circuit and computing
the resulting signature. Changes in aliasing can be achieved by changing the polynomial
used to define the compactor. It has been shown that primitive polynomials, essential for
the generation of exhaustive input generators (see above), also possess better aliasing
characteristics. It has also been shown that all implementations of such linear finite-state
compactors, be they LFSR or LCA, possess the same aliasing properties.
Summary
Accessibility to internal dense circuitry is becoming a greater problem, and thus it is
essential that a designer consider how a device will be tested and incorporate structures in
the design to assist that testing. Formal design-for-testability (DFT) techniques are concerned with providing
access points for testing. As test pattern generation grows even more prohibitively
expensive, probabilistic solutions based on data compaction and using fault simulation
are becoming more widespread, especially if they are supported by DFT techniques and
can avoid the major expense of dedicated external testers. However, any technique
chosen must be incorporated within the framework of a powerful CAD system providing
semiautomatic analysis and feedback.
Defining Terms
Aliasing: It occurs if the faulty output produces the same signature as a fault-free output.
Built-in self-test (BIST): The inclusion of on-chip circuitry to provide testing.
Fault coverage: The percentage of detected faults over all possible detectable faults.
Fault simulation: An empirical method used to determine how faults affect the operation
of a circuit and/or how much testing is required to obtain a desired fault coverage.
LFSR: A shift register formed by flip-flops and EXOR gates, chained together, with a
synchronous clock, used either as input pattern generator or as signature analyzer.
MISR: Multiple-input LFSR.
Off-line testing: A testing process carried out while the tested circuit is not in use.
On-line testing: Concurrent testing to detect errors while circuit is in operation.
Pseudo-random pattern generator: It generates a binary sequence of patterns where the
patterns appear to be random in the local sense, but they are deterministically repeatable.
Random testing: The process of testing using a set of pseudo-randomly generated
patterns.
Signature analysis/data compaction: A test where the output responses of a device over
time are compacted into a characteristic value called a signature, which is then compared
to a known good one.
Stuck-at fault: A fault model represented by a signal stuck at a fixed logic value (0 or 1).
Test pattern (test vector): An input vector such that the faulty output is different from
the fault-free output (the fault is stimulated and detected).
References
1. M. Abramovici, M.A. Breuer and A.D. Friedman, Digital Systems Testing and
Testable Design, Rockville, Md.: IEEE Press, 1992.
2. P.H. Bardell, W.H. McAnney, and J. Savir, Built-In Test for VLSI:
Pseudorandom Techniques, New York: John Wiley and Sons, 1987.
3. K. Cattell and J.C. Muzio, “Synthesis of one-dimensional linear hybrid cellular
automata,” IEEE Trans. Computer Aided Design, vol. 15, no. 3, pp. 325–335,
1996.
4. N. Jha and S. Gupta, Testing of Digital Systems, Cambridge University Press,
2003.
5. T.W. Williams (Ed.), VLSI Testing, Amsterdam: North-Holland, 1986.
Further Information
The author refers the reader to the books by Abramovici et al. [1992] and Jha et al.
[2003] which, at the present time, give the most comprehensive view of testing methods
and design for testability. More information on deterministic pattern generation can also
be found in Fault Tolerant Computing, edited by D.K. Pradhan. For newer approaches to
random testing, the book by Bardell et al. [1987] contains basic information. The latest
state-of-the-art research is to be found mainly in proceedings of the IEEE International
Test Conference.