On Distributed
Fault Simulation
Tassos Markas, Mark Royals, and Nick Kanopoulos
Research Triangle Institute
Verifying the functionality of digital circuits in the testbed relies heavily upon test generation.
Because of the complexity of today’s digital circuits, efficient general-purpose
automatic test pattern generators (ATPGs)
are not yet available. To overcome the
inefficiency of current ATPG programs,
one alternative method relies on the
development of functional or pseudorandom test sets. These test sets are subsequently evaluated to determine if the obtained fault coverage (percentage of detected faults) meets the requirements
specified by the user.
The evaluation of the given test set is carried out by the fault simulator, a program that examines the behavior of digital circuits under various faulty conditions.
During the fault simulation process, a
given test set is applied to the fault-free
circuit and to each of the faulty circuits
obtained by introducing certain classes of
faults in the circuit nodes. The circuit response is consequently analyzed to determine the faults detected by the test set. A
fault is considered detected by a test pattern if the response of the fault-free circuit
differs from the response of the circuit in
the presence of this particular fault. The
simulation time can be reduced by identifying equivalent fault classes and simulating only one fault from each class. This process is called fault collapsing.

Fault simulators carry out fault grading and development of fault dictionaries, which are requirements for highly reliable circuits. In today's highly integrated digital circuits, the fairly large number of equivalent fault classes in a given circuit makes fault simulation an expensive and time-consuming process. However, fault simulation has an inherent parallelism that can be used to speed up the execution time when more than one processor is available. This can be achieved by dividing the fault simulation task into independent subtasks assignable for execution to various computing resources. This large-grain type of parallelism can be exploited by using a distributed computing environment that consists of several powerful nodes connected via a local area network (LAN).

In this article we examine the computational aspects of fault simulation and address issues related to the efficient partitioning of a fault simulation task into a number of subtasks assignable for execution to the nodes of a distributed system. In addition, we describe the implementation of a distributed fault simulation facility (called DFSim) using a heterogeneous LAN consisting of a number of workstations with different computing resources and different versions of Unix operating systems.

Efficient partitioning of tasks and allocation of the resulting subtasks over a distributed system yields faster fault simulation without resorting to expensive special-purpose hardware.

0018-9162/90/0100-0040$01.00 © 1990 IEEE
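The detection criterion above (a fault is detected when the faulty response differs from the fault-free response) can be sketched in a few lines. The single AND gate, the output stuck-at fault model, and all names below are illustrative assumptions, not code from the article's simulator:

```python
# Sketch of the basic fault-detection criterion: a fault is detected by a
# test pattern when the faulty circuit's response differs from the
# fault-free response.

def and_gate(a, b, stuck=None):
    """Two-input AND gate; `stuck` forces the output to 0 or 1."""
    return stuck if stuck is not None else (a & b)

def detected_faults(test_set, faults):
    """Return the subset of output stuck-at faults the test set detects."""
    detected = set()
    for a, b in test_set:
        good = and_gate(a, b)                  # fault-free response
        for fault in faults:                   # fault = stuck-at value
            if and_gate(a, b, stuck=fault) != good:
                detected.add(fault)
    return detected

# The pattern (1, 1) exposes stuck-at-0 on the output; any pattern with
# a 0 input exposes stuck-at-1.
print(sorted(detected_faults([(1, 1)], {0, 1})))          # [0]
print(sorted(detected_faults([(0, 1), (1, 1)], {0, 1})))  # [0, 1]
```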
Fault simulation
Fault simulation is used primarily for
the generation of fault dictionaries and the
evaluation of a digital circuit's fault coverage by a given test set. Fault simulation is
also employed to analyze the operation of
a circuit under various fault conditions.
COMPUTER
The performance criteria for various existing fault simulation algorithms are the
processing power and the memory requirements.
Although processing power is the main
consideration in efficiently simulating
large circuits, maintaining a minimum
hardware configuration will keep the cost
of the machine low.
The major fault simulation algorithms
differ in the way they manage the trade-off
between processing speed and memory.
The three most commonly used fault simulation techniques are parallel, concurrent,
and parallel value list (PVL) fault simulation.
In fault simulators, faults are represented as fault structures associated with
each node of a digital circuit. These fault
structures propagate from the primary
inputs towards the primary outputs based
on the type of the logic gate they are associated with and the fault-free value of the
examined gate. During this procedure, the
faults that reach the primary outputs are
considered detected, and they are dropped
from the undetected fault list of the circuit.
The parallel fault simulation technique
is based on the parallel processing of bitoriented operations. The driving force of
this approach arises from the capability of
computer systems to perform logical instructions on a number of bits in parallel.
This approach has the advantages of efficient memory use, because a number of
events can be packed in the same word, and
parallel computation, because computer
instructions can operate on a number of
bits simultaneously. Figure 1 shows the
fault structure in a 32-bit machine, with the
least significant bit representing the fault-free value of the node.
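The bit-packing idea can be illustrated with a short sketch (a toy example of our own, not the article's code): one machine word evaluates an AND gate simultaneously for the fault-free machine (bit 0) and for up to 31 faulty machines, one fault per bit.

```python
# Bit-parallel fault simulation sketch: bit 0 of every word carries the
# fault-free value and bits 1..31 each carry one faulty machine, so a
# single bitwise AND evaluates the gate under all injected faults at
# once. The one-gate circuit and the names are illustrative only.

WIDTH = 32

def replicate(bit):
    """Spread one logic value across all simulated machines."""
    return (1 << WIDTH) - 1 if bit else 0

def inject(word, stuck_at_1, stuck_at_0):
    """Force faulty bit positions to their stuck values."""
    return (word | stuck_at_1) & ~stuck_at_0

def simulate_and(a, b, faults):
    """faults: list of (line, stuck_value); machine i+1 hosts fault i."""
    wa, wb = replicate(a), replicate(b)
    for i, (line, value) in enumerate(faults):
        mask = 1 << (i + 1)                 # bit 0 stays fault-free
        s1 = mask if value == 1 else 0
        s0 = mask if value == 0 else 0
        if line == "a":
            wa = inject(wa, s1, s0)
        else:
            wb = inject(wb, s1, s0)
    out = wa & wb                           # all machines in one operation
    good = out & 1                          # fault-free output (bit 0)
    diff = out ^ replicate(good)            # machines that disagree
    return [f for i, f in enumerate(faults) if diff >> (i + 1) & 1]

faults = [("a", 0), ("a", 1), ("b", 0)]
print(simulate_and(1, 1, faults))   # a/0 and b/0 flip the AND output
```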
[Figure 1. Fault structure of the parallel algorithm: one 32-bit word per node, with each bit holding the node's value (logic 0, 1, Z, or X) in the fault-free machine or in one faulty machine.]

In concurrent fault simulation, the fault structure associated with each node contains the faults that produce an error at the inputs and/or at the output of the related gate. These faults are arranged in a linked structure as shown in Figure 2. The propagation of the fault structures is accomplished by simulating only the portion of the circuit whose faulty response differs from the fault-free response. The computational complexity of the concurrent fault simulation algorithm is O[n²], where n is the number of gates of the circuit.
The concurrent fault simulation approach is faster than the parallel approach, but it requires considerable amounts of memory. Also, concurrent fault simulators require dynamic memory allocation capabilities, since the amount of memory cannot be determined at circuit compilation time.
Finally, the parallel value list (PVL) algorithm combines the above approaches. A typical PVL structure (see Figure 3) consists of parallel lists connected in a linked structure. During this algorithm a fault structure propagates towards the primary outputs if any of the faults in the list produce a response different from the fault-free response of the circuit. This approach has the advantage of simulating in parallel the faults packed in one memory word while maintaining the computing efficiency of the concurrent approach.

[Figure 2. Fault structure of the concurrent algorithm: the node's fault-free value heads a linked list of entries (Fault 1 through Fault N), each holding a fault and its faulty value.]

[Figure 3. A typical PVL structure: groups of packed fault values (Group 1 through Group N) connected in a linked structure.]
Motivation
Concurrent fault simulation of today’s
very large scale integrated circuits requires
considerable computing resources. Consequently, a significant interest exists in
reducing the excessive computing time of
the fault simulation process without resorting to special-purpose hardware6 (such as
IBM’s Yorktown Simulation Engine,
NEC’s Hardware Accelerator, Zycad,
IKOS, or Daisy), which is often expensive
and sometimes incompatible with the use
of different fault simulation algorithms
and/or different systems.
One way of accomplishing this speedup
is to exploit the inherent parallelism of the
fault simulation process by developing a
methodology that divides a fault simulation task into subtasks and then assigns
them for independent processing to a
number of processors connected via a local
area network. The increasing use of LANs,
which include a significant number of
powerful workstations, constitutes an
appealing environment for distributed
processing applications. Distributed processing also provides better resource utilization, especially when the individual
nodes of such networks remain relatively
idle for long periods of time.
[Figure 4. Fault simulation procedure: the circuit description and the test patterns are input to the fault simulator (FSim), which transforms the initial fault list into the undetected fault list (UFL) and the output fault list.]
Computational requirements of fault simulation
Before introducing our distributed fault
simulation approach, we will review the
basic concepts involved in the operation of
fault simulation.
The simulator initially parses the circuit
and performs fault enumeration to produce
the undetected fault list, which contains all
the faults associated with the input and
output lines of all the gates in the circuit.
After this, the simulator reduces the number of faults to be simulated by collapsing
them into a set of equivalent fault classes.
The test patterns are simulated one at a
time, and the faults detected at each step
are dropped from the undetected fault list
until the entire test set or the undetected
fault list has been exhausted (see Figure 4).
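The loop just described (simulate one pattern, drop the faults it detects, stop when either the test set or the undetected fault list is exhausted) can be sketched as follows; `simulate` is a hypothetical stand-in for a real simulator's single-pattern evaluation:

```python
# Sketch of the pattern-at-a-time simulation loop with fault dropping:
# the undetected fault list (UFL) shrinks as patterns detect faults.

def run(patterns, initial_faults, simulate):
    ufl = set(initial_faults)
    for pattern in patterns:
        if not ufl:                    # UFL exhausted before the test set
            break
        ufl -= simulate(pattern, ufl)  # drop faults this pattern detects
    coverage = 1 - len(ufl) / len(initial_faults)
    return ufl, coverage

# Toy detector standing in for the simulator: pattern p detects fault f
# when f divides p.
detect = lambda p, ufl: {f for f in ufl if p % f == 0}
ufl, cov = run([2, 3], [2, 3, 4, 5], detect)
print(sorted(ufl), cov)    # [4, 5] 0.5
```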
[Figure 5. Fault simulation on a single server (a) for a sequential circuit and (b) for a combinational circuit. The graphs plot simulation time in seconds (solid lines) and page faults (dotted lines) against the number of faults per pass.]
At the end of the simulation, the user receives the obtained fault coverage, along
with the undetected fault list and other
statistical information on the detected
faults.
The majority of commercially available
simulators allow the user to define an
initial undetected fault list as well as the
number of faults to consider during each
simulation pass. This latter parameter
enables the simulator to divide the entire
fault list into subsets and simulate these
subsets one at a time. This capability assists memory management in cases where
the system has insufficient physical memory to hold the fault simulation data for
the entire circuit.
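The faults-per-pass mechanism amounts to chunking the fault list; a minimal sketch (our own illustration, with the 5,192-fault systolic array fault list used as the example size):

```python
# Sketch of the faults-per-pass mechanism: the fault list is split into
# fixed-size subsets simulated one pass at a time, so that the per-pass
# fault data can fit in physical memory.

def passes(fault_list, faults_per_pass):
    for i in range(0, len(fault_list), faults_per_pass):
        yield fault_list[i:i + faults_per_pass]

sizes = [len(p) for p in passes(list(range(5192)), 1600)]
print(sizes)    # [1600, 1600, 1600, 392]
```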
To demonstrate the behavior of the fault
simulation under this condition, the graphs
in Figure 5 show the simulation time as a
function of the number of faults per pass
(solid line). We used two different types of
circuits for this experiment, simulating
both in a machine with five megabytes of
main memory.
The first circuit, a highly sequential one, was a systolic array chip consisting
of eight identical cells placed in a linear
pipe. The circuit held approximately 6,000
gates. We injected a set of 5,192 stuck-at
faults, which collapsed to 3,574 equivalent
fault classes. Finally, we simulated the
circuit with 390 test vectors and obtained a fault coverage of 93 percent.

The second circuit was a 16x16 array multiplier (Booth encoding) with approximately 3,000 gates and 4,035 equivalent fault classes. We simulated this circuit using 52 pseudorandom test patterns and obtained a fault coverage of 99 percent.

Figure 5 shows a significant increase in the simulation time after a certain number of faults per pass has been exceeded. This occurs when the fault simulation data surpasses the storage capacity of the main memory, in which case the system spends a significant amount of time swapping pages between the main memory and the disk (page faulting). The dotted lines in Figures 5a and 5b give the number of page faults as a function of the number of faults per pass. In general, this behavior is more severe in sequential circuits because of the longer fault lists associated with each node. You can see this in Figure 5a, where an increase from 1,600 to 2,000 in the number of faults per pass results in a drop from 97 percent to 59 percent in CPU utilization.

Distributed fault simulation

A fault simulation task can be divided into a number of computationally independent subtasks by partitioning either the fault list or the test set. For fault list partitioning, this inherent parallelism stems from the lack of data dependencies among the different subtasks. During test set partitioning, some data and control flow (in the case of sequential circuits) dependencies make this technique more difficult to implement. In both cases the interprocess communication among the various subtasks is negligible, which is very appealing for developing a distributed facility capable of performing fault simulation.

Computing environments consisting of a number of powerful workstations (servers) connected via a LAN can efficiently execute such distributed tasks. In the distributed environment shown in Figure 6, the client machine divides a simulation task into subtasks by partitioning either the fault list, in which case each server simulates a subset of the fault list using the initial test set, or the test set, in which case each server simulates the entire fault list with a subset of the test set. After completion of the partitioning task, the subtasks are assigned to the servers, which carry out the fault simulation processes and report the simulation results to the client machine.

[Figure 6. Distributed fault simulation: the client assigns an undetected fault list (UFL 1 through UFL N) to a fault simulator (FSim) running on each of Server 1 through Server N.]
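The client's role in this scheme can be sketched as follows; the remote-execution machinery is elided, and `simulate_subtask` is a hypothetical stand-in for a server-side fault simulation run:

```python
# Sketch of the client side of Figure 6 under fault list partitioning:
# partition the fault list, give each server a slice plus the full test
# set, and merge the reported results. Names are illustrative only.

def distribute(fault_list, test_set, n_servers, simulate_subtask):
    # Divide the fault list into n_servers near-equal slices.
    k, r = divmod(len(fault_list), n_servers)
    subtasks, start = [], 0
    for i in range(n_servers):
        size = k + (1 if i < r else 0)
        subtasks.append(fault_list[start:start + size])
        start += size
    detected = set()
    for faults in subtasks:                  # conceptually run in parallel
        detected |= simulate_subtask(faults, test_set)
    return detected

# Toy "server": detects every even-numbered fault in its partition.
sim = lambda faults, tests: {f for f in faults if f % 2 == 0}
print(sorted(distribute(list(range(10)), ["t1"], 3, sim)))  # [0, 2, 4, 6, 8]
```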
The partitioning problem
One of the main considerations related
to distributed fault simulation applications
is the identification of the best partitioning
method (fault list or test set), as well as the
identification of the optimum partitioning
of a fault simulation task that will result in
minimum turnaround time. During concurrent simulation, the storage requirements depend on a number of factors, such
as circuit activity and circuit sequentiality.
Highly sequential circuits require more
storage due to longer test sequences and
longer fault lists associated with circuit
elements.’
The majority of commercially available
fault simulators have the ability to partition the fault list and simulate each partition in a different simulation pass. For this
reason it is crucial to identify the maximum number of faults that can be considered in each pass without degrading the
performance of the system because of
memory limitations. This parameter depends on the circuit size and available
physical memory of the system. Although
some vendors provide information concerning the estimation of this parameter,
there is no automated method of computing it.
Partitioning of the test set creates a
number of problems, especially during the
simulation of sequential circuits, where
the order in which the test patterns are
applied, as well as the state of the circuit, is
crucial. This makes partitioning of the test
set a rather difficult task, since each
subtask requires that the circuit be at a
certain known state during initialization.
Another disadvantage of this approach
involves the unnecessary simulation of the
same faults by different partitions. This
occurs because some faults are carried
through the entire simulation of a subtask
even if detected at some earlier point by
another fault simulation subtask. The lack
of interprocess communication among the
fault simulation subtasks prevents the
removal of the already detected faults from
the distributed fault lists.
The partitioning of the fault list is more
advantageous mainly because it reduces
the memory requirements of the fault simulation task by simulating only a subset of
the original fault list. It also results in a
more efficient implementation, since no
data dependencies exist among the various
subtasks. However, there is no generic
approach to determine the optimum fault
list partitioning for fault simulation purposes. The question that arises at this point
is how to determine the optimum fault list
partition so as to minimize the simulation
time. The performance of the distributed
fault simulation facility is determined by
the turnaround time, which represents the
elapsed time from initialization until the
completion of all fault simulation subtasks
assigned in the network servers. Minimum
turnaround simulation time can be
achieved by minimizing the cumulative
simulation time while maintaining load
balancing.
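The two measures just defined, turnaround time and (as in Table 1) utilization, are simple to compute; a sketch with illustrative subtask times:

```python
# Turnaround time is the finishing time of the slowest subtask, and
# utilization (as defined in Table 1) is the ratio of the average to
# the maximum subtask time. The times below are illustrative.

def turnaround(subtask_times):
    return max(subtask_times)

def utilization(subtask_times):
    return sum(subtask_times) / len(subtask_times) / max(subtask_times)

times = [1148, 1157, 1456, 1411]         # seconds, four servers
print(turnaround(times))                 # 1456
print(round(utilization(times), 3))      # 0.888
```

A perfectly balanced partitioning would drive utilization to 1.0, making the turnaround time equal the average subtask time.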
As mentioned earlier, the initial undetected fault list is generated by simulating
the circuit with an empty test file. The
particular simulator used in DFSim enumerates the faults so that faults located
close to the primary outputs are placed at
the top of the fault list and faults located
close to the primary inputs are placed at the
bottom of the list. In addition, faults that
belong to the same logic block are enumerated in the same portion of the fault list.
DFSim incorporates several partitioning techniques. In the first method, a
greedy-type partitioning technique, the
fault list is divided into a number of partitions equal to the number of the available
servers. The second algorithm (Rand) selects faults randomly from the initial fault
list and places them in different partitions.
The first technique preserves the hierarchy
of the circuit (faults that belong in the same
logic block will likely be placed in the
same partition), while the second technique does not have this property.
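The two techniques can be sketched side by side (illustrative code, not DFSim's): the greedy split keeps contiguous runs of the enumerated fault list together, preserving circuit hierarchy, while Rand scatters faults across partitions.

```python
# Sketch of the two fault list partitioning techniques: a greedy split
# into contiguous near-equal slices, and a random (Rand) split that
# shuffles the fault list first.

import random

def greedy_partition(fault_list, n):
    k, r = divmod(len(fault_list), n)
    parts, start = [], 0
    for i in range(n):
        size = k + (1 if i < r else 0)
        parts.append(fault_list[start:start + size])
        start += size
    return parts

def rand_partition(fault_list, n, seed=0):
    shuffled = fault_list[:]
    random.Random(seed).shuffle(shuffled)   # destroys hierarchy locality
    return greedy_partition(shuffled, n)

faults = list(range(8))
print(greedy_partition(faults, 4))   # [[0, 1], [2, 3], [4, 5], [6, 7]]
```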
Using the PVL fault simulation algorithm, we evaluated the partitioning techniques implemented in DFSim. We obtained the results reported in Table 1 by
simulating the Booth multiplier and the
systolic array circuit using a homogeneous
network with four servers (VAX 2000
workstations with six megabytes of memory). This table shows that the random
algorithm was on average slower than the
greedy algorithm. The explanation of this
behavior lies in the fact that during concurrent and PVL fault simulation, only the
active portions of the circuit are simulated.
In addition, the PVL algorithm simulates
the fault structures that include at least one
fault that differs from the fault-free response. Based on this observation, we can
achieve a significant speedup if all faults
that differ from the fault-free response can
be packed in the same fault structure. In
this case the number of active fault structures is minimized, thus reducing the fault
simulation time since fewer fault lists have
to be propagated through the nodes of the
circuit. For this reason the greedy-type
technique achieves, on average, faster
simulation time, since faults from the
same logic block are placed in the same
partition.
Although the greedy algorithm experienced better performance on average, the
turnaround time (determined by the most
time-consuming subtask) was better in the
random case. This resulted primarily because the greedy partitioning approach
disturbs the load balancing of the network
by grouping faults located close to the
primary outputs in one or more partitions
that differ from the partitions containing
faults located close to the primary inputs.
The imbalance occurs because the fault
structures that reside close to the primary
outputs take less time to propagate to the
outputs that constitute detection points.
The third algorithm evaluated was a
clustering-type approach (Clust), which
tries to compensate for the disadvantages
of the greedy and random algorithms.
During this technique, the circuit is divided into clusters and a portion of each
cluster is included in each partition. The
clusters are large enough to preserve the
hierarchy of the circuit, but small enough
to avoid causing any heavy imbalances in
the workload of the distributed system.
During this technique, the initial fault
list is divided into N equal fault lists, and
each one is further divided into L clusters,
where L represents the number of available
servers. The final partitioning is performed
by including one cluster from each of the N
fault lists in the same partition, thus
generating L mutually exclusive partitions
that will be assigned for execution in the
nodes of the network.
The Clust approach takes advantage of
the capabilities of the PVL algorithm, since
faults related with the same logic block are
likely to be packed in the same fault structure. The selection of an appropriate N
plays a significant role in the performance
of this technique. If N is very large
(total faults/N ≈ L), then this method emulates the random algorithm. On the other hand, if N = 1, then the Clust algorithm emulates the greedy approach.
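The Clust procedure above can be sketched directly (our own reading of the description; all names are invented): split the fault list into N equal sublists, split each sublist into L clusters, and build partition j from the j-th cluster of every sublist.

```python
# Sketch of the Clust partitioning technique: N sublists, each cut into
# L clusters (L = number of servers); partition j collects the j-th
# cluster of every sublist, yielding L mutually exclusive partitions.

def clust_partition(fault_list, n, l):
    def split(lst, parts):
        k, r = divmod(len(lst), parts)
        out, start = [], 0
        for i in range(parts):
            size = k + (1 if i < r else 0)
            out.append(lst[start:start + size])
            start += size
        return out
    sublists = split(fault_list, n)
    partitions = [[] for _ in range(l)]
    for sub in sublists:
        for j, cluster in enumerate(split(sub, l)):
            partitions[j].extend(cluster)
    return partitions

# N=2 sublists, L=2 servers: partition 0 gets the first cluster of each
# sublist, partition 1 the second.
print(clust_partition(list(range(8)), n=2, l=2))
# [[0, 1, 4, 5], [2, 3, 6, 7]]
```

With n=1 this degenerates to the greedy contiguous split, and as n grows toward the fault count it approaches the random behavior, matching the emulation property described above.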
In general, partitions with faults from
one logic block improve the average simulation time. However, this may cause
heavy imbalances in the system, since different logic blocks have different topologies and may be tested by different test
sets, in which case the test ordering will
significantly affect the fault simulation
time of the examined partition. This shows
in Table 1, where utilization of the network
Table 1. Performance evaluation of the fault partitioning techniques.

                       Multiplier (4,035 faults)       Systolic array (5,192 faults)
Partitioning
Technique     CLsize    Tave     Tmax     Util      CLsize    Tave     Tmax     Util
Greedy           -     1,148    1,448    79.3%         -     1,722    2,253    76.4%
Clust         1,009    1,157    1,456    79.4%       1,298   1,737    2,333    74.4%
Clust           504    1,162    1,411    82.3%         649   2,340    2,507    93.3%
Clust           252    1,199    1,339    89.5%         324   2,156    2,378    90.6%
Clust           201    1,187    1,362    87.1%         259   2,123    2,210    96.7%
Clust           101    1,177    1,219    96.5%         130   1,962    2,123    92.4%
Clust            50    1,192    1,242    96.0%          65   2,055    2,276    90.3%
Clust            20    1,222    1,285    95.1%          26   2,010    2,116    95.0%
Clust            10    1,249    1,328    94.0%          13   2,235    2,287    97.7%
Clust             5    1,279    1,330    96.1%           7   2,248    2,425    92.7%
Rand             -     1,312    1,378    95.2%          -    2,377    2,382    99.7%

CLsize: number of faults per cluster
Tave: average simulation time
Tmax: simulation time of the most time-consuming subtask
Util: average utilization (Tave/Tmax)
decreases when partitioning of the circuit
employs large logic blocks.
Load balancing can be achieved by partitioning the fault list in such a way that all the servers complete their subtasks at approximately the same time, thereby maintaining high utilization of the available resources. In distributed fault simulation applications, load balancing is difficult to maintain throughout the entire simulation task, primarily because of

- the location of the faults of each partition with respect to the primary inputs and the primary outputs of the circuit,
- the varying load of servers due to processes unrelated to fault simulation,
- the detectability profile of each partition, and
- the ordering of the test patterns.

The location of the faults residing in a given partition is an important factor for maintaining the load balance. Faults located close to the primary inputs of a circuit require more time to propagate to the primary outputs than faults located close to the primary outputs. As mentioned earlier, the servers assigned partitions containing faults close to primary inputs will take more time to complete their subtasks, disturbing the balance of the network. To minimize this effect, a random fault list partitioning can be used so the location of the faults is evenly distributed among the various partitions.

The continuously changing load of the servers, caused by processes unrelated to fault simulation, may cause severe imbalances in the system. Although the load of the servers can be monitored using existing network services, there is no guarantee that it will remain the same throughout the entire simulation process. Note also that the network overhead is negligible compared to execution times of the fault simulation process, especially when dealing with large circuits.

Even if the load remains constant through the entire simulation time, another parameter may disturb the balance of the system. This parameter is the detectability profile of each partition, which is a vector whose ith element represents the total number of test patterns, obtained from an exhaustive test set, that detect the ith fault in the partition. During fault simulation, partitions with low detectability profiles generally require more simulation time, since only a small number of test patterns or test sequences (on sequential circuits) can detect them.

Load balancing can also be affected by the order in which test patterns are applied. The subtasks in which the hard-to-test faults are detected during the early stages of the fault simulation will finish relatively quickly.

In certain cases, where a circuit consists of logic blocks performing independent functions, it might be advantageous to partition both the fault list and the test set. In this case, the faults of one logic block are placed in the same partition, which is simulated only by the test set responsible for testing this particular block.

The above discussion leads to the conclusion that an optimal load balancing algorithm for distributed fault simulation applications that can be determined prior to execution of the fault simulation may not be practical. DFSim addresses the load balancing problem using dynamic reconfiguration techniques. During this approach, a subtask can migrate from one server to another when the executing server becomes overloaded, provided the network includes other available servers. This capability improves the load balance of the system, which is highly sensitive to the different detectability profiles of the partitions, the test pattern ordering, and the continuously changing workload of the servers.

Finally, the initial partitioning of the fault list is based on the load of the servers during the initialization phase, their computing power, and their available physical memory. Note also that the facility allows the user to define any fault list partitioning, which may be different from the partitioning techniques implemented in DFSim.

System implementation

The DFSim distributed fault simulation facility, which runs under a variety of Unix-like operating systems, is based on the 4.3 BSD interprocess communication software. Communication between the servers and the client uses the reliable Transmission Control Protocol (TCP), which guarantees that messages traveling from one machine to another never get lost or corrupted. TCP is the protocol that defines the reliable stream transport service, one of the most important internetworking functions. TCP is an independent, general-purpose protocol that makes very few assumptions about the underlying network. For this reason it works over a single physical network as well as over complex nets consisting of several physical networks connected via network bridges. DFSim uses this capability of the TCP protocol to implement a distributed system which may also include a significant number of servers residing outside the single physical network boundaries. File transfer operations are carried out by RTI's Freedomnet, a software subsystem designed to implement a distributed computing system.

Finally, this particular implementation uses GenRad's Hilo-3 fault simulator, which supports the PVL algorithm. Note that the implementation of DFSim is essentially simulator independent; DFSim can be ported to support a variety of simulators. The system is based on the classic client/server model, in which the client is responsible for controlling the entire operation and for partitioning the fault simulation task into subtasks, which are subsequently assigned for execution to the servers.

As mentioned earlier, DFSim supports a number of fault list partitioning algorithms, which allows the user to select the most suitable technique for a specific application. Note also that DFSim can control the execution of a test-set-based partitioning approach, provided the user has already divided the test set.

The architecture of the DFSim facility, shown in Figure 7, consists of several processes that reside in the client machine. The master process creates and controls the entire simulation process. This process also carries out the fault tolerance and dynamic reconfiguration operations. During start-up, the master process creates a server monitor process for each available workstation, which handles the communication between the servers and the master process.

The server monitor processes are responsible for establishing reliable communication links with the servers and for transmitting data and commands concerning the remote execution of the fault simulation tasks. The information exchanged among the fault simulators and the monitor processes includes fault simulation commands and data, error conditions, and fault simulation results.

[Figure 7. Architecture of the DFSim distributed fault simulation facility.]

In addition, a network monitor process provides the master process with data concerning the status (such as up or down), the number of active users, and the load of each server. The master process uses this
information to perform the initial partition
of the fault list and to efficiently perform
the fault tolerance and dynamic reconfiguration operations.
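One plausible way to weight the initial partition by the monitored server data (load, computing power, physical memory) is sketched below. The scoring formula is purely an illustrative assumption, not DFSim's actual policy:

```python
# Sketch of a capacity-weighted initial partition: each server's share
# of the fault list grows with its relative computing power and free
# memory and shrinks with its current load. The weighting formula is an
# invented example.

def weighted_sizes(total_faults, servers):
    # servers: list of (relative_power, free_memory_fraction, load)
    weights = [p * m / (1 + load) for p, m, load in servers]
    scale = total_faults / sum(weights)
    sizes = [int(w * scale) for w in weights]
    sizes[0] += total_faults - sum(sizes)    # absorb rounding remainder
    return sizes

# An idle server, an equally powerful but loaded server, and a faster
# server with half its memory free.
servers = [(1.0, 1.0, 0.0), (1.0, 1.0, 1.0), (2.0, 0.5, 0.0)]
print(weighted_sizes(4035, servers))    # [1614, 807, 1614]
```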
Finally, three databases reside in the client machine: the circuit database, the test set database, and the fault simulation status database. The circuit database includes the netlist description of the circuit and can be linked with a library containing various primitive circuit descriptions. The test set database contains test sets for all the available servers. The test set for each server is either a modified version of the original one, consisting of breakpoints appropriately placed in the test file, or a collection of test subsets that define simulation subtasks to be executed in the respective server. The fault simulation status database contains the undetected fault lists and the faulty state of all the simulation subtasks in the network. The files included in the test set and the status databases will be used for fault tolerance and dynamic reconfiguration purposes. The initial partitioning of the fault list also takes place in the status database.
System evaluation
We evaluated the performance of the developed distributed fault simulation approach using the DFSim facility. A homogeneous network of VAX 2000 workstations with the same amount of physical memory (five megabytes) was used during off-peak hours to accurately measure the divergence of the obtained speedup from the ideal linear speedup.

The simulation results for the fault list and the test set partitioning methods applied to the benchmark circuits described earlier appear in Figure 8.

[Figure 8. Performance of distributed fault simulation (a) for a sequential circuit and (b) for a combinational circuit. The graphs plot speedup against the number of servers (one through seven) for partitioning of the fault list and partitioning of the test set, annotated with the corresponding simulation times in minutes.]

The most interesting observation in this figure is the superlinear speedup attained using the fault list partitioning approach. This behavior stems from the inefficient fault simulation of even medium-size circuits when the memory requirements dictated by large fault lists exceed the available resources.

Figure 8 also shows that, given a large number of servers, performance drops from the ideal linear speedup. This occurs because, with a large number of servers, the fault lists at each subtask become relatively small and other operations, such as fault-free simulation, circuit analysis, and initialization, dominate the fault simulation time. However, we expect to see such behavior only after achieving a significant reduction in the fault simulation time.
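A toy cost model (invented numbers, not the measured data) illustrates how paging can make fault list partitioning superlinear: once the fault data outgrows physical memory, page faulting adds a steep extra cost, so four quarter-size subtasks can finish more than four times faster than one full-size task.

```python
# Illustrative cost model only: linear base cost per fault plus a
# quadratic penalty for faults that overflow physical memory.

def sim_time(faults, mem=2000, base=1.0, page_penalty=0.01):
    overflow = max(0, faults - mem)          # faults that do not fit
    return base * faults + page_penalty * overflow ** 2

whole = sim_time(4000)                       # one server, whole list
split = sim_time(1000)                       # per server with 4 servers
print(whole / split > 4)                     # True -> superlinear speedup
```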
This figure also shows that partitioning
of the fault list yields better performance
compared to the one attained by partitioning the test set. Since each server simulates
the entire undetected fault list with a subset
of the test set, excessive page fault activity,
in the case of large circuits, results in
inefficient fault simulation.
Another inefficiency of the test set partitioning method is that a server has no
information about the faults detected from
another machine, in which case faults are
unnecessarily simulated more than once by
different servers.
However, in some cases partitioning the test set results in better performance. This occurs when a circuit with a small fault list is simulated with a large number of test patterns. In this case, the fault simulation time is dominated by the fault-free simulation time (the lower bound). Thus, partitioning the test set significantly improves the fault simulation time, since each partition is simulated with a smaller test set.
Fault tolerance
Fault simulation is a time-consuming
process that requires a considerable
amount of computing resources. The distributed approach can speed up this process by using a number of workstations
connected via a LAN. As the number of
servers increases, the probability of a system failure also increases, in which case
the loss of a server may result in the termination of the entire fault simulation task.
To avoid such conditions, which result in
increased simulation time and cost, DFSim
includes capabilities that allow the system
to recover from server failures.
Fault tolerance in DFSim is accomplished by using time redundancy.[11] Time redundancy involves the repetition of a fault simulation subtask between rollback points upon detection of a system failure in the respective server. Failures are detected using existing network services. In 4.3 BSD Unix, each machine broadcasts messages through the network indicating its state, and each machine receives similar messages from other machines denoting their state. If the client has not received such a message from a server, it considers that server to be "down."
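Client-side failure detection of this kind reduces to checking how long ago each server was last heard from. A minimal sketch of that check (the 90-second threshold and the data layout are illustrative assumptions, not DFSim internals):

```python
import time

HEARTBEAT_TIMEOUT = 90.0  # seconds without a status broadcast => "down" (assumed value)

def down_servers(last_heard, now=None):
    """Return the servers whose last status message is older than the timeout.

    last_heard maps server name -> timestamp of its most recent broadcast.
    """
    now = time.time() if now is None else now
    return sorted(s for s, t in last_heard.items() if now - t > HEARTBEAT_TIMEOUT)

# Example: server "vax2" stopped broadcasting two minutes ago.
status = {"vax1": 1000.0, "vax2": 880.0, "sun1": 995.0}
assert down_servers(status, now=1000.0) == ["vax2"]
```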
At each rollback point, the current status of the fault simulation subtask is saved in a status database on the master, to be used later in case of a failure. The master process, which resides on the client machine, carries out the fault recovery operations. When the master process detects a failure in a server, it assigns the uncompleted subtask to the next available server using the data saved in stable storage at the last rollback point.
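The recovery path in the master can be sketched as follows (a simplified illustration; the status-record fields and function names are hypothetical): on a detected failure, look up the failed server's last rollback record and hand the remaining work, together with the saved undetected fault list, to the next free server.

```python
def recover(failed_server, status_db, free_servers):
    """Reassign the uncompleted portion of a failed server's subtask.

    status_db maps server -> dict holding the last rollback point's data:
    the index of the next test subset and the saved undetected fault list.
    """
    record = status_db.pop(failed_server)
    target = free_servers.pop(0)  # next available server
    status_db[target] = {
        "next_subset": record["next_subset"],  # resume after the last rollback
        "ufl": record["ufl"],                  # undetected fault list snapshot
    }
    return target

db = {"vax2": {"next_subset": 3, "ufl": ["f7", "f9"]}}
target = recover("vax2", db, ["vax3"])
assert target == "vax3" and db["vax3"]["next_subset"] == 3
```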
The insertion of rollback points, which
is fault simulator dependent, is accomplished either by introducing breakpoints
in the test set or by dividing the test set into
a number of subsets. In the latter case, and during system initialization, the master process creates a copy of the test set for each of the available servers. As shown in Figure 9, these test sets are further subdivided into subsets that are simulated sequentially at each server. When a server completes a subset of the test vector file (a rollback point), it records its status in the status database, and the simulation continues with the next subset until the entire test vector set has been exhausted.

Figure 9. Structure of fault simulation processes used for incorporating fault tolerance and dynamic reconfiguration capabilities. [Diagram: a fault simulation task, with its undetected fault list (UFL) and test set, is divided into fault simulation subtasks 1 through N; each subtask applies the fault simulator (FSim) to test subsets 1 through M, with a rollback point after each subset.]
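Each server's main loop therefore alternates simulation with checkpointing, one test subset per session. A sketch under the same assumptions as above (simulate_subset and the record layout are illustrative, not the DFSim API):

```python
def run_subtask(subsets, ufl, simulate_subset, save_status):
    """Simulate test subsets in order, checkpointing at each rollback point."""
    for i, subset in enumerate(subsets):
        ufl = simulate_subset(subset, ufl)  # returns the shrunken undetected fault list
        save_status({"next_subset": i + 1, "ufl": ufl})  # rollback point
    return ufl

# Toy simulator: each subset "detects" exactly the faults listed in it.
log = []
remaining = run_subtask(
    subsets=[{"f1"}, {"f3"}],
    ufl={"f1", "f2", "f3"},
    simulate_subset=lambda subset, ufl: ufl - subset,
    save_status=log.append,
)
assert remaining == {"f2"} and log[-1]["next_subset"] == 2
```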
This capability, called incremental fault simulation, is possible when the simulator can use the undetected fault list from a previous run as input to a later run. It also lets you simulate additional vectors without having to restart the simulation. However, incremental fault simulation that uses only the undetected fault list as input is not in itself sufficient for implementing the described fault-tolerant technique. In addition, the fault simulator must be able to save the faulty state of the circuit when a breakpoint is encountered. The faulty state applies only to sequential circuits, and it consists of the fault lists at the outputs of the memory elements in the circuit.
Dynamic reconfiguration

As mentioned earlier, dynamic reconfiguration of subtasks is important for minimizing fault simulation time. The same techniques used for recovering from system failures underlie dynamic reconfiguration. The rollback points placed in the test set of each server divide each simulation subtask into a number of fault simulation sessions (see Figure 9). During dynamic reconfiguration, the fault simulation sessions assigned to a workstation can migrate to other servers based on processing power and server availability information.

The migration of fault simulation sessions was preferred over further partitioning of the fault list because the network becomes imbalanced only during the late stages of the fault simulation, when the number of undetected faults is not significant. In this
case the fault-free simulation time dominates the fault simulation time and, for this reason, further partitioning of the fault list does not result in any improvement.

Table 2. Performance improvement of the distributed facility using dynamic reconfiguration. Servers were drawn from VAX 2000-1, VAX 2000-2, VAX II/GPX, Sun-3/160, Sun-3/60, and VAX 2000-3; the sessions column lists the number of fault simulation sessions executed by each active server.

Configuration    Reconfiguration   Sessions per server     Utilization   Improvement in
                                                                         turnaround time
Three servers    No                5, 5, 5                 61.0%
                 Yes               4, 9, 2                 92.7%         33.6%
Four servers     No                5, 5, 5, 5              63.4%
                 Yes               2, 6, 10, 2             95.8%         33.2%
Five servers     No                5, 5, 5, 5, 5           68.6%
                 Yes               7, 10, 3, 3, 2          88.8%         25.8%
Six servers      No                5, 5, 5, 5, 5, 5        71.0%
                 Yes               7, 11, 3, 3, 3, 3       92.3%         23.5%
The master process executes the reconfiguration operation once a server completes its fault simulation sessions. In this case, the master process examines the network to identify the most loaded server (the server that has to execute the maximum number of sessions) and executes the reconfiguration algorithm. At this point the master process creates a copy of the current undetected fault list, which the target server will use to restart a fault simulation session. It also creates a copy of the circuit state (in the case of sequential circuits) describing the state of the fault simulation session that will migrate to the free server. Therefore, when a server reaches a rollback point, it saves the state of the circuit if no other server has already done so.
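The reconfiguration step itself amounts to a small load-balancing move. A sketch (hypothetical names; session counts stand in for the per-server queues of Figure 9): find the server with the most pending sessions and migrate one of them to the idle server.

```python
def reconfigure(pending, idle_server):
    """Move one pending session from the most loaded server to an idle one.

    pending maps server -> number of fault simulation sessions still queued.
    """
    busiest = max(pending, key=pending.get)
    if pending[busiest] <= 1:
        return None  # nothing worth migrating
    pending[busiest] -= 1
    pending[idle_server] = pending.get(idle_server, 0) + 1
    return busiest

queues = {"sun1": 4, "vax1": 1}
src = reconfigure(queues, "vax2")
assert src == "sun1" and queues == {"sun1": 3, "vax1": 1, "vax2": 1}
```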
To demonstrate the performance of the
distributed facility with the dynamic reconfiguration capability, we selected a
heterogeneous network. The network consisted of a Sun-3/160 (with 16 megabytes
of memory and a 16.67-MHz clock), a
Sun-3/60 (with 20 megabytes of memory
and a 20-MHz clock), a cluster of VAX
2000 (with six megabytes of memory
each), and a cluster of VAX II/GPX (with
five megabytes of memory each) workstations.
We modified the test set of the multiplier
by inserting five rollback points (one rollback point per 20 test vectors) and simulated it using a heterogeneous network with
three, four, five, and six servers. Table 2
shows the network configuration along
with the number of fault simulation sessions executed in each workstation. The table shows the significant improvement in resource utilization obtained using dynamic reconfiguration capabilities. This improvement results from
migrating fault simulation sessions from
slower workstations to faster ones, thus
increasing the utilization of the network
and at the same time decreasing the turnaround time, which is determined by the
slowest server.
An important issue related to the fault
tolerance and dynamic reconfiguration
operations is the placement of the rollback
points. Frequent use of rollback points
achieves better load balancing, but it may
also increase the simulation time because
the simulator spends a significant amount
of time updating the status database. Also,
frequent use of rollback points increases
the disk requirements in the case of large
sequential circuits because the state of the
circuit has to be saved at the end of each
fault simulation session. The insertion of
breakpoints is based on information related to the complexity of the circuit and
the processing power of the available
computing resources.
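This placement tradeoff can be made concrete with a toy overhead model (illustrative constants only, not a DFSim cost function): more rollback points give finer-grained balancing and less lost work on a failure, but each checkpoint adds database-update and disk time.

```python
def total_overhead(n_rollbacks, checkpoint_cost, subtask_time, p_fail):
    # Checkpointing cost grows with the number of rollback points, while the
    # expected re-simulation after a failure shrinks (half a session on average).
    session = subtask_time / (n_rollbacks + 1)
    return n_rollbacks * checkpoint_cost + p_fail * session / 2

# With these made-up numbers, a handful of rollback points beats both extremes.
costs = {n: total_overhead(n, checkpoint_cost=0.5, subtask_time=100, p_fail=0.2)
         for n in (0, 4, 50)}
assert costs[4] < costs[0] and costs[4] < costs[50]
```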
A different approach, meant to minimize load imbalancing effects, has been implemented in some commercially available fault simulators and in the Chiefs fault simulator.[12] This approach divides the entire fault simulation task into a large number of smaller subtasks, which are assigned to different nodes based on the availability of servers. Once a server completes a subtask, the client assigns a new partition to the free server.

This approach has the disadvantage that small fault partitions must be used to achieve good load balancing. As a result, it cannot exploit the maximum performance
of the concurrent fault simulators. This can
be achieved by considering subtasks with
as many faults as possible such that paging
to disk remains at low levels. This load
balancing approach resembles the multipass capability of fault simulators, where
the assigned undetected fault list is divided
into partitions and only one partition is
simulated at a time. This, as shown earlier,
degrades the performance of the fault
simulators (refer to Figure 5). The dynamic reconfiguration implemented in
DFSim allows the servers to simulate
larger partitions, thus reducing the overall
simulation time.
The foregoing discussion demonstrates that a significant reduction in the fault simulation time of complex circuits is possible using a distributed approach in which a fault simulation task is partitioned into subtasks that are subsequently assigned to the nodes of a distributed network for execution. You can achieve this using existing computing resources without resorting to special-purpose, high-cost hardware. The superlinear speedup observed in medium- and large-size circuits results from reducing the significant memory requirements of concurrent fault simulators.

The distributed fault simulation approach described here, which is implemented in the DFSim facility, has significant advantages over current commercially available simulators. The main advantage of the proposed approach is its ability to operate in a heterogeneous computing environment. The dynamic reconfiguration implemented in DFSim maintains load balancing, which increases utilization of the available resources and improves the overall performance of the described method.

References

1. D.R. Schertz and G. Metze, "A New Representation for Faults in Combinational Digital Circuits," IEEE Trans. Computers, Vol. C-21, Aug. 1972, pp. 858-866.
2. S. Seshu, "On an Improved Diagnosis Program," IEEE Trans. Electronic Computers, Vol. EC-14, 1965, pp. 76-79.
3. E.G. Ulrich and T. Baker, "Concurrent Simulation of Nearly Identical Digital Networks," Computer, Vol. 7, Apr. 1974, pp. 39-44.
4. T.W. Williams and K.P. Parker, "Design for Testability - A Survey," Proc. IEEE, Vol. 71, No. 1, Jan. 1983, pp. 99-100.
5. K. Son, "Fault Simulation with the Parallel Value List Algorithm," VLSI Systems Design, Vol. VI, No. 12, Dec. 1985.
6. T. Blank, "A Survey of Hardware Accelerators Used in Computer-Aided Design," IEEE Design and Test, Aug. 1984, pp. 21-39.
7. D.K. Pradhan, ed., Fault-Tolerant Computing: Theory and Techniques, Vol. I, Prentice Hall, Englewood Cliffs, N.J., 1986, pp. 234-260.
8. W. Joy et al., 4.2 BSD System Manual, Computer Science Research Group, Dept. of Electrical Engineering and Computer Science, Univ. of California at Berkeley, July 1983.
9. D. Comer, Internetworking with TCP/IP: Principles, Protocols, and Architecture, Prentice Hall, Englewood Cliffs, N.J., 1988.
10. B. Warren et al., "Distributed Computing Using RTI's Freedomnet in a Heterogeneous Unix Environment," Proc. 1987 UniForum Conf., Jan. 1987.
11. P.K. Lala, Fault Tolerant and Fault Testable Hardware Design, Prentice Hall Int'l, London, 1985, pp. 103-107.
12. P.A. Duba et al., "Fault Simulation in a Distributed Environment," Proc. 25th ACM/IEEE Design Automation Conf., 1988, pp. 686-691.

Tassos Markas is a research engineer with the Center for Digital Systems Research at the Research Triangle Institute and a research assistant in the Computer Science Department at Duke University. His research interests include parallel and fault-tolerant architectures, application-specific IC design, testing, and distributed systems. Markas received his BS in physics in 1985 from the University of Athens in Greece and his MS in electrical engineering from Duke University in 1988. He is currently a PhD student in the Electrical Engineering Department at Duke University. He is a member of the IEEE Computer Society and the ACM.

Mark Royals is a research engineer with the Center for Digital Systems Research at the Research Triangle Institute. His research interests include VLSI design and test, test generation, fault simulation, design for testability, and built-in self-test techniques. Royals received his BS and MS degrees in electrical engineering from North Carolina State University in 1985 and 1987, respectively. He is a member of Eta Kappa Nu.

Nick Kanopoulos is manager of the VLSI Design and Test Group in the Center for Digital Systems Research at the Research Triangle Institute and adjunct assistant professor in the Electrical Engineering Department at Duke University. His main research activities cover application-specific IC design using silicon and gallium arsenide technologies, design for testability and built-in self-test techniques, and fault-tolerant system design. Kanopoulos received the EE degree from the University of Patras in Greece in 1979 and the MS and PhD degrees in electrical engineering from Duke University in 1980 and 1984, respectively. He is a member of Tau Beta Pi, Eta Kappa Nu, the American Association for the Advancement of Science, and the Technical Chamber of Greece.

Readers can contact the authors at Research Triangle Institute, Center for Digital Systems Research, PO Box 12194, Research Triangle Park, NC 27709.