
On distributed fault simulation

1990, Computer

AI-generated Abstract

This paper discusses distributed fault simulation, addressing the inefficiencies of traditional automatic test pattern generators (ATPGs) and the complexities of modern digital circuits. It highlights the role of fault simulation in evaluating test sets for fault coverage and describes the development of DFSim, a distributed fault simulation facility designed to improve performance through parallel processing in a heterogeneous Local Area Network (LAN). The findings suggest that leveraging multiple workstations can significantly enhance fault simulation speed and efficiency.

On Distributed Fault Simulation

Tassos Markas, Mark Royals, and Nick Kanopoulos
Research Triangle Institute

Verifying the functionality of digital circuits relies heavily upon test generation. Because of the complexity of today's digital circuits, efficient general-purpose automatic test pattern generators (ATPGs) are not yet available. To overcome the inefficiency of current ATPG programs, one alternative method relies on the development of functional or pseudorandom test sets. These test sets are subsequently evaluated to determine if the obtained fault coverage (the percentage of detected faults) meets the requirements specified by the user.

The evaluation of the given test set is carried out by the fault simulator, a program that examines the behavior of digital circuits under various faulty conditions. During the fault simulation process, a given test set is applied to the fault-free circuit and to each of the faulty circuits obtained by introducing certain classes of faults in the circuit nodes. The circuit response is subsequently analyzed to determine the faults detected by the test set. A fault is considered detected by a test pattern if the response of the fault-free circuit differs from the response of the circuit in the presence of this particular fault. The simulation time can be reduced by identifying equivalent fault classes and simulating only one fault from each class. This process is called fault collapsing. Fault simulators carry out fault grading and the development of fault dictionaries, which are requirements for highly reliable circuits.

Efficient partitioning of fault simulation tasks and allocation of the resulting subtasks over a distributed system yields faster fault simulation without resorting to expensive special-purpose hardware.

In today's highly integrated digital circuits, the fairly large number of equivalent fault classes in a given circuit makes fault simulation an expensive and time-consuming process. However, fault simulation has an inherent parallelism that can be used to speed up the execution time when more than one processor is available. This can be achieved by dividing the fault simulation task into independent subtasks assignable for execution to various computing resources. This large-grain type of parallelism can be exploited by using a distributed computing environment that consists of several powerful nodes connected via a local area network (LAN).

In this article we examine the computational aspects of fault simulation and address issues related to the efficient partitioning of a fault simulation task into a number of subtasks assignable for execution to the nodes of a distributed system. In addition, we describe the implementation of a distributed fault simulation facility (called DFSim) using a heterogeneous LAN consisting of a number of workstations with different computing resources and different versions of the Unix operating system.

0018-9162/90/0100-0040$01.00 © 1990 IEEE

Fault simulation

Fault simulation is used primarily for the generation of fault dictionaries and the evaluation of a digital circuit's fault coverage by a given test set. Fault simulation is also employed to analyze the operation of a circuit under various fault conditions. The performance criteria for existing fault simulation algorithms are processing power and memory requirements. Although processing power is the main consideration in efficiently simulating large circuits, maintaining a minimum hardware configuration will keep the cost of the machine low. The major fault simulation algorithms differ in the way they manage the trade-off between processing speed and memory.
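The fault collapsing step can be made concrete with a small sketch. The example below uses the standard structural equivalence rule for an AND gate (a stuck-at-0 on any input is indistinguishable from a stuck-at-0 on the output); the function and line names are ours, not taken from the article.

```python
# Sketch of structural fault collapsing for a 2-input AND gate.
# Full fault list: stuck-at-0 (s-a-0) and stuck-at-1 (s-a-1) on
# every input line and the output line. All s-a-0 faults form one
# equivalence class, represented here by the output s-a-0 fault.

def collapse_and_gate(inputs, output):
    """Return the collapsed fault list for an AND gate."""
    faults = []
    for line in inputs:
        faults.append((line, 1))   # input s-a-1 faults stay distinct
    faults.append((output, 0))     # one representative for all s-a-0
    faults.append((output, 1))
    return faults

full = [(l, v) for l in ("a", "b", "y") for v in (0, 1)]   # 6 faults
collapsed = collapse_and_gate(["a", "b"], "y")             # 4 classes
print(len(full), len(collapsed))
```

Only one fault per equivalence class needs to be simulated, which is exactly the reduction the fault collapsing step provides.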
The three most commonly used fault simulation techniques are parallel, concurrent, and parallel value list (PVL) fault simulation. In fault simulators, faults are represented as fault structures associated with each node of a digital circuit. These fault structures propagate from the primary inputs towards the primary outputs based on the type of the logic gate they are associated with and the fault-free value of the examined gate. During this procedure, the faults that reach the primary outputs are considered detected, and they are dropped from the undetected fault list of the circuit.

The parallel fault simulation technique is based on the parallel processing of bit-oriented operations. The driving force of this approach arises from the capability of computer systems to perform logical instructions on a number of bits in parallel. This approach has the advantages of efficient memory use, because a number of events can be packed in the same word, and parallel computation, because computer instructions can operate on a number of bits simultaneously. Figure 1 shows the fault structure in a 32-bit machine, with the least significant bit representing the fault-free value of the node.

Figure 1. Fault structure of the parallel algorithm.

In concurrent fault simulation, the fault structure associated with each node contains the faults that produce an error at the inputs and/or at the output of the related gate. These faults are arranged in a linked structure as shown in Figure 2. The propagation of the fault structures is accomplished by simulating only the portion of the circuit whose faulty response differs from the fault-free response.
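The bit-packing idea behind the parallel technique can be sketched as follows. This is an illustrative model, not the paper's implementation: bit 0 of each word carries the fault-free value and each remaining bit carries one faulty copy of the circuit, so a single bitwise operation evaluates a gate for all copies at once.

```python
# Illustrative sketch of parallel (bit-oriented) fault simulation.
WORD = 32

def inject(value, faults):
    """Pack a line's values: bit 0 is the fault-free value; bit i is
    the value seen by faulty machine i (a stuck-at fault overrides
    the fault-free value). `faults` maps bit index -> stuck value."""
    word = 0
    for bit in range(WORD):
        v = faults.get(bit, value)
        word |= v << bit
    return word

# AND gate with fault-free inputs a=1, b=1; machine 1 models line a
# stuck-at-0.
a = inject(1, {1: 0})
b = inject(1, {})
y = a & b                       # evaluates all 32 circuit copies at once

fault_free = y & 1
faulty_1 = (y >> 1) & 1
print(fault_free, faulty_1)     # 1 0 -> the responses differ
```

Because the fault-free bit and the faulty bit disagree at the gate output, this fault would be marked detected if the output were a primary output.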
The computational complexity of the concurrent fault simulation algorithm is O(n²), where n is the number of gates of the circuit. The concurrent fault simulation approach is faster than the parallel approach, but it requires considerable amounts of memory. Also, concurrent fault simulators require dynamic memory allocation capabilities, since the amount of memory cannot be determined at circuit compilation time.

Figure 2. Fault structure of the concurrent algorithm.

Finally, the parallel value list (PVL) algorithm combines the above approaches. A typical PVL structure (see Figure 3) consists of parallel lists connected in a linked structure. During this algorithm a fault structure propagates towards the primary outputs if any of the faults in the list produce a response different from the fault-free response of the circuit. This approach has the advantage of simulating in parallel the faults packed in one memory word while maintaining the computing efficiency of the concurrent approach.

Motivation

Concurrent fault simulation of today's very large scale integrated circuits requires considerable computing resources. Consequently, a significant interest exists in reducing the excessive computing time of the fault simulation process without resorting to special-purpose hardware (such as IBM's Yorktown Simulation Engine, NEC's Hardware Accelerator, Zycad, IKOS, or Daisy), which is often expensive and sometimes incompatible with the use of different fault simulation algorithms and/or different systems. One way of accomplishing this speedup is to exploit the inherent parallelism of the fault simulation process by developing a methodology that divides a fault simulation task into subtasks and then assigns them for independent processing to a number of processors connected via a local area network.

The increasing use of LANs that include a significant number of powerful workstations constitutes an appealing environment for distributed processing applications. Distributed processing also provides better resource utilization, especially when the individual nodes of such networks remain relatively idle for long periods of time.

Computational requirements of fault simulation

Before introducing our distributed fault simulation approach, we will review the basic concepts involved in the operation of fault simulation. The simulator initially parses the circuit and performs fault enumeration to produce the undetected fault list, which contains all the faults associated with the input and output lines of all the gates in the circuit. After this, the simulator reduces the number of faults to be simulated by collapsing them into a set of equivalent fault classes. The test patterns are simulated one at a time, and the faults detected at each step are dropped from the undetected fault list until the entire test set or the undetected fault list has been exhausted (see Figure 4).

Figure 4. Fault simulation procedure.

At the end of the simulation, the user receives the obtained fault coverage, along with the undetected fault list and other statistical information on the detected faults. The majority of commercially available simulators allow the user to define an initial undetected fault list as well as the number of faults to consider during each simulation pass.
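The pattern-by-pattern loop with fault dropping described above can be sketched directly. Here `detect` is a stand-in for a real fault simulator's per-pattern comparison; the names are ours.

```python
# Minimal sketch of the fault simulation procedure of Figure 4:
# apply patterns one at a time and drop detected faults from the
# undetected fault list (UFL) until patterns or faults run out.

def fault_simulate(patterns, initial_ufl, detect):
    ufl = set(initial_ufl)
    for p in patterns:
        if not ufl:
            break                           # every fault already detected
        ufl -= {f for f in ufl if detect(p, f)}
    detected = set(initial_ufl) - ufl
    coverage = len(detected) / len(initial_ufl)
    return ufl, coverage

# Toy stand-in: pattern p detects fault f whenever p >= f.
ufl, cov = fault_simulate([1, 3], {1, 2, 3, 4}, lambda p, f: p >= f)
print(sorted(ufl), cov)                     # [4] 0.75
```

The returned undetected fault list and coverage correspond to the outputs the article says the user receives at the end of the simulation.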
This latter parameter enables the simulator to divide the entire fault list into subsets and simulate these subsets one at a time. This capability assists memory management in cases where the system has insufficient physical memory to hold the fault simulation data for the entire circuit. To demonstrate the behavior of fault simulation under this condition, the graphs in Figure 5 show the simulation time as a function of the number of faults per pass (solid lines). We used two different types of circuits for this experiment, simulating both on a machine with five megabytes of main memory. The first circuit, a highly sequential one, was a systolic array chip consisting of eight identical cells placed in a linear pipe. The circuit held approximately 6,000 gates. We injected a set of 5,192 stuck-at faults, which collapsed to 3,574 equivalent fault classes. Finally, we simulated the circuit with 390 test vectors and obtained a fault coverage of 93 percent. The second circuit was a 16x16 array multiplier (Booth encoding) with approximately 3,000 gates and 4,035 equivalent fault classes. We simulated this circuit using 52 pseudorandom test patterns and obtained a fault coverage of 99 percent.

Figure 5 shows a significant increase in the simulation time after a certain number of faults per pass has been exceeded. This occurs when the fault simulation data surpasses the storage capacity of the main memory, in which case the system spends a significant amount of time swapping pages between main memory and disk (page faulting). The dotted lines in Figures 5a and 5b give the number of page faults as a function of the number of faults per pass. In general, this behavior is more severe in sequential circuits because of the longer fault lists associated with each node. You can see this in Figure 5a, where an increase from 1,600 to 2,000 in the number of faults per pass results in a drop from 97 percent to 59 percent in CPU utilization.

Figure 5. Fault simulation on a single server (a) for a sequential circuit and (b) for a combinational circuit.

Distributed fault simulation

A fault simulation task can be divided into a number of computationally independent subtasks by partitioning either the fault list or the test set. For fault list partitioning, this inherent parallelism stems from the lack of data dependencies among the different subtasks. During test set partitioning, some data and control flow dependencies (in the case of sequential circuits) make this technique more difficult to implement. In both cases the interprocess communication among the various subtasks is negligible, which is very appealing for developing a distributed facility capable of performing fault simulation.

Computing environments consisting of a number of powerful workstations (servers) connected via a LAN can efficiently execute such distributed tasks. In the distributed environment shown in Figure 6, the client machine divides a simulation task into subtasks by partitioning either the fault list, in which case each server simulates a subset of the fault list using the initial test set, or the test set, in which case each server simulates the entire fault list with a subset of the test set. After completion of the partitioning task, the subtasks are assigned to the servers, which carry out the fault simulation processes and report the simulation results to the client machine.

Figure 6. Distributed fault simulation.
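The client/server division for fault list partitioning can be sketched as below. Threads stand in for the LAN workstations of Figure 6, and `detect` again stands in for a real simulator; none of these names come from DFSim itself.

```python
# Sketch of fault-list partitioning: the client splits the UFL, each
# "server" simulates its subset against the full test set, and the
# client merges the reported results.
from concurrent.futures import ThreadPoolExecutor

def simulate_subtask(fault_subset, patterns, detect):
    """One server's job: find the detected faults in its subset."""
    return {f for f in fault_subset for p in patterns if detect(p, f)}

def distributed_simulation(faults, patterns, detect, n_servers):
    faults = list(faults)
    chunks = [faults[i::n_servers] for i in range(n_servers)]
    with ThreadPoolExecutor(max_workers=n_servers) as pool:
        results = pool.map(simulate_subtask, chunks,
                           [patterns] * n_servers,
                           [detect] * n_servers)
    return set().union(*results)           # client merges the reports

detected = distributed_simulation(range(1, 9), [2, 5],
                                  lambda p, f: p >= f, n_servers=4)
print(sorted(detected))                    # [1, 2, 3, 4, 5]
```

Because the subtasks share no data, the merge step is the only coordination the client has to perform, which is the independence property the article highlights.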
The partitioning problem

One of the main considerations related to distributed fault simulation applications is the identification of the best partitioning method (fault list or test set), as well as the identification of the optimum partitioning of a fault simulation task that will result in minimum turnaround time. During concurrent simulation, the storage requirements depend on a number of factors, such as circuit activity and circuit sequentiality. Highly sequential circuits require more storage due to longer test sequences and longer fault lists associated with circuit elements.

The majority of commercially available fault simulators have the ability to partition the fault list and simulate each partition in a different simulation pass. For this reason it is crucial to identify the maximum number of faults that can be considered in each pass without degrading the performance of the system because of memory limitations. This parameter depends on the circuit size and the available physical memory of the system. Although some vendors provide information concerning the estimation of this parameter, there is no automated method of computing it.

Partitioning of the test set creates a number of problems, especially during the simulation of sequential circuits, where the order in which the test patterns are applied, as well as the state of the circuit, is crucial. This makes partitioning of the test set a rather difficult task, since each subtask requires that the circuit be at a certain known state during initialization. Another disadvantage of this approach involves the unnecessary simulation of the same faults by different partitions. This occurs because some faults are carried through the entire simulation of a subtask even if detected at some earlier point by another fault simulation subtask. The lack of interprocess communication among the fault simulation subtasks prevents the removal of the already detected faults from the distributed fault lists.
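Since the article notes there is no automated way to compute the maximum number of faults per pass, one can imagine a rough heuristic consistent with the paging behavior of Figure 5: cap the per-pass fault count so the estimated fault data stays inside physical memory. Everything below, including the per-fault byte figure, is a made-up illustration, not a vendor formula.

```python
# Hypothetical faults-per-pass estimator (illustrative only).

def faults_per_pass(mem_bytes, base_bytes, bytes_per_fault):
    """Largest fault count whose estimated footprint fits in memory.
    base_bytes models the memory taken by the circuit model itself."""
    budget = mem_bytes - base_bytes
    if budget <= 0:
        raise ValueError("circuit alone exceeds physical memory")
    return budget // bytes_per_fault

# 5 Mbytes of memory, 2 Mbytes assumed for the circuit model,
# ~2 Kbytes of fault-list data per fault (an invented figure):
print(faults_per_pass(5 * 2**20, 2 * 2**20, 2048))   # 1536
```

Such an estimate would at best bound the region where page faulting begins; the real threshold depends on circuit activity and sequentiality, as the article stresses.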
The partitioning of the fault list is more advantageous, mainly because it reduces the memory requirements of the fault simulation task by simulating only a subset of the original fault list. It also results in a more efficient implementation, since no data dependencies exist among the various subtasks. However, there is no generic approach to determine the optimum fault list partitioning for fault simulation purposes. The question that arises at this point is how to determine the optimum fault list partition so as to minimize the simulation time.

The performance of the distributed fault simulation facility is determined by the turnaround time, which represents the elapsed time from initialization until the completion of all fault simulation subtasks assigned to the network servers. Minimum turnaround time can be achieved by minimizing the cumulative simulation time while maintaining load balancing.

As mentioned earlier, the initial undetected fault list is generated by simulating the circuit with an empty test file. The particular simulator used in DFSim enumerates the faults so that faults located close to the primary outputs are placed at the top of the fault list and faults located close to the primary inputs are placed at the bottom of the list. In addition, faults that belong to the same logic block are enumerated in the same portion of the fault list.

DFSim incorporates several partitioning techniques. In the first method, a greedy-type partitioning technique, the fault list is divided into a number of partitions equal to the number of available servers. The second algorithm (Rand) selects faults randomly from the initial fault list and places them in different partitions. The first technique preserves the hierarchy of the circuit (faults that belong in the same logic block will likely be placed in the same partition), while the second technique does not have this property.
Using the PVL fault simulation algorithm, we evaluated the partitioning techniques implemented in DFSim. We obtained the results reported in Table 1 by simulating the Booth multiplier and the systolic array circuit using a homogeneous network with four servers (VAX 2000 workstations with six megabytes of memory). The table shows that the random algorithm was on average slower than the greedy algorithm. The explanation of this behavior lies in the fact that during concurrent and PVL fault simulation, only the active portions of the circuit are simulated. In addition, the PVL algorithm simulates only the fault structures that include at least one fault whose response differs from the fault-free response. Based on this observation, we can achieve a significant speedup if all faults that differ from the fault-free response can be packed in the same fault structure. In this case the number of active fault structures is minimized, reducing the fault simulation time since fewer fault lists have to be propagated through the nodes of the circuit. For this reason the greedy-type technique achieves, on average, faster simulation time, since faults from the same logic block are placed in the same partition.

Although the greedy algorithm exhibited better performance on average, the turnaround time (determined by the most time-consuming subtask) was better in the random case. This resulted primarily because the greedy partitioning approach disturbs the load balancing of the network by grouping faults located close to the primary outputs in one or more partitions that differ from the partitions containing faults located close to the primary inputs. The imbalance occurs because the fault structures that reside close to the primary outputs take less time to propagate to the outputs, which constitute the detection points.

The third algorithm evaluated was a clustering-type approach (Clust), which tries to compensate for the disadvantages of the greedy and random algorithms.
During this technique, the circuit is divided into clusters and a portion of each cluster is included in each partition. The clusters are large enough to preserve the hierarchy of the circuit, but small enough to avoid causing any heavy imbalances in the workload of the distributed system. The initial fault list is divided into N equal fault lists, and each one is further divided into L clusters, where L represents the number of available servers. The final partitioning is performed by including one cluster from each of the N fault lists in the same partition, thus generating L mutually exclusive partitions that will be assigned for execution to the nodes of the network. The Clust approach takes advantage of the capabilities of the PVL algorithm, since faults related to the same logic block are likely to be packed in the same fault structure.

The selection of an appropriate N plays a significant role in the performance of this technique. If N is very large (total faults/N approaches L), then this method emulates the random algorithm. On the other hand, if N = 1, then the Clust algorithm emulates the greedy approach. In general, partitions with faults from one logic block improve the average simulation time. However, this may cause heavy imbalances in the system, since different logic blocks have different topologies and may be tested by different test sets, in which case the test ordering will significantly affect the fault simulation time of the examined partition. This shows in Table 1, where utilization of the network decreases when partitioning of the circuit employs large logic blocks.

Table 1. Performance evaluation of the fault partitioning techniques.

              Multiplier (4,035 faults)      Systolic array (5,192 faults)
Technique   CLsize   Tave    Tmax    Util    CLsize   Tave    Tmax    Util
Greedy         -     1,148   1,448   79.3%      -     1,722   2,253   76.4%
Clust        1,009   1,157   1,456   79.4%    1,298   1,737   2,333   74.4%
Clust          504   1,162   1,411   82.3%      649   2,340   2,507   93.3%
Clust          252   1,199   1,339   89.5%      324   2,156   2,378   90.6%
Clust          201   1,187   1,362   87.1%      259   2,123   2,210   96.7%
Clust          101   1,177   1,219   96.5%      130   1,962   2,123   92.4%
Clust           50   1,192   1,242   96.0%       65   2,055   2,276   90.3%
Clust           20   1,222   1,285   95.1%       26   2,010   2,116   95.0%
Clust           10   1,249   1,328   94.0%       13   2,235   2,287   97.7%
Clust            5   1,279   1,330   96.1%        7   2,248   2,425   92.7%
Rand           -     1,312   1,378   95.2%      -     2,377   2,382   99.7%

CLsize: number of faults per cluster. Tave: average simulation time. Tmax: simulation time of the most time-consuming subtask. Util: average utilization (Tave/Tmax).

Load balancing can be achieved by partitioning the fault list in such a way that all the servers complete their subtasks at approximately the same time, thereby maintaining high utilization of the available resources. In distributed fault simulation applications, load balancing is difficult to maintain throughout the entire simulation task, primarily because of

- the location of the faults of each partition with respect to the primary inputs and the primary outputs of the circuit,
- the varying load of servers due to processes unrelated to fault simulation,
- the detectability profile of each partition, and
- the ordering of the test patterns.
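The three fault list partitioning techniques compared in Table 1 can be sketched as follows. This is our reconstruction from the descriptions above, not DFSim source code; the fault list is assumed to be ordered as the article describes, with faults of the same logic block adjacent.

```python
# Sketches of the Greedy, Rand, and Clust fault-list partitioners.
import random

def chunk(seq, k):
    """Cut seq into up to k contiguous pieces of near-equal length."""
    size = -(-len(seq) // k)          # ceiling division
    return [seq[i:i + size] for i in range(0, len(seq), size)]

def greedy(faults, servers):
    """Contiguous split: preserves the hierarchy of the circuit."""
    return chunk(faults, servers)

def rand(faults, servers, seed=0):
    """Random assignment: spreads fault locations evenly."""
    shuffled = list(faults)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::servers] for i in range(servers)]

def clust(faults, servers, n):
    """Cut the list into n equal fault lists, cut each of those into
    `servers` clusters, and give cluster k of every list to partition
    k, yielding `servers` mutually exclusive partitions."""
    parts = [[] for _ in range(servers)]
    for sub in chunk(faults, n):
        for k, piece in enumerate(chunk(sub, servers)):
            parts[k].extend(piece)
    return parts

faults = list(range(16))              # stand-in ordered fault list
print(greedy(faults, 4)[0])           # [0, 1, 2, 3]
print(clust(faults, 4, 2)[0])         # [0, 1, 8, 9]
# With N = 1 the clustering scheme degenerates to the greedy split.
print(clust(faults, 4, 1) == greedy(faults, 4))   # True
```

The limiting behavior the article states falls out directly: N = 1 reproduces the greedy split, while a very large N scatters each partition across the whole list much like the random scheme.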
File transfer operations are carried out by RTI's Freedomnet, a software subsystem designed to implement a distributed computing system. Finally, this particular implementation uses GenRad's Hilo-3 fault simulator, which supports the PVL algorithm. Note that the implementation of DFSim is essentially simulator independent; DFSim can be ported to support a variety of simulators.

The system is based on the classic client/server model, in which the client is responsible for controlling the entire operation and for partitioning the fault simulation task into subtasks, which are subsequently assigned for execution to the servers. As mentioned earlier, DFSim provides a number of fault list partitioning algorithms, which allows the user to select the most suitable technique for a specific application. Note also that DFSim can control the execution of a test-set-based partitioning approach, provided the user has already divided the test set.

The architecture of the DFSim facility, shown in Figure 7, consists of several processes that reside in the client machine. The master process creates and controls the entire simulation process. This process also carries out the fault tolerance and dynamic reconfiguration operations. During start-up, the master process creates a server monitor process for each available workstation, which handles the communication between the servers and the master process. The server monitor processes are responsible for establishing reliable communication links with the servers and transmitting data and commands concerning the remote execution of the fault simulation tasks. The information exchanged among the fault simulators and the monitor processes includes fault simulation commands and data, error conditions, and fault simulation results. In addition, a network monitor process provides the master process with data concerning the status (such as up or down), the number of active users, and the load of each server.
The location of the faults residing in a given partition is an important factor for maintaining the load balance. Faults located close to the primary inputs of a circuit require more time to propagate to the primary outputs than faults located close to the primary outputs. As mentioned earlier, the servers assigned partitions containing faults close to primary inputs will take more time to complete their subtasks, disturbing the balance of the network. To minimize this effect, a random fault list partitioning can be used so that the location of the faults is evenly distributed among the various partitions.

The continuously changing load of the servers, caused by processes unrelated to fault simulation, may cause severe imbalances in the system. Although the load of the servers can be monitored using existing network services, there is no guarantee that it will remain the same throughout the entire simulation process. Note also that the network overhead is negligible compared to the execution times of the fault simulation process, especially when dealing with large circuits.

Even if the load remains constant through the entire simulation time, another parameter may disturb the balance of the system. This parameter is the detectability profile of each partition, a vector whose ith element represents the total number of test patterns, obtained from an exhaustive test set, that detect the ith fault in the partition. During fault simulation, partitions with low detectability profiles generally require more simulation time, since only a small number of test patterns or test sequences (in sequential circuits) can detect them.

Load balancing can also be affected by the order in which test patterns are applied. The subtasks in which the hard-to-test faults are detected during the early stages of the fault simulation will finish relatively quickly.
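The detectability profile defined above can be computed directly once a fault simulator is available; `detect` below is again a stand-in for that simulator, and the names are ours.

```python
# Detectability profile: element i counts how many patterns of an
# exhaustive test set detect the ith fault of the partition.

def detectability_profile(partition, exhaustive_patterns, detect):
    return [sum(1 for p in exhaustive_patterns if detect(p, f))
            for f in partition]

# Toy detect: pattern p detects fault f when p >= f, so the
# high-numbered faults play the role of hard-to-test faults.
profile = detectability_profile([1, 4, 8], range(1, 9),
                                lambda p, f: p >= f)
print(profile)        # [8, 5, 1] -- the last fault is hardest to detect
```

A partition whose profile is dominated by small entries, like the last fault here, is exactly the kind of partition the article predicts will run longest.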
In certain cases, where a circuit consists of logic blocks performing independent functions, it might be advantageous to partition both the fault list and the test set. In this case, the faults of one logic block are placed in the same partition, which is simulated only by the test set responsible for testing this particular block.

The above discussion leads to the conclusion that an optimal load balancing algorithm for distributed fault simulation applications, determined prior to execution of the fault simulation, may not be practical. DFSim addresses the load balancing problem using dynamic reconfiguration techniques. Under this approach, a subtask can migrate from one server to another when the executing server becomes overloaded, provided the network includes other available servers. This capability improves the load balance of the system, which is highly sensitive to the different detectability profiles of the partitions, the test pattern ordering, and the continuously changing workload of the servers. Finally, the initial partitioning of the fault list is based on the load of the servers during the initialization phase, their computing power, and their available physical memory. Note also that the facility allows the user to define any fault list partitioning, which may be different from the partitioning techniques implemented in DFSim.

System implementation

The DFSim distributed fault simulation facility, which runs under a variety of Unix-like operating systems, is based on the 4.3 BSD interprocess communication software. Communication between the servers and the client uses the reliable transmission control protocol (TCP), which guarantees that messages traveling from one machine to another never get lost or corrupted. TCP defines the reliable stream transport service, one of the most important internetworking functions. It is an independent, general-purpose protocol that makes very few assumptions about the underlying network. For this reason it works over a single physical network as well as over complex nets consisting of several physical networks connected via network bridges. DFSim uses this capability of the TCP protocol to implement a distributed system that may also include a significant number of servers residing outside the boundaries of a single physical network.

Figure 7. Architecture of the DFSim distributed fault simulation facility.

The master process uses the information reported by the network monitor to perform the initial partition of the fault list and to efficiently perform the fault tolerance and dynamic reconfiguration operations. Finally, three databases reside in the client machine: the circuit database, the test set database, and the fault simulation status database. The circuit database includes the netlist description of the circuit and can be linked with a library containing various primitive circuit descriptions. The test set database contains test sets for all the available servers. The test set for each server is either a modified version of the original one, consisting of breakpoints appropriately placed in the test file, or a collection of test subsets that define simulation subtasks to be executed on the respective server. The fault simulation status database contains the undetected fault lists and the faulty state of all the simulation subtasks in the network. The files included in the test set and status databases are used for fault tolerance and dynamic reconfiguration purposes. The initial partitioning of the fault list also takes place in the status database.

System evaluation

We evaluated the performance of the developed distributed fault simulation approach using the DFSim facility. A homogeneous network of VAX 2000 workstations with the same amount of physical memory (five megabytes) was used during off-peak hours to accurately measure the divergence of the obtained speedup from the ideal linear speedup.

The simulation results for the fault list and the test set partitioning methods applied to the benchmark circuits described earlier appear in Figure 8. The most interesting observation in this figure is the superlinear speedup attained using the fault list partitioning approach. This behavior stems from the inefficient fault simulation of even medium-size circuits when the memory requirements dictated by large fault lists exceed the available resources.

Figure 8 also shows that, given a large number of servers, performance drops from the ideal linear speedup. This occurs because, with a large number of servers, the fault lists at each subtask become relatively small, and other operations, such as fault-free simulation, circuit analysis, and initialization, dominate the fault simulation time. However, we expect to see such behavior only after achieving a significant reduction in the fault simulation time.

Figure 8. Performance of distributed fault simulation (a) for a sequential circuit and (b) for a combinational circuit.

The figure also shows that partitioning of the fault list yields better performance than partitioning the test set. Since each server simulates the entire undetected fault list with a subset of the test set, excessive page fault activity, in the case of large circuits, results in inefficient fault simulation. Another inefficiency of the test set partitioning method is that a server has no information about the faults detected by another machine, in which case faults are unnecessarily simulated more than once by different servers. However, in some cases the partitioning of the test set results in better performance. This occurs when a circuit with a small fault list is simulated with a large number of test patterns. In this case, the fault simulation time is dominated by the fault-free simulation time (a lower bound). Thus, partitioning the test set will result in a significant improvement in the fault simulation time, since the partitions will be simulated with smaller test sets.
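The speedup metric behind Figure 8 is the one-server simulation time divided by the turnaround time, i.e., the slowest subtask. The sketch below uses illustrative times (not the measured values from the figure) to show how paging on a single overloaded server can produce an apparently superlinear speedup.

```python
# Speedup on N servers = time on one server / turnaround time.

def speedup(t_one_server, subtask_times):
    """Turnaround time is set by the most time-consuming subtask."""
    return t_one_server / max(subtask_times)

# Hypothetical superlinear case: a single server pages heavily
# (120 min), while four servers whose fault lists fit in memory
# finish their subtasks in about 20 min each.
print(round(speedup(120, [20, 19, 18, 20]), 1))   # 6.0 on 4 servers
```

A speedup of 6 on four servers is only possible because the single-server baseline is paying a paging penalty that the partitioned runs avoid, which is exactly the explanation the article gives.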
Fault tolerance

Fault simulation is a time-consuming process that requires a considerable amount of computing resources. The distributed approach can speed up this process by using a number of workstations connected via a LAN. As the number of servers increases, the probability of a system failure also increases, in which case the loss of a server may result in the termination of the entire fault simulation task. To avoid such conditions, which result in increased simulation time and cost, DFSim includes capabilities that allow the system to recover from server failures.

Fault tolerance in DFSim is accomplished by using time redundancy.¹¹ Time redundancy involves the repetition of a fault simulation subtask between rollback points upon detection of a system failure in the respective server. Failure detection takes place using existing network services. In 4.3 BSD Unix, each machine broadcasts messages through the network indicating its state, and each machine receives similar messages from other machines denoting their state. If the client has not received such a message from a server, it considers this server to be "down." At each rollback point the current status of the fault simulation subtask is saved in a status database on the master to be used later in case of a failure. The master process that resides in the client machine carries out the fault recovery operations. When the master process detects a failure in a server, it assigns the uncompleted subtask to the next available server using the data saved in stable storage during the last rollback point.
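The failure detection and recovery steps just described can be sketched as follows. The timeout value, class layout, and helper names are assumptions for illustration only; in DFSim the state messages come from existing 4.3 BSD Unix network services, and the checkpoint data resides in the stable status database.

```python
import time

# Sketch of the master's failure-detection and rollback-recovery logic.
# All names and the 30-second timeout are illustrative assumptions.

HEARTBEAT_TIMEOUT = 30.0  # seconds without a state message -> server "down"

class Server:
    def __init__(self, name):
        self.name = name
        self.last_heartbeat = time.time()
        self.subtask = None      # currently assigned simulation subtask
        self.checkpoint = None   # status saved at the last rollback point

def detect_failures(servers, now):
    """Return servers whose periodic state broadcast has gone silent."""
    return [s for s in servers if now - s.last_heartbeat > HEARTBEAT_TIMEOUT]

def recover(failed, available):
    """Reassign each failed server's subtask from its last rollback point."""
    reassigned = []
    for s in failed:
        if s.subtask is not None and available:
            target = available.pop(0)
            target.subtask = s.subtask
            target.checkpoint = s.checkpoint  # restart from stable storage
            reassigned.append((s.name, target.name))
    return reassigned
```

The key property mirrored here is that only the work since the last rollback point is repeated, not the entire subtask.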
The insertion of rollback points, which is fault simulator dependent, is accomplished either by introducing breakpoints in the test set or by dividing the test set into a number of subsets.

Figure 9. Structure of fault simulation processes used for incorporating fault tolerance and dynamic reconfiguration capabilities.

In the latter case and during system initialization, the master process creates a copy of the test set for each one of the available servers. As shown in Figure 9, these test sets are further subdivided into subsets simulated sequentially at each server. When a server completes a subset of the test vector file (a rollback point), it records its status to the status database, and the simulation continues with the next subset until the entire test vector set has been exhausted. This capability, called incremental fault simulation, is possible when the simulator can use the undetected fault list from a previous run as input data to a later run. Also, you can simulate additional vectors without having to restart the simulation. However, incremental fault simulation that uses only the undetected fault list as input is not in itself sufficient for implementing the described fault-tolerant technique. In addition, the fault simulator should be able to save the faulty state of the circuit when a breakpoint is encountered. The faulty state applies only to sequential circuits, and it consists of the fault lists at the outputs of the memory elements in the circuit.

Dynamic reconfiguration

As mentioned earlier, dynamic reconfiguration of subtasks is important for minimizing fault simulation time. The same techniques used for recovering from system failures underlie dynamic reconfiguration. The rollback points placed in the test set of each server divide each simulation subtask into a number of fault simulation sessions (see Figure 9). During dynamic reconfiguration the fault simulation sessions assigned to a workstation can migrate to other servers based on processing power and server availability information. The migration of fault simulation sessions was preferred over repartitioning of the fault list because the network becomes imbalanced only during the late stages of the fault simulation, when the number of undetected faults is not significant. In this case the fault-free simulation time dominates the fault simulation time and, for this reason, further partitioning of the fault list does not result in any improvement.

Table 2. Performance improvement of the distributed facility using dynamic reconfiguration (configurations with three to six servers; fault simulation sessions executed per workstation, resource utilization, and improvement in turnaround time, with and without reconfiguration).

The master process executes the reconfiguration operation once a server completes its fault simulation sessions. In this case the master process examines the network to identify the most loaded server (the server that has to execute the maximum number of sessions) and executes the reconfiguration algorithm. At this point the master process creates a copy of the current undetected fault list, which the target server will use to restart a fault simulation session. It also creates a copy of the state of the circuit (in the case of sequential circuits) for the fault simulation session that will migrate to the free server.
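A minimal sketch of the reconfiguration decision follows, under the assumption that the master tracks the pending sessions of each server in a dictionary; the data structure and names are hypothetical, not DFSim's internal representation.

```python
# Sketch of the reconfiguration step: when a server becomes free, the master
# locates the most loaded server (the one with the most pending sessions)
# and migrates one session to the free server. Names are illustrative.

def reconfigure(pending_sessions, free_server):
    """pending_sessions maps a server name to its list of pending sessions.

    Moves one session from the most loaded server to free_server and
    returns the migrated session, or None if no migration is worthwhile.
    """
    busiest = max(pending_sessions, key=lambda s: len(pending_sessions[s]))
    if len(pending_sessions[busiest]) <= 1:
        return None  # nothing to gain: the busiest server is nearly done
    session = pending_sessions[busiest].pop()        # take a pending session
    pending_sessions.setdefault(free_server, []).append(session)
    return session

loads = {"vax1": ["s1", "s2", "s3"], "vax2": ["s4"]}
moved = reconfigure(loads, "sun1")
# moved -> "s3"; loads -> {"vax1": ["s1", "s2"], "vax2": ["s4"], "sun1": ["s3"]}
```

In DFSim the migrated session restarts on the target server from the current undetected fault list and, for sequential circuits, from the saved circuit state.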
Therefore, when a server reaches a rollback point, it saves the state of the circuit if no such action has been performed by other servers. To demonstrate the performance of the distributed facility with the dynamic reconfiguration capability, we selected a heterogeneous network. The network consisted of a Sun-3/160 (with 16 megabytes of memory and a 16.67-MHz clock), a Sun-3/60 (with 20 megabytes of memory and a 20-MHz clock), a cluster of VAX 2000 (with six megabytes of memory each), and a cluster of VAX II/GPX (with five megabytes of memory each) workstations. We modified the test set of the multiplier by inserting five rollback points (one rollback point per 20 test vectors) and simulated it using a heterogeneous network with three, four, five, and six servers. Table 2 shows the network configuration along with the number of fault simulation sessions executed in each workstation. From this table, you can see the significant improvement in resource utilization obtained using dynamic reconfiguration capabilities. This improvement results from migrating fault simulation sessions from slower workstations to faster ones, thus increasing the utilization of the network and at the same time decreasing the turnaround time, which is determined by the slowest server.

An important issue related to the fault tolerance and dynamic reconfiguration operations is the placement of the rollback points. Frequent use of rollback points achieves better load balancing, but it may also increase the simulation time because the simulator spends a significant amount of time updating the status database. Also, frequent use of rollback points increases the disk requirements in the case of large sequential circuits because the state of the circuit has to be saved at the end of each fault simulation session.
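The division of a test vector file into rollback subsets can be sketched as follows; the helper is a hypothetical illustration of the one-rollback-point-per-20-vectors arrangement used in the experiment above, not the format DFSim actually writes.

```python
# Sketch of rollback-point insertion by splitting the test vector file into
# fixed-size subsets; a rollback point follows each subset. The helper name
# and the integer "vectors" are illustrative assumptions.

def split_test_set(vectors, vectors_per_subset):
    """Return the list of test subsets delimited by rollback points."""
    return [vectors[i:i + vectors_per_subset]
            for i in range(0, len(vectors), vectors_per_subset)]

subsets = split_test_set(list(range(100)), 20)
# 100 vectors with one rollback point per 20 vectors -> 5 subsets
```

Choosing the subset size trades off the load-balancing and recovery granularity against the status-database and disk overhead discussed above.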
The insertion of breakpoints is based on information related to the complexity of the circuit and the processing power of the available computing resources.

A different approach, meant to minimize the load imbalancing effects, has been implemented in some commercially available fault simulators and in the Chiefs fault simulator.¹² This approach divides the entire fault simulation task into a large number of smaller subtasks, assigned to different nodes based on the availability of servers. Once a server completes a subtask, the client assigns a new partition to the free server. This approach has the disadvantage that you must consider small fault partitions to achieve good load balancing. As a result, it cannot exploit the maximum performance of the concurrent fault simulators. This can be achieved by considering subtasks with as many faults as possible such that paging to disk remains at low levels. This load balancing approach resembles the multipass capability of fault simulators, where the assigned undetected fault list is divided into partitions and only one partition is simulated at a time. This, as shown earlier, degrades the performance of the fault simulators (refer to Figure 5). The dynamic reconfiguration implemented in DFSim allows the servers to simulate larger partitions, thus reducing the overall simulation time.

The foregoing discussion demonstrates that a significant reduction in the fault simulation time of complex circuits is possible using a distributed approach in which a fault simulation task is partitioned into subtasks consequently assigned to the nodes of a distributed network for execution. You can achieve this using existing computing resources without resorting to special-purpose, high-cost hardware. The superlinear speedup observed in medium- and large-size circuits results from reducing the significant memory requirements of concurrent fault simulators.
The distributed fault simulation approach described here, which is implemented in the DFSim facility, has significant advantages over current commercially available simulators. The main advantage of the proposed approach is the ability to operate in a heterogeneous computing environment. The dynamic reconfiguration implemented in DFSim maintains load balancing, which increases utilization of the available resources and improves the overall performance of the described method.

References

1. D.R. Schertz and G. Metze, "A New Representation for Faults in Combinational Digital Circuits," IEEE Trans. Computers, Vol. C-21, Aug. 1972, pp. 858-866.
2. S. Seshu, "On an Improved Diagnosis Program," IEEE Trans. Electronic Computers, Vol. EC-14, 1965, pp. 76-79.
3. E.G. Ulrich and T. Baker, "Concurrent Simulation of Nearly Identical Digital Networks," Computer, Vol. 7, Apr. 1974, pp. 39-44.
4. T.W. Williams and K.P. Parker, "Design for Testability - A Survey," Proc. IEEE, Vol. 71, No. 1, Jan. 1983, pp. 99-100.
5. K. Son, "Fault Simulation with the Parallel Value List Algorithm," VLSI Systems Design, Vol. VI, No. 12, Dec. 1985.
6. T. Blank, "A Survey of Hardware Accelerators Used in Computer-Aided Design," IEEE Design and Test, Aug. 1984, pp. 21-39.
7. D.K. Pradhan, ed., Fault-Tolerant Computing: Theory and Techniques, Vol. I, Prentice Hall, Englewood Cliffs, N.J., 1986, pp. 234-260.
8. W. Joy et al., 4.2 BSD System Manual, Computer Science Research Group, Dept. of Electrical Engineering and Computer Science, Univ. of California at Berkeley, July 1983.
9. D. Comer, Internetworking with TCP/IP: Principles, Protocols, and Architecture, Prentice Hall, Englewood Cliffs, N.J., 1988.
10. B. Warren et al., "Distributed Computing Using RTI's Freedomnet in a Heterogeneous Unix Environment," Proc. 1987 Uniforum Conf., Jan. 1987.
11. P.K. Lala, Fault Tolerant and Fault Testable Hardware Design, Prentice Hall Int'l, London, 1985, pp. 103-107.
12. P.A. Duba et al., "Fault Simulation in a Distributed Environment," Proc. 25th ACM/IEEE Design Automation Conf., 1988, pp. 686-691.

Tassos Markas is a research engineer with the Center for Digital Systems Research at the Research Triangle Institute and a research assistant in the Computer Science Department at Duke University. His research interests include parallel and fault-tolerant architectures, application-specific IC design, testing, and distributed systems. Markas received his BS in physics in 1985 from the University of Athens in Greece and his MS in electrical engineering from Duke University in 1988. He is currently a PhD student in the Electrical Engineering Department at Duke University. He is a member of the IEEE Computer Society and the ACM.

Mark Royals is a research engineer with the Center for Digital Systems Research at the Research Triangle Institute. His research interests include VLSI design and test, test generation, fault simulation, design for testability, and built-in self-test techniques. Royals received his BS and MS degrees in electrical engineering from North Carolina State University in 1985 and 1987, respectively. He is a member of Eta Kappa Nu.

Nick Kanopoulos is manager of the VLSI Design and Test Group in the Center for Digital Systems Research at the Research Triangle Institute and adjunct assistant professor in the Electrical Engineering Department at Duke University. His main research activities cover application-specific IC design using silicon and gallium arsenide technologies, design for testability and built-in self-test techniques, and fault-tolerant system design. Kanopoulos received the EE degree from the University of Patras in Greece in 1979 and the MS and PhD degrees in electrical engineering from Duke University in 1980 and 1984, respectively. He is a member of Tau Beta Pi, Eta Kappa Nu, the American Association for the Advancement of Science, and the Technical Chamber of Greece.
Readers can contact the authors at Research Triangle Institute, Center for Digital Systems Research, PO Box 12194, Research Triangle Park, NC 27709.