
Via wearout detection with on-chip monitors

2010, Microelectronics Journal

This project aims to detect the onset of chip failure due to via voiding through monitoring the delays of paths in a chip. The proposed method relates the probability of failure of individual vias to an increase in delay for monitors of the system using data for 65 nm technology. The delay increase, as a function of the failure distribution parameters, the path length, gate type, and process variation, has been investigated. An on-chip, ring oscillator-based wearout monitoring circuit is presented. The proposed scheme monitors the delay through a data path using a delay detection circuit (DDC).

Microelectronics Journal 41 (2010) 789–800
Journal homepage: www.elsevier.com/locate/mejo

Fahad Ahmed, Linda Milor
Georgia Institute of Technology, School of Electrical and Computer Engineering, Atlanta, GA 30332, USA

Article history: Received 20 September 2009; accepted 18 January 2010; available online 11 February 2010.

Keywords: Electromigration; Built-in self-test; Reliability monitor

© 2010 Elsevier Ltd. All rights reserved.

1. Introduction

Many applications, ranging from automatic flight control systems, nuclear power control systems, and on-line transaction processors for financial institutions to hospital patient monitors, require systems to be extremely reliable. Fault-tolerant systems require finding a way to prevent a physical defect or failure from causing an error in system performance. Such systems incorporate fault detection, fault masking, fault isolation, and recovery procedures. Fault-tolerant design at the circuit level focuses primarily on fault detection. The other aspects of fault tolerance (fault location, system reconfiguration, and recovery) are generally performed off-chip by the system architecture and software.
The typical approach to fault detection on-chip involves the introduction of redundancy in space, time, and information, at the cost of extra computational time, extra hardware components, or both. These approaches degrade yield (with a concomitant increase in manufacturing cost) and raise power dissipation, since all modules are operated in parallel. This paper aims to demonstrate an alternative approach, involving detection of the onset of failure through detection of component degradation over time. If component degradation can be detected, the cost of building highly reliable systems would be reduced. The additional area needed for fault detection is limited to the design-for-test (DfT) circuitry, which is a small fraction of the component being monitored. The cost of operating such a system is similarly reduced, because the DfT circuitry dissipates power only when it is turned on intermittently during test.

Corresponding author: L. Milor. Tel.: +1 404 894 4793; fax: +1 404 898 0677. doi:10.1016/j.mejo.2010.01.006

A wide variety of faults can cause system failures, ranging from static to transient faults. This paper focuses on faults due to component wearout. The major causes of product wearout are electromigration, gate oxide breakdown, hot carrier injection, negative bias temperature instability, backend dielectric breakdown, and stress migration [1]. This paper focuses on the detection of electromigration only. The proposed method detects electromigration through increased delays in data paths. However, it should be noted that many other failure mechanisms can impact the delays of data paths. Hence, our results will only show that delay increases provide a quantifiable indication of electromigration in the absence of other failure mechanisms.
Future work will look at the use of additional monitors, in combination with the proposed monitors, to attempt to determine the cause of failure.

This paper is organized as follows. In the next section, via degradation is related to path delay. Section 3 considers the sensitivity of this relationship to parameters describing the failure rate distribution of vias, chain length, the composition of the data path in terms of gate types, and process variation. Section 4 presents the proposed detection circuit. Section 5 considers the problems caused by power supply noise, jitter, and temperature variation and potential solutions to these problems. Section 6 compares this scheme with a conventional ring oscillator-based monitoring scheme. Section 7 concludes the paper with a summary.

2. Relating via degradation to path delay

2.1. Calculation of the via cumulative probability densities of failure

Stress-induced electromigration has been identified as a key concern in interconnect reliability, due to the continued reduction in via dimensions and the resulting increase in current density with each technology generation. This has adversely affected the electromigration lifetime of interconnects, particularly vias and contacts, due to the higher current densities through them [2,3]. Electromigration is caused by the transfer of momentum from the moving "electron wind" to ions in the metallic lattice [4]. It results in the transport of these metallic ions into the neighboring material, causing an increase in resistance. Two different voiding features have been identified at the via-line interface [3]. Depending on the direction of current flow, void formation could occur on the via side of the line liner or on the line side. Once big enough to be electrically detectable, void formation translates into an increase in the delay through the data path.
Thus, by constantly monitoring the delay of a path, it is possible to detect the onset of failure of the data path.

Median time-to-failure (MTF) of a via is a strong function of current density. Black's equation [4] has been extensively used for electromigration lifetime modeling. According to this model, the MTF of a conducting path is inversely proportional to the nth power of the current density, as follows:

$$\mathrm{MTF} = A J^{-n} \exp\left(\frac{\Phi}{KT}\right), \qquad (1)$$

where J is the current density, Φ is the activation energy, T is temperature, K is Boltzmann's constant, and A is a constant. In this study, the activation energy and the value of n were calculated using experimental data from [5].

As a generic example, the data path considered is made up of a chain of inverters, and each node is assumed to contain a via, as shown in Fig. 1. Via dimensions are assumed to be typical via dimensions for 65 nm technology [6]. Therefore, the current density through each via is only a function of the current through the node in which it is located. The inverter chain was simulated as a ring oscillator. Table 1 shows the calculated MTFs for vias at different positions in the inverter. It can be seen that the vias at the sources have the shortest expected lifetime due to the larger current through them.

Table 1. MTF for vias of the inverter at the nodes indicated in Fig. 1.

Via          PMOSS    NMOSS    PMOSG     NMOSG     OUTVIA
MTF (years)  5.2040   5.1721   18.8620   114.707   11.6598

Fig. 2. CDF of vias at different nodes of an inverter for β = 1.2 (CDF vs. time in years; curves for PMOSS, NMOSS, OUTVIA, PMOSG, and NMOSG). The difference in the failure rates is due to the difference in the current through each node.

The Weibull cumulative distribution function (CDF) for the time-to-failure is used to model the probability of failure, P(tfail), of these vias as a function of time:

$$P(t_{\mathrm{fail}}) = 1 - \exp\left(-\left(\frac{t_{\mathrm{fail}}}{\eta}\right)^{\beta}\right). \qquad (2)$$

The Weibull distribution is described by two parameters, η and β, which are the characteristic lifetime and shape parameter, respectively.
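As a rough numerical companion to Black's equation (1) and the Weibull CDF (2), the following minimal Python sketch evaluates both. The function names, the exponent n = 2, the activation energy of 0.9 eV, and the temperature are illustrative assumptions only; the values actually fitted from [5] are not given in the text.

```python
import math

K_EV_PER_KELVIN = 8.617333262e-5  # Boltzmann's constant in eV/K


def black_mtf(j, a, n, phi_ev, temp_kelvin):
    """Black's equation: MTF = A * J**(-n) * exp(phi / (K * T))."""
    return a * j ** (-n) * math.exp(phi_ev / (K_EV_PER_KELVIN * temp_kelvin))


def weibull_cdf(t, eta, beta):
    """Weibull probability that a via has failed by time t."""
    return 1.0 - math.exp(-((t / eta) ** beta))


# Assumed, illustrative parameters (not taken from the paper).
n_assumed = 2.0
ratio = black_mtf(2.0, 1.0, n_assumed, 0.9, 378.15) / black_mtf(
    1.0, 1.0, n_assumed, 0.9, 378.15
)
print("MTF ratio when J doubles:", ratio)  # equals 2**(-n) for any A, phi, T
print("P(fail) at t = eta:", weibull_cdf(5.0, 5.0, 1.2))  # 1 - 1/e for any beta
```

Only the ratio of two MTFs is printed because the constant A cancels there, so the sketch does not depend on a value the text does not supply.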
The median time-to-failure is

$$\mathrm{MTF} = \eta\,(\ln 2)^{1/\beta}. \qquad (3)$$

Substituting (3) into (2) enables us to relate the MTF of a via to its probability of failure. Fig. 2 shows the CDFs of the vias of an inverter.

2.2. Relationship between resistance and failure

The data path has been assumed to consist of an inverter chain with 18 inverters. The initial via resistances, R0, for all vias in all of the inverters, were assumed to be normally distributed. Experimental results [3] show that under stressed conditions, resistance increases very slowly, until a point where there is a jump in the via resistance. After the jump, the resistance either increases gradually or there is an immediate open circuit at the via-line contact. This resistance jump was explained as the occurrence of a void at the contact. Once the void has formed, the resistance increases faster than it did initially. The resistance jump is critical, since it indicates the point after which a via could break at any time. The resistance value after the jump was taken as the point at which a via fails (RFail). Li et al. [3] showed that the distribution of this value is a function of the time-zero resistance and is approximately distributed around a mean increase of 0.2R0 for most via configurations, i.e.

$$R_{\mathrm{FailMean}} = 1.2 R_0. \qquad (4)$$

For each via, RFail is assumed to be normally distributed with a mean of 1.2R0.

Fig. 1. The inverter chain consists of inverters, like the one shown, with a via at every node.

Under normal operating conditions it is assumed that the via resistance is modeled as a time-dependent exponential function:

$$R(t) = R_0 e^{t/\Gamma}, \qquad (5)$$

where Γ models the rate of increase of the effective resistance as a function of time. The time to fail for each via (tfail) is calculated from the Weibull curve. Values for RFail and tfail are related by Eq. (5):

$$1.2 R_0 = R_0 \exp\left(\frac{t_{\mathrm{fail}}}{\Gamma}\right). \qquad (6)$$

Hence, we can calculate Γ for each via in the inverter chain:

$$\Gamma = \frac{t_{\mathrm{fail}}}{\ln(1.2)} \qquad (7)$$

and, solving Eq. (2) for tfail and substituting Eq. (3),

$$\Gamma = \frac{\mathrm{MTF}\,\left(-\ln(1-P)\right)^{1/\beta}}{\ln(1.2)\,(\ln 2)^{1/\beta}}. \qquad (8)$$

Accounting for randomness involves assigning a probability, P ∈ [0, 1], to each via based on a uniform distribution. This probability, P, impacts Γ for each via, according to (8). Eq. (5) is then used to model the time evolution of each via resistance in the inverter chain. Fig. 3 shows the increase in resistances of the 90 vias in the inverter chain with respect to time for one inverter chain instance. Because P is a random variable, each inverter chain will have a different distribution of resistances vs. time.

Fig. 3. Simulated values of resistance for all vias in an inverter chain plotted against time.

Fig. 4. Number of vias exceeding 1.2R0 as a function of time. PFail is the fraction of vias that have failed in the inverter chain, which varies from zero to one.

2.3. Relationship to inverter chain delay

The inverter chain has numerous vias exceeding RFail at various times. Fig. 4 plots the number of vias exceeding 1.2R0 vs. time. An inverter chain with via resistances varying as shown in Fig. 3 was simulated to find the delay through the chain. The result is shown in Fig. 5. Fig. 6 combines Figs. 4 and 5, by plotting ΔDelay vs. the number of failing vias. We see that changes in delay greater than 2 × 10^-14 s can clearly indicate the presence of an electromigration problem that impacts the vias. This corresponds to detecting faulty chains at the point when only one or two vias have failed.

Fig. 5. Delay through the inverter chain (delay in seconds vs. time in years). With increasing time, the via resistances increase, causing an increase in delay.

It can be noted from Fig. 5 that delay normally changes by 0.9% in the course of 5 years. Some of the sources of variation in delay not related to wearout include on-chip temperature variation, IR drops, and crosstalk. Consider, for example, temperature: over an operating temperature range from 0 to 105 °C, the delay of the inverter chain varies by 150 ps. This swamps out degradation due to wearout. Hence, it is important that the operating conditions are reproduced exactly for each test of delay degradation. The ability to reproduce operating conditions, and hence temperature profiles, IR drops, and crosstalk, limits the resolution of the method.

Delay increases due to other mechanisms that impact transistors, i.e. hot carrier injection and negative bias temperature instability, are likely to be more significant than delay increases due to electromigration in vias. Nevertheless, there could be processes with electromigration weaknesses which need to be monitored. Moreover, the delay increase trigger is an indicator of a potential reliability failure of any kind. Hence, if we have other screens to determine the likely cause of failure (see footnote 1), delay increases can be linked to the number of failing vias in the chain and the fraction of failing vias, PFail, in accordance with Fig. 6.

Footnote 1. Screens that can distinguish between electromigration and transistor wearout mechanisms could be a set of heavily loaded overstressed ring oscillators, some with large transistors driving single vias and some with the same transistors driving multiple vias, so that electromigration is unlikely.

Fig. 6. Relationship between the change in delay through the chain and the probability of failure of the inverter chain (ΔDelay in seconds vs. the number of vias exceeding 1.2R0 and PFail).

2.4. Estimation of the number of failures for a full chip

Since a circuit can have billions of paths, it is not possible to monitor all paths in a circuit.
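As an aside on Sections 2.2 and 2.3, the via resistance model can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the time to fail is obtained by inverting the Weibull CDF of Eq. (2) with the median relation of Eq. (3), the five MTFs are those of Table 1 repeated for 18 inverters (90 vias), and the function names, the random seed, and the clipping of P away from 0 and 1 are hypothetical choices made for this sketch, not the authors' simulation flow.

```python
import math
import random

# Table 1 MTFs (years) for the five vias of one inverter.
INVERTER_MTFS = [5.2040, 5.1721, 18.8620, 114.707, 11.6598]


def time_to_fail(mtf, beta, p):
    """Invert the Weibull CDF: time at which the failure probability equals p."""
    eta = mtf / math.log(2.0) ** (1.0 / beta)  # characteristic lifetime from the median
    return eta * (-math.log(1.0 - p)) ** (1.0 / beta)


def gamma_from_time_to_fail(t_fail):
    """Eq. (7): rate constant such that R reaches 1.2*R0 at t_fail."""
    return t_fail / math.log(1.2)


def resistance_ratio(t, gamma):
    """Eq. (5) divided by R0: R(t)/R0 = exp(t / gamma)."""
    return math.exp(t / gamma)


def sample_fail_times(n_inverters, beta, rng):
    """Draw one uniform probability per via and convert it to a time to fail."""
    times = []
    for _ in range(n_inverters):
        for mtf in INVERTER_MTFS:
            # Clip away from 0 and 1 so the logarithm and later divisions stay finite.
            p = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
            times.append(time_to_fail(mtf, beta, p))
    return times


def vias_exceeding(fail_times, t):
    """Number of vias whose resistance has reached 1.2*R0 by time t."""
    return sum(1 for t_fail in fail_times if t >= t_fail)


rng = random.Random(0)  # assumed seed, for repeatability only
fail_times = sample_fail_times(18, 1.2, rng)
for year in range(6):
    print(year, vias_exceeding(fail_times, float(year)))
```

Counting vias with t >= tfail is equivalent to counting vias with R(t) >= 1.2R0, because Γ is defined so that the two conditions coincide.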
Let s be the fraction of the total number of vias in the circuit that are monitored. If a circuit has 100,000 vias and 100 are monitored, s would be 0.001. Let us suppose that a set of monitors provides us with data on the delay increase for each monitor. The relationship between delay increase and the number of failing vias (Fig. 6) then determines the expected number of failed vias in each chain, n_i. These numbers are summed to provide the total number of failed vias in the monitors, i.e.

$$f = \sum_{i=1}^{m} n_i,$$

where m is the number of monitors. The total number of failed vias in the full chip is f/s, provided that the monitors are randomly placed throughout the chip. Hence, if we detect only two fails based on the monitors, and s = 0.001, then there are likely to be 2000 via failures in the full chip.

However, it should be noted that the failure rate is an exponential function of temperature. Hence, one can get a stronger signal relating to wearout if paths are selected in areas of the chip with high activity and high temperature. In this case, monitors in each sector, j, would be used to estimate the number of failures in that sector, and the results would be summed to estimate the number of failures for the full chip, i.e.

$$\sum_{j} \frac{f_j}{s_j}. \qquad (9)$$

3. Sensitivity of delay changes

3.1. Sensitivity to the failure rate distribution function

Process variations can cause a shift in the parameters of the Weibull distribution, which describes the failure rate distribution for a specific chip. The Weibull distribution has two parameters, η and β. In this section, we investigate the sensitivity of changes in delay to changes in these parameters. We consider variations in β and in MTF (which varies η). To investigate the effect of variations in MTF on the number of via failures and ΔDelay, the value of A in Eq. (1) is changed, which in turn causes variation in the values of MTF. Table 2 shows the MTFs for three different cases. Similarly, we have investigated the effect of variations in β.
Table 3 shows the MTFs for three different cases, where variation in β is considered. For all of the above cases, the corresponding Weibull curves were computed using Eq. (2). Then the corresponding via resistances as a function of time were computed. The number of vias exceeding 1.2R0 for the three cases involving varying MTF and β in Tables 2 and 3 were computed, together with the corresponding increases in delay through the chain. The change in delay as a function of the number of failing vias is shown in Figs. 7 and 8, considering variation in MTF and β, respectively. These figures indicate that the relationship between delay increase and the number of failing vias is not sensitive to the parameters describing the probability density function of the via failure rate. Hence delay changes are a strong indicator of the number of failed vias at any given time.

Table 2. MTFs for vias of an inverter for three different values of A (in years).

Via       Case 1   Case 2   Case 3
PMOSS     5.20     3.08     13.12
NMOSS     5.17     3.06     13.04
PMOSG     18.86    67.92    47.58
NMOSG     114.70   101.80   289.40
OUT-VIA   11.64    11.16    29.39

Table 3. MTFs for vias of an inverter for three different values of β (in years).

Via       β = 1.2   β = 1.4   β = 1.6
PMOSS     5.20      1.08      0.48
NMOSS     5.17      1.08      0.48
PMOSG     18.86     3.27      1.35
NMOSG     114.7     15.4      5.72
OUTVIA    11.64     2.16      0.91

Fig. 7. Change in delay vs. PFail for the cases with varying MTF.

Fig. 8. Change in delay vs. PFail for the cases with varying β.

3.2. Sensitivity to chain length

The same procedure was repeated for the analysis of the effect of inverter chain length (and hence data path length). Inverter chains with 18, 28, and 38 inverters were considered. The result in Fig. 9 indicates that a change in delay is an indicator of the number of failing vias, independent of chain length.

Fig. 9. Change in delay plotted vs. the fraction of vias whose resistance has exceeded 1.2R0 for chains of different lengths.

3.3. Sensitivity to gate type

Data paths consist of gates other than inverters. Hence, it is important to study how the gate type composition affects the results. A data path consisting of alternating NAND-NOR gates was considered with the via configuration shown in Fig. 10. For the NAND gates, the nodes with DPGA and DNGA are connected to the supply rail, while the nodes DPGB and DNGB switch. Similarly, for the NOR gates, the nodes with RPGA and RNGA are the switching nodes, while the other inputs are connected to ground. The current through all nodes was used to find the failure rate distributions for each of the vias. As with the inverter chain, the delay sensitivity function vs. the number of vias exceeding 1.2R0 is insensitive to variations in Weibull parameters.

Fig. 10. The data path under test consisted of alternating NAND-NOR gates with vias at the nodes shown.

The NAND/NOR chain has 1.6 times as many vias as the inverter chain of the same length. Hence, the number of vias is slightly greater than that of an inverter chain of length 28 and less than that of an inverter chain of length 38. Fig. 11 compares the delay sensitivity curves for the NAND/NOR chain with inverter chains of length 28 and 38. The increased intrinsic capacitances at the output nodes of NAND/NOR gates lead to greater delay increases for the same number of failing vias. Hence the relationship between ΔDelay and the fraction of failed vias, established so far for similar data paths, does not hold when we vary the gate type, unless we take into account gate node capacitances.

Fig. 11. A comparison between the delay sensitivity function for the NAND/NOR chain with 18 stages and inverter chains with 28 and 38 stages.

To better understand the impact of gate type on the delay sensitivity relationship, consider the Elmore delay model

$$\mathrm{Delay} = \tau \sum_{n=1}^{k} C_n R_n, \qquad (10)$$

where C_n is the node capacitance at each node, R_n is a combination of the driver's resistance and node resistance, k is the data path length, and τ is a constant. In the presence of variation in via resistance, only R_n changes. Let δR_n be the degradation in node resistance due to via degradation. ΔDelay can then be approximated as follows:

$$\Delta\mathrm{Delay} = \tau \sum_{n=1}^{k} C_n\, \delta R_n. \qquad (11)$$

The intrinsic node capacitance C_n is a function of both the gate type and the transistor size. C_n mainly consists of the overlap and junction capacitances, and both are assumed to increase linearly with transistor size. Let k_0 be the number of nodes in an inverter chain, with capacitance C_l, l = 1, …, k_0. Let k be the number of nodes in a path containing arbitrary gates. At each of these nodes, n = 1, …, k, j_n is the number of nodes in the charge and/or discharge path, which depends on the gate stack size in the pull-up and/or pull-down network, and hence the gate type. Let C_{n,m}, n = 1, …, k and m = 1, …, j_n, be the node capacitances associated with the mth node in the pull-up and/or pull-down network of the nth gate in the path. The capacitance ratio to the inverter chain is the following:

$$\gamma = \frac{\sum_{n=1}^{k} \sum_{m=1}^{j_n} C_{n,m}}{\sum_{l=1}^{k_0} C_l}. \qquad (12)$$

Therefore, the relationship between the delay sensitivity function for an arbitrary gate, ΔDelay, and the delay sensitivity function for a reference inverter chain, ΔDelay_inv, is

$$\Delta\mathrm{Delay} = \gamma \cdot \Delta\mathrm{Delay}_{\mathrm{inv}}. \qquad (13)$$

Table 4 contains a variety of capacitance ratios for the gates considered. The differences in the capacitance ratios are partially due to gate sizing. In Fig. 12 we plot ΔDelay/γ for a variety of gates. From this figure it can be concluded that for an arbitrary data path, given γ and the delay sensitivity function for the inverter chain, ΔDelay_inv, it is possible to estimate the number of failing vias, irrespective of the type of gates that make up the data path.

Table 4. Capacitance ratios for a variety of gates.

Gate/chain                                              γ
NOT                                                     1
NAND                                                    3.5
NOR                                                     2.75
NAND-NOR (9 NAND and 9 NOR)                             3.125
NAND-NOR-NOT (28 NOT, 9 NAND, 9 NOR, randomly placed)   1.23

Fig. 12. Increase in the normalized delay of the data path with varying types of gates and length plotted against PFail.

3.4. Sensitivity to process variations

Two major sources of process variations are the threshold voltage [7] and channel length [8–10]. Variation in threshold voltage is due to statistical fluctuation in the number of dopant atoms per unit volume in the device channel. Channel length variation is due to line edge roughness, which is induced by the polymer characteristics of photoresist, and systematic variation in lithography (proximity effect, lens aberrations, and flare) [11]. Our analysis considers random variation in threshold voltage and channel length.

Figs. 13 and 14 show that there is very little impact of random process variation on the delay sensitivity curves. The impact is more pronounced in the presence of channel length variation. Hence, it may be desirable to include capacitance compensation for channel length variations, as suggested in the previous section. This is because channel length variations can be large, and they impact both transistor drive strength and load capacitances.

Fig. 13. Change in delay plotted vs. the number of failing vias and PFail for a random sample of threshold voltages. For 65 nm technology, the standard deviation was assumed to be 30 mV.

Fig. 14. Change in delay plotted vs. PFail for a random sample of channel lengths. For 65 nm technology, the standard deviation was assumed to be 20 nm.

4. DPRO-based delay detection

In this section, we present the design of an on-chip delay monitoring system to detect changes in the delay of a data path being monitored.

4.1. System design and operation

The proposed scheme is illustrated in Fig. 15. Data paths in a chip can be turned into oscillators, whose frequency provides a measurement of delay. A path is either an inverting data path, which starts oscillating if connected in feedback, or a non-inverting path, which requires an additional inverter to oscillate. A set of data paths can be converted to ring oscillators by inserting muxes in the paths, together with an additional feedback inverter for the non-inverting paths. The additional inverter in non-inverting paths, together with the muxes, adds delay, but since we are only interested in measuring delay degradation, determining the exact delay of a path is not important.

During normal operation the path is connected to the regular input and output data lines. In test mode (TM), the data path is disconnected from its data in/out and is connected in a feedback loop, which may or may not contain an inverter, to form a data path ring oscillator (DPRO), shown in Fig. 15. The frequency of oscillation is used to monitor path degradation.

This implementation poses a few problems when we take into account the fact that the data paths are often a part of a pipelined stage with very critical delay margins.
A single path cannot go into TM without disturbing the whole pipeline. Similarly, the performance overhead introduced by the data selection blocks can also be critical and may cause failures if present in critical data paths. To address these issues, consider the self-test scheme shown in Fig. 16. The figure shows a normal pipeline stage with the path under test (PUT) highlighted. During TM, clock gating halts the pipeline, ensuring that no erroneous data is fed out during TM. Two specially designed test registers (T-R) are inserted in place of conventional registers (Reg) for the monitored data path. During TM, the test registers disconnect the PUT from the preceding and subsequent data paths and connect the feedback path. To minimize physical overhead, another selection block is introduced to choose among the different monitored paths. The target PUT is connected by the selection block to the DDC for delay monitoring. The T-Rs were designed to minimize the performance overhead during the normal operation of the data path. Since the register at the input of the data path has a slightly different function from the register at the output of the data path during TM, two separate registers were designed, as shown in Fig. 17. During normal operation, the T-R selection blocks operate as conventional inverters, as a standard master–slave flip–flop. The operation of the T-Rs during TM is shown in Fig. 18. As the PUT enters TM, the pipeline clock signal (CL) is grounded, halting all operations in the pipeline. The master latch is disconnected from the slave latch, and the T-gates at the input of the T-Rs are turned on. The slave latch of the input T-R and the master latch of the output T-R have data selection blocks which disconnect the latch configuration and connect the inputs to output lines A and B. The data path is thus connected in the DPRO configuration and starts oscillating. 
In order to monitor numerous paths, another selection block is included to connect the selected PUT to the DDC. Hence a single DDC block is sufficient for the entire system being monitored, as illustrated in Fig. 16.

Fig. 15. Data path ring oscillator (DPRO) operating in Test Mode.

Fig. 16. The DDC-based self-test scheme.

Fig. 17. Test registers designed for minimum performance overhead.

Fig. 18. T-R and data path configuration of the DPRO during TM.

4.2. Initiating oscillations in complex data paths

In general, a data path may not have a simple inverting or non-inverting relationship between one input and one output. Consider a multiple-input (X1, X2, X3, …, Xn) and multiple-output (f1, f2, f3, …, fn) data path. Let us assume that one of the output functions of this data path (fx) is to be monitored for wearout. The output fx might be a function of multiple inputs, and therefore could be '1' or '0' depending on the values at the inputs. Let S1 be a set of input values that gives a logic '1' at output fx and let S0 be a set of input values that gives a logic '0', i.e., fx{S1} = 1 and fx{S0} = 0, where S0, S1 ∈ {X1, X2, …, Xn}.

If we eliminate the inputs that have the same value in both the sets S1 and S0, we are left with the inputs (XR) that toggle the function fx and that can make the data path oscillate. Consider the logic function shown in Fig. 19. For this function S1 = {X1 = 1, X2 = 1, X3 = 0} and S0 can consist of all other input combinations. Let us select S0 = {X1 = 0, X2 = 0, X3 = 0}. Then in order to create oscillations, we need to toggle X1 and X2. Moreover, since X1 = X2 = 1 results in fx = 1, and X1 = X2 = 0 results in fx = 0, fx must be inverted before it is fed back to the inputs X1 and X2.

Consider a complex data path in a pipelined stage, shown in Fig. 20. Of all the outputs, suppose we want to monitor one as a selected wearout monitor.
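The input-selection rule of Section 4.2 can be illustrated with a small behavioural sketch in Python. It assumes that fx for the Fig. 19 example is X1 AND X2 AND NOT X3, which is inferred from the statement that S1 is the only input combination producing a '1'; the function names `feedback_plan` and `simulate` are hypothetical, and the model is purely logical with no timing.

```python
def feedback_plan(s1, s0):
    """Return the toggling inputs (XR) and the feedback polarity each one needs.

    An input that is 1 in S1 and 0 in S0 must receive NOT fx ("inverted");
    an input that is 0 in S1 and 1 in S0 must receive fx ("non-inverted").
    """
    plan = {}
    for name in s1:
        if s1[name] != s0[name]:
            plan[name] = "inverted" if s1[name] == 1 else "non-inverted"
    return plan


def simulate(fx, s1, s0, steps):
    """Apply the feedback for a number of steps and record the output fx."""
    plan = feedback_plan(s1, s0)
    inputs = dict(s0)  # start from the input set that gives fx = 0
    trace = []
    for _ in range(steps):
        out = fx(inputs)
        trace.append(out)
        for name, polarity in plan.items():
            inputs[name] = 1 - out if polarity == "inverted" else out
    return trace


def fx_example(v):
    """Assumed logic for the Fig. 19 example: fx = X1 AND X2 AND NOT X3."""
    return int(v["X1"] == 1 and v["X2"] == 1 and v["X3"] == 0)


S1 = {"X1": 1, "X2": 1, "X3": 0}
S0 = {"X1": 0, "X2": 0, "X3": 0}
print(feedback_plan(S1, S0))
print(simulate(fx_example, S1, S0, 4))
```

For the example sets, the plan marks X1 and X2 as needing inverted feedback and leaves X3 untouched, which matches the conclusion reached in the text.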
In TM, the T-R register at the output of fx feeds the selected data path into the function inverting block. The function inverting block senses the value of the output fx and generates a set of inputs to be inverted or non-inverted and fed back into the register file at the input of the data path. For these inputs, the regular registers are again replaced by T-R registers, which feed back the output of the function inverting block into the data path, causing the value at the output fx to toggle. Fig. 21 shows one possible implementation of the function inverting block. Depending on the value of fx, the selection block chooses the appropriate values independently of other selection blocks for inverting fx. It should be noted that this scheme of initiating oscillations in complex data paths leads to insignificant performance overhead due to the insertion of the specially designed T-R registers.

Fig. 19. Logic function (inputs X1, X2, X3; output fx).

Fig. 20. A multiple input, multiple output complex data path in a pipelined stage.

Fig. 21. The function inverting block.

4.3. DDC design

The oscillations of the DPRO output are fed into the DDC. Fig. 22 shows the block level implementation of the DDC. In the DDC, a counter counts the number of oscillations of the DPRO. The other two blocks in the DDC generate the test start and stop signal, PT. PT generation is divided into two parts. The VCO generates pulses of a predetermined frequency. Since the output frequency of the VCO and the VCO size are inversely proportional, an optimal VCO frequency was chosen, keeping in mind the required sensitivity and device overhead. The VCO feeds into a frequency divider, which generates a waveform with a period of 2PT by dividing the frequency of the VCO output. As the PUT enters TM, the DPRO and the VCO are enabled simultaneously.

Fig. 22. The DDC block connected to the DPRO in TM.
The counter is enabled on the rising edge of the signal from the frequency divider and starts counting the number of oscillations of the DPRO. The falling edge of PT disconnects the counter from the DPRO, hence halting the counter. To get an accurate measurement, the duration of oscillation (PT) must be large, to avoid false count triggers due to variations caused by power supply noise and jitter.

The VCO makes the DDC more robust towards device degradation. As the DDC only becomes active intermittently, it is far more likely that the data paths fail before any significant degradation in the DDC itself. However, slight degradation in the DDC over a long period of time would bring about a change in PT. This would unavoidably lead to less accurate count reads. Specifically, small increases in delay would not be detected because of the increase in PT. However, the VCO can be re-tuned to minimize any errors due to DDC degradation.

4.4. PT generation

The VCO is formed by a ring oscillator consisting of current starved inverters, as shown in Fig. 23. VT controls the oscillation frequency of the VCO. VCOs based on current starved inverters can have the problem of slow rise and fall times at low values of VT. They also suffer from glitches at the start of operation. For better PT generation, the VCO was terminated using a Schmitt trigger. A pseudo-static counter was also used as a frequency divider. A K-bit counter can effectively act as an N/2^K frequency divider.

Fig. 24. A single bit of the counter used both as a frequency divider and a frequency monitor.

4.5. Counter design and operation

The counter was designed to be both a frequency divider and a frequency monitor for the DPRO. These blocks of the DDC have slightly different requirements. The accuracy of this scheme is directly dependent on PT, which has to be large to detect very small changes in delay.
This requires the counter bits to hold logical values for long periods of time. The DPRO frequency monitor, on the other hand, has to be fast to measure high data path frequencies for shorter data paths. Finally, the outputs of the counter must hold stable ‘0’s in idle mode to avoid counter output variations due to incorrect counter initiations.

Fig. 24 shows the design of a single bit of the counter. As seen from the figure, a pseudo-static configuration was chosen to meet the requirements of both the frequency divider and the frequency monitor. When not in use, the input of the counter is grounded. The bit enable signal ENc is also kept low, which forces the output to a ‘0’. As the DDC enters TM, ENc is forced high and the input terminal IN is made to oscillate by the VCO or the DPRO. During the high time of the input pulse, the first pass transistor connects the two dynamic inverters together, enabling the initial state X1, fed back from the output, to propagate to the input of the second pass transistor. During the low time, the output of the first dynamic inverter goes to the high-Z state and the two dynamic inverters are disconnected, hence negating the effect of any logic changes due to leakage in the first dynamic inverter. The previous state of the bit is toggled and is passed to the output terminal. Hence, the output toggles for every high-to-low transition of the input, enabling the block to act as a counter.

Fig. 25. Counter output plotted against time (count vs. time in years).

Note that since we are only interested in the change in the counter output, the size of the counter is not important, as long as it is large enough to cover the expected variation in the count from cycle to cycle. At every overflow, the counter resets, thus having no effect on the final results.

4.6. Example

To test the effectiveness of this scheme, the impact of electromigration on a data path consisting of 18 inverters was simulated. In Fig.
25, the DPRO counter output is plotted for the detection of via degradation.

Fig. 23. A ring oscillator consisting of current starved inverters was used as a VCO.

5. Power supply noise, jitter, and temperature variation tolerance

Let us suppose that the number of counts for a good circuit is n. Then, ignoring the effects of power supply noise and jitter, a faulty circuit is detected once its count differs from n by at least one. If we suppose that the frequency of operation of the DPRO is fDPRO, then

$$P_T = \frac{n}{f_{\mathrm{DPRO}}}.$$

The test time needed to detect a delay change of ΔDelay is

$$P_T = \frac{\left(1/f_{\mathrm{DPRO}}\right)^2}{\Delta\mathrm{Delay}}.$$

Consider, for example, a DPRO that oscillates at 1 GHz. Then, to detect a change in delay of ΔDelay = 0.02 ps, n = 50 000 and PT = 50 µs.

However, this scheme has to be tolerant to power supply noise and jitter, since they cause false triggers, indicating the end of life of a chip still in working condition. Power supply noise and jitter increase the appropriate value of PT.

5.1. Power supply noise

To investigate the effect of power supply noise, a zero mean Gaussian noise source was added to the power supply, with a standard deviation of 10 mV. A varying power supply produces a varying DPRO oscillation frequency. The final count output depends on the mean oscillation frequency of the DPRO (fDPRO-Mean), i.e. Count = fDPRO-Mean × PT. A larger PT gives the frequency more time to settle down to a steady mean value. Let $f^{0}_{\mathrm{DPRO}}$ and $f_{\mathrm{DPRO\text{-}Mean}}$ be the oscillation frequencies in the absence and presence of noise, respectively. Given a difference $\Delta f_{err} = f^{0}_{\mathrm{DPRO}} - f_{\mathrm{DPRO\text{-}Mean}}$, the degradation in the frequency of oscillation of the DPRO due to wearout ($\Delta f_{W}$) has to be greater than $\Delta f_{err}$ to be detectable.

Simulations for the detection of via degradation were repeated with Gaussian noise added to the power supply. The impact of the noisy environment on the final count output is shown in Fig. 26. Although the trend is maintained, it is clear that even a small amount of noise in the power supply can have a large impact.

Fig. 26. Count output with Gaussian noise source added to the power supply (count vs. time in years; curves for noise-free, 5 mV, and 10 mV supplies).

5.2. Jitter

Similarly, jitter in PT is another potential problem. Jitter may be due to power supply noise and ground bounce, but may also be due to coupling capacitance and local temperature variation. Jitter results in random errors at the VCO and DPRO outputs, because of randomness in the switching times of the waveforms. This results in variation in the count. This can create an error if the count increases by more than one. The case when the DPRO and VCO oscillate at 1 GHz, with jitter on the signals of the DPRO and VCO, is illustrated in Fig. 27.

A solution to this problem of potential false triggers due to jitter is to average multiple delay measurements with the counter. During PT, the counter runs and samples an output from the DPRO. During the high-to-low transition, the counter output is shifted into a register and the counter is reset. After N cycles we would have N samples of the DPRO, which are then averaged. The selection of N depends on the level of jitter on the signals.

Fig. 27. Test accuracy vs. jitter (probability of error vs. RMS jitter on the DPRO output and RMS jitter on the VCO output, in ps).

5.3. Temperature

Finally, it should be noted that delays are very sensitive to temperature. For example, over the operating temperature range from 0 to 150 °C, the delay of the inverter chain varies by 150 ps. This can overwhelm any delay change due to wearout. To overcome this problem, the operating conditions must be reproduced for each test of delay degradation.

6.
Comparison with conventional ring oscillator-based monitors

System aging is usually monitored by ring oscillators scattered throughout the chip. These monitors oscillate throughout the lifetime of the chip, and their frequency degradation indicates the total device degradation. These monitors operate on the assumption that the amount of degradation in the ring oscillator is a good measure of the degradation of the operational parts of the chip.

One of the major causes of inaccuracy in this approach is the spatial and temporal variation in the operating environment of a chip. Since most wearout mechanisms are exponentially dependent on temperature, it is very unlikely that isolated ring oscillators can accurately predict the complex degradation profile of the chip. As an example, Fig. 28 shows the variation with temperature in the MTF of a via in a data path undergoing degradation due to electromigration.

Another issue is the assumption of a high correlation between the switching activity of nodes of a ring oscillator and those in complex data paths. Consider the sum generation part of a full adder, shown in Fig. 29, as a simple example. ‘A’ and ‘B’ are the two inputs to be added and ‘S’ is the output sum. Let us assume we want to study the wearout at node ‘S’ of the full adder. The amount of wearout due to electromigration, for example, is a function of the current density through a via, which in turn is a function of the switching activity. Fig. 30 plots the switching activity of the node ‘S’ assuming no incoming carry and uncorrelated inputs. Compared to this, a ring oscillator oscillates independently of the clock frequency. Typically these oscillating frequencies are very high, and a node in a ring oscillator may go through numerous charges and discharges in a single clock cycle. This means that ring oscillators have switching activities that are far greater than one, and monitors based on ring oscillators could provide pessimistic estimates of chip aging.
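The dependence of the switching activity at node ‘S’ on the input statistics can be sketched numerically. The snippet below is a minimal illustration under stated assumptions: no incoming carry (so S = A XOR B), uncorrelated inputs, statistically independent consecutive samples, and switching activity defined as the probability of a 0-to-1 transition per cycle, P(S=0)·P(S=1). This definition and the function names are assumptions made for illustration and may differ from those used to generate Fig. 30.

```python
def sum_node_probability(p_a: float, p_b: float) -> float:
    """P(S = 1) for S = A XOR B with uncorrelated inputs A and B."""
    return p_a * (1.0 - p_b) + (1.0 - p_a) * p_b


def switching_activity(p_a: float, p_b: float) -> float:
    """Probability of a 0-to-1 transition at S in one clock cycle.

    Assumes consecutive samples of S are independent, so the activity is
    P(S = 0) * P(S = 1).
    """
    p_s = sum_node_probability(p_a, p_b)
    return (1.0 - p_s) * p_s


if __name__ == "__main__":
    grid = (0.0, 0.25, 0.5, 0.75, 1.0)
    for p_a in grid:
        row = "  ".join(f"{switching_activity(p_a, p_b):.3f}" for p_b in grid)
        print(f"P(A) = {p_a:.2f}: {row}")
```

Under these assumptions the activity at ‘S’ never exceeds 0.25 and drops to zero when both inputs are static, whereas a free-running ring oscillator node switches regardless of the data.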
As an example, let us assume that a ring oscillator wearout monitor is designed whose frequency of oscillation is almost equal to the clock frequency, so that the nodes of the monitor have switching activities close to one. The median time to failure (MTF) for a via due to electromigration at such a node is around 12 years. Compared to that, Fig. 31 plots the MTF of a via placed at the output node of a full adder with varying input signal probabilities. The variation in the MTF of a via with varying switching activity is a clear indicator of the inaccuracy of isolated ring oscillator monitors.

Our approach, unlike ring oscillators, requires the halting of the operation of the circuit during testing. However, in many applications not all modules are active at all times, and wearout tests can be scheduled during times of inactivity, at startup, or at shutdown.

Fig. 28. Variation in MTF of a via undergoing degradation due to electromigration with varying temperature (MTF in years vs. temperature in K).

Fig. 29. The sum generation in a full adder.

Fig. 30. Switching activity of a node of a full adder with varying switching probabilities of the inputs (switching activity vs. P(A) and P(B)).

Fig. 31. Variation in MTF of a via placed at the output node of a full adder with varying input signal probabilities (MTF in years vs. P(A) and P(B)).

7. Summary

This paper has shown that wearout of vias can be detected by measuring path delays. Analysis of via failure rates shows that the change in delay as a function of time is independent of Weibull distribution parameters and the number of stages in a path. This shows that a trigger point based on an increase in delay directly relates to the number of failing vias in an inverter chain. Hence, a delay detection circuit provides quantifiable evidence of via degradation in the absence of other sources of wearout for inverter chains.

A self-test scheme to monitor the wearout of a chip by measuring the delay through data paths has been presented. This paper summarizes the circuit design, together with analysis of the required test time and the impact of potential problems caused by power supply noise, jitter, and temperature. Simulation results indicate that very small changes in delay can be detected. The same procedure is applicable to any degradation that results in a delay shift, such as hot carrier injection and negative bias temperature instability, which increase the device threshold voltage with time.

Acknowledgment

The authors thank the Semiconductor Research Corporation for their financial support, under Tasks 1645.001 and 1645.002.
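As a numerical check of the test-time analysis in Section 5, the sketch below evaluates the window PT needed to resolve a given delay change. It assumes that the delay change adds directly to each DPRO period, that a difference of one count is sufficient for detection, and that noise and jitter are absent. The function name is hypothetical.

```python
def required_test_time(f_dpro_hz: float, delta_delay_s: float) -> tuple[float, float]:
    """Return (n, PT) needed to resolve a per-cycle delay change.

    n  = period / delta_delay : cycles needed to accumulate one full period
    PT = n * period           : corresponding window, (1/f)^2 / delta_delay
    """
    period_s = 1.0 / f_dpro_hz
    n = period_s / delta_delay_s
    pt_s = n * period_s
    return n, pt_s


if __name__ == "__main__":
    # Values from the example in Section 5: 1 GHz DPRO, 0.02 ps delay change.
    n, pt_s = required_test_time(f_dpro_hz=1e9, delta_delay_s=0.02e-12)
    print(f"n = {n:.0f} counts, PT = {pt_s * 1e6:.1f} microseconds")
```

For a 1 GHz DPRO and a 0.02 ps delay change, this evaluates to about 50 000 counts and a window of about 5e-5 s. Halving the delay change to be resolved doubles both the count and the window.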