Design Challenges in Subthreshold Interconnect Circuits
have carried out interconnect modeling for multi-gigahertz clocks. However, accurate
estimation of these interconnect parasitics requires details of the interconnect
geometry, layout, technology, the current distributions and switching activities of the
wires, which are difficult to predict and require more research. Rosa [51] has given
the formulas for self and mutual inductance of linear conductors using the Biot-Savart law. Banerjee and Mehrotra [52] have introduced an accurate analysis of on-chip
inductance effects for distributed interconnects that takes into account the effect of
series resistance and output parasitic capacitance of the driver. The expressions for
the transfer function of distributed interconnect lines, their time-domain responses,
and computationally efficient performance optimization techniques have been presented. A closed-form approximation of the frequency-dependent mutual impedance per
unit length of lossy silicon substrate coplanar-strip IC interconnects has been
developed in [53]. The derivation is based on a quasi-stationary full-wave analysis
and Fourier integral transformation.
Sylvester and Hu [34] have considered the characterization of interconnect with
particular attention to ultra-small capacitance measurement and in-situ noise evaluation techniques. The charge-based capacitance measurement
technique, used to measure femtofarad-level wiring capacitances, has the advantages of
being compact, high-resolution, and very simple. Cong and Pan [54]
have presented a set of interconnect performance estimation models for design
planning with consideration of various effective interconnect layout optimization
techniques. These models can be used efficiently during high-level design space
exploration, interconnect-driven design planning, and synthesis- and timing-driven
placement to ensure design convergence for deep sub-micrometer designs. A systematic method for deriving the characteristic model of interconnects from time-domain vector fitting has been investigated in [55]. The method is based on the
iteration and convolution of time series by recursion. The approach extracts model
parameters from terminal voltage waveforms directly by time-domain vector fitting
so that the transformation of frequency loading can be simulated efficiently in
a SPICE-compatible simulator. Interconnect parasitic impedance
parameters contribute significantly to delay in VLSI chips, and the estimation of propagation delay through interconnects has been of great concern to VLSI designers.
Therefore, interconnect delay is considered next.
delay asymptotically approaches the Elmore delay as the input signal rise time
increases. A lower bound on delay is also developed using the Elmore delay and the
second moment of the impulse response. Brocco et al. [59] have investigated macromodeling and RC tree approaches giving a unified timing simulation method. The
simulation method is faster than SPICE by two orders of magnitude for 2 µm CMOS technology.
O'Brien and Savarino [60] have modeled the driving point characteristics of resistive
interconnects for delay estimation. Compact expressions for worst-case time delay
and crosstalk of coupled RC lines are proposed by Sakurai [61]. Kahng and Muddu
[62] have developed an analytical delay model based on first and second moments to
incorporate inductance effects in the delay estimation with step input. The estimated delays are within 15 % of SPICE-computed delays across a wide range of interconnect
parameter values. A stochastic wiring distribution based upon Rent's Rule has been
derived by Davis and Meindl [63, 64]. The distribution determines wire-length
frequency and enables a priori estimation of the local, semi-global, and global wiring
requirements for future GSI systems. Brachtendorf and Laur [65] have provided
analytical models by discretization of the telegrapher's equations, for the transient
simulation of lossy interconnects. Chiprout [66] has presented guidelines for modeling on-chip interconnects for accurate simulation of high-performance ultra-large-scale integration designs. Pamunuwa and Tenhunen [67] have discussed the delay
model for coupled interconnects. Analytical expressions for delay, buffer size, and
number that are suitable in a priori timing analyses and signal integrity estimations
have been developed.
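As a rough illustration of the Elmore delay referred to in the delay bounds above, the following Python sketch computes it for an RC ladder approximation of a wire. The function names, the ladder discretization, and the numeric values are illustrative assumptions and are not taken from the cited works.

```python
def elmore_delay(resistances, capacitances):
    """Elmore delay (seconds) at the far end of an RC ladder.

    resistances[i] is the series resistance of segment i (ohms) and
    capacitances[i] is the capacitance to ground at the node that
    follows segment i (farads). Both lists are hypothetical inputs.
    """
    if len(resistances) != len(capacitances):
        raise ValueError("need one capacitance per resistance")
    delay = 0.0
    upstream_resistance = 0.0
    for r, c in zip(resistances, capacitances):
        # Each capacitor is charged through all resistance upstream of it.
        upstream_resistance += r
        delay += upstream_resistance * c
    return delay


def uniform_wire(r_total, c_total, segments):
    """Split a uniform wire into equal RC segments (an assumed discretization)."""
    return ([r_total / segments] * segments,
            [c_total / segments] * segments)


# Assumed example: a 1 kOhm, 1 pF wire modeled with 2 and with 100 segments.
coarse = elmore_delay(*uniform_wire(1000.0, 1e-12, 2))
fine = elmore_delay(*uniform_wire(1000.0, 1e-12, 100))
```

For a uniform wire split into n equal segments this sum evaluates to RC(n + 1)/(2n), so it tends to RC/2 as the discretization is refined.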
Davis and Meindl [68, 69] have extended Sakurai's work [61] by including self
and mutual inductance. The compact analytical expressions derived give an
explanation for the transient response of high-speed distributed resistive-inductive-capacitive interconnect. Simplified expressions enable physical understanding
and accurate estimation of transient response, propagation delay and crosstalk for
global interconnects. Venkatesan et al. [70, 71] have significantly extended the
work reported in Refs. [68, 69]. They have developed a new physical model for the
transient response of distributed interconnects with a capacitive load. The solutions
are verified by HSPICE simulations. These solutions are used to derive novel
expressions for the propagation delay, optimum number, and size of buffers for
buffer-inserted distributed lines. The analysis defines a design space that reveals the
trade-off between the number of buffers and wire cross section for specified delay
and crosstalk constraints.
Xu and Mazumder [72] have introduced a passive discrete modeling technique
based on a numerical approximation method, the differential quadrature
method, for estimating signal propagation delays through long on-chip interconnects. This delay modeling generates equivalent circuit interconnect models consisting of current and voltage sources, which can be directly incorporated into
circuit simulators such as SPICE. Other delay analysis methods, such as current sensing and model-reduction-based algorithms, have been proposed in
[73]. Worst-case delay has been estimated by Chen et al. [74]. Singhal et al. [75]
have presented a twofold approach for evaluating the signal and data carrying
capacity of on-chip interconnects. In the first approach, the wire is modeled as a
linear time-invariant system, its frequency response is studied, and a higher transmission rate is achieved using an ideal signal shape. The second approach addresses
delay and reliability in interconnects. Lehtonen et al. [76] have presented a self-contained adaptive system for detecting and bypassing permanent errors in on-chip
interconnects. The proposed system reroutes data on erroneous links to a set of
spare wires without interrupting the data flow. An improved syndrome storing-based detection method is presented and compared to the in-line test method. In the
presence of permanent errors, the probability of correct transmission in the proposed systems is improved by up to 140 % over the standalone Hamming code.
These methods achieve up to 38 % area, 64 % energy, and 61 % latency
improvements at comparable error performance. Morgenshtein et al. [77] have
presented a unified logical effort delay model for paths composed of CMOS logic
gates and resistive wires. The method provides conditions for timing optimization
while overcoming the limitations of standard logical effort in the presence of
interconnects. The condition for optimal gate sizing in a logic path with long wires is
also given: it is achieved when the delay component due to the gate
input capacitance is equal to the delay component due to the effective output
resistance of the gate.
interconnects and may become a cause for circuit failure [91]. It also leads to false
propagation delay times and increased power dissipation. In modern interconnect
design, interconnects in adjacent metal layers are kept orthogonal to each other.
This is done to reduce crosstalk as far as possible. But with growing interconnect
density and reduced chip size, even the non-adjacent interconnects exhibit coupling.
The extent of coupling is dependent upon the nature of the signal transitions
[92–94]. If both interconnects switch in the same direction, the effective coupling capacitance is approximately zero and the total capacitance of each interconnect is
approximated by the line-to-ground capacitance (C). If one interconnect is switching
and the other is quiet, the total capacitance of each interconnect is
(C + Cc), where Cc is the coupling capacitance. On the other hand, if the signals on the two interconnects switch
out of phase, the effective coupling capacitance approximately doubles to 2Cc.
Thus, the coupling capacitance changes the effective load capacitance, depending
upon the signal switching activity. Buffer insertion is a technique commonly used
for the reduction of crosstalk. However, unlike in the super-threshold region, buffer insertion is not feasible in the subthreshold region. Another
useful approach to reducing crosstalk is to use shielding wires, although this also increases
the capacitive load and therefore the delay. A more suitable approach is to increase the
spacing between the wires.
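The three switching cases described above can be summarized by a switching factor of 0, 1, or 2 applied to the coupling capacitance. The Python sketch below is a minimal illustration for one switching line and one neighbor. The function names, the +1/-1/0 encoding of transitions, and the capacitance values are assumptions made for this example.

```python
def switching_factor(victim, aggressor):
    """Multiplier applied to Cc, as seen by a switching line.

    Transitions are encoded (by assumption) as +1 for rising,
    -1 for falling, and 0 for a quiet line.
    """
    if victim not in (1, -1):
        raise ValueError("the line under analysis must be switching")
    if aggressor not in (1, 0, -1):
        raise ValueError("aggressor must be +1, 0, or -1")
    # Same direction -> 0, quiet neighbor -> 1, opposite direction -> 2.
    return abs(victim - aggressor)


def effective_capacitance(c_ground, c_coupling, victim, aggressor):
    """Effective load capacitance C + k * Cc for the given transitions."""
    return c_ground + switching_factor(victim, aggressor) * c_coupling


# Assumed values: C = 100 fF to ground, Cc = 40 fF to the neighbor.
C, CC = 100e-15, 40e-15
same_direction = effective_capacitance(C, CC, +1, +1)   # about C
quiet_neighbor = effective_capacitance(C, CC, +1, 0)    # about C + Cc
out_of_phase = effective_capacitance(C, CC, +1, -1)     # about C + 2Cc
```

The factor is an idealization: it assumes both lines switch at the same instant with equal slew, which is why the text describes the values as approximate.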
Extensive research has been carried out regarding crosstalk and delay estimation
of CMOS gate-driven coupled interconnects. Xie and Nakhla [95] have proposed a
method for crosstalk and delay estimation in high-speed VLSI interconnects with
nonlinear components. The mixed frequency- and time-domain
problem is solved by replacing the linear subnetworks with a set of ordinary differential
equations using the asymptotic waveform evaluation technique.
Poltz [96] has presented electromagnetic modeling of VLSI interconnects in which the
Helmholtz equation is used to build models that include eddy current loss and
dielectric loss. Equivalent circuits with high cutoff frequencies and the smallest
possible number of components are assembled. The performance of a VLSI
interconnect at different clock rates is analyzed. Kuhlmann et al. [97] have proposed
a time-efficient method for the precise estimation of crosstalk noise. A metric to
compute coupling noise according to the sink capacitances and conductances of the
aggressor and victim nets has been reported. The noise waveform is computed
using a closed form leading to short computation time. The problem of crosstalk
computation and reduction using circuit and layout techniques has been addressed
in [98, 99]. Expressions have been provided for noise amplitude and pulse width in
capacitively coupled resistive lines. The estimation is based upon the RC transmission line model. A three-line structure of coupled RC interconnects using the
transmission line model is presented in [100]. However, the MOS transistor has been
approximated by a linear resistor. Ling et al. [101] have developed a method to
estimate the coupling noise in the presence of multiple aggressor nets. The authors have
reported a novel technique for modeling quiet aggressor nets based on the concept
of coupling point admittance and a reduction method to replace tree branches with
effective capacitors. The proposed method has been tested for noise-prone interconnects from an industrial high-performance processor in 0.15 µm technology.
A worst-case error of 7.8 % and an average error of 2.7 % are observed. Devgan
[102] has presented a metric for estimation of coupled noise in on-chip interconnects. This noise estimation metric, known as the Devgan metric, is an upper bound
for RC circuits and is similar in spirit to the Elmore delay in timing analysis. An
enhancement to the Devgan metric has been proposed in [103] to improve the
accuracy for fast input signals. The coupling noise voltage on a quiet interconnect
line has also been analyzed by Shoji using a simple linear RC circuit [104].
Hashimoto et al. [105] have proposed a method to capture crosstalk-induced noisy
waveforms for crosstalk-aware static timing analysis. The static timing analysis is
performed with the consideration of dynamic delay variation due to crosstalk noise.
Eo et al. [106] have proposed a simple closed-form crosstalk model and experimentally verified the model with 0.35 µm CMOS process-based interconnect test
structures having two, three and five coupled lines with different switching scenarios. Becer et al. [107] have presented a complete crosstalk noise model which
incorporates all victim and aggressor driver/interconnect physical parameters
including coupling locations on both victim and aggressor nets. The
model has been validated against SPICE and offers a good trade-off
between accuracy and completeness, with an average error of 10 % with respect
to SPICE for 130 nm technology. Hasan et al. [108] have derived and analyzed the
crosstalk noise effect on a single victim line. An accurate and flexible decoupled
transient model for victim wire is introduced. The model can be used to compute
the maximum delay and glitch effect due to crosstalk under different slew rates.
Tuuna et al. [109] have given an analytical model for the current drawn by an on-chip
bus. The model is combined with an on-chip power supply grid model in order to
analyze noise caused by switching buses in a power supply grid. The buses are
modeled as distributed lines that are capacitively and inductively coupled to each
other. Different switching patterns and driver skewing times are also included in the
model.
Bazargan-Sabet and Renault [110] have presented closed-form formulas to
estimate capacitive coupling-induced crosstalk noise for distributed RC coupling
trees. The formulas are simple enough to be used in the inner loops of performance
optimization algorithms or as cost functions to guide routers. Kaushik et al. [111]
have considered the effect of crosstalk-induced overshoot and undershoot generated
at the noise site. False switching occurs when the magnitude of the overshoot or
undershoot exceeds the threshold voltage of the gate. The peak overshoot and
undershoot generated at the noise site can wear out the thin gate oxide layer, resulting in
permanent failure of the VLSI chip. Agarwal et al. [112] have analyzed a simple
crosstalk noise model for coupled on-chip interconnects. The model is based on
coupled-transmission-line theory and is applicable to asymmetric driver and line
configurations. The noise waveform shape is captured well and yields an average
error of 6.5 % for noise peak over a wide range of test cases. Chen and Sadowska
[113] have proposed closed-form formulas to estimate capacitive coupling-induced
crosstalk noise for distributed RC coupling trees. The efficiency of the approach
stems from the fact that only five basic operations are used in the expressions,
namely addition, subtraction, multiplication, division, and square root. Lee et al. [114]
have given a crosstalk estimation method using coupled inductive tree models in
high-speed VLSI interconnects. The recursive formulas for moment computation of
coupled inductive interconnect trees with self and mutual inductances have been
generalized. Nieuwoudt et al. [115] have given a comprehensive investigation of
crosstalk-induced delay, noise, and capacitance for 65 nm process technology.
Naeemi et al. [116] have presented an analytical model of distributed
inductive interconnects with ideal and non-ideal return paths to optimize the crosstalk
and time delay of high-speed global interconnect structures, such that the crosstalk
and delay are reduced by 38 and 12 %, respectively.
Vittal et al. [117] have addressed the problem of crosstalk computation and
reduction using circuit and layout techniques. The expressions for crosstalk
amplitude and pulse width in capacitively coupled resistive lines have been provided. The expressions hold for nets with an arbitrary number of pins and
arbitrary topology under any specified input excitation. The experimental results
show that the average error is about 10 % and the maximum error is less than 20 %.
Avinash et al. [118] have proposed a spatiotemporal bus encoding scheme to
minimize crosstalk in interconnects. The scheme eliminates crosstalk in the interconnect wires, thereby reducing delay and energy consumption. The technique is
evaluated on the L1 cache address/data bus of a microprocessor using
SPEC2000 CINT benchmark suites for 90 and 65 nm technologies.
Nuroska et al. [119] have given a technique that reduces crosstalk noise on buses
based on profiling the switching behavior. Based on this profiling information, an
architecture configuration obtained using a genetic algorithm is applied that encodes
pairs of bus wires, permutes the wires, and assigns an inversion level to each wire in
order to optimize for noise and power. Hanchate and Ranganathan [120] have
proposed a methodology for wire sizing with simultaneous optimization of interconnect crosstalk noise and delay in deep submicron VLSI circuits. The wire sizing
is modeled as an optimization problem, formulated as a normal form game, and
solved using the Nash equilibrium. Game theory allows the optimization of multiple metrics with conflicting objectives. Lienig [121] presented a novel approach to
solve the VLSI channel and routing problems. The approach is based on a parallel
genetic algorithm which runs on a distributed network of workstations. The algorithm optimizes physical constraints such as the length of nets and the number of vias, and
is able to significantly reduce the occurrence of crosstalk.
Rao et al. [122] have proposed a bus encoding algorithm and circuit scheme for
on-chip buses that eliminates capacitive crosstalk while simultaneously reducing
total power. The encoding scheme significantly reduces total power by 26 % and
runtime leakage power by 42 % while eliminating capacitive crosstalk. Zhang and
Sapatnekar [123] have presented a method for incorporating crosstalk reduction
criteria into the global routing under a broad power supply network paradigm. The
method utilizes power/ground wires as shields between the signal wires to reduce
capacitive coupling, while considering the constraints imposed by limited routing
and buffering resources. An iterative procedure is employed to route signal wires,
assign supply shields, and insert buffers. Wu et al. [124] have proposed a probabilistic model-based approach for crosstalk mitigation at the layer assignment. The
appropriate for low-power CMOS circuit design that does not permit current flow,
other than leakage current, during steady-state operation.
Constandinou et al. [146] implemented an ultra-low-power, simple,
and robust circuit for edge detection in integrated vision systems in 0.18 µm CMOS
technology. Kim et al. [147] presented a low-power, smallest-area, delay-locked
loop-based clock generator. Fabricated in a 0.35 µm CMOS process, the clock generator occupies 0.07 mm² of area, consumes 42.9 mW of power, and operates in the
frequency range of 120 MHz–1.1 GHz. Bhaumik et al. [148] implemented a
divided word line scheme to bring down power dissipation in 256 kB static random
access memory design. Mitra and Chandorkar [149] designed a low-voltage CMOS
amplifier with rail-to-rail input common mode range. Alternative methods were
applied for obtaining high common mode range, good common mode rejection
ratio, and output swing at such low supply voltage. Hwang et al. [150] reported a
self-regulating CMOS voltage-controlled oscillator with low supply voltage sensitivity. Lidow et al. [151] examined future trends in Internet appliances, portable
electronic appliances, and silicon-based power transistors and diodes. It is discussed
how the changing requirements of end users are driving state-of-the-art devices,
new analog ICs as well as different power management architectures. Methodologies and projections related to power dissipation in CMOS circuits have been
specified by Bhavnagarwala et al. [152].
Mutoh et al. [153] have proposed a circuit in which high-threshold devices are inserted in
series with low-threshold circuitry. A sleep control scheme is introduced for efficient
power management. Kawaguchi et al. [154] have suggested a super cutoff CMOS
circuit that uses low-threshold voltage transistors with an inserted gate bias generator. In the standby mode, the voltages are applied to transistors to fully cut off the
leakage current. Wei et al. [155] have implemented the dual-threshold technique to
reduce leakage power by assigning a high-threshold voltage to some transistors in
non-critical paths and using low-threshold transistors in the critical path. An
algorithm for selecting and assigning an optimal high-threshold voltage is also
given. The reduction in leakage power is more than 80 % and total active power
saving is around 50 and 20 %, respectively, at low- and high-switching activities for
ISCAS benchmark circuits. In [156], the authors have presented architectures for
low power and optimum speed for image segmentation using Sobel operators.
Pant et al. [157] have presented algorithms that can be used to design ultra-low-power CMOS logic circuits by joint optimization of supply voltage, threshold
voltage, and device widths. Various components of power dissipation are considered and an efficient heuristic is developed that delivers over an order of magnitude
savings in power over conventional optimization methods. The authors have also
proposed a heuristic technique for minimizing the total power consumption under a
given delay constraint. The approach simultaneously determines transistor power
supply, threshold voltage, and device width in two distinct phases. The proposed
approaches trade off energy and delay by tuning design variables (supply
voltage, threshold voltage, transistor size, etc.). Chi et al. [158] have proposed a
multiple supply voltage-scaling algorithm for low-power design. The algorithm
combines a greedy approach and an iterative improvement optimization approach.
Deodhar and Davis [159] have suggested voltage-scaling and repeater insertion
for throughput-centric low-power global interconnects. It is assessed that repeater
insertion improves throughput. Using 180 nm technology, it is illustrated that 1 V
supply voltage can reduce power dissipation up to 25 % of that with 2.5 V supply,
for 2 Gbps throughput. The results are compared with SPICE simulations and show
a good agreement. The possibility of applying the buffer insertion technique to
reduce power dissipation and delay in interconnects in a voltage-scaled environment
has been explored in [160, 161]. Analytical approaches for optimum design and
optimum number of buffers in a low-power environment have been developed. Buffer
sizing for minimum power and delay in a voltage-scaled environment has also been
carried out. The analytical results are within 10 % of the SPICE simulation results.
Banerjee and Mehrotra [162] have addressed the problem of power dissipation
during the buffer insertion phase of interconnect design optimization. Since not all
global interconnects are on the critical path, a small delay penalty can be
tolerated on these non-critical interconnects. A delay penalty of 5 % in exchange for lower
power dissipation has been considered for different MOS technologies. It is shown that
there exists a potential for large power saving by using smaller buffers and larger
inter-buffer interconnect lengths. Wang et al. [163] have represented signals by
localized wave packets that propagate along the interconnect lines at the speed of
light to trigger the receivers. Energy consumption is reduced through charging up
only part of the interconnect lines. The voltage doubling property of the receiver gate
capacitances is used. Zhong and Jha [164] demonstrated the importance of optimizing on-chip interconnects for power reduction. It is concluded that significant
spurious switching activity occurs in interconnects.
Tajalli and Leblebici [165] experimentally and analytically showed that scaling
the supply voltage into the deep subthreshold region increases energy consumption, and
found that the optimum supply voltage for minimum energy consumption lies in the
moderate subthreshold region. Moreover, digital circuits operated in the deep subthreshold region will have significant delay and noise margin penalties along with
robustness issues that cannot be ignored for portable devices with real-time
applications [166]. Hence, the design of robust, moderate-performance
subthreshold field-programmable gate arrays, real-time portable devices, buses, and
clock signals is uncertain at such low bias [167].
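The existence of an energy-optimal supply voltage can be illustrated with a deliberately simplified model. This is not the model used in [165]. It assumes that dynamic energy per operation scales as V², and that leakage energy per operation scales as V² exp(-V / (n·vT)), because subthreshold delay grows exponentially as the supply voltage falls. All parameter values and function names below are hypothetical.

```python
import math


def energy_per_operation(vdd, activity=1.0, leak_weight=1000.0,
                         slope_factor=1.5, v_thermal=0.02585):
    """Normalized energy per operation at supply voltage vdd (volts).

    Assumed model: switching energy ~ activity * vdd^2, and leakage
    energy ~ leak_weight * vdd^2 * exp(-vdd / (n * vT)), where the
    exponential stands in for the subthreshold delay increase.
    """
    dynamic = activity * vdd ** 2
    leakage = leak_weight * vdd ** 2 * math.exp(
        -vdd / (slope_factor * v_thermal))
    return dynamic + leakage


def minimum_energy_point(v_lo=0.1, v_hi=0.5, step=0.001):
    """Grid search for the supply voltage with the lowest modeled energy."""
    count = int(round((v_hi - v_lo) / step))
    candidates = [v_lo + i * step for i in range(count + 1)]
    return min(candidates, key=energy_per_operation)


v_opt = minimum_energy_point()
e_opt = energy_per_operation(v_opt)
e_deep = energy_per_operation(0.15)   # deeper subthreshold than v_opt
e_high = energy_per_operation(0.50)   # higher supply than v_opt
```

With these assumed parameters the modeled energy rises on both sides of the optimum: lowering the supply further lets leakage dominate, while raising it increases switching energy. The sweep is limited to 0.1 to 0.5 V because the exponential delay assumption holds only in the subthreshold region.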
Bol et al. [168] investigated the interests and limitations of technology scaling
for subthreshold logic from 0.25 µm to 32 nm nodes. It is shown that scaling from
90 to 65 nm nodes is highly desirable for medium-throughput applications
(1–10 MHz) due to great dynamic energy reduction. Upsizing of the channel length
as a circuit level technique has been proposed to efficiently mitigate short-channel
effects.
Thus, from the literature, it is clear that reducing power dissipation has been a
crucial objective for low-power VLSI designs. Also, energy-constrained VLSI
applications have emerged for which energy consumption is the key metric and
speed of operation is less relevant. The power consumption of these systems should
decrease enough to extend battery life and, theoretically, to allow
unlimited lifetimes [169]. To cope with such ultra-low-power applications, design
The effect of temperature on the various leakage current components has also
been explained by various researchers. The thermal dependence of punch-through current has been explained in [212]. It is found that punch-through current
reduces at low temperatures. The temperature dependence of drain-induced barrier
lowering (DIBL) current in deep submicron MOSFETs has been extensively
investigated in [213]. It is found that the DIBL coefficient is nearly insensitive to
temperature reduction in the temperature interval from 300 to 50 K. The authors of
[214] show that the DIBL coefficient increases nearly 2.5 times under temperature
reduction from 150 to 25 °C. The temperature dependence of the impact ionization current component has been studied in [215–217]. The leakage current due to impact ionization
is temperature independent in the temperature interval from 300 to 77 K and
significantly increases under technology scaling.
http://www.springer.com/978-81-322-2131-9