Interview Vlsibank
A Dissertation
by
KE CAO
August 2007
DOCTOR OF PHILOSOPHY
Approved by:
ABSTRACT
design and manufacturing becomes more and more inadequate. Gone are the days when
designers simply pass the design GDSII file to the foundry and expect very good
manufacturing and parametric yield. This is largely due to the enormous challenges in the
manufacturing stage as the feature size continues to shrink. Thus, the idea of DFM (Design
for Manufacturing) is becoming very popular. Even though there is no universally accepted
definition of DFM, in my opinion, one of the major parts of DFM is to bring manufacturing
information into the design stage in a way that is understood by designers. Consequently,
designers can act on the information to improve both manufacturing and parametric yield.
In this dissertation, I present several attempts to reduce the gap between the design and
manufacturing communities: Alt-PSM aware standard cell designs, printability improvement
for detailed routing, and an ASIC design flow with litho aware static timing analysis.
Experimental results show that we can greatly improve the manufacturability of the designs
and significantly reduce design pessimism for easier design closure.
ACKNOWLEDGMENTS
Pursuing a doctoral degree is certainly a long and difficult process, and it would not
be possible without the help of numerous people. To them I wish to express my sincere
gratitude. I would especially like to thank my advisor, Dr. Jiang Hu, for his generous time
I am also very grateful for having an exceptional doctoral committee and wish to thank
Dr. Weiping Shi, Dr. Duncan Walker and Dr. Vivek Sarin for their continual support and
encouragement.
My thanks go out to the many fellow students in the Computer Engineering program,
Chin-Ngai Sze (Cliff), Ganesh Venkataraman, Di Wu, Mankang Mai, Shu Yan, Zhuo Li,
Xiang Lu and Ying Zhou. They made my routine student life much more enjoyable.
My research for this dissertation was made more practical and more extensive through
communications with some of the brightest people in the ASIC design and EDA industry.
Thus I would like to thank Mr. Sorin Dobre at Qualcomm for the many hour-long
discussions on the trends of various design methodologies including DFM, Dr. Puneet Gupta
at Blaze-DFM for debating lithography aware design methodology, Mr. Bill Graupp at
Mentor Graphics for developing design tool usability, and Dr. Andrew Kahng at
UCSD/Blaze-DFM for his encouragement and support.
Finally, I’d like to thank my family. My parents extended their unconditional support
to my decision to continue with the graduate program. I’m especially grateful to my wife,
Shasha Luo, for her patience, her sacrifice, and for helping me keep my life in proper
perspective and balance. I cannot imagine going through graduate school without her
being by my side.
TABLE OF CONTENTS
CHAPTER
I INTRODUCTION
A. Alt-PSM Compliance and Composability for Standard Cell Layout
B. Wire Sizing and Spacing for Lithographic Printability and Timing Optimization
C. ASIC Design Flow Considering Lithography Induced Effects
D. Organization of the Dissertation
E. Printability Optimization
F. Experimental Results
G. Conclusion
V CONCLUSION
REFERENCES
VITA
LIST OF TABLES
TABLE Page
LIST OF FIGURES
FIGURE Page
14 (a) XOR gate layout: T joints in Metal 1 layer that pose phase conflict problem. (b) PSM clean layout.
15 (a) 2 Input NOR gate layout with N-P pairs split in 3 fingers each: T joints in Poly. (b) PSM compliant and composable layout.
CHAPTER I
INTRODUCTION
For an ASIC design, it used to be true that design and manufacturing were relatively
independent of one another, with a set of design rules as the only connection between the two. This
ing that we need to address some of the issues in design practice to improve manufacturing
yield. At the same time, process variations in advanced technologies cause either
difficulty or unnecessary pessimism in design closure. Clearly, a mechanism is needed for
modeling manufacturing processes accurately and bringing the models into the design space,
so that designers can effectively account for manufacturing realities to enhance
manufacturing yield while reducing design pessimism. In this dissertation, I present
several new techniques to reduce the gap between design and manufacturing.
Current VLSI technology has minimum transistor feature sizes of 90nm and 65nm, which
are well below the lithography wavelength of 193nm. The technology trend indicates
that this gap will become even larger in the future, as shown in Figure 1. In the subwavelength
shapes. Phase Shifting Mask (PSM) is one of the common RET techniques. PSM uses the
destructive interference between two light waves that are 180 degrees out of phase to print
shape edges. There are two types of PSM currently in use: Attenuated PSM (Att-PSM) and
Alternating PSM (Alt-PSM). In Att-PSM, mask substrates allow a small amount of
out-of-phase light to penetrate the normally opaque mask regions. In Alt-PSM, light of
opposite phases is shone on the two sides of a thin critical feature. While Att-PSM poses
fewer restrictions on layout design than Alt-PSM does, Alt-PSM is easier to control in the
lithography process, and the quality and robustness of the printed image are better.
Alt-PSM is currently used in high performance VLSI designs in 90nm and 65nm technologies
for its better critical dimension (CD) control of transistor gate poly. For the routing
metals of a standard cell, Att-PSM can be used instead of Alt-PSM because the CD control
requirement is not as rigorous. Alt-PSM may be the only phase shifting option for gate poly
when VLSI technology scales further down to 45nm and 32nm. The Alt-PSM technique is
Even though the Alt-PSM technique is carried out in the stage of mask design and
a phase conflict occurs is split to enable Alt-PSM as shown in Figure 2(c). As the phases
Fig. 2. (a) Alternating Phase Shifting Mask(Alt-PSM). (b) Phase conflict occurs for a
T-shaped critical feature. (c) Phase conflict removal by splitting a phase region.
of the light are opposite along the two sides of the splitting line (the dashed line in
Figure 2(c)), unwanted features may be left there and need to be trimmed by another
exposure. The second exposure increases the already expensive mask cost and introduces
misalignment risk. In [3, 4], graph based algorithms are proposed to modify an existing
layout for Alt-PSM compliance. These algorithms can achieve global optimality, but demand
large CPU time if they are applied to an entire chip layout. Furthermore, it is very
complicated to modify the layout at the top level in standard cell based designs,
considering both fixed IP designs and
B. Wire Sizing and Spacing for Lithographic Printability and Timing Optimization
Since lithography entered the sub-wavelength regime, strong diffraction effects can cause
significant discrepancies between photo-mask patterns and printed features. For example, a
rectangular feature on the photo-mask, as in Figure 3(a), may result in a distorted printed
feature on silicon, as in Figure 3(b). A circuit layout with poor printability is one for
which it is difficult to make the printed features on wafers follow the designed shapes
without distortion. Currently, the printability of devices with sub-wavelength sizes is
usually improved by using Resolution Enhancement Techniques (RET) [1, 11–13] such as
Optical Proximity Correction (OPC), Phase Shift Mask (PSM), Off Axis Illumination (OAI),
and Sub-Resolution Assist Feature (SRAF), so as to overcome the diffraction limit and
process imperfections. For example, the mask layout for printing the rectangular feature in
Figure 3(a) becomes Figure 4(a) with OPC and SRAF. As a result, the printed feature on
silicon (Figure 4(b)) becomes
However, relying on RET alone is inadequate to solve the problem for the following
reasons.
• The increasingly large gap between feature size and wavelength forces several
aggressive RETs to be applied jointly, which compounds the already complicated
mask design and increases mask cost drastically. An example from [15] is shown in
Figure 5 to demonstrate this problem. Compared to a mask layout without OPC, as in
Figure 5(a), the mask layout with OPC in Figure 5(b) dramatically increases mask data
volume, mask writing time and therefore mask cost.
• Circuit complexity keeps growing, which makes RET formidably challenging.
through either rule based or model based methodology level approaches. In rule based
approaches, lithography friendliness and/or RET compliance are expressed as a set of
recommended or hard design rules that are applied in detailed layout design steps such as
detailed routing and layout compaction. This approach is fast and relatively easy to use.
However, lithography and RET procedures are so complicated that it is very difficult to
convey their requirements fully through simple rules. In model based approaches,
lithography and RET simulations are performed on a circuit layout and any observed problems
are fed back to circuit designers for layout modification. Model based solutions are
generally more reliable than rule based solutions, but the simulations are very time
consuming and it usually
phy friendliness and/or RET compliance need to be considered directly in circuit layout
algorithms.
C. ASIC Design Flow Considering Lithography Induced Effects
The International Technology Roadmap for Semiconductors (ITRS) projects that process
variations present a critical challenge for both manufacturing yield and parametric yield of
integrated circuit products. The process variations consist of systematic components and
random components. The systematic variations represent both Front End Of Line (FEOL)
and Back End Of Line (BEOL) parameter variations caused by predictable design and
process procedures, such as Critical Dimension (CD) variations from different poly gate
pitches and metal thickness variations occurring during Chemical Mechanical Planarization
In many existing design methodologies, people usually treat systematic variations to-
gether with random variations without differentiation. By handling both kinds of variations
together in a process corner based methodology, people can conveniently circumvent rel-
atively complex systematic variation models. However, such simplification usually causes
unnecessary pessimism in process corner estimations especially when the systematic com-
ponents account for a large portion of the overall variations. Indeed, it is reported in [27]
that more than 50% of transistor gate length variations are due to systematic sources. As
VLSI technology aggressively scales to 65nm and beyond, the influences from both sys-
tematic and random variations become greater and greater. The consequently expanding
process corners force designers to set aggressive timing targets which intensify both design
productivity crisis and power crisis [26]. Therefore, significant pessimism in process corner
Among systematic variations, transistor gate length variation has perhaps the largest
impact on circuit timing and power performance since it directly affects both transistor
switching speed and leakage power [28]. Fortunately, gate length variation largely depends
ity Correction) simulations. A pioneer work [29] tried to estimate gate length variations
through computationally expensive aerial image process simulations. Recently, a post OPC
extraction methodology was proposed [30] for timing analysis of critical paths in a design.
In [30], it is found that timing critical paths are changed when post OPC extraction infor-
mation is utilized. However, the overall timing performance of a circuit is not altered by
this methodology. Another work [31] proposed a timing analysis methodology with aware-
ness of lithography induced gate length variations according to different poly pitches. For a
standard cell, three poly spacing ranges are considered for its four boundaries and thereby
81 variants are characterized for each cell. The timing characteristic of a cell instance in a
layout is obtained by matching its surrounding layout pattern with one of the 81 variants.
Others [13] proposed a Restricted Design Rule (RDR) concept. This approach imposes
ing lithography effects, then we extend this discussion into BEOL to investigate the timing
impact of lithography effects on routing metals. Based on these results, we propose a new
litho-aware timing analysis flow that considers lithography induced effects on both gate
length variations and interconnect wire width variations. In this methodology, the timing
and power performance of a cell is based on layout shapes obtained from lithography and
OPC simulations, and the interconnect wire width variations are considered with a lookup
table.
D. Organization of the Dissertation
The rest of the dissertation is organized as follows. In Chapter II, we present a technique
for Alt-PSM compliant and composable standard cell library design. Chapter III contains
a wire sizing and spacing technique for timing optimization and printability improvement.
Chapter IV discusses a lithography aware ASIC design flow for static timing analysis.
CHAPTER II
Alternating Phase Shift Mask (Alt-PSM) has been identified as one of the important Res-
facilitate the use of Alt-PSM in VLSI deep submicron manufacturing, we developed a new
methodology for Alt-PSM aware library cell generation. We proposed a two-way approach
for library cell generation and a new library development flow that is very easy to incor-
porate into the current design flow. The methodology we proposed guarantees that the top
level designs using our Alt-PSM aware library do not have poly layer phase errors, which
A. Introduction
In standard cell designs, one speedup method is to exploit the repeated use of library
cells. Alt-PSM compliance does not need to be obtained repeatedly for each instance of a
library cell; it can be achieved once, in library cell design. However, due to the
proximity effect, placing Alt-PSM compliant cells adjacent to each other may cause new
phase conflicts. For example, in Figure 6, when the two Alt-PSM compliant cells are placed
next to each other, a phase conflict occurs between the two regions indicated by the arrow.
We propose a two way approach for developing standard cell library with Alt-PSM
compliance and composability. For smaller standard cells where placement of transistors
has minimal impact on the electrical performance of the standard cell, we will construct
the standard cell layout to be Alt-PSM compliant and Alt-PSM composable; for larger
standard cells with fixed transistor placement and routing for performance optimization,
we propose an optimal and efficient algorithm to modify the existing layout to achieve
For small standard cell construction, we consider the transistor placement and intra-
cell routing at the same time so that the layout of poly layer and metal layer can match with
each other. As an early classical work, Uehara and Cleemput [5] developed a graph based
automatic cell layout system with a certain regular style. Recently, the minimum width
version of the problem was tackled [6] with the Boolean satisfiability (SAT) method [8]. We
also propose to achieve Alt-PSM compliant and composable cell layout by using Boolean
satisfiability. We transform the constraints for standard cell synthesis into a SAT
formulation and then use the Siege SAT solver [7] to search for a solution. Our formulation
handles multiple
For complicated standard cells, where the placement of the transistors has been optimized
for performance, we propose a network flow based algorithm to modify the existing layout
for Alt-PSM composability with minimal cell area increase.
Our experiments show that Alt-PSM requirements can be satisfied efficiently at the
IP development level. Block level and top level Alt-PSM compliance is guaranteed by our
methodology.
This chapter is organized as follows: Section 2 presents the SAT formulation for standard
cell construction, and Section 3 provides the algorithm for standard cell layout
modification. The experimental results are presented in Section 4. Section 5 concludes
this work.
For simple standard cells with a small number of transistors, we use a correct-by-construction
approach for Alt-PSM compliance and composability. This section will present the SAT
1. Layout Style
The layout style employed in this work follows the convention of [5, 6] and is illustrated
by redrawing the example of [6] in Figure 7. The major characteristics of this style are
summarized as follows.
• Transistors are placed in two rows, with PMOS at upper row and NMOS at bottom
row.
• PMOS transistors are aligned with their bottom boundaries. NMOS transistors are
• If a PMOS transistor and an NMOS transistor share the same GATE node, they are
vertically aligned.
• The intra-cell routing uses only the poly layer and the Metal-1 layer.
• The VDD rail lies horizontally above the PMOS row and the GND rail lies horizon-
tally below the NMOS row.
• No dogleg is employed.
It is not hard to see that this layout style is very regular and the constraints of regularity
2. Transistor Placement
This section describes the SAT formulation for transistor placement. Following the layout
style mentioned in Figure 7 and placement formulation from [6], N PMOS transistors and
N NMOS transistors need to be placed in minimum number of columns so that the resultant
placement is PSM Clean and Composable. Each transistor’s placement is defined by a set
variable of unit length is needed to define the flip of each transistor. We call the flip
variable $f_i$. The total number of variables needed for placement is thus
$2N \times (\lceil \log W \rceil + 1)$. The placement constraints are:
MOS Overlap Constraints: Any two N/P transistors should not overlap in the same
column, i.e.
where $x_{ni1}/x_{pi1}, x_{ni2}/x_{pi2}, \ldots, x_{niP}/x_{piP}$ is the bit vector of length $P$
defining the placement of the $i$th N/P transistor. $x_{nip} \oplus x_{njp} = 1$ means that the
column number of the $i$th NMOS transistor differs at the $p$th digit from the column number
of the $j$th NMOS transistor.
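The binary column encoding behind this constraint can be illustrated with a minimal Python sketch. It assumes base-2 logarithms for the bit-vector length, and the function names (`placement_variable_count`, `to_bits`, `columns_differ`) are hypothetical helpers introduced here for illustration, not part of the original formulation. Two transistors in the same row occupy different columns exactly when their column bit vectors differ in at least one digit, i.e. when the OR over all digits of the XOR terms equals 1.

```python
def placement_variable_count(n_pairs: int, width: int) -> int:
    """Variable count 2N * (ceil(log2 W) + 1): column bits plus one flip bit.

    (width - 1).bit_length() equals ceil(log2(width)) for width >= 1 and
    avoids floating point rounding.
    """
    column_bits = (width - 1).bit_length()
    return 2 * n_pairs * (column_bits + 1)


def to_bits(column: int, n_bits: int) -> list[int]:
    """Bit vector of a column number, least significant bit first."""
    return [(column >> p) & 1 for p in range(n_bits)]


def columns_differ(bits_i: list[int], bits_j: list[int]) -> bool:
    """True when some digit p has bits_i[p] XOR bits_j[p] == 1."""
    return any(a ^ b for a, b in zip(bits_i, bits_j))
```

For example, with N = 4 transistor pairs and W = 4 columns, each transistor needs two column bits and one flip bit, for 24 variables in total.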
Vertical Gate Constraint: N and P transistor placed in same column must share the
where G is the set of P transistors having same gate as NMOS i. Basically, if there exists a
PMOS transistor that shares the same gate connection with this NMOS transistor, this con-
straint aligns that PMOS transistor with the NMOS transistor. This configuration usually
results in the most compact layout as the gate connection between P and N transistors can
be made with the vertical poly. Similar expressions for NMOS transistor will provide the
$$\overline{GAP_n(i,j)} \vee \overline{\left[\bigvee_{k=0}^{W-2} \left(C_n(i,k) \wedge C_n(j,k+1)\right)\right]} = 1 \quad (2.3)$$
where GAPn (i, j) = 0 if NMOS j is placed right of NMOS i to share diffusion.
GAPn (i, j) = fi f j sameds (i, j) ∨ fi f j samed (i, j)∨ fi f j sames (i, j) ∨ fi f j samesd (i, j) (2.4)
sameds (i, j) = 1 if the drain of the ith MOS is the same as the source of the jth. samesd (i, j) = 1
if the source of the ith MOS is the same as the drain of the jth. sames (i, j) = 1 if the source of
the ith MOS is the same as the source of the jth.
The flips of neighboring fingers of a folded transistor should be opposite to ensure that
the fingers share diffusion areas:
Here, $f_n(i) \oplus f_n(i+1) = 1$ means that the diffusion area between two neighboring
fingers of a folded transistor is either the source or the drain of both fingers, which in
turn indicates that the diffusion can be shared by these two fingers. A similar constraint
applies to PMOSFETs as well.
PSM Constraint: For all transistors that are not folded, distance between the one’s
$$Q_{ij}\,(x_{nip} \oplus x_{njp}) = 1 \quad (2.7)$$
where $Q_{ij} = 1$ if gates $i$ and $j$ are connected, 0 otherwise, and $x_{nip}$/$x_{njp}$ is the
least significant bit of the placement vector of NMOS $i$/$j$. A similar constraint is applied
to PMOS transistors. This constraint is illustrated by point 1 in Figure 8. It is important
to point out that this constraint is not necessary in the placement stage of the standard
cell, since the gate connection can also be made by Metal 1 routing. But if it can be
satisfied in the placement stage without area cost, it will help increase the routing
options of the standard cell.
3. Routing
This section explains the intra-cell routing. Basically, we need to find available routing
tracks for each net that has to be connected through routing. We have followed the routing
style from [7]. Let $P_n$, $P_p$, $P_g$ and $P_v$ be the number of Boolean variables for N, P, G
and V nets respectively. If $W_n$, $W_p$ and $W_g$ are the numbers of rows in the N, P and G
regions and $W_v$ is the number of columns in the G region, then $P_n = \lceil \log_2 W_n \rceil$,
$P_p = \lceil \log_2 W_p \rceil$, $P_v = \lceil \log_2 W_v \rceil$, and
$P_g = \lceil \log_2 (W_g + 2) \rceil$. A connection between two gate terminals is called a Poly connec-
Poly Connection Constraints: Poly connection between two gate terminals is con-
sidered as a single instance. Let xi j represents the variable that assumes value of 1 if a
as separate G-Nets. For example, in a G-Net with gate terminals $i$, $j$ and $k$ that need to be
connected together:
$$x_{ij} \wedge x_{jk} \wedge x_{ik} = 0 \quad (2.9)$$
Net Overlap Constraints: Any two nets i, j have overlapping intervals, can not be
where $n_{ip}$ is the $p$th significant bit of the track assignment for net $i$ in the N region
and $I_n(i)$ is the range of column numbers that net $i$ crosses. For example, if a connection
for net $x$ is made between two transistors in column 2 and column 5, then
$I_n(x) = \{2, 3, 4, 5\}$. Similar constraints apply to the P and G regions.
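As a rough illustration of the interval notion used here, the following Python sketch computes the column interval of a net and reports a conflict when two nets with overlapping intervals are given the same routing track. The function names and the representation of a net as a list of terminal columns are assumptions made for this example only; the actual formulation expresses the same condition over the track assignment bits $n_{ip}$.

```python
def column_interval(terminal_columns: list[int]) -> set[int]:
    """All columns crossed by a net whose terminals sit in the given columns."""
    lo, hi = min(terminal_columns), max(terminal_columns)
    return set(range(lo, hi + 1))


def tracks_conflict(cols_a: list[int], track_a: int,
                    cols_b: list[int], track_b: int) -> bool:
    """True when two nets overlap in columns and share the same track."""
    overlap = column_interval(cols_a) & column_interval(cols_b)
    return bool(overlap) and track_a == track_b
```

With terminals in columns 2 and 5, `column_interval([2, 5])` gives the set {2, 3, 4, 5}, matching the example in the text.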
VDD/GND Constraints: A VDD connection blocks the top metal routing track and a GND
connection blocks the bottom metal routing track. If an N net has a GND in its interval
$I_n(i)$, it cannot be placed in the lowest row of the N region. If a P net has a VDD in its interval
where L p is the pth significant bit for the highest routing track in P region or the pth signif-
icant bit for the lowest routing track in N region, depending on the n or p constraint used in
V-Net Pass Constraints: We defined V-net in Figure 7 as a net that has PMOS and
going to block horizontal metal tracks at the location where the PMOS and NMOS drain
connection is made. For this reason, special constraints have to be considered for V-net
routing to avoid shorts between different nets. If we assign the vertical drain connection
of a V-net to a specific column, other V-nets, along with all their gate terminals, should
not take that column assignment, to avoid shorting the two V-nets. This is illustrated in
Figure 8 at point 3 and point 4: the two V-nets do not cross each other. This constraint is
represented by:
where $A_{ijk_1} = 1$ means the source-drain connection of V-net $i$ is at the left of the
column $k_1$ that is spanned by another V-net $j$. $I_v(i)$ is the range of column numbers
that V-net $i$ crosses.
For a G-net connection, it should not short with the vertical connection of the V-net
drain connection for P and NMOS. A G-Net spans from column i to column j should be
$$x_{ij}\, a_{ij}\, (G_{ik} \oplus G_{jk}) = 0 \quad (2.14)$$
where $G_{ik} = 1$ means column $i$ is at the left of column $k$. $a_{ij} = 1$ means that the gate
connection is made in the G-region; $a_{ij} = 0$ means that the gate connection is made with
poly routing at the top or bottom of the cell.
Also for a G-net that spans from column i to column j, it should not take the same
routing track assignment as the horizontal portion of the V-net l if both of the nets span any
same column. i.e.
Connections to gate terminals of two V-Nets, if they overlap, should not be placed in the same
row in G-Region. For any V-Net i with gate terminal j should not overlap another V-Net l
Similar constraints apply in N region and P region as well to avoid short of two V-nets.
Poly Routing Overlap Constraints: Two different poly nets, namely poly net $m$, which
spans from column $i$ to column $j$, and poly net $n$, which spans from column $k$ to
column $l$, cannot overlap when their poly connections are made at the top or bottom of the cell:
$$x_{ij} \wedge x_{kl} \wedge \overline{a_{ij}} \wedge \overline{a_{kl}} \wedge \overline{\left[(g_{m1} \oplus g_{n1}) \vee \ldots \vee (g_{mP_g} \oplus g_{nP_g})\right]} = 0; \quad I_{poly}(m) \cap I_{poly}(n) \neq \emptyset \quad (2.17)$$
where $g_{mp}$ is the $p$th significant bit of the row number in the top/bottom regions assigned
to poly gate connection net $m$, and $I_{poly}(m)$ is the range of column numbers that poly net
$m$ crosses.
As we showed in Figure 2(b), a T-shaped poly connection creates a PSM phase error. The
constraint to avoid this T-shaped poly connection configuration is described as follows: two
same poly nets with terminals i − j and k − l cannot both be placed in either over or under
The example of this can be found in Figure 8. The gate connections at point 1 and point
4 in Figure 8 show that one of the gate connection is made with bottom poly routing track
Poly PSM Phase Error Constraint: If all columns between two poly nets are oc-
cupied by transistor gates, the number of poly gates in such interval should be even. This
constraint makes sure that there is no phase error generated when poly routing exists at the
Ci j = 1 if all columns between i and j are occupied, 0 otherwise. l Pg /rPg are the least sig-
Composability Constraint: For any two poly connections $i-j$ ($i < j$) and $k-l$ ($k < l$),
the column numbers of one poly connection should have the same least significant bit as
those of the other connection if all the columns between them are occupied. Satisfying this
constraint guarantees that the phase assignment at the top and bottom of the cell is the
same, thus achieving Alt-PSM composability:
$$x_{ij} \wedge \overline{a_{ij}} \wedge x_{kl} \wedge \overline{a_{kl}} \wedge \left[C_{ik}(l_{pg_i} \oplus l_{pg_k}) \vee C_{il}(l_{pg_i} \oplus l_{pg_l}) \vee C_{jk}(l_{pg_j} \oplus l_{pg_k}) \vee C_{jl}(l_{pg_j} \oplus l_{pg_l})\right] = 0 \quad (2.20)$$
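The parity condition in this constraint can be sketched in a few lines of Python. This is only an illustration under stated assumptions: both poly connections are taken to exist and to be routed at the top or bottom of the cell, "between" is read as strictly between the two columns, and the helper names `all_occupied` and `composable` are hypothetical. The check returns False exactly when some pair of endpoint columns has every intermediate column occupied yet differs in its least significant bit.

```python
def all_occupied(c1: int, c2: int, occupied: set[int]) -> bool:
    """C = 1 when every column strictly between c1 and c2 holds a gate."""
    lo, hi = sorted((c1, c2))
    return all(c in occupied for c in range(lo + 1, hi))


def composable(conn_a: tuple[int, int], conn_b: tuple[int, int],
               occupied: set[int]) -> bool:
    """Check the least-significant-bit rule for two top/bottom poly connections."""
    for col_a in conn_a:
        for col_b in conn_b:
            lsb_differs = (col_a & 1) != (col_b & 1)
            if all_occupied(col_a, col_b, occupied) and lsb_differs:
                return False
    return True
```

For instance, with columns 0 through 6 all occupied, connections (0, 2) and (4, 6) pass the check because every endpoint column is even, while (0, 2) and (3, 5) fail.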
Metal 1 Layer PSM Constraint: To avoid T-type connections that introduce phase
assignment problems in the Metal 1 layer, the width of the T's top can be increased to make
it a non-critical feature, thus preventing any phase error. In V-Nets with gate connections,
such situations are very likely. To overcome this, if a V-Net has a T connection, the width
of its vertical or source-drain connection is increased beyond the critical size (see point
3 in Figure 8). To provide enough space and design rule compliance, the columns on both
sides of such a V-Net cannot both be occupied; at least one of them should be empty.
where $V_{c-1} = 1$ if the column before the V-Net source-drain column is occupied, 0 otherwise,
and $V_{c+1} = 1$ if the column after the V-Net source-drain column is occupied, 0 otherwise.
However if V-Net
The flow for generating standard cell with SAT formulation is shown in Figure 9.
For complicated standard cells, the placement of the transistors is determined primarily by
performance and area. While our previous approach does yield the smallest area for the
standard cell design, it does not account for the timing tradeoffs of the transistor
placement. Thus we use the following methodology for Alt-PSM compliance and composability:
we use the method proposed in [4] to achieve Alt-PSM compliance and then we use the
1. Problem Formulation
We start by assuming that all standard cells are free of internal phase errors. It is then
straightforward to derive the fact that if we can assign the boundary regions with the same
phase for all standard cells, we create a standard cell library that is composable, i.e. putting
any two standard cells together will not generate phase errors. Thus the Alt-PSM compos-
Given a standard cell design without phase errors, determine whether the design is Alt-PSM
composable. If not, find the minimum modification of the design that makes it
composable.
As in previous work on Alt-PSM [9], the relation $b < B$ holds, where $b$ is the minimum
spacing between two features defined by the design rules and $B$ is the minimum spacing for
features with different phases between them. It is also commonly true that $B < 2b$.
Procedure: PSMPlaceRoute
cell layout.
1. W ← # NMOS (# PMOS)
2. Generate and solve CNF for placement constraints
W ← W +1
W ←W +1
Go back to 2
}
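The surviving steps of the procedure suggest an iterative loop that widens the cell until both the placement and the routing formulas become satisfiable. The Python sketch below shows one possible reading of that loop; it is an assumption-laden illustration, not the original procedure. The callbacks `solve_placement` and `solve_routing` are hypothetical stand-ins for generating the CNF and calling a SAT solver, each returning None when the formula is unsatisfiable, and `max_width` is an added safety bound that does not appear in the text.

```python
from typing import Any, Callable, Optional


def psm_place_route(
    n_transistors: int,
    solve_placement: Callable[[int], Optional[Any]],
    solve_routing: Callable[[Any], Optional[Any]],
    max_width: int,
) -> Optional[tuple[int, Any, Any]]:
    """Search for the smallest width W with a feasible placement and routing."""
    width = n_transistors  # step 1: W <- number of NMOS (PMOS) transistors
    while width <= max_width:
        placement = solve_placement(width)  # step 2: placement CNF
        if placement is not None:
            routing = solve_routing(placement)  # routing CNF for this placement
            if routing is not None:
                return width, placement, routing
        width += 1  # W <- W + 1, then go back to step 2
    return None
```

Because the width starts at its lower bound and grows by one column at a time, the first width for which both stages succeed is the smallest one reachable by this search.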
We construct the phase conflict graph according to [4], as shown in Figure 10. In the graph,
each feature is represented by a feature node, the region with 180 degree light is
designated by a shifter node, and if the space between two features is less than $B$, i.e.
the two features are in phase conflict, a conflict node represents this situation. There is
an edge between a feature node and its associated shifter node, and between the shifter node
and the conflict node. An edge between a feature node and the conflict node is also added if
there is no shifter node on that side of the feature. If one feature is in phase conflict
with multiple features, each phase conflict is represented by a different conflict node, and
by a different shifter node if the conflict occurs on the shifter side of the feature. The
work of [4] proved that this graph is planar. The layout is phase error free if the phase
conflict graph is a bipartite graph [4], i.e., there is no odd cycle in the graph.
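The bipartite test mentioned above can be carried out with a standard breadth-first two-coloring. The following Python sketch is a generic illustration rather than the implementation of [4]; it assumes the conflict graph is given as a symmetric adjacency dictionary in which every node appears as a key, and the function name `two_color` is hypothetical.

```python
from collections import deque
from typing import Hashable, Optional


def two_color(adjacency: dict[Hashable, list[Hashable]]) -> Optional[dict]:
    """Return a 0/1 coloring if the graph is bipartite, else None (odd cycle)."""
    color: dict = {}
    for start in adjacency:
        if start in color:
            continue  # already colored as part of an earlier component
        color[start] = 0
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in color:
                    color[neighbor] = 1 - color[node]
                    queue.append(neighbor)
                elif color[neighbor] == color[node]:
                    return None  # two adjacent nodes share a color: odd cycle
    return color
```

A returned coloring corresponds to a valid phase assignment, while a None result indicates an odd cycle and therefore a phase error in the layout.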
We define critical boundaries to be the boundaries of a cell that need the same phase
for composability. Critical boundaries can be specified by designers; for a row based
design, the critical boundaries are the left and right cell borders. In a conflict graph,
for every feature node whose location is within distance $B$ of any critical boundary
of the cell, we define another type of node, the boundary node: it is either the feature
node itself, if the shifter of the feature node points into the cell, or the shifter node of
the feature otherwise. For example, assuming the phase conflict graph looks like Figure 11,
if the critical boundaries are the four boundaries of the cell, nodes N1-N7 are boundary
nodes. We further define an odd path in a graph as a path composed of an odd number of
edges. For example, the path between N2 and N1 in Figure 11 is an odd path.
We point out a simple observation: we can assign the same phase to all critical
boundaries if and only if no odd path exists between any two of the boundary
nodes in the phase conflict graph. Another fact is that if there are multiple paths between
two boundary nodes, either all of them are odd paths or none of them is, since
there is no odd cycle in the conflict graph of a cell layout that is itself free of phase
errors. Thus, the problem of cell composability is reduced to:
Given a phase conflict graph of a cell layout without phase errors, determine if odd
path between any two boundary nodes exists. If so, determine a minimum change of layout
2. Algorithm
For every edge in the phase conflict graph, we define the edge weight as follows. An edge
that has a conflict node incident on it lies between two feature nodes in phase conflict,
which we reach by traversing the edge in both directions; we define the weight of such an
edge to be the silicon area penalty incurred if we increase the spacing between those two
features to remove the phase conflict. For all other edges, the edge weights are infinite.
Fig. 12. Phase conflict bipartite graph: empty dots represent boundary nodes, solid dots
represent all other nodes.
Since the original cell layout does not have any phase errors, the corresponding phase
conflict graph is bipartite, i.e., we can color the graph with two colors. The nodes
can then be partitioned into two sets L and R, with every edge running between L and R. For any node
x in L and node y in R, if they are connected in the conflict graph, every path from x to y is
an odd path. For any two nodes x and y in the same set, no path from x to y can be
an odd path. Hence, the composability problem becomes finding the minimum edge weight
cut that separates boundary nodes belonging to different sets. In Figure 12, the empty dots
indicate boundary nodes and the solid dots indicate all other nodes. We use a network
flow model to separate the boundary nodes in subset L from those in subset R by
introducing a source node S and a sink node T. We add edges between S and all the
boundary nodes in subset L and edges between T and all the boundary nodes in subset R,
with all the added edges having infinite edge weights. We then search for a minimum weight
cut between S and T. The result gives the minimum cut between the boundary nodes in L and R,
which is the same as the minimum change of the layout for composability.
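The procedure above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this work: the node numbering, the `(u, v, weight)` edge list and the function names `two_color` and `min_composability_cut` are hypothetical, and a plain augmenting-path (Edmonds-Karp) max-flow stands in for the Minimum Capacity Cut algorithm of [10].

```python
from collections import deque

INF = float("inf")

def two_color(n, edges):
    """BFS 2-coloring of the conflict graph; side[v] is 0 (set L) or 1 (set R)."""
    adj = [[] for _ in range(n)]
    for u, v, _ in edges:
        adj[u].append(v)
        adj[v].append(u)
    side = [None] * n
    for start in range(n):
        if side[start] is not None:
            continue
        side[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if side[v] is None:
                    side[v] = 1 - side[u]
                    queue.append(v)
                elif side[v] == side[u]:
                    raise ValueError("odd cycle: the layout itself has a phase error")
    return side

def min_composability_cut(n, edges, boundary):
    """Return (cut weight, cut edges) separating boundary nodes in L from those in R."""
    side = two_color(n, edges)
    s, t = n, n + 1                      # added source S and sink T
    cap = [dict() for _ in range(n + 2)]

    def add(u, v, w):                    # undirected edge as two residual arcs
        cap[u][v] = cap[u].get(v, 0) + w
        cap[v][u] = cap[v].get(u, 0) + w

    for u, v, w in edges:
        add(u, v, w)
    for b in boundary:                   # infinite edges from S / T to boundary nodes
        add(s if side[b] == 0 else t, b, INF)

    total = 0
    while True:
        parent = {s: None}               # BFS for a shortest augmenting path
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            break                        # parent now holds the S side of the cut
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] for u, v in path)
        if push == INF:
            raise ValueError("no finite cut separates the boundary nodes")
        for u, v in path:
            cap[u][v] -= push
            cap[v][u] += push
        total += push

    reachable = set(parent)
    cut = [(u, v) for u, v, _ in edges if (u in reachable) != (v in reachable)]
    return total, cut

# Hypothetical 5-node graph: nodes 0 and 3 are boundary nodes joined by odd paths.
example_edges = [(0, 1, INF), (1, 2, 3.0), (0, 4, INF), (4, 2, 5.0), (2, 3, 4.0)]
print(min_composability_cut(5, example_edges, [0, 3]))  # (4.0, [(2, 3)])
```

In this toy graph, cutting the single edge of weight 4.0 is cheaper than cutting the two edges of weight 3.0 and 5.0, so the sketch reports that edge as the layout change.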
The optimality of this algorithm follows directly. If we cut all edges with finite weight, we
can separate S from T, which implies that the minimum cut between S and T has to be finite, and
each edge in the cut has a conflict node incident on it, since the weight of the edge is
infinite otherwise. Because every cut that we make is necessary and the weight of the cut
is minimized, this algorithm is optimal for the design. Note that the cut could also include
an edge that has no boundary node incident on it, but such an edge must have a conflict node incident
on it.
We analyze the complexity of this algorithm as follows. The generation of the conflict
graph has complexity O(n log n), as demonstrated in [4], with n being the number of
features. The partitioning of the nodes into the sets L and R has complexity O(n), as each
feature needs to be processed only once. We use the Minimum Capacity Cut algorithm [10]
as the S-T minimum cut algorithm, with the capacity of each edge set to the weight of the edge.
This minimum cut algorithm takes O(n^3) time. Therefore, the complexity of our algorithm
is O(n^3). Since the number of features in a typical standard cell is fairly small, the running
time of our algorithm is quite reasonable.
In Figure 13, we propose a standard cell design flow for Alt-PSM compliance and
composability. We take the standard cell schematic or netlist and use the Alt-PSM aware cell
construction flow to generate the layout of the cell. If the cell meets the performance
specification, this is the final design. If not, we go through the normal design flow to generate
the layout and then modify the layout for Alt-PSM purposes; the final design also has to
satisfy the performance specification.
E. Experiments
The constraints for Placement and Routing described above were transformed into SAT
formulations and solved using the Siege Variant 4 SAT solver [7]. In addition to its fast run
time, the Siege SAT solver has the major advantage that its output depends on its seed. This
helps to avoid adding additional clauses during placement stage to suppress an existing
Circuit  # Transistors  # Placement vars  # Routing vars  Width (columns)  Time (s)
and3     8              24                10              4                0.03
and8     16             90                12              9                0.73
aoi22    10             40                7               6                0.05
nd2ab    8              32                17              5                0.32
nor2     4              48                21              6                0.23
mux2     10             40                23              7                0.07
nand3    6              40                14              5                0.12
xor2     12             48                30              8                0.58
cgi2     10             40                18              6                0.04
nor5     10             40                3               5                0.08
nor8     18             64                18              8                0.53
Our formulation has been implemented in C++ and experiments were conducted on
a Linux server with 2GB of RAM. A timeout of 3000s was used in our experiments. Table 1
lists some of the test cases used in our experiments. The first column gives the circuit
name, followed by the number of transistors in each circuit, the number of variables for
Placement, the number of variables for Routing, the resultant width in columns, and the time
taken to get a satisfiable result. The number of rows is assumed to be 3 in the N/P region, 2 in
the G region, and one each in the Over and Under G regions. Apart from being of minimum
width, the layouts generated by our formulation are PSM compliant and composable.
Figure 14 shows a comparison of layouts with and without PSM considerations in the
placement and routing phases for a 2-input XOR gate that uses 12 transistors. Two T junctions
(highlighted by circles) that are not PSM compliant can be noticed in the Metal 1 layer.
To avoid any phase conflict, the width of the source-drain portion of the two V-Nets needs to be
increased beyond the critical feature size. This is shown in Figure 14(b). Without any increase
in area, we generate a PSM clean layout. Figure 15 compares the layouts for a 2-input NOR
gate with each transistor split into 3 fingers. Again, T junctions in the POLY layer are
removed to get a PSM compliant and composable layout.
Fig. 14. (a) XOR gate layout: T joints in Metal 1 layer that pose phase conflict problem. (b)
PSM clean layout.
2. Layout Modification
We ran the algorithm on a selected set of test cells. All test cells are pre-processed so
that they are Alt-PSM compliant before applying our algorithm. This pre-processing is done
manually, as typical standard cells require only minor changes for Alt-PSM compliance.
Table 2 shows the area increment comparison of our algorithm and the method of adding
blank area around the cells. We used the minimum poly spacing b = 0.16 unit length and
phase conflict spacing B = 0.32 unit length.
Fig. 15. (a) 2-input NOR gate layout with N-P pairs split into 3 fingers each: T joints in Poly.
(b) PSM compliant and composable layout.
The comparison shows the area savings of our approach. For library cells that are
placed into a design multiple times, the total area saving for Alt-PSM designs will be
significant.
F. Conclusion
We have proposed a two-way approach to generating an Alt-PSM aware standard cell library.
Our SAT based methodology results in Alt-PSM clean standard cells with minimum area,
while our cell modification based methodology generates Alt-PSM clean standard cells
with minimum impact on cell performance. We introduced a new Alt-PSM aware standard
cell design flow. Using our approach, the generated standard cell library can be used
directly in the place and route environment without any concerns about poly layer Alt-PSM
phase errors.
CHAPTER III
The printability problem due to strong diffraction effects poses a serious threat to the
progress of VLSI technology. A circuit layout with poor printability implies that it is diffi-
cult to make the printed features on wafers follow designed shapes without distortions. The
problem but cannot reverse the trend of deterioration. Moreover, over-usage of RET may
dramatically increase photo-mask cost and increase the cycle time for volume production.
Thus, there is a strong demand to consider the sub-wavelength printability problem in cir-
cuit layout designs. However, layout printability optimization should not degrade circuit
timing performance. In this chapter, we introduce a wire sizing and spacing method to im-
prove wire printability with minimal adverse impact on interconnect timing performance.
A new printability model is proposed to handle partially coherent illuminations. The com-
plex printability and timing optimization problem is solved in a 2-phase approach. The
difficulty of the printability optimization due to its multimodal nature is handled with a
sensitivity based heuristic. A coupling aware timing driven continuous wire sizing algorithm
is also provided. Lithographic simulation results show that our approach can improve
the printability in terms of EPE (Edge Placement Error) by 20%-40% without violating
timing, wire width and spacing constraints.
A. Introduction
Previous works have addressed DFM issues within the design environment. In [3, 4],
Alternating PSM compliance is modeled as a graph problem and layout modification algorithms
are developed to achieve Alternating PSM compliance with minimum cost change,
where the cost can be defined in terms of area or timing. Compared with Alternating PSM
compliance, the other RET procedures are very difficult to abstract into concise models
that can be easily embedded in automatic layout algorithms. Recently, the work of [14]
circumvents the difficult RET abstraction issue by optimizing interference intensity instead.
This is based on the observation that a reduced interference intensity can alleviate
the workload of RET and that the interference intensity model is relatively easy to obtain.
An OPC friendly maze routing algorithm is proposed in [14] using the interference intensity
model. An RET aware routing algorithm based on fast litho simulation is proposed
in [16].
In layout printability optimization, conventional design objectives such as timing performance
cannot be ignored, since timing performance heavily depends on physical layout
in today’s interconnect dominated technology [17]. In this chapter, we focus on the wire
sizing problem, which plays an important role in timing performance, and we combine
wire sizing and spacing to improve the printability of the design. There are many previous
works on wire sizing, but mostly for timing optimization alone. The work of [17] attempts
to minimize a weighted sum of sink delays for a Steiner tree. A sensitivity based wire
insertion and wire sizing algorithm is proposed in [19] to achieve a timing-area tradeoff for
a Steiner tree. A circuit-wise gate sizing and wire sizing method based on local refinement
is developed in [20]. In [21], the circuit-wise simultaneous gate sizing and wire sizing
problem is solved optimally using Lagrangian relaxation. A simultaneous wire sizing and
issues in both wire sizing and spacing. We propose a new printability model and we also
show that the printability model can be used in the aggressively optimized design to im-
we also imply this approach as the use model of the lithography optimization for inter-
connection metals. Our goal is to improve wire printability, i.e., make printed wires have
sharper boundaries, so that the cost of RET and photo-mask can be reduced. Moreover, the
printability driven wire sizing method should minimize any adverse impact on interconnect
timing performance. The major contributions of this work are listed as follows.
a bottleneck for printability optimization. Compared to the model in [14], our model
has two advantages: (1) our model can handle partially coherent illuminations which
is the mainstream illumination method in practical photolithography while the model
in [14] is limited to coherent illuminations; (2) our model directly measures the fea-
ture sharpness and considers the overall light intensity effect instead of considering
only interference light intensity as in [14].
approach considering different problem natures of printability and timing. The diffi-
culty of the printability optimization due to its multimodal nature is handled with a
• A coupling aware timing driven continuous wire sizing algorithm is also provided.
The closest works are [21] which does not consider coupling capacitance, and [22]
Implementing litho friendly design techniques in the design phase, such as our approach of
adjusting wire sizing and spacing, will alleviate the printability problem. It will help to reduce
the effort level of OPC/RET that has to be performed at the manufacturing stage. This
directly reduces mask cost, but the more important cost saving is realized
by reducing the number of iteration cycles of mask correction, which helps to
shorten the time needed to achieve volume production. Lithographic simulation results
show that our approach can improve the wire printability in terms of EPE (Edge Placement
B. Problem Formulation
The input to the wire sizing problem is a set of Steiner trees $T = \{T_1, T_2, ...\}$ representing
the layout of signal nets. In $T$, there is a set of wire edges $E = \{e_1, e_2, ...\}$ and a set of sink
nodes $S = \{s_1, s_2, ...\}$ such that each edge and each sink belong to a certain Steiner tree.
Each edge $e_i \in E$ has a width $w_i$ which is bounded in a range $[L_i, U_i]$. The edge width
vector for all edges is $w = (w_1, w_2, ...)^T$, and the location vectors of all edges are $x = (x_1, x_2, ...)^T$ and
$y = (y_1, y_2, ...)^T$. In Figure 16, an example for horizontal wires is illustrated. The space
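[Figure 16: horizontal wires with widths $w_i$ and $w_k$ at locations $y_i$ and $y_k$, separated by spacing $s_{ik}$.]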
Each sink $s_j \in S$ has a required arrival time (RAT) $q_j$ and a delay $t_j$ whose model is
$$L_i \le w_i \le U_i \quad \forall e_i \in E \tag{3.3}$$
In other words, we attempt to maximize the overall printability of the layout subject
to timing and wire width/spacing constraints. This optimization framework can include
other objectives such as area and power consumption. We consider continuous wire sizing
wire sizing solutions are needed, they can be obtained through rounding the continuous
solutions as in [18].
The above seemingly simple formulation is actually a rather difficult non-linear pro-
gramming problem, since the expressions for objective (3.1) and constraint (3.2) are com-
plicated. Especially, the objective function Θ is not unimodal in general. Therefore, we
1. Obtain a wire sizing solution considering coupling that satisfies constraints (3.2),
(3.3) and (3.4) regardless of the printability. A Lagrangian relaxation based algorithm
2. Based on the solution of phase 1, maximize the printability Θ while the constraints
(3.2), (3.3) and (3.4) are still satisfied. A sensitivity based local adjustment heuristic
is introduced in Section E to solve this sub-problem by adjusting both wire width and
wire spacing. The result of this phase is a much better design in terms of printability,
yet still satisfies all the timing and wire width/spacing constraints.
The printability maximization problem and the delay constraint satisfaction problem
are different in nature. The delay optimizations are net based and have no clear geometrical
boundary, especially when coupling capacitance is considered. In other words, the delays for
sinks far apart are intertwined with each other through the nets and coupling. In contrast, the
lithographic effect from an edge or to an edge is localized. By solving the timing constraints
first in phase 1, phase 2 can be focused on maximizing the printability function
through geometrically local adjustments. The different problem natures also justify why
our 2-phase approach is more practical than solving the entire problem using Lagrangian
C. Printability Model
1. Aerial Image
Since the printability model is based on the light intensity distribution on the wafer plane,
we first discuss the light intensity models for three basic types of illumination: (1) coherent,
(2) incoherent and (3) partially coherent. In the following discussions, we assume
that the optical system is a 1× reduction system. Although practical steppers and scanners
are usually 4× or 5× reduction systems, a 1× system with the same NA (numerical
aperture) [1] gives essentially identical printing results under the thin mask
approximation and the assumption of an aberration-free system.
Coherent illumination: The complex field distribution $g_i(x_i, y_i)$ on the image plane
where $g_o(x_o, y_o)$ is the complex field distribution on the object plane (mask) and $h(x, y)$
represents the impulse response function of the optical system. In the frequency domain,
spatial frequencies along the x and y directions are denoted as $f_x$ and $f_y$, respectively, and the
$$G_i(f_x, f_y) = H(f_x, f_y)\, G_o(f_x, f_y) \tag{3.6}$$
where $I_i(x_i, y_i) = |g_i(x_i, y_i)|^2$, $I_o(x_o, y_o) = |g_o(x_o, y_o)|^2$ are the light intensity distribution on
tems employ partially coherent illumination, although the coherent and incoherent illumination
based models provide theoretical foundations for partially coherent illumination based
models. For partially coherent illumination, there are several existing methods of computing
the aerial image, such as the Hopkins formula and eigenfunction expansion [23]. Abbe’s approach
and the eigenfunction expansion method decompose the illumination source into many
coherent sources, then calculate the complex fields due to each source, and finally add
together the light intensities due to each source to obtain the total light intensity distribu-
existing methods are too computationally expensive to be adopted in the wire sizing and
spacing optimization procedure.
Fig. 17. Infinite line model in dashed bounding box. Semi-infinite line model in dotted
boxes.
The complex field at one point p due to a line segment (wire segment) e under coherent
illumination can be approximated as a quadratic function of the segment width $w$,
$a_0(\vec{d}) + a_1(\vec{d})w + a_2(\vec{d})w^2$.
The coefficients of this quadratic function depend on the distance vector $\vec{d}$. Similarly, the
light intensity from incoherent illumination can also be approximated as
$b_0(\vec{d}) + b_1(\vec{d})w + b_2(\vec{d})w^2$. If there are m line segments with widths $w_1, w_2, ..., w_m$,
and the distances from p to them are $\vec{d}_1, \vec{d}_2, ..., \vec{d}_m$, then the total light intensity at p is:
$$I_p = (1 - \sigma^2)\left|\sum_{k=1}^{m}\left[a_0(\vec{d}_k) + a_1(\vec{d}_k)w_k + a_2(\vec{d}_k)w_k^2\right]\right|^2 + \sigma^2 \sum_{k=1}^{m}\left[b_0(\vec{d}_k) + b_1(\vec{d}_k)w_k + b_2(\vec{d}_k)w_k^2\right] \tag{3.9}$$
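To make the structure of Equation (3.9) concrete, the short Python sketch below evaluates it at one point. It is only an illustration: the function name `light_intensity` and the argument layout are hypothetical, the coefficient triples are assumed to have already been read from the lookup tables for each distance vector, and the numbers in the example are made up rather than taken from any lithography data.

```python
def light_intensity(widths, coeff_a, coeff_b, sigma):
    """Evaluate Eq. (3.9) at one point p.

    widths   : widths w_k of the m nearby line segments
    coeff_a  : per-segment (a0, a1, a2) for the coherent complex field
               (may be complex numbers), already looked up for d_k
    coeff_b  : per-segment (b0, b1, b2) for the incoherent intensity
    sigma    : the sigma parameter that weights the two contributions
    """
    # Coherent part: complex fields add first, then the magnitude is squared.
    field = sum(a0 + a1 * w + a2 * w * w
                for w, (a0, a1, a2) in zip(widths, coeff_a))
    # Incoherent part: intensities add directly.
    incoherent = sum(b0 + b1 * w + b2 * w * w
                     for w, (b0, b1, b2) in zip(widths, coeff_b))
    return (1 - sigma ** 2) * abs(field) ** 2 + sigma ** 2 * incoherent

# Made-up coefficients for two segments, purely for illustration.
widths = [0.2, 0.1]
coeff_a = [(0.1, 1.0, 0.0), (0.0, 0.5, 0.0)]
coeff_b = [(0.0, 1.0, 0.0), (0.0, 0.0, 0.0)]
print(light_intensity(widths, coeff_a, coeff_b, sigma=0.5))
```

Setting `sigma` to 0 leaves only the squared magnitude of the summed field, and setting it to 1 leaves only the summed intensities, which mirrors how the equation blends the coherent and incoherent models.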
The distance vector $\vec{d}$ is determined differently in two cases. If the point is around
the middle of segment e, as p1 in Figure 17, the infinite line model is applied and $\vec{d}$ is
equivalent to the distance scalar y in Figure 17. Since the coefficient functions such as $a_0(\vec{d})$
and $b_1(\vec{d})$ are multimodal and very complex, their values are saved in a lookup table. If the
point is close to one end of the segment, as p2 in Figure 17, the semi-infinite line model is
employed. In this case, the distance vector $\vec{d}$ is determined by the x and y components shown
in Figure 17. Consequently, a 2-D lookup table is needed for the semi-infinite line model.
In contrast to the model in [14], our model does not depend on the segment length directly
Fig. 18. Contour plot of complex field amplitude vs. feature linewidth and distance to fea-
ture for coherent illumination.
Comparisons between the approximated model and the results of simulation [24] are shown
in Figure 18 and Figure 19. Figure 18 plots the contours of the amplitude of the complex
field for coherent illumination with respect to the linewidth of a feature and the distance to the
feature. In Figure 19, the contours of light intensity with respect to the linewidth and the
distance are plotted. We also pick a few real circuit patterns and compare the light intensity
calculated with our approximated model against the simulation [24] results in Figure 20.
Figure 20(a) shows the test pattern we used, and Figure 20(b) plots the light intensity
results from our model together with those from SPLAT calculations. It can be seen that the results
from the approximated model are very close to the simulation results.
Fig. 19. Contour plot of light intensity vs. feature linewidth and distance to feature for
partially coherent illumination.
[Figure 20: (a) test structure, labeled with L and wire width W; (b) light intensity versus wire width W (um), comparing Equation 9 with SPLAT.]
2. Printability Function
In order to print a sharp image, we wish the overall light intensity inside a feature (wire
segment) to be at full level, represented by 1, while the light intensity outside the feature should
be 0. Ideally, there is a sharp light intensity transition along the boundary of the rectangle
representing a wire segment. If the transition threshold is denoted as $I_{th}$, we wish $I_i(x_i, y_i) \ge I_{th}$
inside the segment and $I_i(x_i, y_i) \le I_{th}$ outside the segment. Therefore, the ideal case is $I_i(x_i, y_i) = I_{th}$
when $(x_i, y_i)$ is on the segment boundary. We chop the boundary of a wire segment into
multiple pieces that are small enough that the light intensity at every point of
a single boundary piece $\xi$ can be regarded as the same value $I_\xi$. Then the printability function
is defined as
This printability model has two major differences from the model employed in [14].
First, the model in [14] is for coherent illumination while ours is for partially coherent
illumination, which is much closer to practical reality. Second, the model in [14] emphasizes
interference while ours is focused on image sharpness. The work of [14] attempts
to limit the interference to a target wire from its neighbor wires. However, the effect of
light from the target wire itself is not considered. In practice, it is the overall effect of light
from the target wire and its neighbor wires that determines the printability of the target wire.
Therefore, our model captures a more complete picture of the printability problem.
With the printability model, we are able to simplify the complicated lithography modeling
and simulations and make them suitable for use in the design environment. In Figure 21,
we reproduce a figure from [16] to show the relationship between light intensity and EPE (Edge
Placement Error). We can see that EPE is the error along the x-axis caused by the light intensity
error along the y-axis, so the two are highly correlated.
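The link between an intensity profile and EPE can be illustrated with a small Python sketch. It assumes a hypothetical one-dimensional cut line across an edge with sampled intensities, locates the printed edge where the profile crosses the threshold by linear interpolation, and reports the signed offset from the intended edge. The function names, the sign convention and the sample numbers are all assumptions made for illustration.

```python
def printed_edge(xs, intensities, i_th):
    """Return the first position where the sampled profile crosses i_th.

    xs and intensities are samples along a cut line perpendicular to the edge.
    Linear interpolation is used between neighboring samples.
    """
    samples = list(zip(xs, intensities))
    for (x0, i0), (x1, i1) in zip(samples, samples[1:]):
        crosses = (i0 - i_th) * (i1 - i_th) <= 0
        if crosses and i0 != i1:
            return x0 + (i_th - i0) * (x1 - x0) / (i1 - i0)
    return None  # the profile never reaches the threshold

def edge_placement_error(xs, intensities, i_th, intended_edge):
    """Signed distance from the intended edge to the printed edge."""
    edge = printed_edge(xs, intensities, i_th)
    if edge is None:
        raise ValueError("profile does not cross the threshold")
    return edge - intended_edge

# Made-up profile: intensity falls from 0.6 to 0.1 across the edge region.
xs = [0.0, 0.1, 0.2, 0.3]
profile = [0.6, 0.4, 0.2, 0.1]
print(edge_placement_error(xs, profile, i_th=0.3, intended_edge=0.12))
```

A shift in the intensity profile moves the threshold crossing sideways, which is why an intensity error at the intended edge translates into a placement error.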
[Figure 21: mask, light intensity and EPE.]
In phase 1 of our method, we need to find a wire sizing solution which satisfies the constraints
(3.2), (3.3) and (3.4). Here we perform wire sizing without changing the center
locations of wires. Thus, only $w$ are variables and the spacing constraint (3.4) can be implicitly
satisfied by enforcing the width constraint (3.3). As in [21], this sub-problem can be
Subject to $L_i \le w_i \le U_i \quad \forall e_i \in E$
where $\{\mu_1, \mu_2, ...\}$ forms the set of Lagrangian multipliers. Alternatively, the problem in
Maximize $\tau$
Subject to $t_j(w) + \tau \le q_j \quad \forall s_j \in S$
$L_i \le w_i \le U_i \quad \forall e_i \in E$
where $\tau$ indicates the minimum timing slack. This minimum slack maximization problem
Subject to $L_i \le w_i \le U_i \quad \forall e_i \in E$
We can see that the Lagrangian problems (3.11) and (3.12) are very similar to each
other and can be solved by the same method. The values of the Lagrangian multipliers
can be found using the sub-gradient method as in [21]. For each set of fixed Lagrangian
multipliers, the problem (3.11) is equivalent to minimizing a weighted sum of sink delays
[22]. However, the algorithm of [22] is for discrete wire sizing and the work of [21] does not
edge $e_i$, its length is $l_i$ and its width is $w_i$. Its edge capacitance is $C_{e,i}$ and its downstream
capacitance is $C_{L,i}$. The wire resistance coefficient is denoted as $r$. Consider a sink node $s_j$
in a Steiner tree $T_k$ with source node at $s_k$, driver resistance $R_k$ and total load capacitance
$C_{total,k}$. Then, the delay $t_j$ to sink $s_j$ can be expressed as
$$t_j = R_k C_{total,k} + \sum_{\forall e_i \in path(s_k, s_j)} \frac{r l_i}{w_i}\left(\frac{C_{e,i}}{2} + C_{L,i}\right) \tag{3.13}$$
The area, fringing and coupling capacitance coefficients are represented as $c_a$, $c_f$ and $c_x$,
respectively. The set of edges adjacent to $e_i$ is denoted as $Adj(e_i)$. The wire pitch
between two edges $e_h$ and $e_i$ is $P_{hi}$. Then, the edge capacitance $C_{e,i}$ can be obtained as:
$$C_{e,i} = c_a l_i w_i + c_f l_i + \sum_{e_h \in Adj(e_i)} \frac{2 c_x l_i}{2P_{hi} - w_h - w_i}$$
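The two expressions above translate directly into code. The following Python sketch evaluates the edge capacitance and the sink delay of Equation (3.13) for hand-specified data. The function names, the dictionary-based data layout and the unit-free numbers are hypothetical and serve only to show how the terms combine.

```python
def edge_capacitance(i, length, width, adjacent, pitch, c_a, c_f, c_x):
    """Area + fringing + coupling capacitance of edge i.

    adjacent[i] lists the neighbors h of edge i, and pitch[i][h] is the
    center-to-center pitch P_hi, so the spacing is P_hi - (w_h + w_i) / 2.
    """
    coupling = sum(
        2 * c_x * length[i] / (2 * pitch[i][h] - width[h] - width[i])
        for h in adjacent[i]
    )
    return c_a * length[i] * width[i] + c_f * length[i] + coupling

def sink_delay(path, length, width, c_edge, c_down, r, r_driver, c_total):
    """Delay of Eq. (3.13) along the edges on the source-to-sink path."""
    wire = sum(
        r * length[i] / width[i] * (c_edge[i] / 2 + c_down[i])
        for i in path
    )
    return r_driver * c_total + wire

# Illustrative numbers: one edge of length 2 and width 1 with one neighbor.
length = {0: 2.0, 1: 2.0}
width = {0: 1.0, 1: 1.0}
c0 = edge_capacitance(0, length, width, {0: [1]}, {0: {1: 3.0}},
                      c_a=1.0, c_f=0.5, c_x=1.0)
print(c0)  # 4.0
print(sink_delay([0], length, width, {0: c0}, {0: 1.0},
                 r=1.0, r_driver=2.0, c_total=4.0))  # 14.0
```

Widening an edge lowers its resistance term but raises both its area capacitance and its coupling capacitance, since the spacing to the neighbor shrinks. This is the tradeoff the sizing algorithm has to balance.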
where E is a constant. Function $J(w)$ is the coupling related part and $U(w)$ is similar to the
delay function in [21], where coupling capacitance is not considered. They can be expressed
as:
$$J(w) = \sum_{\forall e_i \in E} \sum_{e_h \in Adj(e_i)} \frac{2 c_x l_i}{2P_{hi} - w_h - w_i} \left[ \frac{r l_i}{w_i} \sum_{s_b \in Des(e_i)} \mu_b + \sum_{e_h, e_i \in Des(e_j)} \frac{r l_j}{w_j} \sum_{s_b \in Des(e_j)} \mu_b + \sum_{e_h, e_i \in Des(R_k)} R_k \sum_{s_b \in Des(R_k)} \mu_b \right]$$
$$U(w) = \sum_{e_i \in E} \frac{\alpha_i}{w_i} + \sum_{e_i \in E} \beta_i w_i + \sum_{e_i, e_j \in E} \gamma_{ij} \frac{w_i}{w_j}$$
where $Des(e)$ indicates the set of descendant edges of $e$, and $\alpha_i$, $\beta_i$ and $\gamma_{ij}$ are positive constants.
According to [25], $U(w)$ is a unary posynomial and therefore can be optimized through
convex programming. In order to optimize $F(w)$, the convexity of $J(w)$ needs to be investigated
as well.
Theorem 1: Function $J(w)$ is convex.
Proof: It can be observed that $J(w)$ is a positive linear combination of the following base
functions:
$$\phi(x, y, z) = \frac{1}{z(D - x - y)}$$
$$\psi(x, y) = \frac{1}{x(D - x - y)}$$
$$\zeta(x, y) = \frac{1}{D - x - y}$$
where $D$ is a positive constant and $x > 0$, $y > 0$, $z > 0$, $D - x - y > 0$. Therefore, $J(w)$ is
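Let $A = \frac{1}{z}$ and $B = \frac{1}{D - x - y}$, then the Hessian matrix for $\phi(x, y, z)$, with the variables
ordered as $(z, x, y)$, can be derived as:
$$H = \begin{pmatrix} 2A^3B & -A^2B^2 & -A^2B^2 \\ -A^2B^2 & 2AB^3 & 2AB^3 \\ -A^2B^2 & 2AB^3 & 2AB^3 \end{pmatrix}$$
For any vector $v = (a, b, c)^T$,
$$v^T H v = 2AB\left[a^2A^2 - a(b + c)AB + (b + c)^2B^2\right] \ge 0$$
where the inequality holds because $A > 0$, $B > 0$ and $p^2 - pq + q^2 \ge 0$ for all real $p$ and $q$.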
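As a quick numerical sanity check of the Hessian above, the following Python sketch compares one mixed entry against a central finite difference of $\phi$ and samples the quadratic form $v^T H v$ at random points. It is a hedged illustration, not part of the proof: the function names, the choice $D = 10$ and the sampling ranges are arbitrary assumptions.

```python
import random

def phi(z, x, y, D):
    return 1.0 / (z * (D - x - y))

def hessian(z, x, y, D):
    """Hessian of phi with variables ordered (z, x, y), A = 1/z, B = 1/(D-x-y)."""
    A, B = 1.0 / z, 1.0 / (D - x - y)
    zz = 2 * A ** 3 * B
    zx = -A ** 2 * B ** 2
    xx = 2 * A * B ** 3
    return [[zz, zx, zx],
            [zx, xx, xx],
            [zx, xx, xx]]

def quadratic_form(H, v):
    return sum(v[i] * H[i][j] * v[j] for i in range(3) for j in range(3))

def mixed_zx_fd(z, x, y, D, h=1e-3):
    """Central finite-difference estimate of the mixed derivative in z and x."""
    return (phi(z + h, x + h, y, D) - phi(z + h, x - h, y, D)
            - phi(z - h, x + h, y, D) + phi(z - h, x - h, y, D)) / (4 * h * h)

def min_quadratic_form(trials=1000, seed=0, D=10.0):
    """Smallest sampled value of v^T H v over random feasible points."""
    rng = random.Random(seed)
    worst = float("inf")
    for _ in range(trials):
        x, y = rng.uniform(0.1, 4.0), rng.uniform(0.1, 4.0)  # keeps D - x - y > 0
        z = rng.uniform(0.1, 5.0)
        v = [rng.uniform(-1.0, 1.0) for _ in range(3)]
        worst = min(worst, quadratic_form(hessian(z, x, y, D), v))
    return worst

print(mixed_zx_fd(2.0, 1.0, 1.0, 10.0), hessian(2.0, 1.0, 1.0, 10.0)[0][1])
print(min_quadratic_form() >= 0.0)
```

A non-negative minimum over the samples is consistent with the Hessian being positive semidefinite on the feasible region, although random sampling cannot replace the algebraic argument.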
[Figure 22: example with a source, Sink 1, Sink 2, Sink 3, a point P, and wire segments e1 to e4.]
Since $J(w)$ is convex and $U(w)$ is a posynomial, there is a unique global minimum
solution for $F(w)$ [25]. Therefore, the global minimum solution can be reached through
iterative local optimization as in [25]. In each local optimization, wire sizing is performed
for only one edge $e_i$ while the widths of the other edges are fixed. We use $e_i \bowtie s_j$ to denote
that edge $e_i$ and sink $s_j$ are in the same Steiner tree. Thus, the delay function $F(w)$ in the
$$f(w_i) = \sum_{s_j \in Des(e_i)} \frac{\alpha_{ij}}{w_i} + \sum_{e_i \bowtie s_j} \beta_{ij} w_i + \sum_{s_j \in Des(e_i),\, e_k \in Adj(e_i)} \frac{\eta_{ij}}{w_i (Q_{ik} - w_i)} + \sum_{s_j \bowtie e_i,\, s_j \bowtie e_k,\, e_k \in Adj(e_i)} \frac{\rho_{ij}}{Q_{ik} - w_i}$$
where $\eta_{ij}$, $\rho_{ij}$ and $Q_{ik}$ are positive constants. In the above equation, the first term represents
the impact of the wire resistance of $e_i$ on its descendant nodes, for example, the effect of $e_1$
on sink 1 and sink 2 in Figure 22. The second term reflects the effect of the self
capacitance of wire $e_i$ on sink nodes in the same tree, as $e_3$ for sinks 1, 2 and 3 in Figure 22. The third
term represents the product of the wire resistance of $e_i$ and the coupling capacitance of $e_i$, and its
effect on the descendant nodes of $e_i$. For the example in Figure 22, the wire resistance of $e_1$ and
its coupling capacitance with $e_2$ affect the delay at sinks 1 and 2. The last term shows that
the coupling capacitance due to wire $e_i$ presents a capacitive load to its own Steiner tree and
to the Steiner tree it is coupled with. For example, the width of edge $e_4$ affects the delay of
sinks 1, 2 and 3 in Figure 22. Figure 22 also shows that a wire segment could be part of a
Steiner tree edge. Here, we define a portion of the Steiner tree as a wire segment if the
coupling remains the same across the whole segment. For example, $e_1$ is a segment, but the
rest of the wire toward sink 1 is another segment. This gives more flexibility for
optimization, since a single physical wire can have different widths along its length.
Lemma 1: Function $f(w_i)$ is convex.
Proof: It can be shown that $\frac{d^2 f(w_i)}{d w_i^2} > 0$. □
If there is a value $\tilde{w}_i$ satisfying $\frac{d f(w_i)}{d w_i} = 0$, then $f(w_i)$ has a unique minimum solution
at $w_i^* = \min\{U_i, \max\{L_i, \tilde{w}_i\}\}$. Please note that $\frac{d f(w_i)}{d w_i} = 0$ is a fourth order equation which
has closed form solutions. Based on this local optimization, the algorithm of minimizing
Theorem 2: The timing driven wire sizing algorithm converges to the optimal solution
and the complexity of the algorithm is O(n), with n being the number of wire segments.
Proof: The proof is similar to the proofs of Lemma 2 and Lemma 3 in [25] and is omitted
here. □
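The single-edge local optimization can be sketched as follows. This Python example makes two simplifying assumptions that are not from the text: the sums over sinks and neighbors in the local delay function are collapsed into single hypothetical coefficients `alpha`, `beta`, `eta`, `rho` and one constant `Q`, and the stationary point is found by bisection on the slope instead of the closed-form solution of the fourth order equation. Because the function is convex, the slope is increasing, so clamping the stationary point to the width bounds gives the constrained minimum.

```python
def local_cost(w, alpha, beta, eta, rho, Q):
    """Simplified single-edge delay cost with one coefficient per term."""
    return alpha / w + beta * w + eta / (w * (Q - w)) + rho / (Q - w)

def local_slope(w, alpha, beta, eta, rho, Q):
    """Derivative of local_cost with respect to w (valid for 0 < w < Q)."""
    return (-alpha / w ** 2 + beta
            + eta * (2 * w - Q) / (w * (Q - w)) ** 2
            + rho / (Q - w) ** 2)

def size_one_edge(lo, hi, alpha, beta, eta, rho, Q, tol=1e-9):
    """Return the minimizer of local_cost on [lo, hi], with hi < Q.

    Equivalent to clamping the unconstrained stationary point to [lo, hi].
    """
    def slope(w):
        return local_slope(w, alpha, beta, eta, rho, Q)

    if slope(lo) >= 0:      # cost already increasing at the lower bound
        return lo
    if slope(hi) <= 0:      # cost still decreasing at the upper bound
        return hi
    while hi - lo > tol:    # bisection on the monotone slope
        mid = 0.5 * (lo + hi)
        if slope(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Without coupling terms the stationary point is sqrt(alpha / beta) = 2.
print(size_one_edge(1.0, 3.0, alpha=4.0, beta=1.0, eta=0.0, rho=0.0, Q=10.0))
```

Repeating this step edge by edge, with the other widths held fixed, is the iterative local optimization described above.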
E. Printability Optimization
The feasible solution obtained in phase 1 is fed to phase 2 in which the printability Θ(w, x, y)
is maximized by adjusting wire width and spacing. The feasibility in terms of (3.2),
(3.3) and (3.4) is maintained, which ensures that there is no violation on timing and wire
width/spacing rules.
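Equations (3.9) and (3.10) indicate that the printability function is an eighth order
based heuristic similar to that in [18] to solve this complicated problem. The sensitivity of the
$$\psi_{i,w}(\mathbf{w}, \mathbf{x}, \mathbf{y}) = \frac{\Theta(\mathbf{w}_i, \mathbf{x}, \mathbf{y}) - \Theta(\mathbf{w}, \mathbf{x}, \mathbf{y})}{\delta} \tag{3.14}$$
$$\psi_{i,xy}(\mathbf{w}, \mathbf{x}, \mathbf{y}) = \frac{\Theta(\mathbf{w}, \mathbf{x}_i, \mathbf{y}) - \Theta(\mathbf{w}, \mathbf{x}, \mathbf{y})}{\delta} \tag{3.15}$$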
where $\psi_{i,w}$ is the sensitivity with respect to the wire width $w_i$, and $\psi_{i,xy}$ is the sensitivity with
respect to the center location of the wire. Here, we use vertical wires as an example:
$\mathbf{w}_i = (w_1, w_2, ..., w_i + \delta, ..., w_n)^T$, $\mathbf{x}_i = (x_1, x_2, ..., x_i + \delta, ..., x_n)^T$. Horizontal wire segments can
be handled in a similar way. By changing the wire width and the center location of the wire, we
adjust both the width of the wire and its spacing to neighboring wires to improve
the printability function while the wire size and spacing rules are satisfied. We also perform
incremental timing analysis to make sure the required arrival times stay accurate.
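The forward-difference sensitivities for a vertical wire can be written compactly in Python. In this sketch the printability function is passed in as a callable `theta(w, x, y)`, and the toy function used in the example is purely hypothetical and only stands in for the real printability model.

```python
def sensitivities(theta, w, x, y, i, delta):
    """Forward-difference sensitivities of theta for vertical wire i.

    Returns (psi_w, psi_xy): the change in theta per unit change of the
    width w[i] and of the center location x[i], respectively.
    """
    base = theta(w, x, y)

    w_i = list(w)
    w_i[i] += delta          # perturb only the width of wire i
    x_i = list(x)
    x_i[i] += delta          # perturb only the center location of wire i

    psi_w = (theta(w_i, x, y) - base) / delta
    psi_xy = (theta(w, x_i, y) - base) / delta
    return psi_w, psi_xy

# Toy stand-in for the printability function, for illustration only.
def toy_theta(w, x, y):
    return -sum((wk - 0.2) ** 2 for wk in w) - sum(xk ** 2 for xk in x)

print(sensitivities(toy_theta, [0.1, 0.3], [1.0, 0.0], [0.0, 0.0], i=0, delta=0.01))
```

Each wire needs only two extra evaluations of the printability function beyond the shared baseline.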
Even though the printability function $\Theta(w, x, y)$ is based on the light intensity of every
wire segment in the entire layout, the computation of the sensitivities $\psi_{i,w}(w, x, y)$ and
$\psi_{i,xy}(w, x, y)$ can be limited to a geometrically local region. This is because a change in
wire width or wire spacing affects the light intensity of only a local region close to that wire
segment. Usually, the effect of a change decays to a negligible level at a location more than
$2\lambda$ ($\lambda$ is the lithographic wavelength) away from the location of the change. Hence, we
generate a window by expanding the wire segment $i$ by $2\lambda$ on each of its four sides. The
computation of the sensitivities $\psi_{i,w}(w, x, y)$ and $\psi_{i,xy}(w, x, y)$ can be limited to this window.
After the sensitivity for every wire is obtained, the wire with the maximal value
of $\psi_i(w, x, y) = \max(|\psi_{i,w}(w, x, y)|, |\psi_{i,xy}(w, x, y)|)$ is selected to be changed. The pseudo
When we attempt to tune the width or center location of a wire, the constraints on
wire size and wire spacing are enforced. In addition, we need to ensure that there are no
timing violations due to this change. Even though there is an analytical formula for the sink delay
$t_j(w_i)$ of a sink $s_j \in S$ with respect to edge $e_i \in E$, the delay constraint function sometimes
is equivalent to a third order polynomial of $w_i$ and the resultant feasible range for $w_i$ is
not necessarily continuous. Therefore, we just check the timing feasibility of each change
instead of finding an analytical bound for a change. If a change of wire width and/or spacing
causes any delay constraint violation, this change is forbidden. The pseudo code for the
Printability optimization
such that $t_j \le q_j \ \forall s_j \in S$,
1. Do {
2. $e_j \leftarrow$ NULL
6. $e_j \leftarrow e_i$
7. If $e_j \ne$ NULL
sensitivity based heuristic is shown in Figure 24. We change every wire segment at most
once during the optimization. After each change, we need to update the timing for the path
that contains this target segment, as well as the timing of the paths that contain any wire segment
that couples into the target segment. Let P be the maximum number of wire segments
that couple into any given wire segment in the design, and G be the maximum number of wire
segments in any given timing path in the design; the complexity for printability optimiza-
wire segment of the target segment. Also, G is typically much less than the total number of
wire segments n.
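One possible reading of the heuristic described above is sketched below in Python. It is not the pseudo code of Figure 24. It handles wire widths only and leaves out center locations, it assumes hypothetical `sensitivity(widths, i)` and `timing_ok(widths)` callbacks in place of the windowed printability evaluation and the incremental timing analysis, and it uses a fixed adjustment `step`, which is also an assumption.

```python
def optimize_printability(widths, bounds, sensitivity, timing_ok, step):
    """Greedy sensitivity-driven adjustment; every wire is changed at most once.

    widths      : initial (timing-feasible) wire widths from phase 1
    bounds      : per-wire (lower, upper) width limits
    sensitivity : callable(widths, i) -> change in printability per unit width
    timing_ok   : callable(widths) -> True if no delay constraint is violated
    step        : fixed width adjustment tried for the selected wire
    """
    widths = list(widths)
    pending = set(range(len(widths)))
    while pending:
        # Select the untouched wire with the largest sensitivity magnitude.
        best = max(sorted(pending), key=lambda i: abs(sensitivity(widths, i)))
        pending.discard(best)
        s = sensitivity(widths, best)
        if s == 0:
            continue                     # no first-order gain from this wire
        trial = list(widths)
        trial[best] += step if s > 0 else -step
        lo, hi = bounds[best]
        # Accept only changes that respect the width rule and timing.
        if lo <= trial[best] <= hi and timing_ok(trial):
            widths = trial
    return widths

# Toy example: printability prefers width 0.2; wire 1 is timing critical.
result = optimize_printability(
    [0.10, 0.30, 0.20],
    [(0.1, 0.3)] * 3,
    sensitivity=lambda w, i: -2 * (w[i] - 0.2),
    timing_ok=lambda w: w[1] >= 0.28,
    step=0.05,
)
print(result)
```

In the toy run, wire 0 is widened toward the preferred width, the change to wire 1 is rejected because it would violate the stand-in timing check, and wire 2 is left alone because its sensitivity is zero.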
F. Experimental Results
Our method is implemented in C++ and the experiments are performed on a SUN Sparc Ultra-80
workstation with four 450MHz CPUs and 4GB of RAM. Table III shows the number of nets,
horizontal wires and vertical wires for the benchmark circuits. Based on 90nm technology,
the wire width is allowed to range between 100nm and 300nm while the wire pitch is
400nm. The light intensity threshold $I_{th}$ at the wire segment boundary is chosen as 0.3, since
this is usually where the light intensity slope is the greatest. Here, light intensity is relative
intensity: the light intensity at the wafer when there is no mask between the light source and the
wafer is defined as 1.
Since there is no previous work on this timing and printability optimization problem,
we list the results of the 2 phases of our method in Table IV. The number of sinks with
timing violations for each case is in column 2. After phase 1, all timing violations are elim-
inated as indicated in column 3. The printability function Θ values after phase 1 are shown
in column 4. The printability after phase 2 and the percentage improvement are in column
5 and column 6, respectively. It can be seen that our sensitivity based heuristic can yield
the rightmost column. This computation speed is reasonable for practical applications.
In order to validate our approach with a precise model, lithographic simulation [24]
is performed on the layout result of Phase 1 and Phase 2. SPLAT is a simulation program
UC Berkeley, it is one of the first tools in this field. One of the most important and obvious
metrics for design printability is the average EPE (Edge Placement Error) [16]. EPE is the
distance between a printed edge and the location where this edge is intended to be in the
design. A smaller EPE means that the printed edge is closer to its intended location. Designs
with smaller EPE make it easier for OPC/RET in the manufacturing stage to further reduce
EPE to spec. We compare the average EPE for the designs before and after printability
optimization. The simulations for wires on Metal 1 and Metal 2 are conducted separately
Table IV. Experimental results. # vio is the number of delay violations. Θ is the printability.
# vio remains 0 in phase 2.
Input Phase 1 Phase 2 Total
Circuit # vio #vio Θ Θ improv CPU(s)
and the results are summarized in Table V. These data show that our method can improve
the average EPE by about 20%-40%. Considering that no OPC has been applied yet, such
G. Conclusion
Lithography friendly design is a concept that aims to reduce litho process complexity so
that volume production of the design can be achieved faster in the fab while reducing mask
cost. In this chapter, we propose an approach that adjusts wire size and space to optimize
layout printability while maintaining design performance. A new printability model is pro-
CHAPTER IV
are increasingly affected by process variations. In practice, people often treat systematic
components of the variations, which are generally traceable according to process models,
in the same way as random variations in process corner based methodologies. In particu-
lar, lithography induced process variations are usually estimated by a universal worst case
value without considering their layout environment. Consequently, the process corner mod-
els based on such estimation are unnecessarily pessimistic. In this chapter, we propose a
new ASIC design methodology which captures lithography induced polysilicon gate length
variations including both the layout dependent systematic components and random compo-
nents. Our methodology also shows that look-up table methodology is sufficient to handle
BEOL (Back End Of Line) lithography process variations in timing analysis. In addition,
a new technique of dummy poly insertion is suggested to shield inter-cell optical interfer-
ences. This technique together with standard cells characterized using our methodology
will let current design flows comprehend the variations almost without any changes. More
our methodology greatly reduces pessimism in timing analysis, thus enabling both aggressive
design implementation and easier timing signoff. Experimental results on industrial
designs indicate that our methodology reduces the timing variation window by 11% and the
power variation window by 55% on average when compared to a worst case approach.
A. Introduction
gate length variations on inter-cell spacing. In general, the gate length variations in
the boundary regions of a cell depend on its spacing to the outskirt poly of neighboring
cells. However, the neighboring cell information is not available before cell
placement is completed. Our dummy poly insertion technique can generally avoid
such dependence on unknown information. Therefore, we do not need to characterize
different variants of a cell as in [31]. Moreover, the pattern matching based variant
selection [31] is not necessary any more in later design stages. In other words, the
lithography aware cell characteristics can be utilized without affecting current stan-
dard cell based design flows.
poly shapes. The printed poly shapes on the wafer not only deviate in gate
length, but also deviate differently at different spots of the same poly. In other
words, the deviations are by and large non-uniform. Our approach is in contrast to
difficult to run lithography simulations on the full chip level. However, the size of a library
cell is very small and the characterization is usually performed only once. Therefore, the
expensive lithography/OPC simulations are affordable in this scenario. More importantly,
cell-based timing annotation methodology is used in typical timing analysis for standard
cell based ASIC design. In order not to disturb this flow, it is therefore, very desirable
STA (Static Timing Analysis) flow utilizes process corner conditions for timing signoff.
This approach introduces a high level of pessimism. For example, in the slow process corner, all
transistor gates are assumed to have the largest gate lengths. In reality, that will never happen.
Because of the systematic nature of lithography effects, our methodology predicts
this portion of the variations and takes it into consideration for STA to reduce pessimism
significantly. We applied our methodology to industrial library cell designs. The experimental
results indicate that our methodology can, on average, reduce the timing variation window
B. Overview of Methodology
It is claimed in [30] that litho simulation on an individual standard cell does not accurately
represent the lithographic effect on that standard cell in block level designs, because the
lithographic effect depends on the proximity of that standard cell. However, increasing the
distance between one shape and other shapes tremendously reduces the lithographic impact of
those other shapes. It is also very important to notice that the closest neighbors of a shape are
the dominating factors to model based OPC process and Sub-Resolution Assist Features
We ran Mentor Graphics Calibre LFD on two sets of test structures shown in Figure 25.
The first set of structures has three shapes with distance L between them. The second
set of structures has five shapes, where the distance between the middle shape and the
shapes on both sides is also L. The shapes next to the middle shape are L1 away from it.
In both sets, L is varied, and L1 in the second set is fixed. The CD (Critical Dimension) data
for the middle shape is recorded from our lithography simulation tool.
The result is shown in Figure 26. We can see that the CD of test structure 2 varies much
less than that of test structure 1, i.e., having neighboring shapes at a fixed distance
really helps reduce the printability variations of that shape. Also, other test structures
we ran showed that a shape has minimal impact on another shape’s printing image if there
[Figure 25: (a) Test structure 1: variable space L. (b) Test structure 2: variable space L with fixed space L1.]
[Plot: CD (nm) versus L (10 nm) for Test 1 and Test 2.]
Fig. 26. Critical dimensions of the two test structures when L changes.
The range of L and L1 that we chose to exercise the test structures is not arbitrary.
In our 65nm standard cell library implementation, L covers the range of possible poly gate
spacing when two standard cells are placed adjacent to each other. If two neighboring cells
have a gate spacing larger than L, a filler cell with a dummy poly will be inserted between
these two cells for Design Rule Check (DRC) and power connectivity purposes. For the
first set of test structures, we observe over 10% CD variability over the range of L.
However, in our standard cell library architecture, we can put a dummy poly shape at the
cell border without introducing any area penalty. In this case, the value of L1 in our test
structures represents the minimum spacing between the dummy poly and active transistors
in the standard cell. When two cells are placed side by side, the dummy poly shapes of
both cells overlap exactly (see Figure 27). We also would like to point out that the dummy
poly shapes we insert are field poly, i.e., they do not form new devices as they fall in the
gap of the diffusions between two closely placed standard cells. Thus, these shapes do not
cause extra LVS verification effort. Our standard cell designs ensure that the dummy
poly lines pass DRC, as the gap of the diffusion is large enough. By adding dummy poly shapes to
the original standard cells, we introduce fixed closest neighbors to the poly gates that are
at the cell boundary, thus greatly reducing the CD variations introduced by the varying proximity
of this standard cell in the design, as shown in Figure 26. In fact, the dummy poly shapes
effectively “shield” all the internal transistors from the lithographic effects of neighboring structures.
In the lithography process, there are several contributors to the poly gate length variations (see
Figure 28). One of them is the variation caused by the poly pitch to neighbors. Another is
the L-shaped poly cornering effect, which is particularly important for small transistors.
This effect is clearly shown in the middle transistor in Figure 28. To obtain an accurate
prediction of the transistor behavior, we need to account for multiple sources of poly gate
length variations.
The lithography induced deviations of the critical dimension, which is the poly gate
length, are usually on the order of a few nanometers. For other larger shapes, the relative
shape deviation is much smaller. In a typical 65nm design, poly gate width and diffusion
dimensions are at least 2 or 3 times the gate length. That means that lithography induced
circuit performance variations are mostly due to gate length deviation. It is therefore
sufficient to extract the lithography information for only the poly gate length. The printed image
of gate poly shapes across the process window will be employed to replace gate length
image offset parameters introduced by traditional process corner models. We keep all the
other process corner parameters unchanged, such as threshold voltage variation, gate oxide
variation, etc.
poly shapes. Ideally, we wish to run SPICE simulations to obtain timing and leakage power
profiles of the cell. However, current device models in SPICE can handle only rectangular
transistors, while the lithography/OPC simulation results are often irregular shapes.
For example, a gate length is relatively large at the location of a jog, but is small right
before reaching that jog. In order to resolve this mismatch, we try to compute
an effective gate length which may provide the same timing/power performance as the post
lithography/OPC simulation poly shape. In general, the on current Ion of a transistor determines
the timing performance of this transistor. The leakage power of a transistor is mostly
dependent on the off current Ioff of the transistor. Since the on and off currents of a transistor
usually have different sensitivities to gate length variations, we need to use different
Fig. 29. Calculation of effective gate length for timing and leakage.
We utilize a segmentation technique to compute the effective gate length for timing
and leakage. First, we construct two lookup tables for transistor Ion and Ioff. For both
tables, each row corresponds to a specific transistor gate width and the columns correspond to
different transistor gate lengths. Each entry of a table represents the Ion or Ioff of a transistor
with the gate width and length specified by the row and column indices. The ranges of transistor
width and gate length, i.e., the ranges of the row and column indices, are based on typical
transistor sizes allowed in fabrication. The values of Ion and Ioff are obtained through
SPICE simulations.
Next, we chop a poly shape from lithography/OPC simulation into multiple segments,
each of which can be approximated by a rectangle. This is illustrated in Figure 29.
The Ion and Ioff of each small segment can be obtained from a simple calculation based
on the lookup tables. Please note that the width of a segment is usually much smaller
than the fabrication allowed size. Thus, it cannot be matched to any row index in the lookup
tables. We suggest resolving this discrepancy through scaling. For example, consider a
transistor with nominal gate length 65nm and width 200nm. We chop its gate poly shape
from lithography/OPC simulation into 10 segments. Thus, each segment i has a length li
and a width of 20nm. Then, we can find the on current Ion(li, 200nm) from the lookup tables.
The on current of this segment can be approximated as Ion(li, 20nm) = Ion(li, 200nm)/10.
The off current Ioff(li, 20nm) can be calculated in the same way.
Once the on and off currents of all segments are available, the overall currents of the
entire transistor based on the lithography/OPC simulated poly shape can be calculated as:
$$I_{on,shape} = \sum_{i=1}^{n} I_{on}(l_i, w) \qquad (4.1)$$

$$I_{off,shape} = \sum_{i=1}^{n} I_{off}(l_i, w) \qquad (4.2)$$
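The scaling and summation steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table values are made up, the names `ION_ROW_200NM`, `lookup_current` and `shape_current` are hypothetical, and linear interpolation between table columns is our own choice for lengths that fall between tabulated entries (the text does not say how such lengths are handled).

```python
from bisect import bisect_left

# Hypothetical on-current row for a 200 nm wide transistor:
# gate length (nm) -> Ion (uA). The numbers are illustrative, not SPICE data.
ION_ROW_200NM = {61: 150.0, 63: 140.0, 65: 130.0, 67: 121.0, 69: 113.0}

def lookup_current(row, length_nm):
    """Read a current from one table row, interpolating linearly between
    tabulated gate lengths (an assumption) and clamping at the table ends."""
    lengths = sorted(row)
    if length_nm <= lengths[0]:
        return row[lengths[0]]
    if length_nm >= lengths[-1]:
        return row[lengths[-1]]
    hi = bisect_left(lengths, length_nm)
    l0, l1 = lengths[hi - 1], lengths[hi]
    t = (length_nm - l0) / (l1 - l0)
    return row[l0] + t * (row[l1] - row[l0])

def shape_current(row, segment_lengths_nm):
    """Sum the per-segment currents of a printed gate shape.

    The gate is cut into n equal-width segments, so each segment current is
    the full-width table current at that segment's length divided by n.
    """
    n = len(segment_lengths_nm)
    return sum(lookup_current(row, l) / n for l in segment_lengths_nm)

# Ten 20 nm wide segments of a 200 nm wide gate, half printed short, half long.
print(shape_current(ION_ROW_200NM, [63] * 5 + [67] * 5))
```

The same two functions apply unchanged to the off current once a corresponding Ioff row is supplied.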
and the lookup table of on current. From the nominal transistor width, we can find its
corresponding row in the lookup table. We search for the entry in the row with the value
closest to the Ion,shape obtained above. The column index for this entry is the Leff,timing for
this transistor. The effective length Leff,leakage can be obtained in the same way based on
Ioff,shape and the lookup table for off current. A similar technique has been proposed in [32]
with a detailed analysis of this modeling method.
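The inverse lookup described in this paragraph amounts to a nearest-value search along one table row. A minimal Python sketch follows; the function name `effective_length` and the table values are hypothetical and only serve to show the search.

```python
def effective_length(row, shape_current):
    """Return the tabulated gate length whose current is closest to the
    current computed for the printed shape.

    row maps gate length (column index) to current for the row that matches
    the nominal transistor width.
    """
    return min(row, key=lambda length: abs(row[length] - shape_current))

# Illustrative on-current row (gate length in nm -> Ion in uA).
ion_row = {61: 150.0, 63: 140.0, 65: 130.0, 67: 121.0, 69: 113.0}

leff_timing = effective_length(ion_row, 133.0)
print(leff_timing)
```

Calling the same function with an off-current row and Ioff,shape gives Leff,leakage. The resolution of the result is limited by the column spacing of the table.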
After we calculate the effective gate lengths Leff,timing and Leff,leakage of each poly gate
shape, we need to back annotate the standard cell SPICE netlist with these effective gate
lengths. The layout of a transistor may consist of multiple fingers and lithography usually
has different effects on each finger depending on the layout environment. Therefore, we
need to treat these fingers separately even though they belong to the same transistor. When
an LVS (Layout Versus Schematic) tool runs in its normal mode, it automatically merges
multiple fingers of a transistor into a single gate. To avoid this merging, we perform a
special LVS that takes the x and y coordinates of each poly gate shape into a layout netlist
even when some poly gate shapes are the fingers of the same transistor. The layout netlist
will be fed into our extraction tool to generate a netlist with parasitics. The extracted
netlist also keeps each poly gate shape as a separate device. We then use the x and y
coordinates to match the poly shapes in the extracted netlist with the poly shape contours from
lithography/OPC simulations. We back annotate the Leff,timing and Leff,leakage into the
extracted netlist. Thus, for each standard cell, we generate one netlist for timing simulation
and another netlist for leakage power simulation. By including dose and focus variations
in the lithography/OPC simulations, we can have the extracted netlist for each cell at the
worst and the best process corners.
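The coordinate based matching can be illustrated with a short Python sketch. It is not the extraction tool's interface: the record layout (dictionaries with `name`, `x`, `y`, `leff_timing`, `leff_leakage`), the function name `annotate_lengths` and the distance tolerance are all assumptions made for illustration.

```python
import math

def annotate_lengths(devices, contours, tolerance_nm=5.0):
    """Match each extracted device (one per poly finger) to the nearest
    litho contour by x/y coordinates and return its effective lengths.

    devices:  list of {"name", "x", "y"}
    contours: list of {"x", "y", "leff_timing", "leff_leakage"}
    Returns {device name: (leff_timing, leff_leakage)}.
    """
    annotated = {}
    for dev in devices:
        def distance(contour):
            return math.hypot(contour["x"] - dev["x"], contour["y"] - dev["y"])
        nearest = min(contours, key=distance)
        if distance(nearest) > tolerance_nm:
            # Hypothetical safeguard: refuse to annotate from a far-away shape.
            raise ValueError(f"no contour within {tolerance_nm} nm of {dev['name']}")
        annotated[dev["name"]] = (nearest["leff_timing"], nearest["leff_leakage"])
    return annotated

# Two fingers of the same transistor, kept as separate devices.
devices = [
    {"name": "M1_f0", "x": 100.0, "y": 200.0},
    {"name": "M1_f1", "x": 300.0, "y": 200.0},
]
contours = [
    {"x": 301.0, "y": 199.0, "leff_timing": 67.0, "leff_leakage": 66.0},
    {"x": 100.5, "y": 200.0, "leff_timing": 64.0, "leff_leakage": 63.5},
]
print(annotate_lengths(devices, contours))
```

Writing the timing values into one copy of the netlist and the leakage values into another yields the two netlists per cell mentioned above.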
For interconnections, lithography also introduces variations, mainly in the width of the
wire. In Figure 30, we show the contour of the metal wire on top of the intended drawn shape.
The distortion of the routing metal also introduces timing variations. Therefore, we
also need to consider these effects for STA. However, metal shape
distortions do not noticeably impact timing locally within an individual standard cell
because of the small size of the cells. For interconnections, signal routing is often done in
a grid based fashion, meaning that most of the wire will have a stable and predictable
environment. It is interesting to notice that even though part of the interconnection will
be distorted significantly by the lithography process, the wire stays close to the drawn shape
across most of its length. In this section, we verify that the traditional parasitic extraction
methodology with a lookup table still applies for STA considering lithography
effects. We compare the STA results from two different approaches, one with full
lithography simulation on BEOL interconnection wire width and one using look-up table
extraction.
After performing lithography simulation on BEOL, we obtain the contour of the wire.
However, the contour is represented by a polygon with a large number of vertices. It would
be extremely difficult for the extraction tool to handle these kinds of polygons. To solve
this problem, we replace these polygons with smoother ones. This process is described as
follows.
• We convert the polygon to a Manhattan polygon whose segments are either vertical
or horizontal.
• We define a threshold value for wire width change between adjacent wire segments.
If the change between adjacent segments is smaller than the threshold, we merge the two
segments into one with the average wire width for the segment, as shown in Figure 31.
• We use the bounding box of the VIA shapes for VIA area.
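The segment merging step in the list above can be sketched as follows. The sketch assumes the contour has already been converted to a Manhattan polygon and reduced to an ordered list of (length, width) pieces along the wire; the function name `merge_segments` is hypothetical, and the length-weighted average is our assumption, since the text only says "average wire width".

```python
def merge_segments(segments, threshold_nm):
    """Merge adjacent wire segments whose width differs by less than
    threshold_nm.

    segments: ordered list of (length_nm, width_nm) along the wire.
    A merged segment gets the length-weighted average width (assumption),
    and later segments are compared against that merged width.
    """
    merged = []
    for length, width in segments:
        if merged and abs(merged[-1][1] - width) < threshold_nm:
            prev_length, prev_width = merged[-1]
            total = prev_length + length
            average = (prev_length * prev_width + length * width) / total
            merged[-1] = (total, average)
        else:
            merged.append((length, width))
    return merged

# Small width ripple is smoothed out; the large step at the end is kept.
wire = [(100, 100.0), (100, 102.0), (50, 120.0)]
print(merge_segments(wire, threshold_nm=5.0))
```

A larger threshold gives fewer vertices for the extraction tool at the cost of a coarser width profile.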
b. STA Comparison
Two separate STAs are performed: one uses the drawn shapes of the routing metal for extraction,
and the other uses polygons converted from the litho contours of the routing metal. We verified
that the timing difference is very minor. The results are presented in the experiment section.
Based on our analysis of FEOL and BEOL litho effects, we conclude that we should focus
most of the efforts on poly gate litho effects of standard cells for litho-aware timing analysis
flow. With accurate litho simulation and device characterization, we are able to extract
timing impact of litho process for a given standard cell. At the same time, with an effective
technique such as dummy poly insertion, we are able to minimize timing variation caused
by standard cell context for a placed design. For BEOL routing metals, we show that
we can continue to use current parasitic extraction methodology. In summary, the litho-
aware timing analysis flow will reduce the pessimism of the traditional corner based signoff
methodology. At the same time, the evolution to the litho-aware timing analysis flow from
the current STA flow mostly happens in the standard cell development and characterization.
The costly litho simulations can be avoided in the block and chip level.
C. Experiment
1. Standard Cells
We follow the flow shown in Figure 32 for standard cell characterization. Starting from the original
standard cell, we first insert dummy poly on the boundary of the standard cell and stream
out the GDS to feed into our litho simulation tool. We then use the result of the litho simulation
to generate the new netlist with the lookup tables for timing and leakage. Last, we run
the standard cell characterization with our standard flow and tools.
[Figure 32: standard cell characterization flow; recoverable step labels: dummy poly insertion, lookup table generation.]
We run litho simulation with Mentor Graphics Calibre LFD on our original standard
cell library layout. Since our litho simulator provides the printed images of transistor poly
gates across the process window, we can calculate the longest and shortest effective gate length
for each gate. Starting with our original standard cell netlist at the worst timing corner, which has
the worst RC parasitic extraction, we change the length of each gate to the longest Leff of
that specific gate. This gives us the annotated cell netlist at the worst timing corner. We do
the same for the original cell netlist at the best timing corner, except that we use the shortest Leff
of each gate to replace the original gate length in the netlist, and we get the annotated cell
netlist at the best timing corner. We repeat the process for the leakage corners and get the
annotated cell netlists at the best leakage corner and the worst leakage corner.
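The corner annotation rule for timing can be summarized in a small Python sketch. The data layout and the function name `annotate_corner` are hypothetical; the sketch only shows the selection of the longest Leff per gate for the worst timing corner and the shortest for the best timing corner.

```python
def annotate_corner(drawn_lengths, leff_samples, corner):
    """Return per-gate lengths for one timing corner.

    drawn_lengths: {gate: drawn length in nm} from the original netlist.
    leff_samples:  {gate: [Leff at each dose/focus condition in nm]}.
    corner:        "worst" picks the longest Leff, "best" the shortest.
    Gates without litho samples keep their drawn length.
    """
    if corner not in ("worst", "best"):
        raise ValueError("corner must be 'worst' or 'best'")
    pick = max if corner == "worst" else min
    annotated = dict(drawn_lengths)
    for gate, samples in leff_samples.items():
        annotated[gate] = pick(samples)
    return annotated

drawn = {"M1": 65.0, "M2": 65.0}
samples = {"M1": [63.5, 66.0, 70.2], "M2": [61.0, 64.0, 68.5]}
print(annotate_corner(drawn, samples, "worst"))
print(annotate_corner(drawn, samples, "best"))
```

For the leakage corners the same selection would be applied to the Leff,leakage samples.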
In Figure 33 and Figure 34, we show the distributions of lithography induced gate
length deviation for both best and worst timing corners for all our library cells. The x-axes
in Figure 33 and Figure 34 are the difference between the Leff of a gate and its drawn
length; the y-axes are the number of poly transistor gates with the specified gate length
deviation. We show that gate length variations can be as much as 16nm across the process
window. Although the data reveals the level of immaturity of current 65nm Resolution
Enhancement Techniques (RET), including OPC and SRAF generation, it further confirms
the value of our methodology, which considers lithography induced gate length variations
in a systematic fashion.
[Histograms of gate length deviation: y-axis, # of gates. Panel a) Litho effect: longest Lgate (nm).]
All the generated netlists have been characterized with our standard cell characterization
flow. We present the timing and leakage variabilities of a set of representative standard
cells in Table 1. In columns 2 and 3, we report the timing variation between the two timing
signoff corners with the original standard cell netlist and with our new netlist; the percentage
change is reported in column 4. In columns 5 and 6, we report the leakage ratio between
the two leakage analysis corners with the original standard cell netlist and with our new netlist.
All data are presented with an input slew of 180ps and an output load of 4.7fF. We see
an average decrease of 11% in delay variability. As we performed our litho simulation
across the process window, through exposure dose and depth of focus rather than at a
nominal process condition, we think that is a significant source of the variability. However,
we strongly believe that litho simulation through the process window is absolutely necessary
in order to capture the lithography effect properly in a process corner based design flow. The
leakage analysis shows that the new netlists of the standard cells have far less leakage
variability, with the average ratio less than half of that of the original netlists. We would
also like to point out that because transistor leakage depends exponentially on the transistor gate
length, doing leakage calculation with this methodology can help identify lithography sensitive
design patterns for leakage and thus help improve standard cell design robustness in
terms of leakage variability. With the improvement of OPC and SRAF generation from
the foundry, we believe we will see better design variability control for both timing and
leakage.
2. Design Implementation
We apply our newly characterized standard cell library to one of the low power, high speed
hard macros of our 65nm designs for timing analysis. We use our standard timing signoff
flow for this analysis, with the same timing constraints as the original design.
Figure 35 shows the timing variability reduction for the 400 most timing critical paths
in the design. Timing variability for a path is defined as the path delay difference between
the best timing corner and the worst timing corner. Using our methodology, we are able to reduce
the variability by about 330ps on average. With the average path delay variability at 3ns,
we reduce the variability by 11%, which is consistent with our standard cell analysis.
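For reference, the 11% figure follows directly from the two averages quoted above. The symbols below are our own shorthand and do not appear elsewhere in the text:

```latex
\[
\frac{\Delta T_{\text{reduction}}}{T_{\text{variability}}}
  \approx \frac{330\ \text{ps}}{3000\ \text{ps}} = 0.11 = 11\%
\]
```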
[Figure 35: histogram of # of paths versus path timing variability reduction (ns).]
We present data for leakage analysis for several 65nm hard macros in Table 2. Columns
3 and 4 are the leakage ratios between the two leakage analysis corners for the original and new
netlists, respectively. The data is, once again, very consistent with the cell level analysis.
3. BEOL Comparison
We use a 65nm high speed IP block implemented with standard cells as our test case.
There are about 2k placed standard cell instances in this test case. As described in the previous
section, we performed two STA runs based on different approaches to lithography
effects on interconnection, one with the drawn shapes of the routing metal and the other with litho
contour converted polygons for the routing metal. We generated STA reports for all endpoints
of the design for both setup and hold delays. We then compare the delays
from the two STA results for each endpoint. The difference is shown in Figure 36(a) and
Figure 36(b).
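The per-endpoint comparison can be expressed as a short Python sketch. The dictionaries stand in for parsed STA reports; their layout, the function name `delay_change_percent` and the sample delays are hypothetical and are not data from the experiment.

```python
def delay_change_percent(drawn_delays, contour_delays):
    """Relative delay change per endpoint, in percent.

    drawn_delays:   {endpoint: delay from STA with drawn metal shapes}
    contour_delays: {endpoint: delay from STA with litho contour polygons}
    Positive values mean the litho contour run is slower.
    """
    return {
        endpoint: 100.0 * (contour_delays[endpoint] - drawn) / drawn
        for endpoint, drawn in drawn_delays.items()
    }

# Illustrative delays in ns for two endpoints.
drawn = {"reg_a/D": 1.00, "reg_b/D": 2.00}
contour = {"reg_a/D": 1.01, "reg_b/D": 1.98}

changes = delay_change_percent(drawn, contour)
worst = max(changes, key=lambda endpoint: abs(changes[endpoint]))
print(changes, worst)
```

Keeping the sign of the change makes it visible that some endpoints become slower while others become faster.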
The experiment shows that the timing impact of metal litho effects is within 1%-2%
for both setup and hold delay paths, except for one hold delay path, where it is about 3.6%.
Additionally, litho effects cause some paths to have larger delays and other paths to have smaller
delays. In fact, the small differences caused by litho effects on routing metals are of the
same order as the errors in the STA flow and the parasitic extraction.
D. Conclusion
In this chapter, we proposed a new lithography aware design methodology. The systematic
lithography effects are considered in the design flow to reduce design pessimism. For
polysilicon transistor gates, we introduced dummy poly into the standard cell to achieve
context independence of the standard cell timing model; for interconnection, we verified
that the traditional lookup table based methodology is still applicable considering lithography
effects. Our methodology can be easily incorporated into current design flows with virtually
no impact on design schedule. We performed lithography simulation with foundry
validated and calibrated production lithography models across the process window,
extracted the electrical parameters from the lithography images and applied STA on the design.
As a result, we have reduced the pessimism introduced by the traditional process
corner methodology for timing and leakage analysis for real 65nm designs.
[Figure 36: delay change versus path ID. (a) Delay change of setup paths. (b) Delay change of hold paths.]
CHAPTER V
CONCLUSION
Design For Manufacturability (DFM) is seen as an essential channel of communication between
design and manufacturing for 65nm VLSI technology and beyond. Our proposed
methodologies serve this very purpose by translating manufacturing information into
design guidance that can be utilized by designers. We addressed the issue of Alt-PSM
phase error in standard cell development, we improved the printability of the routing
metals without introducing timing degradation, and we enabled a litho aware STA flow with
a litho aware standard cell characterization methodology. Hopefully, our efforts help designers
successfully capture some of the most important manufacturing effects and therefore
help them improve both the manufacturing and parametric yield of the design.
REFERENCES
[2] C. Pierrat, M. Cote, and K. Patterson, “New alternating phase-shifting mask conver-
sion methodology using phase conflict resolution,” in Proc. of SPIE, vol. 4691, July
conflict removal for layout of dark field alternating phase shifting masks,” IEEE
[4] A. B. Kahng, S. Vaya, and A. Zelikovsky, “New graph bipartizations for double-
exposure, bright field alternating phase-shift mask layout,” in Proc. of Asia and South
[5] T. Uehara and W. M. van Cleemput, “Optimal layout of CMOS functional arrays,”
IEEE Trans. on Computers, vol. 30, pp 305-312, May 1981.
[6] T. Iizuka, M. Ikeda, and K. Asada, “High speed layout synthesis for minimum-width
CMOS logic cells via Boolean satisfiability,” in Proc. of Asia and South Pacific De-
sign Automation Conf., Yokohama, Japan, 2004, pp. 149-154.
[7] E. Zarpas, “Siege vs. zChaff and Berkmin561 on BMC Formulas from the IBM For-
an efficient SAT solver,” in Proc. of the ACM/IEEE Design Automation Conf., Las
Vegas, NV, June 2001, pp. 530-535.
[9] K. Ooi, S. Hera, and K. Koyama, “Computer aided design software for designing
phase shifting masks,” Jpn J. Appl. Phys., vol 32, pp. 5887-5891, 1993.
[10] B. Korte, J. Vygen, Combinatorial Optimization, Theory and Algorithms. New York,
[11] L. Liebmann, J. Lund, F.-L. Heng, and I. Graur, “Enabling alternating phase shifted
mask designs for a full logic gate level: design rules and design rule checking,” in
Proc. of the ACM/IEEE Design Automation Conf., Las Vegas, NV, June 2001, pp.
79-84.
of the ACM/IEEE Design Automation Conf., Las Vegas, NV, June 2001, pp. 73-78.
[14] L.-D. Huang and M. D. F. Wong, “Optical proximity correction (OPC)-friendly maze
routing,” in Proc. of the ACM/IEEE Design Automation Conf., San Diego, CA, Jun.
687.
[16] J. Mitra, P. Yu and D. Z. Pan, “RADAR: RET-aware detailed routing using fast lithog-
[17] J. Cong and K.-S. Leung, “Optimal wiresizing under the distributed Elmore delay
model,” IEEE Trans. on Computer-Aided Design, vol. 14, pp. 321-336, 1995.
[18] S. S. Sapatnekar, “RC interconnect optimization under the Elmore delay model,” in
Proc. of the ACM/IEEE Design Automation Conf., San Diego, CA, Jun. 1994, pp.
392-396.
[19] J. Lillis, C. K. Cheng, and T. Y. Lin, “Optimal wire sizing and buffer insertion for
low power and a generalized delay model,” IEEE J. of Solid-State Circuits, vol. 31,
[20] J. Cong and L. He, “Theory and algorithm of local refinement based optimization
with application to device and interconnect sizing,” IEEE Trans. on Computer-Aided
[21] C.-P. Chen, C. C.-N. Chu, and D. F. Wong, “Fast and exact simultaneous gate and
[22] J. Cong, L. He, C.-K. Koh, and Z. Pan, “Interconnect sizing and spacing with consid-
[23] R. Socha, “Propagation effects of partially coherent light in optical lithography and
[24] D. Lee, SPLAT v5.0 Users’ Guide, University of California, Berkeley, 1995.
[25] C. Chu and D. F. Wong, “VLSI circuit performance optimization by geometric pro-
[26] C. Visweswariah, “Death, taxes and failing chips,” in Proc. of ACM/IEEE Design
[27] S. Postnikov and S. Hector, ITRS CD Error Budgets: Proposed Simulation Study
Methodology. Austin, TX: International Technology Roadmap for Semiconductors,
May 2003.
[29] B. Stine, D. Boning, J. Chung, D. Ciplickas, and J. Kibarian, “Simulating the im-
[30] J. Yang, L. Capodieci and D. Sylvester, “Advanced timing analysis based on post-
in Proc. of ACM/IEEE Design Automation Conf., San Diego, CA, Jun. 2004, pp. 321-
326.
[32] W. J. Poppe, L Capodieci, J. Wu, and A. Neureuther, “From poly line to transistor:
building BSIM models for non-rectangular transistors,” in Proc. of SPIE, vol. 6156.
VITA
Ke Cao was born in Changsha, China. He received B.S. and M.S. degrees from Uni-
versity of Science and Technology of China in 1996 and University of Minnesota in 2000
respectively. He received his Ph.D. in Computer Engineering in August 2007 from Texas
A&M University. He was with Integrated Device Technology Inc. from 2000 to 2003 as an
IC design engineer. He worked at Qualcomm Inc. from 2004 to 2006 as a VLSI engineer.
He is now with Marvell Semiconductor as a staff physical design engineer. His research in-
terests include physical design methodology and EDA development, especially in the area
of Design for Manufacturing (DFM), signal integrity and low power design methodology.
His permanent address is: 520 Mansion Court, Apt 305, Santa Clara, CA 95054.