Principles of Lithography
Third Edition
Harry J. Levinson
Levinson, Harry J.
Principles of lithography / Harry J. Levinson. – 3rd ed.
p. cm. – (Press monograph ; 198)
Includes bibliographical references and index.
ISBN 978-0-8194-8324-9
1. Integrated circuits–Design and construction. 2. Microlithography. I. Title.
TK7874.L397 2010
621.3815 031–dc22
2010026775
Published by
SPIE
P.O. Box 10
Bellingham, Washington 98227-0010 USA
Phone: +1 360.676.3290
Fax: +1 360.647.1445
Email: [email protected]
Web: http://spie.org
Copyright © 2010 Society of Photo-Optical Instrumentation Engineers
All rights reserved. No part of this publication may be reproduced or distributed in any form or by any means
without written permission of the publisher.
The content of this book reflects the work and thought of the author(s). Every effort has been made to publish
reliable and accurate information herein, but the publisher is not responsible for the validity of the information or
for any outcomes resulting from reliance thereon.
Harry J. Levinson
November 2010
Preface to the Second Edition
This book was written to address several needs, and the revisions for the second
edition were made with those original objectives in mind. First, and foremost, this
book is intended to serve as an introduction to the science of microlithography
for people who are unfamiliar with the subject. Most papers written for journals
or conference proceedings assume that the reader is familiar with pre-existing
knowledge. The same can be said for compilations of chapters written by experts
who are providing their colleagues and peers with useful summaries of the current
state of the art and of the existing literature. Such papers and books, while quite
useful to experienced lithographers, are not intended to address the needs of
students, who first need to understand the foundations on which the latest advances
rest. It is the intention of this book to fill that need.
For the experienced lithographer, there are many excellent books written on
specialized topics, such as photoresist and resolution-enhancement techniques, and
I have referenced many of those fine works. However, I have often felt that several
topics have not been well addressed in the past; most notably those subjects directly
related to the tools we use to manufacture integrated circuits. Consequently, this
book goes into a few subjects in depth. These include such topics as overlay, the
stages of exposure tools, and light sources. Finally, this text contains numerous
references. These are resources for students who want to investigate particular
topics in more detail, and they provide the experienced lithographer with lists of
references by topic.
A wise leader once told me that one of the most challenging tasks is to transform
complexity to simplicity; in other words, to make apparent the forest obscured
by all of the trees. I hope that I have succeeded adequately on the subjects
covered in this book. Of course, simplicity should never be confused with easiness
or completeness. To assist the student in recognizing these distinctions, more
problems have been added to the end of each chapter. It is expected that the reader
of this book will have a foundation in basic physics and chemistry. No topics will
require knowledge of mathematics beyond elementary calculus.
Lithography is a field in which advances proceed at a swift pace, and many new
topics have been included in this second edition, commensurate with the learning
that has taken place during the past few years, and several subjects are discussed
in more detail. Optical proximity corrections and next-generation lithography are
examples where the landscape looks quite different than it did just a few years ago.
Other topics, such as immersion lithography, were ideas that few took seriously
just a few years ago, yet today are considered quite mainstream.
It has been my good fortune to work with a number of outstanding lithographers.
In addition to the people acknowledged in the preface to the first edition, I would
like to thank several people who contributed to this update. These include Tim
Brunner of IBM, Wolfgang Henke of Infineon, Margaret Conkling of Nikon
Precision, Nigel Farrar, Vladimir Fleurov, Palash Das, and Charles Hindes of
Cymer, Andreas Erdmann of Fraunhofer-Institut für Integrierte Schaltungen, Doug
Resnick and John Algair of Motorola, Wilhelm Maurer of Mentor Graphics,
Christian Wagner and Robert Socha of ASML, Paul Graeupner of Carl Zeiss,
Johannes Nieder of Leica, John Ricardi and Harry Rieger of JMAR, Ray Morgan
of Canon, USA, Walter Gibson of XOS, and Sandy Burgan of DNS. Merry Schnell
and Sharon Streams of the publications staff of SPIE have been very helpful and
supportive. I apologize if I have failed to mention anyone who has helped me with
this update.
It has also been a privilege and joy to work on a more frequent basis with
some exceptionally outstanding lithographers in my own department, as well as
other lithography departments, at AMD. In particular, this includes manufacturing
organizations, where the principles discussed in this book have been skillfully
applied and expertly enhanced to produce high-performance nonvolatile memory
and the world’s most powerful Windows-compatible microprocessors. From AMD,
I would like to thank Bruno La Fontaine, Jongwook Kye, Ivan Lalovic, Adam
Pawloski, Uzodinma Okoroanyanwu, Rolf Seltmann, Wolfram Grundke, and Rick
Edwards for useful and informative discussions on lithography. I would like to
thank my wife, Dr. Laurie Lauchlan, and my daughters, Sam and Sarah, who
continued to exhibit amazing patience while I worked on the second edition of
this book. On September 11, 2001, the world witnessed the destructive power of
the irrational mind. I hope that this book will be a small reminder of the tremendous
capacity of the rational human mind to improve the world around us.
Harry J. Levinson
January 2005
Preface
This book has been written to address several needs. First, it is intended to serve as
an introduction to the science of microlithography for someone who is unfamiliar
with the subject. It is expected that the reader has a foundation in basic physics
and chemistry. No topic requires knowledge of mathematics beyond elementary
calculus. The book covers a number of advanced subjects as well, and it can be
used by experienced lithographers who wish to gain a better understanding of
topics that are not in their own areas of expertise. Numerous references are made to
literature in optical lithography, providing a guide for both novice and experienced
lithographers who want to learn about particular subjects in detail.
A number of discussions—such as thin-resist modeling, metrics for imaging,
thin-film optics, and the modeling of focus effects—first appeared in Advanced
Micro Devices internal reports. Eventually, some parts of these reports were
published elsewhere. Their tutorial nature is not coincidental, as they were
analyses that I used to develop my own understanding of lithography. It is often
found that complex situations are best comprehended through simple models that
describe most of the relevant physics, with the remaining effects considered as
perturbations. This is the approach I used in learning lithography myself, and it is
the method used in this book. Students of the class on lithography science that I
periodically teach will recognize many of the figures and equations. A number of
these are also used in the first chapter in the SPIE Handbook of Microlithography,
Micromachining, and Microfabrication, Volume I: Microlithography. I coauthored
that chapter with Bill Arnold of ASM Lithography.
Additional topics have been added or expanded significantly, especially those
concerning light sources, photomasks, and next-generation lithography. The
chapter on overlay is an expanded version of a similar chapter found in my earlier
text, Lithography Process Control (SPIE Press, 1999). The chapter on photoresists
takes a different approach from that found in most books on lithography.
Here, resists are approached from the perspective of the practicing lithographer,
rather than the resist chemist. Some resist chemistry is discussed because this
knowledge is essential for using resists properly, but the emphasis is on operational
considerations.
A number of acknowledgments are in order. First, there are several people at
AMD to be thanked. Jongwook Kye provided the data for Figs. 8.8, 10.4, and 10.5.
Figure 12.8 shows two micrographs, generated by Janice Gray, of an EUV mask
fabricated at AMD. Dr. Uzodinma Okoroanyanwu has provided the author with
guidance through the world of resist chemistry. Figure 3.26 was contributed by
Marina Plat.
It has been my good fortune to have had the opportunity to interact with many
of the world’s experts in lithography, and their works are referenced throughout
this book. Quite a number of the ideas presented in this book first appeared in
papers that I coauthored with Bill Arnold, who is now Executive Scientist at ASM
Lithography. I have also had a long association with Dr. Moshe Preil of KLA-
Tencor, and we have written several papers together.
Lithographers at semiconductor companies are integrators. We combine optics,
precision machines, photochemicals, and photomasks into working processes.
While chip makers often get the glory, the lens makers, resist chemists, and
tool makers are the unsung heroes of the microelectronics revolution. It has also
been my good fortune to be able to interact with many of the world’s experts in
optics, equipment, and lithographic materials. A number of people have provided
assistance in the writing of this book. Several people at Carl Zeiss (Dr. Winfried
Kaiser, Dr. Christian Wagner, Dr. Bernd Geh, and Dr. Reiner Gerreis) have
explained elements of lens optics and metrology to me. Bernd Geh donated the
data of Fig. 4.6, Winfried Kaiser provided Fig. 5.15, and Figs. 5.21 and 5.25
were provided by Dr. W. Ulrich. Figure 5.2 was obtained from Roy Wetzel of
ASM Lithography. Dr. Don Sweeney of Lawrence Livermore Laboratory provided
Fig. 5.19 and offered generally useful information on EUV technology, as did
Dr. Dan Bajuk, Dr. Saša Bajt, and Dr. David Attwood. Figure 5.9 was obtained
from Bob Willard of Lambda Physik. Figure 2.25 was provided by Wes Brykailo
of the Shipley Company. The data in Figs. 3.9, 3.10, and 3.11 were contributed
by Eddie Lee, Emir Gurer, Murthy Krishna, and John Lewellen of Silicon Valley
Group. Figure 3.14 was a donation from Will Conley of Motorola, and Dr. Sergey
Babin of Etec provided Fig. 7.14. Dr. Rainer Kaesmaier of Infineon and Dr. Hans
Löschner of IMS Vienna were particularly generous in providing information on
ion-projection lithography, including Figs. 12.13 and 12.14. Kurt Heidinger of the
Cyantek Corporation supplied information on wet chromium etchants. I want to
thank Phil Seidel for providing the data of Fig. 11.1. Dr. Chuck Gwyn of Intel
contributed Fig. 12.10.
Dr. David Joy helped to improve my understanding of SEM metrology. Bill
Moffat of Yield Engineering tutored me as a young engineer and more recently
on adhesion promotion. The references would be far less complete without the
excellent services provided by AMD’s librarians, John Owen, Wendy Grimes, and
Sharon Shaw. I apologize to anyone who has helped me put together this book and
has not been acknowledged.
Finally, I would like to thank my wife, Dr. Laurie Lauchlan, who provided me
with a true appreciation of the importance of metrology, and for her apparently
infinite patience while I wrote this book. This book is dedicated to Laurie, and our
daughters, Samantha and Sarah.
Harry J. Levinson
April 2001
Chapter 1
Overview of Lithography
The patterns of integrated circuits are created on wafers by lithography. The steps
of this critical manufacturing process are listed in Table 1.1. Each step will be
discussed at length in later chapters of this book, but a brief description of each
will be given here. Most of this book is devoted to photolithography, where optical
methods are used to transfer the circuit patterns from master images—called masks
or reticles—to the wafers. Photolithography is the method used for patterning
nearly all integrated circuits fabricated today.
Resist coat: Resists are typically composed of organic polymers applied from a
solution. To coat the wafers with resist, a small volume of the liquid resist is first
Table 1.1 The steps in the lithography process. The steps in italic are optional.
dispensed onto a wafer. The wafer is then spun about its axis at a high rate of spin,
flinging off the excess resist and leaving behind, as the solvent evaporates, a thin
(0.1–2 µm, typically) film of solid resist.
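The thickness of the spun film varies roughly as the inverse square root of the spin speed. The short sketch below illustrates this scaling only; the prefactor `k` stands in for resist viscosity and solids content and is purely illustrative, not a value taken from this text.

```python
import math

def resist_thickness_nm(spin_speed_rpm, k=80000.0):
    """Estimate spun resist film thickness from spin speed.

    Empirical scaling: thickness ~ 1/sqrt(spin speed). The prefactor k
    (in nm*sqrt(rpm)) depends on the resist formulation; the default
    here is illustrative only.
    """
    return k / math.sqrt(spin_speed_rpm)

# A hypothetical resist: thicker films result at lower spin speeds.
for rpm in (1000, 2000, 4000):
    print(f"{rpm} rpm -> {resist_thickness_nm(rpm):.0f} nm")
```

Doubling the spin speed four-fold halves the film thickness under this scaling, which keeps typical films in the 0.1–2 µm range quoted above.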
Softbake: After the resist coating has been applied, the density is often insufficient
to support later processing. A bake is used to densify the resist film and drive off
residual solvent.
so high that it will degrade the photochemical properties of the resist. Had the
high-temperature hardbake been employed prior to the development step, the resist
would not have developed properly. Consequently, the hardbake is one of the
last steps in the lithography process, though it may precede measurement and
inspection.
The role of the lithography process in overall integrated circuit fabrication can
be appreciated by considering the sequence of deposition, lithography, and etch
steps used to establish electrical contacts to the transistors that make up integrated
circuits (Fig. 1.1). Electrical interconnections are made after the transistors have
been fabricated and covered with insulating oxide or another suitable dielectric.
The wafers are then covered with resist, and the resist is exposed in places
where electrical contacts to the transistors are desired. For reasons explained later,
positive resists are commonly used for making electrical contacts. The exposed
resist is developed out, exposing the oxide above the points where electrical contact
is desired. The wafer is then put into an environment of oxide etchant. The oxide
still covered by resist is protected against the etch, while the oxide is removed
where electrical contact is desired. The contact holes can then be filled with metal,
thus establishing electrical contact.
The primary tool used for projecting the image of the circuit from a photomask
onto a resist-coated wafer is the wafer stepper. Wafer steppers exist in two
configurations—step-and-repeat and step-and-scan. In a step-and-repeat system,
the wafer is positioned under a lens so that a projected image of the layer to be
exposed will properly overlay the patterns already on the wafer (Fig. 1.2). The
systems that allow the stepper to bring the wafer into proper position are discussed
in Chapter 6. Once the wafer is properly positioned and brought into focus, a
shutter in an illumination system is opened, allowing light to pass through the
Figure 1.1 The sequence of steps used to make electrical connections in a transistor in
an integrated circuit.
photomask. The pattern on the mask is then imaged by the lens onto the wafer. The
image is reduced laterally by the amount N:1, where N is the lens-reduction factor,
most commonly equal to 4 in leading-edge systems today. Values of N of 1, 2, and
2.5 are found on steppers that have been designed primarily for high productivity.
Large values of N are desirable to the extent that they reduce the effects of
variations in linewidths and misregistration on the reticle, generally by the factor
of N, as the image of the reticle is projected onto the wafer. Defects are also reduced
in size to the point that they often fall below the resolution limit of the lens. The first
commercially available wafer stepper, the GCA DSW4800, had N = 10. However,
as chip sizes became larger, smaller values of N were required to avoid reticles
that were bigger than what was practical. For many years, most steppers had lens-
reduction factors of 5, but values of 4 began to appear in the early 1990s. They are
found on all leading-edge systems today. For a given mask size, larger exposure
fields on the wafer are possible with smaller values of N, and systems designed
for high productivity (but not necessarily to produce the finest patterns) often have
N < 4. With larger exposure fields, fewer fields need to be exposed in order to
pattern wafers completely, which gives an opportunity for higher throughput.
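The field-count argument can be made concrete with a rough estimate. The sketch below ignores edge effects and partial fields, and the field dimensions are assumed for illustration (a 26 × 33 mm field is typical of 4× systems; the larger field stands in for a hypothetical lower-reduction tool):

```python
import math

def fields_per_wafer(wafer_diameter_mm, field_w_mm, field_h_mm):
    """Rough count of exposure fields needed to tile a round wafer.

    Ignores edge effects and partial fields; real stepper job layout
    is more involved. Order-of-magnitude estimate only.
    """
    wafer_area = math.pi * (wafer_diameter_mm / 2.0) ** 2
    return wafer_area / (field_w_mm * field_h_mm)

print(round(fields_per_wafer(300, 26, 33)))   # ~82 fields on a 300-mm wafer
print(round(fields_per_wafer(300, 50, 50)))   # ~28 fields with a larger field
```

Fewer fields mean fewer step-and-settle or scan operations per wafer, which is the throughput opportunity described above.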
Because of reduction by the projection optics, only part of the wafer (an
“exposure field”) is exposed at any one time on a wafer stepper. After one field is
exposed, the wafer is moved on an extremely precise stage so that another part of
the wafer can be exposed. This process of exposure and wafer stepping is repeated
until the entire wafer is exposed. In the typical reduction stepper, the stage travels
in the horizontal plane beneath a fixed, vertically mounted lens.∗
In a step-and-repeat system, the entire area of the reticle to be exposed is
illuminated when the shutter is opened. In a step-and-scan system, only part of
∗ The Micrascan systems on which wafers moved vertically were exceptions. These exposure tools
were originally made by Perkin-Elmer. The Perkin-Elmer exposure tool business was purchased by
SVG, Inc., which in turn was acquired by ASML.
the reticle, and therefore only part of the exposure field on the wafer, is exposed
when the shutter is opened (Fig. 1.3). The area exposed on step-and-scan systems
at any instant is usually a narrow rectangular region, referred to as the slit. The
entire field is exposed by scanning the reticle and wafer synchronously. The reticle
stage must be scanned at a speed of N times faster than the wafer, where N is the
lens-reduction factor. (The motivations for scanning are discussed in Chapter 5.)
Problems
1.1 For a mask that is 152 × 152 mm, and assuming that a 10-mm border at the
edges of the plate is required to hold the mask, show that the largest field that can
be patterned on the wafer is 13.2 × 13.2 mm if the lens-reduction factor is 10×.
What is the largest field if the lens-reduction factor is 5 × ? 4 × ?
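The arithmetic behind Problem 1.1 can be sketched as follows: the usable mask area is the plate size less the border on each edge, and the wafer-side field is that length divided by the reduction factor N.

```python
def largest_wafer_field_mm(mask_size_mm=152.0, border_mm=10.0, N=10):
    """Largest square field printable on the wafer: the usable mask
    extent (plate size minus a border on each edge) divided by the
    lens-reduction factor N."""
    usable = mask_size_mm - 2.0 * border_mm
    return usable / N

for N in (10, 5, 4):
    print(N, largest_wafer_field_mm(N=N))
```

For a 152-mm plate with 10-mm borders, this gives 13.2 mm at 10×, 26.4 mm at 5×, and 33.0 mm at 4×.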
1.2 The AMD Shanghai quadcore microprocessor has an area of 258 mm², and is
approximately square in shape.
Can such a die be printed in a single exposure with an exposure tool with 10× lens
reduction? 4× lens reduction? Why do you think that large lens-reduction factors
are not used, even though their use would reduce the criticality of reticle quality?
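For Problem 1.2, an approximately square 258-mm² die has a side of about 16.1 mm, which can be compared directly against the field sizes from Problem 1.1. A minimal check, using the same mask and border assumptions:

```python
import math

def die_fits(die_area_mm2, mask_size_mm=152.0, border_mm=10.0, N=4):
    """Check whether an approximately square die fits within a single
    exposure field, using the field-size arithmetic of Problem 1.1."""
    die_side = math.sqrt(die_area_mm2)                 # ~16.1 mm for 258 mm^2
    field_side = (mask_size_mm - 2.0 * border_mm) / N
    return die_side <= field_side

print(die_fits(258, N=10))  # False: 16.1-mm die exceeds the 13.2-mm field
print(die_fits(258, N=4))   # True:  16.1-mm die fits in a 33.0-mm field
```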
1.3 What are the nine principal steps in the lithographic process? Which steps are
optional?
1.4 In the lithographic process, what are the materials called in which patterns are
formed on the wafers?
Chapter 2
Optical Pattern Formation
2.1 The Problem of Imaging
The basic problem of imaging is shown in Fig. 2.1. Light from an illumination
source passes through a photomask, which defines the patterns. The simple
photomask illustrated here consists of two types of complementary areas—one
type that is opaque, while the other is transparent. In this example, the transparent
(or “clear”) area is a long space of uniform width, and the optical and resist
profiles shown in Fig. 2.1 are cross sections for this geometry. Some of the light
that passes through the mask continues through a lens, which projects an image
of the mask pattern onto a wafer. The wafer is coated with a photosensitive
film, a photoresist that undergoes a chemical reaction upon exposure to light.
After exposure, the wafer is baked and developed, leaving regions covered by
photoresist and complementary regions that are not covered. The patterning
objective of microlithography is to produce well-defined resist features, sized
within specifications. This is a challenge because of the shape of the light-
intensity distribution produced at the wafer plane by a lens of finite resolution.
This distribution lacks a clearly defined edge (Fig. 2.1). From the light-intensity
distribution alone, it is not possible to know where the edges of the feature are. If
the light-intensity distribution had the shape shown in Fig. 2.2, there would be no
such problem, because a clear delineation would exist between areas of the resist
exposed to light and areas unexposed to light.
The light distribution shown in Fig. 2.1 was not drawn arbitrarily or artistically.
Because light is a form of electromagnetic radiation, it is possible to use equations
that describe the propagation of electromagnetic waves to calculate optical image
formation,1 and the light-intensity distribution shown in Fig. 2.1 was generated
accordingly. The physics of image formation will be discussed in more detail in
this and later chapters.
Lithographic processes are easier to control the closer the actual optical images
resemble the ideal ones. If the light profile at the wafer plane is represented by the
distribution shown in Fig. 2.2, the edges of the feature could be clearly identified
by looking at the light-intensity distribution. The photoresist on the wafer would be
cleanly separated into exposed and unexposed areas. In the situations that actually
occur, the photoresist receives continuously varying doses of light in the regions
corresponding to features on the mask. Proper pattern definition on the wafer
Figure 2.1 An illustration of the imaging process. Light passes through a reticle
(photomask). The resulting pattern is imaged onto a photoresist-covered wafer by a lens.
The finite resolution of the lens results in a light-intensity distribution that does not have
clearly defined edges. The particular light-intensity distribution shown in this figure was
calculated using PROLITH 1.5, for a 0.25-µm space on a 0.65-µm pitch (i.e., for a grating
pattern of 0.25-µm spaces and 0.4-µm lines), where the numerical aperture (NA) of the
aberration-free lens is 0.5, imaging at a wavelength of 248 nm, and with 0.5 partial
coherence. The parameter “numerical aperture” will be explained later in this chapter, and
partial coherence is discussed in Appendix A. Actinic refers to light that drives chemical
reactions in the photoresist.
requires that small differences in exposure doses at the edges of pattern features be
distinguished through the resist processing. Edge control will be better for light-
intensity distributions that closely resemble the ideal case illustrated in Fig. 2.2.
Light-intensity profiles produced by two sets of optics, one of “higher
resolution” than the other, are shown in Fig. 2.3. (The parameters that affect
image sharpness and resolution will be discussed later in this chapter.) The image
produced by the higher-resolution lens is closer to the ideal image than the one
produced by the lens with lower resolution. For a given feature size, it is possible
to have better pattern definition with higher-resolution optics. However, because of
the highly competitive nature of the microelectronics industry, lithographers need
to operate their processes at the limits of the best available optics. With a given set
of optics, the light-intensity distributions are degraded further, relative to the ideal
case, with smaller features (Fig. 2.4). As features become smaller, the edges of
the light-intensity profiles become less steep, and the peak intensity decreases.
The challenge for lithographers is to produce the smallest possible features on the
wafers, with shape and size well controlled, for a given generation of optics.
Figure 2.3 Light-intensity profiles for 150-nm isolated spaces (nominal). One image was
generated for an aberration-free 0.7-NA lens, with a partial coherence of 0.6, at a wavelength
of 248 nm (KrF), while the other image was generated for a 193-nm (ArF) unaberrated
lens, with a numerical aperture of 0.75 and a partial coherence of 0.6. These images were
calculated using the simulation program Solid-C.2 The ArF lens has higher resolution than
the KrF lens. All images are in the plane of best focus.
Figure 2.4 Calculated light-intensity distributions for isolated spaces of varying sizes,
using parameters for an aberration-free 0.7-NA lens, with a partial coherence of 0.6, at
a wavelength of 248 nm. All of the features are normalized to a width of 1.0. All images are
in the plane of best focus.
As the feature size shrinks, the edge acuity of the light-intensity distribution is
degraded. If even smaller features are considered, at some point one must say that
features are no longer resolved, but it is clear from Fig. 2.4 that there is a gradual
transition from “resolved” to “unresolved.” A definition of resolution is not obvious
because of this lack of a clear delineation between “resolved” and “unresolved.”
Simple diffraction analyses lead to the most frequently cited quantitative definition
of resolution, the Rayleigh criterion, which will be introduced shortly. While
sufficiently instructive to justify discussion, the Rayleigh criterion does not provide
a measure directly applicable to the situation encountered in photolithography,
and one should use it with care. Discussion of the Rayleigh criterion in this book
emphasizes its assumptions, and therefore its applicability.
The subjects of diffraction and image formation are quite involved and, at best,
can only be introduced here. The mathematical formalism is particularly complex,
and the interested reader is referred to the references cited at the end of this chapter.
The phenomenon of diffraction will be introduced through two examples—the
diffraction grating and the circular aperture.
The origin of the resolution limits of optical systems can be appreciated by
considering the system shown in Fig. 2.5. A reticle with a diffraction grating with a
periodicity 2d (equal lines and spaces) is illuminated by a source of coherent light
of wavelength λ. (For readers unfamiliar with coherence, a short introduction is
provided in Appendix A.) For an angle of incidence of θi , the grating diffracts the
light into various beams whose directions are given by1
sin(θ) − sin(θi) = mλ/(2d),   m = 0, ±1, ±2, . . . .   (2.1)
Here we are assuming that the index of refraction of air, which surrounds the mask
and lens, is ≈1.
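Equation (2.1) can be applied numerically to see which diffracted beams a lens of a given collection angle captures. The sketch below assumes normal incidence and illustrative numbers (248-nm light, a 650-nm pitch, and a 30-deg collection half-angle); none of these values come from a specific tool.

```python
import math

def collected_orders(wavelength_nm, pitch_nm, theta0_deg, theta_i_deg=0.0):
    """Diffraction orders m of a grating of period pitch_nm (= 2d for
    equal lines and spaces) that fall within a lens collection
    half-angle theta0, per Eq. (2.1):
        sin(theta) - sin(theta_i) = m * lambda / (2d)."""
    sin_i = math.sin(math.radians(theta_i_deg))
    sin_max = math.sin(math.radians(theta0_deg))
    orders = []
    # Scan a generous range of orders; keep those collected by the lens.
    for m in range(-20, 21):
        sin_theta = sin_i + m * wavelength_nm / pitch_nm
        if abs(sin_theta) <= sin_max:
            orders.append(m)
    return orders

print(collected_orders(248, 650, 30.0))   # [-1, 0, 1]
```

Here the m = 0 and m = ±1 beams are collected, which is the minimum needed to form an image of the grating; shrink the pitch further and only the undiffracted m = 0 beam remains, so all pattern information is lost.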
Figure 2.5 A situation with coherent light and a diffraction grating on the reticle. A ray of
light is diffracted into beams that propagate in distinct directions given by Eq. (2.1).
Consider a lens with a collection angle of θ0 (Fig. 2.6) used to image the grating.
Due to Eq. (2.1), the lens collects only a finite number of diffraction beams.
Coherent light illuminating a grating diffracts into beams that correspond to the
object’s Fourier components, and a diffraction-limited lens will recombine those
beams that pass through the entrance pupil of the lens. If the lens collects all of
the diffracted beams, then the grating will be fully reconstructed in the image.
However, since the lens collects only a finite number of beams the image is only
a partial reconstruction of the original grating pattern. As light from increasingly
larger angles is collected, the image consists of more terms in the Fourier-series
expansion of the light-intensity distribution of the grating (see Fig. 2.7). For on-
Figure 2.6 The finite size of lenses limits the angles at which diffracted beams are
collected.
Figure 2.7 Partial sums of Eq. (2.2), with I0 = 1. The image more closely approximates a
rectangular grating when more terms are added.
If the lens captures only a single beam, then there is no pattern in the image, since
a single plane wave has no spatial variation in its intensity:
Intensity of a single plane wave = |A0 e^{ik·x}|² = |A0|².   (2.3)
At least two interfering plane waves are needed to generate spatial variation. To
retain more than the first (constant) term in the expansion of I(x), θ0 must be large
enough to capture diffracted beams for m ≥ 1. Equation (2.1) then leads to the
following expression for a minimally resolved feature:
d = 0.5 λ/sin θ0.   (2.4)
Figure 2.9 The light-intensity distribution from a point source (of unit intensity) projected
through a circular aperture in an opaque screen. The dimensions x and y are given in units
of πd/λz, where d is the diameter of the aperture and z is the distance between the two
screens.
where x = πdr/λz, d is the diameter of the aperture, z is the distance between the
two screens, and J1 is the first-order Bessel function. I0 is the intensity at the peak
of the distribution. It should be noted that shorter wavelengths and larger apertures
lead to less angular divergence. The light-intensity distribution given by Eq. (2.5)
is called the Airy pattern, after G. B. Airy who first derived it.3 Because of the
diffraction that occurs at the entrance pupil or edges of a lens producing divergent
beams, the images of point objects are focused Airy patterns.
Rayleigh used the above property of diffraction to establish a criterion for the
resolving power of telescopes, using the following argument.4 Suppose there are
two point sources of light (stars) separated by a small angle. Being independent
sources, the light rays from different stars do not interfere, i.e., they are mutually
incoherent. Accordingly, the total light intensity is the sum of the individual
intensities. The two stars are said to be minimally resolved when the maximum
of the Airy pattern from one star falls on the first zero of the Airy pattern from the
other star. This occurs when
x/2π = 0.61.   (2.6)
The resulting total light intensity due to the two stars with an angular separation
given by x = πdr/λz = 0.61 × 2π is plotted in Fig. 2.10, where two peaks can
be clearly distinguished. It was thus learned that the resolving capability of optical
instruments is limited simply because of the finite entrance pupil.
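The Rayleigh condition of Eq. (2.6) can be verified numerically from the Airy pattern I(x) = I0 [2J1(x)/x]². The Bessel-function implementation below is a convenience (its standard power series), not something taken from the text; the first zero of J1 is at x ≈ 3.8317, and x/2π then recovers the factor 0.61.

```python
import math

def j1(x, terms=30):
    """Bessel function of the first kind, order one, from its power
    series: J1(x) = sum_k (-1)^k / (k! (k+1)!) * (x/2)^(2k+1)."""
    s = 0.0
    for k in range(terms):
        s += (-1) ** k / (math.factorial(k) * math.factorial(k + 1)) \
             * (x / 2.0) ** (2 * k + 1)
    return s

def airy_intensity(x):
    """Normalized Airy pattern [2 J1(x)/x]^2; equals 1 at x = 0."""
    if x == 0.0:
        return 1.0
    return (2.0 * j1(x) / x) ** 2

# The first zero of J1 is at x ~ 3.8317; in the units of Eq. (2.6),
# x/(2*pi) ~ 0.61 -- the Rayleigh criterion.
print(round(3.8317 / (2.0 * math.pi), 2))   # 0.61
print(airy_intensity(3.8317) < 1e-6)        # True: intensity vanishes there
```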
For focusing optics, the minimum resolved distance δ between the peak of the
Airy distribution and its first zero can be related using criteria that must be satisfied
for focusing optics, resulting in
δ = 0.61 λ/(n sin θ),   (2.7)
where n is the index of refraction of the medium surrounding the bottom of the
lens (n ≈ 1 for air), λ is the wavelength of the light, and 2θ is the angle illustrated
Figure 2.10 Light-intensity distribution of two light sources projected through a circular
aperture.
in Fig. 2.11. The quantity n sin θ is called the numerical aperture (NA) of the lens.
Proper lens design is required for the numerical aperture, and hence the resolution,
to be uniform across the exposure field. The minimum separation given by Eq.
(2.7) is referred to as the Rayleigh resolution.
Examples of wavelengths and numerical apertures typically used in
semiconductor processing are given in Table 2.1 along with the accompanying
Rayleigh resolution. As can be seen from this table, a high resolution (small feature
size) can be achieved with a large NA or a short wavelength. When focus is taken
into consideration, these two paths to resolution are not equivalent, as is discussed
shortly.
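Numbers of the kind tabulated in Table 2.1 follow directly from Eq. (2.7). In the sketch below, the (wavelength, NA) pairs are representative examples chosen for illustration, not values quoted from the table.

```python
def rayleigh_resolution_nm(wavelength_nm, na):
    """Rayleigh resolution, Eq. (2.7): delta = 0.61 * lambda / NA."""
    return 0.61 * wavelength_nm / na

# Representative (wavelength, NA) pairs; the NA values are illustrative.
for wl, na in ((365, 0.6), (248, 0.7), (193, 0.75)):
    print(f"{wl} nm, NA = {na}: {rayleigh_resolution_nm(wl, na):.0f} nm")
```

As the table and this calculation both show, moving from i-line (365 nm) to ArF (193 nm) wavelengths, with a modest increase in NA, reduces the Rayleigh resolution by more than a factor of two.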
It is worth restating that the Rayleigh criterion for resolution is obtained by
considering the imaging of point sources of light, which is not the situation
encountered by photolithographers. The geometries on photomasks that must
be imaged onto the front surface of wafers are extended in size, generally
approximating the size and shape of a circuit element. The resolution problem
is shown in Fig. 2.4. As the features become smaller, the optical profile becomes
Figure 2.11 The angle defined by the aperture in a lens determines its numerical aperture.
smallest feature = k1 λ/NA,   (2.8)
where NA is the numerical aperture, and the prefactor of 0.61, associated with point
sources, is replaced by k1 (usually referred to as the “k factor” for a given process).
For a given resist system, Eq. (2.8) provides scaling, and the prefactor changes if a
different photoresist process is used.
The imaging analysis of a grating structure contains a curious feature associated
with coherent illumination. For a particular set of optics, grating dimensions are
resolved to a certain point and then not resolved at all for smaller features, at the
point when the multiple beams no longer enter the lens. This behavior, in which performance falls off rapidly as the rated resolution limit of the lens is approached, is familiar to most photolithographers. The extremely sharp cutoff
in resolving power of the grating example in this book is an artifact of the complete
coherence of the light and single direction of the illumination. This cutoff led
to a situation in which terms of the expansion in Eq. (2.2) were retained with their
complete magnitudes or not retained at all. For partially coherent and incoherent
light, the coefficients of the cosines in the image-series expansion decrease more
gradually as the diffraction limits are approached, and as the illumination includes
light incident from multiple angles.
Consider again the mask pattern consisting of a grating of a periodicity of
2d. Near the resolution limit, the image consists of only a constant plus a single
cosine term that has the same periodicity as the mask pattern, regardless of the
configuration of the optics. After passing through the optical system, the light-
intensity profile appears as shown in Fig. 2.12. The minimum light intensity Imin
is greater than zero, and the peak light intensity Imax is less than the intensity of the illumination.

Figure 2.12 Light intensity as a function of position, for a grating pattern on the reticle.

The contrast C of the image is defined as

C = (Imax − Imin)/(Imax + Imin).  (2.9)

The spatial frequency ν of a grating with period 2d is

ν = 1/(2d).  (2.10)

For incoherent illumination, the modulation transfer function (MTF) of a diffraction-limited lens with a circular aperture is

MTF(ν) = (2/π)(φ − cos φ sin φ),  (2.11)

where cos φ = λν/(2NA).
Figure 2.13 Contrast [defined in Eq. (2.9)] of the image of a grating, produced by an in-
focus diffraction-limited lens. The normalized spatial frequency is given by λ/2dNA, where
2d is the period (or pitch) of the grating, NA is the numerical aperture of the lens, and λ is
the wavelength of the light. The coherence factor σ is defined in Appendix A. For incoherent
light, the contrast equals the MTF.
For a circular aperture, the MTF is zero for ν > 2NA/λ. This MTF function is plotted in
Fig. 2.13. For incoherent illumination, the image of a diffraction grating is given
by7
I(x) = I0 [ 1/2 − (2/π) Σ_{m=1..∞} ((−1)^m/m) MTF(m/2d) sin(mπ/2) cos(mπx/d) ].  (2.13)
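Equations (2.9), (2.11), and (2.13) can be combined in a short numerical sketch. The Python code below evaluates the incoherent MTF of a circular pupil (taking cos φ = λν/(2NA)) and sums the series of Eq. (2.13); the wavelength, NA, and grating pitch are illustrative values, not a specific case from the text.

```python
import math

def mtf(nu, wavelength, na):
    """Incoherent diffraction-limited MTF of a circular pupil, Eq. (2.11);
    zero beyond the cutoff frequency 2 NA / lambda."""
    s = wavelength * nu / (2.0 * na)  # cos(phi)
    if s >= 1.0:
        return 0.0
    phi = math.acos(s)
    return (2.0 / math.pi) * (phi - math.cos(phi) * math.sin(phi))

def grating_image(x, d, wavelength, na, i0=1.0, terms=99):
    """Incoherent image of an equal-line/space grating of period 2d, Eq. (2.13)."""
    total = 0.5
    for m in range(1, terms + 1):
        total -= (2.0 / math.pi) * ((-1.0) ** m / m) \
                 * mtf(m / (2.0 * d), wavelength, na) \
                 * math.sin(m * math.pi / 2.0) * math.cos(m * math.pi * x / d)
    return i0 * total

# Illustrative case: 130-nm lines and spaces at lambda = 193 nm, NA = 0.75
d, wl, na = 130.0, 193.0, 0.75
imax = grating_image(0.0, d, wl, na)   # center of a bright space
imin = grating_image(d, d, wl, na)     # center of a dark line
contrast = (imax - imin) / (imax + imin)  # Eq. (2.9)
```

In this example only the fundamental frequency 1/(2d) passes the pupil, so Imin > 0 and Imax < I0, as in Fig. 2.12.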
These resolution limits arise from the physical phenomenon of diffraction; they may not be reached in actual optical systems because of lens aberrations, which are discussed in Chapter 4.
The sharpness of the resolution “edge” of a lens is dependent upon the degree
of coherence of the illumination, and for exposure systems that have been used
in the past, the falloff in optical contrast becomes steep as the resolution limit is
approached, but it is not perfectly sharp.
Creating resist images within manufacturing tolerances, and on a repeatable
basis, from poorly defined optical profiles is the challenge confronting
microlithographers. Fortunately, there is more to lithographic pattern formation
than optics. Photoresist is a second key ingredient for patterning. One can think
of a lithographic-exposure system as essentially a camera in which the photoresist
plays the role of the film. Following the development process, a pattern is formed
in the photoresist. The quality of the final resist patterns is determined by the
resolving power of the optics, the focusing accuracy, the contrast of the resist
process, and an assortment of other variables that cause the final image to be a
less-than-perfect reproduction of the features on the mask. Just as common film for
personal cameras produces pictures of varying quality depending upon the film’s
photospeed, contrast, and graininess, as well as the focus setting of the camera, the
final images produced photolithographically are affected by the resist processes
and the optics used.
An understanding of the independent factors that contribute to pattern formation
and their interactions is essential for the photolithographer. For example, consider
the situation in which a production process has started to produce poorly defined
patterns. Is the problem due to the resist process or the exposure system?
An understanding of the various contributions to the final image enables the
lithography engineer to resolve the problem expeditiously. Process-development
engineers must also have a solid understanding of the fundamentals of lithography.
For a new process, advanced optics provide high resolution at the expense of
capital, while sophisticated resist processes might extend the capabilities of any
optics, but with a possible increase in process complexity (and other issues
discussed later). By appreciating the roles that the resist and optics play in pattern
formation, the photolithography engineer can design the most cost-effective and
capable manufacturing line. An overview of the lithographic process is presented
in this chapter whereby the separate contributions from the optics and the resist
process can be seen. Each subject is then discussed in more detail in subsequent
sections.
Two characteristics of the final resist pattern are of primary interest—the size
and the shape. Processes capable of producing the narrowest possible linewidths
are highly desired because they enable smaller chips to be produced, allowing
large numbers of devices to be produced on each wafer. Thus, there is a tight
connection between lithographic capability and profits. Short gates are also needed
for fast electronics. The requirements of the resist shape are usually imposed by
the post-lithographic processing. Consider, for example, the situation in which the
photoresist is to serve as a mask for ion implantation. If the edges of the resist
are nearly vertical, normally incident implantation is clearly delineated by the
resist edge, resulting in a sharp doping profile. On the other hand, if the resist
has considerable slope, high-energy ions penetrate the partial thickness of the
resist at the edges of the pattern, and the resulting doping profile is graded. The
relevance of the resist-edge slope and its effect on the doping profile depends upon
the overall process requirements. Another common example in which the slope of
the resist is important occurs when the resist is to be used as a mask for plasma
or reactive ion etching, and the etching process erodes the photoresist. The slope
of the photoresist can be transferred to the etched layer, which may or may not
be desirable. There is no lithographic consideration that purely determines the
optimum process. However, once the requirements of the final resist pattern are
determined from considerations of the post-lithographic processing, the masking
process can be specified. Typically, the most preferred resist profiles are ones that
are nearly vertical. Since these are generally the most difficult to produce, as well
as the most desired, our discussion is oriented toward such profiles.
Regardless of the specific size and shape requirements of the resist patterns,
which may vary from one technology to another, all lithographic processes must
be consistent and reproducible, relative to manufacturing specifications, in order to
be appropriate for use in production. The parameters that affect process uniformity
and consistency must be understood, as well as those that limit the ultimate
performance of a lithographic tool or process. Identification of those parameters
is a subject of a significant portion of this chapter.
A typical resist profile is shown in Fig. 2.14. The shape of the cross section is
often idealized as a trapezoid. Three dimensions are of greatest interest: the width
of the resist line at the resist-substrate interface, the slope of the sidewall, and the
maximum thickness of the resist film after development.
Of course, actual resist profiles often depart significantly from the idealized
trapezoid. The Semiconductor Equipment and Materials International (SEMI)
standard that defines linewidth8 accounts for such departures, and linewidth is
Figure 2.14 (a) Cross section of Sumitomo PAR101 resist, printed on an ASML 5500/900
193-nm step-and-scan system, which had a 0.75 NA. (b) Idealized trapezoid cross section.
defined to be a function of the height from the resist-substrate interface (Fig. 2.15).
Throughout the next section, if the word “linewidth” is used with no further
clarification, it is understood to be the dimension L = y2 − y1 measured at the
resist-substrate interface. This definition has been chosen for three reasons. First,
the width at the base of the resist line is of greatest relevance to the result after
the post-lithographic processing. For example, in a highly selective etch where
there is little resist erosion, all other dimensions of the resist line have negligible
influence on the resultant etch. Moreover, in the presence of a sloped resist profile,
the definition adopted here for linewidth is unambiguous. Finally, the value of the
linewidth is decoupled from the slope of the resist line.
Figure 2.15 The SEMI standard definition for linewidth. Linewidths are given by the
quantities L = y2 − y1 defined at the distances x0 along the resist line and at height z0 .
At this point, the astute reader has noted that the term “resolution” has not been
given a rigorous definition. Since the ability of lithography to resolve small features
is a driving force in the microelectronics industry, a definition of this key quantity
might appear to be in order. Theoretically based measures, such as the Rayleigh criterion, are useful for gaining insight into the lithography process and
estimating capability, but they are not completely adequate because they fail to
account for imperfect lens design and manufacture. They are also insufficient for
other reasons. For example, a lithography process might be unable to reproduce
a pattern coded for 50 nm on the reticle (50 nm times the lens reduction), but
adjustments to exposure in the same process might allow 50-nm features to be
formed on the wafer by overexposing larger features (for example 100 nm times
the lens reduction) on the reticle. Depending on one’s perspective, the resolution
of the lens could be considered either 100 nm or 50 nm. The resulting ambiguity
is a consequence of an additional variable, print bias, which is the difference
between the size of the feature on the mask (divided by the lens reduction) and
the size of the corresponding feature patterned in resist on the wafer. Similarly,
optics have different capabilities for imaging isolated lines, isolated spaces, square
contacts, or isolated islands. Should resolution be considered for one type of feature
or for all features, with or without print bias? The answers to these questions are addressed later.
The basic assumption for this simple model is that the thickness T(x) of
photoresist remaining after development is determined by the exposure energy dose
E(x) = I(x) · t, where t is the exposure time:
TE(E)/T0 = γ ln(E0/E),  (2.15)
Figure 2.16 A typical characteristic curve for photoresist, showing photoresist thickness
remaining after development as a function of exposure dose. Resist thickness is plotted on
a linear scale, while the exposure dose is plotted on a logarithmic scale to produce a curve
that is approximately linear in the vicinity of E0 .
for E < E0. The slope γ is called the “contrast” of the resist process and is defined
by Eq. (2.15). Typical values for γ are shown in Table 2.2.
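A minimal numerical sketch of the characteristic curve of Eq. (2.15) follows; the dose-to-clear E0 and contrast γ used are hypothetical, and the curve is clamped at the full thickness T0 because the linear-in-log-dose behavior holds only in the vicinity of E0.

```python
import math

def thickness_remaining(dose, e0, gamma, t0=1.0):
    """Normalized resist thickness after development, Eq. (2.15):
    T_E(E)/T0 = gamma * ln(E0/E) for E < E0, clamped at the full thickness."""
    if dose >= e0:
        return 0.0  # resist cleared at or above the dose-to-clear E0
    return min(t0, t0 * gamma * math.log(e0 / dose))

# Hypothetical process: E0 = 20 mJ/cm^2, gamma = 3
print(thickness_remaining(15.0, 20.0, 3.0))  # partial thickness near E0
```

Higher γ makes the transition from full thickness to cleared resist occur over a narrower range of dose, which is why γ is called the contrast.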
The dependence of the shape of the final resist pattern on the resist process can
be seen in the following analysis. The slope of the resist sidewall is simply
dT/dx = tan θ,  (2.16)
where the derivative is evaluated at the point x0 where the resist profile and the
substrate intersect (Fig. 2.17). Note also that
E0 = E(x0 ). (2.17)
Equation (2.18) neatly divides the factors that determine the resist profile.
The first factor, dT E /dE, is characteristic of the photoresist and development
process, independent of the exposure tool, while the second factor, dE/dx, is
completely determined by the optics of the exposure system. One obtains dT E /dE
by differentiating a curve such as the one shown in Fig. 2.16, but on a linear scale.
The function E(x) is discussed shortly.
In the vicinity of x0 , T E (E) is described by Eq. (2.15). This results in the
following expression for the slope of the resist profile:
" #" #
dT 1 dE(x)
tan θ = = −T 0 γ . (2.19)
dx E(x) dx
Steeper resist profiles are produced with high-contrast resists (large values of γ)
and steep optical profiles, which have large values for the normalized derivative,
(1/E)(dE/dx). Our simple model is based upon the assumption that the resist
development behavior that has been measured in large exposed areas can be
Figure 2.17 A space in (positive) resist is created by exposure and development. The
resist thickness T (x) is a function of x.
applied directly to situations where the light intensity is modulated over small
dimensions. This is equivalent to assuming that T (x) is solely a function of
E(x). Within the limits to which our assumption is valid, the dependence of the
profile slope is cleanly separated into the contributions from the optics [the factor
(1/E)(dE/dx)] and from the resist process (represented by γ), and each can be
studied independently of the other.
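The separation embodied in Eq. (2.19) can be sketched numerically: the sidewall slope is the product of a resist term (T0 γ) and the normalized log-derivative of the dose profile E(x). The dose profile used below is hypothetical.

```python
import math

def sidewall_slope(t0, gamma, e_of_x, x0, dx=1e-4):
    """tan(theta) from Eq. (2.19): -T0 * gamma * (1/E)(dE/dx) at the edge x0.
    The derivative is evaluated numerically by central difference."""
    e = e_of_x(x0)
    dedx = (e_of_x(x0 + dx) - e_of_x(x0 - dx)) / (2.0 * dx)
    return -t0 * gamma * dedx / e

# Hypothetical exponential dose falloff near the feature edge:
# the image log slope (1/E)(dE/dx) is 2 everywhere for exp(2x)
edge_profile = lambda x: math.exp(2.0 * x)
print(sidewall_slope(1.0, 3.0, edge_profile, 0.5))
```

Doubling either the resist contrast γ or the image log slope doubles the computed slope, reflecting the independent optics and resist contributions described above.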
Theoretical models of image formation in projection optics have been
developed, usually starting from the Hopkins formulation for partially coherent
imaging.19 The Hopkins theory is based on scalar-wave diffraction theory from
physical optics. Imaging models provide the engineer with the ability to calculate
E(x) for various configurations of optics. Let us consider the optics term E(x) and
its normalized derivative, (1/E)(dE/dx). E(x) was calculated for a “perfect” lens
using a commercially available simulation program, Solid-C,2 and the results are
shown in Figs. 2.4 and 2.18. All profiles are shown in normalized dimensions
so that for each feature size the mask edges occur at +/−0.5. The normalized
derivative is shown only around the region of interest, close to the edge of the mask
Figure 2.18 Absolute values of calculated normalized derivatives (image log slope)
(1/E)(dE/dx) of the intensity distributions shown in Fig. 2.4, near the position in the image
that corresponds to the mask edge at x = 0.5. The center of the feature occurs at x = 0.
Overexposure corresponds to values of x > 0.5.
feature. As one can see, for fixed optics the “sharpness” of the optical-intensity
profile degrades and the value of (1/E)(dE/dx) at the edge of the mask feature
is clearly reduced as the feature size is decreased. Lens aberrations or defocus
would also reduce the magnitude of (1/E)(dE/dx), while higher-resolution optics
would increase its value. Since Eq. (2.19) is nonlinear, substantial degradation of
the optical contribution to the profile slope can be tolerated if the contrast γ of the
resist process is large enough.
It should be noted that
(1/E)(dE/dx) = (1/I)(dI/dx),  (2.20)
where E is the exposure dose and I is the corresponding light intensity, since
E = I × t, (2.21)
where t is the exposure time. All of the expressions involving the normalized
derivative of the image can be written equivalently in terms of exposure dose or
light intensity.
One can express the normalized derivative of the intensity or dose as
(1/E)(dE/dx) = (d/dx) ln(E),  (2.22)
γ = [∆T/(T0 + ∆T)] / ln(1 + ∆E/E0)  (2.23)
  ≈ (∆T/T0)/(∆E/E0),  (2.24)
for small variations. (It is assumed here that the resist is on a nonreflecting
substrate. The more complex situation in which the substrate is reflecting is
discussed in Chapter 4.)
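Equations (2.23) and (2.24) can be checked numerically; the thickness and dose variations below are hypothetical small signals.

```python
import math

def gamma_exact(delta_t, t0, delta_e, e0):
    """Resist contrast inferred from a thickness change delta_t produced by
    a dose change delta_e, Eq. (2.23)."""
    return (delta_t / (t0 + delta_t)) / math.log(1.0 + delta_e / e0)

def gamma_approx(delta_t, t0, delta_e, e0):
    """First-order form for small variations, Eq. (2.24)."""
    return (delta_t / t0) / (delta_e / e0)

# Hypothetical: a 5% thickness change produced by a 5% dose change
print(gamma_exact(0.05, 1.0, 1.0, 20.0), gamma_approx(0.05, 1.0, 1.0, 20.0))
```

For these small variations the two forms agree to within a few percent, confirming that Eq. (2.24) is an adequate approximation to Eq. (2.23).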
Figure 2.19 Resist (dark material at the top of the figure) spun over topography has a
relatively planar top surface.
Figure 2.20 Characteristic curve for thicker photoresist. The ordinate is the thickness of
the resist remaining after development, normalized to the resist thickness T0 + ∆T, while
the abscissa is the natural logarithm of the dose normalized to E0 + ∆E.
The shift in the edge of the photoresist line that occurs due to the change in the
thickness ∆T of the photoresist is given by
∆x = ∆E/(dE/dx),  (2.25)
where the derivative is evaluated at the point x0 . From Eq. (2.24), this becomes
∆x = (∆T/γT) [(1/E0)(dE/dx)]^−1  (2.26)
   = (∆T/γT) (ILS)^−1,  (2.27)

where ILS = (1/E)(dE/dx) is the image log slope.
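The edge shift of Eq. (2.27) is easily evaluated; the numbers below are hypothetical but representative of the quantities involved.

```python
def edge_shift(delta_t, t, gamma, ils):
    """Shift of the resist edge due to a resist-thickness change, Eq. (2.27).
    ils is the image log slope (1/E)(dE/dx) at the nominal edge, in 1/nm."""
    return (delta_t / (gamma * t)) / ils

# Hypothetical: 50-nm thickness variation over 500-nm resist, gamma = 5,
# image log slope = 0.02 per nm
print(edge_shift(50.0, 500.0, 5.0, 0.02))  # edge shift in nm
```

Note that a larger resist contrast or a steeper image log slope both suppress the linewidth change caused by resist-thickness variation over topography.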
Figure 2.21 With a change in dose, there is a shift in position where the light intensity
equals a threshold value.
Accordingly,

∆x = −f E(x0) (dE/dx)^−1,  (2.29)

letting

f = ∆E/E,  (2.30)
where all expressions are evaluated for x at the line edge. Our expression for
exposure latitude [Eq. (2.31)] contains only factors involving the exposure optics
and is independent of the photoresist process and its characteristics. Since it is
observed empirically that exposure latitude can be modulated to some degree
through the resist process, it is clear that the model presented here is not exactly
valid, but is good to the first order and best within the limits of thin- or high-contrast
photoresist.
The optical image depends upon characteristics of the exposure tool: the numerical aperture, degree of partial coherence, aberrations, etc. The value of the normalized
derivative (1/E)(dE/dx), which affects the exposure latitude, resist wall profile,
and linewidth control over steps, depends not only on the resolution of the optics but also on the print bias, that is, the point at which the derivative is
evaluated. The value of (1/E)(dE/dx) at the mask edge is different from that at
other positions. The normalized derivative is evaluated at the position x0 , which
is the edge of the resist line, and may not correspond to the edge of the feature
on the mask. The resist patterns may have dimensions different from the ones on
the mask. If the spaces in (positive) resist are printed larger than the ones on the
mask, then they are termed “overexposed,” and if they are printed smaller, they
are called “underexposed.” Resist patterns having the same dimensions as on the
mask are termed “nominal.” Because print bias is set by overexposing or underexposing, the quantities that depend upon the normalized derivative, such as exposure latitude, are functions of the print bias. This is seen quite clearly in Fig. 2.22, where the best
exposure latitude occurs for overexposure. As is explained in the next section, this
result is true only so long as focus is well maintained, and the situation in which
the optics are significantly defocused is more complex.
Higher-resolution optics improve the image log slope, and hence many
parameters, such as resist profiles and exposure latitude. Unfortunately, higher-
resolution optics are never acquired simply to make the photolithographer’s job
easier. When higher resolution optics become available, the challenge is simply
shifted to smaller features, where the normalized derivative remains small. From
the preceding, an expression for the exposure latitude is given by
∆E/E = (∆L/L)(L/2) {[1/E(x)] d[E(x)]/dx}.  (2.33)
In terms of the image log slope, one can rewrite this equation as
∆E/E = (∆L/2L)(L × ILS).  (2.34)
The quantity in parentheses is referred to as the normalized image log slope (NILS):

NILS = L (1/E)(dE/dx).  (2.35)

This is important to recognize, since image log slopes need to increase as feature sizes decrease in order to maintain constant exposure latitude; when discussing exposure latitude, particularly among different feature sizes, this metric, which carries an additional normalization by the nominal linewidth L, is the appropriate parameter.
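The NILS of Eq. (2.35) and the exposure-latitude relation of Eq. (2.34) can be sketched as follows; the dose profile here is a hypothetical stand-in for a simulated aerial image.

```python
import math

def nils(e_of_x, x0, linewidth, dx=1e-4):
    """Normalized image log slope, Eq. (2.35): L * (1/E)(dE/dx) at the edge x0,
    computed as a central difference of ln E(x)."""
    dlog = (math.log(e_of_x(x0 + dx)) - math.log(e_of_x(x0 - dx))) / (2.0 * dx)
    return linewidth * dlog

def exposure_latitude(delta_l_over_l, nils_value):
    """Fractional dose change that produces a fractional linewidth change
    delta_l_over_l, from Eq. (2.34)."""
    return 0.5 * delta_l_over_l * nils_value

# Hypothetical profile with image log slope 0.03/nm at the edge of a 100-nm line
profile = lambda x: math.exp(0.03 * x)
n = nils(profile, 50.0, 100.0)     # NILS of approximately 3
print(exposure_latitude(0.10, n))  # dose latitude for a 10% linewidth change
```

Because NILS is dimensionless, it permits comparison of exposure latitude across different feature sizes.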
The above results show how the optics and resist processes play separate and
somewhat independent roles in image formation.
2.4 Focus
In earlier discussions, the light-intensity distributions were considered only in
the planes of best focus. In optical systems, defocus occurs when the separation
between the imaging plane and the lens is different from the optimum. From
common experience, one knows that defocus reduces the clarity of optical images.
The image produced by a common overhead projector may be slightly out of focus,
but the presentation may remain readable. If the defocus becomes too great, the
image is unintelligible. In lithography, the depth-of-focus is the range of lens-wafer distances over which linewidths are maintained within specifications and resist profiles are adequate. The problem of defocus is particularly acute in
optical lithography, where the depth-of-focus is becoming so small that it is a
concern as to whether optical wafer steppers are capable of maintaining the image
in focus.
The problem of defocus can be appreciated by considering the imaging of a point
source of light. In purely geometrical optics, where it is assumed that light travels
in straight lines, a point object will be imaged by ideal optics to a point located in
the focal plane (Fig. 2.23). However, in other planes, the point source of light is
broadened into a circle. When imaging, there is generally an amount of broadening
that is considered acceptable. Conceptually, the range of values of z over which this
broadening remains acceptable is the depth-of-focus. At a given distance z from the
focal plane, it can be seen that the broadening will be greater for larger values of θ.
Figure 2.23 In geometrical optics, a point source imaged by an ideal lens is focused to a
point in the focal plane. Away from the plane of best focus, the image is a circle of radius r.
It might therefore be expected that lenses will have decreasing depths-of-focus with
increasing numerical aperture. (Compare Figs. 2.11 and 2.23.) In the discussion of
the imaging of a grating structure, it was noted that gratings with smaller lines
and spaces created diffracted beams at larger angles [Fig. 2.5 and Eq. (2.1)]. From
this consideration, one can also expect that objects with smaller feature sizes will
have smaller depths-of-focus. This shows that there is a tradeoff between resolution
and depth-of-focus. Lenses with larger numerical apertures may be used to resolve
small features, but the resulting images will have smaller depths-of-focus than the
corresponding images of larger features imaged by lower-NA lenses.
The diffraction of a point source of light by a circular aperture discussed earlier
(Fig. 2.9) has been extended to the situation in which a lens is used to image the
point source of light. The resulting calculated light-intensity profiles in different
focal planes are shown in Fig. 2.24. With defocus, the peak intensity diminishes
and more light is diffracted away from the center spot. Using the criterion that the peak intensity should not decrease by more than 20%, the following expression results for the depth-of-focus (DOF):1

DOF = ±0.5 λ/(NA)²,  (2.36)
which is valid for small to moderate values of NA and imaging in air. Over this
range of focus, the peak intensity of a point object focused by a lens remains within
20% of the peak value for best focus, where NA is the numerical aperture of the
lens and λ is the wavelength of the light. This expression is usually referred to
as the Rayleigh depth-of-focus; one Rayleigh unit of defocus is 0.5λ/NA2 . The
situation for large values of NA will be discussed in Chapter 10.
Figure 2.24 Light-intensity profile of a point source of light imaged by a circular diffraction-
limited lens at different planes of focus. The horizontal axis is given in units of πdNA/λ,
where d is the diameter of the lens aperture. One Rayleigh unit of defocus is 0.5λ/NA2 .
In analogy with Eq. (2.8), the depth-of-focus is often written as

DOF = k2 λ/(NA)².  (2.37)
Combining Eqs. (2.8) and (2.37), with R denoting the smallest feature, gives

DOF = (constant) R²/λ.  (2.38)
Within the framework of this analysis, one can see that the depth-of-focus
diminishes as one attempts to print smaller features (smaller R). From the Rayleigh
criterion for resolution, there are two options for improving resolution: decreasing
the wavelength or increasing the numerical aperture. Doing either of these also
decreases the depth-of-focus, at least according to Rayleigh. However, improving
resolution by decreasing the wavelength has less effect on the depth-of-focus than
by increasing the numerical aperture. This is the primary reason that leading-edge
optical lithography has involved the use of decreasing wavelengths in the pursuit
of smaller features.
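This tradeoff can be made concrete with the Rayleigh expressions (2.8) and (2.37). The sketch below compares improving resolution by a factor of two via wavelength against doing so via numerical aperture; the starting wavelength and NA are illustrative, not a specific exposure tool.

```python
def resolution(wavelength, na, k1=0.61):
    """Smallest feature, Eq. (2.8)."""
    return k1 * wavelength / na

def depth_of_focus(wavelength, na, k2=0.5):
    """Rayleigh depth-of-focus (one Rayleigh unit), Eq. (2.37)."""
    return k2 * wavelength / na**2

# Illustrative start: lambda = 386 nm, NA = 0.6; improve resolution 2x two ways
cases = [(386.0, 0.6),   # baseline
         (193.0, 0.6),   # halve the wavelength
         (386.0, 1.2)]   # double the numerical aperture
for wl, na in cases:
    print(wl, na, resolution(wl, na), depth_of_focus(wl, na))
```

Both routes reach the same resolution, but the shorter wavelength retains twice the depth-of-focus of the higher NA, illustrating why wavelength reduction has been the preferred path.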
The Rayleigh criteria might lead one to a false conclusion that the depth-of-
focus diminishes whenever one attempts to improve the resolution of the optics.
From the following argument, one can appreciate that this is not true. Suppose
one has a lens that has a resolution of 1.0 µm, i.e., it does not image features
smaller than this. One could therefore say that such a lens has zero depth-of-focus
for submicron features. A higher-resolution lens capable of submicron resolution
therefore has greater depth-of-focus for smaller features. This line of reasoning
leads to the observation that there is an optimum numerical aperture that is a
function of feature size.12,24–28
At one time, improper interpretations of the Rayleigh criteria led to the
conclusion that optical lithography would not be able to break the 1.0-µm barrier.
Figure 2.25 Resist profiles for imaging 105-nm lines (320-nm pitch) at best focus and
various amounts of defocus. The KrF resist was UV110, patterned on a SiON antireflection
coating, using annular illumination (σ = 0.75 outer, 0.50 inner) on an ASML 5500/300
(NA = 0.60). Annular illumination is described in Chapter 8.
DOF = 2r/tan θ0,  (2.39)

where θ0 is the largest angle for incident rays of light and is related to the numerical aperture (for imaging in air) by

NA = sin θ0.  (2.40)
When a thick layer of resist is placed in the vicinity of best focus, refraction
becomes a factor. In Fig. 2.26, two focused light rays converge to the point F1
instead of the focal point F0 because of refraction at the air-resist interface.
The angle θ1 between the normal and the refracted ray is related to θ0 by Snell’s
law:
sin(θ1) = sin(θ0)/n,  (2.41)
Figure 2.26 Refraction of light at the air-resist interface, its effect on the position of focus,
and the spreading of the image due to defocus.
where n is the index of refraction of the photoresist. The depth into the photoresist
through which these light rays can travel before they diverge a lateral distance
greater than 2r is increased from Eq. (2.39) to
DOF = 2r/tan θ1.  (2.42)
This leads to an effective increase in the depth-of-focus, over the aerial image, by
a factor of
tan(θ0)/tan(θ1).  (2.43)
For small angles, this is approximately equal to the photoresist’s index of refraction
n. That is, the reduction in the depth-of-focus due to resist thickness T is T/n. For
photoresist, typical values for n are in the neighborhood of 1.7. The refraction at
the resist-air interface also breaks the symmetry of an unaberrated image about the
plane of best focus, and is an effect seen in detailed lithography simulations.32
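The refraction effect of Eqs. (2.41)-(2.43) can be evaluated numerically; θ0 is obtained here from NA = sin θ0 (imaging in air), and n = 1.7 is a typical resist index as noted above.

```python
import math

def dof_gain(na, n_resist=1.7):
    """Factor tan(theta0)/tan(theta1) of Eq. (2.43), by which refraction at the
    air-resist interface stretches the focal range inside the resist."""
    theta0 = math.asin(na)                           # marginal-ray angle in air
    theta1 = math.asin(math.sin(theta0) / n_resist)  # Snell's law, Eq. (2.41)
    return math.tan(theta0) / math.tan(theta1)

print(dof_gain(0.05))  # small NA: close to the resist index n = 1.7
print(dof_gain(0.90))  # high NA: noticeably larger than n
```

At small angles the factor reduces to n, consistent with the small-angle statement in the text, while at high NA the gain exceeds n.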
For well-defined and controlled patterns, good imaging needs to be maintained
throughout the thickness of the resist film, which may vary over topography. Rigorously capturing the effect of device topography on the depth-of-focus is difficult because there are linewidth variations due to thin-film optical effects (to
be discussed in Chapter 4) that reduce the overall process window. Even on
nonreflective substrates, there are changes in linewidths because of variations in
resist thickness, and these would occur even for aerial images with infinitely
large depths-of-focus. Consequently, the reduction in the depth-of-focus due to
topography can be considered to be T/n, where T is the largest-value resist
thickness across the topography.
It is important that linewidths are good simultaneously throughout the
entire exposure field. The concept of usable depth-of-focus follows from the
understanding that, in the practical world of integrated circuit manufacturing,
imaging needs to be good for all points within a field to ensure complete circuit
functionality. This philosophy, where all points within a field need to be good
simultaneously, was first discussed in detail in the context of overlay,33 but the
basic concepts are applicable to all lithographic parameters, including focus. The
depth-of-focus of a given feature in a particular orientation can be determined for
a given set of optics and resist process. However, this does not represent the depth-
of-focus that can be used for making integrated circuits because integrated devices
typically have geometries of many different orientations distributed over the area
of the chip. It has become accepted practice to distinguish between the usable
depth-of-focus (UDOF) and the individual depth-of-focus (IDOF).34 The IDOF is
the depth-of-focus at any one point in the image field, for one particular orientation.
The UDOF is the amount by which the separation between the wafer and the lens
can be changed and still keep linewidths and resist profiles adequate throughout the
entire exposure field. UDOF is the common range of all IDOFs over the printable
area and all orientations. This is generally much smaller than the depth-of-focus at
any individual point in the field (IDOF).
There are many factors that cause the UDOF to be less than the IDOF.
Astigmatism is one such factor, and it refers to situations in which lines of different
orientation have different planes of best focus. An example of the consequence
of astigmatism is shown in Fig. 2.27. In this figure, the vertical spaces are well
resolved, but the horizontal spaces are so defocused that the resist pattern is
bridged. This is an extreme example, patterned on an early 5× stepper. For modern
exposure systems, the distances between focal planes for objects of different
orientation are less than 50 nm and typically lead only to differences in linewidth,
rather than extreme consequences, such as bridging. For a cylindrically symmetric
lens, such as those typically found in lithography exposure tools, one would expect
astigmatism to occur between geometries parallel to a radius vector (sagittal)
and perpendicular to the radius vector (tangential). This is certainly true for
astigmatism that results from the design. However, astigmatism can occur between
geometries of any orientation in any part of the field because of lens manufacturing
imperfections. For example, the many lens elements that comprise a stepper lens
may not all be centered perfectly on a common optical axis, thereby breaking the
symmetry in the center of the lens field and potentially causing astigmatism where
none would be possible for a truly cylindrically symmetric system.
Another focus-related aberration is field curvature. As a result of this aberration,
the surface of best focus is a section of the surface of a sphere rather than a plane,
and focus is not constant throughout an exposure field. As seen in Fig. 2.28, good
focus can be maintained at some points within the exposure field, but most of the
field suffers from some defocus because of field curvature. For both astigmatism
and field curvature, the rays of light from a given point on the reticle are all focused
to a single point in the image, but the collection of such points are not always in
a single plane. Astigmatism and field curvature reduce the usable depth-of-focus,
which is the amount that the distance between the lens and wafer can be varied and
still maintain good imaging, as illustrated in Fig. 2.29.
Typical specified depths-of-focus for various wafer steppers are given in
Table 2.4. The amount of defocus that results in significant changes in either slope
or linewidth should be noted: it ranges between 1.5 and 0.4 µm, and has been
decreasing over time. This should be compared to the amount of focus variation
that can be induced within a process (Table 2.5). All parameters in Table 2.5, except
for stepper focus control and the metrology figure,35 refer to variations across an
exposure field. In the preceding discussion, it was shown that the relevant value for
circuit topography is the actual height of circuit features divided by the index of
refraction of the photoresist.
As lenses improved, with significant reductions in field curvature, astigmatism,
and other aberrations, the UDOF began to approximate the Rayleigh DOF.
Figure 2.29 Usable depth-of-focus is smaller than the individual depth-of-focus because
of astigmatism and field curvature.
Optical Pattern Formation 41
Table 2.4 Typical depths-of-focus for various wafer steppers. Resolution and depth-of-
focus are given for approximately comparable conditions across technologies and exposure
platforms. As will be shown in Chapter 8, techniques have been developed over time to
improve resolution and depth-of-focus significantly.
Stepper Resolution (µm) Usable depth-of-focus (µm) NA Wavelength Year
Table 2.5 Contributions to focus variation within a process.
Source of focus variation            Contribution
Chuck nonflatness                    30
Lens field curvature, astigmatism    20
Wafer nonflatness                    40
Stepper focus control                20
Circuit topography                   30
Metrology for determining focus      20
Total (sum)                          160
Total (rss)                          68
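The "sum" and "rss" totals in the focus budget above can be reproduced directly; this is a minimal sketch in Python, with the component values taken from the table (units as given there):

```python
import math

# Focus-variation contributions from the budget table above
contributions = {
    "Chuck nonflatness": 30,
    "Lens field curvature, astigmatism": 20,
    "Wafer nonflatness": 40,
    "Stepper focus control": 20,
    "Circuit topography": 30,
    "Metrology for determining focus": 20,
}

total_sum = sum(contributions.values())  # worst-case linear sum
total_rss = math.sqrt(sum(v**2 for v in contributions.values()))  # statistical sum

print(total_sum)         # 160
print(round(total_rss))  # 68
```

The rss total assumes the individual contributions are statistically independent, which is why it is considerably smaller than the worst-case linear sum.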
More precisely, the UDOF is the range over which the 0.5-µm layer of resist can be moved
while maintaining resist linewidths and profiles within specifications.
Changes in linewidth as a function of focus and dose are shown in Fig. 2.30. At a
given dose, the linewidths change with defocus. It should be noted that the amount
of linewidth change as a consequence of defocus depends upon the exposure dose.
A dose at which there are minimal linewidth changes when the image is defocused
is referred to as the isofocal dose.
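The isofocal behavior can be illustrated with a simple threshold-resist sketch (the image model and all parameter values below are illustrative assumptions, not data from Fig. 2.30): model the image of a space as I(x) = [1 + V(ζ)cos(2πx/p)]/2, where defocus ζ only reduces the modulation V. A resist edge forms where the delivered dose crosses a threshold, and at the dose for which the edge sits at the 50%-intensity point, the printed width does not depend on V, i.e., on focus:

```python
import math

def space_width(dose, V, p=1.0, threshold=1.0):
    """Width of the printed space for a threshold resist.

    Image model: I(x) = (1 + V*cos(2*pi*x/p)) / 2, centered on the space.
    The resist edge is where dose * I(x) = threshold.
    """
    c = (2.0 * threshold / dose - 1.0) / V  # cos(2*pi*x_edge/p) at the edge
    return (p / math.pi) * math.acos(c)

# Defocus reduces the modulation V (hypothetical values)
for V in (0.8, 0.6, 0.4):
    cd_isofocal = space_width(dose=2.0, V=V)  # dose*I = threshold at the 50% point
    cd_biased = space_width(dose=1.6, V=V)    # an overexposed condition
    print(V, round(cd_isofocal, 3), round(cd_biased, 3))
```

At the isofocal dose the printed width stays at p/2 for every value of V, while at the biased dose the width varies strongly with defocus.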
The thin-resist models used in the preceding section to illustrate the concepts
of resolution can be extended to considerations of focus. The light-intensity
distribution can be calculated for planes other than the plane of best focus;
examples are shown in Fig. 2.31. As can be seen, the optical-intensity profiles
degrade with defocus. However, there are positions where the light intensity
changes little, or not at all, with defocus. Processes that are set up to have
the edges of developed resist features correspond to these positions generally
have small changes in linewidth. These positions are known as the conjugate or
isofocal points,37 and they are usually close to the nominal line edges, i.e., the
resist linewidths have approximately the same dimensions as those on the reticle
(adjusted for the reduction ratio of the projection optics). They correspond to the
isofocal dose. Earlier, it was seen that the normalized derivative of the optical-
intensity profile at best focus was increased by having a substantial exposure
bias. As derived, parameters such as exposure latitude, linewidth control over
topography, and resist sidewall slope are improved with a large normalized
derivative. In the absence of defocus, one would conclude that a large print bias
would lead to an optimized process.
In the presence of defocus, the normalized derivative behaves differently, as
shown in Fig. 2.32. At best focus, the image log slope increases for a significant
Figure 2.30 Linewidth as a function of focus for dense lines.36 The process conditions
were: softbake and post-exposure bake at 110 ◦C for 60 sec, and puddle development for
52 sec in AZ 300 MIF developer at 22 ◦C; exposures were performed on a Nikon 0.54-NA i-line wafer stepper.
Curves such as these are referred to as Bossung curves.31
Figure 2.31 Calculated light-intensity distributions at different focus settings for a 300-nm
space exposed at λ = 248 nm, NA = 0.5, and σ = 0.6. Defocus is measured in units of
microns (µm).
amount of print bias. The magnitude of the image log slope diminishes at all
print biases for defocus, but the degradation is greater away from the conjugate
point. For sufficiently large defocus, the image log slope actually decreases with
print bias. Thus, one is led to a significantly different conclusion concerning the
advantage of large print bias when defocus is a significant concern. It was shown
earlier in this chapter that the process window is increased with a large image log
slope. However, it is often advantageous to compromise the image log slope if there
is a limited depth-of-focus.
The thin-resist model of the preceding section can be extended to include
defocus, in which case the light-intensity profile becomes a function of two
variables:

E = E(x, ζ), (2.44)

where x refers to the distance parallel to the wafer plane and perpendicular to the
(long) resist lines and spaces, and ζ is the amount of defocus. The condition that
determines the position of the edge of the photoresist is

E(x₀, 0) = E(x₀ + ∆x, ∆ζ). (2.45)

Expanding the right side of Eq. (2.45) in a Taylor series about the plane of best
focus, the linewidth change ∆x due to defocus ∆ζ is obtained:

∆x = −(1/2)(∆ζ)² × [(1/E₀)(∂²E/∂ζ²)] × [(1/E₀)(∂E/∂x)]⁻¹. (2.46)
Figure 2.32 Absolute values of the normalized derivative (image log slope) of the light-
intensity profiles shown in Fig. 2.31. The normalized derivative is in units of µm−1 and
defocus is measured in µm.
In deriving Eq. (2.46), use was made of the fact that

∂E/∂ζ = 0 (2.47)
in the plane of best focus, and higher-order terms in ∆x were dropped from Eq.
(2.46). The symmetry of the aerial image about the plane of best focus is expressed
by Eq. (2.47). In the presence of photoresist, or certain optical aberrations, this
symmetry is broken. This is not particularly problematic where the thin-resist
model is a valid approximation, but it can be relevant for situations involving high
numerical apertures and thick (relative to the Rayleigh DOF) photoresist.
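Equation (2.46) can be checked numerically against a toy image model (the model and all parameter values below are illustrative assumptions, not from the text): take E(x, ζ) = [1 + V(ζ)cos(2πx/p)]/2 with V(ζ) = V₀[1 − (ζ/z₀)²], which is symmetric about best focus so that Eq. (2.47) holds, and compare the quadratic estimate of the edge shift with a direct solution of the edge condition:

```python
import math

p, V0, z0 = 1.0, 0.7, 0.5  # hypothetical image parameters (arbitrary units)
x0 = 0.2                   # nominal edge position

def E(x, z):
    """Toy aerial image with defocus z; symmetric about z = 0."""
    V = V0 * (1.0 - (z / z0) ** 2)
    return 0.5 * (1.0 + V * math.cos(2.0 * math.pi * x / p))

dz = 0.1  # defocus

# Quadratic estimate, Eq. (2.46): dx = -(1/2) dz^2 (d2E/dz2) / (dE/dx)
d2E_dz2 = -0.5 * math.cos(2 * math.pi * x0 / p) * V0 * 2.0 / z0**2
dE_dx = -0.5 * V0 * (2 * math.pi / p) * math.sin(2 * math.pi * x0 / p)
dx_estimate = -0.5 * dz**2 * d2E_dz2 / dE_dx

# Direct solution of the edge condition E(x0 + dx, dz) = E(x0, 0)
target = E(x0, 0.0)
V = V0 * (1.0 - (dz / z0) ** 2)
x_edge = (p / (2 * math.pi)) * math.acos((2.0 * target - 1.0) / V)
dx_direct = x_edge - x0

print(dx_estimate, dx_direct)
```

For small defocus the quadratic estimate tracks the directly computed edge shift; at larger defocus the dropped higher-order terms become noticeable.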
In prior discussions, process latitude was improved when the image log slope
(the normalized derivative),

(1/E₀)(∂E/∂x), (2.48)
is increased. For example, as seen in Fig. 2.18, the image log slope at best
focus may be maximized for overexposed spaces, an operating point often distant
from the isofocal point. Consequently, there may be large variations in linewidths
because of defocus, offsetting any benefits of a large image log slope at best focus.
For situations involving large NAs, there is not a single definition for “best
focus.” For low-NA optics, where the Rayleigh DOF greatly exceeds the resist
thickness, most effects of defocus, such as linewidth and resist-slope variation,
behave symmetrically about some plane. Moreover, the smallest features are
typically imaged at that plane. In such a circumstance, it is reasonable to define
that plane as the plane of best focus. However, for high-NA optics, asymmetry
is introduced, and the choice of best focus becomes less obvious. In particular, the
middle of the region over which linewidths are within specifications is no longer the
place at which the smallest features are resolved, nor that at which the resist slope is maximized.
Figure 2.33 Focus correction versus pressure for a stepper equipped with a Zeiss 10-78-46 lens (0.38 NA, g line).
Because of the significant effects on lithographic performance, focus needs to
be well controlled. Unfortunately, a number of factors can cause focus to vary. For
example, a change in barometric pressure can induce a shift in focus. Lenses with
glass elements create images by means of the physical phenomenon of refraction,
whereby the direction of a beam of light changes at the interface between materials
of different indices of refraction. The angle of refraction, and hence the focal
distance, depends upon the indices of refraction of the glass and air. While the
index of refraction of the air is ≈1, it differs slightly, and changes as barometric
pressure changes. As the air pressure inside of wafer fabricators rises and falls with
the outside barometric pressure, the focal length of lenses containing refractive
elements also changes.38 The focus adjustment required to keep an exposure tool
at best focus while barometric pressure changes for an exposure system that does
not have automatic focus for pressure compensation is shown in Fig. 2.33. Modern
exposure tools control focus automatically for changes in air pressure. Some
systems use pressurized lenses, where the air pressure inside the lens is kept at
a fixed value regardless of the pressure of the air in the fab. An alternative method
of control involves the measurement of barometric pressure and compensation of
the focus through software. In addition to adjusting the focus when barometric
pressure changes, exposure tools also need to move wafers up and down, to adjust
their tilt, and to correct for changes in wafer thickness and flatness if the resist films
are to be maintained in focus. Automatic focusing systems are discussed in detail
in Chapter 5. Barometric pressure affects the reduction of the projection optics39 as
well as focus, and requires similar compensation.
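The sensitivity of a refractive system to air pressure can be sketched with a thin-lens model; everything here is a rough assumption for illustration (a single thin lens, glass index 1.5, a nominal 1-m focal length, and an air-index sensitivity of roughly 2.7 × 10⁻⁹ per pascal), not a model of any real stepper lens:

```python
# Thin-lens focal length scales as 1/(n_glass/n_air - 1).
N_GLASS = 1.5
DN_DP = 2.7e-9  # approximate change of the air index per pascal (assumed)

def focal_length(n_air, f_nominal=1.0, n_glass=N_GLASS):
    """Focal length in meters, relative to a nominal lens designed in air."""
    k = f_nominal * (n_glass - 1.0)  # lens-shape constant, fixed by the design
    return k / (n_glass / n_air - 1.0)

f_low = focal_length(1.000270)                    # nominal fab pressure
f_high = focal_length(1.000270 + DN_DP * 1000.0)  # pressure up by 10 hPa

print(f_high - f_low)  # focal shift in meters; positive for rising pressure
```

Real stepper lenses are many-element designs, so the actual sensitivity differs, but the sign and order of magnitude of the shift motivate the automatic compensation described above.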
Figure 2.34 Reflections from mirror lens elements are purely geometrical and independent
of the index of refraction of the air.
Mirrors with curved surfaces can also be used for imaging. The imaging
properties from reflective surfaces are purely geometrical (Fig. 2.34), and are
independent of the refractive index of the air or wavelength of the light:
ΘI = ΘR , (2.49)
where ΘI and ΘR are the angles of incidence and reflection, relative to the normal,
respectively. Lenses that use mirrors have less sensitivity of focus to barometric
pressure than lenses that use only glass elements. Stepper lenses will be discussed
in more detail in Chapter 5.
FLEX (Focus-Latitude Enhancement eXposure) has been proposed as a method
for increasing the DOF; this method involves partial exposures at different
focus positions.40,41 With FLEX, sharply focused images are superimposed with
defocused ones throughout the depth of the resist film; the depth-of-focus of contact
holes is extended by making partial exposures at multiple positions along the optical axis.
FLEX is also known as focus drilling. The appropriate choice of exposure and
focus steps can lead to a 3× or 4× increase in contact-hole DOF, but the DOF
for line features is typically not improved. FLEX has been demonstrated to
improve the DOF, but with a loss of ultimate resolution by the multiple defocused
exposures.42 As we shall see in a later section, this is a general property of
most attempts to improve the DOF that do not address the fundamental issue of
diffraction at the reticle.
Problems
2.1 Assuming diffraction-limited optics, show that the optical resolution of a 0.54-
NA ArF lens is approximately the same as a 0.7-NA KrF lens. Which has the
greater depth-of-focus?
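A quick numerical check of this comparison, taking resolution ∝ λ/NA and Rayleigh DOF ∝ λ/NA² (wavelengths 193 nm for ArF and 248 nm for KrF):

```python
wavelengths = {"ArF": 193.0, "KrF": 248.0}  # nm
nas = {"ArF": 0.54, "KrF": 0.70}

res = {k: wavelengths[k] / nas[k] for k in wavelengths}       # proportional to resolution
dof = {k: wavelengths[k] / nas[k] ** 2 for k in wavelengths}  # proportional to Rayleigh DOF

print(res)  # ArF ~357, KrF ~354: nearly equal resolution
print(dof)  # ArF ~662, KrF ~506: the ArF lens has the larger DOF
```

The two lenses resolve nearly identical features, but the lower-NA ArF lens retains the larger depth-of-focus.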
2.3 The following table gives wavelengths and numerical apertures used in volume
manufacturing over time:
Year Wavelength NA
2001 248 nm 0.80
2003 193 nm 0.75
2005 193 nm 0.85
2007 193 nm 0.92
Calculate the Rayleigh DOF for each year. What is the trend for depth-of-focus
over time?
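For this problem, taking the Rayleigh DOF as λ/(2NA²), i.e., the conventional k₂ = 0.5 form:

```python
data = [(2001, 248.0, 0.80), (2003, 193.0, 0.75),
        (2005, 193.0, 0.85), (2007, 193.0, 0.92)]

dofs = []
for year, wl_nm, na in data:
    dof = wl_nm / (2.0 * na * na)  # Rayleigh DOF in nm
    dofs.append(dof)
    print(year, round(dof))

# The depth-of-focus shrinks steadily over time.
assert all(a > b for a, b in zip(dofs, dofs[1:]))
```

The DOF falls from roughly 194 nm in 2001 to about 114 nm in 2007, which is the trend the problem asks for.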
2.4 Using Eq. (2.32), show that the changes in linewidth for a 10% change in
exposure dose are 9, 11, and 13 nm for 250-, 180-, and 150-nm features,
respectively. Show that the dose control needs to be 27%, 16% and 11% for
250-, 180-, and 150-nm features, respectively, for linewidths to change by no
more than 10% of the feature size. How does the challenge of process control
change as smaller features are produced?
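The second part of this problem follows from the first by assuming linewidth varies linearly with dose over this range (an assumption; the exact relationship is Eq. (2.32)): the allowed linewidth change is 10% of the feature size, and scaling the 10%-dose sensitivities gives the required dose control:

```python
features = [250.0, 180.0, 150.0]         # nm
cd_change_per_10pct = [9.0, 11.0, 13.0]  # nm of linewidth change per 10% dose change

dose_controls = []
for cd, dcd in zip(features, cd_change_per_10pct):
    allowed = 0.10 * cd                       # maximum tolerable linewidth change, nm
    dose_controls.append(10.0 * allowed / dcd)  # % dose change producing that shift
    print(cd, int(dose_controls[-1]))
```

Dose must be controlled ever more tightly as features shrink, even though the absolute linewidth sensitivity to dose grows only slowly.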
2.5 Assuming normally incident coherent light on a grating of equal lines and
spaces, show that the angles of incidence onto the wafer of the first-order
diffracted beams in air are 29.7, 43.5, and 55.8 deg for 250-, 180-, and 150-
nm lines, respectively. What are the minimum numerical apertures required to
image these features?
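For equal lines and spaces the pitch is twice the linewidth, and the first diffracted orders leave the grating at sin θ = λ/pitch. KrF illumination (λ = 248 nm) is assumed here, since it is consistent with the quoted angles though not stated in the problem:

```python
import math

WAVELENGTH = 248.0  # nm; KrF illumination assumed

thetas, min_nas = [], []
for line in (250.0, 180.0, 150.0):
    pitch = 2.0 * line               # equal lines and spaces
    sin_theta = WAVELENGTH / pitch   # grating equation, first order, normal incidence
    thetas.append(math.degrees(math.asin(sin_theta)))
    min_nas.append(sin_theta)        # lens must capture the first orders: NA >= sin(theta)

print([round(t, 1) for t in thetas])    # [29.7, 43.5, 55.8]
print([round(n, 2) for n in min_nas])   # [0.5, 0.69, 0.83]
```

The minimum numerical apertures are simply the sines of the diffraction angles, since at least the first orders must be collected to form an image.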
2.6 Show that the depth-of-focus is reduced by ∼59 nm by increasing the resist
thickness from 200 nm to 300 nm for a resist with an index of refraction of 1.7.
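As discussed earlier in the chapter, the focus budget consumed by the film is its thickness divided by the resist index, so the extra DOF loss is just the thickness increase divided by n:

```python
n_resist = 1.7
dof_loss_nm = (300.0 - 200.0) / n_resist  # extra thickness / resist index

print(round(dof_loss_nm))  # ~59 nm
```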
References
1. M. Born and E. Wolf, Principles of Optics, 7th ed., Cambridge Univ. Press,
Cambridge (1999). Note that the sign convention for the angles employed by
Born and Wolf (and in this text) is not used universally. As used here, the
angles indicated in Fig. 2.5 are positive.
2. Solid-C was a simulation program produced by SIGMA-C GmbH of Munich,
Germany. SIGMA-C was acquired by Synopsys, Inc., in 2006. Synopsys
produces lithography simulation software called Sentaurus LithographyTM .
3. G. B. Airy, “On the diffraction of an object-glass with circular aperture,” Trans.
Camb. Phil. Soc. 5(3), 283–291 (1835).
4. Lord Rayleigh, “Investigations in optics, with special reference to the
spectroscope,” Phil. Mag. 8, 261–274 (1879).
Figure 3.2 The cross-linking of cyclic polyisoprene, by means of a commonly used azide,
2,6-bis(4′-azidobenzal)-4-methylcyclohexanone.3,4
were introduced in the late 1970s with optics that functioned at the mercury g
line (λ = 436 nm). The negative resists used at that time were not very sensitive
at such a long wavelength. There were good reasons to use g-line illumination,
one being that it facilitated the design and fabrication of the stepper’s projection optics.
The mercury g line is visible (blue) light. Many lens designers were familiar with
designing at visible wavelengths, and a great deal of lens metrology involved visual
assessment of optical interference patterns. A transition to shorter wavelengths
would have required significant development by the lens makers (something that
occurred later). Fortunately, there were g-line-sensitive positive resists, based on
novolak resins, commercially available at the time steppers were first introduced.
This, along with other advantages of the positive resists, resulted in an industry-
wide transition from the use of negative resists primarily to the predominant
Photoresists 53
use of positive resists. With the exception of contact-hole patterning (discussed
shortly), there were no fundamental reasons that positive resists should have
been chosen over negative ones. However, the available positive resists had other
necessary attributes, such as the lack of swelling. The transition to positive resists
occurred not simply because of intrinsic advantages of positive resists (with the
exception of contact-hole patterning), but rather because of specific advantages
of the existing (novolak) positive resists relative to the existing (azide/isoprene)
negative resists5 (see Table 3.1). Recently, very high-resolution negative resists
that can be developed in the same aqueous developers as positive resists have been
introduced.6–8
In principle, there are no compelling reasons to use positive resists in preference
to negative resists, although there are some exceptions. Consider the aerial image of square
features—contact holes—shown in Fig. 3.3. For producing contacts with negative
resists, the feature on the mask is a square of chrome surrounded by glass, while
the mask feature for a positive-resist contact process consists of a square of glass
surrounded by chrome. As can be seen in Fig. 3.3, the negative-resist process
suffers from having undesirable exposure at the center of the contact region,
which can lead to closed contacts. The creation of contact holes on wafers is thus
facilitated by the use of positive resists, because the aerial image is superior.9 For
other patterns, the advantages of one tone of resist over the other are less clear. For
example, on a grating with lines and spaces of nearly equal size, the clear and dark
patterns are roughly equivalent optically, so one tone of resist is not going to be
fundamentally preferable to the other tone. There are recently developed negative
resists with very high performance.8
Resists are usually operational over specific wavelength ranges, and they are
usually optimized for application at very specific and narrow ranges of wavelength.
To work well at a given wavelength, there are several requirements:
(1) The resist photochemistry must take place efficiently. This is important for
cost-effective manufacturing, since lengthy times to expose the resists reduce
exposure-tool productivity. It should be noted that there is such a thing
as an exposure dose that is too low, leading to dose-control problems and
poor pattern fidelity because of shot noise.10,11 (Shot noise is discussed in
Section 3.8 and Chapter 12.)
(2) The resist must not be too optically absorbent. For high-resolution imaging,
resist films need to be exposed with a high degree of uniformity from the
Figure 3.3 Simulated cross sections of the aerial images of 250-nm contact holes. The
simulations were performed using Prolith2 and using the high-NA full-scalar model. The key
parameters were wavelength = 248 nm, σ = 0.5, and NA = 0.7. A chrome/glass binary mask
was assumed.
top of the resist film to the bottom, and light will not penetrate the resist film
adequately if it is too optically absorbent.
(3) There must not be competing positive and negative reactions. These can occur
when the light drives desired photochemical reactions in a positive resist but
may also induce polymer cross-linking, a negative-resist characteristic.
These are some of the broad issues related to resists. The remainder of this chapter
discusses specific technical issues in greater detail.
problem for semiconductor processing, but adhesion promoters are available for
use on metals. For metal surfaces, chelating-silane adhesion promoters, such as
trimethylsilylacetamide, have been proposed.14,15
The adhesion-promotion process can be monitored by measuring surface
wettability. This is usually accomplished by placing a drop of water on a primed
wafer and measuring the contact angle θ (Fig. 3.6). Forces are balanced in
equilibrium such that

γSV = γSL + γLV cos θ,
where γLV is the surface tension of the liquid in equilibrium with its saturated
vapor, γSV is the surface tension of the substrate in equilibrium with a saturated
atmosphere of water, and γSL is the surface tension between the water and the
primed substrate.16 The angle θ is shown in Fig. 3.6, and a properly primed surface
typically has values of 50–70 deg for the contact angle of water. Higher angles
indicate greater degrees of priming, i.e., more hydrophobic surfaces and smaller
values of γSL . From thermodynamic arguments, the work of adhesion of the liquid
to the solid has been shown to be17

WSL = γLV (1 + cos θ).
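Using the Young–Dupré relation W = γLV(1 + cos θ) with the surface tension of water (≈72 mN/m at room temperature, an assumed value), the work of adhesion over the typical primed-surface range of contact angles is easily evaluated:

```python
import math

GAMMA_LV = 0.072  # N/m, approximate surface tension of water at room temperature

def work_of_adhesion(theta_deg, gamma_lv=GAMMA_LV):
    """Young-Dupre work of adhesion of a liquid drop on a solid, in J/m^2."""
    return gamma_lv * (1.0 + math.cos(math.radians(theta_deg)))

for theta in (50, 60, 70):
    print(theta, round(work_of_adhesion(theta), 4))
```

Higher contact angles, indicating stronger priming, correspond to a lower work of adhesion of water to the surface, consistent with the overpriming trade-off discussed in the text.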
There is such a thing as overpriming.18 While priming has the positive effect
of ensuring resist adhesion during resist development, it often has the effect of
reducing the adhesion between the resist and the substrate. That is, the primary
purpose of the priming is to prevent liquid from penetrating the resist/substrate
interface during development19 (Fig. 3.7). Large amounts of priming reduce this
penetration, but may do so at the expense of true adhesion of the resist to the
substrate. With overpriming, the resist may not wet the surface of the wafer
adequately during the resist-coating operation, resulting in parts of the wafer that
are not covered by resist. The resist may delaminate—“pop”—during exposure,
because the resist is not adhering adequately to the wafer surface.
Vapor priming can be performed in stand-alone priming ovens or in single-
wafer modules. With stand-alone ovens, it is possible to bake the wafers for a long
time, thereby achieving maximum dehydration. On the other hand, full integration
with the rest of the lithography equipment is possible when priming takes place
in single-wafer modules. As might be expected, adhesion is a property that is
Figure 3.7 The penetration of liquid at the interface between the resist and the substrate.19
modulated by the resist composition. Some polymers will adhere better to surfaces
than others.
Figure 3.8 Schematic of the process of spin coating. (a) Resist is dispensed onto the wafer
in liquid form. (b) The spreading and thinning of resist during spinning.
Figure 3.9 A bare silicon wafer being coated with resist in a DNS RF3 coat module. (Photo
courtesy of Dainippon Screen Mfg. Co. Ltd.)20
Figure 3.10 Resist thickness versus spin speed for resists of different viscosities from the
TDUR-DP069 family of resists, spun on 200-mm wafers, and using a ProCell from Silicon
Valley Group.21
Users select the rate of spin by first determining the final thickness desired.
Average thickness can be varied by adjusting the rate of spin and/or resist viscosity.
While a coater may be capable of spinning over a wide range of speeds, there is
usually a smaller range over which the coatings are uniform. Typically, spinning
either too fast or too slow produces nonuniform coatings, as illustrated in Fig. 3.11.
Approximate ranges over which uniform coatings are produced are listed in
Table 3.2. This range of spinning speed is also somewhat dependent on resist
material and solvent. The viscosity of the resist is chosen to provide the desired
thickness within the range of spin speed where uniform coatings are obtained.
Fine-tuning average resist thickness is accomplished by the final rate of spin.
Figure 3.11 Average resist thickness and uniformity of Shin-Etsu 430S resist, coated on
200-mm wafers on an SVG ProCell.21
Table 3.2 Approximate spin-speed ranges over which uniform coatings are produced.
Wafer diameter (mm)    Spin speed (rpm)
100                    3000–5000
200                    1500–3000
300                    1000–2000
The process of fluid spreading on a spinning disk has been the subject of
considerable study since spin coating has emerged as the primary method for
applying resists. It is possible to gain insight into the spin-coating process by first
considering a simple situation in which a flat wafer (at rest) initially has a uniform
layer of fluid at a thickness of h0 . It is assumed that the fluid has constant viscosity
during the spin-coating process. If the wafer is then spun at a rotation rate of f ,
starting at time t = 0, then the balance of forces is22
−η ∂²v/∂z² = 4π²ρ f² r, (3.3)
where η is the fluid’s viscosity, v is the velocity of material at radial position r and
height z, and ρ is the mass density. There is also a conservation-of-mass equation:
∂h/∂t = −(4π²ρ f²/3η) (1/r) ∂(r²h³)/∂r, (3.4)
where h is the thickness of the resist at radius r and time t. If any other forces or
physical processes are ignored, these equations result in23
h = h₀/√[1 + (16π²ρ f²/3η) h₀² t], (3.5)
where the thickness is independent of radius. This equation shows a film that thins
to zero thickness as t → ∞, while resists tend asymptotically to finite thicknesses.
What is missing from the preceding analysis is solvent evaporation, which causes
the viscosity to increase over time to the point where resist solidifies. It is this
effect that leads to resist thickness that is fairly independent of spin time, following
the initial transient thinning.23 Solvent evaporation causes the resist to cool, thus
affecting its viscosity and subsequent rates of evaporation. This evaporation takes
place nonuniformly over the wafer surface and is a function of spin speed, exhaust,
and solvent type. This variation in solvent evaporation rates can lead to resist-
thickness nonuniformity. It is possible to compensate for nonuniform rates of
solvent evaporation by choosing suitable initial temperatures for the resist, wafer,
and ambient atmosphere, which are independently adjustable on modern resist
coaters (see Fig. 3.12 in Color Plates). Since the resist thickness will depend upon
the solvent evaporation, the coating process depends upon the resist’s particular
solvent.24
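Equation (3.5) is easy to evaluate numerically; the fluid properties below are illustrative assumptions (a resist-like viscosity of 20 cP and density of 1000 kg/m³) rather than data for any particular resist:

```python
import math

RHO = 1000.0  # kg/m^3, assumed fluid density
ETA = 0.020   # Pa*s, assumed viscosity (20 cP)

def thickness(t, h0, f, rho=RHO, eta=ETA):
    """Film thickness from Eq. (3.5) for spin rate f (rev/s), ignoring evaporation."""
    k = 16.0 * math.pi**2 * rho * f**2 / (3.0 * eta)
    return h0 / math.sqrt(1.0 + k * h0**2 * t)

h0 = 5.0e-6        # 5-um initial film
f = 3000.0 / 60.0  # 3000 rpm expressed in rev/s
for t in (0.0, 1.0, 10.0, 60.0):
    print(t, thickness(t, h0, f))
```

Without evaporation the film thins toward zero as t → ∞; in practice solvent loss raises η until the film solidifies at a finite thickness, as discussed above.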
Resist can be dispensed while the wafer is motionless or spinning. These two
methods are referred to as static and dynamic dispense, respectively. The dispense
nozzle can be held fixed above the center of the wafer during dispense, or, with
some hardware configurations, it can be moved. Movable dispense arms are useful
for obtaining good resist coatings while using a small amount of photoresist on
large-area wafers.
Figure 3.12 Resist uniformity contours as a function of air and resist temperatures for
Shin-Etsu 430S resist coated on an SVG ProCell. The data are single-standard deviations, in
units of Angstroms.21 The initial wafer temperature was 22 ◦ C. The most uniform coatings are
produced with resist and air temperatures slightly different from the initial wafer temperature
(see Color Plates).
the amount of resist dispense may not be well controlled. An optimum amount of
suck-back is necessary, and is dependent upon the viscosity of the resist.
Resists are spun to different thicknesses to satisfy different requirements. The
best resolution is often obtained with thin resists; however, resist films need to
be thick enough to serve as etch masks or to block ion implantations. High-
resolution lithography today typically employs resists with thicknesses in the range
of 0.1–0.2 µm. Even thinner coatings have not been used, for several reasons, in spite
of their resolution potential. First, the resist film needs to be intact throughout the
entire etch process, during which the resist film erodes. In typical plasma-etch
processes, the etch rate of the resist is less than one-third that of the underlying
film, but in some instances can have essentially the same etch rate as the substrate
film. The resist film must be thick enough to have resist remaining at the end
of the etch process over those areas of the substrate that are not supposed to be
etched. This includes areas involving substrate topography, where the minimum
resist thicknesses often occur.33 A profile of resist over topography is shown
schematically in Fig. 3.14. Prior to the widespread use of chemical-mechanical
polishing, requirements imposed by wafer topography often determined resist
thicknesses. In particular, the resist thickness, as measured in large flat areas, must
generally be much thicker than the underlying topography.
Besides the problems resulting from substrate topography and etch selectivity,
ultrathin resist coatings were long avoided because of concerns with defects,
particularly pinholes. Early studies showed large increases in pinhole density for
resist coatings thinner than 150 nm.34,35 However, subsequent studies36,37 showed
that low-defect ultrathin resist coatings could be obtained by applying the same
careful attention to the coating parameters for ultrathin resists that was applied
to coating production resists where yields are critical. However, ultrathin resist
coatings are fundamentally susceptible to pinhole formation. Insight into the
problem was provided by Okoroanyanwu,38 who (referencing earlier work) showed
that the thickness at which instabilities occur in spin-coated films is material-dependent,39
and also depends on the interaction between the liquid resist and the substrate.40
Figure 3.14 Substrate topography results in regions with thinner resist than found on flat wafers coated under identical conditions.
This can be understood as follows. Consider a liquid film of nominal thickness
h0 on a flat solid surface. Perturbations in the surface, such as those that arise
during spin coating, will induce a pressure gradient.38 At position x̃ and time t these
perturbations induce modulations in the thickness h ( x̃, t), which will be described
by sums of terms of the form40 exp(iq x̃ + t/τ), where q is the wavenumber of the
perturbation. The growth rate is given by

1/τ = −(q²/η)[γ q²h₀³ − A/(2πh₀)], (3.7)

where γ is the surface tension of the resist, η is its viscosity, and A is the Hamaker
constant. When 1/τ > 0, i.e., when A/(2πh₀) > γ q²h₀³,
the instabilities will grow with time, until the depth of the instability is the same as
the nominal resist thickness. At this point, a pinhole forms (Fig. 3.15). Note that
the thicknesses at which the instabilities occur are material dependent. Thus, some
early observations of pinhole formation were a consequence of intrinsic material
properties, not simply the failure to ensure clean coating processes. For future
technologies that may require ultrathin resist layers, resists need to be formulated
so that they are robust with respect to pinhole formation.
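Setting the stabilizing and destabilizing terms of Eq. (3.7) equal gives a critical thickness h_c = [A/(2πγq²)]^(1/4) below which a perturbation of wavenumber q grows; this is a minimal sketch using the polystyrene parameters quoted for Fig. 3.15 (A = 8 × 10⁻²⁰ J, γ = 40 × 10⁻³ J/m²):

```python
import math

A = 8.0e-20      # J, Hamaker constant quoted for polystyrene in Fig. 3.15
GAMMA = 40.0e-3  # J/m^2, surface tension quoted in Fig. 3.15

def critical_thickness(q):
    """Film thickness (m) below which a perturbation of wavenumber q (1/m) grows."""
    return (A / (2.0 * math.pi * GAMMA * q * q)) ** 0.25

for q in (1.0e6, 1.0e7, 1.0e8):
    print(q, critical_thickness(q))
```

Longer-wavelength (smaller-q) perturbations destabilize thicker films, which is why pinhole susceptibility persists for ultrathin coatings even with clean processing.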
Figure 3.15 Critical resist thickness versus wavenumber for polystyrene.39 A = 8 × 10−20 J
and γ = 40 × 10−3 J/m2 .
Somewhat thick coatings (0.5–3 µm) tend to be required for implant masks, but
the resolution required for these layers is usually less than for the most critical
layers. Extremely thick coatings, where the thickness can be as thick as 100 µm,41
are used for processing thin-film heads for disk drives and for micromachining
applications.42,43 The coating of very thick resists involves special considerations.
One approach that reduces material consumption and improves uniformity is to use
a closed coater bowl which has a solvent-saturated atmosphere.44 This approach is
also advantageous for coating square substrates, such as those used for fabricating
photomasks, where obtaining good resist uniformity at the corners has always been
a challenge.45
After the resist is dispensed and the wafer has completed its spinning, there is
a thick ring of resist at the edge of the wafer referred to as an edge bead.46 The
edge bead exists not only on the upper surface of the wafer, but also on the wafer
edge. It may even extend to the bottom surface of the wafer, near the wafer edge
(Fig. 3.16), and because it can chip and flake, leading to defects, it is desirable to
remove the edge bead prior to post-lithographic processing. In the steps that follow
lithographic patterning, the wafer may be subjected to elevated temperatures,
which will harden the resist and cause the remaining edge bead to be particularly
difficult to remove. Consequently, it is best to remove the edge bead during the
lithography process. Oftentimes, immediately following resist coat or softbake, the
edge bead is removed by using solvent streams or by exposure. Solvent edge-bead
removal is usually applied in the resist-coating module immediately after resist coating. Alternatively, a separate
unit may be used to expose the resist at the edge of the wafer, and the edge bead
will be removed during the develop step. Optical edge-bead removal avoids solvent
splashing that can sometimes occur with solvent edge-bead removal. However,
there are some films applied by spin coating, such as organic antireflection coatings
(to be discussed later), that are not photosensitive; their edge bead cannot be removed optically.
Figure 3.16 At the edge of wafers there is extra resist, known as the edge bead.
Bake processes require careful attention. Consider the data shown in Fig. 3.17, which shows the temperature of
wafers as they are placed on a hot plate, removed, and finally set on a chill plate.50
When the wafers are placed at time equal to 0 on the hot plate, their temperatures
rise quickly to the hot-plate temperature of 105 ◦ C. After 60 sec on the hot plate, the
wafers are lifted above the plate on pins, so that the wafers can then be picked up
by a wafer-transfer arm and moved to a chill plate. The temperature of the wafers
declines slowly while the wafers are sitting on the pins. Because the wafer-transfer
arm may or may not be occupied with moving other wafers, the wafers may sit
on the pins for variable times. The change in critical dimensions was measured
for a 10-sec variation in time on the pins and was found to be 7.4 nm when the
relative humidity was 42%, and 10.8 nm when the relative humidity was 49% for a
conventional g-line resist.50 Chill plates were introduced to semiconductor resist
processing in order to provide reproducible bake processes,51 and the example
shown in Fig. 3.17 illustrates that even subtle aspects of the bakes need to be
considered in order to squeeze out the last few nanometers of process control.
Hot plates need to be well designed in order to ensure good process control. A
hot plate receives a thermal shock when a wafer is first placed on it, and the heating
elements respond accordingly. If the hot plate controller is not well designed,
there may be a temperature overshoot before the wafer temperature is returned
to the desired value. This overshoot can cause statistically measurable shifts in
linewidth.52 In order to have a well-controlled process, the wafer temperature
needs to be controlled across each wafer, wafer-to-wafer, and during the entire
heating and cooling cycle for each wafer. State-of-the-art hot-plate designs control
temperature throughout the heating cycle and meet uniformity specifications even
while the wafer is heating, not just during the steady state.
Solvent evolves from the resist film during the softbake, and the rate of
solvent evolution depends upon the transfer rate at the resist-gas interface.53,54
Solvent content is an important characteristic because resist-development rates and
diffusion during post-exposure bakes (discussed later in this chapter) depend upon
the solvent content of the resist. Thus, the exhaust and airflow that play a role in resist
baking need to be properly controlled.

Figure 3.17 Changes in effective baking because of variable times between bake and chill.50
For very thick resists, the need for long bakes is illustrated in Fig. 3.18,
which shows scanning electron micrographs of cross-sectioned contacts formed
in 100-µm-thick Shin-Etsu SIPR 7120 i-line photoresist. The contact on the left
[Fig. 3.18(a)] received a hot-plate bake at 100 °C for 8 minutes, followed by a
lengthy two-hour oven bake at 110 °C, while the contact on the right received much
less baking. Pitting occurred in the developed resist because too much solvent
remained in the resist as a consequence of inadequate softbake. Because it takes
a long time for solvents to percolate through the polymer matrix that makes up the
resist, long bakes are required for very thick resists in order to remove solvent
adequately from the portions of the resist film closer to the substrate.
After the lithography patterning process is completed, the wafers usually go
into another processing step, such as reactive ion etch or ion implantation, that
takes place in a vacuum. To preserve the vacuum integrity of the post-lithographic
processing equipment, it is usually desirable to bake the wafers one last time in
order to drive off low-molecular-weight materials that might otherwise outgas
in the equipment used for post-lithographic processing. To achieve the desired
goal, the required temperature for this bake is often above the decomposition
temperature of key constituents of the photoresist, such as photoactive compounds
or photoacid generators. For this reason, the high-temperature bake—usually
referred to as a hardbake—is deferred until after the critical photochemistry and
pattern formation has been completed.
The tools used for resist processing are often referred to as “tracks” because, in
their earliest configurations, the wafers were moved along a track from the input
cassette, to resist coat, then to bake, and finally to the output cassette. Modern
resist-processing equipment often contains several, or all, of the modules needed
to address various aspects of resist processing: vapor priming, resist coating, edge-
bead removal, bakes, and develop. The straight-line track configuration has been
replaced, and the sequence of processing can now be chosen more flexibly by the
user. Wafers are moved by robots from station to station, and operations that occur
sequentially in time no longer need to be processed in adjacent processing modules.
Nevertheless, lithographers still often refer to the resist-processing equipment as
“tracks.”

Figure 3.18 Cross sections of contacts in 100-µm-thick resist. (a) Softbake consisting of a hot-plate bake at 100 °C for 8 min, followed by a two-hour oven bake at 110 °C; (b) inadequate bake.55
A number of companies, listed in Table 3.3, produce equipment for resist
processing. Most of these companies produce tools that have capability for baking,
developing, and removing edge bead, in addition to resist coating. Others make
equipment that is more specialized or designed for laboratory applications, in
which resist materials may be hand-dispensed.
isomers of the basic cresol constituents, and by using different couplings between
the cresol elements. Examples of different couplings are shown in Fig. 3.21.
Polymer molecular weight distributions also affect resist dissolution properties.
The evolution of nitrogen can have significance. Very sensitive resists may
evolve nitrogen rapidly—perhaps too fast for the nitrogen to diffuse to the surface
of the resist—and bubbles may form in the resist. This may also occur when very
intense light sources are used. Following spin coating and bake, the resist may
contain residual stress, which may become relieved when the nitrogen is evolved,
leading to delamination of the resist from the wafer surface after exposure.
Figure 3.21 Different couplings in novolak that can lead to different resist characteristics.
events (gain), hence the term chemical amplification. Resist behavior depends on
the strength of the photoacid and its size, which affect its catalytic efficiency and
diffusion characteristics, respectively.64
One of the other problems with developing useful resists for DUV lithography
was the high level of optical absorption by novolak resins at DUV wavelengths.
The problem of highly absorbing resist is shown in Fig. 3.24. Light absorption,
as indicated by photoacid concentration, is modeled for a resist with parameters
appropriate for a commercially available DUV resist, and for hypothetical resists with
significantly more absorption. For highly absorbing resists, the light fails to reach
the bottom of the resist film, so the space cannot be developed out. For less-
absorbing resists, such complete failure of the resist process does not occur, but the
resist profiles are very sloped.
The problem of novolak’s high DUV absorption was overcome by using
polymers transparent at wavelengths ≤ 260 nm. Poly(hydroxystyrene) (Fig. 3.25)
was one polymer known to dissolve in aqueous bases65 and to be transparent
at DUV (∼250 nm) wavelengths. It is the polymer backbone of a number of
positive resists used today for KrF lithography. As lithography is developed
for shorter wavelengths, the problem of optical absorption reoccurs, and new
chemistries need to be developed. No single chemical platform has been adopted
for ArF lithography, and a number of different materials have been developed
successfully.66
Figure 3.24 Photoacid concentration in a 200-nm space, modeled using Solid-C. The
resist Dill parameters for the “production”-type resist are A = 0.048 µm⁻¹ and B = 0.054 µm⁻¹.
(The Dill parameters are defined in Chapter 4.) The index of refraction for the resist is
n = 1.76, and the resist thickness is 500 nm. The optics are 0.75 NA, λ = 248 nm, and σ = 0.5.
Light-shaded regions indicate areas where light has penetrated.
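The absorption behavior in Fig. 3.24 can be checked with a quick Beer–Lambert estimate. The sketch below is an illustration of my own, not from the text: it uses the quoted Dill parameters for the unbleached resist (before exposure, the absorbance is A + B), and the "highly absorbing" coefficient is hypothetical.

```python
import math

# Beer-Lambert transmission through unbleached resist, using the Dill
# parameters quoted for the "production"-type resist of Fig. 3.24.
# Before any bleaching, the absorbance per unit depth is A + B.
def transmission(depth_um, A=0.048, B=0.054):
    """Fraction of incident light reaching depth_um (in micrometers)."""
    return math.exp(-(A + B) * depth_um)

t_production = transmission(0.5)     # 500-nm film: ~0.95, the bottom is well exposed
t_absorbing = math.exp(-10.0 * 0.5)  # hypothetical A + B = 10 1/um: ~0.007
```

With the production-type parameters, roughly 95% of the light reaches the bottom of a 500-nm film; with absorbance on the order of 10 µm⁻¹, almost none does, which is why the space in Fig. 3.24 fails to develop out.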
In one of the first DUV resists widely used at IBM, where chemical
amplification was invented and advanced significantly, the protecting group
was t-BOC, shown in Fig. 3.26. Later, IBM developed APEX-E, which became
the first chemically amplified positive DUV resist that was commercially available
(from Shipley67 ).
Three-component DUV resists, most notably the acetal-based materials,
that have functionality analogous to novolak/DNQ resists68–71 have also been
generated. The three-component resists consist of a matrix polymer, dissolution
inhibitor, and a photoacid generator. These materials differ from the t-BOC and
related resists in that the chemical species responsible for dissolution inhibition
are not necessarily chemically bonded to the resin.
At short wavelengths, it is often difficult to find polymers for the resist
backbones that have all of the desired attributes, such as transparency, etch
resistance, and good adhesion to substrates. Consequently, resists can be
Figure 3.26 One of the earliest chemically amplified resists, with a t-BOC protecting group
providing functionality and a poly(hydroxystyrene) polymer backbone.
Figure 3.28 A chemically amplified resist that has been exposed to amine-containing air
during the time between exposure and post-exposure bake. With short exposure to amines,
(a) a small amount of T-topping is induced, while (b) severe poisoning of the top of the resist
occurs with longer exposures.
T-topping occurred in the resist shown in Fig. 3.28 because photoacid was neutralized in the top layer of the resist and was
therefore unable to fully deprotect the top of the resist. With extended exposure to
amines, the top layer almost failed entirely to develop [Fig. 3.28(b)].
Several methods have been employed to address this problem. The first—
the brute force method—involves air filtration in the enclosed environment of
the resist-processing equipment and exposure tools. Activated charcoal has been
used to remove complex organic amines, such as NMP,74 while weak acids,
such as citric acid, have been used to neutralize ammonia.76 Newer polymeric
filters are capable of removing amines of low and high molecular weights77
and have an improved lifetime before saturation,78 an important consideration
for maintaining a productive lithography operation. Cluster operations are used
where resist-processing equipment is linked to the exposure tool. This makes it
possible to process a wafer completely through the lithography operation (with the
exception of most post-patterning metrology), without having the wafers leave a
filtered environment, and it minimizes the critical time between exposure and post-
exposure bake.
In another approach, a diffusion barrier is coated over the resist. This is
typically another organic film spin coated onto the wafer immediately following
the resist coating or softbake.79–81 These overcoating materials, called topcoats,
are typically designed to be water soluble and can be rinsed off the wafer prior
to development. Topcoats can provide considerable immunity to ambient amines,
but do so at additional expense, increased process complexity, and the potential
for added defects. Topcoats also address the problem of T-topping that results
from evaporation of photoacid from chemically amplified resist.82 Additionally,
interface.91 There are several solutions to the problems of footing and bottom
pinching:
(1) If possible, modify the substrate material.
(2) Insert a coating between the resist and the substrate layer. This is often an
antireflection coating and is discussed in the next chapter. However, with resists
that are extremely sensitive to bases, substantial amounts of bases can penetrate
many coatings, and footing problems remain.
(3) Use a resist intrinsically robust against poisoning by bases.
Since the first two approaches are often unacceptable, resists with built-in
resistance to poisoning are very desirable.
The interfaced operation of tracks and exposure tools has stimulated
improvements in equipment reliability, since track downtime also keeps the stepper
out of operation, and vice versa.92 In addition to reducing the equipment failures
that cause unscheduled down time, photocluster productivity can be improved
through good management of preventive maintenance schedules that consider
requirements of both the resist-processing equipment and the exposure tool.
Stepper-track integration also requires a high degree of training for operators and
technicians who need to be skilled in operating two pieces of equipment.
With chemically amplified resist, there is the potential for the evolution
of organic material immediately following exposure. The photoacid generator
decomposes into the photoacid and another material that may be sufficiently
volatile to outgas from the resist. With resists where the deprotection can occur
at room temperature, there is yet another chemical mechanism that can result in
the outgassing of organic vapors during exposure.93 The organic material evolving
from the resist during exposure has the potential to coat the bottom of the projection
optics. Bottom-optic contamination of multimillion-dollar projection lenses is
clearly undesirable, and resists that have low levels of outgassing are preferable.
Well-designed airflows between wafers and lenses can also mitigate the effects of
resist outgassing.
Since surfactants may also modify development rates,97 processes can often differ
significantly depending upon whether the developer contains surfactants or not.
The concentration of developers is usually measured in terms of normality
(N).98 While resist processes can work in principle over wide ranges of developer
normality, 0.26 N TMAH solutions appear to have become somewhat of an
industry standard. This standardization has been instituted primarily to reduce
developer costs, rather than because 0.26 N developer normality has been found to
provide the greatest lithography capability.99,100 Some resist-processing equipment
has point-of-use developer dilution capability, which enables the use of more dilute
developer while maintaining the cost advantages of a standard developer normality.
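As a side calculation (mine, not from the text), the 0.26 N figure can be converted to the weight-percent specification often quoted for TMAH developer. The molar mass and the assumed solution density of ~1 g/mL are my inputs.

```python
# Converting developer normality to weight percent for TMAH, (CH3)4NOH.
# TMAH supplies one hydroxide per molecule, so normality equals molarity.
TMAH_MOLAR_MASS = 91.15  # g/mol

def normality_to_wt_percent(normality, density_g_per_ml=1.0):
    grams_per_liter = normality * TMAH_MOLAR_MASS
    grams_solution_per_liter = density_g_per_ml * 1000.0
    return 100.0 * grams_per_liter / grams_solution_per_liter

wt = normality_to_wt_percent(0.26)  # ~2.4 wt%, the common developer strength
```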
In the 1970s and early 1980s, wafers were typically developed by immersing
batches of wafers in tanks of developer. This method has been largely superseded
by single-wafer processes because of superior uniformity and control, relative
to batch processes. Moreover, single-wafer processes are amenable to cluster
operation, where the resist-processing tracks are interfaced to the exposure tools.
The “puddle process” is the most common method for resist developing, where
100–200 cc of developer are dispensed on each wafer, forming a puddle that covers
the entire wafer surface.101 The developer is left on the wafer for the desired length
of time, typically 30–75 sec. The developer is then rinsed off with deionized water.
Additional puddles are sometimes applied to wafers in order to reduce defects and
to reduce loading effects, where critical dimensions have a dependence on local
pattern density, because the developer becomes depleted in regions where a lot of
material is to be removed during the development process.102
Puddle development on a track system is the most commonly used method today.
The tubes carrying the developer to the wafer are jacketed, and temperature is
controlled actively to within ±0.2 °C. Since the process of development involves
chemical reactions, resist-development rates are temperature dependent,103 and
temperature stability is required for a controlled process. Once the developer is
on the wafer, evaporation causes the wafer to cool. Because the rate of evaporation
is greatest on the outside of the wafer, thermal gradients are generated leading to
different rates of development from the center to the edge of the wafer. (TMAH
developers become more aggressive at lower temperatures.) Evaporation also
causes developer concentration to increase. These effects can be minimized by
using developer that is dispensed at a low temperature to begin with and by a low
airflow around the wafer while the resist is developing.
For puddle development, the nozzle through which the developer is ultimately
dispensed can be a simple tube or a more complex device. The developer
nozzle must provide uniform application of developer and not be a significant
source of defects. These two requirements have not always been easy to achieve
simultaneously.
Developer has also been sprayed onto wafers, either continuously during
the entire development process, or as a means to form a puddle of developer.
Continuous spraying has the potential to consume large amounts of developer, and
a clever ultrasonic nozzle has been devised that atomizes the developer in a jet of
nitrogen;104 however, this approach is not without susceptibility to defects.
R = d/(2 cos θ). (3.10)

F = hD (2γ cos θ)/d, (3.11)
Figure 3.29 Closely spaced, high-aspect-ratio lines of resist that have collapsed after
develop.
where D is the length of the resist line and h is the height of the rinse liquid. From
Eq. (3.11), one can see that pattern collapse is lessened when the space d between
resist lines is large.
Pattern collapse also depends upon the height of the resist line h. Since wider
resist lines will be stiffer and more mechanically resistant to deformation (for a
given applied force) it is to be expected that pattern collapse depends upon the
height relative to the width, that is, the resist aspect ratio.108
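Equation (3.11) is easy to evaluate numerically. The values below are illustrative choices of my own (water-like surface tension, a hypothetical 50-nm-space geometry), used to show how the collapse force scales; for instance, a surfactant that halves γ halves the force.

```python
import math

# Capillary force on a resist line during rinse, per Eq. (3.11):
# F = h * D * (2 * gamma * cos(theta)) / d
gamma = 0.072               # N/m, surface tension of the rinse liquid (water-like)
theta = math.radians(20.0)  # contact angle on the resist (assumed)
d = 50e-9                   # m, space between resist lines
D = 1e-6                    # m, length of the resist line
h = 100e-9                  # m, height of the rinse liquid

F = h * D * (2.0 * gamma * math.cos(theta)) / d            # newtons
F_surfactant = h * D * (2.0 * (0.5 * gamma) * math.cos(theta)) / d
```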
Resist collapse can be reduced by adding a surfactant to the rinse liquid.109–112
Surfactants are beneficial because they reduce the surface tension. Heating the
resist during development has also been proposed as a way to reduce pattern
collapse,113 by hardening the resist and therefore increasing its structural stability.
Frequently, wafers are baked after exposure but before development. One
reason this might be required is chemical amplification. Other motivations for
incorporating a post-exposure bake into the resist process are discussed in
Chapter 4. The post-exposure bake is usually a very critical process, and linewidths
are typically more sensitive to this bake than to the softbake.
Linewidth sensitivities to post-exposure bake temperature for commercially
available KrF resists are shown in Table 3.4. Resist suppliers have responded to
industry requests for materials providing better linewidth control, as evidenced
by reductions in post-exposure bake linewidth sensitivities in the progression
of Apex-E → UVIIHS → UV6. These sensitivities have also driven hot-plate
specifications, since resists with essentially no post-exposure linewidth variation
are not universally applicable to all feature types, nor have they always provided the best resolution.
As one might expect, the dependence of linewidth on hot-plate temperature
control will be influenced by the quality of the optical image. This can be
understood as follows.116 With a post-exposure bake (PEB) at temperature τ for
a controlled period of time, there will be an effective blurring of the exposure dose
E(x). If we assume that this blurring is described by a simple Fickian diffusion
mechanism, then this “diffused image” Eτ (x) is related to the exposure dose E(x)
as follows:
Eτ(x) = [1/(λD √(2π))] ∫ E(z) e^{−(x−z)²/(2λD²)} dz, (3.12)
where λD is the diffusion length associated with the PEB process. If the temperature
changes τ → τ*, then

Eτ*(x) = Eτ(x) + [∂Eτ(x)/∂τ] Δτ + ··· . (3.13)
At the different temperature τ* the feature edge moves x → x + Δx. It follows that

ΔCD = 2Δτ [1/Eτ(x)] [∂Eτ(x)/∂τ] (1/ILS), (3.17)

where ILS is the image log slope.
R = −dT/dt. (3.18)
γ = −(1/T0) ∂T/∂(ln ε) at ε = 1, (3.20)
where ε = E/E0 , E is the exposure dose, and E0 is the minimum dose to clear the
resist. This is the same parameter discussed in Chapter 2. From Eqs. (3.19) and
(3.20),
γ = −(1/T0) ∂/∂(ln ε) ∫_0^t R(M) dt′ at ε = 1, (3.21)
γ = −(1/T0) ∫_0^t (∂R/∂M)(∂M/∂ε) dt′ at ε = 1. (3.22)
This is an interesting expression. The integrand is divided into two factors. The first
is governed by the development properties of the resist, and the second factor is
determined by the exposure. For example, a dyed resist would be more absorbing,
reducing the change in M near the resist-substrate interface during exposure,
thereby reducing γ. This also explains the curious observation that measured
contrast shows variations with resist thickness,117 which is not characteristic of
an intrinsic property of the resist chemistry.
γ = [R(T0)/T0] ∫_0^{T0} (∂ ln R/∂ ln E) dz/R(z), (3.23)
where z is the depth in the resist film. (Again, technical rigor has been less
than complete with the taking of logarithms of nondimensionless quantities.) The
expression
∂ ln R/∂ ln E = γth (3.24)
Figure 3.32 Illustrations of line-edge and linewidth roughness. (a) The ideal resist line
edges are dashed lines, while the actual line edges follow the solid lines. (b) At position x
the deviation of the actual line edge from the average line edge ȳ is y(x) − ȳ. (c) The linewidth
at position x is L(x).
[see Fig. 3.32(c)]. The measure of LWR is three times the linewidth’s standard
deviation:

LWR = 3σLWR = 3 √{[1/(N − 1)] Σ_{m=1}^{N} [L(xm) − L̄]²}, (3.26)
where L̄ is the average linewidth along the length over which LWR is measured.
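Equation (3.26) can be sketched directly; the CD-SEM-style linewidth samples below are invented for illustration.

```python
import math

# The LWR metric of Eq. (3.26): three times the sample standard deviation of
# linewidths L(x_m) measured at N positions along a line.
def lwr(linewidths):
    N = len(linewidths)
    mean = sum(linewidths) / N
    variance = sum((L - mean) ** 2 for L in linewidths) / (N - 1)
    return 3.0 * math.sqrt(variance)

# Hypothetical measurements (nm) along a nominal 65-nm line:
measurements_nm = [64.2, 65.8, 66.1, 63.9, 65.0, 64.6, 66.4, 64.8]
lwr_nm = lwr(measurements_nm)  # a few nanometers, as for a leading-edge resist
```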
Because linewidth roughness has direct impact on transistor performance,126–131
and gate linewidth control is a critical parameter for microprocessor performance,
resist requirements in the International Technology Roadmap for Semiconductors
(ITRS) are stated in terms of LWR. When the roughness values along the two opposite
edges of a line are uncorrelated,

LWR = √2 LER. (3.27)
Edge roughness with 3σ ∼ 3–5 nm has been measured typically for leading-edge
ArF resists. It has proven useful to look at the characteristics of line-edge roughness
in more detail, beyond the total variation 3σLER . In particular, it is useful to consider
the roughness as a function of spatial frequency. This is done conveniently through
the power spectral density (PSD) of the roughness:132,133
S(k) = lim_{W→∞} (1/W) |∫_{−W/2}^{W/2} [y(x) − ȳ] e^{−2πikx} dx|², (3.28)
where S (k) is the power spectral density at spatial frequency k. Large values of S (k)
indicate that there is a large amount of LER at the spatial frequency k. High values
of k represent variations that occur over short distances. There may be a reticle
contribution to the low-spatial-frequency component of the LER,125 while high-
spatial-frequency roughness is related more to the resist process.134,135 LER has
been measured as a function of spatial frequency, and representative results from
an insightful paper by Yamaguchi and coworkers are shown in Fig. 3.33.136 The
data from several resists showed similar behavior for LER as a function of spatial
frequency. In all cases, when plotted on a log-log scale the largest contributions
to LER came from lower spatial frequencies (although not necessarily at the very
lowest spatial frequency).
Figure 3.33 LER power spectral density as a function of spatial frequency. The results on
the left were from e-beam exposures, while the data on the right were from ArF exposures.
The electron beam was intentionally blurred to induce greater LER.
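A discrete estimate of Eq. (3.28) can be computed as a periodogram of sampled edge positions. The normalization convention below (S[j] is the PSD at spatial frequency k_j = j/(Nd)) is my own choice; it is checked with a synthetic sinusoidal "edge," which concentrates all its power at one frequency.

```python
import cmath
import math

# Periodogram estimate of the LER power spectral density of Eq. (3.28),
# for edge positions y(x_m) sampled at spacing d over a length W = N*d.
def psd(y, d=1.0):
    N = len(y)
    ybar = sum(y) / N
    S = []
    for kj in range(N // 2):
        dft = sum((y[m] - ybar) * cmath.exp(-2j * cmath.pi * kj * m / N)
                  for m in range(N))
        S.append((d / N) * abs(dft) ** 2)  # PSD at spatial frequency kj/(N*d)
    return S

# Sanity check: a pure sinusoid with 3 periods over the window peaks at j = 3.
edge = [math.cos(2.0 * math.pi * 3 * m / 64) for m in range(64)]
spectrum = psd(edge)
```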
where W = Nd and xm = md. This discrete form also reflects the common practice
of collecting images of resist patterns on grids of uniform spacing using scanning
electron microscopes. Reconstruction of a line edge requires knowledge of the
magnitude and phase of roughness at every spatial frequency, but S (k) provides
only the magnitude. However, simulated line edges can be generated assuming
random phases:
y(xm) − ȳ = [1/√(Nd)] Σ_{j=−N/2}^{(N/2)−1} √(S(k_j)) e^{2πimj/N} e^{iθ_j}, (3.30)
where the θ_j are random phases. A model PSD that describes the measured roughness of many resists is

S(k) = 2σ²LER Lc/(1 + k²Lc²)^{0.5+α}, (3.31)
where σLER is the standard deviation of the roughness. The parameter Lc is referred
to as the correlation length. Hence, the LER for a large number of resists can be
characterized by just three numbers, σLER , Lc , and α.
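Equations (3.30) and (3.31) together give a recipe for synthesizing rough edges. The sketch below folds the ±j terms of Eq. (3.30) into a real-valued cosine sum; all parameter values are illustrative choices of mine, not from the text.

```python
import math
import random

# Model PSD of Eq. (3.31).
def model_psd(k, sigma, Lc, alpha):
    return 2.0 * sigma**2 * Lc / (1.0 + (k * Lc) ** 2) ** (0.5 + alpha)

# Random-phase synthesis of Eq. (3.30), with terms j and -j combined into one
# cosine so that the generated edge deviations are real-valued.
def simulate_edge(N=256, d=1.0, sigma=1.3, Lc=13.0, alpha=0.5, seed=1):
    rng = random.Random(seed)
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(N // 2)]
    y = []
    for m in range(N):
        total = 0.0
        for j in range(1, N // 2):
            k = j / (N * d)
            amplitude = math.sqrt(model_psd(k, sigma, Lc, alpha))
            total += 2.0 * amplitude * math.cos(2.0 * math.pi * m * j / N + phases[j])
        y.append(total / math.sqrt(N * d))
    return y  # deviations y(x_m) - ybar along the simulated edge

edge = simulate_edge()
```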
There is another function, the autocorrelation function, that is useful. If y(x) is
sampled at evenly spaced points xm = md, then the autocorrelation function R at
lag x_{i+j} − x_i = x_j is calculated as

R(x_j = jd) = [1/(N − j)] Σ_{i=1}^{N−j} [y(x_{i+j}) − ȳ][y(x_i) − ȳ], (3.32)
which may be computed from the measurements y(x) of the deviation of the actual
resist edge from a straight line [see Fig. 3.32(b)]. When α = 0.5,
R(x) = σ²LER e^{−x/Lc}. (3.33)
Hence, when α = 0.5, the value of x at which R(x) = σ2LER /e is the correlation
length Lc . The values for the line edge y(x) are typically obtained from scanning
electron micrographs, using analysis software to ascertain the edge positions.
Such analysis software is often available from the makers of scanning electron
microscopes; alternatively, stand-alone software packages can be used.139 From
these values the autocorrelation function R can be calculated, which can then be
used to determine the correlation length, as illustrated in the following example.
Figure 3.34 shows SEM images that exhibit LER. Using data from these SEMs,
the autocorrelation functions are calculated; they are shown in Fig. 3.35,
normalized to σ²LER. According to Eq. (3.33), the value of x at which each
normalized autocorrelation function equals 1/e = 0.368 provides the correlation
length. For these examples, Lc ≈ 13.0 nm for the resist patterns, while Lc ≈ 22.3 nm
for the patterns after etch.
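The extraction procedure just described can be sketched as follows. The AR(1) test sequence is synthetic, standing in for SEM edge data; it is constructed to have a known correlation length of 10 sample spacings, so the estimate should come out near 10.

```python
import math
import random

# Eq. (3.32): autocorrelation of edge deviations, out to max_lag lags.
def autocorrelation(y, max_lag):
    N = len(y)
    ybar = sum(y) / N
    dev = [v - ybar for v in y]
    return [sum(dev[i + j] * dev[i] for i in range(N - j)) / (N - j)
            for j in range(max_lag)]

# Eq. (3.33): with alpha = 0.5, Lc is the lag at which R falls to R(0)/e.
def correlation_length(y, d=1.0, max_lag=60):
    R = autocorrelation(y, max_lag)
    target = R[0] / math.e
    for j, Rj in enumerate(R):
        if Rj <= target:
            return j * d
    return None

# Synthetic "edge": an AR(1) sequence with coefficient exp(-1/10) has an
# exponentially decaying autocorrelation with correlation length 10.
rng = random.Random(0)
a = math.exp(-1.0 / 10.0)
y, prev = [], 0.0
for _ in range(10000):
    prev = a * prev + rng.gauss(0.0, 1.0)
    y.append(prev)
Lc_est = correlation_length(y)
```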
The characteristics of line-edge roughness and its impact on critical-dimension
variation have been studied as a function of the three parameters σLER, Lc, and α. The
amplitude of the line-edge roughness is directly proportional to σLER,137 and this
behavior is illustrated in Fig. 3.36, where y(x)− ȳ is plotted for two values of 3σLER .
The influence of Lc is shown in Fig. 3.37. For larger values of Lc the variations
occur over longer distances and lower spatial frequencies. The parameter α also
has an impact on the distribution of spatial frequencies, as can be seen in Fig. 3.38:
more high-spatial-frequency roughness is associated with smaller values of α.
Figure 3.34 SEM images for nominal 65-nm lines on a 190-nm pitch:137 (a) resist (b) after
etch into a layer of Si3 N4 .
Figure 3.35 Plots of R/σ2LER for resist (DI) patterns and patterns after etch (FI).
One of the impacts of LER is critical dimension (CD) variation. What is meant
by a critical dimension is the distance L̄ between two lines that are best fits over
a distance W to the edges of resist (Fig. 3.31). Due to LER, the values of L̄ will
vary from location to location, resulting in critical-dimension variation σCD even
in the absence of any other sources of critical-dimension variation. The amount
of variation σCD is directly proportional to the standard deviation of the line-edge
roughness σLER . Simulations have shown that the critical-dimension variation is a
weak function of the parameter α, so it is useful to assume a value of α = 0.5,
and then the analytical form for the autocorrelation function [Eq. (3.33)] can be
used. The dependence of σCD on Lc is shown in Fig. 3.39. The amount of critical
dimension variation is larger for bigger values of the correlation length. Also, there
is more variation for smaller values of W, for given values of σLER, Lc, and α.
This is simply a manifestation of lower variation when averaging over a greater
quantity of data. However, this does have a significant impact on scaling. It might
be expected that the amount of critical-dimension variation should decrease in
proportion to shrinks in dimensions, such as gate lengths L. As can be inferred
from Fig. 3.39, it is also necessary to compensate for decreasing values of W.

Figure 3.38 Graph of Eq. (3.31) showing the impact of α on the power spectral density of LER.141
Figure 3.39 The dependence of CD variation on the correlation length. The graphs were
generated by simulating CD variation for lines of resist of two values of W, 50 nm and 200
nm. For both cases it was assumed that α = 0.5 and 3σLER = 4 nm.
Fitting to simulated data, the dependence of σCD on σLER and Lc was found to
be
σCD = σLER (log Lc + 0.71)/11.2, (3.34)
where the logarithm is base 10, and Lc is in units of nanometers. This equation
shows the direct relationship between σCD and σLER , as well as the weaker,
logarithmic dependence of σCD on Lc .
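Equation (3.34) is straightforward to apply. The example values (3σLER = 4 nm, Lc = 20 nm) are chosen by me to be in the range of Fig. 3.39.

```python
import math

# Eq. (3.34): sigma_CD = sigma_LER * (log10(Lc) + 0.71) / 11.2, Lc in nm.
def sigma_cd(sigma_ler, Lc_nm):
    return sigma_ler * (math.log10(Lc_nm) + 0.71) / 11.2

# Example: 3*sigma_LER = 4 nm and Lc = 20 nm
value = sigma_cd(4.0 / 3.0, 20.0)  # ~0.24 nm of CD variation from LER alone
```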
There are a number of factors that contribute to line-edge roughness,142 not
all of which are directly related to the resist materials and chemistry. An optical
image that has a diffuse line edge might be expected to lead to resist patterns
with edges that are not sharply defined, and this is indeed the case. It has been
observed that LER increases as the image log slope (a good measure of optical edge
acuity, introduced in Chapter 2) becomes small.143,144 However, from Fig. 3.40, it
does appear that LER reaches a nonzero minimum value, even for images with
large edge slope. This indicates more fundamental sources of line-edge roughness.
Some LER certainly originates at the molecular level, but substantial low-spatial-
frequency LER can be seen in Fig. 3.33, occurring over long distances relative to
molecular-length scales. Some of the low-frequency LER on the wafer may get
transferred from LER on the mask.145,146 Naturally, the high-frequency LER from
the mask is filtered by the lens.
When exposures become low, statistical variations in the number of photons
involved in exposing the resist can contribute to LER.148 This results from basic
photon statistics (shot noise), where the root-mean-square (rms) variation in the
number of photons ∆n is related to the average number of photons n̂ by the
following expression:149
Δn/n̂ = √(1/n̂ + 1). (3.35)
Figure 3.40 LER versus image log slope and resist-edge log slope (RELS) for Sumitomo
PAR735 resist exposed on a 0.75-NA ArF scanner at the isofading condition. Fading is
explained in Chapter 5, and details on the isofading condition can be found in Ref. 147.
The RELS parameter can be considered an empirically derived image log slope. The data
involved a wide range of linewidths, pitches, doses, and focus conditions.147
When the number of photons becomes small, the statistical fluctuation can be
appreciable. Consider, for example, ArF light passing through the top of a resist
film in an area that is 10 × 10 nm. For an intensity of 1 mJ/cm², approximately
1000 photons on average enter this area. From Eq. (3.35), this light intensity will
fluctuate by approximately ±3%. At the edge of features, the light intensity is typically
∼1/3 that found in the middle of large features, so the effect of photon statistics is
greater at line edges. An increase in LER at low doses has been observed and is
shown in Fig. 3.41. As dimensions shrink, and tolerable LER becomes smaller, the
area over which fluctuations become significant also decreases.150 See Problem 3.3
to pursue this further.
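The ±3% figure can be reproduced with a short calculation, keeping only the Poisson (shot-noise) term, for which the relative rms fluctuation is 1/√n̂.

```python
import math

# Photon count for ArF (193-nm) light at 1 mJ/cm^2 through a 10 nm x 10 nm
# area at the top of the resist, and the resulting shot-noise fluctuation.
H = 6.626e-34   # Planck constant, J*s
C = 2.998e8     # speed of light, m/s
photon_energy = H * C / 193e-9   # ~1.03e-18 J per ArF photon

dose = 1e-3 * 1e4                # 1 mJ/cm^2 expressed in J/m^2
area = (10e-9) ** 2              # 10 nm x 10 nm, in m^2

n = dose * area / photon_energy          # ~1000 photons on average
fluctuation = 1.0 / math.sqrt(n)         # ~3% relative rms (shot noise only)
```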
As one might expect, diffusion (which occurs during post-exposure bake)
serves to reduce LER. This is related to the original motivation for chemical
amplification—addressing the productivity of exposure tools with low light
intensity. In ArF chemically amplified resists, the photoacid diffuses with a
diffusion length that is typically 20–30 nm. The amount of line blurring in KrF
ESCAP resists was measured to be ∼50 nm (FWHM),151,152 although about half
that of other KrF resists.153 For simple Fickian diffusion, the diffusion length
is 2√(Dt), where D is the diffusivity and t is the time during which diffusion takes
place. Diffusivity typically increases with higher temperatures, so diffusion will
be less at low temperatures and with short bake times. This behavior is shown in
Fig. 3.42. Diffusion over such length scales may serve to smooth out the roughness,
but it is problematic when the amount of blurring becomes comparable to the half
pitch.154 At this point, the optical image originally projected into the resist will not
be seen in a developed resist pattern, since the diffusion during post-exposure will
blur the pattern excessively.
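The diffusion-length relation can be turned around to estimate the diffusivity implied by a given blur; the 60-sec bake time below is an assumption of mine, and the 25-nm target is simply the midpoint of the 20–30-nm range quoted for ArF photoacids.

```python
import math

# Fickian diffusion length: length = 2 * sqrt(D * t).
def diffusion_length(D_nm2_per_s, t_s):
    return 2.0 * math.sqrt(D_nm2_per_s * t_s)

t = 60.0                    # s, assumed post-exposure bake time
D = (25.0 / 2.0) ** 2 / t   # ~2.6 nm^2/s yields a 25-nm diffusion length
```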
Figure 3.41 LER versus exposure dose for resists exposed on an extreme-ultraviolet
(EUV) 0.3-NA microexposure tool with a synchrotron light source161,162 and corrected
for mask-absorber contributions to LER.163 The solid line is a fit of LER = a/√dose + b
to the data.
Although diffusion is not a useful way to reduce LER for sub-50-nm lithography,
there are some other methods that appear to have some beneficial effects. Directly
addressing line-edge roughness can be difficult, particularly when there is a conflict
between a need for low LER and low exposure doses for exposure-tool throughput.
There has been work to apply post-develop processing to reduce LER, including
bakes,155 etches, vapor treatments, ozonation, and chemical rinses,156 all of which
provide some reduction in LER.157,158 Of these, vapor smoothing and surface-
conditioning rinses appear to have the most benefit and can easily be implemented.
However, most of this improvement occurs at high spatial frequencies and not at
the low-spatial-frequency LER that has the greatest impact on CD variability.159,160
(1) During the coating of the imaging layer, there is the potential for the solvent
to dissolve the planarizing layer, unless the two films have been engineered in
advance for compatibility.169 The insertion of an inorganic layer obviates this
additional materials engineering.
(2) To act as an etch mask for the organic planarizing layer, the imaging layer
must contain silicon. The use of an inorganic middle layer enables the use of
ordinary photoresists for the imaging layer, with the sputtered films acting as a
hard mask.
(3) Inorganic hardmasks often provide superior etch selectivity to oxygen etches,
thereby providing capability for thicker underlayers.
Trilayer processes are obviously more complex than bilayer processes (which
are already complex enough!), but prove useful in several situations, such as
the manufacturing of recording heads for magnetic data storage.170 To reduce
the complexity of multilayer resist processing, alternative techniques have been
proposed that have many elements in common with the bilayer resist process
described above. One class of alternative methods is referred to as top-surface
imaging, often designated by its acronym, TSI.171,172 The basic top-surface
imaging process is outlined in Fig. 3.44. A thick resist layer is coated on the
substrate. The effect of exposure for this resist material is to modify the diffusivity
of silicon-containing molecules in the resist. In the negative-tone embodiment of
Problems
3.3 Suppose that one exposes an ArF resist with a dose of 10 mJ/cm2 (in large open
areas). At the edge of the line, the dose is 0.3× this. Show that the shot noise
along the line edge due to photon statistics is ±1.8% (1σ) in 10 × 10 nm.
3.4 For ArF lithography, show that a dose of > 25 mJ/cm2 is required to maintain
±3% dose control (shot-noise limited) through an area 2 × 2 nm.
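As a sanity check (not part of the original problem set), the photon-count arithmetic behind Problem 3.3 can be sketched; the 0.3× edge-dose factor and the 10 × 10-nm area are taken from the problem statement:

```python
import math

H = 6.626e-34          # Planck constant, J*s
C = 2.998e8            # speed of light, m/s
WAVELENGTH = 193e-9    # ArF wavelength, m

def photon_count(dose_mj_cm2, area_nm2):
    """Number of photons delivered to an area at a given dose."""
    energy_per_photon = H * C / WAVELENGTH                  # ~1.03e-18 J
    energy = dose_mj_cm2 * 1e-3 * 1e4 * area_nm2 * 1e-18    # joules (cm^-2 -> m^-2)
    return energy / energy_per_photon

# Problem 3.3: 10 mJ/cm^2 open-frame dose, 0.3x at the line edge, 10 x 10 nm.
n = photon_count(0.3 * 10.0, 10.0 * 10.0)
sigma = 1.0 / math.sqrt(n)     # relative 1-sigma shot noise
print(f"N = {n:.0f} photons, 1-sigma = {100 * sigma:.2f}%")
```

About 2900 photons land in the 10 × 10-nm region, and 1/√N reproduces the ±1.8% (1σ) figure.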
3.6 Why don’t diazonaphthoquinone resists work well when processed in very dry
ambient conditions?
3.7 What problems occur when chemically amplified resists are processed under
conditions where bases are present? Why do bases cause these problems?
3.8 What is the most commonly used chemical for adhesion promotion?
References
1. H. A. Levine, L. G. Lesoine, and J. A. Offenbach, “Control of photoresist
materials,” Kodak Photores. Sem. Proc., 15–17 (1968).
2. R. K. Agnihotri, D. L. Falcon, F. P. Hood, L. G. Lesoine, C. D. Needham,
and J. A. Offenbach, “Structure and behavior of cyclized rubber photoresist,”
Photo. Sci. Engr. 16(6), 443–448 (1972).
3. T. Iwayanagi, T. Ueno, S. Nonogaki, H. Ito, and C. G. Willson, “Materials
and processes for deep UV lithography,” in Electronic and Photonic
Applications in Polymers, M. J. Bowden and S. R. Turner, Eds., American
Chemical Society, Washington, DC (1988).
4. J. J. Sagura and J. A. Van Allen, “Azide sensitized resin photographic resist,”
U.S. Patent No. 2,940,853 (1960).
5. P. S. Gwozdz, “Positive versus negative: a photoresist analysis,” Proc. SPIE
275, 156–163 (1981).
6. W. Thackeray, G. W. Orsula, E. K. Pavelchek, D. Canistro, L. E. Bogan,
A. K. Berry, and K. A. Graziano, “Deep UV ANR photoresists for 248-nm
excimer laser photolithography,” Proc. SPIE 1086, 34–47 (1989).
7. G. Gruetzner, S. Fehlberg, A. Voigt, B. Loechel, and M. Rother, “New
negative-tone photoresists avoid swelling and distortion,” Sol. State Technol.,
79–84 (January, 1997).
8. J. D. Shaw, J. D. Delorme, N. C. LaBianca, W. E. Conley, and S.
J. Holmes, “Negative photoresists for optical lithography,” IBM J. Res.
Develop. 41(1–2), 81–94 (1997).
9. C. A. Mack and J. E. Connors, “Fundamental differences between positive
and negative tone imaging,” Proc. SPIE 1574, 328–338 (1992).
10. W. Henke and M. Torkler, “Modeling of edge roughness in ion projection
lithography,” J. Vac. Sci. Technol. 17(6), 3112–3118 (1999).
11. G. P. Patsis, N. Glezos, I. Raptis, and E. S. Valamontes, “Simulation of
roughness in chemically amplified resists using percolation theory,” J. Vac.
Sci. Technol. 17(6), 3367–3370 (1999).
12. C. A. Deckert and D. A. Peters, “Adhesion, wettability and surface
chemistry,” in Adhesion Aspects of Polymer Coatings, 469–499, Plenum
Press, New York.
13. Bill Moffat, Yield Engineering Systems, private communication (2000).
14. J. N. Helbert, “Process for improving adhesion of resist to gold,” U.S. Patent
No. 4,497,890 (1983).
15. J. N. Helbert, “Photoresist adhesion promoters for gold metallization
processing,” J. Electrochem. Soc. 131(2), 451–452 (1984).
16. S. Wu, Polymer Interface and Adhesion, Marcel Dekker, New York (1982).
Photoresists 97
Figure 4.1 The general imaging problem. Note that not all light rays diffracted by the
pattern on the reticle will pass through the projection optics.
Modeling and Thin-Film Effects 111
(∇² + k²)U = 0, (4.2)
where k = ω/c = 2π/λ, with c being the speed of light and λ being the wavelength
of the light. U could be a component of the electric or magnetic field of the light
wave as long as it satisfies the wave equation. Consider light originating at a point
S and diffracted by an aperture σ (Fig. 4.2). The light from a point source must
satisfy the wave equation and will be a spherical wave:
U_source = A e^{ikr}/r, (4.3)
where r is the radial distance from the source. At a point P on the other side of the
aperture, after some analysis, it can be shown that U is obtained by an integration
over the aperture σ,1
U(P) = [−Ai cos δ/(λrs)] ∫_σ e^{ik(r+s)} dσ, (4.4)
where δ is the angle between the line S–σ and the normal vector ñ to
the aperture. It has further been assumed that r and s are large compared to the
dimensions of the aperture. (This is known as the Fraunhofer approximation.) The
diffraction pattern generated by an extended source can be obtained by further
integrations of Eq. (4.4) over the source. Diffraction has a significant effect on the
image produced by a complete optical system. While light propagates at all angles
on the imaging side of the aperture, not all of the diffracted light is collected by
the lens used to project the image onto the wafer. This partial collection of light
leads to finite resolution and was illustrated in Chapter 2 with the example of a
diffraction grating. The lens itself further modifies the light, ultimately restoring
the light to a spherical wave, one that converges on the focal plane of the lens.
In the simplest lithography models, the optical image can be calculated with
reasonable accuracy without consideration of how the resist modifies the light
distribution parallel to the wafer. An aerial image is calculated and then assumed
to propagate parallel to the optical axis into the resist. At high NAs, this is
not an accurate approximation. Consider the situation depicted in Fig. 4.3. Two
light rays are incident on a wafer. As discussed in Chapter 2, image formation
requires the interference of at least two waves of light. For S-polarized light,
the polarization vectors for the two waves are both perpendicular to the plane
of the paper and interference between the two waves occurs. However, for P-
polarization, the polarization vectors have orthogonal components that do not
interfere. For light rays with 45-deg angles of incidence, the two light rays have
completely orthogonal polarizations, and there is no interference at all between
the two rays. This corresponds to an NA of 0.71, suggesting that there would be
negligible benefits for increasing NAs to such a value since the optical contrast is
not enhanced by the addition of oblique rays. Fortunately, refraction of the light at
the resist-air interface returns the polarization vectors from the two rays to more
parallel directions (see Problem 4.1). The refraction enables high-NA lithography.
It is essential to include the effects of refraction in order to model accurately the
imaging that occurs in lithography.7
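The loss and recovery of P-polarization contrast can be sketched numerically with a simple two-beam model. For two plane waves at ±θ, the interfering components of the P-polarized fields scale as cos 2θ, which vanishes at θ = 45 deg (NA = sin 45 deg ≈ 0.71); the resist index of 1.7 is the value assumed in Fig. 4.4, and the model here is a simplification, not the full vector imaging calculation.

```python
import math

def p_contrast(theta_rad):
    """Fringe contrast for two P-polarized plane waves at +/- theta."""
    return math.cos(2.0 * theta_rad)

def refracted_angle(theta_air_rad, n_resist=1.7):
    """Snell's law at the air-resist interface."""
    return math.asin(math.sin(theta_air_rad) / n_resist)

theta = math.radians(45.0)            # NA = sin(45 deg) ~ 0.71 in air
in_air = p_contrast(theta)            # ~0: no interference between the rays
in_resist = p_contrast(refracted_angle(theta))
print(f"P contrast in air: {in_air:.3f}, in resist: {in_resist:.3f}")
```

Refraction steepens the rays inside the resist, so the contrast that is zero in air recovers to a substantial value in the resist, which is what enables high-NA imaging.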
At high NAs, it also becomes important to consider polarization-dependent
effects at both the wafer and mask. First consider reflection at the wafer. Shown
in Fig. 4.4 is reflectance as a function of the angle of incidence for S-polarized
and P-polarized light. At low angles of incidence, the reflectance is not strongly
dependent upon polarization, so it is not important to consider polarization
when modeling low-NA lithography. However, at larger angles of incidence the
difference becomes significant. As noted above, there is stronger interference for
S-polarized light than for P-polarized light at oblique angles, so this difference in
reflectivity will have a significant impact on imaging.8
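The S/P divergence shown in Fig. 4.4 follows from the Fresnel equations. A sketch for light incident from air onto a non-absorbing medium of index 1.7, with substrate effects ignored as in the figure:

```python
import math

def fresnel_reflectances(theta_i_rad, n=1.7):
    """(Rs, Rp) for light incident from air onto index n (no absorption)."""
    theta_t = math.asin(math.sin(theta_i_rad) / n)   # Snell's law
    ci, ct = math.cos(theta_i_rad), math.cos(theta_t)
    rs = (ci - n * ct) / (ci + n * ct)               # S amplitude coefficient
    rp = (n * ci - ct) / (n * ci + ct)               # P amplitude coefficient
    return rs * rs, rp * rp

for deg in (0, 30, 60):
    rs, rp = fresnel_reflectances(math.radians(deg))
    print(f"{deg:2d} deg: Rs = {rs:.3f}, Rp = {rp:.3f}")
```

At normal incidence the two polarizations reflect identically, while near 60 deg (close to Brewster's angle for n = 1.7) the P reflectance nearly vanishes and the S reflectance grows, matching the qualitative behavior of Fig. 4.4.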
For lithography using very high NAs, features on the mask can become
comparable in size to the wavelength of light. When this happens, the treatment
of the opaque areas on the mask as infinitely thin absorbers is no longer a suitable
approximation. There are polarization effects at the mask that become relevant at
very high NAs. For very fine features, the mask will act as a wire-grid polarizer.9
To account for these effects properly, sophisticated physical models are needed,
and desktop computers are often inadequate for performing the computations.
Figure 4.3 Two P-polarized rays of light, incident on the wafer surface.
Figure 4.4 Reflectance from a resist surface into air for S-polarized and P-polarized light
as a function of the angle of incidence. The index of refraction of the resist is assumed to be
1.7, and substrate effects are ignored.
4.2 Aberrations
The Rayleigh criterion found in Eq. (2.7) assumes the degradation of resolution
results entirely from diffraction, i.e., the lenses are free from aberrations and
imperfections. Such optics are called “diffraction limited” because the design
and manufacture of the lenses are assumed to be so good that the dominant
limit to resolution is diffraction. However, real optical systems are never perfect,
and accurate calculations of lithographic performance need to account for these
deviations from perfection. Departures of the optics from diffraction-limited
imaging are caused by “aberrations,” which can be understood in the context of
geometrical optics as situations in which all light rays originating from a single
object point do not converge to a single image point—or converge to the wrong
point (Fig. 4.5). Aberrations can arise from a number of sources:
(1) Imperfect design
(2) Lens and mirror surfaces that depart from design
(3) Lens material inhomogeneity or imperfection
(4) Imperfect lens assembly.
Figure 4.5 For a lens with aberrations, light rays do not all converge to the same point.
Because light rays pass through different parts of glass elements and across various
points on lens-element surfaces, varying glass optical constants or errors in the
curvature of lens-element surfaces cause light rays to miss the focal point of the
lens. Aberrations vary from point to point across the image field, contributing
to across-field linewidth variations.10 A full accounting of imaging must include
the effects of aberrations. The needs of lithography impose stringent requirements
on lens materials and the surfaces of lens elements. To meet the requirements of
modern lithography, the index of refraction of optical materials must be uniform to
less than one part per million, and lens surfaces may deviate from design values by
no more than a few nanometers (rms).
To appreciate the impact of aberrations on optical lithography, consider the
illumination of a grating pattern of pitch d by coherent illumination that is normally
incident on the mask. The light will be diffracted by the grating. Consider the
situation in which only the 0th - and ±1st -order beams are collected by the lens and
imaged onto the wafer. However, suppose that, because of imperfections in the lens,
the +1st-order beam acquires a phase error ∆φ relative to the 0th- and −1st-
order beams. Consequently, the light amplitude at the image plane is given by11
A(x) = c_0 + c_1 e^{i(2πx/d + ∆φ)} + c_1 e^{−i2πx/d},
where use has been made of the fact that c_1 = c_−1. The resulting light intensity at
the wafer plane is given by:
I(x) = c_0² + 4c_1² cos²(2πx/d + ∆φ/2) + 4c_0 c_1 cos(∆φ/2) cos(2πx/d + ∆φ/2). (4.7)
The net impact of ∆φ on the image can be seen from Eq. (4.7). First, the
intensity peak is shifted by d∆φ/4π. This illustrates that phase errors will result
in misregistration. Additionally, the peak intensity is reduced by a factor of
cos(∆φ/2). Another consequence of the phase error is a reduction in image
contrast.
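Both consequences can be checked numerically by evaluating Eq. (4.7) directly; the coefficient values c0 and c1 below are arbitrary illustrative choices.

```python
import numpy as np

def intensity(x, d, dphi, c0=1.0, c1=0.3):
    """Image intensity of Eq. (4.7) with phase error dphi on the +1st order."""
    arg = 2.0 * np.pi * x / d + dphi / 2.0
    return (c0**2 + 4.0 * c1**2 * np.cos(arg)**2
            + 4.0 * c0 * c1 * np.cos(dphi / 2.0) * np.cos(arg))

d, dphi = 1.0, 0.2                  # pitch (arbitrary units), phase error (rad)
x = np.linspace(-0.5, 0.5, 200001)
shift = x[np.argmax(intensity(x, d, dphi))]
print(f"peak shift {abs(shift):.5f} vs d*dphi/4pi = {d * dphi / (4 * np.pi):.5f}")
```

The numerically located peak sits d∆φ/4π away from its unaberrated position, and its intensity is slightly lower than in the ∆φ = 0 case.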
The effects of aberrations on lithographic performance are assessed more
generally by including aberrations in the imaging models. Consider light emerging
from a point on the mask, passing through the optical system and ultimately
focused onto the wafer. The situation at the wafer plane is considered as the
reverse of what happened at the mask—the light can be thought of as a wave
converging onto a point of the image. Ideally, this would be a spherical wave, but
because of lens aberrations, the actual wavefront of the image may deviate by small
amounts from that of a perfect spherical wave (Fig. 4.6). A wavefront is a surface
of constant phase. Aberrations are incorporated into imaging models by replacing
the expressions representing spherical waves, such as the exponential in Eq. (4.4),
by the aberrated wavefront.
Figure 4.6 Light waves converging onto a wafer. An unaberrated image is formed by a
spherical wave, while imperfect images result from nonspherical wavefronts.
Term  Polynomial                  Aberration                      Lithographic impact
 1    1                           piston                          none
 2    ρ cos θ                     x tilt                          transverse pattern shift
 3    ρ sin θ                     y tilt                          transverse pattern shift
 4    2ρ² − 1                     focus                           average focus shift
 5    ρ² cos 2θ                   3rd-order x astigmatism         line-orientation-dependent focus shift, elliptical contacts
 6    ρ² sin 2θ                   3rd-order 45-deg astigmatism    line-orientation-dependent focus shift, elliptical contacts
 7    (3ρ³ − 2ρ) cos θ            3rd-order x coma                transverse pattern shift, depth-of-focus decrease, pattern asymmetry
 8    (3ρ³ − 2ρ) sin θ            3rd-order y coma                transverse pattern shift, depth-of-focus decrease, pattern asymmetry
 9    6ρ⁴ − 6ρ² + 1               3rd-order spherical aberration  pattern-size-dependent average best-focus shift, depth-of-focus decrease
10    ρ³ cos 3θ                   3-foil x                        transverse pattern shift, depth-of-focus decrease, pattern asymmetry
11    ρ³ sin 3θ                   3-foil y                        transverse pattern shift, depth-of-focus decrease, pattern asymmetry
12    (4ρ⁴ − 3ρ²) cos 2θ          5th-order x astigmatism         line-orientation-dependent focus shift, elliptical contacts
13    (4ρ⁴ − 3ρ²) sin 2θ          5th-order 45-deg astigmatism    line-orientation-dependent focus shift, elliptical contacts
14    (10ρ⁵ − 12ρ³ + 3ρ) cos θ    5th-order x coma                transverse pattern shift, depth-of-focus decrease, pattern asymmetry
15    (10ρ⁵ − 12ρ³ + 3ρ) sin θ    5th-order y coma                transverse pattern shift, depth-of-focus decrease, pattern asymmetry
16    20ρ⁶ − 30ρ⁴ + 12ρ² − 1      5th-order spherical aberration  pattern-size-dependent average best-focus shift, depth-of-focus decrease
from other geometries. For a lens with a significant amount of coma aberration, the
printed lines will have different widths.14 Good linewidth control clearly requires
low levels of lens aberrations, such as coma.
Spherical aberration: While coma affects left-right symmetry of the image,
spherical aberration affects the symmetry of the image above and below the
focal plane. This is illustrated in Fig. 4.9. As might be expected, the lithographic
implications of spherical aberration involve focus. A particularly interesting
manifestation of spherical aberration is a pitch dependence for the plane of best
focus.10 This follows immediately from the nature of diffraction and the situation
depicted in Fig. 4.9. From Eq. (2.1), the angle at which a light ray is diffracted
by a grating is a function of the grating pitch. When the optical system suffers
from spherical aberration, light rays with different angles of diffraction will focus
onto different planes (Fig. 4.9). For circuit patterns with multiple pitches, spherical
aberration reduces the overall depth of focus.
Resist will also induce spherical aberration.15 Suppose that one has an optical
system that produces aberration-free images in air. As seen in Fig. 4.10, when
Figure 4.7 Aerial image-intensity contours for a 0.4-µm space on the mask.10 For an
unaberrated lens, the intensity contours have left-right symmetry and are also symmetric
across the plane of best focus. Images produced by lenses with coma (Z7 ) lose the left-right
symmetry, while spherical aberration (Z9 ) breaks the symmetry across the plane of best
focus. The pictures in this figure were simulated with Solid-C for an i-line tool with NA = 0.6
and σ = 0.5. For the aberrated images, 50 nm were assumed for each aberration (see Color
Plates).
Figure 4.8 Lines of equal width (RL = RR ) on the reticle print with different widths (WL , WR )
on the wafer in the presence of coma.
Figure 4.9 Lines with different angles of incidence have different focal points when the
optical system has spherical aberration.
Figure 4.10 Lines with different angles of incidence refract by different amounts, leading
to induced spherical aberration.
the light encounters the air-resist interface, the light rays refract. The amount of
refraction depends upon the angle of incidence. It can also be seen that the amount
of induced spherical aberration is greater for high-NA lenses than for low-NA ones.
The extent to which stepper lenses approach the diffraction limit can be
determined by comparing calculated and measured optical profiles. There are
a number of commercially available software packages for calculating image
profiles, and several are listed in Table 4.2. It is also possible to measure
optical images directly. Several methods for measuring optical images have been
developed.16–18 There is also the aerial image measurement system (AIMS),19
which uses a very small-field optical system to mimic actual projection optics. This
type of system is employed by a number of users to evaluate mask quality. Modern
lenses are typically found to be very near the diffraction limit, but deviations from
the limit are observable at levels relevant to lithography. Consequently, there is
considerable interest in measuring the magnitudes and effects of aberrations.
By the very nature of certain aberrations, the degradation of resolution or
misregistration that they cause is greatest for points far from the optical axis. Field
curvature and distortion are the most commonly encountered of such aberrations.
As a result of this characteristic behavior, there has been a tendency for lenses to
have higher resolution in the center of their fields—a characteristic that must be
addressed by lens makers. It is challenging to design a high-resolution lens for
which the resolution is well maintained over a large field. This problem has been
well addressed in modern lenses, which do not show significantly better resolution
in the center of the imaging field relative to the edges of the exposure field.
It is an extremely difficult task to design a lens free from the above aberrations,
even at a single wavelength. Another problem arises when the light source
has a range of wavelengths. The refractive indices of glass materials vary with
wavelength, causing imaging to change at different wavelengths. The inability
of a lens to focus light over a range of wavelengths, which results from this
variation in the refractive indices of the glasses used to make the lens, is called
chromatic aberration. An advantage of reflective optics is the freedom from
chromatic aberration since the focusing properties of mirrors are independent
of the wavelength. For systems that use refractive optics, the light sources and
illuminators must be designed consistently with the bandwidth requirements of
the projection optics. This must be done while keeping in mind that most sources
of light, such as mercury arc lamps, produce a broad spectrum of light. Early
Figure 4.11 An all-reflective lens designed for use in extreme ultraviolet (EUV) lithography.
EUV lithography is discussed in detail in Chapter 12.
and these filters may sometimes degrade. This is a rare occurrence, and chromatic
aberration is usually not a problem that is encountered directly in g-line or i-line
lithography.
In the deep UV, the small number of materials that can be used for making
lenses has made it difficult to correct for chromatic aberrations, and extremely
narrow bandwidths are required for refractive DUV lenses.22 These lenses typically
must operate only over a very small portion of the range of wavelengths at which
KrF and ArF lasers may potentially lase. This imposes requirements for good
wavelength control in DUV lithography. This is discussed in further detail in
Section 5.2 on excimer laser light sources. The problem of chromatic aberration
also reappears in Chapter 6, where it presents a problem for through-the-lens
imaging for the purposes of alignment. Another difficulty with exposure systems
that use monochromatic illumination is their susceptibility to substrate thin-film
optical effects, a topic that is also discussed later in this chapter. Highly accurate
simulations need to take into account the polychromatic nature of the light source
and the behavior of the optical system throughout the full bandwidth of the light.
After aerial images are calculated, the light-intensity distributions within the
resist films need to be calculated. The calculations of aerial images I(x, y, z) provide
the modulation of the light intensity along directions parallel to the xy plane of
the resist film, and there is a dependence in the z direction due to the effects of
defocus. Within the resist film, there are further variations in the light intensity in
the z direction as a consequence of absorption in the resist and reflection from the
substrate. It is this variation in the z direction (for causes other than defocus) that
is the subject of the next two sections.
After determining the image produced by the lens in air—the aerial image—the
next step in modeling the lithography process is to describe the propagation of
light through the resist film. This propagation is complicated by the reaction of
the resist to light. Because of the resist’s photochemistry, the absorption by the
resist evolves during exposure. Typically, resists become less absorbing and more
transparent following exposure to light. A practical theory describing the effects
of resist exposure was developed by F. H. Dill et al.,23 who helped to establish
lithography firmly as an engineering science.
Photoresists are multicomponent materials. The active ingredients in resists—
the photoactive compounds and photoacid generators—undergo chemical reactions
upon exposure to light. Other components may absorb light but do not undergo
photochemical reactions. The intensity of light I passing in the z direction through
a homogeneous material varies according to Lambert’s law:
dI/dz = −αI(z), (4.8)
where α is the absorption coefficient of the material. Integrating this equation leads to
I(z) = I_0 e^{−αz}, (4.9)
α = ac, (4.10)
α = AM + B, (4.12)
where
A = (a_PAC − a_P) c_0, (4.13)
B = Σ_R a_R c_R + a_P c_0, (4.14)
M = c/c_0. (4.15)
Not all absorbed photons result in a chemical reaction. This quantum efficiency is
taken into account by the coefficient C:
∂M/∂t = −CIM, (4.16)
where t represents time and I is the intensity of the light. The parameters A, B,
and C have become known as the Dill parameters for the resist. Typical values
are given in Table 4.3. The contribution of Dill was the reduction of complex
photochemistry to a set of three easily measured parameters, A, B, and C. The
extent of photochemical reaction is captured by M(x, y, z, t), where (x, y, z) is a
point within the resist film, and t is the exposure time.
Once a film of resist is exposed to light, the absorptivity, represented by AM + B,
is no longer uniform throughout the film because resist at the top of the film is more
bleached (on average) than the resist near the resist-substrate interface, having
received more light than the bottom. That is, following exposure, M is no longer
uniform in the resist film. Bleaching is represented by values of M that are less than
1.0. Because of this bleaching, the determination of resist exposure throughout the
depth of the resist film is not amenable to a closed-form solution, but the above
equations can be used to calculate M(x, y, z, t) iteratively for any specified values of
A, B, and C using a computer. The variation in M in the z direction is determined by
how light propagates through the resist thin film and is reflected from the substrate.
Reflections from the substrate are discussed shortly. The variation of M in the xy
plane results from the pattern on the mask and subsequent diffraction and imaging.
The modeling of imaging was outlined in Section 4.1. Putting together the optical
image and the subsequent exposure of resist, M(x, y, z, t) can be calculated.
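This iterative calculation can be sketched with a simple finite-difference loop, 1D in depth: at each time step, integrate Lambert's law down through the film with the depth-dependent α = AM + B, then update M via Eq. (4.16). Standing waves and substrate reflections are ignored here, and the Dill parameters and intensity are illustrative assumptions.

```python
import numpy as np

def expose(A, B, C, I0, thickness, t_total, nz=200, nt=200):
    """March Eqs. (4.8), (4.12), and (4.16) forward in time, 1D in depth.

    Standing waves and substrate reflections are ignored; light simply
    attenuates downward with the depth-dependent alpha = A*M + B.
    """
    dz, dt = thickness / nz, t_total / nt
    M = np.ones(nz)                       # unexposed resist: M = 1 everywhere
    for _ in range(nt):
        I = np.empty(nz)
        I[0] = I0
        for i in range(1, nz):            # Lambert's law with varying alpha
            alpha = A * M[i - 1] + B
            I[i] = I[i - 1] * np.exp(-alpha * dz)
        M *= np.exp(-C * I * dt)          # Eq. (4.16) over one time step
    return M

# Illustrative (assumed) Dill parameters: A, B in 1/um; C in cm^2/mJ;
# I0 in mW/cm^2; thickness in um; exposure time in seconds.
M = expose(A=0.9, B=0.05, C=0.015, I0=10.0, thickness=1.0, t_total=10.0)
print(f"M(top) = {M[0]:.3f}, M(bottom) = {M[-1]:.3f}")
```

As expected, the top of the film ends up more bleached (smaller M) than the bottom, since the upper resist shields the lower resist from light.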
An interesting observation was made by integrating Eq. (4.16) with respect to
time, giving27
M = e^{−CIt}, (4.17)
and then differentiating this with respect to x. This results in the following:
∂M/∂x = (1/I) M ln(M) ∂I/∂x. (4.18)
Table 4.3 Dill parameters for some commercially produced photoresists.24–26 (Not all of
these resists have remained commercially available.)
Resist    A (µm⁻¹)    B (µm⁻¹)    C (cm²/mJ)    Exposure wavelength    Reference
where I0 is the intensity of the incident light, α is the optical absorption coefficient,
and z is the depth within the resist film, with z = 0 being the top surface. Typical
values for α are determined from the A and B parameters given in Table 4.3 along
with the relationship of Eq. (4.12). From the values for A and B in Table 4.3, one
can see that the spatial scale for variation of the light intensity, due to conventional
absorption of light propagating through an absorbing medium, is on the order of
tenths of a micron.
The phenomenon of resist bleaching during exposure does not change this
behavior significantly. In the limit where B → 0, Eqs. (4.8), (4.12), and (4.16)
can be solved exactly to give the light intensity and the amount of remaining photoactive
compound at depth z in the resist, after t seconds of exposure:28
I(z, t) = I_0 / [1 − e^{−CI_0 t}(1 − e^{Az})], (4.20)
M(z, t) = 1 / [1 − e^{−Az}(1 − e^{CI_0 t})], (4.21)
where I0 is the intensity of the incident light. At t = 0, Eq. (4.20) reduces to Eq.
(4.9). For the case of B ≠ 0, α(z) must be determined iteratively on a computer.29
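The stated limits of the closed-form B → 0 solution can be verified numerically: at t = 0, Eq. (4.20) reduces to Beer's law, and at the surface (z = 0), Eq. (4.21) reduces to Eq. (4.17). The parameter values below are illustrative only.

```python
import math

def light_intensity(z, t, A, C, I0):
    """Eq. (4.20): exact intensity for B -> 0."""
    return I0 / (1.0 - math.exp(-C * I0 * t) * (1.0 - math.exp(A * z)))

def remaining_pac(z, t, A, C, I0):
    """Eq. (4.21): remaining normalized photoactive compound for B -> 0."""
    return 1.0 / (1.0 - math.exp(-A * z) * (1.0 - math.exp(C * I0 * t)))

A, C, I0 = 0.9, 0.015, 10.0    # illustrative values: 1/um, cm^2/mJ, mW/cm^2
# t = 0 recovers Beer's law, Eq. (4.9); z = 0 recovers Eq. (4.17).
print(light_intensity(0.5, 0.0, A, C, I0), I0 * math.exp(-A * 0.5))
print(remaining_pac(0.0, 10.0, A, C, I0), math.exp(-C * I0 * 10.0))
```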
The presence of the substrate underneath the photoresist has a significant effect
on the light-intensity distribution within the photoresist film compared to the
picture just presented. This can be appreciated by considering the situation depicted
in Fig. 4.12. Light illuminates a substrate covered by photoresist. Typical substrates
are silicon or silicon covered with various films, such as silicon dioxide, silicon
nitride, aluminum, assorted silicides, titanium nitride, and other materials used to
fabricate integrated circuits. In this example the films are uniform and have a large
spatial extent. (Issues associated with topography will be discussed shortly.) At the
interface between each pair of films some of the incident light is reflected, while
the remainder is transmitted.
It is highly relevant to lithography that, within the photoresist film, the light
consists of an incident component as well as a reflected one. There are two
significant consequences of this geometry:
(1) The light intensity varies rapidly, in the vertical direction, within the
photoresist film.
(2) The amount of light energy coupled into the photoresist film has a strong
dependence on the thickness of the various films in the stack.
The first property, the rapid variation of the light intensity within the photoresist
film, results from the interference between the incident and reflected light within
the resist. The variation in light intensity is sinusoidal, with adjacent maxima and
minima separated by very nearly λ/4n, where n is the
real part of the refractive index of the photoresist. For typical values of λ and n,
this quarter-wave separation is on the order of hundredths of a micron, a full order
of magnitude smaller than one predicts from simple optical absorption. The effect
of this is shown in Fig. 4.13, where the light intensity is plotted as a function of
depth in the photoresist film for two situations—one for a silicon substrate and
the other where the substrate is matched optically to the photoresist. The rapidly
varying light distribution within the depth of the photoresist is referred to as a
“standing wave.” The consequence of standing waves of light intensity throughout
the depth of the photoresist film is alternating levels of resist with high and low
exposure. For positive resist, the high-exposure regions develop quickly, while the
low-exposure regions develop more gradually. Manifestations of standing waves
Figure 4.13 Light intensity (λ = 365 nm) throughout the depth of an 8500-Å-thick film of
photoresist. The calculation of absorption was performed for the resist at the initiation of
exposure; i.e., the resist was unbleached and had uniform optical properties. Depth = 0
represents the air-resist interface.
are visible in micrographs of resist features, where the resist sidewalls have ridges
because alternating layers of resist have developed at different rates (Fig. 4.14).
Standing waves constrain the process because there must be sufficient exposure
for the resist in the least-exposed standing wave to develop out. Otherwise, features
are bridged. For highly reflective films, such as aluminum, this dose may be several
times greater than the resist might require in the absence of standing-wave effects.
This exaggerates the problems of substrates with topography because the resist has
varying thickness over steps. Regions requiring low doses are overexposed in the
attempt to clear out the last standing waves in other areas on the wafers.
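A minimal two-beam sketch of the standing wave: the incident plane wave plus the wave reflected at normal incidence from the substrate. Absorption and multiple films are neglected, the resist index of 1.7 is the value used earlier, and the real substrate index of 4.1 is an assumed stand-in (silicon's index at these wavelengths is actually complex).

```python
import numpy as np

def standing_wave(z_nm, wavelength_nm=365.0, n_resist=1.7, n_sub=4.1):
    """|incident + reflected|^2 vs height z above the substrate (z = 0)."""
    k = 2.0 * np.pi * n_resist / wavelength_nm       # wavenumber in the resist
    r = (n_resist - n_sub) / (n_resist + n_sub)      # normal-incidence Fresnel
    field = np.exp(-1j * k * z_nm) + r * np.exp(1j * k * z_nm)
    return np.abs(field) ** 2

spacing = 365.0 / (4.0 * 1.7)    # lambda/4n: min-to-max spacing of the pattern
print(f"min-to-max spacing: {spacing:.1f} nm")
print(f"I at substrate: {standing_wave(0.0):.2f}, one spacing up: {standing_wave(spacing):.2f}")
```

The intensity oscillates with the λ/4n min-to-max spacing described above, and the contrast between adjacent maxima and minima is what produces the alternating layers of high and low exposure.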
In addition to the standing wave throughout the depth of the photoresist film,
there is another consequence of thin-film interference effects: the amount of light
absorbed in the resist has a strong functional dependence on the thickness of the
substrate and resist films. Consider the total light energy E_total in the photoresist film,
E_total = ∫_0^{T_0} E(z) dz, (4.22)
where E(z) is the energy absorbed at the height z in the resist film and T_0 is the total thickness of the resist film. The integrated light absorption E_total is plotted in
Fig. 4.15 as a function of resist thickness for resist on an oxide layer on silicon. For
1.0-µm-thick resist, a change in the oxide thickness from 2500 Å to 2800 Å resulted
in a 14% decrease in the integrated light absorption. This situation has been
simulated, and the corresponding linewidths shown. Again there are oscillations,
with the λ/4n minimum-to-maximum spacing characteristic of standing-wave
phenomena with linewidth minima corresponding to light-absorption maxima.
The curves shown in Fig. 4.15 are known as “swing curves.” The phenomena
exemplified in Fig. 4.15 have significant consequences for microlithography.
Because linewidths vary with resist thickness variations on the order of a quarter
wave, resist thickness must be controlled to levels much less than λ/4n, which
is less than 100 nm for typical exposure wavelengths and photoresist. This is an impossible task for process architectures that have virtually any topography.
Figure 4.15 Standing-wave effects as a function of resist thickness for λ = 365 nm (i line) at the initiation of exposure. (a) Integrated light absorption [Eq. (4.22)], and (b) linewidths for 0.5-µm nominal features exposed at a fixed exposure dose, calculated using PROLITH.
The standing-wave phenomenon places severe requirements on control of thin-
film thickness, as seen from Fig. 4.15, where a small change in the thickness of
the oxide layer changes the amount of integrated light energy absorbed and also
the linewidth as a consequence. This constraint on substrate films is particularly
severe for materials that have a large index of refraction, such as polysilicon. For
the Hg lamp i line, the index of refraction for polysilicon is 4.9, so the quarter-
wave maximum-to-minimum spacing is only 15 nm for a polysilicon film. It may
not be practical to control films to tolerances that are a fraction of that. While
post-exposure bakes can reduce the variations of M(z) within the resist film (to be
discussed in the next section), they have no effect on the integrated light energy
[Eq. (4.22)].
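The quarter-wave spacing quoted above is easy to evaluate. A minimal sketch, using representative resist indices (assumed illustrative values, not data for a specific resist):

```python
# Minimal sketch: swing-curve minimum-to-maximum spacing, lambda/(4n).
# The wavelengths and resist indices below are representative assumed values.
def swing_min_to_max_nm(wavelength_nm, n_resist):
    """Spacing between adjacent swing-curve extrema in resist thickness."""
    return wavelength_nm / (4.0 * n_resist)

for name, wl, n in [("i line", 365.0, 1.7), ("KrF", 248.0, 1.8), ("ArF", 193.0, 1.7)]:
    print(f"{name}: {swing_min_to_max_nm(wl, n):.1f} nm")  # all well under 100 nm
```

For all of these wavelengths the extremum spacing is tens of nanometers, which is why resist thickness must be controlled so tightly.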
There are a number of solutions used to address the problem of varying
integrated light intensity caused by the standing-wave effect:
(1) Centering the resist thickness at a standing-wave extremum
(2) Adding dye to resists
(3) Applying bottom antireflection coatings
(4) Applying top antireflection coatings
(5) Using multiple-wavelength light sources
(6) Utilizing multilayer resist processes
(7) Utilizing surface-imaging resists.
One clearly has the minimal sensitivity to resist-thickness variations when the
process is centered at an extremum of the curves shown in Fig. 4.15. On flat
surfaces, choosing the resist thickness to correspond to a swing-curve maximum
or minimum results in the smallest linewidth variations due to variations in resist
thickness. When the topography step heights are a significant fraction of the quarter
wavelength or greater, operating at a swing-curve extremum is not an option for
immunizing the process against standing-wave effects.
Operating at a minimum of a swing curve has certain advantages. Swing curves
such as those shown in Fig. 4.15 are usually dominated by light rays at near-normal
incidence. By choosing a minimum, oblique rays are more strongly coupled into
the resist since they travel further through the resist film and represent higher points
on the swing curve. These oblique rays contain the higher spatial frequencies of
the image. Other differences between swing-curve maxima and minima can be
appreciated by examining Fig. 4.16 where the light intensity is plotted as a function
of depth in resist films corresponding to a swing-curve absorption maximum (resist
thickness = 8900 Å for the parameters used to calculate the results shown in
Figs. 4.14 and 4.15) and a swing-curve minimum (resist thickness = 8365 Å).
At a swing-curve maximum, the light intensity coupled into the resist is greatest,
thereby minimizing exposure time. For a swing-curve minimum on a silicon
substrate, the average dose is similar to that obtained on an optically matched
substrate. The amplitudes of the intensity variations through the depth of the resist
film relative to the average intensity are fairly similar. In both cases, there are
absorption minima at the resist-substrate interface.
130 Chapter 4
Figure 4.16 Light intensity through the depth of resist films representing swing-curve
absorption (a) maxima and (b) minima.
One can estimate the point at which image degradation occurs due to lateral diffusion
as follows. For image degradation, the horizontal gradients of the photoactive
compound must be greater than the gradients across standing waves. As a first
approximation, these gradients can be estimated from the gradients in the light
intensity. At the edge of a feature, such as a line or space, the gradient in the
photoactive compound is
∂M/∂x ≈ 3 to 7 µm−1, (4.23)
based on Eq. (4.18) and the data from Figs. 2.18 and 2.22. Throughout much
of the image this derivative is smaller. For example, it is identically zero in the
middle of an isolated line or space. On the other hand, if the standing-wave light
intensity differs by 20% from maximum to minimum, then the vertical gradient is
approximately
0.2/(λ/4n) ≈ 7 µm−1, (4.24)
for λ = 193 nm and n = 1.7. This shows that a very small standing wave can
give a larger vertical gradient than results from the lateral optical image profile. As
features decrease in size, the gradient at the feature edge, Eq. (4.18), increases and
eventually exceeds that of the standing waves. However, shorter wavelengths help
to maintain the diffusion gradient in the vertical direction. The primary reason that
diffusion during post-exposure bake can lead to image degradation is the tendency
toward substrates with very low reflectivity. For such situations, photoresists need
to be designed so that images are not degraded significantly during post-exposure
bake. This is an issue for sub-50-nm lithography.
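The comparison between Eqs. (4.23) and (4.24) can be checked numerically. A small sketch, with the 20% standing-wave modulation and the ArF resist index taken from the text:

```python
# Sketch comparing the vertical dose gradient from a 20% standing-wave
# modulation, Eq. (4.24), against the lateral gradient at a feature edge,
# roughly 3-7 per micron per Eq. (4.23). Parameter values follow the text.
def vertical_gradient_per_um(modulation, wavelength_nm, n_resist):
    """Gradient ~ modulation / (lambda/4n), returned in 1/um."""
    quarter_wave_um = wavelength_nm / (4.0 * n_resist) / 1000.0
    return modulation / quarter_wave_um

g = vertical_gradient_per_um(0.2, 193.0, 1.7)
print(f"{g:.1f} per um")  # ~7 per um, comparable to the lateral 3-7 per um
```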
The efficiency of the post-exposure bake depends upon the diffusion coefficient
of the photoactive compound or photoacid, which depends upon many variables—
the size of the photoactive compound or photoacid, the resin, the amount of residual
solvent, etc.34 This is particularly apparent for post-exposure bake temperatures in
the neighborhood of the resist’s glass transition temperature.35 Because the density
of the resin and the amount of residual solvent are affected by the softbake of the
resist, there is often a significant interaction between the softbake and the post-
exposure bake processes.36
Modeling the post-exposure bakes of novolak resists is reasonably straightfor-
ward. Diffusion is described by the differential equation
∂n/∂t = ∇·(D∇n), (4.25)
where n is the concentration of the diffusing species and D is the diffusion
coefficient, determined largely by the novolak matrix, which changes little during
the post-exposure bake.
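With a constant diffusion coefficient, Eq. (4.25) reduces to the classical diffusion equation, and its smoothing effect on standing waves can be illustrated with a simple finite-difference calculation. This is an illustrative sketch, not a calibrated resist model; the diffusivity, bake time, and grid are assumed values:

```python
# Illustrative sketch (not a calibrated resist model): Eq. (4.25) with a
# constant diffusion coefficient D reduces to dn/dt = D d2n/dz2. Solve it
# with an explicit finite-difference scheme to show how a post-exposure bake
# smooths standing-wave modulation in the photoactive-compound profile.
import math

def bake_1d(n, D, dz, dt, steps):
    """Explicit Euler for dn/dt = D d2n/dz2 with reflecting boundaries."""
    assert D * dt / dz ** 2 <= 0.5, "explicit-scheme stability limit"
    n = list(n)
    for _ in range(steps):
        new = n[:]
        for i in range(len(n)):
            left = n[i - 1] if i > 0 else n[1]             # mirror at z = 0
            right = n[i + 1] if i < len(n) - 1 else n[-2]  # mirror at top
            new[i] = n[i] + D * dt / dz ** 2 * (left - 2 * n[i] + right)
        n = new
    return n

# Standing-wave-like initial profile; period lambda/(2n) ~ 57 nm for ArF resist.
dz = 2.0  # grid spacing, nm
profile = [0.5 + 0.2 * math.cos(2 * math.pi * i * dz / 57.0) for i in range(100)]
baked = bake_1d(profile, D=5.0, dz=dz, dt=0.2, steps=500)  # ~100-s bake, D in nm^2/s
swing_before = max(profile) - min(profile)
swing_after = max(baked) - min(baked)
print(round(swing_before, 3), round(swing_after, 3))
```

Because the standing-wave period is small compared to the diffusion length chosen here, the modulation is almost completely washed out after the bake.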
For chemically amplified resists, the situation is considerably more complicated.
Many chemically amplified resists contain bases that can diffuse and will neutralize
the photoacids. Thus, the diffusion of two species and their neutralization reactions
must be accounted for in post-exposure bakes of chemically amplified resists,
greatly complicating the modeling. In the modeling of chemically amplified resists
it is necessary to determine the degree of deprotection at all points within the resist.
A typical set of equations describing this is37
∂[M]/∂t = −k1 [M]^p [A]^q, (4.26)
∂[A]/∂t = −k3 [A]^r + ∇·(DA ∇[A]) − k4 [A][B], (4.27)
∂[B]/∂t = −k5 [B]^s + ∇·(DB ∇[B]) − k4 [A][B], (4.28)
where DA and DB are diffusion coefficients for the photoacid and base, respectively.
Instead of a single diffusion parameter, as in Eq. (4.25), there are numerous
parameters required to describe the post-exposure baking of chemically amplified
resists. As a result of chemical reactions that modify the porosity of the resist
during the bake, the diffusion parameters are functions of photoacid concentration
and the extent of deprotection. The deprotection reactions during the post-exposure
bake may generate volatile materials. As these reaction products evaporate, the
free volume increases, thereby increasing the diffusivity. At the same time,
baking also densifies the resist, which has the opposite effect of reducing the
diffusivity.
To complicate matters further, there appears to be some indication that photoacid
becomes trapped during the diffusion process.38 Modeling the post-exposure bake
in chemically amplified resist is today an area of active research, and it is quite
important. Reasonable models for the post-exposure bake of chemically
amplified resists can be obtained by fitting data for lines and spaces,39 but these
models do not describe the behavior of more complex geometries very well. Acid
diffusion appears to have a significant influence on patterns in two dimensions,
such as contact holes40 and line ends. Good models require consistency with
the actual physics and chemistry of post-exposure bakes, and such models are
necessarily complex.
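A minimal numerical sketch of Eqs. (4.26)–(4.28) in one dimension, with constant diffusivities, exponents p = q = r = s = 1, and illustrative rate constants (all assumed values, not fitted to any real resist):

```python
# Minimal 1D sketch of Eqs. (4.26)-(4.28): explicit-Euler reaction-diffusion
# for a chemically amplified resist, with constant diffusivities, exponents
# p = q = r = s = 1, and illustrative rate constants (all assumed values,
# not fitted to a real resist). M = protected sites, A = photoacid, B = base.

def laplacian(u, dz):
    lap = []
    for i in range(len(u)):
        left = u[i - 1] if i > 0 else u[1]             # zero-flux boundaries
        right = u[i + 1] if i < len(u) - 1 else u[-2]
        lap.append((left - 2 * u[i] + right) / dz ** 2)
    return lap

def peb_step(M, A, B, dz, dt, k1, k3, k4, k5, DA, DB):
    lapA, lapB = laplacian(A, dz), laplacian(B, dz)
    Mn = [m + dt * (-k1 * m * a) for m, a in zip(M, A)]
    An = [a + dt * (-k3 * a + DA * la - k4 * a * b)
          for a, la, b in zip(A, lapA, B)]
    Bn = [b + dt * (-k5 * b + DB * lb - k4 * a * b)
          for b, lb, a in zip(B, lapB, A)]
    return Mn, An, Bn

# Exposed left half generates photoacid; base quencher is loaded uniformly.
N, dz, dt = 60, 2.0, 0.01
M, A, B = [1.0] * N, [0.3 if i < N // 2 else 0.0 for i in range(N)], [0.05] * N
for _ in range(2000):
    M, A, B = peb_step(M, A, B, dz, dt, k1=1.0, k3=0.01, k4=5.0, k5=0.0,
                       DA=20.0, DB=5.0)
print(round(1 - M[10], 3), round(1 - M[-5], 3))  # exposed center vs. unexposed edge
```

Even this simplified version shows the qualitative behavior described in the text: deprotection is nearly complete in the exposed region, while the base quencher suppresses deprotection where acid arrives only by diffusion.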
where
ρi,j = (ni − nj)/(ni + nj). (4.30)
When all refractive indices are real, the above equation leads to the well-known
conditions for an antireflection coating:
n2 = √(n1 n3), (4.32)
d2 = (2m + 1)λ/(4n2), (4.33)
where m is any nonnegative integer. These equations are used for antireflection
coatings on refractive optical elements. When the refractive index of layer 3 has a
large imaginary component, Eq. (4.31) becomes two equations—one for the real
part and one for the imaginary part. This necessitates another factor that can be
varied, such as the imaginary component of the refractive index of layer 2, the
antireflection coating.
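For the lossless case, Eqs. (4.32) and (4.33) can be evaluated directly. A small sketch; the air/fused-silica example at 193 nm is an assumed illustration:

```python
# Sketch of the lossless antireflection conditions, Eqs. (4.32)-(4.33).
# The air / fused-silica example at 193 nm is an assumed illustration.
import math

def arc_index(n1, n3):
    """Ideal coating index between media of indices n1 and n3."""
    return math.sqrt(n1 * n3)

def arc_thickness_nm(wavelength_nm, n2, m=0):
    """Odd quarter-wave thickness, (2m + 1) * lambda / (4 * n2)."""
    return (2 * m + 1) * wavelength_nm / (4.0 * n2)

n2 = arc_index(1.0, 1.5)
print(f"n2 = {n2:.3f}, d2 = {arc_thickness_nm(193.0, n2):.1f} nm")
```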
If a material is too optically absorbing, it becomes a reflective surface.
Consequently, there is an optimal level of absorption for antireflection coatings.
This optimum is estimated from the following considerations. Thin films (20–120
nm) are desirable in order to maintain process simplicity during the steps that
remove the antireflection coating. For a thin film to attenuate normally incident
light on two passes (incident and upon reflection) through the film to 10% or less,
the following condition must be met:
ρe^(−8πκt/λ) ≤ 0.1, (4.34)
where t is the thickness of the film, κ is the imaginary part of the refractive index,
ρ is the reflectance between the underlying substrate and the antireflection coating,
and λ is the wavelength of the light. For ρ = 0.7 and t = 70 nm, this implies that
κ/λ > 0.0011 nm−1. (4.35)
On the other hand, if κ becomes too large, then the thin film itself becomes too
reflective. The reflectance between semi-infinite thick layers of material is plotted
in Fig. 4.17. From Fig. 4.17 and Eq. (4.35), one obtains bounds on κ for λ = 248 nm
that keep the reflectance back into the photoresist below 10%. The upper
bound on κ depends upon the degree to which the real parts of the refractive
indices of the resist and antireflection coating are matched. Antireflection coatings
are more effective when the real part of their index of refraction nearly equals
that of the resist.
Figure 4.17 Reflectance at the interface between semi-infinite materials, calculated as the
square of Eq. (4.30). One material is nonabsorbing and has an index of refraction equal to
1.74, while the real part of the index of refraction of the other material equals 2.00.
Also, note that the lower bound for the optimum absorption by
the antireflection coating scales with the wavelength according to Eq. (4.35). Not
surprisingly, most materials in use as antireflection coatings have values of κ that
fall in the optimum range. For TiN, a good inorganic ARC for i-line lithography,
n = 2.01 − 1.11i, while n = 1.90 − 0.41i for BARLi, an organic i-line ARC.
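The bound of Eqs. (4.34) and (4.35) follows from solving ρe^(−8πκt/λ) ≤ 0.1 for κ/λ. A sketch with the text's values ρ = 0.7 and t = 70 nm:

```python
# Sketch of the absorption lower bound, Eqs. (4.34)-(4.35): requiring
# rho * exp(-8*pi*kappa*t/lambda) <= 0.1 gives
# kappa/lambda >= ln(10*rho) / (8*pi*t).
import math

def kappa_min_over_lambda(rho, t_nm):
    return math.log(rho / 0.1) / (8.0 * math.pi * t_nm)

ratio = kappa_min_over_lambda(rho=0.7, t_nm=70.0)
print(f"kappa/lambda > {ratio:.4f} per nm")  # ~0.0011 per nm, as in Eq. (4.35)
print(f"at i line (365 nm): kappa > {ratio * 365.0:.2f}")
# Both TiN (kappa = 1.11) and BARLi (kappa = 0.41) satisfy this lower bound.
```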
Bottom antireflection coatings address all of the lithographic problems
associated with reflective substrates: linewidth variations over topography
(assuming the process results in adequate coverage of the antireflection coating
over the steps), standing waves within the resist film, and the problem of notching,
which will be discussed shortly. However, there are disadvantages as well. First,
the antireflection coating must be deposited or coated. This additional processing
step adds costs and potential for defects. (It may be argued that the deposition does
not represent an additional step, because an adhesion promotion step is eliminated
by the use of a spin-coated organic ARC. However, the ARC materials and
equipment are usually much more expensive than those required for conventional
adhesion promotion.) Second, the antireflection coating usually must be etched.
Some spin-coated ARCs will develop out,51 but the develop step depends critically
on bake temperatures. The processing of such materials can be difficult when
there is topography and accompanying variations in ARC thickness. Developing
is not anisotropic, and the resulting undercut limits the use of ARCs that dissolve
in developer for extremely small geometries. In general, the ARC must be
anisotropically etched out of the exposed areas. Finally, the ARC usually must
ultimately be removed from the unetched areas, since it is rarely a film that is part
of the process architecture, other than to assist the lithography. (TiN on aluminum
is a notable exception to this, where TiN is used for hillock suppression52 and as a
barrier metal.) Organic ARCs can usually be removed by the same process used to
remove remaining photoresist, but inorganic ARCs require a special etch that must
be compatible with the other exposed films. In spite of the expense of additional
processing, the advantages for lithography are so great that antireflection coatings
can often be justified.
where ρ1 is the reflectivity between the air and the resist [see Eq. (4.30)], and
ρ2 is the reflectivity between the resist and aluminum, at normal incidence. The
thickness and refractive index of the resist are d and n, respectively:
δ = 2πnd/λ. (4.38)
The oscillatory behavior of the reflectance (and absorption) results from the
factor ρ1ρ2 cos 2δ. This factor is suppressed by a bottom antireflection coating
(ρ2 → 0) or a top antireflection coating (ρ1 → 0).
Figure 4.18 Absorption and reflectance for the i-line resist on aluminum (at the beginning
of exposure). Note that absorption and reflectance are complementary.
An ideal top antireflection coating is a film coated on top of the resist that has the
following properties:54
refractive index = narc = √(nresist), (4.39)
thickness d = (2k + 1)λ/(4narc), (4.40)
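Eqs. (4.39) and (4.40) can be evaluated directly. A sketch for an ArF resist with an assumed index of 1.7:

```python
# Sketch of the ideal top antireflection coating, Eqs. (4.39)-(4.40),
# for an ArF resist with an assumed index of 1.7.
import math

n_resist = 1.7
n_arc = math.sqrt(n_resist)   # Eq. (4.39)
d_nm = 193.0 / (4.0 * n_arc)  # Eq. (4.40), thinnest solution (k = 0)
print(f"n_arc = {n_arc:.3f}, d = {d_nm:.1f} nm")
```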
Figure 4.19 Situation for reflective notching. Light reflects from substrate topography into
areas in which exposure is not desired.
Figure 4.20 Example of reflective notching for i-line resist on polysilicon. In the area of
the topography the resist is notched. Standing waves resulting from substrate reflectivity are
also evident.
that are about twice E0 . Notching does not occur as long as the scattered dose E s
meets the following condition:
where α is the absorption coefficient for the resist and t is the resist thickness. From
the above two equations, there is no notching so long as
where the reflectance from the substrate is ρ. One obvious way to reduce the
distance d over which notching may occur is to increase the absorption α. This
can be done by adding a dye into the resist that does not bleach with exposure,
and such an approach has been used extensively.57 The benefit of having dye in
the resist was already discussed as a method for reducing the effects of standing
waves. There are tradeoffs between absorption to suppress the effects of reflective
substrates and image quality. Highly dyed resists have less vertical resist sidewalls,
so there is an optimum dye loading for resists.58
Values for ρ can be as high as 0.9 for specular aluminum, and a typical value
for α is 0.6 µm−1 . For these parameters, no notching occurs for d + t ≥ 1.5 µm.
From these considerations, one can see that the utility of dyed resists is small for
deep submicron lithography. Moreover, as discussed in the section on photoresists,
optical absorption reduces the performance of the resist.
Thus far the discussion of antireflection coatings has assumed that the light
is normally incident or nearly normally incident onto the resist. While this is an
appropriate approximation to make for imaging at low numerical aperture, it is not
suitable for situations involving very high numerical apertures. To understand the
complications arising from high angles of incidence, consider the configuration
depicted in Fig. 4.21. The effectiveness of a single-layer bottom antireflection
coating is shown in Fig. 4.22. At low angles of incidence the antireflection
coating is effective, producing nearly the same low reflectance for both S- and
P-polarization. The antireflective coating becomes less effective at larger angles of
incidence, and there are substantial differences in behavior between the two types
of polarization. Except when imaging simple gratings using S- or P-polarized light,
the inability to produce low reflectance over a large range of angles of incidence
is a problem when using lenses with very high numerical apertures, since light
will be incident across a wide range of angles. For good process control, low
reflectance is required at all angles of incidence. As it turns out, this requires
complex antireflection coatings. There are insufficient degrees of freedom to design
a single-layer coating that produces low reflectance over all angles of incidence.
Figure 4.22 Calculated reflectance of ArF light from a bottom antireflection coating into
resist, all of which is on a silicon substrate.8 The index of refraction and thickness of the
antireflection coating are 1.83–0.21i and 135 nm, respectively. The index of refraction of
the photoresist is assumed to be 1.8. In some publications, S-polarization is referred to as
transverse electric polarization, while P-polarized light is referred to as transverse magnetic
light.
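The angle dependence shown in Fig. 4.22 can be approximated with the standard single-film (Airy) reflectance formulas. This is a hedged sketch: the BARC index and thickness follow the figure caption, while the silicon index at 193 nm (approximately 0.88 − 2.78i) is an assumed literature value, and the n − iκ sign convention of this chapter is used:

```python
# Hedged sketch: reflectance of a single bottom antireflection film between
# resist and substrate, from the Fresnel/Airy formulas. The BARC values
# (n = 1.83 - 0.21j, d = 135 nm) follow the Fig. 4.22 caption; the silicon
# index at 193 nm (~0.88 - 2.78j) is an approximate literature value.
import cmath
import math

def fresnel_r(na, nb, cos_a, cos_b, pol):
    """Amplitude reflection coefficient at a single interface."""
    if pol == "s":
        return (na * cos_a - nb * cos_b) / (na * cos_a + nb * cos_b)
    return (nb * cos_a - na * cos_b) / (nb * cos_a + na * cos_b)

def barc_reflectance(theta_deg, pol, wl=193.0, n1=1.8,
                     n2=1.83 - 0.21j, n3=0.88 - 2.78j, d=135.0):
    """Reflectance back into the resist for incidence angle theta (in the resist)."""
    s1 = math.sin(math.radians(theta_deg))
    c1 = math.cos(math.radians(theta_deg))
    s2 = n1 * s1 / n2  # Snell's law with complex angles
    c2 = cmath.sqrt(1 - s2 * s2)
    s3 = n1 * s1 / n3
    c3 = cmath.sqrt(1 - s3 * s3)
    r12 = fresnel_r(n1, n2, c1, c2, pol)
    r23 = fresnel_r(n2, n3, c2, c3, pol)
    # e^(-2i*beta) phase convention, consistent with n = n - i*kappa
    phase = cmath.exp(-2j * (2 * math.pi / wl) * n2 * d * c2)
    r = (r12 + r23 * phase) / (1 + r12 * r23 * phase)
    return abs(r) ** 2

for theta in (0, 20, 40):
    print(theta, round(barc_reflectance(theta, "s"), 4),
          round(barc_reflectance(theta, "p"), 4))
```

At normal incidence the two polarizations give identical, very low reflectance; the S and P results diverge as the angle of incidence grows, which is the behavior described in the text.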
4.7 Development
For diazonaphthoquinone resists, pattern profiles can be calculated if the
development rate R is known as a function of the parameter M (discussed
in Section 4.3), and the rate is usually measured directly for particular resist
and developer systems. Values for M in the resist are simulated using optical
models, with suitable modifications as a consequence of diffusion due to post-
exposure bakes. For chemically amplified resists, M can be taken as the fraction
of deprotection after post-exposure bake. Development rates are quite nonlinear
and are usually fit to particular functional forms in order to facilitate calculations.
Various functional forms have been proposed over time. Dill and coworkers
proposed the development rate function59
Other rate functions have since been adopted, notably the “Mack” model:60
Rate = rmax (a + 1)(1 − M)^n / [a + (1 − M)^n] + rmin, (4.45)
where rmax, rmin, n, and a are parameters of the model. By first calculating
M(x, y, z) using imaging models, Eq. (4.44), Eq. (4.45), or a similar equation can
then be used to compute development rates at every point in the resist.
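The Mack model of Eq. (4.45) is straightforward to implement. A sketch with illustrative parameter values (not taken from a real resist characterization):

```python
# Sketch of the Mack development-rate model, Eq. (4.45), with illustrative
# parameter values (rmax and rmin in nm/s; n and a are dimensionless fit
# parameters, not taken from a real resist characterization).
def mack_rate(M, rmax=100.0, rmin=0.1, n=4, a=5.0):
    """Development rate vs. remaining inhibitor/protection fraction M."""
    return rmax * (a + 1) * (1 - M) ** n / (a + (1 - M) ** n) + rmin

for M in (0.0, 0.5, 0.9):
    print(f"M = {M}: rate = {mack_rate(M):.2f} nm/s")
```

The strongly nonlinear drop in rate as M increases is what produces the development selectivity between exposed and unexposed resist.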
Problems
4.1 Assume that the index of refraction of the resist is 1.7. For P-polarized light,
what numerical aperture is required to produce light rays that have orthogonal
polarization in the resist?
4.2 Referring to Fig. 4.4, the reflectance for S-polarized light is nearly zero for an
angle of incidence of ∼60 deg. For what numerical aperture would the most
oblique rays have this angle of incidence?
4.3 Using the Dill parameters of Table 4.3, show that the light intensity that reaches
the bottom of a 1-µm-thick film of AZ 1470 resist is 55% and 33% of the
incident intensity at the g line and i line, respectively, at the very beginning
of exposure. Show that these numbers are 97% and 80% for a fully bleached
1-µm-thick film of AZ 1470 resist.
4.4 Why are post-exposure bakes used for diazonaphthoquinone resists, even
though such bakes are not required to complete the photochemical conversion
of photoactive compounds?
4.5 Suppose that proper exposure of a DNQ resist requires M = 0.1 in the middle
of large exposed features. Use Eq. (4.21) to show that an exposure time of 0.15
sec is needed if the value for the Dill parameter A of the resist = 0.8 µm−1 ,
B ≈ 0, and C = 0.01 cm2 /mJ, and the light intensity is 2000 mW/cm2 .
References
1. A. Sommerfeld, Optics, Academic Press, New York (1954).
2. W. Oldham, S. Nandgaonkar, A. Neureuther, and M. O’Toole, “A general
simulator for VLSI lithography and etching processes: part I–application to
projection lithography,” IEEE Trans. Electr. Dev. ED-26, 717–722 (1980).
3. C. A. Mack, “Comparison of scalar and vector modeling of image formation
in photoresist,” Proc. SPIE 2440, 381–394 (1995).
37. D. Matiut, A. Erdmann, B. Tollkühn, and A. Semmler, “New models for the
simulation of post-exposure bake of chemically amplified resists,” Proc. SPIE
5039, 1132–1142 (2003).
38. S. V. Postnikov, M. D. Stewart, H. V. Tran, M. A. Nierode, D. R. Madeiros,
T. Cao, J. Byers, S. E. Webber, and C. G. Willson, “Study of resolution limits
due to intrinsic bias in chemically amplified photoresists,” J. Vac. Sci. Technol.
B 17(6), 3335–3338 (1999).
39. B. Tollkühn, A. Erdmann, J. Lammers, C. Nölscher, and A. Semmler, “Do we
need complex resist models for predictive simulation of lithographic process
performance,” Proc. SPIE 5376, 983–994 (2004).
40. J. Nakamura, H. Ban, and A. Tanaka, “Influence of acid diffusion on the
lithographic performance of chemically amplified resists,” Jpn. J. Appl. Phys.
31, 4294–4300 (1992).
41. I. I. Bol, “High-resolution optical lithography using dyed single-layer resist,”
Proc. Kodak Microelectron. Sem., 19–22 (1984).
42. C. A. Mack, “Dispelling the myths about dyed photoresist,” Sol. State Technol.,
125–130 (January, 1988).
43. K. Harrison and C. Takemoto, “The use of antireflection coatings for
photoresist linewidth control,” Proc. Kodak Microelectron. Sem., 107–111
(1983).
44. H. Van den Berg and J. van Staden, “Antireflection coatings on metal layers
for photolithographic purposes,” J. Appl. Phys. 50(3), 1212–1214 (1979).
45. W. H. Arnold, M. Farnaam, and J. Sliwa, “Titanium nitride as an antireflection
coating on highly reflective layers for photolithography,” U.S. Patent No.
4,820,611 (1989).
46. C. Nölscher, L. Mader, and M. Schneegans, “High contrast single layer resists
and antireflection layers: an alternative to multilayer resist techniques,” Proc.
SPIE 1086, 242–250 (1989).
47. T. Ogawa, H. Nakano, T. Gocho, and T. Tsumori, “SiOxNy:H, high
performance antireflection layer for the current and future optical lithography,”
Proc. SPIE 2197, 722–732 (1994).
48. T. Brewer, R. Carlson, and J. Arnold, “The reduction of the standing-wave
effects in positive photoresists,” J. Appl. Photogr. Eng. 7(6), 184–186 (1981).
49. J. Lamb and M. G. Moss, “Expanding photolithography process latitude with
organic AR coatings,” Sol. State Technol., 79–83 (September, 1993).
50. S. Kaplan, “Linewidth control over topography using a spin-on AR coating,”
Proc. KTI Microelectron. Sem., 307–314 (1990).
51. B. Martin, A. N. Odell, and J. E. Lamb III, “Improved bake latitude organic
antireflection coatings for high resolution metalization lithography,” Proc.
SPIE 1086, 543–554 (1989).
52. M. Rocke and M. Schneegans, “Titanium nitride for antireflection control and
hillock suppression on aluminum silicon metalization,” J. Vac. Sci. Technol. B
6(4), 1113–1115 (1988).
53. T. Brunner, “Optimization of optical properties of resist processes,” Proc.
SPIE 1466, 297–308 (1991).
54. T. Tanaka, N. Hasegawa, H. Shiraishi, and S. Okazaki, “A new
photolithography technique with antireflection coating on resist: ARCOR,” J.
Electrochem. Soc. 137, 3900–3905 (1990).
55. O. M. Heavens, Optical Properties of Thin Solid Films, Dover, New York
(1955).
56. J. LaRue and C. Ting, “Single and dual wavelength exposure of photoresist,”
Proc. SPIE 275, 17–22 (1981).
57. A. V. Brown and W. H. Arnold, “Optimization of resist optical density for
high resolution lithography on reflective surfaces,” Proc. SPIE 539, 259–266
(1985).
58. B. K. Daniels and P. Trefonas, “Optimum dye concentration to balance
reflective notching against wall angle in positive photoresist,” J. Electrochem.
Soc. 135(9), 2319–2322 (1988).
59. F. H. Dill, W. P. Hornberger, P. S. Hauge, and J. M. Shaw, “Characterization
of positive photoresist,” IEEE Trans. Electron Devices ED-22(7), 445–452
(1975).
60. C. A. Mack, “Development of positive photoresists,” J. Electrochem. Soc.
134(1), 148–152 (1987).
61. R. E. Jewett, P. I. Hagouel, A. R. Neureuther, and T. van Duzer, “Line-profile
resist development simulation techniques,” Polymer Eng. Sci. 17(6), 381–384
(1977).
62. J. A. Sethian, “Fast marching level set methods for three-dimensional
photolithography development,” Proc. SPIE 2726, 262–272 (1996).
63. I. Karafyllidis, P. I. Hagouel, A. Thanailakis, and A. R. Neureuther, “An
efficient photoresist development simulator based on cellular automata with
experimental verification,” IEEE Trans. Semicond. Manuf. 13(1), 61–75
(2000).
Chapter 5
Wafer Steppers
Wafer steppers, introduced briefly in Chapter 1, are discussed further in this
chapter, paying particular attention to the key subsystems of the modern reduction
stepper, such as light sources, illuminators, reduction lenses, and the wafer stage.
Alignment systems will be discussed in more detail in Chapter 6. In all of these
discussions, the viewpoint will be that of the user.
5.1 Overview
Prior to the advent of wafer steppers, circuit patterns were transferred from masks
to wafers by contact or proximity printing, or by using full-wafer scanners. In
contact printing, a mask that had the pattern for all chips on the wafer was
brought into contact with a resist-coated wafer. The mask was illuminated, thereby
exposing the resist under the clear features on the mask. This method of exposure
was used in the earliest days of the semiconductor electronics industry, but
the mechanical contact caused defects on both the mask and wafer, reducing
productivity. Proximity printing, where masks and wafers were brought close to
each other, but not into contact, was one approach to reducing the problem of
defects that arises with contact printing. Unfortunately, resolution was poor when
the gap between the mask and the wafer was a practical size, as a consequence of
diffraction. The first workable solution to this problem was the full-wafer scanner,
which also used a mask that contained the patterns of all chips that were transferred
1:1 to the wafer.1 More detail on tools for patterning wafers before the introduction
of wafer steppers is presented in Section 5.9.
The most common method for making the masks used for contact, proximity,
or full-wafer scanning lithography involved a photorepeater.2 The photorepeater
had a stage on which an unpatterned mask could be placed and moved precisely
(Fig. 5.1). A reduction lens was used to image the pattern of a single chip onto
the resist-coated mask blank, exposing the mask one chip at a time and using the
precise stage to move the mask between exposures. In order to distinguish between
the mask being made with multiple chips and the master object containing the
pattern of only one chip, the single-chip mask was called a reticle. Eventually it
occurred to someone to eliminate the intermediate mask and exposure tool and
essentially use a photorepeater to expose the wafers directly. Thus, the wafer
stepper was born.
A cutaway view of a modern wafer stepper is shown in Fig. 5.2 (see Color
Plates). This view shows all the major subsystems: reduction lens and illuminator,
excimer laser light source, wafer stage, reticle stage, wafer cassettes, and operator
workstation. In this particular figure, wafers are being taken from cassettes. In
the configuration more typical of high-performance lithography, resist-processing
equipment is interfaced directly to the exposure tool. Resist-coated wafers are taken
from the resist-processing equipment [or the input cassette, standard mechanical
interface (SMIF) pod, or front-opening unified pod (FOUP)] and placed on a
prealignment station, where the wafers are oriented with respect to the notch (or
flat) and centered mechanically. The wafers are then transferred onto a very flat
vacuum-exposure chuck that sits on a stage whose position can be controlled with
extreme precision. The wafer stage is discussed in more detail later in this chapter.
Once the wafer is placed on the exposure chuck, it is aligned by automatic
systems that detect wafer targets optically and move the stage in small increments
to correct the wafer position with respect to the ideal image field. Alignment
systems enable the overlay of new patterns to circuit patterns that already exist
on the wafer. Prior to exposing each field, the wafer is positioned in the vertical
axis by an autofocus system, which in modern steppers also includes the capability
to pivot the vacuum chuck that holds the wafer during exposure in order to reduce
any net tilt in the wafer surface due to chuck or wafer flatness errors.3,4 Autofocus
systems are discussed in more detail later in this chapter. Once the wafer is properly
positioned and brought into focus, a shutter in the illumination system is opened,
and the resist is exposed in the first exposure field. After exposing the first field,
the shutter is closed and the wafer is moved into position to expose the next field.
This process is repeated until the entire wafer is exposed. This repetitive process
led to the “step-and-repeat” designation for this tool, or “stepper” for short. The
exposed wafer is then moved to an output cassette (or SMIF pod or FOUP) or back
to interfaced resist tracks for additional processing.
The wafer stepper was introduced commercially in the late 1970s by the GCA
Corporation of North Andover, Massachusetts, based on a photorepeater that they
were already producing.5 GCA’s first system, the DSW4800, had been preceded
by systems designed and built by several semiconductor companies, including
Philips, Thomson CSF, and IBM,6–8 who built steppers for their own use and
did not sell them commercially. (The Philips stepper technology was eventually
commercialized by ASM Lithography.) The DSW in the name of GCA’s stepper
referred to Direct Step on the Wafer, an allusion to the stepper’s origin in mask
making. The GCA DSW4800 stepper, which handled 3-, 4-, or 5-in. wafers, was
equipped with a 10× reduction, 0.28-NA, g-line lens that could pattern a maximum
square field of 10 × 10 mm. The lens was supplied by Carl Zeiss of Oberkochen,
Germany. The stepper could achieve overlay of ±0.7 µm and a resolution of 1.25-
µm lines and spaces over 1.5-µm depth-of-focus. Its list price was about $300,000.
Following GCA, several companies began to produce wafer steppers and sell
them commercially. The major stepper suppliers today for fine-line patterning
are Nikon, Canon, and ASM Lithography (ASML). The GCA operations were
absorbed by Ultratech Stepper. Ultratech has a significant share of the market
for bump and packaging applications. Modern steppers have considerably greater
capability than the original GCA DSW4800, but the operating principles are
essentially the same. Leading-edge systems also come in the step-and-scan
configuration, as well as in the step-and-repeat format. The defining characteristics
of these two configurations were discussed in Chapter 1. The characteristics of
contemporary step-and-scan systems are listed in Table 5.1, in comparison to the
DSW4800. Modern systems provide much greater overlay and imaging capability
than GCA’s first machine. There has also been a substantial improvement in
the productivity of exposure tools over time, a topic that is discussed further in
Chapter 11.
Table 5.1 Commercially available step-and-scan 193-nm systems, compared to the first
commercially available wafer stepper. The relationship between the number of shots and
throughput is discussed further in Chapter 11.
(Columns: GCA, Nikon, ASML.)
Figure 5.3 White light is broken into a spectrum of color by a prism. Typically, the
refractive index of glass increases as wavelengths get shorter, increasing refraction for
shorter-wavelength light. In lenses, this causes imaging to be wavelength dependent, and
compensation for this effect is required in lenses for semiconductor lithography in order to
achieve high resolution.
Lithography is practiced at only a few
particular standard wavelengths, since additional technology that depends upon the
wavelength is required. For example, resist manufacturers need to develop resists
that perform optimally at the specific wavelengths used,10 and pellicles (discussed
in Chapter 7) need to be optimized for the specific wavelengths that are used. The
limited number of wavelengths at which lithography is practiced enables R&D
resources to be well focused.
Mercury-arc lamps and excimer lasers have been the sources of actinic light
for nearly all projection photolithography. (Radiation that can induce chemical
reactions in photoresists is termed actinic.) The lines of the mercury spectrum
and excimer lasing wavelengths used in lithography are listed in Table 5.2. The
mercury-arc lamp has three intense spectral lines in the blue and ultraviolet
portions of the electromagnetic spectrum (Table 5.2), along with some continuum
emission in between these spectral lines. The first commercially available wafer
stepper, the GCA DSW4800, operated at the mercury g line, as did the first steppers
built by Nikon, Canon, ASML, and TRE. (TRE, an early supplier of wafer
steppers, later changed its name to ASET and discontinued operations in
the early 1990s.) These were based upon the same concepts as the GCA DSW4800.
The mercury g line is blue light (λ = 436 nm) in the visible part of the spectrum.
As mentioned previously (and to be discussed in more detail in Section 5.4), it
has proven possible to design stepper lenses that meet the extreme resolution
and field size requirements of microlithography only over a very narrow range
of wavelengths. For mercury arc lamp–based systems, this has been over the
bandwidths of the arc lamps, on the order of 4–6 nm. These bandwidths are much
larger than the natural bandwidths of mercury atomic emissions, because of the
collision (pressure) and thermal (Doppler) broadening11 that can be considerable
in a high-pressure arc lamp operating at temperatures approaching 2000 ◦ C.
Some systems built by Ultratech image over somewhat broader ranges of
wavelengths (390–450 nm), but these have resolutions limited to 0.75 µm or larger.
This use of multiple wavelengths has significant advantages in terms of reducing
standing-wave effects,12 and the Ultratech steppers have been used effectively
for lithography at ≥1.0-µm feature sizes. Unfortunately, the broadband Ultratech
lens design prints the reticle at a 1:1 ratio.13 There is no reduction, and reticle
quality has largely prevented the use of Ultratech steppers for critical applications
in deep submicron lithography. Besides the lenses from Ultratech, there were a
few other lenses that imaged at both the g and h line,14 but these had limited
acceptance. While the mercury h line was used on a few wafer steppers,15 most
stepper manufacturers made a transition to i-line lithography in the late 1980s
as the need arose to print submicron features, while maintaining depths-of-focus
≥1.0 µm.16 I-line lithography dominated leading-edge lithography until the advent
of deep-ultraviolet (DUV) lithography in the mid-1990s.
There is a strong band of DUV emission (λ = 240–255 nm) from mercury-
xenon arc lamps, and these were used on early DUV exposure tools, such as the
Micrascan I and Micrascan II from SVGL. (SVGL was acquired by ASML in
2001.) Most DUV systems today use excimer lasers as light sources, as did higher-
resolution versions of the Micrascan platform. The bandwidth requirements for
these light sources are usually much less than 1.0 picometer (pm). Excimer lasers
are considered in more detail, following a discussion of arc lamps.
A mercury-arc lamp is illustrated in Fig. 5.4. A fused silica bulb is filled through
a tip with a small quantity of mercury and argon or mercury and xenon. After
filling, the bulb is sealed. Operation is initiated by applying a high-frequency high
voltage (>10 kV) across the two electrodes, ionizing the inert gas. The resulting
discharge causes the mercury to evaporate, and the mercury begins to contribute to
the discharge. The plasma, being electrically conducting, cannot support the high
voltage, so the voltage drops, and steady lamp output is maintained by operating the
lamp at constant current at relatively low dc voltages (50–150 V). High voltages
are needed only to ignite the plasma. Condensation of the mercury onto the cooler
walls of the bulb near the electrodes is inhibited by reflecting coatings. Pressure
inside the bulb can exceed 30 atm during operation,17 and catastrophic failure is
always a concern. The electrodes are made of refractory metals, such as tungsten,
in order to withstand the internal temperatures that can be as high as 2000 ◦ C.
Thorium coatings are often used to reduce the electrode work functions and provide
electrons to the plasma more easily.
Figure 5.5 Configuration for excimer laser light sources, where the lasers are placed far
from the stepper.
In KrF excimer lasers, excited dimers are created by placing a strong electric
field across a gas mixture containing Kr, F2 , and Ne.22 Early excimer lasers
required voltages >20 kV, and many components failed from high-voltage
breakdown.23 A high voltage is used to produce an electrical discharge, which,
in turn, drives the reactions which ultimately result in lasing (see Table 5.3).24
Reliability has improved through laser designs that require lower voltages, in
the range of 12–15 kV, to produce the electrical discharge, and these lower
voltages reduce the stress on all electrical components.25 Excimer lasers produce
light in pulses, at rates up to several kilohertz. There has been a transition from
thyratron-based discharging electronics to solid-state electronics, and this has also
contributed to improvements in laser reliability. Excimer lasers have matured
considerably since their first use on commercially available wafer steppers in
198826 to the point where they are sufficiently reliable for industrial use. This
improvement in excimer light sources has played a critical role in bringing
DUV lithography to production worthiness. To appreciate the degree of reliability
required for use in manufacturing, consider that modern KrF and ArF excimer
lasers are capable of pulsing at 6 kHz. With a duty factor of only 10%, this
represents 18.9 billion pulses a year.
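The pulse-count arithmetic above is easy to verify. A minimal sketch, using the numbers given in the text:

```python
# Pulse count for a 6-kHz excimer laser running at a 10% duty factor for
# one year, matching the figure quoted in the text.
rep_rate_hz = 6000
duty_factor = 0.10
seconds_per_year = 365 * 24 * 3600

pulses_per_year = rep_rate_hz * duty_factor * seconds_per_year
print(f"{pulses_per_year / 1e9:.1f} billion pulses per year")  # 18.9
```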
The unnarrowed fluorescence spectrum for the KrF emission is shown in
Fig. 5.6.27 There is sufficient gain over only part of this spectrum for lasing, about
400 pm. Freely running ArF lasers have similarly broad bandwidths, with full-
width half-maxima of about 450 pm. These bandwidths are much too large for use
with all-refractive optics and require narrowing for catadioptric lenses as well.
Excimer lasers consist of several subsystems, shown in Fig. 5.7. A high
repetition rate is desirable for these pulsed light systems, and excimer-laser
suppliers have improved the available rates from 200 Hz28,29 to 4 kHz,30,31 and
now to 6 kHz.32,33 The higher rates allow for high doses in short times without
requiring high-peak light intensities, and this reduces damage to optical elements.
Measurements for stepper self-metrology take place no faster than permitted by the
excimer-laser frequency, so stepper set-up time is reduced with high-repetition-rate
lasers.
Table 5.3 (excerpt) Kinetic reactions in the KrF excimer laser discharge:
Kr + e− → Kr∗ + e−    Two-step positive krypton production
Kr∗ + e− → Kr+ + 2e−
F + F + Ne → F2 + Ne    Recombination

After several years of improvement, typical (time-averaged) wafer-plane
intensities for DUV steppers are now comparable to those achieved on i-line
steppers (2000 mW/cm2 or more). Further improvements for DUV systems are
limited by glass damage that occurs with higher peak power and difficulties
in increasing the excimer-laser repetition rate beyond 6 kHz. A high level of
atomic fluorine in the discharge region of an excimer laser can cause arcing
and other instabilities, so the by-products from one pulse (see Table 5.3) must
be removed from between the electrodes and replenished with fresh gas before
another pulse can be fired. The repetition rate is limited by the ability to
exchange the gas between the electrodes. Faster exchange rates place significantly
greater requirements on fans and motors, with attendant concerns for reliability
degradation. In principle, glass damage can be reduced by “stretching” the pulses
from their unstretched length of 25–30 nsec,34 reducing the peak energy while
maintaining the total-integrated energy.35 Another advantage of temporally longer
pulses is a reduction in bandwidth.36 An example of a stretched pulse is shown in
Fig. 5.8.
Because laser-pulse intensity does not evolve symmetrically in time, the temporal
pulse length needs a definition. The most commonly used definition is the
integral-square pulse duration.
Figure 5.8 Excimer-laser power versus time for a normal and stretched pulse from a
Cymer ArF excimer laser.37
This definition is used because it is the pulse duration that is most relevant to the
issue of glass damage.38
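For reference, the integral-square pulse duration is conventionally defined as follows (this standard form is supplied from the general excimer-laser literature, not transcribed from the truncated passage):

```latex
\tau_{\mathrm{is}} = \frac{\left[\int P(t)\,dt\right]^{2}}{\int P(t)^{2}\,dt},
```

where P(t) is the instantaneous laser power; for a rectangular pulse this reduces to the ordinary pulse length, while for a stretched, lower-peak pulse of the same energy it is longer.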
The requirements for the laser gases cause the cost of installation of excimer
laser steppers to increase relative to costs for arc-lamp systems (Table 5.4),
particularly since the fluorine gases have safety requirements that must be
addressed. An innovative solid source for fluorine addresses this safety issue.39
Since fluorine leads to the etching of silicon dioxide, the windows of the excimer
laser are typically made of calcium fluoride, a material that is discussed in more
detail in a later section. Laser gases must be very pure, as impurities will cause
degradation of laser performance. Since fluorine is a primary excimer laser gas,
special materials must be used for handling this very chemically reactive gas,
such as Teflon® and Viton®, along with ceramic and pure metal materials for
insulators and seals, and improved methods for cleaning parts.
The step-and-scan configuration places more stringent requirements on the
excimer lasers than do step-and-repeat machines. There is a direct relationship
between the maximum scan speed and the number of pulses required to achieve
a specified dose:
W_s = V_m n / f ,    (5.2)
where W s is the slit width, Vm is the maximum wafer-stage scan speed, n is the
minimum number of pulses required to achieve the specified dose, and f is the
laser repetition rate.45,46 The minimum number of pulses used to achieve the dose
is also related to dose control. The standard deviation of the dose σ_D is related to
the pulse-to-pulse variation σ_P−P by
σ_D = σ_P−P / √n .    (5.3)
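Equations (5.2) and (5.3) are easy to exercise numerically. The sketch below uses illustrative values that are assumptions, not numbers from the text: an 8-mm slit, a 6-kHz laser, a minimum of 50 pulses per dose, and 1% pulse-to-pulse energy variation.

```python
import math

# Illustrative numbers, not values from the text: an 8-mm slit, a 6-kHz
# laser, and a minimum of 50 pulses to deliver the specified dose.
W_s = 8.0e-3        # slit width (m)
f = 6000.0          # laser repetition rate (Hz)
n = 50              # minimum number of pulses for the dose

# Eq. (5.2) rearranged for the maximum wafer-stage scan speed.
V_m = W_s * f / n
print(f"max scan speed: {V_m * 1e3:.0f} mm/s")

# Eq. (5.3): dose repeatability improves as 1/sqrt(n).
sigma_pp = 0.01     # assumed 1% pulse-to-pulse energy variation
sigma_D = sigma_pp / math.sqrt(n)
print(f"dose sigma: {sigma_D * 100:.2f}% of dose")
```

Note the trade-off the text describes: raising n tightens dose control but, for fixed f, lowers the permitted scan speed.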
Various dispersive optical elements have been used to narrow the bandwidth,
principally gratings50 and Fabry-Perot etalons (discussed below), or combinations
of both. Prisms are sufficient to narrow the bandwidth for moderate-NA
catadioptric systems,51 while extremely high-NA catadioptric lenses still have
very tight bandwidth requirements (<0.5 pm FWHM). The different options for
narrowing laser bandwidths are
illustrated in Fig. 5.9.
Etalons are based upon the transmission properties of light through a transparent
plane-parallel plate. If the reflectance from an individual surface of the plate is R,
then the transmitted intensity through the plate It , normalized to the incident light
intensity Ii , is given by52
I_t / I_i = 1 / [1 + F sin²(δ/2)] ,    (5.4)
where
δ = (4π/λ) n t cos θ ,    (5.5)
with n and t the refractive index and thickness of the plate, and θ the angle of the light within the plate,
and
F = 4R / (1 − R)² .    (5.6)
Figure 5.10 Transmitted light intensity versus δ through an etalon. δ is given by Eq. (5.5).
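Equations (5.4) to (5.6) are straightforward to evaluate numerically. A minimal sketch (the 90% surface reflectance is an assumed illustrative value, not one from the text) shows the sharp transmission peaks that make etalons useful for line narrowing:

```python
import math

def etalon_transmission(R, delta):
    """Airy function of Eq. (5.4): I_t/I_i = 1/(1 + F*sin^2(delta/2)),
    with F = 4R/(1 - R)^2 from Eq. (5.6)."""
    F = 4.0 * R / (1.0 - R) ** 2
    return 1.0 / (1.0 + F * math.sin(delta / 2.0) ** 2)

R = 0.9  # illustrative surface reflectance, not a value from the text
T_peak = etalon_transmission(R, 2.0 * math.pi)    # delta at a transmission peak
T_valley = etalon_transmission(R, math.pi)        # delta halfway between peaks
print(f"peak: {T_peak:.3f}, valley: {T_valley:.5f}")
```

Higher surface reflectance increases F, which narrows the transmission peaks and deepens the suppression between them.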
One of the problems with early excimer lasers was maintaining absolute wavelength
calibration. The gratings and Fabry-Perot etalons used to narrow the bandwidth
require a level of mechanical stability to maintain a fixed wavelength that is
not achievable in practice, so active calibration against an absolute wavelength
reference is needed. Modern KrF lasers use an internal iron reference, with several
wavelengths in the region of 248.3 nm, to establish absolute calibration.57 Single-
isotope mercury sources have also been used.58 For ArF lasers, there is a
carbon spectral line at 193.0905 nm that is used to establish absolute wavelength
calibration.59
Bandwidth specifications for excimer lasers for lithographic applications were
originally specified in terms of full-width half-maximum. Over time it became
recognized that image degradation could be caused by wavelengths on the “tails” of
the bandwidth distribution. It was also found that linewidth variations correlate
better with a parameter referred to as E95, which is the bandwidth in
which 95% of the light energy is contained, than with the FWHM metric.60 As a
consequence, bandwidth specifications for the latest generations of excimer lasers
for lithographic applications are given in terms of E95.
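Because the difference between E95 and FWHM is driven by the spectral tails, a simple numeric illustration helps. The Lorentzian line shape used here is an assumed model spectrum, chosen because its heavy tails make the point vividly; it is not a shape given in the text:

```python
import math

# E95 vs. FWHM for a Lorentzian line shape (an assumed model spectrum).
gamma = 1.0              # half-width at half-maximum
fwhm = 2.0 * gamma

def fraction_within(x):
    # Fraction of a Lorentzian's total energy lying within +/- x of center.
    return (2.0 / math.pi) * math.atan(x / gamma)

# Bisect for the half-width containing 95% of the energy.
lo, hi = 0.0, 1.0e4
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if fraction_within(mid) < 0.95:
        lo = mid
    else:
        hi = mid
e95 = 2.0 * hi           # full E95 width
print(f"E95 / FWHM = {e95 / fwhm:.1f}")  # tails dominate the energy spread
```

For this line shape E95 is more than an order of magnitude larger than the FWHM, which is why two lasers with identical FWHM specifications can image quite differently.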
Modern exposure tools impose conflicting requirements on lasers. Improved
imaging requires illumination with narrower bandwidths, for the reasons discussed
earlier in this section. There is also strong economic pressure for lasers to
have high-energy output in order to achieve high exposure-tool productivity (see
Chapter 11). It is very difficult to obtain high-laser-energy output at very narrow
wavelengths, particularly for ArF lasers, which are intrinsically less efficient than
KrF lasers. One approach to addressing both requirements is the technique of
injection locking.61,62 In this approach a laser pulse is generated in a seed laser,
or master oscillator. This pulse then travels through a second laser chamber,
where the power of the signal is amplified. The critical beam characteristics,
such as bandwidth, are determined largely by the master oscillator, while power
is generated by the power amplifier. Injection-locking technology has existed for
many years, but its application to lithography has been inhibited by the high spatial
coherence of the light produced by traditional injection-locking systems.63
energy, and pulse duration. For more detail on excimer lasers for lithographic
applications, refer to the excellent review paper by Das and Sandstrom.68
Figure 5.12 Schematic of a fly’s eye array for producing uniform light intensity in an
illuminator.
With uniform illumination over the entire slit, good dose control on step-and-
scan systems requires the same number of pulses for every exposure. To avoid a
dose quantization problem, the illumination is usually nonuniform in the direction
of the scan on step-and-scan systems, a trapezoidal shape being typical (Fig. 5.13).
The illuminator must also control the dose. This requires real-time measurement
of the exposure. A typical method involves the use of a beamsplitter to pick off a
small fraction (≤1%) of the light, and the intensity is measured (Fig. 5.14). In a
step-and-repeat system, the light is integrated from the time the shutter is opened,
and the shutter is closed when the correct exposure dose is achieved. For a step-
and-scan system, the feedback is used to control the scanning speed. Constant dose
on the wafer is maintained by decreasing the scanning speed of both the reticle and
wafer by the same percentage to compensate for decreases in laser- or lamp-light
output.
The use of a light-intensity monitor that is integral to the illuminator requires
calibration of this internal intensity monitor to the light intensity that is actually
achieved on the wafer. The light at the wafer plane is affected by transmission of
the lens and those portions of the illuminator that are between the light monitor
and the reticle. Most steppers have a light-intensity detector on the wafer stage,
enabling periodic checks of the dose-control system. The calibration of the detector
Figure 5.14 Part of the light from the source is measured in order to control the exposure
dose. The particular control system shown in this figure applies to step-and-repeat systems.
In step-and-scan systems, the feedback loop involves the scan speed.
on the stage requires the use of an external energy monitor, and these are available
from a number of commercial suppliers. The National Institute of Standards and
Technology (NIST) provides absolute calibration capability.75
The illuminator also establishes the partial coherence (σ) of the light for
conventional illumination and generates various forms of off-axis illumination
as well. (Off-axis illumination is discussed further in Chapter 8.) The subject of
coherence is discussed briefly in Appendix A. On modern wafer steppers, σ is a
user-controllable parameter.76,77 The geometry that controls σ is shown in Fig. A.3
of Appendix A. One simple way to modulate σ is to use an aperture that regulates
the light incident on the reticle. However, apertures reduce the total light intensity,
thereby diminishing exposure-tool productivity. Modern illuminators have been
designed to redirect the light collected from the light source to allow for variable
partial coherence without significant reductions in light intensity when low values
of σ are selected.78 For example, the angle of the cone of light that illuminates the
reticle can be modulated by the use of zoom optics, as shown in Fig. 5.15. Similarly,
light can be directed into an annulus by using a conical shaped lens element,
known as an axicon.79 A cross section of an axicon is illustrated in Fig. 5.16.
As shown in Fig. A.3, modifying the cone angle of the illumination changes the
partial coherence of the light. In addition to using refraction to redirect light within
the illuminator, diffractive optical elements (DOE) can also be used, and they are
useful for generating complex illumination shapes, such as will be discussed in
Chapter 8.
The effect of partial coherence on imaging is shown in Fig. 5.17. Complete
coherence leads to oscillations in the light-intensity profile that result from the
interference of light waves, but the edges of the profiles are steeper than those
obtained with less coherent light. Sharper profiles are obtained with small values
of sigma (σ ≤ 0.4) and are beneficial for patterning features such as contacts82 that
need the best optical profiles possible. On the other hand, it has been found that
Figure 5.15 Illustration of zoom optics,80 which can be used to modify the partial
coherence of the illumination.81
Figure 5.17 Variation of light-intensity profiles with partial coherence. The images were
for 300-nm nominal features, calculated using PROLITH (version 1.5), for an aberration-
free 0.6-NA lens, and imaging at a wavelength of 248 nm.
better depth-of-focus of line and space patterns is obtained with higher values of
sigma (σ ≥ 0.6),83 which are often used on gate and metal layers.
As might be expected from Fig. 5.17, linewidths are modulated by the partial
coherence. This effect is shown in more detail in Fig. 5.18, where linewidth is
given as a function of partial coherence. As can be concluded from the data shown
in Fig. 5.18, not only must illumination intensity be uniform over the exposure
field, so must partial coherence84 in order to maintain uniformity of linewidths
across the exposure field.
As is discussed in detail in Chapter 8, the depth-of-focus can be enhanced
in certain circumstances by illuminating the reticle with a ring or
annulus of light brought in at oblique angles. As with partial coherence,
illuminators are designed to produce off-axis illumination
without significantly reducing the light intensity, by using optical elements such
as axicons.78,85
In another implementation of off-axis illumination, quadrupole illumination, the
reticle is illuminated with four separate beams.86 For quadrupole illumination to
work properly, all four quads of the illumination must have the same light intensity.
This is a more stringent requirement than having uniform light intensity overall.87
Figure 5.18 Linewidth as a function of sigma, given by calculated aerial images for isolated
180-nm lines, with optical parameters of 0.7 NA and a wavelength of 248 nm.
accompanying larger values of N. However, the largest field on the wafer equals the
largest quality area on the reticle demagnified by N. To sustain field sizes adequate
to handle expected chip sizes (see Table 5.6) while maintaining practically sized
reticles, and to provide adequate stepper throughput, the lens-reduction factor N
has been kept to modest values, such as 5× and 4×. The factors that determine the
optimum lens-reduction factor are given in Table 5.7.
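The field-size arithmetic is simple to check. The 132 mm x 104 mm usable reticle area assumed below is an illustrative figure for a 6-in. reticle, not a number taken from the text:

```python
# The wafer field is the usable reticle area demagnified by the lens-reduction
# factor N. The 132 mm x 104 mm usable area assumed here is an illustrative
# figure for a 6-in. reticle, not a number taken from the text.
reticle_w_mm, reticle_h_mm = 132.0, 104.0
N = 4  # a typical modern reduction factor, per the text
field = (reticle_w_mm / N, reticle_h_mm / N)
print(field)  # (33.0, 26.0)
```

Doubling N would ease reticle fabrication but halve each field dimension, which is the tension summarized in Table 5.7.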
Most stepper lenses with numerical apertures of ≤0.93 are refractive, meaning
that the optical elements are all made of transmitting glass. Microlithographic
reduction-lens design has generally followed the “double-Gaussian” form92 first
described by Glatzel93 of Carl Zeiss. An example of a modern stepper lens design
is shown in Fig. 5.19. Early stepper lenses weighed about 10 lbs. and resembled
camera lenses. Indeed, the stepper is a type of camera, in which a picture of the
reticle is taken and printed on a film composed of photoresist. It is not surprising
Figure 5.19 The lens drawing shows one embodiment of the invention (DE19855108, filed
Nov. 30, 1998) that the lens designers from Carl Zeiss generated within the framework of a
design and feasibility study related to modern optical-projection systems for microlithography
at 248 nm with high numerical aperture (Courtesy of Winfried Kaiser of Carl Zeiss).
Table 5.6 Expected chip sizes at the start of production, from the 2008 International
Technology Roadmap for Semiconductors.* Applications-specific ICs are expected to fill
available field sizes. Memory chips tend to have 2:1 aspect ratios, while microprocessor
aspect ratios are closer to 1:1. For many years exposure field size requirements were driven
by DRAM chip sizes, but logic now determines how big field sizes need to be.
Table 5.7 column headings: reasons that compel a large lens-reduction factor; reasons that may compel a small lens-reduction factor.
that the three largest manufacturers of stepper lenses—Nikon, Canon, and Carl
Zeiss—have their roots in making cameras. The modern refractive lens has 20–30
glass elements held firmly in a steel cylindrical jacket. The lens may be a meter
in length and weigh 500 kg or more. Photographs of KrF lenses are shown in
Fig. 5.20.
Refractive lenses typically provide good imaging over a narrow band of
wavelengths. Image formation by lenses relies on the phenomenon of refraction,
where rays of light are bent at the interface between materials with different optical
properties. Referring to Fig. 5.21, light rays that have an angle of incidence θ1 will
refract to the angle θ2 according to Snell’s Law:52
sin θ_1 / sin θ_2 = n_2 / n_1 ,    (5.7)
Figure 5.20 Two KrF lenses: (a) 0.82-NA KrF lens from Nikon, and (b) 0.8-NA lens from
Canon.
Figure 5.21 Refraction of light at an interface. The medium above the line has an index of
refraction of n1 , and the medium below the line has an index of refraction of n2 .
The bandwidths over which lenses operate tend to decrease along with wavelengths
and feature sizes. Consequently, the lenses used in microlithography, particularly
those used for imaging features of ≤65 nm, operate over narrow bandwidths.
Lenses can be made to operate over wider bandwidths when there is more
than one optical material available, because the lens designer has an additional
degree of freedom to use in designing lenses. G-line and i-line lenses can operate
over the bandwidths produced by mercury-arc lamps, or with slight narrowing,
typically 4–6 nm (FWHM). For many years, fused silica was the only available
material with sufficient transparency and quality for making 248-nm lenses, and
such lenses have needed highly narrowed light sources, typically <1 pm (FWHM).
This is a much smaller bandwidth than needs to be supported at g-line and i-
line wavelengths, where there are several different glass materials that can be
used for making chromatic corrections in the lenses. To meet the needs of ArF
lithography, crystalline CaF2 with adequate quality and in pieces of sufficient size
has become available. This material can work down to 157 nm and somewhat
shorter wavelengths. Lenses for 193-nm lithography typically use both fused silica
and a small quantity of CaF2 . In spite of their shorter wavelengths and higher
resolution, lenses for 193-nm lithography have approximately the same bandwidth
requirements as the single-material 248-nm lenses of comparable NAs, because a
second lens material is used.
For good overall lens transmission, the materials used for the lens elements must
be extremely transparent. The transmission properties of various types of optical
glass are shown in Fig. 5.22.94–96 Today, fused silica and calcium fluoride are the
only suitable materials available in sufficient quantity for use at wavelengths below
350 nm, and these materials, too, have their limits. Lenses for 193-nm lithography
usually include CaF2 as an optical material97 because of its excellent transparency
at 193 nm and its resistance to optical damage (a problem that is discussed shortly).
BaF2 was being studied as a possible material for 157-nm lithography and 193-nm
liquid immersion lithography (to be discussed in Chapter 10).
For 248-nm and 193-nm lithography, it is practical to have lenses that operate
over very narrow ranges of wavelength (<1 pm), because adequate intensity from
excimer lasers can be obtained even over such narrow bandwidths. On the other
hand, i-line lenses need to operate over much wider bandwidths of between 4
and 6 nm. At shorter wavelengths, ingredients that change the glass’s index of
refraction also absorb the light. In stepper lenses this absorption is small and does
not have a major impact on lens transmissions, but there are consequences. Lenses
containing absorbing materials heat up and expand during exposure, causing
defocus,98 changes in magnification, and other aberrations. Focus is the parameter
affected most by lens heating, with some measurable impact on magnification.
Changes to the other aberrations are typically small, but are nonnegligible for
very low-k1 lithography. This heating often occurs with i-line lenses, and a similar
problem occurs for 193-nm lenses, because fused silica becomes slightly absorbing
Figure 5.22 The transmission of 10-mm-thick samples of materials used in the lenses of
wafer steppers. Surface losses are removed. The Ohara and Schott materials are suitable
for i-line lenses, but are too absorbing at DUV wavelengths.
at that wavelength (unlike what happens at 248 nm, where fused silica is extremely
transparent).
Stepper software is required to compensate for the effects of lens heating.99
Manipulators are used to adjust lens-element positions actively to correct for
changes induced by lens heating. The amount of light transmitted into the lens
is determined by the patterns on the masks. Less light is transmitted through a
contact mask than a typical gate-layer mask. The light that actually enters the lens
is also affected by feature sizes, since much of the light can be diffracted out of
the entrance pupil of the projection optics. Diffraction may also cause nonuniform
lens heating, reflecting the nonuniformity of the mask’s diffraction pattern. Just as
the lens will heat up during exposure, it will cool when the stepper is idle, and the
stepper must correct for the cooling as well.
Transparency is a necessary but insufficient requirement for a material to be
appropriate for the optics used in microlithography. Many crystalline materials
exhibit birefringence, where the index of refraction is different along various
crystal axes, and such materials are generally not suitable for use in high-resolution
optical systems. For example, SiO2 in its purely crystalline form, quartz, is not
easily incorporated into lenses, while amorphous fused silica is a standard optical
material because the crystalline form of SiO2 has dissimilar behavior among
different optical axes. Other materials, such as MgF2 and LiF, have adequate
transparency at 193-nm and 157-nm wavelengths, but are too birefringent for
use in high-resolution lenses. CaF2 has negligible intrinsic birefringence at long
wavelengths, but at wavelengths as short as 193 nm, this birefringence can
no longer be ignored.100 Fortunately, the birefringence in CaF2 can largely
be corrected by using different [111]-oriented CaF2 lens elements with their
crystallographic axes rotated 60 deg relative to each other, and also by including
pairs of elements of [100]-oriented material (the two elements in the pair rotated 45
deg with respect to each other).101 Barium fluoride has also been considered as a
material for stepper lenses, but it has a much higher level of intrinsic birefringence
than CaF2 .102 Indeed, the intrinsic birefringence of BaF2 at a wavelength of 193
nm is larger than the birefringence of CaF2 at a wavelength of 157 nm.
Low levels of birefringence, sufficient to affect imaging at levels relevant for
microlithography, can be induced by mechanical stress in materials that are not
birefringent in their perfect crystalline or amorphous state.103 Such stresses are
created if there are large thermal inhomogeneities in the furnaces in which the
CaF2 crystals or boules of fused silica are grown. Residual stress birefringence is
one factor that limits the yield of CaF2 suitable for use in stepper lenses.
The transition to 193-nm lithography necessitated the addition of another lens
material, or the use of mirrors as well as glass lenses in the design, in order to
maintain practical laser bandwidths. A lens with both reflective and refractive
elements is referred to as catadioptric. In addition to fused silica, 193-nm lenses
often use CaF2 , which has significantly different physical properties than fused
silica, such as being much softer. Because CaF2 has long been used in television
camera lenses, lens makers have had some prior experience working with
this material, albeit not at the critical levels required for the lenses used in
where τ is the pulse length (nsec), N is the number of pulses, I is the energy
density (J/cm2 ) per pulse, and k1 , k2 , a, and b are fitting parameters that vary
from sample to sample. Parameters obtained by measuring the change in refractive
index over the course of 40 billion pulses are given in Table 5.8. The first term on
the right hand side of Eq. (5.8) represents rarefaction, so k1 is negative (or zero),
while the second term represents compaction. The two processes of compaction
and decompaction compensate each other to some extent. The degree to which
this occurs is dependent upon material preparation, and the fused silica which is
used for ArF stepper optics has a balance between the compaction and rarefaction
mechanisms. Lens lifetimes can be estimated by solving Problem 5.7.
Densification and/or rarefaction can change the refractive index and imaging
properties of the optical components at a level significant for lithographic optical
Table 5.8 Parameters for Eq. (5.8), for changes to the index of refraction at a wavelength
of 632.8 nm, which tracks the index of refraction at 193 nm.107
Sample A Sample B Sample C
systems. After extensive use, compaction and/or rarefaction can render 193-nm
optics unsuitable for high-resolution lithography. The details of the manufacturing
process for the fused silica material can influence the rate of compaction, and
grades of fused silica adequate for use in 193-nm projection optics have been
developed. In the illuminators of 193-nm systems, where high light intensities
occur, appreciable amounts of CaF2 are needed.
As can be seen in Eq. (5.8), the compaction and rarefaction terms have different
functional dependencies. Optics lifetime can be maximized by optimizing peak
power and pulse duration. Since the intensity differs at every element in the
optical path, careful studies must be done for each lens design. Since light
distributions within lenses depends on the diffraction patterns resulting from
particular combinations of mask layouts and illumination, lens-lifetime studies
must also take into account how the exposure tools will be used.
In general, some light reflects from the interface between two optically different
materials. Light propagating through a semi-infinite medium with an index of
refraction n1 and normally incident onto a planar interface with another semi-infinite
medium that has an index of refraction n2 has a fraction of its energy reflected
away from the interface:
R = [(n_2 − n_1) / (n_2 + n_1)]² .    (5.9)
The index of refraction of air ≈ 1, while the indices of refraction for the
glass materials used in lithography, at the wavelengths of interest, are ∼1.5.
Consequently, approximately 4% of the light is reflected at each interface. For
a lens with 30 optical elements (60 interfaces) the overall transmission loss due
to these reflections would be over 90% with such simple air-glass interfaces. To
avoid such a large loss of transmission, with the resulting diminution of stepper
throughput and productivity, antireflection coatings are applied to the surfaces of
each optical element. The physics of such coatings is identical to that of the top
antireflection coatings used in resist processing and discussed in Chapter 4. As
seen from Eq. (4.40), these coatings perform optimally at certain wavelengths, and
this is another reason why refractive lenses should be used over a narrow band of
wavelengths. On the other hand, associated with the use of exposure systems with
monochromatic illumination, there is the susceptibility of the process to thin-film
optical effects, a topic that was discussed in detail in Chapter 4. It has proven better
to avoid compromising the projection optics and to find solutions to the problems
created by thin-film optical effects.
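The per-interface and cumulative transmission figures above can be checked with a short sketch of Eq. (5.9), assuming n = 1.0 for air, n = 1.5 for glass, and 60 uncoated interfaces, and neglecting absorption and multiple reflections:

```python
def fresnel_reflectance(n1: float, n2: float) -> float:
    """Normal-incidence reflectance at a planar interface, Eq. (5.9)."""
    return ((n2 - n1) / (n2 + n1)) ** 2

def lens_transmission(n_air: float, n_glass: float, interfaces: int) -> float:
    """Fraction of light transmitted through a series of uncoated interfaces,
    neglecting absorption and multiple reflections."""
    R = fresnel_reflectance(n_air, n_glass)
    return (1.0 - R) ** interfaces

R = fresnel_reflectance(1.0, 1.5)    # 4% reflected per interface
T = lens_transmission(1.0, 1.5, 60)  # 30 elements -> 60 interfaces
print(f"R per interface = {R:.3f}, total transmission = {T:.3f}")
# prints: R per interface = 0.040, total transmission = 0.086
```

The 0.086 transmission corresponds to the greater-than-90% loss cited in the text.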
Because of high fluences, optical coatings need to be developed that are resistant
to laser-induced damage. This is particularly true for antireflection coatings in
the illuminator where fluences are high, and it also applies to high-reflectance
coatings.43 While fluences are lower in the projection optics, hence the potential
for damage is less, the sensitivity to damage is greater for the stepper lens, causing
repair costs to be much higher than for illuminators and lasers. In all cases, coatings
174 Chapter 5
Figure 5.23 Flare resulting from reflections and scattering within lens elements.
Wafer Steppers 175
The image contrast

C = (Imax − Imin)/(Imax + Imin)    (5.10)

becomes, with flare level δ,

C = [(Imax + δ) − (Imin + δ)]/[(Imax + δ) + (Imin + δ)]    (5.11)

  = (Imax − Imin)/(Imax + Imin + 2δ).    (5.12)
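Equations (5.10)–(5.12) can be sketched directly; the intensity and flare values below are illustrative only:

```python
def contrast(i_max: float, i_min: float, flare: float = 0.0) -> float:
    """Image contrast, Eq. (5.10), with a uniform flare level (delta) added to
    both the maximum and minimum intensities, Eqs. (5.11)-(5.12)."""
    return (i_max - i_min) / (i_max + i_min + 2.0 * flare)

print(contrast(1.0, 0.1))        # contrast with no flare
print(contrast(1.0, 0.1, 0.05))  # a 5% flare level reduces the contrast
```

Because flare adds equally to Imax and Imin, it leaves the numerator unchanged and only inflates the denominator, which is why contrast always degrades.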
Figure 5.24 Four-mirror design with NA = 0.14 and λ = 13.4 nm, designed for use in EUV
lithography.
Ultratech steppers use what is known as the Wynne–Dyson design for lenses.114
These comprise a family of catadioptric 1:1 lenses capable of imaging over
fairly broad ranges of wavelength. Early Ultratech steppers used the part of the
mercury spectrum that included both the g- and h-lines, providing the user with the
capability to minimize standing-wave effects. A more recent model, the Titan III,
uses illumination over the spectrum from 390 to 450 nm, and provides a resolution
of 0.75 µm.115 Unfortunately, it has proven difficult to extend this type of lens from
1:1 to reduction formats.
In addition to being the first commercially available step-and-scan system, the
Micrascan I used a catadioptric lens116 that did provide image reduction (4:1).
However, the Micrascan I used ring-field optics that were difficult to fabricate and
maintain, and extension to higher numerical aperture was problematic. (Ring-field
optics are discussed in Section 5.9.) To address the problem of aligning the optical
elements, an innovative lens design117–119 was introduced with the Micrascan
II and extended to the Micrascan III (Fig. 5.25).120 This lens incorporated a
beamsplitter that addressed the problem of aligning the optical elements when
curved mirrors were involved, by leading to a lens configuration that had only two
orthogonal axes. This greatly simplified mechanical alignment of the lens elements,
thereby permitting the extension of catadioptric lenses to higher NAs (0.50 for
Micrascan II and 0.60 for Micrascan III, compared to 0.357 on Micrascan I). In
the Micrascan II and III, polarized light from the reticle passes through several
refractive lens elements and is reflected by a flat folding mirror before entering
the polarizing beamsplitter. The light is reflected from the beamsplitter to a curved
mirror, returns through the beamsplitter, and is finally imaged onto the wafer. The
use of the aspheric mirror element allows a designed bandwidth of 4 nm for the
Micrascan II, as compared to the ≤1-pm bandwidth for an all-fused silica reduction
stepper lens at 248 nm. Thus the Micrascan II used a mercury-arc lamp for the
illumination source, as did the Micrascan I. A mercury-arc lamp, however, would
be far too weak to use with an all-refractive lens design, which requires that most
of the light be filtered away. As a corollary to being able to image well at a wider
range of wavelengths, catadioptric lenses also have greater stability with respect to
temperature and pressure changes than is possible with all-refractive lens systems.
Figure 5.25 Schematic of a 4×, NA = 0.60, λ = 248-nm catadioptric lens used in the
Micrascan III, based on the patented design of Williamson.119
                    All-refractive      Catadioptric
Focus
  Temperature       −5.01 µm/°C         −0.04 µm/°C
  Pressure          0.243 µm/mm Hg      <0.01 µm/mm Hg
  Wavelength        0.24 µm/pm          <0.01 µm/pm
Magnification
  Temperature       16.5 ppm/°C         0.2 ppm/°C
  Pressure          −0.8 ppm/mm Hg      <0.1 ppm/mm Hg
  Wavelength        −0.8 ppm/pm         <0.1 ppm/pm
Figure 5.26 Lens designs considered by Nikon for NA ≥ 1.3, designed to operate at a
wavelength of 193 nm.122 (a) Two-axis and three-mirror type, (b) uniaxis and four-mirror type,
(c) three-axis and three-mirror type, and (d) uniaxis and two-aspheric-mirror type. Nikon
selected design (a) for their S610C scanner. How numerical apertures >1 are achieved is
discussed in Chapter 10.
Lens performance can fall short of the diffraction limit. Deviations from
diffraction-limited performance—aberrations—were discussed in the previous
chapter and can arise from a number of sources. Lens designs may be inadequate,
and even when the lenses are designed nearly to perfection, fabrication will be
imperfect. Important issues in lens fabrication are the purity, homogeneity, and
spectral transmission of glass materials; the precision to which spherical and
aspheric surfaces can be ground and polished; and the centration and spacing of
elements.123
To begin, optical materials must be uniform in terms of refractive index
and transmission. To meet these requirements, the lens materials, when being
grown in a furnace, must be subjected to very gradual changes in temperature;
otherwise, thermal gradients lead to variations in density. Induced stress also leads
to birefringence. The necessary quality levels have resulted in prices for optical
glass that have reached thousands of dollars per kilogram, while crystalline CaF2
is much more expensive.
Once the individual lens elements are produced, they must be mounted properly.
Critical machining and assembly are required to ensure the correct spacing between
lens elements and to make certain that they share, as closely as possible, a common
optical axis. Securing the elements to the lens housing, to ensure that they do not
shift, often requires adhesives. It is important that the curing of these adhesives
does not cause the lens elements to shift and does not induce stress.124 Adhesives
for lithography lenses must also have low levels of outgassing and must be stable
to stray UV and DUV light.
For the lowest-aberration lenses, all sources of stress that can distort optical
elements must be eliminated. Mounts for the lens elements have been carefully
designed to minimize deformation.125 A careful balance must be maintained while
securing the lens elements so they do not move in an undesired way or induce
undue stress.
Lens quality is assured through extensive metrology. Optical materials are
measured for homogeneity in refractive index down to the parts-per-million level,
and birefringence is measured as well. Transmission must also be uniform and
greater than minimum required values. After polishing, the surfaces of individual
lens elements are carefully measured and repolished until they meet specifications.
Many lens manufacturers use phase-measuring interferometry (PMI) to determine
the form of the emerging wavefront from the assembled lens, information that
can then be compared to the diffraction-limited wavefront in order to measure the
residual aberrations. The PMI was developed by Bruning and coworkers at AT&T
Bell Laboratories.126
The need for reduced aberrations motivated stepper manufacturers to introduce
the step-and-scan configuration in spite of its greater mechanical complexity.
Because only a small part of the reticle is imaged at any one instant with a
step-and-scan, a smaller part of the lens field is utilized. The part of the lens
with the smallest aberrations, least distortion, best imaging, and tightest linewidth
control is selected for use. The move to step-and-scan provides improvements
in overlay (as a consequence of reduced lens-placement errors and improved
ability to adjust for reticle registration errors) and better linewidth control, without
requiring significant improvements in lens technology. In addition to being able
to select the part of the field with fewest aberrations, there are improvements
in across-field uniformity that result from the averaging effect of scanning. The
placement of an object on the wafer is the result of the lens-placement errors
averaged over the part of the lens through which it is imaged. Since lenses are
judged by the worst placement errors, averaging always has an ameliorating effect.
Step-and-scan also provides improved compensation for reticle errors, because
magnification is adjusted independently in the directions parallel and perpendicular
to the scan,127,128 but other issues can arise when doing this, as will be discussed
in Section 5.7. Similarly, critical dimension uniformity is paramount in lenses, and
this is always improved by averaging. These are all general properties of step-and-
scan, independent of exposure wavelength, but step-and-scan has appeared first on
DUV systems for critical layer applications, although i-line step-and-scan systems
are also available.129
This use of a restricted field in step-and-scan has important implications. Since
only a rectangular strip of the lens field is used, the elements are rotated with
respect to each other, or “clocked,” to find the best image performance. The whole
lens, in fact, can be clocked. This tends to make final assembly of the lens go
more quickly and results in better performance. In addition, many aberrations are
proportional to powers of the field diameter and grow steeply near the field edges,
which is avoided when using the restricted field. Overall, the lens performance
is improved for both CD control and image placement when a restricted field is
scanned rather than using the full image field.
Even though lenses are fabricated with extremely low levels of aberrations, it has
proven necessary to have active controls on the lenses to maintain good imaging.
As noted previously, lens heating and changes in barometric pressure require active
compensation of focus and magnification. As imaging requirements have gotten
tighter, it has proven necessary to compensate actively for other aberrations, such
as spherical aberration.130 This compensation is accomplished in several ways.
In state-of-the-art scanner lenses, the position of a number of lens elements is
controlled actively using actuators.131 To address the problem of asymmetric lens
heating due to the nonuniform diffraction patterns, Nikon has incorporated infrared
lasers to heat the lenses in those portions of the lens not heated by the actinic light
(Fig. 5.27).
Step-and-scan also provides cost-effective very large field sizes. Consider a lens
with a 31.1-mm diameter. The largest square field that is imaged by such a lens is
22 × 22 mm. A very wide, but short, field can also be imaged (Fig. 5.28). In step-
and-repeat systems, such fields are not very useful, since most integrated circuits
do not fit in such a field. On the other hand, such a short imaging field can be
used on step-and-scan machines, where the height of the printed field is obtained
by scanning. Thus, the same 31.1-mm diameter lens that could produce only a
22 × 22-mm maximum square on a step-and-repeat system could print a 30-mm-
wide field using an 8-mm-tall slit. Slit heights on commercially available systems
are shown in Table 5.10.
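The field-geometry arithmetic in the example above can be sketched as follows, using the 31.1-mm lens field and 8-mm slit from the text:

```python
import math

def max_square_side(field_diameter: float) -> float:
    """Side of the largest square inscribed in a circular image field."""
    return field_diameter / math.sqrt(2.0)

def slit_width(field_diameter: float, slit_height: float) -> float:
    """Width of the widest rectangle of a given height inscribed in the field."""
    return math.sqrt(field_diameter**2 - slit_height**2)

d = 31.1  # lens image-field diameter, mm
print(f"largest square: {max_square_side(d):.1f} mm")   # prints 22.0 mm
print(f"8-mm slit width: {slit_width(d, 8.0):.1f} mm")  # prints 30.1 mm
```

Both results follow from the Pythagorean theorem: the square's diagonal, and the slit's diagonal, must fit within the circular field diameter.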
The height of the field is determined by the scanning stages, lens reduction,
and the size of the reticle. The stage technology exists for scanning up to 50 mm
Figure 5.28 Lens field utilization for a lens with a circular image field. For step-and-repeat
systems, the large square area would be used, while the wide and short area of the lens
field would be used for step-and-scan machines.
For several device generations, lens manufacturers have been able to improve
both NA and field size while maintaining diffraction-limited performance, as is
illustrated in Tables 5.1 and 5.11. Figure 5.29 shows the steady progression of lens
pixel count (the maximum exposure area divided by the area of a square that is one
minimum feature size on a side) for 248-nm lenses from five manufacturers (Zeiss,
Nikon, Canon, Tropel, and SVGL).
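The pixel count defined above (maximum exposure area divided by the area of a minimum-feature square) is a one-line computation. The field and feature sizes in this sketch are illustrative assumptions, not values taken from Fig. 5.29:

```python
def pixel_count(field_w_mm: float, field_h_mm: float, feature_nm: float) -> float:
    """Maximum exposure area divided by the area of a square that is one
    minimum feature size on a side."""
    area = (field_w_mm * 1e-3) * (field_h_mm * 1e-3)  # field area, m^2
    pixel = (feature_nm * 1e-9) ** 2                  # one-pixel area, m^2
    return area / pixel

# Illustrative: a 26 x 33-mm scanned field at a 130-nm minimum feature size
print(f"{pixel_count(26.0, 33.0, 130.0):.2e}")  # prints 5.08e+10
```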
Increases in the pixel count, whether resulting from increases in numerical
aperture or field size, generally require larger lenses. This was seen in Fig. 2.11.
Over time, higher resolution and larger field sizes have been required, leading to
increases in lens size. The trend in lens weight is shown in Fig. 5.30. To follow
the pace required by the IC industry, lens manufacturers have had to use more
glass elements and elements with ever greater diameters (∼300-mm maximum).
Wide-field i-line lenses were once considered expensive (∼$500,000), but leading-
edge KrF lenses are several times more expensive due to the cost of the optical-
quality fused silica glass and more stringent requirements for surface figure and
finish, as well as coating uniformity. ArF lenses are even more expensive than KrF
ones. The fraction of stepper cost represented by the lens has been increasing, and
today represents one-third or more of the total system cost. Also, as the size of the
lens increases, the stepper body must add more dynamic compensation in order to
hold the lens in a vibration-free environment. The cost of microlithographic lenses
Figure 5.29 Progression of lens pixel counts versus time for 248-nm lenses.
detail in Chapter 10, where catadioptric designs provided the only practical
solutions for NA > 1.1. Having gained experience with catadioptric lenses for
immersion lithographic applications, ASML introduced a 0.93-NA KrF exposure
system, the model XT:1000H, which incorporates a catadioptric lens to reduce the
number of elements in the lens and to lower costs.141
Aspheric lens elements have also been introduced into microlithographic lenses,
and these have enabled a reduction in the size and weight of stepper lenses. When
a concave surface is moved against a convex surface in a random fashion, spherical
surfaces are naturally produced by abrasion (see Fig. 5.31). This forms the basis
for standard lens-fabrication methods, which produce lens elements with surfaces
that are spherical.142 Such spherical elements necessarily introduce aberrations
that must be compensated by additional elements. The introduction of aspherical
elements enabled the number and size of lens elements to be reduced (see Fig. 5.32
in Color Plates). However, it is necessary to use polishing tools that cover only
a small area of the lens element in order to create aspheric surfaces, and this can
easily lead to surface roughness over short distances, even though the general figure
of the lens element is close to the desired shape. This short-distance roughness can
lead to flare. Aspheric lens technology has many potential benefits, particularly
reducing the number of lens elements, but extra care is required to maintain low
flare with lenses that incorporate a large number of aspheric elements.
Finally, it should be pointed out that resolution is independent of lens reduction,
a factor that does not appear in any of the equations for resolution. In properly
designed microscopes there is a natural relationship between resolution and
magnification, so the two tend to be associated. The historical transition from
full-wafer scanning systems to higher-resolution steppers involved an increase in
the reduction ratio of the optics, but this increase was not required to achieve
the higher resolution. In lithographic equipment, resolution and magnification are
independent quantities. Very high resolution 1× magnification systems have been
built.143
Figure 5.32 The benefit of aspherical lens elements (see Color Plates).
the correct distance from the lens, where good imaging occurs. Because wafer
thickness varies by considerably more than the depth-of-focus, direct detection
of the top surface of the wafer is a practical necessity, in conjunction with servo
mechanisms and appropriate feedback systems. The separation between the lens
and the wafer is measured prior to each exposure or during each scan, and the
wafer height is adjusted with the objective of maintaining a constant lens-to-wafer
distance. (On some older steppers it was the lens height that was changed.)
Because of concerns over defects, the detection of the top surface must be
performed without mechanical contact with the wafer. Several methods have been
devised for detecting the wafer surface, and the basic concepts behind these
methods are now discussed. Three methods for focusing wafers are used on
steppers: optical, capacitance, and pressure.
The optical method, illustrated in Fig. 5.33(a), is the technique used most
commonly for focusing.144–147 In this approach, light with a glancing angle of
incidence is focused onto the substrate. The reflected light hits the detector
at different positions, depending upon the vertical position of the substrate. In
Fig. 5.33(a), light reflecting off a wafer at the position indicated by the solid line
hits the detector at a different position than light that reflects from a wafer at the
position indicated by the dashed line. Detectors that distinguish the position at
which the light strikes the detector can be used to measure focus. The degree to
which the optical method is insensitive to substrate film type and topography is
critically dependent upon the system design.148 For example, the optical system
may detect the strongest reflecting surface, and the position of the resist film will be
Figure 5.33 Schematic representations of different methods for focusing exposure tools.
(a) Optical focusing, (b) pressure sensors for focusing, and (c) capacitance gauges for
focusing.
found at a different distance from the lens if it is sitting on top of a thick oxide film
rather than a bare silicon substrate. Process engineers must empirically determine
if their combination of steppers and substrates creates such a sensitivity. Substrates
comprised of metal and thick oxide films, as typically encountered in the back end
of wafer processing, are most commonly susceptible to focus errors.149 Since metal
films are highly reflective, this is not too surprising. Minimizing the angle θ and
using multiple wavelengths provides the least sensitivity to films on the wafer
surface.150,151 Autofocus errors are also reduced when the plane containing the
incident and reflected autofocus beams is at 45 deg relative to substrate features.152
Two other methods of locating the wafer surface are also shown in Fig. 5.33. In
one method, pressurized air is forced into a block that contains a pressure sensor.153
The pressure in the block depends upon the gap between the block and the wafer
surface. By mounting such blocks on the sides of lenses, the separations between
lenses and wafers are measured. This method is sensitive only to the top surface
of the resist film and is independent of substrate composition. It does require
calibration to the ambient barometric pressure, and with this type of focus sensor,
special effort is required to measure the height of the wafer surface directly below
the lens during exposure, since the sensor would interfere with the imaging.154
Such air gauge systems have been used as the primary autofocus systems on
exposure tools, as well as for calibrating optical focus measurement systems.155
In one assessment, focus errors on product wafers were reduced by 20–30 nm (3σ)
when the optical focus system was calibrated with an air gauge.156
Capacitance sensors have also been used to measure the lens-to-wafer
distance.157 A capacitor is created where one plate is the wafer [Fig. 5.33(c)].
Capacitance C is given by
C = εA/d,    (5.13)
where A is the area of the capacitor, d is the separation between the plates, and ε is
the dielectric constant of the material between the plates. The separation between
the lens and wafer is determined by a measurement of capacitance because of
its dependence on d. Like the optical method, this technique is sensitive to the
composition of the films on the silicon wafers because of the dependence of the
capacitance on ε.
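Inverting Eq. (5.13) gives the gap from a measured capacitance. In this sketch the sensor area and the capacitance reading are arbitrary illustrative values:

```python
EPS0 = 8.854e-12  # vacuum permittivity, F/m

def gap_from_capacitance(c_farads: float, area_m2: float,
                         eps_r: float = 1.0) -> float:
    """Plate separation d = eps*A/C for a parallel-plate capacitance gauge,
    obtained by inverting Eq. (5.13)."""
    return eps_r * EPS0 * area_m2 / c_farads

# Illustrative: a 1-cm^2 sensor reading 0.9 pF in air
d = gap_from_capacitance(0.9e-12, 1e-4)
print(f"gap = {d * 1e6:.0f} um")  # prints: gap = 984 um
```

The dependence on eps_r in the sketch reflects the sensitivity to film composition noted in the text.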
Since it is necessary to have the entire exposed area at the correct distance from
the lens, the tilt of the wafer surface must also be measured and corrected. This can
be accomplished in a number of ways. The focus systems described above can be
used to measure exposure fields at multiple positions, providing the data required to
level the exposure field. In another approach, parallel beams of light are reflected
from the surface of the wafer. Their angle of reflection is a measure of the local
surface tilt.
Figure 5.34 The Michelson laser interferometer used to measure the X position of the
stage.
and frequency f1 is reflected off a mirror mounted on the wafer stage, while the
other beam with frequency f2 is reflected to a fixed reference mirror. Because the
wafer stage may be moving, the frequency f1 will be Doppler shifted by an amount
∆ f , given by159
∆f = 2 f1 (v/c) = 2v/λ1,    (5.14)
where v is the velocity of the stage, c is the speed of light, and λ1 is the wavelength
of the light incident on the moving mirror.
After reflections, the two beams of light are recombined at a heterodyne receiver
capable of measuring the difference frequency f2 − f1 ± ∆ f . Maximum measurable
velocity is set by the requirement that ∆ f < f2 − f1 . Comparing this to a reference
measurement of f2 − f1 enables a determination of the stage velocity through
Eq. (5.14). By integration of the velocity, the stage position is determined.160–162
Systems today typically use interferometers that potentially measure position with
λ/1024 ≈ 0.6-nm precision.
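The Doppler relation of Eq. (5.14), and position obtained by integrating velocity, can be sketched as below. A 632.8-nm HeNe wavelength is assumed, and the stage speed and sampling values are illustrative:

```python
def doppler_shift(v: float, wavelength: float) -> float:
    """Doppler shift of the beam reflected from the moving stage mirror,
    Eq. (5.14): delta_f = 2*v/lambda."""
    return 2.0 * v / wavelength

def position_from_velocity(velocities, dt: float) -> float:
    """Stage position obtained by integrating sampled velocity over time."""
    return sum(v * dt for v in velocities)

LAMBDA = 632.8e-9  # HeNe laser wavelength, m
print(f"{doppler_shift(0.5, LAMBDA) / 1e6:.2f} MHz")  # 0.5-m/s stage speed
print(f"resolution = {LAMBDA / 1024 * 1e9:.2f} nm")   # lambda/1024, ~0.6 nm
```

Note that λ/1024 of the HeNe wavelength gives the ≈0.6-nm measurement precision quoted in the text.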
Although the interferometer is capable of measuring a position that is a small
fraction of the wavelength of the laser light, the stage precision has always fallen
short of such capability. The best current stages can step a Cartesian grid accurately
to about 5 nm and are repeatable to about 2 nm (3σ), far less precise than λ/1024 =
0.6 nm. A simple way to measure stage precision is to use pattern structures to
measure overlay on the wafer. Typical types of such structures are discussed in
Chapter 9. The first part of the structure is stepped across the wafer, and then the
second part of the structure is stepped, so that the two structures are concentric if
the stepping is perfectly precise. The discrepancy between interferometer precision
and actual stage performance is a consequence of changes in the air through which
the interferometer beams travel, which effectively change the wavelength of the
light.
The effective wavelength of the light in air can vary because of environmental
changes (Table 5.12). Since the density of air changes nonuniformly along the path
of interferometer beams, the phase of the light is shifted, resulting in measurement
errors. These fluctuations in density result primarily from localized sources of
heat or motion. Interferometer errors due to temperature- or pressure-induced
fluctuations in the optical path length of the air surrounding the wafer stage are
reduced by shielding the path from air currents and thermal gradients. Stepper
designers go to great lengths to maintain temperature uniformity throughout the
environmental enclosure of the steppers, in order to avoid air turbulence induced
Table 5.12 Environmental changes affecting the effective wavelength of light in air:
    Air temperature: 1 °C
    Pressure: 2.5 mm Hg
    Relative humidity: 80%
by thermal gradients. All sources of heat that can be remotely operated are removed
from the enclosure. Heat-generating parts that must be in the enclosure, such
as stage motors, are covered with temperature-controlled jackets.164 However,
since the stage must be able to move rapidly and freely within a large area,
it is impossible to shield the entire path length. Lis has described an air
turbulence–compensated interferometer that could potentially reduce the error due
to environmental fluctuations,165 but requirements appear to have exceeded the
improvements that such a system could bring.
In order to meet the very tight overlay requirements of sub-40-nm half-pitch
processes, it is necessary to fundamentally address the problem of air turbulence
and reduce the length of the path through air in which the light of the metrology
system travels. The latest generation of exposure tools makes use of encoders to
decrease such path lengths substantially (to ≤2 mm), thereby reducing turbulence
control and compensation requirements. Encoders make use of gratings, from
which light beams are diffracted, as illustrated in Fig. 5.35, where the grating is
placed on the wafer stage. Alternatively, the grating could be above the stage, with
the laser and detection electronics incorporated into the stage itself.
With encoders, the stage position is measured by having the +1st- and −1st-order
diffracted beams recombined.166,167 As the stage moves, the resulting fringe pattern
also shifts (see Fig. 5.36). This fringe pattern can be measured with precisions
of 1 nm or better.168 Unfortunately, it is not yet possible to produce gratings that
are very accurate over long distances, so exposure tools that use encoders must
still have interferometers for calibrating the encoders and ensuring orthogonality
between the X and Y stage motions. As an historical note, an early commercially
available wafer stepper from a company called Optimetrix used an encoder to
measure stage position, but did not also have a laser interferometer. Lack of stage
accuracy contributed to Optimetrix’s inability to gain substantial market share.
Wafer stages need to move in more than one direction, so there are
interferometer beams to measure motion in the Y axis as well as the X axis. More
Figure 5.35 Illustration of the encoder concept used to measure stage position. Above the
grating is a module that contains the laser, as well as optics and electronics for detecting
the diffracted beams. The effects of air turbulence are minimized by making the distance
between this module and the grating short. Two-dimensional gratings can be used to provide
positional information in both the X and Y directions.
Figure 5.36 Illustration of phase differences between the +1st- and −1st-order diffracted
beams as the stage moves. These phase differences lead to changes in the fringe pattern
produced by recombining the diffracted beams, and these changes are used to measure
stage position.169
than one beam in the X or Y axis enables measurement of stage yaw or rotation170
(Fig. 5.37). This is important because stages can rotate, as well as undergo
rectilinear motion. Modern stages have as many as six or more interferometer
beams to control rotation about the X, Y, and Z axes. This is important to minimize
the impact of Abbe errors that result from stage rotations, as illustrated in Fig. 5.38.
A single interferometer beam measures the distance traveled by the point on the
stage mirror where the beam is reflected, not how far the wafer center has moved.
These can be different because of stage rotation.
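To first order, the Abbe error described above is the stage rotation multiplied by the offset between the interferometer measurement axis and the point of interest. A minimal sketch with illustrative numbers:

```python
def abbe_error(rotation_rad: float, offset_m: float) -> float:
    """Position error at a point offset from the interferometer measurement
    axis when the stage rotates by a small angle (small-angle approximation)."""
    return rotation_rad * offset_m

# Illustrative: 1 urad of stage yaw with a 10-mm Abbe offset
err = abbe_error(1e-6, 10e-3)
print(f"Abbe error = {err * 1e9:.0f} nm")  # prints: Abbe error = 10 nm
```

Even a microradian of yaw matters at these offsets, which is why modern stages use multiple beams to measure and control rotation about all three axes.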
Not only must the stage position be measurable to high precision, it must
be controllable. This is accomplished by means of precision motors, bearings,
and feedback loops. In order to provide smooth motion, the wafer stage must
slide on some type of low-friction bearing. The original GCA system had Rulon
feet that slid on highly polished metal surfaces. Oil was applied periodically to
maintain smooth motion. Other stages have incorporated roller, air,171 or magnetic
bearings.172,173 Air bearings, riding on ultraflat granite or steel surfaces, are
commonly used.174
Even though near-perfect stage precision might be obtained, stage motion still
may not be accurate for a number of reasons. One reason might be cosine error,
illustrated in Fig. 5.39. In this situation, the stage travel is not parallel to the
interferometer beam. The result is an error proportional to the cosine of the angle
between the line of stage travel and the interferometer beam, i.e., the path of the
interferometer beams is 2d cos θ rather than 2d, where d is the distance that the
stage actually travels. Cosine errors effectively change the scale by which distance
is measured by a factor of cos θ. Systematic position errors due to deviations
from ideal flatness of the stage mirrors, at least relative to a reference, can be
characterized and compensated for using software.
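The size of the cosine-error scale factor can be quantified with a short sketch; the travel distance and misalignment angle below are illustrative:

```python
import math

def cosine_error(distance_m: float, angle_rad: float) -> float:
    """Difference between the actual stage travel d and the distance
    d*cos(theta) registered by an interferometer misaligned by theta."""
    return distance_m * (1.0 - math.cos(angle_rad))

# Illustrative: 100 mm of travel with a 1-mrad beam misalignment
err = cosine_error(100e-3, 1e-3)
print(f"cosine error = {err * 1e9:.0f} nm")  # about d*theta^2/2 = 50 nm
```

For small angles the error grows quadratically with the misalignment, so modest alignment improvements pay off rapidly.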
In order to achieve high stepper throughput, the stages must be capable of
moving at high velocity while maintaining good control of position. Acceleration
and jerk (the third time derivative of position) capability must also be high (see
Table 5.13). This is difficult to achieve because stages have grown massive in
order to accommodate large wafers, resist vibrations, and remain stiff, yet they
must have the low mass needed for high acceleration. Since the imaging
performance of the stepper can be severely affected by vibrations,175 the rapid
movements of the wafer stage must be effectively decoupled from the optical
5.7 Scanning
In a step-and-scan exposure tool, it is necessary that the reticle stage scanning be
synchronized with the wafer stage scanning. The optical image of a feature k may
be misplaced because of imperfect scanning or synchronization between the reticle
and wafer stages. If the placement error in the x direction at time ti is ∆xk(ti), then
the resulting feature in resist will be misplaced by an amount
∆xk = Σi ∆xk(ti),    (5.15)
where the sum is over all times during which the particular feature is being imaged
through the slit of the lens. For a scanner with an excimer laser light source, each time
ti will correspond to a laser pulse. For scanners using arc-lamp light sources there
are analogous equations involving integrals. It will be assumed for the remainder of
the discussion in this section that the light source is pulsed. Since not all position
errors ∆xk(ti) will be identical, there will also be variations in the positions of the
optical images of a specific feature k about its average position:
σk = √{(1/N) Σ(i=1..N) [∆xk(ti) − ∆x̄k]²},    (5.16)
where there are N pulses during the scan and ∆x̄k is the mean of the ∆xk(ti). There
are usually specifications for the quantities given in Eqs. (5.15) and (5.16) for a
particular model of scanner, expressed as statistics of the sets of values {∆xk} and
{σk}. The specified tool moving average (MA) is usually given as the mean + 3σ for
the set {∆xk}, while the moving standard deviation (MSD) is the mean + 3σ for the
set {σk}.
A nonzero MA implies that there will be pattern placement errors, contributing
to imperfect overlay (discussed in the next chapter), while the MSD will cause
images to be blurred (see Fig. 5.40). This blurring has been called image
“fading.”134 For advanced exposure tools, MA < 1 nm, while MSD < 3 nm, and
perhaps even below 2 nm.181–183
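The MA/MSD statistics can be sketched as below, following Eqs. (5.15) and (5.16) as given in the text. The per-pulse placement errors are synthetic illustrative data, not measurements:

```python
import statistics

def feature_stats(errors):
    """Per-feature quantities of Eqs. (5.15) and (5.16): the summed placement
    error and the standard deviation of per-pulse errors about their mean."""
    total = sum(errors)                # Eq. (5.15)
    sigma = statistics.pstdev(errors)  # Eq. (5.16), population std dev
    return total, sigma

def mean_plus_3sigma(values):
    """Mean + 3*sigma statistic used for the MA and MSD specifications."""
    return statistics.mean(values) + 3.0 * statistics.pstdev(values)

# Synthetic per-pulse placement errors (nm) for three features
features = [[0.1, -0.2, 0.05], [0.0, 0.1, -0.1], [0.2, 0.1, 0.0]]
totals, sigmas = zip(*(feature_stats(f) for f in features))
print(f"MA  = {mean_plus_3sigma(totals):.2f} nm")
print(f"MSD = {mean_plus_3sigma(sigmas):.2f} nm")
```

As the text notes, a nonzero MA shifts the printed feature (an overlay error), while the MSD blurs it (image fading).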
There are also systematic errors that can cause image fading. For example,
fading occurs if there is a mismatch between the lens reduction and the scanning
speed. For an explanation of this, suppose the reticle is moving at a speed vR and
the wafer is moving at a speed vW . During a scan of time t, the aerial image moves
a distance vR t/N at the wafer plane, where N is the lens reduction. During this same
period of time, a point on the wafer moves a distance vW t; therefore the amount of
blur caused during the scan is
[(vR/N) − vW] t.    (5.17)
As shown in the next chapter, the quantity in brackets is nonzero when there is
an asymmetric magnification or skew error. Thus, there is systematic image fading
when there is an asymmetric overlay error.
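The scan-mismatch blur of Eq. (5.17) is a one-line calculation; the speeds, reduction ratio, and scan time below are illustrative assumptions:

```python
def fading_blur(v_reticle: float, v_wafer: float,
                reduction: float, t_scan: float) -> float:
    """Image blur from a mismatch between reticle and wafer scan speeds,
    Eq. (5.17): (v_R/N - v_W) * t."""
    return (v_reticle / reduction - v_wafer) * t_scan

# Illustrative: 4x reduction, 1.0-m/s reticle scan, a 1-ppm wafer-speed
# error, and a 0.1-s scan
blur = fading_blur(1.0, 0.25 * (1 + 1e-6), 4.0, 0.1)
print(f"blur = {abs(blur) * 1e9:.0f} nm")  # prints: blur = 25 nm
```

Even a part-per-million speed mismatch produces tens of nanometers of fading over a full scan, which is why synchronization tolerances are so tight.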
Figure 5.40 Image fading from imperfect synchronization between the reticle and wafer-
stage scanning that leads to the aerial image not staying at a fixed position on the wafer
during the scan. The result is image blurring.
Figure 5.41 Simulated image contours for 100-nm lines and spaces, with no random
stage synchronization errors and with MSD in both the X and Y directions, having random
Gaussian distribution, with σ = 25 nm. The parameters for the imaging are a wavelength of
193 nm, NA = 0.8, and quadrupole illumination. Note that line shortening is affected more
than linewidths, and lines are shortened more with fading.
Wafer Steppers 195
Fig. 5.40, image fading will cause an increase in linewidth. Since asymmetric
magnification errors affect only lines parallel to the scan, such errors lead to
a bias in linewidth between horizontal and vertical lines.185 Thus, one should
be cautious when applying asymmetric overlay corrections, as might be done to
correct for errors on reticles. Because linewidth bias varies quadratically with
fading (Fig. 5.42), small asymmetric field terms can be employed without affecting
linewidths significantly.
Figure 5.42 Change in linewidth as a function of image fading for lines in a 5-bar structure.
The measure for the abscissa is the amount of fading across the scan due to asymmetric
magnification [Eq. (5.17)].184
Figure 5.45 Calculated diffraction pattern just below a mask surface. Partially coherent
light with the wavelengths of 365 nm, 405 nm, and 436 nm, with a cone angle of incidence
of 8 deg was assumed. (a) 1-µm mask feature and (b) 5-µm mask feature.188
cause defects on the wafers. After each wafer printing the mask would pick up
additional particles from the wafer, or get scratched, and the mask would then need
to be replaced after being used to expose only a few wafers, in order to maintain
yield.
In order to reduce mask costs and improve yields, proximity printing was
introduced, where the mask was positioned a few tens of microns above the
resist surface (Fig. 5.46). By not coming directly into contact with the wafer, the
problems of wafer defects and mask degradation were both addressed. However,
having a gap between the resist and the mask reduced resolution, as can be seen
in Fig. 5.45. As the distance from the mask is increased, the light distribution
broadens due to diffraction, and edge acuity degrades. A useful expression that
estimates the resolution of proximity printing as a function of the gap is189
$$\text{resolution} = k_P \sqrt{\lambda\left(g + \frac{T}{2}\right)}, \qquad (5.18)$$
where λ is the wavelength of the light, g is the gap between the mask and wafer, T
is the resist thickness, and kP is a parameter analogous to k1 in optical-projection
lithography. Resolution versus the size of the gap is shown in Fig. 5.47.
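A quick evaluation of Eq. (5.18) is sketched below; the value $k_P = 1$ and the 1-µm resist thickness are assumptions chosen for illustration only.

```python
import math

def proximity_resolution_um(wavelength_um, gap_um, resist_um, k_p=1.0):
    """Eq. (5.18): resolution = k_P * sqrt(lambda * (g + T/2)).

    k_p = 1 is an assumed, illustrative value (analogous to k1 in
    projection lithography).
    """
    return k_p * math.sqrt(wavelength_um * (gap_um + resist_um / 2))

# i-line (365 nm) exposure, 20-um proximity gap, 1-um-thick resist:
r = proximity_resolution_um(0.365, 20.0, 1.0)
print(f"{r:.2f} um")
```

For these assumed inputs the estimate comes out near 2.7 µm, consistent in magnitude with the <3-µm proximity-printing specification quoted for the mask aligner of Fig. 5.48.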
A schematic of a commercially available proximity and contact printer is shown
in Fig. 5.48. Such systems can provide both proximity and contact printing.
Automatic wafer handling, alignment capability, and gap control are also available
on contemporary proximity printers.
Because of the defects associated with contact printing and the resolution limits
and difficulties of maintaining linewidth control with proximity printing, optical
Figure 5.47 Resolution versus gap for proximity printing, assuming a wavelength of
0.365 µm.
Figure 5.48 Exposure units of a SUSS MicroTec Mask Aligner. For this system, resolution
is specified as <3 µm for proximity printing with a gap of 20 µm and <0.8 µm with vacuum-
assisted contact printing.190
Figure 5.49 Schematic of the optics of a Perkin-Elmer scanner. The mask and wafer are
scanned together.
Problems
5.1 Suppose that 0.5% (3σ) dose control is required from excimer laser pulse-
to-pulse stability. If the single pulse-to-pulse stability of the laser is 2.5%
(3σ), show that the maximum scan speed consistent with the dose control
requirement is 960 mm/sec for a scanner with a slit height of 4 mm.
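The arithmetic behind Problem 5.1 can be checked as follows. Note that the 6-kHz laser repetition rate used here is an assumption, not stated in the problem as printed:

```python
# Averaging over N pulses reduces pulse-to-pulse dose noise by sqrt(N), so
# reaching 0.5% (3-sigma) from 2.5% single-pulse stability needs N = 25.
n_pulses = (2.5 / 0.5) ** 2

# With a 4-mm slit, every point on the wafer must remain inside the slit
# for at least N pulses. Assuming a 6-kHz repetition rate:
rep_rate_hz = 6000
v_max = 4.0 * rep_rate_hz / n_pulses  # mm/s
print(n_pulses, v_max)
```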
5.2 For a laser interferometer with a maximum Zeeman frequency split of f2 − f1 =
4 MHz, show that the maximum stage velocity that can be measured is 1266
mm/sec.
5.3 If the heterodyne receiver in a laser interferometer is capable of measuring frequency changes as small as 10 Hz, show that the slowest velocity that such a system can measure is 3.16 × 10−3 mm/sec.
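The velocity limits in Problems 5.2 and 5.3 follow from the Doppler shift 2v/λ of the measurement beam. The HeNe wavelength of 632.8 nm assumed below is an inference from the stated answers, not given in the problems themselves:

```python
# Heterodyne interferometry: motion at velocity v shifts the measurement
# beam frequency by 2*v/lambda. HeNe wavelength assumed.
lam_mm = 632.8e-6  # 632.8 nm expressed in mm

v_max = 4e6 * lam_mm / 2   # Doppler shift must stay within the 4-MHz split
v_min = 10 * lam_mm / 2    # smallest resolvable shift of 10 Hz
print(f"v_max = {v_max:.0f} mm/s, v_min = {v_min:.2e} mm/s")
```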
5.4 Suppose that the projection optics of a stepper consists of 25 refractive
elements, and with the application of antireflection coatings the reflectance
from each element surface is 0.5%. Show that the total transmission of the
projection lens is 78%.
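Problem 5.4 is a straightforward product over surfaces, since each of the 25 elements has two coated surfaces:

```python
# 25 elements -> 50 coated surfaces, each transmitting 99.5% of the light.
n_surfaces = 2 * 25
transmission = (1 - 0.005) ** n_surfaces
print(f"{transmission:.1%}")
```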
5.5 Suppose that fields are scanned over a length of 30 mm. Show that 0.1 ppm of asymmetric magnification will lead to 3 nm of error across the scan.
From Fig. 5.38, is this expected to lead to significant linewidth errors?
5.6 Why are lenses that are designed to image well over a wide bandwidth less
sensitive to changes in barometric pressure than narrowband optics?
5.7 A typical ArF resist has 30 mJ/cm2 sensitivity, and about 100 pulses are
required to expose the resist. This means that the energy density per pulse
is 0.3 mJ/cm2 . This is also approximately the energy density in the bottom
element of the lens, which is close to the wafer. Imaging will be degraded
at a significant level if the index of refraction in the final lens element is
∆n = 1000 ppb or greater. With this energy density and a pulse duration
of τ = 45 nsec, show that a lens made from Sample B of Table 5.8 will
be degraded after 95.4 billion pulses. With 12.5 billion pulses per year in
manufacturing, what is the lifetime of a lens made from material B? What
is the lifetime if resists with sensitivity of 50 mJ/cm2 are used? Should the
user request a laser with pulses stretched to 100 nsec?
References
1. A. Offner, “New concepts in projection mask aligners,” Opt. Eng. 14(2),
130–132 (1975).
2. F. Klosterman, “A step-and-repeat camera for photomasks,” Philips Tech.
Rev. 30(3), 57–69 (1969).
3. M. van den Brink, B. Katz, and S. Wittekoek, “New 0.54 aperture i-line wafer
stepper with field-by-field leveling combined with global alignment,” Proc.
SPIE 1463, 709–724 (1991).
4. K. Suwa and K. Ushida, “Optical stepper with a high numerical aperture i-line
lens and a field-by-field leveling system,” Proc. SPIE 922, 270–276 (1988).
5. J. Roussel, “Step-and-repeat wafer imaging,” Solid State Tech., 67–71 (May,
1978).
6. A. Bouwer, G. Bouwhuis, H. van Heek, and S. Wittekoek, “The silicon
repeater,” Philips Tech. Rev. 37(11/12), 330–333 (1977); see also S.
Wittekoek, “Optical aspects of the silicon repeater,” Philips Tech. Rev. 41,
268 (1983/84).
7. M. Lacombat, A. Gerard, G. Dubroeucq, and M. Chartier, “Photorepetition et
projection directe,” Rev. Tech. Thomson-CSF 9(2), 337 (1977); see also Hans
Binder and Michel Lacombat, “Step-and-repeat projection printing for VLSI
circuit fabrication,” IEEE Trans. Electron. Dev. ED-26(4), 698–704 (April,
1979).
8. J. Wilcyncski, “Optical step-and-repeat camera with dark field automatic
alignment,” J. Vac. Sci. Tech. 16, 1929–1933 (1979).
9. W. J. Smith, Modern Optical Engineering, 2nd ed., McGraw Hill, Boston
(1990).
10. H. Sewell and T. Friedman, “High-resolution imagery: the matching of
optical and resist systems in the mid-UV,” Proc. SPIE 922, 328–334 (1988).
11. R. Loudon, The Quantum Theory of Light, 2nd ed., Clarendon Press, Oxford
(1983).
12. W. H. Arnold and H. J. Levinson, “High resolution optical lithography using
an optimized single layer photoresist process,” Proc. Kodak Microelectron.
Sem., 80–82 (1983).
13. R. Hershel, “Optics in the model 900 projection stepper,” Proc. SPIE 221,
39–43 (1980).
14. A. R. Neureuther, P. K. Jain, and W. G. Oldham, “Factors affecting linewidth
control including multiple wavelength exposure and chromatic aberrations,”
Proc. SPIE 275, 48–53 (1981).
15. I. Friedman, A. Offner, and H. Sewell, “High resolution imagery: the
matching of optical and resist systems,” Proc. KTI Microelectron. Sem.,
239–250 (1987).
16. W. H. Arnold, A. Minvielle, K. Phan, B. Singh, and M. Templeton, “0.5
micron photolithography using high numerical aperture i-line wafer steppers,”
Proc. SPIE 1264, 127–142 (1990).
17. T. C. Retzer and G. W. Gerung, “New developments in short arc lamps,”
Illumin. Eng. 51, 745–752 (1956).
18. W. E. Thouret, “Tensile and thermal stresses in the envelope of high
brightness high pressure discharge lamps,” Illumin. Eng. 55, 295–305 (1960).
600i: recirculating ring ArF light source for double patterning immersion
lithography,” Proc. SPIE 6924, 69241R (2008).
33. T. Kumazaki, T. Suzuki, S. Tanaka, R. Nohdomi, M. Yoshino, S. Matsumoto,
Y. Kawasuji, H. Umeda, H. Nagano, K. Kakizaki, H. Nakarai, T. Tatsunaga,
J. Fujimoto, and H. Mizoguchi, “Reliable high power injection locked 6 kHz
60W laser for ArF immersion lithography,” Proc. SPIE 6924, 69242R (2008).
34. U. K. Sengupta, Private Communication (2000).
35. T. Hofmann, B. Johanson, and P. Das, “Prospects for long pulse operation of
ArF lasers for 193-nm microlithography,” Proc. SPIE 4000, 511–518 (2000).
36. T. Enami, T. Hori, T. Ohta, H. Mizoguchi, and T. Okada, “Dynamics of
output spectra within a single laser pulse from a line-narrowed excimer
Laser,” Jpn. J. Appl. Phys. 39(1), 86–90 (2000).
37. V. B. Fleurov, D. J. Colon III, D. J. W. Brown, P. O’Keeffe, H. Besaucele,
A. I. Ershov, F. Trinchouk, T. Ishihara, P. Zambon, R. Rafac, and A.
Lukashev, “Dual-chamber ultra line-narrowed excimer light source for
193-nm lithography,” Proc. SPIE 5040, 1694–1703 (2003).
38. T. P. Duffey, T. Embree, T. Ishihara, R. Morton, W. N. Partlo, T. Watson,
and R. Sandstrom, “ArF lasers for production of semiconductor devices with
CD < 0.15 µm,” Proc. SPIE 3334, 1014–1020 (1998).
39. R. Pätzel, J. Kleinschmidt, U. Rebban, J. Franklin, and H. Endert, “KrF
excimer laser with repetition rates of 1 kHz for DUV lithography,” Proc.
SPIE 2440, 101–105 (1995).
40. T. Lizotte, O. Ohar, T. O’Keefe, and C. Kelly, “Stable beam delivery expands
excimer laser use,” Laser Focus World, 163–169 (February, 1997).
41. J. Vipperman, P. Das, J. Viatella, K. Webb, L. Lublin, D. Warkentin, and
A. I. Ershov, “High-performance beam-delivery unit for next-generation ArF
scanner systens,” Semicond. Manuf. 5(4), 104–110 (2004).
42. ANSI Standard Z136.1-1993, “Safe use of lasers,” American National Standards Institute, New York (1993).
43. M. Case, “193-nm coatings resist excimer-laser damage,” Laser Focus World,
93–96 (April 2004).
44. H. Tsushima, M. Yoshino, T. Ohta, T. Kumazaki, H. Watanabe,
S. Matsumoto, H. Nakarai, H. Umeda, Y. Kawasuji, T. Suzuki, S. Tanaka,
A. Kurosu, T. Matsunaga, J. Fujimoto, and H. Mizoguchi, “Reliability report
of high power injection lock laser light source for double exposure and double
patterning ArF immersion lithography,” Proc. SPIE 7274, 72743L (2009).
45. K. Suzuki, S. Wakamoto, and K. Nishi, “KrF step-and-scan exposure system
using higher NA projection lens,” Proc. SPIE 2726, 767–773 (1996).
46. M. van den Brink, H. Jaspar, S. Slonaker, P. Wijnhoven, and F. Klaassen,
“Step-and-scan and step-and-repeat, a technology comparison,” Proc. SPIE
2726, 734–753 (1996).
fused silica with 40 billion pulses of 193-nm excimer laser exposure and
their effects on projection lens imaging performance,” Proc. SPIE 5377,
1815–1827 (2004).
108. P. Bousquet, F. Flory, and P. Roche, “Scattering from multilayer thin films:
theory and experiment,” J. Opt. Soc. Am. 71(9), 1115–1123 (1981).
109. D. Golini, M. DeMarco, W. Kordonski, and J. Bruning, “MRF polishes
calcium fluoride to high quality,” Laser Focus World, S5–S13 (July, 2001).
110. R. R. Kunz, V. Lieberman, and D. K. Downs, “Photo-induced organic
contamination of lithographic optics,” Microlithography World, 2–8 (Winter,
2000).
111. S. Barzaghi, A. Plenga, G. Vergani, and S. Guadagnuolo, “Purged gas
purification for contamination control of DUV stepper lenses,” Sol. State
Tech. 44(9), 14–99 (September, 2001).
112. J. P. Kirk, “Scattered light in photolithographic lenses,” Proc. SPIE 2197,
566–572 (1994).
113. B. La Fontaine, M. Dusa, A. Acheta, C. Chen, A. Bourov, H. Levinson, L. C.
Litt, M. Mulder, R. Seltmann, and J. Van Braagh, “Analysis of flare and its
impact on low-k1 KrF and ArF lithography,” Proc. SPIE 4691, 44–56 (2002).
114. R. Hershel, “Optics in the model 900 projection stepper,” Proc. SPIE 221,
39–43 (1980).
115. Ultratech Stepper, San Jose, California.
116. J. D. Buckley and C. Karatzas, “Step and scan: a systems overview of a new
lithography tool,” Proc. SPIE 1088, 424–433 (1989).
117. D. Williamson, “Optical reduction system,” U.S. Patent No. 4,953,960
(1990).
118. M. Barrick, D. Bommarito, K. Holland, K. C. Norris, B. Patterson, Y.
Takamori, J. Vigil, and T. Wiltshire, “Performance of a 0.5 NA broadband
DUV step-and-scan system,” Proc. SPIE 1927, 595–607 (1993).
119. D. Williamson, Inventor, SVG Lithography Systems, Inc., Assignee,
“Catadioptric optical reduction system with high numerical aperture,” U.S.
Patent No. 5,537,260 (1996).
120. H. Sewell, “Advancing optical lithography using catadioptric projection
optics and step-and scan,” Proc. SPIE 2726, 707–720 (1996).
121. D. Williamson, J. McClay, K. Andresen, G. Gallitin, M. Himel, J. Ivaldi, C.
Mason, A. McCullough, C. Otis, J. Shamaly, and C. Tomczyk, “Micrascan
III, a 0.25-µm resolution step-and-scan system,” Proc. SPIE 2726, 780–2726
(1996).
122. T. Matsuyama, Y. Ohmura, Y. Fujishima, and T. Koyama, “Catadioptric
Projection Lens for 1.3 NA scanner,” Proc. SPIE 6520, 652021 (2007).
16-Mbit DRAM using i-line stepper,” VLSI Symposium, Paper III-1, 17–18
(1988).
137. J. Webb, “All-calcium fluoride system using 157-nm light,” Laser Focus
World, 87–92 (September, 2000).
138. H. Nogawa, H. Hata, and M. Kohno, “System design of a 157-nm scanner,”
Proc. SPIE 4691, 602–612 (2002).
139. N. Shiraishi, S. Owa, Y. Omura, T. Aoki, Y. Matsumoto, J. Nishikawa, and
I. Tanaka, “Progress of Nikon’s F2 exposure tool development,” Proc. SPIE
4691, 594–601 (2002).
140. J. Mulkens, T. Fahey, J. McClay, J. Stoeldraijer, P. Wong, M. Brunotte, and
B. Mecking, “157-nm technology: where are we today?” Proc. SPIE 4691,
613–625 (2002).
141. F. Bornebroek, M. de Wit, W. de Boeji, G. Dicker, J. Hong, and
A. Serebryakov, “Cost-effective shrink of semi-critical layers using the
TWINSCAN XT:1000H NA 0.93 KrF Scanner,” Proc. SPIE 7274, 72743I
(2009).
142. T. Miller, “Aspherics come of age,” Photon. Spectra, 76–81 (February 2004).
143. G. Owen, R. F. W. Pease, D. A. Markle, and A. Grenville, “1/8 µm optical
lithography,” J. Vac. Sci. Tech. B 10(6), 3032–3036 (1993).
144. K. Suwa, K. Nakazawa, and S. Yoshida, “10:1 step-and-repeat projection
system,” Proc. Kodak Microelectron. Sem., 61–66 (1981).
145. A. Suzuki, S. Yabu, and M. Ookubo, “Intelligent optical system for a new
stepper,” Proc. SPIE 772, 58–65 (1987).
146. K. Suwa and K. Ushida, “The optical stepper with a high numerical aperture
i-line lens and a field-by-field leveling system,” Proc. SPIE 922, 270–276
(1988).
147. M. A. van den Brink, B. A. Katz, and S. Wittekoek, “New 0.54 aperture i-line
wafer stepper with field-by-field leveling combined with global alignment,”
Proc. SPIE 1463, 709–724 (1991).
148. T. O. Herndon, C. E. Woodward, K. H. Konkle, and J. I. Raffel, “Photocom-
position and DSW autofocus correction for wafer-scale lithography,” Proc.
Kodak Microelectron. Sem., 118–123 (1983).
149. B. La Fontaine, J. Hauschild, M. Dusa, A. Acheta, E. Apelgren, M. Boonman,
J. Krist, A. Khathuria, H. Levinson, A. Fumar-Pici, and M. Pieters, “Study
of the influence of substrate topography on the focusing performance of
advanced lithography scanners,” Proc. SPIE 5040, 570–581 (2003).
150. J. E. van den Werf, “Optical focus and level sensor for wafer steppers,” J.
Vac. Sci. Tech. 10(2), 735–740 (1992).
151. M. A. van den Brink, J. M. D. Stoeldraijer, and H. F. D. Linders, “Overlay
and field-by-field leveling in wafer steppers using an advanced metrology
system,” Proc. SPIE 1673, 330–344 (1992).
169. This figure was provided by Dr. Wolfgang Holzapfel and Callimici Christian
of Heidenhain GmbH, a supplier of high-precision encoders.
170. S. Wittekoek, H. Linders, H. Stover, G. Johnson, D. Gallagher, and
R. Fergusson, “Precision wafer stepper alignment and metrology using
diffraction gratings and laser interferometry,” Proc. SPIE 565, 22–31 (1985).
171. M. A. van den Brink, S. Wittekoek, H. F. D. Linders, F. J. van Hout, and R. A.
George, “Performance of a wafer stepper with automatic intra-die registration
correction,” Proc. SPIE 772, 100–117 (1987).
172. M. Williams, P. Faill, P. M. Bischoff, S. P. Tracy, and B. Arling, “Six degrees
of freedom mag-lev stage development,” Proc. SPIE 3051, 856–867 (1997).
173. P. T. Konkola, “Magnetic bearing stages for electron beam lithography,”
Master of Science Thesis, Massachusetts Institute of Technology (1998).
174. S. Wittekoek, H. Linders, H. Stover, G. Johnson, D. Gallagher, and
R. Fergusson, “Precision wafer stepper alignment and metrology using
diffraction gratings and laser interferometry,” Proc. SPIE 565, 22–31 (1985).
175. B. J. Lin, “Vibration tolerance in optical imaging,” Proc. SPIE 1088, 106–114
(1989).
176. D. Cote, A. Ahouse, D. Galburt, H. Harrold, J. Kreuzer, M. Nelson,
M. Oskotsky, G. O’Connor, H. Sewell, D. Williamson, J. Zimmerman, and
R. Zimmerman, “Advances in 193-nm lithography tools,” Proc. SPIE 4000,
542–550 (2000).
177. Y. Shibazaki, H. Kohno, and M. Hamatani, “An innovative platform for
high-throughput, high-accuracy lithography using a single wafer stage,” Proc.
SPIE 7274, 72741I (2009).
178. J. Mulkens, Private Communication.
179. J. Bischoff, W. Henke, J. van der Werf, and P. Dirksen, “Simulations on step
and scan optical lithography,” Proc. SPIE 2197, 953–964 (1994).
180. B. Slujik, T. Castenmiller, R. du Croo de Jongh, H. Jasper, T. Modderman,
L. Levasier, E. Loopstra, G. Savenije, M. Boonman, and H. Cox,
“Performance results of a new generation of 300-mm lithography systems,”
Proc. SPIE 4346, 544–557 (2001).
181. J. de Klerk, L. Jorritsma, E. van Setten, R. Droste, R. du Croo de Jongh,
S. Hansen, D. Smith, M. van den Kerkhof, F. van de Mast, P. Graeupner,
T. Rohe, and K. Kornitzer, “Performance of a high NA, dual stage 193-nm
TWINSCAN step and scan system for 80-nm applications,” Proc. SPIE 5040,
822–840 (2003).
182. I. Fujita, F. Sakai, and S. Uzawa, “Next generation scanner for sub-100 nm
lithography,” Proc. SPIE 5040, 811–821 (2003).
183. K. Hirano, Y. Shibazaki, M. Hamatani, J. Ishikawa, and Y. Iriuchijima,
“Latest results from the Nikon NSR-S620 double patterning immersion
scanner,” Proc. SPIE 7520, 75200Z (2009).
$$\vec{O} = \vec{P}_2 - \vec{P}_1. \qquad (6.1)$$
$$\vec{R} = \vec{P}_1 - \vec{P}_0. \qquad (6.2)$$
Figure 6.1 A contacted transistor. For the transistor to function properly, the contacts must
overlay the appropriate features.
Half-pitch (nm)    Year    Overlay (nm)
68                 2007    13.6
45                 2010     9.0
32                 2013     6.4
22.5               2016     4.5
Overlay 217
and
where (O_x, O_y) is the position of the origin of the wafer coordinate system in the stage coordinate system, while θ is the angle of rotation between the two coordinate systems (Fig. 6.3). The scale of the wafer coordinate system relative to that of the stage is represented by S. Thus, the wafer and stage coordinate systems are related by four parameters: S, θ, O_x, and O_y. These parameters can
be determined by measuring the wafer stage positions of two points on the wafer.
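Two measured marks give four numbers (two X positions and two Y positions), which is exactly enough to solve for the four parameters. A minimal sketch, assuming the similarity transform stage = S·R(θ)·wafer + O and using the complex plane as an implementation convenience:

```python
import cmath

def fit_transform(wafer_pts, stage_pts):
    """Recover (S, theta, O_x, O_y) from two alignment marks.

    Treats 2D points as complex numbers: stage = S*exp(i*theta)*wafer + O.
    The transform direction (wafer into stage coordinates) is an assumption.
    """
    w1, w2 = (complex(*p) for p in wafer_pts)
    s1, s2 = (complex(*p) for p in stage_pts)
    c = (s2 - s1) / (w2 - w1)       # c = S * exp(i*theta)
    o = s1 - c * w1                 # origin offset (O_x, O_y)
    return abs(c), cmath.phase(c), o.real, o.imag

# Synthesize measurements: 2-ppm expansion, 0.5-mrad rotation, (1.2, -0.7)-mm
# offset (all hypothetical numbers), marks 100 mm apart:
c_true = (1 + 2e-6) * cmath.exp(0.5e-3j)
o_true = complex(1.2, -0.7)
wafer = [(0.0, 0.0), (100.0, 0.0)]  # mark positions in wafer coordinates, mm
stage = [c_true * complex(*p) + o_true for p in wafer]

S, theta, ox, oy = fit_transform(wafer, [(z.real, z.imag) for z in stage])
print(f"S-1 = {(S - 1) * 1e6:.2f} ppm, theta = {theta * 1e3:.3f} mrad, "
      f"O = ({ox:.3f}, {oy:.3f}) mm")
```

The fit recovers the injected scale, rotation, and offset exactly, which is why two well-separated marks suffice for this four-parameter model.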
218 Chapter 6
Figure 6.3 Transformation between the stage and wafer coordinate systems.
Issues that arise from having stages where the X and Y motions are not orthogonal
or do not have the same scale are discussed later in this chapter.
Locating points on the wafer with nanometer precision is accomplished by the
stepper’s alignment system. Specially designed structures called alignment marks
or alignment targets are placed on the wafer. The positions at which they are
placed are specified in the coordinate system of the wafer. The stepper’s alignment
system is designed to recognize the alignment targets optically and determine their
locations in the coordinate system of the stage. The key objective is to place
the wafer under the projection optics so that new exposures overlay pre-existing
ones, and this is accomplished by using the laser stage as a very accurate ruler.
The details of the alignment process are discussed in this chapter, along with the
methods used to characterize overlay errors.
Before the wafer is placed on the stage it is positioned mechanically. The
stepper’s prealignment system is used to locate the wafer’s center and orient the
wafer, using the wafer’s edges and notch (or flat) as mechanical references. The
centered and oriented wafer is then placed on the stage. Representative values for
the placement accuracy of prealigned wafers are shown in Table 6.2. These values
are typical of mechanical tolerances. Wafer prealignment enables the alignment
targets to be brought within the alignment system’s field of view, which is usually
at most a few hundred microns in diameter. Some alignment systems have such
small fields of view that an optical alignment, intermediate between the coarse
mechanical prealignment and the fine wafer alignment, is necessary.
Table 6.2 Representative placement accuracy of prealigned wafers.
Translation    10–100 µm
Rotation       0.1–1 mrad
The prealignment or the intermediate optical alignment must ensure that the
fine alignment system captures the correct alignment target, which can be a
problem when multiple alignment marks have been generated at several prior
process layers. Because there may be limited silicon real estate in which alignment
targets can be placed, the targets will need to be placed as close together as
possible. Mark separation must be consistent with prealignment or intermediate-
alignment capability to avoid having the fine-alignment system capture the wrong
target. Some alignment targets consist of periodic structures, such as gratings, and
misalignment can occur when the prealignment or intermediate-alignment error is
more than half of the periodicity of the alignment mark.
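The half-period capture rule for periodic marks can be stated as a one-line check; the 16-µm period used in the example is illustrative, not a value from the text:

```python
def captured_unambiguously(coarse_error_um, mark_period_um):
    """A periodic alignment mark is captured on the correct grating line only
    if the coarse (pre- or intermediate-) alignment error is under half the
    mark period, per the rule stated in the text."""
    return abs(coarse_error_um) < mark_period_um / 2

# 3-um coarse error on a 16-um-period mark is safe; 10 um is not:
print(captured_unambiguously(3.0, 16.0), captured_unambiguously(10.0, 16.0))
```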
Alignment marks are placed on the wafer at the first masking step. The marks
may be contained within each exposure field, typically within scribe lines, or
they may be placed in areas near the wafer edge (Fig. 6.4). The information of
ultimate importance consists of the locations of the product dies. The relationship
between the positions of the alignment marks and the product dies is established
by the design of the mask that is first used to pattern the alignment marks. Thus,
knowledge of the locations on the wafer of the alignment marks coupled with
the design data enables the determination of the product die positions. Within an
exposure field, there may be one or several alignment marks depending upon the
stepper’s design and the user’s alignment strategy.
It is also important at the first masking step that the X axis of the reticle is aligned
parallel to the X axis of the wafer stage. The consequence of failing to do this is
shown in Fig. 6.5. The reticle is oriented properly by placing alignment marks on
the reticle. Generally, these reticle alignment marks are outside of the exposure
field.
At each subsequent masking step, the stepper’s alignment system is used to
determine the position of some of the alignment marks on the wafer. It is also
possible to place new alignment marks on the wafer with each new exposure, and
Figure 6.4 Alignment marks can be placed within exposure fields, typically in the scribe
lines between product die, or outside of typical exposure fields.
Figure 6.5 The pattern that results when the reticle is rotated with respect to the wafer
stage.
align subsequent layers back to the new alignment targets. Alignment schemes are
discussed in more detail later in this chapter.
After the positions of the alignment marks are determined, the next step toward
achieving layer-to-layer overlay is to align the X axis of the pattern on the wafer
with the X axis of the stage, similar to the requirement for the reticle (Fig. 6.3).
This requires a wafer rotation that is typically accomplished by rotating the chuck
on which the wafer is mounted. The wafer can then be moved, using the laser
interferometer to precisely measure the wafer’s movements in X and Y, and to
position the wafer correctly under the lens, so that the patterns of new exposures
will overlay pre-existing patterns. Wafer stage rotation is measured using multiple
beams in the interferometer (Fig. 5.37).
After the existing wafer patterns are made parallel to the stage’s X axis, the
wafer is translated using the interferometrically controlled stage. The amount
of translation between alignment and exposure is dependent upon the designed
relationship of the alignment targets and the exposure fields. By aligning to two
marks, four pieces of positional information are acquired: the X and Y positions
of both wafer alignment marks. This provides information for the required rotation
and translations. However, there is one additional piece of information acquired
during the alignment of two marks that is essential for achieving good overlay. The
separation between the marks provides a measure of the expansion or contraction
of the wafer between masking steps. Wafer expansion or contraction up to
several parts per million is quite normal during the processing of silicon wafers,
a consequence of the deposition and removal of thin films from the wafer.4–7
For example, wafer contractions of ∼3 ppm of 100-mm wafers were reported
for wafers prepared for LOCOS oxide isolation, as a consequence of silicon
nitride depositions. Similarly, oxide depositions cause wafers to expand by similar
amounts. The importance of correcting for these expansions or contractions is the
subject of Problem 6.1.
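The scale information carried by the mark separation is a simple ratio. A minimal sketch, using the ~3-ppm contraction of a 100-mm separation mentioned above:

```python
def wafer_scale_ppm(design_sep_mm, measured_sep_mm):
    """Wafer expansion (+) or contraction (-) inferred from the measured
    separation of two alignment marks, in parts per million."""
    return (measured_sep_mm / design_sep_mm - 1.0) * 1e6

# A 3-ppm contraction shortens a 100-mm mark separation by 0.3 um:
s = wafer_scale_ppm(100.0, 100.0 - 0.0003)
print(f"{s:.1f} ppm")
```

A 0.3-µm change in separation is far larger than modern overlay budgets, which is why the scale term must be measured and corrected at every masking step.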
With two alignment marks, it is possible to correct for wafer rotation, X and
Y translation, and isotropic expansion; i.e., it is possible to correct for expansion
that is the same in the X and Y directions. With three or more alignment marks,
it is possible for the stepper to correct for different expansion in the two axes if
the alignment marks are not colinear with a single axis. Moreover, by choosing
multiple alignment marks that are not all colinear on a single axis, it is possible
to correct for rotation of the patterns on the wafer with respect to both axes of
the stage. Such correction may be necessary when a stepper, either the one used for the pre-
existing pattern or the one used for patterning the overlaying exposure, has X and
Y stages that do not move orthogonally, i.e., at 90 deg to each other.
wafer was aligned to the fiducial marks, it was moved a known distance of several
centimeters to the appropriate location for exposure. This was done using the laser
interferometrically controlled stage to ensure precise motion control. If the reticle
and wafer were perfectly aligned and if all the distances were as designed, then the
wafer was well aligned. The overlay capability of this system was approximately
±0.7 µm.
There were a number of inadequacies with the alignment system in the first
steppers. Alignment was accomplished manually, both in terms of having the
operator visually ascertain when the wafer alignment marks were positioned
properly relative to the fiducials, and in having the operator move the wafer
manually using a joystick. The reticle was also aligned manually and positioned
using mechanical manipulators. Today’s wafer steppers use automatic forms
of optical pattern recognition, and wafer motion is controlled using automatic
feedback between the pattern recognition system and the precise wafer stage.
Because the wafer was not aligned directly to the reticle in the GCA DSW4800,
periodic calibrations involving test wafers were required to ensure that the
individual wafer and reticle alignment provided the required overlay.8 Moreover,
the baseline—the distance between the align position and the expose position—
in this stepper was unstable with respect to temperature and barometric pressure,
and varied widely in time. This baseline error severely limited both the overlay
performance and the productivity of this stepper since frequent calibrations were
required to maintain adequate overlay.
Modern steppers that use off-axis alignment have automated self-calibration that essentially eliminates the baseline error.9 The
highly precise laser stage again plays a central role in the calibrations. In a
typical implementation, an artifact containing a wafer-style alignment mark and
a detector is placed on the wafer stage.10,11 The separation between the alignment
mark and the detector must be known very accurately. The detector is designed
to detect marks located on the reticle and must be responsive to the exposure
wavelength. By shining light at the exposure wavelength through the reticle marks
and locating them with the stage-mounted detector, the position of the reticle image
is determined in the coordinate system of the stage. For excimer laser steppers,
the self-metrology must be synchronized with the detection of pulsed light. Upon
determining the position of the reticle, the position of the wafer alignment system
is identified by aligning the wafer-style mark on the artifact using the wafer
alignment system. The distance between the wafer alignment system and the reticle
image is measured using the laser stage, which provides an extremely accurate
determination of the baseline. Because the self-calibration can be done quickly, the
baseline can be updated frequently, thereby ensuring good alignment at all times
and without suffering the significant loss of productivity that accompanied early
off-axis alignment systems. By using light at the exposure wavelength to align
the reticle, challenges related to chromatic correction for performing the reticle
alignment are avoided. An excellent summary of the issues associated with off-
axis alignment has been written by van den Brink, Linders, and Wittekoek.12
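The role of the baseline in off-axis alignment reduces to vector bookkeeping in stage coordinates. The sketch below uses hypothetical stage readings; the function names and numbers are illustrative, not any tool vendor's interface:

```python
def baseline(reticle_image_xy, wafer_align_xy):
    """Baseline vector: from where the off-axis alignment system sees the
    artifact mark to where the reticle image lands, both measured with the
    laser stage (mm). Values here are hypothetical readings."""
    return (reticle_image_xy[0] - wafer_align_xy[0],
            reticle_image_xy[1] - wafer_align_xy[1])

def exposure_position(mark_xy, base):
    """Stage position that places the exposure over a mark measured with the
    off-axis alignment system: mark position plus the baseline."""
    return mark_xy[0] + base[0], mark_xy[1] + base[1]

b = baseline((250.000, 120.000), (100.002, 119.998))
pos = exposure_position((37.500, 42.000), b)
print(pos)
```

Because the baseline is remeasured frequently with the self-calibration artifact, any drift in it drops out of the overlay budget rather than accumulating.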
Systems have also been developed for aligning wafers directly to the
reticle.3,13–15 There are a number of challenges associated with this approach,
because direct wafer-to-reticle alignment must be accomplished through the
projection optics instead of a dedicated alignment microscope. In order to achieve
the best imaging, the projection optics are designed to image over a very narrow
range of wavelengths. Through-the-lens (TTL) alignment must necessarily avoid
broadband illumination. The exposure wavelength is the simplest to use for TTL
alignment since this is the wavelength at which the projection lens has minimum
aberrations. However, there are disadvantages with alignment of wafers at the
exposure wavelength (see Problem 6.2). Other wavelengths can be used for
alignment, but they require additional optics to correct for chromatic aberrations.
These must be redesigned for each new family of projection optics. Moreover, to
accommodate TTL alignment at nonactinic wavelengths, it may be necessary to
compromise some aspect of imaging at the exposure wavelength. For example,
antireflection coatings on the lens elements may no longer be optimal at the
exposure wavelength in order to avoid spurious reflections at different wavelengths
used for alignment. In spite of such problems, the disadvantages associated with
alignment at the exposure wavelength have been found to outweigh the benefits,
and late-generation TTL alignment systems operated at nonactinic wavelengths.
Monochromatic illumination used for alignment may lead to thin-film interference effects in wafer reflectance signals when faced with the normal range of thickness variation in films such as nitride, oxide, or resist.16,17 Asymmetries in resist coatings
over the alignment marks, when viewed in monochromatic light, may also lead
to alignment errors.18 However, issues associated with thin-film optical effects
have been encountered and solved with many off-axis alignment systems that use
monochromatic illumination, so this problem is not unique to TTL alignment, nor
is it insurmountable.
Until quite recently, the stepper lens, which necessarily serves as the objective
lens of a TTL alignment system, had a relatively low NA and therefore lower resolution
than a dedicated alignment microscope. The resolution for the purpose of
alignment is further degraded, relative to the imaging resolution, if an alignment
system uses a longer wavelength than is used for imaging (see Problem 6.3). This
is problematic, since overlay control must be a fraction of the minimum feature
size.
For many years, the steppers from ASML used a phase-grating alignment
system19 to directly align reticle and wafer images through the main stepper lens.
Instead of relying on an optical-line-edge profile to produce an alignment signal,
the ASML alignment system involved a phase grating on the wafer produced
by etching an alignment mark into the wafer to produce alternating-phase light
(Fig. 6.6). The maximum alignment signal resulted when the two phases differed
by 180 deg, something that was achieved by etching the alignment mark to the
appropriate depth (see Problem 6.5). Auxiliary optics compensated for optical path
differences between HeNe laser light (λ = 632.8 nm) as used by the alignment
system and the actinic light for which the imaging lens was designed (Fig. 6.7). By
working with the diffracted signal, it was possible to filter the image to produce
Figure 6.6 A phase-grating alignment mark. In practice, the lower parts of the mark may
be filled with a different material than that which covers the higher parts, so there may be
an intensity as well as a phase difference between the two sets of beams.
excellent signal-to-noise, with good immunity to small defects and poor linewidth-
edge definition. By not relying on edge detection, the resolution requirements of
the alignment optics were relaxed considerably (see Problem 6.4). The diffraction
alignment method can also be used in an off-axis mode.20
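The 180-deg phase condition mentioned above (see Problem 6.5) amounts to a simple quarter-wave calculation. A minimal sketch, assuming reflection in air so that the etched depth is traversed twice; the function name is ours, not from any alignment-system software:

```python
# Etch depth giving a 180-deg phase difference between light reflected
# from the upper and lower surfaces of a phase-grating alignment mark.
# The reflected beam traverses the etched depth twice, so the phase
# difference is 2*pi*(2*d)/lam; setting this equal to pi gives d = lam/4.
# Assumes reflection in air (index ~1), ignoring any overlying films.

def quarter_wave_depth(lam_nm: float) -> float:
    """Etch depth in nm for a 180-deg round-trip phase difference."""
    return lam_nm / 4.0

print(quarter_wave_depth(632.8))  # HeNe alignment wavelength -> 158.2
```

In practice, films covering the mark modify the optimum depth, as noted in the caption of Fig. 6.6.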
ASML also promoted the use of the so-called zero-level alignment strategy in
which the wafer has alignment marks etched directly into the substrate before any
other processing. All subsequent masks are aligned to the zero-level marks. This
alignment strategy has some obvious disadvantages, such as requiring additional
exposures and wafer processing to produce the alignment marks. However, the
zero-level approach has certain advantages, particularly if it is possible to maintain
the integrity of the mark through all subsequent processing. This is often possible
because the process used to generate the alignment marks can be optimized for
the purpose of alignment rather than some overriding circuit device consideration.
The ability to align to well-maintained marks is a significant advantage. Moreover,
since all layers align to the same marks, certain offsets, such as those that result
from placement errors of alignment marks due to lens distortion, do not affect
overlay. This can be understood as follows. The absolute position of an alignment
mark is displaced from its designed position due to lens distortion and reticle
registration errors. If all process layers are aligned to the same alignment marks,
they all have the same registration error due to the lens and reticle errors, and
this registration error produces no net overlay error. When alignment marks are
generated at many different masking steps, on many different steppers, different
errors are introduced at different masking steps. It is possible to correct for these
errors, but the corrections can become difficult to maintain over a base of many
steppers.
There is another advantage to the zero-layer alignment scheme. If every process
layer has alignment variation σ to the zero level, then the layer-to-layer alignment
variation is √2 σ for any pair of process layers. On the other hand, when alignment
is sequential (Fig. 6.8), the overlay between two process layers can be larger.21
For example, the overlay between layers D and A would be √3 σ with sequential
alignment. There are, of course, some disadvantages to the zero-layer scheme. It
may involve an extra processing step, and it does not allow for ultimate alignment
accuracy of one layer to another since all alignments are secondary.
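The √2 σ and √3 σ figures can be checked with a quick Monte Carlo experiment. This is a sketch with a made-up unit σ, treating each alignment step as an independent Gaussian error:

```python
import random

random.seed(0)
N = 200_000
sigma = 1.0  # per-alignment variation, arbitrary units

def rms(xs):
    return (sum(x * x for x in xs) / len(xs)) ** 0.5

# Zero-level tree: layers align independently to the zero-level marks,
# so the error between any two layers is a difference of two draws.
zero_level = [random.gauss(0, sigma) - random.gauss(0, sigma) for _ in range(N)]

# Sequential tree: B->A, C->B, D->C, so the D-to-A error accumulates
# three independent alignment steps.
sequential = [sum(random.gauss(0, sigma) for _ in range(3)) for _ in range(N)]

print(rms(zero_level))  # ~ sqrt(2) = 1.414...
print(rms(sequential))  # ~ sqrt(3) = 1.732...
```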
Because there is no associated baseline error, TTL alignment appears to have an
inherent advantage over off-axis alignment systems. However, there are a number
of problems that arise during the implementation of TTL alignment systems. As
mentioned previously, it is necessary to provide for color correction since there
are a number of reasons to avoid aligning at the actinic wavelength.23 This is
not a difficult optical problem to solve, but it does require new alignment optics
every time new projection lenses are introduced. It also means that the coatings
on the lens elements must accommodate the alignment and actinic wavelengths.
As a consequence, both imaging and alignment are ultimately compromised. For
these reasons, companies that, in the past, have implemented TTL alignment
systems, such as ASML and SVGL, have more recently introduced off-axis
alignment systems,20 often not incorporating the TTL alignment capability at
Figure 6.8 Alternative alignment trees. A, B, C, and D are different process layers. It is
possible to have hybrid alignment trees.22 For example, both layers C and D could be aligned
to marks generated on layer B.
all. For their newest generations of exposure tools, ASML offers only off-axis
alignment capability, similar to what is available from Nikon and Canon.
Most alignment systems employ some type of microscope optics to view
alignment marks. The first steppers used bright-field imaging, but these often
had significant process sensitivities. For example, resist that coats asymmetrically
over alignment marks24,25 with appreciable topography can induce shifts in the
image of the alignment target.26 As shown in Fig. 6.9, the spin coating of resist
over topography does not produce symmetrical films. With nonplanar alignment
marks, optical alignment systems are susceptible to thin-film interference effects
that can result in low-contrast signals27 or misalignment.28 Because spin coating
is a radial process, the misalignment goes in opposite directions on diametrical
sides of the wafer and results in a scaling error. (Classes of overlay errors are
discussed later in this chapter.) Resist-coating processes can be modified to reduce
the impact of thin-film optical effects.29 Optical modeling has shown that dark-field
alignment systems are less sensitive to these thin-film optical effects than bright-
field or diffraction alignment systems.30 Because of reduced process sensitivity,
alignment systems based on dark-field imaging were introduced in steppers made
by Ultratech,31 GCA,32,33 and Nikon.34
Dark-field alignment systems have problems of their own. For example, with
grainy and highly reflective metal, polysilicon, or silicide surfaces, a significant
amount of light is scattered into the dark-field cone. Consequently, steppers with
dark-field alignment systems also feature bright-field options for use on metal
layers. The user can then select, by layer, the alignment optical type that gives the
best overlay. Modeling has been used to gain further insight into the acquisition
of alignment targets.35–38 The influence of grains can be minimized by the use of
low-coherence illumination (large σ) in the alignment optics,39 or through the use
of interference alignment techniques, which involve the design of the alignment
system and are generally beyond the control of the user. A good reference detailing
optimization of dark-field alignment on a stepper is given by Smith and Helbert.40
Figure 6.9 Resist films coated over topography will be asymmetric. When this topography
is a wafer alignment mark that is detected optically, misalignment can result from the
asymmetry.
Misalignment can happen due to poor optics in any alignment system. For
example, nontelecentric lenses (to be discussed later in this chapter), in which
there is a coupling between defocus and lens magnification, result in alignment
variations as focus changes. Asymmetric illumination of alignment marks also
produces misalignment, similar in effect to real asymmetry in the alignment marks.
Canon has developed a broadband TV alignment system that uses a high-
resolution CCD camera system to align the wafer mark to the reticle mark.41 The
broadband illumination helps to reduce thin-film interference effects, while the
CCD camera has extremely small effective pixel sizes. However, the broadband
illumination dictates a separate optical system from the projection lens, so this
is an off-axis alignment system and must rely on autocalibration to achieve tight
registration.
(X_W, Y_W) in the coordinate system of the wafer. In terms of the stage coordinates,
the positions of the product die can be determined by assuming that θ in Eqs. (6.3)
and (6.4) is small; the stage and wafer coordinate systems are then linearly
related:
X_S = S_X X_W − θ_X Y_W + O_X, (6.5)
and
Y_S = S_Y Y_W + θ_Y X_W + O_Y. (6.6)
Figure 6.10 Hierarchy of overlay errors. The wafer stage and linear errors comprise the
foundation.
and
where ∆X is the overlay error in the X direction and ∆Y is the error in the Y
direction. The overlay error ∆X can be expressed as a Maclaurin series of the
function f_X:
The sign prior to the R_X Y term is negative to put these in the same form as the
alignment equations [Eqs. (6.3) and (6.5)]. There is a similar expression for ∆Y:
If only linear terms are retained, then Eqs. (6.9) and (6.10) become
∆X = T_X + E_X X − R_X Y + e_x (6.11)
and
∆Y = T_Y + E_Y Y + R_Y X + e_y, (6.12)
where e_x and e_y are the residual errors that do not conform to the model. The most
common source of residual error is the imprecision of the stepper’s stage, which
is on the order of 5 nm or less (3σ) for the current generation of wafer steppers.46
Another source of nonmodeled error arises when multiple steppers are used, which
is the issue of matching that is discussed in the next section.
The linear terms in Eqs. (6.11) and (6.12) have physical meanings. Each layer
can be considered as patterns printed on a rectangular or near-rectangular grid.
The parameters T_X and T_Y represent translation errors in the X and Y directions,
respectively, and indicate an overall shift of the printed grid relative to the grid of
the substrate layer. The factors E_X and E_Y are scale errors, which represent the
errors made by the stepper in compensating for wafer expansion or contraction.
Scale errors are dimensionless and are usually expressed in parts per million. The
coefficients R_X and R_Y are rotation factors. When R_X = R_Y, one grid is rotated
relative to the other, and this accounts for the sign convention chosen in Eq.
(6.11). If the angles between the axes of the grids are not equal, then R_X ≠ R_Y.
This latter error is referred to as an orthogonality error since it represents the
situation in which the grid for at least one of the layers (substrate or overlaying)
has nonorthogonal axes. In the presence of orthogonality errors, one can still talk
of a grid rotation error, given by
(R_X + R_Y) / 2. (6.13)
Since
θ ≈ sin θ (6.14)
for small θ, the rotations are usually expressed in radian measure, typically in
units of microradians. This linear model for characterizing grid overlay errors [Eqs.
(6.11) and (6.12)] was introduced by Perloff,44 and it is still widely used today.
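Extracting the coefficients of Eqs. (6.11) and (6.12) from measured data is a linear least-squares problem. The following sketch uses synthetic data with made-up coefficients (10-nm translation, 3-ppm scale, 2-µrad rotation); it illustrates the fitting step, not any stepper's actual software. With positions in mm and errors in nm, the scale factor comes out in nm/mm = ppm and the rotation in nm/mm = µrad:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
X = rng.uniform(-100, 100, n)  # field positions on the wafer, mm
Y = rng.uniform(-100, 100, n)

# Synthetic "measured" grid errors in nm, plus 1-nm measurement noise.
TX_true, EX_true, RX_true = 10.0, 3.0, 2.0
dX = TX_true + EX_true * X - RX_true * Y + rng.normal(0, 1, n)

# Design matrix for dX = T_X + E_X * X - R_X * Y [Eq. (6.11)]:
A = np.column_stack([np.ones(n), X, -Y])
TX, EX, RX = np.linalg.lstsq(A, dX, rcond=None)[0]
print(TX, EX, RX)  # close to 10 nm, 3 ppm, 2 urad
```

The Y direction [Eq. (6.12)] is fit the same way, with design-matrix columns [1, Y, X].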
Stage precision is typically measured by stepping a pattern on a wafer and then
printing a second pattern on top of the first. These exposures are performed without
removing the wafer from the chuck or removing the reticle between exposures. The
reticle used for this testing typically has both parts of a structure that can be used
for measuring overlay between two layers. (Such structures will be discussed in
more detail in Chapter 9.) The two parts of the structure are offset on the mask, so
the wafer is shifted when it is placed under the lens for the second exposure. Thus,
stage precision means that a wafer can be moved from one position to another
many millimeters away and then returned to the original position to within 5 nm or
better (3σ). This is an extraordinary level of mechanical control.
Nonlinear overlay errors can arise. While these are typically smaller than the
linear ones, they may not be insignificant. One source—matching—is discussed
in the next section. Another observed example of nonlinearity was caused by
the heating of wafers during i-line exposures resulting in nonlinear errors along
with a wafer scaling–error component.47 A wafer may also change temperature
during the alignment and exposure operations if the wafer is at a temperature
different from that inside the stepper environmental enclosure when the wafer is
first moved into the stepper and is not allowed to come to thermal equilibrium
prior to processing48 (see Problem 6.6). Wafers can experience nonlinear plastic
distortions, often caused by rapid thermal processing,49 which are not corrected
by alignment methods that account only for linear errors. Nonlinear errors are not
corrected by the software of most steppers, which use a linear model, though there
have been proposals for incorporating nonlinear effects.50 Some of the errors that
are not included in the model of Eqs. (6.11) and (6.12) are discussed in the next
section.
Overlay errors may also vary across each exposure field. These types of errors
are referred to as intrafield errors, examples of which are illustrated in Fig. 6.11.
Consider, for example, a magnification (or reduction) error. The lens reduction
typically has nominal values of 4:1 or 5:1, but the magnification deviates from
these nominal values by some small amount. When there is wafer expansion or
contraction between masking steps it is necessary for the stepper to be programmed
to measure this change and compensate for it, not only in grid terms, but in the size
of each exposure field as well. Errors in this correction result in magnification
errors. The other primary intrafield errors are shown in Fig. 6.11.
Intrafield overlay models for wafer steppers were introduced by MacMillan
and Ryden in 1982.51 Since then, the most significant change has been the
introduction of models for step-and-scan systems. The models for step-and-scan
systems include parameters for asymmetric magnification and skew, which are not
relevant for step-and-repeat systems.
Suppose the same stepper is used for printing two layers, one overlaying the
other. The overlay error at any point on the wafer is the sum of the grid errors,
defined previously as the overlay errors at the center of the exposure field, and
Figure 6.11 Examples of intrafield overlay errors.
overlay errors that vary across each exposure field. Let (x, y) be the coordinates
of a point on the wafer, relative to the center of the exposure field in which it is
contained (Fig. 6.12). For a step-and-repeat system, the intrafield overlay errors
are modeled as51
δx = t_x + m x − r y + T_x xy + T_y x² + e_x, (6.15)
and
δy = t_y + m y + r x + T_y xy + T_x y² + e_y. (6.16)
Figure 6.12 Intrafield overlay is measured relative to the center of each exposure field.
normal lenses this results in a variation in magnification across the exposure field—
that is, trapezoid errors. Telecentric lenses have magnification independent of the
distance between the object and lens over a limited range of separations. Stepper
lenses have always been telecentric on the wafer side (magnification independent
of wafer focus), but newer generations of lenses are also telecentric on the reticle
side. These double-telecentric lenses are free of trapezoid errors.
Trapezoid errors introduce an anomaly into the overlay models. If all intrafield
errors are symmetric about the origin, then the intrafield errors average to zero (by
definition of being relative to the center of the exposure field) up to measurement
noise, at least when considering overlay where the same exposure tool is used for
both layers. As one can see in Fig. 6.11, trapezoid errors are not symmetric with
respect to the center of the exposure field. The introduction of trapezoid terms to the
model requires the inclusion of translation terms t_x and t_y to adjust for the center
offset of the trapezoid errors.
Grid (interfield) models are identical for step-and-repeat and step-and-scan
systems, but the appropriate intrafield models differ. For step-and-scan systems,
the intrafield model is
δx = m_x x − r_x y + e_x, (6.17)
and
δy = m_y y + r_y x + e_y. (6.18)
The size of the field on the reticle, in the scan direction, is given by
L_R = v_R t, (6.19)
where v_R is the speed at which the reticle is scanned and t is the duration of the
scan. Similarly, the size of the field on the wafer, in the scan direction, is given by
L_W = v_W t, (6.20)
where v_W is the scanning speed of the wafer. The reduction in field size, in the scan
direction, from reticle to wafer is
L_R / L_W = v_R / v_W. (6.21)
If v_R > N v_W, then the field reduction in the scan direction is larger than N. The
magnification in the direction perpendicular to the scan is determined only by
the lens-reduction factor (∼N:1), since the image size in that direction is defined
entirely by the reduction optics.
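A small numerical illustration of Eqs. (6.19)-(6.21); the 250-mm/s wafer scan speed is an assumed value, not a specification of any tool:

```python
# L_R / L_W = v_R / v_W: the field reduction in the scan direction is
# set by the ratio of reticle and wafer scan speeds.

def scan_reduction(v_reticle: float, v_wafer: float) -> float:
    return v_reticle / v_wafer

N = 4              # nominal lens reduction (4:1)
v_wafer = 250.0    # wafer scan speed, mm/s (assumed value)

print(scan_reduction(N * v_wafer, v_wafer))  # matched speeds: 4.0
# A reticle stage running slightly fast makes the reduction exceed N:
print(scan_reduction(N * v_wafer * 1.000001, v_wafer) > N)  # True
```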
For step-and-scan systems, intrafield rotation issues are more complicated than
for step-and-repeat systems. For step-and-repeat exposure tools, intrafield rotation
involves the relative orientations of the wafer stage, reticle, and existing patterns.
Step-and-scan systems have an additional component, the reticle scanning stage.
Pure reticle rotation occurs when the scans of the reticle and wafer stages are
parallel, but the reticle is rotated relative to both. The origin of field skew is left to
the reader to derive in Problem 6.7.
For any point on the wafer, the overlay error is the sum of the interfield and
intrafield errors. In the X direction, the total overlay error is O_X,x = ∆X + δx, and
the overlay error in the Y direction is O_Y,y = ∆Y + δy, at the point (X + x, Y + y)
(see Fig. 6.12). For step-and-scan systems:
O_X,x = T_X + E_X X − R_X Y + m_x x − r_x y + ρ_X,x, (6.22)
and
O_Y,y = T_Y + E_Y Y + R_Y X + m_y y + r_y x + ρ_Y,y, (6.23)
where ρ_X,x and ρ_Y,y are total residual errors. There are similar equations for
step-and-repeat systems.
Classifying errors is important for controlling overlay, because intrafield and
interfield overlay errors generally arise from different causes. For example, reticle-
rotation errors involve the alignment of the reticle, while wafer-rotation issues
involve separate alignments. In order to classify overlay errors the coefficients
in Eqs. (6.22) and (6.23) need to be determined. The most common method for
extracting these coefficients from measured data (O_X,x, O_Y,y) is the method of least
squares.52,53 In this technique, more data are needed than there are parameters. For
Eqs. (6.22) and (6.23), measurements need to be made at two points per exposure
field or more, in order to model intrafield effects. (Each measurement site provides
two measurements, one in X and one in Y.) The least-squares method determines
the set of coefficients {T_X, T_Y, m_x, . . .} that minimizes the sum of squares of the
residual errors:
∑_{X,x,Y,y} (ρ_X,x² + ρ_Y,y²), (6.24)
where the sum is extended over all measured points. Computational speed and
insensitivity to noise are among the reasons why the least-squares method is
the mathematical technique most commonly used. Coefficients obtained from the
least-squares method represent the global minimum of Eq. (6.24), not merely a
local minimum. The choice of mathematical
methods used to determine the model coefficients is further discussed later in this
chapter.
Example: Misregistration of patterns on reticles can produce overlay errors. An
engineer wants to compensate for reticle registration errors measured on a product
reticle. Errors on the reticle that fit the model of Eqs. (6.15) and (6.16) were
corrected on step-and-scan systems. Constants were added to each of these
equations as well, to correct for translation shifts. The reticle errors, as measured
on the product reticle, are given in Table 6.4. The model coefficients are given in
Table 6.5. As can be seen, a substantial fraction of the reticle error is eliminated
by applying magnification, rotation, and translation offsets. For example, the
registration error at the position (−69.5, 0) is given by
δx = X-translation + m_x x − r_x y + e_x (6.25)
= −10.6 + 0.963 × 69.5 − 0.329 × 0.0 + e_x (6.26)
= 56.3 + e_x. (6.27)
The measured error is 63 nm, so the model was able to account for most of the
registration error at that point, and most of the error was therefore correctable.
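The arithmetic of Eqs. (6.25)-(6.27) can be reproduced directly; this sketch simply re-evaluates the quoted numbers:

```python
# Modeled part of the reticle registration error at (x, y) = (-69.5, 0),
# using the coefficients as they appear in Eq. (6.26); units are nm.

x_translation = -10.6
modeled = x_translation + 0.963 * 69.5 - 0.329 * 0.0
print(round(modeled, 1))   # 56.3

# The measured error was 63 nm, so the uncorrected residual is:
residual = 63.0 - modeled
print(round(residual, 1))  # 6.7
```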
Table 6.4 Reticle registration errors. The first two columns are the positions where
registration is measured. The third and fourth columns are the registration errors, as
measured, and the last columns are the residual errors, after the modeled part of the errors
is subtracted from the raw data.
Measurement positions | Measured registration errors | Residual registration errors (step-and-scan model)
x (mm)  y (mm) | δx (nm)  δy (nm) | e_x (nm)  e_y (nm)
The step-and-scan models considered to this point have been purely linear. From
Eqs. (6.9) and (6.10) it is readily seen that the interfield terms can be expanded
to include nonlinear terms, and a similar approach may be taken to characterize
intrafield overlay errors. A nonlinear model was used in the early days of stepper
lithography to describe mix-and-match overlay between a stepper and a Perkin-
Elmer scanner.54 With occasional exceptions,55 only linear terms were considered
sufficient for controlling overlay until approximately 2001, at which point interest
in nonlinear models began to increase. Part of this increased interest was the
result of tightened requirements for overlay, beyond the typical 0.7× reduction
node-to-node. With shrinking, not everything scaled at the same pace. As a
consequence, overlay went from being ∼1/3 of the half pitch to ∼20% of the half
pitch (see Table 6.1). At the same time the difficulty of improving overlay also
increased, and it became necessary to find improvement wherever possible. The
introduction of 300-mm wafers appears to have been accompanied by a higher
level of nonlinear wafer distortion induced by processes such as film deposition
and thermal annealing.56 After years of concentrated efforts to minimize linear
errors, the nonlinear components and residuals became a larger fraction of the total
overlay errors.57
The need to address nonlinear overlay errors requires a substantial increase
in the number of wafer alignments and overlay measurements. If one can
legitimately assume purely linear and isotropic errors, then only two across-wafer
and two within-field alignments are needed, and only a few more measurements
are required if parameters such as field magnification require asymmetric
compensation. On the other hand, a much larger number of measurements is
needed when there are substantial nonlinear contributions to the overlay error,
potentially reducing scanner throughput as well as increasing metrology costs. If
the nonlinear signature is reproducible wafer-to-wafer, at least within a single lot
of wafers, then a large number of alignment and measurement sites can be used on
one wafer to determine this signature, which can then be applied to other wafers
on which fewer alignments are made. Nonlinear contributions are also a significant
contributor when two layers, between which overlay is a concern, are exposed on
a different scanner. This problem that arises from using more than one scanner is
the topic of the next section.
6.3 Matching
A set of nonrandom overlay errors not included in the overlay models are found
in situations where more than one stepper is used. These errors are referred to
as matching errors since they refer to the degree to which the pattern placement
produced on one stepper matches that of other steppers. There can be grid and
intrafield matching errors. Grid matching errors arise from absolute stepping errors.
While stepper stages are extremely repeatable, their stepping may deviate at each
stage position, on average, from a perfect grid. When a single stepper is used for
both the substrate and overlaying layers, these average deviations cancel out. A
different situation arises when different steppers are used for the two layers. Some
differences in the stepping of different stages are correctable. For example, scale
errors can arise when the beams of the interferometer are not exactly perpendicular
to the mirrors on the stage (Fig. 6.14). The laser beam travels a distance of 2h,
while the stage travels a distance d. The system is designed on the assumption that
h = d, while actually
h = d / cos θ. (6.28)
This indicates a scaling error proportional to cos θ, which is why these errors are
designated as cosine errors. Being a linear error, it can easily be corrected.
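The magnitude of a cosine error is easy to estimate from Eq. (6.28). A sketch with an illustrative beam tilt (the 100-µrad value is made up):

```python
import math

def cosine_scale_error(theta_rad: float) -> float:
    """Fractional scale error 1/cos(theta) - 1 from a beam tilted by theta."""
    return 1.0 / math.cos(theta_rad) - 1.0

# For small theta this is ~theta**2 / 2:
print(cosine_scale_error(100e-6))  # ~5e-9, i.e., 5 parts per billion
```

Even a seemingly large beam tilt thus produces only a parts-per-billion scale error, consistent with treating cosine errors as small, correctable linear errors.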
It is also possible that the X and Y steppings are not truly perpendicular on one
machine, resulting in a grid that is not perfectly orthogonal. This is correctable
through software if the magnitude of the error is known. Stability of orthogonality
is important, so stage mirrors are usually made from thermally stable materials,
such as Zerodur, and it is preferable to have the X and Y stage mirrors fabricated
from a single block of material to maximize stability.58
Nonlinear grid-registration errors originate in the nonflatness of the stage
mirrors. Consider the stage mirrors drawn in Fig. 6.15. The branch of the
interferometer that measures the stage’s position in the Y direction measures
apparent movement in the Y direction as the stage is moved from left to right
because of the mirror’s nonflatness. When a single stepper is used, the resulting
stepping errors, relative to a perfect grid, do not lead to overlay errors because
these stage-stepping errors occur for all layers and cancel each other out. Because mirror
nonflatness arises from mirror polishing, the nonflatness generally varies from
stepper to stepper, and stage mismatch occurs. For modern wafer steppers, the
stage mirrors are flat to approximately λ/40, or about 16 nm, where λ = 633 nm is
the wavelength of light of the stage’s interferometer.
Grid-registration errors can be corrected using software.59 Typically, reference-
matching wafers are used for matching all steppers in a facility to the same
positions. Look-up tables can be used for these corrections, or the grid models can
Figure 6.15 Stage mirror nonflatness that leads to nonlinear stage-matching errors.
∆r = D3 r³ + D5 r⁵, (6.29)
∆x = D3 x r² + D5 x r⁴, (6.30)
∆y = D3 y r² + D5 y r⁴, (6.31)
where r is the radial distance from the center of the lens field,
r = √(x² + y²). (6.32)
Because of the magnitude of the exponents of terms in Eq. (6.29), D3 and D5 are
referred to as third- and fifth-order distortion, respectively. Systematic distortion in
scanning systems results from the averaging of Eqs. (6.30) and (6.31) across the
slit during the scan. This leads to image blurring, so it is important that distortion
be low in the lenses used for scanning lithography.
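Equations (6.30) and (6.31) are straightforward to evaluate. The coefficient values in this sketch are invented for illustration and do not describe any real lens:

```python
# Displacement of an image point at static field position (x, y) due to
# third- and fifth-order radial distortion [Eqs. (6.30) and (6.31)].

def distortion(x, y, D3, D5):
    r2 = x * x + y * y                   # r**2, with r from Eq. (6.32)
    dx = D3 * x * r2 + D5 * x * r2 * r2  # D3*x*r^2 + D5*x*r^4
    dy = D3 * y * r2 + D5 * y * r2 * r2
    return dx, dy

# Field coordinates in mm; with these made-up coefficients the output
# units are arbitrary.
dx, dy = distortion(10.0, 10.0, D3=1e-4, D5=-2e-7)
print(dx, dy)  # equal by symmetry, since x == y here
```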
Distortion tends to be fairly constant within a particular lens family. However,
there are strong economic reasons to minimize the numbers of very high-
performance steppers, and to mix-and-match steppers of different types.61
Consequently, lenses of more than one type are usually used to fabricate
semiconductor devices. Older generations of projection optics had nonzero levels
of distortion in the design, where the magnitude of ∆x and ∆y, due to distortion
alone, could be as large as several tens of nanometers, sometimes up to 200 nm. In
such situations good matching could be achieved only within given lens families,
particularly when different lens designs might have distortions with opposite signs
for D3 and D5 . The designs of modern lenses have negligible levels of radial
third- and fifth-order distortion. For these newer lenses, placement errors are due
primarily to manufacturing imperfections, which tend to produce intrafield errors
that are not systematic, as in the case of third- and fifth-order lens distortion.
Polishing variations across the surfaces of lenses and mirrors and inhomogeneity
in optical materials can lead to intrafield registration errors. These tend to have
random patterns.
While distortion is extremely stable for i-line systems, distortion varies with
very small shifts—less than a picometer—in wavelength for DUV systems.62,63
Consequently, additional controls are needed on DUV machines, the most critical
being wavelength control. Modern KrF excimer lasers, the light source for 248-
nm exposure systems, contain internal wavelength references that provide good
wavelength control.64 As with anything, this calibration system can malfunction or
drift, and there is always the potential for change in third- and fifth-order distortion
as a consequence of a small shift in wavelength. Also, changes in barometric
pressure change the wavelength of light in the air, and excimer steppers must adjust
for this correctly, or third- and fifth-order distortion may be introduced.
For step-and-scan systems the intrafield placement errors also have a scanning
component. In some cases, lenses have improved to the point that intrafield
placement errors are dominated by the stage scanning.65 Part of this reduced
contribution from the lens is due to the scanning itself, because the static lens-
placement errors are averaged during the scan. If the lens-placement error at static
field position (x, y) is ~ε(x, y), then the placement error ~e(x) of the scanned image at
Overlay 241
which is illustrated in Fig. 6.17. Note that there is no variation (at a given slit
position of x) of the intrafield placement errors in the y direction from the static
lens-placement errors if the scanning is perfect. Such errors in the y direction must
necessarily result from the scanning.
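The scan averaging described above can be mimicked with a toy error map; this is a sketch with a random static map (seed and array sizes arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)

# Static lens-placement error at slit position x (rows) and scan
# position y (columns), in nm; random values for illustration only.
static = rng.normal(0.0, 2.0, size=(5, 101))

# Under perfect scanning, the printed placement error at each slit
# position is the average of the static error along the scan (y):
scanned = static.mean(axis=1)
print(scanned.shape)  # (5,) -- one residual error per slit position
```

The y-dependence of the static map disappears in `scanned`, matching the observation that y-direction placement errors must come from the scanning itself.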
There are random and systematic contributions to the scanning errors. In this
case random means more than failure to conform to some intrafield error model.
These random errors vary from field-to-field and are independent of measurement
error. These are average errors during the scan, so they lead to overlay errors
but not image fading, which results from variations during the scan. This random
component does not exist in step-and-repeat systems, so there is more potential for
overlay errors with scanners, even when a scanner is matched to itself. Moreover, the stage that
scans the reticle can malfunction, another control problem unique to step-and-scan
systems.
The intrafield overlay contains some theoretically correctable contributions that
can be fit to the overlay model [Eqs. (6.22) and (6.23)]. When using two different
exposure tools, one for the first layer and the other tool for the second layer, there
are additional components that arise from differences between the two lenses that
do not conform to the model. The matching of the lenses is the set of residual
overlay vectors remaining after subtracting out the correctable contributions. These
sets of vectors are dependent upon the criterion used for determining the model
parameters. The least-squares method is used most commonly, but it is not the
only reasonable criterion. For example, since devices are usually designed to yield
so long as overlay is less than a particular value and fail when the value is exceeded,
yield can be maximized when the worst overlay errors are minimized.67 This
criterion consists of the minimization of
max_{x,y} (ρ_x, ρ_y), (6.34)
where ρ_x and ρ_y are the residual errors in Eqs. (6.22) and (6.23). The minimization
of Eq. (6.34) differs from the least-squares criterion, which consists of the
minimization of the quantity in Eq. (6.24) and typically leads to different model
coefficients. We say that lenses are matched when correctable parameters are set
to minimize the selected criterion, either Eq. (6.24), (6.34), or another that may be
chosen.
The least-squares method has two key advantages. The first is computational
simplicity, since model coefficients can be found through simple matrix
calculations. The least-squares method also provides transitivity,68 that is, if
Stepper A is matched to Stepper B, and Stepper B is matched to Stepper C, then
Stepper A is also matched to Stepper C. Minimization of maximum error does
not share this characteristic.69 This is a practical problem when matching many
steppers within a large fabricator.
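The two criteria can be contrasted with a small numerical sketch. The data and model here are hypothetical: only translation, magnification, and rotation terms are fit (the full model of Eqs. (6.22) and (6.23) has more terms), and a greedy random search stands in for a proper min-max (Chebyshev) fit of Eq. (6.34):

```python
import numpy as np

# Hypothetical intrafield overlay data: field positions (x, y) in mm and
# overlay errors dx in nm, built from translation, magnification, and
# rotation terms plus unmodeled residuals.
rng = np.random.default_rng(0)
x = rng.uniform(-10, 10, 40)
y = rng.uniform(-10, 10, 40)
dx = 5.0 + 0.8 * x - 0.3 * y + rng.normal(0.0, 2.0, 40)

# Least-squares matching: a simple matrix calculation, as noted above.
A = np.column_stack([np.ones_like(x), x, -y])
coef, *_ = np.linalg.lstsq(A, dx, rcond=None)

# Min-max matching in the spirit of Eq. (6.34): minimize the worst
# residual.  A greedy random search stands in for a true Chebyshev fit.
best_coef = coef.copy()
best = np.max(np.abs(dx - A @ best_coef))
for _ in range(20000):
    trial = best_coef + rng.normal(0.0, 0.2, 3)
    worst = np.max(np.abs(dx - A @ trial))
    if worst < best:
        best, best_coef = worst, trial

print("least-squares coefficients   :", np.round(coef, 3))
print("worst residual, least squares:", round(np.max(np.abs(dx - A @ coef)), 2), "nm")
print("worst residual, min-max fit  :", round(best, 2), "nm")
```

The min-max fit never has a larger worst-case residual than the least-squares fit, but its coefficients generally differ, which is the source of the transitivity problem noted above.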
When only a single stepper is used, the intrafield overlay model accounts for
nearly all of the intrafield overlay errors, and the overlay errors throughout the
entire field are therefore inferred from measurements at only a few points within
the exposure fields. It is quite a different situation when more than one stepper is
used. Matching methodology must account for overlay in the areas of the exposure
field that are not accessible when measuring product wafers. Specialized structures
are normally used for measuring overlay, and these are usually placed in the scribe
lanes between product dies. This limits the amount of overlay data collected on
product wafers, since most of the exposure field is occupied by product devices,
not overlay-measurement structures. This limitation on the number of points within
each exposure field at which overlay can be sampled has significant consequences
since overlay is measured in areas where the product is not placed.
Suppose overlay is measured on several fields of the wafer, and also within
each field at four sites with the same intrafield positions. One can fit the acquired
data to the overlay model [Eqs. (6.22) and (6.23)] by minimizing the appropriate
metric (least squares, minimum error, etc.). This approach may not optimize the
overlay over the entire exposure field, particularly the parts of the field in which
the product is located and overlay is not measured. This is demonstrated in the
following way.69 Between two steppers the overlay is measured on a 12 × 11 grid
with 1.95-mm spacings in the x and y directions within the exposure field. The
resulting lens matching is shown in Fig. 6.18. In a gedanken (thought) experiment,
the overlay measurements are considered at only four points among these 132 sites,
in a pattern that resembles a typical overlay-measurement plan. Several subsets
of four points are considered in Fig. 6.19. For each set of four sites, the overlay
model coefficients are recalculated and plotted in Fig. 6.20. As one can see, the
resulting model coefficients vary significantly among sampling plans. Recall that
the baseline set of coefficients was the one that optimized the overlay over the entire
exposure field, not just at the four corner points. By measuring at only four points
and adjusting the stepper to minimize overlay at just the measured points, overlay
is moved away from the optimum, overall. This problem occurs when different
steppers are used for the overlaying of patterns.
Overlay 243
Figure 6.19 Various sampling plans for measuring overlay. In Sampling Plan #1,
measurements are taken at the four extreme corner measurement sites.
Figure 6.20 Model parameters for different sampling plans of the single data set shown in
Fig. 6.18.
Figure 6.21 The overlay errors between a first layer printed with standard illumination
(NA = 0.63, σ = 0.65) and a second layer exposed using the same numerical aperture, but a
partial coherence of σ = 0.3.69
Matching errors of the type discussed here occur infrequently when single steppers
are used with fixed operating parameters.
The changes in lens-placement errors caused by variations in illumination
conditions can be understood by considering the imaging of a simple diffraction
grating, discussed in Chapter 2. Depending on the angle of incidence of the
illumination, the numerical aperture, and the pitch of the pattern, the light from a
grating pattern projected through a lens passes through different parts of the lens.
Many aberrations are a result of variations in the polishing of optical surfaces and
inhomogeneity in glass materials, so light rays going through one part of the
lens have different errors than rays going through other parts.
For a given feature, the aberrations of its image depend on the
particular imperfections in those parts of the lens through which the light from
that feature passes. Hence, aberrations vary with pitch, numerical aperture, and
the illumination conditions. The light from patterns other than gratings is also
distributed throughout the optics with dependence upon feature size and proximity
to other features. Since light is diffracted in fairly specific directions with highly
coherent illumination and is more spread out with less coherent light, variations in
distortion for different pitches and geometries are smaller with reduced coherence
(larger values of σ).
To address these issues, at least partly, capability for measuring overlay within
product die has been developed.77 Specialized overlay-measurement marks are still
required, but they have been made small enough (<5 µm × 5 µm) that a quantity of
them can be placed within product die areas without interfering too greatly with the
circuit design. Use of such targets can reduce the impact of incomplete sampling.
As feature sizes shrink, these subtle issues generate overlay errors of
significance. For example, certain aberrations, such as coma, result in a feature-size
dependency for intrafield registration. This is shown in Fig. 6.22, where simulated
image-placement shifts of isolated clear lines are plotted against space width,
where the lens has a7 = 0.035 waves of coma.78 In the presence of coma, overlay-
measurement structures with large features measure different overlay than actually
occurs for critical fine-linewidth features in the product.
This implies that overlay measurements do not represent the overlay of critical
features in the circuits. For example, using overlay-measurement structures of
different widths, effective field magnification is found in one instance to differ
by more than 1 ppm between 2.0-µm and 0.2-µm features.79 For a 20-mm × 20-
mm field, the 0.2-µm features that might be found in the circuit are placed ±20 nm
differently at the edges of the field, relative to the positions indicated by the 2.0-µm
features that are conventionally used for overlay measurement. Modern lenses have
lower levels of aberrations than assumed for the calculations of Fig. 6.22, but
image-placement errors of several nanometers still exist between large and small
features. Measuring overlay directly using small features is a problem. Overlay
is typically measured using optical tools (Chapter 9) that can measure features
reliably only if their size is greater than about 0.25 µm. Optical tools are used
for measuring overlay in spite of this limitation because they have throughput and
cost advantages over measurement tools, such as scanning-electron microscopes,
that are capable of measuring smaller features. Subtle issues such as these need to
be addressed when nanometers represent a significant fraction of the total overlay
budget.
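As a quick check of the arithmetic in this example: the quoted ±20 nm at the edges of a 20-mm field implies an effective magnification difference of about 2 ppm, consistent with "more than 1 ppm":

```python
# Implied magnification difference between the 2.0-um and 0.2-um
# measurement features (assumed value, inferred from the text's figures).
dmag = 2e-6                   # 2 ppm
half_field_nm = 10e-3 / 1e-9  # 10-mm half-width of a 20-mm field, in nm
shift_nm = dmag * half_field_nm
print(shift_nm)  # 20.0 nm placement difference at the field edge
```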
In order to enhance productivity, high-throughput steppers with large exposure
fields are often used for noncritical layers, while smaller-field steppers are used for
the critical layers. The mixing of these different types of steppers often results in
nonconcentric exposure fields (Fig. 6.23). Control of overlay requires extensions
of the models discussed thus far, which have assumed that the centers of exposure
fields sit approximately on top of each other.80,81 Overlay needs to be measured
at the corners of the critical fields in order to identify overlay errors arising
from intrafield contributions associated with the smaller field. Mathematical
models describing field rotation and magnification must take into account the
nonconcentric nature of the fields.
determine the best mark. Polishing engineers should participate in any programs to
improve overlay, since modifications of the polish process can often reduce overlay
errors considerably, particularly when such errors result from highly variable polish
processes. For example, the use of harder polishing pads may lead to reduced
overlay errors. Stepper manufacturers have also responded with modifications to
their alignment systems, using modified illumination86 and new algorithms for
interpreting alignment signals.87
Problems
6.1 Acceptable values for wafer scaling can be assessed by comparing the effect on
overlay of a 0.01-ppm error in correcting for wafer expansion to the overlay
requirements in Table 6.1. Across a 300-mm wafer, show that a range of 3-
nm overlay error can result from a 0.01-ppm wafer-scaling error. Is this a
significant error for the 32-nm node?
6.2 Give two reasons why alignment systems should be designed to function at
wavelengths other than the ones used for patterning the resist. (Hint: Are there
any particular resist processes that are particularly problematic for actinic-
wavelength-alignment systems?)
6.3 Consider a 0.7-NA 193-nm exposure tool used to pattern 130-nm features.
Show that the diffraction-limited resolution of a through-the-lens alignment
system on this tool, operating at the HeNe laser wavelength of 632.8 nm, is
168 nm. Is this adequate for achieving overlay control that is one-third the
minimum feature size?
6.4 The alignment mark for the ASML alignment system is a phase grating
consisting of approximately equal lines and spaces of 8.0 µm width. If a
HeNe laser is used for alignment and is directed normal to the plane of the
grating, show that the minimum-NA lens required to capture the ±first-order
diffraction beams is 0.04 [use Eq. (2.1)]. What minimum NA is required for
third-order? Fifth-order? Seventh-order? Is this NA requirement too large for
microlithographic-projection optics for through-the-lens alignment?
6.5 Show that the depth required to achieve a 180-deg phase difference for the
first-order diffraction beams from the phase grating shown in Fig. 6.6 is
λ cos θ / [2n(1 + cos θ)].

Show that cos θ ≈ 1 is a suitable approximation for first-order alignment on
the ASML alignment system. How much should the depth be adjusted for
optimized seventh-order alignment relative to first-order alignment?
6.7 In a step-and-scan system, the reticle must be parallel with the reticle stage, and
the reticle and wafer stages must be parallel to each other. Show that intrafield
skew errors arise when the reticle is parallel to the wafer stage, but the reticle
and wafer stages do not scan in the same direction.
6.8 Show that the range of displacement errors across a 300-mm wafer due to a
0.01 arcsec rotation error is 14.5 nm.
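Several of the numerical answers requested above can be confirmed with back-of-the-envelope arithmetic. (The 16-µm pitch used for Problem 6.4 is an assumption: 8.0-µm lines plus equal spaces.)

```python
import math

# Problem 6.1: range of overlay error from a 0.01-ppm scaling error
# across a 300-mm wafer.
scaling_nm = 0.01e-6 * 300e-3 / 1e-9

# Problem 6.8: displacement range from a 0.01-arcsec rotation error
# across a 300-mm wafer.
rot_nm = (0.01 / 3600.0) * math.pi / 180.0 * 300e-3 / 1e-9

# Problem 6.4: NA to capture the first diffraction orders of a 16-um
# pitch grating at the HeNe wavelength, sin(theta) = lambda / pitch.
na_first = 632.8e-9 / 16e-6

print(round(scaling_nm, 1))  # 3.0 nm
print(round(rot_nm, 1))      # 14.5 nm
print(round(na_first, 3))    # 0.04
```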
6.9 Suppose a lens has third-order distortion [Eqs. (6.29)–(6.31)]. Show that the
image-placement error caused by this distortion error in the x direction at
position (x, y) in the exposure field is given by:
∆x = D₃x³ + (W²/12)D₃x,
where (0, 0) is the center of the exposure field. Show that the image-placement
error due to third-order distortion in the y direction is zero everywhere.
References
1. Semiconductor Equipment and Materials International, San Jose, California.
2. H. J. Levinson, Lithography Process Control, SPIE Press, Bellingham,
Washington (1999).
3. M. A. van den Brink, S. Wittekoek, H. F. D. Linders, F. J. van Hout, and R. A.
George, “Performance of a wafer stepper with automatic intra-die registration
correction,” Proc. SPIE 772, 100–117 (1987).
4. V. R. Nagaswami, “In-plane distortions in silicon wafers induced by a sub-
micron CMOS process,” Microelectron. Eng. 9, 457–461 (1989).
5. L. D. Yau, “Process-induced distortion in silicon wafers,” IEEE Trans.
Electron Dev. ED-26, 1299–1305 (1979).
6. J. D. Cuthbert, “Characterization of in-plane wafer distortions by double
polysilicon NMOS processing,” Proc. Microcircuit Eng. 190, (1979).
7. A. Imai, N. Hasegawa, S. Okazaki, and K. Sakaguchi, “Wafer and chip
deformation caused by pattern transfer,” Proc. SPIE 2726, 104–112 (1996).
8. W. C. Schneider, “Testing the Mann type 4800DSW wafer stepper,” Proc.
SPIE 174, 6–14 (1979).
9. H. E. Mayer and E. W. Loebach, “Improvement of overlay and focusing
accuracy of wafer step-and-repeat aligners by automatic calibration,” Proc.
SPIE 470, 178–184 (1984).
58. M. A. van den Brink, B. A. Katz, and S. Wittekoek, “New 0.54 aperture i-line
wafer stepper with field-by-field leveling combined with global alignment,”
Proc. SPIE 1463, 709–724 (1991).
59. A. Sukegawa, S. Wakamoto, S. Nakajima, M. Kawakubo, and N. Magome,
“Overlay improvement by using new framework of grid compensation for
matching,” Proc. SPIE 6152, 61253A (2006).
60. V. Nagaswami and W. Geerts, “Overlay control in submicron environment,”
Proc. KTI Microelectron. Sem., 89–106 (1989).
61. J. G. Maltabes, M. C. Hakey, and A. L. Levine, “Cost/benefit analysis of
mix-and-match lithography for production of half-micron devices,” Proc. SPIE
1927, 814–826 (1993).
62. S. K. Jones, E. S. Capsuto, B. W. Dudley, C. R. Peters, and G. C. Escher,
“Wavelength tuning for optimization of deep UV excimer laser performance,”
Proc. SPIE 1674, 499–508 (1992).
63. M. E. Preil and W. H. Arnold, “Aerial image formation with a KrF excimer
laser stepper,” Polymer Eng. Sci. 32(21), 1583–1588 (1992).
64. R. K. Brimacombe, T. J. McKee, E. D. Mortimer, B. Norris, J. Reid, and
T. A. Znotins, “Performance characteristics of a narrow band industrial
excimer laser,” Proc. SPIE 1088, 416–422 (1989).
65. J. de Klerk, “Performance of a high NA, dual stage 193-nm TWINSCAN step-
and-scan system for 80-nm applications,” Proc. SPIE 5040, 822–840 (2003).
66. J. Braat and P. Rennspies, “Effect of lens distortion in optical step-and-scan
lithography,” Appl. Opt. 35(4), 690–700 (1996).
67. H. J. Levinson and R. Rice, “Overlay tolerances for VLSI using wafer
steppers,” Proc. SPIE 922, 82–93 (1988).
68. M. A. van den Brink, C. G. M. de Mol, and J. M. D. Stoeldraijer, “Matching of
multiple wafer steppers for 0.35-µm lithography, using advanced optimization
schemes,” Proc. SPIE 1926, 188–207 (1993).
69. J. C. Pelligrini, “Comparisons of six different intrafield control paradigms in
an advanced mix-and-match environment,” Proc. SPIE 3050, 398–406 (1997).
70. H. J. Levinson, M. E. Preil, and P. J. Lord, “Minimization of total overlay
errors on product wafers using an advanced optimization scheme,” Proc. SPIE
3051, 362–373 (1997).
71. N. R. Farrar, “Effect of off-axis illumination on stepper overlay,” Proc. SPIE
2439, 273–280 (1995).
72. T. Saito, H. Watanabe, and Y. Okuda, “Effect of variable sigma aperture on
lens distortion and its pattern size dependence,” Proc. SPIE 2725, 414–423
(1996).
73. A. M. Davis, T. Dooly, and J. R. Johnson, “Impact of level specific
illumination conditions on overlay,” Proc. Olin Microlithog. Sem., 1–16
(1997).
Table 7.1 Optical mask requirements. White cells mean solutions exist or are expected.
Light gray cells indicate that significant work is required before solutions are found. Dark
gray cells indicate that no solution may exist by the appropriate date. The phase-shifting
mask (PSM) is described in more detail in Chapter 8.
Year of introduction 2010 2012 2014
“Technology node” 45 nm 36 nm 28 nm
DRAM/MPU/ASIC wafer minimum metal-1 half-pitch (nm) 45 36 28
MPU gate in resist (nm) 35 28 22
Contact in resist (nm) 56 44 35
Magnification 4 4 4
Mask minimum primary feature size (nm) 99 78 62
Mask subresolution feature size opaque (nm) 71 56 44
Image placement (nm, multipoint) 5.4 4.3 3.4
CD uniformity (nm, 3σ)
Isolated lines (MPU gates, binary or attenuated phase-shifting masks) 2.0 1.7 1.3
Dense lines (DRAM half pitch, binary or attenuated phase-shifting masks) 3.4 2.7 2.1
Contact/vias 1.9 1.5 1.2
Linearity (nm) 7.2 5.7 4.5
CD mean-to-target (nm) 3.6 2.9 2.3
Defect size (nm) 36 29 23
Data-volume (GB) 825 1310 2080
Mask-design grid (nm) 1 1 1
Attenuated-PSM transmission mean deviation from target (+/− % of target) 4 4 4
Attenuated-PSM transmission uniformity (+/− % of target) 4 4 4
Attenuated-PSM phase mean deviation from 180◦ (+/− deg) 3 3 3
Alternating-PSM phase mean deviation from nominal phase-angle target 1 1 1
(+/− deg)
Alternating-PSM phase uniformity (+/− deg) 1 1 1
Magnification: Lithography tool reduction ratio N:1.
Mask minimum primary feature size: Minimum printable feature after OPC application to be controlled on the mask for CD
placement and defects.
Mask subresolution feature size: The minimum width of nonprinting features on the mask such as subresolution assist features.
Image placement: The maximum component deviation (x or y) of the array of image centerlines relative to a defined reference
grid after removal of isotropic magnification error.
CD uniformity: The 3σ deviation of actual image sizes on a mask for a single-size and tone-critical feature. This applies to
features in x and y and isolated features.
Linearity: Maximum deviation between mask “mean to target” for a range of features of the same tone and different design sizes.
This includes features that are equal to the smallest subresolution assist mask feature and up to 3× the minimum wafer half pitch
multiplied by the magnification.
CD mean-to-target: The maximum difference between the average of the measured feature sizes and the agreed-to feature size
(design size). Applies to a single feature size and tone.
Defect size: A mask defect is any unintended mask anomaly that prints or changes a printed image size by 10% or more. The
mask-defect size listed in the roadmap is the square root of the area of the smallest opaque or clear “defect” that is expected to
print for the stated generation. Printable 180-deg phase defects are 70% smaller than the number shown.
Data volume: This is the expected maximum file size for uncompressed data for a single layer as presented to a pattern generator
tool.
Mask-design grid: Wafer-design grid multiplied by the mask magnification.
Transmission: Ratio, expressed in percentage, of the fraction of light passing through an attenuated-PSM layer relative to the
mask blank with no opaque films.
Phase: Change in optical-path length between two regions on the mask expressed in degrees. The mean value is determined by
averaging phase measured for many features on the mask.
Phase uniformity: The maximum phase-error deviation of any point from the mean value.
Masks and Reticles 259
layout was identical to the wafer layout. This approach is different from the way
that wafers are patterned with wafer steppers, where the mask pattern has only part
of the wafer pattern, and full wafer coverage is obtained by repetitive imaging of
the mask. In the earlier 1:1 lithography, the pattern on the mask was often generated
through a lithographic process in which the mask blank was exposed die-by-die on
a photorepeater, a tool that was essentially a stepper that patterned masks instead of
wafers. This method of mask making required an object whose pattern was imaged
onto the mask by the photorepeater. To distinguish between the object whose image
was repeated and that which was the mask shop’s product, the object used on the
photorepeater was referred to as a reticle. The inefficiency of this process, by which
circuit patterns were first transferred from a reticle to a mask and then to the wafer,
was recognized, and the wafer stepper was invented, whereby the circuit patterns
were stepped directly onto the wafers. To be technically correct, the object imaged
by a step-and-repeat system should be called a reticle, but producers of integrated
circuits in wafer fabricators have always referred to their master patterns as masks,
so the terms mask and reticle now tend to be used interchangeably. Regardless of
whether they are called masks or reticles, their form, fabrication, and use are much
the same.
errors contribute to correctable magnification errors, but there are residual errors
that are very significant if a substrate less thermally stable than fused silica is
modeled—even with 4× lens reduction. While there is an acceptable level of
registration errors with fused-silica masks seen in these simulations, substrates,
such as borosilicate glass, with coefficients of thermal expansion more than
ten times larger than that of fused silica, would require unrealistic temperature
control to achieve adequate performance. Fortunately, fused silica has excellent
transparency at DUV wavelengths, and new grades have been developed that
have adequate transparency for use as reticle substrates for wavelengths down to
157 nm.3 Thermomechanical stability and good transparency justify the use of
fused silica as photomask substrates, in spite of poor electrical conductivity and
associated electrostatic damage issues.
Ideal fused silica is amorphous, which constitutes another advantage of
this material for mask substrates. Crystalline materials, even those with cubic
symmetry, will have optical properties that depend on the orientation of the
polarization vector of light relative to crystal axes,4 a property referred to as
birefringence.5 Under stress, fused silica may also acquire a significant degree of
birefringence,6 so care must be taken when manufacturing the fused silica from
which photomasks are to be fabricated.7
Just as images on the wafer become blurred when the imaging plane is outside
the depth-of-focus, images also become blurred when the objects are outside the
lens’s depth-of-field. The depth-of-field is the distance that the object can be
moved, parallel to the optical axis, while maintaining good imaging. The depth-
of-field is related to the depth-of-focus by

depth-of-field = N² × depth-of-focus, (7.1)

where N is the lens reduction (4, 5, etc.). Alternatively, one could say that the
effective depth-of-focus is reduced by reticle nonflatness, by the amount

∆depth-of-focus = reticle nonflatness / N². (7.2)
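With illustrative numbers (a 2-µm reticle nonflatness is an assumed value, not a specification), Eq. (7.2) gives:

```python
# Depth-of-focus consumed by reticle nonflatness on a 4x-reduction tool,
# per Eq. (7.2).  The 2-um nonflatness is an assumed, illustrative value.
N = 4
nonflatness_um = 2.0
dof_loss_um = nonflatness_um / N**2
print(dof_loss_um)  # 0.125 um of effective depth-of-focus lost
```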
chromium are given in Table 7.3. From these values one can see that chromium
films of typical thickness are very opaque. It should be kept in mind that
the absorbers on commercially produced chromium-based photomask blanks
typically are not composed of pure chromium.16 The properties of chromium- and
molybdenum-containing compounds vary among mask-blank suppliers, who can
provide details for their own specific films.
It should be noted that most of the light is blocked by the chromium through
absorption. Consequently, masks will heat during use.17 Again, most of the
heating effect results in correctable magnification errors with suitably designed
symmetrical reticle platens. Compensating for this will require periodic reticle
realignments, which provide measurements of reticle expansion or contraction,
though this will usually reduce stepper productivity. Regardless, the noncorrectable
registration errors may become significant for the 22-nm node and beyond.
In addition to optical properties, the stress of absorber films is important.
There will be stress relief when the absorber film is etched, and this can lead
to mask distortion—often nonlinear—between patterning by the beam writer
and final mask use.18 The actual distortion pattern will depend upon the mask
layout. Extremes between clear and dark areas on the mask will lead to greater
nonlinearities. Mask distortion following etch can clearly be reduced by using
absorbers that have been deposited with low stress.
Figure 7.1 Schematic of an electron optical system for mask writing. For electron writers
where throughput is a priority, the electron source is a thermionic emitter (such as lanthanum
hexaboride), while systems designed for extremely high resolution employ thermal-field
emission sources.22,23
Electron-beam writers are distinguished mainly by two characteristics: the beam
shape and the method of scanning the beam. The two most common beam shapes
are Gaussian round beam and variable-shaped beam. As the name suggests, a
Gaussian round beam has circular symmetry in the writing plane, and the intensity
is well approximated by a Gaussian distribution. In order to form the corners of
rectangular shapes on the mask, such beams must have diameters that are much
smaller than the minimum feature sizes. Shaped beams usually have rectangular
shapes that allow good corner fidelity with relatively large beams, but other shapes
are also possible. As might be expected, shaped-beam systems typically have
higher throughput than Gaussian-beam systems, but there are methods that will be
discussed later in this chapter for maintaining reasonable throughput for Gaussian-
beam systems. Gaussian-beam systems are used where the highest resolution is
needed, while shaped-beam writers are more typically used for the patterning of
reduction masks for use in manufacturing, where resolution can be compromised
to some degree in the interest of economy.
There are two basic approaches to scanning in electron-beam writers. The first is
raster scanning, the principle behind television picture tubes and scanning-electron
microscopes. In this approach, one part of the electron optics is set to scan the beam
back-and-forth across the mask blank, while separate components turn the beam on
and off. In the second type of electron-beam writers—vector systems—the beam
is directed only at those areas that actually get exposed. The distinction between
raster and vector systems is illustrated in Fig. 7.2. As can be seen in the figure,
a significantly smaller area is actually scanned when a vector scanning system
is used for mask writing, compared to a raster scanning system. Further writing-
time reductions are possible by employing a variable-shaped beam, where larger
spot sizes can be used for writing large features, thereby reducing the amount of
scanning even further. Vector systems currently are the primary machines used to
write state-of-the-art masks for the manufacturing of integrated circuits. Examples
of different electron-beam writers are given in Table 7.4.
Improvements in edge roughness are obtained by shaping the electron beam,
and vector scanning systems today typically have shaped beams. A shaped beam
is produced by passing the electron beam through a shaped aperture, or sets of
apertures, in the electron optics.26–29 Examples of shapes that could be produced
by apertures in the Leica30 ZBA32H+ are shown in Fig. 7.3.
While the conceptual advantages of vector scanning with shaped beams have
been long understood, it has taken many years for this architecture to be adopted
widely for mask making. One obstacle that needed to be overcome was the lack
of robust software for converting circuit designs into data formats that could be
used by the e-beam writers.36 Because of strong interest in direct e-beam writing
on wafers, for which system throughput was a major concern, vector scanning
systems continued to be developed even while masks were made predominately
by raster scanning tools. Additional resources became available to develop vector
scanning systems further to meet the stringent challenges of 1× x-ray masks (see
Chapter 13), which required very high pattern fidelity (including geometries with
Figure 7.2 The area to be written is (a), while (b) illustrates exposure using a raster
scanning system, where the beam is directed over a large area, with the beam blanked
off except when directed at the dark areas. (c) The same area is scanned more quickly with
a vector scanning beam writer, which scans only over the areas that are to be exposed.24
Figure 7.3 Different beam shapes that can be formed by two overlapping apertures.31 The
apertures can often be in different planes within the electron optical column.32–35
corners) and practical writing times.37 Eventually vector scanning systems reached
maturity, and there are several vector scanning systems available today for mask
making, such as the JEOL JBX-3050MV and the NuFlare EBM-7000.38 For high
resolution, the beam voltage in these systems is 50 kV. Parameters of the EBM-
7000 are given in Table 7.5.
The area of a mask is patterned using a combination of electromagnetic
beam deflection and mechanical scanning,40 an example of which is shown in
Fig. 7.4. Beam deflection can move the beam over a distance of up to ∼1 mm, while
mechanical motion is required to cover longer distances. Beam scanning typically occurs
simultaneously with mechanical movement to minimize periods of acceleration
and deceleration that reduce throughput.
If beam writers are set up incorrectly, local linewidth errors can be produced
because of errors at a stripe boundary. This is illustrated in Fig. 7.5. Geometries
that lie on stripe boundaries can have dimensional errors because of pattern
misplacement of separate geometries that comprise the complete feature. Errors
of this type are <1.5 nm (3σ) on a NuFlare EBM-7000, mitigated in part by the use
of two-pass printing, illustrated in Fig. 7.6. With two-pass printing, the extent to
which pattern-placement errors cause dimensional errors is lessened by reducing
the degree to which geometries are split across stripe boundaries.
Another example of an error at a stripe boundary is shown in Fig. 7.7. One of
the challenges of scanning is ensuring that the stripes are butted against each other
Figure 7.5 A CD error at a stripe boundary. The intended geometry is made too short.
models, MEBES II–IV,46–48 the MEBES 4500, and the last model, MEBES 5500,
released in 1999.
Electron-beam scanning lithography suffers from a fundamental productivity
problem as the feature size decreases. In the most primitive implementation of
raster scanning, the design needs to be divided into pixels or addresses whose
dimensions are equal to those of the exposure beam. As the minimum feature size
on the mask diminishes linearly, the spot size (area) of the exposure beam must
decrease quadratically. For example, if the feature width decreases by a factor
of two, the area of the spot decreases by a factor of four, and, as a consequence,
the number of pixels needed to expose a fixed area on the reticle increases by a
factor of four. If pixels are exposed at a fixed rate, the throughput of the beam
writer decreases by approximately a factor of four each time the minimum feature
size decreases by a factor of two. Fortunately, the writing time of beam writers has
improved, though not at a pace commensurate with decreases in feature size. When
5× steppers were first introduced, the MEBES exposure rate was 40 MHz,49 and the
wafer’s minimum feature size was approximately two microns. Today’s minimum
feature size on the wafer is less than 50 nm, and mask features have scaled even
faster, due to the widespread use of subresolution features (see Chapter 8). To
maintain beam-writer productivity, it would have been necessary for the MEBES
exposure rate to increase to over 64 GHz. Because the exposure rate could not
increase at a rate commensurate with the increase in pixels, mask-writing time has
increased, leading to higher mask-production costs, even with a transition made
to vector-shaped beam architectures. The issue of lithography costs is covered in
more detail in Chapter 11. Optical-beam writers, which can provide lower mask
costs when mask features no smaller than 400 nm are required,50 are discussed
later in this chapter.
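The scaling argument above can be checked directly: the pixel count, and hence the required exposure rate, grows as the square of the shrink factor.

```python
# MEBES scaling check: 40 MHz at ~2-um minimum wafer features,
# scaled to ~50-nm features at a fixed per-pixel exposure rate.
rate_mhz = 40.0
shrink = 2000.0 / 50.0               # 2 um -> 50 nm, a factor of 40
needed_ghz = rate_mhz * shrink**2 / 1000.0
print(needed_ghz)  # 64.0 GHz, the figure quoted in the text
```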
Before discussing writing strategies further, some definitions are needed. It is
important to note that designers place circuits on grids that are different from those
used to fabricate the masks. The distinction must therefore be made between the
design grid and the writing grid. When writing, a particular grid point is often
referred to as an address.
To understand the raster-writing strategies alternative to primitive raster
scanning, it must be recognized that the important requirement for many masks
is placement of the edges of geometries, not the minimum geometry size. For
example, consider a process where the minimum pitch is 100 nm, the nominal
feature size is the half pitch, and 4× reduction steppers are used. On the reticle,
the minimum features are nominally 4 × 50 = 200 nm. However, because of the
needs of optical proximity correction, it may be necessary to shrink some features
on the mask by 20 nm, or 10 nm per edge, while other features remain sized at
the original 200 nm. In a mask-writing scheme where the writing address equals
the spot size of the beam writer, a writing grid of 10 nm would be necessary to
generate this mask, even though the minimum feature size on the mask is 180 nm.
With a writing grid and spot size of 10 nm, beam-writer throughput is very low.
To avoid this problem, raster writing schemes have been developed where the
writing grid and spot sizes are larger than the design grid. Moreover, masks are
Masks and Reticles 269
often patterned with multiple passes. Consider the following example. A circuit
is designed on a grid, which defines the unit scale: the design-grid size is 1.0. In
a writing scheme called virtual addressing, the writing grid is a multiple (2× laterally) of the
design grid (see Fig. 7.8). Moreover, the x and y 2σ points of the Gaussian spot
are at the edges of the writing grid. By making the spot size large enough, there is
overlap of the tails of the spots, which smooths out the exposure.
Illustrated in Fig. 7.8(a) is one method for moving line edges with the fine
granularity of the design grid, even when writing on a grid twice that size. By
exposing every other spot along part of the right edge, the dose is reduced by half,
and the edge is shifted to the left, relative to the edge where every spot is exposed.
This enables higher throughput by using a spot size that is larger than the design
grid.
The multipass gray approach, illustrated in Fig. 7.8(b), takes this concept one
step further. With this technique the writing grid is four times that of the design
grid.51 By using only one-quarter of the dose per pass, the pattern area is scanned
four times without requiring more writing time than if the mask were written in a
single pass with a writing grid equal to the design grid. Not every pixel is exposed on each writing pass.
By exposing some pixels only once, twice, or three times, the edges of features can
be moved. Averaging also takes place with multiple passes, which has the
net effect of reducing linewidth, registration, and butting errors.52
Edge features are placed with finer granularity by reducing the number of
exposures at the edge pixels, as shown in Fig. 7.9. In this figure, Gaussians are
placed in a row and centered at points 0, 1, 2, and 3. Each has a standard deviation
equal to two thirds. The total dose is the sum of all of the Gaussians. The
curves show the doses at the left edge of the pattern when the Gaussian centered
at point 0 has 25%, 50%, 75%, or 100% of the peak magnitude of the doses at the
other points. This represents one, two, three, or four exposures at point 0, while
the other points are exposed on all four passes. As can be seen, the edge of the
Figure 7.8 Different raster-writing schemes: (a) virtual addressing, and (b) multipass gray.
Figure 7.9 Exposure dose as a function of the dose of the Gaussian center at 0.
dose moves as the dose of the edge feature is varied. This approach can also be
combined with pixel deflection for adjusting edge locations on masks.53
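The edge-placement mechanism of Fig. 7.9 can be sketched numerically. The scan procedure and function names below are illustrative assumptions; the σ = 2/3 spot spacing follows the text:

```python
import math

SIGMA = 2.0 / 3.0  # standard deviation of each Gaussian, in grid units

def dose(x, edge_weight, n_spots=8):
    """Summed dose of Gaussian spots centered at 0, 1, ..., n_spots-1.
    The spot at 0 carries edge_weight (0.25-1.0, i.e., 1-4 quarter-dose
    passes), while interior spots receive the full dose."""
    total = edge_weight * math.exp(-x ** 2 / (2 * SIGMA ** 2))
    for k in range(1, n_spots):
        total += math.exp(-(x - k) ** 2 / (2 * SIGMA ** 2))
    return total

def edge_position(edge_weight):
    """Scan outward from the interior to find where the dose falls to
    half of the interior plateau value."""
    threshold = 0.5 * dose(4.0, 1.0)
    x = 4.0
    while dose(x, edge_weight) > threshold:
        x -= 1e-3
    return x

# The edge moves outward (to smaller x) as the edge dose increases,
# in increments much finer than the spot spacing.
edges = {w: edge_position(w) for w in (0.25, 0.5, 0.75, 1.0)}
```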
From Fig. 7.8 one might expect rough pattern edges to result from patterning
schemes that involve large writing spots. However, reasonably smooth edges are
actually made with large spot sizes, as shown in Fig. 7.10 (see Color Plates). In
this figure, two-dimensional Gaussian spots were placed on grid points with integer
coordinates, and x ≤ 0. The resulting line edge, around a dose of 0.3, is seen to be
reasonably smooth.
It was seen in Fig. 7.10 that reasonably smooth lines are generated from round
Gaussian beams, but shaped beams nevertheless provide improvement, particularly
Figure 7.10 Exposure dose contours from Gaussians placed at integer coordinates (x, y),
with x ≤ 0 (see Color Plates).
with respect to sharpening corners. This was important for making 1× x-ray masks,
and x-ray programs motivated the development of several vector scanning tools.
Another way to improve performance in electron-beam writers is to use
higher beam energies. The original EBES system used 10-keV electrons, as did
subsequent generations of MEBES machines. While the use of electrons avoids
the problems of diffraction and reflection associated with optical imaging, electron
lithography has a different set of issues. One problem is scattering.54 When
energetic electrons pass through matter, electrons scatter. Part of this scattering,
the transfer of energy from the electrons to the resist, is an essential step in the
lithographic process, whereby the solubility of the resist is altered by exposure
to radiation. However, many electrons scatter into directions different from their
original trajectories (Figs. 7.11 and 7.12), resulting in degraded resolution and
proximity effects.55,56 Scattering into the resist film, particularly the forward
scattering that degrades resolution, is reduced by increasing the voltage of the
electron beam. Consequently, newer electron-beam exposure systems have higher
beam energies, with 50 keV being typical.
The scattering of electrons has been characterized as Gaussian broadening, with
different amplitudes and widths for forward and backward scattering.55 With this
approach to the characterization of electron scattering, the energy distribution from
a beam of electrons incident at the origin of a coordinate system is given by

f(r) = \frac{1}{\pi(1+\eta)}\left[\frac{1}{\alpha^{2}}e^{-r^{2}/\alpha^{2}} + \frac{\eta}{\beta^{2}}e^{-r^{2}/\beta^{2}}\right],
where r is the radial distance from the origin in the plane of the resist, α is the
width of forward scattering (see Fig. 7.12), β is the width of the backscattering,
and η is the fraction of the total scattered energy that is backscattered.
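The double-Gaussian model can be evaluated directly. The sketch below assumes the standard normalized form of the model; the parameter values are illustrative and not taken from the text:

```python
import math

def double_gaussian(r, alpha, beta, eta):
    """Deposited-energy density at radius r: a narrow forward-scattering
    Gaussian of width alpha plus a wide backscattering Gaussian of width
    beta carrying the backscattered fraction eta (normalized so the
    integral over the plane equals 1)."""
    norm = 1.0 / (math.pi * (1.0 + eta))
    forward = math.exp(-r ** 2 / alpha ** 2) / alpha ** 2
    back = eta * math.exp(-r ** 2 / beta ** 2) / beta ** 2
    return norm * (forward + back)

# Illustrative (assumed) parameters, in microns, for a high-voltage beam:
alpha, beta, eta = 0.03, 10.0, 0.7
f0 = double_gaussian(0.0, alpha, beta, eta)  # forward term dominates
f3 = double_gaussian(3.0, alpha, beta, eta)  # only backscatter remains
```

At the origin the narrow forward term dominates; a few microns away only the weak but wide backscatter term contributes, which is the origin of proximity effects.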
Figure 7.11 Monte Carlo simulation of electron scattering for PMMA on a silicon
substrate.56 The energies of the electrons for the two different examples are shown in the
corners of the graphs.
The validity of this characterization has been assessed using Monte Carlo
simulations,57,58 with a representative result shown in Fig. 7.13. While there
is imperfect quantitative agreement between the Monte Carlo simulations and
the double-Gaussian model, there are some observable characteristics. First, the
broadening attributed to forward scattering has higher peak intensity than that
attributed to backscattering, and it also has a range <100 nm. On the other hand, the
broadening attributed to backscattering extends over distances of several microns.
Consequently, forward and backscattering have different effects. Resolution is
reduced by forward scattering, while backscattering leads to proximity effects.
Regardless of the beam energy, scattering occurs at a significant level, and
adjustments are required in order for patterns to be sized correctly, independent
of their proximity to other features. Consider the situation shown in Fig. 7.14. A seven-bar pattern is
imaged using electron beams. The total exposure dose is the sum of incident
and backscattered electrons. Note that the scattered dose is greater for the
centerline than for lines near the edges of the seven-bar pattern. One might expect
exact corrections to be very complex for random logic patterns, but fortunately,
the backscattering range of 50-keV electrons is very long (∼10 µm), enabling
proximity corrections to be based upon average pattern densities rather than
detailed layouts.
One clever approach to proximity correction is GHOST, which is based upon the
idea that the background dose can be equalized by using a second exposure.59 The
pattern of the second exposure is the complement of the original, and the halfwidth
of the beam used for the second pass is approximately equal to the distance
over which the electrons are backscattered. The dose for the second exposure is
proportional to the incident dose and the magnitude of backscattering.
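The dose-equalizing idea behind GHOST can be illustrated with a one-dimensional toy model (assumed for illustration, not an actual tool algorithm): the backscatter background is modeled as the pattern convolved with a wide Gaussian, and a complementary exposure with the same wide profile flattens it.

```python
import math

def gaussian_kernel(sigma, half_width):
    """Normalized discrete Gaussian kernel."""
    ker = [math.exp(-i ** 2 / (2 * sigma ** 2))
           for i in range(-half_width, half_width + 1)]
    s = sum(ker)
    return [k / s for k in ker]

def convolve(signal, kernel):
    """Convolution with edge clamping."""
    h = len(kernel) // 2
    n = len(signal)
    return [sum(k * signal[min(max(i + j - h, 0), n - 1)]
                for j, k in enumerate(kernel))
            for i in range(n)]

eta = 0.7  # backscattered energy fraction (illustrative)
pattern = [1 if 40 <= i < 60 else 0 for i in range(100)]
wide = gaussian_kernel(sigma=8.0, half_width=30)  # ~backscatter range

background = [eta * v for v in convolve(pattern, wide)]              # first pass
ghost = [eta * v for v in convolve([1 - p for p in pattern], wide)]  # complement
total = [b + g for b, g in zip(background, ghost)]
# total is now ~eta everywhere: the background no longer depends on
# local pattern density.
```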
The GHOST technique has the disadvantage of requiring a second exposure
pass, reducing overall exposure-tool productivity. Consequently, many electron-
beam writers rely on sophisticated software for proximity corrections.60–62 In
areas that receive considerable dose from electron backscattering, the beam writer
Figure 7.13 Monte Carlo simulation of electron forward and backscattering, compared to
the double-Gaussian model.57,58
exposes with a reduced dose, or the edges of the directly exposed pattern are
shifted, using some of the methods described previously.
Another way that electrons can scatter into unintended parts of the resist film is
by reflection from the bottom of the electron-beam optical column (Fig. 7.15),
a phenomenon often referred to as fogging.63 Software solutions have been
proposed, similar to those used to correct for more direct proximity effects.64–66
Hardware solutions have been implemented involving various types of sufficiently
deep chambers, sometimes with an electrostatic potential to reduce reflections of
electrons from the bottom of the electron-beam column.67,68
Electron beams can impart an electrostatic charge to the substrate, particularly
when a nonelectrically conducting resist is the top layer. The amount of charge
will vary across the substrate according to pattern density, and this will also
vary over time as the mask pattern is being written. This charging can lead to
registration errors, because it will deflect the electron beam from the intended
location.70–72 A number of schemes have been proposed for mitigating the effects
of substrate charging. One proposal involves the generation of a model to predict
the electric fields generated by the electron beam,73 but such models are necessarily
complex, involving both the particular pattern being generated on the mask as well
as the writing sequence. Another approach incorporates electrically conducting
resists74 or electrically conducting overcoats. This method is effective at reducing
electrostatic charging, but overcoats often result in additional defects,75 and the
imaging properties of the electrically conducting resists are not always adequate.
Fundamental to assessing the magnitude of pattern-placement errors is
equipment for measurement. Measuring absolute pattern locations over distances
exceeding 100 mm, yet accurate to nanometers, is extremely challenging.
Specialized tools exist for performing such measurements, most notably the IPRO4,
currently manufactured by KLA-Tencor.76 While such tools can be calibrated quite
precisely to artifacts,77 it is very difficult to achieve absolute accuracy.78–80 To
accomplish this it is necessary to have accurate linearity across long distances and
nearly perfect orthogonality between axes.
Figure 7.15 Backscattered electrons reflect from the bottom of the electron-beam column,
causing a background of diffuse exposure.69
comprising the structure. However, beam writers typically require data that can
be streamed at very high writing rates. This requires that the design or OASIS data
hierarchy first be removed, a step often referred to as “flattening” the data. After
the hierarchy is removed, the circuit patterns need to be converted from polygons
of arbitrary sizes and shapes to the primitive shapes (rectangles, triangles, etc.)
consistent with the beam writer’s architecture. Finally, the data needs to be broken
down into the fields and subfields of the beam writer. This whole process of
conversion from the hierarchical format is known as “fracturing.” As a consequence
of flattening and fracturing, beam-writer databases are usually much larger than the
original design files.
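The flattening-and-fracturing flow can be illustrated with a toy decomposition of an already-flattened bitmap pattern into the rectangle primitives a shaped-beam writer exposes (a greedy sketch under assumed names; production fracturing engines are far more elaborate):

```python
def fracture(bitmap):
    """Greedy fracture of a 0/1 bitmap into axis-aligned rectangles:
    find horizontal runs in each row, then merge identical runs on
    consecutive rows into taller rectangles. Returns (x, y, w, h)."""
    rects = []
    open_runs = {}  # (x, width) -> starting row of the growing rectangle
    n_rows = len(bitmap)
    for y in range(n_rows + 1):  # one extra pass to close remaining runs
        runs = set()
        if y < n_rows:
            row, x = bitmap[y], 0
            while x < len(row):
                if row[x]:
                    x0 = x
                    while x < len(row) and row[x]:
                        x += 1
                    runs.add((x0, x - x0))
                else:
                    x += 1
        for key in list(open_runs):  # close rectangles whose run ended
            if key not in runs:
                x0, w = key
                y0 = open_runs.pop(key)
                rects.append((x0, y0, w, y - y0))
        for key in runs:             # open newly started runs
            open_runs.setdefault(key, y)
    return rects

# An L-shaped polygon fractures into two rectangles:
rects = fracture([[1, 1, 1, 1],
                  [1, 1, 0, 0],
                  [1, 1, 0, 0]])
```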
blank supplier. As typical i-line resists are sufficiently stable, this is a reasonable
thing to do for masks generated on optical-beam writers. An example of a resist
commonly used with 364-nm exposure systems is iP3500 from Tokyo Ohka Kogyo
Co., Ltd. With optical exposure, there are standing waves in the resist, reducing
process control. This is improved by the use of post-exposure bakes, just as is
found in wafer processing.93
Mask-patterning tools operating at DUV wavelengths lead to the prospect of
chemically amplified resists, with all of the instabilities discussed in Chapter 3.
The transition to DUV optical-patterning tools has motivated mask shops to coat
the mask blanks themselves. These also require post-exposure bakes. Control of
bake temperatures of the resist is more difficult with thick glass substrates relative
to what is achievable with silicon wafers. Consequently, low sensitivity of the resist
to the post-exposure bake temperature is an important parameter to consider when
choosing a chemically amplified resist for mask making.
Electron-beam pattern generation usually requires e-beam-sensitive resists.
Since the mechanism for e-beam exposure is different from optical exposure,94
materials optimization is beneficial, although there are some resists, such as
Fujifilm Electronic Materials’ FEP171, that work well when exposed on e-beam or
DUV optical mask writers. For many years, the most commonly used electron-
beam resist was a positive material developed at Bell Laboratories, poly(butene-
1-sulfone), usually referred to by its acronym, PBS95 (Fig. 7.19). This material is
an alternating copolymer of 1-butene and sulfur dioxide that undergoes scissioning
upon exposure to energetic electrons. The lower-molecular-weight byproducts of
the exposure are soluble in organic solvents. Typical developers for PBS are
mixtures of pentanone and other solvents, such as methyl isoamyl ketone.96
PBS is a reasonably sensitive resist and can be exposed with doses on the
order of 1 µC/cm2 at 10-keV beam energies. Sensitivity is critical for high
patterning-tool throughput. Even with such sensitive resists, several hours are
required to pattern a single mask for a leading-edge microprocessor or memory,
so significantly less-sensitive resists are problematic. PBS also has the virtue of
being able to hold up to the wet-etch chemistries used to etch chromium masks.
For these reasons, mask makers long tolerated the otherwise poor lithographic
performance of PBS.97 However, new resists are now being used, such as FEP171
from FujiFilm Electronic Materials, SEBP9092 from Shin-Etsu, or ZEP 7000, a
polymer of methylstyrene and chloromethyl acrylate from Nippon Zeon. All of these
resists require somewhat higher doses (8–10 µC/cm2) than PBS, but provide good
lithographic performance and can be used when plasma-etching chromium.98 ZEP
7000 uses an organic solvent developer (a mixture of diethyl ketone and diethyl
malonate, or a mixture of methyl isoamyl ketone and ethyl malonate), which
evaporates much faster than water. Because the development rate is temperature
dependent, and evaporation causes nonuniform cooling, it is more difficult to
control the development of resists that use organic solvents as developers. A
TMAH-based aqueous developer is used with FEP171.99
As electrons penetrate a resist film, they undergo inelastic scattering,
resulting in a cascade of secondary electrons.100 Attendant energy transfers can
induce chemical reactions, i.e., exposure of the resist. As discussed previously, this
scattering can limit the resolution potential of electron-beam lithography. In optical
lithography involving photons with 193-nm wavelengths and longer, absorption
occurs between molecular energy levels. Since this is a localized event, it is
typically a photoactive compound or photoacid generator that directly absorbs the
light. In contrast, the high-energy electrons used to pattern masks cause ionization
at the atomic level. Ionization can also be produced by secondary electrons
produced by inelastic scattering of the primary electron beam. The essential
chemical reactions that ultimately lead to changes in the solubility properties of
the resists result from secondary reactions that originate with ionization.
In addition to positive resists, negative resists are used for mask making. With
vector scanning, where scanning is limited to the features that are to be exposed,
exposure times can be reduced substantially for certain patterns by using negative
resists. Another advantage of negative resists is illustrated in Fig. 7.20. Although
there is no significant difference between positive and negative resist processes in
dimensional control of directly patterned features, there is a difference when the
critical feature is a fixed tone. For example, the critical feature for gate masks is
typically a line. Features need to be exposed on both sides of this line when using
positive resists. As a consequence, pattern placement as well as direct linewidth
control will have an impact on the dimension of this most critical feature. When
line-dimensional control is critical, such as for gate masks, negative resists have
Figure 7.20 Exposures using positive and negative resists. To create a line in resist,
exposures to the left and right of the line are required with positive resists, while only the
line itself needs to be exposed when using negative resists.
Figure 7.21 (a) Partially developed novolak-type EBR900 resist following four-pass vector
scanning exposure. The effect of resist heating is apparent, although much less than in
single-pass scanning. (b) Change of effective dose due to heating in four-pass vector
scanning exposure.106
high enough, positive resist can even be converted to negative tone! Because
the local temperature rise is related to the electron-beam dose, the overall effect
depends upon the beam-writing strategy. In particular, vector scanning systems
result in greater local temperature rises than raster scanning systems. The effect is
mitigated in both cases by the use of multipass writing strategies.105
7.7 Etching
For many years the chromium films of photomasks have been wet etched. Typical
chromium wet etchants contain ceric ammonium nitrate—Ce(NH4)2(NO3)6—mixed
with nitric, perchloric, or acetic acid.107 With wet etching there is
appreciable undercut. Consequently, chromium thickness variations contribute to
linewidth variations on the reticle, particularly when wet etching is used.
Patterns cannot be transferred with good fidelity into the chromium film with
wet etches when the undercut becomes comparable in size to the features. Features
smaller than 100 nm are found on the mask today, dimensions comparable to
absorber thicknesses. For critical applications, chromium mask etching has moved
to dry etching for the same reasons that wafer etching underwent a transition to
plasma etch processes. Because CrOxCly species are volatile,108 typical chromium
etches involve mixtures of Cl-containing gases (such as Cl2 and CCl4 ) and O2 ,109
with additional gases added to reduce etch-loading effects that can have an impact
on isolated-dense biases.110
For decades chromium and chromium-containing materials were used for
absorber films on photomasks. However, chromium has proven to be a difficult film
to dry etch, so there has been some work on alternative opaque mask materials,
most notably MoSi. MoSi has been applied extensively for the fabrication of
attenuated phase-shifting masks (which are discussed in the next chapter), and
more recently this material has been used more heavily for binary masks, notably
with Shin Etsu’s Opaque-Molybdenum-Over-Glass (OMOG) material.111 Because
MoF6, MoCl4, SiF4, and SiCl4 are volatile compounds, halogens and halogen-containing gases are used for etching MoSi.112,113 Since SiO2 is etched by
fluorine chemistries, oxygen is sometimes added to improve selectivity between
the absorber and the glass substrate.
As will be discussed in the next chapter, the SiO2 glass substrate is sometimes
etched to produce a phase shift. In this case, there is a significant knowledge base
from silicon-wafer processing regarding plasma etching of SiO2 . In the case of
photomask etching, compatibility with the resists used for mask making and the
materials used for absorbers is necessary.
7.8 Pellicles
As mentioned at the beginning of this chapter, step-and-repeat and step-and-scan
modes of wafer patterning require masks with no killer defects in order to achieve
good yields. While masks are made without any defects that result in nonfunctional
die, preventing particles from depositing on masks during extended mask usage is
t = \frac{4Md}{\mathrm{NA}},  (7.4)
where M is the lens reduction, NA is the numerical aperture of the lens, and d is
the diameter of the particle on the pellicle. More detailed theoretical investigations
have shown that imaging is protected only for particles about one half the size given
by Eq. (7.4). For example, process windows are not reduced significantly with
pellicle standoff distances of 6.3 mm and particles <90 µm.116 Usually a pellicle
needs to be attached only to the chrome side of the mask, since the glass blank
itself serves the same purpose as the pellicle with respect to particles.117
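Eq. (7.4) is easy to evaluate numerically; the particle size and lens parameters below are illustrative (and recall that more detailed treatments protect only particles about half this size):

```python
def min_standoff(lens_reduction, numerical_aperture, particle_diameter):
    """Pellicle standoff t = 4*M*d / NA from Eq. (7.4): the distance at
    which a particle of diameter d is sufficiently out of focus."""
    return 4 * lens_reduction * particle_diameter / numerical_aperture

# Illustrative values: a 90-um particle, 4x reduction, NA = 0.93
t = min_standoff(4, 0.93, 90e-6)
print(t * 1e3, "mm")  # ~1.5 mm by Eq. (7.4)
```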
Pellicle films are usually polymers, with nitrocellulose and forms of Teflon
being common. These materials must be mechanically strong when cast as thin
films, be transparent, and resistant to radiation damage. Good light transmission
through the pellicles is a combination of transparency and optimization of the
thin-film optics.118 Transmission through a nonabsorbing film as a function of
film thickness is shown in Fig. 7.23. As can be seen, the pellicle transmission
is maximized at particular thicknesses, and pellicles are fabricated at thicknesses
corresponding to such maxima. In some instances, antireflection coatings are
Figure 7.23 Calculated transmission through a thin Teflon AFTM film, as a function of film
thickness, for normally incident 248.3-nm light. The index of refraction for the film is 1.3.
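The thickness dependence in Fig. 7.23 follows from standard thin-film interference. A sketch, assuming the usual Airy transmittance of a nonabsorbing film in air at normal incidence:

```python
import math

def slab_transmission(n, thickness, wavelength):
    """Airy transmittance of a nonabsorbing film of index n in air at
    normal incidence, from interference of the two surface reflections."""
    R = ((1 - n) / (1 + n)) ** 2                      # single-surface reflectance
    delta = 4 * math.pi * n * thickness / wavelength  # round-trip phase
    return (1 - R) ** 2 / (1 - 2 * R * math.cos(delta) + R ** 2)

n, lam = 1.3, 248.3e-9
half_wave = lam / (2 * n)  # maxima recur every ~95.5 nm of thickness
t_max = 8 * half_wave      # a transmission maximum, near 764 nm
t_min = 8.5 * half_wave    # a transmission minimum in between
```

Transmission equals 1 whenever the optical thickness is a multiple of a half wave, which is why pellicles are fabricated at thicknesses corresponding to the maxima.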
Having a very thin pellicle relaxes many of the requirements on the pellicle.
This is a consequence of refraction, and can be understood by considering
Fig. 7.24. If a pellicle is not perfectly flat, normally incident light refracts and
is displaced by a distance of δ given by
\delta = t\left(1 - \frac{1}{n}\right)\sin\phi,  (7.5)
where t is the thickness of the pellicle and n is the index of refraction of the pellicle
material, which is typically 1.3–1.5. For thin pellicles, the displacement of light
rays by refraction is small, even for moderate amounts of pellicle nonflatness. Other
aberrations can be introduced by a tilted pellicle.121
Even a perfect pellicle distorts the wavefront. Consider the situation shown in
Fig. 7.25. Because of diffraction, light propagates from openings in the mask at
various angles φ. The largest relevant angle is set by the numerical aperture of
the projection optics on the reticle side of the lens, which is the numerical aperture
usually quoted, divided by the lens reduction. This results in moderate angles of
incidence being relevant. For a lens with a specified NA (on the wafer side), there
can be angles φ up to arcsin(NA/N), where N is the lens reduction. For a 0.93-
NA 4× lens, angles can be as large as 13.4 deg. The light rays that are nonnormal
to the pellicle result in wavefront errors. These are calculated as follows. Light
propagating through the space of the pellicle has an optical path length of t/ cos φ
in the absence of the pellicle. With a pellicle, the light traverses a different optical
path length to reach the line perpendicular to the straight light ray. The optical path
difference (OPD) is

\mathrm{OPD} = t\left[\frac{n}{\cos\left(\arcsin\frac{\sin\phi}{n}\right)} + \left(\tan\phi - \frac{\sin\phi/n}{\sqrt{1-\sin^{2}\phi/n^{2}}}\right)\sin\phi - \frac{1}{\cos\phi}\right].  (7.6)

The last term is the optical path for a ray traveling straight in air in the absence
of the pellicle. The first term is for the refracted ray in the pellicle, and the middle
term is the optical path of the refracted light from the pellicle to the point where it
meets the line normal to the rays, shown in Fig. 7.25.
To the second order, this gives an optical path difference of
t\left[n - 1 + \frac{\phi^{2}}{2}\left(1 - \frac{1}{n}\right)\right].  (7.7)
A constant optical path difference simply represents an overall phase shift for all
rays and has no net effect on imaging, unlike variations in phase as a function of
angle. From Table 4.2, we see that Eq. (7.7) represents a focus error. There are also
higher-order aberrations induced by pellicles, representing spherical aberration.122
These aberrations are minimized by the use of very thin pellicles, as seen from
Eqs. (7.6) and (7.7). Aberrations are also minimized by a low value for the index
of refraction n of the pellicle material. Fluoropolymers, which are useful as pellicle
materials because they are transparent and durable at deep ultraviolet wavelengths,
also have reasonably low indices of refraction at exposure wavelengths. For
example, Teflon AF2400 has an index of refraction ≈ 1.35 at a wavelength of 193
nm.123
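The exact pellicle OPD of Eq. (7.6) and the quadratic approximation of Eq. (7.7) can be compared numerically; the thickness and index below are illustrative:

```python
import math

def opd_exact(t, n, phi):
    """Optical path difference through a pellicle of thickness t and
    index n for a ray at incidence angle phi (radians): path in the
    pellicle, plus the exit path to the common wavefront, minus the
    path the ray would travel in air without the pellicle."""
    phi_r = math.asin(math.sin(phi) / n)  # refraction angle in the film
    return t * (n / math.cos(phi_r)
                + math.sin(phi) * (math.tan(phi) - math.tan(phi_r))
                - 1 / math.cos(phi))

def opd_quadratic(t, n, phi):
    """Second-order (defocus) approximation, Eq. (7.7)."""
    return t * (n - 1 + (phi ** 2 / 2) * (1 - 1 / n))

t, n = 825e-9, 1.35       # thin fluoropolymer pellicle (illustrative)
phi = math.radians(13.4)  # largest reticle-side angle for a 0.93-NA 4x lens
exact, approx = opd_exact(t, n, phi), opd_quadratic(t, n, phi)
```

At this angle the two expressions agree to better than 0.1%, and both are proportional to the pellicle thickness, which is why very thin pellicles minimize the induced aberrations.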
Additional complexities have accompanied the use of immersion lithography,
which results in high angles of incidence on pellicles (see Problem 7.5). Shown
in Fig. 7.26 is the transmission through a typical ArF pellicle as a function of
the angle of incidence. At low angles, the transmission is very high and varies
little with the angle of light rays incident on the pellicle. However, at numerical
apertures >1.0 relevant to immersion lithography (see Chapter 10), the pellicle’s
transmission varies with angle of incidence, effectively inducing apodization
and causing linewidths to vary (Fig. 7.27) as a function of pitch (among other
consequences). There will also be differences in phase of the transmitted rays as a
function of the angle of incidence.124
Even with pellicles, particulate defects can form on the surfaces of photomasks.
Pellicles protect masks from particles, but they do not prevent gases from entering
the volume between the mask and the pellicle. As noted above, pellicle frames
typically have holes in them to allow for pressure equalization between the
ambient air and the volume between the pellicle and mask. Although the air inside
Figure 7.26 Calculated transmission of light through a pellicle as a function of the angle
of incidence. For the calculations it was assumed that the pellicle thickness was 825 nm
and index of refraction was 1.4.125 Transverse electric light is perpendicular to the plane of
incidence, while transverse magnetic light is parallel to the plane of incidence.
Figure 7.27 Calculated critical dimensions through pitch for 193-nm immersion
lithography.125 A 1.35-NA lens is assumed, with azimuthally polarized C-quad illumination
(0.8/0.5/30 deg), for a nominally 55-nm line on a binary mask. Differences in critical
dimensions over 2 nm are calculated between masks with and without a pellicle. For the
calculations it was assumed that the pellicle thickness was 825 nm and index of
was 1.4.
steppers is very clean, very small amounts (<1 ppb) of contaminants can lead to
photochemical deposition on mask surfaces. This may be enhanced because of
traces of chemicals remaining on the mask surfaces from the mask fabrication
process. Under intense DUV illumination, these chemicals can react and ultimately
form particulates.126,127 For example, ammonium sulfate was one material found
on photomasks on which haze had grown. The ammonium could have come from
ammonium hydroxide used for reticle cleaning, while the sulfur could have come
from sulfuric acid (also used for reticle cleaning) or from sulfur dioxide in the
ambient air. Molecular contaminants can also outgas from pellicle adhesives. A
possible mechanism for the generation of ammonium sulfate has been proposed.128,129
Masks, being made from glass, are fragile, and need to be handled with care. In
addition to the risk of mechanical damage, the use of an electrical insulator as a
substrate makes photomasks susceptible to damage from electrostatic discharge
(ESD).130 For assessing the susceptibility to ESD of particular mask-handling
methods, a mask (the Canary Reticle™) has been designed that has structures
particularly liable to electrostatic discharge.131 These consist of arrays of large
chrome rectangles with isolated chrome lines extending from them. The tips of
these isolated lines are close (1.5 µm) to adjacent large chrome areas (Fig. 7.28).
These structures are arranged so that they point towards the interior of the reticle,
with arrays originating from all four sides of the reticle.
Sparks from discharges cause the mask absorber material to melt and can lead
to bridging across gaps on the mask. An example of this is shown in Fig. 7.29.
This has been observed in a controlled experiment when the Canary Reticle was
contained in a nonstatic-dissipative storage case that was subjected to a potential
of several thousand volts. It is not straightforward to infer what will happen to
actual reticles from tests involving the Canary Reticle, but this reticle can be quite
useful in identifying causes of electrostatic discharge that are actually occurring. Reticle
damage from ESD is a rare event, and tracking down causes is often difficult. The
Canary Reticle, with its enhanced sensitivity to ESD, can expedite the identification
of sources of ESD. More recently, a method has been developed to assess risk for
electrostatic damage to masks that does not involve the use of something like the
Canary mask, which is permanently damaged by testing.132
More recently, another electric field–induced damage mechanism has been
identified: electric field–induced metal migration (EFM).133–135 This occurs when
voltages are too low to cause sparks, but can induce material migration (see
Fig. 7.30). As features have become smaller, the gaps between geometries on
masks have become very small. With very small gaps, even modest voltages can
lead to large electric fields. Simulations indicate that voltages less than 100 V
between structures can lead to field-induced migration. Features close to the edges
of masks are particularly at risk. Masks are often constructed with rings of chrome
surrounding the device patterns, so small features should not be allowed to be close
to this guard ring. It is also important to keep reticles away from external electric
fields, which can be accomplished with suitably designed reticle
cassettes that contain the masks within a Faraday cage.136 The situation is even
more complex, as the amount of material that migrates appears to be related to
the amount of light exposure that the mask receives.137 Fortunately, there are test
devices that can measure electric fields to assess risk.138
Because the optical properties of materials vary with the wavelength of light,
particularly when considering light of wavelengths in the visible versus ultraviolet
portions of the electromagnetic spectrum, defect detection is strongly wavelength
dependent. The most dependable defect inspection can be expected from inspection
tools that use light at or near the wavelength of light used for exposures. Hence,
when KrF lithography was the leading-edge optical technology, inspection tools
often used 257-nm light. This was near the 248-nm wavelength of the KrF
light and could be produced conveniently by frequency doubling the emission of
514-nm argon-ion continuous-wave lasers. Continuous-wave lasers are useful for
inspection, because it is convenient to acquire data at rates considerably faster than
the 1–6 kHz frequencies of excimer lasers. Nevertheless, some inspection tools use
excimer lasers for illumination, and good inspection rates have been achieved.
Masks need to be cleaned to remove any particles that sit on the surfaces of the
mask and could block the transmission of light. Such particles are usually referred
to as “soft” defects. The majority of “hard” defects on masks consist of missing
absorber, or absorbing material being where it should have been removed. Over
time, four approaches to defect repair have been developed and implemented:
(1) Laser ablation and deposition146–148
(2) Focused ion-beam sputter etch and ion beam–assisted deposition149–151
(3) Micromachining using atomic-force microscope techniques152–155
(4) Electron beam–assisted chemical etching and deposition.156
Laser repair tools were the first to be adopted and served the semiconductor
industry well for many years. However, such tools are limited by finite optical
resolution, and ion-beam tools were introduced to address the needs associated
with small features on masks, since ion beams can be focused to very small
spots.157 Most typically, gallium ions are used, because high-current gallium-ion
sources can be made.158 The ion energy is typically tens of keV, to provide the
kinetic energy for sputtering opaque defects.159 However, such highly energetic
beams also tend to implant into the glass substrate, leaving a “stain” that results in
an imperfect repair. To reduce this “stain,” gases are introduced into the chamber
containing the mask to assist chemically in the removal of unwanted material. In
their neutral state these gases are inert, but they become chemically reactive in the area that
is being bombarded with ions. Chemically assisted processes also enable sputtering
and subsequent redeposition of material to be reduced. Ion beams can also be used
to add absorber where it is missing by injecting appropriate gases into the chamber
Masks and Reticles 291
containing the mask that is under repair. With the use of hydrocarbon-containing
gases,160 a carbon patch can be formed where the ion beam strikes the surface of
the mask.
Over time, concerns with gallium staining and sputtering led to the development
of electron beam–based repair tools.161 For removing unwanted absorber, gases
are introduced into the chamber containing the mask, and material is removed
by means of electron beam–assisted etching. The addition of material to
replace missing absorber is accomplished in an analogous way. Micromachining
approaches have also been used for repairing masks, but only for material
removal.162
Inspection tools will find defects and imperfections on masks, but the
printability of these defects is an issue of practical importance, since the repair of
defects is a complex and expensive process that is best avoided when it is not needed. It
was recognized long ago that the printability of a defect depends upon its proximity
to other features on the mask.163 This is illustrated in Fig. 7.31. Consider a process
for which the smallest feature that can be resolved on the wafer is 40 nm in size.
For 4× reduction optics, this means that the smallest printable feature on the mask
is 160 nm. An isolated object of half that size (80 nm) is well below the resolution
of the system and will not print. On the other hand, an 80-nm size feature that is
a protrusion into a narrow space will push that same space below the resolution
limit, and such a defect will likely print. Since the printability of a defect is context
related, and there are many possible configurations for geometries on masks, the
subject of mask-defect printability is complex and has been the subject of much
study.164–169
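The scaling argument above can be sketched numerically. The functions below are illustrative (not from the text) and use the 40-nm wafer resolution and 4× reduction of the example; they capture only the two cases shown in Fig. 7.31, not the full context dependence discussed in the references.

```python
# Illustrative sketch of the Fig. 7.31 reasoning. Function names and the
# crude printability criteria are assumptions for this example only.

def prints_isolated(defect_nm_on_mask, wafer_resolution_nm=40, reduction=4):
    """An isolated defect prints only if its wafer-scale size reaches
    the resolution limit."""
    return defect_nm_on_mask / reduction >= wafer_resolution_nm

def prints_in_space(defect_nm_on_mask, space_nm_on_mask,
                    wafer_resolution_nm=40, reduction=4):
    """A protrusion into a space prints if the remaining space falls
    below the resolution limit at wafer scale."""
    remaining = (space_nm_on_mask - defect_nm_on_mask) / reduction
    return remaining < wafer_resolution_nm

# 80-nm isolated defect: half the 160-nm minimum printable mask feature
print(prints_isolated(80))        # False -> does not print
# the same 80-nm defect protruding into a minimum (160-nm) space
print(prints_in_space(80, 160))   # True -> likely prints
```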
Figure 7.31 Illustration of mask defects of identical widths. The isolated feature will not
print, while the defect which is a protrusion of the right line is printable.
292 Chapter 7
Figure 7.32 (a) A scanning-electron micrograph of a bridging defect on a mask, and (b)
the measured aerial image from the mask. (c) The corresponding results after the mask was
repaired (see Color Plates).
Problems
7.1 For a mask fabricated on a fused silica substrate, show that the separation
between two geometries nominally separated by 100 mm changes by 5 nm for
a 0.1 °C temperature change. (The coefficient of thermal expansion for fused
silica equals 0.5 ppm/°C.) For the same change in temperature, show that the
separation error would be 600 nm for a mask fabricated on a borosilicate glass
substrate with a coefficient of thermal expansion equal to 60 ppm/°C.
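The arithmetic of Problem 7.1 can be checked with a few lines of Python (a verification sketch; the values are those given in the problem statement, and the function name is illustrative):

```python
# Check of Problem 7.1: dL = L * alpha * dT.
# Unit shortcut: 1 mm * 1 ppm = 1 nm, so mm * (ppm/degC) * degC gives nm.
def separation_change_nm(separation_mm, cte_ppm_per_C, delta_T_C):
    return separation_mm * cte_ppm_per_C * delta_T_C

print(separation_change_nm(100, 0.5, 0.1))  # fused silica: 5.0 nm
print(separation_change_nm(100, 60, 0.1))   # borosilicate: 600.0 nm
```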
7.2 Reticle nonflatness consumes part of the depth-of-focus budget. It is desired
that reticle nonflatness reduce the depth-of-focus budget by no more than 10%.
For a 100-nm depth-of-focus, show that the reticle flatness must be <160 nm
in order to be consistent with this 10% criterion, assuming a 4× lens. What
would this be for a 6× lens?
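A quick check of Problem 7.2, assuming (as the problem implies) that reticle-side focus deviations map to the wafer side through the longitudinal demagnification 1/N²:

```python
# Check of Problem 7.2. Assumption: reticle nonflatness divided by N^2
# is the wafer-side focus error it consumes.
def max_reticle_nonflatness_nm(dof_nm, fraction, reduction):
    return fraction * dof_nm * reduction**2

print(round(max_reticle_nonflatness_nm(100, 0.10, 4)))  # 160 nm for 4x
print(round(max_reticle_nonflatness_nm(100, 0.10, 6)))  # 360 nm for 6x
```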
7.3 Suppose a mask is patterned by a beam writer with a 50-keV beam energy over
an area of 100 × 100 mm, using a resist with 10-µC/cm2 sensitivity. Assume
the pattern density is 50%. For a standard 6-in. mask (152 × 152 × 6.35 mm),
with a heat capacity of 0.71 J/gm·K and density 2.2 gm/cc, show that the mask
temperature rises by 0.11 K, assuming all electron energy is converted to heat
and there is no heat conducted away from the mask substrate. (The electronic
charge is 1.6 × 10⁻¹⁹ C and 1 eV = 1.6 × 10⁻¹⁹ J.) Assume that the absorber
is sufficiently thin that its contribution to the thermal mass is negligible. Refer
to Problem 7.1 to assess the significance of such a temperature change on
registration errors.
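The energy balance of Problem 7.3 can be verified numerically (a sketch using only the values given in the problem; the function name is illustrative):

```python
# Check of Problem 7.3: all deposited beam energy E = Q*V heats the
# mask substrate, with no conduction losses.
def beam_write_temp_rise_K(dose_uC_cm2, density_frac, voltage_kV,
                           write_area_cm2, mask_vol_cc, cp_J_gK, rho_g_cc):
    charge_C = dose_uC_cm2 * 1e-6 * density_frac * write_area_cm2
    energy_J = charge_C * voltage_kV * 1e3   # E = Q * V
    mass_g = mask_vol_cc * rho_g_cc
    return energy_J / (mass_g * cp_J_gK)

vol_cc = 15.2 * 15.2 * 0.635   # 152 x 152 x 6.35 mm mask, in cm^3
dT = beam_write_temp_rise_K(10, 0.5, 50, 10 * 10, vol_cc, 0.71, 2.2)
print(round(dT, 2))            # 0.11 K
```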
7.4 Show that the maximum angle for a ray that passes through the top element of
a 1.35-NA 4× lens is 19.7 deg.
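A one-line check of Problem 7.4, assuming air above the top (reticle-side) element so that the extreme ray angle satisfies sin θ = NA/N:

```python
import math

# Check of Problem 7.4: on the reticle side of an N-x reduction lens the
# numerical aperture is NA/N (assuming n = 1 above the top element).
NA, N = 1.35, 4
theta_deg = math.degrees(math.asin(NA / N))
print(round(theta_deg, 1))   # 19.7 deg
```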
7.5 Consider a 1.35-NA 4× lens. For a perfectly flat, 1-µm-thick pellicle, show
that, due to the pellicle, the optical path of the wavefront varies by 15.3 nm.
Assume that the refractive index of the pellicle = 1.35, and take the refraction
by the pellicle into account by using [Eq. (7.7)].
7.6 Using Eq. (7.4), show that pellicles with 6.3-mm standoff provide protection
for particles as large as 366 µm when using a 0.93-NA 4× lens for imaging.
7.7 From Eq. (7.6), show that the pellicle-induced wavefront errors → 0 as either
n → 1 or t → 0.
Figure 8.1 On-axis and off-axis illumination. With off-axis illumination, diffraction orders
>0 are propagated through the projection lens. The features on the mask are perpendicular
to the plane of the illumination.
single beam, the zeroth-order beam, is transmitted through the lens. This illustrates
the limitation to resolution imposed by diffraction.
On the other hand, consider the situation in which two plane waves (of unit
amplitude) intersect, as shown in Fig. 8.2. In this case,

Intensity = \left| e^{i(k_x x + k_z z)} + e^{i(-k_x x + k_z z)} \right|^2    (8.1)

= 4 \cos^2 (k_x x).    (8.2)
With two plane waves intersecting, a pattern with structure can be imaged.
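The interference arithmetic of Eqs. (8.1) and (8.2) can be verified numerically with a short sketch (the wave-vector components below are arbitrary illustrative values):

```python
import cmath, math

# Numerical check of Eqs. (8.1)-(8.2): two intersecting unit-amplitude
# plane waves give intensity 4*cos^2(kx*x); the common kz*z phase factor
# cancels, so the pattern does not depend on z.
kx, kz = 2.1, 0.7   # arbitrary wave-vector components
z = 1.3             # any value of z gives the same intensity pattern
for x in [0.0, 0.3, 0.7, 1.1]:
    field = cmath.exp(1j * (kx * x + kz * z)) + cmath.exp(1j * (-kx * x + kz * z))
    intensity = abs(field) ** 2
    assert abs(intensity - 4 * math.cos(kx * x) ** 2) < 1e-12
print("Eq. (8.2) verified")
```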
Consider the situation in which the illumination is not normally incident
(Fig. 8.1). For such off-axis illumination it is possible for the zeroth-order light
and one first-order beam to enter the entrance pupil of the imaging optics. In this
situation, there is a pattern imaged onto the wafer. From this simple analysis, one
might expect to enhance image contrast for very small features by eliminating the
illumination with small angles of incidence, since those light rays contribute only
to the background light intensity without providing spatial modulation.
From Fig. 8.1, one might also expect that off-axis illumination should improve
the depth-of-focus, because the angular spread of the light rays in the off-axis
Confronting the Diffraction Limit 309
Figure 8.4 Light rays diffracted by a diffraction grating of pitch p. In the figure k = 2π/λ.
This example is appropriate for 1:1 optics. Angles and dimensions will be scaled by the lens
reduction for reduction optics.
where a0 is the amplitude fraction of the light’s electrical field in the zeroth-
order beam, a1 is the fraction of the electric field in the first-order beam, E0 is
the amplitude of the incident electric field, p is the pitch of the grating, x is the
lateral distance along the wafer, and z is the amount of defocus. This equation is
independent of z, i.e., independent of focus, when

\cos \theta_0 = \sqrt{1 - \left( \sin \theta_0 - \lambda / p \right)^2},    (8.4)

i.e., when

\sin \theta_0 = \frac{\lambda}{2p}.    (8.5)
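The focus-insensitive condition of Eqs. (8.4) and (8.5) can be evaluated in a short sketch. The wavelength and pitch below are illustrative assumptions (not from the text), taken at 1:1 scale as in Fig. 8.4:

```python
import math

# Sketch of the focus-insensitive off-axis condition, Eq. (8.5):
# sin(theta0) = lambda / (2p). Example: KrF light, 500-nm-pitch grating
# at 1:1 scale (assumed values).
wavelength_nm, pitch_nm = 248.0, 500.0
sin_t0 = wavelength_nm / (2 * pitch_nm)
theta0 = math.degrees(math.asin(sin_t0))

# Consistency with the equivalent form, Eq. (8.4):
lhs = math.cos(math.radians(theta0))
rhs = math.sqrt(1 - (sin_t0 - wavelength_nm / pitch_nm) ** 2)
assert abs(lhs - rhs) < 1e-12
print(round(theta0, 2))   # 14.36 deg
```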
because conventional illumination contains at least some off-axis rays for all
geometries, regardless of orientation. For more general geometries, illumination
must be configured accordingly. The construction of one such design is illustrated
in Fig. 8.5. Because it is difficult to visualize illumination from many angles,
it is convenient to depict the illumination in the form of its image in the pupil
plane of the projection lens. Figure 8.5 is such a depiction. In such pupil plane
representations, light in the center represents on-axis illumination, while light at
increasing distances from the center corresponds to illumination at larger angles
of incidence on the mask. As can be seen in that figure, there are four regions
where good imaging for horizontal features overlaps the good imaging for vertical
features. Illumination where light rays propagate only from those overlapping
regions is referred to as quadrupole illumination. From Fig. 8.5 and Eq. (8.6),
it can be seen that the optimum center point for the overlap regions occurs at
\sigma = \frac{\sqrt{2}}{2} \, \frac{\lambda}{NA \, p}.    (8.7)
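Equation (8.7) can be evaluated directly; the numbers below are illustrative assumptions (ArF wavelength, a mid-range NA, and a wafer-scale pitch), not values from the text:

```python
import math

# Eq. (8.7): radial pupil position of the optimum quadrupole pole
# centers. Assumed example values: 193-nm light, NA = 0.75, 260-nm pitch.
wavelength_nm, NA, pitch_nm = 193.0, 0.75, 260.0
sigma = (math.sqrt(2) / 2) * wavelength_nm / (NA * pitch_nm)
print(round(sigma, 2))   # 0.7
```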
Each stepper company has its own name for its particular implementation
of quadrupole illumination. Nikon calls their system super-high-resolution
illumination control (SHRINC),8 Canon has named theirs Canon quadrupole
effect for stepper technology (CQuest),10 and ASML’s system is called Quasar.11
Typically, the regions of quadrupole illumination are not square in shape, but are
often circular or annular sections.
A common configuration that applies to objects of arbitrary orientation is
annular illumination, where the light comes down from an annulus, centered on
the optical axis, and forming the envelope of a cone. Optimum annular illumination
consists of an annulus that approximates all regions of Fig. 8.5, which are good for
either vertical or horizontal patterns. While annular illumination is not beneficial
for all patterns, nor is it the optimum type of off-axis illumination for other
patterns, such as gratings aligned in a single direction, it does provide process-
Figure 8.5 Construction of quadrupole illumination. This is the image of illumination in the
pupil plane of the projection optics.
Figure 8.6 Illumination pupil plane images for (a) annular illumination, (b) quadrupole
illumination, and (c) dipole illumination.
Figure 8.7 (a) Dipole and (b) quadrupole illumination generated from an annular source.
314 Chapter 8
Figure 8.8 Parameters for specifying quadrupole illumination. The parameters shown in
(a) can be used to describe the illumination generated from an annular source (Fig. 8.7),
while the parameterization shown in (b) is applicable to illumination with circular sections.
imaging, such as occurs with on-axis illumination (Fig. 8.1). This line of reasoning
leads to a general conclusion: resolution-enhancement techniques (RET) that can
produce two-beam imaging provide better performance than methods that generate
three-beam imaging. Dipole illumination improves the process window because it
leads to two-beam rather than three-beam imaging. Creating a situation in which
two-beam imaging occurs is a recurring theme among the techniques for improving
resolution and process window.
Conventional illumination consists of a cone of light, specified by a partial
coherence parameter σ. Annular illumination can be viewed as a cone of light
with an inner cone removed. The optics used to create any particular illumination
usually will not result in a sharp cutoff for the light around the edges of the cone.
Instead, the illumination will fall from some large value to zero or near zero over
a range of angles. This results in ambiguity over the exact illumination profile
for specific values of σ that are quoted. Consider conventional illumination: some
stepper companies define σ by the angle within which 90% of the integrated light
energy is contained, while others may define it as the angle within which 100% is
contained. Neither definition is more correct than the other, but users must be cautious when trying
to compare results from two different exposure tools, or between experimental
and simulated results. Subtle differences in illumination profiles can lead to very
different results.
More detailed analyses and measurements of off-axis illumination have
found the following:
(1) Enhancement is seen for features smaller than the coherent cutoff limit (k1 =
0.5). Above this limit there is no difference between conventional and off-axis
illumination.
(2) The depth-of-focus improvement for grouped features is much greater than for
isolated features.
(3) Grouped/isolated-feature print bias, discussed in more detail in the next
section, is affected by off-axis illumination.
Illumination optimized for one pitch may work poorly for other pitches. For
patterns with more than one pitch, this is a problem. Illumination sources that
include some on-axis illumination, though at lower intensity than for the off-axis
light,14,15 have shown good results for multiple structures. In such cases, the off-
axis illumination is used to ensure good patterning of the most critical features,
while sufficient additional illumination is added to enable adequate printing of
those patterns that are less difficult to print. Dipole illumination produces the
biggest process-window enhancement for dense lines oriented in a single direction,
while contacts typically image best with illumination that takes into account their
intrinsic two-dimensional character. These examples suggest that many different
types of illumination might be useful in lithography. Indeed, one of the papers
in which off-axis illumination for lithography was first proposed concluded with
the speculation that illumination might be custom engineered for individual
layers.5 Dipole, quadrupole, and annular illumination might be considered only
as examples of a wide variety of illuminator configurations that could be used
to enhance process windows. For example, researchers at Canon have introduced
a method, IDEAL, which involves modified illumination and assist features to
improve the printing of contacts.16 More recently, this idea has been generalized to
a method that can be used for calculating optimal illumination patterns for a wide
range of patterns.17
As suggested by Figs. 8.6 and 8.7, illumination can become very complicated. In
Chapter 5 it was shown how an axicon can be used to produce annular illumination
without losing very much light. More complex illumination can also be produced
efficiently using diffractive optical elements (DOE). Locally, a diffractive optical
element is a blazed grating, illustrated in Fig. 8.9. Efficiency is achieved by setting
the diffraction angle β to the angle at which the light is refracted at the glass-air
interface, which occurs when
where n is the index of refraction of the glass. On a larger scale, diffractive optical
elements are more intricate than simple gratings, as needed to produce complex
illumination shapes, and the techniques of computer-generated holography are
used to design the DOEs.18,19
Another method for producing complex illumination is through the use of
micromirror arrays, similar in concept to those used in data projectors and flat-
panel televisions.20 While more complex and expensive than a DOE, this approach
has the advantage of programmability, which is advantageous for research and
development, as well as for manufacturing environments where large numbers of
different illumination shapes are required. In the latter case, the higher capital cost
of the programmable illuminator is offset by the expense avoided by not purchasing
a large quantity of DOEs.
For patterning a simple diffraction grating, dipole illumination is optimal. For
more complex patterns, other types of illumination, such as annular or quadrupole,
might be better. It is possible to determine what is the optimal illumination for
Figure 8.9 A blazed diffraction grating. Beveling leads to high efficiency in light transmitted
in the indicated direction. The profile on an actual diffractive optical element will not have
the perfect saw-tooth shape shown in this figure, with less than 100% efficiency as a
consequence.
a given pattern. For every illumination source point, the quality of the patterning
can be assessed. Using criteria set by a lithographer, source points can be retained
as part of the illumination or not. While conceptually simple, this is difficult in
practice. The set of criteria used to assess the illumination can include linewidth
control, image-log slope, line-end shortening (discussed in the next section), and
mask-error factor (the topic of Section 8.3). All of these are important, and the
engineer needs to weight them as appropriate for the application.
Off-axis illumination provides a means of enhancing optical lithography.
Although specific illumination settings improve imaging performance only for
a narrow range of pitches, this is still an advantage for many applications,
such as memories, where many structures are highly periodic. When there are
multiple pitches to be patterned, the problem is more difficult. Some additional
complications resulting from patterns with multiple pitches are discussed in the
next section.
Figure 8.10 Light-intensity profiles for 1.0-µm and 0.35-µm features. Both line sizes in the
image are calculated using Prolith for isolated lines and lines that are part of a grating
consisting of equal lines and spaces. The solid lines are the light-intensity profiles for
isolated lines, while the dashed lines represent lines that are part of gratings that consist
of equal lines and spaces.
(0.042 µm, or 4.2%), while the 0.35-µm lines differ in size by 76 nm, or 22%.
This difference in line size between isolated and grouped lines is referred to as
the iso-dense bias. It is an effect of the proximity of features to other geometries
and becomes more significant as k1 becomes smaller. As seen in this example,
proximity effects were small for k1 = 0.8 and were not significant for older
processes, which had such values for k1 (see Table 8.1). On the other hand, modern
processes are characterized by much smaller values of k1 , and proximity effects
are now significant. This proximity effect represents an additional manifestation of
diffraction within the context of optical lithography.
Proximity effects can be quite complicated. Just discussed was the iso-dense
bias at two extreme pitches, with line-space ratios of 1:1 and 1:∞. Because of
the proximity effect, linewidths vary over the intermediate pitches as well, in a
nonmonotonic fashion (Fig. 8.11). There are a number of parameters that affect the
magnitude of proximity effects. Lens parameters such as wavelength and numerical
aperture are important, as are illuminator settings including partial coherence and
the use of off-axis illumination. Resist processes can also influence the magnitude
of proximity effects. Because models are inexact for resist processes, particularly
the post-exposure bake step, for accuracy the magnitudes of proximity effects need
to be determined empirically and re-evaluated whenever there are process changes.
The simplest way to compensate for linewidth variations caused by proximity
is to have certain geometries on the reticle resized so that all features print at the
desired dimensions on the wafers. Lines that print larger because of the proximity
effect are made slightly smaller on the reticle. These adjustments of features
on the masks to correct for proximity effects are known as optical proximity
corrections, and are often referred to by their acronym, OPC. While the objective
is conceptually simple, the implementation of optical proximity corrections is not
trivial. In a typical chip design, features occur in a large number of configurations,
Figure 8.11 Simulated linewidths for 150-nm lines imaged at best focus on varying
pitches. The parameters of the simulation (using the commercially available simulator,
Solid-C) were NA = 0.75, λ = 193 nm, and σ = 0.6. An aerial-image threshold of 0.3 was used to
determine the linewidth.
not just simple grating structures. Different types of situations that occur are
depicted in Fig. 8.12.21
Masks generally include geometries of various dimensions. Over a wide range
of sizes, the printed features equal approximately the size of the corresponding
feature on the mask divided by the lens-reduction factor. However, as feature sizes
approach the resolution limit of the lens, nonlinearity is introduced.22 This is shown
in Fig. 8.13. Grating patterns consisting of equal lines and spaces were printed on
a Micrascan II (4× lens reduction, 0.5 NA, λ = 250 nm) over a range of sizes. The
resulting spacewidths were measured and plotted. For reticle dimensions ≥1300
nm = 4 × 325 nm, the reticle dimensions transferred 4:1 to the wafer, while for
smaller features, the wafer dimensions varied more rapidly as a function of reticle
feature size. A feature size of 325 nm corresponds to k1 = 0.65.
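The quoted k1 can be verified directly from the Micrascan II numbers given above (k1 = CD·NA/λ):

```python
# k1 check for the Micrascan II example: CD = 325 nm (wafer scale),
# NA = 0.5, wavelength = 250 nm.
CD_nm, NA, wavelength_nm = 325, 0.5, 250
k1 = CD_nm * NA / wavelength_nm
print(k1)   # 0.65
```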
When critical geometries are involved, corrections on the mask are necessary to
compensate for this nonlinearity if features will be printed over a wide range of
sizes. Print bias is a function of feature size, and this can be corrected by resizing
Figure 8.13 Transfer of features from a reticle to the wafer. The features are reduced in
size by 4× for large feature size, but the reduction changes for features near the resolution
limit of the lens.
Figure 8.14 Contacts on a reticle: (a) without serifs, and (b) with serifs.
where the spacing between contacts is already a minimum resolvable distance, and
additional layout effort is required to create the reticle where serifs have been added
to the contacts.
Serifs may be used for structures other than contacts in order to reduce corner
rounding. L-shaped features, such as shown in Fig. 8.15, may have serifs added
(dark and clear). In this case, a clear serif is often referred to as a notch. An example
will be shown later that illustrates the need to correct corner rounding in order to
maintain gate linewidth control over active regions. As with many techniques that
we now apply to semiconductor lithography, serifs were used to reduce corner
rounding much earlier in other applications.24 This simply illustrates the universal
nature of physics, since similar physical situations can arise across a diverse range
of technological applications.
Subresolution structures, such as serifs, have a significant impact on mask
making. When subresolution features were first introduced, they represented a
decrease in the minimum feature size on the masks much greater than the
usual 0.7× node-to-node reduction. Mask makers were required to improve their
resolution capabilities quickly. Fortunately, the linewidth control for subresolution
assist features is not as stringent as for critical features that print. The intentional
use of subresolution features also complicated the mask-inspection problem. Very
high resolution is needed to inspect subresolution features, but then a large number
of defects that don’t print are also detected, increasing the time for defect review.
Figure 8.15 (a) An L-shaped structure without serifs. (b) The same L-shaped geometry with
serifs to reduce corner rounding.
Figure 8.17 Line-end shortening (a) on the mask, and (b) on the wafer. The width of the
rectangle is printed to size, but the length is too short.
Figure 8.18 Hammerhead used to correct for line shortening and corner rounding.
can be made. One of the most significant of these effects, the mask-error factor, is
the topic of the next section.
An example of the application of optical proximity corrections is shown in
Fig. 8.19. Without OPC, several problems can be seen. Resist linewidths for
dissimilar pitches are different, even though the widths are the same on the
mask. When patterning transistors, this leads to devices with different electrical
characteristics. Without OPC, the ends of lines do not extend as far as desired,
which is the problem illustrated in Fig. 8.17. As narrow lines approach pad areas,
the linewidths expand gradually in the resist patterns, not abruptly as in the mask
layout. Compensation for this rounding is needed by using notches to ensure good
gate linewidth control over the active regions. Finally, the corners of the large pad
areas are rounded—a loss of pattern fidelity and desired edge placement. These
problems are seen in the resist patterns of Fig. 8.19(b).
All of these problems can be improved by modifying the patterns on the masks
with optical proximity corrections, as shown in Fig. 8.19(c). By changing feature
sizes on the masks, gate linewidths are adjusted to match transistors on different
Figure 8.19 Example of optical proximity corrections (OPC).31 (a) An uncorrected pattern,
and (b) the resulting pattern in resist. The corrected mask pattern is found in (c), with the
resulting resist pattern in (d). The SEM images are not on the same scale. The transition of
features from narrow lines to large pads does not occur abruptly as in the design. Without
OPC this results in substantial variation in the length of the gate over the active regions.
pitches, and serifs and hammerheads are added to address the problem of line-
end shortening and corner rounding. Notches are introduced to reduce the problem
of necking, where linewidths change gradually between narrow lines and large
patterns at the ends of the lines.
Implementation of optical proximity corrections is a major task for advanced
semiconductor lithography. This is a phenomenon that is associated with low-k1
lithography, as noted in the introduction to this section. Very little or no OPC
engineering was necessary prior to the 250-nm generation, although it required
the most engineering resources of any single aspect of lithography development
reflection and refraction at the reticle and wafer are required in order to achieve
good accuracy. Such models can be applied to small areas (∼1 µm × 1 µm), but
application to full chips is beyond current cost-effective computational capabilities.
Thus, approximation methods are needed.
Adding to the OPC challenge is the need to maximize process windows. The
optimum illumination condition can be determined for a given mask pattern, but a
different illumination condition may prove optimal after applying optical proximity
corrections. Thus, adjusted optical proximity corrections must be calculated with
the newly determined optimum illumination condition. This is often an iterative
process. An illumination that appears to give a good process window is identified,
and the optical proximity corrections are calculated. The illumination is then
reassessed, and possibly adjusted to provide a larger process window. Then the
OPC needs to be recalculated. This process is repeated until solutions converge.
OPC introduces additional complexities in design databases. In a memory array,
the memory cells are identically designed. This allows for compression of the data
through a hierarchical structure, where memory arrays may be stored compactly
as a basic cell along with its repetitive layout. After OPC is applied, the cells on
the outside of the arrays do not have the same reticle layout as the inner cells.
This perturbs the hierarchy and results in much larger data sets that can be many
gigabytes in size.
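A toy illustration (assumed, not from the text) of why OPC perturbs a hierarchical database: before OPC, the array is a single cell definition plus a repetition rule; after OPC, edge cells differ from interior cells, so the placements can no longer be described by one compact repetition.

```python
# Toy model of hierarchy loss after OPC. Cell names are hypothetical.
# Before OPC: one cell plus a repetition rule describes the whole array.
before = {"cell": "memory_bit", "repeat": (1024, 1024)}

# After OPC: edge cells receive different corrections than interior
# cells, so each placement must be enumerated explicitly.
rows = cols = 1024
after = []
for r in range(rows):
    for c in range(cols):
        on_edge = r in (0, rows - 1) or c in (0, cols - 1)
        after.append("memory_bit_opc_edge" if on_edge else "memory_bit_opc")

unique = set(after)
print(len(unique), len(after))   # 2 variants, 1048576 explicit placements
```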
Optical proximity corrections are implemented by moving edges of features, or
parts of edges, on masks. To do this, the edges of the polygons that comprise the
mask pattern must be divided into smaller fragments, each of which can potentially
be moved. An example of this is shown in Fig. 8.20. Even within the framework of
model-based OPC, assignment of the fragments is often based upon a set of rules.
How finely designs are fragmented will affect how much computation time
is required when using software that moves edges according to the layout
fragmentation. Fragmentation that is too fine can lead to an unacceptably long
computational time and perhaps even numerical instabilities, while fragmentation
that is too coarse will not treat corners adequately. However, efficient
Figure 8.20 An example showing pattern fragmentation. The line between each dot is an
edge that can be moved.
variation in these linewidths on the reticle, and σwafer is the corresponding variation
of linewidths on the wafer, then
\sigma_{wafer} = \frac{MEF}{N} \, \sigma_{reticle},    (8.9)

where N is the lens-reduction factor and MEF is the mask-error factor, defined by
the following:

MEF = N \, \frac{\partial CD_{wafer}}{\partial CD_{mask}},    (8.10)
where CDret and Pret are the sizes of the line and pitch on the reticle, respectively,
x measures distance in the wafer plane, and N is the lens reduction. MTF is the
modulation transfer function for spatial frequency equal to the pitch of the grating.
The grating is symmetric about x = 0. When CDret is close to one-half the pitch, the
sine prefactor in Eq. (8.11) is approximately equal to 1.0 and varies little for small
variations in CDret about its nominal value, while the prior term varies linearly with
the mask dimension. If a resist edge x0 corresponds to a threshold exposure dose,
then
" ! !#
CDret 2MT F πCDret 2πN x0
Ithreshold = I0 + sin cos . (8.12)
Pret π Pret Pret
Differentiating with respect to CD_{ret} at the fixed threshold intensity gives

0 = 1 + 2 \, MTF \cos\!\left( \frac{\pi CD_{ret}}{P_{ret}} \right) \cos\!\left( \frac{2\pi N x_0}{P_{ret}} \right) - 2 \, MTF \sin\!\left( \frac{\pi CD_{ret}}{P_{ret}} \right) \sin\!\left( \frac{2\pi N x_0}{P_{ret}} \right) 2N \frac{\partial x_0}{\partial CD_{ret}}.    (8.13)
Recognizing that the dimension of the feature on the wafer is 2x_0, then

MEF = 2N \, \frac{\partial x_0}{\partial CD_{ret}}.    (8.14)
Consequently,

MEF = \frac{ \frac{1}{MTF} + 2 \cos^2\!\left( \frac{\pi CD_{ret}}{P_{ret}} \right) }{ 2 \sin^2\!\left( \frac{\pi CD_{ret}}{P_{ret}} \right) }.    (8.15)
A plot of Eq. (8.15), illustrated in Fig. 8.21, enables us to gain some insight into
the mask-error factor. The mask-error factor is a minimum when the linewidth is
exactly one half of the pitch. Away from this condition, either the line or the space
becomes smaller. It should be noted that the approximations in the above analysis
break down when the linewidth departs far from one half of the pitch. In fact, the
MEF for an isolated line is typically smaller than for equally sized dense lines.
This has a consequence for implementing optical proximity corrections, because
isolated lines have lower MEF than similar lines with scattering bars (Fig. 8.16).37
The optimum process then depends upon photomask quality since the benefits
of scattering bars may be offset by linewidth variations from the photomask.
Near the diffraction limit, contacts have ∼2× the mask-error factor of lines and
spaces of similar width. The mask-error factor is another parameter that must be
considered when optimizing a process since some combinations of mask layout
and illumination can provide very good exposure latitude and depth-of-focus, but
may be very sensitive to small changes in mask linewidth. Overall process control
Figure 8.21 Graph of Eq. (8.15) with MTF = 0.5. The mask-error factor is minimized when
the linewidth is one half of the pitch.39
Confronting the Diffraction Limit 329
depends on the contributions from the mask and sensitivities to mask linewidth
variation depend on the mask-error factor.
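The behavior of Eq. (8.15) is easy to check numerically. The following sketch (an added illustration, not from the text) evaluates the formula for MTF = 0.5, the value used for Fig. 8.21, and confirms that the minimum occurs when the linewidth is exactly half the pitch, where MEF = 1/(2 MTF):

```python
import math

def mef(duty, mtf):
    """Mask-error factor from Eq. (8.15); duty = CDret / Pret."""
    theta = math.pi * duty
    return (1.0 / mtf + 2.0 * math.cos(theta) ** 2) / (2.0 * math.sin(theta) ** 2)

mtf = 0.5  # the MTF value used for Fig. 8.21
duties = [0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60, 0.65, 0.70]
values = [mef(d, mtf) for d in duties]

# The minimum falls at duty = 0.5, where MEF = 1/(2*MTF) = 1.0 for MTF = 0.5,
# and the curve rises symmetrically on either side of the half-pitch condition.
best = duties[values.index(min(values))]
print(best, min(values))
```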
The mask-error factor is related to the image log slope. Consider the aerial image for a pattern consisting of lines perpendicular to the x axis. The intensity for this image is a function of $x$ and also a function of the reticle dimension:
$$I = I\left(x, CD_{\mathrm{ret}}\right). \qquad (8.16)$$
Variations in the mask CD from its nominal value $CD_{\mathrm{ret,0}}$ will cause a change in the position $x_0$ of the edge of the resist feature. In the context of the thin-resist model, these changes all occur at constant threshold intensity $I_{\mathrm{th}}$. Accordingly,
$$dI(x, CD_{\mathrm{ret}}) = \left.\frac{\partial I}{\partial x}\right|_{x_0}\delta x + \left.\frac{\partial I}{\partial CD_{\mathrm{ret}}}\right|_{CD_{\mathrm{ret,0}}}\delta CD_{\mathrm{ret}} \qquad (8.17)$$
$$= dI_{\mathrm{th}} \qquad (8.18)$$
$$= 0. \qquad (8.19)$$
With suitable rearrangement, the use of Eq. (8.10), which defines the mask-error factor, and the fact that the wafer CD is $2x_0$, we obtain40
$$\mathrm{MEF} = -2N\,\frac{\partial I}{\partial CD_{\mathrm{ret}}}\left(\frac{\partial I}{\partial x}\right)^{-1} \qquad (8.20)$$
$$= -\left(\frac{2N}{I_{\mathrm{th}}}\right)\left(\frac{\partial I}{\partial CD_{\mathrm{ret}}}\right)\left(\frac{1}{\mathrm{ILS}}\right). \qquad (8.21)$$
Once again, it can be seen that a process is more robust when there is a large image
log slope.
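The connection between the grating image of Eq. (8.12) and the closed form of Eq. (8.15) can also be verified by direct computation. The sketch below is an added illustration under the thin-resist threshold model; the pitch, reduction factor, and MTF values are hypothetical. It locates the resist edge x0 at a fixed threshold dose, computes MEF = 2N ∂x0/∂CDret by central finite differences, and compares the result with the closed-form expression:

```python
import math

P, N, MTF = 400.0, 4.0, 0.5   # reticle pitch (nm), lens reduction, MTF (all hypothetical)

def intensity(x, cd):
    """Grating aerial image in the form of Eq. (8.12) (thin-resist model)."""
    return cd / P + (2 * MTF / math.pi) * math.sin(math.pi * cd / P) \
        * math.cos(2 * math.pi * N * x / P)

def edge(cd, i_th):
    """Solve intensity(x0, cd) = i_th for the resist edge x0 in the wafer plane."""
    c = (i_th - cd / P) / ((2 * MTF / math.pi) * math.sin(math.pi * cd / P))
    return P / (2 * math.pi * N) * math.acos(c)

cd0 = 200.0                            # equal lines and spaces on the reticle
i_th = intensity(cd0 / (2 * N), cd0)   # threshold chosen so the grating prints at size

# MEF = 2N * dx0/dCDret (Eq. 8.14), evaluated by central finite difference
d = 0.01
mef_numeric = 2 * N * (edge(cd0 + d, i_th) - edge(cd0 - d, i_th)) / (2 * d)

# Closed form, Eq. (8.15)
theta = math.pi * cd0 / P
mef_formula = (1 / MTF + 2 * math.cos(theta) ** 2) / (2 * math.sin(theta) ** 2)

print(mef_numeric, mef_formula)  # both ≈ 1.0 at half pitch with MTF = 0.5
```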
As discussed previously, dimensions on the reticle can be resized through design
to adjust for nonlinearity in printing. This works to the extent that features are
sized perfectly on the reticle. With mask-error factors considerably larger than
one, optical proximity corrections are much more difficult to make. Small changes
in linewidth on the reticles will cause large changes of linewidth on the wafers
when the mask-error factor is large. The mask-error factor affects the precision
with which optical proximity corrections can be applied. Consider a 45-nm process
that has a 2-nm bias between isolated and dense lines (on the wafer), printed on
a 4×-reduction exposure tool. For MEF = 1, the linewidths on the photomasks
would need to be adjusted by 8 nm to correct for the isolated-dense bias, while
2-nm adjustments would be needed for MEF = 4. To achieve the desired level of
control, the mask would need to be written on a much finer grid. Depending upon
the architecture of the mask-writing tool, this could greatly increase mask-writing
time and therefore reticle cost.
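The arithmetic of this example is simple enough to write out explicitly; the following sketch reproduces the numbers from the text (2-nm wafer bias, 4×-reduction exposure tool):

```python
def mask_adjustment(wafer_bias_nm, reduction, mef):
    """Reticle-level linewidth change needed to shift the wafer CD by wafer_bias_nm.

    A mask change of size m shifts the wafer CD by (MEF / N) * m, so the
    required reticle adjustment is m = N * wafer_bias / MEF.
    """
    return reduction * wafer_bias_nm / mef

print(mask_adjustment(2.0, 4, 1))  # 8.0 nm on the reticle for MEF = 1
print(mask_adjustment(2.0, 4, 4))  # 2.0 nm on the reticle for MEF = 4
```

The higher the MEF, the smaller the reticle adjustment that corresponds to a given wafer-level correction, which is why a high MEF pushes the mask writer toward a finer address grid.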
The mask-error factor has a significant impact on reticle costs. As long as the MEF = 1, it was necessary for improvements in reticle-linewidth control to scale directly with the minimum feature size, in order to hold constant the reticle
Figure 8.24 Simulated light-intensity distributions of a 400-nm pitch grating with equal
lines and spaces, imaged with 0.5-NA optics at a wavelength of 248 nm. For the binary
mask image, σ = 0.6, while σ = 0.3 for the alternating phase-shifting image.
Light traversing a path of length $a$ in air undergoes a phase change
$$\phi_a = \frac{2\pi}{\lambda}\,a, \qquad (8.22)$$
where $\lambda$ is the wavelength of the light in vacuum. Through the glass material of the substrate the phase changes by an amount
$$\phi_n = \frac{2\pi}{\lambda}\,n a, \qquad (8.23)$$
where $n$ is the index of refraction of the glass. The difference between these two phases is
$$\Delta\phi = \frac{2\pi}{\lambda}\,a\,(n - 1). \qquad (8.24)$$
To achieve the condition $\Delta\phi = 180\ \mathrm{deg} = \pi$ rad, we have the following relationship for the thickness $a$ of the phase-shifting layer:
$$a = \frac{\lambda}{2\,(n - 1)}. \qquad (8.25)$$
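Equation (8.25) translates directly into a small calculation. The sketch below is an added illustration; the fused-silica refractive indices are the values quoted later in Problems 8.3 and 8.4. It computes the recess depths for KrF and ArF lithography, and the etch-depth tolerance implied by a ±2-deg phase tolerance:

```python
def recess_depth(wavelength_nm, n):
    """Glass thickness giving a 180-deg phase shift, Eq. (8.25): a = lambda / (2(n-1))."""
    return wavelength_nm / (2.0 * (n - 1.0))

a_krf = recess_depth(248.4, 1.51)   # fused silica at the KrF wavelength
a_arf = recess_depth(193.4, 1.56)   # fused silica at the ArF wavelength

# A +/-2-deg tolerance out of 180 deg scales linearly into an etch-depth tolerance.
tol_arf = a_arf * 2.0 / 180.0

print(round(a_krf, 1), round(a_arf, 1), round(tol_arf, 2))  # 243.5 172.7 1.92 (nm)
```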
There are a number of problems with the alternating phase-shift mask. First,
additional work is required in order to apply it to anything other than an effectively
infinite grating. For example, consider what happens at the edge of a phase
shifter, which must exist if the product consists of anything more than completely
repetitive structures. This can lead to the situation where there is a transition in the
glass from 0 to 180 deg. The light from the clear 0-deg and clear 180-deg adjacent
areas will interfere destructively, resulting in the light-intensity profile shown in
Fig. 8.25.
Such an optical image prints into photoresist (see Fig. 8.26). Implementation
of alternating phase-shifting masks must deal with these edge shifters. Three
approaches have been taken. In the first, additional features are introduced between
the 0 and 180-deg phase areas to produce phase transitions of less than 180 deg, which do not print. For example, transitions can be made 0 deg → 60 deg, 60 deg →
120 deg, and 120 deg → 180 deg. Generating these additional phases on the
mask requires additional processing for mask fabrication, and there may not be
adequate space to accommodate all of these transitions on masks with very tight
pitches. Alternatively, a second mask can be used to expose the phase edges away.
Neither approach is completely satisfactory. The first method requires additional
Figure 8.25 Light-intensity distribution at a phase edge for coherent light. X = 2πNAx/λ,
where x is the physical distance from the phase edge measured in the same units as λ. The
phase edge occurs at X = 0. The derivation of the function graphed in this figure is given
below.
Figure 8.26 On the left, extra resist printed by the phase edge for a positive photoresist.
The right side of the figure shows what occurs with negative resist. In this idealized diagram,
issues that may exist, such as line shortening and corner rounding, are not indicated.
steps in the reticle fabrication process and is not applicable to very tight pitches,
while the second reduces stepper productivity. In all cases, mask design and layout
is complex. However, the improved capabilities that result from the alternating
phase-shift–type of mask may justify these additional processing steps.
A third approach involves the use of negative photoresist. As one can see from
Fig. 8.26, creating islands of resist using negative resist and phase-shifting masks
avoids the phase-edge problem. This is one of the reasons that negative resists are
again being used for leading-edge lithography.
The printed patterns shown in Figs. 8.25 and 8.26 can be understood as
follows.44 As discussed in Chapter 2, the image of a diffraction-limited lens for
coherent illumination can be obtained from the Fourier transform of the mask
pattern. In the case of the phase edge, the mask pattern has unit intensity, but half
of the pattern has a phase 180 deg (π rad) different from the other half. For this
situation, this transmission is given by
From Eq. (8.27), one can estimate that the width of the aerial image at a 0.25 intensity is approximately
$$\mathrm{width} = 0.25\,\frac{\lambda}{\mathrm{NA}}. \qquad (8.29)$$
The prefactor of 0.25 in Eq. (8.29) should be compared with the prefactors in
Eqs. (2.4) and (2.7). With the phase edge, one can obtain over a 50% reduction in
the size of a printable feature, compared to non-phase-shifted imaging. Phase-edge
photomasks can be used to delineate extremely narrow lines,45 such as gates and
interconnects.
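As a quick numerical illustration of Eq. (8.29) (an added sketch, not from the text), consider a phase edge imaged with a high-NA ArF system:

```python
def phase_edge_width(wavelength_nm, na):
    """Aerial-image width at 0.25 intensity for a phase edge, Eq. (8.29)."""
    return 0.25 * wavelength_nm / na

# ArF (193 nm) with NA = 1.35; numerical apertures greater than one
# (immersion lithography) are discussed in Chapter 10.
print(round(phase_edge_width(193.0, 1.35), 1))  # ≈ 35.7 nm
```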
Phase-shifting masks address the problem of the mask-error factor.46 Shown
in Fig. 8.27 are simulated isolated linewidths as a function of chrome size on an
alternating phase-shifting mask. For small features, the linewidth on the wafer
actually becomes insensitive to the chrome linewidth on the mask, as the object
begins to resemble a phase edge. In this instance, the mask-error factor can actually
be less than 1.0.
One of the problems of a phase-shifting mask with the construction shown in
Fig. 8.23 is the potential for the phase shifter to affect light amplitude as well
as phase. This difficulty is shown in Fig. 8.28. This may occur as a consequence
of etching-induced roughness or three-dimensional optical effects. The resulting
line asymmetry can be characterized and corrected by OPC to maintain linewidth
uniformity. Linewidth errors may also occur if the phase shift differs from 180 deg, which places requirements on the glass-etch process.
Phase edges provide very high resolution, but they have a number of
deficiencies, which limit their applicability. Like alternating phase-shifting
lithography, applying phase-edge lithography to real patterns leads to unwanted
features that must be eliminated by means of a second exposure. It is also difficult
Figure 8.27 The wafer dimension as a function of reticle dimension for an alternating
phase-shifting mask. These results were simulated using PROLITH2, assuming a numerical
aperture of 0.7, best focus, and an imaging wavelength of 193 nm. The feature on the reticle
was an isolated line of chrome. A 30% threshold of the aerial image was used to generate
the wafer dimension.
Figure 8.28 Calculated KrF images for an alternating phase-shifted mask for 250-nm lines
and spaces, assuming 80-nm-thick chromium. The reduced intensity for the light going
through the recessed glass is apparent in the intensities calculated assuming 0.6 NA and
σ = 0.5.47
Figure 8.29 CPL example. Shown are calculated aerial images, assuming a 0.75-NA ArF
exposure tool with quadrupole illumination (0.55 σinner , 0.85 σouter ). The dimensions are at the
wafer.
when making the mask, but it does simplify design, layout, and OPC. There are
also strong optical proximity corrections required with CPL, and these will be
different, in general, from the corrections required with other types of masks.50
An example of considerable line-end shortening observed with the use of CPL is
shown in Fig. 8.30.
CPL exhibits considerable variations in linewidths and mask-error factors as
a function of pitch,51,52 which requires substantial optical proximity corrections;
Figure 8.30 Line-end shortening with CPL.48 A 0.7-NA KrF scanner with quadrupole
illumination was used for exposing the wafer.
Figure 8.31 (a) A rim-shift phase-shifting mask, and (b) an outrigger phase-shifting mask.
clear and nonclear areas, sometimes with an etch of the quartz to an appropriate
depth.
The partial transmission of the nonclear areas is a problem with this type of
mask. For example, the exposure dose at which significant resist loss occurs must
be greater than the dose delivered by the light that “leaks” through. A mask
optimized for the normalized slope of the aerial image may nevertheless have
unacceptable levels of light in nominally “dark” areas of the mask (Fig. 8.32). This places requirements on the
resist and on the transmittance of the partially transmitting areas on the reticle.
The attenuated phase-shifting mask is attractive because it can be used in a single
exposure step, and is relatively easy to fabricate. It is particularly useful for
patterning contacts.
Another phase-shifting method that has been proposed for printing contacts and
vias is the optical vortex, illustrated in Fig. 8.33.62 A very dark spot is created at
the intersection of all four phases. Using this type of mask and negative resists, it is
possible to print small contacts and vias, with the virtue of having a low mask-error
Figure 8.33 Vortex layout for printing an array of contacts using phase shifters of 0, 90,
180, and 270 deg. (a) shows an implementation without an absorber, while in (b) an absorber
on the mask is added to control the contact diameters.
factor. Because four phases are required, mask fabrication is more complicated than for a
two-phase mask.
As phase-shifting technology becomes more commonplace, a name is needed
for the conventional type of mask, which consists only of opaque areas (usually
chrome) and quartz or glass. Because conventional masks have only completely
clear or completely opaque areas, they are usually referred to as binary intensity
masks or chrome-on-glass (COG) masks.
Phase-shifting masks are also classified as strong or weak, according to their
ability to suppress the zeroth-order diffraction component, i.e., the extent to which
they produce two-beam imaging. This is seen in Fig. 8.24, where, without a phase-
shifting mask, zero intensity cannot be achieved at any point because of a zeroth-
order component background. Phase-shifting masks other than the alternating or
chromeless are less capable of eliminating the zeroth-order diffraction component.
The combination of rim-shifted and attenuated phase-shifting masks and off-axis
illumination has been found to be a powerful synergy,63,64 since each method is
only partially capable of reducing the zeroth-order light component by itself.
This can be seen from Eq. (8.3). For a mask pattern comprised of a grating with equal lines and spaces (without phase shift),
$$a_0 = \frac{1}{2} \qquad (8.30)$$
and
$$a_1 = \frac{1}{\pi}. \qquad (8.31)$$
Because a0 > a1 , contrast is less than 1.0 (see Problem 8.1). The use of phase-
shifting masks can balance a0 and a1 , thereby increasing image contrast. This
can be understood as follows. With a binary mask the light amplitude varies
between zero and one. With an attenuated phase-shifting mask, this amplitude
varies between one and some negative value. The coefficient a0 is proportional
to the integral of the light amplitude. This is reduced with attenuated phase-
shifting masks, relative to binary masks, because the 0-deg phase-shifted light
is reduced in amplitude by the nonzero amount of 180-deg phase-shifted light.
Similarly, the coefficient a1 is proportional to the range of the amplitude, which
is greater for attenuated phase-shifting masks than for binary masks. In particular,
the combination of off-axis illumination with attenuated phase-shifting masks is
common. Chromeless phase-shifting lithography can be thought of as a variation
of attenuated phase shifting, with 100% transmission, and off-axis illumination is
typically used with this type of mask as well.48
From the preceding discussions, it is clear that a particular type of phase-
shifting mask or alternative illumination is applicable in specific situations but
not in others. For example, alternating phase-shifting masks are useful for dense
lines and spaces, but not for isolated contacts. Table 8.2 lists different types of
phase-shifting methods and the situations that result in enhanced lithographic
performance. Similarly, off-axis illumination was found to improve imaging of
dense lines and spaces, but it is not particularly useful for patterning isolated features.
FLEX improves the depth-of-focus for contacts, but not lines and spaces, and was
found to reduce ultimate contrast. This appears to be a general feature of many
newly developed methods for enhancing optical lithography: the improvement is
restricted to particular types of features; there is no “magic bullet.”
The fabrication of several types of phase-shifting masks requires additional
processing compared to what is necessary for making binary photomasks. In
particular, the alternating phase-shifting mask requires at least two patterning steps.
This introduces the need for overlay capability in the mask-making process. The
introduction of additional types of features on the reticles imposes new demands
on inspection capabilities. Such new requirements have significant effects with
respect to the capitalization required for making advanced photomasks. There are
also phase defects where either glass is etched in areas where it should not have
been, or glass may remain where it should have been etched. These glass-on-glass
phase defects are often more difficult to detect than defects formed from opaque
materials. Detection of phase defects is facilitated by inspection wavelengths close
to the stepper exposure wavelengths. This has motivated the development of new
generations of mask-defect-inspection tools with wavelengths close to 248 nm and
193 nm.
Attenuated phase-shifting masks are used extensively in manufacturing today.
Because they require few additional processing steps compared to binary masks,
they can be reasonably priced. Design issues typically revolve around the optical
proximity corrections that need to be optimized for the type of mask used.
As illustrated in Fig. 8.26, there are design challenges associated with
alternating phase shifting. In addition to the typical opaque or missing-chrome
Figure 8.34 Design layout and mask design of a contact layer, resulting from applying
inverse lithography techniques.67 In both cases the black areas represent the design
features, namely the contacts or the clear areas on the mask.
the critical lines on a given masking layer are oriented in the same direction.74
This approach to making designs more printable is often referred to as design
for manufacturability (DFM). This approach does increase the complexity of
developing new processes, since it requires a greater level of interaction between
lithographers and designers. However, the benefits are such that all companies
making leading-edge integrated circuits use some level of DFM.
The use of resolution-enhancement techniques enables the extension of optical
lithography to processes with values of k1 much smaller than thought possible
before the introduction of these methods. For the reader interested in early studies
of resolution enhancement, many of the original papers on these techniques
have been collected by Schellenberg.75 Since the initial pioneering work,
resolution-enhancement techniques have become a core component of lithographic
technology. How far optical lithography can be extended by the use of these
techniques will be discussed in Chapter 10.
Problems
8.1 Referring to Eq. (8.3), show that a0 = a1 is required to achieve an optical
contrast of 1.0.
8.2 For dipole illumination, show that the optimum value of σ is 1/(4k1 ). If the
maximum σ available on a scanner’s illuminator is 0.8, what is the minimum
k1 that can be imaged with dipole illumination? For σ = 0.9?
8.3 The index of refraction of fused silica is 1.51 at a wavelength of 248.4 nm, and
n = 1.56 at a wavelength of 193.4 nm. For KrF and ArF lithography, show that
the fused silica should be recessed 243 nm and 172 nm, respectively, in order
to achieve 180-deg phase shifting on an alternating phase-shifting mask.
8.4 If we want to control the phase shift on an alternating phase-shifting mask for
ArF lithography to ±2 deg, show that the etch depth in the recessed areas of
the mask must be controlled to ±1.9 nm.
8.5 Show that the width of the image of a phase edge is 36 nm for an ArF lens
with NA = 1.35. (Numerical apertures greater than one are discussed in
Chapter 10.)
References
1. W. N. Partlo, P. J. Thompkins, P. G. Dewa, and P. F. Michaloski, “Depth of
focus and resolution enhancement for i-line and deep-UV lithography using
annular illumination,” Proc. SPIE 1927, 137–157 (1993).
2. K. Tounai, S. Hashimoto, S. Shiraki, and K. Kasama, “Optimization of
modified illumination for 0.25-µm resist patterning,” Proc. SPIE 2197, 31–41
(1994).
33. N. Cobb and D. Dudau, “Dense OPC and verification for 45 nm,” Proc. SPIE
6154, 615401 (2006).
34. Y. Cao, Y. Lu, L. Chen, and J. Ye, “Optimized hardware and software for fast,
full chip simulation,” Proc. SPIE 5724, 407–414 (2005).
35. J. Cong and Y. Zou, “Lithographic aerial image simulation with FPGA-based
hardware acceleration,” Proc. 16th Int. ACM/SIGDA Sym. Field Programmable
Gate Array, 67–76 (2008).
36. M. Terry, G. Zhang, G. Lu, S. Chang, T. Aton, R. Soper, M. Mason, S. Best,
and B. Dostalik, “Process window and interlayer aware OPC for the 32 nm
node,” Proc. SPIE 6520, 65200S (2007).
37. A. K. Wong, R. A. Ferguson, and S. M. Mansfield, “The mask error factor in
optical lithography,” IEEE Trans. Semicond. Manuf. 13(2), 235–242 (2000).
38. A. Vacca, B. Eynon, and S. Yeomans, “Improving wafer yields at low k1 with
advanced photomask defect detection,” Solid State Technol., 185–192 (June,
1998).
39. C. A. Mack, “More on the mask error enhancement factor,” Microlithog.
World, 18–20 (Autumn, 1999).
40. C.-K. Chen, T.-S. Gau, J.-J. Shin, R.-G. Liu, S.-S. Yu, A. Yen, and B. J.
Lin, “Mask error tensor and causality of mask error enhancement for low-k1
imaging: theory and experiments,” Proc. SPIE 4691, 247–258 (2002).
41. T. Terasawa and N. Hasegawa, “Theoretical calculation of mask error
enhancement factor for periodic pattern imaging,” Jpn. J. Appl. Phys 39(1),
6786–6791 (2000).
42. M. D. Levenson, N. S. Viswanathan, and R. A. Simpson, “Improving
resolution in photolithography with a phase-shifting mask,” IEEE Trans.
Electron. Dev. ED-29(12), 1828–1836 (1982).
43. K. Hashimoto, K. Kawano, S. Inoue, S. Itoh, and M. Nakase, “Effect of
coherence factor σ and shifter arrangement for the Levenson-type phase-
shifting mask,” Jpn. J. Appl. Phys 31, 4150–4154 (1992).
44. C. A. Mack, “Fundamental issues in phase-shifting mask technology,” KTI
Microelectron. Conf., 23–35 (1991).
45. T. Tanaka, S. Uchino, N. Hasegawa, T. Yamanaka, T. Terasawa, and
S. Okazaki, “A novel optical lithography technique using the phase-shifter
fringe,” Jpn. J. App. Phys. 30, 1131–1136 (1991).
46. M. D. Levenson, “Will phase-shift save lithography?” Proc. Olin Microlithog.
Symp., 165–178 (1998).
47. The results shown in this figure were calculated by Dr. Andreas Erdmann and
were published in an article by W. Henke, “Simulation for DUV-lithography,”
Semicond. Fabtech, 9th ed., 211–218 (1999).
over which the incident beam is scanned, the intensity of the detected signal varies
so that an image can be created.
As the incident beam is scanned across a line, the detected signal varies.
Algorithms are required to relate the edges of the line to points on the curve of
intensity plotted as a function of position, and these algorithms do not always
provide complete accuracy. An idealized signal at the edge of a feature is shown
in Fig. 9.2. A mathematical algorithm is required to identify a point on the signal
profile to represent the edge of the line. A number of edge-detection methods are
used.3 The simplest algorithm uses a threshold value for identifying this point.
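A threshold algorithm of this kind can be sketched in a few lines of code. The example below is an added illustration, not the text's algorithm; the signal values and the threshold fraction are hypothetical. It finds the position at which a sampled edge signal crosses a fixed fraction of its range, interpolating linearly between samples:

```python
def threshold_edge(positions, signal, fraction=0.5):
    """Return the position where the signal crosses
    min + fraction * (max - min), interpolating linearly between samples."""
    lo, hi = min(signal), max(signal)
    level = lo + fraction * (hi - lo)
    for i in range(len(signal) - 1):
        s0, s1 = signal[i], signal[i + 1]
        # A crossing occurs where the signal passes through the threshold level.
        if (s0 - level) * (s1 - level) <= 0 and s0 != s1:
            t = (level - s0) / (s1 - s0)
            return positions[i] + t * (positions[i + 1] - positions[i])
    raise ValueError("no threshold crossing found")

# Synthetic edge signal (hypothetical): rises from 0 to 1 between x = 4 and x = 6
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [0.0, 0.0, 0.0, 0.0, 0.0, 0.5, 1.0, 1.0, 1.0, 1.0]
print(threshold_edge(xs, ys, fraction=0.5))  # crosses the 50% level at x = 5.0
```

Different choices of the threshold fraction report different edge positions on the same signal, which is one reason such algorithms do not always provide complete accuracy.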
Figure 9.2 The electron signal corresponding to one edge of a feature being measured
by a scanning electron microscope. Several attributes of the feature are potentially used to
infer the location of the edge of the feature.
Figure 9.3 (a) Slope of a feature being measured, and (b) the secondary electron
production as a function of this slope.
Figure 9.4 Top-down SEM measurements versus measurements from cross sections of
the same etched polysilicon features.5 Measurement error bars are not shown.
354 Chapter 9
Figure 9.5 At incident energies E1 and E2 the sample is electrically neutral while it charges
for other energies. The values of E1 and E2 are material dependent.
Metrology 355
$$I_B = \left(\delta + \eta\right) I_B + \frac{E_{\mathrm{surface}}}{R}, \qquad (9.2)$$
where $I_B$ is the incident beam current, $\delta$ is the secondary electron yield, $\eta$ is the
backscattered electron yield, $E_{\mathrm{surface}}$ is the surface potential, and $R$ is the effective
resistance to ground. Consequently, the surface potential is given by7
$$E_{\mathrm{surface}} = I_B R\left(1 - \delta - \eta\right). \qquad (9.3)$$
The surface potential is seen to be proportional to the beam current, once the resistance
to ground is taken into consideration. Conductivity to ground can be complex in
insulating materials, where the excitation of electrons to the conduction band is
a function of beam energy and current. Complicating the situation even more is
the ability of insulators to trap charge. The time it takes to discharge the trapped
charges as a consequence of thermal relaxation is on the order of minutes.
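Equation (9.3) can be illustrated with a toy calculation (the beam current, resistance, and yield values below are hypothetical, chosen only to show the behavior): the potential vanishes when δ + η = 1, the condition that holds at the energies E1 and E2, and otherwise scales linearly with the beam current:

```python
def surface_potential(i_beam, r_ground, delta, eta):
    """Surface potential from Eq. (9.3): E_surface = I_B * R * (1 - delta - eta)."""
    return i_beam * r_ground * (1.0 - delta - eta)

R = 1e9  # hypothetical effective resistance to ground (ohms)

# At the energies E1 and E2 the total yield delta + eta = 1: the sample stays neutral.
print(surface_potential(1e-9, R, 0.6, 0.4))   # 0.0 V
# Away from those energies the surface potential scales linearly with beam current.
print(surface_potential(1e-9, R, 0.5, 0.4))   # ≈ 0.1 V
print(surface_potential(2e-9, R, 0.5, 0.4))   # ≈ 0.2 V
```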
Unfortunately, this simple picture is still incomplete for the types of
measurements that occur in lithography applications and typically involve features
patterned into photoresist. As noted previously, secondary electron emission
depends on the angle between the incident beam and the plane of the surface being
probed. Consequently, the electron-beam energy at which there is charge neutrality
varies between the top and sides of particular features, because the emission rates of
secondary and backscattered electrons from the tops of features differ from those
from the sides. The photoresist is nearly always developed to the substrate in exposed
areas, and the substrate usually has a different value for E2 than the photoresist.
Moreover, the surface potential may be more important than overall charge
neutrality. It is the electrical potential at the surface of the features, relative to
the potential at the detector, that determines the flow of secondary electrons from
the object being measured to the detector. Overall charge neutrality is important
for electrons far from the object, but the distribution of electrons within and near
the object will determine the electrical potential at the object’s surface. The issue
of sample charging remains a topic of current research.
Because photoresist is not electrically conducting, it is difficult to dissipate
charge once it has accumulated, and, for the reasons discussed above, complete charge
neutrality is rarely achieved. Charging also affects the linewidth measurement.
A negatively charged line, which occurs for higher voltages, deflects the electron
beam and results in a narrower measurement (Fig. 9.6).8 For this reason, charging is
of great significance to linewidth metrology using scanning electron microscopes.
Once secondary electrons are generated, they need to escape from the sample
and hit a detector in order to create a signal. SEM measurements of contact holes
are difficult for this reason—it is difficult for low-energy secondary electrons to
emerge from positively charged holes.9 For this reason, detection of backscattered
electrons is sometimes used for metrology with lithography applications. Because
backscattered electron emission peaks at or near the direction of the incident beam,
Figure 9.6 Scanning-beam electrons are deflected away from the negatively charged
photoresist. As a consequence, the line is measured narrower than it actually is.
detectors are typically placed within the lens.10 The conversion of a secondary or
backscattered electron intensity into a linewidth number is not automatic. It may
be argued that measurement precision is the critical requirement, and that absolute
accuracy is less important. This is true to a certain extent, but facilities with more
than one SEM, or ones that need to transfer processes to or from others, require
standards to match SEMs. Consequently, some level of calibration is usually
required, and this gauge setting is generally material dependent. Algorithms for
calibrations on resist features will differ from those for polysilicon and depend
on the thickness and slopes of the resist features. Consequently, there are very
few linewidth standards available from organizations such as NIST that are used
directly in lithography applications. However, there are standards that can be used
for calibrating the magnification of electron microscopes, which has utility for
maintaining process control, particularly in facilities with multiple measurement
tools.
A current NIST magnification standard reference material, SRM 2800,
is fabricated from chromium on quartz, intended primarily for calibrating
measurements on photomasks, and has features down to 1 µm.11,12 There have
been efforts to develop standards designed specifically for use in SEMs.13,14 A
standard for calibrating SEM magnification, SRM 8820, is available from NIST.
SRM 8820 consists of etched lines of amorphous silicon on an oxide-coated
silicon wafer, and it has pitches down to 200 nm.15 The NIST SRM 2069b
is comprised of graphitized rayon fibers and can be used to assess contrast in
the SEM.16 Since SEMs “measure” different linewidths for features made from
different materials or thicknesses, these standards are useful only for measurements
on photomasks or for controlling such parameters as the magnification
of the SEM. However, the magnification is verified by measuring the pitch
of grating patterns, which gives measurements independent of materials and
measurement algorithms.17,18
errors from measurement noise. Calibrations over large ranges of dimensions are
typically piecewise linear.
The SEM used for metrology must be properly focused and stigmated in order
to obtain good measurements. In the past, this was accomplished by using a sample
of very small grains, such as sputtered gold or grainy silicon, and adjustments were
made until the image appeared best to the operator. Recent work has shown that
the subjective element associated with this approach can be overcome by taking
a two-dimensional Fourier transform of the image.24 Defocus involves a decrease
in the magnitudes of the higher-frequency components, and astigmatism shows
asymmetries between axes.
The introduction of ArF optical lithographic technology brought along a
metrology problem associated with measuring linewidths in resists. Many of
the materials comprising ArF resists are modified by electron beams, with the
consequence that resist linewidths shrink under exposure to such beams, as occurs
in scanning electron microscopes. Measurement precision requires repeatability
among measurements in terms of the exposure of the resist to e-beams, and
accuracy requires taking the shrinkage into account. The magnitude of the problem
is reduced by minimizing the time that the resist is exposed to the electron beam
and by operating at low beam voltages.25
Care must be taken to compensate properly for the shrinkage. One method to
characterize the shrinkage is to measure and remeasure the same feature repeatedly.
When this is done, results similar to the curve in Fig. 9.8 are typically produced.
As the resist is exposed initially to the electron beam the changes occur rapidly,
but slow down after longer exposures. Correct compensation for resist shrinkage
requires that this nonlinear behavior be included properly.
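One way to include the nonlinear behavior is to fit the repeated measurements to a decaying exponential and extrapolate back to zero electron dose. The sketch below is an added illustration with synthetic data, not the text's procedure; it uses the fact that for a model CD_k = CD∞ + A·r^k, three consecutive measurements determine r, A, and CD∞ in closed form:

```python
def fit_exponential(cd1, cd2, cd3):
    """Fit CD_k = cd_inf + A * r**k to three consecutive measurements
    (k = 1, 2, 3) and return (cd_inf, A, r)."""
    # Successive differences of a geometric decay share the common ratio r:
    # cd2 - cd1 = A*r*(r - 1), cd3 - cd2 = A*r**2*(r - 1).
    r = (cd3 - cd2) / (cd2 - cd1)
    A = (cd2 - cd1) / (r * (r - 1.0))
    cd_inf = cd1 - A * r
    return cd_inf, A, r

# Synthetic shrinkage data from a known model: CD_k = 80 + 10 * 0.7**k (nm)
meas = [80.0 + 10.0 * 0.7 ** k for k in (1, 2, 3)]
cd_inf, A, r = fit_exponential(*meas)

# Extrapolated unexposed linewidth, CD_0 = cd_inf + A
print(cd_inf + A)  # ≈ 90.0 nm, the CD before any e-beam exposure
```

In practice more than three measurements would be averaged or fit by least squares, but the closed-form version shows why the first measurement alone underestimates the original linewidth.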
In addition to measuring linewidths, SEMs are used for measuring line-
edge roughness (LER) and linewidth roughness (LWR). Although there have
been efforts to develop other techniques,27 currently only SEMs provide the
capability for measuring LER and LWR, particularly when there is interest
in the power spectral density of the roughness. Even when using a scanning
Figure 9.8 Exponential trend in measured critical dimension (CD) for ArF resists after
successive measurements in a scanning electron microscope.26
electron microscope, there are details that need to be considered in order to get
good measurements of line-edge roughness.28 For example, suppose one wants
to measure the power spectral density for a given resist. Prior to making the
measurement, it is necessary to determine what domain of spatial frequencies is
of interest. Low spatial frequencies are important because they have a big impact
on linewidth control, while high-spatial-frequency roughness can affect the electrical
resistance of metal lines and their reliability. Suppose that the data in a scanning electron
micrograph are collected at $N$ positions, spaced a distance $\Delta$ apart, over a length $L$, where
$$L = \Delta N. \qquad (9.5)$$
In this situation the maximum spatial frequency that can be measured is the Nyquist
frequency (see Fig. 9.9):
$$f_{\max} = \frac{1}{2\Delta}. \qquad (9.6)$$
Equation (9.6) is used to determine the value for $\Delta$, which is related to the
resolution at which the SEM is operated. Similarly, the minimum measurable
spatial frequency is
$$f_{\min} \leq \frac{1}{L}. \qquad (9.7)$$
In order to measure low spatial frequencies one needs to scan over long distances,
which requires a large image field for the SEM. On the other hand, high resolution
is usually achieved at high magnification and small image fields. Thus, there is a
tradeoff in measurement between fmax and fmin .
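These sampling limits can be made concrete with a short calculation. The sketch below is an added illustration; the scan length, sample count, and roughness component are all hypothetical. It computes Δ, fmax, and fmin for a hypothetical SEM scan and estimates the power spectrum of a synthetic edge with a direct DFT, recovering the roughness component at its known spatial frequency:

```python
import math

N = 64          # number of sampled edge positions (hypothetical)
L = 2000.0      # scan length in nm (hypothetical)
delta = L / N   # sampling interval, from L = delta * N, Eq. (9.5)

f_max = 1.0 / (2.0 * delta)   # Nyquist frequency, Eq. (9.6)
f_min = 1.0 / L               # lowest measurable frequency, Eq. (9.7)

# Synthetic edge: a single sinusoidal roughness component at spatial frequency 4/L
edge = [1.5 * math.sin(2.0 * math.pi * 4 * k / N) for k in range(N)]

def psd(samples):
    """One-sided power spectrum via a direct DFT (O(N^2); fine for small N)."""
    n = len(samples)
    spectrum = []
    for m in range(n // 2 + 1):
        re = sum(s * math.cos(2 * math.pi * m * k / n) for k, s in enumerate(samples))
        im = -sum(s * math.sin(2 * math.pi * m * k / n) for k, s in enumerate(samples))
        spectrum.append((re * re + im * im) / n)
    return spectrum

p = psd(edge)
peak_bin = p.index(max(p))
print(delta, f_min, f_max, peak_bin)  # the peak falls in bin 4, i.e., frequency 4/L
```

Measuring lower spatial frequencies requires a longer L (a larger image field), while measuring higher ones requires a smaller Δ (higher magnification), which is exactly the tradeoff described above.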
9.1.2 Scatterometry
One configuration for the measurement of linewidth using scatterometry is
illustrated in Fig. 9.10. A grating is patterned on the wafer. A beam of light is
Figure 9.9 Illustration of the minimum and maximum spatial frequencies that can be
measured.
then reflected from the grating over a range of wavelengths, and the reflectance is
measured as a function of wavelength. The reflectance versus wavelength is also
calculated, with various assumed values for the printed pattern’s critical dimension
(CD) and resist profile, and for the thicknesses of the various films, including
the patterned resist. The measured and calculated reflectances are compared for
various assumed values of the linewidths, profiles, and film thicknesses, and a good
match is considered as identification of these parameters. Another configuration
is angle-resolved scatterometry, in which the angle of incidence is varied rather
than the wavelength.29 When the wavelength is varied, the method is referred to as
spectroscopic scatterometry. It is also possible to make the measurements using an
ellipsometer; in this case, measurements and calculations are performed for tan ψ
and cos ∆.30 Each of these techniques has its merits.31
Shown in Fig. 9.11 is an example of a ∼160-nm resist linewidth measured with
scatterometry. The linewidth from scatterometry agreed well with the linewidth
determined from a cross section measured with a scanning electron microscope.
Scatterometry provided more information than is possible from a top-down SEM.
In addition to the linewidth, resist profile information was obtained. Remarkably,
even the foot of the resist near the resist-substrate interface was captured by the
ellipsometric measurement. The ability to capture the resist profile indicates the
potential of scatterometry to identify out-of-focus exposure conditions.
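The matching step can be sketched as follows (Python). The reflectance curves here are simple stand-ins for the rigorously calculated curves, and all names and values are hypothetical; the point is the least-squares comparison of a measured spectrum against a precomputed library:

```python
import numpy as np

wavelengths = np.linspace(250, 750, 101)          # probe wavelengths, nm

def calculated_reflectance(cd_nm):
    # Stand-in for a rigorous grating calculation; real libraries are
    # built by solving Maxwell's equations for the assumed profile/stack.
    return 0.3 + 0.1 * np.sin(wavelengths / cd_nm)

library_cds = np.arange(140.0, 181.0, 1.0)        # candidate CDs, nm
library = np.array([calculated_reflectance(cd) for cd in library_cds])

measured = calculated_reflectance(160.0)          # stand-in "measurement"
errors = np.sum((library - measured) ** 2, axis=1)
best_cd = library_cds[np.argmin(errors)]
print(best_cd)   # -> 160.0
```

In practice the library is parameterized not just by CD but also by profile and film thicknesses, which is why considerable setup can be needed.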
Scatterometry is a measurement method somewhat complementary to
measurement with scanning electron microscopes. Fairly large areas (at least
50 µm × 50 µm) are required to obtain an adequate signal. An array pattern
is needed to obtain an adequate signal-to-noise ratio, but the pitch may be sufficiently
loose for the lines to be effectively isolated from a lithography point of view.
With some implementations of scatterometry it is necessary to prepare libraries of
calculated curves in advance, so considerable setup is needed. Thus, scatterometry
is best suited for use in manufacturing, although it has been used effectively to
characterize exposure tools, since simple bare silicon substrates can be used that
reduce setup complexity. It is not an appropriate method of measurement when
one is attempting to verify OPC of individual transistors. Some attempts have
W = (Rs/Rb) L. (9.8)
Figure 9.12 The resistor test structure for measuring linewidths electrically.
current is reversed, and a voltage V25− is measured. The current I is then forced
alternately between pads 2 and 3, and voltages V45+ and V45− are measured. The sheet
resistance is given by
Rs = (π/ln 2) × (V25+ + V25− + V45+ + V45−)/(4I). (9.9)
The resistance of the bridge resistor, through which a current Ib is forced, is
Rb = (V56+ + V56−)/(2Ib). (9.10)
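With hypothetical measured values, the linewidth follows directly from Eqs. (9.8)–(9.10); all numbers below are illustrative only:

```python
import math

# van der Pauw sheet-resistance measurement [Eq. (9.9)]
I = 1.0e-3                       # forced current, A
V25p, V25m = 2.1e-3, 2.0e-3      # measured voltages, V
V45p, V45m = 2.0e-3, 1.9e-3
Rs = (math.pi / math.log(2)) * (V25p + V25m + V45p + V45m) / (4 * I)

# bridge-resistor measurement [Eq. (9.10)]
Ib = 1.0e-3                      # current forced through the bridge, A
V56p, V56m = 0.45, 0.47          # measured voltages, V
Rb = (V56p + V56m) / (2 * Ib)

# electrical linewidth [Eq. (9.8)]
L_bridge = 50.0e-6               # bridge length, m
W = (Rs / Rb) * L_bridge
print(Rs, Rb, W)                 # W comes out just under 1 micron here
```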
Figure 9.14 A structure for measuring overlay. The inner box is part of the substrate
pattern, and the outer box is created in the overlaying photoresist layer.
SEMs. Sufficient data are collected automatically in order to perform the overlay
modeling discussed in Chapter 6.
Overlay measurement can itself introduce apparent overlay errors. The most
common of these errors are discussed in this section. The first is called tool-induced
shift (TIS), which results from asymmetries in the optics of the overlay
measurement system.
measurement system. Because the materials of Layer 1 and Layer 2 in Fig. 9.14
are not the same, the measurements of x1 and x2 are not equivalent if there are
asymmetries in the measurement equipment. Types of asymmetries that occur
are tilted or decentered lenses, nonuniform illumination, lens aberrations, and
nonuniform detector response.37 The presence of TIS is easily verified by rotating
wafers from 0 to 180 deg.38,39 In the absence of TIS,
∆x(0) = −∆x(180),
where ∆x(0) is the measurement for the unrotated wafer, and ∆x(180) is the
measurement for the wafer rotated 180 deg. A measure of TIS is therefore
TIS = [∆x(0) + ∆x(180)]/2,
which is ideally zero. The TIS value is measured when overlay measurement
programs are first established for particular process layers, and automatic
compensation is made to reduce the TIS error.40 Asymmetries in alignment
measurement can also be induced when features are too close to each other (x1
or x2 is too small), relative to the resolving power of the optics in the measurement
tool.41 The resolution of optical tools using visible light is typically in the range of
0.8–1.0 µm, while systems using ultraviolet light have somewhat better resolution.
The features of overlay measurement structures on the wafer must be consistent
with the resolution of the optics of the overlay measurement tool.
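A numerical illustration (hypothetical values): TIS is commonly taken as the average of the 0-deg and 180-deg measurements, and half their difference gives a TIS-corrected overlay estimate:

```python
# Overlay measurements of the same site, wafer unrotated and rotated (nm).
dx_0 = 12.0      # apparent overlay error at 0 deg
dx_180 = -10.0   # apparent overlay error at 180 deg

tis = (dx_0 + dx_180) / 2.0          # ideally zero
corrected = (dx_0 - dx_180) / 2.0    # TIS-free overlay estimate
print(tis, corrected)   # -> 1.0 11.0
```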
Wafer processing induces asymmetries in overlay measurement marks. Consider
situations where metal is sputtered onto wafers. If the deposition is not collimated,
metal builds up preferentially on one side of the overlay measurement marks near
the edges of the wafers (Fig. 6.24). Measurements of overlay are shifted relative to
the overlay of the critical features. Errors in overlay measurement caused by wafer
processing are referred to as wafer-induced shift (WIS). For the example illustrated
in Fig. 6.24, the overlay error is in opposite directions on opposite sides of the
wafer and appears to be a wafer scaling error.42 To the extent that asymmetries
in step coverage are repeatable, wafer-induced shifts are corrected by comparing
overlay measurements before and after etch.43
Chemical-mechanical polishing (CMP) causes problems for correct acquisition
of alignment targets (Chapter 6) and also creates difficulties for overlay
measurement. Overlay measurement structures need to be optimized to minimize
this effect. The bar or frame structures shown in Fig. 9.15 typically lead to less
of a problem than the box-in-box structures, though the exact dimensions need
to be determined for individual processes. More recently, overlay measurement
structures have been developed that involve grating patterns. An example is shown
in Fig. 9.16.44 These new patterns are less sensitive to degradation by CMP. Such
grating marks also have the potential for being made small (≤5 µm × 5 µm), since
gratings provide a high ratio of line edge to area, which can be useful for measuring
overlay within product die.45
Another motivation for using small marks is the increase in the number of
masking steps that has occurred over time. For many years only two metal layers
were used to fabricate integrated circuits, while eight or more metal layers are
common today. Many implant layers are required to produce the transistors for
high-performance microprocessors. As a consequence, even scribe line space has
become insufficient for including the targets required to measure overlay for all of
the layers. This has motivated the development of overlay measurement marks that
can be used to measure overlay among many layers while being space efficient.46,47
As discussed in Chapter 6, intrafield registration depends on the illumination
conditions and the feature type. Lines and spaces may have different registration
at the different illumination settings. This causes a problem for metrology if the
critical features are one type, such as contact holes, while other types of features,
such as lines, are used for overlay measurement.48 This type of subtle difference is
significant as overlay requirements approach 50 nm or less.
Overlay measurement errors can occur simply because the linewidths of the
measurement structure differ from those in the circuit. Because optical methods
are used, the feature sizes for these structures have been fairly large, on the order
of 1 µm or larger, even when the features in the devices have been submicron.
These differences in feature sizes between the overlay measurement structures and
features actually in the circuits were of minor consequence prior to the advent
of lithography with low k1 . With low-k1 processes, the placement errors of small
features due to lens aberrations will be different from the placement errors of larger
features (Fig. 9.17). Without a change in the overlay measurement structures, there
will be overlay errors as a consequence.
Figure 9.17 Pattern shift of 120-nm line/360-nm pitch features compared to 1-µm features.
These were calculated errors based upon measured aberrations of an ArF lens.
Problems
9.1 What are advantages and disadvantages of scanning electron microscopes for
measuring critical dimensions, relative to scatterometry tools?
9.2 Why are low-beam voltages (a few hundred eV or less) used when
applying scanning electron microscopes for measuring dimensions of features
composed of photoresist?
References
1. K. M. Monahan, M. Davidson, Z. Grycz, R. Krieger, B. Sheumaker, and
R. Zmrzli, “Low-loss electron imaging and its application to critical dimension
metrology,” Proc. SPIE 2196, 138–144 (1994).
2. L. Reimer, Image Formation in Low-voltage Scanning Electron Microscopy,
SPIE Press, Bellingham, WA (1993).
3. R. R. Hershey and M. B. Weller, “Nonlinearity in scanning electron
microscope critical dimension measurements introduced by the edge detection
algorithm,” Proc. SPIE 1926, 287–294 (1993).
4. J. I. Goldstein, D. E. Newbury, P. Echlin, D. C. Joy, C. Fiori, and E. Lifshin,
Scanning Electron Microscopy and X-Ray Microanalysis, 2nd ed., Plenum
Press, New York (1984).
5. J. Finders, K. Ronse, L. Van den Hove, V. Van Driessche, and P. Tzviatkov,
“Impact of SEM accuracy on the CD-control during gate patterning process of
0.25 µm generations,” Proc. Olin Microlithog. Sem., 17–30 (1997).
6. K. M. Monahan, J. P. H. Benschop, and T. A. Harris, “Charging effects in
low-voltage SEM metrology,” Proc. SPIE 1464, 2–9 (1991).
7. D. C. Joy and C. S. Joy, “Low voltage scanning electron microscopy,” Micron
27(3–4), 247–263 (1996).
8. M. Davidson and N. T. Sullivan, “An investigation of the effects of charging
in SEM based CD metrology,” Proc. SPIE 3050, 226–242 (1997).
9. C. M. Cork, P. Canestrari, P. DeNatale, and M. Vasconi, “Near and sub-half
micron geometry SEM metrology requirements for good process control,”
Proc. SPIE 2439, 106–113 (1995).
10. S. R. Rogers, “New CD-SEM technology for 0.25 µm production,” Proc. SPIE
2439, 353–362 (1995).
11. D. A. Swyt, “Certificate of analysis, standard reference material 2800,”
National Institute of Standards and Technology (2002).
12. NIST Standard Reference Materials Catalog, National Institute of Standards
and Technology, U.S. Department of Commerce. www.nist.gov/srm.
13. M. T. Postek, A. E. Vladar, S. Jones, and W. J. Keery, “Report on the NIST
low accelerating voltage SEM magnification standard interlaboratory study,”
Proc. SPIE 1926, 268–286 (1993).
14. M. T. Postek, “Scanning electron microscope-based metrological electron
microscope system and new prototype scanning electron microscope
magnification standard,” Scanning Microsc. 3(4), 1087–1099 (1989).
15. M. T. Postek and R. L. Watters, “Report of investigation, reference material
8820,” National Institute of Standards and Technology (2009).
16. W. P. Reed, “Certificate standard reference material 2069b,” National Institute
of Standards and Technology (1991).
Figure 10.1 Illustration of local containment of the immersion fluid (water in this instance).
Immersion Lithography and the Limits of Optical Lithography 373
In terms of the numerical aperture NA = n sin θ, the depth-of-focus is
DOF ≈ λ/{2n[1 − √(1 − (NA/n)²)]} (10.5)
≈ nλ/NA². (10.6)
As can be seen, the exact expression for depth-of-focus reduces to the expression
given in Chapter 2 for small NAs and n = 1.
There are several things to be noted from these expressions. First, the exact
depth-of-focus is actually less than the depth-of-focus given by the expression
that is suitable for low NA. This means that analyses which indicated difficulties
with small depths-of-focus for large-NA lithography were actually optimistic!
While these differences between Eqs. (10.6) and (10.4) are small for low NAs,
the disparity at high NA can be as much as 50%. It can also be seen from Eq.
(10.6) that the depth-of-focus is improved by a factor of n or larger. To the extent
that optical lithography is limited by the depth-of-focus, immersion can provide
improvement.
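The size of the discrepancy is easy to check numerically. The sketch below compares the exact high-NA depth-of-focus expression with the paraxial approximation nλ/NA² for water immersion at 193 nm (n ≈ 1.437 is assumed):

```python
import math

lam = 193.0   # wavelength, nm
n = 1.437     # refractive index of water at 193 nm

def dof_exact(NA):
    # exact form: lambda / (2n(1 - sqrt(1 - (NA/n)^2)))
    return lam / (2 * n * (1 - math.sqrt(1 - (NA / n) ** 2)))

def dof_paraxial(NA):
    # small-angle limit: n * lambda / NA^2
    return n * lam / NA ** 2

for NA in (0.3, 1.35):
    print(NA, round(dof_paraxial(NA) / dof_exact(NA), 2))
# At low NA the two expressions agree (ratio ~1.01); at NA = 1.35 the
# paraxial form overstates the depth-of-focus by ~50%.
```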
Silicon wafers will heat during exposure. Modeling has predicted a temperature
rise of ∼16 K for localized regions of the wafer.3 Although only some of this heat
will be transferred to the immersion fluid, the fluid’s index of refraction will be
altered. Careful measurements have shown that the index of refraction of water will
change by about −1 × 10⁻⁴/°C for ArF light,2 so only very small increases in the
NA = n sin θ (10.7)
= 1.35. (10.8)
200 nm/cos θ = 584 nm. (10.9)
Consequently, a 1 °C temperature rise within this volume will create a phase error
of
n × (1 °C) × (−1 × 10⁻⁴/°C) × (584 nm) = −0.0839 nm. (10.10)
Since phase errors from other aberrations are typically ∼1 nm,5 the additional phase
errors induced by water heating are not considered critical during the exposure
of a single field. Over the course of exposing an entire wafer, the average water
temperature could rise, so the immersion fluid must be circulated, and temperature
must be controlled actively.
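A quick check of the phase-error estimate in Eq. (10.10), using the values given in the text (584 nm is the path length from Eq. (10.9)):

```python
n_water = 1.437     # index of refraction of water at 193 nm
dT = 1.0            # temperature rise, deg C
dn_dT = -1.0e-4     # index change of water per deg C at 193 nm
path_nm = 584.0     # optical path through the heated water, nm

phase_error_nm = n_water * dT * dn_dT * path_nm
print(round(phase_error_nm, 4))   # -> -0.0839
```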
Heating of the immersion fluid due to optical absorption is not the only
temperature concern associated with immersion lithography. Even with good fluid
containment, small amounts of immersion fluid—perhaps just monolayers—will
remain on the wafer surface. Subsequent evaporative cooling will cause the wafer
to contract, affecting overlay (see Problem 10.4). For this reason, considerable
engineering resources have been applied to improving overlay on immersion
exposure tools.
Sensitivity to defects is considerably greater in semiconductor immersion
lithography than it is in microscopy. Not only must there be nearly no defects in
lithographic applications, but this must be the case over very large areas, in contrast
to microscopy, where the fields of view are often only microns on a side, or less. The
issue of defects is therefore a critical one for immersion lithography.
Figure 10.4 Top view of immersion-scanning lithography, illustrating how water will be
swept off the side of the wafer and onto the wafer stage.
Figure 10.5 Cross-sectional side view of the wafer on the wafer chuck of an immersion
scanner.
Figure 10.6 Scanning-electron micrograph of a defect resulting from a water drop. The
pattern was supposed to consist purely of lines and spaces.
at higher scanning speeds, and this leads to another set of engineering problems for
achieving high scanner throughput. The potential for material leaching from the
resist also leads to concern over contamination of the bottom lens element.
Two approaches have been taken to address the problem of chemicals leaching
from the photoresist. One is to use a topcoat. As discussed in Section 3.5, topcoats
have long been used as barriers to base diffusion in chemically amplified resists.
For process simplicity, these topcoats are typically water soluble so they can
be readily removed prior to resist development, but this property makes them
unsuitable for use in water-immersion lithography. Two different types of topcoats
have been used in immersion lithography. An early immersion-compatible topcoat
required a solvent to be removed, but this was undesirable because of process
cost. Later topcoats were soluble in developer but not pure water. The most
preferred solution is to have immersion resists that intrinsically leach only very
small amounts of chemicals into water.
Bubbles constitute another potential source of defects in immersion lithography
because they can scatter light.6–9 One typically thinks of particles in terms of
dense objects, but light can be scattered by objects of low index of refraction, such
as bubbles, that are embedded in media of higher optical density.9 Fortunately,
bubble formation and dissipation have been studied extensively, and it has been
found that the gases that comprise air are very soluble in water. Once the water has
been degassed, i.e., the air dissolved in it has mostly been removed, bubbles have
short lifetimes. Bubble lifetime has been predicted theoretically10 and measured in
confirmation of the theory. Results from both are shown in Fig. 10.7. As can be seen
from the data, bubbles have very short lifetimes in degassed water; consequently,
light scattering due to bubbles is expected to be small. However, even if there
are no bubbles, care must be taken to control the amount of air dissolved in
the water, since air can affect the index of refraction at a level significant for
lithography. The difference in the refractive index at λ = 193 nm between air-
saturated and completely degassed water has been measured to be 6.7 × 10⁻⁶.11
Light passing through millimeters of water will have differences in phase on
the order of nanometers, depending upon whether the water is air saturated or
degassed.
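The phase claim follows from a one-line estimate: the measured index difference of 6.7 × 10⁻⁶ accumulated over a millimeter of water gives an optical-path difference of several nanometers (the 1-mm path is an assumed, representative value):

```python
delta_n = 6.7e-6          # index difference, air-saturated vs. degassed water
path_mm = 1.0             # assumed water path, mm
opd_nm = delta_n * path_mm * 1.0e6   # convert mm to nm
print(round(opd_nm, 1))   # -> 6.7
```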
It should be noted that water can be used for KrF immersion, at least from an
optical point of view. However, the index of refraction of water is only 1.378 at
KrF wavelengths, so water is less effective at improving KrF than ArF lithography.
Also, it appears that many existing KrF resist platforms do not work well when
the resists are immersed in water. Consequently, there has been little effort to
develop KrF immersion lithography. At λ = 193 nm, water immersion provides a
significant extension of optical lithography, enabling >40% increase in resolution
over imaging in air. How far optical lithography can ultimately be extended is the
subject of the remainder of this chapter.
resolution is given by
resolution = k1 λ/NA, (10.11)
where the prefactors of Eqs. (2.4) and (2.7) are replaced by a general factor k1.
Similarly, the expression for depth-of-focus can be written as
depth-of-focus = ±k2 λ/NA². (10.12)
It has long been recognized that Rayleigh’s and equivalent expressions are inexact
predictors of resolution, but do correctly capture the trends associated with
wavelengths and numerical apertures. Other factors, such as the resist process,
are captured by the coefficients k1 and k2. In 1979, the state-of-the-art lens had
a resolution of 1.25 µm, a ±0.75-µm depth-of-focus, and a numerical aperture of
0.28, and it imaged at the mercury g line (λ = 436 nm). This produced values of 0.80
and 0.13 for k1 and k2 , respectively. With these values for the coefficients in Eqs.
(10.11) and (10.12), the numerical aperture of a g-line lens capable of producing
0.8-µm features would be 0.44, with a ±0.3-µm depth-of-focus. In 1979, long
before the use of chemical-mechanical polishing in device fabrication, this was too
small a depth-of-focus to provide adequate imaging over the heights of then-typical
device topographies and within the ability of steppers to control focus. Thus, it was
declared that optical lithography would not be capable of submicron lithography.
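The numbers in this argument can be reproduced directly from Eqs. (10.11) and (10.12):

```python
# Back-calculating k1 and k2 for the 1979 state-of-the-art g-line lens:
# resolution 1.25 um, DOF +/-0.75 um, NA = 0.28, lambda = 0.436 um.
lam = 0.436
NA = 0.28
k1 = 1.25 * NA / lam            # from resolution = k1 * lambda / NA
k2 = 0.75 * NA ** 2 / lam       # from DOF = +/- k2 * lambda / NA^2
print(round(k1, 2), round(k2, 2))   # -> 0.8 0.13

# NA needed for 0.8-um features at the same k1, and the resulting DOF:
NA_08 = k1 * lam / 0.8
dof = k2 * lam / NA_08 ** 2
print(round(NA_08, 2), round(dof, 2))   # -> 0.44 0.31 (i.e., ~ +/-0.3 um)
```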
There were several mistakes in this argument that were already becoming
apparent by the mid-1980s.15
1. These early predictions were wrong, because they assumed that the optics
found on state-of-the-art steppers in the late 1970s and early 1980s were nearly
diffraction limited, and extrapolations were based upon that assumption. The
extent to which these assumptions were off the mark has been demonstrated by
the application of today’s superior lens-designing methods and manufacturing
capability. Until recently, these capabilities have been applied only to leading
edge, high-numerical-aperture lenses, but with the introduction of very large
field systems for mix-and-match applications, actual diffraction limits for
smaller numerical apertures can be observed. This enables a direct reassessment
of earlier analyses. For example, Nikon produced a 0.3-NA i-line lens for its
4425i stepper. This lens had a resolution of 0.7 µm (k1 = 0.58) and specified
±2.5-µm depth-of-focus (k2 = 0.62), while thirty years ago, lenses (g line)
with similar numerical apertures had 1.1- to 1.25-µm resolution and ±0.75-µm
depths-of-focus. By extrapolating the future of optical lithography from these
two different sets of capabilities—old and new—one arrives at significantly
different conclusions.
2. It was wrong to assume that the mercury g line would always be used for
optical lithography. For a given feature size, depth-of-focus is increased by
using a smaller wavelength. The progression from the 436-nm (mercury g-line)
wavelength to the 365-nm (i-line) wavelength, and on to 248-nm (KrF) and
193-nm (ArF) lithography, has enabled feature sizes to shrink while maintaining a
useable depth-of-focus.
3. Photoresists have improved, contributing to effective changes to k1 and k2 .
Lithography today routinely operates at values of k1 = 0.35 and smaller. The
argument of 1979, revised with a k1 value of 0.5, predicts that 0.8-µm resolution
could be achieved with a numerical aperture of 0.27 with a g-line lens. Even
with the previously predicted value for k2 of 0.13, the resulting depth-of-focus
would have been more than adequate.
4. Depth-of-focus requirements have decreased, primarily as a result of the use
of chemical-mechanical polishing. Because it is no longer necessary to image
through the depth of device topography, good images can be obtained on optical
systems with small depths-of-focus. Depths-of-focus of only ±0.1 µm and
smaller are now considered adequate.
In addition to the reasons already identified in the 1980s, more recently developed
resolution-enhancement techniques, such as phase shifting, off-axis illumination,
and the other methods presented in Chapter 8, can further extend optical
lithography beyond the limits that were inferred from scaling using the Rayleigh
criteria. The field of “wavefront engineering,” an umbrella term coined to include
all the techniques that seek to modify the effects of diffraction in order to improve
resolution and depth-of-focus,16 addressed many of the challenges at the 100-nm
level and below, forestalling the long-forecasted “death of optics.”
Regardless of the errors in past predictions, the scaling laws for resolution and
depth-of-focus indicate that arbitrarily small features will not be made optically.
The issue has never been whether there is a limit to optical lithography; the
questions have been of when the end will come, and what the ultimate resolution
will be. It is clear that we need to look beyond the basic Rayleigh criteria.
Wavefront engineering has benefits for specific feature types, dependent on the
optical configuration of stepper and mask. It is important to understand the
applicability of these sophisticated enhancement techniques to obtain an estimate
of optical lithography’s ultimate limits.
It is useful to define more precisely what is meant by “optical lithography.”
For the purposes of this book, optical lithography is defined as any lithographic
technique that
1. Uses photons to induce chemical reactions in a photoresist,
2. Involves a transmission photomask, and
3. Has the potential for image reduction using projection optics.
These definitions help to distinguish optical lithography from other patterning
techniques that do not involve photons, or do use photons but are very different
in character from the techniques described thus far in this book. The last two
requirements separate x-ray and extreme ultraviolet (EUV) lithography from
optical lithography, even though these two other types of lithography involve
photons. Several reasons were noted above as to why optical lithography has
survived longer than earlier anticipated. It is worth trying to understand whether
there are opportunities remaining for improvement along the same paths used to
extend optical lithography beyond past expectations, and how much extension will
be provided by recently developed techniques.
Figure 10.8 Progress in lithography with 0.7× improvement between generations, relative
to a limit of zero, or a limit of 40 nm.
Lenses have improved considerably during the past two decades. Unfortunately,
the biggest gains have already been made in terms of reductions of aberrations.
While further reductions in aberrations will continue to occur, great strides are no
longer possible. An extension of optical lithography will have to come from other
sources.
Figure 10.9 Illustration of the factors that increase in difficulty in proportion to tan θ.
NA      θ (deg)    tan θ
1.30    65         2.14
1.35    70         2.75
1.40    77         4.32
Figure 10.10 Greater numerical apertures are possible when the immersion fluid has an
index of refraction less than that of the material of the bottom lens element.
Material                                   Refractive index at λ = 193 nm
Water                                      1.44
Fused silica21                             1.56
Calcium fluoride21                         1.50
Aluminum oxide22                           1.92
Lutetium aluminum garnet (LuAG)23          2.14
that of water without also fabricating the bottom lens element with a material other
than fused silica and calcium fluoride.
A materials search identified one material, lutetium aluminum garnet
(Lu3 Al5 O12 ), that has a high refractive index at λ = 193 nm and has sufficiently
low birefringence that lens makers can consider using it for a lens element.
However, progress has been slow in producing crystals with high transparency and
good homogeneity. The problem of transparency did not arise from fundamental
absorption by the crystal, but rather was a consequence of chemical impurities.
With a slow rate of improvement, efforts to produce lens-quality lutetium
aluminum garnet crystals have since been reduced. Another reason for slowing
crystal development was the challenge of developing an immersion fluid with a high
index of refraction that meets all requirements; without such a fluid, a high-index
crystal is not needed.
Assuming that a high-index lens material and a high-index immersion fluid
can be identified, and that solutions can be found for all associated technical
issues, there is still one more material that needs to be addressed, and that is the
photoresist. As noted in Chapter 4, image contrast is reduced at high angles of
incidence on the resist for P-polarized light (see Fig. 4.3). The only reason that
there is good imaging even at moderate numerical apertures is because of the
refraction at the interface between the resist and air or fluid. For this refraction
to improve image contrast, it is necessary that the index of refraction of the resist
be greater than that of the fluid. For most chemically amplified resists, n ≈ 1.7,
so this condition is satisfied with water as the immersion fluid, as well as most
problems with mask thermal expansion, lens materials, and resists, 157 nm will be
considered for the remainder of this chapter as the smallest wavelength at which
optical lithography can theoretically be practiced in the future, while 193 nm is the
shortest wavelength from a practical point of view. The ability to advance optical
lithography by moving to a shorter wavelength is very near an end.
Recent measurements indicate that the diffusion length for this process is on the
order of 10–30 nm. When chemically amplified resists were first used widely,
linewidths were ∼350 nm. For such features, the blurring from photoacid diffusion
was not significant. However, as features have become smaller than 50 nm, the
impact of photoacid diffusion becomes quite relevant.
It is possible to reduce the amount of diffusion during the post-exposure
baking step, but there are consequences. First, this diffusion ameliorates line-edge
roughness (LER) to a certain degree. The larger the diffusion, the more the LER
is smoothed. Reducing the diffusion will necessitate additional measures to keep
LER at an acceptable level. Also, a significant amount of diffusion enables a large
amount of chemical amplification. Thus, reducing diffusion will also reduce resist
sensitivity. This is of particular concern for some of the technologies that will be
discussed in Chapters 12 and 13.
0.25/0.30 = 0.83, (10.13)
and this requires taking resolution to the absolute limit allowed by the laws of
physics. A more likely minimum value will be somewhat larger, perhaps k1 = 0.28
or 0.27. Thus, only modest decreases in minimum feature sizes will be possible in
the future through reduction in k1 .
As has been described thus far in this chapter, nearly all of the improvements
that have been introduced in the past to advance lithography—better resists, lenses
with lower aberrations and higher numerical apertures, flatter wafers, and shorter
wavelengths—have been exploited nearly completely. How close to the theoretical
limit optical lithography can be taken is discussed in the next section.
Figure 10.12 The evolution of k1 over time, using DRAM half pitches to ascertain k1 .
resolution = k1 λ/NA. (10.14)
Today, ArF processes with k1 < 0.35 are common. For these processes, pitches
<90 nm are imaged with currently available 1.35-NA lenses. If we assume that
similar values of θ can be produced for a 157-nm lens and an immersion fluid with
n = 1.65 is available, along with glass materials to support such a fluid, it would
be possible to produce 27-nm lines and spaces with k1 = 0.27 and a depth-of-focus
[Eq. (10.4)] of approximately ±36 nm. Lithography with such depths-of-focus will
require lenses with very little field curvature and astigmatism, exposure tools with
focus control of 10 nm (or less), negligible device topography, and extremely flat
wafers. Nevertheless, such processes are not out of the question. Even if we restrict
ourselves to conservative values for NA (1.35), k1 (0.28), and wavelength (193
nm), a resolution of 40 nm is achievable. All of these estimations indicate that
the ultimate limit for optical lithography is somewhere between 27 nm and 40 nm.
Beyond this range—the 22-nm node, for example—lithography methods that differ
in form from optical lithography as described thus far in this book will be required.
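These limit estimates can be reproduced with Eq. (10.14). In the sketch below, the n = 1.65 fluid and the assumption that a 157-nm lens reaches the same marginal-ray angle as today's 1.35-NA water-immersion ArF lenses are both hypotheticals from the text:

```python
import math

# Aggressive case: lambda = 157 nm, n = 1.65 immersion fluid, same
# marginal-ray angle theta as a 1.35-NA water-immersion ArF lens.
theta = math.asin(1.35 / 1.437)        # n(water, 193 nm) ~ 1.437
NA_157 = 1.65 * math.sin(theta)
res_aggressive = 0.27 * 157 / NA_157   # resolution = k1 * lambda / NA

# Conservative case: today's ArF immersion numbers.
res_conservative = 0.28 * 193 / 1.35

print(round(NA_157, 2), round(res_aggressive), round(res_conservative))
# -> 1.55 27 40
```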
The degree to which optical lithography can be extended to very low values of
k1 depends upon how successfully all of the methods described in Chapter 8—
optical proximity corrections, off-axis illumination, and phase-shifting masks—
can be implemented. The ability to implement particular techniques depends upon
the application. For example, the imaging of highly repetitive structures, such as
those found in memories, can be enhanced through the use of off-axis illumination,
but the benefit is less for isolated features.
In Chapter 8, several resolution-enhancement techniques were presented:
optical proximity corrections, phase shifting, and off-axis illumination. The
issues associated with each of these techniques are best understood in light
of requirements. The constraints for memories, logic, and application-specific
integrated circuits are all different. For memories, the goal is packing a large
number of bits into a given area of silicon. Memory is used by the bit, and
increasing density provides more bits at lower prices. The memory business is very
dependent upon the manufacturing cost per bit.
The requirements for logic are somewhat different. There is certainly an element
of importance in manufacturing efficiency that can come from packing density,
particularly for microprocessors with significant amounts of on-board cache
memory. However, processor speed has a big effect on the value of the part. This
is shown for a Windows-compatible microprocessor in Fig. 10.13. Retail prices
are seen to be extremely dependent upon processor speed. Clearly, consumers are
willing to pay significantly more money for higher-performance microprocessors,
which motivates manufacturers of microprocessors to maximize performance. Fast
chips require fast transistors, which usually necessitates short gate lengths. The
speed at which the processor can operate reliably is usually limited by the slowest
transistor on the part. Hence, it is undesirable to have any transistors with long
gate lengths. On the other hand, transistors with gates that are too short may break
down electrically or can result in wasted power consumption due to leakage. Thus,
it is undesirable for gates to be either too long or too short. For microprocessors,
linewidth control is critical.
Immersion Lithography and the Limits of Optical Lithography 391
Figure 10.13 AMD Phenom II Deneb processor prices as a function of processor clock
speed. The prices were taken on January 11, 2010, from the Web site∗ of a company that
sells computer hardware.
Linewidth-control requirements for logic gates have been incorporated into the
International Technology Roadmap for Semiconductors and are summarized in
Table 10.4. These requirements are very tight, and even the smallest detractors of
linewidth control will need to be addressed. Consider the issue of optical proximity
corrections. There are limits to how finely these can be applied, particularly in
circumstances where there are large mask-error factors (MEF). The problem can
be illustrated by a hypothetical example. Suppose that a 3-nm iso-dense adjustment
needs to be made. For 4× reticles, this represents a 12-nm adjustment if MEF = 1.
If the beam writer is writing on a 5-nm grid, then there will be a 2-nm residual
error following a 10-nm correction on the mask. With MEF = 1, this results in a
0.5-nm error on the wafer, which might be considered acceptable. However, with
MEF = 4, the error on the wafer will be 2 nm, which is quite significant relative to
linewidth control requirements in the 45-nm node and beyond.
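The arithmetic of this hypothetical example can be sketched in a few lines; the snap-to-grid rounding is an assumed idealization of the mask writer:

```python
# Residual iso-dense OPC error from a finite mask-writer grid:
# 4x reticle, 5-nm writer grid, MEF of 1 or 4, as in the text's example.
def mask_residual(needed_mask_adj_nm, grid_nm):
    """Residual mask CD error after snapping the correction to the grid."""
    snapped = grid_nm * round(needed_mask_adj_nm / grid_nm)
    return abs(needed_mask_adj_nm - snapped)

def wafer_error_nm(mask_residual_nm, reduction, mef):
    """Wafer-level CD error produced by a residual mask CD error."""
    return mef * mask_residual_nm / reduction

residual = mask_residual(12.0, 5.0)      # 12 nm wanted, 10 nm written
print(residual)                          # 2.0 nm left on the mask
print(wafer_error_nm(residual, 4, 1))    # 0.5 nm on the wafer
print(wafer_error_nm(residual, 4, 4))    # 2.0 nm on the wafer
```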
Table 10.4 Gate critical dimension (CD) control requirements from the year 2009
International Technology Roadmap for Semiconductors.

Node (nm)   Year   Microprocessor gate     Microprocessor gate CD control,
                   CD in resist (nm)       lithography contribution only (3σ, nm)
45          2010   41                      2.8
32          2013   28                      2.1
23          2016   20                      1.6
16          2019   14                      1.2
Phase shifting does not eliminate the problem of a finite grid size for adjusting
iso-dense biases, although it does reduce the mask-error factor significantly. The
∗ www.Newegg.com
value of this is illustrated in Figs. 10.14 and 10.15, where the linewidth target of
120 nm is simulated over a range of pitches. Without optical proximity corrections,
the linewidth variations are large (Fig. 10.14). They are reduced significantly by
adjusting the mask CD at several different pitches, as illustrated in Fig. 10.15.
However, even with these optical proximity corrections there is some residual
linewidth variation, even at best focus. This is a consequence of corrections being
made on a grid of only 20 nm on the mask. The mask could be made on a finer
grid, but this would increase the cost.
There are also certain “forbidden” pitches35 where the depth-of-focus is much
worse than it is for others. For some technologies, such variations might be
tolerable, but not for high-performance microprocessors, where exceptionally
Figure 10.14 Simulated linewidth on the wafer versus pitch for a fixed linewidth (100 nm)
on an alternating phase-shifting mask, for several values of defocus, targeting 120 nm on
the wafer. The parameters of the calculation assumed λ = 248 nm, NA = 0.6, and σ = 0.5.34
The resist was UV5-like. An aberration-free lens was assumed.
Figure 10.15 Simulated linewidth for the same conditions as for Fig. 10.14, except that the
dimensions were proximity corrected on a 5-nm grid (20 nm on the mask for a 4× system).
Numerical aperture and partial coherence were selected to optimize the range of the
forbidden pitches.
Figure 10.16 When memory arrays are densely packed, there is no room to add features
for optical proximity corrections.
Figure 10.17 The gate shape is the result of simulation. As a consequence of patterns in
proximity to this gate, the gate length is not uniform across the active region.
There are many challenges that need to be overcome for optical lithography
to be extended to the limits of water-immersion lithography, and even greater
challenges for extension beyond that. Regardless, 27 nm appears to be a clear
limit for conventional optical lithography. A possible optical approach involving
a paradigm that has potential for patterning down to the 22-nm node is presented
in the next section, and nonoptical technologies will be the subject of this book’s
final two chapters.
If σ1 = σ2 = σ, this reduces to

σtotal = √{σ² + [(1/2)(µ1 − µ2)]²}.  (10.16)
For the situation where lines of resist are the features produced by the individual
patterning operations, this equation clearly shows the increase in the spacewidth
variation caused by overlay.
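Equation (10.16) can be evaluated with a short sketch; the 2-nm linewidth sigma and 4-nm mean overlay offset below are hypothetical numbers chosen only to show the effect:

```python
import math

# Eq. (10.16): spacewidth variation when the two edges of a space come
# from separate patterning steps with equal sigma and a mean placement
# (overlay) offset mu1 - mu2. Input values here are hypothetical.
def space_sigma_total(sigma, mu1, mu2):
    return math.sqrt(sigma**2 + (0.5 * (mu1 - mu2))**2)

print(round(space_sigma_total(2.0, 4.0, 0.0), 2))  # overlay inflates 2.0 to ~2.83
```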
If the lines are much more critical than the spaces, as is typically true for
gate layer, then patterning lines of resist is an appropriate choice for a double-
patterning process, since the critical gate length is not affected directly by overlay
errors. However, there are circumstances where the spaces might be considered
more critical than the lines. For example, in a damascene metal process, the metal
lines are defined by the space. If the primary concerns are resistance and perhaps
electromigration, then the preceding analysis shows that process control might be
severely limited by overlay. A way to avoid this is to print spaces as the primary
features, as shown in Fig. 10.21. There are some difficulties with this. For very
small features, spaces are more difficult to print using positive resist than lines
of resist, even on loose pitches. Negative tone resist processes can provide a
solution.39
Another way to reduce the impact of overlay on spacewidth control is by the
use of spacers to achieve tight pitches.40 This approach is illustrated in Fig. 10.22.
In this approach a pattern on a loose pitch is created, and this is then coated with
a conformal thin film. The film is then etched anisotropically in a process very
similar to that used for many years to create spacers on gates. As can be seen,
the spacer process eliminates the dependence of critical dimensions on overlay.
Figure 10.22 Double patterning using a spacer process that halves the pitch dimension.
However, the opposite situation can occur. A mis-sized core pattern used in a spacer
process will cause misplacement of the spacer-defined features. Spacer processes
will require additional lithography and etch steps to remove features at the ends of
lines that may not be desired (Fig. 10.23).
Double patterning has been illustrated in Figs. 10.19–10.23 with line and space
patterns. This is appropriate, since the problem that double patterning is attempting
to address is that of creating tight pitches, and such processes have already found
applications for printing the dense grating structures found in flash memories.
However, patterns of real circuits consist of far more than simple lines and spaces,
particularly for logic and memories outside of the cores (Fig. 10.24). It may not be
possible to decompose layouts for double-patterning processes, and circuit layouts
Figure 10.23 Top view of a pattern generated in a spacer process, illustrating an unwanted
spacer that needs to be removed with additional lithography and etch steps in order to
create a unidirectional grating.
Figure 10.24 Metal layout for random logic, unmodified to be amenable to patterning using
spacer double-patterning processes.
λ/(2 sin θ).  (10.19)
birefringence that are too high for use in lenses. One such material is aluminum
oxide, with an index of refraction of 1.92 at λ = 193 nm.45
Interferometric lithography is a technique capable of patterning gratings of
very fine pitches, a method of great utility for making high-resolution optical
gratings. It is less straightforward to extend this technique to the patterning of
random logic patterns. Some approaches have been proposed,46,47 but there has not
yet been a commercial attempt to implement them and verify their practicality.
Nevertheless, interferometric lithography has proven to be a useful method for
studying fundamental resist issues, such as how resists behave when immersed
in fluids during exposure. Interferometric lithography is a useful tool for the
laboratory, even if it cannot yet be practically used to pattern integrated circuits.
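The fine pitches mentioned above follow from Eq. (10.19). As a sketch, if one assumes the two-beam pitch scales with the index of an immersion medium, i.e., λ/(2n sin θ) (the index factor is an assumption here, as is the near-grazing half-angle of 89 deg), the numbers are consistent with the 25-nm result cited for n = 1.92 optics:

```python
import math

# Two-beam interference pitch lambda/(2 n sin(theta)); the factor n for
# an immersion medium is an assumed generalization of Eq. (10.19).
def grating_pitch_nm(wavelength_nm, n, theta_deg):
    return wavelength_nm / (2.0 * n * math.sin(math.radians(theta_deg)))

# Near-grazing interference in a medium matched to n = 1.92 at 193 nm:
pitch = grating_pitch_nm(193.0, 1.92, 89.0)
print(round(pitch / 2.0, 1))   # half-pitch ~25 nm, cf. Ref. 45
```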
The extension of optical lithography to its ultimate limits will require that many
problems be solved. These include full implementation of the topics presented in
Chapter 8, along with further refinement of lenses, exposure tools, and masks.
While technical solutions for these problems may be found, their cost becomes
a concern. The cost of lithography is the subject of the next chapter.
Problems
10.1 Show that the approximate depth-of-focus given by

DOF = λ/NA²

is always larger than the exact DOF given by the expression in Eq. (10.4).
10.2 Show that the linewidth reduction possible by changing the wavelength from
193 nm to 157 nm is 21%, assuming equivalence in all other aspects of the
lithographic processes. Is this sufficient to advance lithography one node?
How does this resolution improvement compare with the benefits of ArF
water-immersion lithography?
10.3 Assuming λ = 193 nm and NA = 1.35, show that 40-nm lines and spaces is
k1 = 0.28. How much smaller do you think is possible from such optics?
References
1. C. A. Winslow, Elements of Applied Microscopy, John Wiley & Sons, New
York (1905).
2. J. H. Burnett and S. G. Kaplan, “Measurement of the refractive index and
thermo-optic coefficient of water near 193 nm,” J. Microlith. Microfab.
Microsyst. 3(1), 68–72 (2004).
3. G. Nellis and A. Wei, “Preliminary analysis of laser-pulse-induced pressure
variation for immersion lithography,” J. Microlith. Microfab. Microsyst. 3(1),
84–86 (2004).
4. S. Owa and H. Nagasaka, “Advantage and feasibility of immersion
lithography,” J. Microlith. Microfab. Microsyst. 3(1), 97–103 (2004).
5. S. Owa, K. Shiraishi, S. Nagaoka, T. Fujiwara, and Y. Ishii, “Full field, ArF
immersion projection tool,” Proc. 17th Ann. SEMI/IEEE Adv. Semicond. Manuf.
Conf., 63–70 (2006).
6. P. L. Marsten, “Light scattering from bubbles in water,” Proc. Oceans ’89,
1186–1193 (1989).
7. L. Tsang, J. A. Kong, and K. H. Ding, Scattering of Electromagnetic Waves:
Theories and Applications, Wiley, New York (2000).
8. T. Gau, C. -K. Chen, and B. J. Lin, “Image characterization of bubbles in water
for 193-nm immersion lithography–far-field approach,” J. Microlith. Microfab.
Microsyst. 3(1), 61–67 (2004).
9. D. Ngo and G. Videen, “Light scattering from spheres,” Opt. Eng. 36(1),
150–156 (1997).
10. P. S. Epstein and M. S. Plesset, “On the stability of gas bubbles in liquid-gas
solutions,” J. Chem. Phys. 18(11), 1505–1509 (1950).
11. A. H. Harvey, S. G. Kaplan, and J. H. Burnett, “Effect of dissolved air on the
density and refractive index of water,” Int. J. Thermophys. 26(5), 1495–1514
(2005).
12. A. Suzuki, “Immersion lithography update,” Sematech 2nd Immersion Lithog.
Work. (July, 2003).
13. A. C. Tobey, “Wafer stepper steps up yield and resolution in IC lithography,”
Electronics, 109–112 (August 16, 1979).
43. K. R. Chen, W. Huang, W. Li, and P. R. Varanasi, “Resist freezing process for
double exposure lithography,” Proc. SPIE 6923, 69230G (2008).
44. J. A. Hoffnagle, W. D. Hinsberg, M. Sanchez, and F. A. Houle, “Liquid
immersion deep-ultraviolet interferometric lithography,” J. Vac. Sci. Technol.
B 17(6), 3306–3309 (1999).
45. B. W. Smith, Y. Fan, M. Slocum, and L. Zavyalova, “25 nm immersion
lithography at a 193 nm wavelength,” Proc. SPIE 5754, 141–147 (2004).
46. X. Chen and S. R. J. Brueck, “Imaging interferometric lithography–a
wavelength division multiplex approach to extending optical lithography,” J.
Vac. Sci. Technol. B 16, 3392–3397 (1998).
47. X. Chen and S. R. J. Brueck, “Imaging interferometric lithography for arbitrary
systems,” Proc. SPIE 3331, 214–224 (1998).
Chapter 11
Lithography Costs
11.1 Cost-of-Ownership
The high price tags of exposure tools have made the cost of lithography a concern
since the advent of projection lithography, and lithography costs may ultimately
limit patterning capability, more so than technical feasibility. While there will
always be a market for electronics where price is secondary to performance, the
large personal computer and portable phone markets have proven to be extremely
elastic. To meet the demands of the consumer, lithography will need to be cost
effective, in addition to providing technical capability. Lithography costs have
several components. These include:
Figure 11.1 Stepper prices over time (in U.S. dollars), collected by SEMATECH.2 Note the
logarithmic scale.
Figure 11.2 Stepper prices over time (in U.S. dollars), with costs from the newest tools
added to those included in Fig. 11.1. The data are also plotted on a linear scale. The solid
line is an exponential least-squares fit to the SEMATECH data. The prices for the EUV tools
(EUV lithography is the subject of the next chapter) are from Wüest, Hazelton, and Hughes.3
Table 11.1 Comparison of capital costs and capability for the GCA DSW4800 and a
modern step-and-scan system. The bit throughput assumes that the bit size is proportional
to the square of the resolution, which changed from 1.25 µm for the GCA DSW4800 to 40 nm
for modern 193-nm immersion systems. This does not take into account any nonlithographic
innovations that have enabled reductions in bit size.
GCA DSW4800      Modern ArF immersion step-and-scan
Figure 11.3 Lithography costs per unit area exposed per hour. The solid line is the average
for the dry tools.
per bit has plummeted. In terms of the value to the consumer of microelectronics,
modern wafer steppers are extremely cost effective. However, for producers of
chips, the very high capital costs require enormous investments. This cost has
certainly contributed significantly to the transformation of the semiconductor
industry from being comprised of a large number of small chip-making companies
to only a few large producers and an assortment of companies that contract others to
do their manufacturing. Since most cost-of-ownership analyses are oriented toward
a time period in which the wafer size is constant, the most common type of cost-of-
ownership analysis performed is the cost per wafer, which is the metric discussed
in this chapter. Extension to area or bit cost analyses can be obtained by scaling the
wafer cost of ownership appropriately.
As noted, the cost of lithography is strongly affected by equipment throughput.
The basic capital cost per wafer exposed is
cost/wafer = CED/(Tp·U),  (11.1)

where CED is the capital depreciation per hour, Tp is the raw throughput of the
system (in wafers per hour), and U is the fractional equipment utilization. Raw
throughput is the number of wafers per hour that can be aligned and exposed by
a stepper operating in the steady state. If lithography equipment is configured
to have exposure tools interfaced with resist-processing equipment, then CED
should be the capital depreciation per hour of the total photocluster, and U is the
overall photocluster utilization. Tp is the photocluster throughput, which may be
determined by the throughput capability of the resist-processing equipment, rather
than by the exposure tool.
Early efforts to improve lithography productivity centered on stepper
throughput. The basic model for the raw throughput of step-and-repeat systems,
in wafers per hour, is
throughput = 3600/[tOH + N(texp + tstep)],  (11.2)
where N is the number of exposure fields per wafer, texp is the exposure time per
field, tstep is the amount of time required to step between fields, including the
time for vibrations to settle out (“step-and-settle time”) and the time required for
focusing. Alignment time for a stepper using die-by-die alignment would also be
included in tstep . The overhead time per wafer tOH is the time required to remove
a wafer from the chuck, place a new wafer onto the chuck, and align the wafer.
All times in the right-hand side of Eq. (11.2) are measured in seconds, and for this
equation it is assumed that only a single reticle is used per wafer.
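Equations (11.2) and (11.3) can be combined in a short calculation using the values given later in Table 11.2 for a contemporary i-line stepper:

```python
# Eq. (11.3): t_exp = S/I (a dose in mJ/cm2 over an intensity in mW/cm2
# gives seconds), fed into the raw-throughput model of Eq. (11.2).
def exposure_time_s(dose_mj_cm2, intensity_mw_cm2):
    return dose_mj_cm2 / intensity_mw_cm2

def stepper_throughput_wph(t_oh, n_fields, t_exp, t_step):
    """Eq. (11.2): raw throughput in wafers per hour (all times in seconds)."""
    return 3600.0 / (t_oh + n_fields * (t_exp + t_step))

# Table 11.2 values: N = 76, tOH = 9 s, I = 4000 mW/cm2, S = 200 mJ/cm2:
t_exp = exposure_time_s(200.0, 4000.0)           # 0.05 s
print(round(stepper_throughput_wph(9.0, 76, t_exp, 0.07)))  # ~199 wafers/hour
```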
The GCA4800 had a 10× lens with a 10-mm × 10-mm field size. With such
a small field, many exposures [N in Eq. (11.2)] were required per wafer. One of
the first steps taken to improve productivity was to decrease the lens reduction
to 5×. Field sizes (on the wafer) were increased to diameters > 20 mm, enabling
14-mm × 14-mm and 15-mm × 15-mm fields. Over time, field diameters increased
to over 31 mm, capable of supporting 22-mm × 22-mm fields. These bigger fields
provided higher throughput and could support large chip sizes. The migration from
10-mm × 10-mm fields to 22-mm × 22-mm fields led to a decrease in N, and
greatly increased stepper throughput. It should be noted that the reduction in N
does not scale exactly to the ratio of exposure-field areas, because partial fields
are often exposed at the edges of wafers, but the transition from 10× to 5× lens
reduction reduced N by approximately a factor of 4. With step-and-scan, even
larger exposure fields have become available, and tools with exposure fields as
large as 26 mm × 33 mm are typical. Nikon and Canon have introduced step-and-
repeat systems that match this field size. These tools will be discussed further in
Section 11.2.
Decreases in N are beneficial, to the extent that they are not offset by increases in
texp and tstep . If the total light energy available to expose a field remained constant,
then texp would need to increase in direct proportion to any increase in field size,
because the fixed amount of light would be spread over a larger area, and Ntexp
would remain fairly constant, assuming that N roughly scales inversely with field
area. Because of edge fields, Ntexp could actually increase without an improvement
in total illumination flux, and the only benefit to larger fields would come from the
reduction in the Ntstep term. Fortunately, another important way in which stepper
productivity has improved has been in terms of light flux. If I is the light intensity,
and S is the resist sensitivity (exposure dose required to achieve the desired resist
dimension on the wafer), then
texp = S/I.  (11.3)
Early wafer steppers provided <100 mW/cm2 of light to the wafer plane,
while intensities for i-line exposure tools that exceed 4000 mW/cm2 are common
today. These increases in light intensity result in proportional reductions in texp .
Photoresist sensitivity has also improved over time, particularly with the advent
of chemically amplified resists. DUV systems, where chemically amplified resists
are used, can have exposure intensities <1000 mW/cm2 while still maintaining
practical productivity.4 Today, fields are often exposed in <0.1 sec. There are
practical limits to how short the exposure time can be, since there will be dose-
control problems associated with exceedingly short exposure times on step-and-
repeat systems, and rapid exposures on step-and-scan systems are limited by the
fastest controllable stage-scanning speed. Scanning speeds are also limited by the
combination of pulse-to-pulse energy stability and excimer-laser-pulse frequency,
as discussed in Chapter 5.
Throughput improvement from large field sizes (smaller N) is somewhat offset
by the need to step or scan longer distances. Increases in field sizes require longer
stepping distances, so decreases in N are somewhat offset by increases in tstep .
However, there have been significant improvements in tstep as a consequence
of improvements in wafer-stage technology. Stages must be capable of high
velocity, acceleration, deceleration, and jerk (third time derivative of position).
Prior to exposure, the stage position must be returned to a stable position, and the
wafer must also be properly focused and leveled. Advanced mechanical modeling
techniques have been used to optimize stage designs, and state-of-the-art control
techniques and electronics are used. It should be noted that tstep in Eq. (11.2) is an
average step-and-settle time, and actual times will differ in the x and y directions,
particularly for nonsquare fields. Typical values for the parameters used in Eq.
(11.2), for contemporary i-line wafer steppers, are given in Table 11.2.
The overhead time tOH includes wafer transport times and the time for global
alignment. There is a clear tradeoff between alignment precision, enhanced by a
large number of alignment sites, and throughput. If multiple reticles are used per
wafer, then the time to exchange reticles can be included in tOH (see Problem 11.2).
Step-and-scan systems have somewhat modified throughput models. The basic
throughput equation, Eq. (11.2), is applicable in modified form. For step-and-scan
systems, exposure times are given by
texp = (HF + H)/v,  (11.4)
where HF is the height of the scanned field, H is the slit height, and v is the scanning
speed (Fig. 11.4). All sizes and speeds must consistently refer to either the wafer or
reticle. There needs to be a certain amount of scanning startup time, so the stages
are moving at a constant controlled speed when exposure actually occurs, and this
time can be included in tstep . To incorporate resist sensitivity into the model, the
exposure dose S needs to be calculated. If I(y) is the intensity at position y in the slit,
then the exposure dose in a scanner is given by
S = (1/v) ∫₀ᴴ I(y) dy,  (11.5)

where H is the slit height. In terms of the average intensity Ī,

S = ĪH/v.  (11.6)
Table 11.2 Typical values of the parameters in Eq. (11.2) for a contemporary i-line
wafer stepper.

N                             76
tOH (single reticle/wafer)    9 sec
I                             4000 mW/cm2
S                             200 mJ/cm2
texp                          0.05 sec
tstep                         0.07 sec
throughput                    200 wafers/hour
Comparing this to Eq. (11.3), to achieve the same exposure time for a given
exposure dose, the light intensity must be greater for a scanner, compared to a
step-and-repeat system, by the factor
(HF + H)/H.  (11.8)
For a typical field height HF = 25 mm and slit heights of 5–10 mm, the exposure
intensity must be greater by a factor of 3.5–6 in order for the scanner to have the
same exposure times as an equivalent step-and-repeat system. However, the light
for a scanning exposure tool is being spread over an area that is smaller than on a
step-and-repeat system with the same field size by the factor
H/HF.  (11.9)
Illuminating a smaller area is not difficult with laser-light sources that produce
collimated light, but this is a challenge for arc-lamp systems that must gather light
emitted in many different directions.
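Equations (11.4), (11.8), and (11.9) can be checked numerically; HF = 25 mm and the 5–10-mm slit heights are the values quoted above, while the 500-mm/s scan speed is a hypothetical number for illustration:

```python
# Eq. (11.4): time to scan one field of height HF with slit height H,
# and Eq. (11.8): extra intensity a scanner needs to match an
# equal-field step-and-repeat system's exposure time.
def scan_exposure_time_s(h_field_mm, h_slit_mm, v_mm_per_s):
    return (h_field_mm + h_slit_mm) / v_mm_per_s

def intensity_factor(h_field_mm, h_slit_mm):
    return (h_field_mm + h_slit_mm) / h_slit_mm

print(intensity_factor(25.0, 10.0))             # 3.5
print(intensity_factor(25.0, 5.0))              # 6.0
print(scan_exposure_time_s(25.0, 10.0, 500.0))  # 0.07 s (assumed v = 500 mm/s)
```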
Improvement in throughput with taller fields is not what is predicted by changing
N in Eq. (11.2), because longer scans are needed. For example, suppose the number
of fields is cut in half by a 2× increase in scanning height:
N → N/2,  (11.10)

HF → 2HF.  (11.11)
tOH + N(tstep + texp) = tOH + N[tstep + (HF + H)/v],  (11.12)

= tOH + N·tstep + N·HF/v + N·H/v.  (11.13)

→ tOH + (N/2)·t′step + N·HF/v + (N/2)·H/v.  (11.14)

The step time is modified as tstep → t′step, indicating slightly longer stepping times
with the longer fields. While two terms are cut in half (or nearly half) by the
reduction in exposure fields, there is one term, N(HF /v), that remains proportional
to N. Consequently, the benefit of fewer fields is less than proportional to the
reduction in the number of fields.
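Equations (11.12)–(11.14) can be illustrated numerically. All numbers below are hypothetical (N = 76, HF = 25 mm, H = 8 mm, v = 500 mm/s), and t′step is taken equal to tstep for simplicity, although the text notes it is slightly longer:

```python
# Eq. (11.12): time per wafer for a scanner, all times in seconds.
# Doubling HF and halving N does not halve the wafer time, because the
# N*(HF/v) scan term is unchanged.
def wafer_time_s(t_oh, n, t_step, h_field, h_slit, v):
    return t_oh + n * (t_step + (h_field + h_slit) / v)

base = wafer_time_s(9.0, 76, 0.07, 25.0, 8.0, 500.0)   # baseline fields
tall = wafer_time_s(9.0, 38, 0.07, 50.0, 8.0, 500.0)   # N/2, 2*HF
print(round(base, 2), round(tall, 2), round(base / tall, 2))  # speedup < 2x
```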
Equipment downtime and setup time detract from the equipment output, and
these are included in the utilization factor in Eq. (11.1). A standard set of
states has been defined to facilitate the analysis of semiconductor manufacturing
equipment productivity (Fig. 11.5).5 Productivity is enhanced when the amount
of time that the equipment is in the productive state is maximized. The most
obvious loss of productivity occurs when tools are in a nonfunctional state.
This is equipment downtime. Typical downtime for lithography tools is 5–10%.
Unscheduled downtime occurs when equipment breaks unexpectedly and requires
repair. Scheduled downtime occurs for lamp changes, excimer-laser-window
replacements, the cleaning of resist-processing equipment, and similar types of
equipment maintenance that are necessary and can be planned. Engineering time
is the time the equipment is used for process and equipment engineering. To
the extent that process engineering is an objective of the wafer fab, this can be
considered useful time. It is not valuable time when process engineering is using
the tool to analyze a tool-related problem.
Standby time is often a significant detractor from productivity. It includes the
time that no operators are available, the time waiting for the results of production
tests, and the time during which there is no product available. When test wafers
are needed to set up individual lots, standby time can be high. It should be noted
that optimum fab operation will necessarily result in some standby time when there
is no product available. This can be understood as follows. Because the times are
variable when wafers are being processed outside of lithography, the work available
for lithography will also vary. Consider a lithography sector with only a single
photocluster. From queuing theory, the number of lots waiting for processing Nq is
related to the process time t and average rate R at which lots enter the lithography
where ⟨· · ·⟩ denotes the time average of the quantity in the brackets. From the same
theory, the fraction of time F0 when there are no wafers in process or queue is
given by

F0 = 1 − ⟨t⟩R.  (11.16)
Equations (11.15) and (11.16) can be solved. The result is plotted in
Fig. 11.6. Similar but more complex equations apply for situations with multiple
photoclusters, but the basic conclusion remains the same: the queues in front of
the lithography operations grow very large unless some time is planned when no
wafers are available. With multiple photoclusters, the size of the queue is reduced
for a given fraction of time with zero queue. To maximize productivity, this time
with no wafers needs to be kept as small as possible, but the optimum value
is nonzero. This conclusion is unappealing to managers who want to avoid any
Figure 11.6 The number of lots increases as the percentage of time with no queue goes
to zero for a single photocluster.
idle time for their expensive lithography equipment, yet also want to maintain
good cycle times, which require small queues. There have been a number of fab
managers who have tried to avoid what has been shown mathematically to be
inescapable, to their eventual regret and failure.
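The tradeoff can be sketched using Eq. (11.16) together with the mean queue length of a standard M/M/1 queue; the M/M/1 form is an assumption made here for illustration (the book cites a queuing-theory text for the full treatment), and the rates are hypothetical:

```python
# F0 = 1 - <t>R from Eq. (11.16), plus the textbook M/M/1 mean queue
# length rho^2/(1 - rho) with rho = <t>R (an assumed model here).
def queue_stats(mean_process_time_h, arrival_rate_per_h):
    rho = mean_process_time_h * arrival_rate_per_h  # utilization <t>R
    f0 = 1.0 - rho                                  # idle fraction, Eq. (11.16)
    lq = rho**2 / (1.0 - rho)                       # M/M/1 mean queue length
    return f0, lq

# One-hour mean lot process time, increasing arrival rates:
for rate in (0.8, 0.9, 0.95):
    f0, lq = queue_stats(1.0, rate)
    print(round(f0, 2), round(lq, 1))
# As the idle fraction F0 shrinks toward zero, the queue grows without
# bound, so some planned standby time is unavoidable.
```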
The utilization factor U in Eq. (11.1) is the sum of the fraction of productive
time and engineering times in the equipment-state diagram of Fig. 11.5. Part of the
uptime is consumed because of the practical necessity of having a finite standby
time to avoid large queues, but unfortunately, this is usually only a small part of the
lost productive time. Another contributor is inadequate production organization,
and it is clear that some companies are more effective at managing this than others.
This is a management issue, not directly one of the science of lithography, and will
not be discussed further in this book.
To place lithography capital costs in perspective, consider a photocluster consisting
of a step-and-scan system costing $50M, interfaced to resist processing equipment
that costs $9M. If the equipment is depreciated over five years, then the straight
depreciation cost is $1347 per hour. If the raw throughput is 140 wafers per hour
and the utilization is 80%, then the capital contribution to the cost per wafer is $12.
All other costs can be considered in comparison to this number. In this analysis,
the cost of capital, which fluctuates with interest rates, was not included.
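The worked example above is Eq. (11.1) with straight depreciation; as a sketch:

```python
# Eq. (11.1): cost/wafer = CED/(Tp*U), with CED from straight
# depreciation of the combined $50M scanner + $9M track over 5 years.
def capital_cost_per_wafer(capital_usd, dep_years, throughput_wph, utilization):
    c_ed = capital_usd / (dep_years * 365 * 24)   # depreciation per hour
    return c_ed / (throughput_wph * utilization)

print(round(capital_cost_per_wafer(59e6, 5, 140, 0.80), 2))  # ~$12 per wafer
```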
11.1.2 Consumables
Photochemicals are needed on a per-wafer basis. The most expensive chemical is
photoresist, with typical costs shown in Table 11.3. Resist coaters use 1–2 cc per
wafer (300 mm), so resist costs per wafer, in manufacturing, can run between $0.10
and $2.00 per wafer, assuming the lower volume/wafer resist use. Developer costs
can add another $0.20 to $0.50 per wafer. If antireflection coatings are used, their
costs must also be included. If these antireflection coatings are not essential parts of
the device integration and are there solely to improve the lithography, their coating
Table 11.3 Typical photoresist costs.

g or i line    $100–$300
248 nm         $250–$500
193 nm         $300–$2000

Gas                      2 billion
Laser cavity             40 billion
Line-narrowing module    60 billion
for which market requirements dictate a low selling price will not be made in the
most advanced technologies, because of reticle costs. As discussed in Chapter 13,
considerations are also being given to maskless lithographic technologies for
products made in low volumes.
11.1.4 Rework
11.1.5 Metrology
(f/U) Σᵢ₌₁ᴹ CED,i/TP,i,  (11.18)
where the sum is over the different types of metrology tools. If we assume the
parameters of Table 11.5 and f = 0.02 (2% of the wafers are measured), U = 0.8,
and straight five-year depreciation, then the metrology cost per wafer is $1.36,
which is greater than 10% of the capital cost from the photocluster. While this
is a smaller cost than the photocluster capital cost or the mask cost, it is not
insignificant.
per photocluster (140 wafers per hour, 80% utilization), labor represents $0.18
per wafer. This is ∼1.5% of capital equipment costs. Direct labor costs are easily
offset by small differences in tool utilization, so a highly skilled workforce is a
competitive advantage. Improving productivity by only one or two percent can
completely compensate for direct labor costs.
$0.5B × (60/30,000) = $1M.  (11.19)
For an 80% utilized photocluster with raw throughput capability of 140 wafers per
hour, the cost per wafer of this floor space is $0.20, assuming five years straight
depreciation for the building. This is small compared to equipment capital costs
and mask costs, but not insignificant.
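Equation (11.19) and the per-wafer result can be reproduced in a short sketch; the 60 and 30,000 are taken here to be the photocluster's and fab's floor areas (their units are not stated in this excerpt):

```python
# Eq. (11.19): the photocluster's share of a $0.5B building, spread
# over the wafers it processes with five-year straight depreciation.
def floorspace_cost_per_wafer(building_usd, total_area, tool_area,
                              dep_years, throughput_wph, utilization):
    share = building_usd * tool_area / total_area    # Eq. (11.19): $1M
    per_hour = share / (dep_years * 365 * 24)
    return per_hour / (throughput_wph * utilization)

print(round(floorspace_cost_per_wafer(0.5e9, 30000, 60, 5, 140, 0.80), 2))  # ~$0.20
```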
Figure 11.7 Nonconcentric fields when the exposure tools used for patterning critical and
noncritical layers have different field sizes.
Problems
11.1 The total cost of consumables for an excimer-laser system has been estimated
to be ∼$300,000 per year and ∼$60,000 per year for a lamp-based system. For
an exposure system with a raw throughput capability of 140 wafers per hour
and utilization of 80%, what is the contribution of light-source consumables
to the per-wafer cost of ownership for excimer-laser light sources? And for
lamp sources?
11.2 Derive an extension of Eq. (11.2) for a stepper appropriate for exposing two
reticles per wafer. Assume that the wafer remains on the exposure chuck
while the reticles are exchanged.
References
1. W. Trybula and D. Dance, “Cost of mask fabrication,” Proc. SPIE 3048,
211–215 (1997).
2. M. Mason, Sematech Next Generation Lithography Workshop (1998).
3. A. Wüest, A. J. Hazelton, and G. Hughes, “Estimation of cost comparison
of lithography technologies at the 22-nm half-pitch node,” Proc. SPIE 7271,
72710Y (2009).
4. N. Deguchi and S. Uzawa, “150-nm generation lithography equipment,” Proc.
SPIE 3679, 464–472 (1999).
5. SEMI E10–96 Standard, “Guideline for definition and measurement of equip-
ment reliability, availability, and maintainability (RAM),” Semiconductor
Equipment and Materials International, San Jose, CA.
6. A. O. Allen, Probability, Statistics, and Queuing Theory: With Computer
Science Applications, Academic Press, Harcourt Brace Jovanovich, Boston
(1990).
7. R. DeJule, “More productivity at subcritical layers,” Semicond. Int., 50
(April 1998).
8. H. Tsushima, M. Yoshino, T. Ohta, T. Kumazaki, H. Watanabe, S. Matsumoto,
H. Nakarai, H. Umeda, Y. Kawasuji, T. Suzuki, S. Tanaka, A. Kurosu, T.
Matsunaga, J. Fujimoto, and H. Mizoguchi, “Reliability report of high power
injection lock laser light source for double exposure and double patterning ArF
immersion lithography,” Proc. SPIE 7274, 72743L (2009).
9. K. O’Brien, W. J. Dunstan, R. Jacques, and D. Brown, “Lithography line
productivity impact using Cymer GLXTM technology,” Proc. SPIE 7274,
72743N (2009).
10. K. Early and W. H. Arnold, “Cost of ownership for soft x-ray lithography,”
OSA Topical Meeting on EUV Lithography, Monterey, CA (May, 1993).
11. K. Early and W. H. Arnold, “Cost of ownership for 1× proximity x-ray
lithography,” Proc. SPIE 2087, 340–349 (1993).
12. D. Mansur, “Components of IC design cost relative to the market opportunity,”
Adv. Reticle Symp., San Jose (2003).
13. M. E. Preil, T. Manchester, and A. Minviell, “Minimization of total overlay
errors when matching nonconcentric exposure fields,” Proc. SPIE 2197,
753–769 (1994).
Chapter 12
Extreme Ultraviolet Lithography
Since the resolution capability of lithography can be extended by using short-
wavelength light, at least in principle, a number of concepts involving light
with wavelengths much shorter than 193 nm have been proposed. Considerable
effort has been applied to the development of one of these approaches, referred
to as extreme ultraviolet (EUV) lithography. In this chapter, the basic concepts
underlying EUV technology are discussed.
For a multilayer stack, peak reflectance occurs when a modified Bragg condition is satisfied:

d = mλ / [2 cos θ √(1 − (2δ̄ − δ̄²)/cos²θ)],   m = 1, 2, . . . ,   (12.1)
Figure 12.1 Attenuation length for selected materials at short wavelengths.5,6 The
attenuation length is the distance at which the intensity of light propagating through a
material falls to 1/e of its initial value.
Figure 12.2 Reflectivity of selected materials at short wavelengths for normal incidence.5,6
where

δ̄ = (δA dA + δB dB) / (dA + dB).   (12.2)
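Equations (12.1) and (12.2) can be exercised numerically. In the sketch below, the limiting case δ̄ → 0 reproduces the ∼6.8-nm Mo/Si bilayer period of Problem 12.3; the arguments passed to these functions are illustrative, not measured optical constants:

```python
import math

def bilayer_period(wavelength_nm, theta_deg, delta_bar, m=1):
    """Modified Bragg condition, Eq. (12.1); theta measured from normal."""
    theta = math.radians(theta_deg)
    refraction = math.sqrt(1 - (2 * delta_bar - delta_bar**2) / math.cos(theta)**2)
    return m * wavelength_nm / (2 * math.cos(theta) * refraction)

def average_delta(delta_a, d_a, delta_b, d_b):
    """Thickness-weighted average of the two materials' deltas, Eq. (12.2)."""
    return (delta_a * d_a + delta_b * d_b) / (d_a + d_b)

# With delta_bar -> 0, Eq. (12.1) reduces to d = m*lambda/(2 cos theta):
print(bilayer_period(13.5, 6.0, 0.0))          # ~6.79 nm per pair
# Inverting d = lambda/(2 cos theta): a 0.1-nm (1-A) change in pair
# thickness shifts the peak wavelength by ~0.2 nm (compare Problem 12.3).
print(2 * math.cos(math.radians(6.0)) * 0.1)   # ~0.199 nm
```

This sensitivity is why the peak reflectances of all the mirrors in a system must be matched so carefully.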
Figure 12.3 Illustration of the way in which high total reflectivity is achieved from reflections
from multiple interfaces.
Figure 12.4 Reflectance versus wavelength for a multilayer reflector. The measurements
were made at the Advanced Light Source (synchrotron) at Lawrence Berkeley Laboratory
on a substrate that was made into an EUV mask at Advanced Micro Devices.
wavelength, in this case ∼13.4 nm. From Eq. (12.1), it is apparent that only a very
small change in the thickness of layer films will cause a shift in peak wavelength
(see Problem 12.3). Even if every mirror in the EUV system has very high peak
reflectance, the overall system transmission can be low if these peak reflectances
do not occur at nearly the same wavelength. A similar statement is true regarding
the need to match the reflectance properties of the masks to those of the projection-
optics mirrors.12 When specifying the “operating” wavelength, it is important to
pay attention to detail. As can be seen in Fig. 12.4, the curve of reflectance versus
wavelength is asymmetric. It is standard to specify the operating wavelength as
the center of the full-width half-maximum range (Fig. 12.5).13 Because of the
asymmetry, this median wavelength typically differs from the peak wavelength.
As discussed in Chapter 5, optical lithography has been practiced where
there are light sources that satisfy certain key requirements, particularly narrow
bandwidth and high intensity. For each optical lithography technology the optics
and masks have been engineered around the wavelength where such a source exists.
For EUV lithography there are few options for masks and optics. EUV lithography
must necessarily be practiced at wavelengths where there are multilayer reflectors
with high reflectance. For EUV lithography it is the light source that needs to
be engineered around the wavelength chosen on the basis of multilayer reflector
capability, rather than the other way around. Because EUV lithography is practiced
at wavelengths where high-intensity light sources were not already established,
reduced productivity due to low light intensity at the wafer is a concern with EUV
lithographic technology, and considerable attention must be paid to maximizing
multilayer reflectance and source-output power. More will be said in Section 12.4
about EUV light sources.
For EUV lithography, it appears that the semiconductor industry has settled
on an operating wavelength of 13.5 nm. This choice was dictated in part by
the selection of MoSi multilayers, which limited the range for the operating
wavelength. The operating wavelength was further pinned down by matching
this wavelength to the peak outputs of candidate light sources at the time the
wavelength was selected.14
Figure 12.5 Curve showing the definition of median wavelength (λm ) and how it can differ
from the peak wavelength (λ p ).13
Figure 12.6 Schematic of an EUV exposure system.19 Actual lenses may contain a
different number of mirrors.
vacuum, since EUV light will not propagate in air. (The transmission of 13.5-nm
light through 0.1 mm of air at atmospheric pressure is only ∼7%.) Moreover, the
vacuum must be very good, because photon-induced carbon deposition on mirror
and mask surfaces or surface oxidation can result from the presence of very low
levels of hydrocarbons in the system.20,21 For example, if every mirror in a six-
mirror lens system becomes coated with 1-nm-thick carbon layers, the transmission
of the lens will be degraded by over 7% (see Problem 12.4). This imposes the
requirement that all components within the vacuum chamber be constructed of
ultrahigh vacuum-compatible materials. It also means that capability must be
designed into exposure tools to prevent optics from becoming contaminated by
the inevitable outgassing from resists. One idea for preventing the optics from
becoming contaminated is to use the flow of an inert gas at a low partial pressure,
between the optics and wafer (Fig. 12.7).22,23 Contaminants will be carried along
by the flow of inert gas, greatly reducing optics contamination. Some small amount
of EUV light will be absorbed by the gas, estimated to be <3% for an optimized
flow, because good protection is possible even with a low partial pressure of argon.
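The carbon-contamination estimate above can be checked with the attenuation length given in Problem 12.4 (155 nm for carbon at 13.5 nm); a short sketch:

```python
import math

def lens_transmission_with_carbon(n_mirrors=6, carbon_nm=1.0,
                                  attenuation_length_nm=155.0):
    # Light crosses each mirror's carbon coating twice (in and out on reflection).
    total_path_nm = 2 * carbon_nm * n_mirrors
    return math.exp(-total_path_nm / attenuation_length_nm)

print(lens_transmission_with_carbon())  # ~0.926, i.e., over 7% transmission loss
```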
The requirement for good vacuum affects other parts of EUV scanners. While it
is possible to construct air bearings for the stages that can be used in a vacuum,24–26
magnetic bearings have obvious advantages for EUV applications.27–29 Vacuum
load locks are also required for moving wafers and reticles in and out of scanners
efficiently, increasing system mechanical complexity, with possible consequences
for reduced reliability and increased cost.
Another interesting consequence of exposing wafers in a vacuum is the need
to change the method for chucking wafers and masks. In optical scanners,
vacuum clamping is used nearly universally to hold wafers and reticles onto
their chucks. The force for vacuum clamping is actually provided by atmospheric
pressure, so some other approach for holding wafers and masks is needed in EUV
Figure 12.7 Illustration of a gas curtain. Outgassing material is swept along with the gas
flow, thereby reducing contamination of the optics.
exposure tools. Electrostatic chucks have been used extensively in other types
of semiconductor equipment where the processes occur in a vacuum, and this
appears to be the approach taken for EUV systems as well. Electrostatic chucks
are considered further in the next section.
As discussed in Chapter 5, very small levels of optical absorption can cause
lenses to heat, leading to exposure-dependent variations in focus and overlay. The
multilayer coatings on the mirrors that comprise EUV lenses are ∼70% reflective,
at best, and most of the light energy that is not reflected is absorbed. This can lead to
substantial heating of lens elements, and very good active compensation is needed
to maintain good focus and overlay control. Masks will also absorb an appreciable
amount of light energy, which could lead to expansion and contraction of masks
with attendant overlay errors.30 For this reason, EUV masks are fabricated from
ultralow expansion materials, with coefficients of thermal expansion measured in
parts-per-billion.13
In the excimer lasers used for lithographic applications, the DUV light is
produced by the plasmas of very corrosive gases. Such plasmas do not harm any
optics because the plasmas are contained, with the light being emitted from a
sealed chamber through transparent windows, typically made of CaF2. Such
windows do degrade over time and need to be replaced periodically, but they are
inexpensive relative to optical components.
The situation is completely different for EUV. Because there are no sufficiently
transparent materials at EUV wavelengths, there can be no windows to segregate
gases and plasmas from the rest of the optical system. The same line-of-sight paths
by which EUV photons travel from the point at which they are generated to the
mirrors of the collection optics can also be pathways for contamination and debris.
Since such contamination can potentially reduce the reflectivity of such mirrors
substantially, the illumination systems of exposure tools must be designed with
appropriate mitigation schemes.
Contamination can affect EUV light sensors that depend on the absorption of
light to produce some type of electrical signal. This is because the photoelectrical
properties of all solids are very sensitive to surface conditions. As described
in Chapter 5, steppers use light sensors in real time to control exposure dose.
Contamination of EUV light sensors will reduce the accuracy to which exposure
doses can be controlled in EUV exposure tools.
Recently, manufacturers of lithographic equipment, ASML and Nikon, have
built 0.25-NA full-field exposure tools.31,32 Using such tools, confidence that EUV
lithography will be viable for use in high-volume manufacturing was increased by
the fabrication of electrically functional SRAMs and other complex circuits,33,34
because all parts of EUV technology needed to work in order to produce these
devices. The remainder of this chapter will cover particular key elements of EUV
lithographic technology in more detail.
Figure 12.8 Micrographs of an early EUV mask fabricated at Advanced Micro Devices.
The TiN absorber sits on a SiO2 buffer layer that has been coated directly onto a MoSi
multilayer. The left micrograph is a tilt SEM, while the right picture is a TEM, showing the
multilayer reflector.
Figure 12.9 Lines that run perpendicular to the plane of incidence of the nonnormally
incident light are shadowed. For NA ≤ 0.35, φ = 6 deg and will be greater for larger NA.
While all materials absorb EUV light, some are more absorbing than others,
and this influences the choice of materials for EUV masks. TaN41 and TaBN42
are examples of materials with good absorption, along with other desirable
characteristics, such as ease of cleaning.
Because EUV masks are ∼30% absorbing even in the reflecting areas, these
masks will heat up during exposure. Such heating and subsequent cooling will
lead to overlay errors due to thermal expansion and contraction of the mask, so
EUV masks should be fabricated on substrates with extremely low coefficients
of thermal expansion (<30 ppb/K). Glass materials with such low coefficients
of thermal expansion are available and have been characterized for use as EUV
mask substrates.43,44 Examples of such glass materials are ULE® from Corning,
Inc. and Zerodur® from Schott. These and other EUV mask requirements are
detailed in SEMI Standards P37 (specification for extreme ultraviolet lithography
mask substrates), P38 (specification for absorbing film stacks and multilayers on
extreme ultraviolet lithography mask blanks), and P40 (specification for mounting
requirements and alignment reference locations for extreme ultraviolet lithography
masks).
The use of reflective masks also imposes requirements on mask flatness. As can
be seen in Fig. 12.10, mask nonflatness will result in image-placement errors. For
consistency with the overlay requirements of the 32-nm node, EUV masks must
be flat to within 36 nm (peak to valley) (see Problem 12.5).45,46 Flatness of this
magnitude must be maintained while the masks are in use, which means that EUV
mask chucks must also be very flat, and great care must be taken to avoid particles
on the back side of the masks that can lead to front-side mask nonflatness after
chucking (see Fig. 12.11).47
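The flatness number can be connected to overlay directly (compare Fig. 12.10 and Problem 12.5): the lateral displacement is h tan θ at the mask, reduced by the projection optics. A sketch:

```python
import math

def image_placement_error_nm(nonflatness_nm=36.0, theta_deg=6.0, reduction=4.0):
    """Wafer-level placement error from mask nonflatness h (Fig. 12.10)."""
    return nonflatness_nm * math.tan(math.radians(theta_deg)) / reduction

print(image_placement_error_nm())  # ~0.95 nm, i.e., about 1 nm at the wafer
```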
The use of electrostatic chucks in EUV exposure systems imposes requirements
on the backsides of EUV masks. In a basic bipolar Coulomb electrostatic chuck,
electrodes within the chuck are charged. When the backside of a substrate is
electrically conducting, charges can move freely and lead to electrostatic attraction
between the chuck and the substrate (Fig. 12.12). Thus, there is a requirement in
SEMI Standard P38 that the backsides of EUV masks be electrically conducting.
There is very limited capability for repairing a multilayer-film stack.49
Consequently, blanks need to be made essentially defect free. This is a significant
Figure 12.10 Mask nonflatness leads to image-placement errors. The lateral displacement
h × tan θ is reduced on the wafer by the reduction of the projection optics.
Figure 12.11 Mask deformation due to a particle between the mask substrate and the
chuck.
challenge for EUV lithography. With wafer feature sizes of 32 nm, and 4×
reduction lenses, mask defects need to be much less than 4 × 32 nm = 128 nm.
There is also the potential for phase defects illustrated in Fig. 12.13. Steps can
result from particles or scratches on the substrate surface on which the multilayer
film is deposited, and these steps can form phase defects. The extent to which
such substrate defects propagate through the multilayer is dependent upon the
deposition method, and there are initial data indicating that substrate defects can
be smoothed over with ion-beam deposition techniques.50 Phase defects can be
created by extremely small particles, which may be smaller than can be detected
with optical (visible or DUV) light. Indeed, defects have been detected using
actinic inspection tools that could not be found with conventional visible-light
defect inspection tools.51 Consequently, it may be necessary to inspect masks with
tools that use EUV light to provide sensitivity to phase defects. This will further
add to the cost and complexity of implementing EUV lithography.51,52 The mask
is one of the biggest challenges for EUV lithography, even with the advantage of
reduction lenses.
The absence of highly transparent materials at EUV wavelengths implies that
conventional pellicles cannot be used. The transparency requirement for EUV
lithography is high because light needs to pass twice through a pellicle in the
reflection geometries currently implemented. This lack of a pellicle is another
problem that needs to be addressed in the deployment of EUV lithography.
Figure 12.14 Schematic of a laser-produced plasma (LPP) EUV light source. The collector
optics, which may be more complex than illustrated in this figure, focus the light onto the
intermediate focus of the condenser optics.
charged xenon ions to the condenser optics. Given a need to address condenser
optics damage even with the use of a noble gas, fuel materials that provide higher
EUV light output than Xe have come into use. Lithium was considered,58,59 but the
dominant fuel in use today is tin, primarily due to its potential for high conversion
efficiency to EUV light at the wavelengths where Mo/Si multilayers provide good
reflectivity.60
LPP EUV sources were among the first to be considered, and such a source,
using Nd-YAG lasers, was used on the first full-field EUV exposure tool.61 Early
cost estimates for LPP sources indicated that they would be too expensive for
practical use, and discharge sources were subsequently pursued.62–65 A conceptual
drawing of one such discharge source is shown in Fig. 12.15, and a picture of
the source in operation is shown in Fig. 12.16 (see Color Plates). Discharge
sources involve plasmas generated between electrodes, which can be a significant
engineering constraint that limits the ability to cool the electrodes and collect EUV
light.
As discharge-produced plasma (DPP) sources came into use on exposure tools,32 lithographers became very
conscious of the low intensities that were being achieved, and LPP sources were
reconsidered. Sources have since been developed using CO2 lasers that appear to
have lower cost of ownership than had been projected earlier for LPP sources that
employed Nd-YAG lasers. A picture of an LPP source that uses CO2 lasers is shown
in Fig. 12.17.
There is a thermal challenge for EUV light sources. In addition to EUV light,
EUV sources generate even more light at longer wavelengths, including infrared
radiation, i.e., heat. The intrinsically compact arrangement of components that
produce the plasmas in discharge sources makes it difficult to remove heat. One
clever idea is to fabricate the electrodes from the same liquid tin used to fuel the
plasma.66 This liquid tin can continuously be resupplied, addressing the problem
of electrode erosion in the harsh plasma environment. However, even with such
ingenuity, the output of discharge sources has remained too low for use in high-
volume manufacturing. The ultimate power output of discharge sources may be
inherently limited, in part by the difficulty of dissipating the heat that is generated.
Figure 12.18 Calculated reflectivity of normally incident light from a 40-pair Mo/Si
multilayer film.68
Figure 12.19 Example of a system for trapping debris from the plasma used to produce
EUV light.78
slope), modeling studies show significant reductions in process windows for EUV
lithography with levels of flare of 10% or larger (see Problem 12.6).80
The reason for the greater level of flare in EUV lenses is the short wavelength
involved and the use of mirrors. Consider a surface that produces an rms level
of phase variation rms_phase in light reflected from the surface. The total integrated
scattered (TIS) light from such a surface is given by81

TIS = 4π² (rms_phase / λ)².   (12.3)

For a given level of surface roughness, a mirror will produce greater phase error
than a refracting surface, since the light traverses the roughness twice upon
reflection but only once upon refraction. Because of the very short EUV
wavelength, compared to DUV light, the amount of scatter is much greater for
EUV optical systems for a given level of surface roughness. [Note the denominator
of Eq. (12.3).]
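Equation (12.3) makes the wavelength scaling concrete. In this sketch the 0.25-nm surface roughness is an illustrative assumption, and the refracting surface is crudely modeled as contributing the surface-height error once (ignoring the refractive-index factor):

```python
import math

def total_integrated_scatter(rms_phase_nm, wavelength_nm):
    """Eq. (12.3): TIS = 4 pi^2 (rms_phase / lambda)^2."""
    return 4 * math.pi**2 * (rms_phase_nm / wavelength_nm) ** 2

surface_rms_nm = 0.25  # assumed surface-height roughness, for illustration only
tis_euv_mirror = total_integrated_scatter(2 * surface_rms_nm, 13.5)  # double pass
tis_duv_refractor = total_integrated_scatter(surface_rms_nm, 193.0)  # single pass
print(tis_euv_mirror, tis_duv_refractor, tis_euv_mirror / tis_duv_refractor)
```

For the same polish, the EUV mirror in this rough comparison scatters several hundred times more light than the DUV refracting surface.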
Further complicating the polishing of mirror surfaces is the need to use aspheric
mirrors in order to minimize the number of mirrors in the lens. With less than 70%
reflectance per mirror, each pair of mirrors added reduces light intensity at the
wafer by over half. Fabrication of high-quality aspheric mirrors is relatively new
technology, although it has matured considerably.
To obtain the smoothest optical surfaces possible, particular care must be taken
in the polishing process. In addition, the multilayer deposition process can be tuned
to have the effect of smoothing roughness in the underlying substrate.82 Because of
the very stringent requirements for mirror-surface figure, the multilayer films must
be low stress so that they do not deform the carefully polished glass substrates.
Increasing the numerical aperture of all-reflective optics involves a number of
challenges. For example, consider the lens design shown in Fig. 12.20. Increasing
the numerical aperture necessitates an increase in maximum angles for rays of
light, but an increase in the size of certain lens elements will cause light rays to
get blocked, a problem known as self-vignetting. To solve this problem, larger off-
axis angles are required, and this makes it more difficult to maintain mechanical
stability. Also, it becomes more difficult to compensate for aberrations with larger
angles of incidence and reflection on the mirrors, necessitating the addition of
mirrors to maintain low levels of aberrations. As a general rule, the lenses for
EUV lithography systems have an even number of mirrors. This is necessary to
separate the reticle and wafer stages physically to opposite sides of the lens. This
is a practical consideration because of the large size of the stages, particularly for
systems that involve multiple wafer stages. Consequently, the minimum increment
in the number of mirrors in a lens is two. Since mirror reflectance is <70%, adding
a single pair of additional mirrors will reduce lens transmission of light by over
half. In order to maximize exposure-tool throughput, additional mirrors should be
added only when absolutely necessary.
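The penalty for adding mirrors follows directly from the per-mirror reflectance; a sketch, taking 70% as an upper bound:

```python
def lens_transmission(n_mirrors, reflectance=0.70):
    # Each reflection passes at most ~70% of the EUV light.
    return reflectance ** n_mirrors

t6 = lens_transmission(6)
t8 = lens_transmission(8)
print(t6, t8, t8 / t6)  # adding one mirror pair multiplies transmission by 0.49
```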
One interesting consequence of the use of all-reflective optics is the shape
of the slit in scanning-exposure tools. In KrF and ArF reduction scanners with
dioptric lenses, the slit is rectangular. In these situations the imaging is done
through the center of the lens, where aberrations are usually smallest. The slit
remains rectangular in shape when optical scanners have catadioptric lenses, but
its height is necessarily reduced due to the central obscuration found
in catadioptric lenses (Fig. 12.21). As mentioned in Section 5.9, aberrations can
often be minimized in a slit that is curved. By imaging in such a curved slit,
fewer optical elements are required to correct for aberrations. Since minimizing
the number of reflections to maximize throughput is an important consideration for
EUV lithography, EUV scanners typically have curved slits.84 Shown in Fig. 12.22
is a mask that was fabricated so it could be exposed in a static as well as a scanning
mode. Consequently, sections of the mask were arranged to follow the slit, and the
curvature is evident in Fig. 12.22.
As described in Section 12.3, off-axis illumination causes different horizontal
and vertical print biases. The ring-field configuration increases the complexity
of this problem, because the direction of the light incident on the mask changes
across the slit. While the angle of incidence remains constant, it acquires an
azimuthal component, as illustrated in Fig. 12.23. Consequently, the light may not
be strictly perpendicular or parallel to horizontal or vertical features on the mask.
To compensate for differences in the printing of horizontal and vertical features,
OPC needs to vary across the slit to achieve full accuracy.
Figure 12.21 Examples of slits for different types of lenses: (a) refractive lens, such as is
typically found in nonimmersion optical scanners; (b) catadioptric lens, such as is found in
immersion optical scanners or very high-NA nonimmersion scanners; and (c) ring field, as
is usually used in EUV exposure systems.
Figure 12.22 An EUV mask with features arranged to follow a curved slit to enable static
as well as scanning exposures.
Figure 12.23 The illumination is incident on the mask with an azimuthal angle ψ across
the slit.
To avoid excessive image blur from acid diffusion after post-exposure bake, the diffusion length should be less than ∼20% of the
pitch. Using this estimate as a guide, for 22-nm half-pitch technology, the diffusion
length will need to be less than 9 nm and must be even smaller for later nodes.
The problem of shot noise discussed in Chapter 3 is particularly acute for EUV
lithography,85,86 since there is an order of magnitude fewer photons per mJ of
EUV light than for DUV light. Moreover, because EUV lithography is intended for
use at smaller features than those created using ArF lithography, the requirements
for LER will be more stringent for EUV lithography. The problem of shot noise
in EUV lithography is illustrated in Fig. 12.24, which is a graph of LER versus
exposure dose for a number of tested EUV resists, along with a curve derived from
a simple model of shot noise that predicts the minimum LER that can be achieved
Figure 12.24 Line-edge roughness versus exposure dose for various EUV resists. The
diamonds are measured results for individual resists,87 and corrected for mask absorber
and multilayer-roughness contributions to LER,88 while the solid curve was produced by a
model of LER resulting purely from shot noise. (Figure courtesy of B. LaFontaine.)
for a given exposure dose. From the data and model it appears that adequate LER
cannot be obtained with extremely sensitive resists.
The model used to produce the solid curve in Fig. 12.24 relating LER and resist
sensitivity is derived as follows. Suppose a uniform light beam is normally incident
on the surface of a resist film. The average number N of EUV photons that crosses
a surface section of size d × d into a resist film below the surface is given by
N = 0.68 E d²,   (12.4)

where the dose E is measured in mJ/cm² and d in nm.
Since photon counts obey Poisson statistics, the relative fluctuation is

σN/N = 1/√N.   (12.5)

Compared with DUV lithography, the photon-count situation for EUV
is substantially different. First, because the energy per EUV photon is much higher
than that for photons with wavelengths of 248 or 193 nm, there are far fewer EUV
photons for the same doses as measured in mJ/cm2 . Second, EUV light sources
are much weaker than the excimer lasers used for DUV lithography, so doses for
EUV lithography tend to be lower than those in longer-wavelength lithography,
even when measured in mJ/cm2 .
The deviation in the placement of a line edge, ∆x, due to the variation in dose ∆E
is given by the expression (see Section 2.3):

∆x = (1/ILS)(∆E/E),   (12.6)
where ILS is the image log slope. Since photon absorption is also a statistical event,
there is another source of variation that must be taken into account. If we assume
that half of the incident photons are absorbed by the resist as a binomial process of
probability 0.5, the total absorbed dose variation becomes
∆E/E = √(1² + 0.5²) / √N   (12.7)
     = 1.12/√N.   (12.8)
Both the image log slope and d will scale with the critical dimensions being
patterned, i.e., LER can be expected to be greater for smaller features, at least
without better optics. The length d establishes the minimum distance over which
LER is evaluated, and this distance over which we are concerned will get smaller as
critical dimensions shrink. Estimating that the normalized image log slope (NILS)
is given by
and

d = CD/3.   (12.12)

Combining these results gives

LER = 4.9/√E.   (12.13)
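The chain of Eqs. (12.4)–(12.8) and the resulting limit, Eq. (12.13), can be evaluated directly. The per-photon energy is taken from Problem 12.7; the 10-mJ/cm², 10-nm example values below are illustrative:

```python
import math

EUV_PHOTON_ENERGY_MJ = 1.47e-14  # energy of one 13.5-nm photon, in mJ

def photons_per_patch(dose_mj_cm2, d_nm):
    """Eq. (12.4): photons crossing a d x d patch; equals 0.68 E d^2."""
    patch_area_cm2 = (d_nm * 1e-7) ** 2
    return dose_mj_cm2 * patch_area_cm2 / EUV_PHOTON_ENERGY_MJ

def relative_dose_noise(n_photons):
    """Eq. (12.8): absorbed-dose fluctuation, 1.12 / sqrt(N)."""
    return 1.12 / math.sqrt(n_photons)

def shot_noise_ler_nm(dose_mj_cm2):
    """Eq. (12.13): shot-noise-limited LER in nm."""
    return 4.9 / math.sqrt(dose_mj_cm2)

n = photons_per_patch(10.0, 10.0)  # 10 mJ/cm^2 through a 10-nm patch
print(n)                           # ~680 photons (0.68 * 10 * 10^2)
print(relative_dose_noise(n))      # ~4.3% dose fluctuation
print(shot_noise_ler_nm(10.0))     # ~1.55 nm minimum LER at 10 mJ/cm^2
```

Halving the dose to gain throughput raises the shot-noise LER floor by √2, which is the trade-off visible in Fig. 12.24.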
The photoelectrons will travel some distance before being scattered. As a result,
there is blur of the original optical image in the resist, even prior to post-exposure
bake. Although the mean free path in organic materials for electrons with energy
∼100 eV is <1 nm,98 there can be multiple scattering events, leading to ranges for
photoacid generation potentially much larger than 1 nm from the point of initial
photon absorption.99 Determining actual photoelectron ranges in EUV resists is
still an area of active research,100,101 and the answer will have implications for
the ultimate resolution capability of EUV lithography. Although existing data
are incomplete, an upper bound for photoelectron-limited resolution can be
inferred from exposures made using EUV interferometric lithography. (Interferometric
lithography is discussed in the next chapter.) While differing optically from
conventional projection lithography, interferometric lithography is subject to the
same resolution limits caused by photoelectrons. Using calixarene-type negative
resists, which are not subject to photoacid blur, patterns with half pitches below 15
nm have been obtained.102
Many polymers tend to cross-link when exposed to radiation. As a consequence,
resists can simultaneously exhibit positive resist behavior (usually at low to
moderate doses) and negative resist behavior (at higher doses).103 This is another
resist issue associated with EUV lithography of which practitioners need to be
mindful.
EUV lithography is a promising technology, but one where development is still
in progress to enable high-volume manufacturing of integrated circuits. Only a
survey of EUV technology has been provided here. For further reading on the
basic engineering science of EUV radiation, the interested reader is referred to the
excellent book by Dr. David Attwood.104 In addition, there are a number of survey
papers, as well as entire books dedicated to the subject of EUV lithography.105,106
Problems
12.1 Show that the Rayleigh resolution of an EUV (λ = 13.5 nm) lithography
system with an NA = 0.25 is 33 nm, and that the Rayleigh depth-of-focus is
216 nm. Show that the resolution and depth-of-focus are 20.5 nm and 84 nm,
respectively, for an EUV lens with 0.4 NA.
12.2 Show that Eq. (12.1) reduces to the conventional Bragg condition
mλ
d= ,
2 cos θ
when the indices of refraction of the materials in a multilayer film
stack → 1.0.
12.3 Using the above formula for the conventional Bragg condition, show that
each pair in a multilayer reflector should be 6.8 nm thick to produce peak
reflectivity at a wavelength of 13.5 nm when the incident light is 6 deg from
normal. (Assume that m = 1.) Show that a change in the film-pair thickness
of 1 Å changes the peak wavelength by 0.2 nm.
12.4 The attenuation length of carbon for light of wavelength 13.5 nm is 155 nm.
Show that the transmission of light through a six-mirror lens where each
mirror is coated with 1 nm of carbon is 92.6% of an identical lens for which
the mirrors have no carbon coating. (Remember that the light will travel twice
through the carbon coating on each mirror.)
12.5 The ITRS contains a requirement of 36-nm peak-to-valley mask flatness for
the 32-nm node. Show that this level of nonflatness contributes an overlay
error of ∼1 nm for systems which illuminate the mask at a mean angle of 6
deg and have projection optics with a reduction ratio of 4:1.
12.6 In Eq. (2.32) the change in linewidth ∆L when the dose E(x) is changed
proportionally E(x) → (1 + ∆)E(x) is given by
∆L = 2∆ (ILS)⁻¹,
where ILS is the image log slope. Show that the same relationship holds for
an added dose E(x) → E(x) + ∆, independent of position.
12.7 Derive Eq. (12.4), keeping in mind that the energy of a single EUV photon is
1.47 × 10−14 mJ.107
12.8 Suppose the reflected light produced by a multilayer mirror as a function of
wavelength (Fig. 12.4) is approximated by a Gaussian
R(λ) = R₀ e^{−(λ−λ₀)²/[a(∆λ)²]},
longer than a scanner with a rectangular slit of the same height h, where R is
the curved slit’s radius of curvature and W is the width of the exposure field.
References
1. T. W. Barbee, S. Mrowka, and M. C. Hettrick, “Molybdenum-silicon
multilayers for the extreme ultraviolet,” Appl. Opt. 24, 883 (1985).
2. S. Yulin, “Multilayer interference coatings for EUVL,” in Extreme Ultraviolet
Lithography, B. Wu and A. Kumar, Eds., McGraw Hill, New York (2009).
3. A. M. Hawryluk and L. G. Seppala, “Soft x-ray projection lithography using
an x-ray reduction camera,” J. Vac. Sci. Technol. B 6, 2161–2166 (1988).
4. A concise history of EUV lithography written by two pioneers of the
technology can be found in H. Kinoshita and O. Wood, “EUV lithography:
an historical perspective,” in EUV Lithography, V. Baksi, Ed., SPIE Press,
Bellingham, Washington (2009).
5. B. La Fontaine, “EUV optics,” in Extreme Ultraviolet Lithography, B. Wu
and A. Kumar, Eds., McGraw Hill, New York (2009).
6. B. L. Henke, E. M. Gullikson, and J. C. Davis, “X-ray interactions: photoab-
sorption, scattering, transmission, and reflection at E = 50–30000 eV, Z =
1–92,” Atomic Data and Nuclear Data Tables 54(2), 181–342 (July 1993).
See also http://henke.lbl.gov/optical_constants/.
7. E. Spiller, S. L. Baker, P. B. Mirkarimi, V. Sperry, E. M. Gullikson, and
D. G. Stearns, “High-performance Mo-Si multilayer coatings for extreme-
ultraviolet lithography by ion-beam deposition,” Appl. Opt. 42(19),
4049–4058 (2003).
8. A. A. Krasnoperova, R. Rippstein, A. Flamholz, E. Kratchmer, S. Wind,
C. Brooks, and M. Lercel, “Imaging capabilities of proximity x-ray
lithography at 70 nm ground rules,” Proc. SPIE 3676, 24–39 (1999).
9. J. A. Folta, S. Bajt, T. W. Barbee, R. F. Grabner, P. B. Mirkarimi,
T. Nguyen, M. A. Schmidt, E. Spiller, C. C. Walton, M. Wedowski, and
C. Montcalm, “Advances in multilayer reflective coatings for extreme-
ultraviolet lithography,” Proc. SPIE 3676, 702–709 (1999).
10. S. Bajt, “Molybdenum-ruthenium/beryllium multilayer coatings,” J. Vac. Sci.
Technol. A 18(2), 557–559 (2000).
Extreme Ultraviolet Lithography 451
26. T. H. Bisschops, et al., “Gas bearings for use with vacuum chambers and their
application in lithographic projection apparatuses,” U.S. Patent No. 6603130
(2003).
27. A. T. A. Peijnenburg, J. P. M. Vermeulen, and J. van Eijk, “Magnetic lev-
itation systems compared to conventional bearing systems,” Microelectron.
Eng. 83(4–9), 1372–1375 (2006).
28. P. T. Konkola, “Magnetic bearing stages for electron beam lithography,” MS
Thesis, Massachusetts Institute of Technology (1998). http://hdl.handle.net/
1721.1/9315.
29. M. Williams, P. Faill, P. M. Bischoff, S. P. Tracy, and B. Arling, “Six degrees
of freedom Mag-Lev stage development,” Proc. SPIE 3051, 856–867 (1997).
30. A. Abdo, B. La Fontaine, and R. Engelstad, “Predicting the thermomechani-
cal distortion of extreme ultraviolet lithography reticles for preproduction and
production exposure tools,” J. Vac. Sci. Technol. B 21(6), 3037–3040 (2003).
31. H. Meiling, E. Boon, N. Buzing, K. Cummings, O. Frijns, J. Galloway,
M. Goethals, N. Harned, B. Hultermans, R. de Jonge, B. Kessels,
P. Kürz, S. Lok, M. Lowisch, J. Mallman, B. Pierson, K. Ronse,
J. Ryan, E. Smitt-Weaver, M. Tittnich, C. Wagner, A. van Dijk, and
J. Zimmerman, “Performance of the full field EUV systems,” Proc. SPIE
6921, 69210L (2008).
32. T. Miura, K. Murakami, K. Suzuki, Y. Kohama, K. Morita, K. Hada,
Y. Ohkubo, and H. Kwai, “Nikon EUVL development progress update,”
Proc. SPIE 6921, 69210M (2008).
33. B. La Fontaine, Y. Deng, R. Kim, H. J. Levinson, S. McGowan,
U. Okoroanyanwu, R. Seltmann, C. Tabery, A. Tchikoulaeva, T. Wallow,
O. Wood, J. Arnold, D. Canaperi, M. Colburn, K. Kimmel, C. Koay,
E. McLellan, D. Medeiros, S. P. Rao, K. Petrillo, Y. Yin, H. Mizuno,
S. Bouten, M. Crouse, A. van Dijk, Y. van Dommelen, J. Galloway, S. Han,
B. Kessels, B. Lee, S. Lok, B. Niekrewicz, B. Pierson, R. Routh, E. Schmitt-
Weaver, K. Cummings, and J. Word, “The use of EUV lithography to produce
demonstration devices,” Proc. SPIE 6921, 69210P (2008).
34. G. F. Lorusso, J. Hermans, A. M. Goethals, B. Baudemprez, F. Van Roey,
A. M. Myers, I. Kim, B. S. Kim, R. Jonckheer, A. Niroomand, S. Lok,
A. Van Dijk, J. F. de Marneffe, S. Demuynck, D. Goossens, and K. Ronse,
“Imaging performance of the EUV alpha demo tool at IMEC,” Proc. SPIE
6921, 69210O (2008).
35. A. R. Pawloski, B. La Fontaine, H. J. Levinson, S. Hirscher, S. Schwarzl, K.
Lowack, F.-M. Kamm, M. Bender, W.-D. Domke, C. Holfeld, U. Dersch,
P. Naulleau, F. Letzkus, and J. Butschke, “Comparative study of mask
architectures for EUV lithography,” Proc. SPIE 5567, 762–773 (2004).
is a typical membrane material, and silicon nitride films were used early in the
development of x-ray lithography.
Silicon carbide is a good material to use for the membrane because it has a high
Young’s modulus (450 GPa), a characteristic that minimizes mechanical distortion,
and it is not damaged by long exposure to x rays.1 Gold2 and tungsten3 have been
used as absorbers, but the best success has been found with compounds of Ta, such
as TaN,4 TaSiNx,5 and Ta4B,6 because they are compatible with various etch and
cleaning processes.7
The use of thin membranes for masks introduces a set of challenges. Such
films deform because of stresses, and there has been extensive work to understand
and control them.8,9 Mask deformation is particularly problematic for x-ray
lithography, because there is no reduction of the image between the mask and
wafer. This 1:1 pattern transfer necessitates very tight tolerances for the masks,
relative to 4:1 or 5:1 reduction printing. On the other hand, with x rays there
are no lens distortions or feature size–dependent pattern-placement errors, so a
greater part of the overlay budget can be allocated to the mask in x-ray lithography.
However, thin-film masks are susceptible to vibrations when stepped or scanned,
and this needs to be addressed in any x-ray exposure system.10 Diamond was
pursued as an x-ray-mask-membrane material because its extremely high Young’s
modulus (900 GPa) reduces mask mechanical distortion.11
X-ray masks are typically made from silicon wafers. The membrane-mask area
is fabricated in the center of the wafer. The mask fabrication process is outlined in
Fig. 13.2. Because the membrane area is fragile, and silicon wafers are too thin to
provide stability, frames or rings have been adopted for x-ray masks. The fragile
silicon wafer is bonded to the ring to provide mechanical strength. An example
of such a frame, the ARPA/NIST standard,12 is shown in Fig. 13.3. The frame is
made of Pyrex to facilitate bonding between the frame and the silicon on which the
mask is fabricated.
Until the recent interest in EUV lithography, there has been greater investment
in x-ray lithography than any other potential successor to optical technology. There
have been programs at several universities, such as MIT and the University of
Wisconsin, and at companies such as IBM, AT&T, and NTT. X-ray technology
has also received support from the United States and Japanese governments. X-ray
step-and-repeat and step-and-scan systems were made available commercially.13,14
Alternative Lithography Techniques 461
Figure 13.3 The ARPA/NIST x-ray-mask standard mounting fixture.12 All dimensions are
given in mm.
462 Chapter 13
For many years, x-ray sources were a problem for lithographic applications.
Point sources of x rays, such as those used in medicine or dentistry, have long been
available. However, such sources are far from ideal for use in lithography. This is
illustrated in Fig. 13.4, which shows that there are pattern-placement effects that
do not occur with a collimated source. Since L ≫ r, the displacement d is given to
good approximation by

d = r g/L. (13.1)
What this shows is that the pattern shift is a function of position on the mask and
depends upon the gap. For a point source, the x-ray masks need to be fabricated
with adjustment for this magnification, and a stringent requirement for gap control
is imposed in order to maintain good overlay.
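A quick calculation with the illustrative numbers of Problem 13.1 (a 20-µm gap, a 2-m source-to-mask distance, and a point 10 mm from the field center) shows the size of this effect:

```python
g = 20e-6    # mask-to-wafer gap, m
L = 2.0      # point source to mask distance, m
r = 10e-3    # distance from the center of the exposure field, m

d = r * g / L            # Eq. (13.1): runout pattern shift
print(d * 1e9, "nm")     # 100 nm shift at the field edge
```

A 100-nm position-dependent shift is far larger than typical overlay budgets, which is why the mask magnification must be pre-adjusted and the gap tightly controlled when a point source is used.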
Because of this deficiency of point x-ray sources, synchrotron radiation was
often adopted for use in x-ray lithography. Synchrotron radiation is produced by
moving electrons at speeds approaching the speed of light (relativistic) along a
circular arc. When this is done, synchrotron radiation is emitted in a narrow range
of angles in the forward direction, along a tangent to the arc and in the same plane.
By the use of extremely powerful magnets, relativistic electrons are bent in an arc,
causing them to emit synchrotron radiation down a beamline connected to an x-ray
exposure tool (Fig. 13.5). This directionality, intrinsic to synchrotron radiation, is
useful for x-ray lithography because mirrors and lenses cannot be used to gather
photons efficiently at x-ray wavelengths and direct them toward the exposure tool.
Intensity is proportional to the current of electrons. The combination of collimation
and intensity make synchrotrons good sources of light for x-ray lithography.
Synchrotron light sources are more complex and expensive than arc lamps or
excimer lasers, but are not unreasonable sources of x rays for use in lithography.
Because they have no moving parts, synchrotrons are very reliable.15 A single
storage ring can supply x rays for several steppers, spreading the light-source cost
across multiple exposure systems.
Because x-ray light sources also produce photons at long wavelengths, a filter
is needed. Beryllium is transparent for wavelengths shorter than 15 Å. Beryllium
windows also enable an ultrahigh vacuum to be maintained on the source side of
the window, while allowing some level of atmosphere in the exposure tools. This
transmission cutoff for beryllium sets a limit for the longest wavelengths used for
x-ray lithography. Another limit on the range of wavelengths for x-ray lithography
is set by the membrane materials used for the x-ray mask. The silicon-K absorption
edge occurs at a wavelength of 6.7 Å, setting this as a lower bound on the usable
wavelengths when the mask membrane is silicon carbide. There has been recent
work on diamond membranes, which would enable the use of shorter wavelengths
for x-ray lithography, but this technology is yet unproven. Consequently, x-ray
lithography involves the use of photons with wavelengths between 6.7 Å and 15 Å.
A compact x-ray source has been developed that produces collimated x rays,
overcoming a number of problems associated with earlier point sources.16 The x
rays are produced initially by hitting a copper foil with high-power light from a
Figure 13.6 Spectrum from compact x-ray light source. The x rays are produced by hitting
a copper foil with high-intensity light from a Nd:YAG laser.17
the parameter k1 :
minimum linewidth = k1 (λ/NA). (13.2)
where g is the gap between the x-ray mask and the wafer and α is a parameter
that captures the contributions from the resist process. Typically, α ranges between
0.5 and 1.5. In order to achieve resolution <100 nm, gaps less than 10 µm are
required. Maintaining such tight gaps is not beyond current capability, but it is a
nontrivial task. In optical lithography, there is a focus-control requirement in order
to maintain good linewidth control. There is a similar requirement for gap control
in x-ray lithography.20 With a 15-µm nominal gap, linewidth variations of 3 nm
per micron of gap change were measured for 150-nm features.21 This problem of
linewidth variation caused by changes in the gap increases as the targeted feature
size gets smaller.
Many DUV resists, such as UVIIHS,22 work quite well as x-ray masks. Doses
range from 100–300 mJ/cm2 with this particular chemically amplified resist. This
simplified the x-ray lithography development effort, since much work could be
done using commercially available resist materials.
Of all the challenges to x-ray lithography, the greatest involves the mask.
Because there is no potential for image reduction, defects, linewidth variation,
and misregistration are transferred from the mask to the wafer unmitigated by
the reduction found in optical steppers. There is also no potential for a pellicle
that will keep particulate defects out of the depth-of-field. On the other hand,
because there are no optics, there are no lens contributions to linewidth variation
and misregistration, so 1× x-ray masks need to have linewidth variations and
misregistration only about one third that of 4× reticles. However, since leading-
edge mask-making capability is needed just to meet the requirements of 4× reticles,
this tightening of requirements by a factor of three has made 1× x-ray masks
virtually impossible to make. As a consequence, most x-ray programs have been
scaled back considerably or terminated altogether.
beams also image with extremely large depths-of-focus, providing relief from one of
the most challenging problems of optical lithography.
Conceptually, electron-beam direct-write lithography is particularly appealing
for manufacturing products made in extremely low volume. In Chapter 11, the
impact of mask prices on cost of ownership was discussed. With complete mask
sets for products often exceeding $2M and increasing with time, masks contribute
significantly to the cost of products produced in low volume (<1000 wafers/reticle
set). For these types of products, electron-beam direct-write lithography looks
attractive because there are no mask costs. However, direct-write lithography is
currently not very efficient for high-volume manufacturing. The high efficiency
of optical lithography, discussed in Chapter 1, is obtained because many features
on the mask are transferred to the wafer in parallel. On the other hand, direct-
write lithography fundamentally involves serial processes, making it slow. This
low speed is tolerable for mask making, where only a few chips are patterned on
reticles, but writing time does become problematic when trying to pattern many
chips per wafer and on more than just a few wafers.
A brief analysis serves to illuminate the magnitude of the problem. Consider
a 22-nm technology patterned with a raster scanning tool. If the pattern is divided
into pixels and we want to be able to adjust the position of the edge of every feature,
then 11 × 11-nm pixels are needed; there are 8.26 × 10¹¹ such pixels in
one square centimeter. Writing at a rate of one billion pixels per second, it would
take nearly a week to completely pattern a single 300-mm-diameter wafer.
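The arithmetic behind this estimate can be checked directly; the sketch below only restates the numbers given above:

```python
import math

pixel = 11e-7                          # 11-nm pixel edge, in cm
pixels_per_cm2 = (1.0 / pixel) ** 2    # ~8.26e11 pixels per cm^2
wafer_area = math.pi * 15.0 ** 2       # 300-mm wafer, cm^2 (~707 cm^2)
rate = 1e9                             # pixels written per second

t_days = wafer_area * pixels_per_cm2 / rate / 86400
print(t_days)                          # ~6.8 days: nearly a week per wafer
```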
Increasing direct-write throughput significantly also represents a challenge in
data-transfer technology. In order to achieve 1-nm edge placement for 22-nm
technology, a 5-bit per-pixel data representation is required to achieve the required
level of gray scaling.25 The total design information density therefore exceeds
a terabit (10¹²) per cm2. Comparable data volumes would be required if vector
scanning was used, assuming similar flexibility in pattern placement. To pattern
one 300-mm wafer per minute, it is necessary to transfer information at a rate of
approximately
where σf and t are both measured in units of microns and E is in units of keV. Due
to forward scattering alone, high resolution requires thin resist and high-energy
electrons (see Problem 13.4).
Such issues of resolution degradation and proximity effects were discussed
in the context of mask fabrication in Chapter 7. There are, however, differences
when writing directly onto wafers. First, because masks are typically used for
reduction lithography, the feature sizes of interest on the mask are usually bigger
than those on the wafer, even when subresolution features are included. This eases
the requirements for mask making, relative to direct writing on wafers.
Another problem associated with directly writing on wafers is energy
deposition. The scattering of electrons in the resist is reduced when using
high-energy electrons, something desirable for high-resolution electron-beam
lithography. However, most high-energy electrons pass through the resist film and
deposit their energy in the underlying substrate. For mask making, this is not
particularly problematic, for several reasons.28 First, electrons more energetic than
50 keV are rarely used for mask making because the resolution requirements for
mask making are less stringent than for writing directly on wafers, a consequence
of reduction lithography. For some of the concepts for direct writing on wafers,
electrons with energy greater than 50 keV may be required (Section 13.2.4), and
more energy is thus deposited into wafer substrates than typically occurs in mask
making. More importantly, reticles are usually made from fused silica, which has a
low coefficient of thermal expansion (∼0.5 ppm/°C), nearly an order of magnitude
smaller than for silicon. Typical photomasks also have much greater thermal
capacity than wafers. The combination of a lower coefficient of thermal expansion
and a greater thermal capacity offsets fused silica’s lower thermal conductivity
relative to silicon. Even with electron energies as low as 5 keV, wafer thermal
distortion is large enough to affect overlay at a level of significance for 22-nm
technology and beyond,29 unless great care is taken for mitigation.
Another important consideration relevant to the problem of thermal distortion is
throughput. Write times of several hours are considered acceptable for photomasks,
since far fewer masks need to be fabricated than wafers. For patterning wafers, such
throughputs are acceptable for a few applications in research or prototyping, but a
serious application of electron lithography to manufacturing requires throughputs
of several wafers per hour. This implies that the rate of energy deposition for wafer
patterning should be more than two orders of magnitude greater than for mask
making. The heat generated by such a greater rate of energy deposition cannot
be conducted away readily, leading to greater temperature changes. For all of
these reasons, silicon wafers will mechanically distort more than photomasks as
a consequence of energy deposition, and the effect on registration is greater. For
patterning wafers with electron beams, overlay is a significant issue.
Resist sensitivity is a factor that directly influences throughput. As noted in the
previous chapter, lithographic processes involving very sensitive resists will also
be subject to high line-edge roughness and dimensional variation. For example,
suppose we are patterning 22 × 22 nm contacts, and wish to achieve <5% 3σ dose
variation as a consequence of shot noise. This necessitates resists with sensitivity
of 120 µC/cm2 or greater (see Problem 13.5). With each new generation, linear
dimensions scale as ∼0.7×, and areas scale as ∼0.5×. This means that every new
generation of e-beam technology will require ∼2× increase in dose (as measured in
µC/cm2 ). However, the beam will also be ∼2× smaller in area in order to achieve
the needed resolution, so this higher dose can be achieved by focusing a fixed
current into a smaller spot size. Thus, in order to maintain throughput node to
node, currents and scanning rates of the electron beam must necessarily increase
∼2×. It is for this reason that enthusiasm for direct-write electron-beam lithography
has waxed and waned over time. While electron-beam systems have at times met
the throughput requirements for manufacturing, the ability to meet such needs
for subsequent nodes has involved engineering challenges not always overcome
in time to meet throughput and lithographic requirements.
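The 120-µC/cm² shot-noise floor quoted above can be verified from Poisson statistics; the electron charge and the contact area are the only inputs:

```python
e = 1.602e-19            # electron charge, C
area = (22e-7) ** 2      # 22 x 22-nm contact, expressed in cm^2
rel_3sigma = 0.05        # requirement: 3*sigma dose variation < 5%

# Poisson statistics: 3/sqrt(N) <= 0.05  ->  N >= 3600 electrons per contact
N_min = (3.0 / rel_3sigma) ** 2
dose = N_min * e / area  # minimum dose, C/cm^2
print(dose * 1e6)        # ~119 uC/cm^2, matching the ~120 uC/cm^2 quoted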
Highly energetic electrons deposit a net negative charge to the substrate (see
Fig. 9.5). If the substrate builds up a negative charge, the electron beam is
eventually deflected electrostatically. In this situation, the electrical conductivity
of silicon is an advantage, and pattern-placement errors due to such hypothesized
deflections have been smaller than measurement capability (20 nm) at the time the
experiments were performed.30 However, for the very tight overlay requirements
of the 22-nm node and beyond, electrically conducting layers under the resist may
be required. On the other hand, electron-beam lithography systems are operated
in vacuum, so the interferometers used for controlling wafer-stage position are not
subject to air-induced noise. This enables the use of somewhat simpler wafer stages
than for very advanced optical-exposure tools.
∆α = (k/4πε0) m^(1/3) I^(2/3) L / (4 V^(4/3) rb^(4/3)), (13.6)
where m is the mass of the electron, I is the beam current, V is the beam voltage,
ε0 is the permittivity of free space, and k is a constant ∼1.5–2. The amount of beam
blur depends upon the electron optical design, but Eq. (13.6) shows that the blur
increases with current, regardless of the design. Single columns produce beams
with blur that is too great to support the resolution requirements of the 22-nm node
and beyond when the beam current reaches 10 µA. Hence, there is a fundamental
tradeoff between resolution and throughput. Attempts to neutralize the beam with
ions do not eliminate the stochastic beam blurring, which results from random
interactions among particles.
The time t required to expose an area A covered with a resist of sensitivity S ,
using a beam of current I, is given by
t = AS/I. (13.7)
This shows the direct relationship between beam current and throughput.
Limitations on shot noise–induced LER prevent the resist sensitivity S from being
reduced significantly to improve throughput.
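As a sketch of this relationship, the numbers from Problem 13.6 (one tenth of a 300-mm wafer, a 120-µC/cm² resist, and a 10-µA beam) give the writing time directly:

```python
import math

S = 120e-6                      # resist sensitivity, C/cm^2
I = 10e-6                       # beam current, A
A = math.pi * 15.0 ** 2 / 10    # one tenth of a 300-mm wafer, cm^2

t = A * S / I                   # Eq. (13.7): exposure time, s
print(t / 60)                   # ~14 minutes
```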
Another factor limiting throughput is the ability to scan the beam controllably
at high speeds. Components for scanning the electron beam and blanking it on and
off involve electrostatic or electromagnetic elements, and the intrinsic resistances,
capacitances, and inductances of such components ultimately limit the speed at
which deflections can occur.41
Single-beam electron lithography is proven technology and provides useful
capability when patterning small areas is adequate. However, there are several
obstacles to achieving the throughputs required for producing integrated circuits
cost effectively with a single beam, including the need for high exposures to avoid
shot noise–induced LER, beam blur, and limitations on electronics speed. In order
to achieve writing times of less than one hour per wafer, some degree of parallel
imaging is required.
Figure 13.10 Schematic of the MAPPER aperture that generates separate electron
beams.
providing patterning capability. The apertures in the rows are offset by 2 µm,
and full-wafer coverage is accomplished by scanning each beam across 2 µm.
To minimize wafer heating, 5-keV electrons are used in the MAPPER system, in
contrast to the tens of kilovolt beams typically used for electron-beam lithography.
However, from Eq. (13.5), such low energies lead to image blurring due to
forward scattering. This can be limiting for 22-nm lithography and beyond, unless
very thin resist layers are used (see Problem 13.4).
In order to achieve throughputs of 10 wafers per hour or more, MAPPER
systems with tens of thousands of beams have been proposed. This large number
of beams also enables redundancy to compensate for a few beams that might not
be functioning properly. With many beams, the total beam current necessary for
throughput of ≥10 wafers per hour can be achieved with low current per beam,
thereby avoiding unacceptable resolution loss due to stochastic beam blurring.
The multi-electron-beam systems from IMS, Vistec, and MAPPER are based on
concepts that circumvent some of the inherent current limitations of single-beam-
electron lithography systems. However, increased currents will heat the wafers, and
the resulting thermal expansion will cause overlay errors without compensation.
∆T = Pt/(hAρc), (13.8)
where h is the wafer thickness, ρ and c are the density and specific heat of silicon,
respectively, and P is the net flow of power into the section of the wafer under
consideration during the time of exposure. There is a flow of energy into the wafer
by the electron beams, while energy is conducted away in the form of heat, and P
is the difference between the rates of the two.
A detailed analysis of thermal wafer distortion is complex, but two extreme
cases can be considered, between which the overlay problem due to wafer heating
by the electron beams is bounded. The following analysis is adapted from Ref. 29.
If we can imagine that the volume hA of silicon is thermally isolated, then P equals
the power input from the electron beam. With a throughput target of 10 wafers per
hour, and a 120 µC/cm2 resist sensitivity, from Eq. (13.7) a beam current of at least
0.24 mA is required. Equation (13.8) can be used to calculate the temperature rise
resulting from heating by the electron beam, with examples given in Table 13.2.
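The 0.24-mA figure can be reproduced by rearranging Eq. (13.7), under the simplifying assumption that the full wafer area is written during the 360 s available per wafer at 10 wafers per hour:

```python
import math

S = 120e-6                 # resist sensitivity, C/cm^2
A = math.pi * 15.0 ** 2    # full 300-mm wafer area, cm^2
t = 3600 / 10.0            # 10 wafers/hour -> 360 s of writing per wafer

I = A * S / t              # Eq. (13.7) rearranged for current, A
print(I * 1e3)             # ~0.24 mA minimum total beam current
```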
The case opposite of thermal isolation is that where heat is conducted away
readily. An infinitely thick wafer serves as a model for this case, since there are
no thermal boundaries in that instance. The heated area can be considered to be
a wafer of effective thickness heff equal to the thermal diffusion length, which is
given by29
heff = √(πkt/ρc), (13.9)
Table 13.2 Wafer heating and resulting uncompensated registration errors for two cases
of electron-beam direct write.
Parameter | 5-kV beam voltage | 50-kV beam voltage
field size is 26 × 32 mm, and the total beam current is 0.24 mA. This leads to an
effective thickness of 32 mm. Even in this ideal case, overlay errors greater than
2 nm can result from electron beam–induced heating, even with beam energies
as low as 5 keV. This is a significant fraction of the overlay requirements for
the 22-nm node and beyond. The problem is substantially more challenging with
high-energy beams. In principle, these registration errors can be corrected to some
degree by detailed modeling of the wafer heating, and deflecting the electron beams
to compensate.
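The quoted effective thickness can be reproduced approximately from Eq. (13.9), using representative silicon material constants and a per-field exposure time inferred from the throughput target; both are assumptions, not values given in the text:

```python
import math

# Representative silicon constants (assumed; not listed in the text):
k   = 1.3    # thermal conductivity, W/(cm*K)
rho = 2.33   # density, g/cm^3
c   = 0.70   # specific heat, J/(g*K)

field = 2.6 * 3.2                   # 26 x 32-mm field, cm^2
wafer = math.pi * 15.0 ** 2         # 300-mm wafer, cm^2
t = (3600 / 10.0) * field / wafer   # exposure time per field at 10 wafers/hr, s

h_eff = math.sqrt(math.pi * k * t / (rho * c))   # Eq. (13.9), cm
print(h_eff * 10)                   # ~33 mm, close to the 32 mm quoted
```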
are very fragile,58 and it has proven difficult to fabricate such masks with large
areas. Another potential limitation of projection-electron lithography is mask
heating. To achieve adequate throughput, high currents are needed. For masks that
create opaque regions by simply blocking and absorbing the electrons, the masks
become hot. To overcome this problem, a very clever scheme has been proposed,59
where nearly all electrons are allowed to pass through the mask. Instead of
absorbing electrons in those portions of the mask that are supposed to correspond
to unexposed areas on the wafer, the electrons are scattered by high-atomic-number
materials, such as tungsten, tantalum, or other materials used for x-ray masks.
At a focal plane within the electron optics is a physical aperture, through which
unscattered electrons pass. However, only a very small fraction of the scattered
electrons pass through the aperture (Fig. 13.11). Those portions of the mask
corresponding to regions of the design that are supposed to be exposed on the wafer
must allow electrons to penetrate with little scattering. As an alternative to stencil
masks, the scattering materials can be placed on a thin membrane of low-atomic-
number material, such as silicon nitride or diamond-like carbon.60 Fabricating
masks on a continuous membrane, rather than by use of stencils, also increases
mask manufacturability and durability. This combination of a scattering mask, in
conjunction with a focal plane aperture in the electron optics, has been given the
name SCALPEL, the acronym for SCattering with Angular Limitation-Projection
Electron-beam Lithography. The SCALPEL approach has demonstrated patterning
capability. Shown in Fig. 13.12 are 80-nm contacts created using SCALPEL.61
Membrane and stencil masks are inherently thin and fragile. Most obviously,
there is the potential for damage to such masks. Another problem associated with
large-field membrane masks is distortion, a problem that was discussed in the
section on x-ray lithography. An innovative solution is to break the design into
1 × 12-mm sections on the masks, separated by tall struts. SCALPEL masks are
usually made from silicon wafers, so the strut heights equal the thickness of a
standard silicon wafer (0.75 mm for SEMI-standard 200-mm-diameter wafers).
With 4× lens reduction, the SCALPEL field size on the wafer is 0.25 mm × 3.0 mm,
so the thin membrane area between the struts on the mask is 1.0 mm × 12.0 mm.
To create a complete chip pattern on the wafer, the individual 0.25-mm × 3.0-mm
fields that comprise the total field need to be stitched together.
This stitching is not a trivial challenge for the SCALPEL technology, since
the left half of features, written in one field, need to line up with the right
half, with a tolerance that is a small fraction of the linewidth. What complicates
this stitching is the heating of the mask and wafer that occurs during exposure.
While very little energy is deposited into the SCALPEL mask, the membranes,
by being so thin, have very little thermal mass, and they heat appreciably during
exposures even though little total energy is deposited. Simulations show that the
temperature increase, which occurs nonuniformly across the mask, exceeds 7 °C,
causing deformations of ∼20 nm.62 Similarly, most of the energy of the 100-keV
electrons used in the SCALPEL technology is deposited into the silicon wafer,
causing temperature rises calculated to be several °C.63 This problem of wafer
heating has already been discussed in this chapter. In principle, adjustments can be
performed with the electron-beam optics, rather than through mechanical motion,
enabling very rapid corrections. Compensation for thermal distortions needs to be
accomplished through software, which is known to be the least-reliable component
of semiconductor manufacturing equipment.
As with the other next-generation lithography technologies discussed thus far,
such as x-ray and EUV lithography, SCALPEL masks cannot be protected by
pellicles from particulate contamination. It is apparent that mask-defect mitigation
will be a general problem for postoptical lithography.
Throughput is another challenge for SCALPEL. Because of the very high
currents involved in a large-field approach to e-beam lithography, the problem of
stochastic scattering is particularly severe. The amount of blur that can be tolerated
decreases as feature sizes shrink. As a consequence, the beam current must be
F = ma. (13.10)
Figure 13.14 Geometry that illustrates the donut problem. The dark areas represent the
opaque part of the mask, while the white area is open.
carbon film provides high emissivity and resistance against ion damage to enable
radiative cooling of the ion-projection masks.70
to the 1× nature of the template, difficulties similar to those experienced with x-ray
masks might be expected. However, there are some aspects of template fabrication
for imprint lithography that make them easier to produce than x-ray masks. In
particular, it was necessary to fabricate x-ray masks on thin membranes, which
made registration control very difficult. In contrast, imprint templates are formed
on rigid glass substrates. Even on glass substrates, the 1× nature of the template
represents a formidable challenge.
Imprint lithography is a contact-patterning method. Projection optical
lithography was developed as a replacement for optical-contact printing because
defect levels were too high with contact printing to support high levels of
integration. Nevertheless, imprint lithography is used in applications that are
defect tolerant, have loose or no requirements for overlay, and involve low levels
of integration. One example of the application of imprint lithography is the generation
of patterned media for magnetic storage.76,77
self-assembled pattern will resemble that in Fig. 13.18, where pitch splitting is
achieved. The use of an underlayer film to guide the self-assembly79 is referred
to as chemical epitaxy. It is also possible to use physical features, as illustrated
in Fig. 13.19. Without anchoring the pattern, self-assembling materials will form
random patterns, such as seen in the left side of Fig. 13.20. Properly directed, self-
assembled films can form useful patterns.
It is also possible to create arrays of dots or holes by using blocks of different
sizes, as illustrated in Fig. 13.21. The polymers will be oriented perpendicular
to the substrate. Depending on properties of the specific block polymers, square
or hexagonal arrays can be formed. The resulting arrays are useful for creating
patterns for magnetic storage, for example.81
Figure 13.18 Pitch splitting by the use of directed self-assembly and chemical epitaxy.
Figure 13.19 Pitch splitting by the use of directed self-assembly and graphoepitaxy.
Figure 13.20 Materials with capability for self-assembly will form random patterns (left),
while anchored features (right) form patterns useful for circuit fabrication.80
lithography and its high cost at its most capable, there are motivations to try
alternatives. This is particularly true for making products in low volumes, where
mask costs become prohibitive. Whether any of the techniques described in this
chapter are ultimately used in high-volume manufacturing is something that we
can look forward to seeing in the future.
Problems
13.1 Consider an x-ray lithography system with a gap of 20 µm and a distance of
2 m between the point source and mask. If the distance between the center and
edge of the exposure field is 10 mm, what is the pattern shift for geometries at
the edges of exposure fields, compared to the center due to the noncollimated
light source [Eq. (13.1)]?
13.2 For x-ray lithography, λ ≈ 1 nm. What gap between the mask and wafer is
required for a resolution of 50 nm, assuming α = 1? Is control of such a gap
practical?
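A quick numerical check of Problems 13.1 and 13.2 can be scripted. This sketch assumes that Eq. (13.1) has the runout form Δ = gr/L (gap g, center-to-edge field distance r, source-to-mask distance L) and that proximity x-ray resolution follows R ≈ α√(λg); both functional forms are assumptions made for illustration, not quotations of the text's equations.

```python
import math

# Problem 13.1: pattern shift (runout) for a point x-ray source.
# Assumed form of Eq. (13.1): shift = g * r / L
g = 20e-6      # mask-wafer gap, m
L = 2.0        # source-to-mask distance, m
r = 10e-3      # center-to-edge field distance, m
shift = g * r / L
print(f"pattern shift at field edge: {shift*1e9:.0f} nm")  # ~100 nm

# Problem 13.2: gap needed for 50-nm resolution,
# assuming R = alpha * sqrt(lambda * g), so g = R^2 / (alpha^2 * lambda)
wavelength = 1e-9   # x-ray wavelength, m
alpha = 1.0
R = 50e-9           # target resolution, m
gap = R**2 / (alpha**2 * wavelength)
print(f"required gap: {gap*1e6:.1f} um")  # ~2.5 um
```

The ~100-nm shift and ~2.5-µm gap give a feel for why gap control is so demanding in proximity x-ray lithography.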
13.3 Show that the deBroglie wavelength of a 5-keV electron is 1.7 × 10−2 nm and
is 5.0 × 10−3 nm for a 50-keV electron. Comment on the resolution limit of
electron lithography due to the wavelength of electrons.
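The wavelengths quoted in Problem 13.3 can be verified with the relativistically corrected de Broglie formula λ = h/√(2mE(1 + E/2mc²)). This is a minimal sketch using standard constants; the value at 50 keV comes out slightly larger than the quoted figure, depending on how the relativistic correction is rounded.

```python
import math

H = 6.626e-34      # Planck constant, J*s
M_E = 9.109e-31    # electron rest mass, kg
Q_E = 1.602e-19    # elementary charge, C
C = 2.998e8        # speed of light, m/s

def de_broglie_nm(kev):
    """Relativistically corrected de Broglie wavelength of an electron
    accelerated through `kev` kilovolts, in nanometers."""
    E = kev * 1e3 * Q_E                                  # kinetic energy, J
    p = math.sqrt(2 * M_E * E * (1 + E / (2 * M_E * C**2)))  # momentum
    return H / p * 1e9

print(de_broglie_nm(5))    # ~1.7e-2 nm
print(de_broglie_nm(50))   # ~5.4e-3 nm, close to the quoted 5.0e-3 nm
```

Either way, both wavelengths are orders of magnitude below feature sizes of interest, so electron wavelength does not limit resolution in practice.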
13.4 The full-width half-maximum (FWHM) of a Gaussian distribution with a
standard deviation of σ is 2.36σ. Suppose that we want to pattern 22-nm
features using 5-keV electron beams, and assume that 22 nm ≈ FWHM of
the beam. Using Eq. (13.5), show that the resist can be no thicker than 36 nm
to achieve this resolution.
13.5 For electron-beam lithography, show that a dose of at least 120 µC/cm2 is
     required to keep the 3σ dose variation below 5% in 22 × 22-nm contacts.
13.6 Assuming that all other technical difficulties can be overcome, show that the
time required to expose one tenth of the area of a 300-mm wafer covered with
a resist with 120 µC/cm2 sensitivity and using a beam current of 10 µA is 14
min.
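The throughput estimate of Problem 13.6 reduces to charge bookkeeping: total deposited charge divided by beam current. A minimal sketch, using the numbers given in the problem:

```python
import math

dose = 120e-6          # resist sensitivity, C/cm^2
current = 10e-6        # beam current, A
wafer_radius_cm = 15.0 # 300-mm wafer
coverage = 0.10        # fraction of the wafer area that is exposed

exposed_area = coverage * math.pi * wafer_radius_cm**2  # cm^2
charge = dose * exposed_area                            # total charge, C
write_time_s = charge / current                         # exposure time, s
print(f"write time: {write_time_s/60:.1f} min")  # ~14.1 min
```

The ~14-minute result, for only one tenth of one wafer, illustrates the fundamental throughput problem of direct-write electron-beam lithography.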
References
1. P. A. Seese, K. D. Cummings, D. J. Resnick, A. W. Yanof, W. A. Johnson,
G. M. Wells, and J. P. Wallace, “Accelerated radiation damage testing of x-ray
mask membrane materials,” Proc. SPIE 1924, 457–466 (1993).
2. G. E. Georgiou, C. A. Jankowski, and T. A. Palumbo, “DC electroplating of
sub-micron gold patterns on X-ray masks,” Proc. SPIE 471, 96–102 (1984).
3. K. D. Cummings, D. J. Resnick, J. Frackoviak, R. R. Kola, L. E. Trimble,
B. Grant, S. Silverman, L. Haas, and B. Jennings, “Study of electron beam
patterning of resist on tungsten x-ray masks,” J. Vac. Sci. Technol. B 11(6),
2872–2875 (1993).
35. H. C. Pfeiffer, “Variable spot shaping for electron beam lithography,” J. Vac.
Sci. Technol. 15(3), 887–890 (1978).
36. M. A. Sturans, J. G. Hartley, H. C. Pfeiffer, R. S. Dhaliwal, T. R. Groves, J. W.
Pavick, R. J. Quickle, C. S. Clement, G. J. Dick, W. A. Enichen, M. S. Gordon,
R. A. Kendall, C. A. Kostek, D. J. Pinckney, C. F. Robinson, J. D. Rockrohr,
J. M. Safran, J. J. Senesi, and E. V. Tressler, “EL5: one tool for advanced x-ray
and chrome on glass mask making,” J. Vac. Sci. Technol. B 16(6), 3164–3167
(1998).
37. Y. Pain, M. Charpin, Y. Laplanche, D. Herisson, J. Todeschini, R. Palla,
A. Beverina, H. Leininger, S. Tourniol, M. Broekaart, E. Luce, F. Judong,
K. Brosselin, Y. Le Friec, F. Leverd, S. Del Medico, V. De Jonghe, D. Henry,
M. Woo, and F. Arnaud, “Advanced patterning studies using shaped e-beam
lithography for 65 nm CMOS pre-production,” Proc. SPIE 5037, 560–571
(2003).
38. G. H. Jansen, Coulomb Interactions in Particle Beams, Academic Press, Boston (1990).
39. L. R. Harriott, S. D. Berger, J. A. Liddle, G. P. Watson, and M. M. Mkrtchyan,
“Space charge effects in projection charged particle lithography systems,” J.
Vac. Sci. Technol. B 13(6), 2404–2408 (1995).
40. M. M. Mkrtchyan, J. A. Liddle, S. D. Berger, L. R. Harriott, A. M. Schwartz,
and J. M. Gibson, “An analytical model of stochastic interaction effects in
projection systems using a nearest-neighbor approach,” J. Vac. Sci. Technol. B
12(6), 3508–3512 (1994).
41. M. Gesley and P. Condran, “Electron beam blanker optics,” J. Vac. Sci.
Technol. B 8(6), 1666–1672 (1990).
42. T. H. P. Chang, “Electron beam microcolumns for lithography and related
applications,” J. Vac. Sci. Technol. B 14(6), 3774–3781 (1996).
43. S. Arai, “Fast electron-beam lithography system with 1024 beams individually
controlled by a blanking aperture,” Jpn. J. Appl. Phys. 32, 6012–6017 (1995).
44. M. Slodowski, H.-J. Doering, T. Elster, and I. A. Stolberg, “Coulomb blur
advantage of a multi shaped beam lithography approach,” Proc. SPIE 7271,
72710Q (2009).
45. C. Klein, E. Platzgummer, H. Loeschner, G. Gross, P. Dolezel, M. Tmej,
V. Kolarik, W. Klingler, F. Letzkus, J. Butschke, M. Irmscher, M. Witt, and
W. Pilz, “Projection mask-less lithography (PML2): proof-of-concept setup
and first experimental results,” Proc. SPIE 6921, 69211O (2008).
46. C. Klein, E. Platzgummer, J. Klikovits, W. Piller, H. Loeschner, T. Bejdak,
P. Dolezal, V. Kolarik, W. Klingler, F. Letzkus, J. Butschke, M. Irmscher,
M. Witt, W. Pilz, P. Jaschinsky, F. Thrum, C. Hohle, J. Kretz, J. T. Nogatch,
and A. Zepka, “PML2: the maskless multibeam solution for the 22 nm node
and beyond,” Proc. SPIE 7271, 72710N (2009).
where ρ is the reflectivity of the resist-silicon interface, c is the speed of light, and
n is the index of refraction of the resist.
The time-averaged intensity at the point $\vec{x}$ is given by

$$I(\vec{x}) = \left\langle \left[ A(t) + \rho\, A\!\left(t + \frac{2hn}{c}\right) \right]^{2} \right\rangle, \tag{A.2}$$

where $\langle \cdots \rangle$ indicates time averaging and all quantities are evaluated at the point $\vec{x}$. Expanding the square,

$$I(\vec{x}) = \left\langle |A(t)|^{2} \right\rangle + \rho^{2} \left\langle \left| A\!\left(t + \frac{2hn}{c}\right) \right|^{2} \right\rangle + \left\langle A(t)\, \rho^{*} A^{*}\!\left(t + \frac{2hn}{c}\right) \right\rangle + \left\langle A^{*}(t)\, \rho\, A\!\left(t + \frac{2hn}{c}\right) \right\rangle \tag{A.3}$$

$$= \left(1 + \rho^{2}\right) I_{0} + 2\,\mathrm{Re}\!\left[ \rho\, \Gamma\!\left(\frac{2hn}{c}\right) \right], \tag{A.4}$$

where

$$I_{0} = \left\langle |A(t)|^{2} \right\rangle \tag{A.5}$$

and $\Gamma$ is the mutual coherence function

$$\Gamma(\tau) = \left\langle A^{*}(t)\, A(t + \tau) \right\rangle, \tag{A.6}$$

which satisfies

$$\left| \Gamma(\tau) \right| \leq I_{0}. \tag{A.7}$$
That is, the coherence can only degrade in time. The light intensity in the resist
will depend upon the quantity Γ, that is, upon the coherence properties of the light.
For completely incoherent light, Γ = 0, in which case the total intensity is simply
the sum of the intensities of the incident and reflected light, independent of the
relative phases of the incident and
reflected waves. An example of completely coherent light is a plane wave:
$$A = A_{0}\, e^{i(kx + \omega t)}, \tag{A.8}$$

for which

$$\left| \Gamma(\tau) \right| = I_{0} \tag{A.9}$$

for all values of $\tau$. In this case, the light intensity at any point in the resist film
depends significantly on the relative phases between incident and reflected waves.
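The plane-wave case can be checked numerically by time averaging a sampled field. In the sketch below, A(t) = e^{iωt} (so I₀ = 1), Γ(τ) is formed by brute-force averaging, and the intensity follows Eq. (A.4); the frequency, sampling step, and reflectivity are arbitrary illustrative choices.

```python
import cmath
import math

omega = 2 * math.pi * 1.0   # arbitrary angular frequency
dt = 1e-3                   # sampling step
N = 20000                   # average over many optical cycles

def A(t):
    return cmath.exp(1j * omega * t)   # plane wave with A0 = 1, so I0 = 1

def gamma(tau):
    """Mutual coherence function Gamma(tau) = <A*(t) A(t+tau)> by time averaging."""
    return sum(A(k * dt).conjugate() * A(k * dt + tau) for k in range(N)) / N

rho = 0.3   # example (real) reflectivity of the resist-silicon interface
for tau in (0.0, 0.37, 1.42):   # arbitrary delays, standing in for 2hn/c
    # Eq. (A.4): I = (1 + rho^2) I0 + 2 Re[rho * Gamma(tau)]
    I = (1 + rho**2) + 2 * (rho * gamma(tau)).real
    print(f"tau={tau:.2f}  |Gamma|={abs(gamma(tau)):.3f}  I={I:.3f}")
```

As expected for fully coherent light, |Γ(τ)| stays equal to I₀ for every delay, and the intensity swings between (1 − ρ)² and (1 + ρ)² depending on the relative phase.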
It has been shown that propagating light maintains a high degree of coherence
over a time, referred to as the coherence time, which is inversely proportional to
the bandwidth of the light. For light with a Lorentzian distribution, such as that
produced by a high-pressure mercury-arc lamp, the coherence time is given by1
$$\tau_{c} = \frac{0.318}{\Delta\nu}, \tag{A.10}$$
$$\sigma = \frac{\sin \theta_{i}}{\sin \theta_{0}}, \tag{A.11}$$
the desired degree of coherence without sacrificing light intensity, and therefore
throughput and productivity.
Problems
A.1 From the values for spectral width given in Chapter 5, and using Eq. (A.10),
does the light remain highly coherent throughout the depths of resist films
used in semiconductor lithography (0.2–2.0 µm thick)? For resist films used
to fabricate thin-film heads for magnetic recording (10–20 µm thick)? For the
resist films used in micromachining (20–200 µm thick)?
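As a rough sketch of the estimate Problem A.1 calls for, the snippet below applies Eq. (A.10) to an assumed i-line bandwidth of Δλ ≈ 5 nm at λ = 365 nm (the actual spectral widths from Chapter 5 should be substituted) and converts the coherence time to a coherence length inside resist of assumed index n ≈ 1.7.

```python
C = 2.998e8   # speed of light, m/s

wavelength = 365e-9   # i-line, m
d_lambda = 5e-9       # assumed spectral width, m (substitute the Chapter 5 value)
n_resist = 1.7        # assumed resist refractive index

d_nu = C * d_lambda / wavelength**2   # bandwidth in Hz
tau_c = 0.318 / d_nu                  # Eq. (A.10): coherence time, s
l_c = C * tau_c / n_resist            # coherence length inside the resist, m

print(f"coherence time:   {tau_c:.2e} s")
print(f"coherence length: {l_c*1e6:.1f} um")
```

With these assumed numbers the coherence length is about 5 µm: comparable to or larger than the round-trip path in thin resist films, but far smaller than the tens to hundreds of microns relevant to thin-film heads and micromachining.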
References
1. J. W. Goodman, Statistical Optics, John Wiley and Sons, New York (1985).
2. J. Hecht, The Laser Guidebook, 2nd ed., McGraw Hill, New York (1999).
3. K. Jain, Excimer Laser Lithography, SPIE Press, Bellingham, Washington
(1990).
4. Y. Ozaki, K. Takamoto, and A. Yoshikawa, “Effect of temporal and spatial
coherence of light source on patterning characteristics in KrF excimer laser
lithography,” Proc. SPIE 922, 444–448 (1988).
5. T. E. Jewell, J. H. Bennewitz, G. C. Escher, and V. Pol, “Effect of laser
characteristics on the performance of a deep UV projection system,” Proc. SPIE
774, 124–132 (1987).
Index
D
dark field, 221
  alignment, 226
data
  compression, 466
  format, 264
  volume, 258
decompaction, 172
deep ultraviolet photoresist, 70
defects, 374
dehydration, 55
dense optical proximity correction, 326
depth-of-field, 260
depth-of-focus, 32–35, 37, 41, 43, 166, 373, 377
  individual, 38, 40
  usable, 38, 40
design cost, 418
design for manufacturability, 343, 394
developer, 77
development, 2
  rate, 76, 81, 140
diazonaphthoquinone (DNQ), 68
  resist, 68, 69
die-by-die alignment, 227, 410
die-to-database inspection, 290
die-to-die inspection, 289
diethylaminotrimethylsilane (DEATS), 55
diffracted beams, 12
diffraction, 9, 12, 19, 111
  grating, 10
  limit, 114, 120, 378
diffractive optical element, 164, 315
diffusion, 90, 131, 386
  coefficient, 132
  length, 80
diffusion-enhanced silylating resist (DESIRE), 94
diffusivity, 91
Dill parameters, 124
dipole illumination, 311, 313
direct-write electron-beam lithography, 468
directed self-assembly, 480
discharge-produced plasma, 437
  source, 436
dispense nozzle, 60
dispersion, 168
distortion, 119, 159
Doppler shift, 188
dose control, 163
dose to clear, 23
double Gaussian, 167
double patterning, 395, 397, 399
downtime, 414
DSW4800, 149, 151
dual-stage exposure tool, 195
dyed resist, 81, 133, 139
dynamic dispense, 60

E
e-beam-sensitive resist, 278
E-zero, 23
EBM-7000, 265
edge bead, 64
  removal, 64, 67
edge-placement error, 326
electric field–induced metal migration, 288
electrical linewidth metrology, 361, 363
electron
  scattering, 271
  storage ring, 463
electron-beam
  assisted chemical etching, 290
  direct-write lithography, 466
  exposure system, 262
  lithography, 467
  writer, 262, 267
electrostatic
  charge, 274
  chuck, 432, 434
humidity, 66
Huygens' principle, 110

I
IDEAL, 315
illumination, 7
  system, 162
  uniformity, 162
image
  fading, 193, 195
  log slope, 26, 28, 30, 31, 44, 89, 329, 446
  placement, 258
    error, 434
imaging, 7
immersion
  fluid, 372
  lithography, 371, 372
immiscible blocks, 480
imprint lithography, 479
IMS, 471
incoherent
  illumination, 18
  light, 16
individual depth-of-focus, 38, 40
infrared aberration control system, 180
injection locking, 160
inspection, 2
integral-squared pulse width, 156
interfering plane waves, 12
interferometric lithography, 400
intrafield
  error, 231
  registration, 365
inverse lithography, 342
ion-projection lithography, 477
IPRO4, 274
iso-dense bias, 317
isofocal
  dose, 42
  points, 42

J
JEOL, 265
jerk, 191

K
Kodak's thin-film resist (KTFR), 51
KrF excimer lasers, 154, 158, 160
KrF lithography, 71

L
labor cost, 420
Lambert's law, 122
lanthanum hexaboride, 263, 267
laser ablation, 290
laser-produced plasma, 437
  source, 436
leaching, 376
Leica, 264, 469
lens
  aberration reduction, 389
  distortion, 239
  field curvature, 40
  heating, 170, 171, 180
  pixel, 182
  reduction, 4
    factor, 167
lens-placement error, 245
leveling agents, 69
light-intensity distribution, 7, 13, 43
line-edge
  deviation, 87
  roughness, 82, 84, 86, 88–90, 358, 387, 444, 470
linearity, 258
linewidth, 20, 21
  control, 16, 390
  measurement, 351
  roughness, 82, 85, 358
lithium, 437
lithography cost, 407, 409
low k1, 387
lutetium aluminum garnet, 383
Lyman-α wavelength, 384
M
Mack model, 140
magnification error, 231
maintenance cost, 420
manufacturing electron-beam exposure system, 262, 267
MAPPER, 471
mask, 2, 147, 259
  cost, 418
  defect inspection, 289
  defect printability, 291
  deformation, 460
  distortion, 261
  nonflatness, 434
  registration error, 257
  roughness, 447
  usage, 417
mask-error factor, 327–329
master oscillator power amplifier, 161
master oscillator power oscillator, 162
matching error, 237
memory, 390
mercury-arc lamp, 151, 152
metrology cost, 420
Michelson interferometer, 187
Micralign, 199, 200
Micrascan, 152, 158, 176, 177, 183
microinjectors, 61
Micronic Laser Systems, 277
mirror reflectance, 429
mirror-surface figure, 442
mix-and-match lithography, 237, 422
Mo/Si
  multilayer, 427
  reflector, 427
model-based optical proximity correction, 324
modeling, 109
modulation transfer function, 17, 319, 327
Monte Carlo simulation, 271, 272
moving
  average, 193
  standard deviation, 193
multi-electron-beam lithography, 471
multilayer
  reflector, 425
  resist process, 92
multipass
  gray, 269
  writing strategy, 281

N
nanoimprint lithography, 479
National Institute of Standards and Technology, 164
Nd-YAG laser, 438, 464
negative resist, 2, 51, 52
next-generation lithography, 459
Nikon, 149, 151, 168, 175, 182, 196, 226, 227
NIST, 356
nitrogen, 68, 70
nonconcentric
  field, 422
  matching, 247
nonlinear overlay error, 231
nonlinearity, 318
nontelecentric imaging, 430
normalized
  derivative, 28, 31
  image log slope, 31, 446
novolak resin, 68, 69
NuFlare, 264
numerical aperture, 15, 16
  maximum, 381
Nyquist frequency, 359

O
OASIS, 275
off-axis
  alignment, 222
  illumination, 166, 308, 311, 316, 339, 340, 379
theoretical contrast, 82
thermal distortion, 468
thermal-field emission source, 263
thermionic emitter, 263
thick resist, 37, 64, 67
thin-film optical effects, 125
thin-resist model, 35, 42, 43
third-order distortion, 240
thorium, 153
three-beam imaging, 314
through-the-lens, 221, 223
  alignment, 223, 225
throughput, 410, 412, 421, 468
TiN, 135
tin, 437, 438
tool-induced shift, 364
top antireflection coating, 136
top-surface imaging, 93, 94
topcoat, 74, 376
total-integrated energy, 155
tracks, 67
translation error, 230
trapezoid error, 232
TRE, 151
trilayer resist process, 93
trimethylsilyldiethylamine (TMSDEA), 55
Tropel, 182
Twinscan, 196
two-beam imaging, 313
two-pass printing, 265
tyranny of the asymptote, 389

U
ULE, 441
Ultratech, 151
  Stepper, 149, 175
ultrathin resist, 62
underexposed, 31
usable depth-of-focus, 38, 40
utilization, 410, 414
UVIIHS resist, 75

V
vapor priming, 55, 56, 67
variable-shaped beam, 263
vector
  scanning, 264
  system, 264
vector-shaped beam, 268
vibration, 192, 410
virtual addressing, 269
viscosity, 57, 62
Vistec, 264, 471

W
wafer
  expansion, 220
  heating, 473
  scaling, 231
  stage, 149, 187
  steppers, 3, 4, 147
wafer-edge defects, 375
wafer-induced shift, 365
water, 382
  temperature, 374
wavefront error, 116
wavelength, 15
working distance, 175
Wynne–Dyson design, 176

X
x-ray
  lithography, 459
  mask fabrication, 461
  source, 462
xenon, 436, 439

Y
yaw, 190

Z
Zeiss, 182
ZEP 7000, 279, 280
Zernike polynomials, 116
zero-level alignment strategy, 224
Zerodur, 238, 441
zoom optics, 165
Harry J. Levinson is a Senior Fellow and manager of
GLOBALFOUNDRIES’s Strategic Lithography Technology
Department, which is responsible for advanced lithographic
processes and equipment. He started his career in bipolar
memory development at AMD, then spent some time at
Sierra Semiconductor and IBM, before returning to AMD—
now GLOBALFOUNDRIES—in 1994. During the course of
his career, Dr. Levinson has applied lithography to many different technologies,
including bipolar memories, 64Mb and 256Mb DRAM development, the
manufacturing of application-specific integrated circuits, thin-film heads for
magnetic recording, flash memories, and advanced logic. He was one of the first
users of 5× steppers in Silicon Valley and was an early participant in 248-nm and
193-nm lithography. He also served for several years as the chairman of the USA
Lithography Technology Working Group that participates in the generation of the
lithography chapter of the International Technology Roadmap for Semiconductors.
He has published numerous articles on lithographic science, on topics ranging from
thin-film optical effects and metrics for imaging, to overlay and process control,
and he is the author of two books, Lithography Process Control and Principles of
Lithography. He holds over 40 U.S. patents. He is an SPIE Fellow and formerly
chaired the SPIE Publications Committee. He has a BS in engineering from Cornell
University and a PhD in Physics from the University of Pennsylvania.
COLOR PLATES
Figure 3.12 Resist uniformity contours as a function of air and resist temperatures for
Shin-Etsu 430S resist coated on an SVG ProCell. The data are single standard deviations, in
units of angstroms.21 The initial wafer temperature was 22 °C. The most uniform coatings are
produced with resist and air temperatures slightly different from the initial wafer temperature
(see p. 60).
Figure 4.7 Aerial image-intensity contours for a 0.4-µm space on the mask.10 For an
unaberrated lens, the intensity contours have left-right symmetry and are also symmetric
across the plane of best focus. Images produced by lenses with coma (Z7 ) lose the left-right
symmetry, while spherical aberration (Z9 ) breaks the symmetry across the plane of best
focus. The pictures in this figure were simulated with Solid-C for an i-line tool with NA = 0.6
and σ = 0.5. For the aberrated images, 50 nm were assumed for each aberration (see
p. 118).
Figure 7.10 Exposure dose contours from Gaussians placed at integer coordinates (x, y),
with x ≤ 0 (see p. 270).
Figure 7.32 (a) A scanning-electron micrograph of a bridging defect on a mask, and (b)
the measured aerial image from the mask. (c) The corresponding results after the mask was
repaired (see p. 292).