Benefits of Moving To Plug-And-Play Hierarchical DFT
A major challenge for many current and upcoming IC devices is how to create the test
patterns for huge designs. For designs with 100 million gates, or even one billion gates, the
traditional approach of waiting until the design is completed before creating test patterns is
impractical and unrealistic; it would require too much compute power and too much time to
generate all those patterns. A hierarchical DFT approach solves this problem by completing
the DFT insertion and pattern generation at the block, or core level. This provides an order of
magnitude faster pattern generation time, and requires an order of magnitude less compute
resources. It also lets you complete most of the DFT and pattern generation work much
earlier in the design process, which is a huge aid in predictability and lowering risk. This
article describes the hierarchical DFT processes of inserting scan wrappers, generating
graybox images for the cores, and retargeting the core-level patterns to the IC top level as a
simple mapping step.
Traditional Full-Chip ATPG is Running Out of Steam
As IC designs have grown in size and performance, automatic test pattern generation (ATPG)
tools have done an excellent job keeping up. New fault models and pattern types were
introduced to detect novel defects that came with each successive fabrication technology. At
one point, the number of tester cycles needed to apply the necessary patterns became
impractical, and embedded compression was introduced to solve that problem. This provided
roughly a 100x improvement in efficiency and allowed ATPG to keep pace with modern
designs. In addition, multiprocessing ATPG, with distributed and multithreaded
processes, has helped to control ATPG run times. However, many designs have become so
large and complex that even if ATPG manages many of the challenges, the traditional
methodology of waiting until the complete chip is done before creating patterns presents
several major obstacles.
Here are a few of the more significant issues that can occur when creating patterns and testing
the IC as a whole entity when the entire design is complete:
Pattern generation on the entire IC would require a workstation with a very large memory.
Some companies already need workstations with 256 GB of memory, plus plenty of swap
space, for existing designs.
Run time could be extremely long for pattern generation on very large designs.
The pattern generation has to wait until late in the design cycle and could become part of the
critical path. If a problem occurs during ATPG, then it could impact design tapeout.
Because the IC is tested all at once, the power could be higher than desired.
In addition to the issues mentioned above, sometimes it makes more sense to focus most test
resources on one block or core at a time. This is because the types of patterns and clocking
needed for two blocks might differ too much to be applied concurrently. Let's look at a
simple example of an IC with core 1 instantiated twice and core 2 instantiated once. Here are
a few situations that could make testing core 1 and core 2 concurrently inefficient and
possibly ineffective:
Both cores use the same master on-chip clock controller but require different clock
sequences.
Core 1 requires 500 patterns and core 2 needs 5000 patterns. If they are tested in parallel
then any IO going to core 1 would be wasted after the first 500 patterns.
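The pattern-count mismatch above can be made concrete with a little arithmetic. The following Python sketch is illustrative only: the function name and the assumption that every core instance owns four dedicated tester channels are hypothetical, not figures from a real design. It counts how many channel slots sit idle when all cores stay connected for as long as the longest pattern set runs.

```python
def idle_channel_slots(cores):
    """cores maps a core instance name to (pattern_count, channel_count).

    Returns (total_slots, idle_slots), where one slot is one tester
    channel reserved for the duration of one pattern.
    """
    longest = max(patterns for patterns, _ in cores.values())
    total = sum(channels * longest for _, channels in cores.values())
    used = sum(channels * patterns for patterns, channels in cores.values())
    return total, total - used


# Assumed example: two instances of core 1 and one instance of core 2,
# each with four dedicated channels (channel counts are hypothetical).
cores = {
    "core1_a": (500, 4),
    "core1_b": (500, 4),
    "core2": (5000, 4),
}
total, idle = idle_channel_slots(cores)
idle_fraction = idle / total
print(f"{idle} of {total} channel slots idle ({idle_fraction:.0%})")
```

Under these assumed numbers, well over half of the reserved tester bandwidth carries no useful data, which is the inefficiency the example describes.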
Why Plug-and-Play Makes Sense
The general idea of plug-and-play continues to proliferate in modern society. This nod to
usability (just plug it in and it works) is a necessity for many types of businesses that wish
to stay competitive as suppliers and customers become more distributed and diversified. No
mobile phone could be competitive today without a plug-and-play interface that lets the
multitude of external vendors supply applications.
The simplicity of integration that plug-and-play provides is also an important aspect to some
of the current challenges with IC test. Even the IC test infrastructure has started to move to
more plug-and-play practices with the adoption of IJTAG [IEEE P1687]. DFT for cores and
patterns can also be treated as plug-and-play.
One important benefit of such a methodology is that you can do all the work early on in the
process at the core level. This lowers many types of risks since any issues can be worked out
early and makes the final IC test architecture and results more predictable. Doing more test
work at the core level also enables separate development teams to work independently and
then deliver the standard DFT practice and pattern data to the chip integrator. In addition,
once the design and pattern data is complete, that same data can be re-used in any IC design
employing that core.
Plug-and-play test approaches are also very flexible. If there is a problem in the design and an
ECO is necessary, then only the core with the ECO would need to have patterns regenerated.
as close to the core IO as possible. We accomplish this with special scan chains referred to as
wrapper chains. A DFT tool can start at each core IO and traverse the core logic until it finds the
first flop, which it then includes in the wrapper chain. These are called shared wrapper cells
because they perform both functional and test tasks. Many designs include registered IO so
that signals entering or exiting the core have well-defined timing. This makes wrapper
insertion straightforward. However, there are often cases where there is too much combinational
logic between the IO and flops. Thus, DFT tools let the user see an assessment of how much
logic is between each IO and flop prior to inserting the wrapper chains. Alternatively, the user
can set a threshold so that existing functional flops are used unless the threshold is exceeded,
and in those cases a new dedicated wrapper cell is automatically added. An effective tool will
identify as many shared wrapper cells as possible, and only add dedicated wrapper cells as a
last resort. This can save significant silicon area and reduce impact to functional timing.
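The threshold-based choice between shared and dedicated wrapper cells can be sketched as follows. This is a minimal illustration, not the behavior of any particular DFT tool: the function name, the port names, and the use of a simple gate count as the "amount of logic" metric are all assumptions.

```python
def plan_wrapper_cells(io_logic_depth, threshold):
    """Decide, per core port, which kind of wrapper cell to use.

    io_logic_depth maps a port name to the number of combinational gates
    between that port and the first functional flop (hypothetical metric).
    Ports at or below the threshold reuse the functional flop as a shared
    wrapper cell; ports above it get a new dedicated wrapper cell.
    """
    return {
        port: "shared" if depth <= threshold else "dedicated"
        for port, depth in io_logic_depth.items()
    }


# Hypothetical assessment of three ports on one core.
depths = {"in_a": 0, "in_b": 3, "out_y": 12}
plan = plan_wrapper_cells(depths, threshold=5)
print(plan)
```

Raising the threshold trades area for testability: more functional flops are reused, but more logic ends up outside the wrapper boundary.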
Wrapper chains are automatically balanced with core internal scan chains so they can be used
efficiently with embedded compression. Separate scan_enable signals are used with wrapper
chains so that at-speed test of the core can be supported regardless of the external
connections. They also let the wrappers be used for top-level IC interconnect test between
cores.
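As a rough illustration of the balancing mentioned above, the sketch below splits a core's wrapper cells into chains that are no longer than the internal scan chains. The function name and the numbers are hypothetical, and a real tool would weigh other factors such as clock domains and placement.

```python
import math


def balance_wrapper_chains(num_wrapper_cells, internal_chain_length):
    """Split wrapper cells into the fewest chains that do not exceed the
    internal scan chain length, with lengths differing by at most one."""
    n_chains = max(1, math.ceil(num_wrapper_cells / internal_chain_length))
    base, extra = divmod(num_wrapper_cells, n_chains)
    # The first `extra` chains take one additional cell each.
    return [base + 1] * extra + [base] * (n_chains - extra)


# Assumed example: 230 wrapper cells, internal chains of 100 flops.
chains = balance_wrapper_chains(230, 100)
print(chains)
```

Keeping the wrapper chains at or below the internal chain length means they do not lengthen the shift cycle that the compression logic has to support.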
The wrapper chains not only make the cores independent but also support top-level IC
modeling and rules checking. Once the wrapper chains are inserted, a DFT tool can
analyze any core and determine what logic exists between the IO and wrapper chains. Then
a small image of the core, called a graybox, is written out with this logic (Figure 1).
Grayboxes are used to verify that top-level core connections are correct (design rule checks)
and are used to create simple interconnect tests between the various cores. Because the
graybox only uses a small amount of the core logic, the design image is typically an order of
magnitude smaller than the full core design. Consequently, there is no need to ever have the
IC design include any of the full core netlists.
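The idea behind graybox extraction can be sketched as a graph traversal: walk inward from each core port and stop at the first flop reached, keeping everything visited. The Python below is a simplified model under stated assumptions. The netlist representation, node names, and function name are hypothetical, every flop reached first is assumed to be a wrapper cell, and a real tool would also keep side inputs, clocking, and any user-specified DFT logic.

```python
from collections import deque


def graybox_nodes(fanout, kinds, inputs, outputs):
    """Return the set of netlist nodes kept in the graybox.

    fanout: dict mapping a node to the list of nodes it drives
    kinds:  dict mapping a node to "port", "gate", or "flop"
    """
    # Build the reverse graph so output ports can be traced backward.
    fanin = {}
    for src, dsts in fanout.items():
        for dst in dsts:
            fanin.setdefault(dst, []).append(src)

    def walk(starts, edges):
        seen = set(starts)
        queue = deque(starts)
        while queue:
            node = queue.popleft()
            if kinds[node] == "flop":
                continue  # keep the wrapper flop, but do not go past it
            for nxt in edges.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    return walk(inputs, fanout) | walk(outputs, fanin)


# Tiny hypothetical core: in_a -> g1 -> w_in -> g2 -> f_int -> g3 -> w_out -> g4 -> out_y
fanout = {
    "in_a": ["g1"], "g1": ["w_in"], "w_in": ["g2"], "g2": ["f_int"],
    "f_int": ["g3"], "g3": ["w_out"], "w_out": ["g4"], "g4": ["out_y"],
}
kinds = {
    "in_a": "port", "out_y": "port",
    "g1": "gate", "g2": "gate", "g3": "gate", "g4": "gate",
    "w_in": "flop", "f_int": "flop", "w_out": "flop",
}
kept = graybox_nodes(fanout, kinds, inputs=["in_a"], outputs=["out_y"])
print(sorted(kept))
```

In this toy core the internal flop and the gates on either side of it are dropped, which is why the graybox stays small relative to the full core netlist.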
On-chip clock controllers (OCCs) are sometimes within cores and sometimes placed at the IC
top level. Both approaches are supported with hierarchical DFT. However, if the OCC is
located inside the core then the core itself is more independent. Otherwise, there could be
dependencies between cores sharing the same OCC that inhibit multiple core patterns from
being applied concurrently.
There is additional flexibility with graybox generation such that any DFT logic or other logic
that the user wants to include in the graybox (or exclude) can be defined.
Pattern Generation at the Core Level
Once wrapper chains, internal scan chains, and embedded compression are inserted into a core,
it is ready for ATPG. As mentioned previously, an advantage of hierarchical DFT is that core
DFT and ATPG can be performed completely independently of other cores (Figure 2). The
wrapper chains enable ATPG to achieve high coverage even if the IO values are unknown.
The ATPG tool just needs to be instructed that the patterns are expected to be retargeted so
that unknown values are placed at IOs and proper data is saved out, such as any clocks or
constrained pins that need to be verified at the IC top level.
If a core is duplicated several times in a design, then the core ATPG only needs to be
performed once. The retargeting step can apply the pattern data to all blocks in parallel. Using
this approach, core-level DFT logic and pattern verification can be completed as soon as the
core design is complete.
Retargeting and Merging Core Patterns to the Top
Top-level IC pattern integration is quick and easy with a hierarchical DFT approach. The first
step is to perform some basic DFT design rule checks (DRCs). All that is needed for this step
is an IC level netlist with graybox models of each core (Figure 3). Hierarchical DFT
approaches often use an IC-level test access mechanism (TAM) to port IC IOs to the
particular block or groups of blocks that will be tested. It can be as simple as a few muxes or
much more sophisticated. Duplicate cores usually have input channels that are broadcast to
all cores in parallel so the same tests can be applied from just one set of input channels. Our
suggestion for a TAM is to base it on IJTAG because it is a very broad and flexible standard,
and because it is the most plug-and-play option available.
The design image with the TAM and core grayboxes is dramatically smaller than a full
netlist, but provides enough information about the core IO and DFT logic that thorough
DRCs can be performed. Once the DRCs are verified, the core patterns can be automatically
retargeted so they are executed from the IC level. Even though core-level patterns are
generated independently, pattern retargeting can merge and apply them so they are executed
in parallel, as long as the TAM allows parallel access to the blocks.
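Retargeting and merging can be pictured as a renaming step followed by a pattern-by-pattern merge. The sketch below is a deliberately simplified model: the pattern representation, the pin and channel names, and both function names are hypothetical, and real retargeting also has to handle clocks, constrained pins, and TAM setup sequences.

```python
def retarget(patterns, pin_map):
    """Rename core-level pins to top-level pins for one core instance."""
    return [{pin_map[pin]: value for pin, value in pat.items()} for pat in patterns]


def merge_parallel(*retargeted_sets):
    """Merge retargeted pattern sets so they are applied side by side.

    A core that has run out of patterns drives 'X' (don't care) on its pins.
    """
    length = max(len(s) for s in retargeted_sets)
    all_pins = sorted({pin for s in retargeted_sets for pat in s for pin in pat})
    merged = []
    for i in range(length):
        row = dict.fromkeys(all_pins, "X")
        for s in retargeted_sets:
            if i < len(s):
                row.update(s[i])
        merged.append(row)
    return merged


# Hypothetical core-level scan-in data, generated independently per core.
core1_patterns = [{"si": "0101"}, {"si": "1100"}]
core2_patterns = [{"si": "111"}, {"si": "000"}, {"si": "101"}]

top1 = retarget(core1_patterns, {"si": "top_ch0"})
top2 = retarget(core2_patterns, {"si": "top_ch1"})
merged = merge_parallel(top1, top2)
for row in merged:
    print(row)
```

The merge only makes sense when the TAM gives each core its own top-level channels at the same time; otherwise the retargeted sets would simply be applied one after another.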
The last step in a hierarchical approach is to generate IC-level patterns that test the
interconnects between the various cores. Graybox models are used for this. It is an ATPG step
late in the design process because all the core designs and TAM must exist first. However, it
is a simple circuit and ATPG should be quick and simple.
What's Next?
The basic hierarchical DFT features of scan and wrapper insertion, graybox generation, and
pattern retargeting provide a significant advantage for many designs. There is still some work
to optimize which blocks are most efficient to test in parallel and which to test in series.
Top-level planning can only be efficient with some information about the core patterns.
Similar to compression analysis features that help determine optimal compression
configurations, top-level TAM planning is more efficient when the core designs are available.
One method being developed to work around this issue is to have a dynamic allocation of IC
channel bandwidth to cores. This way, core pattern properties do not need to be known prior
to designing the TAM. In addition, dynamically allocating scan channels will reduce the
overall pattern set size.
To summarize, hierarchical DFT approaches are being adopted for many designs. They deliver
an order of magnitude faster ATPG and need smaller workstations because nearly all ATPG is
performed at the core level. This is critical for very large designs with a hundred million gates
or more. Another significant motivation for hierarchical DFT is the greatly improved process
and the benefits of plug-and-play integration. As a result, more DFT and ATPG work can be
performed very early in the design cycle as soon as individual cores are completed, which
lowers risk, improves predictability, and is ECO friendly.