Siemens SW Streaming Scan Network WP 82735 C7


Siemens EDA

Tessent Streaming Scan Network
No-compromise packetized test

Executive summary
Tessent™ Streaming Scan Network (SSN) is a system for packetized
delivery of scan test patterns. It enables simultaneous testing of any
number of cores with few chip-level pins and reduces test time and test
data volume. With SSN, DFT engineers have a true SoC DFT solution
without compromises between implementation effort and manufacturing
test cost.

Geir Eide, Siemens Digital Industries Software

siemens.com/software
White paper | Tessent Streaming Scan Network

Introduction
The increasing complexity in large System on Chip (SoC) designs presents
challenges to all IC design disciplines, including design-for-test (DFT).
To alleviate some of those challenges, hierarchical DFT is used as a
divide-and-conquer approach where all DFT implementation, including
pattern generation and verification, is done at the core level rather
than the chip level. However, hierarchical DFT by itself is no longer
enough. DFT managers have to make difficult, and sometimes costly,
trade-offs between test implementation effort and manufacturing test
cost.

This paper describes the basic components of the Tessent Streaming Scan
Network (SSN), a packetized test delivery technology designed to decouple
core level and chip level DFT requirements. With SSN, DFT engineers have
a true SoC DFT solution without compromises between implementation
effort and manufacturing test cost.

Test challenges
Let’s take a closer look at some of the challenges facing DFT engineers
today.

Planning and Layout
To test a group of cores concurrently using a traditional hierarchical
pin-mux scan test approach, scan channel inputs and outputs are
connected directly to a limited number of chip-level pins through a set
of muxes.

Which cores can be tested together is based on the mux network, and
therefore has to be determined early in the design flow. More
flexibility can be enabled through a more complex mux network, which may
lead to routing congestion. With more cores and a steady number of
chip-level pins available for test, additional groups of cores and
access configurations must be created. This impacts the DFT
implementation effort, silicon area, pattern retargeting complexity, and
test time.

The number of test cycles required to test a core is determined by the
compression configuration, the length of the scan chains, and the number
of test patterns per core. An important part of test planning is to
determine which cores should be tested together concurrently, as this is
usually a hard-wired decision. In a linear bottom-up flow, where the
number of scan channels and compression configuration per core is fixed,
this could result in sub-optimal results and wasted test bandwidth, as
illustrated in figure 1.

More optimal results can be achieved through iterative optimizations,
where the test resources are balanced across the cores that are tested
together. This approach is resource-intensive and could be impractical
for complex designs with many levels of hierarchy. For cores that are
used in multiple designs, a compression configuration that is optimal in
one design, may not be ideal in another.

Figure 1: The Hierarchical DFT Compromise: Implementation effort vs. bandwidth management.
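
The wasted bandwidth that figure 1 refers to can be approximated with
simple arithmetic. The sketch below is only an illustration under stated
assumptions: the test length of a core is taken as pattern count times
scan chain length (capture cycles and compression details are ignored),
and cores in a pin-mux group are assumed to hold their pins until the
slowest core finishes. The core names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Core:
    name: str
    channels: int      # scan channels wired to chip-level pins
    chain_length: int  # shift cycles per pattern
    patterns: int      # pattern count for this core

    @property
    def shift_cycles(self) -> int:
        # Simplified model: ignores capture cycles and the final unload.
        return self.patterns * self.chain_length


def wasted_fraction(group: list[Core]) -> float:
    """Fraction of pin bandwidth left idle when a group is tested together."""
    group_cycles = max(c.shift_cycles for c in group)
    provided = sum(c.channels for c in group) * group_cycles
    used = sum(c.channels * c.shift_cycles for c in group)
    return 1 - used / provided


# Hypothetical group: core_b needs half as many patterns as core_a.
group = [
    Core("core_a", channels=4, chain_length=100, patterns=1000),
    Core("core_b", channels=4, chain_length=100, patterns=500),
]
print(f"idle bandwidth: {wasted_fraction(group):.0%}")  # idle bandwidth: 25%
```

In this hypothetical group, core_b finishes halfway through the session,
so a quarter of the pin bandwidth carries padding.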

Siemens Digital Industries Software 2



Effective handling of identical cores
One way to optimize test pattern volume for identical core instances is
to broadcast the same pattern to the scan inputs from the same top-level
pins. Outputs are often observed independently to guarantee test
coverage and to ensure diagnosability. Since at least one output channel
is needed per core instance, this can limit the number of identical
cores that can be tested concurrently. Another challenge is that the
capture clock is usually applied simultaneously to all cores. This
implies that the number of pipeline stages must be equal between a scan
input pin and all the identical core instances it drives. This can be
challenging in the presence of tiling where no routing or logic may
exist outside the cores.

Tile-based designs with abutment
Tile-based layout adds further complexity and constraints to DFT
architectures. Cores are designed to abut one another such that
connections flow from one core to the next, virtually eliminating
top-level routing. Any connectivity between cores has to flow through
cores that are between them. Top-level logic has to be pushed into the
cores and designed as part of the cores.

The Streaming Scan Network advantage
Tessent SSN decouples test delivery and core-level DFT requirements.
This means that core-level compression configuration can be defined
completely independently of chip IO limitations. Which cores will be
tested concurrently is selected programmatically, not hard-wired. This
concept dramatically reduces the DFT planning and implementation effort.
Rather than having to trade off DFT implementation effort and
manufacturing test cost, SSN lets you achieve the most optimal test data
time and volume for your SoC without expensive design iterations.

With SSN, the compression and number of scan channels for a core is
determined based on what results in the most compact pattern set for
that core by itself. Compression can be configured once and for all for
cores that are used in multiple designs. SSN automatically distributes
the available bandwidth among the active cores based on what’s required
for each core, eliminating whitespace in the test data (figure 2).

Figure 2: SSN enables a true bottom-up flow with multiple test cost reduction capabilities.


In addition to the planning, implementation, and test cost benefits, the
SSN architecture eases routing and timing closure and is fully
compatible with tile-based design with abutment.

This is all made possible while still supporting all ATPG pattern types
and fault models. SSN is also compatible with all Tessent DFT
methodologies and products, and has full support for diagnosis and yield
analysis.

SSN technology and concepts
SSN is a bus-based scan data distribution architecture. Figure 3 shows a
simple example with a 6-core design that uses SSN. Each core typically
contains one Streaming Scan Host (SSH) node (light blue box). The SSH
drives local scan resources with data delivered on the SSN bus. Although
just a single compressor/compactor is shown in the figure, the SSH node
can interface with one or more Tessent TestKompress embedded
deterministic test (EDT) controller(s), uncompressed scan chains, or a
combination of the two.

Figure 3: SSN used in a 6-core design.

Each SSH has two external interfaces: An IEEE 1687 IJTAG interface and a
parallel SSN data bus. The IJTAG network is used to configure all nodes
in the SSN network before the application of scan test patterns. Each
node is loaded with information related to the protocol such as the
active bus width, its location in the series of nodes driven, the number
of shift cycles per scan pattern, scan_enable transition timing
information, etc.

Following this setup, the entire scan test pattern set is applied as
packetized data that is streamed on the parallel SSN bus. The SSHs are
programmed just once per pattern set and only the scan payload is
streamed following the setup. There is no need to send any opcode or
address information with each packet. Each SSH controls the local scan
operations for the core, including transitions between load/unload and
capture stages. All scan signals and EDT controls are generated by the
SSH local to the core.

The SSN bus width is selected based on chip-level pin availability and
is independent of the number and size of the scanned cores, and the
number of channels needed by the EDT controller(s) in each core. With
the same parallel bus width, each core has the same plug-and-play
interface, allowing SSN to scale efficiently as the design floorplan,
number of cores, or the content of the cores change.

With SSN, the bus width and channel count of the cores are independent
of each other. Scan test data is delivered synchronously across the bus
in a packet format to each core. The number of bits a core can receive
per packet is algorithmically determined from the pattern statistics of
the cores running concurrently, but cannot be greater than the number of
scan (EDT) channels. The data delivered from the tester may be viewed as
a continuous stream of packets that may wrap around SSN bus boundaries.

Consider the example shown in figure 4 where two blocks are being tested
at the same time. Block A has five scan channels and loads/unloads five
bits per shift cycle. Block B has four channels. For both blocks to
perform one shift cycle, nine bits have to be delivered. In a
conventional pin-mux scan access method, this would have required nine
chip-level scan input pins and nine scan output pins. With SSN, the
packet size is set to nine bits independent of the SSN bus width, which
is eight bits in this example. The concept of packetized data delivery
in SSN is further explained in the Appendix.

Figure 4: Testing two blocks at the same time.
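
The figure 4 example can be restated as a small calculation. This is a
minimal sketch, not the Tessent implementation: the function name and
the dictionary layout are hypothetical, and it assumes every block is
allocated as many bits per packet as it has scan channels.

```python
def packet_layout(channels_per_block: dict[str, int]) -> tuple[int, dict[str, range]]:
    """Return the packet size and the bit positions each block owns in a packet."""
    layout = {}
    offset = 0
    for block, channels in channels_per_block.items():
        # Each block owns a contiguous slice of every packet.
        layout[block] = range(offset, offset + channels)
        offset += channels
    return offset, layout


size, layout = packet_layout({"A": 5, "B": 4})
print(size)               # 9 bits per packet
print(list(layout["A"]))  # [0, 1, 2, 3, 4]
print(list(layout["B"]))  # [5, 6, 7, 8]

# A pin-mux scheme needs one input pin and one output pin per channel.
pin_mux_pins = 2 * size
print(pin_mux_pins)       # 18
```

The packet size depends only on the channels of the active blocks; the
SSN bus width does not appear anywhere in the calculation.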


The ability to route the bus carrying the data from one core to the next
while dynamically controlling which cores are active/inactive/bypassed
means one has flexibility in accessing any combination of cores without
changing the hardware. Unlike pin-mux architectures, this flexibility
does not come at the expense of routing congestion. There is no need to
try and predict at design time how to group cores that are to be tested
concurrently. Whether performing ATPG on groups of cores or retargeting
patterns from different cores, the same SSN network can provide access
to one core at a time, all cores simultaneously, or anything in-between.

Time multiplexing
When driving multiple cores concurrently, the packet typically spans
multiple bus widths, resulting in an internal shift frequency slower
than the SSN bus frequency. In many cases, it is possible to implement a
400 MHz SSN bus, but not possible to shift data through the chip-level
pins at more than 200 MHz. Assume that the SoC has enough pins to
implement 64 scan inputs and 64 scan outputs.

One option would be to implement a 64-bit bus throughout the chip and
operate it at 200 MHz. Alternatively, as shown in figure 5, the data can
be scanned into the chip through 64 pins at 200 MHz and a bus frequency
multiplier (BFM) added between the scan inputs and the first SSH to
convert this input stream to a 32-bit, 400 MHz bus. This 32-bit bus is
then used across the chip, connecting all SSH nodes with 32-bit buses.
On the output side, a bus frequency divider (BFD) node is added to
convert the SSN output bus back to a 200 MHz 64-bit bus driving the
output pins.

Figure 5: Time multiplexing.

Optimizing test time and data volume
When ATPG is run with multiple interacting cores in external test mode,
it is necessary to align capture cycles of all the affected cores. With
SSN, the SSHs are programmed such that each core can shift
independently, but capture occurs concurrently once all cores have
completed scan load/unload.

In other situations, such as when ATPG for wrapped cores with OCCs is
run in isolation, it is more effective if each core can independently
transition between shift and capture. A core with short scan chains
should not need to wait for other cores to complete shifting before it
can capture. Often there are significant imbalances in the pattern
counts of different cores.

Traditional retargeting methods add padding for the cores with fewer
patterns, resulting in wasted data, cycles, and test time. If a core
requires many fewer overall shift cycles across a pattern set than other
cores, it can be sent fewer bits per packet.

For example, a core with 4 channels does not need to be allocated 4 bits
per packet. It can be throttled down and sent only 1 bit per packet such
that it shifts internally every four packets instead of every packet.

The result is that the total number of packets remains the same, but the
size of the packets is reduced, speeding up the overall test time.

Effective testing of identical cores
Many SoCs that achieve high throughput by parallelizing processing
contain a number of cores that are replicated multiple times. In pin-mux
scan architectures, the scan inputs may be broadcast to identical core
instances, but the scan outputs are usually observed independently to
ensure lossless mapping and observability for diagnosis. SSN provides a
scalable method for testing any number of identical core instances in
near-constant test time, independent of the number of available
chip-level pins.

Input data, expected responses, and compare/nocompare mask data are
scanned in within each packet, as illustrated in figure 6. The same
packet data is used by each identical core instance, as it synchronously
moves through the network. Each core performs its own on-chip
comparison. A pass/fail “sticky” bit is observed on TDO.

The optional accumulated per-shift status can be added to the packet and
observed on the SSN outputs.

Tile-based designs
SSN is designed to support the abutment of cores in tile-based designs
with no routing outside the cores. The outputs of one core connect to
the inputs of the next adjacent core. A chip with SSN usually has a
single SSN datapath (parallel bus) that goes through all cores.
Depending on the floorplan and pad locations, it may be preferable for
physical design to implement multiple, physically independent datapaths.
Each datapath is also configurable and can include muxes that can be
programmed to include or exclude segments of the network similar to the
Segment Insertion Bit (SIB) in IJTAG networks.
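
The numbers in the time multiplexing discussion earlier in this section
can be checked with a few lines of arithmetic. The sketch assumes an
idealized bus with no protocol overhead and one internal shift per
complete packet; the function names are illustrative, and the 96-bit
packet size is a hypothetical value that does not come from the paper.

```python
def bandwidth_mbps(width_bits: int, freq_mhz: float) -> float:
    """Raw bus bandwidth in Mbit/s."""
    return width_bits * freq_mhz


def internal_shift_mhz(bus_width: int, bus_mhz: float, packet_bits: int) -> float:
    """Average core shift rate, assuming one shift per complete packet."""
    return bus_mhz * bus_width / packet_bits


# 64 pins at 200 MHz carry the same data rate as a 32-bit bus at 400 MHz,
# which is why the BFM/BFD conversion loses no bandwidth.
assert bandwidth_mbps(64, 200) == bandwidth_mbps(32, 400) == 12_800

# Hypothetical 96-bit packet on the 32-bit, 400 MHz internal bus:
# each packet spans three bus words, so cores shift at about 133 MHz.
print(round(internal_shift_mhz(32, 400, 96), 1))  # 133.3
```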

Figure 6: Testing any number of identical core instances.
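
The on-chip comparison used for identical core instances can be
pictured with a toy model. This is a sketch under assumptions rather
than a description of the SSH hardware: the class name, the data
values, and the mask polarity (1 means compare, 0 means nocompare) are
all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CoreInstance:
    name: str
    failed: bool = False  # sticky: once set, it stays set

    def compare(self, actual: list[int], expected: list[int], mask: list[int]) -> None:
        """Compare one shift cycle of scan-out data against the expected bits."""
        for a, e, m in zip(actual, expected, mask):
            if m and a != e:
                self.failed = True


# Expected and mask data travel in the packet and are shared by all instances.
expected = [1, 0, 1, 1]
mask = [1, 1, 0, 1]  # third bit is nocompare (assumed polarity)

cores = [CoreInstance("inst0"), CoreInstance("inst1"), CoreInstance("inst2")]
responses = {
    "inst0": [1, 0, 1, 1],  # matches
    "inst1": [1, 0, 0, 1],  # differs only in the masked bit
    "inst2": [1, 1, 1, 1],  # real mismatch in the second bit
}
for core in cores:
    core.compare(responses[core.name], expected, mask)

print({c.name: c.failed for c in cores})
# {'inst0': False, 'inst1': False, 'inst2': True}
```

Because each instance reduces its result to a single sticky bit, adding
more instances does not add scan output data, which is consistent with
the near-constant test time described above.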


Implementing SSN in your design
Table 1 summarizes the requirements and recommendations for implementing
SSN in a design. For instance, as IJTAG is used to program the SSN
circuitry, IJTAG infrastructure is required. To enable independent shift
and capture, standard (Tessent or 3rd party) OCC is required.

There are two things that should be taken into consideration during the
design planning phase. First, at the core level, compression should be
optimized to what provides you the best results (most compact pattern
set) for that core in isolation. There is no need to take chip-level
resources or even the planned SSN bus width into consideration.

Second, at the chip level, the SSN bus should be planned out based on
the number of pins available, and the block diagram of the design. The
SSN datapath should be planned through physical regions of the design.
In addition to the actual connectivity, in this planning, you will also
plan for muxes as needed for debug return paths, and pipelines needed
for timing. What is not needed is any upfront planning of which regions
run in parallel or relative order.

The SSN implementation flow is described in detail in the “Tessent SSN
Workflows” section of the Tessent Shell User’s Manual. A testcase
demonstrating this flow is included in the Tessent Shell release tree.
The flow is fully integrated with all other Tessent DFT such as memory
BIST.

A comprehensive set of SSN verification capabilities, including DRCs,
dedicated testbenches, and network integrity patterns, are available
throughout the flow to ensure that any potential problems or mistakes
are captured early, after SSN insertion is complete, before synthesis.
Later in the flow, loopback patterns help validate the SSN network down
to the individual cores, without having to perform a complete simulation
of the scan patterns.

The failure diagnosis flow is virtually identical to that of a
hierarchical DFT flow. Failures captured on the tester are reverse
mapped to core level failures. After the reverse mapping, layout-aware
diagnosis is performed with no limitations.

Industry results
One study the Tessent team conducted with Intel, “Streaming Scan Network
(SSN): An Efficient Packetized Data Network for Testing of Complex
SoCs”, was published at the 2020 International Test Conference. Intel
designers compared SSN to a different packetized network as well as a
traditional pin muxed solution. They found SSN reduced the test data
volume by 36% and 43%, respectively. It reduced test cycles by 16% and
43%, respectively. Steps in the design and retargeting flow were between
10x – 20x faster with SSN compared to the other packetized solution.

Table 1. SSN readiness

Feature/capability: IJTAG
  Traditional hierarchical flow: When used for test_setup, enables “plug
    and play” of any ICL modules and simplifies integration.
  SSN: Required. IJTAG is required for initializing and using SSN. Highly
    recommended for test_setup as it shares the same network and ICL
    extraction with SSN.

Feature/capability: Standard OCC
  Traditional hierarchical flow: Recommended for a wrapped core.
  SSN: Required. Standard OCC is required to control SSN clocking at
    wrapped core level.

Feature/capability: TSDB automation
  Traditional hierarchical flow: Simplifies the data management of
    hierarchical designs, especially with large number of cores.
  SSN: Highly Recommended.

Feature/capability: DFT signals
  Traditional hierarchical flow: Enables flow automation and simplifies
    test_setup.
  SSN: Highly Recommended.

Feature/capability: DFT specification
  Traditional hierarchical flow: Enables flow automation and usage of DFT
    signals.
  SSN: Highly Recommended. Creates SSN hardware & other logic test IP
    (EDT and/or OCC).

Feature/capability: EDT for wrapper chains
  Traditional hierarchical flow: Keeps edt/shift clk implementation and
    timing local to wrapped core.
  SSN: Highly Recommended.

Feature/capability: EDT dual configuration
  Traditional hierarchical flow: Reduces number of chip level resources
    needed for EDT channels.
  SSN: Recommended. Further improves bandwidth optimization.

Feature/capability: Non-overlapping edt / shift clocks
  Traditional hierarchical flow: Avoids retiming flops between EDT logic
    and scan chain at the expense of slower shift rates.
  SSN: SSH does not support generating non-overlapping edt/shift clocks.
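
The SSN column of Table 1 can be used as a simple planning checklist.
The sketch below is a hypothetical helper, not part of any Tessent
tool: the dictionary transcribes the requirement levels from the table,
while the function name, the feature strings, and the example design
are made up for illustration.

```python
# Requirement level per feature for an SSN flow, transcribed from Table 1.
SSN_READINESS = {
    "IJTAG": "required",
    "Standard OCC": "required",
    "TSDB automation": "highly recommended",
    "DFT signals": "highly recommended",
    "DFT specification": "highly recommended",
    "EDT for wrapper chains": "highly recommended",
    "EDT dual configuration": "recommended",
    "Non-overlapping edt/shift clocks": "not supported",
}


def readiness_issues(features_in_design: set[str]) -> dict[str, list[str]]:
    """Group Table 1 findings for a design by severity."""
    issues = {"missing required": [], "missing recommended": [], "unsupported": []}
    for feature, level in SSN_READINESS.items():
        present = feature in features_in_design
        if level == "required" and not present:
            issues["missing required"].append(feature)
        elif level.endswith("recommended") and not present:
            issues["missing recommended"].append(feature)
        elif level == "not supported" and present:
            issues["unsupported"].append(feature)
    return issues


# Hypothetical design that has IJTAG and DFT signals but no standard OCC.
report = readiness_issues({"IJTAG", "DFT signals", "Non-overlapping edt/shift clocks"})
for severity, features in report.items():
    print(severity, "->", features)
```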


Summary
The SSN technology described in this paper solves many
of the scan distribution challenges in complex SoCs. By
decoupling chip and core level DFT, it enables
concurrent testing of any number of cores with few
chip-level pins, and it has multiple features to reduce
test time and test data volume.
It can test any number of identical core instances in
near-constant time, minimizes padding in the presence
of cores with mismatched pattern counts and/or scan
chain lengths, and enables fast streaming of data to/from
and throughout the chip. It simplifies design
planning and implementation and is especially well
suited for tile-based designs.
The SSN implementation flow is based on Tessent Shell
flow for hierarchical designs. SSN is fully supported by
Tessent TestKompress™ and Tessent Diagnosis, and can
co-exist with all other Tessent DFT technologies such as
Tessent MemoryBIST and Tessent LogicBIST.


Appendix

In SSN terminology, a “packet” refers to all the scan data needed for
all the active SSH nodes to perform a single internal scan shift
operation. Key to the SSN architecture is that the size of a packet is
independent of the SSN bus width. The SSN payload delivered from the
tester may be viewed as a continuous stream of packets that may wrap
around the SSN bus boundaries.

Consider the example shown in figure 7 where two blocks are tested
concurrently. Block A has five EDT channels and loads/unloads five bits
per shift cycle. Block B has four channels and loads/unloads four bits
per shift cycle. For both blocks to perform one shift cycle, nine bits
have to be delivered. In a conventional pin-mux scan access method, this
would have required nine chip-level scan input pins and nine scan output
pins.

With SSN, the packet size is set to nine bits independent of the SSN bus
width, which is eight bits in this example. Nine bits have to be
delivered for both blocks to shift once. The first five bits of every
nine-bit packet are programmed to belong to block A, and the next four
bits of every packet are programmed to belong to block B. This is all
determined and programmed at pattern generation time; it is not
hard-coded in the SSN logic.

After programming all the SSN nodes using IJTAG, SSN delivers a
continuous, repeating stream of 9-bit packets. As soon as block A
extracts 5 bits from the bus, it performs one internal shift operation.
Likewise for block B, every time it accumulates 4 bits.

The SSH is programmed with the shift count per scan load, so it can
identify when to perform shift, and when to perform capture. Capture
involves events generated by the SSH such as de-asserting scan_enable,
applying capture clocks through an OCC, and re-asserting scan_enable in
preparation for the next scan operation.

The stream of nine-bit packets is folded into the 8-bit bus with no bits
wasted. The first nine-bit packet occupies the first eight-bit parallel
word of the bus, and the first bit of the second word (second tester
cycle). The locations of the nine-bit packets within each eight-bit bus
word rotate with each packet, so that the second packet starts
immediately after the first, occupying the remaining seven bits of the
second parallel word, and the first two bits of the following parallel
word. Typically, the same time slots of the packet that carry scan-in
data to an SSH node also carry scan-out data from that node. As block A
reads the first 5 bits of every packet, it replaces them with 5 bits
scanned out (with slight latency).

Any number of internal cores and their channels can be controlled with
an SSN bus that is as narrow as one bit. This is because the packets can
be as wide as they need to be, and can occupy as many bus words as
needed. If the packet is wider than the bus and occupies multiple bus
words, the cores shift less often than once every bus shift cycle, but
it will be possible to drive all the cores needed. In this example with
9-bit packets and an 8-bit bus, the blocks shift approximately every
bus/tester clock cycle. Occasionally, a block may omit shifting in a
given cycle because it has to wait to acquire all the bits it needs for
one shift cycle. If the bus is 1 bit wide instead of 8 bits wide, it
takes 9 tester cycles to scan in each packet.

There are cases where the packet size is less than the total number of
scan channels. Assume that in the example in figure 7, block A has a
much smaller number of test cycles than block B. In this case, to
optimize the overall bandwidth, rather than allocating 5 bits per packet
to block A, a smaller number (for instance 3) is allocated. This reduces
the packet size and optimizes the overall test time.
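
The effect of allocating fewer bits per packet to block A can be
estimated with a rough model. The sketch below assumes that a block
needs (total shifts x channels) bits in all, that the stream runs until
the slowest block has received its data, and that capture cycles can be
ignored. The shift counts are hypothetical and the function names are
illustrative only.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer division rounded up."""
    return -(-a // b)


def bus_cycles(shifts: dict[str, int], channels: dict[str, int],
               bits_per_packet: dict[str, int], bus_width: int) -> int:
    """Bus cycles needed to stream every block's scan data."""
    packet_size = sum(bits_per_packet.values())
    # The stream must continue until the slowest block has all its bits.
    packets = max(
        ceil_div(shifts[b] * channels[b], bits_per_packet[b]) for b in shifts
    )
    return ceil_div(packets * packet_size, bus_width)


channels = {"A": 5, "B": 4}
shifts = {"A": 60_000, "B": 100_000}  # hypothetical total shift counts

full = bus_cycles(shifts, channels, {"A": 5, "B": 4}, bus_width=8)
throttled = bus_cycles(shifts, channels, {"A": 3, "B": 4}, bus_width=8)
print(full, throttled)  # 112500 87500
```

With these hypothetical numbers, the packet count stays at 100,000 in
both cases, but the packet shrinks from nine bits to seven, so the same
test completes in fewer bus cycles.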

Figure 7: Simultaneous testing of two blocks.
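
The folding of nine-bit packets into eight-bit bus words can be
reproduced with integer arithmetic. This is a simplified sketch with
hypothetical function names: it treats the payload as one continuous
bit stream and counts a shift only when a whole packet has arrived,
whereas in the description above each block shifts as soon as it has
its own bits.

```python
def packet_positions(packet_index: int, packet_bits: int = 9,
                     bus_width: int = 8) -> list[tuple[int, int]]:
    """(bus word, bit within word) for each bit of one packet in the stream."""
    start = packet_index * packet_bits
    return [divmod(start + i, bus_width) for i in range(packet_bits)]


def bits_per_word(packet_index: int, packet_bits: int = 9,
                  bus_width: int = 8) -> dict[int, int]:
    """How many bits of a packet land in each bus word it touches."""
    counts: dict[int, int] = {}
    for word, _ in packet_positions(packet_index, packet_bits, bus_width):
        counts[word] = counts.get(word, 0) + 1
    return counts


def shifts_completed(bus_cycles: int, packet_bits: int = 9,
                     bus_width: int = 8) -> int:
    """Whole packets delivered after a number of bus cycles."""
    return bus_cycles * bus_width // packet_bits


print(bits_per_word(0))  # {0: 8, 1: 1}
print(bits_per_word(1))  # {1: 7, 2: 2}
print([shifts_completed(n) for n in range(1, 10)])
# [0, 1, 2, 3, 4, 5, 6, 7, 8]
print(shifts_completed(9, bus_width=1))  # 1
```

Nine bus cycles deliver eight complete packets, which matches the
statement that the blocks shift approximately every bus cycle and
occasionally skip one.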



About Siemens Digital Industries Software
Siemens Digital Industries Software is driving transformation to enable
a digital enterprise where engineering, manufacturing and electronics
design meet tomorrow. Our solutions help companies of all sizes create
and leverage digital twins that provide organizations with new insights,
opportunities and levels of automation to drive innovation. For more
information on Siemens Digital Industries Software products and
services, visit siemens.com/software or follow us on LinkedIn, Twitter,
Facebook and Instagram. Siemens Digital Industries Software – Where
today meets tomorrow.

Siemens Digital Industries Software

Headquarters
Granite Park One
5800 Granite Parkway
Suite 600
Plano, TX 75024
USA
+1 972 987 3000

Americas
Granite Park One
5800 Granite Parkway
Suite 600
Plano, TX 75024
USA
+1 314 264 8499

Europe
Stephenson House
Sir William Siemens Square
Frimley, Camberley
Surrey, GU16 8QD
+44 (0) 1276 413200

Asia-Pacific
Unit 901-902, 9/F
Tower B, Manulife Financial Centre
223-231 Wai Yip Street, Kwun Tong
Kowloon, Hong Kong
+852 2230 3333

siemens.com/software
© 2021 Siemens. A list of relevant Siemens trademarks can be found here. Other trademarks
belong to their respective owners.
82735-C5 06/2021 BM
