Siemens SW Streaming Scan Network WP 82735 C7
Executive summary
Tessent™ Streaming Scan Network (SSN) is a system for packetized
delivery of scan test patterns. It enables simultaneous testing of any
number of cores with few chip-level pins and reduces test time and test
data volume. With SSN, DFT engineers have a true SoC DFT solution
without compromises between implementation effort and manufacturing
test cost.
siemens.com/software
White paper | Tessent Streaming Scan Network
Figure 1: The Hierarchical DFT Compromise: Implementation effort vs. bandwidth management.
Figure 2: SSN enables a true bottom-up flow with multiple test cost reduction capabilities.
In addition to the planning, implementation, and test cost benefits, the SSN architecture eases routing and timing closure and is fully compatible with tile-based design with abutment.

This is all made possible while still supporting all ATPG pattern types and fault models. SSN is also compatible with all Tessent DFT methodologies and products, and has full support for diagnosis and yield analysis.

SSN technology and concepts
SSN is a bus-based scan data distribution architecture. Figure 3 shows a simple example with a 6-core design that uses SSN. Each core typically contains one Streaming Scan Host (SSH) node (light blue box). The SSH drives local scan resources with data delivered on the SSN bus. Although just a single compressor/compactor is shown in the figure, the SSH node can interface with one or more Tessent TestKompress embedded deterministic test (EDT) controller(s), uncompressed scan chains, or a combination of the two.

Figure 3: SSN used in a 6-core design.

Each SSH has two external interfaces: an IEEE 1687 IJTAG interface and a parallel SSN data bus. The IJTAG network is used to configure all nodes in the SSN network before the application of scan test patterns. Each node is loaded with information related to the protocol such as the active bus width, its location in the series of nodes driven, the number of shift cycles per scan pattern, scan_enable transition timing information, etc.

Following this setup, the entire scan test pattern set is applied as packetized data that is streamed on the parallel SSN bus. The SSHs are programmed just once per pattern set and only the scan payload is streamed following the setup. There is no need to send any opcode or address information with each packet. Each SSH controls the local scan operations for the core, including transitions between load/unload and capture stages. All scan signals and EDT controls are generated by the SSH local to the core.

The SSN bus width is selected based on chip-level pin availability and is independent of the number and size of the scanned cores, and the number of channels needed by the EDT controller(s) in each core. With the same parallel bus width, each core has the same plug-and-play interface, allowing SSN to scale efficiently as the design floorplan, number of cores, or the content of the cores change.

With SSN, the bus width and channel count of the cores are independent of each other. Scan test data is delivered synchronously across the bus in a packet format to each core. The number of bits a core can receive per packet is algorithmically determined from the pattern statistics of the cores running concurrently, but cannot be greater than the number of scan (EDT) channels. The data delivered from the tester may be viewed as a continuous stream of packets that may wrap around SSN bus boundaries.

Consider the example shown in figure 4 where two blocks are being tested at the same time. Block A has five scan channels and loads/unloads five bits per shift cycle. Block B has four channels. For both blocks to perform one shift cycle, nine bits have to be delivered. In a conventional pin-mux scan access method, this would have required nine chip-level scan input pins and nine scan output pins. With SSN, the packet size is set to nine bits independent of the SSN bus width, which is eight bits in this example. The concept of packetized data delivery in SSN is further explained in the Appendix.

Figure 4: Testing two blocks at the same time.
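The arithmetic behind the figure 4 example can be sketched in a few lines of Python. This is only an illustration: the function name `ssn_packet_summary` and its dictionary keys are hypothetical, and it assumes every block is allocated its full channel count per packet (no throttling).

```python
from fractions import Fraction


def ssn_packet_summary(channels_per_block, bus_width):
    """Summarize packet size and pin cost for blocks tested concurrently.

    Assumption: each block receives as many bits per packet as it has
    scan (EDT) channels, which is the upper bound stated in the text.
    """
    packet_bits = sum(channels_per_block.values())
    return {
        # Bits needed for every active block to shift once.
        "packet_bits": packet_bits,
        # A pin-mux scheme needs one input and one output pin per channel.
        "pin_mux_scan_inputs": packet_bits,
        "pin_mux_scan_outputs": packet_bits,
        # Average number of bus cycles consumed per internal shift cycle.
        "bus_cycles_per_shift": Fraction(packet_bits, bus_width),
    }


summary = ssn_packet_summary({"A": 5, "B": 4}, bus_width=8)
print(summary)
```

For blocks A and B this gives a 9-bit packet and an average of 9/8 bus cycles per internal shift, so the packet size follows from the channel counts while the bus width only sets how fast packets arrive.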
The ability to route the bus carrying the data from one core to the next while dynamically controlling which cores are active/inactive/bypassed means one has flexibility in accessing any combination of cores without changing the hardware. Unlike pin-mux architectures, this flexibility does not come at the expense of routing congestion. There is no need to try and predict at design time how to group cores that are to be tested concurrently. Whether performing ATPG on groups of cores or retargeting patterns from different cores, the same SSN network can provide access to one core at a time, all cores simultaneously, or anything in-between.

Time multiplexing
When driving multiple cores concurrently, the packet typically spans multiple bus widths, resulting in an internal shift frequency slower than the SSN bus frequency. In many cases, it is possible to implement a 400 MHz SSN bus, but not possible to shift data through the chip-level pins at more than 200 MHz. Assume that the SoC has enough pins to implement 64 scan inputs and 64 scan outputs.

One option would be to implement a 64-bit bus throughout the chip and operate it at 200 MHz. Alternatively, as shown in figure 5, the data can be scanned into the chip through 64 pins at 200 MHz and a bus frequency multiplier (BFM) added between the scan inputs and the first SSH to convert this input stream to a 32-bit, 400 MHz bus. This 32-bit bus is then used across the chip, connecting all SSH nodes with 32-bit buses. On the output side, a bus frequency divider (BFD) node is added to convert the SSN output bus back to a 200 MHz 64-bit bus driving the output pins. Both options carry the same bandwidth (64 bits x 200 MHz = 32 bits x 400 MHz), but the second halves the number of bus wires routed between cores.

Optimizing test time and data volume
When ATPG is run with multiple interacting cores in external test mode, it is necessary to align capture cycles of all the affected cores. With SSN, the SSHs are programmed such that each core can shift independently, but capture occurs concurrently once all cores have completed scan load/unload.

In other situations, such as when ATPG for wrapped cores with OCCs is run in isolation, it is more effective if each core can independently transition between shift and capture. A core with short scan chains should not need to wait for other cores to complete shifting before it can capture. Often there are significant imbalances in the pattern counts of different cores.

Traditional retargeting methods add padding for the cores with fewer patterns, resulting in wasted data, cycles, and test time. If a core requires many fewer overall shift cycles across a pattern set than other cores, it can be sent fewer bits per packet.

For example, a core with 4 channels does not need to be allocated 4 bits per packet. It can be throttled down and sent only 1 bit per packet such that it shifts internally every four packets instead of every packet. The result is that the total number of packets remains the same, but the size of the packets is reduced, speeding up the overall test time.

Effective testing of identical cores
Many SoCs that achieve high throughput by parallelizing processing contain a number of cores that are replicated multiple times. In pin-mux scan architectures, the scan inputs may be broadcast to identical core instances, but the scan outputs are usually observed independently to ensure lossless mapping and observability for diagnosis. SSN provides a scalable method for testing any number of identical core instances in near-constant test time, independent of the number of available chip-level pins.

Input data, expected responses, and compare/nocompare mask data are scanned in within each packet, as illustrated in figure 6. The same packet data is used by each identical core instance, as it synchronously moves through the network. Each core performs its own on-chip comparison. A pass/fail “sticky” bit is observed on TDO. The optional accumulated per-shift status can be added to the packet and observed on the SSN outputs.

Tile-based designs
SSN is designed to support the abutment of cores in tile-based designs with no routing outside the cores. The outputs of one core connect to the inputs of the next adjacent core. A chip with SSN usually has a single SSN datapath (parallel bus) that goes through all cores. Depending on the floorplan and pad locations, it may be preferable for physical design to implement multiple, physically independent datapaths. Each datapath is also configurable and can include muxes that can be programmed to include or exclude segments of the network similar to the Segment Insertion Bit (SIB) in IJTAG networks.
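The on-chip comparison used for identical core instances can be illustrated with a small behavioral model. This is a sketch under stated assumptions, not a description of the actual SSH hardware: the class `CoreComparator`, the mask polarity (1 means "compare"), and the sample bit values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CoreComparator:
    """Hypothetical model of one core instance's on-chip compare logic."""

    sticky_fail: bool = False  # once set, stays set for the pattern set

    def compare_shift(self, actual, expected, mask):
        """Compare one shift cycle of unloaded bits against expected values.

        Assumption: a mask bit of 1 means 'compare', 0 means 'no compare'.
        Returns True if this shift cycle produced a mismatch.
        """
        mismatch = any(m and a != e for a, e, m in zip(actual, expected, mask))
        self.sticky_fail = self.sticky_fail or mismatch
        return mismatch


# The same expected/mask data is shared by every identical instance.
expected = [1, 0, 1, 1]
mask = [1, 1, 0, 1]  # third bit is not compared

responses = {
    "core0": [1, 0, 0, 1],  # differs only in the masked bit
    "core1": [1, 1, 1, 1],  # differs in a compared bit
    "core2": [1, 0, 1, 1],  # matches exactly
}

cores = {name: CoreComparator() for name in responses}
for name, actual in responses.items():
    cores[name].compare_shift(actual, expected, mask)

status = {name: core.sticky_fail for name, core in cores.items()}
print(status)
```

Because each instance compares locally against the shared expected data, the amount of data streamed does not grow with the number of instances, which is what makes near-constant test time possible.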
Summary
The SSN technology described in this paper solves many
of the scan distribution challenges in complex SoCs. By
decoupling chip and core level DFT, it enables
concurrent testing of any number of cores with few
chip-level pins, and it has multiple features to reduce
test time and test data volume.
It can test any number of identical core instances in
near-constant time, minimizes padding in the presence
of cores with mismatched pattern counts and/or scan
chain lengths, and enables fast streaming of data to/from
and throughout the chip. It simplifies design
planning and implementation and is especially well
suited for tile-based designs.
The SSN implementation flow is based on Tessent Shell
flow for hierarchical designs. SSN is fully supported by
Tessent TestKompress™ and Tessent Diagnosis, and can
co-exist with all other Tessent DFT technologies such as
Tessent MemoryBIST and Tessent LogicBIST.
Appendix
In SSN terminology, a “packet” refers to all the scan data needed for all the active SSH nodes to perform a single internal scan shift operation. Key to the SSN architecture is that the size of a packet is independent of the SSN bus width. The SSN payload delivered from the tester may be viewed as a continuous stream of packets that may wrap around the SSN bus boundaries.

Consider the example shown in figure 7 where two blocks are tested concurrently. Block A loads/unloads five bits per shift cycle (has five EDT channels). Block B has four channels and loads/unloads four bits per shift cycle. For both blocks to perform one shift cycle, nine bits have to be delivered. In a conventional pin-mux scan access method, this would have required nine chip-level scan input pins and nine scan output pins.

With SSN, the packet size is set to nine bits independent of the SSN bus width, which is eight bits in this example. Nine bits have to be delivered for both blocks to shift once. The first five bits of every nine-bit packet are programmed to belong to block A, and the next four bits of every packet are programmed to belong to block B. This is all determined and programmed at pattern generation time; it is not hard-coded in the SSN logic.

After programming all the SSN nodes using IJTAG, SSN delivers a continuous, repeating stream of 9-bit packets. As soon as block A extracts 5 bits from the bus, it performs one internal shift operation. Likewise for block B, every time it accumulates 4 bits.

The SSH is programmed with the shift count per scan load, so it can identify when to perform shift, and when to perform capture. Capture involves events generated by the SSH such as de-asserting scan_enable, applying

The stream of nine-bit packets is folded into the 8-bit bus with no bits wasted. The first nine-bit packet occupies the first eight-bit parallel word of the bus, and the first bit of the second word (second tester cycle). The locations of the nine-bit packets within each eight-bit bus word rotate with each packet, so that the second packet starts immediately after the first, occupying the remaining seven bits of the second parallel word, and the first 2 bits of the following parallel word. Typically, the same time slots of the packet that carry scan-in data to an SSH node also carry scan-out data from that node. As block A reads the first 5 bits of every packet, it replaces them with 5 bits scanned out (with slight latency).

Any number of internal cores and their channels can be controlled with an SSN bus that is as narrow as one bit. This is because the packets can be as wide as they need to be, and can occupy as many bus words as needed. If the packet is wider than the bus and occupies multiple bus words, the cores shift less often than once every bus shift cycle, but it is still possible to drive all the cores needed. In this example with 9-bit packets and an 8-bit bus, the blocks shift approximately every bus/tester clock cycle. Occasionally, a block may omit shifting in a given cycle because it has to wait to acquire all the bits it needs for one shift cycle. If the bus is 1 bit wide instead of 8 bits wide, it takes 9 tester cycles to scan in each packet.

There are cases where the packet size is less than the total number of scan channels. Assume that in the example in figure 7, block A has a much smaller number of test cycles than block B. In this case, to optimize the overall bandwidth, rather than allocating 5 bits per packet to block A, a smaller number (for instance 3) is allocated. This reduces the packet size and optimizes the overall test time.
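The packet folding described in this appendix can be made concrete with a short Python sketch. It models only bit positions in the continuous stream and ignores the scan-out latency and any pipelining inside the SSH. The function names `packet_layout` and `fold_packets` are hypothetical and not part of any Tessent tool.

```python
def packet_layout(packet_bits, bus_width, packet_index):
    """Return [(bus_word, bits_in_that_word), ...] for one packet."""
    start = packet_index * packet_bits
    end = start + packet_bits  # exclusive
    layout = []
    bit = start
    while bit < end:
        word = bit // bus_width
        word_end = min(end, (word + 1) * bus_width)
        layout.append((word, word_end - bit))
        bit = word_end
    return layout


def fold_packets(allocation, bus_width, num_packets):
    """For each block, list the bus cycle in which each shift's last bit arrives.

    `allocation` maps block name -> bits per packet, in packet order.
    """
    shifts = {name: [] for name in allocation}
    position = 0  # index of the next bit in the continuous stream
    for _ in range(num_packets):
        for name, bits in allocation.items():
            position += bits
            shifts[name].append((position - 1) // bus_width)
    return shifts


# First two 9-bit packets on an 8-bit bus: 8+1 bits, then 7+2 bits.
print(packet_layout(9, 8, 0), packet_layout(9, 8, 1))

# Cycles in which blocks A (5 bits) and B (4 bits) can shift.
print(fold_packets({"A": 5, "B": 4}, bus_width=8, num_packets=8))

# With a 1-bit bus, each 9-bit packet takes 9 tester cycles.
print(fold_packets({"A": 5, "B": 4}, bus_width=1, num_packets=2))
```

In this model, eight packets occupy nine bus cycles, and block A receives no complete set of bits in cycle 4, which corresponds to the occasional omitted shift mentioned above. Calling `fold_packets({"A": 3, "B": 4}, 8, n)` models the throttled case, where the packet shrinks to 7 bits.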
© 2021 Siemens. A list of relevant Siemens trademarks can be found here. Other trademarks
belong to their respective owners.
82735-C5 06/2021 BM