All My X's Come From Texas Not!!: Matt Weber Jason Pecor
Matt Weber
Jason Pecor
ABSTRACT
Undertaking gate level simulation can put you on the fast track to confusion, frustration, and
language more colorful than an X infested simulation waveform. This paper will look at the
reasons for doing gate level simulation anyway, describe design and library "features" that can
cause gate level simulation problems, and discuss potential solutions to these problems.
Table of Figures
Figure 1: Simple Clock Divider
Figure 2: Clock Divider with Reset (won't work)
Figure 3: Transfer From Fast Clock to Divide By 2 Clock
Figure 4: Race Condition in Transfer to Divide By 2 Clock Domain
Figure 5: SDF Fixes Race Condition to Divide By 2 Clock Domain
Figure 6: Reset Close to Target Register
Figure 7: Synthesis Result
Figure 8: AND-OR Combination Prevents Reset From Clearing X's
Figure 9: Logic Optimization Fixes X Problem
Figure 10: Even Better Logic Optimization
Figure 11: Simple Flop to Flop Path
Figure 12: “Creative” Logic for Simple Flop to Flop Path
Figure 13: Synchronizer Circuit
Some design teams use gate level simulation only in a zero-delay, ideal clock mode to check that
the design can come out of reset cleanly or that the test structures have been inserted properly.
Other teams do fully back annotated simulation as a way to check that the static timing
constraints have been set up correctly. In all cases, getting a gate level simulation up and running
is generally accompanied by a series of challenges so frustrating that they precipitate a shower of
adjectives as caustic as those typically directed at your most unreliable internet service provider.
There are many sources of trouble in gate level simulation. This paper will look at examples of
problems that can come from your library vendor, problems that come from the design, and
problems that can come from synthesis. It will also look at some of the additional challenges that
arise when running gate level simulation with back annotated SDF.
When using the -y option to point to a directory of library cell models, the module name and
filename must match exactly. Sometimes you may find that they do not match; for example, the
module name is uppercase and the filename is lowercase. In these situations you must work
around the problem by using the -v option to point directly to the desired file(s). You should also
fix the problem at its source by informing your technology vendor of the troubles they are
causing you.
Some technology libraries are set up with separate Verilog files for each library cell, but the
modules depend on user defined primitives which are collected in another file. In these cases, the
-y option can be used to get the library cell models, but the -v option will also be needed to point
to the file containing the user defined primitives.
Instead of directly including these -y and -v options on the VCS command line, they are often
put together into a technology specific file that can be referenced with the VCS -f option.
+libext+.v
-y /techlibs/your_technology/v6.0/verilog/
-v /techlibs/your_technology/v6.0/verilog/cell_udps.v
-y /techlibs/your_technology/v6.0_rams/verilog/
-y /techlibs/your_technology/v6.0_custom/verilog/
Typically, the specify blocks of the modules are set up in one of two ways:
1. All library cells are given specify block delays of 1ns as shown for the buffer above. This
arrangement will almost certainly cause simulation failures. If your simulation is running
with a 250MHz clock (4ns cycle time), any logic path with more than four gates will not
finish its updates in time for the next clock cycle to begin. Simulations with this problem
can often be difficult to debug as the logic at first appears to be engaged in all kinds of
interesting, creative, and nonsensical behavior. Adding +nospecify to the VCS command
line (or the technology specific options file described in section 2.1) is the correct way to
work around this problem.
2. The flops in the library may be given specify delays of 1ns or 0.1ns, but all combinational
cells are given specify delays of 0ns. This type of library setup can usually be run without
using the +nospecify option. The +nospecify option may still be needed if your clock is
faster than the specify time. Even if the +nospecify option is not required, you may want
to use it anyway because it does improve simulation performance.
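As a point of reference, a library buffer with a unit specify delay might look like the following sketch. The cell name BUF_SKETCH and its pin names are hypothetical and are not taken from any vendor library.

```verilog
`timescale 1ns/1ps

// Hypothetical library cell: the pin-to-pin path is given a 1ns delay.
module BUF_SKETCH (Z, A);
  output Z;
  input  A;

  buf (Z, A);                 // functional model

  specify
    (A => Z) = (1.0, 1.0);    // rise and fall delay in ns (placeholder values)
  endspecify
endmodule
```

A chain of cells like this adds 1ns per stage, so only a handful of stages fit inside a 4ns clock period.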
In some cases, the specify blocks are included inside of `ifdef statements and can be avoided by
using the appropriate `define. The define can be set in either your testbench, or on the VCS
command line (or technology specific options file) with the +define+ command line option.
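A minimal sketch of a cell whose specify block is guarded this way is shown here. The cell name and the macro name NO_SPECIFY_SKETCH are assumptions for illustration; the real macro name has to be read from the library source.

```verilog
`timescale 1ns/1ps

// Hypothetical cell whose specify block can be compiled out.
// NO_SPECIFY_SKETCH is an assumed macro name, not a vendor standard.
module INV_SKETCH (Z, A);
  output Z;
  input  A;

  not (Z, A);                 // functional model

`ifndef NO_SPECIFY_SKETCH
  specify
    (A => Z) = (1.0, 1.0);    // only compiled when the macro is undefined
  endspecify
`endif
endmodule
```

With a model written like this, adding +define+NO_SPECIFY_SKETCH to the command line or options file would remove the delays.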
While this model is functionally correct, it can be pessimistic in its propagation of X’s. When the
D0 and D1 inputs are both 1’b1, the output of the mux should be 1’b1, regardless of the state of
the SD input. However, with the above code, a 1’bx on the SD will cause a 1’bx on the Z output,
even if D0 and D1 are both 1’b1. The following two lines are typically added to the UDP table
for a mux:
// D0 D1 SD : Z
0 0 x : 0 ;
1 1 x : 1 ;
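For context, a complete mux UDP containing those two rows could look like the following sketch. The primitive name and the select polarity (SD high selects D1) are assumptions.

```verilog
// Hypothetical 2:1 mux UDP; SD = 0 selects D0 and SD = 1 selects D1.
primitive udp_mux2_sketch (Z, D0, D1, SD);
  output Z;
  input  D0, D1, SD;

  table
    // D0 D1 SD : Z
       0  ?  0  : 0 ;
       1  ?  0  : 1 ;
       ?  0  1  : 0 ;
       ?  1  1  : 1 ;
       0  0  x  : 0 ;   // select unknown, but both data inputs agree
       1  1  x  : 1 ;   // select unknown, but both data inputs agree
  endtable
endprimitive
```

Any input combination not listed in the table still produces an X on Z, so an unknown select with differing data inputs remains unknown.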
While we have never seen a vendor’s library with pessimistic X propagation on a multiplexer, we
have seen this problem on more complex gates. Sometimes an element needed to be dropped
from the logic equation as in this example:
In all cases, the ultimate fix was obtained by having the technology vendor modify their library
models.
One example of a register that may need help to remove its initial X’s is a clock divider. A clock
divider is created by having a register feed back into itself through an inverter.
Figure 1: Simple Clock Divider
Figure 2: Clock Divider with Reset (won't work)
`ifdef EN_START_VALS
initial Q = 1'b0;
`endif
endmodule // SLE_CLKDIV_FLOP
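A minimal sketch of a complete RTL clock divider flop using the same guarded initial block follows. The module name, the port names CLK and Q, and the use of a blocking assignment are assumptions for illustration.

```verilog
// Hypothetical stand-in for an RTL clock divider flop.
module clkdiv_flop_sketch (CLK, Q);
  input  CLK;
  output Q;
  reg    Q;

  // Divide by 2: the register feeds back into itself through an inverter.
  always @(posedge CLK)
    Q = ~Q;

`ifdef EN_START_VALS
  // Simulation-only start value. Without it Q stays X, because ~X is X.
  initial Q = 1'b0;
`endif
endmodule
```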
After synthesis, the initial block will no longer exist, and some other method is required to force
an initial condition into the register. If the library model for the flop uses a reg construct, the
Verilog assign statement can be used at time zero to set an initial condition on the register. Most
technology libraries, however, define the functionality with a user defined primitive (UDP) that
does not have a reg that can be assigned. The following “gate level kicker” code can be used to
initialize such registers.
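A minimal sketch of such a kicker is given here, assuming a testbench module named testbench, a design instance named tx_phy, and a flattened flop instance named clkdiv_reg/U1. The UDP, the DFF cell, the module tx_phy_sketch, and the release timing are all hypothetical stand-ins.

```verilog
`timescale 1ns/1ps

// Hypothetical UDP-based flop: there is no reg that a testbench could assign.
primitive udp_dff_sketch (q, d, clk);
  output q;
  reg    q;
  input  d, clk;
  table
    // d  clk  : q : q+
       0  (01) : ? : 0 ;
       1  (01) : ? : 1 ;
       ?  (?0) : ? : - ;   // ignore falling clock edges
       *  ?    : ? : - ;   // ignore data changes on a steady clock
  endtable
endprimitive

module DFF (Q, D, CLK);
  output Q;
  input  D, CLK;
  udp_dff_sketch u0 (Q, D, CLK);
endmodule

// Flattened design: the instance name contains "/" and must be escaped.
module tx_phy_sketch (clk, clk_divby2);
  input  clk;
  output clk_divby2;
  wire   div_d;
  not U0 (div_d, clk_divby2);
  DFF \clkdiv_reg/U1 (.Q(clk_divby2), .D(div_d), .CLK(clk));
endmodule

module testbench;
  reg  clk;
  reg  kick_val;
  wire clk_divby2;

  tx_phy_sketch tx_phy (.clk(clk), .clk_divby2(clk_divby2));

  // Clock is held low for 20ns, then runs freely.
  initial begin
    clk = 1'b0;
    #20 forever #5 clk = ~clk;
  end

  // Gate level kicker: override the flop output with a random 0 or 1.
  initial begin
    kick_val = $random;   // only the least significant bit is kept
    force testbench.tx_phy. \clkdiv_reg/U1 .Q = kick_val;
    @(posedge clk);       // the flop captures ~kick_val, a known value
    #1 release testbench.tx_phy. \clkdiv_reg/U1 .Q;
  end

  initial #100 $finish;
endmodule
```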
With this code, the actual internal state of the register has not been changed. However, by
overriding the Q (and/or the QN) output, the initial X in this register is not allowed to propagate
to downstream logic. The value is chosen randomly to ensure that the downstream logic is not
making any assumptions about the initial state of this register. Because of the clock divider’s
feedback loop, the D input to the register now has a non-X value on it, and this will clear the
register when the clock starts running.
You can also “kick” the D input of the register instead of the Q and/or QN outputs. This will
initialize the register when the clock starts. Until then, however, it will still allow the initial X to
propagate to downstream logic.
The gate level kicker code above also includes a hint that could save you a few hours of
frustrating trial and error debug time. VCS uses a “.” character to separate levels of hierarchy in a
design. When flattening a design using Design Compiler and other tools, the names of the gates
in the flattened design often include the previously hierarchical instance names and a “/”
character is often used between these hierarchical instance names (for example “clkdiv_reg/U1”
above). In Verilog, a “/” character is only valid in an escaped identifier. An escaped identifier
starts with a “\” character and ends with a space, tab, or newline. VCS further requires a space in
front of the “\” character that starts the escaped identifier.
In the gate level kicker code above, the testbench instantiates a design as tx_phy. This tx_phy
design has been flattened, so the name of the clock divider flop is now “clkdiv_reg/U1”. There
are several methods to incorrectly reference pins on this flop, and only one correct method.
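The difference between the forms can be sketched as follows. The stub modules are hypothetical; only the instance names tx_phy and clkdiv_reg/U1 come from the text, and the comments describe standard Verilog parsing rather than the behavior of any specific tool version.

```verilog
// Hypothetical stub cell and flattened design, for illustration only.
module DFF_STUB (Q, D, CLK);
  output Q;
  input  D, CLK;
  reg    q_int;
  always @(posedge CLK) q_int <= D;
  buf (Q, q_int);
endmodule

module flat_sketch (clk);
  input clk;
  wire  q, d;
  not U0 (d, q);
  DFF_STUB \clkdiv_reg/U1 (.Q(q), .D(d), .CLK(clk));
endmodule

module testbench;
  reg clk;
  flat_sketch tx_phy (.clk(clk));

  initial begin
    clk = 1'b0;
    // Not valid:  testbench.tx_phy.clkdiv_reg/U1.Q
    //   Without the backslash, "/" is read as the divide operator.
    // Not valid:  testbench.tx_phy.\clkdiv_reg/U1.Q
    //   No whitespace after U1, so ".Q" becomes part of the escaped name.
    // Form described in the text: a space before "\" and after the name.
    #1 $display("Q = %b", testbench.tx_phy. \clkdiv_reg/U1 .Q);
  end
endmodule
```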
Figure 3: Transfer From Fast Clock to Divide By 2 Clock
In RTL simulations, regA and regC are typically described behaviorally, with nonblocking
assignments. The clock divider is either described with a blocking assignment, or with the
instantiation of a particular gate from the target library. In either case the simulator will update
the value of the clock divider before updating the regA and regC outputs. In gate level simulation
however, regA, regC, and the clock divider register can be executed in any order. If the simulator
chooses to execute regA, then the clock divider, then regC, the data flowing through this simple
pipeline will be corrupted. In the following example, the data coming out of {regA,regC} is
00,11,00. However, regA executes before the clock divider and regC executes after the clock
divider, and the data coming out of {regB,regD} is 00,10,01. Debugging a problem like this can
take many hours if you have not seen it before.
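The RTL arrangement described above can be sketched as follows. The module name and the signal names, loosely based on the figure labels, are illustrative; this is not the original design.

```verilog
// Hypothetical RTL version of the fast_clk to slow_clk transfer.
module clk_xfer_rtl_sketch (fast_clk, reg_a_in, reg_c_in, reg_b_out, reg_d_out);
  input  fast_clk, reg_a_in, reg_c_in;
  output reg_b_out, reg_d_out;

  reg slow_clk;
  reg reg_a, reg_c;   // fast_clk domain
  reg reg_b, reg_d;   // slow_clk domain

  initial slow_clk = 1'b0;   // simulation-only start value

  // Blocking assignment: slow_clk changes before any nonblocking update lands.
  always @(posedge fast_clk)
    slow_clk = ~slow_clk;

  always @(posedge fast_clk) begin
    reg_a <= reg_a_in;
    reg_c <= reg_c_in;
  end

  // Triggered in the same time step, but still samples the old reg_a and reg_c,
  // because their nonblocking updates have not been applied yet.
  always @(posedge slow_clk) begin
    reg_b <= reg_a;
    reg_d <= reg_c;
  end

  assign reg_b_out = reg_b;
  assign reg_d_out = reg_d;
endmodule
```

In the gate level netlist each of these registers becomes a separate cell instance, and the ordering that the blocking assignment provided here is no longer guaranteed.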
Figure 4: Race Condition in Transfer to Divide By 2 Clock Domain
There are a few different methods for dealing with this problem. The first possibility is to avoid
the problem through careful design. If the fast_clk registers update on the cycles where slow_clk
is falling and hold their values on the cycles when slow_clk is rising, the problem will be
avoided. This constraint does not need to be applied to all fast_clk registers, just those fast_clk
registers that are sending their data into the slow_clk clock domain. Achieving this kind of
synchronization between the clock domains is sometimes difficult and requires advance planning
during RTL design.
Another design change that can be used to avoid the problem is to use a falling edge triggered
flip flop for the clock divider. This guarantees that the rising edges of the original and divided
clocks will be at different times and will therefore eliminate the race condition. While this
solution can make testability and timing closure more difficult, it is a relatively easy change to
make in RTL.
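A minimal RTL sketch of the falling edge divider, with hypothetical module and port names:

```verilog
// Hypothetical divide-by-2 clocked on the falling edge of fast_clk.
module clkdiv_negedge_sketch (fast_clk, slow_clk);
  input  fast_clk;
  output slow_clk;
  reg    slow_clk;

  initial slow_clk = 1'b0;   // simulation-only start value

  // slow_clk toggles on falling edges of fast_clk, so its rising edges
  // never coincide with the rising edges that clock the fast_clk registers.
  always @(negedge fast_clk)
    slow_clk = ~slow_clk;
endmodule
```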
If the problem has not been avoided through the design, there are a few possible methods for
working around it in gate level simulation. Each of these methods seeks to ensure that the
registers in the slow_clk domain will not see a rising clock edge until after all of the fast_clk
domain registers have been updated. Since there is no way to guarantee when the clock divider
register will update relative to the other fast_clk registers, some delay needs to be added between
the clock divider register and the clock inputs of the slow_clk registers. One way to do this is to
use an SDF file to create a CLK->Q delay on the clock divider register. The SDF file may look
like this:
(DELAYFILE
  (SDFVERSION "3.0")
  (TIMESCALE 1ns)
  (CELL
    (CELLTYPE "DFF")
    (INSTANCE clkdiv_reg\/U1)
    (DELAY
      (ABSOLUTE
        (IOPATH CLK Q (0.2))
      )
    )
  )
)
VCS uses the $sdf_annotate system task to apply the SDF to the netlist.
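A minimal sketch of the call, assuming the SDF above is saved as clkdiv.sdf (a hypothetical file name) and the design is instantiated as tx_phy. The DFF stub and tx_phy_sketch module are illustrative stand-ins; the stub carries a specify path so that the IOPATH entry has something to annotate.

```verilog
`timescale 1ns/1ps

// Hypothetical cell stub with a CLK to Q specify path.
module DFF (Q, D, CLK);
  output Q;
  input  D, CLK;
  reg    q_int;
  always @(posedge CLK) q_int <= D;
  buf (Q, q_int);
  specify
    (CLK => Q) = (0, 0);   // placeholder delays, replaced by the SDF values
  endspecify
endmodule

module tx_phy_sketch (clk, clk_divby2);
  input  clk;
  output clk_divby2;
  wire   div_d;
  not U0 (div_d, clk_divby2);
  DFF \clkdiv_reg/U1 (.Q(clk_divby2), .D(div_d), .CLK(clk));
endmodule

module testbench;
  reg  clk;
  wire clk_divby2;

  tx_phy_sketch tx_phy (.clk(clk), .clk_divby2(clk_divby2));

  // Annotate at time zero; INSTANCE paths in the SDF are relative to tx_phy.
  initial $sdf_annotate("clkdiv.sdf", tx_phy);

  initial begin
    clk = 1'b0;
    forever #2 clk = ~clk;
  end

  initial #40 $finish;
endmodule
```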
When used by the testbench, this SDF file will create 200ps of CLK->Q delay on the divider
flop, ensuring that the fast_clk registers are all updated before the slow_clk registers see a clock
edge.
Figure 5: SDF Fixes Race Condition to Divide By 2 Clock Domain
Another potential workaround is typically available after the clock trees have been built. If the
slow_clk clock tree starts with a single clock buffer at the root of the tree, that clock buffer can
be given a nominal delay without the overhead of using an SDF file. In the gate level kicker code
of section 3.1, a constant value was used in a Verilog force statement. The force statement,
however, can also reference a wire or other variable, and when that variable changes, the force
statement will be reevaluated. Adding delay to the slow_clk path can then be as simple as forcing
some delay on the input of the root clock buffer.
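A minimal sketch of this approach, assuming a 200ps delay and hypothetical net and instance names. In this sketch the only logic ahead of the root buffer is a plain buffer, so forcing a delayed copy of the divider output preserves the function of the path.

```verilog
`timescale 1ns/1ps

// Hypothetical flattened design; net and instance names are illustrative only.
module tx_phy_sketch (fast_clk, slow_clk);
  input  fast_clk;
  output slow_clk;
  reg    div_q;      // stand-in for the clock divider flop output
  wire   root_in;    // net feeding the root clock buffer

  initial div_q = 1'b0;
  always @(posedge fast_clk) div_q = ~div_q;

  buf U2            (root_in, div_q);      // stand-in for logic ahead of the root
  buf slow_clk_root (slow_clk, root_in);   // root of the slow_clk tree
endmodule

module testbench;
  reg  fast_clk;
  wire slow_clk;

  // 200ps delayed copy of the divider output.
  wire #0.2 div_q_dly = testbench.tx_phy.div_q;

  tx_phy_sketch tx_phy (.fast_clk(fast_clk), .slow_clk(slow_clk));

  initial begin
    fast_clk = 1'b0;
    forever #2 fast_clk = ~fast_clk;
  end

  // The force is re-evaluated each time div_q_dly changes.
  initial force testbench.tx_phy.root_in = div_q_dly;

  initial #40 $finish;
endmodule
```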
The disadvantage of this method is the care that must be taken when creating this force
statement. If there is logic between the clock divider flop and the root of the clock tree (such as a
bypass mux for testability), the force statement must account for that functionality, otherwise the
functionality of the design will have been changed and the simulation results may not be
accurate.
And the resulting gates from Design Compiler (v2003.06-SP1) for bit 4 of this register looked
like this:
Figure 8: AND-OR Combination Prevents Reset From Clearing X's
With this logic structure, the reset signal is not able to overcome X’s in the logic cloud and
successfully reset the register. With a little Boolean algebra, however, we find that the B1 path
is completely unnecessary and could have been tied to 1’b1:
Figure 10: Even Better Logic Optimization
Either of these two modifications will allow the register to reset properly and the gate level
simulation to run better. Of course, manual edits of a gate level netlist are not a popular addition
to the design flow. Instead, we have reverted to Design Compiler 2003.03 where the problem has
not been seen (yet), and we will report the problem to Synopsys Support.
It was a simple pipeline stage with no logic between the flops. The expected synthesis output was
this:
Figure 11: Simple Flop to Flop Path
Figure 12: “Creative” Logic for Simple Flop to Flop Path
The purpose of SDF simulation is to prove that the design will run at the specified operating
frequency while modeling expected timing delays in the circuit. When setup and hold check
error messages or functional failures generated during simulation are true failures, they indicate
areas of your static timing scripts and constraints that need improvement. Unfortunately, timing-
based simulations can also generate false errors so egregious that they will have you thinking
about what size box you want for your personal effects and where you would like to eat your
farewell lunch.
Typically, a timing failure will not only result in a message in the error log, but it will also
propagate an unknown value on to the register output. Obviously, if a particular register is
designed such that it is not intended to meet timing, this behavior will likely introduce terminal
cases of X propagation. Some libraries will allow for modification of this behavior by enabling
certain defines. In other cases, unique, hand-modified local libraries may be required to handle
the problem. In either case, understanding that the issue might develop is a step toward
mitigating the time lost debugging this category of problems.
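A minimal sketch of how a library flop might turn a timing violation into an X, and how a define could switch that behavior off. The cell name, the UDP, the 0.1ns limits, and the macro name NO_TIMING_X_SKETCH are all hypothetical; vendor models differ in detail.

```verilog
`timescale 1ns/1ps

// Hypothetical flop UDP with a notifier input: any change on ntf drives q to X.
primitive udp_dff_ntf_sketch (q, d, clk, ntf);
  output q;
  reg    q;
  input  d, clk, ntf;
  table
    // d  clk  ntf : q : q+
       0  (01) ?   : ? : 0 ;
       1  (01) ?   : ? : 1 ;
       ?  (?0) ?   : ? : - ;   // ignore falling clock edges
       *  ?    ?   : ? : - ;   // ignore data changes on a steady clock
       ?  ?    *   : ? : x ;   // timing violation corrupts the output
  endtable
endprimitive

module DFF_TCHK_SKETCH (Q, D, CLK);
  output Q;
  input  D, CLK;
  reg    notifier;   // toggled by the simulator when a timing check fails

`ifdef NO_TIMING_X_SKETCH
  // Violations are still reported, but the notifier is not wired to the UDP.
  udp_dff_ntf_sketch u0 (Q, D, CLK, 1'b0);
`else
  udp_dff_ntf_sketch u0 (Q, D, CLK, notifier);
`endif

  specify
    $setuphold(posedge CLK, D, 0.1, 0.1, notifier);
  endspecify
endmodule
```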
Figure 13: Synchronizer Circuit
Multi-cycle paths are another design element that can create false errors in timing based
simulations. In this case, the designer understands the need for more than one clock cycle to
propagate through a logic cloud, and the STA environment will be appropriately set up to handle
the multi-cycle requirement. Unfortunately, the simulation model checks could still recognize
and flag timing violations. Similar to the synchronization registers, the endpoints of multi-cycle
paths generally need to have their setup and hold checks removed from the SDF file.
An alternative is to allow the setup/hold check to fail and the resulting X to propagate through the
design. If the path is truly a multi-cycle path, the register output should not be used during the
cycle when it is at an X value, and allowing the register to go to X during that cycle provides an
extra check to ensure that the functionality truly operates this way. When using this approach,
some post processing of the log file is typically required to filter out the timing violations that
have been accepted on the multi-cycle paths.