FC - Flow 1


Fusion Compiler

The Fusion Compiler tool synthesizes the RTL descriptions into optimized gate-level designs,
optimizes them to achieve the best quality of results (QoR) in speed, area, and power, and
allows early design exploration and prototyping.

Performing Unified Physical Synthesis

The Fusion Compiler tool performs unified physical synthesis. The benefits of unified physical
synthesis include the following:

Reduced runtime

Unified physical synthesis streamlines the iterations of placement and optimization, and
utilizes the initial placement and buffer trees built during synthesis to achieve the best
runtime.

Improved quality of results (QoR)

The unified preroute optimization framework uses consistent costs in optimization algorithms
to improve QoR. The best technologies, such as logic restructuring, concurrent clock and data
optimization, multibit mapping, and advanced legalizer, are shared throughout the preroute
flow.

The following topics describe how to perform unified physical synthesis:

Using the compile_fusion Command

Performing Prechecks

Generating Verification Checkpoints During Compilation

• Overview of Fusion Compiler Physical Synthesis


Fusion Compiler physical synthesis provides the following benefits:

• Enhanced WordView GTECH (WVGTECH), which represents a set of bits as a single
object, enables bit-slice preservation from RTL through placement and provides
better optimization by using parallel wide-design structures.

• Automatic floorplan generation during RTL synthesis uses a native floorplanning
engine that can generate a complete floorplan or generate missing floorplan data.

• Comprehensive design checking before and after every stage of the flow allows for
early detection of design and floorplan issues.

• Early RTL synthesis can use libraries with missing technology library cells, such as
retention cells, isolation cells, level shifters, and integrated clock-gating cells.

• Tightly coupled DFT synthesis technology supports concurrent synthesis of
functional logic and DFT logic, congestion optimization with placement-aware DFT
codecs, and better handling of the interaction between power intent and DFT logic.

• Unified physical synthesis (UPS) brings unparalleled sharing of optimization engines
between synthesis and place-and-route, which delivers a simplified flow with greater
convergence and correlation.

Figure 1 shows the basic physical synthesis design flow.

Figure 1: Fusion Compiler Physical Synthesis Flow

Using the compile_fusion Command


To perform unified physical synthesis, use the compile_fusion command.
The compile_fusion command consists of the following stages:
initial_map
During this stage, the tool maps the design and performs area optimization.
logic_opto
During this stage, the tool performs logic-based delay optimization.
initial_place
During this stage, the tool merges the clock-gating logic and performs coarse placement.
initial_drc
During this stage, the tool removes existing buffer trees and performs high-fanout-net
synthesis and electrical DRC violation fixing.
initial_opto
During this stage, the tool performs physical optimization and incremental placement.
final_place
During this stage, the tool performs final placement to improve timing and congestion.
final_opto
During this stage, the tool performs final optimization and legalization to improve timing,
power, and logical DRCs.
When you run the compile_fusion command, by default, the tool runs all stages. To run only
some of these stages, use the -from option to specify the stage from which you want to begin
and the -to option to specify the stage after which you want to end. If you do not specify the
-from option, the tool begins from the initial_map stage. Similarly, if you do not specify the
-to option, the tool continues until the final_opto stage is completed.
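The -from/-to rules above fit in a few lines of code. The following Python sketch is only an illustrative model. It encodes the stage order listed in this section, using the logic_opto spelling that appears in the later examples, along with the default start and end stages. It does not invoke the tool, and the function name stages_to_run is hypothetical.

```python
# Illustrative model only: encodes the stage order and the -from/-to
# defaults described above. It does not invoke Fusion Compiler.
STAGES = [
    "initial_map",
    "logic_opto",
    "initial_place",
    "initial_drc",
    "initial_opto",
    "final_place",
    "final_opto",
]


def stages_to_run(from_stage=None, to_stage=None):
    """Return the stages that a compile_fusion -from/-to pair would cover.

    Omitting -from starts at initial_map; omitting -to runs through
    final_opto. Both bounds are inclusive, so -from X -to X runs only X.
    """
    start = STAGES.index(from_stage) if from_stage else 0
    end = STAGES.index(to_stage) if to_stage else len(STAGES) - 1
    if start > end:
        raise ValueError(f"stage {from_stage} comes after stage {to_stage}")
    return STAGES[start:end + 1]


if __name__ == "__main__":
    # Mirrors the two-step example below: stop after initial_opto, then resume.
    print(stages_to_run(to_stage="initial_opto"))
    print(stages_to_run(from_stage="initial_opto"))
```

Because both bounds are inclusive, resuming with -from initial_opto reruns that stage. This matches the "repeat a stage" usage described below.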
Breaking the compile_fusion command into stages allows you to perform different tasks
between the stages, such as changing constraints or settings, generating reports for analysis,
and performing other implementation tasks such as design planning or DFT insertion.
If you make manual changes to the design between stages, ensure that the changes honor the
expected state of the design. For example, the design is expected to be mapped after
the initial_map stage. Therefore, do not introduce unmapped logic to the design after this
stage.
You can repeat a stage; however, ensure that you run all stages in the correct sequence. The
following example runs the compile_fusion command until it completes
the initial_opto stage, performs analysis, changes an application option setting, and restarts
by running the initial_opto stage again.
fc_shell> compile_fusion -to initial_opto
fc_shell> report_qor
fc_shell> set_app_option -name opt.common.advanced_logic_restructuring_mode -value timing
fc_shell> compile_fusion -from initial_opto

• In the following situations, the design can contain unmapped cells after
the compile_fusion command:
Missing the required cells in the library
User-specified restrictions on the required library cells
When a design contains unmapped cells, the compile_fusion command continues the
compilation and provides an estimated area for the unmapped cells.
Unmapped cells are marked with the is_unmapped attribute. To query unmapped cells in a
netlist, use the following command:
fc_shell> get_cells -hierarchical -filter "is_unmapped==true"
Note the following limitations:
DesignWare operators are not mitigated.
Only placement information is stored in the database after mitigation; routing information is
not saved.

• Performing Prechecks
Fusion Compiler can perform compile prechecks to prevent compile failures. When the prechecks
encounter incomplete or inconsistent setup or constraints, the compile_fusion command quickly
exits with error messages. This capability is on by default. You can also run the checks by
using the compile_fusion -check_only command.
The prechecks detect the following conditions:
The design is not linked or has link issues.

A cell is not resolved after linking.

A cell is a physical hierarchy that is not allowed in compile.

A cell is a logical hierarchy that is not allowed in compile.

The design is not present.

• Compile Precheck Examples


The following two examples show the compile logs with and without compile prechecks. When
the -check_only option is set, the compile_fusion command issues error messages for any failed
precheck condition.

Example 1: Compile Log with Compile Prechecks


------------------------- Begin compile flow ----------------------- (FLW-3001)
Compile in progress: 1.0% (FLW-3778)
Error: Layer M1 does not have a preferred direction (OPT-1008)
Error: Layer MRDL does not have a preferred direction (OPT-1008)
Warning: Technology layer 'M1' setting 'routing-direction' is not valid (NEX-001)
Warning: Technology layer 'MRDL' setting 'routing-direction' is not valid (NEX-001)

Example 2: Compile Log Without Compile Prechecks

---------------------- End logic optimization -------------------- (FLW-3001)
Compile in progress: 27.0% (FLW-3778)
Compile in progress: 28.0% (FLW-3778)
Compile in progress: 28.5% (FLW-3778)
Information: Corner max_corner: no PVT mismatches. (PVT-032)
Warning: Technology layer 'M1' setting 'routing-direction' is not valid (NEX-001) ...
Warning: Technology layer 'MRDL' setting 'routing-direction' is not valid (NEX-001)
Information: Design Average RC for design top (NEX-011)

Generating Verification Checkpoints During Compilation


You can generate intermediate verification checkpoints during the compile_fusion command.
The Formality tool can use these checkpoints as netlist snapshots of the synthesis design state
to verify the intermediate netlists. Verification checkpointing allows the Fusion Compiler and
Formality tools to synchronize on an intermediate netlist, which in turn results in higher
verification completion rates and better QoR.
To enable verification checkpointing, use the set_verification_checkpoints command as
shown in the following example:
fc_shell> set_verification_checkpoints
Enabling checkpoint "ckpt_logic_opt"
Enabling checkpoint "ckpt_pre_map"
When you enable this feature, by default, the compile_fusion command generates verification
checkpoints before mapping and during logic optimization, and the corresponding
verification-checkpoint stages are named ckpt_pre_map and ckpt_logic_opt. You can enable
verification checkpointing at only one stage as shown in the following example:
fc_shell> set_verification_checkpoints {ckpt_logic_opt}
Disabling checkpoint "ckpt_pre_map"
Enabling checkpoint "ckpt_logic_opt"
At each verification checkpoint, the tool automatically generates a checkpoint netlist and
the guide_checkpoint command in the .svf file that is used by the Formality tool during
verification.
• Controlling Mapping and Optimization
The following topics describe how you can customize and control the mapping and
optimization that is performed during the initial_map, logic_opto, and initial_drc stages of
the compile_fusion command:
Ungrouping or Preserving Hierarchies During Optimization
Controlling Boundary Optimization
Controlling Datapath Optimization
Controlling Mux Mapping
Controlling Sequential Mapping
Controlling Register Replication
Controlling Register Merging
Selectively Removing or Preserving Constant and Unloaded Registers
Reporting Cross-Probing Information for Optimized Registers
Controlling High-Fanout-Net Synthesis

Ungrouping or Preserving Hierarchies During Optimization


During optimization, the compile_fusion command performs the following types of automatic
ungrouping:
Area-based automatic ungrouping:
Before initial mapping, the command estimates the area for unmapped hierarchies and
ungroups small subdesigns. Because the command performs automatic ungrouping at an early
stage, it has a better optimization context. Additionally, datapath extraction is enabled across
ungrouped hierarchies. These factors improve the area and timing quality of results.
Delay-based automatic ungrouping:
The command ungroups hierarchies along the critical path, primarily to improve timing.
The following example specifies that parent hierarchies of blocks with UPF and SDC
constraints should be preserved and that automatic ungrouping of other hierarchies should begin
from the third level:
fc_shell> set_autoungroup_options -keep_parent_hierarchies UPF

fc_shell> set_autoungroup_options -keep_parent_hierarchies SDC

fc_shell> set_autoungroup_options -start_level 3


• Controlling Boundary Optimization
You can control boundary optimization across user-specified logical boundaries and match the
interface boundary of a physical block with the RTL module interface. Boundary optimization
allows you to use RTL floorplanning DEF files during block implementation.
The following figure shows how constants, equal-opposite logic, logic phase, and unloaded
ports are propagated through the logical boundaries without and with boundary optimization.

Based on the types of boundary optimization you enable or disable, the tool sets
the constant_propagation, unloaded_propagation, equal_opposite_propagation,
and phase_inversion read-only attributes of the specified object to true or false. If you
enable all four types, the tool sets the boundary_optimization read-only attribute to true.
If you disable at least one type, the tool sets the boundary_optimization read-only attribute
to false.
To report boundary optimization settings, use the report_boundary_optimization command.
To remove them, use the remove_boundary_optimization command.
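The relationship between the four per-type attributes and the boundary_optimization attribute can be written as a small Python model. This is a hypothetical illustration and not a Fusion Compiler API. The function name boundary_attributes is invented for the sketch, and it models only the true/false rule stated above.

```python
# Hypothetical model of the read-only attributes described above;
# this is not a Fusion Compiler API.
BOUNDARY_TYPES = (
    "constant_propagation",
    "unloaded_propagation",
    "equal_opposite_propagation",
    "phase_inversion",
)


def boundary_attributes(disabled=()):
    """Return the attribute values for an object, given its disabled types."""
    unknown = set(disabled) - set(BOUNDARY_TYPES)
    if unknown:
        raise ValueError(f"unknown boundary optimization types: {sorted(unknown)}")
    attrs = {name: name not in disabled for name in BOUNDARY_TYPES}
    # boundary_optimization is true only if all four types stay enabled.
    attrs["boundary_optimization"] = all(attrs[name] for name in BOUNDARY_TYPES)
    return attrs


if __name__ == "__main__":
    # U3/in1 in the example below: constant propagation and phase inversion off.
    print(boundary_attributes(["constant_propagation", "phase_inversion"]))
```

For the U3/in1 pin in the example that follows, disabling constant propagation and phase inversion leaves the other two attributes true. In that case boundary_optimization is false.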
The following example disables all boundary optimization for the U1 cell instance, disables
equal-opposite logic propagation and phase inversion for the U2 cell instance, and disables
constant propagation and phase inversion for the U3/in1 cell pin:

fc_shell> set_boundary_optimization [get_cells U1] none
fc_shell> set_boundary_optimization [get_cells U2] auto
fc_shell> set_boundary_optimization [get_pins U3/in1] -constant_propagation false
fc_shell> set_boundary_optimization [get_pins U3/in1] -phase_inversion false

• Controlling Datapath Optimization


During datapath optimization, the tool performs the following tasks:

Datapath Extraction
Datapath extraction transforms arithmetic operators, such as addition, subtraction, and
multiplication, into datapath blocks to be implemented by a datapath generator. This
transformation improves the quality of results (QoR) by using carry-save arithmetic.
The tool supports extraction of the following components:
Arithmetic operators that can be merged into one carry-save arithmetic tree
Operators extracted as part of a datapath: *, +, -, >, <, <=, >=, ==, !=, and MUXes
Variable shift operators (<<, >>, <<<, >>> for Verilog and sll, srl, sla, sra, rol, ror for VHDL)
Operations with bit truncation
The datapath flow can extract these components only if they are directly connected to each
other, that is, only if there is no nonarithmetic logic between them. Keep the following points
in mind:
Extraction of mixed signed and unsigned operators is allowed only for adder trees.
Instantiated DesignWare components cannot be extracted.
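To illustrate why carry-save arithmetic helps, the following Python sketch compresses several operands three at a time with carry-save adders, which never propagate carries between bit positions. It then performs a single carry-propagate addition at the end. This is a software model of the general technique under the assumption of unsigned operands. It is not the datapath generator's implementation, and the function names are hypothetical.

```python
def carry_save_add(a, b, c):
    """Compress three non-negative integers into a (sum, carry) pair.

    Each bit position acts as an independent full adder, so no carry
    ripples across bit positions: a + b + c == sum + carry.
    """
    partial_sum = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1  # majority bit, moved up one position
    return partial_sum, carry


def carry_save_sum(operands):
    """Add operands with a carry-save tree and one final carry-propagate add."""
    values = list(operands)
    if any(v < 0 for v in values):
        raise ValueError("this sketch handles unsigned operands only")
    # Replace three values with two until at most two remain.
    while len(values) > 2:
        a, b, c = values.pop(), values.pop(), values.pop()
        values.extend(carry_save_add(a, b, c))
    # The only carry-propagate addition happens here.
    return sum(values)


if __name__ == "__main__":
    print(carry_save_add(5, 3, 6))            # (0, 14)
    print(carry_save_sum([13, 7, 22, 5, 9]))  # 56
```

In hardware, the final addition is the only stage with a long carry chain. This is why merging several operators into one carry-save tree tends to beat chaining separate adders.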

• Running Concurrent Clock and Data Optimization


Changing clock latencies can balance the slack in successive timing path stages to optimize
clock and data paths. This capability is called concurrent clock and data (CCD) optimization,
which can reduce total negative slack (TNS), worst negative slack (WNS), area, and leakage
power. In addition, it can improve correlation between physical synthesis and place-and-route.
Useful Skew
This figure shows how concurrent clock and data optimization uses clock skew to remove
negative slack and to enable potential area and leakage recovery.
• Controlling Concurrent Clock and Data Optimization
By default, the tool performs concurrent clock and data optimization during
the compile_fusion command. To disable it, set the compile.flow.enable_ccd application
option to false. The default is true.
During concurrent clock and data optimization, the following conditions apply:
The tool uses ideal clock trees and updates clock latencies and balance points for all active
scenarios.
The tool adjusts the register latencies of the most critical sequential elements in the design
within the delayed and advanced latency limits.
By default, the maximum delayed latency is 100 ps, and the maximum advanced latency is 300 ps.
All path groups are considered for useful skew computation.
All boundary timing paths are considered during concurrent clock and data optimization.
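The following Python sketch is a toy, single-scenario model of the useful-skew idea. It uses three registers, R0 -> R1 -> R2 (hypothetical names), and adjusts only the clock latency of R1 so that the slacks of the two stages are equal. The adjustment is clamped to the default 100 ps delayed and 300 ps advanced limits quoted above. Setup time, clock uncertainty, hold checks, and multiple scenarios are ignored. The tool's actual CCD algorithm is not described in this document, and all function names here are illustrative.

```python
# Default latency limits quoted in the text above (picoseconds).
MAX_DELAYED_PS = 100
MAX_ADVANCED_PS = 300


def path_slack(period, data_delay, launch_latency, capture_latency):
    """Setup slack of a register-to-register path (setup time and uncertainty ignored)."""
    return period - data_delay + capture_latency - launch_latency


def clamp_latency(adjust):
    """Limit a latency change: positive delays the clock, negative advances it."""
    return max(-MAX_ADVANCED_PS, min(MAX_DELAYED_PS, adjust))


def skew_middle_register(period, delay_in, delay_out):
    """Adjust only R1 in R0 -> R1 -> R2 so that the two stage slacks are equal.

    Equal slacks require period - delay_in + x == period - delay_out - x,
    that is, x = (delay_in - delay_out) / 2, clamped to the latency limits.
    """
    latency = clamp_latency((delay_in - delay_out) / 2)
    slack_in = path_slack(period, delay_in, 0, latency)    # R0 -> R1
    slack_out = path_slack(period, delay_out, latency, 0)  # R1 -> R2
    return latency, slack_in, slack_out


def wns_tns(slacks):
    """Worst negative slack and total negative slack over a set of path slacks."""
    negative = [s for s in slacks if s < 0]
    return (min(negative) if negative else 0, sum(negative))


if __name__ == "__main__":
    period, delay_in, delay_out = 300, 420, 200
    before = [path_slack(period, delay_in, 0, 0), path_slack(period, delay_out, 0, 0)]
    latency, s_in, s_out = skew_middle_register(period, delay_in, delay_out)
    print("before:", wns_tns(before))          # (-120, -120)
    print("latency:", latency)                 # 100 (clamped from 110)
    print("after:", wns_tns([s_in, s_out]))    # (-20, -20)
```

In this example, the ideal delay of 110 ps is clamped to 100 ps. WNS still improves from -120 ps to -20 ps, and the second stage gives up its positive slack.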
Note:
In an incremental compile, enabling concurrent clock and data optimization has no effect.
For best results, you should also do the following:
Enable the hold scenario even during compile to reduce hold violations that are caused by
clock latency insertions during concurrent clock and data optimization.
The tool considers the hold scenario only for concurrent clock and data optimization, but not
for other optimization steps.
Enable concurrent clock and data optimization for the clock_opt command.
• Performing Design Analysis
Use the reports generated by Fusion Compiler to analyze and debug your design. You can
generate reports before and after you compile your design. Generate reports before compile to
check that you have set attributes, constraints, and design rules properly. Generate reports
after compile to analyze the results and debug your design.
This section includes the following topics:
Reporting Commands and Examples
Measuring Quality of Results
Comparing QoR Data
Reporting Logic Levels in Batch Mode
Querying Specific Message IDs

• Reporting Commands and Examples


This table shows some reporting commands to report QoR, timing, area, and so on.

Use this command         To do this
report_area              Report area information of the current design.
                         - Area is based on physical libraries.
                         - Numbers of ports and nets are not reported.
report_qor               Report QoR information and statistics for the current design.
                         - Information is based on physical libraries.
report_timing            Report the timing information of the current design.
report_power             Calculate and report dynamic and static power of the design or instance.
report_clock_gating      Report total clock-gating statistics of the design.
report_logic_levels      Report logic levels of the design.

• report_area Example
fc_shell> report_area -designware -hierarchy -physical
****************************************
Report : area
Design : test
****************************************
Information: Base Cell (com): cell XNOR2ELL, w=3300, h=6270 (npin=3)
Information: Base Cell (seq): cell LPSDFE2, w=10560, h=6270 (npin=5)
Number of cells: 8
Number of combinational cells: 4
Number of sequential cells: 4
Number of macros/black boxes: 0
Number of buf/inv: 0
Number of references: 2
Combinational area: 49.66
Buf/Inv area: 0.00
Noncombinational area: 231.74
Macro/Black box area: 0.00
Total cell area: 281.40
_______________________________________________________________
Hierarchical area distribution

                      Global cell area      Local cell area
                      ------------------    --------------------------------
Hierarchical cell     Absolute   Percent    Combi-     Noncombi-   Black-
                      Total      Total      national   national    boxes     Design
-------------------------------------------------------------------------------------
test                  281.40     100.0      37.24      231.74      0.00      test
mult_10               12.41      4.4        12.41      0.00        0.00      DW_mult_uns_J1_H3_D1
-------------------------------------------------------------------------------------
Total                                       49.66      231.74      0.00

Area of detected synthetic parts

                                                  Perc. of
Module         Implem.    Count    Area           cell area
------------------------------------------------------------
DW_mult_uns    pparch     1        12.41          4.4%
------------------------------------------------------------
Total:                    1        12.41          4.4%

Estimated area of ungrouped synthetic parts

                                   Estimated      Perc. of
Module         Implem.    Count    area           cell area
------------------------------------------------------------
DW_mult_uns    pparch     3        48.28          17.2%
------------------------------------------------------------
Total:                    3        48.28          17.2%
Total synthetic cell area: 60.69 21.6% (estimated)
____________________________________________________________
Core Area: 1767
Aspect Ratio: 1.0902
Utilization Ratio: 0.1593

The above information was from the logic library.


The following information was from the physical library:

Total moveable cell area: 281.4
Total fixed cell area: 0.0
Total physical cell area: 281.4
Core area: 0.000, 0.000, 40.260, 43.890
• report_qor Example
fc_shell> report_qor
***********************************************
Report : qor
Design : top
***********************************************
Scenario 'SC1'
Timing Path Group 'clk'
-----------------------------------------------
Levels of Logic: 6
Critical Path Length: 0.28
Critical Path Slack: -0.19
Critical Path Clk Period: 0.30
Total Negative Slack: -0.80
No. of Violating Paths: 11
Worst Hold Violation: -0.07
Total Hold Violation: -0.13
No. of Hold Violations: 32.00
-----------------------------------------------
Cell Count
-----------------------------------------------
Hierarchical Cell Count: 2
Hierarchical Port Count: 24
Leaf Cell Count: 287
Buf/Inv Cell Count: 45
Buf Cell Count: 4
Inv Cell Count: 41
CT Buf/Inv Cell Count: 0
Combinational Cell Count: 208
Sequential Cell Count: 79
Macro Count: 1
-----------------------------------------------
Area
-----------------------------------------------
Combinational Area: 419.34
Noncombinational Area: 736.26
Buf/Inv Area: 67.60
Total Buffer Area: 9.66
Total Inverter Area: 57.94
Macro/Black Box Area: 2337.90
Net Area: 0
Net XLength: 0
Net YLength: 0
-----------------------------------------------
Cell Area (netlist): 3493.49
Cell Area (netlist and physical only): 3493.49
Net Length: 0

Design Rules
-----------------------------------------------
Total Number of Nets: 363
Nets With Violations: 0
Max Trans Violations: 0
Max Cap Violations: 0
-----------------------------------------------
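The QORsum flow described next is the tool's own way to compare runs. For a quick scripted comparison of plain-text reports, a small parser can extract numeric "Name: value" lines such as those in the report_qor example above. The following Python sketch assumes that report format. The function names parse_qor_metrics and compare_metrics are hypothetical. Reports with several scenarios or path groups would need qualified keys, because in this sketch a repeated metric name overwrites the earlier value.

```python
import re
from typing import Dict

# Matches "Name: number" lines such as "Critical Path Slack: -0.19".
_METRIC = re.compile(r"^\s*([A-Za-z][^:]*?)\s*:\s*(-?\d+(?:\.\d+)?)\s*$")


def parse_qor_metrics(report_text: str) -> Dict[str, float]:
    """Collect numeric "Name: value" lines from report_qor-style text.

    Section headers and non-numeric lines (for example, "Design : top")
    are skipped. If a metric name repeats, the last value wins.
    """
    metrics: Dict[str, float] = {}
    for line in report_text.splitlines():
        match = _METRIC.match(line)
        if match:
            metrics[match.group(1)] = float(match.group(2))
    return metrics


def compare_metrics(base: Dict[str, float], new: Dict[str, float]) -> Dict[str, float]:
    """Return new - base for every metric present in both runs."""
    return {name: new[name] - base[name] for name in base if name in new}


if __name__ == "__main__":
    sample = "Scenario 'SC1'\nLevels of Logic: 6\nCritical Path Slack: -0.19\n"
    print(parse_qor_metrics(sample))
```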

• Comparing QoR Data


You can generate a web-based report to view and compare your QoR data with a QORsum
report by performing the following steps:
(Optional) Configure the QoR data, which is captured and displayed in the subsequent steps,
by using the set_qor_data_options command.
o To specify the most critical power scenarios in your design, use the
-leakage_scenario and -dynamic_scenario options.

The tool uses the power scenarios you specify to generate the high-level summary of the power
QoR in the QORsum report. If you do not specify these options, it uses the active power
scenario with the highest total power for both the leakage and dynamic scenario for the power
QoR summary.
These settings are only used for the power QoR summary. The tool uses the power information
of all active power scenarios to capture and report the detailed power information in the
QORsum report.
o To specify the most critical clock name and clock scenario, use the -clock_name and
-clock_scenario options.

The tool uses the clock name and scenario you specify to generate the high-level summary of
the clock QoR in the QORsum report. If you do not specify these options, the tool identifies
the most critical clock and uses it for the clock QoR summary.
These settings are only used for the clock QoR summary. The tool uses all clocks to generate
the detailed clock QoR information in the QORsum report.
o To specify a name to identify the run in the QORsum report, use the -run_name option.
By default, the tool names each run with a number, such as Run1, Run2, and so on. You can
use this option to give a more meaningful name to each run. You can also specify the run name
by using the -run_names option when you generate the QORsum report by using
the compare_qor_data command in step 3. If you do so, the tool ignores the run name specified
by the set_qor_data_options -run_name command.
The following example specifies the leakage-power scenario, dynamic-power scenario, clock
scenario, and the clock to use for the corresponding summary in the QORsum report:
• Introduction to Clock Gating
Clock gating applies to synchronous load-enable registers, which are flip-flops that share the
same clock and synchronous control signals. Synchronous control signals include
synchronous load-enable, synchronous set, synchronous reset, and synchronous toggle.
Synchronous load-enable registers are represented by a register with a feedback loop that
maintains the same logic value through multiple cycles. Clock gating applied to synchronous
load-enable registers reduces the power needed when reloading the register banks.
Figure 1 shows a simple register bank implementation using a multiplexer and feedback loop.
Figure 1: Synchronous Load-Enable Register With Multiplexer

When the synchronous load-enable signal (EN) is at logic state 0, the register bank is disabled.
In this state, the circuit uses the multiplexer to feed the Q output of each storage element in
the register bank back to the D input. When the EN signal is at logic state 1, the register is
enabled, allowing new values to load at the D input.
These feedback loops can unnecessarily use power. For example, if the same value is reloaded
in the register throughout multiple clock cycles (EN equals 0), the register bank and its clock
net consume power while values in the register bank do not change. The multiplexer also
consumes power.
Clock gating eliminates the feedback net and multiplexer shown in Figure 1 by inserting a gate
in the clock net of the register.
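The equivalence between the two implementations can be checked with a cycle-level behavioral model. The following Python sketch is illustrative. It counts the clock edges that reach the register as a rough proxy for clock-network activity and is not a power model. The function names are hypothetical.

```python
def mux_feedback_register(d_values, en_values, q0=0):
    """Register clocked every cycle; a mux feeds Q back to D when EN is 0."""
    q, trace, clock_edges = q0, [], 0
    for d, en in zip(d_values, en_values):
        clock_edges += 1        # every clock edge reaches the register
        q = d if en else q      # the mux reloads the old value when EN == 0
        trace.append(q)
    return trace, clock_edges


def gated_clock_register(d_values, en_values, q0=0):
    """Register without a feedback mux; the clock edge is blocked when EN is 0."""
    q, trace, clock_edges = q0, [], 0
    for d, en in zip(d_values, en_values):
        if en:                  # the clock gate passes this edge
            clock_edges += 1
            q = d
        trace.append(q)
    return trace, clock_edges


if __name__ == "__main__":
    d = [1, 0, 1, 1, 0, 0]
    en = [1, 0, 0, 1, 0, 1]
    print(mux_feedback_register(d, en))  # ([1, 1, 1, 1, 1, 0], 6)
    print(gated_clock_register(d, en))   # ([1, 1, 1, 1, 1, 0], 3)
```

Both versions produce the same Q sequence. In the gated version, however, the register sees a clock edge only in the cycles where EN is 1.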
Note:
While applying clock-gating techniques, the tool treats generated clocks the same way as
defined clocks.
The clock-gating cell selectively prevents clock edges, thus preventing the gated-clock signal
from clocking the gated register.
Figure 2 shows a latch-based clock-gating cell and the waveforms of its signals with respect to
the clock signal, CLK.
The clock input to the register bank, ENCLK, is gated on or off by the AND gate. ENL is the
enabling signal that controls the gating; it derives from the EN signal on the multiplexer shown
in Figure 1. The register bank is triggered by the rising edge of the ENCLK signal.

The latch prevents glitches on the EN signal from propagating to the register’s clock pin. When
the CLK input of the 2-input AND gate is at logic state 1, any glitching of the EN signal could,
without the latch, propagate and corrupt the register clock signal. The latch eliminates this
possibility because it blocks signal changes when the clock is at logic 1.
In latch-based clock gating, the AND gate blocks unnecessary clock pulses by maintaining the
clock signal’s value after the trailing edge. For example, for flip-flops inferred by HDL
constructs of rising-edge clocks, the clock gate forces the gated clock to 0 after the falling edge
of the clock.
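The glitch-blocking behavior described above can be reproduced with a zero-delay, sampled-waveform model. In the following Python sketch, each list element is one time sample. The latch is transparent while CLK is 0 and holds while CLK is 1, and ENCLK is the AND of CLK and the latch output ENL. Real cells also have setup and hold requirements and propagation delays, which this illustrative model ignores. The function names are hypothetical.

```python
def latch_based_gate(clk, en):
    """Return (enl, enclk) samples for a latch-based clock-gating cell."""
    enl_value = 0
    enl, enclk = [], []
    for c, e in zip(clk, en):
        if c == 0:
            enl_value = e       # the latch is transparent while CLK is low...
        enl.append(enl_value)   # ...and holds its value while CLK is high
        enclk.append(c & enl_value)
    return enl, enclk


def and_only_gate(clk, en):
    """Gate the clock with a bare AND gate and no latch."""
    return [c & e for c, e in zip(clk, en)]


if __name__ == "__main__":
    clk = [0, 0, 1, 1, 0, 0, 1, 1]
    glitchy_en = [0, 0, 0, 1, 0, 0, 0, 0]  # EN glitches while CLK is high
    print(and_only_gate(clk, glitchy_en))        # [0, 0, 0, 1, 0, 0, 0, 0]
    print(latch_based_gate(clk, glitchy_en)[1])  # [0, 0, 0, 0, 0, 0, 0, 0]
```

The bare AND gate passes the glitch as a spurious clock pulse. The latch-based gate keeps ENCLK at 0 because the latch ignores EN while CLK is high.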
By controlling the clock signal for the register bank, you can eliminate the need for reloading
the same value in the register through multiple clock cycles. Clock gating inserts clock-gating
circuitry into the register bank’s clock network, creating the control to eliminate unnecessary
register activity.
Clock gating does the following:
Reduces clock network power dissipation
Relaxes datapath timing
Reduces congestion by eliminating feedback multiplexer loops
For designs that have large register banks, clock gating can save power and area by reducing
the number of gates in the design. However, for smaller register banks, the overhead of adding
logic to the clock tree might not compare favorably to the power saved by eliminating a few
feedback nets and multiplexers.
o Clock Gating Flows
The Fusion Compiler tool inserts clock-gating cells during the compile_fusion command. The
following topics describe clock-gating flows:

o Inserting Clock Gates in Multivoltage Designs


In a multivoltage design, the different hierarchies of the design might have different operating
conditions and use different target library subsets. When inserting clock-gating cells in a
multivoltage design, the tool chooses library cells that match both the specified clock-gating
style and the operating conditions of the corresponding hierarchical cell of the design. If you do
not specify a clock-gating style, the tool uses a default style in which the test point is "before"
and the observation output is false.
If the tool does not find a library cell that suits the clock-gating style and the operating
conditions, the tool issues a warning message and does not insert a clock-gating cell. To check
whether there are integrated clock-gate cells available for clock-gate insertion, use
the check_clock_gate_library_cell_availability command.

o Inserting Clock Gates in an RTL Design


To insert clock gating logic in an RTL design and to synthesize the design with the clock-gating
logic, follow these steps:
Read the RTL design.
(Optional) Use the insert_dft command to insert test cells into the design.
Use the compile_fusion command to compile the design.

During the compile process, the tool inserts clock gates on the registers that qualify for clock
gating. By default, during clock-gate insertion, the compile_fusion command uses the default
clock-gating settings and also honors the setup, hold, and other constraints specified in the
logic libraries. To override the setup and hold values specified in the library, use
the set_clock_gating_check command before compiling the design. The default settings are
suitable for most designs.
The compile_fusion command automatically connects the scan enable and test ports or pins of
the integrated clock-gating cells, as needed.
Use the report_clock_gating command to report the registers and the clock-gating cells in the
design. Use the report_power command to get information about the dynamic power used by
the design after clock-gate insertion.
The following example illustrates a typical command sequence for clock gating using the default settings:
fc_shell> read_verilog design.v
fc_shell> create_clock -period 10 -name CLK
fc_shell> compile_fusion
fc_shell> insert_dft
fc_shell> report_clock_gating
fc_shell> report_power
• Placement-Aware Clock Gating
When you enable placement-aware clock gating, the tool performs the following
optimizations:
Replication of clock gates with timing-critical enables
Adjustment of clock gates so they are placed closer to their gated registers
Automatic clock network latency annotation for clock-gate cells

To enable this feature, set the compile.clockgate.physically_aware application option to true.
(The default is false.)
The example in Figure 1 shows the difference between using and not using placement-aware
clock gating. In this layout view, you can see that with the feature enabled, the integrated clock
gates (shown in red) are physically closer to their gated registers.
Figure 1: Placement-Aware Clock Gating
o Self-Gating Optimization
The tool supports self-gating optimization, which is performed during the compile_fusion command.
This feature reduces dynamic power consumption by turning off the clock signal of certain
registers during clock cycles when the data remains unchanged.

Figure 1: Example of a Self-Gating Cell


Registers with an enable condition that cannot be inferred from the existing logic can be gated
only by using the self-gating technique; they cannot be gated by traditional clock gating. By
default, the tool performs self-gating only on registers that are not already clock gated. You can
use the set_self_gating_options command to also allow self-gating on clock-gated registers,
which might increase the time during which their clock signal is turned off.
To ensure QoR improvements, the self-gating algorithm takes timing and power into
consideration. A self-gating cell is inserted for the registers only if both of the following
conditions are met:
There is enough timing slack available at the register's data pin. For designs with multiple
scenarios, the algorithm considers the timing of the worst case among the active scenarios
enabled for setup.
The internal dynamic power of the circuit is reduced. For designs with multiple scenarios, the
algorithm uses the average internal dynamic power among the active scenarios enabled for
dynamic power.
To minimize the area and power overhead, a self-gating cell can be shared across a few
registers by creating a combined enable condition with a tree of comparator cells. If the self-
gated registers are driven by synchronous set or synchronous clear signals, these signals are
also included in the construction of the enable signal so that the circuit remains functionally
unchanged. Figure 2 is an example of a self-gating cell that is shared across two registers (4
bits). Note that one of the self-gated registers is a multibit register and the other register is a
single-bit register. The tool can also self-gate a group of multibit registers or a group of single-
bit registers.
Figure 2: Self-Gating Tree
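Conceptually, a shared self-gating enable compares each register bit's next value with its current value and combines the per-bit results, so the clock is needed only when at least one bit would change. The following Python sketch is a behavioral model under that assumption, not the tool's algorithm. It treats a group of multibit and single-bit registers as one flat bit vector, such as the 4 bits in Figure 2. It folds a synchronous clear into the next-state value, as the text describes for set and clear signals, and counts the clock edges delivered with and without the shared self-gate. Timing-slack and power checks are not modeled, and the function names are hypothetical.

```python
def next_state(d_bits, sync_clear):
    """Bits the registers would capture on the next edge (synchronous clear wins)."""
    return [0] * len(d_bits) if sync_clear else list(d_bits)


def run_registers(cycles, width, self_gated):
    """Simulate a group of register bits, optionally behind one shared self-gate.

    cycles: list of (d_bits, sync_clear) pairs, one per clock cycle.
    Returns (history of Q values, number of clock edges delivered).
    """
    q = [0] * width
    history, edges = [], 0
    for d_bits, sync_clear in cycles:
        nxt = next_state(d_bits, sync_clear)
        # Shared enable: true if any bit would change (per-bit compare, then OR).
        # Because sync_clear is folded into nxt, gating never changes the function.
        enable = any(n != b for n, b in zip(nxt, q))
        if enable or not self_gated:
            edges += 1
            q = nxt
        history.append(list(q))
    return history, edges


if __name__ == "__main__":
    cycles = [
        ([1, 0, 1, 1], False),
        ([1, 0, 1, 1], False),  # data unchanged: the self-gate blocks this edge
        ([1, 0, 1, 1], True),   # synchronous clear forces a change
        ([0, 0, 0, 0], False),  # already cleared: blocked again
        ([0, 1, 0, 0], False),
    ]
    print(run_registers(cycles, 4, self_gated=False))  # 5 edges
    print(run_registers(cycles, 4, self_gated=True))   # 3 edges, same history
```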
• Partitioning and Planning the Full Chip Design
The hierarchical synthesis flow using abstracts includes the following two cases:
Floorplans of subblock and top-level design are not available
Floorplans of subblock and top-level design are available

o Hierarchical Synthesis Flow When Floorplans are not Available


To partition and plan the top-level design and create the lower-level blocks for subsequent
bottom-up synthesis, when the floorplans of subblock and top-level design are not available,
perform the following steps:
Figure 1: Hierarchical Synthesis Flow When Floorplans are not Available
Read in the full chip design and apply constraints as shown in the following example:
fc_shell> set REF_LIBS "stdcell.ndm macro.ndm"

fc_shell> create_lib TOP -technology techfile -ref_libs $REF_LIBS

fc_shell> analyze -format verilog $rtl_files

fc_shell> elaborate TOP

fc_shell> set_top_module TOP

fc_shell> load_upf fullchip.upf

fc_shell> read_sdc fullchip.sdc

Identify the design partitions and split the constraints as shown in the following example:
fc_shell> set_budget_options -add_blocks {BLOCK1 BLOCK2}

fc_shell> split_constraints

3. Create the subblock design libraries with design information and enable the block-specific reference library setup, as shown in the following example:
fc_shell> copy_lib -to_lib BLOCK1.nlib -no_design
fc_shell> copy_lib -to_lib BLOCK2.nlib -no_design
fc_shell> set_attribute -objects BLOCK1.nlib \
-name use_hier_ref_libs -value true
fc_shell> set_attribute -objects BLOCK2.nlib \
-name use_hier_ref_libs -value true
fc_shell> save_lib -all


4. Create the subblock design partitions, as shown in the following example:
fc_shell> commit_block -library BLOCK1.nlib BLOCK1
fc_shell> commit_block -library BLOCK2.nlib BLOCK2
fc_shell> save_lib -all

5. Load the UPF and SDC constraints that the split_constraints command generated earlier for the unmapped subblocks and the top level, as shown in the following example:
fc_shell> set_constraint_mapping_file ./split/mapfile
fc_shell> load_block_constraints -all_blocks -type SDC -type UPF -type CLKNET
fc_shell> save_lib -all

6. Run the compile_fusion command through logic optimization, with automatic floorplanning, for the subblocks, as shown in the following example:
fc_shell> compile_fusion -to logic_opto

7. Run the compile_fusion command through technology mapping for the top-level design, as shown in the following example:
fc_shell> compile_fusion -to initial_map

8. Perform the design planning operations, from floorplan initialization through subblock and voltage area shaping, hard macro and standard cell placement, and power network creation, up to pin assignment.
9. Run the compile_fusion command through logic optimization for the top-level design, as shown in the following example:
fc_shell> compile_fusion -from logic_opto -to logic_opto

10. Perform incremental standard cell placement for the top level only, as shown in the following example:
fc_shell> create_placement -floorplan -use_seed_locs

11. Perform timing estimation and budgeting to generate the block budgets needed for block-level implementation.
After completing the design planning operations, rebuild the subblock and top-level designs by opening the elaborated RTL NDM and loading the floorplan, feedthroughs, and budgets.
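The -to and -from options each run a contiguous, inclusive slice of the compile_fusion stages. That is how the top-level design in this flow can stop after initial_map for design planning and then resume at logic_opto. The following Python sketch models this behavior. It assumes the stage order initial_map, logic_opto, initial_place, initial_drc, initial_opto, final_place, final_opto, which you should verify against your tool version. The helper names are hypothetical, and nothing here calls the tool.

```python
from typing import List, Optional, Sequence, Tuple

# Assumed compile_fusion stage order; confirm against your tool version's documentation.
STAGES = ["initial_map", "logic_opto", "initial_place", "initial_drc",
          "initial_opto", "final_place", "final_opto"]


def stages_run(frm: Optional[str] = None, to: Optional[str] = None) -> List[str]:
    """Stages executed by 'compile_fusion [-from frm] [-to to]' (both ends inclusive)."""
    start = STAGES.index(frm) if frm is not None else 0
    end = STAGES.index(to) if to is not None else len(STAGES) - 1
    if start > end:
        raise ValueError(f"-from {frm} comes after -to {to}")
    return STAGES[start:end + 1]


def is_contiguous(runs: Sequence[Tuple[Optional[str], Optional[str]]]) -> bool:
    """True if the runs execute a prefix of STAGES in order, with no gaps or repeats."""
    executed = [stage for frm, to in runs for stage in stages_run(frm, to)]
    return executed == STAGES[:len(executed)]


if __name__ == "__main__":
    # Top-level design in this flow: -to initial_map, design planning,
    # then -from logic_opto -to logic_opto.
    top = [(None, "initial_map"), ("logic_opto", "logic_opto")]
    print(stages_run(*top[0]), stages_run(*top[1]), is_contiguous(top))
```

The same check flags a sequence that skips a stage, such as resuming at initial_place after stopping at initial_map. It also flags a sequence that repeats stages, such as resuming at initial_place after a run that already reached initial_opto.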
o Hierarchical Synthesis Flow When Floorplans are Available
When the floorplans of the subblocks and the top-level design are available, perform the following steps to partition and plan the top-level design and create the lower-level blocks for subsequent bottom-up synthesis:
Figure 1: Hierarchical Synthesis Flow When Floorplans are Available

1. Read in the full-chip design and apply constraints, as shown in the following example:
fc_shell> set REF_LIBS "stdcell.ndm macro.ndm"
fc_shell> create_lib TOP -technology techfile \
-ref_libs $REF_LIBS
fc_shell> analyze -format verilog $rtl_files
fc_shell> elaborate TOP
fc_shell> set_top_module TOP
fc_shell> load_upf fullchip.upf
fc_shell> read_sdc fullchip.sdc

2. Split the constraints for the design partitions, as shown in the following example:
fc_shell> set_budget_options -add_blocks {BLOCK1 BLOCK2}
fc_shell> split_constraints
3. Create the subblock design libraries with design information and enable the block-specific reference library setup, as shown in the following example:
fc_shell> copy_lib -to_lib BLOCK1.nlib -no_design
fc_shell> copy_lib -to_lib BLOCK2.nlib -no_design
fc_shell> set_attribute -objects BLOCK1.nlib -name use_hier_ref_libs -value true
fc_shell> set_attribute -objects BLOCK2.nlib -name use_hier_ref_libs -value true
fc_shell> save_lib -all

4. Create the subblock design partitions, as shown in the following example:
fc_shell> commit_block -library BLOCK1.nlib BLOCK1
fc_shell> commit_block -library BLOCK2.nlib BLOCK2
fc_shell> save_lib -all

5. Load the UPF and SDC constraints that the split_constraints command generated earlier for the unmapped subblocks and the top level, as shown in the following example:
fc_shell> set_constraint_mapping_file ./split/mapfile
fc_shell> load_block_constraints -all_blocks -type SDC -type UPF -type CLKNET
fc_shell> save_lib -all

6. Load the floorplans for the subblocks and the top-level design.


7. Run the compile_fusion command through logic optimization for the subblocks and the top-level design, as shown in the following example:
fc_shell> compile_fusion -to logic_opto

8. Perform standard cell placement, followed by pin assignment, timing estimation, and budgeting.
After completing the design planning operations, rebuild the subblock and top-level designs by opening the elaborated RTL NDM and loading the floorplan, feedthroughs, and budgets.
Note:
For details about each design planning operation, see the Design Planning User Guide.
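In both flows, the constraint-splitting, library-creation, and partition-commit commands repeat for every block. A hypothetical Python helper could emit them for any list of block names, using only the commands shown in this section. The output could then be saved to a file and run with the Tcl source command. The helper name and the script-generation approach are assumptions for illustration, not part of the documented flow.

```python
from typing import List


def block_setup_commands(blocks: List[str]) -> List[str]:
    """Emit the per-block partitioning commands from this section for any block list."""
    names = " ".join(blocks)
    cmds = [f"set_budget_options -add_blocks {{{names}}}", "split_constraints"]
    for blk in blocks:
        cmds.append(f"copy_lib -to_lib {blk}.nlib -no_design")
    for blk in blocks:
        cmds.append(f"set_attribute -objects {blk}.nlib "
                    "-name use_hier_ref_libs -value true")
    cmds.append("save_lib -all")
    for blk in blocks:
        cmds.append(f"commit_block -library {blk}.nlib {blk}")
    cmds.append("save_lib -all")
    return cmds


if __name__ == "__main__":
    # Hypothetical usage: print a Tcl script for the two example blocks.
    script = "\n".join(block_setup_commands(["BLOCK1", "BLOCK2"])) + "\n"
    print(script)
```

Generating the commands from one block list keeps the set_budget_options block list, the library names, and the commit_block targets consistent as partitions are added or removed.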
o Synthesizing a Subblock
To synthesize a subblock and generate the block-level information required for top-level
synthesis, perform the following steps:
1. Read in a subblock design library generated by the rebuild step, as described in Partitioning and Planning the Full Chip Design.
2. Apply any block-specific constraints and settings required for synthesizing the block.
3. Apply DFT settings by using commands such as set_dft_signal and set_scan_configuration, and create a test protocol by using the create_test_protocol command.
4. Insert DFT logic and synthesize the block by using the following commands:
fc_shell> compile_fusion -to logic_opto
fc_shell> insert_dft
fc_shell> compile_fusion -from initial_place
5. Create a read-only abstract view, a frame view, and a test model, and save the block by using the following commands:
fc_shell> create_abstract -read_only
fc_shell> create_frame
fc_shell> write_test_model BLOCK1.ctl
fc_shell> save_block -as BLOCK1/PostSynthesis
o Synthesizing the Top-Level
To synthesize the top-level, perform the following steps:
1. Read in the top-level design generated by the rebuild step, as described in Partitioning and Planning the Full Chip Design.
2. Apply the appropriate settings and link the top-level design, as shown in the following example:
fc_shell> set_label_switch_list -reference {BLOCK1 BLOCK2} PostSynthesis
fc_shell> set_attribute -objects {BLOCK1.nlib BLOCK2.nlib} \
-name use_hier_ref_libs -value true
fc_shell> set_top_module TOP
When a subblock has been saved in the design library with a label, use the set_label_switch_list command to specify which label to link to. The set_top_module command links the specified top-level module.
3. Apply any constraints and settings required for synthesizing the top-level design, and read in the test models for the subblocks, as shown in the following example:
fc_shell> read_test_model BLOCK1.ctl
fc_shell> read_test_model BLOCK2.ctl
fc_shell> create_test_protocol
4. Insert DFT logic and synthesize the top-level design by using the following commands:
fc_shell> compile_fusion -to logic_opto
fc_shell> insert_dft
fc_shell> compile_fusion -from initial_place
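The label given to set_label_switch_list selects which saved version of each subblock the top level links to. It therefore has to match the label used with save_block -as when that subblock was saved. The following hypothetical Python helpers emit the block-side save commands and the top-level link commands from one shared label variable, using only commands shown in this section. The function names and the script-generation approach are assumptions for illustration.

```python
from typing import List


def block_save_commands(block: str, label: str) -> List[str]:
    """Commands that finish a subblock and save it under a label."""
    return ["create_abstract -read_only",
            "create_frame",
            f"write_test_model {block}.ctl",
            f"save_block -as {block}/{label}"]


def top_link_commands(top: str, blocks: List[str], label: str) -> List[str]:
    """Commands that link the top level to the labeled subblocks and read their test models."""
    names = " ".join(blocks)
    libs = " ".join(f"{b}.nlib" for b in blocks)
    cmds = [f"set_label_switch_list -reference {{{names}}} {label}",
            f"set_attribute -objects {{{libs}}} -name use_hier_ref_libs -value true",
            f"set_top_module {top}"]
    cmds += [f"read_test_model {b}.ctl" for b in blocks]
    cmds.append("create_test_protocol")
    return cmds


if __name__ == "__main__":
    LABEL = "PostSynthesis"  # one variable drives both the save and the link
    # The save commands belong in each subblock's session; they are printed
    # together here only for illustration.
    for blk in ["BLOCK1", "BLOCK2"]:
        print("\n".join(block_save_commands(blk, LABEL)))
    print("\n".join(top_link_commands("TOP", ["BLOCK1", "BLOCK2"], LABEL)))
```

Deriving both command lists from one variable makes it hard for the saved label and the linked label to drift apart.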
