FC - Flow 1
The Fusion Compiler tool synthesizes the RTL descriptions into optimized gate-level designs,
optimizes them to achieve the best quality of results (QoR) in speed, area, and power, and
allows early design exploration and prototyping.
The Fusion Compiler tool performs unified physical synthesis. The benefits of unified physical
synthesis include the following:
Reduced runtime
Unified physical synthesis streamlines the iterations of placement and optimization and reuses
the initial placement and buffer trees built during synthesis, which minimizes runtime.
The unified preroute optimization framework uses consistent costs across its optimization
algorithms to improve QoR. Key technologies, such as logic restructuring, concurrent clock and
data optimization, multibit mapping, and the advanced legalizer, are shared throughout the
preroute flow.
• Comprehensive design checking before and after every stage of the flow allows for
early detection of design and floorplan issues.
• Early RTL synthesis can use libraries with missing technology library cells, such as
retention cells, isolation cells, level shifters, and integrated clock-gating cells.
• Tightly coupled DFT synthesis technology supports concurrent synthesis of
functional logic and DFT logic, congestion optimization with placement-aware DFT
codecs, and better handling of the interaction between power intent and DFT logic.
• The design can contain unmapped cells after the compile_fusion command in the following
situations:
The required cells are missing from the library
User-specified restrictions prevent the use of the required library cells
When a design contains unmapped cells, the compile_fusion command continues the
compilation and provides an estimated area for the unmapped cells.
Unmapped cells are marked with the is_unmapped attribute. To query unmapped cells in a
netlist, use the following command:
fc_shell> get_cells -hierarchical -filter "is_unmapped==true"
Note the following limitations:
DesignWare operators are not mitigated.
Only placement information is stored in the database after mitigation; routing information is
not saved.
• Performing Prechecks
Fusion Compiler can perform compile prechecks that prevent compile failures. When the tool
encounters incomplete or inconsistent setup or constraints, the compile_fusion command exits
quickly with error messages. This capability is on by default. To run only these checks, use the
compile_fusion -check_only command.
The tool performs the following checks:
Design is not linked or has link issues
Boundary Optimization
Based on the types of boundary optimization you enable or disable, the tool sets
the constant_propagation, unloaded_propagation, equal_opposite_propagation,
and phase_inversion read-only attributes of the specified object to true or false:
If you enable all four types, the tool sets the boundary_optimization read-only attribute to true.
If you disable at least one type, the tool sets the boundary_optimization read-only attribute to false.
To report boundary optimization settings, use the report_boundary_optimization command.
To remove them, use the remove_boundary_optimization command.
The following example:
Disables all boundary optimization for the U1 cell instance
Disables equal-opposite logic propagation and phase inversion for the U2 cell instance
Disables constant propagation and phase inversion for the U3/in1 cell pin
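The rule that derives the read-only boundary_optimization attribute from the four per-type attributes can be modeled in a few lines. The following Python sketch is a hypothetical model, not the Fusion Compiler API. It applies the rule to the three objects in the example above and assumes that any type not listed as disabled stays enabled.

```python
# Hypothetical model (not the tool API): derive boundary_optimization from the
# four per-type boundary-optimization attributes.
BOUNDARY_TYPES = (
    "constant_propagation",
    "unloaded_propagation",
    "equal_opposite_propagation",
    "phase_inversion",
)


def boundary_attributes(disabled=()):
    """Return attribute values for an object, given the disabled types.

    Assumption: every type not listed in `disabled` stays enabled.
    """
    disabled = set(disabled)
    unknown = disabled - set(BOUNDARY_TYPES)
    if unknown:
        raise ValueError(f"unknown boundary optimization types: {sorted(unknown)}")
    attrs = {name: name not in disabled for name in BOUNDARY_TYPES}
    # boundary_optimization is true only when all four types are enabled.
    attrs["boundary_optimization"] = all(attrs[name] for name in BOUNDARY_TYPES)
    return attrs


# The three objects from the example above.
settings = {
    "U1": BOUNDARY_TYPES,  # all boundary optimization disabled
    "U2": ("equal_opposite_propagation", "phase_inversion"),
    "U3/in1": ("constant_propagation", "phase_inversion"),
}

for obj, disabled in settings.items():
    print(obj, boundary_attributes(disabled)["boundary_optimization"])
```

Each of the three objects disables at least one type, so all three report boundary_optimization as false.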
Datapath Extraction
Datapath extraction transforms arithmetic operators, such as addition, subtraction, and
multiplication, into datapath blocks to be implemented by a datapath generator. This
transformation improves the quality of results (QoR) by using carry-save arithmetic.
The tool supports extraction of the following components:
Arithmetic operators that can be merged into one carry-save arithmetic tree
Operators extracted as part of a datapath: *, +, -, >, <, <=, >=, ==, !=, and MUXes
Variable shift operators (<<, >>, <<<, >>> for Verilog and sll, srl, sla, sra, rol, ror for VHDL)
Operations with bit truncation
The datapath flow can extract these components only if they are directly connected to each
other, that is, only if there is no nonarithmetic logic between them. Keep the following points in mind:
Extraction of mixed signed and unsigned operators is allowed only for adder trees
Instantiated DesignWare components cannot be extracted
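Carry-save arithmetic adds several operands without propagating carries at each step. Each 3:2 compressor turns three operands into a sum word and a carry word, and only the final two words need a carry-propagate addition. The following Python sketch illustrates this arithmetic on non-negative integers. It is a software model only, not the structure the datapath generator actually builds. The function names csa and carry_save_sum are hypothetical.

```python
def csa(a, b, c):
    """3:2 carry-save compressor: a row of full adders, no carry propagation.

    Uses the identity a + b + c == (a ^ b ^ c) + 2 * majority(a, b, c).
    """
    sum_word = a ^ b ^ c
    carry_word = ((a & b) | (a & c) | (b & c)) << 1
    return sum_word, carry_word


def carry_save_sum(operands):
    """Add many non-negative operands with a carry-save tree and one final add."""
    words = list(operands)
    if not words:
        return 0
    # Compress three words into two until only two words remain.
    while len(words) > 2:
        a, b, c = words.pop(), words.pop(), words.pop()
        words.extend(csa(a, b, c))
    # A single carry-propagate addition at the end of the tree.
    return sum(words)


print(carry_save_sum([13, 7, 22, 5, 9]))  # 56
```

Because only the last step propagates carries, merging chained arithmetic operators into one tree avoids a separate carry-propagate adder per operator. This is the QoR benefit that datapath extraction targets.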
report_qor: Reports QoR information and statistics for the current design. The information is
based on the physical libraries.
report_power: Calculates and reports the dynamic and static power of the design or of an instance.
• report_area Example
fc_shell> report_area -designware -hierarchy -physical
****************************************
Report : area
Design : test
****************************************
Information: Base Cell (com): cell XNOR2ELL, w=3300, h=6270 (npin=3)
Information: Base Cell (seq): cell LPSDFE2, w=10560, h=6270 (npin=5)
Number of cells:                  8
Number of combinational cells:    4
Number of sequential cells:       4
Number of macros/black boxes:     0
Number of buf/inv:                0
Number of references:             2

Combinational area:           49.66
Buf/Inv area:                  0.00
Noncombinational area:       231.74
Macro/Black box area:          0.00
Total cell area:             281.40
_______________________________________________________________
Hierarchical area distribution
------------------------------

                    Global cell area    Local cell area
                    -----------------   ----------------------------
Hierarchical cell   Absolute  Percent   Combi-    Noncombi-  Black-
                    Total     Total     national  national   boxes    Design
------------------  --------  -------   --------  ---------  -------  --------------------
test                281.40    100.0     37.24     231.74     0.00     test
mult_10             12.41     4.4       12.41     0.00       0.00     DW_mult_uns_J1_H3_D1
------------------  --------  -------   --------  ---------  -------  --------------------
Total                                   49.66     231.74     0.00
Design Rules
-----------------------------------------------
Total Number of Nets: 363
Nets With Violations: 0
Max Trans Violations: 0
Max Cap Violations: 0
-----------------------------------------------
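A quick consistency check on a report_area summary is to confirm that the total cell area equals the sum of its parts. The following minimal Python sketch assumes the "label: value" layout of the area lines in the summary above. It also assumes that Buf/Inv area is a subset of Combinational area, so it does not add that value again. The helper names are hypothetical.

```python
import re

# Area lines copied from the report_area summary above.
REPORT = """\
Combinational area: 49.66
Buf/Inv area: 0.00
Noncombinational area: 231.74
Macro/Black box area: 0.00
Total cell area: 281.40
"""

# Matches lines such as "Macro/Black box area: 0.00".
LINE_RE = re.compile(r"^\s*([A-Za-z/ ]+?area):\s*([0-9.]+)\s*$")


def parse_area_summary(text):
    """Return {label: value} for every '... area: value' line."""
    areas = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            areas[match.group(1)] = float(match.group(2))
    return areas


def total_is_consistent(areas, tol=0.01):
    """Check Total = Combinational + Noncombinational + Macro/Black box.

    Assumption: Buf/Inv area is already counted in Combinational area.
    """
    parts = (
        areas["Combinational area"]
        + areas["Noncombinational area"]
        + areas["Macro/Black box area"]
    )
    return abs(parts - areas["Total cell area"]) <= tol


areas = parse_area_summary(REPORT)
print(total_is_consistent(areas))  # True: 49.66 + 231.74 + 0.00 = 281.40
```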
The tool uses the power scenarios you specify to generate the high-level summary of the power
QoR in the QORsum report. If you do not specify power scenarios, the tool uses the active power
scenario with the highest total power as both the leakage-power and the dynamic-power scenario
for the power QoR summary.
These settings are used only for the power QoR summary. The tool uses the power information
of all active power scenarios to capture and report the detailed power information in the
QORsum report.
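The default scenario selection for the power QoR summary can be sketched in Python. The model is hypothetical: the PowerScenario record, the scenario names, and the power values (arbitrary units) are illustrative and do not come from the tool. It picks the active scenario with the highest total power and uses it for both the leakage and dynamic summaries.

```python
from dataclasses import dataclass


@dataclass
class PowerScenario:
    """Hypothetical record of one power scenario's results."""
    name: str
    active: bool
    leakage: float  # arbitrary units, for illustration only
    dynamic: float

    @property
    def total(self) -> float:
        return self.leakage + self.dynamic


def default_power_summary_scenarios(scenarios):
    """Return (leakage_scenario, dynamic_scenario) when none are specified.

    By default both are the active scenario with the highest total power.
    """
    active = [s for s in scenarios if s.active]
    if not active:
        raise ValueError("no active power scenarios")
    worst = max(active, key=lambda s: s.total)
    return worst.name, worst.name


scenarios = [
    PowerScenario("func_ss", active=True, leakage=1.0, dynamic=5.0),
    PowerScenario("func_ff", active=True, leakage=3.0, dynamic=6.0),
    PowerScenario("test_ff", active=False, leakage=9.0, dynamic=9.0),
]
print(default_power_summary_scenarios(scenarios))  # ('func_ff', 'func_ff')
```

The inactive test_ff scenario has the highest total power but is ignored, because only active scenarios are considered.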
o To specify the most critical clock name and clock scenario, use the -clock_name and
-clock_scenario options.
The tool uses the clock name and scenario you specify to generate the high-level summary of
the clock QoR in the QORsum report. If you do not specify these options, the tool identifies
the most critical clock and uses it for the clock QoR summary.
These settings are only used for the clock QoR summary. The tool uses all clocks to generate
the detailed clock QoR information in the QORsum report.
o To specify a name that identifies the run in the QORsum report, use the -run_name option.
By default, the tool names each run with a number, such as Run1, Run2, and so on. You can
use this option to give each run a more meaningful name. You can also specify run names
by using the -run_names option when you generate the QORsum report with
the compare_qor_data command in step 3. If you do so, the tool ignores the run name specified
by the set_qor_data_options -run_name command.
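The naming precedence described above can be sketched as a small Python function. This is a hypothetical model, not tool behavior verified beyond the text above. Names passed to compare_qor_data -run_names win. Otherwise each run uses its own set_qor_data_options -run_name value, and unnamed runs fall back to the RunN numbering. Numbering unnamed runs by their position in a partially named set is an assumption.

```python
def resolve_run_names(num_runs, run_name_options=None, compare_run_names=None):
    """Hypothetical model of how QORsum run names are chosen.

    run_name_options: per-run values of set_qor_data_options -run_name
        (None where a run was not named).
    compare_run_names: values passed to compare_qor_data -run_names; when
        given, they take precedence over the per-run -run_name values.
    """
    if compare_run_names is not None:
        return list(compare_run_names)
    names = []
    for index in range(num_runs):
        given = run_name_options[index] if run_name_options else None
        # Assumption: unnamed runs keep the default Run1, Run2, ... numbering.
        names.append(given or f"Run{index + 1}")
    return names


print(resolve_run_names(2))                      # ['Run1', 'Run2']
print(resolve_run_names(2, ["baseline", None]))  # ['baseline', 'Run2']
print(resolve_run_names(2, ["baseline", "new"], ["ref", "trial"]))  # ['ref', 'trial']
```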
The following example specifies the leakage-power scenario, dynamic-power scenario, clock
scenario, and the clock to use for the corresponding summary in the QORsum report:
• Introduction to Clock Gating
Clock gating applies to synchronous load-enable registers, which are flip-flops that share the
same clock and synchronous control signals. Synchronous control signals include
synchronous load-enable, synchronous set, synchronous reset, and synchronous toggle.
Synchronous load-enable registers are represented by a register with a feedback loop that
maintains the same logic value through multiple cycles. Clock gating applied to synchronous
load enable registers reduces the power needed when reloading the register banks.
Figure 1 shows a simple register bank implementation using a multiplexer and feedback loop.
Figure 1: Synchronous Load-Enable Register With Multiplexer
When the synchronous load enable signal (EN) is at logic state 0, the register bank is disabled.
In this state, the circuit uses the multiplexer to feed the Q output of each storage element in
the register bank back to the D input. When the EN signal is at logic state 1, the register is
enabled, allowing new values to load at the D input.
These feedback loops can unnecessarily use power. For example, if the same value is reloaded
in the register throughout multiple clock cycles (EN equals 0), the register bank and its clock
net consume power while values in the register bank do not change. The multiplexer also
consumes power.
Clock gating eliminates the feedback net and multiplexer shown in Figure 1 by inserting a gate
in the clock net of the register.
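Replacing the feedback multiplexer with a clock gate does not change what the register stores. It only removes the clock edges that would reload the same value. The following cycle-level Python sketch compares the two implementations. It assumes ideal, zero-delay behavior, and the function names are hypothetical.

```python
def mux_feedback_register(d_values, en_values, q0=0):
    """Register with a feedback multiplexer: clocked every cycle."""
    q = q0
    history = []
    clock_edges = 0
    for d, en in zip(d_values, en_values):
        clock_edges += 1      # the clock reaches the register every cycle
        q = d if en else q    # the multiplexer selects D or feeds Q back
        history.append(q)
    return history, clock_edges


def clock_gated_register(d_values, en_values, q0=0):
    """Register whose clock is gated by EN: clocked only when EN is 1."""
    q = q0
    history = []
    clock_edges = 0
    for d, en in zip(d_values, en_values):
        if en:                # the clock edge passes only when EN is 1
            clock_edges += 1
            q = d
        history.append(q)
    return history, clock_edges


d = [5, 6, 7, 8, 9, 10]
en = [1, 0, 0, 1, 0, 0]
print(mux_feedback_register(d, en))  # ([5, 5, 5, 8, 8, 8], 6)
print(clock_gated_register(d, en))   # ([5, 5, 5, 8, 8, 8], 2)
```

Both registers hold the same values in every cycle. The gated register sees two clock edges instead of six, which is where the clock-network power saving comes from.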
Note:
While applying clock-gating techniques, the tool treats generated clocks like defined clocks.
The clock-gating cell selectively prevents clock edges, thus preventing the gated-clock signal
from clocking the gated register.
Figure 2 shows a latch-based clock-gating cell and the waveforms of its signals with respect to
the clock signal, CLK.
The clock input to the register bank, ENCLK, is gated on or off by the AND gate. ENL is the
enabling signal that controls the gating; it derives from the EN signal on the multiplexer shown
in Figure 1. The register bank is triggered by the rising edge of the ENCLK signal.
The latch prevents glitches on the EN signal from propagating to the register’s clock pin. When
the CLK input of the 2-input AND gate is at logic state 1, any glitching of the EN signal could,
without the latch, propagate and corrupt the register clock signal. The latch eliminates this
possibility because it blocks signal changes when the clock is at logic 1.
In latch-based clock gating, the AND gate blocks unnecessary clock pulses by maintaining the
clock signal’s value after the trailing edge. For example, for flip-flops inferred by HDL
constructs of rising-edge clocks, the clock gate forces the gated clock to 0 after the falling edge
of the clock.
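The glitch-blocking role of the latch can be illustrated with a small Python model on a discrete time grid. The model makes simplifying assumptions: gates are ideal and zero-delay, and the latch is transparent while CLK is 0 and holds its value while CLK is 1. The name gate_clock is hypothetical.

```python
def gate_clock(clk, en, use_latch=True):
    """Return the gated clock ENCLK = CLK AND ENL, sampled at each time step.

    clk, en: lists of 0/1 samples on a common time grid.
    With use_latch=True, ENL follows EN only while CLK is 0 and holds while
    CLK is 1; with use_latch=False, EN drives the AND gate directly.
    """
    enl = 0
    enclk = []
    for c, e in zip(clk, en):
        if not use_latch or c == 0:
            enl = e  # latch transparent (or no latch at all)
        enclk.append(c & enl)
    return enclk


# Two clock periods, four samples each. EN changes while CLK is 1.
clk = [0, 0, 1, 1, 0, 0, 1, 1]
en = [1, 1, 1, 0, 0, 0, 1, 0]

print(gate_clock(clk, en))                   # [0, 0, 1, 1, 0, 0, 0, 0]
print(gate_clock(clk, en, use_latch=False))  # [0, 0, 1, 0, 0, 0, 1, 0]
```

With the latch, the first clock pulse passes whole and the glitch in the second period is blocked. Without the latch, the first pulse is cut short and the glitch appears as a spurious clock pulse.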
By controlling the clock signal for the register bank, you can eliminate the need for reloading
the same value in the register through multiple clock cycles. Clock gating inserts clock-gating
circuitry into the register bank’s clock network, creating the control to eliminate unnecessary
register activity.
Clock gating does the following:
Reduces clock network power dissipation
Relaxes datapath timing
Reduces congestion by eliminating feedback multiplexer loops
For designs that have large register banks, clock gating can save power and area by reducing
the number of gates in the design. However, for smaller register banks, the overhead of adding
logic to the clock tree might outweigh the power saved by eliminating a few feedback nets and
multiplexers.
o Clock Gating Flows
The Fusion Compiler tool inserts clock-gating cells during the compile_fusion command. The
following topics describe clock-gating flows:
During the compile process, the tool inserts clock gates on the registers that qualify for clock
gating. By default, during clock-gate insertion, the compile_fusion command uses the default
clock-gating settings and honors the setup, hold, and other constraints specified in the
logic libraries. To override the setup and hold values specified in the library, use
the set_clock_gating_check command before compiling the design. The default settings are
suitable for most designs.
The compile_fusion command automatically connects the scan enable and test ports or pins of
the integrated clock-gating cells, as needed.
Use the report_clock_gating command to report the registers and the clock-gating cells in the
design. Use the report_power command to get information about the dynamic power used by
the design after clock-gate insertion.
The following example illustrates a typical command sequence for clock gating with the default settings:
fc_shell> read_verilog design.v
fc_shell> create_clock -period 10 -name CLK
fc_shell> compile_fusion
fc_shell> insert_dft
fc_shell> report_clock_gating
fc_shell> report_power
• Placement-Aware Clock Gating
When you enable placement-aware clock gating, the tool performs the following
optimizations:
Replication of clock gates with timing-critical enables
Adjustment of clock gates so they are placed closer to their gated registers
Automatic clock network latency annotation for clock-gate cells
o Hierarchical Synthesis Flow When Floorplans are Not Available
Identify the design partitions and split the constraints as shown in the following example:
fc_shell> set_budget_options -add_blocks {BLOCK1 BLOCK2}
fc_shell> split_constraints
Create the subblock design libraries with design information and enable block-specific
reference library setup as shown in the following example:
fc_shell> copy_lib -to_lib BLOCK1.nlib -no_design
Load the UPF and SDC constraints for the unmapped subblocks and the top level, which were
generated earlier by the split_constraints command, as shown in the following example:
fc_shell> set_constraint_mapping_file ./split/mapfile
Run the compile_fusion command until logic optimization, with auto floorplanning for the
subblocks, is complete, as shown in the following example:
fc_shell> compile_fusion -to logic_opto
Run the compile_fusion command until technology mapping for the top-level design is
complete as shown in the following example:
fc_shell> compile_fusion -to initial_map
Perform the design planning operations, from floorplan initialization, subblock and voltage
area shaping, hard macro and standard cell placement, and power network creation through
pin assignment.
Run the compile_fusion command until logic optimization for the top-level design is complete
as shown in the following example:
fc_shell> compile_fusion -from logic_opto -to logic_opto
Perform incremental standard cell placement for the top level only, as shown in the following
example:
fc_shell> create_placement -floorplan -use_seed_locs
Perform timing estimation and budgeting steps to generate the block budgets needed for block
level implementation.
After completing the design planning operations, rebuild the subblock and top-level designs by
opening the elaborated RTL NDM and loading the floorplan, feedthroughs, and budgets.
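The -from and -to options of compile_fusion used in these steps select a contiguous range of compile stages. The following Python sketch models that selection. The stage order in STAGES is an assumption based on the stage names commonly listed for compile_fusion; only initial_map, logic_opto, initial_place, and initial_opto appear in this document, so confirm the list for your tool version. stages_to_run is a hypothetical helper, not a tool command.

```python
# Assumed compile_fusion stage order; verify against your tool version.
STAGES = [
    "initial_map",
    "logic_opto",
    "initial_place",
    "initial_drc",
    "initial_opto",
    "final_place",
    "final_opto",
]


def stages_to_run(from_stage=None, to_stage=None):
    """Return the stages covered by -from from_stage -to to_stage.

    A missing -from starts at the first stage, and a missing -to runs to the
    last stage. Unknown stage names raise ValueError (from list.index).
    """
    start = STAGES.index(from_stage) if from_stage else 0
    end = STAGES.index(to_stage) if to_stage else len(STAGES) - 1
    if start > end:
        raise ValueError(f"-from {from_stage} comes after -to {to_stage}")
    return STAGES[start:end + 1]


print(stages_to_run(to_stage="logic_opto"))       # ['initial_map', 'logic_opto']
print(stages_to_run(to_stage="initial_map"))      # ['initial_map']
print(stages_to_run("logic_opto", "logic_opto"))  # ['logic_opto']
```

For example, -from logic_opto -to logic_opto covers a single stage. That matches the top-level step that runs only logic optimization after technology mapping.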
o Hierarchical Synthesis Flow When Floorplans are Available
To partition and plan the top-level design and create the lower-level blocks for subsequent
bottom-up synthesis, when the floorplans of subblock and top-level design are available,
perform the following steps:
Figure 1: Hierarchical Synthesis Flow When Floorplans are Available
1. Read in the full chip design and apply constraints as shown in the following example:
fc_shell> set REF_LIBS "stdcell.ndm macro.ndm"
-ref_libs $REF_LIBS
2. For the design partitions, split the constraints as shown in the following example:
fc_shell> set_budget_options -add_blocks {BLOCK1 BLOCK2}
fc_shell> split_constraints
3. Create the subblock design libraries with design information and enable block-specific
reference library setup as shown in the following example:
fc_shell> copy_lib -to_lib BLOCK1.nlib -no_design
4. Load the UPF and SDC constraints for the unmapped subblocks as well as the top level,
which were generated earlier by the split_constraints command, as shown in the
following example:
fc_shell> set_constraint_mapping_file ./split/mapfile
5. Perform standard cell placement, followed by pin assignment, timing estimation, and
budgeting.
6. After completing the design planning operations, rebuild the subblock and top-level designs by
opening the elaborated RTL NDM and loading the floorplan, feedthroughs, and budgets.
Note:
For details about each design planning operation, see the Design Planning User Guide.
o Synthesizing a Subblock
To synthesize a subblock and generate the block-level information required for top-level
synthesis, perform the following steps:
1. Read in a subblock design library generated after the rebuild step, as described in Partitioning
and Planning the Full Chip Design.
2. Apply any block-specific constraints and settings required for synthesizing the block.
3. Apply DFT settings by using commands such as set_dft_signal and set_scan_configuration,
and create a test protocol by using the create_test_protocol command.
4. Insert DFT logic and synthesize the block by using the following commands:
fc_shell> compile_fusion -to initial_opto
fc_shell> insert_dft
fc_shell> compile_fusion -from initial_place
5. Create a read-only abstract view, a frame view, and a test model, and save the block by
using the following commands:
fc_shell> create_abstract -read_only
fc_shell> create_frame
fc_shell> write_test_model BLOCK1.ctl
fc_shell> save_block -as BLOCK1/PostSynthesis
o Synthesizing the Top-Level
To synthesize the top-level, perform the following steps:
1. Read in the top-level design generated after the rebuild step, as described in Partitioning and
Planning the Full Chip Design.
2. Apply the appropriate settings and link the top-level design as shown in the following
example:
fc_shell> set_label_switch_list -reference {BLOCK1 BLOCK2} PostCompile
fc_shell> set_attribute -objects {BLOCK1.nlib BLOCK2.nlib} \
-name use_hier_ref_libs -value true
fc_shell> set_top_module TOP
When a subblock has been saved in the design library with labels, use
the set_label_switch_list command to specify which label to link to.
The set_top_module command links the specified top-level module.
3. Apply any block-specific constraints and settings required for synthesizing the top-level design,
and read in the test models for the subblocks as shown in the following example:
fc_shell> read_test_model BLOCK1.ctl
fc_shell> read_test_model BLOCK2.ctl
fc_shell> create_test_protocol
4. Insert DFT logic and synthesize the top-level design by using the following commands:
fc_shell> compile_fusion -to initial_opto
fc_shell> insert_dft
fc_shell> compile_fusion -from initial_place