High Level Synthesis With Catapult C: Michal Stala
MICHAL STALA
Motivation
What you learn in the DSP-course is actually used in the
real world
Introduction to Lab 2/3
Topics
Introduction to Catapult
High Level Synthesis (HLS) advantages
Catapult Demo
Introduction: normal design flow
Design (VHDL) -> Synthesis -> Netlist (VHDL)
Introduction: Catapult design flow
SystemC/C++ code + Constraints -> Catapult -> Intermediate netlist (VHDL) -> Synthesis -> Netlist (VHDL)
Introduction Catapult
Writing software-style C++ code for hardware design
Successful projects require HW engineers, not SW engineers
It is important to know HW concepts to get optimal results
The following concepts that you have learned in the DSP
course are used in the Catapult SW
Folding/Unfolding
Re-timing/Pipelining
Bit-level optimization
and more
HLS advantages
Verification/debug
QuestaSim vs. SW debugger
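#include <systemc.h>

// Combinational adder: dout = a + b, re-evaluated whenever a or b changes.
// The module header and port declarations are not visible on the slide;
// the port types and widths below are assumptions.
SC_MODULE(sc_method_comb_example) {
    sc_in<bool> clk, rst;       // named in the constructor, unused by exec()
    sc_in<sc_int<8> > a, b;     // assumed 8-bit operands
    sc_out<sc_int<9> > dout;    // 9 bits, matching the width of sum

    SC_CTOR(sc_method_comb_example) : clk("clk"),
        rst("rst"), a("a"), b("b"), dout("dout")
    {
        SC_METHOD(exec);
        sensitive << a << b;    // combinational: trigger on either input
    }
    void exec() {
        sum = a.read() + b.read();
        dout.write(sum);
    }
private:
    sc_int<9> sum;
};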
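Because the design is ordinary C++, the adder's behaviour can be checked with a normal compiler and SW debugger before any RTL simulation. The following is a minimal sketch, not the lab code: it assumes 8-bit signed inputs (the slide does not show the port widths), and `add9`, `fits_signed` and `check_adder_width` are hypothetical helpers standing in for the SystemC module.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the exec() body: adds two 8-bit signed values.
// The result is returned in a wider type, mirroring the 9-bit sum on the slide.
int16_t add9(int8_t a, int8_t b) {
    return static_cast<int16_t>(a + b);  // operands are promoted to int first
}

// True if v fits in a signed two's-complement word of `bits` bits.
bool fits_signed(int v, int bits) {
    const int lo = -(1 << (bits - 1));
    const int hi = (1 << (bits - 1)) - 1;
    return v >= lo && v <= hi;
}

// Exhaustive check over all 8-bit input pairs: the sum always fits in
// 9 bits, while 8 bits would overflow for at least one pair.
void check_adder_width() {
    bool overflow8 = false;
    for (int a = -128; a <= 127; ++a) {
        for (int b = -128; b <= 127; ++b) {
            int s = add9(static_cast<int8_t>(a), static_cast<int8_t>(b));
            assert(s == a + b);
            assert(fits_signed(s, 9));
            if (!fits_signed(s, 8)) overflow8 = true;
        }
    }
    assert(overflow8);
}
```

Calling `check_adder_width()` from a `main()` and stepping through it in a debugger is the kind of fast SW-side check the slide contrasts with a QuestaSim run.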
Other advantages
Low- and high-level design: optimize where needed
Reduce time for design changes
Efficient SW/HW co-simulation
The same environment as SW, for example gcc
Same code, different environments
HW/SW co-simulation
A VHDL design is restricted to its requirements
Frequency requirement
System clock is 100 MHz and all blocks should use
this frequency
ASIC target lib
Target is, for example, 28 nm CMOS
Think big!
[Figure: multiply-add datapath, multiplier (X) feeding an adder (+) that drives out]
void f(int &x_in, int &y_in, int &z_in, int &out){
    out = x_in*y_in+z_in;
}
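Since the function is plain C++, it can be compiled with gcc and exercised by a software testbench before synthesis. The sketch below repeats `f` from the slide; `check_f` is a hypothetical testbench added for illustration only.

```cpp
#include <cassert>

// Function from the slide: multiply-add, out = x_in * y_in + z_in.
void f(int &x_in, int &y_in, int &z_in, int &out) {
    out = x_in * y_in + z_in;
}

// Hypothetical software testbench: compares f against a reference
// expression over a small grid of inputs.
void check_f() {
    for (int x = -3; x <= 3; ++x) {
        for (int y = -3; y <= 3; ++y) {
            for (int z = -3; z <= 3; ++z) {
                int out = 0;
                f(x, y, z, out);
                assert(out == x * y + z);
            }
        }
    }
}
```

The same testbench can later drive the synthesized design in co-simulation, which is the "same code, different environments" point made earlier.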
Catapult pipeline
[Figure: x_in and y_in feed the multiplier (X); the product and z_in feed the adder (+), which drives out]
Where do we put the cutset?
Catapult pipeline
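[Figure: the same datapath with a cutset; one register (D) on the z_in path and one register (D) between the multiplier (X) and the adder (+)]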
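The effect of this cutset can be sketched as a cycle-based C++ model: the two D elements become registers, the critical path is split into a multiply stage and an add stage, and the output appears one cycle later. `PipelinedMac` and `check_pipeline` are hypothetical names, and the registers are assumed to reset to zero.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical cycle-based model of the pipelined multiply-add.
struct PipelinedMac {
    int prod_reg = 0;  // D between the multiplier and the adder
    int z_reg = 0;     // D on the z_in path

    // One clock cycle: the adder sees the values registered in the previous
    // cycle, then the registers capture the new multiplier output and z_in.
    int clock(int x_in, int y_in, int z_in) {
        int out = prod_reg + z_reg;
        prod_reg = x_in * y_in;
        z_reg = z_in;
        return out;
    }
};

// The pipelined output equals the unpipelined result delayed by one cycle.
void check_pipeline() {
    const std::vector<int> x = {1, 2, 3, 4};
    const std::vector<int> y = {5, 6, 7, 8};
    const std::vector<int> z = {9, 10, 11, 12};
    PipelinedMac mac;
    for (std::size_t n = 0; n < x.size(); ++n) {
        int out = mac.clock(x[n], y[n], z[n]);
        int expected = (n == 0) ? 0 : x[n - 1] * y[n - 1] + z[n - 1];
        assert(out == expected);
    }
}
```

Both registers sit on the same cutset, so every input-to-output path gains exactly one delay and the function is unchanged apart from the added latency.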
Time for demo!
Imagine a huge design
It is difficult to find the optimal pipeline solution (the largest ASICs
used 5B gates in 2013)
All the cutset possibilities are difficult to analyze
All optimizations are wasted when moving to another technology node or
frequency
Catapult unfolding
Unfolding or parallel processing
Speed up the computation by a factor J
Area penalty
Parallel 3-Tap
[Figure: 3-tap FIR filter with coefficients b0, b1, b2, showing the y(2k+1) output branch]
Parallel 3-Tap (3)
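[Figure: 2-parallel 3-tap FIR filter. 2 inputs per cycle, x(2k+1) and x(2k); delay elements (D) hold x(2k-1) and x(2k-2); two branches with coefficients b0, b1, b2 produce y(2k) and y(2k+1)]
Input streams: x(2k) = x(0), x(2), x(4) and x(2k+1) = x(1), x(3), x(5)
For k=0:
y(0) = b0 x(0) + b1 x(-1) + b2 x(-2)
y(1) = b0 x(1) + b1 x(0) + b2 x(-1)
Output streams: y(2k) = y(0), y(2), y(4) and y(2k+1) = y(1), y(3), y(5)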
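The 2-parallel structure can be sketched in C++ by unfolding the serial filter loop by J = 2 and checking that both versions give the same output. This is an illustrative model, not Catapult output: `fir_serial`, `fir_unfolded2` and `check_unfolding` are hypothetical names, samples before x(0) are assumed to be zero, and the input length is assumed to be even.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference: serial 3-tap FIR, y(n) = b0 x(n) + b1 x(n-1) + b2 x(n-2),
// with x(n) = 0 for n < 0.
std::vector<int> fir_serial(const std::vector<int> &x, int b0, int b1, int b2) {
    std::vector<int> y(x.size());
    int x1 = 0, x2 = 0;  // x(n-1), x(n-2)
    for (std::size_t n = 0; n < x.size(); ++n) {
        y[n] = b0 * x[n] + b1 * x1 + b2 * x2;
        x2 = x1;
        x1 = x[n];
    }
    return y;
}

// Unfolded by J = 2: each iteration consumes x(2k) and x(2k+1) and
// produces y(2k) and y(2k+1). Assumes x.size() is even.
std::vector<int> fir_unfolded2(const std::vector<int> &x, int b0, int b1, int b2) {
    std::vector<int> y(x.size());
    int xm1 = 0, xm2 = 0;  // the two D elements: x(2k-1), x(2k-2)
    for (std::size_t k = 0; 2 * k + 1 < x.size(); ++k) {
        const int xe = x[2 * k];      // x(2k)
        const int xo = x[2 * k + 1];  // x(2k+1)
        y[2 * k]     = b0 * xe + b1 * xm1 + b2 * xm2;
        y[2 * k + 1] = b0 * xo + b1 * xe  + b2 * xm1;
        xm2 = xe;  // becomes x(2(k+1)-2)
        xm1 = xo;  // becomes x(2(k+1)-1)
    }
    return y;
}

// The unfolded filter must match the serial reference sample for sample.
void check_unfolding() {
    const std::vector<int> x = {1, 2, 3, 4, 5, 6};
    assert(fir_unfolded2(x, 1, 2, 3) == fir_serial(x, 1, 2, 3));
}
```

Each loop iteration of `fir_unfolded2` uses two sets of multipliers and adders, which is the area penalty paid for processing two samples per cycle.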
Time for Catapult magic!