AI Engine Development for Versal: Olivier Tremois, PhD, SW Technical Marketing, AI Engine Tools
Versal
Olivier TREMOIS, PhD
SW Technical Marketing AI Engine Tools
Protocol Engines
• Integrated 600G cores
• 4X encrypted bandwidth
Network-on-Chip
• Guaranteed bandwidth
• Enables SW programmability
[Figure: AI Engine array, showing AI Engine tiles and their memory blocks]
Huge performance improvements versus UltraScale+
˃ 8x compute density @ 40% lower power
1GHz+ VLIW / SIMD vector processors
˃ Versatile core for ML and other advanced DSP workloads
Massive array of interconnected cores
˃ Instantiate multiple tiles (10s to 100s) for scalable compute
Terabytes/sec of interface bandwidth to other engines
˃ Direct, massive throughput to adaptable HW engines
˃ Implement core application with AI for “Whole App Acceleration”
SW programmable for any developer
˃ C programmable, compile in minutes
˃ Library-based design for ML framework developers
[Figure: Platform-based design. Labels: HW Platform (PL Fabric + AIE), SW Platform (PS), Application, Subsystem #2 … Subsystem #N (PL, Firmware), PS]
Subsystems form the customer’s differentiating logic: AIE and PL kernels, operating under the supervision of the PS
Versal platform provides essential infrastructure services (CIPS, NoC, I/Os, OS, Drivers…)
Platform insulates developers from low-level details; lets them focus on application development (SW, PL or AIE)
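[Figure: Vitis development flow. Design sources: AIE kernels and graph, PL kernels (HLS), RTL kernels, PS application (XRT, Graph API). Platforms: Vitis HW platform; Vitis SW platform (AIE driver, Linux + rootfs, SSW). Block-level verification: AIE simulation, HLS cosimulation, RTL verification. Builds: Vivado HW build with timing closure; SIM build. Targets: run on device; HW emulation (AIESim, QEMU, SIM). Profile and debug with Vitis and Vivado.]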
˃ The programming model allows you to use:
˃ Various vector datatypes
˃ AI Engine intrinsics
˃ Window function API, …
˃ ADF graph based programming
˃ Modular, hierarchical graph definition
˃ Instantiation of AI Engine memories, streams, …
window_decr_v8(w_input,1);
}
}
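The window function API mentioned above moves a read position through a buffer of samples. As a rough host-side illustration only, the sketch below models that idea in plain C++. It assumes that `window_decr_v8(w, n)` steps the position back by `n` vectors of 8 samples, as its name suggests. `WindowModel`, its members, and the 16-bit sample type are hypothetical names and choices made for this sketch; they are not part of the AI Engine API.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side model of a window: a sample buffer plus a read position.
// This is NOT the AI Engine window type, only an illustration of the pointer moves.
struct WindowModel {
    std::vector<std::int16_t> data;  // samples held by the window (type is an assumption)
    std::size_t pos = 0;             // current read position, in samples

    // Read one vector of 8 samples and advance the position by 8.
    std::array<std::int16_t, 8> readincr_v8() {
        assert(pos + 8 <= data.size());
        std::array<std::int16_t, 8> v{};
        for (std::size_t i = 0; i < 8; ++i) {
            v[i] = data[pos + i];
        }
        pos += 8;
        return v;
    }

    // Step the position back by `count` vectors of 8 samples
    // (assumed meaning of window_decr_v8(w, count)).
    void decr_v8(std::size_t count) {
        assert(pos >= 8 * count);
        pos -= 8 * count;
    }
};
```

Reading a vector, stepping back by one, and reading again returns the same 8 samples, which is how a FIR kernel can reuse overlapping input data between iterations.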
>> 11 © Copyright 2020 Xilinx
Graph Development, Validation and Optimization
˃ I/F optimization
˃ Graph optimization
˃ Location constraints, stamp (AI Engine graph map) and repeat
˃ FIFO settings, circuit/packet switch communications, …
AI Engine Application described as a graph: Single Kernel Based Graph

#include <adf.h>
#include "kernels.h"
using namespace adf;

class myGraph : public graph {
private:
    kernel kernel1, kernel2;
public:
    // The I/Os of the graph are "ports".
    input_port in;
    input_port NSamples;
    input_port Coefficients;
    output_port out;

    // The constructor of the graph describes all the connections
    // and some other parameters.
    myGraph() {
        kernel1 = kernel::create(fir_16taps_symm);
        kernel2 = kernel::create(fir_23taps_symm);

        // For "Single Kernel programming" this section is very simple.
        // Data path: in -> kernel1 -> kernel2 -> out, in windows of 128 bytes.
        connect< window<128> > net0 (in, kernel1.in[2]);
        connect< window<128> > net1 (kernel1.out[0], kernel2.in[0]);
        connect< window<128> > net2 (kernel2.out[0], out);

        // Run-time parameters of kernel1, updated asynchronously.
        connect<parameter> (NSamples, async(kernel1.in[0]));
        connect<parameter> (Coefficients, async(kernel1.in[1]));

        // Source file of each kernel.
        source(kernel1) = "kernels/Kernel_1.cc";
        source(kernel2) = "kernels/Kernel_2.cc";

        // Fraction of one core's processing time budgeted for each kernel.
        runtime<ratio>(kernel1) = 0.1;
        runtime<ratio>(kernel2) = 0.1;
    }
};
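Two numbers in the graph above are worth a quick sanity check: the `window<128>` size, which is given in bytes, and the `runtime<ratio>` values. The sketch below is plain host-side C++, not ADF code. It assumes the usual reading of the runtime ratio as the fraction of one core's cycles a kernel may use, so that kernels whose ratios sum to at most 1 are candidates for sharing a core. The helper names `samples_per_window` and `may_share_one_core` are hypothetical, and the sample types are assumptions because the slide does not state them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of samples of type Sample carried by one window of `window_bytes` bytes.
// Hypothetical helper; the window size in connect< window<N> > is given in bytes.
template <typename Sample>
constexpr std::size_t samples_per_window(std::size_t window_bytes) {
    return window_bytes / sizeof(Sample);
}

// True when the given runtime ratios sum to at most 1.0, i.e. when the kernels
// could fit on one core under the assumption stated in the lead-in.
// Hypothetical helper; the actual placement decision belongs to the compiler.
bool may_share_one_core(const std::vector<double>& runtime_ratios) {
    double total = 0.0;
    for (double r : runtime_ratios) {
        assert(r > 0.0 && r <= 1.0);  // a ratio is a fraction of one core
        total += r;
    }
    return total <= 1.0;
}
```

With 32-bit samples, `samples_per_window<std::int32_t>(128)` gives 32 samples per kernel invocation, and `may_share_one_core({0.1, 0.1})` is true for the two kernels above.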
Testbench

#include <adf.h>        // Adaptive Dataflow Library
using namespace adf;
#include "kernels.h"
#include "kernels/include.h"
#include "project.h"

// AI Engine graph
kernelOptGraph mygraph;

// Creation of a virtual platform:
//  - input test vector file
//  - output vector file
//  - connection of the graph
simulation::platform<1,1> platform("data/input.txt", "data/output.txt");
connect<> net0(platform.src[0], mygraph.in);
connect<> net1(mygraph.out, platform.sink[0]);

int main(void) {
    int32 taps[16] = {-100, 200, -300, 400, -500, 600, -700, 800,
                      800, -700, 600, -500, 400, -300, 200, -100};
    return 0;
}
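The 16 taps in the testbench are symmetric (tap k equals tap 15 - k), which is what a kernel named `fir_16taps_symm` would exploit. When checking a simulation output file, a host-side golden model can help. The following is a minimal sketch of such a model in plain C++, under stated assumptions: zero initial filter state, full-precision 64-bit output, and no shift, rounding, or saturation, any of which the real kernel may apply. The names `kTaps` and `fir16_symm_reference` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Taps copied from the testbench above.
constexpr std::array<std::int32_t, 16> kTaps = {
    -100, 200, -300, 400, -500, 600, -700, 800,
     800, -700, 600, -500, 400, -300, 200, -100};

// Scalar reference of a 16-tap symmetric FIR (hypothetical helper).
// Uses the symmetry to halve the multiplications:
//   y[n] = sum over k = 0..7 of taps[k] * (x[n - k] + x[n - 15 + k])
// Samples before the start of x are taken as zero (assumption).
std::vector<std::int64_t> fir16_symm_reference(const std::vector<std::int32_t>& x) {
    for (std::size_t k = 0; k < 8; ++k) {
        assert(kTaps[k] == kTaps[15 - k]);  // the pre-add trick needs symmetric taps
    }
    auto sample = [&x](std::ptrdiff_t i) -> std::int64_t {
        return i >= 0 ? static_cast<std::int64_t>(x[static_cast<std::size_t>(i)]) : 0;
    };
    std::vector<std::int64_t> y(x.size(), 0);
    for (std::size_t n = 0; n < x.size(); ++n) {
        const auto i = static_cast<std::ptrdiff_t>(n);
        std::int64_t acc = 0;
        for (std::ptrdiff_t k = 0; k < 8; ++k) {
            const std::int64_t tap = kTaps[static_cast<std::size_t>(k)];
            acc += tap * (sample(i - k) + sample(i - 15 + k));
        }
        y[n] = acc;
    }
    return y;
}
```

Feeding a unit impulse returns the taps themselves, which is a quick way to confirm the model before comparing it against simulator output.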
Profiling Viewer
Simulation Timeline analysis
Can also be used within a Makefile flow
[Figure: System project. Labels: HW Link (system configuration file); Processing System: OS (Baremetal or Linux), AI Engine drivers, PS application (XRT, OpenCL); Graph]
[Figure: Example system, data path from DDR to DDR. Blocks: MM2S (PL DMA), Weighted sum (AI Engine), Average (AI Engine), Classifier (AI Engine), PolarClip HLS (PL Kernel), S2MM (PL DMA). A second view shows the Programmable Logic part only: MM2S (PL DMA), PolarClip HLS (PL Kernel), S2MM (PL DMA).]
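The data movers in the diagram above follow a common pattern: MM2S reads memory and produces a stream, and S2MM consumes a stream and writes memory. The sketch below models that pattern on the host in plain C++, with a `std::queue` standing in for a stream. It is an illustration under assumptions, not the actual PL kernels: all names (`Stream`, `mm2s`, `s2mm`, `passthrough_stage`, `run_pipeline`) are hypothetical, and the processing stage is a simple copy in place of the AI Engine and PL kernels.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <vector>

// Hypothetical stand-in for a hardware stream between blocks.
using Stream = std::queue<std::int32_t>;

// Memory-mapped to stream: push every word of the input buffer onto the stream.
void mm2s(const std::vector<std::int32_t>& mem, Stream& out) {
    for (std::int32_t v : mem) {
        out.push(v);
    }
}

// Stream to memory-mapped: pop `count` words from the stream into the output buffer.
void s2mm(Stream& in, std::vector<std::int32_t>& mem, std::size_t count) {
    assert(in.size() >= count);
    for (std::size_t i = 0; i < count; ++i) {
        mem.push_back(in.front());
        in.pop();
    }
}

// Placeholder for the processing kernels: copies its input to its output.
void passthrough_stage(Stream& in, Stream& out) {
    while (!in.empty()) {
        out.push(in.front());
        in.pop();
    }
}

// Buffer in -> MM2S -> processing -> S2MM -> buffer out.
std::vector<std::int32_t> run_pipeline(const std::vector<std::int32_t>& ddr_in) {
    Stream to_kernels;
    Stream from_kernels;
    std::vector<std::int32_t> ddr_out;
    mm2s(ddr_in, to_kernels);
    passthrough_stage(to_kernels, from_kernels);
    s2mm(from_kernels, ddr_out, ddr_in.size());
    return ddr_out;
}
```

Replacing `passthrough_stage` with a model of each kernel gives a simple end-to-end reference for the buffers the PS application sends and receives.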
Vitis is a unified tool that is used throughout the AI Engine development flow