AI Engine Development for Versal
Olivier TREMOIS, PhD
SW Technical Marketing, AI Engine Tools
© Copyright 2020 Xilinx

Versal Architecture Overview
Scalar Engines
• Platform Management Controller (PMC)
• Edge Compute
Adaptable Engines
• 2X compute density
Intelligent Engines
• AI Compute
• Diverse DSP workloads
Network-on-Chip
• Guaranteed bandwidth
• Enables SW programmability
Protocol Engines
• Integrated 600G cores
• 4X encrypted bandwidth
Programmable I/O
• Any sensor, any interface
• Extendable peripheral set
DDR Memory
• 2X bandwidth/pin
• Server-class density
PCIe & CCIX
• 2X PCIe & DMA bandwidth
• Cache-coherent interface to accelerators
Transceivers
• Broad range, 25G → 112G
• 58G in mainstream devices

AI Engines
Hardened Compute, Memory & Interconnect
[Figure: array of AI Engine tiles, each paired with its local memory]
Huge performance improvements versus UltraScale+
˃ 8x compute density at 40% lower power
1GHz+ VLIW / SIMD vector processors
˃ Versatile core for ML and other advanced DSP workloads
Massive array of interconnected cores
˃ Instantiate multiple tiles (10s to 100s) for scalable compute
Terabytes/sec of interface bandwidth to other engines
˃ Direct, massive throughput to adaptable HW engines
˃ Implement the core application together with AI for “Whole App Acceleration”
SW programmable for any developer
˃ C programmable, compile in minutes
˃ Library-based design for ML framework developers

Vitis Philosophy: Platforms and Subsystems
[Figure: HW platform (PL fabric + AIE) and SW platform (PS); Subsystem #1 combines AIE, PL and firmware, Subsystems #2 to #N combine PL and firmware; the application runs on the PS]
• Subsystems form the customer’s differentiating logic: AIE and PL kernels operating under the supervision of the PS
• The Versal platform provides essential infrastructure services (CIPS, NoC, I/Os, OS, drivers, …)
• The platform insulates developers from low-level details, letting them focus on application development (SW, PL or AIE)

Vitis 2020.2 Flow for Versal
• AIE: AI Engine kernels and graph, verified with AI Engine simulation
• PL (HLS): PL kernels, verified with HLS cosimulation
• PL (RTL): RTL kernels, verified with RTL verification
• Platform: Vitis HW platform, and Vitis SW platform (AIE driver, Linux + rootfs)
• PS: PS application using XRT and the Graph API
• PL and AIE integration (v++ --link), followed by either the Vivado HW build with timing closure, or a SIM build
• Generate binary (v++ --package), together with the SSW
• Run on the device, or in HW emulation (AIESim, QEMU, SIM)
• Profile and debug with Vitis and Vivado


AI Engine Programming
[Figure: example AI Engine graph with kernels polarclip, feedback, equalizer, fir_tap_11, fir_tap_7 and scale]
Single Kernel Programming
˃ Create AI Engine kernel programs
˃ The programming model allows you to use:
  ˃ Various vector datatypes
  ˃ AI Engine intrinsics
  ˃ The window function API, …
˃ Analyze and debug kernel code
  ˃ Compile, simulate, profile, …
AI Engine Application
˃ Create multi-kernel AI Engine projects
˃ ADF graph-based programming
  ˃ Modular, hierarchical graph definition
  ˃ Instantiation of AI Engine memories, streams, …
˃ Analyze and debug
  ˃ Dataflow, function scheduling, …

Programming Flow
Kernel Functional and Performance Validation
Kernel Development
• Single Node Development template, or your own single-node project
• Simple connection of the graph to the environment
• Required for profiling and low-level analysis
Kernel Validation
• In-context: full AI Engine array access, PL connection, …
• Debug at the kernel level
Single Kernel Optimization
• Code vectorization, vector datatypes
• Vector intrinsics, optimized interface, …

AI Engine Kernel Programming Flow
• Functional reference (Matlab/C/C++) → functional verification
• Kernel vectorization (code restructuring, vector data types, function intrinsics, memory optimization) → performance verification
• AIE-optimized C/C++ with directives (pragmas): loop unrolling, software pipelining
• AIE compiler → AIE assembly code → cycle counts
Software development framework:
• C/C++ verification (debugger)
• Profiler
• SW-Emulation: functional only
• AIE-Emulation: cycle-true
AI Engine Programming: Standard Vector Programming Techniques
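The flow starts from a functional reference. As a minimal sketch, assuming a plain host-side C++ model (not AI Engine code), the reference below computes a 16-tap FIR on complex integer samples twice: once in direct form, and once restructured to exploit coefficient symmetry (h[k] == h[15 - k]), which is presumably the property the symmetric intrinsics (mul4_sym, mac4_sym) rely on. The names CInt, fir_direct and fir_symmetric are hypothetical; the model accumulates in 64 bits and does not reproduce the kernel's srs() shift and saturation.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Complex integer sample for the host-side model (the AI Engine uses cint16).
// 64-bit fields so that accumulation never overflows in this model.
struct CInt {
    int64_t re;
    int64_t im;
};

constexpr std::size_t kTaps = 16;

// Direct form: y[n] = sum_{k=0}^{15} h[k] * x[n - k].
// Outputs are produced only once the delay line is full (n >= 15).
std::vector<CInt> fir_direct(const std::vector<CInt>& x,
                             const std::array<int32_t, kTaps>& h) {
    std::vector<CInt> y;
    for (std::size_t n = kTaps - 1; n < x.size(); ++n) {
        CInt acc{0, 0};
        for (std::size_t k = 0; k < kTaps; ++k) {
            acc.re += h[k] * x[n - k].re;
            acc.im += h[k] * x[n - k].im;
        }
        y.push_back(acc);
    }
    return y;
}

// Restructured form, valid only when h[k] == h[15 - k]: pre-add the two
// samples that share a coefficient, then multiply once (8 multiplies per
// output component instead of 16).
std::vector<CInt> fir_symmetric(const std::vector<CInt>& x,
                                const std::array<int32_t, kTaps>& h) {
    std::vector<CInt> y;
    for (std::size_t n = kTaps - 1; n < x.size(); ++n) {
        CInt acc{0, 0};
        for (std::size_t k = 0; k < kTaps / 2; ++k) {
            const CInt& a = x[n - k];
            const CInt& b = x[n - (kTaps - 1 - k)];
            acc.re += h[k] * (a.re + b.re);
            acc.im += h[k] * (a.im + b.im);
        }
        y.push_back(acc);
    }
    return y;
}
```

Feeding an impulse through both functions returns the taps themselves, and the two forms agree for any symmetric coefficient set, so the restructured version can serve as the reference for the vectorized kernel.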

Kernel Programming
A kernel is a C/C++ function that uses special I/O and vector data types. It is launched automatically by a scheduler in response to events (for example, when its input data is available). Kernels are written against the Adaptive Dataflow (ADF) library:

    #include <adf.h>   // Adaptive Dataflow library

    // SRS_SHIFT (the shift applied by srs()) is assumed to be defined in a project header.
    void fir_16taps_symm(const unsigned samples, const int32 (&taps_in)[16],
                         input_window_cint16 * w_input, output_window_cint16 * w_output)
    {
        // Vector datatypes for vectorized computations
        v16int16 coeffs;
        v32cint16 sbuff = undef_v32cint16();
        for (unsigned i = 0; i < 12; i++)
            coeffs = shft_elem(coeffs, (int16) taps_in[15 - i]);

        const unsigned LSIZE = (samples / 4);

        // Directives to help the compiler schedule the loop for performance:
        // chess_loop_range(2,) states that the loop runs at least twice,
        // chess_prepare_for_pipelining enables software pipelining.
        for (unsigned i = 0; i < LSIZE; i += 2)
            chess_loop_range(2,)
            chess_prepare_for_pipelining
        {
            v4cacc48 acc;
            // C window API to access data
            sbuff = upd_w(sbuff, 0, window_readincr_v8(w_input));
            sbuff = upd_w(sbuff, 1, window_readincr_v8(w_input));
            sbuff = upd_w(sbuff, 2, window_read_v8(w_input));

            // AI Engine intrinsics to perform vectorized computation
            acc = mul4_sym(sbuff, 0, 0x3210, 1, 15, coeffs, 0, 0x0000, 1);
            acc = mac4_sym(acc, sbuff, 4, 0x3210, 1, 11, coeffs, 4, 0x0000, 1);
            window_writeincr(w_output, srs(acc, SRS_SHIFT));

            acc = mul4_sym(sbuff, 4, 0x3210, 1, 19, coeffs, 0, 0x0000, 1);
            acc = mac4_sym(acc, sbuff, 8, 0x3210, 1, 15, coeffs, 4, 0x0000, 1);
            window_writeincr(w_output, srs(acc, SRS_SHIFT));
            window_decr_v8(w_input, 1);
        }
    }
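The window accesses in this loop overlap from one iteration to the next. The sketch below is a hypothetical C++ model of that pointer arithmetic only, assuming that window_readincr_v8 reads 8 samples and advances the window pointer by 8, window_read_v8 reads 8 samples without advancing, and window_decr_v8(w, 1) moves the pointer back by 8 samples. WindowModel and first_sample_per_iteration are illustrative names, not ADF APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Models only the read position of an input window, in samples.
struct WindowModel {
    std::size_t pos = 0;

    // Assumed semantics of window_readincr_v8: read 8, then advance by 8.
    std::size_t readincr_v8() {
        const std::size_t start = pos;
        pos += 8;
        return start;
    }
    // Assumed semantics of window_read_v8: read 8, do not advance.
    std::size_t read_v8() const { return pos; }
    // Assumed semantics of window_decr_v8(w, n): move back by 8 * n samples.
    void decr_v8(std::size_t n) { pos -= 8 * n; }
};

// Returns, for each loop iteration, the index of the first sample loaded into
// sbuff. Each iteration loads 24 consecutive samples (3 x v8) but the pointer
// only advances by 8, so consecutive iterations overlap by 16 samples.
std::vector<std::size_t> first_sample_per_iteration(unsigned samples) {
    WindowModel w;
    std::vector<std::size_t> starts;
    const unsigned LSIZE = samples / 4;
    for (unsigned i = 0; i < LSIZE; i += 2) {
        const std::size_t s0 = w.readincr_v8();
        const std::size_t s1 = w.readincr_v8();
        const std::size_t s2 = w.read_v8();
        assert(s1 == s0 + 8 && s2 == s0 + 16);  // 24 contiguous samples
        starts.push_back(s0);
        w.decr_v8(1);                            // net advance: 8 samples
    }
    return starts;
}
```

With samples = 32, the model yields starting offsets 0, 8, 16 and 24: each iteration loads 24 consecutive samples into sbuff but advances by only 8, matching the 8 output samples it writes (two v4 writes).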
Graph Development, Validation and Optimization
Graph Development
• Kernel stitching within the graph
• AI Engine compiler, placer and router
• Can include PL-based kernels
Graph Validation
• Emulation-SW: complete functional simulation of the graph
• Emulation-AIE: cycle-true simulation of the graph
• Debug at the graph level
Graph Optimization
• I/F optimization
• Location constraints, stamp (AI Engine graph map) and repeat
• FIFO settings, circuit/packet switch communications, …

Graph Programming
An AI Engine application is described as a graph, using the Adaptive Dataflow library. The I/Os of the graph are “ports”, and the graph constructor describes all the connections plus some other parameters (source files, runtime ratios). For single-kernel programming, this section is very simple.

    #include <adf.h>        // Adaptive Dataflow library
    using namespace adf;
    #include "kernels.h"    // kernel declarations

    class myGraph : public graph {
    private:
        kernel kernel1, kernel2;
    public:
        // IOs of the graph are "ports"
        input_port in;
        input_port NSamples;
        input_port Coefficients;
        output_port out;

        // The constructor describes all the connections and other parameters
        myGraph() {
            kernel1 = kernel::create(fir_16taps_symm);
            kernel2 = kernel::create(fir_23taps_symm);

            // 128-byte data windows: graph input -> kernel1 -> kernel2 -> graph output
            connect< window<128> > net0 (in, kernel1.in[2]);
            connect< window<128> > net1 (kernel1.out[0], kernel2.in[0]);
            connect< window<128> > net2 (kernel2.out[0], out);

            // Asynchronous runtime parameters: sample count and filter taps
            connect<parameter> (NSamples, async(kernel1.in[0]));
            connect<parameter> (Coefficients, async(kernel1.in[1]));

            source(kernel1) = "kernels/Kernel_1.cc";
            source(kernel2) = "kernels/Kernel_2.cc";

            // Fraction of an AI Engine's cycles each kernel is expected to use
            runtime<ratio>(kernel1) = 0.1;
            runtime<ratio>(kernel2) = 0.1;
        }
    };
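The window<128> connections fix how much data each kernel invocation sees. As a sketch, assuming that the template argument of window<N> is a size in bytes and that a cint16 sample is two int16 values (4 bytes), the arithmetic can be checked at compile time. cint16_model, iterations_for and outputs_for are hypothetical names; the iteration count mirrors the main loop of fir_16taps_symm.

```cpp
#include <cstddef>
#include <cstdint>

// Host-side stand-in for cint16 (assumption: two int16 values, no padding).
struct cint16_model {
    int16_t real;
    int16_t imag;
};
static_assert(sizeof(cint16_model) == 4, "expected a 4-byte complex sample");

// Assumption: the N in window<N> is a size in bytes.
constexpr std::size_t kWindowBytes = 128;
constexpr std::size_t kSamplesPerWindow = kWindowBytes / sizeof(cint16_model);

// Iteration count of: for (i = 0; i < samples / 4; i += 2),
// i.e. ceil((samples / 4) / 2).
constexpr std::size_t iterations_for(std::size_t samples) {
    return (samples / 4 + 1) / 2;
}

// Each iteration writes two v4 vectors, i.e. 8 output samples.
constexpr std::size_t outputs_for(std::size_t samples) {
    return iterations_for(samples) * 8;
}

static_assert(kSamplesPerWindow == 32, "128 bytes of cint16 = 32 samples");
static_assert(outputs_for(kSamplesPerWindow) == kSamplesPerWindow,
              "one kernel call fills exactly one output window");
```

Under these assumptions, a 128-byte window carries 32 samples, and fir_16taps_symm needs 4 loop iterations of 8 outputs each to fill its output window, which suggests the NSamples runtime parameter should be 32 for this graph.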
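Testbench
The testbench instantiates the AI Engine graph, creates a virtual platform (an input test-vector file, an output vector file, and the connections of the graph to them), and controls the simulation from main():

    #include <adf.h>        // Adaptive Dataflow library
    using namespace adf;
    #include "kernels.h"
    #include "kernels/include.h"
    #include "project.h"    // graph definition (assumed to contain myGraph)

    // AI Engine graph
    myGraph mygraph;

    // Virtual platform: one input vector file and one output vector file
    simulation::platform<1,1> platform("data/input.txt", "data/output.txt");
    connect<> net0(platform.src[0], mygraph.in);
    connect<> net1(mygraph.out, platform.sink[0]);

    int main(void) {
        // Symmetric 16-tap coefficient set
        int32 taps[16] = {-100, 200, -300, 400, -500, 600, -700, 800,
                          800, -700, 600, -500, 400, -300, 200, -100};

        // Simulation control
        mygraph.init();
        // Set the asynchronous runtime parameters before the first run;
        // INPUT_SAMPLES is assumed to be defined in one of the included headers.
        mygraph.update(mygraph.NSamples, uint32(INPUT_SAMPLES));
        mygraph.update(mygraph.Coefficients, taps, 16);
        mygraph.run(4);
        mygraph.end();
        return 0;
    }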
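The coefficient set sent through the Coefficients runtime parameter is symmetric, which is what a symmetric FIR kernel expects: each coefficient is shared by two samples. A small compile-time check, sketched here under the assumption that the taps are also kept in a std::array on the host side (is_symmetric and kTestbenchTaps are hypothetical names), catches an edit that breaks the symmetry:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical host-side check: taps[k] must equal taps[N - 1 - k] for every k.
template <std::size_t N>
constexpr bool is_symmetric(const std::array<int32_t, N>& taps) {
    for (std::size_t k = 0; k < N / 2; ++k) {
        if (taps[k] != taps[N - 1 - k]) {
            return false;
        }
    }
    return true;
}

// Same values as the taps array in the testbench's main().
constexpr std::array<int32_t, 16> kTestbenchTaps = {
    -100, 200, -300, 400, -500, 600, -700, 800,
    800, -700, 600, -500, 400, -300, 200, -100};

// Fails to compile if the coefficient set is edited into a non-symmetric one.
static_assert(is_symmetric(kTestbenchTaps), "FIR taps must be symmetric");
```

Keeping the check at compile time means a broken coefficient set is rejected before any simulation time is spent.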

Vitis Analyzer

Vitis Analyzer Introduction
• Compile results analysis: graph, mapping, memory footprint, DMAs, locks, …
• Profiling viewer
• Simulation timeline analysis
• Can also be used within a Makefile flow

Vitis Analyzer Compilation View: Graph View
• Shows all the kernels defined in the AI Engine graph (AI Engine array and PL)
• The kernels can be grouped by tile, by subgraph, or not grouped at all

Vitis Analyzer Compilation View: Array View
• Shows the complete AI Engine array, which tiles are used, and all the connections

Vitis Analyzer Trace View
The Trace view shows what runs on each tile of the array (active tiles only): core, DMA, locks and I/Os. A tile is active as soon as its AI Engine processor, its local memory or its interconnect is active.

AI Engine Project Creation in Vitis 2020.2

System Project Structure in Vitis
• AI Engine: AI Engine kernel source files; sub-graph and graph descriptions
• Programmable Logic: PL kernel source files (HLS); PL kernel source files (packaged RTL)
• HW Link: system configuration file
• Processing System: OS (bare-metal or Linux), AI Engine drivers, PS application (XRT, OpenCL)

Vitis 2020.2 Demo

Example Design Partitioning
[Figure: the graph comprises the Weighted sum, Average and Classifier kernels (AI Engine) and the PolarClip HLS kernel (PL kernel); MM2S and S2MM (PL DMA) move data from and to DDR]

Example Design Partitioning
[Figure: the same design mapped onto the device: Weighted sum, Average and Classifier in the AI Engine array; MM2S, PolarClip (HLS) and S2MM in the Programmable Logic; DDR at both ends]

Vitis 2020.2
Project Creation and AI Engine Simulation

Vitis 2020.2
PL kernel compilation and HW link

Vitis 2020.2
PS app compilation and HW Emulation

Vitis 2020.2
HW Implementation

Summary
• Vitis is a unified tool used throughout the AI Engine development flow
• AI Engine development is a 2-stage process:
  ˃ Single kernel development
  ˃ Graph development
• Vitis handles all Versal ACAP domains

Thank You
