Programming Vision Applications on Zynq Using OpenCV and High-Level Synthesis
[Figure: Zynq block diagram. Labels: AMBA switches, I/O MUX (MIO), ARM, programmable logic (system gates, DSP, RAM), XADC, PCIe, multi-standard I/Os (3.3 V and high-speed 1.8 V), multi-gigabit transceivers.]
MPSoC 2013
Video Board: Zynq 702 board
Programming
Image processing is often programmed using streaming and OpenCV libraries.
The processor runs Linux and a window system (Qt).
On-chip performance monitors are tied to a running display of processor load and FPGA-to-external-memory load.
Basic idea:
Take the edge detection of the current frame and the edge detection of the previous frame.
Subtract them: where they are the same, show nothing; where they differ, show a fat pixel.
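The basic idea above can be sketched without OpenCV. What follows is a minimal sketch, assuming 8-bit grayscale frames stored in row-major order. The `Image` struct and the functions `sobelEdges` and `differingEdges` are hypothetical names introduced for illustration, and the sum-of-absolute-gradients edge measure is an illustrative choice, not necessarily what the demo uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical 8-bit grayscale image in row-major order.
struct Image {
    int width = 0;
    int height = 0;
    std::vector<std::uint8_t> pixels;

    Image(int w, int h)
        : width(w), height(h), pixels(static_cast<std::size_t>(w) * h, 0) {}
    std::uint8_t at(int x, int y) const {
        return pixels[static_cast<std::size_t>(y) * width + x];
    }
    std::uint8_t& at(int x, int y) {
        return pixels[static_cast<std::size_t>(y) * width + x];
    }
};

// Binary edge map: 255 where |Gx| + |Gy| of the 3x3 Sobel operator exceeds
// `threshold`, 0 elsewhere. Border pixels are left at 0 for simplicity.
Image sobelEdges(const Image& gray, int threshold) {
    Image edges(gray.width, gray.height);
    for (int y = 1; y + 1 < gray.height; ++y) {
        for (int x = 1; x + 1 < gray.width; ++x) {
            const int gx = -gray.at(x - 1, y - 1) + gray.at(x + 1, y - 1)
                         - 2 * gray.at(x - 1, y) + 2 * gray.at(x + 1, y)
                         - gray.at(x - 1, y + 1) + gray.at(x + 1, y + 1);
            const int gy = -gray.at(x - 1, y - 1) - 2 * gray.at(x, y - 1)
                         - gray.at(x + 1, y - 1) + gray.at(x - 1, y + 1)
                         + 2 * gray.at(x, y + 1) + gray.at(x + 1, y + 1);
            edges.at(x, y) = (std::abs(gx) + std::abs(gy) > threshold) ? 255 : 0;
        }
    }
    return edges;
}

// Compare two edge maps: 0 where they agree, 255 where they differ.
Image differingEdges(const Image& prevEdges, const Image& curEdges) {
    assert(prevEdges.width == curEdges.width);
    assert(prevEdges.height == curEdges.height);
    Image diff(curEdges.width, curEdges.height);
    for (std::size_t i = 0; i < diff.pixels.size(); ++i) {
        diff.pixels[i] = (prevEdges.pixels[i] != curEdges.pixels[i]) ? 255 : 0;
    }
    return diff;
}
```

Calling `sobelEdges` on the previous and current frames and passing both results to `differingEdges` yields a mask that is zero wherever the scene is unchanged.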
while (1) {
    // Get the image frame from the video input
    frame_current = cvQueryFrame(capture);

    // Detect edges in the current frame
    cvSobel(gray_current, edge_current2, 1, 0, aperature_size);
    cvSobel(gray_current, edge_current1, 0, 1, aperature_size);
    cvAdd(edge_current2, edge_current1, edge_current2, NULL);
    cvConvertScale(edge_current2, edge_current, scale, 0);
    cvThreshold(edge_current, edge_current, 5, 255, CV_THRESH_BINARY);

    // Detect edges in the previous frame
    cvSobel(gray_prev, edge_prev2, 1, 0, aperature_size);
    cvSobel(gray_prev, edge_prev1, 0, 1, aperature_size);
    cvAdd(edge_prev2, edge_prev1, edge_prev2, NULL);
    cvConvertScale(edge_prev2, edge_prev, scale, 0);
    cvThreshold(edge_prev, edge_prev, 5, 255, CV_THRESH_BINARY);

    // Detect edges that are only present in the edge_current image
    detect_new_edges(edge_prev, edge_current, new_edge);

    // Remove noise from the new edges
    cvSmooth(new_edge, filtered_new_edges, CV_MEDIAN, 7, 7);

    // Combine new edges with current frame
    // Highlight new edges in red
    highlight_blend(frame_current, filtered_new_edges, output_frame);

    // Copy current frame into previous frame
    cvCopy(frame_current, frame_prev, NULL);

    // Display output frame
    cvShowImage("Detector Output", output_frame);
}
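The loop above calls two helpers, `detect_new_edges` and `highlight_blend`, whose bodies are not shown on the slide. Below is a minimal sketch of what they might do, assuming binary edge maps stored as flat 8-bit buffers and a color frame stored as interleaved 3-byte BGR pixels. The names `detectNewEdges` and `highlightBlend` and their signatures are hypothetical stand-ins, not the demo's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Buffer = std::vector<std::uint8_t>;

// Keep only edges that are set in the current edge map but not in the
// previous one (assumed behavior, based on the comment in the loop).
Buffer detectNewEdges(const Buffer& prevEdges, const Buffer& curEdges) {
    assert(prevEdges.size() == curEdges.size());
    Buffer newEdges(curEdges.size(), 0);
    for (std::size_t i = 0; i < curEdges.size(); ++i) {
        if (curEdges[i] != 0 && prevEdges[i] == 0) {
            newEdges[i] = 255;
        }
    }
    return newEdges;
}

// Copy the BGR frame and paint every pixel flagged in `mask` pure red.
Buffer highlightBlend(const Buffer& frameBgr, const Buffer& mask) {
    assert(frameBgr.size() == mask.size() * 3);
    Buffer out = frameBgr;
    for (std::size_t i = 0; i < mask.size(); ++i) {
        if (mask[i] != 0) {
            out[3 * i + 0] = 0;    // blue
            out[3 * i + 1] = 0;    // green
            out[3 * i + 2] = 255;  // red
        }
    }
    return out;
}
```

Overwriting flagged pixels is the simplest form of highlighting; a real blend could instead mix the red overlay with the underlying pixel.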
Source Code Mapping
[Figure: application code mapped onto the application on Zynq. Blocks shown: video input subsystem, ARM A9 processor (SW), FPGA fabric, memory subsystem, video output subsystem.]
Source Code Mapping
[Figure: application code mapped onto the application on Zynq with HLS-generated blocks. Blocks shown: video input subsystem, ARM A9 processor, memory subsystem, video output subsystem, and the HLS-generated blocks Sobel (current edge), Sobel (prev edge), detect_new_edge, cvSmooth, highlight_blend.]
OpenCV Motion Detection Algorithm
[Figure: current frame, previous frame, and output frame; the output frame shows the detected new car.]
External Input/Output and compute
[Figure: block diagram. Labels: HDMI input and HDMI output (each with Hsync, Vsync, Clk, de), hdmi2axi, axivdma, processing (2x axidma), YCbCr2RGB, logicvc, sync, 32b links, and PS7 ports HP0, HP1, HP2.]
The frame buffer is the application-level abstraction for HDMI input/output.
HDMI to FB: HDMI-to-framebuffer I/O IP subsystem
FB to HDMI: framebuffer-to-HDMI I/O IP subsystem
Processing functions:
sobel_filter_pass();
sobel_filter();
diff_image();
median_char_filter_pass();
combo_image();
ycbcr2rgb_pad();
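The function list above includes a median filter pass, which corresponds to the noise-removal step (`cvSmooth` with `CV_MEDIAN`) in the OpenCV loop. The following is a minimal software sketch of a k x k median filter, assuming a row-major 8-bit image and clamped borders. The name `medianFilter` and its signature are hypothetical, and this is not the HLS implementation used in the demo.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Median filter over a k x k window (k odd) on a row-major 8-bit image.
// Coordinates outside the image are clamped to the nearest border pixel.
std::vector<std::uint8_t> medianFilter(const std::vector<std::uint8_t>& src,
                                       int width, int height, int k) {
    assert(k > 0 && k % 2 == 1);
    assert(width > 0 && height > 0);
    assert(src.size() == static_cast<std::size_t>(width) * height);

    const int r = k / 2;
    std::vector<std::uint8_t> dst(src.size());
    std::vector<std::uint8_t> window(static_cast<std::size_t>(k) * k);

    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            // Gather the neighborhood around (x, y).
            std::size_t n = 0;
            for (int dy = -r; dy <= r; ++dy) {
                for (int dx = -r; dx <= r; ++dx) {
                    const int sx = std::min(std::max(x + dx, 0), width - 1);
                    const int sy = std::min(std::max(y + dy, 0), height - 1);
                    window[n++] = src[static_cast<std::size_t>(sy) * width + sx];
                }
            }
            // Partial sort: places the median at the middle position.
            auto mid = window.begin() + static_cast<std::ptrdiff_t>(window.size() / 2);
            std::nth_element(window.begin(), mid, window.end());
            dst[static_cast<std::size_t>(y) * width + x] = *mid;
        }
    }
    return dst;
}
```

With `k = 7` this matches the window size passed to `cvSmooth` in the loop; isolated single-pixel edges are removed because they never form a majority of any window.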
Demo: laptop with webcam
Exactly the same OpenCV image processing pipeline.
Uses OpenCV libraries optimized for Intel SSE vector processing.
Webcam is 1280 x 720 (720p).
Runs on an Intel i7 processor at ~2.7 GHz with 8 GB of DRAM.
Net result: in the range of one frame every 1-2 seconds.
Complete setup:
Model train
HDTV camera: 1080p, 60 frames per second, HDMI
Board with the application running: Linux, Qt, processor and bus load
Mouse tied to a register to set the threshold value dynamically
HDTV: 1080p, HDMI
Power zones
[Figure: Zynq block diagram annotated with power zones. Blocks shown: 2x GigE with DMA, 2x USB with DMA, 2x SDIO with DMA, static memory controller (Quad-SPI, NAND, NOR), dynamic memory controller (DDR3, DDR2, LPDDR2), AMBA switches, I/O MUX (MIO), ARM, programmable logic (system gates, DSP, RAM), XADC, PCIe, multi-standard I/Os (3.3 V and high-speed 1.8 V), multi-gigabit transceivers. Power zone labels: INT, BRAM, AUX, ADJ, PINT, MIO, 1V5, PAUX.]
Power zones
[Figure: board power zones. Zones: INT, PINT, AUX, PAUX, ADJ, BRAM, MIO, 1V5 (= DDR). Supply labels: 3V3, 2V5, 1V5.]
Power measurement output on the running board
Power in watts
Only a few measurements per second
10% noise
10% difference between bitstreams
Performance and Power Measurements
All pixel processing with OpenCV libraries running on the ARM A9: one frame per 13 seconds.
All pixel processing with C++ running on the A9: one frame per 1-2 seconds.
All pixel processing with C++ libraries implemented via HLS in the FPGA: 60 frames per second, with the FPGA running at 130 MHz.
A9 processors: 500 mW to 800 mW.
FPGA fabric fully running: 500 mW to 1 W.
On-chip I/O: a few hundred mW; on-board DRAM: 800 mW.
Result: ~100x speedup at the SAME power consumption, ~100 GOps.
Energy efficiency is in the 100 to 200 GOps/W range for the FPGA in the complete system!
You can put your finger on the running chip in the system: warm but not HOT, less than ~2 W!
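The figures above can be cross-checked with a few lines of arithmetic. The sketch below uses hypothetical helper names (`pixelRate`, `speedup`, `gopsPerWatt`) and counts only active pixels, ignoring video blanking, which is an assumption made here for simplicity.

```cpp
#include <cassert>

// Active pixels per second for a given resolution and frame rate.
double pixelRate(int width, int height, double fps) {
    assert(width > 0 && height > 0 && fps > 0.0);
    return static_cast<double>(width) * height * fps;
}

// Speedup of a pipeline running at `fps` frames per second over one that
// needs `secondsPerFrame` seconds for each frame.
double speedup(double fps, double secondsPerFrame) {
    assert(fps > 0.0 && secondsPerFrame > 0.0);
    return fps * secondsPerFrame;
}

// Energy efficiency in GOps/W for a throughput in GOps and a power in watts.
double gopsPerWatt(double gops, double watts) {
    assert(watts > 0.0);
    return gops / watts;
}
```

With the numbers from the measurements: `pixelRate(1920, 1080, 60)` is 124,416,000 active pixels per second, just under the 130 MHz fabric clock, which is consistent with roughly one pixel per clock cycle. `speedup(60, 1)` to `speedup(60, 2)` gives 60x to 120x over C++ on the A9, matching the quoted ~100x, while `speedup(60, 13)` gives 780x over the OpenCV libraries on the A9. `gopsPerWatt(100, 1.0)` to `gopsPerWatt(100, 0.5)` gives the quoted 100 to 200 GOps/W, and 100 GOps spread over 124.4 million pixels per second is roughly 800 operations per pixel.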
Energy Efficiency (MOPS/mW or OP/nJ)
[Figure: energy efficiency chart comparing microprocessors, general-purpose DSPs, and dedicated hardware; the spread is 3 orders of magnitude.]
Courtesy Bob Brodersen, based on published results at ISSCC conferences.
Energy Efficiency (MOPS/mW or GOPS/W)
[Figure: energy efficiency chart comparing microprocessors, general-purpose DSPs, and dedicated hardware (3 orders of magnitude apart), with this application marked.]
Courtesy Bob Brodersen, based on published results at ISSCC conferences.
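The two chart titles use different unit pairs (MOPS/mW, OP/nJ, GOPS/W). A short conversion, using only the SI prefixes and 1 W = 1 J/s, shows that all three are the same quantity:

```latex
\[
1\,\frac{\text{MOPS}}{\text{mW}}
  = \frac{10^{6}\ \text{op/s}}{10^{-3}\ \text{J/s}}
  = 10^{9}\,\frac{\text{op}}{\text{J}}
  = 1\,\frac{\text{op}}{\text{nJ}}
  = \frac{10^{9}\ \text{op/s}}{1\ \text{W}}
  = 1\,\frac{\text{GOPS}}{\text{W}}
\]
```

So a figure of 100 to 200 GOps/W is the same as 100 to 200 MOPS/mW on the chart's axis.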
Conclusion
Every processor benefits from a combination with an FPGA: the Zynq device.
Every FPGA benefits from a combination with a processor: the Zynq device.
We have shown an application programmed in OpenCV, leveraging Vivado HLS, running on a 1-2 W Zynq device at 1080p 60 fps in real time.
Special thanks to the team that worked on the demo: Jack Lo, Fernando Martinez Vallina, S. Mohan, Vinod Kathail, and to many colleagues in the Vivado High-Level Synthesis team and the Video Platform teams.
You can do this too:
OpenCV and HLS video:
http://www.xilinx.com/csi/training/vivado/leveraging-opencv-and-high-level-synthesis-with-vivado.htm
OpenCV and HLS application note:
http://www.xilinx.com/support/documentation/application_notes/xapp1167.pdf
Xilinx Zynq 702 board:
http://www.xilinx.com/products/boards-and-kits/EK-Z7-ZC702-G.htm
http://www.zedboard.org/