Modica RASC


Reconfigurable Application Specific Computing
Presented by:

Steve Modica
RASC Product Manager

Silicon Graphics, Inc.


SGI Proprietary
Altix 350

[Block diagram: Altix 350 node]
• Two Itanium2 processors on the Front Side Bus (6.4 GB/s) to the SHUB
• Memory: 4 Channels DDR SDRAM (PC2100 / PC2700), 10.8 – 12.8 GB/s
• NUMAlink4: 2 Channels NUMAlink, 12.8 GB/s
• PIC: 4 Slots / 2 PCI-X Busses, PCI-X 2 GB/s
• BASE I/O: Ethernet, SCSI Disk

SGI Proprietary | 9/20/2004 | Page 2


SGI Altix™ 3700 Bx2 Platform
Introduction: CR-Brick Components

[Block diagram: IP57 node board and CR-brick]
• IP57 Node Board (Node 0): two Processors (Intel Madison 9M) connected at 6.4GB/s to the ASIC (Shub1.2), with twelve ddr1 SDRAM blocks
• ASIC links: two NL4 Network links at 6.4GB/s Full Duplex each, and one I/O link at 2.4GB/s Full Duplex
• CR-Brick: four Node Boards (each with two processors "P", one ASIC "A", and I/O) connected over NL4 to two Routers

SGI Confidential
Slide 3
SGI Altix™ 3700 Bx2 Platform
Introduction: Building Blocks

• CR-brick: Itanium® 2 CPU and memory
• M-brick: Memory
• R-brick: Router interconnect
• IX-brick: Base I/O module
• PA-brick, PX-brick: PCI-X expansion
• D-brick2: Disk expansion

SGI® Advanced Linux Environment With SGI ProPack
SGI Altix™ 3700 Bx2 Platform
Introduction: System Topology Example

[Topology diagram: Router Plane 1 and Router Plane 2]
Reconfigurable Application Specific Computing
Accelerating Interaction

Speed up interactive analysis and modeling
– CPUs are often the bottleneck in computations
– Goal is to insert faster elements

Style 1 -- Traditional FPGAs
– Work with traditional FPGAs in PCI / PCI-X slots
• Nallatech, Clearspeed, Annapolis Micro et al
– Development environments relatively advanced
• All driving to same goal of “write in C, run on FPGA”
– Leverages other industry efforts
• Cray, PCs, Clusters

Style 2 -- Tightly coupled Elements
– Athena --- FPGA + memory for computation at high b/w
– Daytona --- FPGA + spigots for fast network
– Both being prototyped by a few customers

[Diagram: Compute, IO, Specialist Elements, and Graphics arranged around shared data. Callouts: “Access to data is critical” and “Memory bandwidth is the key to success”.]

The 3 Single-Paradigm Architectures

Scalar          Vector     App-Specific
Intel Itanium   Cray X1    Graphics - GPU
SGI MIPS        NEC SX     Signals - DSP
IBM Power                  Prog’ble - FPGA
Sun SPARC                  Other ASICs
HP PA



Paradigms to Applications

[Chart: Compute Intensity (Low to High) versus Data locality (Low to High). Regions shown: Scalar, Vector, and two Application-specific regions.]



Architectural Challenges

• Ease of Use
– Languages
– Compilers
– Debuggers
– APIs

• Performance
– Bandwidth to/from System
– Scalability



Ease of Use

•Leverage 3rd Party Standard Language Tools
–Celoxica, Impulse Acceleration, Mitrion, Viva
–In discussions with other HLL tool vendors
•Developed an FPGA aware version of GDB
–Capable of debugging the FPGA and System Software
–Capable of handling multiple CPUs and multiple FPGAs
•Developed RASC Abstraction Layer (RASCAL)
•Provide for HDL modules
–Integrated environment with debugger
–Highest performance



Contrasting ISVs

[Spectrum from Hardware to Software: HDL, Handel-C, Impulse-C, VIVA, Mitrion C]



Ease of Use v. Efficiency

[Chart: Efficiency (Low to High) versus Ease of Use (Easy to Difficult). VHDL and Verilog are labeled; five further points are marked “x” without recoverable labels.]



ISV Features

• Handel-C
– Runs on Windows only
– Plans to port to Linux in June of 2005
– Most efficient procedural language
• Starbridge VIVA
– Extremely easy to learn, Graphical, Object-oriented
– Develop on Windows only, execute anywhere.
– Easiest language to program, creates very efficient cores
– Large library of packaged algorithm primitives
• Mitrion C
– Runs natively on Altix
– Utilizes a processor abstraction
– Most useful debugging environment
• Impulse-C
– Runs on Windows
– Highly optimized for Streaming Applications
– Fastest language to port legacy C code



Bitstream Generation using High Level Language Tools

[Flow diagram]
On the IA-32 Linux Machine:
• HLL Design Entry (Handel-C, Impulse C, Mitrion C, Viva)
• RTL Generation and Integration with Core Services (.v, .vhd)
• Design Synthesis (Synplify Pro, Amplify) (.edf)
• Metadata Processing (Python) (.cfg)
• Design Implementation (ISE) (.ncd, .pcf, .bin)
On the Altix:
• Device Programming (RASC Abstraction Layer, Device Manager, Device Driver) (.c)
Design Verification:
• Behavioral Simulation (VCS, Modelsim) on .v, .vhd
• Static Timing Analysis (ISE Timing Analyzer) on .ncd, .pcf
• Real-time Verification (gdb)





FPGA Aware Debugger

• Based on Open Source GNU Debugger (GDB)
• Uses extensions to current command set
• Can debug host application and FPGA
• Provides notification when FPGA starts or stops
• Supplies information on FPGA characteristics
• Can “single-step” or “run N steps” of the algorithm
• Can step line by line through the HLL (C) source
• Dumps data regarding the set of “registers” that are visible when the FPGA is active



Optimal Debugging Environment

Debugger running in real time

Algorithm.c

    tmp = a & b;
    d = tmp | c;

    (gdb) fpgastep
    (gdb) p/x $a
    $6 = 0x444433
    (gdb) p/x $b
    $7 = 0x111122
    (gdb) p/x $tmp
    $8 = 0x555533
    (gdb) fpgastep
    (gdb) p/x $tmp
    $9 = 0x555533
    (gdb) p/x $c
    $10 = 0x331222
    (gdb) p/x $d
    $11 = 0x111022

[Diagram: inside the COP FPGA, a and b feed “&” to produce tmp; tmp and c feed “|” to produce d]
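Replaying the two statements of Algorithm.c in software is a quick way to sanity-check a debugger transcript like the one above. The sketch below is only an illustration in Python (the function names are made up, and nothing here touches an FPGA); it evaluates the slide's inputs in both operator orders.

```python
# Software model of the two-statement example, evaluated with the
# register values printed in the gdb transcript on the slide.
# Function names are illustrative only.

def and_then_or(a: int, b: int, c: int) -> tuple[int, int]:
    """tmp = a & b; d = tmp | c  (the order written in Algorithm.c)."""
    tmp = a & b
    d = tmp | c
    return tmp, d


def or_then_and(a: int, b: int, c: int) -> tuple[int, int]:
    """tmp = a | b; d = tmp & c  (the opposite operator order)."""
    tmp = a | b
    d = tmp & c
    return tmp, d


A, B, C = 0x444433, 0x111122, 0x331222  # $a, $b, $c from the transcript

if __name__ == "__main__":
    for name, fn in (("and_then_or", and_then_or), ("or_then_and", or_then_and)):
        tmp, d = fn(A, B, C)
        print(f"{name}: tmp={tmp:#08x} d={d:#08x}")
```

With these inputs, the $tmp and $d values in the transcript (0x555533 and 0x111022) are what the OR-then-AND order produces, so the transcript and the Algorithm.c listing may come from different revisions of the example.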





Application Programming Interface Overview

[Layer diagram]
• User Space: Application, Debugger (GDB), Open|Speedshop, Pro|Speedshop, Download Utilities, Abstraction Layer Library, Device Manager
• Linux Kernel: Algorithm Device Driver, Download Driver
• Hardware: Co-Processor FPGA (RASC Hardware)



Abstraction Layer: Algorithm API

The Abstraction Layer’s algorithm API mirrors the COP API with a few additions that enable wide scaling and deep scaling.

[Diagram, wide scaling: Application, Input Data, one Algorithm spanning three COPs, Output Data]

[Diagram, deep scaling: Application, Input Data, one Algorithm spanning two COPs, Output Data]
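The slide names wide and deep scaling without defining them. The sketch below shows one possible reading, stated as an assumption: wide scaling runs the same algorithm on several COPs, each handling a slice of the input, and deep scaling chains COPs so that the output of one feeds the next. It is plain Python with hypothetical names (`run_wide`, `run_deep`) and does not reproduce the RASC Abstraction Layer API.

```python
# Illustrative model of "wide" and "deep" scaling. Both definitions are
# assumptions drawn from the slide's diagrams; a COP is modeled as a
# plain Python function on a list of integers.
from typing import Callable, List, Sequence

Cop = Callable[[List[int]], List[int]]


def run_wide(algorithm: Cop, data: Sequence[int], num_cops: int) -> List[int]:
    """Split data into contiguous slices, one per COP, and concatenate results."""
    if num_cops < 1:
        raise ValueError("num_cops must be at least 1")
    chunk = -(-len(data) // num_cops)  # ceiling division
    output: List[int] = []
    for i in range(num_cops):
        piece = list(data[i * chunk:(i + 1) * chunk])
        if piece:  # trailing COPs may receive no data
            output.extend(algorithm(piece))
    return output


def run_deep(stages: Sequence[Cop], data: Sequence[int]) -> List[int]:
    """Feed the output of each COP stage into the next one."""
    result = list(data)
    for stage in stages:
        result = stage(result)
    return result


def invert_bytes(block: List[int]) -> List[int]:
    """Toy element-wise algorithm: flip the low 8 bits of every value."""
    return [x ^ 0xFF for x in block]


if __name__ == "__main__":
    sample = list(range(10))
    print(run_wide(invert_bytes, sample, 3))
    print(run_deep([invert_bytes, invert_bytes], sample))
```

For an element-wise algorithm such as the toy one here, the wide result matches a single-COP run, which is the property that lets an application scale out without changing its own logic.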





Verilog / VHDL Module Support

•Templates for Verilog
–Fast start to algorithm coding
•Templates for VHDL
–Fast start to algorithm coding
•Provide a system simulation stub
–Allows either simulation debug or system debug
•Provide source code for core services
–Allows user to modify to meet special needs
•Extractor tool supports GDB meta-data
–Application and FPGA debugging



Prototype Configuration

[Diagram: Altix 350 connected to MOATB over NUMAlink4]



Performance

•Direct Connection to NUMAlink4
–6.4GB/s/connection
•Fast System Level Reprogramming of FPGA
–FPGA load at memory speeds
•Atomic Memory Operations
–Same set as System CPUs
•Hardware Barriers
–Dynamic Load Balancing
•Configurations to 128 NUMA/FPGA Nodes
–Scalability



MOATB Block Diagram

[Block diagram: MOATB]
• NUMAlink Connectors to TIO: NUMAlink, 12.8 GB/s
• TIO to Algorithm FPGA: SSP, 6.4 GB/s
• Algorithm FPGA to QDR SRAM: 2MB QDR SRAM parts, each with Addr & Ctrl and data paths labeled 36 and 72
• QDR SRAM 9.6GB/s: 3 reads @ 1.6GB/s, 3 writes @ 1.6GB/s
• Loader FPGA: PCI 66MHz, Select Map Programming Interface
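The 9.6GB/s QDR SRAM figure follows directly from the port counts on the slide. A minimal sketch of that arithmetic, working in MB/s (decimal, 1 GB/s = 1000 MB/s) so the integers stay exact; the constant and function names are illustrative only:

```python
# Aggregate QDR SRAM bandwidth from the per-port figures on the slide.
# Names are illustrative; values are the slide's, expressed in MB/s.

READ_PORTS = 3
WRITE_PORTS = 3
PORT_MB_PER_S = 1600  # 1.6 GB/s per read or write port


def aggregate_sram_bandwidth(read_ports: int, write_ports: int, per_port: int) -> int:
    """Sum the bandwidth of all independent read and write ports."""
    return (read_ports + write_ports) * per_port


if __name__ == "__main__":
    total = aggregate_sram_bandwidth(READ_PORTS, WRITE_PORTS, PORT_MB_PER_S)
    print(f"{total} MB/s")  # prints "9600 MB/s"
```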



System Configuration

[Block diagram: an Altix 350 node (two Itanium2 processors, SHUB, PC2100 / PC2700 DDR SDRAM, PIC, PCI-X, BASE I/O) connected over NUMAlink to a MOATB (TIO, SSP, Algorithm FPGA with three 2MB QDR SRAM banks labeled SRAM 0, SRAM 1, and SRAM 2, and a Loader FPGA with PCI 66MHz and a Select Map Programming Interface)]



MOATB Data Performance

SSP System Interface Performance


Measured performance MOATB with SSP test card bitstream
• DMA Read => 2.548 GB/s
• DMA Write => 2.607 GB/s

Measured performance MOATB with MBCS bitstream


• DMA Read => 1.588 GB/s
• DMA Write => 1.589 GB/s
Limited by 1.6 GB/s of external SSRAMs

MOATB Core Services


Core Clock Frequency 200MHz
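The measured DMA rates can be expressed as a fraction of the limit they run against. The sketch below does this for both bitstreams. The 1.6 GB/s SSRAM limit is stated on this slide; the 3.2 GB/s ceiling used for the SSP test card is an assumption taken from the link labels on the FPGA Architecture Overview slide, and the function name is illustrative.

```python
# Utilization of the measured DMA rates against their limiting bandwidth.
# SSRAM_LIMIT_GBPS is stated on the slide; SSP_LIMIT_GBPS is an assumed
# ceiling taken from the 3.2 GB/s labels on the architecture diagram.

SSRAM_LIMIT_GBPS = 1.6
SSP_LIMIT_GBPS = 3.2  # assumption, see lead-in


def utilization(measured_gbps: float, limit_gbps: float) -> float:
    """Return measured bandwidth as a fraction of the limiting bandwidth."""
    if limit_gbps <= 0:
        raise ValueError("limit_gbps must be positive")
    return measured_gbps / limit_gbps


if __name__ == "__main__":
    rows = [
        ("MBCS DMA Read", 1.588, SSRAM_LIMIT_GBPS),
        ("MBCS DMA Write", 1.589, SSRAM_LIMIT_GBPS),
        ("SSP test card DMA Read", 2.548, SSP_LIMIT_GBPS),
        ("SSP test card DMA Write", 2.607, SSP_LIMIT_GBPS),
    ]
    for label, measured, limit in rows:
        print(f"{label}: {utilization(measured, limit):.1%} of {limit} GB/s")
```

On these numbers the MBCS bitstream runs at over 99% of the SSRAM limit, which agrees with the slide's remark that the external SSRAMs are the bottleneck in that configuration.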



FPGA Architecture Overview

[Block diagram: SSP connects to the Core Services Block over two links labeled 3.2 GB/s; the Core Services Block connects to the Algorithm Block; Read ports 0, 1, 2 and Write ports 0, 1, 2 connect to QDR-II SRAM Bank 0, Bank 1, and Bank 2. Reads @ 1.6GB/s, Writes @ 1.6GB/s.]



Algorithm Block as Submodule

[Block diagram: signals at the Algorithm Block boundary]
• Algorithm controller: alg_clk, do_step, alg_rst, step_flag, alg_done
• Debug port: debug0 to debug63
• SRAM controller (one bank shown), write side: sram_wr_req, sram_wr_gnt, sram_wr_dvld, sram_wr_addr[17:0], sram_wr_data[63:0], sram_wr_be[7:0]
• SRAM controller (one bank shown), read side: sram_rd_req, sram_rd_gnt, sram_rd_cmd_vld, sram_rd_addr[17:0], sram_rd_dvld, sram_rd_data
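The signal widths on this slide line up with the capacity and bandwidth figures quoted on other slides. The sketch below checks that, under two assumptions not stated in the deck: the 18-bit address selects one 64-bit word, and each port moves one word per cycle of the 200MHz core clock. All names are illustrative.

```python
# Cross-check of SRAM bank capacity and per-port bandwidth from the
# signal widths (sram_wr_addr[17:0], sram_wr_data[63:0]) and the 200MHz
# core clock. Word addressing and one word per clock are assumptions.

ADDR_BITS = 18
DATA_BITS = 64
CORE_CLOCK_HZ = 200_000_000


def bank_capacity_bytes(addr_bits: int, data_bits: int) -> int:
    """Number of addressable words times bytes per word."""
    return (1 << addr_bits) * (data_bits // 8)


def port_bandwidth_bytes_per_s(data_bits: int, clock_hz: int) -> int:
    """Bytes per word times words per second (one word per clock)."""
    return (data_bits // 8) * clock_hz


if __name__ == "__main__":
    capacity = bank_capacity_bytes(ADDR_BITS, DATA_BITS)
    bandwidth = port_bandwidth_bytes_per_s(DATA_BITS, CORE_CLOCK_HZ)
    print(f"capacity: {capacity // (1024 * 1024)} MiB")
    print(f"bandwidth: {bandwidth / 1e9} GB/s")
```

Under those assumptions one bank holds 2 MiB, matching the 2MB QDR SRAM labels, and one port sustains 1.6 GB/s, matching the per-port read and write rates.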



MOATB Sample Application Performance

• Bit Manipulation (Crypto)
– 79x 1.5GHz Itanium-2 (single MOATB)
– 119x 1.5GHz Itanium-2 (dual MOATB)
• DOD Bit Matrix Multiply Benchmark
– TBDx 1.5GHz Itanium-2 (single MOATB)
• Graphics Edge Detection
– 42x 1.5GHz Itanium-2 (single MOATB)
– (DEMO at NAB)
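The two Bit Manipulation figures give a rough sense of how well the workload scales from one MOATB to two. The sketch below treats the quoted multipliers as speedups over a single 1.5GHz Itanium-2, which is an interpretation since the slide does not describe the baseline run; the function name is illustrative.

```python
# Scaling efficiency when going from one MOATB to two, using the
# Bit Manipulation (Crypto) multipliers quoted on the slide.

SINGLE_SPEEDUP = 79
DUAL_SPEEDUP = 119


def scaling_efficiency(single: float, multi: float, devices: int) -> float:
    """Ratio of the achieved multi-device speedup to perfect linear scaling."""
    if single <= 0 or devices < 1:
        raise ValueError("single must be positive and devices at least 1")
    return multi / (single * devices)


if __name__ == "__main__":
    eff = scaling_efficiency(SINGLE_SPEEDUP, DUAL_SPEEDUP, 2)
    print(f"dual-MOATB scaling efficiency: {eff:.1%}")
```

On these figures the second board adds about half of a full board's worth of speedup, roughly 75% of ideal linear scaling.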



Reconfigurable Application Specific Processing

[Roadmap chart, timeline 2004 2005 2006 2007 2008]
• MOATB Proof of Concept: V2 - 6000
• Athena Computation Brick: V2 - 6000
• Abacus Computation Blade: V4 LX 200, V4 FX200, Virtex 5
• Daytona Ingest/Egress Blade: V2 Pro 100, V4 FX100, Virtex 5
• System Interface: NL4 / SSP, NL5 / SSP2
• Systems: Altix 3700/350, BX2, SHUB2, UV



Athena Computation Blade

[Block diagram: NUMAlink Connectors, TIO, SSP, Algorithm FPGA (Virtex2 6000 -6) with four 2MB QDR SRAM parts, and a Loader FPGA on PCI 66MHz]



Abacus Computation Blade

[Block diagram: two V4LX200 FPGAs, each connected through SSP to a TIO with NL4 links; ten SSRAM parts in total; a Loader on PCI with Selmap connections]



RASC 3U Chassis

[Chassis drawing: Blade Slots; TPS Power Supply Slots]
5.128” high x 17.39” w


Investigations Underway

Additional 3rd Party Partnerships
– Pull in additional “Best in Industry Features”
– Help drive openFPGA.org direction
– Pull in IO and additional scalability features
New High Level Languages
– Matlab – Working with a RASC partner to add tool as module generator
Library Support for Matlab*P
C-Code Improvement Tools
– FPGA aware Speedshop enhancements
– Source to source code optimizer targeted at 3rd party tools
