VLSI Programming: Lecture 1
Course 2IN35. Course: Kees van Berkel, Rudolf Mak. Lab: [email protected], [email protected]
www:
http://www.win.tue.nl/~cberkel/2IN35/
Lecture 1
Introduction
2/7/2012
to acquire insight into the description, design, and optimization of fine-grained parallel computations;
to acquire insight into the (future) capabilities of VLSI as an implementation medium for parallel computations;
to acquire skills in the design of parallel computations and in their implementation on FPGAs.
Contents
Massive parallelism is needed to exploit the huge and still increasing computational capabilities of Very Large Scale Integrated (VLSI) circuits:
we focus on fine-grained parallelism (not on networks of computers);
we assume that parallelism is by design (not by compilation);
we draw inspiration from consumer applications, such as digital TV, 3D TV, image processing, mobile phones, etc.;
we will use Field Programmable Gate Arrays (FPGAs) as a fine-grained abstraction of VLSI for practical implementation.
FPGA
XC2VP30
Notebook
Exceed (can be obtained through the software distribution of the university)
Access to UNIX server Dept. W&I (can be obtained through BCF, HG floor 8)
Lab work is by teams of two students.
Have FPGA tools (SW) installed on your machine by Feb 28.
May 22
the quality of your programs/designs
your final report on the design and evaluation of these programs (guidelines will follow)
a concluding discussion with you on the programs, the report and the lecture notes
intermediate assignments
Keshab K. Parhi. VLSI Digital Signal Processing Systems, Design and Implementation. Wiley Inter-Science 1999. This book is recommended.
Mandatory reading: Keshab K. Parhi. High-Level Algorithm and Architecture Transformations for DSP Synthesis. Journal of VLSI Signal Processing, 9, 121-143 (1995), Kluwer Academic Publishers.
Introduction
Parhi, Chapters 1, 2
DSP Representation Methods Iteration bounds
Power dissipation: 130 W (107 W typical)
3 levels of cache: 16K + 16K, 256K, 6M
CMOS technology: 130 nm
Clock frequency: 1.5 GHz
7,877 pins, 95 percent of them for power
1,322 SPECint_base2000 for a single processor!
Next generation: 1.7 GHz and 9 MByte cache!
Rusu et al. [Intel], Itanium 2 Processor 6M: Higher Frequency and Larger L3 Cache, IEEE Micro, April 2004, vol. 24, issue 2, pp. 10-18.
Moore's Law
Every 2 generations of IC technology (6 years):
device feature size      0.5x
chip size                2x
clock frequency          2x
number of I/O pins       2x
DRAM capacity            16x
logic-gate density       4x
(no longer true)
Involves over 1000 technical experts worldwide. A self-fulfilling prophecy? Or wishful thinking?
ST-Ericsson confidential
[Figure: trend plot on a logarithmic scale (1x, 10x, 100x) against year, 1/91 to 1/04; labelled devices: Virtex-II Pro and Spartan-3; annotated factors: 300x, 20x, 200x, 10x]
500MHz clock
450MHz PowerPC
... the ultimate exploration tool ... and the ultimate software defined radio
antenna surface: 1 km^2 (sensitivity 50)
large physical extent (3000+ km)
wide frequency range: 70 MHz to 30 GHz
full design by 2010; phase 1: 2017; phase 2: 2022
1000-1500 dishes (15 m) in the central 5 km (2000-3000 total)
+ dense and/or sparse aperture arrays
connected to a massive data processor by an optical fibre network
Software Defined Radio Astronomy
computational load (on-line): 1 exa MAC/s (10^18 MAC/s)
power budget = 30 MW, hence 30 pJ/MAC all-in
MPSoC -- 2010, June 30
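The 30 pJ/MAC figure follows from dividing the power budget by the computational load. A minimal Python sketch of that arithmetic, using only the two numbers quoted on the slide; the function and constant names are illustrative, not taken from the source.

```python
def energy_per_mac_joules(power_watts: float, macs_per_second: float) -> float:
    """Energy available per MAC if the whole power budget is spent at this load."""
    return power_watts / macs_per_second


POWER_BUDGET_W = 30e6    # 30 MW power budget (from the slide)
LOAD_MAC_PER_S = 1e18    # 1 exa MAC/s on-line load (from the slide)

# Convert joules to picojoules (1 J = 1e12 pJ).
energy_pj = energy_per_mac_joules(POWER_BUDGET_W, LOAD_MAC_PER_S) * 1e12
print(f"{energy_pj:.0f} pJ/MAC")
```

"All-in" means this budget has to cover memory access, communication and control as well, not just the arithmetic itself.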
References
Chip photos:
http://www-vlsi.stanford.edu/group/chips.html
ITRS Roadmap
http://www.itrs.net/Links/2005ITRS/ExecSum2005.pdf
[Figure: sample rate versus number of instructions per sample]
speech (de-)coding
speech recognition
speech synthesis
speaker identification
Hi-fi audio en/decoding
noise cancellation
audio equalization
ambient acoustic emulation
sound synthesis
echo cancellation
modem: (de-)modulation
vision
image (de-)compression
image composition
beam cancellation
spectral estimation
etc.
$$y(n) = \sum_{k=0}^{N-1} h(k)\, x(n-k)$$
where
x is the input sequence
y is the output sequence
h is the impulse response (filter coefficients)
N is the number of taps (coefficients) in the filter
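As an illustration of these definitions, here is a minimal Python sketch of an N-tap FIR filter computing y(n) = sum over k of h(k) x(n-k). It assumes that samples before the start of the input are zero; the function name and the example values are made up for illustration.

```python
def fir_filter(x, h):
    """Return y with y[n] = sum_{k=0}^{N-1} h[k] * x[n-k], taking x[n] = 0 for n < 0."""
    n_taps = len(h)
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k in range(n_taps):
            if n - k >= 0:               # samples before the start count as zero
                acc += h[k] * x[n - k]   # one multiply-accumulate (MAC)
        y.append(acc)
    return y


# 3-tap moving-average filter applied to a short ramp (illustrative values).
y = fir_filter([3.0, 6.0, 9.0, 12.0], [1 / 3, 1 / 3, 1 / 3])
```

Each output sample costs N MAC operations, which is why the number of taps directly sets the computation rate of the filter.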
$$y(k) = \sum_{n=0}^{N-1} W_N^{nk}\, x(n), \quad \text{for } k = 0, 1, \ldots, N-1$$
where $W_N = e^{-j 2\pi / N}$ and $j = \sqrt{-1}$
x is the input sequence in the time domain (real or complex) y is an output sequence in the frequency domain (complex)
The Inverse Discrete Fourier Transform (IDFT) is computed as
$$x(n) = \frac{1}{N} \sum_{k=0}^{N-1} W_N^{-nk}\, y(k), \quad \text{for } n = 0, 1, \ldots, N-1$$
The Fast Fourier Transform (FFT) and its inverse (IFFT) provide an efficient method for computing the DFT and IDFT.
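For comparison with the FFT, the DFT and IDFT can be evaluated directly from their sums. The Python sketch below does that, assuming the common convention W_N = exp(-2*pi*j/N) with a 1/N factor on the inverse; the function names are illustrative. It needs N^2 complex multiply-accumulates, which is the cost the FFT avoids.

```python
import cmath


def dft(x):
    """y(k) = sum_{n=0}^{N-1} W_N^(n*k) * x(n), with W_N = exp(-2*pi*j/N)."""
    n_points = len(x)
    w = cmath.exp(-2j * cmath.pi / n_points)
    return [sum(w ** (n * k) * x[n] for n in range(n_points))
            for k in range(n_points)]


def idft(y):
    """x(n) = (1/N) * sum_{k=0}^{N-1} W_N^(-n*k) * y(k)."""
    n_points = len(y)
    w = cmath.exp(-2j * cmath.pi / n_points)
    return [sum(w ** (-n * k) * y[k] for k in range(n_points)) / n_points
            for n in range(n_points)]


samples = [1.0, 2.0, 3.0, 4.0]
spectrum = dft(samples)       # spectrum[0] is the sum of all samples
recovered = idft(spectrum)    # equals samples up to rounding error
```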
$$x(n) = \frac{2}{N} \sum_{k=0}^{N-1} e(k)\, y(k) \cos\left[\frac{(2n+1) k \pi}{2N}\right]$$
where e(k) = 1/sqrt(2) if k = 0; otherwise e(k) = 1.
An N-point 1D-DCT requires N^2 MAC operations.
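The N^2 MAC count is easy to see in a direct implementation: each of the N outputs is a sum of N products. The Python sketch below is hedged in two ways. The forward transform y(k) = e(k) * sum over n of x(n) cos[(2n+1) k pi / (2N)] is an assumption, since the slide text does not show it, and the inverse uses the matching 2/N scaling. The function names are illustrative.

```python
import math


def e(k):
    """Scale factor from the slide: 1/sqrt(2) for k = 0, otherwise 1."""
    return 1 / math.sqrt(2) if k == 0 else 1.0


def dct(x):
    """Assumed forward DCT: y(k) = e(k) * sum_n x(n) * cos((2n+1)*k*pi / (2N))."""
    n_points = len(x)
    return [e(k) * sum(x[n] * math.cos((2 * n + 1) * k * math.pi / (2 * n_points))
                       for n in range(n_points))       # N MACs per output
            for k in range(n_points)]                  # N outputs, so N^2 MACs


def idct(y):
    """Matching inverse: x(n) = (2/N) * sum_k e(k) * y(k) * cos((2n+1)*k*pi / (2N))."""
    n_points = len(y)
    return [2 / n_points * sum(e(k) * y[k]
                               * math.cos((2 * n + 1) * k * math.pi / (2 * n_points))
                               for k in range(n_points))
            for n in range(n_points)]


block = [1.0, 2.0, 3.0, 4.0]
restored = idct(dct(block))   # equals block up to rounding error
```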
$$d = \frac{1}{N} \sum_{i=0}^{N-1} \left| x(i) - r_k(i) \right|$$
$$d = \frac{1}{N} \sum_{i=0}^{N-1} \left[ x(i) - r_k(i) \right]^2$$
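Both distance measures compare an input vector x with a reference vector r_k. A minimal Python sketch follows, under the assumption that k indexes a set of candidate reference vectors and the closest one is wanted; the function names and that search step are illustrative and not taken from the slides.

```python
def mean_absolute_distance(x, r):
    """d = (1/N) * sum_i |x(i) - r(i)|."""
    if len(x) != len(r):
        raise ValueError("x and r must have the same length")
    return sum(abs(xi - ri) for xi, ri in zip(x, r)) / len(x)


def mean_squared_distance(x, r):
    """d = (1/N) * sum_i (x(i) - r(i))^2."""
    if len(x) != len(r):
        raise ValueError("x and r must have the same length")
    return sum((xi - ri) ** 2 for xi, ri in zip(x, r)) / len(x)


def nearest_reference(x, references, distance=mean_squared_distance):
    """Return the index k of the reference vector closest to x (assumed use case)."""
    return min(range(len(references)), key=lambda k: distance(x, references[k]))


x = [1.0, 2.0, 3.0, 4.0]
refs = [[1.0, 0.0, 3.0, 8.0], [1.0, 2.0, 3.0, 5.0]]
best_k = nearest_reference(x, refs)
```

The absolute-difference form avoids multiplications altogether, which can matter when the computation is mapped onto hardware.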
Computation Rates
$$R_C = R_S \cdot N_S$$
where
R_C is the computation rate
R_S is the sampling rate
N_S is the (average) number of operations per sample
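A small Python sketch of this product, with hypothetical numbers that are not from the slides: a 48 kHz audio stream passed through a 64-tap FIR filter, counting one MAC per tap per sample.

```python
def computation_rate(sample_rate_hz, ops_per_sample):
    """R_C = R_S * N_S, in operations per second."""
    return sample_rate_hz * ops_per_sample


SAMPLE_RATE_HZ = 48_000   # hypothetical audio sampling rate R_S
OPS_PER_SAMPLE = 64       # hypothetical N_S: one MAC per tap of a 64-tap FIR filter

rate = computation_rate(SAMPLE_RATE_HZ, OPS_PER_SAMPLE)
print(f"{rate / 1e6:.3f} M operations/s")
```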
Linear Systems
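Discrete system with input x and output y: x(n) results in y(n).
The system is linear if, whenever x1(n) results in y1(n) and x2(n) results in y2(n), the input
c1 x1(n) + c2 x2(n)
results in
c1 y1(n) + c2 y2(n)
for all constants c1 and c2.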
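The superposition property can be spot-checked numerically. The Python sketch below compares the response to c1 x1(n) + c2 x2(n) with c1 y1(n) + c2 y2(n) for two example systems chosen for illustration (a 2-tap FIR filter and a squarer); neither comes from the slides. Passing on one pair of inputs does not prove linearity, but a single failure disproves it.

```python
def fir_system(x):
    """Example linear system (assumed): y(n) = 0.5*x(n) + 0.25*x(n-1)."""
    return [0.5 * x[n] + (0.25 * x[n - 1] if n > 0 else 0.0)
            for n in range(len(x))]


def squarer(x):
    """Example nonlinear system (assumed): y(n) = x(n)^2."""
    return [v * v for v in x]


def satisfies_superposition(system, x1, x2, c1, c2, tol=1e-12):
    """Check system(c1*x1 + c2*x2) == c1*system(x1) + c2*system(x2) for these inputs."""
    combined_input = [c1 * a + c2 * b for a, b in zip(x1, x2)]
    response_to_sum = system(combined_input)
    sum_of_responses = [c1 * a + c2 * b
                        for a, b in zip(system(x1), system(x2))]
    return all(abs(p - q) <= tol
               for p, q in zip(response_to_sum, sum_of_responses))


x1, x2 = [1.0, 2.0, 3.0], [3.0, 4.0, -1.0]
fir_ok = satisfies_superposition(fir_system, x1, x2, 2.0, -1.0)
squarer_ok = satisfies_superposition(squarer, x1, x2, 1.0, 1.0)
```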
x(n) -> LTI System A -> f(n) -> LTI System B -> y(n)
is equivalent to
x(n) -> LTI System B -> g(n) -> LTI System A -> y(n)
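This commutativity can be illustrated with two FIR filters, which are LTI systems fully described by their impulse responses. The Python sketch below cascades them in both orders; the impulse responses h_a and h_b are arbitrary values chosen for illustration.

```python
def convolve(x, h):
    """Full linear convolution of two finite sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += xn * hk
    return y


x = [1.0, 2.0, 3.0]
h_a = [1.0, -1.0]   # impulse response of system A (illustrative)
h_b = [0.5, 0.5]    # impulse response of system B (illustrative)

f = convolve(x, h_a)       # intermediate signal f(n) after system A
y_ab = convolve(f, h_b)    # A followed by B

g = convolve(x, h_b)       # intermediate signal g(n) after system B
y_ba = convolve(g, h_a)    # B followed by A
```

The intermediate signals f(n) and g(n) differ, but the outputs agree. This freedom to reorder is what makes transformations of LTI data flow graphs possible.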
Example of a multi-rate DFG:
Iteration period
Iteration period = the time required for the execution of one iteration of the SFG
Example: [Figure: SFG with input x(n), coefficient a, and delayed output y(n-1)]
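To make the definition concrete, the Python sketch below assumes the example SFG is the first-order recursion y(n) = a*y(n-1) + x(n), which the surviving labels suggest but the extracted text does not confirm. The operation times t_mult and t_add are hypothetical parameters. One iteration performs one multiplication followed by one addition, so on this assumption the iteration period is the sum of the two.

```python
def iteration_period(t_mult, t_add):
    """Time for one iteration of y(n) = a*y(n-1) + x(n): a multiply, then an add."""
    return t_mult + t_add


def run_recursion(a, x, y_init=0.0):
    """Execute the assumed SFG once per input sample and collect the outputs."""
    y_prev = y_init
    outputs = []
    for xn in x:
        y_prev = a * y_prev + xn   # one iteration of the SFG
        outputs.append(y_prev)
    return outputs


period = iteration_period(t_mult=10, t_add=4)       # hypothetical time units
outputs = run_recursion(0.5, [1.0, 1.0, 1.0])
total_time = period * len(outputs)                  # iterations run back to back
```

Because each iteration needs the previous output, the iterations cannot overlap freely; this dependence is what the iteration bound captures.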
DSP references
Keshab K. Parhi. VLSI Digital Signal Processing Systems, Design and Implementation. Wiley Inter-Science 1999.
Richard G. Lyons. Understanding Digital Signal Processing (2nd edition). Prentice Hall 2004.
John G. Proakis and Dimitris G. Manolakis. Digital Signal Processing (4th edition). Prentice Hall 2006.
Simon Haykin. Neural Networks, a Comprehensive Foundation (2nd edition). Prentice Hall 1999.
Hennessy and Patterson, Computer Architecture, a Quantitative Approach. 3rd edition. Morgan Kaufmann, 2002.
Phil Lapsley, Jeff Bier, Amit Shoham, Edward Lee. DSP Processor Fundamentals, Berkeley Design Technology, Inc, 1994-199
Jennifer Eyre, Jeff Bier, The Evolution of DSP Processors, IEEE Signal Processing Magazine, 2000.
Kees van Berkel et al. Vector Processing as an Enabler for Software-Defined Radio in Handheld Devices, EURASIP Journal on Applied Signal Processing 2005:16, 2613-2625.
Parhi, Chapters 2, 3
Representations of DSP algorithms
Data flow graphs
Loop bounds and iteration bounds
Pipelining of digital filters
Parallel processing
Retiming techniques
THANK YOU