Clocking Strategies


VLSI Design I

CMOS Sequential Logic


Clocking Strategies

Today’s handouts:
(1) Lecture Slides

MicroLab, VLSI-10 (1/21)

JMM v1.2
Sequential Logic
Use #1: Get better utilization from
idle combinational logic blocks.
Pipeline the system so that new
computations start before the old ones
complete. Add registers to keep
computations separate.

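Use #2: Convert parallel operations
to a sequence of (faster, smaller)
serial operations.
[Figure: a parallel block "x" with 8-bit inputs A, B and output C, next to a serial block "+" with 1-bit inputs A, B and 8-bit registers around C]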

Use #3: Need to process a sequence of inputs and want to
reuse the same hardware (finite state machine).

Latches and Flip-Flops
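Level sensitive latch (inputs D, G; output Q):
Q follows D while G is active; Q is stable otherwise.

Edge sensitive flip-flop (inputs D, clk; output Q):
Q takes its value from D at the clock edge; Q is stable otherwise.

[Figure: symbols and D/G/Q and D/clk/Q waveforms for both elements]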

A static latch will hold data while G is inactive, however long
that may be. A dynamic latch will hold data while G is
inactive, but only “for a while”, after which the saved value
may decay.
Do static latches dissipate static power?
How long is “for a while”?
Which one should I use?
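The difference between the two storage elements can be illustrated with a small behavioral model. This is only a sketch: the sampled waveforms, the initial stored value of 0, and the choice of an active-high gate and a rising-edge flip-flop are assumptions made for the example, not something the slide specifies.

```python
def simulate(d_wave, clk_wave):
    """Return (latch_q, ff_q) traces for sampled D and clock waveforms."""
    latch_q, ff_q = [], []
    q_latch = q_ff = 0      # assumed initial stored value
    prev_clk = 0            # assumed clock level before the first sample
    for d, clk in zip(d_wave, clk_wave):
        if clk == 1:
            q_latch = d     # level sensitive: Q follows D while the gate is active
        if prev_clk == 0 and clk == 1:
            q_ff = d        # edge sensitive: Q takes D only on the rising edge
        latch_q.append(q_latch)
        ff_q.append(q_ff)
        prev_clk = clk
    return latch_q, ff_q


d_wave = [0, 1, 1, 0, 0, 1, 0, 0]
clk_wave = [0, 0, 1, 1, 0, 0, 1, 1]
latch_q, ff_q = simulate(d_wave, clk_wave)
print("latch:", latch_q)
print("ff:   ", ff_q)
```

While the clock is high the latch output tracks every change on D, whereas the flip-flop output changes only at the two rising edges.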

Latch Timing Constraints #1
[Figure: latch a, combinational logic CLa, latch b, combinational logic CLb and a third latch in series, all gated from CLK; timing diagram marking t1a, t2b and the hold (H) and setup (S) windows around the CLK transitions]

t1a = tmqa + tmda > thb
t1b = tmqb + tmdb > tha
t2a = tqa + tda < tc0 - tsb
t2b = tqb + tdb < tc1 - tsa

th = hold time
ts = setup time
tm = min delay from invalid input to invalid output
td = max delay from valid input to valid output for comb. logic
tq = max delay from G to Q
tc0 = low period of clock cycle tc
tc1 = high period of clock cycle tc

Do I have to check ALL these constraints?
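The four inequalities can be checked mechanically for one pair of cascaded latches. The sketch below is illustrative only: the `Latch` record, the function name and all delay values (in ns) are hypothetical, and the min delays are passed separately from the max delays.

```python
from dataclasses import dataclass


@dataclass
class Latch:
    tq: float    # max delay from G to Q
    tmq: float   # min delay from G to Q
    ts: float    # setup time
    th: float    # hold time


def check_latch_pair(a, b, tda, tmda, tdb, tmdb, tc0, tc1):
    """Evaluate the four latch timing constraints from the slide."""
    return {
        "t1a": a.tmq + tmda > b.th,       # hold at latch b
        "t1b": b.tmq + tmdb > a.th,       # hold at latch a
        "t2a": a.tq + tda < tc0 - b.ts,   # setup at latch b
        "t2b": b.tq + tdb < tc1 - a.ts,   # setup at latch a
    }


# Hypothetical numbers in ns, chosen so that one constraint fails.
a = Latch(tq=0.5, tmq=0.2, ts=0.3, th=0.1)
b = Latch(tq=0.5, tmq=0.2, ts=0.3, th=0.1)
results = check_latch_pair(a, b, tda=3.0, tmda=0.4, tdb=4.5, tmdb=0.0,
                           tc0=5.0, tc1=5.0)
print(results)
```

With these made-up values only `t2b` is violated: the logic in CLb is too slow for the high phase of the clock.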


Latch Timing Constraints #2
[Figure: timing diagram marking t1a, t2b and the hold (H) and setup (S) windows around the CLK transitions]

t1a = tmqa + tmda > thb
t1b = tmqb + tmdb > tha
t2a = tqa + tda < tc0 - tsb
t2b = tqb + tdb < tc1 - tsa

Questions for latch-based designs:
• How much time is there for useful work (i.e. for combinational logic delay)?
  tda + tdb < tc - 2(ts + tq)
• What is the maximal clock frequency?
• Does it help to guarantee a minimum tm, for example, by requiring
  a minimum number of gates in each cloud?
• Suppose the maximum clock skew is tSKEW. How does that affect
  the equations above? Clock skew measures the difference in
  arrival of CLK at two cascaded latches (not necessarily any two
  latches!).
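The bound on useful work follows from adding the two setup constraints. The derivation below is a sketch that assumes the clock period is the sum of its low and high phases and that both latches are identical (one common setup time and one common G-to-Q delay), which is what the simplified form of the bound suggests.

```latex
\begin{align*}
t_{qa} + t_{da} &< t_{c0} - t_{sb} \\
t_{qb} + t_{db} &< t_{c1} - t_{sa} \\
\Rightarrow\quad t_{da} + t_{db}
  &< (t_{c0} + t_{c1}) - (t_{sa} + t_{sb}) - (t_{qa} + t_{qb}) \\
  &= t_c - 2\,(t_s + t_q)
\end{align*}
```

Read the other way round, the cycle time must satisfy $t_c > t_{da} + t_{db} + 2\,(t_s + t_q)$, so under the same assumptions the clock frequency is limited to $f < 1 / \bigl(t_{da} + t_{db} + 2\,(t_s + t_q)\bigr)$. The per-phase constraints t2a and t2b still have to hold individually.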

Static Latches
Basic idea:
[Figure: latch drawn as a multiplexer with inputs 0 and 1, selected by CLK, with D in and Q out]
• Need gain around this loop to make latch static.
• Want storage node to be isolated from whatever user does to Q.
• Would like fast CLK-to-Q, small setup and zero hold times.

Obvious implementation:
[Figure: transistor-level latches with D, CLK, CLKN and Q]
• Oops… feedback not isolated from Q. Could add additional output inverters...
• Good! Input goes only to FET gates.
• Should we buffer CLK 0, 1 or 2 times?

Latch Timing
[Figure: latch schematic with internal nodes 1 and 2, clocked by CLK; timing diagram marking ts and th around the CLK transition]

setup time = how long D input has to be stable before CLK transition.
hold time = how long D input has to be stable after CLK transition.

So, what node should we use to measure setup and hold times?
And what should we measure?

Other time of interest: CLK-to-Q

Dynamic Latches
Suppose in the interest of speed we were
willing to give up the “static guarantee”
and take our chances with dynamic latches,
i.e., remove feedback path...
[Figure: dynamic latch with input D, output Q and clock CLK. Annotations:]
• Eliminate when Q fanout is small (1)
• Can combine other logic with inverter
• Local or global clock inverter?

Can we do without the CLK inverter too?
DEC did without on the 21064 but put it back in for the 21164.

[Figure: two latch variants, one clocked by CLKN and CLK, one by CLK only, each with D in and Q out]

Delete the PFET driven by CLKN and then add an
NFET driven by CLK in Q’s pulldown path to
handle what happens when D goes from 1 to 0.

Single-Phase Clocked Systems
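RTL #1:
[Figure: three registers (D, Q, clk) in series, all clocked by CLK]

latch #2:
[Figure: three latches (D, Q, G) in series, all gated from CLK]

Simplest clocking methodology is to use a single clock in conjunction
with a register. Clocks are generated with global clock buffers.
CLK and CLKN (the inverted clock) are generated locally.

[Figure: clk-in driving clk and inverted clk through buffers; buffers are necessary for large loads]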

Clock Skew
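[Figure: three registers (D, Q, clk) in series; CLK reaches successive registers through delay elements]

• If a clock net is heavily loaded, there might be a race
  between clock and data -> clock skew.
• Special attention has to be paid when designing the clock
  tree. CAD tools are able to design balanced clock trees.
• Two methods to avoid clock skew:

[Figure, first method: chain of D Q stages including a stage labelled "latch", with CLK distributed through a delay]
[Figure, second method: two D Q stages with CLK fed in from the opposite end of the delay]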

Two-Phase Clocked Systems

[Figure: three latches (D, Q, G) in series, gated by PHI1 and PHI2; waveforms of phi1 and phi2 labelled “non-overlapping two phase clocks”]

• A problem in single-phase clocked systems is the
  generation and distribution of nearly perfect overlapping
  clocks.
• In two-phase clocked systems this is solved by non-overlapping
  clocks.
• Non-overlapping clocks can be generated with latch
  structures.

[Figure: clk driving two ≥1 gates whose outputs are phi1 and phi2]
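A unit-delay simulation shows why a latch structure yields non-overlapping phases. The sketch assumes the two ≥1 gates are NOR gates cross-coupled in the usual way (each gate sees the clock, or its inverse, and the other gate's output); the exact wiring of the slide's figure is not recoverable, and the function names and the one-step gate delay are hypothetical.

```python
def nor(a, b):
    """Two-input NOR on 0/1 values."""
    return int(not (a or b))


def two_phase(clk_wave, phi1=0, phi2=0):
    """Unit-delay simulation of two cross-coupled NOR gates.

    Assumed wiring: phi1 = NOR(clk, phi2), phi2 = NOR(not clk, phi1).
    """
    trace = []
    for clk in clk_wave:
        # Both gates evaluate on the previous outputs (one gate delay per step).
        phi1, phi2 = nor(clk, phi2), nor(1 - clk, phi1)
        trace.append((phi1, phi2))
    return trace


clk_wave = [0] * 4 + [1] * 4 + [0] * 4 + [1] * 4
trace = two_phase(clk_wave)
for clk, (p1, p2) in zip(clk_wave, trace):
    print(clk, p1, p2)
```

Each phase can only rise after the other has fallen, so every clock transition produces a short interval in which both outputs are low and none in which both are high.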

Clock Distribution
Two main techniques for clock distribution exist:
• a single large buffer (see Alpha processor)
• a distributed clock tree approach

[Figure: clk distributed to a stack of n-bit datapath slices; delays have to match between stages]

• There is no such thing as a design-free clocking
  strategy in today’s high-performance processes.
• Clock buffers should be surrounded by power pads
  due to their large power consumption.

[Figure: clk driver placed between vdd and gnd pads]

Phase Locked Loop Clock Technique
Phase locked loops (PLL) are used to generate
internal clocks on chips for two main reasons:
• to synchronize the internal clock of a chip with an
  external clock
• to operate the internal clock at a higher rate than
  the external clock input

[Figure: two chips compared, one of them with a PLL on the clock input; labels: clock, clock route, dclk, dclk+dpad, data out]

Flip-flops (registers)
Using alternating positive and negative dynamic latches with
a single clock gives great speed and small area, but…
• lots of worries about clock skew
• must balance logic delays to minimize wastage
• need latch size checks (check optimizations!)

What about those of us who don’t have buildings full of
engineers to sweat the details? Use D-flip-flops and
address all the problems once!

[Figure: D flip-flop built from a master latch and a slave latch clocked from CLK; timing diagram of D, CLK and Q]

Flip-flop Implementations
Obvious implementation:
[Figure: flip-flop schematic with D, CLK and Q]

Use “jamb” latches to lighten CLK load:
[Figure: jamb-latch flip-flop with D, CLK and Q]
“Weak” feedback inverters (long n and p) get overridden.

Flip-Flop Timing
[Figure: two registers (D, Q, clk) with combinational logic CLa between them, both clocked by CLK; timing diagram marking t1 and t2 relative to CLK]

t1 = tmq + tmda > th
t2 = tq + tda < tc - ts

Questions for register-based designs:
• How much time is there for useful work (i.e. for combinational logic delay)?
• Does it help to guarantee a minimum tm? How about designing
  registers so that tmq > th?
• Suppose the maximum clock skew is tSKEW. How does that affect
  the equations above?
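The two register constraints translate directly into a lower bound on the cycle time and a hold check. The sketch below also folds in clock skew in one common way, by charging the worst-case skew against both the setup and the hold margin. That treatment is an assumption, since the slide leaves the skew question open, and the function names and delay values (in ns) are hypothetical.

```python
def clock_period_bound(tq, tda, ts, tskew=0.0):
    """Value that tc must exceed so that tq + tda < tc - ts still holds
    when the capturing clock may arrive tskew early (assumed worst case)."""
    return tq + tda + ts + tskew


def hold_ok(tmq, tmda, th, tskew=0.0):
    """Hold check tmq + tmda > th, tightened by tskew for a late
    capturing clock (assumed worst case)."""
    return tmq + tmda > th + tskew


# Hypothetical values in ns.
bound_no_skew = clock_period_bound(tq=0.5, tda=6.0, ts=0.3)
bound_skew = clock_period_bound(tq=0.5, tda=6.0, ts=0.3, tskew=0.2)
hold_no_skew = hold_ok(tmq=0.2, tmda=0.0, th=0.1)
hold_skew = hold_ok(tmq=0.2, tmda=0.0, th=0.1, tskew=0.2)
print(bound_no_skew, bound_skew, hold_no_skew, hold_skew)
```

In this example the skew lengthens the required cycle time and also turns a passing hold check into a failing one, which a slower clock cannot fix.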

Dynamic Flip-Flops
I’ll have the Christer Svensson
special please!
[Figure: dynamic flip-flop with input D, clock CLK, internal nodes 1 and 2, and output QN]

CLK is low:
• node 1 follows not(D)
• node 2 pulled up
• QN is “floating” with its old value

CLK is high:
• node 2 = “0” if node 1 = “1”,
  otherwise it stays “1”
  ⇒ node 2 = not(node 1) shortly after CLK↑
• QN = not(node 2) ⇒ stable soon after CLK↑
• node 1 can be pulled down if D goes to “0” (capacitive
  coupling), but node 2 won’t change!
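The node-by-node rules can be turned into a purely behavioral model, which makes the edge-triggered behavior easy to see. This is a sketch, not a circuit simulation: the class name, the starting state and the input sequence are hypothetical, node 1 is simply held while CLK is high, and a pull-down of node 1 is emulated by overwriting the attribute by hand.

```python
class DynamicFlipFlop:
    """Behavioral model of the rules listed on the slide."""

    def __init__(self):
        # Arbitrary assumed start state.
        self.node1, self.node2, self.qn = 1, 1, 0

    def step(self, clk, d):
        if clk == 0:
            self.node1 = 1 - d       # node 1 follows not(D)
            self.node2 = 1           # node 2 pulled up
            # QN floats and keeps its old value.
        else:
            if self.node1 == 1:
                self.node2 = 0       # discharged; otherwise node 2 stays as it is
            self.qn = 1 - self.node2
        return self.qn


ff = DynamicFlipFlop()
samples = [(0, 1), (1, 1), (1, 0), (0, 0), (1, 0), (1, 1)]   # (clk, d) pairs
qn_trace = [ff.step(clk, d) for clk, d in samples]

ff.node1 = 0                         # emulate node 1 being pulled down while CLK is high
qn_after_disturbance = ff.step(1, 1)
print(qn_trace, qn_after_disturbance)
```

QN takes the value not(D) as sampled just before each rising clock edge, and forcing node 1 low during the high phase leaves node 2 and QN unchanged.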

Static Timing Analysis
Do I have to check ALL the constraints?
Yup, for every pair of connected register/latches AND for all
possible data values!

We need a CAD tool: static timing analyzer. Here’s how it works:

Step 1: “Level-ize” all signal nodes.
Start by assigning all register outputs and top-level inputs a
level of 0. For all other gates: level_OUTPUT = max(level_INPUT) + 1.

Step 2: Compute min/max signal delays.
For each successive node level, compute min and max time for
all nodes on that level (see next slide for details). This is a
“data independent” computation. Might need case analysis to
avoid false paths.

Step 3: Check setup and hold constraints.
Use min times of register inputs to check hold time. Use max
times and tCLK to check setup time or use max time + tSETUP
to determine min tCLK.
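The three steps can be sketched in a few lines. This is a minimal illustration under stated assumptions: the netlist format, node names and delays (in ns) are hypothetical, the logic is acyclic, each gate has a single min and max delay regardless of which input switches, and register outputs are taken to launch at time 0.

```python
def levelize(gates, sources):
    """Step 1. gates maps output -> (dmin, dmax, [inputs]); sources get level 0."""
    level = {s: 0 for s in sources}

    def visit(node):
        if node not in level:
            _, _, inputs = gates[node]
            level[node] = max(visit(i) for i in inputs) + 1
        return level[node]

    for g in gates:
        visit(g)
    return level


def arrival_times(gates, sources, level):
    """Step 2. Data-independent min/max arrival times, level by level."""
    tmin = {s: 0.0 for s in sources}
    tmax = {s: 0.0 for s in sources}
    for node in sorted(gates, key=level.get):
        dmin, dmax, inputs = gates[node]
        tmin[node] = min(tmin[i] for i in inputs) + dmin
        tmax[node] = max(tmax[i] for i in inputs) + dmax
    return tmin, tmax


# Hypothetical netlist: register r1 and input "in" feed logic that drives r2.D.
sources = ["r1.Q", "in"]
gates = {
    "n1": (1.0, 2.0, ["r1.Q", "in"]),
    "n2": (0.5, 1.5, ["n1", "in"]),
    "r2.D": (1.0, 1.0, ["n1", "n2"]),
}
level = levelize(gates, sources)
tmin, tmax = arrival_times(gates, sources, level)

# Step 3 with hypothetical register parameters.
t_hold, t_setup = 0.3, 0.5
hold_met = tmin["r2.D"] > t_hold
min_tclk = tmax["r2.D"] + t_setup
print(level, tmin["r2.D"], tmax["r2.D"], hold_met, min_tclk)
```

Because only the min and max over all inputs are propagated, the result does not depend on data values, which is also why false paths need separate case analysis.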

Stage Delay Computation
Look at each gate and use knowledge of input timing and rise/fall
timing to compute earliest and latest time output could change for
both rising and falling output transitions.

[Figure: clocked stage between VDD and GND with input D (IN), clocks CLK and CLKN, internal nodes 1 (capacitance C1) and 2 (capacitance C2), and output OUT (capacitance COUT)]

D↑ ⇒ OUT↓ (C1, COUT):
  min ⇒ node 1 = 0V, fast
  max ⇒ node 1 = VDD, slow
D↓ ⇒ OUT↑ (C2, COUT):
  min ⇒ node 2 = VDD, fast
  max ⇒ node 2 = 0V, slow
Other transitions: CLK↑, CLK↓, CLKN↑, CLKN↓

Use Penfield-Rubinstein model to compute
td,in-out = sum(Ri * Ci) over all nodes “i” in the stage, where Ri is
total “effective resistance” to power rail and Ci is non-zero if node
capacitor needs to be charged/discharged. Multiply by derating
factor to account for rise/fall time of input.
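The delay sum is simple enough to compute directly. In this sketch the function name, the resistance and capacitance values and the derating factor are all hypothetical; each node contributes the product of its effective resistance to the rail and its capacitance, with the capacitance set to zero for nodes that do not switch.

```python
def stage_delay(nodes, derating=1.0):
    """Sum of Ri * Ci over the nodes of a stage, times a derating factor.

    nodes: list of (r_to_rail_ohms, c_farads) pairs; use c = 0 for a node
    whose capacitor does not have to be charged or discharged.
    """
    return derating * sum(r * c for r, c in nodes)


# Hypothetical stage: internal node (10 kOhm, 10 fF) and output (20 kOhm, 30 fF).
nodes = [(10e3, 10e-15), (20e3, 30e-15)]
t_ideal = stage_delay(nodes)                   # step input
t_derated = stage_delay(nodes, derating=1.2)   # assumed slow-input derating
print(t_ideal, t_derated)
```

The min and max cases on the slide differ only in which internal node capacitances are counted as non-zero.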

Coming Up...
Next topic…
Finite state machines: state diagrams, state
minimization, state assignment, logic and PLA
implementations.

Readings for next time…
Weste:
• Sections 5.5 thru 5.5.6 (latch, FF)
• 5.5.8 thru 5.5.11 (clock strategy)
• 5.5.15 and 5.5.16 (clock strategy)

Self-study…
Weste:
• PLL section 9.3.5.3

Exercises: VLSI-10

Ex vlsi10.1 (difficulty: easy): calculate peak current
and power consumption of a 100 MHz clock driver
with rise and fall times of 1 ns driving 30k register
bits at 100 fF each with Vdd = 3.3 V.
Result: Ipeak = 9.9 A, Pd = 2.18 W
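The exercise can be worked through numerically. The sketch assumes the usual first-order models, I = C·dV/dt for the peak current during an edge and P = C·Vdd²·f for the dynamic power; the exercise does not say which power model the quoted result is based on, and the variable names are hypothetical.

```python
n_bits = 30_000        # register bits driven by the clock
c_bit = 100e-15        # load per bit in farads
vdd = 3.3              # supply voltage in volts
f_clk = 100e6          # clock frequency in hertz
t_edge = 1e-9          # rise/fall time in seconds

c_total = n_bits * c_bit               # total clock load: 3 nF
i_peak = c_total * vdd / t_edge        # I = C * dV/dt during one edge
p_dyn = c_total * vdd ** 2 * f_clk     # P = C * Vdd^2 * f (assumed model)

print(f"C = {c_total * 1e9:.1f} nF, Ipeak = {i_peak:.1f} A, P = {p_dyn:.2f} W")
```

Under these assumptions the peak current comes out at 9.9 A, matching the quoted result. The C·Vdd²·f estimate gives about 3.27 W for the full 3 nF load, whereas the quoted 2.18 W corresponds to the same formula with a 2 nF load, so the intended load or power model may differ from this sketch.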
