Project Ele 447
The goal of the project is to realize an 8x8 bit unsigned binary multiplier using a state-of-the-art CMOS process. Magic will be used as the layout editor tool, and HSPICE and IRSIM will be used as verification tools. The final product will be a layout wired inside a 28-pin DIP (see Appendix 1) frame. The design specifications are listed below.
Fully functioning design
Verifiable in static and dynamic modes
Throughput (see Appendix 2) > 10 million operations per second
Introduction:
An 8x8 bit unsigned binary multiplier takes two 8-bit inputs and generates a 16-bit output. Multipliers have many applications and are built into many processors, for example Microchip's PIC18F series microcontrollers.
Commercial Multipliers:
The Texas Instruments SN54LS261, SN74LS261 and SN74284 are a few examples of multipliers available on the market.
Design Challenges:
An 8x8 bit unsigned binary multiplier takes two 8-bit inputs and generates a 16-bit output, using control signals such as Clk, Reset and Load. The package should therefore provide at least 40 I/O pins. However, the pad frame available for the design has only 28 pins, 4 of which are taken by Vdd and GND, leaving only 24 pins for the design.
The solution to this challenge is to multiplex I/O pins wherever possible, which makes the timing of the control circuitry crucial.
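As a rough illustration of the pin budget and of one way to multiplex the output, the Python sketch below counts the pins and splits the 16-bit product into two 8-bit slices that can share the same output pins, in the spirit of the lsbout/msbout slices used in the layout. The helpers `split_product` and `join_product` are hypothetical names for illustration only.

```python
# Pin budget taken from the text: a 28-pin DIP with 4 pins used for Vdd and GND.
PACKAGE_PINS = 28
SUPPLY_PINS = 4
AVAILABLE_PINS = PACKAGE_PINS - SUPPLY_PINS  # 24 pins left for the design

# Data pins needed without multiplexing: two 8-bit operands and a 16-bit product,
# before counting any control pins (Clk, Reset, Load, ...).
UNMUXED_DATA_PINS = 8 + 8 + 16  # 32


def split_product(product: int) -> tuple[int, int]:
    """Hypothetical helper: split a 16-bit product into (lsb_byte, msb_byte) so
    the two halves can be driven one after the other on the same 8 output pins."""
    if not 0 <= product <= 0xFFFF:
        raise ValueError("product must fit in 16 bits")
    return product & 0xFF, product >> 8


def join_product(lsb_byte: int, msb_byte: int) -> int:
    """Hypothetical helper: reassemble the 16-bit product from its two 8-bit slices."""
    return (msb_byte << 8) | lsb_byte


if __name__ == "__main__":
    print(AVAILABLE_PINS, UNMUXED_DATA_PINS)  # 24 32
    lsb, msb = split_product(0xFF * 0xFF)     # 0xFE01
    print(hex(lsb), hex(msb))                 # 0x1 0xfe
```

Multiplexing only the output brings the data pins from 32 down to 24, which would still consume every available pin, so the inputs and control signals also have to share pins.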
Performance comparisons:
                                             5 V        3.3 V
Max frequency                                125 MHz    50 MHz
Throughput (million operations per second)   15.6       6.25
Average power                                4.79 mW    532 µW
These results were obtained from HSPICE simulations. The values listed are approximate; actual numbers can be obtained once the chip is fabricated and tested (future work).
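The throughput row is consistent with one product every 8 clock cycles (the core multiplier returns its 16-bit result after 8 cycles), i.e. throughput = f_clk / 8. The sketch below reproduces the table's throughput figures under that assumption, ignoring the single extra start-up cycle, and derives the clock needed for the 10 million operations per second specification. The function names are illustrative only.

```python
CYCLES_PER_PRODUCT = 8  # the 8x8 core returns a 16-bit result every 8 clock cycles


def throughput_mops(f_clk_mhz: float, cycles: int = CYCLES_PER_PRODUCT) -> float:
    """Million multiplications per second for a clock given in MHz.
    Ignores the one extra cycle needed only for the very first product."""
    return f_clk_mhz / cycles


def min_clock_mhz(target_mops: float, cycles: int = CYCLES_PER_PRODUCT) -> float:
    """Lowest clock frequency (MHz) that still reaches the target throughput."""
    return target_mops * cycles


def product_period_ns(mops: float) -> float:
    """Time between successive products, in ns, at the given throughput."""
    return 1000.0 / mops


if __name__ == "__main__":
    print(throughput_mops(125.0))   # 15.625 (table: 15.6 at 5 V)
    print(throughput_mops(50.0))    # 6.25   (table: 6.25 at 3.3 V)
    print(min_clock_mhz(10.0))      # 80.0
    print(product_period_ns(10.0))  # 100.0
```

Under this model, the 10 million operations per second specification requires a clock of at least 80 MHz.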
Procedure used:
This project follows a top-down approach and can be divided into the following 9 steps/stages:
Defining top-level blocks
Defining Logic
Choosing an approach.
Defining Inner blocks
Layout of the core multiplier block in Magic
Testing of the multiplier block
Control Circuitry design
Combining the multiplier block and the control circuitry
Testing and verification of the whole block.
Procedure:
The organization of the inputs and outputs in the 28-pin package is the first step of this project and is performed as follows.
Initially the whole design is divided into two sections:
The core multiplier section.
The control circuit which ensures the proper functionality of the multiplier.
These sections can be further subdivided into smaller building blocks. At this point, the main focus is on defining the building blocks, which are shown in Figure 2.
The basic building blocks necessary are the following:
8x8 multiplier
8 bit Counter
Latches/Registers
Mux (multiplexers)
The details of these blocks are covered in the second stage of this project. The 8x8 bit multiplier block returns a 16-bit output at the end of 8 cycles.
To aid debugging and to make sure the counter is functioning properly, the output of the counter is monitored. Only two pins are available, so the 8-bit counter output is sent serially, bit by bit, and the CounterByeSent pin is driven high after each complete byte.
Only the first time does it take one additional cycle to get the process started.
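The serial counter output can be sketched as below. The text does not give the bit order or the exact timing of CounterByeSent, so this model assumes one pin carries the serial bit, LSB first by default, and that the second pin (CounterByeSent) is high on the clock that carries the last bit of each byte. `serialize_counter` and `deserialize` are hypothetical names.

```python
from typing import Iterable, Iterator


def serialize_counter(values: Iterable[int], lsb_first: bool = True) -> Iterator[tuple[int, int]]:
    """Yield one (serial_bit, byte_sent) pair per clock for each 8-bit counter value.
    byte_sent models the CounterByeSent pin: it is 1 on the clock that carries the
    last bit of a byte and 0 otherwise (an assumed timing, not taken from the layout)."""
    for value in values:
        if not 0 <= value <= 0xFF:
            raise ValueError("counter values must fit in 8 bits")
        order = range(8) if lsb_first else range(7, -1, -1)
        for position, bit_index in enumerate(order):
            yield (value >> bit_index) & 1, int(position == 7)


def deserialize(stream: Iterable[tuple[int, int]], lsb_first: bool = True) -> list[int]:
    """Rebuild the counter bytes from the serial stream, as a tester might."""
    received, current, count = [], 0, 0
    for bit, byte_sent in stream:
        if lsb_first:
            current |= bit << count
        else:
            current = (current << 1) | bit
        count += 1
        if byte_sent:  # byte boundary flagged by CounterByeSent
            received.append(current)
            current, count = 0, 0
    return received


if __name__ == "__main__":
    print(deserialize(serialize_counter([0x00, 0x5A, 0xFF])))  # [0, 90, 255]
```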
Step2: Defining Logic
To understand and develop the 8-bit binary multiplier logic, let's look at a 3-bit binary multiplier. The following figure shows the conventional method used to multiply two 3-bit numbers.
All bits of the multiplicand are multiplied by each bit of the multiplier in turn (serially). Each new partial product is shifted one position to the left and added to the previous outputs, as shown in Figure 4, and the carry of the current stage is added into the next operation as well.
In short, multiplication is a shift-and-add operation.
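The shift-and-add procedure can be written as a short word-level Python sketch (the function name is illustrative). Each multiplier bit is ANDed with the whole multiplicand, the partial product is shifted by the bit position, and it is added to the running result; Python's `+` absorbs the carries that the hardware adders propagate explicitly.

```python
def shift_and_add(multiplicand: int, multiplier: int, n: int = 8) -> int:
    """Unsigned n x n multiplication by shift-and-add.
    For each multiplier bit (LSB first), AND it with every multiplicand bit to
    form a partial product, shift that partial product left by the bit position,
    and add it to the running result."""
    mask = (1 << n) - 1
    if multiplicand & ~mask or multiplier & ~mask:
        raise ValueError(f"operands must be {n}-bit unsigned values")
    result = 0
    for i in range(n):
        y_i = (multiplier >> i) & 1
        partial = multiplicand if y_i else 0  # y_i ANDed with every multiplicand bit
        result += partial << i                # shift, then add (carries handled by +)
    return result


if __name__ == "__main__":
    print(shift_and_add(0b111, 0b111, n=3))  # 49, i.e. 0b110001
    print(hex(shift_and_add(0xFF, 0xFF)))    # 0xfe01
```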
Array Multiplier:
In Figures 4 and 5, X3X2X1X0 is the multiplicand and Y3Y2Y1Y0 is the multiplier. As shown in the block diagrams, each multiplicand bit is available to every column, and each multiplier bit is available to every row. For example, in the first row Y0 is ANDed with X3X2X1X0; this result is then added to the output of the next row, which is Y1 ANDed with X3X2X1X0 and shifted by one position. This process continues for all m bits of the multiplier.
The basic cells needed for these implementations are AND, AND with a half adder (And+HA), and AND with a full adder (And+FA) (see Appendix 3). In these approaches, the longest path determines the propagation delay.
Figure 5: Parallel Multiplier version1
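The row-by-row array structure described above can be modelled at bit level. The sketch below builds each row from AND cells and a ripple of full adders; the first adder of each row has a carry-in of 0, so it behaves like the And+HA cell. It is a generic carry-propagate array written for illustration, not a transcription of Figures 4 and 5, and the function names are hypothetical.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout


def array_multiply(x: int, y: int, n: int = 4) -> int:
    """Bit-level model of an n x n array multiplier with ripple-carry rows.
    Row j ANDs multiplier bit y_j with x_(n-1)..x_0 and adds the result to the
    running partial sum at an offset of j bit positions.
    Only the low n bits of x and y are used."""
    xb = [(x >> i) & 1 for i in range(n)]
    yb = [(y >> j) & 1 for j in range(n)]
    # Result bits, LSB first. Row 0 needs only AND cells.
    acc = [xb[i] & yb[0] for i in range(n)] + [0] * n
    for j in range(1, n):
        carry = 0  # carry-in of the first cell is 0, so it acts as a half adder
        for i in range(n):
            pp = xb[i] & yb[j]                                     # AND cell
            acc[i + j], carry = full_adder(acc[i + j], pp, carry)  # adder cell
        acc[n + j] = carry  # carry out of the row's most significant cell
    return sum(bit << k for k, bit in enumerate(acc))


if __name__ == "__main__":
    print(array_multiply(0b1011, 0b1101))  # 143
```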
The basic cells needed for this approach are a shift register, an adder, AND gates, a 2-to-1 mux, latches and delay registers.
In this approach, the adder limits the speed. There are several kinds of adders:
Ripple carry
Manchester carry
Carry select
Carry save
Carry look-ahead
In this step, the approach has to be chosen. To do so, the pros and cons of the approaches listed in the previous step are compared in the following table.
Implementation type             Worst-case propagation delay        Area in λ² (8-bit multiply)
Parallel Multiplier Version1    (2n-2) Tcarry + (n-1) Tproduct      61500
Parallel Multiplier Version2    (2n-2) Tcarry + Tproduct            691200
Serial-Parallel Multiplier      Requires m cycles for the output    36200
The area of Parallel Multiplier Version1 can be approximated as m x n times the area of a single adder cell, where m and n are the numbers of multiplier and multiplicand bits. The 1-bit adder cell dimensions are:
The area of Version2 can be approximated as (m+1) x n times the area of a single adder cell.
A pipelined version of Parallel Multiplier Version2 is faster, but its area is even larger because of the added latches.
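To compare the delay expressions from the table for the 8-bit case, the sketch below evaluates them for n = 8 with normalized unit delays (Tcarry = Tproduct = 1, an assumption chosen only for illustration), together with the approximate cell counts m x n and (m+1) x n stated above. The function names are hypothetical.

```python
def delay_version1(n: int, t_carry: float, t_product: float) -> float:
    """Worst-case delay of Parallel Multiplier Version1: (2n-2)Tcarry + (n-1)Tproduct."""
    return (2 * n - 2) * t_carry + (n - 1) * t_product


def delay_version2(n: int, t_carry: float, t_product: float) -> float:
    """Worst-case delay of Parallel Multiplier Version2: (2n-2)Tcarry + Tproduct."""
    return (2 * n - 2) * t_carry + t_product


def adder_cells_version1(m: int, n: int) -> int:
    """Approximate cell count for Version1: m x n adder cells."""
    return m * n


def adder_cells_version2(m: int, n: int) -> int:
    """Approximate cell count for Version2: (m+1) x n adder cells."""
    return (m + 1) * n


if __name__ == "__main__":
    # Normalized delays (Tcarry = Tproduct = 1) for an 8x8 multiply.
    print(delay_version1(8, 1.0, 1.0), delay_version2(8, 1.0, 1.0))  # 21.0 15.0
    print(adder_cells_version1(8, 8), adder_cells_version2(8, 8))    # 64 72
```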
Figure 8 shows the logic flow for an example, the multiplication of two 3-bit numbers, 111 x 111 (using the same principle explained in Figure 4), in more detail with respect to the clock.
In this case, all multiplicand bits are assumed to be available at once, while the multiplier bits are provided sequentially, one after the other, from LSB to MSB.
From the above figure, the basic building blocks needed for a serial-parallel implementation can be derived.
First, the multiplicand and multiplier bits are ANDed together. The result obtained by multiplying the LSBs is directly the required output p0. To obtain the other bits, the previous values have to be delayed (shifted) and added to the next ANDed inputs, as shown in the above figure. For example, to obtain p1, the ANDed result of b0 and a1 from the first cycle has to be delayed and added to the ANDed result of b1 and a0 in the second cycle. This delay is provided by a delay flip-flop.
To make sure that all outputs are available at the end of 3 cycles, the results p0, p1 and p2 are passed through another set of delay flip-flops. The block diagram is shown in Figure 9.
One important point is to make sure that the outputs of one multiplication cycle do not affect the values of the next multiplication cycle. In other words, the outputs p3, p4 and p5, which are available at the end of 3 cycles, must not be carried over to the next cycle. This is handled by a mux. The control signal R1 is generated internally and is discussed in the Control Circuitry design section.
From the above figure, it can be seen that the 3-bit multiplier can be designed as three identical blocks. Each block is called a basecell. The layout is described in the next step.
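The serial-parallel scheme can be simulated cycle by cycle. The sketch below is a minimal model under stated assumptions: n identical slices (basecells), each with an AND gate, a full adder and a delay flip-flop; the multiplicand is applied in parallel and the multiplier is fed LSB first; within a cycle the carry ripples across the slices, and the carry out of the MSB slice is stored in its own flip-flop. The actual basecell wiring in Figure 9 may differ, and the function names are hypothetical.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One-bit full adder: returns (sum, carry_out)."""
    return a ^ b ^ cin, (a & b) | (a & cin) | (b & cin)


def serial_parallel_multiply(a: int, b: int, n: int = 8) -> tuple[int, list[int], list[int]]:
    """Cycle-by-cycle model of a serial-parallel multiplier built from n basecells.

    The multiplicand a is applied to all slices in parallel; the multiplier b is
    fed one bit per clock, LSB first. high[i] is the delay flip-flop feeding the
    adder of slice i. Returns (product, low_bits, high_bits), bit lists LSB first:
    low_bits are p0..p(n-1) shifted out of the LSB slice, high_bits are
    p(n)..p(2n-1) left in the flip-flops after n cycles."""
    a_bits = [(a >> i) & 1 for i in range(n)]
    high = [0] * n  # cleared before a new multiplication (the role of the R1 mux)
    low_bits = []
    for t in range(n):
        b_t = (b >> t) & 1
        carry = 0
        sums = []
        for i in range(n):
            s, carry = full_adder(a_bits[i] & b_t, high[i], carry)  # AND + adder per slice
            sums.append(s)
        low_bits.append(sums[0])   # p_t leaves the LSB slice
        high = sums[1:] + [carry]  # shift down one slice; MSB carry goes to its flip-flop
    product = sum(bit << k for k, bit in enumerate(low_bits + high))
    return product, low_bits, high


if __name__ == "__main__":
    product, low, high = serial_parallel_multiply(0b111, 0b111, n=3)
    print(product, low, high)  # 49 [1, 0, 0] [0, 1, 1]
```

For 111 x 111, p0 to p2 come out of the LSB slice over the three cycles and p3 to p5 remain in the flip-flops, giving 110001 (49).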
The 3-bit multiplier was laid out in Magic, then tested and verified using IRSIM. The same idea was used to build the 8-bit multiplier.
Step5: Layout in Magic
These cells are taken from the cell library, and all the cells used are static. The basecell was organized so that the input is applied at the top and the output is latched at the bottom, and it was designed so that no major routing is necessary when it is arrayed.
Building the 8-bit core multiplier:
Figure 10 shows the core 8-bit multiplier block. The basecell described above has been arrayed 8 times, and no additional routing was necessary between the slices.
The only routing needed was the connection from the adder of the LSB slice to its delay flip-flop and from the carry out of the MSB slice to its delay flip-flop.
This layout was extracted from Magic and then tested and verified using IRSIM. The necessary control signals were initially generated in IRSIM; later, they were generated by additional circuitry (glue logic).
Step6: Control Circuitry design
In this step, additional logic was developed to generate the necessary control signals. The control signals generated are the following:
La (latch MSBs/LSBs) (NLa)
Ld (NLd)
En
R1 (signal for the mux shown in figure9)
OutputSync (named as Outla)
The following figure shows the control circuitry block, which generates La (latch MSBs/LSBs), Ld and R1 (the signal that controls the mux).
La was generated by ANDing nq2, nq1, nq0 and NClk, where nq2, nq1 and nq0 are the complemented outputs of the 3-bit counter. It was implemented using a nand4_50 and an inverter (the rightmost part of the following picture).
Ld/NLd was generated by ANDing nq2, nq1 and nq0. It was implemented using a nand3_50 and an inverter (the part after the counter3bsync_vj block in the following picture).
R1 was just delayed using a delay flip-flop (sdff2-50).
To generate the necessary control signals, the 3-bit counter was designed first. The following figure shows the construction of the 3-bit synchronous counter.
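The decoding of La and Ld from the counter can be checked with a small behavioural model. The sketch below assumes the 3-bit synchronous counter advances on the rising clock edge and evaluates each half clock period separately; Ld is high while the count is 0 (nq2, nq1 and nq0 all high), and La additionally requires NClk, so it is high only in the low half of that clock period. `control_trace` is a hypothetical name.

```python
def control_trace(cycles: int) -> list[dict[str, int]]:
    """Half-period trace of a 3-bit synchronous counter and the decoded controls.

    Ld = nq2 & nq1 & nq0 (nand3 + inverter in the layout), i.e. high while the
    count is 0. La = nq2 & nq1 & nq0 & NClk (nand4 + inverter), i.e. high only
    during the low half of the clock while the count is 0. Assumes the counter
    advances on the rising clock edge."""
    q = 0
    trace = []
    for _ in range(cycles):
        for clk in (1, 0):  # high half, then low half, of one clock period
            nq0, nq1, nq2 = (1 - ((q >> k) & 1) for k in range(3))
            ld = nq2 & nq1 & nq0
            la = ld & (1 - clk)  # NClk is the complement of Clk
            trace.append({"q": q, "Clk": clk, "Ld": ld, "La": la})
        q = (q + 1) % 8  # synchronous: all three bits change on the same edge
    return trace


if __name__ == "__main__":
    for step in control_trace(9):
        if step["Ld"]:
            print(step)
```

In this model, Ld and La each occur once every 8 clock periods, matching the 8 cycles needed per multiplication.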
For dynamic testing, an internal 8-bit counter was also included in the layout. The following figure shows the construction of the 8-bit asynchronous counter.
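An asynchronous (ripple) counter is simple, but after each clock its outputs pass through intermediate values before settling. The sketch below lists those transient states, assuming an idealized model in which each stage is a toggle flip-flop clocked by the falling edge of the previous stage and all stage delays are equal. Such transients are harmless for generating test inputs, but decoding them could produce glitches, which is consistent with the choice of a synchronous counter for the control signals. `ripple_increment` is a hypothetical name.

```python
def ripple_increment(value: int, width: int = 8) -> list[int]:
    """States an asynchronous (ripple) counter passes through on one input clock.

    Models each stage as a toggle flip-flop clocked by the falling edge (1 -> 0)
    of the previous stage, so bits change one at a time from the LSB upward and
    the counter briefly shows intermediate values before settling."""
    if not 0 <= value < (1 << width):
        raise ValueError(f"value must fit in {width} bits")
    states = []
    for k in range(width):
        was_one = (value >> k) & 1
        value ^= 1 << k  # stage k toggles
        states.append(value)
        if not was_one:  # 0 -> 1: no falling edge, so the ripple stops here
            break
    return states


if __name__ == "__main__":
    print([hex(s) for s in ripple_increment(0x7F)])
    # ['0x7e', '0x7c', '0x78', '0x70', '0x60', '0x40', '0x0', '0x80']
```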
All the above blocks were tested and verified using IRSIM. The signals specified above can be seen in the complete IRSIM test result shown in Figure 15.
Step7: Combining the core multiplier block and Control Circuitry design
In this step, the core multiplier block developed in Step 5 and the additional control circuitry were put together. Figure 14 shows the complete 8-bit multiplier.
Figure 14: 8x8 bit multiplier with the additional control Logic
The three rails (near 2) are Reset, Clk and NClk.
The two rails (near 3) are Insel/NInsel
The two rails (near 4) are IntExt/NIntExt
The inputs are given at the top of the block.
The outputs are latched at the bottom and are available as 8-bit slices.
Only the signals listed above have to be provided by the user. The other control signals and the complementary signals are generated internally.
Step8: Testing and Verification of the whole block
The complete layout shown in figure 14 was tested and verified using IRSIM and
HSPICE. Static and dynamic testing was performed.
For dynamic testing, the outputs of an internal 8-bit counter are used as inputs to the multiplier block.
For static testing, the inputs are provided externally through the input pins shown.
The following figure shows the IRSIM result. The important signals for verification are ain, bin, lsbout, msbout, out and out8.
Names used:
ain, bin: are the inputs.
out, out8 (lsbout, msbout): the outputs. out shows the complete 16-bit output and out8 shows the output as two 8-bit slices.
R: the Reset which is also the Enable.
En: the enable for the 3 bit counter.
Counter3: the output of the 3bit counter
Counter8: the output of the 8bit counter
Insel: when the multiplier is selected as an external input, this signal lets the multiplier take the user-specified input.
IntExt: selects whether ain comes from the counter output or from the user-defined inputs.
Outla: used to generate out8; also used as the OutputSync.
As can be seen, out shows the results of multiplying the counter outputs by 0xff (up to about 550 ns). After that, the IntExt signal is turned low, which makes the multiplier block use the external inputs; hence the result shows the multiplication of 0xff by 0xff.
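Expected values for a run like this can be generated ahead of time and compared with the simulated out, lsbout and msbout traces. The sketch below assumes, as described above, that IntExt high feeds the internal counter to ain and IntExt low feeds the external input, with bin held at 0xff; `select_ain` and `expected_outputs` are hypothetical helpers, not signals in the layout.

```python
def select_ain(int_ext: int, counter_value: int, external_value: int) -> int:
    """IntExt input mux as described for the IRSIM run: IntExt high feeds the
    internal 8-bit counter to ain, IntExt low feeds the external input."""
    return counter_value if int_ext else external_value


def expected_outputs(ain: int, bin_: int) -> dict[str, int]:
    """Expected 16-bit result and its two 8-bit slices for one multiplication."""
    if not (0 <= ain <= 0xFF and 0 <= bin_ <= 0xFF):
        raise ValueError("ain and bin must be 8-bit values")
    out = ain * bin_
    return {"out": out, "lsbout": out & 0xFF, "msbout": out >> 8}


if __name__ == "__main__":
    # Dynamic phase: counter values multiplied by 0xff.
    for count in range(4):
        ain = select_ain(1, counter_value=count, external_value=0xFF)
        print(count, {k: hex(v) for k, v in expected_outputs(ain, 0xFF).items()})
    # Static phase: IntExt low, so ain = 0xff from the external pins.
    ain = select_ain(0, counter_value=0, external_value=0xFF)
    print({k: hex(v) for k, v in expected_outputs(ain, 0xFF).items()})
    # {'out': '0xfe01', 'lsbout': '0x1', 'msbout': '0xfe'}
```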
This layout was tested under various conditions and the output obtained was correct.
Figure 15: Irsim Result
Figures 16 and 17 show the HSPICE results for the layout at 3.3 V and 5 V. The top plots show the counter outputs multiplied by 0xff, and the values are correct. These plots confirm the IRSIM results.
Overall dimensions:
Width x height: 1386 λ x 1440 λ
Area: 1995840 λ²
Number of transistors: 630 p-channel and 630 n-channel
Conclusions: HSPICE and IRSIM results show that the 8x8 bit unsigned binary multiplier works.
The following tasks can still be performed on the layout:
Using a carry-save adder would make the multiplier faster.
Some of the routing area shown in Figure 11 can be reduced by changing the location of the building blocks.
Observations: The timing of the control signals is critical. I made the following observations during this project:
I had to use a synchronous 3-bit counter instead of an asynchronous 3-bit counter to avoid glitches in the control signals.
Initially, when I was trying to generate the latch signal, I used an AND of R1 (the internal reset signal) and Clk. However, because of the delay in the actual Magic implementation, the generated latch signal was longer than necessary, so wrong outputs were latched. I therefore had to take this delay into account and generated the latch signal from the counter outputs.
I had to place buffers (2 inverters) at a couple of places for better driving capability.
Running the multiplier at 3.3 V lowers the power, but the maximum operating speed is considerably lower too.
I had to align the blocks in the bit slice in order to obtain symmetry. Figure 9 shows the final block diagram I came up with, but I initially started out with the bit slice shown in Appendix 6. Also, the 2-to-1 mux on the left-hand side of the adder was added later, when the results obtained were incorrect.
Appendix:
1. DIP (dual in-line package)
DIPs may be used for integrated circuits (ICs, "chips"), like microprocessors, or for
arrays of discrete components such as resistors or toggle switches. They can be mounted
on a printed circuit board (PCB) either directly using through-hole technology, or using
inexpensive sockets to allow for easy replacement of the device and to reduce the risk of
overheat damage during soldering.
2. Throughput
Throughput can be defined as the number of operations performed per unit time, or equivalently as the number of outputs obtained per second. For this project, for example, 10 million operations per second implies that a product (output) must be available every 100 ns.
Multiplication is a complex operation; a related throughput measure is MIPS (million instructions per second).
3. Multiplier cells (See Reference 2)
5. Tools Used:
Magic
Irsim
Hspice
Xfig
Xview
6. Initial block diagram of a bit slice, shown using a 3x3 bit multiplier:
References:
1. Discussions with Dr Fischer.
2. ELE 447 Notes for the project.
3. International Rectifier website
4. Wikipedia website
Acknowledgements:
I would like to take this opportunity to thank Dr Fischer for all the help throughout the semester. I would also like to thank Farshid, my friends and my colleagues for their input.
Future work:
The complete 8-bit multiplier block can be placed in the 28-pin frame and fabricated.
The following specs can be obtained once the chip is fabricated and tested under
various conditions.
Electrical Characteristics:
Ambient temperature:
Storage Temperature
Max output current sunk
Max output current Sourced
DC characteristics:
Supply voltage
Vdd rise time
Supply Current
Power down current
Input leakage current