Project Ele 447


ELE-447 Project

Design and Implementation of an 8x8 bit Binary Multiplier


Vijay Kumar Peddinti

The goal of the project is to realize an 8x8 bit unsigned binary multiplier using a
state-of-the-art CMOS process. Magic is used as the layout editor, and HSpice and Irsim are
used as verification tools. The final product will be a layout wired inside a 28 pin DIP
frame (see Appendix 1). The design specifications are listed below.
Fully functioning design.
Verifiable in static and dynamic modes.
Throughput (see Appendix 2) > 10 million operations per second.

Introduction:
An 8x8 bit unsigned binary multiplier takes two 8-bit inputs and generates a 16-bit
output. Such multipliers have several applications and are used in many processors, for
example Microchip's PIC18F series microcontrollers.

Commercial Multipliers:
Texas Instruments SN54LS261, SN74LS261, SN74284 are a few examples of
multipliers available in the market.

Brief description of SN74284: (http://focus.ti.com/docs/prod/folders/print/sn74284.html)


These high-speed TTL circuits are designed to be used in high-performance parallel
multiplication applications. When connected, these circuits perform the positive-logic
multiplication of two 4-bit binary words. The eight-bit binary product is generated with
typically only 40 nanoseconds delay.
This basic four-by-four multiplier can be utilized as a fundamental building block for
implementing larger multipliers. For example, the four-by-four building blocks can be
connected to generate sub-multiple partial products. These results can then be summed in
a Wallace tree and will produce a 16-bit product for the two eight-bit words typically in
70 nanoseconds. SN54H183/SN74H183 carry-save adders and SN54S181/SN74S181
arithmetic logic units with the SN54S182/SN74S182 look-ahead generator are used to
achieve this high performance. The scheme is expandable for implementing N x M bit
multipliers.

Brief description of Texas Instruments MPY634: Wide Bandwidth Precision Analog Multiplier
The MPY634 is a wide bandwidth, high accuracy, four-quadrant analog multiplier. Its
accurately laser-trimmed multiplier characteristics make it easy to use in a wide variety
of applications with a minimum of external parts, often eliminating all external trimming.
Its differential X, Y, and Z inputs allow configuration as a multiplier, squarer, divider,
square-rooter, and other functions while maintaining high accuracy.
The wide bandwidth of this new design allows signal processing at IF, RF, and video
frequencies. The internal output amplifier of the MPY634 reduces design complexity
compared to other high frequency multipliers and balanced modulator circuits. It is
capable of performing frequency mixing, balanced modulation, and demodulation with
excellent carrier rejection.
An accurate internal voltage reference provides precise setting of the scale factor. The
differential Z input allows user-selected scale factors from 0.1 to 10 using external
feedback resistors.

Design Challenges:
An 8x8 bit unsigned binary multiplier takes two 8-bit inputs and generates a 16-bit
output, using control signals such as Clk, Reset and Load. With a separate pin for every
data bit, the package would need a provision for at least 40 I/O pins. But the pad frame
available for the design has only 28 pins, of which 4 are taken by Vdd and GND. Thus only
24 pins are available for the design.
The solution to this challenge is to multiplex I/O pins wherever possible, which makes
the timing of the control circuitry crucial.
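
The pin arithmetic behind this challenge can be checked with a short sketch. The counts are taken from the text; the assumption that multiplexing reduces the data pins to one 8-bit input bus and one 8-bit output bus follows the pin table given later in this report.

```python
# Pin budget for the 28-pin frame, using the counts stated in the text.
PACKAGE_PINS = 28
POWER_PINS = 4                 # Vdd and Gnd pins

usable_pins = PACKAGE_PINS - POWER_PINS

# Without multiplexing: two 8-bit operands plus a 16-bit product.
unshared_data_pins = 8 + 8 + 16

# With multiplexing (assumed scheme): one 8-bit input bus, one 8-bit output bus.
shared_data_pins = 8 + 8

pins_left_for_control = usable_pins - shared_data_pins

print("usable pins:", usable_pins)
print("data pins without multiplexing:", unshared_data_pins)
print("data pins with multiplexing:", shared_data_pins)
print("pins left for control and debug:", pins_left_for_control)
```

Without multiplexing, the data pins alone exceed the 24 usable pins; with the shared buses, the remaining pins are enough for the clock, select, sync and debug signals.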

Summary of the Project:


Features:
The circuit is realized using static CMOS devices, hence the power dissipation is
very low.
There is a provision for internal testing using an 8-bit counter.
The Reset to the counter can also be used to reduce the power dissipation when
the chip is not in use.

Chip Features (28 pins):

I/O pins descriptions:

Pin Number      Type      Name
4, 18           Power     Vdd
11, 25          Power     Gnd
13              Input     Clk
14              Input     Reset/Enable
12              Input     Ext/intSelect
10              Input     InputSelect
15              Input     ProjectA/B Select
1-3, 5-9        Inputs    In0-7
20-24, 26-28    Outputs   Out0-7
19              Output    OutputSync
16, 17          Outputs   Counter debug pins

Performance comparisons:
                                              5V         3.3V
Max Frequency                                 125 MHz    50 MHz
Throughput (million operations per second)    15.6       6.25
Average Power                                 4.79 mW    532 µW

These results are obtained from HSPICE. The values listed are approximate; actual
numbers can be obtained once the chip is tested (future work).
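
The throughput figures follow from the clock frequency if one product is produced every 8 clock cycles, as described for the multiplier block later in this report. A minimal sketch of that calculation (the function name is illustrative):

```python
def throughput_mops(clock_hz, cycles_per_product=8):
    """Products per second, in millions, for a given clock frequency."""
    return clock_hz / cycles_per_product / 1e6

for supply, clock_hz in (("5V", 125e6), ("3.3V", 50e6)):
    print(supply, throughput_mops(clock_hz), "million operations per second")
```

Only the 5V case exceeds the 10 million operations per second target.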
Procedure used:
This project is implemented using a top-down approach and can be divided into the
following 9 steps/stages.
Defining top-level blocks
Defining Logic
Choosing an approach.
Defining Inner blocks
Layout of the core multiplier block in Magic
Testing of the multiplier block
Control Circuitry design
Combining the multiplier block and the control circuitry
Testing and verification of the whole block.

The report is also organized in the above listed order.


Step1: Top-Level Block Diagram, Control Signals and Timing

The first step is to define the following:


A Top-level block diagram.
The necessary control signals
The timing needed for the proper functioning of the chip.

Procedure:
The organization of the inputs and outputs in the 28pin package is the initial step of this
project and is performed in the following way.
Initially the whole design is divided into two sections:
The core multiplier section.
The control circuit which ensures the proper functionality of the multiplier.

We can further subdivide these sections into smaller building blocks. At this point, the
main focus is on defining the building blocks. These blocks are shown in Figure 2.
The basic building blocks necessary are the following:
8x8 multiplier
8 bit Counter
Latches/Registers
Mux (multiplexers)

Defining Control Signals:


Defining control signals is a very important step for the datapath to operate correctly.
This is the first step toward the control logic for the datapath. These signals clarify
exactly what is required for correct operation and completion of the instructions.
Control signals include not only the control for the registers but also indication of sending
a data packet.

Following is the list of the control signals defined:


InputSelect
Ext/IntSelect
LdInput
Reset
La (Latch LSBs and MSBs)
OutputSync
CounterByteSent
The following figure shows the pin-out of the final design using the 28 pin package.

Figure 1: Pin Out of the chip


Figure 2: Top-Level Block Diagram
Figure 3: Timing Diagram
Description, function and timing of the control signals:
Figure 3 shows the timing information. All the control signals have to be derived from
the Clock signal and are shown with reference to the Clock.
InputSelect: This signal is used to load the Multiplicand or the Multiplier via the
Mux. When InputSelect is high, the Multiplicand is loaded into a register,
and when it is low the Multiplier is loaded. This has an effect on the multiplier only
when the external inputs are used.
Ext/IntSelect: This signal specifies whether the inputs to the multiplier
are the internally generated counter outputs or the external inputs given by the user.
An 8bit counter is available, and its outputs can be used as inputs of the
8x8 bit multiplier.
Ld (LdInput): This signal is used to load both the Multiplicand and the Multiplier
into the registers. These are the inputs of the 8x8 bit multiplier block.
Reset: Reset is an external input given by the user. It can also be used as an
enable: when Reset is low the multiplier functions, otherwise the circuit
does not operate. The circuit has to be reset initially.
La (LatchMSBs/LSBs): This signal is used to latch the MSBs and LSBs of the output.
OutputSync: This signal is generated internally. It can be used to latch
the outputs as two 8bit slices and also serves as the indicator of the outputs sent
from the chip. The rising edge of OutputSync (OutLa in the circuit) indicates the
completion of the LSB slice and the start of the MSB slice. The falling edge indicates the
completion of the MSB slice and the start of the LSB slice.
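
Since the 16-bit product leaves the chip as two 8-bit slices on Out0-7, a receiver has to reassemble it. The sketch below shows only the byte arithmetic, with the LSB slice sent first as described above; the function names are illustrative and the sampling of OutputSync itself is not modeled.

```python
def output_slices(product):
    """Split a 16-bit product into the LSB and MSB bytes sent on Out0-7."""
    lsb = product & 0xFF
    msb = (product >> 8) & 0xFF
    return lsb, msb

def reassemble(lsb, msb):
    """Rebuild the 16-bit product from the two received bytes."""
    return (msb << 8) | lsb

lsb, msb = output_slices(0xFF * 0xFF)
print(hex(lsb), hex(msb), reassemble(lsb, msb))
```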

The details of the block will be covered in the second stage of this project. The 8x8
bit multiplier block returns a 16-bit output at the end of 8 cycles.

For debugging, and to make sure the counter is functioning properly, the
output of the counter is monitored. Only two pins are available, so the 8 bit
counter output is sent serially, bit by bit, and the CounterByteSent pin is driven
high at each complete byte output.
Only the first time does it take one additional cycle to get the process started.
Step2: Defining Logic

To understand and develop the 8bit binary multiplier logic, let us take a look at a 3bit
binary multiplier. The following figure shows the regular method used to multiply two 3 bit
numbers.

Figure 4: Picture showing 3x3 bit multiplication logic

Basically, all bits of the multiplicand are multiplied by each bit of the multiplier (serially).
The output of the second stage is shifted to the left and added to the previous output, as
shown in figure 4. The carry of the current stage is added to the next operation as
well.
So basically it is a shift-and-add operation.
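
The shift-and-add idea can be written out as a short behavioral sketch. It models the arithmetic only, not the hardware structure, and the function name is illustrative.

```python
def shift_and_add(multiplicand, multiplier, bits=3):
    """Multiply two unsigned numbers by summing shifted partial products."""
    product = 0
    for i in range(bits):
        if (multiplier >> i) & 1:
            # Partial product: the multiplicand shifted left by the bit position.
            product += multiplicand << i
    return product

print(shift_and_add(0b111, 0b111))        # 3x3 bit example: 7 x 7
print(shift_and_add(0xFF, 0xFF, bits=8))  # 8x8 bit worst case
```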

Following are two different approaches to implement the logic.


Array multiplier
Serial parallel multiplier

Figures 5, 6 and 7 show the block diagrams of the different implementations (Reference 2).

Array Multiplier:
In figures 4 and 5, X3X2X1X0 is the multiplicand and Y3Y2Y1Y0 is the multiplier. As
shown in the block diagrams, each multiplicand bit is available to each column, and each
bit of the multiplier is available to each row. For example, in the first row Y0 is ANDed with
X3X2X1X0, and the result is shifted and then added to the output of the next row, which is
Y1 ANDed with X3X2X1X0. This process is continued for all m bits of the multiplier.

The basic cells needed for these implementations are And, And with a Half Adder
(And+HA), and And with a Full Adder (And+FA) (see Appendix 3). In these approaches the
longest path determines the propagation delay.
Figure 5: Parallel Multiplier version1

Figure 6: Parallel Multiplier Version2

To speed up these techniques, pipelining can be used.


Serial-Parallel Multiplier:
Figure7 shows the block diagram of a Serial-parallel Multiplier. In this case the
implementation is performed as bit-slices. Each block again works on the same principle.

The basic cells needed for this approach are the shift register, adder, And, 2to1 mux,
latches and delay registers.

In this approach the adder limits the speed. There are different kinds of adders; the
following is a list.
Ripple carry
Manchester-carry
Carry select
Carry save
Carry Look-ahead.

Figure 7: Serial-Parallel Multiplier


Step3: Choosing an Approach

In this step the approach has to be decided. To do so, the pros and cons of the
approaches listed in the previous step are compared in the following table.

Implementation type            Worst case propagation delay        Area in λ² (for 8bit multiply)
Parallel Multiplier Version1   (2n-2) Tcarry + (n-1) Tproduct      61500
Parallel Multiplier Version2   (2n-2) Tcarry + Tproduct            691200
Serial-Parallel Multiplier     Requires m cycles for the output    36200

The area of the Parallel Multiplier Version1 can be approximated as m x n
times the area of a single adder cell (where m and n are the number of multiplier and
multiplicand bits). The 1 bit adder cell dimensions are:
The area of Version2 can be approximated as (m+1) x n times the
area of a single adder cell.

When a pipelined version of Parallel Multiplier Version2 is used, it is faster, but the area
is even larger because of the added latches.
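
The worst case delay expressions for the two parallel versions can be evaluated for n = 8. The sketch below is only illustrative: the unit values used for Tcarry and Tproduct are hypothetical placeholders, not measured cell delays.

```python
def delay_version1(n, t_carry, t_product):
    """Worst case delay of Parallel Multiplier Version1."""
    return (2 * n - 2) * t_carry + (n - 1) * t_product

def delay_version2(n, t_carry, t_product):
    """Worst case delay of Parallel Multiplier Version2."""
    return (2 * n - 2) * t_carry + t_product

# Hypothetical normalized delays: one time unit each (not measured values).
T_CARRY = 1.0
T_PRODUCT = 1.0

print("Version1:", delay_version1(8, T_CARRY, T_PRODUCT), "units")
print("Version2:", delay_version2(8, T_CARRY, T_PRODUCT), "units")
```

Both versions share the (2n-2) Tcarry term; they differ only in how many product delays lie on the longest path.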

I am choosing the Serial-parallel multiplier for the following reasons:

It is area efficient.
It requires only m cycles for the output.
It is an opportunity to learn the bit-slice approach.

From this stage on, all the examples are Serial-parallel specific.


Step4: Defining Inner blocks

Figure 8 shows the logic flow with an example: multiplication of two 3 bit
numbers, 111 x 111 (it uses the same principle explained in figure 4), shown in more
detail with respect to a clock.
In this case, it is assumed that the multiplicand bits are all available at once and the
multiplier bits are provided sequentially, one after the other (from LSB to MSB).

Figure 8: 3x3 bit multiplication example using (111x111)

From the above figure, the basic building blocks necessary for multiplication using the
serial-parallel implementation can be obtained.
Initially the multiplicand and multiplier bits are ANDed together. The initial result
obtained by multiplying the LSBs is the required output (p0). In order to obtain the other bits,
the previous values have to be delayed (shifted) and added to the next ANDed inputs, as
shown in the above figure. For example, to obtain p1, the result of the adder cell (b1 and a0)
has to be added to the ANDed result of b0 and a1 in the second cycle. This delay is provided by
a delay flip-flop.
In order to make sure that all outputs are available at the end of 3 cycles, the results p0,
p1, p2 are passed through another set of delay flip-flops. This block diagram is shown
in figure 9.
One important point is to make sure that the outputs of one multiplication
cycle do not affect the values of the next multiplication cycle. In other words, the outputs
p3, p4, p5, which are available at the end of 3 cycles, must not be carried over to the next cycle.
This is taken care of by providing a mux. The control signal R1 is generated internally and
will be discussed in the Control Circuitry design section.
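
The cycle-by-cycle behavior described above can be approximated with a small behavioral model. It is a sketch under assumptions: the adders of all slices are treated as one ripple-carry row per cycle, the feedback register is cleared at the start of each multiplication (the role played by the mux and R1), and flip-flop level timing is not modeled. The function name is illustrative.

```python
def serial_parallel_multiply(multiplicand, multiplier, bits=3):
    """Behavioral model: parallel multiplicand, multiplier fed LSB first."""
    feedback = 0   # delayed adder outputs; starts cleared for a new multiplication
    low_bits = 0   # p0 .. p(bits-1), produced one per cycle
    for cycle in range(bits):
        b = (multiplier >> cycle) & 1
        partial = multiplicand if b else 0      # row of And gates
        total = feedback + partial              # adder row, bits+1 wide with carry out
        low_bits |= (total & 1) << cycle        # LSB slice output is product bit p[cycle]
        feedback = total >> 1                   # delay flip-flops hold the shifted sum
    # After the last cycle the feedback register holds the upper product bits.
    return (feedback << bits) | low_bits

print(serial_parallel_multiply(0b111, 0b111))        # 3x3 bit example
print(serial_parallel_multiply(0xFF, 0xFF, bits=8))  # 8x8 bit worst case
```

For 111 x 111 the model produces p0, p1, p2 = 1, 0, 0 during the three cycles and leaves 110 in the feedback register, giving 110001.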

Figure 9: 3x3bit Multiplication block diagram

From the above figure, it can be seen that the 3 bit multiplier can be designed as three
blocks. Each block is called a basecell. The layout is described in the next step.

The 3bit multiplier was laid out in Magic. It was tested and verified using IRSIM, and
the same idea was used in building the 8bit multiplier.
Step5: Layout in magic

As specified before, this core multiplier was laid out using a bit-slice approach. The
first step in this approach was to build a single bit slice.

Building of a base cell:

The figure on the right hand side shows the organization of a single slice in the whole
array. The 1bit slice is also called the basecell. This basecell is basically a single
block shown in figure 9.

The bit-slice has the following cells:
2 Delay cells (sdff)
Parallel-in serial-out register (sdff2_mux)
Adder (1badd)
And
2to1 Mux
2 Latches

These cells are taken from the cell library. All the cells used are static.

The top cell, numbered 1 in the figure, acts as a parallel-in serial-out register. It
generates one bit of the multiplier per cycle.

The cell numbered 2 is a 1bit adder, and the cell numbered 3 is an And gate.

The mux numbered 4 makes sure the outputs of the previous multiplication do not affect
the values of the new multiplication cycle (as described in the previous section).

The cells numbered 5 and 6 are the delay flip-flops.

The cells numbered 7 and 8 are latches; they are used to latch the LSBs and MSBs of the
output. The control signal is generated internally and will be discussed in the
Control Circuitry design section.

This base cell was organized such that the input is given at the top and the output is
latched at the bottom, and it was designed such that, when arrayed, no major routing
is necessary.
Building of an 8bit core multiplier:

Figure 10 shows the core 8bit multiplier block. It can be seen that the
basecell described above has been arrayed 8 times, and that almost no additional
routing was necessary.
The only routing needed was the connection from the adder of the LSB slice to the delay
flip-flop and the carry out of the MSB slice to the delay flip-flop.

Figure 10: 8x8bit Core Multiplier

This layout was extracted from Magic, verified and tested using IRSIM. The control
signals necessary were generated using IRSIM initially. Later the signals were generated
by creating additional circuitry (glue logic).
Step6: Control Circuitry design

In this step, additional logic was developed to generate the necessary control signals. The
control signals generated are the following:
La (latch MSBs/LSBs) (NLa)
Ld (NLd)
En
R1 (signal for the mux shown in figure 9)
OutputSync (named Outla)

The following figure shows the control circuitry block, which generates La (latch
MSBs/LSBs), Ld and R1 (the signal which controls the mux).
La was generated by ANDing nq2, nq1, nq0 and NClk, where nq2, nq1, nq0 are the
outputs of the 3 bit counter. It was generated using a nand4_50 and an inverter (the
rightmost part of the following picture).
Ld/NLd was generated by ANDing nq2, nq1, nq0. It was generated using a nand3_50 and
an inverter (the part after the counter3bsync_vj block in the following picture).
R1 was just delayed using a delay flip-flop (sdff2-50).
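
The decoding of Ld and La from the 3 bit counter can be sketched as Boolean logic. This is a sketch under the assumption that nq2, nq1, nq0 are the complemented counter outputs and NClk is the complemented clock, following the N-prefix naming used elsewhere in this report; gate delays are not modeled.

```python
def control_signals(count, clk):
    """Return (Ld, La) for a 3 bit counter value and a clock level (0 or 1)."""
    q = [(count >> i) & 1 for i in range(3)]
    nq = [1 - bit for bit in q]       # assumed: complemented counter outputs
    nclk = 1 - clk                    # assumed: complemented clock
    ld = nq[2] & nq[1] & nq[0]        # nand3 followed by an inverter
    la = nq[2] & nq[1] & nq[0] & nclk # nand4 followed by an inverter
    return ld, la

for count in range(8):
    print(count, control_signals(count, clk=0), control_signals(count, clk=1))
```

Under these assumptions Ld is high only while the counter reads 0, and La is high only during the low half of the clock in that same state, so both occur once every 8 cycles.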

Figure 11: Control circuitry block

In order to generate the necessary control signals, the 3 bit counter was initially designed.
The following figure shows the construction of the 3 bit synchronous counter.

Figure 12: 3 bit synchronous counter


The enable signal for the 3bit synchronous counter is just the inverted Reset signal
provided by the user.
OutputSync (OutLa) was generated by delaying q2 (second bit output of the 3bit
synchronous counter)

For dynamic testing purposes an internal 8 bit counter was also included in the layout.
The following figure shows the construction of the 8bit asynchronous counter.

Figure 13: 8 bit asynchronous counter
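
The behavior of an asynchronous (ripple) counter can be illustrated with a short model. It is a sketch assuming an up counter in which each stage toggles when the previous stage output falls from 1 to 0; the actual flip-flop wiring of figure 13 may differ, and propagation delay between stages is not modeled.

```python
def ripple_step(bits):
    """Advance a ripple counter by one clock event; bits[0] is the LSB."""
    bits = list(bits)
    for i in range(len(bits)):
        bits[i] ^= 1          # this stage toggles
        if bits[i] == 1:      # output rose, so the next stage is not clocked
            break
    return bits

def value(bits):
    """Convert the LSB-first bit list to an integer."""
    return sum(bit << i for i, bit in enumerate(bits))

state = [0] * 8
for _ in range(5):
    state = ripple_step(state)
print(value(state))
```

Because each stage is clocked by the one before it, the outputs settle one after another, which is why such a counter can glitch when its outputs are decoded directly.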

All the above blocks were tested and verified using IRSIM. The above specified signals
can be seen in the complete testing IRSIM result, shown in figure 15.
Step7: Combining the core multiplier block and Control Circuitry design

In this step, the core multiplier block developed in step 5 and the additional control
circuitry were put together. Figure 14 shows the complete 8 bit
multiplier.

Brief description of the layout:


The following is a brief description of the layout, giving the location of the control
signals.
The two rails (near 1, on the right hand side) are the power rails (Vdd and Gnd).
The three rails (near 2) are Reset and Clk/NClk.
The two rails (near 3) are Insel/NInsel.
The two rails (near 4) are IntExt/NIntExt.
The inputs are given at the top of the block.
The outputs are latched from the bottom. These are available in the 8 bit sliced
format.

Figure 14: 8x8 bit multiplier with the additional control Logic

Only the signals listed above should be provided by the user. The other control signals and
the complementary signals are generated internally.
Step8: Testing and Verification of the whole block

The complete layout shown in figure 14 was tested and verified using IRSIM and
HSPICE. Static and dynamic testing was performed.
For the dynamic testing, the outputs of an internal 8bit counter will be used as
inputs to the multiplier block.
For static testing, the inputs will be provided externally to the input pins shown.

Description of the IRSIM result:

The following figure shows the IRSIM result. The important signals for verification are
the ain, bin, lsbout, msbout, out, out8.

Names used:
ain, bin: the inputs.
out, out8 (lsbout, msbout): the outputs. Out shows the complete 16 bit output and out8
shows the output as two 8 bit slices.
R: the Reset, which is also the Enable.
En: the enable for the 3 bit counter.
Counter3: the output of the 3bit counter.
Counter8: the output of the 8bit counter.
Insel: when external inputs are selected, this signal chooses whether the user-specified
input is loaded as the multiplicand or as the multiplier.
IntExt: specifies whether ain comes from the counter output or from the user-defined inputs.
Outla: used to generate Out8. Also used as the OutputSync.

As can be seen, Out shows the results when the counter outputs are multiplied with
0xff (up to about 550ns). After that, the IntExt signal is turned low, which
makes the multiplier block use the external inputs. Hence the result shows the
multiplication of 0xff and 0xff.

This layout was tested under various conditions and the output obtained was correct.
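
For the dynamic test, the expected values to compare against the simulator output can be generated in a few lines. This is a sketch, not the procedure used in the project: the tuple layout (ain, bin, lsbout, msbout) borrows the signal names above, and treating the counter as ain with bin fixed at 0xff is an assumption.

```python
def expected_dynamic_vectors(fixed_input=0xFF, bits=8):
    """Expected (ain, bin, lsbout, msbout) for every counter value."""
    vectors = []
    for count in range(1 << bits):
        product = count * fixed_input
        vectors.append((count, fixed_input, product & 0xFF, product >> 8))
    return vectors

vectors = expected_dynamic_vectors()
for ain, bin_value, lsbout, msbout in vectors[:4]:
    print(f"{ain:#04x} x {bin_value:#04x} -> lsbout={lsbout:#04x} msbout={msbout:#04x}")
```
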
Figure 15: Irsim Result
Figures 16 and 17 show the HSPICE result of the layout when tested at 3.3 and 5V. On
the top the plots show the counter outputs multiplied with 0xff and the values are correct.
These plots verify the IRSIM results.

Figure 16: HSPICE result using 5V

Figure 17: HSPICE result using 3.3V


HSPICE results showed that the multiplier can be operated at a maximum frequency of
125MHz when 5V is used; when 3.3V is used, the maximum frequency drops to
50MHz.
Result: The layout of an 8x8 bit unsigned multiplier is ready. It has been tested and verified
using Irsim. From HSPICE the values for power and maximum operating speed of the
multiplier were obtained.
The design specifications are met:
A fully functional block is available.
It can be tested in static and dynamic mode.
The throughput of 10 million operations per second is met when the layout is
operated at 5V.

Overall dimensions:
Width x height: 1386 X 1440
Area: 1995840 λ²
Number of channel transistors: 630 p-channel and 630 n-channel transistors

Conclusions: HSPICE and IRSIM results prove that the 8x8 bit unsigned binary
multiplier is working.
Following are some tasks which can still be performed on the layout.
By using a carry-save adder, the multiplier can be made faster.
Some of the routing area shown in figure11 can be reduced by changing the
location of the building blocks.

Observations: The timing of the control signals is crucial. I made the following
observations during this project.
I had to use a synchronous 3 bit counter instead of an asynchronous 3bit counter
to avoid glitches in the control signals.
Initially, when I was trying to generate the Latch signal, I used an And of
R1 (internal reset signal) and Clk. But due to the delay in the actual Magic
implementation, the latch signal generated was longer than necessary; hence the
wrong outputs were latched. I had to take this delay into consideration and
generated the latch signal using the counter outputs.
I had to place buffers (2 inverters) at a couple of places for better driving capability.
Running the multiplier at 3.3V lowers the power, but the maximum speed at which it can be
operated is considerably lower too.
I had to align the blocks in the bit-slice in order to obtain symmetry. Figure 9
shows the final block diagram I came up with, but I initially started out with the
bit-slice shown in Appendix 6. Also, as can be seen, the 2to1 mux shown on the
left hand side of the adder was added later on, when the results obtained were
incorrect.
Appendix:

1. DIP Package(See Reference 4)


In microelectronics, a dual in-line package (DIP), also known as a DIL package, is an
electronic device package with a rectangular housing and two parallel rows of electrical
connecting pins, usually protruding from the longer sides of the package and bent
downward. A DIP is usually referred to as a DIPn, where n is the total number of pins.
For example, a microcircuit package with two rows of seven vertical leads would be a
DIP14.

DIPs may be used for integrated circuits (ICs, "chips"), like microprocessors, or for
arrays of discrete components such as resistors or toggle switches. They can be mounted
on a printed circuit board (PCB) either directly using through-hole technology, or using
inexpensive sockets to allow for easy replacement of the device and to reduce the risk of
overheat damage during soldering.

Details(See Reference 3): 28 pin DIP (Dual Inline Package) www.irf.com/package/pkcic.html

2. Throughput
Throughput can be defined as the number of operations performed per unit time, or
equivalently as the number of outputs obtained per second. For this project,
10 million operations per second implies that a product (output) has to be available
every 100ns.
Multiply is a complex operation, so the throughput is stated in multiply operations per
second rather than in MIPS (million instructions per second).
3. Multiplier cells (See Reference 2)

4. Cells used from the library:


2to1 mux (multiplexer): The 2to1 mux is used to select an output from available
two inputs depending on the control signal.
Sdff2_50 is a delay element.
1badder1_50 is basically a one bit adder. It takes three inputs (a, b, carry-in) and
generates sum and carry-out.
latch_50 is used to obtain the output
and_50
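
The logic function of a one bit adder with inputs a, b and carry-in can be written as follows. This is the standard full adder equation, shown as a sketch; the internal gate structure of the 1badder1_50 library cell is not described in this report.

```python
def full_adder(a, b, carry_in):
    """One bit full adder: returns (sum, carry_out) for inputs of 0 or 1."""
    s = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return s, carry_out

for a in (0, 1):
    for b in (0, 1):
        for carry_in in (0, 1):
            print(a, b, carry_in, full_adder(a, b, carry_in))
```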

5. Tools Used:
Magic
Irsim
Hspice
Xfig
Xview
6. Initial block diagram of a bit slice, shown using a 3x3 bit multiplier:
References:
1. Discussions with Dr Fischer.
2. ELE 447 Notes for the project.
3. International Rectifier website
4. Wikipedia website

Acknowledgements:
I would like to take this opportunity to thank Dr Fischer for all his help throughout the
semester. I would also like to thank Farshid, my friends and colleagues for their input.

Future work:

The complete 8bit multiplier block can be placed in the 28 pin frame and
fabricated.
The following specs can be obtained once the chip is fabricated and tested under
various conditions.
Electrical Characteristics:
Ambient temperature:
Storage Temperature
Max output current sunk
Max output current Sourced

DC characteristics:
Supply voltage
Vdd rise time
Supply Current
Power down current
Input leakage current

Tools required/used for testing:


Logic State Analyzer is used to verify the output.
Clock generator
Pattern generator
Power supply
Multimeter
Oscilloscope
