Decimal Floating Point Arithmetic Unit Design


Priyankar Talukdar (MS2011009), Abhinav Pundir (MT2010004), Rajanikanth Kashi (PH2011004)


Introduction:
Transaction processing is one of the major uses of computers. These transactions typically involve many decimal multiplications, such as multiplying a duration by the cost per minute or a charge by the tax rate, and the results must be rounded to a fixed number of decimal places. Such calculations cannot be carried out exactly in binary floating point, because decimal fractions such as 0.1 have no exact binary representation. Rounding error therefore becomes a bottleneck in the design of financial systems.

A round-off error, also called rounding error, is the difference between the calculated approximation of a number and its exact mathematical value. Numerical analysis specifically tries to estimate this error when using approximation equations and/or algorithms, especially when finitely many digits are used to represent real numbers (which in general have infinitely many digits). This is a form of quantization error. When a sequence of calculations subject to rounding error is made, the errors may accumulate; in ill-conditioned cases they can dominate the calculation and make the result meaningless. Financial transactions bear the brunt of such rounding errors, which translate into losses for financial institutions.
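As a small illustration of the point above, the following Python sketch adds ten charges of 0.1 in binary floating point and in decimal arithmetic. It uses Python's standard `decimal` module as a software stand-in for decimal hardware; the variable names are only illustrative.

```python
from decimal import Decimal

# Ten charges of 0.1 each, summed in binary double precision.
binary_sum = sum([0.1] * 10)

# The same sum carried out in decimal arithmetic.
decimal_sum = sum([Decimal("0.1")] * 10)

# 0.1 has no exact binary representation, so the binary sum is not exactly 1.
print(binary_sum == 1.0)               # False
print(decimal_sum == Decimal("1.0"))   # True

# Decimal(0.1) exposes the binary value that is actually stored for 0.1.
print(Decimal(0.1))
```

The decimal sum is exact because every operand and every intermediate result is representable in base 10.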

Notation    Representation                     Approximation   Error
1/7         0.142 857 (repeating)              0.142 857       0.000 000 142 857 (repeating)
ln 2        0.693 147 180 559 945 309 41...    0.693 147       0.000 000 180 559 945 309 41...
log10 2     0.301 029 995 663 981 195 21...    0.3010          0.000 029 995 663 981 195 21...
2^(1/3)     1.259 921 049 894 873 164 76...    1.25992         0.000 001 049 894 873 164 76...

Decimal Floating Point (Using Decimal Format Encoding)


Decimal floating point arithmetic refers to both a representation of decimal floating point numbers and the operations on them. Working directly with decimal (base 10) fractions avoids the rounding errors that otherwise typically occur when converting between decimal fractions (common in human-entered data, such as measurements or financial information) and binary (base 2) fractions. Unlike in the binary interchange formats, the sign, exponent, and (trailing) significand fields are not fully separated: to preserve as much accuracy as possible, some information on the significand is encoded in what used to be the exponent field, which is hence called the combination field.

BCD encoding is not very efficient: it uses only 10 of the 16 codes available per 4-bit digit, or 62.5% of the encoding space. It does, however, allow quick conversion from a database and is well suited to shifting, scaling, and extracting fields of data. Binary integer encoding uses 100% of the encoding space and gives fast execution of high-order arithmetic operations, but it is slow at reading and writing data from databases and at performing simple operations and rounding.

Because both encodings have disadvantages, another encoding was desired: a compressed BCD format called densely packed decimal (DPD). Three BCD digits, which would normally require 12 bits, are compressed to 10 bits in what is called a declet. This uses 1,000 of the 1,024 possible bit patterns, an efficiency of about 97.7%. It also has the advantage of requiring only three logic gate delays to convert from BCD to DPD and from DPD to BCD. The DPD format has the same advantages as the BCD format, with the additional benefit of being more compact, so that more digits can be represented in a given data width.

Architecture for DPD encoding design:

Field   Contents                           Width (decimal32)
S       Sign bit                           1 bit
G       Combination field (w+5 bits)       11 bits
T       Trailing significand (2 declets)   20 bits

Description of the combination field mapping (w+5 scheme):

The trailing significand digits are stored in multiples of 10 bits, one declet for every three digits. The most significant digit of the significand is stored together with the exponent in a combination field of w+5 bits. Since the most significant digit can be either 0-7 or 8-9, a different encoding scheme is used for each of the two cases. It is important to realize that the combination field contains the exponent as well as the most significant digit, which differs from the binary interchange formats (IEEE 754-1985 standard), where the exponent has a field of its own.
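The declet compression described in this section can be sketched in software. The Python sketch below follows the published DPD coding table as we understand it; it is a behavioural model, not the gate-level encoder and decoder of this project, and the function names `bcd_to_dpd` and `dpd_to_bcd` are hypothetical.

```python
def bcd_to_dpd(d2, d1, d0):
    """Pack three decimal digits (hundreds, tens, units) into a 10-bit declet."""
    for d in (d2, d1, d0):
        if not 0 <= d <= 9:
            raise ValueError("digits must be in 0..9")
    big = (d2 >> 3, d1 >> 3, d0 >> 3)      # 1 where the digit is 8 or 9
    x2, x1, x0 = d2 & 7, d1 & 7, d0 & 7    # low three bits of each digit
    l2, l1, l0 = d2 & 1, d1 & 1, d0 & 1    # lowest bit of each digit
    if big == (0, 0, 0):
        hi, mid, v, lo = x2, x1, 0, x0
    elif big == (0, 0, 1):
        hi, mid, v, lo = x2, x1, 1, 0b000 | l0
    elif big == (0, 1, 0):
        hi, mid, v, lo = x2, (x0 & 6) | l1, 1, 0b010 | l0
    elif big == (1, 0, 0):
        hi, mid, v, lo = (x0 & 6) | l2, x1, 1, 0b100 | l0
    elif big == (1, 1, 0):
        hi, mid, v, lo = (x0 & 6) | l2, l1, 1, 0b110 | l0
    elif big == (1, 0, 1):
        hi, mid, v, lo = (x1 & 6) | l2, 0b010 | l1, 1, 0b110 | l0
    elif big == (0, 1, 1):
        hi, mid, v, lo = x2, 0b100 | l1, 1, 0b110 | l0
    else:  # all three digits are 8 or 9
        hi, mid, v, lo = l2, 0b110 | l1, 1, 0b110 | l0
    return (hi << 7) | (mid << 4) | (v << 3) | lo


def dpd_to_bcd(declet):
    """Unpack a 10-bit declet into three decimal digits (hundreds, tens, units)."""
    hi = (declet >> 7) & 7
    mid = (declet >> 4) & 7
    v = (declet >> 3) & 1
    lo = declet & 7
    m, h = lo & 1, mid & 1
    if v == 0:                    # all three digits are in 0..7
        return (hi, mid, lo)
    sel = lo >> 1
    if sel == 0b00:               # only the units digit is 8 or 9
        return (hi, mid, 8 | m)
    if sel == 0b01:               # only the tens digit is 8 or 9
        return (hi, 8 | h, (mid & 6) | m)
    if sel == 0b10:               # only the hundreds digit is 8 or 9
        return (8 | (hi & 1), mid, (hi & 6) | m)
    sel2 = mid >> 1               # two or three digits are 8 or 9
    if sel2 == 0b00:
        return (8 | (hi & 1), 8 | h, (hi & 6) | m)
    if sel2 == 0b01:
        return (8 | (hi & 1), (hi & 6) | h, 8 | m)
    if sel2 == 0b10:
        return (hi, 8 | h, 8 | m)
    return (8 | (hi & 1), 8 | h, 8 | m)


print(format(bcd_to_dpd(9, 9, 9), "010b"))   # 0011111111
print(dpd_to_bcd(0b0011111111))              # (9, 9, 9)
```

Every branch only moves bits and sets constants, which is consistent with the short conversion delay quoted for DPD. For digit triples 000 to 079 the declet has the same bit pattern as the BCD code.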

IEEE-754-2008 Standard (encoder decoder for DPD)


Format       Storage size (bits)   Trailing bits = 10*J   Combination bits = w+5   Emax = 3*2^(w-1)   Bias = Emax+p-2
decimal32    32                    20                     32-20-1 = 11             96                 101
decimal64    64                    50                     64-50-1 = 13             384                398
decimal128   128                   110                    128-110-1 = 17           6144               6176
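The table entries can be reproduced from the storage width alone. The sketch below assumes the trailing-field width formula t = 15k/16 - 10 from the IEEE 754-2008 interchange format parameters (this formula is not stated in the text above); the remaining quantities follow the column headings of the table. The function name `decimal_format_params` is hypothetical.

```python
def decimal_format_params(k):
    """Return layout parameters of the k-bit decimal interchange format."""
    if k < 32 or k % 32 != 0:
        raise ValueError("k must be a positive multiple of 32")
    t = 15 * k // 16 - 10        # trailing significand bits (assumed formula), 10 * J
    J = t // 10                  # number of declets
    p = 3 * J + 1                # precision: 3 digits per declet plus the leading digit
    comb = k - t - 1             # combination field width, w + 5
    w = comb - 5
    emax = 3 * 2 ** (w - 1)
    bias = emax + p - 2
    return {"trailing": t, "declets": J, "precision": p,
            "combination": comb, "emax": emax, "bias": bias}


for k in (32, 64, 128):
    print(k, decimal_format_params(k))
```

For k = 32, 64, and 128 this yields trailing widths 20, 50, and 110 and biases 101, 398, and 6176, matching the table.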

If the decimal encoding is used for the significand, then the least significant w bits of the biased exponent E are made up of the bits G_5 to G_{w+4} of G, whereas the most significant two bits of E and the most significant digit C_0 of the significand C are obtained as follows:

If the five most significant bits G_0 G_1 G_2 G_3 G_4 of G are of the form 110xx or 1110x, then the leading significand digit C_0 is 8 + G_4 (which equals 8 or 9), and the leading biased exponent bits are G_2 G_3.

If the five most significant bits of G are of the form 0xxxx or 10xxx, then the leading significand digit C_0 is 4 G_2 + 2 G_3 + G_4 (which is between 0 and 7), and the leading biased exponent bits are G_0 G_1.

The p - 1 = 3J decimal digits C_1, ..., C_{p-1} of C are encoded by T, which contains J declets encoded in densely packed decimal. Note that if the five most significant bits of G are 00000, 01000, or 10000, and T = 0, then the significand is 0 and the represented number is (-1)^S x (+0).
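The two cases above can be written as a short decoding routine. The Python sketch below is an illustration under the rules stated in this section; the function name `decode_combination` is hypothetical, and the patterns 1111x, which IEEE 754-2008 reserves for infinity and NaN, are rejected because they are not covered here.

```python
def decode_combination(G, w):
    """Split a (w+5)-bit combination field into (leading digit C_0, biased exponent E)."""
    top5 = G >> w                    # bits G_0 .. G_4
    low = G & ((1 << w) - 1)         # bits G_5 .. G_{w+4}
    g = [(top5 >> (4 - i)) & 1 for i in range(5)]
    if g[0] == 1 and g[1] == 1:
        if g[2] == 1 and g[3] == 1:
            raise ValueError("1111x encodes infinity or NaN, not handled in this sketch")
        c0 = 8 + g[4]                        # forms 110xx and 1110x
        e_hi = (g[2] << 1) | g[3]
    else:
        c0 = 4 * g[2] + 2 * g[3] + g[4]      # forms 0xxxx and 10xxx
        e_hi = (g[0] << 1) | g[1]
    return c0, (e_hi << w) | low


# decimal32 (w = 6): combination field of the bit pattern 0x22500001
word = 0x22500001
G = (word >> 20) & 0x7FF
print(decode_combination(G, 6))   # (0, 101)
```

With the decimal32 bias of 101, a biased exponent of 101 corresponds to an exponent of 0, so the word in the example carries a leading digit 0 and a scale factor of 10^0.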

A Case Study on IBM Z10 Architecture, Decimal Floating Point

Fig: Data-Flow for IBM z10 processor.

The new IBM System z10 processor and the IBM POWER6 processor, part of the IBM System p 570 server, support DFP natively in hardware. The first hardware implementation of the IEEE 754-2008 floating-point standard using a decimal floating-point unit (DFU) was on the POWER6 processor. The z10 DFU is based on this design, and both were implemented by essentially the same team. The DFU dataflow can be followed in the diagram above. The following discussion is based on the decimal64 implementation on the z10 architecture.

At the top of the 144-bit-wide significand dataflow (36 BCD digits of 4 bits each) is the dumult macro, otherwise known as the multiple creator macro, which creates two times and five times the multiplicand. Just below it is the durot macro, otherwise known as the rotator, which is used for shifting the significand left or right and has a built-in mask used to zero out digits, depending on the operation. Next, there are a few small macros, including leading-zero-detect macros for each operand, dulzd_a and dulzd_b, and an exponent difference macro, duxdif. The du10to3 macro is used to expand DFP data: the significand arrives in densely packed decimal (DPD) encoding and is expanded by the du10to3 macro to BCD encoding.

In the top to middle of the stack are the operand A and B registers, duareg and dubreg. The adder, duaddr, is located next to the operand registers to reduce wire length in this critical timing path. The adder can be separated into two 18-digit adders or combined into one 36-digit adder that has a latency of two cycles but a throughput of one add per cycle. Several multiplexers and fixed shifters are contained in the dumisc macro. A BCD result register is contained in the duwreg macro. The BCD data is compressed back to DPD format in the du3x10 macro and placed in the C register, ducreg, to be sent to the FPR. There is also a macro for converting BCD to binary, ducvb, and the reverse, ducvd, and a macro for detecting the number of leading zeros in the result, dulzdw.
These macros make up the dataflow of the significand; parity is used to check the interfaces with other units, and residue-3 checking is used to protect the significand dataflow from transient failures.

Below the significand stack is most of the exponent computation logic, in the duxres and duxres2 macros. Note that there are latch buffers above and below these macros to stage delayed signals to duplicate copies of the macros (duxres_ras and duxres2_ras) so that checking is not timing critical.

The stack on the right-hand side is mostly for controls and consists primarily of random logic macros. The multiplication and division controls are in ductlm, the addition controls are in ductla, and miscellaneous instruction controls are in ductlx. The ductlg macro is used to perform decodes of the instruction text and also contains global controls. It is duplicated for reliability (ductlg_ras). The ductls1, ductls2, and ductls3 macros perform miscellaneous operations, such as handling special results (e.g., not-a-number) and implementing a common rounding routine used by all arithmetic operations.

The ductls0 macro performs much of the RAS (reliability, availability, and serviceability) checking and reporting. Mixed in with the control macros are duxabcq, which holds the input exponents, and duxaln, which creates a shift amount for alignment of the significands. In dupstbla there is a lookup table for division, used to prescale the operands by an approximation to the reciprocal of the divisor.
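The idea behind the residue-3 checking of the significand dataflow mentioned above can be illustrated in software. The sketch below is a behavioural model of a digit-serial BCD adder with a modulo-3 check, not the z10 circuit; all function names are hypothetical. Because 10 is congruent to 1 modulo 3, the residue of a decimal number equals the sum of its digits modulo 3.

```python
def to_digits(n, width):
    """Return the decimal digits of n, least significant first, padded to width."""
    return [(n // 10 ** i) % 10 for i in range(width)]


def bcd_add(a, b):
    """Add two equal-length digit lists; return (sum digits, carry out)."""
    carry, out = 0, []
    for da, db in zip(a, b):
        s = da + db + carry
        carry = 1 if s > 9 else 0
        out.append(s - 10 * carry)    # decimal correction of the digit sum
    return out, carry


def residue3(digits):
    """Residue modulo 3 of the number held in a digit list."""
    return sum(digits) % 3


def residue_ok(a, b, total, carry):
    """Check that the residue of the result matches the residues of the operands."""
    expected = (residue3(a) + residue3(b)) % 3
    actual = (residue3(total) + carry) % 3    # the carry out has weight 10^n, residue 1
    return expected == actual


a, b = to_digits(4999, 6), to_digits(5001, 6)
total, carry = bcd_add(a, b)
print(total, carry, residue_ok(a, b, total, carry))

corrupted = total.copy()
corrupted[0] = (corrupted[0] + 1) % 10        # model a transient single-digit fault
print(residue_ok(a, b, corrupted, carry))
```

A check of this kind detects any error that changes the result by an amount that is not a multiple of 3; errors that are multiples of 3 pass unnoticed, which is the usual trade-off of a small residue code.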

Implementation
The block-level structure of a decimal floating point design has been conceptually realized. The blocks of the design layout are taken as a case study from the POWER6 and z10 architectures, where decimal floating point exists in hardware. To date we have realized the framework for such a design, and we have designed a 32-bit encoder and decoder for BCD to DPD conversion. The scope of the design is consistent with the IEEE 754-2008 standard, as described in the IEEE 754-2008 section above. We are still exploring possibilities while designing one block at a time, aiming for minimum wire length on the critical data path. As in the IBM z10 architecture, we intend to place the timing-critical modules as close together as possible in the realization; in simulation, however, this has barely been realized as yet. The two-way design, an encoder (BCD to DPD) and a decoder (DPD to BCD), has been realized on a Spartan-3 FPGA hardware platform.

Future Work
A lot remains to be achieved now that the scope of the design is clearly laid out. The following tasks would be taken up in future work on this topic: DFP addition, DFP multiplication, and DFP division. Pipelining of these arithmetic functional blocks would also be very important in the analysis and design of faster arithmetic blocks. For all of the designs above, the IBM architecture is the model of reference. Power analysis and timing analysis of the blocks would also be taken up for investigation.

References
1. IEEE Standard for Floating-Point Arithmetic, IEEE 754-2008.
2. Behrooz Parhami, Computer Arithmetic: Algorithms and Hardware Designs.
3. Jean-Michel Muller, Nicolas Brisebarre, Florent de Dinechin, Claude-Pierre Jeannerod, Vincent Lefèvre, Guillaume Melquiond, Nathalie Revol, Damien Stehlé, Serge Torres, Handbook of Floating-Point Arithmetic.
4. IIT Madras, NPTEL Lectures, Electronics Design and Automation.
5. Spartan-3E Starter Kit Board User Guide.
6. PlanAhead Software Tutorial: I/O Pin Planning.
7. E. M. Schwarz, J. S. Kapernick, M. F. Cowlishaw, Decimal floating-point support on the IBM System z10 processor.
8. A. Y. Duale, M. H. Decker, H.-G. Zipperer, M. Aharoni, Decimal floating-point in z9: An implementation and testing perspective.
9. L. Eisen, J. W. Ward III, H.-W. Tast, N. Mäding, J. Leenstra, S. M. Mueller, C. Jacobi, J. Preiss, E. M. Schwarz, S. R. Carlough, IBM POWER6 accelerators: VMX and DFU.
10. H. Q. Le, W. J. Starke, J. S. Fields, F. P. O'Connell, D. Q. Nguyen, B. J. Ronchetti, W. M. Sauer, E. M. Schwarz, M. T. Vaden, IBM POWER6 microarchitecture.
