Representing Numbers On The Computer: What Happened?

Download as pdf or txt
Download as pdf or txt
You are on page 1of 20

Representing Numbers on the Computer

J N=J!
1 1
2 2
Let’s calculate J! 3 6
4 24
Program Maxnumber 5 120
*
* Check what the max numbers are on the computer 6 720
*
Integer N 7 5040
* 8 40320
N=1
Do J=1,20
9 362880
N=N*J 10 3628800
Print *,J,N
Enddo 11 39916800
*
stop 12 479001600
end
13 1932053504
14 1278945280
What happened? 15 2004310016
16 2004189184
17 -288522240
18 -898433024
19 109641728
20 -2102132736
Winter Semester 2006/7 Computational Physics I Lecture 2 1
Representing Integers

E.g., single precision: 4 bytes or 32 bits

1 bit is used for the sign (1 for - 0 for +)


31 bits for value
Because start from 0
Biggest integer 231 -1 = 2 147 483 647= 01111111 11111111 11111111 11111111
Sign Value
Bit
0 1 1 1 1 1 1 1 127
2’s complement is
0 0 0 0 0 0 1 0 2
standard for
0 0 0 0 0 0 0 1 1
integer
0 0 0 0 0 0 0 0 0
representation.
1 1 1 1 1 1 1 1 -1
1 1 1 1 1 1 1 0 -2
8 bit example
1 0 0 0 0 0 0 1 -127
(from Wikipedia)
1 0 0 0 0 0 0 0 -128
Winter Semester 2006/7 Computational Physics I Lecture 2 2
Representing Integers
To calculate the 2’s complement value: N*=2n-N, where n is the
number of bits used to represent an integer. N* is the 2’s
complement representation of the negative of N.

e.g., N10=5 N2=0000 0101 (n=8)


N*2=2n-N2=1 0000 0000 - 0000 0101 = 1111 1011

2’s complement is convenient for computer calculations

There is no rounding error - only a maximum allowed range for


the values.

For the mathematicians: 2n possible values of n bits form a ring of


equivalence classes

Winter Semester 2006/7 Computational Physics I Lecture 2 3


Representing Integers
Or, invert the bits and add 1.

E.g., 5 = 0000 0101

To convert to -5, flip the bits  1111 1010

Then add 1  1111 1011

The other way, to go from -5 to 5,


flip the bits  0000 0100
And add 1  0000 0101

Winter Semester 2006/7 Computational Physics I Lecture 2 4


Representing Real Numbers

Representation of real numbers (scientific notation - IEEE754):

Mantissa and Exponent + sign bit. E.g., single precision (4 bytes)

1 + 8 + 23 = 32 bits
(sign) (exponent) (mantissa)

Double precision
1 + 11 + 52 = 64 bits
(sign) (exponent) (mantissa)
x = (  ) iai2 b  E
s

s is sign bit
a is normalized so first bit is 1 (radix point - implicit)
E = 1/2 of (maximum exponent -1), or
E=01111111 in single precision
Winter Semester 2006/7 Computational Physics I Lecture 2 5
Representing Real Numbers

Example: 4/7 on the computer:


4
= 1.0010010012 1 2
7 10
4 100 1000 1 1 1
= =0+ 2 = 0 + 1 2 +  2 1 =
7 10 111 2 111 111
1000 4
= 0 + 1 2 1 + 0  2 2 + 0  2 3 +  2 = 0.1001001001 = 1.0010010012 1
111
s=0
a = 001001001001001001 (first 1 is implicit)
b-E=-1 or, b2  0111111= -1|10 , b2 = 01111110

Winter Semester 2006/7 Computational Physics I Lecture 2 6


Representing Real Numbers
If the exponent b=11111111, the number has a special value:

• if a=00000000000000000000000, value is ±  depending on s


• else, value is NaN (not a number)

If b=00000000
• x=±0.a•2-126

Otherwise x=±1.a•2b-127 (single precision)


Precision # bits Relative Max magnitude Min
precision magnitude
a b (normalize
d)
single 23 8 2-2310-7 2(255-127)1038 10-38

double 52 11 2-5210-16 2(2047-1023)10308 10-308

Winter Semester 2006/7 Computational Physics I Lecture 2 7


Calculation of 

As an example, consider the calculation of  using the following


algorithm (due to Madhava of Sangamagrama, Indian Mathematician of the 14th
century)
I single precision double precision

1
 = 12  (1) i 1 3.079201459885 3.079201435678

( 2i + 1) 3i
2 3.156181335449 3.156181471570
i=0 3 3.137852907181 3.137852891596
4 3.142604827881 3.142604745663
5 3.141308784485 3.141308785463
6 3.141674280167 3.141674312699
7 3.141568660736 3.141568715942
8 3.141599655151 3.141599773812
9 3.141590356827 3.141590510938
10 3.141593217850 3.141593304503
11 3.141592502594 3.141592454288
12 3.141592741013 3.141592715020
First 16 digits of correct value 13 3.141592741013 3.141592634547

3.14159 26535 89793 20 3.141592741013 3.141592653596


Error  

After 20 iterations, single precision good to 1 10-7 Double


precision to 10-11
Winter Semester 2006/7 Computational Physics I Lecture 2 8
Calculation of 

Close to 10-16

Winter Semester 2006/7 Computational Physics I Lecture 2 9


Calculation of 
Dear folks, 20th October 2005

Our latest record which was announced already at press release time of 6-th of
December, 2002 was as the followings;

Declared record: http://www.super-computing.org/pi_current.html

1,030,700,000,000 hexadecimal digits


1,241,100,000,000 decimal digits

Two independent hexadecimal calculation based on two different algorithms


generated more than 1,030,775,430,000 hexadecimal digits of pi and comparison
of two generated sequences matched completely. Computed hexadecimal digits of
pi were radix converted into base 10, generating more than 1,241,177,300,000
decimal digits of pi and generated decimal digits of pi were radix converted
again into base 16. Radix converted hexadecimal digits of pi were compared with
original hexadecimal digits of pi. There were no difference up to
1,241,100,000,000 decimal digits. Then we are declaring 1,030,700,000,000
hexadecimal digits and 1,241,100,000,000 decimal digits as the new world
records. Details of computed results are available on the following URL's.

http://www.super-computing.org/pi-hexa_current.html (hexadecimal)
http://www.super-computing.org/pi-decimal_current.html (decimal)

Winter Semester 2006/7 Computational Physics I Lecture 2 10


Rounding Errors for Simple Sum

In contrast to integers, there are rounding errors for real


numbers.The error resulting from adding two numbers:
y = x1 + x2
y = rd [ rd(x1 ) + rd(x2 )] where rd() means computer rounding
y  [ x1 (1 +  ) + x2 (1 +  )] (1 +  ) where  is the typical relative error
  2 t where t is the number of bits assigned to the mantissa
single precision,  = 2 23  10 7 double precision,  = 2 52  10 16
y  x1 + x2 +  (x1 + x2 ) + x11 + x2 2

y  y x1 x2
+ 1 + 2
y x1 + x2 x1 + x2

Can get large multiplication of relative error if x1-x2


Winter Semester 2006/7 Computational Physics I Lecture 2 11
Error Propagation

More generally (see Lecture Notes from Scherer):



Input data x = (x1 ,, xn )

Output data y = (y1 ,, ym )
where
 
y =  ( x) =  (r ) (r 1)  (1) and the  are simple functions
Define
 (1) 
x1 =  ( x)
 (i ) 
xi =  ( xi 1 )
 
y =  (r ) ( xr 1 )

Treat all errors as small, represent with x

Winter Semester 2006/7 Computational Physics I Lecture 2 12


Error Propagation

First step:
    
( )
x1 = rd( (1) ( x + x))   (1) ( x) + D (1) x (1 + E1 ) First order in errors
where
 x11 x11
 x 
xn

 x1i  1

D (1) =
=   

x
 j 

 x1n1 
x1n1

 x x

1 n

and
 1(1)
   (1)  (1) 
E1 =  
x1 = x1  x1  D x +  ( x)E1

  n1

(1)

Winter Semester 2006/7 Computational Physics I Lecture 2 13


Error Propagation

  (1)  (2)  (r ) 
y  yEr + D  x + D  x1E1 +  + D xr 1Er 1
(r ) (r )

 y1 y1

 x xn

 1

D = D (r )  (1) =   

 y ym

 m 

 x1 xn

The first term is the inevitable rounding error


The second term contains the propagation of the input errors and
initial rounding errors.
The other terms depend on how the algorithm is set up.
Winter Semester 2006/7 Computational Physics I Lecture 2 14
Error Propagation
Let’s look at the individual terms:


yEr i  yi  The rounding error on the final answer

 yi
D  x i  
(r ) (1)
x j Propagation of input errors
j x j

The other terms depend on the specific algorithm. The goal is for
the algorithm to not give errors larger than the first two
(unavoidable) errors.

Winter Semester 2006/7 Computational Physics I Lecture 2 15


Error Propagation

Let us look at an example in detail - the calculation of a2-b2


Procedure I:
1. Calculate a2 and b2
2. Calculate their difference

  a   x1 
2

x=  x1 =  2  y = x11  x12
 b  x2 

Unavoidable error:
y (a 2  b 2 ) (a 2  b 2 )
y  = a b 
2 2
 x j = +  = 2 ( a + b )
j x j a b
y(0) = a 2  b 2  + 2 ( a + b ) 

Winter Semester 2006/7 Computational Physics I Lecture 2 16


Error Propagation

Let us look at an example in detail - the calculation of a2-b2


Procedure I:
1. Calculate a2 and b2
2. Calculate their difference

  a   x1 
2

x=  x1 =  2  y = x11  x12
 b  x2 
Error magnitude estimation:

  a(1 +  )   a(1 +  )a(1 +  )(1 +  )  a 2


(1 + 2 a + 11 )
x= x1 =   2
a a a 11

 b(1 +  b )  b(1 +  b )b(1 +  b )(1 + 12 )  b (1 + 2 b + 12 )




y =
a 2 (1 + 2 a + 11 )  b 2 (1 + 2 b + 12 )  (1 +  2 )
y  a 2  b 2  + 3(a 2 + b 2 )

Winter Semester 2006/7 Computational Physics I Lecture 2 17


Error Propagation

Procedure II:
1. Calculate a-b and a+b
2. Calculate their product

  a   x1  x2  
x=  x1 =  y = x11 ix12
 b  x1 + x2 
Error magnitude estimation:

  a(1 +  a )   ( a(1 +  a )  b(1 +  b )) (1 + 11 )  (a  b)(1 + 11 ) + a a  b b


x= x1 =  
 b(1 +  b )  ( a(1 +  a ) + b(1 +  b )) (1 + 12 )  (a + b)(1 + 12 ) + a a + b b

y =
(a 2  b 2 )(1 + 11 + 12 ) + 2a 2 a  2b 2 b  (1 +  2 )
y  3 a 2  b 2  + 2(a 2 + b 2 )

Winter Semester 2006/7 Computational Physics I Lecture 2 18


Error Propagation

Single precision

a b Exact value (a2-b2) a2-b2 (a-b)(a+b)


1.0 0.999 1.999 10-3 1.99896 10-3 1.99897 10-3

1.0 0.9999 1.9999 10-4 2.00033 10-4 2.00023 10-4

Winter Semester 2006/7 Computational Physics I Lecture 2 19


Exercises 2
1. Look up a different algorithm to calculate  from the one presented in the lecture
and code it in single and double precision. Compare the speed of convergence to
the one shown in class.

2. Calculate (a4-b4) numerically in single and double precision. Compare the resulting
accuracy to the true value for test cases. Compare to the expected precision for
single and double precision calculations.

Winter Semester 2006/7 Computational Physics I Lecture 2 20

You might also like