Questions tagged [floating-point]
Approximate representation of real numbers using a fixed number of significant digits scaled by an integer exponent of a fixed base.
244 questions
1
vote
1
answer
39
views
Does adding two positive floating point numbers ever result in a smaller number?
When dealing with floating point numbers, is there ever a time when adding two non-negative, finite, non-NaN numbers produces a result that is less than the greater of the two? ...
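This one admits a quick empirical check. The sketch below (plain Python doubles, i.e. IEEE 754 binary64 under the default round-to-nearest; a probe, not a proof) searches pairs of non-negative finite values:

```python
import sys

# Under round-to-nearest the exact sum a + b is >= max(a, b), and
# max(a, b) is itself representable, so rounding cannot drop below it.
samples = [0.0, 5e-324, 1e-300, 0.1, 1.0, 1e16, sys.float_info.max / 2]
counterexamples = [(a, b) for a in samples for b in samples
                   if a + b < max(a, b)]
```

No counterexample turns up, consistent with the rounding argument in the comment.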
0
votes
0
answers
37
views
Convert float to integer representation
It's related to my previous question.
I want to map an arbitrary float space into an integer representation.
I have defined the transformation, where $x'$ is the integer representation and $x$ is the real value like <...
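The transformation in the question is elided, but one standard float-to-integer mapping (a sketch assuming IEEE 754 single precision; the function names are illustrative) simply reinterprets the bit pattern:

```python
import struct

def float_to_bits(x: float) -> int:
    """Reinterpret an IEEE 754 binary32 value as its 32-bit pattern."""
    return struct.unpack('<I', struct.pack('<f', x))[0]

def bits_to_float(b: int) -> float:
    """Inverse mapping: a 32-bit pattern back to the float it encodes."""
    return struct.unpack('<f', struct.pack('<I', b))[0]
```

For non-negative floats this mapping is order-preserving, which is often the property such a transformation needs.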
2
votes
1
answer
62
views
Is there such data type for low range float?
For integers, we have unsigned integers to represent positive integers, including zero, and we have signed integers to represent negative and positive. There are always trade-offs between them.
For ...
0
votes
0
answers
30
views
What is the purpose of rounding bit in floating point numbers?
Let's only consider single precision ieee-754 floating point numbers. I understand how to convert decimal floating-point number to its ieee-754 representation (Well... almost). My problem is with ...
0
votes
0
answers
29
views
Exact computation with fractional powers as integer bounds
Context: I am implementing the prime-counting function $\pi(x)$ with the Meissel-Lehmer algorithm, extended by Lagarias, Miller, and Odlyzko. The notation comes from an overview by Oliveira e Silva. $\...
2
votes
2
answers
62
views
Diverging floating point calculation
I remember from computer science classes an example where the same floating-point calculation would diverge to infinity. From memory, it was something like this:
...
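The half-remembered example may have been Muller's recurrence (an assumption; the asker's original could differ): with these starting values the exact sequence converges to 6, but 100 is the attracting fixed point, so any rounding error eventually drags the computed sequence there in every fixed precision:

```python
def muller(n: int) -> float:
    """Muller's recurrence: exactly it tends to 6; in doubles it tends to 100."""
    u_prev, u = 2.0, -4.0
    for _ in range(n):
        u_prev, u = u, 111.0 - 1130.0 / u + 3000.0 / (u * u_prev)
    return u
```

Running it for a few dozen iterations shows the value first approaching 6 and then snapping to 100.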
5
votes
2
answers
1k
views
Big Transition of Binary Counting in perspective of IEEE754 floating point
If I iterate over binary representations from 000...000 to 111...111, is there a significant transition at some point?
In the ...
-2
votes
8
answers
1k
views
Why do computers use binary numbers in IEEE754 fraction instead of BCD or DPD?
I asked a new question because it more accurately reflects what I meant to ask: why don't decimal floating-point numbers have CPU-level support the way binary floating-point numbers do in typical computers?
I will ...
-1
votes
1
answer
43
views
Why is it returning TRUE in the first case and FALSE in the second?
I understand 0.3 does not have an accurate binary representation.
Suppose I run the following code:
Why is the answer "True" in the first case and "False" in the second? Shouldn't ...
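The code is elided, but one common pair that produces exactly this True/False split (an assumed reconstruction, not the asker's actual snippet) is:

```python
# All of 0.1, 0.2 and 0.3 carry representation error; the errors just
# happen to cancel in the first sum and not in the second.
first = (0.1 + 0.1 == 0.2)   # True: the rounded sum is the double nearest 0.2
second = (0.1 + 0.2 == 0.3)  # False: the rounded sum is one ulp above it
```

So the outcome is not about 0.3 alone but about whether the accumulated rounding of the sum lands on the same double as the literal.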
1
vote
0
answers
50
views
What is the largest number of bits ever used in arbitrary/multiple precision floating point arithmetic?
I've been exploring the evolution of floating-point arithmetic formats from single to octuple precision. Here's what I THINK I have learned about the key specifications and capabilities for each ...
0
votes
1
answer
102
views
How reliable is a floating point operation (how often does it make mistakes)?
While computers are very reliable, they can also make errors because of noise. I would like to have an idea of the rough order of magnitude of error per floating point operation in a computer, in a ...
3
votes
4
answers
144
views
Are float pseudo-random number generators always implemented using integer generators underneath?
In C it's well known to use a simple routine for turning an integer RNG into a float RNG. Something like this:
...
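The elided routine is presumably the usual bits-to-unit-interval scaling; a Python sketch of the same idea:

```python
import random

def uniform01() -> float:
    # 53 random bits (one per double significand bit) scaled by 2**-53
    # gives a uniform value in [0, 1); every output is exactly representable.
    return random.getrandbits(53) * 2.0 ** -53
```

This is essentially what many library float generators do internally: draw integer bits, then multiply by a power of two.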
0
votes
1
answer
129
views
A floating-point rounding problem
I run the Python code below. x and y differ in their 4th and 5th number, and x has larger ...
0
votes
2
answers
106
views
How is a signed floating-point adder implemented?
The following picture is a block diagram of an arithmetic unit dedicated to IEEE 754 floating-point addition from Computer Organization and Design RISC-V Edition: The Hardware Software Interface 2nd ...
0
votes
1
answer
53
views
Absolute difference between the largest IEEE 754 number and its predecessor
In single precision format, the largest possible positive number is
$A = 0 ~~~ 11111110 ~~~ 111\ldots 111$
Its predecessor is
$B = 0 ~~~ 11111110 ~~~ 111 \ldots 110$
But what is the absolute ...
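Worked out directly: the two values differ by one ulp at exponent 127, i.e. $2^{127-23} = 2^{104}$. A small check of that arithmetic (decoding the bit patterns with `struct`):

```python
import struct

def f32(bits: int) -> float:
    """Decode a 32-bit pattern as an IEEE 754 binary32 value."""
    return struct.unpack('<f', struct.pack('<I', bits))[0]

A = f32(0b0_11111110_11111111111111111111111)  # largest finite binary32
B = f32(0b0_11111110_11111111111111111111110)  # its predecessor
gap = A - B   # one ulp at exponent 127: 2**(127 - 23) = 2**104
```

The conversion to Python doubles is exact (every binary32 value is a binary64 value), so the subtraction is exact too.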
2
votes
1
answer
131
views
Comparison of different algorithms for summing floating point numbers
I am exploring several approaches to summing floating point values, such as:
Naive summation, for comparison
Summing sorted values
Summing with NumPy, again for comparison
Kahan's algorithm
Pairwise ...
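For reference, a minimal version of the Kahan entry in that list (a sketch; the asker's own implementations are not shown):

```python
def kahan_sum(values):
    """Compensated (Kahan) summation: carry each step's rounding error forward."""
    total = 0.0
    c = 0.0                    # compensation for lost low-order bits
    for x in values:
        y = x - c              # apply the stored correction
        t = total + y
        c = (t - total) - y    # the part of y that was rounded away
        total = t
    return total

data = [1.0] + [1e-16] * 100_000   # naive summation never sees the small terms
```

Here the built-in `sum(data)` stays at 1.0 because each tiny addend is rounded away, while `kahan_sum(data)` recovers roughly 1.00000000001.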
1
vote
2
answers
520
views
Convert a rational number to a floating-point number exactly
We have two integers, $n$ and $d$. They are coprime (the only positive integer that is a divisor of both of them is $1$). They may be implemented as something that fits in a machine register, or they ...
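In Python this exact conversion already exists and can serve as a reference even if the target is another language: `Fraction.__float__` divides the exact integers, and CPython's integer true division is correctly rounded, so the result is the nearest double:

```python
from fractions import Fraction

# n/d -> nearest double, computed exactly via arbitrary-precision integers.
x = float(Fraction(1, 3))
y = float(Fraction(2**60 + 1, 2**61))
```

Both routes below agree because each one is correctly rounded.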
2
votes
2
answers
430
views
Multiplication of subnormal and normal numbers under IEEE 754
As far as I understand, when doing this operation, we first need to identify the subnormal value, normalise it, adjust the exponent, and then multiply the significands. Wouldn't it be advantageous to ...
3
votes
2
answers
374
views
Floating-point modular multiplication algorithm
Is there a well-known algorithm for modular multiplication of floating-point numbers?
I would like to multiply some large angle in single precision (6-7 significant digits) and wrap it back to 360 ...
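One relevant building block: IEEE 754 `fmod` is computed exactly (the remainder is representable, so no rounding occurs), which means a wrap adds no error beyond what the input angle already carries. A double-precision sketch (`wrap_degrees` is an illustrative name, not the asker's API):

```python
import math

def wrap_degrees(angle: float) -> float:
    # math.fmod is exact per IEEE 754; the only error in the result is
    # whatever error the input angle already contained.
    a = math.fmod(angle, 360.0)
    return a + 360.0 if a < 0.0 else a
```

The harder part of the original question, recovering precision lost in the large product itself, needs extra care (e.g. extended precision for the multiply), which this sketch does not address.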
1
vote
1
answer
132
views
FFT of logarithmic input data
Is there a reasonably accurate method of computing an FFT of logarithmically-represented input data (with a sign bit, that is $±2^{\text{double-precision value}}$)?
The naive method (convert to linear ...
1
vote
1
answer
73
views
Floating-point rounding - bit patterns of values that are halfway between two possible results
I am working through the book Computer Systems: A Programmer's Perspective.
The authors explain that round-to-even rounding can be applied for values that are halfway between two possible results. For ...
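A concrete place to see such halfway bit patterns: odd integers just above $2^{53}$ sit exactly midway between two adjacent doubles, so converting them exercises round-to-even directly:

```python
# 2**53 + 1 is halfway between 2**53 and 2**53 + 2; the neighbour with the
# even significand is 2**53, so the tie rounds down. 2**53 + 3 is halfway
# between 2**53 + 2 and 2**53 + 4; there the even neighbour is above.
low = float(2**53 + 1)
high = float(2**53 + 3)
```

Ties thus break downward or upward depending only on which neighbour has an even last significand bit.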
0
votes
0
answers
481
views
Why is the XMM register width 128 bits while double-precision floating point is 64 bits?
As far as I know, XMM registers are used to store floating point values. But the widest floating point format in the IEEE 754 standard is double precision (64-bit). So why is the XMM register width 128 bits? Is another 64-...
0
votes
1
answer
42
views
How much larger is the next representable value if 2^59 is stored in a double?
This is an exam question I couldn't solve
If I store 2^59 as double, that would give me
1 * 2^58. Is the answer just 2? I.e. next value is 2^60??
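The gap is not 2: a double has 52 stored fraction bits, so the spacing between consecutive doubles at $2^{59}$ is $2^{59-52} = 128$. A quick check:

```python
import math

x = 2.0 ** 59
gap = math.nextafter(x, math.inf) - x   # distance to the next representable double
```

So the next representable value is $2^{59} + 128$, not $2^{60}$.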
2
votes
0
answers
53
views
Best way to constrain a complex number to being within the unit circle?
What is the best way of implementing the following function,
$$f(x) = \frac{x}{\max(1, |x|)},$$
where $x$ is complex, using a Cartesian representation of $x$ with IEEE 754 floating point ...
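A direct rendering of the formula, for comparison (a sketch; it makes no claim to be optimal for the asker's accuracy goals):

```python
def clamp_to_unit_disc(z: complex) -> complex:
    # f(z) = z / max(1, |z|). abs() on a complex uses hypot-style scaling,
    # which avoids spurious overflow in re*re + im*im for large components.
    m = abs(z)
    return z / m if m > 1.0 else z
```

Note that even with an exact modulus the final division rounds, so the result's magnitude can still sit one ulp off 1.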
3
votes
0
answers
533
views
What are the use cases for the IEEE 754 inexact flag?
The IEEE 754 standard for floating point numbers defines a flag that is set when a result from floating point calculation isn't exact, i.e. has to be rounded. What algorithms are there that utilize ...
4
votes
3
answers
3k
views
Is significand same as mantissa in IEEE754?
I'm trying to understand IEEE 754 floating point. When I try to convert 0.3 from decimal to binary with an online calculator, it says the significand value is ...
1
vote
1
answer
42
views
How does CPU determine Reserved Exponent cases?
Using the IEEE 754 algorithm, I assume that it can be implemented in a branchless way.
But how does the CPU determine the special cases (reserved exponent values), e.g. exponent 11111111 with significand 000000000...?
...
1
vote
1
answer
184
views
What does it mean unambiguously that a number is value 0 up to numerical precision?
I was reading that a quantity $x$ is $0$ up to numerical precision. What does this statement formally mean, especially in the context of numerical methods or real computers?
I looked it up on Google ...
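There is no single formal definition, but a common informal reading (spelled out here as an assumption, not a standard) is: $|x|$ is at or below the roundoff level of the quantities it was computed from, i.e. machine epsilon times their magnitude:

```python
import sys

def is_zero_up_to_precision(x: float, scale: float = 1.0) -> bool:
    # "Zero up to numerical precision" read as: |x| is no larger than the
    # rounding granularity (machine epsilon) at the working magnitude.
    return abs(x) <= sys.float_info.epsilon * abs(scale)

residual = (0.1 + 0.2) - 0.3   # about 5.6e-17: pure roundoff, "zero" in this sense
```

The `scale` argument matters: a residual of 1e-10 is "zero" for quantities of order 1e6 but emphatically not for quantities of order 1.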
1
vote
0
answers
40
views
Standard for representing a float scaled to a particular range?
TL;DR Is there a "standard" way to represent a float scaled to a particular range, such that we get maximum precision for the given bit depth, within that range?
I'll start with my general ...
0
votes
2
answers
93
views
(Numerical Analysis) What is the largest double float represented for the gamma function and $n!$
Consider that
\begin{align}
\Gamma(n+1) = n!
\end{align}
for all nonnegative integers $n$. I then have the following two questions:
What is the largest value of $n$ for which $Γ(n+1)$ and $n!$ can be exactly ...
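The exactness question can be brute-forced: Python's `math.factorial` is exact, and comparing an `int` to a `float` in Python is an exact comparison, so we can test directly which $n!$ survive conversion to a double unchanged:

```python
import math

# n! is exactly a double iff its odd part fits in 53 significand bits.
exact = [n for n in range(1, 30)
         if float(math.factorial(n)) == math.factorial(n)]
```

This yields every $n$ up to 22; the odd part of $23!$ already needs 56 bits, so it is the first factorial that rounds.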
3
votes
1
answer
86
views
Bisecting Intervals of floating point numbers containing 0 and infinity fairly
It is seldom considered that floating point numbers are not evenly distributed on the real number line. I've been working with interval arithmetic and noticed that when bisecting $[a,b]$ on the real number line ...
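One known trick along these lines (a sketch for non-negative doubles; `fair_bisect` is an illustrative name): average the IEEE 754 bit patterns instead of the values, since for non-negative floats the bit patterns are ordered the same way as the values and spaced one per representable number:

```python
import struct

def d2bits(x: float) -> int:
    return struct.unpack('<Q', struct.pack('<d', x))[0]

def bits2d(b: int) -> float:
    return struct.unpack('<d', struct.pack('<Q', b))[0]

def fair_bisect(a: float, b: float) -> float:
    # For 0 <= a <= b: the midpoint in bit-pattern space leaves (roughly)
    # as many representable doubles on each side of the split.
    return bits2d((d2bits(a) + d2bits(b)) // 2)
```

For example `fair_bisect(1.0, 4.0)` gives 2.0, a geometric-mean-like split; the arithmetic midpoint 2.5 would leave far more representable doubles in the left half than the right. Handling signed or infinite endpoints needs the usual sign-flip mapping on the bit patterns first.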
2
votes
1
answer
451
views
Can Radix Sort be modified for signed ints and/or floats?
A few months ago I learned about the magic that allows radix sort to run in O(n) time and space. Most tutorials on radix sort say it is useful for very large ...
0
votes
1
answer
81
views
IEEE 754 conversion
I'm trying to convert 3.2 into IEEE 754 format. We find that $3 = (11)_2$ and we also find that
$0.2*2=0.4 -0$
$0.4*2=0.8 -0$
$0.8*2=1.6 -1$
$0.6*2=1.2 -1$
and this cycle repeats, so $0.2 = (0.00110011\ldots)_2$
...
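Finishing the working: normalising $(11.00110011\ldots)_2$ gives $1.10011\ldots_2 \times 2^1$, so the sign bit is 0, the biased exponent is $127 + 1 = 128$, and the 23-bit fraction rounds up to `100 1100 1100 1100 1100 1101`. A `struct`-based check of that hand computation:

```python
import struct

# Reinterpret the binary32 encoding of 3.2 as its integer bit pattern:
# sign 0 | exponent 1000 0000 | fraction 100 1100 1100 1100 1100 1101
bits = struct.unpack('<I', struct.pack('<f', 3.2))[0]
```

The pattern assembles to 0x404CCCCD; the final 1 in the fraction comes from rounding the infinite repeating tail upward.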
4
votes
0
answers
167
views
Uniformly random decimal numbers
Due to finite precision of number representations, we face situations like:
In: 0.1+0.1+0.1==0.3
Out: False
(on my ...
1
vote
0
answers
56
views
(Branchless) Bitonic Sorting Network for a Set of Floating Point Numbers
In the past I've implemented a branchless Bitonic Sorting Network on a gpu using CUDA, for integers.
I am facing a related problem:
In my Order Independent Transparency implementation, I would like to ...
0
votes
1
answer
71
views
How can vector angle comparison between lattice points be done without using floating-points? (Convex Hull)
Let's say I have a point $(x_0, y_0)$, and some other points $(x_1, y_1), (x_2, y_2) ... (x_n, y_n)$, such that all of them are lattice points; all have integer coordinates. Let's further assume that ...
1
vote
2
answers
1k
views
How many integers can be represented in double-precision floating-point form?
How do you calculate the number of integers that can be represented in double-precision floating-point form?
1
vote
1
answer
415
views
Prove every number in single-precision 32-bit floating-point format can be represented in 64-bit format
Theorem: Prove every number in single-precision 32-bit floating-point format can be represented in double-precision 64-bit floating-point format.
64-bit format:
Attempt: Let $ b = b_0 ,...,b_{31} $ ...
1
vote
1
answer
49
views
Why does floating point become less accurate as the powers of 2 increase?
https://fabiensanglard.net/floating_point_visually_explained/
I was reading this article where the exponent and the mantissa are explained as the window and offset respectively. As the gap between ...
0
votes
0
answers
282
views
Is there a way to convert FLOPS to bit operation per second
My problem is the following: I have $N$ inner products to compute in parallel every second.
Each of the vectors in those inner product is composed of $7$ bits.
I want to know for which $N$ it starts ...
2
votes
1
answer
301
views
Unit conversion - Better to divide by an integer or multiply by a double?
I currently have a long timestamp measured in units of 100ns elapsed since January 1st, 1900. I need to convert it to milliseconds.
I have the choice of either ...
1
vote
2
answers
1k
views
Half precision floating point question -- smallest non-zero number
There's a floating point question that popped up and I'm confused about the solution. It states that
IEEE 754-2008 introduces half precision, which is a binary
floating-point representation that uses ...
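The excerpt is truncated, but the smallest positive half-precision value can be checked directly: `struct`'s `'e'` format is IEEE 754 binary16 (1 sign, 5 exponent, 10 fraction bits), and the smallest subnormal, bit pattern `0x0001`, equals $2^{-14} \cdot 2^{-10} = 2^{-24}$:

```python
import struct

# Decode the half-precision bit pattern 0x0001: the smallest positive
# (subnormal) binary16 value, 2**-14 * 2**-10 = 2**-24.
smallest_half = struct.unpack('<e', (1).to_bytes(2, 'little'))[0]
```

The smallest positive *normal* half value is larger, $2^{-14}$; confusing the two is a common source of wrong answers on this kind of exercise.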
1
vote
1
answer
723
views
Floating Point Arithmetic with a 3-bit Mantissa
Find all values of $x \in \mathbb{R}$ such that $x + 1 = 1$ in floating point arithmetic with a 3-bit mantissa.
How do we represent the number 1 in floating point arithmetic with a 3-bit mantissa, I wonder? After that, ...
3
votes
2
answers
199
views
Python versus Matlab on the quantity 1/0
Python and Matlab seem to disagree on division by 0.
Python:
...
1
vote
3
answers
5k
views
Negative Numbers in 32 bit Floating Point IEEE Numbers
So I understand the logic behind converting positive decimal numbers to IEEE 32-bit floating numbers, but I'm not completely sure about the negative ones. If for example we have a decimal number, say -...
1
vote
2
answers
558
views
Adding two numbers in base 2 (floating point) vs multiplying two numbers in base 2 (floating point)
Is it true that adding two numbers in base 2 is more complex than multiplying them? If so can someone please explain why this is the case?
1
vote
2
answers
153
views
Prove that $1^\text{nan} = 1.00$
I know that for most computations involving NaN (not a number) the result is NaN itself, except for some cases.
For example, $1^{\text{nan}} = 1.00$, which is proven by mathematicians to be true.
I tried to ...
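"Proven by mathematicians" overstates it; this is a design decision in IEEE 754: `pow(1, y)` is defined to return 1 for every `y`, NaN included, because the result does not depend on `y`, so nothing is lost by not propagating the NaN. Python follows the same rule:

```python
import math

nan = float("nan")
a = math.pow(1.0, nan)   # IEEE 754 pow(1, y) = 1 for any y, even NaN
b = 1.0 ** nan           # CPython's float ** applies the same special case
```

By contrast, most other operations with a NaN operand (e.g. `nan + 1`, `nan * 0`) do return NaN.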
2
votes
2
answers
31
views
Floating point bitwise comparator. If f1 and f2 are floating point numbers with the following properties can we always say f1 > f2?
Recall floating-point representation:
Suppose $f$ is a floating-point number then we can express f as,
If $f$ is normal:
$$(-1)^{s}\cdot2^{e-127}(1 + \sum\limits_{k=1}^{23} b_{23-k}\cdot 2^{-k})$$
If $...
0
votes
0
answers
122
views
Convert $8.75×10^{6}$ to IEEE-32 format?
There is a similar question already asked on this site, but it does not have an answer as to how the $10^x$ was converted into $2^y$. I know how to convert 8.75 or 875 into IEEE representation. But what about ...
1
vote
1
answer
608
views
What is the machine epsilon and number of mantissa bits for TI-83?
I am trying to determine how many bits the TI-83 Plus uses to store floating point numbers. I am using the algorithm for approximating the machine epsilon given in "Numerical Mathematics and ...
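For comparison, the kind of probe that textbook describes, run on an IEEE 754 double in Python (on the calculator the same loop would be entered in its own language, and a decimal machine would naturally halve in base 10 instead):

```python
# Halve eps until adding eps/2 to 1.0 no longer changes the sum; the last
# eps that did change it is the machine epsilon.
eps = 1.0
while 1.0 + eps / 2.0 > 1.0:
    eps /= 2.0
# For binary64 this leaves eps == 2**-52, i.e. 52 stored fraction bits
# (53 significand bits counting the implicit leading 1).
```

Comparing the eps the TI-83 reports against powers of 2 (or 10) is exactly how one infers its significand size.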