Questions tagged [floating-point]

Approximate representation of real numbers as a fixed number of significant digits scaled by an exponent.

1 vote
1 answer
39 views

Does adding two positive floating point numbers ever result in a smaller number?

When dealing with floating point numbers, is there ever a time when adding two non-negative, finite, non-NaN numbers produces a result that is less than the greater of the two? ...
Captain Man's user avatar
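The claim behind this question can be checked empirically: under IEEE 754 round-to-nearest, the rounded sum of two non-negative finite doubles is never smaller than the larger operand, since the exact sum is at least max(a, b) and rounding is monotonic. A small Python sketch (my own illustration, not from the question):

```python
import itertools
import random

random.seed(0)
# A few edge cases plus random samples (all non-negative, finite)
samples = [0.0, 5e-324, 1e-308, 2**-52, 0.1, 1.0, 1e308]
samples += [random.uniform(0.0, 1e16) for _ in range(200)]

for a, b in itertools.product(samples, repeat=2):
    # The sum may overflow to +inf, which is still >= max(a, b)
    assert a + b >= max(a, b)
```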
0 votes
0 answers
37 views

Convert float to integer representation

This is related to my previous question. I want to map an arbitrary float space into an integer representation. I have defined the transformation, where $x'$ is the integer representation and $x$ is the real value like <...
Muhammad Ikhwan Perwira's user avatar
2 votes
1 answer
62 views

Is there such a data type for a low-range float?

For integers, we have unsigned integers to represent positive integers (including zero), and we have signed integers to represent both negative and positive values. There are always trade-offs between them. For ...
Muhammad Ikhwan Perwira's user avatar
0 votes
0 answers
30 views

What is the purpose of the rounding bit in floating point numbers?

Let's only consider single precision IEEE 754 floating point numbers. I understand how to convert a decimal floating-point number to its IEEE 754 representation (well... almost). My problem is with ...
imahmadrezas's user avatar
0 votes
0 answers
29 views

Exact computation with fractional powers as integer bounds

Context: I am implementing the prime-counting function $\pi(x)$ with the Meissel-Lehmer algorithm, extended by Lagarias, Miller, and Odlyzko. The notation comes from an overview by Oliveira e Silva. $\...
qwr's user avatar
  • 628
2 votes
2 answers
62 views

Diverging floating point calculation

I remember from computer science classes an example where the same floating-point calculation would diverge to infinity. From memory, it was something like this: ...
emonigma's user avatar
  • 121
5 votes
2 answers
1k views

Big Transition of Binary Counting in perspective of IEEE754 floating point

If I iterate over binary representations from 000...000 to 111...111, is there a significant transition at some point? In the ...
Muhammad Ikhwan Perwira's user avatar
-2 votes
8 answers
1k views

Why do computers use binary numbers in IEEE754 fraction instead of BCD or DPD?

I asked a new question because it more accurately reflects what I meant to ask in: Why don't decimal floating-point numbers have CPU-level support like binary floating-point numbers in typical computers? I will ...
-1 votes
1 answer
43 views

Why is it returning TRUE in the first case and FALSE in the second?

I understand 0.3 does not have an accurate binary representation. Suppose I run the following code: Why is the answer "True" in the first case and "False" in the second? Shouldn't ...
Golden_Hawk's user avatar
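The excerpt's code is cut off, but the classic instance of this behaviour is easy to reproduce; a hypothetical reconstruction in Python:

```python
from decimal import Decimal

print(0.1 + 0.1 == 0.2)   # True: the two rounding errors happen to cancel
print(0.1 + 0.2 == 0.3)   # False: the sum rounds to a different double than 0.3

# Decimal exposes the exact values the doubles actually hold:
print(Decimal(0.1 + 0.2))  # 0.3000000000000000444089209850062616169452667236328125
print(Decimal(0.3))        # 0.299999999999999988897769753748434595763683319091796875
```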
1 vote
0 answers
50 views

What are the most number of bits ever used in arbitrary/multiple precision floating point arithmetic?

I've been exploring the evolution of floating-point arithmetic formats from single to octuple precision. Here's what I THINK I have learned about the key specifications and capabilities for each ...
Curious Layman's user avatar
0 votes
1 answer
102 views

How reliable is a floating point operation (how often does it make mistakes)?

While computers are very reliable, they can also make errors because of noise. I would like to have an idea of the rough order of magnitude of errors per floating point operation in a computer, in a ...
StarBuck's user avatar
  • 137
3 votes
4 answers
144 views

Are float pseudo-random number generators always implemented using integer generators underneath?

In C it is common to use a simple routine to turn an integer RNG into a float RNG, something like this ...
simd's user avatar
  • 131
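The cut-off excerpt refers to the usual trick of dividing a random integer by a power of two; a minimal Python sketch of that routine (the helper name is my own):

```python
import random

def uniform01(rng_bits: int) -> float:
    """Map a 53-bit random integer to a float in [0, 1).

    53 bits because a double's significand holds exactly 53 bits,
    so every quotient here is exactly representable.
    """
    return rng_bits / (1 << 53)

x = uniform01(random.getrandbits(53))
assert 0.0 <= x < 1.0
```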
0 votes
1 answer
129 views

A floating-point rounding problem

I run the Python code below. x and y differ in their 4th and 5th number, and x has larger ...
Rocky's user avatar
  • 11
0 votes
2 answers
106 views

How is a signed floating-point adder implemented?

The following picture is a block diagram of an arithmetic unit dedicated to IEEE 754 floating-point addition from Computer Organization and Design RISC-V Edition: The Hardware Software Interface 2nd ...
user153245's user avatar
0 votes
1 answer
53 views

Absolute difference between the largest IEEE 754 number and its predecessor

In single precision format, the largest possible positive number is $A = 0 ~~~ 11111110 ~~~ 111\ldots 111$. Its predecessor is $B = 0 ~~~ 11111110 ~~~ 111 \ldots 110$. But what is the absolute ...
lafinur's user avatar
  • 195
2 votes
1 answer
131 views

Comparison of different algorithms for summing floating point numbers

I am exploring several approaches to summing floating point values, such as: naive summation (for comparison), summing sorted values, summing with numpy (again for comparison), Kahan's algorithm, pairwise ...
Olumide's user avatar
  • 153
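For reference, Kahan's compensated summation (one of the algorithms the question lists) fits in a few lines of Python; the test values below are my own illustration:

```python
def kahan_sum(values):
    total = 0.0
    comp = 0.0                    # compensation for lost low-order bits
    for v in values:
        y = v - comp
        t = total + y
        comp = (t - total) - y    # recovers the part of y that was rounded away
        total = t
    return total

vals = [1e16, 1.0, 1.0]
print(sum(vals))        # 1e+16 -- naive summation loses both 1.0s
print(kahan_sum(vals))  # 1.0000000000000002e+16 -- the exact sum
```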
1 vote
2 answers
520 views

Convert a rational number to a floating-point number exactly

We have two integers, $n$ and $d$. They are coprime (the only positive integer that is a divisor of both of them is $1$). They may be implemented as something that fits in a machine register, or they ...
user2373145's user avatar
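In Python this conversion already exists: `float(Fraction(n, d))` divides the two integers with correctly rounded integer true division, even when $n$ and $d$ themselves exceed the double range. A quick illustration (example values are mine):

```python
from fractions import Fraction

# int / int true division in Python is correctly rounded,
# so this yields the double nearest to n/d:
assert float(Fraction(1, 3)) == 1 / 3

# Works even when numerator and denominator overflow a double:
big = 10**400
assert float(Fraction(big, big + 1)) == 1.0
```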
2 votes
2 answers
430 views

Multiplication of subnormal and normal numbers under IEEE 754

As far as I understand, when doing this operation, we first need to identify the subnormal value, normalise it, adjust the exponent, and then multiply the significands. Wouldn't it be advantageous to ...
pabloabur's user avatar
3 votes
2 answers
374 views

Floating-point modular multiplication algorithm

Is there a well-known algorithm for modular multiplication of floating-point numbers? I would like to multiply some large angle in single precision (6-7 significant digits) and wrap it back to 360 ...
phil5's user avatar
  • 33
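For the wrapping step alone, `math.fmod` is computed exactly (no rounding error of its own), which makes it the usual building block; a small sketch, where `wrap360` is my own helper name. Note the error the question worries about lives in the multiplied angle itself, which wrapping cannot undo:

```python
import math

def wrap360(angle: float) -> float:
    # fmod is exact for floats; it keeps the sign of the dividend,
    # so negative results need one fix-up addition.
    r = math.fmod(angle, 360.0)
    return r + 360.0 if r < 0.0 else r

print(wrap360(725.0))   # 5.0
print(wrap360(-10.0))   # 350.0
```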
1 vote
1 answer
132 views

FFT of logarithmic input data

Is there a reasonably accurate method of computing an FFT of logarithmically-represented input data (with a sign bit, that is $±2^{\text{double-precision value}}$)? The naive method (convert to linear ...
TLW's user avatar
  • 1,493
1 vote
1 answer
73 views

Floating-point rounding - bit patterns of values that are halfway between two possible results

I am working through the book Computer Systems: A Programmer's Perspective. The authors explain that round-to-even rounding can be applied for values that are halfway between two possible results. For ...
cmplx96's user avatar
  • 113
0 votes
0 answers
481 views

Why is the XMM register width 128 bits while double-precision floating point is 64 bits?

As far as I know, XMM registers are used to store floating-point values. But the widest floating-point type in the IEEE 754 standard is double precision (64-bit). So why is the XMM register width 128 bits? Is another 64-...
Muhammad Ikhwan Perwira's user avatar
0 votes
1 answer
42 views

How much larger is the next representable value if 2^59 is stored in a double?

This is an exam question I couldn't solve If I store 2^59 as double, that would give me 1 * 2^58. Is the answer just 2? I.e. next value is 2^60??
Rubus's user avatar
  • 123
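For a double, the spacing between representable values at $2^{59}$ is $2^{59-52} = 2^7 = 128$ (52 stored fraction bits), which Python 3.9+ can confirm directly:

```python
import math

x = 2.0**59
print(math.ulp(x))                      # 128.0
print(math.nextafter(x, math.inf) - x)  # 128.0 -- gap to the next double
```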
2 votes
0 answers
53 views

Best way to constrain a complex number to being within the unit circle?

What is the best way of implementing the following function, $$f(x) = \frac{x}{\max(1, |x|)},$$ where $x$ is complex, using a Cartesian representation of $x$ with IEEE 754 floating point ...
sircolinton's user avatar
3 votes
0 answers
533 views

What are the use cases for the IEEE 754 inexact flag?

The IEEE 754 standard for floating point numbers defines a flag that is set when a result from floating point calculation isn't exact, i.e. has to be rounded. What algorithms are there that utilize ...
QuantumWiz's user avatar
4 votes
3 answers
3k views

Is the significand the same as the mantissa in IEEE 754?

I'm trying to understand IEEE 754 floating point. When I try to convert 0.3 from decimal to binary with an online calculator, it says the significand value is ...
Muhammad Ikhwan Perwira's user avatar
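"Significand" is the modern IEEE 754 term for what older texts call the mantissa. The stored fields of a double can be picked apart with `struct`; a sketch using 0.3 as in the question:

```python
import struct

bits = struct.unpack('<Q', struct.pack('<d', 0.3))[0]
sign        = bits >> 63
exponent    = (bits >> 52) & 0x7FF    # biased by 1023
significand = bits & ((1 << 52) - 1)  # 52 stored fraction bits (leading 1 is implicit)

print(sign, exponent - 1023)   # 0 -2   (0.3 lies in [2**-2, 2**-1))
```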
1 vote
1 answer
42 views

How does CPU determine Reserved Exponent cases?

Using the IEEE 754 algorithm, I assume it can be implemented in a branchless way. But how does the CPU determine the special cases (reserved exponent values), where the exponent is 11111111 and the significand is 000000000... ...
uptoyou's user avatar
  • 113
1 vote
1 answer
184 views

What does it mean, unambiguously, that a number is 0 up to numerical precision?

I was reading that a quantity $x$ is $0$ up to numerical precision. What does this statement formally mean, especially in the context of numerical methods or real computers? I looked it up on Google ...
Charlie Parker's user avatar
1 vote
0 answers
40 views

Standard for representing a float scaled to a particular range?

TL;DR Is there a "standard" way to represent a float scaled to a particular range, such that we get maximum precision for the given bit depth, within that range? I'll start with my general ...
hazymat's user avatar
  • 111
0 votes
2 answers
93 views

(Numerical Analysis) What is the largest double-precision float representable for the gamma function and $n!$?

Consider that \begin{align} \Gamma(n+1) = n! \end{align} for any non-negative integer $n$. I then have the following two questions: What is the largest value of $n$ for which $\Gamma(n+1)$ and $n!$ can be exactly ...
Jens Kramer's user avatar
3 votes
1 answer
86 views

Bisecting Intervals of floating point numbers containing 0 and infinity fairly

It is seldom considered that floating-point numbers are not evenly distributed on the real number line. I've been working with interval arithmetic and noticed that when bisecting $[a,b]$ on the real number line ...
worldsmithhelper's user avatar
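One common answer: for non-negative doubles the raw bit pattern is monotone in the value, so averaging the bit patterns bisects the interval by *count of representable floats* rather than by real-number distance. A sketch (helper names are mine; assumes 0 <= a <= b):

```python
import struct

def to_bits(x: float) -> int:
    return struct.unpack('<Q', struct.pack('<d', x))[0]

def from_bits(b: int) -> float:
    return struct.unpack('<d', struct.pack('<Q', b))[0]

def bisect_floats(a: float, b: float) -> float:
    # Equally many representable doubles land on each side of the result.
    return from_bits((to_bits(a) + to_bits(b)) // 2)

print(bisect_floats(1.0, 2.0))           # 1.5
print(bisect_floats(0.0, float('inf')))  # 1.5 -- half of all non-negative
                                         # doubles lie below 1.5
```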
2 votes
1 answer
451 views

Can Radix Sort be modified for signed ints and/or floats?

A few months ago I learned about the magic that allows radix sort to run in O(n) time and space. Most tutorials on radix sort say it is useful for very large ...
Adam Hoelscher's user avatar
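Yes: the standard trick maps each float's bit pattern to an unsigned key whose integer order matches the float order (flip every bit of a negative, set the sign bit of a non-negative); radix sort then works unchanged on the keys. A Python sketch of just the key function:

```python
import struct

def radix_key(x: float) -> int:
    b = struct.unpack('<Q', struct.pack('<d', x))[0]
    if b >> 63:                     # negative: flip every bit
        return b ^ 0xFFFFFFFFFFFFFFFF
    return b | (1 << 63)            # non-negative: set the sign bit

vals = [3.5, -2.0, 0.0, 1e-300, -1e300, -0.5]
assert sorted(vals) == sorted(vals, key=radix_key)
```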
0 votes
1 answer
81 views

IEEE 754 conversion

I'm trying to convert 3.2 into IEEE 754 format. We find that $(3)_{10}=(11)_2$, and we also find that $0.2 \cdot 2=0.4 \to 0$, $0.4 \cdot 2=0.8 \to 0$, $0.8 \cdot 2=1.6 \to 1$, $0.6 \cdot 2=1.2 \to 1$, and this cycle repeats, so $0.2=0.00110011\ldots_2$ ...
Iwan5050's user avatar
  • 135
4 votes
0 answers
167 views

Uniformly random decimal numbers

Due to finite precision of number representations, we face situations like: In: 0.1+0.1+0.1==0.3 Out: False (on my ...
Matthieu Latapy's user avatar
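One way to get exactly uniform "decimal" numbers is to draw a random integer and interpret it at a fixed decimal scale; a sketch with Python's `decimal` module (the helper name is mine):

```python
import random
from decimal import Decimal

random.seed(0)

def random_decimal(places: int) -> Decimal:
    """Uniformly random decimal with `places` digits after the point, in [0, 1)."""
    scale = 10**places
    # The quotient is exact: scale is a power of ten, the base of Decimal.
    return Decimal(random.randrange(scale)) / scale

d = random_decimal(3)
assert Decimal(0) <= d < Decimal(1)
# Unlike binary floats, decimal sums of tenths behave as expected:
assert Decimal('0.1') * 3 == Decimal('0.3')
```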
1 vote
0 answers
56 views

(Branchless) Bitonic Sorting Network for a Set of Floating Point Numbers

In the past I've implemented a branchless Bitonic Sorting Network on a gpu using CUDA, for integers. I am facing a related problem: In my Order Independent Transparency implementation, I would like to ...
Vectorizer's user avatar
0 votes
1 answer
71 views

How can vector angle comparison between lattice points be done without using floating-points? (Convex Hull)

Let's say I have a point $(x_0, y_0)$, and some other points $(x_1, y_1), (x_2, y_2) ... (x_n, y_n)$, such that all of them are lattice points; all have integer coordinates. Let's further assume that ...
Christopher Miller's user avatar
1 vote
2 answers
1k views

How many integers can be represented in double-precision floating-point form?

How do you calculate the number of integers that can be represented in double-precision floating-point form?
0xAlon's user avatar
  • 15
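Every integer with absolute value up to $2^{53}$ is exactly representable in a double (53 significand bits, counting the implicit leading one); beyond that, gaps appear. A quick check:

```python
assert float(2**53 - 1) == 2**53 - 1     # still exact
assert float(2**53) == 2**53             # exact (a power of two)
assert float(2**53 + 1) == float(2**53)  # first integer that falls in a gap
```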
1 vote
1 answer
415 views

Prove every number in double precision 32-bit floating-point format can be represented in 64-bit format

Theorem: Prove every number in double precision 32-bit floating-point format can be represented in double precision 64-bit floating point-format. 64-bit format: Attempt: Let $ b = b_0 ,...,b_{31} $ ...
flamel12's user avatar
  • 233
1 vote
1 answer
49 views

Why does floating point become less accurate as the powers of 2 increase?

https://fabiensanglard.net/floating_point_visually_explained/ I was reading this article where the exponent and the mantissa are explained as the window and offset respectively. As the gap between ...
Neel Sandell's user avatar
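In the linked article's terms, the "window" is the interval $[2^e, 2^{e+1})$, and every window is covered by the same $2^{52}$ offsets, so the absolute spacing doubles with each increment of the exponent. Python 3.9+ shows this directly:

```python
import math

for e in (0, 1, 10, 52, 53):
    # Spacing between neighbouring doubles just above 2**e is 2**(e - 52)
    print(e, math.ulp(2.0**e))
```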
0 votes
0 answers
282 views

Is there a way to convert FLOPS to bit operations per second?

My problem is the following: I have $N$ inner products to compute in parallel every second. Each of the vectors in those inner products is composed of $7$ bits. I want to know for which $N$ it starts ...
StarBuck's user avatar
  • 137
2 votes
1 answer
301 views

Unit conversion - Better to divide by an integer or multiply by a double?

I currently have a long timestamp measured in units of 100ns elapsed since January 1st, 1900. I need to convert it to milliseconds. I have the choice of either ...
Bassinator's user avatar
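One consideration for this choice: integer division by 10,000 is exact for any 64-bit tick count, whereas multiplying by the double `1e-4` rounds twice (`1e-4` itself is not exactly representable, and the product is rounded again). A small illustration with a made-up tick count:

```python
ticks = 132_530_688_000_000_123   # hypothetical count of 100 ns intervals

ms_exact = ticks // 10_000        # integer division: always exact (truncating)
ms_float = int(ticks * 1e-4)      # goes through a double: can drift for large values

# The integer path satisfies the defining inequality of truncating division:
assert ms_exact * 10_000 <= ticks < (ms_exact + 1) * 10_000
```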
1 vote
2 answers
1k views

Half precision floating point question -- smallest non-zero number

There's a floating point question that popped up and I'm confused about the solution. It states that IEEE 754-2008 introduces half precision, which is a binary floating-point representation that uses ...
Manny's user avatar
  • 13
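Python can decode half-precision values through the struct `'e'` format, which makes the answer easy to verify: the smallest positive subnormal half is $2^{-24}$ and the smallest positive normal half is $2^{-14}$.

```python
import struct

# Bit pattern 0x0001: all-zero exponent, significand = 1 -> smallest subnormal
smallest_sub = struct.unpack('<e', b'\x01\x00')[0]
assert smallest_sub == 2.0**-24

# Bit pattern 0x0400: exponent field 1, significand 0 -> smallest normal
smallest_norm = struct.unpack('<e', b'\x00\x04')[0]
assert smallest_norm == 2.0**-14
```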
1 vote
1 answer
723 views

Floating Point Arithmetic with a 3-bit mantissa

Find all values of $x \in \mathbb{R}$ such that $x + 1 = 1$ in floating point arithmetic with a 3-bit mantissa. I wonder how we represent the number 1 in floating point arithmetic with a 3-bit mantissa. After that, ...
Hung Do's user avatar
  • 13
3 votes
2 answers
199 views

Python versus Matlab on the quantity 1/0

Python and Matlab seem to disagree on the division by 0. Python: ...
pluton's user avatar
  • 133
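The cut-off comparison is easy to reconstruct: Python raises `ZeroDivisionError` for `1/0` (it checks the operands before the hardware ever divides), while IEEE 754 itself, and hence MATLAB, defines the result as Inf. Python still produces IEEE infinities via overflow or explicitly:

```python
import math

try:
    1 / 0
except ZeroDivisionError as exc:
    print('Python raises:', exc)   # Python refuses division by zero

print(1e308 * 10)                  # inf -- overflow follows IEEE 754
print(float('inf'))                # the IEEE result MATLAB returns for 1/0
```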
1 vote
3 answers
5k views

Negative Numbers in 32 bit Floating Point IEEE Numbers

So I understand the logic behind converting positive decimal numbers to IEEE 32-bit floating numbers, but I'm not completely sure about negative ones. If for example we have a decimal number, say -...
idkrlly's user avatar
  • 13
1 vote
2 answers
558 views

Adding two numbers in base 2 (floating point) vs multiplying two numbers in base 2 (floating point)

Is it true that adding two numbers in base 2 is more complex than multiplying them? If so can someone please explain why this is the case?
Roy Fischer's user avatar
1 vote
2 answers
153 views

Prove that $1^\text{nan} = 1.00$

I know that for most computations involving NaN (not a number) the result is NaN itself, except for some cases. For example, $1^{\text{nan}} = 1.00$, which is proven by mathematicians to be true. I tried to ...
Monther's user avatar
  • 118
2 votes
2 answers
31 views

Floating point bitwise comparator. If f1 and f2 are floating point numbers with the following properties can we always say f1 > f2?

Recall floating-point representation: Suppose $f$ is a floating-point number then we can express f as, If $f$ is normal: $$(-1)^{s}\cdot2^{e-127}(1 + \sum\limits_{k=1}^{23} b_{23-k}\cdot 2^{-k})$$ If $...
VilePoison's user avatar
0 votes
0 answers
122 views

Convert $8.75×10^{6}$ to IEEE-32 format?

There is a similar question already asked on this site, but it does not have an answer as to how the $10^x$ was converted into $2^y$. I know how to convert 8.75 or 875 into IEEE representation. But what about ...
callmeanythingyouwant's user avatar
1 vote
1 answer
608 views

What is the machine epsilon and number of mantissa bits for TI-83?

I am trying to determine how many bits the TI-83 Plus uses to store floating point numbers. I am using the algorithm for approximating the machine epsilon given in "Numerical Mathematics and ...
irowe's user avatar
  • 113
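The textbook algorithm the question refers to is the classic halving probe; on an IEEE 754 double it lands on $2^{-52}$. A Python sketch of that probe (the TI-83 uses a decimal, not binary, floating-point format, so it reports a different value there):

```python
# Halve eps until 1 + eps/2 is no longer distinguishable from 1;
# what remains is the machine epsilon of the arithmetic.
eps = 1.0
while 1.0 + eps / 2 > 1.0:
    eps /= 2

print(eps)   # 2.220446049250313e-16 == 2**-52 on IEEE 754 doubles
```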