I'm doing a data analysis project where I'm working with really large numbers. I originally did everything in pure Python, but I'm now trying to do it with numpy and pandas. However, it seems like I've hit a roadblock, since numpy cannot handle integers larger than 64 bits (if I use Python ints in numpy they max out at 9223372036854775807). Do I just throw away numpy and pandas completely, or is there a way to use them with Python-style arbitrarily large integers? I'm okay with a performance hit.
You can take a look at this post. They discuss the concerns regarding maximum number representation in numpy. You could convert the array to object dtype to store large values, but you cannot use it for intermediate computations. – Anurag Reddy, Dec 14, 2021 at 13:15
1 Answer
By default, numpy stores elements in a fixed-width numeric dtype. But you can force the dtype to object, like below:
import numpy as np
x = np.array([10,20,30,40], dtype=object)
x_exp2 = 1000**x
print(x_exp2)
The output is:
[1000000000000000000000000000000
1000000000000000000000000000000000000000000000000000000000000
1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000]
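The drawback is that execution is much slower: each element is an ordinary Python int, so every operation goes through Python object calls instead of numpy's vectorized machine-integer loops.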
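To make the difference concrete, here is a minimal sketch contrasting a fixed-width int64 array with an object array. The variable names (`fixed`, `wrapped`, `exact`) are just illustrative, and the exact behaviour may vary slightly between numpy versions.

```python
import numpy as np

# Fixed-width int64: array arithmetic wraps around silently on overflow.
fixed = np.array([2**62], dtype=np.int64)
wrapped = fixed * 4  # 2**64 does not fit in 64 bits, so the result wraps to 0

# Object dtype: each element is an ordinary Python int, so the result stays exact.
exact = np.array([2**62], dtype=object) * 4

print(wrapped[0])         # wrapped value, not 2**64
print(exact[0] == 2**64)  # exact arbitrary-precision result
```

The int64 version gives a wrong answer without raising an error, which is why forcing dtype=object is the safer choice when values can exceed 64 bits.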
Later edit to show that np.sum() works. There could be some limitations, of course.
import numpy as np
x = np.array([10,20,30,40], dtype=object)
x_exp2 = 1000**x
print(x_exp2)
print(np.sum(x_exp2))
print(np.prod(x_exp2))
And the output is:
[1000000000000000000000000000000
1000000000000000000000000000000000000000000000000000000000000
1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000]
1000000000000000000000000000001000000000000000000000000000001000000000000000000000000000001000000000000000000000000000000
1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
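As one example of the kind of limitation mentioned above, here is a rough sketch (not an exhaustive list): reductions such as np.sum work on object arrays, but math-style ufuncs such as np.sqrt have no implementation for Python ints and raise an error. One possible workaround, assuming Python 3.8+ for math.isqrt, is to wrap a pure-Python function with np.frompyfunc. The names `int_sqrt` and `roots` are made up for this illustration.

```python
import math
import numpy as np

x_exp2 = 1000 ** np.array([10, 20], dtype=object)

# Reductions like sum work element by element on the Python ints.
total = np.sum(x_exp2)

# np.sqrt has no loop for Python int objects, so it raises.
try:
    np.sqrt(x_exp2)
    sqrt_failed = False
except (TypeError, AttributeError):
    sqrt_failed = True

# Workaround: wrap a pure-Python function so it applies element-wise.
# The result is again an object array of Python ints.
int_sqrt = np.frompyfunc(math.isqrt, 1, 1)
roots = int_sqrt(x_exp2)

print(total)
print(sqrt_failed)
print(roots)
```

This keeps the array interface, but the work is still done one Python call per element, so it is no faster than a list comprehension.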
Thanks, I see. However, don't I lose many of the benefits of numpy by doing this? It seems like I now don't have access to any of the methods available on ints, such as sum. I guess that's reasonable though. – lapurita, Dec 14, 2021 at 13:58
@lapurita: it may depend on the Python/numpy version. I tried the sum() and prod() operations and they seem to be OK. I'll add an edit to the answer. Commented Dec 14, 2021 at 15:43