
I'm doing a data analysis project where I'm working with very large numbers. I originally did everything in pure Python, but I'm now trying to do it with numpy and pandas. However, it seems I've hit a roadblock, since numpy can't handle integers larger than 64 bits (if I put Python ints into numpy, they max out at 9223372036854775807). Do I just throw away numpy and pandas completely, or is there a way to use them with Python-style arbitrarily large integers? I'm okay with a performance hit.
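A minimal sketch of the 64-bit limit described above, assuming a reasonably recent NumPy: `np.iinfo` reports the int64 bounds, and fixed-width array arithmetic wraps around silently on overflow instead of growing the way a Python int does.

```python
import numpy as np

# The largest value a NumPy int64 can hold is 2**63 - 1.
INT64_MAX = np.iinfo(np.int64).max
print(INT64_MAX)  # 9223372036854775807

# Fixed-width array arithmetic wraps around silently on overflow:
# 2**62 * 4 == 2**64, which is 0 modulo 2**64.
wrapped = np.array([2**62], dtype=np.int64) * 4
print(wrapped)  # [0]

# Plain Python ints never overflow; they just grow.
exact = 2**62 * 4
print(exact)  # 18446744073709551616
```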

  • You can take a look at this post. They discuss the concerns regarding maximum number representation in numpy. You could convert it to object to accommodate storing large values, but you cannot use it for intermediate computations. Commented Dec 14, 2021 at 13:15

1 Answer

By default, numpy stores elements in a fixed-width numeric dtype (such as int64). But you can force the dtype to object, like below:

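import numpy as np
# dtype=object makes each element an ordinary Python int (arbitrary precision)
x = np.array([10,20,30,40], dtype=object)
x_exp2 = 1000**x  # computed with Python int arithmetic, so nothing overflows
print(x_exp2)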

The output is:

[1000000000000000000000000000000
 1000000000000000000000000000000000000000000000000000000000000
 1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
 1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000]
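To confirm what dtype=object actually stores, here is a short check (a sketch; the name `powers` is just illustrative). The elements are ordinary Python ints, so the results stay exact even far beyond the int64 range.

```python
import numpy as np

x = np.array([10, 20, 30, 40], dtype=object)
powers = 1000**x  # each element is computed with Python int arithmetic

print(powers.dtype)     # object
print(type(powers[0]))  # <class 'int'>

# The results are exact: 1000**40 == 10**120, far beyond the int64 range.
int64_max = int(np.iinfo(np.int64).max)
print(powers[-1] == 10**120)    # True
print(powers[-1] > int64_max)   # True
```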

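The drawback is that execution is much slower, since every operation falls back to element-by-element Python int arithmetic instead of vectorized machine arithmetic.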
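A rough way to see the slowdown on your own machine, using only NumPy and the standard library's `timeit`. The array size and repeat count are arbitrary choices, and timings will vary; no particular speed ratio is claimed here.

```python
import timeit

import numpy as np

n = 100_000
ints64 = np.arange(n, dtype=np.int64)  # fixed-width machine integers
objs = ints64.astype(object)           # same values, stored as Python ints

# Both arrays hold the same values, so the sums agree.
assert ints64.sum() == objs.sum()

# Time 10 repetitions of each sum.
t64 = timeit.timeit(lambda: ints64.sum(), number=10)
tobj = timeit.timeit(lambda: objs.sum(), number=10)
print(f"int64 sum:  {t64:.4f} s")
print(f"object sum: {tobj:.4f} s")
```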

Later edit, to show that np.sum() and np.prod() work. There could be some limitations, of course.

import numpy as np
x = np.array([10,20,30,40], dtype=object)
x_exp2 = 1000**x

print(x_exp2)
print(np.sum(x_exp2))   # exact sum of the Python ints
print(np.prod(x_exp2))  # exact product: 1000**(10+20+30+40) == 10**300

and the output is:

[1000000000000000000000000000000
 1000000000000000000000000000000000000000000000000000000000000
 1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
 1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000]
1000000000000000000000000000001000000000000000000000000000001000000000000000000000000000001000000000000000000000000000000
1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
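As for limitations: not every NumPy function has an object-dtype implementation for ints. The sketch below assumes Python 3.8+ (for `math.isqrt`) and shows `np.sqrt` failing on an object array, plus one workaround that uses `np.frompyfunc` to apply an exact Python function element by element. Depending on the NumPy version, the failure may surface as a TypeError or an AttributeError, so both are caught.

```python
import math

import numpy as np

big = np.array([10**30, 10**60, 10**90], dtype=object)

# np.sqrt's object loop tries to call a .sqrt() method on each element,
# which Python's int does not have, so the call fails.
sqrt_failed = False
try:
    np.sqrt(big)
except (TypeError, AttributeError) as exc:
    sqrt_failed = True
    print("np.sqrt failed:", type(exc).__name__)

# Workaround: wrap an exact Python function with np.frompyfunc, which
# applies it element by element and returns an object array.
isqrt = np.frompyfunc(math.isqrt, 1, 1)
roots = isqrt(big)
print(roots)  # exact integer square roots: 10**15, 10**30, 10**45
```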

  • Thanks, I see. However, don't I lose much of the benefit of numpy by doing this? It seems like I no longer have access to the methods available on ints, such as sum. I guess that's reasonable, though.
    – lapurita
    Commented Dec 14, 2021 at 13:58
  • @lapurita: it may depend on the Python/numpy version. I tried the sum() and prod() operations and they seem to work. I'll add an edit to the answer. Commented Dec 14, 2021 at 15:43
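The question also asks about pandas, which the answer doesn't cover. As an assumption worth checking against your pandas version, a Series created with dtype=object seems to behave the same way: the values stay Python ints, and sum() and element-wise arithmetic remain exact.

```python
import pandas as pd

# dtype=object makes pandas store each value as a Python int,
# mirroring the numpy approach in the answer.
s = pd.Series([10**30, 10**60, 10**90], dtype=object)

total = s.sum()
print(total == 10**30 + 10**60 + 10**90)  # True

# Element-wise arithmetic also stays exact.
doubled = s * 2
print(doubled.iloc[-1] == 2 * 10**90)  # True
```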
