Calculate least squares in batches

Question

I want to calculate a Gaussian process. For this I need to calculate the following equation: f = X @ Z^-1 @ y

To handle the inverse, I compute c = Z^-1 @ y as the solution of the problem Z @ c = y. So I use a solver for linear systems (least squares) to calculate c on my GPU (using cupy). The problem is that I run out of memory (16000x16000 matrix). Is it possible to calculate c in batches? Or how else can I solve the out-of-memory issue?

I searched the internet but I found nothing.

Bob · Accepted Answer · 2023-02-03 11:05:16Z

The GPU will help you with the arithmetic, but will penalize data access.

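A direct dense solve requires O(n^3) arithmetic operations on O(n^2) data (an iterative solver needs O(n^2) operations per iteration, comparable to the amount of data accessed). With a matrix that barely fits in GPU memory, you are probably not going to see a big gain.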

You will find efficient CPU implementations of linear solvers in PyTorch or SciPy, for instance. If you can use float32 instead of float64, even better.
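As a rough illustration of the CPU route, here is a minimal sketch using SciPy in float32. It assumes Z is symmetric positive definite (typical for a Gaussian process kernel matrix with a noise term, but that is an assumption about your Z), which allows `assume_a="pos"`. The helper name `solve_spd_float32` and the random test matrix are made up for the example.

```python
import numpy as np
from scipy.linalg import solve


def solve_spd_float32(Z, y):
    """Solve Z @ c = y on the CPU in float32.

    Assumes Z is symmetric positive definite (hypothetical helper,
    not part of SciPy).
    """
    Z32 = np.asarray(Z, dtype=np.float32)
    y32 = np.asarray(y, dtype=np.float32)
    # assume_a="pos" tells SciPy the matrix is symmetric positive definite.
    return solve(Z32, y32, assume_a="pos")


# Small, well-conditioned stand-in for the real 16000 x 16000 matrix.
rng = np.random.default_rng(0)
n = 500
A = rng.standard_normal((n, n))
Z = A @ A.T / n + np.eye(n)
y = rng.standard_normal(n)

c = solve_spd_float32(Z, y)
# Relative residual of the solution, checked against the float64 inputs.
residual = np.linalg.norm(Z @ c - y) / np.linalg.norm(y)
print(f"relative residual: {residual:.2e}")
```

If Z is not positive definite, dropping `assume_a="pos"` falls back to a general solver.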

A recent, decent GPU can hold a 16000 x 16000 matrix: with float64 elements it takes about 2 GB. Make sure to release all other allocated memory before allocating Z, and avoid allocating more data while solving. If the matrix is well conditioned, you could try float16 on the GPU, which would fit a 16000 x 16000 matrix in 512 MB of memory.
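The memory figures are simple arithmetic: number of elements times bytes per element. A small sketch (the helper `matrix_bytes` is just for illustration) that also covers float32, the case in between:

```python
def matrix_bytes(n, bytes_per_element):
    """Memory needed to store a dense n x n matrix, in bytes."""
    return n * n * bytes_per_element


n = 16000
for name, size in [("float64", 8), ("float32", 4), ("float16", 2)]:
    print(f"{name}: {matrix_bytes(n, size) / 1e9:.3f} GB")
# float64: 2.048 GB
# float32: 1.024 GB
# float16: 0.512 GB
```

This counts only the matrix itself. A solver usually needs extra workspace on top of that, which is why freeing other allocations first matters.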

If you want to use the GPU, PyTorch is my preferred option: an algorithm that runs on the CPU can be run on the GPU with very few changes.
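To show how little changes between CPU and GPU, here is a minimal sketch with `torch.linalg.solve`. The function name `solve_system` and the random test data are hypothetical; only the device string differs between the two cases, and the code falls back to the CPU when no CUDA device is available.

```python
import torch


def solve_system(Z, y, device=None):
    """Solve Z @ c = y in float32 on the given device (hypothetical helper)."""
    if device is None:
        # Use the GPU when one is available, otherwise stay on the CPU.
        device = "cuda" if torch.cuda.is_available() else "cpu"
    Z_dev = Z.to(device=device, dtype=torch.float32)
    y_dev = y.to(device=device, dtype=torch.float32)
    c = torch.linalg.solve(Z_dev, y_dev)
    # Bring the result back to the CPU so the caller does not depend on the device.
    return c.cpu()


# Small, well-conditioned stand-in for the real matrix.
torch.manual_seed(0)
n = 300
A = torch.randn(n, n)
Z = A @ A.T / n + torch.eye(n)
y = torch.randn(n)

c = solve_system(Z, y)
residual = (torch.linalg.norm(Z @ c - y) / torch.linalg.norm(y)).item()
print(f"relative residual: {residual:.2e}")
```

Passing `device="cpu"` explicitly gives the CPU version of the same computation, which is handy for comparing results and timings.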
