Newest 'cupy' Questions

0 votes

0 answers

18 views

Code Optimization for For Loops in Task Parameterized Gaussian Mixture Models

I'm currently developing TPGMM from Calinon's work (https://calinon.ch/papers/Calinon-JIST2015.pdf). I'm dealing with large matrix operations using CuPy as shown in the code below. However, I'm having ...

Loh Jia Quan

1

asked 2 days ago

0 votes

1 answer

20 views

cupy.nanargmax throwing exception

I have a 2D array allocated on the GPU and I need to use CuPy's nanargmax() function to find the index of the maximum value in each row. Some of the values could be NaN. Since the 2D array is quite large (...

skm

5,639

asked Nov 25 at 21:57

0 votes

1 answer

73 views

Named symbol not found when using CuPy to invoke a CUDA kernel

This is my CUDA kernel: https://pastebin.com/ti95Qy2p, and I want to invoke the compute_linear_recurrence method in this kernel. But when I use this code: import cupy as cp # code_str is code in https://...

forestbat

901

asked Nov 7 at 9:39

0 votes

1 answer

37 views

CompileException occurs when compiling a .cu file with CuPy

I have a .cu file with these headers: #include </usr/include/features.h> #include </usr/include/assert.h> #include </usr/include/stdio.h> When I use the nvcc command to compile this file, ...

forestbat

901

asked Nov 7 at 5:24

1 vote

1 answer

172 views

Batched matrix multiplication with JAX on GPU faster with larger matrices

I'm trying to perform batched matrix multiplication with JAX on GPU, and noticed that it is ~3x faster to multiply shapes (1000, 1000, 3, 35) @ (1000, 1000, 35, 1) than it is to multiply (1000, 1000, ...

Nin17

3,442

asked Oct 14 at 10:34

1 vote

1 answer

88 views

Understanding the permutation test

I'm attempting to optimize the performance of the permutation test implemented in scipy.stats. My dataset consists of 500,000 observations, each associated with 2,000 binary covariates. I've applied ...

Dan Bolser

31

asked Oct 11 at 8:08

0 votes

1 answer

66 views

Raw kernel with dynamically allocated shared memory

Consider the following CUDA kernel that is used in Python via CuPy from the CuPy docs add_kernel = cp.RawKernel(r''' extern "C" __global__ void my_add(const float* x1, const float* x2, float*...

Uwe.Schneider

1,415

asked Oct 3 at 18:03

3 votes

1 answer

128 views

Why is (x / y)[i] faster than x[i] / y[i]?

I'm new to CuPy and CUDA/GPU computing. Can someone explain why (x / y)[i] is faster than x[i] / y[i]? When taking advantage of GPU accelerated computations, are there any guidelines that would allow me ...

huang

1,167

asked Sep 30 at 14:37

0 votes

0 answers

16 views

AttributeError: module 'cupy' has no attribute 'ctypeslib'

I'm trying to do the equivalent of from numpy.ctypeslib import ndpointer, but from cupy. Any ideas when this will be implemented, or is there a workaround?

Zohim Chandani

29

asked Sep 23 at 13:52

1 vote

0 answers

106 views

Solving sparse linear system on GPU is much slower than CPU

Below is the code which solves a sparse linear system: import cupyx import cupyx.scipy.sparse.linalg import time import scipy import scipy.sparse.linalg import pathlib file_dir = str(pathlib.Path(...

Andy

11

asked Sep 15 at 10:05

1 vote

0 answers

29 views

Manual indexing with multidimensional cupy ndarray in user defined kernels

In the cupy docs on user defined kernels (https://docs.cupy.dev/en/stable/user_guide/kernel.html), there is a section defining certain variables that are predefined, like _ind.size() and i for things ...

rak

21

asked Sep 3 at 15:40

0 votes

0 answers

15 views

How do I redirect the Cuda kernel IO in Cupy?

import sys class Logger: def __init__(self, filename): self.console = sys.stdout self.file = open(filename, 'w') def write(self, message): self.console.write(message) ...

Marko Grdinić

4,042

asked Aug 24 at 10:32

0 votes

0 answers

34 views

Any support for cuTENSORMg in python?

I have recently been looking into cuTENSOR (+cupy) for a speedy tensor contraction GPU library, and have been wanting to extend my single GPU code to multi GPU distributed code via cuTENSORMg; however,...

rak

21

asked Aug 23 at 4:12

0 votes

1 answer

97 views

cuSPARSELt not found by CuPy

I have a hard time getting CuPy to detect and use, where applicable, the cuSPARSELt library in Windows. I tried installing versions 0.2.0 (as mentioned by CuPy's installation guide) and 0.6.2 (the ...

srcLegend

74

asked Aug 22 at 15:28

0 votes

0 answers

26 views

send/recv block in CuPy

I want to send a cupy array from one node to the other. The sender has the following code: import cupy import cupyx.distributed import torch.multiprocessing as mp def send(): cupy.cuda.Device(0)....

edhu

451

asked Aug 18 at 5:23

1 vote

1 answer

64 views

Fast square of absolute value of complex numbers with cupy or otherwise

When one is comparing the magnitudes of complex numbers (essentially sqrt(real² + imag²)) to find the largest absolute values, it would suffice to compare the square of the absolute values, thereby ...

Mikael

33

asked Aug 6 at 10:01

2 votes

0 answers

39 views

cupy.linalg.solve for positive definite matrix?

It seems like cupy.linalg.solve doesn't have an option for solving the linear system Ax=b assuming A is positive definite? I am looking for something like scipy.linalg.solve where one can actually tell ...

zvi

21

asked Aug 1 at 19:30

0 votes

1 answer

193 views

Cannot use GPU, CuPy is not installed

I have a GPU enabled machine. O.S: Ubuntu 20.04.6 LTS nvcc version: 12.2 Nvidia Driver Version: 535.183.01 Pytorch version 2.3.1+cu121 spaCy version 3.7.5 Python version 3.8.10 Pipelines : ...

Encipher

2,724

asked Jul 11 at 15:49

0 votes

0 answers

48 views

Python app using cupy and cupyx fails with cl.exe not found: how to package to work with no cl.exe on target machine

The app is built in Python on Windows 10 and makes heavy use of cupy and cupyx.scipy.ndimage, and a few other cupyx libraries. It is distributable and it works. It now needs to go to a more secure ...

delicasso

187

asked Jun 12 at 5:28

0 votes

0 answers

29 views

Why does the compute sanitizer not detect leaks in CuPy kernels?

kernel = r""" extern "C" __global__ void entry0() { if (threadIdx.x == 0) malloc(16); return ; } """ import cupy as cp raw_module = cp....

Marko Grdinić

4,042

asked Jun 11 at 10:05

0 votes

1 answer

188 views

Access CUDAarray in CuPy using pointer from C++

I'm trying to allocate a CUDAarray (as in, texture memory) in C++ and pass the pointer up to CuPy. From there, I would like to treat it as an ndarray. Many examples show how to cudaMalloc() linear ...

Nicholas Masso

485

asked Jun 3 at 20:57

0 votes

1 answer

195 views

Cupy copy numpy array to existing device array

I would like to copy a NumPy array into an existing, pre-allocated GPU array. I've seen that CuPy offers the functions copy and copyto; however, the former does not allow specifying the destination ...

stavoltafunzia

77

asked May 30 at 11:52

1 vote

1 answer

241 views

Cannot use GPU for custom spaCy NER model

I'm trying to make a custom NER model using spaCy. When I try to leverage gpu it throws an error stating that Cupy is not installed even though it is. Attaching relevant info below. > ubuntu@:~$ ...

Daaku-C5

11

asked May 28 at 7:48

0 votes

1 answer

252 views

CuPy takes more time to preprocess the image?

import cv2 import numpy as np import cupy as cp import time def op_image(image): start_time = time.time() image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB) image = cv2.resize(image, (640, ...

RajeshKumar S

1

asked May 24 at 11:38

0 votes

0 answers

17 views

Assertion error when inserting large data into a spawned process using multiprocessing with a queue

I'm trying to spawn a new process and run a certain function with it that uses cupy to perform heavy computations on the GPU. Each process has the following worker instance (code snippet not the full ...

Idan Uri

1

asked May 21 at 8:46

0 votes

1 answer

326 views

Open3D can't call function 'read_point_cloud'/Module 'Open3D' has no attribute 'read_point_cloud'

So I cloned a git repository which registers point clouds using probabilistic methods such as GMMs (Gaussian Mixture Models) but also incorporates Open3D. Because the registration is running on ...

Ozymandias

13

asked May 21 at 1:57

1 vote

0 answers

69 views

Ensure same seed generates same random numbers when using numpy and cupy

I import numpy or cupy as follows: import numpy as np # import cupy as np Then I generate X as follows: np.random.seed(0) X = np.random.rand(4, 3) I get two very different matrices depending on ...

Atharva

181

asked May 20 at 18:37

0 votes

0 answers

30 views

When compiling Cuda modules with Cupy, how do I get the diagnostic results from the `-ptxas=v` option?

I am not sure if this is even the right question to ask, but when I compile the Cuda Visual Studio example project, there is a --ptxas=v option to turn on diagnostics which prints out the local memory ...

Marko Grdinić

4,042

asked May 18 at 15:45

0 votes

0 answers

96 views

How to use cuSOLVER Multi-GPU to get eigenvalues with CuPy

I have a large, dense, symmetric matrix (~50000x50000) that I want to calculate the eigenvalues of. I tried cupy.linalg.eigvalsh(), but ran out of memory on a single GPU. I know the cuSOLVER library ...

s769

3

asked May 17 at 11:24

4 votes

1 answer

236 views

Diagonalising matrices that are too large for gpu memory

I want to diagonalise matrices which are too large for the amount of memory available on the gpu. I am interested in any approaches that would allow me to gain some speed up over just diagonalising ...

J.L.

111

asked May 16 at 21:42

0 votes

0 answers

48 views

How can an application using CuPy be deployed? (VC++ dependency)

I've had no problems using the given documentation to install CuPy and develop with it on my own machine. But I'm seeing a roadblock in deploying applications using CuPy in a commercial setting to ...

Clyde

8,145

asked Apr 19 at 16:35

0 votes

0 answers

50 views

Achieving parallelism with multiprocessing on batches of data

I have data that I want to perform a batched process on using two GPUs in parallel, so I wrote the following worker class: class Worker(multiprocessing.Process): def __init__(self, gpu_id: int, ...

James

35

asked Apr 11 at 11:34

1 vote

1 answer

67 views

Runtime Error occurs when using torchsummary

My code is like below. import numpy as np import torch import torch.nn as nn import cupy as cp from torchviz import make_dot from torchinfo import summary from torchsummary import summary as summary_ ...

YJ C

13

asked Apr 11 at 7:58

0 votes

0 answers

79 views

cupy.transpose() not working the same as numpy.transpose()

I am trying to get the indices of the non-zero values of an array using cupy. I first used NumPy like so: np.transpose(np.nonzero(a)), which works just fine, but when changing it to cupy cp....

Fernando Chueca

17

asked Apr 6 at 13:21

1 vote

1 answer

299 views

How to get all available devices for CuPy?

How can I get all available devices for CuPy? I'm looking to write a version of this for CuPy: if is_torch(xp): devices = ['cpu'] import torch # type: ignore[import] num_cuda = torch.cuda....

Lucas Colley

21

asked Mar 31 at 10:47

0 votes

0 answers

44 views

Finite difference code for 2D and 3D arrays using CuPy ElementwiseKernel

I am trying to write a finite difference Python code for 2D and 3D arrays using the CuPy library. I want to use cupy.ElementwiseKernel() to speed up my code. Currently, I am having problems writing the ...

Harshit Tiwari

1

asked Mar 13 at 13:02

0 votes

0 answers

13 views

How do I get the PTX in text form when compiling with CuPy for both NVRTC and NVCC backends?

The warp matrix multiply functions are producing vomit in the SASS output, so I want to study the PTX itself to see whether that is the library's or the PTX's fault.

Marko Grdinić

4,042

asked Mar 9 at 9:58

0 votes

0 answers

20 views

When launching a kernel with cooperative threads, and it reduces the number of blocks per grid, how do I make that an error instead of a warning?

This is for a CuPy kernel. I get the following warning when I try to launch it. UserWarning: The grid size will be reduced from 48 to 24, as the specified grid size exceeds the limit. I am doing some ...

Marko Grdinić

4,042

asked Mar 8 at 9:55

2 votes

2 answers

102 views

Making masks based on euclidean distance with pyopencl, arrayfire or another python opencl library

I am doing 2D or 3D binary masks around given coordinates and then identifying them as labels with scipy.ndimage.label. Now, I have a cupy solution, a numpy solution. Cupy is fast, numpy is very slow, ...

João Mamede

147

asked Feb 27 at 21:32

0 votes

0 answers

145 views

cupy_backends.cuda.libs.curand.CURANDError: CURAND_STATUS_INITIALIZATION_FAILED

When I run cupy.random.seed(123), the error below occurred. >>> cupy.random.seed(123) Traceback (most recent call last): File "<stdin>", line 1, in <module> File &...

YJ C

13

asked Feb 27 at 1:15

0 votes

1 answer

235 views

jitify file not found

I am absolutely new to Python programming and VS Code. I want to do GPU programming, so I installed the CUDA toolkit, pip installed cupy and tried running GPU code, but I get this runtime error ../...

Anish Kumar

9

asked Feb 26 at 12:05

1 vote

1 answer

235 views

Using CuPy on Maxwell GPU

Anyone here trying to use cupy on a Maxwell GPU? I am trying to do a simple array.mean() operation and getting the message below. Is there a way I can get around this? Do I need to install a different ...

Stanley Powerlock

11

asked Feb 22 at 22:32

1 vote

0 answers

149 views

Saving a Cupy array directly to JPEG without converting to NumPy

I'm currently facing a challenge in my project where I need to save large Cupy arrays directly to JPEG files without the intermediate step of converting them to NumPy arrays due to performance ...

nyar

97

asked Feb 22 at 8:54

-2 votes

2 answers

122 views

How do I parallelize a set of matrix multiplications

Consider the following operation, where I take 20 x 20 slices of a larger matrix and dot product them with another 20 x 20 matrix: import numpy as np a = np.random.rand(10, 20) b = np.random.rand(20, ...

anonymous1a

1,270

asked Feb 18 at 2:22

3 votes

0 answers

352 views

cupy cooperative_groups.h: [jitify] File not found

from cupyx.scipy.signal import convolve2d as convolve2d_gpu convolved_image_using_GPU = convolve2d_gpu(deltas_gpu, gauss_gpu) %timeit -n 7 -r 1 convolved_image_using_GPU = convolve2d_gpu(deltas_gpu, ...

Amritesh

41

asked Feb 8 at 10:49

0 votes

3 answers

124 views

More efficient way of looping over a multidimensional numpy array other than numpy.where

I have a nested array of shape: [200, 500, 1000]. Each index represents a coordinate of an image, eg array[1, 2, 3] would give me the value of the array at x=1, y=2, and z=3 in coordinate space. I ...

postnubilaphoebus

136

asked Feb 7 at 20:02

-1 votes

2 answers

127 views

Fast tensor-dot on sparse arrays with GPU in any programming language?

I'm now working on two multi-dimensional arrays arr and cost. arr is dense with size (width, height, m, n) and cost is sparse with size (width, height, width, height). Values: width and height are ...

C.K.

1,559

asked Feb 2 at 13:59

3 votes

2 answers

345 views

cupy runtime compilation failed

I'm new to cupy and am trying to learn it. The following code produces an error using cuda11 import numpy import cupy def monte_carlo_gpu(n:int, m:int)-> float: accum = 0 for i in range(m):...

Harena2019

41

asked Feb 2 at 9:58

0 votes

0 answers

45 views

How to convert chainer.Variable to PyTorch Tensor?

When I try to run a piece of code from neural_renderer, it reports the following error. The code is based on Cuda 9.2, and I have to upgrade to Cuda 11.1 in order to support the latest GPU, chainer upgrade ...

Lamp

342

asked Feb 1 at 9:39

0 votes

1 answer

145 views

How do I pass the `--gpu-architecture=compute_89` option to an NVRTC kernel with CuPy?

cp.RawModule(code=kernel, backend='nvrtc', options=('--gpu-architecture=compute_89',)) When I try to do it like this, I get an error that the option has already been passed in. Do I have to build the ...

Marko Grdinić

4,042

asked Jan 30 at 17:48
