0 votes
0 answers
18 views

Code Optimization for For Loops in Task Parameterized Gaussian Mixture Models

I'm currently developing TPGMM from Calinon's work (https://calinon.ch/papers/Calinon-JIST2015.pdf). I'm dealing with large matrix operations using CuPy as shown in the code below. However, I'm having ...
Loh Jia Quan's user avatar
0 votes
1 answer
20 views

cupy.nanargmax throwing exception

I have a 2D array allocated on the GPU and I need to use CuPy's nanargmax() function to find the maximum value's index in each row. Some of the values could be NaN. Since the 2D array is quite large (...
skm's user avatar
  • 5,639
0 votes
1 answer
73 views

Named symbol not found when using CuPy to invoke a CUDA kernel

This is my CUDA kernel: https://pastebin.com/ti95Qy2p, and I want to invoke the compute_linear_recurrence method in this kernel. But when I use this code: import cupy as cp # code_str is code in https://...
forestbat's user avatar
  • 901
0 votes
1 answer
37 views

CompileException occurs when compiling a .cu file with CuPy

I have a .cu file with these headers: #include </usr/include/features.h> #include </usr/include/assert.h> #include </usr/include/stdio.h> When I use the nvcc command to compile this file, ...
forestbat's user avatar
  • 901
1 vote
1 answer
172 views

Batched matrix multiplication with JAX on GPU faster with larger matrices

I'm trying to perform batched matrix multiplication with JAX on GPU, and noticed that it is ~3x faster to multiply shapes (1000, 1000, 3, 35) @ (1000, 1000, 35, 1) than it is to multiply (1000, 1000, ...
Nin17's user avatar
  • 3,442
1 vote
1 answer
88 views

Understanding the permutation test

I'm attempting to optimize the performance of the permutation test implemented in scipy.stats. My dataset consists of 500,000 observations, each associated with 2,000 binary covariates. I've applied ...
Dan Bolser's user avatar
0 votes
1 answer
66 views

Raw kernel with dynamically allocated shared memory

Consider the following CUDA kernel that is used in Python via CuPy from the CuPy docs add_kernel = cp.RawKernel(r''' extern "C" __global__ void my_add(const float* x1, const float* x2, float*...
Uwe.Schneider's user avatar
3 votes
1 answer
128 views

Why is (x / y)[i] faster than x[i] / y[i]?

I'm new to CuPy and CUDA/GPU computing. Can someone explain why (x / y)[i] is faster than x[i] / y[i]? When taking advantage of GPU accelerated computations, are there any guidelines that would allow me ...
huang's user avatar
  • 1,167
0 votes
0 answers
16 views

AttributeError: module 'cupy' has no attribute 'ctypeslib'

I'm trying to do the equivalent of from numpy.ctypeslib import ndpointer, but with cupy. Any idea when this will be implemented, or whether there is a workaround?
Zohim Chandani's user avatar
1 vote
0 answers
106 views

Solving sparse linear system on GPU is much slower than CPU

Below is the code which solves a sparse linear system: import cupyx import cupyx.scipy.sparse.linalg import time import scipy import scipy.sparse.linalg import pathlib file_dir = str(pathlib.Path(...
Andy's user avatar
  • 11
1 vote
0 answers
29 views

Manual indexing with multidimensional cupy ndarray in user defined kernels

In the cupy docs on user defined kernels (https://docs.cupy.dev/en/stable/user_guide/kernel.html), there is a section defining certain variables that are predefined, like _ind.size() and i for things ...
rak's user avatar
  • 21
0 votes
0 answers
15 views

How do I redirect the Cuda kernel IO in Cupy?

import sys class Logger: def __init__(self, filename): self.console = sys.stdout self.file = open(filename, 'w') def write(self, message): self.console.write(message) ...
Marko Grdinić's user avatar
0 votes
0 answers
34 views

Any support for cuTENSORMg in python?

I have recently been looking into cuTENSOR (+cupy) for a speedy tensor contraction GPU library, and have been wanting to extend my single GPU code to multi GPU distributed code via cuTENSORMg; however,...
rak's user avatar
  • 21
0 votes
1 answer
97 views

cuSPARSELt not found by CuPy

I have a hard time getting CuPy to detect and use, where applicable, the cuSPARSELt library in Windows. I tried installing versions 0.2.0 (as mentioned by CuPy's installation guide) and 0.6.2 (the ...
srcLegend's user avatar
0 votes
0 answers
26 views

send/recv block in CuPy

I want to send a cupy array from one node to the other. The sender has the following code: import cupy import cupyx.distributed import torch.multiprocessing as mp def send(): cupy.cuda.Device(0)....
edhu's user avatar
  • 451
1 vote
1 answer
64 views

Fast square of absolute value of complex numbers with cupy or otherwise

When one is comparing the magnitudes of complex numbers (essentially sqrt(real² + imag²)) to find the largest absolute values, it would suffice to compare the square of the absolute values, thereby ...
Mikael's user avatar
  • 33
2 votes
0 answers
39 views

cupy.linalg.solve for positive definite matrix?

It seems like cupy.linalg.solve doesn't have an option for me to solve a linear system Ax=b assuming A is positive definite? I am looking for something like scipy.linalg.solve where one can actually tell ...
zvi's user avatar
  • 21
0 votes
1 answer
193 views

Cannot use GPU, CuPy is not installed

I have a GPU enabled machine. O.S: Ubuntu 20.04.6 LTS nvcc version: 12.2 Nvidia Driver Version: 535.183.01 Pytorch version 2.3.1+cu121 spaCy version 3.7.5 Python version 3.8.10 Pipelines : ...
Encipher's user avatar
  • 2,724
0 votes
0 answers
48 views

Python app using cupy and cupyx fails with cl.exe not found: how to package to work with no cl.exe on target machine

The app is built in Python on Windows 10 and makes heavy use of cupy and cupyx.scipy.ndimage, and a few other cupyx libraries. It is distributable and it works. It now needs to go to a more secure ...
delicasso's user avatar
  • 187
0 votes
0 answers
29 views

Why does the compute sanitizer not detect leaks in CuPy kernels?

kernel = r""" extern "C" __global__ void entry0() { if (threadIdx.x == 0) malloc(16); return ; } """ import cupy as cp raw_module = cp....
Marko Grdinić's user avatar
0 votes
1 answer
188 views

Access CUDAarray in CuPy using pointer from C++

I'm trying to allocate a CUDAarray (as in, texture memory) in c++ and pass the pointer up to CuPy. From there, I would like to treat it as an ndarray. Many examples show how to cudaMalloc() linear ...
Nicholas Masso's user avatar
0 votes
1 answer
195 views

Cupy copy numpy array to existing device array

I would like to copy a numpy array into an existing, pre-allocated GPU array. I've seen that cupy offers the functions copy and copyto, however the former does not allow specifying the destination ...
stavoltafunzia's user avatar
1 vote
1 answer
241 views

Cannot use GPU for custom spaCy NER model

I'm trying to make a custom NER model using spaCy. When I try to leverage gpu it throws an error stating that Cupy is not installed even though it is. Attaching relevant info below. > ubuntu@:~$ ...
Daaku-C5's user avatar
0 votes
1 answer
252 views

CuPy takes more time to preprocess the image?

import cv2 import numpy as np import cupy as cp import time def op_image(image): start_time = time.time() image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB) image = cv2.resize(image, (640, ...
RajeshKumar S's user avatar
0 votes
0 answers
17 views

Assertion error when inserting large data into a spawned process using multiprocessing with a queue

I'm trying to spawn a new process and run a certain function with it that uses cupy to perform heavy computations on the GPU. Each process has the following worker instance (code snippet not the full ...
Idan Uri's user avatar
0 votes
1 answer
326 views

Open3D can't call function 'read_point_cloud'/Module 'Open3D' has no attribute 'read_point_cloud'

So I cloned a git repository which registers point clouds using probabilistic methods such as GMMs (Gaussian Mixture Models) but also incorporates Open3D. Because the registration is running on ...
Ozymandias's user avatar
1 vote
0 answers
69 views

Ensure same seed generates same random numbers when using numpy and cupy

I import numpy or cupy as follows: import numpy as np # import cupy as np Then I generate X as follows: np.random.seed(0) X = np.random.rand(4, 3) I get two very different matrices depending on ...
Atharva's user avatar
  • 181
0 votes
0 answers
30 views

When compiling Cuda modules with Cupy, how do I get the diagnostic results from the `-ptxas=v` option?

I am not sure if this is even the right question to ask, but when I compile the Cuda Visual Studio example project, there is a --ptxas=v option to turn on diagnostics which prints out the local memory ...
Marko Grdinić's user avatar
0 votes
0 answers
96 views

How to use cuSOLVER Multi-GPU to get eigenvalues with CuPy

I have a large, dense, symmetric matrix (~50000x50000) whose eigenvalues I want to calculate. I tried cupy.linalg.eigvalsh(), but ran out of memory on a single GPU. I know the cuSOLVER library ...
s769's user avatar
  • 3
4 votes
1 answer
236 views

Diagonalising matrices that are too large for gpu memory

I want to diagonalise matrices which are too large for the amount of memory available on the gpu. I am interested in any approaches that would allow me to gain some speed up over just diagonalising ...
J.L.'s user avatar
  • 111
0 votes
0 answers
48 views

How can an application using CuPy be deployed? (VC++ dependency)

I've had no problems using the given documentation to install CuPy and develop with it on my own machine. But I'm seeing a roadblock in deploying applications using CuPy in a commercial setting to ...
Clyde's user avatar
  • 8,145
0 votes
0 answers
50 views

Achieving parallelism with multiprocessing on batches of data

I have data that I want to perform a batched process on using two GPUs in parallel, so I wrote the following worker class: class Worker(multiprocessing.Process): def __init__(self, gpu_id: int, ...
James's user avatar
  • 35
1 vote
1 answer
67 views

Runtime Error occurs when using torchsummary

My code is as follows. import numpy as np import torch import torch.nn as nn import cupy as cp from torchviz import make_dot from torchinfo import summary from torchsummary import summary as summary_ ...
YJ C's user avatar
  • 13
0 votes
0 answers
79 views

cupy.transpose() not working the same as numpy.transpose()

I am trying to get the indices of the non-zero values of an array using cupy. I first used NumPy like so: np.transpose(np.nonzero(a)), which works just fine, but when changing it to cupy cp....
Fernando Chueca's user avatar
1 vote
1 answer
299 views

How to get all available devices for CuPy?

How can I get all available devices for CuPy? I'm looking to write a version of this for CuPy: if is_torch(xp): devices = ['cpu'] import torch # type: ignore[import] num_cuda = torch.cuda....
Lucas Colley's user avatar
0 votes
0 answers
44 views

Finite difference code for 2D and 3D arrays using CuPy ElementwiseKernel

I am trying to write finite difference Python code for 2D and 3D arrays using the CuPy library. I want to use cupy.ElementwiseKernel() to speed up my code. Currently, I am having trouble writing the ...
Harshit Tiwari's user avatar
0 votes
0 answers
13 views

How do I get the PTX in text form when compiling with CuPy for both NVRTC and NVCC backends?

The warp matrix multiply functions are producing vomit in the SASS output, so I want to study the PTX itself to see whether that is the library's or the PTX's fault.
Marko Grdinić's user avatar
0 votes
0 answers
20 views

When launching a kernel with cooperative threads, and it reduces the number of blocks per grid, how do I make that an error instead of a warning?

This is for a CuPy kernel. I get the following warning when I try to launch it. UserWarning: The grid size will be reduced from 48 to 24, as the specified grid size exceeds the limit. I am doing some ...
Marko Grdinić's user avatar
2 votes
2 answers
102 views

Making masks based on euclidean distance with pyopencl, arrayfire or another python opencl library

I am doing 2D or 3D binary masks around given coordinates and then identifying them as labels with scipy.ndimage.label. Now, I have a cupy solution, a numpy solution. Cupy is fast, numpy is very slow, ...
João Mamede's user avatar
0 votes
0 answers
145 views

cupy_backends.cuda.libs.curand.CURANDError: CURAND_STATUS_INITIALIZATION_FAILED

When I run cupy.random.seed(123), the error below occurred. >>> cupy.random.seed(123) Traceback (most recent call last): File "<stdin>", line 1, in <module> File &...
YJ C's user avatar
  • 13
0 votes
1 answer
235 views

jitify file not found

I am absolutely new to Python programming and VS Code. I want to do GPU programming, so I installed the CUDA toolkit, pip installed cupy and tried running GPU code, but I get this runtime error ../...
Anish Kumar's user avatar
1 vote
1 answer
235 views

Using CuPy on Maxwell GPU

Anyone here trying to use cupy on a Maxwell GPU? I am trying to do a simple array.mean() operation and getting the message below. Is there a way I can get around this? Do I need to install a different ...
Stanley Powerlock's user avatar
1 vote
0 answers
149 views

Saving a Cupy array directly to JPEG without converting to NumPy

I'm currently facing a challenge in my project where I need to save large Cupy arrays directly to JPEG files without the intermediate step of converting them to NumPy arrays due to performance ...
nyar's user avatar
  • 97
-2 votes
2 answers
122 views

How do I parallelize a set of matrix multiplications

Consider the following operation, where I take 20 x 20 slices of a larger matrix and dot product them with another 20 x 20 matrix: import numpy as np a = np.random.rand(10, 20) b = np.random.rand(20, ...
anonymous1a's user avatar
  • 1,270
3 votes
0 answers
352 views

cupy cooperative_groups.h: [jitify] File not found

from cupyx.scipy.signal import convolve2d as convolve2d_gpu convolved_image_using_GPU = convolve2d_gpu(deltas_gpu, gauss_gpu) %timeit -n 7 -r 1 convolved_image_using_GPU = convolve2d_gpu(deltas_gpu, ...
Amritesh's user avatar
0 votes
3 answers
124 views

More efficient way of looping over a multidimensional numpy array other than numpy.where

I have a nested array of shape [200, 500, 1000]. Each index represents a coordinate of an image, e.g. array[1, 2, 3] would give me the value of the array at x=1, y=2, and z=3 in coordinate space. I ...
postnubilaphoebus's user avatar
-1 votes
2 answers
127 views

Fast tensor-dot on sparse arrays with GPU in any programming language?

I'm now working on two multi-dimensional arrays arr and cost. arr is dense with size (width, height, m, n) and cost is sparse with size (width, height, width, height). Values: width and height are ...
C.K.'s user avatar
  • 1,559
3 votes
2 answers
345 views

cupy runtime compilation failed

I'm new to cupy and trying to learn it. The following code produces an error using cuda11 import numpy import cupy def monte_carlo_gpu(n:int, m:int)-> float: accum = 0 for i in range(m):...
Harena2019's user avatar
0 votes
0 answers
45 views

How to convert chainer.Variable to PyTorch Tensor?

When I try to run a piece of code from neural_renderer, it reports the following error. The code is based on Cuda 9.2, and I have to upgrade to Cuda 11.1 in order to support the latest GPU, chainer upgrade ...
Lamp's user avatar
  • 342
0 votes
1 answer
145 views

How do I pass `--gpu-architecture=compute_89` to an NVRTC kernel with CuPy?

cp.RawModule(code=kernel, backend='nvrtc', options=('--gpu-architecture=compute_89',)) When I try to do it like this, I get an error that the option has already been passed in. Do I have to build the ...
Marko Grdinić's user avatar
