364 questions
0
votes
0
answers
18
views
The curious gap in time cost for QKV computation in LLM inference
I use Nsight Systems to profile the LLM inference process in the HuggingFace Transformers framework. I observe that the time for q_proj, k_proj and v_proj varies significantly. As far as I know, the Q, K ...
0
votes
1
answer
61
views
What do the Instruction Statistics fields in Nsight Compute mean? How do they relate to elapsed cycles?
In my example, what is the meaning of 'Executed Instructions'? According to the literal meaning, it would mean how many instructions have been executed.
But how does it relate to the total run time (...
0
votes
1
answer
96
views
How can I create a container in which to use the Nvidia Nsight Systems graphical interface?
I am looking to create a container in which I can work with the graphical interface of the NVIDIA Nsight Systems tool, to be able to obtain application reports with CUDA and Python. I have found ...
1
vote
1
answer
304
views
How to check my tensor core occupancy and utilization with Nsight Compute?
In my CUDA program, I use many tensor core operations like m8n8k4 and even use cusparseSpMV. However, when checking the ncu report, it shows the following:
There are no active tensors in my program. The ...
1
vote
0
answers
149
views
How to generate a roofline analysis with Nsight?
When I was trying to analyze the performance of a kernel, I used the ncu command to generate a report. However, it didn't display the roofline analysis under the section "GPU Speed of Light Throughput"...
0
votes
0
answers
26
views
Nsight Compute + Roofline chart
I am new to using Nsight Compute and have a question about the roofline chart. When I profile different kernels on Nsight Compute and view their roofline charts, nothing is shown for some kernels, ...
0
votes
0
answers
76
views
Incompatible Qt library when running Nsight Compute ncu-ui
I am using Ubuntu 22.04 (x86_64 architecture) and my goal is to run NVIDIA Nsight Compute ncu-ui command to visualize some GPU performance profiling outcomes. When I run ncu-ui, the following message ...
1
vote
1
answer
714
views
How to get CUDA syntax highlighting in the Nsight VSCode extension when the CUDA toolkit is installed by Conda?
I'm using Fedora 39 and installed cudatoolkit using conda install in a conda env (not base). When inside the conda env, I can do nvcc foo.cu && ./a.out and it works fine. (when I do which nvcc,...
3
votes
0
answers
116
views
Compute and data transfer not happening concurrently in CUDA streams on iteration 2
I have written a basic program where a chunk of data is loaded in CPU memory (Pinned), and then I transfer it in chunks to GPUs (Asynchronously), and then do computation on each chunk. So for each ...
1
vote
1
answer
366
views
Problems when profiling LLM training using "huggingface/accelerate" with Nsight Systems
I am training the Llama model in a multi-node environment using huggingface/accelerate, and if I run it as follows to profile it, the program dies due to a problem with the ssh connection to ...
1
vote
0
answers
131
views
How to debug an OpenGL shader per pixel in Nsight Graphics, like in RenderDoc
In RenderDoc it's easy, but this feature doesn't support OpenGL.
So I tried to use Nsight.
But I don't know how to reproduce this operation in Nsight. Its interface is too complex, and I also ...
0
votes
0
answers
102
views
CUDA profiling of aten::mul: does it only include computation time, or does it include time to access memory?
The following results were obtained with PyTorch CUDA profiling.
----------------------------------------------------------------------------
Name Self CPU% ... Self CUDA ...
0
votes
0
answers
183
views
How can simple warps cause low warp occupancy and high register usage?
During the warp occupancy investigation of my G-buffer pass, I found that even if I simplify the scene and the shader, Nsight still reports a very low warp occupancy, or even much lower than the ...
1
vote
0
answers
277
views
Can NVIDIA Nsight still be used to debug shaders?
Numerous online resources claim it is possible to debug OpenGL shaders using NVIDIA Nsight Visual Studio Edition. Here is an old video of it being done.
However, the Nsight VSE page mentions "the ...
1
vote
0
answers
46
views
How to capture a baker program running without a rendering window using Nsight
I'm writing a DXR baker program. Since it's just a baker generating ray-traced results to a buffer, I didn't write any rendering window for it. It just keeps calling DXR's dispatchRay API in a ...
1
vote
1
answer
271
views
CUDA math function register usage
I am trying to understand the significant register usage incurred when using a few of the built-in CUDA math ops like atan2() or division and how the register usage might be reduced/eliminated.
I'm ...
2
votes
1
answer
660
views
Roofline Model with CUDA Manual vs. Nsight Compute
I have a very simple vector addition kernel written for CUDA.
I want to calculate the arithmetic intensity as well as GFLOP/s for this Kernel.
The values I calculate differ visibly from the values ...
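As a point of reference for the manual calculation: c[i] = a[i] + b[i] performs one floating-point add per element while moving three values through memory (load a[i], load b[i], store c[i]), so with 4-byte floats the idealized arithmetic intensity is 1 FLOP per 12 bytes. The helpers below use hypothetical names (they are not from the question) and assume every element crosses DRAM exactly once; Nsight Compute's roofline is built from measured hardware counters, so its figures need not match this idealization exactly.

```cpp
#include <cassert>
#include <cstddef>

// Idealized arithmetic intensity (FLOP per byte) of c[i] = a[i] + b[i]
// over n elements of elem_bytes each: 1 add per element versus
// 2 loads + 1 store per element. Assumes no cache reuse, so n cancels out.
double vector_add_intensity(std::size_t n, std::size_t elem_bytes) {
    assert(n > 0 && elem_bytes > 0);
    const double flops = static_cast<double>(n);
    const double bytes = 3.0 * static_cast<double>(n) * static_cast<double>(elem_bytes);
    return flops / bytes;
}

// Achieved GFLOP/s for n additions completed in `seconds` (e.g. a measured kernel duration).
double vector_add_gflops(std::size_t n, double seconds) {
    assert(seconds > 0.0);
    return static_cast<double>(n) / seconds / 1e9;
}
```

Plotting `vector_add_gflops(n, t)` against `vector_add_intensity(n, sizeof(float))` gives one point on a hand-drawn roofline; for doubles the intensity halves to 1/24 FLOP/byte.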
1
vote
1
answer
440
views
Power Usage Profiling in Nsight?
New to Nsight and GPU programming. I need a way to evaluate the effect my code has on power usage in the GPU.
This article from 2013 shows that the feature was part of Nsight's toolset at some point, ...
0
votes
2
answers
10k
views
Nsys CLI profiling guidance
I am just entering into the CUDA development world and now trying to profile my code. Expected to run the nvprof tool for profiling, but get the following error:
======== Warning: This version of ...
0
votes
1
answer
902
views
How to use ncu command to profile average time/usage/etc for a kernel repeating 10 times?
For example, I have a test program for 5 kernels:
int main()
{
for (int i = 0; i < 10; i++){
kernel_1<<<...>>>(...); // warm up
}
for (int i = 0; i < 10; i++...
1
vote
1
answer
2k
views
Trouble using Nsight Compute on Google Colab: 'command not found' error with ncu and installation script error with Nsight Compute
I am trying to use ncu on Colab; however, when I type
ncu
/bin/bash: ncu: command not found
A few days ago this command was working fine, I am unsure if I am making some mistakes in the code or if it ...
1
vote
2
answers
1k
views
How to get average execution time of CUDA kernel using NSight Systems or NSight Compute
Suppose I have a simple CLI test app named "Foo". This app executes a kernel "Bar" 100 times in a loop.
How may I obtain an average kernel execution time for Bar, using Nsight ...
0
votes
1
answer
760
views
Error in profiling shared memory atomic kernel in Nsight Compute
I am trying the global atomics vs shared atomics code from NVIDIA blog https://developer.nvidia.com/blog/gpu-pro-tip-fast-histograms-using-shared-atomics-maxwell/
But when I am trying to profile with ...
0
votes
1
answer
3k
views
Failed to open dynamic library RTSSVkLayer64.dll when using NVIDIA Nsight Graphics to debug app
I am writing a tiny game app in Rust and use Vulkan as the graphics API.
Debugging my app in RenderDoc works perfectly, but something went wrong when I tried to debug it in NVIDIA Nsight. ...
1
vote
2
answers
567
views
How do I profile OpenMP offloading code compiled by clang
I am currently working with OpenMP offloading using LLVM/clang-16 (built from the github repository). Using the built-in profiling tools in clang (using environment variables such as ...
0
votes
1
answer
376
views
How to use CUPTI to get metrics related to Launch Metrics, Source Metrics and Instructions Per Opcode Metrics
I am able to use ncu to get the metrics related to Launch Metrics, Source Metrics and Instructions Per Opcode Metrics (found here). However I am unable to use CUPTI to get the values after modifying ...
0
votes
1
answer
313
views
Command to run callback_profiling sample from CUPTI
I am running the sample code available for Nvidia CUDA CUPTI in /usr/local/cuda-11.8/extras/CUPTI/samples/callback_profiling. There is a Makefile, but I want to run it using single command (without ...
0
votes
1
answer
282
views
Difference in SASS using cuobjdump and Nsight Compute
I have a simple kernel as
__global__ void hello_cuda() {
int a = 10;
printf("hello from GPU\n");
}
When I use Nsight Compute to see the Source and SASS section, I see:
# Address ...
0
votes
1
answer
1k
views
Nsight Compute not showing achieved occupancy in the metrics
I want to calculate the achieved occupancy and compare it with the value that is being displayed in Nsight Compute.
ncu says: Theoretical Occupancy [%] 100, and Achieved Occupancy [%] 93,04. What ...
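Nsight Compute describes achieved occupancy as the ratio of the average number of active warps per active cycle to the maximum number of warps the SM supports. A minimal sketch of that ratio, assuming a hypothetical list of per-cycle active-warp counts for one SM (the real tool derives this from hardware counters, and the per-SM warp limit depends on the GPU architecture):

```cpp
#include <cassert>
#include <vector>

// Average number of resident warps, sampled only over cycles where the SM was active.
double average_active_warps(const std::vector<int>& active_warps_per_cycle) {
    assert(!active_warps_per_cycle.empty());
    long long total = 0;
    for (int w : active_warps_per_cycle) total += w;
    return static_cast<double>(total) / active_warps_per_cycle.size();
}

// Achieved occupancy in percent: average active warps relative to the SM's warp limit.
double achieved_occupancy_pct(const std::vector<int>& active_warps_per_cycle,
                              int max_warps_per_sm) {
    assert(max_warps_per_sm > 0);
    return 100.0 * average_active_warps(active_warps_per_cycle) / max_warps_per_sm;
}
```

Because the average includes cycles where some warps have already exited (for example, at the tail of a kernel), achieved occupancy is typically somewhat below the theoretical value, as in the 93.04% vs 100% reported above.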
0
votes
2
answers
105
views
The number of times to run a profiling experiment
I am trying to profile a CUDA application. I had a basic question about performance analysis and workload characterization of HPC programs. Let us say I want to analyse the wall-clock time (the end-to-end ...
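A common, tool-independent way to decide how many repetitions are enough is to summarize the measured wall-clock times with their mean and sample standard deviation, and keep adding runs until the spread is small relative to the mean. The sketch below covers only that summary step; the struct and function names are hypothetical, and the stopping threshold is left open:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct RunStats {
    double mean;
    double stddev;  // sample standard deviation (divides by n - 1)
};

// Summarize wall-clock times (any consistent unit) from repeated runs.
// At least two runs are required for the sample standard deviation.
RunStats summarize_runs(const std::vector<double>& times) {
    assert(times.size() >= 2);
    double sum = 0.0;
    for (double t : times) sum += t;
    const double mean = sum / times.size();
    double sq = 0.0;
    for (double t : times) sq += (t - mean) * (t - mean);
    return {mean, std::sqrt(sq / (times.size() - 1))};
}
```

Reporting `stddev / mean` alongside the mean makes it easy to see whether additional runs are still changing the picture.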
1
vote
1
answer
818
views
Nsight Compute profiling of a __device__ function in a kernel
I am trying to use Nsight Compute to profile kernels in my CUDA code. But how do I profile functions inside a kernel? Say for example, I have 2 functions (device functions) in a kernel (global). ...
2
votes
1
answer
323
views
Can Nsight Systems use debug info URLs?
So I am on Arch Linux and the libraries from the official repositories do not ship with debug symbols. To work around this in most debugging tools, one can use DEBUGINFOD_URLS=https://debuginfod....
0
votes
2
answers
679
views
CUDA kernel launched from Nsight Compute gives inconsistent results
I have completed writing my CUDA kernel, and confirmed it runs as expected when I compile it using nvcc directly, by:
Validating with test data over 100 runs (just in case)
Using cuda-memcheck (...
0
votes
1
answer
3k
views
Nsys does not show the CUDA kernel profiling output
My system is V100 with the following information:
| NVIDIA-SMI 450.80.02 Driver Version: 450.80.02 CUDA Version: 11.6 |
NVIDIA Nsight Systems version 2021.5.2.53-28d0e6e
sudo sh -c "echo 2 >/proc/...
0
votes
1
answer
2k
views
How to see NVTX markers in Nvidia Nsight Systems? With host and guest being the same Windows machine
I am trying to profile CPU/GPU applications using the Nsight suite.
Currently trying to understand a stuttering problem, I added a range around the simulation step (taking place on the CPU):
#include <...
0
votes
1
answer
532
views
Cuda-gdb in vscode, Cannot find user-level thread for LWP 4077: generic error
I am trying to set up CUDA programming in VS Code and ran into this problem where cuda-gdb just returns an error. I tried running it with regular gdb and that works. I am using WSL.
running the "...
1
vote
1
answer
215
views
OpenGL - Is there a way to track actually used memory allocated by glBufferData / glBufferSubData?
There is a big codebase which allocates empty fixed size of GPU memory using glBufferData function, and fills/updates these empty allocated space partially using glBufferSubData. Since not all of the ...
1
vote
2
answers
785
views
VSCode fail to debug Cython wrapped CUDA code (but CLI cuda-gdb can)
Background: Running VSCode on Ubuntu 20.04
The following have been accomplished:
(a) Compiled and built the Cython wrapper for CUDA code (packaged as a shared library .so);
(b) Python script importing ...
1
vote
0
answers
1k
views
How to get detailed Nvidia GPU usage?
Nvidia-smi only provides a few metrics to measure GPU utilization. Most importantly, utilization.gpu represents the percent of time over the past sample period during which one or more kernels was ...
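The quoted definition means utilization.gpu only measures whether some kernel was running, not how much of the GPU it used: by that definition, a single small kernel that keeps one SM busy for the whole sample period would still count as 100%. The toy model below (a hypothetical function, not how the driver actually samples) computes that percentage from kernel start/end times within one window, counting overlapping kernels only once:

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Percentage of the window [0, window) during which at least one kernel
// interval [start, end) was active. Intervals are clipped to the window and
// merged, so concurrent kernels do not raise the value above 100%.
double busy_percent(std::vector<std::pair<double, double>> kernels, double window) {
    assert(window > 0.0);
    std::sort(kernels.begin(), kernels.end());
    double busy = 0.0, cur_start = 0.0, cur_end = 0.0;
    bool open = false;
    for (const auto& k : kernels) {
        const double s = std::max(k.first, 0.0);
        const double e = std::min(k.second, window);
        if (e <= s) continue;                 // outside the window or empty
        if (open && s <= cur_end) {
            cur_end = std::max(cur_end, e);   // overlaps the current busy span
        } else {
            if (open) busy += cur_end - cur_start;
            cur_start = s;
            cur_end = e;
            open = true;
        }
    }
    if (open) busy += cur_end - cur_start;
    return 100.0 * busy / window;
}
```

Nothing in this calculation looks at SM count, warps, or memory bandwidth, which is why finer-grained counters (for example, from Nsight Systems or Nsight Compute) are needed for a detailed picture.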
0
votes
1
answer
929
views
Nsight Graphics and RenderDoc cannot trace application
I am stuck writing a Vulkan renderer. The final output I see on the screen is only the clear color, animated over time, but no geometry. Even with all possible validation turned on, I don't get any ...
1
vote
1
answer
116
views
What is the `ipa` pipeline in the CUDA architecture?
When looking into ncu --query-metrics, it turns out that several counters are about this ipa pipeline, which isn't even mentioned in the Nsight docs, smsp__inst_executed_pipe_ipa for example. While for all of ...
1
vote
0
answers
316
views
Nvidia Nsight crashes when creating BLAS. What could be the cause?
EDIT: I found the mistake in the code. I mistakenly set up the "max_primive_count" to 3, but it should be 1, since I only wanted to display one single triangle. Also the maxVertex should be ...
0
votes
1
answer
2k
views
nsys profile multiple processes
I'd like to experiment with MPS on NVIDIA GPUs, therefore I'd like to be able to profile two processes running in parallel.
With the, now deprecated, nvprof, there used to be an option "--profile-...
1
vote
0
answers
340
views
Nsys profile with MPMD (multiple program, multiple data) simulation
I am trying to profile an MPI+OpenACC program with nsys.
I am using OpenMPI (3.1.6) from the NVIDIA HPC SDK (20.7) with UCX enabled.
There are three executables, exec1, exec2, exec3. I want to profile for ...
-2
votes
1
answer
557
views
Nsight Compute - expecting bank conflicts but not detecting any
I was trying to detect shared memory bank conflicts for matrix transposition kernels. The first kernel performs matrix transposition without padding, and hence should have bank conflicts, while the ...
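For reference, the textbook expectation for the unpadded case can be checked by hand. Assuming the common shared-memory layout of 32 banks of 4-byte words (bank = word index mod 32), a warp in which thread t reads tile[t][col] from a 32x32 float tile hits a single bank 32 times, while padding each row to 33 words spreads the same reads over all 32 banks. The sketch below (hypothetical helper names) only models this address arithmetic; it says nothing about which Nsight Compute metric reports the conflicts:

```cpp
#include <cassert>
#include <set>

// Assumed bank model: 32 banks, each 4 bytes wide. Element tile[row][col] of a
// row-major tile with `row_pitch` 4-byte words per row lives at word
// row * row_pitch + col, and its bank is that word index mod 32.
int bank_of(int row, int col, int row_pitch) {
    return (row * row_pitch + col) % 32;
}

// Distinct banks touched when the 32 threads of a warp read one column
// (thread t reads tile[t][col]). 32 means conflict-free; 1 means a 32-way conflict.
int distinct_banks_for_column_read(int col, int row_pitch) {
    std::set<int> banks;
    for (int t = 0; t < 32; ++t) banks.insert(bank_of(t, col, row_pitch));
    return static_cast<int>(banks.size());
}
```

With `row_pitch = 32` every column read maps to one bank; with `row_pitch = 33` the bank becomes (t + col) mod 32, which is distinct for every thread in the warp.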
0
votes
1
answer
2k
views
Tracing custom CUDA kernels with Nsight Systems
I work on library which is implemented in C++20 and CUDA 11. This library is called from Python via ctypes through a C API that just exchanges JSON strings. We compile it using Clang 11.
In order to ...
1
vote
1
answer
1k
views
"Start Performance Analysis" button missing on Nsight + Visual Studio
I usually debug my kernel and check timing with the "Start Performance Analysis" button.
It was shown when I used CUDA 10.2 with an RTX Titan V.
But that button is no longer shown since I upgraded CUDA version ...
1
vote
1
answer
2k
views
NVIDIA Nsight Systems CLI not getting memory statistics
I'm using the NVIDIA Nsight Systems CLI (nsys) to profile a simple CUDA program (vector addition). I've already checked the documentation, but I think I'm missing something.
I'm running the nsys profile ...
-3
votes
1
answer
528
views
CUDA debugging using a single GPU with Visual Studio
I am working on Windows 7, Visual Studio 2010.
Can we debug CUDA code using a single GPU that also drives the monitor on the same PC?
What tools are available? Nsight seems to be working ...
0
votes
1
answer
426
views
VS2019 Nsight extension installed, not showing up in Manage Extension and impossible to disable
I have the Nsight extension installed on VS2019 and it shows up in the menu:
Unfortunately, it makes Intellisense unbearably slow, so I would like to disable that extension, however, it doesn't show ...