Newest 'nsight' Questions

0 votes

0 answers

18 views

The curious gap in time cost for QKV computation in LLM inference

I use Nsight Systems to profile the LLM inference process in the Hugging Face Transformers framework. I observe that the time for q_proj, k_proj and v_proj varies significantly. As far as I know, the Q, K ...

CarryPls

23

asked Nov 29 at 9:08

0 votes

1 answer

61 views

What do the Instruction Statistics fields in Nsight Compute mean? How do they relate to elapsed cycles?

In my example, what is the meaning of 'Executed Instructions'? Taken literally, it would mean how many instructions have been executed. But how does it relate to the total run time (...

sorfkc

13

asked Oct 19 at 1:59

0 votes

1 answer

96 views

How can I create a container in which to use the Nvidia Nsight Systems graphical interface?

I am looking to create a container in which I can work with the graphical interface of the Nvidia Nsight Systems tool, so that I can obtain application reports for CUDA and Python. I have found ...

Gota_12

23

asked Sep 19 at 19:24

1 vote

1 answer

304 views

How to check my tensor core occupancy and utilization with Nsight Compute?

In my CUDA program, I use many tensor core operations like m8n8k4 and even use cusparseSpMV. However, when checking the ncu report, it shows this: there are no active tensors in my program. The ...

Severus Snape

23

asked Sep 4 at 12:20

1 vote

0 answers

149 views

How to generate a roofline analysis with Nsight?

When I was trying to analyze the performance of a kernel, I used the ncu command to generate a report. However, it didn't display the roofline analysis under the section "GPU Speed of Light Throughput" ...

Severus Snape

23

asked Sep 2 at 16:23

0 votes

0 answers

26 views

Nsight Compute + Roofline chart

I am new to using Nsight Compute and have a question about the roofline chart. When I profile different kernels on Nsight Compute and view their roofline charts, nothing is shown for some kernels, ...

Sahar M

1

asked Aug 8 at 14:32

0 votes

0 answers

76 views

Incompatible Qt library when running Nsight Compute ncu-ui

I am using Ubuntu 22.04 (x86_64 architecture) and my goal is to run the NVIDIA Nsight Compute ncu-ui command to visualize some GPU performance profiling results. When I run ncu-ui, the following message ...

chchien

1

asked Aug 6 at 16:44

1 vote

1 answer

714 views

How to get CUDA syntax highlighting in the Nsight VSCode extension when the CUDA toolkit is installed by Conda?

I'm using Fedora 39 and installed cudatoolkit using conda install in a conda env (not base). When inside the conda env, I can do nvcc foo.cu && ./a.out and it works fine. (when I do which nvcc,...

xdavidliu

3,017

asked Mar 23 at 19:59

3 votes

0 answers

116 views

Compute and data transfer not happening concurrently in CUDA streams on iteration 2

I have written a basic program where a chunk of data is loaded in CPU memory (Pinned), and then I transfer it in chunks to GPUs (Asynchronously), and then do computation on each chunk. So for each ...

Lokesh

31

asked Mar 8 at 18:38

1 vote

1 answer

366 views

Problems when profiling LLM training using "huggingface/accelerate" with Nsight Systems

I am training the Llama model in a multi-node environment using huggingface/accelerate, and if I run it as follows to profile it, the program dies due to a problem with the ssh connection to ...

상현박

13

asked Feb 20 at 11:28

1 vote

0 answers

131 views

How to debug an OpenGL shader per pixel in Nsight Graphics, like in RenderDoc

In RenderDoc it's easy, but this feature doesn't support OpenGL, so I am trying to use Nsight. However, I don't know how to reproduce this operation in Nsight. Its interface is too complex, and I also ...

bad apple

11

asked Jan 5 at 9:20

0 votes

0 answers

102 views

CUDA profiling of aten::mul: does it only include calculation time, or does it include the time to access memory?

The following results were obtained by PyTorch CUDA profiling. ---------------------------------------------------------------------------- Name Self CPU% ... Self CUDA ...

kmkm

1

asked Dec 19, 2023 at 12:58

0 votes

0 answers

183 views

How do simple warps cause low warp occupancy and high register usage?

During the warp occupancy investigation of my gbuffer pass, I found that even if I simplify the scene and the shader, Nsight still reports a very low warp occupancy, or even much lower than the ...

painkiller

175

asked Dec 16, 2023 at 8:34

1 vote

0 answers

277 views

Can NVIDIA Nsight still be used to debug shaders?

Numerous online resources claim it is possible to debug OpenGL shaders using NVIDIA Nsight Visual Studio Edition. Here is an old video of it being done. However, the Nsight VSE page mentions "the ...

Leon Frickenschmidt

329

asked Oct 3, 2023 at 21:31

1 vote

0 answers

46 views

How to capture a bake program running without a rendering window using Nsight

I'm writing a DXR baker program. Since it's just a baker generating ray traced results to a buffer, I didn't write any rendering window for it. It just keeps calling DXR's dispatchRay API in a ...

Wood

975

asked Jul 28, 2023 at 10:19

1 vote

1 answer

271 views

CUDA math function register usage

I am trying to understand the significant register usage incurred when using a few of the built-in CUDA math ops like atan2() or division and how the register usage might be reduced/eliminated. I'm ...

Chris Uchytil

150

asked Jul 14, 2023 at 0:21

2 votes

1 answer

660 views

Roofline Model with CUDA Manual vs. Nsight Compute

I have a very simple vector addition kernel written for CUDA. I want to calculate the arithmetic intensity as well as GFLOP/s for this kernel. The values I calculate differ visibly from the values ...

Cherry Toska

181

asked Jul 12, 2023 at 19:10
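
A minimal sketch of the manual calculation the question above describes, under stated assumptions: a float32 vector addition c[i] = a[i] + b[i] that performs one floating-point add per element and moves three 4-byte values per element (two loads, one store), with caching effects ignored. The function names are hypothetical and are not part of Nsight Compute; the tool derives its figures from hardware counters, so its values can reasonably differ from this estimate.

```python
def vector_add_intensity(n, bytes_per_element=4):
    """Arithmetic intensity (FLOP/byte) of c[i] = a[i] + b[i] over n elements.

    Assumption: one add per element, two loads and one store per element,
    and no reuse from cache.
    """
    flops = n
    bytes_moved = 3 * n * bytes_per_element
    return flops / bytes_moved


def achieved_gflops(n, elapsed_seconds):
    """Achieved GFLOP/s for n adds completed in elapsed_seconds."""
    return n / elapsed_seconds / 1e9


if __name__ == "__main__":
    n = 1 << 20
    print(f"arithmetic intensity: {vector_add_intensity(n):.4f} FLOP/byte")
```

Under these assumptions the intensity is 1/12 FLOP/byte regardless of n, which places a vector addition firmly in the memory-bound region of a roofline chart.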

1 vote

1 answer

440 views

Power Usage Profiling in Nsight?

New to Nsight and GPU programming. I need a way to evaluate the effect my code has on power usage in the GPU. This article from 2013 shows that the feature was part of Nsight's toolset at some point, ...

Lauren Vk

13

asked Jul 7, 2023 at 13:58

0 votes

2 answers

10k views

Nsys CLI profiling guidance

I am just entering the CUDA development world and am now trying to profile my code. I expected to run the nvprof tool for profiling, but I get the following error: ======== Warning: This version of ...

dru10

33

asked May 19, 2023 at 19:29

0 votes

1 answer

902 views

How to use ncu command to profile average time/usage/etc for a kernel repeating 10 times?

For example, I have a test program for 5 kernels: int main() { for (int i = 0; i < 10; i++){ kernel_1<<<...>>>(...); // warm up } for (int i = 0; i < 10; i++...

thanksarose

39

asked Apr 15, 2023 at 19:18

1 vote

1 answer

2k views

Trouble using Nsight Compute on Google Colab: 'command not found' error with ncu and installation script error with Nsight Compute

I am trying to use ncu on Colab; however, when I type ncu I get: /bin/bash: ncu: command not found. A few days ago this command was working fine, so I am unsure if I am making some mistakes in the code or if it ...

Alessandro Bossi

167

asked Apr 9, 2023 at 19:27

1 vote

2 answers

1k views

How to get the average execution time of a CUDA kernel using Nsight Systems or Nsight Compute

Suppose I have a simple CLI test app named "Foo". This app executes a kernel "Bar" 100 times in a loop. How may I obtain an average kernel execution time for Bar, using Nsight ...

Tyson Hilmer

771

asked Mar 23, 2023 at 9:29

0 votes

1 answer

760 views

Error in profiling shared memory atomic kernel in Nsight Compute

I am trying the global atomics vs shared atomics code from NVIDIA blog https://developer.nvidia.com/blog/gpu-pro-tip-fast-histograms-using-shared-atomics-maxwell/ But when I am trying to profile with ...

yolo_ML

15

asked Mar 9, 2023 at 12:03

0 votes

1 answer

3k views

Failed to open dynamic library RTSSVkLayer64.dll when using NVIDIA Nsight Graphics to debug app

I am writing a tiny game app in Rust, using Vulkan as the graphics API. Debugging my app in RenderDoc works perfectly, but something went wrong when I tried to debug it in NVIDIA Nsight. ...

CrystaLamb

23

asked Feb 21, 2023 at 6:36

1 vote

2 answers

567 views

How do I profile OpenMP offloading code compiled by clang?

I am currently working with OpenMP offloading using LLVM/clang-16 (built from the github repository). Using the built-in profiling tools in clang (using environment variables such as ...

Dogyman

31

asked Jan 6, 2023 at 13:15

0 votes

1 answer

376 views

How to use CUPTI to get metrics related to Launch Metrics, Source Metrics and Instructions Per Opcode Metrics

I am able to use ncu to get the metrics related to Launch Metrics, Source Metrics and Instructions Per Opcode Metrics (found here). However I am unable to use CUPTI to get the values after modifying ...

BoringSession

73

asked Dec 5, 2022 at 20:16

0 votes

1 answer

313 views

Command to run callback_profiling sample from CUPTI

I am running the sample code available for Nvidia CUDA CUPTI in /usr/local/cuda-11.8/extras/CUPTI/samples/callback_profiling. There is a Makefile, but I want to run it using a single command (without ...

BoringSession

73

asked Nov 17, 2022 at 13:14

0 votes

1 answer

282 views

Difference in SASS using cuobjdump and Nsight Compute

I have a simple kernel: __global__ void hello_cuda() { int a = 10; printf("hello from GPU\n"); } When I use Nsight Compute to see the Source and SASS section, I see: # Address ...

BoringSession

73

asked Nov 9, 2022 at 20:07

0 votes

1 answer

1k views

Nsight Compute not showing achieved occupancy in the metrics

I want to calculate the achieved occupancy and compare it with the value that is being displayed in Nsight Compute. ncu says: Theoretical Occupancy [%] 100, and Achieved Occupancy [%] 93,04. What ...

BoringSession

73

asked Oct 29, 2022 at 12:22

0 votes

2 answers

105 views

The number of times to run a profiling experiment

I am trying to profile a CUDA application. I have a basic question about performance analysis and workload characterization of HPC programs. Let us say I want to analyse the wall clock time (the end-to-end ...

punter147

312

asked Oct 28, 2022 at 11:40

1 vote

1 answer

818 views

Nsight Compute profiling of a device function in a kernel

I am trying to use Nsight Compute to profile kernels in my CUDA code. But how do I profile functions inside a kernel? Say for example, I have 2 functions (device functions) in a kernel (global). ...

BoringSession

73

asked Oct 25, 2022 at 20:59

2 votes

1 answer

323 views

Can Nsight Systems use debug info URLs?

So I am on Arch Linux and the libraries from the official repositories do not ship with debug symbols. To work around this in most debugging tools, one can use DEBUGINFOD_URLS=https://debuginfod....

TIL

35

asked Oct 20, 2022 at 10:11

0 votes

2 answers

679 views

CUDA kernel launched from Nsight Compute gives inconsistent results

I have completed writing my CUDA kernel, and confirmed it runs as expected when I compile it using nvcc directly, by: Validating with test data over 100 runs (just in case) Using cuda-memcheck (...

forever__newbie

31

asked Oct 17, 2022 at 9:01

0 votes

1 answer

3k views

Nsys does not show the CUDA kernel profiling output

My system is a V100 with the following information: | NVIDIA-SMI 450.80.02 Driver Version: 450.80.02 CUDA Version: 11.6 | NVIDIA Nsight Systems version 2021.5.2.53-28d0e6e sudo sh -c "echo 2 >/proc/...

Hossam Amer

19

asked Oct 1, 2022 at 11:39

0 votes

1 answer

2k views

How to see NVTX markers in Nvidia Nsight Systems? With host and guest being the same Windows machine

I am trying to profile CPU/GPU applications using the Nsight suite. While trying to understand a stuttering problem, I added a range around the simulation step (taking place on the CPU): #include <...

Ad N

8,366

asked Aug 24, 2022 at 15:19

0 votes

1 answer

532 views

cuda-gdb in VSCode: Cannot find user-level thread for LWP 4077: generic error

I am trying to set up CUDA programming in VS Code and ran into this problem where cuda-gdb just returns an error. I tried running it with regular gdb and that works. I am using WSL. Running the "...

William Hofsøy

13

asked Jul 31, 2022 at 15:37

1 vote

1 answer

215 views

OpenGL - Is there a way to track actually used memory allocated by glBufferData / glBufferSubData?

There is a big codebase that allocates empty, fixed-size blocks of GPU memory using the glBufferData function, and partially fills or updates this allocated space using glBufferSubData. Since not all of the ...

user1559792

377

asked Jul 5, 2022 at 15:03

1 vote

2 answers

785 views

VSCode fails to debug Cython-wrapped CUDA code (but CLI cuda-gdb can)

Background: Running VSCode on Ubuntu 20.04. The following have been accomplished: (a) Compiled and built the Cython wrapper for CUDA code (packaged as shared library .so); (b) Python script importing ...

CorneliusJack

113

asked Jul 1, 2022 at 6:49

1 vote

0 answers

1k views

How to get detailed Nvidia GPU usage?

Nvidia-smi only provides a few metrics to measure GPU utilization. Most importantly, utilization.gpu represents the percent of time over the past sample period during which one or more kernels was ...

gebbissimo

2,579

asked Jun 30, 2022 at 15:13
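
To illustrate why the utilization.gpu definition quoted above is a coarse metric, here is a minimal sketch that computes the percentage of a sample period during which at least one kernel was running. The interval data and the function name are hypothetical; this is not how nvidia-smi is implemented, only a model of the definition in the excerpt.

```python
def busy_percent(kernel_intervals, period_start, period_end):
    """Percent of [period_start, period_end] covered by at least one kernel.

    kernel_intervals is a list of (start, end) times; overlapping kernels
    are merged, so concurrency does not raise the result above 100%.
    """
    # Keep only intervals that intersect the sample period, clipped to it.
    clipped = sorted(
        (max(s, period_start), min(e, period_end))
        for s, e in kernel_intervals
        if e > period_start and s < period_end
    )
    busy = 0.0
    cur_start = cur_end = None
    for s, e in clipped:
        if cur_end is None or s > cur_end:
            # Gap before this interval: close the previous merged run.
            if cur_end is not None:
                busy += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            # Overlaps the current run: extend it.
            cur_end = max(cur_end, e)
    if cur_end is not None:
        busy += cur_end - cur_start
    return 100.0 * busy / (period_end - period_start)


if __name__ == "__main__":
    intervals = [(0.0, 0.2), (0.1, 0.3), (0.5, 0.6)]
    print(f"{busy_percent(intervals, 0.0, 1.0):.1f}% busy")
```

Under this definition a single tiny kernel that occupies one SM counts the same as a kernel that saturates the whole GPU, which is why the value says little about how well the hardware is used.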

0 votes

1 answer

929 views

Nsight Graphics and RenderDoc cannot trace application

I am stuck writing a Vulkan renderer. The final output I see on the screen is only the clear color, animated over time, but no geometry. Even with all possible validation turned on I don't get any ...

Samwise

96

asked Apr 6, 2022 at 22:17

1 vote

1 answer

116 views

What is the `ipa` pipeline in the CUDA architecture?

When looking into ncu --query-metrics, it turns out that several counters are about this ipa pipeline, which isn't even mentioned in the Nsight docs, smsp__inst_executed_pipe_ipa for example. While for all of ...

nazavode

11

asked Feb 20, 2022 at 11:17

1 vote

0 answers

316 views

Nvidia Nsight crashes when creating BLAS. What could be the cause?

EDIT: I found the mistake in the code. I mistakenly set up the "max_primive_count" to 3, but it should be 1, since I only wanted to display one single triangle. Also the maxVertex should be ...

Ruslan

31

asked Nov 29, 2021 at 19:23

0 votes

1 answer

2k views

nsys profile multiple processes

I'd like to experiment with MPS on Nvidia GPUs, so I'd like to be able to profile two processes running in parallel. With the now-deprecated nvprof, there used to be an option "--profile-...

Blaizz

57

asked Oct 15, 2021 at 7:30

1 vote

0 answers

340 views

Nsys profile with MPMD (multiple program, multiple data) simulation

I am trying to profile an MPI+OpenACC program with nsys. I am using OpenMPI (3.1.6) from the Nvidia HPC SDK (20.7) with UCX enabled. There are three executables: exec1, exec2, exec3. I want to profile for ...

HEMANT GIRI

31

asked May 17, 2021 at 9:00

-2 votes

1 answer

557 views

NSight Compute - expecting bank conflicts but not detecting any

I was trying to detect shared memory bank conflicts for matrix transposition kernels. The first kernel performs matrix transposition without padding, and hence should have bank conflicts, while the ...

loonatick

1,107

asked Apr 26, 2021 at 13:10

0 votes

1 answer

2k views

Tracing custom CUDA kernels with Nsight Systems

I work on a library which is implemented in C++20 and CUDA 11. This library is called from Python via ctypes through a C API that just exchanges JSON strings. We compile it using Clang 11. In order to ...

Martin Ueding

8,669

asked Apr 20, 2021 at 11:22

1 vote

1 answer

1k views

"Start Performance Analysis" button missing on Nsight + Visual Studio

I usually debug my kernel and check timing with the "Start Performance Analysis" button. It showed up when I used CUDA 10.2 with an RTX Titan V, but the button is not shown since I upgraded the CUDA version ...

powermew

133

asked Apr 20, 2021 at 9:14

1 vote

1 answer

2k views

NVIDIA Nsight Systems CLI not getting memory statistics

I'm using the NVIDIA Nsight Systems CLI (nsys) to profile a simple CUDA program (vector addition). I've already checked the documentation but I think I'm missing something. I'm running the nsys profile ...

l.g.karolos

1,142

asked Apr 7, 2021 at 2:13

-3 votes

1 answer

528 views

CUDA debugging using a single GPU with Visual Studio

I am working on Windows 7 with Visual Studio 2010. Can we debug CUDA code using a single GPU that is also driving the display on the same PC? What tools are available? Nsight seems to be working ...

gpuguy

4,595

asked Jan 1, 2021 at 7:35

0 votes

1 answer

426 views

VS2019 Nsight extension installed, not showing up in Manage Extension and impossible to disable

I have the Nsight extension installed on VS2019 and it shows up in the menu: Unfortunately, it makes Intellisense unbearably slow, so I would like to disable that extension, however, it doesn't show ...

Damien

1,542

asked Dec 28, 2020 at 5:04
