I have a simple kernel as
__global__ void hello_cuda() {
int a = 10;
printf("hello from GPU\n");
}
When I use Nsight compute to see the Source and SASS section, I see:
# Address Source
1 00007fe8 82f9ca00 IMAD.MOV.U32 R1, RZ, RZ, c[0x0][0x28]
2 00007fe8 82f9ca10 UMOV UR4, 0x0
3 00007fe8 82f9ca20 CS2R R6, SRZ
4 00007fe8 82f9ca30 UMOV UR5, 0x0
5 00007fe8 82f9ca40 MOV R4, UR4
6 00007fe8 82f9ca50 IMAD.U32 R5, RZ, RZ, UR5
7 00007fe8 82f9ca60 MOV R20, 0x0
8 00007fe8 82f9ca70 MOV R21, 0x0
9 00007fe8 82f9ca80 CALL.ABS.NOINC 0x0
10 00007fe8 82f9ca90 EXIT
11 00007fe8 82f9caa0 BRA 0x7fe882f9caa0
12 00007fe8 82f9cab0 NOP
13 00007fe8 82f9cac0 NOP
14 00007fe8 82f9cad0 NOP
15 00007fe8 82f9cae0 NOP
16 00007fe8 82f9caf0 NOP
But when I do cuobjdump -sass saas_source_mapping_trial
, I get
code for sm_52
Function : _Z10hello_cudav
.headerflags @"EF_CUDA_SM52 EF_CUDA_PTX_SM(EF_CUDA_SM52)"
/* 0x001fc400fe2007f6 */
/*0008*/ MOV R1, c[0x0][0x20] ; /* 0x4c98078000870001 */
/*0010*/ MOV32I R4, 0x0 ; /* 0x010000000007f004 */
/*0018*/ MOV32I R5, 0x0 ; /* 0x010000000007f005 */
/* 0x001ff400fec007f1 */
/*0028*/ MOV R6, RZ ; /* 0x5c9807800ff70006 */
/*0030*/ MOV R7, RZ ; /* 0x5c9807800ff70007 */
/*0038*/ JCAL 0x0 ; /* 0xe220000000000040 */
/* 0x001ffc00fda007ef */
/*0048*/ NOP ; /* 0x50b0000000070f00 */
/*0050*/ NOP ; /* 0x50b0000000070f00 */
/*0058*/ EXIT ; /* 0xe30000000007000f */
/* 0x001f8000fc0007ff */
/*0068*/ BRA 0x60 ; /* 0xe2400fffff07000f */
/*0070*/ NOP; /* 0x50b0000000070f00 */
/*0078*/ NOP; /* 0x50b0000000070f00 */
- Why is there a difference between my SASS and the NCU one?
- How did NCU get the address? The address from my cuobjdump is not the same at all.