Newest 'amd-gpu+amd-gcn' Questions

0 votes

0 answers

67 views

Understanding MUBUF instruction in AMD GCN Architecture

I am trying to understand how the MUBUF instruction works using the following kernel. Assume only 1 wavefront (64 WIs). According to the ISA reference guide gcn3-instruction-set-architecture.pdf, ADDR = Base + ...

Lokananda Hari

1

asked Jun 16 at 17:56

3 votes

3 answers

363 views

How do I Load Multiple Float4 from Memory to Registers using Inline GCN assembly in AMD HIP?

Motivation: I'm running some micro-benchmarks on AMD GPUs to understand their performance characteristics in order to improve kernel performance. I now suspect that different register allocation and ...

比尔盖子

3,507

asked Sep 17, 2023 at 6:07

2 votes

0 answers

647 views

How to read and write to Global Data Share in AMD GCN?

I'm trying to use GDS on an AMD GPU, but I cannot get it to work. My GPU is an AMD RX580. I used this OpenCL kernel: __kernel __attribute__((reqd_work_group_size(64, 1, 1))) void localVarExample(__global ...

Michael Lukin

889

asked Jul 8, 2019 at 20:58

3 votes

1 answer

912 views

How to compile .cl file that contains inline assembly for GCN cards?

There are some examples of inline assembly inside .cl files: Example #1 Example #2 But I cannot find out how they can be compiled. ROCm has this guide. It seems that you can just export the env ...

user1200759

101

asked Aug 22, 2018 at 21:45

0 votes

1 answer

364 views

Are uint2 operations faster than ulong in OpenCL on AMD GCN cards?

Which of these "+" calculations is faster? 1) uint2 a, b, c; c = a + b; 2) ulong a, b, c; c = a + b;

user1200759

101

asked Aug 21, 2018 at 20:50
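
The two additions in the question above are not the same operation, which matters when comparing them. A minimal C sketch of the difference follows; `uint2_t`, `add_uint2` and `add_ulong_as_halves` are hypothetical names introduced here as stand-ins for the OpenCL types. A uint2 add is two independent 32-bit adds, while a 64-bit add on 32-bit lanes needs the carry from the low half fed into the high half. The sketch shows semantics only and makes no claim about which one is faster on a given GCN card.

```c
#include <stdint.h>

/* Hypothetical stand-in for OpenCL's uint2: two independent 32-bit lanes. */
typedef struct { uint32_t x, y; } uint2_t;

/* Lane-wise add, matching uint2 a + b: each lane wraps on its own,
   and no carry moves from x into y. */
uint2_t add_uint2(uint2_t a, uint2_t b) {
    uint2_t c;
    c.x = (uint32_t)(a.x + b.x);
    c.y = (uint32_t)(a.y + b.y);
    return c;
}

/* 64-bit add written as two 32-bit halves with an explicit carry,
   which is the extra dependency a ulong add has over a uint2 add. */
uint64_t add_ulong_as_halves(uint64_t a, uint64_t b) {
    uint32_t a_lo = (uint32_t)a, b_lo = (uint32_t)b;
    uint32_t lo = (uint32_t)(a_lo + b_lo);
    uint32_t carry = lo < a_lo;  /* 1 if the low half wrapped around */
    uint32_t hi = (uint32_t)((uint32_t)(a >> 32) + (uint32_t)(b >> 32) + carry);
    return ((uint64_t)hi << 32) | lo;
}
```

For the same bit pattern, `add_uint2` drops the overflow of the low lane, whereas `add_ulong_as_halves` carries it into the high half, so the two results differ whenever the low 32 bits overflow.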

0 votes

0 answers

177 views

How to run two work groups per one compute unit on AMD GCN cards

Usually one compute unit can only run one work group. But AMD's doc says there can be more than one wavefront running on the same compute unit. How can I do that? Is there an OpenCL function for that? ...

user1200759

101

asked Aug 18, 2018 at 19:13

0 votes

1 answer

264 views

OpenCL and AMD GPU Architecture understanding

So I was reading about the architecture of GCN 1st Generation GPUs in the paper here, and I'm a bit confused about the size of the vector ALUs and some other things. According to it, each compute ...

gallickgunner

480

asked Jun 27, 2018 at 10:05

1 vote

1 answer

103 views

V_SUB_F64 in AMD's GCN and VEGA instruction set

Why is there no V_SUB_F64 instruction in AMD's GCN and VEGA instruction sets? How is double-precision subtraction implemented?

air_sky_123

11

asked Jun 8, 2018 at 14:26
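
As background for the question above: in IEEE 754 arithmetic, a - b gives the same result as a + (-b), and negating a double only flips its sign bit. The GCN VOP3 encoding provides per-operand negate modifiers, so an add with a negated second operand can stand in for a dedicated subtract. The C sketch below illustrates that identity only; `negate_by_sign_bit` and `sub_via_add` are hypothetical helper names, and nothing here claims to show what a particular compiler emits.

```c
#include <stdint.h>
#include <string.h>

/* Negate a double by flipping bit 63, the IEEE 754 sign bit.
   This mirrors what a source-negate modifier does to an operand. */
double negate_by_sign_bit(double v) {
    uint64_t bits;
    memcpy(&bits, &v, sizeof bits);   /* reinterpret without aliasing issues */
    bits ^= 0x8000000000000000ULL;
    memcpy(&v, &bits, sizeof v);
    return v;
}

/* Subtraction expressed as an add with a negated second operand. */
double sub_via_add(double a, double b) {
    return a + negate_by_sign_bit(b);
}
```

Calling `sub_via_add(5.0, 3.0)` takes the same path as `5.0 - 3.0` would with a dedicated subtract: one add, with the sign of the second operand inverted first.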

1 vote

1 answer

451 views

Triggering L2 cache write to global memory on AMD GCN architecture using OpenCL

I am writing a series of tests for a GPU's DRAM (global) memory, specifically targeting the AMD GCN architecture of the Tahiti and Hawaii model lines. These architectures have write-back L2 caches. What I want is to ...

user2765828

11

asked Jun 25, 2015 at 15:51
