
All Questions

0 votes
0 answers
67 views

Understanding MUBUF instruction in AMD GCN Architecture

I am trying to understand how the MUBUF instruction works using the following kernel. Assume only 1 wavefront (64 WIs). According to the ISA reference guide (gcn3-instruction-set-architecture.pdf), ADDR = Base + ...
Lokananda Hari
3 votes
3 answers
363 views

How do I load multiple float4 values from memory into registers using inline GCN assembly in AMD HIP?

Motivation: I'm running some micro-benchmarks on AMD GPUs to understand their performance characteristics in order to improve kernel performance. I'm now suspecting that different register allocation and ...
比尔盖子
2 votes
0 answers
647 views

How to read and write to Global Data Share in AMD GCN?

I'm trying to use GDS on an AMD GPU, but I cannot make it work. My GPU is an AMD RX580. I used this OpenCL kernel: __kernel __attribute__((reqd_work_group_size(64, 1, 1))) void localVarExample(__global ...
Michael Lukin
3 votes
1 answer
912 views

How do I compile a .cl file that contains inline assembly for GCN cards?

There are some examples of inline assembly inside .cl files: Example #1, Example #2. But I cannot find out how to compile them. ROCm has this guide. It seems that you can just export the env ...
user1200759
0 votes
1 answer
364 views

Are uint2 operations faster than ulong in OpenCL on AMD GCN cards?

Which of these "+" calculations is faster? 1) uint2 a, b, c; c = a + b; 2) ulong a, b, c; c = a + b;
user1200759
0 votes
0 answers
177 views

How do I run two work groups per compute unit on AMD GCN cards?

Usually one compute unit can only run one work group. But AMD's documentation says there can be more than one wavefront running on the same compute unit. How can I do that? Is there an OpenCL function for that? ...
user1200759
0 votes
1 answer
264 views

Understanding OpenCL and AMD GPU architecture

I was reading about the architecture of first-generation GCN GPUs in the paper here, and I'm a bit confused about the size of the vector ALUs and some other things. According to it, each compute ...
gallickgunner
1 vote
1 answer
103 views

V_SUB_F64 in AMD's GCN and VEGA instruction set

Why is there no V_SUB_F64 instruction in AMD's GCN and VEGA instruction sets? How do they implement double-precision subtraction?
air_sky_123
1 vote
1 answer
451 views

Triggering L2 cache write to global memory on AMD GCN architecture using OpenCL

I am writing a series of tests for a GPU's DRAM (global) memory, specifically targeting the Tahiti and Hawaii model lines of the AMD GCN architecture. These architectures have write-back L2 caches. What I want is to ...
user2765828