All Questions
9 questions
0 votes, 0 answers, 67 views
Understanding MUBUF instruction in AMD GCN Architecture
I am trying to understand how the MUBUF instruction works using the following kernel. Assume only 1 wavefront (64 work-items).
According to the ISA reference guide gcn3-instruction-set-architecture.pdf,
ADDR = Base + ...
3 votes, 3 answers, 363 views
How do I Load Multiple Float4 from Memory to Registers using Inline GCN assembly in AMD HIP?
Motivation
I'm doing some micro-benchmarks on AMD GPUs to understand their performance characteristics in order to improve kernel performance. I now suspect that different register allocation and ...
2 votes, 0 answers, 647 views
How to read and write to Global Data Share in AMD GCN?
I'm trying to use GDS on an AMD GPU, but I cannot get it to work. My GPU is an AMD RX580.
I used this OpenCL kernel:
__kernel __attribute__((reqd_work_group_size(64, 1, 1)))
void localVarExample(__global ...
3 votes, 1 answer, 912 views
How to compile .cl file that contains inline assembly for GCN cards?
There are some examples of inline assembly inside .cl files:
Example #1
Example #2
But I cannot find a way to compile them.
ROCm has this guide.
It seems that you can just export the env ...
0 votes, 1 answer, 364 views
Are uint2 operations faster than ulong in OpenCL on AMD GCN cards?
Which of these "+" calculations is faster?
1)
uint2 a, b, c;
c = a + b;
2)
ulong a, b, c;
c = a + b;
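Note that the two snippets are not interchangeable: a uint2 addition is two independent 32-bit additions with no carry between the lanes, while a ulong addition propagates the carry from the low half into the high half. The following is a minimal sketch in plain C, not OpenCL; `uint2_like`, `add_uint2`, `add_ulong`, and `pack` are hypothetical names introduced only for illustration, and the sketch says nothing about which form is faster on GCN hardware.

```c
#include <stdint.h>

/* Hypothetical stand-in for OpenCL's uint2: two independent 32-bit lanes. */
typedef struct { uint32_t x, y; } uint2_like;

/* Lane-wise add, as uint2 "+" behaves: each lane wraps on its own,
   and no carry moves from x into y. */
static uint2_like add_uint2(uint2_like a, uint2_like b) {
    uint2_like c = { (uint32_t)(a.x + b.x), (uint32_t)(a.y + b.y) };
    return c;
}

/* Full 64-bit add, as ulong "+" behaves: the carry out of the low
   32 bits is added into the high 32 bits. */
static uint64_t add_ulong(uint64_t a, uint64_t b) {
    return a + b;
}

/* View the two lanes as one 64-bit value (x = low half, y = high half). */
static uint64_t pack(uint2_like v) {
    return ((uint64_t)v.y << 32) | v.x;
}
```

With a = (0xFFFFFFFF, 0) and b = (1, 0), the lane-wise sum packs to 0, whereas the 64-bit sum of the same packed values is 0x100000000. A speed comparison is only meaningful when the carry is not needed.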
0 votes, 0 answers, 177 views
How to run two work groups per one compute unit on AMD GCN cards
Usually one compute unit can only run one work group. But AMD's doc says there can be more than one wavefront running on the same compute unit. How can I do that? Is there an OpenCL function for that? ...
0 votes, 1 answer, 264 views
OpenCL and AMD GPU Architecture understanding
So I was reading about the architecture of GCN 1st Generation GPUs in the paper here, and I'm a bit confused about the size of the vector ALUs and some other things.
According to it, each compute ...
1 vote, 1 answer, 103 views
V_SUB_F64 in AMD's GCN and VEGA instruction set
Why is there no V_SUB_F64 instruction in AMD's GCN and VEGA instruction sets? How do they implement double-precision subtraction?
1 vote, 1 answer, 451 views
Triggering L2 cache write to global memory on AMD GCN architecture using OpenCL
I am writing a series of tests for a GPU's DRAM (global) memory, specifically targeting the AMD GCN architecture of the Tahiti and Hawaii model lines. These architectures have write-back L2 caches.
What I want is to ...