Assignment 5 - OpenCL Optimizations
Addis Ababa University
Addis Ababa Institute of Technology
School of Electrical and Computer Engineering
Follow the lecture notes posted on the course page to do this assignment.
1. Write a kernel that accepts two arrays of N = 64 * 1024 * 1024 chars and an
offset. What the kernel does is shown below.
Measure the time it takes to run this kernel to completion. Vary the
offset over 0, 1, 2, ..., 16 and repeat the measurement for each value.
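For the timing itself, one option is OpenCL event profiling: if the command queue is created with CL_QUEUE_PROFILING_ENABLE, clGetEventProfilingInfo reports CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END in nanoseconds. The sketch below only shows the arithmetic on those two timestamps. The function names are hypothetical, and the bandwidth helper assumes the kernel reads and writes each byte exactly once, which may not match your kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Elapsed kernel time in milliseconds from two profiling timestamps.
 * Both timestamps are assumed to be in nanoseconds. */
double elapsed_ms(uint64_t start_ns, uint64_t end_ns)
{
    assert(end_ns >= start_ns);
    return (double)(end_ns - start_ns) * 1e-6;
}

/* Effective bandwidth in GB/s, with 1 GB taken as 1e9 bytes.
 * Bytes per nanosecond is numerically the same as GB/s. */
double effective_bandwidth_gbs(uint64_t bytes_read, uint64_t bytes_written,
                               uint64_t start_ns, uint64_t end_ns)
{
    assert(end_ns > start_ns);
    return (double)(bytes_read + bytes_written) / (double)(end_ns - start_ns);
}
```

Recording one such value per offset gives a table that is easy to plot. Repeating each run a few times and averaging is a reasonable way to reduce noise.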
2. Make the same kind of measurement for the following kernel.
Here, vary the stride over 1, 2, ..., 16, and limit the number of global
work-items to N/16.
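The N/16 limit can be checked with a little arithmetic. The sketch below assumes that work-item i touches element i * stride. This access pattern is an assumption, since it depends on the kernel you are given, and the helper names are hypothetical. Under that assumption, N/16 work-items with a stride of at most 16 reach index (N/16 - 1) * 16 = N - 16 at most, which stays inside the arrays.

```c
#include <assert.h>
#include <stddef.h>

#define N ((size_t)64 * 1024 * 1024)  /* array length from the assignment */
#define MAX_STRIDE 16                 /* largest stride to be measured */

/* Largest element index touched when work-item i accesses i * stride
 * (assumed access pattern). */
size_t max_strided_index(size_t global_size, size_t stride)
{
    assert(global_size > 0 && stride > 0);
    return (global_size - 1) * stride;
}

/* Returns 1 if every stride in 1..MAX_STRIDE stays inside an array of
 * n elements for the given global size, and 0 otherwise. */
int strides_in_bounds(size_t global_size, size_t n)
{
    for (size_t s = 1; s <= MAX_STRIDE; ++s) {
        if (max_strided_index(global_size, s) >= n)
            return 0;
    }
    return 1;
}
```

With these definitions, strides_in_bounds(N / 16, N) holds, while a global size of N already runs out of bounds at stride 2.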
3. Implement a naive matrix multiplication using OpenCL. Measure the time it takes to
multiply two floating-point (real) matrices of dimensions 1024x1024 (if this does
not take long and you want to see a more relevant result, change it to
2048x2048). Also vary the work-group size over 4x4, 8x8, ..., up to the largest
size that the maximum work-group size can accommodate.
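Before timing the naive kernel, it helps to have a host-side reference to check its output against, and a list of the work-group sizes to try. The following is a minimal sketch under stated assumptions: matrices are square and row-major, work-groups are square with sides doubling from 4, and the upper limit is the total work-item count reported by CL_DEVICE_MAX_WORK_GROUP_SIZE. Both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Reference C = A * B for n x n row-major float matrices. Each (row, col)
 * pair corresponds to the work done by one work-item in a naive kernel. */
void matmul_naive_ref(const float *a, const float *b, float *c, size_t n)
{
    for (size_t row = 0; row < n; ++row) {
        for (size_t col = 0; col < n; ++col) {
            float acc = 0.0f;
            for (size_t k = 0; k < n; ++k)
                acc += a[row * n + k] * b[k * n + col];
            c[row * n + col] = acc;
        }
    }
}

/* Stores the sides 4, 8, 16, ... for which side * side fits in max_wg_size
 * and side divides n. Writes at most capacity entries and returns the
 * number written. */
size_t square_wg_sides(size_t max_wg_size, size_t n,
                       size_t *sides, size_t capacity)
{
    size_t count = 0;
    assert(sides != NULL);
    for (size_t side = 4; side * side <= max_wg_size && side <= n; side *= 2) {
        if (n % side == 0 && count < capacity)
            sides[count++] = side;
    }
    return count;
}
```

For a device limit of 256 work-items and n = 1024, this yields the sides 4, 8 and 16. The limit for a particular kernel can be lower than the device limit, so it is worth checking the return code of the launch as well.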
4. Implement a local-memory cached version of the matrix multiplication and repeat the
measurements asked for the naive version. To do this experiment and appreciate the
results, you need to run it on a GPU. I advise you to write your OpenCL code and test
it on your own machine (it can be a computer that does not have a GPU). You can then
do your experiments on a computer with a dedicated GPU in our lab. The operating
system on this machine is Ubuntu. Please make arrangements with me if you want to
test your code on this machine.
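The loop structure of a local-memory cached matrix multiplication can be sketched on the host before writing the kernel. The code below is an illustration, not the kernel itself: it assumes square row-major matrices whose size is a multiple of a hypothetical tile side TILE, and the arrays tile_a and tile_b stand in for what would be __local buffers. In a kernel, each outer (tr, tc) iteration would be one work-group, and the load and compute phases would be separated by barrier(CLK_LOCAL_MEM_FENCE).

```c
#include <assert.h>
#include <stddef.h>

#define TILE 4  /* hypothetical tile side; would match the work-group side */

/* C = A * B for n x n row-major float matrices, n a multiple of TILE. */
void matmul_tiled_ref(const float *a, const float *b, float *c, size_t n)
{
    float tile_a[TILE][TILE], tile_b[TILE][TILE], acc[TILE][TILE];
    assert(n % TILE == 0);

    for (size_t tr = 0; tr < n; tr += TILE) {          /* tile row of C */
        for (size_t tc = 0; tc < n; tc += TILE) {      /* tile column of C */
            for (size_t i = 0; i < TILE; ++i)
                for (size_t j = 0; j < TILE; ++j)
                    acc[i][j] = 0.0f;

            for (size_t t = 0; t < n; t += TILE) {
                /* Load phase: copy one tile of A and one tile of B. */
                for (size_t i = 0; i < TILE; ++i) {
                    for (size_t j = 0; j < TILE; ++j) {
                        tile_a[i][j] = a[(tr + i) * n + (t + j)];
                        tile_b[i][j] = b[(t + i) * n + (tc + j)];
                    }
                }
                /* Compute phase: multiply the cached tiles. */
                for (size_t i = 0; i < TILE; ++i)
                    for (size_t j = 0; j < TILE; ++j)
                        for (size_t k = 0; k < TILE; ++k)
                            acc[i][j] += tile_a[i][k] * tile_b[k][j];
            }

            /* Write the finished tile of C back. */
            for (size_t i = 0; i < TILE; ++i)
                for (size_t j = 0; j < TILE; ++j)
                    c[(tr + i) * n + (tc + j)] = acc[i][j];
        }
    }
}
```

Each element of A and B is read from the large arrays once per tile instead of once per output element, which is the reuse the local-memory version is meant to exploit. The same function can also serve as a correctness check for the GPU result.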