I haven't written many Metal kernel shaders yet; here's a fledgling "fade" shader between two RGBX-32 images, using a tween value from 0.0 to 1.0 that blends from inBuffer1 (0.0) to inBuffer2 (1.0).

Is there something I'm missing here? It strikes me that this may be terribly inefficient.

My first inkling is to do the subtraction and multiplication using the vector data types (e.g. char4), thinking that might be better, but the results of this are certainly undefined (as some components will be negative).

Also, is there some advantage to using MTLTexture versus MTLBuffer objects as I've done?

kernel void fade_Kernel(device const uchar4  *inBuffer1  [[ buffer(0) ]],
                        device const uchar4  *inBuffer2  [[ buffer(1) ]],
                        device const float   *tween      [[ buffer(2) ]],
                        device uchar4        *outBuffer  [[ buffer(3) ]],
                        uint gid [[ thread_position_in_grid ]])
{
    const float t = tween[0];
    uchar4 pixel1 = inBuffer1[gid];
    uchar4 pixel2 = inBuffer2[gid];

    // these deltas may be negative, so they are stored in a signed type
    short r=(pixel2.r-pixel1.r)*t;
    short g=(pixel2.g-pixel1.g)*t;
    short b=(pixel2.b-pixel1.b)*t;

    outBuffer[gid]=uchar4(pixel1.r+r,pixel1.g+g,pixel1.b+b,0xff);
}

2 Answers

First, you should probably declare the tween parameter as:

constant float &tween [[ buffer(2) ]],

Using the constant address space is more appropriate for a value like this that's the same for all invocations of the function (and not indexed into by grid position or the like). Also, making it a reference instead of a pointer tells the compiler that you won't be indexing other elements of the "array" that a pointer might refer to.

Finally, there's a mix() function that performs exactly the sort of computation that you're doing here. So, you could replace the body of the function with:

uchar4 pixel1 = inBuffer1[gid];
uchar4 pixel2 = inBuffer2[gid];

outBuffer[gid] = uchar4(uchar3(mix(float3(pixel1.rgb), float3(pixel2.rgb), tween)), 0xff);
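For checking the arithmetic off the GPU, here is a minimal CPU-side sketch in plain C++ (not Metal code). It assumes the scalar definition of mix(x, y, a) as x + (y - x) * a, and that the float-to-uchar conversion truncates, as noted in the comments on this answer. The names mixf, fadeTruncate, and fadeRound are hypothetical helpers, not part of any Metal API.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// CPU-side reference only. Assumes mix(x, y, a) == x + (y - x) * a.
inline float mixf(float x, float y, float a) {
    return x + (y - x) * a;
}

// Fade one 8-bit channel, truncating toward zero when converting back
// to an integer (the behavior the kernel's float -> uchar conversion
// is assumed to have).
inline std::uint8_t fadeTruncate(std::uint8_t p1, std::uint8_t p2, float t) {
    assert(t >= 0.0f && t <= 1.0f);  // tween is expected to stay in [0, 1]
    return static_cast<std::uint8_t>(mixf(p1, p2, t));
}

// Same fade, but rounding to the nearest integer before converting,
// which is what wrapping the mix() call in round() would do.
inline std::uint8_t fadeRound(std::uint8_t p1, std::uint8_t p2, float t) {
    assert(t >= 0.0f && t <= 1.0f);
    return static_cast<std::uint8_t>(std::round(mixf(p1, p2, t)));
}
```

With p1 = 0, p2 = 255, and t = 0.5, the blended value is 127.5, so the truncating version yields 127 and the rounding version yields 128. A one-level difference like this is the only effect of adding round().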

As to whether it would be better to use textures, that depends somewhat on what you plan to do with the result after running this kernel. If you're going to be doing texture-like things with it anyway, it might be better to use textures all throughout. Indeed, it might be better to use drawing operations with blending rather than a compute kernel. After all, such blending is something GPUs have to do all the time, so that path is probably fast. You'd have to test the performance of each approach.

  • Thanks for this, Ken. You're very helpful, once again. Oddly, "mix" doesn't appear to be part of <metal_common> under pre-Metal 2 implementations. Looking at the Metal Shading Language docs, I can do "saturate" but not "mix": No matching function for call to 'mix'.
    – zzyzy
    Commented Jul 23, 2017 at 16:40
  • My mistake. mix() only works for floating-point types. I've edited my answer to convert back and forth. The conversion was implicit in your original code. You might want to wrap the call to mix() in a call to round(), too, although your original code just truncated like my new code does. Commented Jul 23, 2017 at 21:41
  • Thanks, Ken. If alpha were to be considered, I suppose outBuffer[gid]=uchar4(mix(float4(pixel1),float4(pixel2),tween)) could also work just fine.
    – zzyzy
    Commented Jul 24, 2017 at 22:03

If you are dealing with images, it's much more efficient to use MTLTexture than MTLBuffer. It is also better to use "half" than "uchar". I learned this directly from an Apple engineer at WWDC this year.

kernel void alpha(texture2d<half, access::read>  inTexture2  [[texture(0)]],
    texture2d<half, access::read>  inTexture1  [[texture(1)]],
    texture2d<half, access::write> outTexture [[texture(2)]],
    const device float& tween [[ buffer(3) ]],
    uint2 gid [[thread_position_in_grid]]) 
{
    // Check if the pixel is within the bounds of the output texture
    if((gid.x >= outTexture.get_width()) || (gid.y >= outTexture.get_height())) {
        // Return early if the pixel is out of bounds
        return;
    }
    half4 color1  = inTexture1.read(gid);
    half4 color2  = inTexture2.read(gid);
    outTexture.write(half4(mix(color1.rgb, color2.rgb, half(tween)), color1.a), gid);
}
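The bounds check matters when the host launches whole threadgroups (for example with dispatchThreadgroups), because the grid is then rounded up past the texture edges. Below is a rough CPU-side C++ sketch of that arithmetic. The 16-thread group size used in the example and the helper names threadgroupsFor, threadsLaunched, and inBounds are assumptions for illustration, not anything from the kernel above.

```cpp
#include <cassert>
#include <cstdint>

// Number of threadgroups needed to cover `pixels` along one axis
// (ceiling division). `groupSize` is a hypothetical threadgroup width.
inline std::uint32_t threadgroupsFor(std::uint32_t pixels, std::uint32_t groupSize) {
    assert(groupSize > 0);
    return (pixels + groupSize - 1) / groupSize;
}

// Total threads launched along that axis; may exceed `pixels`.
inline std::uint32_t threadsLaunched(std::uint32_t pixels, std::uint32_t groupSize) {
    return threadgroupsFor(pixels, groupSize) * groupSize;
}

// Mirrors the kernel's guard: true when (x, y) addresses a real pixel.
inline bool inBounds(std::uint32_t x, std::uint32_t y,
                     std::uint32_t width, std::uint32_t height) {
    return x < width && y < height;
}
```

For a 1920x1080 texture and 16x16 threadgroups, the width divides evenly (120 groups), but the height needs 68 groups, which launches 1088 rows of threads. The extra 8 rows are the threads that the early return discards.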
  • I suppose the efficient use of "half" or "char4" would depend on whether the source buffer was RGBA-32 (4 bytes/pixel) or some other format, no?
    – zzyzy
    Commented Jul 25, 2017 at 20:23
