Monday, April 28, 2014

Using AMD's Static C++ Kernel Language in OpenCL

AMD provides an implementation of an OpenCL extension which adds support for certain C++ features (e.g. templates) in OpenCL source code. It's called the "OpenCL Static C++ Kernel Language" extension and the specification can be found here (PDF). This extension is quite useful (especially for people coming from NVIDIA's CUDA who see the C99-based OpenCL kernel language as lacking).

In this blog post I go over how to use C++ templates directly in OpenCL kernels with this extension. I hope this is useful to others as, to me at least, how to use this extension is not immediately apparent.

First, we define our templated function (the prototypical square() function template):

template<typename T>
inline T square(const T x)
    return x * x;

Next we define a templated kernel which calls the square() function:

template<typename T>
__kernel void square_kernel(__global T *data)
    const uint i = get_global_id(0);
    data[i] = square(data[i]);

Now, in order to use the templated kernel, we must explicitly instantiate it and give it a name that can be used to call it from the host. We do that by explicitly stating the template-types and using the mangled_name attribute as so:

template __attribute__((mangled_name(square_kernel_int)))
__kernel void square_kernel(__global int *data);

The kernel can be instantiated multiple times for different types (though each must be given a unique mangled name). For example, to define the square kernel for float's:

template __attribute__((mangled_name(square_kernel_float)))
__kernel void square_kernel(__global float *data);

That's it for the OpenCL code. To compile the program on the host you must pass the "-x clc++" compile option to the clBuildProgram() function. In Boost.Compute, that is done like so (where source is a string containing all of the OpenCL source code from above and context is the OpenCL context object):

compute::program square_program =
    compute::program::build_with_source(source, context, "-x clc++");

Once built, the templated kernels can be instantiated by passing their mangled name to the clCreateKernel() function. In Boost.Compute, you would do the following:

compute::kernel square_int_kernel(square_program, "square_kernel_int");

The square_int_kernel can now be used just like any other kernel object (e.g. passed to clEnqueueNDRangeKernel()for execution on the device).

A fully implemented and compilable example demonstrating the code above can be found here.


  1. It seems the need to instantiate all possible variants in advance (and to use mangled names) spoils all the fun. Here is a version that does the same thing with standard OpenCL:

    1. And of course this is not VexCL-specific. The fact that OpenCL allows one to generate source on the fly is its great advantage.

    2. Yeah... Needing to explicitly instantiate each templated kernel you plan to use makes this a bit limited (though still may be useful for writing generic functions called from regular kernels).

      But anyway, this post was mainly just to see what the static C++ kernel language offered and how it was used.