clCreateBuffer2D and clCreateBuffer3D

sean.settle · April 12, 2011, 3:36am

Similar to how there are clCreateImage2D (which is also used for 1D) and clCreateImage3D, it would be very useful for linear algebra routines to have a clCreateBuffer2D and, by extension, a clCreateBuffer3D. In linear algebra, matrices/tensors and sub-matrices/tensors are typically defined by m (height), n (width), and lda (the leading dimension, i.e. the row pitch), which aren’t communicated well by the current clEnqueue{Read,Write}Buffer. clEnqueue{Read,Write}Image seems to be a better match, but I’m not sure about the internal structure of cl_mem (a pointer to struct _cl_mem).

Maybe new functions aren’t needed: could I get 2D and 3D buffers out of images if there is an appropriate cl_image_format, say a channel_order of CL_R or CL_A and a channel_data_type of CL_FLOAT or CL_DOUBLE?

david.garcia · April 12, 2011, 5:00am

> Maybe new functions aren’t needed: could I get 2D and 3D buffers out of images if there is an appropriate cl_image_format, say a channel_order of CL_R or CL_A and a channel_data_type of CL_FLOAT or CL_DOUBLE?

Your intuition is good; that’s exactly what I was going to recommend.

You could create a 2D real matrix with something like this:


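```c
/* Assumes ctx is a valid cl_context and width/height (size_t) are the matrix dimensions. */
cl_int errcode;
cl_image_format image_format = {CL_R, CL_FLOAT};  /* one 32-bit float per pixel */

/* row_pitch must be 0 when host_ptr is NULL (no initial data is copied). */
cl_mem myMatrix = clCreateImage2D(ctx, CL_MEM_READ_WRITE, &image_format,
                                  width, height, /*row_pitch*/ 0,
                                  /*host_ptr*/ NULL, &errcode);

if (errcode != CL_SUCCESS)
{
    do_not_panic(errcode);  /* placeholder for your own error handling */
}
```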

Images won’t help with doubles, though: OpenCL doesn’t define a double-precision image channel data type (the floating-point channel types are CL_HALF_FLOAT and CL_FLOAT; there is no CL_DOUBLE). For doubles you would need to use buffer objects and treat them as 2D manually. Fortunately, treating a 1D buffer as a 2D surface is not very difficult. Essentially all you have to do is choose row-major or column-major order and write a few utility functions to create, read, write, and copy such matrices.
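A minimal sketch of those utility functions, assuming single-precision elements in an ordinary host array (the same index arithmetic applies when indexing a __global buffer inside a kernel). The names idx_row_major, idx_col_major and col_major_extent are hypothetical, not part of OpenCL; ld plays the role of BLAS’s lda.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers mapping a 2D (row, col) index onto a 1D buffer.
 * ld is the leading dimension (BLAS "lda"): the pitch in elements. */

/* Row-major: rows are contiguous, so ld must be >= the number of columns. */
static size_t idx_row_major(size_t row, size_t col, size_t ld)
{
    return row * ld + col;
}

/* Column-major (BLAS/LAPACK convention): columns are contiguous,
 * so ld must be >= the number of rows. */
static size_t idx_col_major(size_t row, size_t col, size_t ld)
{
    return col * ld + row;
}

/* Elements a 1D buffer needs for an m x n column-major matrix with
 * leading dimension lda (the last column needs only m elements). */
static size_t col_major_extent(size_t m, size_t n, size_t lda)
{
    assert(lda >= m);
    return n == 0 ? 0 : (n - 1) * lda + m;
}
```

For a 3x2 column-major matrix stored with lda = 4, element (2, 1) sits at index 1*4 + 2 = 6, and the buffer needs 7 elements, since the padding after the last column is never touched.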

> In linear algebra, matrices/tensors and sub-matrices/tensors are typically defined by m (height), n (width), and lda (the leading dimension, i.e. the row pitch), which aren’t communicated well by the current clEnqueue{Read,Write}Buffer. clEnqueue{Read,Write}Image seems to be a better match, but I’m not sure about the internal structure of cl_mem (a pointer to struct _cl_mem).

Right. When you read, write, or copy strided data using clEnqueue{Read,Write}Buffer(), you need to account for the stride yourself. A couple of support functions in your app will do the trick.
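For example, to move an m x n column-major sub-matrix (leading dimension lda) with a single plain clEnqueueWriteBuffer call, you can pack it into a contiguous host array first, and unpack it again after clEnqueueReadBuffer. The sketch below is one way to do that, assuming float elements; pack_submatrix and unpack_submatrix are hypothetical names. On OpenCL 1.1 and later, clEnqueueReadBufferRect and clEnqueueWriteBufferRect accept host and buffer row pitches directly, which may make this extra copy unnecessary.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the m x n column-major sub-matrix starting at
 * src (leading dimension lda >= m) into a contiguous dst (leading
 * dimension m), e.g. before a single clEnqueueWriteBuffer call. */
static void pack_submatrix(float *dst, const float *src,
                           size_t m, size_t n, size_t lda)
{
    assert(lda >= m);
    for (size_t j = 0; j < n; ++j)
        memcpy(dst + j * m, src + j * lda, m * sizeof *dst);  /* one column */
}

/* The reverse: scatter a contiguous m x n block (e.g. filled by
 * clEnqueueReadBuffer) back into a matrix with leading dimension lda. */
static void unpack_submatrix(float *dst, const float *src,
                             size_t m, size_t n, size_t lda)
{
    assert(lda >= m);
    for (size_t j = 0; j < n; ++j)
        memcpy(dst + j * lda, src + j * m, m * sizeof *src);  /* one column */
}
```

A sub-matrix starting at row i, column j of the big matrix is addressed as base + j * lda + i before being passed in as src (or dst).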

The internal structure of cl_mem is different in each OpenCL implementation, which is why the OpenCL API exposes it as an opaque handle: cl.h only declares typedef struct _cl_mem *cl_mem; and never defines the struct itself.
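As an illustration of the opaque-handle pattern (not of any vendor’s actual cl_mem layout), here is a sketch with hypothetical names: callers hold a pointer to a struct whose fields they never touch, so each implementation is free to define that struct however it likes. In a real library the typedef and prototypes would live in the public header and the struct definition in the implementation’s source files; they are in one block here only to keep the sketch self-contained.

```c
#include <assert.h>
#include <stdlib.h>

/* "Public header" part: callers only see a pointer to an incomplete type. */
typedef struct my_mem_impl *my_mem;

my_mem my_create(size_t size);
size_t my_size(my_mem m);
void   my_release(my_mem m);

/* "Implementation" part: the layout can change freely, because callers
 * only ever go through the functions above. */
struct my_mem_impl {
    size_t size;
    void  *storage;
};

my_mem my_create(size_t size)
{
    my_mem m = malloc(sizeof *m);
    if (m == NULL)
        return NULL;
    m->storage = malloc(size);
    if (m->storage == NULL && size != 0) {  /* allocation failed */
        free(m);
        return NULL;
    }
    m->size = size;
    return m;
}

size_t my_size(my_mem m)
{
    assert(m != NULL);
    return m->size;
}

void my_release(my_mem m)
{
    if (m != NULL) {
        free(m->storage);
        free(m);
    }
}
```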

sean.settle · April 12, 2011, 5:10am

Thanks David!

In the above use case, would there be any difference between using CL_R and CL_A? I’m not yet very familiar with the imaging part of OpenCL. And to use this method, I have to check that CL_DEVICE_IMAGE_SUPPORT is CL_TRUE (via clGetDeviceInfo), is that correct?

david.garcia · April 12, 2011, 5:22am

> In the above use case, would there be any difference between using CL_R and CL_A?

Yes, there is. This is explained at the very end of chapter 6 in the spec. Let me show you with an example:


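```c
// Integer coordinates require unnormalized coordinates and nearest filtering.
const sampler_t mySampler = CLK_NORMALIZED_COORDS_FALSE |
                            CLK_ADDRESS_CLAMP_TO_EDGE |
                            CLK_FILTER_NEAREST;

__kernel void foo(__read_only image2d_t myImage)
{
    // One work-item per pixel: use the work-item's ID, not the global size.
    int2 coord = (int2)(get_global_id(0), get_global_id(1));
    float4 val = read_imagef(myImage, mySampler, coord);

    // If myImage is CL_R, val.x contains the fetched value, val.yz is zero and val.w is one.
    // If myImage is CL_A, val.w contains the fetched value and val.xyz is zero.
}
```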
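To make the mapping concrete without a device, here is a host-side C model of what read_imagef returns for a one-channel texel under each channel order. It is only a sketch of the behaviour described in the spec’s channel-order table; float4_model, channel_order_model and expand_texel are hypothetical names.

```c
#include <assert.h>

typedef struct { float x, y, z, w; } float4_model;
typedef enum { ORDER_R, ORDER_A } channel_order_model;

/* Expand a single stored channel value v into the four components
 * read_imagef would return for the given channel order. */
static float4_model expand_texel(channel_order_model order, float v)
{
    assert(order == ORDER_R || order == ORDER_A);
    float4_model out = {0.0f, 0.0f, 0.0f, 0.0f};
    if (order == ORDER_R) {
        out.x = v;     /* CL_R: (r, 0, 0, 1) */
        out.w = 1.0f;
    } else {
        out.w = v;     /* CL_A: (0, 0, 0, a) */
    }
    return out;
}
```

Either order works for a scalar matrix; what changes is the component you read back (val.x for CL_R, val.w for CL_A).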

david.garcia · April 12, 2011, 5:26am

One more thing: images are very fast on GPUs, but not so on CPUs. My advice was unintentionally GPU-centric because I’m a GPU kind of guy.

If your target hardware is CPUs or a mixture of the two, it may be better to stick to buffer objects for portability. As you correctly said, your app would need to query whether images are supported, and some devices could reply nay.