OpenGL-OpenCL Interoperability for GL_TEXTURE_2D_ARRAY

Hi all,
I’m developing an application that uses the OpenGL-OpenCL interoperability feature. I render to a GL_TEXTURE_2D_ARRAY using a geometry shader and then use OpenCL to process the texture array.

The issue I’m facing is that it worked fine on my original GeForce GTS 250: the OpenCL kernel could access the texture array through an image3d_t kernel argument. I’ve since upgraded to a GeForce GTX 550 Ti, and even though the buffer creation (using clCreateFromGLTexture3D()) and the call to clEnqueueNDRangeKernel() both return CL_SUCCESS, the clFinish() after the kernel execution returns CL_INVALID_COMMAND_QUEUE (-36).

The problem seems to be the z-component of the texture coordinate passed to read_imagef(). If I pass 0 to access the first layer it doesn’t complain, but anything greater than 0 produces the CL_INVALID_COMMAND_QUEUE.

If I use GL_TEXTURE_3D instead of GL_TEXTURE_2D_ARRAY, OpenCL can access the buffer just fine, but I need to attach a depth buffer to the texture array and I don’t think I can do that with GL_TEXTURE_3D.
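For reference, here’s roughly what the working GL_TEXTURE_3D fallback looks like on my side. This is only a sketch: texture3d is a separate texture object, and TEXTURE_WIDTH/TEXTURE_HEIGHT/TEXTURE_DEPTH are the constants from my app.


// Sketch: same storage and format, but created with the GL_TEXTURE_3D target.
// This shares fine with clCreateFromGLTexture3D(), but I don't think a 3D texture
// can take the depth attachment my FBO needs.
GLuint texture3d;
glGenTextures(1, &texture3d);
glBindTexture(GL_TEXTURE_3D, texture3d);
glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexImage3D(GL_TEXTURE_3D, 0, GL_RGBA16F, TEXTURE_WIDTH, TEXTURE_HEIGHT, TEXTURE_DEPTH, 0, GL_RGBA, GL_FLOAT, NULL);

cl_int err = CL_SUCCESS;
cl_mem image = clCreateFromGLTexture3D(context, CL_MEM_READ_ONLY, GL_TEXTURE_3D, 0, texture3d, &err);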

I’m using developer driver 270.81, OpenGL 3.3 and OpenCL 1.0 on both graphics cards.
The only differences I can think of between the two graphics cards are the architecture and the Compute Capability, but I’m not sure whether those are what’s causing the issue.

Does anyone have experience working with 3D textures in OpenCL? Is GL_TEXTURE_2D_ARRAY meant to be supported in OpenCL?

I’ve asked this question on the NVIDIA Developer forums as well, but I thought there are a lot more people here. Trying to increase the chances of resolving this issue ^^;

Thank you in advance.
Surya

Is GL_TEXTURE_2D_ARRAY meant to be supported in OpenCL?

This is the key question. I don’t know whether NVIDIA has a special extension to support GL_TEXTURE_2D_ARRAY. What I do know is that GL_TEXTURE_2D_ARRAY doesn’t appear among the valid targets in the Khronos standard OpenGL/OpenCL interoperability extensions; see section 9.8.3 of the OpenCL 1.1 spec.

What surprises me is that clCreateFromGLTexture3D() or clCreateFromGLTexture2D() doesn’t return an error code. That’s really inconvenient for the developer.
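I would at least check the create call explicitly and dump the device extension string; a small sketch of what I mean (error handling kept minimal, and context/device_id/texture stand for whatever handles your app already has):


// Sketch: make the create call fail loudly and list what the device claims to support.
cl_int err = CL_SUCCESS;
cl_mem image = clCreateFromGLTexture3D(context, CL_MEM_READ_ONLY, GL_TEXTURE_3D, 0, texture, &err);
if (err != CL_SUCCESS)
	printf("clCreateFromGLTexture3D failed: %d\n", err);

char extensions[4096] = {0};
clGetDeviceInfo(device_id, CL_DEVICE_EXTENSIONS, sizeof(extensions), extensions, NULL);
printf("Device extensions: %s\n", extensions); // look for cl_khr_gl_sharing and any NV-specific entries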

Thanks for your help David,

Even more surprising to me is that it was working on the GTS 250. I’ve also tested it on an 8600M GT, which has the same Compute Capability as the GTS 250, and it works there as well. Both have Compute Capability 1.1.

I’m trying to test this on cards with different Compute Capabilities to see whether that’s the cause, if I can get hold of them.

Or if anyone has a graphics card with Compute Capability 1.3 or 2.0 and can try it, I’d really appreciate it. See http://developer.nvidia.com/cuda-gpus for the list of graphics cards.
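If it helps, the Compute Capability can be read straight from OpenCL through NVIDIA’s cl_nv_device_attribute_query extension. A small sketch (I’m assuming the extension is present and that the tokens aren’t already defined by your headers):


// Sketch: query the Compute Capability via the cl_nv_device_attribute_query extension.
#ifndef CL_DEVICE_COMPUTE_CAPABILITY_MAJOR_NV
#define CL_DEVICE_COMPUTE_CAPABILITY_MAJOR_NV 0x4000
#define CL_DEVICE_COMPUTE_CAPABILITY_MINOR_NV 0x4001
#endif

cl_uint cc_major = 0, cc_minor = 0;
clGetDeviceInfo(device_id, CL_DEVICE_COMPUTE_CAPABILITY_MAJOR_NV, sizeof(cc_major), &cc_major, NULL);
clGetDeviceInfo(device_id, CL_DEVICE_COMPUTE_CAPABILITY_MINOR_NV, sizeof(cc_minor), &cc_minor, NULL);
printf("Compute Capability %u.%u\n", cc_major, cc_minor);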

I’ll try to attach or list the test code sometime later to make it easier for anyone who can help.

Thanks again
Surya

I will test on a GTX 470 with 285.26 drivers.

I don’t think I have permission to attach files, so I’ll list the necessary code here.

Kernel:


#pragma OPENCL EXTENSION cl_khr_gl_sharing : enable

const sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP | CLK_FILTER_NEAREST;

__kernel void Test(__global float4* output, __read_only image3d_t input)
{
	size_t global_id_x = get_global_id(0);
	size_t global_id_y = get_global_id(1);
	size_t global_id_z = get_global_id(2);

	size_t global_size_x = get_global_size(0);
	size_t global_size_y = get_global_size(1);

	size_t global_index = (global_id_z * (global_size_x * global_size_y)) + (global_id_y * global_size_x) + global_id_x;
	// The z-component selects the array layer; any value > 0 triggers the failure
	int4 tex_coordinate = (int4)(global_id_x, global_id_y, global_id_z, 0);
	// Uncomment below and comment above to make it work (forces layer 0)
	//int4 tex_coordinate = (int4)(global_id_x, global_id_y, 0, 0);
	float4 value = read_imagef(input, sampler, tex_coordinate);
	output[global_index] = value;
}

Framebuffer Initialization:


void InitFramebuffer()
{
	glGenFramebuffers(1, &framebuffer);
	glBindFramebuffer(GL_FRAMEBUFFER, framebuffer);

	glGenTextures(1, &depthbuffer);
	glBindTexture(GL_TEXTURE_2D_ARRAY, depthbuffer);
	glTexParameteri(GL_TEXTURE_2D_ARRAY, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
	glTexParameteri(GL_TEXTURE_2D_ARRAY, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
	glTexParameteri(GL_TEXTURE_2D_ARRAY, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
	glTexParameteri(GL_TEXTURE_2D_ARRAY, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
	glTexParameteri(GL_TEXTURE_2D_ARRAY, GL_TEXTURE_WRAP_R, GL_CLAMP_TO_EDGE);
	glTexParameteri(GL_TEXTURE_2D_ARRAY, GL_TEXTURE_COMPARE_FUNC, GL_LEQUAL);
	glTexImage3D(GL_TEXTURE_2D_ARRAY, 0, GL_DEPTH_COMPONENT32, TEXTURE_WIDTH, TEXTURE_HEIGHT, TEXTURE_DEPTH, 0, GL_DEPTH_COMPONENT, GL_FLOAT, NULL);

	glGenTextures(1, &texture);
	glBindTexture(GL_TEXTURE_2D_ARRAY, texture);
	glTexParameteri(GL_TEXTURE_2D_ARRAY, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
	glTexParameteri(GL_TEXTURE_2D_ARRAY, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
	glTexParameteri(GL_TEXTURE_2D_ARRAY, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
	glTexParameteri(GL_TEXTURE_2D_ARRAY, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
	glTexParameteri(GL_TEXTURE_2D_ARRAY, GL_TEXTURE_WRAP_R, GL_CLAMP_TO_EDGE);
	glTexParameteri(GL_TEXTURE_2D_ARRAY, GL_TEXTURE_COMPARE_FUNC, GL_LEQUAL);
	glTexImage3D(GL_TEXTURE_2D_ARRAY, 0, GL_RGBA16F, TEXTURE_WIDTH, TEXTURE_HEIGHT, TEXTURE_DEPTH, 0, GL_RGBA, GL_FLOAT, NULL);

	glFramebufferTexture(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, texture, 0);
	glFramebufferTexture(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT, depthbuffer, 0);

	GLenum status = glCheckFramebufferStatus(GL_FRAMEBUFFER);
	assert(status == GL_FRAMEBUFFER_COMPLETE);

	glBindTexture(GL_TEXTURE_2D_ARRAY, 0);
	glBindFramebuffer(GL_FRAMEBUFFER, 0);
}

OpenCL initialization:


void InitCL()
{
	cl_int status = CL_SUCCESS;
	cl_uint num_platforms;
	cl_uint num_devices;

	status = clGetPlatformIDs(1, &platform_id, &num_platforms);
	assert(status == CL_SUCCESS);
	status = clGetDeviceIDs(platform_id, CL_DEVICE_TYPE_GPU, 1, &device_id, &num_devices);
	assert(status == CL_SUCCESS);

	cl_context_properties cl_properties[] = 
	{
		CL_GL_CONTEXT_KHR,		(cl_context_properties)wglGetCurrentContext(),
		CL_WGL_HDC_KHR,			(cl_context_properties)wglGetCurrentDC(),
		CL_CONTEXT_PLATFORM,	(cl_context_properties)platform_id,
		0
	};

	context = clCreateContext(cl_properties, 1, &device_id, NULL, NULL, &status);
	assert(status == CL_SUCCESS);

	command_queue = clCreateCommandQueue(context, device_id, CL_QUEUE_PROFILING_ENABLE, &status);
	assert(status == CL_SUCCESS);

	// Create buffer
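	// Note: 'texture' is a GL_TEXTURE_2D_ARRAY object, but clCreateFromGLTexture3D() only accepts GL_TEXTURE_3D as the target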
	gl_texture_array_buffer = clCreateFromGLTexture3D(context, CL_MEM_READ_ONLY, GL_TEXTURE_3D, 0, texture, &status);
	assert(status == CL_SUCCESS);
	result_buffer = clCreateBuffer(context, CL_MEM_WRITE_ONLY, sizeof(float) * 4 * (TEXTURE_WIDTH * TEXTURE_HEIGHT * TEXTURE_DEPTH), NULL, &status);
	assert(status == CL_SUCCESS);

	// Create kernel
	const char* source = ReadFile("kernel.cl");
	assert(source);

	size_t source_length = strlen(source);
	program = clCreateProgramWithSource(context, 1, &source, &source_length, &status);
	assert(status == CL_SUCCESS);

	status = clBuildProgram(program, 1, &device_id, "-cl-fast-relaxed-math", NULL, NULL);
	if (status != CL_SUCCESS)
	{
		char* program_info_log;
		size_t info_length = 0;
		status = clGetProgramBuildInfo(program, device_id, CL_PROGRAM_BUILD_LOG, 0, NULL, &info_length);
		assert(status == CL_SUCCESS);
		if (info_length > 0)
		{
			program_info_log = new char[info_length + 1];
			memset(program_info_log, 0, sizeof(char) * (info_length + 1));
			status = clGetProgramBuildInfo(program, device_id, CL_PROGRAM_BUILD_LOG, info_length, program_info_log, NULL);
			printf("%s
", program_info_log);
			delete[] program_info_log;
		}
	}

	kernel = clCreateKernel(program, "Test", &status);
	assert(status == CL_SUCCESS);
	status  = clSetKernelArg(kernel, 0, sizeof(cl_mem), &result_buffer);
	status |= clSetKernelArg(kernel, 1, sizeof(cl_mem), &gl_texture_array_buffer);
	assert(status == CL_SUCCESS);
}

Test:


void Test()
{
	glViewport(0, 0, TEXTURE_WIDTH, TEXTURE_HEIGHT);
	glClearColor(1.0f, 0.0f, 0.0f, 1.0f);
	glBindFramebuffer(GL_FRAMEBUFFER, framebuffer);
	// Clear the texture array
	glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
	glBindFramebuffer(GL_FRAMEBUFFER, 0);

	// Try to acquire the texture array for use in OpenCL
	size_t global_item_size[3] = {TEXTURE_WIDTH, TEXTURE_HEIGHT, TEXTURE_DEPTH};
	size_t local_item_size[3] = {16, 16, 1};
	cl_int status = CL_SUCCESS;
	cl_event acquire_event;
	cl_event kernel_event;
	cl_event release_event;
	status = clEnqueueAcquireGLObjects(command_queue, 1, &gl_texture_array_buffer, 0, NULL, &acquire_event);
	assert(status == CL_SUCCESS);
	status = clEnqueueNDRangeKernel(command_queue, kernel, 3, NULL, global_item_size, local_item_size, 1, &acquire_event, &kernel_event);
	assert(status == CL_SUCCESS);
	status = clEnqueueReleaseGLObjects(command_queue, 1, &gl_texture_array_buffer, 1, &kernel_event, &release_event);
	assert(status == CL_SUCCESS);
	status = clFinish(command_queue);
	if (status == CL_SUCCESS)
		printf("CL_SUCCESS
");
	else if (status == CL_INVALID_COMMAND_QUEUE)
		printf("CL_INVALID_COMMAND_QUEUE
");

	clReleaseEvent(acquire_event);
	clReleaseEvent(kernel_event);
	clReleaseEvent(release_event);
}

The OpenGL context and extensions need to be initialized as well; I’m using SDL/GLUT and GLEW for that.
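For completeness, a minimal context setup along the lines of what I use (GLUT + GLEW here as a sketch; the window title is a placeholder and TEXTURE_WIDTH/TEXTURE_HEIGHT come from the same header as the rest of the listing):


// Sketch: minimal GLUT/GLEW setup so the GL context exists before InitCL() is called.
#include <GL/glew.h>
#include <GL/glut.h>

int main(int argc, char** argv)
{
	glutInit(&argc, argv);
	glutInitDisplayMode(GLUT_RGBA | GLUT_DOUBLE | GLUT_DEPTH);
	glutInitWindowSize(TEXTURE_WIDTH, TEXTURE_HEIGHT);
	glutCreateWindow("GL_TEXTURE_2D_ARRAY interop test");

	glewInit(); // load the GL 3.x entry points before any FBO/texture calls

	InitFramebuffer();
	InitCL();
	Test();
	return 0;
}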

The program should print “CL_SUCCESS” when the texture coordinate’s z-component is 0, and “CL_INVALID_COMMAND_QUEUE” when it’s non-zero on a card that doesn’t handle GL_TEXTURE_2D_ARRAY.
There’s one line in the kernel file that can be commented/uncommented to switch the z-component to 0. The kernel file name is hardcoded to “kernel.cl” in InitCL().

Thanks again
Surya