Hi,
we are working on a rendering prototype that shares 2D and 3D textures between OpenGL and OpenCL. The volume texture is roughly 125 MiB in size. We ran into a problem with the clEnqueueAcquireGLObjects and clEnqueueReleaseGLObjects calls: they take ~15 ms each (~30 ms combined!).
This is unacceptable. We suspect that OpenCL internally duplicates the texture memory and copies the data to and from OpenGL. When we acquire only small 2D OpenGL resources, the calls take very little frame time.
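As a rough sanity check on the copy suspicion, the two numbers above can be turned into an implied throughput and a share of the frame budget. This is only back-of-the-envelope arithmetic: the function names are made up for illustration, the 60 Hz frame rate is an assumption (we have not stated a target above), and the result says nothing definite about what the driver actually does.

```cpp
#include <cassert>
#include <cstdio>

// Implied throughput in GiB/s if `mib` MiB were moved in `ms` milliseconds.
// Pure arithmetic on the measured numbers, not a measurement of the driver.
double implied_gib_per_s(double mib, double ms) {
    assert(ms > 0.0);
    return (mib / 1024.0) / (ms / 1000.0);
}

// Fraction of one frame at `fps` frames per second that `ms` milliseconds use up.
double frame_budget_fraction(double ms, double fps) {
    assert(fps > 0.0);
    return ms / (1000.0 / fps);
}

// Prints the estimate for our case: 125 MiB volume, ~15 ms per call,
// ~30 ms for acquire plus release, and an ASSUMED 60 Hz target.
void print_estimate() {
    std::printf("implied throughput: %.2f GiB/s\n",
                implied_gib_per_s(125.0, 15.0));
    std::printf("share of a 60 Hz frame: %.0f%%\n",
                100.0 * frame_budget_fraction(30.0, 60.0));
}
```

If the whole volume really were copied once per call, that would correspond to roughly 8 GiB/s, and acquire plus release alone would exceed a 60 Hz frame.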
Here is how we run the kernel:
std::vector<cl::Memory> acq;
acq.push_back(*_output_cl_image);
acq.push_back(*vdata->volume_image());
acq.push_back(*vdata->color_alpha_image());
int arg_count = 0;
cl_error = _ray_cast_kernel->setArg(arg_count++, *_output_cl_image); assert(!cl_error_string(cl_error).empty());
cl_error = _ray_cast_kernel->setArg(arg_count++, *vdata->volume_image()); assert(!cl_error_string(cl_error).empty());
cl_error = _ray_cast_kernel->setArg(arg_count++, *vdata->color_alpha_image()); assert(!cl_error_string(cl_error).empty());
cl_error = _ray_cast_kernel->setArg(arg_count++, *vdata->volume_uniform_buffer()); assert(!cl_error_string(cl_error).empty());
cl_error = context->cl_command_queue()->enqueueAcquireGLObjects(&acq); assert(!cl_error_string(cl_error).empty());
cl_error = context->cl_command_queue()->enqueueNDRangeKernel(*_ray_cast_kernel, ::cl::NullRange, global_range, local_range, 0, 0);
cl_error = context->cl_command_queue()->enqueueReleaseGLObjects(&acq); assert(!cl_error_string(cl_error).empty());
Here is the test code we used to measure the acquire and release times:
_acquire_timer.start();
cl_error = context->cl_command_queue()->enqueueAcquireGLObjects(&acq); assert(!cl_error_string(cl_error).empty());
context->cl_command_queue()->finish();
_acquire_timer.stop();
//cl_error = context->cl_command_queue()->enqueueNDRangeKernel(*_ray_cast_kernel, ::cl::NullRange, global_range, local_range, 0, 0); assert(!cl_error_string(cl_error).empty());
_release_timer.start();
cl_error = context->cl_command_queue()->enqueueReleaseGLObjects(&acq); assert(!cl_error_string(cl_error).empty());
context->cl_command_queue()->finish();
_release_timer.stop();
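For reference, the timers are plain wall-clock stopwatches. The sketch below shows the idea; `WallTimer` and `time_blocking_ms` are hypothetical stand-ins, not our actual timer class. Because the enqueue calls return before the command has executed, the timed work has to include the blocking finish() call, and the result then also covers anything else still pending in the queue. It also assumes that pending OpenGL work has already completed before the acquire.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical stand-in for _acquire_timer / _release_timer;
// the real timer class is not shown in this post.
class WallTimer {
public:
    using clock_type = std::chrono::steady_clock;

    void start() {
        _running = true;
        _begin = clock_type::now();
    }

    void stop() {
        // Take the end time first so the check below is not part of the measurement.
        const clock_type::time_point end = clock_type::now();
        assert(_running && "stop() called without start()");
        _elapsed = end - _begin;
        _running = false;
    }

    // Elapsed time of the last start()/stop() pair in milliseconds.
    double milliseconds() const {
        return std::chrono::duration<double, std::milli>(_elapsed).count();
    }

private:
    clock_type::time_point _begin{};
    clock_type::duration   _elapsed{};
    bool                   _running = false;
};

// Times one piece of blocking work. For the measurement above, `work`
// would be one enqueue call followed by finish() on the same queue.
template <typename Work>
double time_blocking_ms(Work&& work) {
    WallTimer timer;
    timer.start();
    work();
    timer.stop();
    return timer.milliseconds();
}
```

In our test, the callable passed to `time_blocking_ms` would wrap enqueueAcquireGLObjects (or enqueueReleaseGLObjects) together with the finish() call that follows it.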
These are the two read-only images used by the kernel:
_volume_image.reset(new cl::Image3DGL(*device->cl_context(), CL_MEM_READ_ONLY,
                                      voldata->volume_raw()->object_target(), 0,
                                      voldata->volume_raw()->object_id(), &cl_error));
_color_alpha_image.reset(new cl::Image2DGL(*device->cl_context(), CL_MEM_READ_ONLY,
                                           voldata->color_alpha_map()->object_target(), 0,
                                           voldata->color_alpha_map()->object_id(), &cl_error));
This is the single write-only image used by the kernel:
_output_cl_image.reset(new cl::Image2DGL(*device->cl_context(), CL_MEM_WRITE_ONLY,
                                         _output_texture->object_target(), 0,
                                         _output_texture->object_id(), &cl_error));
As mentioned, we suspect that the OpenCL implementation copies the OpenGL resources into its own memory. Can anyone confirm whether this is really happening, and whether it can or will be fixed in future implementations? As it stands, the interop is sadly not usable for us…
We are testing this on NVIDIA GeForce 480/580 hardware using the r285 drivers.