OpenCL-OpenGL interoperability in a multi-threaded environment

Hi,
I’m using Linux (Ubuntu 11.04) and an NVIDIA GeForce 9800 series PCI card.
I’m trying to use OpenCL-OpenGL interoperability in a multi-threaded environment:
creating and calling OpenCL functions from thread A, and creating and calling OpenGL/GLX functions from another thread B.
Please let me know if this is a feasible use case. Is it valid to use two different threads this way? Will there be any issues in achieving this?
Please help.

Do the GL-CL buffers have to be created in the same thread?
Has anyone tried this approach?

thanks in advance for the help

Creating and calling OpenCL functions from thread A, and creating and calling OpenGL/GLX functions from another thread B.
Please let me know if this is a feasible use case. Is it valid to use two different threads this way? Will there be any issues in achieving this?
Please help.

That should work fine from OpenCL 1.1 onwards. There is only limited thread safety in OpenCL 1.0.

Do the GL-CL buffers have to be created in the same thread?
Has anyone tried this approach?

That should not be necessary.

Hi,

Thanks for the quick response.

That should work fine from OpenCL 1.1 onwards. There is only limited thread safety in OpenCL 1.0.

Yes. I was facing a kernel panic with the OpenCL 1.0 driver. I have updated the NVIDIA driver to one that supports OpenCL 1.1, and it works only from driver 295.20. When I tried NVIDIA driver 290.10 (which also supports OpenCL 1.1), it didn’t work.

  1. Regarding the buffer creation: I’m facing some issues. I may need your help if things don’t work as expected.

  2. Please let me know your thoughts on synchronizing access to shared data between OpenCL and OpenGL when using two different threads. I’m worried this will become a bottleneck if I do the OpenCL access and the OpenGL access from two different threads.
    Please advise.

thanks in advance for the reply.

Hi,
I’m able to successfully create the CL shared context and shared buffer in a multi-threaded environment, i.e.

creating and calling OpenCL functions from thread A, and creating and calling OpenGL/GLX functions from another thread B.

Now I’m facing an issue when calling ‘clEnqueueAcquireGLObjects’ from the CL thread. It results in a segmentation fault in the NVIDIA library.

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb5fabb70 (LWP 32292)]
0xb723eed0 in ?? () from /usr/lib/libnvidia-glcore.so.295.20

I also confirmed that the GL thread is not accessing the buffer.
If ‘clEnqueueAcquireGLObjects’ is called from the GL thread, it works fine.
Please let me know if I need any additional handling in the multi-threaded case. My requirement is to do this in two different threads.
Thanks in advance for your help and support.

Did you follow the instructions in section 9.8.6.1 of the CL 1.1 spec? Here’s an excerpt:

Prior to calling clEnqueueAcquireGLObjects, the application must ensure that any pending GL operations which access the objects specified in mem_objects have completed. This may be accomplished portably by issuing and waiting for completion of a glFinish command on all GL contexts with pending references to these objects. […]
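
The ordering that excerpt asks for can be sketched for the two-thread case. This is only a minimal sketch, not a fix for the crash: it uses nothing but pthreads, and each `record()` call is a placeholder for the real GL or CL call named in the comment next to it. The `handoff` struct, `run_handoff`, and the one-letter step codes are hypothetical names made up for illustration. It shows the GL thread finishing its work before handing the buffer over, and the CL thread releasing the buffer and waiting for completion before handing it back.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

enum owner { OWNER_GL, OWNER_CL };

struct handoff {
    pthread_mutex_t lock;
    pthread_cond_t  changed;
    enum owner      owner;     /* which API may touch the shared buffer */
    int             frames;    /* number of round trips to perform */
    char            log[256];  /* step trace, one letter per step */
    size_t          log_len;
};

/* Append one step to the trace. Only the current owner calls this. */
static void record(struct handoff *h, char step) {
    if (h->log_len + 1 < sizeof h->log) {
        h->log[h->log_len++] = step;
        h->log[h->log_len] = '\0';
    }
}

/* Block until the shared buffer belongs to `who`. */
static void wait_for(struct handoff *h, enum owner who) {
    pthread_mutex_lock(&h->lock);
    while (h->owner != who)
        pthread_cond_wait(&h->changed, &h->lock);
    pthread_mutex_unlock(&h->lock);
}

/* Hand the shared buffer to `who` and wake the other thread. */
static void pass_to(struct handoff *h, enum owner who) {
    pthread_mutex_lock(&h->lock);
    h->owner = who;
    pthread_cond_broadcast(&h->changed);
    pthread_mutex_unlock(&h->lock);
}

static void *gl_thread(void *arg) {
    struct handoff *h = arg;
    for (int i = 0; i < h->frames; i++) {
        wait_for(h, OWNER_GL);
        record(h, 'D');  /* placeholder: GL draw calls that use the buffer */
        record(h, 'F');  /* placeholder: glFinish() */
        pass_to(h, OWNER_CL);
    }
    return NULL;
}

static void *cl_thread(void *arg) {
    struct handoff *h = arg;
    for (int i = 0; i < h->frames; i++) {
        wait_for(h, OWNER_CL);
        record(h, 'A');  /* placeholder: clEnqueueAcquireGLObjects() */
        record(h, 'K');  /* placeholder: clEnqueueNDRangeKernel() */
        record(h, 'R');  /* placeholder: clEnqueueReleaseGLObjects() */
        record(h, 'C');  /* placeholder: clFinish() */
        pass_to(h, OWNER_GL);
    }
    return NULL;
}

/* Run `frames` GL -> CL -> GL round trips and return the step trace. */
const char *run_handoff(struct handoff *h, int frames) {
    pthread_t gl, cl;
    int rc;

    pthread_mutex_init(&h->lock, NULL);
    pthread_cond_init(&h->changed, NULL);
    h->owner = OWNER_GL;
    h->frames = frames;
    h->log_len = 0;
    h->log[0] = '\0';

    rc = pthread_create(&gl, NULL, gl_thread, h);
    assert(rc == 0);
    rc = pthread_create(&cl, NULL, cl_thread, h);
    assert(rc == 0);
    (void)rc;

    pthread_join(gl, NULL);
    pthread_join(cl, NULL);
    pthread_cond_destroy(&h->changed);
    pthread_mutex_destroy(&h->lock);
    return h->log;
}
```

Calling `run_handoff(&h, 2)` on a `struct handoff h` runs two round trips and returns the trace `DFAKRCDFAKRC` (build with `-pthread`). The two threads never touch the buffer at the same time, which is what the spec text requires; whether a particular driver then accepts the acquire call from a non-GL thread is a separate question.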

Thanks for your quick inputs.

Did you follow the instructions in section 9.8.6.1 of the CL 1.1 spec? Here’s an excerpt:

Yes. I confirmed that the GL calls are complete by calling glFlush() and glFinish().
However, the same clAcquireGL…(…) works fine if called from the same (GL) thread. That means there are no pending calls from GL. Is that right?
Is there some additional handling needed in the case of a different thread? Or may it be a driver issue? I ask because I faced a similar issue when I called clCreatesharedContext(…) with the OpenCL 1.0 driver; the same code worked fine when I upgraded to the OpenCL 1.1 driver (as per your input).
I kindly request your expert input to resolve this issue as soon as possible.

If it works with the same thread and it fails with different threads, that may be a driver issue. If you can provide your vendor with a very short application that reproduces the issue they are more likely to look into it and fix it. Less than 50 lines of code is ideal.

Thanks once again for your time and effort. Your inputs really helped me make progress, as I’m new to this.

If it works with the same thread and it fails with different threads, that may be a driver issue. If you can provide your vendor with a very short application that reproduces the issue they are more likely to look into it and fix it. Less than 50 lines of code is ideal.

Yes. I have a sample program that confirms this. I’ll post it to the NVIDIA forum. I have already raised it at http://forums.nvidia.com/index.php?s=5c … pic=223745
However, I’ll attach the sample code as well.
By the way, I guess the NVIDIA CUDA GPU programming forum is the right place to raise it. Please correct me if I’m wrong.

Also, the OpenCL 1.1 specification mentions the following:

9.8.6.1 Synchronizing OpenCL and OpenGL Access to Shared Objects
If a GL context is bound to a thread other than the one in which clEnqueueReleaseGLObjects
is called, changes to any of the objects in mem_objects may not be visible to that context without
additional steps being taken by the application. For an OpenGL 3.1 (or later) context, the
requirements are described in Appendix D (“Shared Objects and Multiple Contexts”) of the
OpenGL 3.1 Specification. For prior versions of OpenGL, the requirements are implementation-dependent.

In Appendix D of the OpenGL specification, I could not find much that is relevant to the additional steps that should be taken after ‘clEnqueueReleaseGLObjects’.
Please let me know if you already know what those additional steps are.

thanks in advance

Hi,

Once again, I need some input, this time on a performance issue with OpenCL-OpenGL interoperability. Following on from the issue above, I applied a temporary fix by moving the CL API calls that cause the segmentation fault into the GL thread. With that, things are working fine. I’m using the NBody simulation example.
Here I found another problem: I see no performance improvement even after using CL, i.e. the FPS is still 20.
I confirmed that the NDRange kernel and the buffer copies are not taking much time.

Please let me know the possible reasons why the performance doesn’t improve.
thanks in advance
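
One way to narrow this down is to time each stage of a frame separately. The following is a rough sketch under assumptions: the stage names and the `frame_timer` helpers are hypothetical, and the numbers only mean something if each stage ends with a blocking call (clFinish for CL work, glFinish for GL work), because both APIs queue work asynchronously. A frame rate that stays pinned at one value can also come from the buffer swap waiting on vertical sync, which is why the swap is a stage of its own here.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdio.h>
#include <time.h>

/* Hypothetical stage list for one frame of an interop render loop. */
enum stage {
    STAGE_ACQUIRE,  /* glFinish + clEnqueueAcquireGLObjects */
    STAGE_KERNEL,   /* kernel launch + clFinish */
    STAGE_RELEASE,  /* clEnqueueReleaseGLObjects + clFinish */
    STAGE_DRAW,     /* GL draw calls + glFinish */
    STAGE_SWAP,     /* buffer swap */
    STAGE_COUNT
};

struct frame_timer {
    double total_ms[STAGE_COUNT];  /* accumulated time per stage */
    double last_ms;                /* timestamp of the previous mark */
    long   frames;                 /* number of frames started */
};

/* Monotonic wall-clock time in milliseconds. */
double now_ms(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1000.0 + (double)ts.tv_nsec / 1.0e6;
}

/* Frames per second implied by an average frame time in milliseconds. */
double fps_from_frame_ms(double frame_ms) {
    return frame_ms > 0.0 ? 1000.0 / frame_ms : 0.0;
}

/* Call at the top of every frame. */
void timer_begin_frame(struct frame_timer *t) {
    t->last_ms = now_ms();
    t->frames++;
}

/* Call right after the blocking call that ends stage `s`. */
void timer_mark(struct frame_timer *t, enum stage s) {
    double now = now_ms();
    t->total_ms[s] += now - t->last_ms;
    t->last_ms = now;
}

/* Print the average time per stage and the frame rate they add up to. */
void timer_report(const struct frame_timer *t) {
    static const char *names[STAGE_COUNT] = {
        "acquire", "kernel", "release", "draw", "swap"
    };
    double frame_ms = 0.0;

    if (t->frames == 0)
        return;
    for (int s = 0; s < STAGE_COUNT; s++) {
        double avg = t->total_ms[s] / (double)t->frames;
        printf("%-8s %8.3f ms\n", names[s], avg);
        frame_ms += avg;
    }
    printf("frame    %8.3f ms (%.1f FPS)\n",
           frame_ms, fps_from_frame_ms(frame_ms));
}
```

In the render loop, call `timer_begin_frame` at the top, `timer_mark` after each stage's blocking call, and `timer_report` every few hundred frames. At 20 FPS the stage averages should add up to roughly 50 ms, and the largest one shows where to look next.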