
Thread: Global Barriers?

  1. #1
    Junior Member
    Join Date
    Dec 2008
    Toronto, Ontario, Canada

    Global Barriers?

    I'm writing an algorithm that needs a single, very quick global barrier, after which processing resumes in parallel. In other words: a large amount of parallel work runs, every work-item hits the barrier, one work-item proceeds past it and does some very quick work, and then all work-items resume.

    I don't see how this is possible in OpenCL. The barrier() function only synchronizes the work-items within a single work-group. That isn't good enough, because I need to synchronize across every global ID.

    The alternative is to split the work into three kernels: kernel_1 does everything in parallel up to the barrier, kernel_2 runs a single work-item that does very little work (a huge waste of a launch, but required by the algorithm), and kernel_3 resumes the parallel work. I'd like to avoid that host-side management where I can, because it adds overhead that shouldn't be necessary.

    Normally I wouldn't care... but this is part of a very time-critical algorithm, and I want to ensure this part is as fast as possible.
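The three-kernel split described above can be sketched in plain C, with ordinary loops standing in for kernel launches. This is only an emulation under stated assumptions: the names `phase1_item`, `serial_item`, `phase2_item`, and `run_three_launches` are hypothetical, the arithmetic is a placeholder for the real work, and `gid` stands in for what `get_global_id(0)` would return inside an OpenCL kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical phase 1 body: one call per work-item.
   In an OpenCL kernel, gid would come from get_global_id(0). */
void phase1_item(size_t gid, const float *in, float *tmp)
{
    tmp[gid] = in[gid] * in[gid];
}

/* Hypothetical serial step: the body of a kernel launched with one work-item.
   The sum is a placeholder for whatever the quick serial work really is. */
void serial_item(const float *tmp, size_t n, float *total)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i)
        sum += tmp[i];
    *total = sum;
}

/* Hypothetical phase 2 body: every work-item reads the serial result. */
void phase2_item(size_t gid, const float *tmp, const float *total, float *out)
{
    out[gid] = tmp[gid] - *total;
}

/* Host-side stand-in: each loop plays the role of one kernel launch.
   The end of each launch is where the global barrier falls. */
void run_three_launches(const float *in, float *tmp, float *out, size_t n)
{
    float total = 0.0f;

    assert(n > 0);

    for (size_t gid = 0; gid < n; ++gid)    /* launch 1: n work-items */
        phase1_item(gid, in, tmp);

    serial_item(tmp, n, &total);            /* launch 2: one work-item */

    for (size_t gid = 0; gid < n; ++gid)    /* launch 3: n work-items */
        phase2_item(gid, tmp, &total, out);
}
```

On an in-order command queue, the three launches can be enqueued back to back (for example with clEnqueueNDRangeKernel) without the host blocking in between, since each command starts only after the previous one has finished. That keeps the host's role to three enqueue calls, although the launch overhead of the middle kernel remains.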

    OpenCL standards committee member

  2. #2
    Join Date
    Nov 2009

    Re: Global Barriers?

    OpenCL only supports synchronization within work-groups. The official way to synchronize globally is to use multiple kernels, as you pointed out. But I think you would need only two kernels rather than three: in the first kernel you do all the work up to the barrier, and a single work-item (say, the one with global ID 0) does the sequential work. Then in the second kernel you do the remaining parallel work.

    There's a paper at this year's CC conference called "Automatic C-to-CUDA Code Generation for Affine Programs". The authors say they use a "single-writer multiple-reader" technique to achieve synchronization across thread blocks using the global memory space. They don't discuss the performance of this technique, though.

  3. #3
    Senior Member
    Join Date
    Jul 2009
    Northern Europe

    Re: Global Barriers?

    The "single-writer multiple-reader" thing sounds a lot like one work-item writes a flag and the others spin on it. That may work, but without guarantees about how the hardware schedules work-groups, it might also never complete. (I've heard that it tends to work on NVIDIA hardware.)
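The scheduling concern can be made concrete with a toy model in plain C. This is a sketch, not a description of any real device: it assumes the device can hold only `slots` work-groups at once, that a resident group keeps its slot until it finishes, and that groups are admitted in launch order. OpenCL itself does not specify any of this. The function name `spin_barrier_completes` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if every work-group gets past the spin barrier within
   max_rounds scheduling rounds, 0 if the model is still spinning
   when the budget runs out. */
int spin_barrier_completes(size_t num_groups, size_t slots, size_t max_rounds)
{
    size_t next = 0;      /* next work-group waiting for a slot */
    size_t resident = 0;  /* groups currently on the device, all spinning */
    size_t arrived = 0;   /* value of the shared arrival counter */
    size_t retired = 0;   /* groups that got past the barrier */
    int flag = 0;         /* the single writer's "go" flag */

    assert(num_groups > 0 && slots > 0);

    for (size_t round = 0; round < max_rounds; ++round) {
        /* Fill free slots in launch order; a newly admitted group
           bumps the arrival counter and then starts spinning. */
        while (resident < slots && next < num_groups) {
            ++resident;
            ++next;
            ++arrived;
        }

        /* Group 0 is admitted first and cannot leave before the flag is
           set, so the writer is resident here. It sets the flag only
           after every group has arrived. */
        if (!flag && arrived == num_groups)
            flag = 1;

        /* Readers leave the spin loop only once the flag is set. */
        if (flag) {
            retired += resident;
            resident = 0;
        }

        if (retired == num_groups)
            return 1;
    }
    return 0; /* still spinning: the waiting groups never got a slot */
}
```

In this model the barrier completes only when all work-groups fit on the device at the same time. With one group too many, the resident groups spin forever while the last group waits for a slot that never frees up, which is one way the "might never complete" case could arise.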
