Calling a kernel from within a kernel?

I have an application that does a lot of FFT + some subsequent data processing.

At the moment I do this:

  1. Copy data from CPU to GPU.
  2. Do FFT on GPU (with kernel 1)
  3. Copy data from GPU to CPU.
  4. Do some (minor) processing + logic on CPU
  5. Copy data from CPU to GPU.
  6. Do some more data processing on GPU (with kernel 2)
    … and so on …

Problem: Memory transfers consume a lot of time!

Would it be possible to implement the complete program in OpenCL? That would mean moving the program flow and logic that the CPU handles now (calling kernels, etc.) onto the GPU.

Question: Is it possible to call a kernel from within a kernel in OpenCL?

Is it possible at all to do steps 1 … 6 completely on GPU with OpenCL?

As far as I understand it, it’s all about implementing step 4 for the GPU. Once you’ve got that, you can skip steps 3 and 5, because the data is already where you need it.

I’m not sure if calling a kernel from within a kernel is possible. Maybe it just gets inlined like any other function call. With OpenCL 1.x it’s definitely not possible to enqueue a kernel to a command queue from within a kernel (device-side enqueue only arrived with OpenCL 2.0, on devices that support it). So if you want to start a new kernel with a possibly different number of work-items, you’d have to do that from your CPU.
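To illustrate the "regular function call" case: as far as I know, OpenCL C treats a call to a `__kernel` function from another kernel as an ordinary function call that runs inside the calling work-item, so no new NDRange is started. Here is a minimal sketch. The kernel names `scale` and `scale_then_offset` are made up for illustration, and the `#ifndef` shim at the top is only my assumption for letting the same source compile as plain C for a quick check; an OpenCL compiler skips it.

```c
/* Host-side shim (illustration only): lets this file compile as plain C.
   An OpenCL C compiler defines __OPENCL_VERSION__ and skips this part. */
#ifndef __OPENCL_VERSION__
#include <stddef.h>
#define __kernel
#define __global
static size_t fake_gid;  /* stand-in for the current work-item id */
static size_t get_global_id(unsigned dim) { (void)dim; return fake_gid; }
#endif

/* Hypothetical kernel 1: scales one element per work-item. */
__kernel void scale(__global float *data, float factor)
{
    size_t i = get_global_id(0);
    data[i] *= factor;
}

/* Hypothetical kernel 2: calls kernel 1 like a normal function.
   The call executes in the same work-item with the same global id;
   it does not launch anything and cannot change the work-group setup. */
__kernel void scale_then_offset(__global float *data, float factor, float offset)
{
    size_t i = get_global_id(0);
    scale(data, factor);
    data[i] += offset;
}
```

So this only helps when both steps can share one work-item layout. For steps that need a different number of work-items you still go through the host.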

What’s the processing + logic step on your CPU doing? Could you implement that for the GPU?

I am sure I could. But that’s mostly serial stuff.

Did I understand it correctly that once you have called a kernel you cannot change the workgroup setup any more? So for all subsequent functions it’s the same?

If so, this would be a problem, because my kernels use different workgroup sizes.

[quote=“FredericX”]

I am sure I could. But that’s mostly serial stuff.[/quote]
If you can run it on the GPU you wouldn’t have the overhead of transferring your data all the time. So even if it’s not optimal to run the code on the GPU, it might still be beneficial (depends on how much computation it is compared to how much data you transfer…)
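A rough way to check that trade-off is to compare the round trip (copy out, CPU step, copy back) against the time the same step would take on the GPU. The sketch below is only back-of-the-envelope arithmetic; the struct, the function names and any numbers you feed in are hypothetical, and you would have to measure the real values on your own system.

```c
/* Cost model for one host<->device copy (all values are assumptions
   that you would measure yourself). */
typedef struct {
    double bytes;          /* size of one transfer in bytes */
    double bandwidth_bps;  /* effective bus bandwidth in bytes per second */
    double latency_s;      /* fixed cost per transfer in seconds */
} transfer_t;

/* Time for a single copy: fixed latency plus size over bandwidth. */
static double transfer_time(const transfer_t *t)
{
    return t->latency_s + t->bytes / t->bandwidth_bps;
}

/* Current approach: GPU->CPU copy, CPU step, CPU->GPU copy. */
static double cpu_path_time(const transfer_t *t, double cpu_step_s)
{
    return 2.0 * transfer_time(t) + cpu_step_s;
}

/* Returns 1 if running the step on the GPU (even slowly) is faster
   than the round trip through the CPU, 0 otherwise. */
static int gpu_step_wins(const transfer_t *t, double cpu_step_s, double gpu_step_s)
{
    return gpu_step_s < cpu_path_time(t, cpu_step_s);
}
```

For example, with a 64 MB buffer, 4 GB/s effective bandwidth and 20 µs latency, the two copies alone cost about 32 ms, so a serial GPU step could be many times slower than the CPU version and still come out ahead.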

[quote=“FredericX”]

Did I understand it correctly that once you have called a kernel you cannot change the workgroup setup any more? So for all subsequent functions it’s the same?

If so, this would be a problem, because my kernels use different workgroup sizes.[/quote]

No, you can have different work-group sizes as well as different numbers of work-items. But you have to launch each kernel from the host, i.e. the CPU. That shouldn’t be a problem though, as the overhead of launching a kernel is pretty small compared to the data transfer overhead.
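To make the "different sizes per kernel" point concrete, here is a small host-side sketch of a launch plan in which every stage picks its own work-group size. The stage names and sizes are invented for illustration. The one real constraint it encodes is that in OpenCL 1.x the global size has to be a multiple of the local size, so the usual trick is to round the global size up and have the kernel ignore the extra work-items.

```c
#include <stddef.h>

/* One pipeline stage (names and sizes are hypothetical). */
typedef struct {
    const char *kernel_name;
    size_t work_items;  /* work-items the stage actually needs */
    size_t local_size;  /* work-group size chosen for this kernel */
} stage_t;

/* Round n up to the next multiple of local. */
static size_t round_up(size_t n, size_t local)
{
    return ((n + local - 1) / local) * local;
}

/* Global size to pass for this stage. Because it may exceed work_items,
   the kernel needs a bounds check such as
   "if (get_global_id(0) >= n) return;". */
static size_t global_size_for(const stage_t *s)
{
    return round_up(s->work_items, s->local_size);
}

/* Each stage would be launched from the host on the same device buffer,
   e.g. clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global, &local,
   0, NULL, NULL), with no read or write of the buffer in between. */
static const stage_t pipeline[] = {
    { "fft_kernel",   4096, 256 },
    { "serial_logic",    1,   1 },  /* serial step as a single work-item */
    { "postprocess",  1000,  64 },
};
```

The middle stage is the serial CPU logic run as one work-item. That is slow by GPU standards, but the data never leaves the device.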