Pipe question

Hi,
OpenCL provides a new feature: pipes.
A pipe can transfer data between kernels.

If the producer knows how many times it will transfer data to the consumer, we can give the consumer a termination condition so that it finishes kernel execution after that many transfers.

But if the producer doesn’t know how many times it will transfer data to the consumer, how can we set a termination condition so that the consumer knows when to finish the kernel function?

Thanks.

get_pipe_num_packets returns the number of available entries in the pipe.

You cannot expect two kernels to run in parallel, because the majority of devices don’t support this. Also, the result of simultaneously modifying the same memory object from different devices is undefined. The expected usage scenario for pipes is to run the producer kernel first and the consumer afterwards. You probably want to use device-side dispatch as well.

Hi,
Thanks for your reply.
I understand that the producer produces data and transfers it to the consumer, and the consumer then receives the data and computes on it.
If the producer knows it will pass 10 items to the consumer, the consumer can terminate the kernel function after receiving 10 items.
But if the producer produces N items, where N is random, and the consumer uses an infinite loop to receive them, how can the consumer break out of the loop and finish kernel execution?

Thanks,

The consumer could use get_pipe_num_packets to query the number of packets in the pipe.
In pseudocode, I’d probably do something like this:

items_in_pipe = get_pipe_num_packets(pipe)
while items_in_pipe > 0
    if (items_in_pipe > global_id)
        data = *all the stuff to read from pipe*
        process(data)
    // update outside the if, so work-items that read nothing still leave the loop
    items_in_pipe = max(0, items_in_pipe - work_items_num)
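
An alternative to counting packets up front is to let the consumer keep reading until a read fails. Below is a minimal sketch in plain C, not OpenCL C. The names sim_pipe, sim_write_pipe, sim_read_pipe, run_producer and run_consumer are hypothetical and only model a pipe as a small FIFO. It assumes, as suggested above, that the producer has finished before the consumer starts. The read helper copies the return convention of the OpenCL built-in read_pipe (0 on success, a negative value when the pipe is empty), so the consumer needs no advance knowledge of N.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an OpenCL pipe: a fixed-capacity FIFO of ints. */
#define SIM_PIPE_CAPACITY 64

typedef struct {
    int packets[SIM_PIPE_CAPACITY];
    unsigned head;   /* index of the next packet to read */
    unsigned count;  /* number of packets currently stored */
} sim_pipe;

/* Same return convention as write_pipe: 0 on success, negative if full. */
static int sim_write_pipe(sim_pipe *p, int value)
{
    if (p->count == SIM_PIPE_CAPACITY)
        return -1;
    p->packets[(p->head + p->count) % SIM_PIPE_CAPACITY] = value;
    p->count++;
    return 0;
}

/* Same return convention as read_pipe: 0 on success, negative if empty. */
static int sim_read_pipe(sim_pipe *p, int *out)
{
    if (p->count == 0)
        return -1;
    *out = p->packets[p->head];
    p->head = (p->head + 1) % SIM_PIPE_CAPACITY;
    p->count--;
    return 0;
}

/* Producer: writes the values 1..n; n is not communicated to the consumer.
 * Returns the number of packets actually written. */
static unsigned run_producer(sim_pipe *p, unsigned n)
{
    unsigned written = 0;
    assert(p != NULL);
    for (unsigned i = 1; i <= n; i++) {
        if (sim_write_pipe(p, (int)i) != 0)
            break;  /* pipe is full, stop producing */
        written++;
    }
    return written;
}

/* Consumer: has no packet count; it stops at the first failed read.
 * Returns the sum of the packets and stores how many were consumed. */
static long run_consumer(sim_pipe *p, unsigned *consumed)
{
    long sum = 0;
    int value;
    assert(p != NULL && consumed != NULL);
    *consumed = 0;
    while (sim_read_pipe(p, &value) == 0) {
        sum += value;
        (*consumed)++;
    }
    return sum;
}
```

With an empty sim_pipe, calling run_producer(&p, 10) and then run_consumer(&p, &consumed) consumes all 10 packets and returns 55. In a real kernel, the same loop shape would also avoid depending on a packet count that other work-items may change between the query and the read.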

Is there even a use case for pipes on CPU or GPU devices (that is more efficient or less code than just using global memory or images between kernels), or do they exist just for FPGA devices?

Some CPUs and GPUs can run concurrent kernels, so pipes may be as efficient as global memory and take less code, but they are not nearly as efficient as they can be when implemented on FPGAs.