PNG decoder using OpenCL

Hi folks,
I am trying to implement a PNG decoder using C++ and, partly, OpenCL. For now I have decided to do the (un)filtering step on the decoder side in OpenCL, as it is the only repetitive part. I have implemented some of it, but when I pass the decompressed stream of image data to the kernel, it only operates over its global size and processes only about 6 KB of the data passed in, instead of the complete stream. Also, the current pixels depend on previous ones. Are there any ways to fix these problems?
Any help would be appreciated…

Thanks

A key to fast parallel code is to reduce the dependencies. If your “previous pixel” dependencies are always “to the left in the same row”, then make each work item process a whole row left to right, so there are no dependencies outside of the work item. I don’t know what is causing your incomplete processing, but it sounds like further study of work items, work groups, and global sizes is in order.
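To make the row-per-work-item idea concrete, here is a minimal sketch in plain C++, assuming every row uses only the "Sub" filter (each byte depends on the byte one pixel to its left). The names `unfilter_sub_row` and `unfilter_sub_image`, and the flat buffer layout without filter-type bytes, are assumptions for illustration, not the poster's code. The body of `unfilter_sub_row` is roughly what one work item would do, and the outer loop stands in for the global range, which would then be the number of rows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: undo the PNG "Sub" filter for one scanline, in place.
// `bpp` is the number of bytes per pixel. This models the work of ONE work item.
void unfilter_sub_row(std::uint8_t* row, std::size_t row_bytes, std::size_t bpp) {
    assert(bpp > 0);
    // The first pixel has no left neighbour, so start at byte index `bpp`.
    for (std::size_t i = bpp; i < row_bytes; ++i) {
        // Addition is modulo 256, as the unsigned byte cast makes explicit.
        row[i] = static_cast<std::uint8_t>(row[i] + row[i - bpp]);
    }
}

// Each iteration reads and writes only its own row, so the iterations are
// independent and could be mapped to one work item per row.
void unfilter_sub_image(std::vector<std::uint8_t>& pixels,
                        std::size_t row_bytes, std::size_t rows, std::size_t bpp) {
    for (std::size_t r = 0; r < rows; ++r) {
        unfilter_sub_row(pixels.data() + r * row_bytes, row_bytes, bpp);
    }
}
```

Inside a row the loop stays sequential; the parallelism comes only from running many rows at once.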

The previous pixels are not always in the same row. PNG uses adaptive filtering, so the pixel that the current one depends on can be the pixel to its left, the pixel directly above it, or the pixel above and to the left.
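For reference, the dependencies can be written out as a sequential C++ sketch of the five filter types in the PNG specification (0 None, 1 Sub, 2 Up, 3 Average, 4 Paeth). The filter type is stored as one byte in front of each scanline. The function name `unfilter` and the buffer layout are assumptions for illustration, and unknown filter types are simply treated as None here. Note that Up, Average, and Paeth read the reconstructed row above, so those rows cannot be processed before the row they refer to, while None and Sub rows do not read the row above.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Paeth predictor as defined in the PNG specification:
// a = left, b = above, c = above-left.
inline int paeth(int a, int b, int c) {
    int p = a + b - c;
    int pa = std::abs(p - a), pb = std::abs(p - b), pc = std::abs(p - c);
    if (pa <= pb && pa <= pc) return a;
    return (pb <= pc) ? b : c;
}

// Hypothetical helper. `in` holds `rows` scanlines, each made of one
// filter-type byte followed by `row_bytes` filtered bytes.
// Returns rows * row_bytes reconstructed bytes. `bpp` = bytes per pixel.
std::vector<std::uint8_t> unfilter(const std::vector<std::uint8_t>& in,
                                   std::size_t row_bytes, std::size_t rows,
                                   std::size_t bpp) {
    std::vector<std::uint8_t> out(rows * row_bytes);
    for (std::size_t r = 0; r < rows; ++r) {
        const std::uint8_t* src = in.data() + r * (row_bytes + 1);
        const std::uint8_t type = src[0];
        const std::uint8_t* filt = src + 1;
        std::uint8_t* cur = out.data() + r * row_bytes;
        // The row above the first row is treated as all zeros.
        const std::uint8_t* up = (r > 0) ? cur - row_bytes : nullptr;
        for (std::size_t i = 0; i < row_bytes; ++i) {
            int a = (i >= bpp) ? cur[i - bpp] : 0;        // left
            int b = up ? up[i] : 0;                       // above
            int c = (up && i >= bpp) ? up[i - bpp] : 0;   // above-left
            int pred = 0;
            switch (type) {
                case 1: pred = a; break;                  // Sub
                case 2: pred = b; break;                  // Up
                case 3: pred = (a + b) / 2; break;        // Average
                case 4: pred = paeth(a, b, c); break;     // Paeth
                default: pred = 0; break;                 // None (and unknown, by assumption)
            }
            cur[i] = static_cast<std::uint8_t>(filt[i] + pred);
        }
    }
    return out;
}
```

A sequential version like this can also serve as a reference to compare kernel output against.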