I have a 2D image and I perform a classical local extremum (min-max) search by reading each pixel's 8-neighborhood. My code is simple, but I think it is far from optimal for a GPU.

Is there a known algorithm for this task? (I haven't found one yet.)

I was thinking about overlapping blocks, where each block reads a part of the image (its own tile plus a one-pixel border shared with the neighboring blocks), puts it in a local buffer, and performs the local min-max by reading this buffer.
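
To make the idea concrete, here is a minimal CPU-side C++ sketch of the overlapping-block scheme. It is not GPU code: the array `buf` only stands in for the local (shared) memory of one work-group, and the two loop phases mark where a cooperative load, a barrier, and the per-pixel test would go in a real kernel. The names `localExtremaTiled`, `TILE`, `NONE`, `LOCAL_MIN` and `LOCAL_MAX` are made up for illustration. I am also assuming that "local extremum" means a pixel strictly greater (or strictly smaller) than all 8 neighbors, and that pixels on the image border are simply left unclassified.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical output labels (not from any library).
constexpr std::uint8_t NONE = 0, LOCAL_MIN = 1, LOCAL_MAX = 2;

// Assumed tile edge; on a GPU this would match the work-group size.
constexpr int TILE = 16;

// Clamp a coordinate into [0, n-1] so halo loads never read out of bounds.
static int clampi(int v, int n) { return v < 0 ? 0 : (v >= n ? n - 1 : v); }

std::vector<std::uint8_t> localExtremaTiled(const std::vector<float>& img, int w, int h) {
    assert(w > 0 && h > 0 && img.size() == static_cast<std::size_t>(w) * h);
    std::vector<std::uint8_t> out(img.size(), NONE);

    constexpr int S = TILE + 2;  // tile edge plus a 1-pixel halo on each side
    float buf[S * S];            // stands in for GPU local/shared memory

    for (int ty = 0; ty < h; ty += TILE) {
        for (int tx = 0; tx < w; tx += TILE) {
            // Phase 1: load the tile and its halo. Each cell is read from the
            // image once; neighboring tiles overlap by the halo width.
            for (int ly = 0; ly < S; ++ly)
                for (int lx = 0; lx < S; ++lx)
                    buf[ly * S + lx] =
                        img[clampi(ty + ly - 1, h) * w + clampi(tx + lx - 1, w)];

            // (In a GPU kernel, a work-group barrier would go here.)

            // Phase 2: every pixel of the tile reads its 8 neighbors from buf only.
            for (int ly = 1; ly <= TILE; ++ly) {
                for (int lx = 1; lx <= TILE; ++lx) {
                    const int x = tx + lx - 1, y = ty + ly - 1;
                    // Skip pixels outside the image or on its border (assumption).
                    if (x < 1 || y < 1 || x >= w - 1 || y >= h - 1) continue;

                    const float c = buf[ly * S + lx];
                    bool isMax = true, isMin = true;
                    for (int dy = -1; dy <= 1; ++dy) {
                        for (int dx = -1; dx <= 1; ++dx) {
                            if (dx == 0 && dy == 0) continue;
                            const float n = buf[(ly + dy) * S + (lx + dx)];
                            isMax = isMax && (c > n);
                            isMin = isMin && (c < n);
                        }
                    }
                    out[y * w + x] = isMax ? LOCAL_MAX : (isMin ? LOCAL_MIN : NONE);
                }
            }
        }
    }
    return out;
}
```

Calling `localExtremaTiled(img, w, h)` on a row-major `std::vector<float>` returns one label per pixel. The point of the layout is that each block reads `(TILE + 2) * (TILE + 2)` values from the image instead of `9 * TILE * TILE`, which is the saving I would hope to get from local memory on the GPU.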