I’m trying to implement a simple iterated convolution kernel with edge wrapping. I have a few questions about how to do this most effectively.
What is the best way to implement edge wrapping in a kernel? Should I use modulo? Should I limit my buffers to widths/heights that are powers of two and use a bitwise AND? Or should I avoid this altogether, use a larger buffer, and copy the left edge to the right side and vice versa? If so, what’s the best way to do that?
Paul is right if your input is an image: just set the sampler’s addressing mode to wrap. If you are using buffers and/or blocking into local memory, you’ll have to handle the wrapping yourself. When using local memory, you can simply load the extra data for the edges (the wrapped-around neighbours) along with each block. In general you’ll want separate code paths for the edge case and the body case, so you don’t pay the conditional/modulo overhead on every calculation.
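To make the edge/body split concrete, here is a rough sketch in plain C rather than kernel code, so treat it as an illustration of the indexing only. All names (`wrap_mod`, `wrap_pow2`, `tap_sum_wrapped`, `convolve_wrap_1d`) are made up for this example, and it is 1D for brevity; the same split applies per axis in 2D.

```c
#include <assert.h>

/* Wrap any index (including negatives) into [0, n) using modulo. */
static int wrap_mod(int i, int n) {
    assert(n > 0);
    int r = i % n;              /* in C the remainder takes the sign of i */
    return r < 0 ? r + n : r;
}

/* Wrap using a bitwise AND. Only valid when n is a power of two. */
static int wrap_pow2(int i, int n) {
    assert(n > 0 && (n & (n - 1)) == 0);
    /* Go through unsigned so negative indices wrap in a well-defined way. */
    return (int)((unsigned)i & (unsigned)(n - 1));
}

/* One output element computed with wrapped reads (slow path, edges only). */
static float tap_sum_wrapped(const float *in, int n,
                             const float *kernel, int radius, int x) {
    float acc = 0.0f;
    for (int k = -radius; k <= radius; ++k)
        acc += kernel[k + radius] * in[wrap_mod(x + k, n)];
    return acc;
}

/* 1D convolution with wrap-around, split into edge and body paths.
 * kernel has 2*radius + 1 taps; in and out must not alias. */
static void convolve_wrap_1d(const float *in, float *out, int n,
                             const float *kernel, int radius) {
    int lo = radius < n ? radius : n;            /* first index needing no wrap */
    int hi = n - radius > lo ? n - radius : lo;  /* one past the last such index */

    /* Left edge: reads may fall below 0, so wrap every tap. */
    for (int x = 0; x < lo; ++x)
        out[x] = tap_sum_wrapped(in, n, kernel, radius, x);

    /* Body: every read is in range, so no modulo or conditional per tap. */
    for (int x = lo; x < hi; ++x) {
        float acc = 0.0f;
        for (int k = -radius; k <= radius; ++k)
            acc += kernel[k + radius] * in[x + k];
        out[x] = acc;
    }

    /* Right edge: reads may run past n - 1, so wrap every tap. */
    for (int x = hi; x < n; ++x)
        out[x] = tap_sum_wrapped(in, n, kernel, radius, x);
}
```

If your sizes are powers of two, `wrap_pow2` could replace `wrap_mod` in the edge path. Whether the split actually pays off on your hardware is something to measure rather than assume; for small buffers the edges may dominate anyway.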