Division by Zero on Large Problem Sizes

Hello all,
I am hoping someone can help me with a problem I am having. I am writing a computational fluid dynamics (CFD) code using OpenCL, but I run into problems when the grid/mesh gets large.

The mesh I have is 2D, but I am storing it as a large 1D array of cl_float2.
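
To make the layout concrete, here is a minimal sketch of the idea, assuming row-major ordering. The names Float2, flatIndex, makeMesh, nx and ny are made up for this example (my real code uses cl_float2), so please read it as an illustration rather than my actual source:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for cl_float2 so the sketch compiles without the OpenCL headers.
struct Float2 {
    float x;
    float y;
};

// Map a 2D grid coordinate (i, j) to a position in the flat 1D array,
// assuming row-major ordering with nx points per row.
inline std::size_t flatIndex(std::size_t i, std::size_t j, std::size_t nx) {
    assert(i < nx);  // i must stay inside the row
    return j * nx + i;
}

// Allocate a zero-initialised nx-by-ny mesh as one contiguous array.
inline std::vector<Float2> makeMesh(std::size_t nx, std::size_t ny) {
    return std::vector<Float2>(nx * ny, Float2{0.0f, 0.0f});
}
```

For example, a 150 x 100 grid built with makeMesh(150, 100) holds 15,000 points, which is roughly the size at which my trouble starts.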

My program runs happily with up to about 15,000 points in the array. Beyond that, it crashes with an “Integer Division by Zero” error. I am also using the C++ bindings for OpenCL.

My investigation so far has led me to:

  • Ensuring that the work-group size is not larger than either the device or the kernel can handle (the device reports 512, the kernel reports 320; for a 20,000-point grid I might set it to 100 or 200)
  • Attempting to use a 2D work-item size (the device reports 512x512x64; I’m not sure how to get this from the kernel, and I have tried setting it manually to 200x100x1)
  • Altering the grids being used (as I mentioned, they work up to about 15,000 points and give me problems beyond that)
  • Running on the CPU (a dual-core Core 2 Duo and a quad-core i7, which reports a work-group size of 1024 and work-item sizes of 1024x1024x1024)
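
The work-group size check from the first item can be sketched as below. This is plain C++ with no OpenCL calls, and the helper names chooseLocalSize and paddedGlobalSize are hypothetical. It assumes the two limits have already been queried (CL_DEVICE_MAX_WORK_GROUP_SIZE for the device, CL_KERNEL_WORK_GROUP_SIZE for the kernel), and it relies on the OpenCL 1.x rule that, when a local size is given explicitly, the global size must be a multiple of it:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Pick a local (work-group) size that exceeds neither the device limit
// nor the kernel limit.
inline std::size_t chooseLocalSize(std::size_t requested,
                                   std::size_t deviceMax,
                                   std::size_t kernelMax) {
    return std::min(requested, std::min(deviceMax, kernelMax));
}

// Round the number of grid points up to the next multiple of the local size,
// so the global size divides evenly into work-groups.
inline std::size_t paddedGlobalSize(std::size_t numPoints,
                                    std::size_t localSize) {
    assert(localSize > 0);  // a zero local size would make the division below fail
    return ((numPoints + localSize - 1) / localSize) * localSize;
}
```

With my numbers, chooseLocalSize(200, 512, 320) gives 200, and paddedGlobalSize(20000, 200) stays at 20000 because 200 divides it evenly. If the global size does get padded, I assume the kernel would also need a guard such as returning early when get_global_id(0) is past the last real point.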

I have validated the inputs and outputs from the device and my kernel works fine when the grids are small enough. Unfortunately, for the problems used in CFD, my grids need to be larger or more refined.

My hardware is:
Intel Core 2 Duo CPU @ 3 GHz using ATI-Stream OpenCL 1.0 Driver
3.25 GB RAM
Windows XP 32-bit SP3
nVIDIA GeForce 9500 GT, 1GB DDR2, with 32 GPU cores, using nVIDIA OpenCL 1.1 developer driver 258.19.

So, in short, is there anything obvious that I am doing wrong? I am, at heart, an engineer, not a computer programmer, but I am trying :)

Thanks in advance!