Work group size

I ran into a very strange issue with the local work-group size.
I am processing a 512x512 matrix (my global work size) and used a local work-group size of (256,256). The program worked without any problems using the C API.
Now I have rewritten the program using the C++ wrapper. The kernel code stays the same, but I get an error about an invalid work-group size from enqueueNDRangeKernel(). The largest size I can use now is 16x16.
Can anybody explain what’s going on?

I have trouble believing that your hardware supports a work-group size of 256x256. That would be 65536 work-items in a single group. Typical hardware supports a maximum work-group size of 256 or 512 work-items in total, counted across all dimensions.

Is it possible that your C code is actually using a work-group size of 256 and that’s why it works fine? You can query your device’s max work group size with clGetDeviceInfo() and CL_DEVICE_MAX_WORK_GROUP_SIZE.
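
To make the arithmetic concrete, here is a minimal sketch in C of the check the runtime is effectively doing. The helper name local_size_fits is hypothetical and not part of OpenCL. The limit passed in is assumed to be the value you get back from clGetDeviceInfo() with CL_DEVICE_MAX_WORK_GROUP_SIZE. The divisibility test reflects the OpenCL 1.x rule that the global size must be a multiple of the local size, and the sketch does not check the separate per-dimension limits (CL_DEVICE_MAX_WORK_ITEM_SIZES).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not an OpenCL function): returns true when a 2D
 * local size is plausible for the given global size and device limit.
 * max_work_group_size is assumed to come from clGetDeviceInfo() with
 * CL_DEVICE_MAX_WORK_GROUP_SIZE. Per-dimension limits are not checked. */
static bool local_size_fits(const size_t global[2], const size_t local[2],
                            size_t max_work_group_size)
{
    /* A zero-sized dimension is never valid. */
    if (local[0] == 0 || local[1] == 0)
        return false;

    /* OpenCL 1.x: each global dimension must be a multiple of the local one. */
    if (global[0] % local[0] != 0 || global[1] % local[1] != 0)
        return false;

    /* The work-group size is the product of the local dimensions.
     * Dividing instead of multiplying avoids overflow for huge values. */
    return local[0] <= max_work_group_size / local[1];
}
```

With a limit of 256 (only the typical figure mentioned above, not a value queried from your device), a local size of (256,256) is rejected because 256 x 256 = 65536, while (16,16) is accepted because 16 x 16 = 256. That would be consistent with 16x16 being the largest size that works for you.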