Today I tried to compile my program on a CUDA machine:
- platform: NVIDIA CUDA [0]
- version: OpenCL 1.1 CUDA 4.2.1
- device: GeForce GTX 480 [0]
- compute units: 15
- global memory: 1535MB
- max. buffer size: 383MB
- max. work group size: 1024
- floating point precision: double
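Since the device reports double precision, one way to rule out a missing extension (as opposed to a compiler bug) is to guard the pragma with the `cl_khr_fp64` feature macro, which the OpenCL C compiler defines whenever the device actually supports the extension. A minimal sketch of the failing kernel with that guard (this is runtime-compiled OpenCL C source, not host code):

```c
/* Guard the fp64 pragma so an unsupported device fails with a clear
 * #error message instead of a cryptic build failure. */
#ifdef cl_khr_fp64
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
#else
#error "double precision (cl_khr_fp64) is not supported on this device"
#endif

__kernel void test(__global double *a, __global double *b)
{
    int i = get_global_id(0);
    a[i] = 99.9;
    b[i] = a[i] + 77.7;
}
```

If the guarded kernel still fails without hitting the `#error` branch, the extension is present and the problem lies elsewhere.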
The OpenCL program won't build with this kernel:
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
__kernel void test(__global double *a, __global double *b)
{
    int i = get_global_id(0);
    a[i] = 99.9;
    b[i] = a[i] + 77.7;
}
But it runs fine with this one:
// #pragma OPENCL EXTENSION cl_khr_fp64 : enable
__kernel void test(__global float *a, __global float *b)
{
    int i = get_global_id(0);
    a[i] = 99.9f;
    b[i] = a[i] + 77.7f;
}
Both kernels build and execute correctly on a second machine with an HD 7970. It gets even stranger: the following kernel builds and executes on the GTX 480 machine:
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
__kernel void test(__global double *a, __global double *b)
{
    int i = get_global_id(0);
    a[i] = 99.9;
    b[i] = 77.7;
}
The only difference from the first kernel is that I've removed a[i] from the last line. It turns out that the program won't build if I use the double data type and try to read from either array a or b, yet there is no problem with writing to both arrays! Can anyone think of a reason? The driver is 304.54 and the CUDA version is 5.0.35. I also tried the same code on another machine with a GTX 480, but with driver version 319.37 and CUDA 5.5.22, and got exactly the same behavior.
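For anyone trying to reproduce this: "won't build" can be narrowed down by reading back the compiler's build log after clBuildProgram fails. A minimal host-side sketch (error checking omitted; `program` and `device` are assumed to be the `cl_program` and `cl_device_id` already created by the host code):

```c
/* Host-side sketch: dump the build log when clBuildProgram fails.
 * Call after clBuildProgram() returns CL_BUILD_PROGRAM_FAILURE. */
#include <stdio.h>
#include <stdlib.h>
#include <CL/cl.h>

void print_build_log(cl_program program, cl_device_id device)
{
    size_t log_size = 0;

    /* First query the log size, then fetch the log itself. */
    clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG,
                          0, NULL, &log_size);
    char *log = malloc(log_size + 1);
    clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG,
                          log_size, log, NULL);
    log[log_size] = '\0';
    fprintf(stderr, "build log:\n%s\n", log);
    free(log);
}
```

On the NVIDIA implementation the log usually contains the line and column of the offending statement, which would show whether the failure happens at the read of a[i] specifically.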