The following code works on Nvidia, but the AMD and Intel OpenCL compilers complain about the absence of a valid address space qualifier:
__kernel void foo(__global float (*pointer_to_matrix)[3][4])
{
}
I think this code is legal. Any opinion?
According to the OpenCL specification, “arguments to kernel functions in a program cannot be declared as a pointer to a pointer”.
But this is not a pointer to a pointer, this is a pointer to a 3x4 matrix (i.e., a pointer to a memory area holding 12 consecutive floats).
When you declare “float matrix[3][4]”, matrix is a pointer to an array of 12 floats.
So “float (*pointer_to_matrix)[3][4]” is obviously a pointer to a pointer to floats.
If you intend to use a pointer to an array of 12 floats, you should declare your kernel as
__kernel void foo(__global float matrix[3][4])
Both declarations give the kernel a pointer to the same 12 contiguous floats, even though the pointer types and the syntax for dereferencing them differ.
Consider the five examples below; they all write to the same element and could/should lead to the same generated binary code (remove the __kernel and __global, and it is legal C code; you will see that an optimizing C compiler will generate exactly the same instructions for all five cases).
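To make the relation between the two declarations concrete, here is a minimal plain-C sketch (address-space qualifiers dropped, helper names hypothetical). In C, an array parameter such as float matrix[3][4] is adjusted to float (*)[4], so it is a single pointer to a row of 4 floats, while float (*pointer_to_matrix)[3][4] is a single pointer to the whole 3x4 block. Neither is a pointer to a pointer.

```c
#include <stddef.h>

/* Hypothetical helpers: each returns the size of the object its parameter
   points to. sizeof does not evaluate its operand here, so the pointer is
   never dereferenced. */

/* The parameter is adjusted to float (*)[4]: it points to one row. */
static size_t pointee_size_array_param(float matrix[3][4])
{
    return sizeof *matrix;            /* 4 * sizeof(float) */
}

/* The parameter points to the whole 3x4 matrix. */
static size_t pointee_size_pointer_param(float (*pointer_to_matrix)[3][4])
{
    return sizeof *pointer_to_matrix; /* 12 * sizeof(float) */
}
```

Both parameters hold one address into the same contiguous floats; only the static type, and therefore the indexing syntax, differs.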
__kernel void foo1(__global float *ptr) // ok on NVIDIA, AMD, and Intel
{
    ptr[6] = 42;
}

__kernel void foo2(__global float ptr[12]) // gives error on Intel
{
    ptr[6] = 42;
}

__kernel void foo3(__global float (*ptr)[12]) // gives error on AMD and Intel
{
    (*ptr)[6] = 42;
}

__kernel void foo4(__global float ptr[3][4]) // gives error on AMD and Intel
{
    ptr[1][2] = 42;
}

__kernel void foo5(__global float (*ptr)[3][4]) // gives error on AMD and Intel
{
    (*ptr)[1][2] = 42;
}
As far as I can see, none of these constructs is forbidden by the OpenCL specification, so I think all five functions should be accepted by the compiler (only NVIDIA's does so).
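The claim that the five variants touch the same element can be checked on the host in plain C. The sketch below strips __kernel and __global, runs each variant on a zeroed buffer of 12 floats, and counts how many wrote 42 to flat index 6 and nothing else. The helper count_variants_writing_index_6 is a hypothetical name, and the casts assume that all views share the same contiguous layout; this says nothing about what a particular OpenCL compiler accepts.

```c
#include <string.h>

/* Plain-C versions of the five kernels (qualifiers removed). */
static void foo1(float *ptr)         { ptr[6] = 42; }
static void foo2(float ptr[12])      { ptr[6] = 42; }
static void foo3(float (*ptr)[12])   { (*ptr)[6] = 42; }
static void foo4(float ptr[3][4])    { ptr[1][2] = 42; }
static void foo5(float (*ptr)[3][4]) { (*ptr)[1][2] = 42; }

/* Hypothetical helper: counts the variants that set only element 6. */
static int count_variants_writing_index_6(void)
{
    float buf[12];
    int hits = 0;

    for (int variant = 1; variant <= 5; ++variant) {
        memset(buf, 0, sizeof buf);   /* start from all zeros */

        switch (variant) {
        case 1: foo1(buf); break;
        case 2: foo2(buf); break;
        case 3: foo3(&buf); break;
        case 4: foo4((float (*)[4])buf); break;       /* view as rows of 4 */
        case 5: foo5((float (*)[3][4])&buf); break;   /* view as one 3x4 block */
        }

        /* Row 1, column 2 of a 3x4 matrix is flat index 1*4 + 2 = 6. */
        int ok = (buf[6] == 42.0f);
        for (int i = 0; i < 12; ++i)
            if (i != 6 && buf[i] != 0.0f)
                ok = 0;
        hits += ok;
    }
    return hits;
}
```

Calling count_variants_writing_index_6() from a small main should return 5, which matches the expectation that the five signatures differ only in syntax and type, not in the memory they address.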