Hello forum,
I created a buffer of cl_float2 and passed it to the kernel as float2. Is it possible to cast it to a float pointer inside the kernel as follows:
float *converted = (float*)(original_float2);
Thanks
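Yes, it is possible. See section 6.2.5 “Pointer casting” in the OpenCL specification for more detail. Note that in OpenCL 1.x an unqualified pointer declared inside a kernel points to __private memory, and casting between address spaces is not allowed, so when the source is a __global float2 buffer the target pointer must be __global as well: __global float *converted = (__global float *)original_float2;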
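To make the index mapping after such a cast concrete, here is a small host-side C sketch. float2_t is a hypothetical stand-in for cl_float2, and the helper names are made up for illustration. After the cast, float index k no longer names float2 element k: it names component k % 2 of element k / 2, so a buffer of N float2 elements can be addressed as 2N floats.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical host-side stand-in for cl_float2: two packed floats, x then y. */
typedef struct { float x, y; } float2_t;
_Static_assert(sizeof(float2_t) == 2 * sizeof(float), "float2_t must be unpadded");

/* After a float2 pointer is cast to a float pointer, float index k refers
 * to float2 element k / 2, component k % 2 (0 = .x, 1 = .y). */
static size_t float2_element(size_t k)   { return k / 2; }
static int    float2_component(size_t k) { return (int)(k % 2); }

/* Copies n float2 elements into a flat float array of 2 * n floats using
 * memcpy, so the layout can be inspected without pointer aliasing. */
static void flatten_float2(const float2_t *src, size_t n, float *dst)
{
    assert(src != NULL && dst != NULL);
    memcpy(dst, src, n * sizeof *src);
}
```

For example, flattening {{1, 2}, {3, 4}, {5, 6}} puts 5 and 6 at float indices 4 and 5, which are the .x and .y of float2 element 2.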
Thanks for the hint. I am getting CL_INVALID_COMMAND_QUEUE with the following kernel, where I cast a float2 pointer to a float pointer.
__kernel void velocity(read_only image2d_t image, // OpenCL image
                       __global float2 *vx,       // X-component of the velocity field
                       __global float2 *vy,       // Y-component of the velocity field
                       int dx,                    // size of the domain in the X-dimension
                       int pdx,                   // padded width for the in-place FFT
                       int dy,                    // size of the domain in the Y-dimension
                       float dt,                  // delta time
                       int lb,                    // number of rows (cells in Y) processed per work-item
                       sampler_t sample)          // sampler used for the image reads
{
    // cast between the pointer types; the target pointer must stay in the
    // __global address space (an unqualified pointer is __private in OpenCL 1.x)
    __global float *vx_aligned = (__global float *)vx;
    __global float *vy_aligned = (__global float *)vy;
    int gtidx = get_global_id(0);
    int gtidy = get_group_id(1) * (lb * get_local_size(1)) + get_local_id(1) * lb;
    int p;
    float2 ploc;
    float4 vterm;
    // variables to store the x-component and y-component
    // of the velocity field
    float vxterm, vyterm;
    // gtidx is the domain location in x for this thread
    if (gtidx < dx)
    {
        for (p = 0; p < lb; p++)
        {
            // fi is the domain location in y for this thread
            int fi = gtidy + p;
            if (fi < dy)
            {
                int fj = fi * pdx + gtidx;
                // define the 2D coordinate
                float2 coord = (float2)(gtidx, fi);
                vterm = read_imagef(image, sample, coord);
                ploc.x = (gtidx + 0.5f) - (dt * vterm.x * dx);
                ploc.y = (fi + 0.5f) - (dt * vterm.y * dy);
                vterm = read_imagef(image, sample, ploc);
                vxterm = vterm.x;
                vyterm = vterm.y;
                // only the real component of the velocity field is updated
                vx_aligned[fj] = vxterm;
                vy_aligned[fj] = vyterm;
            }
        }
    }
}
I am not sure what I am doing wrong here. Being new to OpenCL, I need some hints on how to debug a kernel. I read somewhere in the forum that this error usually appears when the kernel causes a “page fault”, “segmentation fault”, or “invalid memory access”, but the compiler does not report anything.
The computation domain is 2D and its size is 512 by 512. The domain is divided into tiles of 64-by-64 cells, and a work-group of 64-by-4 work-items computes each 64-by-64 tile. In other words, the 256 work-items of a work-group are arranged logically as 64 work-items in the x-direction by 4 work-items in the y-direction. The work-items are distributed over the tile so that each work-item computes results for a vertical column of 16 cells.
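The row mapping described above can be checked with a small host-side C sketch. It assumes lb = 16 (TILEY / TIDSY), which matches the 16-cell columns in the description but is not shown being set in the host code; first_row is a hypothetical helper that mirrors the kernel's gtidy expression.

```c
#include <assert.h>

#define TILEY 64
#define TIDSY 4
/* Assumption: rows per work-item, lb = TILEY / TIDSY = 16. */
#define LB (TILEY / TIDSY)

/* First row (fi for p = 0) handled by a work-item, mirroring the kernel's
 * gtidy = get_group_id(1) * (lb * get_local_size(1)) + get_local_id(1) * lb.
 * The work-item then covers rows first_row .. first_row + LB - 1. */
static int first_row(int group_y, int local_y)
{
    assert(group_y >= 0 && local_y >= 0 && local_y < TIDSY);
    return group_y * (LB * TIDSY) + local_y * LB;
}
```

With these values one work-group covers 4 × 16 = 64 rows, exactly one tile, and work-item local_y = 3 of group 7 ends at row 511, the last row of a 512-row domain.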
The local and global work sizes are defined as follows:
#define TILEX 64 // Tile width
#define TILEY 64 // Tile height
#define TIDSX 64 // Tids in X
#define TIDSY 4 // Tids in Y
..............................................
...............................................
localWorkSize[0] = TIDSX; // work group # of work items
localWorkSize[1] = TIDSY;
globalWorkSize[0] = ((dx/TILEX)+(!(dx%TILEX)?0:1)) * TIDSX; // global # of work items
globalWorkSize[1] = ((dy/TILEY)+(!(dy%TILEY)?0:1)) * TIDSY * (TIDSX/TIDSY);
//and execute the kernel
errNum = clEnqueueNDRangeKernel(commandQueue,
                                advectVelocityKernel,
                                2,              // work_dim
                                NULL,           // global_work_offset
                                globalWorkSize,
                                localWorkSize,
                                0,              // num_events_in_wait_list
                                NULL,           // event_wait_list
                                NULL);          // event
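One way to debug this without the GPU is to replay the kernel's index arithmetic on the host. The C sketch below is a hypothetical helper (replay_indices is not part of any API) that reuses the launch formulas above and the kernel's gtidx/gtidy/fi/fj computations. With dx = dy = 512, lb = 16 and an assumed pdx = 514 (2 * (dx / 2 + 1), the usual padding for an in-place real-to-complex FFT), it reports globalWorkSize[1] = 512, of which only 32 y work-items reach a row below dy; the others write nothing because of the fi < dy check.

```c
#include <assert.h>
#include <stddef.h>

#define TILEX 64
#define TILEY 64
#define TIDSX 64
#define TIDSY 4

/* Result of replaying the kernel's index arithmetic on the host. */
typedef struct {
    size_t global_y;  /* globalWorkSize[1] as computed by the host code   */
    size_t active_y;  /* y work-items whose first row is below dy         */
    long   max_fj;    /* largest float index written, -1 if none written  */
} replay_t;

/* Walks every (x, y) work-item of the NDRange, repeats the kernel's index
 * computations and records the largest fj that passes both bound checks. */
static replay_t replay_indices(int dx, int dy, int pdx, int lb)
{
    assert(lb > 0);
    size_t gx = (size_t)((dx / TILEX) + (!(dx % TILEX) ? 0 : 1)) * TIDSX;
    size_t gy = (size_t)((dy / TILEY) + (!(dy % TILEY) ? 0 : 1)) * TIDSY * (TIDSX / TIDSY);
    replay_t r = { gy, 0, -1 };

    for (size_t y = 0; y < gy; ++y) {
        /* With a NULL global offset, group_id * local_size + local_id is
         * the global id, so the kernel's gtidy equals global_id(1) * lb. */
        long gtidy = (long)y * lb;
        if (gtidy < dy)
            r.active_y++;
        for (size_t x = 0; x < gx; ++x) {
            long gtidx = (long)x;
            if (gtidx >= dx)
                continue;
            for (int p = 0; p < lb; ++p) {
                long fi = gtidy + p;
                if (fi < dy) {
                    long fj = fi * pdx + gtidx;
                    if (fj > r.max_fj)
                        r.max_fj = fj;
                }
            }
        }
    }
    return r;
}
```

In this sketch the extra TIDSX / TIDSY factor in globalWorkSize[1] only launches idle work-items; it does not by itself produce an out-of-range fj, so the largest written index is the value to compare against the buffer sizes.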
Any idea to address the issue?
Thanks
I agree that CL_INVALID_COMMAND_QUEUE is often caused by a memory access violation by a kernel.
If your sampler uses some kind of address clamping (i.e. does not use CLK_ADDRESS_NONE mode), the only memory accesses that can go out of range are the writes to vx_aligned[fj] and vy_aligned[fj]. You should check that the fj index does not run outside the array bounds (it probably does).
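Note that get_group_id(1) * get_local_size(1) + get_local_id(1) is simply get_global_id(1) when no global work offset is used (you pass NULL), so gtidy = get_global_id(1) * lb. Since gtidx < dx and fi < dy, the maximum value of fj is (dy - 1) * pdx + (dx - 1). Check that it is coherent with your allocation for vx and vy, keeping in mind that vx_aligned and vy_aligned index floats, so a buffer of N cl_float2 elements holds 2N floats.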
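A minimal C sketch of that allocation check, where max_fj and float2_buffer_fits are hypothetical helper names. The buffer needs at least (dy - 1) * pdx + dx floats, i.e. half that many cl_float2 elements rounded up.

```c
#include <assert.h>
#include <stddef.h>

/* Largest float index the kernel writes, given gtidx < dx and fi < dy:
 * fj = fi * pdx + gtidx <= (dy - 1) * pdx + (dx - 1). */
static size_t max_fj(size_t dx, size_t dy, size_t pdx)
{
    assert(dx > 0 && dy > 0);
    return (dy - 1) * pdx + (dx - 1);
}

/* Returns nonzero if a buffer of n_float2 cl_float2 elements (2 floats each)
 * is large enough for every vx_aligned[fj] / vy_aligned[fj] write. */
static int float2_buffer_fits(size_t n_float2, size_t dx, size_t dy, size_t pdx)
{
    return 2 * n_float2 > max_fj(dx, dy, pdx);
}
```

For dx = dy = 512 and an assumed pdx = 514, the largest index is 263165, so at least 131583 cl_float2 elements are needed per buffer; a 512 × 512 cl_float2 allocation would be large enough.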