
Thread: cast between the scalar and vector types

  1. #1

    cast between the scalar and vector types

    Hello forum,

    I created a buffer of cl_float2 on the host and passed it to the kernel as float2. Is it possible to cast it to float inside the kernel as follows:

    Code :
    // keep the address space qualifier of the original pointer in the cast
    __global float *converted = (__global float *)(original_float2);


  2. #2
    Senior Member
    Join Date
    Oct 2012
    Yes, it is possible. See section 6.2.5, "Pointer Casting", in the OpenCL specification for more detail.
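    To illustrate what the cast means for indexing, here is a minimal host-side sketch in plain C (not OpenCL C). It assumes the usual layout in which a float2 stores x and then y contiguously, so a buffer of n float2 elements is seen as 2 * n floats through the float pointer. The names vec2_index, scalar_to_vec2, scalar_view_length and read_scalar are hypothetical helpers introduced only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Position of one float inside a float2 buffer (hypothetical helper type). */
typedef struct {
    size_t element;   /* index of the float2 element */
    int    component; /* 0 for .x, 1 for .y */
} vec2_index;

/* Map an index used through the float* view back to the float2 view. */
vec2_index scalar_to_vec2(size_t scalar_index)
{
    vec2_index r;
    r.element   = scalar_index / 2;
    r.component = (int)(scalar_index % 2);
    return r;
}

/* Number of floats addressable through the float* view of n float2 elements. */
size_t scalar_view_length(size_t n_float2)
{
    return 2 * n_float2;
}

/* Bounds-checked read through the scalar view (assumes interleaved x, y). */
float read_scalar(const float *buf, size_t n_float2, size_t scalar_index)
{
    assert(scalar_index < scalar_view_length(n_float2));
    return buf[scalar_index];
}
```

    Under that assumption, consecutive float indices alternate between the .x and .y components of the same float2 element, which matters when sizing the buffer.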

  3. #3
    Thanks for the hint. I am getting CL_INVALID_COMMAND_QUEUE with the following kernel, where I cast a float2 pointer to a float pointer.

    Code :
    __kernel void velocity(read_only image2d_t image, // OpenCL image
                           __global float2 *vx,       // X-component of the velocity field
                           __global float2 *vy,       // Y-component of the velocity field
                           int dx,                    // size of the domain in the X-dimension
                           int pdx,                   // padded width for the in-place FFT
                           int dy,                    // size of the domain in the Y-dimension
                           float dt,                  // delta time
                           int lb,                    // rows processed per work-item
                           sampler_t sample)
    {
        // cast between the pointer types; the address space qualifier
        // must be kept, otherwise the pointer defaults to __private
        __global float *vx_aligned = (__global float *)vx;
        __global float *vy_aligned = (__global float *)vy;
        int gtidx = get_global_id(0);
        int gtidy = get_group_id(1) * (lb * get_local_size(1)) + get_local_id(1) * lb;
        int p;
        float2 ploc;
        float4 vterm;
        // variables to store the x-component and y-component
        // of the velocity field
        float vxterm, vyterm;
        // gtidx is the domain location in x for this thread
        if (gtidx < dx)
        {
            for (p = 0; p < lb; p++)
            {
                // fi is the domain location in y for this thread
                int fi = gtidy + p;
                if (fi < dy)
                {
                    // index in float elements, using the padded row width
                    int fj = fi * pdx + gtidx;
                    // define the 2D coordinate
                    float2 coord = (float2)(gtidx, fi);
                    vterm = read_imagef(image, sample, coord);
                    ploc.x = (gtidx + 0.5f) - (dt * vterm.x * dx);
                    ploc.y = (fi + 0.5f) - (dt * vterm.y * dy);
                    vterm = read_imagef(image, sample, ploc);
                    vxterm = vterm.x;
                    vyterm = vterm.y;
                    // only the real component of the velocity field is updated
                    vx_aligned[fj] = vxterm;
                    vy_aligned[fj] = vyterm;
                }
            }
        }
    }

    I am not sure what I am doing wrong here. Being new to OpenCL, I need some hints on how to debug a kernel. I read somewhere in the forum that this type of error usually comes from a page fault, a segmentation fault, or some other invalid memory access, but the compiler does not report anything.

    The computation domain is 2D and its size is 512 by 512. The domain is divided into tiles of 64-by-64 cells, and a work-group of 64-by-4 work-items is responsible for computing each tile. In other words, the 256 work-items of a work-group are arranged as 64 work-items in the x-direction times 4 work-items in the y-direction. The work-items are distributed over the tile such that each work-item computes results for a vertical column of 16 cells.

    The local and global work size is defined as follows:

    Code :
    #define TILEX 64 // Tile width
    #define TILEY 64 // Tile height
    #define TIDSX 64 // Tids in X
    #define TIDSY 4  // Tids in Y
    localWorkSize[0] = TIDSX;  // work-items per work-group in X
    localWorkSize[1] = TIDSY;  // work-items per work-group in Y
    globalWorkSize[0] = ((dx/TILEX)+(!(dx%TILEX)?0:1)) * TIDSX; // global # of work-items in X
    globalWorkSize[1] = ((dy/TILEY)+(!(dy%TILEY)?0:1)) * TIDSY * (TIDSX/TIDSY);
    // and execute the kernel
       errNum = clEnqueueNDRangeKernel(commandQueue,

    Any idea to address the issue?
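    For reference, the work-size arithmetic described above can be sketched in plain C as follows. This is only an illustration of the rounding-up logic; tiles_needed, global_size and rows_per_work_item are hypothetical helper names that do not appear in the original host code.

```c
#include <assert.h>
#include <stddef.h>

/* Number of tiles needed to cover `extent` cells, rounding up. */
size_t tiles_needed(size_t extent, size_t tile)
{
    assert(tile > 0);
    return (extent + tile - 1) / tile;
}

/* Global size in one dimension: tiles times work-items per work-group. */
size_t global_size(size_t extent, size_t tile, size_t local)
{
    return tiles_needed(extent, tile) * local;
}

/* Rows each work-item must process so that local_y work-items cover a tile. */
size_t rows_per_work_item(size_t tile_y, size_t local_y)
{
    assert(local_y > 0 && tile_y % local_y == 0);
    return tile_y / local_y;
}
```

    For the 512-by-512 domain this gives 8 tiles per dimension, a global size of 512 in x, and 16 rows per work-item. Assuming lb is 16, as the description suggests, 8 * 4 = 32 work-items in y would already cover all 512 rows. The expression used for globalWorkSize[1] yields 512 instead, and the surplus work-items are filtered out by the fi < dy test in the kernel.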


  4. #4
    Senior Member
    Join Date
    Oct 2012
    I agree that CL_INVALID_COMMAND_QUEUE is often caused by a memory access violation by a kernel.

    If your sampler uses some kind of address clamping (i.e. it does not use CLK_ADDRESS_NONE), the image reads are safe, so the only memory accesses that can go wrong are the writes to vx_aligned[fj] and vy_aligned[fj]. You should check that the index fj stays within the array bounds (it probably does not).

    Note that get_group_id(1) * get_local_size(1) + get_local_id(1) is simply get_global_id(1) (when no global offset is used), so gtidy = get_global_id(1) * lb. The maximum value of fj is then (dy - 1) * pdx + (dx - 1), counted in float elements because of the cast. Check that this is consistent with the size you allocated for vx and vy.
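    One way to check the bound before touching the device is to replay the kernel's index arithmetic on the host. The sketch below is plain C and only mirrors the index computation of the posted kernel; max_fj_simulated, max_fj_closed_form and buffer_is_large_enough are hypothetical helper names, and it assumes a zero global offset and that each buffer is allocated as a number of float2 elements.

```c
#include <assert.h>
#include <stddef.h>

/* Largest fj written by the kernel's index arithmetic, found by replaying
   it for every work-item. Returns -1 if no work-item writes anything. */
long max_fj_simulated(int dx, int pdx, int dy, int lb,
                      int global_x, int global_y, int local_y)
{
    long max_fj = -1;
    assert(local_y > 0 && global_y % local_y == 0);
    for (int gy = 0; gy < global_y; gy++) {
        int group_y = gy / local_y;     /* get_group_id(1) */
        int local_id_y = gy % local_y;  /* get_local_id(1) */
        long gtidy = (long)group_y * lb * local_y + (long)local_id_y * lb;
        assert(gtidy == (long)gy * lb); /* same as get_global_id(1) * lb */
        for (int gx = 0; gx < global_x && gx < dx; gx++) {
            for (int p = 0; p < lb; p++) {
                long fi = gtidy + p;
                if (fi < dy) {
                    long fj = fi * pdx + gx;
                    if (fj > max_fj) max_fj = fj;
                }
            }
        }
    }
    return max_fj;
}

/* Closed form of the same bound when the NDRange covers the whole domain. */
long max_fj_closed_form(int dx, int pdx, int dy)
{
    return (long)(dy - 1) * pdx + (dx - 1);
}

/* Nonzero if a buffer of n_float2 float2 elements holds every fj as a float. */
int buffer_is_large_enough(long n_float2, int dx, int pdx, int dy)
{
    return max_fj_closed_form(dx, pdx, dy) < 2 * n_float2;
}
```

    For example, with dx = dy = 512 and an assumed padded width pdx = 514 (the thread does not give its value), the largest index is 263165, so each buffer needs at least 263166 floats, that is 131583 float2 elements. An unpadded allocation of 512 * 256 float2 elements would be too small under that assumption.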
