Work Item

Hi,

Sorry if this is a really stupid question, but I’m writing a really simple CL kernel that takes an array of vertices as input and offsets the Y-position of each vertex based on a value read from an image input. The problem is, I can’t get the kernel only to loop through the items in the input array.

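__kernel void main(__global const float4 *PointsIn, __read_only image2d_t ControlImg, int LineNo, float DAmt, __global float4 *PointsOut)
{
	// Nearest-neighbour lookups with integer (unnormalized) pixel coordinates
	const sampler_t smp = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP_TO_EDGE | CLK_FILTER_NEAREST;

	int  tid = get_global_id(0);          // one work-item per vertex
	float4 vert = PointsIn[tid];
	int2 spos = (int2)(tid, LineNo);      // column = vertex index, row = LineNo
	float4 color = read_imagef(ControlImg, smp, spos);
	float lum = length(color);            // note: length() includes the alpha channel
	lum -= 0.5f;                          // float literal avoids double precision
	lum *= DAmt;

	vert.y += lum;                        // .y is the portable way to address component 1

	PointsOut[tid] = vert;
}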

I need to loop through only the vertex array, but read from the DImg image input on each iteration.

Anyone any tips?

a|x

The problem is, I can’t get the kernel only to loop through the items in the input array […] I need to loop through only the vertex array, but read from the DImg image input on each iteration.

I understand what you are trying to do, but I don’t understand what problem you are having. First, forget about “iterating” and “loops”: work-items run in parallel, not in a loop. OpenCL will make a lot more sense once you start thinking about it in parallel.
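To make the “parallel, not a loop” point concrete, here is a small CPU-side C sketch (not OpenCL; all names are hypothetical). The kernel body is written as a function of the work-item's global id only, and a stand-in “device” runs the work-items either front-to-back or back-to-front. Because each work-item reads its own input and writes only its own output slot, the order does not matter, which is what allows OpenCL to run them all at once.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel body: depends only on its own global id `gid`,
   reads in[gid] and writes out[gid], so work-items never interact. */
static void offset_kernel(size_t gid, const float *in, float offset, float *out) {
    out[gid] = in[gid] + offset;
}

/* Stand-in for the device: runs work-items 0..n-1 front-to-back, or
   back-to-front when `reverse` is non-zero. A real GPU gives no ordering
   guarantee at all, so the result must not depend on the order. */
void run_work_items(size_t n, int reverse, const float *in, float offset, float *out) {
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; ++i) {
        size_t gid = reverse ? n - 1 - i : i;
        offset_kernel(gid, in, offset, out);
    }
}
```

Running with `reverse = 0` and `reverse = 1` fills `out` identically. A body that instead read `out[gid - 1]` (a loop-carried dependency, such as a running total) would give order-dependent answers, so it could not be written as one independent work-item per element.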

So you want to process each input vertex and produce one output vertex for each input vertex. Also, each vertex requires a texture lookup. That seems to be what you are doing. What is wrong with it? Is the problem that your image is 2D and your vertices are laid out as a 1D vector? Can you elaborate on why the output is not the way you want?
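On the 1D-versus-2D question: the kernel turns each vertex index into the image coordinate `(tid, LineNo)`, so vertex *i* samples column *i* of row `LineNo`. With `CLK_ADDRESS_CLAMP_TO_EDGE` and nearest filtering, any vertex whose index is past the image width reads the last texel of that row. The C sketch below emulates only that coordinate logic on the CPU; `Coord` and `sample_coord()` are hypothetical names, and it assumes unnormalized integer coordinates as in the kernel.

```c
#include <assert.h>

typedef struct { int x, y; } Coord;

static int clamp_int(int v, int lo, int hi) {
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Texel actually read by work-item `tid` for spos = (tid, line_no),
   emulating CLK_ADDRESS_CLAMP_TO_EDGE | CLK_FILTER_NEAREST with
   integer (unnormalized) coordinates on a width x height image. */
Coord sample_coord(int tid, int line_no, int width, int height) {
    assert(width > 0 && height > 0);
    Coord c = { clamp_int(tid, 0, width - 1), clamp_int(line_no, 0, height - 1) };
    return c;
}
```

For a 256-pixel-wide image, vertices 0 to 255 each get their own column, while every vertex from 256 upward reuses column 255 of the same row.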

Also, please post here the parameters you pass when you call clEnqueueNDRangeKernel(), since that controls the number of work-items that are executed.
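For reference, the number of work-items in a 1D launch is the `global_work_size` argument of `clEnqueueNDRangeKernel()` (a pointer to a single `size_t` when `work_dim` is 1). In plain OpenCL that value would normally be the vertex count; in a host application that launches the kernel for you, it is chosen by that application. The CPU emulation below (hypothetical names, no OpenCL involved) shows the consequence: the kernel body runs once for every global id from 0 to `global_work_size - 1`, regardless of how long the arrays are.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel signature: one call per work-item. */
typedef void (*kernel_fn)(size_t gid, void *args);

/* Emulates a 1D NDRange: invokes `kernel` for every gid in
   [0, global_work_size) and returns how many work-items ran. */
size_t enqueue_1d(size_t global_work_size, kernel_fn kernel, void *args) {
    assert(kernel != NULL);
    size_t ran = 0;
    for (size_t gid = 0; gid < global_work_size; ++gid) {
        kernel(gid, args);
        ++ran;
    }
    return ran;
}

/* Example kernel: records the highest global id it was called with. */
void record_max_gid(size_t gid, void *args) {
    size_t *max_gid = args;
    if (gid > *max_gid) *max_gid = gid;
}
```

With five vertices, a global size of 5 gives exactly one work-item per vertex (ids 0 to 4); any larger size produces work-items with no vertex to process.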

Hi David,

thanks for getting back to me.
I’ve sorted it now. I must admit that, as an enthusiastic but non-professional coder, I have a hard time getting my head around the basic concepts of OpenCL. I’m also not a typical forum member, in that I’m using OpenCL within an authoring environment (Apple’s Quartz Composer), so I have no control over how the code is executed beyond what the environment itself offers. Basically, OpenCL is used as a scripting language within the application, along with JavaScript, GLSL and a proprietary subset of GLSL that Apple uses for 2D-only, GPU-accelerated image processing.

In QC, there’s an option for manually setting the size of the output array, which solved my problem in the end. The output array was previously as long as the total number of pixels in the input image, rather than the length of the input array. Unsurprisingly, this change also seems to have sped things up considerably.
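The fix described here, sizing the launch by the vertex array rather than by the image, is easy to quantify. As a hypothetical example, a 640×480 control image has 307,200 pixels, so sizing the output by pixel count launches 307,200 work-items even if there are only 1,000 vertices; roughly 99.7% of them have no vertex to work on and, without a bounds check, would index past the end of `PointsIn`. Where you do control the launch size, a common defensive pattern is to pass the element count to the kernel and return early when `get_global_id(0)` is past it. The C sketch below (hypothetical names, CPU emulation only) mimics that guard and counts how many work-items do real work.

```c
#include <assert.h>
#include <stddef.h>

/* Emulates launching `global_size` work-items over `count` vertices, where
   the kernel body begins with the guard  if (gid >= count) return;
   Returns the number of work-items that got past the guard. */
size_t active_work_items(size_t global_size, size_t count, const float *in, float *out) {
    assert(count == 0 || (in != NULL && out != NULL));
    size_t active = 0;
    for (size_t gid = 0; gid < global_size; ++gid) {
        if (gid >= count)
            continue;          /* the kernel's early return: no out-of-bounds access */
        out[gid] = in[gid];    /* placeholder for the real per-vertex work */
        ++active;
    }
    return active;
}
```

Both a pixel-sized launch and a vertex-sized launch end up with 1,000 active work-items; the difference is the 306,200 idle ones the first launch still has to schedule.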

Thanks again,

a|x