Strange problem wirting to image

Hi!

I have a strange problem here. I have a very simple kernel, each item writes a fix color(red) to a texture. I create a grid with the size of the image in the host application so i expect that the whole image is red. Using the cpu with the current intel opencl sdk everything works fine.
But using the GPU with a Gefoce GT 330, only half of the pixels in x direction are red. The rest is filled with a random color.


ipf::parser::cl::CLContext context;
context.create(CL_DEVICE_TYPE_GPU);

if(!context.createCommandQueue(CL_QUEUE_PROFILING_ENABLE))
   std::cout<<"CommandQueue Failed"<<std::endl;

std::string path =
   ipf::util::config.get<std::string>("ipf.shader.path") + "/";
	
path += "test.cl";
ipf::parser::cl::CLProgram prog = context.createProgram(path);
if(!prog.build())
   std::cout<<prog.log()<<std::endl;


int width = 392;
int height = 392;
	
ipf::parser::cl::CLKernel k = prog.createKernel("writeimage");
QImage result(QSize(width,height),QImage::Format_ARGB32);

cl_image_format format;
format.image_channel_order = CL_BGRA;
format.image_channel_data_type = CL_UNORM_INT8;

	
cl_int error = 0;
cl_mem dest = clCreateImage2D(context.contextID(),
							  CL_MEM_WRITE_ONLY,
							  &format,
							  width,height,0,NULL,&error);
if(error != CL_SUCCESS)
   std::cout<<"Error: "<<ipf::parser::cl::CLContext::errorName(error)
			<<std::endl;

	
error = clSetKernelArg(k.kernelID(),0,sizeof(cl_mem),&dest);
if(error != CL_SUCCESS)
   std::cout<<"Error: "<<ipf::parser::cl::CLContext::errorName(error)
			<<std::endl;

size_t local[2] = 
   {
	   8,8
   };
size_t global[2] = 
   {
	   width,height
   };

error = clEnqueueNDRangeKernel(
	context.commandQueue().commandQueueID(), k.kernelID(), 2,
	0,global,local,0,0,NULL);
if(error != CL_SUCCESS)
	std::cerr<<"CLKernel::run(): "<<
	   ipf::parser::cl::CLContext::errorName(error)<<std::endl;


size_t origin[] = 
   {
	   0,0,0
   };
size_t region[]  =
{
	width,height,1
};
	
	
error = clEnqueueReadImage(context.commandQueue().commandQueueID(),
						   dest,CL_TRUE,
						   origin,
						   region,
						   width*4,
						   0,result.bits(),0,NULL,NULL);
if(error != CL_SUCCESS)
   std::cerr<<"Error: "<<
	   ipf::parser::cl::CLContext::errorName(error)<<std::endl;
	
result.save("geht.png");


The context and queue creation is wrappend with some c++ code. I use qt for saving the image. Im quite sure the image format is setup correctly because i took it from the QtOpenCL lib. With this lib btw my kernel works correctly so it should not be a driver bug.

Here is the kernel:


__kernel void writeimage(__write_only image2d_t dest)
{
	
	int2 pos = (int2)(get_global_id(0),get_global_id(1));
	float4 pixel = (float4)(0,0,1,1);
	//pixel=read_imagef(src,sampler,pos);

	write_imagef(dest,pos,clamp(pixel,0.0f,1.0f));
}

The result using the code above, is an image with the size of 392x392 pixels. But only 196x392 from the leftside is red, the rest is filled with (i guess) random color. Anybody sees the problem? Im staring at the code for a couple of hours but i dont find the mistake.
Is there something from with the EnqueueNDRange call?

Ok i just rewrite everything to get rid of qt and the opencl wrapper stuff. But it still does not work. Now i save the result from clEnqueueReadImage in a uchar buffer.
Printing out this buffer you can see that after width/2 steps in x-direction the color values are messed up. When i set the global work item size to width*2 it works. Is it possible that too less threads are created? Or am i doing something wrong with the clEnqueueNDRange call.


	cl_context context = 0;
	cl_command_queue queue = 0;
	cl_program program = 0;
	cl_device_id device = 0;
	cl_kernel kernel = 0;
	cl_int err = 0;
	cl_platform_id platformid = 0;
	cl_uint ptn = 0;
	

	err = clGetPlatformIDs(1,&platformid, &ptn);
	if(err != CL_SUCCESS || ptn <= 0)
	   std::cout<<"CLPlafrom fail"<<std::endl;

	cl_context_properties contextProperties[] = 
	   {
		   CL_CONTEXT_PLATFORM,
		   (cl_context_properties) platformid,
		   0
	   };
	
	context = clCreateContextFromType(contextProperties,
									  CL_DEVICE_TYPE_GPU,
									  NULL, NULL, &err);
	if(err != CL_SUCCESS)
	   std::cout<<"context fail"<<std::endl;

	size_t devsize = -1;
	err = clGetContextInfo(context, CL_CONTEXT_DEVICES, 0, NULL,
						   &devsize);
	if(err != CL_SUCCESS)
	   std::cout<<"device info fail"<<std::endl;
	std::vector<cl_device_id> devices(devsize);
	err = clGetContextInfo(context,CL_CONTEXT_DEVICES,devsize,&devices[0],
					 NULL);
	if(err != CL_SUCCESS)
	   std::cout<<"device info fail"<<std::endl;

	device = devices[0];
	queue = clCreateCommandQueue(context,device,0,NULL);
	if(queue == NULL)
	   std::cout<<"queue fail"<<std::endl;

	std::string path =
	   ipf::util::config.get<std::string>("ipf.shader.path") + "/";
	
	path += "test.cl";

	std::ifstream code(path.c_str(),std::ios::in);
	if(!code.is_open())
	   std::cout<<"soruce fail"<<std::endl;
	std::ostringstream oss;
	oss <<code.rdbuf();
	std::string src = oss.str();
	const char *srcc = src.c_str();
	program = clCreateProgramWithSource(context,1,(const char**)&srcc,NULL,
										NULL);
	if(program == NULL)
	   std::cout<<"program fail"<<std::endl;

	err = clBuildProgram(program,0,NULL,NULL,NULL,NULL);
	if(err != CL_SUCCESS)
	   std::cout<<"build fail"<<std::endl;

	kernel = clCreateKernel(program, "writeimage", NULL);
	if(kernel == NULL)
	   std::cout<<"kernel fail"<<std::endl;

	int width = 512;
	int height = 512;
	int bytes = 4;
	int bytesperline = width*4;
	uchar *buffer = new uchar[width*height*bytes];
	
	//QImage image(width,height,QImage::Format_ARGB32);
	cl_image_format iformat;
	iformat.image_channel_order = CL_BGRA;
	iformat.image_channel_data_type = CL_UNSIGNED_INT8;
	
	cl_mem dest = clCreateImage2D(context,
								  CL_MEM_WRITE_ONLY,
								  &iformat,
								  width,
								  height,
								  0,
								  NULL,
								  &err);
	if(err != CL_SUCCESS)
	   std::cout<<"dest fail"<<std::endl;
	
	err = clSetKernelArg(kernel,0,sizeof(cl_mem),&dest);
	if(err != CL_SUCCESS)
	   std::cout<<"kernel set fail"<<std::endl;

	size_t global[2] = 
	   {
		   width,height
	   };
          //Fixes the problem
         //size_t global[2] = 
	  // {
	//	   width*2,height
	//   };
	
	size_t local[2] = 
	   {
		   8,
		   8
	   };
	

	err = clEnqueueNDRangeKernel(queue, kernel, 2, NULL,
								 global,local,0,NULL,NULL);
	if(err != CL_SUCCESS)
	   std::cout<<"ND range failt"<<std::endl;

	size_t origin[3] = 
	   {
		   0,0,0
	   };

	size_t region[3] = 
	   {
		   width,height,1
	   };

	err = clEnqueueReadImage(queue,dest,CL_TRUE,origin,region,
						 0,0,buffer,0,0,0);
        //Does not fix the problem
        //err = clEnqueueReadImage(queue,dest,CL_TRUE,origin,region,
	//						 0,4*width,buffer,0,0,0);
	if(err != CL_SUCCESS)
	   std::cout<<"read image fail"<<std::endl;


	int k = 0;
	for(int i= 0; i < width*4; i+=4){
		std::cout<<k++<<" :("<<
		   (int)(buffer[i])<<" " <<
		   (int)(buffer[i+1])<<" "<<
		   (int)(buffer[i+2])<<" "<<
		   (int)(buffer[i+3])<<")"<<std::endl;
	}

Here is the kernel, i changed from normalized values to not normalized:


__kernel void writeimage(__write_only image2d_t dest)
{

	int x = get_global_id(0);
	int y = get_global_id(1);
	
	//int2 pos = (int2)(get_global_id(0),get_global_id(1));
//	float4 pixel = (float4)(0,1,0,1);
	uint4 pixel = (uint4)(255,0,0,255);
	
	//pixel=read_imagef(src,sampler,pos);

	//write_imagef(dest,(int2)(x,y),pixel);
	write_imageui(dest,(int2)(x,y),pixel);
	
}

What is wrong here? A driver bug?

It doesn’t look like you are waiting for the kernel to finish. clEnqueueNDRangeKernel is not blocking so you may be reading before the GPU is done working. either create an event and wait for it to finish or call clFinish() after your call to clEnqueueNDRangeKernel.

It doesn’t look like you are waiting for the kernel to finish.

That is not necessary. Right after calling clEnqueueNDRangeKernel() there is a blocking call to clEnqueueReadImage(). There are very few situations where calling clFinish() is necessary or a good idea.

Wow, looks frustrating. I can’t see anything obviously wrong with the code.

About all I can suggest is:

  • try using CL_RGBA as the image format type (no particular reason other than that this is what i always use).
  • try a different driver version

Hi!

First of all thank you for your suggestions.

@themilesman:
That was one of the first things i have tried. Even i call ReadImage in blocking mode i thought
i have to call clFinish too. But it doesnt help. Bug is still there.

@david.garci
I tried different image formats. Nothing solve the problem.

At the moment i thing it must be a driver problem. I tested the code on different gpus and different operating system. On a ati hd 6570 and on an nvidia geforce gtx 560( both on windows) it works without problem. On my system (Linux, Nvidia gt 330, Driver: 285.05.09) the bug occurs. On monday it will try the same setup but under windows.

The funny thing is that from time to time it works. But when i change the rgba values in the kernel half of the image is filled with the new color while the other half has the color of the old rgba values from a prevoius run. It seems that get_global_id returns a wrong thread id or write_image can’t access coordiantes with x > image_width/2.