clEnqueueTask is returning CL_INVALID_COMMAND_QUEUE

I am trying to implement a motion estimation algorithm using OpenCL.
I have written code to calculate the SAD values in OpenCL and tried running it on an NVIDIA GeForce 9400 GT GPU.
I am trying to encode video with a resolution of 320x240. I have allocated 2 buffers of 320x240x2 bytes each: one for the current frame and the other for the reference frame.
I copied the data to the allocated buffers using clEnqueueWriteBuffer.

The problem I am facing is that the code runs fine for 10 MBs; after that the SAD values are not correct. Sometimes I am getting the CL_INVALID_COMMAND_QUEUE error from the clEnqueueTask function, and at other times I am getting the same error from clEnqueueReadBuffer. I am calling clFinish after each operation.

1) Can I allocate the memory for the complete frame?
2) Why does the code fail after running properly 10 times?
3) Is memory the only issue, or is there some other issue?

Sometimes I am getting the CL_INVALID_COMMAND_QUEUE error from the clEnqueueTask function

That often means that your kernel has caused a page fault due to an invalid memory access. Review your kernel source.

Make sure to pass a pfn_notify function when you call clCreateContext(). That way you will get the error right when it happened.
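As a rough illustration of the pfn_notify suggestion, here is a minimal sketch of such a callback. The parameter list follows the callback type that clCreateContext() expects; the names context_notify and notify_log are hypothetical, and the clCreateContext() call itself is shown only in a comment so that the snippet compiles without the OpenCL headers.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* CL_CALLBACK normally comes from <CL/cl.h>. It is defined empty here only
   so this sketch compiles without the OpenCL headers (it is empty on
   non-Windows platforms anyway). */
#ifndef CL_CALLBACK
#define CL_CALLBACK
#endif

/* Hypothetical holder for the most recent message, passed as user_data. */
typedef struct {
    char last_error[256];
    int  count;
} notify_log;

/* Called by the OpenCL implementation when an error occurs in the context.
   It may be called asynchronously; this sketch does no locking, so a real
   program should make the logging thread-safe. */
void CL_CALLBACK context_notify(const char *errinfo,
                                const void *private_info,
                                size_t cb,
                                void *user_data)
{
    notify_log *log = (notify_log *)user_data;
    (void)private_info; /* implementation-specific binary data, unused here */
    (void)cb;           /* size of private_info in bytes, unused here */

    fprintf(stderr, "OpenCL context error: %s\n", errinfo);
    if (log != NULL) {
        strncpy(log->last_error, errinfo, sizeof log->last_error - 1);
        log->last_error[sizeof log->last_error - 1] = '\0';
        log->count++;
    }
}

/* With <CL/cl.h> included and a valid cl_device_id named device (assumed),
 * the callback would be registered roughly like this:
 *
 *   notify_log log = {0};
 *   cl_int err;
 *   cl_context ctx = clCreateContext(NULL, 1, &device,
 *                                    context_notify, &log, &err);
 */
```

The user_data pointer is handed back to the callback unchanged, so it is a convenient place to keep whatever state the host program wants to inspect after a failing call.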

Thanks David. Now it started working.

I have one more question.
My kernel function is:

__kernel void s264e_me_ipel_sad( __global unsigned char *ref, __global unsigned char *org, int x, int y, int mv, int min_cost)
{
    char *ref1;

    ref1 = ref + 320 * y + x;
    org = ref1[0];
}

The above statements using ref1 are not working, but if I use org = ref[320 * y + x]; it works.
May I know the reason?
Can you please suggest a method to calculate the offset address?

“char *ref1 = ref + 320 * y + x;” not working.

You need to remember about the different address spaces. In your case, ref is a pointer in the global address space and so ref1 has to be in global space as well:


__global char *ref1 = ref + 320 * y + x;
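
To make the offset calculation concrete, here is a small sketch of a helper that keeps the address space qualifier on every pointer it touches. The names pixel_at and FRAME_WIDTH are hypothetical. The #ifndef block, which defines __global away, is an assumption added only so the same lines can be compiled and checked as ordinary C on the host; in a .cl file that block is skipped and the qualifier keeps its OpenCL meaning. The sketch uses unsigned char to match the kernel's argument type.

```c
/* On the host there is no __global qualifier, so define it away.
   An OpenCL C compiler defines __OPENCL_VERSION__ and skips this. */
#ifndef __OPENCL_VERSION__
#define __global
#endif

#define FRAME_WIDTH 320 /* frame width in pixels, taken from the question */

/* Returns a pointer to pixel (x, y) of a row-major frame.
   The local pointer p carries the same address space qualifier as the
   argument, which is what the original "char *ref1" was missing. */
__global const unsigned char *pixel_at(__global const unsigned char *frame,
                                       int x, int y)
{
    __global const unsigned char *p = frame + FRAME_WIDTH * y + x;
    return p;
}
```

Inside a kernel, pixel_at(ref, x, y)[0] would then read the same byte as ref[320 * y + x].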

Thanks David. I am able to use it.

Dear David. My Kernel code looks as shown below.

__kernel(arguments)
{
  for(i=0;i<4;i++)
  {

  }

  for(j=0;j<4;j++)
  {
   ....................
  }
}

I am passing a global size of 4 to the kernel and running the first for loop on 4 work-items.
But how can I run the second for loop on 4 work-items? Should I call a separate kernel for that?

Thanks,
Naresh

Sorry, I don’t understand the question. I believe you are saying that you enqueued an NDRange with four work-items. Also, I see two loops inside your kernel.

What are you trying to accomplish? Can you explain what you are trying to do using just C99 (without OpenCL)?