Hello,
I have a very simple kernel that adds a constant to every element of an input array and writes the result to an output array. It looks as follows:
__kernel void add_constant_to_vec(__global const float *cpp,
                                  __global float *out,
                                  float offset,
                                  int num_elements)
{
    int gid = get_global_id(0);
    if (gid >= num_elements) return;
    out[gid] = cpp[gid] + offset;
}
I run this kernel as follows, using the C++ wrapper around the OpenCL API that I got from Khronos. Some bits are removed for brevity, but the kernel runs fine and the output is correct.
cl::Buffer input_buffer(OCL::cl_context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof(float)*NUM_ELEMENTS, input);
cl::Buffer output_buffer(OCL::cl_context, CL_MEM_WRITE_ONLY, sizeof(float)*NUM_ELEMENTS);
int num_elements = 100000;
float const_add = 100.0f;
const int NUM_WORK_ITEMS = 512;
const int GLOBAL_WORK_SIZE = round_up(NUM_WORK_ITEMS, num_elements);
cl::Event event;
cl::CommandQueue queue(OCL::cl_context, OCL::cl_context_devices[0]);
queue.enqueueNDRangeKernel(m_kernels[0], cl::NullRange, cl::NDRange(GLOBAL_WORK_SIZE), cl::NDRange(NUM_WORK_ITEMS), NULL, &event);
event.wait();
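The round_up helper used above is not shown in the post. A minimal sketch of what such a helper usually does, assuming it takes the work-group size first and the element count second (as in the call above) and returns the smallest multiple of the work-group size that covers all elements:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not the poster's actual code: returns the smallest
// multiple of group_size that is greater than or equal to global_size.
// The argument order (group size first) mirrors the call in the post.
inline std::size_t round_up(std::size_t group_size, std::size_t global_size)
{
    assert(group_size > 0);  // a zero work-group size makes no sense
    const std::size_t remainder = global_size % group_size;
    if (remainder == 0) {
        return global_size;  // already an exact multiple
    }
    return global_size + group_size - remainder;
}
```

With a work-group size of 512 and 100000 elements this gives 100352, that is 196 work-groups. The 352 surplus work-items are the ones filtered out by the `gid >= num_elements` check in the kernel.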
Now, a single run of this kernel works just fine. However, when I do this:
for (int i = 0; i < 1000000; ++i)
{
    queue.enqueueNDRangeKernel(m_kernels[0], cl::NullRange, cl::NDRange(GLOBAL_WORK_SIZE), cl::NDRange(NUM_WORK_ITEMS), NULL, &event);
    event.wait();
}
Then it fails with
OpenCL error: clEnqueueNDRangeKernel
Error code: -6
which I think translates to CL_OUT_OF_HOST_MEMORY (out of memory on the host), but I am surprised that this would happen here. Is there something I am doing wrong? This is my first attempt at OpenCL, so I would not be surprised if I am doing something obviously wrong.
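For reference, the numeric error code can be decoded with a small lookup like the one below. This is only a sketch: the function name cl_error_name is made up for illustration, and the values are written out as plain integers (matching the definitions in the Khronos cl.h header) so that it compiles without the OpenCL headers. Only the first few runtime error codes are covered.

```cpp
#include <string>

// Hypothetical helper: maps a few OpenCL runtime error codes to their
// symbolic names. The integer values are the ones defined in cl.h;
// in real code you would use the CL_* macros from that header instead.
inline std::string cl_error_name(int code)
{
    switch (code) {
        case  0: return "CL_SUCCESS";
        case -1: return "CL_DEVICE_NOT_FOUND";
        case -2: return "CL_DEVICE_NOT_AVAILABLE";
        case -3: return "CL_COMPILER_NOT_AVAILABLE";
        case -4: return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
        case -5: return "CL_OUT_OF_RESOURCES";
        case -6: return "CL_OUT_OF_HOST_MEMORY";
        case -7: return "CL_PROFILING_INFO_NOT_AVAILABLE";
        default: return "unknown OpenCL error";  // not covered by this sketch
    }
}
```

Calling cl_error_name(-6) returns "CL_OUT_OF_HOST_MEMORY", which matches the reading of the error above.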
Also, any advice on optimising this very simple kernel further?
Thanks,
/xarg