I have lists of points with x- and y-coordinates. For each combination of points I calculate a result, with a lot of operations in each OpenCL work-item.
Then I want to count how often the different results have been found (frequency of each result).
At the moment the calculation of the results is done in a kernel. The results are saved to a list.
In the C++ part I count the frequencies in an array…
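For reference, the host-side counting described above might look roughly like the following. This is only a sketch: the function name `count_frequencies` and the assumption that each kernel result has already been flattened into a single bin index (`position`) are hypothetical, not taken from the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout: the kernel writes one flattened bin index per result,
// position = row * width + column. Indices outside the histogram are skipped.
std::vector<int> count_frequencies(const std::vector<int>& positions,
                                   int width, int height) {
    assert(width > 0 && height > 0);
    const int bins = width * height;
    std::vector<int> histogram(static_cast<std::size_t>(bins), 0);
    for (int position : positions) {
        if (position >= 0 && position < bins) {
            ++histogram[static_cast<std::size_t>(position)];
        }
    }
    return histogram;
}
```

The loop is sequential, so there is no conflict when two results land in the same bin. That conflict is exactly what has to be handled once the counting moves into the kernel.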
Is it possible to do this in OpenCL? I tried it, but I ran into problems. I think it couldn’t work because the writes aren’t synchronised.
Can anybody help me? I need information about this, or a pointer to where I can find it.
In the current version I merge the results in C++. So I have ~5000000 global threads (~3000 points). My histogram memory has 200*200 entries.
Now I have to use local threads.
What would be the best distribution?
Did you get any error codes from any of the API calls you made, such as clEnqueueNDRangeKernel()? Did you pass a pfn_notify function to clCreateContext()?
Could you answer the other questions about error codes, etc?
As for the usage of local memory, 1.25MB is still a lot more than what your hardware probably supports. You can query the amount of available local memory with clGetDeviceInfo(…, CL_DEVICE_LOCAL_MEM_SIZE, …).
As long as clEnqueueNDRangeKernel() requires more local memory than is available on your system the program will not work.
You can compute partial histograms within a work-group using barriers and finally add together the partial histograms using atomic operations like atom_add().
Although perhaps you are saying that your device doesn’t support that extension? In that case you can compute partial histograms, store them in global memory and then launch one more NDRange kernel with a single work-group to add together the partial histograms.
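To make the structure of the second suggestion concrete, here is a CPU-side C++ model of it, not OpenCL code. Each "work-group" is simulated as a contiguous chunk of the input, and the names `histogram_via_partials` and `group_size` are made up for illustration. The sketch runs each group sequentially, so it sidesteps the question of how work-items inside one group avoid colliding on the same bin.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Phase 1 fills one partial histogram per simulated work-group.
// Phase 2 adds the partial histograms together, as the final
// single-work-group kernel would.
std::vector<int> histogram_via_partials(const std::vector<int>& positions,
                                        std::size_t bins,
                                        std::size_t group_size) {
    assert(bins > 0 && group_size > 0);
    const std::size_t groups = (positions.size() + group_size - 1) / group_size;

    // Phase 1: partial histograms (these would live in global memory).
    std::vector<std::vector<int>> partials(groups, std::vector<int>(bins, 0));
    for (std::size_t g = 0; g < groups; ++g) {
        const std::size_t begin = g * group_size;
        const std::size_t end = std::min(begin + group_size, positions.size());
        for (std::size_t i = begin; i < end; ++i) {
            const int p = positions[i];
            if (p >= 0 && static_cast<std::size_t>(p) < bins) {
                ++partials[g][static_cast<std::size_t>(p)];
            }
        }
    }

    // Phase 2: merge. Every bin is summed independently of the others,
    // so no two sums write to the same counter.
    std::vector<int> total(bins, 0);
    for (const std::vector<int>& partial : partials) {
        for (std::size_t b = 0; b < bins; ++b) {
            total[b] += partial[b];
        }
    }
    return total;
}
```

Because each bin in the merge phase is summed on its own, that phase does not depend on atomic operations being available.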
I have now updated my system to OpenCL 1.1 and had to change some things. Now I get this error message when compiling the kernel:
Try to compile the program... Error: Failed to build program executable!
Error: Code selection failed to select: 0xf764e38: i32,ch = AtomicLoadAdd 0xf7641e8, 0xf76ce78, 0xf764d98 <0xeff5eb8:0> <volatile> alignment=4
This happens when I use:
atom_inc(&result[position]);
I got the following information about CL_DEVICE_EXTENSIONS from the NVIDIA OpenCL Device Query:
CL_DEVICE_EXTENSIONS: cl_khr_byte_addressable_store
cl_khr_icd
cl_khr_gl_sharing
cl_nv_d3d9_sharing
cl_nv_d3d10_sharing
cl_khr_d3d10_sharing
cl_nv_d3d11_sharing
cl_nv_compiler_options
cl_nv_device_attribute_query
cl_nv_pragma_unroll
So I can’t use atomic operations on my machine?!
The problem with the partial-histogram solution is that I need 201*201*sizeof(int) bytes for each partial histogram. That can’t work because the maximum I can allocate for local variables is 16kB…
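The size mismatch can be checked with a few lines of C++. This is a sketch under two assumptions: each counter is a 4-byte integer (OpenCL's int is 32 bits), and "16kB" means 16 * 1024 bytes. The helper names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: one counter is a 32-bit int, as in OpenCL C.
constexpr std::size_t kCounterBytes = 4;

// Bytes needed for one histogram of width x height counters.
constexpr std::size_t histogram_bytes(std::size_t width, std::size_t height) {
    return width * height * kCounterBytes;
}

// Largest number of counters that fit into a local-memory budget.
constexpr std::size_t max_bins(std::size_t local_mem_bytes) {
    return local_mem_bytes / kCounterBytes;
}

// True if a full width x height histogram fits into local memory.
bool fits_in_local_memory(std::size_t width, std::size_t height,
                          std::size_t local_mem_bytes) {
    assert(local_mem_bytes > 0);
    return histogram_bytes(width, height) <= local_mem_bytes;
}
```

With these assumptions a 201 x 201 histogram needs 161604 bytes, roughly ten times the 16384 bytes available, and at most 4096 counters fit into local memory at once. A single row of 201 counters (804 bytes) fits easily.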
The problem of the solution with partial histograms is that I need 201*201*sizeof(int) for each partial histogram.
You can do it in multiple phases. Instead of computing a histogram with 201x201 bins, you can first compute partial histograms with (for example) only 201 bins each and then refine each of them in another step. Think of it as a multiresolution approach. That’s how I would do it.
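One possible reading of this multi-phase idea, sketched as plain C++ on the CPU rather than as a kernel: a coarse phase counts results per row (201 counters), and a refinement phase then splits each non-empty row into its columns, again with only 201 counters live at a time. The `Result` struct and the function name are hypothetical, and this is only one way to interpret the suggestion.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type: a pair of bin indices, each in [0, side).
struct Result { int row; int col; };

std::vector<int> histogram_two_phase(const std::vector<Result>& results,
                                     int side) {
    assert(side > 0);
    const std::size_t n = static_cast<std::size_t>(side);

    // Phase 1: coarse histogram, one counter per row.
    std::vector<int> per_row(n, 0);
    for (const Result& r : results) {
        if (r.row >= 0 && r.row < side && r.col >= 0 && r.col < side) {
            ++per_row[static_cast<std::size_t>(r.row)];
        }
    }

    // Phase 2: refine each non-empty row into its columns. Only one row
    // of `side` counters is needed at a time; that small buffer is the
    // part that would fit into local memory.
    std::vector<int> full(n * n, 0);
    for (std::size_t row = 0; row < n; ++row) {
        if (per_row[row] == 0) continue;  // empty row, nothing to refine
        std::vector<int> row_bins(n, 0);
        for (const Result& r : results) {
            if (r.row == static_cast<int>(row) && r.col >= 0 && r.col < side) {
                ++row_bins[static_cast<std::size_t>(r.col)];
            }
        }
        for (std::size_t col = 0; col < n; ++col) {
            full[row * n + col] = row_bins[col];
        }
    }
    return full;
}
```

The trade-off in this sketch is that the results are scanned once per non-empty row, so it exchanges memory for extra passes over the data. The coarse phase at least lets empty rows be skipped.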
If I were you I would go to citeseer.ist.psu.edu and try to find if there are some papers on that topic. It must be a well-researched area.