Quote Originally Posted by naroqueen
I don't know how to thank you Andrew for the detailed explanation... this made things much more clear.... by the way I tried implementing your idea of the character buffer and working on indices instead of pointers... it's now working...
Also I guess I'll stick to your advice David, of practicing C first... and I'd appreciate any advice on how to improve myself in C and OpenCL...
Thanks again.
Glad I could help.

There are lots of good books on C -- I'm sure you can find one with good reviews on Amazon. Even if you're a C expert though, I would still recommend this general implementation pattern:

  • implement in C
  • test/debug
  • minimal conversion to OpenCL C
  • test/debug on CPU device where you can debug and/or printf
  • test on GPU device
  • take advantage of OpenCL C extensions and built-in functions to optimize incrementally
  • test/debug on CPU
  • test on GPU device
  • profile and use this to inform next incremental optimization

It isn't too hard to build your application so that your enqueuing of kernels can be easily replaced by normal host function calls. In fact, as an extra incremental step, on most CPU devices you can enqueue your host function as a native kernel so that it still executes as part of your task graph.
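One way to get that swap-ability, as a rough sketch only: route every launch through a single function, so the host call and the enqueue live in one place. The names here (`task_fn`, `scale_args`, `scale_host`, `run_task`) are made up for illustration and are not part of OpenCL. The `void (*)(void *)` shape is chosen because it resembles the user function that `clEnqueueNativeKernel` takes, which should make the native-kernel step a small change later.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical task signature: each "kernel" also exists as a plain C function. */
typedef void (*task_fn)(void *args);

/* Hypothetical argument block for one task. */
struct scale_args {
    float *data;
    size_t n;
    float factor;
};

/* Host version of the task; the OpenCL C kernel would do the same work. */
void scale_host(void *p)
{
    struct scale_args *a = p;
    for (size_t i = 0; i < a->n; ++i)
        a->data[i] *= a->factor;
}

/* Single choke point for launching work.
 * Host path: a direct call. In an OpenCL build this is the one place
 * you would change to enqueue a kernel (or a native kernel) instead. */
void run_task(task_fn fn, void *args)
{
    assert(fn != NULL);
    fn(args);
}
```

The rest of the application only ever calls `run_task(scale_host, &args)`, so it does not need to know which path is active.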

It gets a bit more challenging when you start into data parallelism, i.e. using multiple work-items. Even there, though, you can usually start by writing the kernel as a single work-item that you enqueue from a loop. For this the OpenCL 1.1 feature of being able to provide work-item ID offsets (the global_work_offset argument of clEnqueueNDRangeKernel) is very useful; in 1.0 you have to pass in and use the offsets manually.
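To make the loop idea concrete, here is a host-only sketch in plain C. The function names (`square_item`, `square_range`) and the squaring operation are just placeholders I picked. The `gid` parameter stands in for what `get_global_id(0)` would return inside the kernel, and the loop stands in for enqueuing one work-item at a time with an increasing offset.

```c
#include <stddef.h>

/* Body of one work-item written as plain C.
 * gid plays the role of get_global_id(0) in the OpenCL C version. */
void square_item(size_t gid, const float *in, float *out)
{
    out[gid] = in[gid] * in[gid];
}

/* Emulates enqueuing a 1-work-item range `count` times,
 * starting at `offset` and bumping the offset by one each time. */
void square_range(size_t offset, size_t count, const float *in, float *out)
{
    for (size_t i = 0; i < count; ++i)
        square_item(offset + i, in, out);
}
```

Once this works on the host, the body of `square_item` is what moves into the kernel, and the loop is what eventually collapses into a single enqueue with a larger global work size.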

The basic philosophy is to always have something working, and to make small changes at a time so the problems are easier to figure out. It's important to have an idea of where you're trying to go (i.e. asynchronous optimized data-parallel tasks in a graph that runs in parallel with your host code) so that you're taking steps in the right direction to get there, but small steps are much more manageable, and they avoid the rather depressing feeling that nothing ever seems to be working. Using a version control system is extremely useful so that you keep a history of each incremental step of your process, along with an informative comment on each version that you put in your repository.