Scalability issues in OpenCL

I am new to OpenCL and trying to learn the basics. If I specify the work-group size and NDRange so that the total number of work-items exceeds the number of processing elements in a device, can those work-items still be executed in parallel? If so, how is that implemented?

I would also be grateful if someone could explain, with an example, how the abstract OpenCL model is mapped onto the hardware model.

This is covered in the introductory material of every GPU vendor's documentation, and much better than it could be explained in a few words - the diagrams help.

They also cover why specifying more work-groups is beneficial - it's pretty much the design rationale of modern GPUs that allowed OpenCL to exist in the first place, so you can be sure it works - but you don't really need to know how it's done to use it.

e.g. NVIDIA OpenCL Programming Guide, section 2.1
AMD Accelerated Parallel Processing OpenCL Programming Guide, chapter 1.

(if you don’t already have them … use google)
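As a rough illustration of the idea above, here is a toy Python model (not OpenCL code, and not how any particular device actually schedules work). It assumes a hypothetical device with a fixed number of compute units, each running one work-group at a time, and shows that an NDRange with more work-groups than compute units is simply processed in successive batches. The function names and the one-group-per-unit rule are assumptions made for the sketch; real hardware usually keeps several work-groups resident per compute unit, interleaves them to hide memory latency, and gives no guarantee about the order in which groups run.

```python
def schedule_work_groups(global_size, local_size, compute_units):
    """Toy model: hand work-groups to compute units in successive waves.

    Assumes one work-group per compute unit at a time (a simplification).
    """
    # OpenCL 1.x requires the global size to be a multiple of the local size.
    if global_size % local_size != 0:
        raise ValueError("global_size must be a multiple of local_size")
    num_groups = global_size // local_size
    waves = []
    for start in range(0, num_groups, compute_units):
        # Each wave holds at most `compute_units` work-group ids.
        waves.append(list(range(start, min(start + compute_units, num_groups))))
    return waves


def global_ids(group_id, local_size):
    """Global ids of the work-items in one work-group (1-D NDRange, no offset)."""
    return [group_id * local_size + local_id for local_id in range(local_size)]


# 32 work-items in groups of 4 -> 8 work-groups on a hypothetical 3-unit device.
waves = schedule_work_groups(global_size=32, local_size=4, compute_units=3)
for wave_index, wave in enumerate(waves):
    for group_id in wave:
        print(f"wave {wave_index}: group {group_id} -> items {global_ids(group_id, 4)}")
```

The point of the sketch is only that the kernel and the NDRange do not change when the device is larger or smaller: a device with more compute units finishes the same 8 work-groups in fewer waves, which is why oversubscribing the device with work-groups scales.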

Thank you, notzed, for your suggestion.