Maximum private memory?

I’m wondering how I can find out the size limits of __private memory.
I can already query the device for maximum __local and __global memory,
but I cannot find out how much is available as private memory.

I need to know because I want to load some sets of data from global
memory (each thread loads a different data set), and then each of my
threads will search its own data set. Because the threads are independent
of each other, I cannot use a common __local memory to store the data in.
(I have already filled that space, actually ;))

So I need to know how large a chunk I can load from __global to
__private memory without causing problems. I understand this is
probably specific to what hardware I am using, so I am looking
for some way of calculating this amount.

Unfortunately, this is very architecture-dependent, as you guessed. On Nvidia architectures, I believe private memory is held in registers, so the more private memory each work-item uses, the fewer work-items the register file can hold at once. That means a smaller maximum work-group size and, hence, less parallelism. In other words, there is a tradeoff between private memory usage and the number of work-items running in parallel.
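To put rough numbers on that tradeoff, here is a minimal sketch in plain C. It assumes private data lives in 32-bit registers and that the register file of one compute unit is divided evenly among the resident work-items. The helper name `private_bytes_per_work_item` is hypothetical, and the register count you pass in is something you have to look up for your hardware (for example, 16384 registers per multiprocessor is the documented figure for Nvidia compute capability 1.3 parts). Treat the result as an upper bound only: the kernel's other variables also need registers, and the compiler may place arrays outside the register file.

```c
#include <assert.h>

/* Hypothetical helper (not part of any OpenCL API).
 * Estimates an upper bound on the bytes of private data each work-item
 * can keep in registers, assuming 32-bit registers shared evenly among
 * all work-items resident on one compute unit. */
static unsigned private_bytes_per_work_item(unsigned regs_per_compute_unit,
                                            unsigned resident_work_items)
{
    assert(resident_work_items > 0);

    /* Integer division: registers available to a single work-item. */
    unsigned regs_each = regs_per_compute_unit / resident_work_items;

    /* Each register is assumed to hold 4 bytes. */
    return regs_each * 4u;
}
```

With an assumed 16384 registers per compute unit, `private_bytes_per_work_item(16384, 512)` gives 128 bytes per work-item, while dropping to 64 resident work-items raises that to 1024 bytes. This is the private-memory-versus-parallelism tradeoff in numbers.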

Well, OK, I guess I will simply have to test this then :).

Copying some data from global to private and searching it
vs
Searching it directly in global.
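The two variants could look roughly like the sketch below. It is written as plain C so it can be compiled and checked on the host. In an actual kernel, `data` would be a `__global const int *` and `priv` a `__private` array. The function names, the linear search, and the chunk size of 16 ints are all assumptions for illustration, not taken from the real code.

```c
#include <assert.h>
#include <stddef.h>

enum { CHUNK = 16 };  /* assumed chunk size, in ints */

/* Variant A: search the data set in place.
 * In a kernel, `data` would point into __global memory. */
static int search_direct(const int *data, size_t n, int key)
{
    assert(data != NULL || n == 0);
    for (size_t i = 0; i < n; ++i)
        if (data[i] == key)
            return (int)i;
    return -1;  /* not found */
}

/* Variant B: copy one chunk at a time into a small array
 * (which would be __private in a kernel), then search the copy. */
static int search_via_private(const int *data, size_t n, int key)
{
    int priv[CHUNK];
    assert(data != NULL || n == 0);
    for (size_t base = 0; base < n; base += CHUNK) {
        size_t len = (n - base < (size_t)CHUNK) ? n - base : (size_t)CHUNK;
        for (size_t i = 0; i < len; ++i)   /* load phase */
            priv[i] = data[base + i];
        for (size_t i = 0; i < len; ++i)   /* search phase */
            if (priv[i] == key)
                return (int)(base + i);
    }
    return -1;  /* not found */
}
```

Both functions return the same index for the same input. Note that for a single linear pass, each element is read from global memory exactly once in either variant, so the copy is only likely to pay off if the search touches the same elements repeatedly.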

Will report what I find ;)

(Btw, I use a Tesla 1060 on 64-bit Linux, if that is of any help to you.)