Questions about GPU computing

Hi,
I would like to ask you the following questions:
1- Do you know why the GPU is good for data-parallel work but not for task-parallel work? Is it the architecture, and if so, why?
2- I heard the GPU is much slower than the CPU at moving data to and from memory. Do you have an idea of the difference in numbers? And is the problem only with system memory, while the GPU's own memory is fast?
Thanks!!

Pablo

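1 - Think about its heritage: graphics, which is embarrassingly data-parallel. It’s also much, much easier to implement very wide data-parallel processors, since all you need to do is copy and paste the ALUs and register banks and hook them up to the same instruction logic. Because those ALUs share one instruction stream, independent tasks that each need their own control flow cannot keep them all busy.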
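To make the shared-instruction-logic point concrete, here is a toy model in plain Python. It is only a sketch, not how any particular GPU is built: the function names (`run_uniform`, `run_divergent`) and the idea of counting "issue slots" are my own assumptions for illustration. When every lane runs the same program, each instruction is issued once for all lanes. When lanes want different programs (task parallelism), the shared instruction unit has to issue each program in turn while the other lanes sit idle.

```python
def run_uniform(program, values):
    """Data-parallel case: every lane executes the same instruction stream."""
    values = list(values)
    issue_slots = 0
    for op in program:
        values = [op(v) for v in values]  # one issue drives all lanes
        issue_slots += 1
    return values, issue_slots


def run_divergent(program_per_lane, values):
    """Task-parallel case on the same hardware model (hypothetical).

    Lanes that share a program are grouped; each group is issued
    separately while all other lanes are masked off.
    """
    values = list(values)
    groups = {}
    for lane, program in enumerate(program_per_lane):
        groups.setdefault(id(program), (program, []))[1].append(lane)
    issue_slots = 0
    for program, lanes in groups.values():
        for op in program:
            for lane in lanes:  # only this group's lanes are active
                values[lane] = op(values[lane])
            issue_slots += 1
    return values, issue_slots


def double(x):
    return 2 * x


def add_one(x):
    return x + 1


def square(x):
    return x * x


def negate(x):
    return -x


data = list(range(8))

# Same two-instruction program on all 8 lanes.
uniform_result, uniform_slots = run_uniform((double, add_one), data)

# Four different two-instruction tasks, two lanes each.
task_a = (double, add_one)
task_b = (square, add_one)
task_c = (negate, double)
task_d = (add_one, square)
per_lane = [task_a, task_a, task_b, task_b, task_c, task_c, task_d, task_d]
divergent_result, divergent_slots = run_divergent(per_lane, data)

print("uniform issue slots:  ", uniform_slots)
print("divergent issue slots:", divergent_slots)
```

In this model the divergent case needs four times as many issue slots for the same amount of useful work, with only 2 of the 8 lanes active in each slot.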

2 - Read the vendor documentation; it depends on the hardware. As a general rule of thumb, expect about one order of magnitude per level of removal from the registers. The actual numbers don’t matter much because they’re beyond your control.

From the GPU side:
System memory is roughly one order of magnitude slower than global memory (if accessed across the PCIe bus).
Global memory is roughly one order of magnitude slower than registers.

In general, discrete GPUs can access their own memory much faster than a CPU can access system memory: e.g. 100-170 GB/s vs. 20-30 GB/s, while PCIe 2 is about 5 GB/s, iirc (this information is readily available). Good cache use (coherent, local access patterns) will give you about an order of magnitude improvement at the level where the cache sits, so the achievable numbers depend greatly on the algorithm.
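For a feel of what those bandwidths mean in practice, here is a back-of-the-envelope calculation. The bandwidth figures are the rough ones quoted above (I picked values inside the quoted ranges, which is an assumption), and the 1 GB buffer size is arbitrary. It ignores latency and per-transfer overhead, so treat the results as lower bounds on time.

```python
# Rough peak bandwidths in bytes per second, taken from the figures above.
# The exact values chosen within each range are assumptions for illustration.
BANDWIDTH = {
    "GPU global memory": 150e9,  # within the 100-170 GB/s range
    "CPU system memory": 25e9,   # within the 20-30 GB/s range
    "PCIe 2 link": 5e9,          # "about 5 GB/s"
}


def transfer_seconds(num_bytes, bytes_per_second):
    """Ideal time to stream num_bytes at a given bandwidth (no latency)."""
    return num_bytes / bytes_per_second


buffer_bytes = 1e9  # a 1 GB buffer, chosen arbitrarily

times = {name: transfer_seconds(buffer_bytes, bw)
         for name, bw in BANDWIDTH.items()}

for name, seconds in times.items():
    print(f"{name:>18}: {seconds * 1000:7.2f} ms")

# How many times the GPU could re-read the buffer from its own memory
# in the time it takes to copy it once across the bus.
pcie_penalty = times["PCIe 2 link"] / times["GPU global memory"]
print(f"PCIe copy costs about {pcie_penalty:.0f}x a pass over global memory")
```

With these assumed numbers, one trip across the bus costs as much as dozens of passes over the same data in GPU memory, which is why it pays to keep data on the device between kernels.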

On unified memory devices (e.g. Llano), system/CPU and global/GPU memory are the same thing. The gap between system memory and registers applies to CPUs too.

It’s only a problem if your code is not designed properly and passes results back and forth synchronously between the GPU and the CPU. This kind of design constraint occurs in every field of system design, so it isn’t unique to GPU devices, and the solutions involve well-known approaches such as pipelining or multiple queues/threads. Of course, that doesn’t mean every problem can be restructured to suit.
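As a rough sketch of the pipelining idea, the Python below overlaps "transfer" and "compute" using a thread and a bounded queue. Everything here is a stand-in: `transfer` and `compute` just sleep to simulate a bus copy and a kernel, and the function names and the queue depth of 2 are my own choices. In real GPU code you would use your API's asynchronous copies and multiple command queues or streams instead.

```python
import queue
import threading
import time


def transfer(chunk):
    """Stand-in for a host-to-device copy (hypothetical, just sleeps)."""
    time.sleep(0.01)
    return chunk


def compute(chunk):
    """Stand-in for a kernel launch (hypothetical, just sleeps)."""
    time.sleep(0.01)
    return sum(chunk)


def run_synchronous(chunks):
    """Copy, then compute, one chunk at a time: nothing overlaps."""
    return [compute(transfer(c)) for c in chunks]


def run_pipelined(chunks, depth=2):
    """Copy chunk N+1 on a second thread while chunk N is being computed."""
    staged = queue.Queue(maxsize=depth)  # bounded, like double buffering
    done = object()                      # sentinel marking end of input

    def uploader():
        for c in chunks:
            staged.put(transfer(c))
        staged.put(done)

    worker = threading.Thread(target=uploader)
    worker.start()
    results = []
    while True:
        item = staged.get()
        if item is done:
            break
        results.append(compute(item))
    worker.join()
    return results


chunks = [[i, i + 1] for i in range(0, 8, 2)]

start = time.perf_counter()
sync_results = run_synchronous(chunks)
sync_time = time.perf_counter() - start

start = time.perf_counter()
piped_results = run_pipelined(chunks)
piped_time = time.perf_counter() - start

print(f"synchronous: {sync_time:.3f} s, pipelined: {piped_time:.3f} s")
```

Both versions produce the same results; the pipelined one hides most of the copy time behind the compute time, as long as the chunks are independent of each other.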

Thank you for answering!!