Performance on APU with different buffer creation strategies

Cadorino · August 28, 2011, 8:08am

Hi!
I’m testing various buffer creation streategies on an APU (acer iconia tab). The algorithm is Saxpy (vector addition), performed many times with different vector sizes. In particular, I’d like to find out if on an APU there is the chance to perform vector addition on the GPU faster than on the CPU, something that is practically never convenient on a traditional architecture (CPU and GPU not on the same chip) due to the PCI bus latency.

Since the ram is shared between the CPU and the GPU, I expected that creating a buffer with USE_HOST_POINTER and using Mapping/Unmapping would lead to extremely better performances. However, I tested both a project where data transfers between buffers and host memory are performed “manually” (i.e. enqueueRead/WriteBuffer) and a project based on Mapping/Unmapping. In the first case, the GPU execution time begins to be lower than the CPU execution time for vectors that are bigger than about 1 million elements. In the second case, the GPU never “wins” on the CPU, that is, its execution time is always higher than the CPU one. Moreover, the GPU execution time with mapping is lower than the GPU execution time with copy only for “small” vector sizes, but it turns to be higher for quite big vectors.

Any idea about this? Is my assumpition wrong?

The C++ sources of the projects:

http://www.gabrielecocco.it/apu/SaxpyCopy.cpp
http://www.gabrielecocco.it/apu/SaxpyAl … opyPtr.cpp

The followings are the execution timings (GPU with copy and GPU with mapping), in the format: VECT_SIZE EXEC_TIME

http://www.gabrielecocco.it/apu/gpu_data_copy.txt
http://www.gabrielecocco.it/apu/gpu_data_map.txt

Cadorino · August 28, 2011, 8:11am

CPU execution timings:
http://www.gabrielecocco.it/apu/cpu_data.txt

The kernel:
http://www.gabrielecocco.it/apu/kernel.cl

Maxim_Milakov · August 29, 2011, 4:13am

You are using AMD OpenCL driver, right? Then you need to check AMD APP OpenCL programming guide, zero copy buffers are thoroughly covered in that document.

Cadorino · August 30, 2011, 2:35am

I’ve currently installed the following driver:
Driver Packaging Version 8.881-110728a-122938C-ATI
Catalyst Version 11.8
Provider ATI Technologies Inc.
2D Driver Version 8.01.01.1178
2D Driver File Path /REGISTRY/MACHINE/SYSTEM/ControlSet001/Control/Class/{4D36E968-E325-11CE-BFC1-08002BE10318}/0000
Direct3D Version 7.14.10.0855
OpenGL Version 6.14.10.11005
AMD VISION Engine Control Center Version 2011.0728.1756.30366
AMD Audio Driver Version 7.12.0.7702

I’ve also installed the AMD APP SDK 2.5.

Are those an appropriate version of the software needed?

Maxim_Milakov · August 30, 2011, 3:38am

Yes.