caching variables from arrays

memo · November 10, 2009, 10:51am

Hi all, I have the following kernel code:

typedef struct {
	float2 vel;
	float mass;
	float life;
} Particle;

typedef struct {
	float2 pos;
	float ejectForce;
	float attractForce;
	float waveAmp;
	float waveFreq;
} Node;

__kernel void update(__global Particle* particles,		//0
					 __global float2* posBuffer,		//1
					 __global float4 *colBuffer,		//2
					 __global Node *nodes,				//3
					 const int numNodes					//4
					 ) {				
	
	int		id				= get_global_id(0);

	__global Particle	*p	= &particles[id];
	__global Node		*n	= &nodes[id % numNodes];
	
	
	float	mass			= particles[id].mass;
	float2	pos				= posBuffer[vboIndex];

In the rest of the kernel I use p, n, mass and pos directly. I’ve been trying to determine whether that is faster than indexing into the arrays each time, but the results seem roughly the same. Can those who understand the architecture better than I do comment on the theoretical performance difference? (NVIDIA 9600GT in a MacBook Pro.)

dbs2 · November 11, 2009, 11:30am

I don’t think it should matter. The compiler loads the value from memory into a register, and if it can reuse that load later on it will do so. So in effect you get the private-variable version either way, whenever the compiler thinks it is better. Where you can get a speedup is when you load a lot of data and reuse it, particularly across the work-items of a workgroup. In that case you can manually copy the data into local memory and access it from there, as a software-managed cache.
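
A rough sketch of that local-memory approach, reusing the Node struct from the question. Everything else here is an assumption for illustration: the kernel name updateCached, the nodeCache argument, and the placeholder force calculation are hypothetical and not taken from the original kernel. It also assumes every work-item reads every node (so there is real reuse), that the node array fits in local memory, and that the global size is a multiple of the workgroup size so all work-items reach the barrier.

```c
typedef struct {
	float2 pos;
	float ejectForce;
	float attractForce;
	float waveAmp;
	float waveFreq;
} Node;

// Hypothetical kernel: stages the node array in local memory once per
// workgroup, then every work-item reads the nodes from that copy.
__kernel void updateCached(__global float2 *posBuffer,		//0
						   __global const Node *nodes,		//1
						   const int numNodes,				//2
						   __local Node *nodeCache)			//3 (sized by the host)
{
	int id  = get_global_id(0);
	int lid = get_local_id(0);
	int lsz = get_local_size(0);

	// Cooperative copy: each work-item fetches a strided share of the nodes.
	for (int i = lid; i < numNodes; i += lsz) {
		nodeCache[i] = nodes[i];
	}

	// Wait until the whole workgroup has finished writing the cache.
	barrier(CLK_LOCAL_MEM_FENCE);

	float2 pos   = posBuffer[id];
	float2 force = (float2)(0.0f, 0.0f);

	// Placeholder physics (assumption): sum a simple attraction towards each node.
	for (int i = 0; i < numNodes; i++) {
		force += (nodeCache[i].pos - pos) * nodeCache[i].attractForce;
	}

	posBuffer[id] = pos + force;
}
```

On the host, the local buffer would be allocated by passing a size and a NULL pointer, e.g. clSetKernelArg(kernel, 3, numNodes * sizeof(Node), NULL), where the host-side Node must have the same layout as the device struct. Whether this beats plain global reads depends on how much reuse there is; with a single read per work-item, as in the original snippet, it is unlikely to help.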

memo · November 13, 2009, 4:01pm

OK, that makes sense. Thanks.