Hi all, I have the following kernel code:

Code :
typedef struct {
	float2 vel;
	float mass;
	float life;
} Particle;
typedef struct {
	float2 pos;
	float ejectForce;
	float attractForce;
	float waveAmp;
	float waveFreq;
} Node;
__kernel void update(__global Particle* particles,		//0
					 __global float2* posBuffer,		//1
					 __global float4 *colBuffer,		//2
					 __global Node *nodes,				//3
					 const int numNodes					//4
					 ) {
	int		id				= get_global_id(0);
	__global Particle	*p	= &particles[id];
	__global Node		*n	= &nodes[id % numNodes];
	float	mass			= particles[id].mass;
	float2	pos				= posBuffer[vboIndex];

In the rest of the kernel I use p, n, mass and pos directly. I've been trying to determine whether that is faster than indexing into the arrays at every use, but the timings come out roughly the same. Could those who understand the architecture better than I do comment on the theoretical performance difference? (NVIDIA 9600GT in a MacBook Pro.)
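
To make the comparison concrete, here is a reduced sketch of the two variants. It is not my real kernel: the position update is just a placeholder, indexing posBuffer by id is an assumption, and the names update_cached, update_direct and current_id are made up for this illustration. The #ifndef block is only there so the same source can also be built as plain C on the host for a quick sanity check.

```c
/* Host-side shim (hypothetical, for checking only); skipped by an OpenCL compiler. */
#ifndef __OPENCL_VERSION__
#define __kernel
#define __global
typedef struct { float x, y; } float2;
static int current_id = 0;	/* stands in for the work-item id */
static int get_global_id(int dim) { (void)dim; return current_id; }
#endif

typedef struct {
	float2 vel;
	float mass;
	float life;
} Particle;

/* Variant A: read each value from global memory once into a private
   variable, work on the copies, then write the result back. */
__kernel void update_cached(__global Particle *particles,
							__global float2 *posBuffer) {
	int		id		= get_global_id(0);
	__global Particle *p = &particles[id];
	float	mass	= p->mass;
	float2	vel		= p->vel;
	float2	pos		= posBuffer[id];	/* assumption: indexed by id */
	pos.x += vel.x / mass;			/* placeholder update */
	pos.y += vel.y / mass;
	posBuffer[id] = pos;
}

/* Variant B: index the global arrays at every use. */
__kernel void update_direct(__global Particle *particles,
							__global float2 *posBuffer) {
	int id = get_global_id(0);
	posBuffer[id].x += particles[id].vel.x / particles[id].mass;
	posBuffer[id].y += particles[id].vel.y / particles[id].mass;
}
```

Both variants write the same values, so the only open question is how many global memory loads each one ends up issuing once compiled.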