Hi all, I have the following kernel code:
typedef struct {
    float2 vel;
    float mass;
    float life;
} Particle;

typedef struct {
    float2 pos;
    float ejectForce;
    float attractForce;
    float waveAmp;
    float waveFreq;
} Node;

__kernel void update(__global Particle *particles, // 0
                     __global float2 *posBuffer,   // 1
                     __global float4 *colBuffer,   // 2
                     __global Node *nodes,         // 3
                     const int numNodes            // 4
                     ) {
    int id = get_global_id(0);
    __global Particle *p = &particles[id];
    __global Node *n = &nodes[id % numNodes];
    float mass = particles[id].mass;
    float2 pos = posBuffer[id]; // was posBuffer[vboIndex]; vboIndex is not declared in this snippet, so id is assumed here

In the rest of the kernel I use p, n, mass and pos directly. I've been trying to determine whether that is faster than indexing the arrays each time I need a value, but the timings come out roughly the same. Could those who understand the architecture better than I do comment on the theoretical performance difference? (NVIDIA 9600GT in a MacBook Pro.)
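
To make the comparison concrete, here is a minimal sketch of the two access patterns I mean, cut down to a single position update. It is not my real kernel: the names stepCached, stepDirect and dt are made up for illustration, and the small shim at the top is only an assumption-laden stand-in so the same text can also be compiled as plain C to check that both variants produce the same result. Note that in the original code p and n are only pointers into global memory, so it is really mass and pos that are copied into private variables.

```c
/* Illustrative only: stepCached, stepDirect and dt are hypothetical names. */
#ifndef __OPENCL_VERSION__
/* Host-side shim (an assumption for CPU testing, not part of OpenCL). */
#include <stddef.h>
#define __kernel
#define __global
typedef struct { float x, y; } float2;
static size_t shim_id = 0; /* stands in for the work-item index */
static size_t get_global_id(unsigned int dim) { (void)dim; return shim_id; }
#endif

typedef struct {
    float2 vel;
    float mass;
    float life;
} Particle;

/* Variant A: read each value from global memory once into a private
   variable, work on the copy, then write the result back once. */
__kernel void stepCached(__global Particle *particles,
                         __global float2 *posBuffer,
                         const float dt)
{
    int id = get_global_id(0);
    float2 vel = particles[id].vel;
    float2 pos = posBuffer[id];
    pos.x += vel.x * dt;
    pos.y += vel.y * dt;
    posBuffer[id] = pos;
}

/* Variant B: index the global arrays at every use. */
__kernel void stepDirect(__global Particle *particles,
                         __global float2 *posBuffer,
                         const float dt)
{
    int id = get_global_id(0);
    posBuffer[id].x += particles[id].vel.x * dt;
    posBuffer[id].y += particles[id].vel.y * dt;
}
```

Both variants compute the same positions. My guess (not something I have verified on this GPU) is that the compiler may already keep repeatedly used values in registers in variant B, which would explain why I measure no difference.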