I'm converting some particle system code that currently uses OpenGL shaders to update particle positions over to OpenCL. The code stores the positions and velocities of particles as 3-element floating-point vectors, and uses vec3s to do the calculation.

Looking at the OpenCL spec, I notice that vector sizes are restricted to powers of 2. This isn't a great hardship in the actual calculation: as I understand it, most vector hardware has 128-bit registers anyway, so a float4 shouldn't cost more than a 3-element vector would. On a scalar architecture I would be specifying a third more work, but I'd hope the compiler could prune that once it sees I never use the fourth component.

The problem is reading the data into the kernel. Is there a nice way of getting the data in and out in a single access, or am I resigned to doing something like:

Code :
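__kernel void particle(__global float *positions, __global float *velocities) {
    // Three packed floats per particle, so scale the work-item index by 3.
    int idx = get_global_id(0) * 3;
    // Gather three scalars into a float4; w is filler and is never written back.
    float4 position = (float4)(positions[idx + 0], positions[idx + 1], positions[idx + 2], 1.0f);
    float4 velocity = (float4)(velocities[idx + 0], velocities[idx + 1], velocities[idx + 2], 0.0f);
    // ... update position and velocity here ...
    // Scatter x, y and z back into the packed arrays.
    positions[idx + 0] = position.x;
    positions[idx + 1] = position.y;
    positions[idx + 2] = position.z;
    velocities[idx + 0] = velocity.x;
    velocities[idx + 1] = velocity.y;
    velocities[idx + 2] = velocity.z;
}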

I'd like to keep the 3 element structure in memory to avoid a 33% increase in memory for no gain.


This may be more suitable in one of the other forums, but they don't seem to be active. If the moderators want to move the thread that's fine by me.