I'm converting some particle system code from OpenGL shaders to OpenCL for updating particle positions. The code currently stores particle positions and velocities as 3-element floating-point vectors and uses vec3s for the calculation.

Looking at the OpenCL spec, I notice that vectors can only be sized in powers of 2. This isn't a great hardship in the actual calculation: as I understand it, most hardware has 128-bit vectors anyway, so I shouldn't pay a penalty. On a scalar architecture I would be specifying a third more work, but I'd hope the compiler could prune that if it sees I'm not using the result.

The problem is reading the data into the kernel. Is there a nice way of getting the data in and out in a single access, or am I resigned to doing something like:

Code :
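__kernel void particle(__global float *positions, __global float *velocities) {
    // Each particle occupies 3 consecutive floats in each buffer.
    int idx = get_global_id(0) * 3;
    // Gather the 3 stored components into a float4, padding the w component.
    float4 position = (float4)(positions[idx + 0], positions[idx + 1], positions[idx + 2], 1.0f);
    float4 velocity = (float4)(velocities[idx + 0], velocities[idx + 1], velocities[idx + 2], 0.0f);
    // ... update position and velocity here ...
    // Scatter the results back one component at a time, dropping w.
    positions[idx + 0] = position.x;
    positions[idx + 1] = position.y;
    positions[idx + 2] = position.z;
    velocities[idx + 0] = velocity.x;
    velocities[idx + 1] = velocity.y;
    velocities[idx + 2] = velocity.z;
}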

I'd like to keep the 3-element layout in memory to avoid a 33% increase in storage for no gain.
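
For a sense of scale, here is a rough host-side sketch in plain C of where that 33% comes from. The helper names `attribute_bytes` and `padding_overhead_percent` are made up for illustration, and it assumes tightly packed 32-bit floats with no extra alignment padding added by the allocator or driver.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed for one attribute buffer (positions or velocities) when each
   particle stores `components` floats. Hypothetical helper, not an OpenCL call. */
static size_t attribute_bytes(size_t particle_count, size_t components)
{
    assert(components == 3 || components == 4);
    return particle_count * components * sizeof(float);
}

/* Extra memory used by the padded 4-float layout relative to the packed
   3-float layout, in whole percent (rounded down). */
static unsigned padding_overhead_percent(size_t particle_count)
{
    assert(particle_count > 0);
    size_t packed = attribute_bytes(particle_count, 3);
    size_t padded = attribute_bytes(particle_count, 4);
    return (unsigned)((padded - packed) * 100 / packed);
}
```

The ratio is independent of the particle count: every particle gains one unused float per buffer, i.e. 4 extra bytes on top of 12, assuming a 4-byte float.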


