Hi, I'm trying to implement a simple particle system.

Last time I used transform feedback objects and double buffering, which delivered (in my judgement) very good results:

Without collision detection, about 3 million particles can be simulated at 60 frames per second (only gravity and collision with the y = 0 plane enabled).

With a simple line-triangle intersection test to detect collisions between particles and a few triangles in the scene, I can render about 800,000 particles at 60 frames per second.

(my graphics card: NVIDIA GT 640, about 3 years old)

This time I want to push the limits further by using compute shaders. I managed to build the application described here:

web.engr.oregonstate.edu/~mjb/cs557/Handouts/compute.shader.1pp.pdf

I changed it to use only 1 particle buffer for position / velocity / color / etc., but double buffered.

The rendering method looks like this:

Code :
void ParticleSystem::Render(const glm::mat4 & view, const glm::mat4 & projection, float timestep)
{
    // double buffered, switch vertex array every frame
    static unsigned int flipflop = 1;
    flipflop = !flipflop;

    // bind both particle buffers
    glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, m_particle_buffer[1 - flipflop].ID()); // source
    glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 1, m_particle_buffer[flipflop].ID());     // results

    // compute shader
    unsigned int program = m_program_update.ID();

    // simulate 1 frame
    glUseProgram(program);
    glDispatchCompute(m_particle_count / PARTICLES_WORK_GROUP_SIZE, 1, 1); // work group size = 128
    glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT);

    // render 1 frame
    program = m_program_render.ID();
    glUseProgram(program);
    glUniformMatrix4fv(glGetUniformLocation(program, "Model"), 1, GL_FALSE, glm::value_ptr(glm::mat4(1)));
    glUniformMatrix4fv(glGetUniformLocation(program, "View"), 1, GL_FALSE, glm::value_ptr(view));
    glUniformMatrix4fv(glGetUniformLocation(program, "Projection"), 1, GL_FALSE, glm::value_ptr(projection));

    glBindVertexArray(m_vertexarray[flipflop].ID());
    glDrawArrays(GL_POINTS, 0, m_particle_count);
    glBindVertexArray(0);

    glUseProgram(0);
}
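A side note on the dispatch: the integer division drops the remainder, so it only covers every particle when the particle count is a multiple of the work group size. A minimal sketch of rounding up instead, assuming a hypothetical helper `WorkGroupCount` (not part of my class), and assuming the shader would then skip any invocation whose index is past the particle count:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of work groups needed so that
// group_count * group_size >= particle_count (integer division rounded up).
inline std::uint32_t WorkGroupCount(std::uint32_t particle_count, std::uint32_t group_size)
{
    assert(group_size > 0);
    return (particle_count + group_size - 1) / group_size;
}
```

With 128 invocations per group, 256 particles would need 2 groups and 257 particles would need 3.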

question 1:

I've read that glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT); is used to synchronize and is relatively expensive. Its purpose is that if I want to read back data from that buffer, I can be sure the compute shader has already finished processing it.

BUT: I use 2 buffers. The compute shader calculates the data for the next frame, while the current frame renders the "old" buffer, from which the compute shader ONLY reads.

Do I actually need to synchronize?

Or can I delete glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT); without problems?

Compute shader source:

Code :
#version 450

layout(local_size_x = 128, local_size_y = 1, local_size_z = 1) in;

layout (std140, binding = 0) buffer Source { vec4 DataSource[]; };           // particle buffer to read from
layout (std140, binding = 1) buffer Destination { vec4 DataDestination[]; }; // particle buffer to write into

const vec3 gravity = vec3(0, -9.81, 0);
const float timestep = 0.016;

void main()
{
    // read old data
    // this is a 1-dimensional calculation because the data is a 1D array (of particles)
    uint index = gl_GlobalInvocationID.x; // .y and .z are always 0 for this 1D dispatch

    // each particle occupies 3 consecutive vec4 elements
    vec4 data0 = DataSource[3 * index + 0];
    vec4 data1 = DataSource[3 * index + 1];
    vec4 data2 = DataSource[3 * index + 2];

    vec3 position = data0.xyz;
    float lifetime = data0.w;
    vec3 velocity = data1.xyz;
    float unused = data1.w;
    vec4 color = data2;

    // calculate new data
    //vec3 acceleration = gravity;
    vec3 acceleration = vec3(0, 0, 0);

    vec3 position_new = position + velocity * timestep;
    float lifetime_new = lifetime - timestep;
    vec3 velocity_new = velocity + acceleration * timestep;
    vec4 color_new = color;

    // bounce off the walls of the [-1, +1] box, losing 10% of the speed on that axis
    if (position_new.x < -1) { position_new.x = -1; velocity_new.x *= -0.9; }
    if (position_new.y < -1) { position_new.y = -1; velocity_new.y *= -0.9; }
    if (position_new.z < -1) { position_new.z = -1; velocity_new.z *= -0.9; }
    if (position_new.x > +1) { position_new.x = +1; velocity_new.x *= -0.9; }
    if (position_new.y > +1) { position_new.y = +1; velocity_new.y *= -0.9; }
    if (position_new.z > +1) { position_new.z = +1; velocity_new.z *= -0.9; }

    // write new data
    DataDestination[3 * index + 0] = vec4(position_new, lifetime_new);
    DataDestination[3 * index + 1] = vec4(velocity_new, 0);
    DataDestination[3 * index + 2] = color_new;
}
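To check the shader's output, the same update can be written on the CPU and compared against a few particles read back from the buffer. A minimal sketch, assuming plain float structs instead of glm types; `Vec3`, `ParticleCPU`, `BounceAxis` and `StepParticle` are names made up for this example and are not part of my application:

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };

// Same fields as the three vec4 elements the shader reads per particle.
struct ParticleCPU {
    Vec3  position;
    float lifetime;
    Vec3  velocity;
    float unused;
    float color[4];
};

// Clamp one coordinate to the [-1, +1] box and reflect its velocity,
// keeping 90% of the speed, like the six if-statements in the shader.
inline void BounceAxis(float& position, float& velocity)
{
    if (position < -1.0f) { position = -1.0f; velocity *= -0.9f; }
    if (position > +1.0f) { position = +1.0f; velocity *= -0.9f; }
}

// Advance one particle by one timestep, in the same order as the shader:
// the position uses the old velocity, then the velocity is updated, then bounced.
inline ParticleCPU StepParticle(ParticleCPU p, Vec3 acceleration, float timestep)
{
    assert(timestep >= 0.0f);

    p.position.x += p.velocity.x * timestep;
    p.position.y += p.velocity.y * timestep;
    p.position.z += p.velocity.z * timestep;
    p.lifetime   -= timestep;

    p.velocity.x += acceleration.x * timestep;
    p.velocity.y += acceleration.y * timestep;
    p.velocity.z += acceleration.z * timestep;

    BounceAxis(p.position.x, p.velocity.x);
    BounceAxis(p.position.y, p.velocity.y);
    BounceAxis(p.position.z, p.velocity.z);

    p.unused = 0.0f; // the shader always writes 0 into velocity.w
    return p;        // color is passed through unchanged
}
```

Passing a zero acceleration matches the shader as posted, where gravity is commented out.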

question 2:

What about the Model x View x Projection matrix calculation in the vertex shader (for rendering the particles)?

Should I also move this calculation to the compute shader and store the results in a third buffer? What about synchronizing in that case?

question 3:

What about using a struct Particle { ... }; array in the compute shader as the data source / destination? Can I assume that the data is packed tightly together, or do I have to worry about offsets between struct members?

(I would like to avoid this ugliness:)

Code :
vec4 data0 = DataSource[3 * index + 0];
vec4 data1 = DataSource[3 * index + 1];
vec4 data2 = DataSource[3 * index + 2];

vec3 position = data0.xyz;
float lifetime = data0.w;
vec3 velocity = data1.xyz;
float unused = data1.w;
vec4 color = data2;
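What I have in mind is a C++ mirror of the shader-side struct. A minimal sketch, assuming the GLSL side would declare `struct Particle { vec4 position_lifetime; vec4 velocity_unused; vec4 color; };` (hypothetical member names). As far as I understand, with only vec4 members both std140 and std430 place each member on a 16-byte boundary, so the struct should be 48 bytes with no hidden padding; the static_asserts only verify the C++ side of that assumption:

```cpp
#include <cstddef>

// Hypothetical CPU-side mirror of one particle: three vec4-sized members.
struct ParticleGPU {
    float position_lifetime[4]; // xyz = position, w = lifetime
    float velocity_unused[4];   // xyz = velocity, w = unused
    float color[4];             // rgba
};

// Fail the build if the compiler lays the struct out differently
// from three consecutive 16-byte vec4 elements.
static_assert(sizeof(ParticleGPU) == 48, "one particle must be 3 * 16 bytes");
static_assert(offsetof(ParticleGPU, position_lifetime) == 0, "vec4 #0 at byte 0");
static_assert(offsetof(ParticleGPU, velocity_unused) == 16, "vec4 #1 at byte 16");
static_assert(offsetof(ParticleGPU, color) == 32, "vec4 #2 at byte 32");
```

Members like a vec3 followed by a separate float are where the layout rules could add padding, which is why this sketch sticks to vec4-sized members.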