Rendering an indexed triangle array with a VBO

Good day to all. I have an array of triangles where each element is an index to a corresponding vertex in a vertex buffer. I have uploaded the VBO and IBO into a VAO. Let's say I want to draw only a small set of triangles in a random order, e.g. triangles 5, 24, 32, 16, 8 and so on. How can I achieve this without rebuilding the IBO each frame?
Sorry, but I am pretty new to OpenGL.
Thanks for your time.

Use multiple draw calls. The parameters to all glDraw* calls let you specify a first (which with glDrawElements is the offset to the first index) and a count, so use them; you don't have to draw the entire buffer with each draw call.
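For instance, a minimal sketch of the offset arithmetic, assuming a GLuint index buffer with 3 consecutive indices per triangle (the function name is my own, not part of any API):

```cpp
#include <cstddef>

// Byte offset of triangle `tri` in a tightly packed GLuint index buffer
// (3 indices per triangle). With the IBO bound, a single triangle could
// then be drawn with:
//   glDrawElements(GL_TRIANGLES, 3, GL_UNSIGNED_INT,
//                  (const void*)triangleByteOffset(tri));
std::size_t triangleByteOffset(std::size_t tri)
{
    return tri * 3 * sizeof(unsigned int); // GLuint is 32 bits wide
}
```

Note that the last parameter of glDrawElements is a byte offset when an element array buffer is bound, which is why the index is scaled by sizeof(GLuint).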

Sorry, I didn't completely understand what you meant. Wouldn't it be very inefficient to have a draw call for each triangle?

glMultiDrawElements.

I’d try both glMultiDrawElements() and building the index array each frame, and see which is faster.

I wouldn’t take it for granted that glMultiDrawElements() will win for single-triangle batches.

It depends.

If you really want to draw them without rebuilding your index buffer, then multiple draw calls are one way of doing so; it's a tradeoff between the cost of the extra draw calls and the cost of rebuilding the index buffer.

glMultiDrawElements will also work, but be aware that some drivers may implement it as a loop over a bunch of glDrawElements calls.
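To make the glMultiDrawElements route concrete, here is a sketch of how the per-draw count and byte-offset arrays could be built for an arbitrary selection of triangles. It assumes a GLuint IBO with 3 indices per triangle; the helper name is hypothetical:

```cpp
#include <cstddef>
#include <vector>

// Build the per-draw count and byte-offset arrays that
// glMultiDrawElements expects. With the IBO bound, the draw would be:
//   glMultiDrawElements(GL_TRIANGLES, counts.data(), GL_UNSIGNED_INT,
//                       offsets.data(), (int)counts.size());
void buildMultiDrawArgs(const std::vector<std::size_t>& triangles,
                        std::vector<int>& counts,
                        std::vector<const void*>& offsets)
{
    counts.clear();
    offsets.clear();
    for (std::size_t tri : triangles) {
        counts.push_back(3); // each batch is a single triangle
        offsets.push_back(reinterpret_cast<const void*>(
            tri * 3 * sizeof(unsigned int))); // byte offset into the IBO
    }
}
```

The index buffer itself is never touched; only these two small CPU-side arrays change per frame.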

A third way, which will probably seem unintuitive, is glBegin(GL_TRIANGLES), a bunch of glArrayElement calls (3 per triangle), then glEnd(). This lets you keep your vertex buffer and index into it dynamically without using an index buffer, and it may be the fastest of all.
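A sketch of that approach (compatibility profile only), assuming `ibo` is a CPU-side copy of the indices with 3 per triangle; the helper just computes the arguments the glArrayElement calls would receive:

```cpp
#include <cstddef>
#include <vector>

// The real loop would be:
//   glBegin(GL_TRIANGLES);
//   for (each selected triangle tri)
//       for (int k = 0; k < 3; ++k)
//           glArrayElement(ibo[tri * 3 + k]);
//   glEnd();
// This helper expands the selected triangles into the flat sequence of
// vertex indices those glArrayElement calls would use.
std::vector<unsigned> selectedVertexIndices(const std::vector<unsigned>& ibo,
                                            const std::vector<std::size_t>& triangles)
{
    std::vector<unsigned> out;
    for (std::size_t tri : triangles)
        for (int k = 0; k < 3; ++k)
            out.push_back(ibo[tri * 3 + k]);
    return out;
}
```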

Rebuilding the index buffer is not that horrible a problem; I'd just like to update the index buffer without using glBufferData, since I have read that it's not the most efficient way to do it. Using glDrawElements is another deprecated method, so I think the only viable way is to use glMultiDrawElements.

There’s nothing deprecated about glDrawElements.

There are 3 basic options: glBufferData(), glBufferSubData(), and glMapBufferRange().

Replacing the entire buffer with glBufferData() is likely to be preferable to using glBufferSubData() to overwrite a portion of it. The former allows the implementation to allocate a new block of memory for the buffer and “orphan” the old block (flag it to be deleted once the last command referencing it has completed). The latter has to wait until the GPU has finished reading from the region being replaced.

Similarly, glMapBufferRange() is preferable to glMapBuffer(), as the former supports various flags that allow the GPU to avoid unnecessary copies and/or pipeline stalls.

As far as I can understand, the *Range function replaces only a portion, leaving the size untouched. I need to draw an indexed triangle buffer, in the sense that I have a vertex buffer bound to a VAO.
Now I have read that if I also bind an element array buffer to the same VAO, the last parameter of glDrawElements is treated as an offset into the already-bound buffer; if no element array buffer is bound, the last parameter is treated as a pointer to the indices. Correct me if I am wrong.
P.S.
English is not my native language, sorry for that. Also, I have read about double-buffering the VBOs; where can I find some concrete code?

That is correct, assuming that you’re using the compatibility profile. If you’re using the core profile, it is an error if no element array buffer is bound.

In general, where OpenGL 1.x reads large amounts of data from client-side arrays, OpenGL 2 and OpenGL 3+ compatibility profile allows the data to be read from a buffer object, with the pointer argument becoming an offset into the buffer, and OpenGL 3+ core profile (usually) requires the data to be read from a buffer object.

I have come up with this solution, and it seems to work. I'd like to know if I did everything correctly or if there is something I must fix.


// Copy the original index buffer into another buffer (only for testing)
std::vector<GLuint> indices;
for (size_t i = 0; i < LevelMesh->GetIndicesCount(); ++i)
    indices.push_back(LevelMesh->SurfaceIndices[i]);

// Bind the index buffer object and tell OpenGL to explicitly orphan it
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, IndexBufferObject);
glBufferData(GL_ELEMENT_ARRAY_BUFFER, indices.size() * sizeof(GLuint), nullptr, GL_DYNAMIC_DRAW);

// Map the buffer
GLuint *mappedindices = reinterpret_cast<GLuint*>(glMapBufferRange(
    GL_ELEMENT_ARRAY_BUFFER, 0, indices.size() * sizeof(GLuint),
    GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT));

// Copy data into the mapped memory
std::copy(indices.begin(), indices.end(), mappedindices);

// Unmap the buffer
glUnmapBuffer(GL_ELEMENT_ARRAY_BUFFER);

// Draw the mesh
glPolygonMode(GL_FRONT_AND_BACK, GL_FILL);

glBindVertexArray(VAOid);

glDrawElements(
    GL_TRIANGLES,    // mode
    indices.size(),  // count
    GL_UNSIGNED_INT, // type
    (void*)0         // element array buffer offset
);

glBindVertexArray(0);
glUseProgram(0);

Thanks to all

That looks fine, assuming that the setup code and the rendering code have been glued together for this post, and aren’t like that in the original code. If that is the exact code, then the glBindVertexArray() call is out of place; the VAO needs to be bound when you bind to GL_ELEMENT_ARRAY_BUFFER, as that particular binding is stored in the current VAO (the other bindings are context state).

For anyone who might be interested: I have tested both methods, and glBufferData is faster than mapping the buffer and then copying the data. The CPU-bound copy operation stalls the CPU pretty badly, while glBufferData seems to allocate memory much faster.

Have you tried:

  1. Mapping the buffer with the UNSYNCHRONIZED flag, and/or
  2. Mapping the buffer with the PERSISTENT and COHERENT flags set?

In the latter case, you do it once up-front and then leave it mapped.

For more details, see Buffer Object Streaming in the GL wiki.
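In the persistent/coherent case, a common layout is a ring of per-frame regions inside one large buffer, so each frame writes to a region the GPU is not currently reading. A minimal sketch of that bookkeeping, with the region size and count as assumptions:

```cpp
#include <cstddef>

// With a persistent, coherent mapping, the buffer is typically created
// once with glBufferStorage(..., GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT
// | GL_MAP_COHERENT_BIT), mapped once with glMapBufferRange and the same
// flags, and left mapped. Each frame then writes into one of N regions
// so the GPU can still read data from the previous frames.
std::size_t frameRegionOffset(std::size_t frame,
                              std::size_t regionCount,
                              std::size_t regionSize)
{
    return (frame % regionCount) * regionSize;
}
```

With (say) 3 regions, the writes for frame N land at frameRegionOffset(N, 3, regionSize), and a fence sync per region prevents overwriting data the GPU hasn't finished with.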

This is about what I’d expect. I wouldn’t say that the copy is the bottleneck with mapping though; rather that mapping (even with unsynchronized) incurs synchronization anyway.

In general I would expect glBufferData - with or without orphaning, but provided you don’t do it too many times per frame - to give better performance than a naive/simple mapping approach (I define any mapping approach that isn’t persistent/coherent as “naive/simple”). The main reasons why are that the driver can (in theory) manage resource contention for you better than naive/simple mapping, and that after a few frames everything will settle to a steady state where the driver is just recycling previously allocated blocks of memory rather than doing any new allocations.

The various AZDO presentations bear this out: from recollection, naive/simple GL mapping was slowest by a long shot, glBuffer(Sub)Data and D3D mapping were about equal, and persistent/coherent GL mapping was fastest by a long shot on the tested hardware.

Typically with glBuffer(Sub)Data you will want to make as few calls as possible per frame: preferably only one. GL drivers tend to be quite poor at managing multiple updates and you'll fall off the performance cliff again. The best way to achieve this is to make two passes over your geometry: the first pass collects objects to render and copies the data destined for buffer objects into a large system-memory array; then you update the buffer objects one time only; then you make the second pass, which actually does the drawing via GL API calls. This is a technique that came up a few times in the course of discussions about pre-GL_ARB_buffer_storage UBO updates, and it should hold good for any other kind of buffer object too.
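The two-pass scheme above can be sketched as follows; the `Object` struct and function name are placeholders, not part of any real API:

```cpp
#include <cstddef>
#include <vector>

// Pass 1: collect each visible object's per-object data into one large
// system-memory array, so that a single glBufferSubData (or glBufferData)
// call can upload everything at once before pass 2 issues the draw calls.
struct Object { float data[4]; bool visible; };

std::vector<float> collectFrameData(const std::vector<Object>& objects)
{
    std::vector<float> cpuStaging;
    for (const Object& obj : objects) {
        if (!obj.visible)
            continue;
        cpuStaging.insert(cpuStaging.end(), obj.data, obj.data + 4);
    }
    // One upload per frame, e.g.:
    //   glBufferSubData(GL_UNIFORM_BUFFER, 0,
    //                   cpuStaging.size() * sizeof(float), cpuStaging.data());
    return cpuStaging;
}
```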

D3D mapping with discard/no-overwrite is more robust in the face of multiple updates (being able to sustain high performance with 10s, 100s or even 1000s of updates to the same buffer per frame) and I believe the difference is mostly down to GL’s client/server architecture (which D3D just doesn’t have).

With persistent/coherent GL mapping you have more freedom to just draw everything as it passes, of course, provided you exercise appropriate care with correct usage.

The other option, provided you’re not purist about using core profile only functionality, is to source data from client-side arrays, which can often run faster than buffer object updates. The API doesn’t always allow it though.