glDrawArrays & memory access weirdness

I just noticed, to my great surprise, that the memory block passed via glVertexPointer & friends is still accessed AFTER glDrawArrays has returned. In some situations this leads to crashes (e.g. if the vertex buffer is deleted immediately after the glDrawArrays call) and graphics corruption (if the vertex buffer is changed immediately afterwards). This is with NV hardware and fairly recent drivers on WinXP, not some crappy half-baked mobile ES implementation (or Intel).

Doesn’t this blatantly violate the specification? Did the driver writers really assume that everyone uses glDrawArrays for static geometry only, and that the buffer won’t be deleted until GL is done rendering? I understand how this may be good for performance, but isn’t it really risky?
I’m really surprised legacy applications work at all. For example, take a function void DrawCube(float size) { /* you can imagine the code */ } that uses glDrawArrays with a temporary vertex buffer that may be deallocated as soon as the function returns. How is it possible that it doesn’t crash? Is the driver somehow able to detect that the app is about to release that block of memory and make a copy just in time? Or is it plain luck?

On a side note, calling glFinish after glDrawArrays appears to help, but not in combination with multitexturing (glClientActiveTexture etc.). Not sure what’s up with that either.

Is there any way to force proper driver behavior with a reliable workaround? I’d like to reuse a pre-allocated vertex buffer for dynamically changing geometry. I’ve thought of VBOs, but since the data changes completely between glDrawArrays calls, AFAIK my app would not benefit from buffering it on the GPU. Still, that is the only workaround I can think of besides immediate mode.

Are you talking about vertex arrays? It is a pretty ancient approach, and NV developers probably forgot to check whether the implementation is still correct in new drivers. :wink:

Well, according to various official documents you should be able to alter VAs immediately after glDrawArrays() or a similar function returns. Just make sure you are changing the values from the same thread; if you are not, the solution is obvious.

If the problem persists, send NV a sample project with a repro case, and in the meantime use a double-buffering technique (while drawing from one set of VAs, fill another with new data, and swap them every frame).
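A minimal sketch of that rotation, applied per draw call, in plain C so it builds without GL headers. VertexRing, ring_upload, RING_SLOTS and RING_MAX_FLOATS are hypothetical names, and the sizes are assumptions. The pointer returned by ring_upload is what would be passed to glVertexPointer before glDrawArrays. The ring only delays reuse of a buffer by a fixed number of uploads. It has no way of knowing whether the driver has actually finished reading that buffer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sizes: tune for your own batches. */
#define RING_SLOTS 2           /* 2 = double buffering per draw call */
#define RING_MAX_FLOATS 1024   /* capacity of one slot, in floats */

/* A ring of client-side vertex arrays. Each draw call gets the next slot, so
   the array handed to the previous glVertexPointer call is not overwritten
   until RING_SLOTS - 1 further uploads have happened. */
typedef struct {
    float slot[RING_SLOTS][RING_MAX_FLOATS];
    size_t next;                /* index of the slot the next upload will use */
} VertexRing;

/* Copies `count` floats into the next slot and returns a pointer to that slot,
   which the caller passes to glVertexPointer before calling glDrawArrays.
   Returns NULL, without advancing the ring, if the batch does not fit. */
const float *ring_upload(VertexRing *ring, const float *src, size_t count)
{
    if (count > RING_MAX_FLOATS)
        return NULL;
    float *dst = ring->slot[ring->next];
    memcpy(dst, src, count * sizeof *src);
    ring->next = (ring->next + 1) % RING_SLOTS;
    return dst;
}
```

With RING_SLOTS set to 2, this is per-draw double buffering. Raising the value trades memory for more slack. However, no fixed value is guaranteed to be enough if the driver really reads the arrays late.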

Yes, plain old vertex arrays.

I’d be surprised if this weren’t intentional. It would seem like a pretty obvious bug to miss in conformance tests. But it would certainly improve parallelism, if they took the gamble.

I considered double buffering, but I’m drawing small amounts of geometry with relatively heavy shader work, successively with different data, within the same frame. So I can’t exploit pipeline stalls such as SwapBuffers.
If one glDrawArrays call doesn’t finish before the next one, it may just as well fail to complete before any of the subsequent ones. Would I then need triple buffering? And what if three buffers aren’t enough? There is no guarantee if or when it will be done in time.

I don’t think it is intentional. VAs are rarely used these days, so improving their performance while violating their behavior seems like the wrong bet.

I didn’t mean that kind of double buffering, but switching your own VA buffers with each new draw-call.

I think double buffering will be enough, but you have to try it. If it doesn’t work, try using sync objects. Or, even better, also try to measure the time glDrawArrays() needs to complete; that would probably reveal what is going on under the hood. Also, check whether your code is correct: just for the purpose of testing, instead of updating the VA buffers, try deleting them and creating new ones before each call. The performance will be lower, but if this works it would indicate that the problem is in the update and not in VA locking.
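One way the sync-object suggestion might look, sketched under assumptions: a ring of slots where each slot remembers the fence inserted after the draw that read from it. Reusing a slot first waits on that fence, so the ring blocks instead of guessing how many buffers are enough. The fence operations are passed in as function pointers so the logic compiles and can be tested without a GL context. FenceOps, FencedRing, fenced_acquire, fenced_release and FENCED_SLOTS are all hypothetical names. In a real program, the pointers would be thin wrappers around glFenceSync, glClientWaitSync and glDeleteSync, which require GL 3.2 or ARB_sync. Whether a fence actually covers this particular driver's late reads of client memory is untested.

```c
#include <stddef.h>

#define FENCED_SLOTS 3  /* hypothetical ring size */

/* Fence operations, as thin wrappers a real program would write around:
   insert_fence: glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0)
   wait_fence:   loop on glClientWaitSync(fence, GL_SYNC_FLUSH_COMMANDS_BIT,
                 timeout_ns) while it returns GL_TIMEOUT_EXPIRED
   delete_fence: glDeleteSync(fence)
   GLsync is a pointer type, so void * can hold it here. */
typedef struct {
    void *(*insert_fence)(void *ctx);
    void  (*wait_fence)(void *ctx, void *fence);
    void  (*delete_fence)(void *ctx, void *fence);
    void *ctx;
} FenceOps;

typedef struct {
    void *fence[FENCED_SLOTS];  /* NULL = slot unused or already waited on */
    size_t next;
} FencedRing;

/* Returns the index of a slot that is safe to overwrite. If the slot still
   holds a fence from an earlier draw, wait for it and delete it first. */
size_t fenced_acquire(FencedRing *r, const FenceOps *ops)
{
    size_t i = r->next;
    if (r->fence[i] != NULL) {
        ops->wait_fence(ops->ctx, r->fence[i]);
        ops->delete_fence(ops->ctx, r->fence[i]);
        r->fence[i] = NULL;
    }
    r->next = (i + 1) % FENCED_SLOTS;
    return i;
}

/* Call right after the glDrawArrays that reads from slot i. */
void fenced_release(FencedRing *r, const FenceOps *ops, size_t i)
{
    r->fence[i] = ops->insert_fence(ops->ctx);
}
```

The intended per-draw sequence is: call fenced_acquire, fill that slot's client array, set glVertexPointer, call glDrawArrays, then call fenced_release.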

It would seem like a pretty obvious bug to miss in conformance tests.

What conformance tests?

I don’t think it is intentional. VAs are rarely used these days, so improving their performance while violating their behavior seems like the wrong bet.

Yes, you are probably right.

I didn’t mean that kind of double buffering, but switching your own VA buffers with each new draw-call.

I know what you meant, but I guess I wasn’t clear enough: after the first glDrawArrays call I added another one with a second VA buffer, to see whether that would force the previous call to complete (trigger a stall). It didn’t; the second call also returned immediately, so the second buffer was exposed to the same problem. So there’s no telling how many buffers I’d need before the first one becomes available again (the duration may vary non-deterministically with HW performance).

try to delete VA buffers and create new ones before each call

Yes, I tried that; it causes a crash in NVOGL (as expected). That is what made me realize it wasn’t a bug in my code. I should also note that occasionally everything worked without graphics corruption or crashes. I guess in that situation the first glDrawArrays call doesn’t overlap with the subsequent one, due to slightly varying hardware/OS/app performance.

To be entirely sure, I also verified the VA buffer content by drawing it a second time, side by side, in immediate mode. The buffer and its content are fine.

I’ve thought of VBOs, but since the data changes completely between glDrawArrays calls, AFAIK my app would not benefit from buffering it on the GPU. Still, that is the only workaround I can think of besides immediate mode.

What happens if you disable the arrays before changing your values? I know this is not very elegant and might be a bit slower, but it will certainly tell the driver that it should not use the arrays anymore…

Yes, I enabled/disabled the VAs before/after each glDrawArrays call, with no positive effect. It seems glDrawArrays pushes the VA address onto a command stack and returns before the command has been fully processed. Subsequent glVertexPointer, glDrawArrays and glEnableClientState/glDisableClientState(GL_VERTEX_ARRAY) calls don’t force a pipeline stall/flush as I had hoped. As I mentioned above, even glFinish doesn’t.
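The hypothesis above can be illustrated with a toy model in plain C. This is not driver code, and all names are hypothetical. toy_queue_draw plays the role of glDrawArrays: it records the command and returns immediately. The lazy variant keeps only the client address, which matches the behavior observed here. The copying variant snapshots the data at draw time, which is the behavior the documentation is said to promise. Overwriting the client array before the "GPU" reaches the command changes what the lazy variant fetches but not what the copying one fetches. Freeing the array instead would turn the lazy read into a use-after-free, which matches the crashes.

```c
#include <stddef.h>
#include <string.h>

#define TOY_MAX_FLOATS 16

/* One recorded draw command in the toy model. */
typedef struct {
    const float *pointer;          /* lazy model: only the address is kept */
    float copy[TOY_MAX_FLOATS];    /* copying model: data as of draw time */
    size_t count;
} ToyDraw;

/* Stand-in for glDrawArrays: record the command and return at once.
   Returns 0 on success, -1 if the batch is too large for the toy. */
int toy_queue_draw(ToyDraw *cmd, const float *client, size_t count)
{
    if (count > TOY_MAX_FLOATS)
        return -1;
    cmd->pointer = client;
    memcpy(cmd->copy, client, count * sizeof *client);
    cmd->count = count;
    return 0;
}

/* Executed later by the "GPU": the lazy model reads client memory now,
   not at draw time. Returns the sum of the fetched values. */
float toy_fetch_lazy(const ToyDraw *cmd)
{
    float sum = 0.0f;
    for (size_t i = 0; i < cmd->count; i++)
        sum += cmd->pointer[i];
    return sum;
}

/* Executed later by the "GPU": the copying model reads its snapshot. */
float toy_fetch_copied(const ToyDraw *cmd)
{
    float sum = 0.0f;
    for (size_t i = 0; i < cmd->count; i++)
        sum += cmd->copy[i];
    return sum;
}
```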

Sorry, I haven’t understood this from your previous posts.

There is still the old EXT_compiled_vertex_array extension, but it might be slow for your use, and also might not work :slight_smile: But it should ensure your data has the right values when it is locked.
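For reference, here is a sketch of the call order the extension expects:

1. Set the array pointers.
2. Lock a range with glLockArraysEXT(first, count).
3. Draw from it as many times as needed.
4. Call glUnlockArraysEXT before modifying the data.

Locking lets the driver cache the locked range, so the application is expected to leave it unchanged until it unlocks. To keep the example self-contained, the three GL calls go through a table of thin wrapper functions. CvaOps and draw_locked are hypothetical names. On Win32, the real extension entry points are fetched with wglGetProcAddress and use the APIENTRY calling convention, so small wrappers like these would be needed there anyway.

```c
#include <stddef.h>

/* Hypothetical thin wrappers around glLockArraysEXT(GLint first, GLsizei count),
   glDrawArrays(GLenum mode, GLint first, GLsizei count) and glUnlockArraysEXT(void). */
typedef struct {
    void (*lock_arrays)(int first, int count);
    void (*draw_arrays)(unsigned mode, int first, int count);
    void (*unlock_arrays)(void);
} CvaOps;

/* Draws `passes` times from the currently set arrays while they are locked.
   The array data must not change between lock and unlock; after this
   function returns, the caller may refill the arrays for the next batch. */
void draw_locked(const CvaOps *gl, unsigned mode, int count, int passes)
{
    gl->lock_arrays(0, count);
    for (int p = 0; p < passes; p++)
        gl->draw_arrays(mode, 0, count);
    gl->unlock_arrays();
}
```

Drawing more than once per lock, for example one pass per texture unit, is where the extension was meant to pay off, because the locked vertices can be reused across the passes.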

And as you said:

I’m drawing small amounts of geometry
you can try using a VBO and updating it with glBufferSubData. This might be better. You could also try a separate VBO for each of your objects, or maybe just two VBOs, one for even frames and one for odd frames. I’m pretty sure this will be faster than immediate mode…
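A sketch of the even/odd idea. It assumes each of the two VBOs was created once with glGenBuffers and sized with glBufferData(GL_ARRAY_BUFFER, max_bytes, NULL, GL_STREAM_DRAW). Each frame uploads into the buffer matching its parity with glBufferSubData, then draws with a buffer offset of 0 instead of a client pointer. glBufferSubData only affects commands issued after it, so a draw already issued from the same buffer still sees the old contents. The driver may have to wait or copy to guarantee that, and alternating two buffers is meant to reduce that cost. The GL calls go through hypothetical thin wrappers so the sketch builds without a context. VboOps, EvenOddVbos, upload_frame and SKETCH_GL_ARRAY_BUFFER are illustrative names.

```c
#include <stddef.h>

#define SKETCH_GL_ARRAY_BUFFER 0x8892u  /* numeric value of GL_ARRAY_BUFFER */

/* Hypothetical thin wrappers around glBindBuffer and glBufferSubData, passed
   in so the sketch compiles without GL headers or a context. */
typedef struct {
    void (*bind_buffer)(unsigned target, unsigned buffer);
    void (*buffer_sub_data)(unsigned target, ptrdiff_t offset,
                            ptrdiff_t size, const void *data);
} VboOps;

/* Two buffer names from glGenBuffers, each already sized with
   glBufferData(GL_ARRAY_BUFFER, max_bytes, NULL, GL_STREAM_DRAW). */
typedef struct {
    unsigned vbo[2];
} EvenOddVbos;

/* Uploads this frame's vertices into the VBO for the frame's parity and leaves
   it bound; returns the buffer name used. With the buffer bound, the caller
   then sets glVertexPointer(3, GL_FLOAT, 0, (const void *)0) and draws.
   float_count must not exceed the size the buffer was allocated with. */
unsigned upload_frame(const VboOps *gl, const EvenOddVbos *v,
                      unsigned long frame, const float *verts,
                      size_t float_count)
{
    unsigned name = v->vbo[frame & 1u];
    gl->bind_buffer(SKETCH_GL_ARRAY_BUFFER, name);
    gl->buffer_sub_data(SKETCH_GL_ARRAY_BUFFER, 0,
                        (ptrdiff_t)(float_count * sizeof *verts), verts);
    return name;
}
```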

I used vertex arrays for code simplicity; performance isn’t that critical. I’m sticking with immediate mode for now, but if it becomes a bottleneck later on I will go for VBOs as you describe.