Thread: glClientWaitSync always times out, glMapBufferRange always stalls

  1. #1

    glClientWaitSync always times out, glMapBufferRange always stalls

    Extra explanations

    I don't understand what is going wrong with my attempts to download frames asynchronously from an FBO. I'm trying to get a high frame rate in fullscreen rendering with OpenGL.

    To be precise, this is OpenGL ES 3.0, and I call the GL functions through QOpenGLExtraFunctions (Qt framework), but I don't think that matters here.

    I have a background OpenGL rendering thread which, for now, draws nothing; it just reads frames from the FBO without pausing.

    My screen resolution is 1920x1080 pixels, so the FBO has the same size.

    I realised that glReadPixels is too slow to transfer frames this large over PCI from the NVIDIA video card to RAM: I get about 55 FPS, but I want 60 FPS.
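
    For scale, the readback budget can be worked out from the numbers above. This is only a rough sketch: the helper names `frameBytes` and `bytesPerSecond` are made up for illustration, and 4 bytes per pixel assumes the GL_BGRA / GL_UNSIGNED_BYTE format used in the code later in this post.

```cpp
#include <cstdint>

// Hypothetical helper: size in bytes of one tightly packed frame.
constexpr std::uint64_t frameBytes(std::uint64_t width, std::uint64_t height,
                                   std::uint64_t bytesPerPixel) {
    return width * height * bytesPerPixel;
}

// Hypothetical helper: bytes that have to be read back every second.
constexpr std::uint64_t bytesPerSecond(std::uint64_t width, std::uint64_t height,
                                       std::uint64_t bytesPerPixel, std::uint64_t fps) {
    return frameBytes(width, height, bytesPerPixel) * fps;
}

// 1920 x 1080 x 4 bytes = 8,294,400 bytes per frame.
static_assert(frameBytes(1920, 1080, 4) == 8294400, "one 1080p BGRA frame");
// At 60 FPS that is 497,664,000 bytes per second.
static_assert(bytesPerSecond(1920, 1080, 4, 60) == 497664000, "60 FPS readback");
```

    That is roughly 7.9 MiB per frame, and at 60 FPS each frame has a budget of about 16.7 ms.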

    Then I learned about PBOs and had an idea: copy frames from the FBO into PBOs and then, after calling glMapBuffer, transfer them asynchronously to my storage in RAM. That is, create a buffer, bind it to GL_PIXEL_PACK_BUFFER and call glReadPixels; because a buffer is bound to GL_PIXEL_PACK_BUFFER, glReadPixels copies the pixels into the PBO in video memory rather than into storage in RAM, and returns immediately. While the latest frame is still being written to RAM, I map and draw the second-to-last frame, which is (hopefully) already completely transferred.

    I also read about shared contexts for multithreading, but as I understand it, the best solution for performance is one thread per context with asynchronous data download/upload, so I decided to forget about shared contexts.
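
    The rotation described above can be sketched as plain index arithmetic, with no GL calls. This is only an illustration: `FramePlan` and `planFrame` are hypothetical names, and the sketch assumes one glReadPixels per frame and at least two PBOs.

```cpp
#include <cassert>

// Hypothetical plan for one frame: which PBO glReadPixels writes into,
// and which PBO (if any) is old enough to be worth mapping.
struct FramePlan {
    int readInto;  // index of the PBO to bind before glReadPixels
    int mapFrom;   // index of the PBO to map, or -1 if none is filled yet
};

// Round-robin over pboCount buffers. The buffer chosen for mapping was
// filled pboCount - 1 frames earlier, so it is the oldest pending transfer.
FramePlan planFrame(int frame, int pboCount) {
    assert(frame >= 0 && pboCount >= 2);
    FramePlan plan;
    plan.readInto = frame % pboCount;
    plan.mapFrom = (frame >= pboCount - 1) ? (frame + 1) % pboCount : -1;
    return plan;
}
```

    With two PBOs, frame 0 only reads into buffer 0; from frame 1 on, each frame reads into one buffer and maps the other. Using more buffers gives each transfer more frames to complete, at the cost of extra latency.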

    glMapBufferRange stalling issue

    So, as a simple example, I have two buffers. Here is what I implemented next:

    Code :
    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[0]);
    glReadPixels(0, 0, width, height, GL_BGRA, GL_UNSIGNED_BYTE, 0); // ~50 microsecs
     
    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[1]);
    glReadPixels(0, 0, width, height, GL_BGRA, GL_UNSIGNED_BYTE, 0); // ~50 microsecs
     
    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[0]);
    glMapBufferRange(GL_PIXEL_PACK_BUFFER, 0, width * height * 4, GL_MAP_READ_BIT); // ~20000 microsecs
    glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
    glReadPixels(0, 0, width, height, GL_BGRA, GL_UNSIGNED_BYTE, 0); // ~50 microsecs
     
    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[1]);
    glMapBufferRange(GL_PIXEL_PACK_BUFFER, 0, width * height * 4, GL_MAP_READ_BIT); // ~15 microsecs
    glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
    glReadPixels(0, 0, width, height, GL_BGRA, GL_UNSIGNED_BYTE, 0); // ~50 microsecs
     
    // keep mapping and glReadPixel'ing pbo[0] and pbo[1] with same call durations

    As you can see, mapping the first PBO stalls the CPU for 20 ms, but mapping the second PBO is practically a no-op.

    But I need the time between PBO mappings to be roughly constant.

    As I understand it, mapping the first buffer forces a synchronization in which OpenGL finishes the glReadPixels calls for both the 1st and the 2nd PBO before returning. I am mapping a PBO that is still in use by another GL command (glReadPixels), but instead of waiting only for the 1st glReadPixels to finish, GL flushes all the commands queued so far, including the 2nd glReadPixels.

    But! When I put std::this_thread::sleep_for(10ms) before every glMapBufferRange, I get the same durations. So even when my CPU thread has waited long enough before calling glMapBufferRange, the glMapBufferRange call for the 1st PBO still takes 20 ms! That's why I have "glMapBufferRange always stalls" in the title.

    Beyond that, I have no idea what's happening. Did I understand this right?

    glClientWaitSync timeout issues

    Then I learned about OpenGL sync objects, which are inserted into the GL command queue. When such an object has been processed by GL and is signalled, all the commands queued before it have been processed.

    So I wanted to insert glFenceSync just after glMapBufferRange and glClientWaitSync just before glMapBufferRange, or after/before glReadPixels, so that my frames are updated evenly. But I haven't tried that yet, because my sync objects just don't work properly.

    For now I am trying to execute just this:

    Code :
    GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
    while (true)
    {
    	GLenum syncRes = glClientWaitSync(fence, 0, 1000);
    	switch (syncRes)
    	{
    		case GL_ALREADY_SIGNALED: qDebug() << "ALREADY"; break;
    		case GL_CONDITION_SATISFIED: qDebug() << "EXECUTED"; break;
    		case GL_TIMEOUT_EXPIRED: qDebug() << "TIMEOUT"; break;
    		case GL_WAIT_FAILED: qDebug() << "FAIL"; break;
    	}
    	if (syncRes == GL_CONDITION_SATISFIED || syncRes == GL_ALREADY_SIGNALED) break;
    }
    glDeleteSync(fence);

    This loop never terminates and always prints "TIMEOUT", so as I understand it, GL just can't process this fence, even though I've inserted it into the command queue.

    So what's wrong with the way I use sync fences?
    Last edited by mr.indieperson; 01-19-2018 at 09:48 AM.

  2. #2
    Quote: As you can see, mapping the first PBO stalls the CPU for 20 ms, but mapping the second PBO is practically a no-op.
    What exactly do you expect to happen here? You told OpenGL that you wanted to do an async transfer into a buffer. Then you told OpenGL that you're going to read from that buffer. Which means you have to be able to see all of the data in that buffer, which includes the results of the transfer. Therefore, OpenGL must synchronize with the async process you just started.

    You may as well have just used `glReadPixels` into client memory directly.

    Remember: OpenGL is a synchronous API. It allows things to behave asynchronously, but only so far as everything still works "as if" it were synchronous. Which means that, so long as you don't look at the result of a process, it can be executed asynchronously. If you actually look, the implementation must synchronize.

    So if you want to make an async transfer actually improve performance, you have to wait before you access the buffer. Ideally at least one frame long. And if you're going to busy-wait on a fence issued after the transfer, there's really no point in having the fence; just map the buffer.

    Quote: So I wanted to insert glFenceSync just after glMapBufferRange and glClientWaitSync just before glMapBufferRange, or after/before glReadPixels, so that my frames are updated evenly. But I haven't tried that yet, because my sync objects just don't work properly.
    Sync objects have to be properly flushed; if you don't, they may never become signaled. This is why `glClientWaitSync` can take the `GL_SYNC_FLUSH_COMMANDS_BIT` flag.
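
    A toy model may make the flushing point concrete. The sketch below is not OpenGL: `ToyQueue` and `pollsUntilSignalled` are hypothetical names, and the model assumes that recorded commands stay on the client side until something flushes them. Real drivers may flush on their own schedule.

```cpp
#include <cassert>
#include <deque>

// Toy model, NOT real OpenGL: commands are recorded on the client side
// and only reach the "GPU" when the queue is flushed.
struct ToyQueue {
    std::deque<int> pending;  // recorded but not yet submitted
    int lastExecuted = 0;     // id of the newest command the "GPU" has finished
    int nextId = 1;

    // Stand-in for any queued command, e.g. glReadPixels or glFenceSync.
    int record() {
        pending.push_back(nextId);
        return nextId++;
    }

    // Stand-in for glFlush; this toy GPU finishes work the moment it is submitted.
    void flush() {
        while (!pending.empty()) {
            lastExecuted = pending.front();
            pending.pop_front();
        }
    }

    // Stand-in for glClientWaitSync with a short timeout. flushFirst plays
    // the role of GL_SYNC_FLUSH_COMMANDS_BIT. Returns true once signalled.
    bool clientWait(int fenceId, bool flushFirst) {
        if (flushFirst) flush();
        return lastExecuted >= fenceId;
    }
};

// Polls like the loop in the first post, but gives up after maxPolls tries.
// Returns the number of polls needed, or -1 if the fence never signalled.
int pollsUntilSignalled(bool flushFirst, int maxPolls) {
    assert(maxPolls > 0);
    ToyQueue queue;
    queue.record();                    // the transfer
    const int fence = queue.record();  // the fence queued behind it
    for (int poll = 1; poll <= maxPolls; ++poll) {
        if (queue.clientWait(fence, flushFirst)) return poll;
    }
    return -1;
}
```

    In this model, polling without a flush never succeeds, which matches the endless "TIMEOUT" output, while polling with the flush succeeds on the first try. In real code the corresponding change is to pass GL_SYNC_FLUSH_COMMANDS_BIT instead of 0 as the flags argument of the first glClientWaitSync call on the fence.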

  3. #3
    Quote Originally Posted by Alfonse Reinheart
    What exactly do you expect to happen here? You told OpenGL that you wanted to do an async transfer into a buffer. Then you told OpenGL that you're going to read from that buffer. Which means you have to be able to see all of the data in that buffer, which includes the results of the transfer. Therefore, OpenGL must synchronize with the async process you just started.
    Thank you for the reply!

    But I want to repeat myself:

    When I put std::this_thread::sleep_for(10ms) before every glMapBufferRange, I get the same durations. So even when my CPU thread has waited long enough before calling glMapBufferRange, the glMapBufferRange call for the 1st PBO still takes 20 ms! That's why I have "glMapBufferRange always stalls" in the title.

    So I give OpenGL enough time to finish all the commands before I map my first PBO: 10 ms, or 30 ms, or 100 ms, whatever. But it still takes 20 ms! Why? I really have no idea. I'm sorry if I am missing something obvious.

  4. #4
    Well, someone else told me that I should call glFlush() before std::this_thread::sleep_for() to force the queued commands to be submitted to GL; I hadn't thought of that at all. Now, by the time the thread wakes up, the buffers are already filled.

    And yes, I completely missed GL_SYNC_FLUSH_COMMANDS_BIT in glClientWaitSync().
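
    The effect of the glFlush() call on the timings can be sketched with a toy model. This is a simplification, not a measurement: `mapStallMs` is a hypothetical helper, and it assumes that the queued transfers do not start until they are flushed or until the map call forces them.

```cpp
#include <algorithm>
#include <cassert>

// Toy timing model, NOT a measurement: how long the map call blocks.
// transferMs: time the queued glReadPixels transfers need once submitted.
// sleepMs: how long the CPU thread sleeps before mapping.
// flushedBeforeSleep: whether glFlush() was called before sleeping.
int mapStallMs(int transferMs, int sleepMs, bool flushedBeforeSleep) {
    assert(transferMs >= 0 && sleepMs >= 0);
    if (!flushedBeforeSleep) {
        // Nothing was submitted, so the work only starts when the map forces it.
        return transferMs;
    }
    // The transfer ran in parallel with the sleep; only the remainder blocks.
    return std::max(0, transferMs - sleepMs);
}
```

    With the 20 ms from the first post, the model gives a 20 ms stall for any sleep length when nothing was flushed, and no stall at all after a flush followed by a sleep of 20 ms or more.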

    Thank you!
