glReadPixels too slow

I know this question has cropped up before but maybe something is now possible that wasn’t previously…

Is there a faster alternative to glReadPixels? It is currently taking 75% of my processing time.

I am calculating the average amount of colour in an image (for my radiosity engine). glReadPixels seems faster than using glGetTexImage.

Is there some way of doing it on the GPU, since it seems that transferring the image back to system RAM is what is causing the slowness?

I am not familiar with pixel shaders or Cg, and I was wondering whether they would allow me to read the frame buffer (or a texture) and perform the necessary calculation on the GPU.

Thanks for any help.

You can effectively down-sample a 4x4 area to a 1x1 output pixel using a fragment program, pixel shader, or register combiner quite easily. On more advanced hardware, you can probably go from 16x16 to 1x1. You can use this a few times to drop the size of your frame buffer (using render-to-pbuffer and pbuffer-as-texture, say) until it’s small enough that reading it is no longer a speed problem.
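To make the reduction idea concrete, here is a CPU-side sketch of what those passes compute (illustrative only; function names are my own, and on the card you would do each pass in a fragment program rendering into a pbuffer). Each pass averages 4x4 blocks, and repeating it until the image is 1x1 leaves exactly the overall average:

```c
#include <stdlib.h>

/* Average every 4x4 block of an n*n single-channel image into dst,
 * which must hold (n/4)*(n/4) floats.  n is assumed to be a power of 4,
 * mirroring the 4x4-to-1x1 pass described above. */
void reduce4x4(const float *src, float *dst, int n)
{
    int m = n / 4;
    for (int y = 0; y < m; ++y) {
        for (int x = 0; x < m; ++x) {
            float sum = 0.0f;
            for (int dy = 0; dy < 4; ++dy)
                for (int dx = 0; dx < 4; ++dx)
                    sum += src[(4 * y + dy) * n + (4 * x + dx)];
            dst[y * m + x] = sum / 16.0f;
        }
    }
}

/* Repeat the reduction pass until only one value is left: the image average.
 * img is overwritten as scratch space. */
float average_by_reduction(float *img, int n)
{
    float *tmp = malloc(sizeof(float) * (n / 4) * (n / 4));
    while (n > 1) {
        reduce4x4(img, tmp, n);
        n /= 4;
        for (int i = 0; i < n * n; ++i)
            img[i] = tmp[i];
    }
    free(tmp);
    return img[0];
}
```

The point of doing this on the GPU is that only the final tiny buffer ever crosses the bus, instead of the full frame.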

Also, when reading pixels, the driver may convert the data using the CPU, if you don’t read it in exactly the same format as the card uses. Thus, if your frame buffer is 24/32 bits, you should probably read as GL_BGRA; if it’s 16 bits, you should probably read it as GL_BGR5_A1.
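To illustrate why the matching format matters (a hypothetical sketch, not actual driver code): if the framebuffer stores BGRA but you request GL_RGBA, the driver must swizzle every pixel on the CPU, roughly like this:

```c
#include <stdint.h>

/* Hypothetical per-pixel conversion a driver performs when the requested
 * format doesn't match the framebuffer: turning a packed BGRA word
 * (0xAARRGGBB on a little-endian machine) into an RGBA word (0xAABBGGRR).
 * Reading GL_BGRA directly skips this work for every pixel. */
uint32_t bgra_word_to_rgba_word(uint32_t p)
{
    uint32_t b = p & 0xffu;
    uint32_t g = (p >> 8) & 0xffu;
    uint32_t r = (p >> 16) & 0xffu;
    uint32_t a = (p >> 24) & 0xffu;
    return (a << 24) | (b << 16) | (g << 8) | r;
}
```

Multiply that by every pixel of every readback and the conversion cost adds up quickly.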

Out of interest, can you benchmark at 16-bit colour and 32 (24) bit colour and post the results here?

Thanks.

Thanks jwatte,

Here are some benchmarks, with most of the code apart from the glReadPixels call removed.

16bit - 1.41
32bit(RGB) - 2.18
32bit(BGRA) - 2.12

The figures show the number of seconds to render one frame. It is an unoptimised radiosity engine.

I couldn’t find information as to how to use BGR5_A1.

EXT_packed_pixels

It actually got folded into the OpenGL spec a while back (1.2 perhaps?) so the glspec14.pdf file that’s available on this web site should give you all you need (except for the enumerant values).
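For reference, the packed 16-bit path usually means passing a type such as GL_UNSIGNED_SHORT_5_5_5_1 to glReadPixels. A sketch of unpacking one such pixel on the CPU afterwards (the bit layout follows the packed-pixels tables in the spec; expanding 5 bits to 8 by bit replication is my own choice here):

```c
#include <stdint.h>

/* Unpack one GL_UNSIGNED_SHORT_5_5_5_1 pixel: bits 15..11 hold the first
 * component, 10..6 the second, 5..1 the third, and bit 0 the alpha.
 * Each 5-bit value is expanded to 8 bits by replicating its top bits,
 * so 31 maps to 255 and 0 maps to 0. */
void unpack_5551(uint16_t p, uint8_t *c0, uint8_t *c1, uint8_t *c2, uint8_t *a)
{
    uint8_t v0 = (p >> 11) & 0x1f;
    uint8_t v1 = (p >> 6) & 0x1f;
    uint8_t v2 = (p >> 1) & 0x1f;
    *c0 = (uint8_t)((v0 << 3) | (v0 >> 2));
    *c1 = (uint8_t)((v1 << 3) | (v1 >> 2));
    *c2 = (uint8_t)((v2 << 3) | (v2 >> 2));
    *a = (uint8_t)((p & 1u) ? 255 : 0);
}
```

Whether the components come out as RGB or BGR depends on the format you pass alongside the packed type.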

>>16bit - 1.41
32bit(RGB) - 2.18
32bit(BGRA) - 2.12<<

One frame! Unless the window is 4,000 x 4,000, it sounds as if you're doing something wrong.
My old TNT2 (and my GF2 MX) could do at least 20 million RGB pixels a second with glReadPixels.

There are about 30 different methods of reading pixels (including packed pixels, which often gives the best results, especially in 16-bit colour). I suggest you benchmark them all and then choose.

Hehe, it’s a radiosity engine, so the lighting is calculated by rendering the scene thousands of times (for each frame) and reading back the light values. I’m actually getting about 170 million pixels per second. (GF4600)

Edit: Correction 60 million pixels per second.

[This message has been edited by Adrian (edited 11-28-2002).]


I have an idea similar to what jwatte suggested, but I don’t know if it’s good enough for your purposes:

What if you turned on SGIS_GENERATE_MIPMAPS, uploaded your scene to a texture, and then read back the smallest mipmap (a single pixel)? That way there’s only one copyTexSubImage involved and you’re reading back only 1 pixel, plus you get hardware accelerated reduced-size filtering.

– Zeno

Thanks Zeno, I’ll try that out.

I’ve just found out that you can call glReadPixels asynchronously. It looks like this only recently became possible. http://www.nvidia.com/dev_content/nvopenglspecs/GL_NV_pixel_data_range.txt

Very nice to see. This is big, more important (IMHO) than render_to_texture etc.

Zed, what happened to your realtime radiosity engine? Do you have a demo I could have a look at? Did you use hemicubes or some other technique? Was it one pass or more?

Originally posted by Adrian:
Hehe, it’s a radiosity engine so the lighting is calculated by rendering the scene thousands of times (for each frame) and reading back the light values.

Just out of curiosity, why are you reading back pixels from the colour buffer?
It seems from your description that you are using progressive refinement or another iterative approach to solve the radiosity matrix. Storing the delta (unshot) radiances in system RAM would probably be faster, putting only the final radiance values (RGB radiance) in the framebuffer.

I’m rendering a hemicube from each patch using the graphics hardware. I know there are issues with lack of intensity range when using a 32-bit colour buffer, but it seems OK so far. I only do one or two passes; anything else is too slow for realtime.

I basically follow this approach but do the hemicube part in hardware not software. http://freespace.virgin.net/hugo.elias/radiosity/radiosity.htm

If I wanted to store in system ram I would have to use software rendering wouldn’t I? This would be too slow for me as my goal is to get it running in realtime.

[This message has been edited by Adrian (edited 11-29-2002).]


Ahh, I see. I had only glanced at the page and misunderstood you. The web page uses an approach nicknamed “gathering”: you use glReadPixels for form-factor determination.