VBO Test - glBufferData vs glBufferSubData vs glMapBufferOES

Hi,

I'm John, and I'm new to this forum. I want to start by saying that I love OpenGL ES. There are few things I fall more in love with the more I use them, and C and OpenGL ES are among them.

Here is my question:
I have been doing a lot of tests to decide how I should set up the base rendering engine. Right now I am not using buffers, but I have been considering them for quite a while. I have found that glMapBufferOES is slower than glBufferSubData, which is slower than glBufferData, and I am wondering why that is.

My test environment:
OpenGL ES 1.1
iPad (first generation) - PowerVR SGX
iPhone 3G - PowerVR MBX

To my understanding, glBufferSubData should always be faster than glBufferData, because glBufferData reallocates the buffer's storage every time it is called. So if the size doesn't change, use glBufferSubData; otherwise, use glBufferData. What I have found, though, is that glBufferSubData runs at only about 68% of the speed of using glBufferData alone.

I also understand that glMapBufferOES is an extension, but I have found that it runs even slower than glBufferSubData or glBufferData. In fact, it is the slowest way of updating vertex data.

Overall:
iPad speed: glMapBufferOES < glBufferSubData < glBufferData <= No buffers
iPhone speed: glMapBufferOES ≈ glBufferSubData ≈ glBufferData ≈ No buffers

Is this normal?

Thanks for your help!

There might be inefficiencies in the implementation of glMapBufferOES and glBufferSubData on the iPad. If things were optimal, I would expect (in terms of speed):

glMapBufferOES >= glBufferSubData >= glBufferData

Note that some drivers optimize uploads done with glBufferData when the size is the same as in the previous call, which lets them skip the reallocation. You're likely hitting that optimized case; otherwise, glBufferData would be much slower than glBufferSubData and glMapBufferOES. Try adding or removing one vertex each frame, and I bet you'd see a difference.

Thanks for the reply!

I took your suggestion and randomized the quantity of data I sent to OpenGL (seeding the random number generator before each test, of course). Unfortunately, I got the same results.

Note: The values below are Frames Per Second that were averaged over 300 tests.

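Key (each column code is the update method, then the primitive, then texturing; e.g. SQY = glBufferSubData, quads, textured):

I = Not buffered         P = Points     N = Not textured
M = glMapBufferOES       Q = Quads      Y = Textured
S = glBufferSubData
V = glBufferData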

iPad

Count    IPN    IPY    IQN    IQY    MPN    MPY    MQN    MQY    SPN    SPY    SQN    SQY    VPN    VPY    VQN    VQY
256    60.00  60.00  60.00  60.00  60.00  60.00  60.00  60.00  60.00  60.00  60.00  60.00  60.00  60.00  60.00  60.00
384    60.00  60.00  60.00  60.00  60.00  60.00  60.00  60.00  60.00  60.00  60.00  60.00  60.00  60.00  60.00  60.00
512    60.00  60.00  60.00  60.00  60.00  51.51  60.00  60.00  60.00  51.79  60.00  60.00  60.00  60.00  60.00  60.00
768    60.00  53.95  60.00  60.00  60.00  40.53  60.00  56.76  60.00  40.54  60.00  57.18  60.00  54.07  60.00  60.00
1024   60.00  42.78  60.00  60.00  60.00  33.26  60.00  48.07  60.00  33.41  60.00  48.31  60.00  42.74  60.00  60.00
1532   60.00  29.96  60.00  48.71  60.00  24.60  60.00  36.71  60.00  24.69  59.87  36.72  60.00  30.09  60.00  48.23
2048   60.00  23.13  60.00  37.88  54.96  19.55  51.04  29.70  55.59  19.65  51.45  29.77  60.00  23.00  60.00  37.86
3072   60.00  15.85  60.00  26.11  43.30  13.82  39.87  21.40  43.71  13.87  39.97  21.35  60.00  15.81  60.00  26.18
4096   54.32  12.09  49.16  18.79  35.81  10.74  32.81  16.83  36.28  10.78  33.06  16.82  54.30  12.08  49.17  19.98
6144   40.51   8.19  33.97  12.68  26.44   7.40  23.86  11.71  26.76   7.43  24.09  11.74  40.36   8.18  36.56  13.60
8192   31.51   6.19  26.08   9.58  21.02   5.65  19.00   9.06  21.27   5.67  19.21   9.08  31.47   6.19  28.72  10.27
10500  25.23   4.85  20.42   7.53  17.02   4.46  15.12   7.14  17.28   4.47  15.31   7.18  25.14   4.86  22.84   8.05

iPad (Count - rand() % 10)

Count    IPN    IPY    IQN    IQY    MPN    MPY    MQN    MQY    SPN    SPY    SQN    SQY    VPN    VPY    VQN    VQY
256    60.00  60.00  60.00  60.00  60.00  60.00  60.00  60.00  60.00  60.00  60.00  60.00  60.00  60.00  60.00  60.00
384    60.00  60.00  60.00  60.00  60.00  60.00  60.00  60.00  60.00  60.00  60.00  60.00  60.00  60.00  60.00  60.00
512    60.00  60.00  60.00  60.00  60.00  51.56  60.00  60.00  60.00  51.89  60.00  60.00  60.00  60.00  60.00  60.00
768    60.00  54.64  60.00  60.00  60.00  40.56  60.00  56.29  60.00  40.53  60.00  56.74  60.00  53.90  60.00  60.00
1024   60.00  42.79  60.00  60.00  60.00  33.30  60.00  47.73  60.00  33.47  60.00  47.97  60.00  42.92  60.00  60.00
1532   60.00  30.14  60.00  48.39  60.00  24.61  59.68  36.39  60.00  24.70  59.83  36.41  60.00  30.20  60.00  48.81
2048   60.00  22.96  60.00  37.58  54.67  19.54  50.84  29.34  55.50  19.64  51.20  29.48  60.00  23.01  60.00  37.14
3072   60.00  15.88  60.00  25.78  43.05  13.81  39.71  21.14  43.64  13.87  39.71  21.12  60.00  15.87  60.00  25.69
4096   53.66  12.08  48.94  18.50  35.64  10.73  32.70  16.62  36.15  10.77  32.96  16.70  54.47  12.09  48.96  19.65
6144   40.48   8.17  33.59  12.55  26.38   7.40  23.80  11.57  26.68   7.43  23.87  11.61  40.33   8.20  36.45  13.37
8192   31.54   6.20  25.88   9.46  20.93   5.65  18.92   8.93  21.19   5.67  19.09   8.97  31.30   6.18  28.50  10.14
10500  24.97   4.85  20.38   7.43  16.98   4.45  15.12   7.04  17.20   4.46  15.12   7.07  25.24   4.85  22.82   7.94

iPhone

Count    IPN    IPY    IQN    IQY    MPN    MPY    MQN    MQY    SPN    SPY    SQN    SQY    VPN    VPY    VQN    VQY
256    30.00  29.23  20.77  20.12  30.00  29.38  20.22  19.18  30.00  29.24  20.22  19.29  30.00  29.33  20.11  20.18
384    30.00  27.21  14.82  14.47  30.00  27.29  15.60  14.46  30.00  27.36  15.41  14.36  30.00  27.12  14.84  14.38
512    30.00  24.60  12.11  11.32  30.00  24.77  12.09  11.29  30.00  24.70  12.10  11.28  30.00  24.63  12.08  11.26
768    28.61  18.41   8.21   7.74  28.58  18.63   8.22   7.74  28.64  18.30   8.21   7.73  28.66  18.30   8.20   7.72
1024   27.97  17.07   6.17   5.85  27.67  17.06   6.17   5.84  27.36  17.08   6.15   5.81  27.29  16.99   6.15   5.81
1532   26.06  12.54   4.06   3.86  25.70  12.53   4.06   3.86  26.19  12.55   4.06   3.85  25.69  12.47   4.05   3.84
2048   21.28   9.79   2.94   2.80  21.40   9.80   2.94   2.79  21.40   9.77   2.93   2.79  21.26   9.76   2.91   2.79

Whoops, hit submit instead of preview and it won’t let me edit :-(. I was trying to align the tables properly.

I should also note that ‘quads’ refers to GL_TRIANGLE_STRIP with 6 * count - 2 vertices.

I am just really confused about why glBufferSubData could ever be slower than glBufferData.

Are you replacing the entire buffer contents or just a subset?

Whoops, sorry, I forgot to explain.

The count in my tests is the number of items I actually update and draw. The buffer in memory is always allocated at the "next power of two".

For example (count updated and sent -> count allocated in memory):
256 -> 256
384 -> 512
512 -> 512
768 -> 1024
1024 -> 1024

etc.

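```c
info->_sendCount = count;                      // items updated and drawn this frame
info->_maxCount  = nextPowerOfTwo(count);      // allocated capacity

void render()
{
    ...
    if (info->_lastMaxCount != info->_maxCount)
    {
        // Capacity changed: reallocate the VBO at the power-of-two size.
        info->_lastMaxCount = info->_maxCount;
        glBufferData(GL_ARRAY_BUFFER, dataSize * info->_maxCount, info->_vertices, GL_DYNAMIC_DRAW);
    }
    else
    {
        // Same capacity: overwrite only the vertices actually drawn.
        glBufferSubData(GL_ARRAY_BUFFER, 0, dataSize * info->_sendCount, info->_vertices);
    }
    ...
}
```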

This should give glBufferSubData an advantage in half of the tests, namely the counts that are not powers of two, unless glBufferData is optimized as jpilon mentioned. And even when the full buffer is replaced, shouldn't glBufferSubData be at least as fast?

If you want, I can post the code, I don’t mind sharing :-).
