"Background" rendering on HT machines ??

fetz · December 1, 2003, 12:31am

Hi there,

I found a new bug in my software that occurs only with Hyper-Threading (Intel Pentium 4 CPU), and I don’t understand it:

I have a thread which does the rendering of my data, stored in a vertex array. Another thread is filling the array with the data that is to be rendered. The synchronization is safe (at least, that’s what I’m hoping). The data that I want to render are (x,y) pairs of a curve (for example a sine), tightly packed in the vertex array.
In the function which does the rendering, I change the data that is rendered before rendering, and that’s where the trouble starts:

for (int j = 1; j < VertexArraySize * 2; j += 2) {
    ArrayData[j] += Offset;
}
glVertexPointer (2, GL_INT, 0, ArrayData);
glDrawArrays (GL_LINE_STRIP, 0, VertexArraySize);

Obviously, this function should shift the line which I want to display by the given Offset in the y-direction. Sometimes, though, it seems that the for() loop is not executed completely before the glDrawArrays call is executed. That is: the line which I want to display is only shifted partially. This happens when the process is running with default settings, which is on both logical processors of the Hyper-Threading machine I’m using. When I set the affinity mask of the process to use only one processor (it does not matter which one, 0 or 1), the line is shifted correctly!
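
To make the indexing in the loop above concrete: with tightly packed (x, y) pairs, the x values sit at even indices and the y values at odd indices, which is why the loop starts at 1 and steps by 2. A minimal standalone sketch follows; the helper name shiftY is hypothetical, and a std::vector stands in for ArrayData.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: shifts only the y components of tightly packed
// (x, y) pairs. x values are at even indices, y values at odd indices.
void shiftY(std::vector<int>& interleaved, int offset) {
    for (std::size_t j = 1; j < interleaved.size(); j += 2) {
        interleaved[j] += offset;
    }
}
```

Within a single thread the loop finishes before anything after it runs, so a partially shifted line suggests that another thread is touching the array at the same time. Note also that the shift is applied in place, so running it twice on the same data shifts twice.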

Does the vertex array get displayed more often than I call glDrawArrays() (in the background?), or can the system (Windows XP) execute the for() loop on one processor and call glDrawArrays() on the other one? Or something even weirder?

Does anybody have an idea?
Greetings,
fetz

[This message has been edited by fetz (edited 12-01-2003).]

maximian · December 1, 2003, 5:10am

It sounds like you’re not synchronizing. The reason it works fine without HT is because the code executes so quickly that the for loop is not being interrupted. If you would like to post some more code, we might be able to help.

fetz · December 1, 2003, 6:06am

Originally posted by maximian:
“It sounds like you’re not synchronizing. The reason it works fine without HT is because the code executes so quickly that the for loop is not being interrupted.”

Hm. If another thread in my program uses a lot of processor time (> 50%), the shifting of the line works as intended (even with the “two” processor model).

“If you would like to post some more code, we might be able to help.”

Ok. Let’s try. My synchronization is a little bit complex, but this should give you an idea:

// global variables
HANDLE DoDraw; // a manual-reset event
bool BusyDrawing = false;
bool ArrayFull = false;
GLint* VertexData;
int VertexDataSize;
//
// thread 1 (the thread which fills the vertex array with some data)
//
int DataOffset = 1;
while (1) {
    DoSomething ();

    if (ArrayFull) {
        BusyDrawing = true;
        SetEvent (DoDraw);
        DataOffset = 1;
        ArrayFull = false;
    }

    DoSomethingElse ();

    if (BusyDrawing == false) {
        VertexData[DataOffset] = y; // fill in the value to display
        DataOffset += 2;
        if (DataOffset >= VertexDataSize * 2) {
            ArrayFull = true;
        }
    }

    DoEvenMore ();

}

//
// thread 2 (the rendering thread)
//
while (1) {
    if (WaitForSingleObject (DoDraw, timeout) == WAIT_OBJECT_0) { // wait for the event to be set
        ShiftAndDisplayData (); // (see initial post)
        SwapBuffers ();
        BusyDrawing = false;
        ResetEvent (DoDraw);
    }
}

Is the vertex data pointer used for rendering even after the SwapBuffers() call ?? That would be an explanation, because after SwapBuffers() returns, I definitely change the data behind the vertex pointer…

P.S.: The phenomenon occurs more often when I disable OpenGL hardware support.

Does anyone have an explanation?
Greetings, fetz

Csiki · December 1, 2003, 11:07am

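It’s at least buggy. Consider this interleaving:

Thread2: BusyDrawing=false
Thread1: fills the array, ArrayFull=true
Thread1: BusyDrawing=true, SetEvent(DoDraw)
Thread2: ResetEvent(DoDraw)

The ResetEvent clears the signal that Thread1 has just set, so that frame is never drawn.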
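
The interleaving listed above can be replayed step by step in a single thread to see where it ends up. This is only a sketch: the struct and function names are hypothetical, and a plain bool stands in for the manual-reset event DoDraw.

```cpp
// Single-threaded replay of the interleaving. All names are hypothetical
// stand-ins for the globals in the posted code.
struct SharedState {
    bool eventSet = false;      // models SetEvent / ResetEvent on DoDraw
    bool busyDrawing = false;
    bool arrayFull = false;
};

// Returns true if, after the replay, neither thread can make progress.
bool replayLostEvent() {
    SharedState s;
    s.busyDrawing = true;       // a previous frame is being drawn
    s.eventSet = true;          // its event has not been reset yet

    // Thread 2 finishes drawing: BusyDrawing = false (ResetEvent still pending).
    s.busyDrawing = false;

    // Thread 1 fills the array, then hands it over.
    s.arrayFull = true;
    s.busyDrawing = true;
    s.eventSet = true;          // SetEvent(DoDraw); the event was already set
    s.arrayFull = false;

    // Thread 2 now runs the pending ResetEvent(DoDraw) and wipes the new signal.
    s.eventSet = false;

    const bool producerBlocked = s.busyDrawing;   // thread 1 skips the fill
    const bool rendererBlocked = !s.eventSet;     // thread 2 only times out
    return producerBlocked && rendererBlocked;
}
```

In this replay the producer waits for BusyDrawing to clear while the renderer waits for an event that has been erased. Resetting the event before clearing BusyDrawing, or using an auto-reset event, would likely avoid this particular ordering, though the plain bool flags would still need to be made safe for cross-thread use.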

Why do you use events?
Use:
if (BusyDrawing) {
    …
} else Sleep(0);

imported_jwatte · December 1, 2003, 12:12pm

Synchronization is somewhat tricky, and none of the suggestions here seem like a great idea.

Are you using VertexArrayRange? If you are, then you have to execute an SFENCE after the writes to the range, before you’re guaranteed to see the changes on the other (logical) CPU, or on the bus, for that matter.
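
As a rough illustration of "SFENCE after the writes": on x86 compilers the instruction is usually reached through the _mm_sfence() intrinsic from <xmmintrin.h>. In the sketch below the helper name is hypothetical, dst is ordinary memory standing in for a pointer into the vertex array range, and the non-x86 branch is only a fallback so the snippet compiles elsewhere; it is not an SFENCE.

```cpp
#include <atomic>
#include <cstddef>
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>   // _mm_sfence
#define SKETCH_HAVE_SFENCE 1
#endif

// Hypothetical helper: copy vertices into dst, then issue a store fence so
// the writes are ordered before whatever signals the reader afterwards.
void writeVerticesAndFence(float* dst, const float* src, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        dst[i] = src[i];
    }
#ifdef SKETCH_HAVE_SFENCE
    _mm_sfence();   // emits the SFENCE instruction
#else
    std::atomic_thread_fence(std::memory_order_release);   // fallback, not an SFENCE
#endif
}
```

The fence only orders the stores; the producer still has to signal the consumer after the fence, not before it.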

Also, when using VertexArrayRange, the drawing is asynchronous, even after DrawElements or SwapBuffers.

If you’re using plain vertex arrays (malloc() or similar) and no VertexArrayRange(), then as soon as DrawArrays/DrawElements/DrawRangeElements returns, the data has been copied to GL and you can do whatever you want with it.

It’s been my experience that multi-threading something that updates “in place” is almost never the right thing to do, neither in computer graphics nor in any other situation. Multi-threading in a FIFO manner, where one thread is a consumer and one is a producer, is a little better. I.e., you’d do something like this:

initialization:

CRITICAL_SECTION cs[2]; InitializeCriticalSection(…); // once for each element of cs
float myVertexArrays[2][ARRAY_SIZE];
HANDLE startup = CreateEvent(…);
threadA = CreateThread(ThreadAFunc); // arguments abbreviated
threadB = CreateThread(ThreadBFunc);

ThreadAFunc() // producer
{
    EnterCriticalSection(&cs[0]); // own buffer 0 before the consumer starts
    int current = 0;
    SetEvent(startup);
    for (;;) {
        produceIntoBuffer(myVertexArrays[current]);
        LeaveCriticalSection(&cs[current]);
        current = 1-current;
        EnterCriticalSection(&cs[current]);
    }
}

ThreadBFunc() // consumer
{
    WaitForSingleObject(startup, INFINITE);
    int current = 0;
    for (;;) {
        EnterCriticalSection(&cs[current]);
        drawOutOfArray(myVertexArrays[current]);
        LeaveCriticalSection(&cs[current]);
        current = 1-current;
    }
}

This implements producer/consumer parallelism. Note that it still has a race: both arrays can be unlocked at the same time, and one thread can then acquire both arrays in quick succession. However, both sides will always be using consistent data.

If you don’t mind using an event, you can use a producer event and a consumer event, and enforce exactly 1 level of parallelization, which will be correct in addition to consistent, but may waste a little more time on synchronization (the bet in the previous code is that critical sections are more efficient than raw events on average).
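
The two-event variant described above might look roughly like the following. This is a sketch under assumptions, not the poster’s code: it swaps the Win32 events for a portable std::mutex with two std::condition_variable objects ("produced" and "consumed"), and Handoff, runFrames and kArraySize are hypothetical names. Filling a buffer with the frame number stands in for produceIntoBuffer, and checking it stands in for drawOutOfArray.

```cpp
#include <array>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

constexpr std::size_t kArraySize = 64;   // hypothetical buffer size
constexpr int kNone = -1;                // "no buffer is ready"
using Buffer = std::array<int, kArraySize>;

class Handoff {
public:
    // Producer: wait until the previous buffer was released, then publish one.
    void publish(int index) {
        std::unique_lock<std::mutex> lock(mutex_);
        consumed_.wait(lock, [this] { return ready_ == kNone; });
        ready_ = index;
        produced_.notify_one();
    }
    // Consumer: wait until a buffer is ready and return its index.
    int acquire() {
        std::unique_lock<std::mutex> lock(mutex_);
        produced_.wait(lock, [this] { return ready_ != kNone; });
        return ready_;
    }
    // Consumer: finished with the buffer, let the producer publish the next.
    void release() {
        std::lock_guard<std::mutex> lock(mutex_);
        ready_ = kNone;
        consumed_.notify_one();
    }
private:
    std::mutex mutex_;
    std::condition_variable produced_;
    std::condition_variable consumed_;
    int ready_ = kNone;
};

// Runs both threads for frameCount frames and returns what the consumer saw:
// the frame number of each buffer, or -1 if a buffer was inconsistent.
std::vector<int> runFrames(int frameCount) {
    std::array<Buffer, 2> buffers{};
    Handoff handoff;
    std::vector<int> seen;

    std::thread producer([&] {
        for (int frame = 0; frame < frameCount; ++frame) {
            const int current = frame % 2;
            buffers[current].fill(frame);   // fill one buffer...
            handoff.publish(current);       // ...while the other may be drawn
        }
    });
    std::thread consumer([&] {
        for (int frame = 0; frame < frameCount; ++frame) {
            const int current = handoff.acquire();
            const Buffer& b = buffers[current];
            bool consistent = true;
            for (int v : b) consistent = consistent && (v == b[0]);
            seen.push_back(consistent ? b[0] : -1);
            handoff.release();
        }
    });
    producer.join();
    consumer.join();
    return seen;
}
```

The producer can fill one buffer while the consumer reads the other, but it cannot publish a second buffer until the first has been released, so every frame is consumed exactly once and in order. The cost is that one side may sit idle waiting for the other.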