OutOfResources and InvalidCommandQueue errors

dlw · July 20, 2011, 1:43pm

Hey guys,

I am running a kernel using Cloo in C#, but I keep getting random errors that I cannot figure out. At first I thought it was an index-out-of-bounds problem inside the GPU, but then it reported an InvalidCommandQueue error when I switched the order of my read and execute calls on the host. So I pulled my kernel apart piece by piece and something funny happened (well, not so funny to me). Here are my last few lines:

float value = weight/total;
output[0] = value;

//OR depending on whether or not it passes an if above

output[index] = 255;

The strange thing is that output[index] = 255 works. But if I try to assign something else to it, like the weight/total value, it breaks the command queue. So I think either something is wrong with the amount of data I'm passing in (<115k floats, 128k chars), or the cause is something that isn't obvious to me. Any ideas on what would cause these two errors, which swap depending on how I order my commands?

And if it helps, I am running 2x Nvidia 485M cards on my laptop, for 4GB total RAM on the cards, and if I did my math right, that many numbers constitutes a max of 580kB of information.
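
As a rough check of that estimate, here is a minimal C sketch. The counts 115000 and 128000 are taken as upper bounds from the figures above, and payload_bytes is a name made up for illustration.

```c
#include <stddef.h>

/* Upper bound on the bytes handed to the device: one float per
   weight/output element plus one byte per input char. */
size_t payload_bytes(size_t num_floats, size_t num_chars)
{
    return num_floats * sizeof(float) + num_chars * sizeof(unsigned char);
}
```

With 4-byte floats, payload_bytes(115000, 128000) comes to 588000 bytes, so the estimate of roughly 580kB is in the right range and nowhere near the memory on the cards.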

david.garcia · July 20, 2011, 3:15pm

Can you show us the whole kernel source? Can you show us the values you pass to clEnqueueNDRangeKernel()? (That is, global size, local size)

Is it possible that “total” is zero, and so you are trying to divide by zero?
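
One way to rule that out is to guard the division. Below is a minimal sketch in plain C (the same expression should also work in OpenCL C). The helper safe_ratio and its sentinel parameter are made up for illustration and are not part of the posted kernel. Note that floating-point division by zero normally produces inf or NaN rather than a fault, so on its own it would not be expected to invalidate the queue, but it would still put garbage in the output.

```c
/* Hypothetical helper: returns weight / total, or the sentinel when total
   is exactly 0, so that x/0 or 0/0 never reaches the output as inf or NaN. */
float safe_ratio(float weight, float total, float sentinel)
{
    return (total != 0.0f) ? (weight / total) : sentinel;
}
```

In the kernel this would be used as output[index] = safe_ratio(weight, total, -1); with -1 being the value the kernel already writes for skipped elements.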

dlw · July 21, 2011, 10:45am

Here is my kernel. I changed some of the variable names because the project is under IP, but it is essentially a convolution algorithm applied using weights that vary depending on the y value:

__kernel void Convo(
    global uchar* input,
    global float* output,
    global float* weightsIn,
    private const int numDataPerRow,
    private const int numDataPerCol,
    global const int* numWeightsX,
    global const int* numWeightsY,
    global const int* indices)
{
    int index = get_global_id(0);

    if (input[index] < 255)
    {
        bool flipped = true;
        int x = index % numDataPerRow;
        int y = index / numDataPerRow;

        int yVal = ((75 - y * .625) + 2.5) / 5;
        if (yVal <= 0)
        {
            yVal *= -1;
            flipped = false;
        }
        else
            flipped = true;

        int halfValRowWeights = numWeightsX[yVal] / 2;
        int halfValColWeights = numWeightsY[yVal] / 2;

        int left = x - halfValRowWeights;
        int right = x + halfValRowWeights;
        int top = y - halfValColWeights;
        int bottom = y + halfValColWeights;

        float weight = 0;
        float total = 0;

        //the weight indices
        int weightRow = 0;
        int weightCol = 0;

        //if we are in the negative, then we need to adjust
        //the Y index in the weight grid, and the dataGrid
        if (top < 0)
        {
            //adjust the weight coordinates
            weightRow = abs(top);

            //set top of data to 0
            top = 0;
        }
        else if (y > numDataPerCol - 1 - halfValColWeights)
        {
            //dont have to change weights here because we access all of them
            bottom = numDataPerCol - 1;
        }

        if (flipped)
        {
            y = numWeightsY[yVal]-1;
        }

        //the values from the array
        float weightVal = 0;
        float inputVal = 0;

        for (int yIn = top; yIn <= bottom; yIn++)
        {
            weightCol = 0;
            for (int xIn = left; xIn <= right; xIn++)
            {
                //used if we have a wrapped X value
                int indexWaveDataX = 0;

                if (xIn < 0)
                {
                    //we need to wrap in the X
                    indexWaveDataX = numDataPerRow + xIn;
                }
                else if (xIn > numDataPerRow - 1)
                {
                    indexWaveDataX = xIn - numDataPerRow;
                }
                else
                {
                    indexWaveDataX = xIn;
                }
                inputVal = input[yIn*numDataPerRow + indexWaveDataX];
                weightVal = weightsIn[weightCol + weightRow * numWeightsX[lat] + indices[lat]];
                if (inputVal < 255)
                {
                    weight = weight + (weightVal * inputVal);
                    total = total + weightVal;
                }
                weightCol++;
            }
            if (flipped)
                weightRow--;
            else
                weightRow++;
        }
        float value = (float)(weight/total);
        output[index] = value;
    }
    else
        output[index] = -1;
}

The syntax is correct, I know that. Here is the code for my host.

myWatch.Start();
float[] convert;
IEnumerable<float> we;

ComputeEventList eventList = new ComputeEventList();
ComputeCommandQueue commands = new ComputeCommandQueue(context, context.Devices[0], ComputeCommandQueueFlags.None);
ComputeKernel kernel = null;

for (int i = 0; i < totalRuns; i++)
{
    numWeightsXS = new ComputeBuffer<int>(context,
        ComputeMemoryFlags.ReadOnly | ComputeMemoryFlags.CopyHostPointer, numWeightsX[i]);

    numWeightsYS = new ComputeBuffer<int>(context,
        ComputeMemoryFlags.ReadOnly | ComputeMemoryFlags.CopyHostPointer, numWeightsY[i]);

    //indices buffer is used to determine where we are in convert
    //convert is a 3d array, flattened at the second level
    indicesBuffer = new ComputeBuffer<int>(context,
        ComputeMemoryFlags.ReadOnly | ComputeMemoryFlags.CopyHostPointer, indices[i]);

    we = weights[i].SelectMany(x => x);
    convert = we.ToArray();

    weightsIn = new ComputeBuffer<float>(context,
        ComputeMemoryFlags.ReadOnly | ComputeMemoryFlags.CopyHostPointer, convert);

    Console.WriteLine(convert.Length);

    kernel = program.CreateKernel("Convo");

    inBuffer = new ComputeBuffer<byte>(context,
        ComputeMemoryFlags.ReadOnly | //used to make sure that we use a const
        ComputeMemoryFlags.CopyHostPointer, //so we can't change the original data, but we have it
        dataArray[i]);

    outBuffer = new ComputeBuffer<float>(context,
        ComputeMemoryFlags.WriteOnly, // make sure that we aren't reading and writing to this object
        dataOut[i]);

    kernel.SetMemoryArgument(0, inBuffer); // make sure you sync these with the kernel arguments
    kernel.SetMemoryArgument(1, outBuffer);
    kernel.SetMemoryArgument(2, weightsIn);
    kernel.SetValueArgument<int>(3, globalData[i].XGrids);
    kernel.SetValueArgument<int>(4, globalData[i].YGrids);
    kernel.SetMemoryArgument(5, numWeightsXS);
    kernel.SetMemoryArgument(6, numWeightsYS);
    kernel.SetMemoryArgument(7, indicesBuffer);

    commands.Execute(kernel, null, new long[] { dataArray[i].Length }, null, eventList);
    commands.ReadFromBuffer<float>(outBuffer, ref dataOut[i], true, eventList);
    eventList.Wait();

    commands.Finish();
    eventList.Clear();
    if (DEBUG)
    {
        for (int a = 0; a < dataOut[i].Length; a++)
        {
            Console.WriteLine(dataOut[i][a]);
        }
    }
}
myWatch.Stop();
//printData("GPU: ");
myWatch.Reset();

kernel.Dispose();
inBuffer.Dispose();
outBuffer.Dispose();
commands.Dispose();
weightsIn.Dispose();
numWeightsXS.Dispose();
numWeightsYS.Dispose();
indicesBuffer.Dispose();
program.Dispose();
context.Dispose();

Sorry, it's a lot of code. The data that is getting to the kernel is correct: I output it and it gives the right numbers. Note that I had to cast from bytes to chars, due to memory alignment problems between the host and GPU.
So I don't know if I'm killing something too soon that I shouldn't be, or if it's an index-out-of-bounds thing. I am assuming index out of bounds, because if I get rid of the weights line that uses the indices, it stops failing. But it may be something else. Any ideas?
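
Since an out-of-bounds read is the suspicion, one cheap check is to replay the kernel's index math on the host. The sketch below copies the yVal expression from the kernel into plain C and tests the result against the length of the numWeightsX / numWeightsY tables. The names band_for_row, band_in_bounds and table_len are invented here, and the real table lengths are not given in this thread, so treat it as a sketch under those assumptions.

```c
/* Mirrors the kernel's yVal computation for data row y.
   The double result is truncated toward zero, then made non-negative. */
int band_for_row(int y)
{
    int yVal = (int)(((75 - y * .625) + 2.5) / 5);
    return (yVal <= 0) ? -yVal : yVal;
}

/* Returns 1 if the band computed for row y can index a table holding
   table_len entries (table_len is an assumed host-side value), else 0. */
int band_in_bounds(int y, int table_len)
{
    int band = band_for_row(y);
    return band >= 0 && band < table_len;
}
```

Looping y from 0 to numDataPerCol - 1 on the host and checking band_in_bounds(y, length of numWeightsX[i]) would show whether any row asks for a table entry that does not exist. Row 0, for example, maps to band 15, so the tables would need at least 16 entries.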

dlw · July 21, 2011, 11:06am

Edit to the kernel code above. This line

weightVal = weightsIn[weightCol + weightRow * numWeightsX[lat] + indices[lat]];

changed to

weightVal = weightsIn[weightCol + weightRow * numWeightsX[yVal] + indices[yVal]];

to keep it consistent with the name changes. I have to copy the code over from one machine to another to get it online, but this is the only code that I can show.
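
That weights line computes a flat index as column + row * band width + band offset. A minimal C sketch of the same arithmetic with bounds checks added is shown below. The function weight_index and the parameters height and weights_len are assumptions introduced for illustration; the kernel itself does no such checking.

```c
/* Flat index as in the kernel: col + row * width + base.
   Returns -1 when the row or column falls outside the band, or when the
   resulting index is outside a weights array of weights_len elements. */
long weight_index(int col, int row, int width, int height,
                  int base, long weights_len)
{
    if (col < 0 || col >= width || row < 0 || row >= height)
        return -1;
    long idx = (long)col + (long)row * width + base;
    return (idx >= 0 && idx < weights_len) ? idx : -1;
}
```

As far as the posted kernel shows, weightRow starts at 0 (or abs(top)) and is decremented on the flipped path, so it would go negative after the first row and a check like this would flag it. Whether the real project code behaves the same way is unknown, since names were changed before posting.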

dlw · July 21, 2011, 12:10pm

I just found this post, which relates directly to what is happening in my program: viewtopic.php?f=37&t=3613&p=12078#p12078

dlw · July 21, 2011, 1:09pm

And final post:

I was clearing the event list and calling commands.Finish() at the end of each iteration of the loop. For some reason this was causing the commands object to be invalidated; maybe the events weren't ready to be finished, I don't know. But I got rid of those two calls, and that got the code running properly again. So for anyone who read through ALL the code I posted, thank you for your time, and thanks to David.Garcia for his help.