CL_OUT_OF_RESOURCES on NVIDIA GPUs only.

Hi all!

I already started a thread on the NVIDIA forums, but it looks like no one there is interested in anything other than CUDA… Here is my problem again:

Some months ago I started working on a fractal raytracer in C++/OpenCL. I have already run into a lot of bugs in the NVIDIA OpenCL compiler (access violations when declaring variables without using them), but I was always able to find a workaround. This time it looks a bit more serious. In my raytracer I have to select the color of the nearest shape. I implemented this with a switch, but I always get a CL_OUT_OF_RESOURCES error when reading the output buffer (or CL_INVALID_COMMAND_QUEUE if I call clFinish() first). This happens only on NVIDIA GPUs; the same code works correctly on AMD GPUs and Intel CPUs. This is the important part of the code:

TracerOut Trace(CameraOut in, SceneParams params)
{
	float dist[2];
	TracerOut out;

	Mandelbulb1_OO Mandelbulb1_oo = Mandelbulb1_Object(in, params);
	dist[0] = distance(Mandelbulb1_oo.intersection, in.origin);
	Mandelbulb2_OO Mandelbulb2_oo = Mandelbulb2_Object(in, params);
	dist[1] = distance(Mandelbulb2_oo.intersection, in.origin);

	uint nearestId = 0;
	float nearestDist = 10000000.0f;
	for (uint i = 0; i < 2; i++)
	{
		if (dist[i] < nearestDist)
		{
			nearestDist = dist[i];
			nearestId = i;
		}
	}

	// Trick needed to avoid access violation bug
	Mandelbulb1_SO Mandelbulb1_so;
	Mandelbulb1_so.color.x = 0.0f;
	Mandelbulb2_SO Mandelbulb2_so;
	Mandelbulb2_so.color.x = 0.0f;

	switch (nearestId)
	{
	case 0:
		Mandelbulb1_so = Mandelbulb1_Shader(in, Mandelbulb1_oo, params);
		out.color = Mandelbulb1_so.color;
		break;
	case 1:
		Mandelbulb2_so = Mandelbulb2_Shader(in, Mandelbulb2_oo, params);
		out.color = Mandelbulb2_so.color;
		break;
	default:
		out.color = (float4)(0.0f, 0.0f, 0.0f, 0.0f);
		break;
	}

	return out;
}
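
Since the error code changes depending on which blocking call comes first (the buffer read or clFinish()), it may help to check and log the status of every host call after the kernel launch. Below is a minimal host-side sketch in plain C. `cl_error_name` and `cl_check` are hypothetical helpers, not part of the OpenCL API; the numeric values are the ones defined in the Khronos `CL/cl.h` header, and only a handful are listed.

```c
#include <stdio.h>

/* Hypothetical helper (not part of the OpenCL API): maps a few OpenCL
   status codes to their names. Values match the Khronos CL/cl.h header. */
static const char *cl_error_name(int status)
{
    switch (status) {
    case 0:   return "CL_SUCCESS";
    case -4:  return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
    case -5:  return "CL_OUT_OF_RESOURCES";
    case -6:  return "CL_OUT_OF_HOST_MEMORY";
    case -36: return "CL_INVALID_COMMAND_QUEUE";
    default:  return "unknown OpenCL status";
    }
}

/* Hypothetical helper: logs a failed step to stderr.
   Returns 1 if the status is CL_SUCCESS, 0 otherwise. */
static int cl_check(int status, const char *step)
{
    if (status != 0) {
        fprintf(stderr, "%s failed: %s (%d)\n",
                step, cl_error_name(status), status);
        return 0;
    }
    return 1;
}
```

In the host program, the cl_int returned by calls such as clEnqueueNDRangeKernel, clFinish and clEnqueueReadBuffer could each be passed to `cl_check` with a label, so the log shows the first call that reports the failure.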

I suspected that the switch construct was causing the problem, so I tried plain ifs instead:

TracerOut Trace(CameraOut in, SceneParams params)
{
	//...

	Mandelbulb1_SO Mandelbulb1_so;
	Mandelbulb1_so.color.x = 0.0f;
	Mandelbulb2_SO Mandelbulb2_so;
	Mandelbulb2_so.color.x = 0.0f;

	out.color = (float4)(0.0f, 0.0f, 0.0f, 0.0f);

	Mandelbulb1_so = Mandelbulb1_Shader(in, Mandelbulb1_oo, params);
	Mandelbulb2_so = Mandelbulb2_Shader(in, Mandelbulb2_oo, params);

	if (nearestId == 0)
		out.color = Mandelbulb1_so.color;
	
	if (nearestId == 1)
		out.color = Mandelbulb2_so.color;

	return out;
}

I still get the same error. Removing one or both of the ifs makes it go away:

TracerOut Trace(CameraOut in, SceneParams params)
{
	//...
	out.color = (float4)(0.0f, 0.0f, 0.0f, 0.0f);

	Mandelbulb1_so = Mandelbulb1_Shader(in, Mandelbulb1_oo, params);
	Mandelbulb2_so = Mandelbulb2_Shader(in, Mandelbulb2_oo, params);

	out.color = Mandelbulb1_so.color;
	
	if (nearestId == 1)
		out.color = Mandelbulb2_so.color;

	return out;
}

Removing the declaration of the structs also lets it run fine:

TracerOut Trace(CameraOut in, SceneParams params)
{
	//...

	switch (nearestId)
	{
	case 0:
		out.color = Mandelbulb1_Shader(in, Mandelbulb1_oo, params).color;
		break;
	case 1:
		out.color = Mandelbulb2_Shader(in, Mandelbulb2_oo, params).color;
		break;
	default:
		break;
	} 

	return out;
}

But of course this is not what I want. I know that conditionals are very bad for GPUs, but at the moment I don’t have another solution (does anyone have suggestions? :) ); optimization will come later. This code is supposed to work, so I believe this is a bug in the NVIDIA OpenCL driver, right? Has anyone had a similar problem? Is a fix coming?
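
On the question of avoiding conditionals: one common approach is to evaluate every candidate color and blend them with 0/1 weights, so the selection itself contains no switch or if. The sketch below shows the idea in plain C so it can be run on the host. `Color4` and `pick_color` are hypothetical names standing in for OpenCL's float4 and for the selection step in Trace(). Whether this form avoids the NVIDIA failure is untested; it only illustrates the technique.

```c
#include <stddef.h>

/* Host-side stand-in for OpenCL's float4, for illustration only. */
typedef struct { float x, y, z, w; } Color4;

/* Picks colors[id] without a switch: each candidate is multiplied by a
   0/1 weight and the products are summed. An id outside [0, count)
   matches no candidate, so the result stays (0, 0, 0, 0), which plays
   the role of the default case.
   Caveat: a NaN or infinite component in an unselected color would
   still leak into the sum, because 0 * NaN is NaN. */
static Color4 pick_color(const Color4 *colors, size_t count, size_t id)
{
    Color4 out = { 0.0f, 0.0f, 0.0f, 0.0f };
    for (size_t i = 0; i < count; i++) {
        float w = (float)(i == id);  /* 1.0f for the selected entry, else 0.0f */
        out.x += w * colors[i].x;
        out.y += w * colors[i].y;
        out.z += w * colors[i].z;
        out.w += w * colors[i].w;
    }
    return out;
}
```

The trade-off is that every shader has to be evaluated for every pixel, as in the if-based variant above, so this only removes the branch around the color assignment, not the shading cost.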

I tried to run my program on different PCs. It runs without problems on the following devices:

- CPU Intel i7 2600K
- CPU Intel i7 920
- CPU Intel i7 2620M
- GPU Intel HD Graphics 3000
- GPU AMD HD 6470M

I get the CL_OUT_OF_RESOURCES / CL_INVALID_COMMAND_QUEUE errors on:

- GPU NVIDIA GTX 680 (EVGA), driver 320.18
- GPU NVIDIA GTX 560 Ti OC (Gigabyte), driver 320.18
- GPU NVIDIA GTX 470 (Zotac), driver 320.18

Last remark: some parts of the code may look badly written. This is because I’m not writing the OpenCL code by hand: I’m writing a program that assembles OpenCL scripts dynamically and then runs them.
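
To give an idea of what assembling a kernel dynamically can look like, here is a minimal sketch in plain C that emits the switch over nearestId from a list of object names. `emit_color_switch` is a hypothetical function written for this illustration and is not taken from the actual generator; the emitted text follows the `<Name>_Shader` / `<Name>_oo` naming used in the snippets above, in the form that ran fine on NVIDIA.

```c
#include <stdio.h>

/* Hypothetical generator: writes an OpenCL C switch over nearestId into
   buf, one case per object name. Returns the number of characters
   written (excluding the terminating NUL), or -1 if buf is too small. */
static int emit_color_switch(char *buf, size_t size,
                             const char *const *names, size_t count)
{
    size_t used;
    int n = snprintf(buf, size, "switch (nearestId)\n{\n");
    if (n < 0 || (size_t)n >= size)
        return -1;
    used = (size_t)n;

    for (size_t i = 0; i < count; i++) {
        n = snprintf(buf + used, size - used,
                     "case %zu:\n"
                     "\tout.color = %s_Shader(in, %s_oo, params).color;\n"
                     "\tbreak;\n",
                     i, names[i], names[i]);
        if (n < 0 || (size_t)n >= size - used)
            return -1;
        used += (size_t)n;
    }

    n = snprintf(buf + used, size - used, "default:\n\tbreak;\n}\n");
    if (n < 0 || (size_t)n >= size - used)
        return -1;
    used += (size_t)n;
    return (int)used;
}
```

Called with the names "Mandelbulb1" and "Mandelbulb2", this produces a two-case switch like the working variant above. Keeping the generator this small makes it easy to compare the text it emits for the failing and working forms.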

Thank you!
Mattia.