Branching in compute kernels

Hi,

This is probably not the best place to ask this question, but I could not find a better one.
I’m porting a D3D11 program I wrote to Vulkan. The program does volume ray marching.
At each ray step I need to do some operations that require a dynamic for loop, since the iteration count is computed on the fly at each step along the ray.
This means the compiler cannot unroll the loop or apply similar optimizations.
Even so, the HLSL code works just fine and keeps a real-time speed.
In my Vulkan implementation, the program just crashes after the wait on the compute fence times out because of resource locking.
What “fixed” the problem is using a constant value for the loop count, of course, but that more or less defeats my algorithm.

Has anybody tried anything like this, or does anyone have information on the matter? I would like to dig into this further.

Here’s pseudo-code for my ray marching algorithm:


vec3 ForEachStep()
{
    vec3 retVal = vec3(0.0);
    // The iteration count is only known at run time, per ray step
    uint numIter = FunctionCallToDetermine();
    for (uint i = 0u; i < numIter; ++i)
    {
        retVal += FuncCall();
    }
    return retVal;
}

const uint numSteps = someValue;

void main()
{
    // do other unrelated stuff
    vec3 color = vec3(0.0);
    for (uint i = 0u; i < numSteps; ++i)
    {
        // do other unrelated stuff

        color += ForEachStep();

        // do other unrelated stuff
    }

    // store color to texture target
}
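
For comparison, the variant that does not hang differs only in the inner loop bound. Below is a rough, self-contained sketch of it in GLSL. FIXED_ITER and its value, the value of numSteps, the stand-in body of FuncCall, the workgroup size and the image binding are all placeholders for illustration, not my actual code.

```glsl
#version 450

// Placeholder workgroup size and output image binding (assumptions)
layout(local_size_x = 8, local_size_y = 8) in;
layout(binding = 0, rgba8) uniform writeonly image2D target;

// Placeholder constant that replaces the run-time iteration count
const uint FIXED_ITER = 8u;

// Placeholder number of ray steps
const uint numSteps = 64u;

// Stand-in for the real per-iteration work
vec3 FuncCall()
{
    return vec3(0.001);
}

vec3 ForEachStep()
{
    vec3 retVal = vec3(0.0);
    // Bound is a compile-time constant here, unlike in the dynamic version
    for (uint i = 0u; i < FIXED_ITER; ++i)
    {
        retVal += FuncCall();
    }
    return retVal;
}

void main()
{
    vec3 color = vec3(0.0);
    for (uint i = 0u; i < numSteps; ++i)
    {
        color += ForEachStep();
    }

    // Store color to the texture target
    imageStore(target, ivec2(gl_GlobalInvocationID.xy), vec4(color, 1.0));
}
```

With the bound fixed like this the dispatch completes, but every ray step does the same amount of work, which is exactly what my algorithm is trying to avoid.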

Cheers.