Hi
I’m new to OpenCL and am trying to port the source code below to an Nvidia Quadro FX1700 GPU using OpenCL. There is data feedback in the nested loops (i.e. alpha_t[s]=new_alpha_t[s]), so the intermediate results of one iteration of m are used in the next iteration. How do I achieve this data feedback in the kernel? I used a global work size of 752 and a local work size of 8 for my kernel. In addition, I unrolled the innermost loop (i.e. over z) so that sum[0], sum[1], sum[2] and sum[3] are computed explicitly.
int sm_lut[4][8] = {{1,5,5,7,6,3,0,5},
                    {2,6,4,6,5,2,2,7},
                    {7,7,3,1,3,5,2,2},
                    {0,1,2,0,4,4,6,1}
                   };
int s,m,z;
int alpha_t[8]={0};
int new_alpha_t[8];
for (m=0; m<752; m++)
{
    /* compute the new state values from the previous ones */
    for (s=0; s<8; s++)
    {
        int sum[4];
        for (z=0; z<4; z++)
        {
            int sm1;
            sm1 = sm_lut[z][s];
            sum[z] = alpha_t[sm1];
        }
        new_alpha_t[s]=max4(sum[0],sum[1],sum[2],sum[3]);
    }
    /* data feedback: the new values become the input of the next m */
    for (s=0; s<8; s++)
        alpha_t[s]=new_alpha_t[s];
}
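For reference, here is a minimal host-side C sketch of the same recursion with the z loop unrolled by hand, which may be useful as a baseline for checking kernel output. It assumes max4 simply returns the largest of its four arguments (its definition is not shown above); the function name alpha_recursion and its steps parameter are made up for this illustration.

```c
#include <string.h>

/* Lookup table copied from the snippet above. */
static const int sm_lut[4][8] = {{1,5,5,7,6,3,0,5},
                                 {2,6,4,6,5,2,2,7},
                                 {7,7,3,1,3,5,2,2},
                                 {0,1,2,0,4,4,6,1}};

/* Assumption: max4 returns the largest of its four arguments. */
static int max4(int a, int b, int c, int d)
{
    int ab = a > b ? a : b;
    int cd = c > d ? c : d;
    return ab > cd ? ab : cd;
}

/* Hypothetical helper: runs `steps` iterations of the m loop in place. */
void alpha_recursion(int alpha_t[8], int steps)
{
    int new_alpha_t[8];
    int m, s;

    for (m = 0; m < steps; m++) {
        for (s = 0; s < 8; s++) {
            /* z loop unrolled: one read of the previous alpha_t per table row */
            int sum0 = alpha_t[sm_lut[0][s]];
            int sum1 = alpha_t[sm_lut[1][s]];
            int sum2 = alpha_t[sm_lut[2][s]];
            int sum3 = alpha_t[sm_lut[3][s]];
            new_alpha_t[s] = max4(sum0, sum1, sum2, sum3);
        }
        /* data feedback: every new value is computed before any is copied back */
        memcpy(alpha_t, new_alpha_t, sizeof new_alpha_t);
    }
}
```

Calling alpha_recursion(alpha_t, 752) on a zero-initialised array corresponds to the loop above. Each iteration of m reads all eight values produced by the previous iteration, so the m iterations depend on one another and have to run in order, while the eight values of s within one iteration are independent.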
Thanks in advance for your help.