Please see this kernel and the corresponding output. When I execute the kernel, it works for the first work-item (thread), but all the others give wrong output. Can anyone help me figure out what the problem is?
__kernel void rmsCalculation(const __global float* a,
                             const __global float* C,
                             __global float* O,
                             const int col)
{
    const int ar = get_global_id(0);
    float R = 0;
    float I = 0;
    float c = 0;
    bool totalSch = true;
    float sum = 0;
    for (int j = 0; j < col; ++j)
    {
        c = C[j] * a[ar * col + j];
        I = 0;
        do
        {
            R = I + c;
            I = 0;
            if (R > T[j])
            {
                totalSch = false;
                break;
            }
            else
            {
                for (int k = 0; k < j; ++k)
                {
                    I = I + C[k] * a[ar * col + k];
                }
            }
        } while (I + c > R);
        sum = sum + R;
        if (totalSch == false)
        {
            break;
        }
    } // end for (j = 0 ..
    O[ar] = sum;
}
In the output, the first element of the “O[]” array is calculated correctly, but the other elements are wrong, as shown below:
0= 11
1= -9.99199e+18
2= -9.99199e+18
3= -9.99199e+18
4= -9.99199e+18
5= -9.99199e+18
6= -9.99199e+18
7= -9.99199e+18
8= -9.99199e+18
9= -9.99199e+18
10= -9.99199e+18
11= -9.99199e+18
12= -9.99199e+18
13= -9.99199e+18
14= -9.99199e+18
15= -9.99199e+18
16= -9.99199e+18
17= -9.99199e+18
18= -9.99199e+18
19= -9.99199e+18
…
So what is the problem in the code?
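
For comparison, here is a minimal CPU sketch of the same per-row loop in plain C, which could be used to check what each O[ar] should contain. It is only an illustration under assumptions: the function name rms_row_reference is hypothetical, and T is passed as an extra array because the kernel as posted reads T[j] without declaring it in its parameter list.

```c
#include <stdbool.h>

/* Hypothetical CPU reference for one row of the posted kernel.
 * a   : row-major input matrix with `col` columns (same layout as the kernel)
 * C   : per-column coefficients
 * T   : per-column thresholds (assumed; not declared in the posted kernel)
 * row : plays the role of get_global_id(0)
 */
float rms_row_reference(const float *a, const float *C, const float *T,
                        int col, int row)
{
    float sum = 0.0f;
    for (int j = 0; j < col; ++j) {
        const float c = C[j] * a[row * col + j];
        float I = 0.0f;
        float R = 0.0f;
        bool totalSch = true;
        do {
            R = I + c;
            I = 0.0f;
            if (R > T[j]) {
                /* Threshold exceeded: stop iterating for this column. */
                totalSch = false;
                break;
            }
            /* Accumulate the contribution of all earlier columns. */
            for (int k = 0; k < j; ++k) {
                I += C[k] * a[row * col + k];
            }
        } while (I + c > R);
        sum += R;
        if (!totalSch) {
            break; /* mirrors the early exit in the kernel */
        }
    }
    return sum;
}
```

Calling this once per row and comparing the result with the corresponding O[ar] may help separate a problem in the kernel arithmetic from a problem in the host setup, such as the buffer sizes or the global work size.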