EDIT: Solved
I am in the process of porting a graph-based genetic algorithm to the GPU and I keep running into a strange problem. I generate my chromosomes on the CPU and offload them to the GPU. One step of my fitness function is to determine how many bits are set to 1 (a set bit indicates that a node is included). When I try to verify the results on the CPU, the numbers do not match. At first I thought my local caching was the issue, so I switched to using global memory, to no avail. Next I thought the integer modulus and division might be the problem, so I re-implemented the bit test using floating-point operations and casts. Still not working. It seems to me that the chromosome isn’t being copied properly for gid > 0. Has anyone used bitmasks effectively on the GPU?
Here, chrome_local is a per-work-item array:
uchar chrome_local[CHROM_SIZE_BYTES];
totalChromOff = gid*popSize*CHROM_SIZE_BYTES + i*CHROM_SIZE_BYTES;
// copy the chromosome
for (int n = 0; n < CHROM_SIZE_BYTES; n++) {
    chrome_local[n] = InputChroms[totalChromOff + n];
}
sSize = 0;
// count the bits that are set (one per included node)
for (unsigned int item = 0; item < numVerts; item++) {
    if (!isBitZero(chrome_local, item)) {
        sSize++;
    }
}
OutputFitness[gid*popSize + i] = (float)sSize;
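For checking the kernel output on the CPU, a minimal reference version of the same count in plain C might look like the sketch below. isBitZero is not shown in this post, so is_bit_zero here is a hypothetical stand-in; it assumes bit 0 is the least significant bit of byte 0, which may not match the real bit layout.

```c
#include <stddef.h>

/* Hypothetical stand-in for isBitZero (assumption: bit 0 is the least
   significant bit of byte 0). Returns 1 if bit `item` is 0, else 0. */
static int is_bit_zero(const unsigned char *chrom, unsigned int item)
{
    return ((chrom[item / 8u] >> (item % 8u)) & 1u) == 0u;
}

/* CPU reference: number of set bits among the first num_verts bits. */
static unsigned int count_set_bits(const unsigned char *chrom,
                                   unsigned int num_verts)
{
    unsigned int size = 0;
    for (unsigned int item = 0; item < num_verts; item++) {
        if (!is_bit_zero(chrom, item)) {
            size++;
        }
    }
    return size;
}
```

Running count_set_bits over each chromosome on the host and comparing it with the matching OutputFitness entry should show which work-items disagree.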
As a note: this is my first attempt at using OpenCL and I love the power. I just need to learn all the tricks of the trade.
If you have any questions or need more information, please let me know!
Note: It seems that the first work-item in the group calculates all of its sizes properly, but every other work-item is off. Might this have to do with memory access?
Also, I’ve tried copying the chromosomes back to the host after they are written and recalculating. They all get copied correctly. So the problem lies either in the conditional being executed incorrectly for gid > 0, or in chrome_local not holding the correct data for gid > 0. The address gets calculated properly as far as I know. I’ll try eliminating the conditional by using a lookup table. If that doesn’t fix it, chrome_local must not be copied correctly. Otherwise I guess I’m just crazy.
Okay, now I’ve changed the code to use a lookup table and it still doesn’t work…
totalChromOff = (gid*POP_SIZE + i)*CHROM_SIZE_BYTES;
sSize = 0;
// copy the chromosome
for (int n = 0; n < CHROM_SIZE_BYTES; n++) {
    chrome_loc[n] = InputChroms[totalChromOff + n];
    sSize += LookupTable[chrome_loc[n]];
}
OutputFitness[gid*POP_SIZE + i] = (float)sSize; // testing
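The contents of LookupTable are not shown above. As a sketch only, assuming it is meant to hold the number of set bits for each of the 256 possible byte values, it could be built and used on the CPU like this (build_lookup_table and count_with_table are hypothetical names):

```c
#include <stddef.h>

/* Fill a 256-entry table so that table[v] is the number of set bits in v.
   Uses the recurrence bits(v) = (v & 1) + bits(v >> 1). */
static void build_lookup_table(unsigned char table[256])
{
    table[0] = 0;
    for (unsigned int v = 1; v < 256; v++) {
        table[v] = (unsigned char)((v & 1u) + table[v >> 1]);
    }
}

/* CPU reference: set bits in the chromosome starting at byte `offset`. */
static unsigned int count_with_table(const unsigned char *chroms,
                                     size_t offset,
                                     size_t chrom_size_bytes,
                                     const unsigned char table[256])
{
    unsigned int size = 0;
    for (size_t n = 0; n < chrom_size_bytes; n++) {
        size += table[chroms[offset + n]];
    }
    return size;
}
```

A byte-wise table counts every bit in the chromosome, so it only agrees with the per-bit loop over numVerts if any padding bits past numVerts are zero.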
Here is my host code:
Buffer bufferMyChroms = Buffer(context, CL_MEM_READ_WRITE, CHROM_SIZE_BYTES*numTotalChromosomes * sizeof(cl_uchar));
...
queue.enqueueWriteBuffer(bufferMyChroms, CL_TRUE, 0, sizeChrom*numTotalChromosomes, chromosomes);
...
kernelGA.setArg(1, bufferMyChroms);
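The buffer is allocated with CHROM_SIZE_BYTES while the write uses sizeChrom, so one thing that may be worth checking is that the two agree and that every offset the kernel computes stays inside the buffer. A minimal host-side sketch in plain C, assuming the kernel's layout of (gid*POP_SIZE + i)*CHROM_SIZE_BYTES (the helper names are hypothetical):

```c
#include <stddef.h>

/* Byte offset of chromosome i of group gid, mirroring the kernel's
   (gid*POP_SIZE + i)*CHROM_SIZE_BYTES. */
static size_t chrom_offset(size_t gid, size_t i,
                           size_t pop_size, size_t chrom_size_bytes)
{
    return (gid * pop_size + i) * chrom_size_bytes;
}

/* Returns 1 if that chromosome lies entirely within buffer_bytes, else 0. */
static int chrom_in_bounds(size_t gid, size_t i,
                           size_t pop_size, size_t chrom_size_bytes,
                           size_t buffer_bytes)
{
    size_t start = chrom_offset(gid, i, pop_size, chrom_size_bytes);
    return start + chrom_size_bytes <= buffer_bytes;
}
```

Calling chrom_in_bounds for the largest gid and i, with buffer_bytes set to the byte count passed to enqueueWriteBuffer, would reveal whether some work-items read past the data that was actually uploaded.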