Hello,
I’m trying to solve the following problem: I need to create a tiled version of an image where, per tile, I have to find the min/max luminance value.
To start, I’m using a 1024x512 image with 16x16 tiles.
Based on what I’ve been reading, this is the kernel I came up with, using barriers to synchronize the threads within a work-group. I assumed that by synchronizing and storing the min and max in __local variables, the threads could arrive at a correct result, but that’s not the case… The output changes every frame, and I can’t understand what values it’s actually producing.
On the host side, here’s what I’m doing (1024*512 work items in work-groups of 16x16):
clProgram->Set( 0, clTexMem ); // 1024x512 res input
clProgram->Set( 1, clMinMaxLumMem ); // 64x32 res output
clProgram->Set( 2, 0.0f ); // default min value
clProgram->Set( 3, 1.0f ); // default max value
clProgram->Set( 4, tileSize ); // tile size (default: 16)
clProgram->Set( 5, AppParams::showTileMaxZ?1:0 ); // a condition in the kernel that renders the min OR max just for debugging
int workItemsX = Math::Min( (int)clDevice->info.maxWorkGroupSize, tex->GetWidth()/tileSize );
int workItemsY = Math::Min( (int)clDevice->info.maxWorkGroupSize, tex->GetHeight()/tileSize );
clDevice->Run2D( workItemsX*tileSize, workItemsY*tileSize, tileSize, tileSize );
Here’s the kernel code:
__kernel void Image_GetMinMaxLum( read_only image2d_t srcImage, write_only image2d_t dstImage, float nearClipPlane, float farClipPlane, int tileSize, int showTileMaxZ )
{
    const float3 lumColor = (float3)( 0.30, 0.59, 0.11 );
    int uStart = get_global_id(0);
    int vStart = get_global_id(1);
    int lx = get_local_id(0);
    int ly = get_local_id(1);
    float4 color;
    // input image coords
    int2 coords = (int2)( uStart, vStart );
    // output image coords
    int2 coordsDst = (int2)( get_global_id(0)/tileSize, get_global_id(1)/tileSize );
    // local work-group vars that store min/max values
    __local float minZ;
    __local float maxZ;
    if( showTileMaxZ > 0 )
    {
        maxZ = nearClipPlane;
        color = read_imagef( srcImage, CLK_ADDRESS_CLAMP_TO_EDGE | CLK_FILTER_NEAREST, coords );
        float Y = dot( lumColor, color.rgb ); // luminance
        maxZ = max( maxZ, Y );
        color.rgb = (float3)( maxZ );
    }
    // barrier( CLK_LOCAL_MEM_FENCE );
    if( !showTileMaxZ )
    {
        // attempt to reset minZ when the first thread runs (assuming it's the first one that runs??)
        // if( (lx+ly) == 0 )
        minZ = farClipPlane;
        color = read_imagef( srcImage, CLK_ADDRESS_CLAMP_TO_EDGE | CLK_FILTER_NEAREST, coords );
        float Y = dot( lumColor, color.rgb ); // luminance
        minZ = min( minZ, Y );
        color.rgb = (float3)( minZ );
    }
    barrier( CLK_LOCAL_MEM_FENCE );
    write_imagef( dstImage, coordsDst, color );
}
I’m pretty new to OpenCL, but the tests above were all based on what I’ve read online and in sources found around the web.
I have to say that I did a test with FOR loops and it seems to work, but that’s probably not the way to do it.
Basically, it’s a 16x16 loop that computes the min/max over a whole block of pixels and saves it. With this method I create fewer work items, but each one also does 16x16 times as many image reads.
I’m looking for advice on this; hopefully someone can help?
Thanks
f