Not sure if this is an implementation bug or a problem on my end, and I’m stuck, so to the forums!
I’m still making some simple benchmarking functions for OpenCL, currently working on image samplers. The mini kernel in question copies an image to a buffer iteratively (yes, I know it would be more efficient to copy one pixel per work-item; this is to test instruction latency):
Disregarding the confusion between “__constant” and “const” that ATI creates, which is currently under discussion in another topic…
I can compile this fine within ATI Stream Kernel Analyzer, but in my program I get the following errors:
C:\Users\agimenez\AppData\Local\Temp\OCL3DB9.tmp.obj:fake:(.text+0xf): undefined reference to `__get_image_width_image2d'
C:\Users\agimenez\AppData\Local\Temp\OCL3DB9.tmp.obj:fake:(.text+0x24): undefined reference to `__get_image_width_image2d'
C:\Users\agimenez\AppData\Local\Temp\OCL3DB9.tmp.obj:fake:(.text+0x31): undefined reference to `__get_image_width_image2d'
C:\Users\agimenez\AppData\Local\Temp\OCL3DB9.tmp.obj:fake:(.text+0x68): undefined reference to `__read_imagei_image2d2i32'
C:\Users\agimenez\AppData\Local\Temp\OCL3DB9.tmp.obj:fake:(.text+0x78): undefined reference to `__get_image_width_image2d'
C:\Users\agimenez\AppData\Local\Temp\OCL3DB9.tmp.obj:fake:(.text+0xa4): undefined reference to `__get_image_width_image2d'
C:\Users\agimenez\AppData\Local\Temp\OCL3DB9.tmp.obj:fake:(.text+0xb4): undefined reference to `__get_image_width_image2d'
C:\Users\agimenez\AppData\Local\Temp\OCL3DB9.tmp.obj:fake:(.text+0xe8): undefined reference to `__read_imagei_image2d2i32'
C:\Users\agimenez\AppData\Local\Temp\OCL3DB9.tmp.obj:fake:(.text+0xf8): undefined reference to `__get_image_width_image2d'
C:\Users\agimenez\AppData\Local\Temp\OCL3DB9.tmp.obj:fake:(.text+0x105): undefined reference to `__get_image_width_image2d'
No idea where to go from these errors… Is ATI the issue, or am I missing something else?
There is also an error with the values in the sampler that you are using with read_imagei.
The sampler you are using with read_imagei in your kernel is defined to be: (CLK_NORMALIZED_COORDS_TRUE | CLK_ADDRESS_REPEAT | CLK_FILTER_NEAREST).
The spec states the following for read_imagei and read_imageui:
"The read_image{i|ui} calls support a nearest filter only. The filter_mode specified in sampler
must be set to CLK_FILTER_NEAREST; otherwise the values returned are undefined.
Furthermore, the read_image{i|ui} calls that take integer coordinates must use a sampler with
normalized coordinates set to CLK_NORMALIZED_COORDS_FALSE and addressing mode set to
CLK_ADDRESS_CLAMP_TO_EDGE, CLK_ADDRESS_CLAMP or CLK_ADDRESS_NONE; otherwise the values returned are undefined."
You cannot use CLK_NORMALIZED_COORDS_TRUE and CLK_ADDRESS_REPEAT, as your sampler does, when calling read_imagei with integer coordinates. In your kernel it looks like you are passing unnormalized coordinates, but the sampler is set up to use normalized coordinates.
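To make the quoted rule concrete, here is a minimal sketch in plain C of the check it implies. The enums and the sampler_desc struct are stand-ins I made up for illustration; they are not the types or values from the OpenCL headers. The comment at the top shows what a compliant sampler declaration would look like in kernel code, assuming the kernel keeps using integer coordinates.

```c
#include <stdbool.h>

/* In kernel code, a sampler that satisfies the quoted rules could be declared as:
 *   const sampler_t smp = CLK_NORMALIZED_COORDS_FALSE |
 *                         CLK_ADDRESS_CLAMP_TO_EDGE   |
 *                         CLK_FILTER_NEAREST;
 * Everything below is a host-side model of that rule using stand-in enums
 * (hypothetical names, NOT the values defined by the OpenCL headers). */
typedef enum { COORDS_UNNORMALIZED, COORDS_NORMALIZED } coords_mode;
typedef enum { ADDR_NONE, ADDR_CLAMP_TO_EDGE, ADDR_CLAMP, ADDR_REPEAT } addr_mode;
typedef enum { FILTER_NEAREST, FILTER_LINEAR } filter_mode;

typedef struct {
    coords_mode coords;
    addr_mode   addr;
    filter_mode filter;
} sampler_desc;

/* Returns true when the sampler meets the quoted requirements for
 * read_imagei/read_imageui called with integer coordinates. */
static bool sampler_ok_for_int_coords(sampler_desc s)
{
    if (s.filter != FILTER_NEAREST)      return false; /* nearest filter only */
    if (s.coords != COORDS_UNNORMALIZED) return false; /* no normalized coords */
    return s.addr == ADDR_CLAMP_TO_EDGE ||
           s.addr == ADDR_CLAMP ||
           s.addr == ADDR_NONE;                        /* REPEAT is rejected */
}
```

With this model, the sampler from the original post (normalized, repeat, nearest) is rejected, while an unnormalized clamp-to-edge nearest sampler passes.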
Given those error messages I’m inclined to think that the problem is in your host program, not in your kernel. They look more like linker errors than compiler errors. Perhaps you are trying to use this kernel on a device with no image support?
Andrew, looks like that was the issue. I was under the impression that I’d get an error from clCreateImage2D if image support was not available, but because I have several devices with different capabilities, things get confused…
I’m creating an OCL device for all available hardware (one CPU, two GPUs), and if I disable the CPU device (which doesn’t have image support), it compiles, but gives me an access violation when I run it. This looks like a problem on my end with buffer sizes; I will post later if it turns out to be something more involved.
So the big problem was that I have three devices (all held within a private class), two of which have image support and various extensions, and one of which has no image support and more limited extensions. Looks like I’ll have to run the kernels device-specifically for “special” instructions like images or atomic functions.
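For anyone hitting the same thing, here is a rough sketch in plain C of the per-device selection described above. The device_caps struct and select_image_devices are hypothetical names for illustration. In real host code the image_support flag would come from clGetDeviceInfo with CL_DEVICE_IMAGE_SUPPORT, and the selected devices could then be passed as the device list to clBuildProgram so the image kernel is only built for hardware that can run it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device record. In a real host program the flag would be
 * filled from clGetDeviceInfo(device, CL_DEVICE_IMAGE_SUPPORT, ...). */
typedef struct {
    const char *name;
    bool        image_support;
} device_caps;

/* Writes the indices of image-capable devices into out (which must have
 * room for n entries) and returns how many were found. */
static size_t select_image_devices(const device_caps *devs, size_t n, size_t *out)
{
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        if (devs[i].image_support) {
            out[count++] = i;   /* keep this device for image kernels */
        }
    }
    return count;
}
```

With one CPU lacking image support and two image-capable GPUs, this keeps only the two GPU indices, which matches the workaround of disabling the CPU device for the image kernel.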