Device Function Inlining

Hi,

I was wondering whether there is any information about how device functions are treated by the CL compiler. I’ve read that all function calls made from kernels are inlined; is this true? If so, are arguments passed by value copied, or substituted directly?

inline void foo_ptr(MyStruct *a) { a->b = 0; }
inline void foo_val(MyStruct a) { a.b = 0; } // equivalent?

Thanks for any information.

It depends on the device type. GPUs generally do not have a call stack, and their ISAs typically have nothing similar to a CPU's CALL instruction. This is why recursion is not supported in OpenCL C. On these devices everything is indeed inlined, even without the inline keyword. On CPU devices, however, the compiler is free to decide whether to inline a function or emit a real call.
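As a rough mental model for the "copied or substituted" question: inlining pastes the callee's body into the caller, and each by-value parameter becomes a fresh local variable initialised from the argument. The sketch below is plain C, not actual compiler output; `MyStruct`, `scale_val`, `caller_called`, `caller_inlined` and `check_equivalent` are hypothetical names. OpenCL C follows C99 parameter-passing rules, so the semantics carry over, but what a given OpenCL compiler actually emits may differ.

```c
#include <assert.h>

/* Hypothetical struct mirroring MyStruct from the question. */
typedef struct { int b; } MyStruct;

/* Callee that takes its argument by value and modifies the copy. */
static inline int scale_val(MyStruct a)
{
    a.b *= 4;          /* changes only the callee's copy */
    return a.b;
}

/* Caller that makes a real call to scale_val. */
static int caller_called(const MyStruct *s)
{
    return scale_val(*s);
}

/* The same caller with scale_val inlined by hand: the by-value
 * parameter becomes a local variable initialised from the argument. */
static int caller_inlined(const MyStruct *s)
{
    MyStruct a = *s;   /* the "copy" is just a local; *s is never written */
    a.b *= 4;
    return a.b;
}

/* Both forms return the same value and leave the caller's struct unchanged. */
static void check_equivalent(int b)
{
    MyStruct s = { b };
    assert(caller_called(&s) == caller_inlined(&s));
    assert(caller_called(&s) == b * 4);
    assert(s.b == b);
}
```

Semantically the argument is always copied; whether that copy physically survives optimisation is up to the compiler, and since nothing can observe it, it is usually eliminated.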

Regarding your example: the two functions are not equivalent. Passing the struct by value means only the local copy of “a” is modified. As a result, the second function has no side effects and can be optimized out entirely.
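A minimal plain-C sketch of the difference, assuming `MyStruct` has an `int` field `b` as in the question; `clear_via_ptr`, `clear_via_val`, `b_after_ptr`, `b_after_val` and `demo` are hypothetical names. The pointer version changes the caller's struct, while the by-value version only touches its own copy.

```c
#include <assert.h>

/* Hypothetical struct mirroring MyStruct from the question. */
typedef struct { int b; } MyStruct;

/* Writes through the pointer: the caller's struct is modified. */
static inline void clear_via_ptr(MyStruct *a) { a->b = 0; }

/* Writes to a by-value copy: the caller never sees the change, so the
 * call has no observable effect and an optimiser may delete it. */
static inline void clear_via_val(MyStruct a) { a.b = 0; }

/* Value of s.b after calling each variant on a struct initialised to b. */
static int b_after_ptr(int b) { MyStruct s = { b }; clear_via_ptr(&s); return s.b; }
static int b_after_val(int b) { MyStruct s = { b }; clear_via_val(s); return s.b; }

static void demo(void)
{
    assert(b_after_ptr(7) == 0);  /* pointer version cleared the field */
    assert(b_after_val(7) == 7);  /* by-value version left it untouched */
}
```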

inline int foo_ptr(MyStruct *a) { return a->b * 4; }
inline int foo_val(MyStruct a) { return a.b * 4; }

Here a call to either function will produce the same behavior and, probably, the same ISA. But it is impossible to say exactly what happens: OpenCL C is compiled to LLVM IR and goes through a number of optimisation passes (at least for AMD and NVIDIA), so the final result is hard to predict from the source alone.
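For the read-only case, a small plain-C check that the pointer and by-value variants must agree for every input, because neither writes to its argument. `times4_ptr`, `times4_val` and `check_same` are hypothetical names, and the `const` on the pointer parameter is an addition not in the snippet above; whether both really compile to identical ISA is not guaranteed and depends on the optimiser.

```c
#include <assert.h>

/* Hypothetical struct mirroring MyStruct from the question. */
typedef struct { int b; } MyStruct;

/* Read-only access through a pointer. */
static inline int times4_ptr(const MyStruct *a) { return a->b * 4; }

/* Read-only access through a by-value copy. */
static inline int times4_val(MyStruct a) { return a.b * 4; }

/* Neither variant writes to its argument, so both must return the same
 * value and leave the caller's struct unchanged. */
static void check_same(int b)
{
    MyStruct s = { b };
    assert(times4_ptr(&s) == times4_val(s));
    assert(s.b == b);
}
```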

[QUOTE=Salabar;40866]It depends on the device type. …[/QUOTE]

Thank you very much for the response. This behavior is similar to what I had heard about CL compilers. I was curious how aggressive the inlining was, even when a function modifies a by-value copy of its argument. Thanks for confirming this.