Question about Kernel Argument Speed

    Question about Kernel Argument Speed

    Would it be faster to declare variables in my kernel and assign the passed arguments to them if I'm going to be using them a lot?

    Or is using the arguments directly just as fast?

    For example:

    If I pass a float as an argument and it is located in memory, would using that memory directly be just as fast as making a local variable in the kernel and assigning the value to it?

    I guess the real question is do stream processors have local cache memory or register memory, and if they do, do kernels use it?

    My 8800GTS is supposed to get 200+ gigaflops and I'm getting about 1.6, lol. I know I won't get anywhere near 200, since my algorithm does much more than just floating-point operations, but 1.6 compared to 200... seems like my kernels could be sped up a bit.

    Re: Question about Kernel Argument Speed

    The first step of performance tuning in any language is measuring where time is being spent.

    You mention you are using an NVIDIA platform. Why not give Visual Profiler a look? (The page seems to be down, maybe due to AWS's downtime.)
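
    If a profiler isn't at hand, a rough number can also be had by timing the kernel yourself. In OpenCL the start and end timestamps would typically come from clGetEventProfilingInfo (CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END, in nanoseconds) on a queue created with CL_QUEUE_PROFILING_ENABLE. Below is a minimal sketch of only the arithmetic that turns those timestamps into a GFLOP/s figure like the 1.6 you quoted. The helper names elapsed_seconds and achieved_gflops are made up for illustration, and the operation count is assumed to be something you tally by hand for your own algorithm.

```c
#include <stdint.h>

/* Elapsed time in seconds between two device timestamps given in
   nanoseconds (the unit OpenCL event profiling reports).
   Returns 0.0 if the timestamps are not in increasing order. */
static double elapsed_seconds(uint64_t start_ns, uint64_t end_ns)
{
    if (end_ns <= start_ns) {
        return 0.0;
    }
    return (double)(end_ns - start_ns) * 1e-9;
}

/* Achieved throughput in GFLOP/s for a kernel that performed
   flop_count floating-point operations (hand-counted, an assumption)
   between the two timestamps. Returns 0.0 for an empty interval. */
static double achieved_gflops(double flop_count,
                              uint64_t start_ns, uint64_t end_ns)
{
    double seconds = elapsed_seconds(start_ns, end_ns);
    if (seconds <= 0.0) {
        return 0.0;
    }
    return flop_count / seconds / 1e9;
}
```

    For instance, achieved_gflops(1.6e9, 0, 1000000000) describes 1.6 billion operations in one second, which comes out at roughly 1.6 GFLOP/s. Comparing that figure per kernel shows which one to look at first.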

    It's also a good idea to read some general guides on how to write OpenCL code, such as NVIDIA's OpenCL programming guide or AMD's OpenCL programming guide.

    As for the other questions, I can't quite make sense of them. Try rephrasing them in terms of "kernel arguments", "kernel scope variables", "global memory", "local memory", etc.
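
    To make those terms concrete, here is a small illustrative kernel, held as a host-side C string the way source is usually passed to clCreateProgramWithSource. It is only a sketch: the kernel name demo and all parameter names are invented, and it makes no claim about which variant is faster. Pointer arguments marked __global refer to buffers in global memory, the __local pointer refers to scratch space shared by one work-group, and a variable declared inside the kernel body (f here) is a kernel-scope variable in the private address space. Compilers commonly keep such variables in registers, but that is implementation-dependent.

```c
/* OpenCL C kernel source embedded as a string constant.
   All identifiers inside it are hypothetical, chosen for illustration. */
static const char *kernel_source =
    "__kernel void demo(__global const float *in,  /* global memory buffer */\n"
    "                   __global float *out,       /* global memory buffer */\n"
    "                   const float factor,        /* by-value kernel argument */\n"
    "                   __local float *scratch)    /* local memory, per work-group */\n"
    "{\n"
    "    const size_t gid  = get_global_id(0);\n"
    "    const size_t lid  = get_local_id(0);\n"
    "    const size_t next = (lid + 1) % get_local_size(0);\n"
    "\n"
    "    /* Kernel-scope (private) variable holding a copy of the argument. */\n"
    "    const float f = factor;\n"
    "\n"
    "    /* Stage one element per work-item in local memory. */\n"
    "    scratch[lid] = in[gid];\n"
    "\n"
    "    /* Wait until every work-item in the group has written its slot,\n"
    "       because each one reads a neighbour's slot below. */\n"
    "    barrier(CLK_LOCAL_MEM_FENCE);\n"
    "\n"
    "    out[gid] = f * scratch[next];\n"
    "}\n";
```

    With that vocabulary, the original question becomes: is reading the by-value argument factor directly any slower than reading the private copy f?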
