Would it be faster to declare local variables in my kernel and copy the kernel arguments into them if I'm going to be using them a lot?
Or is using the arguments directly just as fast?
For example, if I pass a float as an argument and it is located in memory, would it be just as fast to read it from there every time as to declare a local variable in the kernel and assign the argument's value to it?
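To make the question concrete, here is a minimal sketch of the two variants being compared. Everything in it is hypothetical and only for illustration: the kernel names (`scaleDirect`, `scaleLocalCopy`), the arithmetic, and the array size are assumptions, not the actual algorithm. It only checks that both variants produce the same result; it does not measure which one is faster.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Variant A (hypothetical): use the kernel argument directly each time.
__global__ void scaleDirect(float *out, const float *in, float scale, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = in[i] * scale + scale * scale;
    }
}

// Variant B (hypothetical): copy the argument into a local variable first.
__global__ void scaleLocalCopy(float *out, const float *in, float scale, int n)
{
    float s = scale;  // the local copy the question is about
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = in[i] * s + s * s;
    }
}

int main()
{
    const int n = 1 << 20;                 // assumed problem size
    const size_t bytes = n * sizeof(float);
    const float scale = 1.5f;              // assumed argument value

    std::vector<float> hIn(n), hA(n), hB(n);
    for (int i = 0; i < n; ++i) {
        hIn[i] = static_cast<float>(i % 100);
    }

    float *dIn = nullptr, *dA = nullptr, *dB = nullptr;
    cudaMalloc(&dIn, bytes);
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMemcpy(dIn, hIn.data(), bytes, cudaMemcpyHostToDevice);

    const int block = 256;
    const int grid = (n + block - 1) / block;
    scaleDirect<<<grid, block>>>(dA, dIn, scale, n);
    scaleLocalCopy<<<grid, block>>>(dB, dIn, scale, n);

    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaMemcpy(hA.data(), dA, bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(hB.data(), dB, bytes, cudaMemcpyDeviceToHost);

    // Count elements where the two variants disagree.
    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        if (hA[i] != hB[i]) {
            ++mismatches;
        }
    }
    std::printf("mismatches between variants: %d\n", mismatches);

    cudaFree(dIn);
    cudaFree(dA);
    cudaFree(dB);
    return 0;
}
```

Timing each launch separately (for example with CUDA events) would be the way to see whether the local copy makes any measurable difference on a given card.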
I guess the real question is: do stream processors have local cache memory or register memory, and if they do, do kernels use it?
My 8800GTS is supposed to get 200+ gigaflops, and I'm getting about 1.6, lol. I know I won't get anywhere near 200, since my algorithm does much more than just floating-point operations, but 1.6 compared to 200 makes it seem like my kernels could be sped up quite a bit.