Thread: slow addition of vector-components using float16 ?!

  1. #1
    Join Date
    Jul 2013


    I tried to optimize my kernel by using "float8" and "float16" instead of "float4".
    The system is Windows XP (32-bit) with OpenCL 1.2 on an AMD Athlon X2 250 and a Radeon 6750. (Testing on an AMD A6 3450M APU shows the same behavior.)

    After working around the fact that <vectorname>.s[<index>] is unsupported (why?), I am stuck on the following problem:

    I need to add up all components of a vector, so I did it with the following line (part of an n-body simulation):

    Code :
    barrier(CLK_GLOBAL_MEM_FENCE); // wait until every work-item has finished
    vx[tid] += dt * (; // adding all 16 components
    The kernel runs as expected, but very slowly (compared to "float").

    After some debugging I changed the line of code to
    Code :
    vx[tid] += dt * (Fx.s0+Fx.s1+Fx.s2+Fx.s3+Fx.s4+Fx.s5+Fx.s6+Fx.s7);   //;
    This doubles the execution speed of the kernel! (Note that above these lines there is a loop computing millions of square roots with float16 without any timing problems.) Why does a "cheap" addition slow the kernel down that much?

    Is there a function that adds the components of a vector fast(er), or what can I do to avoid this strange behavior?
    Last edited by multifitter; 07-21-2013 at 12:08 AM.
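
One pattern sometimes suggested for the question above is a pairwise (tree) reduction instead of a single long chain of additions. In OpenCL C this can be written with the built-in `.lo` and `.hi` half-vector selectors. The sketch below emulates that order of operations in plain C on a 16-element array so it can be compiled and checked on the host. The function name `sum16_pairwise` is hypothetical, and whether the pattern is actually faster on the Radeon 6750 is an assumption that would have to be measured.

```c
/* Hypothetical helper: sums 16 floats pairwise, mirroring this
 * OpenCL C sequence for a float16 named Fx:
 *
 *     float8 a = Fx.lo + Fx.hi;   // s0..s7 + s8..sF
 *     float4 b = a.lo  + a.hi;
 *     float2 c = b.lo  + b.hi;
 *     float  s = c.x   + c.y;
 *
 * The additions within each step are independent of one another,
 * unlike a single chain s0 + s1 + ... + sF.
 */
float sum16_pairwise(const float v[16])
{
    float a[8], b[4], c[2];
    int i;

    for (i = 0; i < 8; ++i)      /* lower half + upper half of 16 */
        a[i] = v[i] + v[i + 8];
    for (i = 0; i < 4; ++i)      /* lower half + upper half of 8 */
        b[i] = a[i] + a[i + 4];
    for (i = 0; i < 2; ++i)      /* lower half + upper half of 4 */
        c[i] = b[i] + b[i + 2];

    return c[0] + c[1];          /* final pair */
}
```

In the kernel itself, the four commented OpenCL C lines would replace the long sum, followed by something like `vx[tid] += dt * s;`. Because floating-point addition is not associative, the result can differ in the last bits from the left-to-right sum.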
