  1. Replies

    Thanks for looking at the ISA, Salabar. So, it looks like this is just a convenience method on AMD GCN. Good to know :)
  2. Replies


    Does anyone have experience with the performance advantage of using async_work_group_copy() over regular coalesced
    reads of global memory?

    I tested this method on an AMD GCN 1.0 card, and found...
  3. Most efficient way of converting a byte array to a vector

    What is the most efficient way of converting an array of 16 bytes into
    a uint4 vector? Currently, I manually OR the bytes into uints, then set
    the vector's components from the completed uints.
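    A minimal host-side sketch in C of the approach described above, assuming the bytes are packed little-endian (byte 0 is the least significant byte of the first component). `uint4_host` is a hypothetical stand-in for OpenCL's `uint4`. It also shows that on a little-endian machine the shift-and-OR result equals a plain byte copy. Inside an OpenCL C kernel the analogous reinterpretation is `as_uint4(vload16(0, bytes))`, which may let the compiler skip the per-byte shifts; whether it is actually faster would need to be measured on the target device.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical host-side stand-in for OpenCL's uint4. */
typedef struct { uint32_t s[4]; } uint4_host;

/* Approach from the post: OR four bytes into each uint.
   Assumes little-endian packing: b[4*i] is the least significant byte of s[i]. */
static uint4_host pack_by_shifts(const uint8_t b[16]) {
    uint4_host v;
    for (int i = 0; i < 4; ++i) {
        v.s[i] = (uint32_t)b[4 * i]
               | ((uint32_t)b[4 * i + 1] << 8)
               | ((uint32_t)b[4 * i + 2] << 16)
               | ((uint32_t)b[4 * i + 3] << 24);
    }
    return v;
}

/* Reinterpretation: copy the 16 bytes unchanged. This matches
   pack_by_shifts only on a little-endian machine. */
static uint4_host pack_by_copy(const uint8_t b[16]) {
    uint4_host v;
    memcpy(v.s, b, sizeof v.s);
    return v;
}
```

    The shift version is portable across byte orders; the copy version avoids the arithmetic but bakes in the machine's endianness.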
  4. Is there an easy way to clear a command queue of kernels in a wait state?

    In my application, I enqueue many kernels and wait for certain events before executing these kernels.
    If my application has to stop for some reason, then the command queue is filled with waiting...
  5. Replies

    Unit testing framework for OpenCL?

    Is anyone familiar with a unit testing framework for OpenCL, similar to JUnit?
  6. Replies

    Bank conflicts in 2D kernel

    Suppose our hardware has 32 banks, each 4 bytes wide, and we have a 1D kernel
    of size 32 and a local 1D array of ints.

    Then, ensuring that each consecutive thread accesses consecutive...
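    A rough host-side model in C of the setup described above (32 banks, each 4 bytes wide): it computes which bank serves each work item's int when work item i reads element i * stride, and reports the worst-case number of work items hitting one bank. The function names are hypothetical, and the model assumes every work item reads a distinct word (stride >= 1), so it ignores the broadcast that many GPUs apply when work items read the same address.

```c
#include <stddef.h>
#include <stdint.h>

/* Hardware model assumed in the post: 32 banks, each 4 bytes wide. */
#define NUM_BANKS 32u
#define BANK_WIDTH_BYTES 4u

/* Bank that serves a given byte address in local memory. */
static unsigned bank_of(size_t byte_addr) {
    return (unsigned)((byte_addr / BANK_WIDTH_BYTES) % NUM_BANKS);
}

/* Worst-case number of work items that hit the same bank when work item i
   reads the 32-bit int at index i * stride (stride >= 1, so every work item
   reads a distinct word; broadcast of identical addresses is not modeled). */
static unsigned max_bank_conflict(size_t num_items, size_t stride) {
    unsigned hits[NUM_BANKS] = {0};
    unsigned worst = 0;
    for (size_t i = 0; i < num_items; ++i) {
        unsigned bank = bank_of(i * stride * sizeof(int32_t));
        hits[bank]++;
        if (hits[bank] > worst) {
            worst = hits[bank];
        }
    }
    return worst;
}
```

    With a stride of 1, each of the 32 work items lands in its own bank; a stride of 2 doubles up on the even banks, and a stride of 32 sends all 32 accesses to bank 0.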
  7. Thread: GPU vs FPGA

    by boxerab

    Thanks, Dithermaster. One interesting development: Intel is planning a Xeon chip with an integrated FPGA. Should be interesting.
  8. How to optimize a kernel with a mixture of parallel and serial code?

    I have a kernel that performs two tasks (A followed by B): the first is quite parallel, and the second cannot be parallelized.

    Task A is performed by all work items, and task B is only...
  9. Thread: GPU vs FPGA

    by boxerab

    GPU vs FPGA

    So far, I have only considered GPU platforms when developing my kernel,
    but I just learned that the two largest FPGA manufacturers, Xilinx and Altera,
    now have OpenCL SDKs.

    Can anyone...
  10. Thanks, Dithermaster. Makes sense.

  11. Impact of PCIe bus speed on OpenCL performance

    PCIe 4 is expected in 2016. Can anyone comment on the impact this will have
    on GPGPU performance? For gaming, I have read that PCIe 3.0 performs about
    the same as PCIe 2.0.
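    A back-of-the-envelope sketch in C of the theoretical per-direction link bandwidth, using the published per-lane signaling rates and line encodings (PCIe 2.0: 5 GT/s with 8b/10b; PCIe 3.0: 8 GT/s with 128b/130b; PCIe 4.0: 16 GT/s with 128b/130b). The helper names are hypothetical, and these are peak figures that ignore protocol overhead and latency; whether a faster bus helps a given OpenCL workload depends on how the transfer time compares with kernel execution time.

```c
/* Theoretical per-direction bandwidth of a PCIe link in GB/s (1 GB = 1e9 bytes).
   gt_per_s: signaling rate per lane in gigatransfers per second.
   payload_bits / encoded_bits: line-encoding efficiency (e.g. 8/10 or 128/130). */
static double pcie_bandwidth_gbs(double gt_per_s, double payload_bits,
                                 double encoded_bits, int lanes) {
    /* One bit per transfer per lane, minus line-encoding overhead. */
    double gbit_per_lane = gt_per_s * payload_bits / encoded_bits;
    return gbit_per_lane * lanes / 8.0; /* bits -> bytes */
}

/* Milliseconds to move `bytes` at `gbs` GB/s, ignoring latency and protocol overhead. */
static double transfer_ms(double bytes, double gbs) {
    return bytes / (gbs * 1e9) * 1e3;
}
```

    For an x16 link this gives about 8.0, 15.75, and 31.5 GB/s respectively, so moving 1 GB takes roughly 125 ms, 63.5 ms, and 31.7 ms at peak. A kernel that runs for seconds on data uploaded once will barely notice the difference; one that streams data across the bus on every launch may.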
  12. Tried this out on an HD 7700 series GPU: the best performance came from individual loads, not vloadn.
  13. Thanks, kunze. Now, what about bank conflicts? If work item one issues memory reads from address 0 to address 4, and
    the next work item reads from address 1 to address 5, then the individual reads...
  14. vload4 vs. four buffer accesses for a local memory buffer

    Does vload4 have any advantage over four individual buffer accesses for a local memory buffer?


    __local int FOO[256];

    // case...
  15. Replies

    Sticky: Would like to target a system with an Intel CPU and an AMD GPU

    Hello List,
    I would like to be able to load-balance my algorithm across both an Intel CPU and an AMD GPU
    at the same time.

    Now, the Intel SDK supports Intel hardware, and the AMD SDK supports AMD hardware.
