Type: Posts; User: Dithermaster

Page 1 of 9


  1. Replies: 3 | Views: 125

    Your kernel takes three arguments, int, float, & buffer
    You are calling setArg with buffer, buffer, & buffer
    The first two are wrong, you should be passing int & float (and also don't need to...
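For reference, the fix described above might look like this with the OpenCL C API's clSetKernelArg (a sketch only: the names kernel, n, scale, and buf are hypothetical, the kernel is assumed to be declared as `__kernel void foo(int n, float scale, __global float *buf)`, and error checking is elided):

```c
/* Argument index must match the kernel's parameter order. */
clSetKernelArg(kernel, 0, sizeof(cl_int),   &n);     /* the int, not a buffer  */
clSetKernelArg(kernel, 1, sizeof(cl_float), &scale); /* the float, not a buffer */
clSetKernelArg(kernel, 2, sizeof(cl_mem),   &buf);   /* the cl_mem buffer       */
```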
  2. No, they are not emulated.

    Your target platform is an AMD GPU and your kernel uses AMD-specific extensions, but your development system has an NVIDIA GPU that does not support them.

    Every GPU AMD...
  3. get_global_size is the same for all work items (aka threads). If you enqueued your kernel for 512 work items it would be 512.
    get_global_id is unique for each work item. If you enqueued your kernel...
  4. Short answer: No

    Long answer: You are trying to run code that uses AMD-specific extensions on an NVIDIA GPU, where they are not supported. You should either switch to an AMD GPU or re-write the...
  5. Replies: 3 | Views: 125

    General advice: Since the display driver needs to use the GPU, it kills your long-running compute process. When running OpenCL on the display GPU you should try to keep your kernels in the...
  6. Could your kernel execution time be dependent on the input data?
  7. Starting in OpenCL 1.1 (OpenCL 1.0 was not thread-safe).
  8. Replies: 1 | Views: 247

    Doing it just as you describe works fine for us on Windows. On Mac there were some issues so we always build from source. What platform are you on? Very sorry, but I can't share source, just confirm...
  9. Another thing to watch for: In CUDA the host specifies the block size and number of blocks. In OpenCL you specify the global size and optionally the block (workgroup) size.
  10. Replies: 2 | Views: 453

    Exactly. OpenCL isn't good for a low-latency short calculation. The buffer and command queue overhead would always take longer than just doing the simple calculation on the host. OpenCL is about...
  11. The AMD fast path is mentioned in their optimization guide: http://developer.amd.com/tools-and-sdks/opencl-zone/amd-accelerated-parallel-processing-app-sdk/opencl-optimization-guide/
  12. It is a mystery to us too since we can't see the code. One possibility is that the compiler doesn't know the condition is always false (perhaps it is passed in as a kernel argument) and your...
  13. I haven't used the physics packages so I'm no help there, sorry.
  14. First time someone named me in a thread title <g>

    I've had both NVIDIA and AMD cards in my HP Z820 for some time (different versions of each over time) as a way to confirm our kernels compile and...
  15. Replies: 2 | Views: 409

    @gregorstopar, all of the errors are defined in cl.h

    When you get CL_BUILD_PROGRAM_FAILURE you should get the build log using clGetProgramBuildInfo with CL_PROGRAM_BUILD_LOG so you know what the...
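The build-log retrieval described above might look like this in C (a sketch: program and device are assumed to be an existing cl_program and cl_device_id, and error checking is elided):

```c
/* First call gets the log size, second call fetches the text. */
size_t log_size = 0;
clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG,
                      0, NULL, &log_size);

char *log = malloc(log_size);
clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG,
                      log_size, log, NULL);
fprintf(stderr, "build log:\n%s\n", log);
free(log);
```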
  16. You are doing extensive buffer manipulation in the host code, and this is very likely the source of your performance problems. The host code should only initialize the buffers and get the final...
  17. The OpenCL specification has a section that describes exactly what read_imagef does for every sampler type, including interpolation. It would not be hard to write a replacement (in fact, the CPU...
  18. Two things: 1) the local work group size area (width * height) cannot be larger than what CL_DEVICE_MAX_WORK_GROUP_SIZE returns (which I've seen as small as 128 on older hardware; 32x32 is larger...
  19. Replies: 2 | Views: 1,049

    Some of the Amazon GPU instances have high-end NVIDIA GPUs which support OpenCL. I am not affiliated, just sharing.
  20. Replies: 3 | Views: 993

    It depends. If the array rarely changes and you need to access it a lot, make a copy of it in i,j (instead of j,i) order. Alternatively, store it in an OpenCL image instead, which has more fair...
  21. Replies: 5 | Views: 1,063

    Of course. For example, any system header that includes file system access, system clock access, stdio access, etc. None of these can be accessed from the device.
  22. The benefit of shared local memory is if many work items in a work group need to access the same memory at different times (for example, a matrix multiply). If each of your work items accesses...
  23. Replies: 5 | Views: 1,063

    Something you are #include'ing is trying to #include stdarg.h, which is likely not compatible with OpenCL C99. Check your includes (and their includes) to find the culprit.
  24. Replies: 1 | Views: 645

    Intel has some tools which can measure CPU and GPU power usage. I'm not aware of anything off the top of my head for AMD or NVIDIA.
  25. Replies: 5 | Views: 774

    Is there even a use case for pipes on CPU or GPU devices (that is more efficient or less code than just using global memory or images between kernels), or do they exist just for FPGA devices?
Results 1 to 25 of 222