Search:

Type: Posts; User: utnapishtim



  1. Replies
    3
    Views
    730

    Atomic operations can be very slow on GPUs...

    Atomic operations can be very slow on GPUs (on NVIDIA hardware before Kepler, for instance), so we try to avoid them if they occur frequently in a kernel.

    Recent AMD GPUs have hardware atomic counters...
  2. Replies
    3
    Views
    730

    A simple way (although not the most efficient if...

    A simple way (although not the most efficient if most of the pixels are to be returned) is to use an atomic counter.
    Your device has to support the cl_khr_global_int32_base_atomics extension.
    ...
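A minimal host-side sketch of the atomic-counter idea above, written in C11 rather than OpenCL C so that it runs without a device. The function name `compact_selected`, the threshold predicate and the sequential loop standing in for work-items are all assumptions made for illustration. In a kernel, the increment would typically be `atomic_inc()` (or `atom_inc()` with the OpenCL 1.0 extension) on a `__global` counter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical helper: each loop iteration plays the role of one work-item.
 * A "pixel" is kept when its value exceeds `threshold` (illustrative predicate). */
unsigned compact_selected(const int *pixels, size_t n, int threshold,
                          int *out, size_t out_capacity)
{
    atomic_uint counter;
    atomic_init(&counter, 0u);

    for (size_t i = 0; i < n; ++i) {
        if (pixels[i] > threshold) {
            /* Reserve a unique output slot; the call returns the value
             * the counter held before the increment. */
            unsigned slot = atomic_fetch_add(&counter, 1u);
            assert(slot < out_capacity);
            out[slot] = pixels[i];
        }
    }
    /* Number of pixels written to `out`. */
    return atomic_load(&counter);
}
```

Here the output keeps the input order only because the loop is sequential. On a real device the work-items race for slots, so the order of the returned pixels should be treated as unspecified.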
  3. Your kernel must not modify the w coordinate....

    Your kernel must not modify the w coordinate. Remember that a 4D-point (x, y, z, w) is equivalent to a 3D-point (x/w, y/w, z/w) (to make it simple).
    You can consider (x, y, z, 1) as a point and (x,...
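The equivalence between a 4D point and its 3D counterpart can be sketched in plain C as follows. The `Vec4` and `Vec3` types and the function name `to_cartesian` are hypothetical and exist only for this illustration.

```c
#include <assert.h>

typedef struct { float x, y, z, w; } Vec4;  /* hypothetical homogeneous point */
typedef struct { float x, y, z; } Vec3;     /* hypothetical 3D point */

/* Map (x, y, z, w) to (x/w, y/w, z/w). */
Vec3 to_cartesian(Vec4 p)
{
    assert(p.w != 0.0f);  /* the division below needs a non-zero w */
    Vec3 r = { p.x / p.w, p.y / p.w, p.z / p.w };
    return r;
}
```

For example, (2, 4, 6, 2) and (1, 2, 3, 1) map to the same 3D point, whereas changing w alone (say, from 1 to 2) halves every coordinate. This is why a kernel that overwrites w moves the point.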
  4. Is it normal that the global work size is three...

    Is it normal that the global work size is three times larger than the buffer size?
  5. Replies
    1
    Views
    342

    1) This is true for desktop platforms. exp()...

    1) This is true for desktop platforms. exp() should return HUGE_VALF, which evaluates to +infinity.

    Embedded platforms do not necessarily handle infinity, in which case HUGE_VALF is the largest...
  6. Your OpenGL position is made of 3 consecutive...

    Your OpenGL position is made of 3 consecutive floats, whereas your OpenCL kernel expects a position made of 4 consecutive floats.

    So either use a position with 4 floats and pass a stride parameter...
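As an illustration of the layout mismatch, the sketch below expands tightly packed 3-float positions into 4-float positions on the host. It is only one way to get the two sides to agree; the function name `pad_positions_to_float4` and the choice of w = 1 for the padding component are assumptions, not something the post specifies.

```c
#include <stddef.h>

/* Expand n tightly packed (x, y, z) positions into n (x, y, z, w) positions.
 * The fourth component is set to 1 (assumed padding value). */
void pad_positions_to_float4(const float *xyz, size_t n, float *xyzw)
{
    for (size_t i = 0; i < n; ++i) {
        xyzw[4 * i + 0] = xyz[3 * i + 0];
        xyzw[4 * i + 1] = xyz[3 * i + 1];
        xyzw[4 * i + 2] = xyz[3 * i + 2];
        xyzw[4 * i + 3] = 1.0f;
    }
}
```

With this layout each position occupies `4 * sizeof(float)` bytes, so that is the stride the OpenGL side would need to be told about.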
  7. And in your initial case, use an internal format...

    And in your initial case, use an internal format GL_RGBA32F instead of GL_RGBA8 for a data type GL_FLOAT.
  8. Try with GL_RGBA8 as internal format instead of...

    Try with GL_RGBA8 as internal format instead of GL_RGBA.
  9. When you create a texture with glTexImage2D() and...

    When you create a texture with glTexImage2D() and a null data pointer, the texture is incomplete and clCreateFromGLTexture() fails with CL_INVALID_GL_OBJECT.
    You have to create it with a non-null...
  10. Where do you set the arguments of your kernel?

    Where do you set the arguments of your kernel?
  11. Replies
    1
    Views
    378

    /* Read the kernel's output */ err =...

    /* Read the kernel's output */
    err = clEnqueueReadBuffer( queue, results_buffer, CL_TRUE, 0, nbPoints * sizeof(float), results, 0, NULL, NULL);
  12. Replies
    2
    Views
    497

    I doubt that your kernel can compile: srand(),...

    I doubt that your kernel can compile: srand(), time() and rand() are not part of OpenCL C.
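Because those C library functions are unavailable, a kernel that needs random numbers usually carries its own generator. The sketch below uses Marsaglia's xorshift32, which is a swapped-in technique and not something the post proposes. It is written in plain C, but it relies only on 32-bit integer operations, so the same body should carry over to OpenCL C with `uint`. The seeding helper `seed_for_work_item` is hypothetical.

```c
#include <stdint.h>

/* One step of Marsaglia's xorshift32 generator. The state must be non-zero. */
uint32_t xorshift32(uint32_t state)
{
    state ^= state << 13;
    state ^= state >> 17;
    state ^= state << 5;
    return state;
}

/* Hypothetical per-work-item seed: mixes a host-provided seed with the id
 * and avoids the all-zero state, which xorshift32 can never leave. */
uint32_t seed_for_work_item(uint32_t global_id, uint32_t host_seed)
{
    uint32_t s = host_seed ^ (global_id * 2654435761u);
    return s != 0u ? s : 1u;
}
```

The host seed would be passed as a kernel argument (it can come from `time()` on the host), and each work-item would keep its own state in a private variable.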
  13. From OpenCL specs about memory consistency:...

    From OpenCL specs about memory consistency: "Global memory is consistent across work-items in a single work-group at a work-group barrier, but there are no guarantees of memory consistency between...
  14. Replies
    7
    Views
    1,255

    Only sums of powers of two can be exactly represented by...

    Only values that are finite sums of powers of two (within the format's precision) can be exactly represented by binary floating-point formats such as float or double.

    Since 3.03 and 7.03 cannot be written as such a sum, they simply cannot be exactly represented.

    For...
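A small C check makes the point concrete. The helper name `is_exact_in_float` is an assumption; it tests whether a double survives a round trip through float. Note that the literal `3.03` is itself already the nearest double, not the decimal value, so the check shows that narrowing to float loses further bits.

```c
#include <stdbool.h>

/* True when `value` is unchanged by a round trip through float. */
bool is_exact_in_float(double value)
{
    return (double)(float)value == value;
}
```

With this helper, 0.75 (0.5 + 0.25) and 3.0 (2 + 1) pass the check, while 3.03 and 7.03 do not.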
  15. You can't, but there's nothing wrong with a loop...

    You can't, but there's nothing wrong with a loop in a kernel.
  16. CL_DEVICE_LOCAL_MEM_SIZE returns the max amount...

    CL_DEVICE_LOCAL_MEM_SIZE returns the max amount of local memory that a work-group can allocate (and use). Since a work-group can run on only one compute unit, this amount of memory is for each...
  17. If you use the CPU device and your app is...

    If you use the CPU device and your app is compiled for x64, get_global_id() returns a size_t value which is 64 bits wide.
    In this case, as_uchar4(get_global_id(0)) is not legal.

    You should first...
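The size mismatch can be illustrated in plain C. `as_uchar4()` reinterprets bits and needs a 32-bit operand, so one possible approach (an assumption here, since the rest of the advice is cut off) is to narrow the id to 32 bits explicitly before splitting it into bytes. The `Bytes4` type and `id_to_bytes` name are hypothetical stand-ins for `uchar4` and the kernel code.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { uint8_t b[4]; } Bytes4;  /* hypothetical stand-in for uchar4 */

/* Narrow a possibly 64-bit id to 32 bits, then split it into four bytes,
 * least significant byte first. */
Bytes4 id_to_bytes(size_t global_id)
{
    uint32_t id32 = (uint32_t)global_id;  /* explicit narrowing to 32 bits */
    Bytes4 r = {{
        (uint8_t)(id32 & 0xFFu),
        (uint8_t)((id32 >> 8) & 0xFFu),
        (uint8_t)((id32 >> 16) & 0xFFu),
        (uint8_t)((id32 >> 24) & 0xFFu)
    }};
    return r;
}
```

Shifts are used here instead of a bit reinterpretation so that the byte order does not depend on the endianness of the machine.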
  18. Each compute unit has 32 ALUs. So the device has a...

    Each compute unit has 32 ALUs. So the device has a total of 4x32=128 ALUs.
    Each compute unit can run a work-group of up to 512 work-items.
  19. A work-group runs on one compute unit. It cannot...

    A work-group runs on one compute unit. It cannot be split among several compute units (first of all because local memory is local to a compute unit).

    The max work-group size is an indication of...
  20. Adding "return ret;" at the end of getuint2()...

    Adding "return ret;" at the end of getuint2() will probably help...
  21. Replies
    7
    Views
    1,010

    VGPRs are 32 bits wide.

    VGPRs are 32 bits wide.
  22. Replies
    7
    Views
    1,010

    Private memory is a lot faster than global memory...

    Private memory is a lot faster than global memory (roughly 100x faster). However, you must consider it a scarce resource. As I stated earlier, the optimal maximum number of registers for a kernel...
  23. From what I have seen,...

    From what I have seen, CL_DEVICE_MAX_WORK_GROUP_SIZE is 256 on a HD7970.
  24. Replies
    7
    Views
    1,010

    A wavefront is more or less the hardware...

    A wavefront is more or less the hardware counterpart of a work-group. Each work-group is split into blocks of 64 work-items; each block is executed as a wavefront by a compute unit. Several wavefronts...
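The split into blocks of 64 work-items amounts to a rounded-up division. A minimal sketch, where the function name `wavefronts_per_work_group` is hypothetical and the block size is the figure quoted in the post:

```c
#include <stddef.h>

enum { WAVEFRONT_SIZE = 64 };  /* block size quoted in the post */

/* Number of wavefronts needed to cover one work-group, rounding up. */
size_t wavefronts_per_work_group(size_t work_group_size)
{
    return (work_group_size + WAVEFRONT_SIZE - 1) / WAVEFRONT_SIZE;
}
```

A work-group of 256 work-items therefore maps to 4 wavefronts, while a work-group of 100 still occupies 2, with part of the second one idle.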
  25. Replies
    7
    Views
    1,010

    This might be caused by register spilling. The...

    This might be caused by register spilling. The full code may need more registers than each isolated part of the algorithm.
    In this case, values have to be temporarily stored to and read from global...
Results 1 to 25 of 144