  1. Re: PowerVR SGX (omap3530) support

    There's a big difference between having a GPU whose hardware supports the OpenCL requirements and a driver that passes conformance. :) My guess is that it will take a while to get working drivers for...
  2. Re: Global memory alignment

    There's no real way to control this alignment. Since the driver does the data movement/allocation/management you can reasonably assume that the global structures will be nicely aligned for you. You...
  3. Re: low float precision - only on parts of the output data

    The first thing that comes to mind is a data race. Are you sure that the barrier() in your kernel is not trying to cross work-groups? Your work-group size is 2x2, so that barrier will only...
  4. Re: Weird compile error with structures defined in *.cl

    The built-in functions are only supported from within code compiled by OpenCL and called from your kernel. Since your host-side code is neither, they won't work. However, most of them would be...
  5. Re: opencl sources of zlib..

    I have not heard of anyone doing zlib in OpenCL, but people may have. The biggest issue will be that such compression algorithms tend to be very serial (think Huffman encoding) and rely heavily on...
  6. Re: How to get correct access to all values in the global memory

    In OpenCL 1.0 you need to check if the byte writes are supported. I know Nvidia GPUs do support this, but the 4xxx AMD ones do not, for example.
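
    One way to make the check above concrete: in OpenCL 1.0, byte-write support is advertised through the cl_khr_byte_addressable_store extension, and a device's extensions are reported as a single space-separated string. The sketch below is plain C and only does the string matching; the helper name has_extension is hypothetical, and it assumes the caller has already fetched the extension string from the device (for example with clGetDeviceInfo and CL_DEVICE_EXTENSIONS).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: returns true if `name` appears as a whole,
 * space-delimited token inside `ext_list`. A plain substring search is
 * not enough, because one extension name can be a prefix of another. */
bool has_extension(const char *ext_list, const char *name) {
    assert(ext_list != NULL && name != NULL);
    size_t len = strlen(name);
    if (len == 0) {
        return false;
    }
    const char *p = ext_list;
    while ((p = strstr(p, name)) != NULL) {
        /* The match must start at the beginning or right after a space... */
        bool starts = (p == ext_list) || (p[-1] == ' ');
        /* ...and end at the end of the string or right before a space. */
        bool ends = (p[len] == '\0') || (p[len] == ' ');
        if (starts && ends) {
            return true;
        }
        p += 1; /* keep scanning after a partial match */
    }
    return false;
}
```

    A caller would pass the device's extension string and "cl_khr_byte_addressable_store", and fall back to 32-bit writes when the helper returns false.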
  7. Re: Visual Studio 2008 Setup with OpenCL Plugin

    You will need to upgrade to Snow Leopard to get OpenCL on the Mac. I believe the Quadro FX 5600 is supported, but you should check Apple's page to verify it. If the card works currently for graphics...
  8. Re: Will OpenCL replace OpenGL in the future?

    I'd guess they'll co-exist for a long time, with more advanced stuff being done in OpenCL and the more basic rendering pipeline stages still going through GL. The real win is that CL allows you to...
  9. Re: how to implement serial calculation in kernel code?

    If you're only using one work-group you will get only a tiny fraction (1/4 to 1/48th) of the total GPU performance.

    If you need to do this sort of synchronization across all work-items you have to wait for...
  10. Re: Memory error occurs after releasing a memory object

    Creating/releasing memory buffers can indeed be costly because it involves doing an allocation on the device. However, writing data to the memory object can be far, far more costly. If you are...
  11. Re: memorey object release and ndrange arguments

    Yes. It will stay around until both you release it and the runtime is done with it.
  12. Re: OpenCL: Hardware requirements

    These are defined in section 7, in particular 7.4.
  13. Re: Wrong precision in multiplication results

    The size of the error will depend on the magnitude of the result.

    An error of 0.000005 would be expected if your result is about the size of 1.
    If your result is the size of 1000000.0 then you'd...
  14. Re: Wrong precision in multiplication results

    That looks about right for 32-bit floating point precision. They have about 24 bits of precision, which is around 7 decimal digits. Are you comparing this to 64-bit doubles on a CPU or 32-bit floats?...
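
    To illustrate why the absolute error grows with the magnitude of the result, the sketch below measures the gap between a 32-bit float and the next representable value above it. It is plain host-side C, not kernel code. The function name float_spacing is hypothetical, and the code assumes float is an IEEE 754 binary32 value, which is typical but not guaranteed by the C standard.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: gap between a positive finite float `x` and the
 * next representable float above it. Assumes IEEE 754 binary32, where
 * adding 1 to the bit pattern of a positive float yields its successor. */
float float_spacing(float x) {
    uint32_t bits;
    float next;

    assert(x > 0.0f);                  /* also rejects NaN */
    memcpy(&bits, &x, sizeof bits);    /* reinterpret without aliasing issues */
    assert(bits < 0x7f7fffffu);        /* stay below FLT_MAX so `next` is finite */

    bits += 1u;
    memcpy(&next, &bits, sizeof next);
    return next - x;                   /* exact: both values are neighbours */
}
```

    With 24 bits of precision, the gap near 1.0 is 2^-23 (about 1.2e-7), while near 1000000.0 it is 0.0625, so the same relative precision gives a far larger absolute error for large results.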
  15. Re: blcoking clReleaseCommandQueue crashes

    Calling clReleaseCommandQueue should never crash as long as you call it only once for any call to clRetain... plus once for the initial creation. E.g., command queues should have an effective...
  16. Re: Using structs to pass optional features into kernels

    Good luck! This is really hard to get right with the current API. E.g., on a Mac, if you compile on an SSE4.2 machine your saved kernel binary will crash on an SSE4.1 or SSSE3 machine with invalid...
  17. Re: Using structs to pass optional features into kernels

    Two possibly useful comments:
    1) I don't think you can pass global pointers in a struct from the host, since the address of the global pointer can only be set by clSetKernelArg.
    2) You can...
  18. Thread: Strange results

    by dbs2

    Re: Strange results

    I can't really tell what's going on here since I don't know where localID or idx come from, but I'd suspect you have a bunch of data races. You have some loops that are walking over aux and setting...
  19. Re: Max __constant variables defined in program source

    My understanding is that a kernel cannot use more than the max constant buffer size. At least in PTX I believe this is a separate memory space which has different allocation limits. (I may both be...
  20. Re: How to assure kernel execution ended?

    The most efficient way to do this is to wait on the kernel's event with clWaitForEvents(). clFinish() is a very heavy command, as it will make sure everything in the command queue finishes, not just your kernel.
  21. Re: Atomics and flushing

    I'm not sure I completely understand but maybe some of this will help. atom_add returns the old value before the add so you can use that to determine what the value is after the add. This will get...
  22. Re: Computation values turn out to be incorrect

    It looks like you are setting bbox[0] in all your work-items at the same time. This means they may overwrite each other in a non-deterministic fashion. (E.g., work-item 302 might replace the value...
  23. Re: Iterations And GlobalRange Difficulty (Related?)

    I think there's some confusion here. I'm not talking about a global memory size when I say there is (was?) a limitation in the Nvidia drivers. I have heard several people say that...
  24. Re: Poor bandwith on matrix multiplication with local memory?

    Running the code manually many times is not going to have the same effect since the data will still have to be transferred and all the setup and initialization will have to happen.

    If you want to...
  25. Re: Iterations And GlobalRange Difficulty (Related?)

    Are you using Nvidia's drivers? If so, check their release notes because I don't think they support a global size > 65,535.

    Also, if your kernel is taking too long (say longer than 5 seconds) the...
