Type: Posts; User: andrew.brownsword

  1. What Kunze said, plus you could combine buffers...

    What Kunze said, plus you could combine buffers into one and use pointer arithmetic (inside the kernel).
  2. One approach that I have used is to create a...

    One approach that I have used is to create a #include file that has just my types in it and is included from the host. Then this header and the .cl source file(s) are passed to...
  3. One thing I notice is that you are reading back...

    One thing I notice is that you are reading back several buffers and then writing them again. All this data transfer in/out of the cl_mem buffer objects is going to carry a substantial performance...
  4. Note that in OpenCL 1.x you need to pass a cl_mem...

    Note that in OpenCL 1.x you need to pass a cl_mem object to clSetKernelArg, not a raw pointer.
  5. Re: Memory Problem when trying to speed up Kernel

    Have you checked all your error and other return values? How big is the buffer you're asking for? Is it possible you've previously leaked memory? Does this happen with another vendor's driver or...
  6. Re: Memory Problem when trying to speed up Kernel

    And I assume the first parameter should be pSrcPos as well?

    It's a bit challenging to figure out what your problem is when you aren't posting the code you're actually running...
  7. Re: Memory Problem when trying to speed up Kernel

    The write is the only place I see where you multiply by 3.
  8. Re: async_work_group_strided_copy

    Sorry David, but I have to quibble. The async_work_group_strided_copy is not especially useful for an AoS <-> SoA transformation. If it were to be useful for the latter it would take this:

  9. Re: Need a way to calculate theoretical FLOPS of a device

    Doing this in a meaningful way is very complex and highly subject to the exact nature of your algorithm(s). The best way to evaluate this turns out to be for an application to simply try its...
  10. Re: setKernelArg as size in bytes and NULL for __global poin

    Because global memory is a shared resource that persists beyond the scope of a kernel. It is better to let the application allocate and reuse its allocated buffers as it knows best. Local memory...
  11. Re: "clDevicePointer" function needed!

    Why not use indices instead of pointers? This way they are independent of buffer location, devices, address spaces, etc. The same index could be applied to multiple buffers. Index size can be...
  12. Re: OpenCL for Real-Time environments

    I am also interested in problems in this domain. What do you see as needing to change in the spec to enable RT in CL? If no spec changes are required, what do you see as needing to change in...
  13. Re: How do I check OpenCL is OK? Mac/Windows

    On Snow Leopard, OpenCL is always present and you can just start calling it using the default cl_platform. Under Windows your app should either have a hard dependency on the ICD DLL (if you can't run...
  14. Re: Image object support on MAC OS with ATI Radeon HD 5750 ?

    Unfortunately Apple is generally not very forthcoming with information about their roadmaps, so it's hard to know when things will be fixed and updated. You are correct about the lack of image...
  15. Re: Images without Image support?

    You can certainly do this, and it will work, although performance will not be as good as if you were using the GPU's texture sampler units. You need to compute each pixel's linear address from the...
  16. Re: Traversing a Tree using the root pointer

    You really don't want to use pointers. Not only are they potentially different sizes between devices (and host), but they are also potentially different sizes between address spaces (global, local,...
  17. Re: calculation of a float value

    Currently you have no option. You have to create two contexts in that case. If you had an AMD GPU and an AMD CPU then you could have both in one context.

    FWIW, I believe the AMD OpenCL...
  18. Re: Matrix Multiplication

    You could also try using the CPU device to see how that performs.
  19. Re: OpenCL Image Rotate/Scale/Translate, Affine Transform, .

    The hardware math acceleration comes in the form of SIMD vector operations which are exposed as the vector types in OpenCL C (e.g. float4) and many built-in math functions and operators on those. ...
  20. Re: Init Buffer Problem

    How is globalWorkSizeInit declared and initialized?

    The 1D case should work fine and is a little simpler.
  21. Re: Undefined reference errors with image2d functions

    Given those error messages I'm inclined to think that the problem is in your host program, not in your kernel. They look more like linker errors than compiler errors. Perhaps you are trying to use...
  22. Re: clGetKernelWorkGroupInfo does not return correct local m

    Sounds like a bug in the implementation, I would report it to your vendor.

    Before checking the spec, I didn't realize that CL_KERNEL_LOCAL_MEM_SIZE was supposed to include the dynamically set arg...
  23. Re: Multi-GPU System, multiple contexts or command queues?

    I too would expect that a single context with multiple devices would be preferable. In addition to being able to synchronize between them, they could also then share buffers. My wild-assed guess...
  24. Re: Communication between OpenCL and CUDA

    I don't know exactly what your requirements are, but I would suggest an all OpenCL application that compiles from source (and perhaps caches compiled binaries), or if you can't ship source then...
  25. Re: Communication between OpenCL and CUDA

    The ICD is supposed to enable applications to use all vendor implementations. Your application links against the ICD DLL and uses clGetPlatformIDs to find all the installed implementations, and...
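
Two of the posts above (items 11 and 16) recommend linking data by index rather than by pointer, so that the links stay valid whatever address, device, or address space the buffer ends up in. Here is a minimal sketch of that idea in plain C, not code from the original threads. The `Node` layout, the `NO_CHILD` sentinel, the `MAX_DEPTH` stack bound, and the `tree_sum` function are all hypothetical names chosen for illustration. The traversal uses an explicit stack because OpenCL C does not allow recursion, so the same loop shape could plausibly be moved into a kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NO_CHILD  (-1)  /* hypothetical sentinel: "this link is empty" */
#define MAX_DEPTH 64    /* assumed capacity of the explicit traversal stack */

/* Hypothetical node layout: children are array indices, not pointers,
 * and fixed-width types keep the size identical on host and device. */
typedef struct {
    int32_t left;
    int32_t right;
    float   value;
} Node;

/* Sum every value reachable from `root`, walking the tree with an
 * explicit stack instead of recursion. */
float tree_sum(const Node *nodes, size_t count, int32_t root)
{
    int32_t stack[MAX_DEPTH];
    int top = 0;
    float sum = 0.0f;

    if (root == NO_CHILD) {
        return 0.0f;
    }
    stack[top++] = root;

    while (top > 0) {
        int32_t i = stack[--top];
        assert(i >= 0 && (size_t)i < count);  /* index must be in range */
        sum += nodes[i].value;

        if (nodes[i].right != NO_CHILD) {
            assert(top < MAX_DEPTH);
            stack[top++] = nodes[i].right;
        }
        if (nodes[i].left != NO_CHILD) {
            assert(top < MAX_DEPTH);
            stack[top++] = nodes[i].left;
        }
    }
    return sum;
}
```

Because every link is an offset into the node array, the array can be copied into a buffer byte for byte and the links still mean the same thing on the other side. The same index could also address a second, parallel array holding other per-node data.
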
Results 1 to 25 of 66