
Thread: Suggestions for next release of OpenCL

  1. #1
    Newbie Newbie
    Join Date
    Sep 2013

    Spec confusion regarding convert_ functions

    Refer to the OpenCL 1.2 spec:

    Section 6.2.3
    Explicit conversions may be performed using the convert_destType(sourceType)
    suite of functions. These provide a full set of type conversions between supported types (see
    sections 6.1.1, 6.1.2 and 6.1.3) except for the following types: bool, half, size_t,
    ptrdiff_t, intptr_t, uintptr_t, and void.

    Conversions are available for the following scalar types: char, uchar, short, ushort,
    int, uint, long, ulong, float, and built-in vector types derived therefrom.

    There are data types like […] which are covered in section 6.2.3, but not in […]. What is expected of these data types?

  2. #2
    Senior Member Regular Contributor
    Join Date
    Oct 2012
    double is handled like float.

    The other data types (image, sampler, event) are neither scalar types nor vector types, so that section does not apply to them.
    They can't be cast to another type. They are simply considered opaque types.
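    To make the point about double concrete: conversions take optional saturation and rounding-mode suffixes, as in convert_int_sat_rte(x), and with cl_khr_fp64 enabled the same functions accept double arguments. The plain-C function below is a hypothetical model (not an OpenCL built-in) of what convert_int_sat_rte does with a double, assuming the 1.2 rules for saturated conversions: out-of-range values clamp to INT_MIN/INT_MAX, NaN becomes 0, and exact ties round to the even integer.

```c
#include <limits.h>

/* Hypothetical plain-C model of OpenCL C's convert_int_sat_rte(double).
   _sat: clamp out-of-range values, NaN -> 0.  _rte: round half to even. */
static int convert_int_sat_rte_model(double x) {
    if (x != x)                      /* NaN */
        return 0;
    if (x >= (double)INT_MAX)
        return INT_MAX;
    if (x <= (double)INT_MIN)
        return INT_MIN;

    long long i = (long long)x;      /* truncates toward zero */
    double r = x - (double)i;        /* exact fractional part, |r| < 1 */

    /* On an exact tie, step away from zero only if i is odd (ties to even). */
    if (r > 0.5 || (r == 0.5 && (i & 1)))
        i += 1;
    else if (r < -0.5 || (r == -0.5 && (i & 1)))
        i -= 1;
    return (int)i;
}
```

    For example, 2.5 converts to 2, 3.5 to 4, and 1e20 saturates to INT_MAX instead of being implementation-defined as in the non-_sat form.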

  3. #3
    Newbie Newbie
    Join Date
    Jan 2014

    Accept (int) comparison for select() for all scalar types.

    The relational select() function is very handy for vectorization and mimics the ternary ?: operator. But the condition types it accepts for double (and half) in scalar and vector modes are inconsistent with what relational comparison functions such as isgreater() return.

    Scalar prototypes:
    Code :
    int isgreater (double a, double b);
    double select (double a, double b, long cmp);

    Vector prototypes:
    Code :
    longn isgreater (doublen a, doublen b);
    doublen select (doublen a, doublen b, longn cmp);

    The scalar isgreater() (and similar) functions match the C99 math.h prototypes and return int for all data types. But select() only accepts long for double (and short for half). This requires an explicit cast in most (all?) implementations and causes headaches when writing type-independent code. That is, we can't cleanly write

    Code :
    T result = select (a, b, isgreater(a,b));

    and expect it to work with both double and doublen. The same issue occurs with half. I have to wrap the call in an #if block to distinguish scalar and vector types.

    Code :
    #if (__VectorSize == 1)   // __VectorSize: assumed to be an application-defined macro
       // ()?: version
       // double result = (isgreater(a,b)) ? b : a;
       double result = select (a, b, (long) isgreater(a,b));
    #else
       double2 result = select (a, b, isgreater(a,b));
    #endif

    I propose that select() accept the result type of the relational functions in both scalar and vector modes. That is, accept int for all data types in scalar mode, and the equivalent bit masks in vector mode.
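    The semantics this proposal relies on can be modelled in plain C. Per the OpenCL C 1.2 spec, scalar select() returns c ? b : a, while vector select() tests the most significant bit of each component of c; correspondingly, scalar relational functions such as isgreater() return 1 for true, and their vector forms return -1 (all bits set). The helpers below (select_scalar, select_vec2, isgreater_scalar, isgreater_vec2) are hypothetical names written for illustration, not OpenCL built-ins, and a two-element array stands in for double2.

```c
#include <stdint.h>

/* Scalar rule: result = c ? b : a (any nonzero c selects b). */
static double select_scalar(double a, double b, int64_t c) {
    return c ? b : a;
}

/* Vector rule: per component, select b when the MSB of c[i] is set. */
static void select_vec2(const double a[2], const double b[2],
                        const int64_t c[2], double out[2]) {
    for (int i = 0; i < 2; ++i)
        out[i] = (c[i] < 0) ? b[i] : a[i];   /* MSB set <=> negative */
}

/* Scalar relational: returns int 1 for true, 0 for false. */
static int isgreater_scalar(double a, double b) {
    return a > b;
}

/* Vector relational: -1 (all bits set) for true, 0 for false. */
static void isgreater_vec2(const double a[2], const double b[2], int64_t out[2]) {
    for (int i = 0; i < 2; ++i)
        out[i] = (a[i] > b[i]) ? -1 : 0;
}
```

    Under these rules the int from scalar isgreater() already means the right thing to scalar select(), so the (long) cast appears to matter only for overload resolution. A scalar 1 fed through the vector MSB rule, by contrast, would pick a, which is why the two rules cannot simply be merged.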

  4. #4
    Junior Member Newbie
    Join Date
    Mar 2014

    Would like to target a system with an Intel CPU and an AMD GPU

    Hello List,
    I would like to be able to load-balance my algorithm across both an Intel CPU and an AMD GPU
    at the same time.

    Now, the Intel SDK supports Intel hardware, and the AMD SDK supports AMD hardware.

    How can I develop a solution that targets both platforms concurrently?
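    One common approach, not covered in the thread itself: with the Khronos ICD loader installed, clGetPlatformIDs lists the Intel and AMD platforms side by side. A context cannot mix devices from different platforms, so each device gets its own context, command queue and buffers, and the index space is split between the devices, for example through the global_work_offset argument of clEnqueueNDRangeKernel. The helper below, split_range, is a hypothetical sketch of only that splitting step, assuming the application supplies relative throughput weights (for instance from a timing run).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: split [0, total) into contiguous chunks proportional
   to weights[]. Every chunk but the last is rounded down to a multiple of
   `granule` (e.g. the work-group size); the last device takes the rest. */
static void split_range(size_t total, size_t granule,
                        const double *weights, size_t n_devices,
                        size_t *offsets, size_t *sizes) {
    assert(n_devices > 0 && granule > 0);
    double sum = 0.0;
    for (size_t d = 0; d < n_devices; ++d)
        sum += weights[d];
    assert(sum > 0.0);

    size_t start = 0;
    for (size_t d = 0; d + 1 < n_devices; ++d) {
        size_t share = (size_t)((double)total * (weights[d] / sum));
        share -= share % granule;            /* keep work-group alignment */
        if (share > total - start)
            share = total - start;
        offsets[d] = start;                  /* usable as global_work_offset */
        sizes[d] = share;
        start += share;
    }
    offsets[n_devices - 1] = start;
    sizes[n_devices - 1] = total - start;
}
```

    With total = 1000, granule = 64 and weights {1.0, 3.0}, the first device gets offset 0 and size 192, the second offset 192 and size 808. The last chunk takes the remainder and so may not be a multiple of granule; it may need padding or a NULL local work size.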

  5. #5
    Member Contributor
    Join Date
    Sep 2013

    SPIR version number

    Make the SPIR version number the same as the OpenCL version it belongs to.
    This would reduce potential confusion.

  6. #6
    Newbie Newbie
    Join Date
    Nov 2014

    Asynchronous memory release

    Releasing temporary buffers in the middle of a chain of kernels executing asynchronously is currently cumbersome. It requires either synchronizing with the device to guarantee that all pending operations using the buffer have finished, or a clumsy event callback on a marker with a wait list (or, even worse, a native kernel if the device supports it).
    The drawback of the first is that it introduces needless synchronization just to release memory; the disadvantage of the second, besides the horrible syntax, is that there is no guarantee as to when the callback will be invoked.

    I think it would be useful to have a function such as clEnqueueReleaseMemObject, which can be pushed onto a queue with the usual wait list and attached event. It would do exactly the same as clReleaseMemObject, with the added advantage that it can be woven into a complex task graph to release the memory as soon as it is no longer needed.

    Proposed function:

    Code :
    cl_int clEnqueueReleaseMemObject ( cl_command_queue command_queue,
                                       cl_mem memobj,
                                       cl_uint num_events_in_wait_list,
                                       const cl_event *event_wait_list,
                                       cl_event *event );

    Has this already been discussed?

  7. #7
    Member Contributor
    Join Date
    Jul 2011
    Bristol, UK
    It's not clear to me what the problem is. There is no requirement that all pending operations using a buffer complete before you can release it; the buffer will only be destroyed when the reference count is 0 and all commands that use it have completed.

    Can you give an example of the sequence of operations that you are trying to perform, and where you would like to release the buffers?

  8. #8
    Newbie Newbie
    Join Date
    Nov 2014
    Quote, originally posted by jprice:
    and all commands that use it have completed.
    Right, my bad, I missed that part. I based my assumption on the note for clSetKernelArg (section 5.7.2):
    A kernel object does not update the reference count for objects such as memory, sampler objects specified as argument values by clSetKernelArg, Users may not rely on a kernel object to retain objects specified as argument values to the kernel.
    and on the definition of reference counting in the spec (section 2):
    After the reference count reaches zero, the object's resources are deallocated by OpenCL.
    So I thought that temporary buffers could only be safely released after synchronization. But the documentation of clReleaseMemObject indeed says that the object stays alive even with a reference count of zero, as long as it is used by a command in a command queue:
    After the memobj reference count becomes zero and commands queued for execution on a command-queue(s) that use memobj have finished, the memory object is deleted.
    Thanks for pointing that out.
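    The rule quoted above can be illustrated with a toy model in plain C. The struct and functions below are hypothetical stand-ins for the runtime's bookkeeping, assuming each enqueued command that uses the buffer keeps it alive until that command finishes. The model shows why a temporary buffer can be released right after the last kernel that uses it is enqueued, with no host-side synchronization.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the clReleaseMemObject rule: storage is freed only
   once the reference count is zero AND no queued command still uses it. */
typedef struct {
    int  refcount;       /* host-side references (retain/release) */
    int  pending_uses;   /* enqueued commands that still use the buffer */
    bool deleted;
} mem_model;

static void maybe_delete(mem_model *m) {
    if (m->refcount == 0 && m->pending_uses == 0)
        m->deleted = true;
}

/* Like clReleaseMemObject: drop a reference, free only if nothing uses it. */
static void model_release(mem_model *m) {
    assert(m->refcount > 0);
    m->refcount--;
    maybe_delete(m);
}

/* Enqueueing a kernel that reads or writes the buffer. */
static void model_enqueue_use(mem_model *m) {
    assert(!m->deleted);
    m->pending_uses++;
}

/* The device finishing one of those commands. */
static void model_command_finished(mem_model *m) {
    assert(m->pending_uses > 0);
    m->pending_uses--;
    maybe_delete(m);
}
```

    Starting from one reference, enqueueing two kernels and then releasing immediately leaves the buffer alive; it is only deleted when the second kernel finishes.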

  9. #9
    Administrator Regular Contributor
    Join Date
    Feb 2000

    Suggestions for next release of OpenCL

    We're restructuring and cleaning up our forums. This will be the official thread for everyone to post their suggestions for the next version of OpenCL. We have moved the most recent suggestions into this thread already. We look forward to seeing more suggestions.

  10. #10
    Junior Member Newbie
    Join Date
    Mar 2015


    Hello, I originally posted this in the AMD CL support forums, but as noted there, it's really not a vendor issue, so I registered here.

    I propose a function that waits until any single event of a set completes. This could take various forms:
    • cl_int clWaitForAnyEvent(cl_uint num_events, const cl_event *event_list), with the same parameters as the current clWaitForEvents, or
    • cl_int clWaitEvent(cl_bool all, cl_uint num_events, const cl_event *event_list), saving an entry point by deprecating clWaitForEvents, or
    • cl_uint clWaitEvent(cl_bool all, cl_uint num_events, const cl_event *event_list, cl_int *error), which also returns the index of the triggering event.

    POSIX select(...) wakes up when at least one watched descriptor is "ready".
    Pthreads sidesteps the problem: pthread_cond_wait waits on a single condition variable.
    Windows has WaitForMultipleObjects(...), which can wait on almost any kind of object. It wakes up when at least one object is signaled, but it can also be told to wait until all of them are.

    clWaitForEvents, on the other hand, returns CL_SUCCESS if the execution status of all events in event_list is CL_COMPLETE; it always waits for all of them.

    To wait for just the first event, one has to put a callback mechanism in place. Leaving aside that this must be done with some care, it seems to me that assembling a wait-for-all operation from several wait-for-one operations is simpler than the opposite.

    I'm also leaving aside the fact that select(...) modifies the lists passed to it, which does not seem reasonable to me.

    This may not be pertinent to this specific thread, but I would like to know the rationale behind the wait-for-all design.

    I haven't read the CL 2.1 spec yet; a quick search suggests this function is not there.
    Last edited by MaxDZ8; 03-05-2015 at 09:39 AM. Reason: element -> event, other quirks.
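    The callback-based workaround described in this post can be sketched in plain C with pthreads, the API the post compares against. event_set, event_set_signal, event_set_wait, signal_arg and signal_from_thread are hypothetical names, not OpenCL API. In real host code, event_set_signal would be called from a callback registered per event with clSetEventCallback(event, CL_COMPLETE, ...), passing the event's index through user_data; here a plain thread plays that role. The all flag mirrors the proposed clWaitEvent(cl_bool all, ...), with both modes built on one completion counter.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical helper: tracks completions of a set of events,
   each of which is assumed to signal exactly once. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int completed;   /* number of events that have signalled so far */
    int first;       /* index of the first event to signal, -1 if none */
} event_set;

static void event_set_init(event_set *s) {
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->cond, NULL);
    s->completed = 0;
    s->first = -1;
}

/* What a CL_COMPLETE callback would do; callbacks may run on another
   thread, hence the mutex. */
static void event_set_signal(event_set *s, int index) {
    pthread_mutex_lock(&s->lock);
    if (s->first < 0)
        s->first = index;
    s->completed++;
    pthread_cond_broadcast(&s->cond);
    pthread_mutex_unlock(&s->lock);
}

/* all == false: return once one event has signalled; all == true: wait
   for all n. Returns the index of the first event that completed. */
static int event_set_wait(event_set *s, bool all, int n) {
    int needed = all ? n : 1;
    pthread_mutex_lock(&s->lock);
    while (s->completed < needed)        /* loop guards against spurious wakeups */
        pthread_cond_wait(&s->cond, &s->lock);
    int first = s->first;
    pthread_mutex_unlock(&s->lock);
    return first;
}

/* Demo helper: signal one event from a separate thread. */
typedef struct { event_set *set; int index; } signal_arg;

static void *signal_from_thread(void *p) {
    signal_arg *a = p;
    event_set_signal(a->set, a->index);
    return NULL;
}
```

    Because the waiter records which event signalled first, a wait-any call can report the triggering index, as in the third proposed variant; a wait-all call simply raises the threshold on the same counter.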
