Official OpenVX 1.0 specification feedback thread

The Khronos™ Group today announced the ratification and public release of the finalized OpenVX™ 1.0 specification, an open, royalty-free standard for cross-platform acceleration of computer vision applications. OpenVX enables performance- and power-optimized computer vision processing, especially important in embedded and real-time use cases such as face, body and gesture tracking, smart video surveillance, advanced driver assistance systems (ADAS), object and scene reconstruction, augmented reality, visual inspection, robotics and more. In addition to the OpenVX specification, Khronos has developed a full set of conformance tests and an Adopters Program that enables implementers to test their implementations and use the OpenVX trademark if conformant. Khronos plans to ship an open-source, fully conformant CPU-based implementation of OpenVX 1.0 before the end of 2014.

OpenVX defines a higher level of abstraction for execution and memory models than compute frameworks such as OpenCL™, enabling significant implementation innovation and efficient execution on a wide range of architectures while maintaining a consistent vision acceleration API for application portability. An OpenVX developer expresses a connected graph of vision nodes that an implementer can execute and optimize through a wide variety of techniques such as: acceleration on CPUs, GPUs, DSPs or dedicated hardware, compiler optimizations, node coalescing, and tiled execution to keep sections of processed images in local memories. This architectural agility enables OpenVX applications on a diversity of systems optimized for different levels of power and performance, including very battery-sensitive, vision-enabled, wearable displays.

We’d like to hear what you think about this new API.

Hi, I am working on creating an OpenVX implementation. I was wondering if there is any place to download the header files, or should I just extract them from the sample implementation as is?
Thanks.


Nitin Garg

Hi, I had the same issue. I used a script to extract the headers. It is available here:

[https://github.com/hakanardo/pyvx/tree/master/tools](https://github.com/hakanardo/pyvx/tree/master/tools)

and the resulting headers are here:

[https://github.com/hakanardo/pyvx/tree/master/pyvx/inc/headers/VX](https://github.com/hakanardo/pyvx/tree/master/pyvx/inc/headers/VX)

Hi,
great work with the 1.0 specs! How about making the VX_CHANNEL_… constants unique? That way things like:

vxChannelExtractNode(graph, rgb_image, VX_CHANNEL_Y, output)

could return an error during verification instead of silently returning the red plane, or possibly even insert a ColorConvert node when needed.

Hi,
I’ve been playing a bit with implementing bits and pieces of the OpenVX
specs, and I believe it has great potential! Below are a few random thoughts.
Sorry for not being more structured…

  1. What’s the recommended way of getting interleaved RGB or UYVY image data in
    and out of OpenVX graphs without copying the data? Does
    vxCreateImageFromHandle assume the image data passed to be valid as long as it
    is needed or is it forced/guaranteed to copy it? (I think this should be
    specified in the specs).

  2. How do I transform a (x, y) pixel coordinate into an address offset using a
    vx_imagepatch_addressing_t? Are the step/strides specified in bytes or pixels?
    A specific formula would be nice. In the example code provided there is the
    following code that is used for this:

    ((addr.stride_y * y * addr.scale_y) / VX_SCALE_UNITY) +
    ((addr.stride_x * x * addr.scale_x) / VX_SCALE_UNITY)

Is it possible that the parentheses have been mixed up here? To make it
possible to address, for example, UYVY images, I would instead expect something
like:

addr.stride_y * ((y*addr.scale_y) / VX_SCALE_UNITY) +
addr.stride_x * ((x*addr.scale_x) / VX_SCALE_UNITY)

in which case the existence of both stride and scale makes more sense. Note
that the example as it is written probably works fine, since x and y are
incremented by step, in which case both formulas give the same result, but it
would be nice to have the general formula specified somewhere.
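
For concreteness, here is the kind of helper I would expect (just a sketch, assuming strides are given in bytes and that the second formula above is the intended one):

/* Sketch: address of pixel (x, y) within a patch described by a
   vx_imagepatch_addressing_t, assuming strides are given in bytes. */
static void *pixel_address(void *base, vx_uint32 x, vx_uint32 y,
                           const vx_imagepatch_addressing_t *addr)
{
    vx_uint32 offset = addr->stride_y * ((y * addr->scale_y) / VX_SCALE_UNITY) +
                       addr->stride_x * ((x * addr->scale_x) / VX_SCALE_UNITY);
    return (vx_uint8 *)base + offset;
}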

  3. The information in vx_imagepatch_addressing_t seems to be redundant as it
    contains both step and scale with the invariant:

    step == VX_SCALE_UNITY / scale

enforced. Is that correct? Scale seems to be designed to support both scaling
up and down using a scale factor with 10 bits of precision (VX_SCALE_UNITY ==
1024). At the same time, step restricts the possibilities to scaling down by an
integer factor only. This seems a bit odd; am I missing something?

  4. What’s the recommended way of doing an RGB to grey conversion? A
    ColorConvert to some YUV format followed by a ChannelExtract seems
    unnecessarily heavy, or is the allocation and calculation of the unneeded
    color planes optimized out?

  5. I find that identifying the troublesome node when verification fails can
    be troublesome. Good error messages will become important here. Today there
    is only a single return code from the graph verify function to help you do
    that, right? Would it be possible to also add a reference to the failing
    node? Preferably including some way to also find the source-code line where
    the user instantiated that node.

  6. For debugging it would also be great to have nodes with floating-point
    support and nodes for importing and displaying video. Even if those nodes
    don’t get hardware acceleration it would allow me to debug a troublesome graph
    offline using a video recording and visualize different internal states of the
    graph using floating-point calculations.

  7. I like the inclusion of user kernels. I think this is going to be an
    important feature as I don’t think you’ll ever get all the way using standard
    nodes only. You’ll always need some application-specific code. Thus I think
    it is important to make user kernels as efficient as possible. If possible,
    provide a way for users to specify kernels in a way that allows them to be
    optimized together or even merged with standard nodes. The tiling extension is
    probably a good step in this direction (I’ve not dug into it yet).

  8. In the Graph Parameters section it is claimed that the manual node creation
    interface typically needs to be used. Wouldn’t the natural way be to construct
    the graph using the vxXxxNode functions and then extract the parameters of
    interest with vxGetParameterByIndex? Those could be passed to
    vxAddParameterToGraph, right?

  9. If I understand the concept of graph factories right, it does not allow for
    different graphs to be combined and optimized together? This is where I
    believe the real benefit of graph-level optimizations lies.

How about introducing a vxSubGraphNode that allows the user to create custom
nodes by encapsulating graphs into such nodes. They can then be used like any
other node when creating higher-level functionality. From an implementation
point of view, those subgraph nodes would at the beginning of the verification
step be “inlined” into a single non-hierarchical graph which now only contains
standard nodes.

This inlined graph would then be optimized, and computations common among the
different subgraphs could be reused. I could for example make my
vxSubGraphNode’s accept any image format by always placing a ColorConvert()
followed by a ChannelExtract(CHANNEL_Y) as the first nodes. Then when I, at a
higher level, use multiple vxSubGraphNode’s that do this, all the ColorConvert
nodes from all the vxSubGraphNode’s would during optimization be combined into
a single ColorConvert node.

I think you could get quite far in this direction by using a function that
takes a graph as an argument and adds multiple nodes to it, but probably not
all the way. A vxSubGraphNode would be a cleaner approach with consistent
interfaces when using both standard and custom nodes.
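
To make the idea more concrete, here is a rough usage sketch. vxSubGraphNode is purely hypothetical (it does not exist in OpenVX 1.0); the other calls are standard:

/* Build a reusable sub-graph: any-format input -> Y plane -> Gaussian blur. */
vx_graph sub = vxCreateGraph(context);
vx_image in  = vxCreateVirtualImage(sub, 0, 0, VX_DF_IMAGE_VIRT);
vx_image yuv = vxCreateVirtualImage(sub, 0, 0, VX_DF_IMAGE_IYUV);
vx_image y   = vxCreateVirtualImage(sub, 0, 0, VX_DF_IMAGE_U8);
vx_image out = vxCreateVirtualImage(sub, 0, 0, VX_DF_IMAGE_U8);
vxColorConvertNode(sub, in, yuv);
vxChannelExtractNode(sub, yuv, VX_CHANNEL_Y, y);
vxGaussian3x3Node(sub, y, out);

/* Hypothetical: wrap the sub-graph as a node of a parent graph, binding
   (in, out) to (rgb_input, result). During verification the implementation
   would inline it and could merge duplicated ColorConvert/ChannelExtract
   nodes across several such sub-graph nodes. */
vx_node filter = vxSubGraphNode(parent_graph, sub, rgb_input, result);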

  10. The vxCornersGraphFactory example in “Framework: Graph Parameters” uses a
    dimof function which should be defined somewhere. Also, the first use of this
    function (inside a declaration) should probably be removed.

  11. vxAddParameterToGraph claims to return VX_ERROR_INVALID_PARAMETER in
    certain cases. This constant does not exist. Maybe it should be
    VX_ERROR_INVALID_PARAMETERS, even though there is only one parameter involved.

  12. Is it the responsibility of the user to age the delay objects by calling
    vxAgeDelay? Is there any way to declare that a delay object should be aged
    after each call to vxProcessGraph? This does not fit well with the
    encapsulation ideas of the graph factories. I.e. you would probably want to
    hide Delay nodes within those subgraphs, but if that means the user of those
    subgraphs would need to call vxAgeDelay on each of them, the encapsulation is
    broken in an unpleasant way.

Would it be a good idea to use Node Callbacks that call vxAgeDelay in those
cases (there are some warnings about this being inefficient)?

  13. To use the distinction between vx_scalar and vx_int32 to specify where
    changing a parameter will enforce a new verification feels quite hackish.
    Wouldn’t it be better to specify this more explicitly? I.e. add an attribute
    that tells which case it is and add a column with its value to the argument
    table in the specs. This way this information would be available in the same
    way as the parameter direction is. This would also allow the specs to let the
    implementation decide whether a reverification is needed or not in cases
    where that is appropriate.

  14. How about adding a VX_TYPE_STRING type to allow user kernels to take,
    for example, a file name as a parameter? Using a vx_array with item_type ==
    VX_TYPE_CHAR would be cumbersome and would not allow the file to be loaded
    only during verification (unless point 13 above is addressed).

  15. In many cases where pointers are passed to the vx functions, there is also
    a size parameter specifying how much data could be written. However, this is
    not the case for e.g. vxCreateScalar, vxAccessScalarValue and
    vxCommitScalarValue. Is there some logic to when the size is needed and when
    not?

Lots of interesting feedback, hakanardo. I’ll comment on some of it now, more later:

Question 1.

> What’s the recommended way of getting interleaved RGB or UYVY image data in and out of OpenVX graphs without copying the data?

I think that the best way is to use ColorConvert and ChannelExtract nodes. Doing this, you enable optimizations by the graph manager (avoiding a copy, for instance). You’ll then need to use virtual images as intermediate data. For instance, you can build the following graph:
[RGB real image] -> ColorConvert -> [YUV virtual image] -> ChannelExtract -> [Y virtual image] -> …
Note that if you use real images as intermediate images, you will force the graph manager to make a copy.
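
A minimal sketch of that graph in code (assuming a 640x480 RGB input image; the intermediate images are created as virtual against the graph, so the implementation is free to optimize them away):

vx_context context = vxCreateContext();
vx_graph graph = vxCreateGraph(context);

/* Real input image (could also come from vxCreateImageFromHandle). */
vx_image rgb = vxCreateImage(context, 640, 480, VX_DF_IMAGE_RGB);

/* Virtual intermediates: no application access, so no forced copies. */
vx_image yuv = vxCreateVirtualImage(graph, 0, 0, VX_DF_IMAGE_IYUV);
vx_image y   = vxCreateVirtualImage(graph, 0, 0, VX_DF_IMAGE_U8);

vxColorConvertNode(graph, rgb, yuv);
vxChannelExtractNode(graph, yuv, VX_CHANNEL_Y, y);
/* ... further processing on y ... */

if (vxVerifyGraph(graph) == VX_SUCCESS)
    vxProcessGraph(graph);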

> Does vxCreateImageFromHandle assume the image data passed to be valid as long as it is needed or is it forced/guaranteed to copy it? (I think this should be specified in the specs).

The OpenVX specification certainly needs to be clarified here. To ensure best portability, I would advise keeping the data area you passed by pointer valid.
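
For illustration, a sketch of wrapping an externally allocated UYVY buffer without copying it (uyvy_buffer, the 640x480 size and an already-created context are assumptions; the buffer is kept valid for the lifetime of the image):

/* Describe the layout of the external buffer. */
vx_imagepatch_addressing_t addr;
addr.dim_x    = 640;
addr.dim_y    = 480;
addr.stride_x = 2;          /* UYVY averages 2 bytes per pixel */
addr.stride_y = 640 * 2;    /* bytes per row */
addr.scale_x  = VX_SCALE_UNITY;
addr.scale_y  = VX_SCALE_UNITY;
addr.step_x   = 1;
addr.step_y   = 1;

void *ptrs[] = { uyvy_buffer };   /* externally allocated frame, kept valid */
vx_image img = vxCreateImageFromHandle(context, VX_DF_IMAGE_UYVY,
                                       &addr, ptrs, VX_IMPORT_TYPE_HOST);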

Question 4.

> What’s the recommended way of doing an RGB to grey conversion? A ColorConvert to some YUV format followed by a ChannelExtract seems unnecessarily heavy, or is the allocation and calculation of the unneeded color planes optimized out?

If you use virtual images, the OpenVX graph manager has all the information it needs to avoid unnecessary copies and computation (for instance the U and V computation in your case): [RGB real image] -> ColorConvert -> [YUV virtual image] -> ChannelExtract -> [Y virtual image] -> …

Question 5.

> I find that identifying the troublesome node when verification fails can be troublesome. Good error messages will become important here. Today there is only a single return code from the graph verify function to help you do that, right? Would it be possible to also add a reference to the failing node? Preferably including some way to also find the source-code line where the user instantiated that node.

You have a way to get some messages from the OpenVX implementation: you can register a log callback with vxRegisterLogCallback. The messages are not standardized, but an OpenVX implementation can return more detailed information about errors. The callback gets as a parameter the reference of an object that will, in most cases, be the object causing the issue.
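
For example, a minimal sketch of registering such a callback (assuming the OpenVX 1.0 callback signature):

#include <stdio.h>
#include <VX/vx.h>

static void my_log_callback(vx_context context, vx_reference ref,
                            vx_status status, const vx_char string[])
{
    (void)context;
    (void)ref;  /* usually the object (often the node) causing the problem */
    printf("OpenVX log [status %d]: %s\n", (int)status, string);
}

/* ... after creating the context: */
vxRegisterLogCallback(context, my_log_callback, vx_false_e);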

More comments to hakanardo:

Question 9.

> If I understand the concept of graph factories right, it does not allow for different graphs to be combined and optimized together? This is where I believe the real benefit of graph-level optimizations lies.

In OpenVX 1.0, there is no way to combine graphs together.

> How about introducing a vxSubGraphNode that allows the user to create custom nodes by encapsulating graphs into such nodes. They can then be used like any other node when creating higher-level functionality. From an implementation point of view, those subgraph nodes would at the beginning of the verification step be “inlined” into a single non-hierarchical graph which now only contains standard nodes.

I think that this sub-graph concept is very interesting indeed. But even if it looks simple conceptually, I think this is actually a highly complex subject. In effect, the graph can be modified after verification (some changes require a re-verification, some others do not). For instance, a node can be removed by the application, or have some of its parameters changed. It’s not necessarily easy to keep the consistency in your ‘expanded’ graph. But in any case, I agree this is an interesting subject to investigate further.

Question 10.

> The vxCornersGraphFactory example in “Framework: Graph Parameters” uses a dimof function which should be defined somewhere. Also, the first use of this function (inside a declaration) should probably be removed.

dimof is not standardized. The specification thus needs to be updated, maybe by simply providing the code of dimof in the examples you mention.
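
For reference, dimof is presumably just the usual element-count macro, something like:

/* Presumed definition: number of elements in a statically sized array. */
#define dimof(arr) (sizeof(arr) / sizeof((arr)[0]))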

Question 12.

> Is it the responsibility of the user to age the delay objects by calling vxAgeDelay? Is there any way to declare that a delay object should be aged after each call to vxProcessGraph? This does not fit well with the encapsulation ideas of the graph factories. I.e. you would probably want to hide Delay nodes within those subgraphs, but if that means the user of those subgraphs would need to call vxAgeDelay on each of them, the encapsulation is broken in an unpleasant way.

In OpenVX 1.0, a delay object is a ‘real’ object. It can be used in a graph, but its scope is not limited to the graph (like a real image). This object can be used outside of the graph (for instance in another graph). For this reason, it is the responsibility of the user to age the delay.
Nevertheless, I agree that in many cases the user will want the delay to be aged automatically after graph execution, so this is also a subject to investigate further.

> Would it be a good idea to use Node Callbacks that call vxAgeDelay in those cases (there are some warnings about this being inefficient)?

I don’t think that NodeCallback is the right mechanism for what you want since, as a user, you don’t have much control over when the node callback is actually called. I also don’t think there is a good way to do what you want in OpenVX 1.0. I would therefore recommend aging delays manually for the time being.
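
A minimal sketch of the manual pattern (assuming a two-slot delay of 640x480 U8 images feeding a graph that reads the previous frame and writes the current one):

/* Two-slot delay: index 0 is the newest slot, index -1 the previous one. */
vx_image exemplar = vxCreateImage(context, 640, 480, VX_DF_IMAGE_U8);
vx_delay delay    = vxCreateDelay(context, (vx_reference)exemplar, 2);
vx_image current  = (vx_image)vxGetReferenceFromDelay(delay,  0);
vx_image previous = (vx_image)vxGetReferenceFromDelay(delay, -1);

/* ... build the graph so that it reads 'previous' and writes 'current' ... */

for (;;) {                 /* per input frame */
    vxProcessGraph(graph);
    vxAgeDelay(delay);     /* manual aging after each execution */
}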

Question 13.

> To use the distinction between vx_scalar and vx_int32 to specify where changing a parameter will enforce a new verification feels quite hackish. Wouldn’t it be better to specify this more explicitly? I.e. add an attribute that tells which case it is and add a column with its value to the argument table in the specs. This way this information would be available in the same way as the parameter direction is. This would also allow the specs to let the implementation decide whether a reverification is needed or not in cases where that is appropriate.

Using vx_int32 instead of vx_scalar is not only about preventing two nodes from being connected through this parameter. It is also simpler for the user to create a node with a vx_int32 argument rather than a vx_scalar (no scalar object to create, initialize and release). I think your comment is relevant; things would certainly be more explicit with an additional parameter property.
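
To illustrate the difference from the user’s side, a sketch using vxHarrisCornersNode, whose thresholds are vx_scalar objects while the window sizes are plain vx_int32 (gray and corners are assumed to have been created earlier):

/* The vx_scalar arguments must be created (and later released) explicitly. */
vx_float32 thresh_val = 100.0f, min_dist_val = 3.0f, sens_val = 0.04f;
vx_scalar thresh   = vxCreateScalar(context, VX_TYPE_FLOAT32, &thresh_val);
vx_scalar min_dist = vxCreateScalar(context, VX_TYPE_FLOAT32, &min_dist_val);
vx_scalar sens     = vxCreateScalar(context, VX_TYPE_FLOAT32, &sens_val);

/* The vx_int32 arguments are passed directly by value; changing them later
   would imply a re-verification, as discussed above. */
vxHarrisCornersNode(graph, gray, thresh, min_dist, sens,
                    3 /* gradient_size */, 3 /* block_size */,
                    corners, NULL /* optional num_corners output */);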

Question 15.

> In many cases where pointers are passed to the vx functions, there is also a size parameter specifying how much data could be written. However, this is not the case for e.g. vxCreateScalar, vxAccessScalarValue and vxCommitScalarValue. Is there some logic to when the size is needed and when not?

Usually, a vx_size parameter is given in addition to the pointer in the vxQuery and vxSetAttribute functions. These functions are generic and need to work with attributes of very different sizes. Since the attribute enum name does not explicitly tell which size is expected in most cases, the vx_size parameter acts as a sanity check to avoid memory corruption.
For other functions, like the access/commit functions, there is in effect no such vx_size ‘sanity check’ parameter. Since a scalar object can have different sizes, it might be safer to have a vx_size parameter there as well. I think it’s a question of finding the right tradeoff between simplicity, safety and performance: more parameters usually means more work for the user and lower performance.
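
For example (assuming the OpenVX 1.0 prototypes of vxQueryImage and vxAccessScalarValue):

/* Generic query: the size argument lets the implementation check that the
   destination buffer matches the queried attribute. */
vx_uint32 width = 0;
vxQueryImage(image, VX_IMAGE_ATTRIBUTE_WIDTH, &width, sizeof(width));

/* Scalar access: the size is implied by the scalar's VX_TYPE_*, so no
   explicit size argument is taken. */
vx_int32 value = 0;
vxAccessScalarValue(scalar, &value);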

OK, say I have an OpenVX graph I want to apply to every frame of a video. I
decode the first frame and create the first input image using
vxCreateImageFromHandle, and verify my graph. Now for the second frame, can I
call vxCreateImageFromHandle again to create a second input image and update
the input parameter, or would this force me to run verify again?
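
Roughly this pattern is what I have in mind (just a sketch: addr is the vx_imagepatch_addressing_t describing my frame layout, decode_next_frame() stands in for my decoder, and vxSetParameterByIndex on the first node is only one possible way to express “update the input parameter”):

void *buf = decode_next_frame();                 /* placeholder for my decoder */
void *ptrs[] = { buf };
vx_image input = vxCreateImageFromHandle(context, VX_DF_IMAGE_UYVY,
                                         &addr, ptrs, VX_IMPORT_TYPE_HOST);
vx_image y = vxCreateVirtualImage(graph, 0, 0, VX_DF_IMAGE_U8);
vx_node first = vxChannelExtractNode(graph, input, VX_CHANNEL_Y, y);
/* ... rest of the graph ... */
vxVerifyGraph(graph);
vxProcessGraph(graph);

/* Subsequent frames: is swapping in a new image enough, or does this
   require vxVerifyGraph to be run again? */
while ((buf = decode_next_frame()) != NULL) {
    void *next_ptrs[] = { buf };
    vx_image next = vxCreateImageFromHandle(context, VX_DF_IMAGE_UYVY,
                                            &addr, next_ptrs, VX_IMPORT_TYPE_HOST);
    vxSetParameterByIndex(first, 0, (vx_reference)next);
    vxProcessGraph(graph);
    vxReleaseImage(&input);
    input = next;
}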

The alternative would be to use vxAccessImagePatch, pass the provided pointer
to my decoder, and call vxCommitImagePatch when it is done. This will however
not work in my case, as the driver producing the images has no API for passing
in buffers (they are allocated internally and returned).

Also, I would typically want to pipeline this process using a queue of a few
frames. That way I can have the next frame loaded into its buffer through
some DMA access while I process the current frame. Can I use a Delay object
here? I would need to have a few of the images in the queue “open” with
vxAccessImagePatch while I run a graph that processes some of the other images
in the queue.

For the output, say I use an image created by vxCreateImageFromHandle as the
output image of some node. Can I assume that the pointer I passed to
vxCreateImageFromHandle contains valid results when vxProcessGraph returns? Or
do I have to use vxAccessImagePatch here as well?

Question 5.

> I find that identifying the troublesome node when verification fails can be troublesome. Good error messages will become important here. Today there is only a single return code from the graph verify function to help you do that, right? Would it be possible to also add a reference to the failing node? Preferably including some way to also find the source-code line where the user instantiated that node.

> You have a way to get some messages from the OpenVX implementation: you can register a log callback with vxRegisterLogCallback. The messages are not standardized, but an OpenVX implementation can return more detailed information about errors. The callback gets as a parameter the reference of an object that will, in most cases, be the object causing the issue.

Nice!

I see. Is there somewhere a list of what modifications are allowed without reverifying the graph? Removing a node could render a graph illegal, so I suppose that requires reverification?

> Using vx_int32 instead of vx_scalar is not only about preventing two nodes from being connected through this parameter. It is also simpler for the user to create a node with a vx_int32 argument rather than a vx_scalar (no scalar object to create, initialize and release). I think your comment is relevant; things would certainly be more explicit with an additional parameter property.

Yes, vx_int32 is a lot simpler. Why not use it everywhere in the vxXxxNode functions (except for outputs, I suppose) and force people to use the manual node interface if they really need to pass in vx_scalar’s? That should be the uncommon case and is not very simple anyway.

> Usually, a vx_size parameter is given in addition to the pointer in the vxQuery and vxSetAttribute functions. These functions are generic and need to work with attributes of very different sizes. Since the attribute enum name does not explicitly tell which size is expected in most cases, the vx_size parameter acts as a sanity check to avoid memory corruption.
> For other functions, like the access/commit functions, there is in effect no such vx_size ‘sanity check’ parameter. Since a scalar object can have different sizes, it might be safer to have a vx_size parameter there as well. I think it’s a question of finding the right tradeoff between simplicity, safety and performance: more parameters usually means more work for the user and lower performance.

Yes. It is also a matter of consistency in the API. It is easier to learn a consistent API, and I’d say that it is actually easier to always add a size parameter than to almost always have to look up the function in the docs to see whether I need to add the size parameter or not. The same goes for the vx_int32/vx_scalar case above.

Hi,
two more comments:

What’s the idea behind mixing the parameter size in bytes and its vx_type_e in VX_PARAMETER_ATTRIBUTE_TYPE? Why not always use its vx_type_e? From that, its size in bytes could be derived, right? If not, it would be cleaner to add a second attribute, VX_PARAMETER_ATTRIBUTE_SIZE_IN_BYTES.

There’s a typo in the specs making it unclear whether to use a vx_size or a “void *” parameter with VX_KERNEL_ATTRIBUTE_LOCAL_DATA_PTR. I’d go for “void *” :)

To the point of vx_scalar versus vx_int32’s:

A vx_scalar is an opaque data object reference (internally it may be a handle/index/pointer) with no internal size defined by the specification (since it’s opaque), so no vx_size (a size_t) is needed. A vx_scalar can hold ANY “register”-style value (int/double/char, etc.) and it can exist anywhere in the system (CPU/GPU/DSP, etc.). vxQueryXXXX takes a vx_size to complement the defined data structures like vx_perf_t, which has a defined sizeof(vx_perf_t) in the specification.


vx_perf_t perf;
vx_context context = vxCreateContext();
vx_graph graph = vxCreateGraph(context);
// add nodes
// verify, execute
vx_status status = vxQueryGraph(graph, VX_GRAPH_ATTRIBUTE_PERFORMANCE, &perf, sizeof(perf));

This allows a simple API (ptr, size) which handles multiple attribute types instead of having potentially multiple query APIs per object.

Hi,

In the last update of the standard you removed the vxAssociateWithDelay function, and now node parameters are associated with delay objects by default.

Is there any substitution for this function, or is there any other way to update the reference of a node parameter after delay aging?

Thanks!

  1. Do I have to use Visual Studio 2013 (12)? Can I use Visual Studio 2010 (10) instead?
  2. I built it via CMake 3.2.2’s GUI based on Visual Studio 2010 (10), but afterwards there is no \install\ directory as indicated in the README. If I then try to build the INSTALL project in OpenVX.sln, loads of errors are reported, most of which are related to unrecognized typedefs such as vx_float16, vx_uint32, vx_array, vx_uint8 and vx_distribution, etc.

I did the aforementioned build in Debug/Win32 mode in VS.

Thanks for your help in advance.

Laters…
Theron

Plus, I tried to build it using VS 2013 (12) in the morning, and the same issue persists. Also, when I tried to run build.py via VS’s command line, it said the following.

[screenshot of the build.py error attached]

What did I do wrong? Could anyone give me a hand? Cheers a lot.

Laters…
Theron

Hi, I got all the previously mentioned problems solved. Thanks all the same!