Feedback: New SPIR-V common intermediate language used by both OpenCL 2.1 and Vulkan

The Khronos Group made another significant announcement: OpenCL 2.1 and Vulkan™, the new open standard API for high-efficiency access to graphics and compute on modern GPUs, are now sharing core intermediate language technologies, resulting in SPIR-V, a revolution in the Khronos Standard Portable Intermediate Representation initially used by OpenCL™, now fully defined by Khronos with native support for shader and kernel features. SPIR-V splits the compiler chain, enabling high-level language front-ends to emit programs in a standardized intermediate form to be ingested by Vulkan or OpenCL drivers. Eliminating the need for a built-in high-level language source compiler significantly reduces driver complexity and will enable a diversity of language front-ends. Additionally, a standardized IR provides a measure of kernel IP protection, accelerates kernel load times, and enables developers to use a common language front-end, improving kernel reliability and portability across multiple implementations.

All feedback that is posted to this topic thread will be considered by the working group. We greatly appreciate what you have to say and the time you spent preparing your feedback.

Hello folks,

First and foremost, I’d like to congratulate and thank you all for this incredible work!

I have a few questions (and worries) about the OpenCL++ specs, and I’d like to know whether these restrictions exist because of a lack of SPIR-V features or because somebody decided to restrict them in the OpenCL++ specs.
So, on page 37 of the OpenCL++ specs I see the following restrictions:

  • The dynamic_cast operator (section 5.2.7)
    Is this a SPIR-V restriction? It’s not a must-have; most C++ developers can probably survive without it, but, IMHO, there are cases where it can be quite useful.

  • Type identification (section 5.2.8)
    Same as dynamic_cast.

  • Recursive function calls (section 5.2.2, item 9)
    You mean I can’t create a recursive function? E.g. I can’t compute a Fibonacci number recursively? If my assumption is true (TBH I really hope it’s not!) then the question is: is it SPIR-V’s fault? It will be a quite annoying restriction even if it is an OpenCL++-only restriction…

  • new and delete operators (sections 5.3.4, 5.3.5)
    Is this a SPIR-V restriction? Why can’t I allocate/free memory at runtime?

  • goto statement (section 6.6)
    Is this a SPIR-V restriction?

  • register, thread_local storage qualifiers (section 7.1.1)
    These are not necessary unless GPU vendors make use of them. IMHO it doesn’t hurt if SPIR-V can handle them…

  • virtual function qualifier (section 7.1.2)

  • virtual functions and abstract classes (sections 10.3, 10.4)

  • function pointers (sections 8.3.5, 8.5.3)
    Is this a SPIR-V restriction? IMHO SPIR-V should handle them.

  • noexcept operator (section 5.3.7)

  • exception handling (section 15)
    I hope this is not a SPIR-V restriction :). Exceptions are quite common in all modern languages. If you want people to be able to write a shader in JavaScript, Python, etc., then they are used to using exceptions and will find it quite annoying if they can’t use them.

[QUOTE=bog_dan_ro;31038]Hello folks,

First and foremost, I’d like to congratulate and thank you all for this incredible work!

I have a few questions (and worries) about the OpenCL++ specs, and I’d like to know whether these restrictions exist because of a lack of SPIR-V features or because somebody decided to restrict them in the OpenCL++ specs.
So, on page 37 of the OpenCL++ specs I see the following restrictions:

  • The dynamic_cast operator (section 5.2.7)
    Is this a SPIR-V restriction? It’s not a must-have; most C++ developers can probably survive without it, but, IMHO, there are cases where it can be quite useful.

  • Type identification (section 5.2.8)
    Same as dynamic_cast.
    [/QUOTE]
    You are not allowed to redeclare the same type with different IDs, so each type is fully defined and unique.

Conversions between primitive types are available, though.

Recursion is possible, as you can call forward-declared functions (you need to specify the type of the function you are calling, which is prototyped with the types at the top).

However, drivers/hardware can restrict the stack size such that you can only squeeze out two recursive calls.
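
In practice, with a stack budget that tiny, recursive algorithms get rewritten iteratively. A minimal sketch of the Fibonacci example raised above, in plain C++ rather than OpenCL++ (the function name is mine):

```cpp
#include <cstdint>

// Iterative Fibonacci: the standard workaround when recursion is
// unavailable or stack-limited, since it needs only two values of
// rolling state and no call stack.
std::uint64_t fib(unsigned n) {
    std::uint64_t a = 0, b = 1;  // fib(0) and fib(1)
    for (unsigned i = 0; i < n; ++i) {
        std::uint64_t next = a + b;
        a = b;
        b = next;
    }
    return a;  // after n steps, a == fib(n)
}
```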

OpVariable is a “new Type()” and OpVariableArray is a “new Type[n]”.

Deallocation/deletion can happen implicitly when the object is no longer used inside the scope.

Besides that, there are OpLifetimeStart and OpLifetimeEnd in flow control.

No; you can use arbitrary OpBranches, but that makes optimization harder, as the compiler can’t add the Op*Merge opcodes to some flow-control constructs.

I think you use the function IDs as first-class objects, making them part of a struct type, and OpSelect between them.

This will be needed if you want to support subroutines in shaders (as you’d need to OpLoad it from a variable).
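
In high-level terms, that amounts to dispatching on an integer ID instead of through a pointer. A hypothetical C++ sketch of the idea (the shader names are made up; a real front-end would emit OpSelect or branches rather than a switch):

```cpp
// Emulating a "subroutine uniform" without function pointers:
// select between a fixed set of functions with an integer ID.
// Every possible callee is known at compile time.
float shade_phong(float x) { return x * 0.5f; }
float shade_toon(float x)  { return x < 0.5f ? 0.0f : 1.0f; }

float shade(int which, float x) {
    switch (which) {              // the "uniform int" selector
        case 0:  return shade_phong(x);
        case 1:  return shade_toon(x);
        default: return 0.0f;
    }
}
```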

[QUOTE=bog_dan_ro;31038]

  • noexcept operator (section 5.3.7)
  • exception handling (section 15)
    I hope this is not SPIR-V restriction :). Exceptions are quite common in all modern languages. If you want people to be able to write a shader in JavaScript,Python, etc. then they are used to use exceptions and they will find quite annoying if they can’t use them.[/QUOTE]
C has worked fine without exceptions.

I think it was removed because the basic-block assumption would be violated, removing some optimization options.

I don’t think you fully understand that the GPU computational space is decidedly limited in its capabilities, compared to what CPUs can do. This is dictated by hardware; if SPIR-V limits some form of execution, it is generally because those are the limits imposed by the hardware SPIR-V abstracts.

These limitations are what constrain OpenCL++'s language support, not SPIR-V. GPUs cannot handle C++ in all of its complexity, so the goal with OpenCL++ is to handle as much of C++ as GPUs can.

That doesn’t make GPU compute or OpenCL++ useless; it’s quite useful in fact. But it’s not reasonable to expect the same level of feature support as a CPU.

At least not yet.

I’ll re-order your questions in my response:

[QUOTE=bog_dan_ro;31038]- Recursive function calls (section 5.2.2, item 9)
You mean I can’t create a recursive function? E.g. I can’t compute a Fibonacci number recursively? If my assumption is true (TBH I really hope it’s not!) then the question is: is it SPIR-V’s fault? It will be a quite annoying restriction even if it is an OpenCL++-only restriction…[/quote]

The execution model of most GPUs does not include a stack, which is essential to implementing arbitrary recursion. I understand the desire for recursion and such, but it’s not feasible on most modern GPUs.

[QUOTE=bog_dan_ro;31038]- The dynamic_cast operator (section 5.2.7)

  • virtual function qualifier (section 7.1.2)
  • virtual functions and abstract classes (sections 10.3, 10.4)
  • function pointers (sections 8.3.5, 8.5.3)
    [/quote]

These all boil down to basically the same issue: the lack of function pointers. You can’t really implement virtual functions/inheritance without function pointers. You can’t have abstract classes if you don’t have virtual functions. And there’s no point in dynamic_cast if you can’t have virtual functions/inheritance, since all of your types are static and known at compile-time.

So it all comes down to not having function pointers. The problem is that GPUs don’t work well with function pointers, since function pointers tend to rely on a stack. And as previously stated, the execution model of most GPUs doesn’t give you one.

Given the above restrictions… why would you need RTTI? If you can’t have virtual functions and virtual base classes, then all your types are static. Thus, all types are known at compile-time.

Also, RTTI would require bloating your executable with runtime type information.

[QUOTE=bog_dan_ro;31038]- new and delete operators (sections 5.3.4, 5.3.5)
Is this a SPIR-V restriction? Why can’t I allocate/free memory at runtime?[/quote]

Because GPU compute elements can’t allocate memory. You can provide them with memory buffers, and they can use those. But allocating actual memory is very much beyond their capabilities.

Remember: the GPU execution model is many-core; you have hundreds of these small threads all executing at once, doing the same operations on different parts of memory. And memory allocation (from a shared pool, like GPU memory) is a highly serialized process, guarded by something like a mutex. So while one thread holds that mutex, any other thread that tries to allocate memory will have to stop and wait for it to finish.

And remember: we’re talking hundreds of threads.

That’s just not viable if you want anything remotely like performance. Which is why you’re using OpenCL to begin with :wink:
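
The usual pattern is therefore to allocate once up front and hand the kernel a buffer to carve up itself. A hypothetical bump-allocator sketch in plain C++ (`Arena` and `take` are my names, not an OpenCL API):

```cpp
#include <cstddef>

// A caller-provided arena: one buffer is allocated up front, and
// "allocation" is just advancing an offset inside it. No shared
// allocator lock is ever taken at use time.
struct Arena {
    std::byte* base;        // start of the pre-allocated buffer
    std::size_t capacity;   // total bytes available
    std::size_t used = 0;   // bytes handed out so far

    void* take(std::size_t n) {       // bump-pointer "allocation"
        if (used + n > capacity) return nullptr;  // out of space
        void* p = base + used;
        used += n;
        return p;
    }
};
```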

  • goto statement (section 6.6)
    Is this a SPIR-V restriction?

Only in the sense that SPIR-V doesn’t have a generalized goto. It’s much more of a “we don’t want you to be able to jump around code arbitrarily” restriction. That is, even if SPIR-V did support it, I doubt OpenCL++ would.

And in this case, it’s not so much that GPUs can’t handle general goto (probably). It’s likely more that GPU vendors don’t want users to have that capability.

  • noexcept operator (section 5.3.7)
  • exception handling (section 15)
    I hope this is not a SPIR-V restriction :). Exceptions are quite common in all modern languages. If you want people to be able to write a shader in JavaScript, Python, etc., then they are used to using exceptions and will find it quite annoying if they can’t use them.

There are many “modern languages” that don’t have exceptions, so they’re hardly a linguistic requirement.

Equally importantly… why would you want to write compute operations in a scripting language like JavaScript or Python? Compute operations would gain nothing from such an environment. As you’ve pointed out, memory allocation is not allowed, so you don’t get the benefits of garbage collection. Nor can you do runtime compilation of new functions. Since function pointers aren’t possible, you’ll lose the ability to have first-class functions (so it’s doubtful that you could even implement them). If you have to give up most of the advantages of the language just to be able to execute, why are you using that language?

So really, the only reason to use these over something like C or C++ is syntactic preference. In which case, I’d say: “deal with it.”

Lastly, I’m curious as to exactly what you plan to do in OpenCL that you feel the need to have exceptions. While compute operations don’t need to be tiny things, a single compute operation is usually not so gigantic that exceptions would be an essential means of error handling.

[QUOTE=Alfonse Reinheart;31041]These all boil down to basically the same issue: the lack of function pointers. You can’t really implement virtual functions/inheritance without function pointers. You can’t have abstract classes if you don’t have virtual functions. And there’s no point in dynamic_cast if you can’t have virtual functions/inheritance, since all of your types are static and known at compile-time.

So it all comes down to not having function pointers. The problem is that GPUs don’t work well with function pointers, since function pointers tend to rely on a stack. And as previously stated, the execution model of most GPUs doesn’t give you one.[/QUOTE]

So shader subroutine uniform support will need to be handled using opSelect on a uniform int?

To the extent that people use shader subroutines, yes, you’d have to do something like that. Unless I missed something in the SPIR-V spec (which is entirely possible), there’s no direct support for such functions.

I’d like to start with a clarification. I fully understand, and I really don’t mind, if all these restrictions are in the OpenCL++ 2.1 spec, because it targets past and present GPUs. However, SPIR-V is not OpenCL++, and it should not be tightly tied to past and present GPUs; instead, the spec should have future GPUs in mind (e.g. 10–20 years from now). It should be the technology that pushes things forward.

So, this is SPIR-V restriction, right?

Good to know that this is not an SPIR-V restriction.

[QUOTE=ratchet freak;31039]OpVariable is a “new Type()” and OpVariableArray is a “new Type[n]”.

Deallocation/deletion can happen implicitly when the object is no longer used inside the scope.

Besides that, there are OpLifetimeStart and OpLifetimeEnd in flow control.
[/QUOTE]

Hmm … what if I want to create a function that computes something and returns a big result (e.g. a huge vector)? Will it be copied twice?

e.g.
vector * compute_my_vector()
{
    if (some_error)
        return nullptr;
    vector * ret = new vector;
    //…
    return ret;
}

[QUOTE=ratchet freak;31039]No; you can use arbitrary OpBranches, but that makes optimization harder, as the compiler can’t add the Op*Merge opcodes to some flow-control constructs.
[/QUOTE]

Good to know that this is not an SPIR-V restriction.

[QUOTE=ratchet freak;31039]I think you use the function IDs as first-class objects, making them part of a struct type, and OpSelect between them.

This will be needed if you want to support subroutines in shaders (as you’d need to OpLoad it from a variable).
[/QUOTE]

Shall I understand that SPIR-V can’t handle virtual functions?

[QUOTE=ratchet freak;31039]C has worked fine without exceptions.

I think it was removed because the basic-block assumption would be violated, removing some optimization options.
[/QUOTE]

Right, but SPIR-V is not C, and you aim for a rich variety of front-ends which are not C :).
IMHO it would not hurt at all if SPIR-V handled it. Of course, the OpenCL standard can explicitly specify that current GPU drivers don’t support it (because I don’t see any reason why the GPU assembly can’t handle it), and maybe in OpenCL 3.0 you’ll add it.

IMHO SPIR-V should not limit anything. SPIR-V should handle full C++ or any other language out there. The GPU limitations should come only from the OpenCL standard. I really don’t understand why SPIR-V needs to be bound to the capabilities of current drivers/GPUs…

At least not yet. That’s what I’m talking about! So why limit SPIR-V?

But we can’t assume that in the future the situation will never change, right?

I understand that current GPUs/drivers don’t have a stack, and starting from that restriction you’ve had to remove the other features. But all these restrictions are for current GPUs and must come from the OpenCL specs, NOT from SPIR-V, because nobody knows what GPUs/drivers will be capable of 10 years from now.

I really don’t know many general-purpose languages created after 2000 that don’t have exceptions. Actually, besides Go, which can return multiple values so you can still signal the problem, I don’t know any other.

Why do you think it would be that wrong to use JavaScript, Python, etc. as front-ends to write shaders/compute operations, as long as they end up as SPIR-V code? I thought that was the idea of using SPIR-V: to allow game engines (or other compilers/interpreters) to use whatever they want as front-ends for shaders/compute operations. I don’t see anything wrong with web browsers in the future allowing me to use JavaScript to write my shaders.

Exceptions can be heaven and/or hell; it just depends on where and how they are used. Personally, I like exceptions because they make my code look cleaner, especially when I’m going to call lots of functions that might fail. But it really doesn’t matter at all why, how, and whether I (or you) will use exceptions, because we can’t assume that we know what’s best for everyone. IMHO what matters most is that SPIR-V, as an open IL specification that can be used by any front-end, should not limit exceptions.

SPIR-V should have no limits at all; all the restrictions (device capabilities) should come only from the OpenCL/GLSL specs. It would be quite unpleasant if in the future we see SPIR-V 2.0 not being compatible with the initial SPIR-V :slight_smile:

I really appreciate what you’ve done with SPIR-V; it’s a huge step forward, and I really hope my feedback helps and doesn’t offend anyone.

One quick question:
Is it “spir-vee” or “spir-five”?

According to the presentation live-stream, it’s “SPIR-vee” (kinda logical, since there wasn’t a SPIR 3 or 4).

Hmm … what if I want to create a function that computes something and returns a big result (e.g. a huge vector)? Will it be copied twice?

What do the OpenCL++ and the C++14 specifications say? I can’t say what OpenCL++ says, but I do know what C++14 says about this.

You’re trying to look at an intermediate language when thinking about the features of the higher-level language you’re actually using. You shouldn’t; the whole point of the high-level language is that it’s an abstraction. What that language says happens is what happens; let the compiler deal with how to implement it in SPIR-V.
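
For the concrete question above: in C++14, the natural form is to return the vector by value and let move semantics and copy elision do the work, so nothing gets deep-copied twice. A sketch, using std::vector rather than the unspecified vector type in the original snippet:

```cpp
#include <vector>

// Returning a large result by value: C++ guarantees at most a move
// here, and compilers usually elide even that (RVO), so the vector
// is never copied twice.
std::vector<int> compute_my_vector(bool some_error) {
    if (some_error)
        return {};                   // empty result signals failure
    std::vector<int> ret(1000, 42);  // big result, built in place
    return ret;                      // moved or elided, not deep-copied
}
```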

IMHO SPIR-V should not limit anything. SPIR-V shall handle full C++ or any other language out there. The GPU limitations should come only from OpenCL standard. I really don’t understand why SPIR-V needs to be bound to current drivers/GPUs capabilities…

This sort of predicting-the-future has been tried. OpenGL did a whole lot of it. It made a lot of guesses about what future graphics hardware would look like.

It got some things right, particularly early on. But as time passed, it got more and more wrong, to the point where OpenGL 2.0 had to develop an entirely new mechanism to deal with a new paradigm (i.e., shader and program objects) that overrode large chunks of the API.

Predicting the future is dangerous. Better to standardize what exists and what will exist in the near-term.

SPIR-V is an intermediate language; you’re not supposed to keep data in this form long-term. So if we get a SPIR-V 2.0 in 3 years that provides more features, no harm done. If they have to completely rebuild SPIR-V from the ground up, no harm done. They just change the intermediate representation, people patch their front end (and add new features to their languages), and everyone’s happy.

You never want to give people the idea that something will be widely supported unless it actually is widely supported. So if there’s no support for it now or in the near future, it shouldn’t be there.

If Vulkan and OpenCL 2.1 wanted a future-proof intermediate language, they’d just use LLVM. But they didn’t. SPIR-V is not meant to be the final, ultimate intermediate language for everything under the sun.

It’s meant for today and the next few years. When things change, so too will SPIR-V. Think of it as hardware-neutral assembly. As hardware changes, as it gets more features, so too does the assembly.

Why do you think it will be that wrong to use JavaScript, Python, etc. as frontends to write shaders/compute operations as long as they end-up as SPIR-V code?

I didn’t say that it would be “wrong”. I asked why you would want to. What advantage would you get from doing so, beside some small syntactic sugar?

Python is a solid language because of its wide-ranging support, including modules for pretty much anything. It’s a scripting language that has first-class functions with proper lexical scoping, the ability to compile itself within itself, and various other scripting features. When I consider starting a project and evaluate languages for use in that project, these are the things that matter when deciding to use Python or not.

You get none of these advantages in a compute environment.

Remember: your argument was that SPIR-V needed to support exceptions, so that “modern” languages can use them. My counter-argument is to ask why supporting that particular feature is so important, when SPIR-V also doesn’t support most of the other reasons to use those languages to begin with.

Or to put it another way, you can freely use JavaScript as your front-end language to SPIR-V. But you will not have most of the language features of JavaScript, since SPIR-V doesn’t support compilation, first-class functions, etc. So why would having exceptions make the job of implementing such a language easier, when you already have to lobotomize the language just to make the syntax work?

I do have a question: how are undefined intermediates handled (the result of OpUndef)?

Is it by default that any result of an opcode using an undefined intermediate is also undefined, except for things like OpSelect?

Then how is OpCompositeInsert handled with an undefined composite: is the entire result undefined, or is the inserted portion defined while the rest remains undefined?

While writing my decoder, I noticed that some constant fields are mask fields. For example, with Function Control Mask I can do (Pure | Const), but it may be better if I could just treat it as a set: {“Pure”, “Const”}.

Could you clearly indicate in the specification when a constant field can be a mask, so they can be treated appropriately?

Why is multi-variable return handled by pointer parameters instead of structures? Take the modf function of the GLSL extension: why not make the result type a 2-member structure where the members must be the same type as x? The return-by-pointer style stems from the olden C days, when returning structs was not allowed by the spec; these days it’s not necessary at all. The result will probably be put into 2 registers anyway, and a bad optimizer will then need to store one into memory and load it back in when used.

Using a structure return would be much cleaner and more straightforward in use and closer to the metal anyway.
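
The two conventions can be sketched in C++: std::modf is the classic out-parameter form, and ModfResult shows the struct-return shape being proposed (the names are illustrative, not the SPIR-V interface):

```cpp
#include <cmath>

// Struct-return convention: both parts of modf come back as one
// value, instead of the integral part being written through a
// pointer as in the C library form.
struct ModfResult {
    double frac;      // fractional part, same sign as x
    double integral;  // integral part
};

ModfResult modf_struct(double x) {
    double ip;
    double f = std::modf(x, &ip);  // classic out-parameter form
    return {f, ip};                // repackaged as a plain value
}
```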

It’s probably because returning by structure requires that this structure be defined somewhere. So… how does that happen, exactly? Who defines it?

modf, for example, is allowed to take any numeric type. And its return type is the exact same type. Since there can be lots and lots of numeric types (various bit-sizes), and each type is considered to be different, there would need to be lots and lots of structs for them.

Where do they come from? Where is the Result<id> that declared these types?

No, it’s much simpler for opcode return values to happen as values, rather than via objects like this.

As to whether it would be “closer to the metal”… I fail to see how. It’s not like registers are organized into structs or something. In fact, I wonder if assembly languages even have a direct opcode equivalent for “modf”. I’d guess (admittedly based on nothing specific) that this function would be implemented as two distinct opcodes: one for the division and one for the remainder. And if you don’t use one or the other, that opcode can be culled during optimization.

If that’s the case, it’s much easier for the compiler to see whether you’re using a register than whether you’re using a particular member of a structure. Not that much easier probably, but it’s certainly more explicitly visible when such a value has gone unused.

[QUOTE=Alfonse Reinheart;31168]It’s probably because returning by structure requires that this structure be defined somewhere. So… how does that happen, exactly? Who defines it?

modf, for example, is allowed to take any numeric type. And its return type is the exact same type. Since there can be lots and lots of numeric types (various bit-sizes), and each type is considered to be different, there would need to be lots and lots of structs for them.

Where do they come from? Where is the Result<id> that declared these types?

No, it’s much simpler for opcode return values to happen as values, rather than via objects like this.

As to whether it would be “closer to the metal”… I fail to see how. It’s not like registers are organized into structs or something. In fact, I wonder if assembly languages even have a direct opcode equivalent for “modf”. I’d guess (admittedly based on nothing specific) that this function would be implemented as two distinct opcodes: one for the division and one for the remainder. And if you don’t use one or the other, that opcode can be culled during optimization.

If that’s the case, it’s much easier for the compiler to see whether you’re using a register than whether you’re using a particular member of a structure. Not that much easier probably, but it’s certainly more explicitly visible when such a value has gone unused.[/QUOTE]

With the structure output, the optimizer just has to look for the OpCompositeExtracts on the result and see whether any use index 0 (for the fractional part) or 1 (for the integral part). Otherwise it has to look for the OpLoad using the output parameter, and there will be more OpLoads than OpCompositeExtracts in the average program.

A direct modf opcode would use a single input register and 2 output registers. The output-pointer form would necessarily follow that up with an OpStore. By that point all structures would be flattened anyway, and the assembler can consider the extra output trashed.

Currently modf requires that “Result Type, the type of x, and the type i points to must all be the same type”, so the type of i must be the OpTypePointer to the OpType of x. Similarly, you could require the result to be “OpTypeStruct a, a”, where a is the OpType of x.

Wait, I’m now confused as to what you’re wanting, because you seem to be talking about different things. First, you asked about opcodes like modf returning structs. Now, you want to change the language to allow opcodes to have multiple return values (your “2 output registers” bit). This is a very different thing, which would require not only a significant rework of the language (extension opcode return values are part of the extension instruction), but a significant rework of drivers that are already being written.

I don’t think it’s worth it. Either of them.

And I really don’t see the point. Even you point out that, while it won’t hurt optimizers, it won’t help them either. So what’s the point? This seems to be a feature that does nothing but make the language easier for a human to use. But humans aren’t supposed to write SPIR-V, so the need for such a feature is… dubious.

[QUOTE=Alfonse Reinheart;31172]Wait, I’m now confused as to what you’re wanting, because you seem to be talking about different things. First, you asked about opcodes like modf returning structs. Now, you want to change the language to allow opcodes to have multiple return values (your “2 output registers” bit). This is a very different thing, which would require not only a significant rework of the language (extension opcode return values are part of the extension instruction), but a significant rework of drivers that are already being written.

I don’t think it’s worth it. Either of them.

And I really don’t see the point. Even you point out that, while it won’t hurt optimizers, it won’t help them either. So what’s the point? This seems to be a feature that does nothing but make the language easier for a human to use. But humans aren’t supposed to write SPIR-V, so the need for such a feature is… dubious.[/QUOTE]

It’s not a feature; it’s a convention to avoid out-parameters in general.

If you are going to design an instruction set with a combined round-to-zero/frac instruction (requiring a shift, a mask, and a subtraction), you are not going to design it with an implicit store to memory and have the program risk a cache miss every time it’s used.

This means that one of the first steps in compilation to machine code would be to expand modf to modf + OpStore, and then hope that the variable OpStore/OpLoad elimination will be sufficient to find and remove that variable.

With the structured output from the get-go there is no variable that may survive beyond the block/function unless the program explicitly opStores the result.

Let me rephrase a bit. First, a few definitions for clarity: opcode means a SPIR-V opcode; ILcode means a driver-specific intermediate-language opcode used for optimizing beyond what you can do with just SPIR-V (with the assumption that structures in the ILcode are flattened, that it only works with scalars or vectors, and that the modf ILcode only uses registers).

I want the modf opcode (and the frexp opcode in the OpenCL extension) to return a struct and avoid the pointer store and later load that the driver needs to optimize out.

With the current style, the modf opcode will expand to a modf ILcode (which uses 2 output registers) and a store ILcode for the second output; then the optimizer will need to eliminate the redundant load of the stored value.

With the structured output, the relevant structures will already be flattened into their individual components in the IL, and the register allocator can have its way with the assigned intermediates.

With the current style, the modf opcode will expand to a modf ILcode (which uses 2 output registers) and a store ILcode for the second output; then the optimizer will need to eliminate the redundant load of the stored value.

What if the ILcode doesn’t have a direct modf ILcode equivalent at all (like LLVM)? It would have to use multiple opcodes anyway.

Or what if the modf ILcode works exactly like the modf OPcode?

Or what if the OPcode-to-ILcode translator turns it into a function call? One that returns a value and has a pointer that it can fill in.

Or what if the OPcode-to-ILcode translator was written by someone who actually did their job correctly and filtered out such minor differences? You know, the kind of thing a translation layer is supposed to do.

You’re talking about a problem that is back-end specific, and easily solvable in those instances when it occurs.

[QUOTE=Alfonse Reinheart;31178]What if the ILcode doesn’t have a direct modf ILcode equivalent at all (like LLVM)? It would have to use multiple opcodes anyway.
[/QUOTE]
modf can be emulated with a trunc and a subtract. frexp requires a bitcast, a mask, and some shifts (plus an OpStore for both in their current form).
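
That emulation is a one-liner; a C++ sketch of modf built from trunc and a subtract (the function name is mine):

```cpp
#include <cmath>

// modf built from simpler operations: truncate toward zero, then
// subtract. The fractional part keeps the sign of x, matching modf.
double emulated_frac(double x, double* integral) {
    *integral = std::trunc(x);  // round toward zero
    return x - *integral;       // fractional remainder
}
```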

But there still shouldn’t be a store needed to get the values.

Then it’s no trouble adding the OpLoad.

That function should be changed to return a struct, and would then get inlined.

w00t one guy did his job correctly, now for all the other implementations

[QUOTE=Alfonse Reinheart;31178]
You’re talking about a problem that is back-end specific, and easily solvable in those instances when it occurs.[/QUOTE]
But it still puts pressure on the optimizers to get it correct.