malloc/free and executing sub kernels

In the program I’m writing, I’d find it handy to be able to acquire more memory from within the OpenCL runtime, without the host having to preallocate it beforehand.

Also, I’d like to see some way of launching kernels from within the runtime (as you would from the host), again without needing to involve the host.

Yes, I totally agree.

It would be very nice if we could spawn new kernels from existing ones. This seems a little contradictory, since fast thread execution is partly achieved by pre-allocating resources before kernel invocation, and changing those resources on the fly could lower performance because of dynamic resource management overhead. But this could be managed through an option at kernel launch: you could pass arguments to the kernel launch function, something like allow_dynamic_thread_creation or allow_dynamic_mem_allocation. The developer would then be free to choose between flexibility and performance.
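
To make the idea a bit more concrete, here is a rough sketch in plain C of what such launch options might look like. Everything in it is hypothetical: the flag names mirror the ones suggested above, and launch_config, make_launch_config and launch_uses_fast_path are made-up names for illustration only. None of this is part of the OpenCL API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical opt-in flags; these do NOT exist in the OpenCL API. */
enum {
    LAUNCH_DEFAULT                       = 0,
    LAUNCH_ALLOW_DYNAMIC_THREAD_CREATION = 1 << 0,
    LAUNCH_ALLOW_DYNAMIC_MEM_ALLOCATION  = 1 << 1
};

/* Hypothetical description of what the runtime would have to reserve. */
typedef struct {
    unsigned flags;        /* bitwise OR of the LAUNCH_* flags         */
    size_t   heap_bytes;   /* device heap for in-kernel allocation     */
    size_t   spawn_slots;  /* queue slots for kernels spawned in-kernel */
} launch_config;

/* Build a launch configuration. Resources are reserved only for the
 * features that were explicitly requested, so a default launch pays
 * nothing for the dynamic features. */
launch_config make_launch_config(unsigned flags,
                                 size_t heap_bytes,
                                 size_t spawn_slots)
{
    launch_config cfg;
    cfg.flags = flags;
    cfg.heap_bytes =
        (flags & LAUNCH_ALLOW_DYNAMIC_MEM_ALLOCATION) ? heap_bytes : 0;
    cfg.spawn_slots =
        (flags & LAUNCH_ALLOW_DYNAMIC_THREAD_CREATION) ? spawn_slots : 0;
    return cfg;
}

/* True when no dynamic feature was requested, i.e. the runtime could
 * keep using fully pre-allocated resources. */
bool launch_uses_fast_path(const launch_config *cfg)
{
    return cfg->flags == LAUNCH_DEFAULT;
}
```

A kernel launched with LAUNCH_DEFAULT would behave exactly as it does today, while one launched with either flag would accept whatever overhead the dynamic bookkeeping costs in return for the extra flexibility.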