Some general confusion about Coarse-Grain SVM allocation

I get that a buffer the size of the whole data structure is allocated with clSVMAlloc. I’m filling it with a binary tree, so the root node will be the first thing the pointer refers to. But how do I create additional pointers into the shared virtual memory to represent the other nodes? In C with malloc, or C++ with new, I generally wouldn’t worry about where the data lived; each new node would just get its own memory from another call to malloc or new. But I have a feeling that doing that with clSVMAlloc will just add complexity when it comes time to send all this data to the kernel, and possibly make the kernel much slower to start.

The AMD and Intel examples are helpful, but I’m having a little trouble finding where they’re populating the data in their structures to answer this question on my own…

(if only my GPU fully supported fine-grain system SVM >.>)

edit ----
Now maybe answering my own question… If I just treated it like a dynamically allocated array, I could just plop the data into computed offsets, leading to faster tree traversal as well since each level of the tree would be in contiguous memory… I’d still appreciate some input… I guess I feel like there’s a call I’m missing to generate a pointer to a currently unused block of the previously allocated buffer.
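To make the computed-offset idea concrete, here is a minimal sketch. It assumes the tree is complete (or nearly so), so that the children of the node at index i sit at indices 2i+1 and 2i+2; a sparse tree would waste slots with this layout. The names Node, alloc_node_block and link_level_order are made up for illustration, and malloc stands in for clSVMAlloc so the snippet compiles without an OpenCL SDK.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical node matching the description: a data pointer plus two children.
struct Node {
    void* data;   // would point into the separate SVM data buffer
    Node* left;
    Node* right;
};

// Stand-in for clSVMAlloc(context, CL_MEM_READ_WRITE, count * sizeof(Node), 0).
// Assumption: plain malloc is used here only so the sketch runs without OpenCL.
Node* alloc_node_block(std::size_t count) {
    return static_cast<Node*>(std::malloc(count * sizeof(Node)));
}

// Link `count` nodes stored in level order inside one contiguous block.
// With a coarse-grain SVM buffer, the host would do this between
// clEnqueueSVMMap and clEnqueueSVMUnmap.
void link_level_order(Node* nodes, std::size_t count) {
    assert(nodes != nullptr || count == 0);
    for (std::size_t i = 0; i < count; ++i) {
        const std::size_t l = 2 * i + 1;  // computed offset of the left child
        const std::size_t r = 2 * i + 2;  // computed offset of the right child
        nodes[i].data  = nullptr;
        nodes[i].left  = (l < count) ? &nodes[l] : nullptr;
        nodes[i].right = (r < count) ? &nodes[r] : nullptr;
    }
}
```

Because every child pointer is just an address inside the same block, no extra call is needed to "generate" a pointer: taking &nodes[k] is enough.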

My nodes are very simple, basically just 3 pointers, 1 pointer to some data, and then of course the left and right node pointers. The data will be in a separate SVM buffer.

I think you’re on the right path here. Once you allocate the space for all of the nodes using clSVMAlloc, you need to manage the nodes in that block yourself. You could do that a number of ways. If your allocation pattern is simple, and you allocate many entries and then free them all at once, then you just need a counter that holds the index of the next free entry. If you need to interleave allocating and freeing, you could keep an out-of-band list of free nodes and push and pop from there, although if your nodes are just 3 pointers, a separate list of free nodes seems excessive. If you still want to use “new” on the host, “placement new” can be used to initialize the nodes given a pointer to memory that you allocate through whatever mechanism you implement.
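A rough sketch of that kind of node manager, under some assumptions: NodePool and its member names are hypothetical, the block passed in is whatever clSVMAlloc returned (any suitably aligned memory works for trying it out), and freed nodes are threaded through their own left pointer instead of being kept in a separate out-of-band list. That last part is a variation on the suggestion above, not the only way to do it.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical node: a data pointer plus left and right children.
struct Node {
    void* data;
    Node* left;
    Node* right;
};

// Hands out fixed-size nodes from one pre-allocated block.
class NodePool {
public:
    NodePool(void* block, std::size_t capacity)
        : base_(static_cast<Node*>(block)),
          capacity_(capacity), next_(0), free_(nullptr) {
        assert(block != nullptr || capacity == 0);
    }

    // Returns an initialized node, or nullptr when the block is exhausted.
    Node* allocate(void* data) {
        Node* slot = nullptr;
        if (free_ != nullptr) {            // reuse a released node first
            slot = free_;
            free_ = free_->left;
        } else if (next_ < capacity_) {    // otherwise bump the counter
            slot = &base_[next_++];
        } else {
            return nullptr;
        }
        // Placement new: construct the node in memory we already own.
        return new (slot) Node{data, nullptr, nullptr};
    }

    // Push a node onto the free list, linked through its own `left` field.
    void release(Node* node) {
        node->left = free_;
        free_ = node;
    }

    // "Free them all": forget every node at once.
    void reset() {
        next_ = 0;
        free_ = nullptr;
    }

    std::size_t used() const { return next_; }

private:
    Node*       base_;
    std::size_t capacity_;
    std::size_t next_;   // index of the next never-used entry
    Node*       free_;   // head of the list of released nodes
};
```

If you only ever build the tree and then throw it all away, allocate and reset are all you need and release can be dropped. With a coarse-grain buffer, the host-side calls would happen while the buffer is mapped with clEnqueueSVMMap.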

Although this does add a bit of code complexity, I think you’ll find it more efficient than using generalized malloc on large numbers of fixed-size things, even if there were no OpenCL interactions at play.