Help aligning structs when sending an array of structs to the GPU

Hi there

I have some structs and I don't understand how I should align them.

On the CPU I have an array of structs (voxel *VoxelList) that I want to send to the GPU, copying the elements into another array in device memory (GPUVoxelList). Then I want the kernel to access each element through GPUVoxelList.

Here are the structs, laid out as well as I understood the rules (the note after each member is the byte offset I expect, relative to the start x of the struct):

typedef struct __attribute__ ((aligned(16))) Raza
{
    cl_uint4   pozitie;                // x
    cl_float4  directie;               // x + 16
    cl_float   dimensiune;             // x + 32
    cl_float   fov;                    // x + 36
    cl_uchar4  mediu_parcurs;          // x + 40
    cl_uchar4  mediu_curent;           // x + 44
    cl_ushort2 pixel;                  // x + 48
    cl_ushort  influenta;              // x + 52
    cl_ushort  luminozitate_curenta;   // x + 54
    cl_uchar   padd[8];                // x + 56
} raza;                                // ends at x + 64

typedef struct Material
{
    cl_uchar4 culoare;         // x
    cl_char4  normala;         // x + 4
    cl_uchar2 reflexivitate;   // x + 8
    cl_uchar2 transparenta;    // x + 12
    cl_ushort luminozitate;    // x + 16
    cl_uchar  densitate;       // x + 18
} material;                    // ends at x + 19

typedef struct __attribute__ ((aligned(16))) Voxel
{
    cl_uint  fiu[8];    // x
    cl_uint  parinte;   // x + 32
    material m;         // x + 36
    cl_uchar padd[9];   // x + 55
} voxel;                // ends at x + 64

I am only sending the raza and voxel structs. The material struct doesn't need padding of its own, because I pad at the end of the voxel struct and material starts at offset 36, right?
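The hand-computed offsets can be checked by the host compiler itself. Below is a minimal sketch, assuming GCC or Clang (for __attribute__((aligned(16)))) and hypothetical host_* stand-ins for the cl_* vector types, each aligned to its own size the way cl_platform.h aligns them on those compilers. Every static_assert only compiles if the stated offset is the one the compiler actually produces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the cl_* vector types, so this builds without the
   OpenCL headers. Assumption: each vector is aligned to its own size. */
typedef struct { _Alignas(16) uint32_t s[4]; } host_uint4;   /* cl_uint4   */
typedef struct { _Alignas(16) float    s[4]; } host_float4;  /* cl_float4  */
typedef struct { _Alignas(4)  uint8_t  s[4]; } host_uchar4;  /* cl_uchar4  */
typedef struct { _Alignas(4)  int8_t   s[4]; } host_char4;   /* cl_char4   */
typedef struct { _Alignas(4)  uint16_t s[2]; } host_ushort2; /* cl_ushort2 */
typedef struct { _Alignas(2)  uint8_t  s[2]; } host_uchar2;  /* cl_uchar2  */

typedef struct __attribute__((aligned(16))) Raza {
    host_uint4   pozitie;
    host_float4  directie;
    float        dimensiune;
    float        fov;
    host_uchar4  mediu_parcurs;
    host_uchar4  mediu_curent;
    host_ushort2 pixel;
    uint16_t     influenta;
    uint16_t     luminozitate_curenta;
    uint8_t      padd[8];
} raza;

typedef struct Material {
    host_uchar4 culoare;
    host_char4  normala;
    host_uchar2 reflexivitate;
    host_uchar2 transparenta;
    uint16_t    luminozitate;
    uint8_t     densitate;
} material;

typedef struct __attribute__((aligned(16))) Voxel {
    uint32_t fiu[8];
    uint32_t parinte;
    material m;
    uint8_t  padd[9];
} voxel;

/* raza: the hand-computed offsets hold. */
static_assert(offsetof(raza, dimensiune) == 32, "raza.dimensiune");
static_assert(offsetof(raza, pixel) == 48, "raza.pixel");
static_assert(offsetof(raza, padd) == 56, "raza.padd");
static_assert(sizeof(raza) == 64, "raza size");

/* material: a uchar2 is 2 bytes, so the members after reflexivitate sit at
   10, 12 and 14, and sizeof is 16 (15 bytes of data, 4-byte alignment). */
static_assert(offsetof(material, transparenta) == 10, "material.transparenta");
static_assert(offsetof(material, luminozitate) == 12, "material.luminozitate");
static_assert(offsetof(material, densitate) == 14, "material.densitate");
static_assert(sizeof(material) == 16, "material size");

/* voxel: m does start at 36 (material only needs 4-byte alignment), padd
   starts at 52, and aligned(16) rounds the total up to 64. */
static_assert(offsetof(voxel, m) == 36, "voxel.m");
static_assert(offsetof(voxel, padd) == 52, "voxel.padd");
static_assert(sizeof(voxel) == 64, "voxel size");
```

So material starting at 36 is fine, but its real size is 16 bytes rather than 19, which moves padd to 52; voxel still ends up at 64 bytes only because aligned(16) rounds its size up.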

In C99, internal and trailing padding in structs is implementation-defined. OpenCL inherits this behavior as well.

Fortunately, you can use __attribute__ ((packed)) to force the OpenCL C compiler to eliminate all padding from a struct. On the host side you can do the same with either attributes or pragmas, and once you have packed structs on both the host and the OpenCL device, sharing data through structs becomes straightforward.
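On the host, GCC and Clang accept the same attribute (MSVC uses #pragma pack(push, 1) ... #pragma pack(pop) around the struct instead). A minimal sketch of what it does to a small struct, assuming a typical ABI where uint32_t is 4-byte aligned; the struct names are made up for the example:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* GCC/Clang spelling of a packed struct: no padding between members. */
struct __attribute__((packed)) packed_pair {
    uint8_t  foo;
    uint32_t foobar;   /* placed directly after foo, at offset 1 */
};

/* Same members with default layout: foobar moves to its natural boundary. */
struct natural_pair {
    uint8_t  foo;
    uint32_t foobar;   /* 3 padding bytes are inserted before it */
};

static_assert(sizeof(struct packed_pair) == 5, "packed size");
static_assert(offsetof(struct packed_pair, foobar) == 1, "packed offset");
static_assert(sizeof(struct natural_pair) == 8, "natural size");
static_assert(offsetof(struct natural_pair, foobar) == 4, "natural offset");
```

Whatever the host produces, the device compiler has to produce the same layout, so it is worth checking sizeof on both sides.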

So I don't need any padding or reordering of members, just __attribute__ ((packed))?

And what about the alignment rules in the manual: 2-byte elements must be 2-byte aligned, 4-byte elements 4-byte aligned?
What if my array starts at address 3 and the struct starts with a 4-byte uint? How could I read that on the GPU? I am a bit confused :frowning:

So I don't need any padding or reordering of members, just __attribute__ ((packed))?

That is correct. Reordering some members might improve performance, but it should work either way.
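Member order matters once padding is involved. A minimal host-side sketch with made-up names, assuming uint32_t is 4-byte aligned: the same three members take 12 bytes when small members surround the large one and 8 bytes when the large one comes first. Smaller elements mean less memory traffic per element, which is one reason reordering can help performance.

```c
#include <assert.h>
#include <stdint.h>

/* 1 (a) + 3 (pad) + 4 (b) + 1 (c) + 3 (trailing pad) = 12 bytes */
struct scattered {
    uint8_t  a;
    uint32_t b;
    uint8_t  c;
};

/* 4 (b) + 1 (a) + 1 (c) + 2 (trailing pad) = 8 bytes */
struct grouped {
    uint32_t b;
    uint8_t  a;
    uint8_t  c;
};

static_assert(sizeof(struct scattered) == 12, "scattered size");
static_assert(sizeof(struct grouped) == 8, "grouped size");
```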

And what about the alignment rules in the manual: 2-byte elements must be 2-byte aligned, 4-byte elements 4-byte aligned?

That’s an excellent question! :slight_smile:

You made me realize that something I said yesterday was wrong. __attribute__ ((packed)) guarantees that "each member of the structure or union is placed to minimize the memory required", but it doesn't override the natural alignment of types given in section 6.1.5 of the OpenCL specification. In other words:

struct __attribute__ ((packed)) mystruct
{
    char foo;
    int  foobar;
};

will have a sizeof of 8 bytes, because foobar must be 4-byte aligned.

Hopefully Affie will confirm that this is the case.

What if my array starts at address 3 and the struct starts with a 4-byte uint? How could I read that on the GPU?

AFAIK you need to make sure that the struct is correctly aligned, i.e. its address must be a multiple of the struct's alignment, which is the largest alignment required by any of its members (or more, if you raise it with the aligned attribute).
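A minimal host-side sketch of that rule, with made-up names and assuming uint32_t is 4-byte aligned: the struct below begins with a 1-byte member, yet it still needs 4-byte alignment because of its second member, and an address is only usable if it is a multiple of that alignment.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* The first member is 1 byte, but 'value' makes the whole struct 4-byte aligned. */
struct starts_small {
    uint8_t  tag;
    uint32_t value;
};

/* Returns 1 if p is a suitable address for an object with this alignment. */
static int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}

/* Mirrors "the array starts at address 3": inside a 4-byte-aligned buffer,
   offset 3 is never a valid start for the struct. */
static int offset_three_is_misaligned(void)
{
    alignas(4) unsigned char buf[16];
    return is_aligned(buf, alignof(struct starts_small))
        && !is_aligned(buf + 3, alignof(struct starts_small));
}
```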

Affie, can you shed some light on this?

Wow, thank you very much, you made everything much simpler :smiley:

It doesn't work :frowning:
Could it be because I didn't align the array on the device? How do I align a cl_mem?
P.S. There are many other places that could contain errors, including the CPU part, but I'll look at them tomorrow :slight_smile:

Could it be because I didn't align the array on the device? How do I align a cl_mem?

I’m having trouble understanding that. Could you explain it in a different way?

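All cl_mem objects are automatically aligned to CL_DEVICE_MEM_BASE_ADDR_ALIGN, a device property reported in bits. In OpenCL 1.2 it must be at least the size of the largest built-in data type the device supports (long16, i.e. 128 bytes, on full-profile devices), which is far more than your structs need, so the start of the buffer won't cause alignment problems.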
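A minimal sketch of why the buffer base is enough, using a hypothetical 64-byte element type aligned to 16 like voxel and raza: element i starts at base + i * sizeof(element), and both terms are multiples of the element's alignment. The real device value comes from clGetDeviceInfo, shown only in a comment so the sketch builds without an OpenCL SDK.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical element standing in for voxel or raza: 64 bytes, 16-byte aligned. */
typedef struct { alignas(16) uint8_t bytes[64]; } element;

/* On a real device the value would come from:
 *   cl_uint bits;
 *   clGetDeviceInfo(device, CL_DEVICE_MEM_BASE_ADDR_ALIGN, sizeof(bits), &bits, NULL);
 * Returns 1 if every element of an array placed at such a base is aligned. */
static int elements_stay_aligned(uint32_t base_align_bits)
{
    size_t base_align = base_align_bits / 8;  /* the query reports bits, not bytes */
    /* The base is a multiple of alignof(element), and so is every i * sizeof(element). */
    return base_align % alignof(element) == 0
        && sizeof(element) % alignof(element) == 0;
}
```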

Now I realize that emulating __attribute__ ((packed)) on the host (application) side is going to be a bit difficult. Your original idea of aligning everything to 16 was actually better. Sorry for adding to the confusion. Your Material struct also needs the alignment attribute.

Remember to use the same alignment in both your kernel source and your application's source code:

In your kernel source code:


struct __attribute__ ((aligned (16))) my_struct
{
    ...
};

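In your application's source code you need the host compiler's equivalent. GCC and Clang accept the same __attribute__ ((aligned (16))); on Visual Studio use __declspec(align(16)). Note that #pragma pack only sets a maximum alignment for members and can never increase it, so #pragma pack (push, 16) does not align anything to 16:


struct __declspec(align(16)) my_struct
{
   ...same fields as the kernel source above...
};

With these you are telling both the OpenCL C compiler and your application's compiler (Visual Studio) that the struct as a whole must be aligned to 16 bytes, which also rounds its sizeof up to a multiple of 16. The individual members keep their natural alignment, so the two layouts match as long as every member type has the same size and alignment on both sides.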
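One way to keep the kernel and host definitions from drifting apart is a single header included by both. A minimal sketch, assuming a hypothetical header with hypothetical field names a and b: __OPENCL_VERSION__ is predefined by the OpenCL C compiler, so the same text picks the right alignment spelling and integer type on each side, and a host-only static_assert catches size changes at compile time.

```c
/* shared_structs.h (hypothetical): included by the kernel source and the host. */
#ifdef __OPENCL_VERSION__                 /* defined by the OpenCL C compiler */
  #define ALIGN16 __attribute__((aligned(16)))
  typedef uint u32;
#else                                     /* host compiler */
  #include <assert.h>
  #include <stdint.h>
  #if defined(_MSC_VER)
    #define ALIGN16 __declspec(align(16))
  #else
    #define ALIGN16 __attribute__((aligned(16)))
  #endif
  typedef uint32_t u32;
#endif

/* Placing the macro between 'struct' and the tag works for both spellings. */
typedef struct ALIGN16 my_struct {
    u32 a;   /* hypothetical fields */
    u32 b;
} my_struct;

#ifndef __OPENCL_VERSION__
/* Host-only guard: the build fails if the host layout changes. */
static_assert(sizeof(my_struct) == 16, "my_struct must be 16 bytes");
#endif
```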

Let me know if it helps.

Thank you very much :smiley:

At least for the one member I can test, in the middle of the struct (the pixel element), it works. I'm sure the others are also correct :slight_smile:

typedef struct Raza
{
    uint4  pozitie;                // x, y, z
    float4 directie;               // i, j, k (how far it moves at each step)
    float  dimensiune;             // how big the cube is now
    float  fov;                    // how much it grows at each step

    uchar4 mediu_parcurs;          // r, g, b -> how much of each colour it filters

    uchar4 mediu_curent;           // r, g, b, density

    uint   pixel;                  // the pixel to update (w = pixel % width, h = pixel / width)

    ushort influenta;              // how much it influences the colour of its pixel
    ushort luminozitate_curenta;   // brightness of the current medium
    uchar  padd[8];

} __attribute__((aligned(16))) raza;

It also needs the padd[8]; without it I only get random values.
I'll also have to check the voxel struct, which contains the material struct :slight_smile:
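One way the explicit padd[8] can matter: the kernel's aligned(16) raza is 64 bytes, so if a host-side copy ends up with weaker alignment (for instance a mirror built from plain arrays instead of vector types), its sizeof, and therefore the array stride, drops to 56 and every element after the first is read from the wrong place. The explicit padding keeps the stride at 64 either way. A minimal sketch of such a hypothetical plain-array mirror:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host mirror of the final raza using plain arrays, so no member
   needs more than 4-byte alignment and nothing rounds the size up to 64. */
struct raza_no_padd {
    uint32_t pozitie[4];             /* 0  */
    float    directie[4];            /* 16 */
    float    dimensiune;             /* 32 */
    float    fov;                    /* 36 */
    uint8_t  mediu_parcurs[4];       /* 40 */
    uint8_t  mediu_curent[4];        /* 44 */
    uint32_t pixel;                  /* 48 */
    uint16_t influenta;              /* 52 */
    uint16_t luminozitate_curenta;   /* 54 */
};                                   /* 56 bytes */

/* The same data followed by the explicit padding. */
struct raza_with_padd {
    struct raza_no_padd data;
    uint8_t padd[8];                 /* 56 */
};                                   /* 64 bytes, matching the kernel's stride */

static_assert(sizeof(struct raza_no_padd) == 56, "stride without padd");
static_assert(sizeof(struct raza_with_padd) == 64, "stride with padd");
```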

I also tried the other two structs; some things had to be changed:

typedef struct Material
{
    uchar4 culoare;          // r, g, b ( [0, 255] )

    char4  normala;          // x, y, z ( [-1, 1] * 127 )

    uchar2 reflexivitate;    // reflection index, specularity ( [0, 255] )
    uchar2 transparenta;     // transparency index, specularity ( [0, 255] )

    ushort luminozitate;     // ( [0, 65535] )

    uchar  densitate;        // ( [0, 255] )

} __attribute__((aligned(16))) material;

typedef struct Voxel
{
    uint     fiu[8];

    material m;
    uchar    padd1[5];

    uint     parinte;

    uchar    padd2[4];

} __attribute__((aligned(16))) voxel;

99% of the values of parinte are correct; I assume the others are wrong because of a bug elsewhere in my code, not because of the alignment.
I think the idea is that material must start at a multiple of 16 (because it is aligned to 16), and the padding fields keep parinte 4-byte aligned (a uint is 4 bytes) and make the size of the struct a multiple of 16 (64 bytes).
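A minimal host-side check of that reasoning, assuming GCC or Clang and hypothetical host_* stand-ins for the vector types (aligned to their own size, as in cl_platform.h). It also shows that padd1 ends at byte 53, so the compiler still inserts 3 implicit bytes before parinte; declaring padd1[8] instead would make every byte explicit without moving any member.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for uchar4 / char4 / uchar2. */
typedef struct { _Alignas(4) uint8_t s[4]; } host_uchar4;
typedef struct { _Alignas(4) int8_t  s[4]; } host_char4;
typedef struct { _Alignas(2) uint8_t s[2]; } host_uchar2;

typedef struct __attribute__((aligned(16))) Material {
    host_uchar4 culoare;
    host_char4  normala;
    host_uchar2 reflexivitate;
    host_uchar2 transparenta;
    uint16_t    luminozitate;
    uint8_t     densitate;
} material;                /* 15 bytes of data, sizeof 16 */

typedef struct __attribute__((aligned(16))) Voxel {
    uint32_t fiu[8];       /* bytes 0..31 */
    material m;            /* bytes 32..47: 32 is a multiple of 16 */
    uint8_t  padd1[5];     /* bytes 48..52 */
                           /* bytes 53..55: implicit padding before parinte */
    uint32_t parinte;      /* bytes 56..59: 4-byte aligned */
    uint8_t  padd2[4];     /* bytes 60..63 */
} voxel;

static_assert(sizeof(material) == 16, "material size");
static_assert(offsetof(voxel, m) == 32, "voxel.m");
static_assert(offsetof(voxel, padd1) == 48, "voxel.padd1");
static_assert(offsetof(voxel, parinte) == 56, "voxel.parinte");
static_assert(offsetof(voxel, padd2) == 60, "voxel.padd2");
static_assert(sizeof(voxel) == 64, "voxel size");
```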

Thank you :slight_smile: