Weird result with a local variable

holyodin · March 3, 2017, 8:16am

Hi, all.
I wrote some code last night to test the behavior of local variables. The result, however, was really confusing. The kernel is very simple:

kernel void  Diff2(global int* C, local int* h)
{
	__local int x;
	x = get_local_id(0);
	h[0] = 0;
	int idx = get_group_id(0)*get_local_size(0) + get_local_id(0);
	C[idx] = x;

}

Here C is a 10-element array used to show the result. On the client side, I divide the work-groups and work-items like this:

	size_t global_offset[] = { 0 };
	size_t global_size[] = { 10 };
	size_t local_size[] = { 2 };

When I read the C array back on the client, the result is 1 1 1 1 1 1 1 1 1 1, which is what I expected. However, when I comment out h[0] = 0;, the result turns into 0 1 0 1 0 1 0 1 0 1.
The h[0] = 0; line does nothing in the kernel, so how does it change the behavior of x?
I am new to CL; any advice and suggestions will be greatly appreciated.

My GPU is an NVIDIA GTX 780, and I use OpenCL 1.2.
Another strange problem: when I run the same code on my Surface Pro 4, the result is 0 0 0 0 0 0 0 0 0 0 (not 1 1 1 1 1 1 1 1 1 1 as in the NVIDIA case).

HadrienG · March 4, 2017, 2:42am

Your code has a data race. Because x is local, it is shared by the two work-items in each work-group, which means that the effect of assigning two different values (0 and 1) to x from both work items is undefined by the OpenCL C language.

Here are just a few possibilities:
[ul][li]Work-item 0 and 1 concurrently execute the assignment and work-item 1 wins. Final value of C will be a sequence of 1.[/li]
[li]Work-item 0 is processed first. It assigns 0 to x, then assigns x to C. Work-item 1 is then processed. It assigns 1 to x, then assigns x to C. Final value of C will feature alternating 0s and 1s.[/li]
[li]Work-item 0 and 1 concurrently execute the assignment and work-item 0 wins. Final value of C will be a sequence of 0.[/li][/ul]
In general, you should not try to reason about code which has data races, as the result of a data race can be both timing-dependent and hardware-dependent.
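
If a value really does need to be shared inside a work-group, the usual race-free pattern is to let a single work-item write it and place a barrier before anyone reads it. Here is a minimal sketch of that idea. The kernel name Diff2Shared is hypothetical, and storing the group id in x is only an illustrative choice, not something from the original kernel:

```c
// Sketch only: one writer per work-group, then a barrier, then everyone reads.
kernel void Diff2Shared(global int* C)
{
    local int x;                      // one copy per work-group

    if (get_local_id(0) == 0) {
        x = (int)get_group_id(0);     // exactly one work-item writes x
    }

    // All work-items in the group wait here; the write to x is visible afterwards.
    barrier(CLK_LOCAL_MEM_FENCE);

    C[get_global_id(0)] = x;          // every work-item reads the same value
}
```

With a global size of 10 and a local size of 2, every work-item in a group should see the same x, so C would hold each group id twice in a row.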

In this specific case, making x private would make your code race-free and predictable: each work-item sets x to its local id, then sets C to x, so the final value of C will feature alternating 0s and 1s.
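
A minimal sketch of that change, assuming the unused h argument can simply be dropped (the kernel name Diff2Private is hypothetical):

```c
// Sketch only: x is now an ordinary private variable, one copy per work-item.
kernel void Diff2Private(global int* C)
{
    int x = (int)get_local_id(0);     // private: no other work-item can touch it

    int idx = get_group_id(0) * get_local_size(0) + get_local_id(0);
    C[idx] = x;                       // each work-item stores its own local id
}
```

Because no memory is shared between work-items any more, the result no longer depends on scheduling or on the device.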

By the way, some tips about the way you compute global indexes:
[ul][li]This is not CUDA, so you don’t need this “get_group_id(0)*get_local_size(0) + get_local_id(0)” boilerplate. Just write “get_global_id(0)” instead.[/li]
[li]For large global work sizes, int indices can overflow. You may want to use size_t for this purpose instead, as suggested by the specification of OpenCL’s get_something functions.[/li][/ul]
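
Putting both tips together, the indexing could look roughly like the sketch below. The kernel name Diff2Global is hypothetical, and the two index forms only agree when the global offset is zero, as it is in the host code from the question:

```c
// Sketch only: index with get_global_id() and keep the index as size_t.
kernel void Diff2Global(global int* C)
{
    size_t gid = get_global_id(0);    // replaces group_id * local_size + local_id

    // The cast is needed because C holds int while get_local_id() returns size_t.
    C[gid] = (int)get_local_id(0);
}
```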

holyodin · March 7, 2017, 5:10am

Very informative! You mentioned a data race, so I googled it and found a lot of information. Your advice is also very helpful. Thanks, dude!