Kernel for finding the minimum element in the array

I am writing OpenCL code that parallelizes finding the minimum number in an array.
Here is my kernel function.
What I am doing is comparing every element with every other element; whenever an element is the greater one of a pair, I set its corresponding entry in M to 1. In the end only one element of the M array should still be 0, and with that index we can look in the A array and get our minimum value.

__kernel void array(__global int *A, __global int *M) {
    // Get the indices of the two elements to compare
    int i = get_global_id(0);
    int j = get_global_id(1);

    //barrier(CLK_LOCAL_MEM_FENCE);

    // Do the operation: A[j] is greater than A[i], so mark it
    if (A[i] < A[j] && i != j) {
        M[j] = 1;
    }
}
I am getting 0 for every value in the M array.
But the expected output is that only one value in the M array is 0, and that index would give me the index of the minimum number in array A.
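
The marking scheme described in the question can be checked on the host with a minimal sketch in plain C, where two nested loops stand in for a two-dimensional NDRange of n x n work-items. The function names mark_non_minimum and first_unmarked are hypothetical and used only for illustration. The sketch assumes M starts out as all zeros and that the minimum occurs only once; if it occurs several times, more than one entry of M stays 0.

```c
#include <stddef.h>

/* Host-side emulation of the all-pairs marking kernel (illustrative only).
   The two loops stand in for a 2D NDRange of n x n work-items. */
void mark_non_minimum(const int *A, int *M, size_t n)
{
    for (size_t k = 0; k < n; k++)
        M[k] = 0;                          /* M must start out as all zeros */

    for (size_t i = 0; i < n; i++) {       /* plays the role of get_global_id(0) */
        for (size_t j = 0; j < n; j++) {   /* plays the role of get_global_id(1) */
            if (A[i] < A[j])
                M[j] = 1;                  /* A[j] cannot be the minimum */
        }
    }
}

/* Returns the index of the first entry still equal to 0, or n if none is left. */
size_t first_unmarked(const int *M, size_t n)
{
    for (size_t k = 0; k < n; k++)
        if (M[k] == 0)
            return k;
    return n;
}
```

One thing that may be worth checking, although it is only a guess from the symptoms: if the kernel is enqueued with a one-dimensional range, get_global_id(1) returns 0 for every work-item, so only M[0] could ever be set, and M stays all zeros whenever A[0] is the smallest element.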

You could get some good ideas from this approach:
https://developer.nvidia.com/content/th … uction-gpu
Or search for other sorting algorithms in CUDA; there is an implementation of a parallel bubble sort.
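
One common way to find a minimum in parallel is a tree-style reduction. Whether the linked article describes exactly this scheme is an assumption. The following is a rough host-side sketch in plain C, not a kernel: each pass of the outer loop corresponds to one parallel step, and the inner loop iterations are independent of each other, so on a device each of them could be one work-item. The name reduce_min is hypothetical.

```c
#include <stddef.h>

/* Host-side emulation of a tree-style minimum reduction (illustrative only).
   Assumes n >= 1. Overwrites the contents of data. */
int reduce_min(int *data, size_t n)
{
    while (n > 1) {
        size_t half = (n + 1) / 2;   /* number of values that survive this pass */

        /* Each iteration reads data[i + half] and writes data[i] with i < half,
           so the iterations do not interfere and could run in parallel. */
        for (size_t i = 0; i + half < n; i++) {
            if (data[i + half] < data[i])
                data[i] = data[i + half];
        }
        n = half;
    }
    return data[0];
}
```

Compared with comparing all pairs, this needs about n comparisons in total instead of n * n, spread over roughly log2(n) passes.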

Can you please give me a simple kernel function that returns the minimum element of an array?

No!

__kernel void u(__global int *A, __global int *B, __global int *C, int k)
{
    // Get the index of the current element
    int i = get_global_id(0); // this will get you the index
    int j;
    int v = 0;

    // Check the A[i] value against all the values of set B
    // k is the size of set B
    for (j = 0; j < k; j++) {
        if (A[i] == B[j]) {
            v = 1;
        }
    }

    if (v == 0) C[i] = A[i];
}

This is the kernel function that computes A - B (A and B are two vectors), but the problem is with these work sizes:
size_t global_item_size = s;
size_t local_item_size = s;
Here s is the size of array A. If I change global_item_size and local_item_size, I don’t get the right answer.

In the kernel function above, k is the size of vector B.
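
A possible reason, offered only as a guess: in OpenCL 1.x the global work size has to be a multiple of the local work size, and the kernel has no bounds check, so a global size smaller than s skips elements and a larger one reads and writes past the end of A and C. The sketch below emulates a one-dimensional launch in plain C with the global size rounded up and a guard added. The extra parameter n (the length of A and C) and the function names are hypothetical additions for illustration, not part of the original kernel.

```c
#include <stddef.h>

/* Body of one emulated work-item with global id gid (illustrative only).
   n is the length of A and C, k is the length of B. */
void set_difference_item(size_t gid, const int *A, const int *B,
                         int *C, size_t n, size_t k)
{
    if (gid >= n)
        return;                /* padding work-items past the end do nothing */

    int found = 0;
    for (size_t j = 0; j < k; j++) {
        if (A[gid] == B[j]) {
            found = 1;
            break;
        }
    }
    if (!found)
        C[gid] = A[gid];       /* C[gid] is left untouched when A[gid] is in B */
}

/* Emulates a 1D launch: rounds the global size up to a multiple of
   local_size, runs every work-item in turn, and returns the global size. */
size_t launch_set_difference(const int *A, const int *B, int *C,
                             size_t n, size_t k, size_t local_size)
{
    size_t global_size = ((n + local_size - 1) / local_size) * local_size;
    for (size_t gid = 0; gid < global_size; gid++)
        set_difference_item(gid, A, B, C, n, k);
    return global_size;
}
```

Since entries of C are written only for elements that are not in B, C has to be filled with a known marker value before the launch; otherwise the skipped positions hold whatever was in the buffer.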