Porting the Fortran spread() function to OpenCL

Hello

I posted this question on both the AMD and Nvidia forums a few days ago but have had no answer yet, so I am asking here. I need the functionality of the Fortran spread function in an OpenCL kernel. For example, inside a loop with index k, I have the following Fortran statements, which build two (n-k) x (n-k) matrices, the first from a row of matrix “a” and the second from a column of matrix “a”:

spread(a(k,k+1:n),1,n-k)
spread(a(k+1:n,k),2,n-k)
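
For reference, here is a minimal C sketch of what these two calls produce. It rests on assumptions that are not in the Fortran code: the matrix is stored row-major in a flat array, indices are 0-based (so the k below equals the Fortran k minus 1), and the names spread_row and spread_col are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout: row-major, 0-based, element (r, c) of the n x n
   matrix is a[r * n + c]. k is 0-based (Fortran k minus 1). */

/* Counterpart of spread(a(k,k+1:n), 1, n-k):
   every ROW of out is a copy of the tail of row k. */
void spread_row(const float *a, size_t n, size_t k, float *out)
{
    assert(k < n);
    size_t m = n - 1 - k;   /* side length; n-k in Fortran's 1-based terms */
    for (size_t i = 0; i < m; ++i)
        for (size_t j = 0; j < m; ++j)
            out[i * m + j] = a[k * n + (k + 1 + j)];
}

/* Counterpart of spread(a(k+1:n,k), 2, n-k):
   every COLUMN of out is a copy of the tail of column k. */
void spread_col(const float *a, size_t n, size_t k, float *out)
{
    assert(k < n);
    size_t m = n - 1 - k;
    for (size_t i = 0; i < m; ++i)
        for (size_t j = 0; j < m; ++j)
            out[i * m + j] = a[(k + 1 + i) * n + k];
}
```

In other words, element (i, j) of the first result depends only on j, and element (i, j) of the second depends only on i.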

I will probably have to create two new matrices in each iteration of the loop inside the OpenCL kernel, spreading the kth row of “a” (columns k+1 to n) across all rows of the first new matrix and the kth column (rows k+1 to n) across all columns of the second. How could I do that in an OpenCL kernel? Thanks in advance.
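
One possible way to structure this, sketched in plain C so it can be run on the host, is to let each work-item handle a single element (i, j). This is only a sketch under assumptions: row-major storage, 0-based k, and hypothetical names (spread_row_at, spread_col_at, work_item, launch). In a real OpenCL kernel, i and j would come from get_global_id(0) and get_global_id(1), and the two nested loops in launch would be replaced by a 2D global work size.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout: row-major, 0-based, element (r, c) is a[r * n + c]. */

/* Element (i, j) of the row-spread matrix; it does not depend on i. */
static float spread_row_at(const float *a, size_t n, size_t k,
                           size_t i, size_t j)
{
    (void)i;
    return a[k * n + (k + 1 + j)];
}

/* Element (i, j) of the column-spread matrix; it does not depend on j. */
static float spread_col_at(const float *a, size_t n, size_t k,
                           size_t i, size_t j)
{
    (void)j;
    return a[(k + 1 + i) * n + k];
}

/* Stand-in for the body of one work-item. */
void work_item(const float *a, size_t n, size_t k, size_t i, size_t j,
               float *row_mat, float *col_mat)
{
    size_t m = n - 1 - k;              /* side length of both results */
    if (i >= m || j >= m)
        return;                        /* guard for a padded global size */
    row_mat[i * m + j] = spread_row_at(a, n, k, i, j);
    col_mat[i * m + j] = spread_col_at(a, n, k, i, j);
}

/* Host-side emulation of a gsize x gsize 2D launch. */
void launch(const float *a, size_t n, size_t k, size_t gsize,
            float *row_mat, float *col_mat)
{
    assert(k < n);
    for (size_t i = 0; i < gsize; ++i)
        for (size_t j = 0; j < gsize; ++j)
            work_item(a, n, k, i, j, row_mat, col_mat);
}
```

If the spread matrices only feed an element-wise expression, the temporaries row_mat and col_mat may not be needed at all: each work-item can read the two source elements directly from “a” with the same index arithmetic.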