Matrix Multiplication with OpenCL in Java

Hello.

I am a beginner in OpenCL, and I need your help to create a program with OpenCL in Java.
This program is for matrix multiplication on the GPU.

I found many kernel examples for this program, but I can't find how to pass my matrices from Java to the OpenCL kernel.

Hello

When I run the program, it blocks here, at the first of these lines:
[NOTE]
String src = IOUtils.readText(JavaCLTutorial1.class.getResource("ClKernels.cl"));
CLProgram program = context.createProgram(src);[/NOTE]
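
I have not confirmed why it stops there. One thing that can be ruled out separately is whether ClKernels.cl is actually found on the classpath, since Class.getResource returns null when the file is missing. The following is only a minimal sketch in plain Java (no JavaCL calls); the class name KernelSourceLoader and the method load are hypothetical names introduced for illustration, and InputStream.readAllBytes assumes Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of JavaCL): loads a kernel file from the
// classpath and fails with a clear message when the file cannot be found.
public class KernelSourceLoader {

    static String load(Class<?> anchor, String name) throws IOException {
        // getResource returns null if the file is not next to the class
        // (or not on the classpath at all).
        URL url = anchor.getResource(name);
        if (url == null) {
            throw new IOException("Kernel file not found on classpath: " + name);
        }
        try (InputStream in = url.openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumes ClKernels.cl sits in the same package as this class.
        String src = load(KernelSourceLoader.class, "ClKernels.cl");
        System.out.println(src);
    }
}
```

If this throws, the problem is the file location rather than OpenCL itself; if it prints the kernel text, the cause has to be looked for elsewhere.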

My kernel code is:

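[NOTE]
kernel void matrix_mult(global float4 *a_mat,
                        global float4 *b_mat, global float *c_mat) {

    float sum;

    /* One work-item per row; each row holds num_rows floats = num_rows/4 float4s */
    int num_rows = get_global_size(0);
    int vectors_per_row = num_rows/4;

    /* Move to the row of a_mat and c_mat handled by this work-item */
    int start = get_global_id(0) * vectors_per_row;
    a_mat += start;
    c_mat += start*4;

    /* Dot product of this row of a_mat with every row of b_mat */
    for(int i=0; i<num_rows; i++) {
        sum = 0.0f;
        for(int j=0; j<vectors_per_row; j++) {
            sum += dot(a_mat[j], b_mat[i*vectors_per_row + j]);
        }
        c_mat[i] = sum;
    }
}
[/NOTE]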
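
To check the GPU result later, a CPU version of what this kernel appears to compute can help. As far as I can read it, each work-item takes one row of a_mat and forms its dot product with every row of b_mat, so b_mat would have to hold the second matrix already transposed. The sketch below assumes square n x n matrices stored row by row in flat arrays; the class name MatrixMultReference and the method multiplyByRows are hypothetical names for illustration only.

```java
import java.util.Arrays;

// CPU reference for the kernel above (a sketch, not a tuned implementation).
public class MatrixMultReference {

    // c[row*n + i] = dot(row `row` of a, row `i` of b),
    // which equals a * transpose(b) for row-major n x n matrices.
    static float[] multiplyByRows(float[] a, float[] b, int n) {
        float[] c = new float[n * n];
        for (int row = 0; row < n; row++) {      // one GPU work-item per row
            for (int i = 0; i < n; i++) {        // every row of b
                float sum = 0.0f;
                for (int k = 0; k < n; k++) {
                    sum += a[row * n + k] * b[i * n + k];
                }
                c[row * n + i] = sum;
            }
        }
        return c;
    }

    public static void main(String[] args) {
        float[] m = {
            1, 0, 0, 1,
            2, 1, 2, 0,
            1, 0, 1, 1,
            0, 0, 1, 2};
        System.out.println(Arrays.toString(multiplyByRows(m, m, 4)));
    }
}
```

Comparing this output with the contents of c_mat read back from the GPU would show whether the buffers were passed correctly.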

My Java code, where the error occurs:

[NOTE]
import com.nativelibs4java.opencl.*;
import com.nativelibs4java.util.IOUtils;
import org.bridj.Pointer;
import java.io.IOException;
import java.nio.ByteOrder;

public class Cl {
    public static void main(String[] args) throws IOException {
        CLContext context = JavaCL.createBestContext();
        CLQueue queue = context.createDefaultQueue();
        ByteOrder byteOrder = context.getByteOrder();
        int n = 1024; // currently unused; the matrices below are 4x4

        // create memory for the 4x4 matrices
        Pointer<Float> AmatrixPtr = Pointer.allocateFloats(4*4).order(byteOrder);
        Pointer<Float> BmatrixPtr = Pointer.allocateFloats(4*4).order(byteOrder);
        Pointer<Float> OutmatrixPtr = Pointer.allocateFloats(4*4).order(byteOrder);

        // write the matrix elements
        AmatrixPtr.setFloats(new float[]{
            1, 0, 0, 1,
            2, 1, 2, 0,
            1, 0, 1, 1,
            0, 0, 1, 2});
        BmatrixPtr.setFloats(new float[]{
            1, 0, 0, 1,
            2, 1, 2, 0,
            1, 0, 1, 1,
            0, 0, 1, 2});

        // create GPU buffers for the matrices
        CLBuffer<Float> OutmatrixGpu = context.createBuffer(CLMem.Usage.Output, OutmatrixPtr, true);
        CLBuffer<Float> AmatrixGpu = context.createBuffer(CLMem.Usage.Input, AmatrixPtr, true);
        CLBuffer<Float> BmatrixGpu = context.createBuffer(CLMem.Usage.Input, BmatrixPtr, true);

        // read the kernel source and create the program (the run stops here)
        String src = IOUtils.readText(JavaCLTutorial1.class.getResource("ClKernels.cl"));
        CLProgram program = context.createProgram(src);

        // get the kernel and set its arguments
        CLKernel addFloatsKernel = program.createKernel("matrix_mult");
        addFloatsKernel.setArgs(AmatrixGpu, BmatrixGpu, OutmatrixGpu);
    }
}
[/NOTE]