1. Matrix Multiplication

Hi,

I want to test a matrix multiplication because I think it's a good way to compare GPU performance (OpenCL) with CPU performance (OpenMP, in my case).

So, first, I began with a simple vector addition:
Code :
```__kernel void IntAdd(__global const int* a, __global const int* b, __global int* c)
{
    int iGID = get_global_id(0);
    c[iGID] = a[iGID] + b[iGID];
}```

From this I can deduce that:
c[0] = a[0] + b[0]
c[1] = a[1] + b[1]
....

Now, I'm trying the following kernel code :
Code :
```#define N 32  // square matrix

__kernel void MatMult(__global const int* a, __global const int* b, __global int* c)
{
    int row = get_global_id(0);
    int col = get_global_id(1);

    int Cres = 0;
    for(int i = 0; i < N; i++)
    {
        Cres += a[row*N + i] * b[i*N + col];
    }

    c[row*N + col] = Cres;
}```

I see that row takes the values {0, 1, ..., 15} and col is always 0. If I need col = 1, I have to write col+1.
I get this result in my terminal:

Code :
```##  A MATRIX  ##
0    1    2    3
4    5    6    7
8    9   10   11
12   13   14   15

##  B MATRIX  ##
0    1    2    3
4    5    6    7
8    9   10   11
12   13   14   15

##  C MATRIX  ##
0    1    2    3
4    5    6    7
8    9   10   11
12   13   14   15```

I can post the full code (a single .cpp file) if necessary. I compile it with "g++ DemoMatMult.cpp -o MatMult -lOpenCL".

My questions are:
- Does get_global_id work the way I described?
- And, of course, why doesn't the code work?

2. Re: Matrix Multiplication

The values returned by get_global_id() are determined by the arguments passed to clEnqueueNDRangeKernel().

In your case you want something like this:
Code :
```size_t work_size[2] = {N, N};

errcode = clEnqueueNDRangeKernel(queue, kernel, 2 /*two-dimensional ndrange */,
NULL, &work_size[0], NULL, 0, NULL, NULL);```

With that, get_global_id(0) will return values from 0 to N-1 and get_global_id(1) will do the same.

3. Re: Matrix Multiplication

Thank you Mr Garcia.

I did as you said, but I think my "main" problem was something else.

Let me explain: the matrix multiplication works with:

Code :
```const char* program_source[] =

{
"__kernel void MatMult(__global const int* a, __global const int* b, __global int* c)",

"{",

"int row = get_global_id(0);",
"int col = get_global_id(1);",
"int Cres = 3;",
"for(int i = 0;i< 4; i++)",

"{Cres += a[row*4 + i ] * b[i*4 + col];}",

"c[row*4+col]= Cres;",

"}",

};```

but it doesn't work with a constant (#define N 4):

Code :
```...
#define N 4
...
const char* program_source[] =

{
"__kernel void MatMult(__global const int* a, __global const int* b, __global int* c)",

"{",

"int row = get_global_id(0);",
"int col = get_global_id(1);",
"int Cres = 3;",
"for(int i = 0;i< N; i++)",

"{Cres += a[row*N + i ] * b[i*N + col];}",

"c[row*N+col]= Cres;",

"}",

};```

I think the text inside the quotes " " doesn't take the value of N into account?!
I'm not using a .cl file for now. Should I use one in this kind of situation?
Thanks.

4. Re: Matrix Multiplication

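You need to #define N inside the kernel source. The host preprocessor does not expand macros inside string literals, so the OpenCL compiler only sees the bare name N and has no definition for it.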

Code :
```const char* program_source =
"#define N 4\n"
"__kernel void MatMult(__global const int* a, __global const int* b, __global int* c)\n"
...```

5. Re: Matrix Multiplication

So simple. I had not thought of that.
Thank you once again.

(This) Problem solved.
Best Regards!

6. Re: Matrix Multiplication

I think I can continue here with my next question. If that's inappropriate, please tell me.

I compared my "optimized" matrix multiplication with OpenMP against my "simple" matrix multiplication with OpenCL.

#define N 2048 // for both -> N*N matrices
OpenMP :
Code :
```...
#pragma omp parallel for private(i,j,k), shared(a,b,c), schedule(dynamic)
for(i=0; i<N; i++)
{
    for(k=0; k<N; k++)
    {
        for(j=0; j<N; j++)
        {
            c[i][j]+=a[i][k]*b[k][j];
        }
    }
}
...```
compiled with gcc -O3

OpenCL :
Code :
```...
const size_t global_work_size[2] = {N,N};
const size_t local_work_size[2] = {16,16};

result = clEnqueueNDRangeKernel (command_queue,
kernel,
2,
NULL,
&global_work_size[0],
&local_work_size[0],
0, NULL, NULL);```

OpenMP time = 1.125 s.
OpenCL time = 5.532 s.

I think the computation is not big enough, and it takes time to transfer the data to the GPU, so the GPU isn't worth using in this case.
I don't know how to use shared memory yet. Maybe I should continue by learning how to use it.

7. Re: Matrix Multiplication

Sorry, I meant local memory in OpenCL.

8. Re: Matrix Multiplication

Originally Posted by wrx
I think the computation is not big enough, and it takes time to transfer the data to the GPU, so the GPU isn't worth using in this case.
You could also try using the CPU device to see how that performs.

9. Re: Matrix Multiplication

The OpenMP code uses the CPU only:

$ time ./mult_matrix
c[2047][2047] = 1449828352
real 0m1.108s
user 0m8.550s
sys 0m0.030s

10. Re: Matrix Multiplication

About the syntax of the code:

int iGID = get_global_id(0); is valid. get_global_id() takes exactly one parameter, the dimension index, so 0 selects the first dimension of the range and 1 selects the second.

Moreover, c[row*N + col] = Cres;

is also a proper assignment: the element being written, c[row*N + col], is on the LHS, and the computed value is on the RHS.
