syntax of 'dot' routine

Hello,

The following code is taken from “OpenCL in Action”.

__kernel void matvec_mult(__global float4* matrix,
                          __global float4* vector,
                          __global float* result) {

    int i = get_global_id(0);
    result[i] = dot(matrix[i], vector[0]);
    //printf("Greetings host from work-item [%03d]", i);
}

In this case ‘dot’ will always take 4 ‘float’ elements.

In my project the kernel gets the following parameters:

__kernel void matvec_mult(__global float* matrix,
__global float* vector,
__global float* result) {

}

‘matrix’ is NxM.
‘vector’ is Mx1 (M elements).
Is it possible to compute ‘dot(matrix[1,:], vector)’ in one operation?

Thanks,
Zvika

The dot function (I could not link to the SDK definition because of my newbie forum status) accepts either scalars or vector types in the programming sense (float2, float3, float4). If your input matrix and vector have arbitrary length, you will have to break them down into the largest vector type that dot accepts. So as far as I know, you cannot apply dot to arbitrarily sized inputs in one operation. You will probably want to work with float4 vectors in each work-item.

I am pretty new to this whole thing so there might be some inaccuracies.

__kernel void matvec_mult(__global float* matrix,
                          __global float* vector,
                          __global float* result) {

    int i = get_global_id(0);

    // vload4(k, p) reads the four floats starting at p + 4*k
    float4 matvec = vload4(i, matrix);
    float4 vec = vload4(0, vector);

    result[i] = dot(matvec, vec);

    //printf("Greetings host from work-item [%03d]", i);

}

I think it would look something like that, though I have not tested whether it actually works.

EDIT: This was a horrible example by me! The intent was to show how vectors work, but the actual matrix computation doesn’t make much sense. What the code above does is compute the dot product of each 4-element segment of the matrix array with the first 4 elements of the input vector.
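To cover a whole row of M elements with 4-wide dot products, the partial results have to be accumulated in a loop, with a scalar tail when M is not a multiple of 4. Here is a minimal host-side sketch in plain C so it can be run without a device. The names dot4 and row_dot_chunked are hypothetical, and dot4 only stands in for the built-in dot(float4, float4) that a kernel would call on values read with vload4:

```c
#include <stddef.h>

/* Stand-in for OpenCL's dot(float4, float4): product of two 4-element chunks. */
static float dot4(const float *a, const float *b)
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

/* Dot product of two m-element arrays, taken four elements at a time. */
static float row_dot_chunked(const float *row, const float *vec, size_t m)
{
    float sum = 0.0f;
    size_t j = 0;

    /* Full 4-element chunks. */
    for (; j + 4 <= m; j += 4)
        sum += dot4(row + j, vec + j);

    /* Scalar tail for the remaining 0..3 elements. */
    for (; j < m; j++)
        sum += row[j] * vec[j];

    return sum;
}
```

Each work-item would call something like this on its own row, so the number of dot calls per row is M/4 rather than one.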

You need a loop to compute this dot product:


float tmp = 0.0f;
// Row i of a row-major NxM matrix starts at offset M * i (each row holds M floats)
__global float* line = matrix + M * i;

for (int j = 0; j < M; j++)
{
    tmp += line[j] * vector[j];
}

result[i] = tmp;
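For checking the kernel's output on the host, the same loop can be written as a plain C reference. This is only a sketch: the name matvec_ref is hypothetical, and it assumes the matrix is stored row-major, so that row i starts at offset i * m. The outer loop plays the role of the work-items.

```c
#include <stddef.h>

/* result[i] = sum over j of matrix[i*m + j] * vector[j], for i in [0, n).
   Assumes matrix is n x m in row-major order. */
static void matvec_ref(const float *matrix, const float *vector,
                       float *result, size_t n, size_t m)
{
    for (size_t i = 0; i < n; i++) {         /* one iteration per work-item */
        const float *line = matrix + m * i;  /* start of row i */
        float tmp = 0.0f;

        for (size_t j = 0; j < m; j++)
            tmp += line[j] * vector[j];

        result[i] = tmp;
    }
}
```

Comparing the buffer read back from the device against this reference (with a small tolerance for floating-point rounding) is a simple way to catch indexing mistakes such as a wrong row stride.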

Note that since every work-item reads the contents of vector over and over, it may be worth copying it into local memory first.