calculation of a float value

Hi
I am using my laptop with Intel graphics and an Intel CPU, so I think all the work is done by the CPU, right? I am using a test program that multiplies two float values. The calculation is done in an OpenCL kernel and the result is checked in the normal C program, but the results are not the same, as you can see in the following output of the program:

results = 0.772663116455078, data[1020] = 0.772663124036853

results is the value from the kernel and data is the value from the normal C program. When both run on the same execution unit, the results should be the same, right?

Best regards

Harald

I am using my laptop with Intel graphics and an Intel CPU, so I think all the work is done by the CPU, right?

In OpenCL the application has to choose explicitly what device is going to do the work. If you search for “clGetDeviceIDs()” in the source code of your app you will find what device is used. It’s almost certainly the CPU in your case.

When both run on the same execution unit, the results should be the same, right?

Can you show us the kernel source and the way you computed the same thing in regular C? I would suspect that OpenCL is computing the value in single precision while the regular C code is using x87 internally.
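To illustrate the single-versus-wider-precision suspicion, here is a minimal C sketch. It does not reproduce Harald's program (which was not posted); the function names product_single, product_double and print_comparison are made up for illustration. It multiplies the same two float inputs once with the result rounded to float, as a kernel working on float would store it, and once in double, as host code might do if it keeps the result in a double. For what it's worth, the two printed values in the question differ by about 7.6e-9, which is less than half the spacing between adjacent floats near 0.77 (that spacing is about 6e-8), so they are at least consistent with one exact product rounded to two different precisions.

```c
#include <stdio.h>

/* Product rounded to single precision (assumed to mirror what a kernel
 * operating on float values would store). */
float product_single(float a, float b)
{
    /* volatile forces a store to a real float, so any wider
     * intermediate precision is rounded away here. */
    volatile float r = a * b;
    return r;
}

/* Product computed in double precision (assumed to mirror host code that
 * keeps the result in a double). The product of two floats always fits
 * exactly in a double, so no rounding happens here. */
double product_double(float a, float b)
{
    return (double)a * (double)b;
}

/* Hypothetical helper: print both results in the style of the output
 * shown in the question. */
void print_comparison(float a, float b)
{
    printf("single = %.15f, double = %.15f\n",
           (double)product_single(a, b), product_double(a, b));
}
```

Calling print_comparison(0.1f, 0.7f) should show two values that agree in the leading digits and differ further out, which is the same pattern as in the original output. If the host code is changed to store its product in a float as well, the two sides would be expected to match much more often.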

Intel GPUs don’t have vertex shaders - and their fragment shaders are pretty slow and rather limited.

You might be better off doing it in the CPU.

Intel GPUs don’t have vertex shaders - and their fragment shaders are pretty slow and rather limited.

Intel GPUs DO have vertex shaders. Regardless, shaders aren’t in question unless you are using heavy OpenGL interoperation.

However, the current Intel OpenCL implementation supports only Intel CPUs at this point. Until the drivers come out for Intel GPU OpenCL, your app is running only on the CPU.

Hi

Thanks for the answers. I am using the ATI Stream SDK, but I am working on a laptop with an Intel CPU and an Intel GPU, so I think the program only runs on the CPU. Is there a driver from Intel that supports OpenCL on Linux?

To avoid starting another thread, I am asking here because I have another question. The first OpenCL call in my code is clGetPlatformIDs, and then I call clGetPlatformInfo. The output of clGetPlatformInfo is “Advanced Micro Device, Inc.”. My question now is whether this is normal. I thought the platform would be Intel, or does it show me the SDK that I use? When I run clGetPlatformIDs on a PC with an Intel CPU and an NVIDIA GPU, is the number of platforms then two or one?

With the output of clGetPlatformIDs I can call clGetDeviceIDs, which gives me the devices that I can assign to a context, right? Is it possible to assign more than one device to a context, or do I need a new context for every device?

Thanks for your help!

Best regards
Harald

The output of clGetPlatformInfo is “Advanced Micro Device, Inc.”.

That’s normal. The platforms represent which SDKs you have available in your system. Currently you are using AMD’s SDK.

If you install, for example, both AMD’s and NVIDIA’s SDKs, you will find two platforms.

When I run clGetPlatformIDs on a PC with an Intel CPU and an NVIDIA GPU, is the number of platforms then two or one?

Platforms only depend on what SDKs you installed, not which devices are connected to your system.

With the output of clGetPlatformIDs I can call clGetDeviceIDs, which gives me the devices that I can assign to a context, right?

That’s right. Each platform shows zero or more devices. You can assign any of those devices to a context.

Is it possible to assign more than one device to a context, or do I need a new context for every device?

Yes, it’s legal to put more than one device in a context. However, all devices in a context must come from the same platform. You can’t put a device from platform A and a device from platform B in one context.

Hi

Thanks for your answer. The whole thing is now clearer to me. I always get the AMD platform and an Intel device, which works fine. If I work with Windows 7 and use the Intel driver for OpenCL together with the NVIDIA driver with OpenCL, I get two platforms. The Intel platform is the CPU and the NVIDIA platform is the GPU. So it is not possible to use both devices in the same context? Does that mean that I decide which kernel runs on the GPU and which kernel runs on the CPU?

Best regards

Harald

So it is not possible to use both devices in the same context?

That’s right. Devices from different platforms cannot be in the same context.

Does that mean that I decide which kernel runs on the GPU and which kernel runs on the CPU?

You always have to select which kernels run on which devices. How devices are arranged in different contexts has nothing to do with this. OpenCL doesn’t do automatic load balancing.

Contexts are important because objects like images (textures), buffers, etc. all belong to a single context. You can’t share an image or a buffer between two different contexts, and you can’t wait in one context for an event that belongs to a different context. You would have to make a copy.

Hi
Thanks for your description of contexts. Do developers in industry use multiple contexts, or do they use only one context that includes all available devices? So when I want to use an Intel CPU and an NVIDIA GPU with the best performance, I need two contexts, right? Is the partition into work-groups done by OpenCL depending on the available devices or compute units, or is it possible for the developer to set the number of work-groups and work-items?

Best regards
Harald

So when I want to use an Intel CPU and an NVIDIA GPU with the best performance, I need two contexts, right?

Currently you have no option. You have to create two contexts in that case. If you had an AMD GPU and an AMD CPU then you could have both in one context.

Is the partition into work-groups done by OpenCL depending on the available devices or compute units, or is it possible for the developer to set the number of work-groups and work-items?

The developer always has to select the number of work items and the device where the work is executed. The developer can also choose the number of work-groups.

FWIW, I believe the AMD OpenCL CPU implementation will run on any x86 processor (and do a good job of it, although I doubt AVX support exists). So if you have an OpenCL-capable AMD GPU, you should be able to use one context with both CPU and GPU devices.

This does not apply to the Mac platform, where a single context can contain all the devices because of the unified OS-level implementation.

True. Thanks for the correction :) I didn’t mean to imply that you need an AMD CPU to be able to use AMD’s SDK. Putting together an Intel CPU and an AMD GPU would also work fine.

Ok Thanks for all your answers.

When I run a PC with an Intel quad core, an NVIDIA graphics card and a Linux OS, I can use the NVIDIA OpenCL implementation for the graphics card and the ATI implementation for the Intel CPU, because there is no OpenCL implementation from Intel for Linux, right?

What I don’t understand is how I can set the number of work-groups. Is this not normally set automatically by OpenCL, and is it the same as the number of compute units or processing units?

What I don’t understand is how I can set the number of work-groups.

The number of work groups is computed based on a parameter you pass to clEnqueueNDRangeKernel called “local_work_size”.

“local_work_size” determines the number of work-items in a work-group. The number of work-groups is then computed as the global size (that is, the total number of work-items in the NDRange) divided by the work-group size.

This is a bit different from the way it works in DirectCompute, where you have to choose the number of work-groups directly and the global size.
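The work-group arithmetic for clEnqueueNDRangeKernel can be sketched in plain C, for one dimension. This is only an illustration of the division described above and makes no OpenCL calls; the helper names num_work_groups and round_up_global_size are made up. One detail worth knowing: in OpenCL 1.x the global size has to be evenly divisible by the local size when a local size is given (otherwise the enqueue fails with CL_INVALID_WORK_GROUP_SIZE), so a common approach is to round the global size up and have the kernel ignore the extra work-items.

```c
#include <stddef.h>

/* Number of work-groups in one dimension: global size divided by the
 * work-group (local) size. Returns 0 when the sizes are not usable
 * together, i.e. local_size is 0 or does not evenly divide global_size. */
size_t num_work_groups(size_t global_size, size_t local_size)
{
    if (local_size == 0 || global_size % local_size != 0)
        return 0;
    return global_size / local_size;
}

/* Round a problem size n up to the next multiple of local_size, so the
 * result can be used as a global size. The kernel is then assumed to
 * skip work-items whose global id is >= n. local_size must be non-zero. */
size_t round_up_global_size(size_t n, size_t local_size)
{
    return ((n + local_size - 1) / local_size) * local_size;
}
```

For example, a global size of 1024 with a local size of 64 gives 16 work-groups, and a problem of 1000 elements with a local size of 64 would be rounded up to a global size of 1024. If you pass NULL for local_work_size instead, the OpenCL implementation picks the work-group size itself.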

results is the value from the kernel and data is the value from the normal C program. When both run on the same execution unit, the results should be the same, as the logic of both programs is the same.

The calculations are performed by the ALU, which is an integral part of the CPU. So whichever processor or OS one uses, the multiplication of two float values will always return the same value.

So whichever processor or OS one uses, the multiplication of two float values will always return the same value.

As you surely know, CPUs in modern PCs have both SSE units and x87 floating-point units. The two do not operate in exactly the same way, and this is one of the reasons results can differ. OpenCL requires single-precision computations to follow IEEE 754 rules, whereas x87 typically keeps intermediate results in a wider format than single precision, so the same expression can end up rounded differently.

Also, floating point units can typically be configured for different rounding modes. This can be a second cause of slight differences in the computed values.