In the CodeProject example:

// create data for the run
float* data = new float[DATA_SIZE];

// Create the device memory vectors
input = clCreateBuffer(context, CL_MEM_READ_ONLY, sizeof(float) * count, NULL, NULL);

// Transfer the input vector into device memory
err = clEnqueueWriteBuffer(commands, input, CL_TRUE, 0, sizeof(float) * count, data, 0, NULL, NULL);

// Set the arguments to the compute kernel
err = clSetKernelArg(kernel, 0, sizeof(cl_mem), &input);

// Execute the kernel
err = clEnqueueNDRangeKernel(commands, kernel, 1, NULL, &global, &local, 0, NULL, NULL);

My question: I can choose between CL_DEVICE_TYPE_GPU and CL_DEVICE_TYPE_CPU. When the kernel executes on the host CPU, how does it use data that lives on the host? In clSetKernelArg, the kernel argument is always set to &input, which refers to a buffer in device memory, and that doesn't seem to make sense when running on the CPU.

Any clarification is much appreciated.