I want to pass a short array of floats (say, fewer than ten) as input to every work item.

I gather you can't do the following, because the array parameter fruits is treated as a __private float pointer, which is not allowed as a kernel argument (section 6.5 of the OpenCL spec).
Code :
__kernel void simpleKernel( global float* notrelavant , float fruits[2] )
  ...  // not allowed, but gives the most natural syntax for my case,
  ...  // e.g.  for (i = 0; i < 2; i++) sum += fruits[i];
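
To illustrate why the array parameter counts as a pointer, here is a minimal sketch in plain C (not OpenCL C). The function names are hypothetical. In C, a parameter written as `float fruits[2]` is adjusted to `float *fruits`, so a write inside the function lands in the caller's array:

```c
/* Hypothetical helper: the parameter is written as an array,
   but the compiler adjusts it to `float *fruits`. */
void bump_first(float fruits[2])
{
    fruits[0] += 1.0f;   /* writes through the pointer into the caller's array */
}

/* Hypothetical helper: the loop syntax from above, in plain C. */
float sum_fruits(const float fruits[2])
{
    float sum = 0.0f;
    for (int i = 0; i < 2; i++)
        sum += fruits[i];
    return sum;
}
```

I assume OpenCL C applies the same adjustment, which would explain why the kernel argument ends up as a pointer with no global, local, or constant qualifier.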

I can think of two alternatives. Are there others? What are the performance implications?

First, we can do the exact same thing using a structure.
Code :
typedef struct {
   float one;
   float two;
} structOfFruits_t;
__kernel void simpleKernel( global float* notrelavant, structOfFruits_t fruits )
  ...  //this method confirmed to work
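
A possible variation on the struct approach, sketched in plain C rather than as a tested kernel: wrapping the array in a struct keeps the loop syntax while still passing by value. The names `NUM_FRUITS`, `fruitArray_t`, and `sum_fruits_by_value` are hypothetical, and I have not confirmed that every OpenCL implementation accepts this as a kernel argument.

```c
#define NUM_FRUITS 2    /* hypothetical size; anything under ten fits the use case */

typedef struct {
    float v[NUM_FRUITS];
} fruitArray_t;         /* hypothetical name */

/* The struct is copied on the call, so the array inside it arrives by value. */
float sum_fruits_by_value(fruitArray_t fruits)
{
    float sum = 0.0f;
    for (int i = 0; i < NUM_FRUITS; i++)
        sum += fruits.v[i];
    fruits.v[0] = 0.0f; /* changes only the local copy, not the caller's struct */
    return sum;
}
```

On the host, I would expect such a struct to be set like structOfFruits_t, by passing its size and address to clSetKernelArg(), assuming the host and device agree on the struct layout.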

Or, we can declare a small buffer of about 10 elements in constant or global memory.
Code :
__kernel void simpleKernel( global float* notrelavant, constant float* fruits )
  ...  //all work items will likely access fruits[0], fruits[1], etc. nearly simultaneously

Can we speculate how the compiler would treat each case on a GPU? Which would be faster, the structure in private memory or the array in constant memory?

By the way, the AMD OpenCL compiler catches the error in the first case:
"error: kernel pointer arguments must point to addrSpace global, local, or constant.
float fruits[2]"
The Apple OpenCL implementation does not (I'm on OS X 10.6.8). Instead I get an EXC_BAD_ACCESS from within the clSetKernelArg() call on the host. I'm thinking about reporting it...

Thanks for your thoughts.