Device info: maximum number of work-items per compute unit

When tuning the local work size for optimum performance, two of the parameters that must be taken into account are the maximum number of work-items (WI) and the maximum number of work-groups (WG) that a compute unit (CU) can keep resident at the same time.

clGetDeviceInfo() does not currently expose this information (CL_DEVICE_MAX_WORK_GROUP_SIZE only limits the size of a single work-group, not how many can be resident on a CU), so I propose adding it in the next revision of the standard, for example as CL_DEVICE_MAX_WORK_GROUPS_PER_CU and CL_DEVICE_MAX_WORK_ITEMS_PER_CU.