Question about the number of work-items in a work-group

Hi,

I am writing sample OpenCL code for the Adreno 330 GPU (Qualcomm dragon8074 board, platform: Android). I queried max_work_group_size and got 512, and the device reports 4 max compute units. Does this mean that each work-group can hold up to 512 work-items, all running on one compute unit? Or do the 4 compute units execute 4 work-groups that hold 512 work-items in total (that is, 512/4 work-items per work-group)? I have also seen the number of cores given as 128 (Wikipedia lists 128 ALU cores). Does that mean the 4 compute units contain 128 cores in total, or that each compute unit has 128 cores? Can anyone explain this to me?

Thanks,
shabuddin

As far as I can tell, WP (Wikipedia) always gives counts for the whole device.

A work-group runs on one compute unit. It cannot be split among several compute units (first of all because local memory is local to a compute unit).
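To make the mapping concrete, here is a minimal sketch in plain Python (not OpenCL API code). The round-robin dispatcher is a hypothetical stand-in: real hardware is free to place work-groups differently. The only point is that each work-group lands on exactly one compute unit as a whole.

```python
def num_work_groups(global_size, local_size):
    """Number of work-groups in a 1-D NDRange (OpenCL 1.x rule:
    the global size must be a multiple of the work-group size)."""
    if global_size % local_size != 0:
        raise ValueError("global_size must be a multiple of local_size")
    return global_size // local_size


def assign_round_robin(n_groups, n_compute_units):
    """Hypothetical dispatcher: maps each whole work-group to one CU.
    A work-group is never split across compute units."""
    return {group: group % n_compute_units for group in range(n_groups)}


# Illustrative sizes: 4096 work-items in groups of 512, on 4 compute units.
groups = num_work_groups(4096, 512)
placement = assign_round_robin(groups, 4)
for group, cu in placement.items():
    print(f"work-group {group} -> compute unit {cu}")
```

With these illustrative sizes there are 8 work-groups, so each of the 4 compute units receives 2 complete work-groups.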

The max work-group size is an indication of the max number of work-items in a work-group that a compute unit can track. Not all work-items (or rather threads at the HW level) run concurrently. The CU scheduler suspends threads which are stalled by a memory request or waiting for an available ALU, and resumes others.

For example, on an NVIDIA GPU with Compute Capability 3.0, each multiprocessor can keep up to 2048 threads resident at once, but has only 192 ALUs.
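Taking the two figures quoted above at face value, a rough way to see the oversubscription is to divide one by the other. This is only arithmetic on the quoted numbers, not a model of how the scheduler actually interleaves threads.

```python
def threads_per_alu(resident_threads, alus):
    """Ratio of threads the hardware tracks to ALUs that can execute them."""
    return resident_threads / alus


# Figures quoted in the post above for Compute Capability 3.0.
ratio = threads_per_alu(2048, 192)
print(f"{ratio:.1f} resident threads per ALU")
```

So there are roughly ten tracked threads for every ALU, which is what gives the scheduler something to run while other threads wait on memory.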

Sorry, I didn't understand that clearly. About the ALU cores, for example: the Qualcomm Snapdragon 800 series board has an Adreno 330 GPU with 128 cores and 4 compute units. Does that mean it has 32 cores per compute unit? And is each compute unit capable of handling a work-group of at most 512 work-items (I queried the device info and got max_work_group_size = 512)?

Thanks,
shabuddin.

Each compute unit has 32 ALUs, so the device has a total of 4 x 32 = 128 ALUs.
Each compute unit can run a work-group of up to 512 work-items.
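The same arithmetic as a small Python sketch, using only the numbers reported in this thread. The "passes" figure rests on a simplifying assumption (one work-item per ALU per pass) and ignores how the hardware really groups and schedules threads.

```python
import math

TOTAL_ALUS = 128           # figure from Wikipedia, as quoted in this thread
COMPUTE_UNITS = 4          # CL_DEVICE_MAX_COMPUTE_UNITS reported by the poster
MAX_WORK_GROUP_SIZE = 512  # CL_DEVICE_MAX_WORK_GROUP_SIZE reported by the poster


def alus_per_compute_unit(total_alus, compute_units):
    """ALUs available to a single compute unit."""
    return total_alus // compute_units


def passes_per_work_group(work_group_size, alus_per_cu):
    """Simplified assumption: passes needed for every work-item in the
    group to execute one instruction, one work-item per ALU per pass."""
    return math.ceil(work_group_size / alus_per_cu)


per_cu = alus_per_compute_unit(TOTAL_ALUS, COMPUTE_UNITS)
passes = passes_per_work_group(MAX_WORK_GROUP_SIZE, per_cu)
print(f"{per_cu} ALUs per compute unit, {passes} passes per full work-group")
```

Under that assumption, a full work-group of 512 work-items needs 16 passes over the 32 ALUs of its compute unit, so the work-items of a group clearly do not all execute at the same instant.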

Hi utnapishtim,

Thanks for your reply, now I understand clearly. Can you explain one more thing? We have fewer ALUs than work-items running on a compute unit. Similarly, is it possible to assign more than one work-group to a single compute unit?

Thanks,
Shabuddin.