A naive simulation of 3-axis CNC cutting of 'digital metal' using height maps

As an exercise to get my hands dirty with OpenCL, I’m writing a naive simulation of 3-axis CNC cutting of ‘digital metal’ with a ‘digital cutter’.

Both the metal and the cutter are represented as 2D height maps, and the ‘cutting’ of the metal is performed by a kernel along these lines:

(Excuse the Java syntax; I haven’t written the OpenCL code yet, I’m just sketching ideas in generic curly-bracket syntax.)

	
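	int metal_size = 4000;   // metal map: metal_size x metal_size cells
	float[] metal = new float[metal_size * metal_size];

	int cutter_size = 200;   // cutter map: cutter_size x cutter_size cells
	float[] cutter = new float[cutter_size * cutter_size];

	// One work item per cutter cell: launched with a global size of
	// cutter_size x cutter_size, offset on the metal by (cutter_x, cutter_y).
	void kernel(float[] metal, float[] cutter, int cutter_x, int cutter_y) {
		int x = get_global_id(0);
		int y = get_global_id(1);

		int mx = x + cutter_x;
		int my = y + cutter_y;
		if (mx < 0 || my < 0 || mx >= metal_size || my >= metal_size)
			return;   // this cutter cell falls outside the metal

		int si = mx + my * metal_size;   // index into metal
		int ti = x + y * cutter_size;    // index into cutter
		if (metal[si] < cutter[ti])      // keep the larger of the two heights
			metal[si] = cutter[ti];
	}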
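For checking the GPU output later, the same per-position update can also be run on the CPU. The sketch below is a hypothetical plain-Java reference (the class and method names are made up for illustration): the two nested loops stand in for get_global_id(0) and get_global_id(1), cutter cells that land outside the metal are skipped, and, like the kernel sketch, each metal cell keeps the larger of the two heights.

```java
// Hypothetical CPU reference for one launch of the cutting kernel.
final class HeightMapCut {
    // Applies the cutter footprint with its corner at (cutterX, cutterY).
    static void cutOnce(float[] metal, int metalSize,
                        float[] cutter, int cutterSize,
                        int cutterX, int cutterY) {
        // These loops play the role of get_global_id(1) and get_global_id(0).
        for (int y = 0; y < cutterSize; y++) {
            for (int x = 0; x < cutterSize; x++) {
                int mx = x + cutterX;
                int my = y + cutterY;
                // Skip cutter cells that fall outside the metal map.
                if (mx < 0 || my < 0 || mx >= metalSize || my >= metalSize) {
                    continue;
                }
                int si = mx + my * metalSize; // index into metal
                int ti = x + y * cutterSize;  // index into cutter
                if (metal[si] < cutter[ti]) {
                    metal[si] = cutter[ti];
                }
            }
        }
    }
}
```

Running it for every position along the path, one after another, gives the result that any parallel scheme should reproduce.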

The plan is to call ‘clEnqueueNDRangeKernel’ once for every cutter position along the tool path.
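One way to get the list of cutter positions is to expand the tool path's waypoints into unit steps. The hypothetical helper below does this with Bresenham's line algorithm, which is an assumption for illustration rather than part of the plan above. Consecutive positions differ by at most 1 on each axis, and each (x, y) would become the cutter_x, cutter_y of one kernel launch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: expands a polyline of waypoints (at least one) into
// integer cutter positions using Bresenham's line algorithm.
final class CutterPath {
    static List<int[]> expand(int[][] waypoints) {
        List<int[]> positions = new ArrayList<>();
        positions.add(new int[] {waypoints[0][0], waypoints[0][1]});
        for (int i = 1; i < waypoints.length; i++) {
            int x = waypoints[i - 1][0], y = waypoints[i - 1][1];
            int x1 = waypoints[i][0], y1 = waypoints[i][1];
            int dx = Math.abs(x1 - x), dy = -Math.abs(y1 - y);
            int sx = x < x1 ? 1 : -1, sy = y < y1 ? 1 : -1;
            int err = dx + dy;
            // The segment start was already added as the previous segment's end.
            while (x != x1 || y != y1) {
                int e2 = 2 * err;
                if (e2 >= dy) { err += dy; x += sx; } // step in x
                if (e2 <= dx) { err += dx; y += sy; } // step in y
                positions.add(new int[] {x, y});
            }
        }
        return positions;
    }
}
```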

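The problem is that the cutter position along the path changes in increments of 1, so the areas of metal modified by consecutive cutter positions overlap. Simply queueing one kernel per position will therefore not produce correct results if the launches run concurrently: work items from overlapping launches read and update the same metal cells, so updates can be lost.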
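To make the overlap concrete: with the indexing used in the kernel, a launch at (cutter_x, cutter_y) touches the square of metal cells from (cutter_x, cutter_y) to (cutter_x + cutter_size - 1, cutter_y + cutter_size - 1). Two launches therefore touch common cells whenever their positions are less than cutter_size apart on both axes. For example, with a 200 × 200 cutter, two positions one step apart in x share 199 × 200 cells. A hypothetical helper expressing this test:

```java
// Hypothetical helper: the cutter footprint at (x, y) covers the square
// [x, x + cutterSize) x [y, y + cutterSize) of the metal map.
final class Footprint {
    static boolean overlaps(int ax, int ay, int bx, int by, int cutterSize) {
        // Two equal-sized squares intersect iff their corners are closer
        // than the side length on both axes.
        return Math.abs(ax - bx) < cutterSize && Math.abs(ay - by) < cutterSize;
    }
}
```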

My idea is to divide the cutter path into segments long enough that the cutting areas of non-adjacent segments never overlap, then queue every other segment (all the even-numbered ones), wait for them to finish, and then queue the remaining odd-numbered segments. (This assumes the cutter path does not self-intersect.)
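The two-phase idea can be sketched on the host side as follows, assuming the path has already been expanded into a list of unit-step positions. The helper below (hypothetical names) splits the path into fixed-length segments, computes each segment's bounding box of touched cells, checks that segments of the same parity never overlap, and returns the even and odd segments as two phases. Bounding boxes are conservative: they may reject a segmentation whose actual footprints do not overlap, but never accept one whose footprints do.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the even/odd segment scheme.
final class TwoPhaseSchedule {
    // Inclusive box {minX, minY, maxX, maxY} of all metal cells touched by
    // the cutter at positions path[from..to).
    static int[] footprintBox(List<int[]> path, int from, int to, int cutterSize) {
        int minX = Integer.MAX_VALUE, minY = Integer.MAX_VALUE;
        int maxX = Integer.MIN_VALUE, maxY = Integer.MIN_VALUE;
        for (int i = from; i < to; i++) {
            int[] p = path.get(i);
            minX = Math.min(minX, p[0]);
            minY = Math.min(minY, p[1]);
            maxX = Math.max(maxX, p[0] + cutterSize - 1);
            maxY = Math.max(maxY, p[1] + cutterSize - 1);
        }
        return new int[] {minX, minY, maxX, maxY};
    }

    static boolean boxesOverlap(int[] a, int[] b) {
        return a[0] <= b[2] && b[0] <= a[2] && a[1] <= b[3] && b[1] <= a[3];
    }

    // Returns two phases (even segments, odd segments); each segment is
    // {from, to} into the path. Throws if same-parity segments could overlap.
    static List<List<int[]>> plan(List<int[]> path, int segmentLength, int cutterSize) {
        if (segmentLength < 1) {
            throw new IllegalArgumentException("segmentLength must be >= 1");
        }
        List<int[]> segments = new ArrayList<>();
        for (int from = 0; from < path.size(); from += segmentLength) {
            segments.add(new int[] {from, Math.min(from + segmentLength, path.size())});
        }
        List<int[]> boxes = new ArrayList<>();
        for (int[] s : segments) {
            boxes.add(footprintBox(path, s[0], s[1], cutterSize));
        }
        // Only segments in the same phase (same parity) must be disjoint.
        for (int i = 0; i < segments.size(); i++) {
            for (int j = i + 2; j < segments.size(); j += 2) {
                if (boxesOverlap(boxes.get(i), boxes.get(j))) {
                    throw new IllegalArgumentException(
                        "segments " + i + " and " + j + " may overlap; use longer segments");
                }
            }
        }
        List<List<int[]>> phases = new ArrayList<>();
        phases.add(new ArrayList<>());
        phases.add(new ArrayList<>());
        for (int i = 0; i < segments.size(); i++) {
            phases.get(i % 2).add(segments.get(i));
        }
        return phases;
    }
}
```

Inside a segment the positions still have to be applied in order, because consecutive positions overlap; only whole segments within the same phase are independent of each other. Waiting for the first phase to complete (for example with clFinish on the queue) before enqueueing the second corresponds to the "wait for them to finish" step.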

I hope that makes sense; English is not my strong suit.

Now I’m looking for comments on this approach.

How can it be improved?

Things to watch out for?

More efficient approaches?

wbr Kusti