# Thread: Reduce / remove loop dependency in HSL

1. ## Reduce / remove loop dependency in HSL

In OpenCL / RTL design, there is a way to reduce the loop-carried dependency on an accumulator by turning it into a shift register, which improves pipelining of the loop, as in the code below:

Code :
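```c
float shift_reg[DEPTH];
// Clear the shift register
for (int i = 0; i < DEPTH; i++) {
    shift_reg[i] = 0;
}
for (int i = 0; i < loop_bound; i++) {
    // Add to the oldest partial sum and store the result at the tail
    shift_reg[DEPTH - 1] = shift_reg[0] + arr[i];
    // Shift every element one slot toward index 0
    #pragma unroll
    for (int j = 0; j < DEPTH - 1; ++j) {
        shift_reg[j] = shift_reg[j + 1];
    }
}
// After the last shift, slot DEPTH - 1 duplicates slot DEPTH - 2,
// so only slots 0 .. DEPTH - 2 are summed
float temp_sum = 0;
#pragma unroll
for (int i = 0; i < DEPTH - 1; ++i) {
    temp_sum += shift_reg[i];
}
result = temp_sum;
```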

I don't quite understand this method. Also, can I use a normal register (array) instead of a shift register to implement this?
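
For anyone else puzzled by the snippet, here is a minimal host-side C sketch of the same pattern that can be compiled and compared against a plain sum. The function name `shift_reg_sum` and the value `DEPTH = 8` are assumptions made for illustration and are not part of the original post. What it tries to show is that each addition reads a value written `DEPTH - 1` iterations earlier, so consecutive additions do not have to wait on each other.

```c
#include <stddef.h>

#define DEPTH 8  /* assumed shift-register length (hypothetical value) */

/* Hypothetical helper: sums arr[0..n-1] using the shift-register pattern. */
float shift_reg_sum(const float *arr, size_t n)
{
    float shift_reg[DEPTH];
    for (size_t i = 0; i < DEPTH; i++) {
        shift_reg[i] = 0.0f;
    }

    for (size_t i = 0; i < n; i++) {
        /* Slot 0 holds the value written DEPTH - 1 iterations ago, so this
           add does not depend on the result of the previous iteration. */
        shift_reg[DEPTH - 1] = shift_reg[0] + arr[i];
        /* Move every element one slot toward index 0. */
        for (size_t j = 0; j < DEPTH - 1; j++) {
            shift_reg[j] = shift_reg[j + 1];
        }
    }

    /* After the shift, slot DEPTH - 1 duplicates slot DEPTH - 2, so only
       slots 0 .. DEPTH - 2 hold distinct partial sums. */
    float total = 0.0f;
    for (size_t j = 0; j < DEPTH - 1; j++) {
        total += shift_reg[j];
    }
    return total;
}
```

When every intermediate sum is exactly representable in `float`, the result equals a plain running sum. In general it can differ in the last bits, because the additions are grouped into `DEPTH - 1` interleaved partial sums.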

2. This is a manual implementation of the CPU technique called register renaming; the purpose is the same. https://en.wikipedia.org/wiki/Register_renaming

> And Can I use normal register(array) instead of shift register to implement this?

A shift register is a normal register array.

3. Purpose is the same
When looking at the code more closely, though, I'm kinda not sure what the hell is actually going on in there. That rotation in the inner loop creates a false dependency, does it not?

If you do something like this:
Code :
```c
float shift_reg[DEPTH] = {0}; // partial sums must start at zero
for (int i = 0; i < loop_bound / DEPTH; ++i) { // assume they divide exactly
    for (int j = 0; j < DEPTH; ++j) {
        // Will probably be replaced by a compiler with "load loop" and "compute sum loop"
        shift_reg[j] += arr[i * DEPTH + j];
    }
}

float temp_sum = 0;
#pragma unroll
for (int i = 0; i < DEPTH; ++i) { // every slot holds its own partial sum
    temp_sum += shift_reg[i];
}
result = temp_sum;
```

It is obvious that the sum associated with each register can be computed independently, which allows a CPU or GPU to utilize its pipelining capabilities better. I don't really understand your code either.
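
A minimal runnable C sketch of the independent-partial-sums idea, this time without assuming that the length is a multiple of `DEPTH`. The names `partial_sums` and `acc`, and the value `DEPTH = 8`, are hypothetical and chosen only for this example; how a particular CPU, GPU, or FPGA compiler schedules it is not something the sketch can show.

```c
#include <stddef.h>

#define DEPTH 8  /* assumed number of independent accumulators */

/* Hypothetical helper: sums arr[0..n-1] with DEPTH independent partial sums. */
float partial_sums(const float *arr, size_t n)
{
    float acc[DEPTH] = {0.0f};   /* all accumulators start at zero */
    size_t blocks = n / DEPTH;   /* number of full DEPTH-sized blocks */

    for (size_t i = 0; i < blocks; i++) {
        /* Each acc[j] depends only on its own previous value, so the
           DEPTH additions in one block are independent of each other. */
        for (size_t j = 0; j < DEPTH; j++) {
            acc[j] += arr[i * DEPTH + j];
        }
    }

    /* Leftover elements when n is not a multiple of DEPTH. */
    for (size_t k = blocks * DEPTH; k < n; k++) {
        acc[k % DEPTH] += arr[k];
    }

    /* Final reduction over all DEPTH accumulators. */
    float total = 0.0f;
    for (size_t j = 0; j < DEPTH; j++) {
        total += acc[j];
    }
    return total;
}
```

Unlike the shift-register version, all `DEPTH` slots hold distinct partial sums here, so the final reduction runs over the full array.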

4. I've just realized you were talking about FPGA programming. In this case the inner loop probably can be performed in one cycle, but I don't quite have enough knowledge on the topic to tell the difference between my variant and yours. It's probably some FPGA compiler magic that detects your code, but not mine for whatever reason.
