Branch Prediction Schemes2
Branch Prediction Schemes2
Branch Prediction Schemes2
There are many methods to deal with the pipeline stalls caused by branch delay. We discuss four simple compile-time schemes in which predictions are static - they are fixed for each branch during the entire execution, and the predictions are compile-time guesses. Stall pipeline Predict taken Predict not taken Delayed branch
Stall pipeline
The simplest scheme to handle branches is to freeze or flush the pipeline, holding or deleting any instructions after the branch until the branch destination is known. Advantage: simple both to software and hardware (solution described earlier)
IF
ID IF
EX ID IF
ID IF
EX idle IF
Branch target+1
IF
ID
EX
MEM
WB
When branch is not taken, determined during ID, we have fetched the fall-through and just continue. If the branch is taken during ID, we restart the fetch at the branch target. This causes all instructions following the branch to stall one clock cycle.
Predict Taken
An alternative scheme is to predict the branch as taken. As soon as the branch is decoded and the target address is computed, we assume the branch to be taken and begin fetching and executing at the target address. Because in DLX pipeline the target address is not known any earlier than the branch outcome, there is no advantage in this approach. In some machines where the target address is known before the branch outcome a predict-taken scheme might make sense.
Delayed Branch
In a delayed branch, the execution cycle with a branch delay of length n is
Branch instr sequential successor 1 sequential successor 2 . . . . . sequential successor n Branch target if taken
Sequential successors are in the branch-delay slots. These instructions are executed whether or not the branch is taken. The pipeline behavior of the DLX pipeline, which has one branch delay slot is shown below: Untaken branch instr Branch delay instr(i+1) Instr i+2 Instr i+3 Instr i+4 Taken branch instr Branch delay instr(i+1) Branch target Branch target+1 IF ID EX MEM WB IF ID EX IF ID IF MEM WB EX ID IF IF ID EX MEM WB IF ID EX IF ID IF MEM WB EX ID MEM WB EX MEM WB MEM WB EX ID MEM WB EX MEM WB
Branch target+2
IF
ID
EX
MEM WB
The job of the compiler is to make the successor instructions valid and useful. We will show three branch-scheduling schemes: From before branch From target From fall through
Scheduling the branch-delay slot. The left box in each pair shows the code before scheduling, the right box shows the scheduled code In (a) the delay slot is scheduled with an independent instruction from before the branch. This is the best choice. Strategies (b) and (c) are used when (a) is not possible. In the code sequences for (b) and (c), the use of R1 in the branch condition prevents the ADD instruction (whose destination is R1) from being moved after the branch. In (b) the branch-delay slot is scheduled from the target of the branch; usually the target instruction will need to be copied because it can be reached by another path. In (c) the branch-delay slot is scheduled from the not-taken fall through To make this optimazation legal for (b) and (c), it must be OK to execute the SUB instruction when the branch goes in the
g unexpected direction. OK means that the work might be wasted but the program will still execute correctly.
Scheduling strategy
Requirements
When Improves Performance Always When branch is taken. May enlarge program if instructions are duplicated
From before Branch must not depend on the rescheduled branch instructions From target From fall though Must be OK to execute rescheduled instructions if branch is not taken
The limitations on delayed-branch scheduling arise from the restrictions on the instructions that are scheduled into the delay slots and our ability to predict at compile time whether a branch is likely to be taken or not.
Cancelling Branch
To improve the ability of the compiler to fill branch delay slots, most machines with conditional branches have introduced a cancelling branch. In a cancelling branch the instruction includes the direction that the branch was predicted. - if the branch behaves as predicted, the instruction in the branch delay slot is fully executed; - if the branch is incorrectly predicted, the instruction in the delay slot is turned into no-op(idle). The behavior of a predicted-taken cancelling branch depends on whether the branch is taken or not: Untaken branch instr Branch delay instr(i+1) Instr i+2 Instr i+3 Instr i+4 IF ID EX MEM IF ID idle IF ID IF WB idle idle EX ID IF MEM EX ID WB MEM EX WB MEM WB
Taken branch instr Branch delay instr(i+1) Branch target Branch target+1 Branch target+2
The advantage of cancelling branches is that they eliminate the requirements on the instruction placed in the delay slot. Delayed branches are an architecturally visible feature of the pipeline. This is the source both of their advantage - allowing the use of simple compiler scheduling to reduce branch penalties; and their disadvantage - exposing an aspect of the implementation that is likely to change.