Instruction Scheduler in LLVM

Download as pdf or txt
Download as pdf or txt
You are on page 1of 20

Instruction Scheduling in

LLVM
Hsiangkai Wang
[email protected]
Andes Technology
Agenda
Introduction to Instruction Scheduling

Scheduler in LLVM

Pipeline Modeling

Scheduler Customization
Instruction Scheduling

Different operations take different lengths of time.

Instruction scheduling is the process reordering


the operations in an attempt to decrease its
running time.
a( 1): load $x5_32, $x8_32, @a load, store: 3
b( 4): add $x5_32, $x5_32, $x5_32 add: 1
c( 5): load $x6_32, $x8_32, @b mul: 2
d( 8): mul $x5_32, $x5_32, $x6_32
e(10): load $x7_32, $x8_32, @c
f(13): mul $x5_32, $x5_32, $x7_32 a
g(15): load $x9_32, $x8_32, @d
h(18): mul $x5_32, $x5_32, $x9_32 x5_32
i(20): store $x5_32, $x8_32, @a
b c
x5_32 x6_32

d e
x5_32 x7_32

f g
x5_32 x9_32
h
x5_32

i
a( 1): load $x5_32, $x8_32, @a load, store: 3
b( 4): add $x5_32, $x5_32, $x5_32 add: 1
c( 5): load $x6_32, $x8_32, @b mul: 2
d( 8): mul $x5_32, $x5_32, $x6_32
e(10): load $x7_32, $x8_32, @c
f(13): mul $x5_32, $x5_32, $x7_32 a 13
g(15): load $x9_32, $x8_32, @d
h(18): mul $x5_32, $x5_32, $x9_32 x5_32
i(20): store $x5_32, $x8_32, @a
b 10 c 12

a c e b d g f h i x5_32 x6_32

d 9 e 10
x5_32 x7_32

f 7 g 8
x5_32 x9_32
h 5
x5_32

i 3
a( 1): load $x5_32, $x8_32, @a load, store: 3
b( 4): add $x5_32, $x5_32, $x5_32 add: 1
c( 5): load $x6_32, $x8_32, @b mul: 2
d( 8): mul $x5_32, $x5_32, $x6_32
e(10): load $x7_32, $x8_32, @c
f(13): mul $x5_32, $x5_32, $x7_32 a 13
g(15): load $x9_32, $x8_32, @d
h(18): mul $x5_32, $x5_32, $x9_32 x5_32
i(20): store $x5_32, $x8_32, @a
b 10 c 12

a c e b d g f h i x5_32 x6_32

d 9 e 10
x5_32 x7_32
a( 1): load $x5_32, $x8_32, @a
c( 2): load $x6_32, $x8_32, @b f 7 g 8
e( 3): load $x7_32, $x8_32, @c
x5_32
b( 4): add $x5_32, $x5_32, $x5_32 x9_32
d( 5): mul $x5_32, $x5_32, $x6_32 h 5
g( 6): load $x9_32, $x8_32, @d
f( 7): mul $x5_32, $x5_32, $x7_32 x5_32
h( 9): mul $x5_32, $x5_32, $x9_32
i(11): store $x5_32, $x8_32, @a i 3
Scheduler in LLVM SchedulerDAG
(unit latency) (2008)(Itineraries) -scheditins
(Itineraries) (2012)(SchedModel) -schedmodel

(2008) (2008)
ScheduleDAGSDNodes ScheduleDAGInstrs

(2008)
(2012) SchedulePostRA

ScheduleDAGMI
TDList

-pre-RA-sched=<value>
=fast
(2013)
=linearize
=list-burr LiveInterval ScheduleDAGMILive
=source RegPressure
=list-hybrid
=list-ilp
=vliw-td

Instruction
Register
Selector
MI MI
Allocator
(DAG)

-post-RA-scheduler
(2013) (2008)
SelectionDAGISel MachineScheduler PostRAScheduler
(SchedulePostRATDList)
ScheduleDAGSDNodes::Run (ScheduleDAGMILive)
-enable-misched (2013) PostMachine
(ScheduleDAGMI)
Scheduler
-enable-post-misched
TargetPassConfig::substitutePass(&PostRASchedulerID, &PostMachineSchedulerID)
Data Dependency Graph
a x10_32: data

a: $x10_32 = LUI @Arr x10_32: output


b: $x8_32 = CLI 10
c: SW $x8_32, $x10_32, @Arr b x8_32: data
d: $x10_32 = ADDI $x8_32, 0
e: Call @foo, implicit $x10_32 (ExitSU) c
x8_32: data
x10_32: anti

x10_32: artificial

e
Data Dependency Graph

a
a: SW $x10_32, $x27_32, 12 x10_32: anti
MayAliasMem
b: $x10_32 = LW $x9_32, 0
c: SW $x10_32, $x27_32, 0 b
d: $x10_32 = LW $x8_32, @Glob
... x10_32: output
MayAliasMem x10_32: data

MayAliasMem x10_32: anti

d
Pipeline Modeling for Target
Use target description to describe the pipeline model.

For architecture

Create scheduling categories for operands.

<Target>Schedule.td

Associate scheduling categories to instructions.

<Target>InstrInfo.td

For processor

Associate pipeline information to scheduling categories.

<Target>Schedule<Processor>.td
Associate per-operand SchedReadWrite types with Instructions by
modifying the Instruction definition to inherit from Sched.

Sched
+SchedRW
SchedReadWrite
SchedRW lists the per-operand types that
map to the machine model of an instruction. Associate with instructions

Define a scheduler Define a scheduler


resource associated with a resource associated with a
use operand. def operand.

SchedRead SchedWrite
Associate with target
Associate with subtargets

ProcReadAdvance ProcWriteResources
+Cycles
+ProcResources

+ValidWrites +ResourceCycles

+Latency

For use with InstRW or For use with InstRW or


ItinRW. ItinRW.

SchedReadAdvance SchedWriteRes
ReadAdvance WriteRes
+ReadType +WriteType

Define WriteRes and ReadAdvance to


InstRW: Map a set of opcodes to a list of SchedReadWrite types. associate processor resources and latency
This allow the sub target to easily override specific operations. with each SchedReadWrite type.

ItinRW: Map a set of itinerary classes to SchedReadWrite resources. inherentence


aggregate
Create scheduling categories
def ALUOut : SchedWrite; // For define operands of ALU op
def ALUIn : SchedRead; // For use operands of ALU op
def MULOut : SchedWrite; // For define operands of MUL op
def MULIn : SchedRead; // For use operands of MUL op

Associate instructions with scheduling categories


class ALU_ri<bits<3> funct3, string opcodestr>
: RVInstI<funct3, OPC_OP_IMM,
(outs GPR:$rd),
(ins GPR:$rs1, simm12:$imm12),
opcodestr, "$rd, $rs1, $imm12”>,
Sched<[ALUOut, ALUIn]>;
Associate pipeline information to scheduling categories
def UnitALU : ProcResource<1> { let BufferSize = 0; }
def UnitMDU : ProcResource<1> { let BufferSize = 0; }

def : WriteRes<ALUOut, [UnitALU]> { let Latency = 2; }


def : WriteRes<MULOut, [UnitMDU]> { let Latency = 4; }
def : ReadAdvance<ALUIn, 1>;
def : ReadAdvance<MULIn, 1>;

ISSUE ALU MDU WB

ALU
MDU
def : WriteRes<ALUOut, [UnitALU]> { let Latency = 2; }
def : WriteRes<MULOut, [UnitMDU]> { let Latency = 4; }
def : ReadAdvance<ALUIn, 1>;
def : ReadAdvance<MULIn, 1>;

Latency =
a: MUL r3, r3, r2 a MUL’s Latency - ADD’s Advance
b: ADD r4, r3, r2 r3: data
3
b
time ISSUE ALU MDU WB

t0 MUL
t1 ADD MUL
t2 stall MUL
t3 stall MUL
t4 ADD stall
GenericScheduler::tryCandidate
Physical register copies

Register pressure (Excess, CriticalMax)

Acyclic Latency

Cluster

Weak edges

Register pressure (CurrentMax)

Resource

Latency

Source order
Customize Scheduler for Target
Define your scheduling policy.

Define your scheduling strategy.

Add DAG mutations.


Implement overrideSchedPolicy

struct MachineSchedPolicy {
bool ShouldTrackPressure = false;
bool ShouldTrackLaneMasks = false;
bool OnlyTopDown = false;
bool OnlyBottomUp = false;
bool DisableLatencyHeuristic = false;
};

void
<Target>Subtarget::overrideSchedPolicy(MachineSchedPolicy &Policy,
unsigned NumRegionInstrs) const {
Policy.OnlyTopDown = false;
Policy.OnlyBottomUp = false;
Policy.ShouldTrackPressure = true;
}
Derive MachineSchedStrategy

MachineSchedStrategy
class YourStrategy : public GenericScheduler {
...
SUnit *pickNode(bool &IsTopNode) override {
// ...
// Your heuristic algorithm.
GenericSchedulerBase
//

return GenericScheduler::pickNode(IsTopNode);

}

};
GenericScheduler
DAG mutations
// Implement your mutation.
class CustomMutation : public ScheduleDAGMutation {
public:
void apply(ScheduleDAGInstrs *DAGInstrs) override;
};

std::unique_ptr<SchuduleDAGMutation>
llvm::createCustomMutation(const <Target>Subtarget *STI) {
return llvm::make_unique<CustomMutation>(STI);

}

// Install
ScheduleDAGInstr *
createMachineScheduler(MachineSchedContext *C) const override {
const <Target>Subtarget &STI = C->MF->getSubtarget<<Target>Subtarget>();
ScheduleDAGMILive *DAG = createGenericSchedLive(C);
DAG->addMutation(createCustomMutation(STI));
return DAG;

}
a a

b c b c

d d
Reference
Engineering a Compiler, 2nd Edition

2017 LLVM Developers’ Meeting: “Writing Great


Machine Schedulers”

You might also like