Are there any good practices for optimizing and micro-optimizing EFI Byte Code (EBC), or bytecode in general, for speed, given that instructions are executed in strict order? For instance, is a single instruction always faster than two consecutive ones, even if its encoding is longer?
For example, will
MOVIqq R1, -1 ; (10 bytes)
still be faster than
XOR64 R1, R1 ; (2 bytes)
NOT64 R1, R1 ; (2 bytes)
or will
MOVIqq R1, -10 ; (10 bytes)
still be faster than
MOVIww R1, -10 ; (4 bytes)
EXTNDW64 R1, R1 ; (2 bytes)
Is it preferable to use 64-bit variants over 32-bit when possible or not? (XOR32 R1, R1, which zero-extends to 64 bits, vs XOR64 R1, R1)
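To make explicit that the sequences above are meant to be interchangeable, here is a minimal Python sketch that models R1 as a 64-bit value and checks that each pair leaves the same result. The helper names are hypothetical, and the model assumes that a 16-bit MOVI into a register clears the upper bits (which is why the EXTNDW64 follows it) and that XOR32 zero-extends its result, as stated in the question. It says nothing about speed, only about equivalence.

```python
MASK64 = (1 << 64) - 1

def movi(imm, move_bits):
    # Assumption: the immediate is truncated to the move width and the
    # remaining upper bits of the register are cleared.
    return imm & ((1 << move_bits) - 1)

def xor64(a, b):
    return (a ^ b) & MASK64

def not64(a):
    return ~a & MASK64

def xor32(a, b):
    # Assumption (from the question): the 32-bit result is zero-extended.
    return (a ^ b) & 0xFFFFFFFF

def extndw64(a):
    # Sign-extend the low 16 bits to 64 bits.
    low = a & 0xFFFF
    if low & 0x8000:
        low -= 0x10000
    return low & MASK64

r1 = 0x123456789ABCDEF0  # arbitrary previous register contents

minus_one_long = movi(-1, 64)              # MOVIqq R1, -1
minus_one_short = not64(xor64(r1, r1))     # XOR64 R1, R1 ; NOT64 R1, R1

minus_ten_long = movi(-10, 64)             # MOVIqq R1, -10
minus_ten_short = extndw64(movi(-10, 16))  # MOVIww R1, -10 ; EXTNDW64 R1, R1

zero_32 = xor32(r1, r1)                    # XOR32 R1, R1
zero_64 = xor64(r1, r1)                    # XOR64 R1, R1
```

Under these assumptions all three pairs agree, so the only open question is which form a given VM executes faster.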
Unfortunately, I cannot test this myself on a variety of EBC VM implementations to get empirical data. I can only hope that the answer is not strictly implementation-dependent and that some general rules exist that can be applied.
mov r64, sign-extended-imm32 the way x86-64 machine code does? If it's interpreted, I'd assume that 2 instructions are slower than 1, even if the one is much bigger.
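The trade-off being weighed here can be tallied with a short Python sketch: for each candidate sequence it counts the instructions an interpreter would have to dispatch and the bytes it would have to fetch, using the encoded sizes quoted in the question. The structure and names are hypothetical, and the numbers are not timings; how a real EBC VM weighs dispatch cost against fetch cost is exactly what remains implementation-dependent.

```python
# (mnemonic, encoded size in bytes), sizes taken from the question
SEQUENCES = {
    "MOVIqq -1": [("MOVIqq", 10)],
    "XOR64 + NOT64": [("XOR64", 2), ("NOT64", 2)],
    "MOVIqq -10": [("MOVIqq", 10)],
    "MOVIww + EXTNDW64": [("MOVIww", 4), ("EXTNDW64", 2)],
}

def tally(sequence):
    # One dispatch per instruction; bytes fetched is the sum of the encodings.
    dispatches = len(sequence)
    fetched = sum(size for _, size in sequence)
    return dispatches, fetched

summary = {name: tally(seq) for name, seq in SEQUENCES.items()}

for name, (dispatches, fetched) in summary.items():
    print(f"{name}: {dispatches} dispatch(es), {fetched} byte(s)")
```

If per-instruction dispatch dominates, the single 10-byte MOVIqq wins; if fetching and decoding a wide immediate dominates, the shorter two-instruction forms could come out ahead.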