Speed up the Tier 2 interpreter #112287
Comments
This makes the Tier 2 interpreter a little faster. By my calculation the speedup is about 3%, though I hesitate to claim an exact number.

This starts by doubling the trace size limit (to 512), making it more likely that loops fit in a trace.

The rest of the approach is to only load `oparg` and `operand` in cases that use them. The code generator knows when these are used.

For `oparg`, it will conditionally emit
```
oparg = CURRENT_OPARG();
```
at the top of the case block. (The `oparg` variable may be referenced multiple times by the instruction's code block, so it must be stored in a variable.)

For `operand`, it will use `CURRENT_OPERAND()` directly instead of referencing the `operand` variable, which no longer exists. (There is only one place where this will be used.)
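As a rough illustration of the conditional emission described above, here is a minimal Python sketch. It is not the actual cases generator: `emit_case`, the uop names, and the regex-based detection are all hypothetical stand-ins (the real generator presumably works from its own analysis of each instruction rather than a text search), and the trailing `break;` is an assumption about the shape of a case block.

```python
import re


def emit_case(name: str, body: str) -> str:
    """Return illustrative C source for one Tier 2 case block.

    Hypothetical helper: it only mimics the idea of loading `oparg`
    when the instruction body actually refers to it.
    """
    lines = [f"case {name}: {{"]
    # Load oparg into a local only if the body uses it as a whole word.
    if re.search(r"\boparg\b", body):
        lines.append("    oparg = CURRENT_OPARG();")
    # There is no `operand` variable any more: substitute the macro call.
    body = re.sub(r"\boperand\b", "CURRENT_OPERAND()", body)
    for line in body.strip().splitlines():
        lines.append("    " + line.strip())
    lines.append("    break;")  # assumed case terminator
    lines.append("}")
    return "\n".join(lines)


# Hypothetical uops: one uses oparg, one uses operand, one uses neither.
print(emit_case("UOP_A", "value = GETLOCAL(oparg);"))
print(emit_case("UOP_B", "stack_pointer[-1] = (PyObject *)operand;"))
print(emit_case("UOP_C", ""))
```

Only the first case gets the `oparg` load, so uops that never touch `oparg` or `operand` skip both memory reads.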
Closing because the PR has been merged. Please re-open if there's more needed here.
Thanks for the ping! Arguably the issue was wider, but we've decided to focus on JIT performance, and the Tier 2 interpreter's speed is no longer of great concern (we keep it because it's easier to debug the rest of the Tier 2 machinery this way). So let's keep it closed but mark as "not planned", which is closer to the truth.
The Tier 2 interpreter hasn't really been optimized carefully. While the "optimizer" pass is intended to make the Tier 2 micro-code faster through things like guard elimination or constantification, we should also look into just making the Tier 2 interpreter itself faster -- possibly by changing the representation of executable traces held in the executor (the current format is identical to the IR, which is rather verbose, using 16 bytes per uop!), and possibly by just carefully tuning the interpreter. (For example, if the space of micro-opcode ordinals could overlap the space of Tier 1 bytecode ordinals, we could fit the Tier 2 opcode in one byte.)
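To make the size argument concrete, the sketch below uses `ctypes` to compare two record layouts. Both are assumptions for illustration: `VerboseUop` guesses at a 16-byte IR-style record (the field names and widths are not taken from this issue), and `CompactUop` is a purely hypothetical format with a one-byte opcode and no inline operand. `TRACE_LEN` is the 512-entry trace limit mentioned in the linked change.

```python
import ctypes

TRACE_LEN = 512  # trace size limit mentioned in the linked change


class VerboseUop(ctypes.Structure):
    """Assumed IR-style record: every uop carries a 64-bit operand."""
    _fields_ = [
        ("opcode", ctypes.c_uint16),
        ("oparg", ctypes.c_uint16),
        ("target", ctypes.c_uint32),
        ("operand", ctypes.c_uint64),
    ]


class CompactUop(ctypes.Structure):
    """Hypothetical compact record: one-byte opcode, no inline operand."""
    _fields_ = [
        ("opcode", ctypes.c_uint8),
        ("oparg", ctypes.c_uint16),
        ("target", ctypes.c_uint32),
    ]


verbose_size = ctypes.sizeof(VerboseUop)
compact_size = ctypes.sizeof(CompactUop)

# Per-uop size and the footprint of a full-length trace in each layout.
print("verbose:", verbose_size, "bytes/uop,", verbose_size * TRACE_LEN, "bytes/trace")
print("compact:", compact_size, "bytes/uop,", compact_size * TRACE_LEN, "bytes/trace")
```

A compact format along these lines would still need some way to supply an operand to the few uops that use one, which this sketch leaves out.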