Timeline for Cache set-sizes that aren't a power of two
Current License: CC BY-SA 4.0
14 events
when | what | by | license | comment | |
---|---|---|---|---|---|
Nov 4, 2021 at 2:35 | history | edited | Peter Cordes | CC BY-SA 4.0 | added 1690 characters in body |
Nov 4, 2021 at 2:28 | history | edited | Peter Cordes | CC BY-SA 4.0 | added 1690 characters in body |
Nov 4, 2021 at 2:03 | comment | added | Hadi Brais | Oh, I don't think you'll be able to find this piece of information in any public written source. At least I couldn't find any. But I'm 99% sure it's skewed. You can quote me on this, no problem, until someone finds an official source. | |
Nov 4, 2021 at 1:51 | comment | added | Peter Cordes | @HadiBrais: Sorry, I meant to ask if you had a link for the fact that it's a "unified skewed" design in the first place. Is that in AMD's optimization manual somewhere? But thanks for the reminder about InstLatx64's archive of CPUID dumps, that's interesting too. | |
Nov 4, 2021 at 1:49 | comment | added | Hadi Brais | Yea sure. instlatx64 has a treasure of CPUID dumps. Take for example the EPYC 7551P server processor whose dump can be found here. Leaf 80000006 tells you that L2 DTLB has 6 ways for 4KB pages and 3 ways for 2MB/4MB pages. Note that CPUID doesn't report the number of ways for coalesced pages (32KB in Zen/Zen+), but it's easy to deduce from the other information. OP's processor dump is here. | |
Nov 4, 2021 at 0:21 | comment | added | Peter Cordes | @HadiBrais: Interesting. So you'd expect that a server part would report 6-way associativity in this CPUID field, if following the same meaning as the desktop part? I'd like to link a source for that info in my answer, if you have one; it seems like something that should have been mentioned in at least one of the sources I did link, preferably all 3 of wikichip, 7-cpu, and AMD's own manual. (But maybe I missed it in AMD's optimization manual.) | |
Nov 3, 2021 at 17:12 | comment | added | Hadi Brais | For server models, the mapping is 6/3/3, respectively. The way cpuid reports TLB info is clear for previous uarches that use split TLBs. AMD wanted to use the same format for the new unified skewed design in Zen, but, as you can see, it doesn't really fit well. Anyway, effectively, it's indeed a 12-way cache with 1536 entries. You just have to know that it's skewed to interpret the cpuid info correctly. PDEs are also cached in the L2 DTLB, but these work differently. | |
Nov 3, 2021 at 17:12 | comment | added | Hadi Brais | Both the manual and cpuid info provide the correct L2 DTLB associativity and number of entries. Starting with Zen, the L2 DTLB is a skewed unified cache. This means that for a page with a particular address and size (which is unknown at the time of lookup), it can be mapped to some subset of ways of the total 12 ways according to a mapping function. For desktop/mobile models such as the Ryzen 7 1800X, any 4KB page can be mapped to 8 ways out of the 12 ways, any 2MB/4MB page can be mapped to 2 other ways, any coalesced 32KB page can be mapped to 2 other ways. That's a total of 12 ways. | |
Nov 3, 2021 at 6:54 | history | edited | Peter Cordes | CC BY-SA 4.0 | added 501 characters in body |
Nov 3, 2021 at 6:35 | comment | added | Peter Cordes | @BonitaMontero: Right, that's what I said, as the reason why AMD wouldn't / couldn't use the "1001b=see level 8000_001Dh instead" encoding for TLBs. | |
Nov 3, 2021 at 6:33 | comment | added | Bonita Montero | 0x8000001Du is not for TLB-caches. | |
Nov 3, 2021 at 6:29 | comment | added | Peter Cordes | @BonitaMontero: I see. That doc is from 2010; I don't suppose there's an updated version? I checked sandpile.org/x86/cpuid.htm, and it doesn't mention any encoding for 12-way either. (Except one that means "see level 8000_001Dh", but that level only makes sense for normal caches, not TLBs. It does encode the associativity as an integer, not a code, but probably AMD doesn't use it.) So anyway, it's possible that AMD chose to report the largest encodable value that's <= the real value, for L2dTLB associativity, and maybe there is no way to get the CPU to tell you the real value. | |
Nov 3, 2021 at 6:19 | comment | added | Bonita Montero | The associativity (number of ways) reported by CPUID 0x80000006u is encoded in a 4-bit value. The encoding is given in amd.com/system/files/TechDocs/25481.pdf - but there isn't an encoding for 12 ways. | |
Nov 3, 2021 at 4:14 | history | answered | Peter Cordes | CC BY-SA 4.0 |
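
The comments above turn on how the 4-bit associativity field of CPUID leaf 0x80000006 is decoded, and on how the per-page-size way counts of a skewed TLB add up. A minimal Python sketch of that arithmetic follows. The code table is an assumption based on the encodings commonly listed for this leaf (for example on sandpile.org) and should be checked against AMD's current manual. The helper names and the example register value are hypothetical: the value is built by hand rather than taken from a real CPUID dump.

```python
# ASSUMPTION: 4-bit associativity codes for CPUID leaf 0x80000006, as commonly
# listed; code 0x9 ("see leaf 0x8000001D") and reserved codes are left out.
ASSOC_CODES = {
    0x0: 0,    # disabled
    0x1: 1,    # direct mapped
    0x2: 2,
    0x3: 3,
    0x4: 4,
    0x5: 6,
    0x6: 8,
    0x8: 16,
    0xA: 32,
    0xB: 48,
    0xC: 64,
    0xD: 96,
    0xE: 128,
}
FULLY_ASSOCIATIVE = 0xF


def decode_l2_dtlb(reg):
    """Return (ways, entries) for the L2 DTLB fields of an EAX/EBX value.

    Bits 31:28 hold the associativity code and bits 27:16 the entry count.
    ways is None when the code is reserved or redirects to another leaf.
    """
    code = (reg >> 28) & 0xF
    entries = (reg >> 16) & 0xFFF
    if code == FULLY_ASSOCIATIVE:
        return entries, entries
    return ASSOC_CODES.get(code), entries


def largest_encodable_at_most(real_ways):
    """Largest way count in the table that does not exceed real_ways."""
    return max(w for w in ASSOC_CODES.values() if w <= real_ways)


# Hypothetical register value: code 0x6 (8-way) with 1536 entries.
example_ebx = (0x6 << 28) | (1536 << 16)
ways, entries = decode_l2_dtlb(example_ebx)

# Way split per page size (4KB, 2MB/4MB, coalesced 32KB) from the comments.
desktop_split = (8, 2, 2)
server_split = (6, 3, 3)

print(ways, entries)                    # reported ways and entries
print(entries // ways)                  # sets if read as a plain 8-way cache
print(entries // sum(desktop_split))    # sets if read as 12 ways in total
print(largest_encodable_at_most(12))    # nearest code not above 12 ways
```

Read as a plain 8-way structure, 1536 entries give 192 sets, which is not a power of two. Read as 12 ways in total, the same 1536 entries give 128 sets. That is consistent with the skewed interpretation described in the comments, though it does not confirm it.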