AMD's optimization manual from 2017 says Zen 1's L2 dTLB is 12-way associative with 1536 entries, at the top of page 26, in section 2.7.2 "L2 Translation Lookaside Buffers". That document is nominally about the Epyc 7001 series, but those are the same Zen 1 cores as your Ryzen.
The L2 iTLB is 8-way associative (512 entries, for 4K or 2M pages, with a 1G page entry "smashed" into a 2M entry).
But assuming you're checking the right CPUID leaf, 8000_0006h, it seems there's no encoding for 12-way associativity in that field. Unfortunately the field is just a code selecting from a table of possible values, not an integer bitfield.
Since there's (AFAIK) no way to encode a 12-way L2 dTLB, perhaps AMD just chose to encode the highest value <= the real value, so any code that uses it as a tuning parameter re: avoiding aliasing won't have way more conflict misses than expected.
The 1001b encoding, which means "see leaf 8000_001Dh instead", is (probably) not usable, because that leaf only describes normal caches, not TLBs.
But actually it's more interesting than that. Hadi Brais commented on this answer that it's not just a "simple" 12-way associative TLB, but also not fully separate. Instead, it's broken down into 8-way for 4K entries, 2-way for 2M/4M, and 2-way for coalesced 32K groups of 4K pages. Or on server CPUs, the breakdown is 6/3/3, and the CPUID dump reports 6-way for 4k and 3-way for 2M.
I found this write-up that gives an overview of the idea behind "skewed" TLBs. Apparently it does have separate ways for separate sizes, but with a hash function for indexing instead of just a couple low bits, reducing conflict misses vs. a simple index scheme for 2-way associative sub-sets.
Hadi writes:
Both the manual and cpuid info provide the correct L2 DTLB associativity and number of entries. Starting with Zen, the L2 DTLB is a skewed unified cache. This means that for a page with a particular address and size (which is unknown at the time of lookup), it can be mapped to some subset of ways of the total 12 ways according to a mapping function. For desktop/mobile models such as the Ryzen 7 1800X, any 4KB page can be mapped to 8 ways out of the 12 ways, any 2MB/4MB page can be mapped to 2 other ways, any coalesced 32KB page can be mapped to 2 other ways. That's a total of 12 ways.
For server models, the mapping is 6/3/3, respectively. The way cpuid reports TLB info is clear for previous uarches that use split TLBs. AMD wanted to use the same format for the new unified skewed design in Zen, but, as you can see, it doesn't really fit well. Anyway, effectively, it's indeed a 12-way cache with 1536 entries. You just have to know that it's skewed to interpret the cpuid info correctly. PDEs are also cached in the L2 DTLB, but these work differently.
It's possible AMD may have published an erratum or other documentation about the CPUID encoding for L2dTLB associativity on Zen.
BTW, Wikichip's Zen page unfortunately doesn't list the associativities of each level of TLB. But https://www.7-cpu.com/cpu/Zen.html does list the same associativities as AMD's PDF manual.
At the reported 8-way associativity, this would give 192 sets (1536 / 8), so no easy modulo-power-of-2 indexing.
Indeed, that would require some trickery, if it's doable efficiently at all.
For example, @Hadi suggested in comments on How does the indexing of the Ice Lake's 48KiB L1 data cache work? that a split design could have been possible, e.g. a 32K and a 16K cache. (But Intel actually increased associativity to 12-way, keeping the number of sets the same and a power of 2, also avoiding aliasing problems while maintaining VIPT performance.)
That's actually a very similar Q&A, but with the wrong associativity coming from a manual instead of CPUID. CPUs do sometimes have bugs where CPUID reports wrong info about cache/TLB parameters; programs that want to use CPUID info should have tables of fixups per CPU model/stepping so you have a place to correct errata that don't get fixed by microcode updates.
(Although in this case it may not really be fixable due to encoding limitations, except by defining some of the unused encodings.)